Researchers address the challenge of limited Arabic medical dialogue data by generating 80,000 synthetic question-answer pairs with ChatGPT-4o and Gemini 2.5 Pro, expanding an initial dataset of 20,000 records. They fine-tune five LLMs, including Mistral-7B and AraGPT2, and evaluate performance with BERTScore and expert review. Training on ChatGPT-4o-generated data yields higher F1-scores and fewer hallucinations across models. Why it matters: This demonstrates the potential of synthetic data augmentation to improve domain-specific Arabic language models, particularly for low-resource medical NLP applications.
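For readers unfamiliar with BERTScore, the sketch below shows how generated Arabic answers might be scored against reference answers. The example texts and the multilingual backbone choice are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch: scoring generated answers against references with BERTScore.
# Example sentences and the multilingual backbone (selected via lang="ar")
# are illustrative assumptions, not the paper's exact configuration.
from bert_score import score  # pip install bert-score

# Hypothetical generated answers and gold references (Arabic text).
candidates = ["يُنصح بشرب السوائل والراحة التامة."]
references = ["ينصح الطبيب بالراحة وشرب السوائل بكثرة."]

# lang="ar" falls back to a multilingual BERT backbone suitable for Arabic.
P, R, F1 = score(candidates, references, lang="ar", verbose=True)
print(f"BERTScore F1: {F1.mean().item():.3f}")
```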
Researchers developed COVIBOT, a smart chatbot designed to spread awareness and provide assistance during the COVID-19 pandemic in Saudi Arabia. The chatbot is built on Azure Cognitive Services and is available in both English and Arabic. COVIBOT's use cases were tested and validated using a scenario-based approach.
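As an illustration of how a bilingual bot might route replies, the snippet below detects whether an incoming message is Arabic or English using Azure Cognitive Services. This is a generic sketch rather than COVIBOT's implementation; the endpoint, key, and routing function are placeholders.

```python
# Illustrative sketch (not the COVIBOT implementation): detecting whether an
# incoming message is Arabic or English with Azure Cognitive Services, so the
# bot can reply in the user's language. Endpoint and key are placeholders.
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

def route_message(user_message: str) -> str:
    """Return a language code ('ar' or 'en') used to pick the reply template."""
    result = client.detect_language(documents=[user_message])[0]
    return "ar" if result.primary_language.iso6391_name == "ar" else "en"

print(route_message("ما هي أعراض فيروس كورونا؟"))  # -> 'ar'
```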
This paper introduces MOTOR, a multimodal retrieval and re-ranking approach for medical visual question answering (MedVQA) that uses grounded captions and optimal transport to capture the relationships between queries and retrieved contexts, leveraging both textual and visual information. MOTOR identifies clinically relevant contexts to augment the input of the vision-language model (VLM), achieving higher accuracy on MedVQA datasets. Empirical analysis shows MOTOR outperforms state-of-the-art methods by an average of 6.45%.
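To make the optimal-transport re-ranking idea concrete, the sketch below scores each retrieved context by the entropic OT (Sinkhorn) cost between its token embeddings and the query's, then ranks candidates by that cost. This is a generic illustration under uniform token weights, not MOTOR's exact formulation.

```python
# Hedged sketch of OT-based re-ranking: score each retrieved context by the
# entropic optimal-transport cost between its token embeddings and the query's.
# Generic Sinkhorn illustration with uniform weights, not MOTOR's formulation.
import numpy as np

def sinkhorn_cost(query_emb, ctx_emb, reg=0.1, n_iters=200):
    """Entropic OT cost between two sets of L2-normalized embeddings."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = ctx_emb / np.linalg.norm(ctx_emb, axis=1, keepdims=True)
    C = 1.0 - q @ c.T                      # cosine-distance cost matrix
    a = np.full(len(q), 1.0 / len(q))      # uniform mass on query tokens
    b = np.full(len(c), 1.0 / len(c))      # uniform mass on context tokens
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iters):               # Sinkhorn iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = np.diag(u) @ K @ np.diag(v)        # transport plan
    return float(np.sum(P * C))            # lower cost => better match

# Re-rank hypothetical candidates (random embeddings stand in for real ones).
rng = np.random.default_rng(0)
query = rng.normal(size=(12, 768))
candidates = [rng.normal(size=(40, 768)) for _ in range(5)]
ranked = sorted(range(5), key=lambda i: sinkhorn_cost(query, candidates[i]))
print("candidates ordered by OT cost:", ranked)
```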
The paper introduces MedPromptX, a clinical decision support system for chest X-ray diagnosis that combines multimodal large language models (MLLMs), few-shot prompting (FP), and visual grounding (VG), integrating imagery with electronic health record (EHR) data. MedPromptX refines its few-shot examples dynamically, adjusting in real time to new patient scenarios, and uses visual grounding to narrow the search area in X-ray images. The study also introduces MedPromptX-VQA, a new visual question answering dataset, and demonstrates state-of-the-art performance with an 11% improvement in F1-score over baselines.
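The sketch below illustrates one common way to refine few-shot examples dynamically: embed the incoming case, retrieve the most similar exemplars from a pool, and place them in the prompt. The embedding model, exemplar pool, and prompt format are assumptions for illustration, not MedPromptX's actual pipeline.

```python
# Illustrative sketch of dynamic few-shot selection: pick the exemplars most
# similar to the incoming case and place them in the prompt. Embedding model,
# pool contents, and prompt format are assumptions, not MedPromptX's pipeline.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical pool of (EHR summary, diagnosis) exemplars.
pool = [
    ("65M, fever, productive cough, left lower lobe opacity", "pneumonia"),
    ("54F, sudden dyspnea, pleuritic pain, post-op day 3", "pulmonary embolism"),
    ("72M, chronic smoker, hyperinflated lungs, flattened diaphragm", "COPD"),
]
pool_emb = encoder.encode([text for text, _ in pool], normalize_embeddings=True)

def build_prompt(new_case: str, k: int = 2) -> str:
    case_emb = encoder.encode([new_case], normalize_embeddings=True)[0]
    top_k = np.argsort(pool_emb @ case_emb)[::-1][:k]   # nearest exemplars
    shots = "\n".join(f"Case: {pool[i][0]}\nDiagnosis: {pool[i][1]}" for i in top_k)
    return f"{shots}\nCase: {new_case}\nDiagnosis:"

print(build_prompt("70M, fever, rusty sputum, right lower lobe consolidation"))
```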
MBZUAI researchers developed MedAgentSim, a simulated hospital environment to evaluate AI diagnostic abilities. The simulation uses LLM-powered agents to mimic doctor-patient conversations, providing a dynamic assessment of diagnostic skills. The system includes doctor, patient, and evaluator agents that interact within the simulated hospital, making real-time decisions. Why it matters: This research offers a more realistic evaluation of AI in clinical settings, addressing limitations of current benchmarks and potentially improving AI's use in healthcare.
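The structural sketch below shows how a doctor-patient-evaluator loop of this kind can be wired together, assuming some llm() wrapper around a chat model. The agent prompts, the stub llm() function, and the fixed-turn stopping rule are illustrative simplifications, not MedAgentSim's actual agents.

```python
# Structural sketch of a doctor-patient-evaluator loop. llm() is a stand-in
# for a call to a chat model; prompts and stopping rule are illustrative.
def llm(system: str, history: list[str]) -> str:
    """Placeholder for a call to a chat model (e.g. via an API client)."""
    return f"[{system.split(',')[0]}] reply to: {history[-1] if history else 'start'}"

def run_consultation(case_description: str, max_turns: int = 4) -> str:
    transcript: list[str] = []
    for _ in range(max_turns):
        # Doctor agent asks questions or proposes a diagnosis.
        doctor_msg = llm("doctor, interview the patient and reach a diagnosis", transcript)
        transcript.append(f"Doctor: {doctor_msg}")
        # Patient agent answers using only the hidden case description.
        patient_msg = llm(f"patient, ground answers in: {case_description}", transcript)
        transcript.append(f"Patient: {patient_msg}")
    # Evaluator agent grades the final diagnosis against the case.
    verdict = llm(f"evaluator, grade the diagnosis against: {case_description}", transcript)
    return "\n".join(transcript + [f"Evaluator: {verdict}"])

print(run_consultation("58-year-old with chest pain radiating to the left arm"))
```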
MBZUAI researchers introduce XrayGPT, a conversational medical vision-language model for analyzing chest radiographs and answering open-ended questions. The model aligns a medical visual encoder (MedCLIP) with a fine-tuned large language model (Vicuna) using a linear transformation. To enhance performance, the LLM was fine-tuned on 217k interactive summaries generated from radiology reports.
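A minimal sketch of the alignment idea follows: a single linear layer maps frozen visual-encoder features into the LLM's embedding space so that image tokens can be prepended to the text prompt. The dimensions and module name are illustrative assumptions, not XrayGPT's exact configuration.

```python
# Minimal sketch of visual-to-LLM alignment via a single linear layer.
# Dimensions and class name are illustrative assumptions.
import torch
import torch.nn as nn

class VisualProjector(nn.Module):
    def __init__(self, vision_dim: int = 768, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)  # the only trained piece here

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_dim) from the frozen encoder
        return self.proj(image_features)            # (batch, num_patches, llm_dim)

projector = VisualProjector()
fake_features = torch.randn(1, 49, 768)              # stand-in for MedCLIP output
visual_tokens = projector(fake_features)
print(visual_tokens.shape)                            # torch.Size([1, 49, 4096])
```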
This study explores fine-tuning large language models (LLMs) for Arabic medical text generation to improve hospital management systems. A unique dataset of medical conversations between patients and doctors was collected from social media and used to fine-tune models including Mistral-7B, LLaMA-2-7B, and GPT-2. The fine-tuned Mistral-7B model outperformed the others with a BERTScore F1 of 68.5%. Why it matters: The research demonstrates the potential of generative AI to provide scalable and culturally relevant solutions for healthcare challenges in Arabic-speaking regions.
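For context, the sketch below shows one common way to adapt a 7B model to dialogue data with LoRA via the PEFT library. The paper does not state that it used LoRA, so this is only an illustrative setup; the dataset loading and training loop are elided.

```python
# Hedged sketch of parameter-efficient fine-tuning with LoRA. The paper does
# not state it used LoRA; this only illustrates one common way to adapt a 7B
# model to dialogue data. Dataset loading and the training loop are elided.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # small fraction of the 7B weights

# A standard Trainer / SFT loop over tokenized patient-doctor dialogues
# would follow here, with the doctor responses as supervised targets.
```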