The paper introduces MedPromptX, a clinical decision support system using multimodal large language models (MLLMs), few-shot prompting (FP), and visual grounding (VG) for chest X-ray diagnosis, integrating imagery with EHR data. MedPromptX refines few-shot data dynamically for real-time adjustment to new patient scenarios and narrows the search area in X-ray images. The study introduces MedPromptX-VQA, a new visual question answering dataset, and demonstrates state-of-the-art performance with an 11% improvement in F1-score compared to baselines.
MBZUAI researchers introduce XrayGPT, a conversational medical vision-language model for analyzing chest radiographs and answering open-ended questions. The model aligns a medical visual encoder (MedClip) with a fine-tuned large language model (Vicuna) using a linear transformation. To enhance performance, the LLM was fine-tuned using 217k interactive summaries generated from radiology reports.
This paper introduces MOTOR, a multimodal retrieval and re-ranking approach for medical visual question answering (MedVQA) that uses grounded captions and optimal transport to capture relationships between queries and retrieved context, leveraging both textual and visual information. MOTOR identifies clinically relevant contexts to augment VLM input, achieving higher accuracy on MedVQA datasets. Empirical analysis shows MOTOR outperforms state-of-the-art methods by an average of 6.45%.
MBZUAI researchers introduce UniMed-CLIP, a unified Vision-Language Model (VLM) for diverse medical imaging modalities, trained on the new large-scale, open-source UniMed dataset. UniMed comprises over 5.3 million image-text pairs across six modalities: X-ray, CT, MRI, Ultrasound, Pathology, and Fundus, created using LLMs to transform classification datasets into image-text formats. UniMed-CLIP significantly outperforms existing generalist VLMs and matches modality-specific medical VLMs in zero-shot evaluations, improving over BiomedCLIP by +12.61 on average across 21 datasets while using 3x less training data.
Researchers at MBZUAI have introduced TiBiX, a novel approach leveraging temporal information from previous chest X-rays (CXRs) and reports for bidirectional generation of current CXRs and reports. TiBiX addresses two key challenges: generating current images from previous images and reports, and generating current reports from both previous and current images. The study also introduces a curated temporal benchmark dataset derived from the MIMIC-CXR dataset and achieves state-of-the-art results in report generation.