This paper introduces MedPromptX, a clinical decision support system that combines multimodal large language models (MLLMs), few-shot prompting (FP), and visual grounding (VG) to diagnose chest X-rays from both imagery and EHR data. MedPromptX dynamically refines its few-shot examples so that it can adjust to new patient scenarios in real time, and it uses visual grounding to narrow the search area in X-ray images. The study also introduces MedPromptX-VQA, a new visual question answering dataset, and reports state-of-the-art performance, with an 11% improvement in F1-score over the baselines.
Researchers from MBZUAI have developed XReal, a diffusion model for generating realistic chest X-ray images with precise control over anatomy and pathology location. The model uses an Anatomy Controller and a Pathology Controller to add spatial control to a pre-trained Text-to-Image Diffusion Model without fine-tuning. XReal outperforms existing X-ray diffusion models in realism, as measured by quantitative metrics and radiologists' ratings, and the code and weights are available.
MBZUAI researchers introduce XrayGPT, a conversational medical vision-language model that analyzes chest radiographs and answers open-ended questions about them. The model aligns a medical visual encoder (MedCLIP) with a fine-tuned large language model (Vicuna) using a linear transformation. To enhance performance, the LLM was fine-tuned on 217k interactive summaries generated from radiology reports.
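The alignment step can be pictured as a single learned linear map from the visual encoder's feature space into the LLM's token-embedding space. The sketch below is a minimal pure-Python illustration, not XrayGPT's code: the dimensions, the weight values, and the function name `project_visual_tokens` are all hypothetical, and in practice the weights would be learned during training rather than set by hand.

```python
def project_visual_tokens(visual_tokens, weight, bias):
    """Map each visual feature vector into the LLM embedding space: y = W x + b."""
    projected = []
    for token in visual_tokens:
        # One output value per row of the weight matrix.
        out = [sum(w * x for w, x in zip(w_row, token)) + b
               for w_row, b in zip(weight, bias)]
        projected.append(out)
    return projected


# Hypothetical toy sizes: 2 visual tokens of dimension 3, LLM embedding dimension 2.
visual_tokens = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.5]]
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # shape (llm_dim, visual_dim)
bias = [0.0, 0.5]

llm_inputs = project_visual_tokens(visual_tokens, weight, bias)
```

Because the map is linear, it adds very few parameters compared with either the visual encoder or the LLM, which is one reason this kind of adapter is a common choice for connecting two pre-trained models.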
This paper introduces Pulmonary Embolism Detection using Contrastive Learning (PECon), a supervised contrastive pretraining strategy using both CT scans and EHR data to improve feature alignment between modalities for better PE diagnosis. PECon pulls sample features of the same class together while pushing away features of other classes. The approach achieves state-of-the-art results on the RadFusion dataset, with an F1-score of 0.913 and AUROC of 0.943.
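The pull-together, push-apart behaviour described above matches the generic supervised contrastive loss. The sketch below implements that generic formulation in pure Python; it is an assumption that PECon's loss takes this form, and the function names, the toy features, and the temperature of 0.1 are illustrative rather than taken from the paper.

```python
import math


def normalize(vector):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Average, over anchors, of the negative log-probability of same-class samples."""
    z = [normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        # Temperature-scaled similarity of the anchor to every other sample.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        # Log-sum-exp over all other samples, computed stably.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(sum(math.exp(s - max_sim) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy features: the first two point one way, the last two another.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matched = supervised_contrastive_loss(features, [1, 1, 0, 0])
loss_mismatched = supervised_contrastive_loss(features, [1, 0, 1, 0])
```

When the labels agree with the geometry of the features, the loss is lower than when same-class samples sit far apart, which is the signal that drives same-class features together during pretraining.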
Researchers at MBZUAI have introduced TiBiX, a novel approach leveraging temporal information from previous chest X-rays (CXRs) and reports for bidirectional generation of current CXRs and reports. TiBiX addresses two key challenges: generating current images from previous images and reports, and generating current reports from both previous and current images. The study also introduces a curated temporal benchmark dataset derived from the MIMIC-CXR dataset and achieves state-of-the-art results in report generation.
This paper introduces a self-supervised contrastive learning method for segmenting the left ventricle in echocardiography images when limited labeled data is available. The approach uses contrastive pretraining to improve the performance of UNet and DeepLabV3 segmentation networks. Experiments on the EchoNet-Dynamic dataset show that the method achieves a Dice score of 0.9252, outperforming existing approaches. The code is available on GitHub.
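For readers unfamiliar with the metric, the Dice score measures the overlap between a predicted mask and the ground-truth mask as twice the intersection divided by the sum of the two mask sizes. The sketch below computes it for flattened binary masks; the function name `dice_score` and the convention of returning 1.0 when both masks are empty are choices made here for illustration, not details from the paper.

```python
def dice_score(pred_mask, true_mask):
    """Dice coefficient for two flattened binary masks of equal length."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * intersection / total


# Toy example: the prediction marks two pixels, one of which is correct.
score = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they do not overlap at all.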
Researchers from MBZUAI have developed EchoCoTr, a novel spatiotemporal deep learning method for estimating left ventricular ejection fraction (LVEF) from echocardiograms. EchoCoTr combines CNNs and vision transformers to overcome the limitations of each when applied to medical video data. The method achieves state-of-the-art results on the EchoNet-Dynamic dataset, demonstrating improved accuracy compared to existing approaches, with code available on GitHub.
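As background on the quantity being estimated, LVEF is conventionally defined as the fraction of the end-diastolic volume (EDV) ejected in one beat, using the end-systolic volume (ESV) as the remainder. The sketch below encodes that standard clinical definition; it is not a description of how EchoCoTr produces its prediction, and the function name and example volumes are hypothetical.

```python
def ejection_fraction(edv_ml, esv_ml):
    """Left ventricular ejection fraction in percent: 100 * (EDV - ESV) / EDV."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("end-systolic volume must lie between 0 and EDV")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


# Hypothetical volumes in millilitres.
lvef_percent = ejection_fraction(120.0, 50.0)
```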
MBZUAI researchers introduce UniMed-CLIP, a unified Vision-Language Model (VLM) for diverse medical imaging modalities, trained on the new large-scale, open-source UniMed dataset. UniMed comprises over 5.3 million image-text pairs across six modalities: X-ray, CT, MRI, Ultrasound, Pathology, and Fundus. The dataset was built by using LLMs to transform classification datasets into image-text format. In zero-shot evaluations, UniMed-CLIP significantly outperforms existing generalist VLMs and matches modality-specific medical VLMs, improving over BiomedCLIP by +12.61 on average across 21 datasets while using a third of the training data.
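Zero-shot evaluation of a CLIP-style model generally works by embedding one text prompt per class, embedding the image, and picking the class whose prompt is most similar to the image. The sketch below shows that generic procedure in pure Python. The embeddings are made-up toy vectors standing in for the outputs of real image and text encoders, and the function names and temperature value are assumptions rather than details of UniMed-CLIP.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, prompt_embeddings, temperature=0.07):
    """Return the best-matching label and a softmax distribution over all labels."""
    labels = list(prompt_embeddings)
    logits = [cosine(image_embedding, prompt_embeddings[label]) / temperature
              for label in labels]
    # Softmax with the maximum subtracted for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(value - max_logit) for value in logits]
    total = sum(exps)
    probs = {label: e / total for label, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs


# Toy stand-ins for text-encoder outputs of prompts such as "an X-ray showing pneumonia".
prompt_embeddings = {"pneumonia": [1.0, 0.0], "normal": [0.0, 1.0]}
image_embedding = [0.9, 0.2]  # toy stand-in for an image-encoder output

predicted_label, probabilities = zero_shot_classify(image_embedding, prompt_embeddings)
```

No class-specific training is involved: adding a new class only requires writing and embedding a new prompt.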