GCC AI Research

MOTOR: Multimodal Optimal Transport via Grounded Retrieval in Medical Visual Question Answering

arXiv · Significant research

Summary

This paper introduces MOTOR, a multimodal retrieval and re-ranking approach for medical visual question answering (MedVQA). MOTOR uses grounded captions and optimal transport to model the relationships between a query and its retrieved context, drawing on both textual and visual information. By selecting clinically relevant context to augment the input of a vision-language model (VLM), it achieves higher accuracy on MedVQA datasets. In the authors' empirical analysis, MOTOR outperforms state-of-the-art methods by an average of 6.45%.
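
To make the re-ranking idea concrete, here is a minimal sketch of how optimal transport can score retrieved candidates against a query. It is not MOTOR's implementation. It assumes each item (query or candidate) is already represented as a small set of embedding vectors, uses 1 minus cosine similarity as the transport cost, assigns uniform weights, and solves the entropy-regularized problem with plain Sinkhorn iterations. The names `cosine_cost`, `sinkhorn_cost`, and `rerank`, the parameters `eps` and `n_iters`, and the toy vectors are all hypothetical.

```python
import math


def cosine_cost(u, v):
    """Transport cost between two embeddings: 1 - cosine similarity (assumed cost)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def sinkhorn_cost(xs, ys, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost between two sets of embeddings.

    Both sets get uniform weights. A lower cost means the sets align better.
    """
    n, m = len(xs), len(ys)
    cost = [[cosine_cost(x, y) for y in ys] for x in xs]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    a = [1.0 / n] * n  # uniform mass on the query side
    b = [1.0 / m] * m  # uniform mass on the candidate side
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan entry: u[i] * kernel[i][j] * v[j].
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j]
        for i in range(n)
        for j in range(m)
    )


def rerank(query, candidates):
    """Sort candidates (name -> list of embeddings) by ascending OT cost to the query."""
    scored = [(name, sinkhorn_cost(query, emb)) for name, emb in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for text and image features.
    query = [[1.0, 0.0], [0.0, 1.0]]
    candidates = {
        "aligned": [[0.9, 0.1], [0.1, 0.9]],
        "unrelated": [[-1.0, 0.0], [0.0, -1.0]],
    }
    for name, cost in rerank(query, candidates):
        print(name, round(cost, 4))
```

In a retrieval pipeline, the top-ranked candidates from a step like `rerank` would be the ones passed to the VLM as additional context. How MOTOR builds its cost from grounded captions and visual features is described in the paper itself and is not reproduced here.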

Related

New approach for better AI analysis of medical images presented at MICCAI

MBZUAI

MBZUAI researchers developed Multimodal Optimal Transport via Grounded Retrieval (MOTOR), an approach for improving the accuracy of vision-language models in medical image analysis. MOTOR combines retrieval-augmented generation (RAG) with an optimal transport algorithm to retrieve and rank relevant image and text data. In tests on two medical datasets, MOTOR improved average performance by 6.45%. Why it matters: The technique addresses the scarcity of specialized medical datasets and the computational cost of training AI models for medical image interpretation, offering a more efficient and accurate alternative.