GCC AI Research

Results for "CVQA"

MOTOR: Multimodal Optimal Transport via Grounded Retrieval in Medical Visual Question Answering

arXiv

This paper introduces MOTOR, a multimodal retrieval and re-ranking approach for medical visual question answering (MedVQA) that uses grounded captions and optimal transport to capture relationships between queries and retrieved context, leveraging both textual and visual information. MOTOR identifies clinically relevant contexts to augment VLM input, achieving higher accuracy on MedVQA datasets. Empirical analysis shows MOTOR outperforms state-of-the-art methods by an average of 6.45%.
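To make the idea of optimal-transport re-ranking concrete, here is a minimal sketch in Python. It is not the MOTOR implementation; the summary above does not give those details. It assumes each query and each retrieved context is represented as a small set of embedding vectors, uses a cosine-distance cost with uniform weights, and solves the entropic-regularized transport problem with Sinkhorn iterations. The function names `sinkhorn_cost` and `rerank` and the parameters `eps` and `n_iters` are hypothetical choices for illustration.

```python
import numpy as np

def sinkhorn_cost(X, Y, eps=0.1, n_iters=200):
    """Entropic-regularized OT cost between two sets of embeddings (illustrative)."""
    # Cosine-distance cost matrix between every row of X and every row of Y.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    C = 1.0 - Xn @ Yn.T
    # Uniform mass on each embedding (an assumption of this sketch).
    a = np.full(X.shape[0], 1.0 / X.shape[0])
    b = np.full(Y.shape[0], 1.0 / Y.shape[0])
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iters):
        # Alternate scaling so the plan's marginals match b and a.
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]  # transport plan
    return float((P * C).sum())

def rerank(query, candidates):
    """Return candidate indices ordered from lowest to highest transport cost."""
    costs = [sinkhorn_cost(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=costs.__getitem__)

# Toy data: three "token" embeddings per item (hypothetical, not from the paper).
query = np.eye(3)
same_tokens_shuffled = query[[2, 0, 1]]  # same content, different order
generic = np.ones((3, 3))                # equally similar to every query token
order = rerank(query, [generic, same_tokens_shuffled])
```

In this toy case the shuffled copy of the query ranks first, because transport cost ignores the order of the embeddings and depends only on how well the two sets can be matched.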

Cultural awareness in AI: New visual question answering benchmark shared in oral presentation at NeurIPS

MBZUAI

MBZUAI researchers, together with more than 70 collaborators, have created the Culturally diverse Visual Question Answering (CVQA) benchmark to evaluate cultural understanding in multimodal LLMs. The CVQA dataset includes over 10,000 questions in 31 languages and 13 scripts, testing models on images of local dishes, personalities, and monuments. Tests of several multimodal LLMs on CVQA showed that the benchmark poses significant challenges, even for top models. Why it matters: This benchmark highlights the need for AI models to better understand diverse cultures, promoting fairness and relevance across different languages and regions.

VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos

arXiv

MBZUAI researchers introduce VideoMathQA, a new benchmark for evaluating mathematical reasoning in videos, requiring models to interpret visual information, text, and spoken cues. The dataset spans 10 mathematical domains with videos ranging from 10 seconds to over 1 hour, and includes multi-step reasoning annotations. The benchmark aims to evaluate temporal cross-modal reasoning and highlights the limitations of existing approaches in complex video-based mathematical problem solving.

Improving patient care with computer vision

MBZUAI

MBZUAI's BioMedIA lab, led by Mohammad Yaqub, is developing AI solutions for healthcare challenges in cardiology, pulmonology, and oncology using computer vision. Yaqub's previous research analyzed fetal ultrasound images to correlate bone development with maternal vitamin D levels. The lab is now applying image analysis to improve the treatment of head and neck cancer using PET and CT scans. Why it matters: This research demonstrates the potential of AI and computer vision to improve diagnostic accuracy and accessibility of healthcare in the region and beyond.

BRIQA: Balanced Reweighting in Image Quality Assessment of Pediatric Brain MRI

arXiv

This paper introduces BRIQA, a new method for automatically assessing artifact severity in pediatric brain MRI, a factor that bears on diagnostic accuracy. BRIQA uses gradient-based loss reweighting and a rotating batching scheme to handle class imbalance across artifact severity levels. Experiments show BRIQA improves the average macro F1 score from 0.659 to 0.706, with gains especially for Noise, Zipper, Positioning, and Contrast artifacts.
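The macro F1 score reported here averages the per-class F1 scores with equal weight, which is why it is a common choice when classes are imbalanced. The short Python sketch below shows the computation on made-up labels (not data from the paper); the function name `macro_f1` is chosen for illustration only.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical imbalanced example: 8 samples of class 0, 2 of class 1,
# and a classifier that always predicts the majority class.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
```

On this toy input the accuracy is 0.8, while the macro F1 is about 0.44, because the minority class contributes an F1 of zero. That gap is what makes macro F1 sensitive to performance on rare classes.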

Old images to anticipate the future

MBZUAI

MBZUAI researchers presented a new approach to video question answering at ICCV 2023. The method leverages insights from analyzing still images to understand video content, potentially reducing the computational resources needed for training video question answering models. Guangyi Chen, Kun Zhang, and colleagues aim to apply pre-trained image models to understand video concepts. Why it matters: This research could lead to more efficient and accessible video analysis tools, benefiting fields like healthcare and security where video data is abundant.