GCC AI Research


Results for "MIRAGE"

MIRA: A Novel Framework for Fusing Modalities in Medical RAG

arXiv

MBZUAI researchers have introduced MIRA, a novel framework for improving the factual accuracy of multimodal large language models in medical applications. MIRA uses calibrated retrieval to manage factual risk and integrates image embeddings with a medical knowledge base for efficient reasoning. Evaluated on medical VQA and report generation benchmarks, MIRA achieves state-of-the-art results, with code available on GitHub.
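The summary doesn't detail MIRA's mechanism, but "calibrated retrieval to manage factual risk" alludes to a general technique: only querying the knowledge base when the model's own confidence is low. A minimal sketch of that idea, with hypothetical names and thresholds that are illustrative only, not MIRA's actual API:

```python
import math

def softmax_confidence(logits):
    """Max softmax probability as a crude confidence score."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return max(exps) / sum(exps)

def answer_with_gated_retrieval(logits, retrieve, threshold=0.8):
    """Invoke the (costly) knowledge-base retrieval only when the model's
    confidence falls below a calibrated threshold; return which path ran."""
    conf = softmax_confidence(logits)
    if conf >= threshold:
        return "direct", conf       # model is confident: answer directly
    context = retrieve()            # low confidence: fetch supporting evidence
    return "retrieved", conf
```

In practice the threshold would be calibrated on held-out data so that the retrieval cost is paid mainly on queries where the model is likely to hallucinate.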

Making human-machine conversation more lifelike than ever at GITEX

MBZUAI

MBZUAI researchers demonstrated a low-latency, multilingual, multimodal AI system at GITEX that integrates speech, text, and visual capabilities for more lifelike human-machine conversation. The demo, led by Dr. Hisham Cholakkal, includes a mobile app where users can point their camera at an object, ask questions, and receive spoken answers in multiple languages. The team is also integrating the model into a robot dog that responds to voice commands. Why it matters: This work addresses key challenges in deploying LLMs in real-world applications in the Middle East, such as multilingual support and real-time responsiveness.
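The point-camera-and-ask interaction described above amounts to a speech-in / vision-in / speech-out loop. A hypothetical sketch of one conversational turn, where `asr`, `vlm`, and `tts` are placeholder components and not MBZUAI's actual system:

```python
def multimodal_turn(audio, image, asr, vlm, tts, lang="ar"):
    """One conversational turn: transcribe the spoken question, ground it
    in the camera frame, and synthesize a spoken answer in the same language."""
    question = asr(audio, lang=lang)      # speech -> text
    answer_text = vlm(image, question)    # image + question -> answer text
    return tts(answer_text, lang=lang)    # text -> speech
```

Low latency in such a pipeline typically comes from streaming each stage (partial transcripts feeding the vision-language model before the utterance ends) rather than running the stages strictly one after another.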

A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos

arXiv

A new benchmark, LongShOTBench, is introduced for evaluating multimodal reasoning and tool use in long videos, featuring open-ended questions and diagnostic rubrics. It addresses the limitations of existing datasets by combining temporal length with multimodal richness, using human-validated samples. The paper also presents LongShOTAgent, an agentic system for analyzing long videos; results on both the benchmark and the agent highlight the difficulties that long-video reasoning still poses for state-of-the-art MLLMs.
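The summary doesn't describe LongShOTAgent's internals, but agentic systems of this kind generally share one skeleton: a controller repeatedly picks a tool (e.g. a clip sampler, ASR, or OCR over frames) and accumulates observations until it can answer. A minimal sketch of that generic loop, with all names hypothetical rather than taken from the paper:

```python
def run_agent(question, tools, policy, max_steps=5):
    """Generic tool-use loop. `policy(question, observations)` returns either
    ("answer", text) to stop, or (tool_name, kwargs) to call a tool whose
    result is appended to the observation history."""
    observations = []
    for _ in range(max_steps):
        action, payload = policy(question, observations)
        if action == "answer":
            return payload
        observations.append(tools[action](**payload))
    return None  # step budget exhausted without an answer
```

The step budget matters for long videos: without it, an agent can keep sampling clips indefinitely, and benchmarks like the one above can score both answer quality and tool-use efficiency.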