Results for "FarSight"

When models see what isn’t there: Reducing hallucinations with FarSight

MBZUAI

MBZUAI researchers developed FarSight, a plugin that reduces hallucinations in Multimodal Large Language Models (MLLMs). MLLMs can generate inaccurate text when they lose focus on relevant image details, and an early mistake can snowball into further hallucinations. In tests on models such as LLaVA-1.5-7B, FarSight reduced these initial mistakes and thereby lowered overall hallucinations. Why it matters: Improving the reliability of MLLMs is crucial for applications that require high accuracy, and it makes the models more useful in real-world scenarios.

A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos

arXiv · Dec 18

A new benchmark, LongShOTBench, evaluates multimodal reasoning and tool use in long videos using open-ended questions and diagnostic rubrics. It addresses the limitations of existing datasets by combining temporal length with multimodal richness, and its samples are human-validated. LongShOTAgent, an agentic system for analyzing long videos, is also presented. Together, the benchmark and the agent demonstrate the challenges faced by state-of-the-art MLLMs.