MBZUAI researchers developed FarSight, a plugin that reduces hallucinations in Multimodal Large Language Models (MLLMs). FarSight targets a failure mode in which MLLMs lose focus on relevant image details during generation, so an early inaccuracy compounds into "snowball" hallucinations. In tests on models such as LLaVA-1.5-7B, FarSight reduced these initial mistakes, thereby minimizing downstream hallucinations. Why it matters: Improving the reliability of MLLMs is crucial for applications requiring high accuracy, enhancing their utility in real-world scenarios.
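The summary above does not detail FarSight's mechanism; as a purely illustrative sketch of the general idea — keeping generation-time attention anchored on visual tokens — here is a toy attention re-weighting function. All names and the boost scheme are assumptions, not the paper's actual method.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def reweight_attention(attn_logits, image_mask, boost=2.0):
    """Toy sketch: add a bonus to the attention logits of image tokens
    (image_mask == 1) so the model keeps attending to visual evidence
    instead of drifting onto previously generated text."""
    return softmax(attn_logits + boost * image_mask)

# Four keys: the first two are image tokens, the last two are text tokens.
logits = np.zeros((1, 4))
image_mask = np.array([1.0, 1.0, 0.0, 0.0])
plain = softmax(logits)                      # uniform: 25% per token
boosted = reweight_attention(logits, image_mask)
print(boosted[0, :2].sum())                  # image tokens now dominate
```

With equal logits, the plain softmax puts half its mass on the image tokens; the boosted version shifts most of the mass there, which is the intuition behind keeping the model "far-sighted" toward the image.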
MBZUAI researchers are working to improve computer vision models by incorporating common-sense knowledge. They aim to address issues like the generation of unrealistic human features, such as hands with the wrong number of fingers. By integrating common-sense facts, like the fact that humans typically have five fingers per hand, they seek to make deep learning models more reliable. Why it matters: This research could improve the accuracy and trustworthiness of AI-generated content, making it more suitable for real-world applications.
Researchers from MBZUAI developed "uncertainty quantification heads" (UQ heads) to detect hallucinations in language models by probing internal states and estimating the credibility of generated text. UQ heads leverage attention maps and logits to identify potential hallucinations without altering the model's generation process or relying on external knowledge. The team found that UQ heads achieved state-of-the-art performance in claim-level hallucination detection across different domains and languages. Why it matters: This approach offers a more efficient and accurate method for identifying hallucinations, improving the reliability and trustworthiness of language models in various applications.
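The summary describes UQ heads as probes over a frozen model's internal signals (attention maps and logits). The exact architecture is not given here; the following is a minimal hypothetical sketch in which a tiny logistic probe is trained on pooled internal-state features to score whether a generated claim is hallucinated. The feature construction and training setup are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_uq_probe(features, labels, lr=0.1, steps=500):
    """Train a tiny logistic probe on frozen internal features
    (e.g. pooled attention-map and logit statistics per claim).
    The base model is never modified -- only this small head is trained."""
    w = np.zeros(features.shape[1])
    b = 0.0
    n = len(labels)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))  # sigmoid
        g = p - labels                                  # logistic gradient
        w -= lr * features.T @ g / n
        b -= lr * g.mean()
    return w, b

def hallucination_score(features, w, b):
    """Higher score = claim more likely hallucinated."""
    return 1.0 / (1.0 + np.exp(-(features @ w + b)))

# Synthetic stand-in features: faithful claims cluster apart from hallucinated ones.
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])
w, b = train_uq_probe(X, y)
acc = ((hallucination_score(X, w, b) > 0.5) == y).mean()
```

The key property mirrored here is that detection happens post hoc: the probe reads internal signals without altering the generation process or consulting external knowledge.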
MBZUAI researchers introduced FAID, a fine-grained AI-generated text detection framework capable of classifying text as human-written, LLM-generated, or collaboratively written. FAID uses multi-level contrastive learning and multi-task auxiliary classification to capture authorship and model-specific characteristics, and can identify the underlying LLM family. The framework outperforms existing baselines, especially in generalizing to unseen domains and new LLMs, and ships with a multilingual, multi-domain dataset called FAIDSet.
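FAID's multi-level contrastive objective is not spelled out in this summary; as a generic illustration of the underlying idea, here is a standard supervised contrastive loss over text embeddings, where samples sharing an authorship label (human, a given LLM family, or collaborative) are pulled together and others pushed apart. This is a textbook formulation, not FAID's exact loss.

```python
import numpy as np

def supervised_contrastive_loss(emb, labels, temp=0.1):
    """Supervised contrastive loss sketch: for each anchor, treat
    same-authorship-label samples as positives and all others as
    negatives, using temperature-scaled cosine similarity."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit-normalize
    sim = emb @ emb.T / temp
    n = len(labels)
    total = 0.0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue
        denom = sum(np.exp(sim[i, j]) for j in range(n) if j != i)
        total += -np.mean([np.log(np.exp(sim[i, j]) / denom) for j in pos])
    return total / n

# Two tight clusters of embeddings; correct vs. mismatched authorship labels.
e = np.array([[1.0, 0.0], [0.99, 0.14], [0.0, 1.0], [0.14, 0.99]])
loss_aligned = supervised_contrastive_loss(e, [0, 0, 1, 1])
loss_crossed = supervised_contrastive_loss(e, [0, 1, 0, 1])
```

When labels agree with the embedding geometry the loss is small, and it grows when they do not — the gradient of this objective is what shapes the embedding space so that authorship classes separate.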
Xiuying Chen from KAUST presented her work on improving the trustworthiness of AI-generated text, focusing on accuracy and robustness. Her research analyzes causes of hallucination in language models related to semantic understanding and neglect of input knowledge, and proposes solutions. She also demonstrated language models' vulnerability to input noise and proposed augmentation techniques to enhance robustness. Why it matters: Improving the reliability of AI-generated text is crucial for its deployment in sensitive domains like healthcare and scientific discovery, where accuracy is paramount.
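The specific augmentation techniques are not named in this summary; as one common, illustrative example of the general approach, the sketch below injects character-level noise (drops and adjacent swaps) into training text so a model sees perturbed inputs during training. The function and parameters are hypothetical, not the presenter's method.

```python
import random

def add_char_noise(text, p=0.1, seed=0):
    """Toy robustness augmentation: with probability p per character,
    either drop it or swap it with the next one, simulating typos and
    OCR-style noise in training data."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        r = rng.random()
        if r < p / 2:                       # drop this character
            i += 1
        elif r < p and i + 1 < len(chars):  # swap with the next character
            out.append(chars[i + 1])
            out.append(chars[i])
            i += 2
        else:                               # keep unchanged
            out.append(chars[i])
            i += 1
    return "".join(out)

clean = "language models should be robust to noisy input"
noisy = add_char_noise(clean, p=0.3)
```

Training on pairs of clean and perturbed text is a simple way to harden a model against the input-noise vulnerabilities the talk demonstrated.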