Researchers from MBZUAI developed "uncertainty quantification heads" (UQ heads) to detect hallucinations in language models by probing internal states and estimating the credibility of generated text. UQ heads leverage attention maps and logits to identify potential hallucinations without altering the model's generation process or relying on external knowledge. The team found that UQ heads achieved state-of-the-art performance in claim-level hallucination detection across different domains and languages. Why it matters: This approach offers a more efficient and accurate method for identifying hallucinations, improving the reliability and trustworthiness of language models in various applications.
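The idea of a trained probe over internal model signals can be illustrated with a minimal sketch. This is not the authors' architecture; it assumes hallucination-labeled features are already extracted (mean-pooled hidden states plus an attention-entropy summary), and that a linear probe with learned weights `w`, `b` scores each claim:

```python
import numpy as np

def attention_entropy(attn):
    """attn: (heads, seq, seq) attention maps for one layer.
    Returns the mean entropy of the attention distributions,
    a cheap summary feature of how 'diffuse' attention is."""
    p = np.clip(attn, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum(axis=-1).mean())

def probe_score(hidden, attn, w, b):
    """hidden: (seq, d) hidden states over the claim's tokens.
    Concatenates pooled states with the attention feature and
    applies a logistic probe (w, b assumed pre-trained on
    hallucination labels). Returns P(claim is hallucinated)."""
    feats = np.concatenate([hidden.mean(axis=0), [attention_entropy(attn)]])
    z = float(feats @ w + b)
    return 1.0 / (1.0 + np.exp(-z))
```

The key property matching the article: the probe only reads signals the model already produces, so generation itself is untouched and no external knowledge base is queried.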
MBZUAI's Maxim Panov is developing uncertainty quantification methods to improve the reliability of language models. His work focuses on quantifying how confident a machine learning model's predictions are, especially in scenarios where accuracy is critical, such as medicine. Panov is working on post-processing techniques that can be applied to already-trained models. Why it matters: This research aims to address the issue of "hallucinations" in language models, enhancing their trustworthiness and applicability in sensitive domains within the region and globally.
MBZUAI researchers presented a new uncertainty quantification method, claim conditioned probability (CCP), at ACL to identify hallucinations in LLMs. CCP leverages the internal token probabilities generated by the LLM itself to highlight claims with low confidence. Unlike external fact-checking methods, CCP is computationally efficient because it reuses probabilities the model has already computed. Why it matters: This research offers a practical approach to mitigating the impact of LLM hallucinations by highlighting potentially unreliable information, improving the trustworthiness of these models, especially for Arabic LLMs.
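The full CCP method conditions each token's probability on alternatives that preserve the claim's meaning; the simplified sketch below captures only the cheaper core idea the article highlights, i.e. scoring claims from token probabilities the model has already computed. The threshold value and the geometric-mean aggregation are illustrative assumptions, not the paper's formulation:

```python
import math

def claim_confidence(token_logprobs):
    """Aggregate per-token log-probabilities for one claim into a
    confidence score: the geometric mean of token probabilities."""
    avg = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg)

def flag_low_confidence(claims, threshold=0.5):
    """claims: list of (claim_text, [token log-probs]) pairs.
    Returns the claim texts whose confidence falls below the
    (illustrative) threshold, i.e. candidate hallucinations."""
    return [text for text, lps in claims if claim_confidence(lps) < threshold]
```

Because the log-probabilities are byproducts of generation, flagging costs only this aggregation pass, which is the efficiency argument made above.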
MBZUAI's Dr. Artem Shelmanov is working on uncertainty quantification (UQ) methods for generative LLMs to detect unreliable generations. He aims to address the problem of LLMs fabricating facts, often called "hallucinating," without giving any clear indicator of veracity. He systematizes existing UQ efforts, discusses their caveats, and suggests novel techniques for safer LLM use. Why it matters: Improving the reliability of LLMs is crucial for responsible AI deployment in the region, especially in sensitive applications.
A new paper coauthored by researchers at The University of Melbourne and MBZUAI explores disagreement in human annotation for AI training. The paper treats disagreement as a signal (human label variation or HLV) rather than noise, and proposes new evaluation metrics based on fuzzy set theory. These metrics adapt accuracy and F-score to cases where multiple labels may plausibly apply, aligning model output with the distribution of human judgments. Why it matters: This research addresses a key challenge in NLP by accounting for the inherent ambiguity in human language, potentially leading to more robust and human-aligned AI systems.
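How fuzzy set theory can soften accuracy and F-score is easiest to see in a small sketch. This is an illustrative reading, not the paper's exact metrics: annotator label counts define fuzzy membership degrees, accuracy becomes the predicted label's degree of support, and the F-score uses the fuzzy intersection (pointwise `min`) between the predicted label set and the membership function:

```python
def fuzzy_accuracy(pred, annotations):
    """annotations: labels assigned by individual annotators.
    The empirical label distribution acts as a fuzzy membership
    function; accuracy is the predicted label's membership degree."""
    return annotations.count(pred) / len(annotations)

def fuzzy_f1(pred_set, membership):
    """membership: dict label -> degree in [0, 1] from annotator support.
    Fuzzy intersection uses min; set cardinality is the sum of degrees."""
    inter = sum(min(1.0 if label in pred_set else 0.0, deg)
                for label, deg in membership.items())
    prec = inter / len(pred_set) if pred_set else 0.0
    rec = inter / sum(membership.values()) if membership else 0.0
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)
```

Under this scheme a model is no longer penalized in full for choosing a minority-but-plausible label, which is exactly the HLV-as-signal stance described above.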