At ACL, MBZUAI researchers presented claim-conditioned probability (CCP), a new uncertainty quantification method for identifying hallucinations in LLMs. CCP uses the token probabilities the LLM itself generates to highlight low-confidence claims. Unlike external fact-checking methods, CCP is computationally efficient because it reuses probabilities the model has already computed. Why it matters: By flagging potentially unreliable information, this research offers a practical way to mitigate the impact of LLM hallucinations and improve the trustworthiness of these models, especially Arabic LLMs.
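The summary does not spell out CCP's formula, so the sketch below shows only the simpler idea it starts from: scoring a claim with probabilities the model already produced and flagging claims that score low. It is a plain product-of-probabilities baseline, not CCP itself. The `Claim` type, the toy log-probabilities, and the 0.5 threshold are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    # Hypothetical input: log-probabilities the model assigned to each
    # generated token of this claim (already available after decoding).
    token_logprobs: list[float]


def claim_confidence(claim: Claim) -> float:
    """Baseline claim score: product of token probabilities (exp of summed log-probs)."""
    return math.exp(sum(claim.token_logprobs))


def flag_low_confidence(claims: list[Claim], threshold: float = 0.5) -> list[Claim]:
    """Return the claims whose baseline score falls below the (hypothetical) threshold."""
    return [c for c in claims if claim_confidence(c) < threshold]


claims = [
    Claim("The capital of France is Paris.", [-0.01, -0.02, -0.05]),
    Claim("It was founded in 1123.", [-0.1, -1.2, -0.9]),
]
for c in flag_low_confidence(claims):
    print(f"low confidence ({claim_confidence(c):.2f}): {c.text}")
```

No extra model calls are needed, which is the efficiency argument the summary makes. A raw product also penalizes longer claims, one reason practical methods go beyond this baseline.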
Researchers from MBZUAI developed "uncertainty quantification heads" (UQ heads), which detect hallucinations in language models by probing the model's internal states to estimate the credibility of generated text. UQ heads draw on attention maps and logits to identify potential hallucinations without altering the model's generation process or relying on external knowledge. The team found that UQ heads achieved state-of-the-art performance in claim-level hallucination detection across domains and languages. Why it matters: Because it needs no external knowledge source and leaves generation untouched, this approach offers an efficient and accurate way to identify hallucinations, improving the reliability and trustworthiness of language models across applications.
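The summary does not describe the heads' architecture. To illustrate the general probing idea only, the sketch trains a logistic-regression probe on two hypothetical claim-level features that could be derived from logits and attention maps. The feature names, toy data, and hyperparameters are invented; the actual UQ heads may use a different architecture and feature set.

```python
import math

# Hypothetical per-claim features: [mean token log-prob, attention entropy].
# Label 1 = hallucinated claim, 0 = supported claim (toy data for illustration).
X = [[-0.1, 0.5], [-0.2, 0.6], [-1.5, 2.0], [-1.8, 2.4], [-0.3, 0.7], [-1.2, 1.9]]
y = [0, 0, 1, 1, 0, 1]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights with full-batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def hallucination_score(w, b, features):
    """Probe output: estimated probability that a claim is hallucinated."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b)


w, b = train_probe(X, y)
print(round(hallucination_score(w, b, [-1.6, 2.2]), 2))
```

The probe only reads features; it never changes what the model generates, which mirrors the property the summary highlights.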
MBZUAI's Maxim Panov is developing uncertainty quantification methods to improve the reliability of language models. His work focuses on estimating how confident machine learning models are in their predictions, which is especially important where accuracy is critical, such as in medicine. Panov is working on post-processing techniques that can be applied to models that have already been trained. Why it matters: This research aims to address the problem of "hallucinations" in language models, making them more trustworthy and more applicable in sensitive domains, both in the region and globally.
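The summary does not name Panov's specific techniques. As one well-known example of the post-processing category, the sketch below shows temperature scaling: a single scalar `T` is fitted on held-out logits so that a trained model's probabilities become less over- or under-confident, without retraining. This is a swapped-in illustration, not Panov's method; the validation data and the search grid are hypothetical.

```python
import math


def softmax(logits, T=1.0):
    """Softmax of logits divided by temperature T (max-shifted for stability)."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def nll(logit_rows, labels, T):
    """Average negative log-likelihood of the true labels at temperature T."""
    return -sum(math.log(softmax(row, T)[lab]) for row, lab in zip(logit_rows, labels)) / len(labels)


def fit_temperature(logit_rows, labels, grid=None):
    """Pick the T on a grid that minimizes validation NLL (model weights untouched)."""
    if grid is None:
        grid = [0.05 * k for k in range(1, 101)]  # T from 0.05 to 5.0
    return min(grid, key=lambda T: nll(logit_rows, labels, T))


# Hypothetical held-out logits from an overconfident two-class model:
# two of five predictions are wrong despite large margins.
val_logits = [[8.0, 0.0], [7.0, 0.5], [6.0, 0.0], [9.0, 1.0], [7.5, 0.0]]
val_labels = [0, 0, 1, 0, 1]

T = fit_temperature(val_logits, val_labels)
print(f"fitted T = {T:.2f}")
print("confidence before:", round(max(softmax(val_logits[0])), 3))
print("confidence after: ", round(max(softmax(val_logits[0], T)), 3))
```

Because only `T` is learned, the model's predicted class never changes; only its stated confidence does, which is what makes this kind of method applicable to already-trained models.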
MBZUAI's Dr. Artem Shelmanov is working on uncertainty quantification (UQ) methods that detect unreliable generations from generative LLMs. He aims to address LLMs' tendency to fabricate facts, often called "hallucinating", without giving any clear indication of whether their output is true. He systematizes existing UQ efforts, discusses their caveats, and suggests novel techniques for safer LLM use. Why it matters: Improving the reliability of LLMs is crucial for responsible AI deployment in the region, especially in sensitive applications.
Xiuying Chen from KAUST presented her work on improving the trustworthiness of AI-generated text, focusing on accuracy and robustness. Her research analyzes causes of hallucination in language models, including problems with semantic understanding and neglect of the input knowledge, and proposes solutions. She also demonstrated that language models are vulnerable to noise and improved their robustness using augmentation techniques. Why it matters: Improving the reliability of AI-generated text is crucial for its deployment in sensitive domains like healthcare and scientific discovery, where accuracy is paramount.