MBZUAI researchers release OpenFactCheck, a unified framework to evaluate the factual accuracy of large language models. The framework includes modules for response evaluation, LLM evaluation, and fact-checker evaluation. OpenFactCheck is available as an open-source Python library, a web service, and via GitHub.
A new methodology emulating fact-checker criteria assesses news outlet factuality and bias using LLMs. The approach uses prompts based on fact-checking criteria to elicit and aggregate LLM responses for predictions. Experiments demonstrate improvements over baselines, with error analysis on media popularity and region, and a released dataset/code at https://github.com/mbzuai-nlp/llm-media-profiling.
This paper introduces ProgramFC, a fact-checking model that decomposes complex claims into simpler sub-tasks using a library of functions. The model uses LLMs to generate reasoning programs and executes them by delegating sub-tasks, enhancing explainability and data efficiency. Experiments on fact-checking datasets demonstrate ProgramFC's superior performance compared to baseline methods, with publicly available code and data.
Researchers introduce a benchmark to evaluate the factual recall and knowledge transferability of multilingual language models across 13 languages. The study reveals that language models often fail to transfer knowledge between languages, even when they possess the correct information in one language. The benchmark and evaluation framework are released to drive future research in multilingual knowledge transfer.
Researchers from MBZUAI have introduced UrduFactCheck, a new framework for fact-checking in Urdu, along with two datasets: UrduFactBench and UrduFactQA. The framework uses monolingual and translation-based evidence retrieval to address the lack of Urdu resources. Evaluations using twelve LLMs showed that translation-augmented methods improve performance, highlighting challenges for open-source LLMs in Urdu.