Language Models' Factuality Depends on the Language of Inquiry

arXiv · February 25, 2025 · Significant research

Summary

Researchers introduce a benchmark that evaluates the factual recall and knowledge transferability of multilingual language models across 13 languages. The study finds that models often fail to transfer knowledge across languages: a model may hold the correct information in one language yet fail to recall it when asked in another. The benchmark and evaluation framework are released to support future research in multilingual knowledge transfer.

Keywords

multilingual language models · factuality · knowledge transfer · cross-lingual generalization · Arabic


Related

When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards

arXiv · Feb 1

Researchers from the National Center for AI in Saudi Arabia investigated how sensitive Large Language Model (LLM) leaderboards are to minor benchmark perturbations. They found that small changes, such as altering the order of answer choices, can shift rankings by up to 8 positions. The study recommends hybrid scoring, warns against over-reliance on simple benchmark evaluations, and provides code for further research.
