GCC AI Research

Highlighting LLM safety: How the Libra-Leaderboard is making AI more responsible

MBZUAI · Significant research

Summary

MBZUAI-based startup LibrAI has launched the Libra-Leaderboard, an evaluation framework for LLMs that assesses both capability and safety. The leaderboard evaluates 26 mainstream LLMs against 57 datasets, assigning safety scores along dimensions such as bias, misinformation, and oversensitivity. LibrAI has also launched the Interactive Safety Arena, which engages the public in AI safety by letting them probe models with adversarial prompts. Why it matters: The Libra-Leaderboard provides a benchmark for responsible AI development, emphasizing that capability gains must be matched by safety considerations in the rapidly evolving LLM landscape.
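
To make the capability-safety trade-off concrete, here is a minimal sketch of one way such a combined score could be computed: ranking models by their distance to an ideal point that is perfect on both axes, so a high capability score cannot mask a low safety score. This is an illustrative aggregation under our own assumptions (all names and the 0-100 scales are hypothetical), not the Libra-Leaderboard's published formula.

```python
# Illustrative only: a balanced capability-safety aggregation, not the
# Libra-Leaderboard's actual scoring formula. Scales assumed to be 0-100.
from dataclasses import dataclass

@dataclass
class ModelResult:
    name: str
    capability: float  # 0-100, averaged over capability benchmarks
    safety: float      # 0-100, averaged over safety datasets

def balanced_score(r: ModelResult) -> float:
    """Score by distance to the ideal point (100, 100), normalized to 0-100.

    A model must do well on BOTH axes: strong capability cannot fully
    compensate for weak safety, and vice versa.
    """
    gap = ((100 - r.capability) ** 2 + (100 - r.safety) ** 2) ** 0.5
    return 100 - gap / 2 ** 0.5  # sqrt(2) normalizes the worst case (0, 0) to 0

models = [
    ModelResult("model-a", capability=88.0, safety=61.0),
    ModelResult("model-b", capability=79.0, safety=82.0),
]
for m in sorted(models, key=balanced_score, reverse=True):
    print(f"{m.name}: {balanced_score(m):.1f}")  # model-b outranks model-a
```

Under this kind of aggregation, the balanced "model-b" profile outranks the more capable but less safe "model-a", which is exactly the behavior a safety-aware leaderboard is after.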

Keywords

LLM · safety · LibrAI · MBZUAI · leaderboard


Related

When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards

arXiv

Researchers from the National Center for AI in Saudi Arabia investigated how sensitive Large Language Model (LLM) leaderboards are to minor benchmark perturbations. They found that small changes, such as reordering a question's answer choices, can shift a model's ranking by up to eight positions. The study recommends hybrid scoring methods, warns against over-reliance on simple benchmark evaluations, and provides code for further research.
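
The perturbation the authors describe is straightforward to reproduce in outline: shuffle the order of a question's answer choices, re-score each model, and compare the resulting rankings. The sketch below uses hypothetical names and dummy score dictionaries to show the two pieces, a choice-order permutation and a rank-shift measurement; see the paper's released code for the actual experimental setup.

```python
# Sketch of the perturbation described above: re-order multiple-choice
# options, then measure how far each model moves in the ranking.
# All names and scores here are hypothetical placeholders.
import random

def permute_choices(question: str, choices: list[str], seed: int) -> str:
    """Return the question text with its answer choices re-ordered."""
    rng = random.Random(seed)
    shuffled = choices[:]
    rng.shuffle(shuffled)
    body = "\n".join(f"{chr(ord('A') + i)}. {c}" for i, c in enumerate(shuffled))
    return f"{question}\n{body}"

def rank_shift(original: dict[str, float], perturbed: dict[str, float]) -> dict[str, int]:
    """Positions each model moves on the leaderboard after perturbation."""
    def ranks(scores: dict[str, float]) -> dict[str, int]:
        ordered = sorted(scores, key=scores.get, reverse=True)
        return {model: pos for pos, model in enumerate(ordered)}
    r0, r1 = ranks(original), ranks(perturbed)
    return {model: r1[model] - r0[model] for model in original}

print(permute_choices("Capital of France?", ["Paris", "Lyon", "Nice"], seed=7))
print(rank_shift({"m1": 71.2, "m2": 70.8}, {"m1": 69.5, "m2": 70.1}))
# {'m1': 1, 'm2': -1}: the two models swap places under the perturbation
```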

AI Safety Research

MBZUAI

Adel Bibi, a KAUST alumnus and researcher at the University of Oxford, presented his research on AI safety, covering the robustness, alignment, and fairness of LLMs. The work addresses open challenges in AI systems, alignment issues, and fairness across languages in commonly used tokenizers, including instruction prefix tuning and its theoretical limitations for alignment. Why it matters: This research highlights the importance of addressing safety concerns in LLMs, particularly alignment and fairness for the Arabic language.

SalamahBench: Toward Standardized Safety Evaluation for Arabic Language Models

arXiv

The paper introduces SalamahBench, a new benchmark for evaluating the safety of Arabic Language Models (ALMs). The benchmark comprises 8,170 prompts across 12 categories aligned with the MLCommons Safety Hazard Taxonomy, and five state-of-the-art ALMs (Fanar 1 and 2, ALLaM 2, Falcon H1R, and Jais 2) were evaluated against it. Why it matters: SalamahBench enables standardized, category-aware safety evaluation and highlights the need for specialized safeguard mechanisms for robust harm mitigation in ALMs.
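
Category-aware evaluation of this kind boils down to reporting a safe-response rate per hazard category rather than a single global number. Below is a minimal sketch, assuming a simple list-of-records format and a pre-computed per-prompt safety judgment; both the record schema and the category labels are hypothetical, and SalamahBench's actual format and judging method may differ.

```python
# Minimal sketch of per-category safety reporting over a hazard taxonomy.
# The record format and category labels are hypothetical, not SalamahBench's
# actual schema; "safe" stands in for a per-prompt safety judgment.
from collections import defaultdict

def category_safety_rates(results: list[dict]) -> dict[str, float]:
    """Map each hazard category to its fraction of safe model responses."""
    safe_counts: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for record in results:
        safe_counts[record["category"]] += int(record["safe"])
        totals[record["category"]] += 1
    return {cat: safe_counts[cat] / totals[cat] for cat in totals}

example = [
    {"category": "hate", "safe": True},
    {"category": "hate", "safe": False},
    {"category": "self-harm", "safe": True},
]
print(category_safety_rates(example))  # {'hate': 0.5, 'self-harm': 1.0}
```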