GCC AI Research

Testing LLM safety in Arabic from two perspectives | NAACL

MBZUAI · Notable

Summary

Researchers at MBZUAI presented a new Arabic dataset at NAACL to measure LLM safety, building on a Chinese dataset called 'Do Not Answer'. The dataset includes nearly 5,800 questions, combining harmful challenges with harmless requests that contain sensitive terms to test for over-sensitivity. The team localized cultural concepts and added 3,000 questions specific to Arabic language and culture. Why it matters: This comprehensive benchmark, accounting for the diversity of Arabic dialects and cultures, advances the development of safer and more culturally aligned LLMs for Arabic speakers.

Keywords

LLM safety · Arabic · MBZUAI · NAACL · dataset


Related

SalamahBench: Toward Standardized Safety Evaluation for Arabic Language Models

arXiv

The paper introduces SalamahBench, a new benchmark for evaluating the safety of Arabic Language Models (ALMs). The benchmark comprises 8,170 prompts across 12 categories aligned with the MLCommons Safety Hazard Taxonomy. Five state-of-the-art ALMs, including Fanar 1 and 2, ALLaM 2, Falcon H1R, and Jais 2, were evaluated using the benchmark. Why it matters: The benchmark enables standardized, category-aware safety evaluation, highlighting the necessity of specialized safeguard mechanisms for robust harm mitigation in ALMs.