The paper introduces FanarGuard, a bilingual moderation filter for Arabic and English language models that considers both safety and cultural alignment. A dataset of 468K prompt-response pairs was created and scored by LLM judges on harmlessness and cultural awareness to train the filter. The first benchmark targeting Arabic cultural contexts was developed to evaluate cultural alignment. Why it matters: FanarGuard advances context-sensitive AI safeguards by integrating cultural awareness into content moderation, addressing a critical gap in current alignment techniques.
Hamad Bin Khalifa University's Qatar Computing Research Institute (QCRI) introduced Fanar, an Arabic-centric multimodal generative AI platform featuring the Fanar Star (7B) and Fanar Prime (9B) Arabic LLMs. These models were trained on nearly 1 trillion tokens and are designed to address different prompts through a custom orchestrator. Fanar includes a customized Islamic RAG system, a Recency RAG, bilingual speech recognition, and an attribution service for content verification, sponsored by Qatar's Ministry of Communications and Information Technology. Why it matters: The platform signifies a major step towards sovereign AI development in Qatar, providing advanced Arabic language capabilities and addressing regional needs.
The paper introduces SalamahBench, a new benchmark for evaluating the safety of Arabic Language Models (ALMs). The benchmark comprises 8,170 prompts across 12 categories aligned with the MLCommons Safety Hazard Taxonomy. Five state-of-the-art ALMs, including Fanar 1 and 2, ALLaM 2, Falcon H1R, and Jais 2, were evaluated using the benchmark. Why it matters: The benchmark enables standardized, category-aware safety evaluation, highlighting the necessity of specialized safeguard mechanisms for robust harm mitigation in ALMs.
Hamad Bin Khalifa University (HBKU) has released Fanar 2.0, the second generation of Qatar's Arabic-centric Generative AI platform, built entirely at QCRI. The core of Fanar 2.0 is Fanar-27B, which was continually pre-trained from a Gemma-3-27B backbone using 120 billion high-quality tokens and only 256 NVIDIA H100 GPUs. Fanar 2.0 includes capabilities like FanarGuard, Aura, Oryx, Fanar-Sadiq, Fanar-Diwan, and FanarShaheen for moderation, speech recognition, vision understanding, Islamic content, poetry generation, and translation. Why it matters: This shows that sovereign, resource-constrained AI development in the Arabic language is possible, producing competitive systems in the region.