FanarGuard: A Culturally-Aware Moderation Filter for Arabic Language Models

arXiv · November 24, 2025 · Significant research

Summary

The paper introduces FanarGuard, a bilingual moderation filter for Arabic and English language models that accounts for both safety and cultural alignment. To train the filter, the authors created a dataset of 468K prompt-response pairs, each scored by LLM judges on harmlessness and cultural awareness. To evaluate cultural alignment, they also developed the first benchmark targeting Arabic cultural contexts. Why it matters: FanarGuard integrates cultural awareness into content moderation, a gap in current alignment techniques, and so advances context-sensitive AI safeguards.

Keywords

content moderation · cultural alignment · Arabic language models · FanarGuard · benchmark
