Skip to content
GCC AI Research

Search

Results for "FanarGuard"

FanarGuard: A Culturally-Aware Moderation Filter for Arabic Language Models

arXiv ·

The paper introduces FanarGuard, a bilingual moderation filter for Arabic and English language models that considers both safety and cultural alignment. A dataset of 468K prompt-response pairs was created and scored by LLM judges on harmlessness and cultural awareness to train the filter. The first benchmark targeting Arabic cultural contexts was developed to evaluate cultural alignment. Why it matters: FanarGuard advances context-sensitive AI safeguards by integrating cultural awareness into content moderation, addressing a critical gap in current alignment techniques.