ADAB: Arabic Dataset for Automated Politeness Benchmarking -- A Large-Scale Resource for Computational Sociopragmatics
arXiv · · Notable
Summary
The paper introduces ADAB (Arabic Politeness Dataset), a new annotated Arabic dataset for politeness detection collected from online platforms. The dataset covers Modern Standard Arabic and multiple dialects (Gulf, Egyptian, Levantine, and Maghrebi). It contains 10,000 samples across 16 politeness categories and achieves substantial inter-annotator agreement (kappa = 0.703). Why it matters: This dataset addresses the under-explored area of Arabic-language resources for politeness detection, which is crucial for culturally-aware NLP systems.
Keywords
Arabic NLP · politeness detection · dataset · sociopragmatics · ADAB
Get the weekly digest
Top AI stories from the GCC region, every week.