ADAB: Arabic Dataset for Automated Politeness Benchmarking -- A Large-Scale Resource for Computational Sociopragmatics

arXiv · February 14, 2026 · Notable

Summary

The paper introduces ADAB (Arabic Dataset for Automated Politeness Benchmarking), a new annotated Arabic dataset for politeness detection built from text collected on online platforms. The dataset covers Modern Standard Arabic and four dialect groups: Gulf, Egyptian, Levantine, and Maghrebi. It contains 10,000 samples labeled across 16 politeness categories, and its annotation reaches substantial inter-annotator agreement (kappa = 0.703). Why it matters: the dataset fills a gap in Arabic-language resources for politeness detection, an under-explored area that is crucial for culturally aware NLP systems.
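The summary reports agreement as a single kappa value but does not say which variant was used. For two annotators the usual choice is Cohen's kappa, which compares observed agreement p_o with the agreement expected by chance p_e, computed from each annotator's label frequencies: kappa = (p_o - p_e) / (1 - p_e). The sketch below assumes two annotators. The function names and toy labels are hypothetical and are not taken from the paper. It also maps the result onto the Landis and Koch (1977) bands, where 0.61 to 0.80 counts as "substantial". That is the reading the summary applies to 0.703.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items (illustrative sketch)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement p_o: share of items where both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement p_e: product of the two annotators' marginal label frequencies,
    # summed over labels (a label annotator A never used contributes zero).
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label for every item; agreement is trivial.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def landis_koch_band(kappa):
    """Map a kappa value to the descriptive bands of Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, band in bands:
        if kappa <= upper:
            return band
    return "almost perfect"


# Hypothetical toy labels for six items (not ADAB's 16 politeness categories).
annotator_1 = ["A", "A", "C", "B", "A", "C"]
annotator_2 = ["A", "B", "C", "B", "A", "A"]
k = cohens_kappa(annotator_1, annotator_2)
print(round(k, 3), landis_koch_band(k))  # prints: 0.478 moderate
print(landis_koch_band(0.703))           # prints: substantial
```

With more than two annotators per item, Fleiss' kappa is the usual alternative. This summary does not state which statistic ADAB reports.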

Keywords

Arabic NLP · politeness detection · dataset · sociopragmatics · ADAB

