ALPS: A Diagnostic Challenge Set for Arabic Linguistic & Pragmatic Reasoning

arXiv · February 19, 2026 · Significant research

Summary

The paper introduces ALPS (Arabic Linguistic & Pragmatic Suite), a diagnostic challenge set for evaluating deep semantics and pragmatics in Arabic NLP. The dataset contains 531 expert-curated questions across 15 tasks and 47 subtasks, designed to test morpho-syntactic dependencies and compositional semantics. An evaluation of 23 models, spanning commercial, open-source, and Arabic-native systems, shows that they struggle with fundamental morpho-syntactic dependencies, especially those that rely on diacritics.

Why it matters: ALPS provides a benchmark for evaluating the linguistic competence of Arabic NLP models, highlighting where current models fall short despite achieving high fluency.

Keywords

Arabic NLP · linguistic reasoning · pragmatics · benchmark · ALPS

