ArabicNumBench: Evaluating Arabic Number Reading in Large Language Models

arXiv · February 21, 2026 · Significant research

Summary

The paper introduces ArabicNumBench, a benchmark that evaluates how well LLMs read Arabic numbers written in both Eastern and Western Arabic numerals. It tests 71 models from 10 providers on 210 number reading tasks under four prompting strategies: zero-shot, zero-shot chain-of-thought (CoT), few-shot, and few-shot CoT. Performance varies substantially across models and strategies, with few-shot CoT prompting achieving 2.8x higher accuracy than zero-shot approaches.

Why it matters: The benchmark establishes baselines for Arabic number comprehension and provides guidance for model selection in production Arabic NLP systems.
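To make the Eastern/Western numeral distinction concrete, here is a minimal Python sketch that converts between the two digit sets and scores predictions by exact match after normalizing the digits. The function names and the exact-match scoring rule are illustrative assumptions, not the paper's actual evaluation code; the only fixed fact used is that Eastern Arabic digits occupy Unicode code points U+0660 through U+0669.

```python
# Eastern Arabic (Arabic-Indic) digits occupy U+0660..U+0669 in Unicode.
EASTERN_DIGITS = "".join(chr(0x0660 + i) for i in range(10))
WESTERN_DIGITS = "0123456789"

# Translation tables for converting digits in either direction.
TO_WESTERN = str.maketrans(EASTERN_DIGITS, WESTERN_DIGITS)
TO_EASTERN = str.maketrans(WESTERN_DIGITS, EASTERN_DIGITS)


def to_western(text: str) -> str:
    """Replace Eastern Arabic digits with Western digits; other characters are unchanged."""
    return text.translate(TO_WESTERN)


def to_eastern(text: str) -> str:
    """Replace Western digits with Eastern Arabic digits; other characters are unchanged."""
    return text.translate(TO_EASTERN)


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Exact-match accuracy after digit normalization.

    Hypothetical scoring rule for illustration only; the benchmark's
    real metric may differ.
    """
    if not references:
        return 0.0
    hits = sum(
        to_western(p).strip() == to_western(r).strip()
        for p, r in zip(predictions, references)
    )
    return hits / len(references)
```

With this sketch, `to_eastern("210")` yields the same number written in Eastern Arabic digits, and `accuracy` treats a prediction as correct regardless of which digit set the model used in its answer.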

Keywords

Arabic NLP · LLM evaluation · Arabic numerals · benchmarking · language models

