ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark
arXiv ·
MBZUAI researchers introduce ARB, the first comprehensive benchmark for evaluating step-by-step multimodal reasoning in Arabic across textual and visual modalities. The benchmark spans 11 diverse domains and includes 1,356 multimodal samples with 5,119 human-curated reasoning steps. Evaluations of 12 state-of-the-art LMMs revealed challenges in coherence, faithfulness, and cultural grounding, highlighting the need for culturally aware AI systems.