GCC AI Research

ALARB: An Arabic Legal Argument Reasoning Benchmark

arXiv · Significant research

Summary

Researchers introduce ALARB, a benchmark for evaluating the legal reasoning of LLMs in Arabic, built from roughly 13,000 Saudi commercial court cases. Its tasks include verdict prediction, completion of reasoning chains, and identification of the regulations relevant to a case. Instruction-tuning a 12B-parameter model on ALARB brings its verdict prediction and generation performance to a level comparable with GPT-4o.

Keywords

Arabic · legal · reasoning · benchmark · LLM


Related

ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark

arXiv

MBZUAI researchers introduce ARB, the first comprehensive benchmark for evaluating step-by-step multimodal reasoning in Arabic across textual and visual modalities. The benchmark spans 11 diverse domains and contains 1,356 multimodal samples with 5,119 human-curated reasoning steps. Evaluations of 12 state-of-the-art large multimodal models (LMMs) revealed weaknesses in coherence, faithfulness, and cultural grounding, underscoring the need for culturally aware AI systems.