Middle East AI

ALARB: An Arabic Legal Argument Reasoning Benchmark

arXiv · Significant research

Summary

Researchers introduce ALARB, a benchmark for evaluating the reasoning of Arabic LLMs, built from 13K Saudi commercial court cases. Its tasks include verdict prediction, reasoning chain completion, and identification of relevant regulations. A 12B-parameter model instruction-tuned on ALARB performs comparably to GPT-4o on verdict prediction and generation.

Keywords

Arabic · legal · reasoning · benchmark · LLM

Related

ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark

arXiv

MBZUAI researchers introduce ARB, the first comprehensive benchmark for evaluating step-by-step multimodal reasoning in Arabic across textual and visual modalities. The benchmark spans 11 diverse domains and includes 1,356 multimodal samples with 5,119 human-curated reasoning steps. Evaluations of 12 state-of-the-art LMMs revealed challenges in coherence, faithfulness, and cultural grounding, highlighting the need for culturally aware AI systems.

RIRAG: Regulatory Information Retrieval and Answer Generation

arXiv

Researchers introduce a task for generating question-passage pairs to support the development of regulatory question-answering (QA) systems. They present the ObliQA dataset, comprising 27,869 questions derived from Abu Dhabi Global Market (ADGM) financial regulations. A baseline Regulatory Information Retrieval and Answer Generation (RIRAG) system is designed and evaluated using the RePASs metric.

UrduFactCheck: An Agentic Fact-Checking Framework for Urdu with Evidence Boosting and Benchmarking

arXiv

Researchers from MBZUAI introduce UrduFactCheck, a framework for fact-checking in Urdu, along with two datasets: UrduFactBench and UrduFactQA. The framework combines monolingual and translation-based evidence retrieval to address the lack of Urdu resources. Evaluations of twelve LLMs showed that translation-augmented methods improve performance and highlighted challenges for open-source LLMs in Urdu.

The Saudi Privacy Policy Dataset

arXiv

The Saudi Privacy Policy Dataset contains Arabic privacy policies from various sectors in Saudi Arabia. It is annotated according to the 10 principles of the Personal Data Protection Law (PDPL) and covers 1,000 websites, 4,638 lines of text, and 775,370 tokens. The dataset aims to support research and development in privacy policy analysis, NLP, and machine learning applications related to data protection.