Skip to content
GCC AI Research

Search

Results for "LLM-as-a-Judge"

Prediction of Arabic Legal Rulings using Large Language Models

arXiv ·

This paper introduces a predictive analysis of Arabic court decisions, utilizing 10,813 real commercial court cases. The study evaluates LLaMA-7b, JAIS-13b, and GPT3.5-turbo models under zero-shot, one-shot, and fine-tuned training paradigms, also experimenting with summarization and translation. GPT-3.5 models significantly outperformed others, exceeding JAIS model performance by 50%, while also demonstrating the unreliability of most automated metrics. Why it matters: This research bridges computational linguistics and Arabic legal analytics, offering insights for enhancing judicial processes and legal strategies in the Arabic-speaking world.

ALARB: An Arabic Legal Argument Reasoning Benchmark

arXiv ·

Researchers introduce ALARB, a new benchmark for evaluating reasoning in Arabic LLMs using 13K Saudi commercial court cases. The benchmark includes tasks like verdict prediction, reasoning chain completion, and identification of relevant regulations. Instruction-tuning a 12B parameter model on ALARB achieves performance comparable to GPT-4o in verdict prediction and generation.

Assessing Large Language Models on Islamic Legal Reasoning: Evidence from Inheritance Law Evaluation

arXiv ·

The paper introduces a benchmark of 1,000 multiple-choice questions to evaluate LLMs on Islamic inheritance law ('ilm al-mawarith). Seven LLMs were tested, with o3 and Gemini 2.5 achieving over 90% accuracy, while ALLaM, Fanar, LLaMA, and Mistral scored below 50%. Error analysis revealed limitations in handling structured legal reasoning. Why it matters: This research highlights the challenges and opportunities for adapting LLMs to complex, culturally-specific legal domains like Islamic jurisprudence.