The paper introduces a benchmark of 1,000 multiple-choice questions to evaluate LLMs on Islamic inheritance law ('ilm al-mawarith). Seven LLMs were tested, with o3 and Gemini 2.5 achieving over 90% accuracy, while ALLaM, Fanar, LLaMA, and Mistral scored below 50%. Error analysis revealed limitations in handling structured legal reasoning. Why it matters: This research highlights the challenges and opportunities for adapting LLMs to complex, culturally-specific legal domains like Islamic jurisprudence.
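As a rough illustration of how accuracy on a multiple-choice benchmark like this can be scored, here is a minimal sketch. The function name `accuracy`, the letter-label answer format, and the toy data are assumptions for illustration only; they are not taken from the paper or its evaluation code.

```python
def accuracy(predictions, gold):
    """Fraction of questions where the predicted option matches the gold option.

    Both arguments are lists of option labels such as "A", "B", "C"
    (an assumed format, not necessarily the benchmark's own).
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    # Compare labels case-insensitively and ignore surrounding whitespace.
    correct = sum(
        p.strip().upper() == g.strip().upper()
        for p, g in zip(predictions, gold)
    )
    return correct / len(gold)


# Toy data: four hypothetical questions, three answered correctly.
gold = ["A", "C", "B", "D"]
preds = ["A", "c", "D", "D"]
score = accuracy(preds, gold)
print(f"accuracy = {score:.2%}")
```

On a 1,000-question benchmark, a score above 0.90 corresponds to more than 900 correct answers, and a score below 0.50 to fewer than 500.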
The QU-NLP team presented their approach to the QIAS 2025 shared task on Islamic Inheritance Reasoning: fine-tuning the Fanar-1-9B model with LoRA and integrating it into a RAG pipeline. Their system achieved 85.8% accuracy (0.858) on the final test, outperforming models such as GPT 4.5, LLaMA, and Mistral in zero-shot settings. It performed best in advanced reasoning, where it reached 97.6% accuracy. Why it matters: This demonstrates the effectiveness of domain-specific fine-tuning and retrieval augmentation for Arabic LLMs in complex reasoning tasks, even surpassing frontier models.
Researchers developed a semantic search tool for the Quran using Arabic NLP techniques. The tool was trained on a dataset of over 30 tafsirs (interpretations) of the Quran. Using the SNxLM model and cosine similarity, the tool identifies Quranic verses most relevant to a user's query, achieving a similarity score of up to 0.97. Why it matters: This tool could significantly improve access to the Quran's teachings for Arabic speakers and researchers, providing a valuable resource for religious study and understanding.
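The ranking step described above, which compares a query against verses by cosine similarity, can be sketched as follows. This is a minimal illustration under stated assumptions: the embedding vectors are hand-written toy values rather than output from the model the researchers used, and `rank_verses` and the verse labels are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for zero vectors, which have no direction
    return dot / (norm_a * norm_b)


def rank_verses(query_vec, verse_vecs, top_k=3):
    """Return the top_k (label, score) pairs, highest similarity first."""
    scored = [
        (label, cosine_similarity(query_vec, vec))
        for label, vec in verse_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumed values, for illustration only).
verse_vecs = {
    "verse_a": [1.0, 0.0, 0.0],
    "verse_b": [0.6, 0.8, 0.0],
    "verse_c": [0.0, 0.0, 1.0],
}
query_vec = [1.0, 0.1, 0.0]

top = rank_verses(query_vec, verse_vecs, top_k=2)
for label, score in top:
    print(label, round(score, 3))
```

In a real system the vectors would come from a sentence-embedding model applied to the query and to each verse or tafsir passage. A score near 1.0 means the two embeddings point in almost the same direction.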