Grounding Arabic LLMs in the Doha Historical Dictionary: Retrieval-Augmented Understanding of Quran and Hadith

arXiv · March 25, 2026 · Significant research

Summary

Researchers developed a retrieval-augmented generation (RAG) framework to improve Arabic Large Language Models (LLMs) in understanding complex historical and religious texts like the Quran and Hadith. This framework grounds LLMs in the Doha Historical Dictionary of Arabic (DHDA) through hybrid retrieval and intent-based routing. The approach significantly boosted the accuracy of Arabic-native LLMs such as Fanar and ALLaM to over 85%, closing the performance gap with proprietary models like Gemini. Why it matters: This research offers a novel method for enhancing Arabic NLP capabilities for historically nuanced texts, demonstrating the value of integrating diachronic lexicographic resources into RAG systems for deeper language understanding.

Keywords

Large Language Models · Retrieval-Augmented Generation · Arabic NLP · Doha Historical Dictionary · Religious Texts

Read original article →

Get the weekly digest

Top AI stories from the GCC region, every week.