Grounding Arabic LLMs in the Doha Historical Dictionary: Retrieval-Augmented Understanding of Quran and Hadith
arXiv · · Significant research
Summary
Researchers developed a retrieval-augmented generation (RAG) framework to improve Arabic Large Language Models (LLMs) in understanding complex historical and religious texts like the Quran and Hadith. This framework grounds LLMs in the Doha Historical Dictionary of Arabic (DHDA) through hybrid retrieval and intent-based routing. The approach significantly boosted the accuracy of Arabic-native LLMs such as Fanar and ALLaM to over 85%, closing the performance gap with proprietary models like Gemini. Why it matters: This research offers a novel method for enhancing Arabic NLP capabilities for historically nuanced texts, demonstrating the value of integrating diachronic lexicographic resources into RAG systems for deeper language understanding.
Keywords
Large Language Models · Retrieval-Augmented Generation · Arabic NLP · Doha Historical Dictionary · Religious Texts
Get the weekly digest
Top AI stories from the GCC region, every week.