This article surveys the landscape of Arabic Large Language Models (ALLMs), tracing their evolution from early text processing systems to sophisticated AI models. It highlights the unique challenges and opportunities in developing ALLMs for the 422 million Arabic speakers across 27 countries. The paper also examines the evaluation of ALLMs through benchmarks and public leaderboards. Why it matters: ALLMs can bridge technological gaps and empower Arabic-speaking communities by catering to their specific linguistic and cultural needs.
This study reviews the use of large language models (LLMs) for Arabic language processing, focusing on pre-trained models and their applications. It highlights the challenges in Arabic NLP due to the language's complexity and the relative scarcity of resources. The review also discusses how techniques like fine-tuning and prompt engineering enhance model performance on Arabic benchmarks. Why it matters: This overview helps consolidate research directions and benchmarks in Arabic NLP, guiding future development of LLMs tailored for the Arabic language and its diverse dialects.
The paper introduces ALLaM, a series of large language models for Arabic and English, designed to support Arabic Language Technologies. The models are trained with language alignment and knowledge transfer in mind, using a decoder-only architecture. ALLaM achieves state-of-the-art results on Arabic benchmarks like MMLU Arabic and Arabic Exams. Why it matters: This work advances Arabic NLP by providing high-performing LLMs and demonstrating effective techniques for cross-lingual transfer learning and alignment with human preferences.
This paper introduces AraLLaMA, a new Arabic large language model (LLM) trained using a progressive vocabulary expansion method inspired by second language acquisition. The model utilizes a modified byte-pair encoding (BPE) algorithm to dynamically extend the Arabic subwords in its vocabulary during training, balancing the out-of-vocabulary (OOV) ratio. Experiments show AraLLaMA achieves performance comparable to existing Arabic LLMs on various benchmarks, and all models, data, and code will be open-sourced. Why it matters: This work addresses the need for more accessible and performant Arabic LLMs, contributing to democratization of AI in the Arab world.
This paper introduces a predictive analysis of Arabic court decisions, utilizing 10,813 real commercial court cases. The study evaluates LLaMA-7b, JAIS-13b, and GPT3.5-turbo models under zero-shot, one-shot, and fine-tuned training paradigms, also experimenting with summarization and translation. GPT-3.5 models significantly outperformed others, exceeding JAIS model performance by 50%, while also demonstrating the unreliability of most automated metrics. Why it matters: This research bridges computational linguistics and Arabic legal analytics, offering insights for enhancing judicial processes and legal strategies in the Arabic-speaking world.
This research evaluates LLMs like ChatGPT, Llama, Aya, Jais, and ACEGPT on Arabic automated essay scoring (AES) using the AR-AES dataset. The study uses zero-shot, few-shot learning, and fine-tuning approaches while using a mixed-language prompting strategy. ACEGPT performed best among the LLMs with a QWK of 0.67, while a smaller BERT model achieved 0.88. Why it matters: The study highlights challenges faced by LLMs in processing Arabic and provides insights into improving LLM performance in Arabic NLP tasks.
Researchers compiled a 101 Billion Arabic Words Dataset by mining text from Common Crawl WET files and rigorously cleaning and deduplicating the extracted content. The dataset aims to address the scarcity of original, high-quality Arabic linguistic data, which often leads to bias in Arabic LLMs that rely on translated English data. This is the largest Arabic dataset available to date. Why it matters: The new dataset can significantly contribute to the development of authentic Arabic LLMs that are more linguistically and culturally accurate.