Researchers introduce AceGPT, a localized large language model (LLM) specifically for Arabic, addressing cultural sensitivity and local values not well-represented in mainstream models. AceGPT incorporates further pre-training with Arabic texts, supervised fine-tuning using native Arabic instructions and GPT-4 responses, and reinforcement learning with AI feedback using a reward model attuned to local culture. Evaluations demonstrate that AceGPT achieves state-of-the-art performance among open Arabic LLMs across several benchmarks. Why it matters: This work advances culturally-aware AI development for Arabic-speaking communities, providing a valuable resource and benchmark for future research.
The Open Arabic LLM Leaderboard (OALL) has been launched to benchmark Arabic language models, addressing the gap in resources for non-English NLP. It incorporates datasets such as AlGhafa and ACVA, along with translated versions of MMLU and EXAMS from the AceGPT suite. The leaderboard is built around Hugging Face's LightEval framework and scores tasks by normalized log-likelihood accuracy. Why it matters: This initiative promotes research and development in Arabic NLP and supports better evaluation and improvement of Arabic LLMs, which serve more than 380 million Arabic speakers.
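To make the scoring metric concrete, here is a minimal sketch of normalized log-likelihood accuracy for multiple-choice tasks. It assumes the model has already produced one log-likelihood per answer choice and that normalization divides by the character length of each choice, which is one common convention and may differ from the leaderboard's exact setup. The function names, the example format, and the numbers are illustrative, not taken from OALL or LightEval.

```python
from typing import Dict, List, Sequence


def pick_choice(logliks: Sequence[float], choices: Sequence[str]) -> int:
    """Return the index of the choice with the highest length-normalized log-likelihood."""
    if len(logliks) != len(choices) or not choices:
        raise ValueError("need one log-likelihood per choice")
    # Assumption: normalize by character count so long answers are not penalized
    # simply for containing more tokens.
    scores = [ll / max(len(choice), 1) for ll, choice in zip(logliks, choices)]
    return max(range(len(scores)), key=scores.__getitem__)


def normalized_accuracy(examples: List[Dict]) -> float:
    """Fraction of examples where the normalized best choice matches the gold index."""
    if not examples:
        raise ValueError("no examples to score")
    correct = sum(
        pick_choice(ex["logliks"], ex["choices"]) == ex["gold"] for ex in examples
    )
    return correct / len(examples)


# Illustrative log-likelihoods (made up for this sketch, not real model output).
examples = [
    {"choices": ["Cairo", "Alexandria is the capital"], "logliks": [-6.0, -10.0], "gold": 1},
    {"choices": ["yes", "no"], "logliks": [-1.0, -3.0], "gold": 0},
    {"choices": ["red", "blue"], "logliks": [-2.0, -2.0], "gold": 0},
]

score = normalized_accuracy(examples)
```

In the first example the raw log-likelihood favors the short answer, but after dividing by length the longer answer wins, which is the effect normalization is meant to have.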
This research explores the use of generative AI, specifically ChatGPT, to create student assessments that align with academic accreditation standards, such as those of the National Center for Academic Accreditation in Saudi Arabia and ABET. The study introduces a method for mapping verbs used in questions to educational outcomes, enabling AI to produce and validate accreditation-compliant questions. A survey of faculty members in Saudi universities showed high acceptance rates for AI-generated exam questions and AI assistance in editing existing questions.
InfiAgent is a new agent framework comparable to GPT4-Agent, developed by replicating Codex. It includes InfiCoder, an open-source model for text-to-code, code-to-code, and freeform code-related QA tasks. The framework focuses on data analysis and integrates an LLM with programming capabilities and a sandbox environment for executing Python code. Why it matters: This research demonstrates the potential for advancements in AI operating systems and highlights areas where current models like GPT-4V can be improved, contributing to the broader development of more capable and versatile AI agents.
The paper introduces ArabianGPT, a suite of transformer-based language models designed specifically for Arabic, including versions with 0.1B and 0.3B parameters. A key component is the AraNizer tokenizer, tailored for Arabic script's morphology. Fine-tuning ArabianGPT-0.1B achieved 95% accuracy in sentiment analysis, up from 56% in the base model, and improved F1 scores in summarization. Why it matters: The models address the gap in native Arabic LLMs, offering better performance on Arabic NLP tasks through tailored architecture and tokenization.
This research evaluates LLMs such as ChatGPT, Llama, Aya, Jais, and AceGPT on Arabic automated essay scoring (AES) using the AR-AES dataset. The study compares zero-shot prompting, few-shot prompting, and fine-tuning, combined with a mixed-language prompting strategy. AceGPT performed best among the LLMs with a quadratic weighted kappa (QWK) of 0.67, while a smaller BERT model achieved 0.88. Why it matters: The study highlights challenges faced by LLMs in processing Arabic and provides insights into improving LLM performance in Arabic NLP tasks.
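For readers unfamiliar with QWK (quadratic weighted kappa), the agreement metric reported above, the following is a minimal pure-Python sketch of the standard formula. It assumes integer scores on a known scale; the function name and the sample scores are illustrative and are not drawn from the AR-AES dataset or the study's code.

```python
from typing import Sequence


def quadratic_weighted_kappa(
    rater_a: Sequence[int], rater_b: Sequence[int], min_rating: int, max_rating: int
) -> float:
    """Agreement between two sets of integer scores, penalizing large gaps quadratically."""
    n = max_rating - min_rating + 1
    if n < 2:
        raise ValueError("the rating scale needs at least two levels")
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")

    # Observed confusion matrix: rows are rater A's scores, columns are rater B's.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    total = len(rater_a)
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2
            expected = hist_a[i] * hist_b[j] / total  # agreement expected by chance
            numerator += weight * observed[i][j]
            denominator += weight * expected

    # Convention chosen for this sketch: if both raters gave one identical score
    # throughout, treat it as perfect agreement instead of dividing by zero.
    if denominator == 0:
        return 1.0
    return 1.0 - numerator / denominator


# Illustrative scores only: a human grader versus a model on three essays.
human_scores = [1, 2, 3]
model_scores = [1, 2, 2]
kappa = quadratic_weighted_kappa(human_scores, model_scores, min_rating=1, max_rating=3)
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement, so the gap between 0.67 and 0.88 reflects noticeably closer alignment with human graders for the BERT model.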
Researchers introduce AraDiCE, a benchmark for Arabic Dialect and Cultural Evaluation, comprising seven synthetic datasets in various dialects and Modern Standard Arabic (MSA). The benchmark includes approximately 45,000 post-edited samples and evaluates LLMs on dialect comprehension, generation, and cultural awareness across the Gulf, Egypt, and Levant. Results show that Arabic-specific models like Jais and AceGPT outperform multilingual models on dialectal tasks, but challenges remain in dialect identification, generation, and translation. Why it matters: This benchmark and associated datasets will help improve LLMs' ability to understand and generate diverse Arabic dialects and cultural contexts, addressing a significant gap in current models.
The paper introduces AraGPT2, a suite of pre-trained transformer models for Arabic language generation, with the largest model (AraGPT2-mega) containing 1.46 billion parameters. Trained on a large Arabic corpus of internet text and news, AraGPT2-mega demonstrates strong performance in synthetic news generation and zero-shot question answering. To address the risk of misuse, the authors also released a discriminator model with 98% accuracy in detecting AI-generated text. Why it matters: This release of both the model and discriminator fills a critical gap in Arabic NLP and encourages further research and applications in the field.