Researchers introduce AceGPT, a localized large language model (LLM) built specifically for Arabic, addressing culturally sensitive content and local values that are not well represented in mainstream models. AceGPT combines further pre-training on Arabic texts, supervised fine-tuning on native Arabic instructions with GPT-4 responses, and reinforcement learning from AI feedback using a reward model attuned to local culture. Evaluations demonstrate that AceGPT achieves state-of-the-art performance among open Arabic LLMs across several benchmarks. Why it matters: This work advances culturally-aware AI development for Arabic-speaking communities, providing a valuable resource and benchmark for future research.
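To make the RLAIF step concrete, here is a minimal sketch of how preference data for reward-model training might be assembled: candidate responses are scored and the best/worst pair becomes a training example. The `reward_fn` below is a stub (response length) standing in for AceGPT's actual culture-attuned reward model, and all names are illustrative.

```python
def build_preference_pair(prompt, candidates, reward_fn):
    """Score candidates with a reward model and return a (chosen, rejected) pair."""
    scored = sorted(candidates, key=reward_fn, reverse=True)
    return {"prompt": prompt, "chosen": scored[0], "rejected": scored[-1]}

# Stub reward: prefer longer responses (a stand-in for a trained reward model).
stub_reward = len

pair = build_preference_pair(
    "ما هي عاصمة المغرب؟",  # "What is the capital of Morocco?"
    ["الرباط.", "عاصمة المغرب هي الرباط."],
    stub_reward,
)
```

In a real pipeline the pairs would then feed a preference-optimization objective; the point here is only the data shape.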
The Open Arabic LLM Leaderboard (OALL) has been launched to benchmark Arabic language models, addressing the gap in resources for non-English NLP. It incorporates datasets such as AlGhafa, ACVA, and translated versions of MMLU and EXAMS from the AceGPT suite. The leaderboard is built on Hugging Face's LightEval framework and scores tasks with normalized log-likelihood accuracy. Why it matters: This initiative promotes research and development in Arabic NLP, serving over 380 million Arabic speakers by enhancing the evaluation and improvement of Arabic LLMs.
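As a rough illustration of the metric (not OALL's actual code), normalized log-likelihood accuracy for multiple-choice tasks can be sketched as: the model's total log-probability for each answer option is divided by the option's length, and the example counts as correct if the gold option scores highest.

```python
def norm_loglikelihood_accuracy(examples):
    """Fraction of examples where the gold choice has the highest
    length-normalized log-likelihood (log-prob divided by answer length)."""
    correct = 0
    for ex in examples:
        # ex["choices"]: list of (answer_text, total_logprob) pairs from a model
        scores = [lp / len(text) for text, lp in ex["choices"]]
        if scores.index(max(scores)) == ex["gold"]:
            correct += 1
    return correct / len(examples)

# Toy example with made-up log-probs: the longer choice wins after normalization.
examples = [{"choices": [("yes", -6.0), ("indeed", -9.0)], "gold": 1}]
acc = norm_loglikelihood_accuracy(examples)
```

Length normalization prevents shorter answers from being favored simply because they accumulate fewer log-probability terms.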
The paper introduces AraGPT2, a suite of pre-trained transformer models for Arabic language generation, with the largest model (AraGPT2-mega) containing 1.46 billion parameters. Trained on a large Arabic corpus of internet text and news, AraGPT2-mega demonstrates strong performance in synthetic news generation and zero-shot question answering. To address the risk of misuse, the authors also released a discriminator model with 98% accuracy in detecting AI-generated text. Why it matters: This release of both the model and discriminator fills a critical gap in Arabic NLP and encourages further research and applications in the field.
This research explores the use of generative AI, specifically ChatGPT, to create student assessments that align with academic accreditation standards, such as those of the National Center for Academic Accreditation in Saudi Arabia and ABET. The study introduces a method for mapping verbs used in questions to educational outcomes, enabling AI to produce and validate accreditation-compliant questions. A survey of faculty members in Saudi universities showed high acceptance rates for AI-generated exam questions and AI assistance in editing existing questions.
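The verb-to-outcome mapping idea can be sketched as a lookup from question stems to intended learning-outcome levels (Bloom-style). The verb lists and level names below are illustrative assumptions, not the paper's actual mapping.

```python
# Hypothetical verb-to-outcome table; real accreditation mappings are richer.
VERB_TO_OUTCOME = {
    "define": "knowledge", "list": "knowledge",
    "explain": "comprehension", "summarize": "comprehension",
    "apply": "application", "solve": "application",
    "compare": "analysis", "evaluate": "evaluation",
}

def classify_question(question):
    """Return the outcome level of the first mapped verb in the question, or None."""
    for word in question.lower().split():
        level = VERB_TO_OUTCOME.get(word.strip("?,."))
        if level:
            return level
    return None
```

A generator could then be constrained (or a validator could check) that each produced question's leading verb matches the outcome level the assessment is supposed to cover.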
The paper introduces ArabianGPT, a suite of transformer-based language models designed specifically for Arabic, with 0.1B- and 0.3B-parameter versions. A key component is the AraNizer tokenizer, tailored to the morphology of Arabic script. Fine-tuning ArabianGPT-0.1B raised sentiment-analysis accuracy from 56% in the base model to 95%, and improved F1 scores in summarization. Why it matters: The models address the gap in native Arabic LLMs, offering better performance on Arabic NLP tasks through tailored architecture and tokenization.
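To illustrate why morphology-aware tokenization matters, here is a toy greedy longest-match subword segmenter with a two-entry hand-made vocabulary; AraNizer's actual vocabulary and algorithm are not reproduced here.

```python
def greedy_segment(word, vocab):
    """Greedy longest-match segmentation of a word into known subwords."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # unknown character falls back to itself
            i += 1
    return pieces

# Toy vocab: the definite article "ال" and the stem "كتاب" ("book") are
# separate subwords, so "الكتاب" ("the book") splits along a morpheme boundary.
vocab = {"ال", "كتاب"}
pieces = greedy_segment("الكتاب", vocab)
```

A tokenizer whose vocabulary aligns with Arabic affixes and stems in this way yields shorter, more meaningful token sequences than one trained mainly on English text.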
MBZUAI researchers introduce VideoGPT+, a novel video Large Multimodal Model (LMM) that integrates image and video encoders to leverage both spatial and temporal information in videos. They also introduce VCGBench-Diverse, a comprehensive benchmark for evaluating video LMMs across 18 video categories. VideoGPT+ demonstrates improved performance on multiple video benchmarks, including VCGBench and MVBench.
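Conceptually, combining the two encoders might look like the sketch below: per-frame (spatial) features from an image encoder are pooled and joined with a clip-level (temporal) feature from a video encoder. This is an illustrative simplification, not VideoGPT+'s actual fusion module.

```python
def fuse_features(frame_feats, video_feat):
    """Mean-pool per-frame spatial features, then concatenate the pooled
    vector with the clip-level temporal feature vector."""
    dim = len(frame_feats[0])
    pooled = [sum(f[d] for f in frame_feats) / len(frame_feats) for d in range(dim)]
    return pooled + video_feat

# Two 2-dim frame features fused with a 1-dim clip feature.
fused = fuse_features([[1.0, 2.0], [3.0, 4.0]], [0.5])
```

The design intuition is that image encoders capture fine spatial detail per frame while video encoders capture motion across frames, and a joint representation gives the language model access to both.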
InfiAgent is a new agent framework, comparable to GPT4-Agent and developed by replicating Codex. It includes InfiCoder, an open-source model for text-to-code, code-to-code, and freeform code-related QA tasks. The framework focuses on data analysis, integrating an LLM that has programming capabilities with a sandbox environment for executing Python code. Why it matters: This research demonstrates the potential for advancements in AI operating systems and highlights areas where current models like GPT-4V can be improved, contributing to the broader development of more capable and versatile AI agents.
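A minimal sketch of the sandbox idea (not InfiAgent's implementation) is to run model-generated Python in a separate process with a timeout and captured output; a production sandbox would also restrict filesystem and network access.

```python
import subprocess
import sys

def run_in_sandbox(code, timeout=5):
    """Execute untrusted Python in a child process, returning (stdout, returncode).

    A timeout kills runaway code; a timed-out run is reported as returncode -1.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
        return result.stdout, result.returncode
    except subprocess.TimeoutExpired:
        return "", -1

out, rc = run_in_sandbox("print(1 + 1)")
```

Process isolation with a timeout is the standard first layer for agent frameworks that must execute generated code safely.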