MBZUAI has released Jais and Jais-chat, two new open generative large language models (LLMs) with a focus on Arabic. The 13-billion-parameter models are based on the GPT-3 architecture and pretrained on Arabic, English, and code. Evaluation shows state-of-the-art Arabic knowledge and reasoning, with competitive English performance.
This paper introduces a predictive analysis of Arabic court decisions, using 10,813 real commercial court cases. The study evaluates LLaMA-7b, JAIS-13b, and GPT-3.5-turbo in zero-shot, one-shot, and fine-tuned settings, and also experiments with summarization and translation. GPT-3.5-based models significantly outperformed the others, exceeding the JAIS model's performance by 50%. The study also found most automated metrics to be unreliable. Why it matters: This research bridges computational linguistics and Arabic legal analytics, offering insights for enhancing judicial processes and legal strategies in the Arabic-speaking world.
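The zero-shot and one-shot settings mentioned above differ only in whether the prompt includes a worked example. The following is a minimal sketch of that difference, not the paper's actual prompts: the instruction text, the `build_prompt` helper, and the sample cases are all hypothetical, and they are written in English here purely for readability.

```python
from typing import Optional, Tuple

# Hypothetical instruction; the study's real prompt wording is not given in this summary.
INSTRUCTION = "Predict the court's decision for the following commercial case."


def build_prompt(case_facts: str, example: Optional[Tuple[str, str]] = None) -> str:
    """Build a zero-shot prompt, or a one-shot prompt when `example` is given.

    `example` is a (facts, decision) pair shown to the model before the target case.
    """
    parts = [INSTRUCTION]
    if example is not None:
        ex_facts, ex_decision = example
        # One-shot: include a single solved case as a demonstration.
        parts.append(f"Case: {ex_facts}\nDecision: {ex_decision}")
    # The target case always comes last, with the decision left blank for the model.
    parts.append(f"Case: {case_facts}\nDecision:")
    return "\n\n".join(parts)


# Illustrative, made-up inputs.
zero_shot = build_prompt("Supplier claims unpaid invoices.")
one_shot = build_prompt(
    "Supplier claims unpaid invoices.",
    example=("Tenant withheld rent.", "Claim upheld."),
)
```

In the fine-tuned setting, by contrast, the labeled cases are used to update the model's weights rather than being placed in the prompt.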
Researchers from MBZUAI have released MobiLlama, a fully transparent open-source 0.5-billion-parameter Small Language Model (SLM). MobiLlama is designed for resource-constrained devices, emphasizing enhanced performance with reduced resource demands. The full training data pipeline, code, model weights, and checkpoints are available on GitHub.
This paper evaluates the performance of GPT-3.5 and GPT-4 on seven Arabic NLP tasks, including sentiment analysis, translation, and diacritization. GPT-4 outperforms GPT-3.5 on most tasks. The study also examines the sentiment analysis task specifically and introduces Taqyim, a Python interface for evaluating Arabic NLP tasks. Why it matters: Evaluating LLMs on Arabic NLP tasks helps identify their strengths and weaknesses, guiding future research and development in the field.
Hamad Bin Khalifa University (HBKU) has released Fanar 2.0, the second generation of Qatar's Arabic-centric Generative AI platform, built entirely at QCRI. The core of Fanar 2.0 is Fanar-27B, which was continually pre-trained from a Gemma-3-27B backbone on 120 billion high-quality tokens using only 256 NVIDIA H100 GPUs. Fanar 2.0 includes components such as FanarGuard, Aura, Oryx, Fanar-Sadiq, Fanar-Diwan, and FanarShaheen for moderation, speech recognition, vision understanding, Islamic content, poetry generation, and translation, respectively. Why it matters: This shows that sovereign, resource-constrained AI development for the Arabic language is possible and can produce competitive systems in the region.
This paper presents a comprehensive evaluation of ChatGPT's performance across 44 Arabic NLP tasks using over 60 datasets. The study compares ChatGPT's capabilities in Modern Standard Arabic (MSA) and Dialectal Arabic (DA) against smaller, fine-tuned models. Results show that ChatGPT is outperformed by the smaller, fine-tuned models and handles Arabic dialects less well than MSA. Why it matters: The work highlights the need for further research and development of Arabic-specific NLP models to overcome the limitations of general-purpose models like ChatGPT.
A new content improvement system has been developed to address issues of randomness and incorrectness in text generated by deep learning models like GPT-3. The system uses text mining to identify correct sentences and employs syntactic/semantic generalization to substitute problematic elements. The system can substantially improve the factual correctness and meaningfulness of raw content. Why it matters: Improving the quality of automatically generated content is crucial for ensuring reliability and trustworthiness across various AI applications.