Arabic Mini-ClimateGPT: A Climate Change and Sustainability Tailored Arabic LLM

arXiv · December 14, 2023 · Significant research

Summary

Researchers introduce Arabic Mini-ClimateGPT, an Arabic LLM tailored to climate change and sustainability. The model is fine-tuned on Clima500-Instruct, an Arabic instruction-tuning dataset, and uses vector-embedding-based retrieval during inference. Evaluations show that the model outperforms baseline LLMs, and human experts preferred its responses in 81.6% of cases.
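
To illustrate what embedding-based retrieval at inference time generally looks like, here is a minimal sketch. It is not the authors' implementation: this summary does not name the embedding model or vector store, so the sketch substitutes toy term-frequency vectors for learned embeddings. The names `embed`, `cosine`, `retrieve`, `build_prompt`, and the sample `PASSAGES` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical knowledge snippets; a real system would index a document corpus.
PASSAGES = [
    "Solar power capacity in the Gulf region is expanding.",
    "Rising sea levels threaten coastal cities.",
    "Desalination plants consume large amounts of energy.",
]


def embed(text):
    """Toy embedding (assumption): a term-frequency vector over lowercase tokens.

    A real system would call a learned embedding model here instead.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, passages, k=1):
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(query, passages, k=1):
    """Prepend the retrieved passages to the question before calling the LLM."""
    context = "\n".join(retrieve(query, passages, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Calling `build_prompt("rising sea levels", PASSAGES)` would place the sea-level passage ahead of the question, so the model answers with that context in view. Swapping `embed` for a learned multilingual embedding model is the step that would make this usable for Arabic queries.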

Keywords

LLM · Arabic · Climate Change · Sustainability · Fine-tuning

AceGPT, Localizing Large Language Models in Arabic

arXiv · Sep 21

Researchers introduce AceGPT, a localized large language model (LLM) specifically for Arabic, addressing cultural sensitivity and local values not well-represented in mainstream models. AceGPT incorporates further pre-training with Arabic texts, supervised fine-tuning using native Arabic instructions and GPT-4 responses, and reinforcement learning with AI feedback using a reward model attuned to local culture. Evaluations demonstrate that AceGPT achieves state-of-the-art performance among open Arabic LLMs across several benchmarks. Why it matters: This work advances culturally-aware AI development for Arabic-speaking communities, providing a valuable resource and benchmark for future research.

Evaluating Arabic Large Language Models: A Survey of Benchmarks, Methods, and Gaps

arXiv · Oct 15

This survey paper analyzes over 40 benchmarks used to evaluate Arabic large language models, categorizing them into Knowledge, NLP Tasks, Culture and Dialects, and Target-Specific evaluations. It identifies progress in benchmark diversity but also highlights gaps like limited temporal evaluation and cultural misalignment. The paper also examines methods for creating benchmarks, including native collection, translation, and synthetic generation. Why it matters: The survey provides a comprehensive reference for Arabic NLP research and offers recommendations for future benchmark development to better align with cultural contexts.

ArabianGPT: Native Arabic GPT-based Large Language Model

arXiv · Feb 23

The paper introduces ArabianGPT, a suite of transformer-based language models designed specifically for Arabic, including versions with 0.1B and 0.3B parameters. A key component is the AraNizer tokenizer, tailored to the morphology of Arabic script. Fine-tuning raised ArabianGPT-0.1B's sentiment-analysis accuracy to 95%, up from 56% for the base model, and improved F1 scores in summarization. Why it matters: The models address the gap in native Arabic LLMs, offering better performance on Arabic NLP tasks through tailored architecture and tokenization.