This paper introduces Saudi-Dialect-ALLaM, a LoRA fine-tuned version of the Saudi Arabian foundation model ALLaM-7B-Instruct-preview, designed to improve generation in Saudi dialects (Najdi and Hijazi). The model is trained on a private dataset of 5,466 synthetic instruction-response pairs, and two training variants are explored: Dialect-Token and No-Token. Results indicate that the Dialect-Token model achieves better dialect control and fidelity than generic instruction models. Neither the dataset nor the model weights are released.
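The summary does not say how the two variants format their training data. One common way to build such a pair of variants is sketched below, assuming the Dialect-Token variant prepends a dialect control token to each instruction and the No-Token variant leaves the instruction unchanged. The token strings `<najdi>` and `<hijazi>`, the `Example` class, and `format_example` are hypothetical illustrations, not the paper's actual code.

```python
from dataclasses import dataclass

# Hypothetical control tokens; the paper's actual token strings are not stated.
DIALECT_TOKENS = {"najdi": "<najdi>", "hijazi": "<hijazi>"}


@dataclass
class Example:
    instruction: str
    response: str
    dialect: str  # "najdi" or "hijazi"


def format_example(ex: Example, use_dialect_token: bool) -> dict:
    """Build a prompt/completion pair for supervised fine-tuning.

    With use_dialect_token=True (assumed Dialect-Token variant), the target
    dialect is signalled explicitly at the start of the prompt; otherwise
    (assumed No-Token variant) the model must infer it from the data alone.
    """
    prompt = ex.instruction
    if use_dialect_token:
        prompt = f"{DIALECT_TOKENS[ex.dialect]} {prompt}"
    return {"prompt": prompt, "completion": ex.response}


if __name__ == "__main__":
    ex = Example("Write a short greeting", "placeholder reply", "najdi")
    print(format_example(ex, use_dialect_token=True))
    print(format_example(ex, use_dialect_token=False))
```

At inference time, under this assumption, the same token would be placed in front of a user prompt to request a specific dialect, which is what gives the token variant its extra controllability.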
The paper introduces Aladdin-FTI, a system for generating and translating dialectal Arabic (DA). It supports text generation in Moroccan, Egyptian, Palestinian, Syrian, and Saudi dialects, as well as bidirectional translation between these dialects, Modern Standard Arabic (MSA), and English. Why it matters: This work helps address the under-representation of Arabic dialects in NLP research and supports more inclusive Arabic language models.
Researchers introduce AraDiCE, a benchmark for Arabic Dialect and Cultural Evaluation, comprising seven synthetic datasets in various dialects and Modern Standard Arabic (MSA). The benchmark includes approximately 45,000 post-edited samples and evaluates LLMs on dialect comprehension, dialect generation, and cultural awareness across the Gulf, Egypt, and the Levant. Results show that Arabic-specific models such as Jais and AceGPT outperform multilingual models on dialectal tasks, but challenges remain in dialect identification, generation, and translation. Why it matters: The benchmark and its datasets can help improve LLMs' ability to understand and generate diverse Arabic dialects and handle cultural context, a significant gap in current models.
This paper explores Dialectal Arabic (DA) to Modern Standard Arabic (MSA) machine translation using prompting and fine-tuning techniques for Levantine, Egyptian, and Gulf dialects. The study found that few-shot prompting outperformed zero-shot and chain-of-thought methods across six large language models, with GPT-4o achieving the highest performance. A quantized Gemma2-9B model achieved a chrF++ score of 49.88, outperforming zero-shot GPT-4o (44.58). Why it matters: The research provides a resource-efficient pipeline for DA-MSA translation, enabling more inclusive language technologies by addressing the challenges posed by dialectal variations in Arabic.
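The chrF++ scores above (49.88 vs. 44.58) compare character and word n-gram overlap between system output and reference MSA translations. The sketch below is a simplified sentence-level version of that metric, assuming the usual chrF++ settings: character n-grams of order 1 to 6 (whitespace removed), word n-grams of order 1 to 2, precision and recall averaged across orders, then combined as an F-score with beta = 2. It does not reproduce sacrebleu's tokenization or corpus-level aggregation, so its numbers will not match published scores exactly; for real evaluation, sacrebleu's `CHRF(word_order=2)` is the standard implementation.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    # chrF ignores whitespace when extracting character n-grams
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def word_ngrams(text: str, n: int) -> Counter:
    words = text.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def chrf_pp(hypothesis: str, reference: str, char_order: int = 6,
            word_order: int = 2, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF++ on a 0-100 scale."""
    extractors = [(char_ngrams, n) for n in range(1, char_order + 1)]
    extractors += [(word_ngrams, n) for n in range(1, word_order + 1)]
    precisions, recalls = [], []
    for extract, n in extractors:
        hyp, ref = extract(hypothesis, n), extract(reference, n)
        hyp_total, ref_total = sum(hyp.values()), sum(ref.values())
        if hyp_total == 0 and ref_total == 0:
            continue  # order longer than both texts: skip it
        matches = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(matches / hyp_total if hyp_total else 0.0)
        recalls.append(matches / ref_total if ref_total else 0.0)
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2  # beta = 2 weights recall twice as much as precision
    return 100 * (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    print(chrf_pp("the cat sat", "the cat sat"))   # identical strings
    print(chrf_pp("the cat sat", "the cat sits"))  # partial overlap
```

Because the metric works on character n-grams, partially correct Arabic word forms (for example, a correct stem with a dialectal affix) still earn credit, which is one reason chrF++ is often preferred over BLEU for morphologically rich languages.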
This paper critically examines common assumptions about Arabic dialects in NLP. The authors analyze a multi-label dataset in which native speakers assessed sentences across 11 country-level dialects. The analysis shows that widely held assumptions about how dialects group together and differ from one another are oversimplified and not always accurate. Why it matters: The findings suggest that current approaches to Arabic NLP tasks such as dialect identification may be limited by these assumptions, hindering further progress in the field.
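A multi-label dataset lets one test grouping assumptions directly: if a sentence can be valid in several dialects at once, single-label dialect identification is ill-posed for it, and the overlap between two dialects can be measured rather than assumed. The sketch below computes, on a hypothetical toy dataset (not the paper's data), the share of sentences with more than one valid dialect and the pairwise Jaccard overlap between dialects. The function names `multi_label_share` and `pairwise_jaccard` are illustrative, not taken from the paper.

```python
from itertools import combinations


def multi_label_share(labels: list[set[str]]) -> float:
    """Fraction of sentences judged valid in more than one dialect."""
    return sum(1 for s in labels if len(s) > 1) / len(labels) if labels else 0.0


def pairwise_jaccard(labels: list[set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard overlap between the sentence sets of every pair of dialects."""
    dialects = sorted(set().union(*labels))
    result = {}
    for a, b in combinations(dialects, 2):
        both = sum(1 for s in labels if a in s and b in s)
        either = sum(1 for s in labels if a in s or b in s)
        result[(a, b)] = both / either if either else 0.0
    return result


if __name__ == "__main__":
    # Hypothetical annotations: each set lists the dialects in which
    # native speakers judged one sentence to be valid.
    toy = [
        {"Egypt"},
        {"Egypt", "Sudan"},
        {"Jordan", "Palestine", "Syria"},
        {"Jordan", "Palestine"},
    ]
    print(multi_label_share(toy))
    for pair, score in pairwise_jaccard(toy).items():
        print(pair, round(score, 2))
```

High overlap between two dialects that are usually assigned to different groups, or low overlap within an assumed group, would be the kind of evidence that challenges a fixed regional grouping.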