Skip to content
GCC AI Research

Advancing Dialectal Arabic to Modern Standard Arabic Machine Translation

arXiv · · Significant research

Summary

This paper explores Dialectal Arabic (DA) to Modern Standard Arabic (MSA) machine translation using prompting and fine-tuning techniques for Levantine, Egyptian, and Gulf dialects. The study found that few-shot prompting outperformed zero-shot and chain-of-thought methods across six large language models, with GPT-4o achieving the highest performance. A quantized Gemma2-9B model achieved a chrF++ score of 49.88, outperforming zero-shot GPT-4o (44.58). Why it matters: The research provides a resource-efficient pipeline for DA-MSA translation, enabling more inclusive language technologies by addressing the challenges posed by dialectal variations in Arabic.

Get the weekly digest

Top AI stories from the GCC region, every week.

Related

NADI 2024: The Fifth Nuanced Arabic Dialect Identification Shared Task

arXiv ·

The fifth Nuanced Arabic Dialect Identification (NADI) 2024 shared task aimed to advance Arabic NLP through dialect identification and dialect-to-MSA machine translation. 51 teams registered, with 12 participating and submitting 76 valid submissions across three subtasks. The winning teams achieved 50.57 F1 for multi-label dialect identification, 0.1403 RMSE for dialectness level identification, and 20.44 BLEU for dialect-to-MSA translation. Why it matters: The results highlight the continued challenges in Arabic dialect processing and provide a benchmark for future research in this area.