GCC AI Research

Results for "Cross-dialect Translation"

Aladdin-FTI @ AMIYA Three Wishes for Arabic NLP: Fidelity, Diglossia, and Multidialectal Generation

arXiv ·

The paper introduces Aladdin-FTI, a system designed for generating and translating dialectal Arabic (DA). Aladdin-FTI supports text generation in Moroccan, Egyptian, Palestinian, Syrian, and Saudi dialects. It also handles bidirectional translation between these dialects, Modern Standard Arabic (MSA), and English. Why it matters: This work contributes to addressing the under-representation of Arabic dialects in NLP research and enables more inclusive Arabic language models.

Advancing Dialectal Arabic to Modern Standard Arabic Machine Translation

arXiv ·

This paper explores Dialectal Arabic (DA) to Modern Standard Arabic (MSA) machine translation using prompting and fine-tuning techniques for Levantine, Egyptian, and Gulf dialects. The study found that few-shot prompting outperformed zero-shot and chain-of-thought methods across six large language models, with GPT-4o achieving the highest performance. A quantized Gemma2-9B model achieved a chrF++ score of 49.88, outperforming zero-shot GPT-4o (44.58). Why it matters: The research provides a resource-efficient pipeline for DA-MSA translation, enabling more inclusive language technologies by addressing the challenges posed by dialectal variations in Arabic.
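The difference between zero-shot and few-shot prompting can be illustrated with a minimal sketch. This is not the paper's prompt template: the function name `build_prompt`, the instruction wording, and the placeholder sentences are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def build_prompt(source: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Assemble a DA-to-MSA translation prompt (hypothetical template).

    With no examples the prompt is zero-shot; each (dialect, msa) pair
    added to `examples` turns it into a few-shot prompt.
    """
    lines = [
        "Translate the following Dialectal Arabic sentence "
        "into Modern Standard Arabic."
    ]
    # Demonstrations come first so the model can imitate the pattern.
    for dialect_text, msa_text in examples:
        lines.append(f"Dialect: {dialect_text}")
        lines.append(f"MSA: {msa_text}")
    # The sentence to translate comes last, with the answer slot left open.
    lines.append(f"Dialect: {source}")
    lines.append("MSA:")
    return "\n".join(lines)


# Placeholder strings stand in for real sentence pairs.
zero_shot = build_prompt("<dialect sentence>")
few_shot = build_prompt(
    "<dialect sentence>",
    examples=[
        ("<dialect example 1>", "<MSA example 1>"),
        ("<dialect example 2>", "<MSA example 2>"),
    ],
)
```

The resulting string would then be sent to whichever model is being evaluated; the model call itself is omitted because it depends on the provider.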

From FusHa to Folk: Exploring Cross-Lingual Transfer in Arabic Language Models

arXiv ·

This paper explores cross-lingual transfer in Arabic language models, which are typically pretrained on Modern Standard Arabic (MSA) but expected to generalize to diverse dialects. The study uses probing on three NLP tasks and representational similarity analysis to assess how well transfer works. Results show that transfer is uneven across dialects and partially linked to geographic proximity, and that models trained on all dialects exhibit negative interference. Why it matters: The findings highlight challenges in cross-lingual transfer for Arabic NLP and raise questions about how dialect similarity should inform model training.

NADI 2023: The Fourth Nuanced Arabic Dialect Identification Shared Task

arXiv ·

The fourth Nuanced Arabic Dialect Identification Shared Task (NADI 2023) aimed to advance Arabic NLP through subtasks on dialect identification and dialect-to-MSA machine translation. Of the 58 teams that registered, 18 participated across three subtasks: one on dialect identification and two on machine translation. The winning teams achieved 87.27 F1 in dialect identification, and 14.76 BLEU and 21.10 BLEU in the two translation subtasks. Why it matters: NADI provides valuable benchmarks and datasets for Arabic dialect processing, encouraging further research in this challenging area.

NADI 2024: The Fifth Nuanced Arabic Dialect Identification Shared Task

arXiv ·

The fifth Nuanced Arabic Dialect Identification Shared Task (NADI 2024) aimed to advance Arabic NLP through dialect identification and dialect-to-MSA machine translation. Of the 51 teams that registered, 12 participated, making 76 valid submissions across three subtasks. The winning teams achieved 50.57 F1 for multi-label dialect identification, 0.1403 RMSE for dialectness level identification, and 20.44 BLEU for dialect-to-MSA translation. Why it matters: The results highlight the continued challenges in Arabic dialect processing and provide a benchmark for future research in this area.
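For readers unfamiliar with the RMSE figure reported for dialectness level identification, the metric can be sketched as follows. This is the standard root-mean-square error definition, not the shared task's official scorer; the function name `rmse` and the sample values are assumptions for illustration only.

```python
import math
from typing import Sequence


def rmse(gold: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between gold and predicted scores (lower is better)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one score is required")
    # Average the squared differences, then take the square root.
    squared_errors = [(g - p) ** 2 for g, p in zip(gold, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up scores, not data from the shared task.
example_error = rmse([0.0, 0.5, 1.0], [0.1, 0.5, 0.8])
```

Because errors are squared before averaging, a few large misses raise RMSE more than many small ones.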