The paper introduces Aladdin-FTI, a system for generating and translating dialectal Arabic (DA). Aladdin-FTI supports text generation in the Moroccan, Egyptian, Palestinian, Syrian, and Saudi dialects. It also handles bidirectional translation between these dialects, Modern Standard Arabic (MSA), and English. Why it matters: This work helps address the under-representation of Arabic dialects in NLP research and supports more inclusive Arabic language models.
The paper introduces ALLaM, a series of large language models for Arabic and English, designed to support Arabic Language Technologies. The models are trained with language alignment and knowledge transfer in mind, using a decoder-only architecture. ALLaM achieves state-of-the-art results on Arabic benchmarks like MMLU Arabic and Arabic Exams. Why it matters: This work advances Arabic NLP by providing high-performing LLMs and demonstrating effective techniques for cross-lingual transfer learning and alignment with human preferences.
The fifth Nuanced Arabic Dialect Identification shared task (NADI 2024) aimed to advance Arabic NLP through dialect identification and dialect-to-MSA machine translation. A total of 51 teams registered, of which 12 participated, making 76 valid submissions across three subtasks. The winning teams achieved 50.57 F1 for multi-label dialect identification, 0.1403 RMSE for dialectness level identification, and 20.44 BLEU for dialect-to-MSA translation. Why it matters: The results highlight the continued challenges in Arabic dialect processing and provide a benchmark for future research in this area.
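For readers unfamiliar with the RMSE figure reported for dialectness level identification, the following is a minimal sketch of how root-mean-square error is computed between predicted and gold scores. The function name `rmse`, the sample values, and the 0-to-1 score range are illustrative assumptions; they are not taken from the shared task's data or its official scorer.

```python
import math


def rmse(predicted, gold):
    """Root-mean-square error between two equal-length lists of scores."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - g) ** 2 for p, g in zip(predicted, gold)]
    return math.sqrt(sum(squared_errors) / len(gold))


# Hypothetical dialectness scores, assumed here to lie between 0 and 1.
gold = [0.0, 0.5, 1.0, 0.25]
pred = [0.1, 0.4, 0.9, 0.35]

print(round(rmse(pred, gold), 4))  # every prediction is off by 0.1, so this prints 0.1
```

Lower is better for RMSE, whereas higher is better for F1 and BLEU, so the three winning scores are not directly comparable with one another.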
The third Nuanced Arabic Dialect Identification Shared Task (NADI 2022) focused on advancing Arabic NLP through dialect identification and sentiment analysis at the country level. A total of 21 teams participated, with the winning team achieving an F1 score of 27.06 on dialect identification and 75.16 on sentiment analysis. The task highlights the challenges in Arabic dialect processing and motivates further research. Why it matters: Standardized evaluations like NADI are crucial for benchmarking progress and fostering innovation in Arabic NLP, especially for dialectal variation.
The fourth Nuanced Arabic Dialect Identification Shared Task (NADI 2023) aimed to advance Arabic NLP through dialect identification and dialect-to-MSA machine translation. A total of 58 teams registered, of which 18 participated across three subtasks: dialect identification, dialect-to-MSA translation, and a second translation subtask. The winning teams achieved 87.27 F1 in dialect identification, 14.76 BLEU in one translation subtask, and 21.10 BLEU in the other. Why it matters: NADI provides valuable benchmarks and datasets for Arabic dialect processing, encouraging further research in this challenging area.