Skip to content
GCC AI Research

Search

Results for "Arabic-English"

ALLaM: Large Language Models for Arabic and English

arXiv ·

The paper introduces ALLaM, a series of large language models for Arabic and English, designed to support Arabic Language Technologies. The models are trained with language alignment and knowledge transfer in mind, using a decoder-only architecture. ALLaM achieves state-of-the-art results on Arabic benchmarks like MMLU Arabic and Arabic Exams. Why it matters: This work advances Arabic NLP by providing high-performing LLMs and demonstrating effective techniques for cross-lingual transfer learning and alignment with human preferences.

Arabic Stable LM: Adapting Stable LM 2 1.6B to Arabic

arXiv ·

The paper introduces Arabic Stable LM, a 1.6B parameter Arabic-centric language model, in both base and chat versions. The Arabic Stable LM 1.6B chat model achieves strong results on several benchmarks, outperforming models with up to 8x more parameters. The study also demonstrates the benefit of incorporating synthetic instruction tuning data through a large synthetic dialogue dataset. Why it matters: This work makes Arabic LLMs more accessible by reducing the parameter size while maintaining strong performance, facilitating deployment in resource-constrained environments.

Challenges and Solutions in Developing Code-switched Arabic-English NLP Systems

MBZUAI ·

Injy Hamed from NYU Abu Dhabi's CAMeL Lab presented work on Egyptian Arabic-English code-switching for ASR and MT. She discussed the ArzEn-ST speech translation corpus and compared end-to-end and hybrid systems for ASR. For MT, she presented data augmentation and word segmentation techniques to handle data scarcity, also addressing ASR evaluation challenges in code-switching. Why it matters: Research into code-switching is crucial for building NLP systems capable of processing real-world language use in the Arab world.

Aladdin-FTI @ AMIYA Three Wishes for Arabic NLP: Fidelity, Diglossia, and Multidialectal Generation

arXiv ·

The paper introduces Aladdin-FTI, a system designed for generating and translating dialectal Arabic (DA). Aladdin-FTI supports text generation in Moroccan, Egyptian, Palestinian, Syrian, and Saudi dialects. It also handles bidirectional translation between these dialects, Modern Standard Arabic (MSA), and English. Why it matters: This work contributes to addressing the under-representation of Arabic dialects in NLP research and enables more inclusive Arabic language models.

Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale

arXiv ·

The Hala technical report introduces a family of Arabic-centric instruction and translation models developed using a translate-and-tune pipeline. A strong Arabic-English teacher model is compressed to FP8 and used to create bilingual supervision data. The LFM2-1.2B model is fine-tuned on this data and used to translate English instruction sets into Arabic, creating a million-scale corpus. Why it matters: The release of models, data, evaluation tools, and recipes will accelerate research and development in Arabic NLP, providing valuable resources for the community.

QCRI Machine Translation Systems for IWSLT 16

arXiv ·

This paper describes QCRI's machine translation systems for the IWSLT 2016 evaluation campaign, focusing on Arabic-English and English-Arabic tracks. They built both Phrase-based and Neural machine translation models. A Neural MT system, trained by stacking data from different genres through fine-tuning, and applying ensemble over 8 models, outperformed a strong phrase-based system by 2 BLEU points in the Arabic->English direction. Why it matters: The research highlights the early promise of neural machine translation for Arabic language pairs, demonstrating its potential to surpass traditional methods.

A Panoramic Survey of Natural Language Processing in the Arab World

arXiv ·

This survey paper reviews the landscape of Natural Language Processing (NLP) research and applications in the Arab world. It discusses the unique challenges posed by the Arabic language, such as its morphological complexity and dialectal diversity. The paper also presents a historical overview of Arabic NLP and surveys various research areas, including machine translation, sentiment analysis, and speech recognition. Why it matters: The survey provides a comprehensive resource for researchers and practitioners interested in the current state and future directions of Arabic NLP, a field critical for enabling AI technologies to serve Arabic-speaking communities.