GCC AI Research

Results for "Nile-Chat"

Nile-Chat: Egyptian Language Models for Arabic and Latin Scripts

arXiv ·

The authors introduce Nile-Chat, a collection of LLMs (4B, 3x4B-A6B, and 12B) built specifically for the Egyptian dialect and capable of understanding and generating text in both Arabic and Latin scripts. A novel language adaptation approach based on the Branch-Train-MiX strategy merges script-specialized experts into a single MoE model. Nile-Chat models outperform multilingual and Arabic LLMs such as LLaMa, Jais, and ALLaM on newly introduced Egyptian benchmarks, with the 12B model achieving a 14.4% performance gain over Qwen2.5-14B-Instruct on Latin-script benchmarks. All resources are publicly available. Why it matters: This work addresses the overlooked problem of adapting LLMs to dual-script languages, providing a methodology for creating more inclusive and representative language models in the Arabic-speaking world.

Egyptian AI startup Nanovate raises $1m pre-seed funding round - Disrupt Africa

GCC AI Startup ·

Nanovate, an Egyptian AI startup, has raised $1 million in pre-seed funding. The round was led by an unnamed investor, with participation from angel investors. The company plans to use the funds to expand its AI-powered solutions across various sectors. Why it matters: The funding will enable Nanovate to further develop its AI capabilities and expand its reach in the Egyptian market.

Egyptian Arabic to English Statistical Machine Translation System for NIST OpenMT'2015

arXiv ·

This paper describes the QCRI-Columbia-NYUAD group's Egyptian Arabic-to-English statistical machine translation system submitted to the NIST OpenMT'2015 competition. The system used tools such as 3arrib and MADAMIRA to process and standardize informal dialectal Arabic. It was trained using phrase-based SMT with features such as an operation sequence model, a class-based language model, and a neural network joint model. Why it matters: The work demonstrates advances in machine translation for dialectal Arabic, a challenging but important area for regional communication and NLP research.

NADI 2022: The Third Nuanced Arabic Dialect Identification Shared Task

arXiv ·

The third Nuanced Arabic Dialect Identification Shared Task (NADI 2022) focused on advancing Arabic NLP through dialect identification and sentiment analysis at the country level. A total of 21 teams participated, with the winning team achieving an F1 score of 27.06 on dialect identification and 75.16 on sentiment analysis. The task highlights the challenges in Arabic dialect processing and motivates further research. Why it matters: Standardized evaluations like NADI are crucial for benchmarking progress and fostering innovation in Arabic NLP, especially for dialectal variations.

A Panoramic Survey of Natural Language Processing in the Arab World

arXiv ·

This survey paper reviews the landscape of Natural Language Processing (NLP) research and applications in the Arab world. It discusses the unique challenges posed by the Arabic language, such as its morphological complexity and dialectal diversity. The paper also presents a historical overview of Arabic NLP and surveys various research areas, including machine translation, sentiment analysis, and speech recognition. Why it matters: The survey provides a comprehensive resource for researchers and practitioners interested in the current state and future directions of Arabic NLP, a field critical for enabling AI technologies to serve Arabic-speaking communities.