MBZUAI researchers presented studies at the EMNLP and ArabicNLP conferences on improving NLP for diverse languages, especially Arabic. One study evaluated ChatGPT and GPT-4 across Arabic dialects and found that both perform worse in Arabic than in English. Within Arabic, GPT-4 outperformed GPT-3.5. Why it matters: This research highlights the need for NLP models to better support the linguistic diversity of Arabic and other languages to avoid widening existing technological gaps.
This survey paper reviews the landscape of Natural Language Processing (NLP) research and applications in the Arab world. It discusses the unique challenges posed by the Arabic language, such as its morphological complexity and dialectal diversity. The paper also presents a historical overview of Arabic NLP and surveys various research areas, including machine translation, sentiment analysis, and speech recognition. Why it matters: The survey provides a comprehensive resource for researchers and practitioners interested in the current state and future directions of Arabic NLP, a field critical for enabling AI technologies to serve Arabic-speaking communities.
This paper surveys the landscape of code-switched Arabic natural language processing, covering the mixture of Modern Standard Arabic, dialects, and foreign languages. It examines current efforts, challenges, and research gaps in the field. The survey also provides recommendations for future research directions in code-switched Arabic NLP. Why it matters: Understanding code-switching is crucial for developing effective language technologies that can handle the diverse linguistic landscape of the Arab world.
KAUST researchers have developed a parameter-efficient learning approach that identifies Arabic dialects with limited data and computing power by fine-tuning the Whisper model on a dataset covering 17 dialects. The resulting model achieves high accuracy while using only 2.5% of the parameters of the full model and 30% of the training data. Srijith Radhakrishnan presented the findings at EMNLP 2023 and Interspeech 2023. Why it matters: This research addresses the challenge of dialect identification in Arabic NLP and enables more efficient use of large pretrained models in resource-constrained environments.
This article surveys the landscape of Arabic Large Language Models (ALLMs), tracing their evolution from early text processing systems to sophisticated AI models. It highlights the unique challenges and opportunities in developing ALLMs for the 422 million Arabic speakers across 27 countries. The article also examines how ALLMs are evaluated through benchmarks and public leaderboards. Why it matters: ALLMs can bridge technological gaps and empower Arabic-speaking communities by catering to their specific linguistic and cultural needs.