GCC AI Research

AI and the Arabic language: Preserving cultural heritage and enabling future discovery

MBZUAI · Significant research

Summary

This article discusses MBZUAI's work on Arabic-language AI, including linguistic models built with deep learning. Key initiatives include Jais, a 13B-parameter Arabic LLM developed in collaboration with G42's Inception, and Atlas-Chat, a model that understands the Moroccan dialect (Darija). The university is also building Arabic into applied AI systems such as BiMediX2, a multimodal healthcare model that understands medical queries in both English and Arabic. Why it matters: These initiatives help preserve Arabic cultural heritage, enable future discovery, and address linguistic challenges specific to Arabic in AI applications.


Related

The Landscape of Arabic Large Language Models (ALLMs): A New Era for Arabic Language Technology

arXiv

This paper surveys the landscape of Arabic Large Language Models (ALLMs), tracing their evolution from early text-processing systems to sophisticated AI models. It highlights the challenges and opportunities in developing ALLMs for the 422 million Arabic speakers across 27 countries and examines how ALLMs are evaluated through benchmarks and public leaderboards. Why it matters: ALLMs can bridge technological gaps and empower Arabic-speaking communities by serving their specific linguistic and cultural needs.

Studying the History of the Arabic Language: Language Technology and a Large-Scale Historical Corpus

arXiv

This paper introduces a large-scale historical corpus of written Arabic spanning 1,400 years. The corpus was cleaned and processed with Arabic NLP tools, including tools that identify reused text. The study applies a novel automatic periodization algorithm to trace the history of the Arabic language, and the results confirm the division into Classical and Modern Standard Arabic. Why it matters: This resource enables further computational research into the evolution of Arabic and the development of NLP tools for historical texts.

A Panoramic Survey of Natural Language Processing in the Arab World

arXiv

This survey paper reviews Natural Language Processing (NLP) research and applications in the Arab world. It discusses challenges specific to Arabic, such as its morphological complexity and dialectal diversity, presents a historical overview of Arabic NLP, and covers research areas including machine translation, sentiment analysis, and speech recognition. Why it matters: The survey is a comprehensive resource for researchers and practitioners interested in the current state and future directions of Arabic NLP, a field critical for enabling AI technologies to serve Arabic-speaking communities.

Advancing cultural diversity through AI

MBZUAI

MBZUAI is conducting research on using AI to improve cross-cultural understanding, including studies of how LLMs fail to recognize cultural references. Its researchers developed "Culturally Yours," a tool that helps users understand cultural references in text, and the "All Languages Matter Benchmark" (ALM Bench), which evaluates multimodal LLMs across 100 languages. MBZUAI has also developed LLMs tailored to languages that are underrepresented in AI, including Jais (Arabic), Nanda (Hindi), and Sherkala (Kazakh). Why it matters: These initiatives promote inclusivity and help ensure that AI systems are culturally aware and can serve diverse populations effectively, particularly in the Middle East's multicultural context.