The paper introduces a two-step approach for transliterating Judeo-Arabic text (written in Hebrew script) into Arabic script. The method involves character-level mapping followed by post-correction to fix grammatical and orthographic errors. The authors also benchmarked LLMs on the transliteration task and demonstrate that transliteration enables the use of Arabic NLP tools on Judeo-Arabic. Why it matters: This work makes Judeo-Arabic texts more accessible to Arabic NLP, enabling processing and analysis that was previously impossible.
This survey paper reviews the landscape of Natural Language Processing (NLP) research and applications in the Arab world. It discusses the unique challenges posed by the Arabic language, such as its morphological complexity and dialectal diversity. The paper also presents a historical overview of Arabic NLP and surveys various research areas, including machine translation, sentiment analysis, and speech recognition. Why it matters: The survey provides a comprehensive resource for researchers and practitioners interested in the current state and future directions of Arabic NLP, a field critical for enabling AI technologies to serve Arabic-speaking communities.
This paper surveys the landscape of code-switched Arabic natural language processing, covering the mixture of Modern Standard Arabic, dialects, and foreign languages. It examines current efforts, challenges, and research gaps in the field. The survey also provides recommendations for future research directions in code-switched Arabic NLP. Why it matters: Understanding code-switching is crucial for developing effective language technologies that can handle the diverse linguistic landscape of the Arab world.
The paper introduces Juhaina, a 9.24B parameter Arabic-English bilingual LLM trained with an 8,192 token context window. It identifies limitations in the Open Arabic LLM Leaderboard (OALL) and proposes a new benchmark, CamelEval, for more comprehensive evaluation. Juhaina outperforms models like Llama and Gemma in generating helpful Arabic responses and understanding cultural nuances. Why it matters: This culturally-aligned LLM and associated benchmark could significantly advance Arabic NLP and democratize AI access for Arabic speakers.