The Technology Innovation Institute (TII) in Abu Dhabi, in collaboration with LightOn, has launched NOOR, a 10-billion-parameter Arabic natural language processing (NLP) model. The model was trained on a large, high-quality, cross-domain Arabic dataset that includes web data, books, poetry, news, and technical information. Target applications include automated summarization, chatbots, and personalized marketing. Why it matters: NOOR represents a significant advance in Arabic NLP and could enable more sophisticated AI applications tailored to the Arabic language and regional needs.
TII and LightOn have partnered to build the NOOR Platform, an exascale computing platform for developing foundation models. The collaboration will draw on LightOn's expertise in large language models, and its first output is the largest Arabic language model to date. The platform will provide high-quality data pipelines and support extreme-scale distributed training and serving. Why it matters: The partnership aims to establish Abu Dhabi as a center of AI excellence and advance the UAE's ambitions in high-tech innovation and NLP research.
The paper introduces ORCA, a new public benchmark for evaluating Arabic language understanding. ORCA covers diverse Arabic varieties and includes 60 datasets across seven NLU task clusters. The benchmark was used to compare 18 multilingual and Arabic language models and includes a public leaderboard with a unified evaluation metric. Why it matters: ORCA addresses the lack of a comprehensive Arabic benchmark, enabling better progress measurement for Arabic and multilingual language models.
Five young researchers from KAUST took part in the virtual 70th Lindau Nobel Laureate Meeting, which had an interdisciplinary focus. The KAUST participants included Ph.D. students, postdocs, and faculty member Nazek El-Atab, whose research on smart memory and electronic devices has applications in computing and sensing. Why it matters: KAUST's representation at this prestigious event reflects the university's commitment to fostering scientific collaboration and innovation among its researchers.
Noura Shehab, who earned her Ph.D. in environmental engineering from KAUST in 2014, now works as a materials science researcher at RPD Innovations. Her research focuses on microbial electrochemical technologies and sustainable solutions to water scarcity. Shehab led a KAUST team in the 2013 Hult Prize and is the incoming president of the KAUST Saudi Arabian alumni chapter. Why it matters: The profile highlights KAUST's role in developing scientific talent and fostering innovation in sustainable technologies relevant to Saudi Arabia.
Thamar Solorio of the University of Houston will discuss machine learning approaches to processing spontaneous human language. The talk will cover adapting multilingual transformers to code-switched data and using data augmentation for domain adaptation in sequence labeling tasks. Solorio will also give an overview of other research projects at the RiTUAL lab that address the scarcity of labeled data. Why it matters: Labeled-data scarcity is a persistent obstacle to building effective AI applications for the region and a key challenge in Arabic NLP in particular.