Thamar Solorio of MBZUAI served as general chair of EMNLP 2024, which hosted over 4,000 attendees. MBZUAI researchers presented nearly 50 studies, including one co-authored by Solorio and Monojit Choudhury that received an Outstanding Paper Award. Key themes included cultural awareness and representation in LLMs, detection of machine-generated content, and LLM empathy. Why it matters: MBZUAI's strong presence at EMNLP highlights its growing influence in the international NLP research community and its focus on culturally aware AI.
Thamar Solorio from the University of Houston will discuss machine learning approaches for spontaneous human language processing. The talk will cover adapting multilingual transformers to code-switching data and using data augmentation for domain adaptation in sequence labeling tasks. Solorio will also provide an overview of other research projects at the RiTUAL lab, focusing on the scarcity of labeled data. Why it matters: This presentation addresses key challenges in Arabic NLP related to data scarcity, which is a persistent obstacle in developing effective AI applications for the region.
The first Workshop on Language Models for Low-Resource Languages (LoResLM 2025) was held in Abu Dhabi as part of COLING 2025. It provided a forum for researchers to share work on language models for low-resource languages. The workshop accepted 35 papers from 52 submissions, covering diverse languages and research areas.
Dr. Teresa Lynn from Dublin City University (DCU) discussed the challenges in developing NLP tools for Irish, a low-resource language facing digital extinction. She highlighted the lack of speech and language applications and fundamental language resources for Irish. Lynn also mentioned her work at DCU on the GaelTech project and her involvement in the European Language Equality project. Why it matters: The development of NLP tools for low-resource languages like Irish is crucial for preserving linguistic diversity and preventing digital marginalization in the AI era.
The AraFinNLP 2024 shared task introduced two subtasks focused on Arabic financial NLP: multi-dialect intent detection and cross-dialect translation with intent preservation. It used the updated ArBanking77 dataset, which contains 39k parallel queries in MSA and four dialects, labeled with 77 banking-related intents. Of the 45 teams that registered, 11 participated in intent detection (top F1 score of 0.8773) and only one attempted translation (BLEU score of 1.667). Why it matters: This initiative addresses the need for specialized Arabic NLP tools in the growing Arab financial sector, promoting advancements in areas like banking chatbots and machine translation.