Dr. Teresa Lynn from Dublin City University (DCU) discussed the challenges in developing NLP tools for Irish, a low-resource language facing digital extinction. She highlighted the lack of speech and language applications and fundamental language resources for Irish. Lynn also mentioned her work at DCU on the GaelTech project and her involvement in the European Language Equality project. Why it matters: The development of NLP tools for low-resource languages like Irish is crucial for preserving linguistic diversity and preventing digital marginalization in the AI era.
A study investigated language shift from Tibetan to Arabic among Tibetan families who migrated to Saudi Arabia 70 years ago. Data from 96 participants across three age groups revealed statistically significant intergenerational differences in language use (p = .001): younger members rarely used Tibetan, while older members used it slightly more.
This survey paper reviews the landscape of Natural Language Processing (NLP) research and applications in the Arab world. It discusses the unique challenges posed by the Arabic language, such as its morphological complexity and dialectal diversity. The paper also presents a historical overview of Arabic NLP and surveys various research areas, including machine translation, sentiment analysis, and speech recognition. Why it matters: The survey provides a comprehensive resource for researchers and practitioners interested in the current state and future directions of Arabic NLP, a field critical for enabling AI technologies to serve Arabic-speaking communities.
Prof. Daniel Panario gave a seminar on irreducible polynomials over finite fields and their applications in cryptography. The seminar covered how finite fields are used as basic components in many cryptographic applications. It surveyed families of irreducible polynomials and commented on their properties. Why it matters: The talk highlights the mathematical foundations and ongoing research relevant to cryptographic implementations in the region.
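To make the idea of an irreducible polynomial concrete, here is a minimal sketch (not taken from the seminar) that tests irreducibility over the two-element field GF(2) by brute-force trial division. The bitmask encoding and the function names `gf2_mod` and `is_irreducible_gf2` are assumptions chosen for illustration; practical implementations would typically use faster tests.

```python
def gf2_mod(a: int, b: int) -> int:
    """Remainder of a divided by b, where both are polynomials over GF(2).

    Encoding (an assumption of this sketch): bit i of the integer is the
    coefficient of x**i, so 0b1011 stands for x^3 + x + 1.
    """
    db = b.bit_length()
    while a.bit_length() >= db:
        # Cancel the leading term of a by XOR-ing in a shifted copy of b.
        a ^= b << (a.bit_length() - db)
    return a


def is_irreducible_gf2(p: int) -> bool:
    """Brute-force check: p has no divisor of degree 1..deg(p)//2."""
    deg = p.bit_length() - 1
    if deg < 1:
        return False  # constants are not considered irreducible
    # Integers 2 .. 2**(deg//2 + 1) - 1 enumerate every polynomial
    # of degree between 1 and deg//2.
    for d in range(2, 1 << (deg // 2 + 1)):
        if gf2_mod(p, d) == 0:
            return False
    return True


# x^8 + x^4 + x^3 + x + 1 is the modulus used by AES to define GF(2^8).
print(is_irreducible_gf2(0x11B))  # True
# x^8 + 1 factors as (x + 1)^8, so it cannot define a field.
print(is_irreducible_gf2(0x101))  # False
```

Reducing polynomial arithmetic modulo an irreducible polynomial of degree n over GF(2) yields the field with 2^n elements, which is why such polynomials appear as building blocks in cryptographic constructions.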
Conor McMenamin from Universitat Pompeu Fabra presented a seminar on State Machine Replication (SMR) without honest participants. The talk covered the limitations of current SMR protocols and introduced the ByRa model, a framework for player characterization free of honest participants. He then described FAIRSICAL, a sandbox SMR protocol, and discussed how the ideas could be extended to real-world protocols, with a focus on blockchains and cryptocurrencies. Why it matters: This research on SMR protocols and their incentive compatibility could lead to more robust and secure blockchain technologies in the region.
The ArabJobs dataset is a new corpus of over 8,500 Arabic job advertisements collected from Egypt, Jordan, Saudi Arabia, and the UAE. The dataset contains over 550,000 words and captures linguistic, regional, and socio-economic variation in the Arab labor market. It is available on GitHub and can be used for fairness-aware Arabic NLP and labor market research.
MBZUAI Professor Timothy Baldwin delivered the presidential keynote at the 60th Annual Meeting of the Association for Computational Linguistics (ACL). Baldwin also published three papers at the conference, on biomedical literature summarization, NLP for Indonesian languages, and understanding procedural texts. The papers address challenges such as reducing human effort in reviewing medical documents and digitally preserving Indonesian indigenous languages. Why it matters: Baldwin's contributions and leadership role at ACL highlight the growing prominence of MBZUAI and GCC-based researchers in the global NLP community.