The InterText project, funded by the European Research Council, aims to advance NLP by developing a framework for modeling fine-grained relationships between texts. This approach makes it possible to trace the origin and evolution of texts and ideas. Iryna Gurevych from the Technical University of Darmstadt presented the intertextual approach to NLP, covering data modeling, representation learning, and practical applications. Why it matters: This research could enable a new generation of AI tools for text work and critical reading, for example in collaborative knowledge construction and document revision assistance.
This paper introduces a new task: detecting propaganda techniques in code-switched text. The authors created and released a corpus of 1,030 English-Roman Urdu code-switched texts annotated with 20 propaganda techniques. Experiments show the importance of directly modeling multilinguality and using the right fine-tuning strategy for this task.
KAUST Associate Professor Xiangliang Zhang is using machine learning to analyze COVID-19-related posts on Twitter. Her team at KAUST's Computational Bioscience Research Center is measuring sentiment in tweets that carry hashtags such as #coronavirus and #covid19. Zhang aims to use this data to help predict localized outbreaks and provide an early warning system for governments and organizations. Why it matters: This research demonstrates the potential of AI-powered sentiment analysis to support public health efforts and inform decision-making during pandemics in the Middle East and globally.
The article discusses the challenges of applying text classification techniques effectively, despite the availability of tools like LibMultiLabel. It highlights the importance of guiding users toward appropriate use of machine learning methods, because practical applications raise questions such as which evaluation criteria to use and how to handle the data. The piece also mentions a panel discussion hosted by MBZUAI in collaboration with the Manara Center for Coexistence and Dialogue. Why it matters: This signals ongoing efforts within the UAE AI ecosystem to address practical challenges and promote responsible AI usage in NLP applications.
The GenAI Content Detection Task 1 is a shared task on detecting machine-generated text, featuring monolingual (English) and multilingual subtasks. The task, part of the GenAI workshop at COLING 2025, attracted 36 teams for the English subtask and 26 for the multilingual one. The organizers provide a detailed overview of the data, results, system rankings, and analysis of the submitted systems.
Researchers developed a semantic search tool for the Quran using Arabic NLP techniques. The tool was trained on a dataset of over 30 tafsirs (interpretations) of the Quran. Using the SNxLM model and cosine similarity, the tool identifies Quranic verses most relevant to a user's query, achieving a similarity score of up to 0.97. Why it matters: This tool could significantly improve access to the Quran's teachings for Arabic speakers and researchers, providing a valuable resource for religious study and understanding.
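The ranking step described above, scoring verses against a query by cosine similarity, can be illustrated with a minimal sketch. It assumes the embeddings have already been computed; the function names, the verse identifiers, and the toy three-dimensional vectors are hypothetical and are not outputs of the SNxLM model or part of the researchers' tool.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_verses(
    query_vec: Sequence[float],
    verse_vecs: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (verse_id, score) pairs, highest similarity first."""
    scored = [
        (verse_id, cosine_similarity(query_vec, vec))
        for verse_id, vec in verse_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy embeddings (assumption): real sentence embeddings have hundreds
    # of dimensions and come from a trained language model.
    verses = {
        "verse_a": [1.0, 0.0, 0.0],
        "verse_b": [0.9, 0.1, 0.0],
        "verse_c": [0.0, 1.0, 0.0],
    }
    for verse_id, score in rank_verses([1.0, 0.0, 0.0], verses, top_k=2):
        print(f"{verse_id}: {score:.4f}")
```

Cosine similarity depends only on the direction of the vectors, not their length, which is why it is a common choice for comparing text embeddings.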
This paper provides an overview of the UrduFake@FIRE2021 shared task, which focused on fake news detection in the Urdu language. The task involved binary classification of news articles as real or fake, using a dataset of 1,300 training and 300 test articles across five domains. Of the 34 teams that registered, 18 submitted results and 11 provided technical reports describing approaches that ranged from bag-of-words (BoW) to Transformer models. The best system achieved an F1-macro score of 0.679.
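For readers unfamiliar with the metric, F1-macro is the unweighted mean of the per-class F1 scores, so the real and fake classes count equally regardless of how many articles each contains. The sketch below is a plain-Python illustration of that definition; the function names and the four-article example are hypothetical and are not taken from the shared task's data or evaluation script.

```python
from typing import Hashable, Sequence


def f1_for_class(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], label: Hashable
) -> float:
    """F1 score for one class, treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    denom = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
    return 2 * tp / denom if denom else 0.0


def f1_macro(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over all labels that appear."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = set(y_true) | set(y_pred)
    if not labels:
        raise ValueError("no labels to evaluate")
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


if __name__ == "__main__":
    # Made-up gold labels and predictions for four articles (assumption).
    gold = ["real", "real", "fake", "fake"]
    pred = ["real", "fake", "fake", "fake"]
    print(f"F1-macro: {f1_macro(gold, pred):.4f}")
```

In this toy example the real class scores 2/3 and the fake class scores 0.8, and the macro average is their simple mean.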
This paper focuses on analyzing surveys of women entrepreneurs in the UAE using machine learning techniques. The goal is to extract relevant insights from the data to understand the current landscape and predict future trends. The study aims to support better business decisions related to women in entrepreneurship.