This paper introduces AraDhati+, a new comprehensive dataset for Arabic subjectivity analysis built by combining existing datasets such as ASTD, LABR, HARD, and SANAD. The researchers fine-tuned Arabic language models including XLM-RoBERTa, AraBERT, and ArabianGPT on AraDhati+ for subjectivity classification. An ensemble decision approach achieved 97.79% accuracy. Why it matters: The work addresses the under-resourced nature of Arabic NLP by providing a new dataset and demonstrating strong results in subjectivity classification, advancing sentiment analysis capabilities for the Arabic language.
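The summary does not say how the ensemble decision is made. As a minimal sketch, assuming simple majority voting over the labels predicted by each fine-tuned model (the function name and the example predictions are hypothetical, not taken from the paper):

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one ensemble label per example.

    `predictions` is a list with one entry per model; each entry is a list
    of labels, one per example. Majority voting is an assumption here; the
    paper's ensemble rule may differ.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_examples = len(predictions[0])
    if any(len(p) != n_examples for p in predictions):
        raise ValueError("all models must label the same number of examples")

    ensemble = []
    for labels in zip(*predictions):  # labels given to one example by every model
        winner, _count = Counter(labels).most_common(1)[0]
        ensemble.append(winner)
    return ensemble


# Made-up outputs from three models on four sentences.
model_a = ["subjective", "objective", "subjective", "objective"]
model_b = ["subjective", "subjective", "subjective", "objective"]
model_c = ["objective", "objective", "subjective", "subjective"]

print(majority_vote([model_a, model_b, model_c]))
```

With an odd number of models and two classes, a vote can never tie, which is one reason this rule is a common default for binary tasks such as subjective versus objective.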
A new methodology assesses the factuality and bias of news outlets by using LLMs to emulate the criteria that professional fact-checkers apply. The approach uses prompts based on these fact-checking criteria to elicit LLM responses, which are then aggregated into predictions. Experiments demonstrate improvements over baselines, and an error analysis covers media popularity and region. The dataset and code are released at https://github.com/mbzuai-nlp/llm-media-profiling.
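The aggregation step can be illustrated with a small sketch. It assumes that each criterion-based prompt returns one label on a three-point factuality scale and that the labels are combined by averaging. The scale, the criterion names, and the averaging rule are all hypothetical; the released code may aggregate differently.

```python
from statistics import mean

# Hypothetical three-point factuality scale, ordered from worst to best.
LABELS = ["low", "mixed", "high"]


def aggregate_factuality(responses):
    """Average per-criterion LLM answers into one outlet-level label.

    `responses` maps a criterion name to the label the LLM returned for
    that criterion. Averaging is an assumed aggregation rule.
    """
    if not responses:
        raise ValueError("need at least one response")
    scores = [LABELS.index(answer.strip().lower()) for answer in responses.values()]
    # Round half up so that a score midway between two labels favours the higher one.
    return LABELS[int(mean(scores) + 0.5)]


# Made-up answers for one outlet, one per hypothetical criterion.
example = {
    "sourcing": "high",
    "corrections_policy": "mixed",
    "sensational_headlines": "high",
}
print(aggregate_factuality(example))
```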
A new paper coauthored by researchers at The University of Melbourne and MBZUAI explores disagreement in human annotation for AI training. The paper treats disagreement as a signal, known as human label variation (HLV), rather than as noise, and proposes new evaluation metrics based on fuzzy set theory. These metrics adapt accuracy and F-score to cases where multiple labels may plausibly apply, aligning model output with the distribution of human judgments. Why it matters: This research addresses a key challenge in NLP by accounting for the inherent ambiguity in human language, potentially leading to more robust and human-aligned AI systems.
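To make the idea concrete, here is a minimal sketch of soft accuracy and a soft F-score. It assumes the standard fuzzy-set intersection (the element-wise minimum of two membership values) and represents each item as a label distribution. The function names and the toy data are hypothetical, and the paper's exact definitions may differ.

```python
def soft_accuracy(gold, pred):
    """Mean fuzzy overlap between gold and predicted label distributions.

    Each item is a dict mapping label -> membership in [0, 1]. The overlap
    for one item is the sum over labels of min(gold, pred).
    """
    total = 0.0
    for g, p in zip(gold, pred):
        labels = set(g) | set(p)
        total += sum(min(g.get(l, 0.0), p.get(l, 0.0)) for l in labels)
    return total / len(gold)


def soft_f1(gold, pred, label):
    """F-score for one label, with counts replaced by fuzzy memberships."""
    overlap = sum(min(g.get(label, 0.0), p.get(label, 0.0)) for g, p in zip(gold, pred))
    if overlap == 0:
        return 0.0
    precision = overlap / sum(p.get(label, 0.0) for p in pred)
    recall = overlap / sum(g.get(label, 0.0) for g in gold)
    return 2 * precision * recall / (precision + recall)


# Toy data: annotators split 3-to-1 on the first item and agree on the second.
gold = [{"pos": 0.75, "neg": 0.25}, {"pos": 0.0, "neg": 1.0}]
pred = [{"pos": 1.0, "neg": 0.0}, {"pos": 0.5, "neg": 0.5}]

print(soft_accuracy(gold, pred))   # 0.625
print(soft_f1(gold, pred, "pos"))
```

When every distribution is one-hot, the minimum is 1 for a match and 0 otherwise, so both functions reduce to ordinary accuracy and F-score.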
This paper introduces a new task: detecting propaganda techniques in code-switched text. The authors created and released a corpus of 1,030 English-Roman Urdu code-switched texts annotated with 20 propaganda techniques. Experiments show the importance of directly modeling multilinguality and using the right fine-tuning strategy for this task.
A talk will present two projects that use NLP to estimate a client’s depression severity and well-being. The first project examines emotional coherence between the subjective experience of emotions and emotion expression in therapy using transformer-based emotion recognition models. The second project proposes a semantic pipeline for estimating depression severity from individuals' social media posts, exploring different aggregation methods for selecting one of the four Beck Depression Inventory (BDI) answer options per symptom. Why it matters: This research explores how NLP techniques can be applied to mental health assessment, potentially offering new tools for diagnosis and treatment monitoring.
The InterText project, funded by the European Research Council, aims to advance NLP by developing a framework for modeling fine-grained relationships between texts. This approach enables tracing the origin and evolution of texts and ideas. Iryna Gurevych from the Technical University of Darmstadt presented the intertextual approach to NLP, covering data modeling, representation learning, and practical applications. Why it matters: This research could enable a new generation of AI applications for text work and critical reading, with potential applications in collaborative knowledge construction and document revision assistance.
MBZUAI Professor Preslav Nakov has developed FRAPPE, an interactive website that analyzes news articles to identify persuasion techniques. FRAPPE helps users understand framing, persuasion, and propaganda at an aggregate level, across different news outlets and countries. Presented at EACL, FRAPPE uses 23 specific techniques categorized into six broader buckets, such as 'attack on reputation' and 'manipulative wording'. Why it matters: The tool addresses the increasing difficulty in discerning factual information from disinformation, providing a means to identify biases in news media from different countries.