This paper introduces a new task: detecting propaganda techniques in code-switched text. The authors created and released a corpus of 1,030 English-Roman Urdu code-switched texts annotated with 20 propaganda techniques. Experiments show the importance of modeling multilinguality directly and of choosing an appropriate fine-tuning strategy for this task.
MBZUAI 2023 graduate Muhammad Umar is researching propaganda detection in low-resource, code-switched languages like Roman Urdu. His master's thesis focuses on detecting propaganda techniques in social media text using deep learning models. Umar aims to submit a paper on his findings to the EMNLP 2023 conference. Why it matters: This research addresses the under-explored area of propaganda detection in low-resource languages, which is crucial for combating misinformation in bilingual communities.
This paper describes the Nexus team's participation in the ArAIEval shared task on detecting propaganda and disinformation in Arabic. The team fine-tuned transformer models and experimented with zero- and few-shot learning using GPT-4. Nexus's system placed 9th in subtask 1A and 10th in subtask 2A. Why it matters: The work contributes to the important goal of automatically identifying and mitigating the spread of disinformation in Arabic content, which is critical for maintaining societal trust and informed public discourse.
The paper introduces MultiProSE, the first multi-label Arabic dataset for propaganda, sentiment, and emotion detection. It extends the existing ArPro dataset with sentiment and emotion annotations, resulting in 8,000 annotated news articles. Baseline models, including GPT-4o-mini and BERT-based models, were developed for each task, and the dataset, guidelines, and code are publicly available. Why it matters: This resource enables further research into Arabic language models and a better understanding of opinion dynamics within Arabic news media.
A new methodology assesses news outlet factuality and bias by emulating the criteria professional fact-checkers use, with LLMs standing in for human raters. The approach prompts LLMs with questions derived from fact-checking criteria and aggregates their responses into outlet-level predictions. Experiments demonstrate improvements over baselines, with error analysis by media popularity and region; the dataset and code are released at https://github.com/mbzuai-nlp/llm-media-profiling.
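To make the elicit-and-aggregate idea concrete, here is a minimal sketch, not the paper's implementation: the criterion names, the label scale, and the majority-vote rule are all illustrative assumptions, and the per-criterion LLM verdicts are hard-coded stand-ins for real model responses.

```python
from collections import Counter

# Hypothetical per-criterion LLM verdicts for one news outlet. In the real
# system each verdict would come from prompting an LLM with a question
# derived from a fact-checking criterion; here they are hard-coded.
criterion_verdicts = {
    "uses reliable sources": "high",
    "issues corrections": "high",
    "mixes opinion and news": "mixed",
    "publishes unverified claims": "low",
    "transparent ownership": "high",
}

def aggregate(verdicts: dict) -> str:
    """Combine per-criterion answers into one outlet-level factuality label.

    A simple majority vote over criteria; the paper's actual aggregation
    may differ.
    """
    counts = Counter(verdicts.values())
    label, _ = counts.most_common(1)[0]
    return label

print(aggregate(criterion_verdicts))  # prints "high" (3 of 5 criteria)
```

Majority voting is only one plausible aggregation; weighted averaging over a numeric scale would be a natural alternative.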
This paper explores multilingual satire detection in English and Arabic using zero-shot and chain-of-thought (CoT) prompting. It compares the performance of Jais-chat (13B) and LLaMA-2-chat (7B) at distinguishing satire from truthful news. Results show that CoT prompting significantly improves Jais-chat's performance, reaching an F1-score of 80% in English. Why it matters: This demonstrates the potential of Arabic LLMs like Jais to handle nuanced language tasks such as satire detection, which is critical for combating misinformation in the region.
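The contrast between the two prompting styles can be sketched as below. This is an illustrative assumption, not the authors' prompts: the wording, the one-word label scheme, and the verdict parser are invented for the example, and the actual call to Jais-chat or LLaMA-2-chat is omitted.

```python
def zero_shot_prompt(article: str) -> str:
    # Zero-shot: ask for the label directly, with no reasoning requested.
    return (
        "Decide whether the following news text is satire or truthful news. "
        "Answer with exactly one word: 'satire' or 'truthful'.\n\n"
        f"Text: {article}"
    )

def cot_prompt(article: str) -> str:
    # Chain-of-thought: ask the model to reason step by step about satire
    # cues (exaggeration, irony, implausible claims) before the verdict.
    return (
        "Decide whether the following news text is satire or truthful news. "
        "First reason step by step about exaggeration, irony, and "
        "implausible claims, then give a final one-word verdict: "
        "'satire' or 'truthful'.\n\n"
        f"Text: {article}"
    )

def parse_verdict(response: str) -> str:
    # Take the LAST occurrence of either label, so the reasoning text a
    # CoT response produces before the final verdict is ignored.
    text = response.lower()
    i_sat, i_tru = text.rfind("satire"), text.rfind("truthful")
    if i_sat == -1 and i_tru == -1:
        return "unknown"
    return "satire" if i_sat > i_tru else "truthful"
```

A model response such as "The headline is wildly exaggerated... therefore this text is satire." parses to "satire" under this scheme.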
MBZUAI Professor Preslav Nakov has developed FRAPPE, an interactive website that analyzes news articles to identify persuasion techniques. FRAPPE helps users understand framing, persuasion, and propaganda at an aggregate level, across different news outlets and countries. Presented at EACL, FRAPPE identifies 23 specific techniques, grouped into six broader categories such as 'attack on reputation' and 'manipulative wording'. Why it matters: The tool addresses the increasing difficulty of discerning factual information from disinformation, providing a means to identify biases in news media from different countries.
MBZUAI Professor Preslav Nakov is researching methods to identify and combat the harmful uses of large language models in generating disinformation. He notes that disinformation, unlike fake news, is weaponized with the intent to persuade, not just to lie. His research focuses on the linguistic differences between human-written and machine-generated disinformation, such as the use of rhetorical devices in human propaganda. Why it matters: As AI-generated content becomes more prevalent, understanding and mitigating its potential for spreading disinformation is critical for maintaining trust and integrity in information ecosystems, especially during major election cycles.