Detecting Propaganda Techniques in Code-Switched Social Media Text

arXiv · May 23, 2023 · Notable

Summary

This paper introduces a new task: detecting propaganda techniques in code-switched text. The authors build and release a corpus of 1,030 texts that switch between English and Roman Urdu, annotated with 20 propaganda techniques. Their experiments indicate that modeling multilinguality directly and choosing a suitable fine-tuning strategy both matter for this task.

Keywords

propaganda detection · code-switching · Roman Urdu · corpus · multilingual

A Survey of Code-switched Arabic NLP: Progress, Challenges, and Future Directions

arXiv · Jan 23

This paper surveys the landscape of code-switched Arabic natural language processing, covering the mixture of Modern Standard Arabic, dialects, and foreign languages. It examines current efforts, challenges, and research gaps in the field. The survey also provides recommendations for future research directions in code-switched Arabic NLP. Why it matters: Understanding code-switching is crucial for developing effective language technologies that can handle the diverse linguistic landscape of the Arab world.

The power of propaganda and AI’s ability to fight it

MBZUAI

MBZUAI 2023 graduate Muhammad Umar is researching propaganda detection in low-resource, code-switched languages like Roman Urdu. His master's thesis focuses on detecting propaganda techniques in social media text using deep learning models. Umar aims to submit a paper on his findings to the EMNLP 2023 conference. Why it matters: This research addresses the under-explored area of propaganda detection in low-resource languages, which is crucial for combating misinformation in bilingual communities.

Nexus at ArAIEval Shared Task: Fine-Tuning Arabic Language Models for Propaganda and Disinformation Detection

arXiv · Nov 6

This paper describes the Nexus team's participation in the ArAIEval shared task focused on detecting propaganda and disinformation in Arabic. The team fine-tuned transformer models and experimented with zero- and few-shot learning using GPT-4. Nexus's system achieved 9th place in subtask 1A and 10th place in subtask 2A. Why it matters: The work contributes to the important goal of automatically identifying and mitigating the spread of disinformation in Arabic content, which is critical for maintaining societal trust and informed public discourse.

MultiProSE: A Multi-label Arabic Dataset for Propaganda, Sentiment, and Emotion Detection

arXiv · Feb 12

The paper introduces MultiProSE, the first multi-label Arabic dataset for propaganda, sentiment, and emotion detection. It extends the existing ArPro dataset with sentiment and emotion annotations, resulting in 8,000 annotated news articles. Baseline models, including GPT-4o-mini and BERT-based models, were developed for each task, and the dataset, guidelines, and code are publicly available. Why it matters: This resource enables further research into Arabic language models and a better understanding of opinion dynamics within Arabic news media.