GCC AI Research

MultiProSE: A Multi-label Arabic Dataset for Propaganda, Sentiment, and Emotion Detection

arXiv · Significant research

Summary

The paper introduces MultiProSE, the first multi-label Arabic dataset for propaganda, sentiment, and emotion detection. It extends the existing ArPro dataset with sentiment and emotion annotations, resulting in 8,000 annotated news articles. Baseline models, including GPT-4o-mini and BERT-based models, were developed for each task, and the dataset, guidelines, and code are publicly available. Why it matters: This resource enables further research into Arabic language models and a better understanding of opinion dynamics within Arabic news media.

Related

Detecting Propaganda Techniques in Code-Switched Social Media Text

arXiv

This paper introduces a new task: detecting propaganda techniques in code-switched text. The authors created and released a corpus of 1,030 English-Roman Urdu code-switched texts annotated with 20 propaganda techniques. Experiments show the importance of directly modeling multilinguality and using the right fine-tuning strategy for this task.

Nexus at ArAIEval Shared Task: Fine-Tuning Arabic Language Models for Propaganda and Disinformation Detection

arXiv

This paper describes the Nexus team's participation in the ArAIEval shared task focused on detecting propaganda and disinformation in Arabic. The team fine-tuned transformer models and experimented with zero- and few-shot learning using GPT-4. Nexus's system achieved 9th place in subtask 1A and 10th place in subtask 2A. Why it matters: The work contributes to the important goal of automatically identifying and mitigating the spread of disinformation in Arabic content, which is critical for maintaining societal trust and informed public discourse.