Nexus at ArAIEval Shared Task: Fine-Tuning Arabic Language Models for Propaganda and Disinformation Detection

arXiv · November 6, 2023 · Notable

Summary

This paper describes the Nexus team's participation in the ArAIEval shared task focused on detecting propaganda and disinformation in Arabic. The team fine-tuned transformer models and experimented with zero- and few-shot learning using GPT-4. Nexus's system achieved 9th place in subtask 1A and 10th place in subtask 2A. Why it matters: The work contributes to the important goal of automatically identifying and mitigating the spread of disinformation in Arabic content, which is critical for maintaining societal trust and informed public discourse.

Keywords

Arabic · disinformation · propaganda · ArAIEval · GPT-4

Read original article →

Get the weekly digest

Top AI stories from the GCC region, every week.

Detecting Propaganda Techniques in Code-Switched Social Media Text

arXiv · May 23

This paper introduces a new task: detecting propaganda techniques in code-switched text. The authors created and released a corpus of 1,030 English-Roman Urdu code-switched texts annotated with 20 propaganda techniques. Experiments show the importance of directly modeling multilinguality and using the right fine-tuning strategy for this task.

MultiProSE: A Multi-label Arabic Dataset for Propaganda, Sentiment, and Emotion Detection

arXiv · Feb 12

The paper introduces MultiProSE, the first multi-label Arabic dataset for propaganda, sentiment, and emotion detection. It extends the existing ArPro dataset with sentiment and emotion annotations, resulting in 8,000 annotated news articles. Baseline models, including GPT-4o-mini and BERT-based models, were developed for each task, and the dataset, guidelines, and code are publicly available. Why it matters: This resource enables further research into Arabic language models and a better understanding of opinion dynamics within Arabic news media.

Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2021

arXiv · Jul 11

This paper provides an overview of the UrduFake@FIRE2021 shared task, which focused on fake news detection in the Urdu language. The task involved binary classification of news articles into real or fake categories using a dataset of 1300 training and 300 testing articles across five domains. 34 teams registered, with 18 submitting results and 11 providing technical reports detailing various approaches from BoW to Transformer models, with the best system achieving an F1-macro score of 0.679.

From FusHa to Folk: Exploring Cross-Lingual Transfer in Arabic Language Models

arXiv · Feb 10

Arabic Language Models (LMs) are primarily pretrained on Modern Standard Arabic (MSA), with an expectation of transferring to diverse Arabic dialects for real-world applications. This work explores cross-lingual transfer in Arabic LMs using probing on three Natural Language Processing (NLP) tasks and representational similarity. The findings indicate that transfer is possible but disproportionate across dialects, with some evidence of negative interference in models trained to support all Arabic dialects. Why it matters: This research highlights crucial challenges for building robust Arabic AI systems that effectively handle the significant linguistic diversity of the Arab world.