Skip to content
GCC AI Research

Search

Results for "abusive language detection"

Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021

arXiv ·

This paper introduces two shared tasks for abusive and threatening language detection in Urdu, a low-resource language with over 170 million speakers. The tasks involve binary classification of Urdu tweets into Abusive/Non-Abusive and Threatening/Non-Threatening categories, respectively. Datasets of 2400/6000 training tweets and 1100/3950 testing tweets were created and manually annotated, along with logistic regression and BERT-based baselines. 21 teams participated and the best systems achieved F1-scores of 0.880 and 0.545 on the abusive and threatening language tasks, respectively, with m-BERT showing the best performance.

Detecting Propaganda Techniques in Code-Switched Social Media Text

arXiv ·

This paper introduces a new task: detecting propaganda techniques in code-switched text. The authors created and released a corpus of 1,030 English-Roman Urdu code-switched texts annotated with 20 propaganda techniques. Experiments show the importance of directly modeling multilinguality and using the right fine-tuning strategy for this task.

GenAI Content Detection Task 1: English and Multilingual Machine-Generated Text Detection: AI vs. Human

arXiv ·

The GenAI Content Detection Task 1 is a shared task on detecting machine-generated text, featuring monolingual (English) and multilingual subtasks. The task, part of the GenAI workshop at COLING 2025, attracted 36 teams for the English subtask and 26 for the multilingual one. The organizers provide a detailed overview of the data, results, system rankings, and analysis of the submitted systems.