Overview of the Arabic Sentiment Analysis 2021 Competition at KAUST

arXiv · Notable

Summary

KAUST organized an Arabic Sentiment Analysis Challenge in which participants developed machine learning models to classify tweets as positive, negative, or neutral. The competition used the ASAD dataset, with 55K tweets for training, 20K for validation, and 20K for final evaluation. The full dataset of 100K labeled tweets has been released for public use.

Keywords

Arabic Sentiment Analysis · KAUST · ASAD dataset · machine learning · tweet classification

Related

ASAD: A Twitter-based Benchmark Arabic Sentiment Analysis Dataset

arXiv

Researchers introduce ASAD, a new large-scale, high-quality Arabic Sentiment Analysis Dataset of 95K tweets labeled positive, negative, or neutral. The dataset is launched alongside a competition sponsored by KAUST that offers a total of 17,000 USD in prizes. Baseline models are implemented and their results reported as a reference for competition participants.

Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021

arXiv

This paper introduces two shared tasks for abusive and threatening language detection in Urdu, a low-resource language with over 170 million speakers. The tasks involve binary classification of Urdu tweets into Abusive/Non-Abusive and Threatening/Non-Threatening categories, respectively. Manually annotated datasets of 2,400/6,000 training tweets and 1,100/3,950 test tweets were created for the two tasks, and logistic regression and BERT-based baselines were provided. Twenty-one teams participated; the best systems achieved F1-scores of 0.880 and 0.545 on the abusive and threatening language tasks, respectively, with m-BERT showing the best performance.

Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2021

arXiv

This paper provides an overview of the UrduFake@FIRE2021 shared task, which focused on fake news detection in the Urdu language. The task involved binary classification of news articles as real or fake, using a dataset of 1,300 training and 300 test articles across five domains. Thirty-four teams registered, 18 submitted results, and 11 provided technical reports describing approaches ranging from bag-of-words (BoW) to Transformer models. The best system achieved an F1-macro score of 0.679.
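For readers unfamiliar with the F1-macro metric cited above, the following is a minimal sketch of how it is commonly computed: the per-class F1 scores are averaged with equal weight, so the minority class counts as much as the majority class. The function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the shared task or its evaluation scripts.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy labels invented for illustration only (not task data).
y_true = ["fake", "fake", "real", "real", "real", "real"]
y_pred = ["fake", "real", "real", "real", "real", "fake"]
score = macro_f1(y_true, y_pred)
print(f"macro-F1 = {score:.3f}")
```

In this toy case the "fake" class scores 0.5 and the "real" class 0.75, giving a macro average of 0.625, even though "real" has twice as many examples.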

UrduFake@FIRE2021: Shared Track on Fake News Identification in Urdu

arXiv

The UrduFake@FIRE2021 shared task focused on fake news detection in the Urdu language, framed as a binary classification problem. Thirty-four teams registered, 18 submitted results, and 11 provided technical reports covering a range of approaches. The top-performing system used the stochastic gradient descent (SGD) algorithm and achieved an F-score of 0.679.