Skip to content
GCC AI Research

Search

Results for "tweet classification"

Hunting for Spammers: Detecting Evolved Spammers on Twitter

arXiv ·

A study analyzes spam content on trending hashtags on Saudi Twitter, finding that approximately 75% of the total generated content is spam. The paper assesses the performance of previous spam detection systems on a newly gathered dataset and proposes an updated manual classification algorithm to improve accuracy. Adapted features are used to build a new data-driven detection system to respond to spammers' evolving techniques. Why it matters: The high prevalence of spam in Arabic content on Twitter necessitates the development of adaptive detection techniques to maintain the quality and trustworthiness of online information in the region.

Overview of the Arabic Sentiment Analysis 2021 Competition at KAUST

arXiv ·

KAUST organized an Arabic Sentiment Analysis Challenge where participants developed ML models to classify tweets as positive, negative, or neutral. The competition used the ASAD dataset with 55K tweets for training, 20K for validation, and 20K for final evaluation. The full dataset of 100K labeled tweets has been released for public use.

Utilizing Social Media Analytics to Detect Trends in Saudi Arabias Evolving Market

arXiv ·

This paper explores how AI and social media analytics can identify and track trends in Saudi Arabia across sectors such as construction, food and beverage, tourism, technology, and entertainment. The study analyzed millions of social media posts each month, classifying discussions and calculating scores to track trends. The AI-driven methodology was able to predict the emergence and growth of trends by utilizing social media data.

AraNet: A Deep Learning Toolkit for Arabic Social Media

arXiv ·

Researchers introduce AraNet, a deep learning toolkit for Arabic social media processing. The toolkit uses BERT models trained on social media datasets to predict age, dialect, gender, emotion, irony, and sentiment. AraNet achieves state-of-the-art or competitive performance on these tasks without feature engineering. Why it matters: The public release of AraNet accelerates Arabic NLP research by providing a comprehensive, deep learning-based tool for various social media analysis tasks.

Detecting Propaganda Techniques in Code-Switched Social Media Text

arXiv ·

This paper introduces a new task: detecting propaganda techniques in code-switched text. The authors created and released a corpus of 1,030 English-Roman Urdu code-switched texts annotated with 20 propaganda techniques. Experiments show the importance of directly modeling multilinguality and using the right fine-tuning strategy for this task.

Combining Context-Free and Contextualized Representations for Arabic Sarcasm Detection and Sentiment Identification

arXiv ·

This paper presents team SPPU-AASM's hybrid model for Arabic sarcasm and sentiment detection in the WANLP ArSarcasm shared task 2021. The model combines sentence representations from AraBERT with static word vectors trained on Arabic social media corpora. Results show the system achieves an F1-sarcastic score of 0.62 and a F-PN score of 0.715, outperforming existing approaches. Why it matters: The research demonstrates that combining context-free and contextualized representations improves performance in nuanced Arabic NLP tasks like sarcasm and sentiment analysis.

ASAD: A Twitter-based Benchmark Arabic Sentiment Analysis Dataset

arXiv ·

Researchers introduce ASAD, a new large-scale, high-quality Arabic Sentiment Analysis Dataset based on 95K tweets with positive, negative, and neutral labels. The dataset is launched with a competition sponsored by KAUST offering a total of 17000 USD in prizes. Baseline models are implemented and results reported to provide a reference for competition participants.

Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021

arXiv ·

This paper introduces two shared tasks for abusive and threatening language detection in Urdu, a low-resource language with over 170 million speakers. The tasks involve binary classification of Urdu tweets into Abusive/Non-Abusive and Threatening/Non-Threatening categories, respectively. Datasets of 2400/6000 training tweets and 1100/3950 testing tweets were created and manually annotated, along with logistic regression and BERT-based baselines. 21 teams participated and the best systems achieved F1-scores of 0.880 and 0.545 on the abusive and threatening language tasks, respectively, with m-BERT showing the best performance.