Hunting for Spammers: Detecting Evolved Spammers on Twitter

arXiv · December 8, 2015 · Notable

Summary

A study analyzes spam in trending hashtags on Saudi Twitter and finds that approximately 75% of all generated content is spam. The paper assesses how previous spam detection systems perform on a newly gathered dataset and proposes an updated manual classification algorithm to improve accuracy. The authors then use adapted features to build a new data-driven detection system that responds to spammers' evolving techniques. Why it matters: The high prevalence of spam in Arabic content on Twitter calls for adaptive detection techniques that preserve the quality and trustworthiness of online information in the region.
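The summary does not list the paper's adapted features or its classifier. As a rough illustration of what a feature-based tweet spam detector looks like, here is a minimal sketch that extracts a few commonly cited tweet-level signals (URL, hashtag and mention counts, hashtag density) and labels new tweets with a nearest-centroid rule. The feature set, the function names (`extract_features`, `fit_centroids`, `predict`) and the nearest-centroid classifier are hypothetical stand-ins chosen for brevity, not the authors' method.

```python
import re

# Hypothetical tweet-level features; the paper's actual adapted feature set is
# not given in the summary, so these are illustrative stand-ins only.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")   # \w is Unicode-aware, so Arabic hashtags match
MENTION_RE = re.compile(r"@\w+")


def extract_features(text):
    """Return a small numeric feature vector for one tweet."""
    n_words = max(len(text.split()), 1)
    n_hashtags = len(HASHTAG_RE.findall(text))
    return [
        len(URL_RE.findall(text)),      # number of links
        n_hashtags,                     # number of hashtags
        len(MENTION_RE.findall(text)),  # number of @-mentions
        n_hashtags / n_words,           # hashtag density
    ]


def fit_centroids(texts, labels):
    """Nearest-centroid training: average the feature vectors of each class."""
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(extract_features(text))
    return {
        label: [sum(col) / len(col) for col in zip(*vectors)]
        for label, vectors in by_label.items()
    }


def predict(centroids, text):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    x = extract_features(text)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


if __name__ == "__main__":
    # Toy, made-up examples purely to exercise the code.
    train_texts = [
        "#trend #deal #offer buy now http://spam.example @a @b",
        "#sale #trend followers cheap http://x.example http://y.example",
        "Interesting talk today about #AI in Riyadh",
        "Had a great time with family this weekend",
    ]
    train_labels = ["spam", "spam", "ham", "ham"]
    centroids = fit_centroids(train_texts, train_labels)
    print(predict(centroids, "#trend #offer click http://z.example @c"))
```

A real system would replace the toy features and classifier with ones validated on labeled data; the point here is only the two-stage shape of extracting features and then fitting a data-driven classifier, which lets the feature set be revised as spammers change tactics.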

Keywords

spam detection · Arabic NLP · Twitter · Saudi Arabia · social media



Understanding & Predicting User Lifetime with Machine Learning in an Anonymous Location-Based Social Network

arXiv · Mar 1

Researchers studied user lifetime prediction in the location-based social network Jodel within Saudi Arabia, leveraging its disjoint communities. Machine learning models, particularly Random Forest, were trained to predict user lifetime as a regression and classification problem. A single countrywide model generalizes well and performs similarly to community-specific models.

Detecting Propaganda Techniques in Code-Switched Social Media Text

arXiv · May 23

This paper introduces a new task: detecting propaganda techniques in code-switched text. The authors created and released a corpus of 1,030 English-Roman Urdu code-switched texts annotated with 20 propaganda techniques. Experiments show the importance of directly modeling multilinguality and using the right fine-tuning strategy for this task.

FAID: Fine-Grained AI-Generated Text Detection Using Multi-Task Auxiliary and Multi-Level Contrastive Learning

arXiv · May 20

MBZUAI researchers introduce FAID, a fine-grained AI-generated text detection framework that classifies text as human-written, LLM-generated, or written collaboratively by humans and LLMs. FAID uses multi-level contrastive learning and multi-task auxiliary classification to capture authorship and model-specific characteristics, and can identify the underlying LLM family. The framework outperforms existing baselines, especially when generalizing to unseen domains and new LLMs, and includes a multilingual, multi-domain dataset called FAIDSet.

The Arabic AI Fingerprint: Stylometric Analysis and Detection of Large Language Models Text

arXiv · May 29

This paper uses stylometric analysis to examine Arabic text generated by LLMs such as ALLaM, Jais, Llama, and GPT-4 across academic and social media domains. The study found detectable linguistic patterns that distinguish human-written from machine-generated Arabic text. BERT-based detection models achieved F1-scores of up to 99.9% in formal contexts, though cross-domain generalization remains a challenge. Why it matters: The research lays groundwork for detecting AI-generated misinformation in Arabic, a crucial step for preserving information integrity in Arabic-language contexts.
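To make "stylometric analysis" concrete: the summary does not specify which stylometric features the paper uses, so the following is a minimal sketch computing three generic statistics common in stylometry (average word length, type-token ratio, average sentence length). The feature choice and the `stylometric_profile` name are assumptions for illustration. The study's reported detectors are BERT-based, which is a separate approach from hand-crafted profiles like this one.

```python
import re

# Generic stylometric features chosen for illustration; not the paper's feature set.
WORD_RE = re.compile(r"\w+")           # Unicode-aware, so Arabic words are matched
SENT_SPLIT_RE = re.compile(r"[.!?؟]+")  # includes the Arabic question mark


def stylometric_profile(text):
    """Compute a few simple stylometric statistics for one document."""
    words = WORD_RE.findall(text)
    sentences = [s for s in SENT_SPLIT_RE.split(text) if s.strip()]
    if not words:
        return {"avg_word_len": 0.0, "type_token_ratio": 0.0, "avg_sent_len": 0.0}
    return {
        # mean number of characters per word
        "avg_word_len": sum(len(w) for w in words) / len(words),
        # share of distinct word forms (lexical diversity)
        "type_token_ratio": len(set(words)) / len(words),
        # mean number of words per sentence
        "avg_sent_len": len(words) / max(len(sentences), 1),
    }


if __name__ == "__main__":
    print(stylometric_profile("the cat sat. the dog sat!"))
    print(stylometric_profile("مرحبا بكم؟ شكرا."))
```

Comparing the distributions of such statistics between human-written and machine-generated corpora is one simple way to surface the kind of detectable patterns the study reports.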