This paper analyzes Arabic text generated by LLMs like ALLaM, Jais, Llama, and GPT-4 across academic and social media domains using stylometric analysis. The study found detectable linguistic patterns that differentiate human-written from machine-generated Arabic text. BERT-based detection models achieved up to 99.9% F1-score in formal contexts, though cross-domain generalization remains a challenge. Why it matters: The research lays groundwork for detecting AI-generated misinformation in Arabic, a crucial step for preserving information integrity in Arabic-language contexts.
This study investigates the ability of six large language models, including Jais, Mistral, and GPT-4o, to mimic human emotional expression in English and personality markers in Arabic. The researchers evaluated whether machine classifiers could distinguish between human-authored and AI-generated texts and assessed the emotional/personality traits exhibited by the LLMs. Results indicate that AI-generated texts are distinguishable from human-authored ones, with classification performance impacted by paraphrasing, and that LLMs encode affective signals differently than humans. Why it matters: The findings have implications for authorship attribution, affective computing, and the responsible deployment of AI, especially in under-resourced languages like Arabic.
This paper introduces DetectLLM-LRR and DetectLLM-NPR, two novel zero-shot methods for detecting machine-generated text using log rank information. Experiments across three datasets and seven language models demonstrate improvements of up to 3.9 AUROC points over state-of-the-art methods. The code and data for both methods are available on GitHub.
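To make "log rank information" concrete, here is a minimal sketch of one plausible reading of such a score: the ratio of a passage's total log-likelihood to its total log-rank under a scoring model. The summary above does not give the formula, so the ratio form, the function names (`token_rank`, `log_likelihood_log_rank_ratio`), and the toy probability tables are all assumptions made for illustration, not the authors' implementation.

```python
import math


def token_rank(probs, token_id):
    """1-based rank of token_id when the vocabulary is sorted by descending probability."""
    p = probs[token_id]
    return 1 + sum(1 for q in probs if q > p)


def log_likelihood_log_rank_ratio(step_probs, token_ids):
    """Assumed score: -(sum of log-probabilities) / (sum of log-ranks).

    step_probs[i] is the scoring model's next-token distribution at position i,
    and token_ids[i] is the token actually observed at that position.
    """
    total_log_lik = 0.0
    total_log_rank = 0.0
    for probs, tok in zip(step_probs, token_ids):
        total_log_lik += math.log(probs[tok])
        total_log_rank += math.log(token_rank(probs, tok))
    if total_log_rank == 0.0:
        # Every observed token was the model's top choice, so the log-rank sum is zero.
        return math.inf
    return -total_log_lik / total_log_rank


# Hypothetical three-token vocabulary and two positions, for illustration only.
toy_probs = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]]
toy_tokens = [1, 2]
score = log_likelihood_log_rank_ratio(toy_probs, toy_tokens)
```

In a zero-shot setting, a score like this would be computed from a real language model's output distributions and compared against a threshold; no detector is trained.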
The GenAI Content Detection Task 1 is a shared task on detecting machine-generated text, featuring monolingual (English) and multilingual subtasks. The task, part of the GenAI workshop at COLING 2025, attracted 36 teams for the English subtask and 26 for the multilingual one. The organizers provide a detailed overview of the data, results, system rankings, and analysis of the submitted systems.
MBZUAI researchers introduce M4, a multi-generator, multi-domain, and multi-lingual benchmark dataset for detecting machine-generated text. The study reveals challenges in generalizing detection across unseen domains or LLMs, with detectors often misclassifying machine-generated text as human-written. The dataset aims to foster research into more robust detection methods and is available on GitHub.