The Arabic AI Fingerprint: Stylometric Analysis and Detection of Large Language Models Text
arXiv
This paper uses stylometric analysis to examine Arabic text generated by LLMs such as ALLaM, Jais, Llama, and GPT-4 across academic and social media domains. The study found detectable linguistic patterns that distinguish human-written from machine-generated Arabic text. BERT-based detection models achieved F1-scores of up to 99.9% in formal contexts, though cross-domain generalization remains a challenge.

Why it matters: The research lays groundwork for detecting AI-generated misinformation in Arabic, a crucial step for preserving information integrity in Arabic-language contexts.
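
To make "stylometric analysis" concrete, here is a minimal sketch of the kind of surface features such a study might compare between human-written and machine-generated text. The feature set (type-token ratio, mean word length, mean sentence length) and the function name `stylometric_features` are illustrative assumptions, not the paper's actual feature set or method.

```python
import re


def stylometric_features(text: str) -> dict:
    """Compute a few simple stylometric features (hypothetical feature set)."""
    # Split on Latin and Arabic sentence-ending punctuation (. ! ? and the Arabic ؟).
    sentences = [s for s in re.split(r"[.!?؟]+", text) if s.strip()]
    # \w+ matches Unicode letters and digits, including Arabic letters.
    # Assumption: text is undiacritized; combining marks would split tokens.
    tokens = re.findall(r"\w+", text)
    if not tokens:
        return {
            "type_token_ratio": 0.0,
            "mean_word_length": 0.0,
            "mean_sentence_length": 0.0,
        }
    return {
        # Lexical diversity: distinct tokens divided by total tokens.
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # Average number of characters per token.
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),
        # Average number of tokens per sentence.
        "mean_sentence_length": len(tokens) / max(len(sentences), 1),
    }


if __name__ == "__main__":
    sample = "ذهب الولد الى المدرسة. ذهب الولد الى البيت."
    print(stylometric_features(sample))
```

In practice, features like these would be computed over many human and machine samples and compared statistically. The BERT-based detectors mentioned above learn their own representations instead of relying on hand-picked features.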