Skip to content
GCC AI Research

Search

Results for "authorship attribution"

The Arabic AI Fingerprint: Stylometric Analysis and Detection of Large Language Models Text

arXiv ·

This paper analyzes Arabic text generated by LLMs like ALLaM, Jais, Llama, and GPT-4 across academic and social media domains using stylometric analysis. The study found detectable linguistic patterns that differentiate human-written from machine-generated Arabic text. BERT-based detection models achieved up to 99.9% F1-score in formal contexts, though cross-domain generalization remains a challenge. Why it matters: The research lays groundwork for detecting AI-generated misinformation in Arabic, a crucial step for preserving information integrity in Arabic-language contexts.

A mystery fit for a DetectAIve: Classifying machine involvement in writing

MBZUAI ·

Researchers at MBZUAI have developed LLM-DetectAIve, a tool to classify the degree of machine involvement in text generation. The system categorizes text into four types: human-written, machine-generated, machine-written and machine-humanized, and human-written and machine-polished. A demo website allows users to test the tool's ability to detect machine involvement. Why it matters: This research addresses the growing need to identify and classify AI-generated content in academic and professional settings, particularly in light of increasing LLM misuse.

FAID: Fine-Grained AI-Generated Text Detection Using Multi-Task Auxiliary and Multi-Level Contrastive Learning

arXiv ·

MBZUAI researchers introduce FAID, a fine-grained AI-generated text detection framework capable of classifying text as human-written, LLM-generated, or collaboratively written. FAID utilizes multi-level contrastive learning and multi-task auxiliary classification to capture authorship and model-specific characteristics, and can identify the underlying LLM family. The framework outperforms existing baselines, especially in generalizing to unseen domains and new LLMs, and includes a multilingual, multi-domain dataset called FAIDSet.

Is AI Catching Up to Human Expression? Exploring Emotion, Personality, Authorship, and Linguistic Style in English and Arabic with Six Large Language Models

arXiv ·

This study investigates the ability of six large language models, including Jais, Mistral, and GPT-4o, to mimic human emotional expression in English and personality markers in Arabic. The researchers evaluated whether machine classifiers could distinguish between human-authored and AI-generated texts and assessed the emotional/personality traits exhibited by the LLMs. Results indicate that AI-generated texts are distinguishable from human-authored ones, with classification performance impacted by paraphrasing, and that LLMs encode affective signals differently than humans. Why it matters: The findings have implications for authorship attribution, affective computing, and the responsible deployment of AI, especially in under-resourced languages like Arabic.

LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection

arXiv ·

MBZUAI researchers release LLM-DetectAIve, a tool for fine-grained detection of machine-generated text across four categories: human-written, machine-generated, machine-written then humanized, and human-written then machine-polished. The tool aims to address concerns about misuse of LLMs, especially in education and academia, by identifying attempts to obfuscate or polish content. LLM-DetectAIve is publicly accessible with code and a demonstration video provided.

M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection

arXiv ·

MBZUAI researchers introduce M4, a multi-generator, multi-domain, and multi-lingual benchmark dataset for detecting machine-generated text. The study reveals challenges in generalizing detection across unseen domains or LLMs, with detectors often misclassifying machine-generated text as human-written. The dataset aims to foster research into more robust detection methods and is available on GitHub.

M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection

arXiv ·

MBZUAI researchers introduce M4GT-Bench, a new benchmark for evaluating machine-generated text (MGT) detection across multiple languages and domains. The benchmark includes tasks for binary MGT detection, identifying the specific model that generated the text, and detecting mixed human-machine text. Experiments with baseline models and human evaluation show that MGT detection performance is highly dependent on access to training data from the same domain and generators.