DetectLLM: Leveraging Log Rank Information for Zero-Shot Detection of Machine-Generated Text

arXiv · May 23, 2023 · Significant research

Summary

This paper introduces DetectLLM-LRR and DetectLLM-NPR, two novel zero-shot methods for detecting machine-generated text using log rank information. Experiments across three datasets and seven language models demonstrate improvements of up to 3.9 AUROC points over state-of-the-art methods. The code and data for both methods are available on Github.

Keywords

LLM detection · zero-shot learning · log rank · DetectLLM-LRR · DetectLLM-NPR

Read original article →

Get the weekly digest

Top AI stories from the GCC region, every week.

LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection

arXiv · Aug 8

MBZUAI researchers release LLM-DetectAIve, a tool for fine-grained detection of machine-generated text across four categories: human-written, machine-generated, machine-written then humanized, and human-written then machine-polished. The tool aims to address concerns about misuse of LLMs, especially in education and academia, by identifying attempts to obfuscate or polish content. LLM-DetectAIve is publicly accessible with code and a demonstration video provided.

GenAI Content Detection Task 1: English and Multilingual Machine-Generated Text Detection: AI vs. Human

arXiv · Jan 19

The GenAI Content Detection Task 1 is a shared task on detecting machine-generated text, featuring monolingual (English) and multilingual subtasks. The task, part of the GenAI workshop at COLING 2025, attracted 36 teams for the English subtask and 26 for the multilingual one. The organizers provide a detailed overview of the data, results, system rankings, and analysis of the submitted systems.

A mystery fit for a DetectAIve: Classifying machine involvement in writing

MBZUAI · Invalid Date

Researchers at MBZUAI have developed LLM-DetectAIve, a tool to classify the degree of machine involvement in text generation. The system categorizes text into four types: human-written, machine-generated, machine-written and machine-humanized, and human-written and machine-polished. A demo website allows users to test the tool's ability to detect machine involvement. Why it matters: This research addresses the growing need to identify and classify AI-generated content in academic and professional settings, particularly in light of increasing LLM misuse.

M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection

arXiv · May 24

MBZUAI researchers introduce M4, a multi-generator, multi-domain, and multi-lingual benchmark dataset for detecting machine-generated text. The study reveals challenges in generalizing detection across unseen domains or LLMs, with detectors often misclassifying machine-generated text as human-written. The dataset aims to foster research into more robust detection methods and is available on GitHub.