May 2023

6 articles

Top Stories

M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection

arXiv · May 24 · NLP LLM

MBZUAI researchers introduce M4, a multi-generator, multi-domain, and multi-lingual benchmark dataset for detecting machine-generated text. The study shows that detectors struggle to generalize to unseen domains or LLMs and often misclassify machine-generated text as human-written. The dataset is intended to foster research into more robust detection methods and is available on GitHub.

Bactrian-X: Multilingual Replicable Instruction-Following Models with Low-Rank Adaptation

arXiv · May 24 · NLP LLM

MBZUAI releases Bactrian-X, a multilingual parallel dataset of 3.4 million instruction-response pairs across 52 languages. The authors train low-rank adaptation (LoRA) adapters on this dataset, producing lightweight, replaceable components for large language models. Experiments show that the LoRA-based models outperform vanilla and existing instruction-tuned models in multilingual settings.
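For readers unfamiliar with LoRA, the general idea can be sketched as a frozen weight matrix plus a small trainable low-rank update. The snippet below is a minimal illustration of that standard formulation, not the Bactrian-X training code; the function names, the toy matrices, and the `alpha / rank` scaling convention are assumptions made for the example.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector,
                 alpha: float, rank: int) -> Vector:
    """Compute W x + (alpha / rank) * B (A x).

    W is the frozen base weight (d_out x d_in).
    A (rank x d_in) and B (d_out x rank) form the low-rank adapter,
    which is the only part that would be trained or swapped out.
    """
    base = matvec(W, x)                # output of the frozen layer
    update = matvec(B, matvec(A, x))   # low-rank correction
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


# Toy example (hypothetical values): a 2x2 identity layer with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]        # rank x d_in
B = [[0.5], [0.0]]      # d_out x rank
y = lora_forward(W, A, B, [2.0, 3.0], alpha=2.0, rank=1)
```

Because only `A` and `B` differ between adapters, one base model can in principle be paired with a different adapter per language or task by replacing those two small matrices.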

DetectLLM: Leveraging Log Rank Information for Zero-Shot Detection of Machine-Generated Text

arXiv · May 23 · NLP LLM

This paper introduces DetectLLM-LRR and DetectLLM-NPR, two novel zero-shot methods for detecting machine-generated text using log rank information. Experiments across three datasets and seven language models demonstrate improvements of up to 3.9 AUROC points over state-of-the-art methods. The code and data for both methods are available on GitHub.
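To make "log rank information" concrete, the sketch below computes the rank of each observed token under a scoring model's next-token distribution and averages the log of those ranks. The summary above does not give the paper's exact scoring rules, so the function names, the toy distributions, and the likelihood-to-log-rank ratio shown here are illustrative assumptions rather than the published methods.

```python
import math
from typing import List, Tuple


def token_rank(probs: List[float], token_id: int) -> int:
    """1-based rank of token_id when candidates are sorted by probability."""
    p = probs[token_id]
    return 1 + sum(1 for q in probs if q > p)


def log_rank_features(step_probs: List[List[float]],
                      token_ids: List[int]) -> Tuple[float, float]:
    """Return (mean log-likelihood, mean log-rank) of the observed tokens.

    step_probs[i] is the (assumed) next-token distribution at position i,
    and token_ids[i] is the token that actually appears there.
    """
    n = len(token_ids)
    mean_ll = sum(math.log(p[t]) for p, t in zip(step_probs, token_ids)) / n
    mean_lr = sum(math.log(token_rank(p, t))
                  for p, t in zip(step_probs, token_ids)) / n
    return mean_ll, mean_lr


def likelihood_to_log_rank_ratio(step_probs: List[List[float]],
                                 token_ids: List[int]) -> float:
    """One hypothetical way to combine the two features into a single score."""
    mean_ll, mean_lr = log_rank_features(step_probs, token_ids)
    if mean_lr == 0.0:          # every token was the top-ranked choice
        return math.inf
    return -mean_ll / mean_lr


# Toy distributions over a 3-token vocabulary (made-up numbers).
step_probs = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]]
token_ids = [0, 2]
mean_ll, mean_lr = log_rank_features(step_probs, token_ids)
score = likelihood_to_log_rank_ratio(step_probs, token_ids)
```

A zero-shot detector of this kind needs no training data: it thresholds a score computed from an existing language model's output distributions.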

Fact-Checking Complex Claims with Program-Guided Reasoning

arXiv · May 22 · NLP LLM

This paper introduces ProgramFC, a fact-checking model that decomposes complex claims into simpler sub-tasks using a library of functions. The model uses LLMs to generate reasoning programs and executes them by delegating each sub-task, which improves explainability and data efficiency. Experiments on fact-checking datasets show that ProgramFC outperforms baseline methods. The code and data are publicly available.

More This Month

Modeling Complex Object Changes in Satellite Image Time-Series: Approach based on CSP and Spatiotemporal Graph

arXiv · May 24 · CV Research

Detecting Propaganda Techniques in Code-Switched Social Media Text

arXiv · May 23 · NLP Arabic AI
