GCC AI Research

Archive Monthly

May 2023

6 articles

Top Stories

M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection

arXiv · NLP LLM

MBZUAI researchers introduce M4, a multi-generator, multi-domain, and multi-lingual benchmark dataset for detecting machine-generated text. The study finds that detectors struggle to generalize to unseen domains or LLMs, often misclassifying machine-generated text as human-written. The dataset aims to foster research into more robust detection methods and is available on GitHub.

Bactrian-X: Multilingual Replicable Instruction-Following Models with Low-Rank Adaptation

arXiv · NLP LLM

MBZUAI releases Bactrian-X, a multilingual parallel dataset of 3.4 million instruction-response pairs across 52 languages. The team trained low-rank adaptation (LoRA) adapters on this dataset, producing lightweight, replaceable components for large language models. Experiments show the LoRA-based models outperform both vanilla models and existing instruction-tuned models in multilingual settings.
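For readers unfamiliar with LoRA, the sketch below illustrates the general idea in its standard formulation: a frozen weight matrix W is adjusted by a low-rank product, W + (alpha / r) * B * A, and only A and B are trained. The matrix sizes, the `alpha` value, and the helper names are illustrative assumptions and are not taken from the Bactrian-X setup.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, b, a, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)  # A has shape (r, d_in), so its row count is the rank
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update with shape (d_out, d_in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_param_count(d_out, d_in, r):
    """Number of trainable adapter parameters: B is d_out x r, A is r x d_in."""
    return r * (d_out + d_in)


# Illustrative sizes only (assumed, not from the paper).
d_out, d_in, rank, alpha = 4, 6, 2, 4.0
rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]    # frozen base weight
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # small random init
b = [[0.0] * rank for _ in range(d_out)]                              # zero init

# With B initialised to zero, the adapter starts as a no-op.
w_eff = lora_effective_weight(w, b, a, alpha)
```

Because the base weight is never modified, an adapter can be swapped for another one (for example, a different language) without retraining or copying the underlying model, which is what makes such components lightweight and replaceable.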

Fact-Checking Complex Claims with Program-Guided Reasoning

arXiv · NLP LLM

This paper introduces ProgramFC, a fact-checking model that decomposes complex claims into simpler sub-tasks using a library of functions. The model uses LLMs to generate reasoning programs and executes them by delegating each sub-task to a handler, which improves explainability and data efficiency. Experiments on fact-checking datasets show that ProgramFC outperforms baseline methods. The code and data are publicly available.
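The program-guided idea can be illustrated with a toy sketch: a complex claim is written as a short program of steps, each step is delegated to a handler, and the recorded trace explains the final label. Everything here is a hypothetical stand-in rather than the ProgramFC code: the `verify` handler, the `FACTS` lookup table, and the step format are assumptions, and the program is written by hand where the real system would have an LLM generate it.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy knowledge base standing in for retrieval or a model call.
FACTS: Dict[str, bool] = {
    "Paris is the capital of France": True,
    "France is in Europe": True,
    "Paris is in Asia": False,
}


def verify(statement: str) -> bool:
    """Sub-task handler: check one simple statement against the toy facts."""
    return FACTS.get(statement, False)  # unknown statements count as unsupported


# Library of sub-task functions a program may call (assumed names).
HANDLERS: Dict[str, Callable[[str], bool]] = {"verify": verify}

Step = Tuple[str, str, str]  # (result variable, function name, argument)


def run_program(steps: List[Step]) -> Tuple[bool, Dict[str, bool]]:
    """Run each step through its handler; the claim holds only if every step holds."""
    trace: Dict[str, bool] = {}
    for var, func, arg in steps:
        trace[var] = HANDLERS[func](arg)
    return all(trace.values()), trace


# Hand-written program for the claim:
# "Paris is the capital of France, which is in Europe."
program: List[Step] = [
    ("fact_1", "verify", "Paris is the capital of France"),
    ("fact_2", "verify", "France is in Europe"),
]
label, trace = run_program(program)
```

The returned `trace` maps each intermediate variable to its result, so a reader can see which sub-claim caused a rejection. This is the sense in which decomposing a claim into explicit steps aids explainability.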