MBZUAI researchers Nils Lukas and Toluwani Samuel Aremu will present a paper at ICML 2025 demonstrating the vulnerability of current watermarking techniques for LLMs. Their research shows that adaptive paraphrasers can evade watermark detection with negligible impact on text quality, for less than $10 of GPU compute. The attack fine-tunes a small open-weight model to rewrite sentences until surrogate keys no longer trigger detection. Why it matters: This work highlights critical weaknesses in current AI provenance methods, suggesting the need for more robust watermarking techniques to maintain trust in the authenticity of AI-generated content.
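A minimal sketch of that rewrite-until-undetected loop, assuming hypothetical callables `paraphrase` (the fine-tuned open-weight rewriter) and `surrogate_detector_score` (the attacker's surrogate watermark detector); this illustrates the attack pattern, not the authors' code:

```python
def evade_watermark(text: str, paraphrase, surrogate_detector_score,
                    threshold: float = 0.5, max_rounds: int = 10) -> str:
    """Rewrite `text` until the surrogate detector no longer flags it."""
    for _ in range(max_rounds):
        if surrogate_detector_score(text) < threshold:
            return text          # surrogate keys no longer trigger detection
        text = paraphrase(text)  # small fine-tuned model rewrites the text
    return text                  # best effort after max_rounds rewrites
```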
The paper introduces VENOM, a text-driven framework for generating high-quality unrestricted adversarial examples using diffusion models. VENOM unifies image content generation and adversarial synthesis into a single reverse diffusion process, enhancing both attack success rate and image quality. The framework incorporates an adaptive adversarial guidance strategy with momentum to ensure the generated adversarial examples align with the distribution of natural images.
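The guidance idea can be illustrated with a single simplified reverse-diffusion step. The update below is a generic momentum-smoothed gradient-guidance sketch, not VENOM's exact formulation; `denoiser`, `adv_grad`, and `step_fn` are hypothetical callables for the noise predictor, the adversarial loss gradient, and the sampler's usual update:

```python
import torch

def guided_reverse_step(x_t, t, denoiser, adv_grad, step_fn,
                        scale: float = 1.0, beta: float = 0.9,
                        momentum=None):
    """One reverse-diffusion step with momentum-smoothed adversarial guidance."""
    if momentum is None:
        momentum = torch.zeros_like(x_t)
    eps = denoiser(x_t, t)                        # standard noise prediction
    g = adv_grad(x_t, t)                          # gradient toward misclassification
    momentum = beta * momentum + (1 - beta) * g   # momentum stabilizes the guidance
    eps_guided = eps + scale * momentum           # steer denoising adversarially
    return step_fn(x_t, eps_guided, t), momentum  # x_{t-1} and carried momentum
```

Smoothing the adversarial gradient with momentum is what keeps the perturbation consistent across denoising steps, so the sample stays close to the natural-image distribution.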
Researchers at MBZUAI have demonstrated a method called "Data Laundering" to artificially boost language model benchmark scores using knowledge distillation. The technique covertly transfers benchmark-specific knowledge, leading to inflated accuracy without genuine improvements in reasoning. The study highlights a vulnerability in current AI evaluation practices and calls for more robust benchmarks.
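The transfer mechanism is ordinary soft-label knowledge distillation (Hinton et al.): a student mimics the softened output distribution of a teacher that has absorbed benchmark data. A standard distillation loss, shown here only to make the mechanism concrete, not as the paper's implementation:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T: float = 2.0):
    """Soft-label distillation: student matches the teacher's softened outputs.

    If the teacher was trained on benchmark data, its soft labels quietly
    carry benchmark-specific knowledge into the student.
    """
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    # KL(teacher || student), scaled by T^2 as in the original formulation
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```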
Researchers from the National Center for AI in Saudi Arabia investigated how sensitive Large Language Model (LLM) leaderboards are to minor benchmark perturbations. They found that small changes, such as reordering answer choices, can shift model rankings by up to 8 positions. The study recommends hybrid scoring, warns against over-reliance on simple benchmark evaluations, and provides code for further research.
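The choice-order perturbation is easy to reproduce: shuffle a question's answer options while tracking where the gold answer lands. A minimal sketch, assuming a hypothetical question schema with `prompt`, `choices`, and `answer_index` fields:

```python
import random

def perturb_choice_order(question: dict, seed: int = 0) -> dict:
    """Shuffle a multiple-choice question's options, remapping the gold label."""
    rng = random.Random(seed)
    order = list(range(len(question["choices"])))
    rng.shuffle(order)
    return {
        "prompt": question["prompt"],
        "choices": [question["choices"][i] for i in order],
        "answer_index": order.index(question["answer_index"]),
    }
```

Re-scoring models on many such seeds and comparing the resulting leaderboards is the kind of sensitivity analysis the study performs.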
The paper examines how pre-trained Arabic language models perform on text intentionally stripped of the dots that distinguish its letters, a manipulation used to evade content classification. It proposes methods to support these "undotted" texts without retraining the models; the proposed methods achieve nearly perfect performance on one downstream task. Why it matters: The research highlights a vulnerability in Arabic NLP and offers solutions to maintain performance in the face of adversarial text manipulation.
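The manipulation itself can be illustrated by mapping dotted letters onto their dotless skeleton (rasm) forms. The partial mapping below is an illustrative assumption covering common cases only, not the paper's code:

```python
# Dotted Arabic letters collapsed onto shared dotless skeletons (partial).
UNDOT = str.maketrans({
    "ب": "ٮ", "ت": "ٮ", "ث": "ٮ", "ن": "ٮ", "ي": "ى",  # beh-family skeleton
    "ج": "ح", "خ": "ح",                                  # hah-family skeleton
    "ذ": "د", "ز": "ر", "ش": "س", "ض": "ص",
    "ظ": "ط", "غ": "ع", "ف": "ڡ", "ق": "ٯ", "ة": "ه",
})

def undot(text: str) -> str:
    """Strip distinguishing dots, leaving only the letter skeletons."""
    return text.translate(UNDOT)
```

Because many letters collapse onto the same skeleton, undotted text becomes highly ambiguous to standard tokenizers and classifiers, which is what the evasion exploits.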