MBZUAI researchers Nils Lukas and Toluwani Samuel Aremu will present a paper at ICML 2025 demonstrating the vulnerability of current watermarking techniques in LLMs. Their research shows that adaptive paraphrasers can evade detection from watermarks with negligible impact on text quality, costing less than $10 of GPU compute. The attack involves fine-tuning a small open-weight model to rewrite sentences until surrogate keys no longer trigger detection. Why it matters: This work highlights critical weaknesses in current AI provenance methods, suggesting the need for more robust watermarking techniques to maintain trust in the authenticity of AI-generated content.
The paper introduces VENOM, a text-driven framework for generating high-quality unrestricted adversarial examples using diffusion models. VENOM unifies image content generation and adversarial synthesis into a single reverse diffusion process, enhancing both attack success rate and image quality. The framework incorporates an adaptive adversarial guidance strategy with momentum to ensure the generated adversarial examples align with the distribution of natural images.
This paper introduces SemDiff, a novel method for generating unrestricted adversarial examples (UAEs) by exploring the semantic latent space of diffusion models. SemDiff uses multi-attribute optimization to ensure attack success while preserving the naturalness and imperceptibility of generated UAEs. Experiments on high-resolution datasets demonstrate SemDiff's superior performance compared to state-of-the-art methods in attack success rate and imperceptibility, while also evading defenses.
The paper introduces LLMEffiChecker, a tool to test the computational efficiency robustness of LLMs by identifying vulnerabilities that can significantly degrade performance. LLMEffiChecker uses both white-box (gradient-guided perturbation) and black-box (causal inference-based perturbation) methods to delay the generation of the end-of-sequence token. Experiments on nine public LLMs demonstrate that LLMEffiChecker can substantially increase response latency and energy consumption with minimal input perturbations.
Researchers at MBZUAI have demonstrated a method called "Data Laundering" to artificially boost language model benchmark scores using knowledge distillation. The technique covertly transfers benchmark-specific knowledge, leading to inflated accuracy without genuine improvements in reasoning. The study highlights a vulnerability in current AI evaluation practices and calls for more robust benchmarks.