Your voice can jailbreak a speech model – here’s how to stop it, without retraining

MBZUAI · Significant research

Summary

A new paper from MBZUAI demonstrates that state-of-the-art speech models can be easily jailbroken using audio perturbations to generate harmful content, achieving success rates of 76-93% on models like Qwen2-Audio and LLaMA-Omni. The researchers adapted projected gradient descent (PGD) to the audio domain to optimize waveforms that push the model towards harmful responses. They propose a defense mechanism based on post-hoc activation patching that hardens models at inference time without retraining. Why it matters: This research highlights a critical vulnerability in speech-based LLMs and offers a practical solution, contributing to the development of more secure and trustworthy AI systems in the region and globally.

Keywords

speech models · jailbreak · MBZUAI · audio perturbations · security

Read original article →

Get the weekly digest

Top AI stories from the GCC region, every week.

How jailbreak attacks work and a new way to stop them

MBZUAI · Invalid Date

Researchers at MBZUAI and other institutions have published a study at ACL 2024 investigating how jailbreak attacks work on LLMs. The study used a dataset of 30,000 prompts and non-linear probing to interpret the effects of jailbreak attacks, finding that existing interpretations were inadequate. The researchers propose a new approach to improve LLM safety against such attacks by identifying the layers in neural networks where the behavior occurs. Why it matters: Understanding and mitigating jailbreak attacks is crucial for ensuring the responsible and secure deployment of LLMs, particularly in the Arabic-speaking world where these models are increasingly being used.

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

arXiv · Mar 6

MBZUAI researchers introduce LLMVoX, a 30M-parameter, LLM-agnostic, autoregressive streaming text-to-speech (TTS) system that generates high-quality speech with low latency. The system preserves the capabilities of the base LLM and achieves a lower Word Error Rate compared to speech-enabled LLMs. LLMVoX supports seamless, infinite-length dialogues and generalizes to new languages with dataset adaptation, including Arabic.

Research talk on Privacy and Security Issues in Speech

MBZUAI · Invalid Date

A research talk was given on privacy and security issues in speech processing, highlighting the unique privacy challenges due to the biometric information embedded in speech. The talk covered the legal landscape, proposed solutions like cryptographic and hashing-based methods, and adversarial processing techniques. Dr. Bhiksha Raj from Carnegie Mellon University, an expert in speech and audio processing, delivered the talk. Why it matters: As speech-based interfaces become more prevalent in the Middle East, understanding and addressing the associated privacy risks is crucial for ethical AI development and deployment.

How many queries does it take to break an AI? We put a number on it.

MBZUAI · Invalid Date

MBZUAI researchers presented a NeurIPS 2024 Spotlight paper that quantifies AI vulnerability by measuring bits leaked per query. Their formula predicts the minimum queries needed for attacks based on mutual information between model output and attacker's target. Experiments across seven models and three attack types (system-prompt extraction, jailbreaks, relearning) validate the relationship. Why it matters: This work offers a framework to translate UI choices (like exposing log-probs or chain-of-thought) into concrete attack surfaces, informing more secure AI design and deployment in the region.

Your voice can jailbreak a speech model – here’s how to stop it, without retraining

Summary

Keywords

Related

How jailbreak attacks work and a new way to stop them

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

Research talk on Privacy and Security Issues in Speech

How many queries does it take to break an AI? We put a number on it.