Search

Results for "jailbreak attacks"

How jailbreak attacks work and a new way to stop them

MBZUAI · Invalid Date

Researchers at MBZUAI and other institutions have published a study at ACL 2024 investigating how jailbreak attacks work on LLMs. The study used a dataset of 30,000 prompts and non-linear probing to interpret the effects of jailbreak attacks, finding that existing interpretations were inadequate. The researchers propose a new approach to improve LLM safety against such attacks by identifying the layers in neural networks where the behavior occurs. Why it matters: Understanding and mitigating jailbreak attacks is crucial for ensuring the responsible and secure deployment of LLMs, particularly in the Arabic-speaking world where these models are increasingly being used.

Your voice can jailbreak a speech model – here’s how to stop it, without retraining

MBZUAI · Invalid Date

A new paper from MBZUAI demonstrates that state-of-the-art speech models can be easily jailbroken using audio perturbations to generate harmful content, achieving success rates of 76-93% on models like Qwen2-Audio and LLaMA-Omni. The researchers adapted projected gradient descent (PGD) to the audio domain to optimize waveforms that push the model towards harmful responses. They propose a defense mechanism based on post-hoc activation patching that hardens models at inference time without retraining. Why it matters: This research highlights a critical vulnerability in speech-based LLMs and offers a practical solution, contributing to the development of more secure and trustworthy AI systems in the region and globally.

CRC Seminar Series - Cristofaro Mune, Niek Timmers

TII · Mar 17

Cristofaro Mune and Niek Timmers presented a seminar on bypassing unbreakable crypto using fault injection on Espressif ESP32 chips. The presentation detailed how the hardware-based Encrypted Secure Boot implementation of the ESP32 SoC was bypassed using a single EM glitch, without knowing the decryption key. This attack exploited multiple hardware vulnerabilities, enabling arbitrary code execution and extraction of plain-text data from external flash. Why it matters: The research highlights critical security vulnerabilities in embedded systems and the potential for fault injection attacks to bypass secure boot mechanisms, necessitating stronger hardware-level security measures.

VENOM: Text-driven Unrestricted Adversarial Example Generation with Diffusion Models

arXiv · Jan 14

The paper introduces VENOM, a text-driven framework for generating high-quality unrestricted adversarial examples using diffusion models. VENOM unifies image content generation and adversarial synthesis into a single reverse diffusion process, enhancing both attack success rate and image quality. The framework incorporates an adaptive adversarial guidance strategy with momentum to ensure the generated adversarial examples align with the distribution of natural images.

Universal Adversarial Examples in Remote Sensing: Methodology and Benchmark

arXiv · Feb 14

This paper introduces a novel black-box adversarial attack method, Mixup-Attack, to generate universal adversarial examples for remote sensing data. The method identifies common vulnerabilities in neural networks by attacking features in the shallow layer of a surrogate model. The authors also present UAE-RS, the first dataset of black-box adversarial samples in remote sensing, to benchmark the robustness of deep learning models against adversarial attacks.

Opossum Attack

TII · Mar 17

Researchers at TII, in cooperation with University Paderborn and Ruhr University Bochum, have discovered a vulnerability called the Opossum Attack in Transport Layer Security (TLS) impacting protocols like HTTP(S), FTP(S), POP3(S), and SMTP(S). The vulnerability exposes a risk of desynchronization between client and server communications, potentially leading to exploits like session fixation and content confusion. Scans revealed over 2.9 million potentially affected servers, including over 1.4 million IMAP servers and 1.1 million POP3 servers. Why it matters: This discovery highlights the importance of ongoing cybersecurity research in the UAE and internationally to identify and address vulnerabilities in fundamental internet protocols, especially as it led to immediate action by Apache and Cyrus IMAPd.