Researchers at MBZUAI and other institutions have published a study at ACL 2024 investigating how jailbreak attacks work on LLMs. The study used a dataset of 30,000 prompts and non-linear probing to interpret the effects of jailbreak attacks, finding that existing interpretations were inadequate. The researchers propose a new approach to improve LLM safety against such attacks by identifying the layers in neural networks where the behavior occurs. Why it matters: Understanding and mitigating jailbreak attacks is crucial for ensuring the responsible and secure deployment of LLMs, particularly in the Arabic-speaking world where these models are increasingly being used.
Cristofaro Mune and Niek Timmers presented a seminar on bypassing unbreakable crypto using fault injection on Espressif ESP32 chips. The presentation detailed how the hardware-based Encrypted Secure Boot implementation of the ESP32 SoC was bypassed using a single EM glitch, without knowing the decryption key. This attack exploited multiple hardware vulnerabilities, enabling arbitrary code execution and extraction of plain-text data from external flash. Why it matters: The research highlights critical security vulnerabilities in embedded systems and the potential for fault injection attacks to bypass secure boot mechanisms, necessitating stronger hardware-level security measures.
A new paper from MBZUAI demonstrates that state-of-the-art speech models can be easily jailbroken using audio perturbations to generate harmful content, achieving success rates of 76-93% on models like Qwen2-Audio and LLaMA-Omni. The researchers adapted projected gradient descent (PGD) to the audio domain to optimize waveforms that push the model towards harmful responses. They propose a defense mechanism based on post-hoc activation patching that hardens models at inference time without retraining. Why it matters: This research highlights a critical vulnerability in speech-based LLMs and offers a practical solution, contributing to the development of more secure and trustworthy AI systems in the region and globally.
Researchers at TII, in cooperation with University Paderborn and Ruhr University Bochum, have discovered a vulnerability called the Opossum Attack in Transport Layer Security (TLS) impacting protocols like HTTP(S), FTP(S), POP3(S), and SMTP(S). The vulnerability exposes a risk of desynchronization between client and server communications, potentially leading to exploits like session fixation and content confusion. Scans revealed over 2.9 million potentially affected servers, including over 1.4 million IMAP servers and 1.1 million POP3 servers. Why it matters: This discovery highlights the importance of ongoing cybersecurity research in the UAE and internationally to identify and address vulnerabilities in fundamental internet protocols, especially as it led to immediate action by Apache and Cyrus IMAPd.
This paper introduces SemDiff, a novel method for generating unrestricted adversarial examples (UAEs) by exploring the semantic latent space of diffusion models. SemDiff uses multi-attribute optimization to ensure attack success while preserving the naturalness and imperceptibility of generated UAEs. Experiments on high-resolution datasets demonstrate SemDiff's superior performance compared to state-of-the-art methods in attack success rate and imperceptibility, while also evading defenses.
MBZUAI researchers presented a NeurIPS 2024 Spotlight paper that quantifies AI vulnerability by measuring bits leaked per query. Their formula predicts the minimum queries needed for attacks based on mutual information between model output and attacker's target. Experiments across seven models and three attack types (system-prompt extraction, jailbreaks, relearning) validate the relationship. Why it matters: This work offers a framework to translate UI choices (like exposing log-probs or chain-of-thought) into concrete attack surfaces, informing more secure AI design and deployment in the region.
This paper introduces Provable Unrestricted Adversarial Training (PUAT), a novel adversarial training approach. PUAT enhances robustness against both unrestricted and restricted adversarial examples while improving standard generalizability by aligning the distributions of adversarial examples, natural data, and the classifier's learned distribution. The approach uses partially labeled data and an augmented triple-GAN to generate effective unrestricted adversarial examples, demonstrating superior performance on benchmarks.
Conor McMenamin from Universitat Pompeu Fabra presented a seminar on State Machine Replication (SMR) without honest participants. The talk covered the limitations of current SMR protocols and introduced the ByRa model, a framework for player characterization free of honest participants. He then described FAIRSICAL, a sandbox SMR protocol, and discussed how the ideas could be extended to real-world protocols, with a focus on blockchains and cryptocurrencies. Why it matters: This research on SMR protocols and their incentive compatibility could lead to more robust and secure blockchain technologies in the region.