GCC AI Research

Results for "jailbreak"

How jailbreak attacks work and a new way to stop them

MBZUAI ·

Researchers at MBZUAI and other institutions have published a study at ACL 2024 investigating how jailbreak attacks work on LLMs. Using a dataset of 30,000 prompts and non-linear probing, the study interpreted the effects of jailbreak attacks and found existing interpretations inadequate. The researchers propose improving LLM safety against such attacks by identifying the network layers where the jailbreak behavior emerges. Why it matters: Understanding and mitigating jailbreak attacks is crucial for the responsible and secure deployment of LLMs, particularly in the Arabic-speaking world, where these models are increasingly used.
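
The probing idea can be sketched as follows. This is a minimal illustration, not the paper's method: the "activations" are synthetic, the hidden size and layer indices are invented, and a small MLP stands in for whatever non-linear probe the authors used. In practice, per-layer hidden states from a real LLM on benign vs. jailbreak prompts would replace the fake data.

```python
# Hedged sketch: train a non-linear probe (small MLP) on per-layer hidden
# activations to ask at which depth "jailbreak" prompts become separable
# from benign ones. All data below is synthetic.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
d = 64          # hypothetical hidden size
n = 200         # prompts per class

def fake_activations(layer):
    # Separation between classes grows with depth, mimicking behavior
    # that only emerges at certain layers of the network.
    shift = 0.05 * layer
    benign = rng.normal(0.0, 1.0, (n, d))
    jail = rng.normal(shift, 1.0, (n, d))
    X = np.vstack([benign, jail])
    y = np.array([0] * n + [1] * n)
    return X, y

accs = {}
for layer in (1, 8, 24):
    X, y = fake_activations(layer)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    probe = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    probe.fit(Xtr, ytr)
    accs[layer] = probe.score(Xte, yte)
    print(f"layer {layer:2d}: probe accuracy = {accs[layer]:.2f}")
```

On this toy data, probe accuracy rises with depth, which is the kind of signal one would read as the behavior being localized to later layers.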

CRC Seminar Series - Cristofaro Mune, Niek Timmers

TII ·

Cristofaro Mune and Niek Timmers presented a seminar on bypassing unbreakable crypto using fault injection on Espressif ESP32 chips. The presentation detailed how the hardware-based Encrypted Secure Boot implementation of the ESP32 SoC was bypassed using a single EM glitch, without knowing the decryption key. This attack exploited multiple hardware vulnerabilities, enabling arbitrary code execution and extraction of plain-text data from external flash. Why it matters: The research highlights critical security vulnerabilities in embedded systems and the potential for fault injection attacks to bypass secure boot mechanisms, necessitating stronger hardware-level security measures.
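
The control-flow consequence of such a glitch can be modeled in a few lines. This is a deliberately simplified toy, not the ESP32 attack: real EM fault injection corrupts transistor-level state at a precise moment, whereas here a flag simply overrides the outcome of a hypothetical boot-time check.

```python
# Hedged toy model: why a single well-timed fault can defeat a secure-boot
# check, assuming verification reduces to one conditional branch.
import hashlib

GOOD_DIGEST = hashlib.sha256(b"signed firmware").hexdigest()

def secure_boot(image: bytes, fault_at_check: bool = False) -> bool:
    digest_ok = hashlib.sha256(image).hexdigest() == GOOD_DIGEST
    if fault_at_check:
        # Stand-in for an EM glitch landing on the verification branch:
        # the comparison result is effectively ignored and boot proceeds.
        digest_ok = True
    return digest_ok

print(secure_boot(b"signed firmware"))                      # legit image boots
print(secure_boot(b"attacker image"))                       # normally rejected
print(secure_boot(b"attacker image", fault_at_check=True))  # glitch bypass
```

The takeaway matches the seminar's point: if integrity ultimately funnels through a single decision point, one fault at that point suffices, which is why hardened implementations use redundant checks and glitch detectors.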

Your voice can jailbreak a speech model – here’s how to stop it, without retraining

MBZUAI ·

A new paper from MBZUAI demonstrates that state-of-the-art speech models can be easily jailbroken using audio perturbations to generate harmful content, achieving success rates of 76-93% on models like Qwen2-Audio and LLaMA-Omni. The researchers adapted projected gradient descent (PGD) to the audio domain to optimize waveforms that push the model towards harmful responses. They propose a defense mechanism based on post-hoc activation patching that hardens models at inference time without retraining. Why it matters: This research highlights a critical vulnerability in speech-based LLMs and offers a practical solution, contributing to the development of more secure and trustworthy AI systems in the region and globally.
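
The PGD adaptation described above can be sketched on a raw waveform. This is a toy under loud assumptions: a fixed linear function stands in for the speech model's "harmfulness score" (so its gradient is just the weight vector), whereas the real attack backpropagates through the full model; the sample rate, budget, and step size are invented.

```python
# Hedged sketch: projected gradient descent on an audio waveform against a
# toy differentiable surrogate score, within an L-infinity perturbation ball.
import numpy as np

rng = np.random.default_rng(0)
T = 16_000                                                   # 1 s at 16 kHz
x0 = 0.1 * np.sin(2 * np.pi * 440 * np.arange(T) / 16_000)   # clean waveform
w = rng.normal(size=T) / np.sqrt(T)   # toy surrogate "harmfulness" weights

eps, alpha, steps = 0.01, 0.002, 20   # L-inf budget, step size, iterations

x = x0.copy()
for _ in range(steps):
    grad = w                             # d(score)/dx for the linear surrogate
    x = x + alpha * np.sign(grad)        # ascend the harmfulness score
    x = x0 + np.clip(x - x0, -eps, eps)  # project back into the L-inf ball
    x = np.clip(x, -1.0, 1.0)            # keep a valid audio range

print(f"score before: {w @ x0:+.4f}")
print(f"score after:  {w @ x:+.4f}")
print(f"max |delta| = {np.max(np.abs(x - x0)):.4f} (budget eps = {eps})")
```

The key property the sketch preserves is that the optimized perturbation stays inside an imperceptibly small budget while steadily increasing the adversarial objective, which is what makes such audio attacks hard to notice by listening.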

CRC Seminar Series - Conor McMenamin

TII ·

Conor McMenamin from Universitat Pompeu Fabra presented a seminar on State Machine Replication (SMR) without honest participants. The talk covered the limitations of current SMR protocols and introduced the ByRa model, a framework for player characterization free of honest participants. He then described FAIRSICAL, a sandbox SMR protocol, and discussed how the ideas could be extended to real-world protocols, with a focus on blockchains and cryptocurrencies. Why it matters: This research on SMR protocols and their incentive compatibility could lead to more robust and secure blockchain technologies in the region.

How secure is AI-generated Code: A Large-Scale Comparison of Large Language Models

arXiv ·

A study compared the vulnerability of C programs generated from a zero-shot prompt by nine state-of-the-art Large Language Models (LLMs). The researchers introduced FormAI-v2, a dataset of 331,000 C programs generated by these LLMs, and found via formal verification that at least 62.07% of the programs contained vulnerabilities. Why it matters: The research highlights the need for risk assessment and validation when deploying LLM-generated code in production environments.

K2 Think Hackathon: could your idea turn into impact in 48 hours?

MBZUAI ·

MBZUAI is hosting the K2 Think Hackathon, challenging participants to develop applications using the K2 Think reasoning model developed with G42. The hackathon involves a global idea call followed by a 48-hour build challenge in Abu Dhabi for the top 10 teams. The winning feature will be integrated into the K2 Think application. Why it matters: This hackathon provides a valuable opportunity to test and shape a cutting-edge AI model, potentially leading to innovative applications in various sectors like finance and education within the UAE and beyond.

CRC Team Places 6th in Global Cyber Security Competition

TII ·

A team from the Cryptography Research Center (CRC) secured 6th place out of 210 teams in the 'Donjon CTF 2021: Capture the Fortress' cybersecurity competition. The competition featured jeopardy-style challenges covering cryptography, reverse engineering, and hardware security. The CRC team participated to improve visibility and assess team capabilities, particularly in hardware security. Why it matters: The CRC team's strong performance highlights the growing cybersecurity expertise in the UAE and its attractiveness for talent in this field.