How many queries does it take to break an AI? We put a number on it.
MBZUAI ·
MBZUAI researchers presented a NeurIPS 2024 Spotlight paper that quantifies AI vulnerability by measuring bits leaked per query. Their formula predicts the minimum queries needed for attacks based on mutual information between model output and attacker's target. Experiments across seven models and three attack types (system-prompt extraction, jailbreaks, relearning) validate the relationship. Why it matters: This work offers a framework to translate UI choices (like exposing log-probs or chain-of-thought) into concrete attack surfaces, informing more secure AI design and deployment in the region.