Hassan Sajjad from Dalhousie University presented research on exploring the latent space of AI models to assess their safety and trustworthiness. He discussed use cases where analyzing latent space helps understand the robustness-generalization tradeoff in adversarial training and evaluate language comprehension. Sajjad's work aims to build better AI models and increase trust in their capabilities by looking at model internals. Why it matters: Intrinsic evaluation of model internals will become important to improving AI safety and robustness.
This paper introduces SemDiff, a novel method for generating unrestricted adversarial examples (UAEs) by exploring the semantic latent space of diffusion models. SemDiff uses multi-attribute optimization to ensure attack success while preserving the naturalness and imperceptibility of generated UAEs. Experiments on high-resolution datasets demonstrate SemDiff's superior performance compared to state-of-the-art methods in attack success rate and imperceptibility, while also evading defenses.
MBZUAI faculty Kun Zhang is researching methods to improve the reliability of generative AI, particularly in healthcare applications. Current generative AI models often act as "black boxes," making it difficult to understand why a specific result was produced. Zhang's research focuses on incorporating causal relationships into AI systems to ensure more accurate and meaningful information. Why it matters: Improving the trustworthiness of generative AI is crucial for sensitive sectors like healthcare and ensuring responsible AI deployment across the region.
MBZUAI's Dr. Artem Shelmanov is working on uncertainty quantification (UQ) methods for generative LLMs to detect unreliable generations. He aims to address the issue of LLMs fabricating facts, often called "hallucinating", without clear indicators of veracity. He systemizes existing UQ efforts, discusses caveats, and suggests novel techniques for safer LLM use. Why it matters: Improving the reliability of LLMs is crucial for responsible AI deployment in the region, especially in sensitive applications.
The paper introduces VENOM, a text-driven framework for generating high-quality unrestricted adversarial examples using diffusion models. VENOM unifies image content generation and adversarial synthesis into a single reverse diffusion process, enhancing both attack success rate and image quality. The framework incorporates an adaptive adversarial guidance strategy with momentum to ensure the generated adversarial examples align with the distribution of natural images.