Latent Space Exploration for Safe and Trustworthy AI Models

MBZUAI · Notable

Summary

Hassan Sajjad from Dalhousie University presented research on exploring the latent space of AI models to assess their safety and trustworthiness. He discussed use cases where analyzing latent space helps understand the robustness-generalization tradeoff in adversarial training and evaluate language comprehension. Sajjad's work aims to build better AI models and increase trust in their capabilities by looking at model internals. Why it matters: Intrinsic evaluation of model internals will become important to improving AI safety and robustness.

Keywords

latent space · AI models · trustworthiness · robustness · generalization

Read original article →

Get the weekly digest

Top AI stories from the GCC region, every week.

SemDiff: Generating Natural Unrestricted Adversarial Examples via Semantic Attributes Optimization in Diffusion Models

arXiv · Apr 16

This paper introduces SemDiff, a novel method for generating unrestricted adversarial examples (UAEs) by exploring the semantic latent space of diffusion models. SemDiff uses multi-attribute optimization to ensure attack success while preserving the naturalness and imperceptibility of generated UAEs. Experiments on high-resolution datasets demonstrate SemDiff's superior performance compared to state-of-the-art methods in attack success rate and imperceptibility, while also evading defenses.

Towards trustworthy generative AI

MBZUAI · Invalid Date

MBZUAI faculty Kun Zhang is researching methods to improve the reliability of generative AI, particularly in healthcare applications. Current generative AI models often act as "black boxes," making it difficult to understand why a specific result was produced. Zhang's research focuses on incorporating causal relationships into AI systems to ensure more accurate and meaningful information. Why it matters: Improving the trustworthiness of generative AI is crucial for sensitive sectors like healthcare and ensuring responsible AI deployment across the region.

Safety of Deploying NLP Models: Uncertainty Quantification of Generative LLMs

MBZUAI · Invalid Date

MBZUAI's Dr. Artem Shelmanov is working on uncertainty quantification (UQ) methods for generative LLMs to detect unreliable generations. He aims to address the issue of LLMs fabricating facts, often called "hallucinating", without clear indicators of veracity. He systemizes existing UQ efforts, discusses caveats, and suggests novel techniques for safer LLM use. Why it matters: Improving the reliability of LLMs is crucial for responsible AI deployment in the region, especially in sensitive applications.

VENOM: Text-driven Unrestricted Adversarial Example Generation with Diffusion Models

arXiv · Jan 14

The paper introduces VENOM, a text-driven framework for generating high-quality unrestricted adversarial examples using diffusion models. VENOM unifies image content generation and adversarial synthesis into a single reverse diffusion process, enhancing both attack success rate and image quality. The framework incorporates an adaptive adversarial guidance strategy with momentum to ensure the generated adversarial examples align with the distribution of natural images.