Image generation and manipulation research at VinAI

Summary

VinAI Research presented projects on advancing image generation and manipulation using GANs and Diffusion Models. For GANs, the research aims to improve utility, coverage, and output consistency. For Diffusion Models, the work focuses on speeding the models up toward real-time performance and on preventing the negative social impact of diffusion-based personalized text-to-image generation. Why it matters: This talk indicates ongoing research and development in generative AI in Southeast Asia, an area of growing interest globally.

Keywords

VinAI · GANs · Diffusion Models · image generation · image manipulation


Related

Cross-modal understanding and generation of multimodal content

MBZUAI

Nicu Sebe from the University of Trento presented recent work on video generation, focusing on animating objects in a source image using external information like labels, driving videos, or text. He introduced a Learnable Game Engine (LGE) trained from monocular annotated videos, which maintains states of scenes, objects, and agents to render controllable viewpoints. Why it matters: This talk highlights advancements in cross-modal AI, potentially enabling new applications in gaming, simulation, and content creation within the region.

Towards the generation of more realistic avatars in virtual worlds

MBZUAI

MBZUAI's Metaverse Center is developing technologies for realistic avatar generation. Hao Li and colleagues presented a novel approach at CVPR 2024, developed in collaboration with ETH Zurich, VinAI Research, and Pinscreen. The technology addresses the challenge of mapping 2D images to 3D avatars, accounting for poses, expressions, and views. Why it matters: Realistic and efficient avatar generation could improve user experience and accessibility in virtual environments across the Middle East.

Scaling Generative Adversarial Networks

MBZUAI

Axel Sauer from the University of Tübingen presented research on scaling Generative Adversarial Networks (GANs) using pretrained representations. The work explores shaping GANs into causal structures, training them up to 40 times faster, and achieving state-of-the-art image synthesis. The presentation mentions "Counterfactual Generative Networks", "Projected GANs", "StyleGAN-XL", and "StyleGAN-T". Why it matters: Scaling GANs and improving their training efficiency is crucial for advancing image and video synthesis, with implications for various applications in computer vision, graphics, and robotics.

VENOM: Text-driven Unrestricted Adversarial Example Generation with Diffusion Models

arXiv · Jan 14

The paper introduces VENOM, a text-driven framework for generating high-quality unrestricted adversarial examples using diffusion models. VENOM unifies image content generation and adversarial synthesis into a single reverse diffusion process, enhancing both attack success rate and image quality. The framework incorporates an adaptive adversarial guidance strategy with momentum to ensure the generated adversarial examples align with the distribution of natural images.
