GCC AI Research

Results for "image synthesis"

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

arXiv ·

The paper introduces the Prism Hypothesis, which posits a correspondence between an encoder's feature spectrum and its functional role, with semantic encoders capturing low-frequency components and pixel encoders retaining high-frequency information. Based on this, the authors propose Unified Autoencoding (UAE), a model that harmonizes semantic structure and pixel details using a frequency-band modulator. Experiments on ImageNet and MS-COCO demonstrate that UAE effectively unifies semantic abstraction and pixel-level fidelity, achieving state-of-the-art performance.
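The frequency split behind this hypothesis can be illustrated with a toy example. The sketch below is not the paper's frequency-band modulator: it is a minimal, assumed illustration that uses a 1-D signal in place of an image and a hard cutoff on a plain discrete Fourier transform. The names `split_bands` and `cutoff` are hypothetical. It shows that a signal separates into a low-frequency part (coarse structure) and a high-frequency part (fine detail) that sum back to the original.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT; returns the real part, valid for symmetric spectra."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def split_bands(signal, cutoff):
    """Split a signal into low- and high-frequency parts (hypothetical helper).

    Frequencies with index <= cutoff go to the low band, the rest to the
    high band. The mask treats bins k and n - k alike, so both bands
    stay real-valued after the inverse transform.
    """
    n = len(signal)
    spectrum = dft(signal)
    low, high = [], []
    for k, coeff in enumerate(spectrum):
        freq = min(k, n - k)  # fold negative frequencies onto positive ones
        if freq <= cutoff:
            low.append(coeff)
            high.append(0)
        else:
            low.append(0)
            high.append(coeff)
    return idft(low), idft(high)


# Toy signal: one slow oscillation (coarse structure) plus a weak fast one (detail).
n = 32
coarse = [math.sin(2 * math.pi * 1 * t / n) for t in range(n)]
fine = [0.3 * math.sin(2 * math.pi * 10 * t / n) for t in range(n)]
signal = [c + f for c, f in zip(coarse, fine)]

low, high = split_bands(signal, cutoff=4)
```

Here `low` recovers the slow oscillation and `high` the fast one, up to floating-point error. In the hypothesis's terms, a semantic encoder would mostly retain something like `low`, while a pixel encoder would also preserve `high`.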

Physically-Based Simulation for Generative AI Models

MBZUAI ·

Jorge Amador, a PhD student at KAUST's Visual Computing Center, presented a talk on physically-based simulation for generative AI models. The talk covered the use of synthetic data generation and physical priors to address the need for high-quality datasets. Applications discussed include photo editing, navigation, digital humans, and cosmological simulations. Why it matters: This research explores a promising technique to overcome data scarcity issues in AI, particularly relevant in resource-constrained environments or for sensitive applications.

OmniGen: Unified Multimodal Sensor Generation for Autonomous Driving

arXiv ·

The paper introduces OmniGen, a unified framework for generating aligned multimodal sensor data for autonomous driving using a shared Bird's Eye View (BEV) space. It uses a novel generalizable multimodal reconstruction method (UAE) to jointly decode LiDAR and multi-view camera data through volume rendering. The framework incorporates a Diffusion Transformer (DiT) with a ControlNet branch to enable controllable multimodal sensor generation, demonstrating good performance and multimodal consistency.

Scaling Generative Adversarial Networks

MBZUAI ·

Axel Sauer from the University of Tübingen presented research on scaling Generative Adversarial Networks (GANs) using pretrained representations. The work explores shaping GANs into causal structures, training them up to 40 times faster, and achieving state-of-the-art image synthesis. The presentation mentions "Counterfactual Generative Networks", "Projected GANs", "StyleGAN-XL", and "StyleGAN-T". Why it matters: Scaling GANs and improving their training efficiency is crucial for advancing image and video synthesis, with implications for various applications in computer vision, graphics, and robotics.

Image generation and manipulation research at VinAI

MBZUAI ·

VinAI Research presented projects focused on advancing image generation and manipulation using GANs and Diffusion Models. For GANs, the research aims to improve utility, coverage, and output consistency. For Diffusion Models, the work focuses on speeding up the models to approach real-time performance and on preventing negative social impacts of diffusion-based personalized text-to-image generation. Why it matters: This talk indicates ongoing research and development in generative AI in Southeast Asia, an area of growing interest globally.

Learned Optics — Improving Computational Imaging Systems through Deep Learning and Optimization

MBZUAI ·

KAUST Professor Wolfgang Heidrich is researching computational imaging systems that jointly design optics and image reconstruction algorithms. He focuses on hardware-software co-design for imaging systems with applications in HDR, compact cameras, and hyperspectral imaging. Heidrich's work on HDR displays was the basis for BrightSide Technologies, acquired by Dolby in 2007. Why it matters: This research aims to advance imaging technology through AI-driven design, potentially impacting various fields from consumer electronics to scientific research within the region and globally.