OmniGen: Unified Multimodal Sensor Generation for Autonomous Driving

arXiv · December 16, 2025 · Significant research

CV · RL · Research · Robotics · Autonomous Driving

Summary

The paper introduces OmniGen, a unified framework that generates aligned multimodal sensor data for autonomous driving in a shared bird's-eye-view (BEV) space. A generalizable multimodal reconstruction method, UAE, jointly decodes LiDAR and multi-view camera data through volume rendering. A Diffusion Transformer (DiT) with a ControlNet branch adds controllable generation, and the authors report good performance and consistency across modalities.
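To make the volume-rendering idea concrete, here is a minimal sketch of standard emission-absorption compositing along a single ray. It is not the paper's UAE decoder. The function name `composite_ray` and its inputs are hypothetical, chosen only to illustrate how one set of densities can yield both a camera pixel colour and a LiDAR-style range estimate.

```python
import math

def composite_ray(densities, colors, depths):
    """Composite samples along one ray (hypothetical helper, not from the paper).

    densities: non-negative density at each sample
    colors:    (r, g, b) tuple at each sample
    depths:    distance of each sample from the sensor, increasing
    Returns (rgb, expected_depth, opacity).
    """
    transmittance = 1.0          # fraction of light not yet absorbed
    rgb = [0.0, 0.0, 0.0]
    expected_depth = 0.0
    opacity = 0.0
    for i, (sigma, color, t) in enumerate(zip(densities, colors, depths)):
        # Interval length to the next sample; the last sample reuses the
        # previous spacing (assumption: at least roughly uniform sampling).
        if i + 1 < len(depths):
            delta = depths[i + 1] - t
        else:
            delta = t - depths[i - 1] if i > 0 else 1.0
        alpha = 1.0 - math.exp(-sigma * delta)   # absorption in this interval
        weight = transmittance * alpha           # contribution of this sample
        for c in range(3):
            rgb[c] += weight * color[c]          # camera branch: colour
        expected_depth += weight * t             # LiDAR branch: range
        opacity += weight
        transmittance *= 1.0 - alpha
    return tuple(rgb), expected_depth, opacity
```

Because colour and range share the same per-sample weights, a surface placed in the volume appears at the same location in both outputs, which is one plausible reading of how a shared representation keeps the two modalities aligned.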

Keywords

autonomous driving · generative models · multimodal sensor data · Bird's Eye View · Diffusion Transformer

Related

Tracking Meets Large Multimodal Models for Driving Scenario Understanding

arXiv · Mar 18

Researchers at MBZUAI have introduced a novel approach to enhance Large Multimodal Models (LMMs) for autonomous driving by integrating 3D tracking information. This method uses a track encoder to embed spatial and temporal data, enriching visual queries and improving the LMM's understanding of driving scenarios. Experiments on DriveLM-nuScenes and DriveLM-CARLA benchmarks demonstrate significant improvements in perception, planning, and prediction tasks compared to baseline models.
