The paper introduces OmniGen, a unified framework for generating aligned multimodal sensor data for autonomous driving using a shared Bird's Eye View (BEV) space. It uses a novel generalizable multimodal reconstruction method (UAE) to jointly decode LiDAR and multi-view camera data through volume rendering. The framework incorporates a Diffusion Transformer (DiT) with a ControlNet branch to enable controllable multimodal sensor generation, demonstrating good performance and multimodal consistency.
Researchers at MBZUAI have introduced a novel approach to enhance Large Multimodal Models (LMMs) for autonomous driving by integrating 3D tracking information. This method uses a track encoder to embed spatial and temporal data, enriching visual queries and improving the LMM's understanding of driving scenarios. Experiments on DriveLM-nuScenes and DriveLM-CARLA benchmarks demonstrate significant improvements in perception, planning, and prediction tasks compared to baseline models.
A new research paper systematically benchmarked two learning-based and two empirical feedforward steering controllers for autonomous racing, introducing a new Empirical Hysteresis Dynamics (EHD) formulation. The study utilized a high-fidelity simulation framework based on the real-world Abu Dhabi Autonomous Racing League competition. While learning-based controllers showed lower prediction errors in open-loop evaluation, the proposed EHD approach achieved the best overall closed-loop robustness and lap times. Why it matters: This research highlights the critical importance of evaluating control strategies within a complete software stack for autonomous racing, directly informing the development for competitions like the AADRL.
Dr. Xiaoming Liu from Michigan State University discussed computer vision techniques for 3D world understanding at a talk hosted by MBZUAI. The talk covered 3D reconstruction, detection, depth estimation, and velocity estimation, with applications in biometrics and autonomous driving. Dr. Liu also touched on anti-spoofing and fair face recognition research at MSU's Computer Vision Lab. Why it matters: Showcasing international experts and research directions helps to catalyze computer vision and 3D understanding research efforts within the UAE's AI ecosystem.