A researcher at the University of Oxford presented new findings on 3D neural reconstruction. The talk introduced a dataset comprising real-world video captures with perfect 3D models. A novel joint optimization method refines camera poses during the reconstruction process. Why it matters: High-quality 3D reconstruction has broad applicability to robotics and computer vision applications in the region.
The paper introduces OmniGen, a unified framework for generating aligned multimodal sensor data for autonomous driving using a shared Bird's Eye View (BEV) space. It uses a novel generalizable multimodal reconstruction method (UAE) to jointly decode LiDAR and multi-view camera data through volume rendering. The framework incorporates a Diffusion Transformer (DiT) with a ControlNet branch to enable controllable multimodal sensor generation, demonstrating good performance and multimodal consistency.
Pascal Fua from EPFL presented an approach to implementing convolutional neural nets that output complex 3D surface meshes. The method overcomes limitations in converting implicit representations to explicit surface representations. Applications include single view reconstruction, physically-driven shape optimization, and bio-medical image segmentation. Why it matters: This research advances geometric deep learning by enabling end-to-end trainable models for 3D surface mesh generation, with potential impact on various applications in computer vision and biomedical imaging in the region.