Researchers at MBZUAI have introduced a novel approach that enhances Large Multimodal Models (LMMs) for autonomous driving by integrating 3D tracking information. A track encoder embeds the spatial and temporal information carried by object tracks, and these embeddings enrich the model's visual queries, improving the LMM's understanding of driving scenes. Experiments on the DriveLM-nuScenes and DriveLM-CARLA benchmarks show significant improvements over baseline models in perception, prediction, and planning tasks.
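The track-encoder idea can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not the authors' architecture: each track is reduced to its latest 3D position and mean velocity, a small randomly initialized MLP (ToyTrackEncoder, a hypothetical name) maps those features to the width of a visual query, and the result is added element-wise to the matching query.

```python
import math
import random
from typing import List, Sequence, Tuple

# A track sample: (x, y, z, t), assumed to be ego-centric meters and seconds.
TrackPoint = Tuple[float, float, float, float]


def track_features(track: Sequence[TrackPoint]) -> List[float]:
    """Summarize a track as [x, y, z, vx, vy, vz]: latest position plus mean velocity."""
    if len(track) < 2:
        raise ValueError("need at least two samples to estimate velocity")
    x0, y0, z0, t0 = track[0]
    x1, y1, z1, t1 = track[-1]
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("timestamps must increase")
    return [x1, y1, z1, (x1 - x0) / dt, (y1 - y0) / dt, (z1 - z0) / dt]


class ToyTrackEncoder:
    """Hypothetical two-layer MLP (ReLU) mapping track features to a query-sized vector."""

    def __init__(self, in_dim: int, hidden: int, out_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)  # fixed random weights stand in for learned ones
        self.w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)]
                   for _ in range(hidden)]
        self.w2 = [[rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
                   for _ in range(out_dim)]

    def __call__(self, feats: Sequence[float]) -> List[float]:
        hidden = [max(0.0, sum(w * f for w, f in zip(row, feats))) for row in self.w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]


def enrich_queries(queries: List[List[float]], track_embs: List[List[float]]) -> List[List[float]]:
    """Fuse by element-wise addition: query i is enriched with track embedding i."""
    if len(queries) != len(track_embs):
        raise ValueError("expected one track embedding per query")
    return [[q + e for q, e in zip(qv, ev)] for qv, ev in zip(queries, track_embs)]


# Usage: one vehicle observed at three timestamps, paired with one 8-dim visual query.
encoder = ToyTrackEncoder(in_dim=6, hidden=16, out_dim=8)
track = [(10.0, 2.0, 0.0, 0.0), (11.0, 2.0, 0.0, 0.5), (12.0, 2.5, 0.0, 1.0)]
feats = track_features(track)  # [12.0, 2.5, 0.0, 2.0, 0.5, 0.0]
queries = [[0.1] * 8]
enriched = enrich_queries(queries, [encoder(feats)])
```

Element-wise addition mirrors how positional embeddings are commonly injected into query tokens; concatenation followed by a learned projection would be an equally plausible fusion choice.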
A researcher at the University of Oxford presented new findings on 3D neural reconstruction. The talk introduced a dataset of real-world video captures paired with highly accurate 3D models, along with a novel joint optimization method that refines camera poses during reconstruction. Why it matters: High-quality 3D reconstruction has broad applications in robotics and computer vision across the region.
Dr. Xiaoming Liu of Michigan State University discussed computer vision techniques for understanding the 3D world in a talk hosted by MBZUAI. The talk covered 3D reconstruction, detection, depth estimation, and velocity estimation, with applications in biometrics and autonomous driving. Dr. Liu also touched on research into anti-spoofing and fair face recognition at MSU's Computer Vision Lab. Why it matters: Hosting international experts and showcasing their research directions helps catalyze computer vision and 3D understanding research within the UAE's AI ecosystem.
KAUST's Peter Wonka discusses the challenges of, and recent advances in, creating data-rich three-dimensional maps for a range of applications. His team is working with Boeing on 3D modeling tools for aerospace design. KAUST-funded FalconViz uses drones (UAVs) to create 3D maps of disaster areas for first responders. Why it matters: This highlights KAUST's contributions to cutting-edge 3D modeling and their practical use in industries such as aerospace and disaster response in the region.
This work presents a dual pose-graph architecture for robust, real-time localization in autonomous drone racing. The system fuses monocular visual-inertial odometry (VIO) with semantic gate detections: a temporary graph optimizes multiple observations into refined constraints, which are then promoted to a persistent main graph. Evaluated on the TII-RATM dataset and deployed in the A2RL competition, the system reduced Absolute Trajectory Error (ATE) by 56-74% compared with standalone VIO and cut odometry drift by up to 4.2 meters per lap. Why it matters: This research substantially improves the reliability and accuracy of vision-based localization for high-speed autonomous drones, which is crucial for advanced robotics applications and competitive racing.
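To make the reported ATE figures concrete, here is a minimal sketch of how ATE is commonly computed: the root-mean-square of position errors between an estimated trajectory and ground truth. It assumes the two trajectories are already time-associated and aligned (the alignment step is omitted), and it uses made-up toy trajectories rather than the paper's data; ate_rmse and percent_reduction are illustrative helper names.

```python
import math
from typing import Sequence, Tuple

Position = Tuple[float, float, float]


def ate_rmse(estimated: Sequence[Position], ground_truth: Sequence[Position]) -> float:
    """Root-mean-square translational error over time-matched positions.

    Assumes both trajectories are already associated by timestamp and aligned
    (for example with an SE(3) or Sim(3) fit); alignment is not done here.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared) / len(squared))


def percent_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction, in percent, of `improved` versus `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Made-up example: VIO drifts 0.4 m sideways per step, the fused estimate 0.1 m.
ground_truth = [(float(i), 0.0, 0.0) for i in range(4)]
vio_only = [(float(i), 0.4 * i, 0.0) for i in range(4)]
fused = [(float(i), 0.1 * i, 0.0) for i in range(4)]

ate_vio = ate_rmse(vio_only, ground_truth)  # about 0.748 m
ate_fused = ate_rmse(fused, ground_truth)  # about 0.187 m
print(f"ATE reduction: {percent_reduction(ate_vio, ate_fused):.1f}%")  # ATE reduction: 75.0%
```

Because ATE is an RMSE over the whole trajectory, large late-lap drift dominates it, which is why suppressing accumulated drift (as the gate constraints aim to do) shows up strongly in this metric.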
This paper presents a decentralized multi-agent unmanned aerial system designed to search for, pick up, and relocate objects. It integrates multi-agent aerial exploration, object detection and tracking, and aerial gripping, relying on global state estimation, reactive collision avoidance, and sweep planning for exploration. Why it matters: The system's successful deployment in demonstrations and in competitions such as MBZIRC highlights the potential of integrated robotic solutions for complex tasks like search and rescue in the region.
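The summary mentions sweep planning for exploration without detail. A minimal sketch, assuming a rectangular search area split into equal strips, one per drone, with each strip covered by a back-and-forth (boustrophedon, or "lawnmower") pattern; agent_strip and lawnmower are hypothetical names, and the paper's actual planner, partitioning scheme, and collision avoidance are not reproduced here.

```python
import math
from typing import List, Tuple

Waypoint = Tuple[float, float]


def agent_strip(width: float, n_agents: int, agent_id: int) -> Tuple[float, float]:
    """Split [0, width] along x into equal strips; each agent derives its own
    strip from its ID and the team size (hypothetical partition scheme)."""
    if not 0 <= agent_id < n_agents:
        raise ValueError("agent_id out of range")
    strip = width / n_agents
    return agent_id * strip, (agent_id + 1) * strip


def lawnmower(x_min: float, x_max: float, height: float, lane_spacing: float) -> List[Waypoint]:
    """Back-and-forth waypoints covering [x_min, x_max] x [0, height].

    Lanes run along y, spaced lane_spacing apart (e.g. the camera footprint
    width); lane centers are clamped so they never leave the strip.
    """
    if lane_spacing <= 0:
        raise ValueError("lane_spacing must be positive")
    n_lanes = max(1, math.ceil((x_max - x_min) / lane_spacing))
    waypoints: List[Waypoint] = []
    for i in range(n_lanes):
        x = min(x_min + (i + 0.5) * lane_spacing, x_max)
        ys = (0.0, height) if i % 2 == 0 else (height, 0.0)  # alternate direction
        waypoints.extend((x, y) for y in ys)
    return waypoints


# Usage: a 40 m x 30 m area, two drones, 5 m lane spacing.
plans = [lawnmower(*agent_strip(40.0, 2, k), 30.0, 5.0) for k in range(2)]
for k, plan in enumerate(plans):
    print(k, plan[:4])
```

Because each drone computes its strip locally from shared constants, no central planner or coordination messages are needed to divide the area, which fits the decentralized design described above; runtime conflicts would still be left to a separate collision-avoidance layer.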