GCC AI Research


Results for "3D tracking"

Tracking Meets Large Multimodal Models for Driving Scenario Understanding

arXiv

Researchers at MBZUAI have introduced a novel approach to enhance Large Multimodal Models (LMMs) for autonomous driving by integrating 3D tracking information. This method uses a track encoder to embed spatial and temporal data, enriching visual queries and improving the LMM's understanding of driving scenarios. Experiments on DriveLM-nuScenes and DriveLM-CARLA benchmarks demonstrate significant improvements in perception, planning, and prediction tasks compared to baseline models.
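The summary does not spell out the track encoder's architecture, but the core idea — embedding each object's spatio-temporal track and fusing it into the visual queries fed to the LMM — can be sketched as follows (the linear projection, mean pooling, and additive fusion below are illustrative assumptions, not the authors' design):

```python
import numpy as np

def encode_track(track, W, b):
    """Embed one object's 3D track.

    track: (T, 4) array of (x, y, z, t) samples over time.
    W, b:  projection weights into the visual-query dimension.
    """
    feats = np.tanh(track @ W + b)   # per-timestep features, (T, d)
    return feats.mean(axis=0)        # temporal pooling -> (d,)

def enrich_queries(visual_queries, tracks, W, b):
    """Fuse each object's track embedding into its matching visual query."""
    enriched = visual_queries.copy()
    for i, track in enumerate(tracks):
        enriched[i] = enriched[i] + encode_track(track, W, b)
    return enriched
```

The enriched queries can then be passed to the LMM in place of the purely visual ones, letting the model attend to where objects have been and how they move.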

High-quality Neural Reconstruction in Real-world Scenes

MBZUAI

A researcher at the University of Oxford presented new findings on 3D neural reconstruction. The talk introduced a dataset of real-world video captures paired with highly accurate ground-truth 3D models, along with a novel joint optimization method that refines camera poses during the reconstruction process. Why it matters: High-quality 3D reconstruction is broadly applicable to robotics and computer vision work in the region.
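The joint-optimization idea — refining camera poses together with the scene geometry rather than fixing them up front — can be illustrated with a deliberately simplified toy model (translation-only cameras and alternating closed-form updates are assumptions for illustration, not the method from the talk):

```python
import numpy as np

def joint_refine(obs, n_iters=50):
    """Jointly estimate 3D points and camera translations.

    Toy model: obs[c, i] = points[i] + poses[c]  (translation-only cameras).
    Alternates closed-form updates for structure and poses, fixing the
    first camera at the origin to remove the gauge ambiguity.
    """
    C, N, _ = obs.shape
    poses = np.zeros((C, 3))
    for _ in range(n_iters):
        points = (obs - poses[:, None]).mean(axis=0)   # update 3D points
        poses = (obs - points[None]).mean(axis=1)      # update camera poses
        poses -= poses[0]                              # gauge fix: camera 0 at origin
    return points, poses
```

Real neural reconstruction pipelines optimize full 6-DoF poses alongside a learned scene representation, but the alternating structure is the same.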

Computer Vision: A Journey of Pursuing 3D World Understanding

MBZUAI

Dr. Xiaoming Liu from Michigan State University discussed computer vision techniques for 3D world understanding in a talk hosted by MBZUAI. The talk covered 3D reconstruction, detection, depth estimation, and velocity estimation, with applications in biometrics and autonomous driving. Dr. Liu also touched on anti-spoofing and fair face recognition research at MSU's Computer Vision Lab. Why it matters: Showcasing international experts and their research directions helps catalyze computer vision and 3D understanding research within the UAE's AI ecosystem.

Computing in three dimensions: A conversation with Peter Wonka

KAUST

KAUST's Peter Wonka discusses the challenges and advances in creating data-rich, three-dimensional maps for a range of applications. His team is working with Boeing on 3D modeling tools for aerospace design, and KAUST-funded FalconViz uses UAVs to create 3D maps of disaster areas for first responders. Why it matters: This highlights KAUST's contribution to cutting-edge 3D modeling and its practical applications in industries like aerospace and disaster response in the region.

Dual Pose-Graph Semantic Localization for Vision-Based Autonomous Drone Racing

arXiv

This work presents a dual pose-graph architecture for robust real-time localization in autonomous drone racing. The system fuses monocular visual-inertial odometry with semantic gate detections, using a temporary graph to optimize multiple observations into refined constraints before promoting them to a persistent main graph. Evaluated on the TII-RATM dataset and deployed in the A2RL competition, it achieved a 56-74% reduction in Absolute Trajectory Error (ATE) compared to standalone VIO and reduced odometry drift by up to 4.2 meters per lap. Why it matters: This research significantly improves the reliability and accuracy of vision-based localization for high-speed autonomous drones, crucial for advanced robotics applications and competitive racing.
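The temporary-to-persistent promotion at the heart of the dual-graph design can be sketched in a few lines (class and parameter names are hypothetical, and the per-gate "optimization" is reduced to a robust median; the actual system runs full pose-graph factor optimization over VIO and semantic constraints):

```python
import numpy as np

class DualPoseGraph:
    """Sketch of the dual-graph idea: buffer gate detections in a
    temporary graph, refine them, then promote to a persistent graph."""

    def __init__(self, min_obs=5):
        self.temp = {}        # gate_id -> list of raw observed positions
        self.main = {}        # gate_id -> refined landmark constraint
        self.min_obs = min_obs

    def add_observation(self, gate_id, position):
        self.temp.setdefault(gate_id, []).append(np.asarray(position, float))
        if len(self.temp[gate_id]) >= self.min_obs:
            self._promote(gate_id)

    def _promote(self, gate_id):
        obs = np.stack(self.temp.pop(gate_id))
        # refine multiple noisy detections into one constraint (robust median)
        self.main[gate_id] = np.median(obs, axis=0)
```

Keeping noisy single-frame detections out of the main graph until they have been consolidated is what protects the persistent estimate from spurious gate observations at racing speeds.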

A Decentralized Multi-Agent Unmanned Aerial System to Search, Pick Up, and Relocate Objects

arXiv

This paper presents a decentralized multi-agent unmanned aerial system designed to search for, pick up, and relocate objects. It integrates multi-agent aerial exploration, object detection and tracking, and aerial gripping, relying on global state estimation, reactive collision avoidance, and sweep planning for exploration. Why it matters: The system's successful deployment in demonstrations and competitions such as MBZIRC highlights the potential of integrated robotic solutions for complex tasks like search and rescue in the region.
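The sweep-planning component can be illustrated with a standard boustrophedon (lawnmower) pattern over a rectangular search area, split into one strip per agent (function names and the rectangular-area assumption are illustrative; the paper's planner is not reproduced here):

```python
def sweep_waypoints(x_min, x_max, y_min, y_max, spacing):
    """Boustrophedon (lawnmower) sweep over a rectangle."""
    waypoints, y, forward = [], y_min, True
    while y <= y_max:
        row = [(x_min, y), (x_max, y)]
        waypoints.extend(row if forward else row[::-1])
        y += spacing
        forward = not forward   # alternate sweep direction each row
    return waypoints

def partition_among_agents(x_min, x_max, y_min, y_max, spacing, n_agents):
    """Split the area into vertical strips, one sweep path per agent."""
    width = (x_max - x_min) / n_agents
    return [sweep_waypoints(x_min + i * width, x_min + (i + 1) * width,
                            y_min, y_max, spacing)
            for i in range(n_agents)]
```

Partitioning the area into disjoint strips lets each agent plan independently, which fits the decentralized design: agents need only the shared area bounds and their own index, not each other's plans.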