Tracking Meets Large Multimodal Models for Driving Scenario Understanding
arXiv ·
Researchers at MBZUAI have introduced a novel approach to enhance Large Multimodal Models (LMMs) for autonomous driving by integrating 3D tracking information. This method uses a track encoder to embed spatial and temporal data, enriching visual queries and improving the LMM's understanding of driving scenarios. Experiments on DriveLM-nuScenes and DriveLM-CARLA benchmarks demonstrate significant improvements in perception, planning, and prediction tasks compared to baseline models.