Dezhen Song from Texas A&M University presented a talk on Co-Modality Active sensing and Perception (C-MAP) for robotics, covering sensor fusion for autonomous vehicles, augmented reality, and remote environmental monitoring. The talk highlighted lessons learned in sensor fusion using autonomous motorcycles and NASA Robonaut as examples. Recent works in robotic remote environment monitoring, especially focused on subsurface surface void and pipeline mapping were discussed. Why it matters: This research explores sensor fusion techniques to enhance robot perception, which could improve the robustness and capabilities of autonomous systems developed and deployed in the Middle East, particularly in challenging environments.
Gregory Chirikjian presented an overview of research on robot navigation in unstructured environments, using computer vision, sensor tech, ML, and motion planning. The methods use multi-modal observations from RGB cameras, 3D LiDAR, and robot odometry for scene perception, along with deep RL for planning. These methods have been integrated with wheeled, home, and legged robots and tested in crowded indoor scenes, home environments, and dense outdoor terrains. Why it matters: This research pushes the boundaries of robotics in complex environments, paving the way for more versatile and autonomous robots in the Middle East.
Song Chaoyang from the Southern University of Science and Technology (SUSTech) presented research on Vision-Based Tactile Sensing (VBTS) for robot learning, combining soft robotic design with learning algorithms to achieve state-of-the-art performance in tactile perception. Their VBTS solution demonstrates robustness up to 1 million test cycles and enables multi-modal outputs from a single, vision-based input, facilitating applications such as amphibious tactile grasping and industrial welding. The talk also highlighted the DeepClaw system for capturing human demonstration actions, aiming for a universal interaction interface. Why it matters: This research advances embodied intelligence by improving robot dexterity and adaptability through enhanced tactile sensing, which is crucial for complex manipulation tasks in various sectors such as manufacturing and healthcare within the region.
The paper introduces OmniGen, a unified framework for generating aligned multimodal sensor data for autonomous driving using a shared Bird's Eye View (BEV) space. It uses a novel generalizable multimodal reconstruction method (UAE) to jointly decode LiDAR and multi-view camera data through volume rendering. The framework incorporates a Diffusion Transformer (DiT) with a ControlNet branch to enable controllable multimodal sensor generation, demonstrating good performance and multimodal consistency.
This paper introduces Adaptive Entropy-aware Optimization (AEO), a new framework to tackle Multimodal Open-set Test-time Adaptation (MM-OSTTA). AEO uses Unknown-aware Adaptive Entropy Optimization (UAE) and Adaptive Modality Prediction Discrepancy Optimization (AMP) to distinguish unknown class samples during online adaptation by amplifying the entropy difference between known and unknown samples. The study establishes a new benchmark derived from existing datasets with five modalities and evaluates AEO's performance across various domain shift scenarios, demonstrating its effectiveness in long-term and continual MM-OSTTA settings.