Researchers from MBZUAI introduced RP-SAM2, a method to improve surgical instrument segmentation by refining point prompts for more stable results. RP-SAM2 uses a novel shift block and compound loss function to reduce sensitivity to point prompt placement, improving segmentation accuracy in data-constrained settings. Experiments on the Cataract1k and CaDIS datasets show that RP-SAM2 enhances segmentation accuracy and reduces variance compared to SAM2, with code available on GitHub.
Dezhen Song from Texas A&M University presented a talk on Co-Modality Active sensing and Perception (C-MAP) for robotics, covering sensor fusion for autonomous vehicles, augmented reality, and remote environmental monitoring. The talk highlighted lessons learned in sensor fusion, using autonomous motorcycles and NASA Robonaut as examples. Recent work in robotic remote environmental monitoring, especially subsurface void and pipeline mapping, was also discussed. Why it matters: This research explores sensor fusion techniques to enhance robot perception, which could improve the robustness and capabilities of autonomous systems developed and deployed in the Middle East, particularly in challenging environments.
Saudi Arabia has launched the SMAI 2 initiative, building upon the original Saudi National Strategy for Data and AI (NSDAI). The new phase aims to accelerate the development of data and AI capabilities within the Kingdom. SMAI 2 will focus on key sectors and strategic priorities to drive economic diversification and societal progress. Why it matters: The initiative signals Saudi Arabia's ongoing commitment to becoming a leader in AI and data-driven innovation.
This paper presents a decentralized multi-agent unmanned aerial system designed for search, pickup, and relocation of objects. The system integrates multi-agent aerial exploration, object detection/tracking, and aerial gripping. The decentralized system uses global state estimation, reactive collision avoidance, and sweep planning for exploration. Why it matters: The system's successful deployment in demonstrations and competitions like MBZIRC highlights the potential of integrated robotic solutions for complex tasks such as search and rescue in the region.
TII's Secure Systems Research Center (SSRC) in Abu Dhabi has integrated a secure PX4 stack into a RISC-V based drone, marking a milestone in making RISC-V UAV systems a reality. The center ported Dronecode's PX4 open-source software to RISC-V using a commercially available RISC-V development platform. SSRC aims to improve the security and resilience of the PX4 flight control software and NuttX real-time OS, contributing modifications back to the open-source community. Why it matters: This achievement enhances TII's position in drone and autonomous systems research, contributing to safer and more efficient smart city applications in the region.
Researchers from MBZUAI have introduced VideoMolmo, a large multimodal model for spatio-temporal pointing conditioned on textual descriptions. The model incorporates a temporal module with an attention mechanism and a temporal mask fusion pipeline using SAM2 for improved coherence across video sequences. They also curated a dataset of 72k video-caption pairs and introduced VPoS-Bench, a benchmark for evaluating generalization across real-world scenarios, with code and models publicly available.