MBZUAI researchers have introduced SURPRISE3D, a benchmark for evaluating 3D spatial reasoning in AI systems, along with a 3D Spatial Reasoning Segmentation (3D-SRS) task. The benchmark includes over 900 indoor scenes and 200,000 language queries paired with 3D masks, emphasizing spatial relationships over object naming. A companion paper, MLLM-For3D, explores adapting 2D multimodal LLMs for 3D reasoning. Why it matters: This work targets a key limitation of current models, which often rely on object recognition shortcuts rather than genuine spatial reasoning, and pushes toward embodied AI that can understand and act in 3D environments based on human-like spatial reasoning.
Marc Pollefeys from ETH Zurich and Microsoft Spatial AI Lab will discuss building 3D environment representations for assisting humans and robots. The talk covers visual 3D mapping, localization, spatial data access, and navigation using geometry and learning-based methods. It also explores building rich 3D semantic representations for scene interaction via open vocabulary queries leveraging foundation models. Why it matters: Advancements in spatial AI and 3D scene understanding are critical for enabling more capable robots and AI assistants in various applications within the region.
This article discusses the evolution of mobile extended reality (MEX) and its potential to revolutionize urban interaction. It highlights the convergence of augmented and virtual reality technologies for mobile usage and introduces a novel class of 3D models, characterized as urban situated models or "3D-plus-time" (4D.City). Why it matters: The development of MEX and 4D.City could significantly enhance user experience and analog-digital convergence in urban environments, offering new possibilities for human-computer interaction.
The article discusses immersive analytics, which uses VR and AR to visualize data in 3D and embed it into the user's environment, and reviews systems and techniques from the Data Visualisation and Immersive Analytics Lab at Monash University, directed by Professor Tim Dwyer. It explores the concept of "embodied sensemaking" and its potential to improve how people work with complex data. Why it matters: Immersive analytics could significantly enhance data comprehension and decision-making across various sectors in the Middle East, where large-scale projects and smart city initiatives generate vast datasets.
KAUST's Peter Wonka discusses the challenges and advancements in creating data-rich, three-dimensional maps for various applications. His team is working with Boeing on 3D modeling tools for aerospace design. KAUST-funded FalconViz uses UAVs to create 3D maps of disaster areas for first responders. Why it matters: This highlights KAUST's contribution to cutting-edge 3D modeling and its practical applications in industries like aerospace and disaster response in the region.
Eyal Ofek of Microsoft Research is researching how to augment users' senses and use scene understanding to create more inclusive workspaces, especially for remote work. His work involves designing applications flexible to changing environments and personalized to each user. Ofek's background includes computer vision, augmented reality, and leading research groups at Microsoft. Why it matters: This research aims to improve remote collaboration and adapt technology to individual user needs, which could enhance productivity and inclusivity in the evolving work landscape of the GCC region.
Margaret Livingstone, a neurobiology professor at Harvard Medical School, lectured at KAUST's Winter Enrichment Program 2018 on how art can reveal insights into the human brain. She discussed how artists have long understood the independent roles of color and luminance in visual perception. Livingstone highlighted examples from Picasso, Monet, and Warhol to illustrate how artists manipulate visual cues. Why it matters: This interdisciplinary approach can potentially lead to new understandings of how the brain processes visual information and inform advances in both neuroscience and art.
Dezhen Song from Texas A&M University presented a talk on Co-Modality Active Sensing and Perception (C-MAP) for robotics, covering sensor fusion for autonomous vehicles, augmented reality, and remote environmental monitoring. The talk highlighted lessons learned in sensor fusion, using autonomous motorcycles and NASA's Robonaut as examples. He also discussed recent work in robotic remote environmental monitoring, focused on subsurface void and pipeline mapping. Why it matters: This research explores sensor fusion techniques to enhance robot perception, which could improve the robustness and capabilities of autonomous systems developed and deployed in the Middle East, particularly in challenging environments.