Ivan Laptev from INRIA Paris presented a talk at MBZUAI on embodied multi-modal visual understanding, covering advances in video understanding tasks such as question answering and captioning. The talk highlighted recent work on vision-language navigation and manipulation. He argued that detailed visual understanding of the physical world remains at an early stage, and discussed open research directions in robotics and video generation. Why it matters: The discussion of robotics applications and future research directions in embodied AI could influence the direction of AI research and development in the UAE, particularly at MBZUAI.
Tetsunari Inamura's talk explores using virtual reality (VR) to collect human-robot interaction (HRI) data and to tailor assistive robotic functionalities to individual users. He discusses symbol emergence via multimodal interaction, interactive behavior generation through symbol manipulation, and VR for data collection. The talk emphasizes long-term enhancement of human capabilities while avoiding over-reliance on technology. Why it matters: This research promotes independence and growth in human-robot interactions, potentially revolutionizing assistive technologies in the region.
Researchers created a cross-cultural corpus of annotated verbal and nonverbal behaviors in receptionist interactions. The corpus includes native speakers of American English and Arabic role-playing scenarios at university reception desks in Doha, Qatar, and Pittsburgh, USA. The manually annotated nonverbal behaviors include gaze direction, hand gestures, torso positions, and facial expressions. Why it matters: This resource can be valuable for the human-robot interaction community, especially for building culturally aware AI systems.
This article discusses the evolution of mobile extended reality (MEX) and its potential to revolutionize urban interaction. It highlights the convergence of augmented and virtual reality technologies for mobile use, and introduces a novel approach to 3D models characterized as urban situated models, or "3D-plus-time" (4D.City). Why it matters: The development of MEX and 4D.City could significantly enhance user experience and analog-digital convergence in urban environments, offering new possibilities for human-computer interaction.
This article previews a talk by Gül Varol from École des Ponts ParisTech on bridging natural language and 3D human motion. The talk will cover text-to-motion synthesis using generative models and text-to-motion retrieval models, based on the ACTOR, TEMOS, TMR, TEACH, and SINC papers. Varol's research interests include video representation learning, human motion synthesis, and sign languages. Why it matters: Research in this area could enable more intuitive human-computer interaction and new applications in areas like virtual reality and robotics.
Michael Yu Wang, Chair Professor and Founding Dean of the School of Engineering at Great Bay University, argues for combining "good old fashioned engineering" (GOFE) with learning-based approaches such as LLMs for robot skill acquisition, particularly in manipulation. He proposes a modular framework that integrates engineering principles with learning, drawing inspiration from human hand-eye coordination and tactile perception. Wang emphasizes the need to address engineering characteristics of robot tactile sensors, such as spatial and temporal resolution, to achieve human-like robot manipulation skills. Why it matters: This perspective highlights the importance of hybrid approaches combining traditional engineering with modern AI for advancing robotics, especially in complex manipulation tasks relevant to industries in the GCC region.
Dr. Hao Dong from Peking University presented research on addressing the scarcity of large-scale training data in embodied AI, particularly for manipulation, task planning, and navigation, covering both simulation-based learning and large models. Dr. Dong is a chief scientist of China's National Key Research and Development Program and an area chair/associate editor for NeurIPS, CVPR, AAAI, and ICRA. Why it matters: Overcoming data scarcity is crucial for advancing embodied AI research and enabling more sophisticated robotic applications in the region.
MBZUAI Professor Ian Reid discusses his career in embodied AI, from early work on active vision at Oxford to current research. He highlights three key developments: cameras as geometric sensors, visual SLAM, and advancements in robot navigation. Reid distinguishes embodied AI from systems like ChatGPT, emphasizing its need for understanding and interaction with the physical world. Why it matters: The insights from a leading expert underscore the importance of embodied AI as the next frontier in intelligent systems and robotics in the region.