GCC AI Research

A next step for embodied agents: Ivan Laptev on world models

MBZUAI · Notable

Summary

MBZUAI Professor Ivan Laptev is working to bridge the gap between data-driven AI systems and embodied agents (robots). He identifies key challenges in robotics: data scarcity, the need for agents to generate new data through their own actions, and the requirement for real-time operation. Laptev aims to transfer innovations from computer vision to robotics, addressing these challenges to improve robots' ability to interpret and respond to the complexities of the real world. Why it matters: Overcoming these hurdles is crucial for advancing robotics and enabling robots to interact with and navigate dynamic real-world environments effectively.


Related

Towards embodied multi-modal visual understanding

MBZUAI ·

Ivan Laptev from INRIA Paris presented a talk at MBZUAI on embodied multi-modal visual understanding, covering advances in video understanding tasks such as question answering and captioning. The talk highlighted recent work on vision-language navigation and manipulation. He argued that detailed visual understanding of the physical world is still in its early stages and discussed open research directions in robotics and video generation. Why it matters: The discussion of robotics applications and future research directions in embodied AI could influence the direction of AI research and development in the UAE, particularly at MBZUAI.

Structured World Models for Robots

MBZUAI ·

Krishna Murthy, a postdoc at MIT, researches computational world models to enable robots to understand and operate effectively in the physical world. His work focuses on differentiable computing approaches for spatial perception and interfaces large image, language, and audio models with 3D scenes. Murthy envisions structured world models working with scaling-based approaches to create versatile robot perception and planning algorithms. Why it matters: This research could significantly advance robotics by enabling more sophisticated perception, reasoning, and action capabilities in embodied agents.

How MBZUAI built PAN, an interactive, general world model capable of long-horizon simulation

MBZUAI ·

MBZUAI's Institute of Foundation Models (IFM) has developed PAN, a novel interactive world model capable of long-horizon simulation. PAN uses a Generative Latent Prediction (GLP) architecture, coupling internal latent reasoning with generative supervision in the visual domain. The model evolves an internal latent state conditioned on history and natural language actions, then decodes that state into a video segment using a Causal Swin-DPM mechanism for smooth transitions. Why it matters: PAN represents a significant advancement in AI's ability to simulate and predict evolving environments, enabling more steerable and coherent long-term video generation and opening new possibilities for interactive AI systems.
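The core loop described above — evolve an internal latent state conditioned on history and an action, then decode that state into a video segment — can be sketched in miniature. This is a toy illustration only, not PAN's actual architecture or API: all names (`evolve_latent`, `decode_segment`, `rollout`), dimensions, and the random "learned" parameters are hypothetical stand-ins for trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 8    # size of the internal latent state (illustrative)
FRAME_DIM = 4     # size of one decoded "frame" (illustrative)
SEGMENT_LEN = 3   # frames decoded per action (illustrative)

# Toy stand-ins for learned parameters; a real system trains these.
W_state = rng.normal(size=(LATENT_DIM, LATENT_DIM)) * 0.1
W_action = rng.normal(size=(LATENT_DIM, LATENT_DIM)) * 0.1
W_decode = rng.normal(size=(FRAME_DIM, LATENT_DIM)) * 0.1


def evolve_latent(state, action_embedding):
    """Advance the internal latent state given an (embedded) action."""
    return np.tanh(W_state @ state + W_action @ action_embedding)


def decode_segment(state):
    """Decode the latent state into a short segment of frames."""
    return np.stack([np.tanh(W_decode @ state) for _ in range(SEGMENT_LEN)])


def rollout(initial_state, actions):
    """Long-horizon rollout: evolve the latent, decode a segment, repeat."""
    state = initial_state
    segments = []
    for action in actions:
        state = evolve_latent(state, action)
        segments.append(decode_segment(state))
    return state, np.concatenate(segments)


state0 = np.zeros(LATENT_DIM)
actions = [rng.normal(size=LATENT_DIM) for _ in range(4)]
final_state, video = rollout(state0, actions)
print(video.shape)  # 4 actions × 3 frames each → (12, 4)
```

The key design point the sketch mirrors is that prediction happens in latent space, with decoding to the visual domain done per segment — which is what allows a world model of this kind to stay coherent over long horizons rather than predicting pixels frame by frame.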

Inside PAN, MBZUAI’s groundbreaking world model

MBZUAI ·

MBZUAI is previewing PAN, a next-generation world model designed to simulate diverse realities and advance machine reasoning. PAN allows researchers to test AI agents in simulated environments before real-world deployment, enabling them to learn from mistakes without real-world consequences. It facilitates complex reasoning about actions, outcomes, and interactions, crucial for reliable AI performance in dynamic environments. Why it matters: PAN represents a significant advancement in AI by enabling comprehensive simulation and testing of AI agents, which can revolutionize fields like disaster management and healthcare where real-world experimentation is risky.