GCC AI Research


How MBZUAI built PAN, an interactive, general world model capable of long-horizon simulation

MBZUAI ·

MBZUAI's Institute of Foundation Models (IFM) has developed PAN, an interactive world model capable of long-horizon simulation. PAN uses a Generative Latent Prediction (GLP) architecture, coupling internal latent reasoning with generative supervision in the visual domain. The model evolves an internal latent state conditioned on history and natural language actions, then decodes that state into a video segment using a Causal Swin-DPM mechanism for smooth transitions between segments. Why it matters: PAN advances AI's ability to simulate and predict evolving environments, enabling more steerable and coherent long-horizon video generation and opening new possibilities for interactive AI systems.
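The simulate-then-decode loop described above can be sketched in miniature. This is a toy illustration, not PAN's implementation: the weight matrices stand in for learned networks, the action vector stands in for an encoded natural-language action, and all dimensions and function names are invented for the sketch. The point is the control flow: the latent state carries history forward across steps, and each step decodes a short video segment.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, FRAME_DIM, ACTION_DIM, SEG_FRAMES = 16, 8, 4, 3

# Toy random weights standing in for the learned transition and decoder networks.
W_trans = rng.normal(size=(LATENT_DIM + ACTION_DIM, LATENT_DIM)) * 0.1
W_dec = rng.normal(size=(LATENT_DIM, FRAME_DIM)) * 0.1

def transition(latent, action_vec):
    """Evolve the internal latent state given the previous state and an action."""
    x = np.concatenate([latent, action_vec])
    return np.tanh(x @ W_trans)

def decode_segment(latent):
    """Decode the latent state into a short video segment of SEG_FRAMES frames."""
    return np.stack([np.tanh(latent @ W_dec) for _ in range(SEG_FRAMES)])

# Long-horizon rollout: each step conditions on history via the latent state,
# rather than re-generating video from scratch.
latent = np.zeros(LATENT_DIM)
segments = []
for step in range(5):
    action = rng.normal(size=ACTION_DIM)  # stand-in for an encoded language action
    latent = transition(latent, action)
    segments.append(decode_segment(latent))
video = np.concatenate(segments)  # shape (15, FRAME_DIM): 5 segments x 3 frames
```

In the actual system the decoder is a video diffusion model and the transitions between segments are smoothed by the Causal Swin-DPM mechanism; the sketch only shows the latent-rollout structure that makes long-horizon simulation possible.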

A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos

arXiv ·

A new benchmark, LongShOTBench, is introduced for evaluating multimodal reasoning and tool use in long videos, featuring open-ended questions and diagnostic rubrics. It addresses the limitations of existing datasets by combining temporal length with multimodal richness, using human-validated samples. The paper also presents LongShOTAgent, an agentic system for analyzing long videos; results on both the benchmark and the agent expose the challenges that long-video reasoning still poses for state-of-the-art MLLMs.