Mingyu Ding from UC Berkeley presented research on endowing robots with human-like commonsense and physical reasoning capabilities. The talk covered multimodal commonsense reasoning integrating vision, world models, and language-based task planners. It also discussed physical reasoning approaches for robots to infer dynamics and physical properties of objects. Why it matters: Enhancing robots with these capabilities can improve their ability to generalize across everyday tasks, leading to greater social benefits and impact.
A new approach to composed video retrieval (CoVR) is presented that leverages large multimodal models to infer the causal and temporal consequences implied by an edit. The reasoned queries are then aligned with candidate videos without task-specific finetuning. A new benchmark, CoVR-Reason, is introduced to evaluate reasoning in CoVR.
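The summary above only outlines the pipeline; a minimal sketch of the training-free alignment step is shown below, assuming generic text/video encoders and a hypothetical `reason_about_edit` call to a multimodal model (both are placeholders for illustration, not the paper's actual components):

```python
import numpy as np

def reason_about_edit(source_caption: str, edit_text: str) -> str:
    """Hypothetical call to a large multimodal model: given a description of
    the source video and the requested edit, return a query describing the
    implied causal/temporal consequences. Placeholder implementation."""
    # In practice this would prompt a multimodal model; here we just concatenate.
    return f"{source_caption}, after which {edit_text}"

def embed_text(text: str) -> np.ndarray:
    """Placeholder text encoder standing in for a CLIP-style model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=512)
    return v / np.linalg.norm(v)

def retrieve(source_caption: str, edit_text: str,
             candidate_embeddings: np.ndarray) -> int:
    """Rank candidate videos by cosine similarity to the reasoned query,
    with no task-specific finetuning."""
    query = reason_about_edit(source_caption, edit_text)
    q = embed_text(query)
    scores = candidate_embeddings @ q  # candidate rows assumed L2-normalized
    return int(np.argmax(scores))
```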
Jorge Amador, a PhD student at KAUST's Visual Computing Center, presented a talk on physically-based simulation for generative AI models. The talk covered the use of synthetic data generation and physical priors to address the need for high-quality datasets. Applications discussed include photo editing, navigation, digital humans, and cosmological simulations. Why it matters: This research explores a promising technique to overcome data scarcity issues in AI, particularly relevant in resource-constrained environments or for sensitive applications.
MBZUAI researchers have developed K2 Think, an open-source AI reasoning system for interpretable energy decisions. K2 Think uses long chain-of-thought supervised fine-tuning and reinforcement learning to improve accuracy on multi-step reasoning in complex energy problems. The system breaks down challenges into smaller, auditable steps and uses test-time scaling for real-time adaptation. Why it matters: The open-source nature of K2 Think promotes transparency, trust, and compliance in critical energy environments while allowing secure deployment on sovereign infrastructure.
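The article does not detail how K2 Think's test-time scaling works; one common form, best-of-N sampling with a verifier, is sketched below purely as an illustration (the `generate_reasoning` and `score_solution` callables are hypothetical placeholders, not K2 Think's actual API):

```python
from typing import Callable, List, Tuple

def best_of_n(problem: str,
              generate_reasoning: Callable[[str], Tuple[str, str]],
              score_solution: Callable[[str, str], float],
              n: int = 8) -> Tuple[str, str]:
    """Illustrative test-time scaling: sample several chain-of-thought
    traces for the same problem and keep the highest-scoring one.

    generate_reasoning(problem) -> (trace, answer)   # hypothetical model call
    score_solution(problem, answer) -> float         # hypothetical verifier
    """
    candidates: List[Tuple[float, str, str]] = []
    for _ in range(n):
        trace, answer = generate_reasoning(problem)
        candidates.append((score_solution(problem, answer), trace, answer))
    # The winning sample's trace provides an auditable step-by-step record.
    _, best_trace, best_answer = max(candidates, key=lambda c: c[0])
    return best_trace, best_answer
```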
MBZUAI is previewing PAN, a next-generation world model designed to simulate diverse realities and advance machine reasoning. PAN allows researchers to test AI agents in simulated environments before real-world deployment, so agents can learn from mistakes without real-world consequences. It supports complex reasoning about actions, outcomes, and interactions, which is crucial for reliable AI performance in dynamic environments. Why it matters: PAN enables comprehensive simulation and testing of AI agents, which could transform fields like disaster management and healthcare where real-world experimentation is risky.
Liangming Pan from UCSB presented research on building reliable generative AI agents by integrating symbolic representations with LLMs. The neuro-symbolic strategy combines the flexibility of language models with precise knowledge representation and verifiable reasoning. The work covers Logic-LM, ProgramFC, and learning from automated feedback, aiming to address LLM limitations in complex reasoning tasks. Why it matters: Improving the reliability of LLMs is crucial for high-stakes applications in finance, medicine, and law within the region and globally.
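As a rough illustration of the neuro-symbolic pattern behind systems like Logic-LM (an LLM translates the natural-language problem into a formal representation, which a symbolic engine then checks deterministically), the sketch below uses a hypothetical `llm_translate` call with a hard-coded output and a brute-force entailment check as the stand-in solver; it is not the authors' implementation:

```python
import itertools
from typing import Dict, List

def llm_translate(question: str) -> Dict:
    """Hypothetical LLM call that turns a natural-language reasoning problem
    into a symbolic spec: variables, premises, known facts, and a conclusion.
    Hard-coded here purely for illustration."""
    return {
        "variables": ["rain", "wet"],
        "premises": [lambda v: (not v["rain"]) or v["wet"]],  # rain -> wet
        "facts": {"rain": True},
        "conclusion": lambda v: v["wet"],
    }

def entails(spec: Dict) -> bool:
    """Deterministic symbolic check: the conclusion must hold in every
    variable assignment consistent with the facts and premises. This is the
    'verifiable reasoning' half of the neuro-symbolic split."""
    names: List[str] = spec["variables"]
    for values in itertools.product([False, True], repeat=len(names)):
        v = dict(zip(names, values))
        if any(v[k] != val for k, val in spec["facts"].items()):
            continue
        if not all(premise(v) for premise in spec["premises"]):
            continue
        if not spec["conclusion"](v):
            return False
    return True

question = "If it rains, the ground gets wet. It rains. Is the ground wet?"
print(entails(llm_translate(question)))  # True
```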