Video-CoM: Interactive Video Reasoning via Chain of Manipulations

arXiv · November 28, 2025 · Significant research

Summary

Researchers at MBZUAI introduce "Interactive Video Reasoning," a paradigm in which models actively "think with videos" by performing iterative visual actions to gather and refine evidence. They develop Video-CoM, which reasons through a Chain of Manipulations (CoM), and construct Video-CoM-Instruct, an 18K instruction-tuning dataset for multi-step manipulation reasoning. The model is further optimized via reinforcement learning with reasoning-aware Group Relative Policy Optimization (GRPO), and the authors report strong results across nine video reasoning benchmarks.

Keywords

Video Reasoning · Chain of Manipulations · MLLMs · Reinforcement Learning · MBZUAI


CoVR-R: Reason-Aware Composed Video Retrieval

arXiv · Mar 20

A new approach to composed video retrieval (CoVR) uses large multimodal models to infer the causal and temporal consequences implied by an edit. The method aligns reasoned queries to candidate videos without task-specific finetuning. A new benchmark, CoVR-Reason, is introduced to evaluate reasoning in CoVR.
