CoVR-R:Reason-Aware Composed Video Retrieval

arXiv · March 20, 2026 · Significant research

Summary

A new approach to composed video retrieval (CoVR) is presented, which leverages large multimodal models to infer causal and temporal consequences implied by an edit. The method aligns reasoned queries to candidate videos without task-specific finetuning. A new benchmark, CoVR-Reason, is introduced to evaluate reasoning in CoVR.

Keywords

video retrieval · multimodal models · reasoning · benchmark · zero-shot

Read original article →

Get the weekly digest

Top AI stories from the GCC region, every week.