CoVR-R: Reason-Aware Composed Video Retrieval
arXiv · Significant research
Summary
This paper presents a reason-aware approach to composed video retrieval (CoVR), the task of retrieving a target video given a reference video and a text describing an edit. The method uses large multimodal models to infer the causal and temporal consequences implied by the edit, then aligns the resulting reasoned queries to candidate videos without task-specific fine-tuning. The authors also introduce CoVR-Reason, a benchmark for evaluating reasoning in CoVR.
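The alignment step can be illustrated with a minimal sketch, assuming the reasoned query and each candidate video have already been embedded into a shared vector space. The summary does not say how the paper performs the alignment, so the cosine-similarity ranking, the function names (`cosine`, `rank_candidates`), and the toy vectors below are all hypothetical and stand in for whatever the authors actually use.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(query_vec, candidates):
    """Return candidate names sorted by descending similarity to the query.

    `candidates` maps a video name to its (assumed precomputed) embedding.
    No training is involved: ranking uses the embeddings as given.
    """
    scored = [(cosine(query_vec, vec), name) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Toy data (hypothetical): an embedding of the reasoned query, i.e. the
# reference video plus the inferred consequences of the edit.
reasoned_query = [0.9, 0.1, 0.0]
candidates = {
    "video_a": [1.0, 0.0, 0.0],
    "video_b": [0.0, 1.0, 0.0],
    "video_c": [0.5, 0.5, 0.0],
}

ranking = rank_candidates(reasoned_query, candidates)
print(ranking)
```

In this sketch, nothing is fit to the retrieval task: the ranking depends only on the embeddings, which is one simple way to read "without task-specific fine-tuning".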
Keywords
video retrieval · multimodal models · reasoning · benchmark · zero-shot