A new approach to composed video retrieval (CoVR) leverages large multimodal models to infer the causal and temporal consequences implied by an edit. The method aligns reasoned queries to candidate videos without task-specific finetuning. A new benchmark, CoVR-Reason, is introduced to evaluate reasoning in CoVR.
The article discusses immersive analytics, which uses VR and AR to visualize data in 3D and embed it into the user's environment, and reviews systems and techniques from the Data Visualisation and Immersive Analytics Lab at Monash University, directed by Professor Tim Dwyer. It explores the concept of "embodied sensemaking" and its potential to improve how people work with complex data. Why it matters: Immersive analytics could significantly enhance data comprehension and decision-making across various sectors in the Middle East, where large-scale projects and smart city initiatives generate vast datasets.
This article discusses the evolution of mobile extended reality (MEX) and its potential to revolutionize urban interaction. It highlights the convergence of augmented and virtual reality technologies for mobile use, and introduces a novel approach to urban situated 3D models characterized as "3D-plus-time" (4D.City). Why it matters: The development of MEX and 4D.City could significantly enhance user experience and analog-digital convergence in urban environments, offering new possibilities for human-computer interaction.
Egor Zakharov from the ETH Zurich AIT lab will present research on creating controllable and detailed 3D head avatars from data captured with consumer-grade devices. The presentation will cover high-fidelity image-based facial reconstruction and animation, as well as video-based reconstruction of detailed structures such as hairstyles. He will also showcase the integration of human-centric assets into virtual environments for real-time telepresence and entertainment. Why it matters: This research contributes to advancements in digital human modeling and telepresence, with applications in communication and gaming within the region.
A new paper at ICCV 2025, co-authored by MBZUAI Ph.D. student Dmitry Demidov, introduces Dense-WebVid-CoVR, a 1.6-million-sample benchmark for composed video retrieval (CoVR). The benchmark features longer, context-rich descriptions and modification texts, generated using Gemini Pro and GPT-4o with manual verification. The paper also presents a unified fusion approach that jointly reasons across video and text inputs, improving performance on fine-grained edit details. Why it matters: This work advances video search capabilities by enabling more human-like queries, which is crucial for creative and analytic workflows that require nuanced video retrieval.
KAUST's Visual Computing Center (VCC) hosted an Open House event on March 28, showcasing its interdisciplinary research in visual computing. Demonstrations included a virtual reality driving simulator by FalconViz, intended for driver education in Saudi Arabia. Researchers also presented a drone trained to autonomously navigate race courses and a neural network for autonomous driving using image-based technology without GPS. Why it matters: The VCC's work highlights KAUST's role in advancing visual computing applications relevant to Saudi Arabia, from driver training to autonomous systems.
Pong C Yuen from Hong Kong Baptist University will present a talk on remote photoplethysmography (rPPG) detection. The talk will review the development of rPPG detection, share recent research, and discuss future directions. rPPG is a non-contact technology that recovers physiological signals from video, with computer vision and healthcare applications such as heart rate estimation. Why it matters: Advancements in rPPG could enable new remote patient monitoring and diagnostic tools in the region, reducing the need for physical contact.
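To give a sense of the core idea behind rPPG-based heart rate estimation, the sketch below shows a minimal, simplified pipeline: given a mean skin-intensity trace extracted from face video, the dominant frequency within the plausible human pulse band is read off a Fourier spectrum and converted to beats per minute. This is an illustrative toy example, not the speaker's method; the function name, the 0.7–3.0 Hz band, and the synthetic 72 BPM signal are all assumptions for demonstration.

```python
import numpy as np

def estimate_heart_rate(trace: np.ndarray, fps: float) -> float:
    """Estimate heart rate (BPM) from a mean-intensity trace via its
    dominant frequency in the human pulse band (assumed 0.7-3.0 Hz)."""
    sig = trace - trace.mean()                      # remove DC offset
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fps)  # frequency axis in Hz
    power = np.abs(np.fft.rfft(sig)) ** 2           # power spectrum
    band = (freqs >= 0.7) & (freqs <= 3.0)          # ~42-180 BPM
    peak_hz = freqs[band][np.argmax(power[band])]   # dominant pulse frequency
    return peak_hz * 60.0                           # Hz -> beats per minute

# Synthetic demo: 10 s of a 1.2 Hz (72 BPM) pulse sampled at 30 fps, plus noise.
fps = 30.0
t = np.arange(0, 10, 1.0 / fps)
rng = np.random.default_rng(0)
trace = 0.05 * np.sin(2 * np.pi * 1.2 * t) + 0.01 * rng.normal(size=t.size)
bpm = estimate_heart_rate(trace, fps)
```

Real rPPG systems face challenges this sketch ignores, such as head motion, illumination changes, and skin-region tracking, which is where much of the research discussed in the talk lies.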