KAUST Professor Peter Markowich discusses the role of mathematics in football, describing a match as a random process with a drift. The randomness stems from player conditions, referee decisions, weather, and more, while the drift represents the higher probability of the better team winning. He notes that the complexity arising from 11 players on each side increases the randomness compared to sports like tennis. Why it matters: This perspective highlights the interplay of chance and skill in sports, offering a mathematical lens for understanding game dynamics.
MBZUAI researchers presented a new approach to video analysis at ICCV in Paris, led by Syed Talal Wasim. The approach builds on still image processing techniques like focal modulation to analyze spatial and temporal information in video separately. It aims to improve temporal aggregation while avoiding the computational complexity of transformers. Why it matters: This research advances video understanding in computer vision by offering a more efficient method for temporal modeling, crucial for applications like activity recognition and video surveillance.
Nicu Sebe from the University of Trento presented recent work on video generation, focusing on animating objects in a source image using external information like labels, driving videos, or text. He introduced a Learnable Game Engine (LGE) trained from monocular annotated videos, which maintains states of scenes, objects, and agents to render controllable viewpoints. Why it matters: This talk highlights advancements in cross-modal AI, potentially enabling new applications in gaming, simulation, and content creation within the region.
MBZUAI researchers developed MedAgentSim, a simulated hospital environment to evaluate AI diagnostic abilities. The simulation uses LLM-powered agents to mimic doctor-patient conversations, providing a dynamic assessment of diagnostic skills. The system includes doctor, patient, and evaluator agents that interact within the simulated hospital, making real-time decisions. Why it matters: This research offers a more realistic evaluation of AI in clinical settings, addressing limitations of current benchmarks and potentially improving AI's use in healthcare.
A new benchmark, LongShOTBench, is introduced for evaluating multimodal reasoning and tool use in long videos, featuring open-ended questions and diagnostic rubrics. The benchmark addresses the limitations of existing datasets by combining temporal length and multimodal richness, using human-validated samples. LongShOTAgent, an agentic system, is also presented for analyzing long videos, with both the benchmark and agent demonstrating the challenges faced by state-of-the-art MLLMs.