Manling Li from UIUC proposes a new research direction, Event-Centric Multimodal Knowledge Acquisition, which shifts from traditional entity-centric, single-modal knowledge to event-centric, multimodal knowledge. The approach addresses two challenges: understanding multimodal semantic structures, through zero-shot cross-modal transfer (CLIP-Event), and modeling long-horizon temporal dynamics, through the Event Graph Model. Li's work aims to enable machines to capture complex timelines and relationships, with applications in timeline generation, meeting summarization, and question answering. Why it matters: This research pioneers an approach to multimodal information extraction that moves from static, entity-based understanding to dynamic, event-centric knowledge acquisition, which is essential for AI applications that must understand complex scenarios.
Dr. Mikhail Burtsev of the London Institute presented research on GENA-LM, a suite of transformer-based DNA language models. The talk addressed the challenge of scaling transformers for genomic sequences, proposing recurrent memory augmentation to handle long input sequences efficiently. This approach improves language modeling performance and holds promise for memory-intensive applications in bioinformatics. Why it matters: This research can significantly advance AI's capabilities in genomics by enabling the processing of much larger DNA sequences, with potential breakthroughs in understanding and treating diseases.
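The control flow behind segment-level recurrence with a fixed-size memory can be illustrated with a toy sketch. This is not GENA-LM code: `toy_segment_step` is a hypothetical stand-in for a model pass, and the "memory" here is simply four base counts rather than learned vectors. The only point is that each step sees one short segment plus a small carried state, so the per-step input size stays bounded however long the sequence is.

```python
from collections import Counter


def split_into_segments(sequence, segment_len):
    """Cut a long sequence into consecutive chunks of at most segment_len symbols."""
    return [sequence[i:i + segment_len] for i in range(0, len(sequence), segment_len)]


def toy_segment_step(memory, segment):
    """Hypothetical stand-in for one model pass over (memory + segment).

    A real model would read memory vectors alongside the segment and emit
    updated ones; in this toy the memory is just the counts of A, C, G, T.
    """
    counts = Counter(segment)
    return tuple(m + counts[base] for m, base in zip(memory, "ACGT"))


def process_long_sequence(sequence, segment_len):
    """Process the sequence segment by segment, carrying the memory forward."""
    memory = (0, 0, 0, 0)  # fixed-size state, independent of sequence length
    for segment in split_into_segments(sequence, segment_len):
        memory = toy_segment_step(memory, segment)
    return memory


dna = "ACGTACGTTTGA"  # made-up example sequence
final_memory = process_long_sequence(dna, segment_len=5)
```

In this sketch the final memory summarizes the whole sequence even though no single step ever saw more than five symbols at once.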
A talk discusses the challenges of single-cell data analysis, such as feature sparsity and the effects of rare cells, and argues that AI/ML strategies are uniquely positioned to model this data. ImYoo, a startup founded in 2021, is applying single-cell model architectures to the unsupervised discovery of patient groupings and the prediction of sample-level phenotypic data in autoimmune disease. Why it matters: This highlights the growing application of AI/ML in analyzing single-cell data for population-scale human health studies, an area ripe for innovation and improvement in the Middle East's growing biotech sector.
A new approach to composed video retrieval (CoVR) is presented, which leverages large multimodal models to infer causal and temporal consequences implied by an edit. The method aligns reasoned queries to candidate videos without task-specific finetuning. A new benchmark, CoVR-Reason, is introduced to evaluate reasoning in CoVR.
This paper introduces an interpretable pipeline that integrates mobility and social media data to analyze human behavior during crises. The framework was evaluated through two case studies, including a longitudinal analysis of UAE COVID-19 behavior from March 2020 to December 2021. The pipeline aligns heterogeneous daily signals, transforms them into binary behavioral states, applies Formal Concept Analysis (FCA) to extract co-occurrence structures, and mines association rules. Results demonstrate clear cross-domain behavioral structures in crises, yielding both scientifically credible and policy-actionable intelligence. Why it matters: This work provides a novel methodological approach for developing actionable crisis management strategies by fusing multimodal data, directly applicable to public health and emergency response in the UAE and the broader region.
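The four pipeline steps (align, binarize, FCA, rule mining) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the signal names `low_mobility` and `high_health_posts`, the thresholds, and the five days of values are all made up, the signals are assumed to be already aligned by day, and concepts are found by brute-force closure of every attribute subset, which is only practical for a handful of attributes.

```python
from itertools import combinations


def build_context(signals, thresholds):
    """Binarize aligned daily signals; return one set of active attributes per day."""
    names = sorted(signals)
    n_days = len(signals[names[0]])
    return [
        frozenset(n for n in names if signals[n][d] > thresholds[n])
        for d in range(n_days)
    ]


def extent(context, attrs):
    """Indices of the days that have every attribute in attrs."""
    return frozenset(i for i, row in enumerate(context) if attrs <= row)


def intent(context, days, all_attrs):
    """Attributes shared by every day in days (all attributes if days is empty)."""
    shared = set(all_attrs)
    for i in days:
        shared &= context[i]
    return frozenset(shared)


def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every attribute subset."""
    all_attrs = frozenset().union(*context)
    concepts = set()
    for r in range(len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), r):
            ext = extent(context, frozenset(subset))
            concepts.add((ext, intent(context, ext, all_attrs)))
    return concepts


def association_rules(context, min_support=0.3, min_confidence=0.8):
    """Rules a -> b between single attributes, filtered by support and confidence."""
    all_attrs = sorted(frozenset().union(*context))
    rules = []
    for a in all_attrs:
        n_a = len(extent(context, frozenset([a])))
        for b in all_attrs:
            if a == b or n_a == 0:
                continue
            n_ab = len(extent(context, frozenset([a, b])))
            support, confidence = n_ab / len(context), n_ab / n_a
            if support >= min_support and confidence >= min_confidence:
                rules.append((a, b, support, confidence))
    return rules


# Made-up, already-aligned daily signals (hypothetical names and values).
signals = {
    "low_mobility": [0.9, 0.8, 0.2, 0.7, 0.1],
    "high_health_posts": [0.7, 0.9, 0.1, 0.8, 0.6],
}
thresholds = {"low_mobility": 0.5, "high_health_posts": 0.5}

context = build_context(signals, thresholds)
concepts = formal_concepts(context)
rules = association_rules(context)
```

On this toy data the only rule that passes both filters is `low_mobility -> high_health_posts`: every low-mobility day also had elevated health-related posting, while the reverse rule falls below the confidence cutoff.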
Nicu Sebe from the University of Trento presented recent work on video generation, focusing on animating objects in a source image using external information like labels, driving videos, or text. He introduced a Learnable Game Engine (LGE) trained from monocular annotated videos, which maintains states of scenes, objects, and agents to render controllable viewpoints. Why it matters: This talk highlights advancements in cross-modal AI, potentially enabling new applications in gaming, simulation, and content creation within the region.