Vicky Kalogeiton from École Polytechnique discussed the importance of multimodality for story-level recognition and generation using video, audio, text, masks and clinical data. She presented on multimodal video understanding using FunnyNet-W and Short Film Dataset. She further showed examples of visual generation from text and other modalities (ET, CAD, DynamicGuidance). Why it matters: Multimodal AI research is growing globally, and this talk highlights the potential of combining different data types for enhanced understanding and generation, which could have implications for various applications, including those relevant to the Middle East.
A panel discussion was hosted at MBZUAI in collaboration with the Manara Center for Coexistence and Dialogue. The discussion centered on the potential of multimodal machine intelligence for human-centered applications, particularly in health and wellbeing. USC Professor Shrikanth Narayanan spoke on creating trustworthy and inclusive AI that considers protected variables. Why it matters: This signals MBZUAI's interest in exploring ethical AI development and its applications for societal good, potentially driving research and policy initiatives in the region.
Thamar Solorio from the University of Houston presented preliminary work on multimodal representation learning for detecting objectionable content in videos at MBZUAI. The research investigates two multimodal pretraining mechanisms, finding contrastive learning more effective than unimodal representation prediction. The study also assesses the value of common multimodal corpora for this task. Why it matters: This research contributes to the development of AI techniques for content moderation, an important issue for online platforms in the Middle East and globally.
Paul Liang from CMU presented on machine learning foundations for multisensory AI, discussing a theoretical framework for modality interactions. The talk covered cross-modal attention and multimodal transformer architectures, and applications in mental health, pathology, and robotics. Liang's research aims to enable AI systems to integrate and learn from diverse real-world sensory modalities. Why it matters: This highlights the growing importance of multimodal AI research and its potential for advancements across various sectors in the region, including healthcare and robotics.
This paper introduces Adaptive Entropy-aware Optimization (AEO), a new framework to tackle Multimodal Open-set Test-time Adaptation (MM-OSTTA). AEO uses Unknown-aware Adaptive Entropy Optimization (UAE) and Adaptive Modality Prediction Discrepancy Optimization (AMP) to distinguish unknown class samples during online adaptation by amplifying the entropy difference between known and unknown samples. The study establishes a new benchmark derived from existing datasets with five modalities and evaluates AEO's performance across various domain shift scenarios, demonstrating its effectiveness in long-term and continual MM-OSTTA settings.
Nicu Sebe from the University of Trento presented recent work on video generation, focusing on animating objects in a source image using external information like labels, driving videos, or text. He introduced a Learnable Game Engine (LGE) trained from monocular annotated videos, which maintains states of scenes, objects, and agents to render controllable viewpoints. Why it matters: This talk highlights advancements in cross-modal AI, potentially enabling new applications in gaming, simulation, and content creation within the region.
This paper introduces rational counterfactuals, a method for identifying counterfactuals that maximize the attainment of a desired consequent. The approach aims to identify the antecedent that leads to a specific outcome for rational decision-making. The theory is applied to identify variable values that contribute to peace, such as Allies, Contingency, Distance, Major Power, Capability, Democracy, and Economic Interdependency. Why it matters: The research provides a framework for analyzing and promoting conditions conducive to peace using counterfactual reasoning.