GCC AI Research

Foundations of Multisensory Artificial Intelligence

MBZUAI · Notable

Summary

Paul Liang from CMU presented on the machine learning foundations of multisensory AI, discussing a theoretical framework for modality interactions. The talk covered cross-modal attention and multimodal transformer architectures, along with applications in mental health, pathology, and robotics. Liang's research aims to enable AI systems to integrate and learn from diverse real-world sensory modalities. Why it matters: The talk highlights the growing importance of multimodal AI research and its potential to drive advances in sectors across the region, including healthcare and robotics.

Related

A unified theory of all things visual

MBZUAI

MBZUAI Professor Fahad Khan is working on a unified theory of machine visual intelligence. His goal is to enable AI systems to better understand and function in complex, chaotic visual environments. The aim is to improve real-world applications like smart cities, personalized healthcare, and autonomous vehicles. Why it matters: This research could significantly advance AI's ability to perceive and interact with the real world, especially in challenging environments common in the developing world.

Unifying Vision Representation

MBZUAI

This seminar explores vision systems through self-supervised representation learning, addressing the challenges of mainstream self-supervised learning methods for vision and possible solutions to them. It discusses developing versatile representations across modalities, tasks, and architectures to advance vision foundation models. Tong Zhang from EPFL, whose background includes Beihang University, New York University, and Australian National University, will lead the talk. Why it matters: Advancing vision foundation models is crucial for expanding AI applications, especially in the Middle East where computer vision can address challenges in areas like urban planning, agriculture, and environmental monitoring.

Multimodality for story-level understanding and generation of visual data

MBZUAI

Vicky Kalogeiton from École Polytechnique discussed the importance of multimodality for story-level recognition and generation using video, audio, text, masks, and clinical data. She presented work on multimodal video understanding using FunnyNet-W and the Short Film Dataset. She also showed examples of visual generation from text and other modalities (ET, CAD, DynamicGuidance). Why it matters: Multimodal AI research is growing globally, and this talk highlights the potential of combining different data types for enhanced understanding and generation, which could have implications for various applications, including those relevant to the Middle East.

Multimodal Factual Knowledge Acquisition

MBZUAI

Manling Li from UIUC proposes a new research direction: Event-Centric Multimodal Knowledge Acquisition, which transforms traditional entity-centric single-modal knowledge into event-centric multi-modal knowledge. The approach addresses challenges in understanding multimodal semantic structures using zero-shot cross-modal transfer (CLIP-Event) and long-horizon temporal dynamics through the Event Graph Model. Li's work aims to enable machines to capture complex timelines and relationships, with applications in timeline generation, meeting summarization, and question answering. Why it matters: This research pioneers a new approach to multimodal information extraction, moving from static entity-based understanding to dynamic, event-centric knowledge acquisition, which is essential for advanced AI applications in understanding complex scenarios.