Foundations of Multisensory Artificial Intelligence

MBZUAI · Notable

AI Research Healthcare Robotics Multimodal

Summary

Paul Liang from CMU presented on machine learning foundations for multisensory AI, discussing a theoretical framework for modality interactions. The talk covered cross-modal attention and multimodal transformer architectures, and applications in mental health, pathology, and robotics. Liang's research aims to enable AI systems to integrate and learn from diverse real-world sensory modalities. Why it matters: This highlights the growing importance of multimodal AI research and its potential for advancements across various sectors in the region, including healthcare and robotics.

Keywords

multisensory AI · machine learning · CMU · modalities · foundation models


Related

Unifying Vision Representation

MBZUAI

This seminar examines vision systems through self-supervised representation learning, covering the challenges facing mainstream self-supervised methods for vision and possible solutions to them. It discusses how to develop versatile representations across modalities, tasks, and architectures in order to advance vision foundation models. Tong Zhang from EPFL, with a background from Beihang University, New York University, and Australian National University, will lead the talk. Why it matters: Advancing vision foundation models is crucial for expanding AI applications, especially in the Middle East where computer vision can address challenges in areas like urban planning, agriculture, and environmental monitoring.

Multimodality for story-level understanding and generation of visual data

MBZUAI

Vicky Kalogeiton from École Polytechnique discussed the importance of multimodality for story-level recognition and generation using video, audio, text, masks, and clinical data. She presented work on multimodal video understanding using FunnyNet-W and the Short Film Dataset. She also showed examples of visual generation from text and other modalities (ET, CAD, DynamicGuidance). Why it matters: Multimodal AI research is growing globally, and this talk highlights the potential of combining different data types for enhanced understanding and generation, which could have implications for various applications, including those relevant to the Middle East.

A unified theory of all things visual

MBZUAI

MBZUAI Professor Fahad Khan is working on a unified theory of machine visual intelligence. His goal is to enable AI systems to better understand and function in complex, chaotic visual environments. The aim is to improve real-world applications like smart cities, personalized healthcare, and autonomous vehicles. Why it matters: This research could significantly advance AI's ability to perceive and interact with the real world, especially in challenging environments common in the developing world.

Super-aligned Machine Intelligence via a Soft Touch

MBZUAI

Song Chaoyang from the Southern University of Science and Technology (SUSTech) presented research on Vision-Based Tactile Sensing (VBTS) for robot learning, combining soft robotic design with learning algorithms to achieve state-of-the-art performance in tactile perception. Their VBTS solution demonstrates robustness up to 1 million test cycles and enables multi-modal outputs from a single, vision-based input, facilitating applications such as amphibious tactile grasping and industrial welding. The talk also highlighted the DeepClaw system for capturing human demonstration actions, aiming for a universal interaction interface. Why it matters: This research advances embodied intelligence by improving robot dexterity and adaptability through enhanced tactile sensing, which is crucial for complex manipulation tasks in various sectors such as manufacturing and healthcare within the region.
