Skip to content
GCC AI Research

Search

Results for "multisensory AI"

Foundations of Multisensory Artificial Intelligence

MBZUAI ·

Paul Liang from CMU presented on machine learning foundations for multisensory AI, discussing a theoretical framework for modality interactions. The talk covered cross-modal attention and multimodal transformer architectures, and applications in mental health, pathology, and robotics. Liang's research aims to enable AI systems to integrate and learn from diverse real-world sensory modalities. Why it matters: This highlights the growing importance of multimodal AI research and its potential for advancements across various sectors in the region, including healthcare and robotics.

Multimodality for story-level understanding and generation of visual data

MBZUAI ·

Vicky Kalogeiton from École Polytechnique discussed the importance of multimodality for story-level recognition and generation using video, audio, text, masks and clinical data. She presented on multimodal video understanding using FunnyNet-W and Short Film Dataset. She further showed examples of visual generation from text and other modalities (ET, CAD, DynamicGuidance). Why it matters: Multimodal AI research is growing globally, and this talk highlights the potential of combining different data types for enhanced understanding and generation, which could have implications for various applications, including those relevant to the Middle East.

A unified theory of all things visual

MBZUAI ·

MBZUAI Professor Fahad Khan is working on a unified theory of machine visual intelligence. His goal is to enable AI systems to better understand and function in complex, chaotic visual environments. The aim is to improve real-world applications like smart cities, personalized healthcare, and autonomous vehicles. Why it matters: This research could significantly advance AI's ability to perceive and interact with the real world, especially in challenging environments common in the developing world.

Super-aligned Machine Intelligence via a Soft Touch

MBZUAI ·

Song Chaoyang from the Southern University of Science and Technology (SUSTech) presented research on Vision-Based Tactile Sensing (VBTS) for robot learning, combining soft robotic design with learning algorithms to achieve state-of-the-art performance in tactile perception. Their VBTS solution demonstrates robustness up to 1 million test cycles and enables multi-modal outputs from a single, vision-based input, facilitating applications such as amphibious tactile grasping and industrial welding. The talk also highlighted the DeepClaw system for capturing human demonstration actions, aiming for a universal interaction interface. Why it matters: This research advances embodied intelligence by improving robot dexterity and adaptability through enhanced tactile sensing, which is crucial for complex manipulation tasks in various sectors such as manufacturing and healthcare within the region.

Humanizing Technology with Assistive Augmentations

MBZUAI ·

This article discusses a talk on "Assistive Augmentation," designing human-computer interfaces to augment human abilities. Examples include 'AiSee' for blind users, 'Prospero' for memory training, and 'MuSS-Bits' for deaf users to feel music. Suranga Nanayakkara from the National University of Singapore will present the talk, highlighting insights from psychology, human-centered machine learning, and design thinking. Why it matters: Such assistive technologies can significantly improve the quality of life for individuals with disabilities and extend human capabilities.

Multimodal machine intelligence and its human-centered possibilities

MBZUAI ·

A panel discussion was hosted at MBZUAI in collaboration with the Manara Center for Coexistence and Dialogue. The discussion centered on the potential of multimodal machine intelligence for human-centered applications, particularly in health and wellbeing. USC Professor Shrikanth Narayanan spoke on creating trustworthy and inclusive AI that considers protected variables. Why it matters: This signals MBZUAI's interest in exploring ethical AI development and its applications for societal good, potentially driving research and policy initiatives in the region.

Cross-modal understanding and generation of multimodal content

MBZUAI ·

Nicu Sebe from the University of Trento presented recent work on video generation, focusing on animating objects in a source image using external information like labels, driving videos, or text. He introduced a Learnable Game Engine (LGE) trained from monocular annotated videos, which maintains states of scenes, objects, and agents to render controllable viewpoints. Why it matters: This talk highlights advancements in cross-modal AI, potentially enabling new applications in gaming, simulation, and content creation within the region.

Multimodal Factual Knowledge Acquisition

MBZUAI ·

Manling Li from UIUC proposes a new research direction: Event-Centric Multimodal Knowledge Acquisition, which transforms traditional entity-centric single-modal knowledge into event-centric multi-modal knowledge. The approach addresses challenges in understanding multimodal semantic structures using zero-shot cross-modal transfer (CLIP-Event) and long-horizon temporal dynamics through the Event Graph Model. Li's work aims to enable machines to capture complex timelines and relationships, with applications in timeline generation, meeting summarization, and question answering. Why it matters: This research pioneers a new approach to multimodal information extraction, moving from static entity-based understanding to dynamic, event-centric knowledge acquisition, which is essential for advanced AI applications in understanding complex scenarios.