GCC AI Research

Results for "foundational models"

Evolution of Foundational Models: From Deep Learning in Healthcare to Neuro-inspired AI

MBZUAI ·

IBM Fellow Dr. Tanveer Syeda-Mahmood gave a talk on the evolution of foundational models, covering multimodal fusion in healthcare and neuro-inspired AI for computer vision. She also discussed image-driven fact-checking of textual reports produced by generative AI as a step toward more responsible models. Dr. Syeda-Mahmood leads IBM's work in Multimodal Bioinspired AI and WatsonX features, and previously led the Medical Sieve Radiology Grand Challenge. Why it matters: The talk highlights the ongoing development and application of AI foundational models in critical areas like healthcare and responsible AI, and shows IBM's continued investment in these areas.
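
One technique the talk touches on is multimodal fusion for clinical data. Below is a minimal, illustrative sketch of late fusion, where embeddings from an image encoder and a report-text encoder are concatenated and fed to a small classifier; the encoders, dimensions, and task are placeholder assumptions and do not reflect IBM's systems.

```python
# Minimal sketch of late multimodal fusion in a clinical setting: an image
# embedding and a report-text embedding are concatenated and classified.
# Dimensions and the two-class task are illustrative placeholders.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, img_dim: int = 512, txt_dim: int = 256, n_classes: int = 2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(img_dim + txt_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, img_emb: torch.Tensor, txt_emb: torch.Tensor) -> torch.Tensor:
        # Concatenate the two modality embeddings along the feature axis.
        return self.head(torch.cat([img_emb, txt_emb], dim=-1))

# Example: fuse a batch of image embeddings with matching report embeddings.
logits = LateFusionClassifier()(torch.randn(4, 512), torch.randn(4, 256))
```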

Foundations of Multisensory Artificial Intelligence

MBZUAI ·

Paul Liang from CMU presented on machine learning foundations for multisensory AI, discussing a theoretical framework for modality interactions. The talk covered cross-modal attention and multimodal transformer architectures, as well as applications in mental health, pathology, and robotics. Liang's research aims to enable AI systems to integrate and learn from diverse real-world sensory modalities. Why it matters: This highlights the growing importance of multimodal AI research and its potential for advancements across various sectors in the region, including healthcare and robotics.
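
For readers unfamiliar with cross-modal attention, the building block named above, here is a minimal PyTorch sketch in which tokens from one modality (text in the example) attend over tokens from another (video). Module names and dimensions are illustrative assumptions, not taken from Liang's work.

```python
# Minimal cross-modal attention block: one modality's tokens query another's.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_mod: torch.Tensor, context_mod: torch.Tensor) -> torch.Tensor:
        # query_mod:   (batch, len_q, dim) tokens of the querying modality
        # context_mod: (batch, len_k, dim) tokens of the modality attended to
        fused, _ = self.attn(query_mod, context_mod, context_mod)
        return self.norm(query_mod + fused)  # residual keeps the unimodal signal

# Example: 8 text tokens attending over 50 video-frame features.
text = torch.randn(2, 8, 256)
video = torch.randn(2, 50, 256)
out = CrossModalAttention()(text, video)  # -> (2, 8, 256)
```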

From Learning, to Meta-Learning, to Lego-Learning — theory, systems, and engineering

MBZUAI ·

MBZUAI President Eric Xing delivered a talk at Carnegie Mellon University on May 13, 2022, titled “From Learning, to Meta-Learning, to Lego-Learning — theory, systems, and engineering.” Xing discussed the development of a standard model for learning, inspired by the standard model in physics, which aims to unify various machine learning paradigms. Before joining MBZUAI, Xing was a professor at CMU and founder of Petuum Inc., an AI development platform company. Why it matters: This talk highlights MBZUAI's leadership in advancing theoretical frameworks for machine learning and its commitment to unifying different AI approaches.

Unifying Vision Representation

MBZUAI ·

This seminar explores vision systems through self-supervised representation learning, surveying the challenges and solutions of mainstream vision self-supervised methods. It discusses developing versatile representations across modalities, tasks, and architectures to propel the evolution of vision foundation models. Tong Zhang from EPFL, with a background from Beihang University, New York University, and Australian National University, will lead the talk. Why it matters: Advancing vision foundation models is crucial for expanding AI applications, especially in the Middle East where computer vision can address challenges in areas like urban planning, agriculture, and environmental monitoring.
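
As background, one mainstream family of vision self-supervised methods is contrastive learning. The snippet below is a minimal sketch of an InfoNCE / SimCLR-style objective under assumed embedding shapes and a placeholder temperature; it is not the speaker's method.

```python
# Minimal InfoNCE-style contrastive loss: two augmented views of the same
# images should have matching embeddings (positives on the diagonal).
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    # z1, z2: (batch, dim) embeddings of two views of the same batch of images.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature      # pairwise cosine similarities
    targets = torch.arange(z1.size(0))      # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Example: embeddings from any backbone (ResNet, ViT, ...) projected to 128-d.
view_a, view_b = torch.randn(32, 128), torch.randn(32, 128)
loss = info_nce(view_a, view_b)
```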

VideoMolmo: Spatio-Temporal Grounding Meets Pointing

arXiv ·

Researchers from MBZUAI have introduced VideoMolmo, a large multimodal model for spatio-temporal pointing conditioned on textual descriptions. The model incorporates a temporal module with an attention mechanism and a temporal mask fusion pipeline using SAM2 for improved coherence across video sequences. They also curated a dataset of 72k video-caption pairs and introduced VPoS-Bench, a benchmark for evaluating generalization across real-world scenarios, with code and models publicly available.
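
To make the temporal conditioning idea concrete, here is a loose, hypothetical sketch of a temporal attention module in which the current frame's features attend over features of preceding frames before a pointing head makes its prediction. It is not the released VideoMolmo implementation, and the SAM2-based mask-fusion stage is omitted.

```python
# Illustrative temporal module: current-frame features attend over past-frame
# features so pointing predictions stay coherent across the video.
import torch
import torch.nn as nn

class TemporalModule(nn.Module):
    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, current: torch.Tensor, past: torch.Tensor) -> torch.Tensor:
        # current: (batch, tokens, dim) features of the frame being grounded
        # past:    (batch, n_past * tokens, dim) features of earlier frames
        context, _ = self.attn(current, past, past)
        return current + context  # temporally informed features for a pointing head

# Example: condition the current frame on the 3 previous frames' features.
cur = torch.randn(1, 196, 512)
prev = torch.randn(1, 3 * 196, 512)
fused = TemporalModule()(cur, prev)  # -> (1, 196, 512)
```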

A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos

arXiv ·

A new benchmark, LongShOTBench, is introduced for evaluating multimodal reasoning and tool use in long videos, featuring open-ended questions and diagnostic rubrics. The benchmark addresses the limitations of existing datasets by combining temporal length with multimodal richness and uses human-validated samples. LongShOTAgent, an agentic system for analyzing long videos, is also presented; together, the benchmark and the agent demonstrate the challenges that state-of-the-art MLLMs still face in this setting.
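
As a rough illustration of what an agentic long-video pipeline can look like, the sketch below splits work into segments, lets a placeholder multimodal LLM decide which tool to call next, and accumulates evidence until it can answer. The tool names, the call_mllm stub, and the control flow are assumptions for illustration, not the LongShOTAgent design.

```python
# Generic agent loop for long-video question answering with tool use.
from typing import Callable

# Hypothetical tool registry; real tools would run ASR, captioning, etc.
TOOLS: dict[str, Callable[[str], str]] = {
    "transcribe_audio": lambda segment: f"[transcript of {segment}]",
    "caption_frames":   lambda segment: f"[frame captions for {segment}]",
}

def call_mllm(question: str, evidence: list[str]) -> dict:
    # Stub standing in for a multimodal LLM call that returns either a tool
    # request or a final answer once enough evidence has been gathered.
    if len(evidence) < 2:
        tool = "transcribe_audio" if not evidence else "caption_frames"
        return {"action": tool, "segment": f"segment_{len(evidence)}"}
    return {"action": "answer", "text": "final answer based on gathered evidence"}

def answer_long_video(question: str, max_steps: int = 8) -> str:
    evidence: list[str] = []
    for _ in range(max_steps):
        step = call_mllm(question, evidence)
        if step["action"] == "answer":
            return step["text"]
        evidence.append(TOOLS[step["action"]](step["segment"]))
    return "no answer within the step budget"

print(answer_long_video("Why does the speaker pause midway through the video?"))
```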