MBZUAI's Dr. Hao Li is working on using AI and 3D telepresence to transform communication, work, and education by replacing physical transportation with virtual teleportation. His research sits at the intersection of computer graphics, computer vision, and AI, with a focus on virtual avatar creation and facial performance capture. Li aims to use AI to enable forms of communication that are not possible in person. Why it matters: This research has the potential to reduce carbon footprints by enabling remote work and virtual collaboration, while also positioning MBZUAI and the UAE as leaders in AI-driven metaverse technologies.
MBZUAI's Metaverse Lab is developing AI algorithms for photorealistic virtual humans and dynamic environments. Hao Li, Director of the lab, envisions using the metaverse for immersive learning experiences related to history and culture. He is also working on tools to prevent deepfakes and other cyberthreats. Why it matters: This research at MBZUAI aims to advance AI and immersive technologies for education and address potential risks in the metaverse.
MBZUAI has launched a Metaverse Lab led by Hao Li, focusing on integrating computer vision, graphics, and machine learning for metaverse applications. The lab aims to develop AI algorithms for photorealistic virtual humans and for digitizing dynamic environments. Pinscreen, Li's AI startup, previously created avatars for Expo 2020 Dubai. Why it matters: This initiative positions MBZUAI and the UAE as key players in the development of core technologies underpinning the metaverse and digital communication.
MBZUAI's Metaverse Center is developing technologies for realistic avatar generation. Hao Li and colleagues, collaborating with ETH Zurich, VinAI Research, and Pinscreen, presented a novel approach at CVPR 2024. The technology addresses the challenge of mapping 2D images to 3D avatars while accounting for poses, expressions, and viewpoints. Why it matters: Realistic, efficient avatar generation could improve user experience and accessibility in virtual environments across the Middle East.
MBZUAI researchers developed LLMVoX, a system that enables LLMs to produce speech in real time, including in Arabic. LLMVoX addresses the limitations of existing end-to-end and cascaded pipeline approaches, which suffer from degraded reasoning and high latency, respectively. LLMVoX was developed as part of Project OMER, which was recently awarded a Regional Research Grant from Meta. Why it matters: This enhances the potential of LLMs to function as more natural, multimodal virtual assistants, especially for Arabic-speaking users in the Middle East.
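The streaming idea behind this kind of system can be sketched in a few lines of Python: instead of waiting for the LLM's full response before synthesizing audio (the cascaded approach), a lightweight TTS consumes text chunks as they are generated, so playback overlaps with ongoing generation. The functions `llm_token_stream` and `synthesize_chunk` below are illustrative stand-ins, not the LLMVoX API.

```python
from typing import Iterator

def llm_token_stream(prompt: str) -> Iterator[str]:
    """Stand-in for an LLM that yields text tokens incrementally."""
    for token in "Marhaba! Here is a streamed answer.".split():
        yield token + " "

def synthesize_chunk(text: str) -> bytes:
    """Stand-in for a lightweight TTS model that converts text to audio."""
    return text.encode("utf-8")  # placeholder for real audio samples

def stream_speech(prompt: str, chunk_tokens: int = 4) -> Iterator[bytes]:
    """Buffer a few tokens at a time and emit audio as soon as each chunk
    is ready, so speech can start before the full response is complete."""
    buffer = []
    for token in llm_token_stream(prompt):
        buffer.append(token)
        if len(buffer) >= chunk_tokens:
            yield synthesize_chunk("".join(buffer))
            buffer.clear()
    if buffer:  # flush any trailing tokens
        yield synthesize_chunk("".join(buffer))

for audio_chunk in stream_speech("Explain LLMVoX briefly."):
    print(f"play {len(audio_chunk)} bytes")  # would go to an audio device
```

The key design point is that latency is bounded by the first chunk rather than the whole response, while the LLM itself is untouched, which is how a decoupled pipeline avoids degrading the model's reasoning.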
The article discusses research on fine-tuning text-to-image diffusion models, covering reward function training, online reinforcement learning (RL) fine-tuning, and the problem of reward over-optimization. A Text-Image Alignment Assessment (TIA2) benchmark is introduced to study reward over-optimization, and TextNorm, a confidence-calibration method for reward models, is presented to reduce its risks. Why it matters: Improving the alignment and fidelity of text-to-image models is crucial for generating high-quality content, and addressing over-optimization enhances the reliability of these models in creative applications.
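A rough sketch of the calibration idea, under the assumption that the reward is normalized against a set of semantically contrasting prompts (a high raw score that contrast prompts also match gets downweighted, discouraging over-optimization of a miscalibrated reward). The `reward_model` argument and the toy scorer below are hypothetical stand-ins, not the TextNorm implementation.

```python
import math
from typing import Callable, List

def normalized_reward(
    reward_model: Callable[[str, object], float],
    image: object,
    prompt: str,
    contrast_prompts: List[str],
    temperature: float = 1.0,
) -> float:
    """Score the image against the target prompt, softmax-normalized
    over semantically contrasting prompts: the reward is high only if
    the image matches the target prompt much better than alternatives."""
    scores = [reward_model(p, image) for p in [prompt] + contrast_prompts]
    exps = [math.exp(s / temperature) for s in scores]
    return exps[0] / sum(exps)

# Toy usage with a trivial character-overlap "scorer" in place of a real
# text-image reward model (e.g. a CLIP-style scorer).
dummy = lambda p, img: float(len(set(p) & set(img)))
print(normalized_reward(dummy, "a red cube", "a red cube",
                        ["a blue sphere", "a green cone"]))
```

In an RL fine-tuning loop, this normalized score would simply replace the raw reward, so the policy gains less from exploiting spurious high scores.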
Nicu Sebe from the University of Trento presented recent work on video generation, focusing on animating objects in a source image using external information such as labels, driving videos, or text. He introduced a Learnable Game Engine (LGE), trained on monocular annotated videos, which maintains the states of scenes, objects, and agents and renders them from controllable viewpoints. Why it matters: This talk highlights advancements in cross-modal AI, potentially enabling new applications in gaming, simulation, and content creation within the region.
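One way to picture the state-plus-renderer design is the toy loop below: a learned dynamics model advances a persistent scene state in response to each action, and a separate renderer maps state and camera to a frame, which is what makes the viewpoint freely controllable. All names are illustrative stand-ins, not the LGE code.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SceneState:
    objects: List[str]     # latent object states
    agents: List[str]      # latent agent states
    last_action: str = ""  # conditioning signal (label, text, etc.)

def dynamics(state: SceneState, action: str) -> SceneState:
    """Stand-in for a learned transition model conditioned on an action."""
    return SceneState(state.objects, state.agents, last_action=action)

def render(state: SceneState, camera: str) -> str:
    """Stand-in for a neural renderer: state + viewpoint -> frame."""
    return f"frame(camera={camera}, after='{state.last_action}')"

state = SceneState(objects=["ball"], agents=["player"])
for action in ["move left", "jump"]:
    state = dynamics(state, action)       # state evolves per command
    print(render(state, camera="orbit"))  # any viewpoint can be rendered
```

Separating dynamics from rendering is what distinguishes this from ordinary video generation: the engine keeps a consistent world state, and the camera is just an input to the renderer.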