MBZUAI's Metaverse Lab is developing AI algorithms for photorealistic virtual humans and dynamic environments. Hao Li, Director of the lab, envisions using the metaverse for immersive learning experiences related to history and culture. He is also working on tools to prevent deepfakes and other cyberthreats. Why it matters: This research at MBZUAI aims to advance AI and immersive technologies for education and address potential risks in the metaverse.
MBZUAI's Dr. Hao Li is working on using AI and 3D telepresence to transform communication, work, and education by replacing physical transportation with virtual teleportation. His research sits at the intersection of computer graphics, computer vision, and AI, with a focus on virtual avatar creation and facial performance capture. Li aims to use AI to enable forms of communication that are not possible in person. Why it matters: This research has the potential to reduce carbon footprints by enabling remote work and virtual collaboration, while also positioning MBZUAI and the UAE as leaders in AI-driven metaverse technologies.
MBZUAI's Metaverse Center is developing technologies for realistic avatar generation. Hao Li and colleagues presented a novel approach at CVPR 2024, in collaboration with ETH Zurich, VinAI Research, and Pinscreen. The technology addresses the challenge of mapping 2D images to 3D avatars while accounting for variations in pose, expression, and viewpoint. Why it matters: Realistic, efficient avatar generation could improve user experience and accessibility in virtual environments across the Middle East.
The article discusses research on fine-tuning text-to-image diffusion models, including reward function training, online reinforcement learning (RL) fine-tuning, and addressing reward over-optimization. A Text-Image Alignment Assessment (TIA2) benchmark is introduced to study reward over-optimization. TextNorm, a method for confidence calibration in reward models, is presented to reduce over-optimization risks. Why it matters: Improving the alignment and fidelity of text-to-image models is crucial for generating high-quality content, and addressing over-optimization enhances the reliability of these models in creative applications.
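The calibration idea behind methods like TextNorm can be sketched as normalizing a reward model's score for the true prompt against its scores for a set of contrastive prompts, so that an over-confident reward on a misaligned image is damped. The sketch below is illustrative only: the function name, the softmax form, and the `temperature` parameter are assumptions, not the paper's exact formulation.

```python
import math

def calibrated_reward(reward_fn, image, prompt, contrastive_prompts, temperature=1.0):
    """Illustrative sketch (not the published TextNorm algorithm):
    normalize the reward for the true prompt via a softmax over the
    true prompt and a set of semantically contrastive prompts.

    A raw reward that is high for the true prompt but also high for
    contrastive prompts yields a low calibrated score, discouraging
    the RL fine-tuning loop from over-optimizing a miscalibrated reward.
    """
    scores = [reward_fn(image, p) / temperature
              for p in [prompt] + list(contrastive_prompts)]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    return exps[0] / sum(exps)  # in (0, 1]; higher = more confident alignment
```

The calibrated score can then replace the raw reward inside an online RL fine-tuning objective, trading a little reward signal for robustness to over-optimization.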
FancyVideo, a new video generator, introduces a Cross-frame Textual Guidance Module (CTGM) to enhance text-to-video models. CTGM uses a Temporal Information Injector and a Temporal Affinity Refiner to provide frame-specific textual guidance, improving the model's grasp of temporal logic. Experiments on the EvalCrafter benchmark demonstrate FancyVideo's state-of-the-art performance in generating dynamic, consistent videos; the model also supports image-to-video generation.
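The core intuition of frame-specific textual guidance can be illustrated with a toy sketch: inject per-frame temporal information into a shared text embedding, then compute a normalized cross-frame affinity over the resulting per-frame conditions. The component names in the comments mirror the summary above, but the arithmetic here is a stand-in, not FancyVideo's actual architecture.

```python
import numpy as np

def frame_specific_guidance(text_emb, num_frames, seed=0):
    """Toy sketch (not FancyVideo's actual CTGM implementation):

    1. "Temporal Information Injector" stand-in: add a small learned-style
       per-frame embedding to the shared text embedding, giving each frame
       its own text condition instead of one static prompt embedding.
    2. "Temporal Affinity Refiner" stand-in: softmax-normalize pairwise
       similarities between the per-frame conditions, yielding a
       cross-frame affinity matrix a video model could attend over.
    """
    text_emb = np.asarray(text_emb, dtype=float)        # shape (d,)
    rng = np.random.default_rng(seed)
    temporal_emb = 0.1 * rng.normal(size=(num_frames, text_emb.shape[0]))
    per_frame = text_emb[None, :] + temporal_emb         # (num_frames, d)
    sims = per_frame @ per_frame.T                       # (num_frames, num_frames)
    affinity = np.exp(sims - sims.max(axis=-1, keepdims=True))
    affinity /= affinity.sum(axis=-1, keepdims=True)     # rows sum to 1
    return per_frame, affinity
```

In a real model the per-frame conditions would feed cross-attention layers in the denoising network; the sketch only shows why frame-specific conditions carry temporal signal that a single static prompt embedding cannot.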
MBZUAI has launched a Metaverse Lab led by Hao Li, focusing on integrating computer vision, graphics, and machine learning for metaverse applications. The lab aims to develop AI algorithms for photorealistic virtual humans and dynamic environment digitization. Pinscreen, Li's AI startup, previously created avatars for Expo 2020 Dubai. Why it matters: This initiative positions MBZUAI and the UAE as key players in the development of core technologies underpinning the metaverse and digital communication.
MBZUAI researchers demonstrated a low-latency, multilingual multimodal AI system at GITEX that integrates speech, text, and visual capabilities for more lifelike human-machine conversation. The demo, led by Dr. Hisham Cholakkal, includes a mobile app where users can point their camera at an object and ask questions, receiving spoken answers in multiple languages. They are also integrating the model into a robot dog that can respond to voice commands. Why it matters: This work addresses key challenges in deploying LLMs to real-world applications in the Middle East, such as multilingual support and real-time responsiveness.