GCC AI Research

Results for "Autodub"

I see what you’re saying: the Abu Dhabi AI researchers making video dubbing sync

MBZUAI ·

Researchers at MBZUAI have developed Auto-DUB, a system that combines deep learning, natural language processing, and computer vision to improve audio-visual dubbing, particularly for educational videos. The three-step process generates subtitles, creates an audio representation, and synchronizes the audio with lip movements. The system aims to overcome language barriers in e-learning by providing accurate translations and lip-synced audio. Why it matters: This research addresses a critical need in online education by making content more accessible to non-native English speakers, potentially expanding access to global educational resources in the Arab world.
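The three-step pipeline in the summary can be sketched as follows. This is a minimal, runnable illustration with stubbed stages; the function names and data types are assumptions for clarity, not the authors' actual API or models.

```python
# Hypothetical sketch of the three-step dubbing pipeline: subtitles ->
# synthesized audio -> lip synchronization. Each stage is stubbed.

def generate_subtitles(video: str, target_lang: str) -> list[str]:
    # Step 1: speech recognition plus translation would run here (stubbed).
    return [f"[{target_lang}] subtitle for {video}"]

def synthesize_audio(subtitles: list[str]) -> bytes:
    # Step 2: a text-to-speech model would voice the translated text (stubbed).
    return " ".join(subtitles).encode("utf-8")

def sync_with_lips(video: str, audio: bytes) -> dict:
    # Step 3: a lip-sync model would align mouth movements to the new audio.
    return {"video": video, "audio_bytes": len(audio), "synced": True}

def autodub(video: str, target_lang: str) -> dict:
    subs = generate_subtitles(video, target_lang)
    audio = synthesize_audio(subs)
    return sync_with_lips(video, audio)

result = autodub("lecture.mp4", "ar")
```

Staging the system this way is what makes each step auditable: a human can correct the subtitles before any audio is synthesized.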

MBZUAI teams shine in competition

MBZUAI ·

Two teams from MBZUAI won awards at the IEEE SLT international hackathon held in Qatar. One team won the "Best Potential Impact Project" award for Autodub, a human-in-the-loop AI dubbing platform. The second MBZUAI team won the "Craziest Idea Award" for a commentator voice synthesizer for video games. Why it matters: The wins highlight MBZUAI's strength in applied AI research and its students' ability to develop innovative solutions with practical applications.

The future of audio AI: adoption and use cases powering the Middle East

MBZUAI ·

ElevenLabs, a voice AI research and product company, presented at MBZUAI's Incubation and Entrepreneurship Center (IEC) on the adoption of audio AI in the Middle East. Hussein Makki, general manager for the Middle East at ElevenLabs, highlighted the potential of voice-native AI across sectors like telecommunications, banking, and education. ElevenLabs focuses on making content accessible and engaging across languages and voices through its text-to-speech models. Why it matters: This signals growing interest and investment in voice AI applications within the region, potentially transforming customer service and content accessibility in Arabic.

How MBZUAI’s Incubation and Entrepreneurship Center is helping two students revolutionize content creation

MBZUAI ·

MBZUAI students Muhammad Taimoor Haseeb and Ahmad Hammoudeh have created Audiomatic, an AI-driven platform that automates audio tasks for visual storytelling and addresses licensing challenges. The platform allows users to upload videos and automatically find suitable audio elements, streamlining the content creation process. The MBZUAI Incubation and Entrepreneurship Center (IEC) is providing support to help commercialize the platform. Why it matters: This platform has the potential to significantly impact the content creation industry in the region by simplifying audio production and mitigating licensing issues, while also highlighting MBZUAI's role in fostering AI innovation and entrepreneurship.

Upsampling Autoencoder for Self-Supervised Point Cloud Learning

arXiv ·

This paper introduces a self-supervised learning method for point cloud analysis using an upsampling autoencoder (UAE). The model uses subsampling and an encoder-decoder architecture to reconstruct the original point cloud, learning both semantic and geometric information. Experiments show the UAE outperforms existing methods in shape classification, part segmentation, and point cloud upsampling tasks.
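The pretext task described above can be illustrated in a few lines: subsample a point cloud, ask a model to upsample it back, and score the reconstruction with a symmetric nearest-neighbor (Chamfer) loss. The grid sizes are arbitrary and the "decoder" here is a trivial stand-in; the point is the task setup, not the UAE architecture itself.

```python
import numpy as np

# Sketch of the upsampling-autoencoder pretext task: reconstruct a full
# point cloud from a random subsample. Sizes and ratios are illustrative.

def subsample(points: np.ndarray, ratio: float = 0.25) -> np.ndarray:
    # Random point dropout stands in for the paper's subsampling stage.
    idx = np.random.choice(len(points), int(len(points) * ratio), replace=False)
    return points[idx]

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Symmetric nearest-neighbor distance, the usual reconstruction
    # loss for point-cloud autoencoders.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

cloud = np.random.rand(1024, 3)        # toy point cloud
sparse = subsample(cloud)              # encoder input
reconstructed = cloud.copy()           # what an ideal decoder would emit
loss = chamfer_distance(reconstructed, cloud)
```

Because the target is the original cloud itself, no labels are needed, which is what makes the method self-supervised.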

Inferring and Improving Street Maps with Data-Driven Automation

arXiv ·

Researchers at MIT and QCRI developed Mapster, a human-in-the-loop street map editing system. Mapster incorporates high-precision automatic map inference, data refinement, and machine-assisted map editing. Evaluation across forty cities using satellite imagery, GPS trajectories, and ground-truth data demonstrates Mapster's ability to make automation practical for map editing. Why it matters: This system could significantly improve the accuracy and completeness of street maps in rapidly developing urban areas across the Middle East.

OmniGen: Unified Multimodal Sensor Generation for Autonomous Driving

arXiv ·

The paper introduces OmniGen, a unified framework for generating aligned multimodal sensor data for autonomous driving using a shared Bird's Eye View (BEV) space. It uses a generalizable multimodal reconstruction method (UAE) to jointly decode LiDAR and multi-view camera data through volume rendering. The framework incorporates a Diffusion Transformer (DiT) with a ControlNet branch to enable controllable multimodal sensor generation, demonstrating strong generation quality and cross-modal consistency.
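The shared-BEV idea can be made concrete with a toy rasterization: both LiDAR points and projected camera features land in the same top-down grid, which is what lets one generator keep the modalities spatially aligned. The grid size, extent, and occupancy encoding below are assumptions for illustration, not OmniGen's actual representation.

```python
import numpy as np

# Toy Bird's Eye View rasterization: scatter 3D points into a top-down
# occupancy grid. Grid resolution and metric extent are illustrative.

def to_bev(points: np.ndarray, grid: int = 8, extent: float = 50.0) -> np.ndarray:
    bev = np.zeros((grid, grid))
    # Map x, y in [-extent, extent] metres to integer grid cells.
    cells = ((points[:, :2] + extent) / (2 * extent) * grid).astype(int)
    cells = np.clip(cells, 0, grid - 1)
    for x, y in cells:
        bev[y, x] += 1  # count points per cell
    return bev

lidar = np.array([[0.0, 0.0, 1.0], [10.0, -10.0, 0.5]])
occupancy = to_bev(lidar)
```

A generator conditioned on such a grid produces LiDAR and camera outputs that agree about where objects sit, because both are decoded from the same spatial layout.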