Researchers at MBZUAI have developed Auto-DUB, a system that uses deep learning, natural language processing, and computer vision to improve audio-visual dubbing, particularly for educational videos. Its three-step process generates subtitles, creates an audio representation, and synchronizes the audio with lip movements. The system aims to overcome language barriers in e-learning by providing accurate translations and lip-synced audio. Why it matters: This research addresses a critical need in online education by making content more accessible to non-native English speakers, potentially expanding access to global educational resources in the Arab world.
ElevenLabs, a voice AI research and product company, presented at MBZUAI's Incubation and Entrepreneurship Center (IEC) on the adoption of audio AI in the Middle East. Hussein Makki, general manager for the Middle East at ElevenLabs, highlighted the potential of voice-native AI across sectors like telecommunications, banking, and education. ElevenLabs focuses on making content accessible and engaging across languages and voices through its text-to-speech models. Why it matters: This signals growing interest and investment in voice AI applications within the region, potentially transforming customer service and content accessibility in Arabic.
MBZUAI researchers have introduced LLMVoX, a 30M-parameter, LLM-agnostic, autoregressive streaming text-to-speech (TTS) system that generates high-quality speech with low latency. The system preserves the capabilities of the base LLM and achieves a lower Word Error Rate (WER) than speech-enabled LLMs. LLMVoX supports seamless, infinite-length dialogues and generalizes to new languages, including Arabic, through dataset adaptation.
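For readers unfamiliar with the metric, Word Error Rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a recognized transcript, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the evaluation code used for LLMVoX. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; published evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 / 6.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.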
The UAE government has issued a warning to the public regarding the dangers of misleading AI-generated videos, particularly those used to spread rumors and false information. Authorities emphasized the importance of verifying the credibility of video content before sharing it on social media. The warning highlights potential legal consequences for individuals involved in creating or disseminating such content. Why it matters: This proactive stance reflects growing concern in the UAE about the misuse of AI-driven technologies and the government's commitment to combating disinformation.
MBZUAI researchers, in collaboration with Monash University, have introduced ArEnAV, a new dataset for deepfake detection featuring Arabic-English code-switching. The dataset comprises 765 hours of manipulated YouTube videos, incorporating intra-utterance code-switching and dialect variations. Experiments showed that code-switching significantly reduces the performance of existing deepfake detectors. Why it matters: This work addresses a critical gap in AI's ability to handle linguistic diversity, particularly in regions where code-switching is prevalent, enhancing the reliability of deepfake detection in real-world scenarios.
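To make "intra-utterance code-switching" concrete, the toy sketch below counts how often a single transcript line switches between Arabic-script and Latin-script words. It is an illustrative heuristic only and is not how ArEnAV was built or annotated. The function names `script_of` and `count_switches` are hypothetical, and the approach assumes Arabic is written in Arabic script (the Unicode block U+0600 to U+06FF), so romanized Arabic would be miscounted as Latin.

```python
def script_of(token: str) -> str:
    """Label a token "ar", "latin", or "other" by the script of its letters."""
    has_arabic = any("\u0600" <= ch <= "\u06FF" for ch in token)
    has_latin = any(ch.isascii() and ch.isalpha() for ch in token)
    if has_arabic and not has_latin:
        return "ar"
    if has_latin and not has_arabic:
        return "latin"
    return "other"  # digits, punctuation, or mixed-script tokens


def count_switches(utterance: str) -> int:
    """Count script changes between consecutive labeled words in one utterance."""
    labels = [script_of(tok) for tok in utterance.split()]
    labels = [lab for lab in labels if lab != "other"]
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)


if __name__ == "__main__":
    # Arabic -> English -> Arabic within one utterance: two switch points.
    print(count_switches("أنا رايح the meeting بكرة"))
```

An utterance with a count above zero mixes scripts within a single line, which is the pattern the dataset description refers to as intra-utterance switching.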