MBZUAI students Muhammad Taimoor Haseeb and Ahmad Hammoudeh have created Audiomatic, an AI-driven platform that automates audio tasks for visual storytelling and addresses licensing challenges. The platform allows users to upload videos and automatically find suitable audio elements, streamlining the content creation process. The MBZUAI Incubation and Entrepreneurship Center (IEC) is providing support to help commercialize the platform. Why it matters: This platform has the potential to significantly impact the content creation industry in the region by simplifying audio production and mitigating licensing issues, while also highlighting MBZUAI's role in fostering AI innovation and entrepreneurship.
ElevenLabs, a voice AI research and product company, presented at MBZUAI's Incubation and Entrepreneurship Center (IEC) on the adoption of audio AI in the Middle East. Hussein Makki, general manager for the Middle East at ElevenLabs, highlighted the potential of voice-native AI across sectors like telecommunications, banking, and education. ElevenLabs focuses on making content accessible and engaging across languages and voices through its text-to-speech models. Why it matters: This signals growing interest and investment in voice AI applications within the region, potentially transforming customer service and content accessibility in Arabic.
MBZUAI's Incubation and Entrepreneurship Center (MIEC), launched in November 2023, is fostering AI-driven startups, including LibrAI, Audiomatic, and Limb. LibrAI is an AI safety monitoring platform founded by MBZUAI postdoctoral researcher Xudong Han. Audiomatic, created by MBZUAI students Muhammad Taimoor Haseeb and Ahmad Hammoudeh, is an AI-powered audio integration platform. Why it matters: These startups demonstrate MBZUAI's role in translating AI research into practical solutions, contributing to the UAE's innovation ecosystem and addressing real-world challenges.
MBZUAI and startAD jointly launched an entrepreneurship program to boost the AI startup ecosystem in Abu Dhabi. The program culminated in startup pitches, with top ideas including Audiomatic for AI-assisted audio production, Limb for accessible physiotherapy information, and Momzo, a generative AI assistant for maternity and motherhood. The 22 graduates, representing over 10 nationalities, completed intensive courses covering idea generation, prototyping, and pitching. Why it matters: This initiative underscores the UAE's commitment to fostering AI innovation and entrepreneurship, aiming to translate research into impactful businesses and contribute significantly to the nation's knowledge economy.
MBZUAI's Hanan Al Darmaki is working to improve automatic speech recognition (ASR) for low-resource languages, where labeled data is scarce. She notes that Arabic presents unique challenges because of its dialectal variation and the lack of written resources corresponding to spoken dialects. Al Darmaki's research focuses on unsupervised speech recognition to address this gap. Why it matters: Overcoming these challenges can improve virtual assistant effectiveness across diverse languages and enable more inclusive AI applications in the Arabic-speaking world.
Qatar Computing Research Institute (QCRI) has developed NatiQ, an end-to-end text-to-speech (TTS) system for Arabic built on encoder-decoder architectures. The system uses Tacotron-based and Transformer models to generate mel-spectrograms, which vocoders such as WaveRNN, WaveGlow, and Parallel WaveGAN then synthesize into waveforms. Trained on in-house speech data featuring a neutral male voice (Hamza) and an expressive female voice (Amina), NatiQ achieves Mean Opinion Scores (MOS) of 4.40 for Hamza and 4.21 for Amina. Why it matters: This research advances Arabic language technology, providing high-quality TTS synthesis that can enhance accessibility and usability of digital content for Arabic speakers.
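For readers unfamiliar with the metric, a Mean Opinion Score is generally the average of listener ratings on a 1-to-5 scale. The sketch below shows that arithmetic with an approximate 95% confidence interval. The function name and the sample ratings are hypothetical illustrations; they are not NatiQ data, and the summary above does not describe QCRI's actual listening-test protocol.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    `ratings` is a list of listener scores on the usual 1-to-5 scale.
    The interval uses a normal approximation (1.96 * standard error),
    which is only a rough guide for small listener panels.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")

    mos = mean(ratings)
    if len(ratings) < 2:
        return mos, 0.0  # no spread can be estimated from a single rating

    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from five listeners for one synthesized utterance.
example_ratings = [4, 5, 4, 4, 5]
mos, ci = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

In practice, published MOS figures aggregate many listeners and many utterances, so the interval would typically be much narrower than in this toy example.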
Researchers at MBZUAI have developed Auto-DUB, a system that uses deep learning, natural language processing (NLP), and computer vision (CV) to improve audio-visual dubbing, particularly for educational videos. Its three-step process generates subtitles, creates an audio representation, and synchronizes the audio with lip movements. The system aims to overcome language barriers in e-learning by providing accurate translations and lip-synced audio. Why it matters: This research addresses a critical need in online education by making content more accessible to non-native English speakers, potentially expanding access to global educational resources in the Arab world.
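The three steps described for Auto-DUB can be pictured as a simple chain in which each stage feeds the next. The sketch below is only a structural illustration under stated assumptions: the class, field names, and stub stages are hypothetical, and nothing here reflects the researchers' actual models or code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DubbingPipeline:
    """Hypothetical three-stage dubbing chain (not the Auto-DUB implementation)."""

    # Stage 1: video path -> subtitle text in the target language.
    make_subtitles: Callable[[str], str]
    # Stage 2: subtitle text -> synthesized speech as raw audio bytes.
    synthesize_audio: Callable[[str], bytes]
    # Stage 3: (video path, audio) -> path of the lip-synced output video.
    sync_lips: Callable[[str, bytes], str]

    def run(self, video_path: str) -> str:
        subtitles = self.make_subtitles(video_path)
        audio = self.synthesize_audio(subtitles)
        return self.sync_lips(video_path, audio)


# Stub stages standing in for real models, purely to show the data flow.
pipeline = DubbingPipeline(
    make_subtitles=lambda path: f"translated subtitles for {path}",
    synthesize_audio=lambda text: text.encode("utf-8"),
    sync_lips=lambda path, audio: f"{path}.dubbed",
)
print(pipeline.run("lecture.mp4"))
```

Keeping the stages as interchangeable callables is one possible design choice; it lets a translation, speech, or lip-sync model be swapped without touching the others.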
The Qatar Computing Research Institute (QCRI) has released SpokenNativQA, a multilingual spoken question-answering dataset for evaluating LLMs in conversational settings. The dataset contains 33,000 naturally spoken questions and answers across multiple languages, including low-resource and dialect-rich languages. It aims to address the limitations of text-based QA datasets by incorporating speech variability, accents, and linguistic diversity. Why it matters: This benchmark enables more robust evaluation of LLMs in speech-based interactions, particularly for Arabic dialects and other low-resource languages.