GCC AI Research

Results for "voice synthesizer"

NatiQ: An End-to-end Text-to-Speech System for Arabic

arXiv ·

Qatar Computing Research Institute (QCRI) has developed NatiQ, an end-to-end text-to-speech (TTS) system for Arabic built on encoder-decoder architectures. The system uses Tacotron-based and Transformer models to generate mel-spectrograms, which vocoders such as WaveRNN, WaveGlow, and Parallel WaveGAN then convert into waveforms. Trained on in-house speech data featuring a neutral male voice (Hamza) and an expressive female voice (Amina), NatiQ achieves Mean Opinion Scores (MOS) of 4.21 and 4.40, respectively. Why it matters: This research advances Arabic language technology, providing high-quality TTS synthesis that can make digital content more accessible and usable for Arabic speakers.
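
For readers unfamiliar with the metric, a Mean Opinion Score is conventionally the arithmetic mean of listener ratings on a 1 to 5 scale. The sketch below shows that calculation only; the function name and the sample ratings are hypothetical illustrations, not NatiQ's evaluation code or data.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average a list of listener ratings on the usual 1-5 MOS scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating {rating} is outside the 1-5 scale")
    return mean(ratings)


# Made-up ratings from four hypothetical listeners, for illustration only.
example_ratings = [4, 5, 4, 4]
print(mean_opinion_score(example_ratings))
```

Published MOS figures are usually averaged over many listeners and utterances, so a single small sample like this one would not be comparable to the scores reported above.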

Text-to-speech system brings real-time speech to LLMs

MBZUAI ·

MBZUAI researchers developed LLMVoX, a system that enables LLMs to produce real-time speech, including in Arabic. LLMVoX addresses limitations of existing end-to-end and cascaded pipeline approaches, which suffer from either degraded reasoning or latency. LLMVoX was developed as part of Project OMER, which was recently awarded a Regional Research Grant from Meta. Why it matters: This enhances the potential of LLMs to function as more natural, multimodal virtual assistants, especially for Arabic-speaking users in the Middle East.

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

arXiv ·

MBZUAI researchers introduce LLMVoX, a 30M-parameter, LLM-agnostic, autoregressive streaming text-to-speech (TTS) system that generates high-quality speech with low latency. The system preserves the capabilities of the base LLM and achieves a lower Word Error Rate than speech-enabled LLMs. LLMVoX supports seamless, infinite-length dialogues and generalizes to new languages, including Arabic, through dataset adaptation.
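
As background on the metric mentioned above, Word Error Rate is commonly defined as the word-level edit distance (substitutions, insertions, and deletions) between a reference transcript and a hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition; the function name and the sample sentences are hypothetical, and it is not the evaluation script used for LLMVoX, which may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current[j] = min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            )
        previous = current
    return previous[-1] / len(ref_words)


# Illustrative sentences only: one substitution and one deletion
# against a six-word reference.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

In a TTS setting the hypothesis would typically come from running a speech recognizer on the synthesized audio, so the score reflects both the synthesizer and the recognizer.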

The future of audio AI: adoption use cases powering the Middle East

MBZUAI ·

ElevenLabs, a voice AI research and product company, presented at MBZUAI's Incubation and Entrepreneurship Center (IEC) on the adoption of audio AI in the Middle East. Hussein Makki, general manager for the Middle East at ElevenLabs, highlighted the potential of voice-native AI across sectors like telecommunications, banking, and education. ElevenLabs focuses on making content accessible and engaging across languages and voices through its text-to-speech models. Why it matters: This signals growing interest and investment in voice AI applications within the region, potentially transforming customer service and content accessibility in Arabic.

Making human-machine conversation more lifelike than ever at GITEX

MBZUAI ·

MBZUAI researchers demonstrated a low-latency, multilingual, multimodal AI system at GITEX that integrates speech, text, and visual capabilities for more lifelike human-machine conversation. The demo, led by Dr. Hisham Cholakkal, includes a mobile app in which users can point their camera at an object and ask questions, receiving spoken answers in multiple languages. The team is also integrating the model into a robot dog that can respond to voice commands. Why it matters: This work addresses key challenges in deploying LLMs to real-world applications in the Middle East, such as multilingual support and real-time responsiveness.