Skip to content
GCC AI Research

NatiQ: An End-to-end Text-to-Speech System for Arabic

arXiv · · Significant research

Summary

Qatar Computing Research Institute (QCRI) has developed NatiQ, an end-to-end text-to-speech (TTS) system for Arabic utilizing encoder-decoder architectures. The system employs Tacotron-based models and Transformer models to generate mel-spectrograms, which are then synthesized into waveforms using vocoders like WaveRNN, WaveGlow, and Parallel WaveGAN. Trained on in-house speech data featuring a neutral male voice (Hamza) and an expressive female voice (Amina), NatiQ achieves a Mean Opinion Score (MOS) of 4.21 and 4.40, respectively. Why it matters: This research advances Arabic language technology, providing high-quality TTS synthesis that can enhance accessibility and usability of digital content for Arabic speakers.

Keywords

Text-to-speech · Arabic · TTS · QCRI · NatiQ

Get the weekly digest

Top AI stories from the GCC region, every week.

Related

SpokenNativQA: Multilingual Everyday Spoken Queries for LLMs

arXiv ·

The Qatar Computing Research Institute (QCRI) has released SpokenNativQA, a multilingual spoken question-answering dataset for evaluating LLMs in conversational settings. The dataset contains 33,000 naturally spoken questions and answers across multiple languages, including low-resource and dialect-rich languages. It aims to address the limitations of text-based QA datasets by incorporating speech variability, accents, and linguistic diversity. Why it matters: This benchmark enables more robust evaluation of LLMs in speech-based interactions, particularly for Arabic dialects and other low-resource languages.

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

arXiv ·

MBZUAI researchers introduce LLMVoX, a 30M-parameter, LLM-agnostic, autoregressive streaming text-to-speech (TTS) system that generates high-quality speech with low latency. The system preserves the capabilities of the base LLM and achieves a lower Word Error Rate compared to speech-enabled LLMs. LLMVoX supports seamless, infinite-length dialogues and generalizes to new languages with dataset adaptation, including Arabic.

QASR: QCRI Aljazeera Speech Resource -- A Large Scale Annotated Arabic Speech Corpus

arXiv ·

The Qatar Computing Research Institute (QCRI) has released QASR, a 2,000-hour transcribed Arabic speech corpus collected from Aljazeera news broadcasts. The dataset features multi-dialect speech sampled at 16kHz, aligned with lightly supervised transcriptions and linguistically motivated segmentation. QCRI also released a 130M word dataset to improve language model training. Why it matters: QASR enables new research in Arabic speech recognition, dialect identification, punctuation restoration, and other NLP tasks for spoken data.

Sadeed: Advancing Arabic Diacritization Through Small Language Model

arXiv ·

The paper introduces Sadeed, a fine-tuned decoder-only language model based on the Kuwain 1.5B Hennara model, for improved Arabic text diacritization. Sadeed is fine-tuned on high-quality diacritized datasets and achieves competitive results compared to larger proprietary models. The authors also introduce SadeedDiac-25, a new benchmark for fairer evaluation of Arabic diacritization across diverse text genres. Why it matters: This work advances Arabic NLP by providing both a competitive diacritization model and a more robust evaluation benchmark, facilitating further research and development in the field.