Speech

5 articles

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

arXiv · Mar 6 · LLM NLP

MBZUAI researchers introduce LLMVoX, a 30M-parameter, LLM-agnostic, autoregressive streaming text-to-speech (TTS) system that generates high-quality speech with low latency. The system preserves the capabilities of the base LLM and achieves a lower Word Error Rate (WER) than speech-enabled LLMs. LLMVoX supports seamless, infinite-length dialogues and generalizes to new languages, including Arabic, through dataset adaptation.
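
For readers unfamiliar with the Word Error Rate metric mentioned above, here is a minimal sketch of the standard textbook definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. This is not code from the LLMVoX paper; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and the summary does not describe how the authors normalize text or obtain transcripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level edit distance / number of reference words.

    Assumption: words are split on whitespace, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

A lower value means the hypothesis is closer to the reference. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.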

QASR: QCRI Aljazeera Speech Resource -- A Large Scale Annotated Arabic Speech Corpus

arXiv · Jun 24 · NLP Arabic AI

The Qatar Computing Research Institute (QCRI) has released QASR, a 2,000-hour transcribed Arabic speech corpus collected from Aljazeera news broadcasts. The dataset features multi-dialect speech sampled at 16 kHz, aligned with lightly supervised transcriptions and linguistically motivated segmentation. QCRI also released a 130M-word text dataset to improve language model training. Why it matters: QASR enables new research in Arabic speech recognition, dialect identification, punctuation restoration, and other NLP tasks for spoken data.

MBZUAI team wins top prize at inaugural Arabic Natural Language Processing Conference

MBZUAI · Mar 25 · NLP Arabic AI

An MBZUAI team won the best paper award at the inaugural Arabic Natural Language Processing Conference for their work on processing Arabic speech. Their study establishes a new approach to the complexities of spoken Arabic, whose processing differs significantly from text-based language modeling. The approach aims to enable new tools for Arabic speakers by addressing challenges such as intonation and the continuous nature of speech. Why it matters: This award highlights the importance of specialized research in Arabic NLP, as mainstream LLMs often struggle to process the nuances of Arabic speech accurately.

How dialectal pretraining improves Arabic automatic speech recognition

MBZUAI · Mar 25 · NLP Arabic AI

MBZUAI researchers presented a study at ACL 2024 on improving Arabic automatic speech recognition (ASR) by pretraining on dialectal Arabic. They trained three versions of the ArTST model: one on Modern Standard Arabic (MSA), one on MSA and dialectal data, and one on MSA, dialectal, and multilingual data. Results showed that pretraining on dialectal Arabic improves ASR performance across MSA and various dialects. Why it matters: This research addresses a key challenge in Arabic NLP, the diversity and lack of standardization across dialects, and could lead to more accurate speech recognition systems.

Processing language like a human

MBZUAI · Mar 25 · NLP Research

MBZUAI's Hanan Al Darmaki is working to improve automatic speech recognition (ASR) for low-resource languages, where labeled data is scarce. She notes that Arabic presents unique challenges because of its dialectal variation and the lack of written resources corresponding to spoken dialects. Al Darmaki's research focuses on unsupervised speech recognition to address this gap. Why it matters: Overcoming these challenges can make virtual assistants more effective across diverse languages and enable more inclusive AI applications in the Arabic-speaking world.