Middle East AI

Speech

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

arXiv · LLM · NLP

MBZUAI researchers introduce LLMVoX, a 30M-parameter, LLM-agnostic, autoregressive streaming text-to-speech (TTS) system that generates high-quality speech with low latency. Because it works alongside the base LLM, the system preserves that model's capabilities, and it achieves a lower Word Error Rate (WER) than speech-enabled LLMs. LLMVoX supports seamless, infinite-length dialogues and generalizes to new languages, including Arabic, through dataset adaptation.
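
For readers unfamiliar with the metric, Word Error Rate is the word-level edit distance (substitutions, deletions, and insertions) between a reference text and a hypothesis, divided by the number of reference words. In TTS evaluation the hypothesis is typically an automatic transcription of the generated speech. The sketch below is a minimal illustration of that standard definition, not the evaluation code used for LLMVoX; the function name `word_error_rate`, the lowercase-and-split normalization, and the sample sentences are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative strings only: one substituted word out of six.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

A lower value means the spoken output matches the intended text more closely. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.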