GCC AI Research

Results for "speech technologies"

Past, Present and Future of Speech Technologies

MBZUAI

Pedro J. Moreno, former head of ASR R&D at Google, presented a talk at MBZUAI on the past, present, and future of speech technologies. The talk traced the evolution of speech tech, his career contributions, including work on Google Voice Search, and the impact of LLMs on speech science. He also discussed the interplay between foundational and applied research, as well as preparing the next generation of scientists. Why it matters: The talk offers a leading researcher's perspective on the trajectory of speech technologies, highlighting future directions and the ethical considerations surrounding AI's impact on society.

Research talk on Privacy and Security Issues in Speech

MBZUAI

Dr. Bhiksha Raj of Carnegie Mellon University, an expert in speech and audio processing, gave a research talk on privacy and security issues in speech processing, highlighting the unique privacy challenges posed by the biometric information embedded in speech. The talk covered the legal landscape, proposed solutions such as cryptographic and hashing-based methods, and adversarial processing techniques. Why it matters: As speech-based interfaces become more prevalent in the Middle East, understanding and addressing the associated privacy risks is crucial for ethical AI development and deployment.

Processing language like a human

MBZUAI

MBZUAI's Hanan Al Darmaki is working to improve automated speech recognition (ASR) for low-resource languages, where labeled data is scarce. She notes that Arabic presents unique challenges due to dialectal variations and a lack of written resources corresponding to spoken dialects. Al Darmaki's research focuses on unsupervised speech recognition to address this gap. Why it matters: Overcoming these challenges can improve virtual assistant effectiveness across diverse languages and enable more inclusive AI applications in the Arabic-speaking world.

A Panoramic Survey of Natural Language Processing in the Arab World

arXiv

This survey paper reviews the landscape of Natural Language Processing (NLP) research and applications in the Arab world. It discusses the unique challenges posed by the Arabic language, such as its morphological complexity and dialectal diversity. The paper also presents a historical overview of Arabic NLP and surveys various research areas, including machine translation, sentiment analysis, and speech recognition. Why it matters: The survey provides a comprehensive resource for researchers and practitioners interested in the current state and future directions of Arabic NLP, a field critical for enabling AI technologies to serve Arabic-speaking communities.

The future of audio AI: adoption use cases powering the Middle East

MBZUAI

ElevenLabs, a voice AI research and product company, presented at MBZUAI's Incubation and Entrepreneurship Center (IEC) on the adoption of audio AI in the Middle East. Hussein Makki, general manager for the Middle East at ElevenLabs, highlighted the potential of voice-native AI across sectors like telecommunications, banking, and education. ElevenLabs focuses on making content accessible and engaging across languages and voices through its text-to-speech models. Why it matters: This signals growing interest and investment in voice AI applications within the region, potentially transforming customer service and content accessibility in Arabic.

Text-to-speech system brings real-time speech to LLMs

MBZUAI

MBZUAI researchers developed LLMVoX, a system enabling LLMs to produce real-time speech, including in Arabic. LLMVoX addresses the limitations of existing approaches: end-to-end systems tend to degrade the LLM's reasoning ability, while cascaded pipelines introduce latency. LLMVoX was developed as part of Project OMER, which was recently awarded a Regional Research Grant from Meta. Why it matters: This enhances the potential of LLMs to function as more natural, multimodal virtual assistants, especially for Arabic-speaking users in the Middle East.

QASR: QCRI Aljazeera Speech Resource -- A Large Scale Annotated Arabic Speech Corpus

arXiv

The Qatar Computing Research Institute (QCRI) has released QASR, a 2,000-hour transcribed Arabic speech corpus collected from Aljazeera news broadcasts. The dataset features multi-dialect speech sampled at 16 kHz, aligned with lightly supervised transcriptions and linguistically motivated segmentation. QCRI also released a 130M-word text dataset to support language model training. Why it matters: QASR enables new research in Arabic speech recognition, dialect identification, punctuation restoration, and other NLP tasks for spoken data.

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

arXiv

MBZUAI researchers introduce LLMVoX, a 30M-parameter, LLM-agnostic, autoregressive streaming text-to-speech (TTS) system that generates high-quality speech with low latency. The system preserves the capabilities of the base LLM and achieves a lower word error rate (WER) than existing speech-enabled LLMs. LLMVoX supports seamless, infinite-length dialogues and generalizes to new languages, including Arabic, through dataset adaptation.