GCC AI Research

Results for "ASR"

QASR: QCRI Aljazeera Speech Resource -- A Large Scale Annotated Arabic Speech Corpus

arXiv

The Qatar Computing Research Institute (QCRI) has released QASR, a 2,000-hour transcribed Arabic speech corpus collected from Aljazeera news broadcasts. The dataset features multi-dialect speech sampled at 16 kHz, aligned with lightly supervised transcriptions and segmented along linguistically motivated boundaries. QCRI also released a 130M-word dataset to support language model training. Why it matters: QASR enables new research in Arabic speech recognition, dialect identification, punctuation restoration, and other NLP tasks for spoken data.

Processing language like a human

MBZUAI

MBZUAI's Hanan Al Darmaki is working to improve automatic speech recognition (ASR) for low-resource languages, where labeled data is scarce. She notes that Arabic presents unique challenges because of its dialectal variation and the lack of written resources corresponding to spoken dialects. Al Darmaki's research focuses on unsupervised speech recognition to address this gap. Why it matters: Overcoming these challenges can make virtual assistants more effective across diverse languages and enable more inclusive AI applications in the Arabic-speaking world.

How dialectal pretraining improves Arabic automatic speech recognition

MBZUAI

MBZUAI researchers presented a study at ACL 2024 on improving Arabic ASR by pretraining on dialectal Arabic. They trained three versions of the ArTST model: one on Modern Standard Arabic (MSA) only, one on MSA and dialectal data, and one on MSA, dialectal, and multilingual data. Results showed that pretraining on dialectal Arabic improves ASR performance across MSA and various dialects. Why it matters: The diversity and lack of standardization of Arabic dialects is a key challenge in Arabic NLP, and this research could lead to more accurate speech recognition systems.

Challenges and Solutions in Developing Code-switched Arabic-English NLP Systems

MBZUAI

Injy Hamed from NYU Abu Dhabi's CAMeL Lab presented work on Egyptian Arabic-English code-switching for ASR and machine translation (MT). She discussed the ArzEn-ST speech translation corpus and compared end-to-end and hybrid systems for ASR. For MT, she presented data augmentation and word segmentation techniques to handle data scarcity. She also addressed the challenges of evaluating ASR on code-switched speech. Why it matters: Research into code-switching is crucial for building NLP systems capable of processing real-world language use in the Arab world.

Past, Present and Future of Speech Technologies

MBZUAI

Pedro J. Moreno, former head of ASR R&D at Google, gave a talk at MBZUAI on the past, present, and future of speech technologies. The talk covered the evolution of speech technology, his career contributions, including work on Google Voice Search, and the impact of LLMs on speech science. He also discussed the interplay between foundational and applied research and how to prepare the next generation of scientists. Why it matters: The talk provides insights into the trajectory of speech technologies from a leading researcher, highlighting future directions and the ethical considerations surrounding AI's impact on society.

N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition

arXiv

This paper benchmarks OpenAI's Whisper model on diverse Arabic speech recognition tasks, using publicly available data and novel dialect evaluation sets. The study explores zero-shot, few-shot, and full fine-tuning scenarios. Results indicate that while Whisper outperforms XLS-R models in zero-shot settings on standard datasets, its performance drops significantly on unseen Arabic dialects.

Enhanced Arabic Text Retrieval with Attentive Relevance Scoring

arXiv

This paper introduces an enhanced Dense Passage Retrieval (DPR) framework tailored for Arabic text retrieval. The core innovation is an Attentive Relevance Scoring (ARS) mechanism that replaces standard interaction methods to improve semantic relevance modeling between questions and passages. The method integrates pre-trained Arabic language models and architectural refinements, achieving improved retrieval and ranking accuracy for Arabic question answering. Why it matters: This work addresses the underrepresentation of Arabic in NLP research by providing a novel approach and publicly available code for Arabic text retrieval, which can benefit applications such as Arabic search engines and question-answering systems.