GCC AI Research

Results for "LetBabyTalk"

When AI learns to listen: how researchers are decoding baby cries to help new parents

MBZUAI

MBZUAI researchers developed LetBabyTalk, an AI-powered multilingual parenting app that analyzes baby cries to identify needs like hunger or sleepiness. The app is trained on over 1,000 baby cries and uses supervised machine learning with input from experienced parents and educators. Cradle AI, the startup behind the app, aims to bridge the gap between advanced AI research and real-world solutions, focusing on family care and education. Why it matters: This project demonstrates the potential of AI to address everyday challenges and improve the lives of families in the region and globally, while also showcasing MBZUAI's focus on AI for social good.
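The underlying pipeline has not been published, but as a rough illustration, a minimal supervised cry classifier might look like the sketch below. The label set, the MFCC features, and the random-forest classifier are assumptions made for illustration, not LetBabyTalk's actual design.

```python
# Illustrative sketch only: LetBabyTalk's actual pipeline is not public.
# Shows the general shape of supervised cry classification: extract
# spectral features from labeled cry clips, then train a classifier.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

LABELS = ["hunger", "sleepiness", "discomfort"]  # hypothetical label set

def cry_features(wav_path: str) -> np.ndarray:
    """Mean MFCCs over the clip -- a simple, common audio representation."""
    audio, sr = librosa.load(wav_path, sr=16_000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

def train(clips: list[str], labels: list[str]) -> RandomForestClassifier:
    """Fit a classifier on labeled cry recordings (paths + need labels)."""
    X = np.stack([cry_features(p) for p in clips])
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X, labels)
    return clf

# clf = train(training_clips, training_labels)
# print(clf.predict([cry_features("new_cry.wav")]))
```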

LLM-BabyBench: Understanding and Evaluating Grounded Planning and Reasoning in LLMs

arXiv

MBZUAI researchers introduce LLM-BabyBench, a benchmark suite for evaluating grounded planning and reasoning in LLMs. The suite, built on a textual adaptation of the BabyAI grid world, assesses LLMs on predicting action consequences, generating action sequences, and decomposing instructions. The datasets, evaluation harness, and metrics are publicly available to facilitate reproducible assessment.
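As a hedged sketch of how a benchmark like this is consumed, the loop below scores a model over one task split. The field names ("prompt", "target"), the file name, and the exact-match metric are assumptions for illustration; the released harness defines its own schema and task-specific metrics.

```python
# Generic evaluation loop in the spirit of LLM-BabyBench's harness.
# Schema and metric here are assumed, not the official ones.
from typing import Callable

def evaluate(examples: list[dict], generate: Callable[[str], str]) -> float:
    """Fraction of examples where the model output exactly matches the target."""
    correct = 0
    for ex in examples:
        prediction = generate(ex["prompt"]).strip()
        correct += prediction == ex["target"].strip()
    return correct / len(examples)

# Usage (hypothetical file name and schema):
# import json
# with open("babybench_predict.jsonl") as f:
#     examples = [json.loads(line) for line in f]
# print(evaluate(examples, generate=my_llm_call))  # my_llm_call: prompt -> completion
```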

Research talk on Privacy and Security Issues in Speech

MBZUAI

Dr. Bhiksha Raj of Carnegie Mellon University, an expert in speech and audio processing, delivered a research talk on privacy and security issues in speech processing, highlighting the unique privacy challenges posed by the biometric information embedded in speech. The talk covered the legal landscape as well as proposed solutions such as cryptographic and hashing-based methods and adversarial processing techniques. Why it matters: As speech-based interfaces become more prevalent in the Middle East, understanding and addressing the associated privacy risks is crucial for ethical AI development and deployment.
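As a toy illustration of one hashing-based direction such talks discuss, the sketch below stores a binary locality-sensitive hash of a speaker embedding rather than raw audio, so enrollment templates expose less biometric detail. Deployed systems rely on vetted cryptographic protocols; the dimensions and threshold here are assumed purely for illustration.

```python
# Toy sketch of hashing-based speech privacy: keep a binary LSH of a
# speaker embedding instead of the voice itself, and match by Hamming
# distance. Real systems use vetted cryptographic protocols.
import numpy as np

rng = np.random.default_rng(seed=0)
PLANES = rng.standard_normal((64, 192))  # 64 random hyperplanes, 192-dim embedding (assumed sizes)

def lsh_bits(embedding: np.ndarray) -> np.ndarray:
    """Random-hyperplane LSH: one bit per side of each hyperplane."""
    return (PLANES @ embedding > 0).astype(np.uint8)

def same_speaker(bits_a: np.ndarray, bits_b: np.ndarray, max_flips: int = 10) -> bool:
    """Declare a match if the hashes differ in few bits (threshold assumed)."""
    return int(np.sum(bits_a != bits_b)) <= max_flips
```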

SpokenNativQA: Multilingual Everyday Spoken Queries for LLMs

arXiv

The Qatar Computing Research Institute (QCRI) has released SpokenNativQA, a multilingual spoken question-answering dataset for evaluating LLMs in conversational settings. The dataset contains 33,000 naturally spoken questions and answers across multiple languages, including low-resource and dialect-rich languages. It aims to address the limitations of text-based QA datasets by incorporating speech variability, accents, and linguistic diversity. Why it matters: This benchmark enables more robust evaluation of LLMs in speech-based interactions, particularly for Arabic dialects and other low-resource languages.
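A benchmark like this is typically consumed as an ASR-then-LLM pipeline; the hedged sketch below shows that shape, with `transcribe`, `ask_llm`, and `score` standing in for a real ASR model, an LLM call, and the dataset's official metric. The example schema is an assumption.

```python
# Generic spoken-QA evaluation shape: transcribe the spoken question,
# query the LLM, then score the answer. Components are injected.
from typing import Callable

def run_spoken_qa(
    examples: list[dict],
    transcribe: Callable[[str], str],    # ASR: audio path -> text (where accents and noise enter)
    ask_llm: Callable[[str], str],       # LLM: question text -> answer text
    score: Callable[[str, str], float],  # metric: (prediction, reference) -> score in [0, 1]
) -> float:
    """Average score over the dataset; schema {"audio": ..., "answer": ...} is assumed."""
    total = 0.0
    for ex in examples:
        question = transcribe(ex["audio"])
        total += score(ask_llm(question), ex["answer"])
    return total / len(examples)
```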

Processing language like a human

MBZUAI

MBZUAI's Hanan Aldarmaki is working to improve automatic speech recognition (ASR) for low-resource languages, where labeled data is scarce. She notes that Arabic presents unique challenges due to its dialectal variation and the lack of written resources corresponding to spoken dialects. Aldarmaki's research focuses on unsupervised speech recognition to address this gap. Why it matters: Overcoming these challenges can improve virtual assistant effectiveness across diverse languages and enable more inclusive AI applications in the Arabic-speaking world.
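As a toy sketch of the first step in many unsupervised ASR pipelines, the code below clusters frame-level features from unlabeled audio into discrete acoustic "units" with no transcripts required. Real systems use learned self-supervised representations rather than MFCCs, and this is an illustration of the general idea, not Aldarmaki's method.

```python
# Toy acoustic-unit discovery: cluster frame features from raw,
# untranscribed audio so each utterance can be mapped to a unit
# sequence -- pseudo-text for downstream unsupervised modeling.
import numpy as np
import librosa
from sklearn.cluster import KMeans

def discover_units(wav_paths: list[str], n_units: int = 100) -> KMeans:
    """Fit k-means over MFCC frames pooled from unlabeled recordings."""
    frames = []
    for path in wav_paths:
        audio, sr = librosa.load(path, sr=16_000)
        mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)
        frames.append(mfcc.T)  # (time, features)
    km = KMeans(n_clusters=n_units, n_init=10, random_state=0)
    km.fit(np.concatenate(frames))
    return km

# units = km.predict(new_utterance_frames)  # discrete unit sequence per utterance
```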

AI researcher tackles stuttering diagnosis in the developing world

MBZUAI

MBZUAI doctoral student Hawau Toyin is applying AI to the identification, correction, and evaluation of stuttering, particularly in developing countries where it often goes undiagnosed. She is collaborating with the SpeechCare Center UAE and her advisor Dr. Hanan Aldarmaki to develop AI tools for faster and more accessible diagnosis and treatment. The research focuses on collecting speech data from around the world to build a system that can recognize the many forms stuttering takes. Why it matters: This research addresses a critical healthcare gap by leveraging AI to improve diagnosis and treatment of speech disorders in underserved regions.
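The project's models are not described publicly, but a common framing is multi-label tagging of disfluency types over short audio windows; the sketch below shows that shape. The taxonomy, window length, features, and model interface are all hypothetical.

```python
# Illustrative sketch only: tag each short window of a recording with
# zero or more disfluency types using a hypothetical trained model.
import numpy as np
import librosa

STUTTER_TYPES = ["repetition", "prolongation", "block"]  # hypothetical taxonomy

def windows(wav_path: str, win_s: float = 1.0):
    """Yield fixed-length audio windows with their start times in seconds."""
    audio, sr = librosa.load(wav_path, sr=16_000)
    step = int(win_s * sr)
    for start in range(0, len(audio) - step + 1, step):
        yield audio[start:start + step], start / sr

def tag_clip(wav_path: str, model) -> list[tuple[float, list[str]]]:
    """`model` is a hypothetical multi-label classifier whose predict()
    returns one 0/1 flag per stutter type for a feature vector."""
    results = []
    for segment, t0 in windows(wav_path):
        feats = librosa.feature.mfcc(y=segment, sr=16_000, n_mfcc=20).mean(axis=1)
        flags = model.predict(feats[None, :])[0]
        results.append((t0, [s for s, f in zip(STUTTER_TYPES, flags) if f]))
    return results
```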

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

arXiv

MBZUAI researchers introduce LLMVoX, a 30M-parameter, LLM-agnostic, autoregressive streaming text-to-speech (TTS) system that generates high-quality speech with low latency. The system preserves the capabilities of the base LLM and achieves a lower word error rate than existing speech-enabled LLMs. LLMVoX supports seamless, infinite-length dialogues and generalizes to new languages, including Arabic, through dataset adaptation.
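LLMVoX's actual interface is not reproduced here, but the generic pattern an LLM-agnostic streaming TTS enables looks like the sketch below: consume any LLM's token stream and synthesize audio phrase by phrase, so playback starts before the full reply is generated. `synthesize` and the boundary rule stand in for the real model's components.

```python
# Generic streaming text-to-speech loop: flush buffered text to the TTS
# model at phrase boundaries so audio begins before the LLM finishes.
from typing import Callable, Iterable, Iterator

def stream_speech(
    tokens: Iterable[str],               # any LLM's streamed text tokens
    synthesize: Callable[[str], bytes],  # text chunk -> audio bytes (stand-in for the TTS model)
    boundaries: str = ".,!?;:",
) -> Iterator[bytes]:
    buffer = ""
    for tok in tokens:
        buffer += tok
        if buffer and buffer[-1] in boundaries:  # flush at phrase boundaries
            yield synthesize(buffer)             # audio starts before text ends
            buffer = ""
    if buffer:
        yield synthesize(buffer)                 # flush any trailing text
```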