GCC AI Research

Results for "Chinese language"

Walking the line: Safety and performance in large language models

MBZUAI ·

MBZUAI researchers have expanded LLM safety research to Chinese, presenting their work at the 62nd Annual Meeting of the Association for Computational Linguistics in Bangkok. They developed an open-source Chinese dataset of 3,000 prompts translated and localized from the English "Do-Not-Answer" dataset. The dataset adds a "region-specific sensitivity" category to capture safety risks unique to Chinese speakers, and it is also used to test whether models are over-sensitive, refusing innocuous questions as if they were harmful. Why it matters: This research addresses a critical gap in LLM safety evaluation, ensuring that language models are both safe and effective for diverse linguistic and cultural contexts, particularly in regions with unique sensitivities.
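
For illustration, here is a minimal sketch of how over-refusal on such a dataset can be measured. The file name, the "label" field, the refusal markers, and the query_model() helper are assumptions for the sketch, not the authors' released evaluation code.

```python
# Minimal sketch: estimate over-sensitivity as the share of harmless
# Chinese prompts that the model nonetheless refuses to answer.
import json

REFUSAL_MARKERS = ["无法回答", "不能提供", "抱歉"]  # rough Chinese refusal cues (assumption)

def query_model(prompt: str) -> str:
    """Placeholder for a call to the LLM under evaluation."""
    raise NotImplementedError

def looks_like_refusal(response: str) -> bool:
    return any(marker in response for marker in REFUSAL_MARKERS)

def over_sensitivity_rate(path: str = "chinese_dna_prompts.jsonl") -> float:
    # File name and schema ({"prompt": ..., "label": ...}) are illustrative assumptions.
    harmless = refused = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            item = json.loads(line)
            if item.get("label") != "harmless":
                continue
            harmless += 1
            if looks_like_refusal(query_model(item["prompt"])):
                refused += 1
    return refused / harmless if harmless else 0.0
```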

Processing language like a human

MBZUAI ·

MBZUAI's Hanan Al Darmaki is working to improve automated speech recognition (ASR) for low-resource languages, where labeled data is scarce. She notes that Arabic presents unique challenges due to dialectal variations and a lack of written resources corresponding to spoken dialects. Al Darmaki's research focuses on unsupervised speech recognition to address this gap. Why it matters: Overcoming these challenges can improve virtual assistant effectiveness across diverse languages and enable more inclusive AI applications in the Arabic-speaking world.

Chinese students explore KAUST

KAUST ·

Undergraduate students from the University of Electronic Science and Technology of China (UESTC) in Chengdu visited KAUST for a one-week Spring Camp in March. The students, chosen from the top 10 percent of UESTC undergraduates, toured the CEMSE division. The UESTC students shared a presentation about their KAUST experience at the conclusion of the trip. Why it matters: The visit highlights KAUST's ongoing efforts to attract international talent and foster collaborations with leading universities.

A Panoramic Survey of Natural Language Processing in the Arab World

arXiv ·

This survey paper reviews the landscape of Natural Language Processing (NLP) research and applications in the Arab world. It discusses the unique challenges posed by the Arabic language, such as its morphological complexity and dialectal diversity. The paper also presents a historical overview of Arabic NLP and surveys various research areas, including machine translation, sentiment analysis, and speech recognition. Why it matters: The survey provides a comprehensive resource for researchers and practitioners interested in the current state and future directions of Arabic NLP, a field critical for enabling AI technologies to serve Arabic-speaking communities.

Polyglot programs: NLP for Arabic and the globe’s diverse dialects

MBZUAI ·

MBZUAI researchers presented studies at the EMNLP and ArabicNLP conferences on improving NLP for diverse languages, especially Arabic. One study evaluated ChatGPT (GPT-3.5) and GPT-4 across Arabic dialects and found clear limitations compared with their performance in English, although GPT-4 outperformed GPT-3.5 in Arabic. Why it matters: This research highlights the need for NLP models to better support the linguistic diversity of Arabic and other languages to avoid widening existing technological gaps.
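
A minimal sketch of this kind of dialect comparison is below, assuming the OpenAI Python SDK; the dialect prompts are invented examples, not the study's actual test items or evaluation harness.

```python
# Minimal sketch: send the same task phrased in different Arabic dialects
# to two OpenAI chat models and print the responses for side-by-side comparison.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative prompts only (Modern Standard Arabic, Egyptian, Gulf).
DIALECT_PROMPTS = {
    "MSA": "لخّص النص التالي في جملة واحدة: ...",
    "Egyptian": "لخّص النص ده في جملة واحدة: ...",
    "Gulf": "لخّص هالنص بجملة وحدة: ...",
}

def ask(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

for dialect, prompt in DIALECT_PROMPTS.items():
    for model in ("gpt-3.5-turbo", "gpt-4"):
        print(dialect, model, ask(model, prompt)[:80])
```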

Cultural inclusivity in AI: A new benchmark dataset on 100 languages

MBZUAI ·

MBZUAI researchers have released ALM-Bench, a new benchmark dataset for evaluating the performance of multimodal LLMs on culturally grounded visual question-answering tasks across 100 languages. The dataset includes over 22,000 question-answer pairs across 19 categories, with a focus on low-resource languages and cultural nuance, including three Arabic dialects. The team tested 16 open- and closed-source multimodal LLMs on the benchmark, revealing a significant need for greater cultural and linguistic inclusivity. Why it matters: The benchmark aims to improve the inclusivity of multimodal AI systems by addressing the underrepresentation of low-resource languages and cultural contexts.
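
As a rough illustration of how such a benchmark is scored, the sketch below computes per-language accuracy over visual question-answer items. The JSONL schema and the answer_question() helper are assumptions for the sketch, not the official ALM-Bench evaluation code.

```python
# Minimal sketch: exact-match accuracy per language on culture-focused VQA items.
import json
from collections import defaultdict

def answer_question(image_path: str, question: str) -> str:
    """Placeholder for the multimodal LLM under test."""
    raise NotImplementedError

def per_language_accuracy(path: str = "alm_bench_items.jsonl") -> dict:
    # Assumed schema per line: {"image": ..., "question": ..., "answer": ..., "language": ...}
    correct, total = defaultdict(int), defaultdict(int)
    with open(path, encoding="utf-8") as f:
        for line in f:
            item = json.loads(line)
            lang = item["language"]
            pred = answer_question(item["image"], item["question"])
            total[lang] += 1
            correct[lang] += int(pred.strip().lower() == item["answer"].strip().lower())
    return {lang: correct[lang] / total[lang] for lang in total}
```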

AceGPT, Localizing Large Language Models in Arabic

arXiv ·

Researchers introduce AceGPT, a localized large language model (LLM) specifically for Arabic, addressing cultural sensitivity and local values not well represented in mainstream models. AceGPT incorporates further pre-training with Arabic texts, supervised fine-tuning using native Arabic instructions and GPT-4 responses, and reinforcement learning with AI feedback using a reward model attuned to local culture. Evaluations demonstrate that AceGPT achieves state-of-the-art performance among open Arabic LLMs across several benchmarks. Why it matters: This work advances culturally aware AI development for Arabic-speaking communities, providing a valuable resource and benchmark for future research.
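
To make the middle stage of that pipeline concrete, the sketch below shows supervised fine-tuning on Arabic instruction/response pairs with Hugging Face transformers. The base checkpoint name, data file, and hyperparameters are placeholders for illustration, not the actual AceGPT training recipe.

```python
# Minimal sketch of instruction fine-tuning with a standard causal-LM objective.
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "some-arabic-base-model"  # placeholder checkpoint (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
with open("arabic_instructions.jsonl", encoding="utf-8") as f:  # assumed data file
    for line in f:
        pair = json.loads(line)  # assumed schema: {"instruction": ..., "response": ...}
        text = pair["instruction"] + "\n" + pair["response"]
        batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
        # Causal-LM loss: the labels are the input ids themselves.
        out = model(**batch, labels=batch["input_ids"])
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```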