Skip to content
GCC AI Research

Search

Results for "Multilingual NLP"

Towards Inclusive NLP: Assessing Compressed Multilingual Transformers across Diverse Language Benchmarks

arXiv ·

This paper benchmarks multilingual and monolingual LLM performance across Arabic, English, and Indic languages, examining model compression effects like pruning and quantization. Multilingual models outperform language-specific counterparts, demonstrating cross-lingual transfer. Quantization maintains accuracy while promoting efficiency, but aggressive pruning compromises performance, particularly in larger models. Why it matters: The findings highlight strategies for scalable and fair multilingual NLP, addressing hallucination and generalization errors in low-resource languages.

Performance Prediction via Bayesian Matrix Factorisation for Multilingual Natural Language Processing Tasks

MBZUAI ·

A new Bayesian matrix factorization approach is explored for performance prediction in multilingual NLP, aiming to reduce the experimental burden of evaluating various language combinations. The approach outperforms state-of-the-art methods in NLP benchmarks like machine translation and cross-lingual entity linking. It also avoids hyperparameter tuning and provides uncertainty estimates over predictions. Why it matters: Accurate performance prediction methods accelerate multilingual NLP research by reducing computational costs and improving experimental efficiency, especially valuable for Arabic NLP tasks.

A Panoramic Survey of Natural Language Processing in the Arab World

arXiv ·

This survey paper reviews the landscape of Natural Language Processing (NLP) research and applications in the Arab world. It discusses the unique challenges posed by the Arabic language, such as its morphological complexity and dialectal diversity. The paper also presents a historical overview of Arabic NLP and surveys various research areas, including machine translation, sentiment analysis, and speech recognition. Why it matters: The survey provides a comprehensive resource for researchers and practitioners interested in the current state and future directions of Arabic NLP, a field critical for enabling AI technologies to serve Arabic-speaking communities.

Challenges in low-resourced NLP: an Irish case study

MBZUAI ·

Dr. Teresa Lynn from Dublin City University (DCU) discussed the challenges in developing NLP tools for Irish, a low-resource language facing digital extinction. She highlighted the lack of speech and language applications and fundamental language resources for Irish. Lynn also mentioned her work at DCU on the GaelTech project and her involvement in the European Language Equality project. Why it matters: The development of NLP tools for low-resource languages like Irish is crucial for preserving linguistic diversity and preventing digital marginalization in the AI era.

Language Models' Factuality Depends on the Language of Inquiry

arXiv ·

Researchers introduce a benchmark to evaluate the factual recall and knowledge transferability of multilingual language models across 13 languages. The study reveals that language models often fail to transfer knowledge between languages, even when they possess the correct information in one language. The benchmark and evaluation framework are released to drive future research in multilingual knowledge transfer.

Comparison of Multilingual and Bilingual Models for Satirical News Detection of Arabic and English

arXiv ·

This paper explores multilingual satire detection methods in English and Arabic using zero-shot and chain-of-thought (CoT) prompting. It compares the performance of Jais-chat(13B) and LLaMA-2-chat(7B) on distinguishing satire from truthful news. Results show that CoT prompting significantly improves Jais-chat's performance, achieving an F1-score of 80% in English. Why it matters: This demonstrates the potential of Arabic LLMs like Jais to handle nuanced language tasks such as satire detection, which is critical for combating misinformation in the region.