Results for "multilingual models"

Language Models' Factuality Depends on the Language of Inquiry

arXiv · Feb 25

Researchers introduce a benchmark to evaluate the factual recall and knowledge transferability of multilingual language models across 13 languages. The study reveals that language models often fail to transfer knowledge between languages, even when they possess the correct information in one language. The benchmark and evaluation framework are released to drive future research in multilingual knowledge transfer.

Performance Prediction via Bayesian Matrix Factorisation for Multilingual Natural Language Processing Tasks

MBZUAI

A new Bayesian matrix factorization approach is explored for performance prediction in multilingual NLP, aiming to reduce the experimental burden of evaluating various language combinations. The approach outperforms state-of-the-art methods in NLP benchmarks like machine translation and cross-lingual entity linking. It also avoids hyperparameter tuning and provides uncertainty estimates over predictions. Why it matters: Accurate performance prediction methods accelerate multilingual NLP research by reducing computational costs and improving experimental efficiency, which is especially valuable for Arabic NLP tasks.

A Culturally-diverse Multilingual Multimodal Video Benchmark & Model

arXiv · Jun 8

A new benchmark, ViMUL-Bench, is introduced to evaluate video LLMs across 14 languages, including Arabic, with a focus on cultural inclusivity. The benchmark includes 8k manually verified samples across 15 categories and varying video durations. A multilingual video LLM, ViMUL, is also presented, along with a training set of 1.2 million samples, with both to be publicly released.

PALO: A Polyglot Large Multimodal Model for 5B People

arXiv · Feb 22

Researchers introduce PALO, a polyglot large multimodal model with visual reasoning capabilities in 10 major languages including Arabic. A semi-automated translation approach was used to adapt the multimodal instruction dataset from English to the target languages. The models are trained across three scales (1.7B, 7B and 13B parameters) and a multilingual multimodal benchmark is proposed for evaluation.

New method reveals major cross-lingual gaps in language models

MBZUAI

Researchers at MBZUAI have developed a new automatic method to examine cross-lingual abilities in multilingual language models, testing 10 models across 16 languages. They combined beam search with language-model-based simulation to generate 6,000 bilingual question pairs, and found significant performance drops compared to English, even in high-resource languages like Chinese. The method introduces perturbations to test the models' ability to transfer knowledge rather than rely on memorization. Why it matters: This research highlights critical gaps in cross-lingual AI, providing a framework for developing more equitable and effective multilingual models, especially for Arabic and other under-represented languages.

Towards Inclusive NLP: Assessing Compressed Multilingual Transformers across Diverse Language Benchmarks

arXiv · Jul 25

This paper benchmarks multilingual and monolingual LLM performance across Arabic, English, and Indic languages, examining model compression effects like pruning and quantization. Multilingual models outperform language-specific counterparts, demonstrating cross-lingual transfer. Quantization maintains accuracy while promoting efficiency, but aggressive pruning compromises performance, particularly in larger models. Why it matters: The findings highlight strategies for scalable and fair multilingual NLP, addressing hallucination and generalization errors in low-resource languages.

Comparison of Multilingual and Bilingual Models for Satirical News Detection of Arabic and English

arXiv · Nov 16

This paper explores multilingual satire detection methods in English and Arabic using zero-shot and chain-of-thought (CoT) prompting. It compares the performance of Jais-chat (13B) and LLaMA-2-chat (7B) on distinguishing satire from truthful news. Results show that CoT prompting significantly improves Jais-chat's performance, achieving an F1-score of 80% in English. Why it matters: This demonstrates the potential of Arabic LLMs like Jais to handle nuanced language tasks such as satire detection, which is critical for combating misinformation in the region.