GCC AI Research

Results for "Cultural relevance"

Commonsense Reasoning in Arab Culture

arXiv

A new dataset called ArabCulture is introduced to address the lack of culturally relevant commonsense reasoning resources in Arabic AI. The dataset covers 13 countries across the Gulf, Levant, North Africa, and the Nile Valley, spanning 12 daily life domains with 54 fine-grained subtopics. It was built from scratch by native speakers writing and validating culturally relevant questions. Why it matters: The dataset highlights the need for more culturally aware models and benchmarks tailored to the Arabic-speaking world, moving beyond machine-translated resources.

Culturally Yours: A new tool for understanding cultural references in text

MBZUAI

MBZUAI researchers have developed "Culturally Yours," a reading assistant that highlights and explains culture-specific items on webpages to help users understand unfamiliar terms. The tool addresses the "cold-start problem" by asking users for demographic information, which it uses to personalize its identification of cultural references they are likely to find unfamiliar. It was presented at the 31st International Conference on Computational Linguistics in Abu Dhabi. Why it matters: This tool can help bridge linguistic and cultural gaps, particularly for underrepresented languages and cultures, and help businesses reach diverse audiences.

Teaching language models about Arab culture through cross-cultural transfer

MBZUAI

MBZUAI researchers presented a cross-cultural transfer learning method to improve language models' understanding of diverse Arab cultures. They used in-context learning and demonstration-based reinforcement (DITTO) to transfer cultural knowledge between countries. Experiments showed up to a 34% improvement on cultural understanding benchmarks using only a few demonstrations. Why it matters: This research addresses gaps in Arabic language models' cultural understanding, especially for smaller Arab countries, and offers a new transfer learning approach.

SaudiCulture: A Benchmark for Evaluating Large Language Models Cultural Competence within Saudi Arabia

arXiv

The paper introduces SaudiCulture, a new benchmark for evaluating the cultural competence of LLMs within Saudi Arabia, covering five major geographical regions and diverse cultural domains. The benchmark includes questions of varying complexity and distinguishes between common and specialized regional knowledge. Evaluations of five LLMs (GPT-4, Llama 3.3, FANAR, Jais, and AceGPT) showed lower performance on region-specific questions, highlighting the need to include region-specific knowledge in LLM training.

Why AI can describe an image but struggles to understand the culture inside it

MBZUAI

A new paper from MBZUAI introduces JEEM, a benchmark dataset for evaluating vision-language models on their understanding of images grounded in four Arabic-speaking societies (Jordan, UAE, Egypt, and Morocco) and their ability to use local dialects. The dataset comprises 2,178 images and 10,890 question-answer pairs reflecting everyday life and culturally specific scenes. Evaluation of several Arabic-capable models (Maya, PALO, Peacock, AIN, AyaV) and GPT-4o revealed that while models can generate fluent language, they struggle with genuine understanding, consistency, and relevance, especially when cultural context is important. Why it matters: This research highlights the challenges of building AI systems that can truly understand and interact with diverse cultures, emphasizing the need for culturally grounded datasets and evaluation metrics.

What LLMs get wrong about culture — and how to fix them: Two studies from NAACL

MBZUAI

MBZUAI researchers presented two studies at NAACL 2025 on how LLMs understand cultural differences, one of which won a Senior Area Chair (SAC) award. One study, titled "Reading between the lines: Can LLMs identify cross-cultural communication gaps," assesses GPT-4o's ability to identify cultural references in Goodreads book reviews. The researchers built a benchmark dataset from the annotations of 50 evaluators across different cultures to measure how well the LLM identifies culture-specific items (CSIs). Why it matters: Improving LLMs' cross-cultural understanding is crucial for ensuring these models can be used effectively and equitably across diverse global contexts.

AceGPT, Localizing Large Language Models in Arabic

arXiv

Researchers introduce AceGPT, a large language model (LLM) localized for Arabic that addresses cultural sensitivities and local values poorly represented in mainstream models. AceGPT combines further pre-training on Arabic text, supervised fine-tuning on native Arabic instructions paired with GPT-4 responses, and reinforcement learning from AI feedback using a reward model attuned to local culture. Evaluations show that AceGPT achieves state-of-the-art performance among open Arabic LLMs across several benchmarks. Why it matters: This work advances culturally aware AI for Arabic-speaking communities and provides a valuable resource and benchmark for future research.