GCC AI Research

Identifying bias in generative music models: A new study presented at NAACL

MBZUAI · Notable

Summary

MBZUAI researchers found that only 5.7% of the music in existing datasets used to train generative music systems comes from non-Western genres. Roughly 94% of the music was Western, while Africa, the Middle East, and South Asia accounted for only 0.3%, 0.4%, and 0.9%, respectively. The team also tested whether parameter-efficient fine-tuning with adapters could improve how generative music systems handle underrepresented styles, and presented their findings at NAACL. Why it matters: This research highlights the critical need for more diverse datasets in AI music generation to better serve global musical traditions and audiences.
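Figures like these come from a representation audit: count how many tracks fall under each region and divide by the total. The sketch below is a minimal illustration under a simplifying assumption, namely that every track already carries exactly one region label; the study's actual labeling procedure is not described here. `region_shares` is a hypothetical helper, and the toy labels are invented for illustration, not data from the study.

```python
from collections import Counter


def region_shares(labels):
    """Return each region's share of tracks as a percentage, rounded to one decimal.

    Assumes (hypothetically) one region label per track.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        # An empty dataset has no meaningful shares.
        return {}
    return {region: round(100 * n / total, 1) for region, n in counts.items()}


# Toy metadata for illustration only; not the study's data.
toy_labels = ["Western", "Western", "Western", "Africa"]
print(region_shares(toy_labels))
```

With the toy labels above, Western music makes up 75.0% and Africa 25.0%. In a real audit the hard part is assigning the labels, not the arithmetic.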

Keywords

MBZUAI · generative music · bias · datasets · NAACL


Related

What LLMs get wrong about culture — and how to fix them: Two studies from NAACL

MBZUAI

MBZUAI researchers presented two studies at NAACL 2025 on how LLMs understand cultural differences, one of which won an SAC award. One study, titled "Reading between the lines: Can LLMs identify cross-cultural communication gaps?", assesses GPT-4o's ability to identify cultural references in Goodreads book reviews. The researchers created a benchmark dataset using annotations from 50 evaluators from different cultures to measure the model's ability to identify culture-specific items (CSIs). Why it matters: Improving LLMs' cross-cultural understanding is crucial for ensuring these models can be used effectively and equitably across diverse global contexts.

Culture and bias in LLMs: Defining the challenge and mitigating risks

MBZUAI

Researchers from MBZUAI, the University of Washington, and other institutions presented studies at EMNLP 2024 exploring how LLMs represent cultures. One survey analyzed dozens of recent studies on LLMs and culture and proposed a new framework for future research. It found that there is no widely accepted definition of 'culture' in NLP, which makes it difficult to interpret how models represent culture through language. Why it matters: This highlights a key gap in the field and emphasizes the need for a more rigorous and consistent understanding of culture in AI, especially as LLMs become more globally integrated.

SectEval: Evaluating the Latent Sectarian Preferences of Large Language Models

arXiv

The paper introduces SectEval, a new benchmark, available in English and Hindi, for evaluating sectarian biases in LLMs concerning Sunni and Shia Islam. Results show significant inconsistencies in LLM responses across languages, with some models favoring Shia responses in English but Sunni responses in Hindi. Location-based experiments further reveal that advanced models adapt their responses to the user's claimed country, while smaller models show a consistent Sunni-leaning bias.

Testing LLM safety in Arabic from two perspectives | NAACL

MBZUAI

Researchers at MBZUAI presented a new Arabic dataset at NAACL for measuring LLM safety, building on a Chinese dataset called 'Do Not Answer'. The dataset includes nearly 5,800 questions, including challenging prompts as well as harmless requests that contain sensitive terms, which test whether models are over-sensitive. The team localized cultural concepts and added 3,000 questions specific to the Arabic language and culture. Why it matters: This comprehensive benchmark, which accounts for the diversity of Arabic dialects and cultures, advances the development of safer and more culturally aligned LLMs for Arabic speakers.