This study introduces a Probabilistic Graphical Model (PGM) framework utilizing Pearl's do-operator to causally audit LLM safety mechanisms, specifically isolating the effect of injecting cultural demographics into prompts. A large-scale empirical analysis was conducted across seven instruction-tuned models from diverse origins, including the UAE's Falcon3-7B, as well as models from the US, Europe, China, and India, using ToxiGen and BOLD datasets. The findings revealed a disparity between observational and interventional bias, demonstrating that standard fairness metrics can overestimate demographic bias. Western models exhibited higher causal refusal rates for specific demographic groups, while Eastern models showed low overall intervention rates with targeted sensitivities toward regional demographics. Why it matters: This research highlights the geopolitical nuances of LLM safety alignment and the potential for demographic-sensitive over-triggering to restrict benign discourse, which is particularly relevant for diverse regions like the Middle East in developing culturally-aware AI.
Researchers from MBZUAI, University of Washington, and other institutions presented studies at EMNLP 2024 exploring how LLMs represent cultures. A survey analyzed dozens of recent studies on LLMs and culture and proposes a new framework for future research. The survey found that there is no widely accepted definition of 'culture' in NLP, making it challenging to interpret how models represent culture through language. Why it matters: This highlights a key gap in the field and emphasizes the need for a more rigorous and consistent understanding of culture in AI, especially as LLMs become more globally integrated.
The paper introduces SectEval, a new benchmark to evaluate sectarian biases in LLMs concerning Sunni and Shia Islam, available in English and Hindi. Results show significant inconsistencies in LLM responses based on language, with some models favoring Shia responses in English but Sunni in Hindi. Location-based experiments further reveal that advanced models adapt their responses based on the user's claimed country, while smaller models exhibit a consistent Sunni-leaning bias.
The paper introduces SaudiCulture, a new benchmark for evaluating the cultural competence of LLMs within Saudi Arabia, covering five major geographical regions and diverse cultural domains. The benchmark includes questions of varying complexity and distinguishes between common and specialized regional knowledge. Evaluations of five LLMs (GPT-4, Llama 3.3, FANAR, Jais, and AceGPT) revealed performance declines on region-specific questions, highlighting the need for region-specific knowledge in LLM training.