GCC AI Research

Machines and morality: judging right and wrong with large language models

MBZUAI · Notable

Summary

MBZUAI Professor Monojit Choudhury co-authored a study on the capacity of large language models (LLMs) for moral reasoning, presented at the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL) in Malta. The study included contributions from Aditi Khandelwal, Utkarsh Agarwal, and Kumar Tanmay of Microsoft. The research explores AI alignment: ensuring that AI systems adhere to human values, moral principles, and ethical considerations. Why it matters: The study offers insight into how LLMs handle complex ethical questions, which is important for guiding the development of AI in a way that is consistent with human values.

Related

SocialMaze: A Benchmark for Evaluating Social Reasoning in Large Language Models

arXiv

MBZUAI researchers introduce SocialMaze, a new benchmark for evaluating the social reasoning capabilities of large language models (LLMs). SocialMaze includes six diverse tasks spanning social reasoning games, daily-life interactions, and digital community platforms, emphasizing deep reasoning, dynamic interaction, and information uncertainty. Experiments show that LLMs vary widely in their handling of dynamic interactions and degrade under uncertainty, but can be improved by fine-tuning on curated reasoning examples, as the sketch below illustrates.
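The fine-tuning finding is easy to picture with a single training record. The sketch below is illustrative only: the puzzle, its reasoning trace, and the chat-style schema are invented for this example and are not drawn from the SocialMaze dataset.

```python
import json

# One invented supervised fine-tuning record in chat format. SocialMaze's
# actual curated reasoning examples and schema may differ; this only shows
# the shape of "fine-tuning on reasoning traces" for a social-deduction task.
record = {
    "messages": [
        {
            "role": "user",
            "content": (
                "Three players; exactly one is an impostor, and only the "
                "impostor lies. A says: 'B is the impostor.' B says: 'I am "
                "not the impostor.' C says: 'A is telling the truth.' Who "
                "is the impostor?"
            ),
        },
        {
            "role": "assistant",
            "content": (
                "C endorses A, so A and C stand or fall together. If A lied, "
                "C would also be lying, giving two liars, which is impossible. "
                "So A tells the truth: B is the impostor, and B's denial is "
                "the single lie. Answer: B."
            ),
        },
    ]
}

# Serialized as one JSONL line, the usual format for chat-style fine-tuning.
print(json.dumps(record))
```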

Profiling News Media for Factuality and Bias Using LLMs and the Fact-Checking Methodology of Human Experts

arXiv

A new methodology assesses the factuality and bias of news outlets with LLMs by emulating the criteria that professional fact-checkers apply. The approach prompts LLMs with questions derived from those fact-checking criteria, then aggregates the responses into predictions. Experiments demonstrate improvements over baselines, with error analysis by media popularity and region; the dataset and code are released at https://github.com/mbzuai-nlp/llm-media-profiling.
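The elicit-and-aggregate loop can be sketched in a few lines. Everything below is a simplification, not the released code: `ask_llm` is a placeholder for any chat-completion call, the criteria strings are paraphrased examples rather than the paper's prompt set, and majority voting is just one plausible aggregation strategy.

```python
from collections import Counter

# Hypothetical fact-checking criteria, paraphrased for illustration; the
# paper derives its prompts from professional fact-checkers' methodology.
CRITERIA = [
    "Does this outlet clearly separate news reporting from opinion?",
    "Does this outlet issue corrections when it publishes errors?",
    "Does this outlet cite verifiable primary sources?",
]

LABELS = ["low", "mixed", "high"]  # factuality scale

def ask_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call (an API or local model).
    Must return one of LABELS; stubbed here so the sketch runs standalone."""
    return "high"

def profile_outlet(outlet: str) -> str:
    """Elicit one judgment per criterion and aggregate by majority vote."""
    votes = []
    for criterion in CRITERIA:
        prompt = (
            f"You are assessing the news outlet '{outlet}'.\n"
            f"Criterion: {criterion}\n"
            f"Answer with exactly one of: {', '.join(LABELS)}."
        )
        answer = ask_llm(prompt).strip().lower()
        if answer in LABELS:
            votes.append(answer)
    # Majority vote across criteria; default to "mixed" if nothing parsed.
    return Counter(votes).most_common(1)[0][0] if votes else "mixed"

print(profile_outlet("example-news.com"))  # -> "high" with the stub above
```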

Towards Real-world Fact-Checking with Large Language Models

MBZUAI

Iryna Gurevych of TU Darmstadt presented research on using large language models for real-world fact-checking, focusing on dismantling misleading narratives built on misinterpreted scientific publications and on detecting misinformation conveyed through visual content. The research aims to explain why a false claim was believed, why it is false, and why the alternative is correct. Why it matters: Addressing misinformation, especially when it is backed by seemingly credible sources, is critical for public health, conflict resolution, and maintaining trust in institutions in the Middle East and globally.

Assessing Large Language Models on Islamic Legal Reasoning: Evidence from Inheritance Law Evaluation

arXiv

The paper introduces a benchmark of 1,000 multiple-choice questions to evaluate LLMs on Islamic inheritance law ('ilm al-mawarith). Seven LLMs were tested: o3 and Gemini 2.5 achieved over 90% accuracy, while ALLaM, Fanar, LLaMA, and Mistral scored below 50%. Error analysis revealed limitations in handling structured legal reasoning. Why it matters: This research highlights the challenges and opportunities in adapting LLMs to complex, culturally specific legal domains such as Islamic jurisprudence.
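For a concrete picture of how accuracy on such a benchmark is computed, here is a minimal scoring sketch. The two questions are invented placeholders (though the shares shown are the standard fixed shares for those cases), `model_answer` is a stub for a real LLM call, and the extraction regex is an assumption about the response format, not the paper's harness.

```python
import re

# Toy items in the spirit of an 'ilm al-mawarith benchmark; the real dataset
# has 1,000 expert-curated questions, these two are invented placeholders.
ITEMS = [
    {"question": "A man dies leaving a wife and one son. "
                 "What share does the wife inherit?",
     "options": {"A": "1/8", "B": "1/4", "C": "1/6", "D": "1/2"},
     "gold": "A"},
    {"question": "A woman dies leaving a husband and no children. "
                 "What share does the husband inherit?",
     "options": {"A": "1/4", "B": "1/2", "C": "1/8", "D": "1/3"},
     "gold": "B"},
]

def model_answer(question: str, options: dict[str, str]) -> str:
    """Placeholder for an LLM call; returns the model's raw text response."""
    return "The answer is A."

def extract_choice(response: str) -> str | None:
    """Pull the first standalone option letter out of a free-text response."""
    match = re.search(r"\b([ABCD])\b", response)
    return match.group(1) if match else None

correct = 0
for item in ITEMS:
    choice = extract_choice(model_answer(item["question"], item["options"]))
    correct += int(choice == item["gold"])

print(f"accuracy: {correct / len(ITEMS):.0%}")
```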