Adel Bibi, a KAUST alumnus and researcher at the University of Oxford, presented his research on AI safety, covering the robustness, alignment, and fairness of LLMs. The research addresses robustness challenges in AI systems, alignment issues, and the fairness of common tokenizers across languages. Bibi's work includes instruction prefix tuning and its theoretical limitations as an alignment method. Why it matters: This research highlights the importance of addressing safety concerns in LLMs, particularly regarding alignment and fairness in the Arabic language.
A new paper coauthored by researchers at The University of Melbourne and MBZUAI explores disagreement in human annotation for AI training. The paper treats disagreement as a signal (human label variation or HLV) rather than noise, and proposes new evaluation metrics based on fuzzy set theory. These metrics adapt accuracy and F-score to cases where multiple labels may plausibly apply, aligning model output with the distribution of human judgments. Why it matters: This research addresses a key challenge in NLP by accounting for the inherent ambiguity in human language, potentially leading to more robust and human-aligned AI systems.
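To make the idea of fuzzy-set metrics concrete, here is a minimal sketch. It is not the paper's definition: the summary above does not give formulas, so the sketch assumes each item carries a label distribution (a dict mapping label to probability) and uses the standard fuzzy-set intersection, the element-wise minimum, to measure overlap. The function names `soft_accuracy` and `soft_f1` are hypothetical.

```python
def soft_accuracy(human, model):
    """Mean overlap between human and model label distributions.

    Assumption: overlap is the fuzzy-set intersection, i.e. the sum of
    element-wise minima. It is 1.0 when the distributions are identical.
    """
    total = 0.0
    for h, m in zip(human, model):
        labels = set(h) | set(m)
        total += sum(min(h.get(l, 0.0), m.get(l, 0.0)) for l in labels)
    return total / len(human)


def soft_f1(human, model, label):
    """F-score for one label, with fuzzy counts replacing hard counts."""
    # Fuzzy "true positives": mass both humans and the model assign to label.
    inter = sum(min(h.get(label, 0.0), m.get(label, 0.0))
                for h, m in zip(human, model))
    predicted = sum(m.get(label, 0.0) for m in model)  # fuzzy predicted count
    gold = sum(h.get(label, 0.0) for h in human)       # fuzzy gold count
    precision = inter / predicted if predicted else 0.0
    recall = inter / gold if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Two items: annotators split 75/25 on the first and agree on the second.
human = [{"pos": 0.75, "neg": 0.25}, {"neg": 1.0}]
model = [{"pos": 0.5, "neg": 0.5}, {"neg": 1.0}]

print(soft_accuracy(human, model))
print(soft_f1(human, model, "pos"))
```

With hard (one-hot) labels on both sides, these functions reduce to ordinary accuracy and F-score, which is the sense in which such metrics adapt the classical ones.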
The paper introduces ALLaM, a series of large language models for Arabic and English, designed to support Arabic Language Technologies. The models are trained with language alignment and knowledge transfer in mind, using a decoder-only architecture. ALLaM achieves state-of-the-art results on Arabic benchmarks like MMLU Arabic and Arabic Exams. Why it matters: This work advances Arabic NLP by providing high-performing LLMs and demonstrating effective techniques for cross-lingual transfer learning and alignment with human preferences.
KAUST researchers developed a machine learning algorithm to control a deformable mirror within the Subaru Telescope's exoplanet imaging camera, compensating for atmospheric turbulence. The algorithm, which computes a partial singular value decomposition (SVD), outperforms a standard SVD by a factor of four. The KAUST team received a best paper award at the PASC Conference for this work, which has already been deployed at the Subaru Telescope. Why it matters: This advancement enables sharper images of exoplanets, facilitating their identification and study, and showcases the impact of optimizing core linear algebra algorithms.
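For readers unfamiliar with the term, a partial SVD computes only the few largest singular values and vectors of a matrix rather than the full decomposition. The sketch below illustrates that idea with textbook power iteration plus deflation. It is a generic illustration, not the KAUST algorithm, whose details the summary above does not describe. The function name `partial_svd` and its parameters are hypothetical.

```python
import math
import random


def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def norm(x):
    return math.sqrt(sum(v * v for v in x))


def partial_svd(A, k, iters=500, seed=0):
    """Return the k largest singular triplets (sigma, u, v) of A.

    Assumption: plain power iteration on A^T A, deflating against the
    right singular vectors already found. Fine for small demos only.
    """
    rng = random.Random(seed)
    At = [list(col) for col in zip(*A)]
    n = len(A[0])
    triplets = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            # Remove components along right singular vectors found so far.
            for _, _, pv in triplets:
                dot = sum(a * b for a, b in zip(v, pv))
                v = [a - dot * b for a, b in zip(v, pv)]
            w = matvec(At, matvec(A, v))  # one power step on A^T A
            nw = norm(w)
            if nw == 0.0:
                break
            v = [x / nw for x in w]
        Av = matvec(A, v)
        sigma = norm(Av)
        u = [x / sigma for x in Av] if sigma > 0.0 else Av
        triplets.append((sigma, u, v))
    return triplets


A = [[3.0, 0.0, 0.0],
     [0.0, 2.0, 0.0],
     [0.0, 0.0, 1.0]]
for sigma, _, _ in partial_svd(A, k=2):
    print(round(sigma, 6))
```

Only the two dominant singular triplets are computed here, and the third is never touched. Skipping work of that kind is the general motivation for a partial decomposition when just the leading components are needed.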
MBZUAI Professor Monojit Choudhury co-authored a study on LLMs and their capacity for moral reasoning. The study was presented at the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL) in Malta and included contributions from Aditi Khandelwal, Utkarsh Agarwal, and Kumar Tanmay from Microsoft. The research explores AI alignment, that is, ensuring that AI systems align with human values, moral principles, and ethical considerations. Why it matters: The study provides insight into how LLMs handle complex ethical issues, which is important for guiding the development of AI in a way that is consistent with human values.
The paper introduces Juhaina, a 9.24B-parameter Arabic-English bilingual LLM trained with an 8,192-token context window. It identifies limitations in the Open Arabic LLM Leaderboard (OALL) and proposes a new benchmark, CamelEval, for more comprehensive evaluation. Juhaina outperforms models like Llama and Gemma in generating helpful Arabic responses and understanding cultural nuances. Why it matters: This culturally aligned LLM and its associated benchmark could significantly advance Arabic NLP and democratize AI access for Arabic speakers.
Researchers from MBZUAI, IBM, and ServiceNow introduced GEOBench-VLM, a benchmark for evaluating vision-language models on Earth observation tasks using satellite and aerial imagery. The benchmark includes over 10,000 human-verified instructions across 31 sub-tasks spanning object classification, localization, change detection, and more. GEOBench-VLM addresses the gap in current VLMs' ability to perform spatially grounded reasoning and change detection in satellite imagery. Why it matters: This benchmark will drive progress in AI's ability to analyze satellite data for critical applications like disaster response, climate monitoring, and urban planning in the Middle East and globally.