The paper introduces a benchmark of 1,000 multiple-choice questions to evaluate LLMs on Islamic inheritance law ('ilm al-mawarith). Seven LLMs were tested, with o3 and Gemini 2.5 achieving over 90% accuracy, while ALLaM, Fanar, LLaMA, and Mistral scored below 50%. Error analysis revealed limitations in handling structured legal reasoning. Why it matters: This research highlights the challenges and opportunities for adapting LLMs to complex, culturally-specific legal domains like Islamic jurisprudence.
A new paper at ICCV 2025, co-authored by MBZUAI Ph.D. student Dmitry Demidov, introduces Dense-WebVid-CoVR, a 1.6-million-sample benchmark for composed video retrieval (CoVR). The benchmark features longer, context-rich descriptions and modification texts, generated using Gemini Pro and GPT-4o and then manually verified. The paper also presents a unified fusion approach that jointly reasons across video and text inputs, improving performance on fine-grained edit details. Why it matters: This work advances video search capabilities by enabling more human-like queries, which is crucial for creative and analytic workflows that require nuanced video retrieval.
The journal Communications Physics has a focus collection on space quantum communications. The collection covers supporting technologies, new quantum protocols, inter-satellite quantum key distribution (QKD), satellite constellations, and quantum-inspired technologies and protocols for space-based communication. Contributions are welcome from October 20, 2020 to April 30, 2021, and accepted papers are published on a rolling basis. Why it matters: Space-based quantum communication is a critical area for developing secure, global quantum networks, and this collection could highlight relevant research for the GCC region as it invests in advanced technologies.
MBZUAI's Institute of Foundation Models (IFM) has released K2 Think V2, a 70-billion-parameter open-source general reasoning model built on K2 V2 Instruct. The model performs strongly on complex reasoning benchmarks such as AIME 2025 and GPQA-Diamond, and features a low hallucination rate with long-context reasoning capabilities. K2 Think V2 is fully sovereign and open, from pre-training through post-training, using IFM-curated data and a Guru dataset. Why it matters: This release contributes to closing the gap between community-owned reproducible AI and proprietary models, particularly in reasoning and long-context understanding for Arabic NLP tasks.
Researchers at ETH Zurich have formalized models of the EMV payment protocol using the Tamarin model checker. They discovered flaws allowing attackers to bypass PIN requirements for high-value purchases on EMV cards like Mastercard and Visa. The team also collaborated with an EMV consortium member to verify the improved EMV Kernel C-8 protocol. Why it matters: This research highlights the importance of formal methods in identifying critical vulnerabilities in widely used payment systems, potentially impacting financial security for consumers in the GCC region and worldwide.
Researchers address the challenge of limited Arabic medical dialogue data by generating 80,000 synthetic question-answer pairs using ChatGPT-4o and Gemini 2.5 Pro, expanding an initial dataset of 20,000 records. They fine-tuned five LLMs, including Mistral-7B and AraGPT2, and evaluated performance using BERTScore and expert review. Results showed that training with ChatGPT-4o-generated data led to higher F1-scores and fewer hallucinations across models. Why it matters: This demonstrates the potential of synthetic data augmentation to improve domain-specific Arabic language models, particularly for low-resource medical NLP applications.