This paper benchmarks reasoning-focused LLMs, especially DeepSeek models, on fifteen Arabic NLP tasks. The study uses zero-shot, few-shot, and fine-tuning strategies. Key findings include that three in-context examples improve F1 scores by over 13 points on classification tasks, DeepSeek outperforms GPT-4-mini by 12 F1 points on complex inference tasks in the zero-shot setting, and LoRA fine-tuning yields up to an additional 8 points in F1 and BLEU. Why it matters: The systematic evaluation provides insights into the performance of LLMs on Arabic NLP, highlighting the effectiveness of different strategies for improving performance and contributing to the development of more capable Arabic language models.
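To illustrate the difference between the zero-shot and few-shot settings mentioned above, here is a minimal sketch of how a classification prompt might be assembled with or without in-context examples. The function name `build_few_shot_prompt`, the `Text:`/`Label:` template, and the placeholder examples are all hypothetical; the paper's actual prompt format is not given in this summary.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a classification prompt.

    `examples` is a list of (text, label) pairs; an empty list gives a
    zero-shot prompt. The template is an illustrative assumption, not the
    format used in the paper.
    """
    parts = [instruction.strip(), ""]
    for text, label in examples:
        parts.append(f"Text: {text}")
        parts.append(f"Label: {label}")
        parts.append("")
    # The query is left unlabeled so the model completes the final label.
    parts.append(f"Text: {query}")
    parts.append("Label:")
    return "\n".join(parts)


# Placeholder data, for illustration only.
instruction = "Classify the sentiment of the text as positive or negative."
examples = [
    ("example text 1", "positive"),
    ("example text 2", "negative"),
    ("example text 3", "positive"),
]

zero_shot_prompt = build_few_shot_prompt(instruction, [], "new text")
three_shot_prompt = build_few_shot_prompt(instruction, examples, "new text")
```

The only difference between the two prompts is the block of labeled examples placed before the query, which is the variable the reported few-shot gain refers to.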
A new paper at ICCV 2025, co-authored by MBZUAI Ph.D. student Dmitry Demidov, introduces Dense-WebVid-CoVR, a 1.6-million sample benchmark for composed video retrieval (CoVR). The benchmark features longer, context-rich descriptions and modification texts, generated using Gemini Pro and GPT-4o, with manual verification. The paper also presents a unified fusion approach that jointly reasons across video and text inputs, improving performance on fine-grained edit details. Why it matters: This work advances video search capabilities by enabling more human-like queries, which is crucial for creative and analytic workflows that require nuanced video retrieval.
Researchers developed a semantic search tool for the Quran using Arabic NLP techniques. The tool was trained on a dataset of over 30 tafsirs (interpretations) of the Quran. Using the SNxLM model and cosine similarity, the tool identifies Quranic verses most relevant to a user's query, achieving a similarity score of up to 0.97. Why it matters: This tool could significantly improve access to the Quran's teachings for Arabic speakers and researchers, providing a valuable resource for religious study and understanding.
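The retrieval step described above, ranking verses by cosine similarity to a query, can be sketched as follows. This is a minimal illustration under stated assumptions: the three-dimensional vectors are toy placeholders standing in for model embeddings, and the names `cosine_similarity` and `rank_verses` are hypothetical rather than taken from the tool.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as no match
    return dot / (norm_a * norm_b)


def rank_verses(query_vec, verse_vecs, top_k=3):
    """Return the top_k (verse_id, score) pairs, highest similarity first."""
    scored = [
        (verse_id, cosine_similarity(query_vec, vec))
        for verse_id, vec in verse_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings, for illustration only.
verse_vecs = {
    "verse_a": [1.0, 0.0, 0.0],
    "verse_b": [0.6, 0.8, 0.0],
    "verse_c": [0.0, 0.0, 1.0],
}
query_vec = [1.0, 0.1, 0.0]
top_matches = rank_verses(query_vec, verse_vecs, top_k=2)
```

In a real system the vectors would come from the embedding model applied to the query and to each verse, and the ranking logic would stay the same.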
KAUST researchers have developed deepBlastoid, a deep learning tool for evaluating blastoids, which are models of human embryo development. deepBlastoid processes 273 blastoid images per second, about 1,000 times faster than expert scientists. The tool was trained on over 2,000 microscopic blastoid images and was then used to assess the impact of chemicals on blastoid development across more than 10,000 images. Why it matters: This AI tool accelerates research into early pregnancy, fertility complications, and the impact of chemicals on embryo development, with implications for reproductive technologies.
Stanford's Robotics Laboratory, in collaboration with KAUST professors Khaled Nabil Salama and Christian Voolstra and MEKA Robotics, developed OceanOne, a bimanual underwater humanoid robot avatar with haptic feedback. OceanOne allows human pilots to explore ocean depths with high fidelity by relaying instantaneous images. The robot has two fully articulated arms and a tail section with batteries, computers, and thrusters. Why it matters: This collaboration between KAUST and Stanford highlights the increasing role of robotics and AI in deep-sea exploration, with potential applications in underwater research and resource discovery in the Red Sea and beyond.
This paper explores the use of deep learning for anomaly detection in sports facilities, with the goal of optimizing energy management. The researchers propose a method using Deep Feedforward Neural Networks (DFNN) and threshold estimation techniques to identify anomalies and reduce false alarms. They tested their approach on an aquatic center dataset at Qatar University, achieving 94.33% accuracy and 92.92% F1-score. Why it matters: The research demonstrates the potential of AI to improve energy efficiency and operational effectiveness in sports facilities within the GCC region.
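As a rough illustration of how a threshold on model scores can separate anomalies from normal readings, the sketch below sets the threshold at the mean plus k standard deviations of scores observed on normal data. This rule is a common baseline and an assumption here; the summary does not say which threshold estimation techniques the authors used, and the scores are made-up placeholders rather than DFNN outputs.

```python
import statistics


def estimate_threshold(normal_scores, k=3.0):
    """Mean + k * std of anomaly scores collected on known-normal data.

    A larger k raises the threshold, which reduces false alarms at the
    cost of missing milder anomalies.
    """
    mean = statistics.fmean(normal_scores)
    std = statistics.pstdev(normal_scores)
    return mean + k * std


def flag_anomalies(scores, threshold):
    """Return True for each score that exceeds the threshold."""
    return [score > threshold for score in scores]


# Placeholder scores (e.g. prediction errors), for illustration only.
normal_scores = [0.10, 0.12, 0.09, 0.11, 0.10, 0.13, 0.08]
threshold = estimate_threshold(normal_scores, k=3.0)
flags = flag_anomalies([0.11, 0.45, 0.12], threshold)
```

With these placeholder values only the middle reading is flagged, since it sits far above the spread of the normal scores.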
This paper introduces a framework that combines machine learning for multi-class attack detection in IoT/IIoT networks with large language models (LLMs) for attack behavior analysis and mitigation suggestions. The framework uses role-play prompt engineering with retrieval-augmented generation (RAG) to guide LLMs such as ChatGPT-o3 and DeepSeek-R1, and introduces new evaluation metrics for quantitative assessment. Experiments on the Edge-IIoTset and CICIoT2023 datasets showed Random Forest to be the best detection model, with ChatGPT-o3 outperforming DeepSeek-R1 in attack analysis and mitigation.
This paper analyzes the energy consumption and carbon footprint of LLM inference in the UAE compared to Iceland, Germany, and the USA. The study uses DeepSeek Coder 1.3B and the HumanEval dataset to evaluate code generation. It provides a comparative analysis of geographical trade-offs for climate-aware AI deployment, specifically addressing the challenges and potential of datacenters in desert regions.
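The geographical trade-off comes down to simple arithmetic: the same inference workload emits more or less carbon depending on the grid it runs on. A minimal sketch, assuming emissions are estimated as energy consumed times grid carbon intensity, optionally scaled by a datacenter overhead factor (PUE). The function name, the region labels, and every number below are hypothetical placeholders, not measurements from the study.

```python
def estimate_emissions_g(energy_kwh, intensity_g_per_kwh, pue=1.0):
    """Estimated emissions in grams of CO2-equivalent.

    energy_kwh: energy drawn by the compute hardware.
    intensity_g_per_kwh: grid carbon intensity (gCO2e per kWh).
    pue: power usage effectiveness; values above 1.0 account for cooling
         and other facility overhead (an assumption of this sketch).
    """
    return energy_kwh * pue * intensity_g_per_kwh


# Placeholder inputs, for illustration only.
workload_kwh = 2.0
regions = {
    "region_low_carbon": {"intensity": 50.0, "pue": 1.1},
    "region_high_carbon": {"intensity": 400.0, "pue": 1.5},
}

emissions_by_region = {
    name: estimate_emissions_g(workload_kwh, cfg["intensity"], cfg["pue"])
    for name, cfg in regions.items()
}
```

Holding the workload fixed and varying only intensity and PUE isolates the effect of location, which is the comparison the study is making.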