Researchers from MBZUAI introduce Forget-MI, a machine unlearning method tailored to multimodal medical data that enhances privacy by removing specific patients' data from trained AI models. Forget-MI combines dedicated loss functions with perturbation techniques to unlearn both unimodal and joint representations of the data to be forgotten. The method reduces vulnerability to membership inference attacks and removes the targeted data more effectively than existing techniques, while preserving overall model performance.
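A minimal sketch of the general recipe behind such unlearning methods: gradient ascent on a noise-perturbed forget set, balanced against a standard loss on retained data. The function names, weighting, and noise model here are illustrative assumptions, not Forget-MI's exact losses.

```python
import torch
import torch.nn.functional as F

def unlearning_step(model, forget_batch, retain_batch, optimizer,
                    noise_std=0.01, alpha=1.0):
    """One hypothetical unlearning step: push the model away from the
    forget set (gradient ascent) while anchoring it on retained data."""
    optimizer.zero_grad()
    # Perturb the forget-set inputs so their representations are disrupted.
    noisy = forget_batch["x"] + noise_std * torch.randn_like(forget_batch["x"])
    forget_loss = -F.cross_entropy(model(noisy), forget_batch["y"])  # ascent
    retain_loss = F.cross_entropy(model(retain_batch["x"]), retain_batch["y"])
    (forget_loss + alpha * retain_loss).backward()
    optimizer.step()

# Toy usage: a linear classifier standing in for the multimodal model.
torch.manual_seed(0)
model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
forget = {"x": torch.randn(8, 4), "y": torch.randint(0, 2, (8,))}
retain = {"x": torch.randn(8, 4), "y": torch.randint(0, 2, (8,))}
unlearning_step(model, forget, retain, opt)
```

The `alpha` knob trades off how aggressively the forget set is erased against how well retained performance is preserved.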
This paper introduces MOTOR, a multimodal retrieval and re-ranking approach for medical visual question answering (MedVQA) that uses grounded captions and optimal transport to capture the relationships between a query and retrieved contexts, leveraging both textual and visual information. MOTOR selects clinically relevant contexts to augment the VLM's input, and empirical analysis shows it outperforms state-of-the-art methods by an average of 6.45% in accuracy on MedVQA datasets.
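Optimal-transport re-ranking of this kind can be sketched with a generic entropic (Sinkhorn) solver over token embeddings, scoring each retrieved context by its transport cost to the query. The cosine cost, regularization, and uniform marginals below are generic assumptions, not MOTOR's exact formulation.

```python
import numpy as np

def sinkhorn_cost(Q, C, reg=0.1, n_iter=50):
    """Entropic OT cost between query token vectors Q (m x d) and context
    token vectors C (n x d); a lower cost means a better-aligned context."""
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    Cn = C / np.linalg.norm(C, axis=1, keepdims=True)
    M = 1.0 - Qn @ Cn.T                       # cosine-distance cost matrix
    K = np.exp(-M / reg)                      # Gibbs kernel
    a = np.full(len(Q), 1 / len(Q))           # uniform marginals
    b = np.full(len(C), 1 / len(C))
    u = np.ones_like(a)
    for _ in range(n_iter):                   # Sinkhorn iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = np.diag(u) @ K @ np.diag(v)           # transport plan
    return float((P * M).sum())

# Re-rank retrieved contexts by transport cost to the query.
rng = np.random.default_rng(0)
query = rng.normal(size=(5, 16))
contexts = [rng.normal(size=(7, 16)) for _ in range(3)]
ranked = sorted(range(len(contexts)),
                key=lambda i: sinkhorn_cost(query, contexts[i]))
```

Because the plan matches query tokens to context tokens many-to-many, this captures finer-grained alignment than a single pooled-embedding similarity.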
Researchers at KAUST and KACST have developed a composite material that improves solar cell performance by absorbing moisture from the air at night and releasing it during the day, passively cooling the cells. Applied to solar cells in Saudi Arabia, the material increased power output by 12.9%, extended cell lifespan by over 200%, and reduced electricity generation costs by 18%. Why it matters: This innovation addresses a key challenge in solar energy adoption in hot climates, potentially making solar power more efficient and cost-effective in the region.
A new methodology uses LLMs to assess the factuality and bias of news outlets by emulating the criteria professional fact-checkers apply. Prompts derived from these criteria elicit LLM responses, which are aggregated into outlet-level predictions. Experiments demonstrate improvements over baselines; the authors also analyze errors by media popularity and region, and release their dataset and code at https://github.com/mbzuai-nlp/llm-media-profiling.
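The elicit-then-aggregate step can be sketched as follows: repeated per-criterion LLM scores are averaged, then mapped to a coarse label. The criterion names, score scale, and thresholds are hypothetical illustrations, not the paper's actual prompt set or aggregation rule.

```python
from statistics import mean

# Hypothetical fact-checker criteria an LLM might be prompted to score
# on a 1 (poor) to 5 (strong) scale.
CRITERIA = ["sourcing", "corrections_policy", "headline_accuracy", "transparency"]

def profile_outlet(scores: dict) -> str:
    """Average repeated LLM responses per criterion, then map the overall
    mean to a coarse factuality label. Thresholds are illustrative."""
    overall = mean(mean(responses) for responses in scores.values())
    if overall >= 4.0:
        return "high"
    if overall >= 2.5:
        return "mixed"
    return "low"

# Usage: two sampled LLM responses per criterion for one outlet.
label = profile_outlet({c: [4, 5] for c in CRITERIA})
```

Sampling each criterion several times and averaging smooths out LLM response variance before the final prediction.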
RSM, a global accounting and consulting firm, has committed $1 billion over the next five years to significantly expand its artificial intelligence strategy. The funding aims to accelerate the integration of AI capabilities across all of its service lines globally. The firm intends to leverage AI to enhance operational efficiency, improve client service delivery, and foster innovation within its professional services offerings. Why it matters: This major investment by a leading professional services firm underscores the growing imperative for traditional industries to adopt advanced AI solutions, setting a precedent for similar firms and influencing AI integration strategies in the Middle East's financial and consulting sectors.
This paper benchmarks reasoning-focused LLMs, especially DeepSeek models, on fifteen Arabic NLP tasks using zero-shot, few-shot, and fine-tuning strategies. Key findings: three in-context examples improve F1 scores by over 13 points on classification tasks; DeepSeek outperforms GPT-4-mini by 12 F1 points on complex inference tasks in the zero-shot setting; and LoRA fine-tuning yields up to an additional 8 points in F1 and BLEU. Why it matters: The systematic evaluation provides insights into LLM performance on Arabic NLP, highlighting which strategies improve performance and contributing to the development of more capable Arabic language models.
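The few-shot setup behind the "three in-context examples" finding can be sketched as a simple prompt builder; the template below is an illustrative assumption, not the paper's exact format.

```python
def build_few_shot_prompt(task, examples, query, k=3):
    """Assemble a k-shot prompt from labeled (input, label) pairs.
    k=3 mirrors the finding that ~3 in-context examples lift
    classification F1; the template itself is illustrative."""
    shots = "\n\n".join(f"Input: {x}\nLabel: {y}" for x, y in examples[:k])
    return f"Task: {task}\n\n{shots}\n\nInput: {query}\nLabel:"

# Usage: a hypothetical Arabic sentiment-classification query.
prompt = build_few_shot_prompt(
    "Sentiment",
    [("رائع", "pos"), ("سيئ", "neg"), ("جميل", "pos"), ("ممل", "neg")],
    "ممتاز",
)
```

The prompt ends with a bare `Label:` so the model's next tokens are read off as the prediction.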
A new benchmark, ViMUL-Bench, is introduced to evaluate video LLMs across 14 languages, including Arabic, with a focus on cultural inclusivity. The benchmark includes 8k manually verified samples across 15 categories and varying video durations. A multilingual video LLM, ViMUL, is also presented, along with a training set of 1.2 million samples, with both to be publicly released.
MBZUAI researchers introduce TerraFM, a scalable self-supervised learning model for Earth observation that uses Sentinel-1 and Sentinel-2 imagery. The model unifies radar and optical inputs through modality-specific patch embeddings and adaptive cross-attention fusion. TerraFM achieves strong generalization on classification and segmentation tasks, outperforming prior models on GEO-Bench and Copernicus-Bench.
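A minimal sketch of the described fusion pattern: modality-specific embeddings for radar and optical patches, joined by cross-attention. The band counts, dimensions, and attention direction are illustrative assumptions, not TerraFM's actual architecture.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Fuse SAR (Sentinel-1) and optical (Sentinel-2) patch features
    with cross-attention; sizes here are illustrative only."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.sar_embed = nn.Linear(2, dim)    # e.g. 2 SAR polarization bands
        self.opt_embed = nn.Linear(12, dim)   # e.g. 12 optical bands
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, sar_patches, opt_patches):
        q = self.sar_embed(sar_patches)       # modality-specific embeddings
        kv = self.opt_embed(opt_patches)
        fused, _ = self.attn(q, kv, kv)       # SAR tokens attend to optical
        return fused

# Usage: one scene of 16 patches per modality.
fusion = CrossModalFusion()
out = fusion(torch.randn(1, 16, 2), torch.randn(1, 16, 12))
```

Keeping the patch embeddings modality-specific lets each sensor's statistics be handled separately before the shared attention layer mixes them.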
Researchers from MBZUAI have introduced VideoMolmo, a large multimodal model for spatio-temporal pointing conditioned on textual descriptions. The model incorporates a temporal module with an attention mechanism and a temporal mask fusion pipeline using SAM2 for improved coherence across video sequences. They also curated a dataset of 72k video-caption pairs and introduced VPoS-Bench, a benchmark for evaluating generalization across real-world scenarios, with code and models publicly available.
MBZUAI researchers introduce VideoMathQA, a new benchmark for evaluating mathematical reasoning in videos, requiring models to interpret visual information, text, and spoken cues. The dataset spans 10 mathematical domains with videos ranging from 10 seconds to over 1 hour, and includes multi-step reasoning annotations. The benchmark aims to evaluate temporal cross-modal reasoning and highlights the limitations of existing approaches in complex video-based mathematical problem solving.
This paper introduces a novel evaluation framework for Arabic language models, addressing gaps in linguistic accuracy and cultural alignment. The authors analyze existing datasets and present the Arabic Depth Mini Dataset (ADMD), a curated collection of 490 questions across ten domains. Evaluating GPT-4, Claude 3.5 Sonnet, Gemini Flash 1.5, CommandR 100B, and Qwen-Max using ADMD reveals performance variations, with Claude 3.5 Sonnet achieving the highest accuracy at 30%. Why it matters: The work emphasizes the importance of cultural competence in Arabic language model evaluation, providing practical insights for improvement.
The UAE Government has launched a new training program specifically designed for Chief AI Officers across its public sector. This initiative aims to enhance future technological leadership by equipping government leaders with essential skills in AI integration and strategy. The program is part of a broader effort to leverage artificial intelligence for improved government operations and service delivery. Why it matters: This program signifies the UAE's strategic investment in human capital for AI, reinforcing its national AI strategy and ambition to be a global leader in AI governance and application.