Middle East AI

Product

Reinforcement learning-based dynamic cleaning scheduling framework for solar energy system

arXiv · · RL Robotics

This study introduces a reinforcement learning (RL) framework using Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) to optimize the cleaning schedules of photovoltaic panels in arid regions. Applied to a case study in Abu Dhabi, the PPO-based framework demonstrated up to 13% cost savings compared to simulation optimization methods by dynamically adjusting cleaning intervals based on environmental conditions. The research highlights the potential of RL in enhancing the efficiency and reducing the operational costs of solar power generation.

Beyond the Resumé: A Rubric-Aware Automatic Interview System for Information Elicitation

arXiv · · NLP LLM

MBZUAI researchers have developed an automatic interview system that uses LLMs to elicit nuanced, role-specific information from job candidates, improving early-stage hiring decisions. The system updates its belief about an applicant's rubric-oriented latent traits in a calibrated way based on their interview performance. Evaluation on simulated interviews showed the system's belief converges towards the simulated applicants' constructed ability levels.
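The idea of a belief that converges toward a simulated applicant's ability can be illustrated with a minimal sketch. It assumes a Beta-Bernoulli model in which each interview answer is scored pass/fail against one rubric item; this model, the function names, and the numbers are illustrative assumptions, not the method used in the paper.

```python
import random


def update_belief(alpha, beta, passed):
    """Beta-Bernoulli update after one pass/fail rubric judgment (assumed model)."""
    return (alpha + 1, beta) if passed else (alpha, beta + 1)


def simulate_interview(true_ability, n_questions, seed=0):
    """Return the posterior mean of the ability after n simulated answers."""
    rng = random.Random(seed)
    alpha, beta = 1.0, 1.0  # uniform prior over the latent trait
    for _ in range(n_questions):
        # Hypothetical simulated applicant: passes with probability true_ability.
        passed = rng.random() < true_ability
        alpha, beta = update_belief(alpha, beta, passed)
    return alpha / (alpha + beta)


estimate = simulate_interview(true_ability=0.8, n_questions=500)
print(f"posterior mean after 500 answers: {estimate:.3f}")
```

With more questions the posterior mean concentrates around the constructed ability level, which is the kind of convergence the evaluation on simulated interviews reports.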

ILION: Deterministic Pre-Execution Safety Gates for Agentic AI Systems

arXiv · · RL Ethics

The paper introduces ILION, a deterministic execution gate designed to ensure the safety of autonomous AI agents by classifying proposed actions as either BLOCK or ALLOW. ILION uses a five-component cascade architecture that operates without statistical training, API dependencies, or labeled data. Evaluation against existing text-safety infrastructures demonstrates ILION's superior performance in preventing unauthorized actions, achieving an F1 score of 0.8515 with sub-millisecond latency.
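A deterministic pre-execution gate of this general kind can be sketched as a fixed cascade of rule checks, where the first failing check blocks the action. The three checks, the policy tables, and the `Action` fields below are hypothetical and chosen only for illustration; they do not reproduce ILION's five components.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    tool: str
    target: str
    requested_by: str


# Hypothetical policy tables; a real deployment would load its own.
ALLOWED_TOOLS = {"read_file", "send_email"}
PROTECTED_PREFIXES = ("/etc/", "/root/")
AUTHORIZED_USERS = {"alice"}


def check_tool(action):
    return action.tool in ALLOWED_TOOLS


def check_target(action):
    return not action.target.startswith(PROTECTED_PREFIXES)


def check_requester(action):
    return action.requested_by in AUTHORIZED_USERS


# Checks run in a fixed order, so the same input always yields the same verdict.
CASCADE = (check_tool, check_target, check_requester)


def gate(action):
    """Return "BLOCK" on the first failed check, otherwise "ALLOW"."""
    for check in CASCADE:
        if not check(action):
            return "BLOCK"
    return "ALLOW"


print(gate(Action("read_file", "/home/alice/notes.txt", "alice")))
print(gate(Action("read_file", "/etc/passwd", "alice")))
```

Because every check is a plain lookup or string comparison, the gate needs no trained model, external API, or labeled data, and its latency is dominated by a handful of set and prefix tests.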

SPECS: Specificity-Enhanced CLIP-Score for Long Image Caption Evaluation

arXiv · · CV NLP

Researchers from MBZUAI have introduced SPECS, a new reference-free evaluation metric for long image captions that modifies CLIP to emphasize specificity. SPECS aims to improve the correlation with human judgment while maintaining computational efficiency compared to LLM-based metrics. The proposed approach is intended for iterative use during image captioning model development, offering a practical alternative to existing methods.

UI-Level Evaluation of ALLaM 34B: Measuring an Arabic-Centric LLM via HUMAIN Chat

arXiv · · LLM Arabic AI

This paper presents a UI-level evaluation of ALLaM-34B, an Arabic-centric LLM developed by SDAIA and deployed in the HUMAIN Chat service. The evaluation used a prompt pack spanning various Arabic dialects, code-switching, reasoning, and safety, with outputs scored by frontier LLM judges. Results indicate strong performance in generation, code-switching, MSA handling, reasoning, and improved dialect fidelity, positioning ALLaM-34B as a robust Arabic LLM suitable for real-world use.

FAID: Fine-Grained AI-Generated Text Detection Using Multi-Task Auxiliary and Multi-Level Contrastive Learning

arXiv · · NLP LLM

MBZUAI researchers introduce FAID, a fine-grained AI-generated text detection framework capable of classifying text as human-written, LLM-generated, or collaboratively written. FAID utilizes multi-level contrastive learning and multi-task auxiliary classification to capture authorship and model-specific characteristics, and can identify the underlying LLM family. The framework outperforms existing baselines, especially in generalizing to unseen domains and new LLMs, and includes a multilingual, multi-domain dataset called FAIDSet.

RP-SAM2: Refining Point Prompts for Stable Surgical Instrument Segmentation

arXiv · · CV Research

Researchers from MBZUAI introduced RP-SAM2, a method to improve surgical instrument segmentation by refining point prompts for more stable results. RP-SAM2 uses a novel shift block and compound loss function to reduce sensitivity to point prompt placement, improving segmentation accuracy in data-constrained settings. Experiments on the Cataract1k and CaDIS datasets show that RP-SAM2 enhances segmentation accuracy and reduces variance compared to SAM2, with code available on GitHub.

Towards Unified and Lossless Latent Space for 3D Molecular Latent Diffusion Modeling

arXiv · · Research Healthcare

The paper introduces UAE-3D, a multi-modal VAE for 3D molecule generation that compresses molecules into a unified latent space, maintaining near-zero reconstruction error. This approach simplifies latent diffusion modeling by eliminating the need to handle multi-modality and equivariance separately. Experiments on GEOM-Drugs and QM9 datasets show UAE-3D establishes new benchmarks in de novo and conditional 3D molecule generation, with significant improvements in efficiency and quality.

FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance

arXiv · · CV Research

FancyVideo, a new video generator, introduces a Cross-frame Textual Guidance Module (CTGM) to enhance text-to-video models. CTGM uses a Temporal Information Injector and Temporal Affinity Refiner to achieve frame-specific textual guidance, improving comprehension of temporal logic. Experiments on the EvalCrafter benchmark demonstrate FancyVideo's state-of-the-art performance in generating dynamic and consistent videos, also supporting image-to-video tasks.

Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

arXiv · · Research LLM

MBZUAI researchers introduce Web2Code, a new large-scale dataset and evaluation framework for training and benchmarking multimodal LLMs on webpage understanding and HTML code generation. The dataset includes webpage images, HTML code, and QA pairs about webpage content. Experiments demonstrate the dataset's utility in webpage understanding, code generation, and general visual domain tasks, with code and data available on GitHub.

MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis

arXiv · · Healthcare CV

The paper introduces MedPromptX, a clinical decision support system using multimodal large language models (MLLMs), few-shot prompting (FP), and visual grounding (VG) for chest X-ray diagnosis, integrating imagery with EHR data. MedPromptX refines few-shot data dynamically for real-time adjustment to new patient scenarios and narrows the search area in X-ray images. The study introduces MedPromptX-VQA, a new visual question answering dataset, and demonstrates state-of-the-art performance with an 11% improvement in F1-score compared to baselines.

XReal: Realistic Anatomy and Pathology-Aware X-ray Generation via Controllable Diffusion Model

arXiv · · CV Healthcare

Researchers from MBZUAI have developed XReal, a diffusion model for generating realistic chest X-ray images with precise control over anatomy and pathology location. The model utilizes an Anatomy Controller and a Pathology Controller to introduce spatial control in a pre-trained Text-to-Image Diffusion Model without fine-tuning. XReal outperforms existing X-ray diffusion models in realism, as evaluated by quantitative metrics and radiologists' ratings, and the code/weights are available.

MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT

arXiv · · LLM Research

Researchers from MBZUAI have released MobiLlama, a fully transparent open-source 0.5 billion parameter Small Language Model (SLM). MobiLlama is designed for resource-constrained devices, emphasizing enhanced performance with reduced resource demands. The full training data pipeline, code, model weights, and checkpoints are available on GitHub.

Early and Accurate Detection of Tomato Leaf Diseases Using TomFormer

arXiv · · CV Research

Researchers introduce TomFormer, a transformer-based model for accurate and early detection of tomato leaf diseases, with the goal of deployment on the Hello Stretch robot for real-time diagnosis. TomFormer combines a visual transformer and CNN, achieving state-of-the-art results on KUTomaDATA, PlantDoc, and PlantVillage datasets. KUTomaDATA was collected from a greenhouse in Abu Dhabi, UAE.

Tomato Maturity Recognition with Convolutional Transformers

arXiv · · CV Research

This paper introduces a convolutional transformer model for classifying tomato maturity, along with a new UAE-sourced dataset, KUTomaData, for training segmentation and classification models. The model combines CNNs and transformers and was tested against two public datasets. Results showed state-of-the-art performance, outperforming existing methods by significant margins in mAP scores across all three datasets.

EchoCoTr: Estimation of the Left Ventricular Ejection Fraction from Spatiotemporal Echocardiography

arXiv · · CV Healthcare

Researchers from MBZUAI have developed EchoCoTr, a novel spatiotemporal deep learning method for estimating left ventricular ejection fraction (LVEF) from echocardiograms. EchoCoTr combines CNNs and vision transformers to overcome the limitations of each when applied to medical video data. The method achieves state-of-the-art results on the EchoNet-Dynamic dataset, demonstrating improved accuracy compared to existing approaches, with code available on GitHub.

Can LLMs Automate Fact-Checking Article Writing?

arXiv · · NLP LLM

Researchers at MBZUAI have introduced QRAFT, an LLM-based framework designed to automate the generation of fact-checking articles. The system mimics the writing workflow of human fact-checkers, aiming to bridge the gap between automated fact-checking systems and public dissemination. While QRAFT outperforms existing text-generation methods, it still falls short of expert-written articles, highlighting areas for further research.

Movement Control of Smart Mosque's Domes using CSRNet and Fuzzy Logic Techniques

arXiv · · CV Robotics

This paper proposes a smart dome model for mosques that uses AI to control dome movements based on weather conditions and overcrowding. The model utilizes Congested Scene Recognition Network (CSRNet) and fuzzy logic techniques in Python to determine when to open and close the domes to maintain fresh air and sunlight. The goal is to automatically manage dome operation based on real-time data, specifying the duration for which the domes should remain open each hour.
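How fuzzy rules might turn a crowd count and a temperature reading into an opening duration can be sketched as follows. The crowd count is assumed to come from a density estimator such as CSRNet; the membership breakpoints, the three rules, and the weighted-average defuzzification are hypothetical choices for illustration, not the paper's rule base.

```python
def ramp_up(x, lo, hi):
    """Membership rising linearly from 0 at lo to 1 at hi, then staying at 1."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def tri(x, a, b, c):
    """Triangular membership peaking at b and zero outside (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def open_minutes(crowd, temp_c):
    """Minutes per hour the dome stays open (hypothetical rule base)."""
    crowd_high = ramp_up(crowd, 200, 1000)
    crowd_low = 1.0 - crowd_high
    temp_mild = tri(temp_c, 15, 25, 35)
    temp_hot = ramp_up(temp_c, 30, 45)

    # Each rule: (firing strength, minutes open if the rule fully applies).
    rules = [
        (min(crowd_high, temp_mild), 60.0),  # crowded and mild: stay open
        (min(crowd_low, temp_mild), 30.0),   # quiet and mild: open half the hour
        (temp_hot, 0.0),                     # hot: keep closed
    ]
    total = sum(weight for weight, _ in rules)
    if total == 0:
        return 0.0  # no rule fires, e.g. cold weather: default to closed
    return sum(weight * minutes for weight, minutes in rules) / total


print(open_minutes(crowd=600, temp_c=25))
```

Intermediate inputs blend the rule outputs, so the duration changes smoothly as the crowd grows instead of jumping at a hard threshold.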

LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection

arXiv · · NLP LLM

MBZUAI researchers release LLM-DetectAIve, a tool for fine-grained detection of machine-generated text across four categories: human-written, machine-generated, machine-written then humanized, and human-written then machine-polished. The tool aims to address concerns about misuse of LLMs, especially in education and academia, by identifying attempts to obfuscate or polish content. LLM-DetectAIve is publicly accessible with code and a demonstration video provided.

Domain Adaptable Fine-Tune Distillation Framework For Advancing Farm Surveillance

arXiv · · CV Research

The paper introduces a framework for camel farm monitoring using a combination of automated annotation and fine-tune distillation. The Unified Auto-Annotation framework uses GroundingDINO and SAM to automatically annotate surveillance video data. The Fine-Tune Distillation framework then fine-tunes student models like YOLOv8, transferring knowledge from a larger teacher model, using data from Al-Marmoom Camel Farm in Dubai.

Dates Fruit Disease Recognition using Machine Learning

arXiv · · CV Healthcare

This paper proposes a machine learning method for early detection and classification of diseases in date fruit, a crop that is economically important to countries like Saudi Arabia. The method uses a hybrid feature extraction approach combining L*a*b color features, statistical features, and Discrete Wavelet Transform (DWT) texture features. Experiments on a dataset of 871 images compared Random Forest (RF), Multilayer Perceptron (MLP), Naïve Bayes (NB), and Fuzzy Decision Trees (FDT) classifiers; the highest average accuracy was obtained when the three feature types were combined.

er.autopilot 1.0: The Full Autonomous Stack for Oval Racing at High Speeds

arXiv · · Robotics RL

Team TII EuroRacing (TII-ER) developed a full autonomous software stack for oval racing, enabling speeds above 75 m/s (270 km/h). The software includes modules for perception, planning, control, vehicle dynamics modeling, simulation, telemetry, and safety. The team achieved second and third place in the first two Indy Autonomous Challenge events using this stack.

Spot-the-Camel: Computer Vision for Safer Roads

arXiv · · CV Research

Researchers in Saudi Arabia are applying computer vision techniques to reduce Camel-Vehicle Collisions (CVCs). They tested object detection models including CenterNet, EfficientDet, Faster R-CNN, SSD, and YOLOv8 on the task, finding YOLOv8 to be the most accurate and efficient. Future work will focus on developing a system to improve road safety in rural areas.

Computer Vision for a Camel-Vehicle Collision Mitigation System

arXiv · · CV Research

Researchers are exploring computer vision models to mitigate Camel-Vehicle Collisions (CVCs) in Saudi Arabia, which have a high fatality rate. They tested CenterNet, EfficientDet, Faster R-CNN, and SSD for camel detection, finding CenterNet to be the most accurate and efficient. Future work involves developing a comprehensive system to enhance road safety in rural areas.

A Missing and Found Recognition System for Hajj and Umrah

arXiv · · CV Robotics

A proposed recognition system aims to identify missing persons, deceased individuals, and lost objects during the Hajj and Umrah pilgrimages in Saudi Arabia. The system intends to leverage facial recognition and object identification to manage the large crowds expected in the coming decade, estimated to reach 20 million pilgrims. It will be integrated into the CrowdSensing system for crowd estimation, management, and safety.

Web-Based Expert System for Civil Service Regulations: RCSES

arXiv · · Research Arabic AI

The paper introduces a web-based expert system called RCSES for civil service regulations in Saudi Arabia. The system covers 17 regulations and uses XML for knowledge representation and ASP.NET for rule-based inference. RCSES was validated by domain experts and technical users, and compared favorably to other web-based expert systems.

Alumni Spotlight: How Abdelrahman Shaker learned to redefine impact in AI

MBZUAI · · Research CV

MBZUAI alumnus Abdelrahman Shaker discusses his evolving perspective on impactful AI research, shifting from publication counts to real-world usefulness. He highlights the success of his SwiftFormer and EdgeNeXt papers, which have been adopted by third parties and reached millions of users. Shaker chose MBZUAI for its faculty expertise, which led to 10 publications and over 2,500 citations during his Ph.D.