From YOLO to VLMs: Advancing Zero-Shot and Few-Shot Detection of Wastewater Treatment Plants Using Satellite Imagery in MENA Region

arXiv · December 16, 2025 · Significant research

Summary

A new study compares vision-language models (VLMs) to YOLOv8 for wastewater treatment plant (WWTP) identification in satellite imagery across the MENA region. VLMs like Gemma-3 demonstrate superior zero-shot performance compared to YOLOv8, trained on a dataset of 83,566 satellite images from Egypt, Saudi Arabia, and UAE. The research suggests VLMs offer a scalable, annotation-free alternative for remote sensing of WWTPs.

Keywords

wastewater treatment plant · satellite imagery · vision-language model · zero-shot learning · YOLOv8

Read original article →

Get the weekly digest

Top AI stories from the GCC region, every week.

GeoChat: Grounded Large Vision-Language Model for Remote Sensing

arXiv · Nov 24

Researchers at MBZUAI have developed GeoChat, a new vision-language model (VLM) specifically designed for remote sensing imagery. GeoChat addresses the limitations of general-domain VLMs in accurately interpreting high-resolution remote sensing data, offering both image-level and region-specific dialogue capabilities. The model is trained on a novel remote sensing multimodal instruction-following dataset and demonstrates strong zero-shot performance across tasks like image captioning and visual question answering.

Satellites are speaking a visual language that today’s AI doesn’t quite get

MBZUAI · Invalid Date

Researchers from MBZUAI, IBM, and ServiceNow introduced GEOBench-VLM, a benchmark for evaluating vision-language models on Earth observation tasks using satellite and aerial imagery. The benchmark includes over 10,000 human-verified instructions across 31 sub-tasks spanning object classification, localization, change detection, and more. GEOBench-VLM addresses the gap in current VLMs' ability to perform spatially grounded reasoning and change detection in satellite imagery. Why it matters: This benchmark will drive progress in AI's ability to analyze satellite data for critical applications like disaster response, climate monitoring, and urban planning in the Middle East and globally.

From YOLO to VLMs: Advancing Zero-Shot and Few-Shot Detection of Wastewater Treatment Plants Using Satellite Imagery in MENA Region

Summary

Keywords

Related

GeoChat: Grounded Large Vision-Language Model for Remote Sensing

Satellites are speaking a visual language that today’s AI doesn’t quite get