
Changing the landscape: A vision language model to revolutionize remote sensing

MBZUAI · Significant research

Summary

MBZUAI, in partnership with IBM Research, is developing GeoChat+, a vision-language model (VLM) for multi-modal, temporal remote sensing image analysis. GeoChat+ builds on the earlier GeoChat model, extending it to multi-modal imagery from Earth observation systems such as Sentinel-1, Sentinel-2, Landsat, and high-resolution sensors. The model will fuse data captured by multiple satellites at different times to detect environmental changes and analyze their impact on soil quality, air quality, and erosion. Why it matters: This advancement promises to revolutionize geographic data analysis, producing detailed reports for high-risk regions and aiding reforestation efforts.
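
GeoChat+'s actual pipeline has not been published, so as a rough illustration of the kind of multi-temporal analysis described above, the sketch below differences NDVI (a standard vegetation index) between two acquisition dates to flag vegetation loss. The band arrays, dates, and threshold are hypothetical placeholders, not anything from the project.

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Normalized Difference Vegetation Index: (NIR - red) / (NIR + red)."""
    return (nir - red) / (nir + red + 1e-8)

# Hypothetical reflectance bands for the same tile at two dates
# (in practice these would come from e.g. Sentinel-2 L2A products).
rng = np.random.default_rng(0)
red_t0, nir_t0 = rng.uniform(0.02, 0.2, (64, 64)), rng.uniform(0.3, 0.6, (64, 64))
red_t1, nir_t1 = rng.uniform(0.05, 0.3, (64, 64)), rng.uniform(0.1, 0.5, (64, 64))

# Pixels whose NDVI dropped by more than 0.2 are flagged as possible
# vegetation loss -- a crude stand-in for learned change detection.
delta = ndvi(nir_t1, red_t1) - ndvi(nir_t0, red_t0)
loss_mask = delta < -0.2
print(f"Flagged {loss_mask.mean():.1%} of pixels as potential vegetation loss")
```

A learned model like GeoChat+ would go well beyond a fixed threshold, but the input structure (co-registered bands from two dates) is the same.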

Related

A new vision-language model for analyzing remote sensing data | CVPR

MBZUAI ·

Researchers at MBZUAI, IBM Research, and other institutions have developed EarthDial, a new vision-language model (VLM) designed specifically to process geospatial data from remote sensing technologies. EarthDial handles data across multiple modalities and resolutions and processes images captured at different times to track environmental change. The model outperformed existing models on more than 40 tasks, including image classification, object detection, and change detection. Why it matters: This unified model bridges the gap between generic VLMs and domain-specific models, enabling complex geospatial data analysis for applications like disaster assessment and climate monitoring in the region.
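
EarthDial's internal preprocessing is not described here, but handling "multiple modalities and resolutions" typically starts with resampling inputs onto a shared grid. A minimal sketch, assuming a 10 m Sentinel-2 tile and a 30 m Landsat tile of the same footprint represented as plain arrays:

```python
import numpy as np

def resample_nearest(band: np.ndarray, out_shape: tuple[int, int]) -> np.ndarray:
    """Nearest-neighbour resampling to a common grid (illustrative only;
    real pipelines would reproject with rasterio/GDAL)."""
    rows = np.arange(out_shape[0]) * band.shape[0] // out_shape[0]
    cols = np.arange(out_shape[1]) * band.shape[1] // out_shape[1]
    return band[np.ix_(rows, cols)]

sentinel2 = np.random.rand(96, 96)  # hypothetical 10 m tile
landsat = np.random.rand(32, 32)    # hypothetical 30 m tile, same footprint

# Stack both sensors as channels of one multi-modal input tensor.
stacked = np.stack([sentinel2, resample_nearest(landsat, sentinel2.shape)])
print(stacked.shape)  # (2, 96, 96)
```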

GeoChat: Grounded Large Vision-Language Model for Remote Sensing

arXiv ·

Researchers at MBZUAI have developed GeoChat, a new vision-language model (VLM) specifically designed for remote sensing imagery. GeoChat addresses the limitations of general-domain VLMs in accurately interpreting high-resolution remote sensing data, offering both image-level and region-specific dialogue capabilities. The model is trained on a novel remote sensing multimodal instruction-following dataset and demonstrates strong zero-shot performance across tasks like image captioning and visual question answering.
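
GeoChat's region-specific dialogue works by referring to image regions directly in the text prompt. A minimal sketch of the idea, using a made-up coordinate convention (the paper's exact token format may differ):

```python
def region_prompt(question: str, box: tuple[float, float, float, float]) -> str:
    """Embed a normalized bounding box (x_min, y_min, x_max, y_max in [0, 1])
    in the prompt so the model answers about that region only. The
    <box>...</box> convention here is illustrative, not GeoChat's exact format."""
    coords = ", ".join(f"{v:.2f}" for v in box)
    return f"{question} <box>[{coords}]</box>"

print(region_prompt("What land-cover type is in this region?", (0.12, 0.40, 0.35, 0.62)))
```

Encoding geometry as text lets a single instruction-tuned model support both image-level questions and region-grounded ones without an extra detection head.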

Satellites are speaking a visual language that today’s AI doesn’t quite get

MBZUAI ·

Researchers from MBZUAI, IBM, and ServiceNow introduced GEOBench-VLM, a benchmark for evaluating vision-language models on Earth observation tasks using satellite and aerial imagery. The benchmark includes over 10,000 human-verified instructions across 31 sub-tasks spanning object classification, localization, change detection, and more. GEOBench-VLM addresses the gap in current VLMs' ability to perform spatially grounded reasoning and change detection in satellite imagery. Why it matters: This benchmark will drive progress in AI's ability to analyze satellite data for critical applications like disaster response, climate monitoring, and urban planning in the Middle East and globally.
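
GEOBench-VLM's distribution format is not shown here; the sketch below assumes a simple list of instruction/answer records and an exact-match scorer, just to make the evaluation setup concrete. `model_answer` is a hypothetical stand-in for a real VLM inference call, and the official metrics may differ.

```python
from collections import defaultdict

# Hypothetical benchmark records: each pairs an instruction about an image
# with a ground-truth answer and a sub-task label.
records = [
    {"task": "object_classification",
     "instruction": "What is the object at the marked location?",
     "answer": "storage tank"},
    {"task": "change_detection",
     "instruction": "Did built-up area increase between the two images?",
     "answer": "yes"},
]

def model_answer(instruction: str) -> str:
    """Placeholder for an actual VLM inference call."""
    return "yes"

# Exact-match accuracy per sub-task -- one common way benchmarks score
# short-answer tasks.
hits, totals = defaultdict(int), defaultdict(int)
for r in records:
    totals[r["task"]] += 1
    hits[r["task"]] += model_answer(r["instruction"]).strip().lower() == r["answer"]
for task in totals:
    print(f"{task}: {hits[task] / totals[task]:.0%}")
```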

From YOLO to VLMs: Advancing Zero-Shot and Few-Shot Detection of Wastewater Treatment Plants Using Satellite Imagery in MENA Region

arXiv ·

A new study compares vision-language models (VLMs) to YOLOv8 for wastewater treatment plant (WWTP) identification in satellite imagery across the MENA region. VLMs such as Gemma-3 demonstrate superior zero-shot performance compared to a YOLOv8 model trained on a dataset of 83,566 satellite images from Egypt, Saudi Arabia, and the UAE. The research suggests VLMs offer a scalable, annotation-free alternative for remote sensing of WWTPs.
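
The paper's exact prompting setup is not reproduced here; the sketch below shows the general zero-shot framing it describes, where each satellite tile is posed as a yes/no question to a VLM. `ask_vlm` is a hypothetical placeholder for an actual model call (e.g., to Gemma-3), and the tile paths and prompt wording are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Tile:
    path: str    # path to a satellite image chip
    label: bool  # ground truth: does it contain a WWTP?

PROMPT = ("Does this satellite image contain a wastewater treatment plant "
          "(circular clarifier tanks, rectangular aeration basins)? "
          "Answer yes or no.")

def ask_vlm(image_path: str, prompt: str) -> str:
    """Hypothetical stand-in for a real VLM inference call."""
    return "no"

def zero_shot_accuracy(tiles: list[Tile]) -> float:
    """Score the yes/no framing; no training or per-region annotation is
    needed, which is the 'annotation-free' advantage the study highlights."""
    correct = sum(
        ask_vlm(t.path, PROMPT).lower().startswith("yes") == t.label
        for t in tiles
    )
    return correct / len(tiles)

print(zero_shot_accuracy([Tile("tile_001.png", False), Tile("tile_002.png", True)]))
```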