Skip to content
GCC AI Research

Search

Results for "GeoChat"

GeoChat: Grounded Large Vision-Language Model for Remote Sensing

arXiv ·

Researchers at MBZUAI have developed GeoChat, a new vision-language model (VLM) specifically designed for remote sensing imagery. GeoChat addresses the limitations of general-domain VLMs in accurately interpreting high-resolution remote sensing data, offering both image-level and region-specific dialogue capabilities. The model is trained on a novel remote sensing multimodal instruction-following dataset and demonstrates strong zero-shot performance across tasks like image captioning and visual question answering.

Changing the landscape: A vision language model to revolutionize remote sensing

MBZUAI ·

MBZUAI, in partnership with IBM Research, is developing GeoChat+, a vision-language model (VLM) for multi-modal, temporal remote sensing image analysis. GeoChat+ builds on the previous GeoChat model, enhancing capabilities with multi-modal images from various Earth observation systems like Sentinel-1, Sentinel-2, Landsat, and high-resolution imagery. GeoChat+ will integrate data from multiple satellites at different times to detect environmental changes and analyze the impact on soil quality, air quality, and erosion. Why it matters: This advancement promises to revolutionize geographic data analysis, providing detailed reports for high-risk regions and aiding reforestation efforts.

Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models

arXiv ·

Video-ChatGPT is a new multimodal model that combines a video-adapted visual encoder with a large language model (LLM) to enable detailed video understanding and conversation. The authors introduce a new dataset of 100,000 video-instruction pairs for training the model. They also develop a quantitative evaluation framework for video-based dialogue models.

New multimodal model brings pixel-level precision to satellite imagery

MBZUAI ·

MBZUAI researchers have developed GeoPixel, a new multimodal model for pixel grounding in remote sensing images. GeoPixel associates individual pixels with object categories, enabling detailed image analysis by linking language to objects at the pixel level. The model was trained on a new dataset and benchmark, outperforming existing systems in precision. Why it matters: This advancement enhances the utility of remote sensing data for critical applications like environmental management and disaster response by providing more granular and accurate image interpretation.

Datacenters in the Desert: Feasibility and Sustainability of LLM Inference in the Middle East

arXiv ·

This paper analyzes the energy consumption and carbon footprint of LLM inference in the UAE compared to Iceland, Germany, and the USA. The study uses DeepSeek Coder 1.3B and the HumanEval dataset to evaluate code generation. It provides a comparative analysis of geographical trade-offs for climate-aware AI deployment, specifically addressing the challenges and potential of datacenters in desert regions.

KAUST Distinguished Professor Marc Genton awarded lectureship

KAUST ·

KAUST Professor Marc Genton has been selected as the 2020 Georges Matheron Lecturer of the International Association for Mathematical Geosciences. Genton will present a lecture at the 36th International Geological Congress in Delhi, India, focusing on geostatistics, climate model outputs, and the ExaGeoStat software developed at KAUST. His lecture will cover Matheron's theory of regionalized variables and showcase ExaGeoStat, a high-performance software for geostatistics with exascale computing capability developed at KAUST. Why it matters: This recognition highlights KAUST's contributions to advanced statistical methods and high-performance computing in geosciences, enhancing its international reputation in these fields.

A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos

arXiv ·

A new benchmark, LongShOTBench, is introduced for evaluating multimodal reasoning and tool use in long videos, featuring open-ended questions and diagnostic rubrics. The benchmark addresses the limitations of existing datasets by combining temporal length and multimodal richness, using human-validated samples. LongShOTAgent, an agentic system, is also presented for analyzing long videos, with both the benchmark and agent demonstrating the challenges faced by state-of-the-art MLLMs.