MBZUAI researchers have developed GeoPixel, a new multimodal model for pixel grounding in remote sensing images. GeoPixel associates individual pixels with object categories, enabling detailed image analysis by linking language to objects at the pixel level. The model was trained on a new dataset and benchmark, outperforming existing systems in precision. Why it matters: This advancement enhances the utility of remote sensing data for critical applications like environmental management and disaster response by providing more granular and accurate image interpretation.
This paper introduces a novel two-step method for predicting urban expansion from time-series satellite imagery. The approach first applies semantic image segmentation, then uses a CNN-LSTM model to learn temporal features. Experiments on satellite images of Riyadh, Jeddah, and Dammam in Saudi Arabia show improved performance over existing methods, as measured by Mean Squared Error, Root Mean Squared Error, Peak Signal-to-Noise Ratio, Structural Similarity Index, and overall classification accuracy.
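For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of how three of them (MSE, RMSE, PSNR) and overall accuracy are commonly computed. It is not the paper's evaluation code: the function names, the nested-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math

def mse(pred, target):
    """Mean squared error between two equally sized 2D images (lists of rows)."""
    total = 0.0
    count = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            total += (p - t) ** 2
            count += 1
    return total / count

def rmse(pred, target):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(pred, target))

def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB; max_value=255 assumes 8-bit imagery."""
    err = mse(pred, target)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)

def overall_accuracy(pred_labels, true_labels):
    """Fraction of pixels whose predicted class equals the reference class."""
    pairs = [
        (p, t)
        for row_p, row_t in zip(pred_labels, true_labels)
        for p, t in zip(row_p, row_t)
    ]
    return sum(1 for p, t in pairs if p == t) / len(pairs)

# Toy 2x2 example (hypothetical values, not data from the paper).
predicted = [[0, 255], [255, 0]]
reference = [[0, 255], [255, 10]]
print(mse(predicted, reference), rmse(predicted, reference))
```

Lower MSE and RMSE indicate a closer match between predicted and reference images, while higher PSNR and accuracy indicate a better one.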
Researchers at MBZUAI have developed GeoChat, a new vision-language model (VLM) specifically designed for remote sensing imagery. GeoChat addresses the limitations of general-domain VLMs in accurately interpreting high-resolution remote sensing data, offering both image-level and region-specific dialogue capabilities. The model is trained on a novel remote sensing multimodal instruction-following dataset and demonstrates strong zero-shot performance across tasks like image captioning and visual question answering.
MBZUAI, in partnership with IBM Research, is developing GeoChat+, a vision-language model (VLM) for multi-modal, temporal remote sensing image analysis. GeoChat+ builds on the previous GeoChat model, enhancing capabilities with multi-modal images from various Earth observation systems like Sentinel-1, Sentinel-2, Landsat, and high-resolution imagery. GeoChat+ will integrate data from multiple satellites at different times to detect environmental changes and analyze the impact on soil quality, air quality, and erosion. Why it matters: This advancement promises to revolutionize geographic data analysis, providing detailed reports for high-risk regions and aiding reforestation efforts.
Researchers at MBZUAI, IBM Research, and other institutions have developed EarthDial, a new vision-language model (VLM) specifically designed to process geospatial data from remote sensing technologies. EarthDial handles data in multiple modalities and resolutions, processing images captured at different times to observe environmental changes. The model outperformed others on over 40 tasks including image classification, object detection, and change detection. Why it matters: This unified model bridges the gap between generic VLMs and domain-specific models, enabling complex geospatial data analysis for applications like disaster assessment and climate monitoring in the region.
This article discusses the evolution of mobile extended reality (MEX) and its potential to revolutionize urban interaction. It highlights the convergence of augmented and virtual reality technologies for mobile usage. A novel approach to 3D models, characterized as urban situated models or “3D-plus-time” (4D.City), is introduced. Why it matters: The development of MEX and 4D.City could significantly enhance user experience and analog-digital convergence in urban environments, offering new possibilities for human-computer interaction.
This paper introduces a novel approach for monitoring and analyzing the evolution of complex geographic objects in satellite image time-series. The method uses a spatiotemporal graph and constraint satisfaction problems (CSP) to model and analyze object changes. Experiments on real-world satellite images from Saudi Arabian cities demonstrate the effectiveness of the proposed approach.
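As a rough illustration of the idea, the sketch below builds a tiny spatiotemporal graph in which nodes are labeled regions at a given time step and temporal edges link regions that overlap in consecutive images. It then filters the edges with a single hand-written constraint ("same label, larger area"). This is only a simplified stand-in under stated assumptions: the `Region` type, the bounding-box representation, and the expansion constraint are hypothetical, and the filter is not a full constraint satisfaction solver or the paper's method.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    t: int        # index of the image in the time series
    label: str    # object class, e.g. "urban" (hypothetical labels)
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels

def area(box):
    """Area of an axis-aligned box; zero if the box is empty."""
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])

def overlap(a, b):
    """Area of the intersection of two boxes."""
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return area(inter)

def temporal_edges(regions):
    """Link regions in consecutive time steps whose boxes intersect."""
    return [
        (a, b)
        for a in regions
        for b in regions
        if b.t == a.t + 1 and overlap(a.box, b.box) > 0
    ]

def expanding(edges):
    """Keep edges satisfying the constraint: same label and larger area later."""
    return [
        (a, b) for a, b in edges
        if a.label == b.label and area(b.box) > area(a.box)
    ]

# Toy example with made-up regions.
r0 = Region(0, "urban", (0, 0, 2, 2))
r1 = Region(1, "urban", (0, 0, 3, 3))
r2 = Region(1, "water", (10, 10, 12, 12))
edges = temporal_edges([r0, r1, r2])
print(expanding(edges))
```

In a fuller treatment, several such constraints would be stated declaratively and handed to a CSP solver, rather than hard-coded as a single filter.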
Researchers in Saudi Arabia have developed a deep learning framework for automated counting and geolocation of palm trees in aerial images. The system uses a Faster R-CNN model trained on a dataset of 10,000 palm tree instances collected with DJI drones in the Kharj region. Using geotagged metadata and photogrammetry techniques, it achieved a geolocation accuracy of 2.8 m.
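To give a sense of how a detection in a geotagged drone image can be turned into a coordinate, here is a minimal sketch. It assumes a nadir (straight-down), north-up image, flat terrain, a known ground sampling distance, and a simple metres-per-degree approximation. The function name, its parameters, and the constant are illustrative assumptions and do not reproduce the photogrammetric pipeline used by the researchers.

```python
import math

# Approximate metres per degree of latitude (assumption: spherical Earth).
METERS_PER_DEG_LAT = 111_320.0

def pixel_to_latlon(center_lat, center_lon, width, height, col, row, gsd_m):
    """Estimate the latitude/longitude of pixel (col, row).

    center_lat, center_lon: geotag of the image centre, in degrees.
    width, height: image size in pixels.
    gsd_m: ground sampling distance in metres per pixel.
    Assumes a north-up nadir image, with row 0 at the top (north) edge.
    """
    east_m = (col - width / 2) * gsd_m
    north_m = (height / 2 - row) * gsd_m
    lat = center_lat + north_m / METERS_PER_DEG_LAT
    lon = center_lon + east_m / (
        METERS_PER_DEG_LAT * math.cos(math.radians(center_lat))
    )
    return lat, lon

# Hypothetical detection 100 pixels above the centre of a 4000x3000 image
# with a 5 cm ground sampling distance (about 5 m north of the geotag).
lat, lon = pixel_to_latlon(24.0, 47.0, 4000, 3000, 2000, 1400, 0.05)
print(lat, lon)
```

A real pipeline would also need to correct for camera tilt, lens distortion, heading, and terrain, all of which this sketch ignores.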