GCC AI Research

Archive Monthly

November 2023

7 articles

Top Stories

UAE: Universal Anatomical Embedding on Multi-modality Medical Images

arXiv · CV Healthcare

Researchers propose a universal anatomical embedding (UAE) framework for medical image analysis that learns appearance, semantic, and cross-modality anatomical embeddings. UAE combines semantic embedding learning with a prototypical contrastive loss, a fixed-point-based matching strategy, and an iterative approach for cross-modality embedding learning. The framework was evaluated on landmark detection, lesion tracking, and CT-MRI registration tasks, outperforming existing state-of-the-art methods.
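The prototypical contrastive loss mentioned here can be understood as a cross-entropy over similarities between embeddings and per-class prototypes. The sketch below is a generic illustration of that idea, not the authors' implementation; the temperature value, cosine normalization, and function signature are all assumptions.

```python
import numpy as np

def prototypical_contrastive_loss(embeddings, prototypes, labels, tau=0.1):
    """Cross-entropy over cosine similarities to class prototypes (illustrative sketch).

    embeddings: (N, D) patch/voxel embeddings
    prototypes: (K, D) one prototype per anatomical class
    labels:     (N,)  index of the correct prototype for each embedding
    tau:        temperature scaling the similarity logits (assumed value)
    """
    # L2-normalize so the dot product is cosine similarity
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = z @ p.T / tau                       # (N, K) similarity logits
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-likelihood of the correct prototype, averaged over the batch
    return -log_probs[np.arange(len(labels)), labels].mean()
```

Pulling each embedding toward its own anatomical prototype and away from the others is what lets nearby structures with similar appearance receive distinct, semantically meaningful embeddings.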

GeoChat: Grounded Large Vision-Language Model for Remote Sensing

arXiv · CV LLM

Researchers at MBZUAI have developed GeoChat, a new vision-language model (VLM) specifically designed for remote sensing imagery. GeoChat addresses the limitations of general-domain VLMs in accurately interpreting high-resolution remote sensing data, offering both image-level and region-specific dialogue capabilities. The model is trained on a novel remote sensing multimodal instruction-following dataset and demonstrates strong zero-shot performance across tasks like image captioning and visual question answering.

PG-Video-LLaVA: Pixel Grounding Large Video-Language Models

arXiv · LLM CV

MBZUAI researchers introduce PG-Video-LLaVA, a large multimodal model with pixel-level grounding capabilities for videos that also integrates audio cues for richer understanding. The model uses an off-the-shelf tracker and a grounding module to localize objects in videos based on user prompts. PG-Video-LLaVA is evaluated on video question-answering and grounding benchmarks, with Vicuna used in place of GPT-3.5 for evaluation to ensure reproducibility.