GCC AI Research

Results for "pixel grounding"

New multimodal model brings pixel-level precision to satellite imagery

MBZUAI

MBZUAI researchers have developed GeoPixel, a new multimodal model for pixel grounding in remote sensing images. GeoPixel associates individual pixels with object categories, linking language to objects at the pixel level to enable detailed image analysis. The model was trained on a new dataset and evaluated on an accompanying benchmark, where it outperformed existing systems in precision. Why it matters: This advancement enhances the utility of remote sensing data for critical applications like environmental management and disaster response by providing more granular and accurate image interpretation.
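Pixel grounding's input/output contract can be illustrated with a toy sketch. Everything here (the function name, the brightness-threshold "model") is a hypothetical placeholder, not GeoPixel's actual method; the point is only the shape of the task: a phrase and an image in, a per-pixel boolean mask out.

```python
import numpy as np

def ground_phrase(image: np.ndarray, phrase: str) -> np.ndarray:
    """Toy stand-in for a pixel-grounding model: return a boolean mask
    marking which pixels belong to the object the phrase refers to.

    The "prediction" is faked by thresholding brightness, purely to
    demonstrate the image + phrase -> per-pixel mask contract."""
    gray = image.mean(axis=-1)   # collapse RGB channels to intensity
    return gray > gray.mean()    # placeholder "segmentation"

# A 4x4 RGB "satellite tile" with a bright 2x2 "building" in one corner.
tile = np.zeros((4, 4, 3))
tile[:2, :2] = 1.0
mask = ground_phrase(tile, "the building in the top-left")
print(int(mask.sum()))  # 4 pixels grounded to the phrase
```

A real pixel-grounding model replaces the threshold with a learned vision-language decoder, but the output format (one mask per referred object) is the same.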

PG-Video-LLaVA: Pixel Grounding Large Video-Language Models

arXiv

MBZUAI researchers introduce PG-Video-LLaVA, a large multimodal model with pixel-level grounding capabilities for videos, which integrates audio cues for enhanced understanding. The model uses an off-the-shelf tracker and a grounding module to localize objects in videos based on user prompts. PG-Video-LLaVA is evaluated on video question-answering and grounding benchmarks, with Vicuna used in place of GPT-3.5 as the evaluator for reproducibility.
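The detect-then-track pipeline the summary describes can be sketched as follows. The `detect` and `track` stubs are hypothetical stand-ins for the grounding module and the off-the-shelf tracker, not the paper's actual components; the sketch only shows how a box found in one frame is propagated through the rest.

```python
from dataclasses import dataclass

@dataclass
class Box:
    x: int
    y: int
    w: int
    h: int

def detect(frame, phrase: str) -> Box:
    # Hypothetical grounding module: map a phrase to a box in one frame.
    return Box(10, 10, 20, 20)

def track(prev: Box, frame) -> Box:
    # Hypothetical off-the-shelf tracker: propagate the box to the next frame.
    return Box(prev.x + 1, prev.y, prev.w, prev.h)

def ground_in_video(frames, phrase: str) -> list:
    """Ground the phrase in the first frame, then track it through the rest."""
    boxes = [detect(frames[0], phrase)]
    for frame in frames[1:]:
        boxes.append(track(boxes[-1], frame))
    return boxes

boxes = ground_in_video(frames=[None] * 3, phrase="the red car")
print([b.x for b in boxes])  # [10, 11, 12]
```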

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

arXiv

The paper introduces the Prism Hypothesis, which posits a correspondence between an encoder's feature spectrum and its functional role, with semantic encoders capturing low-frequency components and pixel encoders retaining high-frequency information. Based on this, the authors propose Unified Autoencoding (UAE), a model that harmonizes semantic structure and pixel details using a frequency-band modulator. Experiments on ImageNet and MS-COCO demonstrate that UAE effectively unifies semantic abstraction and pixel-level fidelity, achieving state-of-the-art performance.
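The low-frequency vs. high-frequency split at the heart of the hypothesis can be made concrete with a minimal FFT band split (a generic signal-processing sketch, not UAE's actual frequency-band modulator): the low band loosely mirrors what a semantic encoder keeps, the high band the pixel detail, and together they reconstruct the original signal exactly.

```python
import numpy as np

def band_split(x: np.ndarray, cutoff: int):
    """Split a 2D signal into low- and high-frequency components.

    Frequencies inside a centered disk of the given radius go to the
    low band; everything else goes to the high band."""
    F = np.fft.fftshift(np.fft.fft2(x))
    h, w = x.shape
    yy, xx = np.ogrid[:h, :w]
    low_mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= cutoff ** 2
    low = np.fft.ifft2(np.fft.ifftshift(F * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(F * ~low_mask)).real
    return low, high

rng = np.random.default_rng(0)
img = rng.random((32, 32))
low, high = band_split(img, cutoff=4)
print(np.allclose(low + high, img))  # True: the two bands sum to the signal
```

Because the two masks partition the spectrum, the bands are complementary by construction, which is the property a unified autoencoder needs if semantic and pixel branches are to be recombined without loss.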

MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis

arXiv

The paper introduces MedPromptX, a clinical decision support system using multimodal large language models (MLLMs), few-shot prompting (FP), and visual grounding (VG) for chest X-ray diagnosis, integrating imagery with EHR data. MedPromptX refines few-shot data dynamically for real-time adjustment to new patient scenarios and narrows the search area in X-ray images. The study introduces MedPromptX-VQA, a new visual question answering dataset, and demonstrates state-of-the-art performance with an 11% improvement in F1-score compared to baselines.
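The reported gain is in F1-score, the harmonic mean of precision and recall; a minimal computation (the example numbers are illustrative, not from the paper):

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 = harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# A diagnosis system with 80% precision and 60% recall:
print(round(f1_score(0.8, 0.6), 3))  # 0.686
```

Because the harmonic mean is dominated by the smaller of the two values, an 11% F1 improvement implies the system raised precision and recall jointly rather than trading one for the other.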

Enabling Practical and Rich User Digitization

MBZUAI

A long-standing vision in computer science is for computing devices to become proactive assistants that enhance many aspects of life through user digitization. Today's devices capture only coarse digital representations of their users, leaving significant room for improvement. Karan, a Ph.D. candidate at CMU, develops technologies that let consumer devices capture richer user representations without sacrificing practicality. Why it matters: Advancements in user digitization can lead to improved extended reality experiences, health tracking, and more productive work environments, enhancing the utility of consumer devices.

Blurring the lines between the physical and digital

MBZUAI

A panel discussion at Manarat Al Saadiyat, featuring MBZUAI's Elizabeth Churchill, explored the evolving relationship between the physical and digital worlds. The panel, titled 'Body as medium: InterFACES: Skin/Screen,' addressed how hyper-connectivity and digital amplification alter our understanding of the human body and its limits. Churchill highlighted the profound shift occurring as we navigate the era of AI and its implications for human beings. Why it matters: The discussion underscores the increasing importance of understanding the ethical, social, and existential questions arising from the intersection of AI and human identity in the digital age.

Advance Simulation Method for Wheel-Terrain Interactions of Space Rovers: A Case Study on the UAE Rashid Rover

arXiv

This paper introduces a virtual wheel-terrain interaction model developed and validated for the UAE Rashid rover to enhance simulation accuracy for space rovers. The model incorporates wheel grouser properties, slippage, soil properties, and interaction mechanics, validated via lunar soil simulation. Experiments tested a Grouser-Rashid rover wheel at slip ratios of 0, 0.25, 0.50, and 0.75. Why it matters: This simulation method advances rover design and control, crucial for the UAE's space exploration program and lunar mission success.
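The slip ratios tested (0, 0.25, 0.50, 0.75) follow the standard definition for a driven wheel, s = (rω − v) / (rω). A minimal sketch of that formula (the parameter values are illustrative, not from the Rashid rover experiments):

```python
def slip_ratio(wheel_radius: float, angular_velocity: float,
               linear_velocity: float) -> float:
    """Slip ratio for a driven wheel: s = (r*omega - v) / (r*omega).

    s = 0 means pure rolling (the wheel covers exactly the distance it
    spins); s -> 1 means the wheel spins in place without advancing."""
    circumferential = wheel_radius * angular_velocity  # r * omega
    return (circumferential - linear_velocity) / circumferential

# A 0.1 m wheel spinning at 10 rad/s while the rover advances at 0.5 m/s:
print(slip_ratio(0.1, 10.0, 0.5))  # 0.5
```

In loose lunar regolith simulant, higher slip ratios mean more of the wheel's rotation is lost to soil shearing, which is why the interaction model is validated across this range.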