GCC AI Research

Results for "vision-language models"

A new approach to improve vision-language models

MBZUAI

MBZUAI researchers have developed a new approach to improve how well vision-language models generalize to out-of-distribution data. The study, led by Sheng Zhang with several MBZUAI professors and researchers, addresses the need for AI applications to cope with unforeseen circumstances. The method aims to improve how these models, which combine natural language processing and computer vision, handle inputs unlike those seen during training. Why it matters: Improving the adaptability of vision-language models is critical for real-world AI applications such as autonomous driving and medical imaging, especially in diverse and changing environments.

Almieyar-Oryx-BloomBench: A Bilingual Multimodal Benchmark for Cognitively Informed Evaluation of Vision-Language Models

arXiv

Researchers have introduced BloomBench, a new cognitively grounded, bilingual (English-Arabic) multimodal benchmark for Vision-Language Models (VLMs), as part of the Almieyar benchmarking series. Grounded in Bloom's Taxonomy, it systematically evaluates six levels of cognition (Remember, Understand, Apply, Analyze, Evaluate, Create) through carefully designed image-question-answer tasks. A comprehensive study using BloomBench found that state-of-the-art VLMs exhibit strong semantic understanding but struggle significantly with factual recall and creative synthesis, and that they show a critical performance gap between Arabic and English. Why it matters: This benchmark provides a tool for diagnosing cognitive weaknesses in current VLMs and lays the groundwork for more cognitively aligned and inclusive multimodal AI, particularly for cross-lingual applications.