GCC AI Research

Almieyar-Oryx-BloomBench: A Bilingual Multimodal Benchmark for Cognitively Informed Evaluation of Vision-Language Models

arXiv · Significant research

Summary

Researchers have introduced BloomBench, a cognitively grounded, bilingual (English-Arabic) multimodal benchmark for Vision-Language Models (VLMs), released as part of the Almieyar benchmarking series. Built on Bloom's Taxonomy, it systematically evaluates six levels of cognition (Remember, Understand, Apply, Analyze, Evaluate, Create) through carefully designed image-question-answer tasks. A comprehensive study using BloomBench found that state-of-the-art VLMs exhibit strong semantic understanding but struggle significantly with factual recall and creative synthesis, and that they show a critical performance gap between Arabic and English.

Why it matters: This benchmark provides a tool for diagnosing cognitive weaknesses in current VLMs and lays the groundwork for developing more cognitively aligned and inclusive multimodal AI, particularly for cross-lingual applications.
