Almieyar-Oryx-BloomBench: A Bilingual Multimodal Benchmark for Cognitively Informed Evaluation of Vision-Language Models

arXiv · June 4, 2026 · Significant research

Summary

Researchers have introduced BloomBench, a new cognitively human-grounded, bilingual (English-Arabic) multimodal benchmark for Vision-Language Models (VLMs), as part of the Almieyar benchmarking series. Grounded in Bloom's Taxonomy, it systematically evaluates six levels of cognition—Remember, Understand, Apply, Analyze, Evaluate, Create—through carefully designed image-question-answer tasks. A comprehensive study using BloomBench revealed that state-of-the-art VLMs exhibit strong semantic understanding but struggle significantly with factual recall and creative synthesis, alongside a critical performance gap between Arabic and English. Why it matters: This benchmark provides a crucial tool for diagnosing cognitive weaknesses in current VLMs and lays the groundwork for developing more cognitively aligned and inclusive multimodal AI, particularly for cross-lingual applications.

Keywords

BloomBench · Almieyar · Vision-Language Models (VLMs) · Cognitive evaluation · Bilingual benchmark

Read original article →

Get the weekly digest

Top AI stories from the GCC region, every week.