VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos

arXiv · June 5, 2025 · Significant research

Summary

MBZUAI researchers introduce VideoMathQA, a new benchmark for evaluating mathematical reasoning in videos, requiring models to interpret visual information, text, and spoken cues. The dataset spans 10 mathematical domains with videos ranging from 10 seconds to over 1 hour, and includes multi-step reasoning annotations. The benchmark aims to evaluate temporal cross-modal reasoning and highlights the limitations of existing approaches in complex video-based mathematical problem solving.

Keywords

VideoMathQA · benchmark · mathematical reasoning · multimodal · cross-modal reasoning

A Culturally-diverse Multilingual Multimodal Video Benchmark & Model

arXiv · Jun 8

A new benchmark, ViMUL-Bench, is introduced to evaluate video LLMs across 14 languages, including Arabic, with a focus on cultural inclusivity. The benchmark includes 8k manually verified samples across 15 categories and varying video durations. A multilingual video LLM, ViMUL, is also presented, along with a training set of 1.2 million samples, with both to be publicly released.
