GCC AI Research

LLMs tackle math word problems

MBZUAI · Notable

Summary

MBZUAI researchers presented a study at NAACL 2024 analyzing the errors that open-source LLMs make when solving math word problems. The study, led by Ekaterina Kochmar and KV Aditya Srivatsa, investigates which characteristics make math word problems difficult for machines. The researchers tested Llama2-70B on these problems and found that LLMs can perform the math operations correctly yet still give the wrong answer. Why it matters: The research aims to improve AI's ability to understand and solve math word problems, potentially leading to better educational applications and teaching methods.

Related

Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR

arXiv

The authors propose a method that reduces the verbosity of LLMs in step-by-step reasoning by retaining moderately easy problems during Reinforcement Learning with Verifiable Rewards (RLVR) training. These easy samples act as an implicit length regularizer, preventing the model from excessively increasing its output length on harder problems. In experiments with Qwen3-4B-Thinking-2507, the model matches baseline accuracy with solutions nearly half as long.

VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos

arXiv

MBZUAI researchers introduce VideoMathQA, a new benchmark for evaluating mathematical reasoning in videos, requiring models to interpret visual information, text, and spoken cues. The dataset spans 10 mathematical domains with videos ranging from 10 seconds to over 1 hour, and includes multi-step reasoning annotations. The benchmark aims to evaluate temporal cross-modal reasoning and highlights the limitations of existing approaches in complex video-based mathematical problem solving.

Can LLMs reason? New benchmark puts models to the test

MBZUAI

MBZUAI researchers created a new benchmark dataset called TextGames to evaluate the reasoning abilities of LLMs. The dataset uses simple, text-based games requiring skills like pattern recognition and logical thinking. LLMs struggled with the hardest questions, suggesting limitations in their reasoning capabilities despite advancements in language understanding. Why it matters: This research highlights the need for specialized reasoning models and benchmarks that go beyond memorization to truly test AI's problem-solving abilities.

Solving complex problems with LLMs: A new prompting strategy presented at NeurIPS

MBZUAI

Researchers from MBZUAI and King's College London have developed a new prompting strategy called self-guided exploration to improve LLM performance on combinatorial problems. The method was tested on complex challenges like the traveling salesman problem. The findings will be presented at the 38th Annual Conference on Neural Information Processing Systems (NeurIPS) in Vancouver. Why it matters: This research could lead to practical applications of LLMs in industries like logistics, planning, and scheduling by offering new approaches to computationally complex problems.