GCC AI Research



MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning

arXiv

Researchers introduce MATRIX, a vision-centric agent tuning framework for robust tool-use reasoning in vision-language models (VLMs). The framework includes M-TRACE, a dataset of 28.5K multimodal tasks with 177K verified trajectories, and Pref-X, a set of 11K automatically generated preference pairs. In experiments, MATRIX consistently outperforms both open- and closed-source VLMs across three benchmarks.