This paper addresses exploration in reinforcement learning (RL) in unknown environments with sparse rewards, focusing on maximum entropy exploration. It introduces a game-theoretic algorithm for visitation entropy maximization with improved sample complexity of O(H^3S^2A/ε^2). For trajectory entropy, the paper presents an algorithm with O(poly(S, A, H)/ε) complexity, showing the statistical advantage of regularized MDPs for exploration. Why it matters: The research offers new techniques to reduce the sample complexity of RL, potentially enhancing the efficiency of AI agents in complex environments.
MBZUAI researchers created Open CaptchaWorld, a new benchmark to test AI agents on solving CAPTCHAs. The benchmark includes 20 modern CAPTCHA types that require perception, reasoning, and interactive actions within a browser. While humans achieve 93.3% accuracy, the best AI agent only reaches 40% on the benchmark. Why it matters: This research highlights a critical gap in current AI agent capabilities, as CAPTCHAs are gatekeepers to high-value web actions like e-commerce and secure logins.
This paper introduces a decentralized multi-agent decision-making framework for search and action problems under time constraints, treating time as a budgeted resource where actions have costs and rewards. The approach uses probabilistic reasoning to optimize decisions, maximizing reward within the given time. Evaluated in a simulated search, pick, and place scenario inspired by the Mohamed Bin Zayed International Robotics Challenge (MBZIRC), the algorithm outperformed benchmark strategies. Why it matters: The framework's validation in a Gazebo environment signals potential for real-world robotic applications, particularly in time-sensitive and cooperative tasks within the robotics domain in the UAE.
A new benchmark, LongShOTBench, is introduced for evaluating multimodal reasoning and tool use in long videos, featuring open-ended questions and diagnostic rubrics. The benchmark addresses the limitations of existing datasets by combining temporal length and multimodal richness, using human-validated samples. LongShOTAgent, an agentic system, is also presented for analyzing long videos, with both the benchmark and agent demonstrating the challenges faced by state-of-the-art MLLMs.