This paper addresses exploration in reinforcement learning (RL) in unknown environments with sparse rewards, focusing on maximum entropy exploration. It introduces a game-theoretic algorithm for visitation entropy maximization with improved sample complexity of O(H^3S^2A/ε^2). For trajectory entropy, the paper presents an algorithm with O(poly(S, A, H)/ε) complexity, showing the statistical advantage of regularized MDPs for exploration. Why it matters: The research offers new techniques to reduce the sample complexity of RL, potentially enhancing the efficiency of AI agents in complex environments.
The paper introduces a novel actor-critic framework called Distillation Policy Optimization that combines on-policy and off-policy data for reinforcement learning. It incorporates variance reduction mechanisms like a unified advantage estimator (UAE) and a residual baseline. The empirical results demonstrate improved sample efficiency for on-policy algorithms, bridging the gap with off-policy methods.
A Marie Curie Fellow from Inria and UIUC presented research on stochastic gradient descent (SGD) through the lens of Markov processes, exploring the relationships between heavy-tailed distributions, generalization error, and algorithmic stability. The research challenges existing theories about the monotonic relationship between heavy tails and generalization error. It introduces a unified approach for proving Wasserstein stability bounds in stochastic optimization, applicable to convex and non-convex losses. Why it matters: The work provides novel insights into the theoretical underpinnings of stochastic optimization, relevant to researchers at MBZUAI and other institutions in the region working on machine learning algorithms.
Mladen Kolar from the University of Chicago Booth School of Business discussed stochastic optimization with equality constraints at MBZUAI. He presented a stochastic algorithm based on sequential quadratic programming (SQP) using a differentiable exact augmented Lagrangian. The algorithm adapts random stepsizes using a stochastic line search procedure, establishing global "almost sure" convergence. Why it matters: The presentation highlights MBZUAI's role in hosting discussions on advanced optimization techniques, fostering research and knowledge exchange in the field of machine learning.
This paper introduces a decentralized multi-agent decision-making framework for search and action problems under time constraints, treating time as a budgeted resource where actions have costs and rewards. The approach uses probabilistic reasoning to optimize decisions, maximizing reward within the given time. Evaluated in a simulated search, pick, and place scenario inspired by the Mohamed Bin Zayed International Robotics Challenge (MBZIRC), the algorithm outperformed benchmark strategies. Why it matters: The framework's validation in a Gazebo environment signals potential for real-world robotic applications, particularly in time-sensitive and cooperative tasks within the robotics domain in the UAE.
Keith Ross, Dean of Computer Science, Data Science and Engineering at NYU Shanghai, will be giving a talk on recent advances in Deep Reinforcement Learning (DRL). The talk will review DRL breakthroughs and discuss algorithmic research on DRL for high-dimensional state and action spaces, with applications to robotic locomotion. Ross's research interests include deep reinforcement learning, Internet privacy, peer-to-peer networking, and computer network modeling. Why it matters: Reinforcement learning is a core area of AI research in the GCC region, and a talk by a prominent researcher can help inform and inspire local researchers.