Discrete and Continuous Submodular Bandits with Full Bandit Feedback

MBZUAI · Notable

RL Research Optimization KAUST Bandit Algorithms

Summary

Vaneet Aggarwal of Purdue University presented research on discrete and continuous submodular bandits with full bandit feedback. For the discrete case, the work introduces a framework that converts offline approximation algorithms into online methods with sublinear α-regret using only bandit feedback. For the continuous case, it introduces a unified approach to maximizing continuous DR-submodular functions that accommodates a range of settings and oracle access types. Why it matters: This research provides new methods for optimization under uncertainty, which underpins real-world AI applications in the region such as resource allocation and automated decision-making.
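To make the discrete setting concrete: under full bandit feedback the learner sees only a noisy reward for the whole set it plays, and α-regret measures the shortfall against α times the best achievable value, where α is the approximation ratio of the offline algorithm. The Python sketch below shows one simple way an offline algorithm could be run on such feedback, by playing each candidate set several times and averaging. It is an illustration only, not the framework from the talk: the choice of a greedy offline algorithm, the explore-then-commit structure, the toy coverage reward, and the names etc_greedy, make_coverage_bandit, play and m are all assumptions.

```python
import random


def etc_greedy(play, n_items, k, m):
    """Greedily build a size-k set from noisy rewards of played sets only.

    play(S) returns one noisy reward for playing the set S (assumed interface).
    m is the number of plays averaged per candidate set (assumed parameter).
    """
    chosen = []
    rounds_used = 0
    for _ in range(k):
        best_item, best_mean = None, float("-inf")
        for item in range(n_items):
            if item in chosen:
                continue
            candidate = frozenset(chosen + [item])
            # Estimate the value of the candidate set by repeated plays.
            mean = sum(play(candidate) for _ in range(m)) / m
            rounds_used += m
            if mean > best_mean:
                best_item, best_mean = item, mean
        chosen.append(best_item)
    return chosen, rounds_used


def make_coverage_bandit(covers, noise, rng):
    """Toy environment: reward is the number of covered elements plus Gaussian noise."""
    def play(S):
        covered = set().union(*(covers[i] for i in S)) if S else set()
        return len(covered) + rng.gauss(0.0, noise)
    return play


# Hypothetical toy instance: four items, each covering a few elements.
demo_covers = {0: {1, 2, 3, 4}, 1: {4, 5, 6}, 2: {7}, 3: {1, 2}}
demo_play = make_coverage_bandit(demo_covers, noise=0.1, rng=random.Random(0))
demo_set, demo_rounds = etc_greedy(demo_play, n_items=4, k=2, m=20)
print(demo_set, demo_rounds)
```

After the exploration rounds counted in rounds_used, a learner of this kind would commit to the selected set for the rest of the horizon. A larger m gives more reliable estimates but spends more rounds exploring, and that trade-off is what determines the regret rate.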

Keywords

submodular bandits · bandit feedback · offline algorithms · DR-submodular functions · optimization

Diffusion-BBO: Diffusion-Based Inverse Modeling for Online Black-Box Optimization

arXiv · Jun 30

This paper introduces Diffusion-BBO, a new online black-box optimization (BBO) framework that uses a conditional diffusion model as an inverse surrogate model. The framework employs an Uncertainty-aware Exploration (UaE) acquisition function to propose scores in the objective space for conditional sampling. The approach is shown theoretically to achieve a near-optimal solution and empirically outperforms existing online BBO baselines across 6 scientific discovery tasks.

Fast Rates for Maximum Entropy Exploration

MBZUAI

This paper addresses exploration in reinforcement learning (RL) in unknown environments with sparse rewards, focusing on maximum entropy exploration. It introduces a game-theoretic algorithm for visitation entropy maximization with improved sample complexity of O(H^3 S^2 A / ε^2). For trajectory entropy, the paper presents an algorithm with O(poly(S, A, H) / ε) complexity, showing the statistical advantage of regularized MDPs for exploration. Why it matters: The research offers new techniques to reduce the sample complexity of RL, potentially enhancing the efficiency of AI agents in complex environments.

Distillation Policy Optimization

arXiv · Feb 1

The paper introduces a novel actor-critic framework called Distillation Policy Optimization that combines on-policy and off-policy data for reinforcement learning. It incorporates variance reduction mechanisms like a unified advantage estimator (UAE) and a residual baseline. The empirical results demonstrate improved sample efficiency for on-policy algorithms, bridging the gap with off-policy methods.

A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation

arXiv · Jul 26

This paper introduces a unified deep autoregressive model (UAE) for cardinality estimation that learns joint data distributions from both data and query workloads. It uses differentiable progressive sampling with the Gumbel-Softmax trick to incorporate supervised query information into the deep autoregressive model. Experiments show UAE achieves better accuracy and efficiency compared to state-of-the-art methods.
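For readers unfamiliar with the Gumbel-Softmax trick mentioned above, the Python sketch below draws one relaxed sample from a categorical distribution: Gumbel noise is added to each logit and a temperature-scaled softmax is applied. This shows only the generic trick, not the paper's sampling procedure. The function name gumbel_softmax_sample and its parameters are assumptions made for illustration, and the sketch does not show gradients.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Return one relaxed (soft) one-hot sample for the given logits."""
    perturbed = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # keep u away from 0 to avoid log(0)
        gumbel = -math.log(-math.log(u))  # standard Gumbel(0, 1) noise
        perturbed.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed logits.
    top = max(perturbed)
    exps = [math.exp(p - top) for p in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


sample = gumbel_softmax_sample([1.0, 2.0, 3.0], temperature=0.5, rng=random.Random(0))
print(sample)
```

Lower temperatures push the output toward a one-hot vector, while higher temperatures make it smoother. Because the output is a smooth function of the logits, an autodiff framework can pass gradients through the sampling step.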