GCC AI Research

Results for "maximum entropy exploration"

Fast Rates for Maximum Entropy Exploration

MBZUAI ·

This paper addresses exploration in reinforcement learning (RL) in unknown environments with sparse rewards, focusing on maximum entropy exploration. It introduces a game-theoretic algorithm for visitation entropy maximization with improved sample complexity of O(H^3S^2A/ε^2). For trajectory entropy, the paper presents an algorithm with O(poly(S, A, H)/ε) complexity, showing the statistical advantage of regularized MDPs for exploration. Why it matters: The research offers new techniques to reduce the sample complexity of RL, potentially enhancing the efficiency of AI agents in complex environments.
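As a rough illustration of why the ε dependence in these two rates matters, the sketch below evaluates only the leading term H^3S^2A/ε^2 quoted above, dropping constants and logarithmic factors. The function names and the values of H, S, and A are hypothetical, chosen for illustration; the polynomial in the trajectory-entropy bound is not specified in the summary, so only its 1/ε scaling is shown.

```python
def visitation_bound(H, S, A, eps):
    """Leading term H^3 * S^2 * A / eps^2 (constants and log factors dropped)."""
    return H**3 * S**2 * A / eps**2


def growth_when_eps_shrinks(exponent, factor):
    """Growth of a bound proportional to eps^(-exponent) when eps is divided by `factor`."""
    return factor ** exponent


if __name__ == "__main__":
    # Hypothetical problem size, not taken from the paper.
    H, S, A = 10, 20, 4
    for eps in (0.2, 0.1, 0.05):
        print(f"eps={eps}: visitation leading term = {visitation_bound(H, S, A, eps):.3g}")

    # Halving eps: a 1/eps^2 rate grows 4x, a 1/eps rate grows 2x.
    print("1/eps^2 growth:", growth_when_eps_shrinks(2, 2))
    print("1/eps growth:  ", growth_when_eps_shrinks(1, 2))
```

Under these assumptions, each halving of the target accuracy ε quadruples the visitation-entropy leading term, while a rate proportional to 1/ε only doubles.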

Diffusion-BBO: Diffusion-Based Inverse Modeling for Online Black-Box Optimization

arXiv ·

This paper introduces Diffusion-BBO, a new online black-box optimization (BBO) framework that uses a conditional diffusion model as an inverse surrogate model. The framework employs an Uncertainty-aware Exploration (UaE) acquisition function to propose scores in the objective space for conditional sampling. The approach is shown theoretically to achieve a near-optimal solution and empirically outperforms existing online BBO baselines across 6 scientific discovery tasks.

Discrete and Continuous Submodular Bandits with Full Bandit Feedback

MBZUAI ·

Vaneet Aggarwal from Purdue University presented new research on discrete and continuous submodular bandits with full bandit feedback. The research introduces a framework transforming discrete offline approximation algorithms into sublinear α-regret methods using bandit feedback. Additionally, it introduces a unified approach for maximizing continuous DR-submodular functions, accommodating various settings and oracle access types. Why it matters: This research provides new methods for optimization under uncertainty, which is crucial for real-world AI applications in the region, such as resource allocation and automated decision-making.

Distillation Policy Optimization

arXiv ·

The paper introduces a novel actor-critic framework called Distillation Policy Optimization that combines on-policy and off-policy data for reinforcement learning. It incorporates variance reduction mechanisms like a unified advantage estimator (UAE) and a residual baseline. The empirical results demonstrate improved sample efficiency for on-policy algorithms, bridging the gap with off-policy methods.

Multi-agent Time-based Decision-making for the Search and Action Problem

arXiv ·

This paper introduces a decentralized multi-agent decision-making framework for search and action problems under time constraints, treating time as a budgeted resource where actions have costs and rewards. The approach uses probabilistic reasoning to optimize decisions, maximizing reward within the given time. Evaluated in a simulated search, pick, and place scenario inspired by the Mohamed Bin Zayed International Robotics Challenge (MBZIRC), the algorithm outperformed benchmark strategies. Why it matters: The framework's validation in a Gazebo environment signals potential for real-world robotic applications, particularly in time-sensitive and cooperative tasks within the robotics domain in the UAE.

Gaussian Variational Inference in high dimension

MBZUAI ·

This article discusses approximating a high-dimensional distribution with a Gaussian one by minimizing the Kullback-Leibler divergence (Gaussian variational inference). It builds upon previous research and shows that the minimizer is well approximated by a Gaussian distribution with specific mean and variance. The study characterizes the approximation accuracy and the range of applicability in terms of efficient dimension, and the results are relevant for analyzing sampling schemes in optimization. Why it matters: This theoretical research can inform the development of more efficient and accurate AI algorithms, particularly in areas dealing with high-dimensional data such as machine learning and data analysis.

Towards Robust Multimodal Open-set Test-time Adaptation via Adaptive Entropy-aware Optimization

arXiv ·

This paper introduces Adaptive Entropy-aware Optimization (AEO), a new framework to tackle Multimodal Open-set Test-time Adaptation (MM-OSTTA). AEO uses Unknown-aware Adaptive Entropy Optimization (UAE) and Adaptive Modality Prediction Discrepancy Optimization (AMP) to distinguish unknown class samples during online adaptation by amplifying the entropy difference between known and unknown samples. The study establishes a new benchmark derived from existing datasets with five modalities and evaluates AEO's performance across various domain shift scenarios, demonstrating its effectiveness in long-term and continual MM-OSTTA settings.

Bayesian Optimization-based Tire Parameter and Uncertainty Estimation for Real-World Data

arXiv ·

This paper introduces a Bayesian optimization method for estimating tire parameters and their uncertainty, addressing a gap in existing literature. The methodology uses Stochastic Variational Inference to estimate parameters and uncertainties, and it is validated against a Nelder-Mead algorithm. The approach is applied to real-world data from the Abu Dhabi Autonomous Racing League, revealing uncertainties in identifying curvature and shape parameters due to insufficient excitation. Why it matters: The research provides a practical tool for assessing tire model parameters in real-world conditions, with implications for autonomous racing and vehicle dynamics modeling in the GCC region.