The paper introduces a novel actor-critic framework called Distillation Policy Optimization that combines on-policy and off-policy data for reinforcement learning. It incorporates variance-reduction mechanisms such as a unified advantage estimator (UAE) and a residual baseline. Empirically, the framework improves the sample efficiency of on-policy algorithms, bridging the gap with off-policy methods.
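The summary does not describe how the UAE is computed. For orientation only, the sketch below shows a standard baseline-subtracted advantage estimator, generalized advantage estimation (GAE). This is a different, well-known technique, not the paper's UAE or residual baseline, and the function name `gae` and its arguments are illustrative assumptions.

```python
from typing import List


def gae(rewards: List[float], values: List[float], gamma: float = 0.99,
        lam: float = 0.95) -> List[float]:
    """Generalized advantage estimation for one episode (illustrative sketch).

    `values` holds V(s_0), ..., V(s_T): one more entry than `rewards`, where the
    last entry is the bootstrap value (use 0.0 for a terminal state).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards in time: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(rewards))):
        # TD residual: reward plus discounted next value, minus the baseline V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# With gamma = lam = 1 and a zero baseline, advantages reduce to reward-to-go sums.
print(gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], gamma=1.0, lam=1.0))
# → [3.0, 2.0, 1.0]
```

Subtracting a state-dependent baseline V(s_t) leaves the expected policy gradient unchanged while lowering its variance; the parameter λ trades bias for variance.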
This paper addresses exploration in reinforcement learning (RL) in unknown environments with sparse rewards, focusing on maximum entropy exploration. For visitation entropy maximization, it introduces a game-theoretic algorithm with an improved sample complexity of O(H^3S^2A/ε^2), where H is the horizon, S the number of states, A the number of actions, and ε the target accuracy. For trajectory entropy, it presents an algorithm with O(poly(S, A, H)/ε) sample complexity, demonstrating the statistical advantage of regularized MDPs for exploration. Why it matters: The research offers new techniques for reducing the sample complexity of RL, potentially making AI agents more efficient in complex environments.
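To make the exploration objective concrete: maximum entropy exploration seeks a policy whose visitation distribution is as spread out as possible. The sketch below is a simplification, not the paper's algorithm. It computes the Shannon entropy of an empirical state-visitation distribution pooled over all steps of sampled trajectories; the paper's exact definition of visitation entropy (for example, per-step or state-action distributions) may differ, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, List


def visitation_entropy(trajectories: Iterable[List[Hashable]]) -> float:
    """Shannon entropy (in nats) of the empirical state-visitation distribution.

    Simplified sketch: visits are pooled over all time steps of all trajectories.
    """
    counts = Counter(s for traj in trajectories for s in traj)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no visits recorded")
    probs = [c / total for c in counts.values()]
    return sum(-p * math.log(p) for p in probs)


# Four states visited equally often: log(4) ≈ 1.386 nats, the maximum for 4 states.
print(visitation_entropy([[0, 1], [2, 3]]))
# An agent stuck in a single state has zero visitation entropy.
print(visitation_entropy([[0, 0], [0, 0]]))
```

An exploration algorithm of this kind tries to drive such an entropy toward its maximum (log S for S states) using as few environment samples as possible.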
The article discusses the importance of sample correlations in computer graphics, vision, and machine learning, highlighting how tailored randomness can improve the efficiency of existing models. It covers the various correlations studied in computer graphics, the tools used to characterize them, and the use of neural networks to develop new kinds of correlations. Gurprit Singh from the Max Planck Institute for Informatics will present on the topic. Why it matters: Understanding and applying sample correlations to optimize sampling techniques can lead to significant advancements and efficiency gains across multiple AI fields.
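As a minimal illustration of how correlated samples can outperform independent ones (a textbook example, not a method from the talk), the sketch below compares the variance of Monte Carlo integration with independent uniform samples against jittered (stratified) samples, one of the classic correlated sampling patterns in rendering. The integrand, sample count, and helper names are illustrative assumptions.

```python
import random
import statistics
from typing import Callable


def mc_estimate(f: Callable[[float], float], n: int, rng: random.Random,
                jittered: bool) -> float:
    """Estimate the integral of f over [0, 1] from n samples."""
    if jittered:
        # One uniform sample inside each of n equal strata: the samples are
        # negatively correlated because no two can land in the same stratum.
        xs = [(i + rng.random()) / n for i in range(n)]
    else:
        # Independent uniform samples (white noise).
        xs = [rng.random() for _ in range(n)]
    return sum(f(x) for x in xs) / n


def estimator_variance(jittered: bool, n: int = 16, trials: int = 500,
                       seed: int = 0) -> float:
    """Empirical variance of the estimator over repeated independent trials."""
    rng = random.Random(seed)

    def f(x: float) -> float:
        return x * x  # exact integral over [0, 1] is 1/3

    estimates = [mc_estimate(f, n, rng, jittered) for _ in range(trials)]
    return statistics.variance(estimates)


print("independent:", estimator_variance(jittered=False))
print("jittered:   ", estimator_variance(jittered=True))
```

Both estimators are unbiased, but for smooth integrands the jittered estimator's variance shrinks much faster as n grows, which is the kind of efficiency gain that tailored sample correlations aim for.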
This paper introduces Diffusion-BBO, a new online black-box optimization (BBO) framework that uses a conditional diffusion model as an inverse surrogate model. The framework employs an Uncertainty-aware Exploration (UaE) acquisition function to propose scores in the objective space, which are then used for conditional sampling. The authors show theoretically that the approach achieves a near-optimal solution, and empirically it outperforms existing online BBO baselines on six scientific discovery tasks.
MBZUAI's Samuel Horváth presented Maestro, a new framework for efficiently training machine learning models in federated settings, at ICML 2024. Maestro identifies and removes redundant components of a model through trainable decomposition, making models more efficient on edge devices. The approach decomposes layers into low-rank approximations and discards unneeded components to reduce model size. Why it matters: This research addresses the challenge of running complex models on resource-constrained devices, which is crucial for expanding AI applications while preserving data privacy.
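Maestro learns its decomposition during training. As a simpler stand-in for the core idea, the sketch below factorizes a fixed weight matrix with a truncated SVD, replacing one m×n layer with two thinner factors. This is a post-hoc approximation, not Maestro's trainable procedure, and the function name and matrix shapes are hypothetical.

```python
import numpy as np


def low_rank_factorize(W: np.ndarray, rank: int):
    """Split W (m x n) into A (m x rank) and B (rank x n) with A @ B ≈ W.

    A truncated SVD gives the best rank-`rank` approximation in Frobenius norm
    (Eckart-Young theorem).
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # fold the singular values into the left factor
    B = Vt[:rank, :]
    return A, B


rng = np.random.default_rng(0)
# Hypothetical 64 x 32 weight matrix whose true rank is 8.
W = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 32))
A, B = low_rank_factorize(W, rank=8)
print("parameters:", W.size, "->", A.size + B.size)  # parameters: 2048 -> 768
print("relative error:", np.linalg.norm(W - A @ B) / np.linalg.norm(W))
```

The factorized layer is smaller whenever the retained rank r satisfies r(m + n) < mn. Lowering r trades accuracy for size, the trade-off Maestro navigates by learning which components to discard.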
This article discusses approximating a high-dimensional distribution with Gaussian variational inference, that is, by finding the Gaussian that minimizes the Kullback-Leibler (KL) divergence to it. Building on previous research, it approximates this minimizer by a Gaussian distribution with a specific mean and variance. The study quantifies the approximation accuracy and its range of applicability in terms of the effective dimension, which is relevant for analyzing sampling schemes in optimization. Why it matters: This theoretical research can inform the development of more efficient and accurate AI algorithms, particularly in areas that deal with high-dimensional data, such as machine learning and data analysis.
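To make Gaussian variational inference concrete in the simplest setting, the sketch below fits q = N(m, s²) to the unnormalized one-dimensional target p(x) ∝ exp(-x⁴/4 - x²/2) by gradient descent on KL(q‖p). Because Gaussian moments are known, the objective has a closed form here: KL = E_q[x⁴]/4 + E_q[x²]/2 - log s + const. The target, step size, and iteration count are illustrative assumptions; the paper studies the high-dimensional case, and its analysis is not reproduced here.

```python
import math


def kl_grad(m: float, s: float):
    """Gradient of KL(N(m, s^2) || p) up to a constant, for p(x) ∝ exp(-x^4/4 - x^2/2).

    Uses E[x^2] = m^2 + s^2 and E[x^4] = m^4 + 6 m^2 s^2 + 3 s^4, so
    KL = E[x^4]/4 + E[x^2]/2 - log(s) + const.
    """
    dm = m**3 + 3 * m * s**2 + m
    ds = 3 * m**2 * s + 3 * s**3 + s - 1.0 / s
    return dm, ds


def fit_gaussian(m: float = 1.0, s: float = 1.0, lr: float = 0.05,
                 steps: int = 5000):
    """Plain gradient descent on (m, s), starting from N(1, 1)."""
    for _ in range(steps):
        dm, ds = kl_grad(m, s)
        m, s = m - lr * dm, s - lr * ds
    return m, s


m, s = fit_gaussian()
# By symmetry m -> 0; then 3 s^4 + s^2 - 1 = 0 gives s^2 = (sqrt(13) - 1) / 6.
print(m, s**2, (math.sqrt(13) - 1) / 6)
```

The fitted variance is below 1 because the quartic term makes the target lighter-tailed than N(0, 1); in higher dimensions the same objective is optimized over a mean vector and covariance matrix.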
MBZUAI PhD graduate William de Vazelhes is researching hard-thresholding algorithms that enable AI to work from smaller datasets. His work focuses on optimization algorithms that simplify data by retaining only its most significant components, making it easier to analyze and work with; this is useful for saving energy and for deploying AI models on low-memory devices. He demonstrated that his approach can obtain results comparable to those of convex algorithms in many common settings. Why it matters: This research could broaden AI accessibility by reducing computational costs, and it has potential applications in sectors such as finance, particularly for portfolio management under budgetary constraints.
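The core operation behind hard-thresholding methods keeps only the k largest-magnitude entries of a vector and zeroes out the rest. The sketch below uses it in textbook iterative hard thresholding (IHT) for sparse least squares; it illustrates the general technique only and is not de Vazelhes's specific algorithm. The function names, the step-size choice, and the synthetic data are assumptions.

```python
import numpy as np


def hard_threshold(x: np.ndarray, k: int) -> np.ndarray:
    """Keep the k entries of x with the largest magnitude; zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out


def iht(A: np.ndarray, y: np.ndarray, k: int, iters: int = 500) -> np.ndarray:
    """Iterative hard thresholding for min ||A x - y||^2 subject to ||x||_0 <= k."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / (largest singular value)^2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        # Gradient step on the least-squares loss, then project onto k-sparse vectors.
        x = hard_threshold(x + step * A.T @ (y - A @ x), k)
    return x


rng = np.random.default_rng(0)
A = rng.standard_normal((100, 40))
x_true = np.zeros(40)
x_true[[3, 17, 29]] = [2.0, -1.5, 1.0]
y = A @ x_true
x_hat = iht(A, y, k=3)
print(np.flatnonzero(x_hat), np.round(x_hat[np.flatnonzero(x_hat)], 3))
```

With a well-conditioned design like this one, IHT typically recovers the three nonzero coordinates. Unlike the convex Lasso, which encourages sparsity through an ℓ1 penalty, hard thresholding enforces the sparsity level k directly, which is what makes budget-style constraints (such as holding at most k assets in a portfolio) natural to express.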