GCC AI Research

Results for "Reinforcement Learning"

Recent Advances in Deep Reinforcement Learning

MBZUAI

Keith Ross, Dean of Computer Science, Data Science and Engineering at NYU Shanghai, will give a talk on recent advances in deep reinforcement learning (DRL). The talk will review DRL breakthroughs and discuss algorithmic research on DRL for high-dimensional state and action spaces, with applications to robotic locomotion. Ross's research interests include deep reinforcement learning, Internet privacy, peer-to-peer networking, and computer network modeling. Why it matters: Reinforcement learning is a core area of AI research in the GCC region, and a talk by a prominent researcher can help inform and inspire local researchers.

Distillation Policy Optimization

arXiv

The paper introduces Distillation Policy Optimization, an actor-critic framework that combines on-policy and off-policy data for reinforcement learning. It incorporates variance reduction mechanisms such as a unified advantage estimator (UAE) and a residual baseline. The empirical results show improved sample efficiency for on-policy algorithms, narrowing the gap with off-policy methods.

What reinforcement learning can teach language models about reasoning

MBZUAI

MBZUAI researchers at the Institute of Foundation Models (IFM) investigated the role of reinforcement learning (RL) in improving the reasoning abilities of language models. Their study found that RL acts as an 'elicitor' of reasoning in domains frequently encountered during pre-training (e.g., math, coding), while genuinely teaching new reasoning skills in underrepresented domains (e.g., logic, simulations). To support their analysis, they created a new dataset called GURU containing 92,000 examples across six domains. Why it matters: This research clarifies the impact of reinforcement learning on language model reasoning and points toward models whose reasoning abilities generalize across diverse domains.

Energy Pricing in P2P Energy Systems Using Reinforcement Learning

arXiv

This paper presents a reinforcement learning framework for optimizing energy pricing in peer-to-peer (P2P) energy systems. The framework aims to maximize the profit of all components in a microgrid, including consumers, prosumers, the service provider, and a community battery. Experimental results on the Pymgrid dataset demonstrate the approach's effectiveness in price optimization, considering the interests of different components and the impact of community battery capacity.

Fast Rates for Maximum Entropy Exploration

MBZUAI

This paper addresses exploration in reinforcement learning (RL) in unknown environments with sparse rewards, focusing on maximum entropy exploration. It introduces a game-theoretic algorithm for visitation entropy maximization with an improved sample complexity of O(H^3 S^2 A / ε^2), where H is the horizon, S the number of states, A the number of actions, and ε the target accuracy. For trajectory entropy, the paper presents an algorithm with O(poly(S, A, H) / ε) complexity, showing the statistical advantage of regularized MDPs for exploration. Why it matters: The research offers new techniques to reduce the sample complexity of RL, potentially enhancing the efficiency of AI agents in complex environments.
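
As a rough illustration of what maximum entropy exploration rewards, the sketch below computes the Shannon entropy of an empirical state-visitation distribution built from sampled trajectories. It is a simplified stand-in, not the paper's algorithm: the function name `visitation_entropy` is hypothetical, states are assumed to be hashable labels, and the paper's own definition of visitation entropy may differ in detail from this pooled-count version.

```python
import math
from collections import Counter


def visitation_entropy(trajectories):
    """Shannon entropy (in nats) of the empirical state-visitation distribution.

    `trajectories` is a list of state sequences. This pooled-count estimate is
    an illustrative assumption, not the estimator used in the paper.
    """
    counts = Counter(state for traj in trajectories for state in traj)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no states visited")
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# An agent stuck in one state versus one that covers four states evenly.
narrow = [[0, 0, 0, 0]]
broad = [[0, 1, 2, 3]]

narrow_entropy = visitation_entropy(narrow)
broad_entropy = visitation_entropy(broad)
```

Here `broad_entropy` exceeds `narrow_entropy`: spreading visits evenly across states maximizes the entropy, which is why an entropy objective can drive exploration when rewards are sparse.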

Fine-tuning Text-to-Image Models: Reinforcement Learning and Reward Over-Optimization

MBZUAI

The article discusses research on fine-tuning text-to-image diffusion models, including reward function training, online reinforcement learning (RL) fine-tuning, and addressing reward over-optimization. A Text-Image Alignment Assessment (TIA2) benchmark is introduced to study reward over-optimization. TextNorm, a method for confidence calibration in reward models, is presented to reduce over-optimization risks. Why it matters: Improving the alignment and fidelity of text-to-image models is crucial for generating high-quality content, and addressing over-optimization enhances the reliability of these models in creative applications.

Learning to Identify Critical States for Reinforcement Learning from Videos

arXiv

Researchers at KAUST have developed a new method called Deep State Identifier for extracting information from videos for reinforcement learning. The method learns to predict returns from video-encoded episodes and identifies critical states using mask-based sensitivity analysis. Experiments demonstrate the method's potential for understanding and improving agent behavior in deep reinforcement learning (DRL).

Learn to control

MBZUAI

Patrick van der Smagt, Director of AI Research at Volkswagen Group, discussed the use of generative machine learning models for predicting and controlling complex stochastic systems in robotics. The talk highlighted examples in robotics and beyond and addressed the challenges of achieving quality and trust in AI systems. He also mentioned his involvement in a European industry initiative on trust in AI and his membership in the AI Council of the State of Bavaria. Why it matters: Control in robotics and trust in AI are key issues for the further development of autonomous systems, especially in industrial applications within the GCC region.