GCC AI Research

Results for "RLtools"

RLtools: Technology Innovation Institute and New York University Debut Novel Reinforcement Learning Library

TII ·

TII's Autonomous Robotics Research Center (ARRC) and NYU's Agile Robotics and Perception Lab have released RLtools, an open-source reinforcement learning library. RLtools achieves a 75x training speed-up over existing libraries, making it possible to train drone controllers on consumer-grade laptops or even directly on microcontrollers, which addresses both resource-efficiency and deployment challenges. Why it matters: By cutting training time and hardware requirements, the library accelerates the development and deployment of autonomous systems and makes advanced AI more accessible.

MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning

arXiv ·

Researchers introduce MATRIX, a vision-centric agent tuning framework for robust tool-use reasoning in VLMs. The framework includes M-TRACE, a dataset of 28.5K multimodal tasks with 177K verified trajectories, and Pref-X, a set of 11K automatically generated preference pairs. Experiments show MATRIX consistently outperforms open- and closed-source VLMs across three benchmarks.

Energy Pricing in P2P Energy Systems Using Reinforcement Learning

arXiv ·

This paper presents a reinforcement learning framework for optimizing energy pricing in peer-to-peer (P2P) energy systems. The framework aims to maximize the collective profit of all components in a microgrid: consumers, prosumers, the service provider, and a community battery. Experimental results on the Pymgrid dataset show that the approach optimizes prices effectively while balancing the interests of the different components and accounting for the impact of community battery capacity.
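The core decision can be pictured as a one-step reward for a price-setting agent: a sensible internal P2P price sits between the feed-in tariff and the grid price, and the reward aggregates the gains of the trading parties. The sketch below is a minimal illustration under assumed parameters (linear demand response, tariff values, function name); none of these details come from the paper.

```python
def microgrid_profit(price, base_demand_kwh=100.0, surplus_kwh=80.0,
                     grid_price=0.30, feed_in=0.05, elasticity=2.0):
    """Hypothetical one-step reward for a price-setting RL agent: the total
    profit of consumers and prosumers when demand responds to the internal
    price. All parameter values here are illustrative assumptions."""
    # demand shrinks linearly as the internal price rises above the feed-in tariff
    demand = base_demand_kwh * max(0.0, 1.0 - elasticity * (price - feed_in))
    traded = min(demand, surplus_kwh)
    consumer_gain = traded * (grid_price - price)   # savings vs. buying from the grid
    prosumer_gain = traded * (price - feed_in)      # earnings vs. the feed-in tariff
    return consumer_gain + prosumer_gain
```

An RL agent would adjust `price` at each step to maximize this aggregate reward; in the paper's setting the objective would additionally cover the service provider and the community battery.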

Reinforcement learning-based dynamic cleaning scheduling framework for solar energy system

arXiv ·

This study introduces a reinforcement learning (RL) framework using Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) to optimize the cleaning schedules of photovoltaic panels in arid regions. Applied to a case study in Abu Dhabi, the PPO-based framework demonstrated up to 13% cost savings compared to simulation optimization methods by dynamically adjusting cleaning intervals based on environmental conditions. The research highlights the potential of RL in enhancing the efficiency and reducing the operational costs of solar power generation.
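The scheduling problem the RL agent faces can be sketched as a tiny environment in which the state is the panel's soiling level and the action is whether to clean that day. Everything below (class name, cost and soiling constants, the fixed-interval baseline) is an illustrative assumption, not the paper's simulator.

```python
import random

class PanelCleaningEnv:
    """Hypothetical sketch of the scheduling problem: dust accumulates on a
    PV panel and reduces revenue; cleaning resets soiling but costs money."""

    SOILING_LOSS_PER_DAY = 0.004   # assumed daily efficiency loss from dust
    CLEANING_COST = 50.0           # assumed cost of one cleaning (arbitrary units)
    DAILY_REVENUE_CLEAN = 100.0    # assumed revenue at full efficiency

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.soiling = 0.0

    def step(self, clean: bool) -> float:
        """Apply the action and return the day's profit (the RL reward)."""
        cost = 0.0
        if clean:
            self.soiling = 0.0
            cost = self.CLEANING_COST
        revenue = self.DAILY_REVENUE_CLEAN * (1.0 - self.soiling)
        # dust accumulates stochastically (e.g. more on windy days)
        self.soiling = min(1.0, self.soiling
                           + self.SOILING_LOSS_PER_DAY * self.rng.uniform(0.5, 2.0))
        return revenue - cost

def fixed_interval_profit(interval: int, days: int = 365) -> float:
    """Baseline: clean every `interval` days, as a static schedule would."""
    env = PanelCleaningEnv()
    return sum(env.step(clean=(d % interval == 0)) for d in range(1, days + 1))
```

An agent trained with PPO or SAC would learn when `env.step(clean=True)` pays off given the observed conditions, rather than following a fixed interval like the baseline above; that adaptivity is where the reported cost savings come from.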

Recent Advances in Deep Reinforcement Learning

MBZUAI ·

Keith Ross, Dean of Computer Science, Data Science and Engineering at NYU Shanghai, will be giving a talk on recent advances in Deep Reinforcement Learning (DRL). The talk will review DRL breakthroughs and discuss algorithmic research on DRL for high-dimensional state and action spaces, with applications to robotic locomotion. Ross's research interests include deep reinforcement learning, Internet privacy, peer-to-peer networking, and computer network modeling. Why it matters: Reinforcement learning is a core area of AI research in the GCC region, and a talk by a prominent researcher can help inform and inspire local researchers.

ILION: Deterministic Pre-Execution Safety Gates for Agentic AI Systems

arXiv ·

The paper introduces ILION, a deterministic execution gate designed to ensure the safety of autonomous AI agents by classifying proposed actions as either BLOCK or ALLOW. ILION uses a five-component cascade architecture that operates without statistical training, API dependencies, or labeled data. Evaluation against existing text-safety infrastructures demonstrates ILION's superior performance in preventing unauthorized actions, achieving an F1 score of 0.8515 with sub-millisecond latency.
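The BLOCK/ALLOW gating idea can be illustrated with a tiny deterministic cascade: each stage is a pure rule requiring no statistical training, labeled data, or API calls, and the first stage that objects blocks the action. The stages, tool names, and patterns below are invented for illustration and are not ILION's actual five components.

```python
import re
from dataclasses import dataclass

BLOCK, ALLOW = "BLOCK", "ALLOW"

@dataclass
class ProposedAction:
    """A tool call an agent proposes before execution (shape assumed)."""
    tool: str
    argument: str

def stage_denylisted_tool(a: ProposedAction) -> bool:
    # hypothetical denylist of inherently dangerous tools
    return a.tool in {"shell.exec", "fs.delete"}

def stage_suspicious_argument(a: ProposedAction) -> bool:
    # hypothetical pattern check on the argument string
    return bool(re.search(r"rm\s+-rf|sudo|/etc/passwd", a.argument))

CASCADE = [stage_denylisted_tool, stage_suspicious_argument]

def gate(action: ProposedAction) -> str:
    """Classify a proposed action as BLOCK or ALLOW via the rule cascade."""
    for stage in CASCADE:
        if stage(action):
            return BLOCK
    return ALLOW
```

Because every stage is a deterministic function, the gate's verdicts are reproducible and its latency is dominated by string matching, which is consistent with the sub-millisecond figure reported for ILION.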

Fine-tuning Text-to-Image Models: Reinforcement Learning and Reward Over-Optimization

MBZUAI ·

The article discusses research on fine-tuning text-to-image diffusion models, including reward function training, online reinforcement learning (RL) fine-tuning, and addressing reward over-optimization. A Text-Image Alignment Assessment (TIA2) benchmark is introduced to study reward over-optimization. TextNorm, a method for confidence calibration in reward models, is presented to reduce over-optimization risks. Why it matters: Improving the alignment and fidelity of text-to-image models is crucial for generating high-quality content, and addressing over-optimization enhances the reliability of these models in creative applications.
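One way to picture confidence calibration of a reward model, in the spirit of TextNorm (the mechanics below are assumed for illustration, not taken from the article): normalize the raw reward for the target prompt by a softmax over rewards for semantically contrastive prompts, so an image that scores high on every prompt, a telltale sign of an over-optimized reward, is down-weighted.

```python
import math

def textnorm_style_score(reward_fn, image, prompt, contrastive_prompts, tau=1.0):
    """Hedged sketch of a calibrated reward: the target prompt's reward is
    softmax-normalized against rewards for contrastive prompts. Function
    name and normalization details are assumptions, not TextNorm's spec."""
    prompts = [prompt] + list(contrastive_prompts)
    scores = [reward_fn(image, p) / tau for p in prompts]
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    return exps[0] / sum(exps)               # in (0, 1]; 1/len(prompts) if flat
```

An image aligned only with the target prompt keeps a score near 1, while an image the reward model rates highly regardless of prompt collapses toward the uniform value, reducing the incentive to over-optimize the raw reward.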