Figure 1 - available via license: Creative Commons Attribution 4.0 International
Average length of evaluation runs (with ε = 0.05) on a 25×25 Gridworld with potential-based reward shaping, where Φ(s) = V*(s).
Source publication
Potential-based reward shaping is commonly used to incorporate prior knowledge of how to solve the task into reinforcement learning because it can formally guarantee policy invariance. As such, the optimal policy and the ordering of policies by their returns are not altered by potential-based reward shaping. In this work, we highlight the dependenc...
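The policy-invariance property mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard shaping term F(s, s') = γΦ(s') − Φ(s) with Φ(s) = V*(s), as in the figure caption. Everything else is an assumption made for illustration and is not taken from the publication: a reward of −1 per step, a goal in the bottom-right corner, deterministic moves, γ = 0.99, and the helper names `v_star`, `shaping`, and `path_to_goal`.

```python
import math

GAMMA = 0.99            # assumed discount factor (not from the source)
N = 25                  # grid side length, as in the figure caption
GOAL = (N - 1, N - 1)   # assumed goal cell (not from the source)


def v_star(state):
    """Optimal value under an assumed reward of -1 per step until the goal."""
    d = abs(GOAL[0] - state[0]) + abs(GOAL[1] - state[1])  # Manhattan distance
    return -(1.0 - GAMMA ** d) / (1.0 - GAMMA)


def shaping(state, next_state, potential=v_star, gamma=GAMMA):
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)."""
    return gamma * potential(next_state) - potential(state)


def discounted_return(rewards, gamma=GAMMA):
    """Discounted sum of a reward sequence."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def path_to_goal(start):
    """One shortest path: move along rows first, then along columns."""
    r, c = start
    path = [start]
    while r != GOAL[0]:
        r += 1 if GOAL[0] > r else -1
        path.append((r, c))
    while c != GOAL[1]:
        c += 1 if GOAL[1] > c else -1
        path.append((r, c))
    return path


start = (0, 0)
path = path_to_goal(start)
base = [-1.0] * (len(path) - 1)  # unshaped rewards along the path
shaped = [r + shaping(s, s2) for r, s, s2 in zip(base, path, path[1:])]

# The shaping terms telescope, so the two returns differ only by -Phi(start),
# because Phi(goal) = V*(goal) = 0 under the assumptions above.
diff = discounted_return(shaped) - discounted_return(base)
print(diff, -v_star(start))
```

Because the difference between the shaped and unshaped returns depends only on the start state, every policy's return from that state shifts by the same constant, so the ordering of policies is unchanged. Under these assumptions, with Φ = V* each shaped reward along an optimal path is zero up to floating-point error.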