Figure 1 (available under the Creative Commons Attribution 4.0 International license)
Comparison of the performance of ATRPO and TRPO with different discount factors. The x-axis shows the number of agent-environment interactions and the y-axis shows the total return, averaged over 10 seeds. The solid lines show the agents' performance on evaluation trajectories of maximum length 1,000 (top row) and 10,000 (bottom row). The shaded regions represent one standard deviation.
Source publication
We develop theory and algorithms for average-reward on-policy Reinforcement Learning (RL). We first consider bounding the difference in long-term average reward between two policies. We show that previous work based on the discounted return (Schulman et al., 2015; Achiam et al., 2017) yields a non-meaningful bound in the average-reward setting. ...
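The core distinction driving the comparison in Figure 1 is the objective itself: discounted TRPO maximizes a gamma-discounted return, while ATRPO targets the long-run average reward. Below is a minimal sketch contrasting the two quantities on a single trajectory; the function names and the synthetic rewards are illustrative assumptions, not the paper's code.

```python
import numpy as np

def discounted_return(rewards, gamma):
    # Discounted objective: sum_t gamma^t * r_t, sensitive to the choice of gamma.
    discounts = gamma ** np.arange(len(rewards))
    return float(np.sum(discounts * rewards))

def average_reward(rewards):
    # Average-reward objective: lim_T (1/T) sum_t r_t, estimated by the empirical mean.
    return float(np.mean(rewards))

# Hypothetical per-step rewards from one long trajectory.
rng = np.random.default_rng(0)
rewards = rng.uniform(size=10_000)

print(discounted_return(rewards, gamma=0.99))   # depends strongly on gamma
print(discounted_return(rewards, gamma=0.999))  # grows as gamma -> 1
print(average_reward(rewards))                  # no discount factor to tune
```

Sweeping gamma in the discounted case, as the figure does, changes the objective being optimized; the average-reward criterion removes that hyperparameter entirely.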
Contexts in source publication
Context 1
... plot the performance of ATRPO and TRPO trained with different discount factors in Figure 1. We see that TRPO with its best discount factor can perform as well as ATRPO on the simplest environment, HalfCheetah. ...
Context 2
... 2 summarizes the hyperparameters used in our experiments. Figure 4 repeats the experiments presented in Figure 1, except that discounted TRPO is trained in the standard MuJoCo setting without any resets (i.e., during training, the trajectory terminates when the agent falls). The maximum length of a TRPO training episode is 1,000. ...
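As a rough sketch of the two training protocols described above (not the paper's implementation), the rollout below uses the classic Gym API, where `step` returns a 4-tuple; the environment name, the `run_episode` helper, and the random stand-in policy are illustrative assumptions.

```python
import gym

def run_episode(env, policy, max_steps=1000, reset_on_fall=False):
    """Roll out one training episode under either protocol.

    reset_on_fall=False: standard MuJoCo setting, where the trajectory
    terminates when the agent falls (`done` is True).
    reset_on_fall=True: the reset scheme, where a fall triggers an
    environment reset and interaction continues up to max_steps.
    """
    obs, total = env.reset(), 0.0
    for _ in range(max_steps):
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
        if done:
            if not reset_on_fall:
                break             # standard setting: episode ends on a fall
            obs = env.reset()     # reset scheme: keep interacting
    return total

# Example usage with a random policy as a stand-in for TRPO/ATRPO.
env = gym.make("Hopper-v2")
ret = run_episode(env, lambda obs: env.action_space.sample(), reset_on_fall=True)
```

Under the reset scheme, every training episode contributes exactly `max_steps` interactions, which matches the long-horizon evaluation in Figure 1 more closely than early termination does.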
Context 3
... note that when TRPO is trained in the standard MuJoCo setting, ATRPO still outperforms discounted TRPO by a significant margin. In Figure 5 we plot the performance of the best discount factor for each environment for TRPO trained with and without the reset scheme (i.e., the best-performing TRPO curves from Figures 1 and 4). ATRPO is also plotted for comparison. ...
Similar publications
Deep Reinforcement Learning (DRL) is being used in many domains. One of the biggest advantages of DRL is that it enables the continuous improvement of a learning agent. Moreover, the DRL framework is robust and flexible enough to be applied to problems of varying nature and domain. The presented work is evidence of using the DRL technique to solve a...
Citations
The global transition to renewable energy is crucial for mitigating climate change, but the increasing penetration of renewable sources introduces challenges such as uncertainty and intermittency. The electricity market plays a vital role in encouraging renewable generation while ensuring operational security and grid stability. This Review examines the optimization of market design for power systems with high renewable penetration. We explore recent innovations in renewable-dominated electricity market designs, summarizing key research questions and strategies. Special focus is given to multi-agent reinforcement learning (MARL) for market simulations, including its performance and real-world applicability. We also review performance evaluation metrics and present a case study from the Horizon 2020 TradeRES project, exploring European electricity market design under 100% renewable penetration. Finally, we discuss unresolved issues and future research directions.