Figure 1 - uploaded by Luchen Li
Learning curves on Atari games with the mean (solid line) and standard deviation (shaded area) across 5 runs.
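The caption's solid line and shaded area can be reproduced from raw per-run scores. Below is a minimal sketch, assuming each run is logged as an equal-length list of scores at shared evaluation points. The function names `summarize_runs` and `shaded_band` and the sample numbers are hypothetical. The source also does not say whether the population or the sample standard deviation was used; the population form is assumed here.

```python
from statistics import mean, pstdev


def summarize_runs(runs):
    """Return per-step mean and standard deviation across runs.

    `runs` is a list of equal-length score sequences, one per run.
    Assumption: population standard deviation (pstdev); swap in
    statistics.stdev for the sample version if that is what was plotted.
    """
    if not runs:
        raise ValueError("need at least one run")
    n_steps = len(runs[0])
    if any(len(r) != n_steps for r in runs):
        raise ValueError("all runs must have the same number of steps")
    # zip(*runs) groups the scores of all runs at each evaluation point.
    means = [mean(step) for step in zip(*runs)]
    stds = [pstdev(step) for step in zip(*runs)]
    return means, stds


def shaded_band(means, stds):
    """Return the lower and upper edges of the mean +/- one std band."""
    lower = [m - s for m, s in zip(means, stds)]
    upper = [m + s for m, s in zip(means, stds)]
    return lower, upper


# Made-up scores from 5 runs at 3 evaluation points, for illustration only.
example_runs = [
    [10.0, 20.0, 35.0],
    [12.0, 18.0, 30.0],
    [9.0, 25.0, 40.0],
    [11.0, 22.0, 33.0],
    [8.0, 15.0, 37.0],
]
example_means, example_stds = summarize_runs(example_runs)
example_lower, example_upper = shaded_band(example_means, example_stds)
```

The `means` list corresponds to the solid line, and the `lower`/`upper` pair bounds the shaded area; a plotting library would typically draw the band as a filled region between the two.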
Source publication
Distributional Reinforcement Learning (RL) maintains the entire probability distribution of the reward-to-go, i.e. the return, providing more learning signals that account for the uncertainty associated with policy performance, which may be beneficial for trading off exploration and exploitation and policy learning in general. Previous works in dis...
Contexts in source publication
Context 1
... exploration is also used for the PPO+WGAN and PPO+QR baselines. The learning curves in Figure 1 suggest that, with the exploration mechanism fixed, the proposed Bayesian approach BDPG naive outperforms or is comparable to WGAN and QR in 6 out of 8 games. Moreover, BDPG is always better than or equal to BDPG naive, vindicating our exploration scheme, and achieves the highest score among all tested algorithms in all four of the hard-exploration games tested. ...