Learning curves on Atari games with the mean (solid line) and standard deviation (shaded area) across 5 runs.
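
As a rough illustration of how such a curve could be assembled, the sketch below computes a per-step mean and a one-standard-deviation band across runs. The helper name `summarize_runs` and the scores are hypothetical, and the use of the population standard deviation is an assumption; the caption does not say which estimator the authors used.

```python
from statistics import mean, pstdev


def summarize_runs(runs):
    """Return (mean, lower, upper) per step for equal-length runs.

    `runs` is a list of score sequences, one per independent run.
    The band is mean +/- one population standard deviation (an assumption).
    """
    if not runs or len({len(r) for r in runs}) != 1:
        raise ValueError("need at least one run, all of equal length")
    steps = list(zip(*runs))  # regroup scores by training step
    center = [mean(s) for s in steps]
    spread = [pstdev(s) for s in steps]
    lower = [c - s for c, s in zip(center, spread)]
    upper = [c + s for c, s in zip(center, spread)]
    return center, lower, upper


# Hypothetical scores for 5 runs over 3 evaluation points (not from the paper).
runs = [
    [0.0, 10.0, 20.0],
    [2.0, 12.0, 22.0],
    [4.0, 14.0, 24.0],
    [6.0, 16.0, 26.0],
    [8.0, 18.0, 28.0],
]
center, lower, upper = summarize_runs(runs)
```

The `center` list would be drawn as the solid line and the region between `lower` and `upper` as the shaded area.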

Source publication
Preprint
Distributional Reinforcement Learning (RL) maintains the entire probability distribution of the reward-to-go, i.e. the return, providing more learning signals that account for the uncertainty associated with policy performance, which may be beneficial for trading off exploration and exploitation and policy learning in general. Previous works in dis...

Context in source publication

Context 1
... exploration is also used for the PPO+WGAN and PPO+QR baselines. Learning curves in Figure 1 suggest that, with the exploration mechanism fixed, the proposed Bayesian approach BDPG naive outperforms or is comparable to WGAN and QR in 6 out of 8 games. Moreover, BDPG is always better than or equal to BDPG naive, vindicating our exploration scheme, and it achieves the highest score among all tested algorithms in all four of the hard-exploration games tested. ...
