Learning curves on MuJoCo tasks with the mean (solid line) and standard deviation (shaded area) across 5 runs.
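The caption describes a standard way to summarise noisy RL training across random seeds: average the evaluation return at each step and draw a band one standard deviation either side. Below is a minimal NumPy sketch, assuming every run is evaluated at the same steps and stored as one row of an array. The function name `summarise_runs` is hypothetical. The source does not say which standard-deviation estimator was used, so the population form (`ddof=0`) here is an assumption.

```python
import numpy as np

def summarise_runs(returns):
    """Aggregate learning curves from several independent runs.

    returns: array-like of shape (num_runs, num_steps), one row per run
             (e.g. 5 seeds evaluated at identical training steps).
    Returns the per-step mean (the solid line) and the lower and upper
    edges of a mean +/- 1 std band (the shaded area).
    """
    returns = np.asarray(returns, dtype=float)
    mean = returns.mean(axis=0)   # average over runs, per step
    std = returns.std(axis=0)     # population std (ddof=0); an assumption
    return mean, mean - std, mean + std

# Toy data: 5 runs x 3 evaluation steps (illustrative values only)
runs = np.array([[0, 10, 20],
                 [2, 12, 24],
                 [4, 14, 28],
                 [6, 16, 32],
                 [8, 18, 36]])
mean, lower, upper = summarise_runs(runs)
```

The three arrays can then be passed to a plotting routine such as Matplotlib's `fill_between` to draw the shaded band around the mean curve.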

Source publication
Preprint
Distributional reinforcement learning (RL) maintains the entire probability distribution of the reward-to-go, i.e. the return. This provides additional learning signals that account for the uncertainty in policy performance, which may benefit both the trade-off between exploration and exploitation and policy learning in general. Previous works in dis...
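To make "the entire distribution of the return" concrete, the sketch below compares two hypothetical policies whose returns have the same mean but very different spread. A conventional value estimate sees only the mean, whereas a distributional view also exposes the quantiles. This is an illustrative Monte Carlo sketch under stated assumptions: it estimates the distribution from empirical episode samples, which is not necessarily how the source models it. The helper names `discounted_return` and `return_distribution` are hypothetical.

```python
import numpy as np

def discounted_return(rewards, gamma=0.99):
    """Reward-to-go from the first step: sum_t gamma^t * r_t."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def return_distribution(episodes, gamma=0.99, quantiles=(0.1, 0.5, 0.9)):
    """Empirical mean and quantiles of the return over sampled episodes."""
    samples = np.array([discounted_return(ep, gamma) for ep in episodes])
    return samples.mean(), np.quantile(samples, quantiles)

# Two hypothetical policies, 4 two-step episodes each, no discounting
safe_mean, safe_q = return_distribution([[1, 1]] * 4, gamma=1.0)
risky_mean, risky_q = return_distribution([[0, 0], [0, 0], [2, 2], [2, 2]],
                                          gamma=1.0)
# Same expected return (2.0), but the risky policy's quantiles are spread out.
```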

Context in source publication

Context 1
... amounts of variance displayed throughout training for both algorithms may be due to the fact that both involve adversarial training. As shown in Figure 3, however, our model outperforms the benchmark in all cases by clear margins. We believe this is because WGAN does not take expectations over an amortised inference space, which accounts for better generalisation. ...