Impact of bootstrap length k and truncation cap ū for information gain at 10M and 150M steps into training.

Source publication
Preprint
Distributional Reinforcement Learning (RL) maintains the entire probability distribution of the reward-to-go, i.e. the return, providing more learning signals that account for the uncertainty associated with policy performance, which may be beneficial for trading off exploration and exploitation and policy learning in general. Previous works in dis...

Context in source publication

Context 1
... particular, k = 1 and ū → ∞ are examined as ablation cases. Average scores of the batches started at 10M and 150M steps into training are shown in Figure 2. For each combination of k and ū, the coloured pixel shows the best outcome over the swept η values, selected by average long-term performance. ...
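
The per-pixel selection described in this context (keep the best η for each combination of k and ū) can be sketched as below. This is only an illustration under stated assumptions, not the authors' code: the function name `best_over_eta`, the tuple layout, and every number in `sweep` are hypothetical, and "best" is assumed to mean the highest average score.

```python
def best_over_eta(results):
    """Pick the best eta for every (k, u_cap) combination.

    results: iterable of (k, u_cap, eta, score) tuples, where score stands in
    for the average long-term performance of one run (hypothetical layout).
    Returns a dict mapping (k, u_cap) -> (best_eta, best_score).
    """
    best = {}
    for k, u_cap, eta, score in results:
        key = (k, u_cap)
        # Keep the entry with the highest score seen so far for this cell.
        if key not in best or score > best[key][1]:
            best[key] = (eta, score)
    return best


# Made-up sweep values, for illustration only (not results from the paper).
# u_cap = inf stands in for the "no truncation" ablation case.
sweep = [
    (1, 10.0, 0.1, 0.42),
    (1, 10.0, 1.0, 0.55),
    (3, 10.0, 0.1, 0.61),
    (3, 10.0, 1.0, 0.58),
    (3, float("inf"), 0.1, 0.47),
]

grid = best_over_eta(sweep)
for (k, u_cap), (eta, score) in sorted(grid.items()):
    print(f"k={k}, u_cap={u_cap}: best eta={eta}, score={score}")
```

Each entry of `grid` corresponds to one pixel of such a heatmap; plotting the scores over the k and ū axes would give a figure of the kind the caption describes.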
