Schematic and data flow in BoP.

Source publication
Preprint
Efficient exploration in complex environments remains a major challenge for reinforcement learning (RL). In contrast to previous Thompson-sampling-inspired mechanisms that enable temporally extended exploration, i.e., deep exploration, we focus on deep exploration in distributional RL. Here we develop a general-purpose approach, Bag of Policies (BoP),...

Context in source publication

Context 1
... the following we lay out the structure, theory and variants of BoP. A schematic can be found in Fig. 1 and for the pseudocode please refer to Algorithm ...

Similar publications

Article
Hideyoshi Yanagisawa, "Emotion Dynamics and the Exploration Cycle (Mathematical Principles of Interest and Curiosity)," Design Engineering (設計工学), Vol. 58, No. 11, 2023.
Chapter
As artificial intelligence (AI) plays a more prominent role in our everyday lives, it becomes increasingly important to introduce basic AI concepts to K-12 students. To this end, we combined physical robots with augmented reality (AR) software to help students learn some of the fundamental concepts of reinforcement learning (RL). We chose RL...
Preprint
Both entropy-minimizing and entropy-maximizing (curiosity) objectives for unsupervised reinforcement learning (RL) have been shown to be effective in different environments, depending on the environment's level of natural entropy. However, neither method alone results in an agent that will consistently learn intelligent behavior across environments...
Conference Paper
This paper shows how a tool that explores future possibilities, ReadySetFuture_, helped a major automotive maker understand how shifts in consumer values may impact the features and use cases that surprise and delight future vehicle consumers in the year 2033. It contrasts two common mindsets when thinking about the future: a there is no alternati...