Figure 5: Average episode return (left) and surprise (right) versus environment interactions (average over 5 seeds, with one shaded standard deviation) in the MinAtar suite of environments. The S-Adapt agent demonstrates emergent behaviors in certain environments, such as Freeway, where it achieves rewards on par with those of the Extrinsic agent. However, in other environments, such as Seaquest, Space Invaders, and Asterix, the extrinsic reward is not closely correlated with entropy control, with the Random agent and the Extrinsic agent achieving similar entropy at the end of training.
Source publication
Both entropy-minimizing and entropy-maximizing (curiosity) objectives for unsupervised reinforcement learning (RL) have been shown to be effective in different environments, depending on the environment's level of natural entropy. However, neither method alone results in an agent that will consistently learn intelligent behavior across environments...
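To make the trade-off concrete, here is a minimal, hypothetical sketch of a surprise-based intrinsic reward whose sign is chosen adaptively for a discrete state space. The class name, the exponential-moving-average density model, and the entropy-comparison switching rule are illustrative assumptions, not the paper's S-Adapt mechanism.

```python
import numpy as np

class SurpriseAdaptiveReward:
    """Sketch: surprise-based intrinsic reward with an adaptive sign."""

    def __init__(self, n_states, alpha=0.01):
        # Running estimate of the state-visitation distribution p(s).
        self.p = np.full(n_states, 1.0 / n_states)
        self.alpha = alpha   # EMA step size for the density update (assumed)
        self.sign = 1.0      # +1: maximize surprise (curiosity), -1: minimize surprise

    def update_density(self, s):
        # Exponential moving average toward the one-hot of the visited state.
        onehot = np.zeros_like(self.p)
        onehot[s] = 1.0
        self.p = (1.0 - self.alpha) * self.p + self.alpha * onehot

    def intrinsic_reward(self, s):
        # Surprise of the visited state: -log p(s).
        surprise = -np.log(self.p[s] + 1e-8)
        return self.sign * surprise

    def adapt(self, entropy_agent, entropy_random):
        # Illustrative switching rule (not the paper's mechanism): if the agent
        # can push state entropy below that of a random policy, entropy is
        # controllable, so minimize surprise; otherwise maximize it.
        self.sign = -1.0 if entropy_agent < entropy_random else 1.0
```

In a training loop, one would call intrinsic_reward(s) for the visited state and then update_density(s), with adapt invoked periodically from logged entropy estimates; this is only a usage sketch under the assumptions above.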
Context in source publication
Context 1
... results have shown that the S-Adapt agent can successfully recreate the performance of the S-Min and the S-Max agents in their respective didactic environments. Next, we investigate controlling entropy across the MinAtar benchmark, shown in Figure 5. Notably, these environments were not constructed with any particular entropy regime in mind. ...