Faisal Mohamed’s scientific contributions

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (1)


Figure 2: Average episode return (left) and surprise (right) versus environment interactions (average over 5 seeds, with one shaded standard deviation) in the Maze environment. S-Max and S-Adapt are the only objectives that allow the RL agents to consistently find the goal in the maze. These also cause the largest change in surprise compared to the random agent.
Figure 4: Average episode return (left) and surprise (right) versus environment interactions (average over 5 seeds, with one shaded standard deviation) in Tetris. S-Min, S-Adapt, and the Extrinsic agent solve the game (i.e. consistently survive for more than 200 steps). Interestingly, the surprise-minimizing objective, which S-Adapt converges to, turns out to be a better learning signal than the row-clearing extrinsic reward in Tetris, as the learned policies are more stable and the average episodic surprise is the lowest.
Figure 5: Average episode return (left) and surprise (right) versus environment interactions (average over 5 seeds, with one shaded standard deviation) in the MinAtar suite of environments. The S-Adapt agent indeed demonstrates emergent behaviors in certain environments, such as Freeway, where it achieves rewards on par with those of the Extrinsic agent. However, in other environments, like Seaquest, Space Invaders, and Asterix, the extrinsic reward is not closely correlated with entropy control, with the Random agent and the Extrinsic agent achieving similar entropy at the end of training.
Figure 6: Average episode return versus environment interactions (average over 5 seeds, with one shaded standard deviation) in the Atari Freeway environment. The S-Adapt agent learns useful behaviors (making progress in the original task) from image-based observations. The Extrinsic agent achieves the highest returns as it exploits the task rewards; the S-Max agent achieves slightly lower returns than the S-Adapt agent, while the S-Min agent achieves zero returns.
Figure 7: Pixel rendering of the small maze (left) and the large maze (right).

Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning
  • Preprint
  • File available

May 2024 · 40 Reads

Adriana Hugessen · Roger Creus Castanyer · Faisal Mohamed

Both entropy-minimizing and entropy-maximizing (curiosity) objectives for unsupervised reinforcement learning (RL) have been shown to be effective in different environments, depending on the environment's level of natural entropy. However, neither method alone results in an agent that will consistently learn intelligent behavior across environments. In an effort to find a single entropy-based method that will encourage emergent behaviors in any environment, we propose an agent that can adapt its objective online, depending on the entropy conditions, by framing the choice as a multi-armed bandit problem. We devise a novel intrinsic feedback signal for the bandit, which captures the agent's ability to control the entropy in its environment. We demonstrate that such agents can learn to control entropy and exhibit emergent behaviors in both high- and low-entropy regimes and can learn skillful behaviors in benchmark tasks. Videos of the trained agents and summarized findings can be found on our project page: https://sites.google.com/view/surprise-adaptive-agents
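The abstract describes the method only at a high level: a two-armed bandit chooses, per episode, between a surprise-minimizing and a surprise-maximizing intrinsic objective, and is updated with a feedback signal reflecting how much the agent shifts episodic surprise relative to a baseline. The Python sketch below illustrates that idea under stated assumptions; the names (ObjectiveBandit, entropy_control_feedback, intrinsic_reward), the UCB selection rule, and the absolute-difference feedback are illustrative choices, not the authors' implementation.

    import numpy as np

    class ObjectiveBandit:
        """Two-armed bandit over {surprise-min, surprise-max} objectives.
        The UCB rule below is an assumption for illustration."""

        ARMS = ("s_min", "s_max")

        def __init__(self, exploration_coef: float = 2.0):
            self.counts = np.zeros(len(self.ARMS))
            self.values = np.zeros(len(self.ARMS))
            self.c = exploration_coef

        def select(self) -> int:
            # Try each arm once, then pick the arm with the highest UCB score.
            if self.counts.min() == 0:
                return int(np.argmin(self.counts))
            total = self.counts.sum()
            ucb = self.values + self.c * np.sqrt(np.log(total) / self.counts)
            return int(np.argmax(ucb))

        def update(self, arm: int, feedback: float) -> None:
            # Incremental mean of the feedback received for the chosen arm.
            self.counts[arm] += 1
            self.values[arm] += (feedback - self.values[arm]) / self.counts[arm]

    def entropy_control_feedback(agent_surprise: float, baseline_surprise: float) -> float:
        """Reward the bandit for how far the agent pushes episodic surprise away
        from a random-policy baseline, in either direction (an assumed form of
        the 'ability to control entropy' signal mentioned in the abstract)."""
        return abs(agent_surprise - baseline_surprise)

    def intrinsic_reward(arm: str, step_surprise: float) -> float:
        """Per-step intrinsic reward handed to the RL agent: negative surprise
        when the s_min arm is active, positive surprise when s_max is active."""
        return -step_surprise if arm == "s_min" else step_surprise

    # Hypothetical training loop skeleton (episode_surprise and baseline_surprise
    # would come from the learning agent and a random policy, respectively):
    # bandit = ObjectiveBandit()
    # for episode in range(num_episodes):
    #     arm = bandit.select()
    #     ...collect an episode using intrinsic_reward(ObjectiveBandit.ARMS[arm], s_t)...
    #     bandit.update(arm, entropy_control_feedback(episode_surprise, baseline_surprise))

Under this framing, the bandit drifts toward whichever objective gives the agent more control over entropy in the current environment, which is consistent with the figure captions above (e.g. S-Adapt converging to the surprise-minimizing objective in Tetris).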
