Figure 4 - available via license: Creative Commons Attribution 4.0 International
(Left) Learning curves for five algorithms on the Variant MDP. (Right) Average episode lengths during the learning process. Each curve is averaged over 20 random seeds, and the error bars represent one standard error.
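The aggregation described in the caption (a mean over seeds with error bars of one standard error) can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `mean_and_standard_error`, the array layout (one row per seed, one column per episode), and the synthetic placeholder data are all assumptions made for the example.

```python
import numpy as np


def mean_and_standard_error(values_per_seed):
    """Aggregate per-seed curves into a mean curve and its standard error.

    values_per_seed: array-like of shape (n_seeds, n_points); this layout
    is an assumption for illustration, not taken from the source paper.
    """
    data = np.asarray(values_per_seed, dtype=float)
    n_seeds = data.shape[0]
    mean = data.mean(axis=0)
    # Standard error of the mean: sample standard deviation (ddof=1)
    # divided by the square root of the number of seeds.
    sem = data.std(axis=0, ddof=1) / np.sqrt(n_seeds)
    return mean, sem


# Synthetic placeholder data standing in for 20 seeds x 100 evaluation points.
rng = np.random.default_rng(0)
fake_returns = rng.normal(loc=0.8, scale=0.05, size=(20, 100))
mean_curve, sem_curve = mean_and_standard_error(fake_returns)
```

The error bars in a plot of this kind would then span `mean_curve - sem_curve` to `mean_curve + sem_curve` at each point.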
Source publication
We present ICU-Sepsis, an environment that can be used in benchmarks for evaluating reinforcement learning (RL) algorithms. Sepsis management is a complex task that has been an important topic in applied RL research in recent years. Therefore, MDPs that model sepsis management can serve as part of a benchmark to evaluate RL algorithms on a challeng...
Context in source publication
Context 1
... number of episodes and steps required for convergence and expected returns after convergence are shown in Table 5. Figure 4 shows the learning curves and average episode lengths for the five algorithms described in Section 5 when run on the Variant MDP, using the same methodology as explained in Section 5.1 for the experiments with ICU-Sepsis. Therefore, with respect to the second and third research questions, we see that the expected returns and average episode lengths of the learned policies are unusually high and do not reflect the values seen in the dataset. ...