(Left) Learning curves for five algorithms on the ICU-Sepsis MDP. (Right) Average episode lengths during learning. Each curve is averaged over 1,000 random seeds; error bars show one standard error.
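
As an illustration of how curves like these are typically aggregated, the following minimal sketch computes a per-point mean and standard error across seeds. The function name `mean_and_standard_error`, the array layout (one row per seed), and the synthetic data are assumptions for illustration; they are not taken from the source publication's code.

```python
import numpy as np


def mean_and_standard_error(per_seed_values):
    """Aggregate per-seed curves into a mean curve and its standard error.

    per_seed_values: array-like of shape (n_seeds, n_points), one row per seed
    (hypothetical layout, chosen for this sketch).
    """
    values = np.asarray(per_seed_values, dtype=float)
    n_seeds = values.shape[0]
    mean = values.mean(axis=0)
    # Standard error of the mean: sample standard deviation (ddof=1)
    # divided by the square root of the number of seeds.
    sem = values.std(axis=0, ddof=1) / np.sqrt(n_seeds)
    return mean, sem


# Synthetic stand-in data: 1,000 seeds, 50 evaluation points each.
rng = np.random.default_rng(0)
synthetic_curves = rng.normal(loc=0.8, scale=0.1, size=(1000, 50))
mean_curve, sem_curve = mean_and_standard_error(synthetic_curves)
```

Plotting `mean_curve` with a band of `mean_curve ± sem_curve` would give error bars of one standard error, matching the convention stated in the caption.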

Source publication
Preprint
We present ICU-Sepsis, an environment that can be used in benchmarks for evaluating reinforcement learning (RL) algorithms. Sepsis management is a complex task that has been an important topic in applied RL research in recent years. Therefore, MDPs that model sepsis management can serve as part of a benchmark to evaluate RL algorithms on a challeng...

Contexts in source publication

Context 1
... the second research question, we observe that even after extensive parameter tuning, all of these algorithms take hundreds of thousands of episodes (i.e., millions of steps) to converge. The average episode lengths are shown in Figure 2 (Right), which are roughly in line with the episode lengths seen in the MIMIC-III dataset, where the episodes had 13.27 steps on average. ...
Context 2
... 11 lists the best hyperparameters for each method found during the random search. These hyperparameters were used in the experiments, with results shown in Figure 2. ...