Figure 6 - ICU-Sepsis: A Benchmark MDP Built from Real Medical Data

Effects of removing some actions from the set of admissible actions on the learned policies as the probability of removing actions (σ) increases from 0 to 1. Each perturbation was done 32 times for each environment and the average and standard error of the results are shown. (a) The average return for different policies. (b) The average lengths of episodes for different policies.
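The perturbation and the reported statistics can be illustrated with a minimal sketch. It is not the authors' code: the function names, the dictionary representation of admissible actions, and the rule that at least one action is kept per state are assumptions made for illustration only.

```python
import math
import random
import statistics


def perturb_admissible(admissible, sigma, rng):
    """Drop each admissible action independently with probability sigma.

    `admissible` maps a state to a list of admissible actions (hypothetical
    representation). Assumption, not from the source: if every action in a
    state is dropped, one original action is kept at random so the
    perturbed MDP stays well defined.
    """
    perturbed = {}
    for state, actions in admissible.items():
        # rng.random() is in [0, 1), so sigma = 0 keeps everything
        # and sigma = 1 drops everything.
        kept = [a for a in actions if rng.random() >= sigma]
        if not kept:
            kept = [rng.choice(actions)]
        perturbed[state] = kept
    return perturbed


def mean_and_stderr(values):
    """Return the sample mean and the standard error of the mean."""
    mean = statistics.fmean(values)
    stderr = statistics.stdev(values) / math.sqrt(len(values))
    return mean, stderr


if __name__ == "__main__":
    rng = random.Random(0)
    admissible = {0: [0, 1, 2], 1: [1, 3]}  # toy example, not ICU-Sepsis data
    # 32 independent perturbations, matching the repeat count in the caption.
    runs = [perturb_admissible(admissible, 0.5, rng) for _ in range(32)]
    sizes = [sum(len(acts) for acts in run.values()) for run in runs]
    print(mean_and_stderr(sizes))
```

In an actual experiment the quantity averaged over the 32 perturbations would be the return or episode length of the policy learned on each perturbed environment, rather than the number of remaining actions used in this toy example.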