Figure 1 - uploaded by Luchen Li
Histograms of bootstrap samples of the WIS return on the test set. (a) Per-trajectory IS weights clipped to [1e-30, 1e4]. (b) Per-trajectory IS weights clipped to [1e-20, 1e3].
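The quantity plotted in the histograms can be illustrated with a minimal sketch, reading WIS as weighted importance sampling and IS as importance sampling. The caption does not give the authors' implementation, so everything here is an assumption: the function names `clipped_wis` and `bootstrap_wis`, the number of resamples, and the toy weights and returns are hypothetical. Only the two clipping ranges come from the caption.

```python
import random

def clipped_wis(weights, returns, lo, hi):
    """WIS estimate with per-trajectory IS weights clipped to [lo, hi].

    Assumes lo > 0, so the normalizing sum is always positive.
    """
    clipped = [min(max(w, lo), hi) for w in weights]
    total = sum(clipped)
    return sum(w * g for w, g in zip(clipped, returns)) / total

def bootstrap_wis(weights, returns, lo, hi, n_boot=1000, seed=0):
    """Resample trajectories with replacement and recompute the WIS estimate.

    Returns a list of n_boot estimates, i.e. the values one would histogram.
    """
    rng = random.Random(seed)
    n = len(weights)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(
            clipped_wis([weights[i] for i in idx],
                        [returns[i] for i in idx], lo, hi)
        )
    return estimates

# Hypothetical toy data: one tiny and one huge weight, so clipping matters.
toy_weights = [1e-40, 0.5, 2.0, 5e5]
toy_returns = [1.0, 0.0, 1.0, -1.0]

# The two clipping ranges named in panels (a) and (b).
samples_a = bootstrap_wis(toy_weights, toy_returns, 1e-30, 1e4)
samples_b = bootstrap_wis(toy_weights, toy_returns, 1e-20, 1e3)
```

A tighter upper bound, as in panel (b), limits how much a single high-weight trajectory can dominate each bootstrap estimate, which is one plausible reason the two histograms differ.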


Source publication (preprint)
Health-related data is noisy and stochastic in implying the true physiological states of patients, limiting information contained in single-moment observations for sequential clinical decision making. We model patient-clinician interactions as partially observable Markov decision processes (POMDPs) and optimize sequential treatment based on belief...
