Early Detection of Sepsis in the Emergency Department
using Dynamic Bayesian Networks
Senthil K. Nachimuthu, MD, PhD¹, Peter J. Haug, MD²,³
¹Division of Epidemiology, Department of Internal Medicine, University of Utah; ²Department of Biomedical Informatics, University of Utah; ³Intermountain Healthcare; Salt Lake City, UT
Abstract
Sepsis is a systemic inflammatory state due to an infection, and is associated with very high mortality and morbidity. Early diagnosis and prompt antibiotic and supportive therapy are associated with improved outcomes. Our objective was to detect the presence of sepsis soon after the patient arrives at the emergency department. We used Dynamic Bayesian Networks (DBNs), a temporal probabilistic technique for modeling a system whose state changes over time. We built, trained and tested the model using data from 3,100 patients admitted to the emergency department, and measured the accuracy of detecting sepsis using data collected within the first 3 hours, 6 hours, 12 hours and 24 hours after admission. The areas under the ROC curve were 0.911, 0.915, 0.934 and 0.944, respectively. We describe the data, data preparation techniques, model, results, various statistical measures and the limitations of our experiments. We also briefly discuss techniques to improve accuracy, and the generalizability of our methods to other diseases.
Corresponding author: Senthil K. Nachimuthu. Email: senthil.nachimuthu@hsc.utah.edu
INTRODUCTION
Sepsis is a heightened systemic immune response state due to an infection. It is defined as the combination of Systemic Inflammatory Response Syndrome (SIRS) and a confirmed or suspected infection, usually caused by bacteria. The infection may be localized to a part of the body, such as the lungs, skin, urinary tract, or bone, or it may be generalized. Sepsis may occur whether the infection stays localized or spreads to other parts of the body. Bacteremia (the presence of bacteria in blood) does not by itself denote sepsis in the absence of a systemic inflammatory response. Untreated or inadequately treated sepsis can progress to conditions known as severe sepsis and septic shock, which are associated with high mortality and morbidity. Early diagnosis of sepsis is essential for successful treatment.
Hence, we developed a Dynamic Bayesian Network (DBN) for the early detection of sepsis at the bedside in the emergency department. A DBN is a generalization of Bayesian Networks (BN) and Hidden Markov Models (HMM), in which the state transitions of the HMM are expressed using complex probabilistic interactions, as in a BN[1]. In a DBN, the input and output variables need not be predetermined and fixed, algorithms are available to handle missing data, and temporal relationships are modeled using the Markov property. For the test cases, we included data from only the first 24 hours after admission, and included only those variables that can be observed at the bedside or measured easily in the laboratory within this time. Our goal was to detect the presence or absence of sepsis within 24 hours after admission, when bacterial culture results are often still unavailable. We have previously described a preliminary study of detecting sepsis using DBNs[2]. The use of DBNs to model organ failure in patients with sepsis has been described by Peelen et al.[3]
BACKGROUND
The high mortality of sepsis warrants early diagnosis and treatment. Sepsis is responsible for nearly 10% of the ICU
admissions in the United States, totaling about 1 million cases nationwide every year[4]. The incidence rate of
severe sepsis in the United States is about 300 per 100,000 persons per year, with a total of 750,000 cases
nationwide per year. Direct costs per sepsis patient for ICU treatment in the United States have been estimated at
more than $40,000. Gram-negative bacteria have been implicated as the most common cause, followed by other bacteria and other pathogens.
The following definitions are from the American College of Chest Physicians (ACCP) and Society of Critical Care Medicine (SCCM) Consensus Conference, held in 1991 to establish common definitions for sepsis and related disorders, and were published in 1992[5]. Sepsis is defined as a systemic inflammatory reaction in response to an infection. In addition to SIRS, infection must be present or suspected to confirm a diagnosis of sepsis[5]. SIRS alone is not sufficient to confirm a diagnosis of sepsis, since SIRS can result from non-infectious causes such as pulmonary embolism, adrenal insufficiency, anaphylaxis, pancreatitis, and trauma[5].
In adults, Systemic Inflammatory Response Syndrome (SIRS) is defined as the presence of two or more of the following[5]:
1. Body temperature below 36 °C (degrees Celsius) or above 38 °C.
2. Tachycardia, with heart rate above 90 beats per minute.
3. Tachypnea (increased respiratory rate), with respiratory rate above 20 per minute, or arterial partial pressure of carbon dioxide (PaCO₂) less than 4.3 kPa (kilopascals), equivalent to 32 mmHg (millimeters of mercury).
4. White blood cell (WBC) count below 4,000/mm³ (cubic millimeter) or above 12,000/mm³, or the presence of more than 10% immature neutrophils (band forms).
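The two-of-four rule above can be written as a simple check. The following is a minimal Python sketch for illustration only, not part of the authors' DBN (which learns these relationships probabilistically); the function names are hypothetical, and treating a missing measurement as "criterion not met" is our assumption.

```python
from typing import Optional


def sirs_criteria_met(temp_c: Optional[float] = None,
                      heart_rate: Optional[float] = None,
                      resp_rate: Optional[float] = None,
                      paco2_mmhg: Optional[float] = None,
                      wbc_per_mm3: Optional[float] = None,
                      bands_pct: Optional[float] = None) -> int:
    """Count the adult SIRS criteria met; None (missing) never satisfies a criterion."""
    met = 0
    # 1. Temperature below 36 C or above 38 C
    if temp_c is not None and (temp_c < 36.0 or temp_c > 38.0):
        met += 1
    # 2. Heart rate above 90 beats per minute
    if heart_rate is not None and heart_rate > 90:
        met += 1
    # 3. Respiratory rate above 20/min, or PaCO2 below 32 mmHg
    if ((resp_rate is not None and resp_rate > 20)
            or (paco2_mmhg is not None and paco2_mmhg < 32)):
        met += 1
    # 4. WBC below 4,000/mm3 or above 12,000/mm3, or more than 10% bands
    if ((wbc_per_mm3 is not None and (wbc_per_mm3 < 4000 or wbc_per_mm3 > 12000))
            or (bands_pct is not None and bands_pct > 10)):
        met += 1
    return met


def has_sirs(**measurements: Optional[float]) -> bool:
    """SIRS is present when two or more of the four criteria are met."""
    return sirs_criteria_met(**measurements) >= 2


print(has_sirs(temp_c=38.6, heart_rate=112))  # fever + tachycardia -> True
```

Note that WBC count and band percentage share a single criterion, so abnormal values in both still count only once.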
When sepsis is associated with organ dysfunction, such as damage to vital organs (Multiple Organ Dysfunction Syndrome, MODS), decreased perfusion, or hypotension, it is termed severe sepsis. Sepsis-induced hypotension is defined as a systolic pressure below 90 mmHg or a reduction of more than 40 mmHg from the baseline systolic blood pressure, in the absence of other causes of hypotension[5]. Sepsis can progress to a condition known as septic shock, which is indicated by hypotension (a fall in blood pressure) that does not respond to fluid replacement or vasopressor drugs[5].
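A minimal sketch of the hypotension criterion above, assuming a baseline systolic pressure is supplied when one is known; the function name is hypothetical, and excluding other causes of hypotension is left to clinical judgment.

```python
from typing import Optional


def sepsis_induced_hypotension(sbp_mmhg: float,
                               baseline_sbp_mmhg: Optional[float] = None) -> bool:
    """True if systolic BP < 90 mmHg or has fallen > 40 mmHg below baseline.

    Excluding other causes of hypotension is a clinical judgment
    that this check does not attempt.
    """
    if sbp_mmhg < 90:
        return True
    return baseline_sbp_mmhg is not None and baseline_sbp_mmhg - sbp_mmhg > 40


print(sepsis_induced_hypotension(100, baseline_sbp_mmhg=150))  # 50 mmHg drop -> True
```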
Sepsis is a rapidly worsening clinical condition. Given the fast rate of change in the physiological parameters, the evolving clinical condition of sepsis patients lends itself well to a temporal probabilistic model such as a Dynamic Bayesian Network. Our objective was to detect sepsis early, even before many laboratory test results become available, ideally within the first few hours after admission.
MATERIALS AND METHODS
At LDS Hospital (LDSH) and Intermountain Medical Center (IMC), two tertiary care hospitals of Intermountain
Healthcare in Salt Lake City, Utah, USA, the prevalence of sepsis in patients who directly present at the emergency
department is between 1.7% and 2%. Clinical literature shows that patients with sepsis have high mortality and morbidity if they are not treated immediately and aggressively. However, the results of confirmatory laboratory tests for infection may take several hours to arrive, since culture and susceptibility tests cannot be completed immediately. Many patients have atypical presentations and may not show a clear picture of SIRS. A clinical decision support system for the early detection of sepsis is therefore highly desirable to assist clinicians. Sepsis is a good candidate for early detection by clinical decision support systems, since the components of SIRS are easily measured at the bedside or, in the case of WBC and band counts, can be obtained from the laboratory in a short time.
We wanted to use a temporal probabilistic model for the early detection of sepsis, and to understand how the accuracy of the inference changes over time as more data become available. We used the Projeny toolkit[6], which we developed, to build, train and test the Dynamic Bayesian Network models. We prepared our data using a sequence of steps and used the resulting data sets to train and test DBN models for the early detection of sepsis in the emergency department.
The Data Set
We obtained a data set of about 3,100 patients treated at Intermountain Healthcare, consisting of 20% cases (patients
who had sepsis) and 80% controls (patients without sepsis). We used the anonymized data set for our sepsis
detection modeling. The data elements available in the raw data set were the patients' vital signs (heart rate, respiratory rate, body temperature, systolic blood pressure, diastolic blood pressure, and PaCO₂); the patients' lab test results (WBC count, bands percentage); and general encounter information (patient's age, date of admission, date of discharge, etc.). The data set also contained a variable named Sepsis, which was entered by a clinician during a retrospective review done for clinical research. Mean blood pressure, or mean arterial pressure, is the weighted average of the systolic and diastolic pressures. It is calculated using the formula
MAP = DP + (SP-DP)/3
where MAP denotes mean arterial pressure, SP denotes systolic pressure and DP represents diastolic pressure. This
formula is applicable to adult patients when the blood pressure is not extremely high or low. The variables included
in our model and their probabilistic relationships within and across timeslices are shown in figure 1.
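The formula can be checked with a short sketch; the function name is illustrative only.

```python
def mean_arterial_pressure(sp: float, dp: float) -> float:
    """MAP = DP + (SP - DP) / 3, in the same units as the inputs (mmHg)."""
    return dp + (sp - dp) / 3.0


print(round(mean_arterial_pressure(120, 80), 1))  # 93.3
```

Algebraically, DP + (SP - DP)/3 equals (SP + 2·DP)/3, which makes explicit that diastolic pressure is weighted twice as heavily as systolic pressure.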
The data set did not have information about blood (or other specimen) culture and sensitivity results from the
microbiology laboratory, information signifying multi-organ dysfunction syndrome (MODS), or treatment
information such as the administration of IV fluids and vasopressors, which help with a diagnosis of septic shock.
Hence, we did not have the necessary clinical variables for diagnosing severe sepsis or septic shock.
We did not have clinical information denoting suspected or confirmed infection (culture and sensitivity results, clinical notes, etc.). However, we wanted to model a sepsis detection system with the data that were available. Our objective was to detect signs and symptoms that indicate impending or existing sepsis, which in turn can trigger treatment to prevent or treat sepsis, rather than to confirm the presence of sepsis.
Vital signs were the most numerous type of data in our data set, and they were often measured at intervals of 1 to 2 hours, even though some measurements were up to 20 hours apart. Hence, we used a timeslice width of 1 hour in our DBN models, to help with the early detection of sepsis within a few hours after admission. The lab tests were not measured at such frequent intervals, and not all vital signs were measured at 1-hour intervals. Hence, our temporally aggregated and transformed data set contained a large amount of missing data. Our data preparation, data aggregation, temporal data abstraction and data discretization techniques are presented in detail in a separate publication[7], and a brief description is provided below.
Data Preparation
The accuracy of inferences performed by a machine learning system depends on the quality of data provided to the
system. The input data determine the accuracy of the model and the technique, because many machine learning
systems learn the structure of the model, the parameters or both from the training data. Machine learning systems
such as Dynamic Bayesian Networks cannot directly use the raw data obtained from an electronic medical record
system. The data need to be preprocessed and transformed into an appropriate format before they can be used by a
Dynamic Bayesian Network-based model or system.
Probabilistic machine learning models require data in a continuous or discrete format. They cannot use unstructured
or free-text data. The algorithms we used require discrete data. Clinical data from various sources need to be
compiled together and transformed into a time-stamped, discretized format with a common structure for use by
automated tools.
At Intermountain Healthcare's LDS Hospital in Salt Lake City, the data are captured and stored in a well-structured
form using an information model and an enterprise reference terminology. The original electronic medical record system, HELP (Health Evaluation through Logical Processing), encodes the data using a hierarchical data dictionary known as PTXT (Pointer to text), and stores the data in the HELP database, mostly in encoded and partly in free-text
forms[8]. Intermountain Healthcare also has a newer electronic medical record system known as HELP2, which uses
a multihierarchical concept-based Healthcare Data Dictionary (HDD). The HELP2 data are stored in a Clinical Data
Repository (CDR). HDD and CDR were developed in collaboration with 3M Health Information Systems, Inc.[9]
The data from HELP and HELP2 are highly structured and encoded using biomedical terminologies, and are usable
for clinical documentation as well as decision support. However, all the data required for the temporal probabilistic
models needed to be aggregated and abstracted before they could be used by the temporal reasoning tools.
Data Aggregation
The raw clinical data were available in an entity-time-attribute-value table format. Each row in the laboratory or vital signs table had columns identifying the patient and the date/time-stamp of the observation. The tables had additional columns that specified the data element and its value. The name, attribute and value of an observation could each occupy a single column or be spread across a group of columns. A simple data element such as heart rate may be present in a single column. When the data convey complex information, such as blood pressure, which includes the systolic pressure, diastolic pressure, body site, posture of the patient, and the device used, the information needs to be post-coordinated from multiple pieces of data.
A denormalized table format was required to support the temporal reasoning tools used in our experiments. Two columns of the denormalized table held the patient identifier and the date/time-stamp, respectively. Unlike the source data tables, however, the remaining columns were not in attribute-value format. Instead, each additional column represented a single, meaningful, reconstituted clinical variable.
Different clinical variables were measured with different frequency and periodicity in the clinical setting. For example, vital signs were measured once every fifteen minutes to an hour, whereas lab tests were performed less often. Data elements that were measured together sometimes did not have the same date/time-stamp; they were often 1 to 15 minutes apart. Hence, storing them in the denormalized table produced several rows in which only a handful of columns were populated. For each patient, we first loaded the timestamps and values of the most numerous clinical variable into the denormalized table. We then selected the second most numerous clinical variable. If the patient identifier and timestamp of a given row of this second variable already existed in the denormalized table, we updated that row to store the value of the second variable in its own column. If the combination of patient identifier and timestamp did not exist, a new row was inserted into the denormalized table with this value. This process was repeated for all clinical variables in the data set.
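The merge procedure above can be sketched in Python, assuming the source rows have already been reduced to hypothetical (patient_id, timestamp, variable, value) tuples; the original work used relational database tables, and the function name and example values here are illustrative. In an in-memory dictionary the merge order does not change the final table; it is kept only to mirror the described procedure.

```python
from collections import Counter, defaultdict


def denormalize(eav_rows):
    """Merge (patient_id, timestamp, variable, value) rows into one row per
    (patient_id, timestamp) with one column per clinical variable.
    A variable absent from a row's dict plays the role of a null column."""
    counts = Counter(var for _, _, var, _ in eav_rows)
    by_var = defaultdict(list)
    for pid, ts, var, value in eav_rows:
        by_var[var].append((pid, ts, value))

    table = {}  # (patient_id, timestamp) -> {variable: value}
    # Most numerous variable first, then the next, and so on
    columns = [var for var, _ in counts.most_common()]
    for var in columns:
        for pid, ts, value in by_var[var]:
            # Update the row if (patient, timestamp) exists, else insert a new row
            table.setdefault((pid, ts), {})[var] = value
    return columns, table


rows = [
    ("p1", "2008-01-01 10:00", "HeartRate", 104),
    ("p1", "2008-01-01 10:00", "RespRate", 22),
    ("p1", "2008-01-01 10:05", "HeartRate", 110),
    ("p1", "2008-01-01 10:07", "WBCCount", 14200),
]
columns, table = denormalize(rows)
for key, row in sorted(table.items()):
    print(key, row)
```

The three distinct timestamps produce three sparse rows, which is exactly the sparsity the temporal abstraction step is meant to reduce.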
Temporal Data Abstraction
At the end of the data aggregation step, all the data reflecting the clinical variables in the model were stored in a single denormalized relational database table, which was used to train and test the model. However, the timestamps of the rows differed by a few minutes to a few hours, producing a very sparse data table. Training on a very sparse data table necessitates the use of the expectation maximization (EM) algorithm, which increases the computational expense of training and reduces the accuracy of the learned parameters. The data can instead be temporally consolidated by picking one representative data point per variable for each of the smallest time intervals represented in the model, which reduces the need to impute missing values using the EM algorithm.
The smallest time interval to be supported by the model is based on both the nature of the model and the availability of data. The sepsis data set consisted of both cases and controls from the emergency department, and the model consisted mostly of vital signs, which were available once every hour in most cases. Hence, a timeslice interval of 1 hour was chosen for the sepsis models, and a representative data point was chosen for each clinical variable during every 1-hour interval.
In the sepsis data set, some clinical variables had multiple values within a given timeslice interval, and others had none. Missing data can mean a variety of things: the data were not measured; they were measured and then lost; or they were uneventful and in line with the expected values given the prior measurements, and hence not recorded. It may also mean that a value was recorded on paper or was stored in a different part of the electronic medical record system, and hence was unavailable at the time of data preparation. Several approaches have been discussed to define and overcome the missing data problem. Little and Rubin classify missing data as missing completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR)[10].
We did not apply special treatment for missing data in our experiments. Bayes Net Toolbox (BNT), which
implemented all the algorithms we needed to train and test the models, supported parameter learning with missing
values using the Expectation Maximization (EM) algorithm. Hence, we were able to leave missing values as null
values in the database, and we designed our temporal modeling toolkit, Projeny, to support null values in the data set
and to call the EM algorithm in BNT.
Approaches to choosing a representative data point for a variable when multiple data points are available in a temporal data set are discussed by various authors in the context of temporal data clustering[11-13]. Some simple approaches to temporal data sampling include selecting the average or selecting the most abnormal measurement. We decided to select the average, since it can be applied automatically to all numerical variables.
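A minimal sketch of this temporal abstraction for one patient, assuming hypothetical (timestamp, variable, value) tuples and a known admission time: each observation is assigned to the 1-hour timeslice it falls in, values within a slice are averaged, and slices with no observation are left as None (null) for EM to handle, as described above. The function name is illustrative.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def abstract_to_timeslices(observations, admit_time, variables, n_slices,
                           width=timedelta(hours=1)):
    """Return n_slices dicts mapping each variable to the mean of its values
    in that slice, or None when the variable was not observed in the slice."""
    buckets = defaultdict(list)  # (slice index, variable) -> values
    for ts, var, value in observations:
        k = (ts - admit_time) // width  # timedelta // timedelta gives an int
        if 0 <= k < n_slices and var in variables:
            buckets[(k, var)].append(value)
    result = []
    for k in range(n_slices):
        row = {}
        for var in variables:
            vals = buckets.get((k, var))
            row[var] = sum(vals) / len(vals) if vals else None
        result.append(row)
    return result


admit = datetime(2008, 1, 1, 10, 0)
obs = [
    (admit + timedelta(minutes=5), "HeartRate", 100),
    (admit + timedelta(minutes=50), "HeartRate", 110),   # same slice as above
    (admit + timedelta(minutes=130), "HeartRate", 120),  # slice 2
]
slices = abstract_to_timeslices(obs, admit, ["HeartRate", "RespRate"], 3)
for k, row in enumerate(slices):
    print(k, row)
```

Swapping the averaging line for one that keeps the most abnormal value would implement the other simple strategy mentioned above.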
At the end of this process, we had a temporally abstracted denormalized table, with one data point or a null value for
each variable per timeslice per patient. The numerical data were continuous in this data set. The data can be used in
this form with models and algorithms that support continuous data. However, we designed our models with entirely
discrete variables, since models with discrete variables are computationally less expensive and more tractable than
models with continuous variables. Furthermore, only discrete variables were supported by our toolkits. Hence, we
chose to discretize the continuous variables in our data set.
Data Discretization
Several data discretization methods are available to discretize continuous data for use with machine learning algorithms. It is helpful to evaluate the distribution of the continuous variables before choosing the discretization technique and any manually selected cut-off points. We used histograms and cluster analysis to study the distribution of each continuous variable individually and to look for clusters. We found no visually discernible clustering; many continuous variables in our data set formed one large cluster each, with a few outliers. We tried equal-interval, domain-knowledge-based, k-means clustering and minimum description length (MDL) discretization techniques. The MDL algorithm provided the most accurate results, and is described here. The lessons we learned from the other discretization algorithms, and how choosing the correct algorithm can drastically affect the accuracy of a model, are described in a separate publication[7].
The minimum description length (MDL) algorithm described by Fayyad and Irani (1993) finds the smallest number of intervals of an input variable required to describe the variation in the output variable[14]. All the relationships in our models were directional, so the variation in one variable can be described straightforwardly in terms of the variation in one or more other variables: the variation in the child nodes can be explained in terms of the variation in their parent nodes. Hence, the MDL algorithm seemed appropriate for discretizing the variables in our models. The algorithm sorts the input values and then recursively splits their distribution at the cut-off points that most reduce the class entropy of the resulting intervals, accepting a split only when it is justified by the MDL criterion[14]. We used the MDL algorithm implemented in Weka[15] to discretize our data.
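For illustration, the Fayyad-Irani procedure can be sketched as follows. This is a simplified re-implementation written for this description, not the Weka code the authors used; the helper names and the example heart-rate values and labels are hypothetical. Each recursive step picks the cut point with the lowest weighted class entropy and keeps it only if the information gain exceeds the MDL threshold (log2(N - 1) + delta) / N.

```python
import math
from collections import Counter


def entropy(labels):
    """Class entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0


def _best_split(xs, ys):
    """(weighted entropy, index i) minimizing the entropy of ys[:i] and ys[i:]."""
    n, best = len(ys), None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # only cut between distinct values
        e = (i / n) * entropy(ys[:i]) + ((n - i) / n) * entropy(ys[i:])
        if best is None or e < best[0]:
            best = (e, i)
    return best


def _mdlp(xs, ys, cuts):
    n = len(ys)
    best = _best_split(xs, ys) if n >= 2 else None
    if best is None:
        return
    e, i = best
    left, right = ys[:i], ys[i:]
    ent_s = entropy(ys)
    gain = ent_s - e
    k, k1, k2 = len(set(ys)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (k * ent_s - k1 * entropy(left) - k2 * entropy(right))
    if gain <= (math.log2(n - 1) + delta) / n:
        return  # MDL criterion rejects the split
    cuts.append((xs[i - 1] + xs[i]) / 2)
    _mdlp(xs[:i], left, cuts)
    _mdlp(xs[i:], right, cuts)


def mdlp_cut_points(values, labels):
    """Sorted cut points for one continuous variable given class labels."""
    pairs = sorted(zip(values, labels))
    cuts = []
    _mdlp([v for v, _ in pairs], [y for _, y in pairs], cuts)
    return sorted(cuts)


heart_rate = [70, 72, 75, 78, 80, 82, 85, 88, 110, 115, 120, 125, 130, 135, 140, 145]
sepsis = [False] * 8 + [True] * 8
print(mdlp_cut_points(heart_rate, sepsis))  # [99.0]
```

Because the stopping rule depends on the class labels, a variable whose values do not help separate the classes receives no cut points at all, which is one reason supervised discretization tends to yield small state spaces.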
We found that MDL discretization produced a much smaller number of discrete states for most of the continuous
variables in the sepsis model. Using the MDL algorithm also reduced the time taken for training and testing the
model, and produced higher accuracy than other discretization techniques.
Model Structure, Training and Testing
A Dynamic Bayesian Network (DBN), also known as a Temporal Bayesian Network, allows complex causal relationships within and across time instances to be represented as a directed acyclic graph (DAG) with Bayesian probabilities[1]. A DBN can be considered a generalization of a Hidden Markov Model (HMM), in which the probabilistic relationships are represented as complex interactions and dependencies among many variables, as in a Bayesian Network. The temporal processes in the model are designed as time-invariant Markov processes. This reduces the computational complexity and the number of parameters required to describe the model, since the conditional probabilities learned from a two-timeslice DBN can be reused to infer the variables in test cases of arbitrary length. A detailed description of Hidden Markov Models is provided in [16].
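The reuse of a single set of transition probabilities for sequences of any length can be illustrated with the forward (filtering) algorithm for the simplest DBN, an HMM with one hidden variable. This is a minimal sketch with hypothetical probabilities, not the junction-tree inference that BNT performs on the full model; a missing observation (None) simply skips the evidence update, which is one simple way such a model can tolerate gaps in the data.

```python
def forward_filter(prior, transition, emission, observations):
    """Filtered P(hidden state at t | observations up to t) for every t.

    The same transition and emission tables are reused at every step, so
    sequences of any length can be processed (a time-invariant Markov model).
    """
    n = len(prior)
    belief = list(prior)
    history = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict: push the previous belief through the transition table
            belief = [sum(belief[i] * transition[i][j] for i in range(n))
                      for j in range(n)]
        if obs is not None:
            # Update: weight each state by the likelihood of the observation
            belief = [belief[j] * emission[j][obs] for j in range(n)]
        total = sum(belief)
        belief = [b / total for b in belief]
        history.append(belief)
    return history


# Hypothetical two-state example: state 0 = no sepsis, 1 = sepsis;
# observation 0 = normal heart rate, 1 = tachycardia, None = not measured.
prior = [0.8, 0.2]
transition = [[0.95, 0.05],
              [0.02, 0.98]]
emission = [[0.9, 0.1],
            [0.3, 0.7]]
beliefs = forward_filter(prior, transition, emission, [1, None, 1])
for t, b in enumerate(beliefs):
    print(f"t = {t} h: P(sepsis) = {b[1]:.3f}")
```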
We created a two-timeslice Dynamic Bayesian Network using the nodes and edges shown in figure 1. We created several models; only the latest and most accurate model is described in this article. The model was built using the Projeny toolkit that we created. Projeny is a front-end application written in Java that provides a user-friendly graphical environment to create, train and test Dynamic Bayesian Network models[6]. It also allows easy data binding between the nodes in the model and the columns of a relational database table. Projeny allows the user to call the Bayes Net Toolbox (BNT)[17], an implementation of DBN algorithms that runs inside Matlab, for both training and testing, and then saves the results from the inference algorithm in a relational database table. Projeny is based on the source code of Bayesian Network tools in Java (BNJ)[18], and it uses the JMatLink library[19] to communicate with Matlab. Projeny is released as open source software under the GNU GPL v2 license[20].
[Figure 1: two-timeslice DBN with the nodes Age, Sepsis, DBP, SBP, HeartRate, BodyTemp, WBCCount, Bands, RespRate and PaCO2 in timeslice t-1 and again in timeslice t]
Figure 1. Sepsis modeled as a two timeslice Dynamic Bayesian Network
We first created the nodes of the two timeslice DBN model, and then we defined the states of the nodes as described
earlier under the data discretization section. The variables included in our model (and their names used in figure 1)
are clinicians’ diagnosis of sepsis (Sepsis), patient’s age (Age), systolic blood pressure (SBP), diastolic blood
pressure (DBP), heart rate (HeartRate), respiratory rate (RespRate), body temperature (BodyTemp), WBC count
(WBCCount), PaCO₂ (PaCO2) and percentage of immature neutrophils (Bands). We then created the intra-slice
(atemporal, shown in blue) and inter-slice (temporal, shown in green) edges (probabilistic relationships), as shown
in figure 1. When we train and test the model, BNT automatically unrolls (expands) the model to as many
timeslices as are present in the data of each patient, and learns the probabilities or infers the marginal probability
distributions. BNT uses the EM algorithm[21] to learn the conditional probability tables in spite of missing data.
In this model, age and sepsis are connected through the nodes that represent systolic blood pressure, diastolic blood pressure, heart rate, respiratory rate and WBC count, each of which has both age and sepsis as parents; age and sepsis together explain the variation in these physiological parameters. Because these five nodes are converging (collider) nodes, age and sepsis are d-separated, and hence conditionally independent, when none of the five physiological parameters is known[1]. Observing any of them in general makes age and sepsis dependent.
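In the structure described above, age and sepsis are both parents of the same physiological nodes (a converging, or collider, structure), and the resulting independence can be checked numerically. The sketch below uses a hypothetical three-node fragment (Age -> HeartRate <- Sepsis) with invented probabilities: when heart rate is unobserved, P(Sepsis | Age) equals the prior P(Sepsis) for both age groups, but once tachycardia is observed, age changes the probability of sepsis (an older patient's tachycardia is partly "explained away" by age).

```python
# Hypothetical fragment of the model: Age -> HeartRate <- Sepsis (all binary).
P_OLD = 0.3
P_SEPSIS = 0.2
P_TACHY = {  # P(tachycardia | old, sepsis)
    (False, False): 0.10, (False, True): 0.80,
    (True, False): 0.30, (True, True): 0.85,
}


def joint(old, sepsis, tachy):
    """P(Age = old, Sepsis = sepsis, HeartRate = tachy) from the factorization."""
    p = (P_OLD if old else 1 - P_OLD) * (P_SEPSIS if sepsis else 1 - P_SEPSIS)
    p_t = P_TACHY[(old, sepsis)]
    return p * (p_t if tachy else 1 - p_t)


def p_sepsis_given(old, tachy=None):
    """P(Sepsis | Age [, HeartRate]) by enumeration; tachy=None means unobserved."""
    hr_values = (True, False) if tachy is None else (tachy,)
    num = sum(joint(old, True, t) for t in hr_values)
    den = sum(joint(old, s, t) for s in (True, False) for t in hr_values)
    return num / den


# Heart rate unobserved: age carries no information about sepsis (both ~0.2).
print(p_sepsis_given(old=True), p_sepsis_given(old=False))
# Tachycardia observed: age now changes the probability of sepsis.
print(p_sepsis_given(old=True, tachy=True), p_sepsis_given(old=False, tachy=True))
```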
The discretized data set was divided into a training data set and a test data set. Two-thirds of the anonymized patients were randomly allocated to the training data set, and the remaining one-third to the test data set. Training was performed using an EM-based parameter learning algorithm implemented in BNT, with a maximum of 168 timeslices (approximately 7 days since admission) for each patient. Training completed successfully in less than 9 hours on a computer with two quad-core 2.25 GHz Intel Xeon processors, 24 GB of RAM and 32 GB of swap space, with less than 7 GB of memory used by Matlab.
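Allocating whole patients, rather than individual rows, keeps all of a patient's timeslices in the same set. A minimal sketch of such a split, with a hypothetical function name and a fixed random seed for reproducibility (the seed is our assumption; the study does not report one):

```python
import random


def split_patients(patient_ids, train_fraction=2 / 3, seed=0):
    """Randomly assign whole patients to a training set and a test set."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return set(ids[:n_train]), set(ids[n_train:])


train_ids, test_ids = split_patients(range(3100))
print(len(train_ids), len(test_ids))
```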
The values of Sepsis were replaced with null values for all timeslices of all patients in the test data set. Our goal was to repeat testing with data sets having different numbers of timeslices, so that we could simulate testing after the patient had been in the hospital for increasing durations of time. Since our goal was to detect sepsis within 24 hours after admission, the test data set was divided into four separate data sets having up to 3, 6, 12 and 24 timeslices for each patient. Testing was performed on each of these data sets using a junction-tree algorithm[22]. Because the first timeslice was measured when the patient arrived at the emergency department (t = 0 hours after admission), these data sets correspond to approximately 2, 5, 11 and 23 hours after admission, respectively.
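The construction of the four test data sets can be sketched as follows, assuming each patient's data are a list of hourly dicts with slice 0 at admission; the function name and data layout are hypothetical. The loop at the end reproduces the arithmetic above: with slice 0 at t = 0, the last of h slices begins h - 1 hours after admission.

```python
def make_test_sets(patient_slices, horizons=(3, 6, 12, 24), target="Sepsis"):
    """For each horizon h, keep the first h hourly slices of every patient
    and hide the target variable by setting it to None (null).
    New dicts are built, so the input data are left unchanged."""
    return {
        h: {pid: [{**row, target: None} for row in slices[:h]]
            for pid, slices in patient_slices.items()}
        for h in horizons
    }


# Slice 0 is recorded at t = 0 h, so the last of h slices starts h - 1 hours later.
for h in (3, 6, 12, 24):
    print(f"{h} timeslices -> data up to about {h - 1} hours after admission")
```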
RESULTS
Sepsis is a binary variable in our model, with values of true and false. The goal of our DBN model was to detect
signs and symptoms indicating impending or existing sepsis, to trigger interventions to prevent or treat sepsis.
Hence, the presence of sepsis as entered by the clinician provided a reference standard against which our DBN
model's inferences were compared. This makes our model similar to laboratory tests that are performed to detect
specific diseases.
Given the similarity of our sepsis detection models to laboratory tests that detect a disease, our DBN models were evaluated using the same techniques applied to laboratory tests. We analyzed our model's inferences in terms of sensitivity, specificity, positive predictive value, negative predictive value, F-measure and area under the ROC (receiver operating characteristic) curve.
The clinician-entered values of sepsis, which served as the reference standard (the disease), were left intact in the training data set. The clinician-entered values of sepsis in the test data set were hidden from the inference algorithm during the test iteration, and were later used to evaluate the performance of the predicted values. All other variables were left intact in both the training and test data sets. The state of sepsis estimated by the DBN model was considered analogous to a lab test finding: if the probability of sepsis estimated by the DBN model was equal to or above 0.5, it was considered a positive test; if it was below 0.5, it was considered a negative test.
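A minimal sketch of this thresholding rule, together with the 2x2 summary measures used in this section; the function names are hypothetical. Applied to the counts in Table 1.a, summary_measures reproduces the three-timeslice values reported in Table 2 (sensitivity 0.68902, specificity 0.94881, PPV 0.71519, NPV 0.94237, F-measure 0.70186).

```python
def confusion_counts(p_sepsis, actual, threshold=0.5):
    """Return (TP, FP, FN, TN); a probability >= threshold is a positive test."""
    tp = fp = fn = tn = 0
    for p, truth in zip(p_sepsis, actual):
        positive = p >= threshold
        if positive and truth:
            tp += 1
        elif positive:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summary_measures(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, NPV and F-measure from a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    f_measure = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "f_measure": f_measure}


# Counts from Table 1.a (three timeslices)
measures = summary_measures(tp=113, fp=45, fn=51, tn=834)
print({name: round(value, 5) for name, value in measures.items()})
```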
A 2x2 confusion matrix and standard epidemiologic techniques could then be applied to calculate the sensitivity, specificity, positive predictive value, negative predictive value, and F-measure. The ROC curve was constructed and the area under the ROC curve was calculated in Microsoft Excel, using a procedure described by Morrison[23]. A confusion matrix was constructed with the actual values of sepsis considered as the disease and the estimated values of sepsis considered as the test. The confusion matrices are shown in table 1 (sub-tables 1.a through 1.d). The ROC curves and the area under the ROC curve (AUC) obtained from the model using test data sets with 3, 6, 12 and 24 timeslices are shown in figure 2 (sub-figures 2.a through 2.d).
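Morrison's Excel procedure is not reproduced here. As a generic illustration only, the sketch below computes the AUC from predicted probabilities with the trapezoidal rule, sweeping the decision threshold from the highest score to the lowest; the function name and example are hypothetical. This trapezoidal area equals the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve by the trapezoidal rule.

    The threshold is lowered through every distinct score (highest first);
    cases with tied scores are moved together, giving a diagonal segment.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    auc = 0.0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Every case with this score now counts as a positive test
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr, fpr = tp / n_pos, fp / n_neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2  # trapezoid area
        prev_tpr, prev_fpr = tpr, fpr
    return auc


scores = [0.9, 0.7, 0.6, 0.4]
labels = [True, False, True, False]
print(roc_auc(scores, labels))  # 0.75
```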
Table 1. Confusion matrices for detecting sepsis using test data sets with 3, 6, 12 and 24 timeslices.

Table 1.a. Three timeslices
Sepsis           Actual Yes   Actual No   Total
Estimated Yes           113          45     158
Estimated No             51         834     885
Total                   164         879    1043

Table 1.b. Six timeslices
Sepsis           Actual Yes   Actual No   Total
Estimated Yes           116          44     160
Estimated No             48         835     883
Total                   164         879    1043

Table 1.c. Twelve timeslices
Sepsis           Actual Yes   Actual No   Total
Estimated Yes           134          45     179
Estimated No             30         834     864
Total                   164         879    1043

Table 1.d. Twenty four timeslices
Sepsis           Actual Yes   Actual No   Total
Estimated Yes           141          48     189
Estimated No             23         831     854
Total                   164         879    1043
Figure 2. Area under the ROC curve (AUC) obtained using test data sets with 3, 6, 12 and 24 timeslices
[ROC curves plotting sensitivity (TPR) against 1 - specificity (FPR); 2.a: 3 timeslices, AUC = 0.91102; 2.b: 6 timeslices, AUC = 0.91499; 2.c: 12 timeslices, AUC = 0.93362; 2.d: 24 timeslices, AUC = 0.94353]
The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F-measure and the
area under the ROC curve (AUC) are given in table 2.
Table 2. Comparison of statistical measures of models using test data sets with 3, 6, 12 and 24 timeslices.
Measure                3 timeslices   6 timeslices   12 timeslices   24 timeslices
Sensitivity (recall)        0.68902        0.70732         0.81707         0.85976
Specificity                 0.94881        0.94994         0.94881         0.94539
PPV (precision)             0.71519        0.72500         0.74860         0.74603
NPV                         0.94237        0.94564         0.96528         0.97307
F-measure                   0.70186        0.71605         0.78134         0.79887
AUC                         0.91102        0.91499         0.93362         0.94353
Figure 3 shows how the statistical measures change as the number of timeslices in the test data set increases: sensitivity, NPV, F-measure and AUC increase; PPV increases up to 12 timeslices and then remains nearly constant; and specificity remains at about 0.95.
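Figure 3. Comparison of statistical measures using test data sets with 3, 6, 12 and 24 timeslices.
[Line chart of sensitivity (recall), specificity, PPV (precision), NPV, F-measure and AUC, on a scale from 0.5 to 1.0, for the test data sets TS3, TS6, TS12 and TS24]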
DISCUSSION
The smallest number of timeslices supported by our algorithms is three. A three-timeslice test data set includes data collected at approximately 0, 1 and 2 hours after admission to the emergency department. Hence, in the first experiment we tried to detect the presence of sepsis in less than 3 hours (approximately two hours, plus or minus a few minutes). With three timeslices, the model shows a sensitivity of 0.69, a specificity of 0.95, a positive predictive value of 0.72, a negative predictive value of 0.94 and an area under the curve of 0.91.
These results demonstrate that DBN methods can detect sepsis in emergency department patients within about two
hours of admission. The model uses variables that are mostly collected at the bedside, plus WBC count and bands
percentage, which the laboratory can report quickly. These measures again show that the model is more specific than
it is sensitive.
However, sensitivity increases as more data become available, reaching 0.71, 0.82 and 0.86 in the 6, 12 and 24
timeslice models. The area under the ROC curve, NPV and F-measure also increase, and PPV rises from 0.72 to about
0.75, while specificity remains roughly constant at about 0.95.
[Figure 3 chart: y-axis 0.5 to 1.0; x-axis TS3, TS6, TS12 and TS24; series: sensitivity (recall), specificity, PPV (precision), NPV, F-measure and AUC]
It must be noted that the prior probability of sepsis in the emergency departments of Intermountain Healthcare was
about 2%, whereas our data set was enriched to 20% cases and 80% controls. The real-world prior probability of
sepsis (0.02) is thus only 1/10 of the prior probability in our data set (0.2). Because PPV and NPV depend on
prevalence, they will change in real-world use: at the lower prevalence, PPV would be expected to fall and NPV to
rise. Hence, further testing and validation are needed before this model can be used to detect sepsis in an
emergency department.
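The direction of that change follows from Bayes' theorem. The sketch below recomputes PPV and NPV at different prevalences from the 24-timeslice sensitivity and specificity, under the assumption (not tested in the paper) that sensitivity and specificity carry over unchanged to the real-world population. At the test set's own case fraction (164 of 1,043 patients) it reproduces the PPV and NPV in Table 2; the function name `predictive_values` is ours.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV implied by sensitivity and specificity at a given prevalence."""
    tp = sensitivity * prevalence              # expected true-positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # expected false-positive fraction
    fn = (1 - sensitivity) * prevalence        # expected false-negative fraction
    tn = specificity * (1 - prevalence)        # expected true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# 24-timeslice model, from the confusion matrix counts.
sens, spec = 141 / 164, 831 / 879
for prevalence in (164 / 1043, 0.20, 0.02):
    ppv, npv = predictive_values(sens, spec, prevalence)
    print(f"prevalence {prevalence:.3f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

Under this assumption, moving from the enriched data to a 2% prevalence lowers PPV sharply while pushing NPV closer to 1, which is why the enriched-sample PPV should not be read as a real-world alarm precision.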
Most probabilistic temporal reasoning experiments in the biomedical domain use Hidden Markov Models, which in turn
use equal-interval or equal-frequency data preparation techniques. We have shown that information-content-based
methods of data preparation, such as the MDL algorithm, which uses the variation in the dependent variable to
discretize each independent variable, yield smaller but more meaningful state spaces. This is a novel application
and finding in medicine, a field in which probabilistic temporal reasoning methods have not been used extensively.
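The contrast between the two families of discretization can be sketched as follows. Equal-width and equal-frequency binning ignore the outcome, whereas an entropy-based MDL method places a cut only where it reduces uncertainty about the class label by more than the cost of describing the cut. This sketch assumes the MDL algorithm in question is the Fayyad and Irani multi-interval criterion (listed as reference [14]); the paper's actual implementation may differ, and all function names and the toy heart-rate data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def equal_width_cuts(values, bins):
    """Cut points splitting the value range into equally wide bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(1, bins)]


def equal_frequency_cuts(values, bins):
    """Cut points giving (roughly) the same number of values per bin."""
    s = sorted(values)
    n = len(s)
    return [(s[(i * n) // bins - 1] + s[(i * n) // bins]) / 2 for i in range(1, bins)]


def mdl_cuts(pairs):
    """Recursive entropy-based discretization with the Fayyad-Irani MDL stop rule.

    pairs: list of (value, class_label) tuples. Returns sorted cut points.
    """
    pairs = sorted(pairs)
    n = len(pairs)
    labels = [y for _, y in pairs]
    base = entropy(labels)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # a cut must fall between distinct values
        left, right = labels[:i], labels[i:]
        weighted = (i * entropy(left) + (n - i) * entropy(right)) / n
        if best is None or weighted < best[0]:
            best = (weighted, i, left, right)
    if best is None:
        return []
    weighted, i, left, right = best
    gain = base - weighted
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (k * base - k1 * entropy(left) - k2 * entropy(right))
    if gain <= (math.log2(n - 1) + delta) / n:
        return []  # the split does not pay for its own description length
    cut = (pairs[i - 1][0] + pairs[i][0]) / 2
    return mdl_cuts(pairs[:i]) + [cut] + mdl_cuts(pairs[i:])


# Hypothetical heart rates with sepsis labels.
hr = [72, 80, 85, 90, 95, 104, 110, 118, 125, 130]
sepsis = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print("equal-width:", equal_width_cuts(hr, 3))
print("equal-frequency:", equal_frequency_cuts(hr, 3))
print("MDL:", mdl_cuts(list(zip(hr, sepsis))))
```

On this toy data the MDL criterion returns a single cut (99.5) at the class boundary, giving a two-state variable, while the label-blind methods produce whatever number of bins they are asked for.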
Our experiments showed that both data preparation and model structure affect the accuracy as well as the
computational complexity of the model. We also found that simple models designed to be intuitive for human experts,
such as the SIRS model, may not be computationally efficient or accurate for probabilistic modeling[7]. Models that
take into account the complex conditional interdependencies among variables and reflect them accurately, while
defining the state space in a meaningful way, prove to be more accurate in probabilistic learning and inference[7].
CONCLUSIONS AND FURTHER RESEARCH
The first author's doctoral dissertation provides an overview of temporal reasoning techniques, detailed descriptions
of the data preparation and discretization techniques, multiple sepsis modeling experiments and glucose homeostasis
(insulin dosing) models, and a description of factors influencing the computational complexity and accuracy of these
models[7].
Our Dynamic Bayesian Network methods and the Projeny toolkit were built to be generalizable. These methods
were successfully tested in estimating serum glucose and recommending insulin drip rates for patients in the
intensive care unit. The results were validated by comparing them to the estimations of the computerized rule-based
protocol currently in use. These results were described in a previous publication[24].
We intend to perform further experiments with data sets that have the real-world prior probability of sepsis, with
variables that denote a suspected or confirmed infection, and with differential diagnoses of sepsis or SIRS. We also
intend to include the clinical variables necessary to estimate the presence of severe sepsis or septic shock, and to
build and test models that predict the future probability of sepsis rather than estimating its current probability.
Such prognostic methods could enable earlier treatment that reduces or averts severe morbidity and mortality.
Acknowledgment
The authors thank Dr. Jason Jones at Intermountain Healthcare, Salt Lake City, UT for his help with the sepsis data
set.
References
[1] Murphy KP. Dynamic Bayesian Networks: representation, inference and learning. PhD thesis.
U.C.Berkeley; 2002.
[2] Wong A, Nachimuthu SK, Haug PJ. Predicting Sepsis in the ICU using Dynamic Bayesian Networks.
AMIA Annu Symp Proc. 2009.
[3] Peelen L, de Keizer NF, Jonge Ed, Bosman RJ, Abu-Hanna A, Peek N. Using hierarchical Dynamic
Bayesian Networks to investigate dynamics of organ failure in patients in the Intensive Care Unit. J
Biomed Inform. 2010 Apr;43(2):273-286.
[4] Dremsizov TT, Kellum JA, Angus DC. Incidence and definition of sepsis and associated organ
dysfunction. Int J Artif Organs. 2004 May;27(5):352-359.
[5] Bone RC, Balk RA, Cerra FB, Dellinger RP, Fein AM, Knaus WA, et al. Definitions for sepsis and organ
failure and guidelines for the use of innovative therapies in sepsis. The ACCP/SCCM Consensus
Conference Committee. American College of Chest Physicians/Society of Critical Care Medicine. Chest.
1992 Jun;101(6):1644-1655.
[6] Nachimuthu SK. Projeny: An Open Source Toolkit for Probabilistic Temporal Reasoning. AMIA Annu
Symp Proc. 2009.
[7] Nachimuthu SK. Temporal Reasoning in Medicine using Dynamic Bayesian Networks. PhD thesis.
University of Utah; 2012.
[8] Pryor TA, Gardner RM, Clayton PD, Warner HR. The HELP system. J Med Syst. 1983 Apr;7(2):87-102.
[9] Clayton PD, Narus SP, Huff SM, Pryor TA, Haug PJ, Larkin T, et al. Building a comprehensive clinical
information system from components. The approach at Intermountain Health Care. Methods Inf Med.
2003;42(1):1-7.
[10] Little RJA, Rubin DB. Statistical analysis with missing data. Wiley New York; 1987.
[11] Mitsa T. Temporal Data Mining. Chapman & Hall/CRC data mining and knowledge discovery series.
Chapman & Hall/CRC; 2009.
[12] Warren Liao T. Clustering of time series data: a survey. Pattern Recognition. 2005;38(11):1857-1874.
[13] Bar-Joseph Z. Analyzing time series gene expression data. Bioinformatics. 2004;20(16):2493.
[14] Fayyad UM, Irani KB. Multi-interval discretization of continuous-valued attributes for classification
learning. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence.
Morgan-Kaufmann; 1993. p. 1022-1027.
[15] Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA data mining software: an
update. ACM SIGKDD Explorations Newsletter. 2009;11(1):10-18.
[16] Rabiner LR. A tutorial on hidden Markov models and selected applications in speech recognition. Readings
in speech recognition. 1990;53(3):267-296.
[17] Murphy KP. BNT - Bayes Net Toolbox. Cited on March 13, 2009. Available from:
http://people.cs.ubc.ca/~murphyk/Software/BNT/bnt.html
[18] Hsu, et al. BNJ - Bayesian Network Tools in Java. Cited on March 13, 2009. Available from:
http://bnj.sourceforge.net
[19] Müller S. JMatLink. Cited on March 13, 2009. Available from: http://jmatlink.sourceforge.net
[20] Nachimuthu SK. Projeny website. Cited on March 10, 2012. Available from: http://projeny.sourceforge.net
[21] Lauritzen SL. The EM algorithm for graphical association models with missing data. Computational
Statistics & Data Analysis. 1995;19(2):191-201.
[22] Lauritzen S, Spiegelhalter D. Local computations with probabilities on graphical structures and their
application to expert systems. Journal of the Royal Statistical Society Series B (Methodological).
1988;50(2):157-224.
[23] Morrison AM. Receiver Operating Characteristic (ROC) Curve Preparation: A Tutorial. Boston:
Massachusetts Water Resources Authority Report ENQUAD. 2005;20(5).
[24] Nachimuthu SK, Wong A, Haug PJ. Modeling Glucose Homeostasis and Insulin Dosing in an Intensive
Care Unit using Dynamic Bayesian Networks. AMIA Annu Symp Proc. 2010;2010:532-536.