A Markov Analysis of Patients Developing Sepsis Using Clusters

D. Riaño et al. (Eds.): KR4HC 2010, LNAI 6512, pp. 85–100, 2011.
© Springer-Verlag Berlin Heidelberg 2011
Femida Gwadry-Sridhar1, Michael Bauer2, Benoit Lewden1, and Ali Hamou1
1 I-THINK Research Lab, Lawson Health Research Institute, London, ON Canada
2 University of Western Ontario, London, ON Canada
femida.gwadry-sridhar@lhsc.on.ca, bauer@csd.uwo.ca,
benoit.lewden@lawsonresearch.com, ali.hamou@sjhc.london.on.ca
Abstract. Sepsis is a significant cause of mortality and morbidity. There are
now aggressive goal oriented treatments that can be used to help patients suffer-
ing from sepsis. By predicting which patients are more likely to develop sepsis,
early treatment can potentially reduce their risks. However, diagnosing sepsis
is difficult since there is no “standard” presentation, despite many published
definitions of this condition.
In this work, data from a large observational cohort of patients – with vari-
ables collected at varying time periods – are observed in order to determine
whether sepsis develops or not. A cluster analysis approach is used to form
groups of correlated datapoints. This sequence of datapoints is then categorized
on a per person basis and the frequency of transitioning from one grouping to
another is computed. The result is posed as a Markov model which can accu-
rately estimate the likelihood of a patient developing sepsis. A discussion of the
implications and uses of this model is presented.
Keywords: Sepsis, Markov Chains, Cluster Analysis, Predictive Modelling.
1 Introduction
Sepsis is defined as infection plus systemic manifestations of infection [7].
Severe sepsis is considered present when sepsis co-exists with sepsis-induced organ
dysfunction or tissue hypoperfusion [7]. This can result in mortality and morbidity,
especially when associated with shock and/or organ dysfunction [3]. Sepsis can be
associated with increased hospital resource utilization, prolonged stays in intensive
care units (ICU) and hospital wards, decreased long-term health related quality of life
and an economic burden estimated at $17 billion USD (equivalent to $17.49 billion
CAD) each year in the United States alone [5, 17, 21]. In Canada, there is limited data
on the burden of severe sepsis; however, costs in Quebec may be as high as $73 mil-
lion CAD per year [13] which contributes to the estimate of total Canadian cost of
approximately $325 million CAD per year.
Patients with severe sepsis generally receive their care in an ICU. A multinational
study of sepsis in teaching hospitals found that severe sepsis or septic shock is present
or develops in 15% of ICU patients [1]. However, diagnosing sepsis is difficult since
there is no “standard” presentation despite many published definitions for sepsis [2,
14]. In the STAR registry [15] (containing a mix of teaching and community hospitals
across Canada), the total rate for severe sepsis was 19%. Of these, 63% occurred after
hospitalization.
The management of severe sepsis requires prompt treatment within the first six
hours of resuscitation [7]. Intensivists currently support the use of early goal-directed
resuscitation of patients. This has been shown to improve survival in patients presenting to
emergency rooms with septic shock [7].
Given the many advances in medicine today, there now exist aggressive goal ori-
ented treatments that can be used to help patients with sepsis and severe sepsis [4, 16,
18]. If researchers could predict which patients are at risk for sepsis, treatment
could be started early, potentially reducing the risk of mortality and morbidity.
Methods that support early diagnosis of hospital patients who have, or are at risk
for, sepsis are therefore urgently needed and may lead to a better prognosis when
interventions are initiated early.
1.1 Analytic Techniques
A variety of analytical techniques can be used to establish relationships and assess the
strength of these relationships among a set of measured variables or quantities. Re-
searchers commonly use univariate or multivariable regression models to estimate the
association between prognostic variables and a clinical outcome (such as sepsis).
Multivariable regression models are frequently used in studies where clinical out-
comes are included. These models can use both categorical and continuous variables,
but the use of uncritical modeling techniques can lead to erroneous conclusions and
resultant imprecision [9].
In this work, our primary goal is to determine patients at risk of a particular clinical
diagnosis – sepsis being the reference case. Many approaches exist for analyzing
patients with and without sepsis [8]. These approaches generally use regression
models, which assume the existence of an identifiable, singular set of variables (or
a single variable) for prediction. Hence, if multiple sets of variables appear in
several independent traits, prediction with this method becomes difficult. The
current literature illustrates some of the limitations with univariable and multivariable
models [8, 18, 19]. In order to address such limitations, alternative approaches to
simplify the multi-faceted nature of predicting sepsis need to be investigated.
In this paper, we construct a Markov model to predict sepsis using patient data
from 12 Canadian Intensive Care Units (ICUs). Our model differs fundamentally
from those previously used for predicting an outcome. A k-means cluster analysis
algorithm groups correlated datapoints into clusters, and these clusters define
the “states” of a Markov model from which outcome probabilities are generated.
Transitions in the model are determined from successive values of the datapoints
for individual patients. The transition model assumes that a patient’s risk changes
depending on their state of health.
Section 2 describes the approach in detail. Section 3 presents the results, including
the model’s implications, its validation and future initiatives. Section 4 concludes
and summarizes this work.
Table 1. Sepsis Study: 25 variables collected to form a single datapoint

Variable                                                       P value Exp(B)
Anaerobia culture                                              0.122    0.317
Abdominal diagnosis                                            0.000   15.027
Blood diagnosis                                                0.000    3.574
Lung diagnosis                                                 0.000   10.360
Other diagnosis                                                0.000    8.492
Urine diagnosis                                                0.000    7.280
Chest X-ray and purulent sputum                                0.000    2.756
Gram negative infection culture                                0.047    0.679
Gram positive infection culture                                0.001    0.533
Heart rate > 90bpm                                             0.000   16.933
No culture growth                                              0.000    0.103
PaO2/FiO2 < 250                                                0.000   12.305
pH < 7.30 or lactate > 1.5 upper normal with base deficit > 5  0.141    1.242
Platelets < 80 or 50% decrease in past 3 days                  0.000    5.665
Respiratory rate > 19, PaCO2 < 32 or Mechanical ventilation    0.000    8.866
SBP < 90 or MAP < 70 or Pressure for one hour                  0.000    9.963
Abdominal culture                                              0.259    1.872
Blood culture                                                  0.000    2.311
Lung culture                                                   0.724    0.932
Other site culture                                             0.614    0.869
Urine culture                                                  0.100    1.450
Temperature < 36 or > 38                                       0.000    8.246
Urinary output < 0.5 mL/kg/hr                                  0.000    3.166
WBC > 12 or < 4 or > 10% bands                                 0.000    6.281
Yeast culture                                                  0.011    0.492
Constant                                                       0.000    0.000
2 Methods
2.1 Data Acquisition
We obtained data that was collected from 12 Canadian ICUs that were geographically
distributed and included a mix of medical and surgical patients [15]. The study
was approved by the University of Western Ontario Research Ethics Board and the
need for informed consent was waived. Data was collected on all patients admitted to
the ICU who had a stay greater than 24 hours or who had severe sepsis at the time of
admission. Patients who did not receive active treatment were excluded. We screened
over 23,000 patients for sepsis (of whom 4,196 were randomly selected for this
study). Patients with confirmed sepsis were classified as “septic.” It is normal practice
in the ICU to treat patients with suspected sepsis as septic until blood cultures are
available.
Hospitals routinely collect a minimum data profile on all eligible patients admitted
to the ICU [6]. This includes demographic information, admission data, source of
admission, diagnosis, illness severity, outcome and length of ICU and hospital dura-
tion. Table 1 presents a summary of the variables collected in this data profile for this
study. This table also shows the influence of each variable on sepsis output when
analyzed with a simple logistic regression model (since the variables were Boolean,
i.e. binary for the purposes of the analysis). We have previously reported this analysis [8].
Illness severity scores were calculated using a validated formula from data obtained
during the first 24 hours in the ICU [11, 12]. All patients were subsequently assessed
on a daily basis for the presence of infection and severe sepsis. Hence, for patients
with stays longer than 24 hours, repeated measures of the variables in Table 1 were
collected and averaged to summarize their daily condition. The characteristics of the
variables are part of standard ICU testing protocol and have been described elsewhere [15].
2.2 Model Formulation
Our patient data is modeled in the following format:
p_1 : d_{11}, d_{12}, …, d_{1n_1}
p_2 : d_{21}, d_{22}, …, d_{2n_2}
p_3 : d_{31}, d_{32}, …, d_{3n_3}
. . .
where p_i represents a patient in our data set and d_{ij} represents a datapoint, each
consisting of measurements for the 25 variables identified in Table 1 (i indexes the
patient and j indexes the data collection point in that patient’s sequence). The number
of datapoints, n_i, is therefore not predetermined and varies between patients. These
datapoints progressed time-wise from initial admittance to the ICU/ward until departure
or death. Successive datapoints typically fell on consecutive days, though this was not
always the case (dependent on hospital protocols and staff availability).
Analysis of the collected data was especially challenging for several reasons:
o The number of datapoints per patient varied. For instance, some patients had
two or three datapoints, while others had a dozen or more.
o The time periods between datapoints for a patient occasionally varied.
o The conditions of patients when admitted were extremely diverse; some were
severely ill, already showing signs of sepsis, while others showed none.
Hence, the first datapoint of two separate patients would not correspond to
the same stage of illness.
Clustering. Since the datapoints across the patient dataset are not aligned, as stated
above, we addressed this by considering all datapoints independently for an initial
cluster analysis. In our previous work [8], cluster analysis had been used to group
patients with and without sepsis based on their initial datapoint. This proved to be a
useful approach for grouping patients. In this instance, the clustering would group
similar datapoints as well – one could consider the datapoints within the same cluster
as representing the same “state” of a patient. That is, similar datapoints would be
clustered regardless of their position within a timeframe for an individual patient.
Though not considered in this initial clustering algorithm, timeframe measurement
data will be considered when such datapoints are placed within the Markov model.
A variety of different clustering sizes and algorithms were explored. Data cluster-
ing algorithms can be hierarchical. Hierarchical algorithms find successive clusters
using previously established clusters. Hierarchical algorithms can be agglomerative
("bottom-up") or divisive ("top-down"). Agglomerative algorithms begin with each
variable as a separate cluster and merge them into successively larger clusters. Divi-
sive algorithms begin with the whole set and proceed to divide it into successively
smaller clusters.
A modified k-means clustering algorithm (which was modeled to be agglomerative)
was used in this work because it executes faster on large datasets than hierarchical
algorithms, for instance. This algorithm incorporates both the internal consistency of
variables within the cluster (distance to the cluster’s centroid) and the external
distance to all neighboring clusters as its heuristic. The k-means algorithm used is
an iterative refinement that alternates between assignment (where each datapoint is
assigned to a cluster) and updating (calculation of the new cluster centres). By
doing so, clusters will eventually incorporate all datapoints within the closest proximity
space. In general, the correct choice of k (the number of clusters) is often ambiguous,
with interpretations depending on the shape and scale of the distribution of
points in a data set and the desired clustering resolution of the user. For this
study, the number of clusters was chosen to be 12, resulting in clusters that were
neither too large to manage (over fitted) nor left with too few datapoints to give an
adequate determination of patient status due to lack of separation. This value of k was
chosen by examining the percentage of variance explained as a function of the number of
clusters. Adding more than 12 clusters did not achieve better modeling of the results given
the modifications to the clustering algorithm (i.e. custom distance measures).
During cluster generation, the algorithm is applied to every variable within the
dataset. Variables are then ranked by influence and probability of defining a cluster
and its location within the cluster field (internal consistency measure). The mathematical
indicator used to create these groupings is defined as the orthogonal distance
between clusters. Each variable was essentially a discrete point in a space which has its
own dimension and basis (the dimension of each variable is simply the number of
categories available). For example, if a variable can take two distinct values then the
basis for this space becomes {(1,0), (0,1)}. These two vectors are orthogonal to each other
and therefore form a basis for this particular space.
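The basis construction described above can be illustrated with a short sketch. It assumes that each categorical variable is one-hot encoded and that datapoints are compared with a squared Euclidean distance over the concatenated vectors; the paper's custom distance measure is not specified, so the function names and the choice of distance here are illustrative only.

```python
from typing import List, Sequence


def one_hot(value: int, n_categories: int) -> List[int]:
    """Map a category index to its basis vector, e.g. 0 -> [1, 0], 1 -> [0, 1]."""
    if not 0 <= value < n_categories:
        raise ValueError("category index out of range")
    return [1 if k == value else 0 for k in range(n_categories)]


def dot(a: Sequence[int], b: Sequence[int]) -> int:
    """Inner product; it is zero for two distinct basis vectors of one variable."""
    return sum(x * y for x, y in zip(a, b))


def encode(datapoint: Sequence[int], n_categories: Sequence[int]) -> List[int]:
    """Concatenate the basis vectors of all variables in a datapoint."""
    vector: List[int] = []
    for value, n in zip(datapoint, n_categories):
        vector.extend(one_hot(value, n))
    return vector


def squared_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Squared Euclidean distance (an assumed stand-in for the custom measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

Under this encoding, two datapoints that disagree on one binary variable are a squared distance of 2 apart, because the disagreement shows up in both coordinates of that variable's basis.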
Since our datapoints were binary categorized, each binary variable was normalized
along its matrix length and squared. Furthermore, in the event of datapoints that were
equidistant to multiple centroids, the internal consistency of each cluster was meas-
ured and the tighter field was chosen. This prevented the ballooning of the cluster
sets. Based on this consistency measure, each cluster is labelled either sepsis or non-
sepsis depending on the different distributions of the datapoints within.
The algorithm behaves very similarly to the standard k-means algorithm:
Initialization:
o Select the first datapoint as the first centroid.
o Calculate the distance between this centroid and all other points, and select
the furthest point as the second centroid.
o Choose the furthest point from the first two centroids as the third.
o Continue until N centroids (clusters) are achieved.
Iterate:
o Assign each datapoint to its closest cluster.
o Recalculate the cluster centres. (The centre minimizes the distance between
itself and each point in its cluster; this is achieved by assigning it the
most common category among the points in its cluster.)
o Repeat until no assignment changes.
The proposed algorithm also differs from standard k-means by compensating for
“attraction points”: clusters smaller than 0.05% of the total size were removed at the
end of each iteration and a new centroid was determined by finding the point furthest
away from the other centroids. Furthermore, datapoints containing missing values were
not considered suitable candidates for any centroid position.
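The initialization and iteration steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Hamming distance stands in for the unspecified custom distance, "furthest point" is read as the point whose nearest existing centroid is furthest away, and the removal of very small clusters, the tie-breaking by internal consistency and the handling of missing values are all omitted. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]


def hamming(a: Point, b: Point) -> int:
    """Number of variables on which two categorical datapoints differ."""
    return sum(x != y for x, y in zip(a, b))


def init_centroids(points: Sequence[Point], k: int) -> List[Point]:
    """Start from the first datapoint, then repeatedly add the point whose
    nearest already-chosen centroid is furthest away (assumed reading)."""
    centroids = [points[0]]
    while len(centroids) < k:
        furthest = max(points, key=lambda p: min(hamming(p, c) for c in centroids))
        centroids.append(furthest)
    return centroids


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Send each datapoint to its closest centroid (lowest index wins ties)."""
    return [
        min(range(len(centroids)), key=lambda c: hamming(p, centroids[c]))
        for p in points
    ]


def update(points: Sequence[Point], labels: Sequence[int],
           centroids: Sequence[Point]) -> List[Point]:
    """Recompute each centre as the most common category per variable."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            new_centroids.append(old)  # leave an empty cluster's centre unchanged
            continue
        new_centroids.append(
            tuple(Counter(column).most_common(1)[0][0] for column in zip(*members))
        )
    return new_centroids


def cluster(points: Sequence[Point], k: int, max_iter: int = 100):
    """Alternate assignment and update until the assignment stops changing."""
    centroids = init_centroids(points, k)
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels
```

Taking the per-variable mode as the centre is what makes this closer to a k-modes update than to an arithmetic mean, which matches the "most common category" rule in the step list.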
Markov Model. Following the cluster computation, each individual patient within the
dataset was analyzed. The individual datapoints, for instance d_{11}, d_{12}, …, d_{1n_1}, were
considered separately. If d_{11} was in cluster C_i and d_{12} was in cluster C_j, then a
transition from C_i to C_j was created. The frequency of each transition was also tracked,
so the total number of transitions from C_i to any cluster, including C_i itself, could be
determined. Each cluster was thus treated as its own “state”, and dividing each transition
count by the total for its source state gave the probability of a transition from one
cluster to another. Hence neighbouring datapoints are represented by the transitions
between states.
All transition points were created by comparing each datapoint to every other datapoint
occurring in the future. This provides a temporal compliance (or normalization),
such that the time between d_1 and d_3 for one patient might be the same as the time between
d_1 and d_2 for another patient (in other words, the evolution of the infection over the
“period” d_1 to d_3 in one patient may match that over d_1 to d_2 in another).
In order to complete the model, the following metrics were also recorded for each
datapoint within a cluster: (a) whether the patient did or did not have sepsis; (b)
whether this was the last datapoint associated with the patient or not and if it was the
last datapoint what the outcome was – namely discharged or deceased.
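A minimal sketch of how such transition frequencies could be tallied and normalized is given below. It assumes that each patient is supplied as an ordered list of cluster labels plus a final outcome label, and that only transitions between adjacent datapoints are counted; the function name and the input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple

State = Hashable


def transition_probabilities(
    patients: Sequence[Tuple[Sequence[State], State]]
) -> Dict[State, Dict[State, float]]:
    """Estimate P(next state | current state) from per-patient state sequences.

    Each patient is (cluster labels in time order, final outcome), where the
    outcome is a terminal state such as "discharged" or "deceased".
    """
    counts: Dict[State, Dict[State, int]] = defaultdict(lambda: defaultdict(int))
    for states, outcome in patients:
        path = list(states) + [outcome]
        # Count every move between adjacent datapoints, self-transitions included.
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1

    probabilities: Dict[State, Dict[State, float]] = {}
    for src, row in counts.items():
        total = sum(row.values())
        probabilities[src] = {dst: n / total for dst, n in row.items()}
    return probabilities
```

Each row of the result sums to 1, so it can be read directly as the outgoing edge weights of one state in the Markov graph.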
3 Results
A Markov graph with 14 states was generated: 12 states corresponding to the clusters
and 2 additional states, one for patients who had been discharged and one for those
who were deceased. The full model is represented by cluster transitions in
Figure 1.
However, in order to simplify the explanation of the results, Figure 2 illustrates a
portion of the graph for 3 states: states #1, #5, and #6. The two final gray nodes
represent the “deceased” state and the “discharged” state. What is not shown in the graph
is the percentage of patients that had been diagnosed with sepsis at the time their
datapoints were included. These are 1.3%, 35.0% and 90.18% for nodes #1, #5 and #6,
respectively.
Fig. 1. Full Markov transition graphs represented by clusters
Note that a significant portion (57.8%) of the datapoints that were assigned to state
#1, were followed by their adjacent datapoint which was also assigned to state #1.
Given the very low percentage of patients diagnosed with sepsis within this cluster,
this is likely a state indicative of patients being admitted, never developing sepsis
symptoms, remaining for several days, and finally being discharged.
Node #5 can be considered representative of a patient developing sepsis (hidden or
apparent, depending on variable characteristics), or a patient recovering from it, as
patients in this state have a 35% chance of sepsis. Which of the two applies is not
important, as the model provides a predictor of what the next state will be.
Node #6 represents a state which can be thought of as one characterizing a patient
with sepsis, since 90.18% of the patients having datapoints in this cluster were diag-
nosed with sepsis. Interestingly, 25.9% return to state #1, where relatively few pa-
tients were identified with sepsis. It is also interesting to note that from state #6 no
patients died or were discharged. This is likely due to aggressive treatment of sepsis
when symptoms become severe and apparent.
Fig. 2. Portion of Markov Graph
3.1 Further Implications
There are many clinical applications where this model can be used. Consider the
following scenario, where there exists a datapoint (the 25 variable set) for a patient
admitted to an ICU. This datapoint can be matched to a cluster, i.e., a “state”. From
this state in the model, the probability of the patient transitioning to a new state
at the next “time point” can be determined. For each “next state”, there exists a
probability that the patient will develop sepsis. Order is not implied in the model and is
left to the transition probabilities, giving rise to a powerful predictor. Finally, the
likelihood of the patient developing sepsis can easily be computed by summing over all
possible transitions.
For example, assume that Figure 2 represents the entire Markov graph. If a patient’s
datapoint ends up in state #1, the patient’s estimated probability of developing
sepsis is calculated as follows:
Probability of Sepsis = 0.578*0.0129 + 0.0025*0.3504 + 0.0049*0.9018
= 0.012751, or 1.28%
This is only a one-step approximation, since it considers only a transition from
state #1 to its adjacent states, multiplying each transition probability by the
proportion of sepsis diagnoses in the destination state. In this example, the one-step
approximation illustrates the probability of the patient developing sepsis using paths
of length one. In general, one could compute paths of any length (limited by the longest
stay of any patient in the hospital in the dataset). Similarly, the probability of a
patient not developing sepsis, being discharged or becoming deceased can also be
computed using the same algebraic technique.
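The one-step sum, and one possible extension to longer paths, can be sketched as below. The sketch assumes that the estimate for start state i is the sum over states j of (probability of being in j after n steps) times (sepsis rate of j); whether the authors intended exactly this for paths longer than one is not stated. The transition row and sepsis rates are the figures quoted in the text for state #1, and the function names are hypothetical.

```python
from typing import Dict, Hashable

State = Hashable
Matrix = Dict[State, Dict[State, float]]


def step(dist: Dict[State, float], transitions: Matrix) -> Dict[State, float]:
    """Push a probability distribution over states one transition forward.

    States with no outgoing row (e.g. terminal states without a self-loop)
    contribute nothing to the next distribution.
    """
    nxt: Dict[State, float] = {}
    for src, mass in dist.items():
        for dst, p in transitions.get(src, {}).items():
            nxt[dst] = nxt.get(dst, 0.0) + mass * p
    return nxt


def sepsis_probability(start: State, transitions: Matrix,
                       sepsis_rate: Dict[State, float], steps: int = 1) -> float:
    """Weight the state distribution after `steps` moves by each state's sepsis rate."""
    dist: Dict[State, float] = {start: 1.0}
    for _ in range(steps):
        dist = step(dist, transitions)
    return sum(mass * sepsis_rate.get(state, 0.0) for state, mass in dist.items())


# Figures quoted in the text; the graph is partial, so the row does not sum to 1.
transitions = {1: {1: 0.578, 5: 0.0025, 6: 0.0049}}
sepsis_rate = {1: 0.0129, 5: 0.3504, 6: 0.9018}
one_step = sepsis_probability(1, transitions, sepsis_rate, steps=1)
```

Replacing `sepsis_rate` with an indicator for the discharged or deceased state gives the analogous estimates for those outcomes.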
3.2 Validation
In this study, 4,196 patients were selected in order to test the accuracy of the proposed
method. In total, 23,547 datapoints were available from the patient dataset. The
precision of the clustering was tested by first randomly selecting 10% of the
datapoints and removing them from the dataset. Clusters were then generated by
training on the remaining datapoints, and the removed datapoints were tested
by matching them to the trained clusters. These datapoints were then analysed to see
whether they were assigned to a cluster that represented their end state (sepsis versus
non-sepsis), as if they were part of the original training set.
For sepsis patients, an 80.01% precision rate was achieved (when recursive train-
ing and testing was applied to the datapoints during cross-validation – in this instance,
1000 datapoints were swapped for each train/test run). For non-sepsis patients, a
94.69% precision rate was achieved, partly because non-sepsis patients represented a
larger sample size throughout.
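The hold-out procedure can be sketched as follows, assuming each datapoint is stored as a (features, is_septic) pair. The `predict` callable is a hypothetical stand-in for "match the datapoint to its nearest trained cluster and read off that cluster's sepsis or non-sepsis label"; the rate computed here is the fraction of held-out datapoints of each class that land in a cluster with the matching label, which is one plausible reading of the precision rates reported above.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Datapoint = Tuple[Tuple[int, ...], bool]  # (features, is_septic)


def holdout_split(items: Sequence[Datapoint], test_fraction: float = 0.10,
                  seed: int = 0) -> Tuple[List[Datapoint], List[Datapoint]]:
    """Randomly remove a fraction of datapoints for testing; return (train, test)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def per_class_rate(test: Sequence[Datapoint],
                   predict: Callable[[Tuple[int, ...]], bool]) -> Dict[bool, float]:
    """Fraction of held-out datapoints in each class whose predicted label matches."""
    hits = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for features, is_septic in test:
        totals[is_septic] += 1
        if predict(features) == is_septic:
            hits[is_septic] += 1
    # Classes absent from the test set are left out rather than reported as 0.
    return {label: hits[label] / totals[label] for label in totals if totals[label]}
```

Repeating the split with different seeds and averaging the rates would approximate the repeated train/test runs mentioned in the text.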
3.3 Future Work
The proposed technique will be thoroughly tested against other ICU patient cohorts in
Europe. Clusters will be used in conjunction with decision tree models in order to
identify which variables truly influence the development of sepsis versus non-sepsis
in patients. Furthermore, the non-homogeneous feature-based classification ability of
decision trees and the temporal break sequence modeling based on the Markov process
should improve results significantly. The algorithm proposed by Kim and Oh [10] will
be modified and utilized in future work to further improve the model.
4 Conclusions
Multiple methods of analyzing clinical data provide different perspectives on assess-
ing patient health. We have demonstrated the use of cluster analysis as an efficient
and relatively swift method for identifying patients at risk – or not at risk – of a
clinical condition, in this case sepsis. The reliability and the validity of these clusters
increase with sample size and with testing across different datasets. Cluster
analysis can be used as an effective tool to supplement the daily monitoring of pa-
tients. By utilizing the transitioning probabilities through a Markov model, clinicians
can have a greater awareness regarding a patient who is no longer at risk of contract-
ing sepsis. This gives the clinician the ability to make a more informed decision on
treatment. Utilizing multi-faceted mathematical modeling is a useful clinical infor-
matics approach to support clinical decision making, the utilization of evidence based
treatment and efficiencies in health resource utilization.
References
1. Alberti, C., Brun-Buisson, C., Burchardi, H., Martin, C., Goodman, S., Artigas, A., et al.:
Epidemiology of sepsis and infection in ICU patients from an international multicentre co-
hort study. Intensive Care Med. 28(2), 108–121 (2002)
2. American College of Chest Physicians/Society of Critical Care Medicine Consensus Con-
ference: definitions for sepsis and organ failure and guidelines for the use of innovative
therapies in sepsis. Crit. Care Med. 20(6), 864–874 (1992)
3. Angus, D.C., Linde-Zwirble, W.T., Lidicker, J., Clermont, G., Carcillo, J., Pinsky, M.R.:
Epidemiology of severe sepsis in the United States: analysis of incidence, outcome, and
associated costs of care. Crit. Care Med. 29(7), 1303–1310 (2001)
4. Bernard, G.R., Vincent, J.L., Laterre, P.F., LaRosa, S.P., Dhainaut, J.F., Lopez-Rodriguez,
A., et al.: Efficacy and safety of recombinant human activated protein C for severe sepsis.
N. Engl. J. Med. 344(10), 699–709 (2001)
5. Brun-Buisson, C., Doyon, F., Carlet, J., Dellamonica, P., Gouin, F., Lepoutre, A., et al.:
Incidence, risk factors, and outcome of severe sepsis and septic shock in adults. a multi-
center prospective study in intensive care units. French ICU Group for Severe Sepsis.
JAMA 274(12), 968–974 (1995)
6. Chen, L.M., Martin, C.M., Morrison, T.L., Sibbald, W.J.: Interobserver variability in data
collection of the APACHE II score in teaching and community hospitals. Crit. Care
Med. 27(9), 1999–2004 (1999)
7. Dellinger, R.P., Levy, M.M., Carlet, J.M., Bion, J., Parker, M.M., Jaeschke, R., et al.: Sur-
viving sepsis campaign: international guidelines for management of severe sepsis and
septic shock: 2008. Crit. Care Med. 36(1), 296–327 (2008)
8. Gwadry-Sridhar, F., Lewden, B., Mequanint, S., Bauer, M.: Comparison of analytic ap-
proaches for determining variables: a case study in predicting the likelihood of sepsis. In:
HEALTHINF 2009. Proceedings of INSTICC, Porto, Portugal, January 14-17, pp. 90–96
(2009)
9. Harrell Jr., F.E., Lee, K.L., Mark, D.B.: Multivariable prognostic models: issues in devel-
oping models, evaluating assumptions and adequacy, and measuring and reducing errors.
Stat. Med. 15(4), 361–387 (1996)
10. Kim, S.H., Oh, S.S.: Decision-tree-based Markov model for phrase break prediction. ETRI
Journal 29(4), 527–529 (2007)
11. Knaus, W.A., Draper, E.A., Wagner, D.P., Zimmerman, J.E.: APACHE II: a severity of
disease classification system. Crit. Care Med. 13(10), 818–829 (1985)
12. Knaus, W.A., Wagner, D.P., Draper, E.A., Zimmerman, J.E., Bergner, M., Bastos, P.G., et
al.: The APACHE III prognostic system. risk prediction of hospital mortality for critically
ill hospitalized adults. Chest 100(6), 1619–1636 (1991)
13. Letarte, J., Longo, C.J., Pelletier, J., Nabonne, B., Fisher, H.N.: Patient characteristics and
costs of severe sepsis and septic shock in Quebec. J. Crit. Care 17(1), 39–49 (2002)
14. Levy, M.M., Fink, M.P., Marshall, J.C., Abraham, E., Angus, D., Cook, D., et al.:
SCCM/ESICM/ACCP/ATS/SIS International Sepsis Definitions Conference. Crit. Care
Med. 31(4), 1250–1256 (2003)
15. Martin, C.M., Priestap, F., Fisher, H., Fowler, R.A., Heyland, D.K., Keenan, S.P., et al.: A
prospective, observational registry of patients with severe sepsis: The Canadian Sepsis
Treatment and Response Registry. Crit. Care Med. 37(1), 81–88 (2009)
16. Minneci, P.C., Deans, K.J., Banks, S.M., Eichacker, P.Q., Natanson, C.: Meta-analysis: the
effect of steroids on survival and shock during sepsis depends on the dose. Ann. Intern.
Med. 141(1), 47–56 (2004)
17. Pittet, D., Rangel-Frausto, S., Li, N., Tarara, D., Costigan, M., Rempe, L., et al.: Systemic
inflammatory response syndrome, sepsis, severe sepsis and septic shock: incidence, mor-
bidities and outcomes in surgical ICU patients. Intensive Care Med. 21(4), 302–309 (1995)
18. Riaño, D., Prado, S.: A Data Mining Alternative to Model Hospital Operations: Filtering,
Adaptation and Behaviour Prediction. In: Brause, R., Hanisch, E. (eds.) ISMDA 2000.
LNCS, vol. 1933, pp. 293–299. Springer, Heidelberg (2000)
19. Riaño, D., Prado, S.: The Analysis of Hospital Episodes. In: Crespo, J.L., Maojo, V., Mar-
tin, F. (eds.) ISMDA 2001. LNCS, vol. 2199, pp. 231–237. Springer, Heidelberg (2001)
20. Rivers, E., Nguyen, B., Havstad, S., Ressler, J., Muzzin, A., Knoblich, B., et al.: Early
goal-directed therapy in the treatment of severe sepsis and septic shock. N. Engl. J.
Med. 345(19), 1368–1377 (2001)
21. Salvo, I., de, C.W., Musicco, M., Langer, M., Piadena, R., Wolfler, A., et al.: The Italian
SEPSIS Study: preliminary results on the incidence and evolution of SIRS, sepsis, severe
sepsis and septic shock. Intensive Care Med. 21(suppl. 2), 244–249 (1995)
Article
Background: In the last ten years, the international workshop on knowledge representation for health care (KR4HC) has hosted outstanding contributions of the artificial intelligence in medicine community pertaining to the formalization and representation of medical knowledge for supporting clinical care. Contributions regarding modeling languages, technologies and methodologies to produce these models, their incorporation into medical decision support systems, and practical applications in concrete medical settings have been the main contributions and the basis to define the evolution of this field across Europe and worldwide. Objectives: Carry out a review of the papers accepted in KR4HC in the 2009-2018 decade, analyze and characterize the topics and trends within this field, and identify challenges for the evolution of the area in the near future. Methods: We reviewed the title, the abstract, and the keywords of the 112 papers that were accepted to the workshop, identified the medical and technological topics involved in these works, provided a classification of these papers in medical and technological perspectives and obtained the timeline of these topics in order to determine interest growths and declines. The experience of the authors in the field and the evidences after the review were the basis to propose a list of challenges of knowledge representation in health care for the future. Results: The most generic knowledge representation methods are ontologies (31%), semantic web related formalisms (26%), decision tables and rules (19%), logic (14%), and probabilistic models (10%). From a medical informatics perspective, knowledge is mainly represented as computer interpretable clinical guidelines (43%), medical domain ontologies (26%), and electronic health care records (22%). 
Within the knowledge lifecycle, contributions are found in knowledge generation (38%), knowledge specification (24%), exception detection and management (12%), knowledge enactment (8%), temporal knowledge and reasoning (7%), and knowledge sharing and maintenance (7%). The clinical emphasis of knowledge is mainly related to clinical treatments (27%), diagnosis (13%), clinical quality indicators (13%), and guideline integration for multimorbid patients (12%). According to the level of development of the works presented, we distinguished four maturity levels: formal (22%), implementation (52%), testing (13%), and deployment (2%) levels. Some papers described technologies for specific clinical issues or diseases, mainly cancer (22%) and diseases of the circulatory system (20%). Chronicity and comorbidity were present in 10% and 8% of the papers, respectively. Conclusions: KR4HC is a stable community, still active after ten years. A persistent focus has been knowledge representation, with an emphasis on semantic-web ontologies and on clinical-guideline based decision-support. Among others, two topics receive growing attention: integration of computer-interpretable guideline knowledge for the management of multimorbidity patients, and patient empowerment and patient-centric care.
Conference Paper
The prevalence of diabetes is increasing worldwide. Despite the advances in evidence based therapies, patients with diabetes continue to encounter ongoing morbidity and diminished health-related quality of life. One of the reasons for the diminished benefit from therapy is medication noncompliance. Considerable evidence shows that a combination of therapeutic lifestyle changes (increased exercise and diet modification) and drug treatment can control and, if detected early enough, even prevent the development of diabetes and its harmful effects on health. However, despite the fact that type-2 diabetes is treatable and reversible with appropriate management, patients frequently do not comply with treatment recommendations. In this paper, we use a combination of Expectation Maximization (EM) clustering and Artificial Neural Network (ANN) modeling to determine factors influencing compliance rates, as measured in terms of medication possession ratio (MPR), among patients prescribed fixed dose combination therapy for type 2 diabetes.
Article
Objectives: To examine the incidence of infections and to describe them and their outcome in intensive care unit (ICU) patients. Design and setting: International prospective cohort study in which all patients admitted to the 28 participating units in eight countries between May 1997 and May 1998 were followed until hospital discharge. Patients: A total of 14,364 patients were admitted to the ICUs, 6011 of whom stayed less than 24 h and 8353 more than 24 h. Results: Overall 3034 infectious episodes were recorded at ICU admission (crude incidence: 21.1%). In ICU patients hospitalised longer than 24 h there were 1581 infectious episodes (crude incidence: 18.9%) including 713 (45%) in patients already infected at ICU admission. These rates varied between ICUs. Respiratory, digestive, urinary tracts, and primary bloodstream infections represented about 80% of all sites. Hospital-acquired and ICU-acquired infections were documented more frequently microbiologically than community-acquired infections (71% and 86%, respectively, vs. 55%). About 28% of infections were associated with sepsis, 24% with severe sepsis and 30% with septic shock, and 18% were not classified. Crude hospital mortality rates ranged from 16.9% in non-infected patients to 53.6% in patients with hospital-acquired infections at the time of ICU admission and acquiring infection during the ICU stay. Conclusions: The crude incidence of ICU infections remains high, although the rate varies between ICUs and patient subsets, illustrating the added burden of nosocomial infections in the use of ICU resources.
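The crude incidence figures quoted in this abstract can be reproduced from the reported counts. The short sketch below is only an arithmetic check; the function and variable names are illustrative and do not come from the cited study, and "crude incidence" is assumed here to mean episodes per 100 patients in the relevant group.

```python
def crude_incidence(episodes: int, patients: int) -> float:
    """Crude incidence as a percentage (episodes per 100 patients)."""
    if patients <= 0:
        raise ValueError("patients must be positive")
    return 100.0 * episodes / patients


# Counts as reported in the abstract.
admitted_total = 14364   # all ICU admissions
short_stay = 6011        # stayed less than 24 h
long_stay = 8353         # stayed more than 24 h

# Episodes recorded at ICU admission, relative to all admissions.
at_admission = crude_incidence(3034, admitted_total)

# Episodes among patients hospitalised longer than 24 h.
during_long_stay = crude_incidence(1581, long_stay)

# Share of those 1581 episodes occurring in patients already infected at admission.
already_infected_share = 100.0 * 713 / 1581

print(f"{at_admission:.1f}% {during_long_stay:.1f}% {already_infected_share:.0f}%")
```

The two stay-length groups sum to the total number of admissions, and the computed percentages agree with the reported values after rounding.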
Article
This paper presents the form and validation results of APACHE II, a severity of disease classification system. APACHE II uses a point score based upon initial values of 12 routine physiologic measurements, age, and previous health status to provide a general measure of severity of disease. An increasing score (range 0 to 71) was closely correlated with the subsequent risk of hospital death for 5815 intensive care admissions from 13 hospitals. This relationship was also found for many common diseases. When APACHE II scores are combined with an accurate description of disease, they can prognostically stratify acutely ill patients and assist investigators comparing the success of new or differing forms of therapy. This scoring index can be used to evaluate the use of hospital resources and compare the efficacy of intensive care in different hospitals or over time.
Article
Multivariable regression models are powerful tools that are used frequently in studies of clinical outcomes. These models can use a mixture of categorical and continuous variables and can handle partially observed (censored) responses. However, uncritical application of modelling techniques can result in models that poorly fit the dataset at hand, or, even more likely, inaccurately predict outcomes on new subjects. One must know how to measure qualities of a model's fit in order to avoid poorly fitted or overfitted models. Measurement of predictive accuracy can be difficult for survival time data in the presence of censoring. We discuss an easily interpretable index of predictive discrimination as well as methods for assessing calibration of predicted survival probabilities. Both types of predictive accuracy should be unbiasedly validated using bootstrapping or cross-validation, before using predictions in a new data series. We discuss some of the hazards of poorly fitted and overfitted regression models and present one modelling strategy that avoids many of the problems discussed. The methods described are applicable to all regression models, but are particularly needed for binary, ordinal, and time-to-event outcomes. Methods are illustrated with a survival analysis in prostate cancer using Cox regression.
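The abstract does not name its index of predictive discrimination. One widely used measure for censored survival data is the concordance (c) index, and the sketch below assumes that is the kind of index meant. It is a simplified illustration, not the cited authors' implementation: pairs with tied observed times are skipped, and the function name and inputs are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Fraction of usable pairs in which the higher-risk subject fails first.

    times       -- observed follow-up time for each subject
    events      -- 1 if the event was observed, 0 if the subject was censored
    risk_scores -- model output, where a higher score means a worse prognosis
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject a has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: skip tied times. A pair is also unusable when the
        # earlier subject is censored, because the true ordering is unknown.
        if times[a] == times[b] or not events[a]:
            continue
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy data (invented for illustration): the third subject is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.4, 0.1])
reversed_order = concordance_index(times, events, [0.1, 0.4, 0.7, 0.9])
uninformative = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

A value of 1.0 indicates perfect ranking, 0.5 is no better than chance, and 0.0 is perfectly reversed. As the abstract stresses, such an index should be estimated with bootstrapping or cross-validation rather than on the data used to fit the model.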
Article
An American College of Chest Physicians/Society of Critical Care Medicine Consensus Conference was held in Northbrook in August 1991 with the goal of agreeing on a set of definitions that could be applied to patients with sepsis and its sequelae. New definitions were offered for some terms, while others were discarded. Broad definitions of sepsis and the systemic inflammatory response syndrome were proposed, along with detailed physiologic parameters by which a patient may be categorized. Definitions for severe sepsis, septic shock, hypotension, and multiple organ dysfunction syndrome were also offered. The use of severity scoring methods when dealing with septic patients was recommended as an adjunctive tool to assess mortality. Appropriate methods and applications for the use and testing of new therapies were recommended. The use of these terms and techniques should assist clinicians and researchers who deal with sepsis and its sequelae.
Article
In this paper a decision-tree-based Markov model for phrase break prediction is proposed. The model takes advantage of the non-homogeneous-features-based classification ability of decision trees and temporal break sequence modeling based on the Markov process. For this experiment, a text corpus tagged with parts-of-speech and three break strength levels is prepared and evaluated. The complex feature set, textual conditions, and prior knowledge are utilized, and chunking rules are applied to the search results. The proposed model shows an error reduction rate of about 11.6% compared to the conventional classification model.
Article
Objective: To determine the incidence, cost, and outcome of severe sepsis in the United States. Design: Observational cohort study. Setting: All nonfederal hospitals (n = 847) in seven U.S. states. Patients: All patients (n = 192,980) meeting criteria for severe sepsis based on the International Classification of Diseases, Ninth Revision, Clinical Modification. Interventions: None. Measurements and Main Results: We linked all 1995 state hospital discharge records (n = 6,621,559) from seven large states with population and hospital data from the U.S. Census, the Centers for Disease Control, the Health Care Financing Administration, and the American Hospital Association. We defined severe sepsis as documented infection and acute organ dysfunction using criteria based on the International Classification of Diseases, Ninth Revision, Clinical Modification. We validated these criteria against prospective clinical and physiologic criteria in a subset of five hospitals. We generated national age- and gender-adjusted estimates of incidence, cost, and outcome. We identified 192,980 cases, yielding national estimates of 751,000 cases (3.0 cases per 1,000 population and 2.26 cases per 100 hospital discharges), of whom 383,000 (51.1%) received intensive care and an additional 130,000 (17.3%) were ventilated in an intermediate care unit or cared for in a coronary care unit. Incidence increased >100-fold with age (0.2/1,000 in children to 26.2/1,000 in those >85 yrs old). Mortality was 28.6%, or 215,000 deaths nationally, and also increased with age, from 10% in children to 38.4% in those >85 yrs old. Women had lower age-specific incidence and mortality, but the difference in mortality was explained by differences in underlying disease and the site of infection. The average costs per case were $22,100, with annual total costs of $16.7 billion nationally. Costs were higher in infants, nonsurvivors, intensive care unit patients, surgical patients, and patients with more organ failure. The incidence was projected to increase by 1.5% per annum. Conclusions: Severe sepsis is a common, expensive, and frequently fatal condition, with as many deaths annually as those from acute myocardial infarction. It is especially common in the elderly and is likely to increase substantially as the U.S. population ages.