Evaluating the utility of syndromic surveillance algorithms for screening to detect potentially clonal hospital infection outbreaks

Journal of the American Medical Informatics Association 18(4):466-72 · July 2011
DOI: 10.1136/amiajnl-2011-000216 · Source: PubMed
Randy J Carnevale,1 Thomas R Talbot,2,3 William Schaffner,2,3 Karen C Bloch,2 Titus L Daniels,2 Randolph A Miller1
ABSTRACT
Objective The authors evaluated algorithms commonly
used in syndromic surveillance for use as screening tools
to detect potentially clonal outbreaks for review by
infection control practitioners.
Design Study phase 1 applied four aberrancy detection
algorithms (CUSUM, EWMA, space-time scan statistic,
and WSARE) to retrospective microbiologic culture data,
producing a list of past candidate outbreak clusters. In
phase 2, four infectious disease physicians categorized
the phase 1 algorithm-identified clusters to ascertain
algorithm performance. In phase 3, project members
combined the algorithms to create a unified screening
system and conducted a retrospective pilot evaluation.
Measurements The study calculated recall and
precision for each algorithm, and created precision-recall
curves for various methods of combining the algorithms
into a unified screening tool.
Results Individual algorithm recall and precision ranged
from 0.21 to 0.31 and from 0.053 to 0.29, respectively.
Few candidate outbreak clusters were identified by more
than one algorithm. The best method of combining the
algorithms yielded an area under the precision-recall
curve of 0.553. The phase 3 combined system detected
all infection control-confirmed outbreaks during the
retrospective evaluation period.
Limitations Lack of phase 2 reviewers’ agreement
indicates that subjective expert review was an imperfect
gold standard. Less conservative filtering of culture
results and alternate parameter selection for each
algorithm might have improved algorithm performance.
Conclusion Hospital outbreak detection presents
different challenges than traditional syndromic
surveillance. Nevertheless, algorithms developed for
syndromic surveillance have potential to form the basis
of a combined system that might perform clinically useful
hospital outbreak screening.
INTRODUCTION
Outbreaks of bacterial infections can spread among hospitalized patients. Such outbreaks are often facilitated through contact with healthcare personnel, environmental factors, contaminated equipment, or contaminated injections. Identification of hospital-based outbreaks, however, poses substantial challenges. To determine whether an outbreak exists, hospital infection control professionals must first recognize the presence of a new pathogen or the emergence of a new pattern of infection, and then determine whether these findings merit further investigation or intervention. Problems during the recognition and investigative processes delay interventions, and with delays come increased costs and higher risks of patient morbidity and mortality.[1]
Several recent approaches supplement older manual outbreak detection practices with automated outbreak alerting mechanisms. For more than two decades, various investigative groups have applied direct and straightforward algorithmic detection methods to hospital data to demonstrate improved sensitivity in inpatient outbreak alerting.[2,3] Relatively few studies, however, have applied the newer algorithms developed for syndromic surveillance to single hospital inpatient surveillance. Syndromic surveillance algorithms have typically used pre-clinical data (eg, records of over-the-counter pharmaceutical purchases and of chief complaints from emergency room visits) in an attempt to detect outbreaks in outpatient settings over large geographic areas.[4-6] In order to develop a screening tool that helps hospital infection control personnel to identify outbreaks in an individual hospital setting, the present study utilized microbiology culture and antibiotic sensitivity results rather than pre-clinical data as the input for algorithms initially developed for regional syndromic surveillance. The authors evaluated the algorithms' suitability, singly and in combination, to screen culture results in a clinically useful manner.
BACKGROUND
Past approaches to automated hospital outbreak
detection fall into two categories: active and
passive surveillance. Active surveillance approaches
use decision support algorithms to automatically
inform infection control staff of suspicious disease
patterns that require further attention. Passive
surveillance approaches provide tools that simply
aggregate or display information in a more usable
and manipulable electronic format for infection
control staff to review on their own initiative,
allowing them to better detect interesting patterns
manually. Online appendix A contains a brief
summary of these previous approaches to automated
surveillance, with references.
Outbreaks fall into two categories: clonal and non-clonal. Non-clonal outbreaks typically occur when infection control techniques are suboptimal (eg, improper hand washing). The resulting infections involve many different bacterial species. A clonal outbreak occurs when progeny of a single organism spread to multiple patients. Non-clonal outbreaks are readily identifiable by an overall increase in infection rates in a given hospital unit. Clonal outbreaks, however, may remain unnoticed since the increase in infections by a single rarer species may not significantly affect the overall infection rate. Genetic and molecular fingerprinting techniques remain the gold standard for determining the clonality of two bacterial isolates of the same species from different patients' cultures. Nevertheless, it is both more efficient and more cost effective within a given institution to first screen for potential clonal outbreaks by comparing antibiotic sensitivity patterns for each bacterial species identified by cultures.[7]
An additional appendix is published online only. To view this file please visit the journal online (www.jamia.org).
1 Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA
2 Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
3 Department of Preventive Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
Correspondence to Randy J Carnevale, Department of Biomedical Informatics, Vanderbilt University, 2209 Garland Ave, 400 Eskind Biomedical Library, Nashville, TN, USA; randy.carnevale@vanderbilt.edu
Received 12 July 2010. Accepted 22 March 2011. Published Online First 23 May 2011.
J Am Med Inform Assoc 2011;18:466-472. doi:10.1136/amiajnl-2011-000216
The current exploratory study evaluated the ability of four algorithms previously applied to regional syndromic surveillance to serve as screening tools for detecting potential clonal hospital outbreaks, individually and in combination. The goal was to provide useful input to hospital infection control personnel for further review and possible additional testing. Two of these aberrancy detection algorithms originated in manufacturing quality control (CUSUM and EWMA), while the other two came from syndromic surveillance research (space-time scan statistic and WSARE).
Statistical process control algorithms: CUSUM and EWMA
Statistical process control originated in 1931, when Walter Shewhart of Bell Laboratories first described control chart methodologies to monitor manufacturing processes.[8] Statistical process control algorithms use previous data to estimate future values, including the mean and reasonable upper and lower limits. If actual future measurements fall within the predicted limits, the process is under control. New measurements outside the calculated control limits may indicate that a noteworthy change has occurred in the underlying process. The simplest statistical process control algorithms set upper and lower limits as a multiple of the previously measured standard deviation and plot each new measurement against these limits. While this approach is easy enough to plot manually on a graph, it does not effectively detect small shifts in the mean.[9]
CUSUM, the first algorithm deployed in the current study, is calculated by taking the cumulative summation of the difference between each measured value $\bar{x}_i$ and the estimated in-control mean $\hat{\mu}_0$:[9]
$$S_m = \sum_{i=1}^{m} (\bar{x}_i - \hat{\mu}_0)$$
In a process that is under control, each measured value $\bar{x}_i$ should be reasonably close to the mean. Thus, a plot of each calculated value of $S_m$ should be centered at zero with small fluctuations up or down. When calculating upper and lower bounds for $S_m$, methods that increase the bounds over time (V-mask methods) have historically provided greater sensitivity to small shifts in the mean and decreased impact from older measurements as compared to traditional control charts.[10,11]
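As an illustration of the cumulative sum defined above, the following minimal Python sketch computes $S_1, \dots, S_m$ for a short series. The function name `cusum`, the daily counts, and the in-control mean are hypothetical values chosen for illustration; the sketch does not include the V-mask bounds and is not the study's implementation.

```python
from itertools import accumulate


def cusum(values, in_control_mean):
    """Return the running sums S_1..S_m of (x_i - in_control_mean)."""
    return list(accumulate(x - in_control_mean for x in values))


# Hypothetical daily culture counts for one organism (illustrative only).
daily_counts = [2, 1, 3, 2, 2, 5, 6, 7]
mu0 = 2.0  # assumed in-control mean estimated from historical data

s = cusum(daily_counts, mu0)
for day, value in enumerate(s, start=1):
    print(day, value)
```

While the counts hover around the assumed mean the sum stays near zero; a sustained rise in counts makes the sum climb steadily, which is the behavior the bounds are meant to catch.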
Another approach to improving Shewhart's original control charts, the exponentially weighted moving average statistic (EWMA), directly incorporates exponentially decreasing weights applied successively to old values, thus providing a measurement less affected by random noise than CUSUM. EWMA is recursively defined as:
$$\mathrm{EWMA}_t = \lambda Y_t + (1 - \lambda)\,\mathrm{EWMA}_{t-1}$$
where $\mathrm{EWMA}_0$ is the historical mean, $Y_t$ is the measurement at time $t$, and $\lambda$ is the decay rate of past measurements, with $0 < \lambda \le 1$.[9] At $\lambda = 1$, the EWMA formula matches the Shewhart control chart formula. Optimal $\lambda$ values vary depending on the problem domain, but empirically, values between 0.2 and 0.3 have provided good performance in manufacturing.[9,12]
The typical upper and lower bounds for EWMA are similar to those used in Shewhart's control charts, and are given by $\mathrm{EWMA}_0 \pm k\,s_{\mathrm{EWMA}}$, with standard deviation $s_{\mathrm{EWMA}}$ and factor $k$ depending on the problem domain.[12] The value of $\lambda$ affects the variance of the EWMA statistic and thus the limits, as the estimated variance is given by:
$$s^2_{\mathrm{EWMA}} = \frac{\lambda}{2 - \lambda}\, s^2$$
where $s^2$ is the historical variance. Although more difficult to calculate, EWMA charts have the benefit of being more sensitive to small shifts in the mean than Shewhart's control charts while still being as easy to interpret graphically.
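The EWMA recursion and its control limits can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names and the sample values are hypothetical, and flagging any point outside either limit is one plausible alerting rule rather than the rule the study used.

```python
import math


def ewma_series(values, historical_mean, lam):
    """EWMA_t = lam * Y_t + (1 - lam) * EWMA_{t-1}, seeded with EWMA_0."""
    if not 0 < lam <= 1:
        raise ValueError("lam must satisfy 0 < lam <= 1")
    series, previous = [], historical_mean
    for y in values:
        previous = lam * y + (1 - lam) * previous
        series.append(previous)
    return series


def ewma_limits(historical_mean, historical_sd, lam, k):
    """Return (lower, upper) = EWMA_0 -/+ k * s_EWMA."""
    s_ewma = math.sqrt(lam / (2 - lam)) * historical_sd
    return historical_mean - k * s_ewma, historical_mean + k * s_ewma


def ewma_alerts(values, historical_mean, historical_sd, lam, k):
    """Indices of measurements whose EWMA falls outside the limits."""
    lower, upper = ewma_limits(historical_mean, historical_sd, lam, k)
    series = ewma_series(values, historical_mean, lam)
    return [i for i, e in enumerate(series) if e < lower or e > upper]


# Hypothetical counts: two quiet days followed by a sharp rise.
print(ewma_alerts([2, 2, 10, 10], historical_mean=2.0,
                  historical_sd=1.0, lam=0.3, k=3))
```

With `lam=1` the series reduces to the raw measurements and the limits to the mean plus or minus `k` standard deviations, matching the Shewhart special case noted above.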
Syndromic surveillance algorithms: space-time scan statistic and WSARE
Following the 2001 anthrax attacks in the USA,[13] fears of bioterrorism increased interest in the nascent field of syndromic surveillance. Such systems identify infectious disease outbreaks using pre-clinical data (eg, emergency room visits, pharmaceutical purchases, etc) over a large geographic area. The current study applied two algorithms previously developed specifically for syndromic surveillance to the hospital setting: Kulldorff's space-time scan statistic (STSS) and What's Strange About Recent Events (WSARE).
Martin Kulldorff first introduced STSS in 1997.[14] At the time, most syndromic surveillance researchers used purely temporal disease cluster detection methods, including the algorithms used in statistical process control.[4,5] The STSS algorithm also incorporates spatial information in an attempt to improve detection over a large geographic area. It uses a two-stage process. First, STSS searches the study area for the circular region most likely to be a disease cluster, assuming the disease follows either a Bernoulli model or a Poisson model. Second, it estimates the statistical significance of the cluster using Monte Carlo simulation. Many studies have employed STSS with success, including those observing commonly occurring infectious diseases,[15] emerging infectious diseases,[16] and cancer incidence.[17,18] Complete details regarding the STSS algorithm appear in Kulldorff's publications.[14,15,19]
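The two-stage process can be illustrated with a deliberately simplified, purely temporal sketch: stage one scans every window of consecutive days for the highest Poisson log-likelihood ratio, and stage two estimates significance by Monte Carlo simulation. This is not SaTScan's implementation; the function names, the uniform null model over days, the maximum window length, and the sample counts are all assumptions made for illustration, and the spatial (circular region) search is omitted entirely.

```python
import math
import random


def window_llr(n, mu, total):
    """Poisson log-likelihood ratio for n observed vs mu expected cases."""
    if n <= mu:
        return 0.0  # only excesses are of interest
    llr = n * math.log(n / mu)
    if total > n:
        llr += (total - n) * math.log((total - n) / (total - mu))
    return llr


def best_window(counts, max_len):
    """Stage 1: find the window of consecutive days with the highest LLR."""
    total, days = sum(counts), len(counts)
    best_llr, best_span = 0.0, None
    for start in range(days):
        n = 0
        for end in range(start, min(start + max_len, days)):
            n += counts[end]
            mu = total * (end - start + 1) / days  # uniform null over days
            llr = window_llr(n, mu, total)
            if llr > best_llr:
                best_llr, best_span = llr, (start, end)
    return best_llr, best_span


def scan_p_value(counts, max_len, n_sim=199, seed=0):
    """Stage 2: Monte Carlo p value for the most likely window."""
    rng = random.Random(seed)
    observed, span = best_window(counts, max_len)
    total, days = sum(counts), len(counts)
    exceed = 0
    for _ in range(n_sim):
        sim = [0] * days
        for _ in range(total):  # scatter the cases uniformly over the days
            sim[rng.randrange(days)] += 1
        if best_window(sim, max_len)[0] >= observed:
            exceed += 1
    return span, observed, (exceed + 1) / (n_sim + 1)


# Hypothetical daily counts with a burst on days 6 to 8 (zero-indexed).
counts = [1, 0, 1, 0, 1, 0, 8, 9, 7, 0, 1, 0, 1, 0]
span, llr, p = scan_p_value(counts, max_len=5)
print(span, round(llr, 2), p)
```

The Monte Carlo step compares the observed maximum against maxima from simulated data sets, so the p value already accounts for the many overlapping windows that were scanned.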
As STSS addressed the growing need for incorporating spatial data, WSARE addressed the growing need for a cluster detection algorithm that could incorporate multidimensional data (eg, gender, age, and location in addition to disease status).[4,5,20] WSARE first constructs a Bayesian network model based on the problem domain's historical data. It then uses the Bayesian network to find the single best clustering rule for the given day and estimates a p value using Benjamini and Hochberg's False Discovery Rate method[21] to adjust for the multiple hypothesis tests.[20] Because the underlying Bayesian model can include a node for each data element, WSARE easily incorporates multidimensional data. For example, if the data include gender, zip code, and influenza diagnoses, WSARE could in theory detect an increase in influenza across the study region, an increase in influenza in women region-wide, or an increase in influenza in one specific zip code. WSARE's primary use has been in conjunction with the RODS public health surveillance system,[22] both for temporary short-term monitoring of the 2002 Winter Olympics[23] and for long-term public health surveillance of the state of Pennsylvania.[24] Complete details of the WSARE algorithm appear in Wong et al.[20]
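For readers unfamiliar with the multiple-testing adjustment mentioned above, the following is a generic sketch of the Benjamini and Hochberg step-up procedure. It is not taken from the WSARE code; the function name, the rate `q`, and the p values are hypothetical, and how WSARE applies the procedure is described in Wong et al.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return indices of hypotheses rejected at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p value is under its line.
        if p_values[i] <= q * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


# Hypothetical p values, one per monitored day.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
```

Because the rule keeps the largest qualifying rank, a p value that misses its own threshold can still be rejected when a larger p value further down the sorted list meets its (more lenient) threshold.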
METHODS
Study design
This study evaluated the ability of four aberrancy detection algorithms to function as a screening tool for identifying potentially clonal outbreaks at a single site using de-identified microbiologic culture data. The four evaluated algorithms included two custom implementations (CUSUM[9] and EWMA[9]) and two reference implementations (WSARE[20] and Kulldorff's space-time scan statistic,[14] SaTScan). The de-identified dataset included daily case counts for each organism taken from all microbiologic culture data collected from 2001 through 2006 from Vanderbilt University Hospital and Monroe Carell Jr. Children's Hospital at Vanderbilt-affiliated inpatient units, outpatient clinics, and emergency rooms. It included only the first result of a given culture type (ie, organism and sensitivity pattern) for each patient on each unit to avoid giving extra weight to multiple serial cultures of the same organism from the same patient.
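The first-result rule and the daily case counts can be sketched as follows. The record layout (dictionary keys `date`, `patient`, `unit`, `organism`, `pattern`) and the function names are assumptions made for illustration; the study's actual data schema is not described.

```python
from collections import Counter


def first_results(cultures):
    """Keep the earliest culture per (patient, unit, organism, pattern)."""
    seen, kept = set(), []
    # ISO-formatted date strings sort chronologically.
    for c in sorted(cultures, key=lambda c: c["date"]):
        key = (c["patient"], c["unit"], c["organism"], c["pattern"])
        if key not in seen:
            seen.add(key)
            kept.append(c)
    return kept


def daily_counts(cultures):
    """Count deduplicated cultures per (date, organism)."""
    return Counter((c["date"], c["organism"]) for c in first_results(cultures))


# Hypothetical records: patient p1 has a repeat culture on the same unit.
records = [
    {"date": "2005-01-02", "patient": "p1", "unit": "ICU",
     "organism": "E. coli", "pattern": "RSS"},
    {"date": "2005-01-01", "patient": "p1", "unit": "ICU",
     "organism": "E. coli", "pattern": "RSS"},
    {"date": "2005-01-02", "patient": "p2", "unit": "ICU",
     "organism": "E. coli", "pattern": "RSS"},
]
print(daily_counts(records))
```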
The study comprised three phases. Phase 1 implemented the four aberrancy detection algorithms using the hospital-derived retrospective microbiologic culture data, producing a list of potential past outbreak clusters. In phase 2, four Vanderbilt University School of Medicine Infectious Diseases faculty members who were blinded to algorithm source reviewed and categorized the suspected clusters to ascertain the performance of each phase 1 algorithm. In phase 3, project members empirically used the phase 2 results as feedback to adjust configuration parameters associated with each algorithm and investigated additional methods for combining the algorithms' output into a single outbreak detection screening tool. The authors then carried out a 6-month retrospective evaluation of the new system. The Vanderbilt University Institutional Review Board approved the study prior to its initiation.
Phase 1: Algorithm execution
The study configured each algorithm to identify clusters of positive cultures from daily case-culture counts for each organism, both for individual hospital units and across the entire institution. The study divided the culture dataset into three parts. The first set (1 year; January 1, 2001 to December 31, 2001) provided historical seed data for each algorithm. The second set (3 years; January 1, 2002 to December 31, 2004) served as a testing set for tuning the parameters of each algorithm and designing the review module before study initiation. This second set also provided additional historical baseline data for the final review. The third set (2 years; January 1, 2005 to December 31, 2006) provided the testing data for the study phase 2 expert review. The study converted output from each of the four study algorithms into a common format to prevent the reviewers from identifying which algorithm had generated the cluster.
Phase 2: Expert review process
The project developed a web-based review module that collectively and serially displayed the clusters identified by the algorithms to the group of expert reviewers. Each reviewer had substantial experience as a hospital-affiliated physician-epidemiologist. Using the web-based review module, the reviewers classified each computer-generated cluster as a potential outbreak or a spurious cluster and further delineated each outbreak occurrence as probable (likely a real outbreak), or possible (not certain if a real outbreak). They produced their assessments based on geographic and temporal data regarding a given set of culture results comprising an algorithm-defined cluster. The reviewers could drill down on each cluster to view narrative culture result reports and antibiotic sensitivities as needed. The reviewers also noted whether they would have conducted any further investigations had they been both aware of the cluster and responsible for hospital infection control at the time the cluster occurred. Each expert conducted an independent review while blinded to the assessments made by the other experts. As indicated in table 1, the study converted the experts' designations into a binary classification, labeling a cluster as a candidate outbreak if the experts identified it as a probable outbreak or a possible outbreak that merited further investigation. In an actual outbreak investigation, hospital infection control staff would conduct additional serologic or genetic testing of each candidate bacterial isolate to determine whether the cluster represented a true outbreak; no such data were available regarding the clusters the experts reviewed.
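The binary conversion summarized in table 1 amounts to a small decision rule, sketched here in Python. The function name and the string labels are hypothetical; only the mapping itself comes from the table.

```python
def is_candidate(likelihood, would_investigate):
    """Map an expert designation to the binary label used in table 1."""
    if likelihood == "probable":
        return True  # candidate regardless of the investigation answer
    if likelihood == "possible":
        return would_investigate  # candidate only if it merits investigation
    if likelihood == "non-outbreak":
        return False
    raise ValueError(f"unknown designation: {likelihood!r}")


for likelihood in ("probable", "possible", "non-outbreak"):
    for investigate in (True, False):
        label = "Candidate" if is_candidate(likelihood, investigate) else "False positive"
        print(likelihood, investigate, label)
```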
The study assigned two of the four expert reviewers to examine each algorithm-identified potential cluster independently. Discordant assessments were resolved by submitting each to a tiebreaker reviewer randomly selected from the two reviewers who had not previously evaluated the cluster. To calibrate the reliability of the tiebreaking opinions, the study also presented the tiebreak reviewers with several randomly chosen clusters on which the first two reviewers' determinations agreed (either as candidates or not).
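The review protocol described above can be sketched as two small helpers: one that picks a tiebreaker from the reviewers not yet assigned to a cluster, and one that resolves the consensus label. Reviewer identifiers and function names are hypothetical, and the random selection shown is only one way such an assignment might be done.

```python
import random


def pick_tiebreaker(reviewers, assigned, rng):
    """Randomly choose a reviewer who has not yet evaluated the cluster."""
    remaining = [r for r in reviewers if r not in assigned]
    return rng.choice(remaining)


def consensus(first, second, tiebreaker=None):
    """Return the agreed label, or the tiebreaker's label when discordant."""
    if first == second:
        return first
    if tiebreaker is None:
        raise ValueError("discordant reviews need a tiebreaker")
    return tiebreaker


rng = random.Random(0)
third = pick_tiebreaker(["A", "B", "C", "D"], assigned={"A", "C"}, rng=rng)
print(third, consensus(True, False, tiebreaker=True))
```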
The study supplemented the list of candidate outbreaks identified by the review process (as defined above) with five infection control-investigated clusters that had been independently characterized previously by the hospital's infection control staff. These five consisted of disease clusters subjected to genetic or serologic testing during the study time period.
Following the clinicians' reviews, the study calculated the sensitivity and positive predictive value (recall and precision) for each cluster identification algorithm based on the consensus classifications (by two or three reviewers, per protocol) of suspected outbreaks and infection control-investigated clusters. The study compared the individual algorithms' performance statistics pairwise using McNemar's test. Figure 1 summarizes the processes followed in phases 1 and 2.
Phase 3: Parameter tuning, precision-recall analysis, combined tool development, and retrospective evaluation
In study phase 3, the project empirically analyzed the effects of varying algorithm parameters on each algorithm's ability to identify phase 2 expert-labeled candidate outbreaks. The study also explored potential methods of combining the individual algorithms with additional heuristic data to produce better candidate outbreak identification than obtained by the individual algorithms per se.
A first approach was to adjust parameters for the customizable algorithm that demonstrated better performance in phase 2 (CUSUM or EWMA) to detect as many of the candidate outbreaks as possible. For each of the expert-identified candidate outbreak clusters, the study calculated k, the minimum threshold at which the chosen algorithm would generate an alert for the outbreak, using varying decay rates λ (0.05, 0.07, 0.1, 0.15, 0.2, 0.25, and 0.3). Project members recorded the number of additional alerts that would also have triggered at the given value of k. Based on these measurements, the study determined the optimal value of λ and generated precision-recall curves for varying values of k when using the optimized algorithm.
Table 1 Phase 2 expert categorization of phase 1 algorithm-identified clusters
                           | Probable outbreak | Possible outbreak | Non-outbreak
Would investigate          | Candidate         | Candidate         | False positive
No investigation necessary | Candidate         | False positive    | False positive
The study also explored methods of combining the output from the four original algorithms using various scoring metrics by which the resulting clusters could be ranked. A first step attempted to order the clusters by their previously measured value of k. Project members then made additional adjustments to the rank weights regarding several features identified as potentially important by the expert Infectious Disease faculty reviewers during the phase 2 review, including hospital location type (inpatient vs outpatient) and primary culture source type (urine, blood, wound, etc).
The study examined the potential for not alerting for clusters comprised of organisms with substantially different antibiotic susceptibilities. This approach had the potential to eliminate noise due to clusters comprised of different clones from the same bacterial species. Project members developed an algorithm that, for each cluster with sensitivity results available for at least 50% of component cultures, calculated an antibiotic susceptibility difference score by summing the number of individual antibiotic sensitivity result pairwise differences and weighting the overall result by the number of cultures having each of the compared patterns. The resulting score thus represented the average number of differing antibiotic sensitivities between each pair of bacterial isolates. This filtering method, applied to the output of the individual screening algorithms, allowed the analysis to exclude clusters not meeting empirically derived uniformity limits (ie, those that appeared to be non-clonal based on varied culture sensitivities) while still allowing the system to detect potentially clonal clusters that had mutated only slightly in their antibiotic resistance over the course of the outbreak. A final best-case heuristic combination of these methods comprised the phase 3 combined detection system. With these adjustments in place, phase 3 of the study concluded by conducting a brief retrospective validation of the combined outbreak detection system's recall. The system was run using data from January 1, 2010 to June 30, 2010 and the resulting clusters were compared to the list of confirmed outbreaks that had been previously discovered by hospital infection control staff using manual methods.
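One way to read the antibiotic susceptibility difference score is as the mean number of differing sensitivity results over all pairs of isolates in a cluster. The sketch below computes that average directly over isolate pairs, which gives the same result as weighting pattern-to-pattern differences by the number of cultures sharing each pattern. The function names, the dictionary representation of a sensitivity pattern, and the choice to compare only antibiotics tested on both isolates are assumptions; the study does not specify these details.

```python
from itertools import combinations


def pattern_difference(a, b):
    """Count antibiotics tested on both isolates with differing results."""
    shared = a.keys() & b.keys()
    return sum(1 for drug in shared if a[drug] != b[drug])


def susceptibility_difference_score(isolates):
    """Average number of differing sensitivities per pair of isolates."""
    pairs = list(combinations(isolates, 2))
    if not pairs:
        return 0.0
    return sum(pattern_difference(a, b) for a, b in pairs) / len(pairs)


def passes_uniformity_filter(isolates, threshold):
    """Keep a cluster only if its score does not exceed the threshold."""
    return susceptibility_difference_score(isolates) <= threshold


# Hypothetical cluster: two identical isolates and one differing in two drugs.
cluster = [
    {"ampicillin": "R", "gentamicin": "S", "ciprofloxacin": "S"},
    {"ampicillin": "R", "gentamicin": "S", "ciprofloxacin": "S"},
    {"ampicillin": "S", "gentamicin": "S", "ciprofloxacin": "R"},
]
print(susceptibility_difference_score(cluster))
```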
RESULTS
Phase 1: Algorithm parameters
Using the first and second datasets, the authors empirically set the parameters for each algorithm. For EWMA, the authors set a decay rate λ=0.3 and an alerting threshold k=5. For CUSUM, the authors used a V-mask for determining the alerting threshold with a daily rise of three times the standard deviation of the CUSUM statistic for each particular organism. SaTScan was executed using its purely temporal Poisson model, and WSARE with its Fisher's exact scoring metric and 100 randomizations for each day.
Phase 2.1: Expert review results
For institution-wide microbial data covering the 2-year study period, the four outbreak detection algorithms collectively generated a total of 257 alerts (CUSUM: 114, EWMA: 66, SaTScan: 21, WSARE: 56). To present alerts to clinical expert reviewers, the study combined any computer-generated alerts with start and stop dates differing by fewer than 2 days into one single alert. As a result, six alerts detected by two algorithms and one alert detected by three algorithms were combined to form the final review list of 249 alerts.
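The merging step can be sketched as follows, and it also explains the arithmetic: six alerts shared by two algorithms remove six duplicates and one alert shared by three algorithms removes two more, so 257 - 6 - 2 = 249. The record layout and function names are hypothetical, and requiring the same organism before merging is an assumption not stated in the text.

```python
from datetime import date


def same_event(a, b, max_gap_days=2):
    """True when start and stop dates each differ by fewer than max_gap_days."""
    return (a["organism"] == b["organism"]
            and abs((a["start"] - b["start"]).days) < max_gap_days
            and abs((a["stop"] - b["stop"]).days) < max_gap_days)


def merge_alerts(alerts):
    """Collapse alerts describing the same event, tracking their sources."""
    merged = []
    for alert in alerts:
        for group in merged:
            if same_event(group, alert):
                group["algorithms"].add(alert["algorithm"])
                break
        else:
            merged.append({
                "organism": alert["organism"],
                "start": alert["start"],
                "stop": alert["stop"],
                "algorithms": {alert["algorithm"]},
            })
    return merged


# Hypothetical alerts: the first two describe the same event.
alerts = [
    {"algorithm": "EWMA", "organism": "MRSA",
     "start": date(2005, 3, 1), "stop": date(2005, 3, 5)},
    {"algorithm": "WSARE", "organism": "MRSA",
     "start": date(2005, 3, 2), "stop": date(2005, 3, 5)},
    {"algorithm": "CUSUM", "organism": "MRSA",
     "start": date(2005, 6, 1), "stop": date(2005, 6, 4)},
]
print(len(merge_alerts(alerts)))
```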
Percent agreement on the clusters between the two assigned reviewers ranged from 79% to 88%, with Cohen's κ ranging from 0.11 to 0.49 (table 2). Overall, reviewers agreed on their determinations for 210 of the 249 alerts, with 17 (8.1%) deemed candidate outbreaks.
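High percent agreement can coexist with a low κ when one category dominates, because most of the agreement is then expected by chance. The sketch below computes both quantities from a 2x2 table of reviewer decisions. The counts are invented purely to illustrate the effect and are not the study's reviewer-level data.

```python
def cohens_kappa(both_yes, a_only, b_only, both_no):
    """Return (observed agreement, Cohen's kappa) for two binary raters."""
    n = both_yes + a_only + b_only + both_no
    observed = (both_yes + both_no) / n
    a_yes = (both_yes + a_only) / n
    b_yes = (both_yes + b_only) / n
    # Agreement expected by chance if the raters decided independently.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    if expected == 1:
        return observed, 0.0  # kappa is undefined; report 0 by convention
    return observed, (observed - expected) / (1 - expected)


# Hypothetical table: 100 clusters, few rated as candidate outbreaks.
agreement, kappa = cohens_kappa(both_yes=2, a_only=6, b_only=6, both_no=86)
print(f"{agreement:.0%} agreement, kappa = {kappa:.2f}")
```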
For the 39 clusters on which the pair of initial reviewer assessments disagreed, the study assigned a randomly selected third reviewer. Of the 39, the third reviewer deemed nine (23%) to be candidate outbreaks. Six randomly selected candidate outbreaks (where the two initial reviewers agreed the cluster was a potential outbreak) and six randomly selected false alarms (where the reviewers had agreed the cluster was not an outbreak) were also assigned to a random third reviewer. The third reviewer agreed with the first two reviewers on all six of the false alarms. However, for the six pairwise-agreed-upon candidate outbreaks, the third expert reviewer only agreed with the initial experts' judgment once (17%).
The hospital infection control service had previously identified five suspected outbreak clusters during the study period. Those clusters were not detected by any of the algorithms as originally configured for the phase 1 study. Of the five, two have been excluded from the study analysis. In one, the laboratory assay for the involved organism, Clostridium difficile, was not included in the input since the dataset only included organisms identified by microbiological culturing and thus C difficile antigen could not be detected by the algorithms. In the other, the outbreak spanned several months and began prior to the beginning of the study period. The study gold standard outbreak dataset therefore contained 29 candidate outbreaks: 17 from the initial expert consensus review, nine from the second expert conflict-resolving review, and three from the infection control archival data.
Figure 1 Flow of microbiologic culture data during study phases 1 and 2.
Table 2 Percent agreement between reviewers (Cohen's κ in parentheses)
Reviewer 1 | Reviewer 2 | % Agreement
A          | B          | 86% (0.22)
A          | C          | 81% (0.47)
A          | D          | 88% (0.48)
B          | C          | 85% (0.49)
B          | D          | 88% (0.38)
C          | D          | 79% (0.11)
Phase 2.2: Algorithm performance
For the four evaluated algorithms, the positive predictive value relative to the study-derived gold standard ranged from 5.3% to 29%, with sensitivities ranging from 21% to 31%. Table 3 shows individual results for each algorithm. The differences in sensitivity were not sufficient to reject the null hypothesis that the algorithms had identical performance. For positive predictive value, CUSUM was significantly lower than all other algorithms (p<0.001 in all comparisons), and EWMA and WSARE were significantly lower than SaTScan (p<0.001 for each).
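The reported figures follow directly from the counts in table 3 and the 29-cluster gold standard: positive predictive value (precision) is candidates divided by all alerts from that algorithm, and sensitivity (recall) is candidates divided by 29. The short sketch below reproduces the calculation; the variable and function names are illustrative.

```python
GOLD_STANDARD_TOTAL = 29  # candidate outbreaks in the study gold standard

# (candidate, non-candidate) alert counts per algorithm, from table 3.
TABLE_3 = {
    "CUSUM": (6, 108),
    "EWMA": (9, 57),
    "SaTScan": (6, 15),
    "WSARE": (7, 49),
}


def precision_recall(candidates, non_candidates, gold_total):
    """Return (positive predictive value, sensitivity) as fractions."""
    precision = candidates / (candidates + non_candidates)
    recall = candidates / gold_total
    return precision, recall


for name, (cand, non_cand) in TABLE_3.items():
    ppv, sens = precision_recall(cand, non_cand, GOLD_STANDARD_TOTAL)
    print(f"{name}: PPV {ppv:.3f}, sensitivity {sens:.3f}")
```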
Stratifying the analysis by location type (hospital-wide clusters and inpatient units as inpatient; clinics and emergency rooms as outpatient) demonstrated that clusters from inpatient locations were much more likely to be considered candidate outbreaks than clusters from outpatient locations (inpatient: 21/120 clusters vs outpatient: 5/129 clusters; χ² p=0.002).
Phase 3.1: Parameter adjustment
As EWMA yielded both better positive predictive value and sensitivity than CUSUM, project members adjusted EWMA's decay rates and minimum alerting thresholds in phase 3. After the adjustments, EWMA detected up to 24 of the 29 candidate outbreaks, but its positive predictive value suffered at this sensitivity, falling to 3.7% (629 false alarms) at this most sensitive setting.
Phase 3.2: Scoring metrics
Using the minimum alerting threshold k as the initial ranking metric to sort the original list of 249 clusters generated by the four algorithms yielded an area under the precision-recall curve (AUC) of 0.283, where the AUC for a precision-recall curve represents the average overall precision. A linear interpolation of the expert reviewers' performance targets of 0.5 precision at 0.9 recall and 0.75 precision at 0.25 recall gives a target AUC of 0.65. Figure 2 shows the precision-recall curve for this initial metric, with the curve for the adjusted EWMA algorithm and points for each of the individual algorithms.
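The target AUC can be checked by extending the straight line through the two target points across recall 0 to 1 and taking the area under it. The sketch below does this and also shows one common way to build precision-recall points from a ranked cluster list. The function names are hypothetical, and the trapezoidal area is an assumption, since the text does not state how the study's AUC values were computed.

```python
def precision_recall_points(ranked_labels, total_positives):
    """Walk down a ranked list; labels are True for candidate outbreaks."""
    points, hits = [], 0
    for rank, label in enumerate(ranked_labels, start=1):
        hits += bool(label)
        points.append((hits / total_positives, hits / rank))
    return points


def trapezoid_area(points):
    """Area under (recall, precision) points by the trapezoidal rule."""
    pts = sorted(points)
    return sum((r1 - r0) * (p0 + p1) / 2
               for (r0, p0), (r1, p1) in zip(pts, pts[1:]))


def target_auc(point_a=(0.25, 0.75), point_b=(0.9, 0.5)):
    """Area under the line through two (recall, precision) targets on [0, 1]."""
    slope = (point_b[1] - point_a[1]) / (point_b[0] - point_a[0])
    at_zero = point_a[1] - slope * point_a[0]
    at_one = point_a[1] + slope * (1 - point_a[0])
    return (at_zero + at_one) / 2  # area of a trapezoid of width 1


print(round(target_auc(), 3))
print(precision_recall_points([True, False, True], total_positives=2))
```

The line through the targets has precision of about 0.846 at recall 0 and about 0.462 at recall 1, so the area under it is about 0.654, consistent with the stated target of 0.65.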
To investigate whether primary culture specimen type could help to separate clinically significant clusters from less important ones, project members developed an algorithm that labeled each cluster by specimen type (blood, urine, wound, etc) if more than 50% of the cultures in a given cluster shared a common source. A χ² test compared that specimen type to all other cultures independent of source type. The only statistically significant relationship this analysis identified was that urine cultures were less reliable indicators of clusters than other specimen types (2.0% of urine vs 13% non-urine; p=0.029). After adjusting the ranking metric downward for clusters of urine cultures, the k-sorted precision-recall AUC improved from 0.283 to 0.356. As observed in phase 2, clusters in inpatient locations were more likely to produce candidate outbreaks than clusters in outpatient units. After increasing the ranking metric for inpatient clusters, the AUC rose from 0.356 to 0.489.
Project members calculated antibiotic susceptibility difference scores for the 165 clusters that met the 50% criterion, including six of the 19 candidate outbreaks. Antibiotic susceptibility difference scores ranged from 0 to 138 in the false alarm clusters and from 0 to 2.7 in the candidate outbreaks. Based on these results, project members generated new precision-recall curves after eliminating all clusters with difference scores greater than a conservative threshold of 5 and an aggressive threshold of 3. These adjustments increased the precision-recall AUC from 0.489 to 0.528 for the conservative threshold and to 0.553 for the aggressive threshold. Precision-recall curves for each of these adjustments are shown in figure 3.
Phase 3.3: Retrospective evaluation of combined algorithms
During the 6-month retrospective evaluation period, infection
control staff identified and confirmed two single-unit outbreaks:
an outbreak of vancomycin-resistant Enterococcus, and an
outbreak of C difficile. Unlike the phase 2 dataset, in phase 3,
non-culture assays were added, allowing the system to detect
the C difficile outbreak. The system detected a total of 41 clusters
during that time period, including both of the confirmed
outbreak clusters. No phase 2-type expert analyses of the other
39 clusters were conducted.
DISCUSSION
This exploratory study attempted to determine whether one or
more aberrancy detection algorithms might be adapted to
screening for potentially clonal hospital outbreak detection.
Because each algorithm produced a list of interesting suspect
clusters substantially different from the others, an ideal system
in this setting would consist of multiple algorithms working
together.
Table 3 Cluster determination by algorithm
Algorithm  Candidate  Non-candidate  PPV    Sensitivity
CUSUM      6          108            5.3%   21%
EWMA       9          57             14%    31%
SaTScan    6          15             29%    21%
WSARE      7          49             13%    24%
PPV, positive predictive value.
Figure 2 Precision-recall measurements for individual algorithms;
precision-recall curves for EWMA adjustments and initial scoring metric.
Cluster review
Analysis of the expert review process demonstrated the degree of
subjectivity in determining which clusters were potentially
interesting. The first round of reviews only managed moderate
levels of inter-rater agreement, as shown in table 2. Because the
overall prevalence of true positive clusters was relatively low,
measured values of Cohen's κ were low despite a high
percentage of agreement between reviewers. The low κ suggests
that despite having similar training and using similar review
criteria, the expert reviewers disagreed fairly often, and that
constructing a true gold standard is not possible. In the second
round tiebreaker reviews, the third reviewer only agreed with
the initial reviews on 17% of the seed candidate outbreaks. By
contrast, when the third reviewer examined clusters which one
of the two original reviewers had designated as a candidate
cluster and the other had not, the third reviewer designated the
cluster as a candidate 23% of the time.
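The effect of low prevalence on Cohen's κ noted above can be shown with a small worked example. The counts below are hypothetical and are not taken from table 2; the function name is an assumption made for illustration.

```python
def cohens_kappa(both_yes, a_only, b_only, both_no):
    """Return (observed agreement, Cohen's kappa) for two raters, binary labels."""
    n = both_yes + a_only + b_only + both_no
    observed = (both_yes + both_no) / n
    a_yes = (both_yes + a_only) / n   # rater A's rate of "candidate" calls
    b_yes = (both_yes + b_only) / n   # rater B's rate of "candidate" calls
    # Agreement expected by chance if the raters labeled independently.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return observed, (observed - expected) / (1 - expected)


# Hypothetical review of 100 clusters in which candidates are rare.
agreement, kappa = cohens_kappa(both_yes=2, a_only=4, b_only=4, both_no=90)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Here the reviewers agree on 92 of 100 clusters, yet most of that agreement is expected by chance because nearly every cluster is a non-candidate, so κ stays low.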
The low reviewer agreement suggests that an ideal hospital
outbreak detection screening tool should favor sensitivity over
positive predictive value since experts may disagree on which
clusters merit further investigation. This strategy is further
supported by standard infection control practice: in a prospective
study, further investigation including molecular typing
would have followed on each of the potentially interesting
clusters to confirm clonality. Because the investigation will
easily distinguish true positives from false positives, it is more
important that the detection system acts as a screening test
that does not produce many false negatives.
System performance and ranking
The lack of consensus among alerts generated by the four
algorithms and the excessive false positive rate for the
parameter-adjusted EWMA algorithm suggest that none of the four
algorithms evaluated can solely provide a reliable alerting
mechanism. Thus, to create a functionally useful alerting system
for hospital infection control purposes, some algorithmic
combination technique that leverages the relative strengths of
each individual algorithm will likely provide the best overall
system.
Prior to the current study's data analysis, the expert reviewers
stated that an outbreak screening system they would use in
practice would need to reach a 50% positive predictive value at
0.9 sensitivity and 0.25 sensitivity at a 75% positive predictive
value. Ranking the combined list of clusters using the adjusted
scoring metric and eliminating clusters with dissimilar antibiotic
susceptibilities allowed us to
achieve a 40% positive predictive value up to a sensitivity of 0.9
and a sensitivity of approximately 0.15 at a positive predictive
value of 75%. While these results did not attain the targeted
performance levels, our experts found them encouraging, and
further improvements may be possible.
Limitations
The subjectivity of the review process led to an imperfect gold
standard list of candidate outbreaks. The gold standard list
could easily have missed some true outbreaks due to reviewer
disagreement on what constituted a candidate cluster. Furthermore,
the selection of algorithms for the study did not include
the newest syndromic surveillance methods [25–27], and the
parameter tuning required to implement each of the four algorithms
may not have been optimal, with the result that true
outbreak clusters may have been omitted from the algorithms'
output lists before ever being seen by the reviewers. That none
of the infection control-verified outbreaks during the study period
appeared on the combined output list of the four algorithms
suggests that suboptimal detection at the algorithmic level was
a factor in our study.
Figure 3 Precision-recall curves for adjusted scoring metrics.
The culture results dataset used to generate the alerts also
contained potential methodological flaws. The study used only
the first result for a given organism/patient/unit combination in
the dataset. While this approach prevents spurious alerts for
multiple consecutive positive cultures on the same patient, it
may have been too conservative overall. For example, a patient
with Escherichia coli cultures in January 2005 and January 2006
would only be included in 2005, although it is unlikely that the
patient's infection lasted a full year. Additional errors may also
arise from the system's lack of information about changes
within the hospital over time. For example, in late 2005
(approximately halfway through the study period), the burn
intensive care unit was relocated to another geographic ward, so
new patient-organism-location clusters that previously would
have been suppressed as duplicate cultures were not suppressed
since they were reported from a different geographic unit. In
addition, some clusters were simply a result of increased
surveillance for certain organisms or an increase in a hospital
unit's size or number of patient days, as the study did not adjust
for increases in patient bed days.
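The first-result rule described above, and one possible less conservative alternative, can be sketched as follows. The record layout, the function names, the unit label, and the 180-day window are all assumptions made for illustration; the study does not specify a time-window variant.

```python
from datetime import date, timedelta


def _key(culture):
    return (culture["patient"], culture["organism"], culture["unit"])


def first_result_only(cultures):
    """Keep only the earliest culture per patient/organism/unit (the study's rule)."""
    seen, kept = set(), []
    for culture in sorted(cultures, key=lambda c: c["date"]):
        if _key(culture) not in seen:
            seen.add(_key(culture))
            kept.append(culture)
    return kept


def first_result_per_window(cultures, window=timedelta(days=180)):
    """Hypothetical variant: keep a culture if more than `window` has passed
    since the last kept culture with the same patient/organism/unit."""
    last_kept, kept = {}, []
    for culture in sorted(cultures, key=lambda c: c["date"]):
        previous = last_kept.get(_key(culture))
        if previous is None or culture["date"] - previous > window:
            last_kept[_key(culture)] = culture["date"]
            kept.append(culture)
    return kept


# Toy records mirroring the E coli example in the text.
base = {"patient": "P1", "organism": "Escherichia coli", "unit": "Unit A"}
cultures = [
    {**base, "date": date(2005, 1, 10)},
    {**base, "date": date(2005, 1, 13)},   # repeat culture, same episode
    {**base, "date": date(2006, 1, 12)},   # likely a new episode
]
print(len(first_result_only(cultures)))        # 1
print(len(first_result_per_window(cultures)))  # 2
```

Both filters suppress the repeat culture taken 3 days later, but only the windowed variant retains the culture from the following year.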
The adjustment for antibiotic sensitivity similarity was
somewhat crude. For example, if an algorithm detected a cluster
made up of two distinct clones with widely differing sensitivities,
the resulting average difference between the two could be
large enough to eliminate the cluster from further consideration.
Ideally, available antibiotic sensitivity data should be included
earlier in the detection process.
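The two-clone problem described above can be illustrated with a toy calculation. The scoring rule used here (mean number of discordant antibiotic results over all isolate pairs) is an assumption chosen for simplicity and is not the study's difference score; the antibiograms are hypothetical.

```python
from itertools import combinations


def pairwise_difference(a, b):
    """Count antibiotics tested in both isolates that have discordant results."""
    shared = a.keys() & b.keys()
    return sum(a[drug] != b[drug] for drug in shared)


def mean_difference(isolates):
    """Average pairwise difference across all isolates in a cluster."""
    pairs = list(combinations(isolates, 2))
    if not pairs:
        return 0.0
    return sum(pairwise_difference(a, b) for a, b in pairs) / len(pairs)


# Hypothetical antibiograms (R = resistant, S = susceptible).
clone_a = {"ampicillin": "R", "ciprofloxacin": "S",
           "gentamicin": "S", "vancomycin": "S"}
clone_b = {"ampicillin": "S", "ciprofloxacin": "R",
           "gentamicin": "R", "vancomycin": "R"}

single_clone = [clone_a] * 3
mixed_cluster = [clone_a] * 3 + [clone_b] * 3

print(mean_difference(single_clone))   # 0.0
print(mean_difference(mixed_cluster))  # 2.4
```

Each clone is internally identical, yet the mixed cluster's average is pulled up by the cross-clone pairs, so a single threshold on the average could discard a cluster that contains two genuine clonal groups.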
Lastly, the performance of the system on retrospective datasets
does not guarantee similar future performance. Because the review
process was time consuming for the reviewers and the number of
expected candidate outbreaks was limited, the resulting parameter
adjustments have not been validated extensively. The optimal
alerting thresholds determined in the current study may be
overfitted to the current data. Nevertheless, the 6-month retrospective
evaluation demonstrated that the resulting system was
able to detect all outbreaks confirmed by hospital infection
control staff during that time period.
CONCLUSION
The current study explored the potential for a syndromic-surveillance-based
approach to screening for potentially clonal
inpatient infectious disease outbreaks. Each of the four aberrancy
detection algorithms that the study examined had
different performance characteristics that limited its individual
applicability to the problem at hand. However, by combining
the output from each algorithm and then sorting and filtering
the possible clusters that the algorithms identify based on
additional heuristic data that the algorithms cannot easily
incorporate, the authors created a prototypic combined
screening tool that demonstrated better potential to be clinically
useful for hospital outbreak detection than any of the individual
algorithms. Thus, while in-hospital outbreak surveillance presents
different challenges than those faced by regional syndromic
surveillance, the algorithms developed for syndromic surveillance
may eventually be adapted to the inpatient screening
setting. Further, more formal evaluation of such combined
systems should occur.
Funding This study was funded by the National Library of Medicine, National
Institutes of Health (grants T15 LM007450-08 and 5R01-LM07995-06).
Competing interests None.
Ethics approval Vanderbilt University IRB approved this study.
Provenance and peer review Not commissioned; externally peer reviewed.
REFERENCES
1. Sagel U, Schulte B, Heeg P, et al. Vancomycin-resistant enterococci outbreak,
Germany, and calculation of outbreak start. Emerg Infect Dis 2008;14:317–19.
2. Hacek DM, Cordell RL, Noskin GA, et al. Computer-assisted surveillance for detecting
clonal outbreaks of nosocomial infection. J Clin Microbiol 2004;42:1170–5.
3. Sagel U, Mikolajczyk RT, Krämer A. Using mandatory data collection on
multiresistant bacteria for internal surveillance in a hospital. Methods Inf Med
2004;43:483–5.
4. Buckeridge DL. Outbreak detection through automated surveillance: a review of the
determinants of detection. J Biomed Inform 2007;40:370–9.
5. Buckeridge DL, Burkom H, Campbell M, et al. Algorithms for rapid outbreak
detection: a research synthesis. J Biomed Inform 2005;38:99–113.
6. Bravata DM, McDonald KM, Smith WM, et al. Systematic review: surveillance
systems for early detection of bioterrorism-related diseases. Ann Intern Med
2004;140:910–22.
7. Tenover FC, Arbeit R, Archer G, et al. Comparison of traditional and molecular methods
of typing isolates of Staphylococcus aureus. J Clin Microbiol 1994;32:407–15.
8. Shewhart W. Economic Control of Quality of Manufactured Product. New York:
D. Van Nostrand Company Inc., 1931.
9. National Institute of Standards and Technology. NIST/SEMATECH e-Handbook
of Statistical Methods, 2008. http://www.itl.nist.gov/div898/handbook/index.htm.
10. Rowlands RJ, Nix ABJ, Abdollahian MA, et al. Snub-Nosed V-Mask Control
Schemes. Journal of the Royal Statistical Society: Series D (The Statistician)
1982;31:133–42.
11. Lucas JM. A Modified “V” Mask Control Scheme. Technometrics 1973;15:833–47.
12. Lucas JM, Saccucci MS, Robert V, et al. Exponentially weighted moving average
control schemes: properties and enhancements. Technometrics 1990;32:1–29.
13. Jernigan JA, Stephens DS, Ashford DA, et al; Anthrax Bioterrorism Investigation
Team. Bioterrorism-related inhalational anthrax: the first 10 cases reported in the
United States. Emerg Infect Dis 2001;7:933–44.
14. Kulldorff M. A spatial scan statistic. Commun Stat Theory Methods
1997;26:1481–96.
15. Kulldorff M, Heffernan R, Hartman J, et al. A space-time permutation scan statistic
for disease outbreak detection. PLoS Med 2005;2:216–24.
16. Mostashari F, Kulldorff M, Hartman JJ, et al. Dead bird clusters as an early warning
system for West Nile virus activity. Emerg Infect Dis 2003;9:641–6.
17. Kulldorff M. Prospective Time Periodic Geographical Disease Surveillance Using
a Scan Statistic. Journal of the Royal Statistical Society. Series A (Statistics in
Society) 2001;164:61–72.
18. Kulldorff M, Feuer EJ, Miller BA, et al. Breast cancer clusters in the northeast
United States: a geographic analysis. Am J Epidemiol 1997;146:161–70.
19. Kulldorff M, Tango T, Park P. Power comparisons for disease clustering tests.
Comput Stat Data Anal 2003;42:665–84.
20. Wong W, Moore A, Cooper G, et al. What’s strange about recent events (WSARE):
An algorithm for the early detection of disease outbreaks. J Mach Learn Res
2005;6:1961–98.
21. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: a Practical and
Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series
B (Methodological) 1995;57:289–300.
22. Espino JU, Wagner M, Szczepaniak C, et al. Removing a barrier to computer-based
outbreak and disease surveillance--the RODS Open Source Project. MMWR Morb
Mortal Wkly Rep 2004;(53 Suppl):32–9.
23. Gesteland PH, Gardner RM, Tsui FC, et al. Automated syndromic surveillance for the
2002 Winter Olympics. J Am Med Inform Assoc 2003;10:547–54.
24. Tsui FC, Espino JU, Dato VM, et al. Technical description of RODS: a real-time public
health surveillance system. J Am Med Inform Assoc 2003;10:399–408.
25. Mnatsakanyan ZR, Burkom HS, Coberly JS, et al. Bayesian information fusion
networks for biosurveillance applications. J Am Med Inform Assoc 2009;16:855–63.
26. Jiang X, Cooper GF. A recursive algorithm for spatial cluster detection. AMIA Annu
Symp Proc 2007;2007:369–73.
27. Que J, Tsui FC. A Multi-level spatial clustering algorithm for detection of disease
outbreaks. AMIA Annu Symp Proc 2008;2008:611–15.