Big Data in Health Care: Using Analytics to Identify and Manage High-Risk and High-Cost Patients

Abstract

The US health care system is rapidly adopting electronic health records, which will dramatically increase the quantity of clinical data that are available electronically. Simultaneously, rapid progress has been made in clinical analytics-techniques for analyzing large quantities of data and gleaning new insights from that analysis-which is part of what is known as big data. As a result, there are unprecedented opportunities to use big data to reduce the costs of health care in the United States. We present six use cases-that is, key examples-where some of the clearest opportunities exist to reduce costs through the use of big data: high-cost patients, readmissions, triage, decompensation (when a patient's condition worsens), adverse events, and treatment optimization for diseases affecting multiple organ systems. We discuss the types of insights that are likely to emerge from clinical analytics, the types of data needed to obtain such insights, and the infrastructure-analytics, algorithms, registries, assessment scores, monitoring devices, and so forth-that organizations will need to perform the necessary analyses and to implement changes that will improve care while reducing costs. Our findings have policy implications for regulatory oversight, ways to address privacy concerns, and the support of research on analytics.
At the Intersection of Health, Health Care and Policy
Cite this article as: David W. Bates, Suchi Saria, Lucila Ohno-Machado, Anand Shah, and Gabriel Escobar. Big Data In Health Care: Using Analytics To Identify And Manage High-Risk And High-Cost Patients. Health Affairs 33, no. 7 (2014): 1123-1131. doi: 10.1377/hlthaff.2014.0041
The online version of this article, along with updated information and services, is available at: http://content.healthaffairs.org/content/33/7/1123.full.html
Health Affairs is published monthly by Project HOPE at 7500 Old Georgetown Road, Suite 600, Bethesda, MD 20814-6133. Copyright © 2014 by Project HOPE - The People-to-People Health Foundation. As provided by United States copyright law (Title 17, U.S. Code), no part of Health Affairs may be reproduced, displayed, or transmitted in any form or by any means, electronic or mechanical, including photocopying or by information storage or retrieval systems, without prior written permission from the Publisher. All rights reserved.
By David W. Bates, Suchi Saria, Lucila Ohno-Machado, Anand Shah, and Gabriel Escobar
The cost of health care in the United States is high, nearly twice that in most other developed countries,1 and it continues to grow rapidly. The unsustainable projected trajectory of US health care costs has led to calls for improving the value of health care.2 However, the Affordable Care Act, the most substantial policy reform in US health care in decades, has been criticized for not doing enough to contain costs.3
As health reform progresses, one key dynamic of the US health care system is the rapid adoption of electronic health records (EHRs). The growth of EHRs will make it possible to access unprecedented amounts of clinical data and offers the potential for cost savings.4 The extent of those cost savings is still to be determined,5 but EHRs' value in increasing health care providers' access to patients' records is not in question.
In other industries, companies have been very successful at using big data to improve their efficiency.6 By "big data," we refer to the high volume, variety, and potential for the rapid accumulation of data and to analytics, which is the discovery and communication of patterns in data. Examples include Amazon's product recommendation system for online shopping, creating efficient pricing in the stock market, and predicting players' statistics in baseball. "Watson," an application developed by IBM, had a recent success on the television quiz show Jeopardy, using some of these big-data approaches.7 However, the extent to which these tactics will be applicable to clinical questions is as yet uncertain.8
David W. Bates (dbates@partners.org) is chief of the Division of General Medicine, Brigham and Women's Hospital, in Boston, Massachusetts.
Suchi Saria is an assistant professor of computer science and health policy management at the Center for Population Health and IT, Johns Hopkins University, in Baltimore, Maryland.
Lucila Ohno-Machado is associate dean for informatics and technology in the Division of Biomedical Informatics, University of California, San Diego, in La Jolla.
Anand Shah is vice president of clinical services at PCCI, in Dallas, Texas.
Gabriel Escobar is regional director of hospital operations research and director of the Systems Research Initiative, Division of Research, Kaiser Permanente, in Oakland, California.
The underlying techniques used in big data have improved substantially in the past decade, and they often involve hypothesis-free approaches such as data mining. Many experts have called for health care to adopt big-data approaches,9 but uptake has been relatively limited so far.
That may be about to change. Payment reform strategies that incentivize value, such as accountable care (a key strategy of the Affordable Care Act, in which entities are asked to be "accountable" for the care they provide) and bundling (a payment approach in which providers are asked to deliver a set of services for a predefined price), are intended to motivate organizations to improve the efficiency of their care. One tactic that health care organizations will likely deploy is the more effective use of predictive analytics.
Ideally, predictive analytics will involve link-
ing data from multiple sources, including clini-
cal, genetic and genomic, outcomes, claims, and
social data. Many new sources of data are becom-
ing available, such as data from cell phones and
social media applications. Aggregating these
data for the purpose of achieving clinical predic-
tive analytics will require the adoption of stand-
ards,10 raise privacy and ethical concerns,11 and
require new ways to preserve privacy.12
Big data sets can be subjected to many other
types of analytic approaches, including pattern
recognition and natural history (that is, the
course of a disease process). However, we believe
that even in the short term, it will be possible for
health care organizations to realize substantial
benefits from deploying predictive systems. Pre-
dictive systems are software tools that allow the
stratification of risk to predict an outcome. Such
tools are important because many potential out-
comes are associated with harm to patients, are
expensive, or both.
In health care, we suggest that one way to use predictive systems would be to identify and manage six very practical use cases, that is, examples of instances in which value is likely to be achieved. They are high-cost patients, readmissions, triage, decompensation (when a patient's condition worsens), adverse events, and treatment optimization for diseases affecting multiple organ systems (such as autoimmune diseases, including lupus). Below we address the types of data and infrastructure that health care organizations will need for each use case. We also discuss what organizations will need to do to actually improve care.
High-Cost Patients
Approximately 5 percent of patients account for
about 50 percent of all US health care spending.13
One approach to reducing costs is to identify
such patients and manage them more effectively,
often by having case managers work with them to
improve their care. Such an approach has already
resulted in cost reductions.14 However, the iden-
tification of potentially high-cost patients has
not always produced the desired results. For ex-
ample, a number of Medicare demonstration
projects did not lower costs even though the
projects were able to identify high-risk pa-
tients.14,15
To effectively implement analytic methods for
identifying potentially high-cost patients, a
number of issues must be considered. First, what
approach should be used to predict which patients
are likely to be high risk or high cost?
Second, what new measurement sources can be
incorporated to improve the predictions? Attri-
butes associated with high-cost patients may
include behavioral health problems or socio-
economic factors such as poverty or racial mi-
nority status. Thus, integrating data about men-
tal health, socioeconomic status, or other issues
such as marital and living status from various
sources16 may significantly change the quality
of the predictions that can be made.
A third issue is how to make predictions ac-
tionable, by identifying which patients are most
likely to benefit from an intervention and what
specific interventions can most improve care.
The effective implementation of new analytic
systems to identify potentially high-cost patients
will require making predictions easily available
with minimal changes to clinical work flows, to
increase the chances that health care providers
will act on the predictions.
Many organizations and companies that cur-
rently use analytic systems have focused on iden-
tifying the algorithm that can best stratify data by
risk of future costs while not addressing other
issues. The variation among algorithms may not
be large, and a more practical algorithm may be
better than a slightly more accurate one. Algo-
rithms are most effective and perform best when
they are derived from and then used in similar
populations.17-19
A fourth issue is how to account for the fact
that many cases of outcomes in predictive mod-
els often come from low-risk groups. This sug-
gests the need for more accurate modeling, par-
ticularly for population management.
When analytic systems are used to identify potentially
high-cost patients, we suggest that it is important
to determine the patients' specific needs
and gaps in care. It is especially important to
identify and address behavioral health problems,
because a large portion of the patients at high
risk for hospital admission have some sort of
behavioral health issue, with depression being
especially frequent.20
Programs to manage high-cost patients are
expensive. They will be much more cost-effective
if interventions can be precisely tailored to a
patient's specific problems, which might be related
to transportation, medication nonadherence,
or family conflict.
Resources in health care are becoming increas-
ingly limited, which requires greater emphasis
on value. Thus, it will be important to investigate
analytic techniques that identify not only high-
risk people, but also those who are at particularly
low risk. For instance, the standard approach
may be to give all patients who are discharged
from the hospital a follow-up appointment in
two weeks. But it might make more sense to
ensure that the highest-risk patients are seen
within two days, while patients with very low risk
might require follow-up care only as needed.
Algorithms can help reallocate resources more
effectively at both the high-risk and low-risk
ends of the spectrum.
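The reallocation described above can be sketched in a few lines of Python. This is an illustration only: the function name `follow_up_plan` and the two risk cutoffs are assumptions made for the example, not values from any validated model. Only the follow-up options themselves (two days, two weeks, as needed) come from the text.

```python
from typing import Optional

# Hypothetical cutoffs for illustration only; a real program would
# calibrate them against its own population and outcomes.
HIGH_RISK_CUTOFF = 0.30
LOW_RISK_CUTOFF = 0.05


def follow_up_plan(predicted_risk: float) -> Optional[int]:
    """Return days until the follow-up visit, or None for 'as needed'."""
    if not 0.0 <= predicted_risk <= 1.0:
        raise ValueError("predicted_risk must be a probability in [0, 1]")
    if predicted_risk >= HIGH_RISK_CUTOFF:
        return 2       # highest-risk patients: seen within two days
    if predicted_risk <= LOW_RISK_CUTOFF:
        return None    # very low risk: follow-up only as needed
    return 14          # everyone else: the standard two-week visit
```

The point of the sketch is that the same risk estimate drives decisions at both ends of the spectrum, so visits freed up at the low-risk end can be redirected to the high-risk end.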
Readmissions
Much has been made of the frequency and high
cost of hospital readmissions.21 The Centers for
Medicare and Medicaid Services (CMS) has
strongly incentivized organizations to reduce
their frequency.22 As many as one-third of re-
admissions have been posited to be preventable
and, therefore, to present a significant opportu-
nity for improving care delivery.23
Health care organizations should all use an
algorithm to predict who is likely to be readmit-
ted to the hospital. However, the predictive value
of the algorithms tends to be similar. Four areas
of a predictive algorithm may be important dif-
ferentiators: tailoring the intervention to the in-
dividual patient, ensuring that patients actually
get the precise interventions intended for them,
monitoring specific patients after discharge to
find out if they are having problems before they
decompensate, and ensuring a low ratio of pa-
tients flagged for an intervention to patients who
experience a readmission (that is, a low false
positive rate).
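The last differentiator, the ratio of patients flagged to patients readmitted, is simple to compute once flags and outcomes are recorded side by side. The sketch below is a minimal example under that assumption; the function name `flag_metrics` and the returned keys are hypothetical. It also reports the share of flagged patients who were actually readmitted (the positive predictive value), which is closely related to the ratio discussed in the text.

```python
def flag_metrics(flagged: list[bool], readmitted: list[bool]) -> dict[str, float]:
    """Summarize how well intervention flags line up with readmissions."""
    if len(flagged) != len(readmitted):
        raise ValueError("flagged and readmitted must have the same length")
    n_flagged = sum(flagged)
    n_readmitted = sum(readmitted)
    # Patients who were both flagged and later readmitted.
    true_positives = sum(f and r for f, r in zip(flagged, readmitted))
    return {
        # Patients flagged per readmission observed; lower means less wasted effort.
        "flagged_per_readmission": (
            n_flagged / n_readmitted if n_readmitted else float("inf")
        ),
        # Share of flagged patients who were readmitted.
        "ppv": true_positives / n_flagged if n_flagged else 0.0,
        # Share of readmitted patients who had been flagged.
        "sensitivity": true_positives / n_readmitted if n_readmitted else 0.0,
    }
```

Two algorithms with similar overall predictive value can differ noticeably on these numbers at the flagging threshold an organization can actually afford to act on, which is why they are worth tracking alongside a summary accuracy measure.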
Some work has already been done in predict-
ing readmissions,24 and analytics will play a key
role in further work. For example, it may make
sense soon to ask patients with a smartphone to
allow health care organizations to access data
from their phones that will help identify patients
who are not managing a chronic condition well
or that will monitor people recently discharged
from the hospital, since it appears that patients
who are not making calls or sending e-mail with
their usual frequency may be depressed or suf-
fering from other issues.25 Patients may also be
asked to wear some type of device that monitors
physiological parameters, such as heart rate or
rhythm. These data will be most effective in in-
forming health care decisions if they are proc-
essed with analytics.
Triage
Estimating the risk of complications when a pa-
tient first presents to a hospital can be useful for
a number of reasons, such as managing staffing
and bed resources, anticipating the need for a
transfer to the appropriate unit, and informing
overall strategy for managing the patient. In the
neonatal setting, for example, the invention of
the Apgar score revolutionized the management
of newborn resuscitation.26,27 However, comput-
ing the score required training caregivers to as-
sess subjective parameters such as irritability
and color (a proxy for tissue perfusion, or
how well blood is flowing to tissues). In new-
borns and many other populations, using mod-
ern big-data techniques28 that combine routinely
collected physiological measurements makes
much more accurate assessments possible with
a minimal burden of training and implemen-
tation.29
In integrating a triage algorithm into clinical
work flow, it is vital to have a detailed guideline
that clarifies how the algorithm will inform care.
Two pilot programs in Kaiser Permanente North-
ern California (KPNC), an integrated health care
delivery system with comprehensive informa-
tion systems, are using this approach.
The first pilot involves evaluating newborns for early-onset sepsis. The goal is to reduce the number of newborns who receive antibiotics unnecessarily. Hundreds of thousands of newborns are evaluated for early-onset sepsis each year.30-32
Recently, a team of scientists from KPNC,
Harvard University, and the University of Cali-
fornia, San Francisco and Santa Cruz, developed
a two-step protocol that can be expected to de-
crease the number of these evaluations and re-
duce the prescription of antibiotics for newborns
dramatically in the United States. In the first
step, which can be embedded in an EHR, objec-
tive maternal data are used to assign a prelimi-
nary (prior to birth) probability of early-onset
sepsis.33 In the second step, a simplified set of
clinical findings is combined with the estimate
based on maternal data to yield a new posterior
probability for risk of sepsis following birth.34
The combination of these two steps could lead
to as many as 240,000 fewer US newborns being
treated with systemic antibiotics each year.
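The two-step logic is a Bayesian update: a prior probability is revised by new evidence to give a posterior probability. The sketch below shows the general technique using the odds form of Bayes' theorem. It is not the published protocol; the prior and the likelihood ratios are invented for illustration, and the name `posterior_probability` is hypothetical.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Update a prior probability with a likelihood ratio (odds form of Bayes' theorem)."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Step 1 (hypothetical value): maternal data give a prior risk of sepsis.
prior_from_maternal_data = 0.0005

# Step 2 (hypothetical values): clinical findings after birth shift that risk.
risk_if_well_appearing = posterior_probability(prior_from_maternal_data, 0.4)
risk_if_clinically_ill = posterior_probability(prior_from_maternal_data, 20.0)
```

A likelihood ratio below 1 lowers the estimated risk and a ratio above 1 raises it, which is how a reassuring examination can move a newborn below a treatment threshold even when the maternal risk factors alone would not.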
The second KPNC pilot addresses adult patients in the emergency department. Severity-of-illness scores for adult intensive care patients have been available for some time.35,36 However, the scores' impact on triage has been limited. This is in part because the most important of these, the Acute Physiology and Chronic Health Evaluation (APACHE)37 and the Simplified Acute Physiology Score (SAPS),38 involve data that are captured after a patient has entered intensive care.
In the second pilot, clinicians in the emergency department will be provided with two composite scores that have been calibrated using millions of patient records and that are applicable to all hospitalized patients, not just those in intensive care. The first of these scores summarizes a patient's global comorbidity burden during the preceding twelve months; the second captures a patient's physiological instability in the preceding seventy-two hours.39 In addition, these two scores, available in real time, are combined with vital signs, trends in vital signs, and other information, such as how long a patient has been in the hospital. If the information collectively indicates that a patient has at least an 8 percent risk of deteriorating in the next twelve hours, an alert is sent to the responsible providers.
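One way to picture how several inputs become a single alert decision is a logistic model followed by a threshold. The sketch below is a hedged illustration of that pattern, not the KPNC model: the feature names, intercept, and weights are all invented, and only the 8 percent alert level comes from the text.

```python
import math

# Alert when the estimated twelve-hour risk reaches 8 percent (from the text).
ALERT_THRESHOLD = 0.08

# Hypothetical coefficients, for illustration only.
INTERCEPT = -5.0
WEIGHTS = {
    "comorbidity_score": 0.01,   # burden over the preceding twelve months
    "instability_score": 0.03,   # physiology over the preceding 72 hours
    "hours_in_hospital": 0.005,  # length of stay so far
}


def deterioration_risk(features: dict[str, float]) -> float:
    """Combine the inputs into one probability with a logistic function."""
    z = INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def should_alert(features: dict[str, float]) -> bool:
    """Return True when the estimated risk is at or above the alert threshold."""
    return deterioration_risk(features) >= ALERT_THRESHOLD
```

For example, with these made-up weights a patient with scores of 100 and 80 after 48 hours in the hospital would trigger an alert, while a patient with scores of 20 and 10 after 12 hours would not.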
Importantly, the KPNC early-onset sepsis and
emergency department composite score pilots
are both designed for patients who are not being
monitored continuously, yet they take advantage
of big-data methodologies. In both cases, teams
of clinicians are developing work flows that in-
tegrate big-data components (real-time risk es-
timates) with traditional components (such as
clinical examinations and care pathways).
Decompensation
Often before decompensation (the worsening of a patient's condition), there is a period in which physiological data can be used to determine whether the patient is at risk for decompensating. Much of the initial rationale for intensive care units (ICUs) was to allow patients who were critically ill to be closely monitored. A host of technologies40 are now available that can be used to monitor patients who are in general care units, in nursing homes, or even at home but at risk of some sort of decompensation. Real-time indices such as the Rothman Index are also available.41-43
Some of these technologies have been avail-
able for many years, such as electrocardiograph-
ic monitoring and oxygen monitoring. Others
are newer, such as end-tidal CO2 monitoring
and monitors that allow detection of whether
or not a patient is moving.44,45 A problem with
all of these technologies has been the signal-to-
noise ratio: Alarms are often false positives.
Monitors are becoming available in which mul-
tiple data streams can be compared simulta-
neously, and analytics can be used in the back-
ground to determine whether or not the signal
is valid.
One example of these new monitors is a device
that sits under the mattress and that collects data
about the patient's respiratory rate and pulse and
whether or not the patient is moving.45 The data
are transmitted to a server, where analytics are
used in real time to determine if the patient
appears likely to be decompensating. When
the system detects a likely decompensation, an
e-mail message is sent to an on-duty nurse's
smartphone.
With this system, the likelihood that a true decompensation is present when an alert is sent has been increased to approximately 50 percent, far better than for cardiac telemetry, for which it is typically 5-10 percent. In one small trial, the system reduced the number of subsequent ICU days for patients in general care units by 47 percent, compared to controls.45
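A quick way to see what these percentages mean at the bedside is to convert them into the number of alarms a nurse must answer for each true decompensation. The short sketch below does that arithmetic; the function name `alarms_per_true_event` is hypothetical, and the inputs are the approximate figures quoted above.

```python
def alarms_per_true_event(ppv: float) -> float:
    """Average number of alarms answered for each real event, given the
    probability (ppv) that any one alarm reflects a true decompensation."""
    if not 0.0 < ppv <= 1.0:
        raise ValueError("ppv must be in the interval (0, 1]")
    return 1.0 / ppv


# Approximate figures from the text.
multi_stream_monitor = alarms_per_true_event(0.50)   # about 2 alarms per true event
telemetry_best_case = alarms_per_true_event(0.10)    # about 10 alarms per true event
telemetry_worst_case = alarms_per_true_event(0.05)   # about 20 alarms per true event
```

Seen this way, the improvement is a drop from roughly ten to twenty alarms per real event to about two.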
Analytics that use multiple data streams to
effectively detect decompensation are already
at work in some ICUs, and such use is expected
to grow. Analytic tools are likely to make their
way into other clinical settings as well to predict
decompensation.
Adverse Events
Another use case for analytics will be to predict
which patients are at risk of adverse events of
several types. Adverse events are expensive46 and
cause substantial morbidity and mortality, yet
many are preventable.
Renal Failure. Renal failure is extremely expensive
and carries a high risk of mortality.47
However, renal function is readily measured,
and early changes in it are often apparent well
before major decompensation occurs. It seems
likely that analytics could be combined with data
about exposures to specific medications and
with measures of kidney function, blood pres-
sure, urine output, and other processes to iden-
tify patients at risk of decompensation.
Infection. Analytics can be effective in managing
infection. One example involves monitoring
and interpreting changes in heart rate variability
for detection of major decompensation in
infants with very low birthweights before the
emergence of an infection.48 Monitoring the
heart-rate characteristics of newborns alone
has already resulted in reductions in mortality
and increases in the number of ventilator-free
days. However, there is room for improvement
using increasingly sophisticated analytics that
account for subtle signals28 but also filter out
extraneous patterns,49 such as those that occur
when the baby moves.
Adverse Drug Events. Adverse drug events,
which occur frequently50 and are costly,51 are an-
other area where analytics can be effective. Most
efforts so far to predict which patients will suffer
an adverse drug event have not been very effec-
tive.52,53 However, analytics have the potential to
predict with substantial accuracy which patient
may suffer an adverse drug event and to detect
patients who are in the early stages of such an
event, by assessing genetic and genomic infor-
mation, laboratory data, information on vital
signs, and other data.
Diseases Affecting Multiple Organ Systems
Chronic conditions that span more than one or-
gan system or are systemic in nature are some of
the costliest conditions to manage.54,55 Any single
disease may include cutaneous (skin), mucosal,
renal, musculoskeletal, pulmonary, hematologi-
cal, immunological, and neurologic manifesta-
tions.56 Autoimmune disorders such as sclero-
derma, rheumatoid arthritis, and systemic
lupus erythematosus are examples of such con-
ditions.
The ability to accurately predict the trajectory
of a patient's disease could allow the caregiver to
better target complicated and expensive thera-
pies to patients who stand to benefit the most
from them, thus reducing the burden of disease
on those patients and on the health care system.
Currently, the caregiver's ability to optimize
treatment is limited by the complexities result-
ing from the heterogeneity in clinical pheno-
types, the diversity of available measurements,
and lack of high-precision biomarkers.57
This area is ripe for computing approaches
that can combine the multitude of measure-
ments taken as part of routine care to infer the
progression of a patient's disease and tailor
treatments to that patient. There are already
some successful examples of these ap-
proaches.58,59
Multisite longitudinal registries that allow the
aggregation of populations of patients with a
disease or condition60 have been initiated.
In the near future, clinical data networks are
likely to play the role that registries now do.
One example of such a network is the National
Patient-Centered Clinical Research Network
(PCORnet),61 which itself comprises multiple
clinical data research networks. Access to longi-
tudinal records has been the biggest limitation
for making progress in the area of chronic dis-
eases in multiple organ systems. As EHRs and
clinical data networks based on EHRs become
widespread, we expect to see the benefits of these
technologies in improving care for patients with
such diseases.
Discussion
We have discussed six use cases for high-risk
patients in which clinical analytics are likely to
be highly beneficial. This is by no means an ex-
haustive list. The evidence of benefit varies wide-
ly across the six use cases, but the current costs
for the patients in each case are very great.
We focused in particular on use cases that include the hospital inpatient setting, in part because that is where the most data are available. However, analytics will almost certainly be useful across the health care continuum, for example, in evaluating the overall drivers of costs and using tools like geocoding (coding data by geography) to detect epidemics or to identify "hot spots" (of diseases, high costs, and so on). Both predicting outcomes of patients (such as who will be a high-cost patient, be readmitted, or suffer an adverse event) and tailoring the management of patients should result in substantial savings for the health care system.
One question is to what extent to use disease-
specific models versus more general ones in big-
data analytics in health care. Much of US health
care organizations' focus in their use of analytics
has been on patients with one condition, such as
congestive heart failure. This approach can often
be effective. However, we believe that ap-
proaches that address multiple conditions are
likely to have a bigger impact on care outcomes
and cost savings in the long run.
Another question is how to incorporate the
narrative text from EHRs into big-data analytics
in health care. Extracting clinically relevant con-
cepts using natural language processing is diffi-
cult.62,63 Essential elements of the narrative, such
as temporal relationships and co-references
(that is, narrative that refers to more than one
thing), may be lost or incorrectly assigned.64,65
Nonetheless, clinical natural language process-
ing is already quite usable, and even simple ap-
proaches can find 90 percent of factual informa-
tion of many types.66,67 A related problem is that
longitudinal follow-up is hindered by the paucity
of information exchange among health systems
and registries of vital statistics.
Modern analytic approaches have shown de-
monstrable performance gains in other indus-
tries and are markedly different from the typical
data analytic approaches used in health care. The
health care system has generally used simple
decision tree or logistic regression models, in
part because these often have to be implemented
under time constraints at the point of care.
EHRs make it possible to use models of diag-
nosis and care that combine thousands of dis-
parate measurements to generate evidence in
real time. These models can be far more complex
than their predecessors: For example, instead of
identifying one or two key markers, such as
smoking and high blood pressure, complex ana-
lytic models can combine subtle cues from a large
number of markers. This increased complexity
makes the new models more difficult to interpret
and their reliability less easy to assess, compared
to previous models.
Other industries have grown accustomed to running mission-critical systems using such complex and advanced approaches while also establishing reliability, typically through extensive test implementations before deployment in production. Attention must be paid to the generalizability of existing results in models' performance to evaluate the size and scope of appropriate test implementations in health care.16
Another limiting factor in the use of analytics in the health care setting has been delivering predictions to providers, especially in real time, to enable action. That is becoming progressively easier with EHRs and modern communication tools. However, many EHRs do not include robust event engines (tools that sift through data and use rules to notify providers when appropriate) or robust approaches for determining which provider is responsible for a specific patient at a given time.
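To make the two missing pieces concrete, the sketch below pairs a very small rule-based event engine with a lookup of who is responsible for a patient at a given hour. It is a toy under stated assumptions: the class names `Rule` and `Shift`, the function names, and the rule cutoffs are all hypothetical and are not drawn from any EHR product or clinical guideline.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    """A named condition evaluated against one patient's latest data."""
    name: str
    condition: Callable[[dict], bool]


@dataclass
class Shift:
    """Who covers a patient during a block of hours (0-23, end exclusive)."""
    provider: str
    start_hour: int
    end_hour: int


def responsible_provider(shifts: list[Shift], hour: int) -> Optional[str]:
    """Return the provider covering the given hour, or None if nobody is."""
    for shift in shifts:
        if shift.start_hour <= hour < shift.end_hour:
            return shift.provider
    return None


def run_rules(rules: list[Rule], data: dict, shifts: list[Shift], hour: int) -> list[str]:
    """Build one notification per rule whose condition is met."""
    provider = responsible_provider(shifts, hour) or "unassigned"
    return [f"{provider}: {rule.name}" for rule in rules if rule.condition(data)]


# Hypothetical rules; the cutoffs are placeholders, not clinical guidance.
RULES = [
    Rule("low oxygen saturation", lambda d: d.get("spo2", 100) < 90),
    Rule("rapid heart rate", lambda d: d.get("heart_rate", 0) > 130),
]
SHIFTS = [Shift("day nurse", 7, 19), Shift("evening nurse", 19, 24)]
```

The "unassigned" fallback illustrates the second gap named in the text: a rule can fire correctly and still reach nobody if the system cannot tell who is covering the patient at that moment.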
Policy Implications
Our observations have a number of implications
with respect to research, regulation, payment,
and privacy, among other areas.
Research. Regarding research, more systematic evaluation is needed to move from potential to realization in many areas. Specifically, we believe that federal support for research that evaluates the use of analytics and big data to address the six use cases discussed above is warranted. Especially useful would be studies of the tailoring of solutions for high-risk patients and the use of multiple streams of data (in particular, from sensor technologies) for the prediction of adverse events and for therapy selection for patients with diseases that affect multiple organ systems.
Yet to be determined is the extent to which
hypothesis-driven (the traditional approach)
or hypothesis-free approaches (such as those
used in data mining) are appropriate. Also still
unclear is the relative importance of developing
specific approaches and of implementing and
disseminating them. We believe that there is
more need to develop approaches, because payment
reform is likely to offer strong incentives
for their implementation and dissemination.
Regulation. From the regulation perspective,
a key question will be to what extent these pre-
dictive analytic approaches will be regulated by
the Food and Drug Administration (FDA). In
August 2013 the Food and Drug Administration
Safety and Innovation Act working group tasked
with evaluating emerging health information
technology (IT) published a draft report con-
cluding that FDA premarket review of health
IT applications, such as analytics, would not
be beneficial.68 The report also concluded that
if health IT applications used analytics to deliver
strong clinical decision support or were embed-
ded in devices, they might require FDA review.
Thus, there is clearly a tension between the need
for regulatory oversight and for protection of the
public. The FDA released another
report on this topic in April 2014.69
Payment With respect to payment, strategies
such as the accountable care organization model
that encourage organizations to invest in cost
reduction will likely accelerate the adoption of
analytics. However, as many experts have com-
mented, the current provisions of the Affordable
Care Act may not be sufficient on their own to get
providers to focus on costs.70
Privacy Regarding privacy, there are many
thorny issues, as the growing controversy over
the National Security Agency's collection of data
about private phone calls has illustrated. Many
people will not wish to have some types of data
about them linked with other types of data, and
this issue may be even more sensitive in health
care than in other domains. However, Ruth
Faden and coauthors have argued that in a just
health care system, patients have a moral obliga-
tion to contribute to the common purpose of
improving the quality and value of clinical care.71
Policy makers have been reluctant to alter the
provisions of the Health Insurance Portability
and Accountability Act (HIPAA) of 1996, which is
the major legislation related to privacy and secu-
rity issues in health care. However, the act does
not address many issues that will become rele-
vant as more disparate data sources become
linked.
Conclusion
Big data, including analytics, is a powerful tool
that will be as useful in health care as it has been
in other industries. The choice of the specific
use cases discussed in this article
can be debated. Nonetheless, we believe that they
will be among those that deliver the greatest
value for health care organizations in the near
term. This general approach has great potential
for improving value in health care. We believe
that organizations that employ it in many do-
mains will benefit, especially under payment
reform.
David Bates is on the clinical advisory
board for and has received research
funding from EarlySense, a company
that uses analytics and sensor
technology to improve care. The authors
thank Stephanie Klinkenberg-Ramirez for
her assistance with the preparation of
this article. Funding was provided by
Framework and Action Plan for
Predictive Analytics Grant No. 3861
from the Gordon and Betty Moore
Foundation.
NOTES
1 Davis K. 2012 annual report: president's
message - health care reform:
a journey [Internet]. New York (NY):
Commonwealth Fund; 2012 Dec 26
[cited 2014 Apr 24]. Available from:
http://www.commonwealthfund
.org/Publications/Annual-Report-
Essays/2012/Dec/Health-Care-
Reform-A-Journey.aspx
2 Porter ME. What is value in health
care? N Engl J Med. 2010;363(26):
2477–81.
3 Mechanic RE, Altman SH,
McDonough JE. The new era of
payment reform, spending targets,
and cost containment in Massachusetts:
early lessons for the nation.
Health Aff (Millwood). 2012;31(10):
2334–42.
4 Xierali IM, Hsiao CJ, Puffer JC,
Green LA, Rinaldo JC, Bazemore AW,
et al. The rise of electronic health
record adoption among family
physicians. Ann Fam Med. 2013;
11(1):14–9.
5 Bassi J, Lau F. Measuring value for
money: a scoping review on economic
evaluation of health information
systems. J Am Med Inform
Assoc. 2013;20(4):792–801.
6 Davenport TH, Harris JG. Competing
on analytics: the new science of
winning. Boston (MA): Harvard
Business School Press; 2007.
7 Ferrucci D, Brown E, Chu-Carroll J,
Fan J, Gondek D, Kalyanpur AA,
et al. Building Watson: an overview
of the DeepQA project. AI Magazine.
2010;31(3):59–79.
8 Nadkarni PM, Ohno-Machado L,
Chapman WW. Natural language
processing: an introduction. J Am
Med Inform Assoc. 2011;18(5):
544–51.
9 Murdoch TB, Detsky AS. The inevitable
application of big data to health
care. JAMA. 2013;309(13):1351–2.
10 Tenenbaum JD, Sansone SA,
Haendel M. A sea of standards for
omics data: sink or swim? J Am Med
Inform Assoc. 2013;21(2):200–3.
11 Agaku IT, Adisa AO, Ayo-Yusuf OA,
Connolly GN. Concern about secu-
rity and privacy, and perceived con-
trol over collection and use of health
information are related to with-
holding of health information from
healthcare providers. J Am Med Inform
Assoc. 2013;21(2):374–8.
12 Ohno-Machado L. To share or not to
share: that is not the question. Sci
Transl Med. 2012;4(165):165cm15.
13 Schoenman JA, Chockley N. Under-
standing U.S. health care spending
[Internet]. Washington (DC): National
Institute for Health Care
Management Research and Educa-
tional Foundation; 2011 Jul [cited
2014 Apr 24]. (Data Brief). Available
from: http://www.nihcm.org/
images/stories/NIHCM-CostBrief-
Email.pdf
14 Nelson L. Lessons from Medicare's
demonstration projects on disease
management and care coordination
[Internet]. Washington (DC): Con-
gressional Budget Office; 2012 Jan
[cited 2014 Apr 24]. (Working Paper
No. 2012-01). Available from: http://
www.cbo.gov/sites/default/files/
cbofiles/attachments/WP2012-
01_Nelson_Medicare_DMCC_
Demonstrations.pdf
15 Weil E, Ferris T, Meyer G. Fact
sheet - phase one: MGH Medicare
demonstration project for high-cost
beneficiaries [Internet]. Boston
(MA): Massachusetts General Hos-
pital Physician Group; [cited 2014
Apr 24]. Available from: http://
www.massgeneral.org/News/assets/
pdf/CMS_project_phase1Fact
Sheet.pdf
16 Paxton C, Niculescu-Mizil A, Saria S.
Developing predictive models using
electronic medical records: chal-
lenges and pitfalls. AMIA Annu
Symp Proc. 2013;2013:1109–15.
17 Turner-McGrievy GM, Beets MW,
Moore JB, Kaczynski AT, Barr-
Anderson DJ, Tate DF. Comparison
of traditional versus mobile app self-
monitoring of physical activity and
dietary intake among overweight
adults participating in an mHealth
weight loss program. J Am Med Inform
Assoc. 2013;20(3):513–8.
18 Jiang X, Menon A, Wang S, Kim J,
Ohno-Machado L. Doubly Optimized
Calibrated Support Vector Machine
(DOC-SVM): an algorithm for joint
optimization of discrimination and
calibration. PLoS One. 2012;7(11):
e48823.
19 Jiang X, Boxwala AA, El-Kareh R,
Kim J, Ohno-Machado L. A patient-
driven adaptive prediction technique
to improve personalized risk esti-
mation for clinical decision support.
J Am Med Inform Assoc. 2012;
19(e1):e36–42.
20 Freund T, Kunz CU, Ose D,
Szecsenyi J, Peters-Klimm F. Pat-
terns of multimorbidity in primary
care patients at high risk of future
hospitalization. Popul Health
Manag. 2012;15(2):119–24.
21 Jencks SF, Williams MV, Coleman
EA. Rehospitalizations among pa-
tients in the Medicare fee-for-service
program. N Engl J Med. 2009;
360(14):1418–28.
22 Clancy CM. Commentary: reducing
hospital readmissions: aligning fi-
nancial and quality incentives. Am J
Med Qual. 2012;27(5):441–3.
23 Kocher RP, Adashi EY. Hospital re-
admissions and the Affordable Care
Act: paying for coordinated quality
care. JAMA. 2011;306(16):1794–5.
24 Bayati M. Data-driven decision
making in healthcare systems [In-
ternet]. Redmond (WA): Microsoft
Corporation; 2011 Sep 27 [cited 2014
Apr 24]. (20th Anniversary Lecture
Series). Available from: http://
research.microsoft.com/apps/
video/default.aspx?id=159290
25 Madan A, Cebrian M, Lazer D,
Pentland A. Social sensing for epi-
demiological behavior change. In:
Proceedings of the 12th ACM Inter-
national Conference on Ubiquitous
Computing. New York (NY): ACM
Press; 2010. p. 291–300.
26 Apgar V. The newborn (Apgar)
scoring system. Reflections and ad-
vice. Pediatr Clin North Am. 1966;
13(3):645–50.
27 Finster M, Wood M. The Apgar score
has survived the test of time.
Anesthesiology. 2005;102(4):855–7.
28 Saria S, Koller D, Penn A. Learning
individual and population level traits
from clinical temporal data [Inter-
net]. Submitted to: Neural Informa-
tion Processing Systems (NIPS)
Foundation. Predictive Models in
Personalized Medicine workshop;
Whistler, BC; 2010 Dec 11 [cited 2014
May 22]. Available from: https://
sites.google.com/site/personalmed
models/proceedings/Saria.pdf
29 Saria S, Rajani AK, Gould J, Koller D,
Penn AA. Integration of early phys-
iological responses predicts later
illness severity in preterm infants.
Sci Transl Med. 2010;2(48):48ra65.
30 Escobar GJ. The neonatal sepsis
work-up: personal reflections on
the development of an evidence-
based approach toward newborn
infections in a managed care orga-
nization. Pediatrics. 1999;103(Suppl
E1):360–73.
31 Escobar GJ, Li DK, Armstrong MA,
Gardner MN, Folck BF, Verdi JE,
et al. Neonatal sepsis workups in
infants ≥2000 grams at birth: a
population-based study. Pediatrics.
2000;106(2 Pt 1):256–63.
32 Mukhopadhyay S, Eichenwald EC,
Puopolo KM. Neonatal early-onset
sepsis evaluations among well-
appearing infants: projected impact
of changes in CDC GBS guidelines. J
Perinatol. 2013;33(3):198–205.
33 Puopolo KM, Draper D, Wi S,
Newman TB, Zupancic J, Lieberman
E, et al. Estimating the probability of
neonatal early-onset infection on the
basis of maternal risk factors.
Pediatrics. 2011;128(5):e1155–63.
34 Escobar GJ, Puopolo KM, Wi S, Turk
BJ, Kuzniewicz MW, Walsh EM, et al.
Stratification of risk of early-onset
sepsis in newborns ≥34 weeks'
gestation. Pediatrics. 2014;133(1):30–6.
35 Ohno-Machado L, Resnic FS,
Matheny ME. Prognosis in critical
care. Annu Rev Biomed Eng. 2006;
8:567–99.
36 Knaus WA, Draper EA, Wagner DP,
Zimmerman JE. APACHE II: a se-
verity of disease classification sys-
tem. Crit Care Med. 1985;13(10):
818–29.
37 Zimmerman JE, Kramer AA. Out-
come prediction in critical care: the
Acute Physiology and Chronic
Health Evaluation models. Curr
Opin Crit Care. 2008;14(5):491–7.
38 Metnitz PG, Moreno RP, Almeida E,
Jordan B, Bauer P, Campos RA, et al.
SAPS 3 - from evaluation of the patient
to evaluation of the intensive
care unit. Part 1: objectives, methods
and cohort description. Intensive
Care Med. 2005;31(10):1336–44.
39 Escobar GJ, Gardner MN, Greene
JD, Draper D, Kipnis P. Risk-
adjusting hospital mortality using a
comprehensive electronic record in
an integrated health care delivery
system. Med Care. 2013;51(5):
446–53.
40 Kodali BS. Capnography outside the
operating rooms. Anesthesiology.
2013;118(1):192–201.
41 Rothman MJ, Rothman SI, Beals J
4th. Development and validation of a
continuous measure of patient con-
dition using the electronic medical
record. J Biomed Inform. 2013;
46(5):837–48.
42 Finlay GD, Rothman MJ, Smith RA.
Measuring the Modified Early
Warning Score and the Rothman
Index: advantages of utilizing the
electronic medical record in an early
warning system. J Hosp Med. 2014;
9(2):116–9.
43 Rothman SI, Rothman MJ, Solinger
AB. Placing clinical variables on a
common linear scale of empirically
based risk as a step towards con-
struction of a general patient acuity
score from the electronic health
record: a modelling study. BMJ
Open. 2013;3(5).
44 Donald MJ, Paterson B. End tidal
carbon dioxide monitoring in pre-
hospital and retrieval medicine: a
review. Emerg Med J. 2006;23(9):
728–30.
45 Brown H, Terrence J, Vasquez P,
Bates DW, Zimlichman E. Continu-
ous monitoring in an inpatient
medical-surgical unit: a controlled
clinical trial. Am J Med. 2014;
127(3):226–32.
46 Jha AK, Chan DC, Ridgway AB, Franz
C, Bates DW. Improving safety and
eliminating redundant tests: cutting
costs in U.S. hospitals. Health Aff
(Millwood). 2009;28(5):1475–84.
47 Bates DW, Su L, Yu DT, Chertow GM,
Seger DL, Gomes DR, et al. Mortality
and costs of acute renal failure as-
sociated with amphotericin B thera-
py. Clin Infect Dis. 2001;32(5):
686–93.
48 Moorman JR, Carlo WA, Kattwinkel
J, Schelonka RL, Porcelli PJ,
Navarrete CT, et al. Mortality re-
duction by heart rate characteristic
monitoring in very low birth weight
neonates: a randomized trial. J Pediatr.
2011;159(6):900–6.
49 Quinn JA, Williams CK, McIntosh N.
Factorial switching linear dynamical
systems applied to physiological
condition monitoring. IEEE Trans
Pattern Anal Mach Intell. 2009;
31(9):1537–51.
50 Bates DW, Cullen DJ, Laird N,
Petersen LA, Small SD, Servi D, et al.
Incidence of adverse drug events and
potential adverse drug events. Im-
plications for prevention. ADE Pre-
vention Study Group. JAMA. 1995;
274(1):29–34.
51 Bates DW. Drugs and adverse drug
reactions: how worried should we
be? JAMA. 1998;279(15):1216–7.
52 Sakuma M, Bates DW, Morimoto T.
Clinical prediction rule to identify
high-risk inpatients for adverse drug
events: the JADE Study. Pharma-
coepidemiol Drug Saf. 2012;21(11):
1221–6.
53 Field TS, Gurwitz JH, Harrold LR,
Rothschild J, DeBellis KR, Seger AC,
et al. Risk factors for adverse drug
events among older adults in the
ambulatory setting. J Am Geriatr
Soc. 2004;52(8):1349–54.
54 Fortin M, Bravo G, Hudon C,
Vanasse A, Lapointe L. Prevalence of
multimorbidity among adults seen in
family practice. Ann Fam Med.
2005;3(3):223–8.
55 Wolff JL, Starfield B, Anderson G.
Prevalence, expenditures, and com-
plications of multiple chronic con-
ditions in the elderly. Arch Intern
Med. 2002;162(20):2269–76.
56 Petri M. Systemic lupus erythematosus:
2006 update. J Clin Rheumatol.
2006;12(1):37–40.
57 Hummers LK, Wigley FM. Sclero-
derma. New York (NY): Lange
Medical Books/McGraw Hill; 2013.
58 Leeper NJ, Bauer-Mehren A, Iyer SV,
Lependu P, Olson C, Shah NH.
Practice-based evidence: profiling
the safety of cilostazol by text-
mining of clinical notes. PLoS One.
2013;8(5):e63499.
59 Frankovich J, Longhurst CA,
Sutherland SM. Evidence-based
medicine in the EMR era. N Engl J
Med. 2011;365(19):1758–9.
60 Natter MD, Quan J, Ortiz DM,
Bousvaros A, Ilowite NT, Inman CJ,
et al. An i2b2-based, generalizable,
open source, self-scaling chronic
disease registry. J Am Med Inform
Assoc. 2013;20(1):172–9.
61 National Patient-Centered Clinical
Research Network [home page on
the Internet]. Boston (MA):
PCORnet; [cited 2014 Apr 24].
Available from: http://pcornet.org/
62 Meystre SM, Savova GK, Kipper-
Schuler KC, Hurdle JF. Extracting
information from textual documents
in the electronic health record: a
review of recent research. Yearb Med
Inform. 2008:128–44.
63 Saria S, McElvain G, Rajani AK,
Penn AA, Koller DL. Combining
structured and free-text data for au-
tomatic coding of patient outcomes.
AMIA Annu Symp Proc. 2010;
2010:712–6.
64 Sun W, Rumshisky A, Uzuner O.
Temporal reasoning over clinical
text: the state of the art. J Am Med
Inform Assoc. 2013;20(5):814–9.
65 Uzuner O, Bodnari A, Shen S,
Forbush T, Pestian J, South BR.
Evaluating the state of the art in
coreference resolution for electronic
medical records. J Am Med Inform
Assoc. 2012;19(5):786–91.
66 D'Avolio LW, Nguyen TM, Goryachev
S, Fiore LD. Automated concept-level
information extraction to reduce the
need for custom software and rules
development. J Am Med Inform Assoc.
2011;18(5):607–13.
67 LePendu P, Iyer SV, Bauer-Mehren
A, Harpaz R, Mortensen JM,
Podchiyska T, et al. Pharmacovigi-
lance using clinical notes. Clin
Pharmacol Ther. 2013;93(6):547–55.
68 Bates DW. Draft FDASIA Committee
report [Internet]. Silver Spring
(MD): Food and Drug Administra-
tion; [cited 2014 Apr 25]. Available
from: http://www.healthit.gov/
facas/sites/faca/files/FDASIA
RecommendationsDraft030913_v2
.pdf
69 Food and Drug Administration.
FDASIA health IT report: proposed
strategy and recommendations for a
risk-based framework [Internet].
Silver Spring (MD): FDA; 2014 Apr
[cited 2014 May 5]. Available from:
http://www.fda.gov/downloads/
AboutFDA/CentersOffices/Officeof
MedicalProductsandTobacco/
CDRH/CDRHReports/UCM391521
.pdf
70 Noble DJ, Casalino LP. Can ac-
countable care organizations im-
prove population health?: should
they try? JAMA. 2013;309(11):
1119–20.
71 Faden RR, Kass NE, Goodman SN,
Pronovost P, Tunis S, Beauchamp
TL. An ethics framework for a
learning health care system: a de-
parture from traditional research
ethics and clinical ethics. Hastings
Cent Rep. 2013;(S1):S627.
... In the interim, several studies report potential benefits or values that can be derived from the application of big data in healthcare. Some of the potential benefits include quality improvement in healthcare (Kruse et al. 2016); personalized medicine (Chaussabel and Pulendran 2015), geographical mapping of diseases (Luo et al. 2016), disease and population management (Kruse et al. 2016;Raghupathi and Raghupathi 2014), and cost reduction (Bates et al. 2014;Roski et al. 2014). Bate et al (2014) suggest six instances in which big data analytics can be used to identify and manage high-risk and high-cost patients. ...
... Further, scholars have also identified and discussed some challenges that confront the application of big data in healthcare. Notable amongst them are policy and trust issues that inhibit data sharing (Bates et al. 2014;Kohli and Tan 2016;Kosseim and Brady 2008); privacy and security issues (Hoffman and Podgurski 2012;Salas-Vega et al. 2015); data and algorithm governance issues (Hoffman 2017;Kruse et al. 2016;Tan et al. 2015); lack of capabilities for data management, analytics and visualization (Hoffman 2015;Marfo et al. 2017;Roski et al. 2014); and, organizational culture and stakeholder management issues (Kohli and Tan 2016; Vithiatharan 2014). These challenges further highlight the need for establishing the socio-technical context in which individuals and healthcare institutions can actualize the potentials of big data in healthcare (Abbasi et al. 2016;Kosseim and Brady 2008;Olshannikova et al. 2015;Wamba et al. 2015). ...
... EHR process analysis is optimally guided by Quality Improvement and Patient Safety (clinical analytics). 31 However, the potential of the EHR for promoting learning and lasting behavioral change in the workplace is currently underexploited. For example, influential descriptions of the EHR by the Institute of Medicine, World Health Organization, and Centers for Medicaid and Medicare Services make minimal mention of education or learning in their descriptions of the core functions of the EHR, usually only referring to the training necessary to use an EHR and not to the opportunities for learning by practitioners (Table 1). ...
... There is a current tendency to rely on artificial intelligence or on machine learning to obtain information from Big Data currently available [51,52]. These data-driven approaches are fantastic methods for recognising and learning patterns in health environments, but their results may be hard to interpret. ...
Article
Full-text available
Healthcare teams act in a very complex environment and present extremely peculiar features since they are multidisciplinary, work under quickly changing conditions, and often stay together for a short period with a dynamically fluctuating team membership. Thus, in the broad discussions about the future of healthcare, the strategy for improving providers' collaboration and team dynamics is becoming a central topic. Within this context, this paper aims to discuss different viewpoints about the application of network science to teamworking. Our results highlight the potential benefits deriving from network science-enabled analysis, and also show some preliminary empirical evidence through a real case study. In so doing, we intend to stimulate discussions regarding the implications of network science in the investigation and improvement of healthcare teams. The intention is to pave the way for future research in this context by suggesting the potential advantages of healthcare teamwork analysis, as well as recognising its challenges and threats.
... Considering the availability of electronic medical records (EMRs), new patient-level data can be obtained and analysed to yield insights that can gradually improve nursing plans by mining data and exploring unknown information (Bates et al., 2014;Murdoch & Detsky, 2013;Wiens & Fackler, 2018). In this study, patient categorization was approached as a question of meaningful subgroup classification of critical patients. ...
Article
The interest in new and more advanced technological solutions is paving the way for the diffusion of innovative and revolutionary applications in healthcare organizations. The application of an artificial intelligence system to medical research has the potential to move toward highly advanced e-Health. This analysis aims to explore the main areas of application of big data in healthcare, as well as the restructuring of the technological infrastructure and the integration of traditional data analytical tools and techniques with an elaborate computational technology that is able to enhance and extract useful information for decision-making. We conducted a literature review using the Scopus database over the period 2010–2020. The article selection process involved five steps: the planning and identification of studies, the evaluation of articles, the extraction of results, the summary, and the dissemination of the audit results. We included 93 documents. Our results suggest that effective and patient-centered care cannot disregard the acquisition, management, and analysis of a huge volume and variety of health data. In this way, an immediate and more effective diagnosis could be possible while maximizing healthcare resources. Deriving the benefits associated with digitization and technological innovation, however, requires the restructuring of traditional operational and strategic processes, and the acquisition of new skills.
Thesis
Effective SARS-CoV-2 screening allows for a speedy and accurate diagnosis of COVID-19, reducing the d on healthcare systems. In order to evaluate the risk of infection, prediction models that integrate many variables have been developed. These are intended to aid medical personnel around the world in triaging patients, particularly in areas where health resources are scarce. We developed a machine-learning algorithm that was trained on the records of 51,831 people who had been tested (of whom 4769 were confirmed to have COVID- 19). The data in the test set came from the next week (47,401 tested individuals of whom 3624were confirmed to have COVID-19). Overall, we created a model that detects COVID-19 cases using simple variables available by asking basic questions, based on nationwide data publicly released by the Israeli Ministry of Health. When testing resources are limited, our approach can be used to prioritize testing for COVID-19, among other things. This project proposed the CNN-based x-ray image for detection of covid a boost for detection of symptoms.
Article
Introduction: Health systems in high-income countries face a variety of challenges calling for a systemic approach to improve quality and efficiency. Putting people in the centre is the main idea of the WHO model of people-centred and integrated health services. Integrating health services is fuelled by an integration of health data with great potentials for decision support based on big data analytics. The research question of this paper is "How can big data analytics support people-centred and integrated health services?" Methods: A scoping review following the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses - Scoping Review (PRISMA-ScR) statement was conducted to gather information on how big data analytics can support people-centred and integrated health services. The results were summarized in a role model of a people-centred and integrated health services platform illustrating which data sources might be integrated and which types of analytics might be applied to support the strategies of the people-centred and integrated health services framework to become more integrated across the continuum of care. Additional rapid literature reviews were conducted to generate frequency distributions of the most often used data types and analytical methods in the medical literature. Finally, the main challenges connected with big data analytics were worked out based on a content analysis of the results from the scoping literature review. Results: Based on the results from the rapid literature reviews the most often used data sources for big data analytics (BDA) in healthcare were biomarkers (39.3%) and medical images (30.9%). The most often used analytical models were support vector machines (27.3%) and neural networks (20.4%). The people-centred and integrated health services framework defines different strategic interventions for health services to become more integrated. 
To support all aspects of these interventions a comparably integrated platform of health-related data would be needed, so that a role model labelled as people-centred health platform was developed. Based on integrated data the results of the scoping review (n = 72) indicate, that big data analytics could for example support the strategic intervention of tailoring personalized health plans (43.1%), e.g. by predicting individual risk factors for different therapy options. Also BDA might enhance clinical decision support tools (31.9%), e.g. by calculating risk factors for disease uptake or progression. BDA might also assist in designing population-based services (26.4% by clustering comparable individuals in manageable risk groups e.g. mentored by specifically trained, non-medical professionals. The main challenges of big data analytics in healthcare were categorized in regulatory, (information-) technological, methodological, and cultural issues, whereas methodological challenges were mentioned most often (55.0%), followed by regulatory challenges (43.7%). Discussion: The BDA applications presented in this literature review are based on findings which have already been published. For some important components of the framework on people-centred care like enhancing the role of community care or establishing intersectoral partnerships between health and social care institutions only few examples of enabling big data analytical tools were found in the literature. Quite the opposite does this mean that these strategies have less potential value, but rather that the source systems in these fields need to be further developed to be suitable for big data analytics. Conclusions: Big data analytics can support people-centred and integrated health services e.g. by patient similarity stratifications or predictions of individual risk factors. 
But BDA fails to unfold its full potential until data source systems are still disconnected and actions towards a comprehensive and people-centred health-related data platform are politically insufficiently incentivized. This work highlighted the potential of big data analysis in the context of the model of people-centred and integrated health services, whereby the role model of the person-centered health platform can be used as a blueprint to support strategies to improve person-centered health care. Likely because health data is extremely sensitive and complex, there are only few practical examples of platforms to some extent already capable of merging and processing people-centred big data, but the integration of health data can be expected to further proceed so that analytical opportunities might also become reality in the near future.
Article
Full-text available
Recent technology has modeled VANET (vehicular adhoc network) communication well in terms of privileges to derive vehicular communication technologically to save time, energy, and money. Due to the increase in powerful technology in modern times, VANETs play a vital role in uplifting daily concerns across vehicles and vehicular identities. Hence, to tune VANETs to become compatible with traditional technologies and increase demand, VANETs require upgrading. The severity and frequency of unwanted occurrences have become a considerable concern for our day-to-day lives relating to vehicular position. Thus, verily updated methodologies or working procedures are needed for the future VANET interplay to eradicate such problems occurring through vehicular identities. This article outlines in technology related to VANETS, future developments, and coping issues by deriving comprehensive frameworks, workflow patterns, upgrading procedures including big data, fog computing, SDN (software defined networking), and SIoT (social Internet of Things). This article provides a high-level overview of a complete VANET upgrade solution to address future problem management issues under a range of acceptable scientific themes, indicators, and combinations.
Article
Background: It is well known that 20% of the patients incur 80% of health care costs and many diseases and complications can be prevented or ameliorated with prompt intervention. One of the well-recognized strategies for cost reduction and better outcomes is to predict or identify high-risk and high-cost (HRHC) patients for proactive intervention. Objective: The objective of this study was to develop a predictive model that can be used to identify HRHC patients more accurately for proactive intervention. Methods: This is an observational study using fiscal year (FY) 2018 administrative data to predict FY 2019 total cost at the patient level. All 5,676,248 patients who received care in both FYs 2018 and 2019 from the Veterans Health Administration were included in the analyses. The Veterans Health Administration Corporate Data Warehouse was our main data source. With split-sample analyses, 3 sets of patient comorbidities and 5 statistical models were assessed for the highest predictive power. Results: The Box-Cox regression using comorbidities designated by the expanded CCSR (Clinical Classifications Software Refined) groups as predictors yielded the highest predictive power. The R2 reached 0.51 and 0.37 for the transformed and raw scale cost, respectively. Conclusions: The predictive model developed in this study exhibits substantially higher predictive power than what has been reported in the literature. The algorithm based on administrative data and a publicly available patient classification system can be readily implemented by other value-based health systems to identify HRHC patients for proactive intervention.
Article
Full-text available
This paper presents the form and validation results of APACHE II, a severity of disease classification system. APACHE II uses a point score based upon initial values of 12 routine physiologic measurements, age, and previous health status to provide a general measure of severity of disease. An increasing score (range 0 to 71) was closely correlated with the subsequent risk of hospital death for 5815 intensive care admissions from 13 hospitals. This relationship was also found for many common diseases.When APACHE II scores are combined with an accurate description of disease, they can prognostically stratify acutely ill patients and assist investigators comparing the success of new or differing forms of therapy. This scoring index can be used to evaluate the use of hospital resources and compare the efficacy of intensive care in different hospitals or over time.
Article
Full-text available
While Electronic Medical Records (EMR) contain detailed records of the patient-clinician encounter - vital signs, laboratory tests, symptoms, caregivers' notes, interventions prescribed and outcomes - developing predictive models from this data is not straightforward. These data contain systematic biases that violate assumptions made by off-the-shelf machine learning algorithms, commonly used in the literature to train predictive models. In this paper, we discuss key issues and subtle pitfalls specific to building predictive models from EMR. We highlight the importance of carefully considering both the special characteristics of EMR as well as the intended clinical use of the predictive model and show that failure to do so could lead to developing models that are less useful in practice. Finally, we describe approaches for training and evaluating models on EMR using early prediction of septic shock as our example application.
Article
Objective: To define a quantitative stratification algorithm for the risk of early-onset sepsis (EOS) in newborns ≥ 34 weeks' gestation. Methods: We conducted a retrospective nested case-control study that used split validation. Data collected on each infant included sepsis risk at birth based on objective maternal factors, demographics, specific clinical milestones, and vital signs during the first 24 hours after birth. Using a combination of recursive partitioning and logistic regression, we developed a risk classification scheme for EOS on the derivation dataset. This scheme was then applied to the validation dataset. Results: Using a base population of 608,014 live births ≥ 34 weeks' gestation at 14 hospitals between 1993 and 2007, we identified all 350 EOS cases <72 hours of age and frequency matched them by hospital and year of birth to 1063 controls. Using maternal and neonatal data, we defined a risk stratification scheme that divided the neonatal population into 3 groups: treat empirically (4.1% of all live births, 60.8% of all EOS cases, sepsis incidence of 8.4/1000 live births), observe and evaluate (11.1% of births, 23.4% of cases, 1.2/1000), and continued observation (84.8% of births, 15.7% of cases, incidence 0.11/1000). Conclusions: It is possible to combine objective maternal data with evolving objective neonatal clinical findings to define more efficient strategies for the evaluation and treatment of EOS in term and late preterm infants. Judicious application of our scheme could result in decreased antibiotic treatment in 80,000 to 240,000 US newborns each year.
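The per-group incidences quoted above can be approximately reproduced from the other figures in the abstract. The sketch below is only a consistency check under a simplifying assumption: that each group's incidence is its share of the 350 cases divided by its share of the 608,014 live births. The study's own estimates may rest on weighting not described in the abstract, so small differences are expected.

```python
LIVE_BIRTHS = 608_014  # base population reported in the abstract
EOS_CASES = 350        # early-onset sepsis cases reported in the abstract

# (share of all live births, share of all EOS cases), as reported.
strata = {
    "treat empirically": (0.041, 0.608),
    "observe and evaluate": (0.111, 0.234),
    "continued observation": (0.848, 0.157),
}


def incidence_per_1000(share_of_births, share_of_cases):
    """Cases per 1000 live births within one risk group."""
    births_in_group = LIVE_BIRTHS * share_of_births
    cases_in_group = EOS_CASES * share_of_cases
    return 1000.0 * cases_in_group / births_in_group


incidences = {
    name: incidence_per_1000(births, cases)
    for name, (births, cases) in strata.items()
}

for name, value in incidences.items():
    print(f"{name}: {value:.2f} per 1000 live births")
```

Under this assumption the three groups come out near 8.5, 1.2, and 0.11 per 1000, against the reported 8.4, 1.2, and 0.11. The first group differs slightly, which is plausible given that the published shares are rounded.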
Article
Early detection of an impending cardiac or pulmonary arrest is an important focus for hospitals trying to improve quality of care. Unfortunately, all current early warning systems suffer from high false-alarm rates. Most systems are based on the Modified Early Warning Score (MEWS); 4 of its 5 inputs are vital signs. The purpose of this study was to compare the accuracy of MEWS against the Rothman Index (RI), a patient acuity score based upon summation of excess risk functions that utilize additional data from the electronic medical record (EMR). MEWS and RI scores were computed retrospectively for 32,472 patient visits. Nursing assessments, a category of EMR inputs only used by the RI, showed sharp differences 24 hours before death. Receiver operating characteristic curves for 24-hour mortality demonstrated superior RI performance, with c-statistics of 0.82 for MEWS and 0.93 for the RI. At the point where MEWS triggers an alarm, we identified the RI point corresponding to equal sensitivity and found the positive likelihood ratio (LR+) for MEWS was 7.8, and for the RI was 16.9, with false alarms reduced by 53%. At the RI point corresponding to equal LR+, the sensitivity was 49% for MEWS and 77% for the RI, capturing 54% more of those patients who will die within 24 hours. Journal of Hospital Medicine 2013. © 2013 Society of Hospital Medicine.
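The link between the two LR+ values and the false-alarm reduction can be worked through with the standard definition LR+ = sensitivity / (1 - specificity). The sketch below assumes that "false alarms" means the false-positive rate over the same set of patient visits; the function names are illustrative and not taken from the study.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    false_positive_rate = 1.0 - specificity
    if false_positive_rate <= 0.0:
        raise ValueError("specificity must be below 1 for a finite LR+")
    return sensitivity / false_positive_rate


def false_alarm_reduction(lr_baseline, lr_new):
    """Relative drop in false-positive rate at equal sensitivity.

    Since FPR = sensitivity / LR+, the shared sensitivity cancels and
    the ratio of the two false-positive rates is lr_baseline / lr_new.
    """
    return 1.0 - lr_baseline / lr_new


# LR+ values reported in the abstract: MEWS 7.8, RI 16.9.
reduction = false_alarm_reduction(7.8, 16.9)
print(f"{reduction:.1%}")
```

With the rounded LR+ values this gives a reduction of roughly 54%, in line with the 53% the abstract reports.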
Article
Nearly fifteen years after the Apgar score was introduced, Apgar reflected on its usefulness for increasing newborn survival, and noted that there were still methodological problems with using it for neonatal research.
Article
Objectives: To assess incidence and preventability of adverse drug events (ADEs) and potential ADEs. To analyze preventable events to develop prevention strategies. Design: Prospective cohort study. Participants: All 4031 adult admissions to a stratified random sample of 11 medical and surgical units in two tertiary care hospitals over a 6-month period. Units included two medical and three surgical intensive care units and four medical and two surgical general care units. Main Outcome Measures: Adverse drug events and potential ADEs. Methods: Incidents were detected by stimulated self-report by nurses and pharmacists and by daily review of all charts by nurse investigators. Incidents were subsequently classified by two independent reviewers as to whether they represented ADEs or potential ADEs and as to severity and preventability. Results: Over 6 months, 247 ADEs and 194 potential ADEs were identified. Extrapolated event rates were 6.5 ADEs and 5.5 potential ADEs per 100 nonobstetrical admissions, for mean numbers per hospital per year of approximately 1900 ADEs and 1600 potential ADEs. Of all ADEs, 1% were fatal (none preventable), 12% life-threatening, 30% serious, and 57% significant. Twenty-eight percent were judged preventable. Of the life-threatening and serious ADEs, 42% were preventable, compared with 18% of significant ADEs. Errors resulting in preventable ADEs occurred most often at the stages of ordering (56%) and administration (34%); transcription (6%) and dispensing errors (4%) were less common. Errors were much more likely to be intercepted if the error occurred earlier in the process: 48% at the ordering stage vs 0% at the administration stage. Conclusion: Adverse drug events were common and often preventable; serious ADEs were more likely to be preventable. Most resulted from errors at the ordering stage, but many also occurred at the administration stage. Prevention strategies should target both stages of the drug delivery process. (JAMA. 1995;274:29-34)
Article
This paper proposes a nonparametric Bayesian method for exploratory data analysis and feature construction in continuous time series such as longitudinal health data. Our method focuses on understanding shared characteristics in a set of time series that exhibit significant individual variability. Each series is characterized as switching between latent states ("topics"), where each topic is characterized as a distribution over generating functions ("words") that specify the series dynamics. Individual series maintain series-specific topic mixing proportions. The words are modeled as lying in an infinite dimensional space and the hierarchical Dirichlet Process prior allows selection of words that are shared across topics given data. Word and topic descriptions are shared across the entire population. We apply this model to the task of tracking the physiological signals of premature infants; our model obtains clinically significant insights as well as useful features for supervised learning tasks. Furthermore, based on these insights, we developed Physiscore, a personalized risk stratification score for preemies. Physiscore performs significantly better than APGAR, the current standard of care.
Article
Calls are increasing for American health care to be organized as a learning health care system, defined by the Institute of Medicine as a health care system “in which knowledge generation is so embedded into the core of the practice of medicine that it is a natural outgrowth and product of the healthcare delivery process and leads to continual improvement in care.” We applaud this conception, and in this paper, we put forward a new ethics framework for it. No such framework has previously been articulated. The goals of our framework are twofold: to support the transformation to a learning health care system and to help ensure that learning activities carried out within such a system are conducted in an ethically acceptable fashion.
Article
For hospitalized patients with unexpected clinical deterioration, delayed or suboptimal intervention is associated with increased morbidity and mortality. Lack of continuous monitoring for average-risk patients has been suggested as a contributing factor for unexpected in-hospital mortality. Our objective was to assess the effects of continuous heart rate and respiration rate monitoring in a medical-surgical unit on unplanned transfers, on length of stay in the intensive care unit, and on length of stay in the medical-surgical unit. In a controlled study, we compared a 33-bed medical-surgical unit (intervention unit) to a "sister" control unit for a 9-month pre-implementation and a 9-month post-implementation period. Following the intervention, all beds in the intervention unit were equipped with monitors that allowed for continuous assessment of heart and respiration rate. We reviewed 7643 patient charts: 2314 that were continuously monitored in the intervention arm and 5329 in the control arms. Comparing the average length of stay of patients hospitalized in the intervention unit following the implementation of the monitors to that prior to the implementation and to that in the control unit, we observed a significant decrease (from 4.0 to 3.6 and 3.6 days, respectively; p<0.01). Total intensive care unit days were significantly lower in the intervention unit post implementation (63.5 versus 120.1 and 85.36 days/1000 patients, respectively; p=0.04). The rate of transfer to the intensive care unit did not change when compared with before implementation or with the control unit (p=0.19). The rate of code blue events decreased following the intervention, from 6.3 to 0.9 and 2.1, respectively, per 1000 patients (p=0.02). Continuous monitoring on a medical-surgical unit was associated with a significant decrease in total length of stay in the hospital and in intensive care unit days for transferred patients, as well as lower code blue rates.