
Big data analytics in healthcare: Promise and potential


REVIEW Open Access
Wullianallur Raghupathi and Viju Raghupathi
Objective: To describe the promise and potential of big data analytics in healthcare.
Methods: The paper describes the nascent field of big data analytics in healthcare, discusses the benefits, outlines
an architectural framework and methodology, describes examples reported in the literature, briefly discusses the
challenges, and offers conclusions.
Results: The paper provides a broad overview of big data analytics for healthcare researchers and practitioners.
Conclusions: Big data analytics in healthcare is evolving into a promising field for providing insight from very large
data sets and improving outcomes while reducing costs. Its potential is great; however there remain challenges to overcome.
Keywords: Big data, Analytics, Hadoop, Healthcare, Framework, Methodology
The healthcare industry historically has generated large
amounts of data, driven by record keeping, compliance
& regulatory requirements, and patient care [1]. While
most data is stored in hard copy form, the current trend is
toward rapid digitization of these large amounts of data.
Driven by mandatory requirements and the potential to
improve the quality of healthcare delivery while reducing
costs, these massive quantities of data (known
as big data) hold the promise of supporting a wide range
of medical and healthcare functions, including, among
others, clinical decision support, disease surveillance,
and population health management [2-5]. Reports say
data from the U.S. healthcare system alone reached 150
exabytes in 2011. At this rate of growth, big data for U.S.
healthcare will soon reach the zettabyte (10^21 bytes)
scale and, not long after, the yottabyte (10^24 bytes) [6].
Kaiser Permanente, the California-based health network,
which has more than 9 million members, is believed to
have between 26.5 and 44 petabytes of potentially rich
data from EHRs, including images and annotations [6].
By definition, big data in healthcare refers to electronic
health data sets so large and complex that they are difficult
(or impossible) to manage with traditional software and/
or hardware; nor can they be easily managed with trad-
itional or common data management tools and methods
[7]. Big data in healthcare is overwhelming not only be-
cause of its volume but also because of the diversity of
data types and the speed at which it must be managed [7].
The totality of data related to patient healthcare and
well-being makes up big data in the healthcare industry. It
includes clinical data from CPOE and clinical decision
support systems (physicians' written notes and prescriptions,
medical imaging, laboratory, pharmacy, insurance,
and other administrative data); patient data in electronic
patient records (EPRs); machine-generated/sensor data,
such as from monitoring vital signs; social media posts,
including Twitter feeds (so-called tweets) [8], blogs [9], status
updates on Facebook and other platforms, and web pages;
and less patient-specific information, including emergency
care data, news feeds, and articles in medical journals.
* Correspondence:
Graduate School of Business, Fordham University, 113 W. 60th Street, 10023
New York, NY, USA
Full list of author information is available at the end of the article
© 2014 Raghupathi and Raghupathi; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of
the Creative Commons Attribution License, which permits unrestricted use,
distribution, and reproduction in any medium, provided the original work is properly credited.
Raghupathi and Raghupathi Health Information Science and Systems 2014, 2:3
For the big data scientist, there is, amongst this vast
amount and array of data, opportunity. By discovering
associations and understanding patterns and trends
within the data, big data analytics has the potential to
improve care, save lives and lower costs. Thus, big data
analytics applications in healthcare take advantage of the
explosion in data to extract insights for making better
informed decisions [10-12], and as a research category
are referred to as, no surprise here, big data analytics in
healthcare [13-15]. When big data is synthesized and
analyzed, and those aforementioned associations, patterns
and trends revealed, healthcare providers and other
stakeholders in the healthcare delivery system can
develop more thorough and insightful diagnoses and
treatments, resulting, one would expect, in higher quality
care at lower costs and in better outcomes overall [12].
The potential for big data analytics in healthcare to lead
to better outcomes exists across many scenarios, for example:
analyzing patient characteristics and the cost
and outcomes of care to identify the most clinically and
cost-effective treatments and offer analysis and tools,
thereby influencing provider behavior; applying advanced
analytics to patient profiles (e.g., segmentation
and predictive modeling) to proactively identify individuals
who would benefit from preventative care or lifestyle
changes; broad-scale disease profiling to identify
predictive events and support prevention initiatives;
collecting and publishing data on medical procedures, thus
assisting patients in determining the care protocols or
regimens that offer the best value; identifying, predicting
and minimizing fraud by implementing advanced analytic
systems for fraud detection and checking the accuracy
and consistency of claims; implementing claim
authorization much nearer to real time; and creating new
revenue streams by aggregating and synthesizing patient
clinical records and claims data sets to provide data and
services to third parties, for example, licensing data to
assist pharmaceutical companies in identifying patients
for inclusion in clinical trials. Many payers are developing
and deploying mobile apps that help patients manage
their care, locate providers and improve their health. Via
analytics, payers are able to monitor adherence to drug
and treatment regimens and detect trends that lead to
individual and population wellness benefits [12,16-18].
This article provides an overview of big data analytics
in healthcare as it is emerging as a discipline. First, we
define and discuss the various advantages and character-
istics of big data analytics in healthcare. Then we de-
scribe the architectural framework of big data analytics
in healthcare. Third, the big data analytics application
development methodology is described. Fourth, we pro-
vide examples of big data analytics in healthcare reported
in the literature. Fifth, the challenges are identified. Lastly,
we offer conclusions and future directions.
Big data analytics in healthcare
Health data volume is expected to grow dramatically in
the years ahead [6]. In addition, healthcare reimbursement
models are changing; meaningful use and pay for
performance are emerging as critical new factors in
today's healthcare environment. Although profit is not and
should not be a primary motivator, it is vitally important
for healthcare organizations to acquire the available
tools, infrastructure, and techniques to leverage big data
effectively or else risk losing potentially millions of dollars
in revenue and profits [19].
What exactly is big data? A report delivered to the U.S.
Congress in August 2012 defines big data as large vol-
umes of high velocity, complex, and variable data that re-
quire advanced techniques and technologies to enable the
capture, storage, distribution, management and analysis of
the information [6]. Big data encompasses such charac-
teristics as variety, velocity and, with respect specifically to
healthcare, veracity [20-23]. Existing analytical techniques
can be applied to the vast amount of existing (but cur-
rently unanalyzed) patient-related health and medical data
to reach a deeper understanding of outcomes, which then
can be applied at the point of care. Ideally, individual and
population data would inform each physician and her
patient during the decision-making process and help determine
the most appropriate treatment option for that particular patient.
Advantages to healthcare
By digitizing, combining and effectively using big data,
healthcare organizations ranging from single-physician
offices and multi-provider groups to large hospital net-
works and accountable care organizations stand to
realize significant benefits [2]. Potential benefits include
detecting diseases at earlier stages when they can be
treated more easily and effectively; managing specific in-
dividual and population health and detecting health care
fraud more quickly and efficiently. Numerous questions
can be addressed with big data analytics. Certain devel-
opments or outcomes may be predicted and/or esti-
mated based on vast amounts of historical data, such as
length of stay (LOS); patients who will choose elective
surgery; patients who likely will not benefit from surgery;
complications; patients at risk for medical complications;
patients at risk for sepsis, MRSA, C. difficile, or other
hospital-acquired illness; illness/disease progression;
patients at risk for advancement in disease states; causal
factors of illness/disease progression; and possible
comorbid conditions (EMC Consulting). McKinsey
estimates that big data analytics can enable more than $300
billion in savings per year in U.S. healthcare, two thirds
of that through reductions of approximately 8% in national
healthcare expenditures. Clinical operations and
R & D are two of the largest areas for potential savings,
with $165 billion and $108 billion in waste respectively
[24]. McKinsey believes big data could help reduce waste
and inefficiency in the following three areas:
Clinical operations: Comparative effectiveness
research to determine more clinically relevant and
cost-effective ways to diagnose and treat patients.
Research & development: 1) predictive modeling to
lower attrition and produce a leaner, faster, more
targeted R & D pipeline in drugs and devices;
2) statistical tools and algorithms to improve clinical
trial design and patient recruitment to better match
treatments to individual patients, thus reducing trial
failures and speeding new treatments to market; and
3) analyzing clinical trials and patient records to identify
follow-on indications and discover adverse effects before
products reach the market.
Public health: 1) analyzing disease patterns and
tracking disease outbreaks and transmission to improve
public health surveillance and speed response; 2) faster
development of more accurately targeted vaccines, e.g.,
choosing the annual influenza strains; and 3) turning
large amounts of data into actionable information that
can be used to identify needs, provide services, and
predict and prevent crises, especially for the benefit of
populations [24].
In addition, [14] suggests big data analytics in
healthcare can contribute to:
Evidence-based medicine: Combine and analyze a
variety of structured and unstructured data (EMRs,
financial and operational data, clinical data, and genomic
data) to match treatments with outcomes, predict patients
at risk for disease or readmission and provide more
efficient care;
Genomic analytics: Execute gene sequencing more
efficiently and cost effectively and make genomic
analysis a part of the regular medical care decision
process and the growing patient medical record [25];
Pre-adjudication fraud analysis: Rapidly analyze
large numbers of claim requests to reduce fraud, waste
and abuse;
Device/remote monitoring: Capture and analyze in
real-time large volumes of fast-moving data from
in-hospital and in-home devices, for safety monitoring
and adverse event prediction;
Patient profile analytics: Apply advanced analytics
to patient profiles (e.g., segmentation and predictive
modeling) to identify individuals who would benefit
from proactive care or lifestyle changes, for example,
those patients at risk of developing a specific disease
(e.g., diabetes) who would benefit from preventive
care [14].
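As a minimal sketch of the patient profile analytics item above, the following Python assigns patients to coarse risk segments from a few profile attributes. The field names, point weights and cut-offs are hypothetical placeholders chosen for illustration; they are not clinical guidance and are not taken from [14]. A production system would more likely fit a predictive model to historical outcomes.

```python
from dataclasses import dataclass


@dataclass
class PatientProfile:
    patient_id: str
    age: int
    bmi: float
    family_history: bool  # hypothetical flag: close relative with the disease


def risk_score(p: PatientProfile) -> int:
    """Sum illustrative risk points; the weights are assumptions, not clinical rules."""
    score = 0
    if p.age >= 45:
        score += 1
    if p.bmi >= 30.0:
        score += 2
    if p.family_history:
        score += 1
    return score


def segment(p: PatientProfile) -> str:
    """Map a score to a segment that could drive preventive outreach."""
    score = risk_score(p)
    if score >= 3:
        return "high"
    if score >= 1:
        return "medium"
    return "low"


patients = [
    PatientProfile("p1", 52, 31.5, True),
    PatientProfile("p2", 30, 22.0, False),
    PatientProfile("p3", 47, 24.0, False),
]
segments = {p.patient_id: segment(p) for p in patients}
```

Keeping the scoring and the segmentation in separate functions means the hand-written score could later be swapped for a model's predicted probability without changing how segments are consumed.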
According to [16], areas in which enhanced data and
analytics yield the greatest results include: pinpointing
patients who are the greatest consumers of health re-
sources or at the greatest risk for adverse outcomes; pro-
viding individuals with the information they need to
make informed decisions and more effectively manage
their own health as well as more easily adopt and track
healthier behaviors; identifying treatments, programs
and processes that do not deliver demonstrable benefits
or cost too much; reducing readmissions by identifying
environmental or lifestyle factors that increase risk or trigger
adverse events [26] and adjusting treatment plans accordingly;
improving outcomes by examining vitals from
at-home health monitors; managing population health by
detecting vulnerabilities within patient populations during
disease outbreaks or disasters; and bringing clinical, finan-
cial and operational data together to analyze resource
utilization productively and in real time [16].
The 4 Vs of big data analytics in healthcare
Like big data in healthcare, the analytics associated with
big data is described by three primary characteristics:
volume, velocity and variety (
ware/data/bigdata/). Over time, health-related data will be
created and accumulated continuously, resulting in an in-
credible volume of data. The already daunting volume of
existing healthcare data includes personal medical records,
radiology images, clinical trial data, FDA submissions,
human genetics and population data, genomic sequences, etc.
Newer forms of big data, such as 3D imaging, genomics
and biometric sensor readings, are also fueling this
exponential growth.
Fortunately, advances in data management, particu-
larly virtualization and cloud computing, are facilitating
the development of platforms for more effective capture,
storage and manipulation of large volumes of data [4].
Data is accumulated in real-time and at a rapid pace, or
velocity. The constant flow of new data accumulating at
unprecedented rates presents new challenges. Just as the
volume and variety of data that is collected and stored
have changed, so too has the velocity at which it is generated
and at which it must be retrieved, analyzed, compared
and used to make decisions based on the output.
Most healthcare data has traditionally been static: paper
files, x-ray films, and scripts. The velocity of mounting data
increases with data that represents regular monitoring, such
as multiple daily diabetic glucose measurements (or more
continuous control by insulin pumps), blood pressure
readings, and EKGs. Meanwhile, in many medical situations,
constant real-time data (trauma monitoring for
blood pressure, operating room monitors for anesthesia,
bedside heart monitors, etc.) can mean the difference
between life and death.
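To make the idea of acting on data in motion concrete, the sketch below checks a stream of readings with a fixed-size sliding window and raises an alert whenever the window mean exceeds a limit. The function name, the window size of three and the limit of 100 are assumptions made for illustration only, not values from the literature reviewed here.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def sliding_alerts(
    readings: Iterable[float], window: int = 3, limit: float = 100.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, window mean) whenever the mean of the last `window` readings exceeds `limit`."""
    recent = deque(maxlen=window)  # older readings fall out automatically
    for i, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:
            mean = sum(recent) / window
            if mean > limit:
                yield i, mean


# Hypothetical heart-rate stream (beats per minute)
heart_rate = [82, 85, 90, 110, 120, 125, 95, 88]
alerts = list(sliding_alerts(heart_rate))
```

Because the function is a generator, it consumes readings one at a time and never holds more than one window in memory, which is the property that matters when the source is a continuous monitor rather than a list.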
Future applications of real-time data, such as detecting
infections as early as possible, identifying them swiftly
and applying the right treatments (not just broad-spectrum
antibiotics) could reduce patient morbidity and mortality
and even prevent hospital outbreaks. Already, real-time
streaming data monitors neonates in the ICU, catching
life-threatening infections sooner [6]. The ability to perform
real-time analytics against such high-volume data in
motion and across all specialties would revolutionize
healthcare [4]. Therein lies variety.
As the nature of health data has evolved, so too have
analytics techniques scaled up to the complex and so-
phisticated analytics necessary to accommodate volume,
velocity and variety. Gone are the days of data collected
exclusively in electronic health records and other struc-
tured formats. Increasingly, the data is in multimedia
format and unstructured. The enormous variety of data
(structured, unstructured and semi-structured) is a dimension
that makes healthcare data both interesting
and challenging.
Structured data is data that can be easily stored, queried,
recalled, analyzed and manipulated by machine. Historically,
in healthcare, structured and semi-structured
data includes instrument readings and data generated by
the ongoing conversion of paper records to electronic
health and medical records. Historically, the point of
care generated unstructured data: office medical records,
handwritten nurse and doctor notes, hospital admission
and discharge records, paper prescriptions, radiograph
films, MRI, CT and other images.
Already, new data streams, structured and unstructured,
are cascading into the healthcare realm from fitness
devices, genetics and genomics, social media
research and other sources. But relatively little of this
data can presently be captured, stored and organized so
that it can be manipulated by computers and analyzed
for useful information. Healthcare applications in particular
need more efficient ways to combine and convert
varieties of data, including automating conversion from
unstructured to structured data.
The structured data in EMRs and EHRs include familiar
input record fields such as patient name, date of
birth, address, physician's name, hospital name and address,
treatment reimbursement codes, and other information
easily coded into and handled by automated
databases. The need to field-code data at the point of
care for electronic handling is a major barrier to acceptance
of EMRs by physicians and nurses, who lose the
natural language ease of entry and understanding that
handwritten notes provide. On the other hand, most
providers agree that an easy way to reduce prescription
errors is to use digital entries rather than handwritten ones.
The potential of big data in healthcare lies in combin-
ing traditional data with new forms of data, both indi-
vidually and on a population level. We are already seeing
data sets from a multitude of sources support faster and
more reliable research and discovery. If, for example,
pharmaceutical developers could integrate population
clinical data sets with genomics data, this development
could facilitate those developers gaining approvals on
more and better drug therapies more quickly than in the
past and, more importantly, expedite distribution to the
right patients [4]. The prospects for all areas of health-
care are infinite.
Some practitioners and researchers have introduced a
fourth characteristic, veracity, or data assurance. That
is, the big data, analytics and outcomes are error-free
and credible. Of course, veracity is the goal, not (yet) the
reality. Data quality issues are of acute concern in
healthcare for two reasons: life-or-death decisions depend
on having accurate information, and the quality
of healthcare data, especially unstructured data, is highly
variable and all too often incorrect. (Inaccurate transla-
tions of poor handwriting on prescriptions are perhaps
the most infamous example).
Veracity assumes the simultaneous scaling up in granularity
and performance of the architectures and platforms,
algorithms, methodologies and tools to match
the demands of big data. The analytics architectures
and tools for structured and unstructured big data are
very different from traditional business intelligence (BI)
tools. They are necessarily of industrial strength. For example,
big data analytics in healthcare would be executed
in distributed processing across several servers
(nodes), utilizing the paradigm of parallel computing
and a divide-and-process approach. Likewise, models and
techniques (such as data mining and statistical approaches,
algorithms, and visualization techniques) need to take into
account the characteristics of big data analytics. Traditional
data management assumes that the warehoused data is
certain, clean, and precise.
Veracity in healthcare data faces many of the same issues
as in financial data, especially on the payer side: Is
this the correct patient/hospital/payer/reimbursement
code/dollar amount? Other veracity issues are unique to
healthcare: Are diagnoses/treatments/prescriptions/proce-
dures/outcomes captured correctly?
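The payer-side veracity questions above can be partly automated as record-level checks. The following is a minimal sketch under stated assumptions: the field names, the list of known reimbursement codes and the specific rules are hypothetical, and real claims validation would follow the payer's own code sets and business rules.

```python
from datetime import date

# Hypothetical set of fields every claim is expected to carry
REQUIRED_FIELDS = ("patient_id", "payer", "reimbursement_code", "amount", "service_date")


def validate_claim(claim: dict, known_codes: set) -> list:
    """Return a list of human-readable problems; an empty list means the claim passed."""
    problems = []
    for field in REQUIRED_FIELDS:
        if claim.get(field) in (None, ""):
            problems.append(f"missing {field}")
    code = claim.get("reimbursement_code")
    if code and code not in known_codes:
        problems.append(f"unknown reimbursement code {code}")
    amount = claim.get("amount")
    if isinstance(amount, (int, float)) and amount <= 0:
        problems.append("non-positive amount")
    service_date = claim.get("service_date")
    if isinstance(service_date, date) and service_date > date.today():
        problems.append("service date in the future")
    return problems


known_codes = {"A100", "B200"}  # hypothetical code list
good = {"patient_id": "p1", "payer": "AcmeHealth", "reimbursement_code": "A100",
        "amount": 125.0, "service_date": date(2013, 5, 1)}
bad = {"patient_id": "", "payer": "AcmeHealth", "reimbursement_code": "Z999",
       "amount": -5, "service_date": date(2013, 5, 1)}
```

Returning every problem found, rather than stopping at the first, lets a downstream process decide whether a record should be rejected, repaired or merely flagged.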
Improving coordination of care, avoiding errors and
reducing costs depend on high-quality data, as do advances
in drug safety and efficacy, diagnostic accuracy
and more precise targeting of disease processes by treat-
ments. But increased variety and high velocity hinder
the ability to cleanse data before analyzing it and making
decisions, magnifying the issue of data trust [4].
The 4Vs are an appropriate starting point for a
discussion about big data analytics in healthcare. But
there are other issues to consider, such as the number
of architectures and platforms, and the dominance
of the open source paradigm in the availability
of tools. Consider, too, the challenge of developing
methodologies and the need for user-friendly interfaces.
While the overall cost of hardware and software
is declining, these issues have to be addressed to harness
and maximize the potential of big data analytics
in healthcare.
Architectural framework
The conceptual framework for a big data analytics pro-
ject in healthcare is similar to that of a traditional health
informatics or analytics project. The key difference lies
in how processing is executed. In a regular health analyt-
ics project, the analysis can be performed with a busi-
ness intelligence tool installed on a stand-alone system,
such as a desktop or laptop. Because big data is by definition
large, processing is broken down and executed
across multiple nodes. The concept of distributed pro-
cessing has existed for decades. What is relatively new is
its use in analyzing very large data sets as healthcare
providers start to tap into their large data repositories to
gain insight for making better-informed health-related
decisions. Furthermore, open source platforms such as
Hadoop/MapReduce, available on the cloud, have encour-
aged the application of big data analytics in healthcare.
While the algorithms and models are similar, the user
interfaces of traditional analytics tools and those used
for big data are entirely different; traditional health analytics
tools have become very user friendly and transparent.
Big data analytics tools, on the other hand, are
extremely complex, programming intensive, and require
the application of a variety of skills. They have emerged
in an ad hoc fashion mostly as open-source development
tools and platforms, and therefore they lack the support
and user-friendliness that vendor-driven proprietary
tools possess. As Figure 1 indicates, the complexity begins
with the data itself.
Big data in healthcare can come from internal (e.g., elec-
tronic health records, clinical decision support systems,
CPOE, etc.) and external sources (government sources, la-
boratories, pharmacies, insurance companies & HMOs,
etc.), often in multiple formats (flat files, .csv, relational
tables, ASCII/text, etc.) and residing at multiple locations
(geographic as well as in different healthcare providers'
sites) in numerous legacy and other applications (transaction
processing applications, databases, etc.). Sources and
data types include:
1. Web and social media data: Clickstream and
interaction data from Facebook, Twitter, LinkedIn,
blogs, and the like. It can also include health plan
websites, smartphone apps, etc. [6].
2. Machine to machine data: readings from remote
sensors, meters, and other vital sign devices [6].
3. Big transaction data: health care claims and other
billing records increasingly available in semi-structured
and unstructured formats [6].
4. Biometric data: fingerprints, genetics, handwriting,
retinal scans, x-ray and other medical images, blood
pressure, pulse and pulse-oximetry readings, and
other similar types of data [6].
5. Human-generated data: unstructured and
semi-structured data such as EMRs, physicians'
notes, email, and paper documents [6].
Figure 1 An applied conceptual architecture of big data analytics.
For the purpose of big data analytics, this data has to
be pooled. In the second component the data is in a
raw state and needs to be processed or transformed, at
which point several options are available. A service-oriented
architectural approach combined with web services
(middleware) is one possibility [27]. The data stays
raw and services are used to call, retrieve and process
the data. Another approach is data warehousing, wherein
data from various sources is aggregated and made ready
for processing, although the data is not available in real
time. Via the steps of extract, transform, and load (ETL),
data from diverse sources is cleansed and readied.
Depending on whether the data is structured or unstructured,
several data formats can be input to the big data
analytics platform.
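The extract, transform, and load steps described above can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not the architecture of any particular system: the CSV layout, the table name `labs` and the choice of an in-memory SQLite database as a stand-in warehouse are all hypothetical. The glucose conversion uses the approximate factor of 18 between mmol/L and mg/dL.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with mixed units and a missing value
RAW_CSV = """patient_id,lab,value,unit
p1,glucose,5.5,mmol/L
p2,glucose,110,mg/dL
p3,glucose,,mg/dL
"""


def extract(text):
    """Extract: parse raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no value and convert glucose readings to mg/dL."""
    clean = []
    for row in rows:
        if not row["value"]:
            continue  # cleansing step: skip incomplete readings
        value = float(row["value"])
        if row["unit"] == "mmol/L":
            value *= 18.0  # approximate glucose conversion factor
        clean.append((row["patient_id"], row["lab"], round(value, 1)))
    return clean


def load(rows, conn):
    """Load: write cleansed rows into a warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS labs (patient_id TEXT, lab TEXT, mg_dl REAL)")
    conn.executemany("INSERT INTO labs VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT patient_id, mg_dl FROM labs ORDER BY patient_id").fetchall()
```

As the text notes, a warehouse built this way is only as current as its last load, which is the trade-off against the service-oriented approach that leaves the data raw.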
In this next component in the conceptual framework,
several decisions are made regarding the data input ap-
proach, distributed design, tool selection and analytics
models. Finally, on the far right, the four typical applica-
tions of big data analytics in healthcare are shown.
These include queries, reports, OLAP, and data mining.
Visualization is an overarching theme across the four ap-
plications. Drawing from such fields as statistics, com-
puter science, applied mathematics and economics, a
wide variety of techniques and technologies has been de-
veloped and adapted to aggregate, manipulate, analyze,
and visualize big data in healthcare.
The most significant platform for big data analytics is
the open-source distributed data processing platform
Hadoop (Apache platform), initially developed for such
routine functions as aggregating web search indexes. It
belongs to the class of NoSQL technologies (others include
CouchDB and MongoDB) that evolved to aggregate
data in unique ways. Hadoop has the potential to
process extremely large amounts of data mainly by allocating
partitioned data sets to numerous servers (nodes),
each of which solves a different part of the larger problem;
the partial results are then integrated into the final result [28-31].
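The partition, solve and integrate pattern described above can be illustrated without a cluster. The sketch below imitates the pattern in a single Python process: it is not Hadoop code and does not use any Hadoop API, and the function names and sample records are hypothetical. Each chunk stands in for the data held by one node.

```python
from collections import Counter
from functools import reduce


def partition(records, n_parts):
    """Split records into n_parts roughly equal chunks (round-robin)."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_count(chunk):
    """Map step: each 'node' counts diagnosis codes in its own chunk."""
    return Counter(code for _, code in chunk)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


# Hypothetical (patient id, diagnosis code) pairs
records = [
    ("p1", "E11"), ("p2", "I10"), ("p3", "E11"),
    ("p4", "J45"), ("p5", "I10"), ("p6", "E11"),
]
partials = [map_count(chunk) for chunk in partition(records, 3)]
totals = reduce(reduce_counts, partials, Counter())
```

Here the map calls run one after another; on a real cluster each would run on a separate node against its local partition, and only the small partial counts would travel over the network to be merged.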
Hadoop can serve the twin roles of data organizer and
analytics tool. It offers a great deal of potential in enab-
ling enterprises to harness the data that has been, until
now, difficult to manage and analyze. Specifically, Hadoop
makes it possible to process extremely large volumes of
data with various structures or no structure at all. But
Hadoop can be challenging to install, configure and administer,
and individuals with Hadoop skills are not easily
found. For these reasons, it appears organizations
are not quite ready to embrace Hadoop completely.
The surrounding ecosystem of additional platforms and
tools supports the Hadoop distributed platform [30,31].
These are summarized in Table 1.
Table 1 Platforms & tools for big data analytics in healthcare
Platform/Tool | Description
The Hadoop Distributed File System (HDFS) | HDFS enables the underlying storage for the Hadoop cluster. It divides the data into smaller parts and distributes it across the various servers/nodes.
MapReduce | MapReduce provides the interface for the distribution of sub-tasks and the gathering of outputs. When tasks are executed, MapReduce tracks the processing of each server/node.
PIG and PIG Latin (Pig and PigLatin) | The Pig programming language is configured to assimilate all types of data (structured/unstructured, etc.). It comprises two key modules: the language itself, called PigLatin, and the runtime environment in which PigLatin code is executed.
Hive | Hive is a runtime Hadoop support architecture that leverages Structured Query Language (SQL) with the Hadoop platform. It permits SQL programmers to develop Hive Query Language (HQL) statements akin to typical SQL statements.
Jaql | Jaql is a functional, declarative query language designed to process large data sets. To facilitate parallel processing, Jaql converts "high-level" queries into "low-level" queries consisting of MapReduce tasks.
Zookeeper | Zookeeper allows a centralized infrastructure with various services, providing synchronization across a cluster of servers. Big data analytics applications utilize these services to coordinate parallel processing across big clusters.
HBase | HBase is a column-oriented database management system that sits on top of HDFS. It uses a non-SQL approach.
Cassandra | Cassandra is also a distributed database system. It is designated as a top-level project modeled to handle big data distributed across many utility servers. It also provides reliable service with no particular point of failure, and it is a NoSQL system.
Oozie | Oozie, an open source project, streamlines the workflow and coordination among the tasks.
Lucene | The Lucene project is used widely for text analytics/searches and has been incorporated into several open source projects. Its scope includes full text indexing and library search for use within a Java application.
Avro | Avro facilitates data serialization services. Versioning and version control are additional useful features.
Mahout | Mahout is yet another Apache project whose goal is to generate free applications of distributed and scalable machine learning algorithms that support big data analytics on the Hadoop platform.
Numerous vendors, including AWS, Cloudera,
Hortonworks, and MapR Technologies, distribute open-source
Hadoop platforms [29]. Many proprietary options
are also available, such as IBM's BigInsights. Further,
many of these platforms are cloud versions, making them
widely available. Cassandra, HBase, and MongoDB, described
above, are used widely for the database component.
While the available frameworks and tools are mostly
open source and wrapped around Hadoop and related
platforms, there are numerous trade-offs that developers
and users of big data analytics in healthcare must
consider. While the development costs may be lower
since these tools are open source and free of charge, the
downsides are the lack of technical support and minimal
security. In the healthcare industry, these are, of course,
significant drawbacks, and therefore the trade-offs must be
addressed. Additionally, these platforms/tools require a
great deal of programming, skills the typical end-user in
healthcare may not possess. Furthermore, considering the
only recent emergence of big data analytics in healthcare,
governance issues including ownership, privacy, security,
and standards have yet to be addressed. In the next section
we offer an applied big data analytics in healthcare methodology
to develop and implement a big data project for
healthcare providers.
While several different methodologies are being developed
in this rapidly emerging discipline, here we outline one
that is practical and hands-on. Table 2 shows the main
stages of the methodology. In Step 1, the interdisciplinary
big data analytics in healthcare team develops a concept
statement. This is a first cut at establishing the need for
such a project. The concept statement is followed by a
description of the project's significance. The healthcare
organization will note that there are trade-offs in terms of
alternative options, cost, scalability, etc. Once the concept
statement is approved, the team can proceed to Step 2, the
proposal development stage. Here, more details are filled
in. Based on the concept statement, several questions are
addressed: What problem is being addressed? Why is it
important and interesting to the healthcare provider?
What is the case for a big data analytics approach?
(Because the complexity and cost of big data analytics
are significantly higher than those of traditional analytics
approaches, it is important to justify their use.) The
project team also should provide background information on
the problem domain as well as prior projects and research
done in this domain.
Next, in Step 3, the steps in the methodology are fleshed
out and implemented. The concept statement is broken
down into a series of propositions. (Note these are not
rigorous as they would be in the case of statistical ap-
proaches. Rather, they are developed to help guide the big
data analytics process). Simultaneously, the independent
and dependent variables or indicators are identified. The
data sources, as outlined in Figure 1, are also identified;
the data is collected, described, and transformed in
preparation for analytics. A very important step at this
point is platform/tool evaluation and selection. There are
several options available, as indicated previously, including
AWS Hadoop, Cloudera, and IBM BigInsights. The next
step is to apply the various big data analytics techniques
to the data. This process differs from routine analytics
only in that the techniques are scaled up to large data sets.
Through a series of iterations and what-if analyses, insight
is gained from the big data analytics. From the insight, in-
formed decisions can be made. In Step 4, the models and
their findings are tested and validated and presented to
stakeholders for action. Implementation is a staged ap-
proach with feedback loops built in at each stage to
minimize risk of failure.
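As a small illustration of the "analytic techniques" stage in Step 3, the sketch below computes support and confidence for pairwise association rules over a handful of coded patient records. It is a minimal sketch, not the authors' method: the records, the indicator names, the `pair_rules` helper, and the `min_support` threshold are all hypothetical, and a real project would run an equivalent computation on a distributed platform rather than in memory.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, hand-made records: each set lists coded indicators for one patient.
records = [
    {"diabetes", "hypertension", "readmitted"},
    {"diabetes", "hypertension"},
    {"hypertension", "readmitted"},
    {"diabetes", "readmitted"},
    {"diabetes", "hypertension", "readmitted"},
]


def pair_rules(transactions, min_support=0.4):
    """Return sorted (antecedent, consequent, support, confidence) tuples
    for every item pair whose support reaches min_support."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of records containing both items
        if support < min_support:
            continue
        # confidence(a -> b) = count(a and b) / count(a), and vice versa
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return sorted(rules)


rules = pair_rules(records)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

The what-if iterations mentioned above would correspond to re-running such a computation with different thresholds or variable selections and comparing the resulting rules.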
The next section describes several reported big data
analytics applications in healthcare. We draw on publicly
available material from numerous sources, including
vendor sites. In this emerging discipline, there is little in-
dependent research to cite. These examples are from
secondary sources. Nevertheless, they are illustrative of
the potential of big data analytics in health care.
Table 2 Outline of big data analytics in healthcare
Step 1 Concept statement
- Establish need for big data analytics project in healthcare based on the 4Vs.
Step 2 Proposal
- What is the problem being addressed?
- Why is it important and interesting?
- Why big data analytics approach?
- Background material
Step 3 Methodology
- Variable selection
- Data collection
- ETL and data transformation
- Platform/tool selection
- Conceptual model
- Analytic techniques (association, clustering, classification, etc.)
- Results & insight
Step 4 Deployment
- Evaluation & validation
Source: Adapted from Raghupathi & Raghupathi [9].

Premier, the U.S. healthcare alliance network, has more
than 2,700 member hospitals and health systems,
90,000 non-acute facilities and 400,000 physicians, and is
reported to have data on approximately one in four
patients discharged from hospitals. Naturally, the network
has assembled a large database of clinical, financial,
patient, and supply chain data, with which the network has
generated comprehensive and comparable clinical outcome
measures, resource utilization reports and transaction-level
cost data. These outputs have informed
decision-making and improved the healthcare processes
at approximately 330 hospitals, saving an estimated
29,000 lives and reducing healthcare spending by nearly
$7 billion [16]. North York General Hospital, a 450-bed
community teaching hospital in Toronto, Canada, reports
using real-time analytics to improve patient outcomes and
gain greater insight into the operations of healthcare
delivery. North York is reported to have implemented a
scalable real-time analytics application to provide multiple
perspectives, including clinical, administrative, and
financial [16]. Another example, reported by IBM, is that of a
large, unnamed healthcare provider that is analyzing data
in the electronic medical record (EMR) system with the
goal of reducing costs and improving patient care. (Data
in the EMR include the unstructured data from physician
notes, pathology reports and other sources.) Big data
analytics is used to develop care protocols and case pathways
and to assist caregivers in performing customized queries
[16]. Another example of big data analytics in healthcare
is Columbia University Medical Center's analysis of
complex correlations of streams of physiological data related
to patients with brain injuries. The goal is to provide
medical professionals with critical and timely information to
aggressively treat complications. The advanced analytics is
reported to diagnose serious complications as much as 48
hours sooner than previously in patients who have
suffered a bleeding stroke from a ruptured brain aneurysm
[16]. The Rizzoli Orthopedic Institute in Bologna, Italy, is
reportedly using advanced analytics to gain a more
granular understanding of the clinical variations within
families whereby individual patients display extreme
differences in the severity of their symptoms. This insight is
reported to have reduced annual hospitalizations by 30%
and the number of imaging tests by 60%. In the long
term, the Institute expects to gain insight into the role of
genetic factors to develop treatments [16]. The Hospital
for Sick Children (Sick Kids) in Toronto is using analytics
to improve the outcomes for infants prone to
life-threatening nosocomial infections. It is reported that
Sick Kids applies advanced analytics to vital-sign data
gathered from bedside monitoring devices to identify
potential signs of infection as much as 24 hours earlier than
previous methods [6,16]. Additional examples are reported below.
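The bedside-monitoring example above can be pictured, in a much simplified form, as flagging readings that depart sharply from a patient's recent baseline. The sketch below is an assumption-laden illustration rather than the method used at Sick Kids or reported in [6,16]: the `first_alert` function, the window size, the threshold, and the heart-rate values are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev


def first_alert(readings, window=5, threshold=3.0):
    """Return the index of the first reading that deviates from the rolling
    baseline (the previous `window` readings) by more than `threshold`
    standard deviations, or None if no reading does."""
    baseline = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(baseline) == window:
            mu, sigma = mean(baseline), stdev(baseline)
            # Skip perfectly flat baselines to avoid dividing by zero.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                return i
        baseline.append(value)
    return None


# Hypothetical heart-rate stream (beats per minute) with a late jump.
heart_rate = [120, 122, 121, 119, 120, 121, 122, 150, 152]
print(first_alert(heart_rate))  # index of the first flagged reading
```

A production system would combine many signals and clinically validated rules; the point here is only that streaming data is compared against a continuously updated baseline instead of being reviewed after the fact.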
A recent New Yorker magazine article by Atul Gawande,
MD described how orthopedic surgeons at Brigham and
Women's Hospital in Boston relied on personal
experience along with insight extracted from research on data
based on a host of factors critical to the success of
joint-replacement surgery to systematically standardize knee
joint-replacement surgery. The result: improved outcomes
at lower costs. The University of Michigan Health System
standardized the administration of blood transfusions
using analytics in a similar fashion, combining experience
with big data analytics research. This resulted in a 31%
reduction in transfusions and a $200,000 reduction in
expenses per month (reported in [6]). Another example is
the National Institute for Health and Clinical Excellence
(NICE) of the U.K.'s National Health Service. NICE is
reportedly a leader in the analytics of large clinical datasets
for exploring the effectiveness of clinical and cost factors
in the use of new drugs and/or clinical treatments. The
Italian Medicines Agency is also reported to collect and
analyze clinical data on the use of expensive new drugs as
one goal in a country-level cost-effectiveness program [6].
Another leading example of big data analytics in
healthcare is the Department of Veterans Affairs' (VA) use of
applications on its very large data set in an effort to comply
with a performance-based accountability framework and
disease management practice [6]. In one very famous
example, California-based Kaiser Permanente associated
clinical data with cost data to generate a key data set, the
analytics of which led to the discovery of adverse drug
effects and subsequent withdrawal of Vioxx from the
market [6]. Researchers at the Johns Hopkins School of
Medicine discovered they could use data from Google Flu
Trends to predict sudden increases in flu-related
emergency room visits at least a week before warnings from
the CDC. Likewise, the analysis of Twitter updates was as
accurate as (and two weeks ahead of) official reports at
tracking the spread of cholera in Haiti after the January
2010 earthquake [6]. Also reported is an application
developed by IBM that predicts the likely outcomes of diabetes
patients using patients' panel data linked to physicians,
management protocols, and the overall relationship to
population health management averages [6]. In another
diabetes application, physicians at Harvard Medical School
and Harvard Pilgrim Health Care recently demonstrated
the potential of analytics applications to EHR data to
identify and group patients with diabetes for public health
surveillance. Four years' worth of data based on numerous
indicators from multiple sources was utilized. The
analytics application also differentiated between Type 1 and
Type 2 diabetes [6,26]. Finally, at Blue Cross Blue Shield
of Massachusetts (BCBSMA) there was a need to embed
analytics into business processes to help decision-makers
across the business gain insight into financial and medical
data and become more proactive. Several benefits were
reported. First, the analytics enabled medical directors to
identify high-risk disease groups and act to minimize risk
and improve patient outcomes. For example, new
preventive treatment protocols could be introduced among
patient groups with high cholesterol, thereby fending off
heart problems. Also, complex health informatics
reports were generated 300% faster than previously,
helping BCBSMA service clients more effectively [6].
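The Google Flu Trends result above amounts to finding a lead time at which an external signal best tracks a clinical series. The sketch below shows one simple way to estimate such a lead, using a lagged Pearson correlation. It is not the Johns Hopkins researchers' method: the `pearson` and `best_lead` helpers are illustrative, and both weekly series are invented numbers.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def best_lead(signal, target, max_lag=3):
    """Return (lag, r): the number of periods by which `signal` leads
    `target` with the highest correlation, searching lags 0..max_lag."""
    scores = {}
    for lag in range(max_lag + 1):
        xs = signal[: len(signal) - lag]  # signal at time t
        ys = target[lag:]                 # target at time t + lag
        scores[lag] = pearson(xs, ys)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Hypothetical weekly counts: search activity and flu-related ER visits.
search_volume = [10, 12, 30, 55, 40, 20, 12, 10, 11, 10]
er_visits = [5, 5, 6, 15, 28, 20, 10, 6, 5, 6]

lag, r = best_lead(search_volume, er_visits)
print(f"search volume leads ER visits by {lag} week(s), r = {r:.3f}")
```

A strong correlation at a positive lag only suggests that the signal may be useful for early warning; it would still need validation against held-out periods before informing any decision.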
The next section briefly identifies some of the key
challenges in big data analytics in healthcare.
At minimum, a big data analytics platform in healthcare
must support the key functions necessary for processing
the data. The criteria for platform evaluation may include
availability, continuity, ease of use, scalability, ability to
manipulate at different levels of granularity, privacy and
security enablement, and quality assurance [6,29,32]. In
addition, while most platforms currently available are
open source, the typical advantages and limitations of
open source platforms apply. To succeed, big data analyt-
ics in healthcare needs to be packaged so it is menu-
driven, user-friendly and transparent. Real-time big data
analytics is a key requirement in healthcare. The lag be-
tween data collection and processing has to be addressed.
The dynamic availability of numerous analytics algo-
rithms, models and methods in a pull-down type of menu
is also necessary for large-scale adoption. The important
managerial issues of ownership, governance and standards
have to be considered. And woven through these issues
are those of continuous data acquisition and data cleans-
ing. Health care data is rarely standardized, often fragmen-
ted, or generated in legacy IT systems with incompatible
formats [6]. This great challenge needs to be addressed
as well.
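To make the data cleansing challenge concrete, the sketch below standardizes a few records that arrive with inconsistent identifiers, date formats, and temperature units, then removes the duplicate that only becomes visible after standardization. Everything in it is assumed for illustration: the field names, the list of date formats, and the sample records do not come from any system described in the sources.

```python
from datetime import datetime

# Assumed date formats produced by hypothetical legacy systems.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def parse_date(text):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def to_celsius(value, unit):
    """Convert a temperature to degrees Celsius, or None for unknown units."""
    unit = unit.strip().upper()
    if unit == "C":
        return round(float(value), 1)
    if unit == "F":
        return round((float(value) - 32) * 5 / 9, 1)
    return None


def standardize(record):
    """Map one raw record onto a common schema."""
    return {
        "patient_id": str(record["id"]).strip().upper(),
        "visit_date": parse_date(record["date"]),
        "temp_c": to_celsius(record["temp"], record["unit"]),
    }


raw = [
    {"id": " a17 ", "date": "2013-08-27", "temp": "37.0", "unit": "C"},
    {"id": "A17", "date": "08/27/2013", "temp": "98.6", "unit": "f"},
    {"id": "b02", "date": "27.08.2013", "temp": "38.5", "unit": "c"},
]

clean = [standardize(r) for r in raw]
# Keep one record per (patient, visit date); later records overwrite earlier ones.
unique = list({(r["patient_id"], r["visit_date"]): r for r in clean}.values())
print(unique)
```

Returning `None` for unparseable values, rather than guessing, keeps the gaps visible so that they can be handled explicitly later in the pipeline.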
Big data analytics has the potential to transform the way
healthcare providers use sophisticated technologies to
gain insight from their clinical and other data
repositories and make informed decisions. In the future we'll see
the rapid, widespread implementation and use of big
data analytics across the healthcare organization and the
healthcare industry. To that end, the several challenges
highlighted above must be addressed. As big data
analytics becomes more mainstream, issues such as
guaranteeing privacy, safeguarding security, establishing standards
and governance, and continually improving the tools and
technologies will garner attention. Big data analytics and
applications in healthcare are at a nascent stage of
development, but rapid advances in platforms and tools can
accelerate their maturing process.
Competing interests
We, the authors, declare that we have no competing interests.
Authors' contributions
Both WR and VR contributed equally. Both authors read and approved the
final manuscript.
Author details
Graduate School of Business, Fordham University, 113 W. 60th Street, 10023
New York, NY, USA.
Brooklyn College, City University of New York, Brooklyn,
Received: 27 August 2013 Accepted: 5 January 2014
Published: 7 February 2014
1. Raghupathi W: Data Mining in Health Care. In Healthcare Informatics:
Improving Efficiency and Productivity. Edited by Kudyba S. Taylor & Francis;
2. Burghard C: Big Data and Analytics Key to Accountable Care Success. IDC
Health Insights; 2012.
3. Dembosky A: Data Prescription for Better Healthcare. Financial Times,
December 12, 2012, p. 19; 2012. Available from:
4. Feldman B, Martin EM, Skotnes T: Big Data in Healthcare Hype and Hope.
October 2012. Dr. Bonnie 360; 2012.
5. Fernandes L, O'Connor M, Weaver V: Big data, bigger outcomes. J AHIMA
6. IHTT: Transforming Health Care through Big Data Strategies for leveraging
big data in the health care industry; 2013. http://iheal
7. Frost & Sullivan: Drowning in Big Data? Reducing Information Technology
Complexities and Costs for Healthcare Organizations.
8. Bian J, Topaloglu U, Yu F, Yu F: Towards Large-scale Twitter Mining for Drug-
related Adverse Events. Maui, Hawaii: SHB; 2012.
9. Raghupathi W, Raghupathi V: An Overview of Health Analytics. Working
paper; 2013.
10. Ikanow: Data Analytics for Healthcare: Creating Understanding from Big Data.
11. jStart: How Big Data Analytics Reduced Medicaid Re-admissions. A jStart Case
Study; 2012.
12. Knowledgent: Big Data and Healthcare Payers; 2013. http://knowledgent.
13. Explorys: Unlocking the Power of Big Data to Improve Healthcare for Everyone.
14. IBM: IBM big data platform for healthcare. Solutions Brief; 2012. http://public.
15. Intel: Leveraging Big Data and Analytics in Healthcare and Life Sciences:
Enabling Personalized Medicine for High-Quality Care, Better Outcomes; 2012.
16. IBM: Data Driven Healthcare Organizations Use Big Data Analytics for Big
Gains; 2013.
17. Savage N: Digging for drug facts. Commun ACM 2012, 55
18. Zenger B: Can Big Data Solve Healthcare's Big Problems? HealthByte,
February 2012; 2012.
19. LaValle S, Lesser E, Shockley R, Hopkins MS, Kruschwitz N: Big data,
analytics and the path from insights to value. MIT Sloan Manag Rev 2011,
20. Capgemini: The Deciding Factor: Big Data & Decision Making; 2013. http://
21. Connolly S, Wooledge S: Harnessing the Value of Big Data Analytics. Teradata;
22. Courtney M: Puzzling out big data. Engineering & Technology 2013:56–60.
23. Intel: Big Data Analytics; 2012.
24. Manyika J, Chui M, Brown B, Bughin J, Dobbs R, Roxburgh C, Byers AH: Big
Data: The Next Frontier for Innovation, Competition, and Productivity. USA:
McKinsey Global Institute; 2011.
25. IBM: Large Gene interaction Analytics at University at Buffalo, SUNY; 2012.
http://public.d m/common/s si/ecm/en/ imc14675 usen/
26. IBM: Harvard Medical School; 2011.
27. Raghupathi W, Kesh S: Interoperable electronic health records
design: towards a service-oriented architecture. e-Service Journal
2007, 5:39–57.
28. Borkar VR, Carey MJ, Chen L: Big data platforms: what's next? ACM
Crossroads 2012, 19(1):44–49.
29. Ohlhorst F: Big Data Analytics: Turning Big Data into Big Money. USA: John
Wiley & Sons; 2012.
30. Zikopoulos PC, DeRoos D, Parasuraman K, Deutsch T, Corrigan D, Giles J:
Harness the Power of Big Data. McGraw-Hill: The IBM Big Data Platform;
31. Zikopoulos PC, Eaton C, DeRoos D, Deutsch T, Lapis G: Understanding Big
Data Analytics for Enterprise Class Hadoop and Streaming Data.
McGraw-Hill: Aspen Institute; 2012.
32. Bollier D: The Promise and Peril of Big Data. Washington, DC: The Aspen
Institute; 2010.
Cite this article as: Raghupathi and Raghupathi: Big data analytics in
healthcare: promise and potential. Health Information Science and Systems
2014 2:3.