All content in this area was uploaded by Saima Jabeen on May 24, 2019
Data Science Algorithms and Techniques for
Smart Healthcare using IoT and Big Data
Analytics
Liyakathunisa1, Saima Jabeen2, Manimala S3, and Hoda A. Elsayed4
1Taibah University, Madina, Saudi Arabia dr.liyakath@yahoo.com
2University of Wah, Wah Cantt, Pakistan sjabeen@uow.edu.pk
3Sri Jayachamarajendra College of Engineering, India malasjce@gmail.com
4Prince Sultan University, Riyadh, Saudi Arabia helsayed1993@gmail.com
Abstract
Smart Healthcare network is an innovative process of synergizing the ben-
efits of Sensors, Internet of Things (IoT) and Big Data analytics to deliver
improved patient care while reducing healthcare costs. The healthcare
industry currently faces major challenges in storing the data it generates and
in processing that data to extract knowledge from it. The increasing volume
of healthcare data generated through IoT devices, electronic health, mobile
health and telemedicine screening requires the development of new methods
and approaches for handling it. In this chapter, we briefly discuss
some of the healthcare challenges and big data analytics evolution in this fast
growing area of research with a focus on those addressed to smart healthcare
through remote monitoring. Monitoring the health condition of an individual
requires support from sensors and IoT devices. The objective
of this study is to provide healthcare services to the diseased as well as
healthy population through remote monitoring using intelligent algorithms,
tools, and techniques with faster analysis and expert intervention for better
treatment recommendations. The delivery of healthcare services has become
fully advanced with integration of technologies. This study proposes a novel
smart healthcare big data framework for remotely monitoring physical daily
activities of healthy and unhealthy population. The framework is validated
through a case study which monitors the physical activities of athletes with
sensors placed on wrist, chest and ankle. The sensors connected to the human
body transmit the signals continuously to the receiver. At the receiver end,
the signals are stored and analyzed through big data analytics techniques, and
machine learning algorithms are used to recognize the activity. Our proposed
framework predicts whether the player is active or inactive based on the
physical activities. Our proposed model achieved an accuracy of 99.96% and
can be adapted for remote monitoring of the health conditions of elderly
patients with Alzheimer's disease by caregivers, for rehabilitation, obesity
monitoring and remote monitoring of the physical exertion of sports persons.
It can also be beneficial for remotely monitoring chronic diseases, which
requires vital physical information as well as biological and genetic data.
1 Introduction
Rapid growth in the fields of telecommunication and the Internet of Things
has made human life smart. According to the Cambridge dictionary, smart
means having a clean, tidy and intelligent appearance. Adding smartness to a
sophisticated lifestyle makes for smart living. Smartness can be added to
home, education, energy consumption, shopping, agriculture or even health
(Fig. 1). A smart home can be connected to smart phones so that it can be
monitored and managed remotely. For example, if cameras are installed at the
entrance of the home or at the gates, smart apps can recognize people and
open the gate only for recognized people, thus adding security to the home.
A smart home can also be controlled remotely: if a light or fan has been left
switched on, it can be switched off from the office, thus conserving energy.
On a hot summer day, one can turn on the air conditioner at home while
leaving the office, so that the home is comfortable and soothing on arrival.
Fig. 1. Smart Living [61]
A healthy society is built by creating the right balance in every sphere of
people's lives. Smart health monitoring serves the healthy population as
well as the diseased: remote physicians can track the daily activities of
aged people, monitor obesity and the workload of sports persons, and monitor
heart rate, glucose level, asthma, body temperature or any other issue of
concern. The healthcare sector is facing major economic challenges in most
developing countries due to the increasing number of patients and the level
of care needed for the aged population. It is estimated that more than 200
million people in the world experience chronic diseases such as cancer,
asthma, cardiovascular diseases, arthritis, dementia, Alzheimer's disease and
chronic obstructive pulmonary disease, which need frequent diagnosis,
monitoring and expert intervention [1][2]. By 2015, around 46.8 million people
in the world were living with dementia, and the prevalence of dementia is
predicted to increase in all parts of the world by 2050 [3]. China and India
have the highest numbers of diabetes sufferers in the world, at around 110
million and 69 million, respectively. Globally, this number is expected to
rise from the current 415 million to 642 million by 2040 [4]. This imposes a
heavy burden on the government sector. According to a recent survey, most
countries spend about 8.9% to 16.4% of their Gross Domestic Product (GDP) on
healthcare [5].
In 2012, an estimate of around 500 Petabytes of data was collected by
healthcare industries. It is also predicted that by 2020 over 2500 Petabytes of
data will be collected by healthcare industries. Currently clinical, biological,
and physical data are stored in heterogeneous systems. Most of this data
is in structured, semi-structured or unstructured formats. Integrating
all this data from heterogeneous sources requires in-depth analysis of all the
relevant data for efficient treatment of a particular patient [6]. Hence, there
is a significant need of big data tools and techniques to analyze and process
this enormous amount of data efficiently. High speed processors are needed
to achieve quick insight and analyses from this data for better treatment
recommendations.
Although the big data concept is mostly linked to size alone, big data has
other properties that define it. These properties are referred to as the
3 Vs: velocity, variety, and volume [7]. In addition, big data permits
data-driven decision making, which is essential in the medical healthcare
sector in particular [8]. Healthcare big data in the United States (U.S.) [9]
exceeds 50 million patient records, and the sector relies mainly on
data-driven approaches to overcome frequent healthcare-related challenges.
Accordingly, advances in IoT technology play an important role in providing
healthcare services of better quality.
The term Internet of Things (IoT) refers to a physical connectivity
network that enables different objects to interact by exchanging collected
data [10]. Zaslavsky et al. defined IoT [11] as a technique that connects many
devices which are able to sense, store and perform computations over the
internet. In other words, the applicability of IoT relies on the data streams
that emerge every time the smart devices communicate. As a result, data
size started to increase gradually. In 2015, the number of wireless sensors
and connected nodes in the healthcare field also grew exponentially, to 10-30
million nodes [12], as shown in Fig. 2.
Fig. 2. Number of connected nodes over web per each sector in 2015 [12]
The European Commission [13] predicts that between 50 and 100 billion
devices will be physically connected through IoT by the end of this decade.
It is also suggested that around 40% of IoT-related technology will be
devoted to the healthcare sector, making up a $117 billion market [14]. This
prediction sheds light on the big data that will result from such huge
network connectivity. Smart health devices are able to communicate over the
internet, which gives access to a huge pool of real-time medical data.
Big data analytics and IoT for smart healthcare can improve the efficiency of
medical institutions in terms of identifying risk factors and the disease
treatment workflow. They can not only meet the examination index requirements
of hospital management, but also better manage costs across treatment
and care services [15].
Thus, a major data transformation has been driven by the use of IoT,
advanced analytics and Big Data technologies in the healthcare sector.
Analyzing data is of significant importance among all the stages involved,
i.e. data collection and transmission, evaluation, notification and
intervention. Hence, there is a need to exploit data science algorithms and
techniques for Big Data, since existing approaches lack efficiency in
accessing, processing and analyzing the data generated from various sources
such as digital imaging devices, laboratory tests, telematics, sensors,
emails, clinical notes, and third party sources.
Over the years, advances in medical science have provided effective diagnosis
and solutions for many serious diseases. Nonetheless, the increasing
urban population and changing lifestyles demand a smart healthcare
network which can provide quick and efficient treatment [16]. In order to
provide better healthcare services, medical and pharmaceutical companies,
healthcare professionals, researchers and city managers are working on Big
Data solutions and IoT devices which can minimize response time, provide
remote treatment, offer quick emergency services, reduce overcrowding in
hospitals, and allow doctors around the world to communicate, share and
collaborate [16][6].
In this chapter, we provide a brief review of techniques and algorithms
of data science which are highly significant in processing and overcoming the
challenges of big data in smart healthcare applications.
2 Related Work
IoT and Big Data play a significant role in early diagnosis and proper
treatment recommendations with expert intervention. Given the rapid growth
in E-Health, M-Health and TeleMedicine, in this section we present the
related work on these three areas using IoT and Big Data Analytics
techniques. To show the intensity and variety of research in this area, only
some of the recent and representative works are presented in this section.
2.1 E-Health
The IoT model has strengthened machine-to-machine communication, which
enables the application of telemonitoring in E-Health [17]. In addition, big
data plays an important role in E-Health as it turns hypothesis-based
research into data-oriented research by rapidly processing huge volumes of
health data [18]. Thus, trivial and nontrivial links between various sensors
and the E-Health data are made possible. Such connections can help in remote
clinical diagnosis, disease discovery and the introduction of novel therapy
methods. In [17], Suciu et al. analyzed methods for the secure integration of
big data processing over cloud communication using Remote Telemetry Units
(RTUs). They proposed an architecture for E-Health that is built on top of a
search application known as Exalead CloudView.
2.2 M-Health
In the age of information interpretation for knowledge building, devices and
apps can be used to create a health selfie [34]. Many devices and mobile
applications have been developed to serve healthcare. For example, Myo is a
motion controller for games that is used in orthopedics to help patients with
severe fractures exercise and to monitor their progress. Moreover, it enables
doctors to evaluate the patient's performance by measuring the movement
angle. Zio Patch is another example; it measures heart rate and
electrocardiogram (ECG) and has been approved for use by the US Food and Drug
Administration [20]. Moreover, Glaxo announced that its current investments
are centered around electroceuticals and bioelectrical drugs that use
micro-stimulation of nerves [21]. In addition, J&J has teamed up with Google
and Philips to develop automated robotics-based surgery and wearable devices
such as blood pressure monitors, respectively [22]. Furthermore, Novartis
and Google are working on sensor technologies such as a smart lens and a
wearable device for measuring blood glucose levels [23]. HeartCare+ is a
mobile application that was developed to assess the risk of coronary heart
disease for patients residing in rural areas, and it allows remote
communication with urban physicians [24]. HeartCare+ evaluates the risk and
classifies it as low, moderate or high based on the Framingham Scoring Model.
Other mobile apps have emerged recently, including SleepBot and myDario
[25][26]. The Ranked Health program was announced by the Hacking Medicine
Institute to assess and rank the effectiveness of such health-centered
applications and wearables [27].
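The low/moderate/high banding mentioned above can be sketched as a simple
thresholding step. This is only an illustration: the function name and the
cut-offs (below 10%, 10% to 20%, above 20% ten-year risk) are assumptions
based on commonly quoted Framingham bands, not the documented HeartCare+
logic, and the computation of the risk score itself is not shown.

```python
def risk_band(ten_year_risk_percent: float) -> str:
    """Map a 10-year risk estimate (in percent) to a coarse band.

    The cut-offs are assumed for illustration, not taken from HeartCare+.
    """
    if not 0.0 <= ten_year_risk_percent <= 100.0:
        raise ValueError("risk must be a percentage between 0 and 100")
    if ten_year_risk_percent < 10.0:
        return "low"
    if ten_year_risk_percent <= 20.0:
        return "moderate"
    return "high"


for risk in (5.0, 15.0, 25.0):
    print(risk, risk_band(risk))
```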
2.3 TeleMedicine
IoT, along with existing multimedia technologies, contributes substantially
to the health area today by supporting ambient assisted living and
telemedicine. TeleMedicine refers to the use of telecommunications
technologies such as telephone, facsimile, and distance education to provide
electronic medical consultation and specialty care services as and when
needed. Interest in telemedicine has grown widely since the 1990s, with
around $100 million in investments. The telemedicine technology is currently
being adopted by around 13 federal agencies. However, some concerns arise
along with that adoption, including IoT interoperability, service quality,
system security, and rapid storage growth [28][29].
In [28], the authors proposed an open source, flexible and secure platform
that relies mostly on IoT with the aid of cloud computing technology and
allows nearby ambient communication for medical use. The authors also
addressed some of the pitfalls discussed earlier for enhancement purposes. On
the other hand, Ahmed et al. and Anpeng et al. implemented a mobile
TeleMedicine system that transmits the electrocardiogram (ECG) signal to the
hospital via cellular networks [30][31]. In [32], the authors developed a
portable TeleMedicine services tool that diagnoses patients remotely using
seven vital signs at low cost. These vital signs are blood pressure, oxygen
in blood, glucose level, patient position and falls, body temperature, the
heart's electrical and muscular functions through ECG, and breath. They
collected these vital signs through an Android application that they
developed.
TeleMedicine is capable of noticeably enhancing the way healthcare is
viewed currently, and some other technologies can ease this process. For
example, Micro-Electro-Mechanical Systems (MEMS), a form of nanotechnology,
take different forms that open new TeleMedicine opportunities. These
forms include [33]:
1) Robots that can be used in arthroscopic surgery
2) Encapsulated cameras that can be swallowed to monitor digestion
3) Wearable wireless sensors that can be used to monitor physiological
functions
Human-to-Machine (H2M) interfaces are highly important in TeleMedicine,
since a telemedicine system requires two interfaces, one on the patient side
and the other on the physician side. The most recommended H2M interface is
natural language. Moreover, intelligent know-bots represent a future
investment of TeleMedicine. Know-bots are virtual intelligent avatars that
are planned to be attached to patients in the future to track their e-health
records. They are designed to understand natural language, respond to medical
inquiries and alert their owners if unhealthy trends are noticed [33]. All of
this would contribute to reducing medical errors and thus enhance the quality
of remote healthcare services and support clinical decision making. However,
such technologies are less likely to be adopted in many countries due to
limited funding for the healthcare field and resistance to technology.
3 Healthcare Challenges
3.1 Medical IoT Challenges
Medical IoT platforms were designed to help in composing analytics rapidly,
gaining insights and achieving organizational data transformation and integra-
tion. However, five main requirements represent a serious challenge to achieve
in these IoT-based medical platforms. These challenging requirements can be
summarized as follows [34]:
• Providing simple connectivity to devices and data through cloud-based
services.
• Medical device management activities (e.g. check asset availability,
increase throughput, minimize outages and reduce maintenance costs).
• Intelligent data storage and transformation through APIs that bridge the
gap that exists between the data and the cloud.
• Informative analytics of huge data at run time to gain insight and make
better decisions.
• Resolving unknown-source risks by activating notifications and isolating
incidents from affecting the active IoT environment.
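The run-time analytics and notification requirements listed above can be
illustrated with a minimal sketch that watches a stream of sensor readings
and raises an alert when a rolling average crosses a limit. The function
name, the window size, the limit of 120 and the sample heart-rate values are
all hypothetical; a production platform would read from a device gateway
rather than a list.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def rolling_alerts(readings: Iterable[float], window: int = 3,
                   limit: float = 120.0) -> Iterator[Tuple[int, float]]:
    """Yield (index, rolling mean) whenever the mean of the last
    `window` readings exceeds `limit`."""
    recent = deque(maxlen=window)  # keeps only the newest `window` values
    for index, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:
            mean = sum(recent) / window
            if mean > limit:
                yield index, mean


# Hypothetical heart-rate stream in beats per minute.
heart_rate = [80, 85, 90, 130, 140, 150, 95, 90]
alerts = list(rolling_alerts(heart_rate))
for index, mean in alerts:
    print(f"alert at reading {index}: rolling mean {mean:.1f} bpm")
```

Because the function is a generator, readings are processed one at a time
as they arrive, which is the property a run-time analytics component needs.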
3.2 Big Data Challenges
Big data changes every second. This constantly changing data makes it
challenging to store, analyze and retrieve at scale. Traditional databases
cannot be used to store, process and retrieve the data due to its volume and
variability. The main challenges faced by big data analytics are:
i. Data storage and quality
ii. Good-quality analysis of data
iii. People with good analytical skills
iv. Security and privacy of data
v. Multiple sources of data
The challenges faced by big data in healthcare are no different. The
characteristics of big data are themselves the main challenges that need to
be addressed. In addition to the above challenges, some issues in the
healthcare sector require critical review before the data is analyzed. These
issues include the storage, structure, standardization, querying, cleaning,
ownership, inaccuracies, precision, real time analysis, privacy of personal
health information, retrieval/collection from a variety of sources,
reporting, visualization and management of the health data [35][36].
3.3 Medical Big Data Technology Challenges
Moving towards big data technology is essential for providing better
integrated medical services. However, big data technology is perceived as a
potential threat by some groups of people. The big data challenges that
usually arise in the medical sector fall under two main categories [34]:
1) Fiscal Challenges: Medical services rely on paid face-to-face
interactions between patients and clinicians during clinical visits. Thus,
promoting the involvement of technology in this process burdens the medical
community and creates an inevitable staff bias against such non-paid
services. However, from a value-based care perspective, there is more
incentive to use new technologies that reduce unnecessary in-office
encounters.
2) Technology Challenges: Technology-wise, big data introduces a barrier
to achieving the healthcare data vision. Exchanging individual records
between various parties requires data fragmentation, whereas the expected
future vision moves towards data aggregation. Aggregated data has two
additional advantages [34]. First, no data interoperability or data
structure translation is needed between two proprietary systems. Second,
it gives machine learning and AI flexible support to function in real time.
3.4 Big Data Security and Privacy Issues
Hacking has become a leading cause of privacy breaches [37]. Security attacks
can bring attackers various financial benefits. Health records can be
accessed for various reasons [38] (e.g. revealing a person's health record
illegally, collecting medical data that is hard to access, or simply to
defeat systems). The risk of security attacks arises mostly from a lack of
understanding of technology in the healthcare community. Big data technology
in particular, as used in the healthcare sector, has recently raised many
concerns related to security and patient privacy. Although patient details
are stored in data centers with different levels of security, there is no
guarantee of the safety of a patient's records. In addition, medical data
that flows from diverse sources burdens data storage, processing and
communication. Thus, patient privacy and health data security are bound
together in the medical environment. It is clear that security shortcomings
can lead to serious invasions of people's privacy in the medical sector,
especially with the growing use of mobile devices.
Researchers are always seeking high-standard implementations that protect
medical data during transmission, storage, and use after collection. Some
of the security standards agreed upon by researchers and IT professionals for
the implementation of medical applications are listed below [38]:
• Wiping personal details from a device when the patient session ends
• No interference from third party applications (e.g. commercial
advertisements)
• Using two-step authentication to protect the stored data
• End-to-end data encryption and decryption during transmission
• SSL/TLS-based communication between the app and other systems
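As an illustration of the last item in the list above, the following sketch
builds a client-side TLS configuration with Python's standard `ssl` module.
It is one possible setup, not the configuration used by any cited system;
the helper name `make_client_context` and the choice of TLS 1.2 as the
minimum version are assumptions.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Return a TLS context that verifies the server certificate and
    host name and refuses protocol versions older than TLS 1.2."""
    context = ssl.create_default_context()  # certificate checks are on by default
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


ctx = make_client_context()
print(ctx.verify_mode, ctx.check_hostname, ctx.minimum_version)
```

A context like this can be passed to `urllib.request.urlopen(url, context=ctx)`
or used via `ctx.wrap_socket(sock, server_hostname=host)` when the app
talks to other systems.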
4 Data Science Techniques and Algorithms
Data science is a broad discipline. It works with huge datasets of diverse
formats found over the Internet or in database repositories [39]. Nowadays,
statistical and AI techniques are widely used in data science. In
applications, fast, parallel and distributed algorithms play a significant
role. The Spark and Hadoop software systems incorporate the principle of
distributed computing and are used extensively with cloud computing
technology [40][41][42].
Data Science-related applications are using various methods and algo-
rithms. SVM, Regression, Clustering, Decision Trees, Visualization, K-nearest
neighbors, PCA, Statistics, Random Forests, Time series/Sequence analy-
sis, Text Mining, Boosting, Anomaly/Deviation detection, Ensemble meth-
ods, Optimization, Neural networks, Singular Value Decomposition and Deep
learning are among the popular algorithms and approaches being used by
Data Scientists in recent years.
The algorithms used by government and industry data scientists differ from
those used by academic researchers and students. Data scientists from
industry more often make use of Time series, Regression, Random Forests,
Statistics and Visualization. Time Series, PCA and Visualization are usually
exploited by government/non-profit organizations. Academic researchers
prefer to use Deep Learning and PCA, whereas students tend not to use many
algorithms and instead seem to be interested in Deep Learning and text
mining [43].
Data science overlaps with computer science, statistics, machine
learning and data mining, operations research and business intelligence. It
fully encompasses machine learning and data mining, which are very
popular and closely related domains of computer science. In machine learning
(ML), systems are programmed so that they automatically recognize
and understand the input data and can make intelligent decisions based on
the supplied data. Its popular techniques are recommendation, classification
and clustering.
Supervised and unsupervised approaches are the two most commonly used
methods to implement ML techniques. The former uses given training data
to learn a function; common examples include classifying emails as spam,
labeling web pages after analyzing their content and recognizing voice. The
latter is more often used for clustering homogeneous data into coherent
groups, with no predefined dataset for training. The K-Nearest Neighbor
method (KNN), Naive Bayes classifiers, Neural networks and Support Vector
Machines are some well-known supervised learning algorithms, while k-means,
hierarchical clustering, and self-organizing maps are common approaches to
unsupervised learning [39][40].
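To make the supervised setting concrete, the following is a minimal sketch of
the K-Nearest Neighbor method in plain Python. The training readings (heart
rate in beats per minute, steps per minute) and their activity labels are
invented for illustration and are not the dataset used in this chapter.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Return the majority label among the k training samples closest to
    `query`. `train` is a list of (feature_tuple, label) pairs."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled readings: (heart rate, steps per minute).
train = [
    ((62, 0), "inactive"), ((65, 2), "inactive"), ((70, 5), "inactive"),
    ((120, 90), "active"), ((135, 110), "active"), ((128, 100), "active"),
]

print(knn_predict(train, (125, 95)))
print(knn_predict(train, (68, 3)))
```

The labels supplied with the training data are what make this supervised;
an unsupervised method such as k-means would receive only the feature tuples.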
Data science governs various techniques and algorithms borrowed from
aforementioned domains. There are no winning algorithms as they meant for
different situations depending upon the size of underlying dataset and itera-
tions in the existing algorithms. Along with the aim of searching best tech-
niques, one should also be aware of the fundamentals of different algorithms,
the significance of their simplicity and their applications as well. Some popular
algorithms of data scientists are briefly discussed below.
4.1 Classification
Classification, also called categorization, is a form of supervised learning.
It is an ML technique where known data is used to decide which of the existing
categories should be assigned to new data [39]. In other words, some samples
with known classifications are collected to identify the categories of new
objects [40]. The iTunes application prepares playlists by making use of
classification. Mail service providers such as Yahoo! and Gmail use
classification to determine whether a new mail is spam or not: the
classification algorithm analyzes user actions that mark certain mails as
spam.
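The spam example can be sketched with a small Naive Bayes text classifier.
This is an illustrative toy, assuming whitespace tokenization and add-one
(Laplace) smoothing; the four training messages are invented, and real mail
providers use far richer features and models.

```python
from collections import Counter
from math import log


def train_nb(examples):
    """Count words per class and documents per class.
    `examples` is a list of (text, label) pairs."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, doc_counts


def classify_nb(text, word_counts, doc_counts):
    """Return the label with the highest log-posterior for `text`."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = log(doc_counts[label] / total_docs)  # class prior
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Invented messages that users have marked as spam or not spam ("ham").
examples = [
    ("win a free prize now", "spam"),
    ("free offer click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lab results attached for review", "ham"),
]
model = train_nb(examples)
print(classify_nb("free prize offer", *model))
print(classify_nb("agenda for review", *model))
```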
4.2 Clustering
Clustering, also referred to as segmentation and sometimes known as
unsupervised classification because the output is unknown to the analyst,
aims to divide data into unique, differentiated clusters. The algorithm is
not trained on any previous input data or output information; rather, it
defines the output for users. For example, customer data spanning 1000 rows
can be grouped into differentiated segments or clusters based on variables
such as demographic information or purchasing behavior of customers [44]. In
Newsgroups, articles are grouped by related topics using clustering
techniques. Google and Yahoo! use clustering techniques to group related
data based on similar features. Tutorialspoint incorporates a clustering
engine that manages its tutorials library in such a way that a new incoming
tutorial is placed, based on its content, in the relevant cluster [39].
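The customer segmentation example can be sketched with k-means (Lloyd's
algorithm). The six customer records and the two starting centroids below are
invented; supplying the initial centroids by hand is an assumption made to
keep the sketch deterministic, whereas practical implementations usually
choose them randomly.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else old  # keep an empty cluster's centroid in place
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (age, purchases per month). No labels are given.
customers = [(22, 1), (25, 2), (24, 1), (48, 9), (52, 10), (50, 8)]
centroids, clusters = kmeans(customers, centroids=[(22, 1), (48, 9)])
print(centroids)
print(clusters)
```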
4.3 Dimensionality Reduction
Dimension reduction techniques are used to reduce the number of dimensions
or variables of a dataset while losing as little as possible of the
information carried by the dataset. Principal Component Analysis (PCA) and
Factor Analysis are two well-known variable reduction techniques. The core
of PCA lies in viewing the data from the viewpoint of its principal
components, where a principal component is a direction with the largest
variance in the dataset. The principal components are obtained by rotating
the axes of the variables onto the eigenvectors of the data's covariance
matrix, ordered by decreasing variance; they are uncorrelated and
orthogonal [44].
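For two variables, the first principal component can be computed in closed
form from the 2x2 covariance matrix, which the sketch below does in plain
Python. The function name and the four sample points are assumptions for
illustration; for more than two variables one would normally use a numerical
eigen-decomposition or SVD routine instead.

```python
from math import atan2, cos, hypot, sin


def first_principal_component(points):
    """Return (unit direction, variance along it, 1-D scores) for 2-D data."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * atan2(2 * sxy, sxx - syy)
    direction = (cos(theta), sin(theta))
    # Largest eigenvalue = variance captured by that direction.
    variance = (sxx + syy) / 2 + hypot((sxx - syy) / 2, sxy)
    # Project the centred points onto the direction: 2 variables become 1.
    scores = [(x - mean_x) * direction[0] + (y - mean_y) * direction[1]
              for x, y in points]
    return direction, variance, scores


# Invented, perfectly correlated measurements lying on the line y = x.
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
direction, variance, scores = first_principal_component(points)
print(direction, variance, scores)
```

Here all of the variance lies along the diagonal, so a single score per point
carries the information of both original variables.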
4.4 Anomaly Detection
An anomaly is also known as an outlier, deviation, exception, novelty,
peculiarity, surprise or noise. Anomaly detection is about identifying
items, events or observations which do not conform to an expected pattern of
behavior [46]. It can reveal critical information in data and is applicable
in a large number of application domains. Anomalies often point to some kind
of problem, such as a cyber intrusion, credit card theft, a configuration
flaw, a health issue or a mistake in a text. In healthcare informatics,
anomalies detected in patient records can indicate disease outbreaks,
instrumentation errors, etc. The key challenges in healthcare data are the
availability of labels for normal cases only, the very high cost of
misclassification and the possibility of complex spatio-temporal data.
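A simple way to flag such deviations in a series of readings is a robust
z-score based on the median absolute deviation (MAD). The sketch below is one
common statistical approach, not the method of [46]; the scaling constant
0.6745 and the cut-off 3.5 follow a widely used convention, and the body
temperature values are invented.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`."""
    centre = median(values)
    mad = median(abs(value - centre) for value in values)
    if mad == 0:
        return []  # no spread, so the score is undefined
    return [value for value in values
            if 0.6745 * abs(value - centre) / mad > threshold]


# Hypothetical body temperature readings in degrees Celsius.
temperatures = [36.6, 36.7, 36.5, 36.8, 36.6, 39.9, 36.7]
print(mad_outliers(temperatures))
```

Using the median rather than the mean keeps the single extreme reading from
distorting the baseline it is compared against.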
4.5 Recommender Systems
Recommender systems provide closely matching recommendations based on user
information that captures behavior such as clicks, ratings and past
purchases. Amazon uses a recommender system to suggest items of interest
based on one's previous actions, and Facebook uses one to recommend people
one may know [39].
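The idea of recommending from past purchases can be sketched with a very
small neighbourhood-based scheme: items bought by users with overlapping
purchase histories are scored by the size of the overlap. The user names and
purchases are invented, and the scoring rule is an assumption chosen for
brevity; it is not how Amazon or Facebook implement their systems.

```python
from collections import Counter


def recommend(purchases, user, top_n=2):
    """Suggest items the user does not own, weighted by how many
    purchases the user shares with each other user."""
    owned = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        overlap = len(owned & items)  # shared purchases act as similarity
        for item in items - owned:
            scores[item] += overlap
    return [item for item, score in scores.most_common(top_n) if score > 0]


# Hypothetical purchase histories.
purchases = {
    "ana": {"fitness band", "glucose meter"},
    "ben": {"fitness band", "glucose meter", "bp monitor"},
    "cai": {"fitness band", "smart scale"},
    "dee": {"novel"},
}
print(recommend(purchases, "ana"))
```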
4.6 Support Vector Machines
Support Vector Machine (SVM) is a supervised approach that learns about
the classes from a given dataset in order to classify new data [45]. Based on
the learnt hyperplane, data is classified into two classes. At a high level
it performs a similar task to C4.5, but it does not use decision trees. SVM
projects data into higher dimensions and finds the best hyperplane
that divides the data into the two classes. For example, it can be thought of
as separating red and blue balls placed on a table with a stick, without
moving the balls, provided the balls are not too mixed together. For a ball
newly added to the table, one can then predict its color from the side of the
stick on which the ball is placed. In this context, the balls represent
objects, the red and blue colors represent the two classes, and the stick
represents the simplest hyperplane, a line. SVM figures out the function for
the hyperplane.
Consider the more complex case where the balls are mixed, so that a straight
stick would not work adequately. One could quickly lift up the table,
throwing the balls into the air in just the right way, and then use a big
sheet of paper to separate the balls while they are in the air. In this
scenario, lifting the table up is the equivalent of mapping data into higher
dimensions, i.e., from the two dimensional table surface to the three
dimensional balls in the air. The kernel in the SVM approach is a convenient
way to work in higher dimensions. The big sheet of paper, as a hyperplane, is
a function for a plane rather than a line. The balls placed on a table or in
the air in this example can be mapped to real life data by specifying the
location of a ball on the table as (x, y) coordinates, where the two
dimensions of the ball are represented by x and y.
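The "lifting the table" step can be made concrete with an explicit feature
map. In the sketch below, red balls sit near the center of the table and are
surrounded by blue balls, so no straight line separates them, yet after
adding a third coordinate z = x^2 + y^2 a flat plane does. The coordinates,
the map `lift` and the plane height are invented for illustration; an actual
kernel SVM obtains the same effect without constructing the lifted points
explicitly.

```python
def lift(point):
    """Map a 2-D point (x, y) to 3-D by adding z = x^2 + y^2."""
    x, y = point
    return (x, y, x * x + y * y)


# Hypothetical ball positions on the table.
red = [(0.0, 0.5), (0.5, 0.0), (-0.4, -0.3)]
blue = [(2.0, 0.0), (0.0, -2.0), (-1.5, 1.5)]

PLANE_HEIGHT = 1.0  # the "sheet of paper": the plane z = 1


def side(point):
    """Classify a ball by which side of the plane its lifted image is on."""
    return "red" if lift(point)[2] < PLANE_HEIGHT else "blue"


print([side(p) for p in red])
print([side(p) for p in blue])
```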
For a patient dataset, different measures such as pulse, cholesterol level,
blood pressure etc. are used as dimensions to describe a patient. SVM maps
these measures into higher dimensions and identifies the hyperplane that
divides the classes. The margin associated with an SVM is the distance
between the hyperplane and the closest object from each class. In the running
example, the distance between the stick and the closest red ball and blue
ball is considered as the margin. SVM tries to maximize this margin, placing
the hyperplane as far away from the closest red ball as from the closest blue
ball, hence decreasing the chance of misclassification. This way, the
hyperplane stays at an equal distance from a red ball and a blue ball, and
the balls, as data points, are known as support vectors because they support
the hyperplane.
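The margin and the support vectors can be computed directly once a
hyperplane is given. The sketch below does not train an SVM; it takes a
hand-picked line x = 3 (written as w·p + b = 0 with w = (1, 0), b = -3) and
invented ball positions, measures each ball's distance to the line, and
reports the closest balls. For this particular layout the nearest red and
blue balls are 2 units apart, so no line can have a margin larger than 1,
which the chosen line attains.

```python
from math import hypot


def distance_to_line(w, b, point):
    """Distance from `point` to the line w[0]*x + w[1]*y + b = 0."""
    return abs(w[0] * point[0] + w[1] * point[1] + b) / hypot(*w)


# Hand-picked separating line x = 3 and hypothetical ball positions.
w, b = (1.0, 0.0), -3.0
red = [(2.0, 1.0), (2.0, 3.0), (1.0, 2.0)]
blue = [(4.0, 1.0), (4.0, 3.0), (5.0, 2.0)]

# The margin is the distance from the line to the closest ball.
margin = min(distance_to_line(w, b, p) for p in red + blue)

# Balls lying exactly at the margin are the support vectors.
support_vectors = [p for p in red + blue
                   if abs(distance_to_line(w, b, p) - margin) < 1e-9]

print(margin)
print(support_vectors)
```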
SVM and C4.5 are two classifiers that are often tried first. According to
the No Free Lunch Theorem, no single classifier can be said to be the best in
all cases. Kernel selection and interpretability are some of the weaknesses
of SVM. There are many implementations of SVM, such as scikit-learn, MATLAB
and LIBSVM.
4.7 Ensemble Methods
Ensemble methods combine several weak learners to provide a powerful
prediction. Random forest, an ensemble method, has been found to be among the
most accurate of the existing classification methods. A simple decision tree
acts as a weak learner, while a random forest is a strong learner. Random
forest grows many decision trees, each from a sample of the same dataset, and
then combines their outputs to obtain the most accurate classification model
[44].
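The idea of growing many weak trees on samples of one dataset and combining
their outputs can be sketched as follows. This is a simplified stand-in, not a
full random forest: each "tree" is a one-split stump on a single feature, the
samples are bootstrap samples, and the outputs are combined by majority vote.
The heart-rate readings and all function names are hypothetical.

```python
import random
from collections import Counter

def train_stump(sample):
    """Fit a one-split 'tree': threshold midway between the two class means."""
    by_label = {}
    for x, label in sample:
        by_label.setdefault(label, []).append(x)
    if len(by_label) < 2:  # the bootstrap sample contained one class only
        only = next(iter(by_label))
        return lambda x: only
    (lo_label, lo_mean), (hi_label, hi_mean) = sorted(
        ((label, sum(v) / len(v)) for label, v in by_label.items()),
        key=lambda item: item[1],
    )
    threshold = (lo_mean + hi_mean) / 2
    return lambda x: lo_label if x < threshold else hi_label

def train_forest(data, n_trees=25, seed=0):
    """Grow each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]

def predict(forest, x):
    """Combine the weak learners by majority vote."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical heart-rate readings (bpm) with an activity status label.
data = [(62, "inactive"), (68, "inactive"), (71, "inactive"),
        (120, "active"), (135, "active"), (150, "active")]
forest = train_forest(data)
print(predict(forest, 65), predict(forest, 140))
```

A real random forest additionally picks a random subset of features at each
split and grows deeper trees.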
In the subsequent section, we discuss how these data science algorithms
can be used in Big Data Analytics.
5 Big Data Analytics Tools and Techniques
Data is information that is organized in some specific format. Big Data is
a huge collection of data which can be structured or unstructured. Big data
refers to datasets whose volume is beyond the capabilities of any structured
database management system to handle. According to Forbes, the adoption of Big
data has grown tremendously, from 17% in 2015 to 53% in 2017 [47]. A survey
conducted by vcloudnews reports that around 2.5 quintillion bytes of data are
generated every day and that 90% of the world's data was generated in the past
two years [48].
The main characteristics of big data are the four Vs, that is, Volume,
Velocity, Variety and Veracity. The volume of Internet data has increased from
50 GB per day in 1992 to 28,875 GB per second in 2013 and was projected to
reach 50,000 GB per second in 2018. Velocity refers to the speed at which the
data is created, stored and analyzed. Every minute, 216,000 Instagram posts
are shared, around 200 million (20 crore) emails are sent, 277,000 tweets are
posted and 12 hours of video content are uploaded to the Internet. Facebook,
WhatsApp, Scopus, ResearchGate and many more sources add to the growth of Big
data. Variety refers to the many forms of Big data generated on the web, such
as text, images, videos, sensor data, keystrokes and clicks. According to the
vcloudnews study, 90% of the data generated is unstructured in nature [48].
The data available can be semi-structured, which has a logical flow and
format. The data can also be multi-structured, which means the data format is
not user friendly.
Beyond the three V's of Volume, Velocity and Variety (Fig. 3), additional
characteristics have been added. Veracity is another important characteristic
of Big Data which is normally overlooked by the analyst. Veracity covers
inconsistency, ambiguity and inaccuracy in the data, as well as model
approximation. Some big data analysts add three more V's as characteristics,
namely Variability, Visualization and Value. Fig. 4 illustrates the 7 V's of
big data.
Self-monitoring of health has increased tremendously in recent years. A study
reveals that around 70% of the US and UK population is involved in monitoring
their own health [48]. The Google Play store hosts more than 100,000 (one
lakh) apps related to health monitoring. Applications are available to track
sleep, eating, mood and fertility patterns, among others, so that both the
physical and the emotional health patterns of an individual are monitored.
Fig. 3. The three V's of Big Data [62]
Fig. 4. The seven V's of Big Data [51]
5.1 Applications of Big Data
Big data is generated in an uncontrolled manner every second. It can take the
form of user feedback, reviews rating a product as good or bad, app usage and
the content shared by users on social media, all of which can assist in
analyzing an individual's interests, priorities and behaviour. Organizations
can exploit the generated data to enhance their business. A few applications
are enumerated below (Fig. 5).
•Banking and Securities: Complete digitization of banking solutions produces
huge amounts of data. Analyzing the assets and liabilities of a customer may
help the manager to offer various types of loans, issue credit cards, award
reward points, reduce interest rates and customize the financial solutions
for each individual.
•Insurance: Insurance companies are also trying to analyze the lifestyle of an
individual and promote the most relevant plans to the targeted customers
thereby increasing the revenue of the company. If complete data is available
then claims can also be settled easily which can add value to the service.
•Transportation: Tracking shipments has become a state-of-the-art application
in recent years. Product manufacturers need to ensure that products are
transported to the customer in satisfactory condition. With the help of RFID,
goods can be tracked continuously so that their exact location is known at
any given point in time.
•Education: The education system can be personalized based on the
requirements of an individual. A person who is interested in making films
would receive only the education related to that. Education can be packaged
as per the need of the student rather than following the lethargic method of
loading everything onto every student.
•Manufacturing: The entire manufacturing process can be improved if the past
process is analyzed thoroughly by applying big data techniques.
•Energy and Utilities: The idea of smart cities has given rise to smart
energy meters, which can be installed in every house or industry. These smart
energy meters can analyze the utilization pattern of an individual, and
charges can possibly be tailored to the usage pattern.
•Healthcare: Healthcare monitoring is another field where a very big data
repository is being created. People are health conscious and hence monitor
themselves using several apps available for smartphones. Tracking glucose
levels, blood pressure levels, sleeping patterns, diet and many more
parameters promotes fitness and also increases awareness of a healthy
lifestyle.
•Media and Entertainment: The entertainment industry produces really big
data. High-quality audio files, HD-quality video files, many TV shows, live
updates on news portals, etc., involve huge amounts of data. Many
applications, such as Voot and Saavn, are available to search for and play
shows of individual interest.
Apart from the above-mentioned applications, there are many other dimensions
to Big data applications. Web data can reveal the shopping behaviour and
preferences of a customer and can feed recommender systems, opinion mining and
retail market analysis. Telecommunication companies and online stores can
personalize the best package for their customers based on their usage patterns.
5.2 Tools and Techniques of Big Data
Fig. 5. Applications of Big Data [49]
Big data has to be stored and processed, and meaningful information has to be
retrieved from the voluminous data. Since the volume of the data is huge, it
is not possible to store the complete data in traditional databases; hence
there is a need to use databases that support unstructured data handling.
There are numerous tools and techniques available to perform analytics on big
data. A few of these tools and techniques are discussed below.
•Hadoop Common: The common utilities that support the other Hadoop
modules.
•Hadoop Distributed File System (HDFS): A distributed file system
that provides high-throughput access to application data and runs
predominantly on commodity hardware. Hadoop itself is an open-source Apache
framework that implements the MapReduce model and serves as a data
management layer.
•Hadoop MapReduce: A YARN-based system for parallel processing
of large data sets; it is neither a database nor a competitor to
databases. MapReduce is a distributed style of computing that has been
implemented in several systems. It has two processes, namely Map and
Reduce, which can efficiently handle both structured and unstructured data
(Fig. 6).
The input data is divided into multiple chunks and fed as input to Map
tasks on a distributed file system. These Map tasks turn the chunk into a
sequence of key-value pairs. The key-value pairs from each Map task are col-
lected and sorted by a master controller based on the key values. The keys
are divided among all the Reduce tasks, so all key-value pairs with the same
key wind up at the same Reduce task. The Reduce tasks work on one key
at a time, and combine all the values associated with that key in a specific
manner.
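The flow described above can be imitated on a single machine with a few lines
of code. The sketch below is only an illustration of the Map, group-by-key and
Reduce steps, not Hadoop code; the records, the function names and the choice
of "mean heart rate per activity" as the reduction are assumptions.

```python
from collections import defaultdict

def map_task(chunk):
    """Turn one chunk of records into a sequence of (key, value) pairs."""
    for activity, heart_rate in chunk:
        yield activity, heart_rate

def shuffle(pairs):
    """Collect and sort the pairs by key so each key ends up in one place."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))

def reduce_task(key, values):
    """Combine all values of one key; here, the mean heart rate."""
    return key, sum(values) / len(values)

# Two hypothetical input chunks of (activity, heart rate) records.
chunks = [
    [("walking", 100), ("sitting", 70)],
    [("walking", 110), ("sitting", 74), ("running", 150)],
]
pairs = [pair for chunk in chunks for pair in map_task(chunk)]
result = dict(reduce_task(k, v) for k, v in shuffle(pairs).items())
print(result)
```

In Hadoop the Map and Reduce tasks run on different nodes and the grouping is
done by the framework; only the two functions are written by the user.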
Map finds the data on disk and executes the logic it contains. Reduce
summarizes and aggregates the final result. The benefits of MapReduce are that
it is cost-effective, that it is easy to scale or expand in capacity, and that
the data can easily be tailored according to the requirement. The drawbacks
are that MapReduce is not a database, hence it offers no security, indexing or
querying, and that the technique is not yet mature.
Fig. 6. MapReduce Functions Architecture
Apache Hive was developed to manage and perform analytics on a data
warehouse system built exclusively for Hadoop. The Hive project was initially
started by Facebook and later became open source. The data warehouse, which
normally contains a huge volume of historical data, is queried using a
SQL-like scripting language called HiveQL (Hive Query Language) [50].
Pig: A high-level data-flow language and execution framework for parallel
computation.
Spark: A fast and general compute engine for Hadoop data. Spark provides a
simple and expressive programming model that supports a wide range of
applications, including ETL (Extraction, Transformation and Loading), machine
learning, stream processing and graph computation.
Mahout for Machine Learning: Apache Mahout is a project that primarily
focuses on developing algorithms related to machine learning. The main
objective is to develop clustering, classification and collaborative filtering
algorithms that scale to handle the rapidly growing big data. These algorithms
are built on top of Apache MapReduce. Three popular machine learning
techniques are implemented:
•Recommendation
•Classification
•Clustering
Mahout also focuses on the development of linear algebra and statistical
algorithms.
In our proposed work, we have used Hadoop MapReduce and the Mahout Naive
Bayes classification technique.
6 Smart Healthcare
The convergence of IoT, Big Data and Machine Learning with Telemedicine,
E-health and M-Health for patient monitoring has revolutionized personalized
healthcare by improving the quality of patient care and lowering costs.
Telehealth innovations are considered to resolve significant issues of remote
patient monitoring. The potential benefits include continuous monitoring of
patient health regardless of the patient's location, enhanced accessibility
to healthcare, reduced cost of care and improved quality of care [1].
As technology in Telemedicine, E-health and M-Health continues to grow,
more and more exciting new IoT-based healthcare applications emerge for
collecting, transmitting and analyzing data [52]. Various biomedical sensors
have been used in patient monitoring systems to provide vital physical
information as well as genetic and biological data.
6.1 Biomedical Sensors for Healthcare Monitoring
Several chronic diseases can be detected using intelligent biomedical
sensors, for example:
•Cardiovascular diseases: Chest strap sensors and smart watches are used for
heart disease detection; they use electrocardiography (ECG) to record the
electrical activity of the heart. Using Bluetooth and a connected
smartphone, ECG signals about the heart rate can be continuously
transmitted to the receiving mobile device [53].
•Glucose level monitoring: Google smart contact lenses are used for
monitoring glucose levels in diabetes patients. Information collected by the
lens is transferred from the eye by a unit attached to the lens, which
consists of a capacitor, a controller and an antenna [54].
•Asthma: Environmental conditions such as carbon monoxide, ozone and
nitrogen dioxide levels can be monitored using a wrist band. At the same
time, it can also monitor heart rate and other vital information. A mobile
phone is used for transmitting the sensor data. In case of emergencies, the
data can also be transmitted to doctors for intervention [55].
•Alzheimer's, dementia, autism or other cognitive disorders: In these cases
it is possible to detect abnormal situations such as patient wandering.
According to the Alzheimer's Association, smart wearable biosensors such
as GPS SmartSoles and motion detection sensors can alert neighbors,
family or the nearest hospital [56].
GPS SmartSoles consist of a miniaturized GPS device and a cellular
communicator that sends location coordinates. A smartphone or a computer can
be used to view the location history on a map. The device sends alerts by text
message or email. Alert options include the crossing of a geographic boundary,
power on or off, and low battery.
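A boundary-crossing and low-battery check of the kind listed above could look
roughly like the following sketch. The chapter does not describe how the device
evaluates its alerts, so the circular boundary, the 20% battery threshold, the
coordinates and the function names are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (latitude, longitude) points in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def alerts(home, radius_km, position, battery_pct, low_battery_pct=20):
    """Return the alert messages raised by one location report."""
    messages = []
    if haversine_km(*home, *position) > radius_km:
        messages.append("geographic boundary crossed")
    if battery_pct < low_battery_pct:
        messages.append("low battery")
    return messages

home = (24.47, 39.61)  # hypothetical home coordinates
print(alerts(home, 0.5, (24.47, 39.61), battery_pct=80))
print(alerts(home, 0.5, (24.50, 39.61), battery_pct=10))
```

The resulting messages would then be forwarded by text message or email to the
caregiver.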
Motion detection sensors work with accelerometers to detect the patient's
movements and may use the ZigBee protocol or GSM to provide real-time
information. They can also use RFID readers to track when the patient enters
and leaves; sound sensors can detect motion and request assistance, and light
sensors can detect the opening of the refrigerator to monitor how often the
patient gets food [57].
Smart clothing: Here, sensors are integrated into textile clothing to
measure vital signs; they can constantly track the wearer's heart rate and
also monitor emotions [57].
Different biomedical sensors are shown below (Fig. 7).
Fig. 7. Biomedical Sensors
6.2 Smart Healthcare Big Data Framework
With sensors positioned across the body, data collection is a major challenge
in most healthcare applications that use biomedical sensors. This situation
leads to the adoption of new tools and techniques, namely the Internet of
Things, Big Data, machine learning and real-time healthcare applications.
In order to overcome the above challenges, we propose a smart healthcare
framework that can be used for telemammography and teleophthalmology and to
predict chronic conditions such as cardiovascular disease, asthma, diabetes,
Alzheimer's, dementia, blood pressure and cognitive disorders, along with
other vital physical and biological symptoms, through remote monitoring.
The proposed framework consists of four main components (Fig. 8).
Fig. 8. Smart Healthcare Big Data Framework
1 Data acquisition: Data will be collected from different biomedical sensors
such as Electrocardiogram (ECG), Electroencephalogram (EEG),
Electromyography (EMG), glucometer, blood pressure, body position and
body motion sensors, attached to different parts of the human body such
as the chest, hand and ankle. These data are stored in diverse formats:
structured, semi-structured or unstructured. DICOM standards have been
considered for the transmission of the clinical and non-clinical data;
DICOM is an international standard [1][58].
2 IoT and cloud storage: Data collected from different sensors, wearables,
mobile and various IoT devices is transferred through the cloud to the big
data processor for further analysis using machine learning techniques.
Storing and processing the data collected by these IoT devices can be
inflexible and extremely costly due to the enormous growth of healthcare
data. Hence, cloud storage can be used for large amounts of patient data,
which can help organizations to save money.
3 Big Data Processing: Because of its variety, veracity and volume,
healthcare data is well suited for big data processing and analytics. The
healthcare data collected from different sensors and IoT devices will be
sent to one common platform for big data processing. We use the Apache
Hadoop Distributed File System (HDFS) to hold this huge amount of
healthcare data and MapReduce, the computing model of Hadoop, to process
the large data sets in parallel. A detailed description of Hadoop MapReduce
is provided in Section 5.2. The input biomedical sensor data is partitioned
into training and testing subsets. Machine learning algorithms can be
implemented using Apache Mahout to perform intelligent analysis on the
input data and produce results that can be used to generate reports for the
early detection of health abnormalities.
4 Expert Intervention: The results of the analysis are sent to medical experts
and healthcare assistants for further treatment recommendations.
Notification alerts can be sent to patients wirelessly.
The above smart healthcare big data framework can be adapted to various
healthcare applications, such as the monitoring of elderly patients by
caregivers and the remote monitoring of sportspersons' physical exertion.
Heterogeneous types of vital physical information, biological and genetic
data can also be remotely monitored. The individualized treatment of a
patient, with efficient processing and timely access to the experts'
decisions, is the ultimate goal of our proposed smart healthcare framework.
6.3 Case Study
This study focuses on the proposed scenario illustrated in Fig. 9. The
scenario starts by attaching wearable biomedical sensors to the athlete's
body on the chest, hand and ankle to collect activity-related information.
These data are then transmitted over WiFi to be saved in a structured format
on cloud-supported storage servers using IoT. Afterwards, the data undergo the
suggested analysis to identify whether the athlete shows a positive physical
activity status. The big data analysis is applied using machine learning
algorithms on the Hadoop Distributed File System, MapReduce and Mahout, where
the inputs taken from the biomedical sensors are partitioned into training
and testing datasets. The results are then presented as an analytical report
to the experts, including the dietitian and the athlete's trainers in charge
of following up on their health status. This enables the experts to monitor
the athlete's lifestyle and send recommendations remotely.
Fig. 9. Remote Monitoring of Athletes Physical Activity Status Scenario
To validate the proposed scenario, the PAMAP2 Physical Activity Monitoring
benchmark dataset [59] was used for monitoring the physical and daily
activities of the athletes, as explained in the following sections.
7 Implementation and Results
7.1 Data Collection
In this study, the PAMAP2 Physical Activity Monitoring Dataset¹ was used
partially to extract the features that can trace an athlete's activity status
using an IoT-based infrastructure that involves connected sensors. The PAMAP2
dataset [59] originally contains 3850505 records describing 18 different main
physical activities. These planned activities are:
1. Lying down while doing nothing
2. Sitting on a chair comfortably
3. Standing while talking
4. Ironing 1-2 T-shirts
5. Vacuum Cleaning of 1-2 office rooms while moving objects
6. Ascending stairs of 5 floors
7. Descending stairs of 5 floors
8. Walking (4-6 km/hr speed)
9. Nordic walking by the walking poles
10. Cycling using a real bike with a suitable speed
11. Running with a suitable speed
12. Rope Jumping in the form of a basic jump or the alternate foot jump
13. Watching Television at home
14. Computer Work at the office
15. Driving Car between subjects office and home
16. Folding Laundry
17. House Cleaning and dusting shelves
18. Playing Soccer ball
Other, incidental activities were also noted in the data files, for example
moving from one location to another or waiting for equipment preparation. The
18 main activities were performed by 9 subjects (1 female and 8 males, aged
27.22 ± 3.31 years), who wore three sensors and a heart rate monitor.
7.2 IoT Infrastructure Setup
The three wireless Colibri inertial measurement units (IMUs) were positioned
on the wrist of the dominant arm, on the chest and on the ankle of the
dominant side. Data processing, feature extraction and classification
algorithms using various big data analysis techniques can be applied to the
dataset to recognize an individual's activity and to estimate its intensity.
¹ https://archive.ics.uci.edu/ml/datasets/PAMAP2+Physical+Activity+Monitoring
The raw sensor data is available in .dat file format. Missing values, due to
problems with the hardware setup (e.g. connection loss to the sensors), are
marked as NaN. Every subject's data file contains the following fields for
each record:
•Timestamp
•ActivityID
•Heart rate (bpm)
•IMU hand temperature (°C)
•IMU hand 3D-acceleration (unit: ms⁻², 16g scale, 13-bit resolution)
•IMU hand 3D-acceleration (unit: ms⁻², 6g scale, 13-bit resolution)
•IMU hand 3D-gyroscope (unit: rad/s)
•IMU hand 3D-magnetometer (unit: μT)
•IMU hand orientation
•IMU chest temperature (°C)
•IMU chest 3D-acceleration (unit: ms⁻², 16g scale, 13-bit resolution)
•IMU chest 3D-acceleration (unit: ms⁻², 6g scale, 13-bit resolution)
•IMU chest 3D-gyroscope (unit: rad/s)
•IMU chest 3D-magnetometer (unit: μT)
•IMU chest orientation
•IMU ankle temperature (°C)
•IMU ankle 3D-acceleration (unit: ms⁻², 16g scale, 13-bit resolution)
•IMU ankle 3D-acceleration (unit: ms⁻², 6g scale, 13-bit resolution)
•IMU ankle 3D-gyroscope (unit: rad/s)
•IMU ankle 3D-magnetometer (unit: μT)
•IMU ankle orientation
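The field list above can be turned into column names for parsing one record of
a .dat file. The sketch assumes whitespace-separated values, three values per
3D quantity and four orientation values per IMU, which gives 54 columns in
total as in the public dataset description; the column names themselves are
invented here for readability.

```python
import math

# Per-IMU fields: 1 temperature + 3 + 3 + 3 + 3 + 4 orientation = 17 values.
IMU_FIELDS = (
    ["temperature"]
    + [f"acc16_{axis}" for axis in "xyz"]
    + [f"acc6_{axis}" for axis in "xyz"]
    + [f"gyro_{axis}" for axis in "xyz"]
    + [f"mag_{axis}" for axis in "xyz"]
    + [f"orientation_{i}" for i in range(1, 5)]
)
COLUMNS = ["timestamp", "activity_id", "heart_rate"] + [
    f"{imu}_{field}" for imu in ("hand", "chest", "ankle") for field in IMU_FIELDS
]

def parse_record(line):
    """Parse one whitespace-separated record; 'NaN' tokens become float nan."""
    values = [float(token) for token in line.split()]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} values, got {len(values)}")
    record = dict(zip(COLUMNS, values))
    record["activity_id"] = int(record["activity_id"])
    return record

# Hypothetical record with a missing heart-rate reading.
line = " ".join(["8.38", "0", "NaN"] + ["1.0"] * 51)
record = parse_record(line)
print(len(COLUMNS), record["activity_id"], math.isnan(record["heart_rate"]))
```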
The dataset had to be prepared in order to implement a reliable classifier
that uses big data analysis (e.g. Hadoop MapReduce and Mahout), so that
nutritionists, diet watchers and training staff can monitor an athlete's
activity status remotely. The data preparation stages (e.g. cleansing and
feature extraction) are elaborated in more detail in the following sections.
7.3 Data Pre-Analysis Preparation
PAMAP2 data preparation was performed in multiple steps. First, the .dat
files were converted to .csv format. In addition, attribute names were
assigned to all dataset columns with numeric values. Each activity ID (0-24)
was then mapped to the corresponding activity name based on the provided data
description. Based on the activity name, an activity status (active or
inactive) was assigned. Data preprocessing took place afterwards using Weka
3.6 filters, which will be explained later. In addition, active and inactive
cases were separated for each subject data file; all active case files were
saved in an active folder, whereas inactive ones were saved in an inactive
folder. Afterwards, each data file was split into n data files using
EmEditor, where n is the number of rows in each .csv file. Finally, the n data
files were moved to the Hadoop Cloudera environment for processing.
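The ID-to-name and name-to-status mapping can be sketched as a pair of lookups.
The ID table below follows the public PAMAP2 description as far as we recall it
and should be checked against the dataset's own documentation. The chapter does
not list which activities were labelled inactive, so treating lying and sitting
as inactive is purely an assumption.

```python
# Activity IDs assumed from the public PAMAP2 description (verify before use).
ACTIVITY_NAMES = {
    0: "other", 1: "lying", 2: "sitting", 3: "standing", 4: "walking",
    5: "running", 6: "cycling", 7: "Nordic walking", 9: "watching TV",
    10: "computer work", 11: "car driving", 12: "ascending stairs",
    13: "descending stairs", 16: "vacuum cleaning", 17: "ironing",
    18: "folding laundry", 19: "house cleaning", 20: "playing soccer",
    24: "rope jumping",
}

# Hypothetical status rule; not stated in the chapter.
INACTIVE = {"lying", "sitting"}

def label(activity_id):
    """Return (activity name, activity status) for one activity ID."""
    name = ACTIVITY_NAMES[activity_id]
    status = "inactive" if name in INACTIVE else "active"
    return name, status

print(label(1), label(5))
```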
7.4 Data Cleansing
The PAMAP2 dataset was cleansed of all unwanted records. The data cleansing
phase involved several steps. First, records of activities unrelated to an
athlete's daily physical training were removed. The deleted records involve
the following activities:
1. Standing
2. Car Driving
3. Watching TV
4. Computer Work
5. House Cleaning
6. Vacuum Cleaning
7. Ironing
8. Playing Soccer
9. Folding Laundry
Moreover, all records with the activity marked as other were discarded for an
accurate analysis, as they represent noise between activities (e.g. break
time and waiting time in between activities). In addition, all missing values
marked as NaN were replaced by empty cells for smoother preprocessing. All
data files were also cleansed of stray ASCII characters (e.g. single quotes
and commas). As a result of this phase, the final data selected for the
analysis phase on Hadoop comprised 272108 instances in total, distributed as:
1) 240,041 active instances, and 2) 31977 inactive instances. These instances
were split into the active and inactive folders accordingly.
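The cleansing steps can be summarized in a short sketch: drop the records of
the excluded activities and of the activity marked as other, then replace NaN
values by empty cells. The record layout (a dictionary with an "activity" name
and numeric fields) and the sample values are assumptions for illustration.

```python
import math

# Activities removed in the cleansing phase, plus the 'other' marker.
EXCLUDED = {
    "standing", "car driving", "watching TV", "computer work",
    "house cleaning", "vacuum cleaning", "ironing", "playing soccer",
    "folding laundry", "other",
}

def cleanse(records):
    """Drop excluded activities and turn NaN values into empty cells."""
    cleaned = []
    for rec in records:
        if rec["activity"] in EXCLUDED:
            continue
        cleaned.append({
            key: "" if isinstance(value, float) and math.isnan(value) else value
            for key, value in rec.items()
        })
    return cleaned

# Hypothetical records.
records = [
    {"activity": "running", "heart_rate": float("nan"), "hand_temperature": 30.5},
    {"activity": "ironing", "heart_rate": 85.0, "hand_temperature": 31.0},
    {"activity": "other", "heart_rate": 90.0, "hand_temperature": 31.2},
    {"activity": "sitting", "heart_rate": 70.0, "hand_temperature": 32.0},
]
cleaned = cleanse(records)
print(cleaned)
```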
7.5 Preprocessing
Before the analysis on Hadoop, the dataset was loaded into Weka 3.6 for
pre-processing. In this phase, the following unsupervised filters were
applied in order:
1) ReplaceMissingValues Unsupervised Filter: In which all missing values
were replaced throughout the data files with the modes and means of the
training dataset
2) Normalize Unsupervised Filter: In which all numerical values were scaled
to lie between 0 and 1 to avoid any classification bias during the data
analysis phase
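The effect of the two filters on a numeric column can be imitated as below.
This is a plain re-implementation of the two ideas (mean imputation and
min-max scaling), not a call into Weka; the function names and the sample
readings are hypothetical.

```python
def replace_missing_with_mean(column):
    """Replace None entries with the mean of the values that are present."""
    present = [v for v in column if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in column]

def normalize(column):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid dividing by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical heart-rate column with one missing reading.
heart_rate = [60.0, None, 120.0, 90.0]
filled = replace_missing_with_mean(heart_rate)
scaled = normalize(filled)
print(filled, scaled)
```

Applying the imputation before the scaling matters, because the minimum and
maximum can only be computed once every cell holds a number.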
Afterwards, each data file was saved as a new .csv file, the features were
extracted, and the files were split into n data files using EmEditor and then
moved to the Hadoop Cloudera environment for processing, as explained earlier
in the data preparation section.
7.6 Features Selection
As shown earlier in the IoT infrastructure setup section, around 21 features
were collected using sensors and monitoring devices for the purpose of
identifying the physical activity taking place. However, to train the
classifier on what the activity name reflects about the physical activity
status of an athlete, the right features have to be extracted for accurate
results. Thus, some attributes were excluded from the dataset. For example, 3D
accelerometer data were collected twice (once using the 16g scale and once
using the 6g scale); however, the 6g-scale accelerometer was found to be less
precisely calibrated than the other one, because certain movements (e.g.
running) affect the 6g-scale sensor. In addition, the timestamp was removed
from the dataset as it is not needed for the analysis. So, the timestamp and
the 6g-scale readings of the IMU hand, chest and ankle sensors were removed.
Also, some other features were replaced as explained earlier (e.g. the
activity ID was replaced by the activity name). Thus, the final list of
features considered in the analysis includes the following 17 attributes:
•Activity Name
•Heart rate (bpm)
•IMU hand temperature (°C)
•IMU hand 3D-acceleration data (unit: ms⁻², 16g scale, 13-bit resolution)
•IMU hand 3D-gyroscope data (unit: rad/s)
•IMU hand 3D-magnetometer data (unit: μT)
•IMU hand orientation
•IMU chest temperature (°C)
•IMU chest 3D-acceleration data (unit: ms⁻², 16g scale, 13-bit resolution)
•IMU chest 3D-gyroscope data (unit: rad/s)
•IMU chest 3D-magnetometer data (unit: μT)
•IMU chest orientation
•IMU ankle temperature (°C)
•IMU ankle 3D-acceleration data (unit: ms⁻², 16g scale, 13-bit resolution)
•IMU ankle 3D-gyroscope data (unit: rad/s)
•IMU ankle 3D-magnetometer data (unit: μT)
•IMU ankle orientation
Every .csv data file that contains these features after cleansing and
preprocessing was then split into n data files using EmEditor, and these n
data files were moved to the Hadoop Distributed File System environment for
analysis, as illustrated in the following section.
7.7 Big Data Analysis and Processing
The selected features from multiple sensors are sent to the Cloudera [60]
virtual machine, a Big Data processing platform, for analysis and processing
of the physical activity monitoring for healthy living. At the Big Data
server, the input data is fed to Hadoop HDFS and processed in parallel using
Hadoop MapReduce. Further, the input data is split into training and testing
subsets, with 70% of the input data used for training and the remaining 30%
for testing. Apache Mahout is used for the implementation of machine learning
techniques, which perform the analysis of the input data. In our proposed
approach, we have used Naive Bayes classification to classify the different
physical activities, which are further categorised into active and inactive
cases. The classification results are explained in the following section.
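The 70/30 split followed by Naive Bayes classification can be illustrated on a
single machine. The sketch below is a small Gaussian Naive Bayes written from
scratch; it is a stand-in for illustration and not Mahout's implementation,
which runs as distributed jobs. The two normalized features and all sample
values are hypothetical.

```python
import math
import random

def split(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and split them into training and testing subsets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def fit(rows):
    """Estimate the prior, feature means and variances for each class."""
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append(features)
    model = {}
    for label, samples in by_label.items():
        n = len(samples)
        columns = list(zip(*samples))
        means = [sum(col) / n for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(rows), means, variances)
    return model

def predict(model, features):
    """Pick the class with the highest log posterior under the model."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, var in zip(features, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

# Hypothetical normalized (heart rate, hand acceleration) readings.
rows = ([([0.10 + 0.01 * i, 0.05 + 0.01 * i], "inactive") for i in range(10)]
        + [([0.80 + 0.01 * i, 0.85 + 0.01 * i], "active") for i in range(10)])
train, test = split(rows)
model = fit(train)
accuracy = sum(predict(model, f) == y for f, y in test) / len(test)
print(len(train), len(test), accuracy)
```

Because the two toy classes are far apart, every held-out row is classified
correctly here; real sensor data overlaps far more.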
7.8 Experimental Results and Analysis
Experiments were performed on 272108 records of the PAMAP2 [59] database with
17 descriptors. The results of the classifier are shown in Fig. 10. The
performance of the classifier is evaluated using the generated confusion
matrix. A confusion matrix is a tabular representation of a classifier's
performance based on the correctly and incorrectly predicted inactive and
active cases, as shown in Fig. 10.
Fig. 10. Experimental Results for Physical Activity monitoring
Accuracy, sensitivity and specificity are used to measure the performance of
the classifier, based on the confusion matrix shown in Fig. 11.
Fig. 11. Confusion Matrix for Performance Evaluation
Accuracy: The accuracy of a test is its ability to differentiate the inactive
and active cases correctly. In other words, it is the percentage of correct
predictions. On the basis of the confusion matrix, it is calculated using
equation 1.
\[
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}
\]
where
True positive (TP) = the number of instances correctly identified as inactive
cases
False positive (FP) = the number of instances incorrectly identified as
inactive cases
True negative (TN) = the number of instances correctly identified as active
cases
False negative (FN) = the number of instances incorrectly identified as
active cases
The confusion matrix shown in Fig. 10 gives the accuracy of the Naive Bayes
classifier as 99.966%.
Sensitivity: is the ability of a test to correctly identify those with
inactive physical activities. Sensitivity is measured using equation 2.
\[
\begin{aligned}
\mathrm{Sensitivity} &= \frac{TP}{TP + FN} \\
\mathrm{Sensitivity} &= \left[\frac{31977}{31977 + 0}\right] \times 100 \\
\mathrm{Sensitivity} &= 100
\end{aligned}
\tag{2}
\]
A sensitivity of 100% indicates that all inactive cases were correctly
identified as inactive.
Specificity: is the ability of the test to correctly identify those with
active physical activities. Specificity is measured using equation 3.
\[
\begin{aligned}
\mathrm{Specificity} &= \frac{TN}{TN + FP} \\
\mathrm{Specificity} &= \left[\frac{240041}{240041 + 90}\right] \times 100 \\
\mathrm{Specificity} &\approx 99.97
\end{aligned}
\tag{3}
\]
A specificity of 99.97% indicates that most active cases were correctly
identified as active.
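The three measures can be recomputed from the confusion-matrix counts with a
few lines of code. The counts below are read from equations 2 and 3 and from
the data cleansing section, taking the number of correctly identified active
cases as 240041 and treating inactive as the positive class; both readings are
assumptions. The last digit of a recomputed value can differ from the figures
quoted in the text depending on how they were rounded.

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity as percentages."""
    return {
        "accuracy": 100 * (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
    }

# Inactive is the positive class, active the negative class.
scores = metrics(tp=31977, tn=240041, fp=90, fn=0)
for name, value in scores.items():
    print(f"{name}: {value:.4f}%")
```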
From the accuracy, sensitivity and specificity it is clear that the Naive
Bayes classifier achieves an accuracy of 99.966% and a specificity of 99.97%,
which indicates that almost all active cases were recognized as active; the
model also correctly predicted the inactive cases, with a sensitivity of
100%. The proposed model can be tested on other healthcare applications, as it
has proven to be highly accurate, sensitive and specific on this dataset.
8 Conclusion
This chapter presents a brief review of data science techniques and
algorithms that are highly significant for processing Big data in the
healthcare industry and overcoming its challenges. Smart healthcare
applications focus on applying technological solutions from data science and
machine learning to deal with the obstacles faced by remote patient
monitoring. The technological breakthroughs in the fields of the Internet of
Things and telecommunication have enabled us to propose a case study for smart
healthcare, which facilitates moving towards a Smart Planet. The data
collected by the IoT devices, via biomedical sensors connected to the human
body, is analyzed, and by applying machine learning algorithms, human activity
is predicted with very good accuracy.
References
[1] Syed L., Jabeen S., Manimala S. (2018) Telemammography: A Novel Approach
for Early Detection of Breast Cancer Through Wavelets Based Image Processing
and Machine Learning Techniques. In: Hassanien A., Oliva D. (eds) Advances in
Soft Computing and Machine Learning in Image Processing. Studies in Computa-
tional Intelligence, vol 730. Springer, Cham
[2] Telemedicine - Remote Patient Monitoring Systems. (n.d.).
http://www.aeris.com/for-enterprises/healthcare-remote-patient-monitoring.
Accessed Online 20 Dec 2017.
[3] Facing the tidal wave: De-risking pharma and creating value for patients (2016)
Deloitte Centre for Health Solutions.
[4] World Industry Outlook, Healthcare and Pharmaceuticals, The Economist
Intelligence Unit (2016), citing the International Diabetes Federation.
[5] 10 Countries that Spend the Most on Healthcare
http://hitconsultant.net/2016/04/01/10-countries-spend-healthcare/. Accessed
22 Dec 2017.
[6] SAP HANA Platform for Healthcare: Bringing the World Closer to Real-Time
Personalized Medicine https://blogs.saphana.com/2013/10/15/sap-hana-for-
healthcare-bringing-the-world-closer-to-real-time-personalized-medicine. Accessed
20 Dec 2017.
[7] Chris Eaton, Dirk Deroos, Tom Deutsch, George Lapis, and
Paul Zikopoulos, Understanding Big Data. McGraw-Hill Companies.
http://public.dhe.ibm.com/common/ssi/ecm/en/iml14296usen/IML14296USEN.pdf.
Accessed 22 Dec 2017.
[8] Kevin Benedict, Moneyball (March 2012) Big Data, The Internet of Things and
Enterprise Mobility. http://cloudcomputing.syscon.com/node/2181866. Accessed
24 Dec 2017.
[9] Christian Bizer, Peter Boncz, Michael L. Brodie, and Orri Erling (2012) The
meaningful use of big data: four perspectives, four challenges. SIGMOD Rec., vol.
40, no. 4, pp. 56-60. http://doi.acm.org/10.1145/2094114.2094129
[10] Zanella A, Bui N, Castellani A, Vangelista L, Zorzi M. (2014) Internet of things
for smart cities. IEEE Internet Things J. 1(1):22-32.
[11] Zaslavsky, A., Perera, C., Georgakopoulos, D. (2013) Sensing as a service and
big data. arXiv preprint arXiv:1301.0159.
[12] James Manyika et al. (2011) Big data: The next frontier for innovation, com-
petition, and productivity. McKinsey Global Institute Tech. rep.
[13] D.A. Reed, D.B. Gannon, and J.R. Larus (2012) Imagining the Future:
Thoughts on Computing. Computer, vol. 45, no. 1, pp. 25-30.
[14] Bauer H, Patel M, Veira J. (2016) The Internet of Things: siz-
ing up the opportunity [Internet]. New York (NY): McKinsey & Com-
pany. http://www.mckinsey.com/industries/high-tech/our-insights/the-internet-
of-things-sizing-up-the-opportunity. Accessed 24 Dec 2017.
[15] Ms. Lily Chianglin. Big Data Analytic for Smart Health Care Technology
https://www.itri.org.tw/eng/Content/MSGPic/contents.aspx?&SiteID=1&MmmID
=617751562433643461&MSID=744304106227526127. Accessed 24 Dec 2017.
[16] Smart Healthcare Solutions for Smart Cities.
http://www.smartcity.press/smart-healthcare-for-smart-cities/. Accessed 25
Dec 2017.
[17] Suciu, G., Suciu, V., Martian, A., Craciunescu, R., Vulpe, A., Marcu, I., Fratu,
O. (2015) Big data, internet of things and cloud convergence: an architecture for
secure e-health applications. Journal of medical systems, 39(11), 141.
[18] Kahn, E. (2014) Natural language processing, big data, bioinformatics and
biology. Int. J. Biol. Biomed. Eng. 8:107-117.
[19] Dimitrov, D. V. (2016) Medical internet of things and big data in healthcare.
Healthcare informatics research, 22(3), 156-163.
[20] Tung CE, Su D, Turakhia MP, Lansberg MG. (2015) Diagnostic yield of ex-
tended cardiac patch monitoring in patients with stroke or TIA. Front Neurol,
5:266.
[21] Famm K, Litt B, Tracey KJ, Boyden ES, Slaoui M. (2013) Drug discovery: a
jump-start for electroceuticals. Nature, 496(7444):159-61.
[22] Cuba-Gyllensten I, Gastelurrutia P, Riistama J, Aarts R, Nunez J, Lupon J,
et al. (2014) A novel wearable vest for tracking pulmonary congestion in acutely
decompensated heart failure. Int J Cardiol, 177(1):199-201.
[23] Senior M. (2014) Novartis signs up for Google smart lens. Nat Biotechnol,
32(9):856.
[24] Hoda Ahmed Galal Elsayed, Mariam Ahmed Galal, Liyakathunisa Syed
(2017) HeartCare+: A Smart Heart Care Mobile Application for Framingham-
Based Early Risk Prediction of Hard Coronary Heart Diseases in Middle
East. Mobile Information Systems, vol. 2017, Article ID 9369532, 11 pages,
doi:10.1155/2017/9369532.
[25] MyDario.com. (2016) Burlington (MA): MyDario. com. http://mydario.com/
Accessed 24 Dec 2017.
[26] SleepBot (2013) New York (NY): SleepBot. https://mysleepbot.com/ Accessed
23 Dec 2017.
[27] Your trusted source for health apps and devices reviewed by medical experts.
RANKED Health. http://www.rankedhealth.com/about/ Accessed 24 Dec 2017.
[28] Zhang, X. M., Zhang, N. (2011) An open, secure and flexible platform based on
internet of things and cloud computing for ambient aiding living and telemedicine.
In International Conference on Computer and Management (CAMAN) (pp. 1-4).
[29] Douglas A. Perednia, Ace Allen (1995) Telemedicine Technology and
Clinical Applications. JAMA, 273(6):483-488.
doi:10.1001/jama.1995.03520300057037.
[30] S. S. T. Ahmed, K. Thanuja, N. S. Guptha, S. Narasimha (2016) Telemedicine
approach for remote patient monitoring system using smart phones with an eco-
nomical hardware kit. In International Conference on Computing Technologies
and Intelligent Data Engineering (ICCTIDE’16), pp. 1-4.
[31] H. Anpeng, C. Chao, B. Kaigui, D. Xiaohui, C. Min, G. Hongqiao, et al. (2014)
WE-CARE: An Intelligent Mobile Telecardiology System to Enable mHealth Ap-
plications. IEEE Journal of Biomedical and Health Informatics, vol. 18, pp. 693-
702.
[32] Prodhan, U. K., Rahman, M. Z., Jahan, et al. (2017) Development of a portable
telemedicine tool for remote diagnosis of telemedicine application. In International
Conference on Computing Communication and Automation (ICCCA-2017).
[33] Ackerman, M., Craft, R., Ferrante, F., Kratz, M., et al. (2002) Chapter 6:
telemedicine technology. Telemedicine Journal and e-health, 8(1), 71-78.
[34] Dimitrov, D. V. (2016) Medical internet of things and big data in healthcare.
Healthcare informatics research, 22(3), 156-163.
[35] https://upxacademy.com/big-data-analysis-top-5-challenges/ Accessed 26 Dec
2017.
[36] https://healthitanalytics.com/news/top-10-challenges-of-big-data-analytics-in-
healthcare Accessed 26 Dec 2017.
[37] Snell E. (2015) Hacking Still Leading Cause of 2015 Health Data Breaches.
http://healthitsecurity.com.
[38] Filkins, B. L., Kim, J. Y., Roberts, B., et al. (2016) Privacy and security in the
era of digital health: what should translational researchers know and do about it.
American journal of translational research, 8(3), 1560.
[39] Chen L.M. (2015) Overview of Basic Methods for Data Science. In: Mathemat-
ical Problems in Data Science. Springer, Cham
[40] Foto Afrati, Jeffrey Ullman (2009) Optimizing Joins in a Map-Reduce
Environment. Technical Report. Stanford InfoLab.
[41] J. Han, M. Kamber (2001) Data Mining: Concepts and Techniques. Morgan
Kaufmann, San Francisco.
[42] T. Kanungo, D.M. Mount, N. Netanyahu, et al. (2004) A local search
approximation algorithm for k-means clustering. Computational Geometry: Theory
and Applications, Special issue on the 18th Annual Symposium on Computational
Geometry, vol. 28, no. 2-3, pp. 89-112.
[43] www.kdnuggets.com/2016/09/poll-algorithms-used-data-scientists.html. Ac-
cessed 26 Dec 2017.
[44] https://analyticsindiamag.com/10-machine-learning-algorithms-every-data-
scientist-know/ Accessed 26 Dec 2017.
[45] www.kdnuggets.com/2015/05/top-10-data-mining-algorithms-explained.html.
Accessed 26 Dec 2017.
[46] Chandola, V.; Banerjee, A.; Kumar, V. (2009) Anomaly detection: A survey.
ACM Computing Surveys. 41 (3): 1-58. doi:10.1145/1541880.1541882
[47] https://www.forbes.com/sites/louiscolumbus/2017/12/24/53-of-companies-
are-adopting-big-data-analytics/#50bf384239a1. Accessed 26 Dec 2017.
[48] http://www.vcloudnews.com/every-day-big-data-statistics-2-5-quintillion-
bytes-of-data-created-daily/ Accessed 28 Dec 2017.
[49] https://data-flair.training/blogs/big-data-applications-various-domains/
Accessed 26 Dec 2017.
[50] https://www.datasciencecentral.com/profiles/blogs/the-hadoop-ecosystem-
hdfs-yarn-hive-pig-hbase-and-growing. Accessed 28 Dec 2017.
[51] https://www.linkedin.com/pulse/enabling-healthcare-analytics-raycare-
navdeep-singh-gill. Accessed 26 Dec 2017.
[52] David Niewolny. How the Internet of Things Is Revolutionizing Healthcare.
White paper. https://www.nxp.com/docs/en/white-paper/IOTREVHEALCARWP.pdf.
Accessed 28 Dec 2017.
[53] How wearable heart-rate monitors work, and which is best for you.
https://arstechnica.com/gadgets/2017/04/how-wearable-heart-rate-monitors-
work-and-which-is-best-for-you/ Accessed 28 Dec 2017.
[54] What Happened to the Smart Contact Lens for Diabetics.
https://labiotech.eu/contact-lens-glucose-diabetes/ Accessed 28 Dec 2017.
[55] Wearable sensors to monitor triggers for asthma, and more (2015)
https://www.nsf.gov/news/special_reports/science_nation/wearablenano.jsp.
Accessed 28 Dec 2017.
[56] Integrated Wearable Technology. http://www.gpssmartsole.com/gps-smart-
sole.php. Accessed 28 Dec 2017.
[57] Eduardo Freitas, Amândio Azevedo (2016) Wireless Biomedical Sensor
Networks: The Technology. Proceedings of the 2nd World Congress on Electrical
Engineering and Computer Systems and Science (EECSS'16).
[58] Patil, K.K., Ahmed, S.T. (2014) Digital Telemammography Services for Rural
India, Software Components and Design Protocol, IEEE International Conference
on Advances in Electronics, Computers and Communication.
[59] A. Reiss, D. Stricker (2012) Introducing a New Benchmarked Dataset for Activ-
ity Monitoring. The 16th IEEE International Symposium on Wearable Computers
(ISWC).
[60] Cloudera: Machine Learning—Analytics—Cloud. https://www.cloudera.com/
Accessed 15 Nov 2017.
[61] https://www.linkedin.com/pulse/smart-living-we-might-live-artificial-
intelligence-iot-karl-smith. Accessed 15 Dec 2017.
[62] https://blog.sqlauthority.com/2013/10/02/big-data-what-is-big-data-3-vs-of-
big-data-volume-velocity-and-variety-day-2-of-21/ Accessed 23 Dec 2017.