A Novel Deep Learning Approach to Improving Heart Disease Diagnosis
Nathalie-Sofia Tomov
Abstract
Heart disease is the leading cause of death, and experts estimate that approximately half of all heart attacks and strokes occur in people who have not been flagged as 'at risk.' Thus, there is an urgent need to improve the accuracy of heart disease diagnosis. To this end, we investigate the potential of using data analysis, and in particular the design and use of deep neural networks (DNNs), for detecting heart disease based on routine clinical data. Our main contribution is the design, evaluation, and optimization of DNN architectures of increasing depth for heart disease diagnosis. This work led to the discovery of a novel five-layer DNN architecture, named Heart Evaluation for Algorithmic Risk-reduction and Optimization Five (HEARO-5), that yields the best prediction accuracy. HEARO-5's design employs regularization optimization and automatically deals with missing data and/or data outliers. To evaluate and tune the architectures we use k-way cross-validation as well as the Matthews correlation coefficient (MCC) to measure the quality of the classifications. The study is performed on two datasets: 1) the publicly available Cleveland dataset of medical information, and 2) a dataset of echocardiogram reports from the University of Tennessee Sports Medicine practice. The HEARO-5 architecture, yielding 99% accuracy on both datasets (with 0.98 and 0.96 MCC, respectively), significantly outperforms currently published research in the area.
Keywords: machine learning, DNN, cardiology, translational medicine, cardiovascular disease,
hyperparameter optimization
Preprint submitted to Elsevier June 23, 2021
1. Introduction
Heart disease is the leading cause of death worldwide, killing twenty million people per year (WHO) [26]. An accurate and early diagnosis could be the difference between life and death for people with heart disease. However, doctors misdiagnose nearly 1/3 of patients as not having heart disease (British Medical Bulletin) [6], causing these patients to miss out on potentially life-saving treatment. This is a problem of increasing concern, as the number of Americans with heart failure is expected to increase by 46 percent by 2030 (American Heart Association) [11]. What makes diagnosing heart disease a challenging endeavor for any physician is that while chest pain and fatigue are common symptoms of atherosclerosis, as many as 50 percent of people lack any symptoms of heart disease until their first heart attack (Centers for Disease Control) [4]. Discovering biomarkers for heart disease, i.e., measurable indicators of the severity or presence of the disease, is preferred [28], but in many cases there are no clear biomarkers, and multiple tests may be required and analyzed together. Most doctors use the guidelines recommended by the American Heart Association (AHA) [9], which test eight widely recognized risk factors such as hypertension, cholesterol, smoking, and diabetes [8, 10]. However, this risk assessment model is flawed since it rests on an assumed linear relationship between each risk factor and heart disease outcome, while the relationships are complex, with non-linear interactions [25]. Oversimplification may cause doctors to make errors in their predictions, or to overlook important factors that could determine whether or not a patient receives treatment. Doctors must also know how to interpret the diagnostic implications of medical tests, which vary between patients and require tremendous expertise.
The use of Machine Learning (ML) data analysis techniques can alleviate the need for human expertise and the possibility of human error while increasing prediction accuracy [12]. ML algorithms apply flexible prediction models based on learned relationships between variables in the input dataset. This can prevent the oversimplification of fixed diagnosis models such as the AHA guidelines. In fact, a neural network algorithm with 76 percent accuracy has been shown to correctly predict 7.6% more events than the AHA method [21]. Here, we significantly improve on these already very promising ML results by designing and tuning deep neural network (DNN) architectures of increasing depth for detecting heart disease based on routine clinical data. We show that a flexible design and the subsequent tuning of the (many) hyperparameters of a DNN can yield up to 99% accuracy. The results were evaluated and validated using k-way cross-validation as well as the Matthews correlation coefficient (MCC) to measure the quality of the classifications. The best results were obtained on a novel five-layer DNN, named Heart Evaluation for Algorithmic Risk-reduction and Optimization Five (HEARO-5), that employs regularization optimization and automatically deals with missing data and/or data outliers. HEARO-5's accuracy is 99% with an MCC of 0.98, which significantly outperforms currently published research in the area, further establishing the appeal of using ML data analysis in diagnostic medicine.
2. Literature Review
There are a number of research papers that use artificial neural networks to improve heart disease diagnosis. Yu et al. concluded that a neural network topology with two hidden layers was an accurate model, with 94% test data accuracy [27]. They focus on the multiplicity of risk factors in constructing their model to classify features before determining a possible diagnosis. This study concluded that neural networks are an effective method of analyzing cases when it is impossible to create a strict mathematical model but where there is a sufficiently representative set of samples. Vinodhini et al. build on this research by performing feature classification with statistical models such as the chi-square test and then using a neural network as a predictive model [24]. This method proved successful overall, but exhibited weaker performance when given redundant attributes. Loh et al. demonstrate the accuracy of deep neural networks by proving their ability to learn from nonlinear relationships in data [16]. However, they faced the issue of overfitting, when an algorithm learns too much from training data and becomes less capable of applying itself to unfamiliar data. Kim et al. help address the problem of overfitting by ranking features, training the neural network with each feature ranking, and then training the neural network to output a potential diagnosis [14]. This helps ensure the network learns from numerically weighted important relationships in training data which can also be applied to unfamiliar data.
3. Contribution to the Field
While the current algorithms are effective, there is always a compelling need for improved algorithms that diagnose heart disease more accurately using accessible tests. To this end, the work described here makes the following main contributions:

• The design of a unique and flexible heart disease diagnosis tool based on a variable-layer DNN with regularization optimization that solely uses routine clinical data;

• Development of HEARO-5, a specialized 5-layer DNN architecture for detecting heart disease, based on the evaluation and tuning of hyperparameters, that is of very high accuracy (99% and 0.98 MCC), significantly outperforming currently published research in the field;

• The HEARO framework as a DNN data analytics research tool in diagnostic medicine and HEARO-5 as a benchmark, making them available for comparison and further studies, facilitating openness and research on the use of DNN techniques in medicine.
4. DNN Background, Design, and Implementation
4.1. Overview
The heart disease diagnostic tool that we designed uses standard fully-connected NNs. To construct the DNN architectures for heart disease diagnosis, one can use one of the many currently available frameworks, including TensorFlow [29], Keras, PaddlePaddle, Caffe, MagmaDNN [31], etc. The most compute-intensive building block of the DNN is the matrix-matrix multiplication (GEMM), which is available through highly optimized math libraries like cuBLAS, cuDNN, MKL, MAGMA [34], and others. As the data that we have and need for training is not that large (see Section 5.1), we developed a parametrized DNN in Python, using NumPy as a backend for the linear algebra routines needed. The code is vectorized for performance and expressed in terms of matrix-matrix multiplications, and therefore can be easily ported to C/C++ code calling highly optimized BLAS/GEMM implementations, which provides functional and performance portability across various computing architectures [31]. Our design and implementation is inspired by Andrew Ng's DNN designs and courses on deep learning, available on Coursera [30].
[Figure 1 diagram: training data matrix X (size 13 x N) enters the input layer (13 nodes), which feeds hidden layers 1 through L-1 (with n_1, ..., n_{L-1} nodes) and output layer L (1 node), producing outputs Y (size 1 x N). Forward propagation, steps 0..L: A_0 = X; Z_i = W_i A_{i-1} + b_i; A_i = σ_i(Z_i). Backward propagation, steps L+1..2L: dZ_L = A_L - Y; dW_i = dZ_i A_{i-1}^T / N; db_i = np.sum(dZ_i, axis=1, keepdims=True) / N; dZ_i = W_{i+1}^T dZ_{i+1} .* σ'_i(Z_i).]

Figure 1: Parametrized DNN architecture and the main computational steps for its training. Training data X consists of 13 features (routine clinical data per patient; can also be parametrized) and N training examples. Weights W, b are trained using a batch stochastic gradient descent method to make the predictions A_L "match" the given outcomes Y.
4.2. Flexible DNN design

The framework design, notations, and main computational steps that we investigate are illustrated in Figure 1. As shown, the neural network is organized into L fully-connected 'layers' (i = 1, ..., L) with n_i nodes (or artificial neurons) per layer that function together to make a prediction. The connections between layers i-1 and i are represented by numerical weights, stored in a matrix W_i of size n_i x n_{i-1} and a vector b_i of length n_i. Thus, if the input values for layer i, given by the values at the n_{i-1} nodes of layer i-1, are represented as a vector a_{i-1} of size n_{i-1}, the output of layer i will be a vector of size n_i, given by the matrix-vector product W_i a_{i-1} + b_i. As training is done in parallel for a batch of nb vectors, the inputs a_{i-1} will be matrices A_{i-1} of size n_{i-1} x nb, and the outputs will be given by the matrix-matrix products Z_i = W_i A_{i-1} + b_i, where "+" adds b_i to each of the nb columns of the resulting matrix.
4.3. Main DNN building blocks

The forward propagation process, given by steps 0, ..., L, represents a non-linear hypothesis/prediction function H_{W,b}(X) = A_L for given inputs X and fixed weights W, b. The weights must be modified so that the predictions H_{W,b}(X) become close to given/known outcomes stored in Y. This is known as a classification problem and is a case of so-called supervised learning. The modification of the weights is defined as a minimization problem on a convex cost function J, e.g.,

$$\min_{W,b} J(W,b), \quad \text{where} \quad J(W,b) = -\frac{1}{N}\sum_{i=1}^{N}\left[\, y_i \log H_{W,b}(x_i) + (1-y_i)\log\big(1-H_{W,b}(x_i)\big)\right].$$

This is solved by a batch stochastic gradient descent method, an iterative algorithm using a batch of nb training examples at a time. The derivatives of J with respect to the weights (W and b) are derived over the layers using the chain rule for differentiating compositions of functions. They are then computed by the backward propagation steps L+1, ..., 2L, and used to modify their respective weights W_i, b_i during the iterative training process for each layer i as:

$$W_i = W_i - \lambda\, dW_i, \qquad b_i = b_i - \lambda\, db_i,$$

where λ is a hyperparameter referred to as the learning rate. The σ_1, ..., σ_L functions are the activation functions (possibly different) for the different layers of the network, and σ'_1, ..., σ'_L are their derivatives. We have coded activation function choices for ReLU, sigmoid, tanh, and leaky ReLU. The ".*" notation is for point-wise multiplication.
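To make these steps concrete, the following NumPy sketch implements the forward and backward passes exactly as written above. The helper names (forward, backward, relu_prime, etc.) are ours for illustration and are not taken from the released HEARO code; only ReLU and sigmoid are shown.

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def relu(Z):
    return np.maximum(0.0, Z)

def relu_prime(Z):
    return (Z > 0).astype(Z.dtype)

def forward(X, W, b, sigmas):
    """Steps 0..L: A0 = X; Zi = Wi @ A(i-1) + bi; Ai = sigma_i(Zi)."""
    A = X
    cache = [(None, X)]                      # cache[i] = (Zi, Ai); cache[0] holds A0
    for Wi, bi, sig in zip(W, b, sigmas):
        Z = Wi @ A + bi                      # bi broadcasts over the nb columns
        A = sig(Z)
        cache.append((Z, A))
    return A, cache

def backward(Y, W, cache, sigma_primes):
    """Steps L+1..2L: dZL = AL - Y, then the chain rule back through the layers."""
    N = Y.shape[1]
    L = len(W)
    grads = [None] * L
    dZ = cache[L][1] - Y                                      # dZL = AL - Y
    for i in range(L - 1, -1, -1):                            # layer i+1 in the text's numbering
        A_prev = cache[i][1]
        grads[i] = (dZ @ A_prev.T / N,                        # dWi
                    np.sum(dZ, axis=1, keepdims=True) / N)    # dbi
        if i > 0:
            dZ = (W[i].T @ dZ) * sigma_primes[i - 1](cache[i][0])  # .* sigma'_i(Zi)
    return grads

# Example: a 2-layer network on random data (13 features, 100 examples).
rng = np.random.default_rng(0)
X = rng.standard_normal((13, 100))
Y = rng.integers(0, 2, (1, 100)).astype(float)
W = [rng.standard_normal((9, 13)) * 0.01, rng.standard_normal((1, 9)) * 0.01]
b = [np.zeros((9, 1)), np.zeros((1, 1))]
AL, cache = forward(X, W, b, [relu, sigmoid])
grads = backward(Y, W, cache, [relu_prime])
```

Note that dZ_L = A_L - Y holds for a sigmoid output layer paired with the cross-entropy cost J above, which is why no output-layer derivative is needed.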
4.4. Algorithmic Optimization: Regularization

Regularization is a standard technique that prevents overfitting by penalizing large weight values. DNNs tend to assign higher weight values for certain training data points, which corresponds to a high variance. Regularization helps address the problem of high variance on training data, which can improve accuracy on test data. Regularization is typically done by adding a penalty term of the form (α / 2N) ||W, b||^2 to the cost function J, where ||W, b|| is some norm of the weights, e.g., L1 or L2. The regularization parameter α imposes a penalty on large weights, thereby ensuring that we do not overfit the training data. Another advantage of regularization is that it can prevent an algorithm from learning from data outliers, which is essential for a smaller dataset such as the heart disease patient set used in this research. Regularization leaves the outliers in the dataset, but reduces the algorithm's likelihood of learning from these values. Therefore, we add regularization to our model to investigate possible improvements in accuracy by reducing overfitting and automatically decreasing the impact of any outliers.
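As a minimal sketch, assuming the L2 variant of the penalty with parameter α (the function name cost_with_l2 is ours for illustration), the penalty and its matching gradient contribution look as follows:

```python
import numpy as np

def cost_with_l2(AL, Y, W, alpha):
    """Cross-entropy cost J plus the L2 penalty alpha/(2N) * sum_i ||Wi||^2."""
    N = Y.shape[1]
    cross_entropy = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / N
    penalty = alpha / (2 * N) * sum(np.sum(Wi ** 2) for Wi in W)
    return cross_entropy + penalty

# The matching change on the backward side: each weight gradient picks up
# the derivative of the penalty, i.e., dWi = dWi + (alpha / N) * Wi.
```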
4.5. Hyperparameter Optimization
Part of the challenge of coding a neural network is structuring it so it is both accurate and efficient. One needs to determine how many layers to use, how many nodes per layer to use, etc. This was critical, for example, in deep convolutional networks for image recognition, where significant improvement over the prior-art configurations was achieved by pushing the depth to 16-19 weight layers, e.g., as in the popular VGG network [32]. Here also, we have determined that tuning the depth and the number of nodes per layer is critical for accuracy. Moreover, we parametrized our framework, referred to as HEARO further on, as given above and in Figure 1.

The parameters control the network configuration and accuracy, and therefore the network must be highly optimized/tuned over them. These configuration parameters are also called hyperparameters. A HEARO configuration is determined by the following list of hyperparameters:

HEARO_hparams = [L, n_1, ..., n_L, σ_1, ..., σ_L, λ, α, nb, epochs],    (1)

where epochs is the number of training iterations over the entire training set X, σ_i is the activation function for layer i (1, 2, 3, or 4 for ReLU, sigmoid, tanh, or leaky ReLU, respectively), and the rest are as given above.

Thus, given a list of hyperparameters HEARO_hparams, the HEARO framework trains itself (determining weights W, b) on a given input training data set X and specified outcomes Y, and the challenge now becomes how to select the "best" hyperparameters.
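For illustration, one possible structured encoding of the hyperparameter list (1) is sketched below; the dataclass and its field names are our assumptions, not the framework's actual interface. The example instance spells out the HEARO-5 configuration reported later in Section 5.4.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class HearoHparams:
    L: int              # number of layers
    n: List[int]        # nodes per layer, n[0]..n[L-1]
    sigma: List[int]    # activation per layer: 1=ReLU, 2=sigmoid, 3=tanh, 4=leaky ReLU
    lam: float          # learning rate (lambda)
    alpha: float        # regularization parameter
    nb: int             # batch size
    epochs: int         # passes over the training set X

# The HEARO-5 configuration of Section 5.4 in this encoding:
hearo5 = HearoHparams(L=5, n=[9, 7, 5, 3, 1], sigma=[1, 1, 1, 1, 2],
                      lam=0.01, alpha=0.7, nb=200, epochs=6000)
```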
5. Optimization Methodology and the HEARO-5 Architecture
5.1. Information About Dataset 1
HEARO uses training and test data from the University of California Irvine machine learning repository. The data have been preprocessed: missing values are replaced with the value -1 to prevent them from significantly impacting the algorithm's model (there were approximately 12 missing values in total), and feature scaling to unit length is applied. This dataset, provided by the Cleveland Clinic Foundation, contains 75 total attributes of patient medical information for 303 patients [7]. The following 11 attributes are used: 1) age, 2) sex, 3) chest pain type, 4) resting blood pressure, 5) cholesterol, 6) fasting blood sugar, 7) resting electrocardiographic results, 8) maximum heart rate achieved, 9) exercise-induced angina, 10) ST depression, and 11) slope of the peak exercise ST segment. These attributes have been selected as optimal features by other researchers using this dataset [1] because they are considered most closely linked to heart disease.
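A minimal sketch of the preprocessing described above, under two assumptions of ours: the data matrix X stores one feature per row (features x patients), and missing entries arrive as NaN.

```python
import numpy as np

def preprocess(X):
    """X: features x patients, with NaN marking missing entries."""
    X = np.where(np.isnan(X), -1.0, X)                 # missing values -> -1
    norms = np.linalg.norm(X, axis=1, keepdims=True)   # one norm per feature row
    return X / np.where(norms == 0, 1.0, norms)        # scale each feature to unit length
```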
Chest pain type is categorized by number, where 1 represents typical angina provoked by exercise or stress, and 2 represents atypical angina, which is persistent chest discomfort [13]. Common metrics such as resting blood pressure, cholesterol, and fasting blood sugar can be indicative of a patient's general health and the state of their blood vessels, which is often shaped by the accumulation of plaque as an indicator of developing heart disease. Electrocardiogram results are visual representations of the heart's activity, and can help doctors or algorithms determine if it is pumping at a normal rate or if circulation is impeded. An ST-T wave abnormality can be measured by wave height, and often has several implications: a ventricular aneurysm, coronary artery spasm, or artery tightness [5]. These are all indicators of heart failure, and are therefore important features for an algorithm diagnosing heart disease. The slope of the peak exercise ST segment is a similar way of visually assessing the heart's function when it must circulate more blood, as in the case of exercise [15]. In the dataset, this is characterized by the values 1, representing upward slope, 2, representing flat slope, and 3, representing downward slope.

This dataset is used because it is publicly accessible and therefore improves the reproducibility of results. HEARO uses these eleven features to diagnose heart disease because of their diversity, availability, and ability to identify heart disease at different stages of development. The combination of these features can create a model that accurately evaluates relationships between diverse patient conditions and heart disease diagnosis [19].
5.2. Information About Dataset 2
The second dataset was used for real-world clinical validation of the HEARO-5 framework as a platform that can be applied to a variety of data. This dataset, compiled as part of a pilot test partnership with the University of Tennessee, consists of 350 echocardiogram reports taken from current student athletes at the University of Tennessee (UT). The variables that were studied are as follows: age, race, sex, aortic root diameter, end-diastolic left ventricular posterior wall thickness (LVPWd), interventricular septal thickness (IVS), left ventricular outflow tract diameter (LVOT), left anterior descending coronary artery diameter (LAD), right ventricular inner diameter (RVID), ejection fraction (EF), and body surface area (BSA), in addition to whether each patient had high enough risk to require additional testing.

Outcomes are recorded as whether or not an abnormality was detected, what type of abnormality, if any, was observed, and whether the patient was flagged for further cardiac testing. The types of abnormalities include the following: right ventricular dilation, hypertrophic cardiomyopathy, atrial septal defect (specifically patent foramen ovale), and aortic regurgitation. This is accompanied by physicians' notes regarding the type of follow-up required, i.e., routine yearly check-up, stress test, or bubble test. Within the processed training dataset, abnormal outcomes with any follow-up recommendation are coded as '1' and normal outcomes are coded as '0'.

Most college athletic medical screenings involve echocardiograms. Therefore, this dataset is used as validation for the software's potential application within sports medicine programs as an automated decision support tool. Through a partnership with UT Sports Medicine, the presented software framework is being developed into an Application Programming Interface that integrates with the UT patient management platform to be used in a clinical setting.
5.3. Accuracy Evaluation
In addition to measuring the percent accuracy, we also use K-fold cross-validation to evaluate the accuracy. This is a standard technique for evaluating predictions more accurately, especially when the size of the training data set is small, as in our case. We also use it to flag possible cases of overfitting, which is often a threat when tuning extensively on a small data set. In order to maintain about a 2:1 training-to-test data ratio, we mostly use 3-fold cross-validation, where two parts are assigned for training and one for testing.
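A sketch of this 3-fold protocol, where train_and_eval is our stand-in for training a configuration and returning its test accuracy (not the framework's actual API):

```python
import numpy as np

def three_fold_accuracy(X, Y, train_and_eval, seed=0):
    """X: features x patients; Y: 1 x patients.
    train_and_eval(X_tr, Y_tr, X_te, Y_te) returns test accuracy."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(Y.shape[1])
    folds = np.array_split(idx, 3)
    accs = []
    for k in range(3):                       # each fold serves once as the test part
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(3) if j != k])
        accs.append(train_and_eval(X[:, train], Y[:, train], X[:, test], Y[:, test]))
    return float(np.mean(accs))
```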
Furthermore, we use the Matthews correlation coefficient (MCC) to analyze the algorithm's generalization abilities given a dataset with unbalanced class outcomes. In the Cleveland machine learning repository dataset, the class distribution of the two possible cases (0/1) is as follows: 164 '0' instances and 139 '1' instances. The MCC evaluates how well the algorithm performs on all possible data outcomes regardless of their ratio within the dataset. This adds further analysis to the potentially biased measure of percent accuracy. MCC is defined through the following formula:

$$MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}},$$

where TP represents true positives, TN true negatives, FP false positives, and FN false negatives.
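The MCC can be computed directly from the four confusion counts; a small sketch for binary 0/1 labels:

```python
import numpy as np

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 label arrays."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0
```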
5.4. Optimization Methodology
The HEARO framework gives a user the flexibility to easily run and compare different configurations based on the accuracy tests described in Section 5.3, which makes it a very good candidate for so-called empirical optimization/tuning [33]. This is a process where a large number of possible configurations (1) are generated and run on a given platform to discover the one that gives the best results.

The effectiveness of empirical optimization depends on the chosen parameters to optimize and the search heuristic used. A disadvantage is the time cost of searching for the best configuration variant, but in the case of this research this is not a problem, as the size of the data set is not that large. Furthermore, the search space is restricted by taking L = 2..10, n_L = 1, n_i = 1..13, λ ∈ {0.001, 0.01, 0.1}, L2 regularization with α ∈ {0, 0.7, 1}, nb = N, and epochs = 6000.

Testing the search space described above, using automated Python scripts running in parallel to randomly generate and run 10,000 different configurations, revealed that the following configuration:

HEARO-5 = [5, 9, 7, 5, 3, 1, 1, 1, 1, 1, 2, 0.01, 0.7, 200, 6000]

gives the best accuracy, as further illustrated next in the results section.
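A sketch of this random search over the restricted space, assuming a train_and_score routine that trains a configuration and returns its cross-validated accuracy (a stand-in name, not the released scripts):

```python
import random

def random_config(N):
    """Draw one configuration from the restricted space above."""
    L = random.randint(2, 10)
    n = [random.randint(1, 13) for _ in range(L - 1)] + [1]      # nL = 1
    sigma = [random.choice([1, 2, 3, 4]) for _ in range(L)]      # activation codes
    lam = random.choice([0.001, 0.01, 0.1])
    alpha = random.choice([0.0, 0.7, 1.0])
    return [L] + n + sigma + [lam, alpha, N, 6000]               # nb = N, epochs = 6000

# With train_and_score(cfg) available, the search is simply:
#   best = max((random_config(N) for _ in range(10_000)), key=train_and_score)
```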
6. Results
The empirical optimization within the search space described in Section 5.4 revealed that the HEARO-5 architecture yields the best accuracy. The percent accuracy of HEARO-5 with α = 0 (no regularization) is compared to the accuracy of configurations with 2 and 7 layers in Figure 2, Left. All graphs indicate an algorithm's performance on test data unless specified otherwise. Figure 2, Right and Figure 3, Left show the effect of the learning rate λ on HEARO-5 and a 7-layer HEARO architecture, respectively.

Figure 2: Left: Accuracy comparison of HEARO-5 vs. 2- and 7-layer networks. Right: Effect of λ in HEARO-5.

Figure 3: Left: Effect of λ on the 7-layer HEARO architecture. Right: Effect of regularization on HEARO-5 variants.

HEARO-5 with α = 0.7 exhibited 99% accuracy on test data and a Matthews correlation coefficient of 0.98. This is shown in Figure 3, Right, and further discussed in Section 6.3. On the clinical echocardiogram data gathered through pilot testing, this previously validated framework achieved 99% accuracy on test data and a Matthews correlation coefficient of 0.96.
6.1. Feature Ranking
Once the accuracy was obtained, each feature was successively eliminated from the training dataset to gain a sense of which variables are most important to the algorithm's learning procedure. A script was coded to remove one feature from the original dataset and run the HEARO-5 framework with this revised dataset containing all but one feature. By this method, the algorithm was run eleven times, each time excluding a different feature.

Figure 4: Left: Accuracy of the HEARO-5 framework on echocardiogram data. Right: Distribution of classification outcomes on echocardiogram data (TP = true positives; TN = true negatives; FP = false positives; FN = false negatives).

Figure 5: Left: Distribution of classification outcomes (TP = true positives; TN = true negatives; FP = false positives; FN = false negatives).
The results of this procedure show that the most important feature, resulting in the lowest accuracy when eliminated, was blood sugar (17 missed diagnoses after elimination). In order from most significant to least significant, the features are as follows, with the number of inaccurate results in parentheses: fasting blood sugar (17), chest pain type (13), ST depression induced by exercise (13), slope of peak ST segment (12), exercise-induced angina (11), age (11), sex (11), cholesterol (9), maximum heart rate achieved (8), resting blood pressure (5), resting ECG results (4).
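A sketch of this leave-one-feature-out procedure, with run_hearo5 standing in for training and evaluating the HEARO-5 configuration (a hypothetical name for the script's entry point):

```python
import numpy as np

def feature_ranking(X, Y, feature_names, run_hearo5):
    """X: features x patients. run_hearo5(X, Y) stands in for training and
    evaluating HEARO-5 and returning the number of missed diagnoses."""
    misses = {}
    for i, name in enumerate(feature_names):
        X_reduced = np.delete(X, i, axis=0)     # drop feature i (one row)
        misses[name] = run_hearo5(X_reduced, Y)
    # Most important first: removing it causes the most missed diagnoses.
    return sorted(misses.items(), key=lambda kv: kv[1], reverse=True)
```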
6.2. Comparison with Previously Published Results
Stanford researchers used a convolutional neural network and obtained precision, recall, and F1 scores of 0.80, 0.82, and 0.80, respectively [22]. HEARO-5 outperforms these results, as it achieves precision, recall, and F1 of 0.98, 1, and 0.99, respectively. In 2016, Aravinthan et al. applied a Naive Bayes classifier and an artificial neural network to this dataset with accuracies of 81.3% and 82.5%, respectively [1]. A study published in the International Journal of Computer Applications (Marikani) obtained results of 95.4% and 96.3% accuracy for classification tree and random forest algorithms [17].

It is also worth mentioning logistic regression, which is the least accurate algorithm, likely because it fits a linear model to fluctuating feature data whose correlation with heart disease is non-linear.

Furthermore, regarding accuracy, the K-fold cross-validation tests confirmed that HEARO-5 effectively reduces overfitting, as the cross-validated accuracy was approximately the same as the accuracy on test data with the set ratio.

The 0.98 MCC of HEARO-5 illustrates HEARO-5's accurate evaluation of all class outcomes. The Matthews correlation coefficient ranges from -1 to 1, where 1 represents perfectly balanced accuracy. Therefore, results of 0.98 MCC and 99% accuracy are indicative of the algorithm's comprehensive data analysis model, which is not skewed towards any particular outcome.

The results of clinical validation on echocardiogram data provide further evidence of the algorithm's accuracy and versatility. Because the algorithm continues to achieve 99% accuracy on a dataset containing different metrics, it has the potential to be flexibly applied as a platform for clinical decision support. The number of false negatives could be reduced with access to more data, as the learning procedure would benefit from a greater variety of patients with abnormalities. Because this dataset consists of young (18-24) athletes, heart abnormalities are less common. The dataset is being expanded continuously by UT Sports Medicine, thereby helping address this potential issue.
6.3. Effect of Regularization
While the unregularized DNN exhibits a discrepancy between training accuracy and test accuracy (99% on training, 93% on test), regularization increased the accuracy on test data to 99%. The regularization improved the accuracy on test data by reducing the impact of outliers (and/or missing data) in the training data. On a relatively small dataset, outliers can inhibit the algorithm's ability to learn from consistent relationships in training data, and do not add scientific value. Therefore, by regulating the effect of outliers on learning, regularization improves the algorithm's ability to generalize while maintaining the same scientific standard. Because regularization reduces overfitting on training data, the algorithm's accuracy is expected to decrease on training data. In the case of HEARO-5 with α = 0 (unregularized), a training accuracy of 99% with somewhat lower test accuracy is indicative of overfitting (see Figure 3, Right). The algorithm is learning from the noisy outlier values in the training data, detracting from its ability to generalize broader relationships in the data. Learning these relationships is vital to accuracy on an unfamiliar dataset.
7. Conclusions and Future Directions

This work investigated and showed the potential of using DNN-based data analysis for detecting heart disease based on routine clinical data. The results show that, enhanced with flexible designs and tuning, DNN data analysis techniques can yield very high accuracy (99% accuracy and 0.98 MCC), which significantly outperforms currently published research in the area, further establishing the appeal of using ML DNN data analysis in diagnostic medicine. Pending reviews and publication, we are preparing to release the HEARO software framework as open source, and HEARO-5 as a benchmark, making the software available for comparison and further facilitating openness and research on the use of DNN techniques in medicine.

While the current developments are mostly research with excellent proof-of-concept results, further research and development is necessary to turn them into a robust diagnostic tool, e.g., one that doctors consult and use routinely to make a more informed diagnosis. One potential drawback of the model is that it uses a fairly limited dataset, and therefore can only output a general heart disease risk evaluation. Future directions involve expanding the dataset so the algorithm can output more specific information on the exact type of heart abnormality. Thus, research is needed in the data analytics area and its intersection with data-based medical diagnosis, including automatic search for the best features, as well as possible feature expansion or feature reduction, e.g., due to lack of certain clinical data. Future directions also include extending this analysis to construct a more thorough model that includes heart visualizations and CT image data. More features can provide more data for the algorithm to learn from, creating a more complex model and ensuring a more accurate and detailed prediction. Another area of future research would involve using speed optimization tools and accelerated linear algebra backends such as MagmaDNN for GPUs to improve the algorithm's ability to process large amounts of data and find the best configurations in parallel. HEARO is currently being developed into a production-quality software package with a friendly user interface, e.g., to facilitate use by doctors or even patients directly.
8. Conflict of Interest Statement

Declarations of interest: none. No funding was received for this work.
9. References
[1] Aravinthan, K., Vanitha, M. "A comparative study on prediction of heart disease using cluster and risk based approach." International Journal of Advanced Research in Computer and Communication Engineering, Feb. 2016.
[2] Beant, Kaur. "Review on Heart Disease Prediction system using Data Mining Techniques." International Journal on Recent and Innovation Trends in Computing and Communication, Vol. 2(10), 2014.
[3] Ben-Hur, Asa. "Support Vector Machines and Kernels for Computational Biology." PLOS Computational Biology Journal, 31 Oct. 2008.
[4] "CDC: U.S. deaths from heart disease, cancer on the rise." American Heart Association News, 24 Aug. 2016.
[5] Davie, A.P. et al. "Value of the electrocardiogram in identifying heart failure due to left ventricular systolic dysfunction." British Medical Journal, Volume 312, Issue 7025.
[6] Davies, S.W. et al. "Clinical presentation and diagnosis of coronary artery disease: stable angina." British Medical Bulletin, Volume 59, Issue 1, Oct. 2001.
[7] Detrano, Robert. (1990). Heart Disease Data Set [processed.cleveland.data]. Retrieved from https://archive.ics.uci.edu/ml/datasets/Heart+Disease
[8] "FHS Research Policies." Framingham Heart Study: a project of the National Heart, Lung, and Blood Institute and Boston University.
[9] Fisher, Edward. "Coronary Artery Disease - Coronary Heart Disease." American Heart Association, 26 Apr. 2017.
[10] Goff, D.C., Lloyd-Jones, D.M., Bennett, G., Coady, et al. "2013 ACC/AHA Guideline on the Assessment of Cardiovascular Risk: A Report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines." National Institutes of Health.
[11] "Heart Disease Facts." American Heart Association 2015 Heart Disease and Stroke Update, compiled by AHA, CDC, NIH and other governmental sources.
[12] Hutson, Matthew. "Self-taught artificial intelligence beats doctors at predicting heart attacks." Science Magazine, 14 Apr. 2017.
[13] Kawachi, I., Sparrow, D., Vokonas, P.S. "Symptoms of anxiety and risk of coronary heart disease. The Normative Aging Study." Circulation Journal, Volume 90, Issue 5, 1 Nov. 1990.
[14] Kim, Jae et al. "Neural Network-Based Coronary Heart Disease Risk Prediction Using Feature Correlation Analysis." Journal of Healthcare Engineering, 6 Sept. 2017.
[15] Lepeschkin, Eugene. "The Measurement of the Q-T Interval of the Electrocardiogram." Circulation Journal, Volume 6, Issue 3, 1 Sept. 1952.
[16] Loh, Brian et al. "Deep learning for cardiac computer-aided diagnosis: benefits, issues & solutions." mHealth Journal, 19 Oct. 2017.
[17] Marikani, T., Shyamala, K. "Prediction of Heart Disease Using Supervised Learning Algorithms." International Journal of Computer Applications, Volume 165, May 2017.
[18] Mcnulty, Eileen. "Machine learning can make cardiology diagnoses better than doctors can." Dataconomy, 8 Jul. 2014.
[19] Ng, Kenney. "Using AI and science to predict heart failure." IBM Research, 5 Apr. 2017.
[20] Obermeyer, Z., Emanuel, E.J. "Predicting the Future: Big Data, Machine Learning, and Clinical Medicine." The New England Journal of Medicine, 2016. National Institutes of Health PubMed Library.
[21] Paschalidis, Yannis. "How machine learning is helping us predict heart disease and diabetes." Harvard Business Review, 30 May 2017.
[22] Rajpurkar, P., Hannun, A., et al. "Cardiologist-level Arrhythmia Detection with Convolutional Neural Networks." Stanford University Research Publications, 6 Jul. 2017.
[23] Chitra, R. et al. "Analysis of myocardial infarction risk factors in heart disease data set." Allied Academies Biology and Medicine Case Report, Volume 1, Issue 1, 3 Aug. 2017.
[24] Vinodhini, G. "A comparative performance evaluation of neural network based approach for sentiment classification of online reviews." Journal of King Saud University, Volume 28, Issue 1.
[25] Weng, Stephen. "Can machine-learning improve cardiovascular risk prediction using routine clinical data?" PLOS Journals, 4 Apr. 2017.
[26] World Health Organization. Global Status Report on Noncommunicable Diseases. Geneva, Switzerland: World Health Organization, 2014.
[27] Yu, Oleg. "Coronary heart disease diagnosis by artificial neural networks including genetic polymorphisms and clinical parameters." Journal of Cardiology, Volume 59, Issue 2.
[28] Johann, D. et al. "Clinical Proteomics and Biomarker Discovery." Annals of the New York Academy of Sciences, Volume 1022, Number 1, 12 Jan. 2006.
[29] Abadi, M. et al. "TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems." CoRR, Volume abs/1603.04467, 2016.
[30] Ng, Andrew. "Neural Networks and Deep Learning." Coursera. https://www.coursera.org/learn/neural-networks-deep-learning
[31] X
[32] Simonyan, K. et al. "Very Deep Convolutional Networks for Large-Scale Image Recognition." arXiv:1409.1556v6 [cs.CV], 10 Apr. 2015.
[33] Li, Yinan et al. "A Note on Auto-tuning GEMM for GPUs." Computational Science - ICCS 2009. Springer, Berlin, Heidelberg, 2009. https://doi.org/10.1007/978-3-642-01970-8_89
[34] Nath, Rajib et al. "An improved MAGMA GEMM for Fermi graphics processing units." The International Journal of High Performance Computing Applications, Vol. 24, Nov. 2010.
Article
We develop an algorithm which exceeds the performance of board certified cardiologists in detecting a wide range of heart arrhythmias from electrocardiograms recorded with a single-lead wearable monitor. We build a dataset with more than 500 times the number of unique patients than previously studied corpora. On this dataset, we train a 34-layer convolutional neural network which maps a sequence of ECG samples to a sequence of rhythm classes. Committees of board-certified cardiologists annotate a gold standard test set on which we compare the performance of our model to that of 6 other individual cardiologists. We exceed the average cardiologist performance in both recall (sensitivity) and precision (positive predictive value).