International Journal of Electrical and Computer Engineering (IJECE)
Vol. 5, No. 6, December 2015, pp. 1569~1576
ISSN: 2088-8708
Journal homepage: http://iaesjournal.com/online/index.php/IJECE
Comparing Performance of Data Mining Algorithms in
Prediction of Heart Diseases
Moloud Abdar¹, Sharareh R. Niakan Kalhori², Tole Sutikno³, Imam Much Ibnu Subroto⁴, Goli Arji⁵
¹ Department of Engineering, Damghan University, Iran
² Department of Health Information Management, Tehran University of Medical Sciences, Iran
³ Department of Electrical Engineering, Universitas Ahmad Dahlan, Yogyakarta, Indonesia
⁴ Department of Informatics Engineering, Universitas Islam Sultan Agung, Semarang, Indonesia
⁵ Health Information Management, Tehran University of Medical Sciences, Iran
Article Info
ABSTRACT
Article history:
Received Aug 4, 2015
Revised Oct 11, 2015
Accepted Oct 27, 2015
Heart diseases are among the nation's leading causes of mortality and morbidity. Data
mining techniques can predict the likelihood of patients developing heart disease. The
purpose of this study is to compare different data mining algorithms for predicting the
risk of heart diseases. After feature analysis, models based on four classification
algorithms (decision tree, neural network, support vector machine and k-nearest neighbor)
were developed and validated. The C5.0 decision tree built the model with the greatest
accuracy, 93.02%; KNN, SVM and the neural network reached 88.37%, 86.05% and 80.23%,
respectively. The results produced by the decision tree are easy to interpret and apply;
its rules can be readily understood by different clinical practitioners.
Keyword:
C5.0 Algorithm
Data Mining
Heart Disease
Neural Network
Copyright © 2015 Institute of Advanced Engineering and Science.
All rights reserved.
Corresponding Author:
Goli Arji,
Faculty of Allied Medical Sciences
Tehran University of Medical Sciences
Tehran, Iran
Email: Goliarji@ymail.com
1. INTRODUCTION
According to the latest statistics from the World Health Organization (WHO), heart diseases have
received a great deal of attention in medical research because of their impact on human health [1].
Cardiovascular disease is the number one cause of death in industrialized countries; it has a major impact
not only on individuals and their quality of life, but also on public health costs and national economies.
Diagnosing heart disease is one of the more costly decisions in medical diagnosis, and Artificial
Intelligence (AI) techniques have been widely used in this field. With the advancement of science, the
volume of data accumulated in various fields has grown so much that it is known as the information
explosion [2]. Analyzing the accumulated data can reveal useful information hidden in it. Data mining,
a relatively new science, makes it possible to extract this hidden knowledge. It reveals useful
relationships among the data, and the resulting rules can support sound decision making [3],[4].
Classification is one of the subdivisions of data mining, and it operates according to If-Then rules. Its
purpose is to predict a target variable from other features, known as predictors. Neural networks,
support vector machines and decision trees are different forms of classification algorithms [5-9]. The
purpose of this study is to compare different machine learning algorithms for the prediction of heart
diseases.
The remainder of this section summarises technical articles on the knowledge discovery in databases
(KDD) process and on data mining classification techniques applied to heart disease datasets:
Ram Bilas Pachori and his colleagues [10] studied the diagnosis of heart disease using the tunable-Q
wavelet transform (TQWT) applied to heart rate signals. Because manual data entry is error-prone and
time consuming, they recommend the TQWT method. Using the least squares support vector machine
(LS-SVM), they reported an accuracy of 96.8%, a sensitivity of 100%, and a specificity of 93.7%.
Another study, conducted by Yongqiang Lyu et al. [11], proposed an evaluation model of coronary
heart disease (CHD) based on data mining algorithms. It introduces a new dynamic model that makes
lifetime assessment possible and uses a linear time-invariant approach to assess CHD. The model results,
based on SYNTAX scores, indicate a possible error of 5%. The authors of [12] used the J4.8 decision tree
method and reported a precision of 84.1%.
In another study, Sumit Bhatia et al. [13] used a genetic algorithm together with SVM and SSVM to
classify cardiac patients. The features were selected by the genetic algorithm to give the SSVM the best
set of inputs. The precision obtained was 72.55%, whereas GA-SSVM improved the precision to 90.57%.
Peter C. Austin and colleagues [14] discuss heart failure in their paper. The physicians involved divided
the patients into two groups, "with" and "without" disease. The authors found that decision trees gave
better results than the regression model. Saba Bashir et al. [15] applied the MV5 algorithm, whose
precision was 88.52%.
Another study, by Jesmin Nahar et al. [16], looked for relationships between heart disease risk
factors in men and women. It notes that the risk of coronary heart disease is lower in women than in
men, and that both men and women can relieve chest pain through exercise. The paper identifies
"Rest ECG", in both its normal and hyper forms, and "Slope being flat" as risk factors. However, the
results indicate that for men Rest ECG is a risk factor only in its hyper form. The study concludes that
Rest ECG should be considered an important factor for predicting heart disease in women. The
techniques Apriori, Predictive Apriori and Tertius were compared with each other, and the precision of
Predictive Apriori was 90%.
Kyle E. Walker et al. [17] note that heart disease is the principal cause of death in Texas, USA. They
therefore performed a cluster analysis of different areas of Texas; the results show that factors such as
poor hygiene, economic deprivation and other conditions affect the prevalence of the disease.
K. Rajeswari and colleagues [18] studied heart disease using neural networks. They examined the
influence of feature selection on a neural network algorithm for identifying patients with ischemic heart
disease, using 12 features. Their results show that when all the features (attributes) are applied, the
precision is 89.4% in training and 82.2% in testing. An interesting point in their conclusion is that any
reduction in the input features decreases the precision in both training and testing. A.V. Senthil Kumar
[19] applied a fuzzy mechanism to cardiac patients; the precision reported in that paper was 94.11%.
The studies above are brief examples of research on cardiac patients using different techniques.
2. RESEARCH METHOD
The present study was conducted using data from the University of California, Irvine (UCI)
repository. The data include 13 features, and each record belongs to one of 2 classes, "with" or "without"
heart disease. After feature analysis, models based on four algorithms (decision tree, neural network,
support vector machine and k-nearest neighbor) were developed and validated.
2.1. C5.0 Algorithm
The C5.0 algorithm, developed from the C4.5 algorithm, is one of the most important and widely
used algorithms in data mining. C4.5 is itself an extension of the ID3 algorithm. C5.0 can express a
classifier either as a decision tree or as a set of rules. Because rule sets are easy to understand, they are
preferred in many applications. The strengths of the algorithm are its handling of missing values and of
large numbers of inputs, as well as its short training time [20], [21], [22], [23].
Let S be the training set and X an attribute with n possible values, so that testing X divides S into n
subsets (S1, S2, S3, ..., Sn). To choose among the attributes, the algorithm uses a criterion called the
gain ratio [24].
Let Ci (i = 1, 2, 3, ..., N) denote the classes. The number of samples in S that belong to class Ci is
written freq(Ci, S), and the probability that a sample belongs to Ci is freq(Ci, S)/|S|.
The information of the training set is calculated according to the formula:

$$\mathrm{Info}(S) = -\sum_{i=1}^{N} \frac{\mathrm{freq}(C_i, S)}{|S|} \log_2 \frac{\mathrm{freq}(C_i, S)}{|S|} \tag{1}$$

Info(S) is the information needed to identify the class of a sample in S. After the division of S into its
subsets by X, the gain ratio is calculated as follows:

$$\mathrm{Info}_X(S) = \sum_{j=1}^{n} \frac{|S_j|}{|S|}\, \mathrm{Info}(S_j) \tag{2}$$

$$\mathrm{Gain}(X) = \mathrm{Info}(S) - \mathrm{Info}_X(S) \tag{3}$$

$$\mathrm{SplitInfo}(X) = -\sum_{j=1}^{n} \frac{|S_j|}{|S|} \log_2 \frac{|S_j|}{|S|} \tag{4}$$

$$\mathrm{GainRatio}(X) = \frac{\mathrm{Gain}(X)}{\mathrm{SplitInfo}(X)} \tag{5}$$

The performance indices used in Section 2.5 are defined as:

$$\mathrm{Specificity} = \frac{TN}{FP + TN} \tag{6}$$

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{7}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{8}$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \tag{9}$$
2.2. SVM Algorithm
Support Vector Machine (SVM) is a supervised learning algorithm introduced by Vapnik in 1995. The
algorithm is founded on controlling the generalization error. It constructs a "hyperplane" that divides the
data into classes, so that all samples belonging to one class lie on one side and the rest on the other side.
A linear SVM classifier performs this separation by choosing, among the candidate hyperplanes, the one
with the largest margin [13], [25].
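The role of the hyperplane and its margin can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weight vector `w` and offset `b` are hypothetical values rather than parameters learned in this study, and the margin formula 2/||w|| assumes the usual canonical scaling in which the closest samples satisfy |w·x + b| = 1.

```python
from math import sqrt


def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells on which side of the hyperplane x lies."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Distance between the two margin hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / sqrt(sum(wi * wi for wi in w))


# Hypothetical parameters; a real SVM would learn w and b from the training set.
w, b = [3.0, 4.0], -5.0
print(classify(w, b, [2.0, 1.0]), classify(w, b, [0.0, 0.0]), margin_width(w))
```

Training an SVM amounts to searching for the `w` and `b` that make `margin_width` as large as possible while keeping the training samples on the correct sides.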
2.3. KNN Algorithm
The k-nearest neighbor algorithm is a classification method based on similarity to other cases. Cases
that are close to each other are called "neighbors". When a new case is presented, its distance from each
of the cases in the model is calculated. The cases at the smallest distances, which are the most similar,
are its nearest neighbors, and the new case is placed in the class that contains most of these nearest
neighbors. The algorithm can also predict a continuous target; in that situation, the mean or the median
target value of the nearest neighbors is used as the predicted value of the new case [26].
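A minimal Python sketch of both uses of the nearest neighbors (majority vote for a class, mean for a continuous target) follows. The Euclidean distance, the value k = 3, and the toy cases are assumptions made for illustration; the paper does not state which distance or k was used.

```python
from collections import Counter
from math import dist
from statistics import mean


def k_nearest(train, query, k):
    """Return the k training cases (features, target) closest to `query`."""
    return sorted(train, key=lambda case: dist(case[0], query))[:k]


def knn_classify(train, query, k=3):
    """Predict the class held by the majority of the k nearest neighbors."""
    votes = Counter(target for _, target in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Predict a continuous target as the mean over the k nearest neighbors."""
    return mean(target for _, target in k_nearest(train, query, k))


# Hypothetical, already rescaled two-feature cases.
cases = [
    ((1.0, 1.0), "without"), ((1.2, 0.8), "without"), ((0.9, 1.1), "without"),
    ((3.0, 3.2), "with"), ((3.1, 2.9), "with"), ((2.8, 3.0), "with"),
]
print(knn_classify(cases, (2.9, 3.0)))
```

In practice the features would be rescaled to comparable ranges first, since otherwise attributes with large numeric ranges dominate the distance.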
2.4. Neural Network Algorithm
An artificial neural network is a data processing algorithm inspired by the human brain. The system
consists of a large number of small processing units that work in parallel, as an interconnected network,
to solve a problem. In software, these networks are built from a data structure that acts like a biological
neuron; this data structure is called a neuron [27], [28], [29], [30].
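To make the idea of interconnected processing units concrete, the following Python sketch passes an input through one hidden layer and a single output neuron. The sigmoid activation, the layer sizes and all weights are hypothetical choices for illustration; the architecture actually used in this study is not described in the paper.

```python
from math import exp


def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of its inputs followed by a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def forward(inputs, hidden_layer, output_unit):
    """Feed the inputs through one hidden layer, then through one output neuron."""
    hidden = [neuron(inputs, weights, bias) for weights, bias in hidden_layer]
    out_weights, out_bias = output_unit
    return neuron(hidden, out_weights, out_bias)


# Hypothetical weights; training would adjust them to fit the data.
hidden_layer = [([0.5, -0.2], 0.1), ([0.3, 0.8], -0.4)]
output_unit = ([1.0, -1.0], 0.0)
print(forward([1.0, 2.0], hidden_layer, output_unit))
```

The output can be read as a score between 0 and 1 for the positive class; each hidden neuron is evaluated independently of the others, which is what allows the units to work in parallel.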
2.5. Accuracy Measurement
Several indices, such as specificity, sensitivity, precision and accuracy, are used to evaluate the
prediction rate and to assess the models' validity. These indices (equations 6-9) are calculated from the
confusion matrix (Figure 1). This matrix is a useful tool for analyzing how well a classification method
assigns observations to the various categories. Ideally, most of the observations lie on the main diagonal
of the matrix, and the remaining values of the matrix are zero or near zero [31], [32].
FN = The number of positively labeled data that have been falsely classified as "Negative".
TN = The number of negatively labeled data that have been correctly classified as "Negative".
TP = The number of positively labeled data that have been correctly classified as "Positive".
FP = The number of negatively labeled data that have been falsely classified as "Positive".
Figure 1. Confusion matrix
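The four indices can be computed directly from the cells of the confusion matrix, as in the Python sketch below, which uses the standard definitions (sensitivity = TP / (TP + FN)). The counts in the example are an assumption: the paper does not report its confusion matrices, and these values were chosen because they reproduce the C5.0 testing-set row of Table 3 up to rounding.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return specificity, sensitivity, precision and accuracy as fractions."""
    return {
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Assumed counts, back-calculated from the percentages reported for C5.0 in Table 3.
metrics = classification_metrics(tp=40, fn=2, fp=4, tn=40)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f} %")
```

With these counts the off-diagonal cells (FN = 2, FP = 4) are small compared with the diagonal ones, which is the near-ideal situation described above.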
2.6. Data Set
In this study, 270 records with 13 features were used [33]. The patient attributes used for modeling,
their definitions and their ranges of values are presented in Table 1.
Table 1. Patient attributes used for modeling, their definitions and their ranges of values
Variable | Variable Definition | Categories of Values
Age | Age of the patient | [29-77]
Sex | Gender of the patient | 1 = male; 0 = female
CP | Chest pain type | [1-4]
RBP | Resting blood pressure | [94-200]
SC | Serum cholesterol in mg/dl | [126-564]
FBS | Fasting blood sugar > 120 mg/dl | [0-1]
RER | Resting electrocardiographic results | [0-2]
MHRA | Maximum heart rate achieved | [71-202]
EIA | Exercise induced angina | [0-1]
Oldpeak | ST depression induced by exercise relative to rest | [0-6.2]
Slope | The slope of the peak exercise ST segment | [1-3]
NUM | Number of major vessels (0-3) colored by fluoroscopy | [0-3]
Thal | Normal, fixed defect, reversible defect | [3, 6, 7]
Class (variable to be predicted) | Class of heart disease | Absence (1) or presence (2) of heart disease
Using logistic regression, the variables significantly correlated with the target variable (P <= 0.05)
were selected as predictors. They are presented and defined in Table 2.
Table 2. Variables significantly correlated with the target variable according to logistic regression
Variable | Variable Definition | Categories of Values | B | Wald | Sig. | Exp(B)
Sex | Gender | 1 = male; 0 = female | 1.104 | 6.337 | 0.012 | 3.018
CP | Chest pain type | [1-4] | 0.731 | 13.648 | 0.000 | 2.077
RBP | Resting blood pressure | [94-200] | 0.023 | 5.238 | 0.022 | 1.023
EIA | Exercise induced angina | [0-1] | 1.236 | 10.182 | 0.001 | 3.442
NUM | Number of major vessels (0-3) colored by fluoroscopy | [0-3] | 1.133 | 25.224 | 0.000 | 3.106
Thal | Normal, fixed defect, reversible defect | [3, 6, 7] | 0.397 | 16.848 | 0.000 | 1.488
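The last column of Table 2 can be checked against the B column. Assuming it is the usual Exp(B) of logistic regression output, that is, the odds ratio e^B for a one-unit increase in the variable, the short Python sketch below recomputes it from the coefficients copied from the table; the dictionary name is arbitrary.

```python
from math import exp

# Coefficients B copied from Table 2.
coefficients = {
    "Sex": 1.104, "CP": 0.731, "RBP": 0.023,
    "EIA": 1.236, "NUM": 1.133, "Thal": 0.397,
}

# Odds ratio for a one-unit increase in each variable (assumed meaning of the Exp column).
odds_ratios = {name: exp(b) for name, b in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: {ratio:.3f}")
```

Under this reading, an odds ratio of about 3.4 for EIA means that the odds of heart disease are roughly 3.4 times higher for patients with exercise induced angina, other variables being equal.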
3. RESULTS AND ANALYSIS
This section presents the experimental results and analysis of this study. Four classifiers were
applied: C5.0, SVM, KNN and Neural Network. The data were divided into a training set and a testing
set (70% and 30%, respectively). The training set is used to build the classifier and the testing set to
validate it. Model development was assessed in two main steps: model fitness and model accuracy. The
model fitness criteria were calculated on the training set, whereas the model accuracy measurements
were computed on the testing set, which is the more valuable basis for judging the accuracy of the
models. The results of these experiments are shown in Table 3.
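A hold-out partition of this kind can be sketched in Python as follows. The shuffling procedure, the seed and the function name are assumptions for illustration; the paper does not specify how the records were assigned to the two sets.

```python
import random


def split_records(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and cut it into training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder identifiers standing in for the 270 patient records.
records = list(range(270))
train_set, test_set = split_records(records)
print(len(train_set), len(test_set))
```

Keeping the testing records out of model building is what makes the testing-set indices an estimate of performance on unseen patients.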
Table 3. Comparison of model fitness (training set) and model accuracy (testing set) of the four applied
machine learning algorithms
Algorithm | Training Specificity | Training Sensitivity | Training Precision | Training Accuracy | Testing Specificity | Testing Sensitivity | Testing Precision | Testing Accuracy
C5.0 | 89.62 % | 84.61 % | 85.71 % | 87.50 % | 90.90 % | 95.23 % | 90.90 % | 93.02 %
SVM | 84.90 % | 79.48 % | 79.48 % | 82.61 % | 90.90 % | 80.95 % | 89.47 % | 86.05 %
KNN | 91.50 % | 79.48 % | 87.32 % | 86.41 % | 88.63 % | 88.09 % | 88.09 % | 88.37 %
Neural Network | 91.50 % | 78.20 % | 87.14 % | 85.87 % | 86.36 % | 73.80 % | 83.78 % | 80.23 %
The C5.0 decision tree built the model with the greatest accuracy, with a prediction accuracy of
93.02%. The accuracies obtained by the other classifiers, KNN, SVM and Neural Network, were 88.37%,
86.05% and 80.23%, respectively. Analyzing the variable importance in the C5.0 model shows that
features such as Thal, CP and Slope are particularly important in the prediction of heart diseases
(Figure 2).
Figure 2. Variable importance for heart disease prediction based on the C5.0 model
Figures 3 and 4 show the ROC curves for logistic regression and the C5.0 decision tree on the risk of
heart diseases. C5.0 outperformed logistic regression, with an area under the curve (AUC) of 0.869
against 0.835 for logistic regression. Overall, these AUC results indicate the better performance of the
C5.0 decision tree classification algorithm.
Figure 3. ROC curve for logistic regression
Figure 4. ROC curve for C5.0 decision tree
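The area under a ROC curve has a simple interpretation: it is the probability that a randomly chosen patient with the disease receives a higher score than a randomly chosen patient without it. The Python sketch below computes the AUC through this pairwise definition; the labels and scores are hypothetical and are not the model outputs behind Figures 3 and 4.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 1/2)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted scores: 1 = with heart disease, 0 = without.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(round(roc_auc(labels, scores), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values such as 0.869 and 0.835 can be compared directly on this scale.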
In studies comparing data mining tools on heart disease data sets [34], [35], variables such as blood
pressure, blood sugar, age and sex showed a significant association with heart diseases. The study
conducted by Jesmin Nahar and her colleagues [16] also pointed out that sex was highly important in
predicting heart disease, whereas in this study features such as resting blood pressure, sex, chest pain
type, exercise induced angina and number of major vessels played a major role. Zahra Alizadeh Sani et
al. [36] used the C4.5 and Bagging algorithms to diagnose coronary heart disease and reported the best
accuracy rate for C4.5. K. Rajeswari et al. [18] applied a neural network to ischemic heart disease and
obtained accuracies of 89.4 % and 82.2 % for training and testing, respectively. T. John Peter and
K. Somasundaram [37] used a hybrid attribute selection method for the prediction of heart disease; the
accuracy obtained by this model was 83.62 %. Kemal Polat and Salih Gunes [38] obtained 92.59 %
accuracy using the C4.5 decision tree algorithm.
4. CONCLUSION
In this study, KNN, SVM, C5.0, Logistic Regression and Neural Network were implemented on a
UCI dataset. Among the investigated methods, the decision tree achieved the best performance.
Different issues influence the performance of the applied models, including the type of problem and the
type of input data (discrete or continuous). The dataset was mainly discrete, and the decision tree is also
able to handle numerical data. Because the output variable was labeled with two classes, 'with' and
'without' heart disease, the decision tree yielded better performance than the other algorithms. Decision
trees generate understandable rules, perform classification without requiring much computation, and
clearly indicate which fields are most important for prediction or classification.
REFERENCES
[1] WHO Report. The Top 10 Causes of Death. http://who.int/mediacentre/factsheets/fs310/en/ (last accessed
12/9/2013; accessed 01.04.2015).
[2] Hamid Bagheri, Abdusalam Abdullah Shaltooki. Big Data: Challenges, Opportunities and Cloud Based Solutions.
International Journal of Electrical and Computer Engineering (IJECE), 2014; 5(2): 340-343.
[3] Vijayajothi P, Tan SY, Sarinder KD, Amandeep SS. A methodological review of data mining techniques in
predictive medicine: An application in hemodynamic prediction for abdominal aortic aneurysm disease. Published by
Elsevier, Biocybernetics and Biomedical Engineering, 2014; 34(3):139-145.
[4] K.C. Tan, E.J. Teoh, Q. Yu, K.C. Goh. A hybrid evolutionary algorithm for attribute selection in data mining. Expert
Systems with Applications, 2009; 36: 8616–8630.
[5] Nikola K, Elisa C. Spiking neural network methodology for modelling classification and understanding of EEG
spatio-temporal data measuring cognitive processes. Information Sciences, 2015; 294: 565–575.
[6] F. Lotte, M. Congedo, A. Lécuyer, F. Lamarche, B.A. Arnaldi. Review of classification algorithms for EEG-based
brain–computer interfaces. J. Neural Eng. 2007; 4(2):1-25.
[7] C. Anderson, D. Peterson. Recent advances in EEG signal analysis and classification, in: R. Dybowski, V. Gant
(Eds.). Clinical Applications of Artificial Neural Networks, Cambridge University Press, UK. 2001: 175–191
(Chapter 8).
[8] C. Anderson, E. Stolz, S. Shamsunder. Multivariate autoregressive models for classification of spontaneous
electroencephalogram during mental tasks. IEEE Trans. Biomed. Eng. 1998; 45 (3): 277–286.
[9] K. Padmavathi, K. Sri Ramakrishna. Detection of Atrial Fibrillation using Autoregressive modeling. International
Journal of Electrical and Computer Engineering (IJECE), 2015; 5(1): 64-70.
[10] Shivnarayan P, Ram BP, U. Rajendra A. Automated diagnosis of coronary artery disease using tunable-Q wavelet
transform applied on heart rate signals. Knowledge-Based Systems, 2015; 82: 1-10.
[11] Yongqiang L, Jiaming H, Yiran W, Jijiang Y , Yida T, Wenyao W, Nazim A. Dynamic evaluation model of coronary
heart disease for ubiquitous healthcare. Computers in Industry, 2015; 69: 35-44.
[12] Mai Sh, Tim T, Rob S. Using Decision Tree for Diagnosing Heart Disease Patients. AusDM'11, Proceedings of the
9-th Australasian Data Mining Conference, Ballarat, Australia, 2011.
[13] Sumit B, Praveen P, G.N. Pillai. SVM Based Decision Support System for Heart Disease Classification with Integer-
Coded Genetic Algorithm to Select Critical Features. WCECS. Proceedings of the World Congress on Engineering
and Computer Science, San Francisco, USA, October 22 – 24, 2008.
[14] Peter C. Austin, Jack V. Tu, Jennifer E. Ho, Daniel Levy, Douglas S. Lee. Using methods from the data-mining and
machine-learning literature for disease classification and prediction: a case study examining classification of heart
failure subtypes. Journal of Clinical Epidemiology, 2013; 66(4): 398-407.
[15] Saba B, Usman Q, Farhan HK, M. Younus J. MV5: A Clinical Decision Support Framework for Heart Disease
Prediction Using Majority Vote Based Classifier Ensemble. Arab J Sci Eng, 2014; 39(11): 7771-7783.
[16] Jesmin N, Tasadduq I, Kevin ST, Yi-Ping Ph Ch. Association rule mining to detect factors which contribute to heart
disease in males and females. Expert Systems with Application, 2013; 40(4): 1086–1093.
[17] Kyle E. Walker, Sean M. Crotty. Classifying high-prevalence neighborhoods for cardiovascular disease in Texas.
Applied Geography, 2014; 57: 22-31.
[18] K.Rajeswari, V.Vaithiyanathan, T.R. Neelakantan. Feature Selection in Ischemic Heart Disease Identification using
Feed Forward Neural Networks. International Symposium on Robotics and Intelligent Sensors 2012 (IRIS 2012),
Procedia Engineering, 2012; 41: 1818–1823.
[19] A.V Senthil Kumar. Generating Rules for Advanced Fuzzy Resolution Mechanism to Diagnosis Heart Disease.
International Journal of Computer Applications, 2013; 77(11): 6-12.
[20] Quinlan J R. Induction of decision trees. Machine Learning, 1986; 4: 81–106.
[21] Quinlan J R. C4.5: Programs for machine learning. Machine Learning, 1994; 3: 235–240.
[22] Quinlan J R. Bagging, Boosting and C4.5. Proceedings of 14th National Conference on Artificial Intelligence, 1996:
725–730.
[23] Xindong W, Vipin K, J. Ross Q, Joydeep Gh, Qiang Y, Hiroshi M, Geoffrey J. M, Angus Ng, Bing L, Philip S.
Yu, Zhi-Hua Z, Michael S, David JH, Dan S. Top 10 algorithms in data mining. Springer, 2008; 14(1): 1-37.
[24] Shuonan H, Rongtao H, Xinming S, Jun W, Chengshang Y. Research on C5.0 Algorithm Improvement and the
Test in Lightning Disaster Statistics. International Journal of Control and Automation, 2014; 7(1): 181-190.
[25] Vapnik, V. N. The nature of statistical learning theory. New York:Springer, 1995.
[26] Yazdani A, Ebrahimi T, Hoffmann U. Classification of EEG signals using Dempster Shafer theory and a K-nearest
neighbor classifier. IEEE. In: Proc of the 4th int EMBS conf on neural engineering, 2009: 327–30.
[27] Daubechies I. The wavelet transform, time-frequency localization and signal analysis. IEEE. Trans Inform Theor,
1990; 36: 961–1005.
[28] Demuth H, Beale M, Hagan M. Neural network Toolbox™ user’s guide. The MathWorks, Inc.; 2009.
[29] Leng, G., McGinnity, T.M., Prasad, G. Design for self-organizing fuzzy neural networks based on genetic
algorithms. IEEE. Trans. Fuzzy Syst. 2006; 14 (6): 755–766.
[30] Frank H. F. Leung, H. K. Lam, S. H. Ling, Peter K. S. Tam . Tuning of the structure and parameters of a neural
network using an improved genetic algorithm. IEEE. Trans. Neural Networks, 2003; 14 (1): 79–88.
[31] Alizadeh S, Ghazanfari M, Teimorpour B. Data Mining and Knowledge Discovery. Publication of Iran University
of Science and Technology. 2nd ed, 2011. [Persian].
[32] Han J, Kamber M. Chapter 1: Introduction. Data Mining: Concepts and Techniques. Morgan Kaufmann
Publishers. 2nd ed, 2006.
[33] UCI Archive, Machine Learning Repository. https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/heart/
(accessed 02.05.2015).
[34] G.Subbalakshmi, K. Ramesh, M. Chinna Rao. Decision Support in Heart Disease Prediction System using Naive
Bayes. Indian Journal of Computer Science and Engineering (IJCSE). 2011; 2(2): 170-176.
[35] Aditya M, Prince K, Himanshu A, Pankaj K. Early Heart Disease Prediction Using Data Mining Techniques.
Computer Science & Information Technology (CS & IT). 2014: 53-59.
[36] Roohallah A, Jafar H, Zahra A, Hoda M, Reihane B, Asma Gh, Fahime Kh, Fariba A. Diagnosing Coronary Artery
Disease via Data Mining Algorithms by Considering Laboratory and Echocardiography Features. Official Journal of
Rajaie Cardiovascular Medical and Research Center. 2013; 2(3): 133-139.
[37] T. John Peter, K. Somasundaram. Study and Development of Novel Feature Selection Framework for Heart Disease
Prediction. International Journal of Scientific and Research Publications. 2012; 2(10): 1-7.
[38] Kemal Polat, Salih Gunes. A hybrid approach to medical decision support systems: Combining feature selection,
fuzzy weighted pre-processing and AIRS. Computer Methods and Programs in Biomedicine. 2007; 88: 164–174.
BIOGRAPHIES OF AUTHORS
Moloud Abdar. He received his undergraduate (Bachelor) degree in Computer
Engineering (Software Engineering) from the University of Damghan, Iran, in 2015. He
has more than 7 conference and journal papers on data mining. Currently, his
research interests include data mining, web and text mining, artificial intelligence
and image processing.
Goli Arji. She is a PhD student in health information management at Tehran
University of Medical Sciences. She is interested in data mining, fuzzy logic,
clinical decision support systems, telemedicine and consumer health informatics.
... According to World Health Organization (WHO), studies related to cardiac disease were at the forefront of scholarly attention in 2017 owing to the detrimental ramifications it poses on the health of populations [1]. Each year, 17.9 million deaths occur due to cardiovascular diseases (CVDs). ...
... It is noteworthy that certain articles have addressed more than one type of cardiac disease. Classification Heart Disease 2021 [32] Prediction Heart Disease 2021 [33] Prediction Heart Disease 2021 [34] Classification Heart Disease 2020 [35] Classification Heart Disease 2020 [36] Prediction Heart Disease 2020 [37] Prediction Heart Disease 2020 [38] Classification and Diagnosis Heart Disease 2020 [39] Prediction Heart Disease 2019 [40] Prediction Heart Disease 2019 [41] Classification Heart Disease 2018 [42] Prediction Heart Disease 2016 [43] Primary Estimation Cardio Metabolic Risk 2016 [22] Prediction Coronary Artery Disease 2016 [44] Prediction Heart Disease 2015 [45] Prediction and Classification Heart Disease 2015 [46] Classification Cardiovascular Disease 2015 [47] Prediction Cardiovascular Disease 2015 [48] Prediction Heart diseases 2015 [1] Classification Cardiovascular Disease 2014 [49] Classification Heart Disease 2014 [50] Clustering Diabetes and Heart Disease 2014 [51] prediction Heart Disease 2013 [52] Analysis and Prediction Heart Disease 2013 [53] Primary Estimation Cardio Metabolic Risk 2013 [23] Classification Heart Disease 2013 [54] Classification Coronary Heart Disease 2013 [55] Prediction and Classification Heart Disease 2012 [56] prediction Heart Disease 2012 [57] Classification Coronary Heart Disease 2012 [58] Prediction Heart Disease 2012 [59] Prediction Heart Disease 2012 [60] Prediction Heart Disease 2012 [61] Prediction Coronary Artery Disease 2012 [62] Prediction Heart Disease 2012 [63] Diagnosis and Prediction Heart Disease 2011 [7] Prediction Heart Disease 2011 [64] Analysis and Prediction Coronary Heart Disease and Heart Attack 2010 [65] Classification Heart Disease 2010 [66] Prediction Heart Disease, Thyroid, Diabetes, and Hepatitis 2010 [67] Prediction Heart Disease 2010 [68] Classification Heart Disease (coronary artery disease) 2008 [69] Prediction Coronary Heart Disease 2008 [70] Prediction Heart Disease 2008 [71] ...
Article
Full-text available
Introduction: Heart disease is a major public health concern with millions of reported deaths annually. Data mining techniques have received attention in recent years as a tool aiding diagnosis and prediction of heart disease cases. This systematic review examines the application of data mining methods to cardiac disease diagnosis in order to identify specific types of heart-related disease that are diagnosed using data mining techniques as well as the most successful data mining methods.Material and Methods: This study involved a systematic review of IEEE, Science Direct, Google Scholar, Web of Science, Scopus and MEDLINE databases from 2008 until April 2023. Inclusion criteria were original papers that used data mining methods for heart disease diagnosis. Non-English papers, those without full text, studies conducted on animals, and other types of papers (conference abstracts and letters) were excluded from the study. All the retrieved references were then assessed by title and abstract according to PRISMA, after which full texts of relevant articles were analyzed. The final sample comprised of 47 articles.Results: Various classification methods have been utilized to diagnose heart-related disease using different mining tools, with genetic neural network data mining method having the highest accuracy among the studied techniques. Results show that predicting cardiac disease is the most commonly performed task. The demographic, bio-clinical, personal and exercise-related attributes, as well as other features used for classification were identified. The findings suggest that data mining methods hold great potential for detecting and preventing heart disease on both individual and population scales.Conclusion: The study findings have implications for the prevention and treatment of cardiac disease, especially in high-risk individuals. 
Data mining methods can be widely applied to detect and prevent heart disease on a population scale, as well as supporting decisions for the most suitable treatment for individual patients to prevent death and reduce treatment costs.
... Therefore, the timely detection of heart disease is of utmost importance and typically relies on a combination of physical examinations, internal symptoms, and clinical indicators, including various pathological and functional factors. However, relying solely on these factors can sometimes lead to delays in diagnosis, potentially resulting in misinterpretations and unpredictable health outcomes [2]. ...
... It effectively maximizes class segregation by utilizing either Gini impurity or entropy as its criteria. In contrast, Logistic Regression estimates the coefficients of the features and models the probability of heart disease occurrence using a logistic function [2]. ...
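The split criteria and the logistic function named in this excerpt can be illustrated with a minimal Python sketch. It is not taken from the cited article; the function names (`gini_impurity`, `entropy`, `predict_probability`) and the example coefficients are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())

def logistic(z):
    """Logistic (sigmoid) function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, coefficients, intercept):
    """Probability of the positive class under a logistic regression model.

    The coefficients and intercept are assumed to be already fitted;
    this sketch does not perform the estimation itself.
    """
    z = intercept + sum(w * x for w, x in zip(coefficients, features))
    return logistic(z)

# A perfectly mixed two-class node is maximally impure under both criteria.
mixed_node = [0, 0, 1, 1]
print(gini_impurity(mixed_node), entropy(mixed_node))

# Hypothetical coefficients for two features, for illustration only.
print(predict_probability([1.0, 2.0], [0.5, -0.25], 0.0))
```

A decision tree would evaluate one of the two impurity functions on each candidate split and keep the split that lowers impurity the most, whereas logistic regression fits the coefficients once and scores every record with the same function.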
Article
Cardiovascular disease, which encompasses various conditions affecting the heart and blood vessels, is a significant global health concern and a primary cause of mortality on a global scale. These ailments have a profound impact on heart function, blood circulation, and overall well-being. This investigation introduces a novel hybrid model that effectively combines the strengths of Decision Tree (DT), Logistic Regression (LR), and Artificial Neural Network (ANN) algorithms, thereby significantly augmenting the accuracy of heart disease prediction. The model demonstrates exceptional performance, boasting an impressive accuracy rate of 88%, which surpasses the individual accuracies of DT at 99%, LR at 80%, and ANN at 86%. Furthermore, the hybrid approach excels in precision, recall, and F1-score metrics, thereby substantiating its reliability and robustness as a predictive tool for heart disease. This research underscores the advantages of incorporating multiple algorithms in order to create a more efficient predictive model for cardiovascular health diagnostics.
... 13,14 Machine learning (ML) approaches have gained popularity in prediction purposes in many domains, such as healthcare. 15,16 So far, previous studies have used prediction models in various medical fields, such as heart diseases, cancer, etc. [17][18][19] One category of ML technique is deep learning (DL), which uses artificial neural networks (ANN) to acquire associations between features and discover unknown patterns in sophisticated data such as images. 20,21 Although the ML algorithms can give us insight into the optimal predictive performance based on the structured dataset, the DL can perform this task by using unstructured data such as videos, images, sounds, etc. 22 So, based on the data leveraged for analysis, each of them can be insightful for prediction purposes. ...
Article
Background and aim: Due to changes in lifestyle, bariatric surgery is expanding worldwide. However, this surgery has numerous complications, and early identification of these complications could be essential in assisting patients to have a higher-quality surgery. Machine learning has a significant role in prediction tasks. So far, no systematic review has been carried out on leveraging ML techniques for predicting complications of bariatric surgery. Therefore, this study aims to perform a systematic review for better prediction insight. Materials and methods: This review was conducted in 2023 based on Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). We searched scientific databases using the inclusion and exclusion criteria to obtain articles. The data extraction form was used to gather data. To analyze the data, we leveraged the narrative synthesis of the quantitative data. Results: Ensemble algorithms outperformed others in large databases, especially at the national registries. Artificial Neural Networks (ANN) performed better than others based on one-single-center database. Also, Deep Belief Networks (DBN) and ANN obtained favorable performance for complications such as diabetes, dyslipidemia, hypertension, thrombosis, leakage, and depression. Conclusion: This review gave us insight into using ensemble and non-ensemble algorithms based on the types of datasets and complications.
... It essentially uses distance functions to allocate and group the nearest data. Euclidean and Manhattan distances are the available functions [6]. These functions determine the separation between points, and points that are close together are grouped. ...
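The two distance functions mentioned in this excerpt, and the way a k-nearest-neighbour classifier uses them, can be sketched as follows. This is a minimal illustration rather than the cited article's implementation; `knn_predict`, its parameters, and the toy data are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_points, train_labels, query, k=3, distance=euclidean):
    """Label a query point by majority vote among its k nearest neighbours."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy data for illustration only: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5)]
labels = [0, 0, 0, 1, 1]
print(knn_predict(points, labels, (0.2, 0.2), k=3))
print(knn_predict(points, labels, (5.5, 5.0), k=3, distance=manhattan))
```

Passing the distance as a parameter makes it easy to compare the two metrics on the same data, since the choice of metric can change which neighbours count as nearest.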
Article
Abstract: Heart disease is the term used to describe any ailment that is unavoidable and damages the heart. Machine learning (ML) techniques have been used by computer systems to support diagnosis in the field because of the vast amount of electronic health data that is now available. The data in this study is classified using a multi-layered perceptron (MLP) that was trained using a back-propagation (BP) artificial neural network (ANN). Heart diseases with or without heart attacks are classified using the MLP. By repeatedly going through layers of functions, ANN may identify patterns in the data and subsequently build a model. This study spans more than a thousand epochs and uses sigmoid activation functions. Experiments with different learning rates and neuron number values produced the greatest results. The results showed 25 neurons and a 0.25 learning rate with a high accuracy of 80.66%. It is discovered that ANN can be applied to categorize cases of heart disease.
... They are focused on developing cost-cutting strategies by data mining techniques (Repaka, Ravikanti, and Franklin 2019). The proposed study helps medical practitioners diagnose heart disease accurately, which assists in identifying high risk of heart attacks (Saw et al. 2020). Several works have demonstrated that the performance of SVM is poor and provides less accuracy in prediction of heart disease. A study by Abdar et al. (2015) compares the accuracy of various mining classification algorithms in predicting heart disease. It is important to analyse and compare the various classification algorithms that provide better accuracy. ...
... A new model has been proposed by Mai Shouman, et al. [28] to diagnose the heart disease. Moloud Abdar, et al. [1] compared the different data mining techniques used for heart disease prediction. After analyzing the features, prediction models using algorithms such as C5.0, Neural Network, Support vector Machine (SVM), K-Nearest Neighborhood (KNN) and Logistic Regression have been developed and validated. ...
... In the study of coronary artery disease and cardiovascular diseases [11], the techniques and approaches used for detection of the disease are discussed in detail according to their year-wise advancement. With the advancements in technologies and approaches in the field of Big Data and Machine Learning over the years, more accurate and efficient detection is possible for decreasing the mortality rate. ...
Article
Myocardial infarction (MI) is a prevalent and severe ailment on a global scale and ranks among the leading causes of death worldwide. Sometimes a myocardial infarction shows no symptoms at all. It occurs when the blood supply to the heart is reduced. The main aim of this research paper is to evaluate various machine learning (ML) techniques for accurately predicting the disease and the adverse effect of the risk factors. The ML techniques are applied to a collected dataset of 350 entries that includes MI and non-MI patients, both men and women. The models are trained on the dataset, and ensemble classifiers are then applied to increase prediction performance. The ensemble classifiers help improve gender-specific prediction precision by merging classifier predictions.
Preprint
Mental health remains a critical concern in China, particularly for patients with severe psychiatric disorders in rural areas. This study aimed to analyze the health status of Chinese patients with severe psychiatric disorders using the CART algorithm. Mental health is a critical facet of overall well-being, yet understanding and addressing the complexities of severe psychiatric disorders in rural China presents unique challenges. This comprehensive study employs advanced analytical techniques to explore and illuminate multifaceted aspects of mental health, with a specific focus on patients with severe psychiatric disorders, healthcare professionals, and rural residents in China. This study's novelty is in its creative utilization of the CART algorithm to assess the well-being of Chinese individuals grappling with severe psychiatric conditions. This cutting-edge data analysis method opens up a potential path for enhancing mental health strategies and optimizing resource allocation. In essence, this study offers a holistic examination of mental health in rural China, encompassing various dimensions, from predictive elements to the challenges faced by healthcare professionals. Its findings aim to inform the development of effective mental health strategies and resource allocation, enhancing the overall well-being of individuals grappling with severe psychiatric disorders in this region. The government and relevant authorities are recommended to ensure their physical and mental health. The lack of mental health information in rural China also negatively impacts patients' behavior in seeking and using medical services. Thus, measures to promote different forms of mental health education are proposed. In conclusion, the treatment of patients with severe mental illness is crucial to the physical and mental health of millions of people.
Article
Atrial fibrillation (AF) is the arrhythmia that commonly causes death in adults. We measured AR coefficients using Burg's method for each 15-second segment of ECG. These features are classified using different statistical classifiers: kernel SVM and KNN. The performance of the algorithm was evaluated on signals from the MIT PhysioNet database. The effect of AR model order and data length on the classification results was tested. The method shows better results and can be used for practical purposes in clinics.
Conference Paper
The successful application of data mining in highly visible fields like e-business, marketing and retail has led to its application in other industries and sectors. Among the sectors just discovering it is healthcare. The healthcare industry is generally "information rich", but unfortunately not all the data required for discovering hidden patterns and making effective decisions are mined. Discovery of hidden patterns and relationships often goes unexploited. Advanced data mining modeling techniques can help remedy this situation. This research paper intends to use data mining classification modeling techniques, namely Decision Trees, Naïve Bayes and Neural Network, along with the weighted association Apriori algorithm and the MAFIA algorithm, in heart disease prediction. Using medical profiles such as age, sex, blood pressure and blood sugar, it can predict the likelihood of patients getting heart disease.
Article
Data mining refers to using a variety of techniques to identify nuggets of information or decision-making knowledge in the database and extracting these in a way that they can be put to use in areas such as decision support, prediction, forecasting and estimation. The healthcare industry collects huge amounts of healthcare data which, unfortunately, are not "mined" to discover hidden information for effective decision making. Discovering relations that connect variables in a database is the subject of data mining. This research has developed a Decision Support in Heart Disease Prediction System (DSHDPS) using a data mining modeling technique, namely Naïve Bayes. Using medical profiles such as age, sex, blood pressure and blood sugar, it can predict the likelihood of patients getting heart disease. It is implemented as a web-based questionnaire application. It can serve as a training tool to train nurses and medical students to diagnose patients with heart disease.
Article
In this paper we review classification algorithms used to design brain–computer interface (BCI) systems based on electroencephalography (EEG). We briefly present the commonly employed algorithms and describe their critical properties. Based on the literature, we compare them in terms of performance and provide guidelines to choose the suitable classification algorithm(s) for a specific BCI.
Article
Application of computer data processing in the medical domain has witnessed several significant revolutions in recent days. In particular, data mining has played an important role in uncovering the hidden patterns of clinically relevant data sets. This technique can be employed for the diagnosis of heart attack, because new information on the nature of diseases and their diagnostic criteria has been increasing at a tremendous rate. Nevertheless, the data on certain illnesses is always heterogeneous in nature, and it is highly difficult to interpret such voluminous data to arrive at a strong conclusion. Hence organized data is mandatory. Diagnosis that leads to a suitable treatment is a difficult task for some fatal diseases. A doctor requires a precise diagnosis out of the many clinical reports of the individual concerned. Therefore automation of data processing and mining would be advantageous for a medical professional initiating a treatment regime. Further, machine processing avoids error and excessive time consumption in prediction. Data mining techniques provide a comparative knowledge base and a user-friendly working environment, and help improve the accuracy of heart disease diagnosis.
Article
The aim of this study is to combine artificial neural networks (ANNs) and fuzzy logic (FL) into a powerful tool for diagnosing heart disease. By combining the fuzzy inference system and the neural network, the input values are passed through the input layer (by input membership functions) and the output is seen in the output layer (by output membership functions). Training involves iterative adjustment of the parameters of the adaptive neuro-fuzzy inference system (ANFIS) using a hybrid learning procedure. The mechanism has five layers, each with its own nodes. Layer 1 holds the input variables with their membership functions. A T-norm operator that performs the AND operation is used in layer 2. The sum of all rule firing strengths is assigned in layer 3. The nodes in layer 4 are adaptive and compute the consequents of the rules. A single node computes the overall output in layer 5. The proposed method is tested with the Cleveland heart disease dataset. The ANFIS approach is implemented using MATLAB. The proposed mechanism can work more effectively for diagnosis of heart disease and also improves the accuracy. The results of the proposed method are compared with an earlier method using accuracy as the metric.
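The five-layer structure described in this abstract can be sketched as a single forward pass. The abstract states that the original was implemented in MATLAB; the Python below is only an illustration under several assumptions not given in the source: Gaussian membership functions, the product as the T-norm, normalization of firing strengths by their sum in layer 3, and first-order Sugeno consequents. The function names and all parameter values are hypothetical, and the hybrid learning procedure is not shown.

```python
import math
from itertools import product

def gaussian_mf(x, center, width):
    """Gaussian membership function (assumed shape, not from the source)."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)

def anfis_forward(inputs, mf_params, consequents):
    """One forward pass through a five-layer ANFIS-style network.

    inputs      -- list of crisp input values
    mf_params   -- per input, a list of (center, width) pairs
    consequents -- per rule, linear coefficients [p_1, ..., p_n, bias]
    """
    # Layer 1: membership degree of each input in each of its fuzzy sets.
    memberships = [[gaussian_mf(x, c, w) for c, w in params]
                   for x, params in zip(inputs, mf_params)]
    # Layer 2: rule firing strengths via a T-norm (product stands in for AND).
    firing = [math.prod(combo) for combo in product(*memberships)]
    # Layer 3: each firing strength divided by the sum of all strengths.
    total = sum(firing)
    normalized = [w / total for w in firing]
    # Layer 4: adaptive nodes weight each rule's linear consequent.
    rule_outputs = [
        w * (sum(p * x for p, x in zip(coeffs[:-1], inputs)) + coeffs[-1])
        for w, coeffs in zip(normalized, consequents)
    ]
    # Layer 5: a single node sums the rule outputs.
    return sum(rule_outputs), normalized

# Two inputs with two fuzzy sets each give four rules (illustrative values).
mf_params = [[(0.0, 1.0), (2.0, 1.0)], [(1.0, 1.0), (3.0, 1.0)]]
consequents = [[0.1, 0.2, 0.0], [0.3, -0.1, 0.5],
               [-0.2, 0.4, 1.0], [0.0, 0.0, 2.0]]
output, weights = anfis_forward([1.0, 2.0], mf_params, consequents)
print(output, weights)
```

During training, a hybrid procedure would adjust both the membership parameters and the consequent coefficients; here both are fixed so that the flow of values through the layers is easy to follow.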
Article
Two different procedures are studied by which a frequency analysis of a time-dependent signal can be effected, locally in time. The first procedure is the short-time or windowed Fourier transform, the second is the "wavelet transform," in which high frequency components are studied with sharper time resolution than low frequency components. The similarities and the differences between these two methods are discussed. For both schemes a detailed study is made of the reconstruction method and its stability, as a function of the chosen time-frequency density. Finally the notion of "time-frequency localization" is made precise, within this framework, by two localization theorems.
Chapter
In the history of research of the learning problem one can extract four periods that can be characterized by four bright events: (i) Constructing the first learning machines, (ii) constructing the fundamentals of the theory, (iii) constructing neural networks, (iv) constructing the alternatives to neural networks.