An evolution based hybrid approach for heart diseases classification and associated risk factors identification

Saman Iftikhar1, Kiran Fatima2, Amjad Rehman1, Abdulaziz S Almazyad4, Tanzila Saba3*
1College of Computer and Information Systems, Al-Yamamah University, Riyadh, 11512, Saudi Arabia
2Department of Computer Science, National University of Computer and Emerging Sciences, Islamabad, Pakistan
3College of Computer and Information Sciences, Prince Sultan University, Riyadh, 11586, Saudi Arabia
4College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
Abstract
With the advent of voluminous medical databases, healthcare analytics on big data has become a major
research area. Healthcare analytics plays an important role in big data analysis by extracting
valuable information through data mining and machine learning techniques. These predictions
help physicians make the right decisions for successful diagnosis and prognosis of various diseases. In
this paper, an evolution based hybrid methodology is used to develop a healthcare analytic model
that exploits the data mining and machine learning algorithms Support Vector Machine (SVM), Genetic
Algorithm (GA) and Particle Swarm Optimization (PSO). The proposed model may assist physicians in
diagnosing various types of heart disease and in identifying the associated risk factors with high accuracy.
The developed model is evaluated against results reported in the literature for diagnosing
heart diseases, taking the Cleveland heart disease database as a case study. A major prospect of
this research is the diagnosis of a disease in less time and with fewer factors or symptoms.
The proposed healthcare analytic model is capable of reducing the search space significantly while
analyzing big data, so fewer computing resources are consumed.
Keywords: Healthcare analytics, Optimization algorithms, Heart diseases classification.
Accepted on December 15, 2016
Introduction
Healthcare information systems are becoming important as
they build relational databases enriched with valuable
medical data [1-3]. Nowadays, data mining and machine
learning algorithms are used to analyze these high-volume
databases and to make predictions from them. Clinical decision
support systems are developed from methodologies based on
these algorithms. In this way, the risks associated with
incorrect diagnostic decisions and the costs associated with
wrong medication of patients can be minimized [4,5].
The primary goal of healthcare analytics is to develop
predictive models in various medical domains such as heart
diseases, cancers, diabetes and other complex diseases [6-8].
Different data mining techniques are used in healthcare
informatics, such as supervised learning, unsupervised
learning and feature selection [9,10]. These techniques are
extensively used in numerous real world applications [11,12].
In a supervised learning technique such as classification, a
model is developed to distinguish different data classes and
is then used to predict the class of an instance whose class
label is unknown [13]. Training and testing data sets are used
to classify the objects into different classes. Clustering is
an unsupervised learning technique which forms clusters of
data objects with unknown class labels on the basis of some
similarity measure [14]. The formed clusters are then used to
derive association rules which guide the classification of
test instances. Feature selection is a data mining technique
used to select the subset of attributes that is considered
most fit for making an optimal diagnosis decision based on
some selection criteria.
Various algorithms are in use for supervised and unsupervised
learning. They are applied in healthcare analytics [15] with
varying efficacy and include Support Vector Machines (SVM),
Artificial Neural Networks (ANN), decision trees, Bayesian
networks, Support Feature Machines (SFM) and regression
analysis. This research presents a hybrid approach that
combines a supervised learning model based on the well-known
SVM classifier with the evolutionary optimization techniques
Genetic Algorithm (GA) and Particle Swarm Optimization (PSO)
[16,17]. The results are compared with algorithms reported in
the literature and show a considerably improved accuracy of
more than 88%. By using the proposed model for the
classification and diagnosis of a disease, big data analysis
issues may be resolved to a great extent. This paper is
organized in five sections: the state of the art is presented
in section 2, the proposed methodology in section 3,
experiments and results in section 4 and the conclusion in
section 5.
Biomedical Research 2017; 28 (8): 3451-3455
ISSN 0970-938X
www.biomedres.info
Proposed Methodology
Data collection
The data are taken from the heart disease databases available
at the UCI machine learning repository, specifically the
Cleveland heart disease patient dataset. The data mining
problem to be solved is a multiclass problem. The dataset is
divided into five classes: 0 corresponds to the absence of
heart disease and 1, 2, 3 and 4 correspond to four
different types of heart disease (Acute Myocardial Infarction
(AMI), Percutaneous Coronary Intervention (PCI),
Percutaneous Transluminal Coronary Angioplasty (PTCA) and
Coronary Artery Bypass Graft (CABG)). The 13 most important
attributes and the class label are given in Table 1.
Table 1. Data set attributes.
Feature    Description                                          Value
age        Age                                                  16-80
sex        Gender                                               1: male; 0: female
cp         Chest pain type                                      1: typical angina; 2: atypical angina; 3: non-anginal pain; 4: asymptomatic
trestbps   Resting blood pressure                               mm Hg on admission to the hospital
chol       Serum cholesterol                                    mg/dl
fbs        Fasting blood sugar                                  0: <120 mg/dl; 1: >120 mg/dl
restecg    Resting electrocardiographic results                 0: normal; 1: having ST-T wave abnormality; 2: showing probable or definite left ventricular hypertrophy
thalach    Maximum heart rate achieved                          -
exang      Exercise induced angina                              0: no; 1: yes
oldpeak    ST depression induced by exercise relative to rest   -
slope      Slope of the peak exercise ST segment                1: upsloping; 2: flat; 3: downsloping
ca         Number of major vessels colored by fluoroscopy       0-3
thal       -                                                    3: normal; 6: fixed defect; 7: reversible defect
num        Predicted attribute                                  0, 1, 2, 3, 4
Heart diseases classification through SVM
SVMs build a separating hyperplane, expressed as a linear
function, in the space of the training points. The optimal
hyperplane and the decision function for classifying linearly
separable and non-separable data points are shown in Figure 1.
In this research, experiments are performed with the SVM
classifier using three kernel functions and three methods for
finding the separating hyperplane. The steps of the
implementation for heart disease classification through SVM
are as follows.
1. Load the data set acquired from the UCI repository into the
variable 'data'
2. Extract the class labels, 0 (no disease) and 1, 2, 3, 4
(types of heart disease), into the variable 'labels'
3. Divide the data into training and test sets through 10-fold
cross validation and save the resulting sets in the variables
'train' and 'test'
4. Build the SVM structure through the Matlab Bioinformatics
Toolbox built-in function 'svmtrain':
SVMStruct = svmtrain(data(train,:), labels(train), 'Kernel_Function', kernel_func, 'Method', hyper_method, 'boxconstraint', bx)
5. Classify the test data instances through the built-in
function 'svmclassify':
svmclassify(SVMStruct, data(test,:))
6. Evaluate the classifier performance in terms of correct rate
(accuracy)
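The cross validation and correct-rate steps above can be sketched in a language-neutral way. The following is a minimal illustration in Python rather than the Matlab code used in the paper. The names k_fold_indices, cross_validated_accuracy, train_fn and predict_fn are hypothetical, and the majority-class classifier is only a toy stand-in for 'svmtrain' and 'svmclassify', not the SVM itself.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(data, labels, train_fn, predict_fn, k=10, seed=0):
    """Return the mean correct rate over k folds (requires len(data) >= k).

    train_fn(rows, row_labels) -> model and predict_fn(model, row) -> label
    are hypothetical stand-ins for svmtrain and svmclassify.
    """
    fold_scores = []
    for held_out in k_fold_indices(len(data), k, seed):
        test_idx = set(held_out)
        train_idx = [i for i in range(len(data)) if i not in test_idx]
        model = train_fn([data[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, data[i]) == labels[i] for i in held_out)
        fold_scores.append(correct / len(held_out))
    return sum(fold_scores) / k


# Toy stand-in classifier: always predicts the most frequent training label.
def train_majority(rows, row_labels):
    return Counter(row_labels).most_common(1)[0][0]


def predict_majority(model, row):
    return model


# Made-up data: 40 one-feature samples, 30 of class 0 and 10 of class 1.
toy_data = [[float(i)] for i in range(40)]
toy_labels = [0] * 30 + [1] * 10
toy_accuracy = cross_validated_accuracy(toy_data, toy_labels,
                                        train_majority, predict_majority)
```

Each sample is held out exactly once, so the reported correct rate is an average over predictions made on data the model did not see during training.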
Heart diseases classification through GA-SVM
Genetic Algorithm (GA) is a computational model inspired by
evolution and motivated by the laws of natural selection and
genetics. The algorithm preserves significant information
through the repeated application of genetic operations on
evolved individuals. In this way, the evolutionary algorithm
selects the N best features for classification out of M total
features. The following main steps are performed to implement
GA in combination with SVM for heart disease classification.
1. Generate initial random population of binary strings
(chromosomes) to represent presence (1) or absence (0) of
features
2. Perform genotype to phenotype conversion of all the
chromosomes to get desired features set
3. Get fitness value (classification accuracy) of random
individuals using phenotypes through SVM
4. Perform genetic operation 'tournament selection' to select
parent(s) individuals based on fitness value for application of
recombination operators crossover and mutation
5. Apply one-point crossover and bit-wise uniform mutation to
the selected parent individuals
6. Remove repetitive individuals from evolved population
7. Repeat step 2 to step 6 till stagnation (no improvement in
fitness value) for a specified number of generations
8. Return optimal feature subset with highest fitness value
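A minimal sketch of such a GA loop is given below in Python, using the parameter values reported later in the paper (population 50, 50 generations, tournament size 7, crossover rate 0.8, mutation rate 0.01). It is an illustration under assumptions, not the authors' implementation: toy_fitness and TARGET are hypothetical stand-ins for the SVM classification accuracy, and the duplicate removal and stagnation-based stopping of steps 6 and 7 are omitted in favor of a fixed number of generations.

```python
import random

# Arbitrary made-up mask; the toy fitness rewards chromosomes that match it.
TARGET = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]


def toy_fitness(chromosome):
    """Hypothetical stand-in for SVM accuracy: counts bits matching TARGET."""
    return sum(a == b for a, b in zip(chromosome, TARGET))


def tournament_select(population, scores, size, rng):
    """Pick `size` random individuals and return the fittest of them."""
    contenders = rng.sample(range(len(population)), size)
    return population[max(contenders, key=lambda i: scores[i])]


def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two parents at one random cut point."""
    point = rng.randint(1, len(parent_a) - 1)
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])


def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


def ga_feature_selection(fitness_fn, n_features=13, pop_size=50, generations=50,
                         tournament_size=7, crossover_rate=0.8,
                         mutation_rate=0.01, seed=0):
    rng = random.Random(seed)
    # 1 = feature present, 0 = feature absent.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for generation in range(generations + 1):
        scores = [fitness_fn(c) for c in population]
        for chromosome, score in zip(population, scores):
            if score > best_score:
                best, best_score = chromosome[:], score
        if generation == generations:
            break  # the final population has been scored; stop breeding
        offspring = []
        while len(offspring) < pop_size:
            p1 = tournament_select(population, scores, tournament_size, rng)
            p2 = tournament_select(population, scores, tournament_size, rng)
            if rng.random() < crossover_rate:
                c1, c2 = one_point_crossover(p1, p2, rng)
            else:
                c1, c2 = p1[:], p2[:]
            offspring.append(mutate(c1, mutation_rate, rng))
            offspring.append(mutate(c2, mutation_rate, rng))
        population = offspring[:pop_size]
    return best, best_score


best_mask, best_score = ga_feature_selection(toy_fitness)
```

In the hybrid approach described here, fitness_fn would train and cross-validate an SVM on the columns selected by the chromosome and return its accuracy.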
Figure 1. Maximum margin and optimal hyperplane of linear and non-linear SVM.
Heart diseases classification through PSO-SVM
Particle Swarm Optimization (PSO) is another population-based
optimization algorithm, inspired by the social behavior of
bird flocking and fish schooling. PSO is widely used in
artificial intelligence applications for solving different
optimization problems. In PSO, the candidate solutions, called
particles, follow the current best particles and move through
the search space in order to find the global best solution.
For the heart disease data set, each particle (a candidate
feature subset) is taken as a point in an N-dimensional
problem space. Every particle holds a position, given by its
coordinates in the search space, which is updated with a
velocity. Each particle's position and velocity are updated
with reference to the best fitness value (classification
accuracy obtained through SVM) attained so far by that
particle.
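The paper does not state which PSO variant or update constants were used. The sketch below therefore assumes the common binary PSO formulation, in which each velocity component is passed through a sigmoid to give the probability that the corresponding feature bit is set. The inertia weight, the acceleration constants c1 and c2, the velocity clamp v_max and the toy fitness function are all assumptions made for illustration, not values from the paper.

```python
import math
import random

# Arbitrary made-up mask; the toy fitness rewards particles that match it.
TARGET = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]


def toy_fitness(position):
    """Hypothetical stand-in for SVM accuracy: counts bits matching TARGET."""
    return sum(a == b for a, b in zip(position, TARGET))


def binary_pso(fitness_fn, n_features=13, swarm_size=50, iterations=50,
               inertia=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    rng = random.Random(seed)
    positions = [[rng.randint(0, 1) for _ in range(n_features)]
                 for _ in range(swarm_size)]
    velocities = [[0.0] * n_features for _ in range(swarm_size)]
    # Personal bests start at the initial positions.
    pbest = [p[:] for p in positions]
    pbest_fit = [fitness_fn(p) for p in positions]
    g = max(range(swarm_size), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iterations):
        for i in range(swarm_size):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (inertia * velocities[i][d]
                     + c1 * r1 * (pbest[i][d] - positions[i][d])
                     + c2 * r2 * (gbest[d] - positions[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp to avoid saturation
                velocities[i][d] = v
                # Sigmoid of the velocity is the probability of selecting bit d.
                prob_one = 1.0 / (1.0 + math.exp(-v))
                positions[i][d] = 1 if rng.random() < prob_one else 0
            fit = fitness_fn(positions[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = positions[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = positions[i][:], fit
    return gbest, gbest_fit


gbest_mask, gbest_score = binary_pso(toy_fitness)
```

The returned mask plays the role of the global best particle: its 1-bits mark the selected features.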
Experiments and Results
Parameters setting
In this research, the parameters of the SVM kernel functions
are tuned in order to get optimal results. Three kernel
functions of the SVM classifier are explored: linear,
polynomial and Gaussian Radial Basis Function (RBF). Three
methods for finding the separating hyperplane of SVM are
explored: the Quadratic Programming (QP) algorithm, the Least
Squares (LS) algorithm and the Sequential Minimal Optimization
(SMO) algorithm. The values of the cost parameter (box
constraint), the order of the polynomial function and the RBF
scaling parameter sigma are found through a grid search.
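A grid search of this kind evaluates every combination of candidate parameter values and keeps the best-scoring one. The Python sketch below shows the idea under stated assumptions: the candidate values in the grid and the function toy_score are hypothetical, since the paper does not list the ranges that were searched. In practice the score would be the cross-validated SVM accuracy for the given parameters.

```python
import math
from itertools import product


def grid_search(score_fn, grid):
    """Try every parameter combination in `grid` and keep the best score."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(box_constraint, sigma):
    """Hypothetical score surface that peaks at box_constraint=1, sigma=12."""
    return -abs(math.log10(box_constraint)) - abs(sigma - 12)


# Candidate values are made up for the illustration.
grid = {"box_constraint": [0.1, 1, 10, 100], "sigma": [1, 5, 12, 15, 20]}
best_params, best_score = grid_search(toy_score, grid)
```

The cost grows with the product of the grid sizes, which is why coarse, often logarithmic, grids are a common starting point.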
The parameter settings used for GA in our experiments are as
follows:
No. of generations = 50, Population size = 50, Chromosome
length = 13, Selection mechanism = Tournament (size = 7),
Mutation rate = 0.01, Crossover rate = 0.8.
The parameter settings used for the PSO algorithm in our
experiments are as follows:
No. of generations = 50, Population size = 50, Particle length
= 13.
Classification results
In this research, a number of experiments are performed in
order to find the optimal combination of SVM parameters
in simple mode and in hybrid mode with GA and PSO. Only the
four test scenarios with the highest classification results
are reported and explained.
In the first two scenarios, SVM is explored in simple mode for
classification without feature selection.
Scenario 1: SVM with the Least Squares hyperplane finding
method and linear kernel function (SVM-LS-Linear), using
10-fold cross validation for the training and testing
divisions, achieved a mean accuracy of 87.56% over the 5
states of heart disease using all 13 attributes (symptoms).
Mean Accuracy over 5 States: 87.56%
For type 0 - Individual Accuracy: 83.17%
For type 1 - Individual Accuracy: 81.84%
For type 2 - Individual Accuracy: 88.23%
For type 3 - Individual Accuracy: 88.89%
For type 4 - Individual Accuracy: 95.64%
Scenario 2: SVM with the Sequential Minimal Optimization
hyperplane finding method and RBF kernel function
(SVM-SMO-RBF) with sigma = 12, using 10-fold cross validation
for the training and testing divisions, achieved a mean
accuracy of 87.95% over the 5 states of heart disease using
all 13 attributes (symptoms).
Mean Accuracy over 5 States: 87.95%
For type 0 - Individual Accuracy: 85.83%
For type 1 - Individual Accuracy: 81.84%
For type 2 - Individual Accuracy: 88.24%
For type 3 - Individual Accuracy: 88.23%
For type 4 - Individual Accuracy: 95.63%
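As a cross-check, the reported means can be compared with the unweighted average of the five per-class figures listed for Scenarios 1 and 2. This assumes that "mean accuracy" denotes a plain arithmetic mean over the five states, which the text does not state explicitly.

```python
# Per-class accuracies (%) as listed for Scenario 1 and Scenario 2.
scenario_1 = [83.17, 81.84, 88.23, 88.89, 95.64]
scenario_2 = [85.83, 81.84, 88.24, 88.23, 95.63]


def mean_accuracy(per_class):
    """Unweighted arithmetic mean of the per-class accuracies."""
    return sum(per_class) / len(per_class)


mean_1 = mean_accuracy(scenario_1)  # about 87.554, reported as 87.56
mean_2 = mean_accuracy(scenario_2)  # about 87.954, reported as 87.95
```

Under this assumption the Scenario 2 value matches the reported mean, and the Scenario 1 value differs by about 0.01, which may come from rounding of the per-class figures.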
In the other two scenarios, SVM is explored in hybrid mode for
classification, with feature selection through the GA and PSO
algorithms.
Figure 2. Classification Scores obtained through GA-SVM over 50
generations.
Figure 3. Detail of Fitness Values obtained through GA-SVM over 50
generations.
Scenario 3: The GA is used for optimal feature selection,
where the quality (fitness) of a feature set is evaluated
through the SVM classifier. The highest classification results
are obtained when SVM is used with the RBF kernel function
(sigma = 12) and the SMO hyperplane calculation method. The
mean accuracy over the 5 states achieved through the GA and
SVM hybrid approach is 88.10%.
The scores (classification accuracy) of the GA experiment over
50 generations are shown in Figure 2, where the 31st
generation gave the best fitness value. Detailed graphs of the
GA individuals and fitness values obtained through GA-SVM over
50 generations are shown in Figure 3.
The final chromosome of the GA, representing the best
individual with the highest fitness value, is shown in
Figure 4. This optimal subset of features gives the attributes
that indicate high risk when diagnosing heart disease.
Figure 4. GA Chromosome showing High Risk Features for Heart
Diseases Classification
The reduced feature set obtained through GA includes these
attributes: 3 - cp (chest pain type), 7 - restecg (resting
electrocardiographic results), 8 - thalach (maximum heart rate
achieved), 9 - exang (exercise induced angina), 11 - slope
(the slope of the peak exercise ST segment), 12 - ca (number
of major vessels colored by fluoroscopy) and 13 - thal.
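The mapping between a 13-bit chromosome and the attribute names of Table 1 can be illustrated with a short Python sketch. The bit mask here is reconstructed from the attribute numbers listed in the text, since the content of Figure 4 is not reproduced; the helper names encode and decode are hypothetical.

```python
# Attribute order as in Table 1 (position 1 = age, ..., position 13 = thal).
FEATURES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"]


def encode(selected_positions):
    """Build a 0/1 chromosome from 1-based attribute positions."""
    chosen = set(selected_positions)
    return [1 if position in chosen else 0
            for position in range(1, len(FEATURES) + 1)]


def decode(chromosome):
    """Return the names of the attributes whose bit is 1."""
    return [name for name, bit in zip(FEATURES, chromosome) if bit == 1]


chromosome = encode([3, 7, 8, 9, 11, 12, 13])
selected = decode(chromosome)
# chromosome -> [0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1]
# selected   -> ['cp', 'restecg', 'thalach', 'exang', 'slope', 'ca', 'thal']
```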
Scenario 4: The PSO algorithm is also employed for the
selection of the best feature set. The quality (fitness) of a
feature set is evaluated through the SVM classifier. In this
way, the classification accuracy attained by a feature subset
(particle) marks its final best point. The highest
classification accuracy of 88.24% is obtained with the RBF
kernel function of SVM (sigma = 15) and the SMO hyperplane
calculation method.
The PSO global best particle β (final best point in the search
space) with the high risk features is shown in Figure 5. The
digit '1' in the figure marks the high risk features, i.e., 3,
7, 8, 9, 11, 12 and 13.
Figure 5. The Global Best Particle β of PSO Representing High Risk
Features.
Comparison of heart diseases classification techniques
The overall results of the proposed hybrid approach and of
other hybrid and non-hybrid techniques for heart disease
classification and associated risk factor identification are
given in Table 2. The classification results achieved by the
proposed approach in the four scenarios (SVM-LS-Linear,
SVM-SMO-RBF, GA-SVM-SMO-RBF and PSO-SVM-SMO-RBF) outperform
the other techniques for heart disease classification [8-10].
Table 2. Comparison of classification results of different techniques
for multiclass heart disease problem.
Classification Algorithms                        Accuracy (%)
SVM-Linear [18]                                  86.62
SVM-Polynomial [19]                              83.9
GA-SVM [19]                                      72.55
SFM [20]                                         83.31
OCSFM [20]                                       86.73
NaïveBayes [18]                                  78.93
ANN [18]                                         86.04
SVM-LS-Linear (Proposed simple approach)         87.56
SVM-SMO-RBF (Proposed simple approach)           87.95
GA-SVM-SMO-RBF (Proposed hybrid approach)        88.10
PSO-SVM-SMO-RBF (Proposed hybrid approach)
Conclusion
In this paper, an evolution based hybrid approach is proposed
which exploits the SVM classifier and the GA and PSO
optimization techniques for the classification of multiple
states of heart disease. The GA and PSO algorithms are used to
select fewer but discriminative features in order to
significantly improve SVM classification accuracy. The
population-based evolutionary algorithms GA and PSO found the
same reduced feature set, as the optimal chromosome and the
final best point respectively. The search space for
identifying the best solution during heart disease data
analysis is therefore reduced, which in turn reduces the
consumption of computing resources. Moreover, a patient
visiting a cardiologist for the examination of a heart disease
may be examined for the discovered optimal subset of features
(symptoms) in less time and with less effort. The optimal
feature set is obtained with high accuracy and indicates a
high risk of the presence of a particular type of heart
disease in a patient.
References
1. Rad AE, Mohd Rahim MS, Rehman A, Altameem A, Saba
T. Evaluation of current dental radiographs segmentation
approaches in computer-aided applications. IETE Technical
Review 2013; 30: 210-222.
2. Norouzi A, Rahim MSM, Saba A, Rada T, Rehman AE,
Uddin M. Medical image segmentation methods,
algorithms, and applications. IETE Tech Rev 2014; 31.
3. Saman I, Sharifullah K, Zahid A, Muhammad K.
GenInfoGuard-A Robust and Distortion-Free Watermarking
Technique for Genetic Data. PloS one 2015.
4. Fatima K, Arooj A, Majeed H. A new texture and shape
based technique for improving meningioma classification.
Microscopy Res Tech 2014; 77: 862-873.
5. Saba T, Rehman A, Sulong G. An intelligent approach to
image denoising. J Theor Appl Informa Technol 2010; 17:
32-36.
6. Saba T, Almazyad AS, Rehman A. Online versus offline
Arabic script classification. Neural Computing Appl 2016;
27: 1797-1804.
7. Jadooki S, Mohamad D, Saba T, Almazyad AS, Rehman A.
Fused features mining for depth-based hand gesture
recognition to classify blind human communication. Neural
Comput Appl 2016.
8. Saba T, Rehman A, Sulong G. Cursive script segmentation
with neural confidence. Int J Innovat Comput Informa
Control (IJICIC) 2011; 7: 1-10.
9. Saba T, Rehman A. Machine Learning and Script
Recognition. Lambert Academic publisher 2012; 37-39.
10. Husham A, Alkawaz H, Saba M, Rehman T, Alghamdi AS.
Automated nuclei segmentation of malignant using level
sets. Microscopy Res Technique 2016.
11. Muhsin ZF, Rehman A, Altameem A, Saba A, Uddin M.
Improved quadtree image segmentation approach to region
information. Imaging Sci J 2014; 62: 56-62.
12. Rehman A, Saba T. Neural network for document image
preprocessing. Artificial Int Rev 2014; 42: 253-273.
13. Rehman A, Saba T. Off-line cursive script recognition:
current advances, comparisons and remaining problems.
Art Int Rev 2012; 37: 261-268.
14. Saba T, Rehman A, Gulong S. Improved statistical features
for cursive character recognition. Int J Innovat Comput
Informa Control (IJICIC) 2011; 7: 5211-5224.
15. Saba T, Al-Zahrani S, Rehman A. Expert system for offline
clinical guidelines and treatment. Life Sci J 2012; 9:
2639-2658.
16. Rad AE, Rahim MSM, Rehman A, Saba T. Digital dental
X-ray database for caries screening. 3D Research 2016; 7:
1-5.
17. Younus ZS, Mohamad D, Saba T, Alkawaz MH, Rehman
A, Al-Rodhaan M, Al-Dhelaan A. Content-based image
retrieval using PSO and k-means clustering algorithm,
Arabian J Geosci 2015; 8: 6211-6224.
18. Sundar NA, Latha NPP, Chandra R. Performance analysis
of classification data mining techniques over heart disease
database. Int J Curr Eng Technol 2012; 2: 470-478.
19. Bhatia NN, Chow G, Timon SJ, Watts HG. Diagnostic
modalities for the evaluation of pediatric back pain: a
prospective study. J Pediatr Orthop 2008; 28: 230-233.
20. Adeli A, Neshat M. A fuzzy expert system for heart disease
diagnosis. Proc Inte Multiconference Eng Comp Sci Hong
Kong 2010.
*Correspondence to
Tanzila Saba
College of Computer and Information Sciences
Prince Sultan University Riyadh, 11586
Saudi Arabia
... DL systems have been shown to be effective for analyzing medical images [1]. Some familiar image modalities used to detect the abnormalities in human organs are MRI (Magnetic Resonance Imaging) [2], X-ray, CT (Computed Tomography) scan [3], Ultrasound, and Mammography [4]. Enhancing diagnostic capacity is the goal of medical image processing [5]. ...
... Enhancing diagnostic capacity is the goal of medical image processing [5]. Image enhancement, segmentation, feature extraction, and classification are the main components of medical image analysis [3,[6][7][8]. This paper reviews the image analyses of six different diseases, viz., lung cancer, colorectal cancer, liver cancer, stomach cancer, breast cancer, and brain tumors. ...
... The inflammatory disease of the blood vessels, known as atherosclerosis, is the leading cause of CVD. Endothelium dysfunction causes atherosclerosis by causing damage to the blood vessel's inner surface's thin wall, creating increasingly complex lesions and fatty streaks inside artery walls (Iftikhar et al. 2017b;Ragab et al. 2022b). Ece et al. (2022) research included 600 patients, 300 females and 300 males, diagnosed with IHD due to data acquired from reports of patients who had Coronary angiography CAG at the Frat University Hospital. ...
Article
Full-text available
Radiological image analysis using machine learning has been extensively applied to enhance biopsy diagnosis accuracy and assist radiologists with precise cures. With improvements in the medical industry and its technology, computer-aided diagnosis (CAD) systems have been essential in detecting early cancer signs in patients that could not be observed physically, exclusive of introducing errors. CAD is a detection system that combines artificially intelligent techniques with image processing applications thru computer vision. Several manual procedures are reported in state of the art for cancer diagnosis. Still, they are costly, time-consuming and diagnose cancer in late stages such as CT scans, radiography, and MRI scan. In this research, numerous state-of-the-art approaches on multi-organs detection using clinical practices are evaluated, such as cancer, neurological, psychiatric, cardiovascular and abdominal imaging. Additionally, numerous sound approaches are clustered together and their results are assessed and compared on benchmark datasets. Standard metrics such as accuracy, sensitivity, specificity and false-positive rate are employed to check the validity of the current models reported in the literature. Finally, existing issues are highlighted and possible directions for future work are also suggested.
... An automated Tumor Diagnosis model from CT Lung images has been proposed in [26], using Marker-Controlled Watershed algorithm. For classification, Support Vector Machine (SVM) has been used. ...
Preprint
Full-text available
In today’s worldwide health scenario, Lung Cancer has the highest rates of mortality and morbidity. The accurate and clinical staging of lung cancer category can effectively reduce the death rate, since the treatment phase requires the specific stage of cancer. However, the staging of lung cancer still remains challenging, requires more efforts. The Computed Tomography images (CT) images are utilized for the Computer Aided Diagnosis based cancer diagnosis. With that note, this paper develops a Volumetric Analysis for Lung Tumor Staging and Classification (VA-LTSC), in which the stages are classified based on Tumor Nodule Metastasis (TNM) with Machine Learning and Deep Learning Models Moreover, the proposed model comprises different phases. The results are measured using inputs from LIDC-IDRI and LUNA 16, based on classification accuracy, model effectiveness and time complexities and in all, the proposed model outperforms the existing results.
... In other words, the presence of discriminative features is crucial for successful classification. An excessive number of features can lead to confusion for the classifier, while too few features may not be sufficient for accurate classification (Iftikhar et al., 2017). Therefore, this paper suggests the use of Deep Learning (DL) techniques for the classification of ALL into its various subtypes. ...
Article
Acute lymphoblastic leukemia (ALL) is a life‐threatening disease that commonly affects children and is classified into three subtypes: L1, L2, and L3. Traditionally, ALL is diagnosed through morphological analysis, involving the examination of blood and bone marrow smears by pathologists. However, this manual process is time‐consuming, laborious, and prone to errors. Moreover, the significant morphological similarity between ALL and various lymphocyte subtypes, such as normal, atypic, and reactive lymphocytes, further complicates the feature extraction and detection process. The aim of this study is to develop an accurate and efficient automatic system to distinguish ALL cells from these similar lymphocyte subtypes without the need for direct feature extraction. First, the contrast of microscopic images is enhanced using histogram equalization, which improves the visibility of important features. Next, a fuzzy C‐means clustering algorithm is employed to segment cell nuclei, as they play a crucial role in ALL diagnosis. Finally, a novel convolutional neural network (CNN) with three convolutional layers is utilized to classify the segmented nuclei into six distinct classes. The CNN is trained on a labeled dataset, allowing it to learn the distinguishing features of each class. To evaluate the performance of the proposed model, quantitative metrics are employed, and a comparison is made with three well‐known deep networks: VGG‐16, DenseNet, and Xception. The results demonstrate that the proposed model outperforms these networks, achieving an approximate accuracy of 97%. Moreover, the model's performance surpasses that of other studies focused on 6‐class classification in the context of ALL diagnosis. Research Highlights Deep neural networks eliminate the requirement for feature extraction in ALL classification The proposed convolutional neural network achieves an impressive accuracy of approximately 97% in classifying six ALL and lymphocyte subtypes.
... Authors in [37] introduced a novel fitness function for Particle Swarm Optimization (PSO) using Support Vector Machine (SVM) and achieve average classification accuracy of 84.36% and highest accuracy of 88.22% using SVM. Combination of PSO and SVM is also used in [38] .Rough sets and firefly algorithms are used in [39] for attribute selection and interval type-2 fuzzy system is used to predict the heart disease with an accuracy of 88.3% over UCI heart disease dataset. Firefly algorithm is also used in [40] along with the Opposition Based Learning (OBL) to implement a a hybrid OFBAT-RBFL (Optimal Rule Generation using Opposition based Firefly and BAT algorithm-Rule Based Fuzzy Learning) heart disease diagnosis system which attains a relatively low classification accuracy of 78% on the UCI heart disease dataset. ...
Preprint
Full-text available
In this article, we explore the natural problem of class imbalance in the heart disease datasets. Aiming for a comprehensive examination of the class balancing techniques we train and test our models on three different datasets all suffering from different degree of class imbalance. The Healthcare dataset (mid size) and the BRFSS-2015 dataset (large dataset) having huge class imbalance, and the Iraq hospital dataset with mild class imbalance. Feature selection is done using backward elimination and Logistic Regression (LR), Adaptive Boosting (ADB), XGBoost (XGB), Random Forest (RF), and Ensemble classifiers are combined with the four popular class balancing techniques namely Random Under Sample, Random Over Sample, SMOTETomek, and ADASYN. As proven by results, Random Under Sampling and Random Over Sampling combined with Logistic Regression and ADA Boost proves the best combination for the highly imbalanced Healthcare Stroke Dataset and BRFSS-2015 dataset by producing almost 15%-20% increase in the in F1-Score of the minority class and AU-ROC making the classifier model more robust but with a minor decline in the overall accuracy. For the mildly imbalanced Iraq Hospital dataset the same combination marginally improves the F1-Score and AU-ROC without compromising on the overall accuracy and the F1-Score of majority class.
... A challenging signal processing issue is classifying ECG data into cardiac disorders [1,2]. Effective signal capturing, pre-processing, filtering, segmentation, feature extraction, feature selection, classification, and post-processing blocks are all included in this. ...
Article
Full-text available
The signals produced by an electrocardiogram (ECG) are made up of intricate pattern sequences that have a periodic structure. These pattern sequences contain an initial P-wave, which denotes the beginning of an ECG wave, a QRS sequence, which denotes the intensity of the pulse, and a T segment, which denotes the conclusion of the wave. Characteristics such as PR interval, QRS interval, QT interval, ST interval, R to R interval, etc. are employed to recognize chronic, ischemic, and other cardiac illnesses. Classifying these wave patterns into cardiac disorders requires the simultaneous execution of many high-complexity signal processing operations. Signal pre-processing, feature extraction, feature selection, classification into epileptic and non-epileptic seizures, and post-processing are some of these operations. For each of these operations, researchers create a wide range of algorithms. The performance of these algorithms differs significantly in terms of the number of leads utilized for ECG collection, filtering effectiveness, feature extraction & selection effectiveness, and classifier effectiveness. Thus, researchers and system designers have become uncertain when choosing the optimum algorithm set for an application. This text offers a thorough analysis & design of a fuzzy CNN model across a broad range of epileptic & non-epileptic seizure classification techniques to lessen this ambiguity. Convolutional neural network (CNN) based models outperform other models in terms of general-purpose performance, whereas application-specific deployments need customized fuzzy CNN models. The observation that fuzzy logic techniques are not used when constructing ECG classification models points to a significant area for research & development. Based on these findings, a new fuzzy logic-based classification method is provided in this text that employs quantization techniques to transform input ECG signals into fuzzy values. With these parameters and a specially created CNN model, it was possible to achieve an accuracy of 99.5% for diverse ECG datasets. This accuracy was compared to a number of cutting-edge models, and it was observed that the suggested model is quite good at categorizing ECG signals. The suggested technique was observed to be quicker than traditional procedures due to the usage of a fuzzy logic model, enhancing its scalability for a broad range of clinical applications.
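The abstract does not say how the ECG samples are quantized into fuzzy values. As a rough illustration only, the sketch below assumes evenly spaced triangular membership functions over a clipped amplitude range. The function names, the range [-1, 1], and the number of levels are all hypothetical choices, not the authors' method.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), rising to 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify(samples, lo=-1.0, hi=1.0, levels=3):
    """Map each sample to membership degrees over `levels` fuzzy sets.

    Assumption: the sets are evenly spaced between lo and hi, and
    samples outside that range are clipped to it.
    """
    step = (hi - lo) / (levels - 1)
    centers = [lo + i * step for i in range(levels)]
    result = []
    for s in samples:
        s = min(max(s, lo), hi)  # clip to the supported amplitude range
        result.append([triangular(s, c - step, c, c + step) for c in centers])
    return result


if __name__ == "__main__":
    # A sample of 0.5 belongs half to the "middle" set and half to the "high" set.
    print(fuzzify([0.5]))
```

Each sample becomes a short vector of membership degrees, which is one plausible form of input for a downstream classifier such as a CNN.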
Article
Purpose: Vaginal infections are prevalent causes of gynecological consultations. This study introduces and evaluates the efficacy of four Machine Learning algorithms in detecting vaginitis cases in southern Mexico. Methods: Utilizing Simple Perceptron, Naïve Bayes, CART, and AdaBoost, we conducted classification experiments to identify four vaginitis subtypes (gardnerella, candidiasis, trichomoniasis, and chlamydia) in 600 patient cases. Results: The outcomes are promising, with a majority achieving 100% accuracy in vaginitis identification. Conclusion: The successful implementation and high accuracy of these algorithms demonstrate their potential as valuable diagnostic tools for vaginal infections, particularly in southern Mexico. This is crucial in a region where health technology adoption lags behind and intelligent software support for gynecological diagnoses is limited.
Article
Gastrointestinal diseases cause around two million deaths globally. Wireless capsule endoscopy is a recent advancement in medical imaging, but manual diagnosis is challenging due to the large number of images generated. This has led to research into computer-assisted methodologies for diagnosing these images. Endoscopy produces thousands of frames for each patient, making manual examination difficult, laborious, and error-prone. An automated approach is essential to speed up the diagnosis process, reduce costs, and potentially save lives. This study proposes transfer learning-based efficient deep learning methods for detecting gastrointestinal disorders from multiple modalities, aiming to detect gastrointestinal diseases with superior accuracy and reduce the efforts and costs of medical experts. The Kvasir eight-class dataset was used for the experiment, where endoscopic images were preprocessed and enriched with augmentation techniques. An EfficientNet model was optimized via transfer learning and fine tuning, and the model was compared to the most widely used pre-trained deep learning models. The model’s efficacy was tested on another independent endoscopic dataset to prove its robustness and reliability.
Article
Segmentation of objects from a noisy and complex image is still a challenging task that needs to be addressed. This article proposes a new method to detect and segment nuclei to determine whether they are malignant or not. The method determines the region of interest, removes noise, enhances the image, applies candidate detection on the centroid transform to evaluate the centroid of each object, and applies the level set (LS) to segment the nuclei. The proposed method consists of three main stages: preprocessing, seed detection, and segmentation. The preprocessing stage prepares the image conditions to ensure that they meet the segmentation requirements. Seed detection detects the seed point to be used in the segmentation stage, which refers to the process of segmenting the nuclei using the LS method. In this research work, 58 H&E breast cancer images from the UCSB Bio-Segmentation Benchmark dataset are evaluated. The proposed method shows high performance and accuracy in comparison to the techniques reported in the literature. The experimental results are also consistent with the ground truth images.
Article
Gesture recognition and hand pose tracking are applicable techniques in human–computer interaction fields. Depth data obtained by depth cameras present a very informative description of the body, or in particular the hand pose, that can be used for more accurate gesture recognition systems. Hand detection and feature extraction are very challenging tasks in RGB images, but they can be effectively resolved in simple ways with depth data. However, depth data could be combined with the color information for more reliable recognition. A common hand gesture recognition system requires identifying the hand and its position or direction, extracting some useful features and applying a suitable machine-learning method to detect the performed gesture. This paper presents a novel fusion of enhanced features for the classification of static signs of the sign language. It begins by explaining how the hand can be separated from the scene using depth data. Then, a combined feature extraction method is introduced for extracting some appropriate features of the images. Finally, an artificial neural network classifier is trained with these fused features and applied to critically analyze the performance of various descriptors.
Article
Images are full of information, and most often only a little of that information is needed for subsequent processing. Hence, the region of interest has key importance in image processing. Quadtree image segmentation has been widely used in many image processing applications to locate the region of interest for further processing. There are also variable block-size image coding techniques to effectively reduce the number of transmitted parts. This paper presents the quadtree partition technique as a pre-processing step in image processing to determine which parts are more heterogeneous than the others. It also introduces an idea to solve the problem of squared images. Finally, the proposed approach is implemented and analysed. The Matlab simulation of the quadtree is presented through the algorithms and figures. The achieved results are promising with respect to the state of the art.
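As a minimal sketch of the general quadtree partition idea (not the paper's Matlab implementation), the Python function below recursively splits a block into four quadrants whenever its intensity range exceeds a threshold. It assumes a square grayscale image whose side is a power of two, so it does not address the squared-image problem the abstract mentions. The homogeneity test (max minus min) and all names are illustrative assumptions.

```python
def quadtree(img, x, y, size, threshold, min_size=1):
    """Return the leaf blocks of a quadtree partition as (x, y, size) tuples.

    img is a list of rows of grayscale values. A block is split when the
    difference between its largest and smallest value exceeds `threshold`
    (an assumed homogeneity criterion).
    """
    values = [img[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    if size <= min_size or max(values) - min(values) <= threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree(img, x + dx, y + dy, half, threshold, min_size)
    return leaves


if __name__ == "__main__":
    # Only the top-left quadrant is heterogeneous, so only it is subdivided.
    image = [
        [0, 9, 5, 5],
        [0, 0, 5, 5],
        [7, 7, 3, 3],
        [7, 7, 3, 3],
    ]
    for block in quadtree(image, 0, 0, 4, threshold=1):
        print(block)
```

Small leaf blocks mark heterogeneous regions, which is how a partition of this kind can point later processing toward a region of interest.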
Article
Genetic data, in digital format, is used in different biological phenomena such as DNA translation, mRNA transcription and protein synthesis. The accuracy of these biological phenomena depends on genetic codes and all subsequent processes. To computerize the biological procedures, different domain experts are provided with authorized access to the genetic codes; as a consequence, ownership protection of such data becomes essential. For this purpose, watermarks serve as the proof of ownership of data. While protecting data, embedded hidden messages (watermarks) influence the genetic data; therefore, the accurate execution of the relevant processes and the overall result becomes questionable. Most of the DNA based watermarking techniques modify the genetic data and are therefore vulnerable to information loss. Distortion-free techniques make sure that no modifications occur during watermarking; however, they are fragile to malicious attacks and therefore cannot be used for ownership protection (particularly, in presence of a threat model). Therefore, there is a need for a technique that is robust and also prevents unwanted modifications. In this spirit, a watermarking technique with the aforementioned characteristics has been proposed in this paper. The proposed technique makes sure that: (i) the ownership rights are protected by means of a robust watermark; and (ii) the integrity of genetic data is preserved. The proposed technique, GenInfoGuard, ensures its robustness through the "watermark encoding" in permuted values, and exhibits high decoding accuracy against various malicious attacks.
Article
In various application domains such as websites, education, crime prevention, commerce, and biomedicine, the volume of digital data is increasing rapidly. Trouble arises when retrieving the data from the storage media because some of the existing methods compare the query image with all images in the database; as a result, the search space and computational complexity increase accordingly. Content-based image retrieval (CBIR) methods aim to accurately retrieve images similar to the query image from large image databases, based on the similarity between image features. In this study, a new hybrid method has been proposed for image clustering based on combining particle swarm optimization (PSO) with the k-means clustering algorithm. It is presented as a CBIR method that uses the color and texture of images as visual features to represent them. The proposed method measures similarity using four extracted features: color histogram, color moment, co-occurrence matrices, and wavelet moment. The experimental results indicate that the proposed system has superior performance compared to the other system in terms of accuracy.
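To make the first of the four features concrete, the sketch below computes a normalized RGB color histogram and compares two images with it. Histogram intersection is used as the similarity measure purely for illustration; the abstract does not state which measure the authors use, and the bin count and function names are assumptions. The PSO and k-means clustering step is not shown.

```python
def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram with `bins` bins per channel.

    pixels is a non-empty list of (r, g, b) tuples with values in 0..255.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each channel to a bin index, then combine into one flat index.
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1
    n = len(pixels)
    return [h / n for h in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means the two normalized histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


if __name__ == "__main__":
    red = [(255, 0, 0)] * 4
    half_red_half_blue = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2
    query = color_histogram(red)
    print(histogram_intersection(query, color_histogram(red)))
    print(histogram_intersection(query, color_histogram(half_red_half_blue)))
```

In a clustering-based retrieval scheme, a query would be compared only with images in the nearest cluster rather than with the whole database, which is the search-space reduction the abstract refers to.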
Article
This paper presents a new, simple and fast approach for character segmentation of unconstrained handwritten words. The proposed approach first seeks the possible character boundaries based on analysis of the characters' geometric features. However, due to inherent ambiguity and a lack of context, a few characters are over-segmented. To increase the efficiency of the proposed approach, an Artificial Neural Network is trained with a significant number of valid segmentation points for cursive handwritten words. The trained neural network identifies incorrectly segmented points efficiently and at high speed. For fair comparison, the benchmark database CEDAR is used. The experimental results are promising from the complexity and accuracy points of view.
Article
A standard database is an essential requirement for comparing the performance of image analysis techniques. The main issue in dental image analysis is the lack of an available image database, which this paper provides. Periapical dental X-ray images that are suitable for any analysis and approved by many dental experts are collected. This type of dental radiograph imaging is common and inexpensive, and is normally used for dental disease diagnosis and abnormality detection. The database contains 120 various periapical X-ray images from the top to the bottom jaw. The dental digital database is constructed to provide a source for researchers to use and compare image analysis techniques and to improve or manipulate the performance of each technique.
Article
Offline clinical guidelines are typically designed to integrate a clinical knowledge base, patient data and an inference engine to generate case-specific advice. In this regard, offline clinical guidelines are still popular among healthcare professionals for updating and supporting clinical guidelines. Although their current format and development process have several limitations, these could be improved with artificial intelligence approaches such as expert systems/decision support systems. This paper first presents an up-to-date critical review of existing clinical expert systems, namely AAPHelpm, MYCIN, EMYCIN, PIP, GLIF and PROforma. Additionally, an analysis is performed to evaluate all these fundamental clinical expert systems. Finally, this paper presents the proposed research and development of a clinical expert system to help healthcare professionals with treatment.
Article
Arabic script classification is a complex area of research in the field of computer vision. Offline Arabic script classification has recently attracted the interest of many researchers, as it is assumed that online Arabic script recognition is comparatively simple and significant achievements have been attained. Numerous researchers have dealt with the issues involved in pre-processing and post-processing techniques for Arabic script and presented various approaches to improve its accuracy rate. However, offline Arabic script classification and its related issues are still open. In this paper, we focus on pre-processing to post-processing techniques and highlight several issues in each phase in order to show the need for high classification performance in Arabic script classification (offline and online). Additionally, top experimental results are reported, discussed and compared, and current challenges are also discussed. Finally, online versus offline Arabic script recognition achievements are also compared.
Article
The healthcare industry collects huge amounts of healthcare data which, unfortunately, are not "mined" to discover hidden information for effective decision making. Discovery of hidden patterns and relationships often goes unexploited. Advanced data mining techniques can help remedy this situation. This paper describes a prototype using data mining techniques, namely Naïve Bayes and WAC (weighted associative classifier). This system can answer complex "what if" queries which traditional decision support systems cannot. Using medical profiles such as age, sex, blood pressure and blood sugar, it can predict the likelihood of patients getting a heart disease. It enables significant knowledge, e.g. patterns and relationships between medical factors related to heart disease, to be established. It can serve as a training tool to train nurses and medical students to diagnose patients with heart disease. It is a web-based, user-friendly system and can be used in hospitals if they have a data warehouse. Presently we are analyzing the performance of the two classification data mining techniques by using various performance measures.
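The Naïve Bayes part of such a prototype can be sketched in a few lines. The example below is a generic categorical Naïve Bayes classifier with Laplace smoothing, not the authors' system. The four training records and their labels are invented toy values used only to make the code runnable, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(records, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            feature_counts[(label, i)][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values


def predict(model, record):
    """Return the class with the highest log-posterior under Naive Bayes."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        log_p = math.log(n / total)  # class prior
        for i, value in enumerate(record):
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = feature_counts[(label, i)][value]
            log_p += math.log((count + 1) / (n + len(values[i])))
        scores[label] = log_p
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Invented toy profiles: (age group, blood pressure). Not real patient data.
    records = [("old", "high"), ("old", "high"), ("young", "normal"), ("young", "normal")]
    labels = ["at_risk", "at_risk", "not_at_risk", "not_at_risk"]
    model = train(records, labels)
    print(predict(model, ("old", "high")))
```

A "what if" query of the kind the abstract mentions amounts to changing one attribute of a profile and calling `predict` again.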