Journal of Telecommunication, Electronic and Computer Engineering, e-ISSN: 2289-8131, Vol. 10 No. 3-2
Early Detection of Breast Cancer Using Machine
Learning Techniques
M. Tahmooresi¹, A. Afshar², B. Bashari Rad¹, K. B. Nowshath¹ and M. A. Bamiah²
¹Asia Pacific University of Technology and Innovation (APU), Malaysia.
²University of Malaya, Malaysia.
maryam.tahmooresi@yahoo.com
Abstract: Cancer is the second leading cause of death in the world; 8.8 million patients died of cancer in 2015. Breast cancer is the leading cause of cancer death among women. Considerable research has been done on early detection of breast cancer so that treatment can start sooner and the chance of survival increases. Most studies have concentrated on mammogram images. However, mammogram images carry a risk of false detection that may endanger the patient's health. It is vital to find alternative methods that are easier to implement, work with different data sets, are cheaper and safer, and produce more reliable predictions. This paper proposes a hybrid model that combines several Machine Learning (ML) algorithms, including Support Vector Machine (SVM), Artificial Neural Network (ANN), K-Nearest Neighbor (KNN), and Decision Tree (DT), for effective breast cancer detection. This study also discusses the datasets used for breast cancer detection and diagnosis. The proposed model can be used with different data types such as images and blood samples.
Index Terms: Breast Cancer; Breast Cancer Detection; Medical Images; Machine Learning.
I. INTRODUCTION
The World Health Organization (WHO) has reported that breast cancer is the most common cancer among women globally [1]. It is also the leading cause of cancer death among women worldwide [2, 3]. In Malaysia, breast cancer accounts for the highest share of cancer deaths, around 25%, and is the most common cancer among women [4]. Around 5% of Malaysian women are at risk of breast cancer, while in Europe and the United States the figure is around 12.5% [3]. Women with breast cancer in Malaysia also present at a later stage of the disease than women from other countries [4].
Usually, breast cancer can be detected easily if specific symptoms appear. However, many women who have breast cancer show no symptoms. Hence, regular breast cancer screening is very important for early detection [3]. Early detection of breast cancer enables early diagnosis and treatment, which matters because the prognosis is very important for long-term survival [5]. Since early detection, diagnosis, and treatment of cancer can reduce the risk of death, they play a significant role in saving the life of the patient. Any delay in detecting cancer in its early stages leads to disease progression and complicates treatment [5]. A long waiting time before breast cancer is diagnosed and treatment starts is therefore of prognostic concern.
Previous studies of the consequences of a late cancer diagnosis confirm that it is strongly associated with progression of the disease to more advanced stages, and consequently with a lower chance of saving the patient's life. In a systematic review by Richards et al. [6], an analysis of 87 studies concluded that female breast cancer patients who start therapy less than 3 months after symptoms appear have a significantly higher chance of survival compared to those who wait more than 3 months.
Many previous studies confirm that detecting breast cancer in its early stages significantly increases the chance of survival because it prevents malignant cells from spreading throughout the body [6].
The main contribution of this paper is a review of the role of machine learning techniques in the early detection of breast cancer.
Artificial Intelligence (AI) can be applied to improve breast cancer detection and diagnosis, as well as to prevent overtreatment. Combining AI and Machine Learning (ML) methods enables prediction and supports accurate decision making, for example deciding from biopsy results whether or not a patient needs surgery.
Currently, mammograms are the most widely used test. However, they still produce false positive (high-risk) results, in which cells appear abnormal, and these can lead to unnecessary biopsies and surgeries. Sometimes surgery performed to remove a lesion reveals that it is benign and therefore not harmful. This means that the patient goes through painful and expensive surgery unnecessarily.
ML algorithms offer many useful properties, such as effective performance on healthcare-related datasets involving images, x-rays, blood samples, etc. Some methods are appropriate for small datasets, whereas others are suitable for very large datasets. However, noise can be a problem for some methods.
This paper is organized as follows. Section II briefly introduces breast cancer. Section III explains the ML algorithms used for detecting breast cancer. A summary of previous related work is given in Section IV. Finally, Section V concludes the paper.
II. BREAST CANCER
Breast cancer is the most commonly found cancer in women worldwide. An abnormal growth of a mass of tissue causes malignant cells to expand, which leads to acute breast cancer. These malignant cells originate in the milk glands of the breast. They can be classified into different groups according to their abnormal progression and their capability of affecting other normal cells [7]. This capability refers to whether the malignant cells affect only local cells or can spread throughout the body. The spread of malignant cells throughout the patient's body is called metastasis [7]. It is very important to prevent this spread by diagnosing cancer in its early stages using advanced techniques and equipment. In recent decades, there have been many efforts to employ artificial intelligence and related methods to assist in detecting cancer at earlier stages.
Early detection of cancer raises the chance of survival to 98% [8]. Figure 1 shows the distribution of different types of cancer, with breast cancer leading at 24%.
Figure 1: Types of cancer
III. MACHINE LEARNING METHODS
Machine Learning is a process in which machines (computers) are trained with data to make decisions for similar cases [9]. ML is employed in various applications, such as object recognition, network security, and healthcare. ML methods can be single or hybrid; examples include ANN, SVM, Gaussian Mixture Model (GMM), K-Nearest Neighbor (KNN), Linear Regressive Classification (LRC), and Weighted Hierarchical Adaptive Voting Ensemble (WHAVE). The ML algorithms used are as follows:
A. Artificial Neural Network (ANN)
ANN is a model inspired by the nervous system of the human brain, with a large number of nodes connected to each other. Each node has two states: 0 means inactive and 1 means active. Each node also has a positive or negative weight that adjusts the strength of the node and can activate or deactivate it. Samples of data are provided to the ANN to train the machine. The trained machine is then used to detect patterns hidden in the data. It can search for patterns in patients' healthcare and personal records to identify high-risk lesions [10].
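The behaviour of a single node can be illustrated with a minimal sketch. It assumes a sigmoid activation and a 0.5 threshold, neither of which is specified in the text above; the feature values, weights, and function names are hypothetical and chosen only for illustration.

```python
import math

def neuron_output(inputs, weights, bias):
    """Return the activation of one node: sigmoid of the weighted input sum."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def neuron_state(inputs, weights, bias, threshold=0.5):
    """Map the activation to a binary state: 1 (active) or 0 (inactive)."""
    return 1 if neuron_output(inputs, weights, bias) >= threshold else 0

# Hypothetical feature values and weights (illustrative assumptions only).
features = [0.8, 0.2]
weights = [2.0, -1.0]   # a positive weight excites the node, a negative one inhibits it
bias = -0.5
state = neuron_state(features, weights, bias)
```

A full ANN stacks many such nodes in layers and learns the weights from training data; this sketch only shows how weights push one node toward the active or inactive state.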
B. Support Vector Machine (SVM)
SVM is a supervised pattern classification model that is used as a training algorithm for learning classification and regression rules from gathered data [11]. The purpose of this method is to find a hyperplane that separates the data with the largest possible minimum distance (margin) to the nearest samples. SVM is used to classify two or more data types. SVM includes single or hybrid models such as Standard SVM (St-SVM), Proximal Support Vector Machine (PSVM), Newton Support Vector Machine (NSVM), Lagrangian Support Vector Machines (LSVM), Linear Programming Support Vector Machines (LPSVM), and Smooth Support Vector Machine (SSVM).
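The idea of a separating hyperplane with a large minimum distance can be made concrete with a small sketch. It does not train an SVM; it only measures the margin of two candidate hyperplanes on hypothetical 2-D points, under the assumption that labels are encoded as +1 and -1. All names and values are illustrative.

```python
import math

def signed_distance(point, w, b):
    """Signed distance from a point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm

def margin(points, labels, w, b):
    """Smallest label-weighted distance; positive only if every point is on its correct side."""
    return min(y * signed_distance(x, w, b) for x, y in zip(points, labels))

# Hypothetical 2-D samples with labels +1 / -1 (illustrative only).
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

# Two candidate hyperplanes that both separate the data.
m_a = margin(points, labels, w=(1.0, 1.0), b=-2.0)   # line x + y = 2
m_b = margin(points, labels, w=(1.0, 1.0), b=-3.0)   # line x + y = 3
```

Both lines separate the two classes, but the first leaves a wider margin, so it is the one an SVM-style criterion would prefer.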
C. K-Nearest Neighbors (KNN)
KNN is a supervised learning method used for diagnosing and classifying cancer [12]. In this method, the computer is trained with labeled data from a specific field and is then given new, unknown data. For each unknown sample, the machine finds the K most similar training samples (its K nearest neighbors) and assigns the class that is most common among them. A large training dataset is recommended, and K is usually chosen to be an odd number so that votes are not tied.
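The neighbor search and vote described above can be sketched as follows. This is a minimal illustration that assumes Euclidean distance and a simple majority vote; the sample points, labels, and function name are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-feature samples (illustrative only).
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

prediction = knn_predict(train_points, train_labels, query=(4.5, 4.7), k=3)
```

With two classes, an odd k guarantees that the vote cannot end in a tie.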
D. Decision Tree (DT)
DT is a data mining technique used for early detection of breast cancer. It is a model that represents classification or regression as a tree. In this model, the data set is split into smaller subsets, and these are split again into still smaller ones. The tree grows in this way, and the result is given at the last level. In the tree structure, the leaves represent the class labels, whereas the branches represent conjunctions of features leading to those class labels. DT is reported not to be sensitive to noise [13].
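How a tree chooses one split can be sketched with Gini impurity, which is one common splitting criterion; the text above does not say which criterion the cited work uses, so this choice is an assumption. The feature values, labels, and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split `value <= threshold`."""
    best = (None, float("inf"))
    for t in sorted(set(values))[:-1]:          # the largest value cannot split the data
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical single feature (for example a cell-size score) with class labels.
values = [1, 2, 2, 3, 7, 8, 9, 10]
labels = ["B", "B", "B", "B", "M", "M", "M", "M"]
threshold, impurity = best_threshold(values, labels)
```

A complete tree applies the same search recursively to each resulting subset until the leaves are pure or a stopping rule is reached.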
E. Random Forest (RF) Algorithm
The RF algorithm aims at the regularization point where model quality is highest and the bias and variance problems are traded off against each other [14]. RF builds a large number of DTs, each trained on a random sample drawn with replacement, to overcome the weaknesses of a single DT. Each tree classifies the observation, and the class with the majority of votes is chosen. RF can also be used in unsupervised mode for assessing proximities among data points.
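The two ingredients named above, sampling with replacement and majority voting, can be sketched in a simplified form. This is not a full RF: each "tree" is replaced by a one-threshold stump, the random feature selection used by real RF implementations is omitted because there is only one feature, and exactly two classes are assumed. Data and names are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random with replacement."""
    return [rng.choice(rows) for _ in rows]

def fit_stump(rows):
    """Tiny stand-in for a tree: threshold at the midpoint of the two class means."""
    by_class = {}
    for value, label in rows:
        by_class.setdefault(label, []).append(value)
    if len(by_class) < 2:                      # the sample contained one class only
        only = next(iter(by_class))
        return lambda value: only
    means = {c: sum(v) / len(v) for c, v in by_class.items()}
    low, high = sorted(means, key=means.get)   # assumes exactly two classes
    cut = (means[low] + means[high]) / 2
    return lambda value: low if value <= cut else high

def forest_predict(stumps, value):
    """Each stump votes; the majority class is returned."""
    return Counter(stump(value) for stump in stumps).most_common(1)[0][0]

rng = random.Random(0)                          # fixed seed for repeatability
rows = [(1, "B"), (2, "B"), (3, "B"), (8, "M"), (9, "M"), (10, "M")]  # hypothetical
stumps = [fit_stump(bootstrap_sample(rows, rng)) for _ in range(25)]
```

Because every stump sees a different resample, individual stumps differ slightly, and the vote averages out their individual errors.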
F. AdaBoost Classifier
This algorithm is used for classification and regression to predict the presence of breast cancer. It converts weak learners into a strong one by combining all weak learners into a single strong rule. It repeatedly adjusts the weights of the training samples, so that later learners concentrate on the cases that were misclassified, until an accurate result is found. However, it is sensitive to noise and to the quality of the features [15].
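The reweighting step can be sketched for a single boosting round. The sketch follows the standard discrete AdaBoost update for two classes and assumes the weak learner's error lies strictly between 0 and 1; the sample weights and the function name are hypothetical.

```python
import math

def adaboost_round(weights, correct):
    """One AdaBoost round: return (learner weight alpha, renormalized sample weights).

    weights -- current sample weights, summing to 1
    correct -- booleans, True where the weak learner classified the sample correctly
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1.0 - error) / error)      # assumes 0 < error < 1
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

# Hypothetical round: five equally weighted samples, one misclassified.
weights = [0.2] * 5
correct = [True, True, True, True, False]
alpha, new_weights = adaboost_round(weights, correct)
```

After the update, the misclassified sample carries half of the total weight, which forces the next weak learner to focus on it. The same mechanism explains the sensitivity to noise: a mislabeled sample keeps gaining weight.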
G. Naïve Bayes (NB) Classifier
Naïve Bayes refers to a probabilistic classifier that applies Bayes' theorem with strong (naive) independence assumptions [16]. In this model, each property is considered separately, without modeling any relationship between properties. It assumes that the predictive attributes are conditionally independent given the class. Moreover, the values of the numeric attributes are assumed to follow a distribution within each class. NB is fast and performs well even with a small dataset. However, truly independent properties are difficult to find in real life. Hazra et al. [16] deployed an NB classifier for breast cancer detection, and it gave the maximum accuracy with only five dominant features.
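A minimal sketch of the classifier follows. It assumes that each numeric attribute is normally distributed within each class (the Gaussian variant of NB), which is a common choice but is not stated in the text above; the samples, labels, and function names are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels):
    """Estimate a prior plus per-feature mean and variance for every class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for y, rows in grouped.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9  # avoid zero variance
            stats.append((mean, var))
        model[y] = (len(rows) / len(samples), stats)
    return model

def predict_gaussian_nb(model, x):
    """Pick the class with the highest log prior plus summed per-feature log likelihood."""
    def log_posterior(y):
        prior, stats = model[y]
        total = math.log(prior)
        for value, (mean, var) in zip(x, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)

# Hypothetical two-feature samples (illustrative only).
samples = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (6.0, 7.0), (6.5, 6.8), (5.8, 7.3)]
labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
model = fit_gaussian_nb(samples, labels)
```

Summing the per-feature log likelihoods is exactly where the independence assumption enters: each feature contributes on its own, with no interaction terms.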
IV. PREVIOUS RELATED WORKS
Several studies have been conducted on the application of ML to breast cancer detection and diagnosis, using different methods or combinations of several algorithms to increase accuracy. S. Gc et al. [17] worked on extracting features including variance, range, and compactness. They used SVM classification to evaluate the performance. Their findings showed the highest sensitivity of 95% for variance, 94% for range, and 86% for compactness. According to their results, SVM can be considered an appropriate method for breast cancer detection.
[Figure 1 pie chart. Legend in extracted order: Breast; Trachea, Bronchus, Lung; Colorectum; Ovary; Cervix Uteri; Other. Segment values in extracted order: 24%, 13%, 10%, 6%, 6%, 41%.]
Chunqiu Wang et al. [18] chose Microwave Tomography Imaging (MTI) to extract features and classify the images using ANN. Two different techniques, GMM and KNN, were compared in this study. Their results showed that the sensitivity obtained by KNN was 87%, while for GMM it was 67%. The accuracy was 85% for KNN and 75% for GMM. The Matthews Correlation Coefficient (MCC) was 67% and 48% for KNN and GMM, respectively. Finally, the specificity was 84% for KNN and 86% for GMM. According to their findings, sensitivity, accuracy, and MCC were better for KNN than for GMM, but GMM was better in specificity and precision.
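The metrics compared in this and the following studies are all derived from the four confusion-matrix counts. The sketch below shows the standard definitions; the counts are hypothetical and are not taken from any of the reviewed studies, and the function name is an illustrative assumption.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Compute common diagnostic metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)               # true positive rate (recall)
    specificity = tn / (tn + fp)               # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "mcc": mcc,
    }

# Hypothetical counts, not taken from any reviewed study.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

Sensitivity and specificity can move in opposite directions, as in the KNN and GMM comparison above, which is why a single accuracy figure rarely tells the whole story.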
Chowdhary and Acharjya [19] focused on mammogram images, as they are cheaper and more efficient for detection. Since selecting and extracting features is important for improving performance, Fuzzy Histogram Hyperbolization (FHH) was chosen to increase the quality of the images, Fuzzy C-mean for segmentation, and a Gray level dependence model for feature extraction. Their method showed 94% accuracy in detecting malignant breast lesions.
In a study conducted by Aminikhanghahi et al. [20], wireless cyber mammography images were explored. After selecting and extracting features, the researchers chose two different ML techniques, SVM and GMM, and checked their accuracy. Their findings showed that SVM is more accurate if there is no noise or error; otherwise, GMM is better and safer.
Durai et al. [21] selected a data mining technique for detecting diseases including breast cancer. They used LRC and compared it with four other techniques: BFI, ID3, J48, and SVM. The results show that LRC is the most accurate, with 99.25% accuracy.
Wang and Yoon [22] chose four data mining methods to measure their effectiveness in detection: SVM, ANN, Naïve Bayes classification, and AdaBoost tree. In addition, PCs and PCi were used to build hybrid models. After checking the accuracy, they found that Principal Component Analysis (PCA) can be a critical factor in improving performance.
Hafizah et al. [23] compared SVM and ANN using four different datasets of breast and liver cancer including WBCD, BUPA JNC, Data, Ovarian. The researchers demonstrated that both methods perform well, but SVM was still better than ANN.
Azar and El-Said [24] worked on six different SVM methods. They compared St-SVM with LPSVM, LSVM, SSVM, PSVM, and NSVM to find out which method performs best in accuracy, sensitivity, specificity, and ROC. LPSVM proved to be the best, with accuracy 97.1429%, sensitivity 98.2456%, specificity 95.082%, and ROC 99.38%. Therefore, LPSVM has the highest performance.
Deng and Perkowski [25] used a new method called Weighted Hierarchical Adaptive Voting Ensemble (WHAVE). They compared the accuracy of WHAVE with seven other methods that had the highest accuracies in previous research. WHAVE achieved the highest performance value of 99.8%.
Rehman et al. [26] extracted different features, including phylogenetic trees, statistical features, and Local Binary Patterns (LBP), from mammography images. They used SVM with an RBF kernel for classification. They first checked the accuracy of each feature set separately. In this step, the best accuracy was 76% for 90 features chosen based on the Taxonomic Indices based Feature (TIF) Vector, and 68% for the Statistical and LBP based Feature Vector. The features were then combined (Taxonomic Indices, Statistical and LBP based Feature Vector) and checked for accuracy again. The evaluation results were best after testing 4 times. The researchers claimed that the performance and efficiency of breast cancer detection can be increased by using different features.
Mejia et al. [27] chose thermogram images for detecting breast cancer, as thermography is cheaper and safer than other methods. It can detect cancer at an earlier stage compared to other images or tests, and it has no limitations related to pregnancy or the size or density of the breast. It also does not require complex features to be extracted. They selected 18 cases, 9 abnormal and 9 normal. A KNN classifier was used to improve the accuracy. The results were 88.88% for abnormal and 94.44% for normal cases.
Ayeldeen et al. [28] used AI techniques for breast cancer detection. They compared the performance of 5 different methods. The RF algorithm showed the highest result, with 99% performance.
Avramov and Si [29] worked on feature extraction and the impact of feature selection on performance. They applied 4 ways of correlation selection (PCA, T-Test Significance and Random feature selection) and 5 classification models (LR, DT, KNN, LSVM, and CSVM). The best result was achieved by stacking the logistic, LSVM, and CSVM models, which improved accuracy to 98.56%.
Ngadi et al. [30] used the NSVC algorithm to test different classification methods including RBF, Poly, and Linear. They then compared the results with other classification methods such as Naïve Bayes, DT, K-NN, SVM, RF, and AdaBoost. RF had the best performance result, with 93% accuracy. The authors concluded that NSVC was better than the other methods.
Jiang and Xu [31] used Diffusion-Weighted Magnetic Resonance Images (DWI) for breast cancer detection. They used two types of features, one based on ROI and the other based on ADC, on data from 61 patients. They implemented RF-RFE together with the RF algorithm. The study findings show that the accuracy of RF-RFE with RF, and of Histogram + GLCM, is 77.05%, which indicates that texture-based features have a critical role in improving performance and detection.
Salma [32] selected two data sets, WBCD and KDD, and used FM-ANN for both of them. The results were compared with other techniques (RBF, FNN, and MNN). After training and testing, KDD achieved the better accuracy of 99.96% because it had more features. In the comparison, FM-ANN proved to be more accurate.
Bevilacqua et al. [33] selected MR images for training and testing. After extracting and processing the data, they used ANN for classification and breast cancer detection. When a Genetic Algorithm was used to optimize the ANN, the observed specificity was 90.46%, the sensitivity was 89.08%, the average accuracy improved to 89.77%, and the highest accuracy reached 100%.
Table 1 summarizes the ML methods used in the related work reviewed in this study [17-33]. It lists the references, the types of extracted features, the data sets, and the measured performance. Performance is the most significant criterion in choosing the proper method.
Table 1
Related work on different types of methodology, features, dataset, and references for breast cancer detection

[17]
Methodology: SVM
Features: Variance, Range, Compactness
Data Base: Mammogram
Dataset: Digital Database for Screening Mammography (DDSM)
Results:
Feature | MCC | Sensitivity | Specificity | Accuracy
Variance | 83.2% | 95% | 88% | 91.5%
Range | 82.1% | 94% | 88% | 90.5%
Compactness | 70% | 86% | 84% | 85%

[18]
Methodology: GMM, KNN
Features: Tissue
Data Base: Microwave Tomography Image
Dataset: ETRI
Results:
Method | MCC | Sensitivity | Specificity | Precision | Accuracy
KNN | 67% | 87% | 84% | 70% | 80-90%
GMM | 48% | 67% | 86% | 70.8% | 70-80%

[19]
Methodology: SVM, KNN, RSDA
Features: Fuzzy Histogram Hyperbolization, Fuzzy C-mean, and Gray level dependence model
Data Base: Mammogram
Dataset: Mammographic Image Analysis Society (MIAS)
Results:
Class | Training set | Accuracy %
Normal | 70 | 100
Benign | 60 | 96.67
Malignant | 50 | 94

[20]
Methodology: SVM, GMM
Features: Contrast, Homogeneity, Mean, Correlation, Energy, Maximum
Data Base: Mammography
Dataset: DDSM, University of South Florida
Results:
Method | MCC | Sensitivity | Specificity
SVM | 78.78% | 82% | 96%
GMM | 72.06% | 84% | 86%

[21]
Methodology: LRC
Features: Mitoses, Marginal-Adhesion, Normal Nucleoli, Clump Thickness, Bland Chromatin, Uniformity of cell shape, Single Epithelial cell size, Uniformity of cell size, Bare Nuclei
Data Base: Standard Data
Dataset: UCI
Results:
Method | Accuracy percentage
LRC | 99.25
BFI | 95.46
ID3 | 92.99
J48 | 98.14
SVM | 96.40

[22]
Methodology: SVM, ANN, NB, Adaboost tree, PCA
Features:
WBC: Mitoses, Marginal-Adhesion, Normal Nucleoli, Clump Thickness, Bland Chromatin, Uniformity of cell shape, Single Epithelial cell size, Uniformity of cell size, Bare Nuclei
WDBC: Radius, Texture, Perimeter, Area, Smoothness, Compactness, Concavity, Concave Points, Symmetry, Fractal Dimension
Data Base: Standard Data
Dataset: Wisconsin Breast Cancer Database Original (WBC), Wisconsin Diagnostic Breast Cancer Database (WDBC)
Results (accuracy percentage):
Method | WBC | WDBC
SVM | 97.10 | 97.99
PCs-SVM | 97.47 | 98.12
PCi-SVM | 96.73 | 97.90
ANN | 89.88 | 99.60
PCs-ANN | 95.52 | 99.61
PCi-ANN | 94.33 | 99.63
Naïve | 96.21 | 93.32
PCs-Naïve | 96.50 | 91.79
PCi-Naïve | 96.16 | 91.72
Adaboost | 95.84 | 97.19
PCs-Adaboost | 96.24 | 96.73
PCi-AdaBoost | 96.32 | 96.83

[23]
Methodology: ANN, SVM
Features: Mitoses, Marginal-Adhesion, Normal Nucleoli, Clump Thickness, Bland Chromatin, Uniformity of cell shape and size, Single Epithelial cell size, Bare Nuclei
Data Base: Standard Data
Dataset: Wisconsin Breast Cancer Database (WBCD)
Results:
Method | Accuracy | Sensitivity | Specificity | AUC
SVM | 99.51% | 99.25% | 100% | 99.63%
ANN | 98.54% | 99.25% | 97.22% | 98.24%

[24]
Methodology: St-SVM, PSVM, LSVM, NSVM, LPSVM, SSVM
Features: Mitoses, Marginal-Adhesion, Normal Nucleoli, Clump Thickness, Bland Chromatin, Uniformity of cell shape and size, Single Epithelial cell size, Bare Nuclei
Data Base: Mammography
Dataset: WBCD
Results:
Method | Accuracy | Sensitivity | Specificity | ROC
LPSVM | 97.1429 | 98.2456 | 95.082 | 99.38
LSVM | 95.4286 | 96.5217 | 93.3333 | 97.18
SSVM | 96.5714 | 96.5812 | 96.5517 | 98.35
PSVM | 96 | 97.3684 | 93.4426 | 97.75
NSVM | 96.5714 | 96.5812 | 96.5517 | 98.35
ST-SVM | 94.86 | 95.65 | 93.33 | 96.61

[25]
Methodology: Weighted Hierarchical Adaptive Voting Ensemble (WHAVE), Disjunctive Normal Form (DNF) rule-based method, DT, NB, SVM
Features: Mitoses, Marginal-Adhesion, Normal Nucleoli, Clump Thickness, Bland Chromatin, Uniformity of cell shape and size, Single Epithelial cell size, Bare Nuclei
Dataset: WBCD
Results:
Method | Accuracy Percentage
DNF | 65.72
DT | 94.74
NB | 84.5
SVM | 99.54
Hybrid | 99.54
KNN | 97.14
Quadratic Classifier | 97.14
WHAVE | 99.8

[26]
Methodology: SVM, RBF kernel
Features: Phylogenetic trees, Statistical Features, and Local Binary Patterns
Data Base: DDSM
Dataset: MIAS
Results (Model I: TIF %; Model II: LBP %; Model III: TIF and LBP %):
Training (%) | Testing (%) | Model I Accuracy | Model I Specificity | Model II Accuracy | Model II Specificity | Model III Accuracy | Model III Specificity
80 | 20 | 64 | 58 | 54 | 51 | 66 | 60
70 | 30 | 71 | 66 | 52 | 49 | 65 | 61
60 | 40 | 76 | 73 | 68 | 64 | 80 | 76
50 | 50 | 70 | 76 | 64 | 60 | 72 | 67

[27]
Methodology: KNN
Features: Mean, Standard Deviation
Data Base: Thermogram
Dataset: Federal Fluminense University Hospital
Results (KNN accuracy):
Normal | 94.44%
Abnormal | 88.88%

[28]
Methodology: Bayes Net (BN), Multi-Class Classifier, DT, Radial Basis Function, RF
Features: TP Rate, FP Rate, Precision, Recall, F-measure, ROC area
Data Base: Blood Serum
Dataset: Department of Biochemistry and Molecular Biology of Kasr Alainy
Results:
Method | TP Rate | FP Rate | Precision | Recall | F | ROC
BN | 0.947 | 0.035 | 0.949 | 0.947 | 0.945 | 0.995
Multi CC | 0.933 | 0.043 | 0.933 | 0.933 | 0.93 | 0.987
DT | 0.87 | 0.084 | 0.878 | 0.87 | 0.868 | 0.966
RBF | 0.774 | 0.128 | 0.722 | 0.774 | 0.739 | 0.908
RF | 0.99 | 0.007 | 0.99 | 0.99 | 0.99 | 1

[29]
Methodology: Logistic Regression (LR), DT, KNN, Cubic SVM (CSVM)
Features: Radius, Texture, Perimeter, Area, Smoothness, Compactness, Concavity, Concave Points, Symmetry, Fractal Dimension
Data Base: Microscope Digital Image
Dataset: UCI
Results:
Model | Accuracy percentage
DT with 30 features | 92.51
KNN with 30 features | 91.56
LR with 3 features | 96.27
LR with 6 features | 97.77
LR with 30 features | 95.65
LSVM with 3 features | 97.47
LSVM with 10 features | 97.87
LSVM with 30 features | 97.30
CSVM with 11 features | 97.98
SVM and CSVM | 98.56
CSVM with 30 features | 98
Stacking the Logistic, LSVM, and CSVM | 98.56

[30]
Methodology: NSVC
Features: BI-RADS, Age, Shape, Margin, Density, Severity
Data Base: Mammography
Dataset: UCI

[31]
Methodology: RF-Recursive Feature Elimination (RF-RFE) method
Features:
ROI: Mean, Variance, Skewness, Kurtosis, Energy, Entropy
ADC: Contrast, Entropy, ASM, Correlation
Data Base: Diffusion-Weighted Magnetic Resonance Image (DW (Convert to ADC)-MRI)
Dataset: Zhejiang Cancer Hospital
Results:
Method | Accuracy | Sensitivity | Specificity | AUC
RF-RFE and RF | 77.05% | 84.21% | 65.21% | 0.76
Histogram | 68.85% | 76.32% | 56.52% | 0.73
GLCM | 65.57% | 71.05% | 56.52% | 0.63
Histogram + GLCM | 77.05% | 84.21% | 65.21% | 0.76

[32]
Methodology: Fast Modular Artificial Neural Network (FM-ANN)
Features:
WBCD: f4, f8, f12, f14, f24, f27, f28
KDD: f22, f29, f47, f50, f60, f61, f62, f63, f64, f65, f71, f97f80, f98, f108
Data Base: X-Ray
Dataset: WBCD, KDD Cup 2008
Results:
Split | Feedforward % | MLP % | RBF % | MNN % | FM-ANN
WBCD 70:30 | 98.45 | 91.50 | 93.75 | 99.22 | 99.80
WBCD 50:50 | 94.91 | 89.5 | 90.65 | 93.57 | 95.71
KDD 70:30 | 94.91 | 93.95 | 98.45 | 99.22 | 99.96
KDD 50:50 | 93.21 | 92.95 | 97.98 | 98.22 | 98.96
WBCD after training Accuracy: 99.8
KDD cup 2008 after training Accuracy: 99.96

[33]
Methodology: Optimized ANN
Features: Size, Convexity, Solidity, Eccentricity, Aspect ratio, Circularity, the standard deviation value of the gray levels of images with and without MC in ROIs
Data Base: MRI
Dataset: Radiologists of the University of Bari Aldo Moro
Results:
Method | High Accuracy | Average Accuracy | Sensitivity | Specificity
Optimized ANN | 100% | 89.77% | 89.08% | 90.46%

According to Figure 2, most researchers have worked on mammogram images, as mammography is quicker than other types of breast cancer detection and is safe and more effective [34].
Figure 3 compares the ML methods and algorithms employed for breast cancer detection in the reviewed literature listed in Table 1. It is observed that SVM is the most frequently used method. Figure 4 presents the results of breast cancer detection using ML methods.
V. CONCLUSION
In this paper, breast cancer and ML were introduced, and an in-depth literature review was performed on existing ML methods used for breast cancer detection. The findings of the reviewed studies suggest that SVM is the most popular method for cancer detection applications. SVM was used either alone or combined with another method to improve performance. The maximum accuracy achieved with SVM (single or hybrid) was 99.8%, which can be improved to 100%: the work of [33], which used an optimized ANN on MRI, reported 100% accuracy in detecting breast cancer. This method can be applied to and tested on other data, such as mammogram and ultrasound images, to check its performance on different data types. Mammograms were the most frequently used data type compared to other types of data such as ultrasound images, thermal images, or blood features.
Figure 2: Different breast cancer detection methods
[Pie chart. Legend in extracted order: Mammogram, Standard, MRI, MTI, Thermogram, Blood Test, MDI, DW, X-Ray. Segment values in extracted order: 35%, 17%, 12%, 6%, 6%, 6%, 6%, 6%, 6%.]
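Figure 3: Using machine learning methods in cancer detection
[Chart titled "Popularity of Machine Learning Methods"; methods shown: SVM, K-NN, ANN, DT, RF, GMM, LRC, NB, RBF, RSDA, ABT, PCA, DNF, BNN, MCC, NSVC.]
Figure 4: Accuracy percentages in different literatures
[Chart of Accuracy (%), axis from 0 to 100, for references [17] to [33].]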
REFERENCES
[1] World Health Organization, “Cancer country profiles 2014,” WHO,
http://www.who.int/cancer/country-profiles/en/
[2] M. Stalin, and R. Kalaimagal, “Breast cancer diagnosis from low-intensity asymmetry thermogram breast images using fast support vector machine,” i-manager's Journal on Image Processing, vol. 3, no. 3, pp. 17-26, 2016.
[3] R. Kirubakaran, T. C. Jia, and N. M. Aris, “Awareness of Breast Cancer among Surgical Patients in a Tertiary Hospital in Malaysia,” Asian Pacific Journal of Cancer Prevention, 2017, vol. 18, no. 1, pp. 115-120.
[4] T. M. Khan, and S. A. Jacob, “Brief review of complementary and alternative medicine use among Malaysian women with breast cancer,” Journal of Pharmacy Practice and Research, 2017, vol. 47, no. 2, pp. 147-152.
[5] L. Caplan, “Delay in breast cancer: implications for the stage at diagnosis and survival,” Frontiers in Public Health, 2014, vol. 2, Article 87, pp. 1-6.
[6] M.A. Richards, A.M. Westcombe, S.B. Love, P. Littlejohns, and A.J. Ramirez, “Influence of delay on survival in patients with breast cancer: a systematic review,” The Lancet, 1999, vol. 353, no. 9159, pp. 1119-1126.
[7] B. Stewart and C.P. Wild, World Cancer Report 2014, International
Agency for Research on Cancer, WHO, 2014.
[8] S. A. Korkmaz, and M. Poyraz, “A New Method Based for Diagnosis
of Breast Cancer Cells from Microscopic Images: DWEE—JHT,” J.
Med. Syst., vol. 38, no. 9, p. 92, 2014.
[9] P. Louridas, and C. Ebert, “Machine Learning,” IEEE Softw., vol. 33, no. 5, pp. 110-115, 2016.
[10] A. Simons, “Using artificial intelligence to improve early breast cancer detection,” 2017. Retrieved on April 10, 2018, from https://www.csail.mit.edu/news/using-artificial-intelligence-improve-early-breast-cancer-detection
[11] E. Ali, and W. Feng, “Breast Cancer classification using Support Vector Machine and Neural Network,” International Journal of Science and Research, pp. 2013, 2319-7064.
[12] S. Medjahed, T. Saadi, and A. Benyettou, “Breast Cancer Diagnosis by using k-Nearest Neighbor with Different Distances and Classification Rules,” International Journal of Computer Applications, 2013, vol. 62, no. 1, pp. 0975-8887.
[13] R. Sumbaly, N. Vishnusri, and S. Jeyalatha, “Diagnosis of Breast Cancer using Decision Tree Data Mining Technique,” International Journal of Computer Applications, 2014, vol. 98, no. 10, pp. 0975-8887.
[14] M. Elgedawy, “Prediction of Breast Cancer using Random Forest, Support Vector Machines and Naïve Bayes,” International Journal of Engineering and Computer Science, 2017, vol. 6, no. 1, pp. 19884-19889.
[15] R. Senkamalavalli, and T. Bhuvaneswari, “Improved classification of breast cancer data using hybrid techniques,” International Journal of Advanced Research in Computer Science, 2017, vol. 8, no. 8, pp. 454-457.
[16] A. Hazra, S. Mandal, and A. Gupta, “Study and Analysis of Breast Cancer Cell Detection using Naïve Bayes, SVM and Ensemble Algorithms,” International Journal of Computer Applications, 2016, vol. 145, no. 2, pp. 0975-8887.
[17] S. Gc, R. Kasaudhan, T. K. Heo, and H.D. Choi, “Variability
Measurement for Breast Cancer Classification of Mammographic
Masses,” in Proceedings of the 2015 Conference on research in
adaptive and convergent systems (RACS), Prague, Czech Republic,
2015, pp. 177182.
[18] C. Wang, W. Wang, S. Shin, and S. I. Jeon, “Comparative Study of
Microwave Tomography Segmentation Techniques Based on GMM
and KNN in Breast Cancer Detection,” in Proceedings of the 2014
Conference on Research in Adaptive and Convergent Systems (RACS
'14), Towson, Maryland, 2014, pp. 303308.
[19] C. L. Chowdhary, and D. P. Acharjya, “Breast Cancer Detection using
Intuitionistic Fuzzy Histogram Hyperbolization and Possibilitic Fuzzy
c-mean Clustering algorithms with texture feature-based Classification
on Mammography Images,” in Proceedings of the International
Conference on Advances in Information Communication Technology &
Computing, Bikaner, India, 2016, pp. 16.
[20] S. Aminikhanghahi, S. Shin, W. Wang, S. I. Jeon, S. H. Son, and C.
Pack, “Study of wireless mammography image transmission impacts
on robust cyber-aided diagnosis systems,” Proc. 30th Annu. ACM
Symp. Appl. Comput. - SAC ’15, pp. 22522256, 2015.
[21] S. G. Durai, S. H. Ganesh, and A. J. Christy, “Novel Linear Regressive
Classifier for the Diagnosis of Breast Cancer,” In Computing and
Communication Technologies (WCCCT), 2017 World Congress on
2017.
[22] H. Wang, and S. W. Yoon, “Breast cancer prediction using data mining
method,” IIE Annu. Conf. Expo 2015, pp. 818828, 2015.
[23] S. Hafizah, S. Ahmad, R. Sallehuddin, and N. Azizah, “Cancer
Detection Using Artificial Neural Network and Support Vector
Machine: A Comparative Study,” J. Teknol, vol. 65, pp. 7381, 2013.
[24] A. T. Azar, and S. A. El-Said, “Performance analysis of support vector
machines classifiers in breast cancer mammography recognition,”
Neural Comput. Appl., vol. 24, no. 5, pp. 11631177, 2014.
[25] C. Deng, and M. Perkowski, “A Novel Weighted Hierarchical Adaptive
Voting Ensemble Machine Learning Method for Breast Cancer
Detection,” Proc. Int. Symp. Mult. Log., vol. 2015Septe, pp. 115120,
2015.
[26] A. U. Rehman, N. Chouhan, and A. Khan, “Diverse and Discriminative
Features Based Breast Cancer Detection Using Digital
Mammography,” 2015 13th Int. Conf. Front. Inf. Technol., pp. 234-239,
2015.
[Figure: "Popularity of Machine Learning Methods". Methods shown: SVM, K-NN, ANN, DT, RF, GMM, LRC, NB, RBF, RSDA, ABT, PCA, DNF, BNN, MCC, NSVC.]
[Figure: Accuracy (%) by reference, [17] to [33]; axis from 0 to 100.]
[27] T. M. Mejia, M. G. Perez, V. H. Andaluz, and A. Conci, “Automatic
Segmentation and Analysis of Thermograms Using Texture
Descriptors for Breast Cancer Detection,” 2015 Asia-Pacific Conf.
Comput. Aided Syst. Eng., pp. 24-29, 2015.
[28] H. Ayeldeen, M. A. Elfattah, O. Shaker, A. E. Hassanien, and T.-H.
Kim, “Case-Based Retrieval Approach of Clinical Breast Cancer
Patients,” 2015 3rd Int. Conf. Comput. Inf. Appl., pp. 38-41, 2015.
[29] T. K. Avramov and D. Si, “Comparison of Feature Reduction Methods
and Machine Learning Models for Breast Cancer Diagnosis,” Proc. Int.
Conf. Comput. Data Anal. - ICCDA ’17, pp. 69-74, 2017.
[30] M. Ngadi, A. Amine, and B. Nassih, “A Robust Approach for
Mammographic Image Classification Using NSVC Algorithm,” Proc.
Mediterr. Conf. Pattern Recognit. Artif. Intell. - MedPRAI-2016, pp.
44-49, 2016.
[31] Z. Jiang, and W. Xu, “Classification of benign and malignant breast
cancer based on DWI texture features,” ICBCI 2017 Proceedings of the
International Conference on Bioinformatics and Computational
Intelligence 2017.
[32] M. U. Salma, “Fast Modular Artificial Neural Network for the
Classification of Breast Cancer Data,” Proc. Third Int. Symp. Women
Comput. Informatics - WCI ’15, pp. 66-72, 2015.
[33] V. Bevilacqua, A. Brunetti, M. Triggiani, D. Magaletti, M. Telegrafo,
and M. Moschetta, “An Optimized Feed-forward Artificial Neural
Network Topology to Support Radiologists in Breast Lesions
Classification,” Proc. 2016 Genet. Evol. Comput. Conf. Companion -
GECCO ’16 Companion, pp. 1385-1392, 2016.
[34] M. Rmili, and A. El, “A Combined Approach for Breast Cancer
Detection in Mammogram,” 2016 13th International Conference on
Computer Graphics, Imaging and Visualization, pp. 350-353, 2016.
... These cells originate from the milk glands of the breast. The classification of these cells is determined by the rate at which these unusual cells progress and the effect that they have on other normal cells that can eventually affect the whole body [1]. According to statistics of the World Health Organization (WHO), breast cancer is the most common type of cancer occurring in women globally, accounting for approximately 2.1 million new breast cancer cases [2]. ...
... Mammography has emerged as an alternative and is being widely used in the medical field. However, relying only on mammograms has a high risk of false positives which often lead to unnecessary biopsies and surgeries [1]. With the development of automated medical applications, researchers have been developing automated breast cancer detection systems. ...
... Many applications have been deployed using machine learning techniques. In [1], the use of machine learning has been reported to determine the types of treatment that need to be administered to cancer patients. It has been reported by several authors [10][11][12] that there has not been any significant improvement in accuracy with the early computer aided applications for breast cancer detection that were developed. ...
Article
The real cause of breast cancer is very challenging to determine, and therefore early detection of the disease is necessary for reducing the death rate from breast cancer. Early detection of cancer increases the survival chance by up to 8%. Primarily, breast images from mammograms, X-rays or MRI are analyzed by radiologists to detect abnormalities. However, even experienced radiologists have difficulty identifying features such as micro-calcifications, lumps and masses, leading to high false-positive and false-negative rates. Recent advances in image processing and deep learning create some hope of devising better applications for the early detection of breast cancer. In this work, we have developed a Deep Convolutional Neural Network (CNN) to segment and classify the various types of breast abnormalities, such as calcifications, masses, asymmetry and carcinomas. Existing research work, by contrast, has mainly classified the cancer as benign or malignant; the finer classification leads to improved disease management. First, transfer learning was carried out on our dataset using the pre-trained model ResNet50. Along similar lines, we have developed an enhanced deep learning model in which the learning rate is treated as one of the most important attributes when training the neural network. The learning rate is set adaptively in our proposed model based on changes in the error curves during learning. The proposed deep learning model achieved a performance of 88% in classifying these four types of breast abnormalities: masses, calcifications, carcinomas and asymmetry.
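The abstract above says only that the learning rate is adapted "based on changes in error curves"; it gives no rule. As a rough illustration of one common way to do this, the sketch below halves the rate when the error stops improving. The function name `adapt_learning_rate`, the `patience` and `factor` parameters, and the error values are all hypothetical and are not taken from the cited work.

```python
def adapt_learning_rate(lr, errors, patience=2, factor=0.5, min_lr=1e-6):
    """Shrink lr when the error has not improved over the last `patience` epochs.

    Assumption: this plateau rule is an illustrative stand-in, not the
    adaptation scheme used by the cited paper.
    """
    if len(errors) <= patience:
        return lr  # not enough history to judge a plateau
    best_before = min(errors[:-patience])   # best error seen earlier
    recent_best = min(errors[-patience:])   # best error in the recent window
    if recent_best >= best_before:          # no improvement: reduce the rate
        return max(lr * factor, min_lr)
    return lr


lr = 0.1
history = []
# Hypothetical per-epoch validation errors.
for error in [0.9, 0.7, 0.6, 0.61, 0.62, 0.5]:
    history.append(error)
    lr = adapt_learning_rate(lr, history)

print(lr)
```

With these made-up errors the rate is reduced once, after the two epochs (0.61, 0.62) that fail to beat the earlier best of 0.6.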
... The model would prioritise women with a high probability of breast abnormalities or cancer which in turn accelerates the needed process for those with medical urgency. Previous studies showed that early detection of breast cancer reduces its mortality [40,41]. Additionally, one of the factors of severe breast cancer presentation and poor survival among breast cancer patients was a delay in seeking medical treatment [42][43][44][45]. ...
Preprint
This study aimed to determine the feasibility of developing an over-the-counter (OTC) screening model using machine learning for breast cancer screening in the Asian women population. Data were retrospectively collected from women who came to the Hospital Universiti Sains Malaysia, Malaysia. Five screening models were developed based on machine learning methods: random forest, artificial neural network (ANN), support vector machine (SVM), elastic-net logistic regression and extreme gradient boosting (XGBoost). Features used for the development of the screening models were limited to information from the patients' registration form. The model performance was assessed across the dense and non-dense groups. SVM had the best sensitivity while elastic-net logistic regression had the best specificity. In terms of precision, both random forest and elastic-net logistic regression had the best performance, while in terms of PR-AUC, XGBoost had the best performance. Additionally, SVM had a more balanced performance in terms of sensitivity and specificity across the mammographic density groups. The three most important features were age at examination, weight and number of children. In conclusion, OTC models developed from machine learning methods can improve the prognostic process of breast cancer in Asian women.
... In this approach, the data set is divided into little sub-data, which are subsequently divided into even smaller subdata. It is not susceptible to noise (Tahmooresi et al., 2018). ...
Article
Breast cancer is one of the most dangerous diseases and the second largest cause of cancer death among women. Techniques and methods have been adopted for early indication of the signs of the disease, as this is the only effective way of managing breast cancer in women. This review explores the techniques used for breast cancer in Computer-Aided Diagnosis (CAD) using image analysis, deep learning and traditional machine learning. It first introduces the various strategies of machine learning, then explains the various deep learning techniques and particular architectures for breast cancer detection and classification. Following the review, the researcher recommends including deep learning in machine learning because it performs multiple functions in enabling medical diagnosis. It is also important to integrate more than one learning method in medical learning to improve the process of medical diagnostic imaging; their benefits and limitations, recent advancements and developments are discussed by reviewing existing secondary sources. This study reviews papers published from 2015 (early publications on breast cancer) to 2021. This paper is a review of the latest work and techniques in the field, together with future trends and problems in breast cancer categorization and diagnosis.
... SVM has been successfully used for clinical diagnosis (Veropoulos et al., 1999). The accuracy achieved by the SVM method is higher, whether it is used alone or combined with other methods to improve performance, and the maximum accuracy achieved with a single or hybrid model was 99.8%, which can be further improved with a more elaborate model (Tahmooresi et al., 2018). Elaborate artificial intelligence techniques such as ELM ANN are best suited for classification and regression problems, with good accuracy in breast cancer diagnosis (Utomo et al., 2014). ...
... Stochastic Gradient Descent (SGD) is an optimization technique which, when used with the SVM model, not only optimizes the accuracy of the model but also reduces the execution time [26]. SVM has been known to perform better than many other ML techniques in the medical domain [4], [27]-[29]. Hence SVM is chosen as one of the classifiers. ...
Conference Paper
Breast cancer is the second leading cause of death among women. Early detection of the disease increases the patient's chances of survival. Therefore, there is always a need for techniques that can accurately predict the presence of cancer. Data mining is one such powerful technique that can help clinicians use data effectively for timely prediction of the disease. In the medical domain, data is usually imbalanced, with an unequal distribution of the positive and negative classes. Imbalanced datasets introduce a bias in the model and can thus reduce the accuracy of the minority class predictions. In the case of cancer detection, the mammographic data is highly imbalanced, and predicting the positive (minority) class is of the utmost importance. To achieve this, different models using various class balancing techniques are built and evaluated. The experiments show that the weighted approach and the undersampling technique perform better than oversampling and hybrid techniques. The best performing classifiers are the weighted XGBoost model and the Stacking ensemble, with average AUCs of 0.78 and 0.76 respectively.
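The abstract above does not say how the class weights in its "weighted approach" were computed. One widely used heuristic, shown here only as a hedged sketch, is inverse-frequency weighting: each class gets the weight n_samples / (n_classes * class_count), so the minority class counts for more during training. The function name `balanced_class_weights` and the 90/10 toy labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * class_count).

    Assumption: this is a common heuristic, not necessarily the weighting
    used in the cited study.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical toy labels: 0 = negative (majority), 1 = positive (minority).
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)
```

In this toy case each positive example weighs nine times as much as a negative one, which offsets the 9:1 imbalance in the labels.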
Chapter
One of the most common diseases among middle-aged women is breast cancer. The proposed system is an efficient breast cancer detection and classification approach using an ensemble boosting algorithm, the “XGBOOST” classifier. All the traditional approaches in machine learning have developed models with low variance or with high bias. So, the proposed system has evaluated the model with different evaluation metrics. In the world of machine learning, optimization has a great impact. To address this issue, the proposed system performed tuning of hyper-parameters during the classification process. Finally, the designed system is compared with conventional models. The major goal of the proposed system is to identify breast cancer and classify the stage of the cancer. The automated system can thus help doctors to recommend treatment or medicines, and in turn the mortality rate due to breast cancer can be reduced.
Conference Paper
Medical consultation and prediction has become an interactive decision-making process for patients. E-healthcare and telemedicine are now widely accepted as a method for remote consultation between patient and doctor. This paper proposes an Android mobile health application model for breast cancer diagnosis and daily health prediction, with the help of which the quality of treatment for patients can be improved. Breast cancer is one of the primary causes of death among women in recent times, being the second most common cause of cancer deaths in women worldwide. A lot of research has been done on early detection of breast cancer so that treatment can start early and increase the chance of survival. Breast cancer diagnosis means distinguishing benign from malignant breast lumps. Machine learning techniques are used as an approach for diagnosing this disease. The user interface will be implemented using Android/Java programming, and the disease diagnosis part is designed with the help of symptoms-disease data and ML prediction techniques.
Conference Paper
Breast cancer (BC) has been the second largest cause of death for women around the world for the past few years. BC is characterized by chronic pain, gene mutation, redness, and changes in the size and texture of the skin. BC classification helps clinicians find a comprehensive and accurate response to treatment, the most common being binary classification (benign / malignant). Nowadays, Machine Learning (ML) techniques are commonly used for the classification of breast cancer. They offer high classification accuracy and rapid evaluation. The proposed research work is mainly focused on supervised learning and uses five distinct classifiers: K-Nearest Neighbor (KNN), Weighted K-Nearest Neighbor (WKNN), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA) and Artificial Neural Network (ANN) for the classification of breast cancer. This research work also compares the aforementioned classifiers and determines their accuracy. The performance of each classifier is assessed based on its accuracy, sensitivity, specificity, precision and recall. Results indicate that ANN provides the highest accuracy among the classifiers, 97.60%.
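To make the distinction between KNN and weighted KNN in the abstract above concrete, here is a minimal sketch in which the weighted variant gives each neighbour a vote proportional to the inverse of its distance. The inverse-distance weighting, the function name `knn_predict`, and the toy points are assumptions for illustration; the abstract does not specify the weighting scheme it used.

```python
import math
from collections import defaultdict


def knn_predict(train, query, k=3, weighted=False):
    """Classify `query` from its k nearest training points.

    train: list of (feature_tuple, label) pairs.
    weighted=False -> plain majority vote (KNN).
    weighted=True  -> inverse-distance votes (one common WKNN variant;
                      an assumption, not the cited paper's definition).
    """
    neighbours = sorted(
        ((math.dist(features, query), label) for features, label in train),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for distance, label in neighbours:
        votes[label] += 1.0 / (distance + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)


# Hypothetical 2-D toy data: one close malignant point, two distant benign ones.
train = [
    ((0.0, 0.0), "malignant"),
    ((2.0, 0.0), "benign"),
    ((0.0, 2.0), "benign"),
]
query = (0.2, 0.2)

print(knn_predict(train, query, k=3))                 # plain vote
print(knn_predict(train, query, k=3, weighted=True))  # distance-weighted vote
```

On this toy data the two variants disagree: the plain vote follows the two distant benign points, while the weighted vote follows the single much closer malignant point.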
Conference Paper
Breast cancer is a leading cause of death after skin cancer and is a threat to women worldwide. The technique still in use for breast cancer screening is X-ray mammography read by radiologists, which sometimes leads to false-positive and false-negative results, in turn increasing the mortality ratio in breast cancer patients. However, earlier studies show that AI can be used as a detection aid for radiologists, with radiologists and AI either working hand in hand or AI serving as a second opinion for better results. The main objective of this paper is to provide a technical review that summarizes contemporary research progress on mammogram image analysis and gives an overview of various proposed AI-based models such as machine learning (ML), deep learning (DL), transfer learning (TL) and extreme learning machine (ELM), the way they function, and their advantages and limitations. We also describe how the different models have been considered for detecting breast cancer more accurately, with lower false-positive rates (higher specificity) or false-negative rates (higher sensitivity), using 2D or 3D mammogram images. This article also discusses several hyperparameters that affect model accuracy.
Article
Artificial intelligence is a prominent part of computer engineering that has attracted considerable attention in the field of medical data classification due to its state-of-the-art algorithmic strength and learning capabilities. Machine learning is a major sub-domain of artificial intelligence and has become one of the most promising fields in computer science. In recent years, a large spectrum of healthcare and biomedical data has been growing rapidly. Given the huge volume of labeled and unlabeled data, it is important to have a compact and robust machine learning solution for classification. Several optimizers have been deployed to improve the overall performance of machine learning models. The classification performance of machine learning models depends on several factors. This comprehensive review paper aims to give insight into the current state of optimized machine learning for medical data classification. An increasing amount of unstructured medical data is being used in machine learning algorithms to make predictions, but it is difficult to extract deep insight from those data. Machine learning researchers have therefore used state-of-the-art optimizers and novel feature selection techniques to improve accuracy. We highlight some recent literature that demonstrates the robust impact of optimizers and feature selection on machine learning techniques for medical data characterization. In addition, a clear introduction to machine learning and a theoretical outlook on widely used optimization techniques such as the genetic algorithm, gray wolf optimization, and particle swarm optimization are given as an initial grounding in optimization techniques.
Article
Cancer is one of the deadliest diseases among humans and can occur almost anywhere in the human body. It is caused by cells that divide uncontrollably and spread into the surrounding tissues. Breast cancer is the most common cancer among women. With early detection and diagnosis, breast cancer is treatable and curable. Many women have no symptoms of this cancer at an early stage. Abnormal cells in the breast raise the risk of developing breast cancer, so it is important for women to examine their breasts regularly. Technologies can be used in a smarter way, with Artificial Intelligence techniques, to assist women in examining their breasts at home and so reduce the risk of breast cancer. The main aim is to develop a low-cost self-examining device for the detection of breast cancer and breast abnormalities using an efficient optical method, a deep-learning algorithm and the Internet of Things.
Article
Nowadays, classification is the most efficient method for breast cancer detection using mammography images. During the last two decades, researchers have achieved very good results using different kinds of classification methods. In this context, this work focuses on the quality of mammography image classification by proposing a new approach that we compare with state-of-the-art methods. Our method consists of two phases: a semi-supervised step based on SKDA and a second step based on SVM. As shown in the experiments section, results on the Mammography data set show that the proposed algorithm can achieve very good results. Applying our algorithm to five well-known real data sets allows us to validate our method and show its value in the context of image-based medical diagnosis. The best precision obtained by NSVC exceeded 99%.
Article
Background: Breast cancer is the commonest cancer among women worldwide. About one in nineteen women in Malaysia are at risk, compared to one in eight in Europe and the United States. The objectives of this study were: (1) to assess patients’ knowledge on risk factors, symptoms and methods of screening of breast cancer; and (2) to determine their perceptions towards the disease treatment outcomes. Methods: A cross-sectional survey using a validated self-administered questionnaire was conducted among 119 consecutive surgical female patients admitted from 1st of September to 8th of October 2015 in Hospital Sultan Abdul Halim, Kedah. Data were analyzed using General linear regression and Spearman’s correlation with Statistical Package for Social Science (SPSS) version 20. Results: Mean (SD) age was 40.6 (15.1) years and majority of the patients were Malay (106, 89.1%). Mean scores for general knowledge, risk factors and symptoms of breast cancer were 50.2 (24.0%), 43.0 (22.9%) and 64.4 (28.4%) respectively. Mean total knowledge score was 52.1 (19.7%). 80 (67.2%) and 55 (46.2%) patients were aware of breast self-examination and clinical breast examination recommendations, respectively. Generally, patients had positive perceptions towards breast cancer treatment outcomes. However, majority (59.7%) considered that it would be a long and painful process. Knowledge was significantly better among married women with spouses (p=0.046), those with personal history of breast cancer (p=0.022) and with monthly personal income (p=0.001) with the coefficient of determination, R²=0.16. Spearman’s correlation test showed a significant positive relationship between monthly personal income and breast cancer awareness (r = 0.343, p <0.001). Conclusion: Awareness on breast cancer among our patients was average. Thus, there is a need for more awareness programs to educate women about breast cancer and promote its early detection.
Conference Paper
Over the past 20 years, cancer cases have reportedly been increasing worldwide. Among the many types of cancer, breast cancer is regarded as a leading cause of death among women. Ultrasound, X-ray (mammograms and X-ray computed tomography), magnetic resonance imaging, thermography and nuclear medicine functional imaging are the different modalities available for early-stage breast cancer detection. Mammography is a conventional breast cancer screening technique that can detect tumorous masses at lower cost and with better accuracy. This paper describes the digital implementation of a model based on intuitionistic fuzzy histogram hyperbolization and a possibilistic fuzzy c-mean clustering algorithm for early breast cancer detection. Clustering plays a key role in the segmentation step. Classical fuzzy clustering assigns data to multiple clusters with different degrees of membership, but irrelevant data are also allocated to clusters to which they do not relate. In our new work we combine the possibilistic method with fuzzy c-mean to resolve this issue, after applying the intuitionistic fuzzy histogram hyperbolization algorithm in the initial preprocessing phase on the mammogram images. A texture feature extraction technique is then used to extract features. The developed rules were applied in the classifier to detect the presence of a cancerous tumor in mammogram images. The overall classification accuracy reached 94% during the training stage.
Conference Paper
In this paper, the purpose was to develop a computer-aided diagnosis (CAD) system that uses diffusion-weighted magnetic resonance imaging (DW-MRI) to distinguish benign from malignant breast tumors. Sixty-one cases were collected, including 23 patients with benign tumors and 38 patients with malignant tumors. Two types of texture features were obtained from each lesion: 6 histogram statistical features and 16 gray-level co-occurrence matrix (GLCM) features. Feature selection was based on the Random Forest-Recursive Feature Elimination (RF-RFE) method. Random Forest was used to build the classification model, and the classifier performance was evaluated using the area under the receiver operating characteristic curve (AUC) and leave-one-out cross validation (LOOCV). Six texture features (3 histogram statistical features and 3 GLCM features) were selected via this approach, an AUC of 0.76 was obtained, and the classification accuracy, sensitivity, and specificity were shown to be 77.05%, 84.21%, and 65.21%, respectively. The results suggest that the texture features can be used for developing CAD systems for breast cancer, and show high sensitivity.
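The three percentages reported in the abstract above can be cross-checked against its class sizes (38 malignant, 23 benign). The sketch below assumes a confusion matrix of 32 true positives, 6 false negatives, 15 true negatives and 8 false positives. These counts are inferred here because they reproduce the reported rates; they are not stated in the abstract.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity and specificity as fractions."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
    }


# Inferred (assumed) counts: 38 malignant -> 32 detected, 23 benign -> 15 cleared.
metrics = confusion_metrics(tp=32, fn=6, tn=15, fp=8)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Under these assumed counts, accuracy is 47/61 and sensitivity is 32/38, which round to the reported 77.05% and 84.21%. Specificity is 15/23, about 65.217%, which matches the reported 65.21% if that figure was truncated rather than rounded.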
Conference Paper
This paper builds on previously published studies that successfully apply machine learning techniques to diagnose breast cancer from image-processed nuclear features. The study compares five different classification models (Logistic Regression, Decision Tree, k-Nearest Neighbor, Linear Support Vector Machine and Cubic Support Vector Machine) and how the choice of the features used to construct these models affects their performance. The five classifiers were trained with reduced feature subsets identified by Principal Component Analysis (PCA), correlation selection, selection based on t-test significance, and Random Feature Selection. Random Feature Selection was the most effective feature reduction technique: it identified the subset of features that, when used to construct the models, yielded the highest cross-validation prediction accuracies. The logistic regression model trained with the 6 features identified by random selection had 97.77% accuracy, an improvement of 1.5% over the logistic regression model with 3 features reported in the original papers. The linear SVM model trained with 10 features identified by random selection had 97.87% cross-validated accuracy, an improvement of 0.4% over the linear model with 3 features reported in the original paper. The cubic SVM model with 11 features had a similar accuracy of 97.98%. Stacking the logistic, linear SVM and cubic SVM models in an ensemble learner improved the classification accuracy to 98.56%.