An Integrated Approach to Stage 1 Breast Cancer Detection


Abstract

We present an automated, end-to-end approach for Stage 1 breast cancer detection. The first phase of our proposed work-flow takes individual digital mammograms as input and outputs several smaller sub-images from which the background has been removed. Next, we extract a set of features which capture textural information from the segmented images. In the final phase, the most salient of these features are fed into a Multi-Objective Genetic Programming system which then evolves classifiers capable of identifying those segments which may have suspicious areas that require further investigation. A key aspect of this work is the examination of several new experimental configurations which focus on textural asymmetry between breasts. The best evolved classifier using such a configuration can deliver results of 100% accuracy on true positives and a false positive per image rating of just 0.33, which is better than the current state of the art.
Jeannie M. Fitzgerald
University of Limerick, Ireland
Conor Ryan
University of Limerick, Ireland
David Medernach
University of Limerick, Ireland
Krzysztof Krawiec
Poznan University of
Categories and Subject Descriptors
I.2.2 [Artificial Intelligence]: Automatic Programming
Keywords
Mammography; Classification; Multi-Objective Genetic Programming
Routine mammographic screening, particularly at a national
level, is by far the most effective tool for the early detection
and subsequent successful treatment of breast cancer [30,
32]. It is essential to discover signs of cancer early, as sur-
vival is directly correlated with early detection [32].
GECCO ’15, July 11 - 15, 2015, Madrid, Spain
2015 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ISBN 978-1-4503-3472-3/15/07. . . $15.00
The introduction of breast screening programs has con-
tributed to a significantly higher demand for radiologists,
and a worldwide shortage of qualified radiologists who
specialize in mammography [3] has led to many radiologists
being dangerously overworked [2]. This is likely to lead to
(i) there being insufficient time for radiologists to interpret
mammograms (which are notoriously difficult to read); (ii)
an inability to provide redundant readings (second reader);
and (iii) radiologists being overly conservative, which in turn
is likely to increase the number of patient call backs, poten-
tially resulting in unnecessary biopsies which may lead to
patient anxiety and mistrust of the system.
Fortunately, the increased availability of digital mammog-
raphy means that it is now much more feasible to use au-
tomated methods to assist with detection. A mammogram
is performed by compressing the breast between two plates
which are attached to a mammogram machine: an adjustable
plate on top with a fixed x-ray plate underneath. An image
is recorded using a digital detector located on the bottom
plate. Two views of each breast are recorded: the cranio-
caudal (CC) view, which is a top down view, and the medi-
olateral oblique (MLO) view, which is a side view taken at
an angle. Functional breast tissue is termed parenchyma
and this appears as white areas on a mammogram, while
the black areas are composed of adipose (non-functioning,
fatty) tissue which is transparent under X-rays.
Various levels of automation exist in mammography, and these can generally be divided into Computer-Aided Detection (CAD) and Computer-Aided Diagnosis (CADx) [20]. In this work we concentrate exclusively on CAD and, in particular, on what is known as Stage 1 detection.
A stage 1 detector examines mammograms and highlights
suspicious areas that require further investigation. For this
task, it is important to strike a balance: an overly conser-
vative approach degenerates to marking every mammogram
(or segment thereof) as suspicious, while missing a cancerous area
can have disastrous consequences. Our objective is to de-
velop a stage 1 detector that is highly accurate in terms of
detecting suspicious areas (True Positives (TP)) with as few
false alarms (False Positives (FP)) as possible. In the lit-
erature, it is relatively standard practice to convert FP to
False Positives Per Image (FPPI).
The potential for CAD to improve screening mammogra-
phy outcomes by increasing the cancer detection rate has
been shown in several retrospective studies including that
of Cupples et al. [5] who reported an overall increase of 16%
in the cancer detection rates using CAD together with tra-
ditional detection methods. In that study, CAD increased
the detection rate of small invasive cancers by 164%.
The remainder of this paper is organised as follows: In
Section 2 we outline related work, then in Section 3 we pro-
vide a detailed description of our proposed workflow, next
in Section 4 we further describe our experiments and report
results and finally, in Section 5 we state our conclusions and
propose avenues for future research.
Previous GP research effort such as that of Nandi et al. [14,
24] has successfully tackled both feature selection and the
classification of previously identified abnormalities as either
benign or malignant.
Other notable research using GP is that of Ahmad et al. [1]
who designed a Stage 2 cancer detector for the well known
Wisconsin Breast Cancer dataset, in which they used the
features extracted from a series of fine needle aspirations
and an evolved neural network. Ludwig and Roos [19] used
GP to estimate the prognosis of breast cancer patients from
the same data set, using GP to reduce the number of features
before evolving predictors. Langdon and Harrison [17] took
a different approach, using biopsy gene chip data, but their
system approached a similar level of automation.
More recent GP effort in mammography has been con-
cerned with a combination of feature detection and classifi-
cation. One such study by Ryan et al. [29] reports a best
TP Rate of 100% with an FPPI of just 1.5%. In other work,
the best reported seems to be that of Li et al. [18], which
reports a 97.3% TP rate and 14.81 FPPI. Similar work by
Polakowski [27] had a lower TP rate (92%) but with a much
lower FPPI rate (8.39).
The standard method of reporting results is to use a TP/FPPI breakdown, which is what we will present here. However, it is important to note that while TP rates may be directly comparable if the same volumes are used, FPPI may not be, as the metric depends to some extent on the number of mammographic regions examined and the detection objective of the system. See Section 3.1 for further explanation.
Regarding the various types of features that have been
studied, those which capture aspects of shape [28], edge-
sharpness [23] and texture [4] have been used for mass seg-
mentation and detection. Nandi et al. [24] reported a clas-
sification accuracy of 98% using a combination of all three
of these feature types.
The majority of existing GP systems operate only at the
classification stage and rely upon previously extracted features,
and with the exception of Ryan et al. [29], all of the
previous work mentioned above deals with a single breast
in isolation. In this paper, we are influenced by the research
of Tabar [33] which indicates that, in general, both breasts
from the same patient have the same textural characteristics.
Our hypothesis is that the existence of suspicious areas may
be more likely if a patient’s breasts are texturally different
from each other.
Another strong argument in favour of considering textural
aspects is the connection between breast density and breast
cancer. Density of breast tissue is an important attribute
of the parenchyma and it has been established that mam-
mograms of dense breasts are more challenging for human
experts. At the same time, repeated studies have demon-
strated that women with dense tissue in greater than 75%
of the breast are 4 to 6 times more likely to develop breast
cancer compared with women with little to no breast den-
sity [22].
A complete review of the important relevant research is
beyond the scope of this paper. The interested reader is
directed to [26] for an evaluation of the recent state-of-the-
art in computer-aided detection and diagnosis systems.
Given the importance of parenchymal density as a risk fac-
tor and the difficulty for human experts in identifying suspi-
cious areas in this challenging environment, we believe that
a stage 1 detector which focuses on textural asymmetry may
have a strong decision support role to play in the identifi-
cation of suspicious mammograms. GP is particularly suit-
able for this type of task as it is very flexible, is capable of
automated feature selection and extraction, and most im-
portantly in the medical domain – provides an explanation
for classification decisions in the form of a human readable
With this in mind, we develop an automated stage 1 detec-
tion system with GP [15] at its centre, where parenchymal
texture and textural asymmetry between both breasts of the
same patient determine the choice of image features (feature
detection) and the construction of input datasets which
leverage this data. In doing so, we extend the work of [29] by
considering a different set of features and a greater number
of configurations.
Our work-flow begins with background suppression and
image segmentation and then progresses to the generation of
textural features. These are fed through the system, which
performs feature reduction before passing the most salient
ones to GP to evolve classifiers. The best resulting classifiers
deliver results of 100% accuracy on true positives and a false
positive per image rating of just 0.33, which is appreciably
better than prior work.
Figure 1: Image Segmentation
3.1 Background Suppression and Image Segmentation
The background of the mammographic image is never per-
fectly homogeneous, and it includes at least one tag letter
indicating whether the image is either a right or left breast.
This is often augmented by a string of characters indicating
which view (CC or MLO) was taken. These backgrounds
need to be replaced with homogeneous ones to correctly pro-
cess the image in a later stage. We used the same approach
as in [29] to suppress the image background.
We chose to divide each image into three segments, and
to examine each segment separately. As there can be more
than one suspicious area in an image, we return true for as
many segments as the system finds suspicious, meaning that
an image can have several positives returned. With Stage 1
detectors such as ours, this is described by the FPPI of an
image. As the maximum FPPI is capped at the number of
segments that the breast is divided into, using fewer seg-
ments means that the FPPI will be lower. On the other
hand, accurate detection of the TPs is substantially more
difficult as the area is larger.
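The relationship between the number of segments and FPPI can be illustrated with a small sketch. It uses the standard definition of FPPI (wrongly flagged regions divided by images examined); the function names and the counts below are hypothetical and are not taken from the experiments.

```python
def fppi(false_positive_marks: int, num_images: int) -> float:
    """False positives per image: wrongly flagged regions / images examined."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return false_positive_marks / num_images


def max_fppi(segments_per_image: int) -> int:
    """Upper bound on FPPI: every segment of a healthy image is flagged."""
    return segments_per_image


# Hypothetical counts, for illustration only.
print(fppi(false_positive_marks=12, num_images=40))  # 0.3
print(max_fppi(segments_per_image=3))                # 3
```

With three segments per image, a detector that flags everything reaches an FPPI of 3, whereas the same behaviour on unsegmented images is capped at 1. This is why FPPI values are only comparable between systems that examine a similar number of regions.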
Using the same algorithm as described in [29], we divided
the breast images into three overlapping segments of roughly
similar size, as shown in Figure 1: one segment captures the
nipple area, and one each covers the top and bottom of the
remainder of the breast. The overlapping is designed to help
reduce the possibility of a mass going undetected.
3.2 Feature Detection
Before attempting to classify mammograms as suspicious or
not we must first extract features that GP can use to dis-
tinguish between classes. In this study, we have chosen to
use Haralick’s Texture Features [11]. Textural features are
appropriate in this case because we are examining parenchy-
mal patterns, and our hypothesis is that suspicious areas are
likely to be texturally dissimilar to normal areas.
The seminal work of Haralick et al. [11] described a method
of generating 14 measures which can be used to form tex-
tural features from a set of co-occurrence matrices or “grey
tone spatial dependency matrices”. When applied to pixel
grey levels, the Grey Level Co-occurrence Matrix (GLCM)
is defined to be the distribution of co-occurring values at a
given offset. Using the co-occurrence matrix, different properties
of the pixel distribution can be generated by applying
various calculations to the matrix values.
To quantitatively describe the textural characteristics of
breast tissue, we calculate a GLCM for each segment and
for each breast. To keep the GLCM size manageable, we
first reduce the number of gray levels to 256 (from 65535 in
the original images). We independently calculate GLCMs
for four orientations corresponding to two adjacent and two
diagonal neighbours. Next, we calculate the Haralick fea-
tures [11], which reflect (among others) contrast, entropy,
variance, and correlation of pixel values. In this work we
examine a neighbourhood of 1 and average the feature values
for the four orientations. The down-sampling of gray
levels, construction of GLCMs and extraction of Haralick
features is achieved using Matlab [21].
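As an illustration of the GLCM computation described above, the following is a minimal pure-Python sketch, not the Matlab implementation used in this work. It counts co-occurrences at a distance of 1 for four orientations and averages one Haralick feature (contrast) over them. The function names, the offset convention and the tiny 3-level image are assumptions made for the example, and the counts are not symmetrised.

```python
def glcm(image, dx, dy, levels):
    """Count how often grey level i has grey level j at offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts


def normalise(counts):
    """Turn raw counts into a joint probability distribution."""
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Haralick contrast: sum over (i, j) of (i - j)^2 * p(i, j)."""
    return sum((i - j) ** 2 * v
               for i, row in enumerate(p) for j, v in enumerate(row))


# (dx, dy) offsets for 0, 45, 90 and 135 degrees at distance 1.
OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, -1)]


def mean_contrast(image, levels):
    """Average the contrast feature over the four orientations."""
    values = [contrast(normalise(glcm(image, dx, dy, levels)))
              for dx, dy in OFFSETS]
    return sum(values) / len(values)


image = [[0, 0, 1],
         [1, 2, 2],
         [0, 1, 2]]
print(mean_contrast(image, levels=3))
```

The other Haralick features are computed from the same normalised matrix in the same way, each with its own formula.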
Segments are rectangular and often extend beyond the
breast, thus containing some background information. A
GLCM calculated from such a segment in a conventional
way would register very high values for black pixels and so
distort the values of Haralick features. As many mammographic images contain sections of adipose tissue, which appears black in mammograms and which is useful information in its own right, we should not ignore black pixels. Therefore, before calculating the GLCM, we increase by one the intensity of every pixel within the breast, using the information resulting from the segmentation stage. The pixels that already have the maximal value retain it (this causes some information loss, albeit a negligible amount, as there are typically very few such pixels). Then, once the GLCM has been calculated, we simply “hoist” the GLCM up and to the left to remove the impact of the unmodified background pixels.
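One possible reading of this step is sketched below. It assumes that a boolean breast mask is available from segmentation and that "hoisting" means discarding the first row and column of the GLCM, which hold every pair involving a background pixel. The function name and the toy data are hypothetical.

```python
def hoisted_glcm(image, mask, dx, dy, levels):
    """GLCM restricted to breast pixels via the shift-and-hoist trick.

    image: 2D list of grey levels in range(levels)
    mask:  2D list of booleans, True where the pixel lies inside the breast
    """
    max_level = levels - 1
    # Shift breast pixels up by one (saturating at the top level) so that
    # level 0 is reserved for the background.
    shifted = [[min(v + 1, max_level) if inside else 0
                for v, inside in zip(img_row, mask_row)]
               for img_row, mask_row in zip(image, mask)]

    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(shifted), len(shifted[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[shifted[r][c]][shifted[r2][c2]] += 1

    # "Hoist" up and to the left: drop row 0 and column 0, which contain
    # every co-occurrence that involves a background pixel.
    return [row[1:] for row in counts[1:]]


image = [[0, 0, 3],
         [0, 1, 3]]
mask = [[True, True, False],
        [True, True, False]]
print(hoisted_glcm(image, mask, dx=1, dy=0, levels=4))
# [[1, 1, 0], [0, 0, 0], [0, 0, 0]]
```

In the toy example the black-to-black pair inside the breast is retained, while the two pairs that touch the background column are discarded.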
We conducted a preliminary analysis of the 13 computed
Haralick features examining variance across and between
both classes and then carried out a more formal analysis us-
ing several ranker methods [10] which ranked the attributes
according to the concept of information gain [16]. In this
context information gain can be thought of as a measure
of the value of an attribute which describes how well that
attribute separates the training examples according to their
target class labels. These feature selection steps suggested
that the most promising features in terms of discrimination
were contrast and difference entropy. Accordingly, we dis-
carded the other features and let GP focus on those two.
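To make the notion of information gain concrete, the sketch below computes it for a single numeric attribute split at one threshold. This is a simplified stand-in for the ranker methods of [10], which may discretise attributes differently; the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Reduction in label entropy after splitting an attribute at threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part)
                    for part in (left, right) if part)
    return entropy(labels) - remainder


labels = [0, 0, 1, 1]
separating = [0.1, 0.2, 0.8, 0.9]     # splits the classes perfectly
uninformative = [0.1, 0.9, 0.1, 0.9]  # tells us nothing about the class

print(information_gain(separating, labels, threshold=0.5))     # 1.0
print(information_gain(uninformative, labels, threshold=0.5))  # 0.0
```

An attribute that separates the classes perfectly recovers the full entropy of the labels, while an attribute unrelated to the class yields a gain of zero.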
3.3 Dataset Construction
We use the University of South Florida Digital Database for
Screening Mammography (DDSM) [13] which is a collection
of 43 “volumes” of mammogram cases. A volume is a collection of mammogram cases and can be classified as either normal, positive, benign or benign without callback. All patients in a particular volume have the same classification. We use cases from the cancer_02 volume and three of the normal volumes (volumes 1 to 3).
The incidence of positives within mammograms is roughly
5 in 1000 giving a massively imbalanced data set. To ensure
that our training data maintains a realistic balance, we de-
liberately select only a single volume of positive cases. In
constructing training and test data several images were dis-
carded either because of image processing errors or because
we were unable to confidently identify which segment/s were
cancerous for a particular positive case. This initial processing
resulted in a total of 294 usable cases, 75 of which contain
cancerous growths (which we call positive from now on).
Each case initially consists of images for the left and right
breasts and for the MLO and CC views of each breast. Once
the segmentation step has been completed images are added
for each of the three segments (nipple/top/bottom) for each
view of each breast. Thus, there are a total of four images
for each breast, for each view: one for the entire breast (A),
and one for each of the three segments (At, Ab, An).
If we count the numbers of positives (P) and negatives (N)
in terms of breasts rather than cases, which is reasonable,
given that each is examined independently (i.e. most, but
not all, patients with cancerous growths do not have them
in both breasts), then the number of non-cancerous images
increases significantly: giving two for each non-cancerous
case and one for most cancerous growths. For the volumes
studied, of the 75 usable positive cases, 3 have cancer in
both breasts. Thus, considering full breast CC images only,
we have 78 positive images and 510 (219 * 2 + 72) negative images.
Turning our attention to segments (At, Ab, An) (excluding
full breast images), and again considering only CC segments
for the moment, for each non cancerous case we have 3 seg-
ments for each breast (left and right) together with 2 non
cancerous segments for each cancerous breast which gives
a total of 1686 non cancerous segments and 78 cancerous
segments. Similarly, for the MLO view there are 1686 non
cancerous segments and 78 cancerous ones.
Thus, we obtain three distributions: one for the non-
segmented single views (CC or MLO) full breast images (78
positives (Ps), and 510 negatives, (Ns)); one for the seg-
mented single views (78 Ps and 1686 Ns); and one for seg-
mented combined CC MLO views (156 Ps and 3372 Ns).
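The class counts quoted above can be reproduced with a few lines of arithmetic. The sketch uses only the figures stated in the text and, as the text does, assumes one cancerous segment per positive breast; the variable names are illustrative.

```python
usable_cases = 294
positive_cases = 75
bilateral_cases = 3          # positive cases with cancer in both breasts
segments_per_breast = 3      # At, Ab, An

negative_cases = usable_cases - positive_cases            # 219

# Full-breast images, one view.
positive_breasts = positive_cases + bilateral_cases       # 78
negative_breasts = (2 * negative_cases
                    + (positive_cases - bilateral_cases))  # 510

# Segments, one view (one cancerous segment per positive breast).
total_segments = usable_cases * 2 * segments_per_breast   # 1764
positive_segments = positive_breasts                       # 78
negative_segments = total_segments - positive_segments     # 1686

# Segments, CC and MLO views combined.
combined = (2 * positive_segments, 2 * negative_segments)  # (156, 3372)

print(positive_breasts, negative_breasts)
print(positive_segments, negative_segments)
print(combined)
```

The resulting positive-to-negative ratios are roughly 1:6.5 for full breast images and 1:21.6 for segments.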
Each of these three distributions exhibits very significant class imbalance which, in and of itself, increases the level of difficulty of the classification problem. The imbalance in the data was mitigated in all cases by using Proportional Individualised Random Sampling, as described in [8].

Name       Ps    Ns    Breasts  Segs  Views     Description
B1S0V1     78    510   1        1     CC        Unsegmented (full breast image).
B1S1V1     78    1686  1        1     CC        Single segment.
B1S2V2     156   3372  1        2     CC + MLO  1 segment for each view.
B1S3V1     78    1686  1        3     CC        3 segments (At, Ab, An).
B2S0V1     78    510   2        2     CC        Unsegmented images from both breasts.
B2S2V1     78    1686  2        2     CC        1 segment from each breast.
B2S4+0V1   78    1686  2        4     CC        1 segment + unsegmented from each.
B2S3+0V1   78    1686  2        3     CC        1 segment from each + unsegmented from first.
B2S4V1     78    1686  2        4     CC        3 segments + one segment.
B2S6V1     78    1686  2        6     CC        3 segments + one segment + 2 unsegmented.

Table 1: Experimental Configurations. Each was generated from the same master data set.
Based on this master dataset, we consider several setups representing different configurations of breasts, segments and views (see Table 1). The following terminology is used to describe the composition of instances for a given setup: BXSYVZ, where X is the number of breasts, Y the number of segments and Z the number of views for a given instance. In the cases where there is just one view (B1S1V1, B2S2V1, B1S3V1, B2S4V1) we use the CC views, while in the cases where the breast has been segmented, the system attempts to classify whether or not the segment has a suspicious area. Of particular interest are the two-breast (B2SYV1) setups, which investigate the use of asymmetry. These rely solely on the CC view: each instance comprises selected features from one breast CC segment/s together with the same features taken from the corresponding other breast CC segment/s for the same patient.
There are two setups which deviate slightly from the naming scheme above: B2S3+0V1 and B2S4+0V1. Here, +0 indicates that features for a non-segmented image have been included.
We want to exploit any differences between a segment and the rest of the breast (i.e. between A and Ax) but also between a segment and the corresponding segment from the opposite breast (say B and Bx), with the objective of evolving a classifier capable of pinpointing a specific cancerous segment. To facilitate this process, where we use more than one segment for a particular setup, features from the segment of interest are the first appearing data items in each instance for the dataset for that setup. Details of the specific setups used in the current study are as follows:
3.3.1 B1S0V1
This dataset configuration has an instance for the selected
features of each full breast image. It uses the CC view only
and has a separate instance for each breast for each patient.
3.3.2 B1S1V1
The B1S1V1 configuration also uses only the CC view, but
this setup uses each of the three segments (At, Ab, An) separately,
i.e. each instance is comprised of the feature values
for a single segment. Again there is an instance for each
breast for each segment.
3.3.3 B1S2V2
Both views are used in the B1S2V2 setup. For each segment,
excluding the full breast image, for each breast, each
instance contains feature values for that segment and the
corresponding segment for the other view (CC or MLO).
So for a given segment, say At, there are instances for the
In this setup the segments of interest are At LEFT CC,
respectively, i.e. the segment whose features occur first. This
principle applies to all of the remaining setups, where more
than one segment is used.
3.3.4 B1S3V1
This configuration uses three CC segments (At, Ab, An) for
a single breast, where the first segment is alternated in successive
instances. For example, for a given single breast there
are three training instances similar to:
Where the order of the remaining two segments does not matter.
3.3.5 B2S0V1
In the B2S0V1 setup we study a simple case of symmetry where
each instance comprises features for 2 segments: one for
each full breast image, left and right, CC view only.
3.3.6 B2S2V1
In this configuration we investigate another case of sym-
metry: each entry consists of the feature values for a sin-
gle CC segment from one breast combined with those of
the corresponding CC segment from the other breast, for
the same patient. In this case there are two entries for
each segment: (Ax LEFT CC, Ax RIGHT CC) and (Ax RIGHT CC,
Ax LEFT CC), where x represents a particular
segment (At, Ab, An).
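The construction of B2S2V1 instances can be sketched as follows. The code assumes two feature values per segment (for example contrast and difference entropy) stored in a dictionary keyed by breast side and segment; the data layout, the function name and the numbers are hypothetical. Each instance is labelled by the status of its first (segment of interest) entry only.

```python
SEGMENTS = ("t", "b", "n")  # top, bottom, nipple


def b2s2v1_instances(features, labels):
    """Build (feature_row, label) pairs for one patient, CC view only.

    features[(side, seg)] -> tuple of feature values for that segment
    labels[(side, seg)]   -> True if that segment is cancerous
    """
    instances = []
    for seg in SEGMENTS:
        for first, other in (("LEFT", "RIGHT"), ("RIGHT", "LEFT")):
            # Segment of interest first, then the matching segment
            # from the opposite breast.
            row = features[(first, seg)] + features[(other, seg)]
            instances.append((row, labels[(first, seg)]))
    return instances


# Hypothetical patient with a suspicious top segment in the left breast.
features = {
    ("LEFT", "t"): (4.1, 1.9), ("RIGHT", "t"): (2.0, 1.2),
    ("LEFT", "b"): (2.2, 1.3), ("RIGHT", "b"): (2.1, 1.3),
    ("LEFT", "n"): (1.8, 1.1), ("RIGHT", "n"): (1.9, 1.1),
}
labels = {key: key == ("LEFT", "t") for key in features}

for row, label in b2s2v1_instances(features, labels):
    print(row, label)
```

Note that the mirrored instance (right top segment first) is labelled negative even though it contains the features of the cancerous left segment, since only the first segment determines the class.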
3.3.7 B2S3+0V1
Each instance in this setup is comprised of feature data from
segmented and unsegmented images. It consists of informa-
tion for a segment, the unsegmented image and the corre-
sponding segment from the other breast. For example:
3.3.8 B2S4+0V1
Similar to B2S3+0V1, each instance in this setup is again
comprised of feature data from segmented and unsegmented
images. It consists of information for a segment, the un-
segmented image and the corresponding segment from the
other breast together with the unsegmented image data for
the other breast. For example:
3.3.9 B2S4V1
The B2S4V1 experimental setup is a combination of B1S3V1
and B2S2V1 where each training instance is comprised of the
feature values for the three segments for a single breast (A)
combined with the corresponding segment from the other
breast (B) for the leftmost, first occurring segment of A.
For example:
Where in this instance At LEFT CC is the segment of interest.
3.3.10 B2S6V1
The final experimental setup is an extension of B2S4V1
where feature values for the full breast segment for the right
and left breasts are added. For example:
Where in this instance At LEFT CC is the segment of interest.
It is important to note that where more than one
segment is used, the segment of interest is the first occurring,
leftmost one. If that segment has been diagnosed as can-
cerous then the training / test instance in which it occurs
is marked as positive, and if it has not been diagnosed as
cancerous then the entire instance is marked as negative re-
gardless of the cancer status of any other segments used in
that particular instance. Thus, excluding the B1S0V1 setup,
the objective is not simply to determine if a given breast is
positive for cancer, but rather to pinpoint which segments
are positive. If successful, this capability could pave the
way for further diagnosis.
All experiments used a population of 200 individuals, running for 60 generations, with a crossover rate of 0.8 and a mutation rate of 0.2. The minimum initial depth was 4, while the maximum depth was 17. The instruction set was small, consisting of just +, -, *, /. The tree terminals (leaves) are selected from the available Haralick features, with two available per segment.
To transform a continuous output of a GP tree into a
nominal decision (Positive, Negative), we binarize it using
the method described in [7], which optimizes the binariza-
tion threshold individually for each GP classifier.
For our selection and replacement strategy we employed
an NSGA-II Multi-Objective GP (MOGP) algorithm as
described in [6] and updated in [9]. We chose to use the
multi-objective algorithm due to the relationship between
the main objectives for the mammography task. Prelim-
inary experiments with various composite single objective
fitness functions had not proved very successful and pre-
vious work [29] had demonstrated the effectiveness of the
MOGP approach.
The NSGA-II algorithm was used to drive selection ac-
cording to performance on three different objectives. When
using this type of algorithm for problems which may neces-
sitate trade-offs due to a natural tension between objectives,
the system typically does not return a single best individual
at the end of evolution, but rather a Pareto front, or range of
individuals representing various levels of trade-off between
the different objectives. However, for this particular task,
we are not interested in the Pareto front of individuals; after
all, a model with a zero FPR and zero or very low TPR
(every instance classified as N) is not of much practical use
in this context. What we really care about is achieving the
lowest possible FPR for the highest possible TPR. Thus,
during evolution we maintained a single entry “hall of fame”
(HOF) for each CV iteration, whereby as we evaluated each
new individual on the training data, if it had a higher TPR
or if it had an equal TPR but a lower FPR to the HOF in-
cumbent for that CV iteration, the new individual replaced
that HOF incumbent.
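The hall-of-fame replacement rule described above amounts to a lexicographic comparison: higher TPR first, then lower FPR as the tie-breaker. A minimal sketch follows, with hypothetical names and candidate values.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    tpr: float  # true positive rate on the training data
    fpr: float  # false positive rate on the training data


def update_hof(incumbent: Optional[Candidate],
               challenger: Candidate) -> Candidate:
    """Return whichever individual should hold the single HOF slot."""
    if incumbent is None:
        return challenger
    if challenger.tpr > incumbent.tpr:
        return challenger
    if challenger.tpr == incumbent.tpr and challenger.fpr < incumbent.fpr:
        return challenger
    return incumbent


hof = None
for individual in [Candidate("a", 0.90, 0.20),
                   Candidate("b", 1.00, 0.70),   # higher TPR wins
                   Candidate("c", 1.00, 0.55),   # same TPR, lower FPR wins
                   Candidate("d", 0.95, 0.10)]:  # lower TPR never wins
    hof = update_hof(hof, individual)

print(hof.name)  # c
```

Under this rule an individual with a very low FPR can never displace one with a higher TPR, which reflects the priority given to not missing cancerous segments.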
Using a population of 200, at each generation, 200 new off-
spring are generated, then parents and offspring are merged
into one pool before running Pareto-based selection to select
the best 200. During evolution, we aim to minimize three
fitness objectives: FP Rate, 1 - TP Rate and 1 - AUC, where
AUC is the area under the ROC curve, calculated using the
Mann-Whitney [31] test.
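The AUC equals the Mann-Whitney U statistic divided by the number of positive-negative pairs, that is, the fraction of such pairs in which the positive instance receives the higher score (ties count one half). The sketch below computes it this way and assembles the three minimised objectives; the function names and example scores are hypothetical, and the dominance check is a generic illustration rather than the NSGA-II implementation used here.

```python
def auc_mann_whitney(scores, labels):
    """AUC as the normalised Mann-Whitney U statistic."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    u = sum(1.0 if p > n else 0.5 if p == n else 0.0
            for p in pos for n in neg)
    return u / (len(pos) * len(neg))


def objectives(tpr, fpr, auc):
    """The three quantities to minimise: FP rate, 1 - TP rate, 1 - AUC."""
    return (fpr, 1 - tpr, 1 - auc)


def dominates(a, b):
    """Pareto dominance for minimisation: a is no worse everywhere
    and strictly better in at least one objective."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


scores = [0.9, 0.8, 0.3, 0.1]   # raw GP tree outputs (hypothetical)
labels = [1, 0, 1, 0]
print(auc_mann_whitney(scores, labels))  # 0.75
```

Because the AUC is computed from the ranking of raw outputs, it does not depend on the binarization threshold.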
We performed stratified five-fold cross-validation (CV, [12])
for all setups. However, we also retained 10% of the data
as a “hold out” (HO) test set, where for each set of 5 cross-
validated runs this HO test set data was separated from
the CV data prior to the latter’s allocation to folds for CV.
The data partitioning was carried out using the scikit-learn
Machine Learning (ML) toolkit [25]. We conducted 50 cross-
validated runs (each consisting of 5 runs) with identical ran-
dom seeds for each configuration outlined in Table 1.
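The partitioning scheme can be sketched in pure Python as below. This is a simplified stand-in for the scikit-learn utilities actually used, with a hypothetical function name, and it is shown on the class counts of the unsegmented single-view distribution (78 positives, 510 negatives).

```python
import random


def stratified_holdout_and_folds(labels, holdout_fraction=0.1,
                                 n_folds=5, seed=0):
    """Split indices into a hold-out set and n_folds CV folds,
    preserving the class proportions in every part."""
    rng = random.Random(seed)
    holdout = []
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_hold = round(len(idx) * holdout_fraction)
        holdout.extend(idx[:n_hold])           # set aside before CV
        for k, i in enumerate(idx[n_hold:]):   # deal the rest round-robin
            folds[k % n_folds].append(i)
    return holdout, folds


labels = [1] * 78 + [0] * 510
holdout, folds = stratified_holdout_and_folds(labels)

print(len(holdout))                                # 59
print([len(fold) for fold in folds])               # [106, 106, 106, 106, 105]
print([sum(labels[i] for i in f) for f in folds])  # [14, 14, 14, 14, 14]
```

In each CV iteration one fold serves as the test partition and the remaining four as training data, while the hold-out indices are never seen during evolution.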
4.1 Results
In this section we present our experimental results firstly
with regard to AUC measure on the training and test par-
titions of the CV phase, before we examine the TP and FP
rates for this data. Finally we explore the results for each
performance metric, this time taking the performance on
hold-out data into consideration.
4.1.1 AUC
The plots in Figure 2 show the evolution in population average
AUC on the CV training data, development of the
best population training AUC, change in population aver-
age AUC over the generations on the CV test data, and the
evolution of the best population test AUC, where each of
these represent metrics which are averaged over all cross-
validated runs.
It seems that the best performing setups from the perspectives of both training and test CV data are those which leverage information from both breasts, the single breast configuration which uses all three segments or the single breast setup which uses features from the unsegmented image.
Clearly some of the worst AUC results are achieved with the configurations which use segments from a single breast, particularly that which uses two views of the same breast. This is not very surprising as the features contain essentially the same information, and having features which are strongly correlated is known to be detrimental to accurate classification.
Figure 2: Average and best population training and test AUC, averaged over all cross-validated runs.
Overall, the results suggest that increasing the number of
segments gives a significant boost to performance in terms
of training fitness but that the strategy does not necessarily
improve results on test data.
4.1.2 TP/FP Rates
Population average TP and FP rates for training data and
the corresponding rates on test data can be seen in figure 3.
The plots exemplify the tension which exists in the popu-
lation between the two competing objectives of maximizing
the True Positive Rate (TPR) while simultaneously trying
to minimize the False Positive Rate (FPR). In general, a
configuration which produces a higher than average TPR
will also produce a correspondingly higher FPR. For any
configuration, there will always be individuals within the
population which classify all instances as either negative or
positive. In order to accurately distinguish which configura-
tions are likely to deliver a usable classification model it is
better to examine the results of the best performing individ-
uals in the population on the various metrics: TP rate, FP
rate and AUC. We explore this aspect in section 4.1.3.
4.1.3 Model Selection
We report results on the training and test CV segments but
the most important results are those for the HO test set,
as these provide an indication of how the system might be
expected to perform on new, unseen instances.
To compare with results from the literature we convert
the FPR into FPPI which we report in table 3. Here, the
results reported arise from the data shown in table 2 which
represents the mean average results for the best trained in-
dividuals. Results for each HOF are firstly averaged for
each run in a CV set and then averaged across the 50
cross-validated runs. These results refer to performance on the
crucial hold-out (HO) data.
Clearly the best results are produced by the two breast non-segmented approach B2S0V1 with a TPR of 1 and an FPPI of 0.33. This is closely followed by its single breast counterpart B1S0V1 which again delivered a perfect TPR and an FPPI of 0.41.
Of the segmented setups, the two augmented configurations
B2S3+0V1 and B2S4+0V1 also produced good results, with
perfect TPR combined with FPPIs of 1.11 and 1.08
respectively. The B2S4V1 method also did very well, with a
TPR of 1 and an FPPI of 1.11.
Overall, several of our configurations proved capable of
correctly classifying 100% of the cancerous cases while at
the same time having a low FPPI, and the best results were
delivered by individuals trained to view breast asymmetry.
TPR and FPPI produced by the most successful experimen-
tal configurations compare well with the results reported in
section 2, and also reinforce the quality of previous results
reported in [29] as different volumes have been used on this
occasion. We hypothesise that the improvement in FPPI
over the previous work is largely due to the addition of the
Haralick contrast attribute.
The experimental set up with the lowest FPPI was the
one that compared both entire breasts, showing that we suc-
cessfully leveraged textural breast asymmetry as a potential
indicator for abnormalities. Additionally, several of the seg-
mented configurations also produced very good results, in-
dicating that the system is capable of not only identifying
with high accuracy which breasts are likely to have suspicious
lesions but also which segments contain suspicious
Figure 3: Average and best population training and test TP and FP rates, averaged over all cross-validated runs
Method    Train TP  Train FP  Train AUC  Test TP  Test FP  Test AUC  HO TP  HO FP  HO AUC
B1S0V1    1         0.60      0.78       0.92     0.63     0.73      1      0.66   0.76
B1S1V1    1         0.62      0.74       0.94     0.65     0.69      1      0.65   0.80
B1S2V2    1         0.72      0.71       0.97     0.74     0.68      0.97   0.72   0.70
B1S3V1    1         0.48      0.82       0.93     0.50     0.77      0.96   0.51   0.76
B2S0V1    1         0.49      0.81       0.92     0.54     0.75      1      0.51   0.83
B2S2V1    1         0.55      0.77       0.96     0.58     0.74      1      0.57   0.82
B2S3+0V1  1         0.57      0.76       0.93     0.59     0.73      0.96   0.55   0.77
B2S4+0V1  1         0.54      0.76       0.92     0.57     0.71      0.96   0.52   0.78
B2S4V1    1         0.48      0.82       0.92     0.52     0.76      0.97   0.52   0.78
B2S6V1    1         0.40      0.84       0.92     0.45     0.77      0.86   0.46   0.73
Table 2: Mean average training, test and hold out TP, FP and AUC of best trained individuals
Method    Avg TP  Avg FPPI  Best TP  Best FPPI
B1S0V1    1       0.61      1        0.41
B1S1V1    1       1.88      1        1.68
B1S2V2    0.97    2.03      0.95     1.86
B1S3V1    0.96    1.49      0.80     1.08
B2S0V1    1       0.45      1        0.33
B2S2V1    1       1.67      1        1.34
B2S3+0V1  0.96    1.57      1        1.11
B2S4+0V1  0.96    1.48      1        1.08
B2S4V1    0.97    1.52      1        1.11
B2S6V1    0.86    1.28      0.77     1.06
Table 3: Mean average TP and FPPI of best trained individuals,
TP and FPPI of single best trained individual, on HO data.
Best trained individuals are selected according to the algorithm
described in Section 4.
areas. The first of these capabilities could prove useful in
providing second reader functionality to busy radiologists,
whereas the second may provide inputs into an automated
diagnostic system.
Future work will focus on further refining abnormality de-
tection such that the specific location of suspicious areas
within segments may be identified. We are also exploring
the possibility of developing ensemble classifiers where each
member may have been trained on a different type of X-Ray
machine. This may be possible as digital mammograms are
in a format which contains meta-data which includes details
of the specific machine and location where the mammogram
was taken.
K. Krawiec acknowledges support from the Ministry of Science
and Higher Education grant 09/91/DSPB/0572. The remaining
authors gratefully acknowledge the support of Science Foundation
Ireland, grant number 10/IN.1/I3031.
[1] Arbab Masood Ahmad, Gul Muhammad Khan,
Sahibzada Ali Mahmud, and Julian Francis Miller. Breast
cancer detection using cartesian genetic programming
evolved artificial neural networks. In Terry Soule et al.,
editors, GECCO ’12: Proceedings of the fourteenth
international conference on Genetic and evolutionary
computation conference, pages 1031–1038, Philadelphia,
Pennsylvania, USA, 7-11 July 2012. ACM.
[2] Leonard Berlin. Liability of interpreting too many
radiographs. American Journal of Roentgenology,
175:17–22, 2000.
[3] Mythreyi Bhargavan, Jonathan H. Sunshine, and Barbara
Schepps. Too few radiologists? American Journal of
Roentgenology, 178:1075–1082, 2002.
[4] Keir Bovis and Sameer Singh. Detection of masses in
mammograms using texture features. In Pattern
Recognition, 2000. Proceedings. 15th International
Conference on, volume 2, pages 267–270. IEEE, 2000.
[5] Tommy E. Cupples, Joan E. Cunningham, and James C.
Reynolds. Impact of computer-aided detection in a regional
screening mammography program. American Journal of
Roentgenology, 186:944–950, 2005.
[6] Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and
TAMT Meyarivan. A fast and elitist multiobjective genetic
algorithm: Nsga-ii. Evolutionary Computation, IEEE
Transactions on, 6(2):182–197, 2002.
[7] Jeannie Fitzgerald and Conor Ryan. Exploring boundaries:
optimising individual class boundaries for binary
classification problems. In Proceedings of the 14th
international conference on Genetic and evolutionary
computation conference, GECCO ’12, pages 743–750, New
York, NY, USA, 2012. ACM.
[8] Jeannie Fitzgerald and Conor Ryan. A hybrid approach to
the problem of class imbalance. In International Conference
on Soft Computing, Brno, Czech Republic, June 2013.
[9] Félix-Antoine Fortin and Marc Parizeau. Revisiting the
nsga-ii crowding-distance computation. In Proceedings of
the 15th Annual Conference on Genetic and Evolutionary
Computation, GECCO ’13, pages 623–630, New York, NY,
USA, 2013. ACM.
[10] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard
Pfahringer, Peter Reutemann, and Ian H. Witten. The
weka data mining software: an update. SIGKDD Explor.
Newsl., 11(1):10–18, November 2009.
[11] R. Haralick et al. Texture features for image classification.
IEEE Transactions on Systems, Man, and Cybernetics,
3(6), 1973.
[12] Trevor Hastie, Robert Tibshirani and Jerome Friedman.
The elements of statistical learning, volume 2. Springer,
[13] Michael Heath, Kevin Bowyer, Daniel Kopans, Richard
Moore, and W. Philip Kegelmeyer. The digital database for
screening mammography. In M.J. Yaffe, editor, Proceedings
of the Fifth International Workshop on Digital
Mammography, pages 212–218. Medical Physics Publishing,
[14] Rolando R Hernández-Cisneros, Hugo Terashima-Marín,
and Santiago E Conant-Pablos. Comparison of class
separability, forward sequential search and genetic
algorithms for feature selection in the classification of
individual and clustered microcalcifications in digital
mammograms. In Image Analysis and Recognition, pages
911–922. Springer, 2007.
[15] J. Koza. Genetic programming: A paradigm for genetically
breeding populations of computer programs to solve
problems. Technical Report STAN-CS-90-1314, Dept. of
Computer Science, Stanford University, June 1990.
[16] Solomon Kullback and Richard A Leibler. On information
and sufficiency. The Annals of Mathematical Statistics,
pages 79–86, 1951.
[17] W.B. Langdon and A.P. Harrison. Gp on spmd parallel
graphics hardware for mega bioinformatics data mining.
Soft Computing, 12(12):1169–1183, 2008.
[18] Huai Li, Yue Wang, KJ Ray Liu, S-CB Lo, and Matthew T
Freedman. Computerized radiographic mass detection. i.
lesion site selection by morphological enhancement and
contextual segmentation. Medical Imaging, IEEE
Transactions on, 20(4):289–301, 2001.
[19] Simone A. Ludwig and Stefanie Roos. Prognosis of breast
cancer using genetic programming. In Rossitza Setchi et al.,
editors, 14th International Conference on Knowledge-Based
and Intelligent Information and Engineering Systems
(KES 2010), Part IV, volume 6279 of LNCS, pages
536–545, Cardiff, UK, September 8-10 2010. Springer.
[20] M. Sampat, M. Markey, and A. Bovik. Computer-aided
detection and diagnosis in mammography. In Alan C.
Bovik, editor, Handbook of Image and Video Processing.
Elsevier Academic Press, 2010.
[21] MATLAB. version 8.2 (R2012a). MathWorks Inc., Natick,
MA, 2013.
[22] Valerie A McCormack and Isabel dos Santos Silva. Breast
density and parenchymal patterns as markers of breast
cancer risk: a meta-analysis. Cancer Epidemiology
Biomarkers & Prevention, 15(6):1159–1169, 2006.
[23] Naga R Mudigonda, Rangaraj M Rangayyan, and JE Leo
Desautels. Gradient and texture analysis for the
classification of mammographic masses. Medical Imaging,
IEEE Transactions on, 19(10):1032–1043, 2000.
[24] R. J. Nandi, A. K. Nandi, R. Rangayyan, and D. Scutt.
Genetic programming and feature selection for classification
of breast masses in mammograms. In 28th Annual
International Conference of the IEEE Engineering in
Medicine and Biology Society, EMBS ’06, pages
3021–3024, New York, USA, August 2006. IEEE.
[25] F. Pedregosa et al. Scikit-learn: Machine learning in
Python. Journal of Machine Learning Research,
12:2825–2830, 2011.
[26] Nicholas Petrick, Berkman Sahiner, Samuel G Armato III,
Alberto Bert, Loredana Correale, Silvia Delsanto,
Matthew T Freedman, David Fryd, David Gur, Lubomir
Hadjiiski, et al. Evaluation of computer-aided detection and
diagnosis systems. Medical physics, 40(8):087001, 2013.
[27] W. E. Polakowski, D. A. Cournoyer, and S. K. Rogers.
Computer-aided breast cancer detection and diagnosis of
masses using difference of gaussians and derivative-based
feature saliency. IEEE Trans. Med. Imag., 16:811–819,
[28] Rangaraj M Rangayyan, Nema M El-Faramawy, JE Leo
Desautels, and Onsy Abdel Alim. Measures of acutance
and shape for classification of breast tumors. Medical
Imaging, IEEE Transactions on, 16(6):799–810, 1997.
[29] Conor Ryan, Krzysztof Krawiec, Una-May O’Reilly,
Jeannie Fitzgerald, and David Medernach. Building a stage
1 computer aided detector for breast cancer using genetic
programming. In M. Nicolau et al., editors, 17th European
Conference on Genetic Programming, volume 8599 of
LNCS, pages 162–173, Granada, Spain, 23-25 April 2014.
[30] Robert A Smith, Stephen W Duffy, and László Tabár.
Breast cancer screening: the evolving evidence. Oncology,
26(5):471–475, 2012.
[31] Paul Stober and Shi-Tao Yeh. An explicit functional form
specification approach to estimate the area under a receiver
operating characteristic (roc) curve. Available at,
http://www2.sas.com/proceedings/sugi27/p226–227.pdf,
Accessed March, 7, 2007.
[32] L. Tabar et al. A new era in the diagnosis of breast cancer.
Surgical oncology clinics of North America, 9(2):233–77,
April 2000.
[33] T. Tot, L. Tabar, and P. B. Dean. The pressing need for
better histologic-mammographic correlation of the many
variations in normal breast anatomy. Virchows Archiv,
437(4):338–344, October 2000.