IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS – PART B: CYBERNETICS
Using Ant Programming Guided by Grammar for
Building Rule-Based Classifiers
Juan Luis Olmo, Member, IEEE, José Raúl Romero, Member, IEEE, and Sebastián Ventura, Senior Member, IEEE
Abstract—The extraction of comprehensible knowledge is one
of the major challenges in many domains. In this paper an ant
programming (AP) framework, capable of mining classification
rules easily comprehensible by humans and, therefore, capable of
supporting expert-domain decisions, is presented. The algorithm
proposed, called GBAP (Grammar Based Ant Programming), is
the first AP algorithm developed for the extraction of classifica-
tion rules, and it is guided by a context-free grammar that ensures
the creation of new valid individuals. To compute the transition
probability of each available movement, this new model intro-
duces the use of two complementary heuristic functions, instead
of just one, as typical ant-based algorithms do. The selection
of a consequent for each rule mined and the selection of the
rules that make up the classifier is based on the use of a niching
approach. The performance of GBAP is compared against other
classification techniques on 18 varied data sets. Experimental
results show that our approach produces comprehensible rules
and competitive or better accuracy values than those achieved
by the other classification algorithms compared with it.
Index Terms—ant programming (AP), grammar-based auto-
matic programming, ant colony optimization (ACO), classifica-
tion, data mining (DM)
I. INTRODUCTION
Data mining (DM) involves the process of applying specific
algorithms for extracting comprehensible, non-trivial and useful
knowledge from data. The discovered knowledge should have good
generalization performance, i.e., it should accurately predict the
values of some attributes or features of data that were not used
during the run of the DM algorithm. This paper focuses on the
classification task of DM, whose goal is to predict the value of the
class given the values of certain other attributes (referred to as
the predicting attributes). A model or classifier is inferred in a
training stage by analyzing the values of the predicting attributes
that describe each instance, as well as the class to which each
instance belongs. Thus, classification is considered to be
supervised learning, in contrast to unsupervised learning, where
instances are unlabelled. Once the classifier is built, it can be
used later to classify other new and uncategorized instances into
one of the existing classes.
A great variety of algorithms and techniques have been used to
accomplish this task, including decision trees [1], decision rules
[2], naive Bayes [3], support vector machines [4], neural networks
[5], genetic algorithms [6], etc. In domains such as medical
diagnosis, financial engineering, marketing, etc., where domain
experts can use the model inferred as a decision-support system,
decision trees and decision rules are especially interesting. These
techniques have a high-level representation and, therefore, they
allow the user to interpret and understand the knowledge extracted.
For example, in medical problems, classification rules can be
verified by medical experts, thus providing better understanding of
the problem at hand [7].
Manuscript received August 7, 2010;
The authors are with the Department of Computer Science and
Numerical Analysis, University of Cordoba, 14071 Cordoba, Spain
(e-mail: {juanluisolmo,jrromero,sventura}@uco.es).
Digital Object Identifier ?
More recently, ant colony optimization (ACO) [8] has
successfully carried out the extraction of rule-based classifiers.
ACO is a nature-inspired optimization metaheuristic based on
the behavior and self-organizing capabilities of ant colonies
in their search for food. The first application of ACO to the
classification task was the widely used Ant-Miner algorithm,
proposed by Parpinelli et al. [9], which has become a benchmark
algorithm in this field. Since then, several extensions and
modifications of this sequential covering algorithm have been
presented.
Another technique that has reported good results in classi-
fication is genetic programming (GP) [10]. GP is a particular
type of automatic programming, a method that uses search
techniques to find computer programs for solving a given
problem, without requiring that the user knows the structure
of the solution in advance. The user just has to specify the
basic blocks that make up any program or individual, and
how individuals are evaluated. Concretely, GP uses genetic
algorithms as the search technique. Although automatic programming
seems well suited to classification problems, to the best of our
knowledge ant programming (AP), also known as ACO-based automatic
programming [11], has never been applied to them. AP is another
kind of automatic programming method that uses ACO, instead of
genetic algorithms, as the search technique. In this paper we first
review the AP works published in the literature, to show that the
development of AP algorithms and their application to DM is still an
unexplored and promising research area. Then, we explore
the application of an AP algorithm for mining classification
rules, which takes advantage of the inherent benefits of both
ACO metaheuristic and automatic programming. In addition,
a context-free grammar (CFG) is used during the learning
process. All generated individuals must adhere to this gram-
mar, which also provides flexibility to apply the developed
algorithm to a variety of problems with minor changes. Our
proposal can support any number of classes, so that it can
be easily applied to a large variety of data sets, generating
a rule-based classifier. It aims to construct not only accurate
but also comprehensible classifiers. In contrast to other ACO
classification algorithms, our proposal provides more expressive
power, because the grammar allows control over several
aspects related to comprehensibility, such as the definition
of specific operators, the specification of the conditions that
can appear in rule antecedents or how these conditions are
connected [12]. Moreover, our algorithm avoids the drawbacks
of rule induction with sequential covering algorithms [13],
such as Ant-Miner, because it does not remove examples from the
training set when building the classifier.
It is a challenging task to compare the results obtained by
this new approach with those obtained by other rule-based
classification techniques. Thus, we perform a benchmarking
experimental study where several classification techniques are
considered, using a wide variety of data sets. The results
obtained are promising and they show that our approach
performs accurately, building understandable classifiers.
The remainder of this paper is organized as follows. In the
next section we present some related work on ACO and a
brief review of AP. In Section III, we describe the proposed
algorithm. Section IV explains the experiments carried out, the
data sets used and the algorithm setup. The results obtained
are discussed in Section V. Finally, Section VI presents some
concluding remarks.
II. RELATED WORK
In this section, we first present some related work on the
application of ACO to classification. We then provide a review
of the various AP algorithms published in the literature so far.
A. Ant Colony Optimization
ACO is an agent-based, nature-inspired optimization metaheuristic
that belongs to the field of swarm intelligence (SI) [14]. SI is
concerned with the development of multi-agent systems inspired by
the collective behavior of simple agents, e.g., flocks of birds,
schools of fish, colonies of bacteria or amoeba, or groups of
insects living in colonies, such as bees, wasps or ants.
Specifically, ACO bases the design of intelligent multi-agent
systems on the foraging behavior and organization of ant colonies in
their search for food, where ants communicate with each other
indirectly through the environment, by means of a chemical substance
(pheromone) that they deposit on the path they follow, a phenomenon
known as stigmergy. The pheromone concentration on a given path
increases as more ants follow it, and it decreases when ants stop
traveling it, since evaporation then outweighs reinforcement. The
higher the pheromone level on a path, the higher the probability
that a given ant will follow it.
ACO algorithms were initially applied to combinatorial
optimization problems [15], finding optimal or near optimal
solutions. Since then, ACO algorithms have been engaged in
an increasing range of problem domains, and they have also
been shown to be effective when tackling the classification
task of DM [16]. The first algorithm that applied ACO to
rule induction was Ant-Miner [9], and it has become the most
referred-to ACO algorithm in this field. It follows a separate-
and-conquer approach where, starting from a training set and
an empty set of rules, it finds new rules to be added to the
set of discovered rules. As it discovers new rules, it removes
those instances of the training set that are covered by each new
rule, reducing the size of the training set. Ant-Miner chooses a
new term for the current partial rule by applying the transition
rule, and it only considers including terms that have not been
previously chosen. It keeps adding new terms to the rule
antecedent until one term from each available attribute has been
selected, or until adding any of the remaining terms would reduce
the number of training instances covered by the rule below the value
specified by the minimum cases per rule parameter. An
information-theoretic measure based on entropy is used as the
heuristic function. The probability with which a given ant will
select a node $term_{ij}$ (a rule condition of the form
$A_i = V_{ij}$, where $A_i$ is the i-th attribute and $V_{ij}$ is
the j-th value of the domain of $A_i$) to be added to the current
partial rule is given by the following formula:
\[
P_{ij} = \frac{\eta_{ij} \cdot \tau_{ij}(t)}{\sum_{i=1}^{a} x_i \cdot \sum_{j=1}^{b_i} \left( \eta_{ij} \cdot \tau_{ij}(t) \right)} \tag{1}
\]
where $\eta_{ij}$ is a problem-dependent heuristic function for
$term_{ij}$; $\tau_{ij}(t)$ is the amount of pheromone associated
with the transition between attribute $A_i$ and attribute $A_j$ at
time $t$; $a$ is the total number of attributes; $b_i$ is the number
of values in the domain of attribute $A_i$; and $x_i$ is set to 1 if
the attribute $A_i$ has not been selected yet for the construction of
the current partial rule by the current ant, and to 0 otherwise.
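As an illustration of Eq. (1), the following minimal Python sketch computes the selection probabilities over the terms that are still available. The function name, the dictionary layout and the toy attribute names are assumptions made for this example; they are not taken from Ant-Miner itself. Terms whose attribute is already in the partial rule (x_i = 0) are simply left out of the candidate set.

```python
def transition_probabilities(eta, tau, used_attributes):
    """Selection probability of each available term, following Eq. (1).

    eta, tau: dicts mapping (attribute, value) -> heuristic / pheromone.
    used_attributes: attributes already in the partial rule (x_i = 0).
    """
    # Only terms of attributes not yet used contribute (x_i = 1).
    weights = {
        term: eta[term] * tau[term]
        for term in eta
        if term[0] not in used_attributes
    }
    total = sum(weights.values())
    if total == 0:
        return {}  # no term can be selected
    return {term: w / total for term, w in weights.items()}


# Toy values (hypothetical, for illustration only).
eta = {("outlook", "sunny"): 0.2, ("outlook", "rainy"): 0.3,
       ("windy", "yes"): 0.5, ("windy", "no"): 0.5}
tau = {term: 0.25 for term in eta}  # uniform initial pheromone

probs = transition_probabilities(eta, tau, used_attributes={"windy"})
print(probs)
```

With uniform pheromone, the probabilities depend only on the heuristic values of the remaining terms, so the two `outlook` terms share the probability mass in a 2:3 ratio.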
Ant-Miner continues discovering new rules until either the
training set is empty or the number of training instances not
covered by any rule is below a user-defined threshold. Finally,
the majority class among the instances covered by the rule is
assigned as the consequent. The quality of the rules is gauged
by the following expression:
\[
fitness = sensitivity \cdot specificity = \frac{TP}{TP + FN} \cdot \frac{TN}{TN + FP} \tag{2}
\]
where TP stands for true positives, which are positive instances
correctly classified as positives; FP stands for false positives,
i.e., negative examples erroneously labeled as positives; TN are
true negatives, i.e., negative instances correctly identified as
negatives; and FN are false negatives, which are positive examples
incorrectly classified as negatives [1].
Since the publication of Ant-Miner, other research works have
followed the research lines suggested in [9], exploring the effects
of several modifications. Table I lists chronologically the
modifications of Ant-Miner, summing up the main differences of these
algorithms with respect to the reference one. Some variations entail
the use of different mechanisms for pruning, pheromone updating or
the heuristic function, while others are designed for including
interval rules, dealing with continuous attributes, extracting fuzzy
classification rules or being applied to multi-label or hierarchical
classification.
Many of these extensions involve minor changes, and the results
obtained are only slightly different from the ones obtained by the
original Ant-Miner. For example, Liu et al. [17] presented
Ant-Miner2, where they applied a much simpler heuristic function,
acting on the assumption that pheromone reinforcement has enough
power to compensate for possible errors induced by the use of this
less effective heuristic measure. In addition to this change in the
heuristic function, the same
authors proposed a different transition rule and pheromone
reinforcement in Ant-Miner3 [18].
TABLE I
MODIFICATIONS AND EXTENSIONS OF ANT-MINER
Columns: Reference | Pruning | Pheromone | Heuristic | Transition rule | Fitness measure | Interval rules | Continuous attributes | Fuzzy | Multi-label | Hierarchical
Rows (chronological order): Liu et al. [17]; Liu et al. [18]; Wang, Feng [19]; Chen et al. [20]; Chan, Freitas [21]; Chan, Freitas [22]; Smaldon, Freitas [23]; Jin et al. [24]; Galea, Shen [25]; Swaminathan [26]; Martens et al. [27]; Otero et al. [28]; Nalini, Balasubramanie [29]; Otero et al. [30]; Salama, Abdelbar [31]
[The check marks indicating which aspects each work modifies cannot be recovered from the extracted layout.]
In contrast, Ant-Miner+, proposed by Martens et al. [27],
achieved higher accuracy than the previous
Ant-Miner versions. This algorithm defines the environment
as a directed acyclic graph, which allows the selection
of better transitions and the inclusion of interval rules. It
also implements the better performing max-min ant system
(MMAS) [32] and uses a more accurate class-specific heuristic
function. The fitness measure is defined as the sum of the
confidence and the coverage of the individual (rule), as shown
in the following equation:
\[
fitness = confidence + coverage = \frac{| A \cup C \subseteq I, I \in D |}{| A \subseteq I, I \in D |} + \frac{| A \cup C \subseteq I, I \in D |}{| D |} \tag{3}
\]
where the confidence is the quotient of the number of instances I
belonging to the data set D that include both the antecedent A and
the consequent C, and the number of instances that include A. The
coverage is gauged as the ratio between the number of instances that
include both A and C and the number of instances that are not
covered by any of the extracted rules, referred to as |D|.
Another key difference of Ant-Miner+ lies in the value selected for
the heuristic and the pheromone exponent parameters, α and β. In
fact, it introduces a range for each parameter and lets the ants
choose suitable values autonomously.
In addition to these modifications, there are other extensions
related to the hybridization of ACO with other metaheuristics. Among
them, we highlight the hybrid particle swarm optimization (PSO)/ACO
algorithm, PSO/ACO2, developed by Holden et al. [33] for the
discovery of classification rules. PSO is another optimization
technique within SI, inspired by the social behavior of birds in
flocks or fish in schools. PSO/ACO2 is also a sequential covering
algorithm, and it can cope with both numerical and nominal
attributes. It is a hybrid algorithm because it uses ACO to deal
with nominal attributes and PSO to deal with numerical ones. The
fitness function is given by the following expression:
\[
fitness = \frac{1 + TP}{1 + TP + FP} \tag{4}
\]
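The three rule quality measures reviewed in this subsection, Eqs. (2) to (4), can be compared side by side with a small Python sketch. The function names and the example counts are assumptions chosen for illustration, and the guards against empty denominators are an addition that the original formulas do not discuss.

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Eq. (2): product of sensitivity and specificity."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity * specificity


def confidence_plus_coverage(n_antecedent_and_class, n_antecedent, n_remaining):
    """Eq. (3): confidence plus coverage.

    n_antecedent_and_class: instances matching antecedent A and class C.
    n_antecedent: instances matching A.
    n_remaining: instances not yet covered by any extracted rule, |D|.
    """
    confidence = n_antecedent_and_class / n_antecedent if n_antecedent else 0.0
    coverage = n_antecedent_and_class / n_remaining if n_remaining else 0.0
    return confidence + coverage


def corrected_precision(tp, fp):
    """Eq. (4): (1 + TP) / (1 + TP + FP)."""
    return (1 + tp) / (1 + tp + fp)


# Hypothetical counts for a single rule on 100 instances.
q2 = sensitivity_specificity(tp=40, fp=10, tn=45, fn=5)
q3 = confidence_plus_coverage(40, 50, 100)
q4 = corrected_precision(tp=40, fp=10)
print(q2, q3, q4)
```

Note that Eq. (3) ranges over [0, 2], whereas Eqs. (2) and (4) stay within [0, 1], so the values are not directly comparable across algorithms.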
B. Ant Programming
AP is an automatic programming technique that has certain
similarities with GP, but rather than using genetic algorithms
as search technique, it employs ACO to search for programs.
There are different proposals using AP in the literature,
which we now review, although their application is limited
to problems such as symbolic regression, and no applications
of AP to classification have been published so far.
The first work that combined the ants paradigm with the
automatic generation of programs was presented by Roux
and Fonlupt [11], and it was closely related to GP. In fact,
their algorithm starts by creating a random population of
programs (trees) using the ramped half-and-half initialization
method and storing a table of pheromones for each node of the
tree. Each pheromone table holds the amount of pheromone
associated with all possible elements (named terminals and
functions, as in GP). Then, each program is evaluated and the
pheromone table is updated by evaporation and reinforcement based on
the quality of the solutions. These steps are repeated until some
criteria are satisfied; note that new populations of programs are
generated according to the pheromone tables. This approach was used
to solve symbolic regression problems and a multiplexer problem with
relative success.
A similar idea is that presented in [34], where each ant
builds and modifies trees taking into account the quantity of
pheromone at each node, which holds a pheromone table.
The authors combined this approach with particle swarm
optimization, with AP responsible for evolving the architecture
of flexible neural networks, and PSO responsible for optimizing
the parameters encoded in the neural tree. They applied
the resulting flexible neural networks to a time-series prediction
problem, showing the effectiveness of their algorithm.
Boryczka et al. [35], [36] also applied AP to solve symbolic
regression problems, calling their method ant colony program-
ming (ACP). They proposed two different versions of ACP,
known as the expression approach and the program approach.
In the expression approach the system generates arithmetic
expressions in prefix notation from the path followed by
the ant in a graph. This graph is defined as G = (N,E)
where N is the set of nodes, which can represent either a
variable or an operator, and E is the set of edges, each one
with a pheromone value. Green et al. [37] also presented
an AP technique similar to the ACP expression approach.
In turn, in the program approach the nodes in the graph
represent assignment instructions, and the solution consists of
a sequence of assignments that evaluate the function.
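To make the expression approach more concrete, the sketch below evaluates an arithmetic expression written in prefix notation, as it could be read off the path followed by an ant. It is only an illustration: the token format, the operator set (binary operators only) and the function name are assumptions, not details of ACP.

```python
import operator

# Assumed operator set; every operator is taken to be binary.
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate_prefix(tokens, variables):
    """Evaluate a prefix expression given as a list of string tokens."""

    def parse(pos):
        token = tokens[pos]
        if token in OPERATORS:
            left, pos = parse(pos + 1)   # first operand
            right, pos = parse(pos)      # second operand
            return OPERATORS[token](left, right), pos
        if token in variables:
            return variables[token], pos + 1
        return float(token), pos + 1     # numeric constant

    value, end = parse(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after a complete expression")
    return value


# Path visited by a hypothetical ant, encoding x * x + 1.
path = ["+", "*", "x", "x", "1"]
result = evaluate_prefix(path, {"x": 3.0})
print(result)
```

Because prefix notation needs no parentheses, a sequence of visited nodes maps directly onto an evaluable expression.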
Boryczka also presented extensions to these works with
the aim of improving the evaluation performance. In [38],
the author improved the effectiveness and achieved more
simplified solutions by eliminating introns, while in [39] the
computational time necessary for the evaluation of transition
rules was reduced.
Another attempt to evolve programs using the ACO algo-
rithm was AntTAG [40]. It was proposed by Abbass et al.
as a method of automatic programming employing ACO as
its search strategy and a tree adjoining grammar (TAG) to
build programs. TAGs are compact context-sensitive grammars
that use tree manipulation operations for syntactic analysis,
and they can also distinguish between derived and derivation
trees. The authors tested its performance on symbolic
regression problems and achieved better performance than
grammar guided GP [41] or TAG guided GP [42].
Keber and Schuster published another grammar-based work
called generalized ant programming (GAP) [43], which uses
a CFG instead of TAG, and where ants generate a program
by following a path in the space of states. Salehi-Abari
and White [44] worked on GAP, proposing a variation of
the algorithm called enhanced generalized ant programming
(EGAP). More specifically, it introduces a new pheromone
placement method that tends to put in a derivation step an
amount of pheromone proportional to the depth of the path;
and it also employs a specific heuristic function to control
the path termination. Then, Salehi-Abari and White published
another work [45] comparing the performance of GP against
their EGAP algorithm on three different problems: quartic
symbolic regression, multiplexer and Santa Fe ant trail.
More recently, Shirakawa et al. [46] proposed dynamic ant
programming (DAP). Its main difference with regard to ACP
lies in the use of a dynamically changing pheromone table and
a variable number of nodes, which leads to a more compact
space of states. The authors only compared the performance
of DAP against GP using symbolic regression problems.
III. GBAP: GRAMMAR BASED ANT PROGRAMMING ALGORITHM
In this section we describe the main features of the Grammar
Based Ant Programming (GBAP) algorithm.
In short, GBAP is an automatic programming algorithm that uses ACO
as its search technique and is guided by a context-free grammar.
The use of a grammar in this kind of system establishes a formal
definition of the syntactical restrictions, defines the space of
states, and ensures that any solution found is syntactically valid.
In fact, it guarantees the closure property that must be fulfilled
in any automatic programming system [10], as any state and any
feasible solution can only be reached from the initial state in a
certain sequence of steps by applying the production rules
available.
The GBAP algorithm has been conceived for obtaining a
specific classifier arising from a learning process over a given
training set. The output classifier is an ordered rule list in
which discovered rules are sorted in descending order by their
fitness. A default rule predicting the majority class of the
training set is added at the bottom of the classifier. Once the
model has been learned from the training set, to classify a new
instance, the label assigned corresponds to the consequent of
the first rule in the classifier whose antecedent matches the
instance. In case it gets to the end of the classifier without
any rule antecedent covering this new instance, it would be
classified by the default rule.
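The way the output classifier is used can be sketched as a decision list. The following Python fragment is a minimal illustration under assumed data structures: rules are represented as (antecedent, consequent, fitness) triples where the antecedent is a predicate over an instance, and the attribute names and class labels are hypothetical.

```python
def build_classifier(rules_with_fitness):
    """Sort rules in descending order of fitness, dropping the fitness."""
    ordered = sorted(rules_with_fitness, key=lambda rule: rule[2], reverse=True)
    return [(antecedent, consequent) for antecedent, consequent, _ in ordered]


def classify(instance, classifier, default_class):
    """Return the consequent of the first rule whose antecedent matches."""
    for antecedent, consequent in classifier:
        if antecedent(instance):
            return consequent
    return default_class  # default rule: majority class of the training set


# Hypothetical rules: (antecedent predicate, consequent, fitness).
rules = [
    (lambda x: x["outlook"] == "sunny", "stay", 0.70),
    (lambda x: x["windy"] != "yes", "play", 0.85),
]
classifier = build_classifier(rules)

print(classify({"outlook": "sunny", "windy": "no"}, classifier, "play"))
print(classify({"outlook": "sunny", "windy": "yes"}, classifier, "play"))
print(classify({"outlook": "rainy", "windy": "yes"}, classifier, "play"))
```

Since the first matching rule decides the label, the ordering by fitness determines which rule wins when several antecedents cover the same instance.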
As outlined in the following sections, the GBAP algorithm cannot be
fitted into a typical ACO system. Because pheromone levels are
bounded to the interval [τmin, τmax] and all edges are initialized
to the maximum pheromone amount allowed, the algorithm with which
GBAP shares most characteristics is probably MMAS [32]. However,
unlike the reinforcement carried out in GBAP, in MMAS only the best
ant is responsible for updating pheromone trails. The complexity of
MMAS-based algorithms is a research area in its own right, which has
been widely studied and analyzed by Neumann et al. [47].
On the other hand, regarding the AP algorithms reviewed in
Section II-B, GBAP presents certain similarities with GAP [43] and
EGAP [44], especially in the use of a context-free grammar and in
the shape adopted by the space of states.
The main characteristics of GBAP are described in the following
subsections, which also present a detailed pseudocode of the
algorithm.
G = (ΣN, ΣT, P, EXP)
ΣN = {EXP, COND}
ΣT = {’AND’, ’=’, ’!=’,
’attr1’, ’attr2’, ..., ’attrn’,
’value11’, ’value12’, ..., ’value1m’,
’value21’, ’value22’, ..., ’value2m’,
..., ’valuen1’, ’valuen2’, ..., ’valuenm’}
P = {EXP = COND | ’AND’ , EXP , COND ;
COND = all possible valid
combinations of the ternary
operator-attribute-value ; }
Fig. 1. Context-free grammar used in GBAP
A. Environment and Rule Encoding
[Figure 2: derivation tree rooted at the start symbol EXP, with levels labelled der = 1 to der = 4. The highlighted path ends in the final state 'AND' , '=' , 'attr1' , 'value11' , '!=' , 'attr3' , 'value3m', i.e., (AND (= attr1 value11) (!= attr3 value3m)).]
Fig. 2. Space of states at a depth of four derivations. The sample coloured path represents the antecedent found by a given ant.
GBAP prescribes a CFG for representing the antecedent of
the rule encoded by each individual. It is expressed in extended
Backus-Naur form (EBNF) notation, and its definition is given
by G = (ΣN, ΣT, P, EXP), where ΣN is the set of non-terminals,
ΣT is the set of terminals, P is the set of production rules, and
EXP is the start symbol of the grammar. The grammar is shown in
Figure 1. Any production rule consists of a left hand side (LHS)
and a right hand side (RHS). The LHS always refers to a
non-terminal symbol that may be replaced by the RHS of the rule
(composed of a combination of terminal and non-terminal symbols).
Production rules are expressed in prefix notation and are always
derived from the left. Hence each transition from a state i to
another state j is triggered by applying a production rule to the
first non-terminal symbol of state i. This design decision was
taken for performance reasons, in order to save on computation costs
when gauging rule fitness.
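The leftmost derivation described above can be illustrated with a short Python sketch that encodes the grammar of Figure 1 and lists the successors of a state. States are represented as tuples of symbols; the function names and the toy attribute domains are assumptions for this example, and attribute or value names are assumed not to clash with the non-terminal names.

```python
from itertools import product

NON_TERMINALS = {"EXP", "COND"}


def build_productions(attributes, operators=("=", "!=")):
    """Production rules of the grammar for a given attribute domain."""
    conditions = [
        (op, attr, value)
        for attr, values in attributes.items()
        for op, value in product(operators, values)
    ]
    return {
        "EXP": [("COND",), ("AND", "EXP", "COND")],
        "COND": conditions,
    }


def successors(state, productions):
    """Apply every production to the first (leftmost) non-terminal."""
    for idx, symbol in enumerate(state):
        if symbol in NON_TERMINALS:
            return [state[:idx] + rhs + state[idx + 1:]
                    for rhs in productions[symbol]]
    return []  # final state: only terminal symbols remain


def is_final(state):
    return not any(symbol in NON_TERMINALS for symbol in state)


# Hypothetical attribute domains.
attributes = {"attr1": ["value11", "value12"], "attr2": ["value21"]}
productions = build_productions(attributes)

level1 = successors(("EXP",), productions)
level2 = successors(("AND", "EXP", "COND"), productions)
print(level1)
print(level2)
```

Starting from `("EXP",)`, the two successors correspond to the states at der = 1 in Figure 2, and expanding `("COND",)` yields one final state per operator-attribute-value combination.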
Observe that like GP, grammar guided systems also use the
terminal and non-terminal nomenclature, but here it refers to
the symbols of the grammar, rather than to the leaf nodes
or function/internal nodes of an individual tree representation
in GP. In grammar guided GP, the grammar controls the
creation of the initial population of individuals, the crossover,
mutation and reproduction processes; in contrast, in grammar
guided AP, because there are no genetic operators involved,
the grammar looks after each movement of each ant in such
a way that each ant will follow a valid path and will find a
feasible solution to the problem.
Concerning the design of any ant inspired algorithm, it is
necessary to specify an environment where ants cooperate with
each other. In GBAP, this environment is the search space
comprising all possible expressions or programs that can be
derived from the grammar in the number of derivations available.
Thus, the environment adopts the shape of a derivation
tree, as shown in Figure 2 at a depth of four derivations.
Starting with the initial state of the environment, which
is associated with the start symbol defined by the grammar,
each ant tries to build a feasible solution to the problem.
Any solution found takes the form of a path from the root
node to a final state over the derivation tree, as shown in
the sample coloured path in Figure 2. This path consists of
a sequence of states, where each derivation step is given
by applying one of the available production rules at that
point. A final state, represented in the figure by a
double-bordered oval, contains only terminal symbols and therefore
represents the evaluable expression of the antecedent of
the rule encoded. Although final states already encode an evaluable
antecedent, ants keep an internal memory that stores the path
followed, in keeping with the properties of an artificial ant [48],
so that an offline pheromone update can be performed.
The treatment of the environment is a key issue of the
algorithm. It should be pointed out that, depending on both
the dimensionality of the data set addressed and the number
of derivations permitted by the grammar, keeping the whole
space of states in memory may entail an excessive computational
cost. Thus, with respect to the generation of the space of states,
GBAP follows an incremental build approach. The
data structure that represents the space of states is initialized
just with the initial state and all possible transitions have the
same quantity of pheromones. This data structure also contains
attributes that take into account the effects of the evaporation
and normalization processes over the environment. The data
structure is filled as ants are created, storing there the states
of each ant’s path.
Regarding the individual encoding, GBAP follows the
ant=rule (i.e., individual=rule) approach [12]. As mentioned
above, when ants have just been created they represent only
the antecedent of a new rule. The consequent will be
assigned by following the niching approach described later in
Section III-F.
B. Algorithm
The main steps of GBAP are detailed in the pseudocode of
Algorithm 1. It begins by starting up the grammar, creating a
cardinality table for each production rule, and initializing the
space of states with the initial state. It also creates an empty
object that represents the classifier, which will contain the
remaining—winner—ants of the competition that takes place
in the niching algorithm in each generation. The algorithm
starts with the minimum number of derivations that are nec-
essary to find a solution in the space of states and computes
the derivation step for each generation. Notice that in the case
of the grammar defined, at least two derivations are needed
to reach a solution from the initial state, as can be seen in
Figure 2.
Algorithm 1 High Level Pseudocode of GBAP
Require: numGenerations, numAnts, maxDerivations
Initialize grammar and space of states
Create an empty classifier
derivationStep ← (maxDerivations − 2) / numGenerations
maxDerivations ← 2
for i = 0 to i = numGenerations inc 1 do
    Create list ants ← {}
    for j = 0 to j = numAnts inc 1 do
        ant ← Create new ant (see Procedure 2)
        Store ant’s path states in the space of states
        Evaluate ant, computing its fitness for each available class in the data set
        Add ant to the list ants
    end for
    Niching approach to assign the consequent to the ants and to establish the classifier rules (see Procedure 3)
    for each ant in ants do
        if fitness > threshold then
            Update pheromone rate in the path followed by ant proportionally to its fitness and inversely proportional to its path’s length
        end if
    end for
    Evaporate the pheromone along the whole space of states
    Normalize values of pheromones
    maxDerivations ← maxDerivations + derivationStep
end for
Establish the default rule in the classifier
predictiveAccuracy ← Compute the predictive accuracy obtained when running the classifier built on the test set
return predictiveAccuracy
A new list of ants is initialized at the beginning of each
generation, and the algorithm fills this list, creating the number
of ants specified by a parameter. The states visited by each
new ant are stored in the space of states. Then, the algorithm
computes k fitness values per ant, k being the number of
classes in the data set. Notice that at this point each ant
encodes only the antecedent of a rule because the consequent
has not been assigned yet.
Once all ants have been created, these ants along with
the ants assigned to the classifier in the previous generation
compete in the niching algorithm. They try to capture as
many instances of the data set as they can, as explained in
Section III-F. Then, a consequent is assigned to each ant. To
conclude the niching algorithm, the winner ants are assigned
to the classifier, replacing the previous rules.
Procedure 2 Ants Creation
Require: maxDerivations
Create list path ← {}
n ← Initial state
Add n to the list path
repeat
    maxDerivations ← maxDerivations − 1
    n ← Select next movement from space of states, n being the source node, and maxDerivations the number of derivations available
    Add n to the list path
until (n is a final node)
ant ← New Ant with its path set to path
return ant
Afterwards, each ant created in this generation of the algo-
rithm reinforces the amount of pheromones of the transitions
followed only if it has a fitness greater than the threshold
value. To complete the generation, evaporation and
normalization processes take place. The maximum number of
derivations is also incremented by the derivation step.
After finishing all the generations, the default rule is added
to the classifier and the classifier is run on the test set,
computing the predictive accuracy.
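The pheromone maintenance performed at the end of each generation can be sketched as follows. This is a hedged illustration only: the text states that reinforcement is proportional to the fitness and inversely proportional to the path length, but the exact deposit formula, the multiplicative evaporation with rate `rho`, and the use of clamping to [τmin, τmax] as the normalization step are all assumptions made for this example.

```python
def update_pheromones(pheromone, ants, threshold, rho, tau_min, tau_max):
    """One generation of pheromone maintenance (illustrative formulas).

    pheromone: dict mapping an edge (state_i, state_j) -> pheromone level.
    ants: list of (path, fitness) pairs, where path is a list of states.
    """
    # Reinforcement: only ants above the fitness threshold deposit pheromone.
    for path, fitness in ants:
        if fitness > threshold and len(path) > 1:
            edges = list(zip(path, path[1:]))
            deposit = fitness / len(edges)  # assumed form of the deposit
            for edge in edges:
                # Unseen edges start at the maximum level, as in the text.
                pheromone[edge] = pheromone.get(edge, tau_max) + deposit

    # Evaporation over the whole stored space, then bounding (assumed).
    for edge in pheromone:
        level = pheromone[edge] * (1.0 - rho)
        pheromone[edge] = min(tau_max, max(tau_min, level))
    return pheromone


# Hypothetical paths and fitness values.
ants = [(["EXP", "COND", "rule_a"], 0.9), (["EXP", "COND", "rule_b"], 0.3)]
pheromone = update_pheromones({}, ants, threshold=0.5, rho=0.05,
                              tau_min=0.1, tau_max=1.0)
print(pheromone)
```

In this toy run only the first ant exceeds the threshold, so only the edges of its path are stored and reinforced; the second ant leaves no trace.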
The creation process of a given ant is described in Proce-
dure 2. First, the algorithm initializes a new empty list to store
the nodes visited by the new ant. Then, it creates a new node
n that corresponds to the initial state of the environment and
adds this node to the path list. Following a stepwise approach,
the main loop of the algorithm takes care of selecting the next
movement of the ant from the current state, decreasing by one
the number of derivations that remain available. It also adds
the newly visited state to the list path. It finishes when a final
state is reached and, therefore, the ant has found a solution.
Finally, a new ant is created from the list of visited states
path.
C. Heuristic Measures
Another differentiating factor of GBAP with respect to ACO
algorithms lies in the use of two components in the heuristic
function that cannot be applied simultaneously. To decide
which one to apply, GBAP needs to determine the type of
transition involved. Two cases are considered, which we
refer to as intermediate transitions (i.e., transitions not involving
production rules that imply the selection of attributes of the
problem domain) and final transitions (i.e., transitions that
involve the application of production rules of the type COND
= 'operator', 'attribute', 'value';).
For intermediate transitions, the measure considered is re-
lated to the cardinality of the production rules, and is referred
to as Pcard. This component increases the likelihood that a
given ant chooses transitions that lead to a greater number
of candidate solutions. It is based on the cardinality measure
proposed in [49], and to use it, the algorithm works out a
cardinality table for each production rule when it initializes
the grammar. This cardinality table will contain as many
entries as the maximum number of derivations available, where
each entry maps the number of derivations to the number of
solutions that can be reached. Thus, given a state i having
k subsequent states, j being a specific successor among those k
states, and d being the number of derivations that remain
available at that point, Pcard^k_ij is gauged as the ratio between
the number of candidate solutions that can be successfully reached
from the state j in d − 1 derivations, and the sum of all possible
candidate solutions that can be reached from the source state i
(see Eq. 5). Notice that Pcard makes no sense in final transitions,
since each possible destination node embraces the same number of
solutions.

$$Pcard^{k}_{ij} = \frac{cardinality(state_j, d-1)}{\sum_{k \in allowed} cardinality(state_k, d-1)} \tag{5}$$

In contrast, for final transitions, information gain
(G(A_i)) [9] is the component considered. It measures
the worth of each attribute for separating the training
examples with respect to their target classification, and it is
computed as:

$$G(A_i) = I - I(A_i) \tag{6}$$
where I represents the entropy of the classes in the training
set, and I(Ai) is the entropy of the classes given the values
for attribute Ai. In turn, these are computed as follows:
$$I = -\sum_{c=1}^{\#classes} \left( \frac{n_c}{n} \cdot \log_2 \frac{n_c}{n} \right) \tag{7}$$

where n_c is the number of instances of class c, and n stands
for the number of instances in the training set.

$$I(A_i) = \sum_{j=1}^{\#values_{A_i}} \left( \frac{n_{ij}}{n} \cdot I(A_{ij}) \right) \tag{8}$$

where n_ij is the number of instances with value j in attribute
A_i, and I(A_ij) is the entropy of the classes given the value j
for attribute A_i, computed as:

$$I(A_{ij}) = -\sum_{c=1}^{\#classes} \left( \frac{n_{ijc}}{n_{ij}} \cdot \log_2 \frac{n_{ijc}}{n_{ij}} \right) \tag{9}$$

where, finally, n_ijc stands for the number of instances of class
c with the value j in attribute A_i.
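
As an illustration of Eqs. 6 to 9, the following Python sketch computes the information gain of a single nominal attribute. The function names and the toy data are hypothetical and are not taken from the GBAP implementation; attribute values and class labels are assumed to be given as two parallel lists.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy of a list of class labels (Eq. 7; Eq. 9 when applied to a subset)."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def information_gain(values, labels):
    """G(A_i) = I - I(A_i) for one nominal attribute (Eqs. 6 and 8)."""
    n = len(labels)
    labels_by_value = {}
    for value, label in zip(values, labels):
        labels_by_value.setdefault(value, []).append(label)
    # Weighted entropy of the classes given each attribute value (Eq. 8).
    conditional = sum(len(subset) / n * entropy(subset)
                      for subset in labels_by_value.values())
    return entropy(labels) - conditional


# Hypothetical toy data: the attribute separates the two classes perfectly.
outlook = ["sunny", "sunny", "rain", "rain"]
play = ["no", "no", "yes", "yes"]
print(information_gain(outlook, play))
```

An attribute whose values are distributed identically in every class would yield a gain of zero, so final transitions involving it would receive no heuristic support.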
D. Transition Rule
The ACO metaheuristic follows a constructive method, i.e.,
every solution is built according to a sequence of transitions
guided by some information. The information that biases each
step is considered in the transition rule, which defines the
probability that a given ant moves from a state i to another
state j:
$$P^{k}_{ij} = \frac{(\eta_{ij})^{\alpha} \cdot (\tau_{ij})^{\beta}}{\sum_{k \in allowed} (\eta_{ik})^{\alpha} \cdot (\tau_{ik})^{\beta}} \tag{10}$$

where k is the number of valid subsequent states, α is the
heuristic exponent, β is the pheromone exponent, η is the value
of the heuristic function, computed as G(A_i) + Pcard (having at
least one of the two components equal to zero), and τ indicates
the strength of the pheromone trail.
The transition rule will assign a probability to each available
next state. The algorithm arranges that each possible transition
can reach at least one final state or solution in the number
of derivations that remain available at that point. If not, those
transitions will be assigned a probability of zero and, therefore,
the ant will never select such movements.
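
The transition rule of Eq. 10, together with the zero probability given to unreachable movements, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each candidate successor is represented by a hypothetical tuple (eta, tau, reachable), and the default exponents are the α and β values listed in Table III.

```python
def transition_probabilities(candidates, alpha=0.4, beta=1.0):
    """Probability of moving to each successor state (Eq. 10).

    candidates: list of (eta, tau, reachable) tuples, one per successor;
    reachable is False when no final state can be reached from that
    successor in the derivations that remain available.
    """
    weights = [(eta ** alpha) * (tau ** beta) if reachable else 0.0
               for eta, tau, reachable in candidates]
    total = sum(weights)
    if total == 0:
        raise ValueError("no feasible transition from this state")
    return [w / total for w in weights]


# Two equally attractive feasible successors and one infeasible successor.
probs = transition_probabilities([(0.5, 1.0, True), (0.5, 1.0, True), (0.9, 1.0, False)])
print(probs)
```

The infeasible successor keeps a probability of zero regardless of its heuristic value, so an ant sampling from this distribution never selects it.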
E. Pheromone Updating
With regard to pheromone maintenance, two operations
are considered: reinforcement and evaporation. Each ant of the
current generation is able to reinforce the pheromone amount
in its path's transitions only if the quality of the solution
encoded by this ant is greater than a threshold value. Then,
a delayed pheromone update over the path of this ant takes
place. The threshold value has been fixed experimentally at
0.5 so that bad solutions will never influence the environment.
All transitions in the path get an equal amount of pheromone,
and this reinforcement is based both on the length and the
quality of the solution encoded by the ant:

$$\tau_{ij}(t+1) = \tau_{ij}(t) \cdot Q \cdot fitness \tag{11}$$

where τ_ij represents the amount of pheromone in the transition
from the state i to the state j, and Q is a computed measure
that favors comprehensible solutions. In fact, the value of this
parameter depends on the length of the solution encoded by
the ant, being calculated as the ratio between the maximum
number of derivations in the current generation and the length
of the path followed by the ant (thus, shorter solutions receive
more pheromone).
Regardless of whether the ant's quality is greater than the
threshold value, the evaporation process always takes place,
involving all transitions in the space of states, as seen in the
following expression:

$$\tau_{ij}(t+1) = \tau_{ij}(t) \cdot (1 - \rho) \tag{12}$$

where ρ represents the evaporation rate.
Once the pheromone trails in the environment have been
reinforced and evaporated, a normalization process takes place,
in order to bound the pheromone amount existing in each
transition to the range [τmin, τmax].
Notice that it is quite complicated to store the environment’s
pheromone values in a pheromone matrix. Instead, each value
is kept in the data structure that corresponds to the destination
state of a given transition.
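
The three maintenance steps described in this section can be sketched in Python as follows. The sketch applies Eq. 11 exactly as printed above and Eq. 12 for evaporation. Two points are assumptions rather than details given in the text: the path length is taken to be the number of transitions in the path, and the normalization step is modeled as simple clamping to [τmin, τmax]. The dictionary-based storage and all names are hypothetical, and the default values are those of Table III.

```python
def update_pheromones(tau, ants, max_derivations,
                      rho=0.05, tau_min=0.1, tau_max=1.0, threshold=0.5):
    """Reinforce, evaporate and bound the pheromone values in place.

    tau:  dict mapping a transition (source, destination) to its pheromone.
    ants: list of (path, fitness) pairs, path being a list of transitions.
    """
    # Reinforcement (Eq. 11 as printed): only ants above the threshold.
    for path, fitness in ants:
        if fitness > threshold:
            q = max_derivations / len(path)  # assumed: length = number of transitions
            for transition in path:
                tau[transition] = tau[transition] * q * fitness
    # Evaporation (Eq. 12): applied to every transition.
    for transition in tau:
        tau[transition] *= (1 - rho)
    # Normalization, assumed here to be clamping to [tau_min, tau_max].
    for transition in tau:
        tau[transition] = min(tau_max, max(tau_min, tau[transition]))
    return tau


tau = {("S", "A"): 1.0, ("A", "F"): 1.0, ("S", "B"): 1.0}
good_ant = ([("S", "A"), ("A", "F")], 0.8)   # fitness above the 0.5 threshold
update_pheromones(tau, [good_ant], max_derivations=4)
print(tau)
```

In this toy run the two transitions followed by the ant end at the upper bound, while the unvisited transition only evaporates.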
F. Fitness Function and Consequent Assignment
The fitness function that GBAP uses in the training stage
to conduct the search process is the Laplace accuracy [50].
This measure was selected because it suits well to multiclass
classification since it takes into account the number of classes
in the data set. The quality of the ants is then defined by:
$$fitness = \frac{1 + TP}{k + TP + FP} \tag{13}$$

where TP and FP represent true positives and false positives,
respectively, and k refers to the number of classes in the data
set.

Procedure 3 GBAP Niching Approach
Require: ants, classifier, minCasesPerRule, trainingSet
Create list winnerAnts ← {}
numInstances ← number of instances in trainingSet
numClasses ← number of classes in trainingSet
for k ← 0 to k < numClasses inc 1 do
    Sort ants by fitness corresponding to class k
    flagsArray ← boolean array of numInstances elements
    for i ← 0 to i < |flagsArray| inc 1 do
        flagsArray[i] ← false
    end for
    for each ant in ants do
        idealTokens ← 0
        capturedTokens ← 0
        for j ← 0 to j < numInstances inc 1 do
            instance ← Get instance j of trainingSet
            cat ← Get category of instance
            if cat = k then
                idealTokens ← idealTokens + 1
                if flagsArray[j] = false and ant covers instance then
                    flagsArray[j] ← true
                    capturedTokens ← capturedTokens + 1
                end if
            end if
        end for
        if capturedTokens >= minCasesPerRule then
            fitnessadj[k] ← fitness[k] · capturedTokens / idealTokens
        else
            fitnessadj[k] ← 0
        end if
    end for
end for
for each ant in ants do
    max ← 0
    for k ← 0 to k < numClasses inc 1 do
        if fitnessadj[k] > fitnessadj[max] then
            max ← k
        end if
    end for
    Set consequent of ant to class max
    if fitnessadj[max] > 0 then
        Add ant to winnerAnts
    end if
end for
return winnerAnts

Concerning the assignment of the consequent, GBAP fol-
lows a niching approach quite similar to that employed in
[51], whose purpose is to evolve different multiple rules for
predicting each class in the data set while preserving the
diversity. Depending both on the data set and the distribution
of instances per class, it is often not possible for a rule to
cover all instances of a class and therefore it is necessary to
discover additional rules for predicting this class. The niching
algorithm takes care that these rules do not overlap with
instances of another class. In addition, it is appropriate for
removing redundant rules. Moreover, it lacks the drawbacks that
covering algorithms present regarding the discarding of
instances: as instances are covered by the mined rules, they are
removed from the training set and, in consequence, the fitness
of the rules discovered subsequently is computed using a
smaller number of instances. Thus, covering algorithms seek
rules with good performance in the sub-data set.
The pseudocode of the niching algorithm developed is
shown in Procedure 3. Notice that each existing instance
in the training set is called a token: all ants will compete
to capture the tokens, bearing in mind that a given token
can be captured by at most one ant. Remember that GBAP
has previously computed an array of k fitness values per
individual, one for each class (assuming that the respective
class is assigned as consequent to that individual). Then,
the niching algorithm repeats the following process k times,
considering a different class at each iteration, following their
order of appearance in the metadata section of the training
set: (a) the ants are sorted by their respective class fitness
in descending order; and (b) each ant tries to capture as many
tokens of the class being processed as it covers, provided that
they have not been seized before by any other ant. Finally, if the
rule encoded by the ant has captured at least minCasesPerRule
tokens, the ant's adjusted fitness value for this class is computed
as described in Eq. 14, where idealTokens stands for the ideal
number of tokens that the current ant could capture, i.e., the
number of tokens belonging to the class being processed (regardless
of whether they have been previously captured by another ant or
not). Otherwise, if the number of captured tokens is below
the threshold established by minCasesPerRule, an adjusted
fitness value of zero is assigned to the ant.

$$fitness_{adj} = fitness \cdot \frac{\#capturedTokens}{\#idealTokens} \tag{14}$$

Notice that the number of idealTokens is always greater than or
equal to capturedTokens. Thus, the closer their values are,
the less penalized the ant is (in fact, if capturedTokens =
idealTokens, the ant is not penalized).
Once the k adjusted fitness values have been calculated, the
consequent assigned to each ant corresponds to the one that
reports the best adjusted fitness. To conclude, individuals that
have an adjusted fitness greater than zero (and consequently
cover at least one instance of the training set) are added to the
classifier.
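
The token competition and the consequent assignment can be sketched in Python as shown below. This is an illustrative sketch rather than the authors' implementation: each ant is represented by a hypothetical dictionary holding a coverage predicate and its k precomputed fitness values, each instance is a (features, class index) pair, and the adjusted fitness follows Eq. 14.

```python
def niching(ants, instances, num_classes, min_cases_per_rule):
    """Return the winner ants, setting the consequent of every ant.

    ants:      list of dicts with keys "covers" (callable: features -> bool)
               and "fitness" (list with one fitness value per class).
    instances: list of (features, class_index) pairs, i.e. the tokens.
    """
    adjusted = [[0.0] * num_classes for _ in ants]
    for k in range(num_classes):
        # (a) ants compete in descending order of their fitness for class k
        order = sorted(range(len(ants)),
                       key=lambda a: ants[a]["fitness"][k], reverse=True)
        seized = [False] * len(instances)
        ideal = sum(1 for _, cat in instances if cat == k)
        for a in order:
            # (b) capture the still-free tokens of class k covered by the ant
            captured = 0
            for j, (features, cat) in enumerate(instances):
                if cat == k and not seized[j] and ants[a]["covers"](features):
                    seized[j] = True
                    captured += 1
            if ideal > 0 and captured >= min_cases_per_rule:
                adjusted[a][k] = ants[a]["fitness"][k] * captured / ideal  # Eq. 14
    winners = []
    for a, ant in enumerate(ants):
        best = max(range(num_classes), key=lambda k: adjusted[a][k])
        ant["consequent"] = best
        if adjusted[a][best] > 0:
            winners.append(ant)
    return winners


# Hypothetical toy problem: one numeric feature, two classes.
instances = [(x, 0 if x < 5 else 1) for x in range(10)]
ants = [
    {"name": "A", "covers": lambda x: x < 5, "fitness": [0.9, 0.1]},
    {"name": "B", "covers": lambda x: x < 3, "fitness": [0.8, 0.2]},
    {"name": "C", "covers": lambda x: x >= 5, "fitness": [0.1, 0.9]},
]
winners = niching(ants, instances, num_classes=2, min_cases_per_rule=3)
print([(w["name"], w["consequent"]) for w in winners])
```

In the toy run, ant B is redundant with respect to ant A: all the tokens it covers have already been seized, so its adjusted fitness is zero and it is left out of the classifier.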
IV. EXPERIMENTATION
In this section we will first present the data sets used in
the experimental study, along with the preprocessing actions
performed. Then, we explain the cross validation procedure
employed. Finally, the parameter set-up for the different algo-
rithms considered in the comparison is presented.
A. Data Sets and Preprocessing
The performance of GBAP was tested on 18 publicly
available data sets, both artificial and real-world, selected from
the machine learning repository of the University of California
at Irvine (UCI) [52]¹. We have selected problems with a wide
range of dimensionality with respect to the number of classes
and attributes. These data sets are listed in Table II, where
their particular characteristics are also described.

¹ All data sets can be reached from the UCI website at http://archive.ics.uci.edu/ml/datasets.html

TABLE II
DATA SETS DESCRIPTION

| DATASET | MISSING VALUES | INSTANCES | ATTRIBUTES: Continuous | ATTRIBUTES: Binary | ATTRIBUTES: Nominal | CLASSES | DISTRIBUTION OF CLASSES |
|---|---|---|---|---|---|---|---|
| Hepatitis | yes | 155 | 6 | 13 | 0 | 2 | 32 / 123 |
| Sonar | no | 208 | 60 | 0 | 0 | 2 | 97 / 111 |
| Breast-c | yes | 286 | 0 | 3 | 6 | 2 | 201 / 85 |
| Heart-c | yes | 303 | 6 | 3 | 4 | 2 | 165 / 138 |
| Ionosphere | no | 351 | 33 | 1 | 0 | 2 | 126 / 225 |
| Horse-c | yes | 368 | 7 | 2 | 13 | 2 | 232 / 136 |
| Australian | yes | 690 | 6 | 0 | 9 | 2 | 307 / 383 |
| Breast-w | yes | 699 | 9 | 0 | 0 | 2 | 458 / 241 |
| Diabetes | no | 768 | 0 | 8 | 0 | 2 | 500 / 268 |
| Credit-g | no | 1000 | 6 | 3 | 11 | 2 | 700 / 300 |
| Mushroom | yes | 8124 | 0 | 0 | 22 | 2 | 4208 / 3916 |
| Iris | no | 150 | 4 | 0 | 0 | 3 | 50 / 50 / 50 |
| Wine | no | 178 | 13 | 0 | 0 | 3 | 59 / 71 / 48 |
| Balance-scale | no | 625 | 4 | 0 | 0 | 3 | 288 / 49 / 288 |
| Lymphography | no | 148 | 3 | 9 | 6 | 4 | 2 / 81 / 61 / 4 |
| Glass | no | 214 | 9 | 0 | 0 | 6 | 70 / 76 / 17 / 13 / 9 / 29 |
| Zoo | no | 101 | 1 | 15 | 0 | 7 | 41 / 20 / 5 / 13 / 4 / 8 / 10 |
| Primary-tumor | yes | 339 | 0 | 14 | 3 | 21 | 84 / 20 / 9 / 14 / 39 / 1 / 14 / 6 / 2 / 28 / 16 / 7 / 24 / 2 / 1 / 10 / 29 / 6 / 2 / 1 / 24 |

Since the data sets considered contained numerical attributes
and missing values, two preprocessing actions were performed
using Weka². The first one entailed the replacement of missing
values with the mode (for nominal attributes) or the arithmetic
mean (for numerical attributes). The second one involved the
discretization of those data sets containing numerical attributes,
by applying Fayyad and Irani's discretization algorithm [53].
The replacement of missing values was done before partitioning
the data set, whereas the discretization was applied to each
specific training set, and the intervals found were then used to
discretize the corresponding test set.
B. Cross Validation
For each data set and algorithm, we performed a stratified
ten-fold cross-validation procedure, where we randomly split
each data set into ten mutually exclusive partitions, P1,...,P10,
containing approximately the same number of instances and
the same proportion of classes present in the original data set.
Then, ten different experiments were executed using
$\bigcup_{j \neq i} P_j$ as the training set at the ith experiment
and $P_i$ as the test set. Hence, the predictive accuracy obtained
on a given data set is considered as the average accuracy over
these ten folds, described as

$$predAcc = \frac{\sum_{i=1}^{10} \#correctlyClassified\,P_i}{\#instances} \cdot 100 \tag{15}$$

where #correctlyClassified P_i is the number of correctly
classified instances when using P_i as test set, and #instances
is the number of instances in the original data set.
In addition, to avoid any chance of obtaining biased results
when evaluating the performance of stochastic algorithms, ten
executions per fold were performed, using ten different seeds.
² The Weka machine learning software is publicly available at http://www.cs.waikato.ac.nz/ml/index.html
C. Algorithms and Parameter Set-Up
For comparison purposes, six other rule induction algorithms
were considered: three ant-based algorithms, Ant-Miner³,
Ant-Miner+⁴ and PSO/ACO2⁵, which were discussed in
Section II-A; a GP algorithm, Bojarczuk-GP [54]⁶, which is
explained briefly next; and two well-known classifiers, JRIP
(Weka's implementation of the popular sequential covering
Repeated Incremental Pruning to Produce Error Reduction
(RIPPER) algorithm) and PART, which extracts rules from the
decision trees generated by Weka's J48 algorithm. It is worth
noting at this point that every algorithm used in the
experimentation was run over the same discretized partitions of
the data sets previously mentioned, even in the case of those
capable of handling numerical values.
Bojarczuk-GP is a GP algorithm for classification rule mining
that reports good accuracy and comprehensibility results
when applied to medical data sets. It is a constrained-syntax
algorithm which represents the rules by defining a set of
functions consisting both of logical operators (AND, OR)
and relational operators (=, ≠, ≤, >). Bojarczuk-GP follows
a mixed individual = rule/rule set approach, where each
individual encodes a set of rules in disjunctive form that predict
the same class, and the classifier generated for a given problem
consists of k individuals, k being the number of classes in the
data set. The genetic operators considered by this algorithm are
crossover and reproduction, so that no mutation is performed
during the evolution. For the sake of evolving comprehensible
rules, the fitness function involves three terms:

$$fitness = sensitivity \cdot specificity \cdot simplicity \tag{16}$$

where sensitivity and specificity are computed as indicated in
Eq. 2 and simplicity is computed as follows:

$$simplicity = \frac{maxnodes - 0.5 \cdot numnodes - 0.5}{maxnodes - 1} \tag{17}$$

where maxnodes is the maximum number of nodes allowed
and numnodes is the current number of nodes. Thus, the
goal of the fitness function is to maximize both sensitivity
and specificity, while minimizing the complexity of the rule
set. When the evolutionary process terminates, the classifier is
set up with the best individual found for each class.

³ Ant-Miner was run using the Myra framework (version 2.1.2), which can be downloaded from http://myra.sourceforge.net/
⁴ The source code of Ant-Miner+ was provided by the authors.
⁵ The PSO/ACO2 algorithm was run using the implementation available at http://sourceforge.net/projects/psoaco2
⁶ Bojarczuk-GP was run using the existing implementation in the evolutionary computation framework JCLEC [55], which is publicly available at http://jclec.sourceforge.net

TABLE III
USER-DEFINED PARAMETER CONFIGURATION

| ALGORITHM | NAME | DESCRIPTION | VALUE |
|---|---|---|---|
| GBAP | numAnts | Number of ants | 20 |
| GBAP | numGenerations | Number of generations | 100 |
| GBAP | maxDerivations | Maximum number of derivations for the grammar | 15 |
| GBAP | minCasesPerRule | Minimum number of instances covered per rule | 3 |
| GBAP | [τ0] | Initial pheromone amount for transitions | 1.0 |
| GBAP | [τmin] | Minimum pheromone amount allowed for transitions | 0.1 |
| GBAP | [τmax] | Maximum pheromone amount allowed for transitions | 1.0 |
| GBAP | [ρ] | Evaporation rate | 0.05 |
| GBAP | [α] | Heuristic exponent value | 0.4 |
| GBAP | [β] | Pheromone exponent value | 1.0 |
| ANT-MINER | number of ants | Number of ants | 1 |
| ANT-MINER | min. cases per rule | Minimum number of instances covered per rule | 10 |
| ANT-MINER | max. uncovered cases | Maximum number of uncovered cases | 10 |
| ANT-MINER | rules for convergence | Convergence test size | 10 |
| ANT-MINER | number of iterations | Maximum number of iterations | 3000 |
| ANT-MINER+ | nAnts | Number of ants | 1000 |
| ANT-MINER+ | rho | Evaporation rate | 0.85 |
| PSO/ACO2 | numParticles | Number of particles | 10 |
| PSO/ACO2 | numIterations | Number of iterations | 200 |
| PSO/ACO2 | maxUncovExampPerClass | Maximum number of uncovered examples per class | 2 |
| GP | population-size | Number of individuals | 200 |
| GP | max-of-generations | Number of generations | 100 |
| GP | max-deriv-size | Maximum number of derivations for the grammar | 20 |
| GP | recombination-prob | Crossover probability | 0.8 |
| GP | reproduction-prob | Reproduction probability | 0.05 |
| GP | parents-selector | Selection method for both parents | Roulette |
| JRIP | checkErrorRate | Whether check for error rate >= 1/2 is included in stopping criterion | True |
| JRIP | folds | Determines the amount of data used for pruning. One fold is used for pruning, the rest for growing the rules | 3 |
| JRIP | minNo | The minimum total weight of the instances in a rule | 2.0 |
| JRIP | optimizations | The number of optimization runs | 2 |
| JRIP | pruning | Whether pruning is performed | True |
| PART | binarySplits | Whether to use binary splits on nominal attributes when building the partial trees | False |
| PART | confidenceFactor | The confidence factor used for pruning (smaller values incur more pruning) | 0.25 |
| PART | minNumObj | The minimum number of instances per rule | 2 |
| PART | numFolds | Determines the amount of data used for reduced-error pruning. One fold is used for pruning, the rest for growing the rules | 3 |
| PART | reducedErrorPruning | Whether reduced-error pruning is used instead of C4.5 pruning | False |
| PART | unpruned | Whether pruning is performed | False |

For each algorithm, excluding GBAP, its user-defined
parameters were set to the values reported by the authors in
the aforementioned references. The parameter configuration
is summarized in Table III. As can be observed, GBAP
seems to have more parameters than the other ACO-based
algorithms, which may be a disadvantage for the final user.
Nevertheless, the other ACO algorithms also have parameters
that are hidden from the final user. For example, in the paper
where Ant-Miner+ was proposed [27], the authors describe
parameters such as α, β, the early stopping criterion, or
parameters that are implicit in the MMAS approach followed by
this algorithm (τ0, τmin and τmax), but the authors have
preset their values in the code of the algorithm. We could
have reduced the number of user-defined parameters to just
four (numAnts, numGenerations, maxDerivations and
minCasesPerRule), fixing the rest of the parameters in the
algorithm's code to the values reported in Table III, but this
could also be a disadvantage for a given expert
user, because it will probably be more difficult to harness
the power of the algorithm. Thus, the first four parameters of
GBAP are mandatory, and the other six parameters, enclosed
in square brackets, are optional, having a default value.

TABLE IV
PREDICTIVE ACCURACY (%) COMPARATIVE RESULTS

| Dataset | GBAP Acc | GBAP σAcc | ANT-MINER Acc | ANT-MINER σAcc | ANT-MINER+ Acc | ANT-MINER+ σAcc | PSO/ACO2 Acc | PSO/ACO2 σAcc | GP Acc | GP σAcc | JRIP Acc | JRIP σAcc | PART Acc | PART σAcc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hepatitis | 82.17 | 12.04 | 83.27 | 10.32 | 81.79 | 10.30 | 84.59 | 9.33 | 71.05 | 14.45 | 81.54 | 12.05 | 84.64 | 7.66 |
| Sonar | 81.98 | 7.44 | 76.95 | 6.89 | 76.05 | 7.22 | 78.49 | 8.05 | 79.82 | 9.24 | 80.33 | 6.61 | 77.84 | 8.10 |
| Breast-c | 71.40 | 7.86 | 73.42 | 7.29 | 73.05 | 6.86 | 68.63 | 6.87 | 68.63 | 10.94 | 72.00 | 6.41 | 68.48 | 7.90 |
| Heart-c | 82.84 | 5.24 | 78.01 | 6.69 | 82.41 | 5.10 | 82.25 | 5.36 | 70.02 | 7.08 | 82.20 | 5.12 | 80.13 | 6.39 |
| Ionosphere | 93.02 | 4.07 | 84.39 | 6.73 | 92.89 | 4.02 | 89.97 | 4.99 | 76.48 | 8.19 | 91.70 | 5.14 | 88.93 | 4.02 |
| Horse-c | 82.97 | 6.34 | 82.71 | 4.73 | 81.79 | 6.03 | 82.06 | 4.93 | 82.52 | 6.06 | 83.72 | 6.35 | 81.5 | 3.72 |
| Australian | 85.47 | 4.49 | 85.30 | 4.12 | 83.48 | 3.38 | 85.19 | 4.69 | 85.52 | 4.50 | 86.70 | 5.15 | 84.66 | 4.48 |
| Breast-w | 96.50 | 1.68 | 94.69 | 2.04 | 94.28 | 2.86 | 95.86 | 1.91 | 87.39 | 2.75 | 95.71 | 1.81 | 95.71 | 1.82 |
| Diabetes | 75.80 | 4.12 | 72.48 | 3.76 | 74.58 | 4.81 | 74.16 | 4.47 | 61.94 | 4.72 | 75.56 | 2.34 | 75.66 | 2.52 |
| Credit-g | 70.79 | 4.27 | 70.55 | 3.72 | 70.80 | 3.87 | 70.36 | 3.55 | 63.02 | 7.03 | 70.70 | 3.26 | 72.70 | 3.26 |
| Mushroom | 98.26 | 0.76 | 98.15 | 0.71 | 98.89 | 0.63 | 99.90 | 0.11 | 86.22 | 6.11 | 99.99 | 0.04 | 100.00 | 0.00 |
| Iris | 96.00 | 4.10 | 95.20 | 5.47 | 94.00 | 3.59 | 95.33 | 6.70 | 91.73 | 10.46 | 96.00 | 5.33 | 95.33 | 6.70 |
| Wine | 97.01 | 4.37 | 91.86 | 5.08 | 93.86 | 4.61 | 90.20 | 2.86 | 83.69 | 9.44 | 95.61 | 5.37 | 95.03 | 3.89 |
| Balance-scale | 75.49 | 4.97 | 68.36 | 5.30 | 77.75 | 6.31 | 77.14 | 4.93 | 58.38 | 7.76 | 73.42 | 5.66 | 76.50 | 3.51 |
| Lymphography | 81.00 | 10.35 | 75.51 | 9.59 | 77.23 | 10.91 | 76.59 | 12.20 | 77.78 | 12.77 | 78.84 | 11.49 | 78.43 | 14.30 |
| Glass | 69.13 | 8.66 | 65.52 | 9.26 | 62.03 | 9.80 | 71.16 | 10.54 | 39.23 | 11.34 | 69.00 | 8.70 | 73.91 | 8.43 |
| Zoo | 95.60 | 4.21 | 92.55 | 7.93 | 93.09 | 10.65 | 92.32 | 7.19 | 64.20 | 18.88 | 86.85 | 7.25 | 94.84 | 9.02 |
| Primary-tumor | 37.91 | 6.55 | 37.75 | 5.27 | 37.26 | 5.43 | 37.19 | 5.88 | 16.41 | 4.96 | 38.11 | 3.75 | 38.36 | 5.09 |
| RANKING | 2.25 | | 4.78 | | 4.39 | | 4.11 | | 6.08 | | 3.06 | | 3.33 | |

For GBAP, the configuration shown in Table III was
adopted after carrying out a cross-validation procedure over
three data sets (primary-tumor, hepatitis and wine), using
values from different ranges for each parameter, and then
analyzing which specific set-up globally reported the best
values. As expected, no single combination of parameter
values performed best for all data sets. Notice, therefore,
that the adopted configuration should be tuned when
classifying a particular data set.
V. RESULTS AND DISCUSSION
The performance and the understandability of the model
proposed are compared with those of other classification algorithms. The
aim of this section is to analyze statistically and interpret the
experimental results obtained. Recall that in DM there is no
classification algorithm that performs better than all others for
every data set, as stated by the no free lunch theorem [56],
[57].
A. Predictive accuracy analysis
A first evaluation criterion for the comparison is the pre-
dictive accuracy. Table IV shows average values for predictive
accuracy with standard deviation. The best classification ac-
curacies for each data set are highlighted in bold typeface.
Analyzing the table, it can be seen that GBAP is
competitive with respect to all the other algorithms considered,
and also that it obtains the best results on 50% of the data sets
used in the experimentation. In those data sets where GBAP
does not reach the best results, its classification results are
quite competitive. With regard to the standard deviation values,
we can also observe that GBAP globally yields middling
values in terms of stability.
Though GBAP obtains the best average accuracy values,
we performed the Friedman test with the aim of comparing
the results obtained and analyzing if there are significant
differences between the classifiers. The Friedman test com-
pares the average rankings of k algorithms over N datasets.
Average rankings of all the algorithms considered are sum-
marized at the bottom of Table IV. Looking at these ranking
values, it can be noticed that the lowest ranking value, i.e.,
the best global position, is obtained by our proposal. The
computed value for the Friedman statistic of average rankings,
distributed according to the F-distribution with k − 1 and
(k − 1)(N − 1) degrees of freedom, is 8.7404, which is greater
than the tabled critical value at the α = 0.1 significance
level, $C_0 = [0, (F_F)_{0.1,6,102} = 1.8327]$. Thus, we reject the
null-hypothesis that all algorithms perform equally well when
α = 0.1.
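
As a worked check, the statistic can be approximately recomputed from the average rankings at the bottom of Table IV. The sketch below assumes the usual formulation of the test: the Friedman chi-square computed from the average ranks, followed by the correction that yields a value distributed according to the F-distribution with k − 1 and (k − 1)(N − 1) degrees of freedom. The function name is hypothetical.

```python
def friedman_statistics(avg_ranks, n_datasets):
    """Return (chi-square, F-distributed statistic) from average ranks."""
    k = len(avg_ranks)
    chi2 = (12 * n_datasets / (k * (k + 1))) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4
    )
    f_f = (n_datasets - 1) * chi2 / (n_datasets * (k - 1) - chi2)
    return chi2, f_f


# Average rankings from Table IV:
# GBAP, Ant-Miner, Ant-Miner+, PSO/ACO2, GP, JRIP, PART
ranks = [2.25, 4.78, 4.39, 4.11, 6.08, 3.06, 3.33]
chi2, f_f = friedman_statistics(ranks, n_datasets=18)
print(round(chi2, 2), round(f_f, 2))
```

With k = 7 and N = 18 the degrees of freedom are 6 and 102, as in the critical interval above. The recomputed value is close to the reported 8.7404 but not identical, presumably because the rankings in Table IV are rounded to two decimals.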
Fig. 3. Bonferroni–Dunn test (CD = 1.7239). All classifiers whose ranks are outside the shaded interval have significant differences with respect to GBAP (p < 0.1). [Figure not reproduced: rank axis from 2 to 7 showing GBAP, PART, JRIP, PSO/ACO2, Ant-Miner+, Ant-Miner and Bojarczuk-GP.]
Because of the rejection of the null-hypothesis by the
Friedman test, we proceed with a post-hoc test to reveal
the performance differences. Since all classifiers are com-
pared with respect to a control classifier, we can apply the
Bonferroni–Dunn test [58] focusing on all possible pairwise
comparisons involving the GBAP algorithm. At the same
significance level (α = 0.1) the Bonferroni–Dunn critical
value is 1.7239, which means that in order to be significant, the
difference between any pair of means must be at least 1.7239
units [59]. Thus, the performance of GBAP is statistically
better than those of the PSO/ACO2, Ant-Miner+, Ant-Miner
and Bojarczuk-GP algorithms, because the difference between
their mean rank value and the mean rank of GBAP is greater
than the mentioned critical value. These results are captured
in Figure 3, where one can also see that GBAP achieves
competitive or even better accuracy results than PART and
JRIP.

TABLE V
RULE SET LENGTH AND RULE COMPLEXITY COMPARATIVE RESULTS

| Dataset | GBAP #R | GBAP #C/R | ANT-MINER #R | ANT-MINER #C/R | ANT-MINER+ #R | ANT-MINER+ #C/R | PSO/ACO2 #R | PSO/ACO2 #C/R | GP #R | GP #C/R | JRIP #R | JRIP #C/R | PART #R | PART #C/R |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hepatitis | 8.1 | 1.89 | 4.8 | 1.99 | 3.9 | 3.25 | 7.4 | 2.28 | 3.1 | 1.22 | 3.8 | 2.15 | 8.4 | 2.30 |
| Sonar | 12.3 | 1.81 | 5.2 | 2.07 | 4.0 | 3.48 | 6.1 | 2.92 | 3.0 | 1.00 | 4.6 | 2.21 | 13.9 | 2.98 |
| Breast-c | 13.2 | 1.91 | 6.0 | 1.28 | 5.4 | 2.82 | 11.8 | 1.75 | 3.5 | 1.01 | 3.3 | 1.70 | 17.1 | 2.12 |
| Heart-c | 14.5 | 1.67 | 5.9 | 1.20 | 4.4 | 2.82 | 11.9 | 3.81 | 3.0 | 3.02 | 5.3 | 2.32 | 17.3 | 2.35 |
| Ionosphere | 11.1 | 1.18 | 5.7 | 1.61 | 8.8 | 1.41 | 4.5 | 4.03 | 3.1 | 1.14 | 7.7 | 1.48 | 8.2 | 1.83 |
| Horse-c | 9.0 | 1.46 | 6.3 | 1.49 | 4.7 | 3.41 | 20.1 | 3.39 | 3.0 | 1.00 | 3.5 | 1.74 | 13.2 | 2.38 |
| Australian | 10.1 | 1.08 | 6.5 | 1.53 | 3.3 | 2.08 | 25.8 | 6.96 | 3.0 | 1.00 | 5.2 | 1.80 | 19.4 | 2.01 |
| Breast-w | 6.6 | 1.65 | 7.2 | 1.04 | 6.4 | 1.92 | 10.5 | 1.1 | 3.0 | 1.00 | 6.5 | 1.74 | 10.9 | 1.63 |
| Diabetes | 9.9 | 1.53 | 8.6 | 1.03 | 5.5 | 3.71 | 35.2 | 3.61 | 3.0 | 1.33 | 4.6 | 2.88 | 17.9 | 2.21 |
| Credit-g | 22.9 | 1.82 | 9.1 | 1.51 | 3.3 | 3.31 | 52.8 | 4.2 | 3.3 | 1.17 | 7.1 | 2.54 | 57.8 | 2.70 |
| Mushroom | 6.7 | 1.33 | 7.7 | 1.19 | 8.6 | 1.27 | 9.1 | 2.04 | 3.3 | 1.12 | 8.5 | 1.58 | 10.0 | 1.72 |
| Iris | 3.7 | 1.06 | 4.3 | 1.03 | 3.9 | 1.8 | 3.0 | 1.20 | 4.3 | 1.29 | 3.0 | 1.00 | 4.6 | 1.00 |
| Wine | 7.2 | 1.50 | 5.1 | 1.33 | 2.5 | 2.19 | 4.0 | 1.73 | 4.1 | 1.27 | 4.2 | 1.56 | 6.3 | 1.77 |
| Balance-scale | 16.7 | 1.92 | 12.4 | 1.01 | 9.1 | 3.35 | 27.0 | 2.69 | 5.2 | 1.56 | 12.4 | 1.84 | 28.6 | 1.55 |
| Lymphography | 10.2 | 1.60 | 4.7 | 1.69 | 4.6 | 2.83 | 15.6 | 2.11 | 5.1 | 1.02 | 6.9 | 1.53 | 10.2 | 2.30 |
| Glass | 21.6 | 1.79 | 8.4 | 1.76 | 12.4 | 4.10 | 24.5 | 3.13 | 8.2 | 1.48 | 8.0 | 2.03 | 13.7 | 2.32 |
| Zoo | 8.7 | 1.97 | 6.1 | 1.32 | 6.7 | 4.07 | 7.1 | 1.47 | 8.0 | 1.42 | 7.4 | 1.58 | 7.7 | 1.57 |
| Primary-tumor | 45.9 | 2.60 | 12.1 | 3.35 | 9.3 | 8.50 | 86.5 | 6.01 | 23.7 | 1.37 | 8.3 | 3.13 | 48.7 | 3.23 |
| #R RANKING | 5.30 | | 3.67 | | 2.69 | | 5.25 | | 2.06 | | 2.72 | | 6.31 | |
| #C/R RANKING | | 3.22 | | 2.5 | | 6.33 | | 5.55 | | 1.78 | | 3.86 | | 4.75 |

Note that both at a significance level of α = 0.05 and
α = 0.01, the Friedman test also rejects the null-hypothesis. In
the first case, the Bonferroni–Dunn critical value is 1.8996, so
that GBAP is significantly more accurate than Ant-Miner+,
Ant-Miner and GP. At the α = 0.01 significance level,
the Bonferroni–Dunn critical value is equal to 2.2639 and,
therefore, GBAP is significantly more accurate than Ant-Miner
and GP. In both cases, GBAP is the control algorithm and its
results are quite competitive or better than the results obtained
by the other algorithms.
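
The Bonferroni–Dunn comparisons reported for the three significance levels can be reproduced from the accuracy rankings of Table IV. The following Python sketch takes the critical values quoted in the text as given (they are not recomputed here) and lists the algorithms whose rank differs from that of the control by at least the critical value. The function and variable names are hypothetical.

```python
# Average accuracy rankings from the last row of Table IV.
ranks = {
    "GBAP": 2.25, "Ant-Miner": 4.78, "Ant-Miner+": 4.39, "PSO/ACO2": 4.11,
    "GP": 6.08, "JRIP": 3.06, "PART": 3.33,
}
# Bonferroni-Dunn critical values as quoted in the text, keyed by alpha.
critical_values = {0.10: 1.7239, 0.05: 1.8996, 0.01: 2.2639}


def worse_than_control(ranks, control, critical_value):
    """Algorithms whose rank exceeds the control's by at least the critical value."""
    return [name for name, rank in ranks.items()
            if name != control and rank - ranks[control] >= critical_value]


for alpha, cd in critical_values.items():
    print(alpha, worse_than_control(ranks, "GBAP", cd))
```

For instance, PSO/ACO2 differs from GBAP by 1.86 rank units, which exceeds 1.7239 but not 1.8996. This is why it appears among the significantly outperformed algorithms only at α = 0.1.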
In order to contrast the results obtained after the application
of the Bonferroni–Dunn procedure, we can use the Holm
test, which is more powerful than the former and makes no
additional assumptions about the hypotheses tested [58]. The
advantage of the Bonferroni–Dunn test lies in the fact that it
is easier to describe and visualize because it uses the same
critical difference for all comparisons. In turn, the Holm test
is a step-down post-hoc procedure that tests the hypotheses
ordered by significance, comparing each p_i with α/(k−i),
starting from the most significant p value. Table VI shows all
the possible hypotheses of comparison between the control
algorithm and the others, ordered by their p value and
associated with their level of significance α. The Holm test
rejects those hypotheses that have a p value less than or equal to
0.025. Thus, at a significance level of α = 0.05, according
to the Holm test and regarding the predictive accuracy
results, GBAP is statistically better than the PSO/ACO2,
Ant-Miner+, Ant-Miner and Bojarczuk-GP algorithms.
TABLE VI
HOLM TABLE FOR α = 0.05

| i | Algorithm | z | p | α/i | Hypothesis |
|---|---|---|---|---|---|
| 6 | GP | 5.323465 | 1.0180E-7 | 0.008333 | Rejected |
| 5 | ANT-MINER | 3.510401 | 4.4743E-4 | 0.01 | Rejected |
| 4 | ANT-MINER+ | 2.970339 | 0.002974 | 0.0125 | Rejected |
| 3 | PSO/ACO2 | 2.584581 | 0.009749 | 0.016666 | Rejected |
| 2 | PART | 1.504457 | 0.132463 | 0.025 | Accepted |
| 1 | JRIP | 1.118699 | 0.263268 | 0.05 | Accepted |
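
The step-down logic of the Holm test can be sketched in a few lines of Python using the p values of Table VI. This is an illustrative sketch, not the software used in the study, and the function name is hypothetical. Hypotheses are visited from the most to the least significant, and once one of them cannot be rejected, all the remaining ones are accepted as well.

```python
def holm(p_values, alpha=0.05):
    """Return {name: True if the hypothesis is rejected} using Holm's step-down test."""
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    rejected = {}
    stopped = False
    for step, (name, p) in enumerate(ordered):
        threshold = alpha / (m - step)   # alpha/6, alpha/5, ..., alpha/1
        if not stopped and p <= threshold:
            rejected[name] = True
        else:
            stopped = True               # every later hypothesis is accepted
            rejected[name] = False
    return rejected


# p values of the comparisons against GBAP, taken from Table VI.
p_values = {
    "GP": 1.0180e-7, "ANT-MINER": 4.4743e-4, "ANT-MINER+": 0.002974,
    "PSO/ACO2": 0.009749, "PART": 0.132463, "JRIP": 0.263268,
}
result = holm(p_values)
print(result)
```

The outcome matches the last column of Table VI: the first four hypotheses are rejected, and the procedure stops at PART, whose p value exceeds 0.025.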
B. Comprehensibility analysis
A second evaluation criterion is the comprehensibility of
the knowledge acquired. In contrast to predictive accuracy,
comprehensibility is a subjective concept, and it is frequently
associated with the syntactical simplicity of the classifier [54].
Thus, the smaller the number of rules and the number of
conditions appearing in them, the smaller the complexity of
the classifier.
Table V summarizes both the classifier’s rule set complexity,
by the average number of rules found per data set, and the
complexity of the rules, by the average number of conditions
per rule. The last but one row of the table shows the average
ranking value of each algorithm using the Friedman test with
respect to the number of rules in the classifier, and the last row
does the same for the number of conditions per rule. In both
cases the control algorithm found is GP, as it has the lowest
ranking value.
Before analyzing the results obtained, it is important to
mention that all algorithms except GP extract rules in the same
form, as a conjunction of conditions. However, GP employs
the OR operator and, due to the tree-based encoding of
individuals in GP, a fair computation of the number of rules and
the number of conditions per rule requires splitting the rule
into two separate rules for each OR operator, without
considering OR nodes as conditions.
The first statistical analysis is carried out considering
the average number of rules in the output classifier. At a
Page 13
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS – PART B: CYBERNETICS 13
significance level of α = 0.05 the application of the Friedman
test rejects the null hypothesis, because the value of the
statistic, 23.4734, does not belong to the critical interval
$C_0 = [0, (F_F)_{0.05,6,102} = 2.1888]$. To show the significant
differences we applied the post-hoc Bonferroni–Dunn test. The
Bonferroni–Dunn critical value is 1.8995 at α = 0.05,
which means that GP, JRIP and Ant-Miner+ are statistically
better than GBAP. In turn, GBAP does not perform significantly
worse than Ant-Miner, PSO/ACO2 and PART.
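The quantities used in this analysis can be reproduced with a minimal sketch based on the standard formulas for comparing k algorithms over N data sets, as described in [58]. The function names are illustrative. The constant 2.638 is the tabulated two-tailed Bonferroni–Dunn critical value for seven classifiers at α = 0.05 and is an input assumed here, not a value stated in this paper.

```python
import math


def friedman_chi2(avg_ranks, n_datasets):
    """Friedman chi-square statistic computed from average ranks."""
    k = len(avg_ranks)
    sum_sq = sum(r * r for r in avg_ranks)
    return 12.0 * n_datasets / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4.0)


def iman_davenport(chi2, k, n_datasets):
    """F-distributed correction with (k-1) and (k-1)(N-1) degrees of freedom."""
    return (n_datasets - 1) * chi2 / (n_datasets * (k - 1) - chi2)


def bonferroni_dunn_cd(q_alpha, k, n_datasets):
    """Critical difference between the average ranks of two algorithms."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n_datasets))


K, N = 7, 18   # seven algorithms, eighteen data sets
Q_005 = 2.638  # assumed tabulated critical value for k = 7, alpha = 0.05
cd = bonferroni_dunn_cd(Q_005, K, N)
print(f"degrees of freedom = ({K - 1}, {(K - 1) * (N - 1)}), CD = {cd:.4f}")
```

With these inputs the degrees of freedom are (6, 102), as in the critical interval above, and the critical difference agrees with the reported 1.8995 up to rounding in the last digit. An algorithm differs significantly from the control when their average ranks differ by more than this value.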
Regarding the number of rules in the output classifier, the
best possible result would be to mine one rule per class, but
this may not lead to good results when the instances of a
class are not confined to a well-defined region of the space.
This can be observed in the behavior of the GP algorithm,
which extracts nearly one rule per class and, therefore,
obtains the best results in this respect. Notice that in
this algorithm, although OR nodes are not considered to be
conditions but a way of joining two different rules predicting
the same class, the algorithm tends to minimize this kind of
operator, as it substantially decreases the simplicity component
of the fitness function and, therefore, the quality of
the rules mined. In addition, this number of rules may not
be enough to obtain accurate results in many data sets,
as can be deduced from the accuracy results obtained
by the GP algorithm in Section V-A. In contrast, by using the
niching algorithm described in Section III-F, GBAP ensures
the selection of the number of rules necessary to
cover the examples of each class, while also achieving very good
classification results.
The second statistical analysis involved the average number
of conditions per rule. To check whether the algorithms
present differences, we applied the Friedman test at the
same significance level considered in the previous study, α =
0.05. The value of the F-distributed statistic is 22.0539, which
again does not belong to the critical interval
$C_0 = [0, (F_F)_{0.05,6,102} = 2.1888]$. Therefore, there are
significant differences between the algorithms. The subsequent
application of the Bonferroni–Dunn test revealed that GBAP
performs significantly better than Ant-Miner+ and PSO/ACO2
in this respect. The test also shows that GBAP is neither
significantly better than GP, Ant-Miner, JRIP and PART nor,
more importantly, significantly worse than them.
Regarding this measure, it should be pointed out that the use
of a grammar in GBAP is beneficial because the complexity of
each rule can be restricted through the number of derivations
allowed for the grammar. Thus, a trade-off between rule
complexity and performance can be arranged (longer rules may
yield better results, as they can capture more complex
relationships between attributes).
As seen in Table VII, GBAP is the third-best algorithm in terms
of a small number of conditions per rule,
beaten only by GP and Ant-Miner. The reason why the GP
algorithm obtains the lowest number of conditions per rule
may be that it includes a simplicity
component in the fitness function, so it tries
to minimize this factor. GBAP also takes the
complexity of the rules into account in the reinforcement, as seen in
Section III-E.
TABLE VII
AVERAGE RESULTS OF THE ALGORITHMS
ALGORITHM      ACCURACY    #R       #C/R
GBAP           81.85       13.41    1.65
ANT-MINER      79.25        7.01    1.52
ANT-MINER+     80.29        5.93    3.13
PSO/ACO2       80.63       20.16    3.02
GP             70.22        5.16    1.30
JRIP           80.99        6.13    1.93
PART           81.26       17.44    2.11
The trade-off between comprehensibility and accuracy is
clearly illustrated by the results of the GP algorithm:
it is the most comprehensible algorithm, but
it obtains the poorest accuracy results. Nevertheless, we can
conclude that the GBAP algorithm presents a good
comprehensibility-accuracy trade-off, since it is the algorithm
with the best ranking in accuracy and it still
reaches quite competitive comprehensibility results,
as shown before.
Finally, an example of a classifier obtained by GBAP on a
training fold of the hepatitis data set is shown in Table VIII.
TABLE VIII
SAMPLE CLASSIFIER ON HEPATITIS DATA SET
IF (!= ALBUMIN (-inf,2.65] ) THEN LIVE
ELSE IF (AND(!= PROTIME (44.5,inf) )
(= ASCITES yes) ) THEN DIE
ELSE IF (= ALBUMIN (-inf,2.65] ) THEN DIE
ELSE IF (AND(!= AGE (-inf,29] )
(= VARICES yes) ) THEN DIE
ELSE IF (= ANTIVIRALS no) THEN DIE
ELSE IF (AND(!= PROTIME (44.5,inf) )
(= ASCITES no) ) THEN DIE
ELSE IF (AND(= PROTIME (-inf,44.5] )
(= SPIDERS no) ) THEN DIE
ELSE LIVE
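The classifier in Table VIII is applied as an ordered decision list: the first rule whose antecedent holds determines the class, and the final ELSE provides the default. A minimal sketch of that evaluation follows. It assumes that attribute values are the discretized interval labels shown in the table and that `!=` means the value differs from the given label. The instance values and the label `(2.65,inf)` are hypothetical.

```python
# Each rule is (list of (operator, attribute, value) conditions, predicted class).
RULES = [
    ([("!=", "ALBUMIN", "(-inf,2.65]")], "LIVE"),
    ([("!=", "PROTIME", "(44.5,inf)"), ("=", "ASCITES", "yes")], "DIE"),
    ([("=", "ALBUMIN", "(-inf,2.65]")], "DIE"),
    ([("!=", "AGE", "(-inf,29]"), ("=", "VARICES", "yes")], "DIE"),
    ([("=", "ANTIVIRALS", "no")], "DIE"),
    ([("!=", "PROTIME", "(44.5,inf)"), ("=", "ASCITES", "no")], "DIE"),
    ([("=", "PROTIME", "(-inf,44.5]"), ("=", "SPIDERS", "no")], "DIE"),
]
DEFAULT_CLASS = "LIVE"


def classify(instance):
    """Return the class of the first rule whose conditions all hold."""
    for conditions, label in RULES:
        if all((instance[attr] == value) == (op == "=")
               for op, attr, value in conditions):
            return label
    return DEFAULT_CLASS


# Hypothetical discretized instances, for illustration only.
patient_a = {"ALBUMIN": "(2.65,inf)", "PROTIME": "(44.5,inf)", "ASCITES": "no",
             "AGE": "(29,inf)", "VARICES": "no", "ANTIVIRALS": "yes",
             "SPIDERS": "yes"}
patient_b = dict(patient_a, ALBUMIN="(-inf,2.65]")
print(classify(patient_a), classify(patient_b))
```

For the first hypothetical instance the opening rule fires, while the second falls through to the third rule because its ALBUMIN value lies in the lowest interval.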
VI. CONCLUSIONS AND FUTURE WORK
In this paper we have presented a novel ACO-based automatic
programming algorithm guided by a CFG for multi-class
classification. This algorithm, called GBAP, uses two
complementary heuristic measures that guide the search
for valid solutions, and it also lets the user adjust
the complexity of the rules mined
simply by varying the number of derivations allowed for the
grammar. In addition, the niching algorithm developed, which
is responsible for assigning a consequent to the rules and
selecting the rules that make up the final classifier, avoids
the disadvantages of sequential covering algorithms, because
it neither removes nor rules out examples from the training
data set.
Although GBAP was originally designed for the DM
classification task, it can also be applied to other kinds of
problems by defining another way of evaluating individuals and
designing a suitable grammar for the problem at hand.
We have compared GBAP with other representative rule-
induction algorithms: three state-of-the-art algorithms (Ant-
Miner, Ant-Miner+ and PSO/ACO2), a GP algorithm, and two
other industry standard classifiers (JRIP and PART) over eighteen
different data sets. Non-parametric statistical methods
have been used to analyze the accuracy and comprehensibility
of the algorithms. They lead us to conclude, on the one hand, that GBAP is
statistically more accurate than PSO/ACO2, Ant-Miner+, Ant-Miner
and the GP algorithm at a confidence level of 95%, and
that GBAP is also competitive with JRIP and PART in terms of
accuracy. On the other hand, comprehensibility results show
that GBAP is a competitive classifier in this sense, too. We
consider these results promising, as they demonstrate that AP
can be successfully employed to tackle classification problems,
just as GP has demonstrated in previous research.
ACKNOWLEDGMENTS
This work has been supported by the Regional Govern-
ment of Andalusia and the Ministry of Science and Technol-
ogy, projects P08-TIC-3720 and TIN2008-06681-C06-03, and
FEDER funds.
We would also like to thank the authors of Ant-Miner+ for kindly
providing the source code of their algorithm.
REFERENCES
[1] J. Han and M. Kamber, Data Mining: Concepts and Techniques.
Morgan Kaufmann, 2006.
[2] S. B. Kotsiantis, I. D. Zaharakis, and P. E. Pintelas, “Machine learning:
a review of classification and combining techniques,” Artificial
Intelligence Review, vol. 26, pp. 159–190, 2006.
[3] H.-J. Huang and C.-N. Hsu, “Bayesian classification for data from
the same unknown class,” IEEE Transactions on Systems, Man, and
Cybernetics, Part B, vol. 32, no. 2, pp. 137–145, 2002.
[4] T.-M. Huang, V. Kecman, and I. Kopriva, “Support vector machines
in classification and regression - an introduction,” in Kernel Based
Algorithms for Mining Huge Data Sets: Supervised, Semi-supervised,
and Unsupervised Learning (Studies in Computational Intelligence).
Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2006.
[5] S. Haykin, Neural Networks and Learning Machines, 3rd ed. Pearson,
2009.
[6] S. U. Guan and F. Zhu, “An incremental approach to genetic-algorithms-
based classification,” IEEE Transactions on Systems, Man, and Cyber-
netics, Part B, vol. 35, no. 2, pp. 227–239, 2005.
[7] K. C. Tan, Q. Yu, C. M. Heng, and T. H. Lee, “Evolutionary
computing for knowledge discovery in medical diagnosis,” Artificial
Intelligence in Medicine, vol. 27, no. 2, pp. 129–154,
2003. [Online]. Available: http://www.sciencedirect.com/science/article/B6T4K-47RRWS9-2/2/5c8dfaf6e49d194b0c8ed6e2fd1b5117
[8] M. Dorigo and T. Stützle, The Ant Colony Optimization metaheuristic:
Algorithms, Applications and Advances, ser. International Series
in Operations Research and Management Science, F. Glover and
G. Kochenberger, Eds. Kluwer Academic Publishers, 2002, also
available as technical report TR/IRIDIA/2000-32, IRIDIA, Université
Libre de Bruxelles. [Online]. Available: ftp://iridia.ulb.ac.be/pub/mdorigo/tec.reps/TR.11-MetaHandBook.pdf
[9] R. Parpinelli, A. A. Freitas, and H. S. Lopes, “Data mining with
an ant colony optimization algorithm,” IEEE Transactions on Evolutionary
Computation, vol. 6, pp. 321–332, 2002.
[10] J. R. Koza, Genetic programming: on the programming of computers by
means of natural selection. Cambridge, MA: The MIT Press, 1992.
[11] O. Roux and C. Fonlupt, “Ant programming: or how to use ants for
automatic programming,” in ANTS’2000, M. Dorigo et al., Eds.,
2000, pp. 121–129.
[12] P. Espejo, S. Ventura, and F. Herrera, “A survey on the application of
genetic programming to classification,” IEEE Transactions on Systems, Man,
and Cybernetics, Part C: Applications and Reviews, vol. 40, no. 2,
pp. 121–144, March 2010.
[13] J. Fürnkranz, “Separate-and-conquer rule learning,” Artif. Intell.
Rev., vol. 13, pp. 3–54, February 1999. [Online]. Available:
http://portal.acm.org/citation.cfm?id=309283.309291
[14] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural
to Artificial Systems. New York, USA: Oxford University Press, 1999.
[15] M. Dorigo, V. Maniezzo, and A. Colorni, “The ant system: Optimization
by a colony of cooperating agents,” IEEE Transactions on Systems, Man,
and Cybernetics-Part B, vol. 26, pp. 29–41, 1996.
[16] D. Martens, B. Baesens, and T. Fawcett, “Editorial survey: swarm
intelligence for data mining,” Mach. Learn., vol. 82, pp. 1–42, January
2011. [Online]. Available: http://dx.doi.org/10.1007/s10994-010-5216-5
[17] B. Liu, H. A. Abbass, and B. McKay, “Density-based heuristic for rule
discovery with ant-miner,” in Proceedings of the 6th Australasia-Japan
Joint Workshop on Intell. Evol. Syst., 2002, pp. 180–184.
[18] ——, “Classification rule discovery with ant colony optimization,” in
IAT ’03: Proceedings of the IEEE/WIC International Conference on
Intelligent Agent Technology. Washington, DC, USA: IEEE Computer
Society, 2003, p. 83.
[19] Z. Wang and B. Feng, “Classification rule mining with an improved ant
colony algorithm,” LNAI, vol. 3339, pp. 357–367, 2004.
[20] C. Chen, Y. Chen, and J. He, “Neural network ensemble based ant
colony classification rule mining,” Innovative Computing, Information
and Control, International Conference on, vol. 3, pp. 427–430, 2006.
[Online]. Available: http://dx.doi.org/http://doi.ieeecomputersociety.org/
10.1109/ICICIC.2006.477
[21] A. Chan and A. Freitas, “A new classification-rule pruning procedure
for an ant colony algorithm,” 2006, pp. 25–36. [Online]. Available:
http://dx.doi.org/10.1007/11740698_3
[22] A. Chan and A. A. Freitas, “A new ant colony algorithm for multi-label
classification with applications in bioinformatics,” pp. 27–34, July 2006.
[23] J. Smaldon and A. A. Freitas, “A new version of the ant-miner algorithm
discovering unordered rule sets,” in GECCO, 2006, pp. 43–50.
[24] P. Jin, Y. Zhu, K. Hu, and S. Li, Classification Rule Mining Based on Ant
Colony Optimization Algorithm. Springer, 2006, vol. 344, pp. 654–663.
[Online]. Available: http://dx.doi.org/10.1007/978-3-540-37256-1_82
[25] M. Galea and Q. Shen, “Swarm intelligence in data mining,” Swarm
Intelligence in Data Mining, pp. 75–99, 2006. [Online]. Available:
http://dx.doi.org/10.1007/978-3-540-34956-3_4
[26] S. Swaminathan, “Rule induction using ant colony optimization for
mixed variable attributes,” Master’s thesis, 2006.
[27] D. Martens, M. De Backer, J. Vanthienen, M. Snoeck, and B. Baesens,
“Classification with ant colony optimization,” IEEE Transactions on
Evolutionary Computation, vol. 11, pp. 651–665, 2007.
[28] F. Otero, A. A. Freitas, and C. Johnson, “cant-miner: An ant colony
classification algorithm to cope with continuous attributes,” LNCS, vol.
5217, pp. 48–59, 2008.
[29] C. Nalini and P. Balasubramanie, “Discovering unordered rule sets for
mixed variables using an ant-miner algorithm,” Data Science Journal,
vol. 7, pp. 76–87, May 2008.
[30] F. E. Otero, A. A. Freitas, and C. G. Johnson, “A hierarchical classifica-
tion ant colony algorithm for predicting gene ontology terms,” in EvoBIO
’09: Proceedings of the 7th European Conference on Evolutionary
Computation, Machine Learning and Data Mining in Bioinformatics.
Berlin, Heidelberg: Springer-Verlag, 2009, pp. 68–79.
[31] K. M. Salama and A. M. Abdelbar, “Extensions to the ant-miner
classification rule discovery algorithm,” in Swarm Intelligence - Proc.
7th International Conference, ANTS 2010, LNCS, vol. 6234. Springer,
2010, pp. 167–178.
[32] T. Stützle and H. H. Hoos, “Max-min ant system,” Future Generation
Computer Systems, vol. 16, pp. 889–914, 2000.
[33] N. Holden and A. A. Freitas, “A hybrid PSO/ACO algorithm
for discovering classification rules in data mining,” J. Artif. Evol.
App., vol. 2008, pp. 2:1–2:11, January 2008. [Online]. Available:
http://dx.doi.org/10.1155/2008/316145
[34] Y. Chen, B. Yang, and J. Dong, “Evolving flexible neural networks using
ant programming and pso algorithm,” in ISNN (1), 2004, pp. 211–216.
[35] M. Boryczka and Z. J. Czech, “Solving approximation problems by ant
colony programming,” in GECCO Late Breaking Papers, 2002, pp. 39–46.
[36] M. Boryczka, Z. J. Czech, and W. Wieczorek, “Ant colony programming
for approximation problems,” in GECCO, 2003, pp. 142–143.
[37] J. Green, J. Whalley, and C. Johnson, “Automatic programming with
ant colony optimization,” in Proceedings of the 2004 UK Workshop on
Computational Intelligence, 2004, pp. 70–77.
[38] M. Boryczka, “Eliminating introns in ant colony programming,”
Fundam. Inf., vol. 68, no. 1-2, pp. 1–19, 2005.
[39] ——, “Ant colony programming with the candidate list,” LNAI, vol.
4953, pp. 302–311, 2008.
[40] H. A. Abbass, X. Hoai, and R. I. McKay, “AntTAG: A new method to
compose computer programs using colonies of ants,” in The IEEE
Congress on Evolutionary Computation, 2002, pp. 1654–1659.
[41] P. Whigham, “Grammatically biased genetic programming,” in Proceed-
ings of the Workshop on Genetic Programming: from Theory to Real-
World Applications, 1995, pp. 33–41.
[42] N. Hoai and R. McKay, “A framework for tree adjunct grammar guided
genetic programming,” in Proceedings of the Post-Graduate ADFA
Conference on Computer Science (PACCS’01), 2001, pp. 93–99.
[43] C. Keber and M. G. Schuster, “Option valuation with generalized
ant programming,” in GECCO 2002: Proceedings of the Genetic and
Evolutionary Computation Conference, W. B. L. et al., Ed. New York:
Morgan Kaufmann Publishers, 9-13 Jul. 2002, pp. 74–81. [Online].
Available: http://www.cs.bham.ac.uk/~wbl/biblio/gecco2002/aaaa075.ps
[44] A. Salehi-Abari and T. White, “Enhanced generalized ant programming
(EGAP),” in GECCO ’08: Proceedings of the 10th annual conference
on Genetic and evolutionary computation. New York, NY, USA: ACM,
2008, pp. 111–118.
[45] ——, “The uphill battle of ant programming vs. genetic programming,”
in IJCCI, 2009, pp. 171–176.
[46] S. Shirakawa, S. Ogino, and T. Nagao, “Dynamic ant programming for
automatic construction of programs,” IEEJ Transactions on Electrical
and Electronic Engineering (TEEE), vol. 3, no. 5, pp. 540–548, Aug
2008. [Online]. Available: http://dx.doi.org/doi:10.1002/tee.20311
[47] F. Neumann, D. Sudholt, and C. Witt, “Computational complexity
of ant colony optimization and its hybridization with local search,”
in Innovations in Swarm Intelligence, ser. Studies in Computational
Intelligence, C. Lim, L. Jain, and S. Dehuri, Eds. Springer Berlin
/ Heidelberg, 2009, vol. 248, pp. 91–120. [Online]. Available:
http://dx.doi.org/10.1007/978-3-642-04225-6_6
[48] R. J. Mullen, D. Monekosso, S. Barman, and P. Remagnino, “A review
of ant algorithms,” Expert Systems with Applications, vol. 36, pp. 9608–9617,
2009.
[49] A. Geyer-Schulz, Fuzzy Rule-Based Expert Systems and Genetic Machine
Learning, ser. Studies in Fuzziness. Heidelberg: Physica-Verlag,
1995, vol. 3.
[50] P. Clark and R. Boswell, “Rule induction with CN2: Some recent
improvements,” in EWSL-91. Springer-Verlag, 1991, pp. 151–163.
[51] J. Ávila, E. Gibaja, A. Zafra, and S. Ventura, “A niching
algorithm to learn discriminant functions with multi-label patterns,”
in Intelligent Data Engineering and Automated Learning - IDEAL
2009, 2009, pp. 570–577. [Online]. Available:
http://dx.doi.org/10.1007/978-3-642-04394-9_69
[52] A. Frank and A. Asuncion, “UCI machine learning repository,” 2010.
[Online]. Available: http://archive.ics.uci.edu/ml
[53] U. M. Fayyad and K. B. Irani, “Multi-interval discretization of
continuous-valued attributes for classification learning,” in 13th
International Joint Conference on Uncertainty in Artificial
Intelligence (IJCAI93), 1993, pp. 1022–1029.
[54] C. C. Bojarczuk, H. S. Lopes, A. A. Freitas, and E. L. Michalkiewicz,
“A constrained-syntax genetic programming system for discovering
classification rules: application to medical data sets,” Artificial Intelligence
in Medicine, vol. 30, pp. 27–48, 2004.
[55] S. Ventura, C. Romero, A. Zafra, J. A. Delgado, and C. Hervás, “JCLEC:
a java framework for evolutionary computation,” Soft Comput., vol. 12,
no. 4, pp. 381–392, 2007.
[56] D. H. Wolpert, “The lack of a priori distinctions between learning
algorithms,” Neural Comput., vol. 8, no. 7, pp. 1341–1390, 1996.
[57] D. H. Wolpert and W. G. Macready, “No free lunch theorems
for optimization,” IEEE Transactions on Evolutionary Computation,
vol. 1, no. 1, pp. 67–82, April 1997. [Online]. Available:
http://dx.doi.org/10.1109/4235.585893
[58] J. Demšar, “Statistical comparisons of classifiers over multiple data sets,”
J. Mach. Learn. Res., vol. 7, pp. 1–30, 2006.
[59] D. J. Sheskin, Handbook of Parametric and Nonparametric Statistical
Procedures. Chapman & Hall/CRC, 2007.
Juan Luis Olmo was born in Cordoba, Spain, in
1984. He received a B.Sc. degree from the University
of Cordoba in 2005, and an M.Sc. degree from the
University Oberta of Catalonia, Barcelona, in 2007,
both in Computer Science.
Since 2009, he has been with the Department
of Computer Science and Numerical Analysis, the
University of Cordoba, Spain, where he is currently
working towards obtaining the Ph.D., as well as
developing teaching and research tasks, with a grant
from the regional government of Andalusia. His
research interests include the application of evolutionary computation and
swarm intelligence to data mining.
Juan Luis Olmo is a Member of the IEEE Computer, Computational
Intelligence and Systems, Man and Cybernetics societies and the Association
for Computing Machinery Special Interest Group on Genetic and Evolutionary
Computation.
José Raúl Romero is currently an Associate Professor
at the Department of Computer Science of the
University of Cordoba, Spain. He received his Ph.D.
in Computer Science from the University of Malaga,
Spain, in 2007. He has worked as an IT consultant
for important business consulting and technology
companies for several years. His current research in-
terests include the use of bio-inspired algorithms for
data mining, the industrial use of formal methods,
open and distributed processing and model-driven
software development and its applications.
Dr. Romero is a member of IEEE, the ACM, and the Spanish Technical
Normalization Committee AEN/CTN 71/SC7 of AENOR. He can also be
reached at http://www.jrromero.net.
Sebastián Ventura was born in Cordoba, Spain,
in 1966. He received the B.Sc. and Ph.D. degrees
from the University of Cordoba, in 1989 and 1996,
respectively.
He is currently Associate Professor in the De-
partment of Computer Science and Numerical Anal-
ysis, the University of Cordoba, where he heads
the Knowledge Discovery and Intelligent Systems
Research Laboratory. He is the author or coauthor
of more than 90 international publications, 30 of
which have been published in international journals.
He has also been engaged in eleven research projects (being the coordinator
of two of them) supported by the Spanish and Andalusian governments and
the European Union, concerning several aspects of the area of evolutionary
computation, machine learning, and data mining and its applications. His
current main research interests are in the fields of soft-computing, machine
learning, data mining and its applications.
Dr. Ventura is a Member of the IEEE Computer, Computational
Intelligence and Systems, Man and Cybernetics societies and the Association for
Computing Machinery.