An Automatic Programming ACO-Based Algorithm for Classification Rule Mining.
J.L. Olmo, J.M. Luna, J.R. Romero, S. Ventura
Abstract In this paper we present a novel algorithm, named GBAP, that jointly uses
automatic programming with ant colony optimization for mining classification rules.
GBAP is based on a context-free grammar that properly guides the search process of
valid rules. Furthermore, its most important characteristics are also discussed, such
as the use of two different heuristic measures for every transition rule, as well as the
way it evaluates the mined rules. These features enhance the final rule compilation
from the output classifier. Finally, the experiments over 17 diverse data sets prove
that the accuracy values obtained by GBAP are pretty competitive and even better
than those resulting from the top Ant-Miner algorithm.
1 Introduction
Data Mining (DM) entails the process of applying specific algorithms for extracting
comprehensible, non-trivial and useful knowledge from data. The DM classification
task aims to obtain a set of classification rules (a classifier) from a training data set.
Once the classifier is built, one can apply these rules to other uncategorized data in
order to label each instance with one of the predefined classes. The performance of
the classifier is typically measured with the accuracy obtained when applying the
classifier to a separate test set.
Support vector machines and neural networks have proven to be accurate
solutions for building classifiers. However, they have the disadvantage of generating
non-linear classifiers that are hard to interpret. In contrast, logic-based algorithms
(i.e., decision trees and rule-based classifiers) provide more interpretability, but
they are not as accurate as the previous methods [7].
Dept. of Computer Science and Numerical Analysis, Rabanales Campus, Albert Einstein building,
14071 Cordoba, Spain e-mail: {juanluisolmo, i32luarj, jrromero, sventura}@uco.es
Ant Colony Optimization (ACO) [4] is a nature-inspired optimization metaheuristic
based on the behavior and organization of ant colonies in their search for
food. Ant algorithms have been successfully applied to a broad range of domains,
including the extraction of classification rules in DM. For example, Ant-Miner,
originally proposed by Parpinelli and colleagues [10], was the first ACO-based
algorithm applied to the classification task. Ant-Miner follows a sequential-covering
approach and has become a top algorithm in this field.
Furthermore, automatic programming is a method that uses search techniques
to automatically construct a program that solves a given problem,
without requiring the user to know the structure of the solution. In fact, the problem
is solved by simply specifying the goals to be reached. Typical examples of this
method are Genetic Programming (GP) [2, 8] and Ant Programming (AP) [11],
which uses ACO as its search technique. The former has been shown to perform
well in the design of classifiers, but the latter has never
been used to tackle classification problems.
In this work we explore the application of an AP algorithm for classification rule
mining. Thus, our proposal generates a rule-based classifier, which is composed of a
set of classification rules that take the form IF <antecedent> THEN <consequent>.
Our algorithm aims to construct accurate but also comprehensible classifiers, and
first results show that it achieves good performance in terms of accuracy.
The remainder of the paper is organized as follows. In the next section we de-
scribe the proposed algorithm. Section 3 explains the experiments carried out and
the data sets employed. In Section 4 we discuss the results obtained. Finally, some
concluding remarks and ideas for future work are provided in Section 5.
2 The GBAP algorithm
In this section we introduce the Grammar Based Ant Programming (GBAP) algorithm.
Roughly speaking, GBAP follows a grammar-guided automatic programming
approach. Systems of this kind are restricted by a defined grammar to ensure that
any solution found is syntactically valid. The goal of GBAP is to obtain a classifier
for a given data set, instead of a generic solution that could be applied to other
data sets. This classifier takes the form of a decision list where discovered rules are
sorted in descending order by fitness, and the bottom rule added to the classifier,
which corresponds to the majority class in the data set, acts as default rule.
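To make the decision-list semantics concrete, the following minimal Python sketch classifies an instance with the first matching rule in descending order of fitness and falls back to the default rule otherwise. It is only an illustration under stated assumptions: the names Rule and classify, and the encoding of an antecedent as a conjunction of attribute = value tests, are hypothetical and are not taken from the GBAP implementation.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    antecedent: dict  # hypothetical encoding: attribute -> required value
    consequent: str
    fitness: float

    def covers(self, instance: dict) -> bool:
        # The rule fires when every condition of the antecedent holds.
        return all(instance.get(attr) == value
                   for attr, value in self.antecedent.items())


def classify(rules, default_class, instance):
    # Rules are tried in descending order of fitness; the first match wins.
    for rule in sorted(rules, key=lambda r: r.fitness, reverse=True):
        if rule.covers(instance):
            return rule.consequent
    # No rule fired: the default rule (majority class) labels the instance.
    return default_class


rules = [
    Rule({"outlook": "sunny"}, "no", fitness=0.70),
    Rule({"outlook": "sunny", "windy": "false"}, "yes", fitness=0.85),
]
print(classify(rules, "no", {"outlook": "sunny", "windy": "false"}))
print(classify(rules, "no", {"outlook": "rainy", "windy": "true"}))
```

Sorting inside classify keeps the sketch short; a real classifier would presumably sort the list once when it is built.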
In ACO-based algorithms there must be an environment where ants cooperate
with each other. In GBAP, this environment is defined by the search space comprising
all the possible expressions or programs that can be derived from the grammar. The
space of states adopts the form of a derivation tree, and the path followed by the
artificial ant could be seen as the sequence of derivation steps that leads the ant to a
final state or solution.
In the next sections we explain how classification rules are represented, and we also
present a detailed pseudocode and a description of the main characteristics of GBAP.
2.1 Rules encoding
GBAP follows the ant=rule (i.e., individual=rule) approach [5]. Notice that once
the ant is created, it just represents the antecedent of the new rule. In Section 2.2 we
will analyze how the consequent is properly assigned to the rule.
GBAP prescribes a context-free grammar for the representation of the individuals,
as shown in Figure 1. Notice that this grammar is expressed in prefix notation
and should always be derived from the left. This implies that each transition from a
state i to another state j is triggered by applying a production rule to the first
non-terminal symbol of state i. This design decision was taken for performance
reasons, in order to speed up the calculations necessary to compute the
rule fitness.
Fig. 1 Context-free grammar used in GBAP, defined by G = (V, Σ, R, S). Notice that
any production rule consists of a left hand side (LHS) and a right hand side (RHS).
The LHS always refers to a non-terminal symbol that might be replaced by the RHS
of the rule (composed of a combination of terminal and non-terminal symbols).
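The leftmost derivation process can be illustrated with a short Python sketch. The grammar below is a hypothetical stand-in written in prefix notation, not the grammar of Fig. 1, and the attribute names and the helper names derive_leftmost and is_final are likewise assumptions made for this example. Each call applies one production rule to the first non-terminal symbol of the current state, which corresponds to one transition in the space of states.

```python
# Hypothetical grammar in prefix notation (NOT the grammar of Fig. 1).
GRAMMAR = {
    "<antecedent>": [["<cond>"], ["AND", "<antecedent>", "<cond>"]],
    "<cond>": [["=", "outlook", "sunny"], ["!=", "windy", "true"]],
}


def derive_leftmost(state, production):
    """Apply `production` to the first non-terminal symbol of `state`."""
    for pos, symbol in enumerate(state):
        if symbol in GRAMMAR:
            if production not in GRAMMAR[symbol]:
                raise ValueError(f"{production} is not a production of {symbol}")
            return state[:pos] + production + state[pos + 1:]
    raise ValueError("final state: no non-terminal symbol left")


def is_final(state):
    # A final state (a solution) contains terminal symbols only.
    return not any(symbol in GRAMMAR for symbol in state)


# Path of one ant: every element is a state, every step is one derivation.
path = [["<antecedent>"]]
for production in (["AND", "<antecedent>", "<cond>"],
                   ["<cond>"],
                   ["=", "outlook", "sunny"],
                   ["!=", "windy", "true"]):
    path.append(derive_leftmost(path[-1], production))

print(" ".join(path[-1]))
```

In this toy grammar an antecedent with two conditions is reached after four derivations, which shows how a limit on the number of derivations bounds the size of the rules.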
2.2 Pseudocode and main characteristics of GBAP
An important characteristic of GBAP is the incremental generation of the space of
states. In fact, depending on both the problem addressed and the number of
derivations from the grammar permitted, it may be unfeasible to keep the whole
space of states in memory. We therefore follow an incremental build approach in
which, as the ants are created, we store the states they visit. This requires each ant
to first store the path it has followed. For this reason, the initial space of states is
empty and all the possible transitions have the same amount of pheromones.
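One way to picture this incremental construction is a pheromone table that starts empty and is filled only with the transitions that ants actually traverse. The Python sketch below is an assumption about how such a structure could look, not the authors' data structure; the class name StateSpace, the string encoding of states and the default amount of 1.0 (the initial pheromone amount reported in Section 4) are illustrative choices.

```python
class StateSpace:
    """Space of states that grows only as ants visit it."""

    def __init__(self, initial_pheromone=1.0):
        self.initial_pheromone = initial_pheromone
        self.pheromone = {}  # (source state, destination state) -> amount

    def tau(self, source, destination):
        # Transitions never visited still carry the initial amount.
        return self.pheromone.get((source, destination), self.initial_pheromone)

    def store_path(self, path):
        # Register each consecutive pair of states on the ant's path.
        for source, destination in zip(path, path[1:]):
            self.pheromone.setdefault((source, destination), self.initial_pheromone)


space = StateSpace()
space.store_path(["S", "AND A C", "AND C C"])
print(len(space.pheromone), space.tau("S", "C"))
```

Only the two transitions on the stored path occupy memory, while any other transition is still reported with the same initial amount.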
Algorithm 1 High-level pseudocode of GBAP
Require: numberOfGenerations, numberOfAnts
    Initialize space of states, starting up the grammar
    Create the classifier
    for i = 0 to i = numberOfGenerations do
        Create list ants ← {}
        for j = 0 to j = numberOfAnts do
            ant ← Create new ant
            Store ant's path states in the space of states
            Evaluate ant, computing its fitness for each available class in the data set
            Add ant to the list ants
        end for
        Do niching algorithm, assigning the consequent to the ants and establishing
            the classifier rules
        for each ant in ants do
            if fitness > threshold then
                Update pheromone rate in the path followed by ant proportionally
                    to its fitness
            end if
        end for
        Evaporate the pheromone rate along the whole space of states
        Normalize values of pheromones
    end for
    Establish the default rule in the classifier
    predictiveAccuracy ← Compute the predictive accuracy obtained by the classifier
        when running over the test set
    return predictiveAccuracy

Another important characteristic of the proposed algorithm is that it considers
two complementary heuristic measures. The first one is the cardinality of the
production rules (Pcard), which is used because of the shape adopted by the space of
states. This measure increases the probability of choosing transitions that lead to a
greater number of solutions, and it is based on the cardinality measure proposed in
[6]. When the grammar is initialized in the algorithm, a cardinality table is computed
for each production rule, up to the maximum number of derivations allowed. Given a
state i and all its possible subsequent states, the value of this heuristic for each
possible transition is defined as the ratio between the number of solutions that can
be successfully reached if the ant moves to the destination state by applying this
transition, and the number of all possible solutions that can be reached from the
source state. Notice that this heuristic measure is only taken into account for
intermediate transitions.
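As a rough illustration of this ratio, the Python sketch below works it out for a toy grammar that is assumed here purely for the example (it is not the grammar of Fig. 1): <antecedent> ::= <cond> | AND <antecedent> <cond>, where <cond> expands to one of n_terminal_conditions terminal conditions. In that grammar an antecedent with n conditions needs 2n derivations, so the number of reachable solutions can be counted in closed form. The function name and its parameters are hypothetical.

```python
def cardinality_heuristic(remaining_derivations, n_terminal_conditions):
    """Pcard of the two productions applicable to <antecedent> (toy grammar)."""
    # An antecedent with n conditions needs 2n derivations: (n - 1) AND
    # expansions, one <antecedent> -> <cond>, and n expansions of <cond>.
    max_conditions = remaining_derivations // 2
    m = n_terminal_conditions
    # Solutions reachable after choosing <antecedent> -> <cond>.
    via_single = m if max_conditions >= 1 else 0
    # Solutions reachable after choosing <antecedent> -> AND <antecedent> <cond>.
    via_and = sum(m ** n for n in range(2, max_conditions + 1))
    total = via_single + via_and
    if total == 0:
        return {"<cond>": 0.0, "AND <antecedent> <cond>": 0.0}
    return {"<cond>": via_single / total,
            "AND <antecedent> <cond>": via_and / total}


print(cardinality_heuristic(remaining_derivations=6, n_terminal_conditions=3))
print(cardinality_heuristic(remaining_derivations=2, n_terminal_conditions=3))
```

With six derivations left and three terminal conditions, 3 of the 39 reachable solutions lie behind the first production and 36 behind the second. With only two derivations left, the second production cannot reach any final state and its value drops to zero.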
A second one is the information gain (G(Ai)). It is only used in transitions involv-
ing the application of production rules that imply the selection of attributes of the
problem domain (i.e., <COND>:=operator−attribute−value). This measure is
similar to the one used by the Ant-Miner algorithm.
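The paper does not spell out the exact formula, so the following Python sketch uses the standard definition of information gain, G(A) = H(C) − Σ_v p(A = v)·H(C | A = v), as an assumption about what G(Ai) denotes. The function names entropy and information_gain are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in class entropy obtained by splitting on one attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


classes = ["x", "x", "y", "y"]
print(information_gain(["a", "a", "b", "b"], classes))  # attribute separates the classes
print(information_gain(["a", "b", "a", "b"], classes))  # attribute carries no information
```

An attribute that separates the two balanced classes perfectly obtains a gain of one bit, whereas an attribute that is independent of the class obtains zero.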
The use of both heuristic measures affects the creation process of new ants, when
they move to the next state of their path. The transition rule will assign a probability
to each available next state. In case of derivations that are not able to reach any final
state in a number of steps less than or equal to the maximum number of derivations
remaining, a probability equal to zero will be assigned and, in consequence, the ant
will not select such a movement.
The probability that a given ant moves from a state i to another valid state j is
defined by the following equation:
\[ P_{ij} = \frac{\eta_{ij}^{\alpha} \cdot \tau_{ij}^{\beta}}{\sum_{k} \eta_{ik}^{\alpha} \cdot \tau_{ik}^{\beta}} \qquad (1) \]
where α is the heuristic exponent, β is the pheromone exponent, the sum in the
denominator runs over all the valid states k reachable from i, and η is computed
as G(Ai) + Pcard (at least one of the two components must be equal to zero).
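A minimal Python sketch of Eq. (1) follows. It assumes that the heuristic value η and the pheromone amount τ of every candidate transition are already available as two parallel lists, and it models a derivation that cannot reach a final state as η = 0. The function name is hypothetical; the default exponents are the values α = 0.4 and β = 1.0 reported in Section 4.

```python
def transition_probabilities(eta, tau, alpha=0.4, beta=1.0):
    """Probability of each candidate transition according to Eq. (1)."""
    weights = [(e ** alpha) * (t ** beta) for e, t in zip(eta, tau)]
    total = sum(weights)
    if total == 0:
        # No candidate can reach a final state.
        return [0.0] * len(weights)
    return [w / total for w in weights]


# Three candidate next states; the last one cannot reach any final state.
probs = transition_probabilities(eta=[0.25, 0.75, 0.0], tau=[1.0, 1.0, 1.0])
print(probs)
```

An ant would then draw its next state from these probabilities, for instance with random.choices from the standard library.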
The fitness function that GBAP uses in the training stage for measuring the quality
of the ants is the Laplace accuracy [3], which is defined as:
\[ \mathit{fitness} = \mathit{LaplaceAccuracy} = \frac{1 + TP}{k + TP + FP} \qquad (2) \]
where TP and FP stand for true positives and false positives, respectively, and k
refers to the number of classes in the data set.
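As a quick numerical check of Eq. (2), the following Python sketch evaluates the Laplace accuracy for two hypothetical rules in a two-class problem. The function name and the counts are chosen only for illustration.

```python
def laplace_accuracy(tp, fp, k):
    """Laplace accuracy of a rule, Eq. (2)."""
    return (1 + tp) / (k + tp + fp)


# A rule covering 8 instances of its class and 2 of the other class (k = 2).
print(laplace_accuracy(tp=8, fp=2, k=2))
# A rule covering no instance at all.
print(laplace_accuracy(tp=0, fp=0, k=2))
```

The first rule scores 9/12 = 0.75. The second scores 1/2, which shows that in a two-class problem a rule without any coverage sits exactly at 1/k rather than at zero.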
Concerning the assignment of the consequent, GBAP follows a niching approach
analogous to that employed in [1], whose purpose is to evolve multiple different
rules for predicting each class in the data set while preserving diversity. Depending
on the data set and on the distribution of its instances by class, it is often
not possible for a single rule to cover all instances of a class, and it is therefore
necessary to discover additional rules for predicting that class. The niching
algorithm takes care of this while avoiding overlap with instances of other classes.
In addition, it helps to remove redundant rules.
In the niching algorithm developed, every instance in the data set is called a token,
which all ants in the colony compete to capture. First of all, GBAP computes
an array of k fitness values per individual, one for each class (assuming that the
respective class is assigned as consequent to the individual). Then, the following
steps are repeated for each class: first, the ants are sorted by their respective class
fitness in descending order. Second, each ant tries to take as many tokens as it covers,
provided that the token belongs to the class being computed and has not
previously been seized by another ant. Finally, the ant's adjusted fitness for this
class is computed as:
\[ \mathit{adjustedFitness} = \mathit{fitness} \cdot \frac{\mathit{numberOfCapturedTokens}}{\mathit{numberOfClassTokens}} \qquad (3) \]
Once the k adjusted fitnesses have been calculated, the consequent assigned to
each ant corresponds to the one that reports the best adjusted fitness. To conclude,
individuals that have an adjusted fitness greater than zero (and consequently cover
at least one instance of the training set) are added to the classifier.
Finally, regarding the pheromone update, if the quality of an ant is greater than
a threshold value, then a delayed pheromone update over the path of this ant takes
place. The threshold value has been fixed to 0.5 so that bad solutions
never influence the environment. The reinforcement is based on the quality of the
solution encoded by the ant:
\[ \tau_{ij}(t+1) = \tau_{ij} \cdot (1-\rho) + \tau_{ij} \cdot Q \cdot \mathit{fitness} \qquad (4) \]
where τ represents the amount of pheromones in the transition from the state i to
the state j; ρ, the evaporation rate; and Q, a parameter that controls the
influence of the reinforcement.
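The token competition and the pheromone reinforcement described in this section can be sketched in Python as follows. This is an illustrative reading of the text, not the authors' code: the function names, the representation of coverage as sets of instance indices and the dictionary of pheromones keyed by (source, destination) pairs are all assumptions. The reinforcement applies Eq. (4) literally to the transitions of one path; the global evaporation and normalization steps of Algorithm 1 are left out.

```python
def assign_consequents(covers, fitness, labels):
    """Token competition: returns (consequent, adjusted fitness) per ant.

    covers[a]  -- set of training-instance indices matched by ant a
    fitness[a] -- dict: class -> fitness of ant a with that class as consequent
    labels[i]  -- class of training instance i (every instance is a token)
    """
    adjusted = [{} for _ in covers]
    for cls in sorted(set(labels)):
        class_tokens = {i for i, y in enumerate(labels) if y == cls}
        free = set(class_tokens)
        # Ants with a higher fitness for this class pick their tokens first.
        ranking = sorted(range(len(covers)),
                         key=lambda a: fitness[a][cls], reverse=True)
        for a in ranking:
            captured = covers[a] & free
            free -= captured
            # Adjusted fitness, Eq. (3).
            adjusted[a][cls] = fitness[a][cls] * len(captured) / len(class_tokens)
    # Each ant keeps the class that reports its best adjusted fitness.
    return [max(adj.items(), key=lambda item: item[1]) for adj in adjusted]


def reinforce(tau, path, fitness_value, rho=0.05, q=1.0, threshold=0.5):
    """Apply Eq. (4) to the transitions of `path` if the ant is good enough."""
    if fitness_value > threshold:
        for edge in zip(path, path[1:]):
            tau[edge] = tau[edge] * (1 - rho) + tau[edge] * q * fitness_value


# Two ants, four tokens; the fitness values are Laplace accuracies with k = 2.
labels = ["p", "p", "p", "n"]
covers = [{0, 1}, {1, 2, 3}]
fitness = [{"p": 0.75, "n": 0.25}, {"p": 0.60, "n": 0.40}]
assigned = assign_consequents(covers, fitness, labels)
print(assigned)

tau = {("S", "A"): 1.0, ("A", "B"): 1.0}
reinforce(tau, ["S", "A", "B"], fitness_value=0.75)
print(tau)
```

In this toy run the first ant captures two of the three "p" tokens and keeps class "p", while the second ant finds token 1 already taken and ends up better off predicting "n", which is the diversity-preserving effect the niching scheme aims for. Ants whose best adjusted fitness is zero would simply not be added to the classifier.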
3 Data sets and preprocessing
This section describes the data sets used for the experimentation, as well as the
preprocessing steps performed.
The GBAP algorithm has been tested on 17 diverse data sets from the well-known
UCI¹ machine learning repository. The data sets selected for the experiments
present varied dimensionality, and some of them include missing values
while others do not, as shown in Table 1.
During the preprocessing stage we performed the following two actions using the
Weka machine learning library². Firstly, data sets comprising missing values were
preprocessed by replacing these values with the mode (in the case of nominal
attributes) or the arithmetic mean (in the case of numerical attributes) of the entire
data set [9]. Secondly, data sets with numerical attributes were discretized in order
to deal only with categorical attributes [12].
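The experiments relied on Weka for these steps. Purely as an illustration of the first one, the Python sketch below replaces missing values in a single column with the mode or the mean of the observed values. The function name impute and the use of None to mark a missing value are assumptions of this example, and the discretization step is not covered.

```python
from statistics import mean, mode


def impute(column, nominal):
    """Fill missing entries (None) with the mode or the arithmetic mean."""
    observed = [v for v in column if v is not None]
    fill = mode(observed) if nominal else mean(observed)
    return [fill if v is None else v for v in column]


print(impute([1.0, None, 3.0], nominal=False))
print(impute(["a", None, "a", "b"], nominal=True))
```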
Table 1 Data sets description

DATASET         MISSING VALUES   INSTANCES   ATTRIBUTES                      CLASSES
                                             Continuous   Binary   Nominal
Hepatitis       yes              155         6            13       0         2
Sonar           no               208         60           0        0         2
Breast-c        yes              286         0            3        6         2
Heart-c         yes              303         6            3        4         2
Ionosphere      no               351         33           1        0         2
Horse-c         yes              368         7            2        13        2
Breast-w        yes              699         9            0        0         2
Diabetes        no               768         0            8        0         2
Credit-g        no               1000        6            3        11        2
Mushroom        yes              8124        0            0        22        2
Iris            no               150         4            0        0         3
Wine            no               178         13           0        0         3
Balance-scale   no               625         4            0        0         3
Lymphography    no               148         3            9        6         4
Glass           no               214         9            0        0         6
Zoo             no               101         1            15       0         7
Primary-tumor   yes              339         0            14       3         21

¹ All data sets can be reached from the UCI website at http://archive.ics.uci.edu/ml/datasets.html
² The Weka library is publicly available at http://www.cs.waikato.ac.nz/ml/index.html

Table 2 Predictive accuracy (%) comparative results

Dataset         GBAP     Ant-Miner
Hepatitis       82.17*   79.17
Sonar           81.98*   74.70
Breast-c        71.40    73.12*
Heart-c         82.84*   76.62
Ionosphere      93.02*   87.41
Horse-c         82.97    84.38*
Breast-w        96.50*   92.13
Diabetes        75.80*   72.84
Credit-g        70.79*   70.36
Mushroom        98.26*   97.06
Iris            96.67*   95.33
Wine            97.01*   92.08
Balance-scale   75.49*   68.09
Lymphography    81.00*   77.69
Glass           69.13*   64.43
Zoo             95.60*   83.85
Primary-tumor   37.91*   35.30
* Best result per data set (bold in the original table).

Regarding the algorithm evaluation, we applied a stratified 10-fold cross-validation,
so that the prediction performance is taken as the average accuracy over
these 10 folds. In the stratified 10-fold cross-validation the entire data set is split
into 10 mutually exclusive partitions, P_1, ..., P_k (with k = 10), containing
approximately the same number of patterns and the same proportion of classes as the
original data set. Then, 10 different experiments were executed using
\(\bigcup_{j \neq i} P_j\) as the training set in the i-th experiment and P_i as the
test set.
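The stratified 10-fold protocol used in this section can be sketched in plain Python as follows. This is a generic illustration rather than the procedure actually run through Weka; the function names, the round-robin assignment and the fixed seed are assumptions of the example.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split instance indices into k disjoint folds with similar class ratios."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the instances of each class round-robin over the folds; the
        # running offset keeps the fold sizes balanced across classes.
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % k].append(idx)
        offset += len(indices)
    return folds


def train_test_splits(folds):
    """Yield (training indices, test indices): fold i is the test set P_i."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


labels = ["a"] * 60 + ["b"] * 40
folds = stratified_folds(labels)
splits = list(train_test_splits(folds))
print([len(fold) for fold in folds])
```

The reported accuracy would then be the mean of the ten test accuracies obtained over these splits.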
4 Results
Experiments compare the performance of GBAP against Ant-Miner, in terms of
predictive accuracy, over the data sets listed in Table 1. More specifically, Ant-Miner
was used in all executions (10 per data set) with its default parameters. For the
GBAP algorithm, the configuration parameters were set to: number of ants = 20,
number of generations = 100, max number of derivations = 15, initial pheromone
amount = 1.0, evaporation rate = 0.05, min pheromone amount = 0.1, Q = 1.0, alpha
= 0.4, and beta = 1.0.
Table 2 summarizes the results obtained, where each row shows the data set tested
and the resulting average accuracy (in %) for both algorithms. Best results per
data set are highlighted in bold typeface.
As can be seen, GBAP obtains the best result in approximately 88% of cases (15 of
the 17 data sets). We also analyzed the statistical significance of the obtained
results by applying the non-parametric Wilcoxon paired test, and the results
(z = 3.195, p < 0.001) indicate significant differences at the 99% confidence level,
with GBAP being significantly more accurate than Ant-Miner.
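The reported statistic can be checked with a small Python sketch of the Wilcoxon signed-rank test. The accuracy values are those read from Table 2 in the data set order of Table 1, and the computation assumes the plain normal approximation without continuity or tie correction; both the column alignment and the choice of approximation are assumptions of this example, since the paper does not detail how z was obtained. The simple ranking used here is valid only because the absolute differences contain no ties and no zeros.

```python
import math

# Accuracies read from Table 2, in the data set order of Table 1.
gbap = [82.17, 81.98, 71.40, 82.84, 93.02, 82.97, 96.50, 75.80, 70.79,
        98.26, 96.67, 97.01, 75.49, 81.00, 69.13, 95.60, 37.91]
ant_miner = [79.17, 74.70, 73.12, 76.62, 87.41, 84.38, 92.13, 72.84, 70.36,
             97.06, 95.33, 92.08, 68.09, 77.69, 64.43, 83.85, 35.30]


def wilcoxon_z(x, y):
    """Return (W+, z) for paired samples, using the normal approximation."""
    diffs = sorted((a - b for a, b in zip(x, y)), key=abs)
    n = len(diffs)
    # Rank 1 goes to the smallest absolute difference (no ties assumed).
    w_plus = sum(rank for rank, d in enumerate(diffs, start=1) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, (w_plus - mean_w) / sd_w


w_plus, z = wilcoxon_z(gbap, ant_miner)
wins = sum(a > b for a, b in zip(gbap, ant_miner))
print(w_plus, round(z, 3), wins)
```

Under these assumptions the sum of positive ranks is 144 out of a possible 153, GBAP wins on 15 of the 17 data sets, and the z value rounds to 3.195, in agreement with the figure reported above.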
5 Conclusions and future work
In this paper we presented a novel automatic programming algorithm based on ACO
and restricted by a context-free grammar for mining classification rules from
diverse data sets. The proposal is supported by a two-sided heuristic function that
guides the search for valid solutions, and it allows the complexity of the mined
rules to be modified simply by varying the number of derivations allowed
for the grammar.
In this work we have compared the performance of GBAP against Ant-Miner on
17 publicly available data sets. The results obtained show that the former
is significantly more accurate than the latter. As future work we plan to apply other
problem-dependent heuristic measures. We will also explore adding new
functionality to deal with continuous attributes and adapting the current
algorithm to a multi-objective approach.
Acknowledgments. This work has been supported by the Regional Govern-
ment of Andalucia and Ministry of Science and Technology, projects TIC-3720 and
TIN2008-06681-C06-03.
References
1. J. Ávila, E. Gibaja, A. Zafra, and S. Ventura. A niching algorithm to learn discriminant
functions with multi-label patterns. In Intelligent Data Engineering and Automated Learning -
IDEAL 2009, pages 570–577. 2009.
2. W. Banzhaf, P. Nordin, R. E. Keller, and F. D. Francone. Genetic Programming - An
Introduction; On the Automatic Evolution of Computer Programs and its Applications.
Morgan Kaufmann, San Francisco, CA, USA, January 1998.
3. P. Clark and R. Boswell. Rule induction with CN2: Some recent improvements. In
EWSL-91, pages 151–163. Springer-Verlag, 1991.
4. M. Dorigo and C. Blum. Ant colony optimization theory: a survey. Theoretical Computer
Science, 344:243–278, 2005.
5. P. G. Espejo, S. Ventura, and F. Herrera. A survey on the application of
genetic programming to classification. IEEE Transactions on Systems, Man, and Cybernetics,
Part C, Article in press, 2008.
6. A. Geyer-Schulz. Fuzzy Rule-Based Expert Systems and Genetic Machine Learning,
volume 3 of Studies in Fuzziness. Physica-Verlag, Heidelberg, 1995.
7. S. B. Kotsiantis, I. D. Zaharakis, and P. E. Pintelas. Machine learning: a review of
classification and combining techniques. Artificial Intelligence Review, 26:159–190, 2006.
8. J. R. Koza. Genetic programming: on the programming of computers by means of natural
selection. The MIT Press, Cambridge, MA, 1992.
9. D. T. Larose. Discovering Knowledge in Data: An Introduction to Data Mining. Wiley, 2005.
10. R. Parpinelli, A. A. Freitas, and H. S. Lopes. Data mining with an ant colony optimization
algorithm. IEEE Transactions on Evolutionary Computation, 6:321–332, 2002.
11. O. Roux and C. Fonlupt. Ant programming: or how to use ants for automatic programming. In
M. Dorigo et al., editors, ANTS'2000, pages 121–129, 2000.
12. I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques.
Morgan Kaufmann, 2005.