Particle Swarm and Bayesian Networks Applied to
Attribute Selection for Protein Functional Classification
Elon S. Correa
Computing Laboratory and
Centre for BioMedical
University of Kent
Canterbury, CT2 7NF, UK
Alex A. Freitas
Computing Laboratory and
Centre for BioMedical
University of Kent
Canterbury, CT2 7NF, UK
Colin G. Johnson
Computing Laboratory and
Centre for BioMedical
University of Kent
Canterbury, CT2 7NF, UK
ABSTRACT
The Discrete Particle Swarm (DPSO) algorithm is an optimization
method that belongs to the fertile paradigm of Swarm Intelligence.
The DPSO was designed for the task of attribute selection and it
deals with discrete variables in a straightforward manner. This
work extends the DPSO algorithm in two ways. First, we enable
the DPSO to select attributes for a Bayesian network algorithm,
which is a much more sophisticated algorithm than the Naive Bayes
classifier previously used by this algorithm. Second, we apply the
DPSO to a challenging protein functional classification data set, in-
volving a large number of classes to be predicted. The performance
of the DPSO is compared to the performance of a Binary PSO on
the task of selecting attributes in this challenging data set. The cri-
teria used for comparison are: (1) maximizing predictive accuracy;
and (2) finding the smallest subset of attributes.
Categories and Subject Descriptors
I.2.6 [Computing Methodologies]: Artificial Intelligence—Learn-
ing, induction.
General Terms
Algorithms, performance.
Keywords
Particle swarm, Data Mining, attribute selection, Naive Bayes classifier,
Bayesian networks, bioinformatics.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
GECCO’07, July 7–11, 2007, London, United Kingdom.
Copyright 2007 ACM 978-1-59593-697-4/07/0007...$5.00.

Most of the particle swarm algorithms present in the literature
deal only with continuous variables [1, 9, 17]. This is a significant
limitation because many optimization problems are set in a
space featuring discrete variables. Typical examples include problems
which require the ordering or arranging of discrete variables,
such as scheduling or routing problems [24]. Therefore, the design
of particle swarm algorithms that deal with discrete variables
is pertinent to this field of study.
In [4] we proposed a discrete Particle Swarm Optimization (PSO)
algorithm for attribute selection in Data Mining. We will refer to
that algorithm as the Discrete Particle Swarm Optimization (DPSO)
algorithm. The DPSO deals with discrete variables, and its population
of candidate solutions contains particles of different sizes; it
forces each particle to keep a constant number of attributes across
iterations. The motivation and main innovation of the DPSO algorithm
is to interpret the concept of velocity, used in traditional
PSO, as “probability”: velocity is rendered as a proportional likelihood,
and this information is used to sample new particle positions. Though
the DPSO has been designed for an attribute selection task, it is
not limited to this kind of application. With few modifications,
the DPSO may potentially be applied to other discrete optimization
problems, such as facility location problems [5].
Many data mining applications involve the task of building a
model for predictive classification. The goal of such a model is to
classify examples (records or data instances) into classes or cate-
gories of the same type. Noise or unimportant variables (attributes)
may reduce the accuracy and reliability of a classification or pre-
diction model. Unnecessary variables (attributes) also increase the
costs of building and running a model – particularly on large data
sets. It is therefore important to select an appropriate subset of
“good” attributes before performing classification. Attribute selection
tries to simplify a data set by reducing its dimensionality and
identifying relevant underlying attributes without sacrificing predictive
accuracy. As a result, it reduces redundancy in the information
provided by the attributes effectively used for prediction. For
a more detailed review of the attribute selection task using genetic
algorithms see [7].
The DPSO algorithm was designed for the data mining task of
attribute selection. It differs from other traditional PSO algorithms
because its particles do not represent points inside an n-dimensional
Euclidean space (continuous case) or lattice (binary case) as in the
standard PSO algorithms [14]. Instead, they represent a combina-
tion of selected attributes. In previous work the DPSO was used to
select attributes for a Naive Bayes (NB) classifier. The NB classi-
fier was used to predict postsynaptic function in proteins.
This new study extends that previous work in two ways. First,
we enable the DPSO to select attributes for a Bayesian network al-
gorithm, which is much more sophisticated than the Naive Bayes
algorithm previously used. Second, we apply DPSO to a more chal-
lenging protein functional classification data set. This data set has
a much larger number of classes to be predicted than the previously
tested postsynaptic data set, which had just two classes to be predicted.
The paper is organized as follows. Section 2 briefly addresses
Bayesian networks and the Naive Bayes classifier. Section 3 briefly
discusses PSO algorithms. Section 4 describes the standard Binary
PSO algorithm and Section 5 the DPSO algorithm. Section 6 sum-
marizes G protein-coupled receptors (GPCRs). Section 7 reports
computational experiments. It also includes a brief discussion of
the results obtained. Section 8 presents conclusions and points out
future research directions. The following subsection presents nota-
tion used throughout this paper.
1.1 Notation
We denote a random variable by an uppercase letter, i.e., $X$, and
the state or value of this random variable by a similar lowercase
letter, i.e., $x$. An uppercase letter with an arrow over the letter, e.g.,
$\vec{X}$, denotes a vector of random variables.
$\vec{X} = (X_1, X_2, \ldots, X_n)$ denotes an n-dimensional vector of random
variables. Abusing the mathematical notation, we use
$\vec{X} = \{X_1, X_2, \ldots, X_n\}$ (note the braces “{}”) to represent a vector
of random variables which is also a set of indices.
$\vec{X} = \{X_1, X_2, \ldots, X_n\}$ is a set of indices in the mathematical
sense of set. That is, there are no duplicated indices and
there is no ordering among the indices $X_1, X_2, \ldots, X_n$. Given a
candidate solution, say $\vec{X}(i)$, the symbol $f(\vec{X}(i))$, called the fitness
function, represents a measurement of how well the solution
$\vec{X}(i)$ solves the target problem. Subsection 7.1 describes how the
measurement $f(\vec{X}(i))$ is computed in the present work.
The Naive Bayes classifier uses a probabilistic approach to as-
sign each example (record) of the data set to a possible class. In
our application, it assigns a record (protein) of the data set to one
of the possible classes. A Naive Bayes classifier assumes that all
attributes are conditionally independent of one another [18].
A Bayesian network, by contrast, detects probabilistic dependen-
cies among these attributes and uses this information to benefit the
attribute selection process.
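To make the independence assumption concrete, the following is a minimal sketch of a Naive Bayes classifier for binary attributes such as the presence/absence patterns used later in this paper. It is an illustration only, not the implementation used in the experiments: the function names, the tuple-based model and the Laplace smoothing are assumptions introduced here.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# Model = (class counts, per-class count of 1s for each attribute, number of records)
Model = Tuple[Dict[str, int], Dict[str, List[int]], int]


def train_naive_bayes(rows: Sequence[Sequence[int]], labels: Sequence[str]) -> Model:
    """Count, for every class, how often each binary attribute equals 1."""
    class_counts = Counter(labels)
    num_attributes = len(rows[0])
    ones = {label: [0] * num_attributes for label in class_counts}
    for row, label in zip(rows, labels):
        for d, value in enumerate(row):
            ones[label][d] += value
    return dict(class_counts), ones, len(labels)


def predict(model: Model, row: Sequence[int]) -> Optional[str]:
    """Return the class with the highest posterior, treating attributes as
    conditionally independent given the class."""
    class_counts, ones, total = model
    best_label, best_log_prob = None, -math.inf
    for label, count in class_counts.items():
        log_prob = math.log(count / total)  # class prior
        for d, value in enumerate(row):
            # Laplace smoothing is an assumption of this sketch.
            p_one = (ones[label][d] + 1) / (count + 2)
            log_prob += math.log(p_one if value == 1 else 1.0 - p_one)
        if log_prob > best_log_prob:
            best_label, best_log_prob = label, log_prob
    return best_label


rows = [(1, 0), (1, 0), (0, 1), (0, 1)]
labels = ["A", "A", "B", "B"]
model = train_naive_bayes(rows, labels)
print(predict(model, (1, 0)))  # A
```

Working in log space avoids numerical underflow when many attribute probabilities are multiplied together.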
A Bayesian network (BN) is a graphical representation of a prob-
ability distribution over a set of variables of a given problem do-
main [10, 20]. This graphical representation is a directed acyclic
graph in which nodes represent the variables of the problem and
arcs represent conditional probabilistic dependencies among the
nodes. The network structure encodes probabilistic dependencies
among domain variables and a joint probability distribution quan-
tifies the strength of these dependencies.
An example of a Bayesian network is as follows (see footnote 1). Suppose that
a doctor is treating a patient who has been suffering from shortness
of breath (called dyspnoea). The doctor knows that diseases such as
tuberculosis and bronchitis are possible causes for that, as well as
lung cancer. The doctor also knows that other relevant information
includes whether the patient is a smoker (increasing the chances of
cancer and bronchitis) and what sort of air pollution the patient has
been exposed to. A positive X-ray would indicate either tuberculo-
sis or lung cancer. The set of variables for this problem and their
possible values are shown in Table 1.
Figure 1 shows a Bayesian network representing this problem.
For applications of Bayesian networks on evolutionary algorithms
and optimization problems see [15, 21].
Footnote 1: This is a modified version of the so-called “Asia” problem [16],
given in §2.5.3.

Table 1: Bayesian network: nodes and values for the lung cancer
problem. L = low, H = high, T = true, F = false, Pos = positive
and Neg = negative.

Node name    Values
Pollution    {L, H}
Smoker       {T, F}
Cancer       {T, F}
Dyspnoea     {T, F}
X-ray        {Pos, Neg}
Figure 1: A Bayesian network for the lung cancer problem.
$\mathrm{Parents}(X_i)$ represents the set of nodes (attributes) that have a
directed edge pointing to $X_i$. More formally, consider a BN containing
$n$ nodes, $X_1$ to $X_n$, taken in that order. A particular value
$\vec{X} = \{X_1, X_2, \ldots, X_n\}$ in the joint probability distribution is
represented by:

$$p(\vec{X}) = p(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n),$$

or more compactly, $p(x_1, x_2, \ldots, x_n)$. The chain rule of probability
theory allows us to factorize joint probabilities, therefore:

$$p(\vec{X}) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_n \mid x_1, \ldots, x_{n-1}) = \prod_{i=1}^{n} p(x_i \mid x_1, \ldots, x_{i-1}). \quad (1)$$

As the structure of a BN implies that the value of a particular
node is conditional only on the values of its parent nodes, Equation
1 may be reduced to:

$$p(\vec{X}) = \prod_{i=1}^{n} p(x_i \mid \mathrm{Parents}(X_i)). \quad (2)$$
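As a worked illustration for the lung cancer problem of Table 1, suppose (this structure is an assumption made here for illustration, since the figure is not reproduced in this text) that Pollution and Smoker are the parents of Cancer, and that Cancer is the only parent of both X-ray and Dyspnoea. Restricting each factor of Equation 1 to the parents of its node would then give:

```latex
p(\text{Pollution}, \text{Smoker}, \text{Cancer}, \text{X-ray}, \text{Dyspnoea})
  = p(\text{Pollution})\; p(\text{Smoker})\;
    p(\text{Cancer} \mid \text{Pollution}, \text{Smoker})\;
    p(\text{X-ray} \mid \text{Cancer})\;
    p(\text{Dyspnoea} \mid \text{Cancer})
```

Under that assumed structure, and because all five variables are binary, the factorized form needs 1 + 1 + 4 + 2 + 2 = 10 free parameters, whereas the unrestricted joint distribution needs 2^5 - 1 = 31.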
Learning the structure of a BN is an NP-hard problem [2, 3].
Many algorithms developed to this end use a scoring metric and
a search procedure. The scoring metric evaluates the goodness-
of-fit of a structure to the data. The search procedure generates
alternative structures and selects the best one based on the scoring
metric. To reduce the search space of networks, only candidate
networks in which each node has at most $k$ inward arcs (parents)
are considered, where $k$ is a parameter determined by the user. In this
work we use $k = 20$ to avoid overly complex models.
To generate alternative structures for our BN we used a greedy
search algorithm. Starting with an empty network, the greedy search
algorithm adds into the network the edge that most increases the
score of the resulting network. The search stops when no other edge
addition improves the score of the network. Algorithm 1 shows the
pseudocode of our generic greedy search algorithm.
Algorithm 1 Pseudocode for a generic greedy search algorithm
Require: Initialize an empty Bayesian network G containing n
         nodes (i.e., a BN with n nodes but no edges)
Evaluate the score of G: Score(G)
G' = G
for i = 1 to n do
    for j = 1 to n do
        if i ≠ j then
            if there is no edge between the nodes i and j in G then
                Modify G' by adding an edge between the nodes i and j
                such that i is a parent of j: (i → j)
                if the resulting G' is a DAG then
                    if Score(G') > Score(G) then
                        G = G'
                    end if
                end if
            end if
        end if
        G' = G
    end for
end for
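The loop structure of Algorithm 1 can be sketched in Python as follows. This is a hedged illustration, not the authors' code: the edge-set representation, the helper `is_dag`, and the scoring function `toy_score` are assumptions introduced here, and the limit of k parents per node mentioned earlier is omitted for brevity. In the paper the score would be the predictive accuracy on a validation set.

```python
from typing import Callable, List, Set, Tuple

Edge = Tuple[int, int]  # (parent, child)


def is_dag(n: int, edges: Set[Edge]) -> bool:
    """Return True if the directed graph on nodes 0..n-1 has no cycle (Kahn's algorithm)."""
    indegree = [0] * n
    children: List[List[int]] = [[] for _ in range(n)]
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    queue = [node for node in range(n) if indegree[node] == 0]
    visited = 0
    while queue:
        node = queue.pop()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return visited == n  # every node was removed, so there is no cycle


def greedy_search(n: int, score: Callable[[Set[Edge]], float]) -> Set[Edge]:
    """Try each ordered pair (i, j) once; keep edge i -> j if it raises the score."""
    graph: Set[Edge] = set()
    best = score(graph)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Skip pairs that are already connected in either direction.
            if (i, j) in graph or (j, i) in graph:
                continue
            candidate = graph | {(i, j)}
            if is_dag(n, candidate):
                candidate_score = score(candidate)
                if candidate_score > best:
                    graph, best = candidate, candidate_score
    return graph


def toy_score(edges: Set[Edge]) -> float:
    """Hypothetical score: reward edges of a hidden target structure, penalize others."""
    target = {(0, 2), (1, 2), (2, 3)}
    return len(edges & target) - len(edges - target)


print(sorted(greedy_search(4, toy_score)))  # [(0, 2), (1, 2), (2, 3)]
```

Candidate edges that would close a cycle, such as 3 → 0 in the toy run, are rejected by the DAG check before they are scored.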
In this work we evaluate the “goodness-of-fit” (score) of a net-
work structure to the data using an unconventional scoring metric.
To evaluate the score of candidate networks we proceed as follows.
We divide the data set into 10 equally sized folds. For all class
levels each fold maintains roughly the same proportion of classes
present in the whole data set before division. This is called stratified
cross-validation. Eight of the ten folds are used to compute
the probabilities for the Bayesian network. The ninth fold is used
as validation set and the tenth fold as test set. During the search
for the network structure only the validation set is used to compute
predictive accuracy. The score of the candidate networks is given
by the predictive accuracy of the classification of the proteins in the
validation set. The network that shows the highest predictive accu-
racy on the validation set is then used to compute the predictive
accuracy on the test set. Once the network structure is selected, the
nine folds are merged and this merged data set is used to compute
the probabilities for the selected Bayesian network. The predictive
accuracy (reported as the final result) is then computed on the pre-
viously untouched test set fold. Every fold will be once used as
validation set and once used as test set. This process is discussed
again, in somewhat more detail, in Subsection 7.1 when the computation
of the fitness function is presented. A similar process is
adopted for the computation of the predictive accuracy using the
Naive Bayes classifier.
Particle Swarm Optimization (PSO) comprises a set of search
techniques, inspired by the behavior of natural swarms, for solving
optimization problems [14]. In PSO a potential solution to a
problem is represented by a particle,
$\vec{X}(i) = (X_{(i,1)}, X_{(i,2)}, \ldots, X_{(i,n)})$,
in an n-dimensional search space. The coordinates $X_{(i,d)}$ of these
particles have a rate of change (velocity) $v_{(i,d)}$, $d = 1, 2, \ldots, n$. Every
particle keeps a record of the best position that it has ever visited.
Such a record is called the particle’s previous best position and
denoted by $\vec{B}(i)$. The global best position attained by any particle
so far is also recorded and stored in a particle denoted by $\vec{G}$. An
iteration comprises evaluation of each particle, then stochastic
adjustment of $v_{(i,d)}$ in the direction of particle $\vec{X}(i)$’s previous best
position and the previous best position of any particle in the
neighborhood [13]. There is much variety in the neighborhood topology
used in PSO, but quite often gbest or lbest topologies are used. In
the gbest topology every particle has only the global best particle
$\vec{G}$ as its neighbor. In the lbest topology, usually, each particle has
a number of other particles to its right and left as neighbors. For
a review of the neighborhood topologies used in PSO the reader is
referred to [12, 14].
As a whole, the set of rules that govern PSO are: evaluate, com-
pare and imitate. The evaluation phase measures how well each
particle (candidate solution) solves the problem at hand. The com-
parison phase identifies the best particles. The imitation phase pro-
duces new particle positions based on some of the best particles
previously found. These three phases are repeated until a given
stopping criterion is met. The objective is to find the particle that
best solves the target problem.
Important concepts in PSO are velocity and neighborhood topology.
Each particle, $\vec{X}(i)$, is associated with a velocity vector. This
velocity vector is updated at every generation. The updated velocity
vector is then used to generate a new particle position $\vec{X}(i)$. The
neighborhood topology defines how other particles in the swarm,
such as $\vec{B}(i)$ and $\vec{G}$, interact with $\vec{X}(i)$ to modify its respective
velocity vector and, consequently, its position as well.
The standard binary version of the PSO algorithm [14] works as
follows. Potential solutions (particles) to the target problem are
encoded as fixed length binary strings; i.e.,
$\vec{X}(i) = (X_{(i,1)}, X_{(i,2)}, \ldots, X_{(i,n)})$,
where $X_{(i,j)} \in \{0, 1\}$, $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, n$. Given a list
of attributes $A = (A_1, A_2, \ldots, A_n)$, the first element of $\vec{X}(i)$, from the
left to the right hand side, corresponds to the first attribute “$A_1$”,
the second to the second attribute “$A_2$”, and so forth. A value of
0 on the site associated to an attribute indicates that the respective
attribute is not selected. A value of 1 means that it is selected.
4.1 The initial population for the standard
Binary PSO algorithm
For the initial population, $N$ binary strings of length $n$ are randomly
generated. Each particle $\vec{X}(i)$ is independently generated as
follows. For every position $X_{(i,d)}$ of $\vec{X}(i)$ a uniform random number
$\varphi$ is drawn on the interval (0, 1). If $\varphi < 0.5$, then $X_{(i,d)} = 1$,
otherwise $X_{(i,d)} = 0$. We then record this exact initial population
to be used as the initial population by the DPSO algorithm. This is
to try to make the comparison between both algorithms as fair as
possible.
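The initialization rule can be sketched as below. The function name and the use of a seeded `random.Random` instance are assumptions of this illustration; a fixed seed is one simple way to reuse exactly the same initial population for a second algorithm.

```python
import random
from typing import List


def init_binary_swarm(num_particles: int, num_attributes: int,
                      rng: random.Random) -> List[List[int]]:
    """Generate N random binary strings of length n, one coordinate at a time."""
    swarm = []
    for _ in range(num_particles):
        # A coordinate is 1 when the uniform draw is below 0.5, otherwise 0.
        particle = [1 if rng.random() < 0.5 else 0 for _ in range(num_attributes)]
        swarm.append(particle)
    return swarm


# Same seed, same swarm: both algorithms can start from identical populations.
swarm_a = init_binary_swarm(4, 6, random.Random(42))
swarm_b = init_binary_swarm(4, 6, random.Random(42))
print(swarm_a == swarm_b)  # True
```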
4.2 Updating the records
At the beginning, the previous best position of $\vec{X}(i)$, denoted by
$\vec{B}(i)$, is empty. Therefore, once the initial particle $\vec{X}(i)$ is
generated, $\vec{B}(i)$ is set to $\vec{X}(i)$. After that, every time that $\vec{X}(i)$
is updated, $\vec{B}(i)$ is also updated if $f(\vec{X}(i))$ is better than $f(\vec{B}(i))$.
Otherwise, $\vec{B}(i)$ remains as it is. A similar process is used to
update the global best position $\vec{G}$. At the beginning, $\vec{G}$ is also empty.
Therefore, once all the $\vec{B}(i)$ have been determined, $\vec{G}$ is set to the
fittest $\vec{B}(i)$ previously computed. After that, $\vec{G}$ is updated if the
fittest $f(\vec{B}(i))$ in the swarm is better than $f(\vec{G})$. And, in that case,
$f(\vec{G})$ is set to $f(\vec{G}) = \text{fittest } f(\vec{B}(i))$. Otherwise, $\vec{G}$ remains as
it is.
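The record-keeping rule can be sketched as follows. This is an illustration under stated assumptions: "better" is taken to mean a strictly higher fitness value (consistent with fitness being a predictive accuracy), an empty record is represented by `None`, and the function and variable names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Particle = Tuple[int, ...]


def update_records(
    swarm: Sequence[Particle],
    personal_best: Sequence[Optional[Particle]],
    global_best: Optional[Particle],
    fitness: Callable[[Particle], float],
) -> Tuple[List[Particle], Particle]:
    """Update every particle's previous best, then the global best."""
    new_personal: List[Particle] = []
    for particle, best in zip(swarm, personal_best):
        # An empty record is filled; otherwise it changes only on improvement.
        if best is None or fitness(particle) > fitness(best):
            best = particle
        new_personal.append(best)
    fittest = max(new_personal, key=fitness)
    if global_best is None or fitness(fittest) > fitness(global_best):
        global_best = fittest
    return new_personal, global_best


def count_ones(particle: Particle) -> float:
    """Hypothetical fitness used only for this demonstration."""
    return float(sum(particle))


bests, gbest = update_records([(1, 0, 0), (1, 1, 0)], [None, None], None, count_ones)
bests, gbest = update_records([(1, 1, 1), (0, 0, 0)], bests, gbest, count_ones)
print(bests, gbest)  # [(1, 1, 1), (1, 1, 0)] (1, 1, 1)
```

A practical implementation would cache fitness values rather than recompute them, since each evaluation here stands for a full classifier run.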
4.3 Updating the velocities for the standard
Binary PSO algorithm
Every particle $\vec{X}(i)$ is associated to a unique vector of velocities
$V(i) = (v_{(i,1)}, v_{(i,2)}, \ldots, v_{(i,n)})$. The elements $v_{(i,d)}$ in $V(i)$ determine
the rate of change of each respective coordinate $X_{(i,d)}$ in $\vec{X}(i)$,
$d = 1, 2, \ldots, n$. Each element $v_{(i,d)} \in V(i)$ is updated according to the
following equation:

$$v_{(i,d)} = w\, v_{(i,d)} + \varphi_1 (b_{(i,d)} - X_{(i,d)}) + \varphi_2 (g_{(d)} - X_{(i,d)}), \quad (3)$$

where $w$ ($0 < w < 1$), called the inertia weight, is a constant value
chosen by the user. Equation 3 is a standard equation used in PSO
algorithms to update the velocities [11, 22]. Note that $X_{(i,d)}$ is the
dth component of $\vec{X}(i)$; $b_{(i,d)}$ is the dth component of $\vec{B}(i)$; $g_{(d)}$ is the
dth component of $\vec{G}$ and $d = 1, 2, \ldots, n$. The factors $\varphi_1$ and $\varphi_2$ are
uniform random numbers independently generated in the interval
(0, 1).
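A minimal sketch of this velocity update is given below. The text does not say whether the two random factors are redrawn for every coordinate or once per particle; this sketch assumes a fresh pair per coordinate, and the function name is hypothetical.

```python
import random
from typing import List, Sequence


def update_velocity(
    velocity: Sequence[float],
    position: Sequence[int],
    personal_best: Sequence[int],
    global_best: Sequence[int],
    inertia: float,
    rng: random.Random,
) -> List[float]:
    """Apply v = w*v + phi1*(b - x) + phi2*(g - x) to every coordinate."""
    new_velocity = []
    for v, x, b, g in zip(velocity, position, personal_best, global_best):
        phi1 = rng.random()  # assumption: redrawn for each coordinate
        phi2 = rng.random()
        new_velocity.append(inertia * v + phi1 * (b - x) + phi2 * (g - x))
    return new_velocity


rng = random.Random(0)
# When the particle already agrees with both records, only inertia acts.
print(update_velocity([0.5, -0.2], [1, 0], [1, 0], [1, 0], 0.8, rng))
```

With w = 0.8, a coordinate that matches both records simply has its velocity scaled by 0.8, while a coordinate at 0 whose records are both 1 is pushed upward by at most 2.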
4.4 Sampling new particle positions for the
standard Binary PSO algorithm
New particle positions are sampled as follows. For each particle
$\vec{X}(i)$ and each dimension $d$, the value of the new coordinate
$X_{(i,d)} \in \vec{X}(i)$ can be either 0 or 1. The decision of whether $X_{(i,d)}$ will be 0 or
1 is based on its respective velocity $v_{(i,d)} \in V(i)$ and is given by the
following equation:

$$X_{(i,d)} = \begin{cases} 1, & \text{if } rand < S(v_{(i,d)}) \\ 0, & \text{otherwise;} \end{cases} \quad (4)$$

where $0 \le rand \le 1$ is a uniform random number and

$$S(v_{(i,d)}) = \frac{1}{1 + e^{-v_{(i,d)}}}$$

is the sigmoid function. Equation 4 is a standard equation used to
sample new particle positions in the Binary PSO algorithm [14].
Note that the lower the value of $v_{(i,d)}$ the more likely the value of
$X_{(i,d)}$ will be 0. By contrast, the higher the value of $v_{(i,d)}$ the more
likely the value of $X_{(i,d)}$ will be 1. The next section presents the
DPSO algorithm.
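The sampling rule can be sketched as follows, using the usual logistic form of the sigmoid, 1 / (1 + exp(-v)). The function names are assumptions of this illustration.

```python
import math
import random
from typing import List, Sequence


def sigmoid(v: float) -> float:
    """Logistic function mapping a velocity to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def sample_position(velocity: Sequence[float], rng: random.Random) -> List[int]:
    """Set each coordinate to 1 with probability sigmoid(velocity), else 0."""
    return [1 if rng.random() < sigmoid(v) else 0 for v in velocity]


rng = random.Random(0)
print(sigmoid(0.0))  # 0.5
print(sample_position([50.0, 50.0, 50.0], rng))  # [1, 1, 1]
```

A velocity of 0 gives an even chance of selecting the attribute, while large positive velocities make selection practically certain. For very large negative velocities `math.exp` can overflow, so a production version would clamp the velocity first.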
This algorithm deals with discrete variables (attributes) and its
population of candidate solutions contains particles of different sizes.
Potential solutions to the optimization problem at hand are
represented by a swarm of particles. There are $N$ particles in a swarm.
The length of each particle may vary from 1 to $n$, where $n$ is the
number of attributes of the problem. Each particle $\vec{X}(i)$ keeps a
record of the best position it has ever attained. This information
is stored in a separate particle labeled $\vec{B}(i)$. The swarm also
keeps a record of the global best position ever attained by any
particle in the swarm. This information is also stored in a separate
particle labeled $\vec{G}$. Note that $\vec{G}$ is equal to the best $\vec{B}(i)$ present in
the swarm.
5.1 Encoding of the particles for the DPSO
Each attribute is identified by a unique positive integer number,
or index. These numbers, indices, vary from 1 to n. A particle is a
subset of non-ordered indices without repetition, e.g.,
$\vec{X}(i) = \{2, 4, 18, 1\}$.
5.2 The initial population for the DPSO
The initial population of solutions used by the DPSO is always
identical to the initial population used by the Binary PSO. They differ
only in the way in which solutions are represented. We translate
every candidate solution in the initial population of the Binary PSO
into the DPSO population in the following way: the index of
every attribute that has value 1 is copied to the new solution
(particle) of the DPSO initial population. For instance, a solution equal
to (1, 0, 1, 1, 0) is translated into {1, 3, 4}.
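The translation between the two encodings is a one-line operation. The sketch below (function names are hypothetical) reproduces the example from the text and also shows the inverse mapping.

```python
from typing import List, Sequence, Set


def binary_to_index_set(bits: Sequence[int]) -> Set[int]:
    """Collect the 1-based indices of the attributes whose bit is 1."""
    return {index for index, bit in enumerate(bits, start=1) if bit == 1}


def index_set_to_binary(indices: Set[int], num_attributes: int) -> List[int]:
    """Inverse mapping: rebuild the fixed-length binary string."""
    return [1 if index in indices else 0 for index in range(1, num_attributes + 1)]


print(binary_to_index_set((1, 0, 1, 1, 0)))  # {1, 3, 4}
```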
5.3 Velocities = proportional likelihoods
The DPSO algorithm does not use a vector of velocities as the
standard PSO algorithm does. It works with proportional likelihoods
instead. Arguably, the notion of proportional likelihood used
in the DPSO algorithm and the notion of velocity used in the standard
PSO are somewhat similar. We use $\dot{V}(i)$ to represent an array
of proportional likelihoods and $\dot{v}_{(i,d)}$ to represent one of its
components. Every particle is associated with a 2-by-$n$ array of
proportional likelihoods, where 2 is the number of rows in this array and $n$
is the number of columns. A generic proportional likelihood array
looks like this:

$$\dot{V}(i) = \begin{pmatrix} \text{proportional likelihood row} \\ \text{attribute index row} \end{pmatrix}.$$

Each of the $n$ elements in the first row of $\dot{V}(i)$ represents the
proportional likelihood that an attribute be selected. The second row of
$\dot{V}(i)$ shows the indices of the attributes associated with the respective
proportional likelihoods. There is a one-to-one correspondence
between the columns of this array and the attributes of the problem
domain. At the beginning, all elements in the first row of $\dot{V}(i)$ are
set to 1, for example:

$$\dot{V}(i) = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix}.$$

After the initial population of particles is generated, this array is
always updated before a new configuration for the particle associated
to it is generated. The updating process is based on $\vec{X}(i)$, $\vec{B}(i)$ and
$\vec{G}$ and works as follows. In addition to $\vec{X}(i)$, $\vec{B}(i)$ and $\vec{G}$, three
constant updating factors, namely, $\alpha$, $\beta$ and $\gamma$, are used to update the
proportional likelihoods $\dot{v}_{(i,d)}$. These factors determine the strength
of the contribution of $\vec{X}(i)$, $\vec{B}(i)$ and $\vec{G}$ to the adjustment of every
coordinate $\dot{v}_{(i,d)} \in \dot{V}(i)$. Note that $\alpha$, $\beta$ and $\gamma$ are parameters chosen
by the user. The contribution of these parameters to the updating of
$\dot{v}_{(i,d)}$ is as follows. All indices present in $\vec{X}(i)$ have their
correspondent proportional likelihood increased by $\alpha$. In addition to that, all
indices present in $\vec{B}(i)$ have their correspondent proportional
likelihood increased by $\beta$. The same for $\vec{G}$, for which the proportional
likelihoods are increased by $\gamma$. For instance, given $n = 5$, $\alpha = 0.10$,
$\beta = 0.12$, $\gamma = 0.14$, $\vec{X}(i) = \{2, 3, 4\}$, $\vec{B}(i) = \{3, 5, 2\}$, $\vec{G} = \{5, 2\}$
and also:

$$\dot{V}(i) = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix},$$

the updated $\dot{V}(i)$ would be:

$$\dot{V}(i) = \begin{pmatrix} 1 & 1 + \alpha + \beta + \gamma & 1 + \alpha + \beta & 1 + \alpha & 1 + \beta + \gamma \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix}.$$

Note that index 1 is not present in $\vec{X}(i)$, $\vec{B}(i)$ or $\vec{G}$. Therefore, the
proportional likelihood of attribute 1 in $\dot{V}(i)$ remains as it is. This
new updated array replaces the old one and will be used to generate
a new configuration to the particle associated to it as follows.
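The update of the proportional likelihoods can be sketched as below. The two-row array is represented here as a dictionary from attribute index to likelihood, which is an implementation choice of this illustration rather than something stated in the text; the function name is also hypothetical. The example uses the values from the text.

```python
from typing import Dict, Set


def update_likelihoods(
    likelihoods: Dict[int, float],
    current: Set[int],
    personal_best: Set[int],
    global_best: Set[int],
    alpha: float,
    beta: float,
    gamma: float,
) -> Dict[int, float]:
    """Add alpha, beta and gamma to the likelihood of every index found in the
    current position, the previous best and the global best, respectively."""
    updated = dict(likelihoods)  # leave the caller's array untouched
    for index in current:
        updated[index] += alpha
    for index in personal_best:
        updated[index] += beta
    for index in global_best:
        updated[index] += gamma
    return updated


initial = {index: 1.0 for index in range(1, 6)}
updated = update_likelihoods(initial, {2, 3, 4}, {3, 5, 2}, {5, 2}, 0.10, 0.12, 0.14)
print({index: round(value, 2) for index, value in updated.items()})
# {1: 1.0, 2: 1.36, 3: 1.22, 4: 1.1, 5: 1.26}
```

Attribute 2 appears in all three particles and therefore receives the largest increase, while attribute 1 appears in none and keeps its initial value.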
5.4 Sampling new particle positions for the
DPSO algorithm
The proportional likelihood array $\dot{V}(i)$ is then used to sample a
new instance of particle $\vec{X}(i)$, that is, the particle associated to it.
First, every element of the first row of the array $\dot{V}(i)$ is multiplied by
a uniform random number between 0 and 1. A new random number
is drawn for every single multiplication performed. To illustrate,
suppose that

$$\dot{V}(i) = \begin{pmatrix} 1 & 1.36 & 1.22 & 1.1 & 1.26 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix}.$$

The multiplied proportional likelihood array would be:

$$\dot{V}(i) = \begin{pmatrix} 1 \times \varphi_1 & 1.36 \times \varphi_2 & 1.22 \times \varphi_3 & 1.1 \times \varphi_4 & 1.26 \times \varphi_5 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix},$$

where $\varphi_1, \ldots, \varphi_5$ are uniform random numbers independently drawn
on the interval (0, 1). Suppose that the multiplied array $\dot{V}(i)$ looks
like this:

$$\dot{V}(i) = \begin{pmatrix} 0.11 & 0.86 & 0.57 & 0.62 & 1.09 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix}.$$

The new particle position is then defined by ranking the columns in
$\dot{V}(i)$ by the values in its first row. That is, the elements in the first
row of the array are ranked in a decreasing order of value and the
indices of the attributes (in the second row of $\dot{V}(i)$) follow their
respective proportional likelihoods. For example, ranking the array:

$$\dot{V}(i) = \begin{pmatrix} 0.11 & 0.86 & 0.57 & 0.62 & 1.09 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix},$$

we would obtain

$$\dot{V}(i) = \begin{pmatrix} 1.09 & 0.86 & 0.62 & 0.57 & 0.11 \\ 5 & 2 & 4 & 3 & 1 \end{pmatrix}.$$

After ranking the array $\dot{V}(i)$, the first $k$ indices (in the second row
of $\dot{V}(i)$), from left to right, are selected to compose the new particle
position. The constant $k$ represents the length of the particle $\vec{X}(i)$,
the particle associated to the ranked array $\dot{V}(i)$. Thus, if particle
$\vec{X}(i)$, a particle associated to the multiplied and sorted array:

$$\dot{V}(i) = \begin{pmatrix} 1.09 & 0.86 & 0.62 & 0.57 & 0.11 \\ 5 & 2 & 4 & 3 & 1 \end{pmatrix},$$

has length 3, the first 3 indices from the second row of $\dot{V}(i)$ would
be selected to compose the new particle position. Based on the
array $\dot{V}(i)$ given above, if $k = 3$ (that is, $\vec{X}(i) = \{*, *, *\}$) the indices
(attributes) 5, 2 and 4 would be selected to compose the new
particle position, i.e., $\vec{X}(i) = \{5, 2, 4\}$. Note that indices that have a
higher proportional likelihood are, on average, more likely to be
selected.

The updating of $\vec{B}(i)$ and $\vec{G}$ is identical to what is described
in Subsection 4.2.
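The multiply, rank and select steps can be sketched as follows. As in the previous illustration, the array is held as a dictionary from attribute index to likelihood, and the function names are hypothetical. The ranking step is separated out so that it can be checked against the worked example in the text.

```python
import random
from typing import Dict, List, Set


def rank_and_select(multiplied: Dict[int, float], size: int) -> List[int]:
    """Rank indices by decreasing multiplied likelihood and keep the first `size`."""
    ranked = sorted(multiplied, key=multiplied.get, reverse=True)
    return ranked[:size]


def sample_dpso_position(likelihoods: Dict[int, float], size: int,
                         rng: random.Random) -> Set[int]:
    """Multiply each likelihood by a fresh uniform random number, then select."""
    multiplied = {index: value * rng.random() for index, value in likelihoods.items()}
    return set(rank_and_select(multiplied, size))


# Multiplied array from the worked example; the particle has length 3.
example = {1: 0.11, 2: 0.86, 3: 0.57, 4: 0.62, 5: 1.09}
print(rank_and_select(example, 3))  # [5, 2, 4]
```

Because the particle size is passed in and never changed, each particle keeps the same number of attributes across iterations, as required by the DPSO.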
G protein-coupled receptors (GPCRs) are a protein family of
transmembrane receptors. Their function is to transduce signals
that induce a cellular response to the environment. GPCRs are the
largest protein family known and they are involved in all types of
stimulus-response pathways, from intercellular communication to
physiological senses. GPCRs are of much interest to the pharma-
ceutical industry for these proteins are involved in many pathologi-
cal conditions, which led to GPCRs being the target of 40% to 50%
of modern medicinal drugs [6].
In this work we use the GPCR-PROSITE data set of proteins
previously used in [8]. The data set contains 190 proteins. The pro-
teins are represented by a set of 127 PROSITE patterns. PROSITE
is a database of protein families and domains. It is based on the
observation that, while there is a huge number of different proteins,
most of them can be grouped, on the basis of similarities in their
sequences, into a limited number of families (a protein consists
of a sequence of amino acids). PROSITE patterns are small re-
gions within a protein that present a high sequence similarity when
compared to other proteins. In our data set the absence of a given
PROSITE pattern is indicated by a value of 0 for the attribute corre-
sponding to that PROSITE pattern. The presence of it is indicated
by a value of 1 for that same attribute. The proteins in this data
set are grouped into families and subfamilies in a hierarchical fashion.
There are three levels of hierarchy. The first level has 8 classes
(families); the second and third levels each have 32 classes (subfamilies).
Some proteins are classified only up to the second hierarchical
level and have no class at the third level. The objective
of our algorithms is to classify each protein into its most suitable
family in each level. In this work the classification of the proteins
is performed for each class level individually. For instance, given
a protein X, a conventional “flat” classification algorithm assigns X’s
class at the first class level only. Once protein X has been classified
at the first class level, the conventional flat classification algorithm
is again applied to assign a class to protein X at the second level;
no information about X’s class at the previous level is used. The
same process is used to assign a class to protein X at the third class
level.
In this section, we report and discuss computational experiments.
The quality of a candidate solution (fitness) is evaluated in three
different ways: (1) by a baseline algorithm (using all possible
attributes); (2) by the Binary PSO; and (3) by the Discrete PSO
(DPSO). Each of these algorithms computes the fitness of every
given solution using two distinct techniques: (a) using a Naive
Bayes classifier; and (b) using a Bayesian network. For the Binary
PSO and DPSO 30 independent runs are performed for each single
fold. The results obtained, averaged over 30 runs, are reported in
Table 2.
7.1 Experimental Methodology
The fitness function $f(\vec{X}(i))$ of any particle $\vec{X}(i)$ is computed as
follows. $f(\vec{X}(i))$ is equal to the predictive accuracy achieved by the
Naive Bayes classifier (and the Bayesian network) on the GPCR-PROSITE
data set and using only the attributes present in $\vec{X}(i)$. The
objective is to find the smallest subset of attributes (PROSITE
patterns) with which it is possible to classify the proteins on the data
set as belonging to one of the classes (for each class level) with
an acceptable accuracy. We define the accuracy as acceptable if
it is equal to or better than the accuracy obtained by the classification
performed considering all the 127 original attributes. Note that
this is a naive and particular definition of acceptable accuracy. We
chose this definition because it suits the purpose of our experiments,
namely to compare the performance of the standard Binary PSO and the
DPSO algorithms in the GPCR-PROSITE data set. As a rule, the
definition of acceptable accuracy is problem dependent and should
take into account prior knowledge of the target problem, when
available. In fact, in many real-world applications, minimizing the
number of selected attributes while maximizing classification
accuracy are conflicting tasks.

Table 2: Results for the GPCR-PROSITE data set.

               Average predictive accuracy (%)                 Average number of attributes selected
Class level    All attributes   Binary PSO     DPSO            Binary PSO      DPSO

Naive Bayes
1              71.27±2.08       72.88±2.40     *73.05±2.31     85.60±2.84      *74.90±3.48
2              30.00±2.10       31.34±2.47     *32.60±2.31     101.50±3.14     *83.80±4.64
3              20.47±0.96       21.47±1.16     *23.25±1.08     102.30±3.77     *87.50±4.25

Bayesian network
1              78.05±2.33       79.03±2.57     *80.54±2.46     78.50±3.50      *65.50±3.41
2              39.08±2.67       40.31±2.85     *43.24±4.67     94.10±3.70      *73.30±2.67
3              24.70±1.83       26.14±2.11     *28.97±2.77     94.90±3.90      *77.60±4.35

The best result on each line for each performance criterion is marked with an asterisk (*).
The measurement of $f(\vec{X}(i))$ in this paper follows what in Data
Mining is called a wrapper approach. The wrapper approach searches
for an optimal attribute subset tailored to a particular algorithm,
such as the Naive Bayes classifier or Bayesian network. For more
information on wrapper and other attribute selection approaches see
The computational experiments involved a 10-fold cross-validation
method [25]. First, the 190 records in the GPCR-PROSITE data
set were divided into 10 equally sized folds. The folds were ran-
domly generated but under the following criterion. The proportion
of classes in every single fold must be similar to the one found in
the original data set containing all the 190 records. This is known
as stratified cross-validation. Each of the 10 folds is used once as
test set and the remainder of the data set is used as training set. Out
of the 9 folds in the training set, one is reserved to be used as a val-
idation set. The Naive Bayes classifier and the Bayesian network
use the remaining 8 folds to compute the probabilities required to
classify new examples. Once those probabilities have been com-
puted, the Naive Bayes classifier (NB) and the Bayesian network
(BN) classify the examples in the validation set. The accuracy of
this classification on the validation set is the value of the fitness
functions $f_{NB}(\vec{X}(i))$ and $f_{BN}(\vec{X}(i))$. After the run of the PSO
algorithm is completed, the 9 folds are merged into a full training
set. The Naive Bayes classifier and the Bayesian network are then
trained again on this full training set (9 merged folds), and the prob-
abilities computed in this final, full training set are used to classify
examples in the test set (the 10th fold), which was never accessed
during the run of the algorithms. In each of the 10 iterations of the
cross-validation procedure, the predictive accuracy of the classification
is assessed by 3 different methods:
(1) Using all the 127 original attributes: all possible attributes
are used by the Naive Bayes classifier and the Bayesian network.
(2) Standard Binary PSO algorithm: only the attributes selected
by the best particle found by the Binary PSO algorithm
are used by the Naive Bayes classifier and the Bayesian network.
(3) DPSO algorithm: only the attributes selected by the best
particle found by the DPSO algorithm are used by the Naive
Bayes classifier and the Bayesian network.
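The cross-validation procedure described above can be sketched as follows. This is a minimal illustration in Python, not the implementation used in the experiments: the function names `stratified_folds` and `nested_splits`, the round-robin fold assignment, the rule for choosing which training fold acts as the validation set, and the toy class labels are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign record indices to k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal the records of each class round-robin across the folds.
        for idx in members:
            folds[position % k].append(idx)
            position += 1
    return folds


def nested_splits(folds):
    """Yield (build, validation, full_training, test) for each CV iteration."""
    k = len(folds)
    for t in range(k):
        test = folds[t]
        v = (t + 1) % k  # assumption: the next fold serves as validation set
        validation = folds[v]
        # The remaining 8 folds are used to estimate the probabilities.
        build = [i for f in range(k) if f not in (t, v) for i in folds[f]]
        # After the search, build and validation folds are merged again.
        full_training = build + validation
        yield build, validation, full_training, test


# Hypothetical class labels for 190 records (not the GPCR-PROSITE classes).
labels = ["A"] * 100 + ["B"] * 60 + ["C"] * 30
folds = stratified_folds(labels, k=10, seed=42)
splits = list(nested_splits(folds))
build, validation, full_training, test = splits[0]
print(len(build), len(validation), len(full_training), len(test))
```

In this sketch the fitness of a particle would be computed on `validation` after fitting the classifier on `build`, while `test` is touched only once, after the search has finished and the classifier has been refitted on `full_training`.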
Since the Naive Bayes and Bayesian network classifiers that we
used are deterministic, only one run (for each of these algorithms)
is performed for the classification using all the 127 attributes. For
the Binary PSO and the DPSO algorithms, 30 independent runs are performed
for each fold. Results reported are averaged over these 30
independent runs. The population size used for both algorithms
(Binary PSO and DPSO) is 200 and the search stops after 20,000
fitness evaluations (or 100 iterations). The Binary PSO algorithm
uses an inertia weight value of 0.8 (i.e., w = 0.8). The choice of the
value of this parameter was based on the work presented in [23].
Other choices of parameter values for the DPSO were α = 0.10, β = 0.12
and γ = 0.14. These values were empirically determined
in our preliminary experiments, but we make no claim that these
are optimal values. Parameter optimization is a topic for future research.
The measurement of the predictive accuracy rate of a model
should be a reliable estimate of how well that model classifies the
test examples (unseen during the training phase) on the target prob-
lem. In Data Mining, typically, the equation:
$$\text{Standard accuracy rate} = \frac{TP + TN}{TP + FP + FN + TN} \qquad (5)$$
is used to assess the accuracy rate of a classifier (where TP, TN,
FP and FN are the numbers of true positives, true negatives, false
positives and false negatives, respectively [25]). Nevertheless, if the
class distribution is highly unbalanced, Equation 5 is an ineffective
way of measuring the accuracy rate of a model. For instance, in
many problems it is easy to maximize Equation 5 by simply always
predicting the majority class. Therefore, in our experiments
we use a more demanding measurement for the accuracy rate of a
classification model, which has also been used before in [19]. This
measurement is given by the equation:
$$\text{Predictive accuracy rate} = TPR \cdot TNR, \qquad (6)$$
where $TPR = \frac{TP}{TP + FN}$ and $TNR = \frac{TN}{TN + FP}$.
Note that if either of the quantities TPR or TNR is zero, the value
returned by Equation 6 is also zero.
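The contrast between Equation 5 and Equation 6 can be illustrated with a short Python sketch. The function names and the confusion-matrix counts below are hypothetical and chosen only for illustration; returning zero when a class has no examples is also an assumption of this sketch, not something stated in the paper.

```python
def standard_accuracy(tp, tn, fp, fn):
    """Equation 5: fraction of all examples classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def predictive_accuracy(tp, tn, fp, fn):
    """Equation 6: product of true positive rate and true negative rate."""
    # Assumption: a rate with an empty denominator is treated as zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return tpr * tnr


# Hypothetical unbalanced problem: 10 positive and 180 negative examples,
# and a classifier that always predicts the majority (negative) class.
tp, fn = 0, 10
tn, fp = 180, 0
majority_standard = standard_accuracy(tp, tn, fp, fn)
majority_predictive = predictive_accuracy(tp, tn, fp, fn)
print(majority_standard)    # 180/190, roughly 0.947
print(majority_predictive)  # 0.0
```

The majority-class predictor scores well under Equation 5 but receives zero under Equation 6, because its true positive rate is zero.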
7.2 Discussion
Results are reported in Table 2. First, we discuss the results ob-
tained by the three algorithms using the Naive Bayes classifier. To
assess the performance of the algorithms we consider two criteria:
(1) maximizing predictive accuracy; and (2) finding the smallest
subset of attributes. Comparing the first criterion, accuracy, we
note that both versions of the PSO algorithm did better (in all class
levels) than the baseline algorithm using all attributes. Further-
more, the DPSO algorithm did slightly better than the Binary PSO
algorithm in all class levels. Nevertheless, the difference in the predictive
accuracy performance between these algorithms is, in some
cases, not statistically significant. Table 3 shows the results of a
paired two-tailed t-test for the predictive accuracy of the Binary
PSO versus the predictive accuracy of the DPSO (at a significance
level of 0.05).
Table 3: Binary PSO vs. DPSO (ACCURACY): paired two-tailed t-test for the predictive accuracy (significance level 0.05).
LEVEL   Naive Bayes                 Bayesian network
1       t(9) = 0.467, p = 0.651     t(9) = 3.407, p = 0.007
2       t(9) = 2.221, p = 0.053     t(9) = 3.200, p = 0.010
3       t(9) = 3.307, p = 0.009     t(9) = 3.556, p = 0.006
According to Table 3, using Naive Bayes as the classifier, the only
statistically significant difference in performance (in terms of predictive
accuracy) between the algorithms (Binary PSO and DPSO)
is at the third class level. By contrast, using Bayesian networks as the
classifier, the difference in performance is statistically significant at
all class levels.
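The significance decisions above can be reproduced with a small Python sketch of a paired t-test. The function names and the per-fold accuracy lists are hypothetical; the only values taken from the paper are the t statistics of Table 3. The sketch compares |t| with 2.262, the standard two-tailed critical value of Student's t distribution for 9 degrees of freedom at a significance level of 0.05, instead of computing p-values.

```python
import math
from statistics import mean, stdev

# Two-tailed critical value of Student's t, 9 degrees of freedom, alpha = 0.05.
T_CRITICAL_DF9 = 2.262


def paired_t_statistic(a, b):
    """t statistic of a paired t-test on two equally long samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))


def is_significant(t, critical=T_CRITICAL_DF9):
    """True if |t| exceeds the two-tailed critical value."""
    return abs(t) > critical


# t statistics reported in Table 3, indexed by class level.
table3_naive_bayes = {1: 0.467, 2: 2.221, 3: 3.307}
table3_bayesian_network = {1: 3.407, 2: 3.200, 3: 3.556}
nb_decisions = {lvl: is_significant(t) for lvl, t in table3_naive_bayes.items()}
bn_decisions = {lvl: is_significant(t) for lvl, t in table3_bayesian_network.items()}
print(nb_decisions)  # {1: False, 2: False, 3: True}
print(bn_decisions)  # {1: True, 2: True, 3: True}

# Hypothetical per-fold accuracies for two algorithms (10 folds each).
dpso_acc = [0.74, 0.71, 0.78, 0.69, 0.75, 0.73, 0.77, 0.70, 0.76, 0.72]
bpso_acc = [0.72, 0.70, 0.75, 0.69, 0.71, 0.72, 0.74, 0.70, 0.73, 0.70]
t_value = paired_t_statistic(dpso_acc, bpso_acc)
print(is_significant(t_value))
```

With 10 folds the paired test has 9 degrees of freedom, which is why the t statistics in Table 3 are written as t(9).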
However, the discriminating factor between the performance of
these algorithms is on the second comparison criterion – finding the
smallest subset of attributes. The DPSO not only outperformed the
Binary PSO in predictive accuracy, but also did so using a smaller
subset of attributes in all class levels. Moreover, when it comes
to effectively pruning the set of attributes, the difference in performance
between the Binary PSO and the DPSO is always statistically
significant, as Table 4 shows.
Table 4: Binary PSO vs. DPSO (ATTRIBUTES): paired two-tailed t-test for the number of attributes selected (significance level 0.05).
LEVEL   Naive Bayes                 Bayesian network
1       t(9) = 7.248, p = 4.8E-5    t(9) = 8.2770, p = 1.6E-5
2       t(9) = 9.052, p = 8.1E-6    t(9) = 14.890, p = 1.2E-7
3       t(9) = 6.887, p = 7.1E-5    t(9) = 9.1730, p = 7.3E-6
Second, we discuss the results obtained using the Bayesian net-
work algorithm as a classifier. Again, the predictive accuracy at-
tained by both versions of the PSO algorithm surpassed the predic-
tive accuracy obtained by the baseline algorithm in all class levels.
DPSO obtained the best predictive accuracy of all algorithms in all
three class levels. In terms of the second comparison criterion, find-
ing the smallest subset of attributes, again DPSO always selected
the smallest subset of attributes in all hierarchical levels.
Comparing the performance of the classifiers (Naive Bayes vs.
Bayesian networks), we note that Bayesian networks did a much
better job. For all three class levels the predictive accuracy ob-
tained by the algorithms (baseline, Binary PSO and DPSO) using
Bayesian networks was significantly better than the predictive accuracy
obtained using the Naive Bayes classifier. Bayesian networks
also enabled the two PSO algorithms to do the job using
fewer selected attributes.
The results emphasize the importance of taking correlations among
attributes into account when doing attribute selection. When these
correlations are ignored, predictive accuracy is adversely affected.
Computational results show that the use of unimportant attributes
tends to derail classifiers and hurt classification accuracy. Using
fewer attributes, the Binary PSO and the DPSO algorithms obtained
better predictive accuracy (in 100% of the cases) than the classification
performed using all possible attributes. Previous work had already
shown that the DPSO algorithm performs better than the Binary
PSO in the task of attribute selection [4]. Even if the improvement
in predictive accuracy is not significant, by selecting fewer
attributes the DPSO certainly enhances the computational efficiency of
the classifier.
The original work, however, questioned whether the difference
in performance between these two algorithms was attributable to
variations in the initial population of solutions. To remove this
possible advantage or disadvantage for one algorithm or the other, the
present work used the same initialization for both algorithms. Computational
results show that, even using the same initial conditions,
the DPSO still outperforms the Binary PSO in both predictive
accuracy and number of selected attributes. The DPSO is arguably
not too different from the traditional PSO, but the algorithm still has
some features that enable it to improve over the Binary PSO.
Another interesting result from the experiments is the clear difference
in performance between Naive Bayes and Bayesian networks
used as classifiers. Bayesian networks outperformed the Naive
Bayes classifier in all experiments and in all hierarchical class levels.
The hierarchical classification performed in this work was a flat
classification. The algorithms did not use the information of the
class assigned to an example (protein) in one level to help the prediction
of the class at the next hierarchical level. In future work
we intend to develop an algorithm that takes advantage of this information.
Thanks to Nick Holden for kindly providing us with the bio-
logical data sets used in this work. The authors would also like to
thank EPSRC (grant Extended Particle Swarms GR/T11265/01) for
financial support.
[1] T. Blackwell and J. Branke. Multi-swarm optimization in
dynamic environments. In Lecture Notes in Computer
Science, volume 3005, pages 489–500. Springer-Verlag,
[2] R. R. Bouckaert. Properties of Bayesian belief network
learning algorithms. In R. L. de Mantaras and D. Poole,
editors, Proceedings of the 10th Conference on Uncertainty
in Artificial Intelligence, pages 102–109, Seattle, WA, USA,
1994. Morgan Kaufmann.
[3] D. M. Chickering, D. Geiger, and D. Heckerman. Learning
Bayesian networks is NP-hard. Technical Report
MSR-TR-94-17, Microsoft Research, November 1994.
[4] E. S. Correa, A. A. Freitas, and C. G. Johnson. A new
discrete particle swarm algorithm applied to attribute
selection in a bioinformatics data set. In M. K. et al., editor,
Proceedings of the Genetic and Evolutionary Computation
Conference - GECCO-2006, pages 35–42, Seattle, WA,
USA, July 2006. ACM Press.
[5] E. S. Correa, M. T. Steiner, A. A. Freitas, and C. Carnieri.
Using a genetic algorithm for solving a capacity p-median
problem. Numerical Algorithms, 35:373–388, 2004.
[6] D. Filmore. It’s a GPCR world. Modern drug discovery,
11(7):24–28, November 2004.
[7] A. A. Freitas. Data Mining and Knowledge Discovery with
Evolutionary Algorithms. Springer-Verlag, October 2002.
[8] N. Holden and A. A. Freitas. Hierarchical classification of
G-protein-coupled receptors with a PSO/ACO algorithm. In
Proc. IEEE Swarm Intelligence Symposium (SIS-06), pages
77–84. IEEE Press, June 2006.
[9] S. Janson and M. Middendorf. A hierarchical particle swarm
optimizer for dynamic optimization problems. In
Evoworkshops 2004: 1st European Workshop on
Evolutionary Algorithms in Stochastic and Dynamic
Environments, pages 513–524, Coimbra, Portugal, 2004.
[10] F. V. Jensen. Bayesian networks and decision graphs.
Springer-Verlag, 1st edition, July 2001.
[11] G. Kendall and Y. Su. A particle swarm optimisation
approach in the construction of optimal risky portfolios. In
Proceedings of the 23rd IASTED International
Multi-Conference on Applied Informatics, pages 140–145,
2005. Artificial intelligence and applications.
[12] J. Kennedy. Small worlds and mega-minds: effects of
neighborhood topology on particle swarm performance. In
P. J. Angeline, Z. Michalewicz, M. Schoenauer, X. Yao, and
A. Zalzala, editors, Proceedings of the Congress of
Evolutionary Computation, pages 1931–1938, Piscataway,
NJ, USA, 1999. IEEE Press.
[13] J. Kennedy and R. C. Eberhart. A discrete binary version of
the particle swarm algorithm. In Proceedings of the 1997
Conference on Systems, Man, and Cybernetics, pages
4104–4109, Piscataway, NJ, USA, 1997. IEEE.
[14] J. Kennedy and R. C. Eberhart. Swarm Intelligence. Morgan
Kaufmann Publishers Inc., San Francisco, CA, USA, 2001.
[15] P. Larrañaga, R. Etxeberria, J. A. Lozano, B. Sierra, I. Inza,
and J. M. Peña. A review of the cooperation between
evolutionary computation and probabilistic models. In
Second Symposium on Artificial Intelligence - CIMAF-1999,
pages 314–324, Havana, Cuba, March 1999. Special Session
on Distributions and Evolutionary Computation.
[16] S. L. Lauritzen and D. J. Spiegelhalter. Local computations
with probabilities on graphical structures and their
application to expert systems. Journal of the Royal Statistical
Society, Series B, 50(2):157–224, 1988.
[17] M. Løvbjerg and T. Krink. Extending particle swarm
optimisers with self-organized criticality. In D. B. Fogel,
M. A. El-Sharkawi, X. Yao, G. Greenwood, H. Iba,
P. Marrow, and M. Shackleton, editors, Proceedings of the
2002 Congress on Evolutionary Computation CEC2002,
pages 1588–1593. IEEE Press, 2002.
[18] T. M. Mitchell. Machine Learning. McGraw-Hill, August
[19] G. L. Pappa, A. J. Baines, and A. A. Freitas. Predicting
post-synaptic activity in proteins with data mining.
Bioinformatics, 21(2):ii19–ii25, 2005.
[20] J. Pearl. Probabilistic reasoning in intelligent systems:
networks of plausible inference. Morgan Kaufmann, 1st
edition, September 1988.
[21] J. M. Peña, J. A. Lozano, and P. Larrañaga. Globally
multimodal problem optimization via an estimation of
distribution algorithm based on unsupervised learning of
bayesian networks. In Evolutionary Computation,
volume 13, pages 43–66. MIT Press, January 2005.
[22] R. Poli, C. D. Chio, and W. B. Langdon. Exploring extended
particle swarms: a genetic programming approach. In
GECCO’05: Proceedings of the 2005 Conference on Genetic
and Evolutionary Computation, pages 169–176, New York,
NY, USA, 2005. ACM Press.
[23] Y. Shi and R. C. Eberhart. Parameter selection in particle
swarm optimization. In EP’98: Proceedings of the 7th
International Conference on Evolutionary Programming,
pages 591–600, London, UK, 1998. Springer-Verlag.
[24] M. M. Solomon. Algorithms for the vehicle routing and
scheduling problems with time window constraints.
Operations Research, 35(2):254–265, 1987.
[25] I. H. Witten and E. Frank. Data Mining: Practical Machine
Learning Tools and Techniques. Morgan Kaufmann, 2nd
edition, 2005.