The Reactive Tabu Search
ROBERTO BATTITI
Dipartimento di Matematica and Istituto Nazionale di Fisica Nucleare, gruppo collegato di Trento, Università di Trento, 38050 Povo (Trento), Italy
EMAIL: battiti@itnvax.science.unitn.it

GIAMPIETRO TECCHIOLLI
Istituto Nazionale di Fisica Nucleare, gruppo collegato di Trento and Istituto per la Ricerca Scientifica e Tecnologica, 38050 Povo (Trento), Italy
EMAIL: tec@irst.it
We propose an algorithm for combinatorial optimization where an explicit check for the repetition of configurations is added to the basic scheme of Tabu search. In our Tabu scheme the appropriate size of the list is learned in an automated way by reacting to the occurrence of cycles. In addition, if the search appears to be repeating an excessive number of solutions excessively often, then the search is diversified by making a number of random moves proportional to a moving average of the cycle length. The reactive scheme is compared to a "strict" Tabu scheme, which forbids the repetition of configurations, and to schemes with a fixed or randomly varying list size. From the implementation point of view, we show that the hashing or digital tree techniques can be used in order to search for repetitions in a time that is approximately constant. We present the results obtained for a series of computational tests on a benchmark function, on the 0-1 Knapsack Problem, and on the Quadratic Assignment Problem.
Preprint Dip. di Matematica Univ. di Trento, October 1992
To appear in:
ORSA Journal on Computing
Vol. 6, N. 2 (1994), pp. 126-140.
The tabu search meta-strategy has been shown to be an effective and efficient scheme for combinatorial optimization that combines a hill-climbing search strategy, based on a set of elementary moves, with a heuristic to avoid the stops at suboptimal points and the occurrence of cycles (see [5], [6], [7]). This goal is obtained by using a finite-size list of forbidden moves (the tabu moves) derived from the recent history of the search. The basic underlying assumption is that the suboptimal points (where the simple hill-climbing component stops) can be better starting points with respect to random restarts, provided that care is taken so that the local maxima (or minima) do not become attractors of the dynamics induced by the algorithm and that limit cycles do not arise (we borrow the terminology from the theory of dynamical systems [13]).
Some tabu search implementations are based on the fact that cycles are avoided if the repetition of previously visited configurations is prohibited. For example, in the Reverse Elimination Method [7], the only local movements that are excluded from consideration (i.e., that become tabu) are those that would lead to previously visited solutions. REM is a method to realize what may be called Strict Tabu (S-TABU for short).
We argue that S-TABU can converge very slowly for problems where the suboptimal configuration is surrounded by large "basins of attraction", i.e., by large sets of points that converge to it with hill-climbing. This slow convergence is related to the "basin-filling" effect that is illustrated in Section 2. In addition, the optimal point can become unreachable because of the creation of barriers consisting of the already-visited points. When S-TABU can be used, one can avoid the relatively slow REM technique (which at iteration n requires a computation of order O(n)) by using the hashing or digital tree approaches (which require a constant amount of computing per iteration).
The tabu scheme based on a fixed list size (F-TABU) is not strict and therefore the possibility of cycles remains. The proper choice of the size (long enough to avoid cycles but short enough not to constrain the search too much) is critical to the success of the algorithm, although for many interesting problems the results do not depend too much on its value (see [8] and the bibliography contained therein). More robust schemes are based on a randomly varying list size [15], although one must prescribe suitable limits for its variation.
Our Reactive Tabu scheme (R-TABU for short) goes further in the direction of robustness by proposing a simple mechanism for adapting the list size to the properties of the optimization problem. The configurations visited during the search and the corresponding iteration numbers are stored in memory so that, after the last movement is chosen, one can check for the repetition of configurations and calculate the interval between two visits. The basic fast "reaction" mechanism increases the list size when configurations are repeated. This is accompanied by a slower reduction mechanism, so that the size is reduced in regions of the search space that do not need large sizes.
An additional Long Term Memory diversification mechanism is enforced when there is evidence that the system is in a complex attractor of the search space (the analogy is that of chaotic attractors; see Section 1). The LTM "escape" or "diversification" mechanism can be realized with a negligible effort by exploiting the memory structure described.
If a problem requires an excessive memory space to store the entire configurations, one may resort to compression techniques (the use of hashing for compression in [17] is an example). If a total of m configurations are visited, the theoretical minimum on the number of bits needed to distinguish among them is log2 m bits per configuration. Reaching the information-theoretic minimum may require complex coding techniques, but with a small increase in memory size even simple compression techniques are effective and well within the typical memory limitations of current workstations.
In the following sections, first we motivate and describe the Reactive Tabu scheme (Section 1), then we analyze the behavior of the algorithms in the case study of a function of two variables (Section 2) and compare the computation and memory requirements (Section 3). Finally, we apply the Reactive Tabu search to the quadratic assignment problem and discuss the results (Section 4).
1 The Reactive Tabu Scheme
Let us begin with an analogy between the evolution of the search process in combinatorial optimization and the theory of dynamical systems (see, for example, [13] and [12]). The current configuration traces a path in the configuration space, subject to the movements dictated by the search technique. Let us suppose that we are looking for the global minimum (trivially, maximizing f corresponds to minimizing -f). Local minima are attractors of the system dynamics for the steepest descent strategy. They are in fact fixed points until a scheme is introduced that forces the system to exit from the local minimum and continue the search. Limit cycles (or closed orbits) are a second possibility, where the trajectory endlessly repeats a sequence of states. Cycles are discouraged by the tabu technique and they are in fact strictly prohibited in the S-TABU version. But there is a third possibility that is very relevant for the case of optimization: the case in which fixed points and limit cycles are absent but the trajectory is confined in a limited portion of the search space. In the theory of dynamical systems this phenomenon is described by introducing the concept of chaotic attractors. Because in this paper the concept of chaotic attractor is used only as an example of a dynamic behavior that could affect the search process, we summarize the main characteristics and refer to [13] for a detailed theoretical analysis. Chaotic attractors are characterized by a "contraction of the areas", so that trajectories starting with different initial conditions will be compressed in a limited area of the configuration space, and by a "sensitive dependence upon the initial conditions", so that different trajectories will diverge.
For an analytical characterization of this sensitive dependence, it is convenient to introduce the concept of Lyapunov exponent. Let us consider the function g that maps the point at step n to the point at step n+1; g^k(x) is defined as the map obtained by iterating g k times. Starting from close initial conditions x_0 and x_0 + \epsilon, the Lyapunov exponent \lambda is defined through the relationship:

\| g^n(x_0 + \epsilon) - g^n(x_0) \| \simeq \epsilon \, e^{\lambda n}

If the exponent \lambda is greater than 0, the initially close points diverge exponentially and, if the trajectories remain confined in a limited region of the space, one obtains the situation called "deterministic chaos". The motivation for this term is that the trajectory appears to be "random" although the system is deterministic. In the above case, although limit cycles are absent, the search trajectory will visit only a limited part of the search space. If this part does not contain the absolute minimum (or the desired configuration), it will never be found.
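As a concrete illustration of this definition, the following Python sketch (our addition, not part of the original discussion) estimates \lambda for the logistic map x -> 4x(1-x), a standard chaotic system used here only as a stand-in for the search dynamics g:

    import math

    def logistic(x):
        # Chaotic logistic map x -> 4x(1-x), a stand-in for the dynamics g.
        return 4.0 * x * (1.0 - x)

    def lyapunov_estimate(x0, eps=1e-9, n=25):
        # Follow two trajectories from x0 and x0 + eps for n steps and fit
        # ||g^n(x0+eps) - g^n(x0)|| ~ eps * exp(lambda * n).
        a, b = x0, x0 + eps
        for _ in range(n):
            a, b = logistic(a), logistic(b)
        return math.log(abs(b - a) / eps) / n

    print(lyapunov_estimate(0.3))   # close to ln 2 ~ 0.69, the known exponent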
The motion caused by the tabu search technique is very complex, so that a detailed analytical study of the associated discrete dynamical system is problematic (for example, most of the results on the discrete dynamical systems of "cellular automata", which have a simpler structure with respect to tabu, are based on numerical simulations; see [12] for a brief overview of the subject). Nonetheless, the main suggestion to be derived is that avoiding limit cycles or even avoiding repetitions of configurations is not sufficient for an effective and efficient search technique. The chaotic-like attractors should be discouraged. In some computational tests we will show evidence of a trapping of the solution trajectory in a suboptimal region of the search space (see Section 2). Similar ideas are also present in [17] and [8] ("cycle avoidance is not an ultimate goal of the search process ... the broader objective is to continue to stimulate the discovery of new high quality solutions").
The reactive tabu scheme maintains the basic building blocks of tabu search, i.e., the use of a set of temporarily-forbidden moves, where the time interval for the prohibition is regulated by the tabu list-size. What we add is a fully automated way of adapting the size to the problem and to the current evolution of the search, and an escape strategy for diversifying the search when the first mechanism is not sufficient. Both ideas can be seen as ways to implement the learn-while-searching paradigm that is characteristic of the tabu approach.

The algorithm is summarized in words and described in detail using a pseudo-language derived from the Pascal language. To make the description more concise, variable declarations and trivial parts of the code have been omitted or described in words.
For concreteness, we will present the algorithm for an application to the Quadratic Assignment Problem (see [14] for one of the first applications of Tabu to the QAP). The space of configurations is given by the possible assignments of N units to N locations (φ[loc] is the unit that occupies location loc). The details about the QAP application will be described in Section 4. Understanding the following Section does not require a detailed knowledge of the QAP problem.
1.1 Basic Tabu Tools
The main tabu structures are common to various tabu implementations for the QAP (see for example [15]). An exchange movement is tabu if it places both units in locations that they had occupied within the latest list_size iterations. The aspiration criterion is satisfied if the function value reached after the move is better than the best previously found. The basic functions for the above operations are illustrated in Figure 1.
1.2 Memory Structures
Before explaining our variation of the tabu scheme, let us briefly illustrate the meaning of the variables and memory structures used. An elementary exchange move is indexed by the variables r_chosen and s_chosen, the two locations that will be subjected to the exchange. All visited points in the configuration space are saved in records that contain the placement of units in locations (φ), the most recent time when it was encountered (last_time) and its multiplicity (repetitions). When the number of repetitions for a given point is greater than Rep (3 in our runs), the configuration is added to the set of often-repeated ones.
The constants Increase and Decrease determine the amounts by which the list_size is increased in the fast reaction mechanism, or decreased in the long-term size-reduction process. The variables moving_average (a moving average of the detected cycle length) and steps_since_last_size_change (the number of iterations executed after the last change of list_size) are used for the long-term size reduction; the variable chaotic counts the number of often-repeated placements. A diversifying escape movement is executed when chaotic is greater than Chaos (a constant equal to 3 in our runs). The status of the search is described by the record current: current.φ is the placement (current.φ[loc] is the unit contained in location loc); current.f is the corresponding function value and current.time is the number of steps executed. Each step consists of neighborhood evaluation and move selection. A similar record best_so_far stores the best placement found during the search.

The target value sub_optimum and the maximum number of iterations max_iterations are used for terminating the search.
1.3 Skeleton of R-TABU
Before proceeding with the reactive tabu search, the data records for hashing and tabu are initialized and a random starting configuration is generated.
BASIC TABU FUNCTIONS (FOR THE QAP PROBLEM)

procedure make_tabu(r, s)
comment: Record the latest occupation for the two units that are going to be
    exchanged. The array latest_occupation[φ, r] contains the latest
    occupation time for unit φ in location r.
begin
    latest_occupation[current.φ[r], r] := current.time
    latest_occupation[current.φ[s], s] := current.time
end

function aspiration(r, s)
comment: Return the boolean value true if the function value after the
    movement is better than the best value ever found, false otherwise.
begin
    if (current.fitness - move_value[r, s]) < best_so_far.fitness
        then aspiration := true
        else aspiration := false
end

function is_tabu(r, s)
comment: Return the boolean value true if, after the exchange, both units r
    and s occupy locations that they had already occupied within the latest
    list_size iterations, the boolean value false otherwise.
begin
    if latest_occupation[current.φ[r], s] ≥ current.time - list_size
       and latest_occupation[current.φ[s], r] ≥ current.time - list_size
        then is_tabu := true
        else is_tabu := false
end

Figure 1: Basic tabu functions for the QAP problem.
Then the search routine cycles through the following steps:
i) all possible elementary moves from the current configuration are evaluated;
ii) the latest configuration is searched for in the memory structure (with a possible update of the list_size, see Section 1.1) and a decision is taken about a diversifying escape move. In the "default" case, i.e., with no escape:
iii-D) (Default) the best admissible move is executed, with a possible reduction of the list_size if all movements are tabu and none satisfies the aspiration criterion, and the current status and time are updated.
In the other case, i.e., with escape:
iii-E) (Escape) the system enters a phase of random movements whose duration is regulated by a moving average of detected cycles.
The initialization and main loop of R-TABU are illustrated in Figure 2, while the details on the reaction, escape and move selection mechanisms will be illustrated in the following Sections.
1.4 The Reaction and Escape Mechanism
When a repetition of a previously encountered configuration occurs, there are two possible reaction mechanisms. The basic "immediate reaction" increases the list_size to discourage additional repetitions (list_size <- list_size * Increase). After a number R of immediate reactions, the geometric increase (proportional to Increase^R) is sufficient to break any limit cycle. In this case a continuous sequence of repetitions rapidly increases the size until the trajectory is forced to explore new regions, but this mechanism may not be sufficient to avoid the "chaotic trapping" of the trajectory in a limited area of the configuration space. To this end a second and slower mechanism counts the number of configurations that are repeated many times (more than Rep times). When this number is greater than a threshold Chaos, the check_for_repetitions function returns and a diversifying escape movement is enforced.
The reaction is caused by the local properties of the solution trajectory but, if the list_size could only increase, it would be excessive in the later phases of the search because it would constrain the search more than necessary. Therefore a slow process reduces the size if a number of iterations greater than moving_average has passed since the last size change. The function that checks for the repetitions is shown in Figure 3.
In addition to the immediate increase and slow reduction mechanisms, there is a third point at which the list_size is modified. This is the case when the list grows so much that all movements become tabu (and none satisfies the aspiration criterion). When this happens the size is reduced. Because of the geometric decrease, after a small number of reductions at least some movements will lose their tabu status.
Our escape strategy is based on the execution of a series of random exchanges. Their number is random and proportional to the moving_average, the rationale being that longer average cycles are evidence of a larger basin, and therefore more escape steps are likely to be required. To avoid an immediate return into the old region of the search space, all random steps executed are made tabu. The choice of the best move with the reaction and the escape strategies is illustrated in Figure 4.
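As a compact illustration of the two size-adaptation rules described above, here is a Python sketch (ours; the dictionary layout and the function name are illustrative, while the constants follow the text):

    INCREASE, DECREASE, CYCLE_MAX = 1.1, 0.9, 50

    def react(state, cycle_length=None):
        # state holds 'list_size', 'moving_average', 'steps_since_change';
        # cycle_length is the detected repetition interval, or None.
        state['steps_since_change'] += 1
        if cycle_length is not None and cycle_length < CYCLE_MAX:
            # Fast reaction: a repetition was detected, enlarge the
            # prohibition and update the moving average of cycle lengths.
            state['moving_average'] = 0.1 * cycle_length + 0.9 * state['moving_average']
            state['list_size'] *= INCREASE
            state['steps_since_change'] = 0
        elif state['steps_since_change'] > state['moving_average']:
            # Slow reduction: no repetitions for a while, relax the prohibition.
            state['list_size'] = max(state['list_size'] * DECREASE, 1.0)
            state['steps_since_change'] = 0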
The dierent dynamics for the list size and the escap e mechanism are illustrated in Figure 5,
for an application to a Quadratic Assignment Problem of size
N
= 30.
In Figure 5 (top) we show the evolution of
list size
and the percent of repetitions of pre-
viously visited congurations. Data points are taken every 100 iterations. Note how frequent
SKELETON OF REACTIVE TABU

procedure initialization
begin
    Initialize the data structures for hashing.
    Initialize the data structures for tabu: set the latest occupation time
    equal to a large negative value:
    for unit := 0 to N-1 do
        for location := 0 to N-1 do
            latest_occupation[unit, location] := -Infinity
    list_size := 1
    chaotic := 0
    moving_average := 0
    steps_since_last_size_change := 0
    current.time := 0
    Generate a starting configuration in a random way by setting current.φ[]
    equal to a random permutation of 0, 1, 2, ..., N-1;
    Set current.f equal to the initial function value.
    Initialize the best_so_far record containing the best solution ever found:
    best_so_far.f := current.f
end

function reactive_tabu_search(max_iterations)
comment: Cycle until the best configuration is found or
    the maximum number of iterations is reached.
begin
    while current.time < max_iterations do
    begin
        Find the decrease in function value for all possible elementary moves.
        escape := check_for_repetitions(current.φ)
        if escape = Do_Not_Escape then
        begin
            choose_best_move
            comment: when the above procedure returns, r_chosen and s_chosen
                contain the two units to be exchanged
            make_tabu(r_chosen, s_chosen)
            Swap the units contained in the locations r_chosen, s_chosen.
            Update time, function value and best_so_far.
        end
        else
            escape
        if best_so_far.f ≤ sub_optimum then
        begin
            reactive_tabu_search := Successful
            Return from function.
        end
    end
    reactive_tabu_search := Unsuccessful
    Return from function.
end

Figure 2: Skeleton of the reactive tabu algorithm.
REACTION AND ESCAPE MECHANISM

function check_for_repetitions(φ)
comment: The function takes a current placement φ as argument and returns
    Escape when an escape action is to be executed, Do_Not_Escape otherwise.
    Cycle_Max is a constant equal to 50 in our runs; the other constants and
    variables are described in the text.
begin
    steps_since_last_size_change := steps_since_last_size_change + 1
    Search for the current configuration in the hashing structure.
    Set pointer := the location of the record if it is found.
    if the configuration is found then
    begin
        Find the cycle length, update last_time and repetitions:
        length := current.time - pointer^.last_time
        pointer^.last_time := current.time
        pointer^.repetitions := pointer^.repetitions + 1
        if pointer^.repetitions > Rep then
        begin
            Add the current placement to the set of often-repeated ones:
            chaotic := chaotic + 1
            if chaotic > Chaos then
            begin
                Reset the counter and execute escape after returning:
                chaotic := 0
                check_for_repetitions := Escape
                Return from function.
            end
        end
        if length < Cycle_Max then
        begin
            moving_average := 0.1 * length + 0.9 * moving_average
            list_size := list_size * Increase
            steps_since_last_size_change := 0
        end
    end
    else
        If the configuration is not found, install it.
    if steps_since_last_size_change > moving_average then
    begin
        list_size := Max(list_size * Decrease, 1)
        steps_since_last_size_change := 0
    end
    Do not escape in the 'default' case:
    check_for_repetitions := Do_Not_Escape
end

Figure 3: Reaction and escape mechanism.
MOVE SELECTION AND ESCAPE FUNCTION

procedure choose_best_move
begin
    if a move that is not tabu or that satisfies the aspiration criterion is found
        then set (r_chosen, s_chosen) := two units to be exchanged
    else
    begin
        If all moves are tabu and none satisfies the aspiration requirement,
        find the best of all moves, independently of their tabu status.
        Decrease list_size to decrease the number of tabu moves:
        list_size := list_size * Decrease
        set (r_chosen, s_chosen) := two units to be exchanged
    end
end

procedure escape
begin
    Clean the hashing memory structure.
    Generate a random number of steps in the given range
    (rand returns a random number in [0, 1)):
    steps := 1 + (1 + rand) * moving_average / 2
    for i := 1 to steps do
    begin
        set (r_chosen, s_chosen) := random exchange of two units
        make_tabu(r_chosen, s_chosen)
        Update current and best_so_far.
        Find the decrease in function value for all possible elementary moves.
    end
end

Figure 4: Move selection and escape function.
Figure 5: Dynamics of the size of the tabu list (top: list_size and percent_repetitions vs. iterations, up to 5000 iterations) and evolution of the repetition counter (bottom: list_size and chaotic_counter vs. iterations, up to 50000 iterations).
Note how frequent repetitions provoke a fast increase of list_size, while the absence of repetitions provokes a gradual decrease. In Figure 5 (bottom) we show the list_size evolution over a larger span of time (up to 50K iterations). Frequent spikes are superimposed on a plateau of about size 8. The bottom curve shows the counter chaotic. Each time the value of Chaos is surpassed, the counter is reset to zero and an escape move is executed. Note that the escapes are automatically triggered by the evolution of the search process. In this case they are executed on a much larger time scale with respect to the frequent spikes of reaction.
1.5 Details on REM, Hashing and Digital Tree
According to Glover and Laguna [8], a fundamental element of tabu search is the use of flexible memory, which embodies the creation and exploitation of structures for taking advantage of history. In this section we present some competitive structures and algorithms for storing and retrieving the information about the history of the search process in a fast way. Items to store are, for example, the configurations, the corresponding function values and the iteration number when they were encountered. The various schemes differ in their time-space complexity and in the amount of data stored per iteration.

In the classical Reverse Elimination Method (REM) proposed in [7], the visited points are not stored explicitly, but they can be derived by applying a sequence of moves that reverses the moves applied during the search. In fact, one does not need to find the previous points if one is interested only in knowing which moves will lead from the current configuration to a previously visited one, i.e., the moves that will acquire a tabu status.
REM is based on a running list containing all the moves executed from the initial configuration. According to the sufficiency property stated in [7], let us assume that all moves m_i are such that a sequence m_{j_1} \circ m_{j_2} \circ \cdots \circ m_{j_k} is the identity move only if it consists of couples of the kind m_i \circ m_i^{-1}, possibly in separated positions (\circ is the composition operator). In addition, we assume that each move m_i has a unique inverse m_i^{-1} and that moves commute. This is trivially true for set-clear moves acting on single bits of a binary string, but not for the elementary exchanges of a permutation problem like the QAP.

Before each iteration, the residual cancellation sequence (RCS) is constructed for all previous points, starting from the most recent ones. The RCS is the shortest sequence of moves leading to a previously encountered configuration (all couples m_i \circ m_i^{-1} are canceled using the commutative property). A move m_i is tabu if its execution would repeat an old configuration, i.e., if during the backward tracing the RCS collapses to only m_i^{-1}. For additional details and modifications see [7] and [3].
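As an illustration of the backward tracing, here is a minimal Python sketch (ours) for the special case of set-clear bit moves, where every move is its own inverse and moves commute:

    def tabu_moves_by_rem(running_list):
        # running_list: flipped bit indices, oldest move first.
        rcs = set()    # residual cancellation sequence
        tabu = set()
        for move in reversed(running_list):
            # A repeated index cancels out: flipping a bit twice is the identity.
            rcs.symmetric_difference_update({move})
            if len(rcs) == 1:
                # A single residual move would reproduce an old configuration.
                tabu.add(next(iter(rcs)))
        return tabu

    print(tabu_moves_by_rem([3, 5, 3, 7]))   # {7}: only the immediate reversal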
Let n be the number of iterations executed. Because the computational cost of each trace is proportional to the length of the running list (i.e., to the number of iterations), the total cost is proportional to n^2. An example is presented in Figure 6 for an application to the 0-1 knapsack problem labeled Weingartner8, defined in [18] and used in [16], with 105 binary variables and 2 constraints. The basic moves used in this case are the set-clear operations on individual bits. A total of 16 runs with different random initial points has been executed. The data points are derived from actual measurements of the CPU time on a current Sun Sparc 2 Workstation. An interpolation of the points corresponding to a number of iterations greater than 100 gives the following result:

CPU\ time\ (sec) = 6.71 \times 10^{-5} \, n^{1.91}

The errors on the interpolation are 2% on the multiplicative constant and 0.1% on the exponent. The slight deviation from the quadratic form is caused by the influence of points with low iteration numbers: the quadratic term dominates only in the asymptotic limit. The code has not been optimized (the C++ programming language was used), so the multiplicative constant is not to be considered the smallest obtainable.
Figure 6: REM time complexity in logarithmic scale (diamonds) and interpolating curve (dashed line): CPU time (secs) vs. iteration.
The asymptotic behavior is unchanged by the modifications proposed for reducing the number of tracing steps (see the auxiliary memory structure Least_used in Section 1.3 of [7]). It is obviously changed if the backward tracing is stopped after a maximum number of steps.
The hashing technique is standard in computer science (see for example [1]). The basic idea of hashing is that of storing entries in "buckets" whose index is obtained in a scattered way from the entry itself (by using the hashing function with the element as argument). The search time is approximately constant and equal to a very small number of machine cycles if the number of buckets is so large that, with high probability, different entries end up in different buckets. In this last case, only a comparison with a single stored item is sufficient. One way of dealing with "collisions" (entries with the same bucket) is that of associating to each bucket a list of entries, which is enlarged when new elements arrive. The version that we describe (open hashing) is based on an array of "bucket table headers" that contain the pointer to the first entry in the associated list (bucket[hash_val]^.φ contains the first stored placement in the list, bucket[hash_val]^.last_time the last time when it was encountered, bucket[hash_val]^.repetitions the number of repetitions). Elements in the list are chained with the pointer next, which gives the address of the next element; nil pointers are used for list termination. The number of buckets has to be large in order to make collisions a rare event. A rule of thumb is to make it about two times (or larger than) the maximum number of stored entries, assuming that the hashing function scatters the entries in an almost uniform manner. Our use of the hashing technique is illustrated in Figure 7.
The digital tree [11] method stores binary strings (for example) using a binary tree structure where the decision about choosing the left or right child of a node at depth d depends on the value of the d-th bit of the string. The storing time is proportional to the total number of bits (therefore it is constant when the number of stored items grows). The same is true for the worst-case retrieval time, i.e., when the item is found. If it is not found, the search is terminated before reaching the deepest layer, as soon as the first nil pointer is encountered. In Figure 8 we show the memory configuration for the storage of the strings (101111) and (100110).
We tested the digital tree technique on the 0-1 knapsack problem labeled Petersen7 (see [16]), with 50 variables, 5 constraints and the same moves as those used for Weingartner8.
Figure 7: Memory configurations for the open hashing scheme: a bucket array whose entries point to chained records, each holding a placement φ, its last_time and its repetitions.
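A minimal Python sketch of the open-hashing lookup described above (ours; the bucket count and the use of Python's built-in hash are illustrative choices):

    NBUCKETS = 1 << 16    # rule of thumb: about twice the expected entries

    def lookup_or_install(buckets, placement, time):
        # placement: tuple phi; each record keeps last_time and repetitions.
        h = hash(placement) % NBUCKETS
        for record in buckets[h]:           # chained entries of this bucket
            if record['phi'] == placement:
                return record               # found: caller updates the counters
        record = {'phi': placement, 'last_time': time, 'repetitions': 0}
        buckets[h].append(record)
        return record

    buckets = [[] for _ in range(NBUCKETS)]
    first = lookup_or_install(buckets, (2, 0, 1), 1)
    again = lookup_or_install(buckets, (2, 0, 1), 5)
    print(first is again)                   # True: the repetition is detected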
Figure 8: "Superposition" of the strings 101111 and 100110 stored in a digital tree; each node holds two child pointers (PTR 0 and PTR 1).
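A minimal Python sketch of the digital tree for fixed-length bit strings (ours; two child pointers per node, as in Figure 8):

    class Node:
        def __init__(self):
            self.child = [None, None]    # pointers for bit 0 and bit 1

    def insert(root, bits):
        # Walk the tree guided by the d-th bit at depth d; return True if the
        # string was already stored, False if it has just been installed.
        node, created = root, False
        for b in bits:
            if node.child[b] is None:
                node.child[b] = Node()
                created = True
            node = node.child[b]
        return not created

    root = Node()
    print(insert(root, [1, 0, 1, 1, 1, 1]))   # False: installed
    print(insert(root, [1, 0, 0, 1, 1, 0]))   # False: shares the prefix 10
    print(insert(root, [1, 0, 1, 1, 1, 1]))   # True: repetition detected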
Figure 9: Observed memory growth for the digital tree scheme: memory usage (number of tree nodes) vs. iteration.
We ran 60 tests with random starts, sampling the memory usage (in terms of the number of tree nodes used) at random times. Each node requires 8 bytes for the two pointers. The results are consistent with a linear increase (see Figure 9). The differences between the various runs are caused by the possible superposition of the initial parts of different strings, which saves some nodes.
2 Ecacy and Eciency: a Case Study
The set of basic moves for a tabu algorithm must satisfy a "completeness" criterion, i.e., the region of the search space "covered" by the algorithm starting from a randomly-chosen starting point must not be too small with respect to the size of the search space.
If m_1, m_2, ..., m_k are the basic moves and X is the search space, define

span(x) = \{ y \in X \mid y = m_{j_1} \circ m_{j_2} \circ \cdots \circ m_{j_k} \, x \}

i.e., the set of points that can be reached with chains of basic moves. The requirement that span(x) = X for every x \in X is a necessary condition for the efficacy of a tabu algorithm, if an exact solution is required. The basic moves often correspond to simple operations, as in the case of the set-clear bit moves, but in certain applications they can be implemented by complex sequences of operations; see for example the DROP/ADD moves used in [16].
The ecacy of the search is obviously aected by the specic tabu strategy. In fact, the
trajectory obtained by the step-by-step application of the allowed moves can show qualitatively
dierent behaviors. To illustrate this fact, let us consider the problem of nding the global
maximum of the following function:
F
6(
x; y
) = 0
:
5
?
sin
2
(100
p
x
2
+
y
2
)
?
0
:
5
(1 + 10 (
x
2
+
y
2
))
2
(1)
in the domain [
?
1
;
1]
[
?
1
;
1]. The function, apart from a trivial scaling of the
x
and
y
coordinates so that the domain becomes [-1,1], is the same as the one described in [4] and used
in [17].
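For reference, a direct Python transcription of equation (1) (ours):

    import math

    def f6(x, y):
        # Equation (1); the global maximum F6(0,0) = 1 is at the origin.
        r2 = x * x + y * y
        return 0.5 - (math.sin(100.0 * math.sqrt(r2)) ** 2 - 0.5) / (1.0 + 10.0 * r2) ** 2

    print(f6(0.0, 0.0))   # 1.0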
In Figure 10 we show the central part of the F6 function. The global maximum is at the origin, with a very narrow basin around it. The difficulty of this function is due to the fact that there are many suboptimal solutions located on concentric circles that trap algorithms based on hill-climbing with high probability.

Figure 10: Function F6: cross section of the central part, with the needle-like peak at the origin.
The problem of maximizing F6 becomes a combinatorial problem after choosing a discrete binary encoding of the continuous interval [-1, 1]. Two natural mappings between binary strings and (x, y) coordinates are obtained by discretizing each coordinate into 2^n evenly-spaced points identified by the integers j_x, j_y = 0, 1, ..., 2^n - 1, such that x = (2/2^n) j_x - 1 and y = (2/2^n) j_y - 1, and then using the binary or Gray encoding of the integers j_x, j_y. The conversion between the binary encoding b_n b_{n-1} ... b_1 and the Gray encoding g_n g_{n-1} ... g_1 is as follows (see for example [10]):

g_k = b_k if k = n;    g_k = b_{k+1} \oplus b_k if k < n
b_k = g_k if k = n;    b_k = b_{k+1} \oplus g_k if k < n

where \oplus is the exclusive-or operator and the second transformation must be done for decreasing values of k, starting from k = n.
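The two recurrences correspond to the following Python sketch (ours; the standard bitwise formulation):

    def binary_to_gray(j):
        # g_k = b_{k+1} XOR b_k for k < n, and g_n = b_n.
        return j ^ (j >> 1)

    def gray_to_binary(g, n=14):
        # b_k = b_{k+1} XOR g_k, computed for decreasing k starting from k = n.
        b = 0
        for k in range(n - 1, -1, -1):
            b |= (((b >> (k + 1)) ^ (g >> k)) & 1) << k
        return b

    j = 9473                                          # any 14-bit integer
    assert gray_to_binary(binary_to_gray(j), n=14) == j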
In our test n is equal to 14, so that a total of 28 bits is used for the two coordinates. The elementary moves on the 28-bit binary string are the setting-clearing of individual bits. The corresponding elementary moves in the x-y plane depend on the encoding. The standard binary encoding leads to moves of different sizes, larger when the more significant bits are modified. Gray encoding causes a similar "multi-scale" set of moves, with the important addition that near points in the x-y domain can be reached by changing a single bit of the string.
The ecacy and eciency of the search for the dierent tabu schemes and enco dings is
showed in Table 1, that summarizes the results on 20 random starts for each version with a
maximum of 16000 iterations each.
The set of moves associated to the Gray encoding allows a more eective search in every
variant. Let us now discuss the main characteristics of the tra jectories obtained in the dierent
schemes.
In Strict-Tabu the next point of the trajectory is the one with the highest function value among the new (i.e., not-yet-visited) configurations in the neighborhood.
-- Binary Coding --

Variant                Succ.   Iter. (succ.)    Best value (unsucc.)    Iter. (unsucc.)
Fixed (list size 7)      2       13 (11)         0.888385 (0.026073)      14 (3)
Fixed (list size 14)     3       90 (56)         0.971304 (0.016403)      92 (226)
Fixed (list size 21)     8     3707 (1876)       0.999956 (4.19e-06)     208 (104)
Strict                   3     4448 (4430)       0.989758 (0.004806)    3282 (1171)
Reactive                17     5055 (1294)       0.999963 (0.0)         2435 (2211)

-- Gray Coding --

Variant                Succ.   Iter. (succ.)    Best value (unsucc.)    Iter. (unsucc.)
Fixed (list size 7)      1        4 (0)          0.911327 (0.035065)     558 (236)
Fixed (list size 14)    14      189 (50)         0.990284 (0.0)          105 (32)
Fixed (list size 21)    20     1623 (430)        0.0 (0.0)                 0 (0)
Strict                  20     3300 (528)        0.0 (0.0)                 0 (0)
Reactive                20     1554 (391)        0.0 (0.0)                 0 (0)

Table 1: Comparison of different tabu schemes on the problem F6: number of successes (out of 20 runs), mean number of iterations in the successful cases, mean best value and number of iterations to obtain it in the unsuccessful cases. Standard deviations in parentheses.
The trajectory tries to visit each point in a basin around a local maximum, although the multi-scale moves in the x-y plane permit "jumps" if the large-size steps lead into a basin with higher function values. This dynamics is acceptable from the efficacy point of view, but the efficiency is very low because every basin must be almost completely filled before a new region of the search space is entered. Actually, the discovery of the optimal point is not guaranteed: in some cases the search may be stuck if all moves would lead to previously visited points; in other cases the optimal point can be separated from the current one by "walls" of previously visited configurations and may never be reached. The probability of the above effects is low when the dimensionality of the problem is high, but it increases in the presence of constraints because they can limit the number of admissible moves. We found evidence of these effects in problems with many constraints and using a simple set of elementary moves (like the 0-1 knapsack problem with single-add and single-drop moves). It is difficult to predict the impact of the above complex dynamics on a specific problem, although the "basin-filling" effect can become worse when the dimension D of the problem grows. In fact, an attraction basin of radius ρ contains a number of points proportional to ρ^D, so that the filling can become very time consuming. The walls can be overcome by allowing "tunneling", i.e., a passage over old configurations if this leads to better regions. Tunneling may be favored by complex basic moves. The "pedantic" way to explore is clearly shown in Figure 11, where the dynamics of the algorithm is depicted in the case of a typical successful run. The high mean value of the number of iterations necessary to find the solution in the case of a successful run (see Table 1) is another indicator of this "almost exhaustive" type of search.

At this point, let us note that we do not discourage the use of S-TABU in a general way; in fact S-TABU required the minimum number of iterations for some QAP problems illustrated in Section 4, although the actual CPU time is larger than the time of R-TABU.
In Fixed-Tabu (i.e., tabu based on a fixed list size), the next point of the trajectory is chosen among the configurations obtained by applying the moves which have not been used for a number of iterations given by list_size, unless the aspiration criterion is satisfied. The trajectories are more "jagged": the exploring point describes orbits surrounding local maxima without clear regular patterns. The basins are sampled without visiting every point. Obviously, the range of the explored region surrounding a local maximum grows with the list_size parameter and, if the attraction basin associated with a maximum is larger than the maximum "exploration range", the trajectory remains indefinitely trapped.

Fixed-Tabu on the F6 function does not converge frequently (it remains trapped), although it converges rapidly if the initial point is suitable. Both the success frequency and the number of iterations increase with large sizes. This conclusion is derived from Table 1, where the results of runs using different list_size values are given. The efficiency is clearly higher than in the case of the Strict-Tabu search.

In Figure 12 we show a successful run with list_size = 21. Let us note that the points are more scattered than the points of Figure 11, corresponding to the S-TABU case. The above considerations remain true also in the case of variable-length list algorithms [15], if the range of variation is limited.
The dynamics of the Reactive-Tabu method shows characteristics of the fixed-tabu search, but the unsuccessful cases are completely absent with Gray encoding and rare with the standard binary encoding. Actually, the three unsuccessful cases converged to a point next to the optimal configuration in the x-y plane (with F6 = 0.99963 instead of 1.0). The maximum could not be reached in the allotted time because this required changing the binary string of one coordinate (x or y) from 01111111111111 to 10000000000000, i.e., all bits (but only a single bit with Gray encoding). A typical trajectory is shown in Figure 13.

Let us note a similarity of the trajectory with the trajectory of S-TABU on a large scale, and with that of F-TABU on the small scale.
3 Space-Time Costs of REM, Hashing and Digital Tree
To complete the comparison among the different versions of tabu search, let us concentrate on their time and memory requirements. It is clear that the need to store the whole set of visited configurations and the need to check if a candidate point was already visited can affect the memory requirements and the CPU time.

Table 2 collects the asymptotic expressions for the space (memory) and time (CPU secs) complexity of the different schemes. In the first two columns we isolate the dependency on the number of iterations n, and in the last one we make the application dependencies explicit. Let us note that the REM time complexity is high, being proportional to the square of the iteration number. Therefore the REM scheme for S-TABU is convenient only when the memory cost of a single configuration is very high with respect to the cost of storing a single move.
In Table 2, n denotes the number of iterations, N the problem size, f the function to be optimized and C(f,N) the computational cost for evaluating the neighborhood containing |S| points, a number depending on the problem size. The constant k_0 is the cost of the single tracing step of REM, k_1 is the average fraction of configurations evaluated in the neighborhood, D_0 is the cost of a single fetch-and-test operation on a node of the digital tree, and H_0 and H_1 depend on the specific hashing scheme (H_0 in the case of storing the whole configuration, H_1 in the case of storing a single compressed item).
Figure 11: Visited points of function F6 for the strict-tabu method. Local maxima are located on the concentric lines (7227 iterations). The figures show the same region around the global optimum at different resolutions.
Figure 12: Visited points of function F6 for the fixed-tabu method. Local maxima are located on the concentric lines (3577 iterations). The figures show the same region around the global optimum at different resolutions.
Figure 13: Visited points of function F6 for the reactive-tabu method. Local maxima are located on the concentric lines (1966 iterations). The figures show the same region around the global optimum at different resolutions.
Tabu Variant      Time Complexity       Space Complexity    Problem Dependencies
Fixed             α0 n                  constant            α0 = C(f,N)
Strict (REM)      α1 n^2 + α2 n         n sizeof(MOVE)      α1 = k0/2, α2 = k1 C(f,N) + k0/2
Strict (DTREE)    α3 n                  n sizeof(NODE)      α3 = D0 N |S| + C(f,N)
Strict (HASH)     α4 n                  n sizeof(ENTRY)     α4 = H0 N |S| + C(f,N) or α4 = H1 |S| + C(f,N)
Reactive (DTREE)  α5 n                  n sizeof(NODE)      α5 = D0 N + C(f,N)
Reactive (HASH)   α6 n                  n sizeof(ENTRY)     α6 = H0 N + C(f,N) or α6 = H1 + C(f,N)

Table 2: Asymptotic requirements of CPU time and memory space for different tabu schemes.
Let us note how the dependency on the factor |S| is canceled in the expression for the time complexity of R-TABU with respect to S-TABU. This fact can reduce the computational cost, especially for large neighborhoods.

The space-time complexity of the hashing variant is a little higher than in the digital tree case, but it can be reduced if the hashing mechanism implements a compression mechanism as described in [17], where the vector describing the configuration is "shrunk" into a 16-bit datum.
4 Results on the Quadratic Assignment Problem
In the Quadratic Assignment Problem of size N the function to be minimized is:

f(\phi) = \sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij} \, b_{\phi(i)\phi(j)}

where the search space consists of the set of all possible permutations \phi of N integers. The practical relevance of the problem is clear when \phi is interpreted as the assignment of N units to N locations (\phi(loc) is the unit assigned to location loc), the matrix element a_{ij} represents the distance between the locations i and j, and the element b_{ij} is the "flow" from location i to location j. Solving the QAP problem means searching for an assignment that minimizes the sum of the products "distance" times "flow" (the "transportation cost").
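A direct Python transcription of the cost function (ours; the tiny 2-by-2 instance is only for illustration):

    def qap_cost(a, b, phi):
        # a[i][j]: distance between locations i and j;
        # b[u][v]: flow term; phi[i]: unit assigned to location i.
        n = len(phi)
        return sum(a[i][j] * b[phi[i]][phi[j]] for i in range(n) for j in range(n))

    a = [[0, 3], [3, 0]]
    b = [[0, 5], [5, 0]]
    print(qap_cost(a, b, [1, 0]))   # 30 = 2 * (3 * 5)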
The problems used for the tests were created by using the pseudo-random procedure described in [15]. The symmetric and zero-diagonal matrices a_{ij} and b_{ij} are filled starting from the values obtained from a random number generator defined by the following recursive formula:

X_k = (a \cdot X_{k-1}) \bmod m

where a = 16807, m = 2^{31} - 1 and X_0 = 123456789 (integers coded on 64 bits). The pseudo-random numbers are scaled and converted into integers in the range (0, 99). In detail, the elements a_{ij} above the diagonal are filled in a row-wise manner by using successive X_k values (X_1, X_2, X_3, ...) as a_{ij} = \lfloor (100 \, X_k)/m \rfloor, and the lower part of the matrix is obtained from the symmetry requirement. The elements b_{ij} are then defined by "consuming" additional X_k values in the same way.
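A Python sketch of this generation procedure (ours; it follows the recurrence and the scaling described above):

    A, M = 16807, 2**31 - 1    # generator constants a and m

    def fill_matrix(n, x):
        # Fill a symmetric, zero-diagonal n-by-n matrix row-wise above the
        # diagonal, consuming successive X_k values; returns matrix and seed.
        mat = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                x = (A * x) % M                          # X_k = (a X_{k-1}) mod m
                mat[i][j] = mat[j][i] = (100 * x) // M   # scaled into 0..99
        return mat, x

    a_mat, seed = fill_matrix(5, 123456789)   # distances consume X_1, X_2, ...
    b_mat, _ = fill_matrix(5, seed)           # flows consume the following X_k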
The elementary moves for the problem consist of all possible exchanges of the locations occupied by two units. Following the notation of [15], a new placement (permutation) \phi' is obtained from the current placement \phi by exchanging two units r and s:

\phi'(k) = \phi(k) \ \forall k \neq r, s; \quad \phi'(r) = \phi(s); \quad \phi'(s) = \phi(r)

The complete evaluation of the neighborhood requires O(N^2) operations. In the case of symmetric and null-diagonal matrices, the value of a move that brings from state \phi to state \phi' (i.e., the reduction \Delta(\phi, r, s) \equiv f(\phi') - f(\phi)) is:

\Delta(\phi, r, s) = 2 \sum_{k \neq r,s} (a_{sk} - a_{rk}) (b_{\phi(s)\phi(k)} - b_{\phi(r)\phi(k)})    (2)

If the move values starting from a configuration \phi are stored, the move values for the new configuration \phi' (obtained from \phi by exchanging units r and s) can be calculated in constant time for u, v different from r or s by using:

\Delta(\phi', u, v) = \Delta(\phi, u, v) + 2 (a_{ru} - a_{rv} + a_{sv} - a_{su}) (b_{\phi(s)\phi(u)} - b_{\phi(s)\phi(v)} + b_{\phi(r)\phi(v)} - b_{\phi(r)\phi(u)})    (3)
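For checking purposes, equations (2) and (3) translate into the following Python sketch (ours; as in the text it assumes symmetric, null-diagonal matrices, and phi denotes the placement before the exchange of r and s):

    def move_value(a, b, phi, r, s):
        # Equation (2): full O(N) evaluation of exchanging the units at r and s.
        return 2 * sum((a[s][k] - a[r][k]) * (b[phi[s]][phi[k]] - b[phi[r]][phi[k]])
                       for k in range(len(phi)) if k != r and k != s)

    def updated_move_value(a, b, phi, old_delta, r, s, u, v):
        # Equation (3): O(1) update of Delta(phi', u, v) from Delta(phi, u, v),
        # where phi' is phi after the exchange of r and s (u, v != r, s).
        return old_delta + 2 * (a[r][u] - a[r][v] + a[s][v] - a[s][u]) * (
            b[phi[s]][phi[u]] - b[phi[s]][phi[v]]
            + b[phi[r]][phi[v]] - b[phi[r]][phi[u]])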
We performed a series of computational tests by running different versions of the tabu algorithm (S-TABU, R-TABU with or without the escape mechanism) starting with different random initial points (the same for the different algorithms). For comparison we report the mean values obtained by the robust tabu scheme of [15].

In Table 3 we list the expected number of iterations for convergence to the best known solution listed in [15] and the standard deviation of the estimates. Each iteration consists of the complete neighborhood evaluation and the selection of the best move among those satisfying the tabu or aspiration requirements. We ran a total of 30 tests for problem sizes ranging from 5 to 35. All tests reached the desired target solution.
Size (N) tests (max. iter.) R-TABU [1.1,0.9,esc] S-TABU robust TABU
5 30 (max.100K) 9.9 (1.9) 6.1 (0.5) 7.6
6 30 (max.100K) 12.2 (2.4) 7.4 (0.9) 6.6
7 30 (max.100K) 78.1 (12.2) 35.6 (3.1) 25.7
8 30 (max.100K) 40.9 (5.7) 32.5 (3.8) 29.4
9 30 (max.100K) 67.5 (12.0) 56.2 (8.1) 31.7
10 30 (max.100K) 256.7 (34.0) 161.3 (20.7) 137.1
12 30 (max.100K) 282.3 (51.4) 477.0 (95.7) 210.7
15 30 (max.100K) 1780.3 (319.0) 3642.2 (308.2) 2168.0
17 30 (max.100K) 4133.9 (646.8) 7364.2 (817.4) 5020.4
20 30 (max.500K) 37593.2 (6012.5) 25092.9 (6572.2) 34279.0
25 30 (max.1M) 38989.7 (6236.1) 20483.9 (3575.0) 80280.4
30 30 (max.2M) 68178.2 (11370.3) 48919.2 (9055.6) 146315.7
35 30 (max.4M) 281334.0 (48543.5) 146276.2 (47419.7) 448514.5(*)
Table 3: Comparison of different schemes of tabu search. R-TABU version with Increase = 1.1, Decrease = 0.9 and escape mechanism. The standard deviation of the measured average is given in parentheses. (*) needed the introduction of a long-term memory mechanism.
It can be noted that the reactive tabu is competitive with the robust tabu, especially for large problem sizes. The larger number of iterations for small problem sizes (N ≤ 12) is expected because the R-TABU scheme needs a small number of iterations in the start-up phase, when an appropriate list-size is "learned" from the evolution of the search (let us remember that the initial size is one). Nonetheless the R-TABU scheme pays off for large problem sizes, where the convergence to the optimal configuration is obtained in a robust way without having to define at the beginning a suitable list-size (or range of sizes).
The performance of the strict tabu scheme in terms of iterations is good for this problem: for large problem sizes the average number of iterations for convergence is reduced with respect to both R-TABU and robust TABU, but the advantage is lost because of the larger CPU time per iteration (for the N = 35 case, S-TABU is about 3.5 times slower than R-TABU per iteration). The number of iterations tends to be proportional to the actual CPU time in the same manner for the reactive and robust versions. In fact (see Section 3), the time for updating the hashing or the digital tree memory structure is approximately O(N) and, because the neighborhood evaluation requires O(N^2) operations, the memory-updating component tends to be negligible for large problem sizes. The case of S-TABU is different because at each iteration all the O(N^2) points in the neighborhood have to be compared with stored configurations, with a total cost of O(N^3), so that this is the dominant term for large N. If only the function values are stored (see below), one obtains a non-negligible cost of O(N^2), of the same order as that for the neighborhood evaluation.
The CPU time per iteration on a state-of-the-art workstation (an Iris from Silicon Graphics) is approximately 6.7 N^2 µs. This value, like the relative speed of R-TABU vs. S-TABU, was obtained by using a C-language program and the standard cc compiler.
4.1 Discussion of R-TABU Choices
In the following series of tests we probe the functionality of R-TABU under changes in the design of the algorithm. First we eliminated the escape mechanism and changed the speed with which the list-size is increased or decreased (see the Increase and Decrease parameters in Section 1.4). In Table 4 we present the results obtained from a series of 30 tests for each problem size (ranging from 5 to 20). The Increase and Decrease parameters are written at the top of each column. When the algorithm does not reach the optimum in the allowed maximum number of iterations listed in Table 3, we report the proportion of optimal results in 30 runs.
N INC=1.1, DEC=.9 INC=1.2, DEC=.9 INC=1.1, DEC=.8 INC=1.2, DEC=.8
5 9.9 (1.9) 8.0 (1.3) 9.9 (1.9) 8.0 (1.3)
6 13.8 (2.6) 10.4 (1.8) 13.2 (2.4) 10.6 (1.9)
7 77.2 (10.1) 46.4 (5.8) 81.6 (10.3) 61.2 (7.8)
8 33.4 (4.2) 36.4 (5.8) 36.8 (4.2) 33.4 (3.9)
9 56.6 (11.7) 47.0 (7.1) 59.8 (8.8) 50.4 (7.0)
10 277.4 (45.0) 199.1 (27.3) 221.7 (35.0) 240.7 (36.1)
12 181.0 (37.4) 187.6 (35.2) 195.6 (33.5) 157.6 (19.9)
15 1962.4 (379.5)29/30 1827.7 (335.4)29/30 2153.7 (387.7) 2241.5 (393.0)29/30
17 3890.3 (556.1)26/30 5767.3 (1258.5)25/30 4136.2 (794.5)24/30 4784.8 (796.7)25/30
20 13452.9 (4489.4)7/30 15710.5 (3988.3)21/30 19856.1 (4599.2)14/30 17997.3 (3156.1)19/30
Table 4: Robustness under parameter changes: R-TABU without escape, different values of Increase and Decrease.
While the results are acceptable for the smaller problems (up to N = 12), starting from N = 15 we observed that the algorithm fails for a growing fraction of runs. The size dynamics is not sufficient to avoid traps. The search either reaches the optimum in a relatively small number of iterations or it does not reach it at all. A possible explanation is that the algorithm is visiting only a limited portion of the search space, a portion that contains the optimal point in the lucky cases and only sub-optimal values in the remaining ones. The cancellation of limit cycles by the list-size dynamics does not guarantee success, and the additional escape mechanism is therefore needed in the algorithm (see also the discussion of chaotic attractors in Section 1).
In a second series of tests we included the escape mechanism and tested different speeds of list-size variation; see Table 5. Success is obtained in all cases and the number of iterations is not affected in a critical way, justifying the choice of the fixed values 1.1 and 0.9 for all tests.

In the last column of Table 5 we modified the memory mechanism so that the function value is recorded instead of the configuration, the same method used in [2]. Because the same function value can be associated with different configurations, there is a small probability of "false alarms", i.e., reactions of the algorithm when there is no actual repetition of configurations. The advantage of the method is that the memory requirement is reduced: only a single 32-bit integer is stored instead of the entire configuration. The tests show no statistically significant difference with respect to the case when the precise configuration is saved.
N INC=1.2, DEC=.9 INC=1.1, DEC=.8 INC=1.2, DEC=.8 INC=1.1, DEC=.9, f stored
5 8.0 (1.3) 9.9 (1.9) 8.0 (1.3) 13.0 (2.3)
6 10.4 (1.8) 11.2 (2.1) 10.6 (1.9) 10.6 (1.7)
7 59.0 (12.7) 77.3 (9.2) 63.8 (11.3) 98.5 (13.1)
8 44.1 (7.6) 40.4 (5.5) 45.1 (6.6) 39.9 (5.3)
9 62.3 (6.9) 80.2 (8.5) 53.4 (5.7) 58.5 (7.7)
10 210.3 (28.4) 181.6 (29.1) 254.7 (39.9) 236.1 (33.5)
12 203.9 (57.6) 218.6 (30.1) 195.3 (39.3) 234.7 (40.2)
15 1981.0 (357.3) 1677.3 (222.2) 1994.7 (370.9) 1828.3 (519.1)
17 4408.5 (699.4) 4487.9 (918.3) 4215.5 (606.7) 4347.3 (764.6)
20 51648.5 (11499.3) 23652.6 (4117.1) 29230.3 (6440.7) 46019.6 (7567.9)
Table 5: Robustness under parameter changes: R-TABU with escape, different values of Increase and Decrease.
More sophisticated "compression" techniques are described by Woodruff and Zemel [17], where a hashing function is used to compress the vector describing the configuration. To adapt our hashing algorithm described in Section 1.5 to their proposal, it is sufficient to use the entries of the bucket array (see Figure 7) as flags for the existence of a configuration with the given index. In this case a single bit is sufficient for each slot.
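A Python sketch of this single-bit variant (ours; the number of slots and the use of Python's built-in hash are illustrative choices):

    NSLOTS = 1 << 20    # single-bit slots; collisions give rare "false alarms"

    def seen_before(flags, placement):
        # Map the configuration to a slot index and keep only a presence flag.
        idx = hash(placement) % NSLOTS
        found = bool(flags[idx])
        flags[idx] = 1
        return found

    flags = bytearray(NSLOTS)
    print(seen_before(flags, (3, 1, 2, 0)))   # False on the first visit
    print(seen_before(flags, (3, 1, 2, 0)))   # True on the repetition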
4.2 New Sub-Optimal Solutions
Encouraged by the results obtained in the previous sections, we ran a series of tests for larger problem sizes (from N = 40 to N = 100). While for the smaller sizes we duplicated the optimal values listed in [15] and could not reach lower values (therefore confirming their status of "provably or probably optimal solutions"), for the larger sizes we surpassed all best known solutions listed in the cited paper, often by large relative amounts. The newly obtained solution values and the percentages below Taillard's values are listed in Table 6.
For the N = 40 case we ran a total of 10 tests, stopping when Taillard's value was reached or surpassed. In all cases this value was surpassed; in 6 out of 10 cases the new best value (f = 3141702) was obtained. Excessive computing times prohibited extensive tests for the larger sizes, but the results obtained in a single test (for N = 50, 60, 80 and 100) are extremely encouraging. In particular, the best known solution for N = 100 was improved by almost 0.4% in about 500K iterations.
N new best percent iterations
40 3141702 -0.1529% 1048900.2 (295738.0), 10 tests
50 4948508 -0.0514% 7628548, 1 test
60 7228214 -0.6024% 3071920, 1 test
80 13558710 -0.1718% 4767363, 1 test
100 21160946 -0.3993% 542561, 1 test
Table 6: Best solutions obtained and percent reduction with respect to Taillard's values. R-TABU with Increase = 1.1, Decrease = 0.9, escape, and storage of f values.
5 Conclusions
The Tabu technique pioneered the use of flexible memory structures in the search process. In the present work we presented both an overview of efficient storing and retrieval techniques to speed up the search, and a new reactive version of tabu, where the appropriate size of the tabu list is adapted to the history of the search process.
The hashing and digital tree storing and retrieval methods permit the rapid comparison of a candidate configuration with all points previously encountered, in O(1) time. These fast mechanisms can be used as the building block of both the reactive and strict versions of tabu. In the strict case, when the only forbidden moves are those leading to previously visited points, the trajectory obtained is the same as that of the Reverse Elimination Method, the difference being in the search speed.
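As an illustration of how the fast memory supports the strict scheme, the following Python sketch forbids exactly the previously visited points by keeping them in a hash set. The neighborhood generator and the stopping rule are illustrative assumptions; only the O(1)-time membership test reflects the mechanism discussed above.

    def strict_tabu_search(start, neighbors, f, max_iterations=10000):
        # The only forbidden moves lead back to visited points; membership
        # is tested in O(1) average time through a hash set.
        visited = {tuple(start)}
        current = best = start
        for _ in range(max_iterations):
            admissible = [c for c in neighbors(current)
                          if tuple(c) not in visited]
            if not admissible:
                break                     # the trajectory is trapped
            current = min(admissible, key=f)
            visited.add(tuple(current))
            if f(current) < f(best):
                best = current
        return best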
The reactive tabu with the escape diversification technique and the exploitation of fast memory structures does not need the a priori choice of the list size and shows a robust and efficient convergence on the chosen test problems. An additional use of hashing functions is that advocated by [17], where a configuration vector is mapped to a "compressed" datum given by its hashing index. A comparable compression can be obtained by storing the function values, the method that we used in the experiments on the large-size QAP problems. Apparently the occurrence of the same function values for different configurations does not impair the efficacy and efficiency of the search.

The utility of the reaction mechanism, as compared to strict cycle-avoidance, confirms that avoiding cycles is not the ultimate goal of the search process [8], the broader objective being that of stimulating a "bold" exploration of the search space.
A straightforward parallel implementation of a primitive version of R-TABU was presented in [2], where independent searches are executed in the different nodes. We are now experimenting with the use of the above-mentioned memory structures in the fully parallel case, where the information contained in a set of suboptimal configurations is used to create a new set of candidate points (see also [9]).
ACKNOWLEDGEMENTS
The authors would like to thank Professors Fred Glover, Dave Woodruff, and Stefan Voss for useful comments and for sending both relevant preprints and the data for some benchmark problems used in the paper. The anonymous referees helped to improve the clarity of the paper.
The hardware facilities for the computational tests were kindly made available by the I.N.F.N.
group of the University of Trento.
References
[1] A.V. AHO, J.E. HOPCROFT and J.D. ULLMAN, 1985. Data Structures and Algorithms, Addison-Wesley.

[2] R. BATTITI and G. TECCHIOLLI, 1992. Parallel Biased Search for Combinatorial Optimization, Microprocessors and Microsystems 16(7), 351-367.

[3] F. DAMMEYER, P. FORST and S. VOSS, 1991. On the Cancellation Sequence Method of Tabu Search, ORSA Journal on Computing 3, 262-265.

[4] L. DAVIS, 1991. Handbook of Genetic Algorithms, Van Nostrand Reinhold, New York.

[5] F. GLOVER, C. McMILLAN and B. NOVICK, 1985. Interactive Decision Software and Computer Graphics for Architectural and Space Planning, Annals of Operations Research 5, 557-573.

[6] F. GLOVER, 1989. Tabu Search - Part I, ORSA Journal on Computing 1(3), 190-206.

[7] F. GLOVER, 1990. Tabu Search - Part II, ORSA Journal on Computing 2(1), 4-32.

[8] F. GLOVER and M. LAGUNA, 1992. Tabu Search, in Modern Heuristic Techniques for Combinatorial Problems, Blackwell Publishing, in press.

[9] F. GLOVER, J. KELLY and M. LAGUNA, 1992. Genetic Algorithms and Tabu Search: Hybrids for Optimization, Manuscript, University of Colorado, Boulder.

[10] D.F. ELLIOT and K.R. RAO, 1982. Fast Transforms: Algorithms, Analyses, Applications, Academic Press, Orlando, Florida.

[11] D.E. KNUTH, 1973. The Art of Computer Programming, Vol. III: Sorting and Searching, Addison-Wesley, Reading, Mass.

[12] R. SERRA and G. ZANARINI, 1990. Complex Systems and Cognitive Processes, Springer-Verlag, Berlin.

[13] H.G. SHUSTER, 1984. Deterministic Chaos, Physik-Verlag, Weinheim.

[14] J. SKORIN-KAPOV, 1990. Tabu Search Applied to the Quadratic Assignment Problem, ORSA Journal on Computing 2(1), 33-45.

[15] E. TAILLARD, 1991. Robust Taboo Search for the Quadratic Assignment Problem, Parallel Computing 17, 443-455.

[16] F. DAMMEYER and S. VOSS, 1992. Dynamic Tabu List Management Using the Reverse Elimination Method, Manuscript, Technische Hochschule Darmstadt, Germany. To appear in Annals of Operations Research, 1992.

[17] D.L. WOODRUFF and E. ZEMEL, 1991. Hashing Vectors for Tabu Search, Technical Report 90-08, Northwestern University, Evanston. To appear in Annals of Operations Research, 1992.

[18] H.M. WEINGARTNER and D.N. NESS, 1967. Methods for the Solution of the Multi-Dimensional 0/1 Knapsack Problem, Operations Research 15, 83-103.
... ii) we make a comparison between two implementations of ANN in the feed-forward architecture; namely, a simulated neural network trained by the usual backpropagation algorithm [12] is compared to a hardware realization of a low-precision-weight Multi-Layer Perceptron (MLP), the neurochip Totem [13], whose training-by-example task is accomplished by a derivative free combinatorial optimization algorithm called Reactive Tabu Search (RTS) [14,15]. ...
... The neurochip Totem, has been conceived to implement Multi-Layer Perceptrons in the feed-forward architecture on the basis of a simple and fast computational structure [13]. This is achieved escaping the necessity of derivative calculations, turning the task of training-by-examples into a combinatorial optimization problem, whose solution is searched then by means of the Reactive Tabu Search method [14,15]. Differently from the derivative-based backpropagation algorithms, RTS thus allows simple and low precision computation, using only up to 8 bits for the synaptic weights and 16 bits to represent the feature parameters 4 : this is indeed the basis of the simple and fast computational structure said above. ...
Preprint
Full-text available
We show that neural network classifiers can be helpful to discriminate Higgs production from background at LHC in the Higgs mass range M= 200 GeV. We employ a common feed-forward neural network trained by the backpropagation algorithm for off-line analysis and the neural chip Totem, trained by the Reactive Tabu Search algorithm, which could be used for on-line analysis.
... The TS method has been often used in combinatorial optimization problems (Sexton et al. (1998)) (Hertz et al. (1995)) (Battiti and Tecchiolli (1995)), and there are few applications of TS for training feedforward neural networks (Sexton et al. (1998)) (Battiti and Tecchiolli (1995)) (Karaboga and Kalinli (1997)). As with SA, TS has not been popular in simultaneous optimization of neural network weights and architectures. ...
... The TS method has been often used in combinatorial optimization problems (Sexton et al. (1998)) (Hertz et al. (1995)) (Battiti and Tecchiolli (1995)), and there are few applications of TS for training feedforward neural networks (Sexton et al. (1998)) (Battiti and Tecchiolli (1995)) (Karaboga and Kalinli (1997)). As with SA, TS has not been popular in simultaneous optimization of neural network weights and architectures. ...
Conference Paper
Full-text available
This paper shows results of using simulated annealing and tabu search for optimizing neural network architectures and weights. The algorithms generate networks with good generalization performance (mean classification error for the test set was 5.28% for simulated annealing and 2.93% for tabu search) and low complexity (mean number of connections used was 11.68 out of 36 for simulated annealing and 11.49 out of 36 for tabu search) for an odor recognition task in an artificial nose.
... This constitutes a massive search space, and is among the main reasons why derivative-based optimization is typically used. Add noise to selected indices in θ i+1 : 16 if F (θ i+1 ) < score then 17 Update rules: ...
... Otherwise, there is no evidence to suggest significant interest in single-candidate, derivative-free methods to train neural networks. There were attempts, however, in the 1990s as in [16] and [17]. Those attempts, though promising at the time, are not representative of modern deep learning tasks. ...
Conference Paper
Full-text available
Deep neural networks (DNNs) have been found useful for many applications. However, training and designing those networks can be challenging and is considered more of an art or an engineering process than rigorous science. In this regard, the important process of choosing hyperparameters is relevant. In addition, training neural networks with derivative-free methods is somewhat understudied. Particularly, with regards to hyperparameter selection. The paper presents a small-scale study of 3 hyperparameters choice for convolutional neural networks (CNNs). The networks were trained with two single-candidate optimization algorithms: Stochastic Gradient Descent (derivative-based) and Local Search (derivative-free). The CNN is trained on a subset of the FashionMNIST dataset. Experimental results show that hyperparameter selection can be detrimental for Local Search, especially regarding network parametrization. Moreover, the best hyperparameter choices didn't match for both algorithms. Future investigation into the training dynamics of Local Search is likely needed.
... Methods for the optimization of functions of real variables with no derivatives are an additional option, for example direct search [7] or versions of Simulated Annealing for functions of continuous variables [8]. Intelligent schemes based on adaptive diversification strategies by prohibiting selected moves in the neighborhood are considered in [9]. ...
Preprint
This paper proposes a new algorithm based on multi-scale stochastic local search with binary representation for training neural networks. In particular, we study the effects of neighborhood evaluation strategies, the effect of the number of bits per weight and that of the maximum weight range used for mapping binary strings to real values. Following this preliminary investigation, we propose a telescopic multi-scale version of local search where the number of bits is increased in an adaptive manner, leading to a faster search and to local minima of better quality. An analysis related to adapting the number of bits in a dynamic way is also presented. The control on the number of bits, which happens in a natural manner in the proposed method, is effective to increase the generalization performance. Benchmark tasks include a highly non-linear artificial problem, a control problem requiring either feed-forward or recurrent architectures for feedback control, and challenging real-world tasks in different application domains. The results demonstrate the effectiveness of the proposed method.
... The network calculates the output using the provided input using its weight and bias vector in the forward step, and a loss function is produced from the output values. The weights are modified in the backward step using a gradient descent technique or something similar until the desired result is obtained or the neural network reaches its maximum loop limit [20][21][22]. ...
Article
Full-text available
Mathematical models are beneficial in representing a given dataset, especially in engineering applications. Establishing a model can be used to visualise how the model fits the dataset, as was done in this research. The Levenberg–Marquardt model was proposed as a training algorithm and employed in the backpropagation algorithm or multilayer perceptron. The dataset obtained from a previous researcher consists of electrochemical data of uncoated and coated additive manufacturing steel with Ni-P at several testing periods. The model’s performance was determined by regression value (R) and mean square error (MSE). It was found that the R values for non-coated additive manufacturing steel were 0.9999, 1, and 1, while MSE values were 1.14 × 10−6, 2.99 × 10−7, and 5.10 × 10−7 for 0 h, 288 h, and 572 h, respectively. Meanwhile, the R values for the Ni-P coated additive manufacturing steel were 1, 1, 1, while the MSE values were 1.06 × 10−7, 1.15 × 10−8, and 6.59 × 10−8 for 0 h, 288 h, and 572 h, respectively. The high R and low values of MSE emphasise that this training algorithm has shown good accuracy. The proposed training algorithm provides an advantage in processing time due to its ability to approach second-order training speed without having to compute the Hessian Matrix.
... Although ACO is suitable for combinatorial problems, it was shown in Socha and Blum (2007), Blum and Socha (2005), Hong et al. (2003) and Liu et al. (2006) that this algorithm is able to provide very promising results when applying to NNs as well. Some other heuristicbased learning algorithms in the literature are as follows: DE based trainer (Ilonen, Kamarainen, & Lampinen, 2003;Slowik & Bialko, 2008), ABC-based trainer (Karaboga, Akay, & Ozturk, 2007;Ozturk & Karaboga, 2011), gravitational search algorithm (GSA)-based trainer (Ghalambaz et al., 2001;Mirjalili, Hashim, & Sardroudi, 2012), Tabu search (TS)-based trainer (Battiti & Tecchiolli, 1995;Kalinli & Karaboga, 2004), biogeography-based optimization (BBO)-based trainer , ES based trainer (Wienholt, 1993); magnetic optimization algorithm (MOA) (Mirjalili & Sadiq, 2011), grey wolf optimizer (GWO) (Mirjalili, 2015). Despite the many advantages of metaheuristic algorithms, the problem of falling into the local optimum still exists. ...
Article
Spotted hyena optimizer (SHO) is a novel metaheuristic optimization algorithm based on the behavior of spotted hyena and their collaborative behavior in nature. In this paper, we design a spotted hyena optimizer for training feedforward neural network (FNN), which is regarded as a challenging task since it is easy to fall into local optima. Our objective is to apply metaheuristic optimization algorithm to tackle this problem better than the mathematical and deterministic methods. In order to confirm that using SHO to train FNN is more effective, five classification datasets and three function-approximations are applied to benchmark the performance of the proposed method. The experimental results show that the proposed SHO algorithm for optimization FNN has the best comprehensive performance and has more outstanding performance than other the state-of-the-art metaheuristic algorithms in terms of the performance measures.
Article
Full-text available
The learning process and hyper-parameter optimization of artificial neural networks (ANNs) and deep learning (DL) architectures is considered one of the most challenging machine learning problems. Several past studies have used gradient-based back propagation methods to train DL architectures. However, gradient-based methods have major drawbacks such as stucking at local minimums in multi-objective cost functions, expensive execution time due to calculating gradient information with thousands of iterations and needing the cost functions to be continuous. Since training the ANNs and DLs is an NP-hard optimization problem, their structure and parameters optimization using the meta-heuristic (MH) algorithms has been considerably raised. MH algorithms can accurately formulate the optimal estimation of DL components (such as hyper-parameter, weights, number of layers, number of neurons, learning rate, etc.). This paper provides a comprehensive review of the optimization of ANNs and DLs using MH algorithms. In this paper, we have reviewed the latest developments in the use of MH algorithms in the DL and ANN methods, presented their disadvantages and advantages, and pointed out some research directions to fill the gaps between MHs and DL methods. Moreover, it has been explained that the evolutionary hybrid architecture still has limited applicability in the literature. Also, this paper classifies the latest MH algorithms in the literature to demonstrate their effectiveness in DL and ANN training for various applications. Most researchers tend to extend novel hybrid algorithms by combining MHs to optimize the hyper-parameters of DLs and ANNs. The development of hybrid MHs helps improving algorithms performance and capable of solving complex optimization problems. In general, the optimal performance of the MHs should be able to achieve a suitable trade-off between exploration and exploitation features. Hence, this paper tries to summarize various MH algorithms in terms of the convergence trend, exploration, exploitation, and the ability to avoid local minima. The integration of MH with DLs is expected to accelerate the training process in the coming few years. However, relevant publications in this way are still rare.
Article
Nowadays, artificial intelligence has gained recognition in every aspect of life. Artificial neural networks, one of the most efficient artificial intelligence techniques, is remarkably successful in computers' acquisition of the learning and interpretation capabilities of humans and attainment of meaningful results. Whether artificial intelligence networks can yield meaningful results is directly related to how the network is trained. The traditional algorithms, which are used to train artificial intelligence networks, do not always yield successful results in complicated problems and real-life problems. Metaheuristic algorithms are efficient techniques developed in order to figure out time-consuming and challenging problems fast and as optimally as possible. This study makes use of the artificial bee colony algorithm, which has been widely used recently in solving many kinds of problems so as to train artificial neural networks efficiently. Within this study, different strategies are used on subpopulations, so that the algorithm can search without getting tangled with the local optima. And also same and different parameter settings are considered for each population to reflect different search behaviours. The proposed approach was analysed through applied results of different data sets. The results yielded that the used strategies can be promising alternatives to the current search algorithms.
Preprint
Full-text available
Neural network and metaheuristic algorithm are two technique of machine learning. Each of them is employed for different purposes. NN is used for classification, regression, etc., however, a metaheuristic algorithm is used to find the optima in a huge search space. To use a neural network, first, it should be trained. In the process of training, the weight of each connection is obtained so that the total error (real output minus predicted amount) became minimum. That’s where stochastic search space come in to help find the best set of weights. Therefore, finding weights of a neural network can be interpreted as finding the optima of a vast search space. The focus of this paper is on the use of metaheuristic algorithm on training and evolving structure of feed-forward neural networks
Preprint
Full-text available
Neural network and metaheuristic algorithm are two technique of machine learning. Each of them is employed for different purposes. NN is used for classification, regression, etc., however, a metaheuristic algorithm is used to find the optima in a huge search space. To use a neural network, first, it should be trained. In the process of training, the weight of each connection is obtained so that the total error (real output minus predicted amount) became minimum. That’s where stochastic search space come in to help find the best set of weights. Therefore, finding weights of a neural network can be interpreted as finding the optima of a vast search space. The focus of this paper is on the use of metaheuristic algorithm on training and evolving structure of feed-forward neural networks
Article
Full-text available
Previous work on analog VLSI implementation of multilayer perceptrons with on-chip learning has mainly targeted the implementation of algorithms like backpropagation. Although backpropagation is efficient, its implementation in analog VLSI requires excessive computational hardware. In this paper we show that, for analog parallel implementations, the use of gradient descent with direct approximation of the gradient using “weight perturbation” instead of backpropagation significantly reduces hardware complexity. Gradient descent by weight perturbation eliminates the need for derivative and bidirectional circuits for on-chip learning, and access to the output states of neurons in hidden layers for off-chip learning. We also show that weight perturbation can be used to implement recurrent networks. A discrete level analog implementation showing the training of an XOR network as an example is described.
Article
Full-text available
This is the second half of a two part series devoted to the tabu search metastrategy for optimization problems. Part I introduced the fundamental ideas of tabu search as an approach for guiding other heuristics to overcome the limitations of local optimality, both in a deterministic and a probabilistic framework. Part I also reported successful applications from a wide range of settings, in which tabu search frequently made it possible to obtain higher quality solutions than previously obtained with competing strategies, generally with less computational effort. Part II, in this issue, examines refinements and more advanced aspects of tabu search. Following a brief review of notation, Part II introduces new dynamic strategies for managing tabu lists, allowing fuller exploitation of underlying evaluation functions. In turn, the elements of staged search and structured move sets are characterized, which bear on the issue of finiteness. Three ways of applying tabu search to the solution of integer programming problems are then described, providing connections also to certain nonlinear programming applications. Finally, the paper concludes with a brief survey of new applications of tabu search that have occurred since the developments reported in Part I. Together with additional comparisons with other methods on a wide body of problems, these include results of parallel processing implementations and the use of tabu search in settings ranging from telecommunications to neural networks. INFORMS Journal on Computing, ISSN 1091-9856, was published as ORSA Journal on Computing from 1989 to 1995 under ISSN 0899-1499.
Article
We propose an algorithm for combinatorial optimization where an explicit check for the repetition of configurations is added to the basic scheme of Tabu search. In our Tabu scheme the appropriate size of the list is learned in an automated way by reacting to the occurrence of cycles. In addition, if the search appears to be repeating an excessive number of solutions excessively often, then the search is diversified by making a number of random moves proportional to a moving average of the cycle length. The reactive scheme is compared to a “strict” Tabu scheme that forbids the repetition of configurations and to schemes with a fixed or randomly varying list size. From the implementation point of view we show that the Hashing or Digital Tree techniques can be used in order to search for repetitions in a time that is approximately constant. We present the results obtained for a series of computational tests on a benchmark function, on the 0-1 Knapsack Problem, and on the Quadratic Assignment Problem. INFORMS Journal on Computing, ISSN 1091-9856, was published as ORSA Journal on Computing from 1989 to 1995 under ISSN 0899-1499.