Computational Optimization and Applications, 29, 369–385, 2004

© 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.

Computing the Initial Temperature of Simulated Annealing

WALID BEN-AMEUR walid.benameur@int-evry.fr

GET/INT–CNRS/SAMOVAR, Institut National des Télécommunications, 9, rue Charles Fourier, 91011 Evry, France

Received May 6, 2003; Revised December 30, 2003

Abstract. The classical version of simulated annealing is based on a cooling schedule. Generally, the initial

temperature is set such that the acceptance ratio of bad moves is equal to a certain value χ0. In this paper, we first

propose a simple algorithm to compute a temperature which is compatible with a given acceptance ratio. Then,

we study the properties of the acceptance probability. It is shown that this function is convex for low temperatures

and concave for high temperatures. We also provide a lower bound for the number of plateaux of a simulated

annealing based on a geometric cooling schedule. Finally, many numerical experiments are reported.

Keywords: simulated annealing, initial temperature, acceptance ratio

Introduction

Simulated annealing is a general probabilistic local search algorithm, proposed 20 years ago

by Cerny [3] and Kirkpatrick et al. [10] to solve difﬁcult optimization problems. Many large

instances of practical difﬁcult problems were successfully solved by simulated annealing

(see, e.g., [2, 7–9]).

To use a simulated annealing algorithm, one has ﬁrst to deﬁne a set of solutions, generally

large, representing the solutions of an optimization problem. Then a neighborhood structure

is deﬁned. To ﬁnd a good solution we move from a solution to one of its neighbors in

accordance with a probabilistic criterion. If the cost decreases then the solution is changed and

the move is accepted. Otherwise, the move is accepted only with a probability depending on

the cost increase and a control parameter called temperature. Classically, the probability to

accept bad moves, i.e. moves with increase in terms of cost, is high at the beginning to allow

the algorithm to escape from local minima. This probability decreases progressively

by reducing the temperature. The method used to decrease the temperature is generally

called cooling schedule. The performance of the algorithm strongly depends on the choice

of the cooling schedule and the neighborhood structure.

Many theoretical papers focused on an optimal cooling schedule (see, e.g., [1, 4, 6, 12,

13]). One of the most important results may be the proof of optimality of a logarithmic

cooling schedule given in Hajek [6]. However, the number of iterations needed to guarantee

to find a global optimum is generally very large (see, e.g., [1]). The transition probability

$P_{ij}$ from state $i$ to state $j$ is defined as the product of a generation probability $G_{ij}$ and an

acceptance probability $A_{ij}$.

The acceptance probability considered in this paper is the one deﬁned by Metropolis

[11]:

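$$A_{ij} = \exp\left(-\frac{E_j - E_i}{T}\right) \ \text{if } E_j > E_i \quad \text{and} \quad A_{ij} = 1 \ \text{otherwise} \qquad (1)$$

where $T$ is the current temperature and $E_i$ (resp. $E_j$) is the energy of state $i$ (resp. $j$).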

A state is a solution of an optimization problem and energy is the cost function that has

to be minimized. We use energy and cost interchangeably.
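The Metropolis rule of Eq. (1) can be sketched in a few lines of Python. This is only an illustration: the function names `acceptance_probability` and `accept_move`, and the `rng` parameter, are assumptions made here and do not come from the paper.

```python
import math
import random


def acceptance_probability(e_i, e_j, temperature):
    """Metropolis acceptance probability A_ij of Eq. (1)."""
    if e_j > e_i:
        # Cost increase: accept with probability exp(-(E_j - E_i) / T).
        return math.exp(-(e_j - e_i) / temperature)
    # Cost decrease (or equal cost): the move is always accepted.
    return 1.0


def accept_move(e_i, e_j, temperature, rng=random):
    """Accept the move i -> j with probability A_ij.

    `rng` is an assumed source of uniform numbers in [0, 1).
    """
    return rng.random() < acceptance_probability(e_i, e_j, temperature)
```

For a fixed cost increase, the returned probability tends to 1 as the temperature grows and to 0 as it approaches 0, which is the behavior the cooling schedule exploits.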

We also assume that the homogeneous Markov chain representing the simulated annealing

at a given temperature $T$ is irreducible (i.e. all states can be reached from any other state

with a positive probability) and aperiodic (see, e.g., [1]). These conditions are generally

satisﬁed.

If we assume that the generation probabilities are symmetrical ($G_{ij} = G_{ji}$), the stationary distribution is nothing other than the Boltzmann distribution: $\pi_i = \frac{\exp(-E_i/T)}{\sum_j \exp(-E_j/T)}$.

Another generation strategy that is commonly used is given by

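$$G_{ij} = \begin{cases} \dfrac{1}{|N(i)|} & \text{if } j \in N(i) \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

where $N(i)$ is the set of neighbors of $i$. The stationary distribution is then given by

$$\pi_i = \frac{|N(i)| \exp\left(-\frac{E_i}{T}\right)}{\sum_j |N(j)| \exp\left(-\frac{E_j}{T}\right)} \qquad (3)$$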
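As a sanity check of Eq. (3), the following sketch builds the transition matrix $P_{ij} = G_{ij} A_{ij}$ for a tiny hand-made neighborhood graph and measures how far $\pi P$ is from $\pi$. The graph `NEIGHBORS`, the energies `ENERGY` and all function names are illustrative assumptions, not data from the paper; the neighborhood relation is assumed symmetric.

```python
import math


def stationary_distribution(neighbors, energy, temperature):
    """Distribution (3): pi_i proportional to |N(i)| exp(-E_i / T)."""
    weights = [len(neighbors[i]) * math.exp(-energy[i] / temperature)
               for i in range(len(energy))]
    total = sum(weights)
    return [w / total for w in weights]


def transition_matrix(neighbors, energy, temperature):
    """P_ij = G_ij * A_ij with G from Eq. (2) and A from Eq. (1)."""
    n = len(energy)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            generation = 1.0 / len(neighbors[i])
            if energy[j] > energy[i]:
                acceptance = math.exp(-(energy[j] - energy[i]) / temperature)
            else:
                acceptance = 1.0
            matrix[i][j] = generation * acceptance
        # Rejected moves leave the chain in state i.
        matrix[i][i] = 1.0 - sum(matrix[i])
    return matrix


def max_stationarity_error(neighbors, energy, temperature):
    """Largest component of |pi P - pi|; close to 0 if pi is stationary."""
    pi = stationary_distribution(neighbors, energy, temperature)
    p = transition_matrix(neighbors, energy, temperature)
    n = len(energy)
    return max(abs(sum(pi[i] * p[i][j] for i in range(n)) - pi[j])
               for j in range(n))


# Toy instance (assumption): a path 0 - 1 - 2 - 3 with arbitrary energies.
NEIGHBORS = [[1], [0, 2], [1, 3], [2]]
ENERGY = [0.0, 1.0, 0.5, 2.0]
```

Calling `max_stationarity_error(NEIGHBORS, ENERGY, 0.7)` should return a value at the level of floating-point round-off, because the chain satisfies detailed balance with respect to (3).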

As previously said, one of the most important properties of simulated annealing is its hill

climbing feature. This is achieved by accepting some increasing cost moves. Consequently,

the average probability of accepting these moves is very important to evaluate the ability

of simulated annealing to escape from local minima.

This acceptance ratio strongly depends on the temperature. To allow simulated annealing

to find good solutions, one has to carefully compute the initial temperature. This

parameter plays an important role in simulated annealing, but is of course only a piece of a

large puzzle. This paper will focus on this initial temperature and some other properties of

the acceptance ratio.

Many methods have been proposed in the literature to compute the initial temperature $T_0$. It is suggested in Kirkpatrick et al. [10] to take $T_0 = \Delta E_{\max}$ where $\Delta E_{\max}$ is the maximal cost difference between any two neighboring solutions.

Another scheme based on a more precise estimation of the cost distribution is proposed with multiple variants (see, e.g., [1, 16]). It is recommended to choose $T_0 = K\sigma_\infty^2$ where $K$ is a constant typically ranging from 5 to 10 and $\sigma_\infty^2$ is the second moment of the energy distribution when the temperature is $\infty$. $\sigma_\infty$ is estimated using a random generation of some solutions.

A more classical and intuitive method is described in Kirkpatrick et al. [10]. It consists

in computing a temperature such that the acceptance ratio is approximately equal to a given

value χ0. First, we choose a large initial temperature. Then, we have to perform a number of

transitions using this temperature. The ratio of accepted transitions is compared with χ0. If

it is less than χ0, then the temperature is multiplied by 2. The procedure continues until the

observed acceptance ratio exceeds χ0. Other variants are proposed to obtain an acceptance

ratio which is close to χ0. It is, for example, possible to divide the temperature by 3 if the

acceptance ratio is much higher than χ0. Using rules of this kind, cycles are avoided and a

good estimation of the temperature can be found.
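The doubling procedure described above can be sketched as follows. In practice the acceptance ratio would be measured by running a number of transitions at the given temperature; here `acceptance_ratio` is an assumed callable, and `toy_acceptance_ratio` is a stand-in invented for this illustration that averages $\exp(-\delta/T)$ over three made-up cost increases.

```python
import math


def doubling_temperature(acceptance_ratio, chi0, t_start, max_doublings=60):
    """Double the temperature until the observed acceptance ratio reaches chi0.

    `acceptance_ratio` is an assumed callable T -> observed ratio in [0, 1].
    """
    temperature = t_start
    for _ in range(max_doublings):
        if acceptance_ratio(temperature) >= chi0:
            return temperature
        temperature *= 2.0
    raise RuntimeError("target acceptance ratio not reached")


def toy_acceptance_ratio(temperature, deltas=(1.0, 2.0, 3.0)):
    """Stand-in for a measured ratio: mean of exp(-delta / T) over fixed deltas."""
    return sum(math.exp(-d / temperature) for d in deltas) / len(deltas)
```

Because the temperature can only be doubled, the result may overshoot the target noticeably, which is what motivates the variants that also divide the temperature.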

Another procedure is proposed in Johnson et al. [7, 8]. Temperature is obtained using the formula $T_0 = -\frac{\overline{\Delta E}}{\ln(\chi_0)}$, where $\overline{\Delta E}$ is an estimation of the cost increase of strictly positive transitions. This estimation is again obtained by randomly generating some transitions. Notice that $-\frac{\delta_t}{\ln(\chi_0)}$, where $\delta_t$ is the cost increase induced by a transition $t$, is the temperature allowing this transition to be accepted with a probability $\chi_0$. In other terms, $T_0 = -\frac{\overline{\Delta E}}{\ln(\chi_0)}$ is the average of these temperatures over a set of random transitions.
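A small sketch of this formula, under the assumption that the sampled cost increases are available as a list `deltas` (a name chosen here for illustration), also shows that it coincides with the average of the per-transition temperatures.

```python
import math


def average_increase_temperature(deltas, chi0):
    """T0 = -(mean cost increase) / ln(chi0), for strictly positive deltas."""
    mean_delta = sum(deltas) / len(deltas)
    return -mean_delta / math.log(chi0)


def per_transition_temperatures(deltas, chi0):
    """Temperature at which each transition alone is accepted with probability chi0."""
    return [-d / math.log(chi0) for d in deltas]
```

Averaging `per_transition_temperatures(deltas, chi0)` gives the same value as `average_increase_temperature(deltas, chi0)`, since the formula is linear in the cost increases.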

Finally, note that to accelerate the simulated annealing, a heuristic is sometimes used

to ﬁnd a good initial solution. Then, simulated annealing is applied with a low initial

temperature (see, e.g., [5, 7, 15]). An algorithm is provided by Varanelli [15] to compute

an initial temperature such that the expected cost of the best solution that can be found at

this temperature is approximately equal to the cost of the solution given by the heuristic.

A new algorithm to compute the initial temperature is given in this paper. The algorithm

is fast and accurate. It is presented in the next section. Its convergence is proved in Section 1.

Some other properties of the acceptance probability are presented in Section 2. Many

numerical experiments are reported and commented in Section 3. Finally, some concluding

remarks are given in Section 4.

1. An efﬁcient algorithm to compute the temperature

The initial temperature is often chosen such that the acceptance probability is approximately equal to a certain value, for example, 0.8 (see, e.g., [1]). Let $t$ be a strictly positive transition and let $\max_t$ (resp. $\min_t$) be the state after (resp. before) the transition. As we assumed that the transition is strictly positive, $E_{\max_t} > E_{\min_t}$. To simplify notation, we use $\delta_t$ to designate the cost difference $E_{\max_t} - E_{\min_t}$. Using the generation strategy (2), the acceptance probability is given by:

$$\chi(T) = \frac{\sum_{t\ \text{positive}} \pi_{\min_t} \frac{1}{|N(\min_t)|} \exp\left(-\frac{\delta_t}{T}\right)}{\sum_{t\ \text{positive}} \pi_{\min_t} \frac{1}{|N(\min_t)|}}. \qquad (4)$$

Note that $\pi_{\min_t} \frac{1}{|N(\min_t)|}$ represents the probability to generate a transition $t$ when the energy states are distributed in conformance with the stationary distribution (3). Moreover, $\exp(-\frac{\delta_t}{T})$ is the probability to accept a positive transition $t$. Thus, $\chi(T)$ is the conditional expectation of the acceptance of positive transitions.

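We will use an estimation $\hat\chi(T)$ of this acceptance probability based on a random set $S$ of positive transitions. $\hat\chi(T)$ is defined as follows:

$$\hat\chi(T) = \frac{\sum_{t\in S} \pi_{\min_t} \frac{1}{|N(\min_t)|} \exp\left(-\frac{\delta_t}{T}\right)}{\sum_{t\in S} \pi_{\min_t} \frac{1}{|N(\min_t)|}} = \frac{\sum_{t\in S} \exp\left(-\frac{E_{\max_t}}{T}\right)}{\sum_{t\in S} \exp\left(-\frac{E_{\min_t}}{T}\right)}. \qquad (5)$$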

Now, let us assume that we are looking for a temperature $T_0$ such that $\chi(T_0) = \chi_0$ where $\chi_0 \in\ ]0,1[$ is the wanted acceptance probability. We will propose a simple iterative method to compute such a temperature. In fact, we will consider $\hat\chi(T)$ instead of $\chi(T)$. First, we randomly generate a set of positive transitions $S$. This can be done, for example, by generating some states and a neighbor for each state. The energies $E_{\max_t}$ and $E_{\min_t}$ corresponding to the transitions of $S$ are stored. Then we choose a value $T_1$ for temperature. $T_1$ can be any positive number.

$T_1$ may be far from $T_0$. To find $T_0$ we use the recursive formula

$$T_{n+1} = T_n \left(\frac{\ln(\hat\chi(T_n))}{\ln(\chi_0)}\right)^{\frac{1}{p}} \qquad (6)$$

where $p$ is a real number $\geq 1$.

When $\hat\chi(T_n)$ becomes close to $\chi_0$ we can stop: $T_n$ is a good approximation of the wanted temperature $T_0$.

Please note that we use at each iteration the energy values previously stored. In other

words, we do not have to generate new transitions.

Before proving the convergence of our procedure, let us give a summary of the whole

process. $\epsilon$ denotes a small real number (e.g., $10^{-3}$).

Computing the temperature of simulated annealing

Step 1.

(a) Estimate the number of samples $|S|$ needed to compute $\hat\chi(T)$.

(b) Generate and store $|S|$ random positive transitions.

(c) Set $T_1$ to any strictly positive number and set $n = 1$.

Step 2.

(a) Compute $\hat\chi(T_n) = \frac{\sum_{t\in S} \exp(-E_{\max_t}/T_n)}{\sum_{t\in S} \exp(-E_{\min_t}/T_n)}$.

(b) If $|\hat\chi(T_n) - \chi_0| \leq \epsilon$, return $T_n$.

Otherwise

– $T_{n+1} = T_n \left(\frac{\ln(\hat\chi(T_n))}{\ln(\chi_0)}\right)^{\frac{1}{p}}$.

– $n = n + 1$.

– go to Step 2(a).

End.
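The procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: transitions are stored as pairs `(e_min, e_max)`, the names `chi_hat`, `compute_temperature`, `eps` and `max_iter` are chosen here, the iteration cap and the energy shift (used only to limit floating-point underflow) are additions that are not part of the paper, and when no `t1` is given the starting value is the average-increase formula from the introduction. `SAMPLE` is made-up data.

```python
import math


def chi_hat(transitions, temperature):
    """Estimate (5) from a list of (e_min, e_max) pairs with e_max > e_min."""
    # Shifting all energies by the same constant leaves the ratio unchanged
    # and limits underflow at low temperatures (implementation assumption).
    shift = min(e_min for e_min, _ in transitions)
    num = sum(math.exp(-(e_max - shift) / temperature) for _, e_max in transitions)
    den = sum(math.exp(-(e_min - shift) / temperature) for e_min, _ in transitions)
    return num / den


def compute_temperature(transitions, chi0, t1=None, p=1.0, eps=1e-3, max_iter=1000):
    """Iterate formula (6) until |chi_hat(T_n) - chi0| <= eps (Step 2)."""
    if t1 is None:
        # Assumed default start: T1 = -(mean cost increase) / ln(chi0).
        mean_delta = sum(e_max - e_min for e_min, e_max in transitions) / len(transitions)
        t1 = -mean_delta / math.log(chi0)
    temperature = t1
    for _ in range(max_iter):
        chi = chi_hat(transitions, temperature)
        if abs(chi - chi0) <= eps:
            return temperature
        if chi <= 0.0:
            raise RuntimeError("acceptance estimate underflowed; start from a higher T1")
        temperature *= (math.log(chi) / math.log(chi0)) ** (1.0 / p)
    raise RuntimeError("no convergence within max_iter; consider a larger p")


# Made-up sample of three positive transitions (assumption, for illustration only).
SAMPLE = [(0.0, 1.0), (1.0, 3.0), (2.0, 2.5)]
```

Note that the stored energies are reused at every iteration, so each step costs one pass over the sample and no new transitions are generated.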

Steps 1(a) and (b) will be discussed later.

As said before, the value of $T_1$ can be any strictly positive number. However, to slightly accelerate the whole process, we compute $T_1$ using the formula of Johnson et al. [7, 8] given in the introduction:

$$T_1 = -\frac{\sum_{t\in S} \delta_t}{|S| \ln(\chi_0)}. \qquad (7)$$

In the rest of this section, we ﬁrst prove under some assumptions the convergence of

the algorithm described above. Then we give some remarks about the sampling procedure

needed by the algorithm.

1.1. Algorithm convergence

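To show the convergence of the algorithm, we will prove that $T \mapsto T\left(\frac{\ln(\hat\chi(T))}{\ln(\chi_0)}\right)^{\frac{1}{p}}$ is a non decreasing function and $T \mapsto \hat\chi(T)$ is a strictly increasing function. This means that $T_0$ is a unique fixed point of the function $T \mapsto T\left(\frac{\ln(\hat\chi(T))}{\ln(\chi_0)}\right)^{\frac{1}{p}}$ and $\min(T_0, T_n) \leq T_{n+1} \leq \max(T_0, T_n)$.

Notice that if $T \mapsto T\left(\frac{\ln(\hat\chi(T))}{\ln(\chi_0)}\right)^{\frac{1}{p}}$ is a non decreasing function when $p = 1$, then it will have the same behavior for any $p \geq 1$. This can be seen by computing the derivative of the logarithm of this function: $\frac{1}{T} + \frac{1}{p} \frac{\hat\chi'(T)}{\hat\chi(T) \ln(\hat\chi(T))}$. If we assume that $\hat\chi'(T) \geq 0$, then $\frac{1}{T} + \frac{1}{p} \frac{\hat\chi'(T)}{\hat\chi(T) \ln(\hat\chi(T))}$ clearly increases when $p$ increases. Therefore, we will focus on $p = 1$.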

Before giving the proofs of the wanted results, we present a hypothesis that will be

used to simplify calculations.

Hypothesis 1.1. We assume that the energy levels $E_{\min_t}$ and the cost differences $\delta_t$ of the set of transitions $S$ are independent.

More precisely, given a temperature $T$, we assume that the positive transitions are generated in conformance with the equilibrium distribution. As we focus here on $S$, we consider the conditional distribution where the probability to generate a transition $t_0$ is given by $\frac{\pi_{\min_{t_0}} \frac{1}{|N(\min_{t_0})|}}{\sum_{t\in S} \pi_{\min_t} \frac{1}{|N(\min_t)|}}$. It is natural to assume that there is no correlation between $\{\delta_i, E_{\min_i}\}$ and $\{\delta_j, E_{\min_j}\}$ where $i$ and $j$ are two transitions of $S$ obtained by independent trials in conformance with the conditional equilibrium distribution. However, in Hypothesis 1.1 we also assume that $E_{\min_i}$ is independent of $\delta_i$. This assumption is less easy to understand. In fact, it depends on the distribution, which is related to temperature. Said another way, even if it is valid for some temperatures, it will be invalid for others. Note however that we do not need this assumption to be strictly satisfied. The convergence of the algorithm is obtained in almost all cases when $p = 1$. Moreover, it can be ensured by increasing the value of the parameter $p$. More details will be given at the end of this subsection.

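Lemma 1.2. Assuming Hypothesis 1.1 is valid, then we have

$$\frac{\sum_{i,j\in S,\ i<j} \exp\left(-\frac{E_{\min_i} + E_{\min_j}}{T}\right) \left(E_{\min_i} - E_{\min_j}\right) \left(\exp\left(-\frac{\delta_i}{T}\right) - \exp\left(-\frac{\delta_j}{T}\right)\right)}{\sum_{i,j\in S} \exp\left(-\frac{E_{\max_i} + E_{\min_j}}{T}\right) \delta_i} = 0.$$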

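Proof: Let $L$ (resp. $R$) be the numerator (resp. denominator) of the ratio given in the lemma. We want to show that $\frac{L}{R} = 0$. In fact, $L$ is nothing but

$$L = \frac{1}{2} \sum_{i,j\in S} \exp\left(-\frac{E_{\min_i} + E_{\min_j}}{T}\right) \left(E_{\min_i} - E_{\min_j}\right) \left(\exp\left(-\frac{\delta_i}{T}\right) - \exp\left(-\frac{\delta_j}{T}\right)\right).$$

Moreover, using formulas (3) and (2), the expectation of $(E_{\min_i} - E_{\min_j})(\exp(-\frac{\delta_i}{T}) - \exp(-\frac{\delta_j}{T}))$ is given by

$$\mathbb{E}\left(\left(E_{\min_i} - E_{\min_j}\right) \left(\exp\left(-\tfrac{\delta_i}{T}\right) - \exp\left(-\tfrac{\delta_j}{T}\right)\right) \,\Big|\, i,j\in S\right) = \sum_{i,j\in S} \frac{\exp\left(-\frac{E_{\min_i}}{T}\right)}{\sum_{k\in S} \exp\left(-\frac{E_{\min_k}}{T}\right)} \times \frac{\exp\left(-\frac{E_{\min_j}}{T}\right)}{\sum_{k\in S} \exp\left(-\frac{E_{\min_k}}{T}\right)} \times \left(E_{\min_i} - E_{\min_j}\right) \left(\exp\left(-\tfrac{\delta_i}{T}\right) - \exp\left(-\tfrac{\delta_j}{T}\right)\right).$$

Note that we used here the fact that the transitions of $S$ are independent. We obtain

$$L = \frac{1}{2} \left(\sum_{k\in S} \exp\left(-\frac{E_{\min_k}}{T}\right)\right)^2 \mathbb{E}\left(\left(E_{\min_i} - E_{\min_j}\right) \left(\exp\left(-\tfrac{\delta_i}{T}\right) - \exp\left(-\tfrac{\delta_j}{T}\right)\right) \,\Big|\, i,j\in S\right).$$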

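On the other hand,

$$\begin{aligned}
R &= \sum_{i,j\in S} \exp\left(-\frac{E_{\max_i} + E_{\min_j}}{T}\right) \delta_i \\
&= \sum_{j\in S} \exp\left(-\frac{E_{\min_j}}{T}\right) \sum_{i\in S} \exp\left(-\frac{E_{\max_i}}{T}\right) \delta_i \\
&= \left(\sum_{j\in S} \exp\left(-\frac{E_{\min_j}}{T}\right)\right)^2 \sum_{i\in S} \frac{\exp\left(-\frac{E_{\min_i}}{T}\right)}{\sum_{j\in S} \exp\left(-\frac{E_{\min_j}}{T}\right)} \exp\left(-\frac{\delta_i}{T}\right) \delta_i \\
&= \left(\sum_{j\in S} \exp\left(-\frac{E_{\min_j}}{T}\right)\right)^2 \mathbb{E}\left(\exp\left(-\tfrac{\delta_i}{T}\right) \delta_i \,\Big|\, i\in S\right).
\end{aligned}$$

Combination of the previous expressions related to $L$ and $R$ leads to

$$\frac{L}{R} = \frac{1}{2} \frac{\mathbb{E}\left(\left(E_{\min_i} - E_{\min_j}\right) \left(\exp\left(-\tfrac{\delta_i}{T}\right) - \exp\left(-\tfrac{\delta_j}{T}\right)\right) \,\Big|\, i,j\in S\right)}{\mathbb{E}\left(\exp\left(-\tfrac{\delta_i}{T}\right) \delta_i \,\Big|\, i\in S\right)}.$$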

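Now using Hypothesis 1.1, one can deduce that

$$\mathbb{E}\left(\left(E_{\min_i} - E_{\min_j}\right) \left(\exp\left(-\tfrac{\delta_i}{T}\right) - \exp\left(-\tfrac{\delta_j}{T}\right)\right) \,\Big|\, i,j\in S\right) = \mathbb{E}\left(E_{\min_i} - E_{\min_j} \,\Big|\, i,j\in S\right) \mathbb{E}\left(\exp\left(-\tfrac{\delta_i}{T}\right) - \exp\left(-\tfrac{\delta_j}{T}\right) \,\Big|\, i,j\in S\right).$$

Since $i$ and $j$ are drawn from the same distribution, $\mathbb{E}(E_{\min_i} - E_{\min_j} \mid i,j\in S) = 0$. Finally,

$$\frac{L}{R} = \frac{1}{2} \frac{\mathbb{E}\left(E_{\min_i} - E_{\min_j} \,\Big|\, i,j\in S\right) \mathbb{E}\left(\exp\left(-\tfrac{\delta_i}{T}\right) - \exp\left(-\tfrac{\delta_j}{T}\right) \,\Big|\, i,j\in S\right)}{\mathbb{E}\left(\exp\left(-\tfrac{\delta_i}{T}\right) \delta_i \,\Big|\, i\in S\right)} = 0$$

which means that $\frac{L}{R} = 0$.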

Note that it is possible to build a particular small example for which both Hypothesis 1.1

and Lemma 1.2 are not valid. However, our experimental results (Section 3) show that the

algorithm works very well in practice, and the convergence is obtained in almost all cases.

More details will be given at the end of this subsection.

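Proposition 1.3. Assuming Hypothesis 1.1 is valid, then the derivative of $\hat\chi(T)$ is given by:

$$\hat\chi'(T) = \frac{1}{T^2} \frac{\sum_{i\in S} \exp\left(-\frac{E_{\max_i}}{T}\right) \delta_i}{\sum_{i\in S} \exp\left(-\frac{E_{\min_i}}{T}\right)}.$$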

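Proof: Let us calculate $\hat\chi'(T)$. Writing $D = T^2 \left(\sum_{i\in S} \exp\left(-\frac{E_{\min_i}}{T}\right)\right)^2$ for the common denominator, we have

$$\begin{aligned}
\hat\chi'(T) &= \frac{1}{D} \left( \sum_{i\in S} E_{\max_i} \exp\left(-\frac{E_{\max_i}}{T}\right) \sum_{j\in S} \exp\left(-\frac{E_{\min_j}}{T}\right) - \sum_{i\in S} E_{\min_i} \exp\left(-\frac{E_{\min_i}}{T}\right) \sum_{j\in S} \exp\left(-\frac{E_{\max_j}}{T}\right) \right) \\
&= \frac{1}{D} \sum_{i,j} \exp\left(-\frac{E_{\max_i} + E_{\min_j}}{T}\right) \left(E_{\max_i} - E_{\min_j}\right) \\
&= \frac{1}{D} \sum_{i,j} \exp\left(-\frac{E_{\max_i} + E_{\min_j}}{T}\right) \left(E_{\min_i} - E_{\min_j} + \delta_i\right) \\
&= \frac{1}{D} \left( \sum_{i,j} \exp\left(-\frac{E_{\max_i} + E_{\min_j}}{T}\right) \left(E_{\min_i} - E_{\min_j}\right) + \sum_{i,j} \exp\left(-\frac{E_{\max_i} + E_{\min_j}}{T}\right) \delta_i \right) \\
&= \frac{1}{D} \left( \sum_{i<j} \exp\left(-\frac{E_{\min_i} + E_{\min_j}}{T}\right) \left(E_{\min_i} - E_{\min_j}\right) \left(\exp\left(-\frac{\delta_i}{T}\right) - \exp\left(-\frac{\delta_j}{T}\right)\right) + \sum_{i,j} \exp\left(-\frac{E_{\max_i} + E_{\min_j}}{T}\right) \delta_i \right).
\end{aligned}$$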

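Using Lemma 1.2, the previous expression becomes:

$$\hat\chi'(T) = \frac{\sum_{i,j} \exp\left(-\frac{E_{\max_i} + E_{\min_j}}{T}\right) \delta_i}{T^2 \left(\sum_{i\in S} \exp\left(-\frac{E_{\min_i}}{T}\right)\right)^2} = \frac{1}{T^2} \frac{\sum_{i\in S} \exp\left(-\frac{E_{\max_i}}{T}\right) \delta_i}{\sum_{i\in S} \exp\left(-\frac{E_{\min_i}}{T}\right)}.$$

Please note that even if Hypothesis 1.1 is not valid, we can be satisfied with a small value of the ratio $\frac{L}{R}$ of Lemma 1.2 to obtain a good approximate value of $\hat\chi'(T)$.

Proposition 1.3 tells us that $\hat\chi'(T) > 0$. To finish our proof of convergence, we have to show that $T \mapsto T \frac{\ln(\hat\chi(T))}{\ln(\chi_0)}$ is a non decreasing function.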

Proposition 1.4. Assuming Hypothesis 1.1 is valid, then $(T \ln(\hat\chi(T)))' \leq 0$.

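Proof: The derivative of $T \ln(\hat\chi(T))$ is given by $\ln(\hat\chi(T)) + T \frac{\hat\chi'(T)}{\hat\chi(T)}$.

Using expression (5), one can write:

$$\frac{1}{\hat\chi(T)} = \frac{\sum_{i\in S} \exp\left(-\frac{E_{\min_i}}{T}\right)}{\sum_{i\in S} \exp\left(-\frac{E_{\max_i}}{T}\right)} = \frac{\sum_{i\in S} \exp\left(-\frac{E_{\max_i}}{T}\right) \exp\left(\frac{\delta_i}{T}\right)}{\sum_{i\in S} \exp\left(-\frac{E_{\max_i}}{T}\right)} = \sum_{i\in S} \frac{\exp\left(-\frac{E_{\max_i}}{T}\right)}{\sum_{j\in S} \exp\left(-\frac{E_{\max_j}}{T}\right)} \exp\left(\frac{\delta_i}{T}\right).$$

By concavity of the logarithm, one can deduce that $\ln\left(\frac{1}{\hat\chi(T)}\right) \geq \sum_{i\in S} \frac{\exp(-E_{\max_i}/T)}{\sum_{j\in S} \exp(-E_{\max_j}/T)} \frac{\delta_i}{T}$. Said another way, we have

$$\ln(\hat\chi(T)) \leq \sum_{i\in S} \frac{\exp\left(-\frac{E_{\max_i}}{T}\right)}{\sum_{j\in S} \exp\left(-\frac{E_{\max_j}}{T}\right)} \left(-\frac{\delta_i}{T}\right).$$

On the other hand, using Proposition 1.3, we obtain:

$$T \frac{\hat\chi'(T)}{\hat\chi(T)} = \frac{1}{T} \frac{\sum_{i\in S} \exp\left(-\frac{E_{\max_i}}{T}\right) \delta_i}{\sum_{i\in S} \exp\left(-\frac{E_{\max_i}}{T}\right)}.$$

Combination of the previous two results leads to $(T \ln(\hat\chi(T)))' \leq 0$.

Propositions 1.3 and 1.4 clearly imply the convergence of the algorithm: $(T_n)_{n\in\mathbb{N}}$ is monotonic and bounded.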

Note that even if the results of this subsection are based on Hypothesis 1.1, they are useful in a general context. Let us give an insight into this point. First, to show that $\hat\chi(T)$ is an increasing function, we only need to have the ratio of Lemma 1.2 close to 0. In other terms, we do not really require Hypothesis 1.1 to be strictly satisfied. Second, we already said in the beginning of this subsection that the derivative of the logarithm of the function $T \mapsto T\left(\frac{\ln(\hat\chi(T))}{\ln(\chi_0)}\right)^{\frac{1}{p}}$ increases when $p$ increases. Said another way, if we get some convergence problems when $p = 1$ due to the inaccuracy of Hypothesis 1.1, we can sufficiently increase $p$ to allow $T \mapsto T\left(\frac{\ln(\hat\chi(T))}{\ln(\chi_0)}\right)^{\frac{1}{p}}$ to be an increasing function. Moreover, our experimental results (Section 3) show that in most cases $p = 1$ is sufficient. We needed to take $p = 2$ in about 1 run per 1000 to guarantee the convergence. However, to strictly guarantee the convergence, we can slightly modify the algorithm of Section 1. If an oscillation is detected (i.e., $(T_{n+1} - T_n)(T_n - T_{n-1}) < 0$) then we multiply $p$ by 2 and we continue the algorithm.
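The oscillation safeguard can be sketched as a small variation of the iteration. The code below is an illustration under assumptions: names such as `compute_temperature_safeguarded` are invented here, the iteration cap is an addition, and the choice to keep the step already computed when an oscillation is detected (rather than recomputing it with the new $p$) is one possible reading of "we continue the algorithm". `SAMPLE` is made-up data.

```python
import math


def chi_hat(transitions, temperature):
    """Estimate of the acceptance probability from (e_min, e_max) pairs."""
    shift = min(e_min for e_min, _ in transitions)  # avoids underflow; cancels in the ratio
    num = sum(math.exp(-(e_max - shift) / temperature) for _, e_max in transitions)
    den = sum(math.exp(-(e_min - shift) / temperature) for e_min, _ in transitions)
    return num / den


def compute_temperature_safeguarded(transitions, chi0, t1, p=1.0, eps=1e-3, max_iter=10000):
    """Iterate the update rule, doubling p whenever the temperatures oscillate."""
    t_prev = None
    t = t1
    for _ in range(max_iter):
        chi = chi_hat(transitions, t)
        if abs(chi - chi0) <= eps:
            return t, p
        t_next = t * (math.log(chi) / math.log(chi0)) ** (1.0 / p)
        # Oscillation test: (T_{n+1} - T_n)(T_n - T_{n-1}) < 0.
        if t_prev is not None and (t_next - t) * (t - t_prev) < 0.0:
            p *= 2.0  # damp the following updates
        t_prev, t = t, t_next
    raise RuntimeError("no convergence within max_iter")


# Made-up sample of positive transitions (assumption, for illustration only).
SAMPLE = [(0.0, 1.0), (1.0, 3.0), (2.0, 2.5)]
```

The function returns both the temperature and the final value of `p`, so a caller can see whether the safeguard was triggered.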

1.2. On the sampling procedure

The ﬁrst steps of the algorithm (1(a) and (b)) can be called the sampling procedure.

Even if the convergence of the algorithm is shown for a set $S$ of random transitions

satisfying Hypothesis 1.1 (and experimentally in Section 3), the set $S$ must be representative

to allow the algorithm to give a temperature which is close to the wanted temperature.

Obviously, the exact temperature is given when $S$ contains all positive transitions. However,

it is generally not possible to consider all transitions.

We will not give a deﬁnitive description of the sampling procedure: we think that it

depends on the nature and the size of the problem that we are solving.

One can, for example, begin with a small value of S, compute the temperature, and

increase the number of transitions until the temperature becomes stable.

It is also possible to use the temperature T1of Eq. (7) to perform a ﬁrst simulated annealing

plateau. All positive transitions considered during this plateau can be stored and then used

to compute a more accurate temperature using our algorithm.

Numerical experiments that will be presented in Section 3 are based, for each value

of $S$, on a random generation of independent transitions. Notice that when we use the

transitions encountered during a plateau, transitions may not be independent.

2. Other properties

More properties of the acceptance probability are given in this section.

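Proposition 2.1. Assuming Hypothesis 1.1 is valid, then $\hat\chi'(T) \leq \frac{1}{eT}$.

Proof: It was shown in Proposition 1.3 that $\hat\chi'(T) = \frac{1}{T^2} \frac{\sum_{i\in S} \exp(-E_{\max_i}/T) \delta_i}{\sum_{i\in S} \exp(-E_{\min_i}/T)}$. It implies that

$$\hat\chi'(T) = \frac{1}{T} \frac{\sum_{i\in S} \exp\left(-\frac{E_{\min_i}}{T}\right) \exp\left(-\frac{\delta_i}{T}\right) \frac{\delta_i}{T}}{\sum_{i\in S} \exp\left(-\frac{E_{\min_i}}{T}\right)}.$$

Moreover, the function $x \mapsto x \exp(-x)$ is bounded by $1/e$. Using this upper bound in the previous expression leads to the wanted result.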

An important straightforward corollary dealing with the evolution of the acceptance

probability is given below.

Corollary 2.2. Assuming Hypothesis 1.1 is valid, then $\hat\chi(T + \Delta T) - \hat\chi(T) \leq \frac{1}{e} \ln\left(1 + \frac{\Delta T}{T}\right)$.

Proof: A simple integration of the inequality $\hat\chi'(T) \leq \frac{1}{eT}$ gives the wanted result.

Using the fact that $\ln(1 + x) \leq x$, one can deduce that $\hat\chi(T + \Delta T) - \hat\chi(T) \leq \frac{1}{e} \frac{\Delta T}{T}$.

Corollary 2.2 implies that even if you divide the temperature by 2, you cannot expect to reduce the acceptance probability by more than $\frac{\ln(2)}{e} \approx 0.255$.
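For completeness, the integration step behind Corollary 2.2 and the halving example can be written out as follows, assuming $\Delta T \geq 0$ and using $u$ as an integration variable introduced here:

```latex
\[
\hat\chi(T+\Delta T)-\hat\chi(T)
  = \int_{T}^{T+\Delta T}\hat\chi'(u)\,du
  \le \int_{T}^{T+\Delta T}\frac{du}{e\,u}
  = \frac{1}{e}\ln\!\left(1+\frac{\Delta T}{T}\right).
\]
% Halving: apply the bound between T/2 and T, i.e. with step T/2 from T/2.
\[
\hat\chi(T)-\hat\chi\!\left(\tfrac{T}{2}\right)
  \le \frac{1}{e}\ln\!\left(1+\frac{T/2}{T/2}\right)
  = \frac{\ln 2}{e}\approx 0.255 .
\]
```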

It is also possible to use the previous corollary to have an indication about the number of iterations of a classical simulated annealing with a geometric cooling schedule. Assume that the temperature is multiplied by $\alpha < 1$ at the end of each plateau. In most cases, the initial temperature is chosen such that the acceptance probability of positive moves is equal to $\chi_0$. The stopping criterion can also be a low acceptance probability $\chi_f$. Using Corollary 2.2, one can easily show that the number of plateaux $N$ is higher than $\frac{e(\chi_0 - \chi_f)}{\ln(1/\alpha)}$.

Proposition 2.3. Assuming Hypothesis 1.1 is valid, then the number of plateaux is higher than $\frac{e(\chi_0 - \chi_f)}{\ln(1/\alpha)}$.

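Assume, for example, that $\chi_0 = 0.9$, $\chi_f = 0.05$ and $\alpha = 0.95$. The lower bound is then about 45.05, so at least 46 plateaux are needed. If $\alpha = 0.99$, the bound is about 229.9, so at least 230 plateaux are needed. More precisely, if $\alpha = 1 - \epsilon$ where $\epsilon \ll 1$, then the number of plateaux is approximately higher than $\frac{e(\chi_0 - \chi_f)}{\epsilon}$.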
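The bound of Proposition 2.3 is easy to evaluate numerically. The helper below is a sketch; the function name `plateaux_lower_bound` is an assumption made here.

```python
import math


def plateaux_lower_bound(chi0, chi_f, alpha):
    """Lower bound e * (chi0 - chi_f) / ln(1 / alpha) on the number of plateaux."""
    return math.e * (chi0 - chi_f) / math.log(1.0 / alpha)


def approximate_bound(chi0, chi_f, epsilon):
    """Approximation e * (chi0 - chi_f) / epsilon for alpha = 1 - epsilon, epsilon small."""
    return math.e * (chi0 - chi_f) / epsilon
```

The reasoning behind the bound is that each multiplication of the temperature by $\alpha$ can lower the acceptance probability by at most $\frac{1}{e}\ln(1/\alpha)$, so going from $\chi_0$ down to $\chi_f$ needs at least the returned number of plateaux.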

Note that one of the advantages of the upper bound $\frac{1}{eT}$ given in Proposition 2.1 is its independence of the energy. However, this upper bound is bad for low temperatures. In fact, one can easily see that $\hat\chi'(T) \approx \frac{C}{T^2} \exp\left(-\frac{\Delta}{T}\right)$ where $C$ is a constant depending on the energies and the transitions and $\Delta$ is the difference between the smallest $E_{\max_i}$ and the smallest $E_{\min_i}$. This clearly implies that $\hat\chi'(T)$ is approximately equal to 0 when $T$ is close to 0.

To finish our study of the acceptance probability, let us consider the second derivative $\hat\chi''(T)$.

First, another simple lemma will be stated.

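Lemma 2.4. Assuming Hypothesis 1.1 is valid, then

$$\frac{\sum_{i,j\in S,\ i<j} \exp\left(-\frac{E_{\min_i} + E_{\min_j}}{T}\right) \left(E_{\min_i} - E_{\min_j}\right) \left(\delta_i \exp\left(-\frac{\delta_i}{T}\right) - \delta_j \exp\left(-\frac{\delta_j}{T}\right)\right)}{\sum_{i,j\in S} \exp\left(-\frac{E_{\max_i} + E_{\min_j}}{T}\right) \delta_i^2} = 0.$$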

This lemma can be easily proved using the same kind of arguments as those given to

prove the validity of Lemma 1.2.

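Proposition 2.5. Assuming Hypothesis 1.1 is valid, then the second derivative is given by:

$$\hat\chi''(T) = \frac{1}{T^4} \frac{\sum_{i\in S} \exp\left(-\frac{E_{\max_i}}{T}\right) \delta_i (\delta_i - 2T)}{\sum_{i\in S} \exp\left(-\frac{E_{\min_i}}{T}\right)}.$$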

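Proof: A simple derivation of $T^2 \hat\chi'(T)$ using Proposition 1.3 gives the following:

$$\begin{aligned}
(T^2 \hat\chi'(T))' &= \frac{1}{T^2} \frac{\sum_{i\in S} E_{\max_i} \exp\left(-\frac{E_{\max_i}}{T}\right) \delta_i \sum_{j\in S} \exp\left(-\frac{E_{\min_j}}{T}\right) - \sum_{i\in S} E_{\min_i} \exp\left(-\frac{E_{\min_i}}{T}\right) \sum_{j\in S} \exp\left(-\frac{E_{\max_j}}{T}\right) \delta_j}{\left(\sum_{i\in S} \exp\left(-\frac{E_{\min_i}}{T}\right)\right)^2} \\
&= \frac{1}{T^2} \frac{\sum_{i,j\in S,\ i<j} \exp\left(-\frac{E_{\min_i} + E_{\min_j}}{T}\right) \left(E_{\min_i} - E_{\min_j}\right) \left(\delta_i \exp\left(-\frac{\delta_i}{T}\right) - \delta_j \exp\left(-\frac{\delta_j}{T}\right)\right)}{\left(\sum_{i\in S} \exp\left(-\frac{E_{\min_i}}{T}\right)\right)^2} \\
&\quad + \frac{1}{T^2} \frac{\sum_{i,j\in S} \exp\left(-\frac{E_{\max_i} + E_{\min_j}}{T}\right) \delta_i^2}{\left(\sum_{i\in S} \exp\left(-\frac{E_{\min_i}}{T}\right)\right)^2}.
\end{aligned}$$

Using Lemma 2.4, we obtain:

$$(T^2 \hat\chi'(T))' = \frac{1}{T^2} \frac{\sum_{i,j\in S} \exp\left(-\frac{E_{\max_i} + E_{\min_j}}{T}\right) \delta_i^2}{\left(\sum_{i\in S} \exp\left(-\frac{E_{\min_i}}{T}\right)\right)^2} = \frac{1}{T^2} \frac{\sum_{i\in S} \exp\left(-\frac{E_{\max_i}}{T}\right) \delta_i^2}{\sum_{i\in S} \exp\left(-\frac{E_{\min_i}}{T}\right)}.$$

Since $(T^2 \hat\chi'(T))' = 2T \hat\chi'(T) + T^2 \hat\chi''(T)$, using again Proposition 1.3 gives the wanted result.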

One can easily see that the expression given above is positive when $T$ is close to 0 and negative when $T$ is sufficiently high.

Corollary 2.6. Assuming Hypothesis 1.1 is valid, the probability to accept positive transitions is convex for low temperatures and concave for high temperatures.

Finally, we give here simple bounds for the second derivative $\hat\chi''(T)$.

Corollary 2.7. Assuming Hypothesis 1.1 is valid, then

$$\frac{1}{T^2} (2 - 2\sqrt{2}) \exp(\sqrt{2} - 2) \leq \hat\chi''(T) \leq \frac{1}{T^2} (2 + 2\sqrt{2}) \exp(-2 - \sqrt{2}).$$

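Proof: Proposition 2.5 tells us that $\hat\chi''(T) = \frac{1}{T^4} \frac{\sum_{i\in S} \exp(-E_{\max_i}/T) \delta_i (\delta_i - 2T)}{\sum_{i\in S} \exp(-E_{\min_i}/T)}$. Simple calculation leads to:

$$\hat\chi''(T) = \frac{1}{T^2} \frac{\sum_{i\in S} \exp\left(-\frac{E_{\min_i}}{T}\right) \exp\left(-\frac{\delta_i}{T}\right) \frac{\delta_i}{T} \left(\frac{\delta_i}{T} - 2\right)}{\sum_{i\in S} \exp\left(-\frac{E_{\min_i}}{T}\right)} = \frac{1}{T^2} \frac{\sum_{i\in S} \exp\left(-\frac{E_{\min_i}}{T}\right) f\left(\frac{\delta_i}{T}\right)}{\sum_{i\in S} \exp\left(-\frac{E_{\min_i}}{T}\right)} = \frac{1}{T^2} \mathbb{E}\left(f\left(\tfrac{\delta_i}{T}\right) \,\Big|\, i\in S\right)$$

where $f$ denotes the function $x \in \mathbb{R}^+ \mapsto x(x - 2)\exp(-x)$. One can easily see that the minimum of $f$ is obtained for $x = 2 - \sqrt{2}$ and the maximum is reached for $x = 2 + \sqrt{2}$. Thus, for any $x \geq 0$ we have $(2 - 2\sqrt{2}) \exp(\sqrt{2} - 2) \leq f(x) \leq (2 + 2\sqrt{2}) \exp(-2 - \sqrt{2})$.

Combination of these inequalities and the expression of $\hat\chi''(T)$ leads to the wanted result.

Notice that $(2 - 2\sqrt{2}) \exp(\sqrt{2} - 2) \approx -0.461$ and $(2 + 2\sqrt{2}) \exp(-2 - \sqrt{2}) \approx 0.159$.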
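The two constants can be checked numerically. The sketch below evaluates $f(x) = x(x-2)\exp(-x)$ at the two stationary points and verifies on a grid that no sampled value falls outside the claimed range; the grid limits and the names `X_MIN`, `X_MAX` and `bounds_hold_on_grid` are choices made for this illustration.

```python
import math


def f(x):
    """f(x) = x (x - 2) exp(-x), the function used in the proof of Corollary 2.7."""
    return x * (x - 2.0) * math.exp(-x)


X_MIN = 2.0 - math.sqrt(2.0)  # minimizer of f on [0, +inf)
X_MAX = 2.0 + math.sqrt(2.0)  # maximizer of f on [0, +inf)


def bounds_hold_on_grid(upper=50.0, steps=50000):
    """Check f(X_MIN) <= f(x) <= f(X_MAX) on a uniform grid of [0, upper]."""
    lo, hi = f(X_MIN), f(X_MAX)
    tol = 1e-12  # slack for floating-point round-off
    return all(lo - tol <= f(upper * k / steps) <= hi + tol
               for k in range(steps + 1))
```

Both stationary points are the roots of $-x^2 + 4x - 2$, which is the polynomial factor of $f'(x)$.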

3. Numerical experiments

To illustrate the results given in the previous sections, extensive numerical experiments are

carried out. Two kinds of problems are considered: random problems and traveling salesman

problems (TSP).

3.1. Random problems

A random problem is represented by a symmetric graph $G = (V, E)$ where $V$ is the set of vertices corresponding to the solutions of the problem, and $E$ is the set of edges representing the neighborhood relationship. Each solution (vertex) has a random cost. Simulated annealing is applied to find a minimum cost solution.

The graphs used to represent random problems are described using three parameters: the number of vertices $|V|$, the graph density $d = \frac{|E|}{|V|(|V| - 1)/2}$ and an upper bound $U$ for the maximum degree. Two sets of problems are considered in this section where $(|V|, d, U) = (5 \times 10^4, 10^{-4}, 30)$ in the first case and $(2 \times 10^6, 10^{-6}, 5)$ in the second one. Notice that these problems are small and can be solved by enumeration. However, the aim of this section is only to study the procedure proposed in this paper to compute the temperature of a simulated annealing.

Due to space limitation, we only give a summary of the results: more details will be

provided on Kluwer’s web site.

We consider 4 different values of the number of samples $S$: 20, 100, 500 and 2500. We also try 8 values of the acceptance probability $\chi_0$: 0.99, 0.9, 0.7, 0.5, 0.3, 0.1, 0.05 and 0.01. Positive transitions are randomly and independently generated. For each value of $\chi_0$ and $S$ the algorithm of Section 1 is used to give a temperature. Convergence was always obtained with $p = 1$ (Formula (6)).

Simulated annealing is then applied using the given temperature, without any decrease, to provide the experimental acceptance probability $\bar\chi$. We also apply simulated annealing using the temperature $T_1$ defined by Eq. (7) to obtain $\chi(T_1)$, defined as the experimental acceptance probability corresponding to $T_1$. Recall that this temperature is commonly used by simulated annealing practitioners.

All experiments are repeated 200 times (200 runs for each value of $\chi_0$ and $S$). Results are expressed in terms of average and standard deviation.

The ratios corresponding to Lemmas 1.2 and 2.4 are also considered here in order to check the validity of our hypothesis. We also focus on the number of iterations of the algorithm needed to compute the temperature. This number is zero if the temperature $T_1$ given by Eq. (7) obtained in Step 1(c) is the final result of the algorithm. The precision term $\epsilon$ used in the algorithm is here equal to $10^{-3}$.

To summarize, we focus on the average and the standard deviation of the following quantities: the experimental acceptance probability $\bar\chi$, the experimental acceptance probability obtained with $T_1$ of Eq. (7), the ratios corresponding to Lemmas 1.2 and 2.4 and the number of iterations of the algorithm of Section 1.

First, the algorithm used to compute the temperature converges at each run. The average and the standard deviation values corresponding to Lemmas 1.2 and 2.4 are generally low. They are not zero because Hypothesis 1.1 is not always valid. When both $\chi_0$ and $S$ are very low, Lemma 1.2 does not seem to be satisfied. In fact, the denominator of the fraction defined in Lemma 1.2 is close to 0 when $T$ is very low. Moreover, Hypothesis 1.1 is unlikely to be satisfied when $|S|$ is very small. Nevertheless, as previously said, convergence is always obtained, even if Hypothesis 1.1 is not satisfied.

The number of iterations needed by the algorithm to achieve convergence is small for high values of $\chi_0$. In fact, when $\chi_0$ is high, temperature $T_1$ seems to be a good one. This is shown by $\chi(T_1)$, which is very close to $\chi_0$ when $\chi_0$ is high.

$\chi(T_1)$ becomes far from $\chi_0$ when $\chi_0$ is low. We also observed that the standard deviation of $\chi(T_1)$ generally decreases when $S$ increases. However, the average value of $\chi(T_1)$ can be considered as stable. This is due to the fact that $T_1$ is based on the average of the cost variations. In any case, $\bar\chi$ is generally closer to $\chi_0$ than $\chi(T_1)$.

Moreover, our numerical experiments show that for high values of $\chi_0$, a small value of $S$ can be sufficient to obtain a temperature achieving the goal. However, if $\chi_0$ is low, we need higher values of $S$.

The difference between $\chi_0$ and $\bar\chi$ decreases when the problem size decreases. The problem size considered in the first case is smaller than the second problem size and the results are slightly better: the standard deviation of $\bar\chi$ is lower in the first case than in the second one. Said another way, when the problem size is larger, we may need more transitions to compute a temperature.

Table 1. T1 = 500 and p = 1.

 n    Tn         χ̂(Tn)
 1    500.0      0.9352
 2    48.3411    0.5114
 3    46.7738    0.5007
 4    46.6768    0.5000

Table 2. T1 = 5 and p = 1.

 n    Tn         χ̂(Tn)
 1    5.0        0.0461
 2    22.1953    0.2661
 3    42.3929    0.4687
 4    46.3526    0.4978
 5    46.6492    0.4999
 6    46.6687    0.5000

Table 3. T1 = 500 and p = 2.

 n    Tn         χ̂(Tn)
 1    500.0      0.9352
 2    155.4690   0.8067
 3    86.5412    0.6818
 4    64.3252    0.5993
 5    55.2274    0.5536
 6    51.0096    0.5286
 7    48.9224    0.5152
 8    47.8533    0.5081
 9    47.2957    0.5043
10    47.0020    0.5023
11    46.8464    0.5012
12    46.7639    0.5006
13    46.72      0.5003
14    46.6966    0.5002
15    46.6842    0.5000

Table 4. T1 = 5 and p = 2.

 n    Tn         χ̂(Tn)
 1    5.0        0.0461
 2    10.5345    0.1141
 3    18.6429    0.2208
 4    27.5214    0.3291
 5    34.8496    0.4043
 6    39.8314    0.4482
 7    42.8594    0.4722
 8    44.5904    0.4852
 9    45.5479    0.4921
10    46.0682    0.4958
11    46.3483    0.4976
12    46.4984    0.4988
13    46.5785    0.4994
14    46.6213    0.4997
15    46.6441    0.4998
16    46.6562    0.4999

Finally, some sequences of temperatures and acceptance ratios obtained by the algorithm are given in Tables 1–4. We intend to compute a temperature corresponding to an acceptance ratio $\chi_0 = 0.5$. Instead of using $T_1$, defined in Eq. (7), we take $T_1 = 500$ to perform the experiments of Tables 1 and 3, and $T_1 = 5$ for the experiments of Tables 2 and 4. The precision is here $10^{-4}$. Parameter $p$ used in the recursive formula (6) is equal to 1 in Tables 1 and 2. We take $p = 2$ in Tables 3 and 4.
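The first update in each table can be reproduced from the reported starting values with the recursive formula. In the sketch below the function name `next_temperature` is an assumption; since the tabulated acceptance ratios are rounded to four decimals, the recomputed temperatures can only be expected to agree with the tables up to that rounding.

```python
import math


def next_temperature(temperature, chi, chi0, p):
    """One step of the recursion: T * (ln(chi) / ln(chi0)) ** (1 / p)."""
    return temperature * (math.log(chi) / math.log(chi0)) ** (1.0 / p)


# Starting rows of the tables: (T1, acceptance ratio at T1), with chi0 = 0.5.
START_HIGH = (500.0, 0.9352)  # Tables 1 and 3
START_LOW = (5.0, 0.0461)     # Tables 2 and 4
```

For instance, `next_temperature(5.0, 0.0461, 0.5, 2)` evaluates to roughly 10.53, in line with the second row of Table 4, and the same call with `p=1` gives roughly 22.20, in line with Table 2.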

We can see that the convergence of the algorithm is slower for $p = 2$ than for $p = 1$. This can be easily understood from formula (6). However, recall that when $p$ increases then the derivative of the function $T \mapsto T\left(\frac{\ln(\hat\chi(T))}{\ln(\chi_0)}\right)^{\frac{1}{p}}$ is more likely to be positive (Section 1.1). Said another way, as we know that Hypothesis 1.1 is not always valid, it may be more advisable to take $p > 1$. Although we did not need to take $p > 1$ to compute the temperature in the case of these random problems, we will see in the next section that this may be necessary in very few cases.

3.2. Traveling salesman problems

The algorithm of Section 1 is applied here for the traveling salesman problem. We consider transitions based on the very classical 2-OPT moves (see, for example [9]). Two sets of randomly generated Euclidean instances are used: 20-city and 100-city problems. Notice that these problems can now be solved by efficient cutting plane algorithms.

Figure 1. Evolution of χ(T) for TSP(100). [Plot of the acceptance probability (vertical axis, 0 to 1) against temperature (horizontal axis, 0.35 to 293).]

Three values of Sare considered: 20, 2500 and 62500. Eight values of χ0are used:

0.99, 0.9, 0.7, 0.5, 0.3, 0.1, 0.05 and 0.01. 200 experiments are performed for each value of

χ0and S.

Comments given in the previous subsection about random problems are still valid here.

However, we noticed that S=2500 was sufﬁcient to give a good approximation of the

temperature in the previous case, but does not seem to be sufﬁcient here for some values

of χ0.Inother words, when the size of problems increases, we need larger size samples to

obtain a good approximation of the temperature.

The procedure used to compute the temperature does not converge in about 1 run per

1000 when p=1. If p=2, the algorithm always converges. In fact, when the algorithm

is applied, it is easy to check whether there is an oscillation in terms of temperature. In this

case, we multiply pby 2 and we continue the algorithm.

The experimental acceptance ratio is plotted as a function of temperature. Figure 1 shows the graph for the 100-city problem. This ratio is, as claimed in Corollary 2.6, convex for low temperatures and concave for high temperatures.

Finally, we studied the experimental number of plateaux of simulated annealing when a geometric cooling schedule (α = 0.95) is used. Different values of the initial acceptance ratio χ0 and the final acceptance ratio χf are considered. The number of plateaux is compared with the lower bound of Proposition 2.3. This lower bound seems to be good for intermediate acceptance ratios and bad for extreme acceptance ratios (either very low or very high).
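To make the quantity being measured concrete, here is a minimal sketch that counts the temperature reductions of a geometric schedule between an initial temperature T0 and the first temperature at which the estimated acceptance ratio falls to χf or below. It works on a sample-based estimate of the ratio (assumed to be of the form used in Section 1) and not on actual annealing runs, and it does not reproduce the bound of Proposition 2.3. The names `chi_hat` and `count_plateaux` are hypothetical.

```python
import math


def chi_hat(transitions, T):
    """Estimated acceptance ratio of cost-increasing moves at temperature T."""
    base = min(e_min for e_min, _ in transitions)  # common shift, avoids underflow
    num = sum(math.exp(-(e_max - base) / T) for _, e_max in transitions)
    den = sum(math.exp(-(e_min - base) / T) for e_min, _ in transitions)
    return num / den


def count_plateaux(transitions, T0, chi_f, alpha=0.95, max_plateaux=100_000):
    """Number of multiplications by alpha needed to bring chi_hat down to chi_f."""
    T, n = T0, 0
    while chi_hat(transitions, T) > chi_f:
        T *= alpha
        n += 1
        if n >= max_plateaux:
            raise RuntimeError("final acceptance ratio not reached")
    return n


# Example: a single transition with cost increase 1, so chi(T) = exp(-1/T)
# and the temperature with chi(T0) = 0.9 is known in closed form.
one_move = [(0.0, 1.0)]
T0 = -1.0 / math.log(0.9)
n_plateaux = count_plateaux(one_move, T0, chi_f=0.1)
```

In this single-transition case the count can be checked by hand, since χ(T) ≤ χf exactly when T ≤ −1/ln χf; with several transitions the count depends on the whole sample.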


4. Conclusion

A simple algorithm is proposed to compute a temperature such that the acceptance ratio of increasing cost moves is equal to a given value χ0. We also presented some properties of the acceptance ratio.

We think that this algorithm can be used as a component of either classical or modern simulated annealing schemes for which the cooling schedule is not necessarily monotonic.

The procedure proposed in this paper can be modified in different ways. First, the formula linking T_n and T_{n+1} can be changed: even though the algorithm is very fast, another formula may allow faster convergence. Second, we assumed in this paper that transitions are accepted in accordance with the Metropolis criterion. A further research direction would be to introduce some modifications and study the convergence of the algorithm when other acceptance probabilities are considered.

Finally, we considered the acceptance ratio of positive transitions. However, one may want to focus on the acceptance of all transitions. A similar algorithm that computes a temperature compatible with a given acceptance probability of all transitions is now under study.

Acknowledgment

I would like to thank an anonymous referee for his valuable comments.

References

1. E. Aarts, J. Korst, and P. van Laarhoven, “Simulated annealing,” in Local Search in Combinatorial Optimization, E.H.L. Aarts and J.K. Lenstra (Eds.), John Wiley and Sons, Ltd., 1997, pp. 91–120.

2. E. Bonomi and J.-L. Lutton, “The N-city traveling salesman problem: Statistical mechanics and the Metropolis algorithm,” SIAM Review, vol. 26, pp. 551–568, 1984.

3. V. Cerny, “A thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm,” Journal of Optimization Theory and Applications, vol. 45, pp. 41–51, 1985.

4. H. Cohn and M. Fielding, “Simulated annealing: Searching for an optimal temperature schedule,” SIAM J.

Optim, vol. 3, pp. 779–802, 1999.

5. L. Grover, “A new simulated annealing algorithm for standard cell placement,” in Proc. IEEE ICCAD-86,

Santa Clara, CA, 1986.

6. B. Hajek, “Cooling schedules for optimal annealing,” Math. Oper. Res., vol. 13, pp. 311–329, 1988.

7. D.S. Johnson, C.R. Aragon, L.A. McGeoch, and C. Schevon, “Optimization by simulated annealing: An

experimental evaluation; part I, graph partitioning,” Operations Research, vol. 37, pp. 865–892, 1989.

8. D.S. Johnson, C.R. Aragon, L.A. McGeoch, and C. Schevon, “Optimization by simulated annealing: An

experimental evaluation; part II, graph coloring and number partitioning,” Operations Research, vol. 39, pp.

378–406, 1991.

9. D.S. Johnson and L.A. McGeoch, “The traveling salesman problem: A case study,” in Local Search in

Combinatorial Optimization, E.H.L. Aarts and J.K. Lenstra (Eds.), John Wiley and Sons, Ltd., 1997, pp.

215–310.

10. S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi, “Optimization by simulated annealing,” Science, vol. 220, pp.

671–680, 1983.

11. N.A. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller, “Equation of state calculations by

fast computing machines,” J. Chem. Phys, vol. 21, pp. 1087–1092, 1953.


12. D. Mitra, F. Romeo, and A.L. Sangiovanni-Vincentelli, “Convergence and finite-time behavior of simulated annealing,” Advances in Applied Probability, vol. 18, pp. 747–771, 1986.

13. F. Romeo and A.L. Sangiovanni-Vincentelli, “A theoretical framework for simulated annealing,” Algorithmica, vol. 6, pp. 302–345, 1991.

14. P. Van Laarhoven and E. Aarts, Simulated Annealing: Theory and Applications, D. Reidel Publishing Company, 1988.

15. J. Varanelli, “On the acceleration of simulated annealing,” PhD Dissertation, University of Virginia, 1996.

16. S. White, “Concepts of scale in simulated annealing,” in Proc. IEEE Int. Conference on Computer Design,

Port Chester, 1984.