
Computing the Initial Temperature of Simulated Annealing

Computational Optimization and Applications, 29, 369–385, 2004
© 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.
GET/INT—CNRS/SAMOVAR, Institut National des Télécommunications,
9, rue Charles Fourier, 91011 Evry, France
Received May 6, 2003; Revised December 30, 2003
Abstract. The classical version of simulated annealing is based on a cooling schedule. Generally, the initial temperature is set such that the acceptance ratio of bad moves is equal to a certain value $\chi_0$. In this paper, we first propose a simple algorithm to compute a temperature which is compatible with a given acceptance ratio. Then, we study the properties of the acceptance probability. It is shown that this function is convex for low temperatures and concave for high temperatures. We also provide a lower bound for the number of plateaux of a simulated annealing based on a geometric cooling schedule. Finally, many numerical experiments are reported.
Keywords: simulated annealing, initial temperature, acceptance ratio
Simulated annealing is a general probabilistic local search algorithm, proposed 20 years ago
by Cerny [3] and Kirkpatrick et al. [10] to solve difficult optimization problems. Many large instances of difficult practical problems have been successfully solved by simulated annealing (see, e.g., [2, 7–9]).
To use a simulated annealing algorithm, one first has to define a set of solutions, generally large, representing the solutions of an optimization problem. Then a neighborhood structure is defined. To find a good solution, we move from a solution to one of its neighbors in accordance with a probabilistic criterion. If the cost decreases, then the solution is changed and the move is accepted. Otherwise, the move is accepted only with a probability depending on the cost increase and on a control parameter called temperature. Classically, the probability of accepting bad moves, i.e. moves that increase the cost, is high at the beginning to allow the algorithm to escape from local minima. This probability is decreased progressively by reducing the temperature. The method used to decrease the temperature is generally called the cooling schedule. The performance of the algorithm strongly depends on the choice of the cooling schedule and of the neighborhood structure.
Many theoretical papers have focused on optimal cooling schedules (see, e.g., [1, 4, 6, 12, 13]). One of the most important results may be the proof of optimality of a logarithmic cooling schedule given in Hajek [6]. However, the number of iterations needed to guarantee finding a global optimum is generally very large (see, e.g., [1]). The transition probability $P_{ij}$ from state $i$ to state $j$ is defined as the product of a generation probability $G_{ij}$ and an acceptance probability $A_{ij}$.
The acceptance probability considered in this paper is the one defined by Metropolis et al. [11]:
$$A_{ij} = \exp\left(-\frac{E_j - E_i}{T}\right) \text{ if } E_j > E_i, \qquad A_{ij} = 1 \text{ otherwise}, \qquad (1)$$
where $T$ is the current temperature and $E_i$ (resp. $E_j$) is the energy of state $i$ (resp. $j$).
A state is a solution of the optimization problem, and energy is the cost function to be minimized. We use the terms energy and cost interchangeably.
We also assume that the homogeneous Markov chain representing the simulated annealing at a given temperature $T$ is irreducible (i.e. every state can be reached from any other state with a positive probability) and aperiodic (see, e.g., [1]). These conditions are generally satisfied.
If we assume that the generation probabilities are symmetrical ($G_{ij} = G_{ji}$), the stationary distribution is nothing other than the Boltzmann distribution:
$$\pi_i = \frac{\exp(-E_i/T)}{\sum_j \exp(-E_j/T)}.$$
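Another generation strategy that is commonly used is given by
$$G_{ij} = \begin{cases} \dfrac{1}{|N(i)|} & \text{if } j \in N(i) \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$
where $N(i)$ is the set of neighbors of $i$. The stationary distribution is then given by
$$\pi_i = \frac{|N(i)| \exp(-E_i/T)}{\sum_j |N(j)| \exp(-E_j/T)}. \qquad (3)$$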
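To make Eqs. (1)–(3) concrete, the following Python sketch builds the transition matrix $P_{ij} = G_{ij} A_{ij}$ for a small hypothetical neighborhood graph (an 8-vertex ring plus random chords, with random energies; both are assumptions for illustration) and checks numerically that the distribution (3) is left invariant by $P$. The helper names are illustrative, not taken from the paper.

```python
import math
import random

def metropolis_acceptance(e_i, e_j, T):
    """A_ij of Eq. (1)."""
    return math.exp(-(e_j - e_i) / T) if e_j > e_i else 1.0

def transition_matrix(neighbors, energy, T):
    """P_ij = G_ij * A_ij with G_ij = 1/|N(i)| for j in N(i), as in Eq. (2)."""
    n = len(energy)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            P[i][j] = metropolis_acceptance(energy[i], energy[j], T) / len(neighbors[i])
        P[i][i] = 1.0 - sum(P[i])  # rejected moves leave the chain in state i
    return P

def stationary_distribution(neighbors, energy, T):
    """pi_i proportional to |N(i)| exp(-E_i/T), Eq. (3)."""
    weights = [len(neighbors[i]) * math.exp(-energy[i] / T) for i in range(len(energy))]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical instance: 8 states on a ring plus random chords (symmetric, no self-loops).
rng = random.Random(0)
n = 8
adjacency = [set() for _ in range(n)]
for i in range(n):
    adjacency[i].add((i + 1) % n)
    adjacency[(i + 1) % n].add(i)
for _ in range(6):
    a, b = rng.sample(range(n), 2)
    adjacency[a].add(b)
    adjacency[b].add(a)
neighbors = [sorted(s) for s in adjacency]
energy = [rng.uniform(0.0, 10.0) for _ in range(n)]

T = 2.0
P = transition_matrix(neighbors, energy, T)
pi = stationary_distribution(neighbors, energy, T)
pi_P = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
print(max(abs(x - y) for x, y in zip(pi, pi_P)))
```

The printed gap should be at floating-point rounding level: with unequal degrees $|N(i)|$, the factor $|N(i)|$ in (3) is exactly what compensates the non-symmetric generation probabilities.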
As previously said, one of the most important properties of simulated annealing is its hill-climbing feature. This is achieved by accepting some cost-increasing moves. Consequently, the average probability of accepting these moves is very important to evaluate the ability of simulated annealing to escape from local minima.
This acceptance ratio strongly depends on the temperature. To allow simulated annealing to find good solutions, one has to compute the initial temperature carefully. This parameter plays an important role in simulated annealing, but it is of course only one piece of a large puzzle. This paper focuses on the initial temperature and on some other properties of the acceptance ratio.
Many methods have been proposed in the literature to compute the initial temperature $T_0$. It is suggested in Kirkpatrick et al. [10] to take $T_0 = \Delta E_{\max}$, where $\Delta E_{\max}$ is the maximal cost difference between any two neighboring solutions.
Another scheme, based on a more precise estimation of the cost distribution, is proposed with multiple variants (see, e.g., [1, 16]). It is recommended to choose $T_0 = K\sigma_\infty^2$, where $K$ is a constant typically ranging from 5 to 10 and $\sigma_\infty^2$ is the second moment of the energy distribution when the temperature is $\infty$. $\sigma_\infty$ is estimated using a random generation of some solutions.
A more classical and intuitive method is described in Kirkpatrick et al. [10]. It consists of computing a temperature such that the acceptance ratio is approximately equal to a given value $\chi_0$. First, we choose a large initial temperature. Then, we perform a number of transitions using this temperature. The ratio of accepted transitions is compared with $\chi_0$. If it is less than $\chi_0$, then the temperature is multiplied by 2. The procedure continues until the observed acceptance ratio exceeds $\chi_0$. Other variants have been proposed to obtain an acceptance ratio that is close to $\chi_0$. It is, for example, possible to divide the temperature by 3 if the acceptance ratio is much higher than $\chi_0$. Using rules of this kind, cycles are avoided and a good estimate of the temperature can be found.
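A minimal sketch of the doubling rule just described, assuming a callable `acceptance_ratio(T)` that returns the observed ratio of accepted cost-increasing moves at temperature $T$ (hypothetical: in practice it comes from performing transitions at $T$). For a self-contained run, the example replaces the simulation with an equally weighted average of $\exp(-\delta/T)$ over a few made-up cost increases.

```python
import math

def doubling_initial_temperature(acceptance_ratio, chi0, t_start, max_doublings=60):
    """Multiply T by 2 until the observed acceptance ratio reaches chi0."""
    t = t_start
    for _ in range(max_doublings):
        if acceptance_ratio(t) >= chi0:
            return t
        t *= 2.0
    raise RuntimeError("acceptance ratio never reached chi0")

# Toy stand-in for the observed ratio (an assumption): each listed cost increase
# is accepted with probability exp(-delta/T), and all moves are weighted equally.
deltas = [1.0, 2.0, 3.0, 4.0, 5.0]

def toy_ratio(T):
    return sum(math.exp(-d / T) for d in deltas) / len(deltas)

t0 = doubling_initial_temperature(toy_ratio, chi0=0.8, t_start=1.0)
print(t0, toy_ratio(t0))
```

Because the temperature only doubles, the result can overshoot the temperature giving exactly $\chi_0$ by up to a factor of 2, which is why the variants mentioned above also allow decreasing steps.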
Another procedure is proposed in Johnson et al. [7, 8]. The temperature is obtained using the formula
$$T_0 = -\frac{\overline{\Delta E}}{\ln(\chi_0)},$$
where $\overline{\Delta E}$ is an estimate of the average cost increase of strictly positive transitions. This estimate is again obtained by randomly generating some transitions. Notice that $-\delta_t/\ln(\chi_0)$, where $\delta_t$ is the cost increase induced by a transition $t$, is the temperature at which this transition is accepted with probability $\chi_0$. In other words, $T_0 = -\overline{\Delta E}/\ln(\chi_0)$ is the average of these temperatures over a set of random transitions.
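The remark above can be checked in a few lines. This sketch computes Johnson et al.'s temperature from a list of sampled cost increases and compares it with the average of the per-transition temperatures $-\delta_t/\ln(\chi_0)$; the sample values and function names are invented for illustration.

```python
import math

def johnson_temperature(deltas, chi0):
    """T0 = -mean(delta) / ln(chi0) over sampled strictly positive cost increases."""
    return -(sum(deltas) / len(deltas)) / math.log(chi0)

def per_move_temperature(delta, chi0):
    """Temperature at which a move with cost increase delta is accepted with probability chi0."""
    return -delta / math.log(chi0)

deltas = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical sampled cost increases
chi0 = 0.8
t0 = johnson_temperature(deltas, chi0)
per_move = [per_move_temperature(d, chi0) for d in deltas]
print(t0, sum(per_move) / len(per_move))  # equal up to rounding
```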
Finally, note that to accelerate the simulated annealing, a heuristic is sometimes used
to find a good initial solution. Then, simulated annealing is applied with a low initial
temperature (see, e.g., [5, 7, 15]). An algorithm is provided by Varanelli [15] to compute
an initial temperature such that the expected cost of the best solution that can be found at
this temperature is approximately equal to the cost of the solution given by the heuristic.
A new algorithm to compute the initial temperature is given in this paper. The algorithm is fast and accurate. It is presented in Section 1, where its convergence is also proved (Section 1.1). Some other properties of the acceptance probability are presented in Section 2. Many numerical experiments are reported and commented on in Section 3. Finally, some concluding remarks are given in Section 4.
1. An efficient algorithm to compute the temperature
The initial temperature is often chosen such that the acceptance probability is approximately equal to a certain value, for example, 0.8 (see, e.g., [1]). Let $t$ be a strictly positive transition and let $\max_t$ (resp. $\min_t$) be the state after (resp. before) the transition. As we assumed that the transition is strictly positive, $E_{\max_t} > E_{\min_t}$. To simplify notation, we use $\delta_t$ to designate the cost difference $E_{\max_t} - E_{\min_t}$. Using the generation strategy (2), the acceptance probability is given by:
$$\chi(T) = \frac{\sum_{t \text{ positive}} \frac{\pi_{\min_t}}{|N(\min_t)|} \exp\left(-\frac{\delta_t}{T}\right)}{\sum_{t \text{ positive}} \frac{\pi_{\min_t}}{|N(\min_t)|}} \qquad (4)$$
Note that $\pi_{\min_t}/|N(\min_t)|$ represents the probability of generating a transition $t$ when the energy states are distributed according to the stationary distribution (3). Moreover, $\exp(-\delta_t/T)$ is the probability of accepting a positive transition $t$. Thus, $\chi(T)$ is the conditional expectation of the acceptance of positive transitions.
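We will use an estimate $\hat\chi(T)$ of this acceptance probability based on a random set $S$ of positive transitions. $\hat\chi(T)$ is defined as follows:
$$\hat\chi(T) = \frac{\sum_{t \in S} \frac{\pi_{\min_t}}{|N(\min_t)|} \exp\left(-\frac{\delta_t}{T}\right)}{\sum_{t \in S} \frac{\pi_{\min_t}}{|N(\min_t)|}} = \frac{\sum_{t \in S} \exp\left(-\frac{E_{\max_t}}{T}\right)}{\sum_{t \in S} \exp\left(-\frac{E_{\min_t}}{T}\right)} \qquad (5)$$
The second equality holds because, by (3), $\pi_{\min_t}/|N(\min_t)|$ is proportional to $\exp(-E_{\min_t}/T)$, and $E_{\min_t} + \delta_t = E_{\max_t}$.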
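Evaluating (5) term by term can underflow when the energies are large compared with $T$ (every $\exp(-E/T)$ rounds to 0). A sketch that computes $\hat\chi(T)$ in log space, using the standard log-sum-exp rewrite (our choice, not something prescribed by the paper); `e_min` and `e_max` are assumed to hold the stored energies $E_{\min_t}$ and $E_{\max_t}$ for $t \in S$.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def chi_hat(T, e_min, e_max):
    """Estimate (5): sum_t exp(-Emax_t/T) / sum_t exp(-Emin_t/T), evaluated in log space."""
    log_num = log_sum_exp([-e / T for e in e_max])
    log_den = log_sum_exp([-e / T for e in e_min])
    return math.exp(log_num - log_den)

# Energies around 1000 at T = 0.5 would underflow exp() if evaluated directly.
print(chi_hat(0.5, e_min=[1000.0, 1001.0], e_max=[1001.0, 1003.0]))
```

When all $E_{\min_t}$ are equal, (5) reduces to the plain average of $\exp(-\delta_t/T)$; otherwise, transitions leaving low-energy states weigh more, as prescribed by the stationary distribution (3).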
Now, let us assume that we are looking for a temperature $T_0$ such that $\chi(T_0) = \chi_0$, where $\chi_0 \in\, ]0,1[$ is the desired acceptance probability. We propose a simple iterative method to compute such a temperature. In fact, we will consider $\hat\chi(T)$ instead of $\chi(T)$. First, we randomly generate a set $S$ of positive transitions. This can be done, for example, by generating some states and a neighbor for each state. The energies $E_{\max_t}$ and $E_{\min_t}$ corresponding to the states of the set $S$ are stored. Then we choose a value $T_1$ for the temperature. $T_1$ can be any positive number.
$T_1$ may be far from $T_0$. To find $T_0$ we use the recursive formula
$$T_{n+1} = T_n \left(\frac{\ln(\hat\chi(T_n))}{\ln(\chi_0)}\right)^{1/p} \qquad (6)$$
where $p$ is a real number $\geq 1$.
When $\hat\chi(T_n)$ becomes close to $\chi_0$, we can stop: $T_n$ is a good approximation of the desired temperature $T_0$.
Note that at each iteration we reuse the energy values previously stored. In other words, we do not have to generate new transitions.
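Before proving the convergence of our procedure, let us give a summary of the whole process. $\varepsilon$ denotes a small real number (e.g., $10^{-3}$).
Computing the temperature of simulated annealing
Step 1.
(a) Estimate the number of samples $|S|$ needed to compute $\hat\chi(T)$.
(b) Generate and store $|S|$ random positive transitions.
(c) Set $T_1$ to any strictly positive number and set $n = 1$.
Step 2.
(a) Compute $\hat\chi(T_n) = \sum_{t \in S} \exp(-E_{\max_t}/T_n) \big/ \sum_{t \in S} \exp(-E_{\min_t}/T_n)$.
(b) If $|\hat\chi(T_n) - \chi_0| \leq \varepsilon$, return $T_n$.
(c) Otherwise:
– $T_{n+1} = T_n \left(\ln(\hat\chi(T_n))/\ln(\chi_0)\right)^{1/p}$;
– $n = n + 1$;
– go to Step 2(a).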
Steps 1(a) and (b) will be discussed later.
As said before, the value of $T_1$ can be any strictly positive number. However, to slightly accelerate the whole process, we compute $T_1$ using the formula of Johnson et al. [7, 8] given in the introduction:
$$T_1 = -\frac{\sum_{t \in S} \delta_t}{|S| \ln(\chi_0)} \qquad (7)$$
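Putting Steps 1(c) and 2 together, here is a minimal Python sketch of the procedure, assuming the stored energies of Step 1(b) are given as two lists and starting from $T_1$ of Eq. (7). The names `chi_hat` and `compute_temperature`, the `max_iter` guard and the synthetic sample in the usage lines are illustrative additions, not part of the original description.

```python
import math
import random

def _log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def chi_hat(T, e_min, e_max):
    """Estimate (5) of the acceptance probability of positive transitions."""
    return math.exp(_log_sum_exp([-e / T for e in e_max])
                    - _log_sum_exp([-e / T for e in e_min]))

def compute_temperature(e_min, e_max, chi0, p=1.0, eps=1e-3, max_iter=1000):
    """Step 2 of the procedure, started from T1 of Eq. (7).

    e_min[t], e_max[t]: stored energies before/after positive transition t.
    Returns T_n such that |chi_hat(T_n) - chi0| <= eps.
    """
    if not 0.0 < chi0 < 1.0:
        raise ValueError("chi0 must lie in ]0, 1[")
    deltas = [hi - lo for lo, hi in zip(e_min, e_max)]
    if min(deltas) <= 0.0:
        raise ValueError("stored transitions must be strictly positive")
    t = -sum(deltas) / (len(deltas) * math.log(chi0))  # Step 1(c): T1 from Eq. (7)
    for _ in range(max_iter):
        chi = chi_hat(t, e_min, e_max)                 # Step 2(a)
        if abs(chi - chi0) <= eps:                     # Step 2(b)
            return t
        t *= (math.log(chi) / math.log(chi0)) ** (1.0 / p)  # Step 2(c): recursion (6)
    raise RuntimeError("no convergence within max_iter; try a larger p")

# Hypothetical sample: 500 positive transitions with random energies.
rng = random.Random(42)
e_min = [rng.uniform(0.0, 10.0) for _ in range(500)]
e_max = [e + rng.uniform(0.1, 20.0) for e in e_min]
T0 = compute_temperature(e_min, e_max, chi0=0.5)
print(T0, chi_hat(T0, e_min, e_max))
```

Only the stored energies are reused at each iteration, so the cost of one iteration is linear in $|S|$.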
In the rest of this section, we first prove under some assumptions the convergence of
the algorithm described above. Then we give some remarks about the sampling procedure
needed by the algorithm.
1.1. Algorithm convergence
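To show the convergence of the algorithm, we will prove that $T \mapsto T\left(\ln(\hat\chi(T))/\ln(\chi_0)\right)^{1/p}$ is a nondecreasing function and $T \mapsto \hat\chi(T)$ is a strictly increasing function. This means that $T_0$ is the unique fixed point of the function $T \mapsto T\left(\ln(\hat\chi(T))/\ln(\chi_0)\right)^{1/p}$ and $\min(T_0, T_n) \leq T_{n+1} \leq \max(T_0, T_n)$.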
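Notice that if $T \mapsto T\left(\ln(\hat\chi(T))/\ln(\chi_0)\right)^{1/p}$ is a nondecreasing function when $p = 1$, then it has the same behavior for any $p \geq 1$. This can be seen by computing the derivative of the logarithm of this function:
$$\frac{1}{T} + \frac{1}{p}\, \frac{\hat\chi'(T)}{\hat\chi(T) \ln(\hat\chi(T))}.$$
If we assume that $\hat\chi'(T) \geq 0$, then the second term is nonpositive (since $\ln(\hat\chi(T)) < 0$) and its absolute value shrinks as $p$ grows, so it clearly increases when $p$ increases. Therefore, we will focus on $p = 1$.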
Before giving the proofs of the desired results, we present a hypothesis that will be used to simplify the calculations.
Hypothesis 1.1. We assume that the energy levels $E_{\min_t}$ and the cost differences $\delta_t$ of the set of transitions $S$ are independent.
More precisely, given a temperature $T$, we assume that the positive transitions are generated in conformance with the equilibrium distribution. As we focus here on $S$, we consider the conditional distribution where the probability of generating a transition $t_0$ is given by
$$\frac{\pi_{\min_{t_0}}/|N(\min_{t_0})|}{\sum_{t \in S} \pi_{\min_t}/|N(\min_t)|} = \frac{e^{-E_{\min_{t_0}}/T}}{\sum_{t \in S} e^{-E_{\min_t}/T}}.$$
It is natural to assume that there is no correlation between $\{\delta_i, E_{\min_i}\}$ and $\{\delta_j, E_{\min_j}\}$, where $i$ and $j$ are two transitions of $S$ obtained by independent trials in conformance with the conditional equilibrium distribution. However, in Hypothesis 1.1 we also assume that $E_{\min_i}$ is independent of $\delta_i$. This assumption is less easy to justify. In fact, it depends on the distribution, which is related to the temperature. Said another way, even if it is valid for some temperatures, it will be invalid for others. Note, however, that we do not need this assumption to be strictly satisfied. The convergence of the algorithm is obtained in almost all cases when $p = 1$. Moreover, it can be ensured by increasing the value of the parameter $p$. More details will be given at the end of this subsection.
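Lemma 1.2. Assuming Hypothesis 1.1 is valid, we have
$$\frac{\sum_{i,j \in S,\, i<j} \exp\left(-\frac{E_{\min_i} + E_{\min_j}}{T}\right)\left(E_{\min_i} - E_{\min_j}\right)\left(\exp\left(-\frac{\delta_i}{T}\right) - \exp\left(-\frac{\delta_j}{T}\right)\right)}{\sum_{i,j \in S} \exp\left(-\frac{E_{\max_i} + E_{\min_j}}{T}\right)} = 0.$$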
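Proof: Let $L$ (resp. $R$) be the numerator (resp. denominator) of the ratio given in the lemma. We want to show that $L/R = 0$. In fact, $L$ is nothing but
$$L = \frac{1}{2}\sum_{i,j \in S} e^{-(E_{\min_i} + E_{\min_j})/T}\left(E_{\min_i} - E_{\min_j}\right)\left(e^{-\delta_i/T} - e^{-\delta_j/T}\right).$$
Moreover, using formulas (3) and (2), the expectation of $(E_{\min_i} - E_{\min_j})(e^{-\delta_i/T} - e^{-\delta_j/T})$ is given by
$$E\left((E_{\min_i} - E_{\min_j})(e^{-\delta_i/T} - e^{-\delta_j/T}) \mid i,j \in S\right) = \sum_{i,j \in S} \frac{e^{-E_{\min_i}/T}\, e^{-E_{\min_j}/T}}{\left(\sum_{k \in S} e^{-E_{\min_k}/T}\right)^2}\left(E_{\min_i} - E_{\min_j}\right)\left(e^{-\delta_i/T} - e^{-\delta_j/T}\right).$$
Note that we used here the fact that the transitions of $S$ are independent. We obtain
$$E\left((E_{\min_i} - E_{\min_j})(e^{-\delta_i/T} - e^{-\delta_j/T})\right) = \frac{2L}{\left(\sum_{k \in S} e^{-E_{\min_k}/T}\right)^2}.$$
On the other hand,
$$R = \sum_{i,j \in S} e^{-(E_{\max_i} + E_{\min_j})/T} = \sum_{i \in S} e^{-E_{\max_i}/T} \sum_{j \in S} e^{-E_{\min_j}/T} = \left(\sum_{j \in S} e^{-E_{\min_j}/T}\right)^2 \sum_{i \in S} \frac{e^{-E_{\min_i}/T}}{\sum_{j \in S} e^{-E_{\min_j}/T}}\, e^{-\delta_i/T} = \left(\sum_{j \in S} e^{-E_{\min_j}/T}\right)^2 E\left(e^{-\delta_i/T}\right).$$
Combining the previous expressions for $L$ and $R$ leads to
$$\frac{L}{R} = \frac{E\left((E_{\min_i} - E_{\min_j})(e^{-\delta_i/T} - e^{-\delta_j/T})\right)}{2\,E\left(e^{-\delta_i/T}\right)}.$$
Now, using Hypothesis 1.1 together with the independence of the two transitions $i$ and $j$, one can deduce that
$$E\left((E_{\min_i} - E_{\min_j})(e^{-\delta_i/T} - e^{-\delta_j/T})\right) = E\left(E_{\min_i} - E_{\min_j}\right) E\left(e^{-\delta_i/T} - e^{-\delta_j/T}\right) = 0,$$
since $i$ and $j$ are identically distributed. This means that $L/R = 0$.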
Note that it is possible to build a particular small example for which neither Hypothesis 1.1 nor Lemma 1.2 holds. However, our experimental results (Section 3) show that the algorithm works very well in practice, and convergence is obtained in almost all cases. More details will be given at the end of this subsection.
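Proposition 1.3. Assuming Hypothesis 1.1 is valid, the derivative of $\hat\chi(T)$ is given by
$$\hat\chi'(T) = \frac{1}{T^2}\, \frac{\sum_{i \in S} \delta_i\, e^{-E_{\max_i}/T}}{\sum_{i \in S} e^{-E_{\min_i}/T}}.$$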
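Proof: Let us calculate $\hat\chi'(T)$.
$$\begin{aligned}
\hat\chi'(T) &= \frac{\sum_{i \in S} E_{\max_i} e^{-E_{\max_i}/T} \sum_{j \in S} e^{-E_{\min_j}/T} - \sum_{i \in S} E_{\min_i} e^{-E_{\min_i}/T} \sum_{j \in S} e^{-E_{\max_j}/T}}{T^2 \left(\sum_{i \in S} e^{-E_{\min_i}/T}\right)^2} \\
&= \frac{\sum_{i,j \in S} e^{-(E_{\max_i} + E_{\min_j})/T} \left(E_{\max_i} - E_{\min_j}\right)}{T^2 \left(\sum_{i \in S} e^{-E_{\min_i}/T}\right)^2} \\
&= \frac{\sum_{i,j \in S} e^{-(E_{\max_i} + E_{\min_j})/T} \left(E_{\min_i} - E_{\min_j} + \delta_i\right)}{T^2 \left(\sum_{i \in S} e^{-E_{\min_i}/T}\right)^2} \\
&= \frac{\sum_{i,j \in S} e^{-(E_{\max_i} + E_{\min_j})/T} \left(E_{\min_i} - E_{\min_j}\right) + \sum_{i,j \in S} e^{-(E_{\max_i} + E_{\min_j})/T}\, \delta_i}{T^2 \left(\sum_{i \in S} e^{-E_{\min_i}/T}\right)^2} \\
&= \frac{\sum_{i<j} e^{-(E_{\min_i} + E_{\min_j})/T} \left(E_{\min_i} - E_{\min_j}\right)\left(e^{-\delta_i/T} - e^{-\delta_j/T}\right) + \sum_{i,j \in S} e^{-(E_{\max_i} + E_{\min_j})/T}\, \delta_i}{T^2 \left(\sum_{i \in S} e^{-E_{\min_i}/T}\right)^2}.
\end{aligned}$$
Using Lemma 1.2, the previous expression becomes:
$$\hat\chi'(T) = \frac{\sum_{i,j \in S} e^{-(E_{\max_i} + E_{\min_j})/T}\, \delta_i}{T^2 \left(\sum_{i \in S} e^{-E_{\min_i}/T}\right)^2} = \frac{1}{T^2}\, \frac{\sum_{i \in S} \delta_i\, e^{-E_{\max_i}/T}}{\sum_{i \in S} e^{-E_{\min_i}/T}}.$$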
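The step that relies on Lemma 1.2 can be checked numerically. The sketch below (illustrative names, synthetic data) computes the exact derivative of (5) by the quotient rule and compares it with the expression of Proposition 1.3 plus the term $L / \left(T^2 (\sum_{i \in S} e^{-E_{\min_i}/T})^2\right)$ that the proof drops, where $L$ is the numerator of Lemma 1.2. This identity holds for any sample; the dropped term vanishes exactly when all $E_{\min_i}$ are equal.

```python
import math
import random

def sums(T, e_min, e_max):
    """A = sum exp(-Emax_i/T) and B = sum exp(-Emin_i/T), so that chi_hat = A / B."""
    a = sum(math.exp(-e / T) for e in e_max)
    b = sum(math.exp(-e / T) for e in e_min)
    return a, b

def chi_hat(T, e_min, e_max):
    a, b = sums(T, e_min, e_max)
    return a / b

def exact_derivative(T, e_min, e_max):
    """d chi_hat / dT by the quotient rule; no hypothesis is needed here."""
    a, b = sums(T, e_min, e_max)
    da = sum(e * math.exp(-e / T) for e in e_max) / T**2
    db = sum(e * math.exp(-e / T) for e in e_min) / T**2
    return (da * b - a * db) / b**2

def proposition_1_3(T, e_min, e_max):
    """Right-hand side of Proposition 1.3."""
    _, b = sums(T, e_min, e_max)
    num = sum((hi - lo) * math.exp(-hi / T) for lo, hi in zip(e_min, e_max))
    return num / (T**2 * b)

def lemma_1_2_numerator(T, e_min, e_max):
    """Numerator L of the ratio in Lemma 1.2 (quadratic in |S|)."""
    deltas = [hi - lo for lo, hi in zip(e_min, e_max)]
    total = 0.0
    for i in range(len(e_min)):
        for j in range(i + 1, len(e_min)):
            total += (math.exp(-(e_min[i] + e_min[j]) / T) * (e_min[i] - e_min[j])
                      * (math.exp(-deltas[i] / T) - math.exp(-deltas[j] / T)))
    return total

rng = random.Random(7)
e_min = [rng.uniform(0.0, 5.0) for _ in range(40)]  # synthetic sample
e_max = [e + rng.uniform(0.1, 3.0) for e in e_min]
T = 1.5
a, b = sums(T, e_min, e_max)
L = lemma_1_2_numerator(T, e_min, e_max)
print("ratio of Lemma 1.2:", L / (a * b))  # the denominator R equals A * B
print("exact:", exact_derivative(T, e_min, e_max),
      "Prop. 1.3 + dropped term:", proposition_1_3(T, e_min, e_max) + L / (T**2 * b**2))
```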
Note that even if Hypothesis 1.1 is not valid, a small value of the ratio $L/R$ of Lemma 1.2 is enough to obtain a good approximation of $\hat\chi'(T)$.
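Since every $\delta_i$ is strictly positive, Proposition 1.3 tells us that $\hat\chi'(T) > 0$. To finish our proof of convergence, we have to show that $T \mapsto T \ln(\hat\chi(T))/\ln(\chi_0)$ is a nondecreasing function. Since $\ln(\chi_0) < 0$, this amounts to showing that $T \mapsto T \ln(\hat\chi(T))$ is nonincreasing.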
Proposition 1.4. Assuming Hypothesis 1.1 is valid, $\left(T \ln(\hat\chi(T))\right)' \leq 0$.
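Proof: The derivative of $T \ln(\hat\chi(T))$ is given by $\ln(\hat\chi(T)) + T\hat\chi'(T)/\hat\chi(T)$. Using expression (5), one can write:
$$\frac{1}{\hat\chi(T)} = \frac{\sum_{i \in S} e^{-E_{\min_i}/T}}{\sum_{i \in S} e^{-E_{\max_i}/T}} = \frac{\sum_{i \in S} e^{-E_{\max_i}/T}\, e^{\delta_i/T}}{\sum_{i \in S} e^{-E_{\max_i}/T}} = \sum_{i \in S} \frac{e^{-E_{\max_i}/T}}{\sum_{j \in S} e^{-E_{\max_j}/T}}\, e^{\delta_i/T}.$$
By concavity of the logarithm, one can deduce that
$$\ln\left(\frac{1}{\hat\chi(T)}\right) \geq \sum_{i \in S} \frac{e^{-E_{\max_i}/T}}{\sum_{j \in S} e^{-E_{\max_j}/T}}\, \frac{\delta_i}{T}.$$
Said another way, we have
$$\ln(\hat\chi(T)) \leq -\frac{1}{T} \sum_{i \in S} \frac{e^{-E_{\max_i}/T}}{\sum_{j \in S} e^{-E_{\max_j}/T}}\, \delta_i.$$
On the other hand, using Proposition 1.3, we obtain:
$$\frac{T\hat\chi'(T)}{\hat\chi(T)} = \frac{1}{T}\, \frac{\sum_{i \in S} \delta_i\, e^{-E_{\max_i}/T}}{\sum_{i \in S} e^{-E_{\max_i}/T}}.$$
Combination of the previous two results leads to $\left(T \ln(\hat\chi(T))\right)' \leq 0$.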
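Propositions 1.3 and 1.4 clearly imply the convergence of the algorithm: $(T_n)_{n \in \mathbb{N}}$ is monotone and bounded by $T_0$, so it converges to a fixed point of (6), which is $T_0$ since $\hat\chi$ is strictly increasing.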
Note that even if the results of this subsection are based on Hypothesis 1.1, they are useful in a general context. Let us give some insight into this point. First, to show that $\hat\chi(T)$ is an increasing function, we only need the ratio of Lemma 1.2 to be close to 0. In other words, we do not really require Hypothesis 1.1 to be strictly satisfied. Second, we already said at the beginning of this subsection that the derivative of the logarithm of the function $T \mapsto T\left(\ln(\hat\chi(T))/\ln(\chi_0)\right)^{1/p}$ increases when $p$ increases. Said another way, if we encounter convergence problems when $p = 1$ due to the inaccuracy of Hypothesis 1.1, we can increase $p$ enough for $T \mapsto T\left(\ln(\hat\chi(T))/\ln(\chi_0)\right)^{1/p}$ to become an increasing function. Moreover, our experimental results (Section 3) show that in most cases $p = 1$ is sufficient. We needed to take $p = 2$ in about 1 run per 1000 to guarantee convergence. However, to strictly guarantee convergence, we can slightly modify the algorithm of Section 1: if an oscillation is detected (i.e., $(T_{n+1} - T_n)(T_n - T_{n-1}) < 0$), then we multiply $p$ by 2 and continue the algorithm.
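A sketch of this safeguarded variant, under two assumptions: the acceptance estimate is passed in as a callable `chi_fn` (for instance $\hat\chi$ of (5) evaluated on the stored transitions), and after an oscillation the iteration simply continues from $T_{n+1}$ with the doubled $p$. To exercise the safeguard, the example uses a synthetic increasing curve $\chi(T) = 0.5^{(10/T)^3}$; for this curve $T\ln\chi(T)$ increases, contrary to Proposition 1.4, so the plain $p = 1$ iteration oscillates and diverges.

```python
import math

def compute_temperature_safeguarded(chi_fn, chi0, t1, eps=1e-3, max_iter=1000):
    """Recursion (6), doubling p whenever (T_{n+1} - T_n)(T_n - T_{n-1}) < 0.

    Returns (temperature, final p)."""
    p, t_prev, t = 1.0, None, t1
    for _ in range(max_iter):
        chi = chi_fn(t)
        if abs(chi - chi0) <= eps:
            return t, p
        t_next = t * (math.log(chi) / math.log(chi0)) ** (1.0 / p)
        if t_prev is not None and (t_next - t) * (t - t_prev) < 0:
            p *= 2.0  # oscillation detected: damp the following steps
        t_prev, t = t, t_next
    raise RuntimeError("no convergence within max_iter")

# Synthetic acceptance curve (an assumption): increasing in T, with chi(10) = 0.5.
T_TARGET = 10.0

def steep_chi(T):
    return 0.5 ** ((T_TARGET / T) ** 3)

t, p = compute_temperature_safeguarded(steep_chi, chi0=0.5, t1=20.0)
print(t, p)
```

With this curve the safeguard doubles $p$ three times (to 8), after which the iterates increase monotonically toward 10.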
1.2. On the sampling procedure
The first steps of the algorithm (1(a) and (b)) can be called the sampling procedure.
Even if the convergence of the algorithm is shown for a set $S$ of random transitions satisfying Hypothesis 1.1 (and experimentally in Section 3), the set $S$ must be representative for the algorithm to give a temperature close to the desired one. Obviously, the exact temperature is obtained when $S$ contains all positive transitions. However, it is generally not possible to consider all transitions.
We will not give a definitive description of the sampling procedure: we think that it
depends on the nature and the size of the problem that we are solving.
One can, for example, begin with a small sample size $|S|$, compute the temperature, and increase the number of transitions until the temperature becomes stable.
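One possible reading of this stabilization rule, as a sketch: resample with a geometrically growing $|S|$ (here ×5, matching the sizes 20, 100, 500, 2500 used in Section 3) and stop once two successive temperatures differ by less than a relative tolerance. The sampler `draw(k)`, the 2% tolerance, the size cap and the toy energy distribution are all assumptions.

```python
import math
import random

def _log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def temperature_from_sample(sample, chi0, eps=1e-4, max_iter=1000):
    """Algorithm of Section 1 with p = 1 and T1 from Eq. (7); sample holds (E_min, E_max)."""
    e_min = [lo for lo, _ in sample]
    e_max = [hi for _, hi in sample]
    t = -sum(hi - lo for lo, hi in sample) / (len(sample) * math.log(chi0))
    for _ in range(max_iter):
        chi = math.exp(_log_sum_exp([-e / t for e in e_max]) - _log_sum_exp([-e / t for e in e_min]))
        if abs(chi - chi0) <= eps:
            return t
        t *= math.log(chi) / math.log(chi0)
    raise RuntimeError("no convergence within max_iter")

def stable_temperature(draw, chi0, size=20, growth=5, rel_tol=0.02, max_size=62500):
    """Grow |S| until two successive temperature estimates agree within rel_tol.

    draw(k) is a hypothetical sampler returning k fresh positive transitions.
    Returns (temperature, |S| used, stabilized?)."""
    previous = temperature_from_sample(draw(size), chi0)
    while size * growth <= max_size:
        size *= growth
        current = temperature_from_sample(draw(size), chi0)
        if abs(current - previous) <= rel_tol * previous:
            return current, size, True
        previous = current
    return previous, size, False

# Toy sampler (an assumption): E_min uniform on [0, 1], cost increase uniform on [0.1, 20].
rng = random.Random(5)

def draw(k):
    pairs = []
    for _ in range(k):
        lo = rng.uniform(0.0, 1.0)
        pairs.append((lo, lo + rng.uniform(0.1, 20.0)))
    return pairs

t, size, stabilized = stable_temperature(draw, chi0=0.5)
print(t, size, stabilized)
```

Drawing a fresh sample at each size keeps successive estimates independent; reusing and extending the previous sample would be a cheaper alternative.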
It is also possible to use the temperature $T_1$ of Eq. (7) to perform a first simulated annealing plateau. All positive transitions considered during this plateau can be stored and then used to compute a more accurate temperature using our algorithm.
The numerical experiments presented in Section 3 are based, for each value of $|S|$, on a random generation of independent transitions. Notice that when we use the transitions encountered during a plateau, the transitions may not be independent.
2. Other properties
More properties of the acceptance probability are given in this section.
Proposition 2.1. Assuming Hypothesis 1.1 is valid, $\hat\chi'(T) \leq \dfrac{1}{eT}$.
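Proof: It was shown in Proposition 1.3 that $\hat\chi'(T) = \frac{1}{T^2}\, \frac{\sum_{i \in S} \delta_i\, e^{-E_{\max_i}/T}}{\sum_{i \in S} e^{-E_{\min_i}/T}}$. Since $E_{\max_i} = E_{\min_i} + \delta_i$, this implies that
$$\hat\chi'(T) = \frac{1}{T}\, \frac{\sum_{i \in S} e^{-E_{\min_i}/T}\, \frac{\delta_i}{T}\, e^{-\delta_i/T}}{\sum_{i \in S} e^{-E_{\min_i}/T}}.$$
Moreover, the function $x \mapsto x\exp(-x)$ is bounded by $1/e$. Using this upper bound in the previous expression leads to the wanted result.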
An important straightforward corollary dealing with the evolution of the acceptance
probability is given below.
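Corollary 2.2. Assuming Hypothesis 1.1 is valid, for any $\Delta T \geq 0$, $\hat\chi(T + \Delta T) - \hat\chi(T) \leq \dfrac{1}{e} \ln\left(1 + \dfrac{\Delta T}{T}\right)$.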
Proof: A simple integration of the inequality $\hat\chi'(T) \leq \frac{1}{eT}$ between $T$ and $T + \Delta T$ gives the wanted result.
Using the fact that $\ln(1 + x) \leq x$, one can deduce that $\hat\chi(T + \Delta T) - \hat\chi(T) \leq \dfrac{\Delta T}{eT}$.
Corollary 2.2 implies that even if you divide the temperature by 2, you cannot expect to reduce the acceptance probability by more than $\ln(2)/e \approx 0.255$.
It is also possible to use the previous corollary to obtain an indication of the number of iterations of a classical simulated annealing with a geometric cooling schedule. Assume that the temperature is multiplied by $\alpha < 1$ at the end of each plateau. In most cases, the initial temperature is chosen such that the acceptance probability of positive moves is equal to $\chi_0$. The stopping criterion can also be a low acceptance probability $\chi_f$. After $N$ plateaux the temperature is $\alpha^N T_0$, so Corollary 2.2 gives $\chi_0 - \chi_f \leq \frac{N}{e} \ln(1/\alpha)$. Hence, one can easily show that the number of plateaux $N$ is higher than $e(\chi_0 - \chi_f)/\ln(1/\alpha)$.
Proposition 2.3. Assuming Hypothesis 1.1 is valid, the number of plateaux is higher than $\dfrac{e(\chi_0 - \chi_f)}{\ln(1/\alpha)}$.
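Assume, for example, that $\chi_0 = 0.9$, $\chi_f = 0.05$ and $\alpha = 0.95$. The bound is then about 45.05, so at least 46 plateaux are needed. If $\alpha = 0.99$, we need at least 230 plateaux. More precisely, if $\alpha = 1 - \varepsilon$ where $\varepsilon \ll 1$, then the number of plateaux is approximately higher than $e(\chi_0 - \chi_f)/\varepsilon$.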
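The bound of Proposition 2.3 and the example values can be reproduced in a few lines. The sketch also evaluates the small-$\varepsilon$ form $e(\chi_0 - \chi_f)/\varepsilon$ and the halving bound $\ln(2)/e$ from Corollary 2.2; the function name is illustrative.

```python
import math

def plateaux_lower_bound(chi0, chi_f, alpha):
    """Proposition 2.3: N >= e * (chi0 - chi_f) / ln(1/alpha)."""
    return math.e * (chi0 - chi_f) / math.log(1.0 / alpha)

for alpha in (0.95, 0.99):
    bound = plateaux_lower_bound(0.9, 0.05, alpha)
    small_eps = math.e * (0.9 - 0.05) / (1.0 - alpha)  # approximation for alpha = 1 - epsilon
    print(alpha, round(bound, 2), math.ceil(bound), round(small_eps, 1))

# Corollary 2.2 with T -> 2T: the acceptance ratio changes by at most ln(2)/e.
print(round(math.log(2.0) / math.e, 3))
```

Since $N$ is an integer, the minimum number of plateaux is the ceiling of the bound: 46 for $\alpha = 0.95$ and 230 for $\alpha = 0.99$.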
Note that one of the advantages of the upper bound $\frac{1}{eT}$ given in Proposition 2.1 is that it does not depend on the energies. However, this upper bound is loose for low temperatures. In fact, one can easily see that $\hat\chi(T) \leq C\exp(-\Delta/T)$, where $C$ is a constant depending on the energies and the transitions (for instance, $C = |S|$) and $\Delta > 0$ is the difference between the smallest $E_{\max_i}$ and the smallest $E_{\min_i}$. This clearly implies that $\hat\chi(T)$ is approximately equal to 0 when $T$ is close to 0.
To finish our study of the acceptance probability, let us consider the second derivative $\hat\chi''(T)$. First, another simple lemma will be stated.
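Lemma 2.4. Assuming Hypothesis 1.1 is valid,
$$\frac{\sum_{i,j \in S,\, i<j} \exp\left(-\frac{E_{\min_i} + E_{\min_j}}{T}\right)\left(E_{\min_i} - E_{\min_j}\right)\left(\delta_i \exp\left(-\frac{\delta_i}{T}\right) - \delta_j \exp\left(-\frac{\delta_j}{T}\right)\right)}{\sum_{i,j \in S} \exp\left(-\frac{E_{\max_i} + E_{\min_j}}{T}\right)} = 0.$$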
This lemma can be easily proved using the same kind of arguments as those given to
prove the validity of Lemma 1.2.
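Proposition 2.5. Assuming Hypothesis 1.1 is valid, the second derivative is given by
$$\hat\chi''(T) = \frac{1}{T^4}\, \frac{\sum_{i \in S} \delta_i(\delta_i - 2T)\, e^{-E_{\max_i}/T}}{\sum_{i \in S} e^{-E_{\min_i}/T}}.$$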
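Proof: A simple derivation of $T^2\hat\chi'(T)$ using Proposition 1.3 gives the following:
$$\begin{aligned}
\left(T^2\hat\chi'(T)\right)' &= \frac{\sum_{i \in S} \delta_i E_{\max_i} e^{-E_{\max_i}/T} \sum_{j \in S} e^{-E_{\min_j}/T} - \sum_{i \in S} E_{\min_i} e^{-E_{\min_i}/T} \sum_{j \in S} \delta_j e^{-E_{\max_j}/T}}{T^2 \left(\sum_{i \in S} e^{-E_{\min_i}/T}\right)^2} \\
&= \frac{\sum_{i,j \in S} \delta_i\, e^{-(E_{\max_i} + E_{\min_j})/T} \left(E_{\max_i} - E_{\min_j}\right)}{T^2 \left(\sum_{i \in S} e^{-E_{\min_i}/T}\right)^2} \\
&= \frac{\sum_{i,j \in S,\, i<j} e^{-(E_{\min_i} + E_{\min_j})/T} \left(E_{\min_i} - E_{\min_j}\right)\left(\delta_i e^{-\delta_i/T} - \delta_j e^{-\delta_j/T}\right) + \sum_{i,j \in S} e^{-(E_{\max_i} + E_{\min_j})/T}\, \delta_i^2}{T^2 \left(\sum_{i \in S} e^{-E_{\min_i}/T}\right)^2}.
\end{aligned}$$
Using Lemma 2.4, we obtain:
$$\left(T^2\hat\chi'(T)\right)' = \frac{\sum_{i,j \in S} e^{-(E_{\max_i} + E_{\min_j})/T}\, \delta_i^2}{T^2 \left(\sum_{i \in S} e^{-E_{\min_i}/T}\right)^2} = \frac{1}{T^2}\, \frac{\sum_{i \in S} \delta_i^2\, e^{-E_{\max_i}/T}}{\sum_{i \in S} e^{-E_{\min_i}/T}}.$$
Since $\hat\chi''(T) = \left(T^2\hat\chi'(T)\right)'/T^2 - 2\hat\chi'(T)/T$, using Proposition 1.3 again gives the wanted result.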
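One can easily see that the expression given above is positive when $T$ is close to 0 (every factor $\delta_i - 2T$ is positive as soon as $T < \min_i \delta_i/2$) and negative when $T$ is sufficiently high (as soon as $T > \max_i \delta_i/2$).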
Corollary 2.6. Assuming Hypothesis 1.1 is valid, the probability to accept positive transitions is convex for low temperatures and concave for high temperatures.
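A numerical illustration of Proposition 2.5 and Corollary 2.6, as a sketch under a simplifying assumption: all $E_{\min_i}$ are equal, so the numerators of Lemmas 1.2 and 2.4 vanish exactly and $\hat\chi(T)$ reduces to the average of $\exp(-\delta_i/T)$. The cost increases are made up, and the closed form is compared with a central finite difference.

```python
import math

def chi_hat_equal_start(T, deltas):
    """(5) when every E_min_i is the same: the average of exp(-delta_i / T)."""
    return sum(math.exp(-d / T) for d in deltas) / len(deltas)

def second_derivative_prop_2_5(T, deltas):
    """Proposition 2.5 specialised to equal E_min_i (exact in that case)."""
    return sum(d * (d - 2.0 * T) * math.exp(-d / T) for d in deltas) / (len(deltas) * T**4)

def second_derivative_numeric(T, deltas, rel_step=1e-3):
    """Central finite-difference estimate of the second derivative of chi_hat."""
    h = rel_step * T
    return (chi_hat_equal_start(T + h, deltas) - 2.0 * chi_hat_equal_start(T, deltas)
            + chi_hat_equal_start(T - h, deltas)) / h**2

deltas = [1.0, 1.5, 2.0]  # hypothetical cost increases
# T = 0.3 < min(delta)/2: convex region; T = 5.0 > max(delta)/2: concave region.
for T in (0.3, 5.0):
    print(T, second_derivative_prop_2_5(T, deltas), second_derivative_numeric(T, deltas))
```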
Finally, we give here simple bounds for the second derivative $\hat\chi''(T)$.
Corollary 2.7. Assuming Hypothesis 1.1 is valid,
$$\frac{1}{T^2}\left(2 - 2\sqrt{2}\right) e^{\sqrt{2} - 2} \leq \hat\chi''(T) \leq \frac{1}{T^2}\left(2 + 2\sqrt{2}\right) e^{-2 - \sqrt{2}}.$$
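Proof: Proposition 2.5 tells us that $\hat\chi''(T) = \frac{1}{T^4}\, \frac{\sum_{i \in S} \delta_i(\delta_i - 2T)\, e^{-E_{\max_i}/T}}{\sum_{i \in S} e^{-E_{\min_i}/T}}$. Since $E_{\max_i} = E_{\min_i} + \delta_i$, a simple calculation leads to:
$$\hat\chi''(T) = \frac{1}{T^2}\, \frac{\sum_{i \in S} e^{-E_{\min_i}/T}\, \frac{\delta_i}{T}\left(\frac{\delta_i}{T} - 2\right) e^{-\delta_i/T}}{\sum_{i \in S} e^{-E_{\min_i}/T}} = \frac{1}{T^2}\, \frac{\sum_{i \in S} e^{-E_{\min_i}/T}\, f(\delta_i/T)}{\sum_{i \in S} e^{-E_{\min_i}/T}},$$
where $f$ denotes the function $x \in \mathbb{R}_+ \mapsto x(x - 2)\exp(-x)$. Since $f'(x) = -(x^2 - 4x + 2)\exp(-x)$, one can easily see that the minimum of $f$ is obtained for $x = 2 - \sqrt{2}$ and the maximum is reached for $x = 2 + \sqrt{2}$. Thus, for any $x \geq 0$ we have $(2 - 2\sqrt{2})\exp(\sqrt{2} - 2) \leq f(x) \leq (2 + 2\sqrt{2})\exp(-2 - \sqrt{2})$. Combining these inequalities with the expression of $\hat\chi''(T)$ leads to the wanted result.
Notice that $(2 - 2\sqrt{2})\exp(\sqrt{2} - 2) \approx -0.461$ and $(2 + 2\sqrt{2})\exp(-2 - \sqrt{2}) \approx 0.159$.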
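The constants of Corollary 2.7 can be checked directly by evaluating $f$ at its critical points $2 \pm \sqrt{2}$ and, as a sanity check, by brute force on a grid of $x \geq 0$ (the grid range $[0, 50]$ and step $10^{-3}$ are arbitrary choices).

```python
import math

def f(x):
    """f(x) = x (x - 2) exp(-x), the function used in the proof of Corollary 2.7."""
    return x * (x - 2.0) * math.exp(-x)

# Critical points: f'(x) = -(x**2 - 4x + 2) exp(-x) vanishes at 2 - sqrt(2) and 2 + sqrt(2).
x_min = 2.0 - math.sqrt(2.0)
x_max = 2.0 + math.sqrt(2.0)
print(round(f(x_min), 4), round(f(x_max), 4))

# Brute-force sanity check on x in [0, 50].
values = [f(k * 1e-3) for k in range(50001)]
print(round(min(values), 4), round(max(values), 4))
```

To four decimals, the two extreme values are $-0.4612$ and $0.1589$.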
3. Numerical experiments
To illustrate the results given in the previous sections, extensive numerical experiments were carried out. Two kinds of problems are considered: random problems and traveling salesman problems (TSP).
3.1. Random problems
A random problem is represented by a symmetric graph $G = (V, E)$, where $V$ is the set of vertices corresponding to the solutions of the problem and $E$ is the set of edges representing the neighborhood relationship. Each solution (vertex) has a random cost. Simulated annealing is applied to find a minimum cost solution.
The graphs used to represent random problems are described using three parameters: the number of vertices $|V|$, the graph density $d$ (defined from the number of edges $|E|$) and an upper bound $U$ for the maximum degree. Two sets of problems are considered in this section, where $(|V|, d, U) = (5 \times 10^4, 10^{-4}, 30)$ in the first case and $(2 \times 10^6, 10^{-6}, 5)$ in the second one. Notice that
these problems are small and can be solved by enumeration. However, the aim of this
section is only to study the procedure proposed in this paper to compute the temperature of
a simulated annealing.
Due to space limitation, we only give a summary of the results: more details will be
provided on Kluwer’s web site.
We consider 4 different values of the number of samples $|S|$: 20, 100, 500 and 2500. We also try 8 values of the acceptance probability $\chi_0$: 0.99, 0.9, 0.7, 0.5, 0.3, 0.1, 0.05 and 0.01. Positive transitions are randomly and independently generated. For each value of $\chi_0$ and $|S|$, the algorithm of Section 1 is used to compute a temperature. Convergence was always obtained with $p = 1$ (formula (6)).
Simulated annealing is then applied using the computed temperature, without any decrease, to provide the experimental acceptance probability $\bar\chi$. We also apply simulated annealing using the temperature $T_1$ defined by Eq. (7) to obtain $\chi(T_1)$, defined as the experimental acceptance probability corresponding to $T_1$. Recall that this temperature is commonly used by simulated annealing practitioners.
All experiments are repeated 200 times (200 runs for each value of $\chi_0$ and $|S|$). Results are expressed in terms of average and standard deviation.
The ratios corresponding to Lemmas 1.2 and 2.4 are also considered here in order to check the validity of our hypothesis. We also focus on the number of iterations of the algorithm needed to compute the temperature. This number is zero if the temperature $T_1$ given by Eq. (7) in Step 1(c) is the final result of the algorithm. The precision term $\varepsilon$ used in the algorithm is here equal to $10^{-3}$.
To summarize, we focus on the average and the standard deviation of the following quantities: the experimental acceptance probability $\bar\chi$, the experimental acceptance probability obtained with $T_1$ of Eq. (7), the ratios corresponding with Lemmas 1.2 and 2.4, and the number of iterations of the algorithm of Section 1.
First, the algorithm used to compute the temperature converges at each run. The average and standard deviation of the ratios corresponding to Lemmas 1.2 and 2.4 are generally low. They are not zero because Hypothesis 1.1 is not always valid. When both $\chi_0$ and $|S|$ are very low, Lemma 1.2 does not seem to be satisfied. In fact, the denominator of the fraction defined in Lemma 1.2 is close to 0 when $T$ is very low. Moreover, Hypothesis 1.1 is unlikely to be satisfied when $|S|$ is very small. Nevertheless, as previously said, convergence is always obtained, even if Hypothesis 1.1 is not satisfied.
The number of iterations needed by the algorithm to achieve convergence is small for high values of $\chi_0$. In fact, when $\chi_0$ is high, the temperature $T_1$ seems to be a good one. This is shown by $\chi(T_1)$, which is very close to $\chi_0$ when $\chi_0$ is high. $\chi(T_1)$ moves away from $\chi_0$ when $\chi_0$ is low. We also observed that the standard deviation of $\chi(T_1)$ generally decreases when $|S|$ increases. However, the average value of $\chi(T_1)$ can be considered stable. This is due to the fact that $T_1$ is based on the average of the cost variations. Overall, $\bar\chi$ is generally closer to $\chi_0$ than $\chi(T_1)$ is.
Moreover, our numerical experiments show that for high values of $\chi_0$, a small value of $|S|$ can be sufficient to obtain a temperature achieving the goal. However, if $\chi_0$ is low, we need higher values of $|S|$.
The difference between $\chi_0$ and $\bar\chi$ decreases when the problem size decreases. The problem size considered in the first case is smaller than in the second case, and the results are slightly better: the standard deviation of $\bar\chi$ is lower in the first case than in the second one. Said another way, when the problem size is larger, we may need more transitions to compute a temperature.
Table 1. $T_1 = 500$ and $p = 1$.
n    T_n        χ̂(T_n)
1    500.0      0.9352
2    48.3411    0.5114
3    46.7738    0.5007
4    46.6768    0.5000

Table 2. $T_1 = 5$ and $p = 1$.
n    T_n        χ̂(T_n)
1    5.0        0.0461
2    22.1953    0.2661
3    42.3929    0.4687
4    46.3526    0.4978
5    46.6492    0.4999
6    46.6687    0.5000

Table 3. $T_1 = 500$ and $p = 2$.
n    T_n        χ̂(T_n)
1    500.0      0.9352
2    155.4690   0.8067
3    86.5412    0.6818
4    64.3252    0.5993
5    55.2274    0.5536
6    51.0096    0.5286
7    48.9224    0.5152
8    47.8533    0.5081
9    47.2957    0.5043
10   47.0020    0.5023
11   46.8464    0.5012
12   46.7639    0.5006
13   46.72      0.5003
14   46.6966    0.5002
15   46.6842    0.5000

Table 4. $T_1 = 5$ and $p = 2$.
n    T_n        χ̂(T_n)
1    5.0        0.0461
2    10.5345    0.1141
3    18.6429    0.2208
4    27.5214    0.3291
5    34.8496    0.4043
6    39.8314    0.4482
7    42.8594    0.4722
8    44.5904    0.4852
9    45.5479    0.4921
10   46.0682    0.4958
11   46.3483    0.4976
12   46.4984    0.4988
13   46.5785    0.4994
14   46.6213    0.4997
15   46.6441    0.4998
16   46.6562    0.4999
Finally, some sequences of temperatures and acceptance ratios obtained by the algorithm are given in Tables 1–4. We intend to compute a temperature corresponding to an acceptance ratio $\chi_0 = 0.5$. Instead of using $T_1$ defined in Eq. (7), we take $T_1 = 500$ for the experiments of Tables 1 and 3, and $T_1 = 5$ for those of Tables 2 and 4. The precision $\varepsilon$ is here $10^{-4}$. The parameter $p$ used in the recursive formula (6) is equal to 1 in Tables 1 and 2, and to 2 in Tables 3 and 4.
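As a consistency check of formula (6) on the published numbers, the sketch below recomputes each tabulated $T_{n+1}$ from the tabulated pair $(T_n, \hat\chi(T_n))$. Because $\hat\chi$ is printed with only four decimals, small relative discrepancies are expected. Only Tables 1 and 2 and the first step of Tables 3 and 4 are transcribed; the function names are illustrative.

```python
import math

CHI0 = 0.5  # target acceptance ratio used for Tables 1-4

def next_temperature(t_n, chi_n, chi0=CHI0, p=1.0):
    """One step of recursion (6)."""
    return t_n * (math.log(chi_n) / math.log(chi0)) ** (1.0 / p)

# (T_n, chi_hat(T_n)) pairs transcribed from Tables 1 and 2 (p = 1).
TABLE_1 = [(500.0, 0.9352), (48.3411, 0.5114), (46.7738, 0.5007), (46.6768, 0.5000)]
TABLE_2 = [(5.0, 0.0461), (22.1953, 0.2661), (42.3929, 0.4687),
           (46.3526, 0.4978), (46.6492, 0.4999), (46.6687, 0.5000)]

def max_relative_gap(table, p):
    """Largest |predicted - tabulated| / tabulated over consecutive rows."""
    gaps = []
    for (t_n, chi_n), (t_next, _) in zip(table, table[1:]):
        predicted = next_temperature(t_n, chi_n, p=p)
        gaps.append(abs(predicted - t_next) / t_next)
    return max(gaps)

print(max_relative_gap(TABLE_1, p=1), max_relative_gap(TABLE_2, p=1))
# First step of Tables 3 and 4 (p = 2), to compare with 155.4690 and 10.5345.
print(next_temperature(500.0, 0.9352, p=2), next_temperature(5.0, 0.0461, p=2))
```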
We can see that the algorithm converges more slowly for $p = 2$ than for $p = 1$. This is easily understood from formula (6): the exponent $1/p$ pulls each correction factor $\ln(\hat\chi(T_n))/\ln(\chi_0)$ toward 1. However, recall that when $p$ increases, the derivative of the function $T \mapsto T\left(\ln(\hat\chi(T))/\ln(\chi_0)\right)^{1/p}$ is more likely to be positive (Section 1.1). Said another way, as we know that Hypothesis 1.1 is not always valid, it may be advisable to take $p > 1$. Although we did not need to take $p > 1$ to compute the temperature for these random problems, we will see in the next section that this may be necessary in a very few cases.
3.2. Traveling salesman problems
The algorithm of Section 1 is applied here for the traveling salesman problem. We consider transitions based on the very classical 2-OPT moves (see, for example, [9]). Two sets of randomly generated Euclidean instances are used: 20-city and 100-city problems. Notice that these problems can now be solved by efficient cutting plane algorithms.

[Figure 1. Evolution of χ(T) for TSP(100): acceptance probability versus temperature.]
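For the TSP, a positive transition can be sampled as a random tour together with a random 2-OPT move of positive cost. The sketch below assumes cities given as points in the plane with Euclidean distances; `sample_positive_transitions` is a hypothetical helper that returns the pairs $(E_{\min_t}, E_{\max_t})$ stored by the algorithm of Section 1. The constant-time cost change of a 2-OPT move is compared with a full recomputation of the tour length.

```python
import math
import random

def tour_length(tour, cities):
    """Length of the closed tour visiting cities in the given order."""
    n = len(tour)
    return sum(math.dist(cities[tour[k]], cities[tour[(k + 1) % n]]) for k in range(n))

def two_opt_delta(tour, cities, i, j):
    """Cost change of the 2-OPT move reversing tour[i+1..j] (0 <= i < j < n):
    edges (a, b) and (c, d) are replaced by (a, c) and (b, d)."""
    n = len(tour)
    a, b = cities[tour[i]], cities[tour[i + 1]]
    c, d = cities[tour[j]], cities[tour[(j + 1) % n]]
    return math.dist(a, c) + math.dist(b, d) - math.dist(a, b) - math.dist(c, d)

def apply_two_opt(tour, i, j):
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]

def sample_positive_transitions(cities, size, rng):
    """Hypothetical sampler: random tour plus random 2-OPT move, kept if delta > 0.
    Returns (E_min, E_max) pairs, as stored in Step 1(b)."""
    n = len(cities)
    pairs = []
    while len(pairs) < size:
        tour = list(range(n))
        rng.shuffle(tour)
        i, j = sorted(rng.sample(range(n), 2))
        if j - i < 2 or (i == 0 and j == n - 1):
            continue  # these moves leave the tour unchanged
        delta = two_opt_delta(tour, cities, i, j)
        if delta > 0:
            e_min = tour_length(tour, cities)
            pairs.append((e_min, e_min + delta))
    return pairs

rng = random.Random(3)
cities = [(rng.random(), rng.random()) for _ in range(20)]  # hypothetical 20-city instance
tour = list(range(20))
print(two_opt_delta(tour, cities, 3, 11),
      tour_length(apply_two_opt(tour, 3, 11), cities) - tour_length(tour, cities))
S = sample_positive_transitions(cities, 100, rng)
print(len(S), min(hi - lo for lo, hi in S))
```

Generating each state as a uniformly random tour corresponds to sampling at infinite temperature; transitions collected during a plateau, as suggested in Section 1.2, would instead follow the stationary distribution at the current temperature.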
Three values of $|S|$ are considered: 20, 2500 and 62500. Eight values of $\chi_0$ are used: 0.99, 0.9, 0.7, 0.5, 0.3, 0.1, 0.05 and 0.01. 200 experiments are performed for each value of $\chi_0$ and $|S|$.
Comments given in the previous subsection about random problems are still valid here. However, we noticed that $|S| = 2500$ was sufficient to give a good approximation of the temperature in the previous case, but does not seem to be sufficient here for some values of $\chi_0$. In other words, when the size of the problems increases, we need larger samples to obtain a good approximation of the temperature.
The procedure used to compute the temperature does not converge in about 1 run per 1000 when $p = 1$. If $p = 2$, the algorithm always converges. In fact, when the algorithm is applied, it is easy to check whether there is an oscillation in terms of temperature. In this case, we multiply $p$ by 2 and continue the algorithm.
The experimental acceptance ratio is plotted as a function of temperature in Figure 1, which corresponds to the 100-city problem. As claimed in Corollary 2.6, this ratio is convex for low temperatures and concave for high temperatures.
Finally, we studied the experimental number of plateaux of simulated annealing when a geometric cooling schedule ($\alpha = 0.95$) is used. Different values of the initial acceptance ratio $\chi_0$ and of the final acceptance ratio $\chi_f$ are considered. The number of plateaux is compared with the lower bound of Proposition 2.3. This lower bound seems to be good for intermediate acceptance ratios and poor for extreme acceptance ratios (either very low or very high).
4. Conclusion
A simple algorithm is proposed to compute a temperature such that the acceptance ratio of cost-increasing moves is equal to a given value $\chi_0$. We also presented some properties of the acceptance ratio.
We think that this algorithm can be used as a component of either classical or modern simulated annealing schemes for which the cooling schedule is not necessarily monotonic.
The procedure proposed in this paper can be modified in different ways. First, the formula linking $T_n$ and $T_{n+1}$ can be changed. Said another way, even if the algorithm is very fast, one may find another formula allowing faster convergence. Second, we assumed in this paper that transitions are accepted in accordance with the Metropolis criterion. A further research direction may consist in introducing some modifications and studying the convergence of the algorithm when other acceptance probabilities are considered.
Finally, we considered the acceptance ratio of positive transitions. However, one may want to focus on the acceptance of all transitions. A similar algorithm allowing the computation of a temperature that is compatible with a given acceptance probability of all transitions is now under study.
I would like to thank an anonymous referee for his valuable comments.
1. E. Aarts, J. Korst, and P. van Laarhoven, “Simulated annealing,” in Local Search in Combinatorial Optimiza-
tion, E.H.L. Aarts and J.K. Lenstra (Eds.), John Wiley and Sons, Ltd., 1997, pp. 91–120.
2. E. Bonomi and J.-L. Lutton, “The N-city traveling salesman problem: Statistical mechanics and the Metropolis algorithm,” SIAM Review, vol. 26, pp. 551–568, 1984.
3. V. Cerny, “A thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm,” Journal of Optimization Theory and Applications, vol. 45, pp. 41–51, 1985.
4. H. Cohn and M. Fielding, “Simulated annealing: Searching for an optimal temperature schedule,” SIAM J. Optim., vol. 9, no. 3, pp. 779–802, 1999.
5. L. Grover, “A new simulated annealing algorithm for standard cell placement,” in Proc. IEEE ICCAD-86,
Santa Clara, CA, 1986.
6. B. Hajek, “Cooling schedules for optimal annealing,” Math. Oper. Res., vol. 13, pp. 311–329, 1988.
7. D.S. Johnson, C.R. Aragon, L.A. McGeoch, and C. Schevon, “Optimization by simulated annealing: An experimental evaluation; part I, graph partitioning,” Operations Research, vol. 37, pp. 865–892, 1989.
8. D.S. Johnson, C.R. Aragon, L.A. McGeoch, and C. Schevon, “Optimization by simulated annealing: An experimental evaluation; part II, graph coloring and number partitioning,” Operations Research, vol. 39, pp. 378–406, 1991.
9. D.S. Johnson and L.A. McGeoch, “The traveling salesman problem: A case study,” in Local Search in Combinatorial Optimization, E.H.L. Aarts and J.K. Lenstra (Eds.), John Wiley and Sons, Ltd., 1997, pp. 215–310.
10. S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi, “Optimization by simulated annealing,” Science, vol. 220, pp.
671–680, 1983.
11. N.A. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller, “Equation of state calculations by fast computing machines,” J. Chem. Phys., vol. 21, pp. 1087–1092, 1953.
12. D. Mitra, F. Romeo, and A.L. Sangiovanni-Vincentelli, “Convergence and finite-time behavior of simulated annealing,” Advances in Applied Probability, vol. 18, pp. 747–771, 1986.
13. F. Romeo and A.L. Sangiovanni-Vincentelli, “A theoretical framework for simulated annealing,” Algorithmica, vol. 6, pp. 302–345, 1991.
14. P. Van Laarhoven and E. Aarts, Simulated Annealing: Theory and Applications, D. Reidel Publishing Company, 1988.
15. J. Varanelli, “On the acceleration of simulated annealing,” PhD Dissertation, University of Virginia, 1996.
16. S. White, “Concepts of scale in simulated annealing,” in Proc. IEEE Int. Conference on Computer Design,
Port Chester, 1984.
... The initial temperature is crucial to the algorithm's ability to avoid local minima by accepting worse solutions in early iterations (Kirkpatrick et al., 1983). (Ben-Ameur, 2004) proposes a method to calculate the initial temperature that is consistent with the algorithm's acceptance ratio while accounting for all cooling rates. Single solution-based techniques like SA are difficult to use without a good starting point, and an unfeasible starting point will lead to failed searches (Shojaee. ...