A Low Power Scheduling Method Using Dual Vdd and Dual Vth
Kun-Lin Tsai¹
Department of EE¹
National Taiwan University
Taipei 106, Taiwan.
Email: kunlin@orchid.ee.ntu.edu.tw
Szu-Wei Chang² and Feipei Lai¹,²
Department of CSIE²
National Taiwan University
Taipei 106, Taiwan.
Email: flai@ntu.edu.tw
Shanq-Jang Ruan³
Department of ET³
National Taiwan University
of Science and Technology
Taipei 106, Taiwan.
Email: sjruan@et.ntust.edu.tw
Abstract—As the technology scales down to the nanometer
dimensions, static power consumption has become increasingly important. To manage power consumption, this paper proposes a low power method that considers dual supply voltage (Vdd) and dual threshold voltage (Vth) at the same time to solve the scheduling problem in the behavioral synthesis stage. The proposed method offers a more flexible power design space and better performance. A combined algorithm of GA (Genetic Algorithm) and SA (Simulated Annealing) is used to solve the scheduling problem. Experimental results show a 41.6% power reduction on average.
I. INTRODUCTION
In recent years, the power consumption of a chip has become a very important issue, especially in SoC design, and low power design is expected to remain a major challenge for IC design companies in the next decade [1]. Among the many low power design methods, the most effective way to reduce power consumption is to lower the supply voltage (Vdd) of a circuit, because dynamic power grows with the square of Vdd. Reducing the supply voltage, however, increases the circuit delay. One solution is to use dual or multiple supply voltages.
The dual Vdd method has been applied at several levels of low power circuit design, such as the behavioral level [2], [3] and the gate level [4]. However, taking only the supply voltage into account is not enough: in deep submicron design, leakage power consumption is also a very important issue. The works in [5], [6] use dual threshold voltages (Vth) to tackle the leakage power optimization problem. Although leakage power is greatly reduced, total power consumption is not necessarily minimized. Hence, some papers apply dual Vdd and dual Vth at the same time, at the gate level [7] and the circuit level [8], to further reduce power dissipation.
The work presented in this paper focuses on high level power optimization. We address the problem of scheduling a dataflow graph when the resources operate at dual supply voltages and dual threshold voltages. An algorithm that combines a genetic algorithm with simulated annealing is used to assign the supply and threshold voltages of each node of the CDFG (Control/Data Flow Graph). The contributions of this paper are: 1) both dynamic and static power consumption are taken into account; 2) a genetic algorithm based simulated annealing algorithm is applied, in a novel way, to high level synthesis.
II. SIMULATED ANNEALING AND GENETIC ALGORITHM
The goal of high level synthesis is to map high level descriptions to hardware structures that meet design constraints such as area, latency, and power consumption. Many algorithms are available for high level synthesis; simulated annealing (SA) and the genetic algorithm (GA) are two of them.
Simulated annealing (SA) [9] is an optimization technique inspired by the physical process of annealing. It starts at a high temperature T. A neighborhood operation moves the current state i (with energy $E_i$) to a candidate state j (with energy $E_j$). If $E_j < E_i$, state j is accepted; if $E_j > E_i$, state i is still replaced by state j with probability $e^{(E_i - E_j)/T}$. The process is repeated from the new state, with the temperature lowered by a cooling function, until the temperature falls below the terminating temperature Tf.
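A minimal Python sketch of this acceptance rule follows. It is a generic simulated annealing loop, not the authors' code; it assumes geometric cooling (the schedule described for GASA in Section III-D), and the names `metropolis_accept` and `anneal` and the toy objective are hypothetical.

```python
import math
import random


def metropolis_accept(e_i, e_j, t, rng):
    """Return True if candidate state j should replace current state i at temperature t."""
    if e_j < e_i:
        return True  # downhill move: always accepted
    # uphill move: accepted with probability e^{(E_i - E_j)/T}, which is < 1 when E_j > E_i
    return rng.random() < math.exp((e_i - e_j) / t)


def anneal(initial, energy, neighbor, t_start, t_final, cooling, seed=0):
    """Minimal SA loop with geometric cooling; returns the best state seen and its energy."""
    rng = random.Random(seed)
    current, e_cur = initial, energy(initial)
    best, e_best = current, e_cur
    t = t_start
    while t > t_final:  # stop once T drops to the terminating temperature
        candidate = neighbor(current, rng)
        e_cand = energy(candidate)
        if metropolis_accept(e_cur, e_cand, t, rng):
            current, e_cur = candidate, e_cand
            if e_cur < e_best:
                best, e_best = current, e_cur
        t *= cooling
    return best, e_best


# Toy usage: minimize (x - 3)^2 over the integers with +/-1 moves.
best, e_best = anneal(
    initial=20,
    energy=lambda x: (x - 3) ** 2,
    neighbor=lambda x, rng: x + rng.choice((-1, 1)),
    t_start=50.0,
    t_final=0.1,
    cooling=0.95,
)
```

Tracking the best state separately matters because uphill moves can leave the current state worse than one visited earlier.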
The genetic algorithm (GA) explores the design space to find a good, though not necessarily globally optimal, solution. It consists of four steps: crossover, mutation, natural selection, and survival of the fittest. Detailed descriptions can be found in [10].
Both algorithms can be used to solve optimization problems. Generally, simulated annealing is hard to parallelize, whereas the genetic algorithm is naturally parallel; however, the genetic algorithm can have difficulty converging to a good result. The proposed GASA (Genetic Algorithm based Simulated Annealing) algorithm aims to inherit the strengths of both GA and SA while avoiding their weaknesses. GASA can easily be implemented in parallel, so the computation can be distributed over several machines to reduce computing time. In addition, the neighborhood operation borrowed from SA lets it explore the design space around the current solutions.
III. LOW POWER SCHEDULING WITH GASA
In this section, we present our GASA (Genetic Algorithm based Simulated Annealing) scheduling method. First, we introduce the data flow of the GASA algorithm. Second, we describe the dual Vdd and dual Vth library. Third, we describe the chromosome representation and the GASA operations and parameters in detail. Finally, we illustrate an example of GASA scheduling.
0-7803-8834-8/05/$20.00 ©2005 IEEE.
Fig. 1. The GASA scheduling algorithm flowchart. (Inputs: CDFG & parameters, library. Flow: initiate first generation → select two individuals (two parents) → crossover & mutation → evaluation (power & delay) → Boltzmann trial → replace two parents with trial winners → arrive Tf? No: repeat the loop; yes: output the schedule result.)
A. Data Flow of GASA
The GASA algorithm runs several simulated annealing processes in parallel. The mutation operation of GA is analogous to the neighborhood operation of SA, and the crossover operation recombines independent solutions. Before describing the GASA flow, we define one term.
Definition 1: A Boltzmann trial is a competition between states i and j in which state i wins with probability $1/(1 + e^{(E_i - E_j)/T})$. Here, e is the base of the natural logarithm, $E_i$ and $E_j$ denote the energies of states i and j, respectively, and T is the temperature of the SA algorithm.
In our scheduler, the energies $E_i$ and $E_j$ are the power-delay products of the corresponding scheduling results: if the power-delay product of state i is smaller than that of state j, then $E_i < E_j$. Note that when T is large, the winning probability of either state approaches 1/2, so state j can win the trial even if its energy is larger than that of state i. The Boltzmann trial therefore reproduces the uphill moves of SA in our scheduling algorithm.
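Definition 1 can be written down directly. The sketch below assumes the energies are power-delay products as stated above; `boltzmann_win_probability` and `boltzmann_trial` are hypothetical names, and the overflow guard is an implementation detail not discussed in the paper.

```python
import math
import random


def boltzmann_win_probability(e_i, e_j, t):
    """Definition 1: probability that state i wins, 1 / (1 + e^{(E_i - E_j)/T})."""
    x = (e_i - e_j) / t
    if x > 700.0:  # math.exp would overflow; the probability is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def boltzmann_trial(e_i, e_j, t, rng):
    """Return 'i' or 'j', the winner of one Boltzmann trial."""
    return "i" if rng.random() < boltzmann_win_probability(e_i, e_j, t) else "j"


# The lower-energy state is favored, and the bias fades as T grows
# (both winning probabilities approach 1/2).
for t in (1.0, 10.0, 1000.0):
    print(t, round(boltzmann_win_probability(100.0, 110.0, t), 4))
```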
The data flow of the GASA algorithm is shown in Fig. 1. Its inputs are the CDFG (Control/Data Flow Graph) and a set of parameters. The main scheduler assigns a Vdd and a Vth to each node in the CDFG, so it needs a library of components, each available in several dual Vdd/Vth versions. The scheduler first creates the first generation, which consists of many individuals. It then randomly selects two individuals as parents and applies the crossover and mutation operations to generate two children. Next, it evaluates the power and delay of the two parents and the two children, decides the Boltzmann trial winners by Definition 1, and replaces the two parents with the trial winners. If the temperature has cooled down to Tf (the terminating temperature), the scheduler outputs the trial winner; otherwise, it continues the GASA loop.
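The loop in Fig. 1 might be sketched as below. Several details are assumptions rather than statements from the paper: each child competes in a Boltzmann trial against the parent that supplied its left half, the temperature is lowered geometrically once per iteration, and the best individual seen is tracked separately. Individuals are plain lists, and the toy energy (number of 1 bits) stands in for the power-delay product; all function names are hypothetical.

```python
import math
import random


def boltzmann_pick(a, e_a, b, e_b, t, rng):
    """Definition 1: a wins with probability 1 / (1 + e^{(E_a - E_b)/T})."""
    x = min((e_a - e_b) / t, 700.0)  # clamp so math.exp cannot overflow
    return (a, e_a) if rng.random() < 1.0 / (1.0 + math.exp(x)) else (b, e_b)


def gasa(population, energy, mutate, t_start, t_final, cooling, seed=0):
    """Hypothetical sketch of the Fig. 1 loop; returns the best individual seen."""
    rng = random.Random(seed)
    pop = [list(ind) for ind in population]
    energies = [energy(ind) for ind in pop]
    i0 = min(range(len(pop)), key=energies.__getitem__)
    best, e_best = list(pop[i0]), energies[i0]
    t = t_start
    while t > t_final:
        ia, ib = rng.sample(range(len(pop)), 2)      # two parents
        pa, pb = pop[ia], pop[ib]
        cut = len(pa) // 2                           # one-point crossover
        ca = mutate(pa[:cut] + pb[cut:], rng)
        cb = mutate(pb[:cut] + pa[cut:], rng)
        # Assumption: each child competes with the parent that gave it its left half.
        for idx, child in ((ia, ca), (ib, cb)):
            winner, e_win = boltzmann_pick(child, energy(child), pop[idx], energies[idx], t, rng)
            pop[idx], energies[idx] = winner, e_win  # trial winner replaces the parent
            if e_win < e_best:
                best, e_best = list(winner), e_win
        t *= cooling                                 # geometric cooling toward Tf
    return best, e_best


def flip_bits(ind, rng, pm=0.2):
    """Toy mutation: flip round(len * pm) randomly chosen positions."""
    out = list(ind)
    for k in rng.sample(range(len(out)), max(1, round(len(out) * pm))):
        out[k] = 1 - out[k]
    return out


# Toy usage: minimize the number of 1 bits in 10-bit individuals.
init_rng = random.Random(42)
initial = [[init_rng.randint(0, 1) for _ in range(10)] for _ in range(8)]
best, e_best = gasa(initial, energy=sum, mutate=flip_bits, t_start=5.0, t_final=0.1, cooling=0.97)
```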
In our scheduling algorithm, we recombine the results of existing individuals rather than generating each new individual randomly, which improves the convergence speed. The recombination phase selects two parents from the selection pool and produces two children. The children may inherit essential parts of the genes that make their fitness better or worse than that of their parents. Here, the “essential” parts of the genes are the nodes that belong to the critical paths, or the nodes whose power can be reduced greatly by choosing another Vdd or Vth.

TABLE I
AN EXAMPLE OF TECHNOLOGY LIBRARY WITH DUAL Vdd AND DUAL Vth
Multiplier | Power (mW) | Delay (control cycles)
Vdd H / Vth H | 561.6 | 18
Vdd L / Vth H | 273.5 | 25
Vdd H / Vth L | 754.7 | 17
Vdd L / Vth L | 365.3 | 24

TABLE II
CHROMOSOME REPRESENTATION
One individual | n1 | n2 | ··· | nk
Relative CS | 0 | 1 | ··· | 1
Instance (Vdd/Vth) | L / H | H / L | ··· | L / L
B. Dual Vdd/Vth Library
An essential component of the GASA algorithm is the cell library, in which each cell has four instance types. Table I shows the power and delay of a multiplier for the different combinations of Vdd and Vth. In Table I, Vdd H denotes a cell with a high supply voltage; similarly, Vth L denotes a cell with a low threshold voltage. To simplify the calculation, the delay of each instance type is expressed as a number of control cycles rather than as an actual delay time.
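Looking up and comparing the Table I instances is straightforward; the Python sketch below encodes the multiplier rows (the names `MULTIPLIER`, `lookup`, and `relative_to` are hypothetical). It shows, for example, that the Vdd L / Vth H instance uses about 0.487 of the Vdd H / Vth H power at 25/18 of its delay.

```python
# Table I (multiplier): (Vdd, Vth) -> (power in mW, delay in control cycles)
MULTIPLIER = {
    ("H", "H"): (561.6, 18),
    ("L", "H"): (273.5, 25),
    ("H", "L"): (754.7, 17),
    ("L", "L"): (365.3, 24),
}


def lookup(library, vdd, vth):
    """Return (power, delay) for one instance type, as the scheduler would."""
    return library[(vdd, vth)]


def relative_to(library, reference=("H", "H")):
    """Power and delay of every instance relative to a reference instance."""
    p_ref, d_ref = library[reference]
    return {inst: (p / p_ref, d / d_ref) for inst, (p, d) in library.items()}


pdp = {inst: p * d for inst, (p, d) in MULTIPLIER.items()}  # per-instance power-delay product
for inst, (p_rel, d_rel) in relative_to(MULTIPLIER).items():
    print(f"Vdd {inst[0]} / Vth {inst[1]}: power x{p_rel:.3f}, delay x{d_rel:.3f}, PDP {pdp[inst]:.1f}")
```

Ranked in isolation, Vdd L / Vth H has the smallest power-delay product, but GASA scores complete schedules, where a slower instance on the critical path lengthens the whole design, so this per-instance ranking is only an illustration.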
C. Individual and Chromosome Representation
A suitable chromosome representation is needed for the individuals in the GASA scheduling algorithm, since it affects the running time of the algorithm. The representation must encode the supply voltage, the threshold voltage, and the control steps. Table II shows the chromosome representation used in our GASA algorithm. In Table II, $n_i$ denotes the i-th node of the data flow graph. The relative CS of $n_i$ is the number of control steps between the starting control step of $n_i$ and the maximum control step occupied by all preceding nodes of $n_i$. The instance field records which kind of resource is allocated to the operation: H indicates the high voltage and L indicates the low voltage. From the instance field of a node, its power consumption, area cost, and delay can be looked up in the library. Each column in the table represents a chromosome, so an individual for a CDFG with k nodes consists of k chromosomes.
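One way to decode an individual into a schedule is sketched below. The paper does not spell out the decoding, so this rests on stated assumptions: nodes are listed in topological order, control steps are counted from 0, a relative CS of r means the node starts r steps after its latest predecessor finishes, and the design power is the sum of the per-node instance powers. `Chromosome` and `decode` are hypothetical names; the library rows come from Table I.

```python
from dataclasses import dataclass

# Table I multiplier instances: (Vdd, Vth) -> (power in mW, delay in control cycles)
TABLE_I = {("H", "H"): (561.6, 18), ("L", "H"): (273.5, 25),
           ("H", "L"): (754.7, 17), ("L", "L"): (365.3, 24)}


@dataclass(frozen=True)
class Chromosome:
    rel_cs: int      # idle control steps after the latest predecessor finishes (assumed reading)
    instance: tuple  # (Vdd, Vth), each "H" or "L"


def decode(individual, preds, op_of, library):
    """Map one individual (nodes in topological order) to start steps, latency, power, PDP."""
    start, finish = {}, {}
    total_power = 0.0
    for node, chrom in enumerate(individual):
        power, delay = library[op_of[node]][chrom.instance]
        ready = max((finish[p] for p in preds.get(node, ())), default=0)
        start[node] = ready + chrom.rel_cs
        finish[node] = start[node] + delay  # first control step after the node completes
        total_power += power                # assumption: design power = sum of node powers
    latency = max(finish.values())
    return start, latency, total_power, total_power * latency


# Toy DFG: nodes 0 and 1 feed node 2; all three are multipliers.
individual = [Chromosome(0, ("L", "H")), Chromosome(0, ("H", "H")), Chromosome(1, ("H", "L"))]
start, latency, power, pdp = decode(individual, preds={2: [0, 1]},
                                    op_of={0: "mul", 1: "mul", 2: "mul"},
                                    library={"mul": TABLE_I})
```

Here node 2 waits for the slower Vdd L multiplier (25 cycles) and then one idle step, so it starts at step 26 and the schedule ends at step 43.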
Fig. 2. An example of one point crossover operation. (Relative CS of nodes N1–N6: Parent A 0, 1, 1, 3, 0, 1; Parent B 1, 2, 0, 3, 1, 1; Child A 1, 2, 0, 3, 0, 1; Child B 0, 1, 1, 3, 1, 1. Each node also carries an instance (Vdd/Vth) field.)
D. GASA Operations and Parameters
1) Mutation operation: When the mutation operation is performed on an individual, some of its chromosomes are chosen for mutation according to a mutation rate. For example, if an individual has k nodes and the mutation rate is Pm, then k×Pm chromosomes are chosen and their genes (Relative CS and instance) are changed randomly. Mutation thus produces variants of an individual that explore the neighborhood of its current position in the design space. The mutation rate Pm is discussed later in this subsection.
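A possible mutation operator in Python, assuming each chromosome is a `(relative_cs, (vdd, vth))` tuple and that a new Relative CS is drawn uniformly from 0 to a hypothetical bound `max_rel_cs`; k×Pm is rounded to the nearest integer. The rate of 0.2 used in the example matches the 20% setting given in item 5).

```python
import random


def mutate(individual, pm, max_rel_cs, rng):
    """Re-randomize round(k * pm) chromosomes of a k-node individual.

    Each chromosome is (rel_cs, (vdd, vth)); the original list is left unchanged.
    """
    child = list(individual)
    k = len(child)
    for idx in rng.sample(range(k), round(k * pm)):
        rel_cs = rng.randint(0, max_rel_cs)              # new Relative CS gene
        instance = (rng.choice("HL"), rng.choice("HL"))  # new (Vdd, Vth) gene
        child[idx] = (rel_cs, instance)
    return child


parent = [(0, ("H", "H"))] * 10
child = mutate(parent, pm=0.2, max_rel_cs=3, rng=random.Random(7))
```

A re-randomized chromosome can coincide with its old value, so at most (not exactly) k×Pm chromosomes end up different.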
2) Crossover operation: Crossover is a recombination operation; Fig. 2 shows an example. In GASA scheduling, we adopt one point crossover, which exchanges the right halves of two individuals.
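One point crossover at the midpoint can be sketched as follows; applied to the Relative CS rows of Parent A and Parent B in Fig. 2, it reproduces the Relative CS rows of the two children. The function name is hypothetical, and for an odd number of nodes the cut is assumed to fall at `len // 2`.

```python
def one_point_crossover(parent_a, parent_b):
    """Exchange the right halves of two equal-length individuals."""
    cut = len(parent_a) // 2
    child_1 = parent_a[:cut] + parent_b[cut:]
    child_2 = parent_b[:cut] + parent_a[cut:]
    return child_1, child_2


# Relative CS rows of Parent A and Parent B for nodes N1..N6 in Fig. 2.
a = [0, 1, 1, 3, 0, 1]
b = [1, 2, 0, 3, 1, 1]
c1, c2 = one_point_crossover(a, b)  # c1 is Fig. 2's Child B, c2 is its Child A
```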
3) Population size: The population size is one of the major control parameters of GASA. Generally, the larger the population, the better the result; however, a larger population also requires more memory and a longer run time.
4) Cooling procedure: The cooling procedure in our GASA scheduling multiplies the current temperature by a cooling constant CC (0 < CC < 1). If CC is close to 1, the temperature decreases slowly and more generations are produced. In general, the population size and the cooling constant both affect the optimization gain and the computing time, and the designer should tune both parameters to meet the design constraints.
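The geometric cooling schedule, and the number of temperature steps it yields, can be computed as below (a sketch; the function names and the starting temperature of 100 are hypothetical, while Tf = 0.1 is the paper's setting). With T multiplied by CC at each step, the loop runs about $\lceil \ln(T_f/T_s) / \ln(CC) \rceil$ times, which is why a CC close to 1 produces many more generations.

```python
import math


def temperature_schedule(t_start, t_final, cc):
    """Temperatures visited when T is multiplied by CC until it drops to Tf."""
    if not 0.0 < cc < 1.0:
        raise ValueError("cooling constant must satisfy 0 < CC < 1")
    temps = []
    t = t_start
    while t > t_final:
        temps.append(t)
        t *= cc
    return temps


def generations_estimate(t_start, t_final, cc):
    """Closed form for the number of temperature steps: ceil(ln(Tf/Ts) / ln(CC))."""
    return math.ceil(math.log(t_final / t_start) / math.log(cc))


for cc in (0.5, 0.9, 0.95):
    print(cc, len(temperature_schedule(100.0, 0.1, cc)), generations_estimate(100.0, 0.1, cc))
```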
5) Mutation rate: The mutation rate determines how much the children differ from their parents. If the mutation rate is too large, good chromosomes may be destroyed. If it is too small, the children resemble their parents too closely, and the search tends to end in a local minimum. In our algorithm, the mutation rate is set to 20%, and a larger rate is used if the result appears to be trapped in a local minimum.
Fig. 3. An example of GASA scheduling. (a) Original DFG. (b) Vdd/Vth of each node. (c) Scheduling result (control steps 0–15).
6) Temperature: The last two parameters are the starting temperature Ts and the terminating temperature Tf. We set the terminating temperature to 0.1. The starting temperature Ts is computed as

$$T_s = \frac{E_j - E_i}{-\ln\left(\frac{1}{k} - 1\right)},$$

where $E_i$ and $E_j$ are the initial energies of states i and j, and k is the probability that state i wins the Boltzmann trial although $E_i$ is larger than $E_j$; the formula follows from solving $1/(1 + e^{(E_i - E_j)/T_s}) = k$ for $T_s$. The starting temperature influences the convergence speed and the acceptance rate of the parent individuals at the beginning of the optimization process.
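As a check on the formula, the sketch below computes Ts and confirms that, at T = Ts, state i wins the Boltzmann trial of Definition 1 with probability k. The energies and k are made-up example values, the function names are hypothetical, and restricting k to (0, 0.5) follows from requiring $E_i > E_j$ and Ts > 0.

```python
import math


def starting_temperature(e_i, e_j, k):
    """T_s = (E_j - E_i) / (-ln(1/k - 1)), for E_i > E_j and 0 < k < 0.5."""
    if not (e_i > e_j and 0.0 < k < 0.5):
        raise ValueError("expected E_i > E_j and 0 < k < 0.5")
    return (e_j - e_i) / (-math.log(1.0 / k - 1.0))


def win_probability(e_i, e_j, t):
    """Definition 1: probability that state i wins the Boltzmann trial."""
    return 1.0 / (1.0 + math.exp((e_i - e_j) / t))


# Hypothetical example: the worse state i should still win 30% of the time at Ts.
ts = starting_temperature(e_i=120.0, e_j=100.0, k=0.3)
print(round(ts, 2), round(win_probability(120.0, 100.0, ts), 3))
```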
E. Example of GASA Scheduling
Suppose we have a library of several components, each implemented with the four Vdd/Vth combinations shown in Table I. The input is the DFG shown in Fig. 3(a). At the beginning of scheduling, each node is assigned an initial Vdd and Vth. After many iterations, the scheduling result shown in Fig. 3(c) is obtained, with the Vdd/Vth assignment shown in Fig. 3(b). Note that the instances chosen during scheduling determine the final delay and power consumption of the whole design.
The goal of GASA is to minimize the power consumption and delay penalty of a system. With our GA-based simulated annealing approach, a near-optimal solution can be obtained in a tolerable processing time.
IV. EXPERIMENTAL RESULT
To show the effectiveness of our method, we compare the power consumption and delay overhead of three scheduling algorithms: the ASAP (As Soon As Possible) scheduling method, the dual-Vdd-only scheduling method, and the proposed dual Vdd and dual Vth scheduling method. A dual Vdd/Vth library containing components such as multipliers and adders is used for low power scheduling. Each component in this library was designed in the TSMC 0.18 µm process, with Vdd H = 1.8 V, Vdd L = 1.26 V, Vth H = 0.558 V, and Vth L = 0.458 V.
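For intuition about these library voltages, first-order device models give rough scaling factors. The sketch below is not from the paper: it assumes dynamic power proportional to $V_{dd}^2$, subthreshold leakage proportional to $e^{-V_{th}/(n\,kT/q)}$ with an assumed slope factor n = 1.5, and an alpha-power-law delay with an assumed α = 1.3; the function names are hypothetical.

```python
import math

VDD_H, VDD_L = 1.8, 1.26     # supply voltages in V (Section IV)
VTH_H, VTH_L = 0.558, 0.458  # threshold voltages in V (Section IV)


def dynamic_power_ratio(v_low, v_high):
    """Switching power P = a*C*Vdd^2*f, so at fixed a, C, f the ratio is (v_low/v_high)^2."""
    return (v_low / v_high) ** 2


def leakage_ratio(vth_low, vth_high, n=1.5, v_thermal=0.0259):
    """Subthreshold leakage ~ exp(-Vth / (n*kT/q)); n = 1.5 is an assumed slope factor."""
    return math.exp((vth_high - vth_low) / (n * v_thermal))


def alpha_power_delay_ratio(vdd_low, vdd_high, vth, alpha=1.3):
    """Alpha-power-law delay ~ Vdd / (Vdd - Vth)^alpha; alpha = 1.3 is an assumed fit value."""
    def delay(vdd):
        return vdd / (vdd - vth) ** alpha
    return delay(vdd_low) / delay(vdd_high)


print("dynamic power, Vdd L vs Vdd H:", round(dynamic_power_ratio(VDD_L, VDD_H), 3))
print("leakage, Vth L vs Vth H:", round(leakage_ratio(VTH_L, VTH_H), 1))
print("delay at Vth H, Vdd L vs Vdd H:", round(alpha_power_delay_ratio(VDD_L, VDD_H, VTH_H), 2))
```

The quadratic term gives (1.26/1.8)² = 0.49, close to the 273.5/561.6 ≈ 0.487 power ratio between the Vdd L / Vth H and Vdd H / Vth H multipliers in Table I, although Table I is only an example library.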
The experimental results are shown in Table III, in which six CDFG benchmarks are used to examine the scheduling algorithms. The proposed dual Vdd/Vth method obtains the best power-delay product on every benchmark and, compared with the ASAP scheduling method, achieves about 46.1% power saving with 12.3% delay overhead on average. Fig. 4 compares the power saving and delay overhead of the dual Vdd/Vth scheduling method with the ASAP method; the solid lines show the dual Vdd/Vth results relative to ASAP.

TABLE III
EXPERIMENTAL RESULT
(Power: mW) (Delay: number of control cycles)
Benchmark | ASAP Power | ASAP Delay | ASAP Power * Delay | Dual Vdd Power | Dual Vdd Delay | Dual Vdd Power * Delay | Proposed Dual Vdd/Vth Power | Proposed Delay | Proposed Power * Delay
diffeq | 5199.4 | 54 | 280767.6 | 3079.0 | 62 | 190898 | 2300.2 | 64 | 147212.8
fir11 | 9979.5 | 109 | 1087765.5 | 5435.7 | 130 | 706641 | 5797.9 | 112 | 649364.8
dct | 16436.8 | 72 | 1183449.6 | 10941.0 | 83 | 908103 | 9332 | 84 | 783888
iir7 | 13669.3 | 144 | 1968379.2 | 8519.8 | 158 | 1346128.4 | 7656.9 | 156 | 1194476.4
wdf7 | 17023.8 | 125 | 2127975 | 10576.0 | 150 | 1586400 | 9485.1 | 150 | 1422765
nc | 28177.3 | 135 | 3803935.5 | 17834.9 | 151 | 2693069.9 | 14814.2 | 145 | 2148059

Fig. 4. A comparison of power saving and delay overhead between ASAP scheduling and dual Vdd/Vth scheduling (x-axis: benchmarks diffeq, fir11, dct, iir7, wdf7, nc; y-axis: percentage, 0%–100%; series: power saving, delay overhead).
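The averages quoted above can be recomputed from Table III with a few lines of Python (a sketch; the per-benchmark figures are taken from Table III, and power saving and delay overhead are assumed to be simple unweighted means of the per-benchmark ratios).

```python
# Table III: benchmark -> (ASAP power, ASAP delay, proposed power, proposed delay)
TABLE_III = {
    "diffeq": (5199.4, 54, 2300.2, 64),
    "fir11": (9979.5, 109, 5797.9, 112),
    "dct": (16436.8, 72, 9332.0, 84),
    "iir7": (13669.3, 144, 7656.9, 156),
    "wdf7": (17023.8, 125, 9485.1, 150),
    "nc": (28177.3, 135, 14814.2, 145),
}


def savings_and_overheads(table):
    """Per-benchmark (power saving, delay overhead) of the proposed method vs. ASAP."""
    rows = {}
    for name, (p_asap, d_asap, p_new, d_new) in table.items():
        saving = 1.0 - p_new / p_asap    # fraction of ASAP power saved
        overhead = d_new / d_asap - 1.0  # fractional increase in control cycles
        rows[name] = (saving, overhead)
    return rows


rows = savings_and_overheads(TABLE_III)
avg_saving = sum(s for s, _ in rows.values()) / len(rows)
avg_overhead = sum(o for _, o in rows.values()) / len(rows)
print(f"average power saving {avg_saving:.1%}, average delay overhead {avg_overhead:.1%}")
# prints: average power saving 46.1%, average delay overhead 12.3%
```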
Fig. 5 shows the convergence of the diffeq benchmark. Fig. 5(a) and (b) show the power and delay convergence of the dual-Vdd-only scheduling method, and Fig. 5(c) and (d) show the power and delay convergence of the dual Vdd and dual Vth scheduling method.
The experimental results show that using a dual Vdd and dual Vth library yields further power reduction with limited delay overhead; that is, the proposed method trades a modest increase in delay for a substantial power saving. Most control parameters of the algorithm can be set automatically to reduce the designer's workload. Additional constraints, such as area constraints and the corresponding penalty terms, can be added to the algorithm to provide a more flexible design space.
Fig. 5. Convergence result of diffeq benchmark. (a) Power result with dual Vdd. (b) Delay result with dual Vdd. (c) Power result with dual Vdd/Vth. (d) Delay result with dual Vdd/Vth. (x-axis: temperature, decreasing from left to right; y-axis: power in (a) and (c), control steps in (b) and (d).)
V. CONCLUSIONS
In this paper, a dual Vdd/Vth scheduling method is proposed for low power high level synthesis. The proposed method considers dynamic and static power consumption simultaneously. Using the GASA (genetic algorithm based simulated annealing) algorithm, each node in the CDFG (Control/Data Flow Graph) is assigned either a high or a low Vdd and Vth to achieve the low power goal while controlling the computing time. The experimental results show that the method is feasible, with about 46.1% power saving and 12.3% delay overhead on average compared with ASAP scheduling. The contribution of this paper is that the GASA method can be applied to multiple Vdd/Vth scheduling and takes both power and performance into account at the same time.
REFERENCES
[1] http://public.itrs.net/
[2] M. A. Elgamel and M. A. Bayoumi, “On low power high level synthesis using genetic algorithms,” in IEEE Proc. of ICECS 2002, vol. 2, pp. 725–728, Sept. 2002.
[3] S. P. Mohanty and N. Ranganathan, “A framework for energy and transient power reduction during behavioral synthesis,” IEEE Trans. on VLSI Systems, vol. 12, no. 6, pp. 562–572, June 2004.
[4] K. Usami and M. Igarashi, “Low-power design methodology and application utilizing dual supply voltage,” in IEEE Proc. of ASP-DAC, pp. 123–128, Jan. 2000.
[5] D. Samanta and A. Pal, “Synthesis of dual-VT dynamic CMOS circuits,” in Proc. of VLSI Design 2003, pp. 303–308, Jan. 2003.
[6] K. S. Khouri and N. K. Jha, “Leakage power analysis and reduction during behavioral synthesis,” IEEE Trans. on VLSI Systems, vol. 10, no. 6, pp. 876–885, Dec. 2002.
[7] S. Augsburger and B. Nikolić, “Combining dual-supply, dual-threshold and transistor sizing for power reduction,” in IEEE Proc. of ICCD’02, pp. 316–321, Sept. 2002.
[8] A. Srivastava and D. Sylvester, “Minimizing total power by simultaneous Vdd/Vth assignment,” IEEE Trans. on CAD, vol. 23, no. 5, pp. 665–677, May 2004.
[9] P. J. M. van Laarhoven and E. H. L. Aarts, Simulated Annealing: Theory and Applications, Kluwer Academic Publishers, 1987.
[10] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, 1989.