A Low Power Scheduling Method using Dual $V_{dd}$ and Dual $V_{th}$

Kun-Lin Tsai (1), Szu-Wei Chang (2), Feipei Lai (1, 2), and Shanq-Jang Ruan (3)

(1) Department of EE, National Taiwan University, Taipei 106, Taiwan. Email: kunlin@orchid.ee.ntu.edu.tw

(2) Department of CSIE, National Taiwan University, Taipei 106, Taiwan. Email: flai@ntu.edu.tw

(3) Department of ET, National Taiwan University of Science and Technology, Taipei 106, Taiwan. Email: sjruan@et.ntust.edu.tw

Abstract— As technology scales down to nanometer dimensions, static power consumption has become increasingly important. To manage power consumption, this paper proposes a low power method that considers the dual supply voltage ($V_{dd}$) and the dual threshold voltage ($V_{th}$) at the same time to deal with the scheduling problem in the behavioral synthesis stage. The proposed method offers a flexible power design space and better performance. A combined algorithm of GA (Genetic Algorithm) and SA (Simulated Annealing) is used to solve the scheduling problem. Experimental results illustrate 41.6% power reduction on average.

I. INTRODUCTION

In recent years, the power consumption of a chip has become a very important issue, especially for SoC design. Low power design is expected to be a major challenge for IC design companies in the next decade [1]. Among the many design methods, the most effective way to reduce power consumption is to lower the supply voltage ($V_{dd}$) of a circuit. Reducing the supply voltage, however, increases the circuit delay. One solution is to use dual or multiple supply voltages.

The dual $V_{dd}$ method has been used at every level of low power circuit design, such as the behavioral level [2], [3] and the gate level [4]. However, taking only the supply voltage into account is not enough. In deep sub-micron design, leakage power consumption is also a very important issue. The authors of [5], [6] proposed the dual threshold voltage ($V_{th}$) to tackle the leakage power optimization problem. Although the leakage power is greatly reduced, the total power consumption still hangs in the balance. Hence, some papers proposed using dual $V_{dd}$ and dual $V_{th}$ at the same time to further reduce power dissipation at the gate level [7] and the circuit level [8].

The work presented in this paper focuses on high level power optimization. We address the problem of scheduling a data-flow graph for the case when the resources operate at dual supply voltage and dual threshold voltage. An algorithm that combines the genetic algorithm with simulated annealing is used to assign the voltage for each node of the CDFG. The contributions of this paper are: 1) both dynamic power and static power consumption are taken into account; 2) a novel application of a genetic algorithm based simulated annealing algorithm is used in high level synthesis.

II. SIMULATED ANNEALING AND GENETIC ALGORITHM

The goal of high level synthesis is to map high level descriptions to hardware structures that meet design constraints such as area, latency, and power consumption. Many algorithms are available for high level synthesis; simulated annealing (SA) and the genetic algorithm (GA) are two of them.

Simulated annealing [9], SA for short, is an optimization technique motivated by the physical process of annealing. Simulated annealing starts with a high temperature $T$. By applying a neighborhood operation, the current state $i$ (with energy $E_i$) changes to the state $j$ (with energy $E_j$) when $E_j < E_i$. If $E_j > E_i$, the state $i$ is replaced by the state $j$ with probability $e^{(E_i - E_j)/T}$. The process is repeated with the new state and a lower temperature obtained from the cooling function, until the temperature is smaller than the termination temperature $T_f$.
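The acceptance rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `accept_move` and `anneal`, the geometric cooling step, and the choice to accept moves of equal energy are all assumptions made for the example.

```python
import math
import random


def accept_move(e_i: float, e_j: float, temperature: float, rng: random.Random) -> bool:
    """Decide whether to move from state i (energy e_i) to state j (energy e_j)."""
    if e_j <= e_i:
        # Downhill moves are always taken (equal energy is an assumed tie rule).
        return True
    # Uphill move: accept with probability exp((E_i - E_j) / T).
    return rng.random() < math.exp((e_i - e_j) / temperature)


def anneal(energy, neighbor, state, t_start, t_final, cooling, rng):
    """Repeat neighborhood moves until the temperature drops below t_final."""
    t = t_start
    while t >= t_final:
        candidate = neighbor(state, rng)
        if accept_move(energy(state), energy(candidate), t, rng):
            state = candidate
        t *= cooling  # assumed cooling function: geometric, 0 < cooling < 1
    return state
```

Here `energy` and `neighbor` are caller-supplied functions, so the same loop can be reused for any state encoding.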

The genetic algorithm (GA) is a method that explores the design space to find a locally optimal solution. The genetic algorithm consists of four steps: crossover, mutation, natural selection, and survival of the fittest. Detailed descriptions can be found in [10].

Both of the above algorithms can be used to solve optimization problems. Generally, simulated annealing is hard to parallelize, whereas the genetic algorithm is naturally parallel. However, the genetic algorithm has difficulty converging to a good result. The proposed GASA (Genetic Algorithm based Simulated Annealing) algorithm inherits the strengths of both GA and SA and avoids their disadvantages. GASA can be easily implemented in parallel, so several machines can be used together to reduce the computing time. In addition, the design space can be explored by performing the neighborhood operation from SA.

III. LOW POWER SCHEDULING WITH GASA

In this section, we present our GASA (Genetic Algorithm based Simulated Annealing) scheduling method. First, we introduce the data flow of the GASA algorithm. Second, we describe the dual $V_{dd}$ and dual $V_{th}$ library. Third, the chromosome representation and the GASA operations and parameters are introduced in detail. Finally, an example of GASA scheduling is illustrated.

0-7803-8834-8/05/$20.00 ©2005 IEEE.

Fig. 1. The GASA scheduling algorithm flowchart. [Flowchart: the scheduler takes the CDFG, the parameters, and the library as inputs. It initiates the first generation, selects two individuals (two parents) from all individuals, applies crossover and mutation, evaluates the power and delay of the two parents and two children, runs the Boltzmann trial, and replaces the two parents with the trial winners. If $T_f$ has been reached, it outputs the schedule result; otherwise the loop repeats.]

A. Data flow of GASA

The GASA algorithm runs several simulated annealing processes in parallel. The mutation operation in GA is analogous to the neighborhood operation in SA, and the crossover operation plays the role of recombining independent solutions. Before we describe the GASA flow, one term must be defined.

Definition 1: A Boltzmann trial is defined as a competition between states $i$ and $j$, in which the probability that state $i$ wins the competition is $1/(1 + e^{(E_i - E_j)/T})$. Here, $e$ is the base of the natural logarithm, and $E_i$ and $E_j$ denote the energy of state $i$ and state $j$, respectively. $T$ represents the temperature in the SA algorithm.

In this definition, the energies $E_i$ and $E_j$ represent the power-delay products of the scheduling results. If the power-delay product of state $i$ is smaller than that of state $j$, then $E_i < E_j$. Note that if the temperature $T$ is large enough, the next state $j$ may be accepted even if the energy of $j$ is larger than the energy of $i$. By using the Boltzmann trial, the SA uphill operation is thus available in our scheduling algorithm.
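A small Python sketch may help make the Boltzmann trial concrete. It follows the probability given in the definition; the function names and the overflow guard are assumptions added for the example, not part of the paper.

```python
import math
import random


def win_probability(e_i: float, e_j: float, temperature: float) -> float:
    """Probability that state i beats state j: 1 / (1 + exp((E_i - E_j) / T))."""
    x = (e_i - e_j) / temperature
    if x > 700.0:
        # Guard (assumed): exp() would overflow, and the probability is effectively 0.
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def boltzmann_trial(state_i, state_j, e_i, e_j, temperature, rng: random.Random):
    """Return the winner of one competition between state i and state j."""
    if rng.random() < win_probability(e_i, e_j, temperature):
        return state_i
    return state_j
```

At a very high temperature the two states win with nearly equal probability, which is what lets a worse (higher power-delay product) schedule survive early in the run. As the temperature falls, the lower-energy state wins almost every time.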

The data flow of the GASA algorithm is shown in Fig. 1. In this flow, the inputs are the CDFG (Control/Data Flow Graph) and some parameters. The main scheduler assigns a $V_{dd}$ and a $V_{th}$ to each node in the CDFG. Thus, a library consisting of several dual $V_{dd}$/$V_{th}$ components is necessary for the scheduler. At the beginning, the scheduler creates the first generation, which contains many individuals. Then, it randomly selects two individuals as the parents and performs the crossover and mutation operations to generate two children. After that, the scheduler evaluates the power and delay of the two parents and two children and decides the Boltzmann trial winners by Definition 1. If the temperature has cooled down to $T_f$ (the terminating temperature), the scheduler outputs the trial winner; otherwise it continues the GASA loop.
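The loop just described can be outlined as follows. This is a hedged sketch rather than the authors' scheduler: the paper does not state how parents and children are paired in the trial, when exactly the temperature is lowered, or which individual is finally reported, so the pairing (each parent against one child), the per-iteration cooling, and returning the lowest-energy individual are all assumptions. The `energy`, `crossover`, and `mutate` callables are hypothetical parameters supplied by the caller.

```python
import math
import random


def boltzmann_winner(a, b, energy, t, rng):
    """Return a or b; a wins with probability 1 / (1 + exp((E_a - E_b) / T))."""
    x = (energy(a) - energy(b)) / t
    p_a = 0.0 if x > 700.0 else 1.0 / (1.0 + math.exp(x))
    return a if rng.random() < p_a else b


def gasa(population, energy, crossover, mutate, t_start, t_final, cooling, rng):
    """Outline of the GASA loop; returns the lowest-energy individual at the end."""
    pop = list(population)
    t = t_start
    while t >= t_final:
        # Randomly select two distinct individuals as parents.
        i, j = rng.sample(range(len(pop)), 2)
        child_a, child_b = crossover(pop[i], pop[j], rng)
        child_a, child_b = mutate(child_a, rng), mutate(child_b, rng)
        # Assumed pairing: each parent competes against one child,
        # and the trial winner takes the parent's slot in the population.
        pop[i] = boltzmann_winner(pop[i], child_a, energy, t, rng)
        pop[j] = boltzmann_winner(pop[j], child_b, energy, t, rng)
        t *= cooling  # assumed: cool once per iteration, 0 < cooling < 1
    return min(pop, key=energy)
```

In a real scheduler, `energy` would evaluate the power-delay product of the schedule encoded by an individual.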

In our scheduling algorithm, we try to recombine the results of each individual rather than just randomly generate a new individual. By combining the results of each individual, we can improve the convergence speed. The recombining phase selects two parents from the selection pool and produces two children. The two children may have some essential parts of genes that make their fitness better or worse than that of their parents. Here, the “essential” parts of genes are those nodes that belong to the critical paths, or that reduce a large amount of power if another $V_{dd}$ or $V_{th}$ is chosen.

TABLE I
AN EXAMPLE OF TECHNOLOGY LIBRARY WITH DUAL $V_{dd}$ AND DUAL $V_{th}$.

| Multiplier instance | Power (mW) | Delay (control cycles) |
|---|---|---|
| $V_{dd}^{H}$ / $V_{th}^{H}$ | 561.6 | 18 |
| $V_{dd}^{H}$ / $V_{th}^{L}$ | 754.7 | 17 |
| $V_{dd}^{L}$ / $V_{th}^{H}$ | 273.5 | 25 |
| $V_{dd}^{L}$ / $V_{th}^{L}$ | 365.3 | 24 |

TABLE II
CHROMOSOME REPRESENTATION

One individual:

| | $n_1$ | $n_2$ | ··· | $n_k$ |
|---|---|---|---|---|
| Relative CS | 0 | 1 | ··· | 1 |
| Instance ($V_{dd}$/$V_{th}$) | L/H | H/L | ··· | L/L |

B. Dual $V_{dd}$/$V_{th}$ library

An essential component of the GASA algorithm is the cell library, in which each cell has four instance types. Table I shows the power and delay of a multiplier with different $V_{dd}$ and $V_{th}$. In Table I, $V_{dd}^{H}$ denotes a cell with high supply voltage; similarly, $V_{th}^{L}$ denotes a cell with low threshold voltage. To simplify the calculation, the delay of each instance type is expressed as a number of control cycles rather than the actual delay time.
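As an illustration of how such a library might be held in memory, the sketch below stores the multiplier entries of Table I in a dictionary keyed by the ($V_{dd}$, $V_{th}$) level pair. The dictionary layout and the helper names are assumptions. Note also that the paper's energy is the power-delay product of a whole schedule, so the per-cell product computed here is only a rough indicator.

```python
# (power in mW, delay in control cycles) for the multiplier of Table I,
# keyed by (Vdd level, Vth level) with "H" = high and "L" = low.
MULTIPLIER = {
    ("H", "H"): (561.6, 18),
    ("H", "L"): (754.7, 17),
    ("L", "H"): (273.5, 25),
    ("L", "L"): (365.3, 24),
}


def power_delay_product(library, instance):
    """Power times delay of a single library instance."""
    power, delay = library[instance]
    return power * delay


def rank_instances(library):
    """Instances sorted from the lowest to the highest power-delay product."""
    return sorted(library, key=lambda inst: power_delay_product(library, inst))
```

For this multiplier the low-$V_{dd}$, high-$V_{th}$ instance has the smallest per-cell product, but it is also the slowest, so it is not automatically the best choice for nodes on a critical path.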

C. Individual and Chromosome Representation

A suitable chromosome representation is needed to represent an individual in the GASA scheduling algorithm, since it affects the running time of the algorithm. The chromosome representation must include the supply voltage, the threshold voltage, and the control step information. Table II shows the chromosome representation in our GASA algorithm. In Table II, $n_i$ is the $i$-th node in the data flow graph, and the relative CS of $n_i$ is the number of control steps between the starting control step of $n_i$ and the maximum occupied control step of all preceding nodes of $n_i$. The instance field records which kind of resource is allocated to this operation; H indicates the high voltage and L indicates the low voltage. By checking the instance field of a node, we can look up the power consumption, area cost, and delay information from the library. Each column in the table represents a chromosome. If there are $k$ nodes in the CDFG, an individual needs $k$ chromosomes to represent itself.
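One way to read this representation is sketched below: an individual is a list of chromosomes, and each node's start step is recovered from its relative CS and the last step occupied by its predecessors. The `Chromosome` class, the `decode` helper, the topological node ordering, the numbering of control steps from 1, and the convention that a relative CS of 0 means "start in the very next step" are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chromosome:
    relative_cs: int  # control steps skipped after the last predecessor finishes
    vdd: str          # "H" or "L"
    vth: str          # "H" or "L"


def decode(individual, preds, delay_of):
    """Map an individual to {node index: (first step, last step)}.

    individual: list of Chromosome, one per node, in topological order (assumed).
    preds: preds[n] lists the indices of the nodes that precede node n.
    delay_of: function (vdd, vth) -> delay in control cycles, e.g. a library lookup.
    """
    span = {}
    for n, c in enumerate(individual):
        # Maximum occupied control step over all preceding nodes (0 if none).
        ready = max((span[p][1] for p in preds[n]), default=0)
        first = ready + 1 + c.relative_cs
        span[n] = (first, first + delay_of(c.vdd, c.vth) - 1)
    return span
```

Because every start step is stored relative to the predecessors, any combination of gene values decodes to a schedule that respects the data dependencies, which keeps crossover and mutation from producing infeasible individuals.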


Fig. 2. An example of one point crossover operation. [Figure: Parent A and Parent B each hold six nodes (N1–N6) with a Relative CS and an Instance ($V_{dd}$/$V_{th}$) per node. Their Relative CS rows are 0 1 1 3 0 1 and 1 2 0 3 1 1, respectively. Child A and Child B are formed by exchanging the right halves (N4–N6) of the two parents.]

D. GASA Operations and Parameters

1) Mutation operation: When performing the mutation operation on an individual, we choose some chromosomes and mutate their values according to a specific mutation rate. For example, if there are $k$ nodes in an individual and the mutation rate is $P_m$, we choose $k \times P_m$ chromosomes to mutate and randomly change their genes (Relative CS and instance). Through the mutation operation, variants of an individual are produced to explore the neighbors of the current position in the design space. The mutation rate $P_m$ is discussed later in this section.
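A possible reading of the mutation step is sketched below. The tuple encoding of a chromosome, the rounding of $k \times P_m$ to an integer, and the upper bound `max_relative_cs` on the re-drawn relative CS are assumptions; the paper does not specify them.

```python
import random

# The four instance types of the dual Vdd / dual Vth library.
INSTANCES = [("H", "H"), ("H", "L"), ("L", "H"), ("L", "L")]


def mutate(individual, rate, max_relative_cs, rng: random.Random):
    """Return a copy of the individual with round(k * rate) chromosomes re-drawn.

    individual: list of (relative_cs, (vdd, vth)) tuples, one per node.
    """
    k = len(individual)
    count = round(k * rate)
    mutant = list(individual)
    # Pick `count` distinct positions and replace both genes at random.
    for idx in rng.sample(range(k), count):
        mutant[idx] = (rng.randint(0, max_relative_cs), rng.choice(INSTANCES))
    return mutant
```

A re-drawn chromosome can happen to equal the old one, so the number of visible changes is at most `round(k * rate)`.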

2) Crossover operation: Crossover is a kind of recombination operation. Fig. 2 shows an example of the crossover operation. In GASA scheduling, we adopt the one point crossover operation, which exchanges the right halves of two individuals.
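The one point crossover can be written compactly over list-encoded individuals. In this sketch the default crossover point (the midpoint) and the labeling of which child keeps which left half are assumptions.

```python
def one_point_crossover(parent_a, parent_b, point=None):
    """Exchange the right parts of two equal-length individuals.

    point defaults to the midpoint, i.e. the right halves are swapped.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same number of chromosomes")
    if point is None:
        point = len(parent_a) // 2
    child_a = parent_b[:point] + parent_a[point:]
    child_b = parent_a[:point] + parent_b[point:]
    return child_a, child_b
```

Applied to the Relative CS rows of the two parents in Fig. 2, `[0, 1, 1, 3, 0, 1]` and `[1, 2, 0, 3, 1, 1]`, this yields `[1, 2, 0, 3, 0, 1]` and `[0, 1, 1, 3, 1, 1]`.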

3) Population size: The population size is one of the major control parameters of GASA. Generally, the larger the population size, the better the result we can obtain. However, a larger population also requires more memory and a longer operation time.

4) Cooling procedure: The cooling procedure in our GASA scheduling multiplies the current temperature by a cooling constant $CC$ ($0 < CC < 1$). If $CC$ is set to a large value, the temperature decreases slowly and more generations are produced. In general, the population size and the cooling constant both affect the optimization gain and the computing time. The designer should tune both parameters to meet the design constraints.
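To see how the cooling constant drives the run length, the sketch below counts how many cooling steps are needed before the temperature falls below the terminating temperature. The function name and the example temperatures are hypothetical, and one cooling step per loop iteration is an assumption.

```python
def cooling_steps(t_start: float, t_final: float, cc: float) -> int:
    """Count multiplications by cc until the temperature is below t_final."""
    if not 0.0 < cc < 1.0:
        raise ValueError("cooling constant must satisfy 0 < CC < 1")
    steps = 0
    t = t_start
    while t >= t_final:
        t *= cc
        steps += 1
    return steps
```

For example, starting from a temperature of 1.0 with a terminating temperature of 0.1, `cc = 0.5` needs 4 steps while `cc = 0.9` needs 22, which is why a large cooling constant lengthens the computing time.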

5) Mutation rate: The mutation rate affects the difference between parents and children. If the mutation rate is too large, some good chromosomes will be destroyed. If the mutation rate is too small, the children will resemble their parents too closely, and the search will end in a local minimum. In our algorithm, we set the mutation rate to 20%. A larger mutation rate is used if the result seems to fall into a local minimum.

Fig. 3. An example of GASA scheduling. (a) Original DFG. (b) $V_{dd}$/$V_{th}$ of each node. (c) Scheduling result. [Figure: (a) a DFG with ten operation nodes, including additions and subtractions, between a source and a sink; (b) a table listing the instance ($V_{dd}$/$V_{th}$) of nodes 1 to 10 as L/H, L/H, L/H, L/H, L/H, L/H, L/L, L/H, H/H, H/H; (c) the scheduled DFG laid out over control steps.]

6) Temperature: The last two parameters are the starting temperature $T_s$ and the terminating temperature $T_f$. We set the terminating temperature to 0.1. The starting temperature $T_s$ is set analytically as

$$T_s = \frac{E_j - E_i}{-\ln\left(\frac{1}{k} - 1\right)},$$

where $E_i$ and $E_j$ are the initial energies of states $i$ and $j$, and $k$ is the probability that $E_i$ is larger than $E_j$. The starting temperature influences the convergence speed and the acceptance rate of the parent individuals at the beginning of the optimization process.
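The starting-temperature formula has the same form as the Boltzmann trial probability solved for $T$: setting $1/(1 + e^{(E_i - E_j)/T}) = k$ and rearranging gives the expression for $T_s$. The sketch below checks this numerically. Reading $k$ as the target probability that state $i$ wins the first trial is an interpretation, not something the paper states explicitly, and the function names and validity checks are assumptions.

```python
import math


def starting_temperature(e_i: float, e_j: float, k: float) -> float:
    """T_s = (E_j - E_i) / (-ln(1/k - 1))."""
    if not 0.0 < k < 1.0 or k == 0.5:
        raise ValueError("k must lie in (0, 1) and differ from 0.5")
    t_s = (e_j - e_i) / (-math.log(1.0 / k - 1.0))
    if t_s <= 0.0:
        raise ValueError("k is inconsistent with the sign of E_j - E_i")
    return t_s


def win_probability(e_i: float, e_j: float, t: float) -> float:
    """Boltzmann trial probability that state i wins."""
    return 1.0 / (1.0 + math.exp((e_i - e_j) / t))
```

With `e_i = 100`, `e_j = 150`, and `k = 0.6`, plugging the resulting temperature back into `win_probability` returns 0.6 up to rounding, so a $k$ only slightly above 0.5 corresponds to a hot start in which the better state has just a small advantage.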

E. Example of GASA Scheduling

Suppose we have a library that consists of several components. Each component is implemented with four different $V_{dd}$/$V_{th}$ combinations, as shown in Table I. The input is the DFG shown in Fig. 3(a). At the beginning of the scheduling, each node is assigned a $V_{dd}$ and a $V_{th}$. After many iterations, the scheduling result is as shown in Fig. 3(c), and the $V_{dd}$/$V_{th}$ assignment is as shown in Fig. 3(b). Note that the instances used in the scheduling process influence the final delay and the power consumption of the whole design.

The goal of GASA is to minimize the power and delay penalty of a system. Through our GA-based simulated annealing approach, a nearly optimal solution can be achieved within a tolerable processing time.

IV. EXPERIMENTAL RESULT

To show the effectiveness of our method, we compare the power consumption and delay overhead of three scheduling algorithms. The first is the ASAP (As Soon As Possible) scheduling method. The second is the dual $V_{dd}$ only scheduling method, and the third is the proposed dual $V_{dd}$ and dual $V_{th}$ scheduling method. A dual $V_{dd}$/$V_{th}$ library, which contains several components such as a multiplier and an adder, is used for low power scheduling. Each component in this library was designed in the TSMC 0.18 µm process. $V_{dd}^{H}$ was 1.8 V, $V_{dd}^{L}$ was 1.26 V, $V_{th}^{H}$ was 0.558 V, and $V_{th}^{L}$ was 0.458 V.

The experimental results are shown in Table III. Six CDFG benchmarks are used to examine the scheduling algorithms. The proposed dual $V_{dd}$/$V_{th}$ scheduling clearly obtains the best power-delay product, and it achieves about 46.1% power saving with 12.3% delay overhead when compared with the ASAP scheduling method. Fig. 4 shows the comparison of power saving and delay overhead between the ASAP scheduling and dual $V_{dd}$/$V_{th}$ scheduling methods. In this figure, the solid lines represent the dual $V_{dd}$/$V_{th}$ scheduling relative to the ASAP method.

TABLE III
EXPERIMENTAL RESULT

| Benchmark | ASAP Power | ASAP Delay | ASAP Power × Delay | Dual $V_{dd}$ Power | Dual $V_{dd}$ Delay | Dual $V_{dd}$ Power × Delay | Proposed Dual $V_{dd}$/$V_{th}$ Power | Proposed Dual $V_{dd}$/$V_{th}$ Delay | Proposed Dual $V_{dd}$/$V_{th}$ Power × Delay |
|---|---|---|---|---|---|---|---|---|---|
| diffeq | 5199.4 | 54 | 280767.6 | 3079.0 | 62 | 190898 | 2300.2 | 64 | 147212.8 |
| fir11 | 9979.5 | 109 | 1087765.5 | 5435.7 | 130 | 706641 | 5797.9 | 112 | 649364.8 |
| dct | 16436.8 | 72 | 1183449.6 | 10941.0 | 83 | 908103 | 9332 | 84 | 783888 |
| iir7 | 13669.3 | 144 | 1968379.2 | 8519.8 | 158 | 1346128.4 | 7656.9 | 156 | 1194476.4 |
| wdf7 | 17023.8 | 125 | 2127975 | 10576.0 | 150 | 1586400 | 9485.1 | 150 | 1422765 |
| nc | 28177.3 | 135 | 3803935.5 | 17834.9 | 151 | 2693069.9 | 14814.2 | 145 | 2148059 |

(Power: mW) (Delay: number of control cycles)

Fig. 4. A comparison of power saving and delay overhead between ASAP scheduling and dual $V_{dd}$/$V_{th}$ scheduling. [Plot: power saving and delay overhead, on a 0% to 100% axis, for the benchmarks diffeq, fir11, dct, iir7, wdf7, and nc.]
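The quoted averages can be checked against the power and delay columns of Table III with a short script. Averaging the per-benchmark ratios (rather than, say, comparing column totals) is an assumption about how the figures were derived, and the names below are illustrative.

```python
# Power (mW) and delay (control cycles) copied from Table III:
# benchmark -> (ASAP, dual Vdd only, proposed dual Vdd/Vth)
TABLE_III = {
    "diffeq": ((5199.4, 54), (3079.0, 62), (2300.2, 64)),
    "fir11": ((9979.5, 109), (5435.7, 130), (5797.9, 112)),
    "dct": ((16436.8, 72), (10941.0, 83), (9332.0, 84)),
    "iir7": ((13669.3, 144), (8519.8, 158), (7656.9, 156)),
    "wdf7": ((17023.8, 125), (10576.0, 150), (9485.1, 150)),
    "nc": ((28177.3, 135), (17834.9, 151), (14814.2, 145)),
}


def average_saving_and_overhead(table):
    """Mean power saving and mean delay overhead of the proposed method vs. ASAP."""
    savings, overheads = [], []
    for asap, _dual_vdd, proposed in table.values():
        savings.append(1.0 - proposed[0] / asap[0])
        overheads.append(proposed[1] / asap[1] - 1.0)
    n = len(table)
    return sum(savings) / n, sum(overheads) / n


saving, overhead = average_saving_and_overhead(TABLE_III)
print(f"power saving {saving:.1%}, delay overhead {overhead:.1%}")
```

Under that assumption the script prints a power saving of 46.1% and a delay overhead of 12.3%, in agreement with the text.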

We also show the convergence results for the diffeq benchmark. Fig. 5(a) and (b) show the power and delay convergence when using the dual $V_{dd}$ only scheduling method. Fig. 5(c) and (d) show the power and delay convergence when using the dual $V_{dd}$ and dual $V_{th}$ scheduling method.

The experimental results show that further power reduction can be obtained with limited delay overhead when using a dual $V_{dd}$ and dual $V_{th}$ library. This means that the proposed method trades off power consumption against delay. In this algorithm, most control parameters can be set automatically to reduce the designers' load. More constraints, such as area constraints and additional penalties, can be added to this algorithm so that it provides a more flexible design space.

Fig. 5. Convergence result of the diffeq benchmark. (a) Power result with dual $V_{dd}$. (b) Delay result with dual $V_{dd}$. (c) Power result with dual $V_{dd}$/$V_{th}$. (d) Delay result with dual $V_{dd}$/$V_{th}$. [Plots: panels (a) and (c) show power, and panels (b) and (d) show control steps, each plotted against the decreasing temperature.]

V. CONCLUSIONS

In this paper, a dual $V_{dd}$/$V_{th}$ scheduling method is proposed for low power high level synthesis. In the proposed method, the dynamic and static power consumption are considered simultaneously. By using the GASA (genetic algorithm based simulated annealing) algorithm, each node in the CDFG (Control/Data Flow Graph) is assigned either a high or a low $V_{dd}$/$V_{th}$ to achieve the low power goal and to control the computing time. The experimental results show that our method is feasible. The contribution of this paper is that the GASA method can be used for multiple $V_{dd}$/$V_{th}$ scheduling and takes both power and performance into account at the same time.

REFERENCES

[1] http://public.itrs.net/

[2] M. A. Elgamel and M. A. Bayoumi, “On low power high level synthesis using genetic algorithms,” in IEEE Proc. of ICECS 2002, vol. 2, pp. 725–728, Sept. 2002.

[3] S. P. Mohanty and N. Ranganathan, “A framework for energy and transient power reduction during behavioral synthesis,” IEEE Trans. on VLSI Systems, vol. 12, no. 6, pp. 562–572, June 2004.

[4] K. Usami and M. Igarashi, “Low-power design methodology and application utilizing dual supply voltage,” in IEEE Proc. of ASP-DAC, pp. 123–128, Jan. 2000.

[5] D. Samanta and A. Pal, “Synthesis of dual-$V_T$ dynamic CMOS circuits,” in Proc. of VLSI Design 2003, pp. 303–308, Jan. 2003.

[6] K. S. Khouri and N. K. Jha, “Leakage power analysis and reduction during behavioral synthesis,” IEEE Trans. on VLSI Systems, vol. 10, no. 6, pp. 876–885, Dec. 2002.

[7] S. Augsburger and B. Nikolić, “Combining dual-supply, dual threshold and transistor sizing for power reduction,” in IEEE Proc. of ICCD'02, pp. 316–321, Sept. 2002.

[8] A. Srivastava and D. Sylvester, “Minimizing total power by simultaneous $V_{dd}$/$V_{th}$ assignment,” IEEE Trans. on CAD, vol. 23, no. 5, pp. 665–677, May 2004.

[9] P. J. M. van Laarhoven and E. H. L. Aarts, Simulated Annealing: Theory and Applications, Kluwer Academic Publishers, 1987.

[10] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, 1989.
