Minimizing register requirements for synchronous circuits derived using software pipelining techniques
ABSTRACT A method based on software pipelining has been recently proposed to optimize mono-phase clocked sequential circuits. The resulting circuits are multi-phase clocked sequential circuits, where all clocks have the same period. To preserve functionality of the original circuit, registers must be placed according to a correct schedule. This schedule also ensures maximum throughput. In that method, it is a question of (1) how to determine a schedule that requires the minimum number of registers, and (2) how to place these registers optimally. In this paper, problems (1) and (2) are tackled simultaneously. More precisely, we deal with the problem of determining schedules with the minimum register requirements, where the optimal register placement is done during the schedule determination. To optimally solve that problem, we provide a mixed integer linear program that we use to derive a linear program, which is polynomial-time solvable. Experimental results confirm the effectiveness of the approach, and show that significant reductions of the number of registers can be obtained.
Noureddine Chabini1, El Mostapha Aboulhamid1, Yvon Savaria2
1: LASSO, DIRO, Université de Montréal C.P.6128, Suc. Centre ville, Montréal, Qc, Canada, H3C 3J7.
Email:{chabinin, aboulham}@iro.umontreal.ca
2: GRM, DGEGI, École Polytechnique de Montréal, C.P. 6079, Suc. Centre-ville, Montréal, Qc, Canada,
H3C 3A7. Email: savaria@vlsi.polymtl.ca
1. Introduction
Software pipelining is a powerful technique for increasing the
instruction-level parallelism for parallel processors. This method
overlaps the execution of successive iterations. It has recently been
used to develop a method for optimizing mono-phase clocked
sequential circuits [2]. The resulting circuit is a multi-phase clocked
circuit, where all clocks have the same period. That method may be
described as follows. First, the optimal clock period is determined,
and a schedule of all the functional elements of the circuit is
computed. Second, in order to preserve the behavior of the original
circuit, registers are placed, independently of their initial
placement, according to that schedule. Finally, once the registers
are placed, the phases are determined.
With this method, two questions arise: (1) how to determine a
schedule that produces the minimum number of required registers,
and (2) how to place the minimum number of registers even if that
schedule is already determined. Solving (1) and (2) is of great
interest, since reducing the number of registers reduces the
number of control signals, the area of the circuit, and the power
consumption.
In [4], the authors have provided two polynomial-time solvable
methods to determine schedules for reducing register requirements,
and the number of the required phases. Compared to the original
method [2], these methods proved very efficient in reducing the
number of registers and the number of the required phases.
Nevertheless, the problem of how to efficiently place registers in
the circuit is not addressed.
In this paper, we focus on solving problems (1) and (2), outlined
above, simultaneously. More precisely, we tackle the problem of
determining schedules that yield the minimum number of
registers, where the optimal register placement is done during the
schedule determination. To optimally solve that problem, we
provide a mixed integer linear program (MILP), which we use to
derive a linear program (LP) that is polynomial-time solvable. To
test the effectiveness of the approach, we run the MILP and
the LP on well-known benchmarks, and we show the superiority of
that approach over the original method [2].
This paper is organized as follows. The next section gives the
notation and definitions used in this article. Section 3 briefly
reviews the register placement step in the method based on
software pipelining, which was outlined above. It also shows that
the algorithm used to place registers is greedy. The problem we
tackle and its optimal solution are presented in Section 4, and a
linear program for that problem is given in Section 5. Section 6
provides experimental results and Section 7 concludes the paper.
2. Preliminaries
2.1. The cyclic graph model
In order to minimize the clock period of a synchronous
sequential circuit, it is modeled (as in [2]) as a directed cyclic graph
$G = (V, E, d, w)$, where V is the set of functional elements in the
circuit, and E is the set of edges which represent interconnections
between vertices. Each vertex v in V has a non-negative integer
propagation delay $d(v) \in N$, which is assumed to be fixed. Each
edge $e_{u,v}$, from u to v, in E is weighted with a register count
$w(e_{u,v}) \in N$, representing the number of registers on the wire
between u and v.
Figure 1 presents an example of a circuit and its directed cyclic
graph model. In this figure, large rectangles represent functional
elements, and small rectangles represent registers. Wires are
oriented to show the propagation direction of the signals. The
propagation delay of each functional element of this circuit is
specified as a label on the left of each large rectangle. This example
will be used throughout this paper, and will serve to illustrate the initial
specification of the problem to optimize. The initial specification
is in general a synchronous circuit with a single-phase clock. The
minimum clock period of the circuit in Figure 1 as specified is 7,
which is equal to $d(v_5) + d(v_1)$.
2.2. Periodic schedules
We define a schedule s [1, 2] as a function $s : N \times V \to Q$,
where $s_n(v) \equiv s(n, v)$ denotes the schedule time of the nth iteration
of operation v. In multi-phase flip-flop based circuits, the schedule
time is the start time of the operation. A schedule s is called periodic
with period P, if:
$$\forall n \in N, \forall v \in V: \quad s_{n+1}(v) = s_n(v) + P. \quad (1)$$
When there is no resource constraint, a schedule s is said to be
valid if and only if the operations terminate before their results are
needed. In this case, we say that data dependencies are satisfied,
which is equivalent to the following mathematical inequality:
$$\forall n \in N, \forall e_{u,v} \in E: \quad s_{n + w(e_{u,v})}(v) \ge s_n(u) + d(u). \quad (2)$$
2.3. Maximum throughput of synchronous
sequential circuits
The throughput, T, of a synchronous sequential circuit is
bounded by the inverse of the length, P, of the critical paths in the
circuit. Based on data dependency constraints only, the maximum
throughput is [1]:
$$T = \min_{c \in C} \left( \sum_{e_{u,v} \in c} w(e_{u,v}) \Big/ \sum_{e_{u,v} \in c} d(u) \right), \quad (3)$$
where C is the set of directed cycles in the directed cyclic graph
modeling the circuit. Determining the maximum throughput is a
Minimal Cost-to-Time Ratio Cycle Problem [6, 10], which can be
solved in the general case in $O(|V| \cdot |E| \cdot \log(|V| \cdot d_{max}))$ [10],
where $d_{max} = \max_{v \in V} d(v)$. A possible method to solve this
problem is to iteratively apply Bellman-Ford’s algorithm [5] for
longest paths on the graph $G_P = (V, E, d, w_P)$, derived from G by
letting:
$$w_P(e_{u,v}) = d(u) - P \cdot w(e_{u,v}), \quad (4)$$
where $e_{u,v} \in E$ and $P = 1/T$. A binary search may be used to
find the minimal value of P for which there is no positive cycle in
$G_P$ [1]. Without loss of generality, we assume that P is greater than
or equal to the execution delay of each computational element in the
circuit.
For the example in Figure 1, we have that P = 6. This value
corresponds to the cycle defined by vertices v1, v2, v4, and v5.
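The search described above can be sketched as follows. This is a minimal illustration and not the implementation used in [1] or [2]: the three-vertex graph, the function names has_positive_cycle and minimal_period, and the floating-point tolerance are assumptions made for the example, and every cycle is assumed to carry at least one register.

```python
def has_positive_cycle(vertices, edges, d, P, eps=1e-9):
    """Bellman-Ford for longest paths on G_P; True if a positive cycle exists."""
    # Start every vertex at 0, as if a virtual source reached all of them.
    dist = {v: 0.0 for v in vertices}
    for _ in range(len(vertices)):
        changed = False
        for u, v, w in edges:
            cand = dist[u] + d[u] - P * w  # edge weight w_P from equation (4)
            if cand > dist[v] + eps:
                dist[v] = cand
                changed = True
        if not changed:
            return False  # longest paths settled: no positive cycle
    return True  # still improving after |V| passes: positive cycle


def minimal_period(vertices, edges, d, iterations=60):
    """Bisection on P; assumes every cycle holds at least one register."""
    lo, hi = 0.0, float(sum(d.values()))  # under that assumption hi is feasible
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if has_positive_cycle(vertices, edges, d, mid):
            lo = mid
        else:
            hi = mid
    # The text assumes P is at least the delay of every element.
    return max(hi, float(max(d.values())))


# Hypothetical graph (not the circuit of Figure 1): edges are (u, v, w(e_uv)).
V = ["a", "b", "c"]
D = {"a": 2, "b": 3, "c": 4}
E = [("a", "b", 0), ("b", "a", 1), ("b", "c", 1), ("c", "a", 1)]

P_MIN = minimal_period(V, E, D)
print(round(P_MIN, 6))
```

In this hypothetical graph the cycle a, b has total delay 5 and one register, while the cycle a, b, c has total delay 9 and two registers, so the search converges to P = 5.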
2.4. Schedule for a given throughput
From equation (1) and inequality (2), we have that:
$$\forall e_{u,v} \in E, \quad s_0(v) - s_0(u) \ge d(u) - P \cdot w(e_{u,v}). \quad (5)$$
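The step from (1) and (2) to (5) can be spelled out as follows. This is one way to read the derivation, using only the symbols defined above: applying (1) repeatedly expresses the schedule of any iteration in terms of iteration 0, and substituting into (2) makes the iteration index n cancel.

```latex
\begin{align*}
s_n(v) &= s_0(v) + n \cdot P
  && \text{by (1), applied } n \text{ times} \\
s_{n + w(e_{u,v})}(v) &= s_0(v) + \bigl(n + w(e_{u,v})\bigr) \cdot P \\
s_0(v) + \bigl(n + w(e_{u,v})\bigr) \cdot P &\ge s_0(u) + n \cdot P + d(u)
  && \text{by (2)} \\
s_0(v) - s_0(u) &\ge d(u) - P \cdot w(e_{u,v})
\end{align*}
```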
In the case of periodic schedules, determining a valid schedule
of all the instances of each vertex v in V is equivalent to determining
$s_0(v)$ for each v in V, which is also equivalent to determining
solutions to the system of inequalities described by (5). To solve
this system, the graph $G_P$, previously described, may be used. To
find an ASAP schedule, Bellman-Ford’s algorithm [5] for longest
paths, from a chosen vertex $v_x$ to the others, may be applied on the
graph $G_P$. Finding an ALAP schedule may be done in three steps.
In step 1, a graph G’ is derived from $G_P$ by inverting the direction
of each edge in $G_P$. In step 2, Bellman-Ford’s algorithm for longest
paths, from the vertex $v_x$ to the others, is applied on the graph
G’, where the weights of its edges are defined by equation (4).
Finally, in step 3, the ALAP schedule is obtained by multiplying each
result of step 2 by -1. Relative to $v_x = v_1$, the ASAP schedules of
vertices v1, v2, v3, v4, v5, and v6 of the circuit in Figure 1 are 0, -3,
3, -1, -4, and -3, respectively. Their ALAP schedules are 0, -3, 4,
-1, -4, and 1, respectively.
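The ASAP and ALAP procedures above can be sketched as follows. The graph, the period P = 5 and the helper name longest_paths are assumptions chosen for illustration; the graph is not the circuit of Figure 1, and the sketch assumes that every vertex is reachable from the chosen vertex and that $G_P$ has no positive cycle.

```python
import math


def longest_paths(vertices, arcs, source):
    """Bellman-Ford for longest paths; assumes no positive cycle is reachable."""
    dist = {v: -math.inf for v in vertices}
    dist[source] = 0
    for _ in range(len(vertices) - 1):
        for u, v, weight in arcs:
            if dist[u] + weight > dist[v]:
                dist[v] = dist[u] + weight
    return dist


# Hypothetical graph (not the circuit of Figure 1): edges are (u, v, w(e_uv)).
V = ["a", "b", "c"]
D = {"a": 2, "b": 3, "c": 4}
E = [("a", "b", 0), ("b", "a", 1), ("b", "c", 1), ("c", "a", 1)]
P = 5  # a period for which G_P has no positive cycle in this example

# Arcs of G_P, weighted by equation (4): w_P(e_uv) = d(u) - P * w(e_uv).
GP = [(u, v, D[u] - P * w) for u, v, w in E]

# ASAP: longest paths from the chosen vertex on G_P.
ASAP = longest_paths(V, GP, "a")

# ALAP: invert every arc (step 1), run longest paths (step 2), negate (step 3).
GP_REVERSED = [(v, u, weight) for u, v, weight in GP]
ALAP = {v: -t for v, t in longest_paths(V, GP_REVERSED, "a").items()}

print(ASAP)
print(ALAP)
```

Both results satisfy the system (5) on this graph, and only vertex c has any slack between its ASAP and ALAP times.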
2.5. Schedule graph
As in [2], a periodic schedule, with period P, is expressed by a
schedule graph $G_s = (V, E, d, w_s, P)$. Here V, E and d have the
same definition given for the case of the graph G previously
defined, and $w_s : E \to Q$ is a weight function, which associates to
each edge $e_{u,v}$ in E the time distance between the schedule times of
u and v. Mathematically, $w_s$ is defined as follows:
$$\forall e_{u,v} \in E, \quad w_s(e_{u,v}) = s_{w(e_{u,v})}(v) - s_0(u). \quad (6)$$
Because s is periodic with period P, equation (6) may be written
as follows:
$$\forall e_{u,v} \in E, \quad w_s(e_{u,v}) = s_0(v) - s_0(u) + P \cdot w(e_{u,v}). \quad (7)$$
The graph $G_s$ is consistent if and only if for each edge $e_{u,v}$ in
E, $w_s(e_{u,v}) \ge d(u)$. This is derived from equation (2). Figure 2
shows a consistent schedule graph, where edges are labeled with $w_s$
values, for the circuit in Figure 1, using the ASAP schedule
determined in Section 2.4.
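As a small illustration of equation (7) and of the consistency condition, the sketch below computes the edge weights of a schedule graph and checks them. The graph, the schedule S0 and the function names are hypothetical and chosen for the example; the dictionary keyed by (u, v) also assumes there are no parallel edges.

```python
def schedule_graph_weights(edges, s0, P):
    """Equation (7): w_s(e_uv) = s0(v) - s0(u) + P * w(e_uv)."""
    return {(u, v): s0[v] - s0[u] + P * w for u, v, w in edges}


def is_consistent(ws, d):
    """G_s is consistent when w_s(e_uv) >= d(u) holds on every edge."""
    return all(weight >= d[u] for (u, _v), weight in ws.items())


# Hypothetical data (not the circuit of Figure 1): edges are (u, v, w(e_uv)).
D = {"a": 2, "b": 3, "c": 4}
E = [("a", "b", 0), ("b", "a", 1), ("b", "c", 1), ("c", "a", 1)]
P = 5
S0 = {"a": 0, "b": 2, "c": 0}  # an assumed valid schedule for this graph

WS = schedule_graph_weights(E, S0, P)
print(WS, is_consistent(WS, D))

# Starting b one time unit too early breaks the dependency a -> b.
BAD = schedule_graph_weights(E, {"a": 0, "b": 1, "c": 0}, P)
print(is_consistent(BAD, D))
```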
3. Register placement
In the method proposed in [2], which was outlined in Section 1,
a register placement step is needed in order to preserve the behavior
of the original circuit. The placement of registers is derived from a
schedule graph $G_s$, by breaking every path in $G_s$ that is longer than
the optimal clock period P. For paths whose length does not exceed P,
no register is required because operation chaining is assumed.
Figure 1. Sample circuit and its directed cyclic graph model.
(Panels: Circuit, with computational elements #1 to #6; Cyclic graph model.)
Figure 2. Schedule graph.
For the circuit in Figure 1, applying the algorithm in [2] for
register placement on the schedule graph Gs in Figure 2, starting
from v1, gives the placement of registers and their schedules as
depicted by Figure 3. The number of registers that are placed is 6
and the number of phases is 4.
The algorithm for register placement in [2] is greedy and therefore
not exact. Indeed, in Figure 3, register R1 can be omitted.
4. Problem formulation and optimal solution
As mentioned in Section 1, two problems arise in the method
based on software pipelining proposed in [2]: the first one is how to
determine a schedule that yields the minimum number of required
registers, and the second one is how to place the minimum number
of required registers even if that schedule is already determined.
Our focus in this paper is to simultaneously solve these two
problems. More precisely, the problem (Prob) we tackle is to
determine a schedule with the minimum register requirements,
where the register placement is done during the schedule
determination. To optimally solve Prob, we provide a mixed
integer linear program (MILP), and use it to derive a linear program
which is polynomial-time solvable.
Before presenting that MILP, let us first give some
requirements. Figure 4 gives a portion of the cyclic graph modeling
the circuit, where i and j are two computational elements. $x_{i,j}$
denotes the number of registers that must be placed on the arc $e_{i,j}$ to
guarantee that the length, $l_{i,j}$, of every path that goes to j via i, is less
than or equal to the optimal clock period P. $l_{i,j}$ is defined
below. Note that as in [2], operation chaining is assumed, and
hence no register is required if $l_{i,j} \le P$. Suppose that the paths that go
to j via i have already been examined in order to determine whether
registers must be placed on them. Let $m_i$ be a non-negative real
greater than or equal to each remainder obtained by dividing the length of
each one of those paths by P. The length $l_{i,j}$ of every path that goes
to j via i is the sum of $m_i$ and $w_s(e_{i,j})$, where $w_s(e_{i,j})$ is defined by
equation (7). $y_{i,j}$ is the remainder of the division of $l_{i,j}$ by P. We require
that $m_i \le (P - d(i))$, which guarantees that if a register R is on the
output of computational element i, then its schedule will be after i
finishes its execution.
Figure 5 presents a mathematical formulation of Prob. The
objective function expresses the number of registers to be placed in
the circuit. Equations (8), (9), and (10) are equivalent to the
definitions of $x_{i,j}$, $y_{i,j}$, and $m_i$, respectively. Inequality (11) is
equivalent to (5). Constraint (13) is required, since the number of registers
must be an integer. In this formulation, the variables are $x_{i,j}$, $y_{i,j}$, $m_i$
and the schedule $s_0(u)$ for each computational element u.
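The formulation in Figure 5 can be linearized as follows. Using
the fact that $\lfloor x \rfloor \le x < \lfloor x \rfloor + 1$, and that no register is required if
the length of a path is less than or equal to P, equation (8) can be
replaced by:
$$\forall e_{i,j} \in E, \quad x_{i,j} \le \left( s_0(j) - s_0(i) + P \cdot w(e_{i,j}) + m_i \right) / P \le x_{i,j} + 1. \quad (14)$$
Equations (9) and (10) together can be replaced by:
$$\forall e_{k,i} \in E, \quad m_i \ge \left( s_0(i) - s_0(k) + P \cdot w(e_{k,i}) + m_k \right) - P \cdot x_{k,i}. \quad (15)$$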
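To see that (14) and (15) are linear in the unknowns $s_0$, $m$ and $x$, one can multiply (14) by the constant $P > 0$ and move the unknowns to one side. The rearrangement below is one possible way of writing the result; it uses only the symbols of (14) and (15), and the terms may be arranged differently in the MILP itself.

```latex
\begin{align*}
x_{i,j} \le \frac{s_0(j) - s_0(i) + P \cdot w(e_{i,j}) + m_i}{P}
  &\iff s_0(j) - s_0(i) + m_i - P \cdot x_{i,j} \ge -P \cdot w(e_{i,j}) \\
\frac{s_0(j) - s_0(i) + P \cdot w(e_{i,j}) + m_i}{P} \le x_{i,j} + 1
  &\iff P \cdot x_{i,j} - s_0(j) + s_0(i) - m_i \ge P \cdot w(e_{i,j}) - P \\
m_i \ge s_0(i) - s_0(k) + P \cdot w(e_{k,i}) + m_k - P \cdot x_{k,i}
  &\iff m_i - s_0(i) + s_0(k) - m_k + P \cdot x_{k,i} \ge P \cdot w(e_{k,i})
\end{align*}
```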
After linearizing the formulation in Figure 5, we obtain the
MILP to optimally solve Prob as presented in Figure 6. In this
figure, equations (16) and (17) are equivalent to (14). (18) is
equivalent to (15). (19), (20) and (21) are equivalent to (12), (11)
and (13), respectively. All variables are non-negative.
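To make the constraints concrete, the sketch below checks whether a candidate assignment of $s_0$, $m$ and $x$ is feasible, using the conditions as read from (11) to (15) in the text. It is a feasibility checker only, not a solver, and it does not establish optimality; the graph, the candidate values and the name check_solution are assumptions made for the example.

```python
def check_solution(edges, d, P, s0, m, x, tol=1e-9):
    """Return the register count if (s0, m, x) is feasible, else None."""
    for i in d:
        # Non-negativity of m_i and bound (12): m_i <= P - d(i).
        if m[i] < -tol or m[i] > P - d[i] + tol:
            return None
    for i, j, w in edges:
        # (11): data dependency, s0(j) - s0(i) >= d(i) - P * w(e_ij).
        if s0[j] - s0[i] < d[i] - P * w - tol:
            return None
        length = s0[j] - s0[i] + P * w + m[i]  # l_ij = w_s(e_ij) + m_i
        xij = x[(i, j)]
        # (13): the register count is a non-negative integer.
        if xij < 0 or xij != int(xij):
            return None
        # (14): x_ij <= l_ij / P <= x_ij + 1.
        if not (P * xij - tol <= length <= P * (xij + 1) + tol):
            return None
        # (15): m_j bounds what is left of l_ij after x_ij registers.
        if m[j] < length - P * xij - tol:
            return None
    return sum(x.values())


# Hypothetical data (not the circuit of Figure 1): edges are (u, v, w(e_uv)).
D = {"a": 2, "b": 3, "c": 4}
E = [("a", "b", 0), ("b", "a", 1), ("b", "c", 1), ("c", "a", 1)]
P = 5
S0 = {"a": 0, "b": 2, "c": 0}
M = {"a": 0, "b": 2, "c": 0}
X = {("a", "b"): 0, ("b", "a"): 1, ("b", "c"): 1, ("c", "a"): 1}

print(check_solution(E, D, P, S0, M, X))

# Dropping the register on arc b -> a violates (15) at vertex a.
BAD_X = dict(X)
BAD_X[("b", "a")] = 0
print(check_solution(E, D, P, S0, M, BAD_X))
```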
5. A linear program for solving Prob
Linear programs are polynomial-time solvable [7, 8]. A linear
program for solving Prob can be obtained by deleting the constraint
that $x_{i,j}$ is integer in Figure 6. In this case, once the linear program
is solved, the number of registers to be placed on the arc $e_{i,j}$ is
obtained from the value of $x_{i,j}$. Due to space limitations, details on why
it is possible to place that number of registers on the arc $e_{i,j}$ can be
found in [3].
Figure 3. Register placement and their schedules.

Register   Schedule
R1         3
R2         5
R3         5
R4         0
R5         2
R6         2
Figure 4. Illustration of the variables of the MILP.
(Shows computational elements i and j, with labels ($x_{i,j}$; $y_{i,j}$) and $m_i$.)
Figure 5. A mathematical formulation to solve Prob.

Minimize $\sum_{e_{i,j} \in E} x_{i,j}$
Subject to:
$\forall e_{i,j} \in E, \quad x_{i,j} = \lfloor (s_0(j) - s_0(i) + P \cdot w(e_{i,j}) + m_i) / P \rfloor$   (8)
$\forall e_{i,j} \in E, \quad y_{i,j} = s_0(j) - s_0(i) + P \cdot w(e_{i,j}) + m_i - P \cdot x_{i,j}$   (9)
$\forall i \in V, \quad m_i \ge y_{k,i}, \quad k \in Fanin(i)$   (10)
$\forall e_{i,j} \in E, \quad s_0(j) - s_0(i) \ge d(i) - P \cdot w(e_{i,j})$   (11)
$\forall i \in V, \quad m_i \le P - d(i)$   (12)
$\forall e_{i,j} \in E, \quad x_{i,j}$ is integer   (13)
Figure 6. A MILP to optimally solve Prob.

Minimize $\sum_{e_{i,j} \in E} x_{i,j}$
Subject to:
$\forall e_{i,j} \in E, \quad s_0(j) - s_0(i) + m_i - P \cdot x_{i,j} \ge -P \cdot w(e_{i,j})$   (16)
$\forall e_{i,j} \in E, \quad P \cdot x_{i,j} - s_0(j) + s_0(i) - m_i \ge P \cdot w(e_{i,j}) - P$   (17)
$\forall e_{k,i} \in E, \quad m_i - s_0(i) + s_0(k) - m_k + P \cdot x_{k,i} \ge P \cdot w(e_{k,i})$   (18)
$\forall i \in V, \quad m_i \le P - d(i)$   (19)
$\forall e_{i,j} \in E, \quad s_0(j) - s_0(i) \ge d(i) - P \cdot w(e_{i,j})$   (20)
$\forall e_{i,j} \in E, \quad x_{i,j}$ is integer   (21)
6. Experimental results
To test the effectiveness of our approach, the MILP in Figure 6
and the corresponding linear program (LP), obtained by dropping
the integrality constraint (21), were run on well-known
benchmarks. Circuits from the ISCAS89 benchmark suite are used
to test the efficiency of the LP in terms of run-time and of the
reduction of the number of registers inserted in the circuit. The
mathematical formulations for each circuit are automatically
generated by a module we coded in C++ and integrated in a tool we
developed in [4]. We did not implement the cited polynomial-time
algorithms for linear programs; instead, the Lp_Solve tool [11] (in the
public domain) is used to solve the generated mathematical
formulations. The results are given in Tables 1 and 2, where
the first column gives the name of the circuit and the second column
presents the number, N1, of registers placed using the algorithm in
[2] with an ALAP schedule. The number, N2, of registers
placed by the MILP (Table 1) or by the LP (Table 2) is presented in
the third column. The fourth column gives the relative gain, defined as
$((N_1 - N_2) / N_1) \times 100\%$. For Table 2, the fifth column gives the
run-time in seconds on an UltraSparc 10 with 1GB RAM. As Table
1 reports, significant reductions of the number of required registers
are obtained. Substantial reductions are also obtained using the LP.
Indeed, as summarized by Table 2, reductions as high as 77.46%
are obtained, with run-times of at most 181.4 s.
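The relative gain can be recomputed directly from N1 and N2. The sketch below does so for three of the LP results; the helper names relative_gain and truncate2 are our own, and truncation to two decimals is an assumption based on how the reported percentages appear to be written.

```python
import math


def relative_gain(n1, n2):
    """Relative gain in percent: ((N1 - N2) / N1) * 100."""
    if n1 <= 0:
        raise ValueError("N1 must be positive")
    return (n1 - n2) / n1 * 100.0


def truncate2(value):
    """Truncate (not round) a non-negative value to two decimals."""
    return math.floor(value * 100) / 100


# (N1, N2) pairs as reported for the LP in Table 2.
LP_RESULTS = {"S344": (131, 53), "S641": (142, 32), "S1423": (422, 216)}

for circuit, (n1, n2) in LP_RESULTS.items():
    print(f"{circuit}: {truncate2(relative_gain(n1, n2))}%")
```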
Table 1. Register placement by [2] vs. by MILP.
Table 2. Register placement by [2] vs. by LP.
7. Conclusions
A method based on software pipelining has recently been
proposed to optimize mono-phase clocked sequential circuits. The
resulting circuit is a multi-phase clocked circuit, where all clocks
have the same period. To preserve the behavior of the original
circuit, registers are placed according to a schedule, which has the
maximum throughput.
In that method, two problems arise: how to determine schedules
that lead to minimal register requirements, and how to place the
minimum number of required registers even if these schedules are
already determined.
In this paper, we have simultaneously tackled these two
problems. We have provided a mixed integer linear program and
used it to derive a linear program, which is polynomial-time
solvable. Experimental results on well-known benchmarks
confirmed the effectiveness of the approach we propose. Indeed,
significant reductions of the number of required registers have been
obtained in short run-times.
References
[1] I.-E. Bennour, Estimation de la performance et méthodes
d’allocation dans la synthèse de systèmes numériques, Thèse
de doctorat, DIRO, Université de Montréal, 1996.
[2] F.-R. Boyer, E.-M. Aboulhamid, Y. Savaria and M. Boyer,
“Optimal Design of Synchronous Circuits Using Software
Pipelining Techniques”, ACM Trans. on Design Aut. of Elec.
Systems, Vol. 7, No. 2, 2002.
[3] F.-R. Boyer, E.-M. Aboulhamid and Y. Savaria, “An
Efficient Verification Method for a Class of Multi-phase
Sequential Circuits”, The 7th IEEE International
Conference on Electronics, Circuits & Systems, Lebanon,
December 17-20, 2000.
[4] N. Chabini, E.-M. Aboulhamid and Y. Savaria, “Reducing
Register and Phase Requirements for Synchronous Circuits
Derived Using Software Pipelining Techniques”,
Proceedings of the IEEE Computer Society Annual
Workshop on VLSI, Orlando, Florida, April, 2001.
[5] T. H. Cormen, C. E. Leiserson, and R. L. Rivest,
Introduction to Algorithms, New York, NY: McGraw-Hill,
1990.
[6] S.-H. Gerez, S.-M.-H. de Groot, and O.-E. Herrmann, “A
Polynomial-Time Algorithm for the Computation of the
Iteration-Period Bound in Recursive Data-Flow Graphs”,
IEEE Trans. on Circuits and Syst.-I, Vol. 39, No. 1, Jan.
1992.
[7] N. Karmarkar, “A New Polynomial-Time Algorithm for
Linear Programming”, Combinatorica, Vol. 4, 1984.
[8] L.-G. Khachian, “A Polynomial Algorithm in Linear
Programming”, Soviet Math. Doklady, Vol. 20, 1979.
[9] S. Y. Kung, H. J. Whitehouse and T. Kailath, VLSI and
Modern Signal Processing, Prentice-Hall, Inc., Englewood
Cliffs, NJ, 1985, pages: 259-60.
[10] E.-L. Lawler, Combinatorial Optimization: Networks and
Matroids, Holt, Rinehart, and Winston, New York, NY,
USA, 1976.
[11] The LP_Solve Tool: ftp://ftp.ics.ele.tue.nl/pub/lp_solve/
Table 1. Register placement by [2] vs. by MILP.

Circuit               #registers placed by [2] (N1)   #registers placed using Figure 6 (N2)   Relative gain
Figure 1              6*                              5                                       16.66%
SOIIR Filter [1]      3                               2                                       33.33%
Polynomial Div. [1]   9                               4                                       55.55%
Correlator [2]        6                               6                                       0%
FOWDEF [9]            14                              8                                       42.85%

Table 2. Register placement by [2] vs. by LP.

Circuit               #registers placed by [2] (N1)   #registers placed using the LP (N2)   Relative gain   Run-time (s)
Figure 1              6*                              5                                     16.66%          0.01
SOIIR Filter [1]      3                               3                                     0%              0.02
Polynomial Div. [1]   9                               4                                     55.55%          0.02
Correlator [2]        6                               6                                     0%              0.01
FOWDEF [9]            14                              12                                    14.28%          0.06
S344                  131                             53                                    59.54%          1.63
S641                  142                             32                                    77.46%          1.25
S1423                 422                             216                                   48.81%          17.31
S5378                 1033                            581                                   43.75%          181.4
S9234                 1042                            466                                   55.27%          93.29

Relative gain: $((N_1 - N_2) / N_1) \times 100\%$.
*: ASAP schedule is used.