A FORMAL DESIGN METHODOLOGY FOR PARALLEL ARCHITECTURES
KHALED M. ELLEITHY & MAGDY A. BAYOUMI
The Center For Advanced Computer Studies
University of Southwestern Louisiana
Lafayette, LA 70504, U.S.A.
ABSTRACT-- In this paper, we introduce a formal approach for synthesis of array architectures. Four different forms are used to express the input algorithm: simultaneous recursion, recursion with respect to different variables, fixed nesting and variable nesting. Four different architectures for the same algorithm are obtained. As an example, a matrix-matrix multiplication algorithm is used to obtain four different optimal architectures. The different architectures of this example are compared in terms of area, time, broadcasting and required hardware.
1. INTRODUCTION
Reported techniques for high level synthesis suffer from the following disadvantages: some systems are not suitable for large problems (Emerald [1]); some systems require the target architecture to be known in advance in order to have the structural description [2-4]; a number of restrictions are imposed on the input description (Flamel [5], HARP [6]); certain phases of the design process are not automated because algorithms to transform between different representations are lacking, which leaves the designer responsible for these phases (HARP [6]); the designer is responsible for specifying the sequencing of operations and the communication among different units [7,8]; and some systems are limited to a special class of algorithms [9-11].
A formal design methodology for high level synthesis was introduced in [12-14] to overcome these disadvantages. The architectures produced by this methodology can be classified as uniprocessor architectures. To exploit the parallelism in a given algorithm, the methodology has been generalized so that it can be applied to the simultaneous recursion form [15,16]. In this paper the methodology is applied to the following forms:
[1] Recursion with respect to several variables.
[2] Fixed nested recursion.
[3] Variable nested recursion.
The methodology provides two main features: completeness and correctness. Completeness means the ability to use the approach for any general algorithm. Correctness is achieved by using a set of transformations that are proved to be correct. A formal framework for the synthesis procedure has been developed which can be easily automated. A design example of matrix-matrix multiplication is used with each one of the forms to obtain a parallel architecture. The different architectures for this example are compared in terms of area and speed.
CH2920-7/90/0000/0603$01.00 © 1990 IEEE
Authorized licensed use limited to: University of Bridgeport. Downloaded on March 01,2010 at 15:04:30 EST from IEEE Xplore. Restrictions apply.
International Conference on Application Specific Array Processors
2. RECURSION WITH RESPECT TO SEVERAL VARIABLES
If $x_i$ ($1 \le i < n$) are $(n-1)$-place functions, $z$ is an $n$-place function, $y$ is a $2n$-place function and the $w$ functions are $n$-place functions, then $z$ is defined by the following Algorithm Specification Language (ASL) [12-14] code:
Transformation Algorithm to RSL
To transform the system of recursion with respect to several variables to the Realization Specification Language (RSL) [12-14] representation, we implement each equation using the same method described in [12-14]. Here is the RSL representation of the system:
(1)  $Init\,p(1, arg_1;\ 2, arg_2;\ \ldots;\ n, arg_n)$
(3)  $I = p_{n+1} \quad Suc(I)$
(4)  $Ready = eq?(I, m)$
(5)  $z(0, arg_2, \ldots, arg_n) = Comp(arg_2, \ldots, arg_n \;\#\; x_1)$
     $z^{Ready}(0, arg_2, \ldots, arg_n) = And(arg_2^{Ready}, arg_3^{Ready}, \ldots, arg_n^{Ready})$
(6)  $z(arg_1+1, 0, arg_3, \ldots, arg_n) = Comp(arg_1+1, 0, arg_3, \ldots, arg_n \;\#\; x_2)$
     $z^{Ready}(arg_1+1, 0, arg_3, \ldots, arg_n) = And((arg_1+1)^{Ready}, arg_3^{Ready}, \ldots, arg_n^{Ready})$
......................................
Equation 1 shows that $n$ registers are initialized with the arguments $(arg_1, \ldots, arg_n)$. Equation 2 means that the unit $Suc$, which is a basic function, has its inputs $control_i$, one for each function $x_i$, connected to the $ready_i$ output of the unit computing $x_i$, to be sure that $I$ is not incremented until $x_i$ is computed. Equation 3 represents the fact that $I$ is incremented every clock cycle using the $Suc$ unit, and that $I$ is initialized to the value 1 using register number $n+1$. Equation 4 determines the end of operation, when $I$ reaches the value $m$. Equations 5, 6, 7 represent the composition operation in equations 1, 2, 3 of the ASL representation, respectively.
Proof of correctness for the algorithm can be found in [17].
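The control behaviour described by Equations 2 to 4 can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not the RSL semantics of [12-14]: the function name `run_control` and the parameter `ready_at` (the clock cycle at which each unit computing $x_i$ raises its ready output) are hypothetical, and $m \ge 1$ is assumed.

```python
def run_control(m, ready_at):
    """Simulate the counter I of the control skeleton (illustrative only).

    m        -- value at which Ready = eq?(I, m) becomes true (assumed >= 1)
    ready_at -- hypothetical list: cycle at which each x_i unit raises ready_i
    """
    I = 1          # I is initialized to 1 (register number n+1 in the text)
    cycle = 0
    trace = []     # value of I at the end of every clock cycle
    while I != m:  # Ready = eq?(I, m)
        cycle += 1
        # Suc is enabled only when every control_i (wired to ready_i) is set.
        if all(cycle >= r for r in ready_at):
            I += 1  # I = Suc(I)
        trace.append(I)
    return cycle, trace


if __name__ == "__main__":
    print(run_control(4, [1, 1]))  # all x_i ready from the first cycle
    print(run_control(4, [3, 1]))  # one x_i unit is ready only at cycle 3
```

When every $x_i$ is ready from the first cycle the counter advances once per cycle; a late ready signal stalls the counter, which is the gating role the text assigns to the control inputs of $Suc$.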
Example:
Let us use recursion with respect to several variables to define the greatest common divisor (GCD). Two variables are required to define the GCD. The definition is as follows:
$GCD(0, n) = n$
$GCD(m+1, 0) = m+1$
$GCD(m+1, n+1) = p(m, n, GCD(w(m,n), n+1), GCD(m+1, w(n,m)))$
$w(m,n) < m+1$
$w(m,n) < n+1$
$p(m,n,a,b) = a$ if $n > m$, $b$ if $m < n$
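The GCD definition above can be exercised directly in software. The sketch below is an illustration under explicit assumptions, not the authors' ASL code: `w` is assumed to be truncated subtraction (which satisfies the bound $w(m,n) < m+1$), and because the two selection conditions for $p$ as printed overlap, `p` is assumed to return `a` when $m \ge n$ and `b` otherwise.

```python
def w(m, n):
    """Assumed auxiliary function: truncated subtraction, so w(m, n) < m + 1."""
    return max(m - n, 0)


def p(m, n, a, b):
    """Assumed selector: take the first recursive result when m >= n."""
    return a if m >= n else b


def gcd(m, n):
    """GCD by recursion with respect to two variables."""
    if m == 0:               # GCD(0, n) = n
        return n
    if n == 0:               # GCD(m+1, 0) = m+1
        return m
    m0, n0 = m - 1, n - 1    # rewrite the arguments as m0+1 and n0+1
    a = gcd(w(m0, n0), n)    # GCD(w(m, n), n+1)
    b = gcd(m, w(n0, m0))    # GCD(m+1, w(n, m))
    return p(m0, n0, a, b)


if __name__ == "__main__":
    print(gcd(12, 18), gcd(17, 5), gcd(0, 5), gcd(7, 0))
```

Both recursive values are computed before `p` selects one of them, mirroring the four-input composition in the definition; under the assumed `w`, one of the two calls always reaches a base case immediately.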
This is expressed in ASL as follows:
GCD
(0,~
=
$(o
,
n
1
GCD(m +1,0)
=
$(m
+1,0)
$GCD(m+1, n+1) = p(m, n, GCD(w(m,n), n+1), GCD(m+1, w(n,m)))$
w
(m
,n
1
-
m
(h(9f(m
,n
1))
w
(n
,m
)
-
-than)
(h($(n
,m
)))
p(m,n,a,b)
-
(compareli(m,n,a,b)
The RSL representation of this example is as follows:
Initp(1,m
;
2,n)
2rcady
2rcdy
S~C~‘?,rr0ll
=
92
(a*)
~~CCO”
=
91
(ai)
I
-
p;+l
suc(I)
Ready
=
eq?
(I
,
m
)
GCD
(0
,
n)
=
Comp
(0,n
#
7;)
GCD~~~Y(O
,
n)
-
pz
nRrady
GCD(~
+i
,
0)
=
comp
(m
+i,o
#
$)
mRrndy
GCDfidy
(m
+1
,
0)
=
p1
GCD(m+l
,
n+l)
-
Comp(m+l,
n+l
,
GCD(w(m,n),n+l)
,
GCD(m+l,w(n,m))
#
p)
GCDRrady(m+l
,
n+l)
-
We see in this example the correspondence between the ASL and RSL representations. Equation 1 shows that 2 registers are initialized with the arguments $m$, $n$. Equations 2, 3, 4 have the same meaning as equations 2, 3, 4 in the transformation algorithm. Equations 5, 6, 7 represent the composition operation in equations 1, 2, 3 of the ASL representation, respectively.
3. NESTED RECURSION
Two cases are considered for nested recursion: a fixed number of nestings and a varied number of nestings. Instead of addressing a recursion of k-fold nesting, we show the idea in each case with a double nesting. The extension to the general k-fold case is straightforward.
3.1 Fixed Number of Nestings
If $x$ is a 1-place function, $z$ is a 3-place function, $y$ is a 2-place function and $w$ is a 3-place function, then $y$ is a double nested recursion function defined by the following ASL:
$y(0, a) = x(a)$
$y(n+1, a) = z(n, a, y(n, w(n, a, y(n, a))))$
Although the solution of the function $y$ gets more complicated as the first argument grows, we are still able to solve it by the same method as before. The solution we obtain for $y(2,a)$ is expressed in the functions $z$, $w$, $x$ but not the function $y$. This means that if the functions $z$, $w$, $x$ are primitive recursive, then $y$ is primitive recursive, since it is derived from $z$, $w$, $x$ by composition.
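The fixed double nesting can be sketched in software as follows. The helper name `make_fixed_nested` and the sample functions `x`, `z`, `w` in the usage part are hypothetical choices made only for illustration; $z$ is taken as a 3-place function, matching the unit $Comp(n, a, Temp_3 \# z)$ in the RSL representation.

```python
def make_fixed_nested(x, z, w):
    """Build y with y(0, a) = x(a) and
    y(n+1, a) = z(n, a, y(n, w(n, a, y(n, a))))."""
    def y(n, a):
        if n == 0:
            return x(a)
        k = n - 1                # the current call computes y(k+1, a)
        temp1 = y(k, a)          # inner occurrence of y
        temp2 = w(k, a, temp1)   # new second argument for the outer y
        temp3 = y(k, temp2)      # outer occurrence of y
        return z(k, a, temp3)
    return y


if __name__ == "__main__":
    # Sample (hypothetical) primitive recursive building blocks.
    y = make_fixed_nested(
        x=lambda a: a + 1,
        z=lambda n, a, v: v + 1,
        w=lambda n, a, v: v,
    )
    print([y(n, 0) for n in range(4)])
```

Every level of the recursion contains exactly two occurrences of `y`, one nested inside the other, regardless of the argument values; this is what distinguishes the fixed case from the variable case of Section 3.2.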
Transformation Algorithm to RSL
To transform the system of nested recursion to RSL we implement each equation using the same method described in [12-14]. Here is the RSL representation of the double nested recursion:
(1)  $Init\,p(1, n;\ 2, a)$
$Suc^{control} = x(a)$
$I = p_3 \quad Suc(I)$
$Ready = eq?(I, n)$
$Temp_1 = Comp(n, a \;\#\; y)$
$Temp_1^{Ready} = And(p_1^{Ready}, p_2^{Ready})$
$Temp_2 = Comp(n, a, Temp_1 \;\#\; w)$
$Temp_2^{Ready} = And(p_1^{Ready}, p_2^{Ready}, Temp_1^{Ready})$
$Temp_3 = Comp(n, Temp_2 \;\#\; y)$
$Temp_3^{Ready} = And(p_1^{Ready}, Temp_2^{Ready})$
$y = Comp(n, a, Temp_3 \;\#\; z)$
$y^{Ready} = And(p_1^{Ready}, p_2^{Ready}, Temp_3^{Ready})$
Proof of correctness for the algorithm can be found in [17].
3.2 Variable Number of Nestings
If $x$ is a 1-place function and $y$ is a 2-place function, then $y$ is a double nested recursion function defined by the following ASL:
$y(0, n) = x(n)$
$y(m+1, 0) = y(m, 1)$
$y(m+1, n+1) = y(m, y(m+1, n))$
To see that this definition does not behave in the same manner as the definition in Section 3.1, let us compute values of $y$ for different $m$, $n$.
From the computation of $y$ for different $m$, $n$ we notice that the number of nestings is not constant and depends on earlier values. This means that nested recursion can lead out of primitive recursion.
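A small program makes the value-dependent nesting visible. This is an illustrative sketch only: the function name `y`, the `stats` dictionary and the choice of the successor function for $x$ are assumptions introduced here (with $x$ the successor, the three equations are those of the Ackermann function).

```python
def y(m, n, x, stats):
    """Evaluate the variable-nesting recursion and count applications of x.

    y(0, n)     = x(n)
    y(m+1, 0)   = y(m, 1)
    y(m+1, n+1) = y(m, y(m+1, n))
    """
    if m == 0:
        stats["x_calls"] += 1    # one more nested application of x
        return x(n)
    if n == 0:
        return y(m - 1, 1, x, stats)
    inner = y(m, n - 1, x, stats)      # inner value decides what follows
    return y(m - 1, inner, x, stats)


if __name__ == "__main__":
    successor = lambda n: n + 1        # assumed choice of x
    for m, n in [(1, 2), (1, 5), (2, 3)]:
        stats = {"x_calls": 0}
        value = y(m, n, successor, stats)
        print(m, n, value, stats["x_calls"])
```

For $m = 1$ the number of applications of $x$ grows with $n$, and for larger $m$ it depends on previously computed values of $y$, so no fixed composition of $x$ describes $y$. Only small arguments should be tried, since the values and the recursion depth grow very quickly.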
Transformation Algorithm to RSL
To transform the system of nested recursion to RSL we implement each equation using the same method described in [12-14]. Here is the RSL representation of the double nested recursion:
Inirp(1,n
;
2,m)
J*C,,,,,,I
-
x
(a
I
-
p;
SUC(Il)
I
-
p,'
SllC(I*)
yfi.dy(O,n)
~
pdv
Ready
-And
(eg?
(I,
,
n)
,
eg?
(I,
,
n))
y(O,n)
-
cOWlP(7Z
#
X)
(n
)
y(m+l,O)
-
Comp(m,l
#
y)
yfin'"(m+l,O) -yfiady(m,l)
Temp,
-
Comp(m+l,n
#
y)
Temp?
-And
(prfindy
,
prfidy)
y
-
Comp (m
,a
,Temp
,
#
Y
1
yfipdy
-And
(pTfiadv
,
Tempydy)
Temp?
-And
(pYfindy
,
Temp?)
y
-
Comp(n,a,Temp3
#
z)
randy
I
And
(pl)fidy
,
p?
,
Temp?)
(7)
(9)
Proof of correctness for the algorithm can be found in [17].
Table 1 shows a comparison between the different forms of recursion in terms of architecture, broadcasting and complexity of the controller. Simultaneous recursion is the only form that gives a two-dimensional array. All forms have broadcasting except variable nesting. The controller of the variable nesting form is complex compared with the other three forms.
Table 1. Comparison Between Different Forms of Recursion.

                           Simultaneous      Several variables   Fixed nesting     Variable nesting
Architecture               two-dimensional   one-dimensional     one-dimensional   one-dimensional
Broadcasting               yes               yes                 yes               no
Complexity of controller   simple            simple              simple            complex
5. MATRIX-MATRIX MULTIPLICATION EXAMPLE
An example of matrix multiplication is introduced as an application of the different forms of recursion. The architecture has two matrices A and B as inputs, and matrix C as an output. The multiplication is done in a recursive way and can be described by the following high level subroutine:
mat-multiplication (A, B)
begin
    for i = 1 to n
    begin
        for j = 1 to n
        begin
            C_ij^(0) = 0
            for k = 1 to n
                C_ij^(k) = C_ij^(k-1) + A_ik * B_kj
            next k
        end
        next j
    end
    next i
end
The ASL and RSL representations using the simultaneous recursion form are as follows:
......................................
s14cconmi
-
[()““‘Y
r
-
p;+l
$Uc(r)
Ready
=
eq?(l
,
m)
Remltl
-
comp (aT,I,p$’ Result
#
innerproduct)
......................................
Resultl
-
comp (fiT,I,p$+’
Result
#
innerproduct)
Figure 1 shows the architecture obtained for matrix multiplication. The details of implementing the inner-product cell are shown in [14]. The architecture consists of N inner-product cells. The number of cycles required to perform the multiplication is N.
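The cell and cycle counts quoted for Figure 1 can be checked with a behavioural sketch. The scheduling below is an assumption made for illustration, not taken from the figure: in cycle I, one column of B is broadcast and cell i forms the inner product of row i of A with that column. The names `inner_product` and `matmul_cells` are hypothetical.

```python
def inner_product(row, col):
    """Behaviour of one inner-product cell (its implementation is in [14])."""
    return sum(a * b for a, b in zip(row, col))


def matmul_cells(A, B):
    """Multiply two n x n matrices with n inner-product cells in n cycles.

    Assumed schedule: cycle I broadcasts column I of B to all cells, and
    cell i produces C[i][I] from row i of A.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    cycles = 0
    for I in range(n):                      # counter I; done after n steps
        col = [B[k][I] for k in range(n)]   # broadcast column of B
        for i in range(n):                  # the n cells work in parallel
            C[i][I] = inner_product(A[i], col)
        cycles += 1
    return C, cycles


if __name__ == "__main__":
    C, cycles = matmul_cells([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(C, cycles)
```

The inner loop over `i` is sequential only because this is software; in the architecture the N cells operate concurrently, so the cycle count equals the number of iterations of the outer loop.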
Figure 2 shows the architecture using recursion with respect to several variables. The architecture consists of N multiplication cells and one adder. The number of cycles required to perform the multiplication is N.
Figure 3 shows the architecture using fixed nesting recursion. The architecture consists of N inner-product cells. The number of cycles required to perform the multiplication is N.
Table 2 shows a comparison between the different architectures of the matrix-matrix multiplication.
Fig. 1. Matrix-Matrix Multiplication Architecture Using Simultaneous Recursion.
Table 2. Comparison Between Different Matrix-Matrix Multiplication Architectures.

               Simultaneous        Several variables           Nesting
Area
Time
A+T            $\Theta(n^3)$       $\Theta(n^3)$               $\Theta(n^3)$
Broadcasting   yes                 no                          no
Hardware       inner-product (n)   multiplier (n), adder (1)   inner-product (n)

Fig. 2. Matrix-Matrix Multiplication Architecture Using Recursion With Respect to Several Variables.
6. CONCLUSIONS
In this paper a formal approach for transforming different forms of recursion to parallel architectures has been introduced. Four different forms are used to express a given algorithm. Four optimal architectures for matrix-matrix multiplication are compared. The approach has the following advantages:
- It is suitable for large problems since the transformation algorithm is linear.
- It does not require the target architecture to be known in advance.
- There are no restrictions imposed on the input description.
- The technique is fully automated.
- The designer is not responsible for specifying the sequencing of operations and the communication among different units.
- The approach is applicable to any general algorithm.
- Parallel properties of algorithms are explored.
Fig. 3. Matrix-Matrix Multiplication Using Fixed Number of Nestings.
References
[1] C. J. Tseng, "Automated Synthesis of Data Paths in Digital Systems," Ph.D. Dissertation, CMU, Apr. 1984.
[2] T. J. Kowalski, D. J. Geiger, W. H. Wolf, and A. C. Parker, "The VLSI Design Automation Assistant: From Algorithms to Silicon," IEEE Design and Test, pp. 33-43, Aug. 1985.
[3] T. J. Kowalski, "The VLSI Design Automation Assistant: A Knowledge-based Expert System," Ph.D. thesis, CMU, 1984.
[4] T. J. Kowalski and D. E. Thomas, "The VLSI Design Automation Assistant: Prototype System," 20th Design Automation Conf. Proc., Miami, pp. 479-483, 1983.
[5] H. Trickey, "Flamel: A High Level Hardware Compiler," IEEE Trans. on Computer Aided Design, vol. CAD-6, no. 2, pp. 259-269, March 1987.
[6] T. Tanaka, T. Kobayashi and O. Karatsu, "HARP: Fortran to Silicon," IEEE Trans. on Computer-Aided Design, vol. 8, no. 6, pp. 649-660, June 1989.
[7] C. Niessen, C. H. van Berkel, M. Rem, and R. W. Saeijs, "VLSI Programming and Silicon Compilation; A Novel Approach from Philips Research," Intl Conf. on Computer Design: VLSI in Computer and Processors, pp. 150-151, Oct. 1988.
[8] C. H. van Berkel, M. Rem and R. W. Saeijs, "VLSI Programming," Intl Conf. on Computer Design: VLSI in Computer and Processors, pp. 152-156, Oct. 1988.
[9] S. D. Johnson, "Synthesis of Digital Designs from Recursion Equations," Ph.D. Dissertation, Comp. Sc. Dept., Indiana Uni., 1983.
[10] P. Quinton, "Automatic Synthesis of Systolic Arrays From Uniform Recurrent Equations," Proc. of the 11th Annual Intl Symp. on Computer Architecture, pp. 208-214, 1984.
[11] S. K. Rao, "Regular Iterative Algorithms and Their Implementations on Processor Arrays," Ph.D. Dissertation, Dept. of Electrical Eng., Stanford Uni., 1985.
[12] K. M. Elleithy and M. A. Bayoumi, "Synthesizing DSP Architectures from Behavioral Specifications: A Formal Approach," Proceedings of the 1990 IEEE International Symposium for Circuits and Systems, May 1990.
[13] K. M. Elleithy and M. A. Bayoumi, "A Formal High Level Synthesis Approach for DSP Architectures," Proceedings of the 1990 International Conf. on Acoustics, Speech and Signal Processing, April 1990.
[14] K. M. Elleithy and M. A. Bayoumi, "A Framework for High Level Synthesis of Digital Architectures from Recursive Algorithms," Proc. of the ACM Eighteenth Annual Computer Science Conference, pp. 305-311, Feb. 1990.
[15] K. M. Elleithy and M. A. Bayoumi, "Formal Synthesis of Parallel Architectures from Recursive Equations," Accepted for inclusion in the 1990 International Conference on Parallel Processing, Aug. 1990.
[16] K. M. Elleithy and M. A. Bayoumi, "A Formal Framework for Synthesis of Parallel Architectures," Proceedings of the Fourth Annual Symposium on Parallel Processing, April 1990.
[17] K. M. Elleithy, "A Formal Framework for High Level Synthesis of Digital Designs," Ph.D. Dissertation, The Center for Advanced Computer Studies, University of SW Louisiana, 1990.