Content uploaded by Khaled Elleithy
Author content
All content in this area was uploaded by Khaled Elleithy
Content may be subject to copyright.
From Algorithms To Parallel Architectures: A Formal Approach
Khaled M. Elleithy
Computer Engineering Department
King Fahd University
Dhahran 31261, Saudi Arabia

Magdy A. Bayoumi
The Center For Advanced Computer Studies
University of Southwestern Louisiana
Lafayette, LA 70504, U.S.A.

TH0363-2/91/0000/0358$01.00 © 1991 IEEE

Abstract
In this paper, we introduce a formal approach for the synthesis of parallel architectures. Four different forms are used to express the given algorithms: simultaneous recursion, recursion with respect to several variables, fixed nesting and variable nesting. Four different architectures for the same algorithm are obtained. As an example, a matrix-matrix multiplication algorithm is used to obtain four different optimal architectures. The different architectures of this example are compared in terms of area, time, broadcasting and required hardware. The approach provides two main features: completeness and correctness.

1. Introduction
Formal high level synthesis of general architectures is an important design phase to ensure functional, correct and cost-effective architectures. Recently, there have been several efforts in this direction [1-5], but many of these efforts have not included parallel architectures. There have also been several approaches for synthesizing special classes of arrays [6,7]. In this paper we present a formal system for synthesizing parallel architectures. The architectures produced by the basic form of this system can be classified as uniprocessor architectures. To exploit the parallelism in a given algorithm, the methodology has been generalized so that it can be applied to simultaneous recursion forms [10]. In this paper we extend the methodology by applying it to the following forms: recursion with respect to several variables, fixed nested recursion and variable nested recursion. Recursion with respect to several variables is discussed in detail; for a complete discussion of all the forms, please refer to [10]. The methodology provides two main features: completeness and correctness. Completeness means the ability to use the approach for any general algorithm. Correctness is achieved by using a set of transformations that are proved to be correct. A design example of matrix-matrix multiplication is used with each of the forms to obtain a parallel architecture. These different architectures for the example are compared in terms of area and speed.
1.1. System Overview
Figure 1 shows the different components of our formal system. The system is composed of two subsystems: a synthesis subsystem and a user-interface subsystem. A new language, the Algorithm Specification Language (ASL), based on μ-recursive functions, is used to specify the given algorithm. Transformation techniques are used to transform an algorithm specified in ASL into a realization language called RSL. Every construct in ASL has an isomorphic representation in RSL, which is the basis of the automated transformation.
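To make the construct-by-construct mapping more concrete, the following minimal Python sketch translates a nested function composition into one composition equation plus one ready equation per function application. The term representation, the `Apply` class, the `to_rsl` function and the textual `Comp(... # f)` / `And(...)` output format are illustrative assumptions loosely modeled on the equations of RSL1; they are not the actual ASL or RSL grammar of the system.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Union


@dataclass
class Apply:
    """Hypothetical ASL-like term: function `func` applied to sub-terms."""
    func: str
    args: List["Term"]


# A term is either an input name (a wire) or a function application.
Term = Union[str, Apply]


def to_rsl(term: Term) -> List[str]:
    """Translate a composition tree into RSL-like equations (illustrative format)."""
    equations: List[str] = []
    fresh = count(1)  # yields 1, 2, ... to name intermediate results t1, t2, ...

    def visit(t: Term) -> str:
        if isinstance(t, str):
            return t  # an input needs no hardware unit of its own
        inputs = [visit(arg) for arg in t.args]  # build the argument units first
        out = f"t{next(fresh)}"
        # One composition unit per application: apply t.func to its inputs.
        equations.append(f"{out} <- Comp({', '.join(inputs)} # {t.func})")
        # The unit's result is ready only when all of its inputs are ready.
        readies = ", ".join(f"{name}^Ready" for name in inputs)
        equations.append(f"{out}^Ready <- And({readies})")
        return out

    visit(term)
    return equations


if __name__ == "__main__":
    for eq in to_rsl(Apply("f", [Apply("g", ["a"]), "b"])):
        print(eq)
```

Because each application maps to exactly one pair of equations, the output grows linearly with the size of the input term, which is the kind of one-to-one correspondence the isomorphism claim suggests.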
A logic programming environment based on Prolog is employed as a user interface to the synthesis process. The logic programming environment supports specifying, simulating, and testing the target systems. Prolog provides homogeneity to the developed system, as it supports hierarchical development and mixing of descriptions at various hierarchical levels. For more details on the synthesis subsystem and the user interface subsystem, please refer to [8, 10]. Due to space limitations, in the next section we present in detail only the synthesis approach for recursion with respect to several variables.
Authorized licensed use limited to: University of Bridgeport. Downloaded on February 24,2010 at 13:29:18 EST from IEEE Xplore. Restrictions apply.

2. Recursion with respect to several variables
If x_i (1 ≤ i ≤ n) are (n−1)-place functions, z_i (1 ≤ i ≤ n) are n-place functions, y is a 2n-place function and w_i are k-place functions, then z is defined by the ASL code ASL1.
Transformation Algorithm to RSL
To transform the system of recursion with respect to several variables into RSL, we implement each equation using the same method described in [9]. RSL1 is the RSL representation of the system.
Equation 1 shows that n registers are initialized with the arguments (arg_1, ..., arg_n). Equation 2 means that the unit Suc, which is a basic function, has its inputs control_i (1 ≤ i ≤ v) connected to the ready_i output of the unit computing x_i, so that I is not incremented until x_i is computed. Equation 3 represents the fact that I is incremented every clock cycle using the Suc unit, and that I is initialized to the value 1 using register number n+1. Equation 4 determines the end of the operation, when I reaches the value m.
Equations 5, 6 and 7 represent the composition operation in equations 1, 2 and 3 of the ASL representation, respectively. The architecture for recursion with respect to several variables is shown in Figure 2.
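The control behavior described by Equations 1 to 4 can be approximated by a small cycle-level model. The sketch below is a hypothetical Python rendering under stated assumptions: the `Controller` class, the `cycles_until_ready` helper and the per-cycle trace of `x_ready` flags are invented for illustration, and the ready outputs of the units computing x_i are treated as plain booleans supplied from outside.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Controller:
    """Hypothetical cycle-level model of the control part of RSL1 (Equations 1-4)."""
    args: List[int]  # arg_1 .. arg_n
    m: int           # value of I that ends the operation
    registers: List[int] = field(init=False)
    i: int = field(init=False)

    def __post_init__(self) -> None:
        self.registers = list(self.args)  # Eq. 1: n registers hold the arguments
        self.i = 1                        # Eq. 3: I starts at 1 (register n+1)

    @property
    def ready(self) -> bool:
        return self.i == self.m           # Eq. 4: operation ends when I reaches m

    def clock(self, x_ready: List[bool]) -> None:
        """Advance one clock cycle given the ready outputs of the units computing x_i."""
        # Eq. 2: Suc's control inputs are wired to those ready outputs, so I is
        # incremented (Eq. 3) only when every one of them is asserted.
        if not self.ready and all(x_ready):
            self.i += 1


def cycles_until_ready(ctrl: Controller, trace: List[List[bool]]) -> int:
    """Clock `ctrl` with one list of ready flags per cycle; return the cycle at
    which Ready is observed, or -1 if the trace ends before that."""
    for cycle, flags in enumerate(trace):
        if ctrl.ready:
            return cycle
        ctrl.clock(flags)
    return len(trace) if ctrl.ready else -1
```

With every x_i unit ready on every cycle, Ready is observed after m − 1 cycles; a cycle in which any ready flag is low simply stalls the counter.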
A similar analysis has been done for the other two approaches: fixed nested recursion and variable nested recursion [10]. Table 1 compares these different forms of recursion in terms of architecture, broadcasting and complexity of the controller. Simultaneous recursion is the only form that gives a two-dimensional array. All forms have broadcasting except variable nesting. The controller for variable nesting is complex compared with those of the other three forms.
3. Matrix-Matrix Multiplication Example
An example of matrix multiplication is introduced as an application of the different forms of recursion. The architecture has two matrices A and B as inputs, and matrix C as an output. The multiplication is done in a recursive way and can be described by the following high-level subroutine:
matrix-multiplication (A, B)
begin
   for i ← 1 to n
      for j ← 1 to n
      begin
         c_ij,0 ← 0
         for k ← 1 to n
            c_ij,k ← c_ij,k-1 + A_i,k * B_k,j
         next k
      end
      next j
   next i
end
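For reference, the subroutine above can be transcribed directly into Python. This is a sketch assuming square n × n matrices stored as lists of rows, with 0-based indices in place of the 1-based indices of the subroutine; the function name `matrix_multiplication` is chosen for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matrix_multiplication(a: Matrix, b: Matrix) -> Matrix:
    """Transcription of the recursive subroutine: c_ij,0 = 0 and
    c_ij,k = c_ij,k-1 + A_i,k * B_k,j, so that C_ij = c_ij,n."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0                          # c_ij,0 = 0
            for k in range(n):
                acc = acc + a[i][k] * b[k][j]  # c_ij,k = c_ij,k-1 + A_i,k * B_k,j
            c[i][j] = acc                      # C_ij = c_ij,n
    return c
```

For example, `matrix_multiplication([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19.0, 22.0], [43.0, 50.0]]`.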
For the ASL and RSL representations using the four recursive forms, please refer to [10].
Figure 3 shows the architecture obtained for matrix multiplication for the case of simultaneous recursive equations. The details of implementing the inner-product cell are shown in [9]. The architecture consists of N inner-product cells. The number of cycles required to perform the multiplication is N.
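One way to read the inner-product-cell architecture is sketched below in Python. The model rests on assumptions not stated in the text: each inner-product cell is taken to produce one complete inner product per cycle, cell j is assumed to hold column j of B, and row i of A is assumed to be broadcast to all cells in cycle i. Under these assumptions the array needs N cycles, which agrees with the count given above; the names `inner_product_cell` and `simultaneous_array` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def inner_product_cell(row: List[float], column: List[float]) -> float:
    # Assumption: a cell forms one complete inner product per cycle.
    return sum(x * y for x, y in zip(row, column))


def simultaneous_array(a: Matrix, b: Matrix) -> Tuple[Matrix, int]:
    """Hypothetical model of N inner-product cells computing C = A * B.
    Returns the product and the number of cycles used."""
    n = len(a)
    # Assumption: cell j permanently holds column j of B.
    stored_columns = [[b[k][j] for k in range(n)] for j in range(n)]
    c: Matrix = []
    cycles = 0
    for i in range(n):
        row = a[i]  # row i of A is broadcast to all N cells in this cycle
        c.append([inner_product_cell(row, stored_columns[j]) for j in range(n)])
        cycles += 1
    return c, cycles
```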
Figure 4 shows the architecture using recursion with respect to several variables. The architecture consists of N multiplication cells and one adder. The number of cycles required to perform the multiplication is N.
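The organization with N multiplication cells and one adder can be sketched in the same style. In this hypothetical functional model, the N multipliers form the partial products A_i,k * B_k,j in parallel and the single adder sums them into one element of C. The function names are invented for illustration, and the model makes no claim about the cycle-level scheduling used in Figure 4.

```python
from typing import List

Matrix = List[List[float]]


def multiplier_cells(row: List[float], column: List[float]) -> List[float]:
    # N multiplier cells working in parallel: cell k forms A[i][k] * B[k][j].
    return [x * y for x, y in zip(row, column)]


def adder(products: List[float]) -> float:
    # The single adder combines the N partial products into one element of C.
    total = 0.0
    for p in products:
        total += p
    return total


def several_variables_datapath(a: Matrix, b: Matrix) -> Matrix:
    """Hypothetical functional model: C_ij = adder(multipliers(row i, column j))."""
    n = len(a)
    return [[adder(multiplier_cells(a[i], [b[k][j] for k in range(n)]))
             for j in range(n)]
            for i in range(n)]
```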
Figure 5 shows the architecture using fixed nesting recursion. The architecture consists of N inner-product cells. The number of cycles required to perform the multiplication is N.
Table 2 compares the different architectures for the matrix-matrix multiplication.
4. Conclusions
In this paper, a formal approach for transforming different forms of recursion into parallel architectures has been introduced. Four different forms are used to express a given algorithm. Four optimal architectures for a matrix-matrix multiplication are compared. The developed approach represents the first step towards developing a high-level synthesis system for general parallel architectures. It ensures correctness, but it does not address optimality, which is also an important issue. The developed approach has the following advantages:
[1] It is suitable for large problems, since the transformation algorithm is linear.
[2] It does not require knowing the target architecture in advance.
[3] The technique is fully automated.
[4] The designer is not responsible for specifying the operation sequencing and the communication among different units.
[5] The approach is applicable to any general algorithm.

Acknowledgements: The first author wishes to thank King Fahd University of Petroleum and Minerals for support.

5. References
[1] C. J. Tseng, "Automated Synthesis of Data Paths in Digital Systems," Ph.D. Dissertation, CMU, Apr. 1984.
[2] T. J. Kowalski, "The VLSI Design Automation Assistant: A Knowledge-based Expert System," Ph.D. thesis, CMU, 1984.
[3] H. Trickey, "Flamel: A High Level Hardware Compiler," IEEE Trans. on Computer-Aided Design, vol. CAD-6, no. 2, pp. 259-269, March 1987.
[4] T. Tanaka, T. Kobayashi and O. Karatsu, "HARP: Fortran to Silicon," IEEE Trans. on Computer-Aided Design, vol. 8, no. 6, pp. 649-660, June 1989.
[5] C. Niessen, C. H. van Berkel, M. Rem, and R. W. Saeijs, "VLSI Programming and Silicon Compilation: A Novel Approach from Philips Research," Intl. Conf. on Computer Design: VLSI in Computers and Processors, pp. 150-151, Oct. 1988.
[6] S. K. Rao, "Regular Iterative Algorithms and Their Implementations on Processor Arrays," Ph.D. Dissertation, Dept. of Electrical Eng., Stanford Univ., 1985.
[7] P. Quinton, "Automatic Synthesis of Systolic Arrays From Uniform Recurrent Equations," Proc. of the 11th Annual Intl. Symp. on Computer Architecture, pp. 208-214, 1984.
[8] K. M. Elleithy and M. A. Bayoumi, "Formal Synthesis of Parallel Architectures from Recursive Equations," Proceedings of the 1990 International Conference on Parallel Processing, Aug. 1990.
[9] K. M. Elleithy and M. A. Bayoumi, "A Formal Framework for Synthesis of Parallel Architectures," Proceedings of the Fourth Annual Symposium on Parallel Processing, April 1990.
[10] K. M. Elleithy, "A Formal Framework for High Level Synthesis of Digital Designs," Ph.D. Dissertation, Center for Advanced Computer Studies, University of Southwestern Louisiana, 1990.
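$$
\begin{aligned}
&\quad\vdots\\
z_{n-1} &= z(\mathrm{arg}_1+1, \ldots, \mathrm{arg}_{n-2}+1, \mathrm{arg}_{n-1}, w_{n-1})\\
z_n &= z(\mathrm{arg}_1+1, \ldots, \mathrm{arg}_{n-1}+1, \mathrm{arg}_n)
\end{aligned}
$$

ASL1: ASL code for recursion with respect to several variables

| | Simultaneous | Several variables | Fixed nesting | Variable nesting |
|---|---|---|---|---|
| Architecture | two-dimensional | one-dimensional | one-dimensional | one-dimensional |
| Broadcasting | Yes | Yes | Yes | No |
| Complexity of controller | simple | simple | simple | complex |

Table 1. Comparison Between Different Forms of Recursion.

| | Simultaneous | Several variables | Fixed nesting |
|---|---|---|---|
| Area | (illegible in source) | (illegible in source) | (illegible in source) |
| Time | (illegible in source) | (illegible in source) | (illegible in source) |
| A*T | (illegible in source) | (illegible in source) | (illegible in source) |
| Broadcasting | Yes | No | No |
| Hardware | inner-product (n) | multiplier (n), adder (1) | inner-product (n) |

Table 2. Comparison Between Different Matrix-Matrix Multiplication Architectures.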
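$$
\begin{aligned}
&\mathrm{Init}_p(1, \mathrm{arg}_1;\ 2, \mathrm{arg}_2;\ \ldots;\ n, \mathrm{arg}_n)\\
&\quad\vdots\\
&\mathrm{Suc}^{\mathrm{control}_i} \leftarrow x_i^{\mathrm{Ready}} \qquad (1 \le i \le v)\\
&I \leftarrow p_{n+1},\ \mathrm{Suc}(I)\\
&\mathrm{Ready} \leftarrow (I = m)\\
&z(0, \mathrm{arg}_2, \ldots, \mathrm{arg}_n) \leftarrow \mathrm{Comp}(\mathrm{arg}_2, \ldots, \mathrm{arg}_n \,\#\, x_1)\\
&z(\mathrm{arg}_1+1, 0, \mathrm{arg}_3, \ldots, \mathrm{arg}_n) \leftarrow \mathrm{Comp}(\mathrm{arg}_1+1, 0, \mathrm{arg}_3, \ldots, \mathrm{arg}_n \,\#\, x_2)\\
&z^{\mathrm{Ready}}(0, \mathrm{arg}_2, \ldots, \mathrm{arg}_n) \leftarrow \mathrm{And}(\mathrm{arg}_2^{\mathrm{Ready}}, \mathrm{arg}_3^{\mathrm{Ready}}, \ldots, \mathrm{arg}_n^{\mathrm{Ready}})\\
&z^{\mathrm{Ready}}(\mathrm{arg}_1+1, 0, \mathrm{arg}_3, \ldots, \mathrm{arg}_n) \leftarrow \mathrm{And}((\mathrm{arg}_1+1)^{\mathrm{Ready}}, \ldots, \mathrm{arg}_n^{\mathrm{Ready}})\\
&\quad\vdots\\
&z(\mathrm{arg}_1+1, \mathrm{arg}_2+1, \ldots, \mathrm{arg}_{n-1}+1, 0) \leftarrow \mathrm{Comp}(\mathrm{arg}_1+1, \mathrm{arg}_2+1, \ldots, \mathrm{arg}_{n-1}+1 \,\#\, x_n)\\
&z^{\mathrm{Ready}}(\mathrm{arg}_1+1, \mathrm{arg}_2+1, \ldots, \mathrm{arg}_{n-1}+1, 0) \leftarrow \mathrm{And}((\mathrm{arg}_1+1)^{\mathrm{Ready}}, (\mathrm{arg}_2+1)^{\mathrm{Ready}}, \ldots, (\mathrm{arg}_{n-1}+1)^{\mathrm{Ready}})\\
&\quad\vdots\\
&z_{n-1} \leftarrow \mathrm{Comp}(\mathrm{arg}_1+1, \ldots, \mathrm{arg}_{n-2}+1, \mathrm{arg}_{n-1}, w_{n-1} \,\#\, z)\\
&z_{n-1}^{\mathrm{Ready}} \leftarrow \mathrm{And}((\mathrm{arg}_1+1)^{\mathrm{Ready}}, \ldots, (\mathrm{arg}_{n-2}+1)^{\mathrm{Ready}}, \mathrm{arg}_{n-1}^{\mathrm{Ready}}, w_{n-1}^{\mathrm{Ready}})\\
&z_n \leftarrow \mathrm{Comp}(\mathrm{arg}_1+1, \ldots, \mathrm{arg}_{n-1}+1, \mathrm{arg}_n \,\#\, z)\\
&z_n^{\mathrm{Ready}} \leftarrow \mathrm{And}((\mathrm{arg}_1+1)^{\mathrm{Ready}}, \ldots, (\mathrm{arg}_{n-1}+1)^{\mathrm{Ready}}, \mathrm{arg}_n^{\mathrm{Ready}})
\end{aligned}
$$

RSL1: RSL code for recursion with respect to several variables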