A formal design methodology for parallel architectures

The authors introduce a formal approach for synthesis of array architectures. The methodology provides two main features: completeness and correctness. Completeness means the ability to use the approach for any general algorithm. Correctness is achieved by using a set of transformations that are proved to be correct. Four different forms are used to express the input algorithm: simultaneous recursion, recursion with respect to different variables, fixed nesting, and variable nesting. Four different architectures for the same algorithm are obtained. As an example, a matrix-matrix multiplication algorithm is used to obtain four different optimal architectures. The different architectures of this example are compared in terms of area, time, broadcasting, and required hardware.
A FORMAL DESIGN METHODOLOGY FOR PARALLEL ARCHITECTURES
KHALED M. ELLEITHY & MAGDY A. BAYOUMI
The Center For Advanced Computer Studies
University of Southwestern Louisiana
Lafayette, LA 70504, U.S.A.
ABSTRACT -- In this paper, we introduce a formal approach for synthesis of array architectures. Four different forms are used to express the input algorithm: simultaneous recursion, recursion with respect to different variables, fixed nesting and variable nesting. Four different architectures for the same algorithm are obtained. As an example, a matrix-matrix multiplication algorithm is used to obtain four different optimal architectures. The different architectures of this example are compared in terms of area, time, broadcasting and required hardware.
1. INTRODUCTION
Reported techniques for high level synthesis suffer from the following disadvantages: some systems are not suitable for large problems (Emerald [1]); some systems require the target architecture to be known in advance in order to have the structural description [2-4]; a number of restrictions are imposed on the input description (Flamel [5], HARP [6]); certain phases of the design process are not automated due to the lack of algorithms to transform between different representations, and such limitations leave the designer responsible for designing these phases (HARP [6]); the designer is responsible for specifying the operations sequencing and communications among different units [7,8]; and some systems are limited to a special class of algorithms [9-11].
A formal design methodology for high level synthesis has been introduced in [12-14] to overcome the previous disadvantages. The architectures produced by this methodology can be classified as uniprocessor architectures. To exploit the parallelism in a given algorithm, the methodology has been generalized so that it can be applied to the simultaneous recursion form [15,16]. In this paper the methodology is applied to the following forms:
[1] Recursion with respect to several variables.
[2] Fixed nested recursion.
[3] Variable nested recursion.
The methodology provides two main features: completeness and correctness. Completeness means the ability to use the approach for any general algorithm. Correctness is achieved by using a set of transformations that are proved to be correct. A formal framework for the synthesis procedure has been developed which can be easily automated. A design example of matrix-matrix multiplication is used with each one of the forms to obtain a parallel architecture. The different architectures for this example are compared in terms of area and speed.
CH2920-7/90/0000/0603$01.00 © 1990 IEEE
Authorized licensed use limited to: University of Bridgeport. Downloaded on March 01,2010 at 15:04:30 EST from IEEE Xplore. Restrictions apply.
International Conference on Application Specific Array Processors
2. RECURSION WITH RESPECT TO SEVERAL VARIABLES
If x_i (1 ≤ i < n) are (n-1)-place functions, z is an n-place function, y is a 2n-place function and w_i are n-place functions, then z is defined by the following Algorithm Specification Language (ASL) [12-14] code:
Transformation Algorithm to RSL
To transform the system of recursion with respect to several variables to the Realization Specification Language (RSL) [12-14] representation, we implement each equation using the same method described in [12-14]. Here is the RSL representation of the system:
$$\mathrm{Init}\ p(1, arg_1;\ 2, arg_2;\ \ldots;\ n, arg_n) \tag{1}$$
$$I = p_{n+1}\,\mathrm{Suc}(I) \tag{3}$$
$$\mathrm{Ready} = \mathrm{eq?}(I, m) \tag{4}$$
$$z(0, arg_2, \ldots, arg_n) = \mathrm{Comp}(arg_2, \ldots, arg_n\ \#\ x_1) \tag{5}$$
$$z^{Ready}(0, arg_2, \ldots, arg_n) = \mathrm{And}(arg_2^{Ready}, arg_3^{Ready}, \ldots, arg_n^{Ready})$$
$$z(arg_1+1, 0, arg_3, \ldots, arg_n) = \mathrm{Comp}(arg_1+1, 0, arg_3, \ldots, arg_n\ \#\ x_2)$$
$$z^{Ready}(arg_1+1, 0, arg_3, \ldots, arg_n) = \mathrm{And}((arg_1+1)^{Ready}, arg_3^{Ready}, \ldots, arg_n^{Ready})$$
......................................
Equation 1 shows that n registers are initialized with the arguments (arg_1, ..., arg_n). Equation 2 means that the unit Suc, which is a basic function, has its inputs control_i (1 ≤ i < n) connected to the ready_i output of the unit computing x_i, to be sure that I is not incremented until x_i is computed. Equation 3 represents the fact that I is incremented every clock cycle using the Suc unit, and that I is initialized to the value 1 using register number n+1. Equation 4 determines the end of operation when I reaches the value m. Equations 5, 6, 7 represent the composition operation in equations 1, 2, 3 of the ASL representation respectively. Proof of correctness for the algorithm can be found in [17].
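The control behaviour described by Equations 2 to 4 can be mimicked in software. The following Python fragment is only an illustrative sketch, not the RSL semantics themselves: the names `run_controller` and `ready_inputs` are hypothetical, and the ready signals are modelled as a plain list of booleans per clock cycle.

```python
from typing import Callable, List


def run_controller(m: int, ready_inputs: Callable[[int, int], List[bool]]) -> int:
    """Return the number of clock cycles until Ready = eq?(I, m) holds.

    ready_inputs(cycle, i) is a hypothetical stand-in for the ready_i
    outputs that are wired to the control inputs of the Suc unit.
    """
    if m < 1:
        raise ValueError("m must be at least 1 because I starts at 1")
    i = 1            # I is initialised to 1 (register n+1 in the text)
    cycle = 0
    while i != m:    # Ready = eq?(I, m)
        if all(ready_inputs(cycle, i)):
            i += 1   # Suc(I): increment only when every x_i is ready
        cycle += 1
    return cycle


def always_ready(cycle: int, i: int) -> List[bool]:
    """Every unit computing x_i finishes within one cycle (assumption)."""
    return [True, True]


def ready_on_even_cycles(cycle: int, i: int) -> List[bool]:
    """A slower unit that is ready only on even-numbered cycles (assumption)."""
    return [cycle % 2 == 0]
```

With inputs that are always ready the counter advances once per cycle, so reaching m from 1 takes m-1 cycles; a slower unit stalls the counter, which is the purpose of connecting the ready outputs to the Suc control inputs.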
Example:
Let us use recursion with respect to several variables to define the greatest common divisor (GCD). Two variables are required to define the GCD. The definition is as follows:
$$GCD(0, n) = n$$
$$GCD(m+1, 0) = m+1$$
$$GCD(m+1, n+1) = p(m, n, GCD(w(m,n), n+1), GCD(m+1, w(n,m)))$$
$$w(m,n) < m+1$$
$$w(n,m) < n+1$$
$$p(m,n,a,b) = \begin{cases} a & m \geq n \\ b & m < n \end{cases}$$
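As a quick check of the two-variable recursion for the GCD, the definition can be executed directly. This is a minimal sketch under stated assumptions: the source does not give a closed form for w, so truncated subtraction is assumed here, and the selector p is assumed to pick its third argument when m ≥ n and its fourth otherwise. The selector is written as an `if` so that only the chosen branch is evaluated.

```python
def w(u: int, v: int) -> int:
    """Assumed reduction function: truncated subtraction, so w(u, v) < u + 1."""
    return max(u - v, 0)


def gcd(x: int, y: int) -> int:
    """GCD by recursion on both variables, following the three defining cases."""
    if x == 0:                 # GCD(0, n) = n
        return y
    if y == 0:                 # GCD(m+1, 0) = m+1
        return x
    m, n = x - 1, y - 1        # x = m+1, y = n+1
    if m >= n:                 # p selects GCD(w(m, n), n+1)
        return gcd(w(m, n), n + 1)
    return gcd(m + 1, w(n, m))  # p selects GCD(m+1, w(n, m))
```

Each recursive call strictly decreases one of the two arguments, which is what the conditions on w guarantee, so the recursion terminates.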
This is expressed in ASL as follows:
$$GCD(0, n) = \pi_2(0, n)$$
$$GCD(m+1, 0) = \pi_1(m+1, 0)$$
$$GCD(m+1, n+1) = p(m, n, GCD(w(m,n), n+1), GCD(m+1, w(n,m)))$$
$$w(m, n) = \ldots(h(\ldots(m, n)))$$
$$w(n, m) = \ldots(h(\ldots(n, m)))$$
$$p(m, n, a, b) = \mathrm{compare}(m, n, a, b)$$
The RSL representation of this example is as follows:
$$\mathrm{Init}\ p(1, m;\ 2, n)$$
$$\mathrm{Suc}_{control} = \ldots$$
$$I = p_3\,\mathrm{Suc}(I)$$
$$\mathrm{Ready} = \mathrm{eq?}(I, m)$$
$$GCD(0, n) = \mathrm{Comp}(0, n\ \#\ \pi_2)$$
$$GCD^{Ready}(0, n) = p_2^{Ready}$$
$$GCD(m+1, 0) = \mathrm{Comp}(m+1, 0\ \#\ \pi_1)$$
$$GCD^{Ready}(m+1, 0) = p_1^{Ready}$$
$$GCD(m+1, n+1) = \mathrm{Comp}(m+1, n+1, GCD(w(m,n), n+1), GCD(m+1, w(n,m))\ \#\ p)$$
$$GCD^{Ready}(m+1, n+1) = \ldots$$
We see in this example the correspondence between the ASL and RSL representations. Equation 1 shows that 2 registers are initialized with the arguments m, n. Equations 2, 3, 4 have the same meaning as equations 2, 3, 4 in the transformation algorithm. Equations 5, 6, 7 represent the composition operation in equations 1, 2, 3 of the ASL representation respectively.
3. NESTED RECURSION
Two cases are considered for nested recursion: a fixed number of nestings and a varied number of nestings. Instead of addressing a recursion of k-fold nesting, we show the idea in each case with a double nesting. The approach extends easily to the general k-fold case.
3.1 Fixed Number of Nestings
If x is a 1-place function, z is a 2-place function, y is a 2-place function and w is a 3-place function, then y is a double nested recursion function defined by the following ASL:
$$y(0, a) = x(a)$$
$$y(n+1, a) = z(n, y(n, w(n, a, y(n, a))))$$
Although the solution of the function y gets more complicated as we go to higher values, we are still able to solve it by the same method as before. The solution we obtain for y(2,a) is expressed in terms of the functions z, w, x but not the function y. This means that if the functions z, w, x are primitive recursive then y is primitive recursive, since it is derived from z, w, x by composition.
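The unfolding argument can be made concrete with a small Python sketch. It assumes the double nested recurrence reads y(0, a) = x(a) and y(n+1, a) = z(n, y(n, w(n, a, y(n, a)))); the particular x, z and w below are arbitrary illustrative choices, not functions taken from the paper.

```python
from typing import Callable

Fn1 = Callable[[int], int]
Fn2 = Callable[[int, int], int]
Fn3 = Callable[[int, int, int], int]


def make_nested(x: Fn1, z: Fn2, w: Fn3) -> Fn2:
    """Build y from x, z, w using the assumed double nested recurrence."""
    def y(n: int, a: int) -> int:
        if n == 0:
            return x(a)                          # y(0, a) = x(a)
        inner = y(n - 1, a)                      # innermost occurrence of y
        return z(n - 1, y(n - 1, w(n - 1, a, inner)))
    return y


def y2_unfolded(x: Fn1, z: Fn2, w: Fn3, a: int) -> int:
    """y(2, a) written using only x, z, w (no reference to y)."""
    def y1(b: int) -> int:                       # y(1, b) unfolded by hand
        return z(0, x(w(0, b, x(b))))
    return z(1, y1(w(1, a, y1(a))))


# Arbitrary illustrative functions (assumptions for this sketch only).
def x(a: int) -> int:
    return a + 1


def z(n: int, v: int) -> int:
    return n + v


def w(n: int, a: int, v: int) -> int:
    return a + v


y = make_nested(x, z, w)
```

Comparing `y(2, a)` with `y2_unfolded(x, z, w, a)` shows that for a fixed first argument the recursive definition and the pure composition of x, z and w agree.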
Transformation Algorithm to RSL
To transform the system of nested recursion to RSL we implement each equation using the same method described in [12-14]. Here is the RSL representation of the double nested recursion:
$$\mathrm{Init}\ p(1, n;\ 2, a) \tag{1}$$
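the double nested recursion:
$$\mathrm{Init}\ p(1, n;\ 2, m)$$
$$\mathrm{Suc}_{control} = x^{Ready}(a)$$
$$I_1 = p_3\,\mathrm{Suc}(I_1)$$
$$I_2 = p_4\,\mathrm{Suc}(I_2)$$
$$y^{Ready}(0, n) = p_2^{Ready}$$
$$\mathrm{Ready} = \mathrm{And}(\mathrm{eq?}(I_1, n), \mathrm{eq?}(I_2, n))$$
$$y(0, n) = \mathrm{Comp}(n\ \#\ x)$$
$$y(m+1, 0) = \mathrm{Comp}(m, 1\ \#\ y)$$
$$y^{Ready}(m+1, 0) = y^{Ready}(m, 1)$$
$$Temp_1 = \mathrm{Comp}(m+1, n\ \#\ y)$$
$$Temp_1^{Ready} = \mathrm{And}(p_1^{Ready}, p_2^{Ready})$$
$$y = \mathrm{Comp}(m, a, Temp_1\ \#\ y)$$
$$y^{Ready} = \mathrm{And}(p_1^{Ready}, Temp_1^{Ready})$$
$$Temp_3^{Ready} = \mathrm{And}(p_1^{Ready}, Temp_2^{Ready})$$
$$y = \mathrm{Comp}(n, a, Temp_3\ \#\ z)$$
$$y^{Ready} = \mathrm{And}(p_1^{Ready}, p_2^{Ready}, Temp_3^{Ready})$$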
Proof of correctness for the algorithm can be found in [17].
Table 1 shows a comparison between the different forms of recursion in terms of architecture, broadcasting and complexity of the controller. Simultaneous recursion is the only form that gives a two-dimensional array. All forms have broadcasting except variable nesting. The controller for variable nesting is complex compared with the other three forms.
Table 1. Comparison Between Different Forms of Recursion.
                            Simultaneous      Several variables   Fixed nesting     Variable nesting
Architecture                two dimensional   one dimensional     one dimensional   one dimensional
Broadcasting                yes               yes                 yes               no
Complexity of controller    simple            simple              simple            complex
5. MATRIX-MATRIX MULTIPLICATION EXAMPLE
An example of matrix multiplication is introduced as an application of the different forms of recursion. The architecture has two matrices A and B as inputs, and matrix C as an output. The multiplication is done in a recursive way and can be described by the following high level subroutine:
mat-multiplication(A, B)
begin
    for i = 1 to n
    begin
        for j = 1 to n
            C[i,j,0] = 0
            for k = 1 to n
                C[i,j,k] = C[i,j,k-1] + A[i,k] * B[k,j]
            next k
        next j
    end
    next i
end
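The subroutine above maps directly onto a short executable version. The Python sketch below assumes square n×n matrices stored as lists of rows; the function name `mat_multiplication` is illustrative. The running variable `acc` plays the role of the partial result at step k of the recurrence.

```python
from typing import List

Matrix = List[List[int]]


def mat_multiplication(A: Matrix, B: Matrix) -> Matrix:
    """Multiply two n x n matrices using the recurrence over k."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0                            # partial result before any term is added
            for k in range(n):
                acc = acc + A[i][k] * B[k][j]  # add the k-th product term
            C[i][j] = acc                      # final value after n steps
    return C
```

For example, `mat_multiplication([[1, 2], [3, 4]], [[5, 6], [7, 8]])` evaluates the recurrence twice per output element.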
The ASL and RSL representations using the simultaneous recursion form are as follows:
......................................
$$\mathrm{Suc}_{control} = (\ldots)^{Ready}$$
$$I = p_{n+1}\,\mathrm{Suc}(I)$$
$$\mathrm{Ready} = \mathrm{eq?}(I, m)$$
$$Result_1 = \mathrm{Comp}(a_1^T, I, p^{Result}\ \#\ \mathrm{innerproduct})$$
......................................
$$Result_n = \mathrm{Comp}(a_n^T, I, p^{Result}\ \#\ \mathrm{innerproduct})$$
Figure 1 shows the architecture obtained for matrix multiplication. The details of implementing the inner-product cell are shown in [14]. The architecture consists of N inner-product cells. The number of cycles required to perform the multiplication is N.
Figure 2 shows the architecture using recursion with respect to several variables. The architecture consists of N multiplication cells and one adder. The number of cycles required to perform the multiplication is N.
Figure 3 shows the architecture using fixed nesting recursion. The architecture consists of N inner-product cells. The number of cycles required to perform the multiplication is N.
Table 2 shows a comparison between the different architectures of the matrix-matrix multiplication.
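To illustrate how a linear array of N inner-product cells can work in lockstep, the following Python sketch simulates one possible schedule. The data flow is an assumption made for this sketch, since the text does not spell out the schedule: cell i holds row i of A, one element of a column of B is broadcast to all cells per cycle, and each cell performs one multiply-accumulate per cycle. The name `simulate_inner_product_array` is hypothetical.

```python
from typing import List, Tuple


def simulate_inner_product_array(
    A: List[List[int]], b_column: List[int]
) -> Tuple[List[int], int]:
    """Simulate N inner-product cells producing one column of C = A * B.

    Returns the accumulated column and the number of cycles used.
    """
    n = len(A)
    acc = [0] * n                  # one accumulator per inner-product cell
    cycles = 0
    for k in range(n):             # one broadcast value per clock cycle
        b = b_column[k]            # value broadcast to every cell
        for i in range(n):         # in hardware these cells run in parallel
            acc[i] += A[i][k] * b  # multiply-accumulate inside cell i
        cycles += 1
    return acc, cycles
```

Under this assumed schedule one column of the result is available after N cycles, with the inner loop standing in for the N cells that operate concurrently in hardware.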
Fig. 1. Matrix-Matrix Multiplication Architecture Using Simultaneous Recursion.
Table 2. Comparison Between Different Matrix-Matrix Multiplication Architectures.
                Simultaneous        Several variables                Nesting
Area            Θ(...)              Θ(...)                           Θ(...)
Time            Θ(...)              Θ(...)                           Θ(...)
A*T             Θ(n^3)              Θ(n^3)                           Θ(n^3)
Broadcasting    yes                 no                               no
Hardware        inner-product (n)   multiplier (n), adder (1)        inner-product (n)
Fig. 2. Matrix-Matrix Multiplication Architecture Using Recursion With Respect to Several Variables.
6. CONCLUSIONS
In this paper a formal approach for transforming different forms of recursion to parallel architectures has been introduced. Four different forms are used to express a given algorithm. Four optimal architectures for a matrix-matrix multiplication are compared. The approach has the following advantages:
It is suitable for large problems since the transformation algorithm is linear.
It does not require the target architecture to be known in advance.
There are no restrictions imposed on the input description.
The technique is fully automated.
The designer is not responsible for specifying the operations sequencing and communications among different units.
The approach is applicable to any general algorithm.
Parallel properties of algorithms are explored.
Fig. 3. Matrix-Matrix Multiplication Using Fixed Number of Nestings.
References
[1] C. J. Tseng, "Automated Synthesis of Data Paths in Digital Systems," Ph.D. Dissertation, CMU, Apr. 1984.
[2] T. J. Kowalski, D. J. Geiger, W. H. Wolf, and A. C. Parker, "The VLSI Design Automation Assistant: From Algorithms to Silicon," IEEE Design and Test, pp. 33-43, Aug. 1985.
[3] T. J. Kowalski, "The VLSI Design Automation Assistant: A Knowledge-based Expert System," Ph.D. thesis, CMU, 1984.
[4] T. J. Kowalski and D. E. Thomas, "The VLSI Design Automation Assistant: Prototype System," Twentieth Design Automation Conf. Proc., Miami, pp. 479-483, 1983.
[5] H. Trickey, "Flamel: A High Level Hardware Compiler," IEEE Trans. on Computer Aided Design, vol. CAD-6, no. 2, pp. 259-269, March 1987.
[6] T. Tanaka, T. Kobayashi and O. Karatsu, "HARP: Fortran to Silicon," IEEE Trans. on Computer-Aided Design, vol. 8, no. 6, pp. 649-660, June 1989.
[7] C. Niessen, C. H. van Berkel, M. Rem, and R. W. Saeijs, "VLSI Programming and Silicon Compilation; A Novel Approach from Philips Research," Intl Conf. on Computer Design: VLSI in Computer and Processors, pp. 150-151, Oct. 1988.
[8] C. H. van Berkel, M. Rem and R. W. Saeijs, "VLSI Programming," Intl Conf. on Computer Design: VLSI in Computer and Processors, pp. 152-156, Oct. 1988.
[9] S. D. Johnson, "Synthesis of Digital Designs from Recursion Equations," Ph.D. Dissertation, Comp. Sc. Dept., Indiana Uni., 1983.
[10] P. Quinton, "Automatic Synthesis of Systolic Arrays From Uniform Recurrent Equations," Proc. of the 11th Annual Intl Symp. on Computer Architecture, pp. 208-214, 1984.
[11] S. K. Rao, "Regular Iterative Algorithms and Their Implementations on Processor Arrays," Ph.D. Dissertation, Dept. of Electrical Eng., Stanford Uni., 1985.
[12] K. M. Elleithy and M. A. Bayoumi, "Synthesizing DSP Architectures from Behavioral Specifications: A Formal Approach," Proceedings of the 1990 IEEE International Symposium for Circuits and Systems, May 1990.
[13] K. M. Elleithy and M. A. Bayoumi, "A Formal High Level Synthesis Approach for DSP Architectures," Proceedings of the 1990 International Conf. on Acoustics, Speech and Signal Processing, April 1990.
[14] K. M. Elleithy and M. A. Bayoumi, "A Framework for High Level Synthesis of Digital Architectures from μ-recursive Algorithms," Proc. of the ACM Eighteenth Annual Computer Science Conference, pp. 305-311, Feb. 1990.
[15] K. M. Elleithy and M. A. Bayoumi, "Formal Synthesis of Parallel Architectures from Recursive Equations," Accepted for inclusion in the 1990 International Conference on Parallel Processing, Aug. 1990.
[16] K. M. Elleithy and M. A. Bayoumi, "A Formal Framework for Synthesis of Parallel Architectures," Proceedings of the Fourth Annual Symposium on Parallel Processing, April 1990.
[17] K. M. Elleithy, "A Formal Framework for High Level Synthesis of Digital Designs," Ph.D. Dissertation, The Center for Advanced Computer Studies, University of SW Louisiana, 1990.