SYSTOLIC ARITHMETIC ARCHITECTURES
Khaled M. Elleithy
Computer Engineering Department
King Fahd University of Petroleum and Minerals
Box 1967, Dhahran 31261, Saudi Arabia
Abstract: In this paper, parallelism on the algorithmic, architectural, and arithmetic levels is exploited in the design of a Residue Number System (RNS) based architecture. The architecture is based on modulo processors. Each modulo processor is implemented by a two-dimensional systolic array composed of very simple cells. The decoding stage is implemented using a 2-D array, too. The decoding bottleneck is eliminated. The whole architecture is pipelined, which leads to a high throughput rate.
1. Introduction: In this paper, the Residue Number System (RNS) has been used to achieve arithmetic parallelism. Figure 1 shows a general RNS based architecture. The algebraic properties of RNS provide both high speed computation and parallel operations. In our implementation, parallelism is achieved in three directions. Each modulo processor is implemented by a two-dimensional systolic array composed of very simple cells. The decoding stage is implemented using a 2-D array, too. The decoding bottleneck will be eliminated. The whole architecture will be pipelined, which leads to a high throughput rate.
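The arithmetic parallelism that RNS provides can be illustrated with a short software sketch. The moduli (3, 5, 7) and the function names below are assumptions chosen for illustration; they do not come from the paper. Each residue channel is processed without any carry to or from the other channels, which is the property that lets one modulo processor per modulus work in parallel.

```python
from math import prod

# Assumed example moduli (pairwise coprime); the dynamic range is M = 3 * 5 * 7 = 105.
MODULI = (3, 5, 7)


def to_rns(x, moduli=MODULI):
    """Encode an integer as its tuple of residues, one per modulus."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    """Add two RNS numbers; every channel is independent of the others."""
    return tuple((ra + rb) % m for ra, rb, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    """Multiply two RNS numbers channel by channel, with no inter-channel carries."""
    return tuple((ra * rb) % m for ra, rb, m in zip(a, b, moduli))


# Channel-wise results agree with ordinary arithmetic reduced modulo M.
M = prod(MODULI)
x, y = 17, 4
assert rns_add(to_rns(x), to_rns(y)) == to_rns((x + y) % M)
assert rns_mul(to_rns(x), to_rns(y)) == to_rns((x * y) % M)
```

In hardware each tuple position would map to its own modulo processor; the Python loop over channels only stands in for that parallelism.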
2. Modulo Adder: The modulo adder performs addition in time complexity O(1). Figure 2 shows the modulo adder architecture. It is composed of fine 1-D arrays independent of the size of the moduli.
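The behaviour of a modulo adder can be sketched in software as follows. This is a functional model only, under the assumption that both operands are already reduced residues; the paper does not give the cell-level design, and the select-between-two-sums scheme shown here is one common way to realise modulo addition, not necessarily the one used in Figure 2.

```python
def mod_add(a, b, m):
    """Return (a + b) mod m for operands already reduced to 0 <= a, b < m."""
    if not (0 <= a < m and 0 <= b < m):
        raise ValueError("operands must already be residues modulo m")
    s = a + b   # plain sum, in the range 0 .. 2m - 2
    d = s - m   # corrected sum; hardware can form this alongside s
    # Exactly one of s and d lies in 0 .. m - 1, so a single select suffices.
    return d if d >= 0 else s
```

Because the operands are bounded by m, at most one subtraction of m is ever needed, so the number of steps does not grow with the operand values.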
3. Modulo Multiplier: The modulo multiplier uses this modulo adder as a computational kernel. The multiplier consists of two stages, as shown in Figure 3. The first stage is an SIMD array of AND gates used to obtain the partial products. The second stage is a binary tree of modulo adders used to perform the addition of the n partial products.
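The two-stage structure of the multiplier can be sketched in software as below. The paper does not state how the power-of-two weights of the partial products are reduced modulo m; this sketch assumes each partial product is a pre-shifted copy of the multiplicand, already reduced modulo m and gated by one bit of the multiplier. The function names are hypothetical.

```python
def mod_add(a, b, m):
    """Modulo adder kernel for operands in 0 .. m - 1."""
    s = a + b
    return s - m if s >= m else s


def partial_products(a, b, m, n):
    """Stage 1: gate (AND) each reduced, shifted copy of a with one bit of b."""
    products = []
    shifted = a % m                      # a * 2^0 mod m
    for i in range(n):
        bit = (b >> i) & 1
        products.append(shifted if bit else 0)
        shifted = mod_add(shifted, shifted, m)  # a * 2^(i+1) mod m (assumed weighting)
    return products


def tree_sum(values, m):
    """Stage 2: binary tree of modulo adders; about log2(n) levels for n inputs."""
    level = list(values)
    while len(level) > 1:
        nxt = [mod_add(level[i], level[i + 1], m)
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:               # an odd element passes through to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0] if level else 0


def mod_mul(a, b, m):
    """Return (a * b) mod m for residues 0 <= a, b < m."""
    n = max(1, (m - 1).bit_length())     # bits needed to represent a residue
    return tree_sum(partial_products(a, b, m, n), m)
```

Each tree level halves the number of operands, which is where the logarithmic depth of the second stage comes from.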
4. Chinese Remainder Theorem: The proposed decoding stage is based on the Chinese Remainder Theorem (CRT) [1]. Although the CRT provides a direct, fast, and simple conversion formula, the lack of a large and fast modulo M adder has held back this approach. Systolizing the CRT will overcome this problem. The proposed systolic CRT architecture can produce both sign-magnitude and 2's complement data representation. The overall structure is shown in Figure 4. The inputs to the residue decoder are the residues and a control line, C, which determines whether the output is in sign-magnitude or 2's complement representation. The partial sum generator computes the partial sums. The partial sum addition is computed using the developed modulo adder. The range determinator enforces the correctness of data. The input data to this stage is decomposed into groups of bits to be processed in parallel. The overall time complexity of the CRT systolic architecture is O(log n), where n is the number of moduli.
0-8186-2465-5/91 $1.00 © 1991 IEEE
Authorized licensed use limited to: University of Bridgeport. Downloaded on February 24,2010 at 12:51:22 EST from IEEE Xplore. Restrictions apply.
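The CRT decoding flow of Section 4 (partial sum generation, partial sum addition modulo M, range determination, output formatting under the control line C) can be sketched functionally as follows. This is a software model under stated assumptions, not the systolic design: the signed-range convention (the upper half of 0 .. M - 1 represents negative values), the output word width, and all function names are assumptions introduced for illustration.

```python
from math import prod


def crt_partial_sums(residues, moduli):
    """Partial sum generator: one CRT term per modulus, each term smaller than M."""
    M = prod(moduli)
    terms = []
    for r, m in zip(residues, moduli):
        Mi = M // m                  # product of all the other moduli
        inv = pow(Mi, -1, m)         # multiplicative inverse of Mi modulo m (Python 3.8+)
        terms.append((r * inv % m) * Mi)
    return terms, M


def tree_add_mod(terms, M):
    """Partial sum adder: pairwise modulo-M additions, about log2(n) levels."""
    level = list(terms)
    while len(level) > 1:
        nxt = [(level[i] + level[i + 1]) % M
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def to_signed(x, M):
    """Range determination (assumed convention): upper half of 0 .. M-1 is negative."""
    return x - M if 2 * x >= M else x


def decode(residues, moduli, twos_complement=True):
    """Decode residues; the flag plays the role of the control line C."""
    terms, M = crt_partial_sums(residues, moduli)
    value = to_signed(tree_add_mod(terms, M), M)
    if twos_complement:
        width = M.bit_length()       # assumed word width, wide enough for the signed range
        return value & ((1 << width) - 1)
    return (1 if value < 0 else 0, abs(value))   # (sign bit, magnitude)
```

For the assumed moduli (3, 5, 7), the value -4 has residues (2, 1, 3); the partial sums are 35, 21, and 45, their sum modulo 105 is 101, and range determination maps 101 back to -4.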
Conclusions: An SIMD structure has been presented which exhibits two-level parallelism: architectural (systolic array) and arithmetic (RNS). Three optimal algorithms are introduced in this paper: an O(1) modulo addition algorithm, an O(log n) modulo multiplication algorithm, and an O(log n) CRT decoder.
References
[1] N. S. Szabo and R. I. Tanaka, Residue Arithmetic and its Applications to Computer Technology, McGraw-Hill, 1967.
Fig. 1. General RNS Based Architectures. [Diagram labels: modulo m1 processor, modulo m2 processor, recomposer, output.]
Fig. 2. The Modulo Adder.
Fig. 3.(a) Partial Product Generator.
Fig. 3.(b) Addition of Partial Products using Modulo Adders.
Fig. 4. The Residue Decoder. [Diagram labels: partial sum generator, partial sum adder, final corrector.]