Content uploaded by Khaled Elleithy
Author content
All content in this area was uploaded by Khaled Elleithy
Content may be subject to copyright.
SYSTOLIC ARITHMETIC ARCHITECTURES
Khaled M. Elleithy
Computer Engineering Department
King Fahd University of Petroleum and Minerals
Box 1967, Dhahran 31261, Saudi Arabia
Abstract:

In this paper, parallelism on the algorithmic, architectural, and arithmetic levels is exploited in the design of a Residue Number System (RNS) based architecture. The architecture is based on modulo processors. Each modulo processor is implemented by a two-dimensional systolic array composed of very simple cells. The decoding stage is also implemented using a 2D array, which eliminates the decoding bottleneck. The whole architecture is pipelined, which leads to a high throughput rate.

1. Introduction: In this paper, the Residue Number System (RNS) is used to achieve arithmetic parallelism. Figure 1 shows a general RNS-based architecture. The algebraic properties of RNS provide both high-speed computation and parallel operation. In our implementation, parallelism is achieved in three directions (algorithmic, architectural, and arithmetic): each modulo processor is implemented by a two-dimensional systolic array composed of very simple cells, and the decoding stage is also implemented using a 2D array. This eliminates the decoding bottleneck. The whole architecture is pipelined, which leads to a high throughput rate.
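The arithmetic parallelism behind Figure 1 can be illustrated with a small behavioral sketch. It assumes a hypothetical moduli set (7, 11, 13, 15), which the paper does not specify; the helper names `to_rns`, `rns_add`, and `rns_mul` are likewise illustrative. Each residue channel plays the role of one modulo processor, and no carry ever crosses between channels.

```python
from math import gcd, prod

# Hypothetical moduli, chosen only for illustration (pairwise coprime).
MODULI = (7, 11, 13, 15)


def check_moduli(moduli):
    """RNS requires pairwise coprime moduli; raise if any pair shares a factor."""
    for i, a in enumerate(moduli):
        for b in moduli[i + 1:]:
            if gcd(a, b) != 1:
                raise ValueError(f"moduli {a} and {b} are not coprime")


def to_rns(x, moduli=MODULI):
    """Encode an integer as one residue per modulo processor."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    # Channels are independent, so all of them can operate in parallel.
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    # Multiplication is also carry-free across channels.
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


check_moduli(MODULI)
M = prod(MODULI)  # dynamic range of the representation: 15015

x, y = 123, 45
rx, ry = to_rns(x), to_rns(y)  # rx == (4, 2, 6, 3)
assert rns_add(rx, ry) == to_rns(x + y)
assert rns_mul(rx, ry) == to_rns((x * y) % M)
```

Results stay correct only while the true value fits in the dynamic range M, which is why the decoding stage (Section 4) needs a range check.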

2. Modulo Adder: The modulo adder performs addition in O(1) time. Figure 2 shows the modulo adder architecture. It is composed of five 1D arrays and is independent of the size of the moduli.
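For reference, the function a modulo adder must compute can be sketched behaviorally. This uses the common two-candidate scheme (form a + b and a + b - m side by side, then select by sign), which is an assumption for illustration and does not model the cell-level systolic structure of Figure 2. The name `mod_add` is hypothetical.

```python
def mod_add(a: int, b: int, m: int) -> int:
    """Return (a + b) mod m for residues 0 <= a, b < m, without division.

    Because both operands are already residues, a + b < 2m, so at most one
    subtraction of m is needed. In hardware the two candidates can be formed
    in parallel and the sign of the second one drives a selector.
    """
    if not (0 <= a < m and 0 <= b < m):
        raise ValueError("operands must already be residues modulo m")
    s = a + b   # candidate 1: a + b
    t = s - m   # candidate 2: a + b - m
    return t if t >= 0 else s


# Usage: 9 + 8 = 17, and 17 mod 13 = 4.
print(mod_add(9, 8, 13))
```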
3. Modulo Multiplier: The modulo multiplier uses the modulo adder as its computational kernel. The multiplier consists of two stages, as shown in Figure 3. The first stage is an SIMD array of AND gates used to obtain the partial products. The second stage is a binary tree of modulo adders used to perform the addition of the n partial products.
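The two-stage structure can be sketched as follows, counting the levels of the adder tree to show the logarithmic depth. The text does not say how the AND-gate array obtains rows already reduced modulo m; this sketch assumes row i is b_i AND ((2^i * a) mod m) and simply computes the reduced row. All function names are hypothetical.

```python
def mod_add(a, b, m):
    """Modulo adder kernel for residues 0 <= a, b < m."""
    s = a + b
    return s - m if s >= m else s


def partial_products(a, b, m, n_bits):
    """Stage 1: one partial product per bit of b (assumed pre-reduced rows)."""
    rows = []
    for i in range(n_bits):
        bit = (b >> i) & 1
        row = (a << i) % m              # assumption: (2**i * a) mod m is available
        rows.append(row if bit else 0)  # AND-gating the row with bit b_i
    return rows


def tree_sum(values, m):
    """Stage 2: binary tree of modulo adders. Returns (sum, number of levels)."""
    level, depth = list(values), 0
    while len(level) > 1:
        nxt = [mod_add(level[k], level[k + 1], m)
               for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])       # an unpaired value passes to the next level
        level, depth = nxt, depth + 1
    return (level[0] if level else 0), depth


def mod_mul(a, b, m):
    """Return ((a * b) mod m, tree depth) for residues 0 <= a, b < m."""
    n_bits = max(1, (m - 1).bit_length())  # n = bits per residue
    return tree_sum(partial_products(a, b, m, n_bits), m)


# Usage: m = 13 gives n = 4 partial products and a tree of 2 levels.
print(mod_mul(9, 11, 13))
```

With n partial products the tree has ceil(log2 n) levels of modulo adders, which matches the O(log n) multiplication time claimed for this design.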

4. Chinese Remainder Theorem: The proposed decoding stage is based on the Chinese Remainder Theorem (CRT) [1]. Although the CRT provides a direct, fast, and simple conversion formula, the lack of a large and fast modulo-M adder (M being the product of the moduli) has held back this approach. Systolizing the CRT overcomes this problem. The proposed systolic CRT architecture can produce both sign-magnitude and 2's complement data representations. The overall structure is shown in Figure 4.
The inputs to the residue decoder are the residues and a control line, C, which determines whether the output is in sign-magnitude or 2's complement representation. The partial sum generator computes the partial sums. The partial sums are then added using the developed modulo adder. The range determinator enforces the correctness of the data. The input data to this stage is decomposed into groups of bits to be processed in parallel. The overall time complexity of the CRT systolic architecture is O(log n), where n is the number of moduli.

0818624655/91 $1.00 © 1991 IEEE
Authorized licensed use limited to: University of Bridgeport. Downloaded on February 24,2010 at 12:51:22 EST from IEEE Xplore. Restrictions apply.
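The decoder data flow of Figure 4 can be sketched behaviorally using the standard CRT formula X = | sum_i M_i * | x_i * M_i^(-1) |_(m_i) |_M, with M = m_1 * m_2 * ... * m_n and M_i = M / m_i. The moduli set, the output width, the symmetric signed range, and the encoding of C (0 for sign-magnitude, 1 for 2's complement) are assumptions made for this illustration, as are all function names. The bit-group decomposition mentioned above is not modeled.

```python
from math import prod

MODULI = (7, 11, 13, 15)  # hypothetical moduli, M = 15015


def crt_partial_sums(residues, moduli):
    """Partial sum generator: term_i = M_i * |x_i * M_i^-1|_{m_i}, each < M."""
    M = prod(moduli)
    terms = []
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        inv = pow(M_i, -1, m_i)  # inverse of M_i modulo m_i (Python 3.8+)
        terms.append(M_i * ((x_i * inv) % m_i))
    return terms, M


def tree_sum_mod(values, M):
    """Partial sum adder: binary tree of modulo-M adders, ceil(log2 n) levels."""
    level = list(values)
    while len(level) > 1:
        nxt = []
        for k in range(0, len(level) - 1, 2):
            s = level[k] + level[k + 1]
            nxt.append(s - M if s >= M else s)
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def decode(residues, moduli, c, width):
    """Return (sign, magnitude) if c == 0, or a width-bit 2's complement word if c == 1."""
    if c not in (0, 1):
        raise ValueError("c must be 0 or 1")
    terms, M = crt_partial_sums(residues, moduli)
    if (M - 1) // 2 >= 1 << (width - 1):
        raise ValueError("width too small for the signed range")
    x = tree_sum_mod(terms, M)
    # Assumed symmetric range: values above (M - 1) // 2 encode negatives.
    v = x - M if x > (M - 1) // 2 else x
    if c == 0:
        return (1 if v < 0 else 0, abs(v))
    return v & ((1 << width) - 1)


# Usage: decode -100 from its residues in both output formats.
r = tuple((-100) % m for m in MODULI)
print(decode(r, MODULI, 0, 14), decode(r, MODULI, 1, 14))
```

Every partial sum is already smaller than M, so each tree node needs at most one conditional subtraction of M, and n moduli give ceil(log2 n) adder levels, in line with the O(log n) decoder complexity stated above.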
Conclusions: An SIMD structure has been presented which exhibits two-level parallelism: architectural (systolic array) and arithmetic (RNS). Three optimal algorithms are introduced in this paper: an O(1) modulo addition algorithm, an O(log n) modulo multiplication algorithm, and an O(log n) CRT decoder.

References
[1] N. S. Szabo and R. I. Tanaka, Residue Arithmetic and its Applications to Computer Technology, McGraw-Hill, 1967.

[Fig. 1. General RNS-Based Architecture: modulo processors (modulo m1, modulo m2, ...) feed a recomposer that produces the output.]
[Fig. 2. The Modulo Adder.]
[Fig. 3. (a) Partial Product Generator. (b) Addition of Partial Products using Modulo Adders (stages 1 to 3).]
[Fig. 4. The Residue Decoder: partial sum generation, partial sum adder, final corrector.]