With the current advances in VLSI technology, traditional
algorithms for Residue Number System (RNS) based architectures should be
reevaluated to explore the new technology dimensions. In this brief, we
introduce a θ(log n) algorithm for large-moduli multiplication
for RNS-based architectures. A systolic array has been designed to
perform the modulo multiplication algorithm. The proposed modulo
multiplier is much faster than previously proposed multipliers and more
area efficient. The implementation of this multiplier is modular and is
based on simple cells, which leads to efficient VLSI realization. A
VLSI implementation using 3-micron CMOS technology shows that a
pipelined n-bit modulo multiplication scheme can operate with a
throughput of 30 M operations per second.
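
The abstract does not spell out the algorithm. What follows is only a minimal software sketch, under our own assumptions, of one way an adder-only modular multiplier can reach logarithmic depth.

- Each bit a_i of the multiplier gates a weight (2^i · b) mod m.
- The n gated terms are then summed by a binary tree of modular adders, giving ⌈log2 n⌉ adder levels.

The function names and the sequential modular doubling used to form the weights are hypothetical. They are not taken from the paper's systolic design.

```python
# Hypothetical sketch of adder-tree modular multiplication; not the paper's exact design.

def mod_add(x: int, y: int, m: int) -> int:
    """Modular adder for x, y in [0, m): one addition plus one conditional subtraction."""
    s = x + y
    return s - m if s >= m else s


def modmul_adder_tree(a: int, b: int, m: int, n: int) -> tuple[int, int]:
    """Return (a*b mod m, adder-tree depth) for an n-bit a and a residue b < m."""
    assert 0 <= a < (1 << n) and 0 <= b < m
    terms = []
    w = b  # w holds (2^i * b) mod m
    for i in range(n):
        terms.append(w if (a >> i) & 1 else 0)  # one gated term per bit of a
        w = mod_add(w, w, m)                     # modular doubling gives the next weight
    depth = 0
    while len(terms) > 1:  # each level of modular adders halves the number of terms
        paired = [mod_add(terms[k], terms[k + 1], m) for k in range(0, len(terms) - 1, 2)]
        if len(terms) % 2:
            paired.append(terms[-1])  # an odd term passes through to the next level
        terms = paired
        depth += 1
    return terms[0], depth
```

For example, `modmul_adder_tree(11, 7, 13, 4)` returns `(12, 2)`. Here 77 mod 13 = 12, and four terms are reduced in two adder levels.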

... Combinatorial-logic-based residue arithmetic circuits can be very efficient when bases of moduli of the form or [7]- [10] are used, as the corresponding circuits resemble the simplicity of ordinary binary arithmetic, due to the carry-ignore, carry-add, and carry-subtract properties. Adder-based word-level modulo circuits are presented in [11] and [12]. In particular, in [11], a systolic architecture for modulo multiplication is presented, which consists mainly of modular adders. ...
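
The moduli forms are missing from the snippet above. As a hedged illustration, the three properties are usually associated with these moduli:

- 2^n: carry-ignore.
- 2^n − 1: carry-add, i.e. end-around carry.
- 2^n + 1: carry-subtract.

The Python sketch below models each rule at word level. The function names are our own.

```python
# Hypothetical word-level models of the three carry rules.

def add_mod_2n(a: int, b: int, n: int) -> int:
    """Carry-ignore: the carry out of bit n-1 has weight 2^n = 0 (mod 2^n)."""
    return (a + b) & ((1 << n) - 1)


def add_mod_2n_minus_1(a: int, b: int, n: int) -> int:
    """Carry-add (end-around carry): 2^n = 1 (mod 2^n - 1), so the carry re-enters at bit 0."""
    mask = (1 << n) - 1
    s = a + b
    s = (s & mask) + (s >> n)        # fold the carry back into the low n bits
    return 0 if s == mask else s     # the all-ones word is a second encoding of zero


def add_mod_2n_plus_1(a: int, b: int, n: int) -> int:
    """Carry-subtract: 2^n = -1 (mod 2^n + 1), so the high part is subtracted from the low part."""
    mask = (1 << n) - 1
    s = a + b                        # a, b in [0, 2^n], so s in [0, 2^(n+1)]
    r = (s & mask) - (s >> n)        # the high part s >> n is 0, 1 or 2
    return r + (1 << n) + 1 if r < 0 else r
```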

... Adder-based word-level modulo circuits are presented in [11] and [12]. In particular, in [11], a systolic architecture for modulo multiplication is presented, which consists mainly of modular adders. The multiplier in [12] uses binary multipliers and adders as building blocks. ...

... Property 1: Let , , and be radix-digits. Then, for the operation , it holds that (9) Proof: The proof is given in Appendix A. Let the elements of the vector (10) be organized into groups of bits each, starting from the least significant position to form as follows: (11) where , , is the binary-vector representation of a radix-digit. ...

A graph-based technique is introduced for the design of a class of residue arithmetic multipliers, as well as a family of new high-radix digit adders. A proposed design technique derives simple high-radix modulo-r^n multipliers by optimally selecting among the variety of introduced digit adders the ones that compose a minimal-area multiplier. The proposed technique minimizes multiplier complexity by selecting digit adders that observe the constraints imposed on the maximum values of the various intermediate digits. The proposed technique leads to significant area and time improvements over previously published architectures for practical modulus cases.

... In [14] and [16] a generic structure for modular compressing of four inputs to two outputs is presented. This structure computes the modular reduction in five steps as depicted in Figure 1(a). ...
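
The five-step structure of [14] and [16] is not reproduced here. As a simpler, hypothetical illustration of modular compression, a modulo (2^n − 1) 4:2 compressor can be built from two 3:2 carry-save adders. Each CSA rotates its carry vector left by one bit, because the carry out of bit n − 1 has weight 2^n ≡ 1 (mod 2^n − 1). The outputs stay in carry-save form; a final modulo adder (not shown) would merge them.

```python
# Hypothetical sketch: modulo (2^n - 1) compressors with end-around carry.

def csa_eac(x: int, y: int, z: int, n: int) -> tuple[int, int]:
    """Modulo (2^n - 1) 3:2 carry-save adder with end-around carry.

    Inputs are n-bit vectors; returns n-bit (sum, carry) with
    sum + carry == x + y + z (mod 2^n - 1).
    """
    mask = (1 << n) - 1
    s = x ^ y ^ z                           # per-bit sum, no carry propagation
    c = (x & y) | (x & z) | (y & z)         # per-bit majority = carries of weight 2^(i+1)
    c = ((c << 1) & mask) | (c >> (n - 1))  # rotate left: the weight-2^n carry re-enters bit 0
    return s, c


def compress_4_2_mod(a: int, b: int, c: int, d: int, n: int) -> tuple[int, int]:
    """Modulo (2^n - 1) 4:2 compressor built from two end-around-carry CSAs."""
    s1, c1 = csa_eac(a, b, c, n)
    return csa_eac(s1, c1, d, n)
```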

... The first step taken to evaluate the RNS compressors was to compare the results obtained for the considered modulo {2^n − 3} and {2^n + 3} compressor structures, namely i) CSA [16] 4:2, ii) CSA-1, and iii) CSA-2. ...

... As expected from the theoretical analysis, the CSA-2 structure is the most area- and delay-efficient realization, as depicted in Table I and Figure 4. Units for modulo {2^n − 1} and {2^n + 1} have also been synthesized in order to evaluate their performance with regard to circuit area (A), delay (T), and AT^2 as a performance metric. Experimental results suggest that the proposed CSA-2 4:2 structure requires 33% less area and reduces the delay by up to 33% compared with the CSA [16] 4:2 structure. The CSA-1 structure has the same delay as CSA [16] ...

In this paper Residue Number Systems (RNS) con- version structures from Binary to RNS modulo {2 n ± 3} are proposed. These structures are based on arithmetic calcula- tions without the need for Lookup Tables as in the related art. Additionally, the required 4:2 and 3:2 Carry-Save Adders (CSA) modulo {2 n ± 3} are also proposed. Experimental results obtained for an ASIC technology suggest that the presented CSAs, needed in the conversion, improve the related art by reducing the required area resources by 33% and achieving a 1.49× speedup. Experimental results for the proposed conversion units suggest that improvements in performance up to 3 times can be achieved,while reducing the required area resources by 85%. Index Terms—Residue Number System; Conversion units; Compressor units; Arithmetic;

... Specifically, RNS has been realized in applications involving the design of adaptive coded multicarrier modulation systems [3], delta-sigma modulators [4], the Discrete Fourier Transform, the Number Theoretic Transform, and many other applications [1,2]. This implies that designing efficient modular components such as adders and multipliers is essential for implementing RNS-based applications and processors [5]-[26]. ...

... Parker and Benaissa [14] introduced a serial modular multiplier. Elleithy and Bayoumi [15] proposed a systolic architecture for modulo multiplication which consists, mainly, of modular adders. Di Claudio et al. [16] presented also a multiplier based on a new pseudo-RNS and utilized, basically, arithmetic components. ...

... Substituting (8) into (15) leads to: ...

... transform (DFT) [7]. Elleithy and Bayoumi presented an architecture for modular multiplication, which consists mainly of modular adders and is suitable for medium and large moduli [8]. Stouraitis et al. [9] introduced full-adder (FA) based architectures for RNS multiply-add operations, which adopt the carry-save array paradigm, while the same architecture has been refined by Soudris et al. in [10]. ...

... , of (7) is reduced to , which represents zero, because the bit weight is ignored in -bit two's complement addition. Therefore, in order to evaluate (5) using two's complement arithmetic, the term is encoded according to the three possible values assumed by , as if if if (8) where . Due to (5) and (8), can be computed by (9) where is a cumulative additive constant, computed as (10) and ...

... In the following, the performance of the proposed residue multiplier is compared to the adder-based architectures presented in [7], [8], and [13]. The area complexity of the pseudo-RNS multiplier [7] equals the complexity of -bit multipliers and -bit adders and the overall delay is . ...

A novel hardware algorithm, a VLSI architecture, and an optimization methodology for residue multipliers are introduced in this paper. The proposed design approach identifies certain properties of the bit products that participate in the residue product computation and subsequently exploits them to reduce the complexity of the implementation. A set of introduced theorems is used to identify the particular properties. The introduced theorems are of significant practical importance because they allow the definition of a graph-based design methodology. In addition, a bit-product weight encoding scheme is investigated in a systematic way, and exploited in order to minimize the number of bit products processed in the proposed multiplier. Performance data reveal that the introduced architecture achieves area × time complexity reduction of up to 55%, when compared to the most efficient previously reported design.

... Designing an efficient modular multiplier is, then, an important task. Many multipliers can be found in literature [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17]. ...

... However, these designs are limited to small moduli. Elleithy and Bayoumi [15] have introduced an architecture for modular multiplication. The design, which consists mainly of modular adders, is suitable for medium and large moduli. ...

... This seems to be very impractical. The third group deals with medium and large moduli [11], [12], [13], [15], [16], [17]. It uses basically arithmetic components, like binary multipliers and adders, operating in parallel, along with a few small-size ROMs or logic components. ...

Modular multiplication is a very important arithmetic operation in
residue-based real-time computing systems. In realizing these
multipliers, ROM-based structures are more efficient for small moduli.
Due to the exponential growth of ROM sizes, implementations with
arithmetic components are more suitable for medium and large moduli.
This paper presents a new modular multiplier that can deal efficiently
with medium and large size moduli. The design of this modular multiplier
that multiplies two n bit residue digits consists, basically, of a
(n×n) binary multiplier, a ((n-1-k)×k) binary multiplier
(k<n), three n-bit adders, and a small-size combinational circuit.
When compared with the most competitive published work, the new
multiplier significantly reduces both time delay and hardware
requirements. The design is very suitable for VLSI realization.
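
The abstract does not state which moduli the design targets. Purely as a hypothetical sketch, assume m = 2^n − c with a small constant c.

- After the (n × n) binary multiplication, split the 2n-bit product as P = H·2^n + L.
- Then P ≡ H·c + L (mod m), so folding needs only a narrow multiplication by c and additions.
- One conditional subtraction finishes the reduction.

This illustrates the general folding idea, not the authors' exact datapath.

```python
# Hypothetical sketch: modular multiplication for an assumed modulus form m = 2^n - c.

def modmul_2n_minus_c(a: int, b: int, n: int, c: int) -> int:
    """a*b mod m for m = 2^n - c, with 1 <= c < 2^(n-1) and residues a, b < m."""
    m = (1 << n) - c
    assert 1 <= c < (1 << (n - 1)) and 0 <= a < m and 0 <= b < m
    mask = (1 << n) - 1
    p = a * b                            # (n x n) binary multiplication, 2n-bit result
    while p >> n:                        # fold while the value is wider than n bits
        p = (p & mask) + (p >> n) * c    # H*2^n + L = H*c + L (mod m): a narrow H x c multiply
    return p - m if p >= m else p        # p < 2^n = m + c < 2m, so one subtraction suffices
```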

... A number of techniques have been proposed by researchers for the implementation of multiplication in RNS. Look-up tables have been the main modules in constructing RNS multipliers [2]-[5]. Memory-based structures are efficient for small moduli sizes. ...

... Memory-based structures are efficient for small moduli sizes. For medium and large moduli, bit-level structures are more efficient [5]. ...

The design of Residue Number System (RNS) multipliers has received
considerable attention in the last few years. This paper presents a new
approach for designing modular multipliers using a combinational logic
technique. The idea is based on constructing a truth table whose inputs
are the bits of the multiplicand and the multiplier. The outputs are the
bits of the modular product. Realizing any minimized Boolean function is
achieved using two levels of gates. Compared with the most recently developed
approach, our new technique requires less integrated circuit area and
operates at a higher speed.
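
The truth-table approach can be illustrated with a short Python sketch. The function name and bit ordering are our own choices.

- For a modulus m, each residue needs w = ⌈log2 m⌉ bits.
- Input codes of m or above never occur and can be treated as don't-cares, which a two-level logic minimizer can exploit.

For m = 5, the table has 64 rows, of which only 25 specify an output.

```python
# Hypothetical sketch: truth table of a combinational modulo-m multiplier.

def modmul_truth_table(m: int) -> list:
    """Rows (input_bits, output_bits) of a combinational modulo-m multiplier.

    Inputs are the w bits of the multiplicand followed by the w bits of the
    multiplier, MSB first. Input codes >= m never occur, so their outputs are
    don't-cares, represented here by None.
    """
    w = max(1, (m - 1).bit_length())  # bits per residue

    def bits(v: int) -> tuple:
        return tuple((v >> i) & 1 for i in reversed(range(w)))

    rows = []
    for a in range(1 << w):
        for b in range(1 << w):
            out = bits(a * b % m) if a < m and b < m else None
            rows.append((bits(a) + bits(b), out))
    return rows
```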

... The multiplier introduced in [13] is efficient, for large moduli, when compared with ROM-based implementations. The multipliers in [14]- [15] use binary multipliers and adders as building blocks. Similarly, the newly proposed multiplier in [16] uses similar components. ...

... While binary-to-residue conversion does not pose a serious threat to high-speed RNS operations, residue-to-binary conversion can be a bottleneck. The Chinese Remainder Theorem (CRT) [6] is considered the main algorithm for the conversion process. Several implementations of the residue decoder have been reported [2, 5, 7-10]. ...

Designing an optimal Residue Number System (RNS) processor in terms of area and speed depends on the choice of the system moduli. In this paper, an optimal algorithm for choosing the system moduli is presented. The algorithm takes into consideration several constraints imposed by the problem definition. The problem is formalized as an integer programming problem to optimize an area/time objective function.
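
The paper formulates moduli selection as an integer program. The sketch below is only a brute-force stand-in for small candidate lists. Its area × time cost (sum of squared channel widths times the widest channel) is hypothetical and chosen for illustration. It enforces the two basic constraints: pairwise-coprime moduli and a product covering the required dynamic range.

```python
# Hypothetical sketch: exhaustive moduli selection with an illustrative cost model.
from itertools import combinations
from math import gcd, prod


def choose_moduli(candidates, dynamic_range, max_count=4):
    """Return (cost, moduli) for the cheapest valid set, or None if none exists.

    Hypothetical cost model: area ~ sum of squared channel widths,
    time ~ widest channel; cost = area * time.
    """
    best = None
    for k in range(1, max_count + 1):
        for mods in combinations(sorted(candidates), k):
            if prod(mods) < dynamic_range:
                continue  # must cover the dynamic range
            if any(gcd(x, y) != 1 for x, y in combinations(mods, 2)):
                continue  # moduli must be pairwise coprime
            widths = [(mi - 1).bit_length() for mi in mods]
            cost = sum(w * w for w in widths) * max(widths)
            if best is None or cost < best[0]:
                best = (cost, mods)
    return best
```

Take the candidates 7, 8, 9, 15, 16, 17, 31, 32 and 33 and a 12-bit dynamic range (4096). Under this cost model the search selects (9, 16, 31), at cost 285.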

... array (parallel) [9]-[11], serial [9], [12]-[15], and serial-parallel multipliers (SPMs). SPMs, which are used for hardware simplicity and moderate speed, have one of the two operands (the multiplicand, A) loaded in parallel, and the other (the multiplier, B) fed serially to the multiplier. ...
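
The serial-parallel scheme described above can be modelled at bit level with a short Python sketch. The names are our own. Real SPMs are usually built as a chain of full-adder cells, which this word-level accumulator abstracts away. Each clock does three things:

1. The current bit of B gates the parallel multiplicand A into an accumulator.
2. The accumulator's least significant bit leaves as the next product bit.
3. The accumulator shifts right.

```python
# Hypothetical bit-level model of an n x n serial-parallel multiplier.

def serial_parallel_multiply(a: int, b: int, n: int) -> int:
    """A is held in parallel; B is shifted in LSB first, one bit per clock.

    After 2n clocks every bit of the 2n-bit product has been emitted.
    """
    acc = 0
    product = 0
    for t in range(2 * n):
        b_bit = (b >> t) & 1 if t < n else 0  # zeros are shifted in after the n bits of B
        acc += a if b_bit else 0              # add the gated multiplicand
        product |= (acc & 1) << t             # one product bit leaves per clock
        acc >>= 1                             # shift the partial sum right
    return product
```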

The multiplier is one of the most important components in computing and reconfigurable computing systems, especially in the field of digital signal processing (DSP). Hence, in this paper, a performance evaluation and comparison (in terms of area efficiency and moderate speed) of different serial-parallel multiplier structures has been carried out for their implementation on a programmable logic device, namely a field programmable gate array (FPGA). These structures were implemented for 8-bit parallel operands using the XC4010E chip and the Foundation software package V1.3 from Xilinx. The implementation results show savings in design area and a speed-up in design performance.

... At first, emphasis should be on efficient implementations of modular multipliers. If we look at modular multipliers, many designs have been proposed in the literature, ranging from look-up table based structures for small moduli [42] [23] [32] [15], to devices restricted to specific moduli [33] [44] [22], to architectures suitable for medium and large moduli and using only arithmetic and logic components [16] [1] [45] [2] [22]. ...

Logical cryptanalysis has been introduced by Massacci and Marraro as a general framework for encoding properties of crypto-algorithms into SAT problems, with the aim of generating SAT benchmarks that are controllable and that share the properties of real-world problems and randomly generated problems. In this paper, spurred by the proposal of Cook and Mitchell to encode the factorization of large integers as a SAT problem, we propose the SAT encoding of another aspect of RSA, namely finding (i.e. faking) an RSA signature for a given message without factoring the modulus. Given a small public exponent e, a modulus n and a message m, we can generate a SAT formula whose models correspond to the eth roots of m modulo n, without encoding the factorization of n or other functions that can be used to factor n. Our encoding can be used to generate either solved instances for SAT or both satisfiable and unsatisfiable instances. We report the experimental results of three solvers, HeerHugo by Groote and Warners, eqsatz by Li, and smodels by Niemela and Simmons, discuss their performance, and compare them with standard methods based on factoring.

... The use of RNS has been adopted in several digital signal processing applications [6]-[11]. Finally, modular adders are essential building blocks for modular multipliers [12], [13], residue-to-binary converters [14], [15], and other modular operations [16], [17]. ...

Modular adders are met in various applications of computer systems. In this paper, we investigate a new architecture for their design that utilizes a carry save adder stage and two binary adders that operate in parallel. Realizations in static CMOS reveal that the introduced architecture leads to modular adder implementations that offer significant savings in delay and power consumption over implementations based on previously proposed architectures. In parallel, the proposed architecture offers significantly smaller implementation area for small operand widths.
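
Here is one plausible reading of this architecture, sketched in Python under our own assumptions.

- The carry-save stage compresses a, b and the constant 2^n − m into two vectors.
- One binary adder produces a + b, while a second adder, working in parallel, produces a + b − m + 2^n.
- The carry out of bit n of the second sum indicates whether a + b ≥ m and selects the result, so no second sequential addition is needed.

The function name and interface are hypothetical.

```python
# Hypothetical sketch of a CSA-plus-two-adders modular adder.

def modular_add_parallel(a: int, b: int, m: int, n: int) -> int:
    """(a + b) mod m for a, b in [0, m) and 0 < m <= 2^n."""
    assert 0 < m <= (1 << n) and 0 <= a < m and 0 <= b < m
    mask = (1 << n) - 1
    k = (1 << n) - m                          # constant 2^n - m
    s1 = a + b                                # binary adder 1: a + b
    x = a ^ b ^ k                             # carry-save stage: bitwise sum ...
    y = ((a & b) | (a & k) | (b & k)) << 1    # ... and shifted carries, x + y == a + b + k
    s2 = x + y                                # binary adder 2: a + b - m + 2^n
    return s2 & mask if s2 >> n else s1       # carry out of bit n means a + b >= m
```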

... Each cell contains a multiplier and an adder. In a high-complexity system, area restriction is crucial, which leads to a need for systolic-array-based implementations of area-consuming operators such as the multiplier [6]-[8]. The multiplier in each cell of the systolic array for convolution is a natural candidate for systolization and should be implemented as a systolic array, as proposed in this paper. ...

High-performance computation on a large array of cells has been an important feature of systolic arrays. To achieve an even higher degree of concurrency, it is desirable to make the cells of a systolic array themselves systolic arrays. A systolic array whose cells consist of other systolic arrays is called a super-systolic array. In this paper, we propose a scalable super-systolic array architecture that offers high performance and can be adopted in VLSI design, including regular interconnection and functional primitives that are typical of a systolic architecture.

... for . Restriction (3) is necessary to assure that every integer , , can be uniquely represented as an -tuple (4) where denotes the residue of modulo . The main benefit of adopting RNS to perform arithmetic operations is that it allows for the totally parallel addition, subtraction, and multiplication of operands expressed as -tuples of the form (4). ...
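
Here is a minimal Python sketch of the representation described above. It assumes that restriction (3) is the usual requirement of pairwise-coprime moduli. The moduli (7, 8, 9) are an arbitrary example with M = 504. Addition and multiplication act on each residue channel independently, and the Chinese remainder theorem recovers the integer.

```python
# Hypothetical sketch of RNS encoding, channel-wise arithmetic and CRT decoding.
from math import gcd, prod

MODULI = (7, 8, 9)  # example moduli; M = 504
# Assumed form of restriction (3): the moduli are pairwise coprime.
assert all(gcd(a, b) == 1 for i, a in enumerate(MODULI) for b in MODULI[i + 1:])


def to_rns(x, moduli=MODULI):
    return tuple(x % mi for mi in moduli)


def rns_add(xs, ys, moduli=MODULI):
    # Each channel is independent, so all channels can operate in parallel.
    return tuple((x + y) % mi for x, y, mi in zip(xs, ys, moduli))


def rns_mul(xs, ys, moduli=MODULI):
    return tuple((x * y) % mi for x, y, mi in zip(xs, ys, moduli))


def from_rns(xs, moduli=MODULI):
    """Chinese remainder theorem reconstruction into [0, M)."""
    M = prod(moduli)
    total = 0
    for x, mi in zip(xs, moduli):
        Mi = M // mi
        total += x * Mi * pow(Mi, -1, mi)  # pow(..., -1, mi) is the modular inverse (Python 3.8+)
    return total % M
```

For example, `from_rns(rns_mul(to_rns(20), to_rns(23)))` gives 460. Results outside [0, M) wrap around modulo M.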

Novel radix-r modulo-r^n arithmetic units for residue
number system (RNS)-based architectures are introduced in this paper.
The proposed circuits are shown to require several times less area than
previously reported architectures for particular moduli of operation,
while also being preferable in the area×time complexity sense. The
complexity reduction is achieved by extending the carry-ignore property
of modulo-2^n operations to radices higher than two, which are
not powers of two. The carry-ignore property is efficiently exploited by
introducing simplified digit adders, instead of general radix-r adders.
The proposed simplification of digit adders is possible, since the
maximum values of certain intermediate digits produced in the
architecture are found to be less than r-1. Detailed area and time
complexity models are derived for the arithmetic units. The proposed
radix-r architectures include multipliers, adders, and merged
multipliers-adders. In addition, efficient radix-r binary-to-residue and
residue-to-binary conversion techniques and architectures are introduced.
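
The carry-ignore property that these units extend can be shown with a small radix-r digit adder. This is a hypothetical Python sketch with digits stored least significant first. The carry out of the most significant digit has weight r^n ≡ 0 (mod r^n), so dropping it yields the residue directly, with no correction step. The paper's simplified digit adders, which exploit bounds on intermediate digit values, are not modelled.

```python
# Hypothetical sketch: modulo r^n addition of radix-r digit vectors (carry-ignore).

def add_mod_r_pow_n(x_digits: list[int], y_digits: list[int], r: int) -> list[int]:
    """Add two n-digit radix-r numbers (least significant digit first) modulo r^n."""
    assert len(x_digits) == len(y_digits)
    out, carry = [], 0
    for xd, yd in zip(x_digits, y_digits):
        s = xd + yd + carry   # ordinary radix-r digit addition
        out.append(s % r)
        carry = s // r        # 0 or 1
    # Carry-ignore: the final carry has weight r^n = 0 (mod r^n) and is simply dropped.
    return out
```

In radix 3 with four digits, 50 + 60 = 110 ≡ 29 (mod 81). Accordingly, `add_mod_r_pow_n([2, 1, 2, 1], [0, 2, 0, 2], 3)` returns `[2, 0, 0, 1]`.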

... Therefore, designing an efficient residue adder is an important task in realizing different RNS-based applications [1], [2], [3]. Moreover, residue adders are very important components in building residue-based multipliers [2], [3], [4], [5], [6], [21], [22], [23], residue-to-binary converters, generators, and other arithmetic operations [7], [8], [9], [10], [11], [12], [13], [20]. ...

A modular adder is an instrumental arithmetic component in implementing online residue-based computations for many digital signal processing applications. It is also a basic component in realizing modular multipliers and residue-to-binary converters. Thus, the design of a high-speed and reduced-area modular adder is an important issue. In this paper, we introduce a new modular adder design. It is based on utilizing concepts developed to realize binary-based adders. VLSI layout implementations and comparative analysis showed that the hardware requirements and the time delay of the new proposed structure are significantly less than those of other reported ones. A new modulo (2^n + 1) adder is also presented. Compared with other similar ones, this specific modular adder requires less area and time delay.

This paper proposes a method for constructing an efficient multiplier based on finite fields in the case of P

Recently, RNS has received increased attention due to its ability to support high-speed concurrent arithmetic. Applications such as the fast Fourier transform, digital filtering, and image processing exploit the efficiency of RNS addition and multiplication and do not require the difficult RNS operations, such as division and magnitude comparison. RNS has computational advantages because operations on residue digits are performed independently and can therefore be carried out in parallel. There are basically two methods for residue-to-binary conversion: the first uses the mixed-radix conversion algorithm, and the second is based on the Chinese remainder theorem (CRT). In this paper, a new design of CRT conversion is presented. It is derived by applying an overlapped multiple-bit scanning method in the CRT conversion process, for the general moduli set (2^k − 1, 2^k, 2^k + 1). The implementation is then simulated. In conclusion, the simulation shows that the CRT method adopted in this research performs arithmetic operations faster than the traditional approaches, owing to the advantages of parallel processing and carry-free arithmetic.
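
The overlapped multiple-bit scanning method is not reproduced here. The sketch below only illustrates why the set (2^k − 1, 2^k, 2^k + 1) suits CRT decoding: the multiplicative inverses required by the CRT have closed forms built from powers of two.

- For 2^k − 1, the inverse is 2^(k−1). Check: 2^k(2^k + 1) ≡ 2 (mod 2^k − 1), and 2 · 2^(k−1) = 2^k ≡ 1.
- For 2^k, the inverse is 2^k − 1.
- For 2^k + 1, the inverse is 2^(k−1) + 1.

The function name is hypothetical.

```python
# Hypothetical sketch: CRT decoding for the moduli set (2^k - 1, 2^k, 2^k + 1).

def crt_2k_set(r1: int, r2: int, r3: int, k: int) -> int:
    """Decode residues (r1, r2, r3) w.r.t. (2^k - 1, 2^k, 2^k + 1) into [0, M)."""
    m1, m2, m3 = (1 << k) - 1, 1 << k, (1 << k) + 1
    M = m1 * m2 * m3
    M1, M2, M3 = m2 * m3, m1 * m3, m1 * m2
    # Closed-form CRT inverses |Mi^-1| mod mi: only shifts and +/-1 are needed.
    inv1, inv2, inv3 = 1 << (k - 1), (1 << k) - 1, (1 << (k - 1)) + 1
    return (r1 * M1 * inv1 + r2 * M2 * inv2 + r3 * M3 * inv3) % M
```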

Modular addition is a vital arithmetic operation in any residue-based processor. This paper introduces a new design for a modular adder for residue number and cryptography systems. The new addition approach uses binary-based concepts to build a modular adder for any modulus. Unlike most previously reported modular adders, where addition is performed in two cycles, the proposed algorithm uses one cycle of addition. Compared with another recent design, and based on 0.5-micrometer VLSI technology, the area and time delay of the proposed structure have been improved by 34% and 25.4%, respectively. The adder is customized for specific moduli of the form (2^n + 1) and (2^n ± 2^k + 1).

Computation of a polynomial function modulo integer with linearly incremented variable is required by certain number generators like, e.g., an interleaver of the turbo decoder in telecommunication field. In this paper, a systematic method for deriving hardware structures for such computation is proposed. The method is derived by recursively applying principles of simplifying modulo operations in a limited domain. With the aid of the proposed method, efficient hardware structures can be derived for any polynomials and significant savings can be obtained in the hardware complexity when compared to the straightforward modulo arithmetic. As a case study, the method is applied on the 3G long term evolution (LTE) interleaver.
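
Here is a hedged sketch of the underlying idea for a quadratic permutation polynomial of the form used by the LTE interleaver, Π(i) = (f1·i + f2·i²) mod K, with illustrative parameters.

- Because the variable increases by one each step, consecutive values differ by g(i) = f1 + f2(2i + 1).
- g itself grows by the constant 2·f2 at each step.
- Both recurrences add two values already reduced modulo K, so every modulo operation collapses into one comparison and a conditional subtraction.

This is not claimed to be the paper's recursive derivation.

```python
# Hypothetical sketch: incremental evaluation of a quadratic polynomial modulo K.

def qpp_direct(i: int, f1: int, f2: int, K: int) -> int:
    """Reference: (f1*i + f2*i^2) mod K with full multiplications and a modulo."""
    return (f1 * i + f2 * i * i) % K


def qpp_incremental(f1: int, f2: int, K: int) -> list[int]:
    """All K values of (f1*i + f2*i^2) mod K, i = 0..K-1, using only modular additions."""
    def add_mod(x: int, y: int) -> int:
        s = x + y                     # both operands are already in [0, K)
        return s - K if s >= K else s

    pi = 0                            # pi(0)
    g = (f1 + f2) % K                 # g(0) = pi(1) - pi(0)
    step = (2 * f2) % K               # g(i+1) - g(i) is the constant 2*f2
    out = []
    for _ in range(K):
        out.append(pi)
        pi = add_mod(pi, g)           # pi(i+1) = pi(i) + g(i)
        g = add_mod(g, step)          # g(i+1) = g(i) + 2*f2
    return out
```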

This paper presents a novel method for parallelizing the modular multiplication algorithm in the Residue Number System (RNS). The proposed algorithm executes modular reductions using a new lookup table along with the Mixed Radix number System (MRS) and RNS. MRS is used because algebraic comparison is difficult in RNS, which has a non-weighted number representation. Compared with the previous algorithm, the proposed algorithm requires only L moduli, which is half the number needed in the previous algorithm. Furthermore, the proposed algorithm reduces the number of MUL operations by 25%.

The family of moduli of the form 2^n − (2^p ± 1) is considered in this paper. A fast residue multiplier for this family of moduli is introduced. The multiplication algorithm proposed in this paper generates (2n + p − 2) partial products; however, it compresses the magnitude of each partial product to less than 2^n. Although it requires additional integrated circuit area compared with the most recent published study, the new modular multiplier has a logarithmic delay, which makes it faster than any other modular multiplier. Moreover, it is even faster than binary-based iterative array multipliers with a final CLA addition. The proposed modular multiplier is very suitable for medium and large dynamic ranges.

In this paper, we introduce a new algorithm for division in the residue number system, which can be applied to any moduli set. Simulation results indicated that the algorithm is faster than the most competitive published work. To further improve this speed, we customize this algorithm to serve two specific moduli sets: (2^k, 2^k − 1, 2^(k−1) − 1) and (2^k + 1, 2^k, 2^k − 1). The customization eliminates memory devices (ROMs), thus increasing the speed of operation. A semi-custom VLSI design of this algorithm for the moduli (2^k + 1, 2^k, 2^k − 1) has been implemented, fabricated and tested.

A novel hardware algorithm, architecture and an optimization technique for residue multipliers are introduced in this paper. The proposed architecture exploits certain properties of the bit products to achieve low-complexity implementation via a set of introduced theorems that allow the definition of a graph based design methodology. In addition the proposed multiplier employs the Canonic Signed Digit (CSD) encoding to minimize the number of bit products required to be processed. Performance data reveal that the introduced architecture achieves area×time complexity reduction of up to 55%, when compared to the most efficient previously reported design.
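
Canonic signed-digit encoding itself is standard. A minimal Python sketch computes the non-adjacent form, whose digits lie in {−1, 0, 1} with no two adjacent non-zero digits; the function name is our own. The abstract does not detail how the paper applies CSD to bit products. One could, for instance, encode each weight 2^(i+j) mod m this way, but that use is our assumption.

```python
# Hypothetical sketch: canonic signed-digit (non-adjacent form) encoding.

def to_csd(x: int) -> list[int]:
    """CSD digits of x >= 0, least significant first, each in {-1, 0, 1}.

    No two adjacent digits are non-zero, which minimizes the number of
    non-zero digits and hence the number of terms that must be processed.
    """
    digits = []
    while x:
        if x & 1:
            d = 2 - (x & 3)   # x mod 4 == 1 -> +1, x mod 4 == 3 -> -1
            x -= d            # x is now divisible by 4 or is 0
        else:
            d = 0
        digits.append(d)
        x >>= 1
    return digits
```

`to_csd(7)` gives `[-1, 0, 0, 1]`, i.e. 7 = 8 − 1: two non-zero digits instead of three.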

The design of Residue Number System (RNS) multipliers has received
considerable attention in the last few years. This paper presents a new
approach for designing modular multipliers using a combinational logic
technique. The idea is based on constructing a truth table whose inputs
are the bits of the multiplicand and the multiplier. The outputs are the
bits of the modular product. Realizing any minimized Boolean function is
achieved using two levels of gates. A VLSI implementation of a modulo-5
multiplier has been accomplished. Compared with the most recently developed
approach, our new technique requires less integrated circuit area and
operates at a higher speed.

New designs of serial-parallel multipliers based on the modified
Booth and multi-bit recoding algorithms are introduced. Using recoding
for the parallel operand, two proposed systolic multipliers have been
introduced to build structures having n/2 and n/3 cells. The proposed
serial-parallel multipliers are compared with other structures on the
basis of multiplication time, area, and complexity. By using multi-bit
overlapped recoding of the multiplier operand, the multiplier operates
at twice the speed of the existing designs and has a much lower AT^2
complexity.
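
Modified (radix-4) Booth recoding halves the number of multiplier digits, and hence the number of cells for the parallel operand. It can be sketched as follows, with a hypothetical function name and n assumed even. Each digit is taken from an overlapping triplet of bits, which is the overlap referred to above. The multi-bit radix-8 variant with n/3 cells is not shown.

```python
# Hypothetical sketch: radix-4 (modified Booth) recoding.

def booth_radix4(b: int, n: int) -> list[int]:
    """Radix-4 Booth digits of the n-bit two's-complement value b, least significant first."""
    assert n % 2 == 0

    def bit(j: int) -> int:
        return (b >> j) & 1 if j >= 0 else 0  # the bit below b[0] is an implicit 0

    # Digit i = -2*b[2i+1] + b[2i] + b[2i-1], always one of {-2, -1, 0, 1, 2}.
    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1) for i in range(n // 2)]
```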

A novel very large scale integration architecture and the corresponding design methodology for a combinatorial adder-based residue number system (RNS) multiplier are presented in this paper. The proposed approach to residue multiplier design, exploits the nonoccurring combinations of input bits to reduce the number of 1-bit full adders (FAs) required to compose an RNS multiplier. In particular, input bits which cannot be simultaneously asserted for any input residue value are organized into couples or triplets, which can be processed by OR gates instead of 1-bit adders, therefore reducing the RNS multiplier complexity. By comparing the performance and hardware complexity of the proposed residue multiplier to previously reported designs, it is found that the introduced architecture is more efficient in the area×time product sense. In fact, it is shown that a performance improvement in excess of 80% can be achieved in certain cases.

In implementing Residue Number System (RNS) arithmetic
multipliers, ROM-based structures are very efficient for small moduli.
However, due to their exponential growth, ROM implementations are not
suitable for medium and large moduli. This paper introduces an
architecture for an RNS-based multiplier which combines the use of
small-size ROMs and arithmetic components. The design is most suitable
for medium and large moduli. Compared with other implementations, the
VLSI layout implementation of this new approach is shown to be more
efficient in terms of area and delay requirements.

An O(log n) algorithm for large-moduli multiplication for Residue Number System (RNS) based architectures is proposed. The proposed modulo multiplier is much faster than previously proposed multipliers and more area efficient. The implementation of the multiplier is modular and is based on simple cells, which leads to efficient VLSI realization. A VLSI implementation using a 3-micron CMOS process shows that a pipelined n-bit modulo multiplication scheme can operate with a throughput of 30 M operations per second.

An implementation of a fast and flexible residue decoder for
residue-number-system (RNS)-based architectures is proposed. The decoder
is based on the Chinese remainder theorem. It decodes a set of residues
to its equivalent representation in weighted binary number system. This
decoder is flexible since the decoded data can be selected to be either
unsigned magnitude or 2's complement binary number. Two different
architectures are analyzed; the first one is based on using carry-save
adders, while the other is based on utilizing modulo adders. The
implementation of both architectures is modular and is based on simple
cells, which leads to efficient VLSI realization. The proposed decoder
is fast; it has a time complexity of θ(log N), where
N is the number of moduli.
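
Here is a minimal Python sketch of a CRT decoder with the logarithmic structure mentioned above, under our own assumptions.

- The N weighted terms x_i · M_i · |M_i^(−1)|_{m_i} mod M are formed independently.
- They are then summed by a binary tree of modulo-M adders of depth ⌈log2 N⌉.
- The optional signed output maps values in the upper half of [0, M) to negative numbers, mirroring the decoder's 2's-complement option.

The names are hypothetical. The paper's carry-save and modulo-adder variants are not modelled individually.

```python
# Hypothetical sketch: CRT residue decoder with a logarithmic-depth adder tree.
from math import prod


def crt_decode_tree(residues, moduli, signed=False):
    """Return (value, tree_depth); the moduli must be pairwise coprime."""
    M = prod(moduli)
    # Each term depends on one residue only, so all N terms can be formed in parallel.
    terms = [x * (M // m) * pow(M // m, -1, m) % M for x, m in zip(residues, moduli)]
    depth = 0
    while len(terms) > 1:
        nxt = []
        for k in range(0, len(terms) - 1, 2):
            s = terms[k] + terms[k + 1]
            nxt.append(s - M if s >= M else s)  # modulo-M adder: add, then one conditional subtraction
        if len(terms) % 2:
            nxt.append(terms[-1])  # an odd term waits for the next level
        terms = nxt
        depth += 1
    value = terms[0]
    if signed and value >= (M + 1) // 2:
        value -= M  # the upper half of [0, M) represents negative numbers
    return value, depth
```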

A θ(1) algorithm for large modulo addition for
architectures based on the residue number system (RNS) is proposed. The
addition is done in a fixed number of stages which does not depend on
the size of the modulus. The proposed modulo adder is much faster than
previous adders and more area efficient. The implementation of the adder
is modular and is based on simple cells, which leads to efficient VLSI
realization.

Abstract: An O(1) algorithm for large modulo addition for residue number system (RNS) based architectures is proposed. The addition is done in a fixed number of stages, which does not depend on the size of the modulus. The proposed modulo adder is much faster than previous adders and more area efficient. The implementation of the adder is modular and is based on simple cells, which leads to efficient VLSI realization.

I. INTRODUCTION

Recently, the residue number system (RNS) has been receiving increased attention because it supports high-speed concurrent arithmetic [1]. Applications such as the fast Fourier transform, digital filtering, and image processing rely on the high-speed RNS operations, addition and multiplication, and do not require the difficult RNS operations such as division and magnitude comparison. The technological advantages offered by VLSI have added a new dimension to the implementation of RNS-based architectures [2]. Several high-speed VLSI special-purpose digital signal processors have been successfully implemented [3]-[5].

Modulo addition is the computational kernel of RNS-based architectures. Subtraction is performed by adders using the additive inverse property [6]. Multiplication can be transformed into addition by several techniques [7]. Modulo addition is also the basic element in conversion from RNS to binary using the Chinese remainder theorem (CRT) [6]. Banerji [8] analyzed modulo addition in MSI technology. VLSI analyses of modulo addition have been reported in [9]-[11]. In general, lookup tables and PLAs have been the main logic modules used when the data granularity is the word. Such structures have been found to be efficient only for small moduli. For medium and large moduli, bit-level structures, where the data granularity is the bit, are more efficient [12].

In this paper, we present a modulo adder for medium and large moduli. It is based on a two-dimensional array of very simple cells (full adders). The modulo addition is performed with a fixed time delay independent of the size of the moduli.

II. RESIDUE NUMBER SYSTEM (RNS)
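The RNS properties described in the introduction (channel-wise, carry-free addition, subtraction through the additive inverse, and CRT reconstruction) can be illustrated with a short Python sketch. The names `mod_add`, `to_rns`, `rns_add`, `rns_sub`, and `from_rns` are hypothetical. `mod_add` uses a common word-level select technique (form a + b and a + b + 2^n - m side by side, then choose by the carry-out); it is not the bit-level full-adder array proposed in the paper.

```python
from math import prod


def mod_add(a: int, b: int, m: int, n: int) -> int:
    """Return (a + b) mod m for residues a, b in [0, m), with m <= 2**n.

    Two candidates are formed side by side: s = a + b and
    t = a + b + (2**n - m). A carry out of t (t >= 2**n) means a + b >= m,
    so the low n bits of t, i.e. a + b - m, are selected; otherwise s is kept.
    """
    assert 0 <= a < m and 0 <= b < m and m <= 2 ** n
    s = a + b
    t = s + (2 ** n - m)
    carry = t >> n                      # 0 or 1, since t < 2**(n + 1)
    return t & ((1 << n) - 1) if carry else s


def to_rns(x, moduli):
    """Forward conversion: one residue per modulus."""
    return [x % m for m in moduli]


def rns_add(xs, ys, moduli):
    """Channel-wise addition; no carry crosses between channels."""
    return [mod_add(a, b, m, m.bit_length()) for a, b, m in zip(xs, ys, moduli)]


def rns_sub(xs, ys, moduli):
    """Subtraction as addition of the additive inverse (m - y) mod m."""
    return [mod_add(a, (m - b) % m, m, m.bit_length()) for a, b, m in zip(xs, ys, moduli)]


def from_rns(residues, moduli):
    """Reverse conversion with the CRT (moduli must be pairwise coprime)."""
    big_m = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        mi = big_m // m
        x += r * mi * pow(mi, -1, m)    # modular inverse; needs Python 3.8+
    return x % big_m
```

With the moduli {7, 8, 9} (dynamic range 504), adding 100 and 250 channel by channel gives the residues [0, 6, 8], which the CRT maps back to 350.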

Since the number of components that can fit on a single chip is large and rapidly growing, the asymptotic analysis and computational complexity have become applicable to the VLSI systems. We propose a model of computation devoted to VLSI structures based on Residue Number System (RNS). The developed model employs the ‘cut theorem’ which has been used by most of the abstract VLSI models. It is not as general as other reported models, but it gives tighter lower bounds and more accurate measures of performance for RNS structures. This computational model relates the area and time complexities with the inherent properties of RNS, the moduli size and the dynamic range. The model supports the look-up table implementation approach and it is technology-independent.

Results are presented on the design, layout, and fabrication of a custom-designed integrated circuit for a residue number system digital filter module. The architecture is based on a ROM-ACCUMULATOR FIR structure in which the modular arithmetic for each modulus is realized on a separate chip. The modules are designed to support error detection and fault isolation at module boundaries. Of the five chips that were fabricated and tested, all were found to be fully operational, with three operating at a maximum data-cycle frequency of approximately 1.7 MHz.
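One modulus channel of a ROM-accumulator FIR filter can be sketched as follows, assuming fixed filter coefficients so that each tap's multiplication becomes a table lookup `rom[k][x] = (h_k * x) mod m`. The function names and the per-tap table organization are illustrative assumptions, not the chip's actual ROM layout. Each modulus would run the same loop independently, and the full output is recovered by the CRT as long as it stays within the dynamic range.

```python
def build_coeff_roms(coeffs, m):
    """One lookup table per tap: rom[k][x] = (coeffs[k] * x) mod m."""
    return [[(h * x) % m for x in range(m)] for h in coeffs]


def rns_fir_channel(samples, coeffs, m):
    """Compute y[t] = sum_k coeffs[k] * samples[t - k] (mod m) for one modulus."""
    roms = build_coeff_roms(coeffs, m)
    delay_line = [0] * len(coeffs)          # residues of the most recent inputs
    out = []
    for x in samples:
        delay_line = [x % m] + delay_line[:-1]
        acc = 0
        for rom, r in zip(roms, delay_line):
            acc = (acc + rom[r]) % m        # modulo accumulator
        out.append(acc)
    return out
```

Negative coefficients need no special handling here, because Python's `%` already returns a residue in [0, m).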

A θ(log n) algorithm for large moduli multiplication for residue number system (RNS) based architectures is proposed. The modulo multiplier is much faster than previously proposed multipliers and more area efficient. The implementation of the multiplier is modular and is based on simple cells, which leads to efficient VLSI realization. A VLSI implementation using a 3-μm CMOS process shows that a pipelined n-bit modulo multiplication scheme can operate with a throughput of 30M operations/s.
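To show where a logarithmic delay can come from, the sketch below reduces n partial products with a binary tree of modular adders, so only ceil(log2 n) adder levels are needed. This is a generic illustration under that assumption, not the systolic algorithm of the paper. `mod_mul_tree` is a hypothetical name, and the shifted multiplicands `(a << i) mod m` are simply computed in software here.

```python
def mod_mul_tree(a, b, m):
    """Return ((a * b) mod m, number of modular-adder levels used)."""
    n = m.bit_length()
    assert 0 <= a < m and 0 <= b < m
    # Level 0: one reduced partial product per bit of b.
    level = [((a << i) % m) if (b >> i) & 1 else 0 for i in range(n)]
    depth = 0
    while len(level) > 1:
        nxt = []
        for j in range(0, len(level), 2):
            if j + 1 < len(level):
                s = level[j] + level[j + 1]         # both inputs < m, so s < 2m
                nxt.append(s - m if s >= m else s)  # one modular adder
            else:
                nxt.append(level[j])                # odd element passes through
        level = nxt
        depth += 1
    return level[0], depth
```

For m = 1000 (n = 10 bits), the ten partial products are reduced in four adder levels: `mod_mul_tree(123, 456, 1000)` returns `(88, 4)`.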

A recently proposed residue-number-arithmetic digital filter offers major cost and speed advantages over binary-arithmetic digital filters, but suffers one major drawback. The filter coefficients must be constant, since the lack of a fast method of multiplication by a fraction in residue arithmetic requires the coefficients to be realised by a fixed table look-up read-only memory. Two multipliers are proposed which realise a completely general fractional multiply and are suitable for digital-filtering applications.

In current high-speed digital signal-processing (DSP) architectures, the Residue Number System (RNS) has an important role to play. RNS implementations have a highly modular structure, and are not dependent upon large binary arithmetic elements. RNS implementations become more attractive when combined with the advantages offered by VLSI fabrication technology. In this paper, a novel design methodology has been developed for RNS structures, based on using look-up tables, which takes into consideration the unique features and requirements of RNS. The paper discusses the following three phases: 1) developing a look-up table layout model, which is used to derive relationships between the size of each modulus and both chip area and time; this model supports all types of moduli; 2) selecting the most efficient layout according to the design requirements; the procedure allows the designer to control the area, time, or the configuration of the memory module required for implementing a modulo look-up table; 3) proposing a set of multi-look-up table modules, to be used as building block units for implementing digital signal-processing architectures. The paper uses two examples to illustrate the use of the modules in phase 3).

The residue number system has recently been shown to be a viable signal processing medium. However, it has limitations. One of the most serious is overflow prevention through magnitude scaling. One way of overcoming this defect is to increase the dynamic range of the number system. To this end, a new high-speed large moduli multiplier has been developed. The multiplier is the result of combining the quarter-squared algorithm with recent breakthroughs in device technology. As a result, equivalent 18-bit full-precision products can be obtained at a pipelined rate of 28.5 × 10^3 multiplies per second.
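The quarter-squared identity behind this kind of multiplier is ab = floor((a + b)^2 / 4) - floor((a - b)^2 / 4). Because a + b and a - b have the same parity, the two floors cancel exactly, so a single table of floor(x^2 / 4) mod m turns a modulo multiplication into two lookups and a subtraction. The Python sketch below assumes operands already reduced to [0, m). The names are hypothetical, and the sketch says nothing about the device-level design of the cited multiplier.

```python
def quarter_square_table(m):
    """Table of floor(x^2 / 4) mod m for every possible sum a + b, 0 <= x <= 2m - 2."""
    return [(x * x // 4) % m for x in range(2 * m - 1)]


def qs_mod_mul(a, b, m, table):
    """(a * b) mod m from two lookups, using (a + b)^2 - (a - b)^2 = 4ab."""
    assert 0 <= a < m and 0 <= b < m
    return (table[a + b] - table[abs(a - b)]) % m
```

Because no inverse of 4 modulo m is needed, the same table construction also works for even moduli such as m = 12.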

Dynamic range overflow is a serious problem in residue arithmetic systems. Contemporary overflow management schemes rely on inefficient scaling operations. In this correspondence, a residue scaler is designed that inhibits dynamic range overflow. The system uses the popular three-moduli set {2^n - 1, 2^n, 2^n + 1}. Using a 4K memory model, practical 12- and 18-bit autoscalers are configured. An error model for the resulting residue arithmetic unit is also derived and experimentally verified.
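Part of the appeal of the set {2^n - 1, 2^n, 2^n + 1} is that forward conversion needs no division. Since 2^n ≡ 1 (mod 2^n - 1) and 2^n ≡ -1 (mod 2^n + 1), splitting x into n-bit chunks and adding them (or alternately adding and subtracting them) yields the residues. The sketch below is a generic illustration of that property, not the scaler described in the correspondence. `special_set_residues` is a hypothetical name.

```python
def special_set_residues(x, n):
    """Residues of x >= 0 modulo (2^n - 1, 2^n, 2^n + 1) using shifts, masks, and adds."""
    mask = (1 << n) - 1

    # Modulo 2^n: keep the low n bits.
    r_mid = x & mask

    # Modulo 2^n - 1: 2^n = 1, so n-bit chunks simply add (end-around carry).
    s, y = 0, x
    while y:
        s += y & mask
        y >>= n
    while s > mask:                     # fold carries back into the low bits
        s = (s & mask) + (s >> n)
    r_low = 0 if s == mask else s       # all-ones is a second code for zero

    # Modulo 2^n + 1: 2^n = -1, so n-bit chunks alternate in sign.
    t, sign, y = 0, 1, x
    while y:
        t += sign * (y & mask)
        sign = -sign
        y >>= n
    r_high = t % (mask + 2)             # Python % maps a negative t into [0, 2^n]

    return r_low, r_mid, r_high
```

For n = 4 this gives the residues modulo 15, 16, and 17, a dynamic range of 4080.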

A technique for multiplying numbers, modulo a prime number, using look-up tables stored in read-only memories is discussed. The application is in the computation of number theoretic transforms implemented in a ring which is isomorphic to a direct sum of several Galois fields, parallel computations being performed in each field. The look-up table technique uses the addition of indexes within a ring that contains at least twice as many elements as the field. Specific examples are given for multiplication modulo 19 using ROM arrays, and multiplication modulo 13 using an 8048 single chip microcomputer.
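Index (logarithm) multiplication modulo a prime p can be sketched with two tables: `log[x]` gives the index of x with respect to a primitive root g, and an antilog table maps an index sum back to a field element. The sketch doubles the antilog table to 2(p - 1) entries so an index sum can be looked up without reducing it modulo p - 1. That choice, the generator g = 2, and the function names are assumptions for illustration and may differ from the ring-based scheme in the cited paper.

```python
def build_index_tables(p, g):
    """Return (log, antilog) tables for prime p and primitive root g."""
    log = [None] * p                    # log[0] stays None: zero has no index
    antilog = []
    v = 1
    for k in range(p - 1):
        log[v] = k
        antilog.append(v)
        v = v * g % p
    if None in log[1:]:
        raise ValueError("g is not a primitive root modulo p")
    # Doubled antilog table: index sums up to 2(p - 2) are looked up directly.
    return log, antilog + antilog


def index_mod_mul(a, b, log, antilog):
    """(a * b) mod p for a, b in [0, p) via index addition."""
    if a == 0 or b == 0:
        return 0                        # zero is handled outside the index tables
    return antilog[log[a] + log[b]]
```

Both the modulo 19 and modulo 13 cases mentioned in the abstract work with g = 2, since 2 is a primitive root of both primes.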

Modulo Pi multipliers are implemented by look-up tables when Pi is small (5 bits or less) and by index calculus if Pi is larger (6 bits or more). However, index calculus only works for prime moduli Pi. In this letter, we introduce a new square-law multiplier that is useful for modulo Pi multiplication where Pi is any modulus. It is expected that this will have important applications in RNS arithmetic computing hardware. Copyright © 1980 by the Institute of Electrical and Electronics Engineers, Inc.

Digital systems structured into residue arithmetic units may play an important role in ultra-speed, dedicated, real-time systems that support pure parallel processing of integer-valued data. It is a 'carry-free' system that performs addition, subtraction, and multiplication as concurrent (parallel) operations, side-stepping one of the principal arithmetic delays - managing carry information. This article develops some of the fundamental properties of this branch of mathematics and presents the state of the RNS art and some potential applications.

A custom-designed integrated circuit for the realization of residue number digital filters
A high-speed low-cost modulo Pi multiplier with RNS arithmetic application

- W Jenkins
- E Davidson

W. Jenkins and E. Davidson, "A custom-designed integrated circuit for the realization of residue number digital filters," in Proc. ICASSP 1985, Mar. 1985, pp. 220-223.

[6] M. A. Soderstrand and C. Vernia, "A high-speed low-cost modulo Pi multiplier with RNS arithmetic application," Proc. IEEE, vol. 68, pp. 529-532, Apr. 1980.

On bit-parallel processing for modulo arithmetic

- K M Elleithy

K. M. Elleithy, "On bit-parallel processing for modulo arithmetic," VLSI

A high speed VLSI complex digital signal processor based on quadratic residue number system

- M A Bayoumi