Article

An O(1) Algorithm for Modulo Addition

Authors:

Abstract

Abstract - An O(1) algorithm for large modulo addition for residue number system (RNS) based architectures is proposed. The addition is done in a fixed number of stages, which does not depend on the size of the modulus. The proposed modulo adder is much faster than previous adders and more area efficient. The implementation of the adder is modular and is based on simple cells, which leads to an efficient VLSI realization.

I. INTRODUCTION

Recently, the residue number system (RNS) has been receiving increased attention due to its ability to support high-speed concurrent arithmetic [1]. Applications such as the fast Fourier transform, digital filtering, and image processing use the high-speed RNS arithmetic operations, addition and multiplication, and do not require the difficult RNS operations such as division and magnitude comparison. The technological advantages offered by VLSI have added a new dimension to the implementation of RNS-based architectures [2]. Several high-speed VLSI special-purpose digital signal processors have been successfully implemented [3]-[5].

Modulo addition is the computational kernel of RNS-based architectures. Subtraction is performed by adders using the additive inverse property [6]. Multiplication can be transformed into addition by several techniques [7]. Modulo addition is also the basic element in the conversion from RNS to binary using the Chinese remainder theorem (CRT) [6]. Banerji [8] analyzed modulo addition in MSI technology. A VLSI analysis of modulo addition has been reported in [9]-[11]. In general, lookup tables and PLAs have been the main logic modules used when the data granularity is the word. Such structures have been found to be efficient only for small moduli. For medium-size and large moduli, bit-level structures, where the data granularity is the bit, are more efficient [12].

In this paper, we present a modulo adder for medium-size and large moduli. It is based on a two-dimensional array of very simple cells (full adders). The modulo addition is performed in a fixed time delay independent of the size of the moduli.

II. RESIDUE NUMBER SYSTEM (RNS)
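
As an illustration of the RNS operations named in the introduction (residue-wise addition, subtraction through the additive inverse, and CRT decoding), the following is a minimal Python sketch. The moduli (7, 8, 9) and all function names are assumptions chosen for the example; it is a plain software model and does not reproduce the paper's fixed-delay adder array.

```python
from math import prod

# Hypothetical example moduli: pairwise coprime, dynamic range 7 * 8 * 9 = 504.
MODULI = (7, 8, 9)

def to_rns(x, moduli=MODULI):
    """Encode an integer as its tuple of residues."""
    return tuple(x % m for m in moduli)

def rns_add(a, b, moduli=MODULI):
    """Add residue by residue; no carry passes between channels."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, moduli))

def rns_neg(a, moduli=MODULI):
    """Additive inverse of each residue modulo its own modulus."""
    return tuple((m - ai) % m for ai, m in zip(a, moduli))

def rns_sub(a, b, moduli=MODULI):
    """Subtract by adding the additive inverse."""
    return rns_add(a, rns_neg(b, moduli), moduli)

def from_rns(residues, moduli=MODULI):
    """Decode with the Chinese remainder theorem (needs Python 3.8+ for pow(x, -1, m))."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # each term is itself reduced by a modulo addition
    return total % M
```

For instance, `from_rns(rns_add(to_rns(100), to_rns(250)))` decodes the residue-wise sum back to an ordinary integer, and the CRT sum in `from_rns` shows why modulo addition is also the basic step of RNS-to-binary conversion.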


Conference Paper
Parallelism on the algorithmic, architectural, and arithmetic levels is exploited in the design of a residue number system (RNS) based architecture. The architecture is based on modulo processors. Each modulo processor is implemented by a two-dimensional systolic array composed of very simple cells. The decoding state is implemented using a two-dimensional array. The decoding bottleneck is eliminated. The whole architecture is pipelined, which leads to a high throughput rate. High speed algorithms for modulo addition, modulo multiplication, and RNS decoding are presented
Data
Designing an optimal Residue Number System (RNS) processor in terms of area and speed depends on the choice of the system moduli. In this paper an optimal algorithm for choosing the system moduli is presented. The algorithm takes into consideration several constraints imposed by the problem definition. The problem is formalized as an integer programming problem to optimize an area/time objective function.
Article
The implementation of an FIR filter using a new hybrid RNS-binary arithmetic is presented for the first time. In the new arithmetic, the data samples are represented using RNS, and hence the carry-free advantage of RNS computations is retained. However, the computation performed for each modulus is implemented using conventional binary arithmetic elements, which overcome the drawback of ROM-based RNS arithmetic elements that become inefficient for large moduli. The conventional binary arithmetic elements are also faster and require less area than existing memoryless RNS arithmetic elements. It is shown that the filter structures based on the new arithmetic have better performance than those based on either the conventional binary or conventional RNS arithmetic for large moduli.
Conference Paper
Designing an optimal residue number system (RNS) processor in terms of area and speed depends on the choice of the system moduli. In this paper an optimal algorithm for choosing the system moduli is presented. The algorithm takes into consideration several constraints imposed by the problem definition. The problem is formalized as an integer programming problem to optimize an area/time objective function.
Article
With the current advances in VLSI technology, traditional algorithms for Residue Number System (RNS) based architectures should be reevaluated to explore the new technology dimensions. In this brief, we introduce a Θ(log n) algorithm for large moduli multiplication for RNS based architectures. A systolic array has been designed to perform the modulo multiplication algorithm. The proposed modulo multiplier is much faster than previously proposed multipliers and more area efficient. The implementation of this multiplier is modular and is based on simple cells, which leads to an efficient VLSI realization. A VLSI implementation using 3 micron CMOS technology shows that a pipelined n-bit modulo multiplication scheme can operate with a throughput of 30 M operations per second.
Vinnakota and Rao's RNS-to-binary converter proposed recently (see ibid., vol. CAS-41, p. 927-9, 1994) for the moduli set {2^n - 1, 2^n, 2^n + 1} is shown to be a simple modification of the well-known Mixed Radix Conversion technique. Their converter is also compared to other recently described RNS-to-binary converters regarding area and speed performance.
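
To make the reference to Mixed Radix Conversion concrete, here is a minimal Python sketch of the textbook technique applied to the moduli set {2^n - 1, 2^n, 2^n + 1}. The function names are hypothetical, and this is the generic method only; it is not Vinnakota and Rao's converter, nor the modification discussed in the abstract.

```python
def moduli_set(n):
    """Return the three pairwise coprime moduli (2^n - 1, 2^n, 2^n + 1)."""
    return (2**n - 1, 2**n, 2**n + 1)

def mixed_radix_digits(residues, moduli):
    """Compute digits a_i with x = a_1 + a_2*m_1 + a_3*m_1*m_2 + ...

    Textbook mixed radix conversion: peel off one digit, then subtract it and
    divide by the current modulus in every remaining residue channel.
    Requires Python 3.8+ for the modular inverse pow(m, -1, mj).
    """
    r = list(residues)
    digits = []
    for i, mi in enumerate(moduli):
        ai = r[i] % mi
        digits.append(ai)
        for j in range(i + 1, len(moduli)):
            mj = moduli[j]
            r[j] = ((r[j] - ai) * pow(mi, -1, mj)) % mj
    return digits

def from_mixed_radix(digits, moduli):
    """Evaluate the weighted mixed radix sum to recover the binary integer."""
    x, weight = 0, 1
    for a, m in zip(digits, moduli):
        x += a * weight
        weight *= m
    return x
```

With `n = 3` the moduli are (7, 8, 9), and every integer in the range 0 to 503 is recovered from its residues by `from_mixed_radix(mixed_radix_digits(...), ...)`.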
Article
Since the number of components that can fit on a single chip is large and rapidly growing, the asymptotic analysis and computational complexity have become applicable to the VLSI systems. We propose a model of computation devoted to VLSI structures based on Residue Number System (RNS). The developed model employs the ‘cut theorem’ which has been used by most of the abstract VLSI models. It is not as general as other reported models, but it gives tighter lower bounds and more accurate measures of performance for RNS structures. This computational model relates the area and time complexities with the inherent properties of RNS, the moduli size and the dynamic range. The model supports the look-up table implementation approach and it is technology-independent.
Conference Paper
Results are presented on the design, layout, and fabrication of a custom-designed integrated circuit for a residue number system digital filter module. The architecture is based on a ROM-ACCUMULATOR FIR structure in which the modular arithmetic for each modulus is realized on a separate chip. The modules are designed to support error detection and fault isolation at module boundaries. Of the five chips that were fabricated and tested, all were found to be fully operational, with three operating at a maximum data-cycle frequency of approximately 1.7 MHz.
Conference Paper
This correspondence describes an implementation scheme for the operations of addition and subtraction in the residue number systems. The method is based on the property that the set of residues modulo m form a finite group under addition and subtraction (modulo m). The proposed adder/subtractor structure is very systematic and, hence, suitable for MSI/LSI realization.
Article
In the residue number system, arithmetic is carried out on each digit individually. There is no carry chain. This locality is of particular interest in VLSI. An evaluation of different implementations of residue arithmetic is carried out, and the effects of reduced feature sizes are estimated. At the current state of technology the traditional table lookup method is preferable for a range that requires a maximum modulus that is represented by up to 4 bits, while an array of adders offers the best performance for 7 or more bits. A combination of adders and tables covers 5 and 6 bits the best. At 0.5 μm feature size, table lookup is competitive only up to 3 bits. These conclusions are based on sample designs in nMOS.
Article
A recently proposed residue-number-arithmetic digital filter offers major cost and speed advantages over binary-arithmetic digital filters, but suffers one major drawback. The filter coefficients must be constant, since the lack of a fast method of multiplication by a fraction in residue arithmetic requires the coefficients to be realised by a fixed table look-up read-only memory. Two multipliers are proposed which realise a completely general fractional multiply and are suitable for digital-filtering applications.
Article
In current high-speed digital signal-processing (DSP) architectures, the Residue Number System (RNS) has an important role to play. RNS implementations have a highly modular structure, and are not dependent upon large binary arithmetic elements. RNS implementations become more attractive when combined with the advantages offered by VLSI fabrication technology. In this paper, a novel design methodology has been developed for RNS structures, based on using look-up tables, which takes into consideration the unique features and requirements of RNS. The paper discusses the following three phases: 1) developing a look-up table layout model, which is used to derive relationships between the size of each modulus and both chip area and time; this model supports all types of moduli; 2) selecting the most efficient layout according to the design requirements; the procedure allows the designer to control the area, time, or the configuration of the memory module required for implementing a modulo look-up table; 3) proposing a set of multi-look-up table modules, to be used as building block units for implementing digital signal-processing architectures. The paper uses two examples to illustrate the use of the modules in phase 3).
Article
The efficient hardware implementation of residue number system (RNS) architectures has evolved based on the development of integrated circuit technology. The implementation of RNS adders is discussed in this paper. Three approaches (the binary adder, the look-up table, and the hybrid implementation) are analyzed in the scope of VLSI criteria where the performance measures are area and time. Two layout design procedures have been used. They are flexible and support any type of moduli. The implementation complexity depends on the form and size of the modulus, but, in general, the look-up table approach is preferable in both area and time for moduli up to five bits, while the binary adder and the hybrid approaches offer better performance for larger moduli.
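
The trade-off between the look-up table and binary adder approaches can be modeled in software. The sketch below is a behavioral illustration under stated assumptions: the function names are hypothetical, the table stands in for a ROM with m × m entries, and the binary version uses the common add-then-conditionally-subtract scheme, which may differ from the layouts analyzed in the paper.

```python
def make_add_table(m):
    """Precompute all m*m sums, as a ROM-style look-up table would store them."""
    return [[(a + b) % m for b in range(m)] for a in range(m)]

def add_mod_table(table, a, b):
    """Look-up table adder: one table access, but storage grows as m squared."""
    return table[a][b]

def add_mod_binary(a, b, m):
    """Binary adder approach for residues a, b in [0, m-1].

    Add normally, try subtracting the modulus, and keep the difference
    only when it is non-negative.
    """
    s = a + b      # plain binary sum, in [0, 2m - 2]
    d = s - m      # trial subtraction of the modulus
    return d if d >= 0 else s
```

Both functions return the same residue for every valid input pair; the table grows quadratically with the modulus, which is consistent with the abstract's observation that tables suit small moduli while adder-based designs suit larger ones.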
Article
Residue Number Systems (RNS) have proved useful in many applications, for example in signal processing. In this paper, a VLSI computing architecture is proposed for converting an integer N from the weighted binary representation into and out of a residue code based on s moduli. For this architecture a possible layout is given and its complexity is evaluated in terms of area and time. Under several hypotheses on RNS parameters, constructive upper bounds ranging from O(n^{2} log n) to O(n^{2} log log n) and from O(log^{2} n) to O(log n) for area and time, respectively, have been obtained for the direct conversion. On the contrary, constructive upper bounds A = O(n^{2} log n) and T = O(log^{2} n) have been found, independent of the formed hypotheses, for the reverse conversion.
Article
Digital systems structured into residue arithmetic units may play an important role in ultra-speed, dedicated, real-time systems that support pure parallel processing of integer-valued data. It is a 'carry-free' system that performs addition, subtraction, and multiplication as concurrent (parallel) operations, side-stepping one of the principal arithmetic delays - managing carry information. This article develops some of the fundamental properties of this branch of mathematics and presents the state of the RNS art and some potential applications.
K. M. Elleithy, "On bit-parallel processing for modulo arithmetic."
M. A. Bayoumi, "A high speed VLSI complex digital signal processor based on quadratic residue number system," in VLSI Signal Processing II.
M. A. Bayoumi, "Digital filter VLSI systolic arrays over finite fields for DSP applications," in Proc. 6th IEEE Ann. Phoenix Conf. on Computers and Communications, pp. 194-199, Feb. 1987.