Conference Paper

A θ(log n) algorithm for modulo multiplication


Abstract

A θ(log n) algorithm for large-moduli multiplication in residue-number-system (RNS)-based architectures is proposed. The modulo multiplier is much faster than previously proposed multipliers and more area efficient. The implementation of the multiplier is modular and is based on simple cells, which leads to efficient VLSI realization. A VLSI implementation using a 3-μm CMOS process shows that a pipelined n-bit modulo multiplication scheme can operate with a throughput of 30M operations/s.

Conference Paper
Designing an optimal residue number system (RNS) processor in terms of area and speed depends on the choice of the system moduli. In this paper an optimal algorithm for choosing the system moduli is presented. The algorithm takes into consideration several constraints imposed by the problem definition. The problem is formalized as an integer programming problem to optimize an area/time objective function.
Article
With the current advances in VLSI technology, traditional algorithms for Residue Number System (RNS) based architectures should be reevaluated to explore the new technology dimensions. In this brief, we introduce a θ(log n) algorithm for large-moduli multiplication for RNS-based architectures. A systolic array has been designed to perform the modulo multiplication algorithm. The proposed modulo multiplier is much faster than previously proposed multipliers and more area efficient. The implementation of this multiplier is modular and is based on simple cells, which leads to efficient VLSI realization. A VLSI implementation using 3-micron CMOS technology shows that a pipelined n-bit modulo multiplication scheme can operate with a throughput of 30M operations per second.
Article
An implementation of a fast and flexible residue decoder for residue-number-system (RNS)-based architectures is proposed. The decoder is based on the Chinese remainder theorem. It decodes a set of residues to its equivalent representation in the weighted binary number system. The decoder is flexible, since the decoded data can be selected to be either an unsigned-magnitude or a 2's-complement binary number. Two different architectures are analyzed: the first is based on carry-save adders, while the other is based on modulo adders. The implementation of both architectures is modular and is based on simple cells, which leads to efficient VLSI realization. The proposed decoder is fast; it has a time complexity of θ(log N), where N is the number of moduli.
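As a software illustration of the decoding step described above (not the hardware architecture from the paper), the following minimal Python sketch applies the Chinese remainder theorem and optionally maps the result to a signed range. The function name `crt_decode`, the `signed` flag, and the example moduli (7, 8, 9) are assumptions chosen for illustration.

```python
from math import prod

def crt_decode(residues, moduli, signed=False):
    """Decode residues into an integer using the Chinese remainder theorem.

    Assumes the moduli are pairwise coprime. With signed=True the upper
    half of the dynamic range is mapped to negative values, analogous to
    a two's-complement interpretation.
    """
    M = prod(moduli)  # dynamic range of the RNS
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the multiplicative inverse of Mi modulo m
        x += r * Mi * pow(Mi, -1, m)
    x %= M
    if signed and x >= (M + 1) // 2:
        x -= M
    return x

# Hypothetical example moduli (pairwise coprime, M = 504)
EXAMPLE_MODULI = (7, 8, 9)

def encode(value, moduli=EXAMPLE_MODULI):
    """Residue representation of an integer."""
    return tuple(value % m for m in moduli)
```

For instance, `crt_decode(encode(-5), EXAMPLE_MODULI, signed=True)` recovers -5, while the unsigned reading of the same residues is 499.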
Conference Paper
Decoding in residue-number-system (RNS)-based architectures can be a bottleneck. A high-speed, flexible modulo decoder is an essential computational element to maintain the advantages of RNS. A fast and flexible modulo decoder, based on the Chinese remainder theorem (CRT), is presented. It decodes a set of residues into its equivalent representation in either the unsigned-magnitude or the two's-complement binary number system. Two different architectures are analyzed: the first uses carry-save adders, and the other uses modified-structure carry-save adders. Both architectures are modular and are based on simple cells, which leads to efficient VLSI implementation. The decoder has a time complexity of θ(log N).
Article
Since the number of components that can fit on a single chip is large and rapidly growing, asymptotic analysis and computational complexity have become applicable to VLSI systems. We propose a model of computation devoted to VLSI structures based on the Residue Number System (RNS). The developed model employs the ‘cut theorem’, which has been used by most of the abstract VLSI models. It is not as general as other reported models, but it gives tighter lower bounds and more accurate measures of performance for RNS structures. This computational model relates the area and time complexities to the inherent properties of RNS, the moduli size, and the dynamic range. The model supports the look-up table implementation approach and is technology-independent.
Conference Paper
Results are presented on the design, layout, and fabrication of a custom-designed integrated circuit for a residue number system digital filter module. The architecture is based on a ROM-ACCUMULATOR FIR structure in which the modular arithmetic for each modulus is realized on a separate chip. The modules are designed to support error detection and fault isolation at module boundaries. Of the five chips that were fabricated and tested, all were found to be fully operational, with three operating at a maximum data-cycle frequency of approximately 1.7 MHz.
Article
A recently proposed residue-number-arithmetic digital filter offers major cost and speed advantages over binary-arithmetic digital filters, but suffers one major drawback. The filter coefficients must be constant, since the lack of a fast method of multiplication by a fraction in residue arithmetic requires the coefficients to be realised by a fixed table look-up read-only memory. Two multipliers are proposed which realise a completely general fractional multiply and are suitable for digital-filtering applications.
Article
In current high-speed digital signal-processing (DSP) architectures, the Residue Number System (RNS) has an important role to play. RNS implementations have a highly modular structure, and are not dependent upon large binary arithmetic elements. RNS implementations become more attractive when combined with the advantages offered by VLSI fabrication technology. In this paper, a novel design methodology has been developed for RNS structures, based on using look-up tables, which takes into consideration the unique features and requirements of RNS. The paper discusses the following three phases: 1) developing a look-up table layout model, which is used to derive relationships between the size of each modulus and both chip area and time; this model supports all types of moduli; 2) selecting the most efficient layout according to the design requirements; the procedure allows the designer to control the area, time, or the configuration of the memory module required for implementing a modulo look-up table; 3) proposing a set of multi-look-up table modules, to be used as building block units for implementing digital signal-processing architectures. The paper uses two examples to illustrate the use of the modules in phase 3).
Article
The residue number system has recently been shown to be a viable signal processing medium. However, it does possess limitations. One of the most serious is overflow prevention through magnitude scaling. One method of overcoming this defect is to increase the dynamic range of the numbering system. To this end, a new high-speed large-moduli multiplier has been developed. The multiplier is the result of combining the quarter-squared algorithm with recent breakthroughs in device technology. As a result, equivalent 18-bit full-precision products can be obtained at a pipelined rate of 28.5 × 10^3 multiplies per second.
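The quarter-squared identity mentioned above, xy = ⌊(x+y)²/4⌋ − ⌊(x−y)²/4⌋, turns a modular product into two table look-ups and a subtraction. The sketch below is a minimal Python illustration of that identity only, not the multiplier design from the paper; the function names and the single shared table are assumptions.

```python
def make_quarter_square_table(m):
    """Table of floor(k^2 / 4) mod m for every possible sum k = x + y,
    where 0 <= x, y < m (so k ranges over 0 .. 2m - 2)."""
    return [(k * k // 4) % m for k in range(2 * m - 1)]

def mod_mul_quarter_square(x, y, m, table):
    """Compute (x * y) mod m using two look-ups and one modular subtraction.

    Because x + y and |x - y| always have the same parity, the two floor
    terms differ by exactly x * y, so the identity holds for any modulus m.
    """
    return (table[x + y] - table[abs(x - y)]) % m
```

A usage example: with `table = make_quarter_square_table(12)`, the call `mod_mul_quarter_square(7, 11, 12, table)` agrees with `(7 * 11) % 12`. The modulus need not be prime.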
Article
Dynamic range overflow is a serious problem in residue arithmetic systems. Contemporary overflow management schemes rely on inefficient scaling operations. In this correspondence a residue scaler is architectured which inhibits dynamic range overflow. The system uses the popular three-moduli set {2^n - 1, 2^n, 2^n + 1}. Using a 4K memory model, practical 12- and 18-bit autoscalers are configured. An error model for the derived residue arithmetic unit is also derived and experimentally verified.
Article
A technique for multiplying numbers, modulo a prime number, using look-up tables stored in read-only memories is discussed. The application is in the computation of number theoretic transforms implemented in a ring which is isomorphic to a direct sum of several Galois fields, parallel computations being performed in each field. The look-up table technique uses the addition of indexes within a ring that contains at least twice as many elements as the field. Specific examples are given for multiplication modulo 19 using ROM arrays, and multiplication modulo 13 using an 8048 single chip microcomputer.
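The index-addition idea can be sketched in Python as follows. This is a simplified illustration, assuming a primitive root g of the prime p and reducing the index sum modulo p − 1; it does not reproduce the paper's larger index ring or its ROM layout. The function names and the choice g = 2 for p = 19 are assumptions for the example (2 is a primitive root modulo 19).

```python
def build_index_tables(p, g):
    """Build index (discrete log) and inverse-index tables for prime p.

    index[v] = i such that g**i % p == v, for every nonzero v.
    Raises ValueError if g does not generate all nonzero residues.
    """
    index = {}
    inverse = []
    v = 1
    for i in range(p - 1):
        index[v] = i
        inverse.append(v)
        v = v * g % p
    if len(index) != p - 1:
        raise ValueError("g is not a primitive root modulo p")
    return index, inverse

def mod_mul_index(x, y, p, index, inverse):
    """Multiply x and y modulo p by adding their indexes.

    Zero has no index, so it is handled as a special case.
    """
    if x == 0 or y == 0:
        return 0
    return inverse[(index[x] + index[y]) % (p - 1)]
```

For example, after `index, inverse = build_index_tables(19, 2)`, the call `mod_mul_index(7, 11, 19, index, inverse)` agrees with `(7 * 11) % 19`.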
Article
Modulo P_i multipliers are implemented by look-up tables when P_i is small (5 bits or less) and by index calculus if P_i is larger (6 bits or more). However, index calculus only works for prime moduli P_i. In this letter, we introduce a new square-law multiplier that is useful for modulo P_i multiplication where P_i is any modulus. It is expected that this will have important applications in RNS arithmetic computing hardware. Copyright © 1980 by the Institute of Electrical and Electronics Engineers, Inc.
Article
Digital systems structured into residue arithmetic units may play an important role in ultra-speed, dedicated, real-time systems that support pure parallel processing of integer-valued data. It is a 'carry-free' system that performs addition, subtraction, and multiplication as concurrent (parallel) operations, side-stepping one of the principal arithmetic delays - managing carry information. This article develops some of the fundamental properties of this branch of mathematics and presents the state of the RNS art and some potential applications.
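The carry-free property can be seen in a few lines of Python: each residue channel is processed independently, with no information passed between channels. This is a minimal sketch; the moduli (7, 8, 9) and the function names are assumptions for illustration, and results are only meaningful while they stay within the dynamic range (here 7 · 8 · 9 = 504).

```python
MODULI = (7, 8, 9)  # hypothetical pairwise-coprime moduli

def to_rns(x):
    """Residue representation of a non-negative integer."""
    return tuple(x % m for m in MODULI)

def rns_add(a, b):
    """Channel-wise addition; no carries cross between channels."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, MODULI))

def rns_sub(a, b):
    """Channel-wise subtraction."""
    return tuple((ai - bi) % m for ai, bi, m in zip(a, b, MODULI))

def rns_mul(a, b):
    """Channel-wise multiplication."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))
```

For example, `rns_mul(to_rns(20), to_rns(15))` equals `to_rns(300)`, and each of the three channels could be evaluated concurrently.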