N. Petra

Università degli Studi di Napoli Federico II, Napoli, Campania, Italy

Publications (28) · 30.66 Total impact

  • Article: Fixed-Width Multipliers and Multipliers-Accumulators With Min-Max Approximation Error
    IEEE Transactions on Circuits and Systems. 01/2013;
  • Article: Analytical Calculation of the Maximum Error for a Family of Truncated Multipliers Providing Minimum Mean Square Error
    V. Garofalo, N. Petra, E. Napoli
    ABSTRACT: A truncated multiplier is a multiplier with two n-bit operands that produces an n-bit result. Truncated multipliers discard some of the partial products of a complete multiplier to trade off accuracy with hardware cost. Compared with a conventional multiplier, a truncated multiplier introduces an error on the output whose magnitude depends on the input bits. The maximum value of the error is hard to compute, since it is not possible to test every possible input and nonexhaustive simulations are very unlikely to provide the actual maximum absolute error value. It is therefore extremely useful to develop methods that provide the maximum error for a truncated multiplier. This paper presents a closed-form analytical calculation, for every bit width, of the maximum error for a previously proposed family of truncated multipliers. The considered family of truncated multipliers is particularly important since it is proved to be the design that gives the lowest mean square error for a given number of discarded partial products. With the contribution of this paper, the considered family of truncated multipliers is the only architecture that can be designed, for every bit width, using an analytical approach that allows a priori knowledge of the maximum error.
    IEEE Transactions on Computers 10/2011; · 1.10 Impact Factor
  • Article: Design of Fixed-Width Multipliers With Linear Compensation Function
    ABSTRACT: This paper focuses on fixed-width multipliers with linear compensation function by investigating in detail the effect of coefficient quantization. New fixed-width multiplier topologies, with different accuracy versus hardware complexity trade-offs, are obtained by varying the quantization scheme. Two topologies in particular are selected as the most effective ones. The first one is based on a uniform coefficient quantization, while the second topology uses a nonuniform quantization scheme. The novel fixed-width multiplier topologies exhibit better accuracy with respect to previous solutions, close to the theoretical lower bound.
    Circuits and Systems I: Regular Papers, IEEE Transactions on 06/2011; · 1.97 Impact Factor
  • Article: Elementary Functions Hardware Implementation Using Constrained Piecewise-Polynomial Approximations
    ABSTRACT: A novel technique for designing piecewise-polynomial interpolators for hardware implementation of elementary functions is investigated in this paper. In the proposed approach, the interval where the function is approximated is subdivided in equal length segments and two adjacent segments are grouped in a segment pair. Suitable constraints are then imposed between the coefficients of the two interpolating polynomials in each segment pair. This allows reducing the total number of stored coefficients. It is found that the increase in the approximation error due to constraints between polynomial coefficients can easily be overcome by increasing the fractional bits of the coefficients. Overall, compared with standard unconstrained piecewise-polynomial approximation having the same accuracy, the proposed method results in a considerable advantage in terms of the size of the lookup table needed to store polynomial coefficients. The calculus of the coefficients of constrained polynomials and the optimization of coefficients bit width is also investigated in this paper. Results for several elementary functions and target precision ranging from 12 to 42 bits are presented. The paper also presents VLSI implementation results, targeting a 90 nm CMOS technology, and using both direct and Horner architectures for constrained degree-1, degree-2, and degree-3 approximations.
    IEEE Transactions on Computers 04/2011; · 1.10 Impact Factor
  • Article: Design of fixed-width multipliers with linear compensation function
    Circuits and Systems I: Regular Papers, IEEE Transactions on 01/2011; 58(5):947-960. · 1.97 Impact Factor
  • Conference Proceeding: Fixed-width CSD multipliers with minimum mean square error
    ABSTRACT: Many multimedia and DSP applications require fixed-width multipliers, in which input data and output results have the same bit width. In this paper we investigate fixed-width multipliers where one of the input operands is a constant, encoded using canonic signed digit (CSD) representation. This is a very important case in many practical applications such as the calculation of the Fast Fourier Transform. In the paper we derive in closed form the expression of the compensation function giving the minimum mean square error for CSD fixed-width multipliers. On the basis of this analytical result, we propose a hardware-efficient implementation of the multiplier. Fixed-width CSD multipliers implemented with the approach presented in this paper are accurate and can be implemented by using a simple partial-product reduction tree followed by a fast adder, without requiring additional look-up tables. The proposed approach is general and is well suited for implementation in circuit synthesizers. Implementation results in 90 nm technology are presented to demonstrate the effectiveness of the proposed technique.
    Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on; 07/2010
  • Article: Truncated Binary Multipliers With Variable Correction and Minimum Mean Square Error
    ABSTRACT: Truncated multipliers compute the n most-significant bits of the n × n bits product. This paper focuses on variable-correction truncated multipliers, where some partial products are discarded to reduce complexity, and a suitable compensation function is added to partly compensate for the introduced error. The optimal compensation function, which minimizes the mean square error, is obtained in this paper in closed form for the first time. A suboptimal compensation function, best suited for hardware implementation, is introduced. Efficient multiplier implementation based on the suboptimal function is discussed. The proposed truncated multipliers are extensively compared with previously proposed circuits. Experimental results, for a 0.18 μm technology, are also presented.
    Circuits and Systems I: Regular Papers, IEEE Transactions on 07/2010; · 1.97 Impact Factor
  • Article: A 1.27 GHz, All-Digital Spread Spectrum Clock Generator/Synthesizer in 65 nm CMOS
    ABSTRACT: Spread spectrum clocking is an effective solution to reduce the electromagnetic interference produced by digital chips, using a clock signal with a frequency that is intentionally swept (frequency modulated) within a certain frequency range, with a predefined modulation profile. We present the implementation of an all-digital spread spectrum clock generator. The circuit is realized by using a design flow completely based on standard cells and is able to perform clock spreading with an arbitrary modulation profile and a modulation frequency up to 5 MHz. The circuit uses two digitally controlled delay lines driven by a digital modulator to synthesize the output waveform. A replica delay line is employed in a real-time measurement circuit to track process, voltage and temperature variations. A chip has been implemented in a 65 nm CMOS technology. The chip is able to generate signals up to 1.27 GHz. The measured peak level reduction of the clock spectrum, at 750 MHz output frequency, is 20.5 dB with a 6% modulation depth. The power dissipation is 44 mW @ 1.27 GHz.
    IEEE Journal of Solid-State Circuits 06/2010; · 3.23 Impact Factor
  • Conference Proceeding: OpenCV compatible real time processor for background foreground identification
    M. Genovese, E. Napoli, N. Petra
    DOI: 10.1109/ICM.2010.5696190 (Art. No. 5696190; conference code 83840; source: Scopus)
    Proceedings of the International Conference on Microelectronics, ICM, Cairo; 01/2010
  • Article: High-Performance Special Function Unit for Programmable 3-D Graphics Processors
    D. De Caro, N. Petra, A. Strollo
    ABSTRACT: A high-speed special function unit (SFU) is presented in this paper. The system supports the single-precision IEEE-754 floating-point standard and implements faithfully rounded reciprocal, square root, reciprocal square root, logarithm, and exponential functions. The functions are approximated by using a novel constrained piecewise quadratic interpolation technique. In this way, the lookup table size is reduced by 40% with respect to previously proposed techniques, without any loss in accuracy. Error analysis and sizing methodology are presented in the paper. The SFU has been implemented in a 0.18-μm CMOS technology. The circuit is able to operate up to 420-MHz clock frequency, with a power dissipation of 160 mW at 420 MHz. The system can be employed in programmable graphics accelerators and in other applications where high-performance function evaluation is needed.
    Circuits and Systems I: Regular Papers, IEEE Transactions on 10/2009; · 1.97 Impact Factor
  • Article: Digital Synthesizer/Mixer With Hybrid CORDIC–Multiplier Architecture: Error Analysis and Optimization
    D. De Caro, N. Petra, A. Strollo
    ABSTRACT: This paper describes a novel architecture for a digital synthesizer/mixer (DSM). The operation performed by a DSM corresponds to a rotation of the input vector in the complex plane. The proposed architecture divides this rotation into three subrotations. The first one uses a few CORDIC stages, in which the rotation directions are computed in parallel with the help of a small lookup table. The CORDIC algorithm is also employed in the second subrotation, where the rotation directions are readily available after a simple recoding of the bits of the residual angle. The final rotation is multiplier based to reduce circuit latency and increase performance. A detailed error analysis and sizing methodology is given in this paper. It is shown that different versions of the architecture can be conceived by varying the dimensions of the second block and the topology of the third block. The proposed architecture exhibits very good performance, owing to the efficient carry-save implementation of CORDIC datapaths, the reduced lookup table, and the small size of multipliers. Implementations in a 0.25-μm CMOS technology are presented in order to demonstrate the design methodology and to investigate the implementation tradeoffs.
    Circuits and Systems I: Regular Papers, IEEE Transactions on 03/2009; · 1.97 Impact Factor
  • Article: A 430 MHz, 280 mW Processor for the Conversion of Cartesian to Polar Coordinates in 0.25 μm CMOS
    A. Strollo, D. De Caro, N. Petra
    ABSTRACT: A novel architecture to realize the conversion of rectangular to polar coordinates is presented in this paper. The proposed technique for phase calculation uses a logarithmic number system and does not require any multiplications, but only a few small tables and a few multi-operand additions. The modulus is computed by a constant multiplier, a lookup table, and a full multiplier. A test chip has been designed and fabricated in 0.25 μm CMOS. The realized circuit uses a novel high-speed modified double-pass transistor (DPL) full-adder cell to improve performance. The test chip includes two processors. The first one computes only the phase and reaches 482 MHz maximum clock frequency, with 0.37 mW/MHz power dissipation. The second processor computes the phase and modulus and works up to 430 MHz, with 0.64 mW/MHz. The experimental results compare favorably with previously reported architectures.
    IEEE Journal of Solid-State Circuits 12/2008; · 3.23 Impact Factor
  • Conference Proceeding: Low error truncated multipliers for DSP applications
    ABSTRACT: The paper presents a new technique to design signed and unsigned truncated multipliers. Simple formulae are developed in the paper to describe the truncated multiplier with minimum mean square error for every inputs' bit-width. With respect to previously proposed techniques, our analytical approach is more general and improves the accuracy of the multiplier. We have also compared the accuracy achievable with the proposed truncated multiplier with respect to the accuracy of a standard full-width multiplier in a typical DSP application. The results show that the proposed multiplier causes only a negligible loss in accuracy. On the other hand, the area and the power dissipation of the DSP datapath are both improved by 16%.
    Electronics, Circuits and Systems, 2008. ICECS 2008. 15th IEEE International Conference on; 10/2008
  • Article: Reducing Lookup-Table Size in Direct Digital Frequency Synthesizers Using Optimized Multipartite Table Method
    D. De Caro, N. Petra, A. Strollo
    ABSTRACT: The use of multipartite table methods (MTMs) to implement high-performance direct digital frequency synthesizers (DDFSs) is investigated in this paper. A closed-form expression for the spurious-free dynamic range (SFDR) is obtained when a single table of offset (TO) is used in the multipartite approximation. In this case, the optimal design that minimizes the storage requirement for a given SFDR can be obtained analytically. A numerical algorithm is also presented to obtain the optimal design when two or more TOs are employed in the approximation. The VLSI implementation results and the comparison with previously proposed DDFS architectures demonstrate the effectiveness of multipartite table methods for the realization of high-performance direct digital synthesizers.
    Circuits and Systems I: Regular Papers, IEEE Transactions on 09/2008; · 1.97 Impact Factor
  • Conference Proceeding: A high performance floating-point special function unit using constrained piecewise quadratic approximation
    ABSTRACT: A special function unit (SFU), able to compute square root, reciprocal square root, logarithm, and exponential functions, is presented in this paper. The system supports the single-precision IEEE-754 floating-point standard and uses a novel constrained piecewise quadratic interpolation technique to approximate the implemented functions. The proposed approach reduces the look-up table size by 40% with respect to previously proposed techniques. The SFU has been implemented in a test chip in 0.18 μm CMOS. A maximum clock frequency of 420 MHz and a power dissipation of 160 mW at 420 MHz have been measured.
    Circuits and Systems, 2008. ISCAS 2008. IEEE International Symposium on; 06/2008
  • Conference Proceeding: Constrained piecewise polynomial approximation for hardware implementation of elementary functions
    DOI: 10.1109/ICECS.2008.4674949 (Art. No. 4674949; conference code 74828; source: Scopus)
    Proceedings of the 15th IEEE International Conference on Electronics, Circuits and Systems, ICECS 2008, St. Julian's; 01/2008
  • Article: A Novel Architecture for Galois Fields GF(2^m) Multipliers Based on Mastrovito Scheme
    ABSTRACT: In this paper, a new GF(2^m) multiplier for standard-basis representation is developed. The proposed multiplier implements the Mastrovito multiplication scheme and can be designed for every field GF(2^m). A minimum-area implementation of the first block of the Mastrovito multiplier and a high-speed delay-driven tree architecture for the second block of the Mastrovito multiplier are employed in the new circuit. Multiplier complexity and delay are analytically evaluated for many polynomial classes. Timing and area occupation performances of the proposed multiplier are also calculated for many fields used in Reed-Solomon code applications and compared with those of previously proposed solutions. The comparison shows that the proposed multiplier outperforms previous architectures for every considered GF(2^m) field. The effectiveness of the proposed solution in a real application is verified by implementing, in a 0.25 μm CMOS technology, the key equation solving block of a (255, 239) Reed-Solomon decoder. The use of the proposed multiplier in this application results in a substantial speed improvement without any penalty in the silicon area occupation.
    IEEE Transactions on Computers 12/2007; 56(11):1470-1483. · 1.10 Impact Factor
  • Conference Proceeding: High speed Galois fields GF(2^m) multipliers
    ABSTRACT: In the paper a new GF(2^m) multiplier for standard-basis representation is developed. The proposed multiplier can be designed for every field GF(2^m). Multiplier complexity and delay are analytically evaluated for many polynomial classes. Timing and area occupation performances of the proposed multiplier are compared with those of previously proposed solutions. The comparison shows that the proposed multiplier outperforms previous architectures for every considered GF(2^m) field.
    Circuit Theory and Design, 2007. ECCTD 2007. 18th European Conference on; 09/2007
  • Conference Proceeding: Design of fixed-width multipliers with minimum mean square error
    ABSTRACT: The paper introduces a new technique to design signed and unsigned n × n bit fixed-width multipliers with minimum mean square error. In previous papers the error minimization of fixed-width multipliers was achieved through exhaustive searches, and was practically computable only for small n values. This is the first paper in which the error compensation function of the multiplier is computed analytically, giving a result which is optimal for any value of n. The proposed approach results in improved accuracy with respect to previously proposed techniques. The paper also compares the experimental performances, in a 0.18 μm CMOS technology, of a 16-bit full-width multiplier and of a fixed-width multiplier designed with our approach. A 50% decrease of the power dissipation together with a 13% increase of the maximum operating frequency has been measured.
    Circuit Theory and Design, 2007. ECCTD 2007. 18th European Conference on; 09/2007
  • Article: A 630 MHz, 76 mW Direct Digital Frequency Synthesizer Using Enhanced ROM Compression Technique
    ABSTRACT: The paper presents a detailed description of a direct digital frequency synthesizer (DDFS) based on a Multipartite Table Method (MTM), which is a salient lookup table compression technique. A novel algorithm to find the optimal MTM decomposition, which minimizes the ROM size while achieving a target spurious-free dynamic range (SFDR), is presented in the paper. The DDFS designed with the proposed technique is ideally suited for high clock frequency operation, requiring small lookup tables and simple multi-operand adders. Low-power operation is achieved through a power-driven synthesis, by using in the circuit two flip-flop topologies (with different power and delay performances). A test chip has been realized in 0.25 μm, 2.5 V technology. The circuit achieves a 90 dBc SFDR and operates at a maximum clock frequency of 630 MHz, with 76 mW power dissipation. By reducing the power supply to 1.8 V, a maximum operating frequency of 430 MHz was measured, with a total power dissipation as low as 24.9 mW.
    IEEE Journal of Solid-State Circuits 03/2007; · 3.23 Impact Factor
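
Several of the abstracts above concern truncated (fixed-width) multipliers, which discard low-order partial products and add a compensation term. As a rough illustration only, and not the compensation scheme of any listed paper, the Python sketch below models an unsigned n × n multiplier that drops the least-significant partial-product columns and adds a constant correction, then measures the maximum error by exhaustive search. The names `truncated_product`, `max_abs_error`, `guard_cols`, and `correction` are hypothetical and chosen for this example.

```python
def truncated_product(a, b, n, guard_cols, correction=0):
    """Approximate the n most-significant bits of an unsigned n x n product.

    Partial product a_i * b_j belongs to column i + j. Columns below
    n - guard_cols are discarded (hypothetical parameter name). The
    constant `correction` is an assumption standing in for a real
    compensation function; it is added before the final shift by n.
    """
    first_kept = max(n - guard_cols, 0)
    total = correction
    for i in range(n):
        for j in range(n):
            if i + j >= first_kept and (a >> i) & 1 and (b >> j) & 1:
                total += 1 << (i + j)
    return total >> n


def max_abs_error(n, guard_cols, correction=0):
    """Exhaustively find the largest |exact - approximate|, in output LSBs."""
    worst = 0.0
    for a in range(1 << n):
        for b in range(1 << n):
            exact = (a * b) / (1 << n)
            approx = truncated_product(a, b, n, guard_cols, correction)
            worst = max(worst, abs(exact - approx))
    return worst


if __name__ == "__main__":
    n = 4
    for guard_cols in range(n + 1):
        print(guard_cols, max_abs_error(n, guard_cols))
```

The exhaustive loop visits all 2^(2n) input pairs, so it is practical only for small n. This is the limitation that motivates the closed-form error analyses described in the abstracts.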