Conference Paper

Testing of floating point unit using BIST with parallelism


Abstract

Arithmetic computations operate on either integer or floating-point (real) numbers. In digital systems, the ALU handles arithmetic operations. However, the ALU is not suitable for operations on real numbers, as the result may not be precise and accurate. Hence, digital systems use a dedicated unit, the floating-point unit (FPU), to perform operations on real numbers. The FPU designed in this paper is single precision and operates on the IEEE 754-2008 format. It supports floating-point multiplication, division, addition and subtraction, and it can operate on both normal (normalized) and subnormal (denormalized) numbers. A built-in self-test (BIST) scheme based on the stuck-at fault model is designed for the FPU to check for faults in the design. The basic idea behind BIST is that the device tests itself. The proposed design is modified for parallel testing by dividing the FPU into 3 independent blocks: while one block is in normal operation, another block of the FPU is tested in parallel. The RTL code of the design is written in Verilog HDL, and Xilinx Vivado 2015 is used for simulation. The proposed method reduces dynamic power by 10.47% compared to the conventional method.
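The BIST idea described in the abstract can be illustrated with a small software model. The following is a minimal sketch under stated assumptions, not the authors' Verilog design: `block_under_test` is a hypothetical 4-bit adder standing in for one FPU block, the patterns are applied exhaustively by a plain counter, and the response compactor is a simple additive checksum rather than the signature register a hardware BIST would typically use.

```python
def block_under_test(a, b, stuck_at=None):
    """Toy 4-bit adder standing in for one FPU block (hypothetical).

    stuck_at = (bit_index, value) forces one output bit to 0 or 1,
    which models a single stuck-at fault on that output line.
    """
    out = (a + b) & 0x1F  # 5-bit sum including carry-out
    if stuck_at is not None:
        bit, value = stuck_at
        if value:
            out |= 1 << bit
        else:
            out &= ~(1 << bit)
    return out


def signature(stuck_at=None):
    """Apply all 256 input patterns and compact the responses.

    The compactor is an additive checksum (an assumption made for
    simplicity); it stands in for a hardware signature register.
    """
    total = 0
    for a in range(16):
        for b in range(16):
            total += block_under_test(a, b, stuck_at)
    return total


# Fault-free reference signature, computed once and stored on chip in a real BIST.
GOLDEN = signature()


def bist_passes(stuck_at=None):
    """The self-test passes when the observed signature matches the golden one."""
    return signature(stuck_at) == GOLDEN
```

In this toy model every single stuck-at fault on an output bit changes the checksum, so `bist_passes((bit, value))` returns `False` for each of the ten possible output faults, while the fault-free block passes.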

Conference Paper
The paper describes a fused floating-point add-subtract unit using the IEEE-754 standard 32-bit floating-point number representation. In the fused add-subtract unit, the add unit and the subtract unit operate in parallel, and this fused approach reduces the hardware required and hence the cost of the designed unit. If one more operation, such as multiplication, is to be performed besides addition and subtraction, a multifunctional design is required. The multifunctional floating-point unit includes a multiplier unit on top of a fused add-subtract unit, which uses hardware more efficiently than separate unit blocks for each operation. This method reduces the area of the designed block, but the speed of operation is also reduced. The blocks are reduced based on the operations common to each designed unit. For example, every unit performs rounding after each floating-point operation, so designing a common rounding block reduces the hardware. The paper also compares one add unit with the fused floating-point add-subtract unit and one multifunctional unit. Keywords: floating point arithmetic operations, fused add-subtract unit, multiplier and multifunctional unit design.
Conference Paper
Arithmetic circuits play an important role in digital systems. Realization of complex digital circuits is possible with developments in very large scale integration (VLSI) circuit technology. In this paper an arithmetic unit based on the IEEE-754 standard for floating point numbers has been implemented on a Spartan3E XC3S500e FPGA board. Here the Floating Point Unit (FPU) follows the IEEE single precision format. Various arithmetic operations, such as addition, subtraction, multiplication and division on floating point numbers, have been performed on the arithmetic unit. A novel approach to converting fixed to floating point saves around 30% of slices and can perform 50 Mega floating point operations per second on the Spartan 3E FPGA at a 50 MHz clock. Arithmetic operations using the proposed conversion optimize space and speed requirements.
Article
Zimmerman, R., Efficient VLSI Implementation of Modulo (2^n ± 1) Addition and Multiplication (1999) Proc. 14th IEEE Symp. Computer Arithmetic, pp. 158-167, Apr.
Conference Paper
The paper compares the performance and accuracy of several realizations of an Active Power Filter (APF) control algorithm. The developed algorithms are based on the dq synchronous frame control strategy. The following Texas Instruments digital signal controllers were used for the realization: the fixed-point TMS320F2812 and the floating-point TMS320F28335. The TMS320F28335 digital signal controller is the first floating-point controller in the C2000 family. It combines the flexible control-specific interfaces and ease of use of an industrial microcontroller (MCU) with the processing power of DSP technology (up to 300 MFLOPS). Using the floating-point digital signal processor improves the accuracy of the computation, particularly in digital filtering (IIR low-pass filter), which has a direct influence on the quality of the generated current waveforms. The developed algorithms have been verified in practice on an APF prototype built in the Institute of Industrial Electrical Engineering and Computer Science. The presented problems are illustrated by laboratory results.
Conference Paper
Multiplication of floating point numbers finds extensive use in DSP applications involving a huge range of values. The critical part of floating point multiplication is the multiplication of the mantissas, which uses a 24×24-bit integer multiplier for single precision floating point numbers. The speed of the system can be enhanced by improving the speed of multiplication. In this paper a 24-bit Vedic multiplier is proposed that uses a 3×3 Vedic multiplier as its basic block. This paper proposes an IEEE-754 single precision floating point multiplier which handles overflow, underflow and rounding. The proposed and conventional floating point multipliers based on Vedic mathematics are coded in Verilog, synthesized and simulated in ISE Simulator. The design is implemented on the iWave Systems Unified Learning Kit (ULK), which is based on a Spartan6 family xc6slx25t-2fgg484 FPGA. Maximum combinational path delay and the number of slices required on the FPGA are compared for the proposed and conventional multipliers. The results clearly indicate that the proposed method has a great impact on improving the speed and reducing the area required on the Spartan 6 FPGA.
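To make the idea of building a 24×24-bit mantissa multiplier from 3×3 blocks concrete, the sketch below models a hierarchical decomposition in software. It is an illustration under assumptions, not the circuit from the paper: the width is halved recursively (24, 12, 6, 3), and each level combines four smaller products in the vertical-and-crosswise pattern commonly associated with Vedic multipliers. The function names are hypothetical.

```python
def mul_3x3(a, b):
    """Base block: product of two 3-bit unsigned values (at most 6 bits)."""
    assert 0 <= a < 8 and 0 <= b < 8
    return a * b


def vedic_mul(a, b, width=24):
    """Multiply two `width`-bit unsigned integers using only 3x3 base blocks.

    `width` must be 3, 6, 12 or 24 so that repeated halving ends at 3 bits.
    """
    if width == 3:
        return mul_3x3(a, b)
    half = width // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    low = vedic_mul(a_lo, b_lo, half)            # "vertical" low product
    cross = (vedic_mul(a_hi, b_lo, half)
             + vedic_mul(a_lo, b_hi, half))      # "crosswise" products
    high = vedic_mul(a_hi, b_hi, half)           # "vertical" high product
    return low + (cross << half) + (high << (2 * half))
```

For example, `vedic_mul(0x800000, 0xC00000)` multiplies the 24-bit significands of 1.0 and 1.5 (hidden bit included) and returns the same 48-bit product as ordinary integer multiplication.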
Conference Paper
Coverage modeling is one of the fundamental tasks of the verification flow in systems development. The resulting model is commonly used to evaluate the progress and quality of the verification process; it also provides a useful abstraction for the generation of test vector patterns. This work presents a heuristic approach for coverage model definition, based on the concepts of equivalence classes and boundary-value analysis, to address the verification of floating point arithmetic units. As a case study, a coverage model was designed to verify the ADD operation of a floating point module for the binary16 number format defined in the IEEE 754-2008 standard, and a SystemVerilog testbench was implemented to perform the verification process. The effectiveness of the heuristic and the quality of the resulting model are analysed by measuring the coverage obtained in the execution of a third-party test suite, and by generating a set of test vectors from the model and stimulating a design under verification (DUV) to detect bugs for design review.
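Boundary-value analysis for binary16 can be sketched by listing the bit patterns that sit at the edges of the format's value classes. The selection of classes below and the names `BOUNDARY_PATTERNS`, `half_from_bits` and `add_test_pairs` are assumptions made for illustration; the coverage model in the paper may partition the space differently.

```python
import struct

# Bit patterns at the edges of the binary16 value classes (positive side only).
BOUNDARY_PATTERNS = {
    "positive zero": 0x0000,
    "smallest subnormal": 0x0001,
    "largest subnormal": 0x03FF,
    "smallest normal": 0x0400,
    "largest normal": 0x7BFF,
    "positive infinity": 0x7C00,
}


def half_from_bits(bits):
    """Interpret a 16-bit pattern as an IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def add_test_pairs():
    """Pair every boundary value with every other as operands for an ADD test."""
    values = [half_from_bits(bits) for bits in BOUNDARY_PATTERNS.values()]
    return [(x, y) for x in values for y in values]
```

Six boundary values give 36 operand pairs; a fuller model would also add the negative counterparts and NaN patterns.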
Conference Paper
There are both one-to-many and many-to-one types of linear-feedback shift registers. In this paper we fill the gap between them: our linear-feedback shift register has a many-to-many configuration. It generates a Galois field different from the finite field formed by a Galois linear-feedback shift register. We consider inference of an appropriate primitive polynomial. Our principle can be implemented very efficiently on common microprocessors.
Conference Paper
With the tremendous increase in shipments of MEMS microphones during the last few years, enhanced testing methods have become a key issue for industrial batch manufacturing, focusing on the mechanical sensitivity of the sound-transducing diaphragm. Instead of semi-parallel techniques, in which test channels and measurement equipment are duplicated (multiplying the investment) to further reduce test time, this paper shows a novel approach for a massively parallel test of MEMS sensors. With several DUTs connected in parallel, the resulting measurements reveal several overlaid device characteristics, each accounting for the sensitivity of one single DUT. To correlate those characteristics with each DUT, a reconstruction method known from tomography imaging techniques is adapted. The wafer-level measurement results shown in this paper demonstrate the basic suitability of a single test of several DUTs in parallel and prove the concept of this novel method.
Article
This paper presents an optimization scheme for the synthesis of built-in self-test circuitry for embedded cores. This scheme is based on two-dimensional linear feedback shift registers that make use of XOR and XNOR gates. The proposed scheme results in configuration networks with up to 33% reduction in XOR gate inputs and up to 25% reduction in transistor count as compared to prior work in 2-D linear feedback shift registers.
I. INTRODUCTION
The increasing integration levels of modern integrated circuits, combined with the need for reduced time-to-market and reduced system costs, result in multicore and systems-on-chip (SoC) implementations as the new design paradigm. These systems use hardware intellectual property (IP) cores as their functional building blocks. These cores are in either synthesizable soft core form or fully implemented hard core form, comprising components such as processors, peripheral interfaces, controllers, memory, etc. (1). When the system is implemented, the embedded cores need to be tested for manufacturing errors. Ensuring that the cores are functionally correct reduces the test complexity of the overall system. Typically, the IP core provider presents the precomputed test sequences required to test the cores. Because of the limited number of input and output pins on the SoC, built-in self-test (BIST) is a better option for implementing the test circuitry than Automatic Test Equipment. Several test generation techniques have been developed to generate deterministic ordered test vectors, such as linear feedback shift registers, counter-based techniques, and scan-based test pattern generators (2-5). Linear feedback shift registers (LFSRs) use simple encoding and decoding procedures for the implementation of the BIST logic and provide a good tradeoff between circuit performance degradation with inserted BIST and at-speed testing.
An LFSR is an autonomous finite state machine, configured as a shift register using XOR and XNOR gates in the feedback connections. The flip-flops used to implement the shift register are initialized to an initial state called the seed. The test patterns are generated by feeding selected outputs from previous states to the XOR/XNOR gates to generate the current state input, as shown in Figure 1. A 2-D LFSR consists of a two-dimensional array of shift registers. Some of the outputs of the flip-flops in the array of shift registers are fed back to the XOR/XNOR input gates. These XOR/XNOR gates are used to generate the required test patterns. Even though this type of shift register covers many faults, it requires more hardware to generate the required test sequences. Chen and George proposed a configurable 2-D LFSR consisting of a combination of multiplexer, control unit, flip-flop array and XOR gates in the feedback network to generate the required set of test sequences. These test sequences are used to detect the random pattern resistant faults and the random pattern detectable faults (6). The control unit allows the switching of the outputs from different configuration arrays to appropriate cores. Zhang et al. partitioned the test sequences into disjoint subsequences and used an integer programming model to optimize the test sequences for implementation as a 2-D LFSR (7). Recently, configuration networks using NAND, NOT and NOR gates along with XOR/XNOR gates (8), and AOI/OAI gates (9), have been developed that use fewer transistors in the BIST circuitry. Although these techniques reduce the complexity of the configuration network, the overall circuit cannot strictly be called an LFSR, as the output may not be a linear function of the previous state. A linear feedback function uses a combination of XOR and XNOR gates only. This work concentrates on optimizing configurable 2-D LFSRs using XOR and XNOR gates.
Section 2 presents the details of a configurable 2-D LFSR. Section 3 presents the optimized scheme for the synthesis of 2-D LFSR BIST hardware using XOR/XNOR gates. In Section 4, the results of the work are presented and compared with existing schemes, followed by conclusions.
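The behaviour of a one-dimensional LFSR with XOR or XNOR feedback can be sketched in a few lines. This is an illustrative model only, with assumed parameters: a 4-bit register, feedback taken from the two most significant bits (corresponding to the polynomial x^4 + x^3 + 1), and hypothetical function names. It does not model the 2-D configurable structure discussed in the paper.

```python
def lfsr_step(state, use_xnor=False):
    """Advance a 4-bit LFSR by one clock.

    Feedback is the XOR (or XNOR) of bits 3 and 2; the register shifts
    left and the feedback bit enters at bit 0.
    """
    feedback = ((state >> 3) ^ (state >> 2)) & 1
    if use_xnor:
        feedback ^= 1
    return ((state << 1) | feedback) & 0xF


def lfsr_sequence(seed, use_xnor=False):
    """Return the states visited from `seed` until the seed recurs."""
    states = [seed]
    state = lfsr_step(seed, use_xnor)
    while state != seed:
        states.append(state)
        state = lfsr_step(state, use_xnor)
    return states
```

With XOR feedback, any nonzero seed cycles through all 15 nonzero states, and the all-zeros state locks up. With XNOR feedback the roles are mirrored: the all-zeros seed is usable and the all-ones state locks up.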
Article
Online periodic testing of microprocessors is a valuable means to increase the reliability of a low-cost system when neither hardware nor time redundant protection schemes can be applied. This is particularly valid for floating-point (FP) units, which are becoming more common in embedded systems and are usually protected from operational faults through costly hardware redundant approaches. In this paper, we present scalable instruction-based self-test program development for both single and double precision FP units, considering different instruction sets (MIPS, PowerPC, and Alpha), different microprocessor architectures (32/64-bit architectures) and different memory configurations. Moreover, we introduce bit-level manipulation instruction sequences that are essential for the development of FP units' self-test programs. We developed self-test programs for single and double precision FP units on 32-bit and 64-bit microprocessor architectures and evaluated them with respect to the requirements of low-cost online periodic self-testing: fault coverage, memory footprint, execution time, and power consumption, assuming different memory hierarchy configurations. Our comprehensive experimental evaluations reveal that the instruction set architecture plays a significant role in the development of self-test programs. Additionally, we suggest the most suitable self-test program development approach when memory footprint or low power consumption is of paramount importance.
Conference Paper
On-line periodic testing of microprocessors is a viable low-cost alternative for a wide variety of embedded systems which cannot afford hardware or software redundancy techniques but necessitate the detection of intermittent or permanent faults. Low-cost, on-line periodic testing has been previously applied to the integer datapaths of microprocessors but not to their high-performance real number processing counterparts consisting of sophisticated high-speed floating-point (FP) units. In this paper, we present an effective on-line periodic self-testing methodology for high-speed FP units and demonstrate it on high-speed FP adders/subtracters of both single and double precision. The proposed self-test code development methodology leads to compact self-test routines that exploit the integer part of the processor's instruction set architecture to apply test sets to the FP subsystem periodically. The periodic self-test routines exhibit very low memory storage requirements along with a very small number of memory references, which are both fundamental requirements for on-line periodic testing. A comprehensive set of experiments on both single and double precision FP units, including pipelined versions, and on a RISC processor with a complete FP unit demonstrates the efficacy of the methodology in terms of very high fault coverage and low memory footprint, thus rendering the proposed methodology highly appropriate for on-line periodic testing.
Conference Paper
A configurable 2D LFSR based test generator and an automated synthesis procedure are presented. Without storage of test patterns, a 2D LFSR based test pattern generator can generate a sequence of pre-computed test patterns (detecting random-pattern-resistant faults) followed by random patterns (detecting random-pattern-detectable faults). The hardware overhead is decreased considerably through configuration. The configurable 2D LFSR test generator can be adopted in two basic BIST execution options: test-per-clock (parallel BIST) and test-per-scan (serial BIST). Experimental results of test-per-clock and test-per-scan BIST of benchmark circuits demonstrate the effectiveness of the proposed technique. The configurable 2D LFSR can also be adopted in chip-level and system-on-chip (SoC) BIST.
Article
Denormalized numbers are the most difficult type of numbers to implement in floating-point units. They are so complex that certain designs have elected to handle them in software rather than in hardware. Traps to software can result in long execution times, which renders denormalized numbers useless to programmers. This does not have to happen. With a small amount of additional hardware, denormalized numbers and underflows can be handled close to the speed of normalized numbers. This paper summarizes the little known techniques for handling denormalized numbers. Most of the techniques described here only appear in filed or pending patent applications.
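The reason denormalized numbers need special handling is visible in how a single-precision bit pattern maps to a value: normal numbers carry an implicit leading 1 in the significand, while denormalized (subnormal) numbers do not and use a fixed exponent. The sketch below shows only this standard IEEE 754 decoding rule; it does not reproduce any of the hardware techniques the paper summarizes, and the function name is hypothetical.

```python
def binary32_value(bits):
    """Decode a 32-bit pattern as a finite IEEE 754 single-precision value."""
    sign = -1.0 if bits >> 31 else 1.0
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        raise ValueError("infinity or NaN, not a finite value")
    if exponent == 0:
        # Subnormal (or zero): no implicit leading 1, exponent fixed at -126.
        # (fraction / 2**23) * 2**-126 == fraction * 2**-149
        return sign * fraction * 2.0 ** -149
    # Normal: implicit leading 1 restored as bit 23 of the significand.
    # (1 + fraction / 2**23) * 2**(exponent - 127)
    return sign * (0x800000 | fraction) * 2.0 ** (exponent - 150)
```

The two branches are what hardware must reconcile: a subnormal operand has to be treated as having a leading 0 and then renormalized before it can share the datapath used for normal operands.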
Implementation of a High Speed Single Precision Floating Point Unit using Verilog
  • G Ushasree