-
ABSTRACT: An optimal implementation of 128-Pt FFT/IFFT for low power IEEE 802.15.3a WPAN using pseudo-parallel datapath structure is
presented, where the 128-Pt FFT is devolved into 8-Pt and 16-Pt FFTs and then once again by devolving the 16-Pt FFT into 4×4
and 2×8. We analyze 128-Pt FFT/IFFT architecture for various pseudo-parallel 8-Pt and 16-Pt FFTs and an optimum datapath architecture
is explored. It is suggested that there exists an optimum degree of parallelism for the given algorithm. The analysis demonstrated
that with a modest increase in area one can achieve significant reduction in power. The proposed architectures complete one
parallel-to-parallel (i.e., when all input data are available in parallel and all output data are generated in parallel) 128-point
FFT computation in less than 312.5 ns and thereby meet the standard specification. The relative merits and demerits of these
architectures have been analyzed from the algorithm as well as implementation point of view. Detailed power analysis of each
of the architectures with a different number of datapaths at block level is described. We found that, from a power perspective,
the architecture with eight datapaths is optimum. The core power consumption in the optimum case is 60.6 mW, which is less
than half that of the latest reported 128-point FFT design in 0.18 µm technology. Furthermore, a Single Event Upset (SEU) tolerant
scheme for registers is also explored. The SEU-tolerant scheme does not affect performance; however, it increases
power consumption by about 42 percent. Apart from the low power consumption, the advantages of the proposed architectures
include reduced hardware complexity, regular data flow and simple counter based control.
Keywords: FFT, Low power, System on chip, WPAN, Single event upset
Circuits Systems and Signal Processing 04/2012; 30(4):871-882. · 0.82 Impact Factor
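The decomposition of a 128-point FFT into 8-point and 16-point FFTs described in the abstract above can be illustrated with a generic two-stage Cooley-Tukey split. The following is a minimal software sketch under stated assumptions, not the paper's pseudo-parallel datapath: the function names `dft` and `fft_two_stage` are hypothetical, the small blocks use a direct DFT for clarity, and the further 4×4 or 2×8 split of the 16-point stage is omitted.

```python
import cmath

def dft(x):
    """Direct O(n^2) DFT, standing in for the small 8-pt and 16-pt blocks."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def fft_two_stage(x, n1=8, n2=16):
    """Compute an (n1*n2)-point DFT from n1 DFTs of length n2,
    a twiddle multiplication, and n2 DFTs of length n1."""
    n = n1 * n2
    assert len(x) == n
    # Stage 1: input index j = a + n1*b; one n2-point DFT per residue a.
    inner = [dft([x[a + n1 * b] for b in range(n2)]) for a in range(n1)]
    # Twiddle factors W_n^(a*k2) between the two stages.
    for a in range(n1):
        for k2 in range(n2):
            inner[a][k2] *= cmath.exp(-2j * cmath.pi * a * k2 / n)
    # Stage 2: one n1-point DFT per k2; output index k = k2 + n2*k1.
    out = [0j] * n
    for k2 in range(n2):
        col = dft([inner[a][k2] for a in range(n1)])
        for k1 in range(n1):
            out[k2 + n2 * k1] = col[k1]
    return out
```

Comparing `fft_two_stage(x)` against `dft(x)` on any 128-sample input is a quick way to check the index mapping. For scale, completing one 128-point transform every 312.5 ns corresponds to 128 / 312.5 ns = 409.6 Msamples/s.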
-
ABSTRACT: Concurrent error detection and correction is an effective way to mitigate fault attacks in cryptographic hardware. Recent work on differential power analysis shows that even mathematically secure cryptographic protocols may be vulnerable at the physical implementation level. By measuring the energy consumed by a working digital circuit, it is possible to gain valuable information about the encryption algorithms used and even the specific encryption keys. Thwarting such attacks requires a new approach to logic and physical design. This paper presents a systematic approach to fault-tolerant cryptographic hardware design. First, the effectiveness of Hamming-code-based error correction schemes as a fault tolerance method in stream ciphers is investigated. Coding is applied to Linear Feedback Shift Register (LFSR) based stream cipher implementations. The method was implemented on industry-standard stream ciphers, e.g. A5/1 (GSM), E0 (Bluetooth), RC4 (WEP), and W7. The performance variation of stream cipher algorithms with error detection and correction was studied by synthesising the designs on Field Programmable Gate Arrays (FPGA) and Application Specific Integrated Circuits (ASIC). Further, we analyse hardware building blocks to minimise switching activity of a circuit over all possible inputs and input transitions by adding redundant gates and increasing the overall number of signal transitions. We also discuss the overhead and compositional properties of uniformly-switching circuits.
VLSI System on Chip Conference (VLSI-SoC), 2010 18th IEEE/IFIP; 10/2010
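To make the idea of Hamming-coded LFSR state concrete, here is a minimal sketch assuming a textbook Hamming(7,4) code and a toy 4-bit register. The function names, the tap positions in `lfsr_step`, and the 4-bit width are illustrative assumptions; the abstract does not specify the code parameters or register sizes the authors used.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """Return (data bits, error position); position 0 means no error seen."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3      # syndrome gives the 1-based bit position
    if pos:
        c[pos - 1] ^= 1             # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], pos

def lfsr_step(state):
    """One shift of a toy 4-bit Fibonacci LFSR (taps chosen for illustration)."""
    feedback = state[0] ^ state[3]
    return [feedback] + state[:3]

def protected_step(codeword):
    """Correct the stored state, advance the LFSR, and re-encode."""
    state, _ = hamming74_correct(codeword)
    return hamming74_encode(lfsr_step(state))
```

With this arrangement, a single bit flipped in the stored 7-bit codeword between two clock steps is corrected before the next shift, so the keystream stays the same as in the fault-free register.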
-
ABSTRACT: This study presents a simplified structure of a bit-parallel systolic multiplier over Galois fields GF(2<sup>m</sup>) suitable for cryptographic hardware implementation. A redundant standard basis representation with the irreducible all-one polynomial is considered. The systolic multiplier consists of (m+1)<sup>2</sup> identical cells, each consisting of one two-input AND gate, one two-input XOR gate and two one-bit latches. This architecture is well suited to very large-scale integration implementation because of its regularity, modular structure and unidirectional data flow. The proposed multipliers have a clock cycle latency of (m+1). This architecture has a total reduction of m<sup>2</sup> D-flip-flops compared to earlier bit-parallel systolic multiplication architectures. As the finite-field multiplier is one of the complex blocks in cryptographic hardware and needs secure testability to avoid unwanted access into the on-chip security blocks, the authors also introduce an on-chip testing scheme. The authors propose a test generation technique for detecting stuck-at faults (SAFs), transition delay faults (TDFs), stuck-open faults (SOFs) and path delay faults (PDFs) at the gate and cell level in the systolic architecture. The authors also show that realistic sequential cell faults can be detected by only 12 single-input-change test vectors in the complete systolic multiplier over GF(2<sup>m</sup>). The proposed technique derives test vectors from the cell expressions of systolic multipliers without any requirement of an automatic test pattern generation tool. The complete systolic architecture is C-testable for SAF, TDF, SOF and PDF with only 12 constant tests. The test vectors are independent of the multiplier size. The test set provides 100% single SAF, TDF, SOF and PDF coverage.
IET Computers & Digital Techniques 10/2010; · 0.45 Impact Factor
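The redundant-basis trick behind all-one-polynomial (AOP) multipliers can be sketched in software. With p(x) = x^m + ... + x + 1, we have x^(m+1) = 1 modulo p(x), so a product in the redundant basis {1, x, ..., x^m} is a cyclic convolution of (m+1)-bit words. The sketch below is an illustrative bit-level model, not the systolic cell array from the paper; the function names are hypothetical and m = 4 is chosen because x^4 + x^3 + x^2 + x + 1 is irreducible over GF(2).

```python
def aop_mul_redundant(a, b, m):
    """Multiply two (m+1)-bit redundant-basis words: cyclic convolution
    modulo x^(m+1) + 1, using (m+1) rotate-and-XOR steps."""
    n = m + 1
    mask = (1 << n) - 1
    acc = 0
    for i in range(n):
        if (b >> i) & 1:
            # Multiplying by x^i is a left rotation by i within n bits.
            acc ^= ((a << i) | (a >> (n - i))) & mask
    return acc

def aop_to_standard(c, m):
    """Fold the x^m term using x^m = 1 + x + ... + x^(m-1)."""
    low_mask = (1 << m) - 1
    if (c >> m) & 1:
        return (c & low_mask) ^ low_mask
    return c & low_mask

def reference_mul(a, b, m):
    """Schoolbook carry-less multiply, then reduce modulo the AOP."""
    p = (1 << (m + 1)) - 1          # all-one polynomial of degree m
    r = 0
    for i in range(m):
        if (b >> i) & 1:
            r ^= a << i
    for i in range(2 * m - 2, m - 1, -1):
        if (r >> i) & 1:
            r ^= p << (i - m)
    return r
```

The inner loop touches (m+1) bits in each of (m+1) iterations, which mirrors the (m+1)<sup>2</sup> AND/XOR cells mentioned in the abstract, although the mapping to hardware cells here is only a loose analogy.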
-
ABSTRACT: This paper presents a test generation technique for detecting stuck-at faults (SAFs) and transition delay faults (TDFs) at gate level in the finite-field systolic multiplier over GF(2<sup>m</sup>) based on polynomial basis. The proposed technique derives test vectors from the cell expressions of systolic multipliers without any requirement of an Automatic Test Pattern Generation (ATPG) tool. The complete systolic architecture is C-testable for SAF and TDF with only six constant tests. The test vectors are independent of the multiplier size. The test set provides 100% single SAF and TDF coverage.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems 10/2010; · 1.22 Impact Factor
-
ABSTRACT: The Montgomery algorithm for modular multiplication with a large modulus has been widely used in public key cryptosystems for secure data communication. This paper presents a digit-serial systolic multiplication architecture for all-one polynomials (AOP) over GF(2<sup>m</sup>) for efficient implementation of the Montgomery Multiplication (MM) algorithm, suitable for cryptosystems. Analysis shows that the latency and circuit complexity of the proposed architecture are significantly less than those of earlier designs for the same classes of polynomials. Since the systolic multiplier has the features of regularity, modularity and unidirectional data flow, this structure is well suited to VLSI implementations. The proposed multipliers have a clock cycle latency of (2N - 1), where N = ⌈m/L⌉, m is the word size and L is the digit size. No digit-serial systolic architecture based on the MM algorithm over GF(2<sup>m</sup>) has been reported before. The architecture is also compared to two well-known digit-serial systolic architectures.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems 06/2010; · 1.22 Impact Factor
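For readers unfamiliar with Montgomery multiplication over GF(2<sup>m</sup>), the following sketch shows the standard bit-serial form, which computes a·b·x^(-m) mod p(x). It is an assumption-labelled software model with hypothetical function names: it processes one bit per iteration (digit size 1) and does not reproduce the paper's digit-serial systolic structure or its AOP-specific optimisations.

```python
def mont_mul_gf2m(a, b, p, m):
    """Bit-serial Montgomery product a*b*x^(-m) mod p(x) over GF(2).
    p must have degree m and constant term 1; a and b have degree < m."""
    c = 0
    for i in range(m):
        if (a >> i) & 1:
            c ^= b          # accumulate a_i * b
        if c & 1:
            c ^= p          # adding p clears bit 0 without changing c mod p
        c >>= 1             # exact division by x
    return c

def clmul(a, b):
    """Carry-less (polynomial) multiplication over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def poly_mod(r, p):
    """Remainder of r(x) divided by p(x) over GF(2)."""
    dp = p.bit_length() - 1
    while r.bit_length() - 1 >= dp:
        r ^= p << (r.bit_length() - 1 - dp)
    return r

def mul_via_montgomery(a, b, p, m):
    """Ordinary product a*b mod p: one Montgomery product, then a second
    one with x^(2m) mod p to cancel the x^(-m) factor."""
    r2 = poly_mod(1 << (2 * m), p)
    return mont_mul_gf2m(mont_mul_gf2m(a, b, p, m), r2, p, m)
```

For example, with the all-one polynomial of degree 4, `mul_via_montgomery(a, b, 0b11111, 4)` should agree with `poly_mod(clmul(a, b), 0b11111)` for every pair of 4-bit operands.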
-
ABSTRACT: Recent studies have shown that an attacker can retrieve confidential information from cryptographic hardware (e.g. the secret key) by introducing internal faults. A secure and reliable implementation of cryptographic algorithms in hardware must be able to detect or correct such malicious attacks. Error detection/correction (EDC), through fault tolerance, could be an effective way to mitigate such fault attacks in cryptographic hardware. To this end, we analyze the area, delay, and power overhead for designing the S-Box, which is one of the main complex blocks in the Advanced Encryption Standard (AES), with error detection and correction capability. We use multiple Parity Predictions (PPs), based on various error correcting codes, to detect and correct errors. Various coding techniques are presented, which include simple parity prediction, split parity codes, Hamming, Hsiao, and LDPC codes. The S-Box, GF(p), and PP circuits are synthesized from the specifications, while the decoding and correction circuits are combined to form the complete designs. The analysis shows a comparison of the different approaches characterized by their error detection capability.
Quality Electronic Design (ISQED), 2010 11th International Symposium on; 04/2010
-
ABSTRACT: We propose a C-testable implementation of the S-box, which is one of the most complex blocks in AES hardware. Only 12 constant vectors are sufficient to achieve 100% fault coverage in the S-box. C-testability is achieved with an extra hardware overhead of 8.2 percent.
On-Line Testing Symposium, 2009. IOLTS 2009. 15th IEEE International; 07/2009
-
ABSTRACT: Motivated by the problems associated with soft errors in digital circuits and fault-related attacks in cryptographic hardware, a systematic method for designing single error correcting multiplier circuits is presented for finite fields or Galois fields over GF(2<sup>m</sup>). Multiple parity predictions to correct single errors based on the Hamming principles are used. The expressions for the parity prediction are derived from the input operands, and are based on the primitive polynomials of the fields. This technique, when compared with existing ones, gives better performance. It is shown that single error correction (SEC) multipliers over GF(2<sup>m</sup>) require slightly over 100% extra hardware, whereas with the traditional SEC techniques, this figure is more than 200%. Since single bit internal faults can cause multiple faults in the outputs, this has also been addressed here by using multiple Hamming codes with optimised hardware.
IET Computers & Digital Techniques 06/2009; · 0.45 Impact Factor
-
ABSTRACT: This paper presents an error-tolerant, hardware-efficient VLSI architecture for bit-parallel systolic multiplication over dual base, which can be pipelined. This error-tolerant architecture is well suited to VLSI implementation because of its regularity, modular structure, and unidirectional data flow. The length of the longest delay path and the area of this architecture are smaller than those of the bit-parallel systolic multiplication architectures reported earlier. The architecture is implemented using Austria Micro System's 0.35 µm CMOS technology. This architecture can also operate over both the dual base and the polynomial base.
Testing and Diagnosis, 2009. ICTD 2009. IEEE Circuits and Systems International Conference on; 05/2009
-
ABSTRACT: of the most demanding requirements. Efficient testable logic synthesis is one way to tackle the problem. To this end, this paper introduces a new, fast and efficient graph-based decomposition technique for Boolean functions in finite fields, which utilizes the data structure of Multiple-Output Decision Diagrams (MODD). In particular, the proposed technique is based on finite fields and can decompose any arbitrary N-valued function F into N distinct sets conjunctively and N-1 distinct sets disjunctively. The proposed technique is capable of generating testable circuits. The experimental results show that the proposed method is more economical in terms of literal count than existing approaches. Furthermore, we show that the basic block can be tested with eight test vectors. Key terms: Multiple-Output Decision Diagrams, testable, Galois Field, OBDD, Multipliers.
VLSI Design, 2008. VLSID 2008. 21st International Conference on; 02/2008
-
ABSTRACT: designing single error correcting Galois field multipliers over polynomial basis. The proposed method uses multiple parity prediction circuits to detect and correct logic errors and gives 100% fault coverage both in the functional unit and the parity prediction circuitry. The area, power and delay overheads of the proposed design technique are analyzed. It is found that, compared to the traditional Triple Modular Redundancy (TMR) techniques for single error correction, the proposed technique is very cost efficient. Index Terms: Error Correcting Codes, Galois Field Multiplier, Cryptography, VLSI.
VLSI Design, 2008. VLSID 2008. 21st International Conference on; 02/2008
-
ABSTRACT: This paper presents a new method for implementing Galois field multipliers over polynomial basis. The proposed method minimizes the number of logic gates by reusing the same hardware. The proposed architecture can be adaptively resized according to the requirement, thereby giving more design flexibility. Design analysis based on 0.18 micron technology shows that the proposed design with minimum latency is area/power efficient for finite field multipliers of size greater than 10.
Norchip, 2007; 12/2007
-
ABSTRACT: We present a C-testable method for detecting stuck-at (s-a) faults in polynomial basis (PB) bit-parallel multiplier circuits over GF(2<sup>m</sup>). It requires only 7 tests to provide 100% fault coverage, independent of the multiplier size. These 7 tests can be derived directly without any requirement of ATPG tools. A Synopsys® tool is used to generate ATPG-based test patterns.
On-Line Testing Symposium, 2007. IOLTS 07. 13th IEEE International; 08/2007
-
ABSTRACT: A testable implementation of a bit-parallel multiplier over the finite field GF(2<sup>m</sup>) is proposed. A function-independent test set of length (2m+4), which detects all the single stuck-at faults in an m-bit GF(2<sup>m</sup>) multiplier circuit, is also presented. The test set can be determined readily from the corresponding algebraic forms without running an ATPG tool. The test complexity is lower than that of an ATPG-generated or algorithmic test set. The test set provides 100 percent single stuck-at fault coverage. The gate count of the proposed testable multiplier as a function of degree m has been analyzed. The testable circuit realization requires only two extra inputs for controllability and some additional EX ns and need field testing, built-in self-test (BIST) circuit may be used to generate test pattern internally for detecting faults in the multiplier circuits
High-Level Design, Validation, and Test Workshop, IEEE International.
-
ABSTRACT: We present a C-testable design of polynomial basis (PB) bit-parallel (BP) multipliers over GF(2<sup>m</sup>) for 100% coverage of stuck-at faults. Our design method also includes a method for test vector generation, which is simple and efficient. C-testability is achieved with three control inputs and approximately 6% additional hardware. Only 8 constant vectors are required irrespective of the sizes of the fields and primitive polynomial. We also present a Built-In Self-Test (BIST) architecture for generating the test vectors efficiently, which eliminates the need for the extra control inputs. Since these circuits have critical applications as parts of cryptography hardware (e.g., Elliptic Curve Crypto (ECC) systems), the BIST architecture may provide an added level of security, as the tests would be done internally and without the requirement of probing by external testing equipment. Finally, we present experimental results comprising the area, delay and power of testable multipliers of various sizes, obtained with the Synopsys® tools using the UMC 0.18 micron CMOS technology library.
ACM Transactions on Design Automation of Electronic Systems 13(1):5. · 0.81 Impact Factor