Journal of Cryptographic Engineering

Published by Springer Nature

Online ISSN: 2190-8516 · Print ISSN: 2190-8508

Articles


Faster 64-bit universal hashing using carry-less multiplications

March 2015 · 231 Reads

Intel and AMD support the Carry-less Multiplication (CLMUL) instruction set in their x64 processors. We use CLMUL to implement an almost universal 64-bit hash family (CLHASH). We compare this new family with what might be the fastest almost universal family on x64 processors (VHASH). We find that CLHASH is at least 60% faster. Yet CLHASH has better universality while using about the same number of random bytes. We also compared CLHASH with a popular hash function designed for speed (Google's CityHash). We find that CLHASH is 40% faster than CityHash on inputs larger than 64 bytes and nearly as fast otherwise.
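
As a rough illustration of the operation that CLMUL performs in hardware, the following Python sketch multiplies two integers as polynomials over GF(2): partial products are combined with XOR, so no carries propagate between bit positions. The function name `clmul` is an assumption made for this example, and the sketch shows only the underlying primitive, not the CLHASH construction itself.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two non-negative integers.

    Each integer is read as a polynomial over GF(2), with bit i as the
    coefficient of x^i. Partial products are XORed instead of added.
    """
    result = 0
    while b:
        if b & 1:          # lowest remaining bit of b is set
            result ^= a    # "add" the shifted copy of a without carries
        a <<= 1            # multiply a by x
        b >>= 1            # move on to the next bit of b
    return result


# (x + 1) * (x + 1) = x^2 + 1 over GF(2), because the 2x term cancels.
square_of_x_plus_1 = clmul(0b11, 0b11)
```

With ordinary integer multiplication 3 * 3 is 9, whereas the carry-less product above is 0b101, which is where the two operations differ.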

A low-area unified hardware architecture for the AES and the cryptographic hash function ECHO

August 2011 · 39 Reads

We propose a compact coprocessor for the AES (encryption, decryption, and key expansion) and the cryptographic hash function ECHO on Virtex-5 and Virtex-6 FPGAs. Our architecture is built around an 8-bit datapath. The Arithmetic and Logic Unit performs a single instruction that allows for implementing AES encryption, AES decryption, AES key expansion, and ECHO at all levels of security. Thanks to a careful organization of AES and ECHO internal states in the register file, we manage to generate all read and write addresses by means of a modulo-16 counter and a modulo-256 counter. A fully autonomous implementation of ECHO and AES on a Virtex-5 FPGA requires 193 slices and a single 36k memory block, and achieves competitive throughputs. Assuming that the security guarantees of ECHO are at least as good as the ones of the SHA-3 finalists BLAKE and Keccak, our results show that ECHO is a better candidate for low-area cryptographic coprocessors. Furthermore, the design strategy described in this work can be applied to combine the AES and the SHA-3 finalist Grøstl.
Keywords: Hash functions, SHA-3, ECHO, AES, FPGA

summary of the differential attacks applied to a four-round AES. The computational effort has been determined by simulation or through previous estimates (such as given in Section 5.3).
Practical complexity differential cryptanalysis and fault analysis of AES

November 2011 · 303 Reads

This paper presents a survey of practical complexity differential cryptanalysis of AES and compares this to attacks that have been proposed for differential fault analysis. Naturally, the attacks in each vein of research are applicable in the other but use different models. In this paper we draw from both topics to improve attacks proposed in the literature. We re-evaluate the so-called Square attack and the use of impossible differentials in terms of differential fault analysis using a weaker model than previously considered in the literature. Furthermore, we propose two new attacks applicable to both differential cryptanalysis and differential fault analysis. The first is a differential cryptanalysis of four-round AES based on a differential that occurs with a non-negligible probability. The second is an application of the Square attack to a five-round AES that requires 2^8 ciphertexts and a time complexity equivalent to approximately 2^37.5 AES encryptions.
Keywords: Differential cryptanalysis, Advanced encryption standard, Differential fault analysis

Table 1 Best operation counts and memory usage for various co-Z addition formulae.
Table 2 Comparison of regular scalar multiplication algorithms. 
Scalar multiplication on Weierstraß elliptic curves from Co-Z arithmetic

August 2011 · 470 Reads

Raveen R. Goundar · Marc Joye · [...]

In 2007, Meloni introduced a new type of arithmetic on elliptic curves when adding projective points sharing the same Z-coordinate. This paper presents further co-Z addition formulæ (and register allocations) for various point additions on Weierstraß elliptic curves. It explains how the use of conjugate point addition and other implementation tricks allows one to develop efficient scalar multiplication algorithms making use of co-Z arithmetic. Specifically, this paper describes efficient co-Z based versions of Montgomery ladder, Joye’s double-add algorithm, and certain signed-digit algorithms, as well as faster (X, Y)-only variants for left-to-right versions. Further, the proposed implementations are regular, thereby offering a natural protection against a variety of implementation attacks.
Keywords: Elliptic curves, Meloni’s technique, Jacobian coordinates, Regular ladders, Implementation attacks, Embedded systems

Univariate side channel attacks and leakage modeling

April 2012 · 508 Reads

Differential power analysis is a powerful cryptanalytic technique that exploits information leaking from physical implementations of cryptographic algorithms. During the last two decades, numerous variations of the original principle have been published. In particular, the univariate case, where a single instantaneous leakage is exploited, has attracted much research effort. In this paper, we argue that several univariate attacks among the most frequently used by the community are not only asymptotically equivalent, but can also be rewritten one as a function of the other, only by changing the leakage model used by the adversary. In particular, we prove that most univariate attacks proposed in the literature can be expressed as correlation power analyses with different leakage models. This result emphasizes the major role played by the model choice in the attack efficiency. In a second part of this paper, we hence also discuss and evaluate side channel attacks that involve no leakage model but rely on some general assumptions about the leakage. Our experiments show that such attacks, named robust, are a valuable alternative to the univariate differential power analyses. They lose only a little efficiency when a perfect model is available to the adversary, and gain a lot when such information is not available.
Keywords: Side channel attack, Correlation, Regression, Model

Table 2: Replacement sequences for some idempotent instructions
Table 4: Countermeasures overhead for several implementations
Figure 4: Transition systems for the bl instruction and its replacement sequence
MC: formula passed-AG(AF(adcs.pc=PC1))
MC: formula passed-AG(AF(cm(adcs).pc=PC1))
MC: formula passed-AG(((adcs.pc=PC1*cm(adcs).pc=PC1)->LIGHT_RESULT=1))
MC: formula failed-AG(((adcs.pc=PC1*cm(adcs).pc=PC1)->RESULT=1))
Formal verification of a software countermeasure against instruction skip attacks

February 2014 · 210 Reads

Fault attacks against embedded circuits have opened up many new attack paths against secure circuits. Every attack path relies on a specific fault model which defines the type of faults that the attacker can perform. On embedded processors, a fault model consisting of an assembly instruction skip can be very useful for an attacker and has been obtained by using several fault injection means. To avoid this threat, some countermeasure schemes which rely on temporal redundancy have been proposed. Nevertheless, double fault injection in a long enough time interval is practical and can bypass those countermeasure schemes. Some fine-grained countermeasure schemes have also been proposed for specific instructions. However, to the best of our knowledge, no approach that secures a generic assembly program by making it fault-tolerant to instruction skip attacks has been formally proven yet. In this paper, we provide a fault-tolerant replacement sequence for almost all the instructions of the Thumb-2 instruction set and provide a formal verification for this fault tolerance. This simple transformation makes it possible to add a reasonably good security level to an embedded program and makes practical fault injection attacks much harder to achieve.

A Formal Proof of Countermeasures against Fault Injection Attacks on CRT-RSA

January 2014 · 137 Reads

In this article, we describe a methodology that aims at either breaking or proving the security of CRT-RSA implementations against fault injection attacks. In the specific case-study of the BellCoRe attack, our work bridges a gap between formal proofs and implementation-level attacks. We apply our results to three implementations of CRT-RSA, namely the unprotected one, that of Shamir, and that of Aumüller et al. Our findings are that many attacks are possible on both the unprotected and the Shamir implementations, while the implementation of Aumüller et al. is resistant to all single-fault attacks. It is also resistant to double-fault attacks if we consider the less powerful threat-model of its authors.
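
To make the BellCoRe attack on an unprotected CRT-RSA implementation concrete, here is a minimal Python sketch using the single-faulty-signature form usually attributed to Lenstra. Everything in it is an assumption chosen for illustration: the toy primes, the message, the helper name `crt_sign`, and the crude fault model (the mod-p half-signature is off by one). It is not the formal methodology of the article, only the arithmetic fact the attack rests on.

```python
from math import gcd


def crt_sign(m, p, q, d, fault=False):
    """RSA-CRT signature of m; optionally corrupt the mod-p half."""
    s_p = pow(m, d % (p - 1), p)
    s_q = pow(m, d % (q - 1), q)
    if fault:
        s_p = (s_p + 1) % p  # assumed fault model: s_p is off by one
    # Garner recombination: s = s_q + q * ((s_p - s_q) * q^{-1} mod p)
    h = ((s_p - s_q) * pow(q, -1, p)) % p
    return s_q + q * h


# Toy parameters (hypothetical, far too small for real use).
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))
m = 65

s_ok = crt_sign(m, p, q, d)               # verifies: s_ok^e = m (mod n)
s_bad = crt_sign(m, p, q, d, fault=True)  # still correct mod q, wrong mod p

# s_bad^e - m is divisible by q but not by p, so the gcd exposes q.
recovered = gcd((pow(s_bad, e, n) - m) % n, n)
```

The faulty signature remains correct modulo q, which is why a single gcd with the public modulus is enough to factor it in this toy setting.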

Modulus fault attacks against RSA–CRT signatures

September 2011 · 359 Reads

RSA-CRT fault attacks have been an active research area since their discovery by Boneh, DeMillo and Lipton in 1997. We present alternative key-recovery attacks on RSA-CRT signatures: instead of targeting one of the sub-exponentiations in RSA-CRT, we inject faults into the public modulus before CRT interpolation, which makes a number of countermeasures against Boneh et al.’s attack ineffective. Our attacks are based on orthogonal lattice techniques and are very efficient in practice: depending on the fault model, between 5 and 45 faults suffice to recover the RSA factorization within a few seconds. Our simplest attack requires that the adversary knows the faulty moduli, but more sophisticated variants work even if the moduli are unknown, under reasonable fault models. All our attacks have been fully validated experimentally with fault-injection laser techniques.
Keywords: Fault Attacks, Digital Signatures, RSA, CRT, Lattices

Figure 1. Average decryption time for given (correctable) error weights on a Linux AMD64 machine. The code parameters are m = 10 and (1024, 524, 101), m = 11 and (2048, 1751, 55), m = 12 and (2960, 2288, 113), m = 13 and (6624, 5129, 231). Measurements for a hardened decryption (using Algorithm 4) have been added for the cases m = 12 and m = 13.
Figure 2. Average number of rounds needed for the extended Euclidean algorithm in the inversion step of Algorithm 1. Averaged on 2000 random samples.
Erratum to: Side-channel attacks on the McEliece and Niederreiter public-key cryptosystems

December 2011 · 308 Reads

Research within “post-quantum” cryptography has focused on development of schemes that resist quantum cryptanalysis. However, if such schemes are to be deployed, practical questions of efficiency and physical security should also be addressed; this is particularly important for embedded systems. To this end, we investigate issues relating to side-channel attacks against the McEliece and Niederreiter public-key cryptosystems, for example improving those presented by Strenzke et al. (Side channels in the McEliece PKC, vol. 5299, pp. 216–229, 2008), and novel countermeasures against such attacks.
Keywords: Public-key cryptography, McEliece, Niederreiter, Embedded systems, Side-channel attack

A fair evaluation framework for comparing side-channel distinguishers

August 2011 · 60 Reads

The ability to make meaningful comparisons between side-channel distinguishers is important both to attackers seeking an optimal strategy and to designers wishing to secure a device against the strongest possible threat. The usual experimental approach requires the distinguishing vectors to be estimated: outcomes do not fully represent the inherent theoretic capabilities of distinguishers and do not provide a basis for conclusive, like-for-like comparisons. This is particularly problematic in the case of mutual information-based side channel analysis (MIA) which is notoriously sensitive to the choice of estimator. We propose an evaluation framework which captures those theoretic characteristics of attack distinguishers having the strongest bearing on an attacker’s general ability to estimate with practical success, thus enabling like-for-like comparisons between different distinguishers in various leakage scenarios. We apply our framework to an evaluation of MIA relative to its rather more well-established correlation-based predecessor and a proposed variant inspired by the Kolmogorov–Smirnov distance. Our analysis makes sense of the rift between the a priori reasoning in favour of MIA and the disappointing empirical findings of previous comparative studies and moreover reveals several unprecedented features of the attack distinguishers in terms of their sensitivity to noise. It also explores, to our knowledge for the first time, theoretic properties of near-generic power models previously proposed (and experimentally verified) for use in attacks targeting injective functions.
Keywords: Side-channel analysis, Mutual information, Kolmogorov–Smirnov, Differential power analysis

Scaling efficient code-based cryptosystems for embedded platforms

December 2012 · 65 Reads

We describe a family of highly efficient codes for cryptographic purposes and dedicated algorithms for their manipulation. Our proposal is especially tailored for highly constrained platforms, and surpasses certain conventional and post-quantum proposals (like RSA and NTRU, respectively) according to most if not all efficiency metrics.

High-Speed High-Security Signatures

September 2011 · 472 Reads

This paper shows that a $390 mass-market quad-core 2.4 GHz Intel Westmere (Xeon E5620) CPU can create 108000 signatures per second and verify 71000 signatures per second on an elliptic curve at a 2^128 security level. Public keys are 32 bytes, and signatures are 64 bytes. These performance figures include strong defenses against software side-channel attacks: there is no data flow from secret keys to array indices, and there is no data flow from secret keys to branch conditions.
Keywords: Elliptic curves, Edwards curves, signatures, speed, software side channels, foolproof session keys

Fig. 1 Simulation results of S-box power consumption using both 130 nm and 65 nm GP/LP technologies (SVT devices) at nominal supply voltages.
Fig. 2 Measured total energy and delay scaling with V_DD of S-box.
Fig. 5 Measured energy per encryption vs. maximum throughput at ultra-low voltage. The inset shows the minimum supply voltage ensuring correct functionality (V_limit) for the 20 measured dies.
Harvesting the potential of nano-CMOS for lightweight cryptography: An ultra-low-voltage 65 nm AES coprocessor for passive RFID tags

April 2011 · 224 Reads

An important challenge associated with the current massive deployment of Radio Frequency Identification solutions is to provide security to passive tags while meeting their microwatt power budget. This can either be achieved by designing new lightweight ciphers, or by proposing advanced low-power implementations of standard ciphers. In this paper, we show that the AES algorithm can fit into this microwatt power budget by combining ultra-low-voltage implementations with a proper selection of the process flavor in a low-cost nanometer CMOS technology. Interestingly, this approach only requires slight modifications to the standard EDA tool flow, without incurring the engineering costs of architecture optimizations. In order to demonstrate this claim, we successfully designed and manufactured an AES coprocessor in a 65 nm low-power CMOS process. We prove with measurement results obtained from a set of 20 manufactured dies that the proposed coprocessor can be safely operated down to 0.32 V with an energy per 128-bit encryption/decryption at least 2.75× lower than in previously published low-power AES implementations.

Higher-Order Glitches Free Implementation of the AES Using Secure Multi-party Computation Protocols
Higher-order side channel analysis (HO-SCA) is a powerful technique against cryptographic implementations and the design of appropriate countermeasures is nowadays an important topic. In parallel, another class of attacks, called glitch attacks, has been investigated; these exploit the hardware glitches that occur during the physical execution of algorithms. Some solutions have been proposed to counteract HO-SCA at any order or to defeat glitch attacks, but no work has until now focused on the definition of a sound countermeasure thwarting both attacks. We introduce in this paper a circuit model in which side-channel resistance in the presence of glitch effects can be characterized. This allows us to construct the first glitch-free HO-SCA countermeasure. The new construction can be built from any Secure Multi-Party Computation protocol and, as an illustration, we propose to apply the protocol introduced by Ben-Or et al. at STOC in 1988. The adaptation of the latter protocol to the context of side-channel analysis results in a completely new higher-order masking scheme, particularly interesting when addressing resistance in the presence of glitches. An application of our scheme to the AES block cipher is detailed, as well as an information theoretic evaluation of the new masking function that we call polynomial masking.

Message-aimed side channel and fault attacks against public key cryptosystems with homomorphic properties

December 2011 · 40 Reads

In this work, we introduce a new timing vulnerability in the decryption operation of the McEliece cryptosystem. Furthermore, we review previously known side channel and fault attacks against the RSA and McEliece cryptosystems and analyze them with respect to their differences and similarities concerning the respective points of attack. We show that it is basically the homomorphic properties of these schemes that allow the special type of message-aimed attacks based on observing the decryption of manipulated versions of the respective ciphertext, and we derive a corresponding methodology for the analysis of such schemes with respect to these attacks. Consequently, we present new side channel attacks against other public key cryptosystems with homomorphic properties and point out certain aspects that are special to the countermeasures against this type of attack.

An on-chip glitchy-clock generator for testing fault injection attacks

December 2011 · 219 Reads

This paper presents a glitchy-clock generator integrated in an FPGA for evaluating fault injection attacks and their countermeasures on cryptographic modules. The proposed generator exploits clock management capabilities, which are common in modern FPGAs, to generate a clock signal with a temporal voltage spike. The shape and timing of the glitchy-clock cycle are configurable at run time. The proposed generator can be embedded in a single FPGA without any external instrument (e.g., a pulse generator and a variable power supply). Such integration enables reliable and reproducible fault injection experiments. In this paper, we examine the characteristics of the proposed generator through experiments on the Side-channel Attack Standard Evaluation Board (SASEBO). The results show that the timing of the glitches can be controlled in steps of about 0.17 ns. We also demonstrate its application to the safe-error attack against an RSA processor.

Extractors against side-channel attacks: Weak or strong?

November 2011 · 63 Reads

Randomness extractors are important tools in cryptography. Their goal is to compress a high-entropy source into a more uniform output. Beyond their theoretical interest, they have recently gained attention because of their use in the design and proof of leakage-resilient primitives, such as stream ciphers and pseudorandom functions. However, for these proofs of leakage resilience to be meaningful in practice, it is important to instantiate and implement the components they are based on. In this context, while numerous works have investigated the implementation properties of block ciphers such as the AES Rijndael, very little is known about the application of side-channel attacks against extractor implementations. In order to close this gap, this paper instantiates a low-cost hardware extractor and analyzes it both from a performance and from a side-channel security point of view. Our investigations lead to contrasting conclusions. On one hand, extractors can be efficiently implemented and protected with masking. On the other hand, they provide adversaries with many more exploitable leakage samples than, e.g., block ciphers. As a result, they can ensure high security margins against standard (non-profiled) side-channel attacks and turn out to be much weaker against profiled attacks. From a methodological point of view, our analysis consequently raises the question of which attack strategies should be considered in security evaluations.

Modulus fault attacks against RSA-CRT signatures

September 2011 · 152 Reads

RSA-CRT fault attacks have been an active research area since their discovery by Boneh, DeMillo and Lipton in 1997. We present alternative key-recovery attacks on RSA-CRT signatures: instead of targeting one of the sub-exponentiations in RSA-CRT, we inject faults into the public modulus before CRT interpolation, which makes a number of countermeasures against Boneh et al.'s attack ineffective. Our attacks are based on orthogonal lattice techniques and are very efficient in practice: depending on the fault model, between 5 and 45 faults suffice to recover the RSA factorization within a few seconds. Our simplest attack requires that the adversary knows the faulty moduli, but more sophisticated variants work even if the moduli are unknown, under reasonable fault models. All our attacks have been fully validated experimentally with fault-injection laser techniques.

Synchronization method for SCA and fault attacks

April 2011 · 42 Reads

This paper shows how the effectiveness of side-channel and fault attacks can be improved for devices running from internal clock sources. Due to the frequency instability of internally clocked chips, attacking them has always been a great challenge. A significant improvement was achieved by using a frequency injection locking technique via the power supply line of a chip. As a result, the analysis of a semiconductor chip can be accomplished with less effort and in shorter time. Successful synchronization was demonstrated on a secure microcontroller and a secure FPGA. This paper presents research into limits for synchronization and discusses possible countermeasures against frequency injection attacks.

A practical device authentication scheme using SRAM PUFs

January 2011 · 100 Reads

The contamination of electronic component supply chains by counterfeit hardware devices is a serious and growing risk in today’s globalized marketplace. Current practice for detecting counterfeit semiconductors includes visual checking, electrical testing, and reliability testing, which can require significant investments in expertise, equipment, and time. Additionally, best practices have been developed in industry worldwide to combat counterfeiting in many of its variants. Although the current approaches improve the situation significantly, they do not provide extensive technical means to detect counterfeiting. However, new approaches in this area are beginning to emerge. Suh and Devadas recently proposed a low cost device authentication scheme which relies on Physically Unclonable Functions (PUFs) to implement a challenge-response authentication protocol. There are several constraints in their authentication scheme, e.g., their scheme requires a secure online database and relies on PUF constructions that exhibit a large number of challenge-response pairs. In this paper, we introduce a new device authentication scheme using PUFs for device anti-counterfeiting. Our scheme is simple and practical as it does not require any online databases and is not tied to any particular PUF implementation. For hardware devices which already have SRAM and non-volatile storage embedded, our scheme incurs almost no additional cost.

High performance GHASH and impacts of a class of unconventional bases

November 2011 · 30 Reads

This work presents a new method to compute the GHASH function involved in the Galois/Counter Mode of operation for block ciphers. If $X = X_1 \ldots X_n$ is a bit string made of $n$ blocks of 128 bits each, then the GHASH function essentially computes $X_1H^n + X_2H^{n-1} + \cdots + X_nH$, where $H$ is the hash key and an element of the binary field $\mathbb{F}_{2^{128}}$. This operation is usually computed using $n$ successive multiply-and-add operations over $\mathbb{F}_{2^{128}}$. Our proposed method replaces all but a fixed number of those multiplications by additions on the field. This is achieved using the characteristic polynomial of $H$. We present both how to use this polynomial to speed up the GHASH function and how to efficiently compute it for each session that uses a new $H$. We also show that the proposed technique can be parallelized to compute GHASH even faster. In order to completely eliminate the need for a field multiplication, we investigate a different set of bases for the field element representation and report their architectural and possible security impacts.
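
The conventional baseline mentioned in the abstract, n successive multiply-and-add operations, is Horner's rule applied to the polynomial in H. The Python sketch below illustrates that baseline only, not the characteristic-polynomial method of the paper. It assumes a plain bit order (bit i is the coefficient of x^i) with the reduction polynomial x^128 + x^7 + x^2 + x + 1, and it ignores the reflected bit convention that GCM itself uses; the names `gf_mul` and `poly_hash` are hypothetical.

```python
# Reduction polynomial x^128 + x^7 + x^2 + x + 1 (plain bit order, assumed).
R = (1 << 128) | 0x87


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^128), both given as ints below 2^128."""
    result = 0
    while b:
        if b & 1:
            result ^= a    # field addition is XOR
        b >>= 1
        a <<= 1            # multiply a by x
        if a >> 128:       # degree reached 128: reduce modulo R
            a ^= R
    return result


def poly_hash(blocks, h):
    """Evaluate X_1*H^n + X_2*H^(n-1) + ... + X_n*H by Horner's rule."""
    y = 0
    for x in blocks:
        y = gf_mul(y ^ x, h)   # one multiply-and-add per 128-bit block
    return y
```

Each loop iteration costs one field multiplication, which is the per-block cost the abstract says the proposed method replaces with additions for all but a fixed number of blocks.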

Table 6 Comparison with hardware accelerators for elliptic curve scalar multiplication.
Table 7 Timings in clock cycles for field arithmetic operations on a Sandy Bridge processor. "op/M" denotes ratio to multiplication obtained from ICC.
Speeding scalar multiplication over binary elliptic curves using the new carry-less multiplication instruction

November 2011 · 398 Reads

The availability of a new carry-less multiplication instruction in the latest Intel desktop processors significantly accelerates multiplication in binary fields and hence presents the opportunity for reevaluating algorithms for binary field arithmetic and scalar multiplication over elliptic curves. We describe how to best employ this instruction in field multiplication and the effect on performance of doubling and halving operations. Alternate strategies for implementing inversion and half-trace are examined to restore most of their competitiveness relative to the new multiplier. These improvements in field arithmetic are complemented by a study on serial and parallel approaches for Koblitz and random curves, where parallelization strategies are implemented and compared. The contributions are illustrated with experimental results improving the state-of-the-art performance of halving and doubling-based scalar multiplication on NIST curves at the 112- and 192-bit security levels and a new speed record for side-channel-resistant scalar multiplication in a random curve at the 128-bit security level. The algorithms presented in this work were implemented on Westmere and Sandy Bridge processors, the latest generation Intel microarchitectures.

SPA-Resistant Binary Exponentiation with Optimal Execution Time

August 2011 · 33 Reads

Straightforward implementations of binary exponentiation algorithms make the cryptographic system vulnerable to side-channel attacks; specifically, to simple power analysis (SPA) attacks. Solutions proposed so far introduce a considerable performance penalty. In this paper, we present a new method that implements an SPA-resistant binary exponentiation exhibiting optimal execution time at the cost of a small amount of storage, $O(\sqrt{\ell})$, where $\ell$ is the bit length of the exponent. The technique is optimal in the sense that it adds SPA-resistance to an underlying binary exponentiation algorithm while introducing zero computational overhead. Furthermore, we show that for practical applications, the same optimal execution time can be achieved with much less storage space, without noticeably sacrificing security or any other aspect of the cryptosystem’s performance. We also discuss the possibility of our method being implemented in a way that a certain level of resistance against differential power analysis may be obtained.
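
The vulnerability described in the first sentence can be illustrated with a small Python sketch. It models an idealized SPA adversary who can tell a squaring ("S") from a multiplication ("M") in a power trace; that perfect distinguishability, and the helper names, are assumptions made for the example. The sketch shows the leak in the straightforward left-to-right algorithm, not the countermeasure proposed in the paper.

```python
def square_and_multiply_trace(base, exponent, modulus):
    """Left-to-right binary exponentiation (exponent > 0).

    Returns the result and the sequence of operations performed,
    which stands in for what an SPA adversary is assumed to observe.
    """
    result = 1
    trace = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus
        trace.append("S")
        if bit == "1":             # key-dependent extra operation
            result = (result * base) % modulus
            trace.append("M")
    return result, "".join(trace)


def exponent_from_trace(trace):
    """Recover the exponent: 'SM' encodes a 1 bit, a lone 'S' a 0 bit."""
    bits = []
    i = 0
    while i < len(trace):
        if i + 1 < len(trace) and trace[i + 1] == "M":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)


value, observed = square_and_multiply_trace(7, 0b1011, 1009)
leaked = exponent_from_trace(observed)
```

Because the multiplication happens only for 1 bits, the operation sequence is a direct encoding of the secret exponent, which is the dependency that SPA-resistant algorithms aim to remove.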

A new method of black box power analysis and a fast algorithm for optimal key search

December 2011 · 75 Reads

This paper suggests a new method of power analysis, similarity power analysis, which overcomes the numerics and complexity problems of the template attacks. Similarity power analysis learns characteristics of the device to attack in a profiling phase and is then able to determine a secret key from a single power trace. Similarity power analysis is a black box attack; it does not make any assumptions on the algorithm attacked or its implementation. Since similarity power analysis usually gives wrong results for a small number of key bits, it is supplemented with a new fast algorithm for optimal key search, which enables an attacker to try the keys with the highest probability of success first. Both similarity power analysis and the fast optimal key search algorithm were experimentally tried on DES.

Machine learning in side-channel analysis: A first study

December 2011 · 972 Reads

Electronic devices may undergo attacks going beyond traditional cryptanalysis. Side-channel analysis (SCA) is an alternative attack that exploits information leaking from physical implementations of e.g. cryptographic devices to discover cryptographic keys or other secrets. This work comprehensively investigates the application of a machine learning technique in SCA. The considered technique is a powerful kernel-based learning algorithm: the Least Squares Support Vector Machine (LS-SVM). The chosen side-channel is the power consumption and the target is a software implementation of the Advanced Encryption Standard. In this study, the LS-SVM technique is compared to Template Attacks. The results show that the choice of parameters of the machine learning technique strongly impacts the performance of the classification. In contrast, the number of power traces and time instants does not influence the results in the same proportion. This effect can be attributed to the usage of data sets with straightforward Hamming weight leakages in this first study.
