Conference Paper

XDIVINSA: eXtended DIVersifying INStruction Agent to Mitigate Power Side-Channel Leakage

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

No full-text available

Request Full-text Paper PDF

To read the full-text of this research,
you can request a copy directly from the authors.

Article
Full-text available
Over the course of recent years, microarchitectural side-channel attacks emerged as one of the most novel and thought-provoking attacks to exfiltrate information from computing hardware. These attacks leverage the unintended artefacts produced as side-effects to certain architectural design choices and proved difficult to be effectively mitigated without incurring significant performance penalties. In this work, we undertake a systematic mapping study of the academic literature related to the aforementioned attacks. We, in particular, pose four research questions and study 104 primary works to answer those questions. We inquire about the origins of artefacts leading up to exploitable settings of microarchitectural side-channel attacks; the effectiveness of the proposed countermeasures; and the lessons to be learned that would help build secure systems for the future. Furthermore, we propose a classification scheme that would also serve in the future for systematic mapping efforts in this scope.
Article
Full-text available
In this paper we present a novel true random number generator based on high-precision edge sampling. We use two novel techniques to increase the throughput and reduce the area of the proposed randomness source: variable-precision phase encoding and repetitive sampling. The first technique consists of encoding the oscillator phase with high precision in the regions around the signal edges and with low precision everywhere else. This technique results in a compact implementation at the expense of reduced entropy in some samples. The second technique consists of repeating the sampling at high frequency until the phase region encoded with high precision is captured. This technique ensures that only the high-entropy bits are sent to the output. The combination of the two proposed techniques results in a secure TRNG, which suits both ASIC and FPGA implementations. The core part of the proposed generator is implemented with 10 look-up tables (LUTs) and 5 flip-flops (FFs) of a Xilinx Spartan-6 FPGA, and achieves a throughput of 1.15 Mbps with 0.997 bits of Shannon entropy. On Intel Cyclone V FPGAs, this implementation uses 10 LUTs and 6 FFs, and achieves a throughput of 1.07 Mbps. This TRNG design is supported by a stochastic model and a formal security evaluation.
Presentation
Full-text available
Slides of the oral presentation of the corresponding paper (https://www.researchgate.net/publication/327495855_CIDPro_Custom_Instructions_for_Dynamic_Program_Diversification)
Conference Paper
Full-text available
Timing side-channel attacks pose a major threat to embedded systems due to their ease of accessibility. We propose CIDPro, a framework that relies on dynamic program diversification to mitigate timing side-channel leakage. The proposed framework integrates the widely used LLVM compiler infrastructure and the increasingly popular RISC-V FPGA soft-processor. The compiler automatically generates custom instructions in the security critical segments of the program, and the instructions execute on the RISC-V custom co-processor to produce diversified timing characteristics on each execution instance. CIDPro has been implemented on the Zynq7000 XC7Z020 FPGA device to study the performance overhead and security trade-offs. Experimental results show that our solution can achieve 80% and 86% timing side-channel capacity reduction for two benchmarks with an acceptable performance overhead compared to existing solutions. In addition, the proposed method incurs only a negligible hardware area overhead of 1% slices of the entire RISC-V system.
Conference Paper
Full-text available
We present a generic framework for runtime code polymorphism, applicable to a broad range of computing platforms including embedded systems with low computing resources (e.g. microcontrollers with few kilo-bytes of memory). Code polymorphism is defined as the ability to change the observable behaviour of a software component without changing its functional properties. In this paper we present the implementation of code polymorphism with runtime code generation, which offers many code transformation possibilities: we describe the use of random register allocation, random instruction selection, instruction shuffling and insertion of noise instructions. We evaluate the effectiveness of our framework against correlation power analysis: as compared to an unprotected implementation of AES where the secret key could be recovered in less than 50 traces in average, in our protected implementation, we increased the number of traces necessary to achieve the same attack by more than 20000\(\times \). With regards to the state of the art, our implementation shows a moderate impact in terms of performance overhead.
Conference Paper
Full-text available
Random delays are commonly used as a countermeasure to hinder side channel analysis and fault attacks in embedded devices. This paper proposes a different manner of generating random delays, that increases the desynchronisation compared to random delays whose lengths are uniformly distributed. It is also shown that it is possible to reduce the time lost due to the inclusion of random delays, while maintaining the increased desynchronisation.
Conference Paper
Full-text available
In a recent work, Mangard et al. showed that under certain assumptions, the (so-called) standard univariate side-channel attacks using a distance-of-means test, correlation analysis and Gaussian templates are essentially equivalent. In this paper, we show that in the context of multivariate attacks against masked implementations, this conclusion does not hold anymore. While a single distinguisher can be used to compare the susceptibility of different unprotected devices to first-order DPA, understanding second-order attacks requires to carefully investigate the information leakages and the adversaries exploiting these leakages, separately. Using a framework put forward by Standaert et al. at Eurocrypt 2009, we provide the first analysis that explores these two topics in the case of a masked implementation exhibiting a Hamming weight leakage model. Our results lead to refined intuitions regarding the efficiency of various practically-relevant distinguishers. Further, we also investigate the case of second- and third-order masking (i.e. using three and four shares to represent one value). This evaluation confirms that higher-order masking only leads to significant security improvements if the secret sharing is combined with a sufficient amount of noise. Eventually, we show that an information theoretic analysis allows determining this necessary noise level, for different masking schemes and target security levels, with high accuracy and smaller data complexity than previousmethods.
Conference Paper
Full-text available
In this paper, a new true random number generator (TRNG), based entirely on digital components is proposed. The design has been implemented using a fast random number generation method, which is dependent on a new type of ring oscillator with the ability to be set in metastable mode. Earlier methods of random number generation involved employment of jitter, whereas the proposed method leverages the metastability phenomenon in digital circuits and applies it to a ring oscillator. The new entropy employment method allows an increase in the TRNG throughput by significantly reducing the required entropy accumulating time. Samples obtained from simulation of TRNG design have been evaluated using AIS.31 and FIPS 140-1/2 statistical tests. The results of these tests have proven the high quality of generated data. Corners analysis of the TRNG design was also performed to estimate the robustness to technology process and environment variations. Investigated in FPGA technology, phase distribution highlighted the advantages of the proposed method over traditional architectures.
Conference Paper
Secure implementations against side channel attacks usually combine hiding and masking protections in software implementations. In this work, we focus on desynchronization protection which is considered as a hiding countermeasure. The idea of desynchronization is to obtain a non-predictable offset of the attacking point in terms of time dimension. For this purpose, we present exploiting pattern-recognition methods to filter interesting points for obtaining a successful side channel attack. Using this tool as a case study, we completely cancel the desynchronization effect of the CHES 2009/2010 countermeasure~\cite{coron2009efficient,coron2010analysis}. Moreover, 25k traces are needed for a successful key recoveries in case of polymorphism-based countermeasure~\cite{courousse2016runtime}.
Conference Paper
In this work, we analyze all existing RSA-CRT countermeasures against the Bellcore attack that use binary self-secure exponentiation algorithms. We test their security against a powerful adversary by simulating fault injections in a fault model that includes random, zeroing, and skipping faults at all possible fault locations. We find that most of the countermeasures are vulnerable and do not provide sufficient security against all attacks in this fault model. After investigating how additional measures can be included to counter all possible fault injections, we present three countermeasures which prevent both power analysis and many kinds of fault attacks.
Article
The Simon and Speck families of block cIPhers were designed specifically to offer security on constrained devices, where simplicity of design is crucial. However, the intended use cases are diverse and demand flexibility in implementation. Simplicity, security, and flexibility are ever-present yet conflicting goals in cryptographic design. This paper outlines how these goals were balanced in the design of Simon and Speck.
Article
This paper presents a new metric, which we call Signal Available to Attacker (SAVAT), that measures the side channel signal created by a specific single-instruction difference in program execution, i.e. The amount of signal made available to a potential attacker who wishes to decide whether the program has executed instruction/event A or instruction/event B. We also devise a practical methodology for measuring SAVAT in real systems using only user-level access permissions and common measurement equipment. Finally, we perform a case study where we measure electromagnetic (EM) emanations SAVAT among 11 different instructions for three different laptop systems. Our findings from these experiments confirm key intuitive expectations, e.g. That SAVAT between on-chip instructions and off-chip memory accesses tends to be higher than between two on-chip instructions. However, we find that particular instructions, such as integer divide, have much higher SAVAT than other instructions in the same general category (integer arithmetic), and that last-level-cache hits and misses have similar (high) SAVAT. Overall, we confirm that our new metric and methodology can help discover the most vulnerable aspects of a processor architecture or a program, and thus inform decision-making about how to best manage the overall side channel vulnerability of a processor, a program, or a system.
Conference Paper
True random number generators are essential components in cryptographic hardware. In this work, a novel entropy extraction method is used to improve throughput of jitter-based true random number generators on FPGA. By utilizing ultra-fast carry-logic primitives available on most commercial FPGAs, we have improved the efficiency of the entropy extraction, thereby increasing the throughput, while maintaining a compact implementation. Design steps and techniques are illustrated on an example of a ring-oscillator based true random number generator implemented on Spartan-6. In this design, the required accumulation time is reduced by 3 orders of magnitude compared to the most efficient oscillator-based TRNG on the same FPGA. The presented implementation occupies only 67 slices, achieves a throughput of 14.3 Mbps and it is provided with a formal evaluation of security.
Article
Side-channel attacks (SCAs), such as differential power analysis or differential electromagnetic analysis, pose a serious threat to the security of embedded systems. In the literature, few articles address the problem of securing general purpose processors (GPPs) with resourceful countermeasures. However, in many low-cost applications, where security is not critical, cryptographic algorithms are typically implemented in software. Since it has been proved that GPPs are vulnerable to SCAs, it is desirable to develop efficient mechanisms to ensure a certain level of security. In this paper, we extend side-channel countermeasures to the register transfer level description. The challenge is to create a new class of processor that executes embedded software applications, which are intrinsically protected against SCAs. For that purpose, we first investigate how to integrate into the datapath two countermeasures based on masking and hiding approaches. Through an FPGA-based processor, we then evaluate the overhead and the effectiveness of the proposed solutions against time-domain first-order attacks. We finally show that a suitable combination of countermeasures significantly increases the side-channel resistance in a cost-effective way.
Article
Because two types of side-channel attacks, namely passive information leakages and active fault injections, are considered separate implementation threats to cryptographic modules, most countermeasures against these attacks have been independently developed. However, Amiel et al. demonstrated that a fault injection combined with a simple power analysis (SPA) can break such a classical Rivest, Shamir, and Adelman (RSA) system implementation. In this paper, we show that this combined attack (CA) can be applied to the Boscher, Naciri, and Prouff algorithm, which is an SPA/fault attack (FA)-resistant exponentiation method for RSA implementation. Furthermore, this paper proposes a novel exponentiation algorithm resistant to power analysis and an FA as well as to the CA. The proposed exponentiation algorithm can be employed for secure Chinese remainder theorem-RSA implementation. In addition, the paper presents some experimental results of an SPA under the assumption of a successful fault injection.
Article
There have been many attacks that exploit side-effects of program execution to expose secret information and many proposed countermeasures to protect against these attacks. However there is currently no systematic, holistic methodology for understanding information leakage. As a result, it is not well known how design decisions affect information leakage or the vulnerability of systems to side-channel attacks. In this paper, we propose a metric for measuring information leakage called the Side-channel Vulnerability Factor (SVF). SVF is based on our observation that all side-channel attacks ranging from physical to microarchitectural to software rely on recognizing leaked execution patterns. SVF quantifies patterns in attackers' observations and measures their correlation to the victim's actual execution patterns and in doing so captures systems' vulnerability to side-channel attacks. In a detailed case study of on-chip memory systems, SVF measurements help expose unexpected vulnerabilities in whole-system designs and shows how designers can make performance-security trade-offs. Thus, SVF provides a quantitative approach to secure computer architecture.
Article
There have been many attacks that exploit side-effects of program execution to expose secret information and many proposed countermeasures to protect against these attacks. However there is currently no systematic, holistic methodology for understanding information leakage. As a result, it is not well known how design decisions affect information leakage or the vulnerability of systems to side-channel attacks. In this paper, we propose a metric for measuring information leakage called the Side-channel Vulnerability Factor (SVF). SVF is based on our observation that all sidechannel attacks ranging from physical to microarchitectural to software rely on recognizing leaked execution patterns. SVF quantifies patterns in attackers' observations and measures their correlation to the victim's actual execution patterns and in doing so captures systems' vulnerability to side-channel attacks. In a detailed case study of on-chip memory systems, SVF measurements help expose unexpected vulnerabilities in whole-system designs and shows how designers can make performance-security trade-offs. Thus, SVF provides a quantitative approach to secure computer architecture.
Article
Summary This application note describes the implementation of Linear Feedback Shift Registers (LFSR) using the SRL macroavailable in the Virtex™ and Virtex-II series of FPGAs. The optimal implementation of a 15-bit LFSR, a 52-bit LFSR, and a 118-bit LFSR are discussed in this application note. Introduction The Virtex series of FPGAs have an SRL (Shift Register LUT) macro. This macro implements very efficient shift registers varying in length from one to sixteen bits as determined by the address lines. The length can be either set to a static value or it can be changed dynamically. This application note describes the use of the SRL to implement Linear Feedback Shift Registers in Virtex series devices. The configurable elements are called Configuration Logic Blocks (CLBs). Each Virtex series CLB contains four logic cells, organized into two slices. A logic cell includes a 4-input look-up table, carry logic, and a storage element. Each CLB in a Virtex-II device has four identical slices. Each slice contains 2, 4-input LUTs, two registers, carry logic, and other dedicated logic. The Virtex-II devices have a similar macro, a selectable SRL that utilizes one LUT to implement a 16-bit shift register. This macro has two outputs, one is the dedicated output from the 16th register, and the other is selected using the address lines. This enables the macros to be cascadable to implement 32-bit, 64-bit, and 128-bit shift registers in one CLB.
Article
ChaCha8 is a 256-bit stream cipher based on the 8-round cipher Salsa20/8. The changes from Salsa20/8 to ChaCha8 are designed to improve diffusion per round, conjecturally increasing resistance to cryptanalysis, while preserving—and often improving—time per round. ChaCha12 and ChaCha20 are analogous modifications of the 12-round and 20-round ciphers Salsa20/12 and Salsa20/20. This paper presents the ChaCha family and explains the differences between Salsa20 and ChaCha.
Conference Paper
The silicon industry has lately been focusing on side chan- nel attacks, that is attacks that exploit information that leaks from the physical devices. Although dieren t countermeasures to thwart these at- tacks have been proposed and implemented in general, such protections do not make attacks infeasible, but increase the attacker's experimental (data acquisition) and computational (data processing) workload beyond reasonable limits. This paper examines dieren t ways to attack devices featuring random process interrupts and noisy power consumption.
Conference Paper
Random delays are often inserted in embedded software to protect against side-channel and fault attacks. At CHES 2009 a new method for generation of random delays was described that increases the attacker’s uncertainty about the position of sensitive operations. In this paper we show that the CHES 2009 method is less secure than claimed. We describe an improved method for random delay generation which does not suffer from the same security weakness. We also show that the paper’s criterion to measure the security of random delays can be misleading, so we introduce a new criterion for random delays which is directly connected to the number of acquisitions required to break an implementation. We mount a power analysis attack against an 8-bit implementation of the improved method verifying its higher security in practice.
Conference Paper
To prevent smart card attacks using Differential Power Analysis (DPA), manufacturers commonly implement DPA countermeasures that create misalignment in power trace sets and decrease the effectiveness of DPA. We design and investigate the elastic alignment algorithm for non-linearly warping trace sets in order to align them. Elastic alignment uses FastDTW, originally a method for aligning speech utterances in speech recognition systems, to obtain so-called warp paths that can be used to perform alignment. We show on traces obtained from a smart card with random process interrupts that misalignment is reduced significantly, and that even under an unstable clock the algorithm is able to perform alignment.
Conference Paper
Resistance against side-channel analysis (SCA) attacks is an important requirement for many secure embedded systems. Microprocessors and microcontrollers which include suitable countermeasures can be a vital building block for such systems. In this paper, we present a detailed concept for building embedded processors with SCA countermeasures. Our concept is based on ideas for the secure implementation of cryptographic instruction set extensions. On the one hand, it draws from known SCA countermeasures like DPA-resistant logic styles. On the other hand, our protection scheme is geared towards use in modern embedded applications like PDAs and smart phones. It supports multitasking and a separation of secure system software and (potentially insecure) user applications. Furthermore, our concept affords support for a wide range of cryptographic algorithms. Based on this concept, embedded processor cores with support for a selected set of cryptographic algorithms can be built using a fully automated design flow.
Article
An encryption method is presented with the novel property that publicly re- vealing an encryption key does not thereby reveal the corresponding decryption key. This has two important consequences: 1. Couriers or other secure means are not needed to transmit keys, since a message can be enciphered using an encryption key publicly revealed by the intended recipient. Only he can decipher the message, since only he knows the corresponding decryption key. 2. A message can be \signed" using a privately held decryption key. Anyone can verify this signature using the corresponding publicly revealed en- cryption key. Signatures cannot be forged, and a signer cannot later deny the validity of his signature. This has obvious applications in \electronic mail" and \electronic funds transfer" systems. A message is encrypted by representing it as a number M, raising M to a publicly specied
Article
An encryption method is presented with the novel property that publicly revealing an encryption key does not thereby reveal the corresponding decryption key. This has two important consequences: Couriers or other secure means are not needed to transmit keys, since a message can be enciphered using an encryption key publicly revealed by the intended recipient. Only he can decipher the message, since only he knows the corresponding decryption key. A message can be “signed” using a privately held decryption key. Anyone can verify this signature using the corresponding publicly revealed encryption key. Signatures cannot be forged, and a signer cannot later deny the validity of his signature. This has obvious applications in “electronic mail” and “electronic funds transfer” systems. A message is encrypted by representing it as a number M, raising M to a publicly specified power e, and then taking the remainder when the result is divided by the publicly specified product, n , of two large secret prime numbers p and q. Decryption is similar; only a different, secret, power d is used, where e * d = 1(mod (p - 1) * (q - 1)). The security of the system rests in part on the difficulty of factoring the published divisor, n .
Article
Embedded cryptographic systems, such as smart cards, require secure implementations that are robust to a variety of low-level attacks. Side-Channel Attacks (SCA) exploit the information such as power consumption, electromagnetic radiation and acoustic leaking through the device to uncover the secret information. Attackers can mount successful attacks with very modest resources in a short time period. Therefore, many methods have been proposed to increase the security against SCA. Randomizing the execution order of the instructions that are independent, i.e., random shuffling, is one of the most popular among them. Implementing instruction shuffling in software is either implementation specific or has a significant performance or code size overhead. To overcome these problems, we propose in this work a generic custom hardware unit to implement random instruction shuffling as an extension to existing processors. The unit operates between the CPU and the instruction cache (or memory, if no cache exists), without any modification to these components. Both true and pseudo random number generators are used to dynamically and locally provide the shuffling sequence. The unit is mainly designed for in-order processors, since the embedded devices subject to these kind of attacks use simple in-order processors. More advanced processors (e.g., superscalar, VLIW or EPIC processors) are already more resistant to these attacks because of their built-in ILP and wide word size. Our experiments on two different soft in-order processor cores, i.e., OpenRISC and MicroBlaze, implemented on FPGA show that the proposed unit could increase the security drastically with very modest resource overhead. With around 2% area, 1.5% power and no performance overhead, the shuffler increases the effort to mount a successful power analysis attack on AES software implementation over 360 times.
Article
Montgomery multiplication methods constitute the core of modular exponentiation, the most popular operation for encrypting and signing digital data in public-key cryptography. In this article, we study the operations involved in computing the Montgomery product, describe several high-speed, space-efficient algorithms for computing MonPro(a, b), and analyze their time and space requirements. Our focus is to collect several alternatives for Montgomery multiplication, three of which are new. However, we do not compare the Montgomery techniques to other modular multiplication approaches
Article
. Cryptosystem designers frequently assume that secrets will be manipulated in closed, reliable computing environments. Unfortunately, actual computers and microchips leak information about the operations they process. This paper examines specific methods for analyzing power consumption measurements to find secret keys from tamper resistant devices. We also discuss approaches for building cryptosystems that can operate securely in existing hardware that leaks information. Keywords: differential power analysis, DPA, SPA, cryptanalysis, DES 1 Background Attacks that involvemultiple parts of a security system are difficult to predict and model. If cipher designers, software developers, and hardware engineers do not understand or review each other's work, security assumptions made at each level of a system's design may be incomplete or unrealistic. As a result, security faults often involveunanticipated interactions between components designed by different people. Manytechniques ...
A Proposal for: Functionality classes for random number generators
  • W Killmann
  • W Schindler
NIST special publication 800-22 revision 1a
  • A Rukhin
  • J Soto
  • J Nechvatal
  • M Smid
  • E Barker
  • S Leigh
A testing methodology for side-channel resistance validation
  • G Goodwill
  • B Jun
  • J Jaffe
  • P Rohatgi
NIST special publication 800-22 revision 1a
  • rukhin
A testing methodology for side-channel resistance validation
  • goodwill