Conference Paper

A lightweight ISE for ChaCha on RISC-V

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

No full-text available

Request Full-text Paper PDF

To read the full-text of this research,
you can request a copy directly from the authors.

... In addition to utilizing the C programing language for pure software implementations [18], it seemed that the preferred implementation approach also involved using an ISE. The work [19] introduced a lightweight ISE designed to support the ChaCha stream cipher in RISC-V architectures, which achieved a speedup gain of at least 5.4 times compared to the OpenSSL baseline and 3.4 times compared to an ISA optimized implementation. The study [20] focused on achieving secure and efficient execution of AES by separating different ISEs for 32-bit and 64-bit, which demonstrated significant performance improvements for AES-128 block encryption. ...
Article
Full-text available
The security of Internet of Things (IoTs) devices in recent years has created interest in developing implementations of lightweight cryptographic algorithms for such systems. Additionally, open-source hardware and field-programable gate arrays (FPGAs) are gaining traction via newly developed tools, frameworks, and HDLs. This enables new methods of creating hardware and systems faster, more simply, and more efficiently. In this paper, the implementation of a system-on-chip (SoC) based on a 32-bit RISC-V processor with lightweight cryptographic accelerator cores in FPGA and an open-source integrating framework is presented. The system consists of a 32-bit VexRiscv processor, written in SpinalHDL, and lightweight cryptographic accelerator cores for the PRINCE block cipher, the PRESENT-80 block cipher, the ChaCha stream cipher, and the SHA3-512 hash function, written in Verilog HDL and optimized for low latency with fewer clock cycles. The primary aim of this work was to develop a customized SoC platform with a register-controlled bus suitable for integrating lightweight cryptographic cores to become compact embedded systems that require encryption functionalities. Additionally, custom firmware was developed to verify the functionality of the SoC with all integrated accelerator cores, and to evaluate the speed of cryptographic processing. The proposed system was successfully implemented in a Xilinx Nexys4 DDR FPGA development board. The resources of the system in the FPGA were low with 11,830 LUTs and 9552 FFs. The proposed system can be applicable to enhancing the security of Internet of Things systems.
... An efficient way to implement these instructions in hardware is proposed in [14]. An ISE [15] for RISC-V is proposed to accelerate a stream-cipher called ChaCha [16]. Compared to OpenSSL baseline and ISA-based optimized implementations, ISE-assisted ChaCha speeds up at least 5.4× and 3.4×, respectively. ...
Article
Full-text available
The ever-increasing need for securing computing systems using cryptographic algorithms is spurring interest in the efficient implementation of common algorithms. While the algorithms can be implemented in software using base instruction sets, there is considerable potential to reduce memory cost and improve speed using specialized instructions and associated hardware. However, there is a need to assess the benefits and costs of software implementations and new instructions that implement key cryptographic algorithms in fewer cycles. The primary aim of this paper is to improve the understanding of the performance and cost of implementing cryptographic algorithms for the RISC-V instruction set architecture (ISA) in two cases: software implementations of the algorithms using the rv32i instruction set and using cryptographic instructions supported by dedicated hardware in additional functional units. For both cases, we describe a RISC-V processor with cryptography hardware extensions and hand-optimized RISC-V assembly language implementations of eleven cryptographic algorithms. Compared to implementations with only the rv32i instruction set, implementations with the cryptography set extension provide a 1.5× to 8.6× faster execution speed and 1.2× to 5.8× less program memory for five of the eleven algorithms. Based on our performance analyses, a new instruction is proposed to increase the implementation efficiency of the algorithms.
Article
Full-text available
Secure, efficient execution of AES is an essential requirement on most computing platforms. Dedicated Instruction Set Extensions (ISEs) are often included for this purpose. RISC-V is a (relatively) new ISA that lacks such a standardized ISE. We survey the state-of-the-art industrial and academic ISEs for AES, implement and evaluate five different ISEs, one of which is novel. We recommend separate ISEs for 32 and 64-bit base architectures, with measured performance improvements for an AES-128 block encryption of 4x and 10x with a hardware cost of 1.1K and 8.2K gates respectively, when compared to a software-only implementation based on use of T-tables. We also explore how the proposed standard bit-manipulation extension to RISC-V can be harnessed for efficient implementation of AES-GCM. Our work supports the ongoing RISC-V cryptography extension standardisation process.
Article
Full-text available
Ge et al. [GYH18] propose the augmented ISA (or aISA), a central tenet of which is the selective exposure of micro-architectural resources via a less opaque abstraction than normal. The aISA proposal is motivated by the need for control over such resources, for example to implement robust countermeasures against microarchitectural attacks. In this paper, we apply an aISA-style approach to challenges stemming from analogue micro-architectural leakage; examples include power-based Hamming weight and distance leakage from relatively fine-grained resources (e.g., pipeline registers), which are not exposed in, and so cannot be reliably controlled via, a normal ISA. Specifically, we design, implement, and evaluate an ISE named FENL: the ISE acts as a fence for leakage, preventing interaction between, and hence leakage from, instructions before and after it in program order. We demonstrate that the implementation and use of FENL has relatively low overhead, and represents an effective tool for systematically localising and reducing leakage.
Article
Full-text available
Microarchitectural timing channels expose hidden hardware states though timing. We survey recent attacks that exploit microarchitectural features in shared hardware, especially as they are relevant for cloud computing. We classify types of attacks according to a taxonomy of the shared resources leveraged for such attacks. Moreover, we take a detailed look at attacks used against shared caches. We survey existing countermeasures. We finally discuss trends in attacks, challenges to combating them, and future directions, especially with respect to hardware support.
Conference Paper
Full-text available
The extension of a given instruction-set with specialized instructions has become a common technique used to speed up the execution of applications. By identifying computationally intensive portions of an application to be partitioned in segments of code to execute in software and segments of code to execute in hardware, the execution of an application can be considerably speeded up. Each segment of code implemented in hardware can then be seen as a specialized application-specific instruction extending a given instruction-set. Although a number of approaches exist in literature proposing different methodologies to customize an instruction-set, the description of the problem consists only of sporadic comparisons limited to isolated problems. This survey presents a unique detailed description of the problem and provides an exhaustive overview of the research in the past years in instruction-set extension. This article presents a thorough analysis of the issues involved during the customization of an instruction-set by means of a set of specialized application-specific instructions. The investigation of the problem covers both instruction generation and instruction selection and different kinds of customizations are analyzed in a great detail.
Chapter
S-boxes are the only source of non-linearity in many symmetric primitives. While they are often defined as being functions operating on a small space, some recent designs propose the use of much larger ones (e.g., 32 bits). In this context, an S-box is then defined as a subfunction whose cryptographic properties can be estimated precisely.
Chapter
RISC-V is a promising free and open-source instruction set architecture. Most of the instruction set has been standardized and several hardware implementations are commercially available. In this paper we highlight features of RISC-V that are interesting for optimizing implementations of cryptographic primitives. We provide the first optimized assembly implementations of table-based AES, bitsliced AES, ChaCha, and the Keccak-\(f\)[1600] permutation for the RV32I instruction set. With respect to public-key cryptography, we study the performance of arbitrary-precision integer arithmetic without a carry flag. We then estimate the improvement that can be gained by several RISC-V extensions. These performance studies also serve to aid design choices for future RISC-V extensions and implementations.
Article
The Simon and Speck families of block cIPhers were designed specifically to offer security on constrained devices, where simplicity of design is crucial. However, the intended use cases are diverse and demand flexibility in implementation. Simplicity, security, and flexibility are ever-present yet conflicting goals in cryptographic design. This paper outlines how these goals were balanced in the design of Simon and Speck.
Conference Paper
This paper describes software optimization for the stream Cipher ChaCha. We leverage the wide vectorization capabilities of the new AVX2 architecture, to speed up ChaCha encryption (and decryption) on the latest x86_64 processors. In addition, we show how to apply vectorization for the future AVX512 architecture, and get further speedup. This leads to significant performance gains. For example, on the latest Intel Haswell microarchitecture, our AVX2 implementation performs at 1.43 cycles per byte (on a 4KB message), which is ~2x faster than the current implementation in the Chromium project.
Article
ChaCha8 is a 256-bit stream cipher based on the 8-round cipher Salsa20/8. The changes from Salsa20/8 to ChaCha8 are designed to improve diffusion per round, conjecturally increasing resistance to cryptanalysis, while preserving—and often improving—time per round. ChaCha12 and ChaCha20 are analogous modifications of the 12-round and 20-round ciphers Salsa20/12 and Salsa20/20. This paper presents the ChaCha family and explains the differences between Salsa20 and ChaCha.
The Transport Layer Security (TLS) Protocol Version 1.3
  • E Rescorla
Yosys open synthesis suite
  • C Wolf
The rocket chip generator
  • K Asanović
  • R Avizienis
  • J Bachrach
  • S Beamer
  • D Biancolin
  • C Celio
  • H Cook
  • D Dabbelt
  • J Hauser
  • A Izraelevitz
  • S Karandikar
  • B Keller
  • D Kim
  • J Koenig
  • Y Lee
  • E Love
  • M Maas
  • A Magyar
  • H Mao
  • M Moreto
  • A Ou
  • D Patterson
  • B Richards
  • C Schmidt
  • S Twigg
  • H Vo
  • A Waterman
The Open Source toolkit for SSL/TLS
  • Openssl
Instruction sets should be free: The case for RISC-V
  • K Asanović
  • D Patterson
Efficient simultaneous deployment of multiple lightweight authenticated ciphers
  • B Rezvani
  • T Conroy
  • L Beckwith
  • M Bozzay
  • T Laffoon
  • D Mcfeeters
ChaCha20 and Poly1305 for IETF Protocols
  • Y Nir
  • A Langley
Alzette: a 64-bit ARX-box (feat. CRAX and TRAX)
  • C Beierle
  • A Biryukov
  • L C Santos
  • J Groschdl
  • L Perrin
  • A Udovenko
Efficient simultaneous deployment of multiple lightweight authenticated ciphers
  • rezvani
ChaCha20-Poly1305 Cipher Suites for Transport Layer Security (TLS)
The rocket chip generator
  • K Asanović
  • R Avizienis
  • J Bachrach
  • S Beamer
  • D Biancolin
  • C Celio