FIGURE 7 - uploaded by William Fornaciari
Execution time (in milliseconds) of QC-LDPC BF decoding. Results are shown for software decoding on the Intel i7 processor and for hardware decoding on the Artix-7 12 and Artix-7 200 FPGAs.
Source publication
Considering code-based cryptography, quasi-cyclic low-density parity-check (QC-LDPC) codes are foreseen as one of the few solutions to design post-quantum cryptosystems. The bit-flipping algorithm is at the core of the decoding procedure of such codes when used to design cryptosystems. An effective design must account for the computational complexi...
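As a rough illustration of the bit-flipping idea mentioned in the abstract, here is a minimal Python sketch. It is not the paper's decoder: the toy 3x3 row/column parity code, the names `GRID_CHECKS`, `syndrome`, and `bit_flip_decode`, and the rule of flipping the bits with the highest count of unsatisfied checks are all assumptions chosen for brevity. Real QC-LDPC decoders work on much larger quasi-cyclic matrices and use their own threshold rules.

```python
# Toy parity-check structure (assumption, not from the paper): 9 bits laid out
# as a 3x3 grid, with one parity check per row and one per column.
GRID_CHECKS = [
    [0, 1, 2], [3, 4, 5], [6, 7, 8],   # row checks
    [0, 3, 6], [1, 4, 7], [2, 5, 8],   # column checks
]


def syndrome(checks, word):
    """Return one bit per check: 1 if the check is unsatisfied, else 0."""
    return [sum(word[j] for j in check) % 2 for check in checks]


def bit_flip_decode(checks, received, max_iters=10):
    """Hard-decision bit flipping.

    Each iteration counts, for every bit, how many unsatisfied checks it
    takes part in, then flips the bits whose count equals the maximum.
    Returns (word, success_flag).
    """
    word = list(received)
    for _ in range(max_iters):
        s = syndrome(checks, word)
        if not any(s):
            return word, True
        unsatisfied = [0] * len(word)
        for check, bit in zip(checks, s):
            if bit:
                for j in check:
                    unsatisfied[j] += 1
        threshold = max(unsatisfied)
        word = [b ^ (u == threshold) for b, u in zip(word, unsatisfied)]
    return word, not any(syndrome(checks, word))


if __name__ == "__main__":
    codeword = [1, 1, 0, 1, 1, 0, 0, 0, 0]
    noisy = list(codeword)
    noisy[4] ^= 1  # inject a single bit error
    print(bit_flip_decode(GRID_CHECKS, noisy))
```

In this toy code every bit belongs to exactly two checks, so a single error is the only bit touching two unsatisfied checks and is corrected in one iteration.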
Contexts in source publication
Context 1
... as explained in the following, such decoder implementations outperform optimized software-implemented decoders employing the Intel AVX2 extension by 5 times, on average. Performance results - Figure 7 reports the performance results expressed as the execution time to complete the bit-flipping decoding procedure for all the LEDAcrypt configurations. The results are reported for each configuration considering the two software implementations, i.e., C11 and AVX2, and the two hardware implementations, which target the Xilinx Artix-7 12 and Artix-7 200 FPGAs, respectively. ...
Context 2
... -7 12 Artix-7 200
BW P AR BW P AR
C1 32 4 128 24
C2 32 4 128 32
C3 32 4 128 32
C4 32 2 128 24
C5 32 4 128 24
C6 32 4 128 24
C7 32 1 128 24
C8 32 2 128 24
C9 32 1 128 24
the decoding of C1 and C9, respectively (see Figure 7). In order to highlight the actual performance speedup across the different implementations of the decoding procedure, Figure 8 reports the performance speedup of the AVX2 software and of the two hardware implementations, normalized with respect to the C11 software version. ...
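The normalization described in this excerpt (speedup relative to the C11 software version) amounts to dividing the baseline execution time by each implementation's execution time. A minimal Python sketch follows; the function name `normalized_speedup` is hypothetical, and the timings in the usage example are placeholders, not values from Figure 7 or Figure 8.

```python
def normalized_speedup(times_ms, baseline):
    """Speedup of each implementation relative to the baseline.

    times_ms maps an implementation name to its execution time in ms.
    A value above 1.0 means faster than the baseline.
    """
    base = times_ms[baseline]
    return {name: base / t for name, t in times_ms.items()}


if __name__ == "__main__":
    # Placeholder timings for illustration only (NOT measured data).
    example = {"C11": 8.0, "AVX2": 2.0, "Artix-7 200": 0.5}
    print(normalized_speedup(example, baseline="C11"))
```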
Similar publications
The design of cryptographic engines for the Internet of Things (IoT) edge devices and other ultralightweight devices is a crucial challenge. The emergence of such resource-constrained devices raises significant challenges to current cryptographic algorithms. PHOTON is an ultra-lightweight cryptographic hash function targeting low-resource devices....
Citations
... The hardware components implementing binary polynomial inversion [17], binary polynomial multiplication [4], and Black-Gray-Flip (BGF) decoding [31], i.e., the three most complex operations employed within the BIKE cryptosystem, were specifically designed in a parametric way to exploit parallelism as desired according to the performance requirements and the area constraints given by the target platform. Their designs, meant for FPGA targets, are suitable not only for accelerating the BIKE post-quantum KEM but more in general for other applications making use of large binary polynomials and QC-MDPC codes. ...
... Black-Gray-Flip decoding - The decoding component implements the BGF decoding algorithm [11], a variant of the baseline QC-MDPC bit-flipping decoding algorithm. The BGF algorithm iterates the computation of two multiplications, performed respectively in the integer and binary domains, between a dense polynomial operand and a sparse one [31]. The two dense-sparse multiplications are performed concurrently in a pipelined fashion, and the number of the bits computed in parallel in both is configurable by the designer [4]. ...
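The two dense-sparse multiplications mentioned in this excerpt can be sketched in software as cyclic polynomial products modulo x^p - 1, where the sparse operand is stored as a list of exponents. The Python sketch below is an illustrative assumption: the function name `cyclic_mul_sparse`, the data layout, and the sequential loop are not taken from the cited hardware design, which computes several bits in parallel.

```python
def cyclic_mul_sparse(dense, sparse_positions, binary):
    """Multiply a dense polynomial by a sparse one modulo x^p - 1.

    dense            : list of p coefficients (0/1), index = exponent
    sparse_positions : exponents of the nonzero terms of the sparse operand
    binary           : True  -> coefficients accumulate with XOR (GF(2))
                       False -> coefficients accumulate as integers
    """
    p = len(dense)
    result = [0] * p
    for pos in sparse_positions:
        # Each sparse term x^pos contributes a cyclic shift of the dense operand.
        for i, coeff in enumerate(dense):
            if coeff:
                k = (i + pos) % p
                if binary:
                    result[k] ^= coeff
                else:
                    result[k] += coeff
    return result


if __name__ == "__main__":
    dense = [1, 1, 0, 0, 0]   # 1 + x, with p = 5
    sparse = [0, 1]           # 1 + x
    print(cyclic_mul_sparse(dense, sparse, binary=False))  # integer domain
    print(cyclic_mul_sparse(dense, sparse, binary=True))   # binary domain
```

The cost of this formulation grows with the Hamming weight of the sparse operand rather than with p squared, which is why a low-weight operand makes a dedicated dense-sparse multiplier attractive.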
Post-quantum cryptography aims to design cryptosystems that can be deployed on traditional computers and resist attacks from quantum computers, which are widely expected to break the currently deployed public-key cryptography solutions in the upcoming decades. Providing effective hardware support is crucial to ensuring a wide adoption of post-quantum cryptography solutions, and it is one of the requirements set by the USA’s National Institute of Standards and Technology within its ongoing standardization process. This research delivers a configurable FPGA-based hardware architecture to support BIKE, a post-quantum QC-MDPC code-based key encapsulation mechanism. The proposed architecture is configurable through a set of architectural and code parameters, which make it efficient, providing good performance while using the resources available on FPGAs effectively, flexible, allowing to support different large QC-MDPC codes defined by the designers of the cryptosystem, and scalable, targeting the whole Xilinx Artix-7 FPGA family. Two separate modules target the cryptographic functionality of the client and server nodes of the quantum-resistant key exchange, respectively, and a complexity-based heuristic that leverages the knowledge of the time and space complexity of the configurable hardware components steers the design space exploration to identify their best parameterization. The proposed architecture outperforms the state-of-the-art reference software that exploits the Intel AVX2 extension and runs on a desktop-class CPU by 1.77 and 1.98 times, respectively, for AES-128- and AES-192-equivalent security instances of BIKE, and it provides a speedup of more than six times compared to the fastest reference state-of-the-art hardware architecture, which targets the same FPGA family.
... Cryptography is one of the critical components against adversaries. Moreover, it is employed to ensure the confidentiality and integrity score of the data and give entities participating in communication anonymity and authentication [19]. Moreover, in these modern smart facilities, mathematical operations such as addition, multiplication, integration, and so on enriched the crypto operations [20]. ...
... Zoni et al. [19] presented a scalable and efficient structure for implementing bit-flipping functions targeting highly lightweight codes for post-quantum-based cryptography. Moreover, the lightweight cryptosystem has nine configurations employed for the implementation process. ...
The threat application and data hacking systems rule today’s digital world. Hence, securing digital information is essential to keep personal details confidential. Several advanced crypto techniques were implemented in the past, resulting in the finest securing outcome. However, those models required additional features and maintenance based on the utilized applications. Considering these difficulties, Data-encryption-standard has been implemented, and the security functions were enriched by performing the Bernoulli mathematical model. So, the presented technique is named a novel Bernoulli Data-encryption-standard (B-DES). Here, the Bernoulli function has been performed after the XOR process, and then during the decryption process, the function of the novel B-DES has been reversed. Moreover, the designed encryption approach is checked with different attacks like brute force attack, known-plaintext analysis, chosen-plaintext analysis, man-in-the-middle attack, and ciphertext analysis. The power usage of the designed scheme is 193 mW, 7% lower than that of other models. Moreover, the memory utilization is 0.6 kB, 3% lower than that of other models. The delay recorded by the designed model is 3.43 ns, 3% lower than that of other models.
... The work in [15] presented another FPGA-based implementation of BIKE, split into two components devoted to supporting the client-side (key generation and decapsulation) and server-side (encapsulation) primitives. The client and server cores integrated highly configurable hardware accelerators for binary polynomial multiplication [5,31] and inversion [16] and BGF decoding [30]. Setting different parameters for the configurable accelerators allowed the authors to implement the client and server cores on FPGAs ranging from Artix-7 35 to Artix-7 200. ...
NIST is conducting a process for the standardization of post-quantum cryptosystems, i.e., cryptosystems that are resistant to attacks by both traditional and quantum computers and that can thus substitute the traditional public-key cryptography solutions which are expected to be broken by quantum computers in the next decades. This manuscript provides an overview and a comparison of the existing state-of-the-art implementations of the BIKE QC-MDPC code-based post-quantum KEM, a candidate in NIST's PQC standardization process. We consider software, hardware, and mixed hardware-software implementations and evaluate their performance and, for hardware ones, their resource utilization. Traditional public-key cryptosystems (PKC), including RSA [27], ECDSA [6], and Diffie-Hellman [11], underpin cryptographically secure key exchange mechanisms and digital signature schemes. Such cryptoschemes are however expected to be broken by quantum computers in the upcoming decades [23]. The threat posed by quantum computers requires the definition and the design of alternative cryptosystems that perform the same functions as PKC ones, maintaining security against traditional computer attacks while ensuring security against quantum computer attacks. Post-quantum cryptography (PQC) aims to develop cryptosystems that are resistant to both traditional attacks and new quantum attack models, which can be implemented on traditional architecture computers and existing devices, and that can be integrated into the networks and communication protocols currently in use [7].
... In all other terms, given the ubiquitous use of these devices and their ability to communicate with one another without the limited funding, data protection and identification are most pressing concerns for energy various measures. Traditional approaches for authenticated key agreements that use a key exchange protocol [18][19][20] really aren't appropriate again for architecture of the device system for various purposes: A need of significant processing burden even during administration of effective key methods is among the main factors, as is the overburdening of replying to every device queries independently. ...
... This version will be suited to use Posits for both storage and inference. Beside the implementation of accelerators for new data formats, other analysis will be carried out to provide basic blocks to implement novel cryptosystems onto FPGAs, like multipliers for large binary polynomials [76] and decoders for post-quantum cryptography [75]. ...
In the near future, Exascale systems will need to bridge three technology gaps to achieve high performance while remaining under tight power constraints: energy efficiency and thermal control; extreme computation efficiency via HW acceleration and new arithmetic; methods and tools for seamless integration of reconfigurable accelerators in heterogeneous HPC multi-node platforms. TEXTAROSSA addresses these gaps through a co-design approach to heterogeneous HPC solutions, supported by the integration and extension of HW and SW IPs, programming models, and tools derived from European research.
... Contributions - This paper presents a complete hardware implementation of BIKE that targets the Xilinx Artix-7 family of FPGAs and supports client and server KEM operations in a quantum-resistant TLS [16]. The proposed architecture leverages a set of state-of-the-art configurable accelerators [17]-[19] that implement the key operations of the KEM primitives to provide the best hardware support. Our architecture is evaluated against the reference AVX2 software [20] and FPGA-based hardware [21] implementations of BIKE. ...
... [26] discusses a Karatsuba multiplier for dense binary polynomials, whose performance is, however, insufficient compared to a multiplier tailored to dense-sparse multiplication, when one of the two operands has a low Hamming weight. [19] presented a bit-flipping decoder for generic QC-MDPC codes that is highly configurable in terms of bandwidth and degree of parallelism, allowing it to scale across a range of FPGA targets. [27] also proposed a bit-flipping decoder for QC-MDPC codes, that is, however, only configurable in the bandwidth of its datapath. ...
... The scalability of the client and server architectures is obtained by mixing state-of-the-art configurable components for the most complex operations with hard-coded ones. The client and server architectures employ configurable components to implement binary polynomial inversion [17], binary polynomial multiplication [18], and QC-MDPC bit-flipping decoding [19]. The decoding component was adapted to implement the Black-Gray-Flip (BGF) decoding algorithm employed by BIKE and introduced in [28]. ...
The recent advances in quantum computers impose the adoption of post-quantum cryptosystems into secure communication protocols. This work proposes two FPGA-based, client- and server-side hardware architectures to support the integration of the BIKE post-quantum KEM within TLS. Thanks to the parametric hardware design, the paper explores the best option between hardware and software implementations, given a set of available hardware resources and a realistic use-case scenario. The experimental evaluation comparing our client and server designs against the reference AVX2 and hardware implementations of BIKE highlighted two aspects. First, the proposed client and server architectures outperform the reference hardware implementation of BIKE by eight and four times, respectively. Second, the performance comparison between our client and server designs against the reference AVX2 implementation strongly depends on the available resource. Our solution is almost twice as fast as the AVX2 implementation while implemented on the Artix-7 200 FPGA, while it is up to six times slower when targeting smaller FPGAs, thus motivating a careful analysis of the available hardware resources and the optimization of the design's parallelism before opting for hardware support.
... Nevertheless it is not compared with any cyclic code [5,6]. ...
In this paper, certain optimization techniques are proposed to reduce the encoder and decoder computation time of cache memory that is affected by soft errors, by implementing ECC such as cyclic and block codes. Adjacent errors spanning three and two bits are the prime concern of this thesis. An optimised Golay code (23, 12) and a new, also optimised, block code of size (32, 19) are presented. Cyclic codes are efficient compared with block codes; however, the prime concern of the thesis is to address triple and double adjacent errors, which also includes single-bit errors. In this regard, the built-in capability of the Golay code is optimised and used for comparison. The extended Golay code is implemented and compared with the existing Golay code in terms of number of slices, number of LUTs, and maximum combinational path delay.
... Even after the design of the digital transmission system has been optimized, bit errors in transmission will occur with some small but non-zero probability. Wireless transmission systems can experience error rates as high as 10^-3 or worse [5]. The acceptability of a given level of bit error rate depends on the particular application. ...
The integrity of received data is a critical consideration in the design of digital communications and storage systems. The technique involved in attaining data reliability while transmission over a wireless channel is to use Channel Coding. These coding methods involve the use of Error Control Codes and there are two basic ways of controlling errors. They are Automatic Repeat Request and Forward Error Correction. This thesis concentrates on Forward error correction that deals with error detection and error correction. There are two types of error control codes: Block Codes and Convolutional Codes. The extent to which the errors are detected is a measure of the success of the code. The main trade-off in the error correction/detection technique lies in the key parameters involved in evaluating a coding system. Various block codes are analyzed using the performance metrics, namely, Improvement ratio and Error Resilience. It is observed that Cyclic Redundancy Check (CRC) codes showed better performance, and hence, they are chosen for further study.
... Cryptography is one of the critical components against adversaries. Moreover, it is employed to ensure the confidentiality and integrity score of the data and give entities participating in communication both anonymity and authentication [19]. Moreover, in these modern smart facilities, the crypto operations were enriched by the mathematical operations. ...
Threats to digital applications have increased considerably; hence, protecting data and digital applications is very important. Several crypto algorithms were already implemented with different sub-modules to secure data from malicious events. However, the harmfulness of malicious events has broken the security in many cases. This has resulted in high data theft, a reduced confidential range, and data loss. So, the present article has designed a novel Bernoulli Data-encryption-standard (B-DES) for securing digital communication in the wireless environment. Here, the Bernoulli function has been performed after the XOR process, and then during the decryption process, the function of the novel B-DES has been reversed. Moreover, the designed encryption approach is checked with the brute force attack. The power usage of the designed scheme is 193 mW, 7% lower than that of other models. Moreover, the memory utilization is 0.6 kB, 3% lower than that of the compared models. The delay recorded by the designed model is 3.43 ns, 3% lower than that of other models.