Article

AEKA: FPGA Implementation of Area-Efficient Karatsuba Accelerator for Ring-Binary-LWE-based Lightweight PQC


Abstract

Lightweight PQC research and development have recently gained attention from the research community. The Ring-Binary-Learning-with-Errors (RBLWE)-based encryption scheme (RBLWE-ENC) is a promising lightweight PQC scheme built on small parameter sets that fit lightweight applications, but these parameters do not favor popular fast algorithms such as the number theoretic transform. To solve this problem, in this paper we present a novel hardware acceleration of RBLWE-ENC based on the Karatsuba algorithm (KA), targeting the field-programmable gate array (FPGA) platform. In detail, we propose an area-efficient Karatsuba accelerator (AEKA) for RBLWE-ENC, built on three layers of innovative effort. First, we reformulate the signal processing sequence within the major arithmetic component of the KA-based polynomial multiplication for RBLWE-ENC to obtain a new algorithm. Then, we map the proposed algorithm onto a new hardware accelerator using several novel algorithm-to-architecture mapping techniques. Finally, we conduct a thorough complexity analysis and comparison to demonstrate the efficiency of the proposed accelerator, e.g., it achieves 62.5% higher throughput and 60.2% less area-delay product (ADP) than the state-of-the-art design for n = 512 (Virtex-7 device, similar setup). The proposed AEKA design strategy is highly efficient on FPGA devices, i.e., small resource usage with superior timing, and can be integrated with other necessary systems for lightweight-oriented high-performance applications (e.g., servers). The outcome of this work is also expected to benefit lightweight PQC advancement.
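As a rough software illustration of the KA-based polynomial multiplication the abstract refers to, the following sketch multiplies an integer polynomial by a binary polynomial in Z_q[x]/(x^n+1) using the textbook recursive Karatsuba split. It is not the reformulated algorithm or the hardware mapping proposed in the paper. The function names, the `threshold` cutoff, and the choice q = 256 in the usage note are assumptions made for illustration, and operand lengths are assumed to be powers of two.

```python
def schoolbook(a, b):
    """Plain product of two coefficient lists (lowest degree first), no reduction."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def karatsuba(a, b, threshold=8):
    """Textbook Karatsuba product of two equal-length lists.

    Assumes len(a) == len(b) and that the length is a power of two.
    Below `threshold` (an illustrative cutoff) it falls back to schoolbook.
    """
    n = len(a)
    if n <= threshold:
        return schoolbook(a, b)
    h = n // 2
    a_lo, a_hi = a[:h], a[h:]
    b_lo, b_hi = b[:h], b[h:]
    lo = karatsuba(a_lo, b_lo, threshold)            # a_lo * b_lo
    hi = karatsuba(a_hi, b_hi, threshold)            # a_hi * b_hi
    mid = karatsuba([x + y for x, y in zip(a_lo, a_hi)],
                    [x + y for x, y in zip(b_lo, b_hi)],
                    threshold)                       # (a_lo + a_hi) * (b_lo + b_hi)
    out = [0] * (2 * n - 1)
    for i in range(2 * h - 1):
        out[i] += lo[i]
        out[i + h] += mid[i] - lo[i] - hi[i]         # cross term from three products
        out[i + 2 * h] += hi[i]
    return out


def reduce_negacyclic(c, n, q):
    """Reduce a product modulo x^n + 1 and modulo q (x^n is replaced by -1)."""
    out = [0] * n
    for i, ci in enumerate(c):
        if i < n:
            out[i] += ci
        else:
            out[i - n] -= ci
    return [x % q for x in out]


def negacyclic_mul(a, b, q, threshold=8):
    """a * b in Z_q[x]/(x^n + 1), with n = len(a) = len(b)."""
    return reduce_negacyclic(karatsuba(a, b, threshold), len(a), q)
```

For example, `negacyclic_mul(a, b, 256)` with `a` holding n integer coefficients and `b` holding n bits gives the same result as reducing `schoolbook(a, b)`. Note that after one Karatsuba split the summed operand `b_lo + b_hi` has coefficients in {0, 1, 2}, so it is no longer binary.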


... Kyber is designed to provide security levels equivalent to the current Advanced Encryption Standard (AES) encryption system, with different parameter sets targeted at equivalent security levels of AES-128, AES-192, and AES-256, making it versatile for various security requirements. The implementation security of PQC or PQC-related algorithms has received attention recently [9,10]. The design and implementation of Kyber have seen broad integration into cryptographic libraries and systems, such as the first quantum computing safe tape drive from IBM [11]. ...
Preprint
Full-text available
Intelligent transportation systems (ITS) are characterized by wired or wireless communication among different entities, such as vehicles, roadside infrastructure, and traffic management infrastructure. These communications demand different levels of security, depending on how sensitive the data is. The national ITS reference architecture (ARC-IT) defines three security levels, i.e., high, moderate, and low-security levels, based on the different security requirements of ITS applications. In this study, we present a generalized approach to secure ITS communications using a standardized key encapsulation mechanism, known as Kyber, designed for post-quantum cryptography (PQC). We modified the encryption and decryption systems for ITS communications while mapping the security levels of ITS applications to the three versions of Kyber, i.e., Kyber-512, Kyber-768, and Kyber-1024. Then, we conducted a case study using a benchmark fault-enabled chosen-ciphertext attack to evaluate the security provided by the different Kyber versions. The encryption and decryption times observed for different Kyber security levels and the total number of iterations required to recover the secret key using the chosen-ciphertext attack are presented. Our analyses show that higher security levels increase the time required for a successful attack, with Kyber-512 being breached in 183 seconds, Kyber-768 in 337 seconds, and Kyber-1024 in 615 seconds. In addition, attack time instabilities are observed for Kyber-512, 768, and 1024 under 5,000, 6,000, and 8,000 inequalities, respectively. The relationships among the different Kyber versions, and the respective attack requirements and performances underscore the ITS communication security Kyber could provide in the PQC era.
Article
Recent advances in hardware acceleration for postquantum cryptography (PQC) have also turned to lightweight PQC. Apart from the traditional hardware design methodology, the instruction-set accelerator for PQC represents a new design trend but has not been explored for lightweight PQC. To fill this research gap, in this work we present a novel instruction-set accelerator for the ring-binary-learning-with-errors (RBLWE)-based encryption scheme (RBLWE-ENC), a promising lightweight PQC scheme, on field-programmable gate array (FPGA). Key efforts include: 1) derivation of an algorithm for the major operation of RBLWE-ENC to facilitate instruction-set acceleration; 2) development of the instruction-set accelerator, including the polynomial multiplication core designed from the derived algorithm; and 3) evaluation to showcase the efficiency of the proposed design. The proposed accelerator is efficient and complete in cryptographic operations, which can help further lightweight PQC development.
Article
Post-quantum cryptography (PQC) has drawn significant attention from the hardware design research community, especially on field-programmable gate array (FPGA) platforms. In line with this trend, in this paper, we present a novel FPGA-based PQC design work (CHIRP), i.e., Compact and high-Performance FPGA implementation of unIfied accelerators for Ring-Binary-Learning-with-Errors (RBLWE)-based PQC, a promising lightweight PQC suited for related applications like Internet-of-Things. The proposed accelerators offer flexibility across the available two security levels, thus expanding their application potential. In total, we present four distinct hardware accelerators tailored to different performance and resource requirements, ranging from resource-constrained devices to high-throughput applications. Our innovation encompasses three key efforts: (i) we derived four optimized algorithms for RBLWE-ENC's unified operation (covering the available two security levels), allowing flexible switching of security sizes while boosting calculations; (ii) we then presented the four novel accelerators (CHIRP) targeting FPGA platforms, featuring dedicated hardware structures; (iii) we finally conducted a comprehensive evaluation to validate the efficiency of the proposed accelerators on various FPGA devices. Compared to the existing unified design, the proposed accelerator demonstrated up to 91.4% reduction in area-delay product (ADP) on the Stratix-V device. Even when compared with the state-of-the-art single security designs, the proposed accelerator (best version) obtains much better resource usage and ADP performance while unified operation (flexibly switching between two security levels) is considered on both AMD-Xilinx and Intel devices. We anticipate the findings of this research will foster advancements in FPGA implementation techniques for lightweight PQC development.
Article
Full-text available
Along with the National Institute of Standards and Technology (NIST) post-quantum cryptography (PQC) standardization process, lightweight PQC-related research and development have also gained substantial attention from the research community. Ring-binary-learning-with-errors (RBLWE), a ring variant of binary-LWE (BLWE), has been used to build a promising lightweight PQC scheme for emerging Internet-of-Things (IoT) and edge computing applications, namely the RBLWE-based encryption scheme (RBLWE-ENC). The parameter settings of RBLWE-ENC, however, do not favor typical fast algorithms like the number theoretic transform (NTT). Following this direction, in this work, we propose a Karatsuba-initiated novel accelerator (KINA) for efficient implementation of RBLWE-ENC. Overall, the proposed work proceeds in several interdependent stages: 1) we use the Karatsuba algorithm (KA) to derive the major arithmetic operation of RBLWE-ENC into a new form for high-performance operation; 2) we then map the proposed algorithm into an efficient hardware accelerator with the help of a number of optimization techniques; and 3) we provide detailed complexity analysis and implementation comparison to demonstrate the superior performance of the proposed KINA, e.g., the proposed design with u=2 achieves 64.71% higher throughput and 15.37% less area-delay product (ADP) than the state-of-the-art design for n=512 (Virtex-7). The proposed KINA offers flexible processing speed and is suitable for high-performance applications like IoT servers. This work is expected to be useful for lightweight PQC development.
Article
Full-text available
Significant innovation has been made in the development of public-key cryptography that is able to withstand quantum attacks, known as post-quantum cryptography (PQC). This paper focuses on the development of an efficient PQC hardware implementation. Specifically, an implementation of the binary Ring-learning-with-errors (BRLWE)-based encryption scheme, a promising lightweight PQC suitable for resource-constrained applications, is proposed. The paper first develops the mathematical formulation to present the proposed algorithmic process. The corresponding hardware accelerators are then described in detail. Finally, comparisons with previous implementations are provided to demonstrate the superior performance of the proposed design. For instance, the proposed low-complexity accelerator has 34.7% less area-delay product (ADP) than the state-of-the-art design for n=256 in the field-programmable gate array (FPGA) platform. Apart from the efficiency of the hardware architectures, the proposed design also has a complete input/output processing setup, and thus is feasible for emerging lightweight applications.
Article
Full-text available
Post-quantum cryptography (PQC) has gained significant attention from the community recently as it is proven that the existing public-key cryptosystems are vulnerable to the attacks launched from the well-developed quantum computers. The finite field arithmetic AB+C, where A and C are integer polynomials and B is a binary polynomial, is the key component for the binary Ring-learning-with-errors (BRLWE)-based encryption scheme (a low-complexity PQC suitable for emerging lightweight applications). In this paper, we propose a novel hardware implementation of the finite field arithmetic AB+C through three stages of interdependent efforts: (i) a rigorous mathematical formulation process is presented first; (ii) an efficient hardware architecture is then presented with detailed description; (iii) a thorough implementation has also been given along with the comparison. Overall, (i) the proposed basic structure (u=1) outperforms the existing designs, e.g., it involves 46.3% less area-delay product (ADP) than \cite{b14b} for n=512; (ii) the proposed design also offers very efficient performance in time-complexity and can be used in many future applications.
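To make the AB+C operation concrete, here is a minimal software sketch assuming the arithmetic is carried out in Z_q[x]/(x^n+1), as in the BRLWE literature listed on this page. Because B is binary, every partial product is either zero or a shifted copy of A, so the whole operation reduces to negacyclic shifts and additions. This is only a reference model, not the architecture proposed in the paper. The function names and the parameters in the usage example are hypothetical.

```python
def mul_by_x(a, q):
    """Multiply a(x) by x modulo (x^n + 1, q).

    Coefficients are stored lowest degree first. The top coefficient wraps
    around to degree 0 with a sign flip because x^n = -1 in this ring.
    """
    return [(-a[-1]) % q] + a[:-1]


def ab_plus_c(a, b_bits, c, q):
    """Compute A*B + C in Z_q[x]/(x^n + 1) where B has coefficients in {0, 1}.

    a, c   : lists of n integers (coefficients of A and C)
    b_bits : list of n bits (coefficients of B)
    """
    acc = [ci % q for ci in c]          # accumulator starts at C
    shifted = [ai % q for ai in a]      # holds A * x^i at step i
    for bit in b_bits:
        if bit:                         # add A * x^i only when b_i = 1
            acc = [(x + y) % q for x, y in zip(acc, shifted)]
        shifted = mul_by_x(shifted, q)
    return acc


# Small worked example with hypothetical parameters n = 4, q = 256:
# B = 1 + x^3, so A*B + C = A + A*x^3 + C.
print(ab_plus_c([1, 2, 3, 4], [1, 0, 0, 1], [5, 6, 7, 8], 256))  # [4, 5, 6, 13]
```

No general multiplier is needed in this model: each step is a conditional addition followed by a rotate-with-negate, which suggests why a binary B suits low-complexity hardware.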
Conference Paper
Full-text available
Recent advances in the post-quantum cryptography (PQC) field have gradually shifted from the theory to the implementation of the cryptosystem, especially on hardware platforms. Following this trend, in this paper, we aim to present efficient implementations of the finite field arithmetic (key component) for the binary Ring-Learning-with-Errors (BRLWE)-based PQC through a novel lookup-table (LUT)-like method. We have carried out four stages of interdependent efforts: (i) an algorithm-hardware co-design driven derivation of the proposed LUT-like method is provided in detail for the key arithmetic of the BRLWE-based scheme; (ii) the proposed hardware architecture is then presented along with the internal structural description; (iii) we have also presented a novel hybrid size structure suitable for flexible operation, which is the first report in the literature; (iv) the final implementation and comparison processes have also been given, demonstrating that our proposed structures deliver significantly improved performance over the state-of-the-art solutions. The proposed designs are highly efficient and are expected to be employed in many emerging applications. Index Terms: BRLWE-based scheme, finite field arithmetic, hybrid size structure, lookup table, post-quantum cryptography
Article
Full-text available
Post-quantum cryptosystems should be prepared before the advent of powerful quantum computers to keep information secure in our daily life. In 2016 a post-quantum standardization contest was launched by the National Institute of Standards and Technology (NIST), and many works have concentrated on the evaluation of these candidate protocols, mainly in pure software or through hardware-software co-design methodology on different platforms. As the contest progressed to the third round in July 2020 with only 7 finalists and 8 alternate candidates remaining, more dedicated and specific hardware designs should be considered to illustrate the intrinsic properties of a certain protocol and achieve better performance. To this end, we present a standalone hardware design of CRYSTALS-KYBER, a module learning-with-errors (MLWE) based key encapsulation mechanism (KEM) protocol within the 7 finalists, on the FPGA platform. Through elaborate scheduling of sampling and number theoretic transform (NTT) related calculations, decent performance is achieved with limited hardware resources. The way that Encode/Decode and the tweaked Fujisaki-Okamoto transform are implemented is demonstrated in detail. An analysis of minimizing the memory footprint is also given. In summary, we realize the adaptive chosen ciphertext attack (CCA) secure Kyber with all selectable module dimensions k on the smallest Xilinx Artix-7 device. Our design computes the key-generation, encapsulation (encryption) and decapsulation (decryption and reencryption) phases in 3768/5079/6668 cycles when k = 2, 6316/7925/10049 cycles when k = 3, and 9380/11321/13908 cycles when k = 4, consuming 7412/6785 LUTs, 4644/3981 FFs, 2126/1899 slices, 2/2 DSPs and 3/3 BRAMs in server/client with 6.2/6.0 ns critical path delay, outperforming corresponding high level synthesis (HLS) based designs or hardware-software co-designs to a large extent.
Article
Full-text available
NewHope-NIST is a promising ring learning with errors (RLWE)-based postquantum cryptography (PQC) for key encapsulation mechanisms. The performance on the field-programmable gate array (FPGA) affects the applicability of NewHope-NIST. In RLWE-based PQC algorithms, the number theoretic transform (NTT) is one of the most time-consuming operations. In this paper, low-complexity NTT and inverse NTT (INTT) are used to implement highly efficient NewHope-NIST on FPGA. First, both the pre-processing of NTT and the post-processing of INTT are merged into the fast Fourier transform (FFT) algorithm, which reduces N and 2N modular multiplications for N-point NTT and INTT, respectively. Second, a compact butterfly unit and an efficient modular reduction on the modulus 12289 are proposed for the low-complexity NTT/INTT architecture, which achieves an improvement of approximately 3× in the area time product (ATP) compared with the results of the state-of-the-art designs. Finally, a highly efficient architecture with doubled bandwidth and timing hiding for NewHope-NIST is presented. The implementation results on an FPGA show that our design is at least 2.5× faster and has 4.9× smaller ATP compared with the results of the state-of-the-art designs of NewHope-NIST on similar platforms.
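For context, the conventional (unmerged) flow that the abstract improves on can be sketched as follows: multiplication in Z_q[x]/(x^N+1) via an N-point NTT requires pre-multiplying the inputs by powers of ψ (a primitive 2N-th root of unity) and post-multiplying the output by powers of ψ^{-1} and by N^{-1}. The sketch below shows this baseline only, not the merged low-complexity NTT/INTT of the paper. It uses a naive O(N^2) transform for readability, and the function names and the root search are illustrative assumptions. It requires Python 3.8+ for `pow(x, -1, q)`.

```python
Q = 12289  # modulus named in the abstract


def find_psi(n, q=Q):
    """Search for psi with psi^n = -1 (mod q).

    For n a power of two this makes psi a primitive 2n-th root of unity.
    """
    for x in range(2, q):
        if pow(x, n, q) == q - 1:
            return x
    raise ValueError("no primitive 2n-th root of unity for this n and q")


def ntt_naive(a, omega, q=Q):
    """Naive O(n^2) transform: out[i] = sum_j a[j] * omega^(i*j) mod q."""
    n = len(a)
    return [sum(a[j] * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]


def negacyclic_mul_ntt(a, b, q=Q):
    """a * b in Z_q[x]/(x^n + 1) with separate pre- and post-processing."""
    n = len(a)
    psi = find_psi(n, q)
    omega = psi * psi % q                     # primitive n-th root of unity
    psi_inv = pow(psi, -1, q)
    omega_inv = pow(omega, -1, q)
    n_inv = pow(n, -1, q)
    # Pre-processing: n modular multiplications per operand.
    a_t = [a[i] * pow(psi, i, q) % q for i in range(n)]
    b_t = [b[i] * pow(psi, i, q) % q for i in range(n)]
    # Point-wise product in the transform domain.
    c_hat = [x * y % q for x, y in zip(ntt_naive(a_t, omega, q),
                                       ntt_naive(b_t, omega, q))]
    c_t = ntt_naive(c_hat, omega_inv, q)
    # Post-processing: scale by n^-1 and by psi^-i (2n modular multiplications).
    return [c_t[i] * n_inv % q * pow(psi_inv, i, q) % q for i in range(n)]


def negacyclic_mul_schoolbook(a, b, q=Q):
    """Reference result computed directly, for checking the NTT path."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                out[i + j] += a[i] * b[j]
            else:
                out[i + j - n] -= a[i] * b[j]   # x^n = -1
    return [x % q for x in out]
```

The pre-processing costs N multiplications per forward transform and the post-processing costs 2N for the inverse, which matches the N and 2N counts the abstract reports eliminating by folding these steps into the FFT itself.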
Article
Full-text available
Ring learning with errors (RLWE) is an efficient lattice-based cryptographic scheme that has worst-case reduction to lattice problem, conjectured to be quantum-hard. Ring-BinLWE is an optimized variant of RLWE problem using binary error distribution, resulting in highly-efficient hardware implementation. Efficient and low-complexity architectures in hardware, thwarting natural and malicious faults, are essential for lattice-based post-quantum cryptography (PQC) algorithms. In this paper, we explore efficient fault detection approaches for implementing the Ring-BinLWE problem. This work, for the first time, investigates fault detection schemes for all three stages of RLWE encryption. Utilizing the stuck-at fault model, we employ recomputing with encoded operands schemes to achieve high error coverage. We simulate and implement our schemes on a field-programmable gate array (FPGA) platform. Our schemes provide low hardware overhead (area overhead of 15.74%, delay overhead of 7.74%, and power consumption overhead of 4.06%), with high error coverage, which can be suitable for resource-constrained as well as high-performance usage models.
Conference Paper
Full-text available
Quantum computers threaten to break public-key cryptography schemes such as DSA and ECDSA in polynomial time, which poses an imminent threat to secure signal processing. Ring learning with errors (RLWE) lattice-based cryptography (LBC) is one of the most promising families of post-quantum cryptography (PQC) schemes in terms of efficiency and versatility. Two conventional methods to compute polynomial multiplication, the most compute-intensive routine in the RLWE schemes, are convolutions and Number Theoretic Transform (NTT). In this work, we explore the energy efficiency of polynomial multipliers using systolic architecture for the first time. As an early exploration, we design two high-throughput systolic array polynomial multipliers, including NTT-based and convolution-based, and compare them to our low-cost sequential (non-systolic) NTT-based multiplier. Our sequential NTT-based multiplier achieves 3x speedup over the state-of-the-art FPGA implementation of the polynomial multiplier in the NewHope-Simple key exchange mechanism on a low-cost Artix7 FPGA. When synthesized on a Zynq UltraScale+ FPGA, the NTT-based systolic and convolution-based systolic designs achieve on average 1.7x and 7.5x speedup over our sequential NTT-based multiplier respectively, which can lead to generating over 2x more signatures per second by CRYSTALS-Dilithium, a PQC digital signature scheme. These explorations help designers select the right PQC implementations for making future signal processing applications quantum-resistant.
Article
Full-text available
We give a simple proof that the decisional Learning With Errors (LWE) problem with binary secrets (and an arbitrary polynomial number of samples) is at least as hard as the standard LWE problem (with unrestricted, uniformly random secrets, and a bounded, quasi-linear number of samples). This proves that the binary-secret LWE distribution is pseudorandom, under standard worst-case complexity assumptions on lattice problems. Our results are similar to those proved by Brakerski, Langlois, Peikert, Regev and Stehlé (STOC 2013), but provide a shorter, more direct proof, and a small improvement in the noise growth of the reduction.
Conference Paper
Full-text available
Lattice-based cryptography has gained credence recently as a replacement for current public-key cryptosystems, due to its quantum-resilience, versatility, and relatively low key sizes. To date, encryption based on the learning with errors (LWE) problem has only been investigated from an ideal lattice standpoint, due to its computation and size efficiencies. However, a thorough investigation of standard lattices in practice has yet to be considered. Standard lattices may be preferred to ideal lattices due to their stronger security assumptions and less restrictive parameter selection process. In this paper, an area-optimised hardware architecture of a standard lattice-based cryptographic scheme is proposed. The design is implemented on an FPGA and it is found that both encryption and decryption fit comfortably on a Spartan-6 FPGA. This is the first hardware architecture for standard lattice-based cryptography reported in the literature to date, and thus is a benchmark for future implementations. Additionally, a revised discrete Gaussian sampler is proposed which is the fastest of its type to date, and also is the first to investigate the cost savings of implementing with λ/2-bits of precision. Performance results are promising compared to the hardware designs of the equivalent ring-LWE scheme: in addition to providing stronger security proofs, the design generates 1272 encryptions per second and 4395 decryptions per second.
Article
Lattice-based cryptography (LBC) stands out as one of the most viable classes of quantum-resistant schemes. This brief explores a time-sharing approach, with different parallelism levels, for a crucial operation in LBC cryptosystems, i.e., polynomial multiplication. We also employ an innovative coefficient ordering method in our time-shared schoolbook polynomial multiplication (SPM) to combine the best of two worlds: design compactness and lower processing latency. Thus, our work offers a choice of design points with performance vs. resource trade-offs. Our fastest proposed design exhibits 80% and 57% reductions in LUTs and throughput, respectively, compared to the existing fully parallel SPM architecture (on Xilinx Ultrascale+), which leads to a 53% improvement in the area-time-product efficiency. Our smallest proposed design is more than 2.2× faster than the existing low-cost parallel SPM architecture (on Xilinx Kintex-7) at the expense of 85% additional area resources.
Article
Post-quantum cryptography (PQC) has recently drawn substantial attention from various communities owing to the proven vulnerability of existing public-key cryptosystems against the attacks launched from well-established quantum computers. The Ring-Binary-Learning-with-Errors (RBLWE), a variant of Ring-LWE, has been proposed to build PQC for lightweight applications. As more Field-Programmable Gate Array (FPGA) devices are being deployed in lightweight applications like Internet-of-Things (IoT) devices, it would be interesting if the RBLWE-based PQC can be implemented on the FPGA with ultra-low complexity and flexible processing. However, thus far, limited information is available for such implementations. In this paper, we propose novel RBLWE-based PQC accelerators on the FPGA with ultra-low implementation complexity and flexible timing. We first present the process of deriving the key operation of the RBLWE-based scheme into the proposed algorithmic operation. The corresponding hardware accelerator is then efficiently mapped from the proposed algorithm with the help of algorithm-to-architecture implementation techniques, and extended to obtain higher-throughput designs. The final complexity analysis and implementation results (on a variety of FPGAs) show that the proposed accelerators have significantly smaller area-time complexities than the state-of-the-art designs. Overall, the proposed accelerators feature low implementation complexity and flexible processing, making them desirable for emerging FPGA-based lightweight applications.
Article
CRYSTALS-Dilithium is a lattice-based post-quantum digital signature scheme that is resistant to attacks by quantum computers and has been selected to be standardized in the NIST post-quantum cryptography (PQC) standardization process. However, the speed performance and design flexibility of the Dilithium still need to be evaluated. This paper presents a high-performance software/hardware co-design of CRYSTALS-Dilithium based on the NIST PQC round-3 parameters. High-speed pipelined hardware modules for NTT/INTT, point-wise multiplication/addition, and for SHAKE are included in the design to accelerate the time-consuming operations in Dilithium. All hardware modules are parameterized, thus allowing full support of run-time configuration to increase versatility. Moreover, the proposed software/hardware architecture and tight operating workflows reduce the data transmission overhead between the processor and other hardware modules. The hardware accelerator is implemented with a reconfigurable logic on FPGA and is integrated with the high-performance ARM Cortex-A9 processor in the Xilinx Zynq Architecture. We measure the performance of the software/hardware system for Dilithium in NIST security levels 2, 3, and 5. Compared to pure software implementations, we achieve 8.7-12.5 times speedup in Key generation, 6.3-7.3 times speedup in Sign, and 9.1-12.2 times speedup in Verify operations.
Article
The Internet of Things (IoT) introduces active connections between smart devices, revolutionizing modern life. IoT devices often exhibit several security issues, so transmission between the nodes should be protected using cryptographic approaches. However, conventional cryptographic approaches are highly complex and vulnerable to quantum attacks. This paper presents a robust and lightweight post-quantum lattice-based authentication and code-based hybrid encryption scheme for resource-constrained IoT devices. The proposed Ring-Learning with Errors (Ring-LWE) based authentication scheme introduces Bernstein reconstruction in polynomial multiplication to achieve minimal computation cost; hence, resource-limited IoT devices can feasibly perform reliable mutual authentication. This approach offers indefinite identity privacy and location privacy. Hence, the proposed signature generation and verification process are highly efficient compared to the existing ring signature systems. Also, the proposed post-quantum hybrid code-based encryption scheme follows Diagonal Structure Based QC‑LDPC Codes with column loop optimization and Simplified Log Domain Sum-Product Algorithm (SLDSPA) to provide lightweight encryption with minimum hardware requirements. The total authentication delay of the proposed authentication scheme is 23% less than that of an authentication scheme using conventional polynomial multiplication. Also, the optimized design of the proposed code-based hybrid encryption (HE) uses only 64 slices and 640 slices on Xilinx Virtex-6 FPGA for encoding and decoding processes, respectively. These simulation results demonstrate the effectiveness of the proposed cryptographic scheme against other competitive systems in terms of its functionality and hardware complexities.
Article
Learning with errors (LWE) over the ring based on binary distribution (ring-BinLWE) has become a potential Internet-of-Things (IoT) confidentiality solution with its anti-quantum attack properties and uncomplicated calculations. Compared with ring-LWE based on discrete Gaussian distribution, the decryption scheme of ring-LWE based on binary distribution needs to be redetermined due to the asymmetry of the error distribution. The direct application of the ring-LWE decryption function based on discrete Gaussian distribution can cause serious misjudgment. In this article, we propose a more accurate and robust decryption scheme for ring-BinLWE based on 2’s complement ring. Compared with the previous decryption function, the rederived decryption function significantly improves the decoding rate by 50%. Furthermore, based on the proposed decryption function, high-performance and lightweight hardware architectures for terminal devices in IoT are, respectively, proposed, which are scalable and can be easily adapted to ring-BinLWE hardware deployment with other parameter sets. When the parameter set is n = 256, q = 256, the high-performance implementation consumes 7.6k LUTs, 6.2k FFs, and 2.3k SLICEs on the Spartan 6 field-programmable gate array (FPGA) platform. Compared with the previous implementation, our resource overhead increases by only 23% while the decryption accuracy is significantly improved by 50%. The lightweight implementation for parameter set n = 256, q = 256 consumes only 230 LUTs, 338 FFs, and 84 SLICEs on the Spartan 6 FPGA platform. Compared with the previous work, the area × time (AT) is reduced by 47.8%, which is more suitable for deployment on resource-constrained IoT nodes.
Article
Ring learning-with-errors (RLWE)-based encryption scheme is a lattice-based cryptographic algorithm that constitutes one of the most promising candidates for Post-Quantum Cryptography (PQC) standardization due to its efficient implementation and low computational complexity. Binary Ring-LWE (BRLWE) is a new optimized variant of RLWE, which achieves smaller computational complexity and more efficient hardware implementations. In this paper, two efficient architectures based on Linear-Feedback Shift Register (LFSR) for the arithmetic used in Inverted Binary Ring-LWE (InvBRLWE)-based encryption scheme are presented, namely the operation of A·B+C over the polynomial ring Z_q/(x^n+1). The first architecture optimizes the resource usage for major computation and has a novel input processing setup to speed up the overall processing latency with minimized input loading cycles. The second architecture deploys an innovative serial-in serial-out processing format to reduce the involved area usage further yet maintains a regular input loading time-complexity. Experimental results show that the architectures presented here improve the complexities obtained by competing schemes found in the literature, e.g., involving 71.23% less area-delay product than recent designs. Both architectures are highly efficient in terms of area-time complexities and can be extended for deploying in different lightweight application environments.
Article
Traditional and lightweight cryptography primitives and protocols are insecure against quantum attacks. Thus, a real-time application using traditional or lightweight cryptography primitives and protocols does not ensure foolproof security. Post-quantum cryptography is important for the internet of things (IoT) due to its security against quantum attacks. This paper offers a broad literature analysis of post-quantum cryptography for IoT networks, including the challenges and research directions for its adoption in real-time applications. The work focuses on post-quantum cryptosystems that are useful for resource-constrained devices. Further, the quantum attacks that may occur against traditional and lightweight cryptographic primitives are surveyed.
Article
This work presents a multi-level approximation exploration undertaken on the Ring-Learning-with-Errors (R-LWE) based Public-key Cryptographic (PKC) schemes that belong to quantum-resilient cryptography algorithms. Among the various quantum-resilient cryptography schemes proposed in the currently running NIST Post-quantum Cryptography (PQC) standardization plan, the lattice-based LWE schemes have emerged as the most viable and preferred class for IoT applications due to their compact area and memory footprint compared to other alternatives. However, compared to the classical schemes used today, R-LWE is a much harder challenge to fit on embedded IoT (end-node) devices, due to their stricter resource constraints (lower area, memory, energy budgets) as well as their limited computational capabilities. To the best of our knowledge, this is the first endeavour exploring the inherent approximate nature of the LWE problem to undertake a multi-level Approximate R-LWE (AxRLWE) architecture, with respective security estimates, for lightweight IoT devices. Undertaking AxRLWE on Field Programmable Gate Arrays (FPGAs), we benchmarked a 64% area reduction compared to earlier accurate R-LWE designs at the cost of reduced quantum-security. For the Application Specific Integrated Circuits (ASICs) with 45nm CMOS technology, AxRLWE was benchmarked to fit well within the same area-budget of a lightweight ECC processor and consume a third of the energy compared to the special class of R-Binary LWE (R-BLWE) designs being proposed for IoT, with better security level.
Article
Post-quantum cryptography (PQC) refers to cryptosystems that can resist attacks launched from mature quantum computers in the not-too-distant future. It has recently gained intensive attention from the research community, as most of the existing public-key cryptosystems are vulnerable to attacks from quantum computers. The Ring-Learning-with-Errors (Ring-LWE)-based scheme is an essential type of lattice-based PQC due to its strong security proof and ease of implementation. As the latest variant of Ring-LWE, the binary Ring-LWE (BRLWE)-based scheme has even smaller computational complexity and is thus more suitable for resource-constrained applications. However, the existing works have not covered various aspects of this new scheme well, especially low-complexity hardware implementation. With this in mind, in this paper we present a novel, very low-complexity implementation of the BRLWE-based scheme on the hardware platform. To carry out this work, we have proposed four layers of coherent, interdependent efforts: (i) we have provided the necessary algorithmic derivation process in detail to formulate the desired algorithm for the polynomial multiplication over hybrid fields, which is the major arithmetic component of the BRLWE scheme; (ii) we have presented the corresponding hardware architecture in a thorough format with sufficient description of the internal structures; (iii) we have also provided the complexity analysis and implementation-based comparison to demonstrate the superior performance of the proposed polynomial multiplication over the state-of-the-art design; (iv) finally, we have extended the proposed low-complexity polynomial multiplication to the major operational phase of the BRLWE scheme. We have shown that the proposed BRLWE structure involves significantly lower area-time complexities than the existing design, e.g., the proposed design has at least 66.01% less area-delay product (ADP) than the most recently reported design (Stratix V device). Overall, the proposed design and implementation strategies are highly efficient, and the proposed BRLWE structure is desirable for many emerging applications.
Article
The Internet of Things (IoT) connects a myriad of small devices over a huge network, encompassing many different and varied applications and environments. As the IoT network continues to grow, providing end-to-end security over IoT is becoming a paramount issue. To mitigate existing and future security risks within IoT, two important factors should be considered. First, some resource-constrained edge devices have insufficient area to accommodate the security logic. Second, the advent of quantum computers threatens the security of current public-key cryptography algorithms. In response to these challenges, lattice-based cryptography (LBC) has emerged as a promising technique for IoT security in the quantum era. The feasibility of LBC integration onto resource-constrained devices has been demonstrated in previous research. Multiplication is the main operation in Ring-BinLWE, a type of LBC. In this paper, a new multiplication method is proposed, which is called In-place modular Reduction and anti-circular Rotation Column-based Multiplication (In-place Rot-Col-Mul), and a new Ring-BinLWE architecture is designed. In-place Rot-Col-Mul performs a column-based multiplication in which one rotation is executed per cycle. The design was implemented on TSMC 65 nm technology and FPGA platforms. ASIC implementation results show improvements in power and area over the state-of-the-art design of 48.42% and 57.8%, respectively.
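As a rough illustration of why rotation is central to multiplication in Ring-BinLWE, the sketch below multiplies a polynomial by a binary polynomial in Z_q[x]/(x^n + 1) using only anti-circular rotations and additions. It is not the In-place Rot-Col-Mul architecture from the abstract above, only the underlying identity (multiplying by x rotates the coefficients and negates the wrapped one). The function names and the parameters n = 4 and q = 256 are assumptions chosen for the example.

```python
def anticircular_rotate(a, q):
    """Multiply a(x) by x modulo (x^n + 1, q): shift up by one position
    and negate the coefficient that wraps around."""
    return [(-a[-1]) % q] + a[:-1]


def binary_negacyclic_mul(a, b_bits, q):
    """Compute a(x) * b(x) mod (x^n + 1, q) where b has 0/1 coefficients.

    One rotation is applied per coefficient of b, and the rotated copy of
    a is accumulated only when the corresponding bit of b is set.
    """
    acc = [0] * len(a)
    rot = a[:]
    for bit in b_bits:
        if bit:
            acc = [(s + r) % q for s, r in zip(acc, rot)]
        rot = anticircular_rotate(rot, q)
    return acc


def negacyclic_reference(a, b, q):
    """Schoolbook reference: products that overflow degree n-1 wrap with a sign flip."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                out[k] = (out[k] + a[i] * b[j]) % q
            else:
                out[k - n] = (out[k - n] - a[i] * b[j]) % q
    return out


a = [1, 2, 3, 4]       # illustrative coefficients mod q
b = [1, 1, 0, 0]       # illustrative binary polynomial 1 + x
result = binary_negacyclic_mul(a, b, 256)
```

Because one operand is binary, no coefficient multiplier is needed at all in this formulation, which is one reason such schemes are attractive for small hardware.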
Article
Saber, the only module-learning with rounding-based algorithm in NIST’s third round of post-quantum cryptography (PQC) standardization process, is characterized by simplicity and flexibility. However, energy-efficient implementation of Saber is still under investigation since the commonly used number theoretic transform cannot be utilized directly. In this manuscript, an energy-efficient configurable crypto-processor supporting multi-security-level key encapsulation mechanism of Saber is proposed. First, an 8-level hierarchical Karatsuba framework is utilized to reduce degree-256 polynomial multiplication to coefficient-wise multiplication. Second, a hardware-efficient Karatsuba scheduling strategy and an optimized pre-/post-processing structure are designed to reduce the area overheads of the scheduling strategy. Third, a task-rescheduling-based pipeline strategy and truncated multipliers are proposed to enable fine-grained processing. Moreover, multiple parameter sets are supported in LWRpro to enable configurability among various security scenarios. Enabled by these optimizations, LWRpro requires 1066, 1456 and 1701 clock cycles for key generation, encapsulation, and decapsulation of Saber768. The post-layout version of LWRpro is implemented with the TSMC 40 nm CMOS process within 0.38 mm². The throughput for Saber768 is up to 275k encapsulation operations per second and the energy efficiency is 0.15 uJ/encapsulation while operating at 400 MHz, achieving nearly 50× and 31× improvement, respectively, compared with current PQC hardware solutions.
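The idea of recursing a Karatsuba split until only coefficient-wise products remain can be sketched in software as follows. This is a generic textbook recursion over integer coefficient lists whose length is a power of two, not the scheduling strategy or datapath of the processor described above; the function names are assumptions made for the example.

```python
def schoolbook(a, b):
    """Reference O(n^2) polynomial product of two coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def karatsuba(a, b):
    """Product of two equal-length coefficient lists (length a power of two).

    Each level replaces four half-size products with three; the recursion
    bottoms out in single coefficient multiplications.
    """
    n = len(a)
    if n == 1:
        return [a[0] * b[0]]
    h = n // 2
    a0, a1 = a[:h], a[h:]
    b0, b1 = b[:h], b[h:]
    low = karatsuba(a0, b0)                      # a0 * b0
    high = karatsuba(a1, b1)                     # a1 * b1
    mid = karatsuba([x + y for x, y in zip(a0, a1)],
                    [x + y for x, y in zip(b0, b1)])
    # (a0 + a1)(b0 + b1) - a0*b0 - a1*b1 = a0*b1 + a1*b0
    cross = [m - l - t for m, l, t in zip(mid, low, high)]
    out = [0] * (2 * n - 1)
    for i, v in enumerate(low):
        out[i] += v
    for i, v in enumerate(cross):
        out[i + h] += v
    for i, v in enumerate(high):
        out[i + 2 * h] += v
    return out


product = karatsuba([1, 2, 3, 4], [5, 6, 7, 8])
```

For a length-256 input this recursion is eight levels deep (2^8 = 256), which matches the depth quoted in the abstract; reduction modulo the ring polynomial and the coefficient modulus would be applied to the result afterwards.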
Article
Digit-serial systolic multipliers over GF(2^m) based on the National Institute of Standards and Technology (NIST) recommended trinomials play a critical role in the real-time operations of cryptosystems. Systolic multipliers over GF(2^m) involve a large number of registers of size O(m^2), which results in a significant increase in area complexity. In this paper, we propose a novel low register-complexity digit-serial trinomial-based finite field multiplier. The proposed architecture is derived through two novel coherent interdependent stages: (i) derivation of an efficient hardware-oriented algorithm based on a novel input-operand feeding scheme and (ii) appropriate design of a novel low register-complexity systolic structure based on the proposed algorithm. The extension of the proposed design to a Karatsuba algorithm (KA)-based structure is also presented. The proposed design is synthesized for FPGA implementation and it is shown that it (the design based on the regular multiplication process) could achieve more than 12.1 percent saving in area-delay product and nearly 2.8 percent saving in power-delay product. To the best of the authors’ knowledge, the register-complexity of the proposed structure is so far the least among the competing designs for trinomial-based systolic multipliers (for the same type of multiplication algorithm).
Article
While the Internet of Things (IoT) shapes the future of the internet, communications among nodes must be secured by employing cryptographic schemes such as public key encryption (PKE). However, classic PKE schemes such as RSA and ECC suffer from both high complexity and vulnerability to quantum attacks. During the past decade, post-quantum schemes based on the Learning with Errors (LWE) problem have gained considerable attention due to their lower complexity among PKE schemes. In addition to resistance against theoretical (quantum and classic) attacks, every practical implementation of a cryptosystem must also be evaluated against different side-channel attacks such as power analysis or fault injection. In this paper, we analyze the vulnerability of the binary Ring-LWE scheme to (first-order) fault attacks such as randomization, zeroing, and skipping faults. We show that previous implementations can be easily broken by employing such fault attacks. Moreover, we propose fault-resilient software implementations of binary Ring-LWE on 8-bit and 32-bit lightweight microcontrollers, namely AVR ATxmega128A1 and ARM Cortex-M0, which are well suited to IoT devices. Furthermore, we formally prove the resilience of the proposed implementations against different fault attacks. To the best of our knowledge, this work is the first one to propose fault-resilient binary Ring-LWE implementations on resource-constrained microcontrollers. Our implementations on AVR ATxmega128A1 require only 80 and 120 milliseconds for encryption and decryption, respectively.
Article
Lattice-based cryptography (LBC) is one of the most promising classes of post-quantum cryptography (PQC) that is being considered for standardization. This brief proposes an optimized schoolbook polynomial multiplication (SPM) for compact LBC. We exploit the symmetric nature of Gaussian noise for bit reduction. Additionally, a single field-programmable gate array (FPGA) DSP block is used for two parallel multiplication operations per clock cycle. These optimizations enable a significant 2.2× speedup along with reduced resources for dimension n = 256. The overall efficiency (throughput per slice) is 1.28× higher than the conventional SPM, as well as contributing to a more compact LBC system compared to previously reported designs. The results targeting the FPGA platform show that the proposed design can achieve high hardware efficiency with reduced hardware area costs.
Article
With the exponential increase in applications of the Internet of Things (IoT), such as smart ecosystems or e-health, more security threats have been introduced. In order to resist known attacks on IoT networks, multiple security protocols must be established among nodes. Thus, IoT devices are required to execute various cryptographic operations such as public key encryption/decryption. However, classic public key cryptosystems such as RSA and ECC are too computationally complex to be implemented efficiently on IoT devices and are vulnerable to quantum attacks. Therefore, once quantum computing is fully developed, these cryptosystems will be neither secure nor practical. In this paper, we propose InvRBLWE, an optimized variant of the binary learning with errors over the ring (Ring-LWE) scheme that is proven to be secure against quantum attacks and is highly efficient for hardware implementations. We propose two architectures for InvRBLWE: 1) a high-speed architecture targeting edge and powerful IoT devices, 2) an ultra-lightweight architecture, which can be implemented on resource-constrained nodes in IoT. The proposed architectures are scalable regarding security levels and we provide experimental results for two versions of the InvRBLWE scheme providing 84 and 190 bits of classic security. Our implementation results on FPGA dominate the best of the previous classic and post-quantum implementations. Moreover, our two different ASIC implementations show improvement in terms of speed, area, power and/or energy. To the best of our knowledge, we are the first to implement LWE-based cryptosystems on an ASIC platform.
Article
Lattice-based cryptography has shown great potential due to its resistance against quantum attacks. Given the security requirements for high-precision Gaussian sampling and complex polynomial multiplication over rings, as well as the storage of large public keys, it is extremely challenging but important to implement lattice-based schemes on resource-constrained devices. In this paper, a resource-efficient and side-channel secure Ring-LWE cryptographic processor is presented. A discrete Gaussian sampler with constant response time, high precision, and large distribution tails is designed. The proposed Gaussian sampler is shown to be secure against side-channel timing attacks according to timing analysis attack results on an FPGA-based testing platform. A universal module MPE (Modular Processing Element) is designed to carry out all basic modular operations for Ring-LWE cryptography at high speed. The prototype verification is performed on the Xilinx Spartan-6 FPGA platform. The processor can execute an encryption/decryption operation on a 256-bit message in 4.5/0.9 ms while consuming only 1307 LUTs, 889 FFs, 4 BRAMs, and no DSP modules. Compared with other related hardware implementations, the Ring-LWE processor is advantageous not only in hardware efficiency but also in protection against side-channel attacks.
Article
Cryptography is essential for the security of online communication, cars and implanted medical devices. However, many commonly used cryptosystems will be completely broken once large quantum computers exist. Post-quantum cryptography is cryptography under the assumption that the attacker has a large quantum computer; post-quantum cryptosystems strive to remain secure even in this scenario. This relatively young research area has seen some successes in identifying mathematical operations for which quantum algorithms offer little advantage in speed, and then building cryptographic systems around those. The central challenge in post-quantum cryptography is to meet demands for cryptographic usability and flexibility without sacrificing confidence.
Conference Paper
Recently, an increasing number of papers proposing post-quantum schemes also provide concrete parameter sets aiming for concrete post-quantum security levels. Security evaluations of such schemes need to include all possible attacks, in particular those by quantum adversaries. In the case of lattice-based cryptography, currently existing quantum attacks are mainly classical attacks, carried out with quantum basis reduction as a subroutine. In this work, we propose a new quantum attack on the learning with errors (LWE) problem, whose hardness is the foundation for many modern lattice-based cryptographic constructions. Our quantum attack is based on Howgrave-Graham’s Classical Hybrid Attack and is suitable for LWE instances in recent cryptographic proposals. We analyze its runtime complexity and optimize it over all possible choices of the attack parameters. In addition, we analyze the concrete post-quantum security levels of the parameter sets proposed for the New Hope and Frodo key exchange schemes, as well as several instances of the Lindner-Peikert encryption scheme. Our results show that, depending on the assumed basis reduction costs, our Quantum Hybrid Attack either significantly outperforms, or is at least comparable to, all other attacks covered by Albrecht–Player–Scott in their work “On the concrete hardness of Learning with Errors”. We further show that our Quantum Hybrid Attack improves upon the Classical Hybrid Attack in the case of LWE with binary error.
Article
This paper presents the design of ring learning with errors (LWE) cryptoprocessors using number theoretic transform (NTT) cores and Gaussian samplers based on the inverse transform method. The NTT cores are designed using radix-2 and radix-8 decimation-in-frequency NTT algorithms and pipeline architectures. The designed Gaussian samplers are an optimized parallel implementation of the inverse transform method, and they use a pipeline architecture to generate a sample every clock cycle after the latency period; that is, the output is obtained in a fixed time, yielding timing-attack-resistant ring-LWE cryptoprocessors. Also, taking into account the National Institute of Standards and Technology (NIST) recommendation, a random number generator is designed to generate the input of the Gaussian sampler. The cryptoprocessors were synthesized on the field-programmable gate array EP4SGX230KF40C2 and verified in hardware using the DE4 board and the SignalTap tool. According to the obtained synthesis results, for dimension 512, the three cryptoprocessors perform the encryption in 9.33, 5.16, and 1.73 μs and the decryption in 4.59, 2.78, and 1.04 μs. We compared the designed cryptoprocessors with others presented in the literature, and from this comparison we conclude that they have the highest throughput but require more area resources than previous designs.
Conference Paper
Side-channel analysis and fault-injection attacks are known as major threats to any cryptographic implementation. Hardening cryptographic implementations with appropriate countermeasures is thus essential before they are deployed in the wild. However, countermeasures for both threats are of completely different nature: Side-channel analysis is mitigated by techniques that hide or mask key-dependent information while resistance against fault-injection attacks can be achieved by redundancy in the computation for immediate error detection. Since already the integration of any single countermeasure in cryptographic hardware comes with significant costs in terms of performance and area, a combination of multiple countermeasures is expensive and often associated with undesired side effects. In this work, we introduce a countermeasure for cryptographic hardware implementations that combines the concept of a provably-secure masking scheme (i.e., threshold implementation) with an error detecting approach against fault injection. As a case study, we apply our generic construction to the lightweight LED cipher. Our LED instance achieves first-order resistance against side-channel attacks combined with a fault detection capability that is superior to that of simple duplication for most error distributions at an increased area demand of 12 %.
Conference Paper
Lattice-based cryptography has gained credence recently as a replacement for current public-key cryptosystems, due to its quantum-resilience, versatility, and relatively low key sizes. To date, encryption based on the learning with errors (LWE) problem has only been investigated from an ideal lattice standpoint, due to its computation and size efficiencies. However, a thorough investigation of standard lattices in practice has yet to be considered. Standard lattices may be preferred to ideal lattices due to their stronger security assumptions and less restrictive parameter selection process. In this paper, an area-optimised hardware architecture of a standard lattice-based cryptographic scheme is proposed. The design is implemented on an FPGA and it is found that both encryption and decryption fit comfortably on a Spartan-6 FPGA. This is the first hardware architecture for standard lattice-based cryptography reported in the literature to date, and thus is a benchmark for future implementations. Additionally, a revised discrete Gaussian sampler is proposed which is the fastest of its type to date, and also is the first to investigate the cost savings of implementing with λ/2-bits of precision. Performance results are promising compared to the hardware designs of the equivalent ring-LWE scheme: in addition to providing stronger security proofs, the design generates 1272 encryptions per second and 4395 decryptions per second.
Conference Paper
In the emerging Internet of Things, lightweight public-key cryptography is an essential component for many cost-efficient security solutions. Since conventional public-key schemes, such as ECC and RSA, remain expensive and energy hungry even after aggressive optimization, this work investigates a possible alternative. In particular, we show the practical potential of replacing the Gaussian noise distribution in the Ring-LWE based encryption scheme by Lindner and Peikert/Lyubashevsky et al. with a binary distribution. When parameters are carefully chosen, our construction is resistant against any state-of-the-art cryptanalytic techniques (e.g., attacks on original Ring-LWE or NTRU) and suitable for low-cost scenarios. In the end, our scheme can enable public-key encryption even on very small and low-cost 8-bit (ATXmega128) and 32-bit (Cortex-M0) microcontrollers.
Article
A transform analogous to the discrete Fourier transform may be defined in a finite field, and may be calculated efficiently by the 'fast Fourier transform' algorithm. The transform may be applied to the problem of calculating convolutions of long integer sequences by means of integer arithmetic.
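A small sketch of this idea: a radix-2 transform over the integers modulo a prime, used to compute a cyclic convolution with exact integer arithmetic. The prime p = 257, the length n = 8, the use of 3 as a primitive root modulo 257, and all function names are assumptions chosen for illustration; they are not taken from the cited work. The modular inverses use Python's three-argument `pow` with a negative exponent, which requires Python 3.8 or later.

```python
def ntt(a, root, p):
    """Recursive radix-2 transform of a (length a power of two) modulo p.

    `root` must be a primitive len(a)-th root of unity modulo p.
    """
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], root * root % p, p)
    odd = ntt(a[1::2], root * root % p, p)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % p
        out[k] = (even[k] + t) % p
        out[k + n // 2] = (even[k] - t) % p   # uses root^(n/2) = -1 mod p
        w = w * root % p
    return out


def cyclic_convolution(a, b, root, p):
    """Cyclic convolution of a and b modulo p via pointwise products."""
    n = len(a)
    fa, fb = ntt(a, root, p), ntt(b, root, p)
    fc = [x * y % p for x, y in zip(fa, fb)]
    inv_root = pow(root, -1, p)   # inverse transform uses the inverse root
    inv_n = pow(n, -1, p)         # and a final scaling by 1/n
    return [x * inv_n % p for x in ntt(fc, inv_root, p)]


def cyclic_reference(a, b, p):
    """Direct O(n^2) cyclic convolution, for comparison."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            out[(i + j) % n] = (out[(i + j) % n] + a[i] * b[j]) % p
    return out


P, N = 257, 8                      # illustrative parameters; N divides P - 1
ROOT = pow(3, (P - 1) // N, P)     # primitive N-th root of unity modulo P
a = [1, 2, 3, 4, 0, 0, 0, 0]
b = [5, 6, 7, 8, 0, 0, 0, 0]
conv = cyclic_convolution(a, b, ROOT, P)
```

Zero-padding the inputs, as done here, makes the cyclic result coincide with the ordinary (linear) product as long as the true coefficients stay below the modulus.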
Conference Paper
The security of many cryptographic schemes has been based on special instances of the Learning with Errors (LWE) problem, e.g., Ring-LWE, LWE with binary secret, or LWE with ternary error. However, recent results show that some subclasses are weaker than expected. In this work we show that LWE with binary error, introduced by Micciancio and Peikert, is one such subclass. We achieve this by applying the Howgrave-Graham attack on NTRU, which is a combination of lattice techniques and a Meet-in-the-Middle approach, to this setting. We show that the attack outperforms all other currently existing algorithms for several natural parameter sets. For instance, for the parameter set n=256, m=512, q=256, this attack on LWE with binary error only requires 2^85 operations, while the previously best attack requires 2^117 operations. We additionally present a complete and improved analysis of the attack, using analytic techniques. Finally, based on the attack, we give concrete hardness estimations that can be used to select secure parameters for schemes based on LWE with binary error.
Conference Paper
In this paper we propose an efficient and compact processor for a ring-LWE based encryption scheme. We present three optimizations for the Number Theoretic Transform (NTT) used for polynomial multiplication: we avoid pre-processing in the negative wrapped convolution by merging it with the main algorithm, we reduce the fixed computation cost of the twiddle factors and propose an advanced memory access scheme. These optimization techniques reduce both the cycle and memory requirements. Finally, we also propose an optimization of the ring-LWE encryption system that reduces the number of NTT operations from five to four resulting in a 20% speed-up. We use these computational optimizations along with several architectural optimizations to design an instruction-set ring-LWE cryptoprocessor. For dimension 256, our processor performs encryption/decryption operations in 20/9 μs on a Virtex 6 FPGA and only requires 1349 LUTs, 860 FFs, 1 DSP-MULT and 2 BRAMs. Similarly for dimension 512, the processor takes 48/21 μs for performing encryption/decryption operations and only requires 1536 LUTs, 953 FFs, 1 DSP-MULT and 3 BRAMs. Our processors are therefore more than three times smaller than the current state of the art hardware implementations, whilst running somewhat faster.
Article
Polynomial multiplication is the basic and most computationally intensive operation in ring-learning with errors (ring-LWE) encryption and "somewhat" homomorphic encryption (SHE) cryptosystems. In this paper, the fast Fourier transform (FFT), with a linearithmic complexity of O(n log n), is exploited in the design of a high-speed polynomial multiplier. A constant geometry FFT datapath is used in the computation to simplify the control of the architecture. The contribution of this work is three-fold. First, parameter sets which support both an efficient modular reduction design and the security requirements for ring-LWE encryption and SHE are provided. Second, a versatile pipelined architecture accompanied by an improved dataflow is proposed to obtain a high-speed polynomial multiplier. Third, the proposed architecture supports polynomial multiplications for different lengths n and moduli p. The experimental results on a Spartan-6 FPGA show that the proposed design results in a speedup of 3.5 times on average when compared with the state of the art. It performs a polynomial multiplication in the ring-LWE scheme (n = 256, p = 1049089) and the SHE scheme (n = 1024, p = 536903681) in only 6.3 μs and 33.1 μs, respectively.
Article
The Short Integer Solution (SIS) and Learning With Errors (LWE) problems are the foundations for countless applications in lattice-based cryptography, and are provably as hard as approximate lattice problems in the worst case. An important question from both a practical and theoretical perspective is how small their parameters can be made, while preserving their hardness. We prove two main results on SIS and LWE with small parameters. For SIS, we show that the problem retains its hardness for moduli q ≥ β·n^δ for any constant δ > 0, where β is the bound on the Euclidean norm of the solution. This improves upon prior results which required q > β·√(n log n), and is close to optimal since the problem is trivially easy for q ≤ β. For LWE, we show that it remains hard even when the errors are small (e.g., uniformly random from {0,1}), provided that the number of samples is small enough (e.g., linear in the dimension n of the LWE secret). Prior results required the errors to have magnitude at least √n and to come from a Gaussian-like distribution.
Conference Paper
Bounded Distance Decoding (BDD) is a basic lattice problem used in cryptanalysis: the security of most lattice-based encryption schemes relies on the hardness of some BDD, such as LWE. We study how to solve BDD using a classical method for finding shortest vectors in lattices: enumeration with pruning speedup, such as Gama-Nguyen-Regev extreme pruning from EUROCRYPT '10. We obtain significant improvements upon Lindner-Peikert's Search-LWE algorithm (from CT-RSA '11), and update experimental cryptanalytic results, such as attacks on DSA with partially known nonces and GGH encryption challenges. Our work shows that any security estimate of BDD-based cryptosystems must take into account enumeration attacks, and that BDD enumeration can be practical even in high dimension like 350.
Chapter
Imagine that it’s fifteen years from now and someone announces the successful construction of a large quantum computer. The New York Times runs a frontpage article reporting that all of the public-key algorithms used to protect the Internet have been broken. Users panic. What exactly will happen to cryptography? Perhaps, after seeing quantum computers destroy RSA and DSA and ECDSA, Internet users will leap to the conclusion that cryptography is dead; that there is no hope of scrambling information to make it incomprehensible to, and unforgeable by, attackers; that securely storing and communicating information means using expensive physical shields to prevent attackers from seeing the information—for example, hiding USB sticks inside a locked briefcase chained to a trusted courier's wrist.
Conference Paper
Our main result is a reduction from worst-case lattice problems such as GapSVP and SIVP to a certain learning problem. This learning problem is a natural extension of the “learning from parity with error” problem to higher moduli. It can also be viewed as the problem of decoding from a random linear code. This, we believe, gives a strong indication that these problems are hard. Our reduction, however, is quantum. Hence, an efficient solution to the learning problem implies a quantum algorithm for GapSVP and SIVP. A main open question is whether this reduction can be made classical (i.e., nonquantum). We also present a (classical) public-key cryptosystem whose security is based on the hardness of the learning problem. By the main result, its security is also based on the worst-case quantum hardness of GapSVP and SIVP. The new cryptosystem is much more efficient than previous lattice-based cryptosystems: the public key is of size Õ(n²) and encrypting a message increases its size by a factor of Õ(n) (in previous cryptosystems these values are Õ(n⁴) and Õ(n²), respectively). In fact, under the assumption that all parties share a random bit string of length Õ(n²), the size of the public key can be reduced to Õ(n).