Article

Hybrid Recursive Karatsuba Multiplications on FPGAs

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

The demand for large integer polynomial multiplications has become increasingly significant in modern cryptographic algorithms. The practical implementation of such multipliers is an active field of research focused on optimizing hardware designs with respect to space and time complexity. In this paper, the authors propose an efficient polynomial multiplier based on a Hybrid Recursive Karatsuba Multiplication (HRKM) algorithm. The overall performance of the proposed design is evaluated using the Area-Time-Product (ATP) metric. The hardware implementation of the proposed architecture is carried out on a Virtex-7 FPGA device using the Xilinx ISE platform. Hardware implementation results show that the proposed HRKM architecture achieves ATP reductions of 67.885%, 70.128%, and 65.869% for 128-bit, 256-bit, and 512-bit operands, respectively, compared with Hybrid Karatsuba (non-recursive) multiplication.
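The ATP figures quoted above are relative reductions. As a minimal sketch of how such a percentage could be computed, the snippet below assumes that area is counted in LUTs and time is the critical-path delay in nanoseconds. The abstract does not state these units, and the function names are hypothetical.

```python
def area_time_product(area_luts: int, delay_ns: float) -> float:
    """Return the Area-Time-Product; LUTs x ns is an assumed unit choice."""
    return area_luts * delay_ns


def atp_reduction_percent(atp_reference: float, atp_proposed: float) -> float:
    """Return the percentage by which the proposed ATP undercuts the reference."""
    if atp_reference <= 0:
        raise ValueError("reference ATP must be positive")
    return (1.0 - atp_proposed / atp_reference) * 100.0


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    reference = area_time_product(area_luts=4000, delay_ns=10.0)
    proposed = area_time_product(area_luts=2000, delay_ns=5.0)
    print(atp_reduction_percent(reference, proposed))
```

A lower ATP is better under this definition, so a positive result means the proposed design improves on the reference.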

No full-text available


... There still remains a demand for optimized IEEE-754 compliant floating-point multipliers on FPGAs [35], with a focus on efficient mantissa multipliers for integration into floating-point units. This work introduces an optimized hardware solution for IEEE-754 compliant floating-point multipliers by exploring three types of mantissa multiplier: the Optimized Schoolbook Multiplier (OSBM), the Hybrid Karatsuba Multiplier (HKM), and the Hybrid Recursive Karatsuba Multiplier (HRKM), as proposed in [31], [32], and [33], respectively. ...
... The paper is structured as follows. Section II details the proposed methods for floating-point multipliers in HP, SP, DP, DEP, and QP formats using the three mantissa multipliers: OSBM [31], HKM [32], and HRKM [33]. Section III presents FPGA implementation results and a performance comparison of the proposed IEEE 754 floating-point multipliers with existing state-of-the-art designs. ...
... Hardware implementations of IEEE-754 compliant floating-point multipliers are developed using three types of mantissa multiplier: the Optimized Schoolbook Multiplier (OSBM) [31], the Hybrid Karatsuba Multiplier (HKM) [32], and the Hybrid Recursive Karatsuba Multiplier (HRKM) [33]. This reduces hardware cost in terms of LUTs and lowers circuit latency in FPGA implementations. ...
Conference Paper
This work presents IEEE 754 compliant floating-point multipliers for various formats: Half Precision (HP), Single Precision (SP), Double Precision (DP), Double Extended Precision (DEP), and Quadruple Precision (QP), with bit lengths of 16, 32, 64, 80, and 128 bits, respectively, implemented on Field Programmable Gate Arrays (FPGAs). The study explores different mantissa multipliers -- Optimized Schoolbook Multiplier (OSBM), Hybrid Karatsuba Multiplier (HKM), and Hybrid Recursive Karatsuba Multiplier (HRKM) -- to identify the most efficient mantissa multipliers that comply with the IEEE 754 standard and are suitable for floating-point multipliers. Hardware implementations show that the proposed HRKM-based SP and DP floating-point multipliers achieve Area-Time-Product (ATP) improvements of 79.599% and 83.901%, respectively, in comparison to the latest state-of-the-art work on the AMD Kintex Ultrascale KCU105 FPGA platform.
Article
Full-text available
The Karatsuba algorithm is an effective way to accelerate large integer multiplications through recursive function calls. However, existing hardware implementations of Karatsuba multipliers are limited to fixed operand sizes. To enable their application in diverse domains, including homomorphic encryption with varying multiplicative depths, it is necessary to support variable operand sizes. In this paper, we propose a novel Karatsuba multiplier design, named FlexKA, which supports variable operand sizes through a state machine that manages the dynamic call states of the operation. We evaluate FlexKA on the Xilinx ZynqMP FPGA and demonstrate that it supports variable operand sizes up to 256K bits, achieving a 9.2× speedup compared to a highly-optimized software library running on a CPU. Our results show that FlexKA is an efficient and effective solution for large integer multiplications with flexible operand sizes in hardware.
Article
Full-text available
Efficiency in multiplication is very important in applications like signal processing, cryptosystems and coding theory. This paper presents the design of a fast multiplier using the Karatsuba algorithm to multiply two numbers using the technique of polynomial multiplication. The Karatsuba algorithm saves coefficient multiplications at the cost of extra additions as compared to the ordinary multiplication method. The Karatsuba algorithm is more efficient for multiplication of large numbers.
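The trade described above (one fewer multiplication per split, paid for with extra additions and subtractions) can be sketched in software. The following is a minimal illustration for non-negative integers, not the hardware design from the paper; the function name and the `threshold_bits` parameter are assumptions made for this example.

```python
def karatsuba(x: int, y: int, threshold_bits: int = 32) -> int:
    """Multiply two non-negative integers with recursive Karatsuba splitting."""
    if x < 0 or y < 0:
        raise ValueError("operands must be non-negative")
    if threshold_bits < 4:
        raise ValueError("threshold_bits must be at least 4 so recursion shrinks")

    n = max(x.bit_length(), y.bit_length())
    if n <= threshold_bits:
        # Small operands: fall back to the built-in multiplication.
        return x * y

    half = n // 2
    mask = (1 << half) - 1
    x_lo, x_hi = x & mask, x >> half
    y_lo, y_hi = y & mask, y >> half

    # Three half-size products instead of the four used by the ordinary method.
    z0 = karatsuba(x_lo, y_lo, threshold_bits)
    z2 = karatsuba(x_hi, y_hi, threshold_bits)
    z1 = karatsuba(x_lo + x_hi, y_lo + y_hi, threshold_bits) - z0 - z2

    # Recombine: x*y = z2 * 2^(2*half) + z1 * 2^half + z0.
    return (z2 << (2 * half)) + (z1 << half) + z0


if __name__ == "__main__":
    a, b = 2**200 + 12345, 3**150
    print(karatsuba(a, b) == a * b)
```

The middle term `z1` is where the saving comes from: `(x_lo + x_hi) * (y_lo + y_hi) - z0 - z2` equals `x_lo * y_hi + x_hi * y_lo`, so two cross products are obtained from a single multiplication.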
Article
Full-text available
In this work we generalize the classical Karatsuba Algorithm (KA) for polynomial multiplication to (i) polynomials of arbitrary degree and (ii) recursive use. We determine exact complexity expressions for the KA and focus on how to use it with the least number of operations. We develop a rule for the optimum order of steps if the KA is used recursively. We show how the usage of dummy coefficients may improve performance. Finally we provide detailed information on how to use the KA with least cost, and also provide tables that describe the best possible usage of the KA for polynomials up to a degree of 127. Our results are especially useful for efficient implementations of cryptographic and coding schemes over fixed-size fields like GF(p^m).
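To give a rough sense of the complexity expressions mentioned above, the sketch below counts coefficient multiplications for the plain two-term KA applied recursively, padding odd sizes with one dummy zero coefficient. This padding rule is a simplifying assumption for illustration; it is not the optimized ordering or the tables derived in the paper, and the function names are hypothetical.

```python
def schoolbook_mults(n: int) -> int:
    """Coefficient multiplications for schoolbook multiplication of n-term polynomials."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n * n


def karatsuba_mults(n: int) -> int:
    """Coefficient multiplications for recursive two-term KA with zero padding.

    Each level replaces one n-term product with three ceil(n/2)-term products.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    count = 1
    while n > 1:
        n = (n + 1) // 2  # pad odd sizes with one dummy coefficient
        count *= 3
    return count


if __name__ == "__main__":
    for terms in (2, 4, 16, 128):
        print(terms, schoolbook_mults(terms), karatsuba_mults(terms))
```

For 128 coefficients (degree 127) this gives 3^7 = 2187 multiplications against 128^2 = 16384 for the schoolbook method. The count ignores the extra additions, which is why an exact analysis such as the one in the paper is needed to pick the cheapest arrangement.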
Article
Hardware-efficient polynomial multipliers are desired to satisfy the ever-growing demands of computing within the finite field space toward developing strong cryptosystems. This research explores polynomial multiplication from the context of algebraic structures by introducing a novel hetero-blend recursive multiplier that harnesses the strengths of the contemporary state-of-the-art (SOTA) designs. The heterogeneous-blend recursive multiplier (HRM) merges the footprint efficiency of the Karatsuba multiplier (KM) and the compute-latency benefits of the overlap-free KM (OKM) at higher stages, while at lower bounds it capitalizes on the optimal balance of footprint and compute-latency benefits of the schoolbook multiplier (SBM). To further enhance the performance, HRM integrates heterogeneous term division throughout its stages, a characteristic finding taken from the prior work on the M-term nonhomogeneous Karatsuba multiplier (MNHKA). Furthermore, a MATLAB framework has been devised to expedite the exploration process in the finite field design space resulting from the heterogeneous usage of the M terms across multiple stages. The presented HRM design undergoes comprehensive evaluation when benchmarked against contemporary SOTA designs including KM, OKM, their corresponding homogeneous M-term variants referred to as M-term Karatsuba multiplier (MKM) and M-term OKM (MOKM), alongside recent variants of composite M-term Karatsuba multipliers (CMKA), MNHKA, and the equivalent overlap-free variant, the M-term nonhomogeneous overlap-free Karatsuba multiplier (MNHOKA).
The field-programmable gate array (FPGA) synthesized results for the HRM designs on Zynq ZCU-104 board showcase a best-case of 17.288% lookup table (LUT) savings, 5.68% reduction in delay, and 20.88% gain in area-delay product (ADP) compared with the optimal SOTA design, while also revealing a 13.49% reduction in LUT usage, 5.45% decrease in delay, and 12.97% improvement in ADP when compared with the best among MNHKA and MNHOKA designs. Furthermore, HRM designs synthesized on the Cadence GPDK45 library achieved a best-case footprint saving of 16.18%, a critical path delay improvement of 29.53%, a remarkable 45.66% gain in the ADP, a substantial 30.37% reduction in power consumption, and a noteworthy 38.63% improvement in power per area when compared with the optimal SOTA design. In comparison to the leading MNHKA and MNHOKA designs, the HRM designs exhibit a best-case footprint improvement of 5.77%, 8.31% reduction in delay, 16.76% enhancement in ADP, a significant 20.18% power savings, and a notable 17.71% improvement in power-per-unit-area (PPA). To catalyze ongoing research and innovation, hardware designs assessed in this article are made publicly available for further usage.
Article
In this work, we present an area and energy-efficient serial multiplier. Specifically, we exploit symmetries in odd and even partial products (PPs) in its radix-$\gamma$ implementation. Subsequently, we express them as $\mp(2^{k}\pm 1)$ with $1 \leq k \leq \log_{2}\gamma - 1$, which enables a reduction in hardware resources. For $\gamma \geq 16$, the above representation becomes invalid, requiring additional power-of-two terms and raising hardware costs. To address this, we utilize recursive symmetries in PPs, which enable time-sharing and reduce the logic resources for efficient realization. ASIC synthesis results show that the proposed design achieves substantial savings in area and energy compared with the state-of-the-art design.
Article
Novel binary polynomial multipliers have been designed using M-term overlap-free Karatsuba multiplication (OFKM), where M is 5–8. The proposed designs were realized in digital hardware and implemented on a field-programmable gate array (FPGA), and the best value of M was selected and presented for common National Institute of Standards and Technology (NIST) operand sizes from 64 to 571 bits. The implemented hardware designs use a hybrid approach that combines a given M-term overlap-free Karatsuba multiplier with two-term splitting to reduce the need for zero-padding in the final recurrent stages. Compared to the traditional M-term Karatsuba multipliers, the proposed overlap-free implementations offer reductions in delay and area-delay product (ADP). The proposed designs also compare favorably to previous implementations of binary polynomial multipliers. Their favorable characteristics make the proposed overlap-free Karatsuba polynomial multipliers viable options for use in cryptographic systems where speed is a significant consideration and hardware resource consumption must be limited.
Conference Paper
Hardware designs for large integer polynomial multiplication are urgently needed in various cryptographic fields involving high computational complexity. Schoolbook multiplication, a common alternative, is presented in this paper for implementation. A highly optimized Schoolbook multiplier is proposed, which is much faster than the traditional ones. The overall performance of the algorithm is evaluated using Area-Time-Product (ATP). Hardware implementation of the proposed schoolbook multiplication architecture is done using a Virtex-7 FPGA device in the Xilinx ISE platform.
Conference Paper
Modern cryptographic algorithms demand large integer polynomial multiplications. However, for multipliers with large input operand sizes, the complexity of the hardware design grows in terms of space and time. Thus, in this paper an effort has been made to design an efficient polynomial multiplier by implementing a hybrid Karatsuba multiplication algorithm. The overall performance of the proposed design is measured using Area-Time-Product (ATP). Hardware implementation of the proposed architecture is done using a Virtex-7 FPGA device in the Xilinx ISE platform.
Article
This article details a fast and efficient implementation of the Montgomery Modular Multiplication by taking advantage of parallel multipliers and adders. This implementation was programmed in a high-level synthesis language and tested on an FPGA device. In order to test the performance of the proposal, a sequential version of the algorithm was also implemented in hardware. Moreover, we compared the parallel implementation with a software version and with five contributions from the literature. In this way, we found that our proposal outperforms all the other implementations.
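For readers unfamiliar with the operation, the following is a minimal software sketch of Montgomery modular multiplication built on the standard reduction step (REDC). It is a plain sequential reference, not the parallel high-level-synthesis design described in the article; the function names and the `r_bits` parameter are assumptions for this example, and `pow(n, -1, r)` requires Python 3.8 or later.

```python
def montgomery_setup(n: int, r_bits: int) -> int:
    """Return n_prime such that n * n_prime == -1 (mod 2**r_bits)."""
    if n <= 1 or n % 2 == 0:
        raise ValueError("modulus must be odd and greater than 1")
    r = 1 << r_bits
    if r <= n:
        raise ValueError("2**r_bits must exceed the modulus")
    return (-pow(n, -1, r)) % r


def redc(t: int, n: int, r_bits: int, n_prime: int) -> int:
    """Montgomery reduction: return t * R^-1 mod n for 0 <= t < R * n."""
    mask = (1 << r_bits) - 1
    m = ((t & mask) * n_prime) & mask  # m = (t mod R) * n' mod R
    u = (t + m * n) >> r_bits          # exact division by R
    return u - n if u >= n else u


def montgomery_mul(a: int, b: int, n: int, r_bits: int) -> int:
    """Return (a * b) mod n using Montgomery form internally."""
    n_prime = montgomery_setup(n, r_bits)
    r = 1 << r_bits
    a_bar = (a * r) % n  # convert operands to Montgomery form
    b_bar = (b * r) % n
    prod_bar = redc(a_bar * b_bar, n, r_bits, n_prime)  # a*b*R mod n
    return redc(prod_bar, n, r_bits, n_prime)           # back to normal form


if __name__ == "__main__":
    print(montgomery_mul(7, 9, 13, 4) == (7 * 9) % 13)
```

The appeal for hardware is that `redc` replaces division by the modulus with masking and shifting by a power of two, leaving multiplications and additions that can be parallelized.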
Article
Multipliers requiring large bit lengths have a major impact on the performance of many applications, such as cryptography, digital signal processing (DSP) and image processing. Novel, optimised designs of large integer multiplication are needed as previous approaches, such as schoolbook multiplication, may not be as feasible due to the large parameter sizes. Parameter bit lengths of up to millions of bits are required for use in cryptography, such as in lattice-based and fully homomorphic encryption (FHE) schemes. This paper presents a comparison of hardware architectures for large integer multiplication. Several multiplication methods and combinations thereof are analysed for suitability in hardware designs, targeting the FPGA platform. In particular, the first hardware architecture combining Karatsuba and Comba multiplication is proposed. Moreover, a hardware complexity analysis is conducted to give results independent of any particular FPGA platform. It is shown that hardware designs of combination multipliers, at a cost of additional hardware resource usage, can offer lower latency compared to individual multiplier designs. Indeed, the proposed novel combination hardware design of the Karatsuba-Comba multiplier offers lowest latency for integers greater than 512 bits. For large multiplicands, greater than 16384 bits, the hardware complexity analysis indicates that the NTT-Karatsuba-Schoolbook combination is most suitable.
Conference Paper
Finite field multiplication is a basic operation in all cryptographic applications. It can be performed using Serial, Booth, Montgomery, and Karatsuba-Ofman's divide-and-conquer techniques. The Karatsuba-Ofman multiplier replaces one multiplication with three multiplications of half-length operands, which are performed in parallel. The Karatsuba-Ofman multiplier has been implemented in both sequential and parallel architectures. In order to improve the performance of architectures over GF(2^m), we propose new Sequential/Parallel architectures for the Recursive Karatsuba-Ofman multiplier. In this paper, two Sequential/Parallel architectures are presented, developed, and implemented on the Spartan 3 FPGA platform. The area and delay of the proposed architectures are improved. Mathematical performance models (Area(n), Delay(n)) for large operand sizes n are elaborated for the proposed architectures. They can be used to predict the appropriate multiplier for cryptographic applications.
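The three-multiplication split described above can be sketched for binary polynomials, where addition is XOR and no carries occur. In this illustration a polynomial over GF(2) is stored as a Python integer whose bit i is the coefficient of x^i. The function names and the `threshold_bits` fallback to a schoolbook multiplier are assumptions made for this example, and the sketch omits the modular reduction needed for a full GF(2^m) multiplier.

```python
def clmul_schoolbook(a: int, b: int) -> int:
    """Carry-less (GF(2)) polynomial product by shift-and-XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def karatsuba_gf2(a: int, b: int, threshold_bits: int = 8) -> int:
    """Multiply two GF(2) polynomials with recursive Karatsuba-Ofman splitting."""
    if a < 0 or b < 0:
        raise ValueError("polynomials must be encoded as non-negative integers")
    if threshold_bits < 1:
        raise ValueError("threshold_bits must be at least 1")

    n = max(a.bit_length(), b.bit_length())
    if n <= threshold_bits:
        return clmul_schoolbook(a, b)

    half = (n + 1) // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half

    # Three independent half-length products; hardware can compute them in parallel.
    low = karatsuba_gf2(a_lo, b_lo, threshold_bits)
    high = karatsuba_gf2(a_hi, b_hi, threshold_bits)
    mid = karatsuba_gf2(a_lo ^ a_hi, b_lo ^ b_hi, threshold_bits) ^ low ^ high

    return (high << (2 * half)) ^ (mid << half) ^ low


if __name__ == "__main__":
    p, q = 0b1011011101, 0b1100101011
    print(karatsuba_gf2(p, q, threshold_bits=2) == clmul_schoolbook(p, q))
```

Because XOR never grows the operand width, the middle product stays half-length, which is one reason this scheme maps well to hardware over GF(2^m).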
Conference Paper
Multiplication of long integers is a cornerstone primitive in most public-key cryptosystems. Multiplication of big numbers can be performed best using Karatsuba-Ofman's divide-and-conquer approach. We propose a recursive and efficient hardware design for Karatsuba-Ofman's multiplication algorithm. The hardware is efficient in terms of response time and fairly compact in terms of its description in the hardware description language VHDL. The performance of the synthesised hardware in terms of time and area requirements is compared with that of the Synopsys™ library multiplier as well as two different multipliers that implement Booth's algorithm. The proposed hardware multiplies faster than the other three. However, it requires more hardware area. Nevertheless, our design improves the area×time product as well as the time requirement, while the other three improve area at the expense of both the time requirement and the area×time factor.