Conference Paper

# Finding Optimum Parallel Coprocessor Design for Genus 2 Hyperelliptic Curve Cryptosystems.

Politecnico di Milano, Italy
DOI: 10.1109/ITCC.2004.1286710 Conference: International Conference on Information Technology: Coding and Computing (ITCC'04), Volume 2, April 5-7, 2004, Las Vegas, Nevada, USA
Source: DBLP

ABSTRACT

Hardware accelerators are often used in cryptographic applications, e.g., in high-end smart cards, to speed up the highly arithmetic-intensive public-key primitives. One of these emerging and very promising public-key schemes is based on hyperelliptic curve cryptosystems (HECC). In the open literature only a few considerations deal with hardware implementation issues of HECC. Our contribution appears to be the first one to propose architectures for the latest findings in efficient group arithmetic on HEC. The group operation of HECC allows parallelization at different levels: bit-level parallelization (via different digit-sizes in multipliers) and arithmetic operation-level parallelization (via replicated multipliers). We investigate the trade-offs between both parallelization options and identify speed and time-area optimized configurations. We found that a coprocessor using a single multiplier (D=8) instead of two or more is best suited. This coprocessor is able to compute group addition and doubling in 479 and 334 clock cycles, respectively. With more resources, it is possible to achieve 288 and 248 clock cycles, respectively.
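The bit-level parallelization mentioned in the abstract can be illustrated with a small software model of a digit-serial multiplier over GF(2^m). This is only a sketch under stated assumptions, not the architecture from the paper: the function name `gf2m_mul_digit_serial`, the most-significant-digit-first scheme, and the toy field GF(2^8) with reduction polynomial 0x11B are all choices made for illustration. The model consumes `d` bits of one operand per step, so one multiplication takes ceil(m/d) steps, which is the usual trade-off between digit-size and latency.

```python
def gf2m_mul_digit_serial(a, b, m, poly, d):
    """Multiply a*b in GF(2^m), consuming d bits of b per step (MSD first).

    poly is the reduction polynomial including the x^m term.
    Returns (product, steps), where steps models the clock cycles of a
    hypothetical digit-serial multiplier with digit-size d.
    Assumes 0 <= a, b < 2^m.
    """
    mask = (1 << d) - 1
    n_digits = -(-m // d)  # ceil(m / d)
    acc = 0
    for i in reversed(range(n_digits)):
        # acc <- acc * x^d mod poly (one bit at a time in this model)
        for _ in range(d):
            acc <<= 1
            if (acc >> m) & 1:
                acc ^= poly
        # partial <- a * (current digit of b), as a carry-less product
        digit = (b >> (i * d)) & mask
        partial = 0
        for j in range(d):
            if (digit >> j) & 1:
                partial ^= a << j
        # reduce the partial product, whose degree is at most m + d - 2
        for k in range(m + d - 2, m - 1, -1):
            if (partial >> k) & 1:
                partial ^= poly << (k - m)
        acc ^= partial
    return acc, n_digits


if __name__ == "__main__":
    # Toy field GF(2^8) with polynomial x^8 + x^4 + x^3 + x + 1 (assumption).
    for d in (1, 2, 4, 8):
        product, steps = gf2m_mul_digit_serial(0x57, 0x83, 8, 0x11B, d)
        print(f"d={d}: product={product:#04x}, steps={steps}")
```

Larger digit-sizes reduce the step count per multiplication at the cost of more logic per step, whereas operation-level parallelization would instead instantiate several such multipliers side by side.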

### Full-text

Available from: G.M. Bertoni, Aug 01, 2014
• "Since then, tremendous effort has been made to extend and optimize the Harley's algorithm in order to make the performance of the HECC compatible to that of the ECC. The performance of HECC has been analyzed and implemented in all kinds of general-purpose processors and embedded processors [2]–[4], [6], [12], [22], [28]–[31], [39], [42]–[45], [47], [53]–[55], [57], [58], [64]–[72], [74], [77], [82], [84], [86], [88]–[92], [95]–[97], [99], [101]–[103], [108], [111], [112], and in many hardware platforms such as Field Programmable Gate Arrays (FPGAs) [7]–[9], [11], [17], [26], [50], [51], [56], [107], [109], [110]. Furthermore, using HECs to efficiently implement pairing-based cryptosystems has been actively investigated recently [5], [14]–[16], [23], [24], [46], [48], [75]. "
##### Article: Efficient Explicit Formulae for Genus 3 Hyperelliptic Curve Cryptosystems
ABSTRACT: The ideal class groups of hyperelliptic curves (HECs) can be used in cryptosystems based on the discrete logarithm problem. Recent developments of computational technologies for scalar multiplications of divisor classes have shown that the performance of hyperelliptic curve cryptosystems (HECC) is compatible to that of elliptic curve cryptosystems (ECC). Especially, genus 3 HECC are well suited for all kinds of embedded processor architectures, where resources such as storage, time or power are constrained, because of their short operand sizes. In this paper, we investigate the efficient explicit formulae for genus 3 HECs over both prime fields and binary fields, and analyze how many field operations are needed. First, we improve the explicit formulae for genus 3 HECs over binary fields using the theta divisors which can save about 20% ∼ 50% multiplications for four cases, and extend the method to genus 3 HECs over prime fields. We then discuss acceleration of the divisor class doubling for genus 3 HECs over binary fields. By constructing birational transformations of variables, we find four types of curves which can lead to much faster divisor class doubling and give the corresponding explicit formulae. Especially, for special genus 3 HECs over binary fields with h(X) = 1, we obtain the fastest explicit doubling formula which only requires 1I + 10M + 11S. Thirdly, we propose the inversion-free explicit formulae for genus 3 HEC over both prime fields and binary fields by introducing one more coordinate to collect the common denominator of the usual six coordinates. Finally, comparisons with the known results in terms of field operations and an implementation of genus 3 HECC over three binary fields on a Pentium-4 processor are provided.
Full-text · Article · Dec 2008
• "They used Cantor's algorithm [8] to implement HECC on the VirtexII FPGA. Wollinger et al. investigated HECC implementation on a VLSI coprocessor [12] [13]. In [14] three different architectures on a FPGA have been examined for a vast area of applications. "
##### Conference Paper: A Hyperelliptic Curve Crypto Coprocessor for an 8051 Microcontroller
ABSTRACT: This paper presents a microcode instruction set coprocessor which is designed to work with an 8-bit 8051 microcontroller and implements a hyperelliptic curve cryptosystem (HECC). The microcode coprocessor is capable of performing a range of Galois field operations using a dual-multiplier/dual-adder datapath and storing the intermediate results in the local storage unit of the coprocessor (RAM). This coprocessor is programmed using software routines on the 8051 microcontroller, which implement the HECC divisor doubling and addition operations. The Jacobian scalar multiplication was computed in 656 msec (7.87 M cycles) at a 12 MHz clock frequency.
Full-text · Conference Paper · Dec 2005
##### Article: Parallel Memory Architecture for Elliptic Curve Cryptography over $\mathrm{GF}(p)$ Aimed at Efficient FPGA Implementation
ABSTRACT: Parallelization of operations is of utmost importance for efficient implementation of Public Key Cryptography algorithms. Starting with a classification of parallelization methods at different abstraction levels of public key algorithms, we propose a novel memory architecture for elliptic curve implementations with multiple modular multiplier units. This architecture is well-suited for different point addition and doubling algorithms over $\mathrm{GF}(p)$ to be implemented on FPGAs. It allows the execution time to scale with the number of modular multipliers and exhibits nearly no overhead compared to the mere runtime of the multipliers. The advantages of this distributed memory architecture are demonstrated by means of two different point addition and doubling algorithms.
Full-text · Article · Apr 2008 · Journal of Signal Processing Systems