Conference Paper

# Finding Optimum Parallel Coprocessor Design for Genus 2 Hyperelliptic Curve Cryptosystems.

Politecnico di Milano, Italy
DOI: 10.1109/ITCC.2004.1286710 Conference: International Conference on Information Technology: Coding and Computing (ITCC'04), Volume 2, April 5-7, 2004, Las Vegas, Nevada, USA
Source: DBLP

ABSTRACT Hardware accelerators are often used in cryptographic applications for speeding up the highly arithmetic-intensive public-key primitives, e.g. in high-end smart cards. One of these emerging and very promising public-key schemes is based on hyperelliptic curve cryptosystems (HECC). In the open literature only a few considerations deal with hardware implementation issues of HECC. Our contribution appears to be the first one to propose architectures for the latest findings in efficient group arithmetic on HEC. The group operation of HECC allows parallelization at different levels: bit-level parallelization (via different digit-sizes in multipliers) and arithmetic operation-level parallelization (via replicated multipliers). We investigate the trade-offs between both parallelization options and identify speed and time-area optimized configurations. We found that a coprocessor using a single multiplier (D=8) instead of two or more is best suited. This coprocessor is able to compute group addition and doubling in 479 and 334 clock cycles, respectively. With more resources, group addition and doubling can be computed in 288 and 248 clock cycles, respectively.
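The bit-level parallelization mentioned in the abstract can be illustrated with a small software model of a digit-serial multiplier over GF(2^m). This is only a sketch under stated assumptions, not the architecture from the paper: the function name `gf2m_mul_digit_serial`, the toy field GF(2^8) with reduction polynomial `0x11B`, and the idealized cost of one clock cycle per digit (no control overhead) are all assumptions made for illustration.

```python
def gf2m_mul_digit_serial(a, b, m, poly, D):
    """Multiply a*b in GF(2^m), consuming D bits of b per modelled clock cycle.

    `poly` is the reduction polynomial including its x^m term.
    Returns (product, cycles), where cycles = ceil(m / D) in this idealized model.
    """
    def reduce(x):
        # Clear every bit at position >= m by adding a shifted copy of poly.
        for i in range(x.bit_length() - 1, m - 1, -1):
            if (x >> i) & 1:
                x ^= poly << (i - m)
        return x

    num_digits = -(-m // D)           # ceil(m / D)
    mask = (1 << D) - 1
    acc = 0
    cycles = 0
    # Most-significant digit first: acc = acc * x^D + a * digit (mod poly).
    for j in reversed(range(num_digits)):
        digit = (b >> (j * D)) & mask
        acc <<= D
        for k in range(D):            # carry-less partial product a * digit
            if (digit >> k) & 1:
                acc ^= a << k
        acc = reduce(acc)
        cycles += 1                   # one digit per cycle (assumed)
    return acc, cycles


if __name__ == "__main__":
    # Toy field GF(2^8); a larger digit size D means fewer cycles per multiplication.
    for D in (1, 2, 4, 8):
        product, cycles = gf2m_mul_digit_serial(0x57, 0x83, 8, 0x11B, D)
        print(f"D={D}: product={product:#04x}, cycles={cycles}")
```

In this model a larger digit size D shortens each multiplication at the cost of a wider (larger-area) multiplier, whereas replicating multipliers shortens the schedule of the group operation instead; this is the kind of trade-off the abstract refers to.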

• ##### Article: Cryptographic Hardware and Embedded Systems - CHES 2005, 7th International Workshop, Edinburgh, UK, August 29 - September 1, 2005, Proceedings
ABSTRACT: For the preceding workshop see Zbl 1056.68012.
• ##### Conference Paper: Performance of HECC Coprocessors Using Inversion-Free Formulae.
ABSTRACT: The HyperElliptic Curve Cryptosystem (HECC) has been studied quite extensively in recent years. In the open literature one can find results on improving the group operations of HECC as well as implementations on various types of processors. There have also been some efforts to implement HECC on hardware devices, such as FPGAs. Only one of these works, however, deals with the inversion-free formulae to compute the group operations of HECC. We present inversion-free group operations for the HEC $y^2 + xy = x^5 + f_1 x + f_0$, targeting characteristic-two fields, in order to allow a fair comparison with hardware architectures using the affine case presented in (BBWP04). In the main part of the paper we use these results to investigate various hardware architectures for a HECC VLSI coprocessor. If area constraints are not considered, scalar multiplication can be performed in 19769 clock cycles using three field multipliers (of type D = 32), one field adder and one field squarer, where D indicates the digit size of the multiplier. However, the optimal solution in terms of latency and area uses two multipliers (of type D = 4), one adder and one squarer. The main finding of the present contribution is that coprocessors based on the inversion-free formulae should be preferred to those using group operations containing inversion. This holds despite the fact that one field inversion in the affine HECC group operation is traded for up to 24 field multiplications in the inversion-free case.
Computational Science and Its Applications - ICCSA 2006, International Conference, Glasgow, UK, May 8-11, 2006, Proceedings, Part III; 01/2006
• ##### Article: Parallel Memory Architecture for Elliptic Curve Cryptography over $\mathbb{G}\mathbb{F}{\left( p \right)}$ Aimed at Efficient FPGA Implementation
ABSTRACT: Parallelization of operations is of utmost importance for efficient implementation of Public Key Cryptography algorithms. Starting with a classification of parallelization methods at different abstraction levels of public key algorithms, we propose a novel memory architecture for elliptic curve implementations with multiple modular multiplier units. This architecture is well-suited for different point addition and doubling algorithms over $\mathbb{G}\mathbb{F}{\left( p \right)}$ to be implemented on FPGAs. It allows the execution time to scale with the number of modular multipliers and exhibits nearly no overhead compared to the mere runtime of the multipliers. The advantages of this distributed memory architecture are demonstrated by means of two different point addition and doubling algorithms.
Journal of Signal Processing Systems 04/2008; 51(1). DOI:10.1007/s11265-007-0135-9 · 0.56 Impact Factor