Conference Paper

Finding Optimum Parallel Coprocessor Design for Genus 2 Hyperelliptic Curve Cryptosystems.

Politecnico di Milano, Italy;
DOI: 10.1109/ITCC.2004.1286710 Conference: International Conference on Information Technology: Coding and Computing (ITCC'04), Volume 2, April 5-7, 2004, Las Vegas, Nevada, USA
Source: DBLP

ABSTRACT Hardware accelerators are often used in cryptographic applications for speeding up the highly arithmetic-intensive public-key primitives, e.g. in high-end smart cards. One of these emerging and very promising public-key schemes is based on hyperelliptic curve cryptosystems (HECC). In the open literature only a few considerations deal with hardware implementation issues of HECC. Our contribution appears to be the first one to propose architectures for the latest findings in efficient group arithmetic on HEC. The group operation of HECC allows parallelization at different levels: bit-level parallelization (via different digit-sizes in multipliers) and arithmetic operation-level parallelization (via replicated multipliers). We investigate the trade-offs between both parallelization options and identify speed and time-area optimized configurations. We found that a coprocessor using a single multiplier (D=8) instead of two or more is best suited. This coprocessor is able to compute group addition and doubling in 479 and 334 clock cycles, respectively. Providing more resources it is possible to achieve 288 and 248 clock cycles, respectively.

  • [Show abstract] [Hide abstract]
    ABSTRACT: Pipelining is a well-known performance enhancing technique in computer science. Point multiplication is the computationally dominant operation in curve based cryptography. It is generally computed by repeatedly invoking some curve (group) operation like doubling, tripling, halving, addition of group elements. Such a computational procedure may be efficiently computed in a pipeline. More generally, let Π be a computational procedure, which computes its output by repeatedly invoking processes from a set of similar processes. Employing pipelining technique may speed up the running time of the computational procedure. To find pipeline sequence by trial and error method is a nontrivial task. In the current work, we present a general methodology, which given any such computational procedure Π can find a pipelined version with improved computational speed. To our knowledge, this is the first such attempt in curve based cryptography, where it can be used to speed up the point multiplication methods using inversion-free explicit formula for curves over prime fields. As an example, we employ the proposed general methodology to derive a pipelined version of the hyperelliptic curve binary algorithm for point multiplication and obtain a performance gain of 32% against the ideal theoretical value of 50%.
    Applied Cryptography and Network Security, 4th International Conference, ACNS 2006, Singapore, June 6-9, 2006, Proceedings; 01/2006
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: This paper presents a microcode instruction set coprocessor which is designed to work with an 8-bit 8051 microcontroller and implements a hyperelliptic curve cryptosystem (HECC). The microcode coprocessor is capable of performing a range of Galois field operations using a dual-multiplier/dual-adder datapath and storing the intermediate results in the local storage unit of the coprocessor (RAM). This coprocessor is programmed using the software routines from the 8051 microcontroller which implements the HECC divisor's doubling and addition operations. The Jacobian scalar multiplication was computed in a 656 msec (7.87 M cycles) at 12 MHz clock frequency.
    Signal Processing Systems Design and Implementation, 2005. IEEE Workshop on; 12/2005
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: The HyperElliptic Curve Cryptosystem (HECC) was quite extensively studied during the recent years. In the open literature one can flnd results on improving the group operations of HECC as well as implementations on various types of processors. There have also been some efiorts to implement HECC on hardware devices, like for instance FPGAs. Only one of these works, however, deals with the inversionfree formulae to compute the group operations of HECC. We present inversionfree group operations for the HEC y2 + xy = x5 + f1x + f0 and targeting characteristic two flelds. The reason being to al- low a fair comparison to hardware architectures using the a-ne case presented in (BBWP04). In the main part of the paper we use these results to investigate various hardware architectures for a HECC VLSI coprocessor. If area constraints are not considered, scalar multiplication can be performed in 19769 clock cycles using three fleld multipliers (of type D = 32), one fleld adder and one fleld squarer, where D indicates the digit size of the multiplier. However, the optimal solution in terms of latency and area uses two multipliers (of type D = 4), one addition and one squaring. The main flnding of the present contribution is that copro- cessors based on the inversionfree formulae should be preferred compared to those using group operations containing inversion. This holds despite the fact that one fleld inversion in the a-ne HECC group operation is traded by up to 24 fleld multiplications in the inversionfree case.
    Computational Science and Its Applications - ICCSA 2006, International Conference, Glasgow, UK, May 8-11, 2006, Proceedings, Part III; 01/2006

Full-text (2 Sources)

1 Download
Available from
Aug 1, 2014