All content in this area was uploaded by Diego F. Aranha on Dec 17, 2017

Content may be subject to copyright.

A Secure and Eﬃcient Implementation of the

Quotient Digital Signature Algorithm (qDSA)

Armando Faz-Hernández, Hayato Fujii, Diego F. Aranha, and Julio López*

Institute of Computing, University of Campinas.

1251 Albert Einstein, Cidade Universitária. Campinas, São Paulo, Brazil.

{armfazh,dfaranha,jlopez}@ic.unicamp.br, hayato@lasca.ic.unicamp.br

Abstract. Digital signatures provide a means to publicly authenticate messages sent over an insecure channel. Recently, the Quotient Digital Signature Algorithm (qDSA) was introduced, aiming at key compatibility with the Diffie-Hellman X25519 function. Due to the novelty of qDSA, there remains a need for an optimized implementation that allows identifying the real impact of this new algorithm. In this work, we focus on the secure and efficient implementation of qDSA. By leveraging precomputation in Joye's right-to-left algorithm, we reduced the running time of signature generation by 30–35%, and the running time of the verification procedure by 19%. In addition, for increased security, we show a verification method that validates qDSA signatures unequivocally. All of these improvements were included in an optimized software library targeting 32-bit ARM and 64-bit Intel architectures. The improved performance achieved on these platforms positions qDSA as a competitive alternative for deploying digital signatures efficiently and securely.

Keywords: qDSA ·Digital Signatures ·Elliptic Curve Cryptography ·

Secure Software ·Montgomery Curves

1 Introduction

Digital signatures are public-key cryptographic schemes used to authenticate messages sent over a public channel; thus, anyone who knows the signer's public key is able to verify whether a signed message comes from a reliable source. Digital signatures also provide other security services such as data integrity, authentication, and non-repudiation. One of the most relevant applications of digital signatures is the certification of public keys in the Public-Key Infrastructure (PKI). In this scenario, a trusted authority issues and signs a digital certificate that binds a public key to its owner; then, whenever an entity claims to be the owner of a public key, the digital certificate must be presented; therefore, anybody who knows the authority's public key is able to verify the signature of the certificate that attests this relationship.

* The authors acknowledge support during the development of this research from Intel and FAPESP under project “Secure Execution of Cryptographic Algorithms” (grant 14/50704-7), and from LG Electronics Inc. under project “Efficient and Secure Cryptography for IoT”. The fourth author was partially supported by a research productivity grant from CNPq.

In the last decades, several digital signature algorithms have been standardized. In 1998, the National Institute of Standards and Technology (NIST) approved the use of the Digital Signature Algorithm (DSA) [24] and the RSA digital signature [34]. Later, in 2000, NIST also adopted a digital signature algorithm that relies on the computational intractability of the elliptic curve discrete logarithm problem; this method is known as the Elliptic Curve Digital Signature Algorithm (ECDSA) [17,25]. Since their standardization, these algorithms have been widely used in secure communication protocols, such as the Transport Layer Security (TLS) protocol [33].

More recently, cutting-edge cryptographic research is in pursuit of eﬃcient

digital signature algorithms. The introduction of the Edwards Digital Signa-

ture Algorithm (EdDSA) [2] is an example of the latest progress. EdDSA uses

Edwards curves, which belong to a special family of elliptic curves whose point

addition formulas are more eﬃcient than the formulas used for an arbitrary curve

in the short Weierstrass model. Ed25519 [18] is an instance of EdDSA addressing

the 128-bit security level. Particularly, Ed25519 uses an Edwards curve derived

from the Montgomery curve known as Curve25519 [1]. This latter curve was

intended to accelerate the key exchange protocol leading to the Diﬃe-Hellman

X25519 function [41]. Although Ed25519 and X25519 can be used in conjunc-

tion beneﬁting from the common prime ﬁeld arithmetic, the keys used in each

protocol are not entirely compatible.

To make this compatibility possible, novel alternatives were derived such

as the XEdDSA signature scheme [30]. In the past few months, an alternative

approach was proposed by Renes and Smith [32], who introduced a new signature

scheme based on Curve25519. They named this scheme as the Quotient Digital

Signature Algorithm (qDSA) because scalar point multiplications are performed

on an algebraic variety generated by the quotient of an algebraic curve.

The most salient properties of qDSA are: first, it allows X25519 keys to be used (without modification) for signing; and second, elliptic curve operations are performed using only the x-coordinate of points (provided by the use of Montgomery elliptic curves). On the other hand, given a qDSA signature, it is easy to obtain a second signature that also passes the verification procedure. Although this fact does not represent an attack per se, it does open the door to a misuse of the cryptographic scheme that could potentially become an effective attack [7,8]. Therefore, there is a need for methods that allow verifying qDSA signatures unequivocally.

Contributions. In view of the current scenario, our main contribution focuses on the secure and efficient software implementation of qDSA. On the security side, we provide a verification method that validates (without ambiguity) the correct signature of a message, and we also analyze the overheads in space and time introduced by our approach. On the efficiency side, we show a technique that accelerates the key generation, signing, and verification procedures. This speedup was achieved by employing precomputed look-up tables during the evaluation of Joye's right-to-left algorithm [19], using an approach similar to the one introduced by Oliveira et al. [29]. Due to the novelty of qDSA, there is a need for an optimized implementation beyond the one developed by qDSA's authors [32]. For this reason, we focus on the development of a software library that supports both 32-bit ARM processors (Cortex M4, Cortex A7 and Cortex A15 micro-architectures) and 64-bit Intel processors (Haswell and Skylake micro-architectures). For all of these architectures, we use optimized prime field arithmetic and elliptic curve operations leading to an efficient and secure implementation of the qDSA signature scheme. The source code is available at: http://github.com/armfazh/qdsa-space17.

The scalar point multiplication algorithm presented in [29] requires the use of points that lie in small subgroups of the elliptic curve, i.e. low-order points. An attacker can leverage the use of low-order points to weaken the security of an implementation; for example, by means of side-channel attacks [9], or by exploiting vulnerabilities in insecure implementations, like the ones found in some cryptographic currencies [37]. For this reason, and as a side result, we describe a technique that avoids low-order points during the calculation of scalar point multiplications.

The remainder of this document is divided as follows. In Sect. 2, we review

the qDSA scheme and the parameters used in our implementation. In Sect. 3, we

show how to accelerate the calculation of ﬁxed-point multiplications. In Sect. 4,

we present a new veriﬁcation procedure. In Sect. 5, we report the results of the

performance benchmark of our software library. Finally, in Sect. 6, we point out

some concluding comments.

2 The Quotient Digital Signature Algorithm

The Quotient Digital Signature Algorithm (qDSA) is a Schnorr-like signature scheme [35] that operates over a Kummer variety $K$. This variety comes from the quotient of an elliptic (or hyper-elliptic) curve $E$ as $K = E/\langle\pm 1\rangle$, i.e. for the case of elliptic curves, the points $P, -P \in E$ are mapped to a single element in $E/\langle\pm 1\rangle$. Although this mapping does not preserve the group structure of $E$, it is still possible to compute multiplications by integers. When qDSA is instantiated with elliptic curves, the resulting Kummer variety is a one-dimensional projective space $\mathbb{P}^1(\mathbb{F}_p)$, also known as the x-line (see [6,32] for more details).

In this section, we revisit elliptic curve operations on Montgomery curves;

then, we detail the qDSA signature scheme together with the instance generated

from Curve25519’s parameters.

2.1 Arithmetic of Montgomery Curves

Let $\mathbb{F}_p$ be a prime field. A Montgomery elliptic curve is defined over $\mathbb{F}_p$ as:

$$E_{A,B}/\mathbb{F}_p : By^2 = x^3 + Ax^2 + x, \tag{1}$$

where $A, B \in \mathbb{F}_p$, $A^2 \neq 4$, and $B \neq 0$. The set of solutions of this equation forms a commutative group having as identity the element $\mathcal{O}$, which is known as the point at infinity. Hence, given two points $P$ and $Q$, we can obtain a third point $R$ such that $R = P + Q$. The inverse of a point $P = (x, y)$ is obtained as $-P = (x, -y)$. For these curves, the order of the group is always divisible by four [22]. Given an $n$-bit integer $k$ and a point $P$, the scalar point multiplication is defined as $kP = \operatorname{sgn}(k)\sum_{i=0}^{n-1} 2^i k_i P$, where $k_i$ is the $i$-th bit of $|k|$.

For adding points, Montgomery found efficient formulas that operate over the x-coordinate of points [22]. In order to apply these operations, the elliptic curve must be embedded in a projective space. Let $\mathbb{P}^2(\mathbb{F}_p)$ be a projective space of dimension two; then the projective representation of a point $P = (x_P, y_P)$ is $(\lambda X_P : \lambda Y_P : \lambda Z_P)$, such that $\lambda \neq 0$, $x_P = X_P/Z_P$, and $y_P = Y_P/Z_P$. Montgomery noted that, in the projective space, a point addition can be calculated using only the x-coordinate of the points. Therefore, the following function maps elliptic curve points to elements in the Kummer variety $E/\langle\pm 1\rangle$:

$$E \to E/\langle\pm 1\rangle \cong \mathbb{P}^1(\mathbb{F}_p), \qquad (X_P : Y_P : Z_P) \mapsto (X_P : Z_P), \qquad \mathcal{O} \mapsto (1 : 0). \tag{2}$$

Let $P = (X_P : Z_P)$ and $Q = (X_Q : Z_Q)$ be two points mapped into the Kummer variety. Montgomery devised a formula for computing differential additions (dadd); thus, given $P$, $Q$, and $R = P - Q$ (all in projective coordinates), the differential addition formula computes $P +_R Q = (X_{P +_R Q} : Z_{P +_R Q})$ as follows:

$$X_{P +_R Q} = Z_R (X_P X_Q - Z_P Z_Q)^2, \qquad Z_{P +_R Q} = X_R (X_P Z_Q - Z_P X_Q)^2. \tag{3}$$

For the particular case when the points to be added are equal, we have a point doubling (doub) denoted as $2P = (X_{2P} : Z_{2P})$ and calculated as follows:

$$X_{2P} = (X_P^2 - Z_P^2)^2, \qquad Z_{2P} = 4 X_P Z_P (X_P^2 + A X_P Z_P + Z_P^2). \tag{4}$$

Based on (3) and (4), Montgomery also introduced an algorithm for computing scalar point multiplications. The well-known Montgomery ladder algorithm (Alg. 1) computes the x-coordinate of $kP$, given the x-coordinate of $P$ and an $n$-bit integer scalar $k$. The cost of Alg. 1 is mainly determined by the number of operations performed in each iteration: the Montgomery ladder takes one doubling operation and one differential addition per bit of $k$.

Algorithm 1 uses an auxiliary function cswap$(b, U, V)$, which interchanges the values of $U$ and $V$ whenever $b = 1$; otherwise, the points are not modified. Since this function could introduce timing variability in its execution, cswap must be securely implemented by adding countermeasures that prevent, for example, timing attacks [4,20]. Consequently, we implemented cswap using Boolean operations; thus, assuming $U$ and $V$ are $n$-bit strings, cswap is computed as follows:

$$(U', V') = \mathrm{cswap}(b, U, V) = \big( (\neg M \wedge U) \oplus (M \wedge V),\; (M \wedge U) \oplus (\neg M \wedge V) \big), \tag{5}$$

where $M$ is an $n$-bit mask initialized to $(111\ldots1)_2$, i.e. $n$ ones, if $b = 1$; otherwise $M = (000\ldots0)_2$, i.e. $n$ zeros.

Algorithm 1 Montgomery Ladder Algorithm.
Input: $k \in \mathbb{Z}$ such that $k > 0$, and $P = (X_P : Z_P)$.
Output: $kP = (X_{kP} : Z_{kP})$.
1: Let $(k_{n-1} = 1, \ldots, k_0)_2$ be the binary representation of $k$.
2: Initialize $Q_0 \leftarrow 2P$, $Q_1 \leftarrow P$.
3: for $i \leftarrow n-2$ to $0$ do
4:     $(Q_0, Q_1) \leftarrow \mathrm{cswap}(k_i \oplus k_{i+1}, Q_0, Q_1)$
5:     $(Q_0, Q_1) \leftarrow (\mathrm{doub}(Q_0), \mathrm{dadd}(Q_0, Q_1, P))$    // $Q_0 \leftarrow 2Q_0$, $Q_1 \leftarrow Q_0 +_P Q_1$
6: end for
7: $(Q_0, Q_1) \leftarrow \mathrm{cswap}(k_0, Q_0, Q_1)$
8: return $Q_0$    // Return also $Q_1$ for y-coordinate recovery.
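The masked swap of Eq. (5) and the ladder of Alg. 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `cswap_points` and `ladder_x`, the 256-bit mask width, and the use of Curve25519's parameters are choices made here, and Python integers give no constant-time guarantee, so the sketch shows the data flow rather than a hardened implementation.

```python
P = 2**255 - 19          # field prime of Curve25519
A = 486662               # Montgomery coefficient A (B = 1)
NBITS = 256              # assumed width of the bit strings U and V
FULL = (1 << NBITS) - 1

def cswap(b, U, V):
    """Eq. (5): swap U and V when b == 1, using only Boolean operations."""
    M = (-b) & FULL                      # all ones if b == 1, all zeros if b == 0
    not_M = M ^ FULL
    return (not_M & U) ^ (M & V), (M & U) ^ (not_M & V)

def cswap_points(b, Q0, Q1):
    """Apply cswap coordinate-wise to two projective points (X : Z)."""
    X0, X1 = cswap(b, Q0[0], Q1[0])
    Z0, Z1 = cswap(b, Q0[1], Q1[1])
    return (X0, Z0), (X1, Z1)

def doub(Q):
    """Eq. (4): point doubling on the x-line."""
    X, Z = Q
    return (X*X - Z*Z)**2 % P, 4*X*Z*(X*X + A*X*Z + Z*Z) % P

def dadd(Q0, Q1, R):
    """Eq. (3): Q0 + Q1, given the difference R = Q0 - Q1."""
    (XP, ZP), (XQ, ZQ), (XR, ZR) = Q0, Q1, R
    return ZR*(XP*XQ - ZP*ZQ)**2 % P, XR*(XP*ZQ - ZP*XQ)**2 % P

def ladder_x(k, x):
    """Alg. 1: affine x-coordinate of kP from x = x(P), for k > 0."""
    bit = lambda i: (k >> i) & 1
    base = (x % P, 1)
    Q0, Q1 = doub(base), base
    for i in range(k.bit_length() - 2, -1, -1):
        Q0, Q1 = cswap_points(bit(i) ^ bit(i + 1), Q0, Q1)
        Q0, Q1 = doub(Q0), dadd(Q0, Q1, base)
    Q0, Q1 = cswap_points(bit(0), Q0, Q1)
    X, Z = Q0
    return X * pow(Z, P - 2, P) % P
```

A quick sanity check is the Diffie-Hellman property: applying two scalars in either order to the base point with $x_G = 9$ should give the same x-coordinate.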

2.2 Instantiating qDSA with Montgomery Curves

Domain Parameters of qDSA. Given an integer $N$, the size of public keys is fixed to $N$ bits and the signature's size is $2N$ bits. The following set represents the domain parameters of the signature scheme:

$$D = \{N, p, E_{A,B}, \ell, G, H\}, \tag{6}$$

where: $p$ is a large prime number such that $N \approx \log_2(p)$; $E_{A,B}$ is a Montgomery elliptic curve defined over $\mathbb{F}_p$; this curve has a large prime subgroup of order $\ell$; $G$ is a point of order $\ell$; and $H$ is a hash function producing $2N$-bit digests.

A qDSA Instance. Due to the performance features offered by the elliptic curve named Curve25519 [1], it can also be used to produce an efficient instance of qDSA; thus, $D$ is specified as:

– Since $p = 2^{255} - 19$, we have $N = 256$.
– Curve25519 is defined over $\mathbb{F}_p$ as $E_{486662,1}$.
– This curve forms a group of order $8\ell$, where
$$\ell = 2^{252} + 27742317777372353535851937790883648493 \tag{7}$$
is a prime number.
– A point $G = (x_G, y_G)$ of order $\ell$ is fixed as $x_G = 9$ and $y_G = \sqrt{39420360} \in \mathbb{F}_p$ such that $y_G$ is odd.
– Regarding the cryptographic hash function, the authors of qDSA selected an extendable-output function belonging to the Secure Hash Algorithm v3 (SHA3) standard [26]; therefore, they selected $H$ as the SHAKE128 function, fixing its output size to 512 bits.

2.3 Digital Signature Operations

The qDSA scheme consists of three algorithms: key generation (Alg. 2), signature generation (Alg. 3), and signature verification (Alg. 4). The latter procedure requires an auxiliary function (Alg. 5) that will be revisited in Sect. 4.

Algorithm 2 Key generation.
Input: $D$, the domain parameters.
Output: $(d_0, d_1) \in \{0,1\}^{2N}$ is a private key, and $x_Q \in \mathbb{F}_p$ is a public key.
1: $d \xleftarrow{\$} \{0,1\}^N$
2: $(h_{2N-1}, \ldots, h_0)_2 \leftarrow H(d)$
3: $d_0 \leftarrow (h_{2N-1}, \ldots, h_N)_2$
4: $d_1 \leftarrow (h_{N-1}, \ldots, h_0)_2$
5: $Q = (X_Q : Z_Q) \leftarrow d_0 G$    // Alg. 7.
6: $x_Q \leftarrow X_Q / Z_Q$
7: return $(d_0, d_1)$ and $x_Q$

Algorithm 3 Signature generation.
Input: $(d_0, d_1)$ and $x_Q$ are the signer's keys; and $M \in \{0,1\}^*$ is a message.
Output: $(x_R \,\|\, s)$ is the signature of $M$, where $x_R \in \mathbb{F}_p$ and $s \in \{0,1\}^N$.
1: $r \leftarrow H(d_1 \,\|\, M) \bmod \ell$
2: $R = (X_R : Z_R) \leftarrow rG$    // Alg. 7.
3: $x_R \leftarrow X_R / Z_R$
4: $h \leftarrow H(x_R \,\|\, x_Q \,\|\, M)$
5: $s \leftarrow r - h d_0 \bmod \ell$
6: return $(x_R \,\|\, s)$

Algorithm 4 Signature verification.
Input: $x_Q$ is the public key of the signer, $(x_R \,\|\, s)$ is a signature, and $M \in \{0,1\}^*$ is a message.
Output: True, if the signature is valid; otherwise, False.
1: $Q \leftarrow (x_Q : 1)$
2: $h \leftarrow H(x_R \,\|\, x_Q \,\|\, M) \bmod \ell$
3: $R_0 \leftarrow sG$    // Alg. 7.
4: $R_1 \leftarrow hQ$    // Alg. 1.
5: return Check$(x_R, R_0, R_1)$    // Alg. 5.

Algorithm 5 Check $x_R \in \{x(P \pm Q)\}$.
Input: $x_R \in \mathbb{F}_p$, and $(P, Q)$ are elliptic curve points in projective coordinates.
Output: True, if $x_R \in \{x(P \pm Q)\}$; otherwise, False.
1: Let $f(x) \leftarrow f_2 x^2 + f_1 x + f_0$ such that the $f_i$ are defined as in Equation (10).
2: if $f(x_R) = 0$ then
3:     return True
4: else
5:     return False
6: end if
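To make the data flow of Algs. 2 to 5 concrete, the following Python sketch runs key generation, signing, and the original (two-root) verification on the Curve25519 instance. Several details are assumptions made only for this illustration and are not fixed by the text above: the byte encodings passed to the hash (`enc`, big-endian digest), reducing $d_0$ modulo $\ell$ before the scalar multiplication, and using the Montgomery ladder of Alg. 1 (with plain branches, hence not constant time) for every multiplication instead of Alg. 7.

```python
import hashlib
import secrets

P = 2**255 - 19
A = 486662
ELL = 2**252 + 27742317777372353535851937790883648493
XG = 9
N = 256

def H(*parts):
    """SHAKE128 with a 512-bit digest, read as an integer (byte order assumed)."""
    return int.from_bytes(hashlib.shake_128(b"".join(parts)).digest(64), "big")

def enc(v):
    """Hypothetical 32-byte little-endian encoding of an integer below 2**256."""
    return v.to_bytes(32, "little")

def doub(X, Z):
    return (X*X - Z*Z)**2 % P, 4*X*Z*(X*X + A*X*Z + Z*Z) % P

def dadd(XP, ZP, XQ, ZQ, XR, ZR):
    return ZR*(XP*XQ - ZP*ZQ)**2 % P, XR*(XP*ZQ - ZP*XQ)**2 % P

def xmul(k, x):
    """Affine x(kP) from x = x(P) following Alg. 1 (illustrative, not constant time)."""
    assert k > 0
    Q0, Q1 = doub(x, 1), (x, 1)
    for i in range(k.bit_length() - 2, -1, -1):
        if ((k >> i) ^ (k >> (i + 1))) & 1:
            Q0, Q1 = Q1, Q0
        Q0, Q1 = doub(*Q0), dadd(*Q0, *Q1, x, 1)
    if k & 1:
        Q0, Q1 = Q1, Q0
    return Q0[0] * pow(Q0[1], P - 2, P) % P

def keygen():
    """Alg. 2: split H(d) into d0 (high half) and d1 (low half)."""
    h = H(secrets.token_bytes(N // 8))
    d0, d1 = (h >> N) % ELL, h & (2**N - 1)   # reduction of d0 is an assumption
    return (d0, d1), xmul(d0, XG)

def sign(priv, xQ, msg):
    """Alg. 3: returns the pair (xR, s)."""
    d0, d1 = priv
    r = H(enc(d1), msg) % ELL
    xR = xmul(r, XG)
    h = H(enc(xR), enc(xQ), msg)
    return xR, (r - h * d0) % ELL

def verify(xQ, sig, msg):
    """Algs. 4 and 5: accept when f(xR) = 0, with f as in Eq. (10)."""
    xR, s = sig
    h = H(enc(xR), enc(xQ), msg) % ELL
    x0, x1 = xmul(s, XG), xmul(h, xQ)
    f2 = (x0 - x1)**2
    f1 = -2*(x0*x1 + 1)*(x0 + x1) - 4*A*x0*x1
    f0 = (x0*x1 - 1)**2
    return (f2*xR*xR + f1*xR + f0) % P == 0
```

A typical round trip is `priv, pub = keygen()`, then `sig = sign(priv, pub, b"msg")` and `verify(pub, sig, b"msg")`; the check succeeds because $sG \pm hQ$ contains $rG$, whose x-coordinate is $x_R$.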

By analyzing the elliptic curve operations required by qDSA, we noted that the running time is dominated by the computation of scalar point multiplications. Consequently, we focused on the acceleration of this operation. Notice that a multiple of the base point $G$ is calculated in each qDSA operation. Since $G$ is fixed for the entire scheme, we can precompute a table that stores some multiples of $G$. Hence, a scalar multiplication algorithm can be modified to look up the table and retrieve multiples of $G$ for calculating $kG$; this scenario is commonly known as a fixed-point multiplication, and it will be addressed in the next section.

3 Accelerating Fixed-Point Multiplications

In the open literature, there exist specialized algorithms that accelerate the calculation of fixed-point multiplications. In the general setting, the most used algorithm is the Comb technique [21], which arranges the bits of $k$ in matrix form; the point multiplication algorithm then interprets bit-columns as indexes into the precomputed table. Several fixed-point multiplication algorithms were derived from the Comb technique, for example [10,11,14,15], among others.

Comb-based algorithms have in common that indexes are directly derived from the bits of the scalar. This implies that when the scalar is secret, every access to the look-up table must be protected; otherwise, an attacker could extract some bits of the scalar by correlating variations in the latency of accesses to the cache memory. This kind of attack is known as a cache attack [40], which in practice has been a successful method for recovering secret keys from insecure implementations of table-based algorithms.

A common countermeasure to protect look-up table queries consists of using a uniform access pattern. Hence, even if variations in the latency of cache memory accesses occur, the attacker will not be able to determine from which part of the table the requested entry was retrieved. However, in some cases the cost of adding countermeasures negatively impacts the performance of point multiplication. A desirable solution for this scenario would be an algorithm that uses non-secret indexes for accessing the look-up table. In the following section, we will show an algorithm that satisfies these conditions.
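The uniform access pattern mentioned above is often realized by reading every table entry and keeping only the requested one through a mask. The sketch below illustrates the idea; the function name `uniform_lookup` and the 256-bit entry width are assumptions, and Python itself (including the `j == index` comparison) offers no constant-time guarantee, so this only conveys the structure that a low-level implementation would follow.

```python
def uniform_lookup(table, index, nbits=256):
    """Return table[index] while touching every entry of the table.

    Each entry is combined with a mask that is all ones only for the
    requested position, so the sequence of memory reads does not depend
    on `index`.
    """
    full = (1 << nbits) - 1
    result = 0
    for j, entry in enumerate(table):
        hit = int(j == index)        # 1 for the requested entry, 0 otherwise
        mask = (-hit) & full         # all ones or all zeros
        result |= entry & mask
    return result
```

The cost is linear in the table size for every query, which is exactly the overhead that an algorithm with non-secret indexes avoids.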

3.1 A Fixed-Point Multiplication Algorithm with Non-Secret

Indexes

In 2007, Joye presented right-to-left algorithms to compute scalar point multiplications [19]. As their name suggests, these algorithms scan the bits of the scalar from the least- to the most-significant bit, unlike conventional methods such as the double-and-add algorithm or the Montgomery ladder algorithm. Moreover, Joye's algorithm uses a regular execution pattern of elliptic curve operations without dummy operations; these features help prevent timing attacks [20] and fault-based attacks [42,3]. Joye's algorithm has been applied to the implementation of both Weierstrass curves [13] and Koblitz binary curves [28,38].

More recently, Oliveira et al. [29] adapted Joye's right-to-left algorithm to use precomputed look-up tables with the purpose of accelerating fixed-point multiplications (see Algorithm 6). The central operation of Algorithm 6 is to add some precomputed multiples of $G$ into two accumulators, namely $Q_0$ and $Q_1$. The bits of the scalar $k$ determine which accumulator must be updated in such a way that, at the $i$-th iteration, Algorithm 6 accumulates the point $2^i G$ into $Q_0$ using a differential addition (with $Q_1$ as the difference) whenever $k_i \oplus k_{i-1} = 0$; otherwise, it accumulates $2^i G$ into $Q_1$, also using a differential addition (but this time with $Q_0$ as the difference). Observe that Algorithm 6 is composed only of differential additions, since no point doublings are required in the main loop.

Notice that in either case, one operand of the differential addition is known in advance. Hence, assuming $Q$ is the known point, the differential addition can be calculated saving one multiplication (as proposed in [29]). Let $R = P - Q$ and $\mu = (x_Q + 1)(x_Q - 1)^{-1} \in \mathbb{F}_p$; then, we denote by dadd* the following formula:

$$X_{P +_R Q} = Z_R \left[ (X_P + Z_P) + \mu (X_P - Z_P) \right]^2, \qquad Z_{P +_R Q} = X_R \left[ (X_P + Z_P) - \mu (X_P - Z_P) \right]^2. \tag{8}$$

Algorithm 6 Right-to-left fixed-point multiplication algorithm (cf. [29]).
Input: $(k, G, S)$, where $k \in \mathbb{Z}_\ell$ and $k \neq 0$; $G$ is a point of order $\ell$; and $S$ is a point of order 4 such that $S \notin \langle G \rangle$.
Precomputation: A look-up table storing $(\mu_0, \ldots, \mu_{n-1})$ as defined in Eq. (9).
Output: $8kG = (X_{8kG} : Z_{8kG})$.
1: Let $(k_{n-1}, \ldots, k_0)_2$ be the $n$-bit binary representation of $k$ such that $n = \lfloor \log_2(\ell) \rfloor + 1$.
2: Initialize $Q_0 \leftarrow S$, $Q_1 \leftarrow G - S$, and define $k_{-1} = 0$.
3: for $i \leftarrow 0$ to $n-1$ do
4:     $(Q_0, Q_1) \leftarrow \mathrm{cswap}(k_i \oplus k_{i-1}, Q_0, Q_1)$
5:     $Q_0 \leftarrow \mathrm{dadd^*}(\mu_i, Q_0, Q_1)$    // $Q_0 \leftarrow Q_0 +_{Q_1} 2^i G$
6: end for
7: $Q_1 \leftarrow \mathrm{doub}(Q_1)$
8: $Q_1 \leftarrow \mathrm{doub}(Q_1)$
9: $Q_1 \leftarrow \mathrm{doub}(Q_1)$
10: return $Q_1$

To compute scalar point multiplications, Algorithm 6 requires a precomputed table storing one entry per bit of the scalar. Let $n = \lfloor \log_2(\ell) \rfloor + 1$; then the look-up table will store the values $(\mu_0, \ldots, \mu_{n-1})$, where $\mu_i$ is defined as:

$$\mu_i = (x_i + 1)(x_i - 1)^{-1} \in \mathbb{F}_p, \quad \text{such that } (x_i, y_i) = 2^i G. \tag{9}$$

Remark 3.1. To retrieve a point from the look-up table, the index used is actually a counting variable and, most importantly, this index is not derived from the secret scalar. Thus, a query is performed by directly choosing the corresponding value from the table. This enables a faster execution in contrast to Comb-based methods, which require secure (and sometimes costly) look-up table accesses.

By using Oliveira et al.'s algorithm, we expect an increase in the performance of fixed-point multiplications. Note that in each iteration only one differential addition is processed, in contrast with the (left-to-right) Montgomery ladder and Joye's right-to-left algorithm, which require an extra point doubling per iteration. Before applying Oliveira et al.'s algorithm to the calculation of fixed-point multiplications, in the following section we introduce a set of modifications to avoid the use of low-order points.

3.2 Circumventing the Use of Low-Order Points

Attention is required during the initialization of the accumulators $Q_0$ and $Q_1$ in Algorithm 6, since the formula for differential point addition is not complete. This means that for adding $P +_R Q$ such that $R = P - Q$, the differential addition formula fails whenever $R \in \{\mathcal{O}, (0,0)\}$.

We recall that the goal of Algorithm 6 in Oliveira et al.'s work [29] is to calculate the point $8kG$ required by the Diffie-Hellman X25519 function. For this reason, Algorithm 6 initializes the accumulators with $Q_0 \leftarrow S$ and $Q_1 \leftarrow G - S$ such that $S \notin \langle G \rangle$. For the case of Curve25519, $S$ was chosen as a point of order four (i.e. $4S = \mathcal{O}$). Thus, Algorithm 6 will compute $S + kG$, and after applying three consecutive point doublings, the point $S$ will vanish, resulting in $8kG$. Although this procedure is correct, some vulnerabilities could appear due to a misuse of low-order points [9,37]. Therefore, it is imperative to protect the implementation against this potential threat.

To avoid the use of low-order points, we show a technique that accomplishes this requirement. Our technique relies on the observation that if the order of $G$ is odd, as in the case of Curve25519, then the point $S$ is no longer required. Notice that replacing $S$ by $\mathcal{O}$ in Algorithm 6 causes a failure when the least-significant bit of $k$ is zero; nonetheless, it always computes the correct point multiplication whenever $k$ is odd. This observation indicates that Algorithm 6 with $S = \mathcal{O}$ computes scalar point multiplications only for odd scalars. Therefore, we introduce a modification of Algorithm 6 that supports even and odd scalars, and avoids using low-order points.

Let $\ell$ be the order of $G$. The key observation is that if $\ell$ is odd, then the map $a \mapsto \ell - a$ on $\{1, \ldots, \ell-1\}$ determines a bijection between the disjoint sets of even and odd elements.

Proposition 3.1. Let $\ell$ be an odd number. For any value $a$ such that $0 < a < \ell$, define $b = \ell - a$; if $a$ is even, then $b$ is odd.

Proof. First, note that $b$ is bounded as $0 < b < \ell$. Since $a < \ell$, we have $b = \ell - a > 0$. Suppose $b \geq \ell$; then, by the definition of $b$, we have $\ell - a \geq \ell$, i.e. $a \leq 0$, which is a contradiction, since $a > 0$; thus, $0 < b < \ell$. Now, since $\ell$ is odd and $a$ is even, there exist some $i, j \in \mathbb{Z}$ such that $b = \ell - a = 2i + 1 - 2j = 2(i - j) + 1$, showing that $b$ is odd. $\square$

Using this proposition, we can calculate $kG$ as $k'G$, for $k' = \ell - k$, whenever the scalar $k$ is even. Note that if this operation were computed using points in affine space, the point $k'G$ would have to be negated to obtain $kG$. Fortunately, this is not required since we are operating with elements in the Kummer variety, which maps $kG$ and $k'G$ to the same element in $E/\langle\pm 1\rangle$. All of these observations led to Algorithm 7, which supports both even and odd scalars, and does not require low-order points in the computation of the fixed-point multiplication.

Among the changes made, Algorithm 7 starts by computing $r = \ell - k$ and then selects the scalar between $r$ and $k$. This selection could introduce timing variability in its execution and, consequently, it must be processed using a regular execution pattern. This task can be achieved using the cswap function as shown in line 2 of Algorithm 7. After the conditional swap, $r$ is always odd, which allows the main loop to start from the second iteration ($i = 1$).

Finally, we apply Algorithm 7 to compute multiples of Gduring the qDSA

signature scheme. Since the ﬁxed-point multiplication appears in all operations

of the qDSA scheme, we improve the running time of the entire scheme. Sect. 5

reveals the impact on performance obtained by our software implementation.

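Algorithm 7 Our proposed right-to-left fixed-point multiplication algorithm without using low-order points.
Input: $(k, G)$, where $k \in \mathbb{Z}_\ell$ and $k \neq 0$; and $G$ is a point of odd order $\ell$.
Precomputation: A look-up table storing $(\mu_0, \ldots, \mu_{n-1})$ as defined in Eq. (9).
Output: $kG = (X_{kG} : Z_{kG})$.
1: $r \leftarrow \ell - k$
2: $(k, r) \leftarrow \mathrm{cswap}(k_0, k, r)$
3: Let $(r_{n-1}, \ldots, r_0 = 1)_2$ be the $n$-bit binary representation of $r$ such that $n = \lfloor \log_2(\ell) \rfloor + 1$.
4: Initialize $Q_0 \leftarrow G$, $Q_1 \leftarrow G$.
5: for $i \leftarrow 1$ to $n-1$ do
6:     $(Q_0, Q_1) \leftarrow \mathrm{cswap}(r_i \oplus r_{i-1}, Q_0, Q_1)$
7:     $Q_0 \leftarrow \mathrm{dadd^*}(\mu_i, Q_0, Q_1)$    // $Q_0 \leftarrow Q_0 +_{Q_1} 2^i G$
8: end for
9: $(Q_0, Q_1) \leftarrow \mathrm{cswap}(r_{n-1}, Q_0, Q_1)$
10: return $Q_1$    // Return also $Q_0$ for y-coordinate recovery.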
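As a cross-check of Algorithm 7, the following Python sketch builds the table of Eq. (9) for Curve25519, runs the right-to-left loop with the dadd* formula of Eq. (8), and compares the result against the Montgomery ladder of Alg. 1. The function names (`precompute`, `fixed_point_mul`, `ladder_x`) are illustrative, and the conditional swaps are written as ordinary branches for readability, so the sketch is not constant time.

```python
P = 2**255 - 19
A = 486662
ELL = 2**252 + 27742317777372353535851937790883648493
XG = 9

def inv(a):
    return pow(a % P, P - 2, P)

def doub(X, Z):
    return (X*X - Z*Z)**2 % P, 4*X*Z*(X*X + A*X*Z + Z*Z) % P

def dadd(XP, ZP, XQ, ZQ, XR, ZR):
    return ZR*(XP*XQ - ZP*ZQ)**2 % P, XR*(XP*ZQ - ZP*XQ)**2 % P

def dadd_star(mu, XP, ZP, XR, ZR):
    """Eq. (8): add the table point behind mu to P, given the difference R."""
    s, d = XP + ZP, mu * (XP - ZP)
    return ZR*(s + d)**2 % P, XR*(s - d)**2 % P

def precompute(n):
    """Eq. (9): mu_i = (x_i + 1)/(x_i - 1) with x_i the x-coordinate of 2^i G."""
    table, X, Z = [], XG, 1
    for _ in range(n):
        x = X * inv(Z) % P
        table.append((x + 1) * inv(x - 1) % P)
        X, Z = doub(X, Z)
    return table

def fixed_point_mul(k, table):
    """Alg. 7: affine x(kG) for 0 < k < ell, without low-order points."""
    n = ELL.bit_length()                  # floor(log2(ell)) + 1
    r = ELL - k
    if k & 1:                             # cswap(k_0, k, r): afterwards r is odd
        k, r = r, k
    bit = lambda i: (r >> i) & 1
    Q0, Q1 = (XG, 1), (XG, 1)
    for i in range(1, n):
        if bit(i) ^ bit(i - 1):
            Q0, Q1 = Q1, Q0
        Q0 = dadd_star(table[i], *Q0, *Q1)
    if bit(n - 1):
        Q0, Q1 = Q1, Q0
    return Q1[0] * inv(Q1[1]) % P

def ladder_x(k, x):
    """Alg. 1, used here only as an independent reference."""
    Q0, Q1 = doub(x, 1), (x, 1)
    for i in range(k.bit_length() - 2, -1, -1):
        if ((k >> i) ^ (k >> (i + 1))) & 1:
            Q0, Q1 = Q1, Q0
        Q0, Q1 = doub(*Q0), dadd(*Q0, *Q1, x, 1)
    if k & 1:
        Q0, Q1 = Q1, Q0
    return Q0[0] * inv(Q0[1]) % P

TABLE = precompute(ELL.bit_length())
```

Throughout the loop the accumulators satisfy $Q_0 + Q_1 = 2^{i+1} G$, which is why the other accumulator is always a valid difference for the next differential addition; for even $k$ the result is $x((\ell - k)G)$, which equals $x(kG)$ on the x-line.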

4 A New qDSA Signature Veriﬁcation Method

Given an alleged signature $(x_R \,\|\, s)$ of a message $M$, the qDSA signature verification procedure must determine whether $x_R$ is the x-coordinate of $R_0 + R_1$, where $R_0 = sG$ and $R_1 = hQ$ for $h$ defined as in Algorithm 4. For that purpose, the authors of qDSA provided Algorithm 5, which checks a weaker relation. Such a method accepts the signature whenever $f(x_R) = 0$, where $f$ is the quadratic polynomial $f(x) = f_2 x^2 + f_1 x + f_0$, such that:

$$\begin{aligned}
f_2 &= (x_{R_0} - x_{R_1})^2, \\
f_1 &= -2(x_{R_0} x_{R_1} + 1)(x_{R_0} + x_{R_1}) - 4A\, x_{R_0} x_{R_1}, \\
f_0 &= (x_{R_0} x_{R_1} - 1)^2.
\end{aligned} \tag{10}$$

This method works since one of the roots of $f$ is $x_R$; however, one disadvantage of this approach is that there is another value $x'$ that also passes the verification procedure. Specifically, $x'$ is the other root of $f$ and corresponds to the x-coordinate of $R_0 - R_1$. Therefore, $M$ has another valid signature $(x' \,\|\, s)$.

Although an adversary gains little direct advantage from this relaxed verification method, it carries a high risk of enabling misuse of the cryptographic scheme, such as the cases reported in [7,8,16]. To avoid potential issues in future implementations, we looked for an efficient method that verifies the qDSA signature of a message unequivocally.

4.1 Unequivocal Techniques for Signature Veriﬁcation

Let $x_S$ and $x_D$ be the x-coordinates of $R_0 + R_1$ and $R_0 - R_1$, respectively. Given an alleged signature $(x_R \,\|\, s)$, we look for a relation that allows us to determine whether $x_R = x_S$ from the coordinates of $R_0$ and $R_1$, instead of verifying whether $x_R \in \{x_S, x_D\}$ as Algorithm 5 does. Thus, inspired by Montgomery's insights [22], we derive the following equivalences:

$$x_S + x_D = \beta/\alpha, \tag{11}$$
$$x_S \times x_D = \gamma/\alpha, \tag{12}$$
$$x_S - x_D = \delta/\alpha, \tag{13}$$

such that $\alpha$, $\beta$, $\gamma$ and $\delta$ are defined as follows¹:

$$\begin{aligned}
\alpha &= (x_{R_0} - x_{R_1})^2, \\
\beta &= 2(x_{R_0} x_{R_1} + 1)(x_{R_0} + x_{R_1}) + 4A\, x_{R_0} x_{R_1}, \\
\gamma &= (x_{R_0} x_{R_1} - 1)^2, \\
\delta &= -4B\, y_{R_0} y_{R_1}.
\end{aligned} \tag{14}$$

The coefficients of $f$ can be derived by solving Eq. (11) for $x_D$ and plugging this into Eq. (12), which results in a second-degree polynomial in $x_S$. Thus, $f$ can also be written as $f(x) = \alpha x^2 - \beta x + \gamma$. We note that solving Eq. (11) for $x_S$ and substituting this into Eq. (12) yields a second-degree polynomial in $x_D$ that has the same coefficients as $f$. This means that both $x_S$ and $x_D$ are roots of $f$. Therefore, $f$ does not help to distinguish between $x_S$ and $x_D$.

Our key idea is to obtain a (linear) polynomial that has a zero at $x_S$. To that end, we start by solving Eq. (13) for $x_D$ and substituting this into Eq. (12); thus we obtain $g_0(x) = \alpha x^2 - \delta x - \gamma$. Analogously, we apply the same procedure, but this time solving for $x_S$, and we obtain $g_1(x) = \alpha x^2 + \delta x - \gamma$. So far, we have that $g_0 \neq g_1$, which means that by using $g_0$ we are now able to distinguish between $x_S$ and $x_D$, since $g_0(x_S) = 0$ and $g_0(x_D) \neq 0$. However, $g_0$ has zeros at $x_S$ and at $-x_D$. Now, using $f(x) = \alpha(x - x_S)(x - x_D)$ and $g_0(x) = \alpha(x - x_S)(x + x_D)$, we show how to unequivocally identify $x_S$. Note that $f(x_S) = 0$ and $g_0(x_S) = 0$; therefore, we define:

$$h_0(x) = (f + g_0)/x = 2\alpha x - \delta - \beta, \qquad h_1(x) = f - g_0 = (\delta - \beta)x + 2\gamma, \tag{15}$$

such that $x_S$ is a zero of both $h_0$ and $h_1$. Listing 4.1 shows a SageMath [31] script that validates the formulas used in this section. In summary, either $h_0$ or $h_1$ helps to determine the validity of an alleged signature.

Our signature verification method proceeds as follows: given $(x_R \,\|\, s)$, it calculates $\alpha$, $\beta$, and $\delta$ from the coordinates of $R_0$ and $R_1$; then, it declares a signature as valid if $h_0(x_R) = 0$ (alternatively, it calculates $\gamma$ instead of $\alpha$ and accepts the signature if $h_1(x_R) = 0$). We have thus shown two relations that allow a signature to be verified unequivocally.
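The relations above can also be checked numerically on Curve25519 itself. The sketch below is an illustration only: it uses textbook affine addition on a Montgomery curve (not the x-only arithmetic of the library), a standard square-root routine for $p \equiv 5 \pmod 8$, and the small sample points $R_0 = 3G$ and $R_1 = 5G$ chosen arbitrarily. It evaluates $f$, $h_0$, and $h_1$ at $x_S$ and $x_D$.

```python
P = 2**255 - 19
A, B = 486662, 1

def inv(a):
    return pow(a % P, P - 2, P)

def sqrt_mod_p(a):
    """Square root modulo p = 5 (mod 8); raises if a is not a square."""
    a %= P
    y = pow(a, (P + 3) // 8, P)
    if (y*y - a) % P != 0:
        y = y * pow(2, (P - 1) // 4, P) % P   # multiply by sqrt(-1)
    assert (y*y - a) % P == 0, "not a quadratic residue"
    return y

def add(P1, P2):
    """Affine addition on B*y^2 = x^3 + A*x^2 + x (doubling when x1 == x2)."""
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2:
        lam = (3*x1*x1 + 2*A*x1 + 1) * inv(2*B*y1) % P
    else:
        lam = (y2 - y1) * inv(x2 - x1) % P
    x3 = (B*lam*lam - A - x1 - x2) % P
    return x3, (lam*(x1 - x3) - y1) % P

def neg(Pt):
    return Pt[0], (-Pt[1]) % P

xG = 9
G = (xG, sqrt_mod_p(xG**3 + A*xG**2 + xG))
G2 = add(G, G); G3 = add(G2, G); G5 = add(G3, G2)

R0, R1 = G3, G5                      # sample points, chosen arbitrarily
xS = add(R0, R1)[0]                  # x-coordinate of R0 + R1
xD = add(R0, neg(R1))[0]             # x-coordinate of R0 - R1

(x0, y0), (x1, y1) = R0, R1
alpha = (x0 - x1)**2 % P
beta = (2*(x0*x1 + 1)*(x0 + x1) + 4*A*x0*x1) % P
gamma = (x0*x1 - 1)**2 % P
delta = (-4*B*y0*y1) % P

f = lambda x: (alpha*x*x - beta*x + gamma) % P
h0 = lambda x: (2*alpha*x - delta - beta) % P
h1 = lambda x: ((delta - beta)*x + 2*gamma) % P
```

With these definitions, `f` vanishes at both `xS` and `xD`, whereas `h0` and `h1` vanish at `xS` only, which is the distinction the unequivocal check relies on.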

4.2 Trade-oﬀ Analysis of Our Signature Veriﬁcation Method

In contrast to the original signature procedure, our method requires calculating

the δterm, which implies the knowledge of the y-coordinate of both R0=sG

and R1=hQ.

¹ To avoid inversions, these terms can also be calculated using projective coordinates.

    QQ = Rationals()
    R.<x1,y1,x2,y2,A,B> = PolynomialRing(QQ,6,"x1,y1,x2,y2,A,B")
    # Both points satisfy the Montgomery curve equation B*y^2 = x^3 + A*x^2 + x.
    I = R.ideal([
        B*y1**2-x1**3-A*x1**2-x1,
        B*y2**2-x2**3-A*x2**2-x2 ])
    FQuo = Frac(R.quotient(I))
    # x is SageMath's predefined symbolic variable.
    evaluate = lambda F,X: FQuo(F.subs(x=X).rational_simplify())

    def addMontgomery(X1,Y1,X2,Y2):
        global A, B
        Xs = B*((Y1-Y2)/(X1-X2))**2-A-X1-X2
        Ys = (2*X1+X2+A)*(Y2-Y1)/(X2-X1)-B*(Y2-Y1)**3/(X2-X1)**3-Y1
        return Xs,Ys

    xs,ys = addMontgomery(x1,y1,x2,y2)
    xd,yd = addMontgomery(x1,y1,x2,-y2)

    alpha = (x1-x2)**2
    betta = 2*(x1*x2+1)*(x1+x2)+4*A*x1*x2
    gamma = (x1*x2-1)**2
    delta = -4*B*y1*y2

    relAdd = FQuo(xs+xd)
    relPro = FQuo(xs*xd)
    relDif = FQuo(xs-xd)
    # Verifying Relations
    assert( relAdd == betta/alpha )
    assert( relPro == gamma/alpha )
    assert( relDif == delta/alpha )
    # Renes&Smith's f polynomial and testing its zeros
    f = alpha*x**2-betta*x+gamma
    assert( evaluate(f,xs) == evaluate(f,xd) == 0 )
    # Defining g0 and g1 and testing their zeros
    g0 = alpha*x**2-delta*x-gamma
    g1 = alpha*x**2+delta*x-gamma
    assert( evaluate(g0, xs) == evaluate(g0,-xd) == 0 )
    assert( evaluate(g1,-xs) == evaluate(g1, xd) == 0 )
    # Defining h0 and h1 and testing their zeros
    h0 = 2*alpha*x-delta-betta
    h1 = (delta-betta)*x+2*gamma
    assert( evaluate(h0,xs) == evaluate(h1,xs) == 0 )

Listing 4.1: SageMath script for the validation of formulas in Q.

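One can use the Okeya-Sakurai method [27] for recovering the y-coordinates of $R_0 = sG$ and $R_1 = hQ$. This technique requires some auxiliary points, namely $R_2 = (s+1)G$ and $R_3 = (h+1)Q$, which are also computed by the Montgomery ladder algorithm (Alg. 1). Thus, following Theorem 2 of [27], and noting that the base point of $R_1$ is $Q$, we have:

$$\begin{aligned}
y_{R_0} &= \left[ (x_{R_0} x_G + 1)(x_{R_0} + x_G + 2A) - 2A - (x_{R_0} - x_G)^2 x_{R_2} \right] (2B y_G)^{-1}, \\
y_{R_1} &= \left[ (x_{R_1} x_Q + 1)(x_{R_1} + x_Q + 2A) - 2A - (x_{R_1} - x_Q)^2 x_{R_3} \right] (2B y_Q)^{-1};
\end{aligned} \tag{16}$$

then, $\delta$ can be written as $\delta = -4B\, y_{R_0} y_{R_1} = (B y_G y_Q)^{-1}\, T$, where $T$ is:

$$T = -\left[ (x_{R_0} x_G + 1)(x_{R_0} + x_G + 2A) - 2A - (x_{R_0} - x_G)^2 x_{R_2} \right] \times \left[ (x_{R_1} x_Q + 1)(x_{R_1} + x_Q + 2A) - 2A - (x_{R_1} - x_Q)^2 x_{R_3} \right]. \tag{17}$$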
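The recovery formula of Eq. (16) can be sanity-checked with explicit affine points. In the sketch below, which reuses textbook affine addition on Curve25519 and small multiples chosen only for illustration, `recover_y` (a name introduced here) takes the x-coordinates of $kP$ and $(k+1)P$ together with the affine base point $P$, and its output is compared with the y-coordinate obtained by direct addition.

```python
P = 2**255 - 19
A, B = 486662, 1

def inv(a):
    return pow(a % P, P - 2, P)

def sqrt_mod_p(a):
    """Square root modulo p = 5 (mod 8); raises if a is not a square."""
    a %= P
    y = pow(a, (P + 3) // 8, P)
    if (y*y - a) % P != 0:
        y = y * pow(2, (P - 1) // 4, P) % P
    assert (y*y - a) % P == 0, "not a quadratic residue"
    return y

def add(P1, P2):
    """Affine addition on B*y^2 = x^3 + A*x^2 + x (doubling when x1 == x2)."""
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2:
        lam = (3*x1*x1 + 2*A*x1 + 1) * inv(2*B*y1) % P
    else:
        lam = (y2 - y1) * inv(x2 - x1) % P
    x3 = (B*lam*lam - A - x1 - x2) % P
    return x3, (lam*(x1 - x3) - y1) % P

def recover_y(x_k, x_k1, base):
    """Eq. (16): y-coordinate of kP from x(kP), x((k+1)P) and the base P = (x, y)."""
    xb, yb = base
    num = (x_k*xb + 1)*(x_k + xb + 2*A) - 2*A - (x_k - xb)**2 * x_k1
    return num * inv(2*B*yb) % P

xG = 9
G = (xG, sqrt_mod_p(xG**3 + A*xG**2 + xG))
G2 = add(G, G); G3 = add(G2, G); G5 = add(G3, G2); G6 = add(G5, G)

# R0 = 5G with auxiliary point R2 = 6G, base point G.
y_R0 = recover_y(G5[0], G6[0], G)

# R1 = 2Q with auxiliary point R3 = 3Q, for the sample base point Q = 3G.
Q = G3
Q2 = add(Q, Q); Q3 = add(Q2, Q)
y_R1 = recover_y(Q2[0], Q3[0], Q)
```

Both recovered values should match the y-coordinates produced by `add`, and their product then gives $\delta = -4B\, y_{R_0} y_{R_1}$ without the verifier ever computing a y-coordinate inside the ladder.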

Algorithm 8 Unequivocal qDSA Verification Procedure.

Input: $(x_R \,\|\, s)$ is a signature, $M \in \{0,1\}^*$ is a message, and $(x_Q \,\|\, y_Q^{(0)})$ is the public key of the signer.

Constants: $(x_G, y_G)$ are the affine coordinates of the generator $G \in E_{A,B}$.

Output: True, if the signature is valid; otherwise, False.

1: $h \leftarrow H(x_R \,\|\, x_Q \,\|\, M) \bmod \ell$

2: $Q \leftarrow (x_Q : 1)$, $R_0 \leftarrow sG$, $R_1 \leftarrow hQ$

3: $\{y', y''\} \leftarrow \pm\sqrt{B^{-1}(x_Q^3 + A x_Q^2 + x_Q)} \in \mathbb{F}_p$

4: Set $y_Q \leftarrow y'$ if $y' \equiv y_Q^{(0)} \pmod 2$; otherwise, $y_Q \leftarrow y''$.

5: Calculate $\alpha$, $\beta$, and $\delta$ as in Eq. (14).

6: if $h_0(x_R) = 0$ then // $h_0$ as defined in Eq. (15).

7: return True

8: else

9: return False

10: end if
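Steps 5 and 6 of Algorithm 8 can be illustrated with a small Python sketch. It follows the definitions of $\alpha$, $\beta$, $\delta$ and $h_0$ used in Listing 4.1 and checks that $h_0$ vanishes at $x(R_0 + R_1)$ but not at $x(R_0 - R_1)$. The sketch makes several simplifying assumptions: it works with full affine points on Curve25519 instead of recovering the y-coordinates through Eq. (16), it uses toy values for the secret key $d$, for $s$ and for $h$, and all helper names are hypothetical.

```python
# Curve25519 in Montgomery form: B*y^2 = x^3 + A*x^2 + x over F_p.
P = 2**255 - 19
A = 486662
B = 1

def inv(a):
    """Modular inverse via Fermat's little theorem."""
    return pow(a % P, P - 2, P)

def sqrt_mod(a):
    """A square root of a modulo P (valid because P = 5 mod 8)."""
    a %= P
    r = pow(a, (P + 3) // 8, P)
    if (r * r - a) % P != 0:
        r = r * pow(2, (P - 1) // 4, P) % P
    assert (r * r - a) % P == 0, "not a quadratic residue"
    return r

def add(p1, p2):
    """Affine addition (or doubling) on the Montgomery curve."""
    (x1, y1), (x2, y2) = p1, p2
    if p1 == p2:
        lam = (3 * x1 * x1 + 2 * A * x1 + 1) * inv(2 * B * y1) % P
    else:
        lam = (y2 - y1) * inv(x2 - x1) % P
    x3 = (B * lam * lam - A - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P

def neg(pt):
    """Negation of an affine point."""
    return pt[0], (-pt[1]) % P

def mul(k, pt):
    """k*pt by repeated addition; only meant for small k >= 1."""
    acc = pt
    for _ in range(k - 1):
        acc = add(acc, pt)
    return acc

def h0(x, r0, r1):
    """h0(x) = 2*alpha*x - delta - beta, built from the points R0 and R1."""
    (x0, y0), (x1, y1) = r0, r1
    alpha = (x0 - x1) ** 2
    beta = 2 * (x0 * x1 + 1) * (x0 + x1) + 4 * A * x0 * x1
    delta = -4 * B * y0 * y1
    return (2 * alpha * x - delta - beta) % P

G = (9, sqrt_mod(9**3 + A * 9**2 + 9))
d, s, h = 7, 5, 3                  # toy secret key, s and h (assumptions)
Q = mul(d, G)                      # public key point
R0, R1 = mul(s, G), mul(h, Q)      # R0 = sG, R1 = hQ
x_sum = add(R0, R1)[0]             # x(R0 + R1): accepted
x_dif = add(R0, neg(R1))[0]        # x(R0 - R1): rejected
print(h0(x_sum, R0, R1) == 0, h0(x_dif, R0, R1) == 0)
```

Because $\delta$ depends on the product $y_{R_0} y_{R_1}$, flipping the sign of one y-coordinate swaps which of the two candidates is accepted; this is why the verifier needs the sign information of $y_Q$.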

Note that the verifier must know the product $y_G y_Q$. There are several ways to obtain this value:

– The simplest one is to append $y_G y_Q$ (or $(B y_G y_Q)^{-1}$) to the public key; the calculation of $\delta$ is then straightforward, but the size of the public key doubles.

– Alternatively, the public key could contain an extra bit $y_Q^{(0)}$, which is the least-significant bit of $y_Q$. The verification procedure calculates $\{y', y''\} = \pm\sqrt{B^{-1}(x_Q^3 + A x_Q^2 + x_Q)}$; then, if $y' \equiv y_Q^{(0)} \pmod 2$, it sets $y_Q \leftarrow y'$; otherwise, it assigns $y_Q \leftarrow y''$. After that, it calculates $y_G y_Q$. Note that $y_G$ must also be known; fortunately, it is a fixed parameter of the scheme. This method has the advantage that the public key size does not increase significantly; for example, using Curve25519, $(x_Q \,\|\, y_Q^{(0)})$ fits in 256 bits. However, the cost of verification increases by one square root and a few multiplications. This approach is summarized in Algorithm 8.

We remark that, to verify a qDSA signature unambiguously, the verification method must know the y-coordinate of $G$ (which is a fixed parameter) and must receive the y-coordinate of $Q$ as input.
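The second alternative (steps 3 and 4 of Algorithm 8) amounts to a point decompression. A minimal Python sketch for Curve25519 follows; it assumes $B = 1$, $A = 486662$ and $p = 2^{255} - 19$, uses the standard square-root method for primes $p \equiv 5 \pmod 8$, and is not constant-time. The function names are hypothetical.

```python
P = 2**255 - 19
A = 486662
B = 1

def sqrt_mod(a):
    """Return a square root of a modulo P, or None if a is a non-residue."""
    a %= P
    r = pow(a, (P + 3) // 8, P)
    if (r * r - a) % P != 0:
        r = r * pow(2, (P - 1) // 4, P) % P  # multiply by sqrt(-1)
    if (r * r - a) % P != 0:
        return None
    return r

def decompress_y(x_q, y_bit):
    """Recover y_Q from x_Q and the least-significant bit of y_Q."""
    rhs = (x_q**3 + A * x_q**2 + x_q) * pow(B, P - 2, P) % P
    y = sqrt_mod(rhs)
    if y is None:
        raise ValueError("x_q is not the x-coordinate of a curve point")
    if y % 2 != y_bit:
        y = (P - y) % P  # the other root has the opposite parity
    if y % 2 != y_bit:
        raise ValueError("no root with the requested parity")
    return y

# Example with the x-coordinate of the Curve25519 base point.
y_even = decompress_y(9, 0)
y_odd = decompress_y(9, 1)
print((y_even + y_odd) % P == 0)
```

Since $p$ is odd, the two roots $y$ and $p - y$ always differ in their least-significant bit (for $y \neq 0$), so a single extra bit suffices to select $y_Q$.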

5 Performance Results and Comparisons

We developed a software library that supports the 32-bit ARM architecture, which is designed for embedded devices, and the 64-bit Intel architecture, which is widespread from commodity computers to high-end servers. To measure execution times, we use the clock cycle counter available on each architecture. In addition, on Intel processors, the Intel Turbo Boost, Intel SpeedStep, and Intel Hyper-Threading technologies were disabled to obtain stable and reproducible measurements.

5.1 Performance of Prime Field Arithmetic

For the arithmetic operations over $\mathbb{F}_{2^{255}-19}$, we use an optimized library for ARM Cortex M4 processors taken from [12]; for the 64-bit Intel processors, we use the optimized library available in [29]. Table 1 summarizes the clock cycle measurements of the arithmetic operations.

Table 1. Latency (in clock cycles) of the arithmetic operations on $\mathbb{F}_{2^{255}-19}$. The last two columns list the latency ratios of squaring to multiplication (S/M) and of inversion to multiplication (I/M).

| Architecture | Microarchitecture | Processor Model | Add | Mul | Sqr | Inv | Sqrt | S/M | I/M |
|---|---|---|---|---|---|---|---|---|---|
| 32-bit | Cortex M4 | Teensy 3.2 | 85 | 278 | 250 | 66,637 | 132,416 | 0.90 | 239.7 |
| 32-bit | Cortex A7 | Odroid XU4 | 49 | 290 | 233 | 63,095 | 132,785 | 0.80 | 217.6 |
| 32-bit | Cortex A15 | Odroid XU4 | 36 | 225 | 139 | 41,978 | 97,242 | 0.62 | 186.6 |
| 64-bit | Intel Haswell | Core i7-4770 | 8 | 64 | 48 | 14,925 | 29,344 | 0.75 | 233.2 |
| 64-bit | Intel Skylake | Core i7-6700K | 6 | 48 | 39 | 11,090 | 22,598 | 0.81 | 231.0 |

The 32-bit implementation of the integer multiplier uses the full consecutive operand caching technique [36], which in turn relies on the multiply-and-accumulate instructions UMLAL and UMAAL. These instructions were scheduled so as to reduce the number of carry values that appear during the evaluation of the product. The 64-bit implementation of the integer multiplier follows the operand scanning technique, which suits the MULX instruction well. For Skylake, the latency of the multiplier was improved further by using the newer integer addition instructions ADCX and ADOX.

5.2 Performance of Our Optimized Implementation of qDSA

We first highlight the speedup introduced by the right-to-left fixed-point multiplication algorithm presented in Sect. 3. To that end, we measured the improvement that Algorithm 7 brings to the execution time of the qDSA operations. Table 2 shows the timings obtained on a Cortex M4 and on an Intel Haswell processor.

The timings of the qDSA operations were significantly reduced. The impact was most evident on the key generation and signing procedures, which achieved reductions of 35-40% and 30-34% in execution time, respectively. Likewise, the verification procedure was accelerated by about 19%.

Regarding memory footprint, the last row of Table 2 shows the overhead introduced by the use of precomputation. The code size of our implementation (including the 8 KB table stored in ROM) increased by around 36% and 44% on the 64-bit and 32-bit platforms, respectively. We recall that computations aided by precomputation always involve a trade-off between space and time; hence, the best approach will depend on several engineering aspects.
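The percentages quoted above follow directly from the entries of Table 2. As a quick arithmetic check, the Python sketch below recomputes them; the dictionary layout and the function names are illustrative choices, and only the numbers come from the table.

```python
# Timings in 10^3 clock cycles, as (Alg. 1, Alg. 7) pairs taken from Table 2.
timings = {
    "Cortex M4": {
        "Key Generation": (927.9, 604.9),
        "Signing": (1059.1, 736.2),
        "Verification": (1746.2, 1422.8),
    },
    "Haswell": {
        "Key Generation": (171.5, 103.8),
        "Signing": (197.3, 130.1),
        "Verification": (347.3, 279.5),
    },
}
# Code size in bytes, as (Alg. 1, Alg. 7) pairs taken from Table 2.
code_size = {"Cortex M4": (20898, 30058), "Haswell": (30037, 41000)}

def saving(before, after):
    """Relative reduction, in percent, when going from `before` to `after`."""
    return 100.0 * (before - after) / before

def growth(before, after):
    """Relative increase, in percent, when going from `before` to `after`."""
    return 100.0 * (after - before) / before

for cpu, ops in timings.items():
    for op, (alg1, alg7) in ops.items():
        print(f"{cpu:10s} {op:15s} saving: {saving(alg1, alg7):5.1f}%")
for cpu, (alg1, alg7) in code_size.items():
    print(f"{cpu:10s} code size growth: {growth(alg1, alg7):5.1f}%")
```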

Table 2. Performance comparison of the qDSA operations when the Montgomery ladder algorithm (Alg. 1) is replaced by the right-to-left fixed-point multiplication algorithm (Alg. 7). For each processor, the third column shows the percentage of improvement achieved. Entries represent $10^3$ clock cycles, except the last row.

| Operation | ARM Cortex M4, Alg. 1 | ARM Cortex M4, Alg. 7 | ARM Cortex M4, Savings | Intel Haswell, Alg. 1 | Intel Haswell, Alg. 7 | Intel Haswell, Savings |
|---|---|---|---|---|---|---|
| Key Generation | 927.9 | 604.9 | 34.8% | 171.5 | 103.8 | 39.5% |
| Signing | 1,059.1 | 736.2 | 30.5% | 197.3 | 130.1 | 34.1% |
| Verification | 1,746.2 | 1,422.8 | 18.5% | 347.3 | 279.5 | 19.5% |
| Code size (bytes) | 20,898 | 30,058 | -43.8% | 30,037 | 41,000 | -36.4% |

Table 3. Summary of the performance of our optimized implementation. Table entries show the latency, reported in $10^3$ clock cycles, of each qDSA operation.

| qDSA Operation | ARM (32-bit) Cortex M4 | ARM (32-bit) Cortex A7 | ARM (32-bit) Cortex A15 | Intel (64-bit) Haswell | Intel (64-bit) Skylake |
|---|---|---|---|---|---|
| Key Generation | 604.9 | 538.8 | 366.5 | 103.8 | 86.8 |
| Signing | 736.2 | 652.1 | 422.7 | 130.1 | 114.6 |
| Verification (Alg. 4) | 1,422.8 | 1,271.7 | 870.6 | 279.5 | 231.1 |
| Verification (Alg. 8) | 1,555.2 | 1,404.4 | 967.8 | 309.6 | 253.5 |

The optimized prime field arithmetic, combined with the fixed-point multiplication algorithm, considerably reduced the execution time in comparison to the original implementation given by qDSA's authors [32]. Table 3 summarizes the timings of our qDSA implementation measured on several ARM and Intel platforms.

Table 3 also shows the latency of the proposed verification method (Alg. 8) described in Sect. 4. Recall that our method must calculate one square root and a few multiplications to recover the y-coordinate of the public key. This adds an overhead of 8% to 10% to the execution time. The timing penalty is compensated by the security benefits that our verification method provides; in addition, the method prevents some issues that could appear in future applications of qDSA.
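The measurements are consistent with the statement that the extra cost of Alg. 8 is essentially one square root. The Python sketch below compares the difference between the two verification rows of Table 3 with the Sqrt column of Table 1. Expressing the overhead as a share of the Alg. 8 running time is an assumption of this sketch about how the percentage is computed; the names used are illustrative.

```python
# Per platform: (Alg. 4, Alg. 8) verification latency in 10^3 cycles (Table 3)
# and the square-root latency in cycles (Table 1).
data = {
    "Cortex M4": (1422.8, 1555.2, 132416),
    "Cortex A7": (1271.7, 1404.4, 132785),
    "Cortex A15": (870.6, 967.8, 97242),
    "Haswell": (279.5, 309.6, 29344),
    "Skylake": (231.1, 253.5, 22598),
}

def overhead(alg4_k, alg8_k, sqrt_cycles):
    """Return (extra cycles, extra/sqrt ratio, share of Alg. 8 time in %)."""
    extra = (alg8_k - alg4_k) * 1000.0
    # Assumption: the percentage is taken relative to the Alg. 8 time.
    share = 100.0 * extra / (alg8_k * 1000.0)
    return extra, extra / sqrt_cycles, share

for cpu, row in data.items():
    extra, ratio, share = overhead(*row)
    print(f"{cpu:10s} extra = {extra:9.0f} cycles, "
          f"extra/sqrt = {ratio:.3f}, share = {share:.1f}%")
```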

Table 4 shows a performance comparison of qDSA with other digital signature algorithms. As can be seen, the qDSA signing procedure performs better than the RSA and DSA signature schemes. In addition, qDSA generates signatures at least as fast as ECDSA does, and its verification procedure is faster than ECDSA's. This positions qDSA as a more efficient alternative for deploying digital signatures in contrast with standardized signature algorithms.

From the comparison table, one can observe that, on both architectures, the calculation of Ed25519 signatures is approximately twice as fast as the calculation of qDSA signatures. One reason for this performance gap lies in the properties of the elliptic curve model used by each scheme, which impose certain limitations on the point multiplication algorithms.

Table 4. Performance comparison of qDSA and other digital signature schemes.

| Signature Scheme | Instance | 32-bit ARM Cortex A7, Sign/sec | 32-bit ARM Cortex A7, Verify/sec | 64-bit Intel Haswell, Sign/sec | 64-bit Intel Haswell, Verify/sec |
|---|---|---|---|---|---|
| RSA (a) | 2048 | 41.3 | 1,596.9 | 1,618 | 36,576 |
| DSA (a) | 2048 | 146.3 | 137.9 | 2,071 | 1,883 |
| ECDSA (a) | P-256 | 940.5 | 250.7 | 25,344 | 10,198 |
| EdDSA | Ed25519 | 3,414.6 (b) | 1,840.9 (b) | 48,701 (c) | 17,167 (c) |
| qDSA (d) | Curve25519 | 2,148.0 | 1,001.6 | 25,109 | 12,109 |

(a) Timings taken using OpenSSL library (v.1.0.2) [39].

(b) Moon's implementation [23] using the prime field arithmetic from [12].

(c) Moon's implementation [23] compiled for 64-bit architectures.

(d) This work.

On Edwards curves, the point addition formula is complete and unified. This allows point additions to be associated in many different ways, as in comb-based algorithms; consequently, the fixed-point multiplication algorithms for Edwards curves have more degrees of freedom in their construction. For example, they can use larger look-up tables. This property is reflected in state-of-the-art implementations of Ed25519: Moon's implementation [23] uses a look-up table of 24 KB, whereas Chou's implementation [5] increases the look-up table size to 30 KB for a further speedup.

On the other hand, the point addition formula for Montgomery curves is not complete, and the differential point addition depends on the coordinates of an auxiliary third point. These facts restrict point multiplication algorithms to addition-chain evaluations, such as the Montgomery ladder algorithm (Alg. 1) or Joye's right-to-left algorithm [19]. With the introduction of precomputation in the right-to-left method, the look-up table size now depends on the size of $\ell$ (the order of the main elliptic curve subgroup), since the look-up table stores the sequence $(\mu_0, \dots, \mu_{n-1})$, where $n = \lfloor \log_2(\ell) \rfloor + 1$. Thus, for Curve25519, the look-up table used in our implementation is not larger than 8 KB, which is a third of the table size used in Ed25519 implementations.
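The 8 KB bound can be reproduced with a short calculation. The Python sketch below assumes that each $\mu_i$ is a single field element stored in 32 bytes (an assumption about the storage layout, not a detail stated here) and uses the well-known prime order $\ell$ of the Curve25519 base point.

```python
# Order of the prime-order subgroup of Curve25519.
ELL = 2**252 + 27742317777372353535851937790883648493

# n = floor(log2(ELL)) + 1 is exactly the bit length of ELL.
n = ELL.bit_length()

ENTRY_BYTES = 32                 # assumption: one 255-bit field element per entry
table_bytes = n * ENTRY_BYTES

ED25519_TABLE_BYTES = 24 * 1024  # 24 KB table reported for Moon's implementation

print(n, table_bytes, table_bytes <= 8 * 1024)
print(round(ED25519_TABLE_BYTES / table_bytes, 2))
```

Under these assumptions the table holds 253 entries and occupies 8,096 bytes, just under 8 KB and about one third of a 24 KB table.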

Alternatively, qDSA can also be implemented using Edwards curves (through a birational equivalence with Montgomery curves [6]) to obtain a performance closer to that of Ed25519. Note, however, that our implementation uses a smaller look-up table, which is a relevant factor when targeting memory-constrained architectures. We leave the Edwards approach as future work.

6 Closing Remarks

The novel Quotient Digital Signature Algorithm was designed to provide key compatibility with Diffie-Hellman functions based on Montgomery curves. These curves are also employed for performing the signature operations of qDSA; hence, the implementation of qDSA benefits from reusing the prime field and elliptic curve arithmetic that support the Diffie-Hellman protocol.

As in other elliptic curve based schemes, the performance-critical operation of qDSA is the calculation of scalar point multiplications. To address this, we revisited the fixed-point multiplication proposed by Oliveira et al. [29]. One advantage of this algorithm is the use of precomputed tables, which reduces the execution time of point multiplications. However, this algorithm operates with low-order points during its computation, and an improper use of these points could introduce vulnerabilities.

For that reason, and aiming for an implementation that is not only efficient but also secure, we presented modifications to the algorithm of Oliveira et al. that circumvent the use of low-order points. We noticed that whenever $k$ is odd, the x-coordinate of $kP$ can be calculated without requiring low-order points; when $k$ is even, the x-coordinate of $-kP$ is calculated instead. In both cases, the resulting x-coordinate is the same, since on the Kummer variety scalar multiplication does not depend on the sign of the scalar. These observations led to Algorithm 7, which computes fixed-point multiplications on Montgomery curves faster and does not require low-order points.
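The sign observation can be illustrated with an x-only computation. The Python sketch below uses a textbook Montgomery ladder in the style of RFC 7748 (not Algorithm 7, and not constant-time) on Curve25519 and checks that an even scalar $k$ and the odd scalar $\ell - k \equiv -k \pmod{\ell}$ yield the same x-coordinate. The function name `x_mul` and the sample scalars are hypothetical.

```python
P = 2**255 - 19
A24 = 121665  # (A - 2) / 4 for A = 486662
# Prime order of the Curve25519 base point (x = 9).
ELL = 2**252 + 27742317777372353535851937790883648493

def x_mul(k, u):
    """x-only Montgomery ladder: x(k*P) from u = x(P), for 0 < k < ELL."""
    x2, z2, x3, z3 = 1, 0, u, 1
    for t in reversed(range(k.bit_length())):
        bit = (k >> t) & 1
        if bit:  # conditional swap (branching on the scalar: sketch only)
            x2, z2, x3, z3 = x3, z3, x2, z2
        a, b = (x2 + z2) % P, (x2 - z2) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P          # differential addition
        z3 = u * (da - cb) ** 2 % P
        x2 = aa * bb % P                 # doubling
        z2 = e * (aa + A24 * e) % P
        if bit:
            x2, z2, x3, z3 = x3, z3, x2, z2
    return x2 * pow(z2, P - 2, P) % P

k = 1234                    # even toy scalar (assumption)
k_neg = ELL - k             # -k mod ELL, odd because ELL is odd
print(k_neg % 2 == 1, x_mul(k, 9) == x_mul(k_neg, 9))
```

Since $\ell$ is odd, replacing an even $k$ by $\ell - k$ always produces an odd scalar, so the odd-scalar case can be used for every input without changing the resulting x-coordinate.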

Additionally, we derived a new method to verify qDSA signatures unequivocally. Our method was inspired by Montgomery's work and revealed that the public key must contain not only the x-coordinate of $Q$ but also its y-coordinate; with this information, the verifier is able to validate signatures unequivocally. This requirement introduces a trade-off between time and space. On the one hand, if the public key contains both coordinates, the verification procedure remains as efficient as the original method; however, the size of the public key doubles. On the other hand, to avoid increasing the size of keys, the y-coordinate can be encoded in a single bit; in that case, the execution time of the verification procedure increases by 8-10% with respect to the original method. We remark that opting for either alternative enables the unequivocal verification of qDSA signatures, which further protects against potential vulnerabilities and the misuse of the original method.

According to the timings obtained in the performance benchmark, we conclude that, for the evaluated platforms, qDSA is a competitive alternative for deploying digital signatures.

Acknowledgments. The authors thank the anonymous reviewers of the SPACE 2017 conference for their comments on this research project.

References

1. Bernstein, D.J.: Curve25519: New Diffie-Hellman Speed Records. In Yung, M., Dodis, Y., Kiayias, A., Malkin, T., eds.: Public Key Cryptography - PKC 2006: 9th International Conference on Theory and Practice in Public-Key Cryptography, New York, NY, USA, April 24-26, 2006. Proceedings, Berlin, Heidelberg, Springer Berlin Heidelberg (April 2006) 207–228 https://doi.org/10.1007/11745853_14.

2. Bernstein, D.J., Duif, N., Lange, T., Schwabe, P., Yang, B.Y.: High-speed high-security signatures. Journal of Cryptographic Engineering 2(2) (September 2012) 77–89 http://dx.doi.org/10.1007/s13389-012-0027-1.

3. Biehl, I., Meyer, B., Müller, V.: Differential Fault Attacks on Elliptic Curve Cryptosystems. In Bellare, M., ed.: Advances in Cryptology – CRYPTO 2000: 20th Annual International Cryptology Conference Santa Barbara, California, USA, August 20–24, 2000 Proceedings, Berlin, Heidelberg, Springer Berlin Heidelberg (August 2000) 131–146 https://doi.org/10.1007/3-540-44598-6_8.

4. Brumley, D., Boneh, D.: Remote Timing Attacks Are Practical. In: Proceedings of the 12th Conference on USENIX Security Symposium, USENIX Association (August 2003) 1–13 https://www.usenix.org/conference/12th-usenix-security-symposium/remote-timing-attacks-are-practical.

5. Chou, T.: Sandy2x: New Curve25519 Speed Records. In Dunkelman, O., Keliher, L., eds.: Selected Areas in Cryptography - SAC 2015: 22nd International Conference, Sackville, NB, Canada, August 12-14, 2015, Revised Selected Papers, Cham, Springer International Publishing (August 2016) 145–160 http://dx.doi.org/10.1007/978-3-319-31301-6_8.

6. Costello, C., Smith, B.: Montgomery curves and their arithmetic. Journal of Cryptographic Engineering (Special Issue on Montgomery Arithmetic) (March 2017) 1–14 http://dx.doi.org/10.1007/s13389-017-0157-6.

7. Egele, M., Brumley, D., Fratantonio, Y., Kruegel, C.: An Empirical Study of Cryptographic Misuse in Android Applications. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security. CCS ’13, New York, NY, USA, ACM (2013) 73–84 http://doi.acm.org/10.1145/2508859.2516693.

8. Fahl, S., Harbach, M., Muders, T., Baumgärtner, L., Freisleben, B., Smith, M.: Why Eve and Mallory Love Android: An Analysis of Android SSL (in)Security. In: Proceedings of the 2012 ACM Conference on Computer and Communications Security. CCS ’12, New York, NY, USA, ACM (2012) 50–61 http://doi.acm.org/10.1145/2382196.2382205.

9. Fan, J., Gierlichs, B., Vercauteren, F.: To Infinity and Beyond: Combined Attack on ECC Using Points of Low Order. In Preneel, B., Takagi, T., eds.: Cryptographic Hardware and Embedded Systems – CHES 2011: 13th International Workshop, Nara, Japan, September 28 – October 1, 2011. Proceedings, Berlin, Heidelberg, Springer Berlin Heidelberg (October 2011) 143–159 https://doi.org/10.1007/978-3-642-23951-9_10.

10. Faz-Hernández, A., Longa, P., Sánchez, A.H.: Efficient and secure algorithms for GLV-based scalar multiplication and their implementation on GLV–GLS curves (extended version). Journal of Cryptographic Engineering 5(1) (Apr 2015) 31–52 https://doi.org/10.1007/s13389-014-0085-7.

11. Feng, M., Zhu, B.B., Zhao, C., Li, S.: Signed MSB-Set Comb Method for Elliptic Curve Point Multiplication. In Chen, K., Deng, R., Lai, X., Zhou, J., eds.: Information Security Practice and Experience: Second International Conference, ISPEC 2006, Hangzhou, China, April 11-14, 2006. Proceedings, Berlin, Heidelberg, Springer Berlin Heidelberg (April 2006) 13–24 https://doi.org/10.1007/11689522_2.

12. Fujii, H., Aranha, D.F.: Curve25519 for the Cortex-M4 and Beyond. In: Progress in Cryptology – LATINCRYPT 2017: 5th International Conference on Cryptology and Information Security in Latin America 2017, Proceedings. Lecture Notes in Computer Science, Springer International Publishing (September 2017)

13. Goundar, R.R., Joye, M., Miyaji, A., Rivain, M., Venelli, A.: Scalar multiplication on Weierstraß elliptic curves from Co-Z arithmetic. Journal of Cryptographic Engineering 1(2) (Aug 2011) 161 http://dx.doi.org/10.1007/s13389-011-0012-0.

14. Hamburg, M.: Fast and compact elliptic-curve cryptography. Cryptology ePrint Archive, Report 2012/309 (May 2012) http://eprint.iacr.org/2012/309.

15. Hedabou, M., Pinel, P., Bénéteau, L.: A comb method to render ECC resistant against Side Channel Attacks. Cryptology ePrint Archive, Report 2004/342 (December 2004) http://eprint.iacr.org/2004/342.

16. Jager, T., Schwenk, J., Somorovsky, J.: Practical Invalid Curve Attacks on TLS-ECDH. In Pernul, G., Ryan, P.Y.A., Weippl, E., eds.: Computer Security – ESORICS 2015: 20th European Symposium on Research in Computer Security, Vienna, Austria, September 21-25, 2015, Proceedings, Part I, Cham, Springer International Publishing (2015) 407–425 https://doi.org/10.1007/978-3-319-24174-6_21.

17. Johnson, D., Menezes, A., Vanstone, S.: The Elliptic Curve Digital Signature Algorithm (ECDSA). International Journal of Information Security 1(1) (August 2001) 36–63 http://dx.doi.org/10.1007/s102070100002.

18. Josefsson, S., Liusvaara, I.: Edwards-Curve Digital Signature Algorithm (EdDSA). RFC 8032 (January 2017) https://dx.doi.org/10.17487/rfc8032.

19. Joye, M.: Highly Regular Right-to-Left Algorithms for Scalar Multiplication. In Paillier, P., Verbauwhede, I., eds.: Cryptographic Hardware and Embedded Systems - CHES 2007: 9th International Workshop, Vienna, Austria, September 10-13, 2007. Proceedings, Berlin, Heidelberg, Springer Berlin Heidelberg (2007) 135–147 http://dx.doi.org/10.1007/978-3-540-74735-2_10.

20. Kocher, P.C.: Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems. In Koblitz, N., ed.: Advances in Cryptology – CRYPTO ’96: 16th Annual International Cryptology Conference Santa Barbara, California, USA August 18–22, 1996 Proceedings. Springer Berlin Heidelberg, Berlin, Heidelberg (1996) 104–113 https://doi.org/10.1007/3-540-68697-5_9.

21. Lim, C.H., Lee, P.J.: More Flexible Exponentiation with Precomputation. In Desmedt, Y.G., ed.: Advances in Cryptology – CRYPTO ’94: 14th Annual International Cryptology Conference Santa Barbara, California Proceedings, Berlin, Heidelberg, Springer Berlin Heidelberg (August 1994) 95–107 https://doi.org/10.1007/3-540-48658-5_11.

22. Montgomery, P.L.: Speeding the Pollard and Elliptic Curve Methods of Factorization. Mathematics of Computation 48(177) (January 1987) 243–264 http://dx.doi.org/10.2307/2007888.

23. Moon, A.: Implementations of a fast Elliptic-curve Digital Signature Algorithm. https://github.com/floodyberry/ed25519-donna (March 2012)

24. NIST: Digital Signature Standard (DSS). Technical Report FIPS 186-1, National Institute of Standards and Technology (December 1998)

25. NIST: Digital Signature Standard (DSS). Technical Report FIPS 186-2, National Institute of Standards and Technology (January 2000) http://csrc.nist.gov/publications/fips/archive/fips186-2/fips186-2.pdf.

26. NIST: SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions. Technical Report FIPS-202, National Institute of Standards and Technology (August 2015) http://dx.doi.org/10.6028/NIST.FIPS.202.

27. Okeya, K., Sakurai, K.: Efficient Elliptic Curve Cryptosystems from a Scalar Multiplication Algorithm with Recovery of the y-Coordinate on a Montgomery-Form Elliptic Curve. In Koç, Ç.K., Naccache, D., Paar, C., eds.: Cryptographic Hardware and Embedded Systems – CHES 2001: Third International Workshop Paris, France, May 14–16, 2001 Proceedings, Berlin, Heidelberg, Springer Berlin Heidelberg (September 2001) 126–141 http://dx.doi.org/10.1007/3-540-44709-1_12.

28. Oliveira, T., Aranha, D.F., López, J., Rodríguez-Henríquez, F.: Fast Point Multiplication Algorithms for Binary Elliptic Curves with and without Precomputation. In Joux, A., Youssef, A., eds.: Selected Areas in Cryptography – SAC 2014: 21st International Conference, Montreal, QC, Canada, August 14-15, 2014, Revised Selected Papers, Cham, Springer International Publishing (August 2014) 324–344 http://dx.doi.org/10.1007/978-3-319-13051-4_20.

29. Oliveira, T., López, J., Hışıl, H., Faz-Hernández, A., Rodríguez-Henríquez, F.: How to (pre-)compute a ladder. In: Selected Areas in Cryptography – SAC 2017: 24th International Conference, Ottawa, Ontario, Canada, August 16 - 18, 2017, Revised Selected Papers, Springer International Publishing (August 2017)

30. Perrin, T.: The XEdDSA and VXEdDSA Signature Schemes. Technical report, Open Whisper Systems (October 2016) https://whispersystems.org/docs/specifications/xeddsa/xeddsa.pdf.

31. The Sage Developers: SageMath, the Sage Mathematics Software System (Version 7.6). (2017) http://www.sagemath.org.

32. Renes, J., Smith, B.: qDSA: Small and Secure Digital Signatures with Curve-based Diffie-Hellman Key Pairs. In: Advances in Cryptology – ASIACRYPT 2017: 23rd International Conference on the Theory and Application of Cryptology and Information Security, Hong Kong, China, December 3-7, 2017, Proceedings. (December 2017)

33. Rescorla, E., Dierks, T.: The Transport Layer Security (TLS) Protocol Version 1.2. RFC 5246 (August 2008) https://dx.doi.org/10.17487/rfc5246.

34. Rivest, R.L., Shamir, A., Adleman, L.: A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM 21(2) (February 1978) 120–126 http://doi.org/10.1145/359340.359342.

35. Schnorr, C.P.: Efficient signature generation by smart cards. Journal of Cryptology 4(3) (January 1991) 161–174 http://dx.doi.org/10.1007/BF00196725.

36. Seo, H., Kim, H.: Consecutive Operand-Caching Method for Multiprecision Multiplication, Revisited. Journal of Information and Communication Convergence Engineering 13(1) (Mar 2015) 27–35 http://dx.doi.org/10.6109/jicce.2015.13.1.027.

37. Spagni, R.: Disclosure of a Major Bug in CryptoNote Based Currencies. Announcement on https://getmonero.org/2017/05/17/disclosure-of-a-major-bug-in-cryptonote-based-currencies.html (May 2017)

38. Taverne, J., Faz-Hernández, A., Aranha, D.F., Rodríguez-Henríquez, F., Hankerson, D., López, J.: Speeding scalar multiplication over binary elliptic curves using the new carry-less multiplication instruction. Journal of Cryptographic Engineering 1(3) (Sep 2011) 187 https://doi.org/10.1007/s13389-011-0017-8.

39. The OpenSSL Project: OpenSSL: The Open Source toolkit for SSL/TLS. www.openssl.org (April 2003)

40. Tromer, E., Osvik, D.A., Shamir, A.: Efficient Cache Attacks on AES, and Countermeasures. Journal of Cryptology 23(1) (January 2010) 37–71 http://dx.doi.org/10.1007/s00145-009-9049-y.

41. Turner, S., Langley, A., Hamburg, M.: Elliptic Curves for Security. RFC 7748 (January 2016) https://dx.doi.org/10.17487/rfc7748.

42. Yen, S.M., Joye, M.: Checking before output may not be enough against fault-based cryptanalysis. IEEE Transactions on Computers 49(9) (Sep 2000) 967–970 https://doi.org/10.1109/12.869328.