IACR Transactions on Cryptographic Hardware and Embedded Systems
ISSN 2569-2925, Vol. 2023, No. 2, pp. 358–380. DOI:10.46586/tches.v2023.i2.358-380
Speeding Up Multi-Scalar Multiplication over Fixed Points Towards Efficient zkSNARKs
Guiwen Luo, Shihui Fu† and Guang Gong
Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON,
Canada, {guiwen.luo,shihui.fu,ggong}@uwaterloo.ca
Abstract. The arithmetic of computing multiple scalar multiplications in an elliptic curve group and then adding them together is called multi-scalar multiplication (MSM). MSM over fixed points dominates the time consumption in the pairing-based trusted setup zero-knowledge succinct non-interactive argument of knowledge (zkSNARK), thus for practical applications we would appreciate fast algorithms to compute it. This paper proposes a bucket set construction that can be utilized in the context of Pippenger's bucket method to speed up MSM over fixed points with the help of precomputation. When instantiating the proposed construction over the BLS12-381 curve and computing n scalar multiplications for n = 2^e (10 ≤ e ≤ 21), theoretical analysis indicates that the proposed construction saves more than 21% computational cost compared to Pippenger's bucket method, and that it saves 2.6% to 9.6% computational cost compared to the most popular variant of Pippenger's bucket method. Finally, our experimental results demonstrate the feasibility of accelerating the computation of MSM over fixed points using large precomputation tables, as well as the effectiveness of our new construction.
Keywords: Multi-scalar multiplication · Pippenger's bucket method · zkSNARK · blockchain
1 Introduction
In recent years, zero-knowledge succinct non-interactive argument of knowledge (zkSNARK) has gained tremendous interest, from its theoretical development to its practical implementation, because it provides an elegant privacy protection solution. Popular examples include anonymous transactions in Zcash [BCG+14] and smart contract verification over private inputs [Ebe] in Ethereum. Many zkSNARKs with trusted setup rely on pairing-based cryptography, which is very efficient in general. Groth et al. [GOS06, Gro06, Gro09, Gro10, GOS12, GS12] first introduced pairing-based zero-knowledge proofs, leading to extensive research work in this area [Lip12, GGPR13, DFGK14, Gro16, MBKM19, GWC19, CHM+19, BFS20, BDFG21].
All pairing-based trusted setup zkSNARKs in the literature follow a common paradigm, where the prover computes a proof consisting of several points in an elliptic curve group by generic group operations, and the verifier checks the proof by a number of pairings in the verification equation. Basically, it requires the prover and verifier to conduct their computation by applying only linear operations to the points built into the common reference string. This computation is indeed MSM over fixed points. MSM dominates the overall time for generating and verifying the proof. Thus, fast algorithms for computing MSM over fixed points are desirable and necessary.
†Shihui Fu is currently with Delft University of Technology, Delft, Netherlands, shihui.fu@tudelft.nl.
Licensed under Creative Commons License CC-BY 4.0.
Received: 2022-10-15 Accepted: 2022-12-15 Published: 2023-03-06
MSM in those applications shows the characteristic of having a large number of points. For example, one of the most classical zkSNARK applications is to prove knowledge of the preimage of a cryptographic hash function. When using the traditional SHA-256, which is compiled to an arithmetic circuit with 22,272 AND gates when the preimage is 512 bits [CGGN17], it leads to the computation of an MSM with more than 22,272 points. When utilizing the zkSNARK-friendly hash function Poseidon [GKR+21], the MSM still has hundreds of points.
1.1 Related work
The most popular method for scalar multiplication in an elliptic curve group is the binary algorithm, known as the doubling and addition method (also known as the square and multiply method in the exponentiation setting) [Knu97, Section 4.6.3]. The GLV method [GLV01] and the GLS method [GLS11] decompose the scalar into dimensions 2, 4, 6 and 8, then compute the corresponding MSM. When the point for scalar multiplication is fixed, precomputation can be used to reduce the computational cost. Knuth's 5-window algorithm utilizes the precomputation of 16 points to speed up scalar multiplication [Knu97, BC89]. If a bigger window and more storage for precomputed points are used, the windowing method can be even faster. Pippenger's bucket method and its variants decompose the scalar, then sort all points into buckets with respect to their scalars, and finally utilize an accumulation algorithm to add them together [Pip76, BDLO12]. Another line of research lies in constructing new number systems to represent the scalar, such as basic digit sets [Mat82, BGMW95] and multi-base number systems [DKS09, SIM12, YWLT13]. Researchers also try to make the addition arithmetic more efficient by using different curve representations, such as projective coordinates and Jacobian coordinates that eliminate the inversion operations, and the Montgomery form that only utilizes the x-coordinate [Mon87]. Differential addition chains (DACs) are used in company with x-only coordinate systems, for example PRAC chains [Mon92], DJB chains [Ber06] and other multidimensional DACs [Bro15, Rao15]. Most of the aforementioned techniques can be applied to MSM where the number of points is small.
When the number of points in MSM is big, which is the situation in pairing-based trusted setup zkSNARK applications, Pippenger's bucket method and its variants are the state-of-the-art algorithms that outperform other competitors. Bernstein et al. [BDLO12] investigated the Bos–Coster method [DR94, Section 4], Straus method [Str64] and Pippenger's bucket method, then chose Pippenger's bucket method to implement batch forgery identification for elliptic curve signatures, which marks the beginning of the extensive deployment of Pippenger's bucket method for computing MSM with a big number of points.

In practice, all of the popular zkSNARK-oriented implementations, such as Zcash [Zca], TurboPLONK [GJW20], Bellman [bel] and gnark [gna], choose Pippenger's bucket method or its variants to compute MSM over fixed points.
1.2 Our contribution
This paper proposes a new bucket set construction that yields an efficient algorithm to compute MSM over fixed points in the context of Pippenger's bucket method. Our construction targets n-scalar multiplication with 2^10 ≤ n ≤ 2^21, which is desirable for many pairing-based trusted setup zkSNARK applications. Our main contributions are summarized as follows.
• A new subsum accumulation algorithm. After sorting points into buckets with respect to their scalars, Pippenger's bucket method computes intermediate subsums and utilizes an accumulation algorithm to add those subsums together. The original subsum accumulation algorithm (Algorithm 1, presented in Section 2.3) is applicable to the situation where the scalars in the bucket set are consecutive. When the scalars in the bucket set are inconsecutive, Algorithm 1 would be less efficient. This paper proposes a new subsum accumulation algorithm (Algorithm 3) that accumulates m intermediate subsums using at most 2m + d − 3 additions, where d is the maximum difference between two neighboring elements in the bucket set.
• A construction of a bucket set that yields an efficient algorithm to compute MSM over fixed points. The proposed bucket set construction carefully selects integer elements from [0, q/2] so that for all t (0 ≤ t ≤ q), there exist an integer b in the bucket set and an integer m ∈ {1, 2, 3} such that the following assertion holds,

t = mb or t = q − mb.

When instantiated over the BLS12-381 curve [Bow17], this construction yields an algorithm that takes advantage of 3nh precomputed points to evaluate the n-scalar multiplication over fixed points, where all scalars are smaller than a 255-bit prime r, using at most approximately

(nh + 0.21q) additions, if q = 2^c (10 ≤ c ≤ 31, c ≠ 15, 16, 17),
(nh + 0.28q) additions, if q = 2^16,

where h = ⌈log_q r⌉. The theoretical analysis shows that for n = 2^e (10 ≤ e ≤ 21), the proposed algorithm saves more than 21% computational cost compared to Pippenger's bucket method, and that it saves 2.6% to 9.6% computational cost compared to the most popular variant of Pippenger's bucket method, which is reviewed in Section 2.3.2.
• The feasibility of accelerating the computation of MSM by taking advantage of large precomputation tables and the effectiveness of our new construction are demonstrated by our implementation. We implemented the popular variant of Pippenger's bucket method and our construction based on the BLS12-381 library blst [bls]. When computing n-scalar multiplication over fixed points in the BLS12-381 curve groups, the experimental results show that the proposed construction saves more than 17.7% of the computing cost compared to the Pippenger's bucket method implementation built into blst for n = 2^e (10 ≤ e ≤ 21), and that it saves 3.1% to 9.2% of the computing cost compared to the variant of Pippenger's bucket method for n = 2^e (10 ≤ e ≤ 21, e ≠ 16, 20).
The paper is organized as follows. In Section 2, several popular MSM algorithms, including Pippenger's bucket method and one of its popular variants, are reviewed. Then we propose a new subsum accumulation algorithm in Section 3. In Section 4, we present a framework for computing MSM over fixed points that takes advantage of precomputation. This framework is used to derive our new MSM algorithm. Section 5 is dedicated to the construction of our new bucket set and multiplier set. We instantiate our construction over the BLS12-381 curve in Section 6 and carry out the theoretical time complexity analysis. In the end, we present the implementation and experimental results in Section 7.

Let us first introduce the notation used throughout the paper before diving into the content.
Notations. Without special explanation hereinafter, let E be an elliptic curve group and r be its order. Let ⌊x⌋ be the largest integer that is equal to or smaller than x, and ⌈x⌉ the smallest integer that is equal to or greater than x. Let ∥ denote bit string concatenation. Notation S_{n,r} represents the following MSM over fixed points,

S_{n,r} = a_1 P_1 + a_2 P_2 + · · · + a_n P_n, (1)
where the a_i's are scalars such that 0 ≤ a_i < r and the P_i's are fixed points in E. Radix q = 2^c is an integer used to express a scalar in its radix-q representation. Integer h is the length of a scalar in its radix-q representation, i.e., h = ⌈log_q r⌉. The term addition refers to the point addition arithmetic in E. Let us assume for simplicity that the computational cost of a doubling and that of an addition in E are the same, denoted as A. This is the norm in Pippenger-like algorithms, where the major operations are additions. The storage size of a point is denoted as P.
2 Recap of multi-scalar multiplication methods
In this section we review several widely used methods that compute S_{n,r} with large n, namely the trivial method, Straus method, Pippenger's bucket method and one of the variants of Pippenger's bucket method.
2.1 Trivial method
In the trivial method, each a_i P_i in (1) is computed separately by the doubling and addition method, then the n intermediate results are added together to obtain the final result. In the worst case each scalar multiplication costs about 2 · (⌈log_2 r⌉ − 1) · A, so the total cost of computing S_{n,r} is

[2 · (⌈log_2 r⌉ − 1) · n + (n − 1)] · A ≈ 2n log_2 r · A. (2)
If the non-adjacent form is used to represent the scalars a_i (i = 1, 2, · · · , n), then because every nonzero digit has to be adjacent to two 0s, in the worst case half of the digits of a_i are nonzero. The cost of each scalar multiplication would drop to about (3/2)⌈log_2 r⌉ · A. The time complexity of computing S_{n,r} in the worst case is about

[(3/2)⌈log_2 r⌉ · n + (n − 1)] · A ≈ (3/2) · n log_2 r · A. (3)
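To make the cost accounting above concrete, here is a minimal Python sketch of the doubling and addition method and the trivial MSM built on top of it. This is an illustrative sketch, not the paper's implementation: the group is abstracted behind `add`/`dbl` callbacks and a `zero` element, so plain integers under addition can stand in for curve points.

```python
def double_and_add(k, P, add, dbl, zero):
    """Left-to-right binary (doubling and addition) scalar multiplication k*P."""
    acc = zero
    for bit in bin(k)[2:]:          # scan bits, high-order bit first
        acc = dbl(acc)              # one doubling per bit
        if bit == '1':
            acc = add(acc, P)       # one addition per nonzero bit
    return acc

def msm_trivial(scalars, points, add, dbl, zero):
    """Trivial MSM: compute each a_i * P_i separately, then add the results."""
    total = zero
    for a, P in zip(scalars, points):
        total = add(total, double_and_add(a, P, add, dbl, zero))
    return total
```

With real curve points, `add` and `dbl` would be the curve's point addition and doubling; with integers, "scalar multiplication" degenerates to ordinary multiplication, which makes the logic easy to check.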
2.2 Straus method
In order to compute S_{n,r}, Straus method [Str64] precomputes 2^{nc} points

{b_1 P_1 + b_2 P_2 + · · · + b_n P_n | 0 ≤ b_i ≤ 2^c − 1, i = 1, 2, · · · , n},

where c is a small integer. It then divides each a_i (in its binary form with the high-order bit to the left) into segments of length c, i.e.,

a_i = a_{i,h−1} ∥ a_{i,h−2} ∥ · · · ∥ a_{i,1} ∥ a_{i,0} = Σ_{j=0}^{h−1} a_{ij} 2^{jc}, i = 1, 2, · · · , n, (4)

where h = ⌈log_2(r)/c⌉ and 0 ≤ a_{ij} < 2^c for 0 ≤ j ≤ h − 1. It retrieves the point

S_{n,2^c} = a_{1,h−1} P_1 + a_{2,h−1} P_2 + · · · + a_{n,h−1} P_n (5)

from the precomputation table, doubles it c times, and adds the precomputed point

a_{1,h−2} P_1 + a_{2,h−2} P_2 + · · · + a_{n,h−2} P_n (6)

to obtain

S_{n,2^{2c}} = (a_{1,h−1} ∥ a_{1,h−2}) P_1 + (a_{2,h−1} ∥ a_{2,h−2}) P_2 + · · · + (a_{n,h−1} ∥ a_{n,h−2}) P_n. (7)

By repeating this process h − 1 times, we obtain

S_{n,2^{hc}} = (a_{1,h−1} ∥ a_{1,h−2} ∥ · · · ∥ a_{1,0}) P_1 + (a_{2,h−1} ∥ a_{2,h−2} ∥ · · · ∥ a_{2,0}) P_2 + · · · + (a_{n,h−1} ∥ a_{n,h−2} ∥ · · · ∥ a_{n,0}) P_n.

S_{n,2^{hc}} is exactly what we aim to compute, i.e., S_{n,r}.
Straus method is only suitable for small n, because when n grows large the precomputation becomes exponentially large. One variant that can be used for a large number n is to store only n·(2^c − 1) precomputed points,

{b_i P_i | 1 ≤ b_i ≤ 2^c − 1, i = 1, 2, · · · , n},

where c is a small integer. At the j-th iteration of (5)(6)(7) (j = 0, 1, 2, · · · , h − 1) in Straus method, separately add together the precomputed points a_{1j} P_1, a_{2j} P_2, · · · , a_{nj} P_n with n − 1 additions to obtain

a_{1j} P_1 + a_{2j} P_2 + · · · + a_{nj} P_n.

The storage size would drop from 2^{nc} · P to n(2^c − 1) · P. This process repeats h times; each time it conducts n additions and c doublings (the last time does not require doubling), so the computational cost is approximately

(n + c)h · A. (8)
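The variant above can be sketched in Python as follows; again the group operations are abstracted as callbacks and integers stand in for curve points. The table layout and function name are illustrative, not taken from any particular implementation.

```python
def straus_variant_msm(scalars, points, c, h, add, dbl, zero):
    """Straus method variant storing n*(2^c - 1) precomputed multiples per point."""
    # Precompute b * P_i for every window value b in [1, 2^c - 1],
    # built by repeated addition: multiples[b] = b * P_i.
    table = []
    for P in points:
        multiples = [zero]
        for _ in range(1, 2 ** c):
            multiples.append(add(multiples[-1], P))
        table.append(multiples)
    acc = zero
    for j in range(h - 1, -1, -1):        # process windows, high-order first
        if j != h - 1:
            for _ in range(c):            # c doublings between windows
                acc = dbl(acc)
        for i, a in enumerate(scalars):   # n additions per window
            digit = (a >> (j * c)) & (2 ** c - 1)
            if digit:
                acc = add(acc, table[i][digit])
    return acc
```

Each of the h windows costs n additions plus c doublings, matching the (n + c)h · A estimate in (8).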
2.3 Pippenger’s bucket method
Here we introduce Pippenger’s bucket method presented in [BDLO12, Section 4], which is
an application of Pippenger’s algorithm [Pip76].
Pippenger's bucket method proceeds the same as Straus method does, except for the computation of

S_{n,2^c} = a_{1j} P_1 + a_{2j} P_2 + · · · + a_{nj} P_n, (9)

where j = 0, 1, 2, · · · , h − 1 and h = ⌈log_2(r)/c⌉.

Pippenger's bucket method evaluates (9) by first sorting all the points into 2^c − 1 buckets with respect to their scalars. We denote the intermediate subsum of the points corresponding to scalar i as S_i. It computes all the S_i's (i = 1, 2, · · · , 2^c − 1) using at most n − (2^c − 1) additions. Finally it computes S_{n,2^c} = Σ_{i=1}^{2^c−1} i · S_i by Algorithm 1, using at most 2(2^c − 2) additions.
Algorithm 1 Subsum accumulation algorithm I
Input: S_1, S_2, · · · , S_m.
Output: 1·S_1 + 2·S_2 + · · · + m·S_m.
tmp = 0
tmp1 = 0
for i = m to 1 do
    tmp = tmp + S_i
    tmp1 = tmp1 + tmp
return tmp1
The correctness of Algorithm 1 is ensured by the following equation,

Σ_{i=1}^{m} i S_i = Σ_{i=1}^{m} Σ_{j=1}^{i} S_i = Σ_{j=1}^{m} Σ_{i=j}^{m} S_i.
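Algorithm 1 translates directly to Python: the running variable `tmp` holds the suffix sum S_i + · · · + S_m, and `tmp1` accumulates those suffix sums, which by the equation above equals Σ i·S_i. This is an illustrative sketch with integers standing in for curve points.

```python
def accumulate_consecutive(S, add, zero):
    """Algorithm 1: compute 1*S[0] + 2*S[1] + ... + m*S[m-1] in 2(m-1) additions."""
    tmp = zero                      # suffix sum S_i + ... + S_m
    tmp1 = zero                     # accumulated answer
    for Si in reversed(S):          # i = m down to 1
        tmp = add(tmp, Si)
        tmp1 = add(tmp1, tmp)
    return tmp1
```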
The computation of S_{n,2^c} costs n − (2^c − 1) + 2(2^c − 2) ≈ (n + 2^c) additions. The computational cost of S_{n,r} is thus approximately

(n + 2^c)h · A. (10)

Compared to (8), at first glance it seems that Pippenger's bucket method is less efficient than Straus method, but this is not necessarily true for large n. Because there is no precomputation requirement in Pippenger's bucket method, a bigger c can be selected to minimize the overall computational cost.
2.3.1 The variant
In the aforementioned Pippenger's bucket method, one downside is that Algorithm 1 runs h times. If there is storage available for precomputation, this shortcoming can be circumvented by the variant presented in [BGMW95].

Choose a radix q = 2^c and partition each a_i (i = 1, 2, · · · , n) into segments as follows,

a_i = a_{i,h−1} ∥ a_{i,h−2} ∥ · · · ∥ a_{i,0} = Σ_{j=0}^{h−1} a_{ij} q^j, (11)

where h = ⌈log_q r⌉ and 0 ≤ a_{ij} < q (0 ≤ j ≤ h − 1). It follows that

S_{n,r} = a_1 P_1 + a_2 P_2 + · · · + a_n P_n = Σ_{i=1}^{n} (Σ_{j=0}^{h−1} a_{ij} q^j) P_i = Σ_{i=1}^{n} Σ_{j=0}^{h−1} a_{ij} · (q^j P_i) =: S_{nh,q}. (12)

We precompute the following points

{q^j P_i | i = 1, 2, · · · , n, j = 0, 1, 2, · · · , h − 1},

which require a storage size of nh · P; then S_{n,r} = S_{nh,q} can be computed by using Algorithm 1 only once. The computational cost is

[nh − (q − 1) + 2(q − 2)] · A ≈ (nh + q) · A.
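A compact sketch of this variant: the precomputed q^j P_i are simulated here with integer multiplication (on a curve they would be produced by repeated doubling), the nh digit/point pairs are sorted into buckets, and a single pass of Algorithm 1 finishes the computation. The code layout is illustrative, not the paper's implementation.

```python
def pippenger_variant_msm(scalars, points, c, h, add, zero):
    """BGMW variant: precompute q^j * P_i, bucket all nh digits, run Algorithm 1 once."""
    q = 2 ** c
    # Precomputed table pre[i][j] = q^j * P_i (integer simulation of the real table).
    pre = [[(q ** j) * P for j in range(h)] for P in points]
    buckets = [zero] * q                  # buckets[k] collects points whose digit is k
    for i, a in enumerate(scalars):
        for j in range(h):
            digit = (a >> (c * j)) & (q - 1)
            if digit:
                buckets[digit] = add(buckets[digit], pre[i][j])
    # One invocation of Algorithm 1 over buckets[1..q-1].
    tmp = tmp1 = zero
    for k in range(q - 1, 0, -1):
        tmp = add(tmp, buckets[k])
        tmp1 = add(tmp1, tmp)
    return tmp1
```

The bucketing pass costs at most nh − (q − 1) additions and the final pass about 2(q − 2), matching the estimate above.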
2.3.2 Further optimization
Pippenger's bucket method and the variant can be further optimized by halving the size of the bucket set. Let radix q = 2^c. Using the observation that in an elliptic curve group −P is obtained from P by taking the negative of its y-coordinate at almost no cost, all the buckets can be restricted to scalars that are no more than q/2 if

q^{h−1} < r ≤ q/2 · q^{h−1},

where h = ⌈log_q r⌉. Algorithm 2 can be used to convert a scalar a (0 ≤ a < r) from its standard q-ary form to the representation where every digit is in the range [−q/2, q/2].
Algorithm 2 Scalar conversion I
Input: {a_j}_{0≤j≤h−1}, 0 ≤ a_j < q, such that a = Σ_{j=0}^{h−1} a_j q^j.
Output: {b_j}_{0≤j≤h−1}, −q/2 ≤ b_j ≤ q/2, such that a = Σ_{j=0}^{h−1} b_j q^j.
1: for j = 0 to h − 2 by 1 do
2:     if a_j ≤ q/2 then
3:         b_j = a_j
4:     else
5:         b_j = a_j − q
6:         a_{j+1} = a_{j+1} + 1
7: b_{h−1} = a_{h−1}
8: return {b_j}_{0≤j≤h−1}
The correctness of Algorithm 2 is straightforward. Notice that the assumption ensures a_{h−1} ≤ q/2 − 1, so b_{h−1} ≤ q/2 considering the possible carry bit from a_{h−2}.
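Algorithm 2 in Python, for scalars satisfying the assumption a < q/2 · q^{h−1}; the digit extraction and carry handling mirror the pseudocode. This is a sketch for illustration, with hypothetical parameter names.

```python
def signed_digits(a, c, h):
    """Algorithm 2: convert a into digits b_j in [-q/2, q/2] with a = sum b_j q^j."""
    q = 2 ** c
    digits = [(a >> (c * j)) & (q - 1) for j in range(h)]  # standard q-ary form
    b = [0] * h
    for j in range(h - 1):
        if digits[j] <= q // 2:
            b[j] = digits[j]
        else:
            b[j] = digits[j] - q      # shift the digit into the negative half
            digits[j + 1] += 1        # carry into the next digit
    b[h - 1] = digits[h - 1]
    return b
```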
The time complexity of Pippenger's bucket method would thus drop to

h(n + q/2) · A, (13)

and the complexity of the variant would be

(nh + q/2) · A. (14)

Henceforward when mentioning Pippenger's bucket method and Pippenger's variant, we refer to the algorithms whose time complexities are (13) and (14) respectively.
2.4 Comparison of multi-scalar multiplication algorithms
We summarize in Table 1 the precomputation storage and the time complexity of computing S_{n,r} by the aforementioned methods, together with our construction proposed in Section 5. Here q = 2^c and h = ⌈log_q r⌉. Radix q is selected to minimize the computational cost. The time complexities of Pippenger's bucket method and Pippenger's variant hold if r ≤ q/2 · q^{h−1}. The time complexity of our construction holds when r/q^h is small.
Table 1: Comparison of different methods that compute S_{n,r}

Method                      Storage     Worst case complexity
Trivial method              n · P       3/2 · (n log_2 r) · A
Straus method [Str64]       n2^c · P    h(n + c) · A
Pippenger [Pip76, BDLO12]   n · P       h(n + q/2) · A
Pippenger variant [BGMW95]  nh · P      (nh + q/2) · A
Our construction            3nh · P     (nh + 0.21q) · A
3 A new subsum accumulation algorithm
During the computation of S_{n,r} by Pippenger's bucket method, after sorting every point into the bucket corresponding to its scalar and computing the intermediate subsums S_i's, the remaining task is to invoke a subsum accumulation algorithm to compute

S = b_1 S_1 + b_2 S_2 + · · · + b_m S_m,

where 1 ≤ b_1 ≤ b_2 ≤ · · · ≤ b_m. When the set {b_i}_{1≤i≤m} is not a sequence of consecutive integers, Algorithm 1 handles the case with less efficiency. One may utilize the Bos–Coster method [DR94, Section 4] to deal with this case, but it is a recursive algorithm and its complexity is not easy to analyze. Here we propose a straightforward algorithm to tackle this case.

Define b_0 = 0 and let

d = max_{1≤i≤m} {b_i − b_{i−1}};

then S can be computed by Algorithm 3.
Algorithm 3 Subsum accumulation algorithm II
Input: b_1, b_2, · · · , b_m, S_1, S_2, · · · , S_m.
Output: S = b_1 S_1 + b_2 S_2 + · · · + b_m S_m.
1: Define a length-(d + 1) array tmp = [0] × (d + 1)
2: for i = m to 1 by −1 do
3:     tmp[0] = tmp[0] + S_i
4:     k = b_i − b_{i−1}
5:     if k ≥ 1 then
6:         tmp[k] = tmp[k] + tmp[0]
7: return 1·tmp[1] + 2·tmp[2] + · · · + d·tmp[d]
Denote δ_j = b_j − b_{j−1}; then b_i = Σ_{j=1}^{i} δ_j. The correctness of Algorithm 3 comes from the following equation,

Σ_{i=1}^{m} b_i S_i = Σ_{i=1}^{m} (Σ_{j=1}^{i} δ_j) S_i = Σ_{j=1}^{m} δ_j (Σ_{i=j}^{m} S_i) = Σ_{k=1}^{d} k · Σ_{j=1, δ_j=k}^{m} (Σ_{i=j}^{m} S_i). (15)
During the execution of Algorithm 3, the temporary variable tmp[0] stores Σ_{i=j}^{m} S_i when the loop index i equals j, and the temporary variable tmp[k] stores Σ_{j=1, δ_j=k}^{m} (Σ_{i=j}^{m} S_i) for 1 ≤ k ≤ d after the for loop.
If {b_i}_{1≤i≤m} is strictly increasing and k in line 4 goes through {1, 2, · · · , d}, then each iteration of the for loop (lines 2–6) executes exactly 2 additions. Since all d + 1 temporary variables in tmp are initialized to 0, there are d + 1 additions with addend 0, which have no computational cost, so the for loop executes 2m − (d + 1) additions. Line 7 is computed by subsum accumulation Algorithm 1 with 2(d − 1) additions. In total, the cost of Algorithm 3 is 2m + d − 3 additions.
If {b_i}_{1≤i≤m} is not strictly increasing, which means that k in line 4 sometimes equals 0, the corresponding for iteration executes only one addition, skipping the if part.
If k in line 4 does not go through all integers in {1, 2, · · · , d}, there exists a tmp[k] (1 ≤ k ≤ d) that skips the for loop and stays at 0. In the for loop, the addition saved by the fact that tmp[k] is initialized to 0 is no longer saved. In the meantime, when line 7 is executed, at least one addition is saved because tmp[k] = 0, so the total cost does not increase.
To sum up, the cost of Algorithm 3 in the worst case is (2m + d − 3) · A. When d = 1, Algorithm 3 degenerates to Algorithm 1.
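Algorithm 3 can be sketched in Python as follows, with the final line-7 accumulation done by Algorithm 1's running-sum trick; the group is again simulated with integers, and the gap computation assumes b is sorted as in the text, with b_0 = 0 implicit.

```python
def accumulate_bucket_set(b, S, add, zero):
    """Algorithm 3: compute b[0]*S[0] + ... + b[m-1]*S[m-1] for a sorted bucket
    set b, in at most 2m + d - 3 additions (d = max gap between neighbors of b)."""
    prev = [0] + list(b)                       # prev[i] = b_{i-1}, with b_0 = 0
    d = max(hi - lo for lo, hi in zip(prev, b))
    tmp = [zero] * (d + 1)
    for i in range(len(b) - 1, -1, -1):        # i = m down to 1
        tmp[0] = add(tmp[0], S[i])             # running suffix sum S_i + ... + S_m
        k = b[i] - prev[i]                     # gap to the previous bucket scalar
        if k >= 1:
            tmp[k] = add(tmp[k], tmp[0])
    # Line 7: 1*tmp[1] + 2*tmp[2] + ... + d*tmp[d] via Algorithm 1.
    acc = run = zero
    for k in range(d, 0, -1):
        run = add(run, tmp[k])
        acc = add(acc, run)
    return acc
```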
4 A framework of computing multi-scalar multiplication over fixed points
The following framework is inspired by Brickell et al. [BGMW95], who presented a similar method to compute single-scalar multiplication using the notion of basic digit sets. They did not consider the possible overflow of the most significant digit of a scalar, which is not a big issue in single-scalar multiplication, while it matters in MSM S_{n,r} because the overflow increases the computational cost by at most n additions. Here we give a straightforward illustration of the framework without the involvement of basic digit sets.
Suppose we are going to compute S_{n,r}. Let M be a set of integers, and let B be a set of nonnegative integers with 0 ∈ B. Given a scalar a_i (0 ≤ a_i < r) in its radix-q representation

a_i = Σ_{j=0}^{h−1} a_{ij} q^j,

where h = ⌈log_q r⌉, if every a_{ij} (1 ≤ i ≤ n, 0 ≤ j ≤ h − 1) is the product of an element from set M and an element from set B, i.e.,

a_{ij} = m_{ij} b_{ij}, m_{ij} ∈ M, b_{ij} ∈ B,

then S_{n,r} can be computed as follows,

S_{n,r} = Σ_{i=1}^{n} a_i P_i = Σ_{i=1}^{n} (Σ_{j=0}^{h−1} a_{ij} q^j) P_i = Σ_{i=1}^{n} (Σ_{j=0}^{h−1} m_{ij} b_{ij} q^j) P_i = Σ_{i=1}^{n} Σ_{j=0}^{h−1} b_{ij} · (m_{ij} q^j P_i). (16)
Denote P_{ij} = m_{ij} q^j P_i; then

S_{n,r} = Σ_{i=1}^{n} Σ_{j=0}^{h−1} b_{ij} P_{ij} = Σ_{k∈B} k · (Σ_{i,j s.t. b_{ij}=k} P_{ij}). (17)
Suppose the nhM points

{m q^j P_i | 1 ≤ i ≤ n, 0 ≤ j ≤ h − 1, m ∈ M} (18)

are precomputed, and define the intermediate subsums S_k,

S_k = Σ_{i,j s.t. b_{ij}=k} P_{ij}, k ∈ B.

Equation (17) can be evaluated by first computing all the S_k's (k ∈ B) with at most nh − (B − 1) additions; the reason is straightforward, since there are nh points being sorted into B − 1 subsums. The remainder is computed by Algorithm 3 with at most 2(B − 1) + d − 3 additions, where d is the maximum difference between two neighboring elements in B.
To sum up, the worst-case time complexity of computing S_{n,r} is

(nh + B + d − 4) · A, (19)

where h = ⌈log_q r⌉, with the help of

nhM (20)

precomputed points.
Set M is called a multiplier set, because the set of precomputed points contains the points multiplied by every element from M. Set B is called a bucket set, since all points are sorted into subsum buckets with respect to the scalars in B. This framework is translated into Algorithm 4.
Algorithm 4 Multi-scalar multiplication over fixed points
Input: Scalars a_1, a_2, · · · , a_n, fixed points P_1, P_2, · · · , P_n, radix q, scalar length h, multiplier set M = {m_0, m_1, · · · , m_{M−1}}, bucket set B = {b_0, b_1, · · · , b_{B−1}}.
Output: S_{n,r} = Σ_{i=1}^{n} a_i P_i.
1: Precompute a length-nhM point array precomputation, such that precomputation[M((i−1)h + j) + k] = m_k q^j P_i. Precompute a hash table mindex to record the index of every multiplier, such that mindex[m_k] = k. Precompute a hash table bindex to record the index of every bucket, such that bindex[b_k] = k.
2: Convert every a_i to its standard q-ary form, then convert it to a_i = Σ_{j=0}^{h−1} m_{ij} b_{ij} q^j.
3: Create a length-nh scalar array scalars, such that scalars[(i−1)h + j] = b_{ij}. Create a length-nh array points recording the indices of points, such that points[(i−1)h + j] = M((i−1)h + j) + mindex[m_{ij}]. The n-scalar multiplication S_{n,r} is then equivalent to the following nh-scalar multiplication

Σ_{i=0}^{nh−1} scalars[i] · precomputation[points[i]],

where every scalar in scalars is from the bucket set B.
4: Create a length-B point array buckets to record the intermediate subsums, and initialize every point to infinity. For 0 ≤ i ≤ nh − 1, add the point precomputation[points[i]] to the bucket buckets[bindex[scalars[i]]].
5: Invoke Algorithm 3 to compute Σ_{i=0}^{B−1} b_i · buckets[i], and return the result.
If we denote the expected number of zero elements in the length-nh array scalars as f, and assume all elements of the length-B array buckets in Step 5 are nonzero, then the average cost can be estimated as

(nh + B + d − f) · A. (21)
From (19) and (21) we can see that, given n and r, in order to reduce the time complexity of computing S_{n,r}, we can choose a larger radix q to make h smaller, or find a smaller bucket set B. These two alternatives are closely related.

Here are two examples of utilizing this framework.
Example 1. Under this framework, Pippenger's variant presented in Section 2.3.2 has

M = {−1, 1}, B = {0, 1, 2, · · · , 2^{c−1}}.
Example 2. For radix q = 2^c such that

q^{h−1} < r ≤ 1/4 · q · q^{h−1},

denote λ = q mod 3, λ ∈ {1, 2}. The multiplier set is picked as

M = {1, −1, 3, −3}, (22)

and the corresponding bucket set is

B = {i | 0 ≤ i ≤ q/4} ∪ {3i − λ for all i s.t. q/4 ≤ 3i − λ ≤ q/2}. (23)

It can be shown that this is a valid construction and that B ≤ q/3 + 2; thus the pair (M, B) yields an algorithm to compute S_{n,r} using at most

(nh + ⌈q/3⌉) · A.
5 A construction of multiplier set and bucket set
In this section, we construct a pair of multiplier set and bucket set (M, B) that can be utilized to speed up the computation of S_{n,r} under the framework presented in Section 4. The essential difficulty in reducing the size of B is to make sure that every scalar in [0, r − 1] can be converted to its radix-q representation where every digit is the product of an element from M and an element from B.

Given a scalar a (0 ≤ a < r) in its standard q-ary representation

a = Σ_{j=0}^{h−1} a_j q^j, 0 ≤ a_j < q, (24)

we will show that our construction enables the scalar conversion from this standard q-ary form to the required radix-q representation, thus yielding an efficient S_{n,r} computation algorithm.
5.1 Our construction
For radix q = 2^c (10 ≤ c ≤ 31), the multiplier set is picked as

M = {1, 2, 3, −1, −2, −3}. (25)

Bucket set B is established by an algorithm.

In order to determine B, let us first define three auxiliary sets B_0, B_1 and B_2. Let r_{h−1} = ⌊r/q^{h−1}⌋ be the maximum leading term of a scalar in its standard q-ary expression,

B_0 = {0} ∪ {b | 1 ≤ b ≤ q/2, s.t. ω_2(b) + ω_3(b) ≡ 0 mod 2},
B_2 = {0} ∪ {b | 1 ≤ b ≤ r_{h−1} + 1, s.t. ω_2(b) + ω_3(b) ≡ 0 mod 2}, (26)

where ω_2(b) represents the exponent of the factor 2 in b, and ω_3(b) represents the exponent of the factor 3 in b. For instance, if b = 2^e k with 2 ∤ k, then ω_2(b) = e. From the definitions, B_0 (or B_2) has the property that for all 0 ≤ t ≤ q/2 (or 0 ≤ t ≤ r_{h−1} + 1), there exist an element b ∈ B_0 (or b ∈ B_2) and an integer m ∈ {1, 2, 3} such that

t = mb.

Set B_0 itself is a valid bucket set construction, which was also mentioned in [BGMW95]. Since we can utilize negative elements in M, there are redundant elements to be removed from B_0. Set B_1 is defined by Algorithm 5, and the following Property 1 holds for B_1.
Algorithm 5 Construction of auxiliary set B_1
Input: B_0, q.
Output: B_1.
1: B_1 = B_0
2: for i = q/4 to q/2 − 1 by 1 do
3:     if i is in B_0 and q − 2·i is in B_0 then
4:         B_1.remove(q − 2·i)
5: for i = ⌊q/6⌋ to q/4 − 1 by 1 do
6:     if i is in B_0 and q − 3·i is in B_0 then
7:         B_1.remove(q − 3·i)
8: return B_1
Property 1. Given q = 2^c (10 ≤ c ≤ 31), for all t (0 ≤ t ≤ q) there exist an element b ∈ B_1 and an integer m ∈ {1, 2, 3} such that

t = mb or t = q − mb.

This property is verified by computation using Algorithm 7. It is also asserted by computation that exchanging the two for loops in Algorithm 5 would construct the same B_1.

Finally the bucket set is proposed to be

B = B_1 ∪ B_2. (27)
Property 2. For the multiplier set M and the bucket set B defined in (25) and (27), a scalar a (0 ≤ a < r) can be expressed (not necessarily uniquely) as follows,

a = Σ_{j=0}^{h−1} m_j b_j q^j, m_j ∈ M, b_j ∈ B. (28)
Proof. By Property 1 we know that an arbitrary integer t ∈ [0, q] can be expressed as

t = mb + αq, m ∈ M, b ∈ B, α ∈ {0, 1},

and by the definition of B_2 we know that an integer t ∈ [0, r_{h−1} + 1] can be expressed as

t = mb, m ∈ {1, 2, 3}, b ∈ B.

Back to Property 2, Algorithm 6 can be used to convert a from its standard q-ary representation defined in (24) to its radix-q representation defined in (28).
Algorithm 6 Scalar conversion II
Input: {a_j}_{0≤j≤h−1}, 0 ≤ a_j < q, such that a = Σ_{j=0}^{h−1} a_j q^j.
Output: {(m_j, b_j)}_{0≤j≤h−1}, m_j ∈ M, b_j ∈ B, such that a = Σ_{j=0}^{h−1} m_j b_j q^j.
1: for j = 0 to h − 2 by 1 do
2:     Obtain m_j, b_j, α_j such that a_j = m_j b_j + α_j q
3:     a_{j+1} = α_j + a_{j+1}
4: Obtain m_{h−1}, b_{h−1} such that a_{h−1} = m_{h−1} b_{h−1}
5: return {(m_j, b_j)}_{0≤j≤h−1}
The correctness of Algorithm 6 comes from the facts that

i) a_0 ∈ [0, q − 1],
ii) α_j + a_{j+1} ∈ [0, q] for all 0 ≤ j ≤ h − 3,
iii) α_{h−2} + a_{h−1} ∈ [0, r_{h−1} + 1].
For every t ∈ [0, q], a hash table H is precomputed to store its decomposition, i.e., H(t) = (m, b, α) such that t = mb + αq, m ∈ M, b ∈ B, α ∈ {0, 1}. Steps 2 and 4 of Algorithm 6 are executed by retrieving the decomposition from this hash table. For the proposed (M, B), hash table H can be realized by a length-(q + 1) array decomposition using Algorithm 7. decomposition is also utilized to verify Property 1, by checking whether decomposition contains any entry whose last element is −1.
Algorithm 7 Construction of digit decomposition hash table
Input: M, B defined in (25) and (27).
Output: Length-(q + 1) array decomposition, a realization of hash table H.
1: Define a length-(q + 1) array decomposition and initialize every entry to [0, 0, −1].
2: for m ∈ {−1, −2, −3} do
3:     for b ∈ B do
4:         if m·b + q ≥ 0 then
5:             decomposition[m·b + q] = [m, b, 1]
6: for m ∈ {1, 2, 3} do
7:     for b ∈ B do
8:         if m·b ≤ q then
9:             decomposition[m·b] = [m, b, 0]
10: return decomposition
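An end-to-end sketch of Algorithm 7 followed by Algorithm 6, over radix q = 2^10. For brevity the bucket set here is the unpruned B_0 ∪ B_2 rather than the paper's B_1 ∪ B_2 (pruning shrinks the table but does not affect the correctness of the decomposition); R is the BLS12-381 group order given in Section 6, and the function names are ours.

```python
# BLS12-381 group order (Section 6).
R = 0x73eda753299d7d483339d80809a1d80553bda402fffe5bfeffffffff00000001

def omega(b, p):
    """Exponent of the prime p in the factorization of b."""
    e = 0
    while b % p == 0:
        b //= p
        e += 1
    return e

def build_bucket_set(q, h):
    """Unpruned bucket set B0 ∪ B2 (the paper's B additionally prunes B0)."""
    even = lambda b: (omega(b, 2) + omega(b, 3)) % 2 == 0
    B0 = {0} | {b for b in range(1, q // 2 + 1) if even(b)}
    r_top = R // q ** (h - 1)                 # maximum leading digit r_{h-1}
    B2 = {0} | {b for b in range(1, r_top + 2) if even(b)}
    return B0 | B2

def build_decomposition(q, B):
    """Algorithm 7: decomposition[t] = (m, b, alpha) with t = m*b + alpha*q."""
    table = [(0, 0, -1)] * (q + 1)
    for m in (-1, -2, -3):
        for b in B:
            if m * b + q >= 0:
                table[m * b + q] = (m, b, 1)
    for m in (1, 2, 3):                       # positive entries overwrite negative ones
        for b in B:
            if m * b <= q:
                table[m * b] = (m, b, 0)
    return table

def convert_scalar(a, q, h, table):
    """Algorithm 6: express a as sum of m_j * b_j * q^j with m_j in M, b_j in B."""
    digits = [(a // q ** j) % q for j in range(h)]
    pairs = []
    for j in range(h - 1):
        m, b, alpha = table[digits[j]]
        pairs.append((m, b))
        digits[j + 1] += alpha                # propagate the carry
    # The top digit (at most r_{h-1} + 1 after the carry) decomposes with alpha = 0,
    # because the positive-m entries overwrite the negative ones in the table.
    m, b, _ = table[digits[h - 1]]
    pairs.append((m, b))
    return pairs
```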
When instantiated over the BLS12-381 curve, it is calculated that approximately

B = 0.21q, if q = 2^c (10 ≤ c ≤ 31, c ≠ 15, 16, 17),
B = 0.28q, if q = 2^16. (29)

See Section 6 and Appendix A for the detailed parameters. It is checked that the maximum difference between two neighboring elements in B is d = 6.
For a point P = (x, y) on an elliptic curve E in short Weierstrass form, −P = (x, −y) can be obtained at almost no cost, hence the points associated with negative elements in M can be excluded from the precomputation table. Correspondingly, in Step 3 of Algorithm 4, a length-nh boolean array is added to record the signs of the multipliers. In Step 4, if a multiplier is negative, the corresponding point is subtracted from the intermediate subsum.
By the computational cost estimation formula presented in (19), we have the following Proposition 1.

Proposition 1. Given the number of points n and the group order r, suppose q = 2^c (10 ≤ c ≤ 31) and h = ⌈log_q r⌉. The multiplier set and bucket set defined in (25) and (27) yield an algorithm to compute the MSM S_{n,r} over the BLS12-381 curve using at most approximately

(nh + 0.21q) · A, if q = 2^c (10 ≤ c ≤ 31, c ≠ 15, 16, 17),
(nh + 0.28q) · A, if q = 2^16, (30)

with the help of the 3nh precomputed points

{m q^j P_i | 1 ≤ i ≤ n, 0 ≤ j ≤ h − 1, m ∈ {1, 2, 3}}.
6 Instantiation
In this section we instantiate our construction over the BLS12-381 curve [BLS02, Bow17], and present some theoretical analysis against Pippenger's bucket method and Pippenger's variant.
The BLS12-381 curve is a pairing-friendly elliptic curve initially designed by Sean Bowe for the cryptocurrency system Zcash [Zca, Bow17]. It is widely deployed in blockchain applications such as Zcash, Ethereum [eth], Chia [chi], DFINITY [dfi] and Algorand [alg]. It provides about 126-bit security [Pol78, BD19, GMT20].
BLS12381 curve is deﬁned by the equation
E:y2=x3+ 4
over the prime ﬁeld Fp, where
p=0x1a0111ea397fe69a4b1ba7b6434bacd7
64774b84f38512bf6730d2a0f6b0f624
1eabfffeb153ffffb9feffffffffaaab
is the 381bit ﬁeld characteristic (in hexadecimal), and its embedding degree is 12. Two
subgroups
G1⊂E
(
Fp
)and
G2⊂E
(
Fp2
)over which bilinear pairings are deﬁned have the
same 255bit prime order
r=0x73eda753299d7d483339d80809a1d805
53bda402fffe5bfeffffffff00000001.
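The stated parameters are easy to sanity-check with a few lines of Python (the hexadecimal constants are the ones above):

```python
# Field characteristic p and subgroup order r of BLS12-381, as given above.
p = int(
    "1a0111ea397fe69a4b1ba7b6434bacd7"
    "64774b84f38512bf6730d2a0f6b0f624"
    "1eabfffeb153ffffb9feffffffffaaab", 16)
r = int(
    "73eda753299d7d483339d80809a1d805"
    "53bda402fffe5bfeffffffff00000001", 16)

assert p.bit_length() == 381          # 381-bit prime field
assert r.bit_length() == 255          # 255-bit subgroup order
assert pow(p, 12, r) == 1             # r | p^12 - 1 ...
assert all(pow(p, k, r) != 1 for k in range(1, 12))  # ... and 12 is minimal
```

The last two assertions confirm the embedding degree: 12 is the smallest k with r | p^k − 1.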
6.1 Theoretical analysis

Radix q is called optimal if the number of additions required to compute S_{n,r} in the worst case is minimized. The optimal q and its corresponding scalar length h for the different methods are summarized in Table 2. The precomputation size presented in this table is in terms of points in G1 with affine coordinates; the precomputation size over G2 would be double its counterpart over G1.

Pippenger's bucket method and Pippenger's variant are the two methods introduced in Section 2.3.2. Our construction refers to the proposed construction presented in Section 5.
Table 2: Radix q, length h, and precomputation size utilized to compute S_{n,r}

        Pippenger               Pippenger variant       Our construction
n       q      h   Storage      q      h   Storage      q      h   Storage
2^10    2^8    32  96.0 KB      2^12   22  2.06 MB      2^13   20  5.62 MB
2^11    2^10   26  192 KB       2^13   20  3.75 MB      2^14   19  10.6 MB
2^12    2^10   26  384 KB       2^13   20  7.50 MB      2^14   19  21.3 MB
2^13    2^11   24  768 KB       2^14   19  14.2 MB      2^16   16  36.0 MB
2^14    2^12   22  1.50 MB      2^16   16  24.0 MB      2^16   16  72.0 MB
2^15    2^13   20  3.00 MB      2^16   16  48.0 MB      2^16   16  144 MB
2^16    2^13   20  6.00 MB      2^16   16  96.0 MB      2^19   14  252 MB
2^17    2^16   16  12.0 MB      2^18   15  180 MB       2^20   13  468 MB
2^18    2^16   16  24.0 MB      2^19   14  336 MB       2^20   13  936 MB
2^19    2^16   16  48.0 MB      2^20   13  624 MB       2^20   13  1.83 GB
2^20    2^16   16  96.0 MB      2^20   13  1.22 GB      2^22   12  3.38 GB
2^21    2^19   14  192 MB       2^22   12  2.25 GB      2^22   12  6.75 GB
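The storage column of Table 2 can be reproduced directly: Pippenger's method stores the n input points, the variant stores nh points, and our construction stores 3nh points, each an affine G1 point of 96 bytes. A small check, shown for the n = 2^20 row (our script, not part of the paper's code):

```python
R_BITS = 255                  # bit length of the group order r
POINT_BYTES = 96              # affine BLS12-381 G1 point: two 48-byte coordinates

def h_for(c):
    """Scalar length h = ceil(log_{2^c} r); for 10 <= c <= 31 this equals ceil(255/c)."""
    return -(-R_BITS // c)

def pretty(nbytes):
    """Format a byte count the way Table 2 does (1024-based units)."""
    for unit in ("B", "KB", "MB", "GB"):
        if nbytes < 1024:
            return f"{nbytes:.3g} {unit}"
        nbytes /= 1024

# The n = 2^20 row of Table 2: radixes 2^16 (Pippenger), 2^20 (variant), 2^22 (ours).
n = 2 ** 20
print(pretty(n * POINT_BYTES))                    # Pippenger stores n points:  96 MB
print(pretty(n * h_for(20) * POINT_BYTES))        # variant stores nh points:   1.22 GB
print(pretty(3 * n * h_for(22) * POINT_BYTES))    # ours stores 3nh points:     3.38 GB
```

Running the same arithmetic for the other rows reproduces the remaining storage figures.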
The number of additions taken to compute S_{n,r} in the worst case and their comparison are summarized in Table 3, where
• Improv1 = ((Our construction) − Pippenger) / Pippenger,
• Improv2 = ((Our construction) − (Pippenger variant)) / (Pippenger variant).
Table 3 shows that, theoretically, when computing S_{n,r} over the BLS12-381 curve for n = 2^e (10 ≤ e ≤ 21), our construction saves 21% to 40% of the additions compared to Pippenger's bucket method, and it saves 2.6% to 9.6% of the additions compared to Pippenger's variant.
Table 3: Comparison of the number of additions taken to compute S_{n,r} in the worst case

n      Pippenger     Pippenger variant   Our construction   Improv1   Improv2
2^10   3.69 × 10^4   2.46 × 10^4         2.22 × 10^4        39.8%     9.6%
2^11   6.66 × 10^4   4.51 × 10^4         4.23 × 10^4        36.4%     6.1%
2^12   1.20 × 10^5   8.60 × 10^4         8.12 × 10^4        32.2%     5.6%
2^13   2.21 × 10^5   1.64 × 10^5         1.49 × 10^5        32.4%     8.8%
2^14   4.06 × 10^5   2.95 × 10^5         2.80 × 10^5        30.8%     4.9%
2^15   7.37 × 10^5   5.57 × 10^5         5.43 × 10^5        26.4%     2.6%
2^16   1.39 × 10^6   1.08 × 10^6         1.03 × 10^6        26.3%     5.0%
2^17   2.62 × 10^6   2.10 × 10^6         1.92 × 10^6        26.6%     8.2%
2^18   4.72 × 10^6   3.93 × 10^6         3.63 × 10^6        23.1%     7.7%
2^19   8.91 × 10^6   7.34 × 10^6         7.04 × 10^6        21.1%     4.1%
2^20   1.73 × 10^7   1.42 × 10^7         1.35 × 10^7        22.2%     4.9%
2^21   3.30 × 10^7   2.73 × 10^7         2.60 × 10^7        21.2%     4.5%
It is noted that the proposed bucket sets listed in Appendix A are sufficient to compute S_{n,r} over BLS12-381 for n = 2^e (22 ≤ e ≤ 28). Our method still shows a 2.8% ∼ 5.8% theoretical improvement against Pippenger's variant in those cases, but its drawback is that the precomputation table would be too large.
6.2 Time complexity: worst case versus average case

We show in this section that the difference between the worst-case time complexity and the average-case time complexity is tiny, hence the worst case is used in this paper as the representative. The result relies on the group order r, which is why we perform the average-case analysis after instantiation. It is done by estimating the expected number of zero elements, denoted f, in the length-nh array scalars of Algorithm 4.
Suppose the group order in its standard q-ary form is r = Σ_{j=0}^{h−1} r_j q^j. For every uniformly randomly picked scalar a (0 ≤ a < r), when a is converted to the standard q-ary form

a = Σ_{j=0}^{h−1} a_j q^j,  0 ≤ a_j < q,

for simplicity we assume that

Pr[a_j = 0] ≈ 1/q,  Pr[a_j = q − 1] ≈ 1/q,  0 ≤ j ≤ h − 2,

and

Pr[a_{h−1} = 0] = q^{h−1} / r = q^{h−1} / (Σ_{j=0}^{h−1} r_j q^j) ≈ 1 / (r_{h−1} + 1).
Let us first do the analysis for our construction. When a scalar a is converted by Algorithm 6 from its standard q-ary form to the radix-q representation

a = Σ_{j=0}^{h−1} m_j b_j q^j,  m_j ∈ M, b_j ∈ B,  (31)

we know that b_j = 0 (1 ≤ j ≤ h−2) if and only if a_j = 0 and the carry bit from the previous digit α_{j−1} = 0, or a_j = q − 1 and the carry bit α_{j−1} = 1. Assume the probability of the carry bit being 0 is λ, which is equal to the probability of α = 0 in the array decomposition decided by Algorithm 7; then

Pr[b_j = 0] = λ · (1/q) + (1 − λ) · (1/q) = 1/q,  1 ≤ j ≤ h − 2,

and

Pr[b_0 = 0] = 1/q,  Pr[b_{h−1} = 0] = λ · 1/(r_{h−1} + 1).

If a scalar is converted to the representation in (31), the expected number of j's such that b_j = 0 is

(h − 1)/q + λ/(r_{h−1} + 1).

When running Algorithm 4, the expected number of zeros in scalars is

f = n(h − 1)/q + λ · n/(r_{h−1} + 1).  (32)
Define

I = (worst-case time complexity − average-case time complexity) / (worst-case time complexity)

to measure the difference between the worst-case and the average-case time complexity. Our method utilizes a radix q comparable to n, so n(h − 1)/q is a small number that can be ignored. It follows that

I = f/(nh + B + 2) ≈ (λ · n/(r_{h−1} + 1))/(nh + B + 2) < (λ · n/(r_{h−1} + 1))/(nh) = λ/((r_{h−1} + 1) · h).  (33)
For the radixes q = 2^c (10 ≤ c ≤ 22, c ≠ 15, 17) used in Table 2, the triad (q, h, r_{h−1}) can be found in Appendix A, and it is checked that λ < 0.7 for those radixes. It follows that I < 1%, which means the difference is small.
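The bound in (33) can be evaluated directly from r: for q = 2^c, the leading base-q digit is r_{h−1} = ⌊r/q^{h−1}⌋. Sweeping the Table 2 radixes with the checked bound λ < 0.7 confirms I < 1% (our check):

```python
# Group order r of BLS12-381 and the checked bound on lambda.
r = 0x73eda753299d7d483339d80809a1d80553bda402fffe5bfeffffffff00000001
lam = 0.7

worst = 0.0
for c in range(10, 23):               # the Table 2 radixes q = 2^c ...
    if c in (15, 17):                 # ... skipping the abandoned ones
        continue
    h = -(-r.bit_length() // c)       # h = ceil(log_q r) = ceil(255/c) here
    r_top = r >> (c * (h - 1))        # leading base-q digit r_{h-1}
    worst = max(worst, lam / ((r_top + 1) * h))

assert worst < 0.01                   # bound (33): I < 1% for every radix in Table 2
```

The same shift also reproduces the r_{h−1} column of Table 6, e.g. r >> 240 = 29677 for q = 2^16.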
A similar analysis also applies to Pippenger's bucket method and Pippenger's variant, and (33) still holds for them. Those two methods have λ ≈ 0.5. For q = 2^c (8 ≤ c ≤ 22), their I is even smaller compared to our construction.
7 Implementation

In order to assess the cost of scalar conversion and the impact of the memory locality issues caused by large precomputation sizes, we conducted an experiment on the basis of blst, a BLS12-381 signature library written in C and assembly [bls]. The blst library includes the addition/doubling arithmetic and an implementation of Pippenger's bucket method over G1 and G2. We implemented Pippenger's variant and our construction following Algorithm 4, and we invoked the Pippenger's bucket method implementation built into blst.
7.1 Implementation analysis

In terms of the scalar conversion, which is Step 2 of Algorithm 4, a scalar is given as a length-8 uint32_t array. Both Pippenger's bucket method and Pippenger's variant need to first convert the scalar to its standard q-ary form, then convert it to the expression where the absolute value of every digit is no more than q/2 using Algorithm 2. Our construction first converts the scalar to its standard q-ary form, then converts it to the expression where every digit is the product of an element from the multiplier set and an element from the bucket set using Algorithm 6, with the help of the decomposition hash table. We utilize a length-(q + 1) array decomposition to realize this hash table, so the concern boils down to retrieving data from the array decomposition.

In terms of Step 4 in Algorithm 4, where all points are sorted into different buckets, each addition is done between a point fetched from the array precomputation and another point fetched from the array buckets. We treat the n fixed points as the length-n precomputation array for Pippenger's bucket method. We have the following observations.
•Pippenger’s bucket method compute htimes the equation (9), so in total it fetches
data
nh
times from its length
n
array
precomputation
, it fetches data
nh
times
from its length(0.5q+ 1) array buckets.
•
Pippenger’s variant fetches data
nh
times from its length
nh
array
precomputation
,
it fetches data nh times from its length(0.5q+ 1) array buckets.
•
Our construction fetches data
nh
times from its length3
nh
array
precomputation
, it
fetches data
nh
times from its array
buckets
, whose length is roughly 0
.
21
q
(
q6
= 2
c
,
c∈ {15,16,17}).
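For a concrete row of Table 2 (n = 2^18), these fetch counts and array lengths can be tabulated (our illustration; the buckets length for our construction uses the B ≈ 0.21q estimate from (29)):

```python
# The n = 2^18 row of Table 2: q = 2^16 (Pippenger), 2^19 (variant), 2^20 (ours).
n = 2 ** 18

def fetch_profile(h, precomp_len, buckets_len):
    """Each method performs nh fetches from each of the two arrays."""
    return {"fetches": n * h, "precomp_len": precomp_len, "buckets_len": buckets_len}

pip  = fetch_profile(16, n,          2 ** 16 // 2 + 1)     # length-(0.5q+1) buckets
var  = fetch_profile(14, n * 14,     2 ** 19 // 2 + 1)
ours = fetch_profile(13, 3 * n * 13, int(0.21 * 2 ** 20))  # B ~ 0.21q buckets

# Fewer fetches (smaller h), but into larger arrays.
assert ours["fetches"] < var["fetches"] < pip["fetches"]
assert ours["precomp_len"] > var["precomp_len"] > pip["precomp_len"]
```

This is exactly the trade-off discussed next: a smaller h reduces the number of fetch operations at the price of larger arrays.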
Pippenger’s variant and our construction show some advantages here regarding the num
ber of fetch operations, since their
h
’s are usually smaller than that of Pippenger’s
bucket method. Their disadvantages are that the fetch operations are executed in larger
precomputation
and
buckets
arrays. Step 4 of Algorithm 4is a simple loop, so we utilize
prefetch to mitigate the impact of memory access of large arrays.
It is noted that, in terms of fetching data from the buckets array, our construction has some advantage over Pippenger's variant when the same radix q is used, because our construction uses a smaller buckets array in this case. Even if our radix q (q ≠ 2^16) is twice as big, our construction still keeps this advantage.
7.2 Experimental result

Our experiment was done on an Apple 14-inch MacBook Pro with a 3.2 GHz M1 Pro chip and 16 GB of memory. The M1 Pro has advantages such as a large cache size and high memory bandwidth. Most importantly, its cache line is 128 bytes, which is sufficient to accommodate a BLS12-381 G1 point, whose size is 96 bytes. These characteristics are expected to provide some benefit when fetching data from large arrays.
The experimental results are presented in Tables 4 and 5. Both Pippenger's variant and our construction use the optimal radixes presented in Table 2, while the Pippenger's bucket method built into blst utilizes slightly different radixes, explicitly,

q = 2^{e−2} for n = 2^e (10 ≤ e ≤ 12),
q = 2^{e−3} for n = 2^e (13 ≤ e ≤ 21).

We keep blst's implementation intact because, on the one hand, our focus is on the comparison between Pippenger's variant and our construction, and, on the other hand, blst's implementation can serve as a performance benchmark. In Table 4, s.c. v. represents the time spent by Pippenger's variant to do the scalar conversion for all n scalars in S_{n,r}, while s.c. c. represents that of our construction. In Table 5, Improv1 is the comparison between Pippenger's bucket method and our construction, and Improv2 is the comparison between Pippenger's variant and our construction.
Both Pippenger’s variant and our construction show a huge improvement compared
to Pippenger’s bucket method, which demonstrates the feasibility of speeding up the
computation of Sn,r using large precomputation tables.
If we focus on the comparison between Pippenger's variant and our construction, we have the following observations when computing S_{n,r} in G1 for n = 2^e (10 ≤ e ≤ 21), and in G2 for n = 2^e (10 ≤ e ≤ 20)^1:
• In terms of the computation time of scalar conversion, in G1 Pippenger's variant takes 0.9 ∼ 1.5% of its entire S_{n,r} computation time, while our construction takes 1.1 ∼ 2.8%. Because in G2 the addition arithmetic takes relatively more time compared to that in G1, the percentages are smaller: in G2 Pippenger's variant takes 0.4 ∼ 0.6% of its whole S_{n,r} computation time, while our construction takes 0.4 ∼ 1.1%.
• Our construction does not perform well for n = 2^16 and 2^20. For n = 2^16, our optimal radix is q = 2^19, which is 8 times larger than that of Pippenger's variant. For n = 2^20, our optimal radix is q = 2^22, which is 4 times larger than that of Pippenger's variant. Since the radix value is even larger than n, it has a negative impact on fetching data from the array buckets, as the analysis in the previous section indicates. When we change to smaller radixes, specifically q = 2^18 for n = 2^16 and q = 2^20 for n = 2^20, our construction outperforms Pippenger's variant again, as the results marked by an asterisk show, although theoretically those radixes are not optimal.
• Our construction outperforms Pippenger's variant for n = 2^e (10 ≤ e ≤ 21, e ≠ 16, 20). In those cases, our construction demonstrates a 3.1% ∼ 9.2% improvement over Pippenger's variant, as Table 5 shows.
Table 4: Experimental time taken to compute S_{n,r} by different methods

                            G1                              G2
n       s.c. v.   s.c. c.   Pip.      Pip. v.   Constr.    Pip.      Pip. v.   Constr.
2^10    123 us    134 us    15.2 ms   9.91 ms   9.03 ms    37.3 ms   24.4 ms   22.2 ms
2^11    248 us    265 us    27.1 ms   18.3 ms   17.1 ms    66.4 ms   44.9 ms   41.9 ms
2^12    497 us    548 us    48.5 ms   34.2 ms   32.2 ms    119 ms    83.3 ms   78.5 ms
2^13    920 us    657 us    89.4 ms   64.8 ms   62.0 ms    221 ms    160 ms    155 ms
2^14    1.07 ms   1.23 ms   165 ms    122 ms    114 ms     404 ms    300 ms    279 ms
2^15    2.14 ms   2.40 ms   303 ms    224 ms    217 ms     734 ms    541 ms    522 ms
2^16    4.29 ms   6.47 ms   551 ms    422 ms    430 ms     1.35 s    1.03 s    1.05 s
*2^16   4.29 ms   7.63 ms   554 ms    424 ms    418 ms     1.34 s    1.03 s    1.01 s
2^17    12.1 ms   14.9 ms   1.06 s    864 ms    822 ms     2.54 s    2.05 s    1.99 s
2^18    17.8 ms   28.0 ms   1.93 s    1.60 s    1.49 s     4.69 s    3.88 s    3.61 s
2^19    31.3 ms   54.6 ms   3.55 s    2.98 s    2.83 s     8.63 s    7.28 s    6.83 s
2^20    62.7 ms   149 ms    6.84 s    5.62 s    5.63 s     16.7 s    13.7 s    13.5 s
*2^20   62.7 ms   109 ms    6.84 s    5.61 s    5.51 s     16.6 s    13.7 s    13.3 s
2^21    120 ms    296 ms    13.2 s    11.2 s    10.7 s     −         −         −

^1 We did not test in G2 for n = 2^21 due to the memory size restriction of the test device.
Table 5: Our method versus Pippenger's bucket method and Pippenger's variant

               G1                     G2
n       Improv1   Improv2     Improv1   Improv2
2^10    40.6%     8.86%       40.6%     9.26%
2^11    36.8%     6.54%       37.0%     6.78%
2^12    33.7%     5.78%       34.2%     5.74%
2^13    30.7%     4.40%       29.7%     3.13%
2^14    31.2%     6.54%       31.0%     7.29%
2^15    28.4%     3.19%       29.0%     3.61%
2^16    21.9%     −1.88%      21.8%     −2.05%
*2^16   24.6%     1.48%       24.9%     2.10%
2^17    22.1%     4.91%       21.6%     3.08%
2^18    22.8%     6.75%       23.0%     7.03%
2^19    20.3%     5.12%       20.8%     6.06%
2^20    17.7%     −0.13%      18.8%     1.45%
*2^20   19.4%     1.69%       17.9%     2.66%
2^21    19.0%     4.31%       −         −
Acknowledgments

We would like to thank the reviewers for providing detailed and valuable comments that helped us revise the manuscript. This work is supported by NSERC SPG and the Ripple University Research Grant.
References

[alg] Algorand: The blockchain for FutureFi. https://www.algorand.com/.

[BC89] Jurjen Bos and Matthijs Coster. Addition chain heuristics. In Conference on the Theory and Application of Cryptology, pages 400–407. Springer, 1989.

[BCG+14] Eli Ben-Sasson, Alessandro Chiesa, Christina Garman, Matthew Green, Ian Miers, Eran Tromer, and Madars Virza. Zerocash: Decentralized anonymous payments from Bitcoin. In 2014 IEEE Symposium on Security and Privacy, SP 2014, Berkeley, CA, USA, May 18–21, 2014, pages 459–474. IEEE Computer Society, 2014.

[BD19] Razvan Barbulescu and Sylvain Duquesne. Updating key size estimations for pairings. Journal of Cryptology, 32(4):1298–1336, 2019.

[BDFG21] Dan Boneh, Justin Drake, Ben Fisch, and Ariel Gabizon. Halo Infinite: Proof-carrying data from additive polynomial commitments. In Tal Malkin and Chris Peikert, editors, Advances in Cryptology – CRYPTO 2021 – 41st Annual International Cryptology Conference, CRYPTO 2021, Virtual Event, August 16–20, 2021, Proceedings, Part I, volume 12825 of Lecture Notes in Computer Science, pages 649–680. Springer, 2021.

[BDLO12] Daniel J. Bernstein, Jeroen Doumen, Tanja Lange, and Jan-Jaap Oosterwijk. Faster batch forgery identification. In International Conference on Cryptology in India, pages 454–473. Springer, 2012.
[bel] bellman: A crate for building zk-SNARK circuits. https://github.com/zkcrypto/bellman.

[Ber06] Daniel J. Bernstein. Differential addition chains. https://cr.yp.to/ecdh/diffchain20060219.pdf, 2006.

[BFS20] Benedikt Bünz, Ben Fisch, and Alan Szepieniec. Transparent SNARKs from DARK compilers. In Anne Canteaut and Yuval Ishai, editors, Advances in Cryptology – EUROCRYPT 2020 – 39th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Zagreb, Croatia, May 10–14, 2020, Proceedings, Part I, volume 12105 of Lecture Notes in Computer Science, pages 677–706. Springer, 2020.

[BGMW95] Ernest F. Brickell, Daniel M. Gordon, Kevin S. McCurley, and David B. Wilson. Fast exponentiation with precomputation: Algorithms and lower bounds. Preprint, March 27, 1995.

[bls] blst: a BLS12-381 signature library focused on performance and security, written in C and assembly. https://github.com/supranational/blst.

[BLS02] Paulo S. L. M. Barreto, Ben Lynn, and Michael Scott. Constructing elliptic curves with prescribed embedding degrees. In International Conference on Security in Communication Networks, pages 257–267. Springer, 2002.

[Bow17] Sean Bowe. BLS12-381: New zk-SNARK elliptic curve construction, 2017.

[Bro15] Daniel R. Brown. Multi-dimensional Montgomery ladders for elliptic curves, February 17, 2015. US Patent 8,958,551.

[CGGN17] Matteo Campanelli, Rosario Gennaro, Steven Goldfeder, and Luca Nizzardo. Zero-knowledge contingent payments revisited: Attacks and payments for services. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 229–243, 2017.

[chi] Chia Network: a better blockchain and smart transaction platform. https://www.chia.net/.

[CHM+19] Alessandro Chiesa, Yuncong Hu, Mary Maller, Pratyush Mishra, Noah Vesely, and Nicholas P. Ward. Marlin: Preprocessing zk-SNARKs with universal and updatable SRS. IACR Cryptology ePrint Archive, 2019:1047, 2019.

[DFGK14] George Danezis, Cédric Fournet, Jens Groth, and Markulf Kohlweiss. Square span programs with applications to succinct NIZK arguments. In Palash Sarkar and Tetsu Iwata, editors, Advances in Cryptology – ASIACRYPT 2014 – 20th International Conference on the Theory and Application of Cryptology and Information Security, Kaoshiung, Taiwan, R.O.C., December 7–11, 2014. Proceedings, Part I, volume 8873 of Lecture Notes in Computer Science, pages 532–550. Springer, 2014.

[dfi] DFINITY Foundation: Internet Computer. https://dfinity.org/.

[DKS09] Christophe Doche, David R. Kohel, and Francesco Sica. Double-base number system for multi-scalar multiplications. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 502–517. Springer, 2009.
[DR94] Peter De Rooij. Efficient exponentiation using precomputation and vector addition chains. In Workshop on the Theory and Application of Cryptographic Techniques, pages 389–399. Springer, 1994.

[Ebe] Jacob Eberhardt. ZoKrates. https://zokrates.github.io/.

[eth] Ethereum: a technology that's home to digital money, global payments, and applications. https://ethereum.org/en/.

[GGPR13] Rosario Gennaro, Craig Gentry, Bryan Parno, and Mariana Raykova. Quadratic span programs and succinct NIZKs without PCPs. In Thomas Johansson and Phong Q. Nguyen, editors, Advances in Cryptology – EUROCRYPT 2013, 32nd Annual International Conference on the Theory and Applications of Cryptographic Techniques, Athens, Greece, May 26–30, 2013. Proceedings, volume 7881 of Lecture Notes in Computer Science, pages 626–645. Springer, 2013.

[GJW20] Ariel Gabizon and Zachary J. Williamson. Proposal: The TurboPLONK program syntax for specifying SNARK programs. 2020.

[GKR+21] Lorenzo Grassi, Dmitry Khovratovich, Christian Rechberger, Arnab Roy, and Markus Schofnegger. Poseidon: A new hash function for zero-knowledge proof systems. In 30th USENIX Security Symposium (USENIX Security 21), pages 519–535, 2021.

[GLS11] Steven D. Galbraith, Xibin Lin, and Michael Scott. Endomorphisms for faster elliptic curve cryptography on a large class of curves. Journal of Cryptology, 24(3):446–469, 2011.

[GLV01] Robert P. Gallant, Robert J. Lambert, and Scott A. Vanstone. Faster point multiplication on elliptic curves with efficient endomorphisms. In Annual International Cryptology Conference, pages 190–200. Springer, 2001.

[GMT20] Aurore Guillevic, Simon Masson, and Emmanuel Thomé. Cocks–Pinch curves of embedding degrees five to eight and optimal ate pairing computation. Designs, Codes and Cryptography, 88(6):1047–1081, 2020.

[gna] gnark zk-SNARK library. https://github.com/ConsenSys/gnark.

[GOS06] Jens Groth, Rafail Ostrovsky, and Amit Sahai. Non-interactive zaps and new techniques for NIZK. In Cynthia Dwork, editor, Advances in Cryptology – CRYPTO 2006, 26th Annual International Cryptology Conference, Santa Barbara, California, USA, August 20–24, 2006, Proceedings, volume 4117 of Lecture Notes in Computer Science, pages 97–111. Springer, 2006.

[GOS12] Jens Groth, Rafail Ostrovsky, and Amit Sahai. New techniques for noninteractive zero-knowledge. Journal of the ACM, 59(3):11:1–11:35, 2012.

[Gro06] Jens Groth. Simulation-sound NIZK proofs for a practical language and constant size group signatures. In Xuejia Lai and Kefei Chen, editors, Advances in Cryptology – ASIACRYPT 2006, 12th International Conference on the Theory and Application of Cryptology and Information Security, Shanghai, China, December 3–7, 2006, Proceedings, volume 4284 of Lecture Notes in Computer Science, pages 444–459. Springer, 2006.
[Gro09] Jens Groth. Linear algebra with sub-linear zero-knowledge arguments. In Shai Halevi, editor, Advances in Cryptology – CRYPTO 2009, 29th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 16–20, 2009. Proceedings, volume 5677 of Lecture Notes in Computer Science, pages 192–208. Springer, 2009.

[Gro10] Jens Groth. Short pairing-based non-interactive zero-knowledge arguments. In Masayuki Abe, editor, Advances in Cryptology – ASIACRYPT 2010 – 16th International Conference on the Theory and Application of Cryptology and Information Security, Singapore, December 5–9, 2010. Proceedings, volume 6477 of Lecture Notes in Computer Science, pages 321–340. Springer, 2010.

[Gro16] Jens Groth. On the size of pairing-based non-interactive arguments. In Marc Fischlin and Jean-Sébastien Coron, editors, Advances in Cryptology – EUROCRYPT 2016 – 35th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Vienna, Austria, May 8–12, 2016, Proceedings, Part II, volume 9666 of Lecture Notes in Computer Science, pages 305–326. Springer, 2016.

[GS12] Jens Groth and Amit Sahai. Efficient noninteractive proof systems for bilinear groups. SIAM Journal on Computing, 41(5):1193–1232, 2012.

[GWC19] Ariel Gabizon, Zachary J. Williamson, and Oana Ciobotaru. PLONK: Permutations over Lagrange-bases for oecumenical noninteractive arguments of knowledge. IACR Cryptol. ePrint Arch., page 953, 2019.

[Knu97] Donald E. Knuth. The Art of Computer Programming, vol. 2 (3rd ed.), Seminumerical Algorithms. Addison-Wesley Longman, 1997.

[Lip12] Helger Lipmaa. Progression-free sets and sub-linear pairing-based non-interactive zero-knowledge arguments. In Ronald Cramer, editor, Theory of Cryptography – 9th Theory of Cryptography Conference, TCC 2012, Taormina, Sicily, Italy, March 19–21, 2012. Proceedings, volume 7194 of Lecture Notes in Computer Science, pages 169–189. Springer, 2012.

[Mat82] David W. Matula. Basic digit sets for radix representation. Journal of the ACM (JACM), 29(4):1131–1143, 1982.

[MBKM19] Mary Maller, Sean Bowe, Markulf Kohlweiss, and Sarah Meiklejohn. Sonic: Zero-knowledge SNARKs from linear-size universal and updatable structured reference strings. In Lorenzo Cavallaro, Johannes Kinder, XiaoFeng Wang, and Jonathan Katz, editors, Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, CCS 2019, London, UK, November 11–15, 2019, pages 2111–2128. ACM, 2019.

[Mon87] Peter L. Montgomery. Speeding the Pollard and elliptic curve methods of factorization. Mathematics of Computation, 48(177):243–264, 1987.

[Mon92] Peter L. Montgomery. Evaluating recurrences of form x_{m+n} = f(x_m, x_n, x_{m−n}) via Lucas chains, 1983. https://cr.yp.to/bib/1992/montgomerylucas.pdf, 1992.

[Pip76] Nicholas Pippenger. On the evaluation of powers and related problems. In 17th Annual Symposium on Foundations of Computer Science (SFCS 1976), pages 258–263. IEEE Computer Society, 1976.
[Pol78] John M. Pollard. Monte Carlo methods for index computation. Mathematics of Computation, 32(143):918–924, 1978.

[Rao15] Srinivasa Rao Subramanya Rao. A note on Schoenmakers algorithm for multi-exponentiation. In 2015 12th International Joint Conference on e-Business and Telecommunications (ICETE), volume 4, pages 384–391. IEEE, 2015.

[SIM12] Vorapong Suppakitpaisarn, Hiroshi Imai, and Masato Edahiro. Fastest multi-scalar multiplication based on optimal double-base chains. In World Congress on Internet Security (WorldCIS-2012), pages 93–98. IEEE, 2012.

[Str64] Ernst G. Straus. Addition chains of vectors (problem 5125). American Mathematical Monthly, 70:806–808, 1964.

[YWLT13] Wei Yu, Kunpeng Wang, Bao Li, and Song Tian. Joint triple-base number system for multi-scalar multiplication. In International Conference on Information Security Practice and Experience, pages 160–173. Springer, 2013.

[Zca] Zcash: Privacy-protecting digital currency. https://z.cash/.
Appendix

A Our bucket set constructions over the BLS12-381 curve

Table 6 lists our bucket set constructions for q = 2^c (10 ≤ c ≤ 31, c ≠ 15, 17). The radixes 2^15 and 2^17 are abandoned because B/q is too large.
Table 6: Bucket sets over the BLS12-381 curve

q      h    r_{h−1}    B           d   B/q
2^10   26   28         218         6   0.213
2^11   24   3          427         6   0.208
2^12   22   7          857         6   0.209
2^13   20   231        1725        6   0.211
2^14   19   7          3417        6   0.209
2^15   17   29677      17312       4   0.528
2^16   16   29677      18343       6   0.280
2^17   15   118710     69249       4   0.528
2^18   15   7          54618       6   0.208
2^19   14   231        109244      6   0.208
2^20   13   29677      220931      6   0.211
2^21   13   7          436906      6   0.208
2^22   12   7419       874437      6   0.208
2^23   12   3          1747625     6   0.208
2^24   11   29677      3497731     6   0.208
2^25   11   28         6990507     6   0.208
2^26   10   1899369    14139299    6   0.211
2^27   10   3709       27962333    6   0.208
2^28   10   7          55924059    6   0.208
2^29   9    7597479    112481229   6   0.210
2^30   9    29677      223698691   6   0.208
2^31   9    115        447392434   6   0.208