
An Efficient and Parallel Gaussian Sampler for Lattices

Chris Peikert∗

April 13, 2011

Abstract

At the heart of many recent lattice-based cryptographic schemes is a polynomial-time algorithm that,

given a ‘high-quality’ basis, generates a lattice point according to a Gaussian-like distribution. Unlike

most other operations in lattice-based cryptography, however, the known algorithm for this task (due to

Gentry, Peikert, and Vaikuntanathan; STOC 2008) is rather inefficient, and is inherently sequential.

We present a new Gaussian sampling algorithm for lattices that is efficient and highly parallelizable.

We also show that in most cryptographic applications, the algorithm’s efficiency comes at almost no cost in

asymptotic security. At a high level, our algorithm resembles the “perturbation” heuristic proposed as part

of NTRUSign (Hoffstein et al., CT-RSA 2003), though the details are quite different. To our knowledge,

this is the first algorithm and rigorous analysis demonstrating the security of a perturbation-like technique.

1 Introduction

In recent years, there has been rapid development in the use of lattices for constructing rich cryptographic
schemes.1 These include digital signatures (both ‘tree-based’ [LM08] and ‘hash-and-sign’ [GPV08,

CHKP10]), identity-based encryption [GPV08] and hierarchical IBE [CHKP10, ABB10], noninteractive

zero knowledge [PV08], and even a fully homomorphic cryptosystem [Gen09].

The cornerstone of many of these schemes (particularly, but not exclusive to, those that ‘answer queries’)

is the polynomial-time algorithm of [GPV08] that samples from a so-called discrete Gaussian probability

distribution over a lattice Λ. More precisely, for a vector c ∈ R^n and a “width” parameter s > 0, the
distribution D_{Λ+c,s} assigns a probability proportional to exp(−π‖v‖^2/s^2) to each v ∈ Λ + c (and probability

zero elsewhere). Given c, a basis B of Λ, and a sufficiently large s (related to the ‘quality’ of B), the GPV

algorithm outputs a sample from a distribution statistically close to D_{Λ+c,s}. (Equivalently, by subtracting c

from the output, it samples a lattice point from a Gaussian distribution centered at −c.) Informally speaking,

the sampling algorithm is ‘zero-knowledge’ in the sense that it leaks no information about its input basis B

(aside from a bound on its quality), because D_{Λ+c,s} is defined without reference to any particular basis. This

zero-knowledge property accounts for its broad utility in lattice-based cryptography.
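For intuition, the one-dimensional analogue of this distribution (a discrete Gaussian over Z with center c) can be sampled by simple rejection. The sketch below is our own illustration, not part of [GPV08]; the function name and the tail cut t = 10 are arbitrary choices:

```python
import math
import random

def sample_dgauss_1d(c: float, s: float) -> int:
    """Sample v in Z with Pr[v] proportional to exp(-pi * (v - c)^2 / s^2),
    i.e., a one-dimensional discrete Gaussian of width s centered at c.
    The negligibly likely tail beyond |v - c| > t*s is truncated."""
    t = 10
    lo, hi = math.floor(c - t * s), math.ceil(c + t * s)
    while True:
        v = random.randint(lo, hi)  # uniform proposal over the truncated range
        # Accept v with probability proportional to its Gaussian weight.
        if random.random() <= math.exp(-math.pi * (v - c) ** 2 / s ** 2):
            return v
```

Rejection from a uniform proposal is wasteful for large s; it is shown only to make the definition of the distribution concrete in dimension one.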

While the sampling algorithm of [GPV08] has numerous applications in cryptography and beyond, for

both practical and theoretical purposes it also has some drawbacks:

∗School of Computer Science, College of Computing, Georgia Institute of Technology. Email: cpeikert@cc.gatech.edu.

This material is based upon work supported by the National Science Foundation under Grant CNS-0716786. Any opinions, findings,

and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of

the National Science Foundation.

1A lattice Λ ⊂ R^n is a periodic ‘grid’ of points, or more formally, a discrete subgroup of R^n under addition. It is generated by a
(not necessarily unique) basis B ⊂ R^{n×k} of k linearly independent vectors, as Λ = {Bz : z ∈ Z^k}. In this paper we are concerned
only with full-rank lattices, i.e., where k = n.


• First, it is rather inefficient: on an n-dimensional lattice, a straightforward implementation requires

exact arithmetic on an n × n matrix having Ω(n)-bit entries (even ignoring some additional log n

factors). While approximate arithmetic and other optimizations may be possible in certain cases, great

care would be needed to maintain the proper output distribution, and the algorithm’s essential structure

appears difficult to make truly practical.

• Second, it is inherently sequential: to generate a sample, the algorithm performs n adaptive iterations,

where the choices made in each iteration affect the values used in the next. This stands in stark contrast

to other ‘embarrassingly parallelizable’ operations that are typical of lattice-based cryptography.

1.1 Contributions

We present a new algorithm that samples from a discrete Gaussian distribution DΛ+c,sover a lattice, given a

‘high-quality’ basis for Λ. The algorithm is especially well-suited to ‘q-ary’ integer lattices, i.e., sublattices
of Z^n that themselves contain qZ^n as a sublattice, for some known and typically small q ≥ 2. These

include NTRU lattices [HPS98] and the family of random lattices that enjoy ‘worst-case hardness,’ as first

demonstrated by Ajtai [Ajt96]. Most modern lattice-based cryptographic schemes (including those that rely

on Gaussian sampling) are designed around q-ary lattices, so they are a natural target for optimization.
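To make the q-ary condition concrete, here is a toy membership test for an Ajtai-style lattice Λ = {z ∈ Z^n : Az ≡ 0 (mod q)}; the particular matrix A, modulus q, and dimensions below are made-up illustration values, not parameters from any scheme:

```python
q, n = 5, 4
A = [[1, 2, 3, 4],
     [2, 0, 1, 3]]  # a 2 x 4 matrix over Z_q (toy values)

def in_lattice(z):
    """z is a lattice point iff A z = 0 (mod q)."""
    return all(sum(a * x for a, x in zip(row, z)) % q == 0 for row in A)

# Every vector q*e_i is in the lattice, so the lattice contains q*Z^n
# as a sublattice -- exactly the q-ary property.
for i in range(n):
    assert in_lattice([q if j == i else 0 for j in range(n)])

# The lattice also contains short points outside q*Z^n, e.g.:
assert in_lattice([2, 0, 1, 0])
```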

The key features of our algorithm, as specialized to n-dimensional q-ary lattices, are as follows. It is:

• Offline / online: when the lattice basis is known in advance of the point c (which is the norm in

cryptographic applications), most of the work can be performed as offline precomputation. In fact, the

offline phase may be viewed simply as an extension of the application’s key-generation algorithm.

• Simple and efficient: the online phase involves only O(n^2) integer additions and multiplications modulo
q or q^2, where the O-notation hides a small constant ≈ 4.

• Fully parallelizable: for any P up to n^2, the online phase can allocate O(n^2/P) of its operations to

each of P processors.

• High-quality: for random bases that are commonly used in cryptographic schemes, our algorithm can

sample from a Gaussian of essentially the same ‘quality’ as the prior GPV algorithm; this is important

for the concrete security of applications. See Section 1.2.1 below for a full discussion.

We emphasize that for a practical implementation, parallelized operations on small integers represent

a significant performance advantage. Most modern computer processors have built-in support for “vector”

instructions (also known as “single instruction, multiple data”), which perform simple operations on entire

vectors of small data elements simultaneously. Our algorithm can exploit these operations very naturally. For

a detailed efficiency comparison between our algorithm and that of [GPV08], see Section 1.2.2 below.

At a very high level, our algorithm resembles the “perturbation” heuristic proposed for the NTRUSign

signature scheme [HHGP+03], but the details differ significantly; see Section 1.3 for a comparison. To our

knowledge, this is the first algorithm and analysis to demonstrate the theoretical soundness of a perturbation-

like technique. Finally, the analysis of our algorithm relies on some new general facts about ‘convolutions’ of

discrete Gaussians, which we expect will be applicable elsewhere. For example, these facts allow for the use

of a clean discrete Gaussian error distribution (rather than a ‘rounded’ Gaussian) in the “learning with errors”

problem [Reg05], which may be useful in certain applications.


1.2 Comparison with the GPV Algorithm

Here we give a detailed comparison of our new sampling algorithm to the previous one of [GPV08]. The two

main points of comparison are the width (‘quality’) of the sampled Gaussian, and the algorithmic efficiency.

1.2.1 Gaussian Width

One of the important properties of a discrete Gaussian sampling algorithm is the width s of the distribution

it generates, as a function of the input basis. In cryptographic applications, the width is the main quantity

governing the concrete security and, if applicable, the approximation factor of the underlying worst-case

lattice problems. This is because in order for the scheme to be secure, it must be hard for an adversary to find a

lattice point within the likely radius s√n of the Gaussian (i.e., after truncating its negligibly likely tail). The

wider the distribution, the more leeway the adversary has in an attack, and the larger the scheme’s parameters

must be to compensate. On the other hand, a more efficient sampling algorithm can potentially allow for the

use of larger parameters without sacrificing performance.

The prior sampling algorithm of [GPV08], given a lattice basis B = {b1, . . . , bn}, can sample from
a discrete Gaussian having width as small as ‖B̃‖ = max_i ‖b̃i‖, where B̃ denotes the Gram-Schmidt
orthogonalization of B.2 (Actually, the width also includes a small ω(√log n) factor, which is also present in
our new algorithm, so for simplicity we ignore it in this summary.) As a point of comparison, ‖B̃‖ is always
at most max_i ‖bi‖, and in some cases it can be substantially smaller.

In contrast, our new algorithm works for a width s as small as the largest singular value s1(B) of the
basis B, or equivalently, the square root of the largest eigenvalue of the Gram matrix BB^t. It is easy to
show that s1(B) is always at least max_i ‖bi‖, so our new algorithm cannot sample from a narrower Gaussian
than the GPV algorithm can. At the same time, any basis B can always be efficiently processed (without
increasing ‖B̃‖) to guarantee that s1(B) ≤ n · ‖B̃‖, so our algorithm is at worst an n factor looser than that
of [GPV08].

While a factor of n gap between the two algorithms may seem rather large, in cryptographic applications
this worst-case ratio is actually immaterial; what matters is the relative performance on the random bases
that are used as secret keys. Here the situation is much more favorable. First, we consider the basis-generation
algorithms of [AP09] (following [Ajt99]) for ‘worst-case-hard’ q-ary lattices, which are used
in most theoretically sound cryptographic applications. We show that with a minor modification, one of
the algorithms from [AP09] outputs (with overwhelming probability) a basis B for which s1(B) is only an
O(√log q) factor larger than ‖B̃‖ (which itself is asymptotically optimal, as shown in [AP09]). Because
q is typically a small polynomial in n, this amounts to a cost of only an O(√log n) factor in the width of
the Gaussian. Similarly, when the vectors of B are themselves drawn from a discrete Gaussian, as in the
basis-delegation technique of [CHKP10], we can show that s1(B) is only an ω(√log n) factor larger than ‖B̃‖
(with overwhelming probability). Therefore, in cryptographic applications the performance improvements
of our algorithm can come at almost no asymptotic cost in security. Of course, a concrete evaluation of the
performance/security trade-off for real-world parameters would require careful analysis and experiments,
which we leave for later work.
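The width quantities being compared can be computed directly. The following sketch is our own illustration (the small basis and the helper `max_gram_schmidt_norm` are made up for this example); it checks the chain max_i ‖b̃i‖ ≤ max_i ‖bi‖ ≤ s1(B) numerically:

```python
import numpy as np

def max_gram_schmidt_norm(B):
    """max_i ||b~_i||, where b~_i is b_i projected orthogonally to
    span(b_1, ..., b_{i-1}); columns of B are the basis vectors."""
    ortho, best = [], 0.0
    for i in range(B.shape[1]):
        v = B[:, i].astype(float).copy()
        for u in ortho:
            v -= (v @ u) * u          # remove components along earlier vectors
        best = max(best, float(np.linalg.norm(v)))
        ortho.append(v / np.linalg.norm(v))
    return best

B = np.array([[4., 1., 0.],
              [1., 5., 2.],
              [0., 1., 6.]])          # toy basis, columns b_1, b_2, b_3
s1 = np.linalg.svd(B, compute_uv=False)[0]            # largest singular value
gs = max_gram_schmidt_norm(B)                         # ||B~||
col = max(np.linalg.norm(B[:, i]) for i in range(3))  # max_i ||b_i||
assert gs <= col + 1e-9    # Gram-Schmidt never lengthens a vector
assert col <= s1 + 1e-9    # s1(B) dominates every column norm
```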

1.2.2 Efficiency

We now compare the efficiency of the two known sampling algorithms. We focus on the most common case of

q-ary n-dimensional integer lattices, where a ‘good’ lattice basis (whose vectors have length much less than

2In the Gram-Schmidt orthogonalization B̃ of B, the vector b̃i is the projection of bi orthogonally to span(b1, . . . , bi−1).


q) is initially given in an offline phase, followed by an online phase in which a desired center c ∈ Z^n is given.

This scenario allows for certain optimizations in both algorithms, which we include for a fair comparison.

The sampling algorithm from [GPV08] can use the offline phase to compute the Gram-Schmidt orthogonalization
of its given basis; this requires Ω(n^4 log^2 q) bit operations and Ω(n^3) bits of intermediate storage.

The online phase performs n sequential iterations, each of which computes an inner product between a

Gram-Schmidt vector having Ω(n)-bit entries, and an integer vector whose entries have magnitude at most q.

In total, these operations require Ω(n^3 log q) bit operations. In addition, each iteration performs a certain

randomized-rounding operation, which, while asymptotically poly(logn)-time, is not especially practical

(nor precomputable) because it uses rejection sampling on a value that is not known until the online phase.

Lastly, while the work within each iteration may be parallelized, the iterations themselves must be performed

sequentially.

Our algorithm is more efficient and practical in the running time of both phases, and in the amount of

intermediate storage between phases. The offline phase first computes a matrix inverse modulo q^2, and a

‘square root’ of a matrix whose entries have magnitude at most q; these can be computed in O(n^3 log^2 q) bit

operations. Next, it generates and stores one or more short integer ‘perturbation’ vectors (one per future call

to the online phase), and optionally discards the matrix square root. The intermediate storage is therefore

as small as O(n^2 log q) bits for the matrix inverse, plus O(n log q) bits per perturbation vector. Optionally,

the offline phase can also precompute the randomized-rounding operations, due to the small number of

possibilities that can occur online. The online phase simply computes about 4n^2 integer additions and
multiplications (2n^2 of each) modulo q or q^2, which can be fully parallelized among up to n^2 processors.

Lastly, we mention that our sampling algorithm translates very naturally to the setting of compact q-ary

lattices and bases over certain rings R that are larger than Z, where security is based on the worst-case

hardness of ideal lattices in R (see, e.g., [Mic02, SSTX09, LPR10]). In contrast to GPV, our algorithm can

directly take advantage of the ring structure for further efficiency, yielding a savings of an Ω̃(n) factor in the

computation times and intermediate storage.

1.3 Overview of the Algorithm

The GPV sampling algorithm [GPV08] is based closely on Babai’s “nearest-plane” decoding algorithm for
lattices [Bab86]. Babai’s algorithm takes a point c ∈ R^n and a lattice basis B = {b1, . . . , bn}, and for
i = n, . . . , 1 computes a coefficient zi ∈ Z for bi by iteratively projecting (‘rounding’) c orthogonally to the
nearest hyperplane of the form zi bi + span(b1, . . . , bi−1). The output is the lattice vector Σ_i zi bi, whose
distance from the original c can be bounded by the quality of B. The GPV algorithm, whose goal is instead
to sample from a discrete Gaussian centered at c, uses randomized rounding in each iteration to select a
‘nearby’ plane, under a carefully defined probability distribution. (This technique is also related to another
randomized-rounding algorithm of Klein [Kle00] for a different decoding problem.)

In addition to his nearest-plane algorithm, Babai also proposed a simpler (but somewhat looser) lattice
decoding algorithm, which we call “simple rounding.” In this algorithm, a given point c ∈ R^n is rounded to
the lattice point B⌊B^{-1}c⌉, where each coordinate of B^{-1}c ∈ R^n is independently rounded to its nearest
integer. With precomputation of B^{-1}, this algorithm can be quite practical — especially on q-ary lattices,
where several more optimizations are possible. Moreover, it is trivially parallelized among up to n^2 processors.
Unfortunately, in its deterministic form it turns out to be completely insecure for ‘answering queries’ (e.g.,
digital signatures), as demonstrated by Nguyen and Regev [NR06].

A natural question, given the approach of [GPV08], is whether a randomized variant of Babai’s simple-rounding
algorithm is secure. Specifically, the natural way of randomizing the algorithm is to round each
coordinate of B^{-1}c to a nearby integer (under a discrete Gaussian distribution over Z, which can be sampled

efficiently), then left-multiply by B as before. Unlike with the randomized nearest-plane algorithm, though,

the resulting probability distribution here is unfortunately not spherical, nor does it leak zero knowledge.

Instead, it is a ‘skewed’ (elliptical) Gaussian, where the skew mirrors the ‘geometry’ of the basis. More

precisely, the covariance matrix E_x[(x − c)(x − c)^t] of the distribution (about its center c) is approximately
BB^t, which captures the entire geometry of the basis B, up to rigid rotation. Because covariance can be

measured efficiently from only a small number of samples, the randomized simple-rounding algorithm leaks

this geometry.3
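As a sketch (our own code, with toy basis and parameters, not the paper's implementation), randomized simple rounding rounds each coordinate of B^{-1}c under a one-dimensional discrete Gaussian and always returns a genuine lattice point, even though its distribution is skewed by B:

```python
import math
import random
import numpy as np

def sample_dgauss_z(c, s):
    """Discrete Gaussian over Z: Pr[v] proportional to exp(-pi*(v-c)^2/s^2),
    sampled by rejection with the tail cut at 10*s."""
    lo, hi = math.floor(c - 10 * s), math.ceil(c + 10 * s)
    while True:
        v = random.randint(lo, hi)
        if random.random() <= math.exp(-math.pi * (v - c) ** 2 / s ** 2):
            return v

def randomized_simple_round(B, c, s):
    """Randomized simple rounding: round each coordinate of B^{-1} c to a
    nearby integer under a discrete Gaussian, then left-multiply by B.
    The output B z is always a lattice point, but over many samples its
    covariance about c is approximately proportional to B B^t (the leak)."""
    t = np.linalg.solve(B, c)                          # B^{-1} c
    z = np.array([sample_dgauss_z(ti, s) for ti in t])
    return B @ z

B = np.array([[4., 1.],
              [1., 3.]])
x = randomized_simple_round(B, np.array([0.3, 0.7]), 2.0)
z = np.linalg.solve(B, x)
assert np.allclose(z, np.round(z))   # x lies in the lattice generated by B
```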

Our solution prevents such leakage, in a manner inspired by the following facts. Recall that if X and Y are

two independent random variables, the probability distribution of their sum X + Y is the convolution of their

individual distributions. In addition, for continuous (not necessarily spherical) Gaussians, covariance matrices

are additive under convolution. In particular, if Σ1 and Σ2 are covariance matrices such that Σ1 + Σ2 = s^2 I,
then the convolution of two Gaussians with covariance matrices Σ1, Σ2 (respectively) is a spherical Gaussian

with standard deviation s.
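These covariance facts are easy to check empirically for continuous Gaussians. In this sketch (the basis and width are toy values of our choosing), samples from two independent Gaussians with complementary covariances Σ1 = BB^t and Σ2 = s^2 I − Σ1 sum to an essentially spherical distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
B = np.array([[4., 1.],
              [1., 3.]])              # toy basis
s = 6.0                               # exceeds the largest singular value of B
Sigma1 = B @ B.T                      # skewed covariance (the rounding step)
Sigma2 = s**2 * np.eye(2) - Sigma1    # complementary covariance

# Sigma2 must be positive definite for such a Gaussian to exist.
assert np.all(np.linalg.eigvalsh(Sigma2) > 0)

# Covariances of independent Gaussians add under convolution (summation),
# so x + y should have covariance s^2 * I, i.e., be spherical.
N = 200_000
x = rng.multivariate_normal(np.zeros(2), Sigma1, size=N)
y = rng.multivariate_normal(np.zeros(2), Sigma2, size=N)
C = np.cov((x + y).T)
assert np.allclose(C, s**2 * np.eye(2), atol=1.0)
```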

The above facts give the basic idea for our algorithm, which is to convolve the output of the randomized

simple-rounding algorithm with a suitable non-spherical (continuous) Gaussian, yielding a spherically

distributed output. However, note that we want the algorithm to generate a discrete distribution — i.e., it

must output a lattice point — so we should not alter the output of the randomized-rounding step. Instead,

we first perturb the desired center c by a suitable non-spherical Gaussian, then apply randomized rounding

to the resulting perturbed point. Strictly speaking this is not a true convolution, because the rounding step

depends on the output of the perturbation step, but we can reduce the analysis to a true convolution using

bounds related to the “smoothing parameter” of the lattice [MR04].

The main remaining question is: for a given covariance matrix Σ1 = BB^t (corresponding to the
rounding step), for what values of s is there an efficiently sampleable Gaussian having covariance matrix
Σ2 = s^2 I − Σ1? The covariance matrix of any (non-degenerate) Gaussian is symmetric positive definite,
i.e., all its eigenvalues are positive. Conversely, every positive definite matrix is the covariance of some
Gaussian, which can be sampled efficiently by computing a ‘square root’ of the covariance matrix. Since any
eigenvector of Σ1 (with eigenvalue σ^2 > 0) is also an eigenvector of s^2 I (with eigenvalue s^2), it must be an
eigenvector of Σ2 (with eigenvalue s^2 − σ^2) as well. Therefore, a necessary and sufficient condition is that
all the eigenvalues of Σ1 be less than s^2. Equivalently, the algorithm works for any s that exceeds the largest
singular value of the given basis B. More generally, it can sample any (possibly non-spherical) discrete
Gaussian with covariance matrix Σ > Σ1 (i.e., Σ − Σ1 is positive definite).
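Putting the pieces together, the overall structure (perturb offline, then randomized-simple-round online) can be sketched as follows. This is our own simplified rendering: it uses real arithmetic rather than the integer optimizations described earlier, the rounding width r is a made-up parameter, and the smoothing-parameter conditions needed for the formal analysis are omitted:

```python
import math
import random
import numpy as np

def sample_dgauss_z(c, s):
    """Rejection sampler for the discrete Gaussian over Z, tail cut at 10*s."""
    lo, hi = math.floor(c - 10 * s), math.ceil(c + 10 * s)
    while True:
        v = random.randint(lo, hi)
        if random.random() <= math.exp(-math.pi * (v - c) ** 2 / s ** 2):
            return v

def perturbed_sampler(B, c, s, r=2.0):
    """Sketch: sample a perturbation with covariance proportional to
    s^2 I - r^2 B B^t (offline), then randomized-simple-round the perturbed
    center (online). Requires s > r * s1(B), so Sigma2 is positive definite."""
    n = B.shape[0]
    Sigma2 = s**2 * np.eye(n) - r**2 * (B @ B.T)
    assert np.all(np.linalg.eigvalsh(Sigma2) > 0), "need s > r * s1(B)"
    root = np.linalg.cholesky(Sigma2)        # a 'square root' of Sigma2
    # Under the exp(-pi ||x||^2 / s^2) normalization, width s corresponds to
    # covariance s^2/(2*pi); hence the 1/sqrt(2*pi) scaling of the perturbation.
    p = (root @ np.random.standard_normal(n)) / math.sqrt(2 * math.pi)
    t = np.linalg.solve(B, c - p)            # online phase: round B^{-1}(c - p)
    z = np.array([sample_dgauss_z(ti, r) for ti in t])
    return B @ z

B = np.array([[4., 1.],
              [1., 3.]])
x = perturbed_sampler(B, np.array([0.3, 0.7]), s=12.0)
z = np.linalg.solve(B, x)
assert np.allclose(z, np.round(z))   # the output is always a lattice point
```

Note how the expensive steps (the matrix square root and the perturbation draw) depend only on B, not on c, which is what makes the offline/online split possible.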

In retrospect, the high-level structure of our algorithm resembles the “perturbation” heuristic proposed

for NTRUSign [HHGP+03], though the details are quite different. First, the perturbation and rounding steps

in NTRUSign are both deterministic with respect to two or more bases, and there is evidence that this is

insecure [MPSW09], at least for a large polynomial number of signatures. Interestingly, randomization also

allows for improved efficiency, since our perturbations can be chosen with offline precomputation (as opposed

to the deterministic method of [HHGP+03], which is inherently online). Second, the signing and perturbation

bases used in NTRUSign are chosen independently, whereas our perturbations are carefully chosen to conceal

the statistics that would otherwise be leaked by randomized rounding.

3Given the above, one might still wonder whether the covariance BB^t could be simulated efficiently (without any privileged

knowledge about the lattice) when B is itself drawn from a ‘nice’ distribution, such as a discrete Gaussian. Indeed, if the vectors of

B were drawn independently from a continuous Gaussian, the matrix BB^t would have the so-called Wishart distribution, which can

be generated ‘obliviously’ (without knowledge of B itself) using the Bartlett decomposition. (See, e.g., [Ksh59] and references

therein). Unfortunately, these facts do not quite seem to carry over to discrete Gaussians, though they may be useful in another

cryptographic context.
