Notes on Interrogating Random Quantum Circuits
Luís T. A. N. Brandão* and René Peralta*
May 29, 2020
Abstract
Consider a quantum circuit that, when fed a constant input, produces a fixed-length random bit-string in each execution. Executing it many times yields a sample of many bit-strings that contain fresh randomness inherent to the quantum evaluation. When the circuit is freshly selected from a special class, the output distribution of strings cannot be simulated in a short amount of time by a classical (non-quantum) computer. This quantum vs. classical gap of computational efficiency enables ways of inferring that an honest sample contains quantumly generated strings, and therefore fresh randomness. This possibility, initially proposed by Aaronson, has been recently validated in a "quantum supremacy" experiment by Google, using circuits with 53 qubits.
In these notes, we consider the problem of estimating information entropy (a quantitative measure of randomness), based on the sum of "probability values" (here called QC-values) of strings output by quantum evaluation. We assume that the sample of strings, claimed to have been produced by repeated evaluation of a quantum circuit, was in fact crafted by an adversary intending to induce us into over-estimating entropy. We analyze the case of a "collisional" adversary that can over-sample and possibly take advantage of observed collisions.
For diverse false-positive and false-negative rates, we devise parameters for testing the hypothesis that the sample has at least a certain expected entropy. This enables a client to certify the presence of entropy, after a lengthy computation of the QC-values. We also explore a method for low-budget clients to compute fewer QC-values, at the cost of more computation by a server. We conclude with several questions requiring further exploration.
Keywords: certifiable randomness, distinguishability, entropy estimation, gamma distribution, public randomness, quantum randomness, randomness beacons.
*National Institute of Standards and Technology (Gaithersburg, USA). ORCIDs: [0000-0002-4501-089X] and [0000-0002-2318-7563]. Opinions expressed in this paper are from the authors and are not to be construed as official or as views of the U.S. Department of Commerce. Certain commercial entities, equipment, or materials may be identified in this document in order to describe an experimental procedure or concept adequately. Such identification is not intended to imply recommendation or endorsement by NIST, nor is it intended to imply that the entities, materials, or equipment are necessarily the best available for the purpose.
Index of sections
1. Introduction ......................... 2
1.1. System model .................. 2
1.2. Entropy of a sample ............. 4
1.3. Organization .................. 4
2. Exponential model .................... 5
2.1. The frequency-density representation ... 5
2.2. Summary statistics .............. 6
2.3. Entropy per honest string ......... 6
2.4. Sampling with vs. without replacement .. 6
3. Sums of QC-values .................... 7
3.1. Statistics of interest ............ 8
3.2. CDFs of sums of i.i.d. variables ...... 8
3.3. Testing honest sampling ........... 9
3.4. Threshold vs. probability .......... 9
3.5. Sample sizes vs. thresholds ........ 9
4. Low-budget clients ................... 11
4.1. Truncated QC-values ............. 11
4.2. Sum of truncated QC-values (STQC) .... 12
5. Entropy estimation ................... 12
5.1. Overview .................... 13
5.2. The client ................... 13
5.3. The pseudo-fidelity adversary ....... 14
5.4. The collisional adversary .......... 16
5.5. Final randomness for applications ..... 20
5.6. Classes of adversaries ............ 20
6. Concluding remarks ................... 21
Acknowledgments ....................... 22
References ........................... 22
A. Terminology ........................ 22
A.1. Abbreviations ................. 22
A.2. Acronyms .................... 23
A.3. Symbols ..................... 23
B. Expected value and variance ............ 23
B.1. Auxiliary primitives ............. 23
B.2. Expected values ................ 24
B.3. Variances ................... 24
B.4. The chosen-count sampling case ...... 24
C. Sum of QC-values (SQCs) ............... 24
C.1. CLT approximation .............. 24
C.2. Exact Gamma distributions ......... 25
C.3. A Gamma approximation ........... 27
D. Discretization of QC-values ............ 28
D.1. Individual probabilities ........... 28
D.2. Collisions ................... 28
D.3. Approximate sum of "entropies" ...... 30
E. Tables with more detail ............... 32
Page 1 of 35
List of Figures
Figure 1 QC-values upon uniform string sampling . 5
Figure 2 QC-values upon quantum string sampling . 5
Figure 3 Various PDFs of QC-values ........ 6
Figure 4 Various PDFs of SQC ............ 8
Figure 5 PDF approximations of QC-values .... 9
Figure 6 PDF approximations of SQC ........ 9
Figure 7 Inverse CDFs of SQC ........... 10
Figure 8 Inverse CDFs of SQC ........... 10
Figure 9 Sample size vs. FN=FP .......... 10
Figure 10 Sample size vs. FN=FP ......... 11
List of Tables
Table 1 Statistics of QC-values ........... 6
Table 2 Expected number of collisions ....... 7
Table 3 Statistics of SQCs of strings ....... 8
Table 4 Number of strings for SQC distinguishability 11
Table 5 TQC truncation thresholds ......... 11
Table 6 Statistics of truncated QCs ........ 12
Table 7 Number of client-verified TQC values ... 12
Table 8 Number of strings for SQC distinguishability 16
Table 9 Comparison pseudo-fidelity vs. collisional . 19
Table 10 Gamma vs. Normal (CLT) approximations . 27
Table 11 Entropy approximations: Uniform ..... 31
Table 12 Entropy approximations: Uniform ..... 32
Table 13 Entropy approximations: Quantum .... 32
Table 14 Sample size for SQC distinguishability . 33
Table 15 Sample size for STQC distinguishability . 34
Table 16 Statistics per bin and budget factor ... 35
1. Introduction
The recent experimental proof of quantum supremacy using Noisy Intermediate-Scale Quantum (NISQ) devices showed that quantum circuits with 50+ qubits can now be sampled with significant fidelity [AABB+19]. Using this technology, it is possible, in principle, to generate a sample of bit-strings that can at a later time be "certified" as containing strings that were quantumly sampled [Aar19], implying it can be externally verified that the sample contained at least a minimum of fresh entropy.
We address the following two related questions: Under a claim that a sequence of bit-strings has been generated by sampling a given quantum circuit, how much entropy can be safely assumed to be contained in it? Given an entropy goal, how many strings should be sampled to enable a verification with high assurance? We consider an adversarial setting, as usual in cryptography, where the claimant tries to trick us into over-estimating entropy.
A metrological viewpoint. In the scope of the National Quantum Initiative Act [NQIA] from the U.S. Congress, the National Institute of Standards and Technology (NIST) is interested in the development of quantum computing and its applications. A potential application, within reach of the current or soon-to-be-reached state of the art, is the production of certifiable randomness based on the evaluation of random quantum circuits [Aar19]. The Computer Security Division at NIST has a special interest in the area of randomness, which has an essential role in cryptography. "Certifiable" randomness in particular may be useful in the context of public randomness, such as that produced by randomness beacons [KBPB19].
In these notes we take the viewpoint of a metrology body [NIST] in doing a preliminary evaluation of the parameters of a potential application for obtaining certifiable randomness. We consider a cryptographic perspective, e.g., when asking what false-positive and false-negative rates should be considered in distinguishability experiments. This is a preliminary analysis and should be taken as such. We do not investigate here the complexity-theoretic basis for the assumption that sampling from certain quantum circuits can be done efficiently with a quantum computer but not classically [AC16]. However, we explore how certain attacks drastically reduce the entropy of the set of bit-strings to be certified. The analysis thus illustrates the need to define appropriate safety margins for diverse parameters (e.g., the number of strings to sample), and rules out certain ranges thereof.
1.1. System model
1.1.1. The operator
We want to compare an honest operator of the quantum computer, which generates a sample with fresh entropy via an honest quantum evaluation (leading to the sample being accepted with a statistically high probability, i.e., a low false-negative rate), against a malicious operator, which minimizes the amount of entropy in a sample crafted to still be acceptable with a not-too-low probability (i.e., a not-too-low false-positive rate).
In either case, we assume that the circuit received for quantum evaluation was unpredictable to the operator. It could for example be based on fresh public randomness, or have been provided by a client interested in the experiment. The operator then needs to output a sample of strings (supposedly obtained by evaluation of the circuit) within a short amount of time, namely before being able to compute their output probabilities.
The honest case. The honest operator of the quantum computer repeatedly evaluates the quantum circuit, to probabilistically obtain output strings that, based on the computational model, inherently contain fresh entropy. It is assumed that sampling from such a probability distribution in a fast way is only possible by way of said quantum computation. Later, a classical super-computer performs a lengthy computation (e.g., of a few days) of the output probabilities of the output strings (e.g., see Ref. [PGNHW19] for an analysis of the complexity of simulating probability values for circuits with 54 qubits). Finally, a statistical analysis of those probabilities confirms that some strings must have been output by quantum evaluation, and, consequently, that the sample set contains entropy that was fresh at the time of the sample generation. Such randomness is then denoted as "certified" (or "certifiable") randomness.
The adversarial case. In adversarial contexts (as with cryptographic applications), we are interested in scenarios where the operator of the quantum computer wants to trick us into accepting a maliciously produced sample. Therefore, we consider that the sampling may have been performed in a variety of ways, such as:
- uniformly at random from a defined set of strings;
- as a pseudo-randomly generated output computed from a fixed secret seed;
- using rejection sampling on the output of the circuit;
- a mix of the above and other unknown methods.
The malicious goal is to minimize the entropy of the sample, while having a not-too-low probability of it being a posteriori accepted by the statistical test performed by a client, who will compute and take into consideration the probability values of the strings in the sample. We consider concrete specifications of adversaries in Section 5.
The quantum computation. The operator can use a quantum computer to quantumly evaluate circuits that output strings of n bits. We denote by {0,1}^n the set of n-bit strings. There are N = 2^n such strings. The honest computer implementation is characterized by a fidelity parameter φ. The sampling is with probability φ from correct quantum evaluation of the circuit, and otherwise (i.e., with probability 1 − φ) uniform from {0,1}^n. As of this writing, both Google and IBM report having quantum computers with more than 50 qubits. As a reference in this work, we use the specification reported about the quantum computer of Google, which can evaluate circuits with n = 53 qubits at a fidelity of about φ = 0.002 [AABB+19].
When honestly evaluating the quantum circuit, with fidelity φ, the computer operator is not able to distinguish whether an output string was obtained by uniform selection (with probability 1 − φ) or by a correct circuit evaluation (with probability φ). A malicious operator can naturally decide to sample strings in arbitrary ways, but (in the considered model) cannot determine, before an expensive and lengthy classical computation, anything new about the probability that a given string would have been output from a correct circuit evaluation.
1.1.2. Circuits and probabilities
The circuits, with particular specifications [AABB+19, Fig. 3], are selected from a class with large cardinality, such that it is infeasible to precompute useful information about a non-negligible proportion of circuits.
QC-values. Each quantum circuit C has its own probabilistic distribution of output strings upon quantum evaluation. We are interested in the set

  QCvalues(C) = { Prob_C(x) : x ∈ {0,1}^n }    (1)

of "probabilities of occurrence" (here denoted as "QC-values") of the output strings. For simplicity we use a set, assuming all probabilities are different; otherwise we could describe a list with possible repetitions.
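As an illustration of how QC-values arise, the sketch below simulates a toy "circuit" on n = 8 qubits as a Haar-random unitary (an illustrative stand-in for the specific circuit class of [AABB+19], which is not reproduced here), and computes the output probabilities on the constant all-zeros input:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                 # toy number of qubits (the notes use n = 53)
N = 2 ** n

# Haar-random unitary via QR decomposition of a complex Gaussian matrix,
# with the standard diagonal phase correction.
z = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
q, r = np.linalg.qr(z)
d = np.diagonal(r)
u = q * (d / np.abs(d))

# QC-values: the output probabilities on the input |0...0>.
qc_values = np.abs(u[:, 0]) ** 2
assert np.isclose(qc_values.sum(), 1.0)

# Under the exponential (Porter-Thomas) model, the fraction of strings with
# QC-value <= 1/N should be close to 1 - 1/e ≈ 0.632 for large N.
print(float(np.mean(qc_values <= 1 / N)))
```

For n = 53 this direct state-vector approach is of course infeasible; that gap is exactly what the certification argument relies on.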
Assumptions. The subsequent analysis in these notes is based on assumptions whose coverage we have not independently investigated. (Different assumptions may invalidate some of our estimates of security or entropy in adversarial settings.) An important notion is what we call a "short amount of time": the time duration between the moment the adversary learns the circuit specification and the deadline for publishing a sample of output strings. At a high level, the assumptions are:
1. for all circuits in the class, the QC-values of the output strings fit an exponential model: the density of QC-values (real numbers between 0 and 1) is assumed well approximated by an "exponential" curve. More concretely, the "frequency density" is a normalized negative exponential function f(p) = N·e^(−N·p) of the "probability value" p [BISB+18];
2. a quantum computer can efficiently evaluate the circuit many times within a short amount of time;
3. without prior knowledge of (an approximation of) the probability values, classical computers cannot, within a short amount of time, simulate a circuit evaluation with the appropriate output distribution;
4. a computer (quantum or classical) cannot, within a short amount of time, compute a useful approximation of the probability values of the output strings;
5. a classical super-computer can calculate the QC-values (i.e., Prob_C(x) for any string x) after a moderately large amount of time, at a large-but-possible-in-practice computational cost.
The abilities and inabilities mentioned above depend on the number n of qubits. In these notes we focus on n = 53 qubits. We assume the computations referred to in Assumptions 3, 4 and 5 require resources exponential in n. Thus, n needs to be chosen such that the exponential cost is infeasible in a short amount of time, but feasible in a moderately large amount of time.
Towards certified randomness. Relying on the above, and based on a proposal by Aaronson [Aar19], here is, at a high level, a potential experiment to produce certified (i.e., externally verifiable) randomness:
1. The operator is given the specification of a quantum circuit freshly chosen at random from a given class. (For example, in the context of public randomness, the choice may be based on a timely output of a trusted randomness beacon.)
2. Soon thereafter, the operator evaluates the circuit many times and publishes the output strings.
3. Later, a classical supercomputer computes the "QC-values" corresponding to those sampled strings.
4. By statistically analyzing the "QC-values", one then gains assurance (or not) that at least some strings were quantumly produced and thus have entropy.
For eciency of execution we consider a sampling of
many strings from the same circuit. Comparatively, the
proposal by Aaronson uses one new circuit for each string,
to enable, with respect to entropy estimation, a security
reduction to a complexity theoretic hardness assumption.
It is an open problem what kind of reduction can be
made for the case of sampling multiple strings from a
single circuit. Section 2.4 considers tradeos between the
two approaches, and mentions the possible intermediate
case of several circuits with several strings each.
Random variables. For the initial statistical analysis in these notes, the main random variable of interest for each sampling experiment is the sum, across sampled strings, of the QC-values. Recall that these are the "probability values" that a correct evaluation of the quantum circuit (what we denote as quantum sampling with fidelity 1) would output such strings. We denote by S the random variable corresponding to this Sum of QC-values (SQC). We use indices to indicate the type of sampling (U, Q, F, C) and its parameters (k, φ, c):
- Uniform: S_{U,k} (k strings are sampled uniformly);
- Pure Quantum: S_{Q,k} (k strings are obtained by correct quantum evaluation of the circuit);
- Fidelity: S_{F,k,φ} (each of k strings is obtained either, with probability φ, by correct quantum evaluation, or, with probability 1 − φ, uniformly);
- Chosen-count: S_{C,k,c} (c strings are sampled by correct quantum evaluation, and the other k − c are pseudo-randomly selected).
The index k may be omitted when it is 1. Note that the Pure Quantum and the Chosen-count sampling are only possible with a quantum computer with fidelity 1.
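A minimal Monte Carlo sketch of these sampling types, under the exponential model analyzed in Section 2 (the QC-value of a uniformly sampled string is distributed as Exp(N), and that of a quantumly sampled string as Erlang(2, N)); the symbol names and the values of k and φ below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, phi = 53, 10**6, 0.002     # qubits, sample size, fidelity (illustrative)
N = 2.0 ** n

def sqc_uniform(k):
    # QC-value of a uniformly sampled string ~ Exp(rate N)
    return rng.exponential(1 / N, k).sum()

def sqc_quantum(k):
    # QC-value of a quantumly sampled string ~ Erlang(2, N),
    # i.e. the sum of two independent Exp(N) variables
    return rng.exponential(1 / N, (k, 2)).sum()

def sqc_fidelity(k, phi):
    # each string: quantum with probability phi, uniform otherwise
    c = rng.binomial(k, phi)     # number of quantumly obtained strings
    return sqc_quantum(c) + sqc_uniform(k - c)

print(sqc_uniform(k) * N / k)        # ≈ 1
print(sqc_quantum(k) * N / k)        # ≈ 2
print(sqc_fidelity(k, phi) * N / k)  # ≈ 1 + phi
```

The normalized means match the per-string expected values derived later in Table 1.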
Distinguishability. In these notes, we are focused on the problem of distinguishing honest Fidelity sampling from a quantum circuit vs. malicious sampling performed by an adversary with the goal of inducing over-estimation of the entropy in the sample. We propose parameters for sampling experiments, distinguishability thresholds, and entropy estimation, assuming an adversarial setting.
1.2. Entropy of a sample
The meaning of entropy can be elusive and subject to nuances. The measure of entropy of a string, or of a sequence of strings, only makes sense with respect to a probability distribution of outputs. For example, in the case of a uniform distribution over the set of n-bit strings we say that each occurring string has n bits of entropy. Conversely, a string obtained pseudo-randomly from a fixed a-priori-determined seed (whose bits are not counted) has overall entropy 0.
Typical interpretations of entropy relate to unpredictability, compressibility, and/or reproducibility. We focus our analysis on estimating Shannon entropy (the expected negative binary logarithm, −log₂(p), of probabilities) rather than minimum entropy (−log₂ of the maximum probability).
Estimating entropy. In these notes, we estimate the entropy of a sample viewed as a vector of strings. We consider an adversarial setting where each string can be obtained from a different probability distribution. The distributions can have dependencies, such as those related to sampling without replacement, and/or from sorting the sequence based on some order relation. Even though the adversary has an incentive to minimize entropy, that goal is conditioned on the sample being accepted by the client with a certain minimum probability. For appropriately parametrized experiments, this requires the adversary to use some quantumly-obtained strings that have associated entropy. If we take into account the conditional form of probability distributions, then we can consider the overall entropy of the sample as a sum of "entropies" of its consecutive strings, where for each string there is an underlying conditional probability distribution that takes into account the dependency on the previous strings. In fact, we will consider an adversary that selects strings one by one, with dependencies across each other, and argue that such an adversary is optimal (within stated constraints).
1.3. Organization
Section 2 analyzes the exponential model of QC-values. Section 3 discusses the distribution and statistics of "Sums of QC-values" (SQCs) for several sampling experiments, and determines optimal thresholds for distinguishability. Section 4 explores alternative parameters for settings where the client has a "low budget" for verifying QC-values computed by a distrusted server. Section 5 considers the estimation of entropy in the face of adversarial sampling. Section 6 concludes with suggested questions for follow-up. The Appendix contains auxiliary details. Section A defines abbreviations, acronyms and symbols. Section B derives formulas for the expected value and variance of several distributions. Section C analyzes several distributions of sums of QC-values. Section D considers the discretization of QC-values and the statistics when in the face of collisions. Section E presents additional large tables.
2. Exponential model
2.1. The frequency-density representation
We consider the model [BISB+18] where the quantum circuit outputs strings whose frequency density f (a continuous approximation) of QC-values is defined by an exponential distribution (Exp) with rate N:

  f(p) = Exp[N](p) = N·e^(−N·p)    (2)
2.1.1. U: Uniform sampling
The function f is not a probability density function (PDF) of the output strings, but rather a PDF of their QC-values when strings are sampled uniformly from the set {0,1}^n of all n-bit strings. The corresponding cumulative distribution function (CDF) is

  F(p) = 1 − e^(−N·p)    (3)

In the continuous model we calculate statistics while integrating between 0 and infinity, but the contribution between 1 and infinity is negligible. The actual discrete QC-values (see a discretization in Appendix D), being probabilities, are between 0 and 1.
Figure 1 plots the frequency density curve f, and its accumulator (integral, times N), along with a histogram of QC-values. In the histogram, the height of each constant-width bin equals N times the integral of the curve between the limits of the bin. For example: 63.2 % is the fraction of strings with QC-values at most 1/N. The scales of the axes are represented relative to N = 2^n. The curve vanishes exponentially fast.
[Plot: the Exp[N] PDF f, the accumulated curve N × "Exp[N] CDF", and a histogram (bin width 1/N) with bar heights 0.632, 0.233, 0.086, 0.031, 0.012, 0.004, over QC-values from 0 to 6/N.]
Figure 1: QC-values upon uniform string sampling
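The histogram fractions in Figure 1 follow directly from the CDF (3): the mass of the bin [j/N, (j+1)/N] is e^(−j) − e^(−(j+1)), independent of N. A quick check:

```python
import math

# Bin mass between j/N and (j+1)/N under the CDF F(p) = 1 - exp(-N p):
# after the change of variable u = N p, the result does not depend on N.
bins = [math.exp(-j) - math.exp(-(j + 1)) for j in range(6)]
print([round(b, 3) for b in bins])
# -> [0.632, 0.233, 0.086, 0.031, 0.012, 0.004]
```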
[Plot: the Erlang[2,N] PDF N·f·p, the accumulated curve N × "Erlang[2,N] CDF", and a histogram (bin width 1/N) with bar heights 0.264, 0.330, 0.207, 0.108, 0.051, 0.023, over QC-values from 0 to 6/N.]
Figure 2: QC-values upon quantum string sampling
2.1.2. Q: [pure] Quantum sampling
For more insight, Fig. 2 plots the density curve of frequency times QC-value, and its accumulator (times N). Since a string's QC-value is also its probability of occurring upon sampling by quantum circuit evaluation, the corresponding PDF (g) and CDF (G) of QC-values are as follows:

  g(p) = N²·p·e^(−N·p)    (4)
  G(p) = 1 − (1 + N·p)·e^(−N·p)    (5)

In Fig. 2, the accumulator curve shows G multiplied by N to enable simultaneous viewing with g.
This is called an Erlang distribution, with "shape" 2 and "rate" N. Interestingly, this random variable corresponds to the sum of two independent exponential random variables with rate N. This stems from the additivity of the Erlang distribution, of which the exponential distribution is the special case with shape 1. (It would be interesting to explore whether this equivalence as a sum of two independent exponential variables may have a more insightful interpretation as a quantum phenomenon.)
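A quick check of the Erlang[2, N] bin masses shown in Figure 2, and of the sum-of-two-exponentials equivalence (the Monte Carlo seed and sample size below are arbitrary choices):

```python
import math, random

def G(u):
    # Erlang[2, N] CDF after substituting u = N p: G(u) = 1 - (1 + u) e^{-u}
    return 1 - (1 + u) * math.exp(-u)

# Bin masses between j/N and (j+1)/N (independent of N):
bins = [round(G(j + 1) - G(j), 3) for j in range(6)]
print(bins)   # -> [0.264, 0.33, 0.207, 0.108, 0.051, 0.023]

# Monte Carlo check that a sum of two Exp(1) variables has CDF G:
random.seed(2)
frac = sum(
    1 for _ in range(10**5)
    if random.expovariate(1) + random.expovariate(1) <= 1
) / 10**5
print(frac)   # should be close to G(1) = 1 - 2/e ≈ 0.264
```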
[Plot: PDFs of QC-values for Pure-Quantum, Fidelity-0.4, and Uniform sampling, over QC-values from 0 to 6/N.]
Figure 3: Various PDFs of QC-values
2.1.3. F: Fidelity sampling (the practical case)
In practice, honest sampling uses a quantum computer characterized by a fidelity φ between 0 and 1. This fidelity is the probability that, during the quantum computation, all gates in the circuit function without fault. When one or more gates fail, the model assumes that the evaluation yields a uniformly random bit-string. The resulting PDF and CDF of QC-values are a mix of the uniform and the pure-quantum cases, as follows:

  f_φ(p) = φ·N²·p·e^(−N·p) + (1 − φ)·N·e^(−N·p)    (6)
  F_φ(p) = 1 − (1 + φ·N·p)·e^(−N·p)    (7)

The fidelity case (allowing a generic φ) generalizes both the uniform (φ = 0) and the pure quantum (φ = 1) cases. Figure 3 plots at once three PDFs of QC-values.
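One can verify algebraically, or numerically as sketched below, that the fidelity CDF (7) is exactly the φ-weighted mixture of the uniform CDF (3) and the pure-quantum CDF (5); the value φ = 0.4 matches the middle curve of Figure 3:

```python
import math

def F_uniform(u):           # (3) after substituting u = N p
    return 1 - math.exp(-u)

def F_quantum(u):           # (5): Erlang[2] CDF
    return 1 - (1 + u) * math.exp(-u)

def F_fidelity(u, phi):     # (7): closed form of the mixture
    return 1 - (1 + phi * u) * math.exp(-u)

# Mixture identity: F_phi = (1 - phi) F_U + phi F_Q, checked pointwise.
for u in (0.1, 1.0, 3.0):
    mix = 0.6 * F_uniform(u) + 0.4 * F_quantum(u)
    assert abs(mix - F_fidelity(u, 0.4)) < 1e-12
print("mixture identity holds")
```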
2.2. Summary statistics
From the PDFs of QC-values for each type of sampling (e.g., U, Q, F), we can derive statistics of interest, such as the expected value (E) and variance (Var). Table 1 shows the resulting formulas. The calculations are detailed in Appendix B. Two observations:
- Both E and Var in the pure quantum case are twice the corresponding statistic of the uniform case.
- The less trivial result is the variance (1 + 2φ − φ²)/N² for fidelity sampling, since for each individual sampled string the possible outcome as a uniformly sampled string is not independent of the possible outcome as a correct quantum circuit output.

Table 1: Statistics of QC-values
Sampling type  | Expected value E | Variance Var
Uniform        | 1/N              | 1/N²
Pure Quantum   | 2/N              | 2/N²
Fidelity (φ)   | (1 + φ)/N        | (1 + 2φ − φ²)/N²
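A Monte Carlo sanity check of the fidelity-sampling row of Table 1, working in units of 1/N so that the uniform case is Exp(1) and the quantum case is Erlang(2); the fidelity φ = 0.4 and the trial count are illustrative choices:

```python
import random

random.seed(3)
phi, trials = 0.4, 10**6

def qc(phi):
    # QC-value of one fidelity-phi sampled string, in units of 1/N:
    # with probability phi it is Erlang(2) (quantum), else Exp(1) (uniform).
    if random.random() < phi:
        return random.expovariate(1) + random.expovariate(1)
    return random.expovariate(1)

xs = [qc(phi) for _ in range(trials)]
mean = sum(xs) / trials
var = sum((x - mean) ** 2 for x in xs) / trials
print(round(mean, 1), round(var, 1))
# theory: E = 1 + phi = 1.4 and Var = 1 + 2*phi - phi**2 = 1.64
```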
2.3. Entropy per honest string
As a reference case, consider the expected entropy of an individual string quantumly-sampled with fidelity 1. Such entropy is slightly less than the number n of bits per string, since the quantumly-generated strings do not have a uniform distribution. From the exponential model for the frequency density of QC-values, we can in a first approximation consider a notion of differential entropy (using log base 2):

  H ≈ ∫₀^∞ N·e^(−N·p) · (−p·log₂(p)) dp    (8)

In the integral, the factor N·e^(−N·p) corresponds to the density-number of strings that have probability p. The approximate result (ignoring terms negligible in N) is

  H ≈ log₂(N) − (1 − γ)/ln(2)    (9)

where γ ≈ 0.5772 is the Euler-Mascheroni constant. For n = 53 this means about 52.39 bits of expected entropy per string. This is the continuous approximation of the Shannon entropy (10), which sums, across every string x, the product of each discrete QC-value p_x and its −log₂(p_x). Appendix D.1 considers a discretization.

  H = −Σ_{x ∈ {0,1}^n} p_x·log₂(p_x)    (10)

The 52.39 bits of entropy per string are valid in the setting of honest fidelity-1 evaluation with replacement. That value changes when considering a sample composed of strings required to be distinct, and even more so if they may be selected adversarially. Appendix D.2 considers an adversary that outputs a sample only after observing the result of many quantum evaluations of the circuit, possibly observing repeated outputs.
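The closed form (9) and the integral (8) can be checked numerically (a sketch; the integration step and upper cutoff are arbitrary choices):

```python
import math

gamma = 0.5772156649   # Euler-Mascheroni constant
n = 53                 # so that log2(N) = n

# Closed form (9): H ≈ log2(N) - (1 - gamma)/ln(2)
h_closed = n - (1 - gamma) / math.log(2)
print(round(h_closed, 2))   # -> 52.39

# Midpoint-rule evaluation of (8), after substituting u = N p:
# H = integral of u e^{-u} (log2(N) - log2(u)) du over (0, infinity)
steps, upper = 200000, 50.0
du = upper / steps
h_num = sum(
    u * math.exp(-u) * (n - math.log2(u)) * du
    for u in (du * (i + 0.5) for i in range(steps))
)
print(round(h_num, 2))      # -> 52.39
```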
2.4. Sampling with vs. without replacement
Independence vs. collisions. When sampling with replacement, the probability of string collisions becomes more significant as the sample size increases. When uniformly sampling with replacement from a set with N elements, a collision probability of about 50 % occurs when the sample size k is about sqrt(2·ln(2)·N). For N = 2^53 this corresponds to about 111.7 million, i.e., ≈ 2^26.7 strings. For non-uniform distributions, such as for quantum string sampling, collisions are expected to start earlier and be more frequent.
Table 2 shows a few examples: the expected number of collisions is 1 when k ≈ sqrt(2N) ≈ 2^27 for uniform sampling, or when k ≈ sqrt(N) ≈ 2^26.5 for quantum sampling with fidelity 1; more generally, for a fixed sample size of k strings, the expected number of collisions is about k²/(2N) for uniform sampling, and about twice that, k²/N, for fidelity-1 quantum sampling.

Table 2: Expected number of collisions (N = 2^53)
Sampling | E[#coll] | k
Uniform  | 1        | sqrt(2N) ≈ 1.3 × 10^8
Quantum  | 1        | sqrt(N) ≈ 0.9 × 10^8
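The collision counts above follow from standard birthday reasoning, using a pair-collision probability of 1/N for uniform sampling and of 2/N for fidelity-1 quantum sampling (the latter from E[Σ p²] = 2/N under the exponential model). A short numeric check:

```python
import math

n = 53
N = 2.0 ** n

# ~50 % collision probability for uniform sampling (birthday bound):
k_half = math.sqrt(2 * math.log(2) * N)
print(round(k_half / 1e6, 1))   # -> 111.7 (million strings, about 2^26.7)

# One expected collision: k^2/(2N) = 1 (uniform), k^2/N = 1 (quantum):
k_uni = math.sqrt(2 * N)
k_qua = math.sqrt(N)
print(round(k_uni / 1e6, 1), round(k_qua / 1e6, 1))   # -> 134.2 94.9
```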
Explicit removal of collisions. We require that the final sample does not contain collisions (i.e., repeated strings marked as output of the same circuit). In the honest case this equates to sampling without replacement. Thus, we require that the client rejects any sample containing any pair of equal strings claimed to have been generated from the same circuit. If a client does not check for collisions, then an adversary could simply produce a sample as a sequence of k copies of a pseudo-randomly generated string (with 0 entropy), and have a noticeable probability of having an average probability value as high as or higher than that of an honest string from quantum evaluation.
Despite the mentioned requirement, as an approximation we calculate statistics and thresholds while assuming a sampling with replacement. This is valid when the string length is sufficiently large. It is worth noticing an (impractical) extreme case where the approximation would not hold: sampling without replacement exactly N = 2^n strings, from a single circuit with n qubits, yields 0 entropy (since all possible strings are present), apart from the entropy contained in the ordering of the strings (which can be 0 if maliciously ordered).
Changing the circuit. A repetition is counted as a collision only when it happens within the same circuit. Thus, collisions in the final sample can be inherently avoided by requiring each string to be associated with a different circuit. However, there is an efficiency motivation for proposing sampling from a single circuit or only a few circuits, assuming that: (i) it is more efficient to reevaluate a circuit, compared to preparing a new circuit for first-time evaluation; (ii) it is more efficient to compute many QC-values for a single circuit, compared to a single QC-value for each of many circuits. If the sample size is small enough, relative to N, for it to be reasonable to approximate the sampling as being with replacement, then the statistics of QC-values are similar between the single-circuit and many-circuits cases.
An eciency tradeo.
An alternative option to re-
duce the probability of collisions, while not substantially
decreasing eciency, is to partition the sampling across
several circuits. Suppose the time taken to prepare a new
circuit for rst time sampling is

times longer than
the time it takes to repeat one evaluation (e.g., 10 ms
vs. 1
). Then, allowing a sampling of up to

strings
per circuit would: (i) speed up the evaluation by about

times, compared to the case of one string per circuit;
while (ii) only incurring a time increase factor of about
10 % compared to the case of sampling all strings from
the same circuit. With respect to verication time, the
complexity of verifying strings across several circuits in-
creases with the number of circuits, if assuming that the
computation/verication of several strings within each
circuit increases sub-linearly with the number of strings
(e.g., that verifying ten QC-values is less than 10 times
costlier than verifying a single QC-value).
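The amortized-time arithmetic above can be checked with a short calculation (a sketch; the 10 ms and 1 µs figures are the example values from the text, and the function name is ours):

```python
# Amortized evaluation time per string when sampling k strings per circuit:
# one circuit preparation (prep) is paid once, then k evaluations (evl each).
def time_per_string(prep, evl, k):
    return (prep + k * evl) / k

prep, evl = 10e-3, 1e-6                          # example: 10 ms vs. 1 us, so R = 10^4
R = prep / evl
one_per_circuit = time_per_string(prep, evl, 1)  # dominated by circuit preparation
partitioned = time_per_string(prep, evl, 10 * R) # 10*R strings per circuit
single_circuit = evl                             # all strings from one circuit

print(partitioned / single_circuit)   # ~1.1: only ~10% slower than a single circuit
print(one_per_circuit / partitioned)  # ~0.9*R: speedup vs. one string per circuit
```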
A gap tradeo.
The range of possibilities between
one and many circuits may also allow tuning the gap be-
tween the time to sample strings and the time to compute
all QC-values. Compared with the many-string-from-a-
single-circuit setting, increasing the number of circuits to
simulate may substantially increase the time for comput-
ing QC-values, without signicantly increasing the time
to evaluate the corresponding strings.
A subtle adversarial issue. An adversary who is able to perform a very fast repeated evaluation of a quantum circuit could possibly produce many string collisions. The analysis of the frequency of collisions for each string would show which strings are likely to have higher QC-values, and could thus provide to the adversary an advantage in skewing the SQC statistic, for example to adversarially affect an estimation of entropy. This effect can be significant if the number of qubits is small, namely when the number of possible strings is smaller than the number of evaluations the adversary is able to perform within the time window to publish a sample.
3. Sums of QC-values
In this section we consider the distribution of the Sum of QC-values (SQC). This distribution relates to the Heavy Output Generation (HOG) test [AC16], where one wants to find whether the SQC of generated outputs is heavy enough to make it reasonable to accept that some of the originating strings must have been quantumly obtained.
We denote by F_{m,f} the random variable SQC in the honest case where m strings are sampled by a quantum computation with fidelity f. The cases of uniform sampling (U_m) and pure-quantum sampling (Q_m) are special cases with fidelity 0 and 1, respectively.
Page 7 of 35
Figure 4: Various PDFs of SQC with m = 10: Pure-Quantum, Fidelity 0.4, Uniform, and Chosen-count (q=4).
Another adversary of interest, with a quantum computer with fidelity 1, chooses the number q of quantumly-sampled strings, and then uniformly samples the remaining m − q ones. We denote the corresponding random variable SQC as C_{m,q} and denote the quotient φ = q/m as the "pseudo-fidelity" of the experiment.
3.1. Statistics of interest
Table 3 shows the expected values and variances of the SQC of m strings, for samplings of type U, Q, F, and C. The results are computed in Appendix B. For the first three sampling types the mean and the variance are proportional to m, since each isolated circuit evaluation is assumed independent of the others. It is worth noting that the expected value of C_{m,q} is the same as that of F_{m,q/m}, but the variance is slightly different.
Figure 4 shows the PDFs of SQCs for the U, Q, F and C samplings of strings.
3.2. CDFs of sums of i.i.d. variables
The distinguishability analysis that we are aiming for requires the calculation of points on the CDF curves. However, in comparison with the simple formula for the CDF of QC-values, the distributions of SQCs need to account for the new parameter m, which can specify an arbitrary number of strings. A common approach to handling the increase in complexity is to apply approximations that simplify the analysis and are provably correct in a limit of increasing the number of summed variables.

Table 3: Statistics of SQCs of m strings

Sampling type   Random variable   Expected value (E)   Variance (V)
Uniform         U_m               m/N                  m/N²
Pure Quantum    Q_m               2·m/N                2·m/N²
Fidelity f      F_{m,f}           (1+f)·m/N            (1+2f−f²)·m/N²
Chosen-count    C_{m,q}           (m+q)/N              (m+q)/N²
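The single-string statistics underlying Table 3 can be sanity-checked by Monte Carlo simulation, using the standard Porter-Thomas model in which a QC-value behaves as an exponential variable with mean 1/N, and quantum sampling size-biases it toward larger values (a sketch, working in units of 1/N; the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6

# QC-values in units of 1/N. Porter-Thomas: p*N ~ Exponential(1).
uniform_draws = rng.exponential(1.0, n)   # QC-value of a uniformly chosen string
quantum_draws = rng.gamma(2.0, 1.0, n)    # size-biased draw: density x*exp(-x)

# Expected values 1 and 2 (i.e., 1/N and 2/N); variances 1 and 2 (in units of 1/N^2).
print(uniform_draws.mean(), uniform_draws.var())   # ~1, ~1
print(quantum_draws.mean(), quantum_draws.var())   # ~2, ~2

# Fidelity f mixes the two samplings: mean (1+f), variance (1+2f-f^2).
f = 0.5
mask = rng.random(n) < f
mixed = np.where(mask, quantum_draws, uniform_draws)
print(mixed.mean(), mixed.var())                   # ~1.5, ~1.75
```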
Central Limit Theorem (CLT). When summing a large number of independently and identically distributed (i.i.d.) QC-values, the distribution of their sum S can be approximated by a Normal distribution (11), having a Gaussian-shaped PDF with mean μ and standard deviation σ.

f(x) = (1/(σ·√(2π))) · e^(−(1/2)·((x−μ)/σ)²)    (11)

At a first approximation, SQCs can be analyzed based on the central limit theorem [BISB+18]. However, the approximation can have noticeable inaccuracy if the number of summed variables is small and/or when evaluating probabilities at the tails of the distribution.
Better approximations and exact formulas. Appendix C derives exact formulas for the PDFs and CDFs of the SQCs under sampling experiments of interest. Even for large m, the formulas are amenable for direct computation in the uniform, the quantum and the chosen-count cases. For the general fidelity case we can use an exact formula for not-too-large m, but when m gets larger we use a Gamma-approximation that yields better results than the CLT.
For the upcoming analysis we are specifically interested in the ability to evaluate the CDF and its inverse. The Gamma distribution, with parameters derived in Appendix C.3, has a wide applicability, namely with the following formula being applicable to several scenarios:

CDF_S(s) ≈ P(k, λ·s), with k = E²/V and λ = E/V    (12)

where P denotes the [lower] incomplete gamma regularized function (further details in Appendix C.2), and E and V are the expected value and variance of the SQC random variable S. Particularly, the formula is:
(i) correct for the uniform, the quantum and the chosen-count cases, for which the shape k equals m, 2m and m + q, respectively, and λ = N;
(ii) a better-than-the-CLT approximation for the general fidelity case (F), for which E = m(1+f)/N and V = m(1+2f−f²)/N². The transformation of variables ensures that the expected value and the variance are as in the non-approximated case.
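For integer shape k, the regularized lower incomplete gamma function has the simple closed form P(k, x) = 1 − e^(−x)·Σ_{j<k} x^j/j!, so the exact Gamma-form CDFs can be evaluated with the standard library alone (a sketch, in units where SQC values are scaled by N; the function names are ours):

```python
import math

def reg_lower_gamma(k, x):
    """Regularized lower incomplete gamma P(k, x), for integer shape k >= 1."""
    term, total = 1.0, 1.0            # x^0/0! and the running sum of x^j/j!
    for j in range(1, k):
        term *= x / j
        total += term
    return 1.0 - math.exp(-x) * total

def sqc_cdf(s_times_N, m, q=0):
    """CDF of the SQC of m strings with q of them quantum (chosen-count case).

    q=0 gives the uniform case (shape m); q=m the pure-quantum case (shape 2m).
    The first argument is N*s, i.e., the SQC measured in units of 1/N.
    """
    return reg_lower_gamma(m + q, s_times_N)

# Uniform case, m=10: CDF evaluated at the mean m/N.
print(sqc_cdf(10.0, 10))   # ~0.542 (a Gamma variable has its median below its mean)
```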
Figures 5 and 6 illustrate, for an example with fidelity 0.5, how much better the Gamma approximation is, compared with the CLT (Gaussian) approximation. (The Gaussian curve also extends to negative values.)
For not-too-large m, we can efficiently evaluate the PDF
Figure 5: PDF approximations of QC-values (fidelity 0.5): real PDF vs. Gamma-approximated and CLT-approximated PDFs.
Figure 6: PDF approximations of SQC (m=5, fidelity 0.5): real PDF vs. Gamma-approximated and CLT-approximated PDFs.
of the fidelity case (F) as a binomial weighted sum over all possible numbers of quantumly sampled strings:

PDF_{F_{m,f}}(s) = Σ_{q=0}^{m} C(m,q) · f^q · (1−f)^(m−q) · PDF_{C_{m,q}}(s)    (13)
3.3. Testing honest sampling
Some terminology: We define distinguishability experiments where the baseline question is: are we in the presence of an honestly generated set of m strings [instead of some other malicious or faulty sampling within a well-defined range of behaviors]? In the scope of this experiment/question, we define the following terms: negative and positive respectively mean rejection and acceptance; false and true respectively mean incorrect and correct classification. We are particularly interested in the false positive and false negative probabilities:
False negative [rate] (FN): a test rejects an honestly generated sample.
False positive [rate] (FP): a test accepts a sample generated by a reference malicious or faulty process.
When the honest case is characterized by fidelity f, how many (m) strings should be sampled, and what threshold (T) on the SQCs should be set for an acceptance/rejection test of honest behavior? It depends on:
(i) what is the reference malicious sampling procedure;
(ii) what FN and FP rates are set as a goal.
Selecting a reference malicious sampling. As a default reference, we sometimes measure FP with respect to the uniform sampling case (i.e., with fidelity 0). However, the definition of FP should match a higher goal of distinguishability. For example, if intending to maximize the entropy estimate (see Section 5), then a more useful FP will refer to a malicious case that ensures a certain non-zero minimum of entropy. A useful reference for the malicious sampling is the chosen-count case C_{m,q}, where the adversary samples with pseudo-fidelity φ = q/m lower than the claimed honest fidelity f.
Selecting concrete FN and FP rates. For cryptography applications, the value 2^(−40) is a common benchmark related to statistical security in the "one-shot" security scenario: what can the adversary do if having luck up to a 2^(−40)-likely event? When application goals do not indicate otherwise, we recommend 2^(−40) as a minimum goal for both FN and FP. In specific applications it might be reasonable to allow less stringent security parameters, if explicitly justified.
3.4. Threshold vs. probability
Figure 7 shows, for several fidelity values f, the inverse CDF of the SQC F_{m,f} across m = 10^6 sampled strings. Each of these curves represents SQC as a function of the FN rate, i.e., of the accumulated probability (horizontal axis) that the random variable F_{m,f} is at most equal to such threshold value (vertical axis). Figure 8 shows the same for sums across m = 10^7 strings. For the uniform case (f = 0) the curve is evaluated directly from its exact formula. For the other cases the curves are obtained from the Gamma approximation. For such high m, the curves look visually the same as if they were obtained from the CLT approximation.
The analysis of the curves illustrates the gap in SQC as the fidelity increases, across the fidelity values in {0, 0.001, 0.002, 0.005}. Also, comparing the two plots, it is easy to notice the increase in the gap with the increase in the number m of sampled strings, becoming easier to distinguish between two distinct fidelities.
3.5. Sample sizes vs. thresholds
We can now find, for each intended upper bound on
Figure 7: Inverse CDFs of SQC with N=2^53 and m=10^6 (m/N ≈ 1.11022E-10), for fidelities 0.005, 0.002, 0.001, and 0 (uniform).
Figure 8: Inverse CDFs of SQC with N=2^53 and m=10^7 (m/N ≈ 1.11022E-09), for fidelities 0.005, 0.002, 0.001, and 0 (uniform).
Figure 9: Sample size m vs. log2(FN=FP) (square-root scale), with Fid1 = 0.002 and Fid2/Fid1 in {3/4, 1/2, 1/4, 0}.
the FN rate, what is the minimal number m of strings needed for an intended FP rate, i.e., for the probability that a defined malicious execution with lower pseudo-fidelity is accepted.
Example of FN vs. FP rates. Consider an honest sampling of m = 10^6 strings using fidelity f = 0.002. If the intended FN rate is 0.2, then the threshold T is set at the value satisfying Prob(F_{m,f} ≤ T) = 0.2. If the malicious reference is the uniform case, then the false-positive rate is FP = Prob(U_m ≥ T) ≈ 0.12. If the malicious reference is the case of fidelity equal to half of the honest fidelity, then we get FP = Prob(F_{m,f/2} ≥ T) ≈ 0.44.
Figures 9 and 10, respectively for honest fidelities f1 ∈ {0.002, 0.01}, plot curves of the sample size m (number of sampled strings) vs. the FN=FP rates (ε), for a reference honest case F_{m,f1}, and a malicious chosen-count sampling C_{m,q} with pseudo-fidelity φ2 = q/m, for φ2/f1 ∈ {3/4, 1/2, 1/4, 0}. The scales of the axes of the plots are adjusted for a consolidated view of all plotted pseudo-fidelities, while FN=FP ranges within [2^(−40), 2^(−1)]. Each curve labeled with honest fidelity f1 and malicious pseudo-fidelity φ2 shows m as a function of ε = FN = FP, such that there exists a distinguishability threshold T for which Prob(F_{m,f1} ≤ T) = Prob(C_{m,q} ≥ T) = ε.
For simplicity of computation, we applied here the CLT approximation, which provides a simple formula for m based on the inverse-erf function ((51), (50)), as derived in Appendix C.1. It is worth noting that the analytic results are independent of the string-space size N (here assumed to be N = 2^53), if we assume a sampling with replacement.
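A sketch of the CLT-based sample-size computation follows: the test separates two Gaussians with means m(1+f)/N and standard deviations sqrt(m(1+2f−f²))/N, so equating both tail probabilities to ε gives m = (z·(σ(f1)+σ(φ2))/(f1−φ2))², with z the standard-normal quantile of 1−ε (the function names are ours, and the exact formula in Appendix C.1 may differ slightly):

```python
import math

def phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_quantile(p, lo=0.0, hi=40.0):   # upper standard-normal quantile, by bisection
    for _ in range(200):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if phi(mid) < p else (lo, mid)
    return (lo + hi) / 2.0

def sample_size(f1, f2, eps):
    """Minimal m (CLT approx.) so that FN = FP = eps when testing fidelity f1 vs f2."""
    z = z_quantile(1.0 - eps)
    sigma = lambda f: math.sqrt(1.0 + 2.0 * f - f * f)   # per-string std, units 1/N
    return (z * (sigma(f1) + sigma(f2)) / (f1 - f2)) ** 2

print(sample_size(0.002, 0.0, 1e-12))    # ~5.0e7: the ~50 million strings of Table 4
print(sample_size(0.002, 0.001, 1e-12))  # ~4x larger, as discussed in the text
```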
Table 4 shows selected examples of sample size (m) vs. honest fidelity (f1) and pseudo-fidelity (φ2). This is a sub-table of the more detailed Table 14 in Appendix E. For example, the table shows that about 50 million strings
Figure 10: Sample size m vs. log2(FN=FP) (square-root scale), with Fid1 = 0.01 and Fid2/Fid1 in {3/4, 1/2, 1/4, 0}.
Table 4: Number of strings for SQC distinguishability
(Selected entries from Table 14)

Number m of strings to sample
f1      ε = FN = FP   when φ2 = 0   when φ2 = f1/10   when φ2 = f1/2
0.002   1E-12         4.977E+7      6.146E+7          1.993E+8
        1E-6          2.273E+7      2.807E+7          9.102E+7
        1E-3          9.569E+6      1.182E+7          3.831E+7
        1E-1          1.646E+6      2.032E+6          6.589E+6
0.01    1E-12         2.007E+6      2.480E+6          8.066E+6
        1E-6          9.165E+5      1.133E+6          3.684E+6
        1E-3          3.858E+5      4.767E+5          1.551E+6
        1E-1          6.635E+4      8.199E+4          2.667E+5
need to be sampled in order to obtain FN=FP rates of 10^(−12), if using a quantum computer with fidelity 0.002 and if the FP refers to a malicious uniform sampling. The number increases about 4-fold if, instead, measuring FP with respect to a malicious sampling with pseudo-fidelity of about one half of the honest fidelity. The needed number of QC-values reduces about 25-fold if using instead a quantum computer with fidelity 0.01.
Why considering pseudo-delities?
Distinguishing
the honest vs. the malicious uniform case (delity 0) can
be useful to ascertain that some of strings resulted from
a quantum circuit evaluation. However, for the goal of
estimating entropy (see Section 5), it is more relevant to
distinguish the honest case from a malicious sampling
with some positive pseudo-delity .
4. Low-budget clients
One drawback of the distinguishability experiment considered in the previous section is that it requires computing many QC-values. In this section we consider a different statistic that allows an interesting tradeoff: a powerful server computes more QC-values; a weak verifying client verifies fewer QC-values. The client verification can be a recomputation of QC-values followed by an equality check, but other verifications are conceivable.
In the next subsections we consider one approach for enabling a computationally cheaper verification by clients. Other strategies exist and it would be interesting to investigate which ones may yield useful tradeoffs.
4.1. Truncated QC-values
Suppose the client wants to save work by verifying only (the sum of) a fraction v of the QC-values. This may be particularly useful in the setting of one string per circuit, as there the cost of computing QC-values is maximal per string. One approach is to set a threshold on the QC-value, and then check only (the sum of) those with higher value.
Truncation thresholds. We denote as truncated QC-value (TQC), with respect to some truncation threshold t, the measure that is equal to the QC-value when it is not smaller than t, and is 0 otherwise. For each intended verification proportion v, it is useful to know the matching truncation threshold t_v. Table 5 shows, for several fidelities, the truncation threshold that corresponds to each of several verification proportions v ∈ {1/2, 1/4, 1/10, 1/100, 1/1000}. For notation simplicity we let τ = N·t.
Table 5: TQC truncation thresholds
The indicated thresholds τ are to be divided by N (i.e., t = τ/N).

            Verification proportion (v)
Fidelity    1/2      1/4      1/10     1/100    1/1000
0.0         0.693    1.386    2.303    4.605    6.908
0.001       0.694    1.388    2.305    4.610    6.915
0.002       0.695    1.389    2.307    4.614    6.922
0.005       0.697    1.393    2.314    4.628    6.942
0.01        0.700    1.400    2.326    4.651    6.975
0.02        0.707    1.414    2.348    4.695    7.039
0.05        0.729    1.457    2.417    4.821    7.216
0.1         0.767    1.529    2.528    5.011    7.465
0.2         0.850    1.675    2.739    5.331    7.852
0.5         1.146    2.105    3.272    5.990    8.573
1.0         1.678    2.693    3.890    6.638    9.233
Since the threshold is measured with respect to the honest distribution of a single QC-value, we can find an exact solution based on the inverse CDF of the QC-value:

Prob(p > t_v) = e^(−τ_v)·(1 + f·τ_v) = v    (14)
τ_v = −1/f − W_{−1}(−(v/f)·e^(−1/f))    (15)

where W_{−1} is the real lower branch of the Lambert W function [DLMF].
For example, for a proportion verification of v = 1/2 in a case of claimed fidelity 1, the threshold should be set at τ ≈ 1.678. (It is interesting to notice that even though the expected QC-value is 2/N, the median occurs at about 1.678/N.) As another example, for a proportion verification of v = 1/10, the threshold should be set at about τ ≈ 3.890 if the fidelity is 1, and at about τ ≈ 2.307 if the fidelity is 0.002.
Consider the random variable X, equal to the QC-value p if p > t and equal to 0 otherwise. (Equivalently, X = p·𝟙_t(p), where 𝟙_t is the indicator function, equal to 1 if its input is greater than t; equal to 0 otherwise.) For the sampling of a single string we consider the cases X_U, X_Q and X_F, i.e., when p is obtained respectively from Uniform, pure-Quantum and Fidelity samplings. Table 6 shows the formulas for their expected values and variances. We use τ = N·t. Notice how replacing τ by 0 leads to the formulas in Table 1.
Table 6: Statistics of truncated QCs

Statistic   Formula
E[X_U]      (1+τ)·e^(−τ) / N
V[X_U]      [(τ²+2τ+2)·e^(−τ) − (1+τ)²·e^(−2τ)] / N²
E[X_Q]      (τ²+2τ+2)·e^(−τ) / N
V[X_Q]      [(τ³+3τ²+6τ+6)·e^(−τ) − (τ²+2τ+2)²·e^(−2τ)] / N²
E[X_F]      (1−f)·E[X_U] + f·E[X_Q]
V[X_F]      [(1−f)·(τ²+2τ+2) + f·(τ³+3τ²+6τ+6)]·e^(−τ)/N² − E[X_F]²

Legend: E (Expected value); F (Fidelity); Q (pure Quantum); τ (= N·t, where t is the truncation threshold); U (Uniform); V (Variance); X (random variable: truncated QC-value).
4.2. Sum of truncated QC values (STQC)
Similarly to how we considered sums of QC-values, we are now interested in the sum of truncated QC-values. We use S_Y to denote the sum of TQC values (STQC) of m strings obtained in an experiment Y. Their expected values and variances are obtained by simply multiplying by m the statistics of the single string.
We also consider the chosen-count case, where an adversary with a fidelity-1 quantum computer chooses in advance the number q (out of m) of strings that it evaluates quantumly. The expected value and variance of S_C are given by the corresponding weighted sums from the uniform and the pure-quantum cases:

E[S_C] = q·E[X_Q] + (m−q)·E[X_U]    (16)
V[S_C] = q·V[X_Q] + (m−q)·V[X_U]    (17)
It is now interesting to calculate the increase in the number of QC-values that a server needs to compute. Table 7 (with selected entries from Table 15 in Appendix E) shows, for several FN=FP rates ε, honest fidelities f1, pseudo-fidelity proportions φ2/f1, and verification proportions v, what is the number m_v of positive TQC-values to be verified by the client, when the number m of TQC-values computed by the server is higher by a factor of about 1/v.
Table 7: Number of client-veried TQC values
(Selected entries from Table 15)
: # client-veried TQC-values
when  when 
when
when
2
1 when
when
2
1
  7.29E+6 2.92E+7 2.24E+6 8.99E+6
 3.33E+6 1.33E+7 1.02E+6 4.10E+6
 1.40E+6 5.61E+6 4.31E+5 1.73E+6
 2.41E+5 9.65E+5 7.41E+4 2.97E+5
  2.97E+5 1.19E+6 9.34E+4 3.78E+5
 1.36E+5 5.45E+5 4.27E+4 1.73E+5
 5.70E+4 2.29E+5 1.80E+4 7.27E+4
 9.81E+3 3.95E+4 3.09E+3 1.25E+4
5. Entropy estimation
The previous section focused on distinguishing an honest sampling with a fixed fidelity f from a malicious sampling with a lower pseudo-fidelity φ, including 0. This section focuses directly on the goal of entropy estimation from a set of sampled strings, and on that basis deciding parameters for a distinguishability test, and what adversaries to consider. For simplicity, the discussion hereafter assumes a distinguishability test based on SQCs (as described in Section 3). The use of different statistics could change the estimated parameters.
This section is organized as follows: Section 5.1 gives an informal overview of the analysis; Section 5.2 discusses the client perspective; Section 5.3 defines the pseudo-fidelity adversary, for when the sampling budget of the adversary is not enough to observe collisions; Section 5.4 considers the collisional adversary, which takes into account the information obtained from observing collisions; Section 5.5 considers the post-processing of the sample and a possible hash-biasing attack.
5.1. Overview
The client. With respect to obtaining certifiable entropy, we consider two possible perspectives of the client. Either: (i) it knows how much entropy it wants, and with which assurance it wants to obtain it (FN and FP), and then defines corresponding parameters for a sampling experiment (e.g., the number m of strings to be sampled) and for an acceptance/rejection test (e.g., an SQC distinguishability threshold T); or (ii) it is given a sample of distinct strings claimed to have been sampled from a verifiably fresh circuit, and then, based on their QC-values and on an intended FP rate, it decides a lower bound for the entropy contained in the sample.
The adversary. For either of the above perspectives, the adversary tries to minimize the entropy contained in the published sample, conditioned on satisfying the admissibility criterion of the client (an SQC threshold) with a probability not smaller than FP. We denote the latter as the FP goal (or FP constraint).
The adversary of interest is assumed to have a quantum computer with fidelity 1. It produces a sample of m strings where only a small number q of them are actually obtained from quantum evaluation. The sample generation includes adversarial actions of rejection sampling, reordering and biasing, as ways to further reduce the entropy. The m − q non-quantumly-generated strings are obtained pseudo-randomly (with entropy 0) without dependency on the circuit specification.
A black-box model. The adversary is assumed to not be able to take advantage of the knowledge of the circuit specification, apart from being able to obtain outputs from its evaluation. Therefore, we idealize the adversary as having black-box access to the circuit, being able to request its evaluation a number B of times. This substantiates the assumption of not being able to compute or estimate QC-values from the circuit specification alone, nor of simulating an evaluation with a correlated probability distribution. Nonetheless, this still allows the adversary to gather some information about QC-values, depending on the sampling budget B, by considering the multiplicity of each occurring string.
Entropy estimation. For concrete parameters of an experiment, we estimate the number of quantumly-generated strings, possibly from various probability distributions. We consider that the strings may have interdependencies, such as those related to sampling without replacement, rejection sampling, and ordering.
5.2. The client
Timing assumption. The continuing discussion retains the assumption that the adversary cannot compute the QC-values of strings by the time deadline to publish them. This requires the circuit specification to not be known in advance by the adversary. For example, one may base this assumption on requiring that the circuit specification is pseudo-randomly generated from the timely output of a public-randomness beacon, if it is reasonable to assume that the beacon is not maliciously colluding with the adversary.
Two perspectives. Using a statistic such as the SQC or STQC described in the previous sections, we consider how to parametrize an experiment and perform an estimation of entropy. For simplicity, hereafter we focus on the SQC statistic. We consider two perspectives:
1. Decidability problem. Given a goal of obtaining at least h bits of entropy, with an FP rate of at most a chosen bound, and accepting an FN rate up to a chosen bound for an honest quantum circuit operator with fidelity f, the client decides the number m of strings to request from the operator to publish as a sample, and which SQC distinguishability threshold T to use in order to accept or reject the sample. The client determines m and T assuming that an adversarial operator (with quantum fidelity 1, for a conservative estimate) will craft a sample with minimum entropy subject to having a probability at least FP of having an SQC greater than the threshold T. The estimate depends on the sampling budget B assumed to be available to the adversary.
2. Estimation problem. Entropy can only be estimated retrospectively. Given a circuit specification and a published sequence of m strings, the client starts by computing its SQC. The individual QC-values are either computed by the client or received from an external trusted party that would have computed them. The client also defines, based on the time assumed to have been available to the adversary since learning the circuit specification, what is the assumed sampling budget B. The client has its own requirements of FN and FP, and assumes that the adversary may have performed a targeted attack based on those parameters. Thus, the client assumes the final sample includes only a "small" number of quantumly generated strings, minimizing the overall entropy while still attempting to ensure a probability of at least FP of having an SQC at least T. The client estimates this entropy.
Interesting adversaries. We consider that the client is only interested in adversaries that publish samples whose probability of being accepted by the client is at least FP. Any adversary playing outside this constraint is ignored, since within the intended level of assurance the client will reject the sample. In particular, we ignore adversaries that would publish only a sequence of pseudo-random strings (with overall entropy 0), since that would lead to acceptance with probability less than FP.
Number of quantumly-sampled strings. A key step of the analysis is determining the number q of strings that an optimal adversary has included (or will include) in the final sample, and the corresponding expected QC-values and entropies of those strings. The client also assumes that the adversary chooses those strings when knowing the goal/parameters set by the client. In both perspectives, the estimates/parametrizations by the client are based on the assumption that, before the deadline to publish a sample, the adversary cannot compute anything about the QC-values of concrete strings, apart from probabilistic information based on the number of times each string has appeared when quantumly sampling a large (yet feasible) number B of strings from the circuit. The collisional adversary in Section 5.4 indeed considers such information when selecting which quantumly-generated strings to include in the final sample.
5.3. The pseudo-delity adversary
We focus rst on a specic attack performed by what
we denote as the “pseudo-delity” adversary. This is
tailored to the case where the sampling budget of the
adversary is not large enough to enable nding many
collisions (repetition of strings) before having to publish
a sample. In this subsection we consider the decidability
perspective, where the client decides the size
(number
of strings) of the sample to be published.
5.3.1. Algorithm
The pseudo-fidelity adversary (A) operates as follows:
1. Input. A receives four input parameters:
(a) B: (budget) # quantum evaluations of the circuit;
(b) m: # strings to publish in a sample;
(c) T: SQC threshold of acceptance by the client;
(d) P: FP (maximum false-positive) rate.
2. Quantum over-sampling. A quantumly evaluates the circuit, with fidelity 1, in a black-box manner, B times. We call pre-sample the set of obtained distinct strings, which is expected to have approximately D ≈ B/(1+b) strings, where b = B/N. See auxiliary formulas in Section D.2.
3. Number of quantum strings. As a function of the input parameters (m, T, P), the adversary calculates the minimum number q of quantumly-generated strings (besides the m − q strings to be pseudo-randomly generated) to include in the final sample, such that the client accepts with probability at least P (the FP rate). Recalling, from Section 3, the notation C_{m,q} for the SQC random variable when doing chosen-count sampling, the condition is:

q = min { q' : Prob(C_{m,q'} ≥ T) ≥ P }    (18)

The simplifying assumption here is that the QC-values of these q strings are i.i.d., with expected value 2/N and variance 2/N², whereas in fact the budget B and the observation of collisions would enable inferring more detailed information.
4. Rejection sampling. Using a secret key known a priori, A seeds a [cryptographic] pseudo-random permutation (PRP), thus defining a bijective mapping from the set of n-bit strings onto itself. To reduce the expected entropy in the final sample, A performs "rejection sampling" as follows: (i) A computes the PRP-output of every string in the pre-sample; (ii) A orders the D distinct strings, ascendingly, with respect to their PRP-output; (iii) A selects, in the corresponding PRP-lexicographical ordering, only the first q strings.
5. Positioning of strings. A initializes a sample vector of length m (the sample size requested by the client) and pseudo-randomly selects q locations therein. A then places there the q quantumly-obtained strings selected in the previous step, in the respective devised order. Then, A pseudo-randomly generates m − q additional strings, distinct from the already quantumly-obtained strings. A positions them in the free m − q positions of the sample. This step adds no additional entropy.
Note: Section 5.5 mentions a possible additional step of hash-biasing.
6. Output. A outputs the sequence of m strings.
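Steps 4 and 5 can be sketched as follows; HMAC-SHA256 stands in for the keyed PRP, the pre-sample is a toy list, and all names are ours, for illustration only:

```python
import hashlib, hmac, random

def craft_sample(pre_sample, q, m, key, seed=0):
    """Steps 4-5 of the pseudo-fidelity adversary (sketch).

    pre_sample: distinct quantumly obtained strings (as bytes);
    q: number of quantum strings to keep; m: sample size requested by the client.
    """
    # Step 4: order the pre-sample by a keyed pseudo-random mapping (PRP stand-in)
    # and keep only the first q strings in that ordering (rejection sampling).
    prp = lambda s: hmac.new(key, s, hashlib.sha256).digest()
    kept = sorted(pre_sample, key=prp)[:q]

    # Step 5: place the q strings, in the devised order, at q pseudo-random
    # positions of an m-slot sample; fill the remaining slots with fresh
    # pseudo-random strings (entropy 0 from the client's perspective).
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(m), q))
    sample = [None] * m
    for pos, s in zip(positions, kept):
        sample[pos] = s
    used = set(kept)
    for i in range(m):
        if sample[i] is None:
            while True:
                filler = rng.randbytes(len(pre_sample[0]))
                if filler not in used:
                    used.add(filler)
                    sample[i] = filler
                    break
    return sample

pre = [bytes([i]) * 8 for i in range(32)]          # toy 64-bit pre-sample
out = craft_sample(pre, q=4, m=10, key=b"secret-key")
print(len(out), sum(s in set(pre) for s in out))   # 10 4
```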
5.3.2. Statistics
We now summarize how the client, with a goal of obtaining a sample with h bits of entropy (e.g., 1024), should parametrize an experiment when assuming the quantum computer operator is a pseudo-fidelity adversary (A).
Pre-sample size. We assume that A can sample the quantum circuit at most B times in the allotted time window. For example, if one circuit evaluation counts as one cycle, then sampling B strings within a time window of w seconds requires a quantum computer with a frequency of about B/w (about 1.12 MHz for the example parameters). For b = B/N ≪ 1, we assume for simplicity that the number D of obtained distinct strings is approximately B (see Section D.2).
QC-values and pseudo-delity.
With
  
, the
expected QC-value in the pre-sample is very close to what
was determined in
(9)
, i.e.,
  
. Since the client
wants to accept an honest sample with high-probability
(i.e., low FN rate), the corresponding SQC threshold nec-
essarily allows some probability of it also being achieved
with a pseudo-delity slightly lower than the honest -
delity. When the adversary does rejection sampling to
select
strings, then by denition the pseudo-delity
is
 
. The client assumes the adversary uses
the smallest pseudo-delity that will still pass the SQC
threshold test probability non-lower than the FP rate
. Section D.2 considers a more detailed approximation
(124)
for
, obtaining

, where

is the budget factor. However, compared with

the correction is only signicant when it is already
relevant to consider the more sophisticated collisional
adversary from Section 5.4, which takes advantage of
collisions observed in the pre-sampling stage.
Entropy estimation. The original expected entropy per each of the obtained distinct strings in the pre-sample depends on the sampling budget B. Let β denote the expected entropy for a thought experiment where the adversary would now output a single string uniformly selected from the pre-sample. This is the value determined in (9) in Section 2.3 if b ≪ 1, or up to n bits when B is so large that the pre-sample contains almost all N strings. However, A has performed "rejection sampling" to reduce the expected entropy per string (and thus of the final sample).
An initial intuition is that the rejection sampling induces the selected PRP-outputs to start with about log₂(D/q) zeros, with D denoting the number of distinct strings in the pre-sample (meaning we can discount about log₂(D/q) bits of entropy per string). Also, the ordered selection reduces the space of possible vectors by a factor q!, increasing the probability of each possible one by q!. This would lead to approximating the expected entropy as follows:

H ≈ q·(β − log₂(D/q)) − log₂(q!)    (19)
However, a lower value is obtained if we consider an iterated procedure, one string at a time (i.e., using q = 1), repeated the original q times. In the iterated case, at the i-th selection the entropy reduction from the above formula would be log₂(D − i + 1), leading to an apparent overall reduction of log₂(D·(D−1)·…·(D−q+1)).
The Shannon entropy of a sampling procedure is the
expected value of the negative logarithm (base 2) of
the probability of obtained samples. This logarithm
can be interpreted as a summation of logarithms, where
each new logarithm applies to the probability of a new
string being selected, conditioned on the strings already
selected. The logarithm for the probability of the overall
sample is itself a random variable, which has not only an
expected value (Shannon entropy) but also for example
a variance. In practice it may be relevant to measure
something like the minimum possible entropy, or a bound
for which there is an overwhelming probability of the
variable being larger than it. Some details about this
variable are considered in Section D.3.
For simplicity, and to be conservative in the estimation of Shannon entropy, in the subsequent discussion we focus on the approximation obtained by iterating formula (19) one string at a time. We also plug in a minor correction (see Section D.3.1) to the a priori average entropy per string (i.e., before rejection sampling), due to the reduction of the set from which the i-th string is selected. In summary, we consider the approximation (20), in which the positive term is the average entropy per string expected for the pre-sample of distinct strings obtained after the quantum evaluations of the circuit, and the subtracted term is the binary logarithm of a descending factorial. A more accurate estimate could be based on simulation, as mentioned in Section D.3. For example, there we show that with fidelity 0 both formulas are a non-tight lower bound of the expected entropy.
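The iterated approximation can be sketched numerically as follows. Here h, d, and k are illustrative names (not necessarily the paper's symbols) for the a priori expected entropy per string, the number of distinct pre-sampled strings, and the number of selected strings:

```python
import math

def log2_descending_factorial(d, k):
    # log2 of d * (d-1) * ... * (d-k+1)
    return sum(math.log2(d - i) for i in range(k))

def entropy_estimate(h, d, k):
    """Iterated approximation in the spirit of (20): total expected
    entropy of k strings selected (ordered, via rejection sampling)
    out of d distinct pre-sampled strings, each carrying an a priori
    expected entropy of h bits."""
    return k * h - log2_descending_factorial(d, k)
```

As the text notes, this iterated form gives a lower (more conservative) value than applying (19) once to the whole selection.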
Parametrization.
Solving for the number of quantumly generated strings yields the (approximate) minimum that an interesting adversary will use. The client assumes that the sample of strings will be produced by a chosen-count method, using exactly that minimum number of quantumly generated strings (i.e., the corresponding pseudo-fidelity), selected (and ordered) upon rejection sampling from the pre-sampled set of strings. Thus, the client determines the parameters respectively using the approximated equations (51), and (48) or (46), in Section C.1, as in the following examples.
5.3.3. Initial examples
Example 1.
Consider a baseline parametrization of the string space and sampling budget. Instantiating equation (19) for a goal of a given number of bits of entropy, and solving for the number of quantum strings, yields a value that, rounded up, is the assumed number of quantumly generated strings that the adversary will include in the final sample.
When the FN and FP rates are equal, the needed sample size can, by the CLT, be approximated as in (51) in Appendix C.1, for each value of the quantum fidelity claimed by the honest operator.
The corresponding SQC thresholds can be obtained from (48) in Section C.1. Comparing against Table 8, the values obtained here are only slightly higher than (but quite close to) the corresponding entries there. This is expected, since the parametrization makes the ratio of quantumly generated strings to the total sample size very small. Therefore, a small factor increase in that number of strings will yield a large factor increase in estimated entropy.
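The CLT-based sample-size computation referenced via (51) can be sketched as follows; the normal approximation and all parameter names are illustrative assumptions, not the exact equation from Appendix C.1:

```python
import math
from statistics import NormalDist

def sample_size(mu_honest, sd_honest, mu_malicious, sd_malicious,
                fn_rate, fp_rate):
    """Number of per-string samples needed so that one threshold on the
    averaged SQC statistic yields at most the given FN rate under the
    honest distribution and FP rate under the malicious one, assuming
    both per-string statistics are approximately normal (CLT)."""
    z_fn = NormalDist().inv_cdf(1 - fn_rate)
    z_fp = NormalDist().inv_cdf(1 - fp_rate)
    n = ((z_fn * sd_honest + z_fp * sd_malicious)
         / (mu_honest - mu_malicious)) ** 2
    return math.ceil(n)
```

Smaller FN/FP rates, or means that are closer together (e.g., a lower honest fidelity), both increase the required sample size, consistent with the examples above.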
Table 8: Number of strings for SQC distinguishability
(Selected entries from Table 14)

  4.98E+7   5.08E+7   8.85E+7   1.99E+8
  1.65E+6   1.68E+6   2.93E+6   6.59E+6
  2.01E+6   2.05E+6   3.57E+6   8.05E+6
  6.63E+4   6.77E+4   1.18E+5   2.66E+5
Example 2.
If the client wants a larger number of bits of entropy, then the required number of quantumly generated strings increases accordingly. Using the same FN=FP rates, and assuming that the adversary will use that number of quantumly generated strings, leads to correspondingly larger sample sizes for each claimed fidelity.
We note that the last example already exceeds the considered sampling budget, meaning that such a parametrization would not be suitable for a real experiment. The client should then assume a higher budget for the adversary (e.g., possibly by giving more time for sampling). Consequently, the client should re-estimate the parameters and possibly also start taking into account the advantage that the adversary may obtain from observing collisions in the over-sampling stage.
5.3.4. Other examples
We now show a few examples of deriving parameters from Table 8 (which has a few selected entries from Table 14), based on the sample size (total number of strings) calculated with respect to the SQC statistic, for several pseudo-fidelities and FN=FP rates.
Example 3.
For the chosen FN=FP rates, one needs about 1.65E+6 sampled strings to distinguish between the honest sampling with the claimed fidelity, and a malicious chosen-count sampling with pseudo-fidelity equal to one hundredth of it. Such a malicious case contains only one hundredth as many strings generated from a correct circuit evaluation.
Example 4.
Compared to Example 3, increasing the sample size by a factor of about 78 %, to about 2.93E+6 strings, provides the same FN=FP rates, but with FP referring instead to a malicious pseudo-fidelity equal to 1/4 of the honest fidelity. That corresponds to 40 times more quantumly generated strings than in Example 3.
Example 5.
The cost in sample size is about linear in the negative logarithm of the FN=FP rates. Consider an FP=FN goal stringent enough for cryptographic suitability. More than 49.7 million strings need to be obtained for any amount of entropy to be present in the malicious case, under the supposed fidelity. For example, for a pseudo-fidelity equal to one hundredth of the honest fidelity, the total number of strings is about 5.08E+7, meaning about 1 016 quantum-sampled strings. Overall the sample size is about 30 times larger than in Example 3, due to the more stringent FP=FN rates.
Example 6.
A substantial improvement is possible by increasing the honest fidelity reference. For example, for a fidelity five times larger than in Example 3, the needed total number of strings is about 2 million (i.e., about 25 times fewer) for the same ratio of pseudo-fidelity to fidelity as in Example 5. However, to satisfy the FP constraint the malicious adversary would (in this example) also be using a correspondingly higher pseudo-fidelity. One would then assume there are 205 quantum-sampled strings present in the sampled string set.
5.4. The collisional adversary
We consider now how the observation of collisions during
over-sampling may give an advantage to the adversary.
Informal description.
The adversary uses a quantum computer with fidelity 1 to evaluate the circuit “many” times, until obtaining “many” collisions. The output strings with most collisions have a higher expected QC-value (and lower expected entropy) than those with fewer collisions. Based on this, fewer quantumly generated strings in the final sample can achieve a higher SQC. The adversary organizes the strings in bins, one for each multiplicity of occurrence (i.e., bin i becomes the set of strings that each appeared exactly i times). Depending on the tally of multiplicities, i.e., the vector of numbers of strings across bins, the adversary chooses from which sets of bins to select strings for the final sample, along with applying rejection sampling.
When a small sampling budget does not lead to collisions, the collisional adversary corresponds to the pseudo-fidelity adversary from Section 5.3.1. Conversely, in a theoretical extreme where the sampling budget is a very large exponential in the number of qubits, each string would get a multiplicity sufficiently apart from the multiplicities of other strings. From those multiplicities the adversary could estimate with high accuracy the QC-value of each string. This would enable a straightforward simulation of sampling from a circuit evaluation, while in fact only making a pseudo-random selection with overall zero entropy.
5.4.1. Algorithm
The collisional adversary operates as follows:
1. Input. As in Section 5.3.1, the adversary receives the input parameters.
2. Quantum over-sampling. The adversary uses its sampling budget to obtain a sequence of strings, called the pre-sample, which may have repetitions (i.e., collisions). It organizes the strings into bins. Bin i is the set of strings that appeared with multiplicity i in the pre-sample.
Let m_i denote the expected number of strings in bin i; Appendix D.2 gives concrete estimates, for example for the case of 53 qubits. We abuse notation and also let m_i denote the actual number of strings obtained with each multiplicity in a given experiment; these satisfy the counting relation (21).
We use a prime in superscript to indicate the union of bins of multiplicities larger than or equal to a certain value. For a more general union of bins we can simply use a set, instead of an integer, as index.
3. Number of quantum strings. The adversary decides, from the pre-sample with its distinct strings in each bin, how many quantumly obtained strings, and from which unions of bins, to include in the final sample. An adversarially optimal selection takes into account that the expected QC-value and entropy of quantumly sampled strings also vary across the bins. Tendentially, the strings with higher multiplicity are preferable, in terms of having higher QC-value and lower entropy. However, an optimal decision can be more intricate considering the rejection sampling step ahead, which depends on the number of strings available in each bin, and on the SQC threshold required by the client to accept a sample.
Concretely, as a function of the input parameters and of the tally of collisions in the pre-sample, the adversary will determine a sequence of subsets (possibly only one) of multiplicities, from whose corresponding unions-of-bins to pseudo-randomly sample, and determine how many strings to select from each such union. In other words, the adversary needs to determine two (jointly optimal) non-empty same-length sequences:
- a sequence of sets u_1, u_2, ..., where each u_j is a subset of possible multiplicities;
- a sequence of counts k_1, k_2, ..., where each k_j is a positive integer, such that the sum of the k_j is the number of quantumly obtained strings to be included in the final sample (not counting strings that, although obtained in the over-sampling step but not selected in the rejection sampling step, may in the subsequent step be, by coincidence, pseudo-randomly selected).
Note on various options: The pseudo-fidelity adversary described in Section 5.3.1 uses a single union containing all multiplicities, i.e., all strings are selected from within the set of all pre-sampled distinct strings (regardless of multiplicity); conversely, the collisional adversary is allowed a more intricate choice across different unions of bins. We define the optimal collisional adversary as one that makes an optimal choice of the sequences u and k, for the purpose of minimizing entropy while satisfying the FP goal.
4. Rejection sampling. From each union set u_j, the adversary uses a differently seeded pseudo-random number generator to obtain k_j strings by rejection sampling. Specifically, the adversary selects the lexicographically first k_j strings upon application of the pseudo-random permutation. See Section D.3.2 for a discussion of other options.
5. Positioning of strings. The adversary initializes a sample vector of the length requested by the client and pseudo-randomly selects a vector of distinct positions therein. It then sequentially places in those positions the quantumly obtained strings selected in the previous step, which possibly came from various bins, in the respective devised order (namely, considering the lexicographic ordering respective to the pseudo-random permutation used for rejection sampling in each union of bins). Then, the adversary pseudo-randomly selects other strings, distinct from those already selected, to complete a final sample with the requested overall number of strings.
6. Output. The adversary outputs the sequence of strings.
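Steps 2 and 4 above can be sketched as follows; the seeded-hash ordering stands in for the pseudo-random permutation, and all names are illustrative:

```python
import hashlib
from collections import Counter, defaultdict

def bin_by_multiplicity(pre_sample):
    """Step 2: group the distinct pre-sampled strings by the number
    of times each appeared (its multiplicity)."""
    counts = Counter(pre_sample)
    bins = defaultdict(set)
    for s, mult in counts.items():
        bins[mult].add(s)
    return bins

def union_of_bins(bins, multiplicities):
    """Union of the bins whose multiplicity is in the given set (e.g.,
    all multiplicities >= 2, corresponding to the 'prime' notation)."""
    return set().union(*(bins[m] for m in multiplicities if m in bins))

def rejection_select(strings, k, seed):
    """Step 4 (sketch): order the strings by a seeded hash, playing the
    role of a pseudo-random permutation, and keep the first k."""
    ordered = sorted(strings, key=lambda s: hashlib.sha256(seed + s).digest())
    return ordered[:k]
```

The positioning step (step 5) would then scatter the selected strings among pseudo-randomly generated fillers, in their devised order.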
5.4.2. Statistics per bin or union of bins
Size of bins. Appendix D.2 shows experimentally obtained formulas for the expected bin sizes, as functions of the string space size and the sampling budget factor. One case of interest is the union of bins whose multiplicity is at least a certain value. In those cases we use a prime in superscript to indicate that the statistics refer to said union of bins. For example, the expected number of distinct strings that appear with multiplicity at least 2 is approximated by (22).
QC-values.
Appendix D.2 shows experimentally obtained formulas for the expected QC-value and its variance within each bin. When the budget is significantly smaller than the string space, the expected QC-value and the variance remain very close to their small-budget approximations within each bin. However, for larger sampling budgets those values start to noticeably differ. It is also relevant to consider the special case of the expected average QC-value in the union of bins with at least a certain multiplicity, which is approximated by (23).
Entropy.
The expected entropy per string decreases with the multiplicity. However, the entropy of each string in the pre-sample is indistinguishable across the strings within the same bin, i.e., before the deadline to publish a sample. Nonetheless, the adversary will still affect the probability distribution with which the strings appear in the final sample, by using rejection sampling. Technically, the entropy of a string as measured in the pre-sample is not the same as measured in the final sample. In particular, depending on the rejection sampling technique, the entropy in the final sample will also depend on the number of strings to select from each union of bins.
By denition, an optimal collisional adversary is the one
that optimally selects the sequence of unions of bins in
a way that minimizes the overall expected entropy in
the nal sample, while subject to the FP goal. For the
purpose of these notes we nd sucient to highlight the
eect of a simple collisional choice —

(i.e.,
selecting strings from those that appeared at least twice in
the pre-sample) — that already outperforms the pseudo-
delity attack when the sampling budget
is suciently
large, such as  when considering  qubits.
Using a logic similar to the one used for the analysis
of the pseudo-delity adversary, we can consider a rst
approximation of the entropy contributed by the rst
ordered sequence of
1
strings selected from the rst
union of bins as being about:
𝑗𝑗log𝑗log𝑗log𝑗 (24)
where
1
is the expected apriori average entropy per
string in the union of bins in .
For a more conservative and still simple estimate we can
consider the result of iteratively applying the previous
formula, thus getting:
log𝑢1𝑢1
𝑢1𝑢1
log1𝑢1(25)
where and
is the descending factorial of
order
.
The expressions are similar in look to formulas (19) and
(20), which considered

, but now
we consider separate bins.
The corrections discussed in Section D.3 also apply. The entropy contributed by strings selected across several unions of bins also depends on the previous selections across other unions of bins. Concretely, the initial entropy considered for the first string in a union of bins depends on the number of strings already selected, and from which unions of bins.
Asymptotically large budget.
It is instructive to consider the asymptotic limit of large sampling budgets, i.e., when the budget is a large enough exponential in the number of qubits. Each string will then tend to appear in an individual bin, with multiplicity approximately equal to the budget times that string's QC-value (i.e., its probability of being output by the circuit). Thus, the adversary can estimate that the QC-value of each string is approximately its multiplicity divided by the budget.
In fact, an asymptotically large budget would even allow an exact simulation of the circuit evaluation, as follows.
1. Pseudo-randomly simulate a binomial number of strings to obtain from quantum evaluation. The binomial has as parameters the sample size and the fidelity, to simulate how many strings, out of the total, would be from correct quantum evaluation in an experiment with that fidelity.
2. Pseudo-randomly simulate that many uniform floating-point numbers between 0 and 1, as points in the inverse CDF of the QC-values, and determine the correspondingly selected strings.
3. Output a sequence of strings, composed of a pseudo-random positioning of the strings from the previous step, interleaved with other pseudo-randomly selected strings simulating a uniform selection of distinct strings.
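These three steps can be sketched as below, assuming the adversary already holds accurate QC-value estimates; all names, and the uniform fill with replacement, are simplifying assumptions:

```python
import bisect
import random
from itertools import accumulate

def simulate_honest_sample(strings, qc_values, n, fidelity, rng):
    """Simulate a fidelity-weighted sample of size n, where qc_values[i]
    estimates the probability of strings[i] under the circuit."""
    # Step 1: binomial count of strings attributed to correct evaluation.
    k = sum(rng.random() < fidelity for _ in range(n))
    # Step 2: inverse-CDF sampling of k strings according to the QC-values.
    cdf = list(accumulate(qc_values))
    quantum = [strings[bisect.bisect_left(cdf, rng.random() * cdf[-1])]
               for _ in range(k)]
    # Step 3: fill the remaining n-k slots with uniformly selected strings
    # (here with replacement, a simplification) and shuffle to position all.
    rest = [rng.choice(strings) for _ in range(n - k)]
    sample = quantum + rest
    rng.shuffle(sample)
    return sample
```

The point of the construction is that no fresh quantum evaluation happens at output time: the sample is fully determined by the pseudo-random generator.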
The above-described sample would be cryptographically indistinguishable from an honest sample, implying not only that it would pass an SQC test with the same FP rate as the FN rate set for the honest case, but also that it would pass any other practical statistical test.
5.4.3. Comparison of adversaries
The pseudo-fidelity adversary is a special case of the collisional adversary, obtained by using the union of all bins and then selecting the minimum possible number of quantumly generated strings. We conjecture that in the black-box evaluation model this is optimal for sampling budgets small enough that no information is gained from collisions. Obtaining a few collisions is possible in practice by evaluating the circuit a number of times at least of the order of the square root of the string space. However, for 53 qubits, where distinguishability from uniform with a low FP rate and a low fidelity already requires substantial sampling, it is conceivable that an adversary would in fact be able to sample many more strings within the allowed time for sampling.
The collisional adversary takes advantage of observed collisions. Below, we show some examples comparing the pseudo-fidelity adversary vs. a simple collisional adversary that selects strings only from those that appeared at least twice, i.e., using the union of bins with multiplicity at least 2. (For higher budgets an optimal collisional adversary may be more successful by using a variety of bins and their unions.)
Pseudo-delity vs. collisional.
Consider parame-
ters
FP
such that a pseudo-delity adversary
(
) is compelled to include
(e.g., 1024) quantumly-
generated strings, and then pseudo-randomly obtain the
remaining
strings.
We now ask: how many quantumly generated strings would a collisional adversary actually include in the sample if its budget induces a large enough number of collisions? The answer depends on several quantities. For the given budget (implicit, not shown in the indices), let:
- q_i be the expected QC-value of strings in bin i, the values of which are determined in Section D.2.1;
- k_c be the number of strings that the collisional adversary will select from bin i;
- q_PRG (one value per adversary) be the expected QC-value with respect to the set of all strings that the adversary will not select from bins with positive multiplicity, together with all strings not output by quantum evaluation. This is the set from which the adversary will select strings directly by pseudo-random generation.
Consider a simplied collisional adversary that will, for
the same nal sample size
, use
fewer quantumly
generated strings, all from bin . Then we have:
(26)
SQC
PRG PRG
(27)
SQC
PRG PRG
(28)
where SQC
denotes the expected
s
um of QC-values, and
refers to either the pseudo-delity(
) or the collisional
() adversary. The above system yields:
(29)
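With illustrative names (k_f and q_f for the pseudo-fidelity adversary's count and expected QC-value, q_i for bin i, q_prg for the pseudo-randomly generated strings), equating the two expected SQC expressions gives a closed form in the spirit of (29):

```python
def collisional_count(k_f, q_f, q_i, q_prg):
    """Solve k_f*q_f + (N-k_f)*q_prg == k_c*q_i + (N-k_c)*q_prg for k_c;
    the sample size N cancels out of the equation."""
    return k_f * (q_f - q_prg) / (q_i - q_prg)
```

For instance, if strings in bin 1 have expected QC-value about twice the uniform value and strings in bin 2 about three times it, then 1 024 bin-1 strings can be traded for 512 bin-2 strings at equal expected SQC, matching the counts shown in Table 9.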
Example 7.
Consider a setting with 53 qubits. Table 9 shows, for two different budgets, the estimated entropy for collisional attacks of various degrees (i.e., the various multiplicities they take advantage of). The calculation is for a case where the SQC threshold requires a contribution equivalent to 1 024 strings quantumly generated by a pseudo-fidelity adversary.
The number of strings obtained with each multiplicity depends on the budget: a larger budget yields correspondingly more collisions. The expected entropy per string also varies with the budget and the multiplicity, as determined in Section D.2.2. In the table, the precision shown for the expected QC-values was tailored in each case to highlight how small the correction is compared to the small-budget approximations.
Some observations:
1. For both budgets the expected QC-values of strings are still very close to their small-budget approximations, but they would become noticeably different for a high enough budget.
2. For the smaller budget, the attack based on the higher multiplicity is not possible, since the needed number of strings would be greater than the expected number of collisions.
3. The higher the budget, the lower the entropy.
Table 9: Comparison pseudo-delity vs. collisional
Assuming
 
 
 1 1.999999 1024.0  
2 2.999999 512.0   
 1 1.99998 1024.0  
2 2.99998 512.0  
The approximation
is based on
(24)
, i.e., the iterated application
of
(25)
in each bin. The direct application
(24)
, one string at a
time, would yield a higher estimate, by a factor of up to about
10 % in each case.
A more relaxed estimation.
We have considered an adversary with a quantum computer with fidelity 1. But a client may want to assume that the adversary can only quantumly evaluate with a lower fidelity. Then, in a model where any faulty evaluation yields a uniformly random string, the previous formula (9) for the expected entropy per string output by the quantum computer would have to be adjusted. To achieve the same entropy reduction, the adversary would then need a higher sampling budget.
5.5. Final randomness for applications
What use may a client make of a list of millions of strings that may potentially include several hundreds or thousands of bits of fresh entropy? We do not explore in these notes the interesting use of information-theoretic randomness extractors. However, from a practical standpoint, and considering cryptographic use, we recommend a cryptographic combination of the entropy into a bit-string with approximately full entropy. For example, this can be a 512-bit string with about 511 or 512 bits of entropy. (We propose to assume 511.37 bits, as expected for a random function with 512 bits of output.) For practical purposes, this is enough as a seed of a pseudo-random number generator that can then produce a much larger string indistinguishable from random (by whoever does not know the seed). A combination performed by direct application of an efficient cryptographic hash function is a candidate with merit, but it is susceptible to a bias attack.
Hash bias attack.
If the adversary anticipates that the client will extract entropy from the sample by applying a fast-to-evaluate compressing function without secrets, then it can induce a further reduction in entropy. For example, consider that the client uses a cryptographic hash function, such as the Secure Hash Algorithm 3 (SHA-3) version with 512 bits of output, to hash the sample and use the result as the actual randomness output by the protocol. In that case, the adversary can try many modifications of one or two of the last pseudo-random strings in order to bias the hash of the sample, e.g., making it satisfy a secret predicate with small positive probability (e.g., of about 2^-64 if it can still perform about 2^64 hash computations within the deadline). This makes the hash output come from a set of reduced size, e.g., about 2^448 instead of 2^512 possible values.
For example, the adversary could induce the first 64 bits of the hash to be a certain secret known only to the adversary. This would reduce the entropy of the output by about 64 bits. In practice this is not problematic for applications that intend to use a seed not required to have more than, say, 400+ bits of entropy.
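A minimal sketch of the bias attack, using a cheap stand-in predicate (a short all-zero hash prefix) so that it terminates quickly; a real attacker would use a secret predicate and a much larger work factor:

```python
import hashlib

def bias_attack(sample, predicate, max_tries):
    """Replace the last string of the sample with counter values until the
    SHA3-512 hash of the concatenated sample satisfies the predicate."""
    for t in range(max_tries):
        candidate = sample[:-1] + [t.to_bytes(8, "big")]
        if predicate(hashlib.sha3_512(b"".join(candidate)).digest()):
            return candidate
    return None  # predicate not satisfied within the work budget
```

With a predicate of probability 2^-t per attempt, about 2^t hash evaluations suffice in expectation, which is what restricts the output to roughly a 2^(512-t)-sized set.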
Nonetheless, we describe two possible mitigations:
- If the application allows the client to wait for the calculation of QC-values, then the client can include (at least a few of) the QC-values of the strings as part of the hash input. This can be impractical for some applications, due to the required delay in computing the QC-values.
- Using a verifiable delay function for the hash, the adversary does not have enough time to compute it before it has to publish the sample of strings.
Alternative post-processing.
As an alternative to plain hashing, the client can include a secret key (if one exists) as part of the hash input, to prevent the operator from limiting the size of the image space. A standard method for this approach is to use a hash-based message-authentication code. But if the secret is for one-time use, it is enough to prepend it to the rest of the sample before hashing. Actually, the entropy is not lost if the client reveals the secret at this stage, as long as it was unpredictable by the adversary before the adversary had to publish the sample of strings. In an application setting where it is preferable to also prevent the client from biasing the output, the secret can be committed in advance, before the circuit is sent to the quantum computer operator.
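A sketch of the keyed alternative, using the standard HMAC construction with SHA-512 (the function and variable names are illustrative):

```python
import hashlib
import hmac

def extract_seed(sample_strings, client_secret):
    """Combine the sample into a 512-bit seed keyed by the client's secret,
    so the operator cannot pre-compute hashes to bias the output."""
    mac = hmac.new(client_secret, digestmod=hashlib.sha512)
    for s in sample_strings:
        mac.update(s)
    return mac.digest()  # 64 bytes
```

As the text notes, the key may even be revealed after the sample is published, as long as it was unpredictable beforehand.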
5.6. Classes of adversaries
Let an adversary be called “optimal” within a class if, while satisfying the FP constraint, it minimizes the entropy of the sample.
Security reduction.
These notes do not provide a complexity-theoretic reduction. Such a reduction would have to rule out the existence of adversaries much stronger than those we consider here. Instead, we make the entropy estimation for a concrete efficient adversary, arguably optimal within an interesting class. We leave as an open problem investigating how large this class is. A reduction by Aaronson [Aar19] guarantees a minimum of a few bits of entropy, in the setting of one string per circuit. We consider, instead, repeated sampling from the same circuit, as described in Section 2.4.
Class “A” of adversaries.
We define class “A” as the class of adversaries (parametrized by a sampling budget) for which the optimal adversary is a collisional one. We hypothesize that this class captures the range of adversaries of practical concern. We also hypothesize that class A includes the set of efficient adversaries that only access the quantum computer via a black-box interface. While we do not know of any efficient adversary, outside class A, that is better than the collisional adversary, we do not rule out that possibility. We hypothesize that any efficient adversary outside of this class would need to use a non-trivial (currently unknown) mathematical trick taking advantage of the circuit specification. The intuition for this hypothesis is conveyed below by an analogy.
An analogy.
We do not prove how general or restrictive class A is with respect to affecting the distribution of QC-values, nor do we attempt to relate it to a complexity-theoretic argument. However, we provide an intuitive argument by making an analogy with the properties of Carter-Wegman universal hashing [CW77].
Universal hashing:
1. Member of a large class.
The hash function
is uniformly selected from a large family of hash
functions, all with the same output range.
2. Equal distribution of output values.
For each
hash function, each possible output has the same
number of pre-images.
3. Advantage in predicting or biasing the output values.
Until the hash function is defined, the future hash output of any particular input cannot be predicted any better than guessing the output of a uniform selection over the range.
Sampling from random quantum circuits:
1. Member of a large class.
The random quantum
circuit is (pseudo-)uniformly selected from a large
class of circuits, all with the same output range.