Proceeding Paper
What Is Randomness? The Interplay between Alpha Entropies, Total Variation and Guessing †
Olivier Rioul
LTCI, Télécom Paris, Institut Polytechnique de Paris, 91120 Palaiseau, France; olivier.rioul@telecom-paris.fr
† Presented at the 41st International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, Paris, France, 18–22 July 2022.
Abstract: In many areas of computer science, it is of primary importance to assess the randomness of a certain variable $X$. Many different criteria can be used to evaluate randomness, possibly after observing some disclosed data. A "sufficiently random" $X$ is often described as "entropic". Indeed, Shannon's entropy is known to provide a resistance criterion against modeling attacks. More generally, one may consider the Rényi $\alpha$-entropy, where Shannon's entropy, collision entropy and min-entropy are recovered as the particular cases $\alpha = 1$, $2$ and $+\infty$, respectively. Guesswork or guessing entropy is also of great interest in relation to $\alpha$-entropy. On the other hand, many applications rely instead on the "statistical distance", also known as "total variation" distance, to the uniform distribution. This criterion is particularly important because a very small distance ensures that no statistical test can effectively distinguish between the actual distribution and the uniform distribution. In this paper, we establish optimal lower and upper bounds between $\alpha$-entropy and guessing entropy on one hand, and error probability and total variation distance to the uniform on the other hand. In this context, it turns out that the best known "Pinsker inequality" and recent "reverse Pinsker inequalities" are not necessarily optimal. We recover or improve previous Fano-type and Pinsker-type inequalities used for several applications.
Keywords: statistical (total variation) distance; α-entropy; guessing entropy; probability of error
1. Some Well-Known “Randomness” Measures
It is of primary importance to assess the "randomness" of a certain random variable $X$, which represents some identifier, cryptographic key, signature or any type of intended secret. Applications include pseudo-random bit generators [1], general cipher security [2], randomness extractors [3] and hash functions ([4], Chapter 8), physically unclonable functions [5], and true random number generators [6], to list but a few. In all of these examples, $X$ takes finitely many values $x \in \{x_1, x_2, \ldots, x_M\}$ with probabilities $p_X(x) = P(X = x)$. In this paper, it will be convenient to denote by
$$p_{(1)} \ge p_{(2)} \ge \cdots \ge p_{(M)} \tag{1}$$
any rearrangement of the probabilities $p(x)$ in descending order (where ties can be resolved arbitrarily): $p_{(1)} = \max_x p_X(x)$ is the maximum probability, $p_{(2)}$ the second maximum, etc. In addition, we need to define the cumulative sums
$$P_{(k)} \triangleq p_{(1)} + \cdots + p_{(k)} \qquad (k = 1, 2, \ldots, M) \tag{2}$$
where, in particular, $P_{(M)} = 1$.
Many different criteria can be used to evaluate the randomness of $X$ or its distribution $p_X$, depending on the type of attack that can be carried out to recover the whole or part of the secret, possibly after observing disclosed data $Y$. The observed random variable $Y$ can be any random variable and is not necessarily discrete. The conditional probability
distribution of $X$ having observed $Y = y$ is denoted by $p_{X|y}$ to distinguish it from the unconditional distribution $p_X$. To simplify the notation, we write
$$p(x) \triangleq p_X(x) = P(X = x) \tag{3}$$
$$p(x|y) \triangleq p_{X|y}(x) = P(X = x \mid Y = y). \tag{4}$$
A "sufficiently random" secret is often described as "entropic" in the literature. Indeed, Shannon's entropy
$$H(X) = H(p) \triangleq \sum_x p(x) \log \frac{1}{p(x)} = \mathbb{E} \log \frac{1}{p(X)} \tag{5}$$
(with the convention $0 \log \frac{1}{0} = 0$) is known to provide a resistance criterion against modeling attacks. It was introduced by Shannon as a measure of uncertainty of $X$. The average entropy after having observed $Y$ is the usual conditional entropy
$$H(X|Y) \triangleq \mathbb{E}_y H(p_{X|y}) = \mathbb{E} \log \frac{1}{p(X|Y)}. \tag{6}$$
A well-known generalization of Shannon's entropy is the Rényi entropy of order $\alpha > 0$, or $\alpha$-entropy,
$$H_\alpha(X) = H_\alpha(p) \triangleq \frac{1}{1-\alpha} \log \sum_x p(x)^\alpha = \frac{\alpha}{1-\alpha} \log \|p_X\|_\alpha \tag{7}$$
where, by continuity as $\alpha \to 1$, the 1-entropy $H_1(X) = H(X)$ is Shannon's entropy. One may consider many different definitions of conditional $\alpha$-entropy [7], but for many applications the preferred choice is Arimoto's definition [8–10]
$$H_\alpha(X|Y) \triangleq \frac{\alpha}{1-\alpha} \log \mathbb{E}_y \|p_{X|y}\|_\alpha \tag{8}$$
where the expectation over $Y$ is taken over the "$\alpha$-norm" inside the logarithm. (Strictly speaking, $\|\cdot\|_\alpha$ is not a norm when $\alpha < 1$.)
For $\alpha = 2$, the collision entropy
$$H_2(X) = H_2(p) = \log \frac{1}{P(X = X')}, \tag{9}$$
where $X'$ is an independent copy of $X$, is often used to ensure security against collision attacks. Perhaps one of the most popular criteria is the min-entropy, defined when $\alpha \to +\infty$ as
$$H_\infty(X) = H_\infty(p) = \log \frac{1}{p_{(1)}} = \log \frac{1}{1 - P_e(X)}, \tag{10}$$
whose maximization is equivalent to a probability criterion to ensure a worst-case security level. Arimoto's conditional $\infty$-entropy takes the form
$$H_\infty(X|Y) = \log \frac{1}{1 - P_e(X|Y)} \tag{11}$$
where we have noted
$$P_e(X) = P_e(p) \triangleq 1 - p_{(1)} = 1 - P_{(1)} \tag{12}$$
$$P_e(X|Y) \triangleq \mathbb{E}_y P_e(X|y). \tag{13}$$
The latter quantities correspond to the minimum probability of decision error using a MAP (maximum a posteriori probability) rule (see, e.g., [11]).
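As a concrete illustration of Equations (5)–(12) (not part of the original development), the following Python sketch computes the $\alpha$-entropies and the MAP error probability of a finite distribution; the function names and the example distribution are of my choosing.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Rényi alpha-entropy H_alpha(p) in bits, Equation (7); alpha = 1 gives Shannon
    entropy (5), alpha = 2 the collision entropy (9), alpha = inf the min-entropy (10)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                         # convention 0 log(1/0) = 0
    if alpha == 1:
        return float(-np.sum(p * np.log2(p)))
    if np.isinf(alpha):
        return float(-np.log2(p.max()))
    return float(np.log2(np.sum(p ** alpha)) / (1.0 - alpha))

def error_probability(p):
    """Minimum MAP error probability Pe(p) = 1 - p_(1), Equation (12)."""
    return 1.0 - float(np.max(p))

p = [0.5, 0.25, 0.125, 0.125]            # arbitrary example distribution
print(renyi_entropy(p, 1))               # 1.75 bits
print(renyi_entropy(p, 2))               # about 1.54 bits
print(renyi_entropy(p, np.inf))          # 1.0 bit
print(error_probability(p))              # 0.5
```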
Guesswork or guessing entropy [2,12]
$$G(X) = G(p_X) \triangleq \sum_{i=1}^{M} i \cdot p_{(i)} \tag{14}$$
and, more generally, guessing moments of order $\rho > 0$ or $\rho$-guessing entropy
$$G_\rho(X) = G_\rho(p_X) \triangleq \sum_{i=1}^{M} i^\rho \cdot p_{(i)} \tag{15}$$
are also of great interest in relation to $\alpha$-entropy [10,13,14]. The conditional versions given observation $Y$ are the expectations
$$G_\rho(X|Y) \triangleq \mathbb{E}_y G_\rho(X|y). \tag{16}$$
When $\rho = 1$, this represents the average number of guesses that an attacker has to make to guess the secret $X$ correctly after having observed $Y$ [13].
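A corresponding minimal sketch of the $\rho$-guessing entropy of Equations (14)–(15), under the same illustrative assumptions (helper name and example values are mine):

```python
import numpy as np

def guessing_entropy(p, rho=1.0):
    """rho-guessing entropy G_rho(p) = sum_i i^rho * p_(i), Equations (14)-(15),
    with the probabilities rearranged in descending (optimal guessing) order."""
    p_sorted = np.sort(np.asarray(p, dtype=float))[::-1]
    ranks = np.arange(1, len(p_sorted) + 1, dtype=float)
    return float(np.sum(ranks ** rho * p_sorted))

p = [0.5, 0.25, 0.125, 0.125]
print(guessing_entropy(p))          # 1*0.5 + 2*0.25 + 3*0.125 + 4*0.125 = 1.875 guesses
print(guessing_entropy(p, rho=2))   # second guessing moment G_2
```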
2. Statistical (Total Variation) Distance to the Uniform Distribution
As shown in the sequel, all quantities introduced in the preceding section ($H$, $H_\alpha$, $P_e$, $G$, $G_\rho$) have many properties in common. In particular, each of these quantities attains
• its minimum value for a delta (Dirac) distribution $p = \delta$, that is, a deterministic random variable $X$ with $p_{(1)} = 1$ and all other probabilities $= 0$;
• its maximum value for the uniform distribution $p = u$, that is, a uniformly distributed random variable $X$ with $p(x) = \frac{1}{M}$ for all $x$.
Indeed, it can be easily checked that
$$0 \le H_\alpha(X) \le \log M \tag{17}$$
$$1 \le G(X) \le \frac{M+1}{2} \tag{18}$$
$$0 \le P_e(X) \le 1 - \frac{1}{M} \tag{19}$$
where the lower (resp. upper) bounds are attained for a delta (resp. uniform) distribution: the uniform distribution is the "most entropic" ($H_\alpha$), "hardest to guess" ($G$), and "hardest to detect" ($P_e$).
The maximum entropy property is related to the minimization of divergence [15]:
$$D(p\|u) = \log M - H(p) \tag{20}$$
where $D(p\|q) = \sum_x p(x) \log \frac{p(x)}{q(x)} \ge 0$ denotes the Kullback-Leibler divergence, which vanishes if and only if $p = q$. Therefore, entropy appears as the complementary value of the divergence to the uniform distribution. Similarly, for $\alpha$-entropy,
$$D_\alpha(p\|u) = \log M - H_\alpha(p) \tag{21}$$
where $D_\alpha(p\|q) = \frac{1}{\alpha-1} \log \sum_x p(x)^\alpha q(x)^{1-\alpha}$ denotes the Rényi $\alpha$-divergence [16] (the Bhattacharyya distance for $\alpha = \frac{1}{2}$).
Instead of the divergence to the uniform distribution, it is often desirable to rely on the statistical distance, also known as the total variation distance, to the uniform distribution. The general expression of the total variation distance is
$$\Delta(p, q) = \frac{1}{2} \sum_x |p(x) - q(x)| \tag{22}$$
where the 1/2 factor is there to ensure that $0 \le \Delta(p, q) \le 1$. Equivalently,
$$\Delta(p, q) = \max_T |P(T) - Q(T)| \tag{23}$$
where the maximum is over any event $T$ and $P$, $Q$ denote the respective probabilities w.r.t. $p$ and $q$. As is well known, the maximum
$$\Delta(p, q) = P(T_+) - Q(T_+) \tag{24}$$
is attained when $T = T_+ = \{x \mid p(x) \ge q(x)\}$.
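The following illustrative sketch (function name and example values are mine) computes the total variation distance of Equation (22) and checks the maximizing event $T_+$ of Equation (24):

```python
import numpy as np

def total_variation(p, q):
    """Delta(p, q) = (1/2) sum_x |p(x) - q(x)|, Equation (22)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 0.5 * float(np.abs(p - q).sum())

p = np.array([0.5, 0.25, 0.125, 0.125])
u = np.full(4, 0.25)                       # uniform distribution
T_plus = p >= u                            # maximizing event T+ = {x : p(x) >= q(x)}
print(total_variation(p, u))               # 0.25
print(p[T_plus].sum() - u[T_plus].sum())   # also 0.25, as in Equation (24)
```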
The total variation criterion is particularly important because a very small distance $\Delta(p, q)$ ensures that no statistical test can effectively distinguish between $p$ and $q$. In fact, given some observation $X$ following either $p$ (null hypothesis $H_0$) or $q$ (alternate hypothesis $H_1$), such a statistical test takes the form "is $X \in T$?" (then accept $H_0$, otherwise reject $H_0$). If $|P(X \in T) - Q(X \in T)| \le \Delta(p, q)$ is small enough, the type-I and type-II errors have total probability $P(X \notin T) + Q(X \in T) \approx 1$. Thus, in this sense, the two hypotheses $p$ and $q$ are indistinguishable (statistically equivalent).
By analogy with (20) and (21), we can then define the "statistical randomness" $R(X) = R(p) \ge 0$ as the complementary value of the statistical distance to the uniform distribution, i.e., such that
$$\Delta(p, u) = 1 - R(p) \tag{25}$$
holds. With this definition,
$$R(X) = R(p) \triangleq 1 - \frac{1}{2} \sum_x \Bigl|p(x) - \frac{1}{M}\Bigr| \tag{26}$$
is maximum $= 1$ when $\Delta(p, u) = 0$, i.e., $p = u$. Thus the uniform distribution $u$ is the "most random". What is fundamental is that $R(X) \approx 1$ ensures that no statistical test can effectively distinguish the actual distribution from the uniform distribution.
Again, the "least random" distribution corresponds to the deterministic case. In fact, from (24) we have
$$\Delta(p, u) = P(T_+) - \frac{K}{M} = P_{(K)} - \frac{K}{M} \tag{27}$$
where $T_+ = \{x \mid p(x) \ge \frac{1}{M}\}$, of cardinality $K = |T_+|$, and $P(T_+) = P_{(K)}$ by definition (2). It is easily seen that $\Delta(p, u)$ attains its maximum value $= 1 - \frac{1}{M}$ if and only if $p = \delta$ is a delta distribution. In summary,
$$\frac{1}{M} \le R(X) \le 1 \tag{28}$$
where the lower (resp. upper) bound is attained for a delta (resp. uniform) distribution. The conditional version is again taken by averaging over the observation:
$$R(X|Y) \triangleq \mathbb{E}_y R(X|y). \tag{29}$$
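A minimal sketch of the statistical randomness $R$ of Equation (26), checking the two extremes of (28) on a Dirac and a uniform distribution (the example values are mine, not the paper's):

```python
import numpy as np

def randomness(p):
    """Statistical randomness R(p) = 1 - Delta(p, u), Equation (26)."""
    p = np.asarray(p, dtype=float)
    u = np.full(len(p), 1.0 / len(p))
    return 1.0 - 0.5 * float(np.abs(p - u).sum())

M = 4
dirac = np.eye(1, M)[0]                         # deterministic (delta) distribution
uniform = np.full(M, 1.0 / M)
print(randomness(dirac), randomness(uniform))   # 0.25 (= 1/M) and 1.0, the extremes in (28)
```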
3. F-Concavity: Knowledge Reduces Randomness and Data Processing
Knowledge of the observed data $Y$ (on average) reduces uncertainty, improves detection or guessing, and reduces randomness, in the sense that
$$H_\alpha(X|Y) \le H_\alpha(X) \tag{30}$$
$$G(X|Y) \le G(X) \tag{31}$$
$$P_e(X|Y) \le P_e(X) \tag{32}$$
$$R(X|Y) \le R(X). \tag{33}$$
When $\alpha = 1$, the property $H(X|Y) \le H(X)$ is well known ("conditioning reduces entropy" [15]): the difference $H(X) - H(X|Y) = I(X;Y)$ is the mutual information, which is nonnegative. Property (30) for $\alpha \ne 1$ is also well known, see [7,8]. In view of (10) and (11), the case $\alpha = +\infty$ in (30) is equivalent to (32), which is obvious in the sense that any observation can only improve MAP detection. This, as well as (31), is also easily proved directly (see, e.g., [17]).
For all quantities $H$, $P_e$, $G$, $R$, the conditional quantity is obtained by averaging over the observation as in (6), (13), (16) and (29). Since $p(x) = \mathbb{E}_y p(x|y)$, the fact that knowledge of $Y$ reduces $H$, $P_e$, $G$ or $R$ amounts to saying that these are concave functions of the distribution $p$ of $X$. Note that concavity of $R(X) = R(p)$ in $p$ is clear from the definition (26), which shows (33).
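As a numerical illustration of (30) (for $\alpha = 1$) and (33), not taken from the paper and with an arbitrary joint distribution, one can check that conditioning on $Y$ can only decrease entropy and randomness on average:

```python
import numpy as np

# A small joint distribution p(x, y): rows = values of X, columns = values of Y.
pXY = np.array([[0.30, 0.10],
                [0.05, 0.25],
                [0.15, 0.15]])
pY = pXY.sum(axis=0)                  # marginal of Y
pX = pXY.sum(axis=1)                  # marginal of X: p(x) = E_y p(x|y)

def shannon(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def randomness(p):
    u = np.full(len(p), 1.0 / len(p))
    return 1.0 - 0.5 * float(np.abs(p - u).sum())

# Conditional quantities: average over y of the quantity applied to p(.|y).
H_cond = sum(pY[j] * shannon(pXY[:, j] / pY[j]) for j in range(len(pY)))
R_cond = sum(pY[j] * randomness(pXY[:, j] / pY[j]) for j in range(len(pY)))
print(shannon(pX) >= H_cond)      # True: conditioning reduces entropy, (30) with alpha = 1
print(randomness(pX) >= R_cond)   # True: knowledge reduces randomness, (33)
```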
For entropy $H$, this also has been given some physical interpretation: "mixing" distributions (taking convex combinations of probability distributions) can only increase the entropy on average. For example, given any two distributions $p$ and $q$, $H(\lambda p + \bar{\lambda} q) \ge \lambda H(p) + \bar{\lambda} H(q)$ where $0 \le \lambda = 1 - \bar{\lambda} \le 1$. Similarly, such mixing of distributions increases the average probability of error $P_e$, guessing entropy $G$, and statistical randomness $R$.
For conditional $\alpha$-entropy $H_\alpha(X|Y)$ where $\alpha \ne 1$, the averaging over $Y$ in the definition (8) is made on the $\alpha$-norm of the distribution $p_{X|y}$, which is known to be convex for $\alpha > 1$ (by Minkowski's inequality) and concave for $0 < \alpha < 1$ (by the reverse Minkowski inequality). Hence the fact that knowledge reduces $\alpha$-entropy (inequality (30)) is equivalent to the fact that $H_\alpha(p)$ in (7) is an $F$-concave function, that is, an increasing function $F$ of a concave function in $p$, where $F(x) = \frac{\alpha}{1-\alpha} \log(\mathrm{sgn}(1-\alpha)\, x)$. The average over $Y$ in $H_\alpha(X|Y)$ is made on the quantity $F^{-1}(H_\alpha)$ instead of $H_\alpha$. Thus, for example, $H_{1/2}(p)$ is a log-concave function of $p$.
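A sketch of Arimoto's definition (8), where the expectation over $Y$ is taken on the $\alpha$-norm inside the logarithm; this is only an illustration (helper names are mine) and reuses the same arbitrary joint distribution as in the previous sketch:

```python
import numpy as np

def arimoto_conditional(pXY, alpha):
    """Arimoto's H_alpha(X|Y), Equation (8): the expectation over Y is taken on
    the alpha-norm ||p_{X|y}||_alpha inside the logarithm (result in bits)."""
    pY = pXY.sum(axis=0)
    norms = np.array([np.sum((pXY[:, j] / pY[j]) ** alpha) ** (1.0 / alpha)
                      for j in range(pXY.shape[1])])
    return alpha / (1.0 - alpha) * float(np.log2(np.dot(pY, norms)))

def renyi(p, alpha):
    """Unconditional H_alpha(p), Equation (7), in bits (alpha != 1)."""
    return float(np.log2(np.sum(p ** alpha)) / (1.0 - alpha))

pXY = np.array([[0.30, 0.10],
                [0.05, 0.25],
                [0.15, 0.15]])
pX = pXY.sum(axis=1)
for a in (0.5, 2.0):
    print(arimoto_conditional(pXY, a) <= renyi(pX, a))   # True: inequality (30)
```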
A straightforward generalization of (30)–(33) is the data processing inequality: for any Markov chain $X - Y - Z$, i.e., such that $p(x|y, z) = p(x|y)$,
$$H_\alpha(X|Y) \le H_\alpha(X|Z) \tag{34}$$
$$G(X|Y) \le G(X|Z) \tag{35}$$
$$P_e(X|Y) \le P_e(X|Z) \tag{36}$$
$$R(X|Y) \le R(X|Z). \tag{37}$$
When $\alpha = 1$, the property $H(X|Y) \le H(X|Z)$ amounts to $I(X;Z) \le I(X;Y)$, i.e., (post-)processing can never increase information. Inequalities (34)–(37) can be deduced from (30)–(33) by considering a fixed $Z = z$, averaging over $Z$ to show that $H(X|Y, Z) \le H(X|Z)$, etc. (additional knowledge reduces randomness), and then noting that $p(x|y, z) = p(x|y)$ by the Markov property; see, e.g., [7,18] for $H_\alpha$ and [17] for $G$. Conversely, (30)–(33) can be re-obtained from (34)–(37) as the particular case $Z = 0$ (any deterministic variable representing zero information).
4. S-Concavity: Mixing Increases Randomness and Data Processing
Another type of mixing (different from the one described in the preceding section) is also useful in certain physical science considerations. It can be described as a sequence of elementary mixing operations as follows. Suppose that one only modifies two probability values $p_i = p(x_i)$ and $p_j = p(x_j)$ for $i \ne j$. Since the result should again be a probability distribution, the sum $p_i + p_j$ should be kept constant. Then there are two possibilities:
• $|p_i - p_j|$ decreases; the resulting distribution is "smoother", "more spread out", "more disordered"; the resulting operation can be written as $(p_i, p_j) \mapsto (\lambda p_i + \bar{\lambda} p_j,\ \lambda p_j + \bar{\lambda} p_i)$ where $0 \le \lambda = 1 - \bar{\lambda} \le 1$, also known as a "transfer" operation. We call it an elementary mixing operation, or M-transformation for short.
• $|p_i - p_j|$ increases; this is the reverse operation, an elementary unmixing operation, or U-transformation for short.
We say that a quantity is $s$-concave if it increases by any M-transformation (equivalently, decreases by any U-transformation). Note that any increasing function $F$ of an $s$-concave function is again $s$-concave.
This notion coincides with that of Schur-concavity from majorization theory [19]. In fact, we can say that $p$ is majorized by $q$, and we write $p \prec q$, if $p$ is obtained from $q$ by a (finite) sequence of elementary M-transformations, or, what amounts to the same, that $q$ majorizes $p$, that is, $q$ is obtained from $p$ by a (finite) sequence of elementary U-transformations. A well-known result ([19], p. 34) states that $p \prec q$ if and only if
$$P_{(k)} \le Q_{(k)} \qquad (0 < k < M) \tag{38}$$
(see definition (2)), where always $P_{(M)} = Q_{(M)} = 1$.
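A minimal sketch of the majorization test via characterization (38), using descending cumulative sums (function name hypothetical); it confirms that $u \prec p \prec \delta$ for the example distribution used earlier:

```python
import numpy as np

def majorizes(q, p):
    """Return True if p ≺ q (q majorizes p), using characterization (38):
    the descending cumulative sums of p never exceed those of q."""
    Pk = np.cumsum(np.sort(np.asarray(p, dtype=float))[::-1])
    Qk = np.cumsum(np.sort(np.asarray(q, dtype=float))[::-1])
    return bool(np.all(Pk <= Qk + 1e-12))

u = [0.25, 0.25, 0.25, 0.25]
p = [0.5, 0.25, 0.125, 0.125]
d = [1.0, 0.0, 0.0, 0.0]
print(majorizes(p, u), majorizes(d, p))   # True, True: u ≺ p ≺ delta
```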
From the above definitions it is immediate to see that all previously considered quantities $H$, $H_\alpha$, $G$, $G_\rho$, $P_e$, $R$ are $s$-concave: mixing increases uncertainty, guessing, error, and randomness, that is, $p \prec q$ implies
$$H_\alpha(p) \ge H_\alpha(q) \tag{39}$$
$$G_\rho(p) \ge G_\rho(q) \tag{40}$$
$$P_e(p) \ge P_e(q) \tag{41}$$
$$R(p) \ge R(q). \tag{42}$$
For $H_\alpha$ and $R$ this can be easily seen from the fact that these quantities can be written as (an increasing function of) a quantity of the form $\sum_x \varphi(p(x))$ where $\varphi$ is concave. Then the effect of an M-transformation $(p_i, p_j) \mapsto (\lambda p_i + \bar{\lambda} p_j,\ \lambda p_j + \bar{\lambda} p_i)$ gives $\varphi(\lambda p_i + \bar{\lambda} p_j) + \varphi(\lambda p_j + \bar{\lambda} p_i) \ge \lambda \varphi(p_i) + \bar{\lambda} \varphi(p_j) + \lambda \varphi(p_j) + \bar{\lambda} \varphi(p_i) = \varphi(p_i) + \varphi(p_j)$. For $P_e$ it is obvious, and for $G$ and $G_\rho$ it is also easily proved using characterization (38) and summation by parts [17].
Another kind of (functional or deterministic) data processing inequality can be obtained from (39)–(42) as a particular case. For any deterministic function $f$,
$$H_\alpha(f(X)) \le H_\alpha(X) \tag{43}$$
$$G(f(X)) \le G(X) \tag{44}$$
$$P_e(f(X)) \le P_e(X) \tag{45}$$
$$R(f(X)) \le R(X). \tag{46}$$
Thus deterministic processing (by $f$) decreases (cannot increase) uncertainty, can only make guessing or detection easier, and decreases randomness. For $\alpha = 1$, the inequality $H(f(X)) \le H(X)$ can also be seen from the data processing inequality of the preceding section by noting that $H(f(X)) = I(f(X); f(X)) \le I(X; f(X)) \le H(X)$ (since $X - f(X) - f(X)$ is trivially a Markov chain).
To prove (43)–(46) in general, consider preimages by $f$ of values of $y = f(x)$; it is enough to show that each of the quantities $H_\alpha$, $P_e$, $G$, or $R$ decreases by the elementary operation consisting of putting together two distinct values $x_i$, $x_j$ of $x$ in the same preimage of $y$. On probability distributions, this operation amounts to the U-transformation $(p_i, p_j) \mapsto (p_i + p_j, 0)$ and the result follows by $s$-concavity.
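To illustrate this proof idea numerically (with arbitrary example values of my choosing): merging two outcomes, as a deterministic $f$ does, is exactly the U-transformation $(p_i, p_j) \mapsto (p_i + p_j, 0)$, and the $s$-concave quantities can only decrease:

```python
import numpy as np

def shannon(p):
    p = np.asarray(p, dtype=float); p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def randomness(p):
    p = np.asarray(p, dtype=float)
    u = np.full(len(p), 1.0 / len(p))
    return 1.0 - 0.5 * float(np.abs(p - u).sum())

# Merging x1 and x2 into the same preimage of y = f(x) is the U-transformation
# (p1, p2) -> (p1 + p2, 0).
p      = np.array([0.4, 0.3, 0.2, 0.1])
merged = np.array([0.4 + 0.3, 0.2, 0.1, 0.0])
print(shannon(merged) <= shannon(p))         # True, consistent with (43) for alpha = 1
print(randomness(merged) <= randomness(p))   # True, consistent with (46)
```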
An equivalent property of (43)–(46) is the fact that any additional random variable $Y$ increases uncertainty, probability of error, guessing, and randomness, in the sense that
$$H_\alpha(X) \le H_\alpha(X, Y) \tag{47}$$
$$G(X) \le G(X, Y) \tag{48}$$
$$P_e(X) \le P_e(X, Y) \tag{49}$$
$$R(X) \le R(X, Y). \tag{50}$$
This is a particular case of (43)–(46) applied to the joint $(X, Y)$ and the first projection $f(x, y) = x$. Conversely, (43)–(46) follows from (47)–(50) by applying it to $(f(X), X)$ in place of $(X, Y)$ and noting that the distribution of $(f(X), X)$ is essentially that of $X$.
5. Optimal Fano-Type and Pinsker-Type Bounds
We have seen that informational quantities such as entropies $H$, $H_\alpha$ and guessing entropies $G$, $G_\rho$ on one hand, and statistical quantities such as the probability of error for MAP detection $P_e$ and the statistical randomness $R$ on the other hand, satisfy many common properties: decrease by knowledge, data processing, increase by mixing, etc. For this reason, it is desirable to establish the best possible bounds between one informational quantity (such as $H_\alpha$ or $G_\rho$) and one statistical quantity ($P_e$ or $R = 1 - \Delta(p, u)$).
To achieve this, we remark that for any distribution $p$, we have the following majorizations. For fixed $P_e = 1 - P_s$:
$$\Bigl(P_s, \tfrac{P_e}{M-1}, \ldots, \tfrac{P_e}{M-1}\Bigr) \prec p \prec (P_s, \ldots, P_s,\ 1 - K P_s,\ 0, \ldots, 0) \tag{51}$$
where (necessarily) $K = \lfloor \tfrac{1}{P_s} \rfloor$, and for fixed $R = 1 - \Delta$:
$$\Bigl(\underbrace{\tfrac{1}{M} + \tfrac{\Delta}{K}, \ldots, \tfrac{1}{M} + \tfrac{\Delta}{K}}_{K \text{ times}},\ \underbrace{\tfrac{1}{M} - \tfrac{\Delta}{M-K}, \ldots, \tfrac{1}{M} - \tfrac{\Delta}{M-K}}_{M-K \text{ times}}\Bigr) \prec p \prec \Bigl(\Delta + \tfrac{1}{M},\ \underbrace{\tfrac{1}{M}, \ldots, \tfrac{1}{M}}_{L-1 \text{ times}},\ R - \tfrac{L}{M},\ 0, \ldots, 0\Bigr) \tag{52}$$
where $K = |\{p \ge \tfrac{1}{M}\}|$ as in (27) and (necessarily) $L = \lfloor M R \rfloor$ ($K$ can possibly be any integer between 1 and $L$). These majorizations are easily established using characterizations (12), (27) and (38).
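As an illustration of how (51) yields the closed-form bounds discussed next, the sketch below evaluates Shannon entropy at the two extremal distributions of (51). This is a minimal sketch, assuming $0 < P_e < 1 - \tfrac{1}{M}$; the helper names and example values are mine, not the paper's.

```python
import numpy as np

def shannon(p):
    p = np.asarray(p, dtype=float); p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def fano_bounds(Pe, M):
    """Upper and lower bounds on H(X) for fixed error probability Pe, obtained by
    evaluating the entropy at the two extremal distributions of majorization (51)."""
    Ps = 1.0 - Pe
    upper = np.r_[Ps, np.full(M - 1, Pe / (M - 1))]                   # most entropic
    K = int(np.floor(1.0 / Ps))
    lower = np.r_[np.full(K, Ps), 1.0 - K * Ps, np.zeros(M - K - 1)]  # least entropic
    return shannon(lower), shannon(upper)

lo, hi = fano_bounds(Pe=0.5, M=4)
p = [0.5, 0.25, 0.125, 0.125]      # a distribution with Pe = 0.5
print(lo <= shannon(p) <= hi)      # True
print(hi)                          # about 1.79 bits: the classical Fano bound h(Pe) + Pe*log2(M-1)
```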
Applying $s$-concavity of entropies $H_\alpha$ or $G_\rho$ to (51) gives closed-form upper bounds of entropies as a function of $P_e$, known as Fano inequalities; and closed-form lower bounds, known as reverse Fano inequalities. Figure 1 shows some optimal regions.
Figure 1. Optimal regions: entropies (in bits) vs. error probability $P_e$. Panels show $H_{1/2}$, $H$, $H_2$ and $\log G$ as functions of $P_e$; top row $M = 4$, bottom row $M = 32$.
The original Fano inequality was an upper bound on the conditional entropy $H(X|Y)$ as a function of $P_e(X|Y)$. It can be shown that the upper bounds in the conditional case are unchanged. Lower bounds on conditional entropies or $\alpha$-entropies, however, have to be slightly changed due to the averaging operation inside the function $F$ (see Section 3 above), by taking the (piecewise linear) convex envelope of the lower curve on $F^{-1}(H_\alpha)$. In this way, one easily recovers the results of [20] for $H$, [11] for $H_\alpha$, and [14,17] for $G$ and $G_\rho$.
Likewise, applying $s$-concavity of entropies $H_\alpha$ or $G_\rho$ to (52) gives closed-form upper bounds of entropies as a function of $R$, similar to Pinsker inequalities; and closed-form lower bounds, similar to reverse Pinsker inequalities. Figure 2 shows some optimal regions.
The various Pinsker and reverse Pinsker inequalities that can be found in the literature give bounds between $\Delta(p, q)$ and $D(p\|q)$ for general $q$. Such inequalities find application in quantum physics [21] and in deriving lower bounds on the minimax risk in nonparametric estimation [22]. As they are of more general applicability, they turn out not to be optimal here, since we have optimized the bounds in the particular case $q = u$. Using our method, one again easily recovers previous results of [23] (and [24], Theorem 26) for $H$, and improves previous inequalities used for several applications [3,4,6].
Figure 2. Optimal regions: entropies (in bits) vs. randomness $R$. Panels show $H_{1/2}$, $H$, $H_2$ and $\log G$ as functions of $R$; top row $M = 4$, bottom row $M = 32$.
6. Conclusions
Using a simple method based on "mixing" or majorization, we have established optimal (Fano-type and Pinsker-type) bounds between entropic quantities ($H_\alpha$, $G_\rho$) and statistical quantities ($P_e$, $R$), in an interplay between information theory and statistics. As a perspective, a similar methodology could be developed for the statistical distance to an arbitrary (not necessarily uniform) distribution.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The author declares no conflict of interest.
References
1. Maurer, U.M. A Universal Statistical Test for Random Bit Generators. J. Cryptol. 1992, 5, 89–105. [CrossRef]
2. Pliam, J.O. Guesswork and Variation Distance as Measures of Cipher Security. In SAC 1999: Selected Areas in Cryptography, Proceedings of the International Workshop on Selected Areas in Cryptography, Kingston, ON, Canada, 9–10 August 1999; Heys, H., Adams, C., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 1999; Volume 1758, pp. 62–77.
3. Chevalier, C.; Fouque, P.A.; Pointcheval, D.; Zimmer, S. Optimal Randomness Extraction from a Diffie-Hellman Element. In Advances in Cryptology—EUROCRYPT 2009, Proceedings of the Annual International Conference on the Theory and Applications of Cryptographic Techniques, Cologne, Germany, 26–30 April 2009; Joux, A., Ed.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2009; Volume 5479, pp. 572–589.
4. Shoup, V. A Computational Introduction to Number Theory and Algebra, 2nd ed.; Cambridge University Press: Cambridge, UK, 2009.
5. Schaub, A.; Boutros, J.J.; Rioul, O. Entropy Estimation of Physically Unclonable Functions via Chow Parameters. In Proceedings of the 57th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 24–27 September 2019.
6. Killmann, W.; Schindler, W. A Proposal for Functionality Classes for Random Number Generators. Ver. 2.0, Anwendungshinweise und Interpretationen zum Schema (AIS) 31 of the Bundesamt für Sicherheit in der Informationstechnik. 2011. Available online: https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Certification/Interpretations/AIS_31_Functionality_classes_for_random_number_generators_e.pdf?__blob=publicationFile&v=4 (accessed on 11 March 2021).
7. Fehr, S.; Berens, S. On the conditional Rényi entropy. IEEE Trans. Inf. Theory 2014, 60, 6801–6810. [CrossRef]
8. Arimoto, S. Information measures and capacity of order α for discrete memoryless channels. In Topics in Information Theory; Csiszár, I., Elias, P., Eds.; Colloquium Mathematica Societatis János Bolyai, 2nd ed.; North Holland: Amsterdam, The Netherlands, 1977; Volume 16, pp. 41–52.
9. Liu, Y.; Cheng, W.; Guilley, S.; Rioul, O. On conditional alpha-information and its application in side-channel analysis. In Proceedings of the 2021 IEEE Information Theory Workshop (ITW2021), Online, 17–21 October 2021.
10. Rioul, O. Variations on a theme by Massey. IEEE Trans. Inf. Theory 2022, 68, 2813–2828. [CrossRef]
11. Sason, I.; Verdú, S. Arimoto–Rényi Conditional Entropy and Bayesian M-Ary Hypothesis Testing. IEEE Trans. Inf. Theory 2018, 64, 4–25. [CrossRef]
12. Massey, J.L. Guessing and entropy. In Proceedings of the IEEE International Symposium on Information Theory, Trondheim, Norway, 27 June–1 July 1994; p. 204.
13. Arikan, E. An inequality on guessing and its application to sequential decoding. IEEE Trans. Inf. Theory 1996, 42, 99–105. [CrossRef]
14. Sason, I.; Verdú, S. Improved Bounds on Lossless Source Coding and Guessing Moments via Rényi Measures. IEEE Trans. Inf. Theory 2018, 64, 4323–4346. [CrossRef]
15. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2006.
16. van Erven, T.; Harremoës, P. Rényi divergence and Kullback-Leibler divergence. IEEE Trans. Inf. Theory 2014, 60, 3797–3820. [CrossRef]
17. Béguinot, J.; Cheng, W.; Guilley, S.; Rioul, O. Be my guess: Guessing entropy vs. success rate for evaluating side-channel attacks of secure chips. In Proceedings of the 25th Euromicro Conference on Digital System Design (DSD 2022), Maspalomas, Gran Canaria, Spain, 31 August–2 September 2022.
18. Rioul, O. A primer on alpha-information theory with application to leakage in secrecy systems. In Geometric Science of Information, Proceedings of the 5th Conference on Geometric Science of Information (GSI'21), Paris, France, 21–23 July 2021; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2021; Volume 12829, pp. 459–467.
19. Marshall, A.W.; Olkin, I.; Arnold, B.C. Inequalities: Theory of Majorization and Its Applications, 2nd ed.; Springer Series in Statistics; Springer: Berlin/Heidelberg, Germany, 2011.
20. Ho, S.W.; Verdú, S. On the Interplay Between Conditional Entropy and Error Probability. IEEE Trans. Inf. Theory 2010, 56, 5930–5942. [CrossRef]
21. Audenaert, K.M.R.; Eisert, J. Continuity Bounds on the Quantum Relative Entropy—II. J. Math. Phys. 2011, 52, 7. [CrossRef]
22. Tsybakov, A.B. Introduction to Nonparametric Estimation; Springer Series in Statistics; Springer: Berlin/Heidelberg, Germany, 2009.
23. Ho, S.W.; Yeung, R.W. The Interplay Between Entropy and Variational Distance. IEEE Trans. Inf. Theory 2010, 56, 5906–5929. [CrossRef]
24. Sason, I.; Verdú, S. f-Divergence Inequalities. IEEE Trans. Inf. Theory 2016, 62, 5973–6006. [CrossRef]