Citation: Rioul, O. What Is Randomness? The Interplay between Alpha Entropies, Total Variation and Guessing. Phys. Sci. Forum 2022, 5, 30. https://doi.org/10.3390/psf2022005030
Academic Editors: Frédéric Barbaresco, Ali Mohammad-Djafari, Frank Nielsen and Martino Trassinelli
Published: 13 December 2022
Proceeding Paper
What Is Randomness? The Interplay between Alpha Entropies,
Total Variation and Guessing
Olivier Rioul
LTCI, Télécom Paris, Institut Polytechnique de Paris, 91120 Palaiseau, France; olivier.rioul@telecom-paris.fr
Presented at the 41st International Workshop on Bayesian Inference and Maximum Entropy Methods in Science
and Engineering, Paris, France, 18–22 July 2022.
Abstract: In many areas of computer science, it is of primary importance to assess the randomness of a certain variable $X$. Many different criteria can be used to evaluate randomness, possibly after observing some disclosed data. A “sufficiently random” $X$ is often described as “entropic”. Indeed, Shannon’s entropy is known to provide a resistance criterion against modeling attacks. More generally, one may consider the Rényi $\alpha$-entropy, where Shannon’s entropy, collision entropy and min-entropy are recovered as the particular cases $\alpha = 1$, $2$ and $+\infty$, respectively. Guesswork or guessing entropy is also of great interest in relation to $\alpha$-entropy. On the other hand, many applications rely instead on the “statistical distance”, also known as “total variation” distance, to the uniform distribution. This criterion is particularly important because a very small distance ensures that no statistical test can effectively distinguish between the actual distribution and the uniform distribution. In this paper, we establish optimal lower and upper bounds between $\alpha$-entropy and guessing entropy on one hand, and error probability and total variation distance to the uniform on the other hand. In this context, it turns out that the best known “Pinsker inequality” and recent “reverse Pinsker inequalities” are not necessarily optimal. We recover or improve previous Fano-type and Pinsker-type inequalities used for several applications.
Keywords: statistical (total variation) distance; α-entropy; guessing entropy; probability of error
1. Some Well-Known “Randomness” Measures
It is of primary importance to assess the “randomness” of a certain random variable $X$, which represents some identifier, cryptographic key, signature or any type of intended secret. Applications include pseudo-random bit generators [1], general cipher security [2], randomness extractors [3] and hash functions ([4], Chapter 8), physically unclonable functions [5], true random number generators [6], to list but a few. In all of these examples, $X$ takes finitely many values $x \in \{x_1, x_2, \ldots, x_M\}$ with probabilities $p_X(x) = \mathbb{P}(X = x)$. In this paper, it will be convenient to denote
$$p_{(1)} \ge p_{(2)} \ge \cdots \ge p_{(M)} \tag{1}$$
any rearrangement of the probabilities $p(x)$ in descending order (where ties can be resolved arbitrarily): $p_{(1)} = \max_x p_X(x)$ is the maximum probability, $p_{(2)}$ the second maximum, etc. In addition, we need to define the cumulative sums
$$P_{(k)} \triangleq p_{(1)} + \cdots + p_{(k)} \qquad (k = 1, 2, \ldots, M) \tag{2}$$
where, in particular, $P_{(M)} = 1$.
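As a quick numerical illustration (ours, not part of the paper), the following Python sketch computes the descending rearrangement (1) and the cumulative sums (2) of a finite distribution given as an array of probabilities; the function names are hypothetical.

```python
import numpy as np

def descending_rearrangement(p):
    """Return p_(1) >= p_(2) >= ... >= p_(M), as in definition (1)."""
    return np.sort(np.asarray(p, dtype=float))[::-1]

def cumulative_sums(p):
    """Return the cumulative sums P_(k) = p_(1) + ... + p_(k), as in definition (2)."""
    return np.cumsum(descending_rearrangement(p))

p = [0.1, 0.5, 0.15, 0.25]          # an example distribution with M = 4
print(descending_rearrangement(p))  # [0.5  0.25 0.15 0.1 ]
print(cumulative_sums(p))           # [0.5  0.75 0.9  1.  ]
```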
Many different criteria can be used to evaluate the randomness of $X$ or its distribution $p_X$, depending on the type of attack that can be carried out to recover the whole or part of the secret, possibly after observing disclosed data $Y$. The observed random variable $Y$ can be any random variable and is not necessarily discrete. The conditional probability distribution of $X$ having observed $Y = y$ is denoted by $p_{X|y}$ to distinguish it from the unconditional distribution $p_X$. To simplify the notation, we write
$$p(x) \triangleq p_X(x) = \mathbb{P}(X = x) \tag{3}$$
$$p(x|y) \triangleq p_{X|y}(x) = \mathbb{P}(X = x \mid Y = y). \tag{4}$$
A “sufficiently random” secret is often described as “entropic” in the literature. Indeed, Shannon’s entropy
$$H(X) = H(p) \triangleq \sum_x p(x) \log \frac{1}{p(x)} = \mathbb{E} \log \frac{1}{p(X)} \tag{5}$$
(with the convention $0 \log \frac{1}{0} = 0$) is known to provide a resistance criterion against modeling attacks. It was introduced by Shannon as a measure of uncertainty of $X$. The average entropy after having observed $Y$ is the usual conditional entropy
$$H(X|Y) \triangleq \mathbb{E}_y H(p_{X|y}) = \mathbb{E} \log \frac{1}{p(X|Y)}. \tag{6}$$
A well-known generalization of Shannon’s entropy is the Rényi entropy of order $\alpha > 0$, or $\alpha$-entropy,
$$H_\alpha(X) = H_\alpha(p) \triangleq \frac{1}{1-\alpha} \log \sum_x p(x)^\alpha = \frac{\alpha}{1-\alpha} \log \|p_X\|_\alpha \tag{7}$$
where, by continuity as $\alpha \to 1$, the 1-entropy $H_1(X) = H(X)$ is Shannon’s entropy. One may consider many different definitions of conditional $\alpha$-entropy [7], but for many applications the preferred choice is Arimoto’s definition [8–10]
$$H_\alpha(X|Y) \triangleq \frac{\alpha}{1-\alpha} \log \mathbb{E}_y \|p_{X|y}\|_\alpha \tag{8}$$
where the expectation over $Y$ is taken over the “$\alpha$-norm” inside the logarithm. (Strictly speaking, $\|\cdot\|_\alpha$ is not a norm when $\alpha < 1$.)
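For concreteness, here is a minimal sketch (ours) of the $\alpha$-entropy (7), handling the limiting cases $\alpha = 1$ (Shannon entropy) and $\alpha = \infty$ (min-entropy); the function name and the choice of base-2 logarithms are assumptions made for illustration.

```python
import numpy as np

def renyi_entropy(p, alpha, base=2.0):
    """Rényi alpha-entropy (7), in units of log `base` (bits by default)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                                   # convention 0 log(1/0) = 0
    if alpha == 1:                                 # limit alpha -> 1: Shannon entropy (5)
        return -np.sum(p * np.log(p)) / np.log(base)
    if np.isinf(alpha):                            # limit alpha -> +inf: min-entropy
        return -np.log(p.max()) / np.log(base)
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha) / np.log(base)

p = [0.5, 0.25, 0.15, 0.1]
for a in (0.5, 1, 2, np.inf):                      # H_1/2, Shannon, collision, min-entropy
    print(a, renyi_entropy(p, a))
```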
For $\alpha = 2$, the collision entropy
$$H_2(X) = H_2(p) = \log \frac{1}{\mathbb{P}(X = X')}, \tag{9}$$
where $X'$ is an independent copy of $X$, is often used to ensure security against collision attacks. Perhaps one of the most popular criteria is the min-entropy, defined when $\alpha \to +\infty$ as
$$H_\infty(X) = H_\infty(p) = \log \frac{1}{p_{(1)}} = \log \frac{1}{1 - P_e(X)}, \tag{10}$$
whose maximization is equivalent to a probability criterion to ensure a worst-case security level. Arimoto’s conditional $\infty$-entropy takes the form
$$H_\infty(X|Y) = \log \frac{1}{1 - P_e(X|Y)} \tag{11}$$
where we have noted
$$P_e(X) = P_e(p) \triangleq 1 - p_{(1)} = 1 - P_{(1)} \tag{12}$$
$$P_e(X|Y) \triangleq \mathbb{E}_y P_e(X|y). \tag{13}$$
The latter quantities correspond to the minimum probability of decision error using a MAP (maximum a posteriori probability) rule (see, e.g., [11]).
Guesswork or guessing entropy [2,12]
$$G(X) = G(p_X) \triangleq \sum_{i=1}^{M} i \cdot p_{(i)} \tag{14}$$
and, more generally, guessing moments of order $\rho > 0$, or $\rho$-guessing entropy,
$$G_\rho(X) = G_\rho(p_X) \triangleq \sum_{i=1}^{M} i^\rho \cdot p_{(i)} \tag{15}$$
are also of great interest in relation to $\alpha$-entropy [10,13,14]. The conditional versions given observation $Y$ are the expectations
$$G_\rho(X|Y) \triangleq \mathbb{E}_y G_\rho(X|y). \tag{16}$$
When $\rho = 1$, this represents the average number of guesses that an attacker has to make to guess the secret $X$ correctly after having observed $Y$ [13].
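The guessing moments (14)–(15) only depend on the descending rearrangement of the probabilities, so they are straightforward to compute; the sketch below (ours, with hypothetical names) does so.

```python
import numpy as np

def guessing_moment(p, rho=1.0):
    """rho-guessing entropy (15); rho = 1 gives the guessing entropy G of (14)."""
    p_sorted = np.sort(np.asarray(p, dtype=float))[::-1]   # guess in order p_(1), p_(2), ...
    ranks = np.arange(1, len(p_sorted) + 1)
    return np.sum(ranks ** rho * p_sorted)

p = [0.5, 0.25, 0.15, 0.1]
print(guessing_moment(p, rho=1))   # average number of guesses, here 1.85
print(guessing_moment(p, rho=2))   # second guessing moment
```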
2. Statistical (Total Variation) Distance to the Uniform Distribution
As shown in the sequel, all quantities introduced in the preceding section ($H$, $H_\alpha$, $P_e$, $G$, $G_\rho$) have many properties in common. In particular, each of these quantities attains
• its minimum value for a delta (Dirac) distribution $p = \delta$, that is, a deterministic random variable $X$ with $p_{(1)} = 1$ and all other probabilities $= 0$;
• its maximum value for the uniform distribution $p = u$, that is, a uniformly distributed random variable $X$ with $p(x) = \frac{1}{M}$ for all $x$.
Indeed, it can be easily checked that
$$0 \le H_\alpha(X) \le \log M \tag{17}$$
$$1 \le G(X) \le \frac{M+1}{2} \tag{18}$$
$$0 \le P_e(X) \le 1 - \frac{1}{M} \tag{19}$$
where the lower (resp. upper) bounds are attained for a delta (resp. uniform) distribution: the uniform distribution is the “most entropic” ($H_\alpha$), “hardest to guess” ($G$), and “hardest to detect” ($P_e$).
The maximum entropy property is related to the minimization of divergence [15]
$$D(p\|u) = \log M - H(p) \tag{20}$$
where $D(p\|q) = \sum_x p(x) \log \frac{p(x)}{q(x)} \ge 0$ denotes the Kullback–Leibler divergence, which vanishes if and only if $p = q$. Therefore, entropy appears as the complementary value of the divergence to the uniform distribution. Similarly, for $\alpha$-entropy,
$$D_\alpha(p\|u) = \log M - H_\alpha(p) \tag{21}$$
where $D_\alpha(p\|q) = \frac{1}{\alpha-1} \log \sum_x p(x)^\alpha q(x)^{1-\alpha}$ denotes the Rényi $\alpha$-divergence [16] (Bhattacharyya distance for $\alpha = \frac{1}{2}$).
Instead of the divergence to the uniform distribution, it is often desirable to rely instead on the statistical distance, also known as the total variation distance, to the uniform distribution. The general expression of the total variation distance is
$$\Delta(p, q) = \frac{1}{2} \sum_x |p(x) - q(x)| \tag{22}$$
where the $1/2$ factor is there to ensure that $0 \le \Delta(p, q) \le 1$. Equivalently,
$$\Delta(p, q) = \max_T |P(T) - Q(T)| \tag{23}$$
where the maximum is over any event $T$ and $P$, $Q$ denote the respective probabilities w.r.t. $p$ and $q$. As is well known, the maximum
$$\Delta(p, q) = P(T^+) - Q(T^+) \tag{24}$$
is attained when $T = T^+ = \{x \mid p(x) \ge q(x)\}$.
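The equivalence between (22) and (24) is easy to check numerically; the short sketch below (ours, with hypothetical names) computes the total variation distance both ways.

```python
import numpy as np

def tv_distance(p, q):
    """Total variation distance (22): half the L1 distance between p and q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 0.5 * np.sum(np.abs(p - q))

def tv_distance_via_event(p, q):
    """Equivalent form (24): P(T+) - Q(T+) with T+ = {x : p(x) >= q(x)}."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    t_plus = p >= q
    return np.sum(p[t_plus]) - np.sum(q[t_plus])

p = np.array([0.5, 0.25, 0.15, 0.1])
u = np.full(4, 0.25)                                    # uniform distribution, M = 4
print(tv_distance(p, u), tv_distance_via_event(p, u))   # both give 0.25
```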
The total variation criterion is particularly important because a very small distance $\Delta(p, q)$ ensures that no statistical test can effectively distinguish between $p$ and $q$. In fact, given some observation $X$ following either $p$ (null hypothesis $H_0$) or $q$ (alternate hypothesis $H_1$), such a statistical test takes the form “is $X \in T$?” (then accept $H_0$, otherwise reject $H_0$). If $|P(X \in T) - Q(X \in T)| \le \Delta(p, q)$ is small enough, the type-I or type-II errors have total probability $P(X \not\in T) + Q(X \in T) \approx 1$. Thus, in this sense, the two hypotheses $p$ and $q$ are indistinguishable (statistically equivalent).
By analogy with (20) and (21), we can then define statistical randomness $R(X) = R(p) \ge 0$ as the complementary value of the statistical distance to the uniform distribution, i.e., such that
$$\Delta(p, u) = 1 - R(p) \tag{25}$$
holds. With this definition,
$$R(X) = R(p) \triangleq 1 - \frac{1}{2} \sum_x \Bigl| p(x) - \frac{1}{M} \Bigr| \tag{26}$$
is maximum $= 1$ when $\Delta(p, u) = 0$, i.e., $p = u$. Thus the uniform distribution $u$ is the “most random”. What is fundamental is that $R(X) \approx 1$ ensures that no statistical test can effectively distinguish the actual distribution from the uniform distribution.
Again the “least random” distribution corresponds to the deterministic case. In fact, from (24) we have
$$\Delta(p, u) = P(T^+) - \frac{K}{M} = P_{(K)} - \frac{K}{M} \tag{27}$$
where $T^+ = \{x \mid p(x) \ge \frac{1}{M}\}$ has cardinality $K = |T^+|$, and $P(T^+) = P_{(K)}$ by definition (2). It is easily seen that $\Delta(p, u)$ attains its maximum value $= 1 - \frac{1}{M}$ if and only if $p = \delta$ is a delta distribution. In summary,
$$\frac{1}{M} \le R(X) \le 1 \tag{28}$$
where the lower (resp. upper) bound is attained for a delta (resp. uniform) distribution. The conditional version is again taken by averaging over the observation:
$$R(X|Y) \triangleq \mathbb{E}_y R(X|y). \tag{29}$$
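To illustrate the bounds (28), a short check (ours) evaluates the statistical randomness (26) at the two extreme distributions.

```python
import numpy as np

def statistical_randomness(p):
    """R(p) = 1 - Delta(p, u), as in definition (26)."""
    p = np.asarray(p, dtype=float)
    M = len(p)
    return 1.0 - 0.5 * np.sum(np.abs(p - 1.0 / M))

M = 4
uniform = np.full(M, 1.0 / M)
delta = np.zeros(M); delta[0] = 1.0
print(statistical_randomness(uniform))          # 1.0: the uniform is the "most random"
print(statistical_randomness(delta), 1.0 / M)   # 0.25 = 1/M: the lower bound in (28)
```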
3. F-Concavity: Knowledge Reduces Randomness and Data Processing
Knowledge of the observed data $Y$ (on average) reduces uncertainty, improves detection or guessing, and reduces randomness, in the sense that:
$$H_\alpha(X|Y) \le H_\alpha(X) \tag{30}$$
$$G(X|Y) \le G(X) \tag{31}$$
$$P_e(X|Y) \le P_e(X) \tag{32}$$
$$R(X|Y) \le R(X). \tag{33}$$
When $\alpha = 1$, the property $H(X|Y) \le H(X)$ is well known (“conditioning reduces entropy” [15]): the difference $H(X) - H(X|Y) = I(X;Y)$ is the mutual information, which is nonnegative. Property (30) for $\alpha \ne 1$ is also well known, see [7,8]. In view of (10) and (11), the case $\alpha = +\infty$ in (30) is equivalent to (32), which is obvious in the sense that any observation can only improve MAP detection. This, as well as (31), is also easily proved directly (see, e.g., [17]).
For all quantities $H$, $P_e$, $G$, $R$, the conditional quantity is obtained by averaging over the observation as in (6), (13), (16) and (29). Since $p(x) = \mathbb{E}_y\, p(x|y)$, the fact that knowledge of $Y$ reduces $H$, $P_e$, $G$ or $R$ amounts to saying that these are concave functions of the distribution $p$ of $X$. Note that concavity of $R(X) = R(p)$ in $p$ is clear from the definition (26), which shows (33).
For entropy $H$, this also has been given some physical interpretation: “mixing” distributions (taking convex combinations of probability distributions) can only increase the entropy on average. For example, given any two distributions $p$ and $q$, $H(\lambda p + \bar{\lambda} q) \ge \lambda H(p) + \bar{\lambda} H(q)$ where $0 \le \lambda = 1 - \bar{\lambda} \le 1$. Similarly, such mixing of distributions increases the average probability of error $P_e$, guessing entropy $G$, and statistical randomness $R$.
For the conditional $\alpha$-entropy $H_\alpha(X|Y)$ where $\alpha \ne 1$, the average over $Y$ in definition (8) is made on the $\alpha$-norm of the distribution $p_{X|y}$, which is known to be convex for $\alpha > 1$ (by Minkowski’s inequality) and concave for $0 < \alpha < 1$ (by the reverse Minkowski inequality). Accordingly, the fact that knowledge reduces $\alpha$-entropy (inequality (30)) is equivalent to the fact that $H_\alpha(p)$ in (7) is an $F$-concave function, that is, an increasing function $F$ of a concave function of $p$, where $F(x) = \frac{\alpha}{1-\alpha} \log(\mathrm{sgn}(1-\alpha)\, x)$. The average over $Y$ in $H_\alpha(X|Y)$ is made on the quantity $F^{-1}(H_\alpha)$ instead of $H_\alpha$. Thus, for example, $H_{1/2}(p)$ is a log-concave function of $p$.
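As a numerical sanity check (ours, not from the paper), the sketch below computes Arimoto's conditional $\alpha$-entropy (8) from a joint pmf and verifies inequality (30) on a random example; names such as `arimoto_conditional_entropy` are hypothetical, and natural logarithms are used.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Unconditional alpha-entropy (7), in nats."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if alpha == 1:
        return -np.sum(p * np.log(p))
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

def arimoto_conditional_entropy(p_xy, alpha):
    """Arimoto's conditional alpha-entropy (8), from a joint pmf p_xy[x, y]."""
    p_y = p_xy.sum(axis=0)                              # marginal distribution of Y
    p_x_given_y = p_xy / p_y                            # column j is p_{X|y_j}
    if alpha == 1:                                      # usual conditional entropy (6)
        return sum(p_y[j] * renyi_entropy(p_x_given_y[:, j], 1) for j in range(len(p_y)))
    norms = np.sum(p_x_given_y ** alpha, axis=0) ** (1.0 / alpha)   # ||p_{X|y}||_alpha
    return alpha / (1.0 - alpha) * np.log(np.sum(p_y * norms))

rng = np.random.default_rng(0)
p_xy = rng.random((4, 3)); p_xy /= p_xy.sum()           # a random joint distribution
for a in (0.5, 1, 2, 10):
    h_cond = arimoto_conditional_entropy(p_xy, a)
    h_marg = renyi_entropy(p_xy.sum(axis=1), a)         # H_alpha(X) from the X-marginal
    assert h_cond <= h_marg + 1e-12                     # inequality (30): knowledge reduces entropy
    print(a, h_cond, h_marg)
```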
A straightforward generalization of (30)–(33) is the data processing inequality: for any Markov chain $X - Y - Z$, i.e., such that $p(x|y,z) = p(x|y)$,
$$H_\alpha(X|Y) \le H_\alpha(X|Z) \tag{34}$$
$$G(X|Y) \le G(X|Z) \tag{35}$$
$$P_e(X|Y) \le P_e(X|Z) \tag{36}$$
$$R(X|Y) \le R(X|Z). \tag{37}$$
When $\alpha = 1$, the property $H(X|Y) \le H(X|Z)$ amounts to $I(X;Z) \le I(X;Y)$, i.e., (post-)processing can never increase information. Inequalities (34)–(37) can be deduced from (30)–(33) by considering a fixed $Z = z$, averaging over $Z$ to show that $H(X|Y,Z) \le H(X|Z)$, etc. (additional knowledge reduces randomness), and then noting that $p(x|y,z) = p(x|y)$ by the Markov property (see, e.g., [7,18] for $H_\alpha$ and [17] for $G$). Conversely, (30)–(33) can be re-obtained from (34)–(37) as the particular case $Z = 0$ (any deterministic variable representing zero information).
4. S-Concavity: Mixing Increases Randomness and Data Processing
Another type of mixing (different from the one described in the preceding section) is also useful in certain physical science considerations. It can be described as a sequence of elementary mixing operations, as follows. Suppose that one only modifies two probability values $p_i = p(x_i)$ and $p_j = p(x_j)$ for $i \ne j$. Since the result should again be a probability distribution, the sum $p_i + p_j$ should be kept constant. Then there are two possibilities:
• $|p_i - p_j|$ decreases; the resulting distribution is “smoother”, “more spread out”, “more disordered”; the resulting operation can be written as $(p_i, p_j) \mapsto (\lambda p_i + \bar{\lambda} p_j, \lambda p_j + \bar{\lambda} p_i)$ where $0 \le \lambda = 1 - \bar{\lambda} \le 1$, also known as a “transfer” operation. We call it an elementary mixing operation, or $M$-transformation in short.
• $|p_i - p_j|$ increases; this is the reverse operation, an elementary unmixing operation, or $U$-transformation in short.
We say that a quantity is $s$-concave if it increases by any $M$-transformation (equivalently, decreases by any $U$-transformation). Note that any increasing function $F$ of an $s$-concave function is again $s$-concave.
This notion coincides with that of Schur-concavity from majorization theory [19]. In fact, we can say that $p$ is majorized by $q$, and we write $p \prec q$, if $p$ is obtained from $q$ by a (finite) sequence of elementary $M$-transformations, or, what amounts to the same, that $q$ majorizes $p$, that is, $q$ is obtained from $p$ by a (finite) sequence of elementary $U$-transformations. A well-known result ([19], p. 34) states that $p \prec q$ if and only if
$$P_{(k)} \le Q_{(k)} \qquad (0 < k < M) \tag{38}$$
(see definition (2)), where always $P_{(M)} = Q_{(M)} = 1$.
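Characterization (38) translates directly into code: sort both distributions in descending order and compare their cumulative sums. The sketch below (ours, hypothetical names) checks that the uniform distribution is majorized by every distribution, which is in turn majorized by the delta distribution.

```python
import numpy as np

def majorizes(q, p, tol=1e-12):
    """Return True if p is majorized by q (p ≺ q), using characterization (38)."""
    P = np.cumsum(np.sort(np.asarray(p, dtype=float))[::-1])   # cumulative sums P_(k)
    Q = np.cumsum(np.sort(np.asarray(q, dtype=float))[::-1])   # cumulative sums Q_(k)
    return bool(np.all(P <= Q + tol)) and abs(P[-1] - Q[-1]) < tol

u = [0.25, 0.25, 0.25, 0.25]
p = [0.5, 0.25, 0.15, 0.1]
delta = [1.0, 0.0, 0.0, 0.0]
print(majorizes(p, u))        # True: u ≺ p for every p
print(majorizes(delta, p))    # True: p ≺ delta for every p
```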
From the above definitions it is immediate to see that all previously considered quantities $H$, $H_\alpha$, $G$, $G_\rho$, $P_e$, $R$ are $s$-concave: mixing increases uncertainty, guessing, error, and randomness, that is, $p \prec q$ implies
$$H_\alpha(p) \ge H_\alpha(q) \tag{39}$$
$$G_\rho(p) \ge G_\rho(q) \tag{40}$$
$$P_e(p) \ge P_e(q) \tag{41}$$
$$R(p) \ge R(q). \tag{42}$$
For $H_\alpha$ and $R$ this can be easily seen from the fact that these quantities can be written as (an increasing function of) a quantity of the form $\sum_x \phi(p(x))$ where $\phi$ is concave. Then the effect of an $M$-transformation $(p_i, p_j) \mapsto (\lambda p_i + \bar{\lambda} p_j, \lambda p_j + \bar{\lambda} p_i)$ gives $\phi(\lambda p_i + \bar{\lambda} p_j) + \phi(\lambda p_j + \bar{\lambda} p_i) \ge \lambda \phi(p_i) + \bar{\lambda} \phi(p_j) + \lambda \phi(p_j) + \bar{\lambda} \phi(p_i) = \phi(p_i) + \phi(p_j)$. For $P_e$ it is obvious, and for $G$ and $G_\rho$ it is also easily proved using characterization (38) and summation by parts [17].
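The following sketch (ours) applies one elementary $M$-transformation (a “transfer” or “Robin Hood” step) to a distribution and checks numerically that each of the four randomness measures increases, as claimed by (39)–(42); all names are hypothetical.

```python
import numpy as np

def m_transform(p, i, j, lam):
    """Elementary mixing operation (p_i, p_j) -> (lam*p_i + (1-lam)*p_j, lam*p_j + (1-lam)*p_i)."""
    q = np.asarray(p, dtype=float).copy()
    q[i], q[j] = lam * q[i] + (1 - lam) * q[j], lam * q[j] + (1 - lam) * q[i]
    return q

def shannon(p):
    p = np.asarray(p, dtype=float); p = p[p > 0]
    return -np.sum(p * np.log2(p))

def guessing(p):
    s = np.sort(np.asarray(p, dtype=float))[::-1]
    return np.sum(np.arange(1, len(s) + 1) * s)

def error_prob(p):
    return 1.0 - np.max(p)

def randomness(p):
    p = np.asarray(p, dtype=float)
    return 1.0 - 0.5 * np.sum(np.abs(p - 1.0 / len(p)))

p = np.array([0.5, 0.25, 0.15, 0.1])
q = m_transform(p, 0, 3, 0.7)                # mix the largest and smallest masses
for f in (shannon, guessing, error_prob, randomness):
    assert f(q) >= f(p) - 1e-12              # mixing increases each randomness measure
    print(f.__name__, f(p), "->", f(q))
```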
Another kind of (functional or deterministic) data processing inequality can be obtained from (39)–(42) as a particular case. For any deterministic function $f$,
$$H_\alpha(f(X)) \le H_\alpha(X) \tag{43}$$
$$G(f(X)) \le G(X) \tag{44}$$
$$P_e(f(X)) \le P_e(X) \tag{45}$$
$$R(f(X)) \le R(X). \tag{46}$$
Thus deterministic processing (by $f$) decreases (cannot increase) uncertainty, can only make guessing or detection easier, and decreases randomness. For $\alpha = 1$ the inequality $H(f(X)) \le H(X)$ can also be seen from the data processing inequality of the preceding section by noting that $H(f(X)) = I(f(X); f(X)) \le I(X; f(X)) \le H(X)$ (since $X - f(X) - f(X)$ is trivially a Markov chain).
To prove (43)–(46) in general, consider preimages by $f$ of values of $y = f(x)$; it is enough to show that each of the quantities $H_\alpha$, $P_e$, $G$, or $R$ decreases by the elementary operation consisting in putting together two distinct values $x_i$, $x_j$ of $x$ in the same preimage of $y$. However, for probability distributions, this operation amounts to the $U$-transformation $(p_i, p_j) \mapsto (p_i + p_j, 0)$ and the result follows by $s$-concavity.
An equivalent property of (43)–(46) is the fact that any additional random variable $Y$ increases uncertainty, probability of error, guessing, and randomness in the sense that
$$H_\alpha(X) \le H_\alpha(X, Y) \tag{47}$$
$$G(X) \le G(X, Y) \tag{48}$$
$$P_e(X) \le P_e(X, Y) \tag{49}$$
$$R(X) \le R(X, Y). \tag{50}$$
This is a particular case of (43)–(46) applied to the joint $(X, Y)$ and the first projection $f(x, y) = x$. Conversely, (43)–(46) follows from (47)–(50) by applying it to $(f(X), X)$ in place of $(X, Y)$ and noting that the distribution of $(f(X), X)$ is essentially that of $X$.
5. Optimal Fano-Type and Pinsker-Type Bounds
We have seen that informational quantities such as the entropies $H$, $H_\alpha$ and guessing entropies $G$, $G_\rho$ on one hand, and statistical quantities such as the probability of error for MAP detection $P_e$ and the statistical randomness $R$ on the other hand, satisfy many common properties: decrease by knowledge, data processing, increase by mixing, etc. For this reason, it is desirable to establish the best possible bounds between one informational quantity (such as $H_\alpha$ or $G_\rho$) and one statistical quantity ($P_e$ or $R = 1 - \Delta(p, u)$).
To achieve this, we remark that for any distribution $p$, we have the following majorizations. For fixed $P_e = 1 - P_s$:
$$\Bigl(P_s, \tfrac{P_e}{M-1}, \ldots, \tfrac{P_e}{M-1}\Bigr) \prec p \prec \bigl(\underbrace{P_s, \ldots, P_s}_{K \text{ times}}, 1 - K P_s, 0, \ldots, 0\bigr) \tag{51}$$
where (necessarily) $K = \lfloor \tfrac{1}{P_s} \rfloor$, and for fixed $R = 1 - \Delta$:
$$\bigl(\underbrace{\tfrac{1}{M} + \tfrac{\Delta}{K}, \ldots, \tfrac{1}{M} + \tfrac{\Delta}{K}}_{K \text{ times}}, \underbrace{\tfrac{1}{M} - \tfrac{\Delta}{M-K}, \ldots, \tfrac{1}{M} - \tfrac{\Delta}{M-K}}_{M-K \text{ times}}\bigr) \prec p \prec \bigl(\Delta + \tfrac{1}{M}, \underbrace{\tfrac{1}{M}, \ldots, \tfrac{1}{M}}_{L-1 \text{ times}}, R - \tfrac{L}{M}, 0, \ldots, 0\bigr) \tag{52}$$
where $K = |\{p \ge \tfrac{1}{M}\}|$ as in (27) and (necessarily) $L = \lfloor M R \rfloor$ ($K$ can possibly be any integer between 1 and $L$). These majorizations are easily established using characterizations (12), (27) and (38).
Applying $s$-concavity of the entropies $H_\alpha$ or $G_\rho$ to (51) gives closed-form upper bounds of entropies as a function of $P_e$, known as Fano inequalities, and closed-form lower bounds, known as reverse Fano inequalities. Figure 1 shows some optimal regions.
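To make the mechanism concrete, the sketch below (ours) builds the two extremal distributions of (51) for a given error probability and evaluates an $\alpha$-entropy on them; by Schur-concavity these values bound $H_\alpha(p)$ for every $p$ with that $P_e$. The function names are hypothetical and the example assumes $P_e \le 1 - 1/M$.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """alpha-entropy (7) in bits."""
    p = np.asarray(p, dtype=float); p = p[p > 0]
    if alpha == 1:
        return -np.sum(p * np.log2(p))
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

def fano_extremal_distributions(pe, M):
    """The two extremal distributions of the majorization (51) for fixed P_e = 1 - P_s."""
    ps = 1.0 - pe
    flattest = np.concatenate(([ps], np.full(M - 1, pe / (M - 1))))   # left of (51)
    K = int(np.floor(1.0 / ps))
    most_peaked = np.zeros(M)                                         # right of (51)
    most_peaked[:K] = ps
    if K < M:
        most_peaked[K] = 1.0 - K * ps                                 # remaining mass
    return flattest, most_peaked

pe, M, alpha = 0.4, 8, 2.0
flattest, most_peaked = fano_extremal_distributions(pe, M)
# Schur-concavity: H_alpha(most_peaked) <= H_alpha(p) <= H_alpha(flattest) for this P_e.
print("reverse Fano-type lower bound:", renyi_entropy(most_peaked, alpha))
print("Fano-type upper bound:        ", renyi_entropy(flattest, alpha))
```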
[Figure 1: eight panels showing the optimal regions of $H_{1/2}$, $H$, $H_2$ and $\log G$ (in bits) versus $P_e$.]
Figure 1. Optimal regions: Entropies (in bits) vs. error probability. Top row $M = 4$; bottom row $M = 32$.
The original Fano inequality was an upper bound on the conditional entropy $H(X|Y)$ as a function of $P_e(X|Y)$. It can be shown that the upper bounds in the conditional case are unchanged. Lower bounds of conditional entropies or $\alpha$-entropies, however, have to be slightly changed due to the averaging operation inside the function $F$ (see Section 3 above), by taking the (piecewise linear) convex envelope of the lower curve on $F^{-1}(H_\alpha)$. In this way, one easily recovers the results of [20] for $H$, [11] for $H_\alpha$, and [14,17] for $G$ and $G_\rho$.
Likewise, applying $s$-concavity of the entropies $H_\alpha$ or $G_\rho$ to (52) gives closed-form upper bounds of entropies as a function of $R$, similar to Pinsker inequalities, and closed-form lower bounds, similar to reverse Pinsker inequalities. Figure 2 shows some optimal regions.
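In the same spirit, a sketch (ours) can build the two extremal distributions of (52) for a fixed statistical randomness $R = 1 - \Delta$ and evaluate an $\alpha$-entropy on them; the names are hypothetical and the example assumes $1/M \le R < 1$ and takes $K = L$.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """alpha-entropy (7) in bits."""
    p = np.asarray(p, dtype=float); p = p[p > 0]
    if alpha == 1:
        return -np.sum(p * np.log2(p))
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

def pinsker_extremal_distributions(R, M, K=None):
    """The two extremal distributions of the majorization (52) for fixed R = 1 - Delta."""
    delta = 1.0 - R
    L = int(np.floor(M * R))
    if K is None:
        K = L                           # K may be any integer between 1 and L; we take K = L
    flattest = np.concatenate((np.full(K, 1.0 / M + delta / K),          # left of (52)
                               np.full(M - K, 1.0 / M - delta / (M - K))))
    most_peaked = np.zeros(M)                                            # right of (52)
    most_peaked[0] = delta + 1.0 / M
    most_peaked[1:L] = 1.0 / M
    if L < M:
        most_peaked[L] = R - L / M      # remaining mass, in [0, 1/M)
    return flattest, most_peaked

R, M, alpha = 0.7, 8, 2.0
flattest, most_peaked = pinsker_extremal_distributions(R, M)
# Schur-concavity: H_alpha(most_peaked) <= H_alpha(p) <= H_alpha(flattest) for this R.
print("reverse Pinsker-type lower bound:", renyi_entropy(most_peaked, alpha))
print("Pinsker-type upper bound:        ", renyi_entropy(flattest, alpha))
```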
The various Pinsker and reverse Pinsker inequalities that can be found in the literature give bounds between $\Delta(p, q)$ and $D(p\|q)$ for general $q$. Such inequalities find application in quantum physics [21] and in deriving lower bounds on the minimax risk in nonparametric estimation [22]. As they are of more general applicability, they turn out not to be optimal here, since we have optimized the bounds in the particular case $q = u$. Using our method, one again easily recovers previous results of [23] (and [24], Theorem 26) for $H$, and improves previous inequalities used for several applications [3,4,6].
[Figure 2: eight panels showing the optimal regions of $H_{1/2}$, $H$, $H_2$ and $\log G$ (in bits) versus $R$.]
Figure 2. Optimal regions: Entropies (in bits) vs. randomness $R$. Top row $M = 4$; bottom row $M = 32$.
6. Conclusions
Using a simple method based on “mixing” or majorization, we have established optimal (Fano-type and Pinsker-type) bounds between entropic quantities ($H_\alpha$, $G_\rho$) and statistical quantities ($P_e$, $R$), in an interplay between information theory and statistics. As a perspective, a similar methodology could be developed for the statistical distance to an arbitrary (not necessarily uniform) distribution.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The author declares no conflict of interest.
References
1. Maurer, U.M. A Universal Statistical Test for Random Bit Generators. J. Cryptol. 1992, 5, 89–105.
2. Pliam, J.O. Guesswork and Variation Distance as Measures of Cipher Security. In SAC 1999: Selected Areas in Cryptography, Proceedings of the International Workshop on Selected Areas in Cryptography, Kingston, ON, Canada, 9–10 August 1999; Heys, H., Adams, C., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 1999; Volume 1758, pp. 62–77.
3. Chevalier, C.; Fouque, P.A.; Pointcheval, D.; Zimmer, S. Optimal Randomness Extraction from a Diffie-Hellman Element. In Advances in Cryptology—EUROCRYPT 2009, Proceedings of the Annual International Conference on the Theory and Applications of Cryptographic Techniques, Cologne, Germany, 26–30 April 2009; Joux, A., Ed.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2009; Volume 5479, pp. 572–589.
4. Shoup, V. A Computational Introduction to Number Theory and Algebra, 2nd ed.; Cambridge University Press: Cambridge, UK, 2009.
5. Schaub, A.; Boutros, J.J.; Rioul, O. Entropy Estimation of Physically Unclonable Functions via Chow Parameters. In Proceedings of the 57th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 24–27 September 2019.
6. Killmann, W.; Schindler, W. A Proposal for Functionality Classes for Random Number Generators. Ver. 2.0, Anwendungshinweise und Interpretationen zum Schema (AIS) 31 of the Bundesamt für Sicherheit in der Informationstechnik. 2011. Available online: https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Certification/Interpretations/AIS_31_Functionality_classes_for_random_number_generators_e.pdf?__blob=publicationFile&v=4 (accessed on 11 March 2021).
7. Fehr, S.; Berens, S. On the conditional Rényi entropy. IEEE Trans. Inf. Theory 2014, 60, 6801–6810.
8. Arimoto, S. Information measures and capacity of order α for discrete memoryless channels. In Topics in Information Theory; Csiszár, I., Elias, P., Eds.; Colloquium Mathematica Societatis János Bolyai, 2nd ed.; North Holland: Amsterdam, The Netherlands, 1977; Volume 16, pp. 41–52.
9. Liu, Y.; Cheng, W.; Guilley, S.; Rioul, O. On conditional alpha-information and its application in side-channel analysis. In Proceedings of the 2021 IEEE Information Theory Workshop (ITW 2021), Online, 17–21 October 2021.
10. Rioul, O. Variations on a theme by Massey. IEEE Trans. Inf. Theory 2022, 68, 2813–2828.
11. Sason, I.; Verdú, S. Arimoto–Rényi Conditional Entropy and Bayesian M-Ary Hypothesis Testing. IEEE Trans. Inf. Theory 2018, 64, 4–25.
12. Massey, J.L. Guessing and entropy. In Proceedings of the IEEE International Symposium on Information Theory, Trondheim, Norway, 27 June–1 July 1994; p. 204.
13. Arikan, E. An inequality on guessing and its application to sequential decoding. IEEE Trans. Inf. Theory 1996, 42, 99–105.
14. Sason, I.; Verdú, S. Improved Bounds on Lossless Source Coding and Guessing Moments via Rényi Measures. IEEE Trans. Inf. Theory 2018, 64, 4323–4346.
15. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2006.
16. van Erven, T.; Harremoës, P. Rényi divergence and Kullback-Leibler divergence. IEEE Trans. Inf. Theory 2014, 60, 3797–3820.
17. Béguinot, J.; Cheng, W.; Guilley, S.; Rioul, O. Be my guess: Guessing entropy vs. success rate for evaluating side-channel attacks of secure chips. In Proceedings of the 25th Euromicro Conference on Digital System Design (DSD 2022), Maspalomas, Gran Canaria, Spain, 31 August–2 September 2022.
18. Rioul, O. A primer on alpha-information theory with application to leakage in secrecy systems. In Geometric Science of Information, Proceedings of the 5th Conference on Geometric Science of Information (GSI'21), Paris, France, 21–23 July 2021; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2021; Volume 12829, pp. 459–467.
19. Marshall, A.W.; Olkin, I.; Arnold, B.C. Inequalities: Theory of Majorization and Its Applications, 2nd ed.; Springer Series in Statistics; Springer: Berlin/Heidelberg, Germany, 2011.
20. Ho, S.W.; Verdú, S. On the Interplay Between Conditional Entropy and Error Probability. IEEE Trans. Inf. Theory 2010, 56, 5930–5942.
21. Audenaert, K.M.R.; Eisert, J. Continuity Bounds on the Quantum Relative Entropy—II. J. Math. Phys. 2011, 52, 7.
22. Tsybakov, A.B. Introduction to Nonparametric Estimation; Springer Series in Statistics; Springer: Berlin/Heidelberg, Germany, 2009.
23. Ho, S.W.; Yeung, R.W. The Interplay Between Entropy and Variational Distance. IEEE Trans. Inf. Theory 2010, 56, 5906–5929.
24. Sason, I.; Verdú, S. f-Divergence Inequalities. IEEE Trans. Inf. Theory 2016, 62, 5973–6006.