EPJ manuscript No.
(will be inserted by the editor)
Information geometry of scaling expansions of non-exponentially growing configuration spaces

Jan Korbel 1,2,a, Rudolf Hanel 1,2,b, and Stefan Thurner 1,2,3,4,c

1 Section for Science of Complex Systems, CeMSIIS, Medical University of Vienna, Spitalgasse 23, 1090 Vienna, Austria
2 Complexity Science Hub Vienna, Josefstädter Strasse 39, 1080 Vienna, Austria
3 Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
4 IIASA, Schlossplatz 1, 2361 Laxenburg, Austria
Abstract. Many stochastic complex systems are characterized by the fact that their configuration space does not grow exponentially as a function of the degrees of freedom. The use of scaling expansions is a natural way to measure the asymptotic growth of the configuration space volume in terms of the scaling exponents of the system. These scaling exponents can, in turn, be used to define universality classes that uniquely determine the statistics of a system. Every system belongs to one of these classes. Here we derive the information geometry of scaling expansions of sample spaces. In particular, we present the deformed logarithms and the metric in a systematic and coherent way. We observe a phase transition for the curvature. The phase transition can be well measured by the characteristic length r, corresponding to a ball with radius 2r having the same curvature as the statistical manifold. An increasing characteristic length with respect to the size of the system is associated with sub-exponential sample space growth, which is characteristic of strongly constrained and correlated complex systems. A decreasing characteristic length corresponds to super-exponential sample space growth, which occurs for example in systems that develop structure as they evolve. Constant curvature means exponential sample space growth, which is associated with multinomial statistics, where traditional Boltzmann-Gibbs, or Shannon, statistics applies. This allows us to characterize transitions between statistical manifolds corresponding to different families of probability distributions.
1 Introduction

Statistical physics of complex systems has turned into an increasingly important topic with many applications. Its main aim is to come up with a unified approach to
a e-mail: jan.korbel@meduniwien.ac.at
b e-mail: rudolf.hanel@meduniwien.ac.at
c e-mail: stefan.thurner@meduniwien.ac.at
arXiv:2001.06393v1 [cond-mat.stat-mech] 17 Jan 2020
understand, describe, and predict the statistical properties of a plethora of different complex systems; see e.g., [1] for an overview. While the microscopic nature of complex systems can be very different, their statistical properties often have common features across various systems. Entropy is undoubtedly the key concept in statistical physics that connects the statistical description of microscopic dynamics with the macroscopic thermodynamic properties of a system. The notion of entropy has also been adopted in other contexts, such as information theory or statistical inference, which are concepts quite different from thermodynamics [2]. One elegant and powerful concept arising from the theory of statistical inference is that of information geometry [3,4]. It applies ideas from differential geometry to probability theory and statistics. In this context, the concept of entropy also plays a crucial role, since the metric on the statistical manifold is derived from the corresponding (relative) entropy. This so-called Fisher-Rao metric enables us to analyze statistical systems from a different perspective. For example, one can study critical transitions by calculating singularities of the metric [5].
In information geometry, most attention has focused on systems that are governed by Shannon entropy [3,4]. However, it is well known that many complex systems, especially strongly correlated or constrained systems, or systems with emergent components, cannot be described within the framework of Shannon entropy [1]. For this reason, a number of generalizations of Shannon entropy have been proposed; in connection with power laws [6,7], special relativity [8], multifractal thermodynamics [9], or black holes [10,11].
To classify entropies for stochastic systems of various kinds, it is natural to use the Shannon-Khinchin (SK) axioms [12,13]. The first three SK axioms are usually formulated as:

(SK1) Entropy is a continuous function of the probabilities $p_i$ only¹.
(SK2) Entropy is maximal for the uniform distribution, $p_i = 1/W$.
(SK3) Adding a state $W+1$ to a system with $p_{W+1} = 0$ does not change the entropy of the system.

The fourth axiom is called the composability axiom and determines the entropy functional uniquely:

(SK4) $H(A+B) = H(A) + H(B|A)$, where $H(B|A) = \sum_k p_k^A\, H(B|A_k)$,

where $H(B|A_k)$ is the entropy of the conditional probability, $p_{B|A_k}$. In this formulation, the unique solution that is compatible with SK1-4 is Shannon entropy, $H(P) = -\sum_i p_i \log p_i$. When the fourth axiom is relaxed, one can obtain a wider class of entropic functionals. First generalizations of the fourth axiom were introduced in connection with generalized additivity [14,15], group laws [16], or statistical inference [17]. These approaches are somewhat limited in scope, since they all lead to a class of entropies that can be expressed as a function of Tsallis entropy [6].
The relaxation of SK4 also naturally leads to a classification scheme of complex systems [18,19]. The main idea of this approach is to study the asymptotic scaling exponents of the entropy functional that are associated with a particular system's configuration space. Complex systems are typically associated with a sub-exponentially growing configuration space, when seen as a function of the degrees of freedom. This classification scheme is based on a mathematical analysis of the asymptotic scaling of the entropic functionals that are governed by the first three SK axioms².

¹ In several cases, entropies incorporate an external parameter, such as $q$ for Tsallis entropy or $c$ and $d$ for $(c,d)$-entropies. However, these parameters are constants that characterize the universality class of the process. They are not parameters subject to variation in entropy maximization.
Since the configuration space of most complex systems does not grow exponentially (as in the case of Shannon entropy), but polynomially [7], as a stretched exponential [20], or even super-exponentially [21], the appropriate scaling behavior of the entropic functional is crucial for a proper thermodynamic interpretation. To this end, we use the recently developed scaling expansion [22], which is a special case of the Poincaré asymptotic series [23], whose coefficients are the scaling exponents of the system.
The aim of this paper is to define a generalization of Shannon entropy that matches the appropriate asymptotic scaling of a given system, and to use it to derive the associated generalized Fisher-Rao metric of the underlying statistical manifold. To this end, we use the framework of deformed logarithms [35,25]. It has been shown recently [26] that one can naturally obtain two types of information metric within that framework: one corresponding to the maximum entropy principle with linear constraints, and the other corresponding to the maximum entropy principle when used with so-called escort constraints, instead of ordinary (linear) constraint terms.
Escort distributions appeared in connection with chaotic systems [27], and were discussed in the context of superstatistics [28,29]. Later it became possible to relate them to linear constraints through a log-duality [30]. Interestingly, escort distributions also appear as canonical coordinates in information geometry [31,32]. In this paper, we use both linear and escort approaches and compare their corresponding metric tensors and their invariants. We focus particularly on the microcanonical ensemble in the thermodynamic limit, since the metric should correspond to the system's asymptotic properties, given by its characteristic structure. Some partial results for the curvature of the escort metric were recently obtained in this direction [34]. However, no systematic and analytically expressible results for the metric tensor and its scalar curvature have been obtained so far. We show that the curvature of the statistical manifold naturally distinguishes between three types of systems: systems with sub-exponentially growing configuration or sample space (correlated and constrained systems), exponentially growing sample space (equivalent to ordinary multinomial statistics), and super-exponentially growing sample space (e.g., systems that develop emergent structures as they evolve). The vector of scaling exponents plays the role of a set of order parameters, i.e., of the distance from the phase transition between the sub-exponential and super-exponential phases.
The paper is organized as follows: Section 2 introduces the scaling expansion and how to calculate the corresponding scaling exponents. We discuss several systems with non-trivial scaling exponents. In the last part of the section we establish a representation of universality classes for complex systems by introducing scaling vectors and their basic operations. In Section 3, we briefly revisit the results of information geometry in the framework of φ-deformed logarithms. We focus on information geometry with both linear and escort constraints. The main results of the paper are derived in Section 4, where we define the appropriate generalized logarithm by combining the φ-deformation framework and the requirement of asymptotic scaling. The properties of the corresponding entropic functionals are discussed. We exemplify the whole approach with the simple, yet very general, class of entropies with one correction term from the scaling expansion and calculate the asymptotic behavior of the scalar curvature of the microcanonical ensemble in the thermodynamic limit. The last section draws conclusions. Several appendices contain the technical details.
² This does not mean that actual distribution functions that are, say, obtained from the maximum entropy principle must be equi-distributed, since the form of the distribution is determined not only by the entropic functional, but also by the constraints.
2 Scaling expansion of the volume of configuration space

The scaling expansion [22] is a method to investigate the asymptotic scaling behavior of a sample space volume, $W(N)$. Here $W$ is the number of accessible states in a system, and $N$ indicates the size of the system³. The scaling expansion is a special case of the Poincaré asymptotic series, where the coefficients correspond to the scaling exponents of the system. We introduce the notation for the iterated use of functions, $f^{(n)}(x) = \underbrace{f(\dots f(f(x))\dots)}_{n\ \text{times}}$, to define a set of re-scaling operations, $r^{(n)}_\lambda(x) = \exp^{(n)}(\lambda \log^{(n)}(x))$. This set of re-scaling operations contains the well-known multiplicative re-scaling, $x \mapsto \lambda x$ ($n = 0$), the power re-scaling, $x \mapsto x^\lambda$ ($n = 1$), and the additive re-scaling, $x \mapsto x + \log\lambda$ ($n = -1$). For each $n$, $r^{(n)}$ is a representation of the multiplicative group $(\mathbb{R}^+, \times)$, i.e., $r^{(n)}_\lambda \circ r^{(n)}_{\lambda'} = r^{(n)}_{\lambda\lambda'}$. We now investigate how a function, $W(N)$, scales under the re-scaling $N \mapsto r^{(n)}_\lambda(N)$. Note that due to a simple theorem (see Appendix A2 in [22]) the function $z(\lambda)$, defined as
$$z(\lambda) = \lim_{N\to\infty} \frac{g\left(r^{(n)}_\lambda(N)\right)}{g(N)}\,,$$
must have the form $z(\lambda) = \lambda^c$ for $c \in \mathbb{R} \cup \{\pm\infty\}$ whenever the limit exists. We start
with multiplicative scaling ($n = 0$): the expression $W(\lambda N)/W(N)$ is, according to the theorem, equal to $\lambda^{c_0}$. We assume that $W(N)$ is a strictly increasing function; it then follows that $c_0 \geq 0$⁴. It can happen that $c_0 = +\infty$. In that case, the expression grows faster than any polynomial. This problem can be resolved by using $\log^{(l)}(W(N))$ instead of $W(N)$, for an appropriate choice of $l$. The parameter $l$ is chosen such that $c^{(l)}_0$, corresponding to $\log^{(l)} W(\lambda N)/\log^{(l)} W(N) \sim \lambda^{c^{(l)}_0}$, is finite. We call $l$ the order of the process. We get that $W(N) \sim \exp^{(l)}(N^{c^{(l)}_0})$, for $N \gg 1$. To get the corrections to the leading order, we use the fact that $\frac{\log^{(l)} W(\lambda N)}{\log^{(l)} W(N)} \cdot \frac{N^{c_0}}{(\lambda N)^{c_0}} \to 1$. When we use the re-scaling for $n = 1$, we get the second scaling exponent: $\frac{\log^{(l)} W(N^\lambda)}{\log^{(l)} W(N)} \cdot \frac{N^{c^{(l)}_0}}{(N^\lambda)^{c^{(l)}_0}} \sim \lambda^{c^{(l)}_1}$. Therefore, $W(N) \sim \exp^{(l)}\!\left(N^{c^{(l)}_0} (\log N)^{c^{(l)}_1}\right)$. One can continue along the same lines to obtain the asymptotic expansion of $W(N)$, which reads
$$W(N) \sim \exp^{(l)}\left(\prod_{j=0}^{n} \left(\log^{(j)} N\right)^{c^{(l)}_j}\right) \quad \text{for } N \to \infty\,, \qquad (1)$$
where the $c^{(l)}_j$ are the characteristic scaling exponents. The scaling expansion of $\log^{(l)} W(N)$ can be written as
$$\log\left(\log^{(l)} W(N)\right) = \sum_{j=0}^{n} c^{(l)}_j \log^{(j+1)} N + o\!\left(\log^{(n+1)}(N)\right)\,. \qquad (2)$$
It can be shown that the scaling exponents can be calculated from $W(N)$ as
$$c^{(l)}_k = \lim_{N\to\infty} \log^{(k)}(N)\left(\log^{(k-1)}(N)\left(\dots \log(N)\left(N\,\frac{\mathrm{d}\log\left(\log^{(l)}(W(N))\right)}{\mathrm{d}N} - c^{(l)}_0\right) - c^{(l)}_1\right)\dots - c^{(l)}_{k-1}\right). \qquad (3)$$

³ For example, think of $N$ as the number of particles in a system, or the number of throws in a coin tossing experiment.
⁴ Details about processes with reducing sample space can be found e.g., in Refs. [40,41,42,43].
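As a hedged numerical sketch of Eq. (3): the leading exponents of order $l = 1$ can be extracted by finite differences. The function $\log W(N) = \sqrt{N}\log N$ below is an illustrative assumption of ours (so that $c_0 = 1/2$, $c_1 = 1$), not one of the paper's models.

```python
import math

# Numerical sketch of Eq. (3) for order l = 1: estimate c0 and c1 from
# log W(N). Illustrative assumption: log W(N) = N^(1/2) * log(N),
# i.e. W ~ exp(N^(1/2) (log N)^1), so c0 = 1/2 and c1 = 1.
def log2W(N, logW):
    """log^(2) W = log(log^(1) W)."""
    return math.log(logW(N))

def estimate_c0(logW, N):
    h = N * 1e-6                       # step scaled to N for stable differences
    d = (log2W(N + h, logW) - log2W(N - h, logW)) / (2 * h)
    return N * d                       # -> c0 + c1/log N + ... as N grows

logW = lambda N: math.sqrt(N) * math.log(N)
N = 1e12
c0 = estimate_c0(logW, N)              # close to 1/2, up to 1/log N corrections
c1 = math.log(N) * (c0 - 0.5)          # Eq. (3) with the exact c0 subtracted
print(c0, c1)
```

The slow $1/\log N$ convergence of the $c_0$ estimate is exactly why the sub-leading correction term $c_1$ is needed.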
As a next step, we apply the scaling expansion to obtain the corresponding extensive entropy functionals. It is well known that for complex systems (with sub- or super-exponential phase space growth) the Shannon-Boltzmann-Gibbs entropy is not an extensive quantity. To obtain an extensive expression for such systems, one can introduce an appropriate generalization of the entropy functional [1]. A natural way to characterize thermodynamic entropy is to define an entropy functional $S(W)$ which is extensive. This requirement can be expressed for the microcanonical ensemble as $S(W(N)) \sim N$ for $N \to \infty$. For the purpose of thermodynamics, we do not have to require exact extensivity (with equality sign), but only its weaker asymptotic version. We consider the general trace-form entropy functional
$$S(P) = \sum_{i=1}^{W} g(p_i)\,. \qquad (4)$$
The scaling expansion of the extensive entropy in the microcanonical ensemble can be expressed as
$$S(W) \sim \prod_{j=0}^{n} \left[\log^{(j+l)}(W(N))\right]^{d^{(l)}_j} \quad \text{for } N \to \infty\,, \qquad (5)$$
and the scaling expansion of $g(x)$ is
$$g(x) \sim x \prod_{j=0}^{n} \left[\log^{(j+l)}\frac{1}{x}\right]^{d^{(l)}_j}\,. \qquad (6)$$
The scaling coefficients $d^{(l)}_j$ can be obtained from
$$d^{(l)}_k = \lim_{N\to\infty} \log^{(l+k)}(W)\left(\dots \log^{(l+1)}(W)\left(\left(N\,\frac{\mathrm{d}\log\left(\log^{(l)} W(N)\right)}{\mathrm{d}N}\right)^{-1} - d^{(l)}_0\right)\dots - d^{(l)}_{k-1}\right). \qquad (7)$$
The requirement of extensivity determines the relation between the scaling exponents $c^{(l)}_j$ and $d^{(l)}_j$ as
$$d^{(l)}_0 = \frac{1}{c^{(l)}_0}\,, \qquad d^{(l)}_k = -\frac{c^{(l)}_k}{c^{(l)}_0}\,, \quad k = 1, 2, \dots\,. \qquad (8)$$
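Relation (8) can be checked numerically. The sketch below uses the illustrative choice $c_0 = 1/2$, $c_1 = 1$ (so $\log W = N^{1/2}\log N$), under which Eq. (8) gives $d_0 = 2$, $d_1 = -2$, and the resulting entropy indeed grows linearly in $N$ up to slowly vanishing corrections:

```python
import math

# Extensivity check for Eq. (8), order l = 1.
# Illustrative choice: c0 = 1/2, c1 = 1, i.e. log W(N) = N^(1/2) * log N.
# Eq. (8) gives d0 = 1/c0 = 2 and d1 = -c1/c0 = -2, and then
# S(W(N)) = (log W)^d0 * (log log W)^d1 should scale like N.
c0, c1 = 0.5, 1.0
d0, d1 = 1.0 / c0, -c1 / c0

def S_over_N(N):
    logW = N**c0 * math.log(N)**c1
    return logW**d0 * math.log(logW)**d1 / N

ratios = [S_over_N(10.0**k) for k in (8, 12, 16)]
print(ratios)  # slowly approaches a constant (here 4) from below
```

Note that with the wrong sign of $d_1$ (i.e., $+c_1/c_0$) the ratio would diverge like $(\log N)^4$ instead of converging.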
Examples of systems with different scaling exponents. The first example is a random walk (RW) on the discrete one-dimensional lattice with two possible steps: left or right. The space of all possible paths grows exponentially, $W_{\mathrm{RW}}(N) = 2^N \sim \exp(N)$, and we obtain the formula for Boltzmann entropy, $S_{\mathrm{RW}} = \log W_{\mathrm{RW}}$ ($k_B = 1$). Now consider an aging random walk (ARW) [19], where the walker takes one step in a random direction, followed by two steps into a random direction, followed by three steps, etc. In this case, the sample space grows sub-exponentially, $W_{\mathrm{ARW}} \sim 2^{\sqrt{2N}}$, and $S_{\mathrm{ARW}} = (\log W_{\mathrm{ARW}})^2$. The next example is the magnetic coin model (MC) [21], where each coin can be in two states, head or tail; however, two coins can also stick together and create a bond state. It can be shown that the corresponding sample space grows super-exponentially, $W_{\mathrm{MC}} \sim N^{N/2} e^{2\sqrt{N}}$. One can conclude that the corresponding extensive entropy is asymptotically equivalent to $S_{\mathrm{MC}} = \log W_{\mathrm{MC}} / \log\log W_{\mathrm{MC}}$. Another example of super-exponential processes are random networks (RN), whose sample spaces grow as $W_{\mathrm{RN}} = 2^{\binom{N}{2}}$, and thus $S_{\mathrm{RN}} = (\log W_{\mathrm{RN}})^{1/2}$. The final example is the double-exponential growth of the random walk cascade (RWC), where the walker can take a step to the right, to the left, or split into two independent walkers [22]. For this we get that $W_{\mathrm{RWC}} = 2^{2^N - 1}$, and $S_{\mathrm{RWC}} = \log\log W_{\mathrm{RWC}}$. In Fig. 1 we show the parameter space of entropies given by three scaling exponents $(d_0, d_1, d_2)$. The above examples are indicated as points. In Fig. 1 (a) the plane of the first two scaling exponents is shown, as presented in [18]. We see that if one uses only the first two exponents, some super-exponential processes are not properly represented. By adding a third scaling exponent this problem is solved, see Fig. 1 (b). So far, we have not found simple examples that need more than three scaling exponents.

Fig. 1. Parametric space of scaling expansion universality classes with the scaling exponents of the random walk (RW), aging random walk (ARW), magnetic coin model (MC), random networks (RN), random walk cascade (RWC), and processes with compact support distributions (CS). (a) 2D parametric space of scaling expansion universality classes for the first two exponents (as in [18]). We see that some super-exponential systems are not properly represented. (b) Extension to three dimensions by adding the third scaling exponent, $d_2$. All mentioned examples can be described with the first three scaling exponents.
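A quick numerical sketch, using the asymptotic forms quoted above, confirms the extensivity of two of the example entropies:

```python
import math

# Extensivity check for two example models from the text:
# random networks: W_RN = 2^(N choose 2)  =>  S_RN = (log W_RN)^(1/2),
# RW cascade:      W_RWC = 2^(2^N - 1)    =>  S_RWC = log log W_RWC.
# Both S/N ratios approach constants, confirming S ~ N.
def S_RN(N):
    logW = N * (N - 1) / 2 * math.log(2)   # log of 2^(N(N-1)/2)
    return math.sqrt(logW)

def S_RWC(N):
    logW = (2.0**N - 1.0) * math.log(2)    # stays finite in floats for N <= ~1000
    return math.log(logW)

for N in (100, 500):
    print(S_RN(N) / N, S_RWC(N) / N)
```

The limiting constants are $\sqrt{\log 2 / 2}$ and $\log 2$, respectively; a direct Boltzmann entropy $\log W$ would instead grow like $N^2$ and $2^N$ for these two models.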
2.1 Universality classes for scaling expansions

Scaling expansions define universality classes of statistical complex systems according to the set of scaling exponents of their sample space [22]. The representation of the sample space volume, $W(N)$, by its scaling expansion can be used to uniquely describe the statistical properties in the thermodynamic limit. Consider a function $c(x)$ represented by its scaling expansion
$$c(x) \sim \exp^{(l)}\left(\prod_{j=0}^{n} \left[\log^{(j)}(x)\right]^{c^{(l)}_j}\right)\,. \qquad (9)$$
Its scaling exponents can be collected in the scaling vector
$$\mathcal{C} = \{l;\ c^{(l)}_0, c^{(l)}_1, \dots, c^{(l)}_n\}\,. \qquad (10)$$
In principle, the scaling vector can be infinite; however, typically, after several terms the corrections are either zero or do not contribute significantly. The parameter $n$ denotes the number of corrections.
Let $a(x)$ and $b(x)$ be two functions with their respective scaling expansions determined by the two vectors of scaling exponents
$$\mathcal{A} = \{l_a;\ a_0, a_1, \dots, a_n\}\,, \qquad (11)$$
$$\mathcal{B} = \{l_b;\ b_0, b_1, \dots, b_n\}\,. \qquad (12)$$
Without loss of generality, $n$ can be the same for both vectors because one can always append zeros to the shorter vector. We can now define the equivalence relation
$$a(x) \sim b(x) \quad \text{if } \mathcal{A} \equiv \mathcal{B}\,, \qquad (13)$$
as well as a natural ordering
$$a(x) \preceq b(x) \quad \text{if } \mathcal{A} < \mathcal{B}\,, \qquad (14)$$
where the symbol $<$ is used in the lexicographic sense, i.e.,
$$\mathcal{A} < \mathcal{B} \quad \text{if} \quad \begin{cases} l_a < l_b\,,\\ l_a = l_b,\ a_0 < b_0\,,\\ l_a = l_b,\ a_0 = b_0,\ a_1 < b_1\,,\\ \dots \end{cases} \qquad (15)$$
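The lexicographic ordering of Eq. (15) is straightforward to implement. The sketch below (representing a scaling vector as a pair of $l$ and an exponent tuple, an implementation choice of ours) relies on Python's built-in lexicographic tuple comparison:

```python
# Lexicographic ordering of scaling vectors, Eq. (15).
# A scaling vector {l; c0, ..., cn} is represented as (l, (c0, ..., cn));
# shorter exponent tuples are padded with zeros, as described in the text.
def sv_less(A, B):
    la, ca = A
    lb, cb = B
    n = max(len(ca), len(cb))
    ca = tuple(ca) + (0,) * (n - len(ca))
    cb = tuple(cb) + (0,) * (n - len(cb))
    return (la,) + ca < (lb,) + cb      # tuples compare lexicographically

# Aging random walk C = (1; 1/2, 0) grows slower than random walk C = (1; 1):
print(sv_less((1, (0.5, 0)), (1, (1,))))   # True
# The order l dominates: an l = 2 process never precedes an l = 1 process:
print(sv_less((2, (0.1,)), (1, (9, 9))))   # False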
For every vector $\mathcal{C}$ we define the corresponding entropy scaling vector $\mathcal{D}$, denoted by $\mathcal{D} = \mathcal{C}^{-1}$, that is obtained from Eq. (8) by the requirement of extensivity. One can define analogous relations for $\mathcal{D}$ through the relations for the corresponding vectors $\mathcal{C}$. Thus, for entropy scaling vectors $\mathcal{E}$ and $\mathcal{F}$, we can say that
$$\mathcal{E} < \mathcal{F} \quad \text{if} \quad \begin{cases} l_e < l_f\,,\\ l_e = l_f,\ e_0 < f_0\,,\\ l_e = l_f,\ e_0 = f_0,\ e_1 > f_1\,,\\ \dots \end{cases} \qquad (16)$$
Note that for sub-leading scaling exponents the inequality is reversed, which is a result of the minus sign in Eq. (8). Additionally, one can define basic algebraic operations on the scaling vectors, such as a generalized addition or a derivative operator. More details can be found in Appendix A. Let us make an important note. As discussed in [22], the SK axioms set requirements on the admissible set of scaling exponents. From SK2 we get that $d_l \equiv d^{(l)}_0 > 0$, and from SK3 that $d_0 < 1$. Note that the vector $\mathcal{D}$ can also be represented as
$$\mathcal{D} = \{l;\ d^{(l)}_0, d^{(l)}_1, \dots, d^{(l)}_n\} = \{\underbrace{0, \dots, 0}_{l\ \text{times}},\ d_l, d_{l+1}, \dots, d_{l+n}\}\,. \qquad (17)$$
This means that one can use the representation without specifying $l$, with an appropriate number of zeros at the beginning. This is useful, for example, for plots in the parametric space, where it is possible to plot processes of different order $l$ (as e.g., in Fig. 1). However, one has to keep in mind that this representation can be misleading in the sense that the limit $d_l \to 0$ does not have a clear meaning, since it changes the order of the process. This can be nicely seen in the example of Tsallis entropy [6], where
$$\lim_{q\to 1} \frac{\sum_{i=1}^{W} p_i^q - 1}{1-q} = -\sum_{i=1}^{W} p_i \log p_i\,, \qquad (18)$$
which can be formulated in terms of entropy scaling vectors as
$$\lim_{q\to 1^-} \mathcal{D} = \lim_{q\to 1^-} (1-q,\ 0) = (0,\ 1)\,. \qquad (19)$$
Interestingly, the limit from above, $q \to 1^+$, is even more pathological. In this case the scaling vector corresponding to $S_q(P)$ for $q > 1$ is $(0,0)$, because $S_q(N) \sim N^{1-q} + 1 \sim N^0$. These pathologies have their origin in the non-commutativity of the limits, $\lim_{N\to\infty} \lim_{d_l\to 0} \neq \lim_{d_l\to 0} \lim_{N\to\infty}$. The limit $d_l \to 0$ depends on the particular representation of the extensive entropy.
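The functional limit in Eq. (18) itself is perfectly regular and can be illustrated numerically (the distribution below is an arbitrary example of ours):

```python
import math

# Numerical illustration of Eq. (18): Tsallis entropy tends to Shannon
# entropy as q -> 1. The distribution below is an arbitrary example.
def tsallis(p, q):
    return (sum(pi**q for pi in p) - 1.0) / (1.0 - q)

def shannon(p):
    return -sum(pi * math.log(pi) for pi in p)

p = [0.5, 0.25, 0.25]
for q in (0.9, 0.99, 0.999):
    print(q, tsallis(p, q), shannon(p))   # difference shrinks as q -> 1
```

The pathology discussed in the text is thus not in the entropy functional but in the scaling-vector representation, which jumps discontinuously at $q = 1$.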
3 Information geometry of φ-deformations

Information geometry plays a central role in the theory of information as well as in statistical inference. It allows one to study the structure of the statistical manifold by means of differential geometry. We derive the information-geometric properties of the scaling expansion in the framework of φ-deformed logarithms introduced in [35,25]. The φ-deformation is a generalization of the logarithmic function. It can subsequently be used to establish a connection with information theory, where the logarithm plays the role of a natural information measure (Hartley information). The φ-deformed logarithm is defined by a positive, strictly increasing function $\phi(x)$ on $(0, +\infty)$ as
$$\log_\phi(x) = \int_1^x \frac{\mathrm{d}y}{\phi(y)}\,. \qquad (20)$$
Hence, $\log_\phi$ is an increasing concave function with $\log_\phi(1) = 0$. For $\phi(x) = x$, we obtain the ordinary logarithm. Naturally,
$$\frac{\mathrm{d}\log_\phi(x)}{\mathrm{d}x} = \frac{1}{\phi(x)}\,. \qquad (21)$$
The inverse function of $\log_\phi$, the so-called φ-exponential, is an increasing and convex function. This enables one to define the parametric φ-exponential family of probability distributions as
$$p(x;\theta) = \exp_\phi\left(-\Psi(\theta) + \sum_i x_i \theta_i\right)\,, \qquad (22)$$
where the function $\Psi(\theta)$ is called the Massieu function and normalizes the distribution. As discussed in [26], there are two natural ways to make a connection with the theory of information through the maximum entropy principle. The first is based on the maximization of the entropy functional under linear (thermodynamic) constraints, the latter is based on a maximization under so-called escort (or geometric) constraints. Both approaches lead to the φ-exponential family. The former approach defines the φ-deformed entropy as [25]
$$S^N_\phi(p) = -\sum_{i=1}^{W} \int_0^{p_i} \mathrm{d}x\, \log_\phi(x)\,, \qquad (23)$$
which is maximized by the φ-exponential family for linear constraints, i.e., constraints of the type
$$\sum_{i=1}^{W} p_i E_i = \langle E \rangle\,. \qquad (24)$$
In information geometry, escort distributions play the special role of dual coordinates on statistical manifolds [33]. They can be defined by φ-deformations as
$$P^\phi_i = \frac{\phi(p_i)}{\sum_k \phi(p_k)} = \frac{\phi(p_i)}{h_\phi(P)}\,. \qquad (25)$$
It can be shown that the entropy maximized by the φ-exponential family for escort constraints, i.e., for constraints of the type
$$\sum_{i=1}^{W} P^\phi_i E_i = \langle E \rangle_\phi\,, \qquad (26)$$
can be expressed as
$$S^A_\phi(p) = -\sum_{i=1}^{W} P^\phi_i \log_\phi(p_i) = -\frac{\sum_{i=1}^{W} \phi(p_i) \log_\phi(p_i)}{\sum_{j=1}^{W} \phi(p_j)}\,. \qquad (27)$$
Both approaches can be linked to information geometry, i.e., used to derive a generalization of the Fisher information metric, which can be done through a divergence (or relative entropy) of Bregman type, defined as
$$D_f(p\|q) = f(p) - f(q) - \langle \nabla f(q),\ p - q \rangle\,, \qquad (28)$$
where $\langle \cdot, \cdot \rangle$ denotes the inner product. Alternatively, one can use a divergence of Csiszár type, but its information geometry is trivial, because it is conformal to the ordinary Fisher information geometry, see e.g., Refs. [26,35]. Let us consider a parametric family of distributions $p(\theta)$. The Fisher information metric of this family at a point $\theta_0$ can be calculated as
$$g^f_{ij}(\theta) = \left.\frac{\partial^2 D_f(p(\theta_0)\|p(\theta))}{\partial\theta_i\, \partial\theta_j}\right|_{\theta=\theta_0}\,. \qquad (29)$$
Let us consider a discrete probability distribution $\{p_i\}_{i=0}^{n}$. The normalization is given by $\sum_{i=0}^{n} p_i = 1$, so we consider $p_1, \dots, p_n$ as independent variables, while $p_0$ is determined from $p_0 = 1 - \sum_{i=1}^{n} p_i$. We parameterize this probability simplex by a φ-deformed exponential family⁵. For the entropy $S^N_\phi$, we have $f^N_\phi(p) = \sum_i \int_0^{p_i} \log_\phi(x)\, \mathrm{d}x$, while for $S^A_\phi(p)$ we end up with $f(p) = \sum_i P^\phi_i \log_\phi(p_i)$. After a straightforward calculation, we obtain that [26]
$$g^N_{\phi,ij}(P) = \log'_\phi(p_i)\,\delta_{ij} + \log'_\phi(p_0) = \frac{1}{\phi(p_i)}\,\delta_{ij} + \frac{1}{\phi(p_0)}\,, \qquad (30)$$
and
$$g^A_{\phi,ij}(P) = -\frac{1}{h_\phi(p)}\left(\frac{\log''_\phi(p_i)}{\log'_\phi(p_i)}\,\delta_{ij} + \frac{\log''_\phi(p_0)}{\log'_\phi(p_0)}\right) = \frac{1}{h_\phi(p)}\left(\frac{\phi'(p_i)}{\phi(p_i)}\,\delta_{ij} + \frac{\phi'(p_0)}{\phi(p_0)}\right)\,, \qquad (31)$$
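For the undeformed case $\phi(x) = x$, Eq. (30) reduces to the classical Fisher-Rao metric on the simplex; a minimal sketch:

```python
# Sketch of Eq. (30): the Naudts-type metric for phi(x) = x reduces to the
# Fisher-Rao metric on the probability simplex, g_ij = delta_ij/p_i + 1/p_0.
def g_N(p, phi=lambda x: x):
    p0 = 1.0 - sum(p)                   # p lists the independent coordinates
    n = len(p)
    return [[(1.0 / phi(p[i]) if i == j else 0.0) + 1.0 / phi(p0)
             for j in range(n)] for i in range(n)]

G = g_N([0.2, 0.3])                     # p0 = 0.5; prints the 2x2 metric matrix
print(G)
```

Passing any other positive increasing `phi` (e.g., a power $x^q$) gives the corresponding deformed metric with no further changes.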
respectively. As a result, for a given φ-deformation there are two types of metric on the information manifold. Note that it is natural to consider a one-parametric class of affine connections for which one obtains the so-called dually-flat structure, for which the corresponding Christoffel coefficients vanish [33]. This structure is useful in information geometry; however, we stick to the well-known Levi-Civita connection (which can be obtained as a special case of a dually-flat connection, since the Levi-Civita connection is the only self-dual connection [4]), because the metric is non-vanishing. Thus, the corresponding invariants, such as the scalar curvature, are non-trivial and reveal some information about the statistical manifold.

⁵ Note that this parametric family typically constitutes a smooth manifold [33].
Let us now focus on the scalar curvature corresponding to the metric tensor, $R_\phi = g^{ik}_\phi g^{lj}_\phi R_{\phi,ilkj}$, in the thermodynamic limit $N \to \infty$. We focus on the microcanonical ensemble, i.e., we consider $p_i = 1/W$. We assume no prior information about the system or its dynamics, so all states are equally probable. It is possible to show in a technical but straightforward calculation that the scalar curvature is
$$R_\phi(W) = \frac{W(W-1)}{\left(2\, r_\phi(W+1)\right)^2}\,, \qquad (32)$$
which corresponds to the scalar curvature of a $W$-dimensional ball of radius $2 r_\phi$. The function $r_\phi$ depends only on the form of the φ-deformation. We call the function $r_\phi$ the characteristic length. For the case of the Amari metric, it can be expressed as
$$\left(r^A_\phi(W)\right)^2 = -\frac{\left(\log'_\phi\frac{1}{W}\right)^2 \left(\log''_\phi\frac{1}{W}\right)^3}{\left(\log'''_\phi\frac{1}{W}\,\log'_\phi\frac{1}{W} - 3\left(\log''_\phi\frac{1}{W}\right)^2\right)^2}\,, \qquad (33)$$
while for the metric of Naudts type we obtain
$$\left(r^N_\phi(W)\right)^2 = \frac{W \left(\log'_\phi\frac{1}{W}\right)^3}{\left(\log''_\phi\frac{1}{W}\right)^2}\,. \qquad (34)$$
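For the ordinary logarithm the characteristic length of Eq. (34) is constant, reflecting the constant curvature of the Shannon case; a quick sketch:

```python
import math

# Sketch of Eq. (34): for log_phi = log (phi(x) = x) one has
# log'(1/W) = W and log''(1/W) = -W^2, hence
# (r^N)^2 = W * W^3 / W^4 = 1 for every W: constant curvature (Shannon case).
def r_N_sq(W, dlog=lambda x: 1.0 / x, d2log=lambda x: -1.0 / x**2):
    x = 1.0 / W
    return W * dlog(x)**3 / d2log(x)**2

print([r_N_sq(W) for W in (10.0, 100.0, 1000.0)])  # all close to 1
```

Any genuine $W$-dependence of $r_\phi$ therefore signals a departure from multinomial (exponential sample space) statistics.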
4 Information geometry of scaling expansions

Let us now consider an arbitrary φ-deformed logarithm. We show how to introduce a generalization of the logarithm with a given asymptotic scaling. In contrast to φ-deformations, we do not start with the definition of $\phi$, but focus on the definition of the logarithm. We denote the desired logarithmic function by $\Lambda_{\mathcal{D}}$. Let us state the requirements that $\Lambda_{\mathcal{D}}$ should fulfil:

1. Domain: $\Lambda_{\mathcal{D}}: \mathbb{R}^+ \to \mathbb{R}$,
2. Monotonicity: $\Lambda'_{\mathcal{D}}(x) > 0$,
3. Concavity: $\Lambda''_{\mathcal{D}}(x) < 0$,
4. Normalization: $\Lambda'_{\mathcal{D}}(1) = 1$,
5. Self-duality: $\Lambda_{\mathcal{D}}(1/x) = -\Lambda_{\mathcal{D}}(x)$,
6. Scaling expansion: $\Lambda_{\mathcal{D}}(x) \sim \prod_{j=0}^{k} \left[\log^{(j+l)}(x)\right]^{d^{(l)}_j}$ for $x \to \infty$.

The requirements follow the properties of the ordinary logarithm. Particularly convenient is the self-duality requirement, from which we can directly calculate the asymptotic expansion around $0^+$. A direct consequence of self-duality is that $\Lambda_{\mathcal{D}}(1) = 0$. Next, we want to find a representation that is simple, analytically expressible, and universal for any set of scaling exponents. Due to the self-duality requirement, we can focus only on the interval $(1, +\infty)$, while on the interval $(0,1)$ the logarithm is defined by self-duality. To find an appropriate representation, we start from the
scaling expansion itself. Unfortunately, the scaling expansion, $\prod_{j=0}^{k} \left[\log^{(j+l)}(x)\right]^{d^{(l)}_j}$, is not generally defined on the whole interval $(1, \infty)$, since the domain of $\log^{(l)}(x)$ is $(\exp^{(l-2)}(1), \infty)$. We can overcome this issue by adjusting the nested logarithm, replacing $\log \mapsto 1 + \log$. Further, to be able to fulfil the normalization condition, we add a multiplicative constant to the first nesting, so that for each order the corresponding term can be expressed as $\left(1 + r_j \log\left([1+\log]^{(j+l-1)}(x)\right)\right)$. Thus, the generalized logarithm can be expressed as
$$\Lambda_{\mathcal{D}}(x) = R\left(\prod_{j=0}^{n} \left[1 + r_j \log\left([1+\log]^{(j+l-1)}(x)\right)\right]^{d^{(l)}_j} - 1\right)\,. \qquad (35)$$
The logarithm automatically fulfils the condition $\Lambda_{\mathcal{D}}(1) = 0$. The parameters $r_j$ define a set of scale parameters that influence the behavior at finite values, while the asymptotic properties are preserved. Because
$$\Lambda'_{\mathcal{D}}(1) = R \sum_{j=0}^{n} r_j\, d^{(l)}_j\,, \qquad (36)$$
we can obtain the normalization of the derivative in several ways. For this we define the “calibration”
$$r_0 = \rho\, \frac{1 - r \sum_{j=1}^{n} d^{(l)}_j}{r\, d^{(l)}_0}\,, \qquad (37)$$
$$r_k = \rho\,, \quad k = 1, \dots, n\,, \qquad (38)$$
$$R = r/\rho\,, \qquad (39)$$
where $r$ and $\rho$ are free parameters. The parameter $\rho$ can be determined by additional requirements. The first option is to require that $\Lambda_{\mathcal{D}}$ is smooth enough; at least it should have a continuous second derivative. From the second derivative of the self-duality condition together with the normalization condition, we get $\Lambda''_{\mathcal{D}}(1) = -1$. Following a straightforward calculation, we find
$$\Lambda''_{\mathcal{D}}(1) = R\left(2\sum_{i<j} r_i r_j\, d^{(l)}_i d^{(l)}_j + \sum_{j=0}^{n} \left[r_j^2\, d^{(l)}_j \left(d^{(l)}_j - 1\right) - (j+l)\, r_j\, d^{(l)}_j\right]\right)\,. \qquad (40)$$
Using Eq. (37) in Eq. (40), we get an expression for $\rho_C$, i.e., the scale parameter in the smooth calibration,
$$\rho_C = \frac{r \sum_{j=1}^{n} j\, d^{(l)}_j + l - 1}{\dfrac{d^{(l)}_0 - 1}{d^{(l)}_0\, r}\left(1 - r \sum_{j=1}^{n} d^{(l)}_j\right)^2 + (2-r) \sum_{j=1}^{n} d^{(l)}_j - r \left(\sum_{j=1}^{n} d^{(l)}_j\right)^2}\,. \qquad (41)$$
The free parameter $r$ can be used to ensure that $\rho$ is positive. Alternatively, we can simply set $r_0 = 1$, which is useful for several applications. In this case we get that the scale parameter $\rho_L$ in the leading-order calibration is simply
$$\rho_L = \frac{r\, d^{(l)}_0}{1 - r \sum_{j=1}^{n} d^{(l)}_j}\,. \qquad (42)$$
Note that after a proper normalization, this calibration corresponds to the calibration used in [18,19]. Unless a continuous second derivative is explicitly required, it is more convenient to work with this simpler calibration.
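A sketch of the generalized logarithm of Eq. (35) with the leading-order calibration of Eq. (42) (our own direct implementation, valid for $l \geq 1$ only), verifying $\Lambda_{\mathcal{D}}(1) = 0$ and $\Lambda'_{\mathcal{D}}(1) = 1$ numerically:

```python
import math

# Generalized logarithm of Eq. (35) with the leading-order calibration
# r0 = 1, rk = rho_L of Eq. (42), R = r/rho_L. Valid here for l >= 1.
def mu(k, x):                            # nested logarithm [1 + log]^(k)(x)
    for _ in range(k):
        x = 1.0 + math.log(x)
    return x

def Lambda(x, l, d, r=1.0):
    rho = r * d[0] / (1.0 - r * sum(d[1:]))          # Eq. (42)
    rs = [1.0] + [rho] * (len(d) - 1)                # r0 = 1, rk = rho_L
    prod = 1.0
    for j, (rj, dj) in enumerate(zip(rs, d)):
        prod *= (1.0 + rj * math.log(mu(j + l - 1, x))) ** dj
    return (r / rho) * (prod - 1.0)

# For D = {1; 1} the construction recovers the ordinary logarithm:
print(Lambda(5.0, 1, (1.0,)), math.log(5.0))
# The normalization Lambda'(1) = 1 holds for other scaling vectors too,
# e.g. D = {1; 2, -0.5} (an illustrative choice):
h = 1e-6
print((Lambda(1 + h, 1, (2.0, -0.5)) - Lambda(1 - h, 1, (2.0, -0.5))) / (2 * h))
```

The normalization is guaranteed by construction, since $R \sum_j r_j d^{(l)}_j = 1$ for the calibration (42).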
We now turn our attention to the information geometry of $\Lambda_{\mathcal{D}}$-deformations and introduce a notation for the nested logarithm
$$\mu_k(x) = [1 + \log]^{(k)}(x)\,. \qquad (43)$$
We sketch the results for the scaling expansion with one correction. All technical details can be found in Appendix B. In Appendix C we show the calculation for arbitrary scaling vectors and calibrations, which is technically more difficult, but leads to the same results. We now denote the scaling vector as $\mathcal{D} = (l; c, d)$. Note that this entropy has been studied for $l = 0$ in [18]. This inspires us to define the generalized logarithm as
$$\log_{(l;c,d)}(x) = r\left(\mu_l(x)^c \left[1 + \frac{1-cr}{dr}\,\log \mu_l(x)\right]^d - 1\right) = \log_{(c,d)}(\mu_l(x))\,. \qquad (44)$$
This definition corresponds to the choice of $\rho$ in Eq. (42)⁶. The logarithms are depicted in Fig. 2(a) for various scaling exponents. The inverse function, the deformed exponential, can be obtained in terms of the Lambert-W function⁷:
$$\exp_{(l;c,d)}(x) = \nu_l\left(\exp\left(\frac{d}{c}\left[W\!\left(B\,(1 + x/r)^{1/d}\right) - W(B)\right]\right)\right)\,, \qquad (45)$$
where $B = \frac{cr}{1-cr}\exp\left(\frac{cr}{1-cr}\right)$ and $\nu_l$ is the inverse function of $\mu_l$, i.e.,
$$\nu_l(x) = \underbrace{\exp(\exp(\dots(\exp}_{l\ \text{times}}(x-1) - 1)\dots) - 1)\,. \qquad (46)$$
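Eqs. (44)-(45) can be cross-checked numerically: the deformed exponential inverts the deformed logarithm. The sketch below uses an illustrative parameter choice ($l = 1$, $c = d = 1$, $r = 1/2$, so that $cr < 1$) and a simple Newton iteration for the Lambert-W function:

```python
import math

# Round-trip check of Eqs. (44)-(45): exp_(l;c,d)(log_(l;c,d)(x)) = x.
def lambert_w(z, w=0.5):
    for _ in range(200):                 # Newton iteration for w e^w = z, z > 0
        e = math.exp(w)
        w -= (w * e - z) / (e * (1.0 + w))
    return w

def log_lcd(x, l, c, d, r):
    m = x
    for _ in range(l):                   # mu_l(x) = [1 + log]^(l)(x)
        m = 1.0 + math.log(m)
    return r * (m**c * (1.0 + (1.0 - c*r) / (d*r) * math.log(m))**d - 1.0)

def exp_lcd(x, l, c, d, r):
    a = c * r / (1.0 - c * r)
    B = a * math.exp(a)
    y = math.exp(d / c * (lambert_w(B * (1.0 + x / r)**(1.0 / d)) - lambert_w(B)))
    for _ in range(l):                   # nu_l, the inverse of mu_l
        y = math.exp(y - 1.0)
    return y

l, c, d, r = 1, 1.0, 1.0, 0.5            # illustrative parameters with cr < 1
x0 = 3.0
print(exp_lcd(log_lcd(x0, l, c, d, r), l, c, d, r))   # recovers 3.0
```

For production use, `scipy.special.lambertw` would replace the hand-rolled Newton iteration.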
Note that, depending on the values of c and d, this deformed exponential contains the exponential, power laws, and stretched exponentials as special cases [1]. It is easy to see that the corresponding scaling vector of the exponential is C = (l; 1/c, d/c). The function \phi_{(l;c,d)}(x) can be expressed as

\phi_{(l;c,d)}(x) = \phi_{(0;c,d)}(\mu_l(x))\cdot\mu'_l(x)^{-1} = \frac{\mu_l(x)\left[dr + (1-cr)\log\mu_l(x)\right]}{\left(\log_{(c,d)}(\mu_l(x)) + r\right)\left[d + c(1-cr)\log\mu_l(x)\right]}\prod_{j=0}^{l-1}\mu_j(x). (47)
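The \phi-function of Eq. (47) and the escort distribution built from it can be explored numerically. A minimal sketch, assuming the usual \phi-deformation convention \phi(x) = 1/\log'_\phi(x) (computed here by a finite difference) and restricting to parameter values for which the deformation stays real-valued on (0,1):

```python
import math

def mu(l, x):
    for _ in range(l):
        x = 1.0 + math.log(x)
    return x

def log_lcd(x, l, c, d, r=1.0):
    m = mu(l, x)
    return r * (m ** c * (1.0 + (1.0 - c * r) / (d * r) * math.log(m)) ** d - 1.0)

def phi(x, l, c, d, r=1.0, h=1e-6):
    """phi_{(l;c,d)}(x) = 1 / log'_{(l;c,d)}(x), derivative by central difference."""
    dlog = (log_lcd(x + h, l, c, d, r) - log_lcd(x - h, l, c, d, r)) / (2.0 * h)
    return 1.0 / dlog

def escort(p, l, c, d, r=1.0):
    """Two-event escort distribution rho(p) = phi(p) / (phi(p) + phi(1 - p))."""
    a, b = phi(p, l, c, d, r), phi(1.0 - p, l, c, d, r)
    return a / (a + b)

# the escort of a two-event distribution is itself normalized
assert abs(escort(0.5, 0, 0.7, 0.3, r=1.2) - 0.5) < 1e-9
assert abs(escort(0.35, 0, 0.7, 0.3, r=1.2) + escort(0.65, 0, 0.7, 0.3, r=1.2) - 1.0) < 1e-9
```

Here l = 0 and r = 1.2 are chosen so that the inner bracket of Eq. (44) remains positive over the probed probabilities; these values are illustrative only.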
The escort distribution \rho_{(l;c,d)}(p) = \phi_{(l;c,d)}(p)/(\phi_{(l;c,d)}(p) + \phi_{(l;c,d)}(1-p)), corresponding to the two-event distribution (p, 1-p), is depicted in Fig. 2(b) for various scaling exponents. Interestingly, for D < (1;1), i.e., for entropies corresponding to sub-exponential sample space growth, the distribution emphasizes high probabilities (generally p > 1/N), while for D > (1;1), i.e., for super-exponential growth, it emphasizes low probabilities (p < 1/N). Let us finally show the asymptotic behavior of the curvature that corresponds to the deformed logarithm. It can easily be calculated by keeping only the dominant contributions from each term in Eqs. (33) and (34). In this case we have

\frac{d^n \log_{(l;c)}(x)}{dx^n} \approx (\mu_l(x))^{c-1}\,\mu'_l(x)\, x^{1-n} \quad \text{for } x\to\infty, (48)
and therefore

r_{(l;c)}(W) \approx W\,\mu_l^{c-1}(W)\,\mu'_l(W) \quad \text{for } W\to\infty, (49)
^6 Note that the original (c,d)-logarithm (as it appears in the rightmost part of Eq. (44)) was introduced in [18] for l = 0 and c \mapsto 1-c. That parametrization is, however, less convenient for l > 0.
^7 The Lambert W function is defined as the solution of the equation W(z)e^{W(z)} = z.
Fig. 2. (a) Generalized logarithms corresponding to scaling exponents of the aforementioned models. (b) Escort distributions corresponding to the generalized logarithms. The scaling exponents (l; c, d) for the models are: Random walk (RW): (1; 1, 0), Ageing random walk (ARW): (1; 2, 0), Magnetic coin model (MC): (1; 1, 1), Random network (RN): (1; 1/2, 0),
Fig. 3. Characteristic length corresponding to the curvature of the statistical manifold for the equiprobable distribution, for different scaling exponents. Panel (a) corresponds to the length of Amari type, panel (b) to the length of Naudts type.
for both curvatures, calculated from both types of metric, as shown in Appendix B. From this we deduce that

\lim_{W\to\infty} r(W) = \begin{cases} +\infty, & l = 0, \text{ or } l = 1,\ c > 1,\\ 0, & l \geq 2, \text{ or } l = 1,\ c < 1. \end{cases} (50)
For the case l = 1 and c = 1, we can make a similar approximation,

\frac{d^n \log_{(1;1,d)}(x)}{dx^n} \approx x^{-n}\left[\log(1+\log x)\right]^d, (51)

to get

r_{(1;1,d)}(W) \approx \left[\log(1+\log W)\right]^d \quad \text{for } W\to\infty. (52)
Similar results can be obtained for higher-order corrections (see Appendix C). The behavior of r for different scaling vectors is depicted in Fig. 3. The asymptotic behavior is similar for both types of curvature; differences appear only for small N. In conclusion, we find three distinct regimes for the statistical manifold with respect to the scaling vector:
(I) D < (1; 1): r_D(W) → ∞ for W → ∞,
(II) D = (1; 1): r_D(W) = 1 for W > 0,
(III) D > (1; 1): r_D(W) → 0 for W → ∞.
As a result, the curvature exhibits a phase transition: in the thermodynamic limit, the statistical manifold flattens for sub-exponential processes, has constant sectional curvature for exponential processes, and becomes increasingly curved for super-exponential processes. While processes with exponentially growing sample space have (practically) independent sub-systems, sub-exponential processes impose restrictions and constraints on the sample space. Super-exponential processes are characterized by emergent structures in their sample space. The scaling vector plays the natural role of a set of order parameters. Let us finally note that the limit W → ∞ is performed for the characteristic length r_D(W). The “limit space” obtained from the statistical manifolds as W → ∞ might not be a smooth manifold, and its curvature might not correspond to the limit lim_{W→∞} R_D(W).
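The three regimes can be illustrated with the leading-order characteristic length; a small numerical sketch, assuming the asymptotic form r_{(l;c)}(W) ≈ W μ_l^{c-1}(W) μ'_l(W) discussed around Eqs. (48)-(50), up to constant factors:

```python
import math

def mu(l, x):
    # nested logarithm [1 + log]^(l)(x)
    for _ in range(l):
        x = 1.0 + math.log(x)
    return x

def mu_prime(l, x):
    # mu_l'(x) = 1 / prod_{k=0}^{l-1} mu_k(x), cf. Eq. (63)
    p = 1.0
    for k in range(l):
        p *= mu(k, x)
    return 1.0 / p

def char_length(l, c, W):
    # leading-order characteristic length r_{(l;c)}(W), up to constant factors
    return W * mu(l, W) ** (c - 1.0) * mu_prime(l, W)

# (I) sub-exponential sample space growth (here l = 1, c > 1): the length grows with W
assert char_length(1, 2.0, 1e6) > char_length(1, 2.0, 1e3)
# (II) exponential growth (l = 1, c = 1): the length stays constant (= 1)
assert abs(char_length(1, 1.0, 1e6) - 1.0) < 1e-12
# (III) super-exponential growth (here l = 2): the length shrinks with W
assert char_length(2, 1.0, 1e6) < char_length(2, 1.0, 1e3)
```

For l = 1, c = 1 the expression reduces identically to 1, matching regime (II) above.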
5 Conclusions and Perspectives
In this paper, we have deﬁned a class of deformed logarithms with a given scaling
expansion in the framework of φ-deformed logarithms. The corresponding entropy
can be used to deﬁne the statistical manifold with generalized Fisher-Rao metric. We
have shown that for the microcanonical ensemble in the thermodynamic limit, the
scalar curvature exhibits a phase transition where the critical point is represented by
the class of phenomena that are characterized by exponentially growing phase spaces.
These include weakly interacting systems that are correctly described by Shannon
entropy. The scaling vector of a given system naturally deﬁnes a set of order param-
eters. A possible explanation for this phenomenon is that the number of independent
degrees of freedom grows slower than the size of the system for sub-exponential pro-
cesses and faster for the super-exponential processes. This classiﬁcation, however, does
not appear for the case of the Fisher metric of Csisz´ar type, since the characteristic
length is constant for every φ-deformation.
Contrary to the common approach in information geometry, where the statistical manifold corresponds to a single functional family of distributions (e.g., the exponential family), this paper presents a parametric way to switch between different functional families of distributions (e.g., from power laws to stretched exponentials). This opens a novel connection between parametric and non-parametric information geometry and enables the classification of different types of statistical manifolds related to various classes of deformed exponential families.
It will be natural to extend these results to generalizations of the Bregman divergence enabling gauge invariance [44]. Moreover, we will focus on applying the results to the canonical ensemble and on using the well-known results on the Fisher information metric on the thermodynamic manifold [45,46] in the case of complex systems, where a generalized form of the Boltzmann factor is needed [47]. Finally, it should also be possible to go beyond equilibrium statistical mechanics and extend the generalized Fisher metric to non-equilibrium scenarios [48].
Acknowledgements
We acknowledge support from the Austrian Science Fund FWF project I 3073.
References
1. S. Thurner, R. Hanel and P. Klimek, Introduction to the Theory of Complex Systems.
Oxford University Press: Oxford, UK (2018).
2. S. Thurner, B. Corominas-Murtra and R. Hanel, Three faces of entropy for complex
systems: Information, thermodynamics, and the maximum entropy principle. Phys.
Rev. E 96 (2017) 032124.
3. N. Ay, J. Jost, H.V. Le, L. Schwachhöfer, Information Geometry. Springer: Berlin, Germany (2017).
4. S.-I. Amari, Information Geometry and Its Applications. Springer, Japan, (2016).
5. W. Janke, D.A. Johnston and R. Kenna, Information geometry and phase transitions.
Physica A 336 (2004) 181–186.
6. C. Tsallis, Possible Generalization of Boltzmann-Gibbs Statistics. Journal of Statistical
Physics 52 (1988) 479-487.
7. C. Tsallis, M. Gell-Mann and Y. Sato, Asymptotically scale-invariant occupancy of phase space makes the entropy S_q extensive. Proc. Natl. Acad. Sci. USA 102 (2005) 15377–15382.
8. G. Kaniadakis, Statistical mechanics in the context of special relativity. Phys. Rev. E
66 (2002) 056125.
9. P. Jizba and T. Arimitsu, The world according to Rényi: Thermodynamics of multifractal systems. Ann. Phys. 312 (2004) 17–59.
10. C. Tsallis and L.J. Cirto, Black hole thermodynamical entropy. Eur. Phys. J. C 73
(2013) 2487.
11. T. S. Biró, V. G. Czinner, H. Iguchi and P. Ván, Black hole horizons can hide positive heat capacity. Phys. Lett. B 782 (2018) 228–231.
12. C.E. Shannon, A mathematical theory of communication, Bell Syst. Tech. J. 27 (1948)
379-423, 623-656.
13. A.I. Khinchin, Mathematical Foundations of Information Theory. Dover, New York
(1957).
14. S. Abe, Axioms and uniqueness theorem for Tsallis entropy. Physics Letters A 271(1-2) (2000) 74-79.
15. V. M. Ilić and M. S. Stanković, Generalized Shannon–Khinchin axioms and uniqueness theorem for pseudo-additive entropies. Physica A 411 (2014) 138-145.
16. P. Tempesta, Group entropies, correlation laws, and zeta functions. Phys. Rev. E 84
(2011) 021121.
17. P. Jizba and J. Korbel, Maximum Entropy Principle in Statistical Inference: Case for
Non-Shannonian Entropies. Phys. Rev. Lett. 122 (2019) 120601.
18. R. Hanel and S. Thurner, A comprehensive classiﬁcation of complex statistical systems
and an axiomatic derivation of their entropy and distribution functions. Europhys.
Lett. 93 (2011) 20006.
19. R. Hanel and S. Thurner, When do generalized entropies apply? How phase space
volume determines entropy. Europhys. Lett. 96 (2011) 50003.
20. C. Anteneodo and A. R. Plastino. Maximum entropy approach to stretched exponential
probability distributions. J. Phys. A 32 (1999) 1089.
21. H.J. Jensen, R.H. Pazuki, G. Pruessner and P. Tempesta, Statistical mechanics of
exploding phase spaces: Ontic open systems. J. Phys. A 51 (2018) 375002.
22. J. Korbel, R. Hanel and S. Thurner, Classiﬁcation of complex systems by their sample-
space scaling exponents. New J. Phys. 20, 2018, 093007.
23. E. T. Copson, Asymptotic Expansions, Cambridge Tracts in Mathematics, Cambridge
University Press (1965).
24. J. Naudts, Deformed exponentials and logarithms in generalized thermostatistics. Physica A 316 (2002) 323–334.
25. J. Naudts, Generalised thermostatistics. Springer Science & Business Media (2011).
26. J. Korbel, R. Hanel and S. Thurner, Information Geometric Duality of φ-Deformed
Exponential Families. Entropy 21 (2019) 112.
27. C. Beck and F. Schlögl, Thermodynamics of chaotic systems: an introduction. Cambridge University Press (1995).
28. C. Beck and E.D.G. Cohen, Superstatistics. Physica A 322 (2003) 267–275.
29. C. Tsallis and A.M.C. Souza, Constructing a statistical mechanics for Beck-Cohen
superstatistics. Phys. Rev. E 67 (2003) 026106.
30. R. Hanel, S. Thurner and M. Gell-Mann, Generalized entropies and logarithms and their duality relations. Proc. Natl. Acad. Sci. USA 109 (2012) 19151–19154.
31. S. Abe, Geometry of escort distributions. Phys. Rev. E 68 (2003) 031101.
32. A. Ohara, H. Matsuzoe and S.-I. Amari, A dually ﬂat structure on the space of escort
distributions. J. Phys. Conf. Ser. 201 (2010) 012012.
33. S.-I. Amari, A. Ohara and H. Matsuzoe, Geometry of deformed exponential families: Invariant, dually-flat and conformal geometries. Physica A 391 (2012) 4308–4319.
34. D. P. K. Ghikas and F. D. Oikonomou, Towards an information geometric characteri-
zation/classiﬁcation of complex systems. I. Use of generalized entropies. Physica A 496
(2018) 384.
35. J. Naudts, Continuity of a class of entropies and relative entropies. Rev. Math. Phys. 16 (2004) 809–822.
36. F. Caruso and C. Tsallis, Nonadditive entropy reconciles the area law in quantum
systems with classical thermodynamics. Phys. Rev. E 78 (2008) 021102.
37. J. A. Carrasco, F. Finkel, A. González-López, M. A. Rodríguez and P. Tempesta, Generalized isotropic Lipkin–Meshkov–Glick models: ground state entanglement and quantum entropies. J. Stat. Mech. Theor. Exp. 2016(3) (2016) 033114.
38. J. Zhang, Divergence function, duality, and convex analysis. Neural Computation, 16(1)
(2004) 159-195.
39. S.-I. Amari, Diﬀerential-Geometrical Methods in Statistics. Lecture Notes in Statistics
Vol. 28 (2012) Springer Science & Business Media.
40. B. Corominas-Murtra, R. Hanel, S. Thurner, Understanding scaling through history-
dependent processes with collapsing sample space. Proc. Nat. Acad. Sci. USA 112
(2015) 5348.
41. B. Corominas-Murtra, R. Hanel and S. Thurner, Extreme robustness of scaling in
sample space reducing processes explains Zipf’s law in diﬀusion on directed networks.
New J. Phys. 18 (2016) 093010.
42. B. Corominas-Murtra, R. Hanel and S. Thurner, Sample space reducing cascading
processes produce the full spectrum of scaling exponents. Scientiﬁc Reports 7 (2017)
11223.
43. R. Hanel and S. Thurner, Maximum conﬁguration principles for driven systems with
arbitrary driving, Entropy 20 (2018) 838.
44. J. Naudts and J. Zhang, Rho–tau embedding and gauge freedom in information geometry. Information Geometry 1(1) (2018) 79-115.
45. G. Ruppeiner, Riemannian geometry in thermodynamic ﬂuctuation theory. Rev. Mod.
Phys. 67 (1995) 605.
46. G. E. Crooks, Measuring Thermodynamic Length. Phys. Rev. Lett. 99 (2007) 100602.
47. R. Hanel and S. Thurner, Derivation of power-law distributions within standard sta-
tistical mechanics. Physica A 351(2-4) (2005) 260-268.
48. S. Ito, Stochastic Thermodynamic Interpretation of Information Geometry. Phys. Rev.
Lett. 121 (2018) 030605.
A Basic algebra of scaling vectors
Let us discuss some definitions of ordinary operations on the space of scaling exponents. First, let us introduce the truncated vector of the scaling vector defined in Eq. (10) as

C_k = \{l;\, c^{(l)}_0, c^{(l)}_1, \ldots, c^{(l)}_k\}, (53)
where k ≤ n. Then we can introduce:
– Truncated equivalence relation: a(x) ≡^{(k)} b(x) if A_k ≡ B_k.
– Truncated inequality relation: a(x) ≺^{(k)} b(x) if A_k < B_k.
Let us also add one more inequality relation, for the case when even the orders l are not equal. For this we define:
– Strong inequality relation: a(x) ≺≺ b(x) if l_a < l_b.
Let us investigate representations of basic operations on the space of scaling exponents. Before that, let us define the rescaling of a general operator O : R^m → R as

O^{(l)}(x_1, x_2, \ldots, x_m) = \exp^{(l)}\left[O\left(\log^{(l)} x_1, \log^{(l)} x_2, \ldots, \log^{(l)} x_m\right)\right]. (54)

Let us now denote the generalized addition as a(x) ⊕^{(l)} b(x) and the generalized multiplication as a(x) ⊗^{(l)} b(x). It is easy to show that

a(x) ⊗^{(l)} b(x) = a(x) ⊕^{(l+1)} b(x). (55)
Let us now consider, without loss of generality, that a(x) ≼ b(x). The scaling vector C of c(x) = a(x) ⊕^{(l)} b(x) can be expressed as follows:

C = \begin{cases} A + B = (l;\, a_0+b_0, a_1+b_1, \ldots), & \text{for } l_a = l_b = l;\\ B, & \text{for } l < l_a \leq l_b \text{ or } l = l_a < l_b;\\ \text{undefined}, & \text{for } l > l_a. \end{cases} (56)
The scaling vector C of the generalized composition c(x) = \exp^{(l)} b(\log^{(l)} a(x)) can be expressed as

C = \begin{cases} b^{(l_b)}_0 \odot A = (l_a + l_b;\, a_0 b_0, a_1 b_0, a_2 b_0, \ldots, a_n b_0), & \text{for } l_a = l;\\ 1^{(l_b)} \odot A = (l_a + l_b;\, a_0, a_1, a_2, \ldots, a_n), & \text{for } l < l_a;\\ \text{undefined}, & \text{for } l > l_a. \end{cases} (57)
Finally, let us focus on the derivative of the scaling expansion. Let us denote the rescaled derivative operator as

{}^{(l)}D_x[f] = \exp^{(l)}\left(\frac{d\left(\log^{(l)} f(x)\right)}{dx}\right). (58)

The scaling vector corresponding to the rescaled derivative is

{}^{(l)}A' = \begin{cases} (l_a;\, a_0 - 1, a_1, a_2, \ldots, a_n), & \text{for } l_a = l;\\ A, & \text{for } l_a > l;\\ (l;\, \underbrace{1,\ldots,1}_{l-l_a}, 0, \ldots), & \text{for } l_a < l. \end{cases} (59)
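The bookkeeping of Eqs. (56) and (59) can be sketched as plain tuple manipulations. The encoding of a scaling vector as a pair (order l, coefficient tuple) is a hypothetical representation chosen only for illustration; the "undefined" case is mapped to an exception:

```python
def add_scaling(A, B, l):
    """Scaling vector of a(x) (+)^(l) b(x), following Eq. (56).
    A scaling vector is encoded as a pair (order, coefficient tuple);
    assumes the ordering a(x) <= b(x), i.e. l_a <= l_b."""
    la, ca = A
    lb, cb = B
    if l > la:
        raise ValueError("undefined for l > l_a")
    if la == lb == l:
        n = max(len(ca), len(cb))
        ca = tuple(ca) + (0,) * (n - len(ca))
        cb = tuple(cb) + (0,) * (n - len(cb))
        return (l, tuple(x + y for x, y in zip(ca, cb)))
    return B  # l < l_a <= l_b or l = l_a < l_b: the faster-growing term wins

def derivative_scaling(A, l):
    """Scaling vector of the rescaled derivative, following Eq. (59)."""
    la, ca = A
    if la == l:
        return (la, (ca[0] - 1,) + tuple(ca[1:]))
    if la > l:
        return A
    return (l, (1,) * (l - la) + (0,))

assert add_scaling((1, (1, 2)), (1, (3, 4)), 1) == (1, (4, 6))   # l_a = l_b = l
assert add_scaling((0, (2,)), (1, (3,)), 0) == (1, (3,))          # l = l_a < l_b
assert derivative_scaling((2, (3, 1)), 2) == (2, (2, 1))          # l_a = l
assert derivative_scaling((0, (2,)), 2) == (2, (1, 1, 0))         # l_a < l
```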
B Asymptotic curvature of (l;c, d)-logarithm
In this appendix, we calculate asymptotic properties of the (l;c,d)-logarithm. Let us first express the derivatives of the (l;c,d)-logarithm in terms of the (c,d)-logarithm and \mu_l:

\log'_{(l;c,d)}(x) = \log'_{(c,d)}(\mu_l(x))\,\mu'_l(x), (60)

\log''_{(l;c,d)}(x) = \log''_{(c,d)}(\mu_l(x))\,(\mu'_l(x))^2 + \log'_{(c,d)}(\mu_l(x))\,\mu''_l(x), (61)

\log'''_{(l;c,d)}(x) = \log'''_{(c,d)}(\mu_l(x))\,(\mu'_l(x))^3 + 3\log''_{(c,d)}(\mu_l(x))\,\mu'_l(x)\,\mu''_l(x) + \log'_{(c,d)}(\mu_l(x))\,\mu'''_l(x). (62)
The derivatives of the nested logarithm \mu_l(x) = [1 + \log]^{(l)}(x) can be expressed as:

\mu'_l(x) = \frac{1}{\prod_{k=0}^{l-1}\mu_k(x)}, (63)

\mu''_l(x) = -\mu'_l(x)\sum_{k=0}^{l-1}\mu'_{k+1}(x) = -\frac{1}{\prod_{k=0}^{l-1}\mu_k(x)}\sum_{k=0}^{l-1}\frac{1}{\prod_{j=0}^{k}\mu_j(x)}, (64)
\mu'''_l(x) = -\mu''_l(x)\sum_{k=0}^{l-1}\mu'_{k+1}(x) - \mu'_l(x)\sum_{k=0}^{l-1}\mu''_{k+1}(x) = \mu'_l(x)\left(\sum_{k=0}^{l-1}\mu'_{k+1}(x)\right)^2 + \mu'_l(x)\sum_{k=0}^{l-1}\mu'_{k+1}(x)\sum_{j=0}^{k}\mu'_{j+1}(x). (65)
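Eq. (63) is easy to check numerically against a central finite difference (a minimal sketch; the step size and test point are arbitrary choices):

```python
import math

def mu(l, x):
    for _ in range(l):
        x = 1.0 + math.log(x)
    return x

def mu_prime(l, x):
    # closed form of Eq. (63): 1 / prod_{k=0}^{l-1} mu_k(x)
    p = 1.0
    for k in range(l):
        p *= mu(k, x)
    return 1.0 / p

def central_diff(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2.0 * h)

# compare the closed form with a numerical derivative for several nesting depths
for l in range(4):
    assert abs(central_diff(lambda t: mu(l, t), 7.0) - mu_prime(l, 7.0)) < 1e-8
```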
Let us first denote l_{c,d}(x) = \log_{(c,d)}(x) + r. Then the derivatives of \log_{(c,d)} can be expressed as

\log'_{(c,d)}(x) = \frac{l_{c,d}(x)}{x\,(dr + (1-cr)\log x)}\left[d + c(1-cr)\log x\right], (66)

\log''_{(c,d)}(x) = \frac{l_{c,d}(x)}{x^2\,(dr + (1-cr)\log x)^2}\Big[d\left(d - dr - (cr-1)^2\right) + d\left(c^2 r(r-2) + 2c - 1\right)\log x + (c-1)c(cr-1)^2\log^2 x\Big], (67)

\log'''_{(c,d)}(x) = \frac{l_{c,d}(x)}{x^3\,(dr + (1-cr)\log x)^3}\Big[d\left(3d(r-1)(cr-1)^2 - 2(cr-1)^3 + d^2(2r^2 - 3r + 1)\right) + d(cr-1)\left(3c^3r^2 - 3c^2r(r+2) + c\left(6r + 3 - d(2r^2 - 6r + 3)\right) + d(3-4r) - 3\right)\log x - d\left(3c^2(r-1) + c(6-4r) - 2\right)(cr-1)^2\log^2 x - c\left(c^2 - 3c + 2\right)(cr-1)^3\log^3 x\Big]. (68)
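The closed form of Eq. (66) can likewise be checked against a finite difference of the (c,d)-logarithm (a sketch with arbitrary parameter values, assuming d ≠ 0 and a point where the deformation is real-valued):

```python
import math

def log_cd(x, c, d, r=1.0):
    # the (c,d)-logarithm, i.e. the l = 0 case of Eq. (44); assumes d != 0
    return r * (x ** c * (1.0 + (1.0 - c * r) / (d * r) * math.log(x)) ** d - 1.0)

def log_cd_prime(x, c, d, r=1.0):
    # closed form of Eq. (66), with l_{c,d}(x) = log_{(c,d)}(x) + r
    lcd = log_cd(x, c, d, r) + r
    return lcd * (d + c * (1.0 - c * r) * math.log(x)) / (
        x * (d * r + (1.0 - c * r) * math.log(x)))

x, c, d, h = 3.0, 0.6, 0.8, 1e-6
numeric = (log_cd(x + h, c, d) - log_cd(x - h, c, d)) / (2.0 * h)
assert abs(numeric - log_cd_prime(x, c, d)) < 1e-6
```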
In the asymptotic limit, only the dominant contributions are relevant. Thus, keeping only the dominant scaling c (i.e., setting d = 0), we get

\log'_{(l;c,d)}(x) = \log'_{(c,d)}(\mu_l(x))\,\mu'_l(x) \approx (\mu_l(x))^{c-1}\,\mu'_l(x), (69)

\log''_{(l;c,d)}(x) \approx \log'_{(c,d)}(\mu_l(x))\,\mu''_l(x) \approx -\frac{(\mu_l(x))^{c-1}\,\mu'_l(x)}{x}, (70)

\log'''_{(l;c,d)}(x) \approx \log'_{(c,d)}(\mu_l(x))\,\mu'''_l(x) \approx \frac{(\mu_l(x))^{c-1}\,\mu'_l(x)}{x^2}. (71)
Plugging these into Eqs. (33) and (34), we get

r^A_{(l;c)} \approx \frac{\left(\mu_l^{c-1}(x)\,\mu'_l(x)\right)^2\left(\frac{\mu_l^{c-1}(x)\,\mu'_l(x)}{x}\right)^3}{\left(\mu_l^{c-1}(x)\,\mu'_l(x)\,\frac{\mu_l^{c-1}(x)\,\mu'_l(x)}{x^2} - 3\left(\frac{\mu_l^{c-1}(x)\,\mu'_l(x)}{x}\right)^2\right)^2} \propto x\,\mu_l^{c-1}(x)\,\mu'_l(x), (72)

and

r^N_{(l;c)} \approx \frac{\left(\mu_l^{c-1}(x)\,\mu'_l(x)\right)^3}{x\left(\frac{\mu_l^{c-1}(x)\,\mu'_l(x)}{x}\right)^2} \approx x\,\mu_l^{c-1}(x)\,\mu'_l(x). (73)
Let us now focus on the situation l = 1, c = 1. In this case the leading-order terms cancel and we have to look at the first correction, given by the scaling exponent d. In this case

\log'_{(1;1,d)}(x) \approx \frac{\left[\log(1+\log x)\right]^d}{x}, (74)

\log''_{(1;1,d)}(x) \approx -\frac{\left[\log(1+\log x)\right]^d}{x^2}, (75)

\log'''_{(1;1,d)}(x) \approx \frac{\left[\log(1+\log x)\right]^d}{x^3}. (76)

Thus, the curvature of both the Amari and the Naudts type can be asymptotically expressed as

r_{(1;1,d)} \approx \left[\log(1+\log x)\right]^d. (77)
C Fisher metric and scalar curvature corresponding to general
logarithm
Let us now show the full calculation of the scalar curvature corresponding to the \Lambda_D-logarithm with arbitrary scaling vector D and constants r_j. Let us first recall the product rule for higher derivatives. The first three derivatives of the function \Lambda_D(x) = R\left(\prod_{j=0}^n \lambda_j(x) - 1\right) are:

\Lambda'_D(x) = R\prod_{j=0}^n\lambda_j(x)\sum_{j=0}^n\frac{\lambda'_j(x)}{\lambda_j(x)}, (78)

\Lambda''_D(x) = R\prod_{j=0}^n\lambda_j(x)\left[2\sum_{i<j}\frac{\lambda'_i(x)\lambda'_j(x)}{\lambda_i(x)\lambda_j(x)} + \sum_{j=0}^n\frac{\lambda''_j(x)}{\lambda_j(x)}\right], (79)

\Lambda'''_D(x) = R\prod_{j=0}^n\lambda_j(x)\left[6\sum_{i<j<k}\frac{\lambda'_i(x)\lambda'_j(x)\lambda'_k(x)}{\lambda_i(x)\lambda_j(x)\lambda_k(x)} + 3\sum_{i<j}\frac{\lambda''_i(x)\lambda'_j(x) + \lambda'_i(x)\lambda''_j(x)}{\lambda_i(x)\lambda_j(x)} + \sum_i\frac{\lambda'''_i(x)}{\lambda_i(x)}\right]. (80)
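The product rule of Eq. (79) can be verified on a toy pair of factors; the factors λ_0(x) = x², λ_1(x) = e^x and the constant R = 2 are arbitrary choices made only for this illustration:

```python
import math

# toy factors: lambda_0(x) = x^2, lambda_1(x) = exp(x); R = 2 is an arbitrary choice
R = 2.0
lam  = [lambda x: x ** 2,  lambda x: math.exp(x)]
lam1 = [lambda x: 2.0 * x, lambda x: math.exp(x)]   # first derivatives
lam2 = [lambda x: 2.0,     lambda x: math.exp(x)]   # second derivatives

def Lambda(x):
    # Lambda_D(x) = R (prod_j lambda_j(x) - 1)
    return R * (lam[0](x) * lam[1](x) - 1.0)

def Lambda2_formula(x):
    # Eq. (79): R prod_j lambda_j [ 2 sum_{i<j} l'_i l'_j/(l_i l_j) + sum_j l''_j/l_j ]
    prod = lam[0](x) * lam[1](x)
    cross = 2.0 * lam1[0](x) * lam1[1](x) / (lam[0](x) * lam[1](x))
    diag = sum(l2(x) / l(x) for l2, l in zip(lam2, lam))
    return R * prod * (cross + diag)

# compare against a numerical second derivative
x, h = 1.3, 1e-4
numeric = (Lambda(x + h) - 2.0 * Lambda(x) + Lambda(x - h)) / h ** 2
assert abs(numeric - Lambda2_formula(x)) < 1e-4
```

For this choice of factors the formula reduces to the textbook result d²/dx²[x²eˣ] = eˣ(x² + 4x + 2), times R.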
The derivatives of \lambda_j can be expressed by defining the function

L_j(x) = \frac{1}{\left(1 + r_j\log\mu_{j+l-1}(x)\right)\prod_{k=0}^{j+l-1}\mu_k(x)} = \frac{\mu'_{j+l}(x)}{1 + r_j\log\mu_{j+l-1}(x)}. (81)

Then we can express

\lambda'_j(x) = \lambda_j(x)\left(r_j d_j L_j(x)\right), (82)

\lambda''_j(x) = \lambda_j(x)\left(r_j^2 d_j^2 L_j^2(x) + r_j d_j L'_j(x)\right), (83)

\lambda'''_j(x) = \lambda_j(x)\left(r_j^3 d_j^3 L_j^3(x) + 3 r_j^2 d_j^2 L'_j(x) L_j(x) + r_j d_j L''_j(x)\right). (84)
The derivatives of L_j(x) can be expressed as

L'_j(x) = -L_j^2(x)\left[r_j + \left(1 + r_j\log\mu_{j+l-1}(x)\right)\sum_{k=0}^{j+l-1}\prod_{m=k+1}^{j+l-1}\mu_m(x)\right], (85)

L''_j(x) = 2L_j^3(x)\left[r_j + \left(1 + r_j\log\mu_{j+l-1}(x)\right)\sum_{k=0}^{j+l-1}\prod_{m=k+1}^{j+l-1}\mu_m(x)\right]^2 - L_j^2(x)\left[r_j\sum_{k=0}^{j+l-1}\prod_{m=k+1}^{j+l-1}\mu_m(x) + \left(1 + r_j\log\mu_{j+l-1}(x)\right)\sum_{k=0}^{j+l-1}\sum_{m=k+1}^{j+l-1}\frac{\prod_{p=k+1}^{j+l-1}\mu_p(x)}{\prod_{p'=0}^{m}\mu_{p'}(x)}\right]. (86)
We can finally rewrite the derivatives of \Lambda_D as

\frac{d}{dx}\Lambda_D(x) = R\prod_{j=0}^n\lambda_j(x)\sum_{j=0}^n C^1_j(x)\,L_j(x), (87)

\frac{d^2}{dx^2}\Lambda_D(x) = R\prod_{j=0}^n\lambda_j(x)\sum_{i=0}^n\sum_{j=0}^n C^2_{ij}(x)\,L_i(x)L_j(x), (88)

\frac{d^3}{dx^3}\Lambda_D(x) = R\prod_{j=0}^n\lambda_j(x)\sum_{i=0}^n\sum_{j=0}^n\sum_{k=0}^n C^3_{ijk}(x)\,L_i(x)L_j(x)L_k(x), (89)

where the coefficients C can be expressed as

C^1_i(x) = r_i d_i, (90)

C^2_{ij}(x) = r_i d_i\left[r_j d_j - \delta_{ij}\,A_j(x)\right], (91)
C^3_{ijk}(x) = r_i d_i\left[r_j d_j r_k d_k - \left(\delta_{ij}\, r_j d_j A_j(x) + \delta_{ik}\, r_k d_k A_k(x) + \delta_{jk}\, r_j d_j A_j(x)\right) - \delta_{ijk}\, B_i(x)\right], (92)
where

A_i(x) = r_i + \left(1 + r_i\log\mu_{i+l-1}(x)\right)\sum_{k=0}^{i+l-1}\prod_{m=k+1}^{i+l-1}\mu_m(x), (93)

B_i(x) = 2A_i(x)^2 + L_i(x)\left[r_i\sum_{k=0}^{i+l-1}\prod_{m=k+1}^{i+l-1}\mu_m(x) + \left(1 + r_i\log\mu_{i+l-1}(x)\right)\sum_{k=0}^{i+l-1}\sum_{m=k+1}^{i+l-1}\frac{\prod_{p=k+1}^{i+l-1}\mu_p(x)}{\prod_{p'=0}^{m}\mu_{p'}(x)}\right]. (94)
Finally, we plug the expressions for \Lambda_D and its derivatives into Eqs. (33) and (34), and we end up with

r^A_D(x) = \frac{\left(\sum_i C^1_i(x)\, L_i(x)\right)^2\left(\sum_{kl} C^2_{kl}(x)\, L_k(x) L_l(x)\right)^3}{\left(\sum_{ijkl}\left[C^1_i(x)\, C^3_{jkl}(x) - 3\, C^2_{ij}(x)\, C^2_{kl}(x)\right] L_i(x) L_j(x) L_k(x) L_l(x)\right)^2}, (95)

and

r^N_D(x) = \frac{\left(\sum_i C^1_i(x)\, L_i(x)\right)^3}{x\left(\sum_{ij} C^2_{ij}(x)\, L_i(x) L_j(x)\right)^2}, (96)

both evaluated at x = 1/W, respectively.