EPJ manuscript No.
(will be inserted by the editor)
Information geometry of scaling expansions of non-exponentially growing configuration spaces

Jan Korbel 1,2,a, Rudolf Hanel 1,2,b, and Stefan Thurner 1,2,3,4,c

1 Section for Science of Complex Systems, CeMSIIS, Medical University of Vienna, Spitalgasse 23, 1090 Vienna, Austria
2 Complexity Science Hub Vienna, Josefstädter Strasse 39, 1080 Vienna, Austria
3 Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
4 IIASA, Schlossplatz 1, 2361 Laxenburg, Austria
Abstract. Many stochastic complex systems are characterized by the fact that their configuration space does not grow exponentially as a function of the degrees of freedom. The use of scaling expansions is a natural way to measure the asymptotic growth of the configuration space volume in terms of the scaling exponents of the system. These scaling exponents can, in turn, be used to define universality classes that uniquely determine the statistics of a system. Every system belongs to one of these classes. Here we derive the information geometry of scaling expansions of sample spaces. In particular, we present the deformed logarithms and the metric in a systematic and coherent way. We observe a phase transition for the curvature. The phase transition can be well measured by the characteristic length r, corresponding to a ball with radius 2r having the same curvature as the statistical manifold. An increasing characteristic length with respect to the size of the system is associated with sub-exponential sample space growth, which is characteristic of strongly constrained and correlated complex systems. A decreasing characteristic length corresponds to super-exponential sample space growth, which occurs for example in systems that develop structure as they evolve. Constant curvature means exponential sample space growth, which is associated with multinomial statistics, where traditional Boltzmann-Gibbs, or Shannon, statistics applies. This allows us to characterize transitions between statistical manifolds corresponding to different families of probability distributions.
1 Introduction

Statistical physics of complex systems has turned into an increasingly important topic with many applications. Its main aim is to come up with a unified approach to
a e-mail: jan.korbel@meduniwien.ac.at
b e-mail: rudolf.hanel@meduniwien.ac.at
c e-mail: stefan.thurner@meduniwien.ac.at
arXiv:2001.06393v1 [cond-mat.stat-mech] 17 Jan 2020
understand, describe, and predict the statistical properties of a plethora of different complex systems; see e.g., [1] for an overview. While the microscopic nature of complex systems can be very different, their statistical properties often have common features across various systems. Entropy is undoubtedly the key concept in statistical physics that connects the statistical description of microscopic dynamics with the macroscopic thermodynamic properties of a system. The notion of entropy has also been adopted in other contexts, such as information theory or statistical inference, which are concepts quite different from thermodynamics [2]. One elegant and powerful concept arising from the theory of statistical inference is that of information geometry [3,4]. It applies ideas from differential geometry to probability theory and statistics. In this context, the concept of entropy also plays a crucial role, since the metric on the statistical manifold is derived from the corresponding (relative) entropy. This so-called Fisher-Rao metric enables us to analyze statistical systems from a different perspective. For example, one can study critical transitions by calculating singularities of the metric [5].
In information geometry, most attention has focused on systems that are governed by Shannon entropy [3,4]. However, it is well known that many complex systems, especially strongly correlated or constrained systems, or systems with emergent components, cannot be described within the framework of Shannon entropy [1]. For this reason, a number of generalizations of Shannon entropy have been proposed; in connection with power laws [6,7], special relativity [8], multifractal thermodynamics [9], or black holes [10,11].
To classify entropies for stochastic systems of various kinds, it is natural to use the Shannon-Khinchin (SK) axioms [12,13]. The first three SK axioms are usually formulated as:

(SK1) Entropy is a continuous function of the probabilities $p_i$ only¹.
(SK2) Entropy is maximal for the uniform distribution, $p_i = 1/W$.
(SK3) Adding a state $W+1$ to a system with $p_{W+1} = 0$ does not change the entropy of the system.

The fourth axiom is called the composability axiom and determines the entropy functional uniquely:

(SK4) $H(A+B) = H(A) + H(B|A)$, where $H(B|A) = \sum_k p_k^A\, H(B|A_k)$,

where $H(B|A_k)$ is the entropy of the conditional probability, $p_{B|A_k}$. In this formulation, the unique solution that is compatible with SK1-4 is Shannon entropy, $H(P) = -\sum_i p_i \log p_i$. When the fourth axiom is relaxed, one can obtain a wider class of entropic functionals. First generalizations of the fourth axiom were introduced in connection with generalized additivity [14,15], group laws [16], or statistical inference [17]. These approaches are somewhat limited in scope, since they all lead to a class of entropies that can be expressed as a function of Tsallis entropy [6].
The relaxation of SK4 also naturally leads to a classification scheme of complex systems [18,19]. The main idea of this approach is to study the asymptotic scaling exponents of the entropy functional that are associated with a particular system's configuration space. Complex systems are typically associated with a sub-exponentially growing configuration space, when seen as a function of the degrees of freedom. This classification scheme is based on a mathematical analysis of the asymptotic scaling of the entropic functionals that are governed by the first three SK axioms².

¹ In several cases, entropies incorporate an external parameter, such as $q$ for Tsallis entropy or $c$ and $d$ for $(c,d)$-entropies. However, these parameters are constants that characterize the universality class of the process. They are not parameters subject to variation in entropy maximization.
Since the configuration space of most complex systems does not grow exponentially (as in the case of Shannon entropy), but polynomially [7], as a stretched exponential [20], or even super-exponentially [21], the appropriate scaling behavior of the entropic functional is crucial for a proper thermodynamic interpretation. To this end, we use the recently developed scaling expansion [22], which is a special case of the Poincaré asymptotic series [23], whose coefficients are the scaling exponents of the system.
The aim of this paper is to define a generalization of Shannon entropy that matches the appropriate asymptotic scaling of a given system, and to use it to derive the associated generalized Fisher-Rao metric of the underlying statistical manifold. To this end, we use the framework of deformed logarithms [35,25]. It has been shown recently [26] that one can naturally obtain two types of information metric within that framework: one corresponding to the maximum entropy principle with linear constraints, and the other corresponding to the maximum entropy principle when used with so-called escort constraints, instead of ordinary (linear) constraint terms.
Escort distributions appeared in connection with chaotic systems [27], and were discussed in the context of superstatistics [28,29]. Later it became possible to relate them to linear constraints through a log-duality [30]. Interestingly, escort distributions also appear as canonical coordinates in information geometry [31,32]. In this paper, we use both linear and escort approaches and compare their corresponding metric tensors and their invariants. We focus particularly on the microcanonical ensemble in the thermodynamic limit, since the metric should correspond to the system's asymptotic properties, given by its characteristic structure. Some partial results for the curvature of the escort metric were recently obtained in this direction [34]. However, no systematic and analytically expressible results for the metric tensor and its scalar curvature have been obtained so far. We show that the curvature of the statistical manifold naturally distinguishes between three types of systems: systems with sub-exponentially growing configuration or sample space (correlated and constrained systems), exponentially growing sample space (equivalent to ordinary multinomial statistics), and super-exponentially growing sample space (e.g., systems that develop emergent structures as they evolve). The vector of scaling exponents plays the role of a set of order parameters, i.e., of the distance from the phase transition between the sub-exponential and super-exponential phases.
The paper is organized as follows: Section 2 introduces the scaling expansion and how to calculate the corresponding scaling exponents. We discuss several systems with non-trivial scaling exponents. In the last part of the section we establish a representation of universality classes for complex systems by introducing scaling vectors and their basic operations. In Section 3, we briefly revisit the results of information geometry in the framework of φ-deformed logarithms. We focus on information geometry with both linear and escort constraints. The main results of the paper are derived in Section 4, where we define the appropriate generalized logarithm by combining the φ-deformation framework and the requirement of asymptotic scaling. The properties of the corresponding entropic functionals are discussed. We exemplify the whole approach with the simple, yet very general, class of entropies with one correction term from the scaling expansion and calculate the asymptotic behavior of the scalar curvature of the microcanonical ensemble in the thermodynamic limit. The last section draws conclusions. Several appendices contain the technical details.
² This does not mean that actual distribution functions that are, say, obtained from the maximum entropy principle must be equi-distributed, since the form of the distribution is determined not only by the entropic functional, but also by the constraints.
2 Scaling expansion of the volume of configuration space

The scaling expansion [22] is a method to investigate the asymptotic scaling behavior of a sample space volume, $W(N)$. Here $W$ is the number of accessible states in a system, and $N$ indicates the size of the system³. The scaling expansion is a special case of the Poincaré asymptotic series, where the coefficients correspond to the scaling exponents of the system. We introduce the notation for the iterated use of functions, $f^{(n)}(x) = \underbrace{f(\dots f(f(x))\dots)}_{n\ \text{times}}$, to define a set of re-scaling operations, $r^{(n)}_\lambda(x) = \exp^{(n)}(\lambda \log^{(n)}(x))$. This set of re-scaling operations contains the well-known multiplicative re-scaling, $x \mapsto \lambda x$ ($n = 0$), the power re-scaling, $x \mapsto x^\lambda$ ($n = 1$), and the additive re-scaling, $x \mapsto x + \log\lambda$ ($n = -1$). For each $n$, $r^{(n)}$ is a representation of the multiplicative group $(\mathbb{R}^+, \times)$, i.e., $r^{(n)}_\lambda \circ r^{(n)}_{\lambda'} = r^{(n)}_{\lambda\lambda'}$. We now investigate how a function, $W(N)$, scales under the re-scaling $N \mapsto r^{(n)}_\lambda(N)$. Note that due to a simple theorem (see Appendix A2 in [22]) the function $z(\lambda)$, defined as
$$z(\lambda) = \lim_{N\to\infty} \frac{g\left(r^{(n)}_\lambda(N)\right)}{g(N)}\,,$$
must have the form $z(\lambda) = \lambda^c$ for $c \in \mathbb{R} \cup \{\pm\infty\}$ whenever the limit exists. We start
with multiplicative scaling ($n = 0$): the expression $W(\lambda N)/W(N)$ is, according to the theorem, equal to $\lambda^{c_0}$. We assume that $W(N)$ is a strictly increasing function; it then follows that $c_0 \geq 0$⁴. It can happen that $c_0 = +\infty$. In that case, the expression grows faster than any polynomial. This problem can be resolved by using $\log^{(l)}(W(N))$ instead of $W(N)$, for an appropriate choice of $l$. The parameter $l$ is chosen such that $c^{(l)}_0$, corresponding to $\log^{(l)} W(\lambda N)/\log^{(l)} W(N) \sim \lambda^{c^{(l)}_0}$, is finite. We call $l$ the order of the process. We get that $W(N) \sim \exp^{(l)}(N^{c^{(l)}_0})$, for $N \gg 1$. To get the corrections to the leading order, we use the fact that $\frac{\log^{(l)} W(\lambda N)}{\log^{(l)} W(N)} \cdot \frac{N^{c_0}}{(\lambda N)^{c_0}} \to 1$. When we use the re-scaling for $n = 1$, we get the second scaling exponent: $\frac{\log^{(l)} W(N^\lambda)}{\log^{(l)} W(N)} \cdot \frac{N^{c^{(l)}_0}}{(N^\lambda)^{c^{(l)}_0}} \sim \lambda^{c^{(l)}_1}$. Therefore, $W(N) \sim \exp^{(l)}\!\left(N^{c^{(l)}_0} (\log N)^{c^{(l)}_1}\right)$. One can continue along the same lines to obtain the asymptotic expansion of $W(N)$, which reads
$$W(N) \sim \exp^{(l)}\left(\prod_{j=0}^{n} \left(\log^{(j)} N\right)^{c^{(l)}_j}\right) \quad \text{for } N \to \infty\,, \qquad (1)$$
where the $c^{(l)}_j$ are the characteristic scaling exponents. The scaling expansion of $\log^{(l)} W(N)$ can be written as
$$\log\left(\log^{(l)} W(N)\right) = \sum_{j=0}^{n} c^{(l)}_j \log^{(j+1)} N + o\!\left(\log^{(n+1)}(N)\right)\,. \qquad (2)$$
It can be shown that the scaling exponents can be calculated from $W(N)$ as
$$c^{(l)}_k = \lim_{N\to\infty} \log^{(k)}(N)\left(\log^{(k-1)}(N)\left(\dots \log(N)\left(N\,\frac{\mathrm{d}\log\left(\log^{(l)}(W(N))\right)}{\mathrm{d}N} - c^{(l)}_0\right) - c^{(l)}_1\right)\dots - c^{(l)}_{k-1}\right). \qquad (3)$$

³ For example, think of $N$ as the number of particles in a system, or the number of throws in a coin tossing experiment.
⁴ Details about processes with reducing sample space can be found e.g., in Refs. [40,41,42,43].
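As a hedged numerical sketch of Eq. (3): the leading exponents of order $l = 1$ can be extracted by finite differences. The function $\log W(N) = \sqrt{N}\log N$ below is an illustrative assumption of ours (so that $c_0 = 1/2$, $c_1 = 1$), not one of the paper's models.

```python
import math

# Numerical sketch of Eq. (3) for order l = 1: estimate c0 and c1 from
# log W(N). Illustrative assumption: log W(N) = N^(1/2) * log(N),
# i.e. W ~ exp(N^(1/2) (log N)^1), so c0 = 1/2 and c1 = 1.
def log2W(N, logW):
    """log^(2) W = log(log^(1) W)."""
    return math.log(logW(N))

def estimate_c0(logW, N):
    h = N * 1e-6                       # step scaled to N for stable differences
    d = (log2W(N + h, logW) - log2W(N - h, logW)) / (2 * h)
    return N * d                       # -> c0 + c1/log N + ... as N grows

logW = lambda N: math.sqrt(N) * math.log(N)
N = 1e12
c0 = estimate_c0(logW, N)              # close to 1/2, up to 1/log N corrections
c1 = math.log(N) * (c0 - 0.5)          # Eq. (3) with the exact c0 subtracted
print(c0, c1)
```

The slow $1/\log N$ convergence of the $c_0$ estimate is exactly why the sub-leading correction term $c_1$ is needed.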
As a next step, we apply the scaling expansion to obtain the corresponding extensive entropy functionals. It is well known that for complex systems (with sub- or super-exponential phase space growth) the Shannon-Boltzmann-Gibbs entropy is not an extensive quantity. To obtain an extensive expression for such systems, one can introduce an appropriate generalization of the entropy functional [1]. A natural way to characterize thermodynamic entropy is to define an entropy functional $S(W)$ which is extensive. This requirement can be expressed for the microcanonical ensemble as $S(W(N)) \sim N$ for $N \to \infty$. For the purpose of thermodynamics, we do not have to require exact extensivity (with equality sign), but only its weaker asymptotic version. We consider the general trace-form entropy functional
$$S(P) = \sum_{i=1}^{W} g(p_i)\,. \qquad (4)$$
The scaling expansion of the extensive entropy in the microcanonical ensemble can be expressed as
$$S(W) \sim \prod_{j=0}^{n} \left[\log^{(j+l)}(W(N))\right]^{d^{(l)}_j} \quad \text{for } N \to \infty\,, \qquad (5)$$
and the scaling expansion of $g(x)$ is
$$g(x) \sim x \prod_{j=0}^{n} \left[\log^{(j+l)}\frac{1}{x}\right]^{d^{(l)}_j}\,. \qquad (6)$$
The scaling coefficients $d^{(l)}_j$ can be obtained from
$$d^{(l)}_k = \lim_{N\to\infty} \log^{(l+k)}(W)\left(\dots \log^{(l+1)}(W)\left(\left(N\,\frac{\mathrm{d}\log\left(\log^{(l)} W(N)\right)}{\mathrm{d}N}\right)^{-1} - d^{(l)}_0\right)\dots - d^{(l)}_{k-1}\right). \qquad (7)$$
The requirement of extensivity determines the relation between the scaling exponents $c^{(l)}_j$ and $d^{(l)}_j$ as
$$d^{(l)}_0 = \frac{1}{c^{(l)}_0}\,, \qquad d^{(l)}_k = -\frac{c^{(l)}_k}{c^{(l)}_0}\,, \quad k = 1, 2, \dots\,. \qquad (8)$$
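Relation (8) can be checked numerically. The sketch below uses the illustrative choice $c_0 = 1/2$, $c_1 = 1$ (so $\log W = N^{1/2}\log N$), under which Eq. (8) gives $d_0 = 2$, $d_1 = -2$, and the resulting entropy indeed grows linearly in $N$ up to slowly vanishing corrections:

```python
import math

# Extensivity check for Eq. (8), order l = 1.
# Illustrative choice: c0 = 1/2, c1 = 1, i.e. log W(N) = N^(1/2) * log N.
# Eq. (8) gives d0 = 1/c0 = 2 and d1 = -c1/c0 = -2, and then
# S(W(N)) = (log W)^d0 * (log log W)^d1 should scale like N.
c0, c1 = 0.5, 1.0
d0, d1 = 1.0 / c0, -c1 / c0

def S_over_N(N):
    logW = N**c0 * math.log(N)**c1
    return logW**d0 * math.log(logW)**d1 / N

ratios = [S_over_N(10.0**k) for k in (8, 12, 16)]
print(ratios)  # slowly approaches a constant (here 4) from below
```

Note that with the wrong sign of $d_1$ (i.e., $+c_1/c_0$) the ratio would diverge like $(\log N)^4$ instead of converging.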
Examples of systems with different scaling exponents. The first example is a random walk (RW) on the discrete one-dimensional lattice with two possible steps: left or right. The space of all possible paths grows exponentially, $W_{\mathrm{RW}}(N) = 2^N \sim \exp(N)$, and we obtain the formula for Boltzmann entropy, $S_{\mathrm{RW}} = \log W_{\mathrm{RW}}$ ($k_B = 1$). Now consider an aging random walk (ARW) [19], where the walker takes one step in a random direction, followed by two steps into a random direction, followed by three steps, etc. In this case, the sample space grows sub-exponentially, $W_{\mathrm{ARW}} \sim 2^{\sqrt{2N}}$, and $S_{\mathrm{ARW}} = (\log W_{\mathrm{ARW}})^2$. The next example is the magnetic coin model (MC) [21], where each coin can be in two states, head or tail; however, two coins can also stick together and create a bond state. It can be shown that the corresponding sample space grows super-exponentially, $W_{\mathrm{MC}} \sim N^{N/2} e^{2\sqrt{N}}$. One can conclude that the corresponding extensive entropy is asymptotically equivalent to $S_{\mathrm{MC}} = \log W_{\mathrm{MC}} / \log\log W_{\mathrm{MC}}$. Another example of super-exponential processes are random networks (RN), whose sample spaces grow as $W_{\mathrm{RN}} = 2^{\binom{N}{2}}$, and thus $S_{\mathrm{RN}} = (\log W_{\mathrm{RN}})^{1/2}$. The final example is the double-exponential growth of the random walk cascade (RWC), where the walker can take a step to the right, to the left, or split into two independent walkers [22]. For this we get that $W_{\mathrm{RWC}} = 2^{2^N - 1}$, and $S_{\mathrm{RWC}} = \log\log W_{\mathrm{RWC}}$. In Fig. 1 we show the parameter space of entropies given by three scaling exponents $(d_0, d_1, d_2)$. The above examples are indicated as points. In Fig. 1 (a) the plane of the first two scaling exponents is shown, as presented in [18]. We see that if one uses only the first two exponents, some super-exponential processes are not properly represented. By adding a third scaling exponent this problem is solved, see Fig. 1 (b). So far, we have not found simple examples that need more than three scaling exponents.

Fig. 1. Parametric space of scaling expansion universality classes with the scaling exponents of the random walk (RW), aging random walk (ARW), magnetic coin model (MC), random networks (RN), random walk cascade (RWC), and processes with compact support distributions (CS). (a) 2D parametric space of scaling expansion universality classes for the first two exponents (as in [18]). We see that some super-exponential systems are not properly represented. (b) Extension to three dimensions by adding the third scaling exponent, $d_2$. All mentioned examples can be described with the first three scaling exponents.
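A quick numerical sketch, using the asymptotic forms quoted above, confirms the extensivity of two of the example entropies:

```python
import math

# Extensivity check for two example models from the text:
# random networks: W_RN = 2^(N choose 2)  =>  S_RN = (log W_RN)^(1/2),
# RW cascade:      W_RWC = 2^(2^N - 1)    =>  S_RWC = log log W_RWC.
# Both S/N ratios approach constants, confirming S ~ N.
def S_RN(N):
    logW = N * (N - 1) / 2 * math.log(2)   # log of 2^(N(N-1)/2)
    return math.sqrt(logW)

def S_RWC(N):
    logW = (2.0**N - 1.0) * math.log(2)    # stays finite in floats for N <= ~1000
    return math.log(logW)

for N in (100, 500):
    print(S_RN(N) / N, S_RWC(N) / N)
```

The limiting constants are $\sqrt{\log 2 / 2}$ and $\log 2$, respectively; a direct Boltzmann entropy $\log W$ would instead grow like $N^2$ and $2^N$ for these two models.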
2.1 Universality classes for scaling expansions

Scaling expansions define universality classes of statistical complex systems according to the set of scaling exponents of their sample space [22]. The representation of the sample space volume, $W(N)$, by its scaling expansion can be used to uniquely describe the statistical properties in the thermodynamic limit. Consider a function $c(x)$ represented by its scaling expansion
$$c(x) \sim \exp^{(l)}\left(\prod_{j=0}^{n} \left[\log^{(j)}(x)\right]^{c^{(l)}_j}\right)\,. \qquad (9)$$
Its scaling exponents can be collected in the scaling vector
$$\mathcal{C} = \{l;\ c^{(l)}_0, c^{(l)}_1, \dots, c^{(l)}_n\}\,. \qquad (10)$$
In principle, the scaling vector can be infinite; however, typically, after several terms the corrections are either zero or do not contribute significantly. The parameter $n$ denotes the number of corrections.
Let $a(x)$ and $b(x)$ be two functions with their respective scaling expansions determined by the two vectors of scaling exponents
$$\mathcal{A} = \{l_a;\ a_0, a_1, \dots, a_n\}\,, \qquad (11)$$
$$\mathcal{B} = \{l_b;\ b_0, b_1, \dots, b_n\}\,. \qquad (12)$$
Without loss of generality, $n$ can be the same for both vectors because one can always append zeros to the shorter vector. We can now define the equivalence relation
$$a(x) \sim b(x) \quad \text{if } \mathcal{A} \equiv \mathcal{B}\,, \qquad (13)$$
as well as a natural ordering
$$a(x) \preceq b(x) \quad \text{if } \mathcal{A} < \mathcal{B}\,, \qquad (14)$$
where the symbol $<$ is used in the lexicographic sense, i.e.,
$$\mathcal{A} < \mathcal{B} \quad \text{if} \quad \begin{cases} l_a < l_b\,,\\ l_a = l_b,\ a_0 < b_0\,,\\ l_a = l_b,\ a_0 = b_0,\ a_1 < b_1\,,\\ \dots \end{cases} \qquad (15)$$
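The lexicographic ordering of Eq. (15) is straightforward to implement. The sketch below (representing a scaling vector as a pair of $l$ and an exponent tuple, an implementation choice of ours) relies on Python's built-in lexicographic tuple comparison:

```python
# Lexicographic ordering of scaling vectors, Eq. (15).
# A scaling vector {l; c0, ..., cn} is represented as (l, (c0, ..., cn));
# shorter exponent tuples are padded with zeros, as described in the text.
def sv_less(A, B):
    la, ca = A
    lb, cb = B
    n = max(len(ca), len(cb))
    ca = tuple(ca) + (0,) * (n - len(ca))
    cb = tuple(cb) + (0,) * (n - len(cb))
    return (la,) + ca < (lb,) + cb      # tuples compare lexicographically

# Aging random walk C = (1; 1/2, 0) grows slower than random walk C = (1; 1):
print(sv_less((1, (0.5, 0)), (1, (1,))))   # True
# The order l dominates: an l = 2 process never precedes an l = 1 process:
print(sv_less((2, (0.1,)), (1, (9, 9))))   # False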
For every vector $\mathcal{C}$ we define the corresponding entropy scaling vector $\mathcal{D}$, denoted by $\mathcal{D} = \mathcal{C}^{-1}$, that is obtained from Eq. (8) by the requirement of extensivity. One can define analogous relations for $\mathcal{D}$ through the relations for the corresponding vectors $\mathcal{C}$. Thus, for entropy scaling vectors $\mathcal{E}$ and $\mathcal{F}$, we can say that
$$\mathcal{E} < \mathcal{F} \quad \text{if} \quad \begin{cases} l_e < l_f\,,\\ l_e = l_f,\ e_0 < f_0\,,\\ l_e = l_f,\ e_0 = f_0,\ e_1 > f_1\,,\\ \dots \end{cases} \qquad (16)$$
Note that for sub-leading scaling exponents the inequality is reversed, which is a result of the minus sign in Eq. (8). Additionally, one can define basic algebraic operations on the scaling vectors, such as a generalized addition or a derivative operator. More details can be found in Appendix A. Let us make an important note. As discussed in [22], the SK axioms set requirements on the admissible set of scaling exponents. From SK2 we get that $d_l \equiv d^{(l)}_0 > 0$, and from SK3 that $d_0 < 1$. Note that the vector $\mathcal{D}$ can also be represented as
$$\mathcal{D} = \{l;\ d^{(l)}_0, d^{(l)}_1, \dots, d^{(l)}_n\} = \{\underbrace{0, \dots, 0}_{l\ \text{times}},\ d_l, d_{l+1}, \dots, d_{l+n}\}\,. \qquad (17)$$
This means that one can use the representation without specifying $l$, with an appropriate number of zeros at the beginning. This is useful, for example, for plots in the parametric space, where it is possible to plot processes of different order $l$ (as e.g., in Fig. 1). However, one has to keep in mind that this representation can be misleading in the sense that the limit $d_l \to 0$ does not have a clear meaning, since it changes the order of the process. This can be nicely seen in the example of Tsallis entropy [6], where
$$\lim_{q\to 1} \frac{\sum_{i=1}^{W} p_i^q - 1}{1-q} = -\sum_{i=1}^{W} p_i \log p_i\,, \qquad (18)$$
which can be formulated in terms of entropy scaling vectors as
$$\lim_{q\to 1^-} \mathcal{D} = \lim_{q\to 1^-} (1-q,\ 0) = (0,\ 1)\,. \qquad (19)$$
Interestingly, the limit from above, $q \to 1^+$, is even more pathological. In this case the scaling vector corresponding to $S_q(P)$ for $q > 1$ is $(0,0)$, because $S_q(N) \sim N^{1-q} + 1 \sim N^0$. These pathologies have their origin in the non-commutativity of the limits, $\lim_{N\to\infty} \lim_{d_l\to 0} \neq \lim_{d_l\to 0} \lim_{N\to\infty}$. The limit $d_l \to 0$ depends on the particular representation of the extensive entropy.
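The functional limit in Eq. (18) itself is perfectly regular and can be illustrated numerically (the distribution below is an arbitrary example of ours):

```python
import math

# Numerical illustration of Eq. (18): Tsallis entropy tends to Shannon
# entropy as q -> 1. The distribution below is an arbitrary example.
def tsallis(p, q):
    return (sum(pi**q for pi in p) - 1.0) / (1.0 - q)

def shannon(p):
    return -sum(pi * math.log(pi) for pi in p)

p = [0.5, 0.25, 0.25]
for q in (0.9, 0.99, 0.999):
    print(q, tsallis(p, q), shannon(p))   # difference shrinks as q -> 1
```

The pathology discussed in the text is thus not in the entropy functional but in the scaling-vector representation, which jumps discontinuously at $q = 1$.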
3 Information geometry of φ-deformations

Information geometry plays a central role in the theory of information as well as in statistical inference. It allows one to study the structure of the statistical manifold by means of differential geometry. We derive the information-geometric properties of the scaling expansion in the framework of φ-deformed logarithms introduced in [35,25]. The φ-deformation is a generalization of the logarithmic function. It can subsequently be used to establish a connection with information theory, where the logarithm plays the role of a natural information measure (Hartley information). The φ-deformed logarithm is defined by a positive, strictly increasing function $\phi(x)$ on $(0, +\infty)$ as
$$\log_\phi(x) = \int_1^x \frac{\mathrm{d}y}{\phi(y)}\,. \qquad (20)$$
Hence, $\log_\phi$ is an increasing concave function with $\log_\phi(1) = 0$. For $\phi(x) = x$, we obtain the ordinary logarithm. Naturally,
$$\frac{\mathrm{d}\log_\phi(x)}{\mathrm{d}x} = \frac{1}{\phi(x)}\,. \qquad (21)$$
The inverse function of $\log_\phi$, the so-called φ-exponential, is an increasing and convex function. This enables one to define the parametric φ-exponential family of probability distributions as
$$p(x;\theta) = \exp_\phi\left(-\Psi(\theta) + \sum_i x_i \theta_i\right)\,, \qquad (22)$$
where the function $\Psi(\theta)$ is called the Massieu function and normalizes the distribution. As discussed in [26], there are two natural ways to make a connection with the theory of information through the maximum entropy principle. The first is based on the maximization of the entropy functional under linear (thermodynamic) constraints, the latter is based on a maximization under so-called escort (or geometric) constraints. Both approaches lead to the φ-exponential family. The former approach defines the φ-deformed entropy as [25]
$$S^N_\phi(p) = -\sum_{i=1}^{W} \int_0^{p_i} \mathrm{d}x\, \log_\phi(x)\,, \qquad (23)$$
which is maximized by the φ-exponential family for linear constraints, i.e., constraints of the type
$$\sum_{i=1}^{W} p_i E_i = \langle E \rangle\,. \qquad (24)$$
In information geometry, escort distributions play the special role of dual coordinates on statistical manifolds [33]. They can be defined by φ-deformations as
$$P^\phi_i = \frac{\phi(p_i)}{\sum_k \phi(p_k)} = \frac{\phi(p_i)}{h_\phi(P)}\,. \qquad (25)$$
It can be shown that the entropy maximized by the φ-exponential family for escort constraints, i.e., for constraints of the type
$$\sum_{i=1}^{W} P^\phi_i E_i = \langle E \rangle_\phi\,, \qquad (26)$$
can be expressed as
$$S^A_\phi(p) = -\sum_{i=1}^{W} P^\phi_i \log_\phi(p_i) = -\frac{\sum_{i=1}^{W} \phi(p_i) \log_\phi(p_i)}{\sum_{j=1}^{W} \phi(p_j)}\,. \qquad (27)$$
Both approaches can be linked to information geometry, i.e., used to derive a generalization of the Fisher information metric, which can be done through a divergence (or relative entropy) of Bregman type, defined as
$$D_f(p\|q) = f(p) - f(q) - \langle \nabla f(q),\ p - q \rangle\,, \qquad (28)$$
where $\langle \cdot, \cdot \rangle$ denotes the inner product. Alternatively, one can use a divergence of Csiszár type, but its information geometry is trivial, because it is conformal to the ordinary Fisher information geometry, see e.g., Refs. [26,35]. Let us consider a parametric family of distributions $p(\theta)$. The Fisher information metric of this family at a point $\theta_0$ can be calculated as
$$g^f_{ij}(\theta) = \left.\frac{\partial^2 D_f(p(\theta_0)\|p(\theta))}{\partial\theta_i\, \partial\theta_j}\right|_{\theta=\theta_0}\,. \qquad (29)$$
Let us consider a discrete probability distribution $\{p_i\}_{i=0}^{n}$. The normalization is given by $\sum_{i=0}^{n} p_i = 1$, so we consider $p_1, \dots, p_n$ as independent variables, while $p_0$ is determined from $p_0 = 1 - \sum_{i=1}^{n} p_i$. We parameterize this probability simplex by a φ-deformed exponential family⁵. For the entropy $S^N_\phi$, we have $f^N_\phi(p) = \sum_i \int_0^{p_i} \log_\phi(x)\, \mathrm{d}x$, while for $S^A_\phi(p)$ we end up with $f(p) = \sum_i P^\phi_i \log_\phi(p_i)$. After a straightforward calculation, we obtain that [26]
$$g^N_{\phi,ij}(P) = \log'_\phi(p_i)\,\delta_{ij} + \log'_\phi(p_0) = \frac{1}{\phi(p_i)}\,\delta_{ij} + \frac{1}{\phi(p_0)}\,, \qquad (30)$$
and
$$g^A_{\phi,ij}(P) = -\frac{1}{h_\phi(p)}\left(\frac{\log''_\phi(p_i)}{\log'_\phi(p_i)}\,\delta_{ij} + \frac{\log''_\phi(p_0)}{\log'_\phi(p_0)}\right) = \frac{1}{h_\phi(p)}\left(\frac{\phi'(p_i)}{\phi(p_i)}\,\delta_{ij} + \frac{\phi'(p_0)}{\phi(p_0)}\right)\,, \qquad (31)$$
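For the undeformed case $\phi(x) = x$, Eq. (30) reduces to the classical Fisher-Rao metric on the simplex; a minimal sketch:

```python
# Sketch of Eq. (30): the Naudts-type metric for phi(x) = x reduces to the
# Fisher-Rao metric on the probability simplex, g_ij = delta_ij/p_i + 1/p_0.
def g_N(p, phi=lambda x: x):
    p0 = 1.0 - sum(p)                   # p lists the independent coordinates
    n = len(p)
    return [[(1.0 / phi(p[i]) if i == j else 0.0) + 1.0 / phi(p0)
             for j in range(n)] for i in range(n)]

G = g_N([0.2, 0.3])                     # p0 = 0.5; prints the 2x2 metric matrix
print(G)
```

Passing any other positive increasing `phi` (e.g., a power $x^q$) gives the corresponding deformed metric with no further changes.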
respectively. As a result, for a given φ-deformation there are two types of metric on the information manifold. Note that it is natural to consider a one-parametric class of affine connections for which one obtains the so-called dually-flat structure, for which the corresponding Christoffel coefficients vanish [33]. This structure is useful in information geometry; however, we stick to the well-known Levi-Civita connection (which can be obtained as a special case of a dually-flat connection, since the Levi-Civita connection is the only self-dual connection [4]), because the metric is non-vanishing. Thus, the corresponding invariants, such as the scalar curvature, are non-trivial and reveal some information about the statistical manifold.

⁵ Note that this parametric family typically constitutes a smooth manifold [33].
Let us now focus on the scalar curvature corresponding to the metric tensor, $R_\phi = g^{ik}_\phi g^{lj}_\phi R_{\phi,ilkj}$, in the thermodynamic limit $N \to \infty$. We focus on the microcanonical ensemble, i.e., we consider $p_i = 1/W$. We assume no prior information about the system or its dynamics, so all states are equally probable. It is possible to show in a technical but straightforward calculation that the scalar curvature is
$$R_\phi(W) = \frac{W(W-1)}{\left(2\, r_\phi(W+1)\right)^2}\,, \qquad (32)$$
which corresponds to the scalar curvature of a $W$-dimensional ball of radius $2 r_\phi$. The function $r_\phi$ depends only on the form of the φ-deformation. We call the function $r_\phi$ the characteristic length. For the case of the Amari metric, it can be expressed as
$$\left(r^A_\phi(W)\right)^2 = -\frac{\left(\log'_\phi\frac{1}{W}\right)^2 \left(\log''_\phi\frac{1}{W}\right)^3}{\left(\log'''_\phi\frac{1}{W}\,\log'_\phi\frac{1}{W} - 3\left(\log''_\phi\frac{1}{W}\right)^2\right)^2}\,, \qquad (33)$$
while for the metric of Naudts type we obtain
$$\left(r^N_\phi(W)\right)^2 = \frac{W \left(\log'_\phi\frac{1}{W}\right)^3}{\left(\log''_\phi\frac{1}{W}\right)^2}\,. \qquad (34)$$
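For the ordinary logarithm the characteristic length of Eq. (34) is constant, reflecting the constant curvature of the Shannon case; a quick sketch:

```python
import math

# Sketch of Eq. (34): for log_phi = log (phi(x) = x) one has
# log'(1/W) = W and log''(1/W) = -W^2, hence
# (r^N)^2 = W * W^3 / W^4 = 1 for every W: constant curvature (Shannon case).
def r_N_sq(W, dlog=lambda x: 1.0 / x, d2log=lambda x: -1.0 / x**2):
    x = 1.0 / W
    return W * dlog(x)**3 / d2log(x)**2

print([r_N_sq(W) for W in (10.0, 100.0, 1000.0)])  # all close to 1
```

Any genuine $W$-dependence of $r_\phi$ therefore signals a departure from multinomial (exponential sample space) statistics.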
4 Information geometry of scaling expansions

Let us now consider an arbitrary φ-deformed logarithm. We show how to introduce a generalization of the logarithm with a given asymptotic scaling. In contrast to φ-deformations, we do not start with the definition of $\phi$, but focus on the definition of the logarithm. We denote the desired logarithmic function by $\Lambda_{\mathcal{D}}$. Let us state the requirements that $\Lambda_{\mathcal{D}}$ should fulfil:

1. Domain: $\Lambda_{\mathcal{D}}: \mathbb{R}^+ \to \mathbb{R}$,
2. Monotonicity: $\Lambda'_{\mathcal{D}}(x) > 0$,
3. Concavity: $\Lambda''_{\mathcal{D}}(x) < 0$,
4. Normalization: $\Lambda'_{\mathcal{D}}(1) = 1$,
5. Self-duality: $\Lambda_{\mathcal{D}}(1/x) = -\Lambda_{\mathcal{D}}(x)$,
6. Scaling expansion: $\Lambda_{\mathcal{D}}(x) \sim \prod_{j=0}^{k} \left[\log^{(j+l)}(x)\right]^{d^{(l)}_j}$ for $x \to \infty$.

The requirements follow the properties of the ordinary logarithm. Particularly convenient is the self-duality requirement, from which we can directly calculate the asymptotic expansion around $0^+$. A direct consequence of self-duality is that $\Lambda_{\mathcal{D}}(1) = 0$. Next, we want to find a representation that is simple, analytically expressible, and universal for any set of scaling exponents. Due to the self-duality requirement, we can focus only on the interval $(1, +\infty)$, while on the interval $(0,1)$ the logarithm is defined by self-duality. To find an appropriate representation, we start from the
scaling expansion itself. Unfortunately, the scaling expansion, $\prod_{j=0}^{k} \left[\log^{(j+l)}(x)\right]^{d^{(l)}_j}$, is not generally defined on the whole interval $(1, \infty)$, since the domain of $\log^{(l)}(x)$ is $(\exp^{(l-2)}(1), \infty)$. We can overcome this issue by adjusting the nested logarithm, replacing $\log \mapsto 1 + \log$. Further, to be able to fulfil the normalization condition, we add a multiplicative constant to the first nesting, so that for each order the corresponding term can be expressed as $\left(1 + r_j \log\left([1+\log]^{(j+l-1)}(x)\right)\right)$. Thus, the generalized logarithm can be expressed as
$$\Lambda_{\mathcal{D}}(x) = R\left(\prod_{j=0}^{n} \left[1 + r_j \log\left([1+\log]^{(j+l-1)}(x)\right)\right]^{d^{(l)}_j} - 1\right)\,. \qquad (35)$$
The logarithm automatically fulfils the condition $\Lambda_{\mathcal{D}}(1) = 0$. The parameters $r_j$ define a set of scale parameters that influence the behavior at finite values, while the asymptotic properties are preserved. Because
$$\Lambda'_{\mathcal{D}}(1) = R \sum_{j=0}^{n} r_j\, d^{(l)}_j\,, \qquad (36)$$
we can obtain the normalization of the derivative in several ways. For this we define the “calibration”
$$r_0 = \rho\, \frac{1 - r \sum_{j=1}^{n} d^{(l)}_j}{r\, d^{(l)}_0}\,, \qquad (37)$$
$$r_k = \rho\,, \quad k = 1, \dots, n\,, \qquad (38)$$
$$R = r/\rho\,, \qquad (39)$$
where $r$ and $\rho$ are free parameters. The parameter $\rho$ can be determined by additional requirements. The first option is to require that $\Lambda_{\mathcal{D}}$ is smooth enough; at least it should have a continuous second derivative. From the second derivative of the self-duality condition together with the normalization condition, we get $\Lambda''_{\mathcal{D}}(1) = -1$. Following a straightforward calculation, we find
$$\Lambda''_{\mathcal{D}}(1) = R\left(2\sum_{i<j} r_i r_j\, d^{(l)}_i d^{(l)}_j + \sum_{j=0}^{n} \left[r_j^2\, d^{(l)}_j \left(d^{(l)}_j - 1\right) - (j+l)\, r_j\, d^{(l)}_j\right]\right)\,. \qquad (40)$$
Using Eq. (37) in Eq. (40), we get an expression for $\rho_C$, i.e., the scale parameter in the smooth calibration,
$$\rho_C = \frac{r \sum_{j=1}^{n} j\, d^{(l)}_j + l - 1}{\dfrac{d^{(l)}_0 - 1}{d^{(l)}_0\, r}\left(1 - r \sum_{j=1}^{n} d^{(l)}_j\right)^2 + (2-r) \sum_{j=1}^{n} d^{(l)}_j - r \left(\sum_{j=1}^{n} d^{(l)}_j\right)^2}\,. \qquad (41)$$
The free parameter $r$ can be used to ensure that $\rho$ is positive. Alternatively, we can simply set $r_0 = 1$, which is useful for several applications. In this case we get that the scale parameter $\rho_L$ in the leading-order calibration is simply
$$\rho_L = \frac{r\, d^{(l)}_0}{1 - r \sum_{j=1}^{n} d^{(l)}_j}\,. \qquad (42)$$
Note that after a proper normalization, this calibration corresponds to the calibration used in [18,19]. Unless a continuous second derivative is explicitly required, it is more convenient to work with this simpler calibration.
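A sketch of the generalized logarithm of Eq. (35) with the leading-order calibration of Eq. (42) (our own direct implementation, valid for $l \geq 1$ only), verifying $\Lambda_{\mathcal{D}}(1) = 0$ and $\Lambda'_{\mathcal{D}}(1) = 1$ numerically:

```python
import math

# Generalized logarithm of Eq. (35) with the leading-order calibration
# r0 = 1, rk = rho_L of Eq. (42), R = r/rho_L. Valid here for l >= 1.
def mu(k, x):                            # nested logarithm [1 + log]^(k)(x)
    for _ in range(k):
        x = 1.0 + math.log(x)
    return x

def Lambda(x, l, d, r=1.0):
    rho = r * d[0] / (1.0 - r * sum(d[1:]))          # Eq. (42)
    rs = [1.0] + [rho] * (len(d) - 1)                # r0 = 1, rk = rho_L
    prod = 1.0
    for j, (rj, dj) in enumerate(zip(rs, d)):
        prod *= (1.0 + rj * math.log(mu(j + l - 1, x))) ** dj
    return (r / rho) * (prod - 1.0)

# For D = {1; 1} the construction recovers the ordinary logarithm:
print(Lambda(5.0, 1, (1.0,)), math.log(5.0))
# The normalization Lambda'(1) = 1 holds for other scaling vectors too,
# e.g. D = {1; 2, -0.5} (an illustrative choice):
h = 1e-6
print((Lambda(1 + h, 1, (2.0, -0.5)) - Lambda(1 - h, 1, (2.0, -0.5))) / (2 * h))
```

The normalization is guaranteed by construction, since $R \sum_j r_j d^{(l)}_j = 1$ for the calibration (42).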
We now turn our attention to the information geometry of $\Lambda_{\mathcal{D}}$-deformations and introduce a notation for the nested logarithm
$$\mu_k(x) = [1 + \log]^{(k)}(x)\,. \qquad (43)$$
We sketch the results for the scaling expansion with one correction. All technical details can be found in Appendix B. In Appendix C we show the calculation for arbitrary scaling vectors and calibrations, which is technically more difficult, but leads to the same results. We now denote the scaling vector as $\mathcal{D} = (l; c, d)$. Note that this entropy has been studied for $l = 0$ in [18]. This inspires us to define the generalized logarithm as
$$\log_{(l;c,d)}(x) = r\left(\mu_l(x)^c \left[1 + \frac{1-cr}{dr}\,\log \mu_l(x)\right]^d - 1\right) = \log_{(c,d)}(\mu_l(x))\,. \qquad (44)$$
This definition corresponds to the choice of $\rho$ in Eq. (42)⁶. The logarithms are depicted in Fig. 2(a) for various scaling exponents. The inverse function, the deformed exponential, can be obtained in terms of the Lambert-W function⁷:
$$\exp_{(l;c,d)}(x) = \nu_l\left(\exp\left(\frac{d}{c}\left[W\!\left(B\,(1 + x/r)^{1/d}\right) - W(B)\right]\right)\right)\,, \qquad (45)$$
where $B = \frac{cr}{1-cr}\exp\left(\frac{cr}{1-cr}\right)$ and $\nu_l$ is the inverse function of $\mu_l$, i.e.,
$$\nu_l(x) = \underbrace{\exp(\exp(\dots(\exp}_{l\ \text{times}}(x-1) - 1)\dots) - 1)\,. \qquad (46)$$
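Eqs. (44)-(45) can be cross-checked numerically: the deformed exponential inverts the deformed logarithm. The sketch below uses an illustrative parameter choice ($l = 1$, $c = d = 1$, $r = 1/2$, so that $cr < 1$) and a simple Newton iteration for the Lambert-W function:

```python
import math

# Round-trip check of Eqs. (44)-(45): exp_(l;c,d)(log_(l;c,d)(x)) = x.
def lambert_w(z, w=0.5):
    for _ in range(200):                 # Newton iteration for w e^w = z, z > 0
        e = math.exp(w)
        w -= (w * e - z) / (e * (1.0 + w))
    return w

def log_lcd(x, l, c, d, r):
    m = x
    for _ in range(l):                   # mu_l(x) = [1 + log]^(l)(x)
        m = 1.0 + math.log(m)
    return r * (m**c * (1.0 + (1.0 - c*r) / (d*r) * math.log(m))**d - 1.0)

def exp_lcd(x, l, c, d, r):
    a = c * r / (1.0 - c * r)
    B = a * math.exp(a)
    y = math.exp(d / c * (lambert_w(B * (1.0 + x / r)**(1.0 / d)) - lambert_w(B)))
    for _ in range(l):                   # nu_l, the inverse of mu_l
        y = math.exp(y - 1.0)
    return y

l, c, d, r = 1, 1.0, 1.0, 0.5            # illustrative parameters with cr < 1
x0 = 3.0
print(exp_lcd(log_lcd(x0, l, c, d, r), l, c, d, r))   # recovers 3.0
```

For production use, `scipy.special.lambertw` would replace the hand-rolled Newton iteration.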
Note that, depending on the values of c and d, this deformed exponential contains the exponential, power laws, and stretched exponentials as special cases [1]. It is easy to see that the corresponding scaling vector of the exponential is C = (l; 1/c, d/c). The function \phi_{(l;c,d)}(x) can be expressed as

\phi_{(l;c,d)}(x) = \phi_{(0;c,d)}(\mu_l(x))\cdot\mu'_l(x)^{-1} = \frac{\mu_l(x)\left[dr + (1-cr)\log\mu_l(x)\right]}{\left(\log_{(c,d)}(\mu_l(x)) + r\right)\left[d + c(1-cr)\log\mu_l(x)\right]}\prod_{j=0}^{l-1}\mu_j(x). (47)
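The \phi-function of Eq. (47) and the escort distribution built from it can be explored numerically. A minimal sketch, assuming the usual \phi-deformation convention \phi(x) = 1/\log'_\phi(x) (computed here by a finite difference) and restricting to parameter values for which the deformation stays real-valued on (0,1):

```python
import math

def mu(l, x):
    for _ in range(l):
        x = 1.0 + math.log(x)
    return x

def log_lcd(x, l, c, d, r=1.0):
    m = mu(l, x)
    return r * (m ** c * (1.0 + (1.0 - c * r) / (d * r) * math.log(m)) ** d - 1.0)

def phi(x, l, c, d, r=1.0, h=1e-6):
    """phi_{(l;c,d)}(x) = 1 / log'_{(l;c,d)}(x), derivative by central difference."""
    dlog = (log_lcd(x + h, l, c, d, r) - log_lcd(x - h, l, c, d, r)) / (2.0 * h)
    return 1.0 / dlog

def escort(p, l, c, d, r=1.0):
    """Two-event escort distribution rho(p) = phi(p) / (phi(p) + phi(1 - p))."""
    a, b = phi(p, l, c, d, r), phi(1.0 - p, l, c, d, r)
    return a / (a + b)

# the escort of a two-event distribution is itself normalized
assert abs(escort(0.5, 0, 0.7, 0.3, r=1.2) - 0.5) < 1e-9
assert abs(escort(0.35, 0, 0.7, 0.3, r=1.2) + escort(0.65, 0, 0.7, 0.3, r=1.2) - 1.0) < 1e-9
```

Here l = 0 and r = 1.2 are chosen so that the inner bracket of Eq. (44) remains positive over the probed probabilities; these values are illustrative only.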
The escort distribution \rho_{(l;c,d)}(p) = \phi_{(l;c,d)}(p)/(\phi_{(l;c,d)}(p) + \phi_{(l;c,d)}(1-p)), corresponding to the two-event distribution (p, 1-p), is depicted in Fig. 2(b) for various scaling exponents. Interestingly, for D < (1;1), i.e., for entropies corresponding to sub-exponential sample space growth, the distribution emphasizes high probabilities (generally p > 1/N), while for D > (1;1), i.e., for super-exponential growth, it emphasizes low probabilities (p < 1/N). Let us finally show the asymptotic behavior of the curvature that corresponds to the deformed logarithm. It can easily be calculated by keeping only the dominant contributions from each term in Eqs. (33) and (34). In this case we have

\frac{d^n \log_{(l;c)}(x)}{dx^n} \approx (\mu_l(x))^{c-1}\,\mu'_l(x)\, x^{1-n} \quad \text{for } x\to\infty, (48)
and therefore

r_{(l;c)}(W) \approx W\,\mu_l^{c-1}(W)\,\mu'_l(W) \quad \text{for } W\to\infty, (49)
^6 Note that the original (c,d)-logarithm (as it appears in the rightmost part of Eq. (44)) was introduced in [18] for l = 0 and c \mapsto 1-c. That parametrization is, however, less convenient for l > 0.
^7 The Lambert W function is defined as the solution of the equation W(z)e^{W(z)} = z.
Fig. 2. (a) Generalized logarithms corresponding to scaling exponents of the aforementioned models. (b) Escort distributions corresponding to the generalized logarithms. The scaling exponents (l; c, d) for the models are: Random walk (RW): (1; 1, 0), Ageing random walk (ARW): (1; 2, 0), Magnetic coin model (MC): (1; 1, 1), Random network (RN): (1; 1/2, 0),
Fig. 3. Characteristic length corresponding to the curvature of the statistical manifold for the equiprobable distribution, for different scaling exponents. Panel (a) corresponds to the length of Amari type, panel (b) to the length of Naudts type.
for both curvatures, calculated from both types of metric, as shown in Appendix B. From this we deduce that

\lim_{W\to\infty} r(W) = \begin{cases} +\infty, & l = 0, \text{ or } l = 1,\ c > 1,\\ 0, & l \geq 2, \text{ or } l = 1,\ c < 1. \end{cases} (50)
For the case l = 1 and c = 1, we can make a similar approximation,

\frac{d^n \log_{(1;1,d)}(x)}{dx^n} \approx x^{-n}\left[\log(1+\log x)\right]^d, (51)

to get

r_{(1;1,d)}(W) \approx \left[\log(1+\log W)\right]^d \quad \text{for } W\to\infty. (52)
Similar results can be obtained for higher-order corrections (see Appendix C). The behavior of r for different scaling vectors is depicted in Fig. 3. The asymptotic behavior is similar for both types of curvature; differences appear only for small N. In conclusion, we find three distinct regimes for the statistical manifold with respect to the scaling vector:
(I) D < (1; 1): r_D(W) → ∞ for W → ∞,
(II) D = (1; 1): r_D(W) = 1 for W > 0,
(III) D > (1; 1): r_D(W) → 0 for W → ∞.
As a result, the curvature exhibits a phase transition: in the thermodynamic limit, the statistical manifold flattens for sub-exponential processes, has constant sectional curvature for exponential processes, and becomes increasingly curved for super-exponential processes. While processes with exponentially growing sample space have (practically) independent sub-systems, sub-exponential processes impose restrictions and constraints on the sample space. Super-exponential processes are characterized by emergent structures in their sample space. The scaling vector plays the natural role of a set of order parameters. Let us finally note that the limit W → ∞ is performed for the characteristic length r_D(W). The “limit space” obtained from the statistical manifolds as W → ∞ might not be a smooth manifold, and its curvature might not correspond to the limit lim_{W→∞} R_D(W).
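The three regimes can be illustrated with the leading-order characteristic length; a small numerical sketch, assuming the asymptotic form r_{(l;c)}(W) ≈ W μ_l^{c-1}(W) μ'_l(W) discussed around Eqs. (48)-(50), up to constant factors:

```python
import math

def mu(l, x):
    # nested logarithm [1 + log]^(l)(x)
    for _ in range(l):
        x = 1.0 + math.log(x)
    return x

def mu_prime(l, x):
    # mu_l'(x) = 1 / prod_{k=0}^{l-1} mu_k(x), cf. Eq. (63)
    p = 1.0
    for k in range(l):
        p *= mu(k, x)
    return 1.0 / p

def char_length(l, c, W):
    # leading-order characteristic length r_{(l;c)}(W), up to constant factors
    return W * mu(l, W) ** (c - 1.0) * mu_prime(l, W)

# (I) sub-exponential sample space growth (here l = 1, c > 1): the length grows with W
assert char_length(1, 2.0, 1e6) > char_length(1, 2.0, 1e3)
# (II) exponential growth (l = 1, c = 1): the length stays constant (= 1)
assert abs(char_length(1, 1.0, 1e6) - 1.0) < 1e-12
# (III) super-exponential growth (here l = 2): the length shrinks with W
assert char_length(2, 1.0, 1e6) < char_length(2, 1.0, 1e3)
```

For l = 1, c = 1 the expression reduces identically to 1, matching regime (II) above.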
5 Conclusions and Perspectives
In this paper, we have deﬁned a class of deformed logarithms with a given scaling
expansion in the framework of φ-deformed logarithms. The corresponding entropy
can be used to deﬁne the statistical manifold with generalized Fisher-Rao metric. We
have shown that for the microcanonical ensemble in the thermodynamic limit, the
scalar curvature exhibits a phase transition where the critical point is represented by
the class of phenomena that are characterized by exponentially growing phase spaces.
These include weakly interacting systems that are correctly described by Shannon
entropy. The scaling vector of a given system naturally deﬁnes a set of order param-
eters. A possible explanation for this phenomenon is that the number of independent
degrees of freedom grows slower than the size of the system for sub-exponential pro-
cesses and faster for the super-exponential processes. This classiﬁcation, however, does
not appear for the case of the Fisher metric of Csisz´ar type, since the characteristic
length is constant for every φ-deformation.
Contrary to the common approach in information geometry, where the statistical manifold corresponds to a single functional family of distributions (e.g., the exponential family), this paper presents a parametric way to switch between different functional families of distributions (e.g., from power laws to stretched exponentials). This opens a novel connection between parametric and non-parametric information geometry and enables the classification of different types of statistical manifolds related to various classes of deformed exponential families.
It will be natural to extend these results to generalizations of the Bregman divergence enabling gauge invariance [44]. Moreover, we will focus on applying the results to the canonical ensemble and on using the well-known results on the Fisher information metric on the thermodynamic manifold [45,46] in the case of complex systems, where a generalized form of the Boltzmann factor is needed [47]. Finally, it should also be possible to go beyond equilibrium statistical mechanics and extend the generalized Fisher metric to non-equilibrium scenarios [48].
Acknowledgements
We acknowledge support from the Austrian Science Fund FWF project I 3073.
References
1. S. Thurner, R. Hanel and P. Klimek, Introduction to the Theory of Complex Systems.
Oxford University Press: Oxford, UK (2018).
2. S. Thurner, B. Corominas-Murtra and R. Hanel, Three faces of entropy for complex
systems: Information, thermodynamics, and the maximum entropy principle. Phys.
Rev. E 96 (2017) 032124.
3. N. Ay, J. Jost, H.V. Le, L. Schwachhöfer, Information Geometry. Springer: Berlin, Germany (2017).
4. S.-I. Amari, Information Geometry and Its Applications. Springer, Japan, (2016).
5. W. Janke, D.A. Johnston and R. Kenna, Information geometry and phase transitions.
Physica A 336 (2004) 181–186.
6. C. Tsallis, Possible Generalization of Boltzmann-Gibbs Statistics. Journal of Statistical
Physics 52 (1988) 479-487.
7. C. Tsallis, M. Gell-Mann and Y. Sato, Asymptotically scale-invariant occupancy of phase space makes the entropy S_q extensive. Proc. Natl. Acad. Sci. USA 102 (2005) 15377–15382.
8. G. Kaniadakis, Statistical mechanics in the context of special relativity. Phys. Rev. E
66 (2002) 056125.
9. P. Jizba and T. Arimitsu, The world according to Rényi: Thermodynamics of multifractal systems. Ann. Phys. 312 (2004) 17–59.
10. C. Tsallis and L.J. Cirto, Black hole thermodynamical entropy. Eur. Phys. J. C 73
(2013) 2487.
11. T. S. Biró, V. G. Czinner, H. Iguchi and P. Ván, Black hole horizons can hide positive heat capacity. Phys. Lett. B 782 (2018) 228–231.
12. C.E. Shannon, A mathematical theory of communication, Bell Syst. Tech. J. 27 (1948)
379-423, 623-656.
13. A.I. Khinchin, Mathematical Foundations of Information Theory. Dover, New York
(1957).
14. S. Abe, Axioms and uniqueness theorem for Tsallis entropy. Physics Letters A 271(1-2) (2000) 74-79.
15. V. M. Ilić and M. S. Stanković, Generalized Shannon–Khinchin axioms and uniqueness theorem for pseudo-additive entropies. Physica A 411 (2014) 138-145.
16. P. Tempesta, Group entropies, correlation laws, and zeta functions. Phys. Rev. E 84
(2011) 021121.
17. P. Jizba and J. Korbel, Maximum Entropy Principle in Statistical Inference: Case for
Non-Shannonian Entropies. Phys. Rev. Lett. 122 (2019) 120601.
18. R. Hanel and S. Thurner, A comprehensive classiﬁcation of complex statistical systems
and an axiomatic derivation of their entropy and distribution functions. Europhys.
Lett. 93 (2011) 20006.
19. R. Hanel and S. Thurner, When do generalized entropies apply? How phase space
volume determines entropy. Europhys. Lett. 96 (2011) 50003.
20. C. Anteneodo and A. R. Plastino. Maximum entropy approach to stretched exponential
probability distributions. J. Phys. A 32 (1999) 1089.
21. H.J. Jensen, R.H. Pazuki, G. Pruessner and P. Tempesta, Statistical mechanics of
exploding phase spaces: Ontic open systems. J. Phys. A 51 (2018) 375002.
22. J. Korbel, R. Hanel and S. Thurner, Classiﬁcation of complex systems by their sample-
space scaling exponents. New J. Phys. 20, 2018, 093007.
23. E. T. Copson, Asymptotic Expansions, Cambridge Tracts in Mathematics, Cambridge
University Press (1965).
24. J. Naudts, Deformed exponentials and logarithms in generalized thermostatistics. Physica A 316 (2002) 323–334.
25. J. Naudts, Generalised thermostatistics. Springer Science & Business Media (2011).
26. J. Korbel, R. Hanel and S. Thurner, Information Geometric Duality of φ-Deformed
Exponential Families. Entropy 21 (2019) 112.
27. C. Beck and F. Schlögl, Thermodynamics of chaotic systems: an introduction. Cambridge University Press (1995).
28. C. Beck and E.D.G. Cohen, Superstatistics. Physica A 322 (2003) 267–275.
29. C. Tsallis and A.M.C. Souza, Constructing a statistical mechanics for Beck-Cohen
superstatistics. Phys. Rev. E 67 (2003) 026106.
30. R. Hanel, S. Thurner and M. Gell-Mann, Generalized entropies and logarithms and their duality relations. Proc. Natl. Acad. Sci. USA 109 (2012) 19151–19154.
31. S. Abe, Geometry of escort distributions. Phys. Rev. E 68 (2003) 031101.
32. A. Ohara, H. Matsuzoe and S.-I. Amari, A dually ﬂat structure on the space of escort
distributions. J. Phys. Conf. Ser. 201 (2010) 012012.
33. S.-I. Amari, A. Ohara and H. Matsuzoe, Geometry of deformed exponential families: Invariant, dually-flat and conformal geometries. Physica A 391 (2012) 4308–4319.
34. D. P. K. Ghikas and F. D. Oikonomou, Towards an information geometric characteri-
zation/classiﬁcation of complex systems. I. Use of generalized entropies. Physica A 496
(2018) 384.
35. J. Naudts, Continuity of a class of entropies and relative entropies. Rev. Math. Phys. 16 (2004) 809–822.
36. F. Caruso and C. Tsallis, Nonadditive entropy reconciles the area law in quantum
systems with classical thermodynamics. Phys. Rev. E 78 (2008) 021102.
37. J. A. Carrasco, F. Finkel, A. González-López, M. A. Rodríguez and P. Tempesta, Generalized isotropic Lipkin–Meshkov–Glick models: ground state entanglement and quantum entropies. J. Stat. Mech. Theor. Exp. 2016(3) (2016) 033114.
38. J. Zhang, Divergence function, duality, and convex analysis. Neural Computation, 16(1)
(2004) 159-195.
39. S.-I. Amari, Diﬀerential-Geometrical Methods in Statistics. Lecture Notes in Statistics
Vol. 28 (2012) Springer Science & Business Media.
40. B. Corominas-Murtra, R. Hanel, S. Thurner, Understanding scaling through history-
dependent processes with collapsing sample space. Proc. Nat. Acad. Sci. USA 112
(2015) 5348.
41. B. Corominas-Murtra, R. Hanel and S. Thurner, Extreme robustness of scaling in
sample space reducing processes explains Zipf’s law in diﬀusion on directed networks.
New J. Phys. 18 (2016) 093010.
42. B. Corominas-Murtra, R. Hanel and S. Thurner, Sample space reducing cascading
processes produce the full spectrum of scaling exponents. Scientiﬁc Reports 7 (2017)
11223.
43. R. Hanel and S. Thurner, Maximum conﬁguration principles for driven systems with
arbitrary driving, Entropy 20 (2018) 838.
44. J. Naudts and J. Zhang, Rho–tau embedding and gauge freedom in information geometry. Information Geometry 1(1) (2018) 79-115.
45. G. Ruppeiner, Riemannian geometry in thermodynamic ﬂuctuation theory. Rev. Mod.
Phys. 67 (1995) 605.
46. G. E. Crooks, Measuring Thermodynamic Length. Phys. Rev. Lett. 99 (2007) 100602.
47. R. Hanel and S. Thurner, Derivation of power-law distributions within standard sta-
tistical mechanics. Physica A 351(2-4) (2005) 260-268.
48. S. Ito, Stochastic Thermodynamic Interpretation of Information Geometry. Phys. Rev.
Lett. 121 (2018) 030605.
A Basic algebra of scaling vectors
Let us discuss some definitions of ordinary operations on the space of scaling exponents. First, let us introduce the truncated vector of the scaling vector defined in Eq. (10) as

C_k = \{l;\, c^{(l)}_0, c^{(l)}_1, \ldots, c^{(l)}_k\}, (53)
where k ≤ n. Then we can introduce:
– Truncated equivalence relation: a(x) ≡^{(k)} b(x) if A_k ≡ B_k.
– Truncated inequality relation: a(x) ≺^{(k)} b(x) if A_k < B_k.
Let us also add one more inequality relation, for the case when even the orders l are not equal. For this we define:
– Strong inequality relation: a(x) ≺≺ b(x) if l_a < l_b.
Let us investigate representations of basic operations on the space of scaling exponents. Before that, let us define the rescaling of a general operator O : R^m → R as

O^{(l)}(x_1, x_2, \ldots, x_m) = \exp^{(l)}\left[O\left(\log^{(l)} x_1, \log^{(l)} x_2, \ldots, \log^{(l)} x_m\right)\right]. (54)

Let us now denote the generalized addition as a(x) ⊕^{(l)} b(x) and the generalized multiplication as a(x) ⊗^{(l)} b(x). It is easy to show that

a(x) ⊗^{(l)} b(x) = a(x) ⊕^{(l+1)} b(x). (55)
Let us now consider, without loss of generality, that a(x) ≼ b(x). The scaling vector C of c(x) = a(x) ⊕^{(l)} b(x) can be expressed as follows:

C = \begin{cases} A + B = (l;\, a_0+b_0, a_1+b_1, \ldots), & \text{for } l_a = l_b = l;\\ B, & \text{for } l < l_a \leq l_b \text{ or } l = l_a < l_b;\\ \text{undefined}, & \text{for } l > l_a. \end{cases} (56)
The scaling vector C of the generalized composition c(x) = \exp^{(l)} b(\log^{(l)} a(x)) can be expressed as

C = \begin{cases} b^{(l_b)}_0 \odot A = (l_a + l_b;\, a_0 b_0, a_1 b_0, a_2 b_0, \ldots, a_n b_0), & \text{for } l_a = l;\\ 1^{(l_b)} \odot A = (l_a + l_b;\, a_0, a_1, a_2, \ldots, a_n), & \text{for } l < l_a;\\ \text{undefined}, & \text{for } l > l_a. \end{cases} (57)
Finally, let us focus on the derivative of the scaling expansion. Let us denote the rescaled derivative operator as

{}^{(l)}D_x[f] = \exp^{(l)}\left(\frac{d\left(\log^{(l)} f(x)\right)}{dx}\right). (58)

The scaling vector corresponding to the rescaled derivative is

{}^{(l)}A' = \begin{cases} (l_a;\, a_0 - 1, a_1, a_2, \ldots, a_n), & \text{for } l_a = l;\\ A, & \text{for } l_a > l;\\ (l;\, \underbrace{1,\ldots,1}_{l-l_a}, 0, \ldots), & \text{for } l_a < l. \end{cases} (59)
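The bookkeeping of Eqs. (56) and (59) can be sketched as plain tuple manipulations. The encoding of a scaling vector as a pair (order l, coefficient tuple) is a hypothetical representation chosen only for illustration; the "undefined" case is mapped to an exception:

```python
def add_scaling(A, B, l):
    """Scaling vector of a(x) (+)^(l) b(x), following Eq. (56).
    A scaling vector is encoded as a pair (order, coefficient tuple);
    assumes the ordering a(x) <= b(x), i.e. l_a <= l_b."""
    la, ca = A
    lb, cb = B
    if l > la:
        raise ValueError("undefined for l > l_a")
    if la == lb == l:
        n = max(len(ca), len(cb))
        ca = tuple(ca) + (0,) * (n - len(ca))
        cb = tuple(cb) + (0,) * (n - len(cb))
        return (l, tuple(x + y for x, y in zip(ca, cb)))
    return B  # l < l_a <= l_b or l = l_a < l_b: the faster-growing term wins

def derivative_scaling(A, l):
    """Scaling vector of the rescaled derivative, following Eq. (59)."""
    la, ca = A
    if la == l:
        return (la, (ca[0] - 1,) + tuple(ca[1:]))
    if la > l:
        return A
    return (l, (1,) * (l - la) + (0,))

assert add_scaling((1, (1, 2)), (1, (3, 4)), 1) == (1, (4, 6))   # l_a = l_b = l
assert add_scaling((0, (2,)), (1, (3,)), 0) == (1, (3,))          # l = l_a < l_b
assert derivative_scaling((2, (3, 1)), 2) == (2, (2, 1))          # l_a = l
assert derivative_scaling((0, (2,)), 2) == (2, (1, 1, 0))         # l_a < l
```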
B Asymptotic curvature of (l;c, d)-logarithm
In this appendix, we calculate asymptotic properties of the (l;c,d)-logarithm. Let us first express the derivatives of the (l;c,d)-logarithm in terms of the (c,d)-logarithm and \mu_l:

\log'_{(l;c,d)}(x) = \log'_{(c,d)}(\mu_l(x))\,\mu'_l(x), (60)

\log''_{(l;c,d)}(x) = \log''_{(c,d)}(\mu_l(x))\,(\mu'_l(x))^2 + \log'_{(c,d)}(\mu_l(x))\,\mu''_l(x), (61)

\log'''_{(l;c,d)}(x) = \log'''_{(c,d)}(\mu_l(x))\,(\mu'_l(x))^3 + 3\log''_{(c,d)}(\mu_l(x))\,\mu'_l(x)\,\mu''_l(x) + \log'_{(c,d)}(\mu_l(x))\,\mu'''_l(x). (62)
The derivatives of the nested logarithm \mu_l(x) = [1 + \log]^{(l)}(x) can be expressed as:

\mu'_l(x) = \frac{1}{\prod_{k=0}^{l-1}\mu_k(x)}, (63)

\mu''_l(x) = -\mu'_l(x)\sum_{k=0}^{l-1}\mu'_{k+1}(x) = -\frac{1}{\prod_{k=0}^{l-1}\mu_k(x)}\sum_{k=0}^{l-1}\frac{1}{\prod_{j=0}^{k}\mu_j(x)}, (64)
\mu'''_l(x) = -\mu''_l(x)\sum_{k=0}^{l-1}\mu'_{k+1}(x) - \mu'_l(x)\sum_{k=0}^{l-1}\mu''_{k+1}(x) = \mu'_l(x)\left(\sum_{k=0}^{l-1}\mu'_{k+1}(x)\right)^2 + \mu'_l(x)\sum_{k=0}^{l-1}\mu'_{k+1}(x)\sum_{j=0}^{k}\mu'_{j+1}(x). (65)
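Eq. (63) is easy to check numerically against a central finite difference (a minimal sketch; the step size and test point are arbitrary choices):

```python
import math

def mu(l, x):
    for _ in range(l):
        x = 1.0 + math.log(x)
    return x

def mu_prime(l, x):
    # closed form of Eq. (63): 1 / prod_{k=0}^{l-1} mu_k(x)
    p = 1.0
    for k in range(l):
        p *= mu(k, x)
    return 1.0 / p

def central_diff(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2.0 * h)

# compare the closed form with a numerical derivative for several nesting depths
for l in range(4):
    assert abs(central_diff(lambda t: mu(l, t), 7.0) - mu_prime(l, 7.0)) < 1e-8
```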
Let us first denote l_{c,d}(x) = \log_{(c,d)}(x) + r. Then the derivatives of \log_{(c,d)} can be expressed as

\log'_{(c,d)}(x) = \frac{l_{c,d}(x)}{x\,(dr + (1-cr)\log x)}\left[d + c(1-cr)\log x\right], (66)

\log''_{(c,d)}(x) = \frac{l_{c,d}(x)}{x^2\,(dr + (1-cr)\log x)^2}\Big[d\left(d - dr - (cr-1)^2\right) + d\left(c^2 r(r-2) + 2c - 1\right)\log x + (c-1)c(cr-1)^2\log^2 x\Big], (67)

\log'''_{(c,d)}(x) = \frac{l_{c,d}(x)}{x^3\,(dr + (1-cr)\log x)^3}\Big[d\left(3d(r-1)(cr-1)^2 - 2(cr-1)^3 + d^2(2r^2 - 3r + 1)\right) + d(cr-1)\left(3c^3r^2 - 3c^2r(r+2) + c\left(6r + 3 - d(2r^2 - 6r + 3)\right) + d(3-4r) - 3\right)\log x - d\left(3c^2(r-1) + c(6-4r) - 2\right)(cr-1)^2\log^2 x - c\left(c^2 - 3c + 2\right)(cr-1)^3\log^3 x\Big]. (68)
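The closed form of Eq. (66) can likewise be checked against a finite difference of the (c,d)-logarithm (a sketch with arbitrary parameter values, assuming d ≠ 0 and a point where the deformation is real-valued):

```python
import math

def log_cd(x, c, d, r=1.0):
    # the (c,d)-logarithm, i.e. the l = 0 case of Eq. (44); assumes d != 0
    return r * (x ** c * (1.0 + (1.0 - c * r) / (d * r) * math.log(x)) ** d - 1.0)

def log_cd_prime(x, c, d, r=1.0):
    # closed form of Eq. (66), with l_{c,d}(x) = log_{(c,d)}(x) + r
    lcd = log_cd(x, c, d, r) + r
    return lcd * (d + c * (1.0 - c * r) * math.log(x)) / (
        x * (d * r + (1.0 - c * r) * math.log(x)))

x, c, d, h = 3.0, 0.6, 0.8, 1e-6
numeric = (log_cd(x + h, c, d) - log_cd(x - h, c, d)) / (2.0 * h)
assert abs(numeric - log_cd_prime(x, c, d)) < 1e-6
```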
In the asymptotic limit, only the dominant contributions are relevant. Thus, keeping only the dominant scaling c (i.e., setting d = 0), we get

\log'_{(l;c,d)}(x) = \log'_{(c,d)}(\mu_l(x))\,\mu'_l(x) \approx (\mu_l(x))^{c-1}\,\mu'_l(x), (69)

\log''_{(l;c,d)}(x) \approx \log'_{(c,d)}(\mu_l(x))\,\mu''_l(x) \approx -\frac{(\mu_l(x))^{c-1}\,\mu'_l(x)}{x}, (70)

\log'''_{(l;c,d)}(x) \approx \log'_{(c,d)}(\mu_l(x))\,\mu'''_l(x) \approx \frac{(\mu_l(x))^{c-1}\,\mu'_l(x)}{x^2}. (71)
Plugging these into Eqs. (33) and (34), we get

r^A_{(l;c)} \approx \frac{\left(\mu_l^{c-1}(x)\,\mu'_l(x)\right)^2\left(\frac{\mu_l^{c-1}(x)\,\mu'_l(x)}{x}\right)^3}{\left(\mu_l^{c-1}(x)\,\mu'_l(x)\,\frac{\mu_l^{c-1}(x)\,\mu'_l(x)}{x^2} - 3\left(\frac{\mu_l^{c-1}(x)\,\mu'_l(x)}{x}\right)^2\right)^2} \propto x\,\mu_l^{c-1}(x)\,\mu'_l(x), (72)

and

r^N_{(l;c)} \approx \frac{\left(\mu_l^{c-1}(x)\,\mu'_l(x)\right)^3}{x\left(\frac{\mu_l^{c-1}(x)\,\mu'_l(x)}{x}\right)^2} \approx x\,\mu_l^{c-1}(x)\,\mu'_l(x). (73)
Let us now focus on the situation l = 1, c = 1. In this case the leading-order terms cancel and we have to look at the first correction, given by the scaling exponent d. In this case

\log'_{(1;1,d)}(x) \approx \frac{\left[\log(1+\log x)\right]^d}{x}, (74)

\log''_{(1;1,d)}(x) \approx -\frac{\left[\log(1+\log x)\right]^d}{x^2}, (75)

\log'''_{(1;1,d)}(x) \approx \frac{\left[\log(1+\log x)\right]^d}{x^3}. (76)

Thus, the curvature of both the Amari and the Naudts type can be asymptotically expressed as

r_{(1;1,d)} \approx \left[\log(1+\log x)\right]^d. (77)
C Fisher metric and scalar curvature corresponding to general
logarithm
Let us now show the full calculation of the scalar curvature corresponding to the \Lambda_D-logarithm with arbitrary scaling vector D and constants r_j. Let us first recall the product rule for higher derivatives. The first three derivatives of the function \Lambda_D(x) = R\left(\prod_{j=0}^n \lambda_j(x) - 1\right) are:

\Lambda'_D(x) = R\prod_{j=0}^n\lambda_j(x)\sum_{j=0}^n\frac{\lambda'_j(x)}{\lambda_j(x)}, (78)

\Lambda''_D(x) = R\prod_{j=0}^n\lambda_j(x)\left[2\sum_{i<j}\frac{\lambda'_i(x)\lambda'_j(x)}{\lambda_i(x)\lambda_j(x)} + \sum_{j=0}^n\frac{\lambda''_j(x)}{\lambda_j(x)}\right], (79)

\Lambda'''_D(x) = R\prod_{j=0}^n\lambda_j(x)\left[6\sum_{i<j<k}\frac{\lambda'_i(x)\lambda'_j(x)\lambda'_k(x)}{\lambda_i(x)\lambda_j(x)\lambda_k(x)} + 3\sum_{i<j}\frac{\lambda''_i(x)\lambda'_j(x) + \lambda'_i(x)\lambda''_j(x)}{\lambda_i(x)\lambda_j(x)} + \sum_i\frac{\lambda'''_i(x)}{\lambda_i(x)}\right]. (80)
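The product rule of Eq. (79) can be verified on a toy pair of factors; the factors λ_0(x) = x², λ_1(x) = e^x and the constant R = 2 are arbitrary choices made only for this illustration:

```python
import math

# toy factors: lambda_0(x) = x^2, lambda_1(x) = exp(x); R = 2 is an arbitrary choice
R = 2.0
lam  = [lambda x: x ** 2,  lambda x: math.exp(x)]
lam1 = [lambda x: 2.0 * x, lambda x: math.exp(x)]   # first derivatives
lam2 = [lambda x: 2.0,     lambda x: math.exp(x)]   # second derivatives

def Lambda(x):
    # Lambda_D(x) = R (prod_j lambda_j(x) - 1)
    return R * (lam[0](x) * lam[1](x) - 1.0)

def Lambda2_formula(x):
    # Eq. (79): R prod_j lambda_j [ 2 sum_{i<j} l'_i l'_j/(l_i l_j) + sum_j l''_j/l_j ]
    prod = lam[0](x) * lam[1](x)
    cross = 2.0 * lam1[0](x) * lam1[1](x) / (lam[0](x) * lam[1](x))
    diag = sum(l2(x) / l(x) for l2, l in zip(lam2, lam))
    return R * prod * (cross + diag)

# compare against a numerical second derivative
x, h = 1.3, 1e-4
numeric = (Lambda(x + h) - 2.0 * Lambda(x) + Lambda(x - h)) / h ** 2
assert abs(numeric - Lambda2_formula(x)) < 1e-4
```

For this choice of factors the formula reduces to the textbook result d²/dx²[x²eˣ] = eˣ(x² + 4x + 2), times R.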
The derivatives of \lambda_j can be expressed by defining the function

L_j(x) = \frac{1}{\left(1 + r_j\log\mu_{j+l-1}(x)\right)\prod_{k=0}^{j+l-1}\mu_k(x)} = \frac{\mu'_{j+l}(x)}{1 + r_j\log\mu_{j+l-1}(x)}. (81)

Then we can express

\lambda'_j(x) = \lambda_j(x)\left(r_j d_j L_j(x)\right), (82)

\lambda''_j(x) = \lambda_j(x)\left(r_j^2 d_j^2 L_j^2(x) + r_j d_j L'_j(x)\right), (83)

\lambda'''_j(x) = \lambda_j(x)\left(r_j^3 d_j^3 L_j^3(x) + 3 r_j^2 d_j^2 L'_j(x) L_j(x) + r_j d_j L''_j(x)\right). (84)
The derivatives of L_j(x) can be expressed as

L'_j(x) = -L_j^2(x)\left[r_j + \left(1 + r_j\log\mu_{j+l-1}(x)\right)\sum_{k=0}^{j+l-1}\prod_{m=k+1}^{j+l-1}\mu_m(x)\right], (85)

L''_j(x) = 2L_j^3(x)\left[r_j + \left(1 + r_j\log\mu_{j+l-1}(x)\right)\sum_{k=0}^{j+l-1}\prod_{m=k+1}^{j+l-1}\mu_m(x)\right]^2 - L_j^2(x)\left[r_j\sum_{k=0}^{j+l-1}\prod_{m=k+1}^{j+l-1}\mu_m(x) + \left(1 + r_j\log\mu_{j+l-1}(x)\right)\sum_{k=0}^{j+l-1}\sum_{m=k+1}^{j+l-1}\frac{\prod_{p=k+1}^{j+l-1}\mu_p(x)}{\prod_{p'=0}^{m}\mu_{p'}(x)}\right]. (86)
We can finally rewrite the derivatives of \Lambda_D as

\frac{d}{dx}\Lambda_D(x) = R\prod_{j=0}^n\lambda_j(x)\sum_{j=0}^n C^1_j(x)\,L_j(x), (87)

\frac{d^2}{dx^2}\Lambda_D(x) = R\prod_{j=0}^n\lambda_j(x)\sum_{i=0}^n\sum_{j=0}^n C^2_{ij}(x)\,L_i(x)L_j(x), (88)

\frac{d^3}{dx^3}\Lambda_D(x) = R\prod_{j=0}^n\lambda_j(x)\sum_{i=0}^n\sum_{j=0}^n\sum_{k=0}^n C^3_{ijk}(x)\,L_i(x)L_j(x)L_k(x), (89)

where the coefficients C can be expressed as

C^1_i(x) = r_i d_i, (90)

C^2_{ij}(x) = r_i d_i\left[r_j d_j - \delta_{ij}\,A_j(x)\right], (91)
C^3_{ijk}(x) = r_i d_i\left[r_j d_j r_k d_k - \left(\delta_{ij}\, r_j d_j A_j(x) + \delta_{ik}\, r_k d_k A_k(x) + \delta_{jk}\, r_j d_j A_j(x)\right) - \delta_{ijk}\, B_i(x)\right], (92)
where

A_i(x) = r_i + \left(1 + r_i\log\mu_{i+l-1}(x)\right)\sum_{k=0}^{i+l-1}\prod_{m=k+1}^{i+l-1}\mu_m(x), (93)

B_i(x) = 2A_i(x)^2 + L_i(x)\left[r_i\sum_{k=0}^{i+l-1}\prod_{m=k+1}^{i+l-1}\mu_m(x) + \left(1 + r_i\log\mu_{i+l-1}(x)\right)\sum_{k=0}^{i+l-1}\sum_{m=k+1}^{i+l-1}\frac{\prod_{p=k+1}^{i+l-1}\mu_p(x)}{\prod_{p'=0}^{m}\mu_{p'}(x)}\right]. (94)
Finally, we plug the expressions for \Lambda_D and its derivatives into Eqs. (33) and (34), and we end up with

r^A_D(x) = \frac{\left(\sum_i C^1_i(x)\, L_i(x)\right)^2\left(\sum_{kl} C^2_{kl}(x)\, L_k(x) L_l(x)\right)^3}{\left(\sum_{ijkl}\left[C^1_i(x)\, C^3_{jkl}(x) - 3\, C^2_{ij}(x)\, C^2_{kl}(x)\right] L_i(x) L_j(x) L_k(x) L_l(x)\right)^2}, (95)

and

r^N_D(x) = \frac{\left(\sum_i C^1_i(x)\, L_i(x)\right)^3}{x\left(\sum_{ij} C^2_{ij}(x)\, L_i(x) L_j(x)\right)^2}, (96)

both evaluated at x = 1/W, respectively.