
EPJ manuscript No.

(will be inserted by the editor)

Information geometry of scaling expansions of non-exponentially growing configuration spaces

Jan Korbel^{1,2,a}, Rudolf Hanel^{1,2,b}, and Stefan Thurner^{1,2,3,4,c}

1 Section for Science of Complex Systems, CeMSIIS, Medical University of Vienna, Spitalgasse 23, 1090 Vienna, Austria
2 Complexity Science Hub Vienna, Josefstädter Strasse 39, 1080 Vienna, Austria
3 Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
4 IIASA, Schlossplatz 1, 2361 Laxenburg, Austria

Abstract. Many stochastic complex systems are characterized by the fact that their configuration space does not grow exponentially as a function of the degrees of freedom. The use of scaling expansions is a natural way to measure the asymptotic growth of the configuration space volume in terms of the scaling exponents of the system. These scaling exponents can, in turn, be used to define universality classes that uniquely determine the statistics of a system. Every system belongs to one of these classes. Here we derive the information geometry of scaling expansions of sample spaces. In particular, we present the deformed logarithms and the metric in a systematic and coherent way. We observe a phase transition for the curvature. The phase transition can be well measured by the characteristic length $r$, corresponding to a ball with radius $2r$ that has the same curvature as the statistical manifold. A characteristic length that increases with the size of the system is associated with sub-exponential sample space growth, which occurs in strongly constrained and correlated complex systems. A decreasing characteristic length corresponds to super-exponential sample space growth, which occurs, for example, in systems that develop structure as they evolve. Constant curvature means exponential sample space growth, which is associated with multinomial statistics, where traditional Boltzmann-Gibbs, or Shannon, statistics applies. This allows us to characterize transitions between statistical manifolds corresponding to different families of probability distributions.

1 Introduction

Statistical physics of complex systems has turned into an increasingly important

topic with many applications. Its main aim is to come up with a uniﬁed approach to

a e-mail: jan.korbel@meduniwien.ac.at
b e-mail: rudolf.hanel@meduniwien.ac.at
c e-mail: stefan.thurner@meduniwien.ac.at

arXiv:2001.06393v1 [cond-mat.stat-mech] 17 Jan 2020


understand, describe, and predict the statistical properties of a plethora of different complex systems; see, e.g., [1] for an overview. While the microscopic nature of complex systems can be very different, their statistical properties often have common features across various systems. Entropy is undoubtedly the key concept in statistical physics that connects the statistical description of microscopic dynamics with the macroscopic thermodynamic properties of a system. The notion of entropy has also been adopted in other contexts, such as information theory or statistical inference, which are concepts quite different from thermodynamics [2]. One elegant and powerful concept arising from the theory of statistical inference is that of information geometry [3,4]. It applies ideas from differential geometry to probability theory and statistics. In this context, the concept of entropy also plays a crucial role, since the metric on the statistical manifold is derived from the corresponding (relative) entropy. This so-called Fisher-Rao metric enables us to analyze statistical systems from a different perspective. For example, one can study critical transitions by calculating singularities of the metric [5].

In information geometry, most attention has focused on systems that are governed by Shannon entropy [3,4]. However, it is well known that many complex systems, especially strongly correlated or constrained systems, or systems with emergent components, cannot be described within the framework of Shannon entropy [1]. For this reason, a number of generalizations of Shannon entropy have been proposed, in connection with power laws [6,7], special relativity [8], multifractal thermodynamics [9], or black holes [10,11].

To classify entropies for stochastic systems of various kinds, it is natural to start with the information-theoretic foundations of Shannon entropy, i.e., the so-called Shannon-Khinchin (SK) axioms [12,13]. The first three SK axioms are usually formulated as:

– (SK1) Entropy is a continuous function of the probabilities $p_i$ only¹.
– (SK2) Entropy is maximal for the uniform distribution, $p_i = 1/W$.
– (SK3) Adding a state $W+1$ to a system with $p_{W+1} = 0$ does not change the entropy of the system.

The fourth axiom is called the composability axiom and determines the entropy functional uniquely:

– (SK4) $H(A+B) = H(A) + H(B|A)$, where $H(B|A) = \sum_k p^A_k H(B|A_k)$,

where $H(B|A_k)$ is the entropy of the conditional probability $p_{B|A_k}$. In this formulation, the unique solution that is compatible with SK1-SK4 is Shannon entropy, $H(P) = -\sum_i p_i \log p_i$. When the fourth axiom is relaxed, one obtains a wider class of entropic functionals. The first generalizations of the fourth axiom were introduced in connection with generalized additivity [14,15], group laws [16], or statistical inference [17]. These approaches are somewhat limited in scope, since they all lead to a class of entropies that can be expressed as a function of Tsallis entropy [6].

The relaxation of SK4 also naturally leads to a classification scheme of complex systems [18,19]. The main idea of this approach is to study the asymptotic scaling exponents of the entropy functional that are associated with a particular system's configuration space. This applies in particular to systems with a sub-exponentially growing configuration space, when seen as a function of the degrees of freedom. This classification scheme is based on a mathematical analysis of the asymptotic scaling of the entropic functionals that are governed by the first three SK axioms².

¹ In several cases, entropies incorporate an external parameter, such as $q$ for Tsallis entropy or $c$ and $d$ for $(c,d)$-entropies. However, these parameters are constants that characterize the universality class of the process. They are not parameters subject to variation in entropy maximization.

Since the configuration space of most complex systems does not grow exponentially (as in the case of Shannon entropy), but polynomially [7], as a stretched exponential [20], or even super-exponentially [21], the appropriate scaling behavior of the entropic functional is crucial for a proper thermodynamic interpretation. To this end, we use the recently developed scaling expansion [22], which is a special case of the Poincaré asymptotic series [23], whose coefficients are the scaling exponents of the system.

The aim of this paper is to define a generalization of Shannon entropy that matches the appropriate asymptotic scaling of a given system, and to use it to derive the associated generalized Fisher-Rao metric of the underlying statistical manifold. To this end, we use the framework of deformed logarithms [35,25]. It has been shown recently [26] that one can naturally obtain two types of information metric within that framework: one corresponding to the maximum entropy principle with linear constraints, and the other corresponding to the maximum entropy principle when used with so-called escort constraints instead of ordinary (linear) constraints.

Escort distributions appeared in connection with chaotic systems [27] and were discussed in the context of superstatistics [28,29]. Later it became possible to relate them to linear constraints through a log-duality [30]. Interestingly, escort distributions also appear as canonical coordinates in information geometry [31,32]. In this paper, we use both the linear and the escort approach and compare the corresponding metric tensors and their invariants. We focus particularly on the microcanonical ensemble in the thermodynamic limit, since the metric should correspond to the system's asymptotic properties, given by its characteristic structure. Some partial results for the curvature of the escort metric were recently obtained in this direction [34]. However, no systematic and analytically expressible results for the metric tensor and its scalar curvature have been obtained so far. We show that the curvature of the statistical manifold naturally distinguishes between three types of systems: systems with sub-exponentially growing configuration or sample space (correlated and constrained systems), exponentially growing sample space (equivalent to ordinary multinomial statistics), and super-exponentially growing sample space (e.g., systems that develop emergent structures as they evolve). The vector of scaling exponents plays the role of a set of order parameters, i.e., it measures the distance from the phase transition between the sub-exponential and super-exponential phases.

The paper is organized as follows: Section 2 introduces the scaling expansion and how to calculate the corresponding scaling exponents. We discuss several systems with non-trivial scaling exponents. In the last part of the section we establish a representation of universality classes for complex systems by introducing scaling vectors and their basic operations. In Section 3, we briefly revisit the results of information geometry in the framework of φ-deformed logarithms. We focus on information geometry with both linear and escort constraints. The main results of the paper are derived in Section 4, where we define the appropriate generalized logarithm by combining the φ-deformation framework and the requirement of asymptotic scaling. The properties of the corresponding entropic functionals are discussed. We exemplify the whole approach with the simple, yet very general, class of entropies with one correction term from the scaling expansion, and calculate the asymptotic behavior of the scalar curvature of the microcanonical ensemble in the thermodynamic limit. The last section draws conclusions. Several appendices contain technical details.

² This does not mean that the actual distribution functions that are, say, obtained from the maximum entropy principle must be equi-distributed, since the form of the distribution is determined not only by the entropic functional but also by the constraints.


2 Scaling expansion of the volume of configuration space

The scaling expansion [22] is a method to investigate the asymptotic scaling behavior of a sample space volume, $W(N)$. Here $W$ is the number of accessible states in a system, and $N$ indicates the size of the system³. The scaling expansion is a special case of the Poincaré asymptotic series, where the coefficients correspond to the scaling exponents of the system. We introduce the notation $f^{(n)}(x) = \underbrace{f(\dots f(f(x))\dots)}_{n\ \mathrm{times}}$ for the iterated application of functions, to define a set of re-scaling operations,

$r^{(n)}_\lambda(x) = \exp^{(n)}(\lambda \log^{(n)}(x))$. This set of re-scaling operations contains the well-known multiplicative re-scaling $x \mapsto \lambda x$ ($n = 0$), the power re-scaling $x \mapsto x^\lambda$ ($n = 1$), and the additive re-scaling $x \mapsto x + \log\lambda$ ($n = -1$). For each $n$, $r^{(n)}$ is a representation of the multiplicative group $(\mathbb{R}^+, \times)$, i.e., $r^{(n)}_\lambda \circ r^{(n)}_{\lambda'} = r^{(n)}_{\lambda\lambda'}$. We now investigate how a function $W(N)$ scales under the re-scaling $N \mapsto r^{(n)}_\lambda(N)$. Note that, due to a simple theorem (see Appendix A2 in [22]), the function $z(\lambda)$, defined as $z(\lambda) = \lim_{N\to\infty} g(r^{(n)}_\lambda(N))/g(N)$, must have the form $z(\lambda) = \lambda^c$ for $c \in \mathbb{R} \cup \{\pm\infty\}$ whenever the limit exists. We start with multiplicative scaling ($n = 0$): the expression $W(\lambda N)/W(N)$ is, according to the theorem, equal to $\lambda^{c_0}$. We assume that $W(N)$ is a strictly increasing function; it then follows that $c_0 \geq 0$⁴. It can happen that $c_0 = +\infty$; in that case, the expression grows faster than any polynomial. This problem can be resolved by using $\log^{(l)}(W(N))$ instead of $W(N)$, for an appropriate choice of $l$. The parameter $l$ is chosen such that $c^{(l)}_0$, corresponding to $\log^{(l)} W(\lambda N)/\log^{(l)} W(N) \sim \lambda^{c^{(l)}_0}$, is finite. We call $l$ the order of the process. We get that $W(N) \sim \exp^{(l)}(N^{c^{(l)}_0})$ for $N \gg 1$. To get the corrections to the leading order, we use the fact that $\frac{\log^{(l)} W(\lambda N)}{\log^{(l)} W(N)}\, \frac{N^{c_0}}{(\lambda N)^{c_0}} \sim 1$. When we use the re-scaling for $n = 1$, we get the second scaling exponent: $\frac{\log^{(l)} W(N^\lambda)}{\log^{(l)} W(N)}\, \frac{N^{c^{(l)}_0}}{(N^\lambda)^{c^{(l)}_0}} \sim \lambda^{c^{(l)}_1}$. Therefore, $W(N) \sim \exp^{(l)}(N^{c^{(l)}_0} (\log N)^{c^{(l)}_1})$. One can continue along the same lines to obtain the asymptotic expansion of $W(N)$, which reads

$$W(N) \sim \exp^{(l)}\left(\prod_{j=0}^{n} \left(\log^{(j)} N\right)^{c^{(l)}_j}\right) \quad \text{for } N \to \infty, \qquad (1)$$

where the $c^{(l)}_j$ are the characteristic scaling exponents. The scaling expansion of $\log^{(l)} W(N)$ can be written as

$$\log\left(\log^{(l)} W(N)\right) = \sum_{j=0}^{n} c^{(l)}_j \log^{(j+1)} N + \mathcal{O}\left(\log^{(n+1)}(N)\right). \qquad (2)$$

It can be shown that the scaling exponents can be calculated from $W(N)$ as

$$c^{(l)}_k = \lim_{N\to\infty} \log^{(k)}(N)\left(\log^{(k-1)}(N)\left(\cdots \log(N)\left(N\,\frac{\mathrm{d}\log^{(l+1)}(W(N))}{\mathrm{d}N} - c^{(l)}_0\right) - c^{(l)}_1\right)\cdots - c^{(l)}_{k-1}\right). \qquad (3)$$

³ For example, think of $N$ as the number of particles in a system, or the number of throws in a coin tossing experiment.
⁴ Details about processes with reducing sample spaces can be found, e.g., in Refs. [40,41,42,43].
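The extraction of the leading exponent and its first correction can be sketched numerically. The following minimal illustration (the helper names are ours, not from the paper) applies the multiplicative and the power re-scaling directly to a known $\log^{(l)} W(N)$:

```python
import math

def c0_estimate(logW, N, lam=2.0):
    # leading exponent from the multiplicative re-scaling (n = 0):
    # log^(l) W(lam*N) / log^(l) W(N) ~ lam**c0
    return math.log(logW(lam * N) / logW(N)) / math.log(lam)

def c1_estimate(logW, c0, N, lam=2.0):
    # first correction from the power re-scaling (n = 1):
    # [log^(l) W(N**lam) / log^(l) W(N)] * N**c0 / (N**lam)**c0 ~ lam**c1
    ratio = logW(N ** lam) / logW(N) * N ** c0 / N ** (lam * c0)
    return math.log(ratio) / math.log(lam)

# aging random walk: log W(N) = sqrt(N/2) log 2, hence (c0, c1) = (1/2, 0)
logW_arw = lambda N: math.sqrt(N / 2) * math.log(2)
print(c0_estimate(logW_arw, 1e12), c1_estimate(logW_arw, 0.5, 1e12))
```

For a process with a genuine logarithmic correction, e.g. $\log W(N) = \sqrt{N}\log N$, the same two calls converge to $c_0 \to 1/2$ and $c_1 \to 1$ as $N \to \infty$.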


As a next step, we apply the scaling expansion to obtain the corresponding extensive entropy functionals. It is well known that for complex systems (with sub- or super-exponential phase space growth) the Shannon-Boltzmann-Gibbs entropy is not an extensive quantity. To obtain an extensive expression for such systems, one can introduce an appropriate generalization of the entropy functional [1]. A natural way to characterize thermodynamic entropy is to define an entropy functional $S(W)$ that is extensive. For the microcanonical ensemble, this requirement can be expressed as $S(W(N)) \sim N$ for $N \to \infty$. For the purpose of thermodynamics, we do not have to require exact extensivity (with an equality sign), but only its weaker asymptotic version. We consider the general trace-form entropy functional

$$S(P) = \sum_{i=1}^{W} g(p_i)\,. \qquad (4)$$

The scaling expansion of the extensive entropy in the microcanonical ensemble can be expressed as

$$S(W) \sim \prod_{j=0}^{n} \left(\log^{(j+l)}(W(N))\right)^{d^{(l)}_j} \quad \text{for } N \to \infty, \qquad (5)$$

and the scaling expansion of $g(x)$ is

$$g(x) \sim x \prod_{j=0}^{n} \left(\log^{(j+l)}\frac{1}{x}\right)^{d^{(l)}_j}. \qquad (6)$$

The scaling coefficients $d^{(l)}_j$ can be obtained from

$$d^{(l)}_k = \lim_{N\to\infty} \log^{(l+k)}(W)\left(\log^{(l+k-1)}(W)\left(\cdots \log^{(l)}(W)\left(N\,\frac{\mathrm{d}\log^{(l)} W(N)}{\mathrm{d}N}\right)^{-1} - d^{(l)}_0\right)\cdots - d^{(l)}_{k-1}\right). \qquad (7)$$

The requirement of extensivity determines the relation between the scaling exponents $c^{(l)}_j$ and $d^{(l)}_j$ as

$$d^{(l)}_0 = \frac{1}{c^{(l)}_0}\,, \qquad d^{(l)}_k = -\frac{c^{(l)}_k}{c^{(l)}_0}\,, \quad k = 1, 2, \dots. \qquad (8)$$
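In code, the map of Eq. (8) between the two sets of exponents is a one-liner (the tuple representation `(l, [exponents])` is our own convention):

```python
def entropy_exponents(l, c):
    # Eq. (8): d0 = 1/c0 and dk = -ck/c0 for k >= 1
    return (l, [1.0 / c[0]] + [-ck / c[0] for ck in c[1:]])

# examples from the text: the magnetic coin model has log W ~ (N/2) log N,
# i.e. l = 1 and (c0, c1) = (1, 1); random networks have log W ~ N^2, i.e. c0 = 2
print(entropy_exponents(1, [1.0, 1.0]))  # (1, [1.0, -1.0])
print(entropy_exponents(1, [2.0]))       # (1, [0.5])
```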

Examples of systems with different scaling exponents. The first example is a random walk (RW) on a discrete one-dimensional lattice with two possible steps: left or right. The space of all possible paths grows exponentially, $W_{RW}(N) = 2^N \sim \exp(N)$, and we obtain the formula for Boltzmann entropy, $S_{RW} = \log W_{RW}$ ($k_B = 1$). Now consider an aging random walk (ARW) [19], where the walker takes one step in a random direction, followed by two steps in a random direction, followed by three steps, etc. In this case, the sample space grows sub-exponentially, $W_{ARW} \sim 2^{\sqrt{N/2}}$, and $S_{ARW} = (\log W_{ARW})^2$. The next example is the magnetic coin model (MC) [21], where each coin can be in two states, head or tail; however, two coins can also stick together and create a bond state. It can be shown that the corresponding sample space grows super-exponentially, $W_{MC} \sim N^{N/2} e^{2\sqrt{N}}$. One can conclude that the corresponding extensive entropy is asymptotically equivalent to $S_{MC} = \log W_{MC}/\log\log W_{MC}$. Another example of super-exponential processes are random networks (RN), whose sample spaces grow as $W_{RN} = 2^{\binom{N}{2}}$, and thus $S_{RN} = (\log W_{RN})^{1/2}$. The final example is the double-exponential growth of the random walk cascade (RWC), where the walker can take a step to the right, take a step to the left, or split into two independent walkers [22]. For this we get that $W_{RWC} = 2^{2^N - 1}$, and $S_{RWC} = \log\log W_{RWC}$. In Fig. 1 we show the parameter space of entropies given by three scaling exponents $(d_0, d_1, d_2)$. The above examples are indicated as points. In Fig. 1(a) the plane for the first two scaling exponents is shown, as presented in [18]. We see that if one uses only the first two exponents, some super-exponential processes are not properly represented. By adding a third scaling exponent this problem is solved, Fig. 1(b). So far, we have not found simple examples that need more than three scaling exponents.

Fig. 1. Parametric space of scaling expansion universality classes with scaling exponents of the random walk (RW), aging random walk (ARW), magnetic coin model (MC), random networks (RN), random walk cascade (RWC), and processes with compact support distribution (CS). (a) 2D parametric space of scaling expansion universality classes for the first two exponents (as in [18]). We see that some super-exponential systems are not properly represented. (b) Extension to three dimensions by adding the third scaling exponent, $d_2$. All mentioned examples can be described with the first three scaling exponents.
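The extensivity claims for these examples can be checked directly: doubling $N$ should double $S(W(N))$ in each case. A small numerical sanity check (our restatement of the asymptotic expressions above; the magnetic coin model is omitted for brevity):

```python
import math

log2 = math.log(2)

# extensive entropies S(W(N)) for the example processes; all should scale ~ N
S = {
    "RW":  lambda N: N * log2,                               # S = log W,        W = 2^N
    "ARW": lambda N: (math.sqrt(N / 2) * log2) ** 2,         # S = (log W)^2,    W ~ 2^sqrt(N/2)
    "RN":  lambda N: math.sqrt(N * (N - 1) / 2 * log2),      # S = (log W)^1/2,  W = 2^(N choose 2)
    "RWC": lambda N: math.log(2 ** N - 1) + math.log(log2),  # S = log log W,    W = 2^(2^N - 1)
}

for name, s in S.items():
    print(name, round(s(2000) / s(1000), 3))  # each ratio is ~ 2
```

Note that for the RWC case the inner `2 ** N - 1` is an exact Python integer, so `math.log` can handle it even when it far exceeds the floating-point range.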

2.1 Universality classes for scaling expansions

Scaling expansions define universality classes of statistical complex systems according to the set of scaling exponents of their sample space [22]. The representation of the sample space volume, $W(N)$, by its scaling expansion can be used to uniquely describe the statistical properties in the thermodynamic limit.

Consider a function $c(x)$ represented by its scaling expansion

$$c(x) \sim \exp^{(l)}\left(\prod_{j=0}^{n} \left(\log^{(j)}(x)\right)^{c^{(l)}_j}\right). \qquad (9)$$

Its scaling exponents can be collected in the scaling vector

$$C = \{l;\, c^{(l)}_0, c^{(l)}_1, \dots, c^{(l)}_n\}\,. \qquad (10)$$

In principle, the scaling vector can be infinite; however, typically, after several terms the corrections are either zero or do not contribute significantly. The parameter $n$ denotes the number of corrections.


Let $a(x)$ and $b(x)$ be two functions with their respective scaling expansions determined by the two vectors of scaling exponents

$$A = \{l_a;\, a_0, a_1, \dots, a_n\}\,, \qquad (11)$$
$$B = \{l_b;\, b_0, b_1, \dots, b_n\}\,. \qquad (12)$$

Without loss of generality, $n$ can be taken to be the same for both vectors, because one can always append zeros to the shorter vector. We can now define the equivalence relation

$$a(x) \sim b(x) \quad \text{if } A \equiv B\,, \qquad (13)$$

as well as a natural ordering

$$a(x) \prec b(x) \quad \text{if } A < B\,, \qquad (14)$$

where the symbol $<$ is used in the lexicographic sense, i.e.,

$$A < B \quad \text{if} \quad \begin{cases} l_a < l_b \\ l_a = l_b,\ a_0 < b_0 \\ l_a = l_b,\ a_0 = b_0,\ a_1 < b_1 \\ \dots \end{cases} \qquad (15)$$

For every vector $C$ we define the corresponding entropy scaling vector $D$, denoted by $D = C^{-1}$, which is obtained from Eq. (8) by the requirement of extensivity. One can define analogous relations for $D$ through the relations for the corresponding vectors $C$. Thus, for entropy scaling vectors $E$ and $F$, we can say that

$$E < F \quad \text{if} \quad \begin{cases} l_e < l_f \\ l_e = l_f,\ e_0 < f_0 \\ l_e = l_f,\ e_0 = f_0,\ e_1 > f_1 \\ \dots \end{cases} \qquad (16)$$

Note that for the sub-leading scaling exponents the inequality is reversed, which is a result of Eq. (8). Additionally, one can define basic algebraic operations on the scaling vectors, such as generalized addition or a derivative operator. More details can be found in Appendix A. Let us make an important note. As discussed in [22], the SK axioms set requirements on the admissible set of scaling exponents. From SK2 we get that $d_l \equiv d^{(l)}_0 > 0$, and from SK3 that $d_0 < 1$. Note that the vector $D$ can also be represented as

$$D = \{l;\, d^{(l)}_0, d^{(l)}_1, \dots, d^{(l)}_n\} = \{\underbrace{0, \dots, 0}_{l\ \mathrm{times}},\, d_l, d_{l+1}, \dots, d_{l+n}\}\,. \qquad (17)$$

This means that one can use the representation without specifying $l$, with an appropriate number of zeros at the beginning. This is useful, for example, for plots in the parametric space, where it is possible to plot processes of different order $l$ (as, e.g., in Fig. 1). However, one has to keep in mind that this representation can be misleading, in the sense that the limit $d_l \to 0$ does not have a clear meaning, since it changes the order of the process. This can be nicely seen in the example of Tsallis entropy [6], where

$$\lim_{q\to 1^-} \frac{\sum_{i=1}^{W} p_i^q - 1}{1 - q} = -\sum_{i=1}^{W} p_i \log p_i\,, \qquad (18)$$

which can be formulated in terms of entropy scaling vectors as

$$\lim_{q\to 1^-} D = \lim_{q\to 1^-} (1-q,\, 0) = (0, 1)\,. \qquad (19)$$

Interestingly, the limit from above, $q \to 1^+$, is even more pathological. In this case the scaling vector corresponding to $S_q(P)$ for $q > 1$ is $(0,0)$, because $S_q(N) \sim N^{1-q} + 1 \sim N^0$. These pathologies have their origin in the non-commutativity of limits, $\lim_{N\to\infty}\lim_{d_l\to 0} \neq \lim_{d_l\to 0}\lim_{N\to\infty}$. The limit $d_l \to 0$ depends on the particular representation of the extensive entropy.
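The lexicographic orderings of Eqs. (15) and (16) are easy to make concrete. A short sketch (vectors are represented as `(l, [exponents])`, our convention, not the paper's):

```python
def lt_C(A, B):
    # Eq. (15): compare the order l first, then a0, a1, ... lexicographically
    (la, a), (lb, b) = A, B
    return [la] + list(a) < [lb] + list(b)

def lt_D(E, F):
    # Eq. (16): same for entropy scaling vectors, except that the inequality
    # for the sub-leading exponents is reversed, a consequence of Eq. (8)
    (le, e), (lf, f) = E, F
    if (le, e[0]) != (lf, f[0]):
        return (le, e[0]) < (lf, f[0])
    return list(e[1:]) > list(f[1:])

# random walk D = (1; 1, 0) vs. magnetic coin model D = (1; 1, -1)
print(lt_D((1, [1.0, 0.0]), (1, [1.0, -1.0])))  # True
```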

3 Information geometry of φ-deformations

Information geometry plays a central role in information theory as well as in statistical inference. It allows one to study the structure of the statistical manifold by means of differential geometry. We derive the information-geometric properties of the scaling expansion in the framework of φ-deformed logarithms introduced in [35,25]. The φ-deformation is a generalization of the logarithmic function. It can subsequently be used to establish a connection with information theory, where the logarithm plays the role of a natural information measure (Hartley information). The φ-deformed logarithm is defined by a positive, strictly increasing function $\phi(x)$ on $(0, +\infty)$ as

$$\log_\phi(x) = \int_1^x \frac{\mathrm{d}y}{\phi(y)}\,. \qquad (20)$$

Hence, $\log_\phi$ is an increasing concave function with $\log_\phi(1) = 0$. For $\phi(x) = x$, we obtain the ordinary logarithm. Naturally,

$$\frac{\mathrm{d}\log_\phi(x)}{\mathrm{d}x} = \frac{1}{\phi(x)}\,. \qquad (21)$$
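Equation (20) can be checked numerically for any admissible φ. A minimal sketch (midpoint quadrature; the function name is ours), using $\phi(y) = y^q$, which yields the Tsallis $q$-logarithm:

```python
import math

def phi_log(phi, x, steps=200_000):
    # log_phi(x) = integral from 1 to x of dy / phi(y), Eq. (20)
    h = (x - 1.0) / steps
    return h * sum(1.0 / phi(1.0 + (i + 0.5) * h) for i in range(steps))

# phi(y) = y^q gives the q-logarithm (x^(1-q) - 1)/(1 - q);
# phi(y) = y recovers the ordinary logarithm
q = 0.5
numeric = phi_log(lambda y: y ** q, 3.0)
closed = (3.0 ** (1 - q) - 1.0) / (1.0 - q)
print(abs(numeric - closed) < 1e-6)  # True
```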

The inverse function of $\log_\phi$, the so-called φ-exponential, is an increasing and convex function. This enables one to define the parametric φ-exponential family of probability distributions as

$$p(x;\theta) = \exp_\phi\left(\Psi(\theta) + \sum_i x_i \theta_i\right), \qquad (22)$$

where the function $\Psi(\theta)$ is called the Massieu function and normalizes the distribution. As discussed in [26], there are two natural ways to make a connection with

the theory of information through the maximum entropy principle. The first is based on the maximization of the entropy functional under linear (thermodynamic) constraints; the latter is based on maximization under so-called escort (or geometric) constraints. Both approaches lead to the φ-exponential family. The former approach defines the φ-deformed entropy as [25]

$$S^N_\phi(p) = \sum_{i=1}^{W} \int_0^{p_i} \mathrm{d}x\, \log_\phi(x)\,, \qquad (23)$$

which is maximized by the φ-exponential family for linear constraints, i.e., constraints of the type

$$\sum_{i=1}^{W} p_i E_i = \langle E \rangle\,. \qquad (24)$$

In information geometry, escort distributions play the special role of dual coordinates on statistical manifolds [33]. They can be defined by φ-deformations as

$$P^\phi_i = \frac{\phi(p_i)}{\sum_k \phi(p_k)} = \frac{\phi(p_i)}{h_\phi(P)}\,. \qquad (25)$$


It can be shown that the entropy maximized by the φ-exponential family for escort constraints, i.e., for constraints of the type

$$\sum_{i=1}^{W} P^\phi_i E_i = \langle E \rangle_\phi\,, \qquad (26)$$

can be expressed as

$$S^A_\phi(p) = \sum_{i=1}^{W} P^\phi_i \log_\phi(p_i) = \frac{\sum_{i=1}^{W} \phi(p_i) \log_\phi(p_i)}{\sum_{j=1}^{W} \phi(p_j)}\,. \qquad (27)$$

Both approaches can be linked to information geometry, i.e., used to derive a generalization of the Fisher information metric. This can be done through a divergence (or relative entropy) of Bregman type, defined as

$$D_f(p\,\|\,q) = f(p) - f(q) - \langle \nabla f(q),\, p - q\rangle\,, \qquad (28)$$

where $\langle\cdot,\cdot\rangle$ denotes the inner product. Alternatively, one can use a divergence of Csiszár type, but its information geometry is trivial, because it is conformal to the ordinary Fisher information geometry; see, e.g., Refs. [26,35].

Let us consider a parametric family of distributions $p(\theta)$. The Fisher information metric of this family at a point $\theta_0$ can be calculated as

$$g^f_{ij}(\theta) = \left.\frac{\partial^2 D_f(p(\theta_0)\,\|\,p(\theta))}{\partial\theta_i\, \partial\theta_j}\right|_{\theta=\theta_0}. \qquad (29)$$

Let us consider a discrete probability distribution $\{p_i\}_{i=0}^{n}$. The normalization is given by $\sum_{i=0}^{n} p_i = 1$, so we consider $p_1, \dots, p_n$ as independent variables, while $p_0$ is determined from $p_0 = 1 - \sum_{i=1}^{n} p_i$. We parameterize this probability simplex by a φ-deformed exponential family⁵. For the entropy $S^N_\phi$, we have $f^N_\phi(p) = \sum_i \int_0^{p_i} \log_\phi(x)\,\mathrm{d}x$, while for $S^A_\phi(p)$ we end up with $f(p) = \sum_i P^\phi_i \log_\phi(p_i)$. After a straightforward calculation, we obtain [26]

$$g^N_{\phi,ij}(P) = \log'_\phi(p_i)\,\delta_{ij} + \log'_\phi(p_0) = \frac{1}{\phi(p_i)}\,\delta_{ij} + \frac{1}{\phi(p_0)}\,, \qquad (30)$$

and

$$g^A_{\phi,ij}(P) = -\frac{1}{h_\phi(p)}\left(\frac{\log''_\phi(p_i)}{\log'_\phi(p_i)}\,\delta_{ij} + \frac{\log''_\phi(p_0)}{\log'_\phi(p_0)}\right) = \frac{1}{h_\phi(p)}\left(\frac{\phi'(p_i)}{\phi(p_i)}\,\delta_{ij} + \frac{\phi'(p_0)}{\phi(p_0)}\right), \qquad (31)$$

respectively. As a result, for a given φ-deformation there are two types of metric on the information manifold. Note that it is natural to consider a one-parametric class of affine connections, for which one obtains the so-called dually-flat structure, in which the corresponding Christoffel coefficients vanish [33]. This structure is useful in information geometry; however, we stick to the well-known Levi-Civita connection (which can be obtained as a special case of a dually-flat connection, since the Levi-Civita connection is the only self-dual connection [4]), because the metric is non-vanishing. Thus, the corresponding invariants, such as the scalar curvature, are non-trivial and reveal some information about the statistical manifold.

⁵ Note that this parametric family typically constitutes a smooth manifold [33].
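The two metrics of Eqs. (30) and (31) are simple to tabulate for a finite distribution. A sketch (our helper names; `p[0]` plays the role of the dependent probability $p_0$); for $\phi(x) = x$ both expressions reduce to the classical Fisher-Rao metric $\delta_{ij}/p_i + 1/p_0$:

```python
def metric_naudts(phi, p):
    # Eq. (30): g^N_ij = delta_ij / phi(p_i) + 1 / phi(p_0),  i, j = 1..n
    n = len(p) - 1
    return [[(1.0 / phi(p[i + 1]) if i == j else 0.0) + 1.0 / phi(p[0])
             for j in range(n)] for i in range(n)]

def metric_amari(phi, dphi, p):
    # Eq. (31): g^A_ij = (phi'(p_i)/phi(p_i) delta_ij + phi'(p_0)/phi(p_0)) / h_phi
    n = len(p) - 1
    h = sum(phi(pk) for pk in p)
    return [[((dphi(p[i + 1]) / phi(p[i + 1]) if i == j else 0.0)
              + dphi(p[0]) / phi(p[0])) / h for j in range(n)] for i in range(n)]

p = (0.5, 0.3, 0.2)
gN = metric_naudts(lambda x: x, p)
gA = metric_amari(lambda x: x, lambda x: 1.0, p)
print(all(abs(gN[i][j] - gA[i][j]) < 1e-12 for i in range(2) for j in range(2)))
```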


Let us now focus on the scalar curvature corresponding to the metric tensor, $R_\phi = g^{ik}_\phi g^{lj}_\phi R_{\phi,ilkj}$, in the thermodynamic limit $N \to \infty$. We focus on the microcanonical ensemble, i.e., we consider $p_i = 1/W$. We assume no prior information about the system or its dynamics, so all states are equally probable. It is possible to show, in a technical but straightforward calculation, that the scalar curvature can be expressed as (see also [38,39])

$$R_\phi(W) = \frac{W(W-1)}{\left(2\, r_\phi(W+1)\right)^2}\,, \qquad (32)$$

which corresponds to the scalar curvature of a $W$-dimensional ball of radius $2 r_\phi$. The function $r_\phi$ depends only on the form of the φ-deformation. We call the function $r_\phi$ the characteristic length. For the case of the Amari metric, it can be expressed as

$$\left(r^A_\phi(W)\right)^2 = -\frac{\left(\log'_\phi\!\left(\tfrac{1}{W}\right)\right)^2 \left(\log''_\phi\!\left(\tfrac{1}{W}\right)\right)^3}{\left(\log'''_\phi\!\left(\tfrac{1}{W}\right) \log'_\phi\!\left(\tfrac{1}{W}\right) - 3\left(\log''_\phi\!\left(\tfrac{1}{W}\right)\right)^2\right)^2}\,, \qquad (33)$$

while for the metric of Naudts type we obtain

$$\left(r^N_\phi(W)\right)^2 = \frac{W \left(\log'_\phi\!\left(\tfrac{1}{W}\right)\right)^3}{\left(\log''_\phi\!\left(\tfrac{1}{W}\right)\right)^2}\,. \qquad (34)$$
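For the ordinary logarithm ($\phi(x) = x$, i.e., multinomial statistics) one has $\log'_\phi(x) = 1/x$, $\log''_\phi(x) = -1/x^2$, $\log'''_\phi(x) = 2/x^3$, and both characteristic lengths come out constant, $r_\phi \equiv 1$ for every $W$. A quick check of Eqs. (33) and (34) (hypothetical helper names; the derivative functions are supplied by hand):

```python
def r2_amari(d1, d2, d3, W):
    # (r^A_phi(W))^2, Eq. (33); d1, d2, d3 are derivatives of log_phi at x = 1/W
    x = 1.0 / W
    return -(d1(x) ** 2 * d2(x) ** 3) / (d3(x) * d1(x) - 3.0 * d2(x) ** 2) ** 2

def r2_naudts(d1, d2, W):
    # (r^N_phi(W))^2, Eq. (34)
    x = 1.0 / W
    return W * d1(x) ** 3 / d2(x) ** 2

# Shannon case: log' = 1/x, log'' = -1/x^2, log''' = 2/x^3
d1 = lambda x: 1.0 / x
d2 = lambda x: -1.0 / x ** 2
d3 = lambda x: 2.0 / x ** 3
for W in (10, 100, 1000):
    print(round(r2_amari(d1, d2, d3, W), 6), round(r2_naudts(d1, d2, W), 6))
```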

4 Information geometry of scaling expansions

Let us now consider an arbitrary φ-deformed logarithm. We show how to introduce a generalization of the logarithm with a given asymptotic scaling. In contrast to φ-deformations, we do not start with the definition of φ, but focus on the definition of the logarithm. We denote the desired logarithmic function by $\Lambda_D$. Let us state the requirements that $\Lambda_D$ should fulfil:

1. Domain: $\Lambda_D : \mathbb{R}^+ \to \mathbb{R}$,
2. Monotonicity: $\Lambda'_D(x) > 0$,
3. Concavity: $\Lambda''_D(x) < 0$,
4. Normalization: $\Lambda'_D(1) = 1$,
5. Self-duality: $\Lambda_D(1/x) = -\Lambda_D(x)$,
6. Scaling expansion: $\Lambda_D(x) \sim \prod_{j=0}^{k} \left[\log^{(j+l)}(x)\right]^{d^{(l)}_j}$ for $x \to \infty$.

The requirements follow the properties of the ordinary logarithm. Particularly convenient is the self-duality requirement, from which we can directly calculate the asymptotic expansion around $0^+$. A direct consequence of self-duality is that $\Lambda_D(1) = 0$. Next, we want to find a representation that is simple, analytically expressible, and universal for any set of scaling exponents. Due to the self-duality requirement, we can focus only on the interval $(1, +\infty)$, while on the interval $(0,1)$ the logarithm is defined by self-duality. To find an appropriate representation, we start from the scaling expansion itself. Unfortunately, the scaling expansion, $\prod_{j=0}^{k} \left[\log^{(j+l)}(x)\right]^{d^{(l)}_j}$, is not generally defined on the whole interval $(1,\infty)$, since the domain of $\log^{(l)}(x)$ is $(\exp^{(l-2)}(1), \infty)$. We can overcome this issue by adjusting the nested logarithm, replacing $\log \mapsto 1 + \log$. Further, to be able to fulfil the normalization condition, we add a multiplicative constant to the first nesting, so that for each order the corresponding term can be expressed as $\left(1 + r_j \log\left([1+\log]^{(j-1)}(x)\right)\right)$. Thus, the generalized logarithm can be expressed as

$$\Lambda_D(x) = R\left(\prod_{j=0}^{n} \left(1 + r_j \log\left([1+\log]^{(j+l-1)}(x)\right)\right)^{d^{(l)}_j} - 1\right). \qquad (35)$$

The logarithm automatically fulfils the condition $\Lambda_D(1) = 0$. The parameters $r_n$ define a set of scale parameters that influence the behavior at finite values, while the asymptotic properties are preserved. Because

$$\Lambda'_D(1) = R \sum_{j=0}^{n} r_j\, d^{(l)}_j\,, \qquad (36)$$

we can obtain normalization of the derivative in several ways. To this end, we define the "calibration"

$$r_0 = \rho\, \frac{1 - r \sum_{j=1}^{n} d^{(l)}_j}{r\, d^{(l)}_0}\,, \qquad (37)$$
$$r_k = \rho\,, \qquad (38)$$
$$R = r/\rho\,, \qquad (39)$$

where $r$ and $\rho$ are free parameters. The parameter $\rho$ can be determined by additional requirements. The first option is to require that $\Lambda_D$ is smooth enough, i.e., that it has at least a continuous second derivative. From the second derivative of the self-duality condition, together with the normalization condition, we get $\Lambda''_D(1) = -1$. Following a straightforward calculation, we find

$$\Lambda''_D(1) = R\left(2\sum_{i<j} r_i r_j\, d^{(l)}_i d^{(l)}_j + \sum_{j=0}^{n}\left[r_j^2\, d^{(l)}_j\left(d^{(l)}_j - 1\right) - (j+l)\, r_j\, d^{(l)}_j\right]\right). \qquad (40)$$

Using Eq. (37) in Eq. (40), we get an expression for $\rho_C$, i.e., the scale parameter in the smooth calibration,

$$\rho_C = \frac{\left(r \sum_j d^{(l)}_j + l - 1\right)\left(d^{(l)}_0 - 1\right)}{d^{(l)}_0\left[r\left(1 - r\sum_j d^{(l)}_j\right)^2 + (2-r)\sum_j d^{(l)}_j - 2r\sum_{j\le i} d^{(l)}_i d^{(l)}_j + r\left(\sum_j d^{(l)}_j\right)^2\right]}\,. \qquad (41)$$

The free parameter $r$ can be used to ensure that $\rho$ is positive. Alternatively, we can simply consider $r_0 = 1$, which is useful for several applications. In this case, the scale parameter $\rho_L$ in the leading-order calibration is simply

$$\rho_L = \frac{r\, d^{(l)}_0}{1 - r \sum_{j=1}^{n} d^{(l)}_j}\,. \qquad (42)$$

Note that, after a proper normalization, this calibration corresponds to the calibration used in [18,19]. Unless a continuous second derivative is explicitly required, it is more convenient to work with this simpler calibration.

We now turn our attention to the information geometry of $\Lambda_D$-deformations and introduce a notation for the nested logarithm

$$\mu_k(x) = [1 + \log]^{(k)}(x)\,. \qquad (43)$$


We sketch the results for the scaling expansion with one correction. All technical details can be found in Appendix B. In Appendix C we show the calculation for arbitrary scaling vectors and calibrations, which is technically more difficult but leads to the same results. We now denote the scaling vector as $D = (l; c, d)$. Note that this entropy has been studied for $l = 0$ in [18]. This inspires us to define the generalized logarithm as

$$\log_{(l;c,d)}(x) = r\left(\mu_l(x)^c \left(1 + \frac{1-cr}{dr}\, \log \mu_l(x)\right)^d - 1\right) = \log_{(c,d)}(\mu_l(x))\,. \qquad (44)$$

This definition corresponds to the choice of $\rho$ in Eq. (42)⁶. The logarithms are depicted in Fig. 2(a) for various scaling exponents. The inverse function, the deformed exponential, can be obtained in terms of the Lambert W function⁷:

$$\exp_{(l;c,d)}(x) = \nu_l\left(\exp\left(-\frac{d}{c}\left[W\!\left(B\,(1 - x/r)^{1/d}\right) - W(B)\right]\right)\right), \qquad (45)$$

where $B = \frac{cr}{1-cr}\exp\left(\frac{cr}{1-cr}\right)$ and $\nu_l$ is the inverse function of $\mu_l$, i.e., the $l$-fold iteration of $x \mapsto \exp(x-1)$:

$$\nu_l(x) = \underbrace{\exp(\exp(\dots(\exp}_{l\ \mathrm{times}}(x-1) - 1)\dots))\,. \qquad (46)$$
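A small sketch of Eqs. (43), (44), and (46) (our function names; as simplifying assumptions we default to $r = 1$ rather than the calibrations of Eqs. (41)-(42), and we treat $d = 0$ via the usual limit convention, in which the bracket in Eq. (44) reduces to 1):

```python
import math

def mu(l, x):
    # nested logarithm mu_l(x) = [1 + log]^(l)(x), Eq. (43)
    for _ in range(l):
        x = 1.0 + math.log(x)
    return x

def nu(l, x):
    # inverse of mu_l: the l-fold iteration of x -> exp(x - 1), Eq. (46)
    for _ in range(l):
        x = math.exp(x - 1.0)
    return x

def log_lcd(x, l, c, d, r=1.0):
    # generalized logarithm of Eq. (44); d = 0 is treated as the limit d -> 0
    m = mu(l, x)
    if d == 0:
        return r * (m ** c - 1.0)
    return r * (m ** c * (1.0 + (1.0 - c * r) / (d * r) * math.log(m)) ** d - 1.0)

def exp_lc0(x, l, c, r=1.0):
    # inverse of log_lcd for d = 0 (closed form; no Lambert W needed here)
    return nu(l, (1.0 + x / r) ** (1.0 / c))

# the random-walk class (l; c, d) = (1; 1, 0) recovers the ordinary logarithm
print(abs(log_lcd(5.0, 1, 1, 0) - math.log(5.0)) < 1e-12)  # True
```

For $d \neq 0$ the inverse requires the Lambert W function of Eq. (45), available, e.g., as `scipy.special.lambertw`.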

Note that, depending on the values of $c$ and $d$, this deformed exponential contains the exponential, power laws, and stretched exponentials, respectively [1]. It is easy to see that the corresponding scaling vector of the exponential is $C = (l;\, 1/c,\, -d/c)$. The function $\phi_{(l;c,d)}(x)$ can be expressed as

$$\phi_{(l;c,d)}(x) = \phi_{(0;c,d)}(\mu_l(x)) \cdot \left(\mu'_l(x)\right)^{-1} = \frac{\mu_l(x)\left[dr + (1-cr)\log\mu_l(x)\right]}{\left[\log_{(c,d)}(\mu_l(x)) + r\right]\left[d + c(1-cr)\log\mu_l(x)\right]}\, \prod_{j=0}^{l-1}\mu_j(x)\,, \qquad (47)$$

where $\mu_0(x) = x$.

The escort distribution, $\rho_{(l;c,d)}(p) = \phi_{(l;c,d)}(p)/(\phi_{(l;c,d)}(p) + \phi_{(l;c,d)}(1-p))$, corresponding to the two-event distribution $(p, 1-p)$, is depicted in Fig. 2(b) for various scaling exponents. Interestingly, for $D < (1;1)$, i.e., for entropies corresponding to sub-exponential sample space growth, the distribution emphasizes high probabilities (generally $p > 1/N$), while for $D > (1;1)$, i.e., for super-exponential growth, the distribution emphasizes low probabilities ($p < 1/N$). Let us finally show the asymptotic behavior of the curvature that corresponds to the deformed logarithm. It can be easily calculated if one keeps only the dominant contributions from each term in Eqs. (33) and (34). In this case we have

dnlog(l;c)(x)

dxn≈(µl(x))c−1µ0

l(x)x1−nfor x→ ∞ ,(48)

and therefore

$$r_{(l;c)}(W) \approx W\,\mu_l^{c-1}(W)\,\mu_l'(W) \quad \text{for } W\to\infty, \qquad (49)$$

6 Note that the original (c, d)-logarithm (as appearing in the rightmost part of Eq. (44)) was introduced in [18] for l = 0 and with c ↦ 1 − c. That choice of parametrization is, however, not so convenient for l > 0.

7 The Lambert W function is defined as the solution of the equation $W(z)\,e^{W(z)} = z$.
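For completeness, the defining equation of footnote 7 can be solved numerically for z ≥ 0 by Newton iteration (a self-contained sketch; in practice one would rather call scipy.special.lambertw):

```python
import math

def lambert_w(z, tol=1e-12):
    """Principal branch of the Lambert W function: solves w * exp(w) = z, z >= 0."""
    w = math.log1p(z)                            # rough initial guess
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (1.0 + w))   # Newton step for f(w) = w e^w - z
        w -= step
        if abs(step) < tol:
            break
    return w
```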


[Figure 2 shows two panels: (a) the generalized logarithms log_{(l;c,d)}(x) and (b) the escort distributions for the models RW, ARW, MC, RN and RWC; plots omitted.]

Fig. 2. (a) Generalized logarithms corresponding to the scaling exponents of the aforementioned models. (b) Escort distributions corresponding to the generalized logarithms. The scaling exponents (l; c, d) for the models are: random walk (RW): (1; 1, 0), ageing random walk (ARW): (1; 2, 0), magnetic coin model (MC): (1; 1, −1), random network (RN): (1; 1/2, 0), random walk cascade (RWC): (2; 1, 0).

[Figure 3 shows the characteristic length r_{(l;c,d)} as a function of N for the same models: (a) Amari radius, (b) Naudts radius; plots omitted.]

Fig. 3. Characteristic length corresponding to the curvature of the statistical manifold for the equiprobable distribution, for different scaling exponents. Panel (a) corresponds to the length of Amari type, panel (b) to the length of Naudts type.

for both curvatures, calculated from both types of metric, as shown in Appendix B. From this we deduce that

$$\lim_{W\to\infty} r(W) = \begin{cases} +\infty, & l = 0 \ \text{ or } \ l = 1,\ c > 1,\\ 0, & l \geq 2 \ \text{ or } \ l = 1,\ c < 1. \end{cases} \qquad (50)$$

For the case l = 1 and c = 1, we can make a similar approximation,

$$\frac{d^n \log_{(1;1,d)}(x)}{dx^n} \approx x^{-n}\left(\log(1+\log x)\right)^{d}, \qquad (51)$$

to get

$$r_{(1;1,d)}(W) \approx \left(\log(1+\log W)\right)^{d} \quad \text{for } W\to\infty. \qquad (52)$$

Similar results can be obtained for higher-order corrections. The behavior of r for different scaling vectors is depicted in Fig. 3. We see that the asymptotic behavior is similar for both types of curvature; the only differences appear for small N.

Similarly, we obtain the same behavior for higher-order corrections (see Appendix C). In conclusion, we find three distinct regimes for the statistical manifold with respect to the scaling vector:

(I) D < (1; 1) ⇔ r_D(W) → ∞ for W → ∞,
(II) D = (1; 1) ⇔ r_D(W) = 1 for all W > 0,
(III) D > (1; 1) ⇔ r_D(W) → 0 for W → ∞.
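The three regimes can be illustrated numerically with the leading-order expression $r_{(l;c)}(W) \approx W\,\mu_l^{c-1}(W)\,\mu_l'(W)$ (cf. Eq. (49)) — a sketch under that asymptotic approximation; function names are ours, and the chosen exponents follow the models of Fig. 2:

```python
import math

def mu(l, x):
    """Nested logarithm mu_l(x) = [1 + log]^(l)(x)."""
    for _ in range(l):
        x = 1.0 + math.log(x)
    return x

def r_char(W, l, c):
    """Leading-order characteristic length r_(l;c)(W) ~ W mu_l(W)^(c-1) mu_l'(W)."""
    prod = 1.0
    for k in range(l):                    # mu_l'(W) = 1 / prod_{k=0}^{l-1} mu_k(W)
        prod *= mu(k, W)
    return W * mu(l, W)**(c - 1) / prod

# (I)   sub-exponential, e.g. ageing random walk (1; 2): r grows with W
# (II)  exponential, random walk (1; 1): r equals 1 for every W
# (III) super-exponential, e.g. random network (1; 1/2): r shrinks with W
```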

As a result, the curvature exhibits a phase transition: in the thermodynamic limit the statistical manifold flattens for sub-exponential processes, has constant sectional curvature for exponential processes, and curves for super-exponential processes. While processes with exponentially growing sample space have (practically) independent sub-systems, sub-exponential processes impose restrictions and constraints on the sample space. Super-exponential processes are characterized by emergent structures in their sample space. The scaling vector plays the natural role of a set of order parameters. Let us finally note that the limit W → ∞ is performed for r_D(W). The “limit space” of the statistical manifolds, obtained for W → ∞, might not be a smooth manifold, and its curvature might not correspond to the limit lim_{W→∞} R_D(W).

5 Conclusions and Perspectives

In this paper, we have defined a class of deformed logarithms with a given scaling expansion in the framework of φ-deformed logarithms. The corresponding entropy can be used to define a statistical manifold with a generalized Fisher–Rao metric. We have shown that for the microcanonical ensemble in the thermodynamic limit, the scalar curvature exhibits a phase transition whose critical point is represented by the class of phenomena characterized by exponentially growing phase spaces. These include weakly interacting systems that are correctly described by Shannon entropy. The scaling vector of a given system naturally defines a set of order parameters. A possible explanation for this phenomenon is that the number of independent degrees of freedom grows more slowly than the size of the system for sub-exponential processes and faster for super-exponential processes. This classification, however, does not appear for the Fisher metric of Csiszár type, since the corresponding characteristic length is constant for every φ-deformation.

Contrary to the common approach in information geometry, where the statistical manifold corresponds to a single functional family of distributions (e.g., the exponential family), this paper presents a parametric way to switch between different functional families of distributions (e.g., from power laws to stretched exponentials). This opens a novel connection between parametric and non-parametric information geometry and makes it possible to classify different types of statistical manifolds related to various classes of deformed exponential families.

It will be natural to extend these results to generalizations of the Bregman divergence enabling gauge invariance [44]. Moreover, we will focus on applying the results to the canonical ensemble and on using the well-known results on the Fisher information metric on the thermodynamic manifold [45,46] for complex systems, where the generalized form of the Boltzmann factor is needed [47]. It should also be possible to go beyond equilibrium statistical mechanics and extend the generalized Fisher metric to non-equilibrium scenarios [48].

Acknowledgements

We acknowledge support from the Austrian Science Fund FWF project I 3073.


References

1. S. Thurner, R. Hanel and P. Klimek, Introduction to the Theory of Complex Systems. Oxford University Press: Oxford, UK (2018).
2. S. Thurner, B. Corominas-Murtra and R. Hanel, Three faces of entropy for complex systems: Information, thermodynamics, and the maximum entropy principle. Phys. Rev. E 96 (2017) 032124.
3. N. Ay, J. Jost, H.V. Le and L. Schwachhöfer, Information Geometry. Springer: Berlin, Germany (2017).
4. S.-I. Amari, Information Geometry and Its Applications. Springer, Japan (2016).
5. W. Janke, D.A. Johnston and R. Kenna, Information geometry and phase transitions. Physica A 336 (2004) 181–186.
6. C. Tsallis, Possible generalization of Boltzmann–Gibbs statistics. J. Stat. Phys. 52 (1988) 479–487.
7. C. Tsallis, M. Gell-Mann and Y. Sato, Asymptotically scale-invariant occupancy of phase space makes the entropy S_q extensive. Proc. Natl. Acad. Sci. USA 102 (2005) 15377–15382.
8. G. Kaniadakis, Statistical mechanics in the context of special relativity. Phys. Rev. E 66 (2002) 056125.
9. P. Jizba and T. Arimitsu, The world according to Rényi: Thermodynamics of multifractal systems. Ann. Phys. 312 (2004) 17–59.
10. C. Tsallis and L.J. Cirto, Black hole thermodynamical entropy. Eur. Phys. J. C 73 (2013) 2487.
11. T.S. Biró, V.G. Czinner, H. Iguchi and P. Ván, Black hole horizons can hide positive heat capacity. Phys. Lett. B 782 (2018) 228–231.
12. C.E. Shannon, A mathematical theory of communication. Bell Syst. Tech. J. 27 (1948) 379–423, 623–656.
13. A.I. Khinchin, Mathematical Foundations of Information Theory. Dover, New York (1957).
14. S. Abe, Axioms and uniqueness theorem for Tsallis entropy. Phys. Lett. A 271(1-2) (2000) 74–79.
15. V.M. Ilić and M.S. Stanković, Generalized Shannon–Khinchin axioms and uniqueness theorem for pseudo-additive entropies. Physica A 411 (2014) 138–145.
16. P. Tempesta, Group entropies, correlation laws, and zeta functions. Phys. Rev. E 84 (2011) 021121.
17. P. Jizba and J. Korbel, Maximum entropy principle in statistical inference: Case for non-Shannonian entropies. Phys. Rev. Lett. 122 (2019) 120601.
18. R. Hanel and S. Thurner, A comprehensive classification of complex statistical systems and an axiomatic derivation of their entropy and distribution functions. Europhys. Lett. 93 (2011) 20006.
19. R. Hanel and S. Thurner, When do generalized entropies apply? How phase space volume determines entropy. Europhys. Lett. 96 (2011) 50003.
20. C. Anteneodo and A.R. Plastino, Maximum entropy approach to stretched exponential probability distributions. J. Phys. A 32 (1999) 1089.
21. H.J. Jensen, R.H. Pazuki, G. Pruessner and P. Tempesta, Statistical mechanics of exploding phase spaces: Ontic open systems. J. Phys. A 51 (2018) 375002.
22. J. Korbel, R. Hanel and S. Thurner, Classification of complex systems by their sample-space scaling exponents. New J. Phys. 20 (2018) 093007.
23. E.T. Copson, Asymptotic Expansions. Cambridge Tracts in Mathematics, Cambridge University Press (1965).
24. J. Naudts, Deformed exponentials and logarithms in generalized thermostatistics. Physica A 316 (2002) 323–334.
25. J. Naudts, Generalised Thermostatistics. Springer Science & Business Media (2011).
26. J. Korbel, R. Hanel and S. Thurner, Information geometric duality of φ-deformed exponential families. Entropy 21 (2019) 112.
27. C. Beck and F. Schlögl, Thermodynamics of Chaotic Systems: An Introduction. Cambridge University Press (1995).
28. C. Beck and E.G.D. Cohen, Superstatistics. Physica A 322 (2003) 267–275.
29. C. Tsallis and A.M.C. Souza, Constructing a statistical mechanics for Beck–Cohen superstatistics. Phys. Rev. E 67 (2003) 026106.
30. R. Hanel, S. Thurner and M. Gell-Mann, Generalized entropies and logarithms and their duality relations. Proc. Natl. Acad. Sci. USA 109 (2012) 19151–19154.
31. S. Abe, Geometry of escort distributions. Phys. Rev. E 68 (2003) 031101.
32. A. Ohara, H. Matsuzoe and S.-I. Amari, A dually flat structure on the space of escort distributions. J. Phys. Conf. Ser. 201 (2010) 012012.
33. S.-I. Amari, A. Ohara and H. Matsuzoe, Geometry of deformed exponential families: Invariant, dually-flat and conformal geometries. Physica A 391 (2012) 4308–4319.
34. D.P.K. Ghikas and F.D. Oikonomou, Towards an information geometric characterization/classification of complex systems. I. Use of generalized entropies. Physica A 496 (2018) 384.
35. J. Naudts, Continuity of a class of entropies and relative entropies. Rev. Math. Phys. 16 (2004) 809–822.
36. F. Caruso and C. Tsallis, Nonadditive entropy reconciles the area law in quantum systems with classical thermodynamics. Phys. Rev. E 78 (2008) 021102.
37. J.A. Carrasco, F. Finkel, A. González-López, M.A. Rodríguez and P. Tempesta, Generalized isotropic Lipkin–Meshkov–Glick models: ground state entanglement and quantum entropies. J. Stat. Mech. Theor. Exp. 2016(3) (2016) 033114.
38. J. Zhang, Divergence function, duality, and convex analysis. Neural Comput. 16(1) (2004) 159–195.
39. S.-I. Amari, Differential-Geometrical Methods in Statistics. Lecture Notes in Statistics Vol. 28, Springer Science & Business Media (2012).
40. B. Corominas-Murtra, R. Hanel and S. Thurner, Understanding scaling through history-dependent processes with collapsing sample space. Proc. Natl. Acad. Sci. USA 112 (2015) 5348.
41. B. Corominas-Murtra, R. Hanel and S. Thurner, Extreme robustness of scaling in sample space reducing processes explains Zipf's law in diffusion on directed networks. New J. Phys. 18 (2016) 093010.
42. B. Corominas-Murtra, R. Hanel and S. Thurner, Sample space reducing cascading processes produce the full spectrum of scaling exponents. Sci. Rep. 7 (2017) 11223.
43. R. Hanel and S. Thurner, Maximum configuration principles for driven systems with arbitrary driving. Entropy 20 (2018) 838.
44. J. Naudts and J. Zhang, Rho–tau embedding and gauge freedom in information geometry. Inf. Geom. 1(1) (2018) 79–115.
45. G. Ruppeiner, Riemannian geometry in thermodynamic fluctuation theory. Rev. Mod. Phys. 67 (1995) 605.
46. G.E. Crooks, Measuring thermodynamic length. Phys. Rev. Lett. 99 (2007) 100602.
47. R. Hanel and S. Thurner, Derivation of power-law distributions within standard statistical mechanics. Physica A 351(2-4) (2005) 260–268.
48. S. Ito, Stochastic thermodynamic interpretation of information geometry. Phys. Rev. Lett. 121 (2018) 030605.


A Basic algebra of scaling vectors

Let us discuss some definitions of ordinary operations on the space of scaling exponents. First, let us introduce a truncated vector of the scaling vector defined in Eq. (10),

$$\mathcal{C}_k = \{l;\ c_0^{(l)},\ c_1^{(l)},\ \ldots,\ c_k^{(l)}\}, \qquad (53)$$

where k ≤ n. Then, we can introduce

where k≤n. Then, we can introduce

–Truncated equivalence relation: a(x)∼(k)b(x) if Ak≡ Bk

–Truncated inequality relation: a(x)≺(k)b(x) if Ak<Bk.

Let us also add one more inequality relation, namely for the case when even the order l is not equal. For this we define

– Strong inequality relation: $a(x) \ll b(x)$ if $l_a < l_b$.

Let us investigate representations of basic operations on the space of scaling exponents. Before that, let us define the rescaling of a general operator $O: \mathbb{R}^m \mapsto \mathbb{R}$ as

$$O^{(l)}(x_1, x_2, \ldots, x_m) = \exp^{(l)}\!\left[O\!\left(\log^{(l)} x_1,\ \log^{(l)} x_2,\ \ldots,\ \log^{(l)} x_m\right)\right]. \qquad (54)$$

Let us now denote the generalized addition as $a(x)\oplus_{(l)} b(x)$ and the generalized multiplication as $a(x)\otimes_{(l)} b(x)$. It is easy to show that

$$a(x)\otimes_{(l)} b(x) = a(x)\oplus_{(l+1)} b(x). \qquad (55)$$
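Identity (55) is easy to probe numerically. In the sketch below (names are ours), log^{(l)} denotes the l-fold iterated logarithm:

```python
import math

def nest(f, n, x):
    """Apply the function f to x exactly n times."""
    for _ in range(n):
        x = f(x)
    return x

def op_l(op, l, x, y):
    """Rescaled binary operator of Eq. (54): exp^(l)( op(log^(l) x, log^(l) y) )."""
    return nest(math.exp, l, op(nest(math.log, l, x), nest(math.log, l, y)))
```

For instance, op_l(lambda a, b: a * b, 1, x, y) and op_l(lambda a, b: a + b, 2, x, y) coincide, i.e., $x \otimes_{(1)} y = x \oplus_{(2)} y$.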

Let us now consider, without loss of generality, that $a(x) \prec b(x)$. The scaling vector $\mathcal{C}$ of $c(x) = a(x)\otimes_{(l)} b(x)$ can be expressed as follows:

$$\mathcal{C} = \begin{cases} \mathcal{A}+\mathcal{B} = (l;\ a_0+b_0,\ a_1+b_1,\ \ldots), & \text{for } l_a = l_b = l;\\ \mathcal{B}, & \text{for } l < l_a \leq l_b \ \text{or}\ l = l_a < l_b;\\ \text{undefined}, & \text{for } l > l_a. \end{cases} \qquad (56)$$

The scaling vector $\mathcal{C}$ of the generalized composition $c(x) = \exp^{(l)}\!\big(b(\log^{(l)} a(x))\big)$ can be expressed as

$$\mathcal{C} = \begin{cases} b_0^{(l_b)}\mathcal{A} = (l_a+l_b;\ a_0 b_0,\ a_1 b_0,\ a_2 b_0,\ \ldots,\ a_n b_0), & \text{for } l_a = l;\\ 1^{(l_b)}\mathcal{A} = (l_a+l_b;\ a_0,\ a_1,\ a_2,\ \ldots,\ a_n), & \text{for } l < l_a;\\ \text{undefined}, & \text{for } l > l_a. \end{cases} \qquad (57)$$

Finally, let us focus on the derivative of the scaling expansion. Let us denote the rescaled derivative operator as

$$^{(l)}D_x[f] = \exp^{(l)}\!\left(\frac{d\left(\log^{(l)} f(x)\right)}{dx}\right). \qquad (58)$$

The scaling vector corresponding to the rescaled derivative is

$$^{(l)}\mathcal{A}' = \begin{cases} (l_a;\ a_0-1,\ a_1,\ a_2,\ \ldots,\ a_n), & \text{for } l_a = l;\\ \mathcal{A}, & \text{for } l_a > l;\\ (l;\ \underbrace{-1,\ldots,-1}_{l-l_a},\ 0,\ \ldots), & \text{for } l_a < l. \end{cases} \qquad (59)$$
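The rescaled derivative of Eq. (58) can likewise be checked numerically. For f(x) = x^a and l = 1 one has d(log f(x))/dx = a/x, so $^{(1)}D_x[f] = e^{a/x}$ (a finite-difference sketch; names are ours):

```python
import math

def rescaled_derivative(f, l, x, h=1e-6):
    """(l)D_x[f] = exp^(l)( d(log^(l) f(x)) / dx ), via a central difference."""
    def log_l(t):
        for _ in range(l):
            t = math.log(t)
        return t
    d = (log_l(f(x + h)) - log_l(f(x - h))) / (2.0 * h)
    for _ in range(l):
        d = math.exp(d)
    return d
```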


B Asymptotic curvature of the (l; c, d)-logarithm

In this appendix, we calculate asymptotic properties of the (l; c, d)-logarithm. Let us first express the derivatives of the (l; c, d)-logarithm in terms of the (c, d)-logarithm and μ_l:

$$\log_{(l;c,d)}'(x) = \log_{(c,d)}'(\mu_l(x))\,\mu_l'(x), \qquad (60)$$

$$\log_{(l;c,d)}''(x) = \log_{(c,d)}''(\mu_l(x))\,(\mu_l'(x))^2 + \log_{(c,d)}'(\mu_l(x))\,\mu_l''(x), \qquad (61)$$

$$\log_{(l;c,d)}'''(x) = \log_{(c,d)}'''(\mu_l(x))\,(\mu_l'(x))^3 + 3\log_{(c,d)}''(\mu_l(x))\,\mu_l'(x)\,\mu_l''(x) + \log_{(c,d)}'(\mu_l(x))\,\mu_l'''(x). \qquad (62)$$

The derivatives of the nested logarithm $\mu_l(x) = [1+\log]^{(l)}(x)$ can be expressed as:

$$\mu_l'(x) = \frac{1}{\prod_{k=0}^{l-1}\mu_k(x)}, \qquad (63)$$

$$\mu_l''(x) = -\mu_l'(x)\sum_{k=0}^{l-1}\mu_k'(x) = -\frac{1}{\prod_{k=0}^{l-1}\mu_k(x)}\,\sum_{k=0}^{l-1}\frac{1}{\prod_{j=0}^{k}\mu_j(x)}, \qquad (64)$$

$$\mu_l'''(x) = -\mu_l''(x)\sum_{k=0}^{l-1}\mu_k'(x) - \mu_l'(x)\sum_{k=0}^{l-1}\mu_k''(x)
= \mu_l'(x)\sum_{j=0}^{l-1}\mu_j'(x)\sum_{k=0}^{l-1}\mu_k'(x) + \mu_l'(x)\sum_{k=0}^{l-1}\mu_k'(x)\sum_{j=0}^{k-1}\mu_j'(x)
= 2\,\mu_l'(x)\sum_{k=0}^{l-1}\mu_k'(x)\sum_{j=0}^{k-1}\mu_j'(x) + \mu_l'(x)\sum_{k=0}^{l-1}\mu_k'(x)\sum_{j=k}^{l-1}\mu_j'(x). \qquad (65)$$
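The closed form (63) can be verified against a central finite difference (a numerical sketch; names are ours):

```python
import math

def mu(l, x):
    """Nested logarithm mu_l(x) = [1 + log]^(l)(x)."""
    for _ in range(l):
        x = 1.0 + math.log(x)
    return x

def mu_prime(l, x):
    """Closed form of Eq. (63): mu_l'(x) = 1 / prod_{k=0}^{l-1} mu_k(x)."""
    prod = 1.0
    for k in range(l):
        prod *= mu(k, x)
    return 1.0 / prod
```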

Let us first denote $l_{c,d}(x) = \log_{(c,d)}(x) + r$. Then the derivatives of log_{(c,d)} can be expressed as (see also Ref. [26]):

$$\log_{(c,d)}'(x) = \frac{l_{c,d}(x)}{x\left(dr+(1-cr)\log x\right)}\left[d + c\,(1-cr)\log x\right], \qquad (66)$$

$$\log_{(c,d)}''(x) = \frac{l_{c,d}(x)}{x^2\left(dr+(1-cr)\log x\right)^2}\Big[d\left(d-dr-(cr-1)^2\right) + d\left(c^2(r-2)r+2c-1\right)\log x + (c-1)\,c\,(cr-1)^2\log^2 x\Big], \qquad (67)$$

$$\log_{(c,d)}'''(x) = \frac{l_{c,d}(x)}{x^3\left(dr+(1-cr)\log x\right)^3}\Big[d\left(3d(r-1)(cr-1)^2 - 2(cr-1)^3 + d^2\left(2r^2-3r+1\right)\right) + d\,(cr-1)\left(3c^3r^2 - 3c^2r(r+2) + c\left(d\left(-2r^2+6r-3\right)+6r+3\right) + d\,(3-4r) - 3\right)\log x - \left(d\left(3c^2(r-1)+c\,(6-4r)\right) - 2\right)(cr-1)^2\log^2 x - c\left(c^2-3c+2\right)(cr-1)^3\log^3 x\Big]. \qquad (68)$$
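Equation (66) can be checked against a numerical derivative of the (c, d)-logarithm (a standard-library sketch with r = 1; the test points are chosen so that all powers act on positive arguments):

```python
import math

def log_cd(x, c, d, r=1.0):
    """(c, d)-logarithm: r (x^c (1 + (1 - c r)/(d r) log x)^d - 1), for d != 0."""
    return r * (x**c * (1.0 + (1.0 - c * r) / (d * r) * math.log(x))**d - 1.0)

def dlog_cd(x, c, d, r=1.0):
    """First derivative according to Eq. (66)."""
    l = log_cd(x, c, d, r) + r      # l_{c,d}(x) = log_(c,d)(x) + r
    return l * (d + c * (1.0 - c * r) * math.log(x)) \
        / (x * (d * r + (1.0 - c * r) * math.log(x)))
```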

In the asymptotic limit, only the dominant contributions are relevant. Thus, let us consider only the dominant scaling c (i.e., take d = 0), and we get

$$\log_{(l;c,d)}'(x) = \log_{(c,d)}'(\mu_l(x))\,\mu_l'(x) \approx (\mu_l(x))^{c-1}\,\mu_l'(x), \qquad (69)$$

$$\log_{(l;c,d)}''(x) \approx \log_{(c,d)}'(\mu_l(x))\,\mu_l''(x) \approx -\frac{(\mu_l(x))^{c-1}\,\mu_l'(x)}{x}, \qquad (70)$$

$$\log_{(l;c,d)}'''(x) \approx \log_{(c,d)}'(\mu_l(x))\,\mu_l'''(x) \approx \frac{(\mu_l(x))^{c-1}\,\mu_l'(x)}{x^2}. \qquad (71)$$

Plugging these into Eqs. (33) and (34), we get

$$r^A_{(l;c)} \approx \frac{\left(\mu_l^{c-1}(x)\,\mu_l'(x)\right)^2\left(\frac{\mu_l^{c-1}(x)\,\mu_l'(x)}{x}\right)^3}{\left(\mu_l^{c-1}(x)\,\mu_l'(x)\,\frac{\mu_l^{c-1}(x)\,\mu_l'(x)}{x^2} - 3\left(\frac{\mu_l^{c-1}(x)\,\mu_l'(x)}{x}\right)^2\right)^2} \approx x\,\mu_l^{c-1}(x)\,\mu_l'(x), \qquad (72)$$

and

$$r^N_{(l;c)} \approx \frac{\left(\mu_l^{c-1}(x)\,\mu_l'(x)\right)^3}{x\left(\frac{\mu_l^{c-1}(x)\,\mu_l'(x)}{x}\right)^2} \approx x\,\mu_l^{c-1}(x)\,\mu_l'(x). \qquad (73)$$

Let us then focus on the case l = 1, c = 1. In this case the leading-order terms cancel, and we have to look at the first correction, given by the scaling exponent d. In this case,

$$\log_{(1;1,d)}'(x) \approx \frac{\left(\log(1+\log x)\right)^{d}}{x}, \qquad (74)$$

$$\log_{(1;1,d)}''(x) \approx \frac{\left(\log(1+\log x)\right)^{d}}{x^2}, \qquad (75)$$

$$\log_{(1;1,d)}'''(x) \approx \frac{\left(\log(1+\log x)\right)^{d}}{x^3}. \qquad (76)$$

So the curvature radius of both Amari and Naudts type can be asymptotically expressed as

$$r_{(1;1,d)} \approx \left(\log(1+\log x)\right)^{d}. \qquad (77)$$

C Fisher metric and scalar curvature corresponding to the general logarithm

Let us now show the full calculation of the scalar curvature corresponding to the Λ_D-logarithm with an arbitrary scaling vector D and constants r_j. Let us first recall the product rule for higher derivatives. The first three derivatives of a function $\Lambda_D(x) = R\left(\prod_{j=0}^{n}\lambda_j(x)-1\right)$ are:

$$\Lambda_D'(x) = R\prod_{j=0}^{n}\lambda_j(x)\ \sum_{j=0}^{n}\frac{\lambda_j'(x)}{\lambda_j(x)}, \qquad (78)$$

$$\Lambda_D''(x) = R\prod_{j=0}^{n}\lambda_j(x)\left(2\sum_{i<j}\frac{\lambda_i'(x)\,\lambda_j'(x)}{\lambda_i(x)\,\lambda_j(x)} + \sum_{j=0}^{n}\frac{\lambda_j''(x)}{\lambda_j(x)}\right), \qquad (79)$$

$$\Lambda_D'''(x) = R\prod_{j=0}^{n}\lambda_j(x)\left(6\sum_{i<j<k}\frac{\lambda_i'(x)\,\lambda_j'(x)\,\lambda_k'(x)}{\lambda_i(x)\,\lambda_j(x)\,\lambda_k(x)} + 3\sum_{i<j}\frac{\lambda_i''(x)\,\lambda_j'(x)+\lambda_i'(x)\,\lambda_j''(x)}{\lambda_i(x)\,\lambda_j(x)} + \sum_{i}\frac{\lambda_i'''(x)}{\lambda_i(x)}\right). \qquad (80)$$

The derivatives of λ_j can be expressed with the help of the function

$$L_j(x) = \frac{1}{\left(1+r_j\log\mu_{j+l-1}(x)\right)\prod_{k=0}^{j+l-1}\mu_k(x)} = \frac{\mu_{j+l}'(x)}{1+r_j\log\mu_{j+l-1}(x)}. \qquad (81)$$

Then we can express

$$\lambda_j'(x) = \lambda_j(x)\left(r_j d_j L_j(x)\right), \qquad (82)$$

$$\lambda_j''(x) = \lambda_j(x)\left(r_j^2 d_j^2 L_j^2(x) + r_j d_j L_j'(x)\right), \qquad (83)$$

$$\lambda_j'''(x) = \lambda_j(x)\left(r_j^3 d_j^3 L_j^3(x) + 3\,r_j^2 d_j^2 L_j'(x)\,L_j(x) + r_j d_j L_j''(x)\right). \qquad (84)$$


The derivatives of L_j(x) can be expressed as

$$L_j'(x) = -L_j^2(x)\left[r_j + \left(1+r_j\log\mu_{j+l-1}(x)\right)\sum_{k=0}^{l+j-1}\ \prod_{m=k+1}^{j+l-1}\mu_m(x)\right], \qquad (85)$$

$$L_j''(x) = -2L_j^3(x)\left[r_j + \left(1+r_j\log\mu_{j+l-1}(x)\right)\sum_{k=0}^{l+j-1}\ \prod_{m=k+1}^{j+l-1}\mu_m(x)\right]^2 - L_j^2(x)\left[r_j\sum_{k=0}^{l+j-1}\ \prod_{m=k+1}^{j+l-1}\mu_m(x) + \left(1+r_j\log\mu_{j+l-1}(x)\right)\sum_{k=0}^{j+l-1}\ \sum_{m=k+1}^{j+l-1}\frac{\prod_{p=k+1}^{j+l-1}\mu_p(x)}{\prod_{p'=0}^{m}\mu_{p'}(x)}\right]. \qquad (86)$$

We can finally rewrite the derivatives of Λ_D as

$$\frac{d}{dx}\Lambda_D(x) = R\prod_{j=0}^{n}\lambda_j(x)\ \sum_{j=0}^{n} C^1_j(x)\,L_j(x), \qquad (87)$$

$$\frac{d^2}{dx^2}\Lambda_D(x) = R\prod_{j=0}^{n}\lambda_j(x)\ \sum_{i=0}^{n}\sum_{j=0}^{n} C^2_{ij}(x)\,L_i(x)\,L_j(x), \qquad (88)$$

$$\frac{d^3}{dx^3}\Lambda_D(x) = R\prod_{j=0}^{n}\lambda_j(x)\ \sum_{i=0}^{n}\sum_{j=0}^{n}\sum_{k=0}^{n} C^3_{ijk}(x)\,L_i(x)\,L_j(x)\,L_k(x), \qquad (89)$$

where the coefficients C can be expressed as

$$C^1_i(x) = r_i d_i, \qquad (90)$$

$$C^2_{ij}(x) = r_i d_i\left[r_j d_j - \delta_{ij}\,A_j(x)\right], \qquad (91)$$

$$C^3_{ijk}(x) = r_i d_i\left[r_j d_j r_k d_k - \left(\delta_{ij}\,r_j d_j A_j(x) + \delta_{ik}\,r_k d_k A_k(x) + \delta_{jk}\,r_j d_j A_j(x)\right) - \delta_{ijk}\,B_i(x)\right], \qquad (92)$$

where

$$A_i(x) = r_i + \left(1+r_i\log\mu_{i+l-1}(x)\right)\sum_{k=0}^{i+l-1}\ \prod_{m=k+1}^{i+l-1}\mu_m(x), \qquad (93)$$

$$B_i(x) = 2A_i(x)^2 + L_i(x)\left[r_i\sum_{k=0}^{i+l-1}\ \prod_{m=k+1}^{i+l-1}\mu_m(x) + \left(1+r_i\log\mu_{i+l-1}(x)\right)\sum_{k=0}^{i+l-1}\ \sum_{m=k+1}^{i+l-1}\frac{\prod_{p=k+1}^{i+l-1}\mu_p(x)}{\prod_{p'=0}^{m}\mu_{p'}(x)}\right]. \qquad (94)$$

Finally, we plug the expressions for Λ_D and its derivatives into Eqs. (33) and (34), and end up with

$$r^A_D(x) = \frac{\left(\sum_i C^1_i(x)\,L_i(x)\right)^2\left(\sum_{kl} C^2_{kl}(x)\,L_k(x)\,L_l(x)\right)^3}{\left(\sum_{ijkl}\left[C^1_i(x)\,C^3_{jkl}(x) - 3\,C^2_{ij}(x)\,C^2_{kl}(x)\right]L_i(x)\,L_j(x)\,L_k(x)\,L_l(x)\right)^2}, \qquad (95)$$

and

$$r^N_D(x) = \frac{\left(\sum_i C^1_i(x)\,L_i(x)\right)^3}{x\left(\sum_{ij} C^2_{ij}(x)\,L_i(x)\,L_j(x)\right)^2}, \qquad (96)$$

evaluated at x = 1/W, respectively.