All content in this area was uploaded by Demetris Koutsoyiannis on Nov 24, 2017

Content may be subject to copyright.

Hurst-Kolmogorov dynamics as a result of extremal entropy production

Demetris Koutsoyiannis

Department of Water Resources and Environmental Engineering, National Technical

University of Athens, Greece (dk@itia.ntua.gr – http://www.itia.ntua.gr/dk)

Abstract. It is demonstrated that extremization of entropy production of stochastic

representations of natural systems, performed at asymptotic times (zero or infinity) results in

constant derivative of entropy in logarithmic time and, in turn, in Hurst-Kolmogorov

processes. The constraints used include preservation of the mean, variance and lag-1

autocovariance at the observation time step, and an inequality relationship between

conditional and unconditional entropy production, which is necessary to enable physical

consistency. An example with real world data illustrates the plausibility of the findings.

Keywords. Stochastic processes, Long-term persistence, Hurst-Kolmogorov dynamics,

Entropy, Uncertainty, Extremal entropy production

1. Introduction

Variational principles, such as the principle of extremal time for determining the path of light

(a generalization of Hero’s and Fermat’s principles) and the principle of extremal action for

determining the path of simple physical systems, are well established in physics. In complex

systems, entropy maximization is a principle that can determine the thermodynamic

equilibrium of a system. Historically, in classical thermodynamics, entropy was introduced as

a conceptual and rather obscure notion via a rather circular logic*. Later, within statistical

thermophysics, entropy has been given an austere and clear probabilistic meaning by

Boltzmann and Gibbs, while Shannon used essentially the same entropy definition to describe

the information content, which he also called entropy at von Neumann’s suggestion [1].

Despite having the same name, information entropy and thermodynamic entropy are

commonly regarded as two distinct notions†. One of the strongest proponents of this view is E.

T. Jaynes [2]‡, who, notably, introduced the principle of maximum entropy as a principle for

* Cf. the classical definition dS = dQ/T, where S, Q and T denote entropy, heat and temperature, respectively; the definition is applicable in a reversible process, which in turn is one in which dS = dQ/T.

† This was also suggested by a reviewer of this paper.

‡ Jaynes [2] wrote: “We must warn at the outset that the major occupational disease of this field is a persistent failure to distinguish between the information entropy, which is a property of any probability distribution, and the experimental entropy of thermodynamics, which is instead a property of a thermodynamic state as defined, for example by such observed quantities as pressure, volume, temperature, magnetization, of some physical system. They should never have been called by the same name; the experimental entropy makes no reference to any probability distribution, and the information entropy makes no reference to thermodynamics. Many textbooks and research papers are flawed fatally by the author's failure to distinguish between these entirely different things, and in consequence proving nonsense theorems”.

inference. However, this distinction has been disputed by others: the “information entropy [is]

shown to correspond to thermodynamic entropy” [1] and the two become precisely identical,

if Boltzmann’s constant is taken equal to unity. Perhaps the fact that the information entropy

is a dimensionless quantity, while the thermodynamic entropy has units (J/K), has been a

major obstacle to seeing that the two are identical notions*. But this is a historical accident,

related to the arbitrary introduction of temperature scales, which Boltzmann respected, thus

multiplying the probabilistic entropy by the constant bearing his name. This has been made

crystal clear now [3]† and it has been recognized that, given Shannon’s entropy definition, “in

principle we do not need a separate base unit for temperature, the kelvin” and in essence,

thermodynamic entropy, like the information entropy, is dimensionless [4].

Here we regard the thermodynamic and information entropy as identical notions, for which

we use the term Boltzmann-Gibbs-Shannon (BGS) entropy whose definition is given in

Equation (3) below. Likewise, we regard the principle of maximum entropy both as a physical

principle to determine equilibrium thermodynamic states of natural systems and as a logical

principle to make inference about natural systems (in an optimistic view that our logic in

making inference about natural systems could be consistent with the behavior of the natural

systems).

Application of the principle of maximum entropy is rather simple when time is not

involved in the problem studied, which is the case in systems in equilibrium. In studying

systems out of equilibrium, time should necessarily be involved [e.g., 5] and the paths of

the systems should be inferred. The more recently developed, within non-equilibrium

thermodynamics, extremal principles of entropy production attempt to predict the most likely

path of system evolution [6,7,8]. Central among the latter are Prigogine’s minimum entropy

production principle [9] and Ziegler’s maximum entropy production principle [10], which are

not in contradiction [11].

Extremization of entropy or its production rate involves a set of constraints, which impose

macroscopic physical restrictions in the probabilistic concept of entropy. For example, in the

kinetic theory of gases [e.g.,12], if f(p)dp is the number of molecules with momenta between

p and p + dp in a certain macroscopic volume, then the zeroth moment (the integral of f(p)

over the entire volume) is the total number of molecules and should be constant due to

* The fact that the kelvin is just a convention rather than a fundamental physical unit is reflected in the currently discussed proposal for redefining the kelvin scale indirectly, based on Boltzmann's constant k (specifically, the kelvin, unit of thermodynamic temperature, is such that the Boltzmann constant is exactly $1.380\,65 \times 10^{-23}$ J/K [4]); had k been chosen equal to one, without dimensions, the kelvin scale would not be needed at all.

† Atkins [3] writes: “Although Boltzmann’s constant k is commonly listed as a fundamental constant, it is actually only a recovery from a historical mistake. If Ludwig Boltzmann had done his work before Fahrenheit and Celsius had done theirs, then … we might have become used to expressing temperatures in the units of inverse joules… Thus, Boltzmann’s constant is nothing but a conversion factor between a well-established conventional scale and the one that, with hindsight, society might have adopted”.

conservation of mass. Also, the first moment is constrained to a constant value due to the law

of conservation of (macroscopic) momentum, and the second marginal moment is constrained

by the law of conservation of energy (since $\|p\|^2$ is proportional to the kinetic energy).

Likewise, joint moments of momenta represent deviatoric stresses [13].

In studying very complex systems, e.g. the Earth climatic system or parts thereof, it may be

difficult to establish a relationship of macroscopic and microscopic quantities. However, the

entropy can still be determined in terms of the probability density function of merely

macroscopic quantities, and constraints in entropy extremization could also be formulated on

the basis of assigning a conceptual meaning to macroscopic properties (moments) of the

process of interest. The values of these moments can be determined from a statistical sample,

rather than deduced from fundamental physical laws. Thus, the first and second marginal

moments (mean and variance) represent a mean level of the process and the intensity of

fluctuations around the mean, respectively, whereas a joint moment of time lagged variables

(autocovariance) represents time dependence. In natural processes, for short lags, the

dependence (expressed in terms of autocovariance) must necessarily be positive because

values at adjacent times should be close to each other.

In geophysical processes, the available observation records are usually short and

restrict the reliable statistical estimation to the lowest marginal moments (mean, variance) and

the smallest lags for autocovariances. The asymptotic stochastic properties of the processes,

i.e. the marginal distribution tails and the tails of the autocovariance functions, can hardly be

estimated from available records. However, these asymptotic properties are crucial for the

quantification of future uncertainty, as well as for planning and design purposes. This work

tries to connect statistical physics (the extremal entropy production concept, in particular)

with stochastic representations of natural processes, which are used as modeling tools for planning and design purposes. Extremal entropy production may provide a theoretical

background in such stochastic representations, which are otherwise solely data-driven. The

focus of the study is the asymptotic behavior of the autocovariance function. The principal

question is whether the autocovariance decays following an exponential or a power-type law.

The two different behaviors are also known as short-term and long-term persistence,

respectively. A typical representative of the former behavior is a Markov process, while one

of the latter is the Hurst-Kolmogorov (HK) process (named so after Hurst [14], who pioneered

its detection in geophysical time series, and Kolmogorov [15], who proposed the

mathematical process), else known as fractional Gaussian noise [16]. Both these processes

will be fully described below.

2. Basic notions

Let $\underline{x}_i^{\Delta}$ be a stationary stochastic process at discrete time i = 0, 1, … (where time 0 denotes the present), representing a natural process $x_i^{\Delta}$. (Notice that underlined quantities denote random variables, whereas regular mathematical quantities are not underlined.) As natural processes evolve in continuous time, it is natural to assume that $x_i^{\Delta}$ are stationary increments, at time steps of equal size ∆, of a cumulative process z(t) in continuous time t, i.e., $x_i^{\Delta} := z(i\Delta) - z((i-1)\Delta)$. Observing the natural process $x_i^{1}$ at time step, say, ∆ = 1 we can estimate from a statistical sample the following quantities:

$$E[x_i^{1}] =: \mu, \quad \mathrm{Var}[x_i^{1}] := E[(x_i^{1} - \mu)^2] =: \gamma, \quad \mathrm{Cov}[x_i^{1}, x_{i+1}^{1}] := E[(x_i^{1} - \mu)(x_{i+1}^{1} - \mu)] = \rho\gamma \qquad (1)$$

where E[ ] denotes expectation and ρ is the lag-1 autocorrelation coefficient. The three

equalities in (1) form the simplest possible set of constraints for multivariate entropy

extremization. Alternatively, these constraints could be formulated in terms of the cumulative

process z(t) (assuming z(0) = 0) as:

E[z(1)] = µ, Var[z(1)] = γ, Var[z(2)] = 2(1 + ρ) γ (2)
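The step from (1) to (2) can be sketched in a few lines. With z(0) = 0, z(1) is the first increment and z(2) is the sum of the first two, so that Var[z(2)] = 2γ + 2ργ. The function names below are illustrative only and are not part of the original study.

```python
def cumulative_constraints(mu, gamma, rho):
    """Return (E[z(1)], Var[z(1)], Var[z(2)]) implied by the moments of Eq. (1)."""
    mean_z1 = mu            # z(1) is the first increment
    var_z1 = gamma
    # Var[x_1 + x_2] = Var[x_1] + Var[x_2] + 2 Cov[x_1, x_2]
    var_z2 = 2.0 * (1.0 + rho) * gamma
    return mean_z1, var_z1, var_z2


def lag1_from_cumulative(var_z1, var_z2):
    """Invert the third constraint of Eq. (2) to recover the lag-1 autocorrelation."""
    return var_z2 / (2.0 * var_z1) - 1.0
```

The inverse mapping is included because later sections work directly with the variance function γ(t) of the cumulative process rather than with the autocovariance.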

The problem we investigate here is the determination of the multivariate (joint) distribution

of the process $x_i^{\Delta}$ (or equivalently, z(t)) by extremizing a proper entropic metric. It is known

[e.g., 17] that, for a single random variable x (rather than a stochastic process), extremization

of the BGS entropy using the first two of the above constraints results in the normal

(Gaussian) distribution of x. Also, for many random variables $x_i$ with given correlation matrix,

the BGS entropy extremizing distribution is the multivariate Gaussian [17]. We note that the

BGS entropy definition may be insufficient for certain phenomena, where generalized

entropies are necessary to consider, which result in different entropy extremizing distributions

[18,19]; again, however, an appropriate nonlinear transformation (derived from the

generalized entropy definition) may make the process Gaussian. For simplicity, in this study

we adhere to the BGS entropy and use the above classical results to assert that the joint

distribution of z(t) (as well as $x_i^{\Delta}$) of any order is Gaussian. The focus of the study is not the marginal distribution but the dependence structure (autocovariance) of $x_i^{\Delta}$ and this can be

readily inferred if the variance of z(t) is known for any time t. We recall that the BGS entropy

of a single random variable z with probability density function f(z) is defined as*:

$$\Phi[z] := E[-\ln f(z)] = -\int_{-\infty}^{\infty} f(z) \ln f(z)\, dz \qquad (3)$$

and that for a Gaussian variable the entropy depends on its variance γ only and is given as

[17]

Φ[z] = (1/2) ln(2πe γ) (4)
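As a numerical illustration of (3) and (4), the following sketch compares the closed-form Gaussian entropy with a direct Riemann-sum evaluation of the integral in (3). The integration width and grid size are arbitrary choices made here for illustration.

```python
import math
import numpy as np


def gaussian_entropy(variance):
    """Closed-form BGS entropy of a Gaussian variable, Eq. (4)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


def entropy_by_quadrature(variance, half_width=12.0, n=200001):
    """Approximate Eq. (3) for a zero-mean Gaussian density by a Riemann sum."""
    sd = math.sqrt(variance)
    z = np.linspace(-half_width * sd, half_width * sd, n)
    f = np.exp(-0.5 * (z / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
    dz = z[1] - z[0]
    return float(np.sum(-f * np.log(f)) * dz)
```

The mean does not enter either function, which reflects the statement above that the Gaussian entropy depends on the variance only.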

* Here we have used the older notation of entropy by Φ instead of the newer S and H, which in stochastics are reserved for standard deviation and Hurst coefficient, respectively. The quantity ln f(z) in (3) is meant, more precisely, as ln [f(z)/m(z)] (e.g. Ref. [2], p. 375) where m(z) is meant as the Lebesgue density, i.e. equal to 1 times the unit $[z]^{-1}$, so that Φ[z] is dimensionless.

Furthermore, for the stochastic process z(t) we can define entropy production as the time

derivative Φ΄[z(t)] := dΦ[z(t)]/dt. For a Gaussian process, by virtue of (4), the entropy

production is

$$\Phi'[z(t)] = \frac{d\gamma(t)/dt}{2\gamma(t)} \qquad (5)$$

We will use the notion of extremal entropy production to infer the variance of z(t), which in

turn determines the entire dependence structure of $x_i^{\Delta}$, using the constraints (2) (or (1)). We

clarify that, while other studies make an assumption on the type of dependence (e.g. long-range dependence [20]), here we attempt to derive, rather than assume, the dependence

structure.

3. Methodology

We start our investigation from a linear Markovian process in continuous time, because of its

simplicity and mathematical convenience, also assuming (without loss of generality) zero

mean µ (we will remove this assumption in section 5). We note that any dependence structure

of a stochastic process can be obtained by the sum of an infinite number of Markovian

processes independent of each other [cf. 21]. Furthermore, the sum of a finite number of

independent Markovian processes can provide good approximations of any dependence

structure on a finite range of time scales. For example, in practical problems, an HK process

can be approximated using as few as three Markovian processes [22].

A linear Markovian process q(t) in continuous time, else known as Ornstein–Uhlenbeck

(OU) process [17,23], with zero mean is defined by

(1/k) dq + q dt = σ v dt (6)

where k > 0 and σ > 0 are parameters, and v is the white noise process (i.e., v dt = dw where w

is the Wiener process) with mean E[v(t)] = 0 and covariance function Cov[v(t), v(s)] = δ(t – s),

where δ( ) is the Dirac delta function. The quantity 1/k has units of time and represents a characteristic time scale of the process, whereas the quantity $\sigma^2$ represents a characteristic variance. The linear differential equation (6) is easily solved to give

$$q(t) = q(0)\, e^{-kt} + k\sigma\, e^{-kt} \int_0^t v(\xi)\, e^{k\xi}\, d\xi \qquad (7)$$

and has mean and variance, respectively,

$$E[q(t)] = e^{-kt}\, E[q(0)], \quad \mathrm{Var}[q(t)] = \mathrm{Var}[q(0)]\, e^{-2kt} + (k\sigma^2/2)\,(1 - e^{-2kt}) \qquad (8)$$

The cumulative OU process is

$$z(t) = \int_0^t q(\xi)\, d\xi = q(0)\,\frac{1 - e^{-kt}}{k} + \sigma \int_0^t v(\xi)\left[1 - e^{-k(t - \xi)}\right] d\xi \qquad (9)$$

and has mean and variance, respectively,

$$E[z(t)] = E[q(0)]\,\frac{1 - e^{-kt}}{k} \qquad (10)$$

$$\mathrm{Var}[z(t)] = \mathrm{Var}[q(0)]\,\frac{(1 - e^{-kt})^2}{k^2} + \sigma^2\,\frac{2kt + 1 - (2 - e^{-kt})^2}{2k} \qquad (11)$$

Two cases are most interesting to consider: the unconditional case, in which no initial

condition is known, and the conditional case where the present and the entire past of q(t), t ≤

0, are known (observed). In the former case, stationarity in q(t) demands that E[q(t)] = Ε[q(0)]

= 0 and Var[q(t)] = Var[q(0)]. Substituting this into (8), we find Var[q(0)] = $k\sigma^2/2$ =: ω, so

that in the unconditional (stationary) case we have (by virtue of (11); see also [17]):

$$E[q(t)] = E[z(t)] = 0, \quad \mathrm{Var}[q(t)] = \omega, \quad \gamma(t) := \mathrm{Var}[z(t)] = \frac{2(kt + e^{-kt} - 1)}{k^2}\,\omega \qquad (12)$$

In the conditional case, since the process is Markovian, only the most recent observed value $\underline{q}(0) = q(0)$ matters in conditioning on the past, and also Var[q(0)] = 0. Thus,

$$E[q(t)|q(0)] = q(0)\, e^{-kt}, \quad E[z(t)|q(0)] = q(0)\,\frac{1 - e^{-kt}}{k}, \quad \mathrm{Var}[q(t)|q(0)] = (1 - e^{-2kt})\,\omega,$$

$$\gamma_C(t) := \mathrm{Var}[z(t)|q(0)] = \frac{2kt + 1 - (2 - e^{-kt})^2}{k^2}\,\omega \qquad (13)$$

By means of γ(t) and $\gamma_C(t)$ in (12) and (13) and by virtue of (4) we can determine, at any

time t, the unconditional and conditional entropy of z(t). Likewise, by virtue of (5), we can

determine the unconditional and conditional entropy production, which are, respectively,

$$\Phi'[z(t)] = \frac{k(1 - e^{-kt})}{2(kt + e^{-kt} - 1)}, \quad \Phi'_C[z(t)] = \frac{k(1 - e^{-kt})^2}{2kt + 1 - (2 - e^{-kt})^2} \qquad (14)$$

We observe that in both cases the entropy production is proportional to k, the inverse

characteristic time scale, depends on the dimensionless time kt, and is independent of the

characteristic variance ω. We also observe that the limit of both $\Phi'[z(t)]$ and $\Phi'_C[z(t)]$ for t → 0 is ∞ and for t → ∞ is 0. Consequently, any composite process determined as a sum of

Markovian processes will also have the same limits. This does not enable comparison of the

asymptotic behaviour of different processes (dependence structures). On the other hand, the

asymptotic values of entropy production for t → 0 and ∞ are the most important for model

comparisons and eventually for extremization of entropy production. Otherwise, comparisons

of different models at any specified finite time t would involve a degree of subjectivity and

arbitrariness for the choice of t. In a similar situation, Koutsoyiannis [24] used an average

entropy over a range of time scales.

Here, we follow a different method to tackle the infinite and zero entropy production. We

first observe that for any time t an inequality relationship between any two stochastic

processes 1 and 2, i.e. $\Phi'[z_1(t)] \le \Phi'[z_2(t)]$, is preserved if we replace entropy production Φ΄ with the following quantity, which we will call entropy production in logarithmic time (EPLT):

$$\varphi[z(t)] := \Phi'[z(t)]\, t \equiv \frac{d\Phi[z(t)]}{d(\ln t)} \qquad (15)$$

Therefore, extremization of entropy production is equivalent to extremization of EPLT. In the

OU model the unconditional and conditional EPLTs are, respectively,

$$\varphi[z(t)] = \frac{kt\,(1 - e^{-kt})}{2(kt + e^{-kt} - 1)}, \quad \varphi_C[z(t)] = \frac{kt\,(1 - e^{-kt})^2}{2kt + 1 - (2 - e^{-kt})^2} \qquad (16)$$

These quantities depend on the dimensionless time kt only and their limits are finite, i.e. $\varphi[z(0)] = 1$, $\varphi_C[z(0)] = 3/2$, and $\varphi[z(\infty)] = \varphi_C[z(\infty)] = 1/2$.
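The finite limits of the OU EPLTs can be checked numerically with a short sketch of (16). The function names are illustrative; `numpy.expm1` is used only to limit rounding error at small kt, and the conditional denominator is algebraically rearranged for the same reason.

```python
import numpy as np


def ou_eplt_unconditional(kt):
    """Unconditional EPLT of the cumulative OU process, first expression in Eq. (16)."""
    kt = np.asarray(kt, dtype=float)
    one_minus_exp = -np.expm1(-kt)                 # 1 - exp(-kt)
    return kt * one_minus_exp / (2.0 * (kt + np.expm1(-kt)))


def ou_eplt_conditional(kt):
    """Conditional EPLT of the cumulative OU process, second expression in Eq. (16)."""
    kt = np.asarray(kt, dtype=float)
    one_minus_exp = -np.expm1(-kt)
    # 2kt + 1 - (2 - exp(-kt))^2 equals 2kt + 4*expm1(-kt) - expm1(-2kt)
    denom = 2.0 * kt + 4.0 * np.expm1(-kt) - np.expm1(-2.0 * kt)
    return kt * one_minus_exp ** 2 / denom
```

Evaluating both functions at, say, kt = 0.001 and kt = 1000 gives values close to the limits 1, 3/2 and 1/2 quoted above.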

As stated above, we can materialize a stochastic process with an arbitrary dependence structure as the sum of a number m of OU processes independent of each other, i.e. $z(t) := \sum_{j=1}^{m} z_j(t)$, where $z_j(t)$ has characteristic variance $\omega_j$ and characteristic time scale $1/k_j$. The variances of z(t) will then be

$$\gamma(t) = \sum_{j=1}^{m} \frac{2(k_j t + e^{-k_j t} - 1)\,\omega_j}{k_j^2}, \quad \gamma_C(t) = \sum_{j=1}^{m} \frac{[2k_j t + 1 - (2 - e^{-k_j t})^2]\,\omega_j}{k_j^2} \qquad (17)$$

and the EPLTs

$$\varphi[z(t)] = \frac{t}{\gamma(t)} \sum_{j=1}^{m} \frac{(1 - e^{-k_j t})\,\omega_j}{k_j}, \quad \varphi_C[z(t)] = \frac{t}{\gamma_C(t)} \sum_{j=1}^{m} \frac{(1 - e^{-k_j t})^2\,\omega_j}{k_j} \qquad (18)$$

Clearly, the characteristic variances $\omega_j$ are weights of each of the constituents $z_j(t)$ of the composite process z(t) and the EPLTs of z(t) depend in a nonlinear manner on all these weights.
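A minimal sketch of (17) and (18) for a composite of independent OU processes follows. The function names and argument order are illustrative assumptions; the formulas are those given above, written with `numpy.expm1` to reduce cancellation for small k_j t.

```python
import numpy as np


def composite_variances(t, k, omega):
    """Unconditional and conditional variances of z(t), Eq. (17)."""
    k = np.asarray(k, dtype=float)
    omega = np.asarray(omega, dtype=float)
    x = k * t
    gamma = np.sum(2.0 * (x + np.expm1(-x)) * omega / k ** 2)
    # 2x + 1 - (2 - exp(-x))^2 equals 2x + 4*expm1(-x) - expm1(-2x)
    gamma_c = np.sum((2.0 * x + 4.0 * np.expm1(-x) - np.expm1(-2.0 * x)) * omega / k ** 2)
    return float(gamma), float(gamma_c)


def composite_eplt(t, k, omega):
    """Unconditional and conditional EPLTs of z(t), Eq. (18)."""
    k = np.asarray(k, dtype=float)
    omega = np.asarray(omega, dtype=float)
    gamma, gamma_c = composite_variances(t, k, omega)
    one_minus_exp = -np.expm1(-k * t)              # 1 - exp(-k_j t)
    phi = t / gamma * np.sum(one_minus_exp * omega / k)
    phi_c = t / gamma_c * np.sum(one_minus_exp ** 2 * omega / k)
    return float(phi), float(phi_c)
```

Because each EPLT is a ratio of two sums that are linear in the weights, rescaling all ω_j by a common factor leaves both EPLTs unchanged; only the relative weights matter.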

4. Application

The EPLT extremization problem is formulated as

$$\text{extremize } \varphi[z(t)] \text{ or } \varphi_C[z(t)] \text{ for } t \to 0 \text{ or } t \to \infty, \quad \text{subject to the constraints of Equation (2)} \qquad (19)$$

The control variables (unknown quantities) to be determined are the weights $\omega_j$ for specified inverse characteristic time scales $k_j$. By appropriate choice of $k_j$, analytical solutions may be

possible, but here we tackle the optimization problem numerically. The numerical procedure

is quite general, so that it can host more constraints than those contained in (2) (e.g. more than

one autocovariance value), but here we adhere to the simplest possible set of constraints for

parsimony. At the same time, the framework is quite simple and, thus, the extremization could

be performed by widespread all-purpose optimization tools (here we used the generalized

reduced gradient tool by Frontline Systems, v3.5, implemented as a Solver in Excel).

In our numerical framework we investigate times t in the range $2^{-10}$ to $2^{10}$ (a range spanning approximately 6 orders of magnitude) and we use inverse characteristic time scales $k_j$ in the range $k_{\min} = k_1 = 2^{-15}$ to $k_{\max} = 2^{15}$, with $k_j = 2k_{j-1}$ (a range spanning approximately 9 orders of magnitude). The unknown quantities are the m = 31 characteristic variances $\omega_j$. These are determined by extremizing (maximizing or minimizing) the EPLTs for the lowest and highest times of the range considered ($2^{-10}$ to $2^{10}$). These are supposed to approximate the limits for t → 0 and t → ∞, respectively. We use the three equality constraints in (2) and, to force determination of a limit rather than a local optimum for time $2^{-10}$ or $2^{10}$, we set an additional inequality constraint that the standard deviation of the EPLT values at five consecutive time scales nearest to $2^{-10}$ or $2^{10}$ is lower than a small value ε.
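The ingredients of this numerical setup can be sketched as follows. This is not the Excel Solver implementation used in the study; it only builds the grid of k_j, the equality-constraint residuals of (2), and the flatness measure used as the additional inequality constraint. The power-of-two spacing of the time grid and all function names are assumptions made for illustration, and only the unconditional EPLT is shown.

```python
import numpy as np

# Inverse characteristic time scales k_j = 2 k_{j-1}, from 2^-15 to 2^15 (m = 31).
K = 2.0 ** np.arange(-15, 16)
# Assumption: the times examined are also spaced in powers of two, 2^-10 to 2^10.
T_GRID = 2.0 ** np.arange(-10, 11)


def gamma_unconditional(t, omega, k=K):
    """Unconditional variance of the composite process, first sum in Eq. (17)."""
    x = k * t
    return float(np.sum(2.0 * (x + np.expm1(-x)) * omega / k ** 2))


def eplt_unconditional(t, omega, k=K):
    """Unconditional EPLT of the composite process, first expression in Eq. (18)."""
    rate = float(np.sum(-np.expm1(-k * t) * omega / k))
    return t * rate / gamma_unconditional(t, omega, k)


def constraint_residuals(omega, gamma_target, rho_target, k=K):
    """Residuals of the variance constraints in Eq. (2); the zero-mean one is trivial."""
    g1 = gamma_unconditional(1.0, omega, k)
    g2 = gamma_unconditional(2.0, omega, k)
    return np.array([g1 - gamma_target,
                     g2 - 2.0 * (1.0 + rho_target) * gamma_target])


def end_flatness(omega, end="high", k=K):
    """Standard deviation of the EPLT at the five times nearest one end of the range."""
    times = T_GRID[-5:] if end == "high" else T_GRID[:5]
    return float(np.std([eplt_unconditional(t, omega, k) for t in times]))
```

Any general-purpose constrained optimizer could then be asked to extremize `eplt_unconditional` at `T_GRID[0]` or `T_GRID[-1]` over nonnegative `omega`, with `constraint_residuals` set to zero and `end_flatness` kept below ε.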

We performed several numerical optimizations of this type and we generalized the results

which proved to be very simple. An example is shown in Figure 1 in terms of the extremizing

weights $\omega_j$ for each inverse characteristic time scale $k_j$, and in Figure 2 in terms of the resulting EPLTs. This example corresponds to constraints µ = 0, γ = 1 and ρ = 0.543; the latter value was chosen so as to correspond to an OU process with k = 1. Two characteristic processes emerge by EPLT extremization. The first is a single OU process, in which all $\omega_j$ are zero except one, namely that corresponding to k = 1. The single OU process maximizes both EPLTs for t → 0, with $\varphi[z(0)] = 1$ and $\varphi_C[z(0)] = 3/2$. The same process is the one that minimizes both EPLTs for t → ∞, with $\varphi[z(\infty)] = \varphi_C[z(\infty)] = 1/2$.
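The correspondence between ρ = 0.543 and an OU process with k = 1 can be verified from (12) together with the third constraint in (2), which gives ρ = γ(2)/(2γ(1)) – 1 at unit time step. The sketch below is illustrative; the characteristic variance ω cancels in the ratio and is therefore set to 1.

```python
import math


def ou_cumulative_variance(t, k, omega=1.0):
    """Unconditional variance of the cumulative OU process, Eq. (12)."""
    return 2.0 * (k * t + math.exp(-k * t) - 1.0) * omega / k ** 2


def ou_lag1_autocorrelation(k, delta=1.0):
    """Lag-1 autocorrelation of the OU increments at time step delta, via Eq. (2)."""
    g1 = ou_cumulative_variance(delta, k)
    g2 = ou_cumulative_variance(2.0 * delta, k)
    return g2 / (2.0 * g1) - 1.0
```

For k = 1 and unit time step the function returns a value that rounds to 0.543, in agreement with the constraint used in the example.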

The second extremizing process, denoted in the figures as the Hybrid process, has again all $\omega_j$ zero except two, i.e. those corresponding to $k_{\min} = 2^{-15}$ and $k_{\max} = 2^{15}$. This minimizes both EPLTs for t → 0, with $\varphi[z(0)] = \varphi_C[z(0)] = 1/2$, and maximizes EPLTs for t → ∞, with $\varphi[z(\infty)] = 1$ and $\varphi_C[z(\infty)] > 1$.

Figure 1 Entropy extremizing characteristic variances (weights of the composite processes) $\omega_j$ versus the inverse characteristic time scales $k_j$, as obtained by the numerical framework for the three solutions discussed in text. [Plot not reproduced: logarithmic axes $k_j$ and $\omega_j$; series Markov, Hybrid and HK.]

Figure 2 Entropy production in logarithmic time φ(t) ≡ φ[z(t)] versus time t in the three extremizing solutions discussed in text and shown in Figure 1; (upper) as determined by the numerical framework; (lower) analytical models. [Plots not reproduced: both panels show φ(t), from 0 to 2, against t, from 0.001 to 1000 on a logarithmic axis, for the curves Markov, unconditional; Markov, conditional; Hybrid, unconditional; Hybrid, conditional; HK, unconditional+conditional.]

However, it is readily understood that the latter numerical solution is artificially affected because of the use of finite minimum and maximum scales. We can thus infer that the precise analytical solution would be the one in which $k_{\min} \to 0$ and $k_{\max} \to \infty$. It can be derived from (12) and (13) that, as $k_{\max} \to \infty$, $\gamma(t) = \gamma_C(t) \to 2\omega_{\max} t/k_{\max} = at$ (where $a = 2\omega_{\max}/k_{\max}$), whereas, as $k_{\min} \to 0$, $\gamma(t) \to \omega_{\min} t^2 = bt^2$ (where $b = \omega_{\min}$) and $\gamma_C(t) = 0$. Thus, in the analytical extremizing solution, the variance of z(t) should be the sum $\gamma(t) = at + bt^2$ and the conditional variance should be $\gamma_C(t) = at$. Given constraints (2), we can calculate the parameters a and b as $a = 2\gamma(1) - \gamma(2)/2 = (1 - \rho)\gamma$ and $b = -\gamma(1) + \gamma(2)/2 = \rho\gamma$. Its EPLTs are thus $\varphi[z(t)] = (1 - \rho + 2\rho t)/(2 - 2\rho + 2\rho t)$ (so that $\varphi[z(0)] = 1/2$ and $\varphi[z(\infty)] = 1$), and $\varphi_C[z(t)] = 1/2$ (constant). These analytical solutions have been depicted in Figure 2 (lower panel), where it can be seen that they are indistinguishable from the numerical ones except for $\varphi_C[z(t)]$ for very large times ($t > 2^5$), for which the numerical solution indicates values > 1/2 while the analytical solution is exactly 1/2.
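The analytical Hybrid solution can be written out directly. The sketch below computes a and b from the constraints (2) and evaluates the unconditional EPLT; the function names are illustrative.

```python
def hybrid_parameters(gamma, rho):
    """Coefficients of gamma(t) = a*t + b*t**2 implied by the constraints in Eq. (2)."""
    gamma_1 = gamma                          # Var[z(1)]
    gamma_2 = 2.0 * (1.0 + rho) * gamma      # Var[z(2)]
    a = 2.0 * gamma_1 - gamma_2 / 2.0        # equals (1 - rho) * gamma
    b = -gamma_1 + gamma_2 / 2.0             # equals rho * gamma
    return a, b


def hybrid_variance(t, gamma, rho):
    """Unconditional variance of z(t) in the analytical Hybrid solution."""
    a, b = hybrid_parameters(gamma, rho)
    return a * t + b * t ** 2


def hybrid_eplt(t, rho):
    """Unconditional EPLT of the Hybrid solution; the conditional one is constant at 1/2."""
    return (1.0 - rho + 2.0 * rho * t) / (2.0 - 2.0 * rho + 2.0 * rho * t)
```

The EPLT does not depend on γ, since a and b are both proportional to it; only ρ controls how quickly φ moves from 1/2 towards 1.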

Are these mathematically optimal results physically meaningful? It seems that both have

problems. The OU process heavily depends on the time scale (here assumed equal to 1) in

which the constraints in (2) were formulated; a different OU process would perhaps be

obtained if the three constraints were formulated at a different time scale. The problem here

emerges because the constraints were quantified based on induction (observational data as

explained in the introduction) rather than deduction. It is not reasonable to expect that the

actual dependence structure of a process might be affected by the choice of the time scale of

observation. The Hybrid process has a more fundamental problem: As shown in Figure 2, the

conditional entropy production is lower than the unconditional one at all times t. This

contradicts physical realism: Naturally, by observing the present state of a process (at t = 0),

the future entropy is reduced. In other words, the conditional entropy should be lower than the

unconditional for t > 0, whereas as t → ∞ conditional and unconditional entropies should tend

to be equal. However, this cannot happen if the entropy production is consistently lower in the

conditional than in the unconditional case. The problem arises here because of the fact that

noises, such as the constituents of the Hybrid process, are just mathematical constructs,

whereas Nature does not produce noises—rather it produces uncertainty, i.e. entropy [25]. As

will be seen in section 5, the Hybrid process has additional problems related to its

observability.

For this reason, to reinstate physical plausibility we impose an inequality constraint

$\varphi_C[z(t)] \ge \varphi[z(t)]$, in addition to the equality constraints (2). In this case, the Hybrid model is replaced by one in which the inequality constraint becomes binding. Our numerical framework resulted in an extremizing solution whose weights $\omega_j$ are all nonzero as shown in Figure 1 (the solution marked as “HK”) and whose resulting EPLTs are shown in Figure 2. The same solution is extremizing for both t → 0 and t → ∞ and for both the conditional and unconditional cases. In this solution,

$$\varphi_C[z(t)] = \varphi[z(t)] = H \qquad (20)$$

(independent of time). Despite being constant, this solution minimizes both EPLTs for t → 0

and maximizes them for t → ∞. The small roughness of the curve shaped by the weights $\omega_j$, as a function of the inverse scale $k_j$, in Figure 1 may be due to numerical effects, but these effects

do not seem to disturb the constancy of EPLTs in Figure 2.


Under constant EPLT equal to H, combining (5) and (15) we obtain (1/2) d(ln γ)/d(ln t) =

H, or dγ/γ = 2H dt/t. This results in

$$\gamma(t) = t^{2H}\, \gamma(1) \qquad (21)$$

which is readily recognized as the variance function of the cumulative HK process. In effect,

(21) can serve as a definition of the cumulative HK process, while the constant H is the well

known Hurst coefficient. The fact that H in (21) is identical to the EPLT gives it a sound

physical interpretation. Based on a numerical approximation from [24], the conditional

variance of the HK process is

$$\gamma_C(t) = t^{2H}\, c\, \gamma(1), \quad c := 1 - (2H - 1)^2\,[0.72(H - 1) + 1] \qquad (22)$$
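The HK variance functions (21) and (22) are easily coded. The sketch below also includes a helper that recovers H from the lag-1 autocorrelation: combining (21) at t = 2 with the third constraint in (2) gives 2^{2H} = 2(1 + ρ). This relation is derived here from the equations above and is not stated explicitly in the text; the function names are illustrative.

```python
import math


def hk_variance(t, H, gamma_1=1.0):
    """Variance of the cumulative HK process, Eq. (21)."""
    return gamma_1 * t ** (2.0 * H)


def hk_conditional_factor(H):
    """Factor c of the approximate conditional variance, Eq. (22)."""
    return 1.0 - (2.0 * H - 1.0) ** 2 * (0.72 * (H - 1.0) + 1.0)


def hk_conditional_variance(t, H, gamma_1=1.0):
    """Approximate conditional variance of the cumulative HK process, Eq. (22)."""
    return hk_conditional_factor(H) * hk_variance(t, H, gamma_1)


def hurst_from_lag1(rho):
    """H implied by Var[z(2)] = 2(1 + rho) * Var[z(1)] under Eq. (21)."""
    return 0.5 * (1.0 + math.log2(1.0 + rho))
```

For H = 0.5 the factor c equals 1, so conditioning on the past brings no reduction in variance, as expected for white noise; for H = 1 it equals 0.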

Unlike the OU and Hybrid processes, the HK extremizing solution does not involve apparent

inconsistencies with physical reality.

5. A real world example

To examine the consistency with reality of each of the three stochastic processes extremizing

entropy production we present an example with real world data. We chose one of the longest

available instrumental geophysical records, the mean annual temperature of Vienna, Austria

(WMO station code: 11035 WIEN/HOHE_WAR; coordinates: 48.25ºN, 16.37ºE, 209 m)

which comprises 235 years of data (1775-2009) available online from the Royal Netherlands

Meteorological Institute (http://climexp.knmi.nl). Part of the record (141 years; 1851-1991)

has been included in the Global Historical Climatology Network (GHCN). Comparison of the

GHCN time series (available from the same web site), which has undergone consistency

checking and correction, with the original one shows that adjustments are minor (less than

±0.25º at the annual level), which increases confidence in the longer original data set. Both time series are shown in Figure 3 (upper panel). The Gaussian distribution fits the time series very well, and the skewness of the series is almost zero.

The EPLT cannot be calculated directly from the time series but the entropy Φ[z(t)] can,

for a certain range of t, based on (4) and using sample estimates g(t) of γ(t). Denoting the total

observation period as T (235 years) and assuming a varying time step ∆ we form samples of $x_i^{\Delta}$ with size n = T/∆ (more precisely, the floor of T/∆). Here we varied ∆ from 1 to 23 years, so that the sample size is at least 10. For each ∆, the sample mean is

$$\bar{x}^{\Delta} \equiv z(T)\,(\Delta/T) \qquad (23)$$

which is an unbiased estimator of the true mean µ, and the sample variance is

$$g(\Delta) = \frac{1}{T/\Delta - 1} \sum_{l=1}^{T/\Delta} \left(x_l^{\Delta} - \bar{x}^{\Delta}\right)^2 \qquad (24)$$
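A minimal sketch of the estimation in (24) is given below, assuming the annual values are available as a one-dimensional array. Following the definition of $x_i^{\Delta}$ as increments of the cumulative process, values are summed (not averaged) within each block of ∆ steps, and any incomplete block at the end of the record is dropped so that n is the floor of T/∆. The function names are illustrative.

```python
import math
import numpy as np


def aggregated_sample_variance(x, delta):
    """Classical sample variance g(delta) of the increments at time step delta, Eq. (24)."""
    x = np.asarray(x, dtype=float)
    n = len(x) // delta                                  # floor of T / delta
    blocks = x[: n * delta].reshape(n, delta).sum(axis=1)
    return float(blocks.var(ddof=1))                     # divisor n - 1


def entropy_estimate(x, delta):
    """Entropy estimate from Eq. (4) with g(delta) in place of the true variance."""
    return 0.5 * math.log(2.0 * math.pi * math.e * aggregated_sample_variance(x, delta))
```

Applied to the sequence 1, 2, …, 6 with ∆ = 2, the blocks are 3, 7 and 11 and the function returns 16.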

Figure 3 (Upper) The time series of the annual temperature (Θ) in Vienna, Austria; (lower) the entropy $\Phi[x^{\Delta}]$, along with the corresponding variances at time step ∆, versus ln ∆, as estimated from the time series using the classical statistical variance estimate g(∆) and as predicted by the three extremizing models theoretically (γ(∆)) and after adaptation for bias (E[g(∆)]). [Plots not reproduced: the upper panel shows Θ (ºC) against year, 1770 to 2010, for the original series, the adjusted series and the 30-year average; the lower panel shows $\Phi[x^{\Delta}]$ and ln γ(∆), ln g(∆), ln E[g(∆)] against ln ∆, from 0 to 4, for the series Empirical; White noise; Markov; Hybrid, theoretical; Hybrid, adapted; HK, theoretical; HK, adapted, with the annotations “Slope = 0.5”, “Slope → 1” and “Slope = 0.74”.]

In classical statistics, where $x_l^{\Delta}$ are independent identically distributed, g(∆) is an unbiased estimator of the true variance γ(∆). However, if consecutive $x_l^{\Delta}$ are dependent, the following equation holds true:

$$E[g(\Delta)] = \frac{T/\Delta}{T/\Delta - 1} \left\{ \mathrm{Var}[x_i^{\Delta}] - \mathrm{Var}[\bar{x}^{\Delta}] \right\} \qquad (25)$$

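where $\mathrm{Var}[\bar{x}^{\Delta}] = (\Delta/T)^2\, \mathrm{Var}[z(T)]$ and $\mathrm{Var}[x_i^{\Delta}] = \gamma(\Delta)$. This results in $E[g(\Delta)] = c(\Delta, T)\, \gamma(\Delta)$, where

$$c(\Delta, T) := \frac{1 - (\Delta/T)^2\, \gamma(T)/\gamma(\Delta)}{1 - \Delta/T} \qquad (26)$$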

represents a bias adaptation factor, which equals 1 (no bias) only when γ(∆)/γ(T) = ∆/T, i.e. in

white noise. Depending on the dependence structure, the bias adaptation factor may be considerably smaller than 1 (although this is often missed in the literature). This is particularly the case in

the Hybrid model (Figure 3, lower panel), where E[g(∆)] (plotted as a continuous line with

triangles) is almost indistinguishable from the true variance γ(∆) of the white noise instead of

being close to the true variance of the Hybrid model (notice that there is a one-to-one

association between variance and entropy, shown in the figure). This constitutes the observability

problem of the Hybrid model mentioned above.
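The bias adaptation factor (26) can be evaluated for any model once its variance function γ(t) is available. The sketch below takes γ as a callable; the three example variance functions follow (12), the analytical Hybrid solution γ(t) = at + bt², and (21), with parameter values chosen here only for illustration.

```python
import math


def bias_factor(delta, T, gamma):
    """Bias adaptation factor c(delta, T) of Eq. (26) for a variance function gamma(t)."""
    r = delta / T
    return (1.0 - r ** 2 * gamma(T) / gamma(delta)) / (1.0 - r)


def white_noise_variance(t):
    return t                                             # gamma(t) proportional to t


def ou_variance(t, k=1.0):
    return 2.0 * (k * t + math.exp(-k * t) - 1.0) / k ** 2   # Eq. (12) with omega = 1


def hybrid_variance(t, rho=0.543):
    return (1.0 - rho) * t + rho * t ** 2                # a*t + b*t^2 with gamma = 1


def hk_variance(t, H=0.8):
    return t ** (2.0 * H)                                # Eq. (21) with gamma(1) = 1
```

With ∆ = 1 and T = 235, `bias_factor` returns exactly 1 for white noise, a value above 0.99 for the OU model with k = 1, and markedly smaller values for the Hybrid and HK models.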

In contrast, in the Markovian (OU) model (also shown in Figure 3, lower panel), the bias

adaptation is negligible (the two curves before and after adaptation are indistinguishable from

each other), i.e. the factor c(∆, T) is very close to unity. On the other hand, the OU model

implies the lowest theoretical entropy for large scale ∆. After adaptation for bias, as shown in

Figure 3 (lower panel), the Hybrid model entails even lower observed entropy, even though it

entails the highest theoretical entropy.

In the HK model the adaptation factor is closer to unity than in the Hybrid model (the

theoretical and adapted entropy curves are close to each other) but not as close as in the OU

model. In this respect, the HK model represents a good balance of observability on the one

hand, expressed by high c(∆, T), and, on the other hand, high entropy for large scales,

expressed by γ(∆) or even by the ratio γ(T)/γ(∆) (since T >> ∆). In the HK model the

adaptation factor c(∆, T) is maximized for H = 0.5 and the ratio γ(T)/γ(∆) is maximized for H

= 1, in which however c(∆, T) = 0. The aforesaid balance is expressed by the fact that the

product of the two (i.e., c(∆, T) γ(T)/γ(∆)) is maximized for H between 0 and 1 (specifically,

for H = 1 – (1/2) ln 2/ln(T/∆), as can be easily verified).
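The verification of the maximizing H can be sketched as follows, under the assumption that the HK variance follows the power law γ(∆) ∝ ∆^(2H), so that γ(T)/γ(∆) = n^(2H) with the shorthand n := T/∆ (introduced here only for brevity).

```latex
% Assumption: HK power law, \gamma(T)/\gamma(\Delta) = n^{2H}, with n := T/\Delta > 1.
\begin{align*}
c(\Delta,T) &= \frac{1 - n^{-2}\,n^{2H}}{1 - 1/n}, \\
c(\Delta,T)\,\frac{\gamma(T)}{\gamma(\Delta)}
  &= \frac{n^{2H} - n^{4H-2}}{1 - 1/n}, \\
\frac{\partial}{\partial H}\left[n^{2H} - n^{4H-2}\right]
  &= 2\ln n\,\left(n^{2H} - 2\,n^{4H-2}\right) = 0
  \;\Longrightarrow\; n^{2-2H} = 2, \\
H &= 1 - \frac{1}{2}\,\frac{\ln 2}{\ln (T/\Delta)} .
\end{align*}
% The second derivative, 4(\ln n)^2 (n^{2H} - 4 n^{4H-2}), is negative at this
% point (where n^{2H} = 2 n^{4H-2}), so the stationary point is a maximum.
```

As an illustration with a hypothetical ratio T/∆ = 100, the expression gives H ≈ 0.92.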

To compare the three models with reality, the proximity of empirical points (corresponding

to sample estimates g(∆)) with the model should be assessed on the basis of the predicted

curves E[g(∆)] of the models, rather than their theoretical variances γ(∆). The former are

designated in Figure 3 (lower panel) as “adapted” whereas the latter are designated as

“theoretical”. It is seen in the figure that the HK model is the only one that agrees with reality.

The curves E[g(∆)] of the other two models lie far from the empirical points. It is noted that

the consistency of the HK model with reality has been detected in numerous studies using

geophysical, biological, technological and even economic time series (e.g. [22-31] and

references therein) and this makes the findings of this study more physically

plausible.
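The empirical points used in such a comparison can be obtained along the following lines. This is a minimal sketch, assuming that the $x_i^{\Delta}$ are sums of the basic-scale values over non-overlapping windows of length ∆ (consistent with Var[x̄^∆] = (∆/T)² Var[z(T)] above) and that g(∆) is the classical estimator with divisor T/∆ – 1. The function names and the synthetic white-noise input are hypothetical and do not represent the Vienna data.

```python
import numpy as np


def aggregated_sample_variance(x, delta):
    """Classical sample variance g(delta) of non-overlapping sums of
    `delta` consecutive values of the series x."""
    x = np.asarray(x, dtype=float)
    n = x.size // delta  # number of complete windows, i.e. T/delta
    if n < 2:
        raise ValueError("need at least two complete windows")
    sums = x[: n * delta].reshape(n, delta).sum(axis=1)
    return sums.var(ddof=1)  # divisor n - 1, as in classical statistics


def empirical_points(x, deltas):
    """Pairs (ln delta, ln g(delta)), to be compared with the adapted
    model curves ln E[g(delta)] rather than with ln gamma(delta)."""
    return [(np.log(d), np.log(aggregated_sample_variance(x, d))) for d in deltas]


# Synthetic stand-in for an observed series (hypothetical white noise).
rng = np.random.default_rng(0)
series = rng.normal(size=240)
for ln_delta, ln_g in empirical_points(series, [1, 2, 4, 8, 16]):
    print(f"ln delta = {ln_delta:.2f}, ln g = {ln_g:.2f}")
```

Any incomplete window at the end of the record is discarded, which is one possible convention among several.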

6. Conclusion and discussion

It is demonstrated that extremization of entropy production of stochastic representations of

natural systems, performed at asymptotic times (zero or infinity) and using simple constraints

referring to finite times, at which a process is observed, results in constant derivative of

entropy in logarithmic time, which in turn results in Hurst-Kolmogorov processes and long-

term persistence. One eminent characteristic of the derivation is its parsimony, in terms of

both constraints and physical principles used. Specifically, it was demonstrated that no other

notions (e.g. self-organized criticality, scale invariance, etc.) in addition to entropy

extremization are necessary to explain the emergence of the Hurst-Kolmogorov behavior. An

example with real world data, which is in agreement with a large body of studies that have

detected long-term persistence in long time series of a wide range of processes, illustrates the

plausibility of the findings.

These findings connect statistical physics (the extremal entropy production concept, in

particular) with stochastic representations of natural processes. Extremal entropy production

may provide a theoretical background in such stochastic representations, which otherwise are

solely data-driven. A theoretical background in stochastic representations is also important

because, as noted by Koutsoyiannis and Montanari [31], merely statistical arguments

do not suffice to verify or falsify the presence of long-term persistence in natural processes.

The practical consequences of these findings may be significant because stochastic

representations of processes are used as modeling tools particularly for the estimation of

future uncertainty for planning and design purposes with long time horizons. The emergence

of maximum entropy (i.e., maximum uncertainty) for large time horizons, as demonstrated

here, should be considered seriously in planning and design studies, because otherwise the

uncertainty would be underestimated and the constructions undersized. The relevance of the

last point may be even wider, given the current scientific and public interest in long-term

predictions.

Acknowledgments I gratefully thank three reviewers for their encouraging comments and

constructive suggestions, which helped me to substantially improve the presentation.

References

[1] H. S. Robertson, Statistical Thermophysics (Prentice Hall, Englewood Cliffs, NJ, 1993).

[2] E. T. Jaynes, Probability Theory: The Logic of Science (Cambridge Univ. Press, 728 pp., 2003).

[3] P. Atkins, Four Laws that Drive the Universe (Oxford Univ. Press, 131 pp., 2007).

[4] J. Fischer et al., Report to the CIPM on the implications of changing the definition of the base unit Kelvin (International Committee for Weights and Measures, http://www.bipm.org/wg/CCT/TG-SI/Allowed/Documents/Report_to_CIPM_2.pdf, Retrieved 2010-11-28).

[5] A. Porporato et al., Irreversibility and fluctuation theorem in stationary time series, Phys. Rev. Lett., 98,

094101 (2007).

[6] D. Kondepudi and I. Prigogine, Modern Thermodynamics (Wiley, Chichester, 1998).

[7] T. Pujol and J. E. Llebot, Q. J. R. Meteorol. Soc., 125, 79-90 (1999).

[8] H. Ozawa et al., Rev. Geophys., 41(4), 1018, doi:10.1029/2002RG000113 (2003).

[9] I. Prigogine, Bulletin de la Classe des Sciences, Academie Royale de Belgique, 31, 600-606 (1945).

[10] H. Ziegler, in Progress in Solid Mechanics ed. by I.N. Sneddon and R. Hill, vol. 4 (North-Holland,

Amsterdam, 1963).

[11] L.M. Martyushev and V.D. Seleznev, Phys. Rep., 426, 1-45 (2006).

[12] W. Pauli, Statistical Mechanics (Dover, New York, 1973).

[13] I. Müller, in Entropy, ed. by A. Greven, G. Keller and G. Warnecke, Ch. 5, 79–105 (Princeton University

Press, Princeton, New Jersey, USA, 2003).

[14] H.E. Hurst, Trans. Am. Soc. Civil Engrs., 116, 776–808 (1951).

[15] A.N. Kolmogorov, Dokl. Akad. Nauk URSS, 26, 115–118 (1940).

[16] B.B. Mandelbrot and J. W. Van Ness, SIAM Rev, 10 (4), 422-437 (1968).

[17] A. Papoulis, Probability, Random Variables and Stochastic Processes, 3rd edn. (McGraw-Hill, New York,

1991).

[18] C. Tsallis, Journal of Statistical Physics, 52(1), 479-487 (1988).

[19] C. Tsallis, Introduction to Nonextensive Statistical Mechanics: Approaching a Complex World (Springer, 2009).

[20] F. Bouchet et al., Physica A, 389, 4389-4405 (2010).

[21] B. B. Mandelbrot, A fast fractional Gaussian noise generator. Wat. Resour. Res. 7 (3), 543–553 (1971).

[22] D. Koutsoyiannis, Hydrol. Sci. J., 47 (4), 573–595 (2002).

[23] S. Karlin and H. M. Taylor, A Second Course in Stochastic Processes (Academic Press, Boston, 1981).

[24] D. Koutsoyiannis, Hydrol. Sci. J., 50 (3), 405–426 (2005).

[25] D. Koutsoyiannis, Hydrol. Earth Sys. Sci., 14, 585–601 (2010).

[26] T. Lux, Appl. Econ. Lett., 3, 701–706 (1996).

[27] E. Koscielny-Bunde et al., Phys. Rev. Lett., 81(3), 729–732 (1998).

[28] W.E. Leland et al., IEEE/ACM Trans. Networking, 2(1), 1-15 (1994).

[29] A. Bunde et al., Phys. Rev. Lett., 85(17), 3736–3739 (2000).

[30] A. Bunde and S. Havlin, Physica A, 314, 15-24 (2002).

[31] D. Koutsoyiannis and A. Montanari, Water Resour. Res., 43 (5), doi:10.1029/2006WR005592 (2007).