All content in this area was uploaded by Demetris Koutsoyiannis on Nov 22, 2016

Negligent killing of scientific concepts:

the stationarity case

Demetris Koutsoyiannis1 and Alberto Montanari2

1Department of Water Resources and Environmental Engineering, School of Civil Engineering,

National Technical University of Athens, Greece (dk@itia.ntua.gr – http://itia.ntua.gr/dk)

2Department DICAM, University of Bologna

Abstract In the scientific vocabulary, the term “process” is used to denote change in time. Even a

stationary process describes a system changing in time, rather than a static one which keeps a constant

state all the time. However, this is often missed, which has led to misusing the term “nonstationarity”

as a synonym of “change”. A simple rule to avoid such misuse is to answer the question: can the

change be predicted in deterministic terms? Only if the answer is positive is it legitimate to invoke

nonstationarity. In addition, we should have in mind that models are made to simulate the future rather

than to describe the past; the past is rather characterized by observations (data). Usually future changes

are not deterministically predictable and thus the models should, on the one hand, be stationary and,

on the other hand, describe in stochastic terms the full variability, originating from all agents of

change. Even if the past evolution of the process of interest contains changes explainable in

deterministic terms (e.g. urbanization), again it is better to describe the future conditions in stationary

terms, after “stationarizing” the past observations, i.e. adapting them to represent the future conditions.

Introduction

While a phrase like “there is a lot of heat in the kitchen” is perfectly understandable in everyday

language, the same expression is unacceptable for a scientific text, or even in oral communications

among scientists. One would expect a scientist to say “there is a lot of thermal energy in the kitchen” or to reformulate it as “the temperature in the kitchen is high”. Interestingly, although scientific concepts such as heat, (thermal) energy and temperature are centuries old, misconceptions and

misuses are still common, even among university students of thermal physics (Georgiou and Sharma,

2012) and may hinder learning and understanding. Perhaps the difficulties stem from the fact that

thermal physics concepts are au fond statistical concepts, while our education systems favour

deterministic approaches and give much less importance to probability, statistics and stochastic

processes, areas that are collectively described by the term stochastics.

While stochastics offers a much more powerful approach to model physical systems

(Koutsoyiannis 2010) including hydrological systems (Montanari and Koutsoyiannis 2012), it also

requires a different type of understanding for the concepts it introduces. Lack of such understanding

may result in misuse of the terms and the related concepts.

Take for instance the terms stationarity and nonstationarity. Interestingly, while the adjective

stationary is contained in English dictionaries with the meaning not moving or not changing, the nouns

stationarity and nonstationarity do not appear in common dictionaries like the Cambridge

(dictionary.cambridge.org), the Dictionary.com (dictionary.reference.com/), the Merriam-Webster

(www.merriam-webster.com/dictionary/) and the Oxford (www.oxforddictionaries.com/). The reason

is that they are only scientific terms, not words used in everyday language. Thus, in contrast to heat and temperature, which may be used colloquially in a loose manner, misuse of the terms stationarity and nonstationarity can be more easily detected and hopefully avoided, as there is no ambiguity arising from a difference between everyday and scientific meaning.

However, for a wide variety of reasons, including its stochastic origin, ideologico-political

influences and a clustering effect, misuse of the terms stationarity and nonstationarity has been

common. Illustration of the clustering effect in the term’s misuse is offered by the paper by Milly et al.

(2008) with the bold title “Stationarity is dead”. A Google Scholar search reveals that up to now

(March 2014) 1350 papers contain the phrase “Stationarity is dead”. This can be interpreted as an

indication that more or less those papers have adopted the misuse of the term in the original paper.

Interestingly, only four papers query “Is stationarity dead?”, not a single paper says “Stationarity is not

dead” and only one paper says “Stationarity is alive”. Other searches, again using Google Scholar,

reveal that (as of March 2014) 660 papers speak about “Nonstationary world”, 551 about

“Nonstationary climate”, 15 about “Nonstationary catchment(s)” and 14 700 papers about

“Nonstationary data”. It can be conjectured that most of them misuse the term nonstationary, as will

be justified below.

In an attempt to combat such misuse, the World Meteorological Organization (Commission for

Hydrology 2013) has issued a statement with clarifications about the terms. This note has exactly the

same purpose.

Semantics and historical review

In this section we will try to reconstruct a possible history of the concept of stationarity based on some

key references including historical ones. The term stationary is originally a classical mathematical

term: given a function x(t), a stationary point thereof is a point at which the derivative is zero, i.e. x΄(t)

= 0. In dynamical systems, the notion of a stationary point has a richer meaning. A dynamical system

described by its state x(t) is typically characterized by a differential equation of the form x΄(t) = g(x(t)).

On the other hand, it is common to express the dynamical system in terms of a transformation St of its

initial state x(0) (at time 0) to its current state x(t) (at time t), that is, x(t) = St(x(0)) (Lasota and Mackey

1994). In this case, if x is a stationary point, i.e. a solution of x΄(t) = 0, it also satisfies x = St(x), that is,

it remains invariant under the transformation St. In classical algebra a point x satisfying x = St(x) is

called a fixed point or an invariant point and is distinguished from the stationary point. However in

differential equations books (e.g. Conrad, 2003) and dynamical systems books (e.g. Wiggins, 2003) all

these terms (and other ones, e.g., “equilibrium point”, “rest point”) are used interchangeably.

As an example consider the logistic differential equation

x΄(t) = g(x(t)) = a x(t) (1 – x(t)/b)    (1)

Solving the differential equation for x(t) = St(x(0)) with initial condition x(0) ≡ x we find that the transformation St(x) is

$$S_t(x) = \frac{b\,x\,e^{at}}{b + x\,(e^{at} - 1)} \qquad (2)$$

It can be readily seen that there are two stationary points, x = 0 and x = b, which satisfy both x΄ = g(x)

= 0 and x = St(x).
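The two stationary points can be checked numerically. The sketch below assumes the standard closed-form solution of the logistic equation, S_t(x) = b x e^{at} / (b + x (e^{at} − 1)), and cross-checks it against a crude Euler integration. The parameter values a = 1 and b = 2 and all function names are illustrative choices, not taken from the paper.

```python
import math

def logistic_rhs(x, a=1.0, b=2.0):
    """Right-hand side g(x) = a x (1 - x/b) of the logistic equation."""
    return a * x * (1.0 - x / b)

def logistic_flow(x0, t, a=1.0, b=2.0):
    """Closed-form S_t(x0) = b x0 e^{at} / (b + x0 (e^{at} - 1))."""
    growth = math.exp(a * t)
    return b * x0 * growth / (b + x0 * (growth - 1.0))

def euler_flow(x0, t, a=1.0, b=2.0, steps=100_000):
    """Crude Euler integration, used only to cross-check the closed form."""
    dt = t / steps
    x = x0
    for _ in range(steps):
        x += dt * logistic_rhs(x, a, b)
    return x

if __name__ == "__main__":
    # Stationary points: g(x) = 0 and S_t(x) = x for x = 0 and x = b.
    for x0 in (0.0, 2.0):
        print(x0, logistic_rhs(x0), logistic_flow(x0, 3.0))
    # Any other starting point moves: it is not a stationary point.
    print(logistic_flow(0.5, 3.0), euler_flow(0.5, 3.0))
```

Starting anywhere strictly between 0 and b, the trajectory moves towards b, so only the two points above remain invariant under S_t.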

Now, a more effective description of dynamical systems is achieved when, instead of studying

trajectories of points (i.e. the St(x) for varying t), we study the evolution in time t of probability

densities f(x) (Lasota and Mackey, 1994). In this case, the description through St(x) is replaced by that

based on Kt f(x), where Kt is a transformation operator. Accordingly, a stationary density is defined to

be a probability density function f which satisfies f = Kt f thus remaining unchanged in the course of

time. For a simple account of why the evolution of probability densities provides a better description of a system than trajectories of points, and for more details about transformation operators, the reader is referred to Koutsoyiannis (2010).

In this context, Kolmogorov (1931) used the term stationary to describe a probability density

function that is unchanged in time. In the same work, Kolmogorov introduced the term stochastic

process although he cited Bachelier (1900) as having already used stochastic processes in continuous

time; however, it appears that Bachelier did not use the term stochastic process. A more formal

definition of a stochastic process and stationarity was given a few years later by Khinchin (Khintchine

1934; notice the different transliteration in the English and the German literature of the author’s name,

originally appearing in the Cyrillic alphabet). A concise presentation thereof has been given by

Kolmogorov (1938) as follows:

[…] a stationary stochastic process in the sense of Khinchin […] is a set of random variables $x_t$ depending on the parameter t, −∞ < t < +∞, such that the distributions of the systems

$(x_{t_1}, x_{t_2}, \ldots, x_{t_n})$ and $(x_{t_1+\tau}, x_{t_2+\tau}, \ldots, x_{t_n+\tau})$    (3)

coincide for any $n, t_1, t_2, \ldots, t_n$, and τ.

Kolmogorov (1947) has also defined wide-sense stationarity in terms of the independence from time t of the expectation $E[x_{t+\tau}\, x_t]$.
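As a rough numerical illustration of these definitions, the sketch below estimates the ensemble moment E[x_{t+τ} x_t] at several times t for one fixed lag τ. It assumes a first-order autoregressive (AR(1)) process with unit variance, started from its stationary distribution. The AR(1) model, the parameter values and the function names are stand-ins chosen for simplicity, not a model used in the paper.

```python
import random

def simulate_ar1(n_steps, rho, rng):
    """One realization of a zero-mean, unit-variance AR(1) process,
    started from its stationary distribution N(0, 1)."""
    innov_sd = (1.0 - rho ** 2) ** 0.5
    x = rng.gauss(0.0, 1.0)
    path = [x]
    for _ in range(n_steps - 1):
        x = rho * x + innov_sd * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

def ensemble_product_moment(paths, t, tau):
    """Ensemble estimate of E[x_{t+tau} x_t], averaging over realizations."""
    return sum(p[t + tau] * p[t] for p in paths) / len(paths)

if __name__ == "__main__":
    rng = random.Random(1)
    paths = [simulate_ar1(60, 0.7, rng) for _ in range(5000)]
    # For a stationary process the estimates should agree for every t,
    # up to sampling error; here the theoretical value is 0.7 ** 5.
    for t in (0, 10, 40):
        print(t, ensemble_product_moment(paths, t, 5))
```

The average is taken across realizations at fixed times, not along a single realization, which is what the definition in terms of distributions requires.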

These authors were Russian (then Soviet) and their papers were published mostly in German.

The concepts were transplanted into the English literature, unfortunately with some loss of clarity.

Perhaps the first who introduced them into the English literature was Doob (1934), from whom we

quote the following definition:

A stochastic process is defined by Khintchine [(1934)] to be a one parameter set of chance

variables: x(t), –∞ < t < ∞. It is supposed that if t1, ..., tn is any finite set of values of t, and aj < x

< bj, j = 1, . . .,n any set of intervals, the probability that

aj < x(tj) < bj, j = 1, …, n    (4)

is defined. If the probability that [(4)] is true is independent of translations of the t-axis, the

process is called stationary.

By comparing with the original paper (in German) or with Kolmogorov’s definition in (3) it can be

seen that Doob distorts somewhat Khinchin’s definition. Furthermore, the famous book by Kendall

and Stuart (1966, p. 404), while essentially keeping the Khinchin-Kolmogorov definition, speaks

about a stationary series instead of a stationary process. It further states that stationary time-series are

a particular case of the theory of stochastic processes. However, the definition of a time series given in

the book does not support the latter statement, while a definition of a stochastic process is missing.

Specifically the definition of a time series given by Kendall and Stuart (1966, p. 342) is the following:

Observations on a phenomenon which is moving through time generate an ordered set known as

a time series. The values assumed by a variable at time t may or may not embody an element of

random variation, but in the majority of cases with which we shall be concerned some such

element is present, if only as an error of observation.

This seems to recognize a time series as a series of observations, which could be a series of

realizations of the random variables that constitute the stochastic process (else known as a sample

function of the stochastic process). Also, it could be a series of values not necessarily associated with a

stochastic process. However, in the definition of stationarity, the concept of a time series appears to be treated as identical to (or a subcase of) that of a stochastic process.

Indeed, in the English literature the concept of a time series is ambiguous, sometimes denoting a

realization of a stochastic process and other times denoting the stochastic process per se (with the

specification that the index set defining the stochastic process denotes time; e.g. Parzen 1957).

Nevertheless, there are books in the English literature characterized by perfect clarity, of which we

mention Papoulis (1991; first edition 1965). Such books follow the Khinchin-Kolmogorov definition

of stochastic processes and stationarity.

Some preliminary conclusions that we can draw from this historical review are: (a) stationarity

refers to stochastic processes; (b) stochastic processes are families of random variables usually (but

not necessarily) indexed by time; and (c) random variables are variables associated with a probability

distribution or density function. It follows that any attempt to conceptualize stationarity without

reference to the notion of a stochastic process will be inconsistent with the theory.

The concept of nonstationarity

Inevitably, the negation of stationarity, i.e. nonstationarity, needs to be defined and conceived again

within the notion of stochastic processes. The negation could be conceived with reference to Eqn. (3),

meaning that in a nonstationary process the probability density $f(x_{t_1+\tau}, x_{t_2+\tau}, \ldots, x_{t_n+\tau})$ for some (or all) τ should not equal $f(x_{t_1}, x_{t_2}, \ldots, x_{t_n})$. In turn, this means that the mathematical expression of $f(x_{t_1+\tau}, x_{t_2+\tau}, \ldots, x_{t_n+\tau})$ should explicitly contain the time shift τ, or else that it should be a deterministic function of τ.

Often, in looser terms, a stationary process is thought of as a process whose statistical properties

do not change with time, while, by negation, a nonstationary process is thought of as a process whose

statistical properties do change with time. The latter statement, if combined with a loose perception of

what statistical properties are, and perhaps dismissal of “statistical” from “properties” has led to very

widespread misconceptions. Two common examples of such misconceptions are (a) that random

changes in some properties indicate nonstationarity, and (b) that it is not necessary to have

deterministic functions of time in order to claim nonstationarity.

We can clarify these misconceptions by considering Eqn. (3) with just one variable, $\underline{x}_{t_1}$, also assuming $t_1 = 0$, so that we have to compare the statistical properties of $\underline{x}_\tau$ with those of $\underline{x}_0$ (notice that underlined symbols denote random variables and stochastic processes, following the Dutch convention). For misconception (a), let us assume that $\underline{x}_\tau$ can be decomposed in two parts as follows:

$\underline{x}_\tau = d_\tau + \underline{v}_\tau$    (5)

where $\underline{v}_\tau$ is a stationary stochastic process. If the component $d_\tau$ is a constant ($d_\tau \equiv d$, where d is a number), then obviously $\underline{x}_\tau$ is also a stationary stochastic process. If $d_\tau$ is a deterministic function of time ($d_\tau \equiv d(\tau)$), then $\underline{x}_\tau$ will be a nonstationary stochastic process, as its mean will be $E[\underline{x}_\tau] = d(\tau) + E[\underline{v}_\tau]$, i.e., a function of time. Now if $d_\tau$ changes in time, but not according to a deterministic function, then it should be modelled as a stochastic process itself ($d_\tau \equiv \underline{d}_\tau$). If the latter process is a stationary one (see further clarification about this in section “Stationarity under change”) then the process $\underline{x}_\tau$ is stationary (because $E[\underline{x}_\tau] = E[\underline{d}_\tau] + E[\underline{v}_\tau]$ = constant), even though many would regard it as nonstationary.
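The three cases of the decomposition into a level d and a stationary component v can be contrasted in a small simulation. The sketch below is only an illustration under assumed models: the random level is redrawn from N(0, 1) with a small probability at each step (a hypothetical choice that makes the level process stationary), the deterministic level is a linear trend, and all names and parameter values are invented for the example.

```python
import random

def shifting_level_path(n_steps, switch_prob, rng):
    """x = d + v where the level d is itself random: at each step it is
    redrawn from N(0, 1) with probability switch_prob, else it persists."""
    level = rng.gauss(0.0, 1.0)          # start from the level's stationary law
    path = []
    for _ in range(n_steps):
        if rng.random() < switch_prob:
            level = rng.gauss(0.0, 1.0)  # unpredictable shift of the level
        path.append(level + rng.gauss(0.0, 0.5))
    return path

def trend_path(n_steps, slope, rng):
    """x = d(tau) + v with a known deterministic level d(tau) = slope * tau."""
    return [slope * tau + rng.gauss(0.0, 0.5) for tau in range(n_steps)]

def ensemble_mean(paths, tau):
    """Average over realizations at a fixed time tau."""
    return sum(p[tau] for p in paths) / len(paths)

if __name__ == "__main__":
    rng = random.Random(7)
    random_level = [shifting_level_path(101, 0.02, rng) for _ in range(4000)]
    with_trend = [trend_path(101, 0.02, rng) for _ in range(2000)]
    for tau in (0, 50, 100):
        print(tau, ensemble_mean(random_level, tau), ensemble_mean(with_trend, tau))
```

Each realization with a random level shows abrupt shifts, yet its ensemble mean stays near zero at every time; only the path with the deterministic trend has an ensemble mean that is a function of time.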

As an example to analyse misconception (b), let us consider as statistical properties of $\underline{x}_\tau$ its moments $E[\underline{x}_\tau^q]$ for varying q = 0, 1, …. Such moments are not random variables, and thus they can be either constants ($E[\underline{x}_\tau^q] \equiv m_q$) or deterministic functions of τ ($E[\underline{x}_\tau^q] \equiv m_q(\tau)$). If we assume that all moments $E[\underline{x}_\tau^q]$ for all q are not functions of τ then, by virtue of the moment theorem (Papoulis 1991, p. 116), the probability density function $f(x_\tau)$ is uniquely determined by the moments and will not be a function of τ per se. Therefore, it is necessary that at least one moment $E[\underline{x}_\tau^q]$ for a certain q be a deterministic function $m_q(\tau)$ in order to have a time dependent $f(x_\tau)$.

The above example is for first-order stationarity, because we used only one $\underline{x}_\tau$. A process can be first-order stationary yet be nonstationary, if joint statistical properties of variables corresponding to different times are deterministic functions of time (e.g., if the joint moment $E[\underline{x}_\tau\, \underline{x}_{\tau+t}]$ is a function of both t and τ, the process is nonstationary). However, the above result can be generalized for two or more variables as the moment theorem holds for more than one variable (e.g., Papoulis 1991, p. 160).

Hence, if a stochastic process is nonstationary, then at least one of its moments, marginal or joint,

should be a deterministic function of time.

In conclusion, as the meaning of change in everyday language is quite general and certainly

includes random change, statements like “a nonstationary process is a process whose statistical properties change with time” may be misunderstood. In order to be strictly valid, such statements should be

accompanied by the clarification that change here is meant as a deterministic function of time.

Philosophical reflections

While misconceptions about stationarity and nonstationarity can be excused because the concept is

only 80 years old, the discourse in the Introduction (in particular, the typical phrases that involve

nonstationarity) reveals that there may be a more fundamental misconception about a notion that is 2.5

thousand years old. Specifically, this is related to confusing the material world with the world of

models, which usually are abstract (mathematical) entities. The distinction between the two and the

relevant discussion are at least as old as Plato’s metaphysical theory.

Figure 1 A schematic to illustrate Plato’s metaphysical theory (upper row) in comparison to a view more compatible with modern science (lower row). The left column depicts the real world, which in Plato’s theory is the world of the archetypes, while the material world is composed of imperfect “shadows” of the archetypes. By

rotating the schematic corresponding to Plato’s theory by 180º we identify the material world with the real world

and we replace the ideal archetypes with imperfect models of reality. In both cases, however, the intellectual

structures are distinguished from the material ones (image of pyramid from en.wikipedia.org/wiki/File:Giza-pyramids.JPG).

[Figure 1 schematic. Upper row: Plato’s metaphysical theory, with panels “Real/perfect” and “Imperfect (shadow)”. Lower row: Modern physical view, with panels “Real/perfect” and “Imperfect (approximation)”. Points in the schematic are labelled Α, Β, Γ, Δ, Ε.]

According to Plato’s theory, the real world is a world of ideal or perfect forms (αρχέτυπα,

archetypes). It is unchanging and unseen and it can only be perceived by reason (νοούμενα,

nooumena). The physical world is an imperfect image of the world of archetypes. Physical objects and

events are “shadows” of their ideal forms, are subject to change and can be perceived by senses

(φαινόμενα, phenomena).

By turning upside-down Plato’s theory (or making a 180º turn, as illustrated in Fig. 1) we obtain

a view that is more consistent with modern science. According to the latter, the physical world is the

real world. It is perfect and it is perpetually changing. Abstract representations or models of the real

world are imperfect but can be useful to describe the real world. It should be noted though that, while

Plato’s archetypes and modern models have in common the fact that they are abstract concepts perceived

by reason, the two notions are not identical.

Whether one accepts the original or the upside-down version of Plato’s theory, it is important

that both make the distinctions: physical world ≠ models (or forms) and phenomena ≠ nooumena.

Also, both recognize that in the physical world change is the rule, a fact that had been earlier and aptly

expressed by Heraclitus as “Πάντα ῥεῖ” (Panta rhei, Everything flows). The latter aphorism has now

become the emblem of the IAHS initiative for the current decade (Montanari et al. 2013).

With respect to the discussion of the earlier section, it is quite easy to understand that

probability and all concepts based on it are abstract concepts, and not properties of the physical world.

Attempts to define probability in terms of experience from the physical world resulted in logical

problems (e.g. circular logic), which were resolved only when Kolmogorov (1933) introduced the

axiomatic foundation of probability. The abstract character of probability is more evident in the

Bayesian interpretation thereof. Likewise, the stochastic processes, as well as the random variables

they consist of and their probability density functions, are models and not physical objects. This means

that stationarity and nonstationarity are properties belonging to the world of models. In contrast, the

objects and processes of the real world are neither stationary nor nonstationary; they are just

perpetually changing.

Stationarity under change

Stationarity does not contradict change. Rather it offers a powerful way to model change. Without

change, we would not need the concept of a stationary process, not even that of a process. Note that a

process is defined in common English dictionaries, such as those mentioned above, as a series of changes or actions. In addition, in the scientific literature the term process has been introduced as synonymous with change, as evident in Kolmogorov’s (1931) pioneering paper, which starts as “A

physical process [is] a change of a certain physical system”.

It is very common in science to try to identify invariant properties within change (Koutsoyiannis

2011). For example, in the absence of an external force, the position of a body in motion changes in

time but the velocity is unchanged (Newton’s first law). If a constant force is present, then the velocity

changes but the acceleration is constant (Newton’s second law). If the force changes, e.g. the

gravitational force with changing distance in planetary motion, the acceleration is no longer constant,

but other invariant properties emerge, e.g. the angular momentum (Newton’s law of gravitation; see

also Koutsoyiannis 2011).

Also, in motion of fluids we speak about steady flow. This is not a contradiction. Obviously,

there is change because of the flow (change and flow are tightly connected), but in a steady flow the

velocity does not change in time.

Likewise, the theory of dynamical systems is a theory describing change. Amazingly, most of the dynamical systems used to model natural processes are autonomous systems, that is, they are

expressed by autonomous differential equations, e.g., x΄(t) = g(x(t)), which do not explicitly depend on

time. In the more general case of non-autonomous systems, i.e. those expressed as x΄(t) = g(x(t); t), the

dynamics is not the same on the intervals [0, t΄] and [t, t + t΄]. However, this is rarely the case because

the laws of nature which hold now are identical to those holding for any time in the past or future.

Even if change in the laws happens to be the case (e.g. in a macroscopic description of a complex

system) again the non-autonomous system is converted into an autonomous one. This is a rather easy

task as it only needs the definition of new dependent variables (Mackey 1992).
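One common way to carry out such a conversion, assumed here for illustration, is to treat time itself as an additional state variable s with s΄ = 1 and s(0) = 0, so that the right-hand side depends only on the augmented state (x, s). The sketch below uses a hypothetical right-hand side g(x; t) = −x + sin t, which is not taken from the paper, and integrates both formulations with the same Euler scheme.

```python
import math

def g_nonautonomous(x, t):
    """Hypothetical right-hand side that depends explicitly on time."""
    return -x + math.sin(t)

def g_autonomous(state):
    """Same dynamics on the augmented state (x, s), with s' = 1."""
    x, s = state
    return (-x + math.sin(s), 1.0)

def integrate_nonautonomous(x0, t_end, steps=10_000):
    """Euler integration of x' = g(x; t) from t = 0 to t_end."""
    dt = t_end / steps
    x, t = x0, 0.0
    for _ in range(steps):
        x += dt * g_nonautonomous(x, t)
        t += dt
    return x

def integrate_autonomous(x0, t_end, steps=10_000):
    """Euler integration of the augmented system; no explicit time argument."""
    dt = t_end / steps
    state = (x0, 0.0)
    for _ in range(steps):
        dx, ds = g_autonomous(state)
        state = (state[0] + dt * dx, state[1] + dt * ds)
    return state

if __name__ == "__main__":
    print(integrate_nonautonomous(1.0, 2.0))
    print(integrate_autonomous(1.0, 2.0))
```

Both formulations trace the same trajectory for x; the augmented one simply carries the elapsed time as part of its state.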

Now, coming to the notion of a stochastic process, we note that it was invented to describe the irregular changes of complex natural systems, which are impossible to model deterministically in full detail and whose future evolution cannot be predicted in detail and with precision. Here, the great scientific

achievement is the invention of macroscopic descriptions instead of modelling the details. This is

essentially done using stochastics. Here lies the essence and usefulness of the stationarity concept,

which seeks invariant properties in complex systems (Koutsoyiannis, 2011). As in the case of

converting non-autonomous to autonomous systems, again even if a process is nonstationary, it should

be converted into a stationary one, which can be effectively studied and used in predictions.

As the stochastic model is a mathematical construction, the use of a stationary or a

nonstationary model is a modelling option rather than a property of the observational data of the real

world. Of course any model, whether deterministic or stochastic, stationary or nonstationary, should

be consistent with the data. But the data alone do not suffice to make a correct modelling choice.

This is illustrated in Figure 2, which depicts 1000 terms of a synthetic time series; the present

time, as designated in the figure, divides the time axis into two parts, the past (with 150 observations)

and the future. Considering the 150 observations of the past, one sees two step changes: a large rise at

time 70 and a smaller drop at time 120. According to common statistical practice, one would describe

such changes as nonstationarities and would construct a nonstationary model with different statistical

properties for each of the three periods designated by the two step changes. But such a description is

totally useless if prediction of the future is of interest and, at the same time, it contradicts the fact that

the stochastic process which generated the time series is in fact a stationary process, which is

described in detail in Koutsoyiannis (2011).

Figure 2 Schematic for a prediction problem based on a time series.

[Figure 2 plot. Horizontal axis: Time, i (0 to 1000). Vertical axis range: 0 to 4.5. Series shown: time series, local average, global average. The present separates the past from the future.]

It is recalled that nonstationarity is determined based on ensemble means or, more generally,

ensemble statistical properties of a stochastic process. The ensemble mean is defined as

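$$E[\underline{x}] = \int_{-\infty}^{\infty} x\, f(x)\, \mathrm{d}x \qquad (6)$$

while the temporal mean is

$$\underline{\bar{x}}_{t'} = \frac{1}{t'} \int_0^{t'} \underline{x}_\xi\, \mathrm{d}\xi \qquad (7)$$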

It is seen that the temporal mean is a stochastic process per se, that is, a random function of time. The

ensemble mean is not a random quantity, but in a stationary process it is a single number. As explained

in section “The concept of nonstationarity”, if the process is nonstationary, E[x] can vary in time,

being a deterministic function of time (i.e., precisely known and perfectly predictable). But, as the

graphical example shows, this cannot be inferred from data only, because the data can give us only the

temporal mean (actually a realization thereof) and not the ensemble mean. Rather, the data can

mislead us to adopt a nonstationary description, while the underlying process is stationary. To

establish a deterministic function of time, as required in order to claim nonstationarity, we need at

least both of the following conditions to hold: (a) deductive reasoning in order to establish the

deterministic function of time; (b) validation of the deterministic function by data which were not used

in the model construction.
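The difference between the two kinds of mean can be made concrete with a small experiment. The sketch below assumes a stationary AR(1) process with ensemble mean 3 and strong dependence (ρ = 0.9); this model, its parameters and the function names are illustrative assumptions. It measures how much the temporal mean over a finite window varies from one realization to another.

```python
import random
import statistics

def ar1_path(n, rho, mean, rng):
    """Stationary AR(1) realization with unit variance around `mean`."""
    innov_sd = (1.0 - rho ** 2) ** 0.5
    dev = rng.gauss(0.0, 1.0)
    path = []
    for _ in range(n):
        path.append(mean + dev)
        dev = rho * dev + innov_sd * rng.gauss(0.0, 1.0)
    return path

def temporal_mean(path, window):
    """Average of one realization over its first `window` values."""
    return sum(path[:window]) / window

def spread_of_temporal_means(window, n_paths, rho, mean, seed):
    """Standard deviation, across realizations, of the temporal mean."""
    rng = random.Random(seed)
    means = [temporal_mean(ar1_path(window, rho, mean, rng), window)
             for _ in range(n_paths)]
    return statistics.pstdev(means)

if __name__ == "__main__":
    # The ensemble mean is the single number 3.0 in every case.
    for window in (20, 1000):
        print(window, spread_of_temporal_means(window, 200, 0.9, 3.0, seed=11))
```

With a short record the temporal mean scatters widely around the ensemble mean, so one observed value of it says little about whether the ensemble mean has changed.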

It seems that the frequent invocation of nonstationarity, as well as of deterministic trends, shifts and cycles, is a remnant of deterministic thinking that persists even when stochastic descriptions are used. The actual stochastic concept that should replace all these, within a stationary stochastic setting, is dependence,

possibly a long-range one. The handling of dependence, and eventually its utilization in prediction, is

made through conditioning on known information, e.g. observations of the past. For example,

assuming that x0, x1, …, xn, are observations at n past times t0, t1, …, tn – 1, and at the present time tn,

the conditional average of the process x(t) for any future time t > tn will be a function of time, h(t):

E[x(t)|x0, x1, …, xn] = h(t) ≠ E[x(t)]    (8)

Likewise, the conditional variance is also a (non-decreasing) function of time and every statistical

characteristic, conditional on observations, can be a function of time. All these functions of time are

deterministic functions, derived by deductive reasoning (using stochastic processes algebra). Hence,

conditioning results in statistical characteristics that vary in time, even though overall the model is

stationary.
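For a concrete case of Eqn. (8), the sketch below assumes a discrete-time stationary Gaussian AR(1) model with mean μ, variance σ² and lag-one correlation ρ. For this Markovian stand-in the past enters only through the latest observation, and the conditional mean and variance at lead time k = t − tn take the standard forms μ + ρ^k (xn − μ) and σ² (1 − ρ^{2k}). The model choice and all names are assumptions made for the example; a process with long-range dependence would need the full set of past observations.

```python
import random

def conditional_mean(x_now, lead, mu, rho):
    """h(t) for an AR(1) model: decays from x_now towards mu as lead grows."""
    return mu + (rho ** lead) * (x_now - mu)

def conditional_variance(lead, sigma2, rho):
    """Grows from 0 at lead 0 towards the unconditional variance sigma2."""
    return sigma2 * (1.0 - rho ** (2 * lead))

def simulate_forward(x_now, lead, mu, rho, sigma2, rng):
    """One simulated future value `lead` steps ahead, given x_now."""
    innov_sd = (sigma2 * (1.0 - rho ** 2)) ** 0.5
    x = x_now
    for _ in range(lead):
        x = mu + rho * (x - mu) + innov_sd * rng.gauss(0.0, 1.0)
    return x

if __name__ == "__main__":
    mu, rho, sigma2, x_now = 1.0, 0.8, 1.0, 3.0
    for lead in (0, 1, 3, 10, 50):
        print(lead, conditional_mean(x_now, lead, mu, rho),
              conditional_variance(lead, sigma2, rho))
```

Both quantities are deterministic functions of the lead time, derived from the model rather than fitted to the data, while the model itself remains stationary.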

Figure 2 further illustrates the effect of conditioning on the past, within stationarity, in order to

predict the future. While the past values can be known from observation, the evolution of the process

in future times is unknown. Ignorance of the future does not concern only the precise value at each

time. The local averages, shown as dashed lines in the figure are also unknown (had they been known,

the model would be nonstationary). Let us suppose we are required to make a prediction for the future.

If the prediction horizon is long, then we will use the global (i.e. the true or ensemble) average and the

global variance for our prediction. However, if the prediction horizon is short, then we will use the

local average at the present time and a reduced variance. More generally, by utilizing the dependence

structure of the model, which can be inferred from the observations, assuming that their length is

sufficient, we are able to make predictions conditioning on the observations, i.e. formally evaluating

Eqn. (8) for the assumed stationary stochastic model and for the required prediction horizon t.

Justified use of nonstationary models

As clarified above, in a nonstationary process, to describe change we use both stochastic variations

and deterministic functions. By definition, a deterministic function should be constructed by deduction

(the Aristotelian apodeixis), not by induction (the Aristotelian epagoge) which makes direct use of the

data (see also next section). Because it explains in deterministic terms part of the variability, a

nonstationary description is associated with reduced uncertainty. Hence an unjustified or inappropriate claim of nonstationarity results in underestimation of variability, uncertainty and risk.

However, there are cases where use of a nonstationary description is justified. Sometimes we

undertake modelling of the past, in order to explain observed behaviours. Changes in hydrological

behaviours happen all the time, as a result of changes in quantifiable characteristics of catchments and

conceptual parameters of models. If we know the evolution of these characteristics and parameters

(e.g. in addition to hydrological observations, we have information about how the percent of urban

area changed in time), then we can build a nonstationary model. The specific information has reduced

uncertainty thus enabling a nonstationary description. In contrast, if we see a changing behaviour but

we do not have this quantitative information, then we may treat the catchment characteristics and

parameters as random variables. In this case we build stationary models entailing larger uncertainty.

It is important to distinguish explanation of observed phenomena in the past from modelling

that is made for the future. Except for trivial cases, the future is not easy to predict in deterministic

terms. If changes in the recent past are foreseen to endure in the future (e.g. urbanization, hydraulic

infrastructures), then the model of the future should be adapted to the most recent past. This may

imply a stationary model of the future that is different from that of the distant past (prior to the

change). It may also require “stationarizing” of the past observations, i.e. adapting them to represent

the future conditions (e.g. the flow data prior to the construction of the dam could be adapted to

determine what the flow would be if the dam existed).

In the case of planned and controllable future changes (e.g. catchment modification by hydraulic

infrastructures, water abstractions), which indeed allow prediction in deterministic terms or at least

condition our predictions, nonstationary models are justified.

Stationarity, ergodicity and inductive inference

It is important to note that stationarity is also related to ergodicity, which in turn is a prerequisite to

make inference from data, that is, induction. In dynamical systems, by definition (e.g. Mackey, 1992,

p. 48), ergodicity is the property of a system all of whose invariant sets under the dynamic transformation are trivial (have zero probability). In other words, in an ergodic transformation starting from any point, the trajectory of the system state will visit all other points, without being trapped in a certain subset.

The ergodic theorem (Birkhoff 1931; see also Mackey 1992 p. 54) allows redefining ergodicity within

the stochastic processes domain (Papoulis 1991 p. 427; Koutsoyiannis 2010) in the following manner:

A stochastic process xt is ergodic if the time average of any (integrable) function g(xt), as time tends to

infinity, equals the true (ensemble) expectation E[g(xt)]. Thus, with reference to Eqn. (7), in an ergodic

system the time average, as t΄ → ∞, will tend to the ensemble average given in (6). This allows the

estimation (i.e. approximate calculation) of the true but unknown property E[xt] from the time average, that is

from the available data. Without ergodicity, inference from data would not be possible.
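
The definition above can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: it assumes a first-order autoregressive process as the stationary and ergodic process xt, with hypothetical parameter values (phi, mu, sigma) chosen only for illustration. It compares the time average of one long realization with the ensemble average over many independent realizations.

```python
import random
import statistics


def simulate_ar1(n, phi=0.7, mu=10.0, sigma=1.0, rng=None):
    """Simulate n steps of a stationary AR(1) process with mean mu.

    The process and its parameters are illustrative assumptions.
    """
    rng = rng or random.Random()
    # Draw the initial state from the stationary distribution, so the
    # process is stationary from the first step onwards.
    stationary_sd = sigma / (1.0 - phi ** 2) ** 0.5
    x = rng.gauss(mu, stationary_sd)
    path = []
    for _ in range(n):
        x = mu + phi * (x - mu) + rng.gauss(0.0, sigma)
        path.append(x)
    return path


rng = random.Random(42)

# Time average: a single long realization (what observed data provide).
time_average = statistics.fmean(simulate_ar1(200_000, rng=rng))

# Ensemble average: the state at a fixed time over many realizations.
ensemble_average = statistics.fmean(
    simulate_ar1(50, rng=rng)[-1] for _ in range(5_000)
)

print(f"time average     = {time_average:.3f}")
print(f"ensemble average = {ensemble_average:.3f}")
```

Under these assumptions both averages should lie close to the true mean mu, which is why a single observed record can stand in for the unobservable ensemble.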

Now, if the system that is modelled in a stochastic framework has deterministic dynamics

(meaning that a system input will give a single system response, as happens for example in most

hydrological models) then a theorem applies (Mackey 1992, theorem 4.5 p. 52), according to which a

dynamical system with dynamics St(x) has a stationary probability density if and only if it is ergodic.

Therefore, a stationary system is also ergodic and vice versa, and a nonstationary system is also non-

ergodic and vice versa. Here we note that even if a system has deterministic dynamics, again it is

legitimate to use a stochastic description, replacing the study of the evolution of system states St(x)

with the evolution of probability densities of states f(x) as already mentioned in the section “Semantics

and historical review”; one reason to prefer the stochastic description over the pure deterministic

description is that the former includes quantification of uncertainty, whereas the deterministic

dynamics does not eliminate uncertainty (Koutsoyiannis 2010). Furthermore, we clarify that the

deterministic description through the transformation St(x) is fully compatible with a stochastic

description that is stationary and ergodic, according to the theorem stated above: while the system state is

changing in time t according to the transformation St(x), its statistical properties (and the probability

density f(x)) can be constant in time.
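
A minimal sketch of this compatibility is given below. It is an illustration under stated assumptions rather than a system discussed in the paper: the deterministic transformation is taken to be a rotation of the unit interval by an irrational step (the golden-ratio conjugate), a standard example of an ergodic map whose stationary density is uniform. The function names and the number of bins are hypothetical.

```python
import math

# Assumed irrational rotation step (golden-ratio conjugate).
GOLDEN_STEP = (math.sqrt(5.0) - 1.0) / 2.0


def rotate(x, step=GOLDEN_STEP):
    """Deterministic transformation S(x) = (x + step) mod 1."""
    return (x + step) % 1.0


def trajectory(x0, n):
    """Return n successive states of the system started at x0."""
    path, x = [], x0
    for _ in range(n):
        x = rotate(x)
        path.append(x)
    return path


def bin_frequencies(path, n_bins=10):
    """Relative frequency of visits to each of n_bins equal sub-intervals."""
    counts = [0] * n_bins
    for x in path:
        counts[min(int(x * n_bins), n_bins - 1)] += 1
    return [c / len(path) for c in counts]


path = trajectory(0.1, 20_000)

# The state changes at every step ...
print("first states:", [round(x, 3) for x in path[:5]])

# ... while the empirical density is the same in both halves of the record.
early = bin_frequencies(path[:10_000])
late = bin_frequencies(path[10_000:])
print("early:", [round(f, 3) for f in early])
print("late: ", [round(f, 3) for f in late])

time_average = sum(path) / len(path)
```

In this sketch the trajectory never settles, yet the visit frequencies of every sub-interval stay near the uniform value in both halves of the record: the state changes in time while its statistical description does not.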

If the system dynamics is stochastic (a single input could result in multiple outputs), then

ergodicity and stationarity do not necessarily coincide. However, recalling that a stochastic process is

a model and not part of the real world, we can always conveniently devise a stochastic process that is

ergodic, provided that we have excluded nonstationarity. To clarify this idea we give the following

example. A well-known stochastic process xt which is stationary and non-ergodic is (Papoulis 1991, p.

434)

x_t = a + u_t    (9)

where a is a random variable and ut is a stationary and ergodic stochastic process. A single realization

of xt does not enable the estimation of its true statistics because all its history corresponds to a single

realized value of the random variable a. However, the single realization allows the estimation of the

statistics of the process x΄_t = a + u_t with a fixed at that realized value (i.e. xt conditional on a), which is stationary and ergodic. In most practical

problems corresponding to physical systems, which actually do not allow different realizations but

rather have a unique evolution, the properties of the ergodic process x΄t rather than those of the non-

ergodic xt are actually of interest.
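
The example can be sketched numerically as follows. This is an illustration under assumptions, not the authors' computation: the process is taken in the additive form x_t = a + u_t, the random variable a is assumed Gaussian with zero mean and standard deviation 5, and ut is assumed to be Gaussian white noise with unit variance. All names and parameter values are hypothetical.

```python
import random
import statistics


def realization(n, rng):
    """One realization of x_t = a + u_t (assumed additive form).

    The level a is drawn once and kept for the whole history;
    u_t is drawn anew at every time step.
    """
    a = rng.gauss(0.0, 5.0)  # assumed distribution of the random level
    x = [a + rng.gauss(0.0, 1.0) for _ in range(n)]  # u_t: white noise
    return a, x


rng = random.Random(1)
results = [realization(5_000, rng) for _ in range(200)]

levels = [a for a, _ in results]
time_averages = [statistics.fmean(x) for _, x in results]

# Each time average converges to the realized level a of its own history,
# not to the ensemble expectation E[x_t] = 0.
max_gap_to_level = max(abs(m - a) for m, a in zip(time_averages, levels))
spread_of_time_averages = statistics.pstdev(time_averages)

print(f"largest |time average - realized a| = {max_gap_to_level:.3f}")
print(f"spread of time averages across realizations = {spread_of_time_averages:.3f}")
```

Under these assumptions the time averages differ widely from one realization to another, so no single history reveals the statistics of xt, but each of them estimates well the statistics of the conditional (ergodic) process tied to its own realized level.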

In conclusion, from a practical point of view ergodicity can always be assumed when there is

stationarity, while this assumption is fully justified by the theory if the system dynamics is

deterministic. Conversely, if nonstationarity is assumed, then ergodicity cannot hold, which forbids

inference from data. This contradicts the basic premise in geosciences, where data are the only reliable

information in building models and making inference and prediction.

Conclusions

Change is Nature’s style and occurs at all times and all time scales (i.e. Πάντα ῥεῖ). The frequent use

of the term nonstationarity lately indicates that it is confused with change. However, change is not

synonymous with nonstationarity: while change is a general notion applicable everywhere, including to

the real (material) world, stationarity and nonstationarity apply only to models, not to the real world,

and are defined within stochastics. The confusion about these terms extends also to the world of

models, as several properties related to stochastic dependence of processes are often interpreted as

nonstationarity. Nonstationary descriptions are justified only if the future can be predicted in

deterministic terms and are associated with reduction of uncertainty, while misuse of nonstationarity

results in underestimation of variability, uncertainty and risk. In the absence of credible predictions of the

future, admitting stationarity (and larger uncertainty) provides a more consistent and more effective

modelling option.

Acknowledgment The present work was developed within the framework of the Panta Rhei Research

Initiative of the International Association of Hydrological Sciences (IAHS). We thank two anonymous

reviewers, the eponymous reviewer Tim Cohn and the Guest Editor Guillaume Thirel for their

comments which resulted in expansion and improvement of the paper.

References

Bachelier, M. L., 1900. Théorie de la spéculation, Ann. Ecole Norm. Super. 17, 21-86.

Birkhoff, G. D., 1931. Proof of the ergodic theorem, Proc. Nat. Acad. Sci., 17, 656–660.

Commission for Hydrology, 2013, CHy Statement on the Terms Stationarity and Nonstationarity, World

Meteorological Organization,

http://www.wmo.int/pages/prog/hwrp/publications/statements/Stationarity_CHy_Statement.pdf.

Conrad, B. P., 2003, Differential Equations, A Systems Approach, Prentice Hall, Upper Saddle River, NJ, USA,

462 pp.

Doob, J.L., 1934. Stochastic processes and statistics. Proceedings of the National Academy of Sciences of the

United States of America, 20(6), p.376.

Georgiou, H., and Sharma, M. D. 2012, University students’ understanding of thermal physics in everyday

contexts, International Journal of Science and Mathematics Education, 10, 1119-1142.

Kendall, M. G., and Stuart, A., 1966, The Advanced Theory of Statistics, vol. 3: Design and Analysis, and Time-

Series, Griffin, London, UK, 552 pp.

Khintchine, A., 1934. Korrelationstheorie der stationären stochastischen Prozesse. Mathematische Annalen, 109

(1), 604–615.

Kolmogorov, A. N., 1931. Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung, Math. Ann.

104, 415-458. (English translation: On analytical methods in probability theory, In: Kolmogorov, A.N.,

1992. Selected Works of A. N. Kolmogorov - Volume 2, Probability Theory and Mathematical Statistics

A. N. Shiryayev, ed., Kluwer, Dordrecht, The Netherlands, pp. 62-108).

Kolmogorov, A. N., 1933. Grundbegriffe der Wahrscheinlichkeitsrechnung (Ergebnisse der Mathematik und

Ihrer Grenzgebiete), Berlin (2nd English Edition: Foundations of the Theory of Probability, 84 pp.

Chelsea Publishing Company, New York, 1956).

Kolmogorov, A.N., 1938, A simplified proof of the Birkhoff-Khinchin ergodic theorem, Uspekhi Mat. Nauk 5,

52-56. (English edition: Kolmogorov, A.N., 1991, Selected Works of A. N. Kolmogorov - Volume 1,

Mathematics and Mechanics, Tikhomirov, V. M. ed., Kluwer, Dordrecht, The Netherlands, pp. 271-276).

Kolmogorov, A.N., 1947, Statistical theory of oscillations with continuous spectrum, Collected papers on the

30th anniversary of the Great October Socialist Revolution, Vol. 1, Akad. Nauk SSSR, Moscow-

Leningrad, pp. 242-252. (English edition: Kolmogorov, A.N., 1992. Selected Works of A. N.

Kolmogorov - Volume 2, Probability Theory and Mathematical Statistics A. N. Shiryayev ed., Kluwer,

Dordrecht, The Netherlands, pp. 321-330).

Koutsoyiannis, D., 2010, A random walk on water, Hydrology and Earth System Sciences, 14, 585–601.

Koutsoyiannis, D., 2011, Hurst-Kolmogorov dynamics and uncertainty, Journal of the American Water

Resources Association, 47 (3), 481–495.

Lasota, A., and Mackey, M. C., 1994, Chaos, Fractals and Noise, Springer-Verlag.

Mackey, M. C. 1992, Time’s Arrow: The Origins of Thermodynamic Behavior, Dover, Mineola, NY, USA, 175

pp.

Milly, P. C. D., J. Betancourt, M. Falkenmark, R. M. Hirsch, Z. W. Kundzewicz, D. P. Lettenmaier and R. J.

Stouffer, 2008, Stationarity is dead: whither water management?, Science, 319, 573-574.

Montanari, A., and D. Koutsoyiannis, 2012, A blueprint for process-based modeling of uncertain hydrological

systems, Water Resources Research, 48, W09555, doi:10.1029/2011WR011412.

Montanari, A., et al. 2013, “Panta Rhei – Everything Flows”, Change in Hydrology and Society – The IAHS

Scientific Decade 2013-2022, Hydrological Sciences Journal, 58 (6), 1256–1275.

Papoulis, A., 1991, Probability, Random Variables, and Stochastic Processes, 3rd ed., McGraw-Hill, New York,

USA.

Parzen, E. 1957. On consistent estimates of the spectrum of a stationary time series, The Annals of Mathematical

Statistics 28 (2), 329-348.

Wiggins, S., 2003. Introduction to applied nonlinear dynamical systems and chaos, Ed. 2. Springer-Verlag, New

York, USA, 843 pp.