Climacogram vs. autocovariance and power spectrum in stochastic modelling for Markovian and Hurst-Kolmogorov processes
Panayiotis Dimitriadis* and Demetris Koutsoyiannis
Department of Water Resources and Environmental Engineering, School of Civil Engineering, National Technical University of Athens, Heroon Polytechneiou 5, 158 80 Zographou, Greece
*corresponding author, email: pandim@itia.ntua.gr, tel.: +30-210-772-28-38, fax: +30-210-772-28-32.
Abstract
Three common stochastic tools, the climacogram (i.e. the variance of the time-averaged process as a function of the averaging time scale), the autocovariance function and the power spectrum, are compared to each other to assess each one’s advantages and disadvantages in stochastic modelling and statistical inference. Although in theory all three are equivalent to each other (being transformations of one another that express second-order stochastic properties), in practical application their ability to characterize a geophysical process and their utility as statistical estimators may vary. In the analysis both Markovian and non-Markovian stochastic processes, which have exponential and power-type autocovariances, respectively, are used. It is shown that, due to high bias in autocovariance estimation, as well as effects of process discretization and finite sample size, the power spectrum is also prone to bias and discretization errors as well as high uncertainty, which may misrepresent the process behaviour (e.g. Hurst phenomenon) if not taken into account. Moreover, it is shown that the classical climacogram estimator has small error as well as an expected value that is always positive, well-behaved and close to its mode (most probable value), all of which are important advantages in stochastic model building. In contrast, the power spectrum and the autocovariance do not have some of these properties. Therefore, when building a stochastic model, it seems beneficial to start from the climacogram, rather than the power spectrum or the autocovariance. The results are illustrated by a real-world application based on the analysis of a long time series of high-frequency turbulent flow measurements.
Keywords: stochastic modelling; climacogram; autocovariance; power spectrum; uncertainty; bias; turbulence
1. Introduction
The power spectrum (or spectral density) was introduced more than a century ago by Schuster as a tool to estimate the distribution of the power (i.e. energy over time) of a sample over frequency (Stoica and Moses, 2004, p. xiii). Since then, various methods have been proposed and used to estimate the power spectrum, via the Fourier transform of the time series (periodogram) or of its autocovariance or autocorrelation functions (for more information on these methods see Stoica and Moses, 2004, ch. 2 and Gilgen et al., 2006, ch. 9). The most common (and the one used in this paper) is that based on the autocovariance, which corresponds to the definition of the power spectrum of a stochastic process (for details, see sect. 2.3). However, this accurate mathematical definition lacks immediate physical interpretation, since the Fourier transform of a function is nothing more than a mathematical tool to represent the function in the frequency domain in order to identify any periodic patterns which are not easily tracked in the time domain.
Several researchers have tried in the past to evaluate the statistical estimator of the power spectrum, concluding that its major disadvantage is its large variance (Stoica and Moses, 2004, p. xiv). Notably, this variance is not reduced with increased sample size (Papoulis, 1991, p. 447). To remedy this, several mathematical smoothing techniques (e.g. windowing, regression analysis; see Stoica and Moses, 2004, ch. 2.6) have been developed. In cases of short datasets, trend-line approaches are most commonly used to obtain a very rough estimation of the model behaviour, or simple rules to distinguish exponential and power-type behaviours (e.g., Fleming, 2008). In cases of long datasets, the most commonly used approach is windowing (data partitioning), also known as the Welch approach, where a certain window function (the simplest of which is the Bartlett window) is applied to nearly independent segments. In the latter method, one has first to divide the sample into several segments (but only after ensuring that these segments have very small correlations between them), to calculate the power spectrum for each segment and then to estimate the average. Assuming that the process is stationary, this average will be the power spectrum estimate. Unfortunately, the more segments we divide the sample into, the more the cross-correlations between segments increase and the more low-frequency values we lose (since the lowest frequency is determined by the length of the segments). Thus, this method could indeed be a robust one, but only for a very long sample (which is a rare case in geophysics), only when there is no interest in the low-frequency values (which can reveal large-scale behaviours) and only for an unbiased power spectrum estimator or at least for an ‘a priori’ known bias, e.g. via an analytical equation (which, as we will show in this study, is rarely the case). Based on these limitations, Dimitriadis et al. (2012) and Koutsoyiannis (2013a, b) provided examples where this smoothing technique fails to detect the large-scale behaviour (i.e. Hurst phenomenon), gives small-scale trends that are completely different from the ones characterizing the stochastic model and has several numerical calculation problems that could cause misinterpretation (see sect. 4 and Fig. 10d for an illustrative example of the limitations of this method). All these are due to the fact that the power spectrum estimator is biased and it is difficult to estimate this bias analytically. Nevertheless, the power spectrum is a useful tool to analyze a sample in harmonic functions and thus to detect any dominant frequencies (this is the reason behind harmonic analysis, introduced by J. Fourier, 1822).
In this paper, we investigate the bias in the power spectrum estimator (evaluated via the autocovariance), which is caused by the bias of the autocovariance, the finite sample size and the discretization of the continuous-time process, complementing earlier studies (e.g. Stoica and Moses, 2004, ch. 2.4). We also examine the asymptotic behaviour when the sample size tends to infinity, investigating whether or not the discrete power spectrum estimator is asymptotically unbiased. We perform similar investigations for the climacogram, a term coined by Koutsoyiannis (2010) to describe the variance of the time-averaged process as a function of time scale. The concepts of autocovariance, power spectrum and climacogram are examined using both exponential and power-type autocovariances, as well as combinations thereof, in order to obtain representative results for most types of geophysical processes.
In sect. 2, we give the definitions of the concepts used in the paper and in sect. 3, we investigate the estimation of the climacogram, the autocovariance and the power spectrum for some characteristic processes, and we compare their classical estimators based on illustrative examples. In sect. 4, we present an application of these stochastic tools to a small-scale turbulent process and propose certain practices to be used in stochastic modelling. Finally, in sect. 5 we summarize the analyses and derive some conclusions.
2. Definitions and notations
Stochastic processes are families of random variables (denoted as $\underline{x}(t)$, where underlined symbols denote random variables and t denotes time) that are often used to represent the temporal evolution of natural processes. Natural processes, as well as their mathematical representation as stochastic processes, evolve in continuous time. However, time series observed from these processes are characterized by a sampling time interval D, often fixed by the observer, and a response time Δ of the instrument (Fig. 1). The time constants D and Δ affect the estimation of the statistical properties of the continuous-time process. Two special cases, Δ ≈ 0 and D = Δ, are analyzed by Koutsoyiannis (2013a), who shows that in most tasks the differences are small; thus, here we focus only on the case D = Δ > 0, which is also practical for samples with small D (the Markovian process for any D and Δ, in terms of its autocovariance, is shown in sect. 4 of the supplementary material, abbreviated as SM). Thus, the discrete-time stochastic process $\underline{x}_i^{(\Delta)}$, for D = Δ > 0, can be calculated from $\underline{x}(t)$ as:
$$\underline{x}_i^{(\Delta)} = \frac{1}{\Delta}\int_{(i-1)\Delta}^{i\Delta} \underline{x}(\xi)\,\mathrm{d}\xi \qquad (1)$$
where $i = 1, \ldots, n$ is an index representing discrete time, $n = \lfloor T/\Delta \rfloor$ is the total number of observations and $T$ is the time length of observations.
Figure 1: An example of a continuous time process sampled at time intervals D for a total period T and with instrument response time Δ.
2.1 Climacogram
The term climacogram (Koutsoyiannis, 2013a) comes from the Greek word climax (meaning scale). The climacogram is defined as the (plot of the) variance of the averaged process (assumed stationary) versus the averaging time scale m and is symbolized by γ(m). The climacogram is useful for detecting the long-term change (or else dependence, persistence, clustering) of a process. This can be quantified through the Hurst coefficient H, which equals half the slope of the climacogram in a log-log plot, as scale tends to infinity, plus 1. For sufficiently large scales, if 0 ≤ H < 0.5 the process is anti-correlated (for more information see e.g., Koutsoyiannis, 2010), for 0.5 < H ≤ 1 the process is positively correlated (the most common case in geophysical processes) and for H = 0.5 the process is purely random (zero autocorrelation, thus white noise behaviour) at these large scales. Long-term persistence in natural processes was first discovered by H.E. Hurst (1951), while A. Kolmogorov (1941) mathematically described it, working on self-similar processes while studying turbulence. This behaviour is also known as the Hurst phenomenon or Hurst-Kolmogorov (HK) behaviour (Koutsoyiannis, 2010). A stochastic process with HK behaviour and a constant climacogram slope $2H - 2$ for all scales m (not only asymptotically) is known as a Hurst-Kolmogorov process or fractional Gaussian noise (see sect. 2 of the SM). In Table 1, we introduce the climacogram definition for a stochastic process in continuous time (eq. 2) and in discrete time (eq. 3), a widely used climacogram estimator (eq. 4), as well as the expectation of the latter estimator expressed as a function of the true climacogram (eq. 5).
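Table 1: Climacogram definition and expressions for a process in continuous and discrete time, along with the properties of its estimator.
Type | Climacogram | Eq.
continuous | $\gamma(m) := \mathrm{Var}\left[\int_t^{t+m} \underline{x}(\xi)\,\mathrm{d}\xi\right] / m^2 = \mathrm{Var}\left[\int_0^m \underline{x}(\xi)\,\mathrm{d}\xi\right] / m^2$, where $m \in \mathbb{R}^+$ and $\gamma(0) := \mathrm{Var}[\underline{x}(t)]$ | (2)
discrete | $\gamma_d^{(\Delta)}(k) := \mathrm{Var}\left[\sum_{l=k(i-1)+1}^{ki} \underline{x}_l^{(\Delta)}\right] / k^2 = \mathrm{Var}\left[\sum_{l=1}^{k} \underline{x}_l^{(\Delta)}\right] / k^2 = \gamma(k\Delta)$, where $k \in \mathbb{N}$ is the dimensionless scale for a discrete time process | (3)
classical estimator | $\hat{\underline{\gamma}}_d^{(\Delta)}(k) = \frac{1}{n/k - 1} \sum_{i=1}^{n/k} \left( \frac{1}{k} \sum_{l=k(i-1)+1}^{ki} \underline{x}_l^{(\Delta)} - \frac{1}{n} \sum_{l=1}^{n} \underline{x}_l^{(\Delta)} \right)^2$ | (4)
expectation of classical estimator | $\mathrm{E}\left[\hat{\underline{\gamma}}_d^{(\Delta)}(k)\right] = \frac{1 - \gamma_d^{(\Delta)}(n) / \gamma_d^{(\Delta)}(k)}{1 - k/n}\, \gamma_d^{(\Delta)}(k)$ | (5)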
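As an illustration of eq. 4 and eq. 5, a minimal Python sketch is given below. It assumes that the estimator averages the series over non-overlapping blocks of k values and takes deviations from the overall sample mean; the function names and the callable `gamma` (the true climacogram as a function of the integer scale) are hypothetical helpers introduced only for this example.

```python
import numpy as np

def climacogram_estimate(x, k):
    """Classical climacogram estimate at integer scale k (cf. eq. 4)."""
    x = np.asarray(x, dtype=float)
    n_blocks = x.size // k                 # number of complete blocks of length k
    if n_blocks < 2:
        raise ValueError("need at least two complete blocks of length k")
    # time-averaged process at scale k (non-overlapping block means)
    block_means = x[: n_blocks * k].reshape(n_blocks, k).mean(axis=1)
    # deviations are taken from the mean of the whole sample (assumption)
    return np.sum((block_means - x.mean()) ** 2) / (n_blocks - 1)

def expected_climacogram(gamma, k, n):
    """Expectation of the classical estimator (cf. eq. 5).

    gamma: hypothetical callable returning the true climacogram at an
    integer scale; n is assumed to be a multiple of k.
    """
    return (1.0 - gamma(n) / gamma(k)) / (1.0 - k / n) * gamma(k)
```

For a white noise process with unit variance, gamma(k) = 1/k, and the expectation above reduces to 1/k, i.e. the estimator is unbiased in that special case; for persistent processes the factor in eq. 5 is smaller than 1.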
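2.2 Autocovariance
The climacogram is fully determined if the autocovariance is known and vice versa. The specifics of the autocovariance, including its definition and estimator, are displayed in Table 2. Note that the autocovariance is an even function.
Table 2: Autocovariance definition and expressions for a process in continuous and discrete time, along with the properties of its estimator.
Type | Autocovariance | Eq.
continuous* | $c(\tau) := \mathrm{Cov}[\underline{x}(t), \underline{x}(t+\tau)] = \frac{1}{2} \frac{\mathrm{d}^2 (\tau^2 \gamma(\tau))}{\mathrm{d}\tau^2}$, where $\tau \in \mathbb{R}$ is the lag for a continuous time process (in time units) | (6)
discrete | $c_d^{(\Delta)}(j) := \mathrm{Cov}\left[\underline{x}_i^{(\Delta)}, \underline{x}_{i+j}^{(\Delta)}\right] = \frac{1}{2} \left( (j+1)^2 \gamma((j+1)\Delta) + (j-1)^2 \gamma((j-1)\Delta) - 2 j^2 \gamma(j\Delta) \right)$, where $j \in \mathbb{Z}$ is the lag for the process at discrete time (dimensionless) and the right-hand side of the equation corresponds to the 2nd central finite derivative of $j^2 \gamma(j\Delta)/2$ | (7)
classical estimator | $\hat{\underline{c}}_d^{(\Delta)}(j) = \frac{1}{\zeta(j)} \sum_{i=1}^{n-j} \left( \underline{x}_i^{(\Delta)} - \frac{1}{n} \sum_{l=1}^{n} \underline{x}_l^{(\Delta)} \right) \left( \underline{x}_{i+j}^{(\Delta)} - \frac{1}{n} \sum_{l=1}^{n} \underline{x}_l^{(\Delta)} \right)$, where $\zeta(j)$ is usually taken as n or n − 1 or n − j | (8)
expectation of classical estimator** | $\mathrm{E}\left[\hat{\underline{c}}_d^{(\Delta)}(j)\right] = \frac{1}{\zeta(j)} \left( (n-j)\, c_d^{(\Delta)}(j) + \frac{j^2}{n} \gamma(j\Delta) - j \gamma(n\Delta) - \frac{(n-j)^2}{n} \gamma((n-j)\Delta) \right)$ | (9)
* Eq. 6 can also be solved in terms of γ to yield (Koutsoyiannis 2013a): $\gamma(m) = 2 \int_0^1 (1 - \xi)\, c(\xi m)\, \mathrm{d}\xi$.
** For proof see Appendix.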
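The estimator of eq. 8 and its expectation in eq. 9 can be sketched in Python as follows. This is only an illustration under stated assumptions: the normalising constant defaults to ζ(j) = n, the argument `zeta` is a hypothetical callable of (j, n), and `gamma` is a hypothetical callable giving the true climacogram at an integer scale (with Δ = 1); the discrete-time autocovariance inside the expectation is obtained from the climacogram through eq. 7.

```python
import numpy as np

def autocovariance_estimate(x, j, zeta=None):
    """Classical autocovariance estimate at lag j (cf. eq. 8)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if not 0 <= j < n:
        raise ValueError("lag must satisfy 0 <= j <= n - 1")
    residuals = x - x.mean()
    norm = n if zeta is None else zeta(j, n)   # assumed default: zeta(j) = n
    return float(np.dot(residuals[: n - j], residuals[j:])) / norm

def expected_autocovariance(gamma, j, n, zeta=None):
    """Expectation of the classical estimator (cf. eq. 9)."""
    def big_gamma(k):
        # variance of the sum of k consecutive values: k^2 * gamma(k)
        return k * k * gamma(k) if k > 0 else 0.0
    # true discrete-time autocovariance from the climacogram (eq. 7)
    c_j = 0.5 * (big_gamma(j + 1) + big_gamma(abs(j - 1)) - 2.0 * big_gamma(j))
    norm = n if zeta is None else zeta(j, n)
    total = ((n - j) * c_j + big_gamma(j) / n
             - j * gamma(n) - big_gamma(n - j) / n)
    return total / norm
```

For white noise with unit variance (gamma(k) = 1/k) and ζ(j) = n, the sketch gives an expected lag-one estimate of −(n − 1)/n², a small negative value although the true autocovariance is zero, which is a simple instance of the bias discussed in the text.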
It is easy to see that for Δ > 0:
$$c_d^{(\Delta)}(0) := \gamma(\Delta) < \gamma(0) = c(0) \qquad (10)$$
2.3 Power spectrum
Historically the power spectrum was defined in terms of the Fourier transform of the process x(t), by taking the expected value of the squared norm of the transform for time tending to infinity, which for a stationary process converges to the Fourier transform of its autocovariance (this is known as the Wiener-Khintchine theorem after Wiener, 1930, and Khintchine, 1934). Both definitions can be used for the power spectrum; however, the latter is simpler and more operational and has been preferred in modern texts (e.g. Papoulis, 1991, ch. 12.4). In Table 3, we summarize the basic equations for the power spectrum definition and estimation.
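Table 3: Power spectrum definition and expressions for a process in continuous and discrete time, along with the properties of its estimator.
Type | Power spectrum | Eq.
continuous* | $s(w) := 4 \int_0^\infty c(\tau) \cos(2\pi w \tau)\, \mathrm{d}\tau$, where $w \in \mathbb{R}$ is the frequency for a continuous time process (in inverse time units) | (11)
discrete** | $s_d^{(\Delta)}(\omega) := 2 \gamma(\Delta) + 4 \sum_{j=1}^{\infty} c_d^{(\Delta)}(j) \cos(2\pi\omega j)$, where $\omega \in \mathbb{R}$ is the frequency for a discrete time process (dimensionless; ω = wΔ) | (12)
classical estimator*** | $\hat{\underline{s}}_d^{(\Delta)}(\omega) = 2\, \hat{\underline{c}}_d^{(\Delta)}(0) + 4 \sum_{j=1}^{n-1} \hat{\underline{c}}_d^{(\Delta)}(j) \cos(2\pi\omega j)$ | (13)
expectation of classical estimator*** | $\mathrm{E}\left[\hat{\underline{s}}_d^{(\Delta)}(\omega)\right] = \frac{2n\,(\gamma(\Delta) - \gamma(n\Delta))}{\zeta(0)} + 4 \sum_{j=1}^{n-1} \frac{\cos(2\pi\omega j)}{\zeta(j)} \left( (n-j)\, c_d^{(\Delta)}(j) + \frac{j^2}{n} \gamma(j\Delta) - j \gamma(n\Delta) - \frac{(n-j)^2}{n} \gamma((n-j)\Delta) \right)$ | (14)
* Eq. 11 can be solved in terms of c to yield: $c(\tau) = \int_0^\infty s(w) \cos(2\pi w \tau)\, \mathrm{d}w$.
** Eq. 12 can be solved in terms of $c_d^{(\Delta)}$ to yield: $c_d^{(\Delta)}(j) = \int_0^{1/2} s_d^{(\Delta)}(\omega) \cos(2\pi\omega j)\, \mathrm{d}\omega$.
*** Eq. 13 and 14 are more easily calculated with fast Fourier transform (fft) algorithms.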
Note that the power spectrum is an even function. As easily verified from eq. 12, in discrete time the power spectrum is periodic with period 1. Continuous and discrete time power spectra can be linked to each other by the simple equation (Koutsoyiannis, 2013a):
$$s_d^{(\Delta)}(\omega) = \frac{1}{\Delta} \sum_{j=-\infty}^{\infty} s\!\left(\frac{\omega + j}{\Delta}\right) \frac{\sin^2(\pi(\omega + j))}{(\pi(\omega + j))^2} \qquad (15)$$
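The cosine series of eq. 12 and 13 can be evaluated directly, as in the minimal Python sketch below. It is a plain summation over the supplied lags rather than an fft-based computation, and the function name and argument layout are assumptions made for this illustration.

```python
import numpy as np

def power_spectrum_from_autocovariance(c, omega):
    """Evaluate 2 c[0] + 4 sum_{j>=1} c[j] cos(2 pi omega j).

    c: autocovariance values (true or estimated) at lags 0, 1, ..., len(c) - 1
    omega: dimensionless frequency or array of frequencies
    """
    c = np.asarray(c, dtype=float)
    omega = np.atleast_1d(np.asarray(omega, dtype=float))
    lags = np.arange(1, c.size)
    # one row per frequency, one column per lag
    cosines = np.cos(2.0 * np.pi * np.outer(omega, lags))
    return 2.0 * c[0] + 4.0 * cosines @ c[1:]
```

Evaluating the function at ω, −ω and ω + 1 returns the same value, which reflects the evenness and the unit periodicity noted above; consequently only frequencies in [0, 0.5] carry distinct information.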
3. Statistical behaviour of the estimation of climacogram, autocovariance and power spectrum
Various physical interpretations of geophysical processes are based on the power spectrum and/or autocovariance behaviour (e.g. spectral density function of free isotropic turbulence, see Pope, 2010, p. 610). However, the estimation of these tools from data may distort the true behaviour of the process and thus may lead to wrong or unnecessarily complicated interpretation. To study the possible distortion we use the simplest processes often met in geophysics, which could also be used in synthesizing more complicated ones. Specifically, we investigate and compare the climacogram, autocovariance and power spectrum of various simple stochastic processes (whose expressions are presented in sect. 3.1) in terms of their behaviour and of their estimator performance (sect. 3.2 and 3.3) for different values of their parameters.
3.1 Testing stochastic models
To investigate the statistical behaviour of the estimators of the three tools, climacogram, autocovariance and power spectrum, we use two simple models. The first is the well-known Markovian model, also known as the Ornstein-Uhlenbeck model, which has an exponentially decaying autocovariance. The second is a generalization of the HK process (abbreviated as gHK), whose autocovariance decays as a power function of lag for large time lags, while it is virtually an exponential function of lag for small lags. Note that in sect. 2 of the SM, we also test the HK model.
In Tables 4 and 5, we provide the mathematical expressions of the climacogram, autocovariance and power spectrum of a Markovian and a gHK stochastic process, respectively, in continuous and discrete time. Their estimates can be found from eq. 5, 9 and 14, and their model parameters, λ and q, have dimensions $[x^2]$ and T, respectively, while b is dimensionless.
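Table 4: Climacogram, autocovariance and power spectrum expressions of a Markovian process, in continuous and discrete time.
Type | Markovian process | Eq.
Autocovariance (continuous) | $c(\tau) = \lambda e^{-|\tau|/q}$ | (16)
Autocovariance (discrete) | $c_d^{(\Delta)}(j) = \lambda \frac{(1 - e^{-\Delta/q})^2}{(\Delta/q)^2} e^{-(|j|-1)\Delta/q}$ for $|j| \ge 1$ and $c_d^{(\Delta)}(0) = \gamma(\Delta)$ | (17)
Climacogram (for continuous and discrete) | $\gamma(m) = \frac{2\lambda}{(m/q)^2} \left( \frac{m}{q} + e^{-m/q} - 1 \right)$ with $\gamma(0) = \lambda$ | (18)
Power spectrum (continuous) | $s(w) = \frac{4\lambda q}{1 + 4\pi^2 q^2 w^2}$ | (19)
Power spectrum (discrete) | $s_d^{(\Delta)}(\omega) = \frac{4\lambda q}{\Delta} \left( 1 - \frac{q}{\Delta} \frac{(1 - \cos(2\pi\omega)) \sinh(\Delta/q)}{\cosh(\Delta/q) - \cos(2\pi\omega)} \right)$ | (20)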
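The Markovian expressions of Table 4 can be cross-checked numerically. The following Python sketch, with hypothetical function names and illustrative parameter values, compares eq. 17 with the second finite difference of eq. 7 applied to the climacogram of eq. 18, and eq. 20 with a truncated version of the cosine series of eq. 12.

```python
import numpy as np

def markov_climacogram(m, lam, q):
    """Eq. 18, for m > 0."""
    u = m / q
    return 2.0 * lam * (u + np.exp(-u) - 1.0) / u**2

def markov_cd(j, lam, q, delta):
    """Eq. 17, for |j| >= 1."""
    a = delta / q
    return lam * (1.0 - np.exp(-a)) ** 2 / a**2 * np.exp(-(abs(j) - 1) * a)

def cd_from_climacogram(j, lam, q, delta):
    """Eq. 7 applied to eq. 18 (integer lag j >= 1)."""
    def big_gamma(k):
        return k * k * markov_climacogram(k * delta, lam, q) if k > 0 else 0.0
    return 0.5 * (big_gamma(j + 1) + big_gamma(j - 1) - 2.0 * big_gamma(j))

def markov_sd(omega, lam, q, delta):
    """Eq. 20."""
    a = delta / q
    cw = np.cos(2.0 * np.pi * omega)
    return 4.0 * lam / a * (1.0 - (1.0 - cw) * np.sinh(a) / (a * (np.cosh(a) - cw)))

def markov_sd_by_summation(omega, lam, q, delta, n_lags=5000):
    """Eq. 12 truncated at n_lags (an assumption that suits moderate q / delta)."""
    j = np.arange(1, n_lags + 1)
    series = np.sum(markov_cd(j, lam, q, delta) * np.cos(2.0 * np.pi * omega * j))
    return 2.0 * markov_climacogram(delta, lam, q) + 4.0 * series
```

With λ = 1, q = 10 and Δ = 1, the two routes to the discrete-time autocovariance and to the discrete-time power spectrum agree to numerical precision, and at ω = 0 eq. 20 reduces to 4λq/Δ.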
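Table 5: Climacogram, autocovariance and power spectrum expressions of a positively correlated gHK process, with 0 < b < 1, in continuous and discrete time.
Type | gHK process | Eq.
Autocovariance (continuous) | $c(\tau) = \lambda \left( |\tau|/q + 1 \right)^{-b}$ with $b = 2 - 2H$ | (21)
Autocovariance (discrete) | $c_d^{(\Delta)}(j) = \lambda \frac{\left( |j-1|\Delta/q + 1 \right)^{2-b} + \left( |j+1|\Delta/q + 1 \right)^{2-b} - 2 \left( |j|\Delta/q + 1 \right)^{2-b}}{(\Delta/q)^2 (1-b)(2-b)}$ for j ≥ 1, with $c_d^{(\Delta)}(0) = \gamma(\Delta)$ | (22)
Climacogram (for continuous and discrete) | $\gamma(m) = \frac{2\lambda \left( (m/q + 1)^{2-b} - (2-b)\, m/q - 1 \right)}{(1-b)(2-b)(m/q)^2}$ with $\gamma(0) = \lambda$ | (23)
Power spectrum (continuous) | $s(w) = \frac{4\lambda q\, \Gamma(1-b) \sin\left( \pi b/2 + 2\pi q |w| \right)}{(2\pi q |w|)^{1-b}} - \frac{4\lambda q}{1-b}\; {}_1F_2\!\left( 1;\; 1 - \frac{b}{2},\; \frac{3}{2} - \frac{b}{2};\; -\pi^2 q^2 w^2 \right)$ (where ${}_1F_2$ is the hypergeometric function) | (24)
Power spectrum (discrete) | not a closed-form expression for q > 0* | -
* Eq. 12 could not be analysed further.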
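As a check on the gHK expressions, the Python sketch below compares the closed-form climacogram of eq. 23 with a numerical evaluation of the relation γ(m) = 2∫(1 − ξ) c(ξm) dξ (footnote of Table 2), using the autocovariance of eq. 21. The function names, the midpoint quadrature and the parameter values are assumptions made for illustration only.

```python
import numpy as np

def ghk_autocovariance(tau, lam, q, b):
    """Eq. 21."""
    return lam * (np.abs(tau) / q + 1.0) ** (-b)

def ghk_climacogram(m, lam, q, b):
    """Eq. 23, for m > 0 and 0 < b < 1."""
    u = m / q
    numerator = (u + 1.0) ** (2.0 - b) - (2.0 - b) * u - 1.0
    return 2.0 * lam * numerator / ((1.0 - b) * (2.0 - b) * u**2)

def climacogram_by_quadrature(m, lam, q, b, n_nodes=200_000):
    """Midpoint rule for 2 * integral_0^1 (1 - xi) c(xi m) d xi."""
    xi = (np.arange(n_nodes) + 0.5) / n_nodes
    return 2.0 * np.mean((1.0 - xi) * ghk_autocovariance(xi * m, lam, q, b))

def hurst_from_b(b):
    """Hurst coefficient implied by b = 2 - 2H."""
    return 1.0 - b / 2.0
```

For b = 0.2 the log-log slope of eq. 23 between very large scales approaches −b = −0.2, consistent with H = 0.9.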
It should be noted that the gHK process can be considered as an HK process that gives a finite autocovariance value at zero lag, which is the common case in geophysical processes (an HK process with autocovariance proportional to $|\tau|^{-b}$ gives infinity at zero lag). Thus, a parameter q is added to the HK process, indicating the limit between HK behaviour (q << |τ|) and behaviour affected by the minimum scale limit of the process (q >> |τ|). To switch to an HK process from the gHK one in the equations of Table 5, we can replace λ with $\lambda q^{-b}$ and then estimate the limit $q \to 0$ (see sect. 2.1 of the SM).
The expressions in Tables 4-5 are derived starting from the true autocovariance in continuous time (since most studies have preferred autocovariance-based computations; however, the easiest way would be to start from the climacogram, to avoid the more complicated integral derived from eq. 6). Then, we can estimate its true value in discrete time and its expected value (from eq. 7 and 9). Further, we can estimate the true values in continuous time as well as the expected values of the climacogram (from eq. 2 and 5) and finally, the true values in continuous and discrete time as well as the expected values of the power spectrum (from eq. 11, 12 and 14). From now on and for simplicity, only positive lags and frequencies will be considered, as both the autocovariance and the power spectrum are even functions.
3.2 Graphical investigation on the climacogram, autocovariance and power spectrum
We start our comparison with graphical investigations, which are actually very common in model identification. In Fig. 2-3, we have built the climacograms, autocovariances and power spectra for Markovian processes with q = 1, 10 and 100, and λ = 1 (Fig. 2) and gHK processes with q = 1, 10 and 100, b = 0.2 and $\lambda = q^{-b}$ (Fig. 3), all with D = Δ = 1. In particular, in Fig. 2-3 we compare the true, continuous-time stochastic tools, along with their discrete-time versions as well as the expectations of their classical estimators, as given in the equations of Tables 4-5. For the estimator, a medium sample size $n = 10^3$ was used (apparently, as n increases the bias will decrease). The graphs also contain plots of the negative logarithmic derivative (abbreviated as NLD) of all three functions. It is noted that the NLD is an important concept in identifying possible scaling behaviour (i.e. asymptotic power laws, as in the Hurst phenomenon) in geophysical processes and a useful metric for quantifying this behaviour (e.g., see Tyralis and Koutsoyiannis (2011) for the estimation of the Hurst coefficient). The NLD of any function f(x) is defined as:
$$f^{\#}(x) := -\frac{\mathrm{d}(\ln f(x))}{\mathrm{d}(\ln x)} = -\frac{x f'(x)}{f(x)} \qquad (25)$$
and for the finite logarithmic derivative of f(x), e.g. in the case of a discrete time process, we choose the forward logarithmic derivative, i.e.:
$$f^{\#}(x_i) := -\frac{\ln\left( f(x_{i+1}) / f(x_i) \right)}{\ln\left( x_{i+1} / x_i \right)} \qquad (26)$$
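The forward NLD of eq. 26 amounts to a ratio of logarithmic increments, as in the short Python sketch below (the function name is an assumption for this illustration). Non-positive values of f, such as negative autocovariance estimates at large lags, produce NaN in this sketch.

```python
import numpy as np

def nld_forward(x, f):
    """Forward negative logarithmic derivative (cf. eq. 26).

    x: increasing positive abscissae (scales, lags or frequencies)
    f: function values at x; the result has one element fewer than x
    """
    x = np.asarray(x, dtype=float)
    f = np.asarray(f, dtype=float)
    return -np.log(f[1:] / f[:-1]) / np.log(x[1:] / x[:-1])
```

For an exact power law, f(x) proportional to x raised to −a, the function returns a at every point, which is why the NLD is a convenient diagnostic of scaling behaviour.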
Figures 2-3 (including the analysis of the HK process in sect. 2.1 of the SM) allow us to make the following observations:
(a) As shown in eq. 3, the climacogram continuous-time values are equal to the discrete-time ones (for Δ = D > 0), while in the case of the autocovariance and power spectrum they are different. More specifically, the discrete-time autocovariance ($c_d^{(\Delta)}$) is practically indistinguishable from the continuous-time one (c), but only after the first lags, while the continuous-time and discrete-time values of the power spectrum differ at both low and high frequencies (with the difference being larger at high frequencies).
(b) The expectation of the autocovariance, $\mathrm{E}[\hat{\underline{c}}_d^{(\Delta)}(j)]$, departs from both the true one (c) and the discrete-time one ($c_d^{(\Delta)}$) for all the examined processes, and its bias is always larger than that of the climacogram and the power spectrum (e.g., see also Lombardo et al., 2013). The climacogram has larger bias, in comparison with the power spectrum, in the case of a gHK process (Fig. 3) and smaller bias for the Markovian one (Fig. 2).
(c) While in theory the NLDs of the climacogram, autocovariance and power spectrum should correspond to each other, at least asymptotically (e.g., see Koutsoyiannis, 2013a), in practice, as observed in Fig. 2-3, this correspondence may be lost. In particular, on the one hand, the NLDs of the discrete-time autocovariance ($c_d^{(\Delta)\#}$) and of its expected value, $\mathrm{E}[\hat{\underline{c}}_d^{(\Delta)}(j)]^{\#}$, always tend to infinity in the high lag tail (due to the negative values produced). On the other hand, the NLD of the climacogram expected value, $\mathrm{E}[\hat{\underline{\gamma}}_d^{(\Delta)}(k)]^{\#}$, is close to the true one ($\gamma^{\#}$) for a Markovian process and increases with scale in the case of a gHK process. On the contrary, while for a Markovian process the difference between the NLDs of the discrete-time power spectrum ($s_d^{(\Delta)\#}$) and of its expected value, $\mathrm{E}[\hat{\underline{s}}_d^{(\Delta)}(\omega)]^{\#}$, is small, in the case of a gHK process it is non-monotonic, as it varies at both low and high frequencies. Also, there is always a drop in the NLD of the power spectrum in the high frequency tail at ω = 0.5, which is attributed to the symmetry of the discrete-time power spectrum and of its expectation around ω = 0.5, leading to $s_d^{(\Delta)\#}(0.5) = \mathrm{E}[\hat{\underline{s}}_d^{(\Delta)}]^{\#}(0.5) = 0$.
(d) The expected value of the power spectrum can be estimated theoretically (through eq. 14) only up to frequency ω = 0.5 (which is the Nyquist frequency), due to the cosine periodicity. On the contrary, climacogram and autocovariance expected values can be estimated theoretically for scales and lags, respectively, up to n − 1.
(e) Finally, there is a high computational cost involved in the calculation of the values and expectations of the power spectrum (taken from eq. 13 and 14, respectively) as compared to the simple expressions for the climacogram (eq. 5) and autocovariance (eq. 7 and 9), a cost that is often dealt with through fft algorithms. These large sums, along with the large number of trigonometric products, can often also cause numerical instabilities (e.g. in the gHK case, with q = 100, in Fig. 3e-f).
Figure 2: True values in continuous and discrete time and expected values of the climacograms (a), autocovariances (c) and power spectra (e) as well as their corresponding NLDs (b, d and f, respectively) of Markovian processes with q = 1, 10 and 100, λ = 1 and $n = 10^3$. Note that the continuous and discrete values of the climacogram are identical for Δ = D > 0.
[Figure 2 panels, logarithmic horizontal axes: (a) γ vs. scale k; (b) γ# vs. k; (c) c vs. lag j; (d) c# vs. j; (e) s vs. frequency ω; (f) s# vs. ω.]
Figure 3: True values in continuous and discrete time and expected values of the climacograms (a), autocovariances (c) and power spectra (e) as well as their corresponding NLDs (b, d and f, respectively) of gHK processes with b = 0.2 and q = 1, 10 and 100, $\lambda = q^{-b}$ (not λ = 1, for demonstration purposes) and $n = 10^3$. Note that the continuous and discrete values of the climacogram are identical for Δ = D > 0.
[Figure 3 panels, same layout as Figure 2: (a) γ vs. k; (b) γ# vs. k; (c) c vs. j; (d) c# vs. j; (e) s vs. ω; (f) s# vs. ω. Curves: True, Discrete and Expected, each for q = 1, 10 and 100.]
Certainly, all the above are just indications arising from this graphical investigation of simple cases. For more complicated processes one should investigate further.
Some of the observations concerning the estimated power spectrum can be explained by considering the way the power spectrum is calculated from the autocovariance: when a sample value is above (below) the sample mean, the residual is positively (negatively) signed; thus, a high autocovariance value means that, at that lag, most of the products involve residuals of the same sign (++ or --). In other words, the same signs are repeated (regardless of their difference in magnitude). The same ‘battle of signs’ process is followed in the case of the power spectrum, but this time the sign is given by the cosine function. A large value of the power spectrum indicates that, at that frequency, the autocovariance values multiplied by a positive sign (through the cosine function) outweigh those multiplied by a negative one. So, the power spectrum can often misinterpret an intermediate change in the true autocovariance or climacogram. A way to track it down would be through the autocovariance itself, i.e. not using the power spectrum at all, but this is also prone to high bias (especially in its high lag tail), which always results in at least one negative value (for proof see Hassani, 2010 and analysis in Hassani, 2012). These problems can be avoided with an approach based on the climacogram, i.e. the variance of the time-averaged process over averaging time scale, as the calculated variance is always positive. Also, the structure of the power spectrum is not only complicated to visualize and to calculate but also lacks direct physical meaning (as opposed to the autocovariance and the climacogram), as it actually describes the Fourier transform of the autocovariance.
Furthermore, the power spectrum can often lead to process misinterpretations such as the one shown in Fig. 2 (Markovian process), where over almost the whole frequency domain $\mathrm{E}[\hat{\underline{s}}_d^{(\Delta)}(\omega)] > s_d^{(\Delta)}(\omega)$ and $s_d^{(\Delta)\#}(\omega) < \mathrm{E}[\hat{\underline{s}}_d^{(\Delta)}(\omega)]^{\#}$. This can lead to the wrong conclusion that the area underneath $c_d^{(\Delta)}$ is smaller than that underneath $\mathrm{E}[\hat{\underline{c}}_d^{(\Delta)}]$ and that $c_d^{(\Delta)}$ tends to zero more quickly than $\mathrm{E}[\hat{\underline{c}}_d^{(\Delta)}]$. This can be easily derived from Fig. 4, if one replaces the cosine function with a simplified one (taking only the values +1 and -1, where the cosine is positive and negative, respectively). Then, the negative part of the simplified function coincides with the negative part of the biased autocovariance, resulting in a positively signed value when they are multiplied with each other. However, this is not the case for the discrete autocovariance, resulting in $\mathrm{E}[\hat{\underline{s}}_d^{(\Delta)}(\omega)] > s_d^{(\Delta)}(\omega)$.
Figure 4: True autocovariance in discrete time for a Markov process (with q = 100) and its expected value for $n = 10^3$, along with a cosine function cos(2πfr), where f is the frequency and r the lag, and its sign, sign(cos(2πfr)), for (a) f = 1/n and (b) f = 2/n.
3.3 Investigation of the estimators of climacogram, autocovariance and
302
power spectrum
303
In this section, we will investigate the performance of the estimators of climacogram, autocovariance 3 04
and power spectrum. For their evaluation we use mean square error expressions as shown in the 305
equations below. Assuming that θ is the true value of a statistical characteristic (i.e. climacogram, 3 06
autocovariance, power spectral density and NLDs thereof) of the process, a dimensionless mean 307
square error (MSE), similar to the one used for the probability density function in Papalexiou et al. 308
(2013), is: 309
”'I•
G•L
B
-
B
X3
(27) 310
where we have decomposed the dimensionless MSE into a variance and a bias term, i.e. 311
$%&6˜7˜
.
(28) 312
I˜M6˜7L
.
˜
.
(29) 313
Note that θ is given by eq. 2 (for the true climacogram), eq. 7 (for the true autocovariance in discrete-314
time) and eq. 12 (for the true power spectrum in discrete-time).
can be found analytically through 3 15 6˜7, from eq. 5, 9 and 14, respectively, but
cannot (because of lack of analytical solutions for 6˜
.
7 316
and hence, $%&6˜7, for the classical estimators of climacogram, autocovariance and power spectrum). A 317
way of tackling this would be by a Monte Carlo method, and specifically by producing many 318
independent Gaussian synthetic time series with a known climacogram (and thus, autocovariance and power spectrum) and estimating the variance for each scale, lag and frequency, respectively. The methodology we used to produce synthetic time series, for any stochastic process based on a combination of Markovian processes (e.g., Mandelbrot, 1977), is given in sect. 3 of the SM. For a typical finite size n, the sum of a finite, usually small, number of Markovian processes is capable of adequately representing most processes; for example, Koutsoyiannis (2010) showed that the sum of 3 AR(1) models is adequate for representing an HK process for $n < 10^4$. Certainly, as accuracy requirements and n increase, a larger number of Markovian processes is required. Note that here we do not use the AR(1) model to represent a process that is Markovian in continuous time (as shown in sect. 4 of the SM, the AR(1) model cannot represent a discretized continuous-time Markovian process for Δ/q > 0 as well as Δ ≠ D). Instead, we use the ARMA(1,1) model, which (as mentioned in Koutsoyiannis, 2002, 2013a) successfully represents any Markovian process; its parameters are derived in sect. 4 of the SM.
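The Monte Carlo evaluation of eq. 27 to 29 can be illustrated with a minimal sketch. It is not the generation scheme of the SM: for simplicity it assumes a unit-variance discrete-time AR(1) process (not the ARMA(1,1) model used in the paper), and it takes the sample variance of non-overlapping block averages as a stand-in for the classical climacogram estimator. All function names are hypothetical.

```python
import numpy as np

def ar1_series(n, rho, n_series, rng):
    """Simulate n_series independent unit-variance Gaussian AR(1) series of length n."""
    x = np.empty((n_series, n))
    x[:, 0] = rng.standard_normal(n_series)  # start from the stationary distribution
    innovations = np.sqrt(1.0 - rho**2) * rng.standard_normal((n_series, n))
    for t in range(1, n):
        x[:, t] = rho * x[:, t - 1] + innovations[:, t]
    return x

def true_climacogram_ar1(k, rho):
    """Variance of the average of k consecutive values of a unit-variance AR(1) process."""
    j = np.arange(1, k)
    return (k + 2.0 * np.sum((k - j) * rho**j)) / k**2

def sample_climacogram(x, k):
    """Sample variance of the non-overlapping k-averages of each series (rows of x)."""
    m = x.shape[1] // k  # number of complete blocks
    block_means = x[:, : m * k].reshape(x.shape[0], m, k).mean(axis=2)
    return block_means.var(axis=1, ddof=1)

def dimensionless_mse(estimates, theta):
    """Return (variance term, bias term, total) of eq. 27-29 from Monte Carlo estimates."""
    eps_v = estimates.var() / theta**2
    eps_b = (theta - estimates.mean()) ** 2 / theta**2
    eps = np.mean((estimates - theta) ** 2) / theta**2
    return eps_v, eps_b, eps

rng = np.random.default_rng(1)
x = ar1_series(n=1000, rho=0.8, n_series=500, rng=rng)
for k in (1, 10, 100):
    eps_v, eps_b, eps = dimensionless_mse(sample_climacogram(x, k), true_climacogram_ar1(k, 0.8))
    print(f"k={k:4d}  eps_v={eps_v:.4f}  eps_b={eps_b:.4f}  eps={eps:.4f}")
```

Because the ensemble variance is computed with the population formula, the two terms add up to the total exactly, which mirrors the decomposition in eq. 27.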
Thus, we produce synthetic time series for Markovian processes with q = 1, 10 and 100 (Fig. 5) and gHK ones with q = 1, 10 and 100 and b = 0.2 (Fig. 6), all with D = Δ = 1. Then, for each scale, lag and frequency, we calculate for all processes the means, variances, means of the NLD and variances of the NLD, for the climacogram, autocovariance and power spectrum, and their corresponding errors through eq. 27 to 29, for $n = 10^3$ (Fig. 5-6) and for $n = 10^2$ and $10^4$ (sect. 2.2 of the SM). Note that, on the one hand, as n decreases, both bias and variance increase and thus, for the point estimate and variance to be closer to the expected ones, we need more time series. On the other hand, as n increases, more Markovian processes have to be added, with a larger bias and variance (due to larger q). So, for the examined processes, we conclude that in order to achieve a maximum error of about 1‰ between scales 1 and n/2, we have to produce approximately $10^4$ time series for $n = 10^2$, $10^3$ and $10^4$. The error is meant here as the absolute difference between the estimated and expected value, divided by the expected value. Furthermore, the 1‰ error refers to the climacogram and corresponds to a gHK process with b = 0.2 and q = 100, which is considered the most adverse of the examined processes. Note that in Fig. 5-6 we try to show all estimates within a single plot for comparison with each other. The inverse frequency in the horizontal axis is set to 1/(2ω), so as to vary between 1 and n/2, and the lag to j+1, so that the estimate of the variance at j = 0 is also shown in a log-log plot.
Moreover, we investigate the shape of the probability density function (pdf) for each stochastic tool, which in many cases differs from a Gaussian one, resulting in deviations between the mean (expected value) and the mode. To measure this difference, we use the sample skewness (denoted g): for g ≈ 0 the difference is small, and in any other case it is larger. In Fig. 7, we show for each stochastic tool and for a gHK process with b = 0.2 and q/Δ = 10, an example of their 95% upper and lower confidence intervals (corresponding to exceedance probabilities of 2.5% and 97.5%), as well as their pdf for a specific scale, lag and frequency.
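The two diagnostics described above can be computed from a Monte Carlo ensemble as in the following sketch. The moment-based skewness formula and the percentile-based limits are assumptions made for illustration; the text does not state which sample skewness formula was used.

```python
import numpy as np

def sample_skewness(values):
    """Moment-based sample skewness g = m3 / m2**1.5."""
    d = np.asarray(values, dtype=float) - np.mean(values)
    return np.mean(d**3) / np.mean(d**2) ** 1.5

def confidence_limits(values, level=0.95):
    """Lower and upper empirical limits enclosing `level` of the Monte Carlo estimates."""
    tail = 100.0 * (1.0 - level) / 2.0
    return np.percentile(values, tail), np.percentile(values, 100.0 - tail)

rng = np.random.default_rng(0)
symmetric = rng.standard_normal(100_000)   # stand-in for a near-Gaussian estimator
skewed = rng.exponential(size=100_000)     # stand-in for a strongly skewed estimator
print(sample_skewness(symmetric), sample_skewness(skewed))
print(confidence_limits(symmetric))
```

A skewness near zero indicates that the mean and the mode of the estimator nearly coincide; a large positive value indicates that the mean lies well above the mode and the confidence limits are asymmetric.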

Figure 5: Dimensionless errors of the climacogram estimator (continuous line), autocovariance (dashed line) and power spectrum (dotted line), calculated from $10^4$ Markovian synthetic series with $n = 10^3$ (for b = 0.2, q = 1, 10 and 100 and $\lambda = q^{-b}$): (a) $\varepsilon_v$ (dimensionless MSE of variance); (b) $\varepsilon_b$ (dimensionless MSE of bias); (c) ε (total dimensionless MSE); and (d) $\varepsilon^{\#}$ (total dimensionless MSE of NLD); the sample skewness of each of the stochastic tools and of their NLDs is also shown in (e) and (f).
[Figure 5 panels (a)-(f): $\varepsilon_v$, $\varepsilon_b$, ε, $\varepsilon^{\#}$, g and $g^{\#}$ plotted against k, j+1 and 1/(2ω).]

Figure 6: Dimensionless errors of the climacogram estimator (continuous line), autocovariance (dashed line) and power spectrum (dotted line), calculated from $10^4$ gHK synthetic series with $n = 10^3$ (for b = 0.2, q = 1, 10 and 100 and $\lambda = q^{-b}$): (a) $\varepsilon_v$ (dimensionless MSE of variance); (b) $\varepsilon_b$ (dimensionless MSE of bias); (c) ε (total dimensionless MSE); and (d) $\varepsilon^{\#}$ (total dimensionless MSE of NLD); the sample skewness of each of the stochastic tools and of their NLDs is also shown in (e) and (f).

Figures 5-6 (including the analysis in sect. 2.2 of the SM) allow us to make some observations related to stochastic model building:
(1) In general, the climacogram has lower variance than the autocovariance, which in turn has lower variance than the power spectrum (e.g. for Markovian and HK processes, as well as for gHK ones at most scales). Also, it has a smaller bias than the autocovariance but a larger one than the power spectrum (for all examined processes). Since, for the Markovian and HK processes, the error component related to the variance, $\varepsilon_v$, is usually larger than the one from the bias, $\varepsilon_b$, or conversely for the gHK ones, the climacogram has a smaller total error ε in most cases. Thus, we can state that (for all the examined cases) the expression below holds:
$$\frac{\mathrm{E}[(\hat{\gamma} - \gamma)^2]}{\gamma^2} \le \frac{\mathrm{E}[(\hat{c}_d - c_d)^2]}{c_d^2} \le \frac{\mathrm{E}[(\hat{s}_d - s_d)^2]}{s_d^2} \qquad (30)$$
(2) We see that as n and b (for the HK process) or q (for the Markovian and gHK processes) increase, the climacogram estimator entails much smaller error than those of the autocovariance and power spectrum for the whole domain of scales, lags and frequencies.
(3) The total error for the NLD, $\varepsilon^{\#}$, increases with scale in the climacogram and with lag in the autocovariance for all examined processes. In the case of an exponentially decaying autocovariance (e.g. in a Markovian process), the $\varepsilon^{\#}$ of the power spectrum slope first decreases and then increases at large inverse-frequency values, while the autocovariance and climacogram $\varepsilon^{\#}$ always increase. In this type of process, the climacogram and autocovariance $\varepsilon^{\#}$ are close to each other and in most cases smaller than the power spectrum $\varepsilon^{\#}$. For HK and gHK processes, where large scales, lags and inverse frequencies exhibit an HK behaviour, the power spectrum one always decreases with inverse frequency following a power-law decay, in contrast to the autocovariance and climacogram $\varepsilon^{\#}$, which always increase. Thus, in this type of process, there exists a cross point between the power spectrum $\varepsilon^{\#}$ and the other two: before this point the power spectrum has a larger $\varepsilon^{\#}$ and beyond it a smaller one.
(4) The pdfs of the climacogram and autocovariance have small skewness magnitude and can approximate a Gaussian pdf for most scales and lags, while the power spectrum pdf has a larger skewness for its regular values (besides its theoretically smaller bias), which results in non-symmetric confidence intervals (very important when it comes to uncertainty in stochastic modelling, e.g., see Lombardo et al., 2014). However, the NLD of the power spectrum has a negligible skewness in comparison with those of the autocovariance and climacogram, which means that the expected NLD should be very close to the NLD mode.
(5) The climacogram skewness increases with scale up to 3, while the autocovariance one is larger at first and then drops to -1 (it starts to drop at the point where the expected autocovariance reaches a negative value for the first time). It is interesting that the power spectrum skewness has a value around 2 for regular values and 0 for NLDs, for all the examined processes (with the exception of the extreme gHK process with q/Δ = 100, where it is around 2.5).
(6) The power spectrum has a large in high frequencies and then it stabilizes around 1 for all the 411
examined processes and n. This observation is also mathematically verified by Papoulis (1991, p. 449, 412
eq. 13-59). Also, we observe that the autocovariance and climacogram always increases with scale 413
and lag, respectively. 414
(7) The autocovariance is decreasing with }, for the examined Markovian and gHK processes, and 415
increasing with for the examined HK ones. In contrast, the climacogram is increasing with } (for 416
the examined Markovian and gHK processes) and decreasing with (for the HK ones). 417
(8) The autocovariance and power spectrum
are decreasing with }, for the examined Markovian 418
and gHK processes, and increasing with for the examined HK ones. The climacogram
is 419
decreasing with both } and . 420
(9) The climacogram exhibits sudden increases of and
(like a stairway) beyond scales equal to the 42 1
10%-20% of n/2 (maximum possible scale for the climacogram). This is due to the small number of 422
data from which the variance is calculated. This is also verified by Koutsoyiannis (2003, 2013a) leading 423
to a rule of thumb of estimating the climacogram until the n/10 (20% of n/2) scale. 424
(10) $%&6h_
7 has a power-type decay with inverse-frequency with an exponent around -2.0 to -2.5, for 4 25
all the examined processes. 426
(11) We observe that the variance of the power spectrum, for all the examined processes and sample sizes, is approximately equal to the square of its expected value for frequencies ω ≠ 0, 0.5 and 1, and double the square of its expected value for ω = 0, 0.5 and 1. This is also verified by Papoulis (1991, p. 447, eq. 13-50) and discussed in Press et al. (2007, p. 655).
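Observation 11 can be checked numerically in the simplest possible setting. The sketch below assumes Gaussian white noise with no mean removal (an assumption made here for illustration, not one of the processes examined above) and estimates the ratio of the variance of the raw periodogram to the square of its mean at each frequency.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_series = 64, 20_000
x = rng.standard_normal((n_series, n))  # assumed test case: Gaussian white noise

# Raw periodogram ordinates at frequencies 0, 1/n, ..., 0.5 (unit sampling interval).
periodogram = np.abs(np.fft.rfft(x, axis=1)) ** 2 / n
freqs = np.fft.rfftfreq(n)

# Ratio Var[s_hat] / E[s_hat]^2 across the ensemble, one value per frequency.
ratio = periodogram.var(axis=0) / periodogram.mean(axis=0) ** 2

print("omega =", freqs[0], " ratio =", ratio[0])
print("omega =", freqs[-1], " ratio =", ratio[-1])
print("interior frequencies, mean ratio =", ratio[1:-1].mean())
```

The ratio is close to 1 at interior frequencies and close to 2 at ω = 0 and ω = 0.5, where the Fourier coefficient is purely real. A unit ratio corresponds to $\varepsilon_v \approx 1$ regardless of sample size, which is why the raw periodogram does not become less variable as n grows.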
Figure 7: Expected value (continuous blue line), upper 95% confidence interval (dashed green line), lower 95% confidence interval (dashed red line) and mode for (a) climacogram, (b) autocovariance and (c) power spectrum, and (d) empirical pdf of the climacogram (blue line), autocovariance (red line) and power spectrum (green line), at k = j = 100 and ω = 0.1, respectively, calculated from $10^4$ gHK synthetic series, with b = 0.2, q = 10, $\lambda = q^{-b}$ and $n = 10^3$.

Apparently, these results are valid for the simple processes examined and for the typical estimators and sample sizes used; to draw conclusions for more complex processes, the above analyses should be repeated. On the one hand, we can conclude from observations 1 and 2 that the sample climacogram is more likely to be close to its theoretical value (considering also the bias) than the sample autocovariance or power spectrum are to theirs. Thus, it is proposed to use the climacogram when building a stochastic model and to estimate the autocovariance and power spectrum from that model, rather than directly from the data (see application in sect. 4). On the other hand, it seems from observation 3 that in the case of a power-law decay at large scales, lags and inverse frequencies (e.g. in an HK or a gHK process), the NLD of that decay (i.e. b, which is related to the HK coefficient) is better estimated from the power spectrum than from the climacogram or autocovariance. However, this applies only for inverse frequencies beyond the cross point (discussed in the 3rd observation). This can be tricky, as we do not know where this point lies; also, this rule does not apply to an exponential autocovariance decay (e.g. in a Markovian process), where the NLD is now very large and can again lead to wrong conclusions about the nature of the large-scale decay (i.e. presence or not of the Hurst phenomenon). In conclusion, observations 1-3 can be used to build a general frame of rules of thumb (described in the steps below) for building a stochastic model from a sample or interpreting its physical process, e.g. identifying the type of process (Markovian, HK, gHK etc.). This framework is based only on the three examined stochastic tools and it should be expanded if more tools are to be used in the analysis. An application to a real-world example is presented in sect. 4 for illustration purposes.
(a) First, we have to decide upon the large-scale type of decay from the climacogram. For example, if the large-scale NLD is close to 1, then the process is more likely to exhibit either an exponential decay of autocovariance at large lags (scenario S1) or a white noise behaviour, i.e. H = 0.5 (scenario S2). In case
where the large-scale NLD deviates from 1, the process is more likely to exhibit an HK behaviour (scenario S3). The autocovariance can help us choose between scenarios S1 and S2, as in S1 we expect an immediate, exponential-like drop of the autocovariance (which often has the smaller difference between its expected and mode value), whereas in S2 it is unbiased and therefore the NLD should be close to 1. In the case of scenario S1, we can estimate the scale parameter of the Markovian-type decay from the NLD of the climacogram, while in the case of S3 we should also look into the power spectrum decay behaviour at low frequencies. Thereafter, for the determination of the Hurst coefficient, we can use various algorithms, e.g. the one of Tyralis and Koutsoyiannis (2010), which is based on the climacogram (usually taken up to 10%-20% of its maximum scale n/2), or that of Chen et al. (2010), which is based on the power spectrum.
(b) For the estimation of the rest of the properties (e.g. for intermediate and smaller scales) we should use the climacogram.
(c) To build a model, we should first try to use a combination of the processes used in this paper, i.e. a combination of Markovian, HK and gHK processes, as they are the simplest ones (principle of parsimony), with an immediate physical interpretation, and their combination should cover most cases. If they do not represent the physical process well, we can use more complicated mathematical processes, repeating for each one the graphical investigation and statistical analysis proposed in this paper (for example, as done in sections 3.2 and 3.3).
(d) After we have built our model, we should perform the statistical analysis proposed in section 3.3, to verify our initial assumptions (null hypothesis) on the smaller ε and $\varepsilon^{\#}$ of the process as well as their pdf skewness magnitude, concerning its climacogram, autocovariance and power spectrum.
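Step (a) above can be sketched in a few lines. The sketch assumes that the NLD is the negative derivative of ln γ with respect to ln k, estimated by finite differences, and that a climacogram decaying as $k^{2H-2}$ implies H = 1 - NLD/2 (consistent with b = 0.45 giving H = 0.775 in sect. 4). The tolerance used to decide whether the NLD is "close to 1" is a hypothetical threshold, not a value from the paper.

```python
import numpy as np

def nld(x, y):
    """Negative logarithmic derivative -d(ln y)/d(ln x) by finite differences."""
    return -np.gradient(np.log(y), np.log(x))

def hurst_from_nld(large_scale_nld):
    """H implied by a climacogram that decays as k**(2H - 2) at large scales."""
    return 1.0 - large_scale_nld / 2.0

def large_scale_scenario(large_scale_nld, tol=0.1):
    """Step (a): NLD near 1 gives S1 or S2, otherwise S3. `tol` is a hypothetical threshold."""
    return "S1 or S2" if abs(large_scale_nld - 1.0) <= tol else "S3"

scales = np.logspace(0, 3, 31)
white_noise = 1.0 / scales     # climacogram of white noise, NLD = 1
hk_like = scales ** -0.45      # power-law climacogram with NLD = 0.45

for name, gamma in (("white noise", white_noise), ("HK-like", hk_like)):
    slope = nld(scales, gamma)[-1]
    print(name, round(slope, 3), large_scale_scenario(slope), hurst_from_nld(slope))
```

With an empirical climacogram the large-scale NLD would be read from scales up to roughly 10%-20% of n/2, as recommended above, since the estimate becomes erratic beyond that range.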
4. Application
In this section, we present a statistical analysis of a set of 40 time series derived from a large open-access dataset (http://www.me.jhu.edu/meneveau/datasets/datamap.html), provided by the Johns Hopkins University, which consists of turbulent wind velocity data, measured by X-wire probes downstream of an active grid in the direction of the flow (Kang et al., 2003). The first 16 time series correspond to velocities measured at transverse points at a distance r = 20M from the source, where M = 0.152 m is the size of the grid placed at the source. The next 4 time series correspond to a distance r = 30M, the next 4 to 40M and the last 16 to 48M (for more details concerning the experimental setup and data, see Kang et al., 2003). We have chosen this type of dataset for our application because of the controlled environment of the experiment, as well as for its broad importance, as turbulence drives almost any geophysical process. Additionally, all time series have a nearly Gaussian probability density function (see Fig. 8b) and are nearly isotropic (isotropy ratio 1.5, see Kang et al., 2003). Also, their sample sizes are very large, $n = 10^6$ data values for each time series (the original data set consisted of $36 \times 10^6$ data values but, following the approach of Koutsoyiannis (2012), we averaged every 36 observations, resulting in $10^6$ observations, for the sake of simplicity). Yet D remains small (0.9 ms for the averaged time series) and thus the assumption D ≈ Δ can still be considered valid. Finally, the data set gives the opportunity of cross-checking the methodology proposed in section 3.3, by applying it first to the averaged process (Fig. 9a-d) derived from all 40 time series and then to a single one (Fig. 9d) with statistical characteristics close to the averaged one. In all cases stationarity is assumed, given that the macroscopic flow characteristics are steady. The modelling of higher moments and derivatives of the process, which are important for phenomena such as intermittency and bottleneck effects, as well as the interpretation of model parameters, is not within the scope of this paper. We focus only on the preservation of the 2nd-order statistics related to the three examined stochastic tools.

Figure 8: Preliminary data analysis: (a) averaged velocity mean (red line) and averaged standard deviation (blue line) along the wind tunnel axis and (b) empirical pdfs of the normalized time series (obtained by subtracting the mean and dividing by the standard deviation, for each time series) and their averaged empirical pdf (thick black line).

In Fig. 9, we show the climacograms, autocovariances and power spectra of all 40 normalized time series, their averaged values and the corresponding values for the 38th time series, whose stochastic properties are closest to the averaged ones. We choose to analyze this single time series to show a comparison with the averaged one. Notice here that we do not apply the windowing technique to reduce the power spectrum variance, as it causes loss of information at small frequencies (see Fig. 10d). Also, windowing should be used with caution when choosing small segment lengths and should be avoided in strongly correlated processes (e.g. those that exhibit Hurst behaviour), as the time series of the divided segments are not independent of each other.

Figure 9: Stochastic data analysis: (a) climacograms, (b) autocovariances and (c) power spectra of all 40 time series (multi-coloured lines) as well as their averaged values (thick dashed black line), (d) all three in one plot, focusing on the comparison of the averaged values with those of the 38th time series; NLDs at large scales, lags and inverse frequencies are also shown.
[Figure 8 panels: (a) velocity mean (m/s) and velocity standard deviation (m/s) versus ln(r) (r in m); (b) pdf versus normalized velocity.]
[Figure 9 panels: (a) climacogram (m²/s²) versus scale k; (b) autocovariance (m²/s²) versus lag j; (c) power spectrum (m²/s) versus frequency ω; (d) climacogram and autocovariance (m²/s²) and power spectrum (m²/s) versus k, j+1 and 1/(2ω), for the averaged and the 38th time series.]

The velocity field is not homogeneous in the direction of the flow, e.g. the velocity mean and standard deviation at every position decrease with the distance r from the source, as shown in Fig. 8a. To homogenize all time series, we normalize each one by subtracting the mean (red line) and dividing by the standard deviation (blue line).
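The two preprocessing operations described in this section, averaging every 36 raw observations and normalizing each series, can be sketched as follows. The synthetic record is only a stand-in for one velocity time series, and the function names are hypothetical.

```python
import numpy as np

def block_average(x, k):
    """Average every k consecutive values, dropping an incomplete final block."""
    x = np.asarray(x, dtype=float)
    m = x.size // k
    return x[: m * k].reshape(m, k).mean(axis=1)

def standardize(x):
    """Subtract the sample mean and divide by the sample standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

rng = np.random.default_rng(0)
raw = 11.0 + 1.5 * rng.standard_normal(36 * 1000)  # synthetic stand-in for one velocity record
series = standardize(block_average(raw, 36))
print(series.size, series.mean(), series.std(ddof=1))
```

Averaging reduces the sample size by a factor of 36 and lengthens the time step by the same factor, while standardization puts all 40 series on a common scale so that their climacograms, autocovariances and power spectra can be averaged.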
Assuming that the averaged values, shown in Fig. 9d, are close to the expected values of the process, we can fit a model following the methodology proposed in section 3.3. The large-scale NLD is far from 1; hence, it is most likely that the process exhibits a Hurst behaviour, i.e. a power-law decay of the autocovariance (scenario S3). For the identification of the process behaviour at intermediate and small scales, we use the climacogram, as it is more likely to have the smallest standardized variance (as shown in sect. 3.3). We finally observe that the NLD at small scales can be very well represented by a Markovian process. Thus, we fit a stochastic model consistent with the observed behaviour (as seen in the climacogram), combining Markovian and gHK processes. Namely, we fit a model (Table 6) consisting of one Markovian process (controlling small-scale behaviour) and one gHK process (controlling large-scale behaviour).

Table 6: Autocovariance, climacogram and power spectrum mathematical expressions, in continuous and discrete time, of a composite model consisting of a Markovian and a gHK process.
Type Stochastic model
Autocovariance
(continuous)
S
T
w
E
x
G
y
z
y
{
X
w
.
y
T
y
}
.
X
G
(31)
Autocovariance
(discrete)
S
|
Z
w
E
I
M
x
G
{
q
L
.
}
E
.
3
~
G
y
[
y
G
E
{
q
Xw
.
yZ}
.
M}
.
Xy
.G‚
XyZ}
.
X}
.
Xy
.G‚
MYyZ}
.
Xy
.G‚
}
.
.
MYM
with
S
|
!
(32)
Climacogram
(continuous and
discrete)
!
"
Y
w
E
"
}
E
.
I
"
}
E
X
x
G
,
{
q
M
L
XYw
.
u"}
.
X
.G‚
MYM"}
.
Mv
MYM"}
.
.
with
!
w
E
X
w
.
(33)
Power spectrum
(continuous)
h
i
j
w
E
}
E
Xjm}
E.
i
.
X
j
w
.
}
.
3
M
†st
\
••
Y
X
Y
}
.
y
i
y
]
Ymyiy
EG‚
M
jw
.
}
.
E.
'ˆM
Y
Y
M
Y
ˆM•
.
}
.
i
.
-
M
(34)
Power spectrum
(discrete)
not a closed expression (see in Table 5) -
545
As a first priority, we try to best fit the climacogram of the time series and on a secondary basis, the 546
autocovariance and power spectrum (see Fig. 10). To estimate the parameters of the model two 547
alternative fitting errors were considered: 548
Ÿ
<¡
”'Q
¢A-GQ
¢
R
A
”'Q
¢A-
£
.
F.
AKE
(35) 549
20
¤
¥%¦
AKE§F.
¨
”'Q
¢A-GQ
¢
R
A
”'Q
¢A-
¨ (36) 5 50
where !D
|
8 is the empirical climacogram (estimated from data) and P'!D8- the expected one 551
(estimated from the model). Firstly, we use the
Ÿ
error to locate initial values and then the
¤
for 552
fine tuning and distributing the error equally to all scales. The optimization analysis results in: λ
1
=0.81 553
and λ
2
= 0.19 m
2
/s
2
, q
1
= 0.504 ms and q
2
= 5.04 ms and b = 0.45 (H=0.775), with
¤
41% and the R
2
554
equal to ~100% for the climacogram, 99.9% for the autocovariance and 99.0% for the power spectrum. 555

Figure 10: (a) Climacogram, (b) autocovariance and (c) power spectrum for the model of Table 6 fitted to turbulence data: true values in continuous time (estimated from the model, shown with a green line), true values in discrete time (estimated from the model, shown with an orange line), expected values (estimated from the model, shown with a red line), empirical averaged values (estimated from all 40 time series, shown with a purple line) and sample values (estimated from the 38th time series, shown with a dashed blue line). Note that, to avoid a large computational burden, the expected values of the power spectrum are not calculated from eq. 14, but from a Monte Carlo analysis of $10^4$ synthetic time series. In (d) Bartlett's method is applied to the 38th time series for various numbers of segments and the cross-correlation between segments is shown.

Note that in Fig. 10d, Bartlett's method (Welch's method with non-overlapping segments and a uniform window) is applied to the 38th time series. The cross-correlation between segments increases with the number of segments into which the original time series is divided, i.e. the dependence between segments increases, which highlights the inappropriateness of this method for estimating the expected power spectrum. Finally, to test the validity of our assumption that, for the specific model in Table 6, the estimator based on the climacogram has the smallest error ε compared to those based on the autocovariance and power spectrum, we use the analysis proposed in step (d) of section 3.3. We produce $10^4$ time series with $n = 10^6$ and we compare the errors ε of each estimator at 81 points logarithmically distributed from 1 to n (Fig. 11). Following the methodology of sect. 3 and 4 of the SM, we fit the gHK process in Table 6 with 7 Markovian models, with $p_1 = 26.622$, $p_2 = 6.377$ and $\varepsilon_{rm} \approx 0.2\%$. As can be observed from Fig. 11, the initial choice of the climacogram-based estimators to identify the true process from the sample (null hypothesis) is proven valid for the current model and for all examined scales (in comparison with the other two estimators). Specifically, for all time scales the climacogram is more skilful in the estimation of both regular and NLD values of the process. The only clear exceptions are the smaller magnitude of the sample skewness of the autocovariance at the last lags and of the NLD of the power spectrum (which means that their pdfs are closer to Gaussian and thus their mode is closer to their mean). However, these advantages are diminished by their larger variance- and/or bias-related errors. Here, it is also observed that the power spectrum errors seem to be quite constant not only for ε (as expected from the analysis in sect. 3.3) but for $\varepsilon^{\#}$ as well. This is due to the mixing of the increasing $\varepsilon^{\#}$ of the Markovian process (see Fig. 5) with the decreasing one of the power-type processes (see Fig. 6 for the gHK process). The larger fluctuations of the power spectrum in Fig. 11, in contrast to those of the climacogram and autocovariance, are indicative of its larger statistical variance and thus of the smaller likelihood that the empirical power spectrum is close to the expected one from the model.
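Bartlett's method, as referred to in the discussion of Fig. 10d, can be written in a few lines. The sketch below is a generic implementation (non-overlapping segments, uniform window, periodogram scaled by the segment length as one possible convention) applied to white noise, which is an assumption made here; it is not the code used for the turbulence data.

```python
import numpy as np

def bartlett_spectrum(x, n_segments):
    """Average the periodograms of n_segments non-overlapping, unwindowed segments."""
    x = np.asarray(x, dtype=float)
    seg_len = x.size // n_segments
    segments = x[: seg_len * n_segments].reshape(n_segments, seg_len)
    periodograms = np.abs(np.fft.rfft(segments, axis=1)) ** 2 / seg_len
    return np.fft.rfftfreq(seg_len), periodograms.mean(axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)  # white noise: segments are independent by construction
for n_segments in (1, 16, 64):
    freqs, s = bartlett_spectrum(x, n_segments)
    # Number of resolved frequencies, lowest non-zero frequency, scatter of the estimate.
    print(n_segments, freqs.size, freqs[1], s[1:-1].var())
```

More segments reduce the scatter of the estimate but raise the lowest resolvable frequency, which is the loss of low-frequency information mentioned above. The variance reduction also relies on the segments being independent, which holds for white noise but not for strongly correlated series.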

Figure 11: Dimensionless errors (a) ε and (b) $\varepsilon^{\#}$ of the climacogram and autocovariance compared with those of the power spectrum, as well as their expected values, along with upper and lower 95% confidence intervals and mode (c, d and e), and (f) skewness, calculated from $10^4$ synthetic series with $n = 10^6$ based on the process in Table 6.
[Figure 11 panels: (a) ε and (b) $\varepsilon^{\#}$ versus j+1, k and 1/(2ω) for the autocovariance, climacogram and power spectrum; (c) γ versus k, (d) c versus j and (e) s versus ω, each with expected value, lower and upper 2.5% confidence limits and mode; (f) skewness g versus k, j+1 and 1/(2ω) for the three tools and their NLDs.]

5. Summary and conclusions
601
The applications of the autocovariance and power spectrum, in order to identify the stochastic 602
structure of natural processes and construct models thereof, abound in the literature. Less frequent is 603
the use of the climacogram, which is a simpler tool and is related by one-to-one transformation to both 604
the autocovariance and power spectrum. However, in very few cases the estimation uncertainty and 60 5
bias are included in the calculations, causing possible inconsistencies and misspecifications of the 606
model sought. Here we provide a theoretical framework to calculate the uncertainty and bias for those 607
three stochastic tools, which also enables inter-comparison of the three tools and identification of their 6 08
advantages and disadvantages. 609
For the climacogram and the autocovariance, analytical formulae for the bias are possible and are presented here. In particular, the expected value of the classical estimator of the autocovariance is derived in terms of the true climacogram and the true autocovariance in discrete time (Eq. 9 and Appendix), and it is shown that it can be decomposed into only four parts, which are easy to evaluate. In contrast, the power spectrum, due to its more complicated definition (based on the Fourier transform of the autocovariance), does not admit a generic, analytically derived formula for the estimation bias.
The study shows some of the advantages and difficulties encountered in stochastic model building when starting from the climacogram, the autocovariance or the power spectrum. Specifically:
• The climacogram has the smallest error in estimating the true values (in all examined cases, as described in Eq. 30) as well as the true logarithmic derivatives, i.e. slopes in log-log plots (with few exceptions). Also, its bias can be estimated through a simple analytical expression (Eq. 5). Moreover, the climacogram is always positive (a property helpful in stochastic model building, e.g. the logarithmic derivative always exists), well-defined (with an intuitive definition as the variance of the time-averaged process over the averaging time scale) and typically monotonic (as observed in all the examined processes and in the NLDs; Fig. 2-3 and sect. 2.1 of the SM). Finally, for all the examined processes, its sample skewness is close to 0 in the small-scale tail, while in the large-scale tail it increases up to 3 (Fig. 5-6 and sect. 2.2 of the SM).
• The autocovariance has estimation errors larger than those of the climacogram. Besides its large bias, it is also prone to discretization errors, as its value in discrete time (Eq. 7) can never equal the true value in continuous time (Eq. 6), even for an infinite sample size. Moreover, it takes negative values in the high-lag tail (creating difficulties in stochastic model building, e.g. the logarithmic derivative does not exist). However, it is well-defined (with an intuitive definition) and, with the help of Eq. 9, its bias can be estimated through a simple analytical expression. Finally, for all the examined processes, its skewness is close to 0 in the small-lag tail, while in the large-lag tail it decreases down to -1 (Fig. 5-6 and sect. 2.2 of the SM).
• The power spectrum has the largest estimation errors (in all examined cases, mostly around 100% of the true value in discrete time). Besides its bias, it is also prone to discretization error, as its value in discrete time (Eq. 12) can never equal the true value in continuous time (Eq. 11), even for an infinite sample size. Moreover, while its values are theoretically positive, numerical calculations based on data can result in negative values. In addition, it has a complicated definition (based on the Fourier transform of the autocovariance), which entails complicated and computationally expensive calculations of the discrete-time and expected values (Eq. 12-14 and Fig. 6e-f), as well as a non-monotonic NLD (observed in all the examined processes; Fig. 2-3 and sect. 2.1 of the SM). Finally, it often has the highest skewness for its regular values (mostly constant around 2) and the smallest (around 0) for its NLD values (Fig. 5-6 and sect. 2.2 of the SM). The latter advantage of the power spectrum means that its mode should be close to its expected value, which, however, is difficult to estimate for the aforementioned reasons.
The above theoretical and experimental results allow us to draw the general conclusion that the climacogram could provide a more direct, easier and more accurate means, both to make diagnoses from data and to build stochastic models, in comparison to the power spectrum and the autocovariance.
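As an illustration of a climacogram-based diagnosis, the following minimal Python sketch computes the classical climacogram estimate (the sample variance of the process averaged over non-overlapping blocks of k values) and the expected value of that estimator for a given true climacogram. The function names, the use of NumPy and the example HK climacogram γ(k) = k^(2H-2) are assumptions made for illustration; the bias expression is our reading of Eq. 5, which is not reproduced in this section.

```python
import numpy as np

def climacogram(x, scales):
    """Classical climacogram estimate: sample variance of block averages."""
    x = np.asarray(x, dtype=float)
    out = []
    for k in scales:
        m = len(x) // k                       # number of complete blocks at scale k
        if m < 2:
            raise ValueError("scale too large for the sample")
        means = x[:m * k].reshape(m, k).mean(axis=1)
        out.append(means.var(ddof=1))         # unbiased only for independent data
    return np.array(out)

def expected_climacogram(gamma, k, n):
    """Expected value of the estimator for a true climacogram `gamma`
    (assumed form of Eq. 5), with n values at the basic scale and k < n."""
    return (gamma(k) - gamma(n)) / (1.0 - k / n)

# Hypothetical example: HK climacogram with unit variance at scale 1
H = 0.8
gamma_hk = lambda k: float(k) ** (2 * H - 2)

x_demo = np.arange(1.0, 9.0)                  # toy series 1, 2, ..., 8
g_demo = climacogram(x_demo, [1, 2])          # estimates at scales 1 and 2

# Ratio of expected to true value at scale 10 for a sample of 1000 values
bias_ratio = expected_climacogram(gamma_hk, 10, 1000) / gamma_hk(10)
```

For a persistent process (H > 0.5) the ratio is below 1, i.e. the classical estimator underestimates the true climacogram unless the bias is accounted for in model fitting.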
As incidental contributions of the paper, we mention the methodology proposed in the SM (sect. 3) to produce synthetic Gaussian-distributed time series of a process by decomposing it into multiple Markovian processes. This methodology is based only on an equation providing the scale parameters of the Markovian processes. Furthermore, we developed an ARMA(1,1) model in the SM (sect. 4), appropriate for simulating discrete-time Markovian processes. The need to introduce it arises because a discrete-time AR(1) model corresponds to a continuous-time process with Markovian properties only when Δ = 0; when Δ > 0, the errors it produces can be significant for a large first-order autocorrelation coefficient (see Fig. 5 of the SM).
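The remark on AR(1) versus ARMA(1,1) can be checked numerically. The sketch below assumes a continuous-time Markovian process with autocovariance c(τ) = λ exp(-τ/q), its standard climacogram γ(m) = 2λ (q/m)² (m/q + exp(-m/q) - 1), and the discrete-time relation that we take to be Eq. 7; neither is reproduced in this section, and the symbols λ, q and the function names are illustrative. It shows that for Δ > 0 the ratio of successive discrete-time autocovariances equals exp(-Δ/q) only for lags j ≥ 1, while the ratio of lag 1 to lag 0 differs, which is the signature of an ARMA(1,1) rather than an AR(1) structure.

```python
import math

def gamma_markov(m, lam=1.0, q=10.0):
    """Climacogram of a continuous-time Markovian process (assumed standard form)."""
    a = m / q
    return 2.0 * lam * (a + math.exp(-a) - 1.0) / a**2

def cd_from_climacogram(j, delta, gamma):
    """Discrete-time autocovariance from the climacogram (assumed form of Eq. 7)."""
    def big(x):                       # Gamma(x) = x^2 * gamma(x), with Gamma(0) = 0
        return x * x * gamma(x) if x > 0 else 0.0
    return (0.5 * big(abs(j + 1) * delta) + 0.5 * big(abs(j - 1) * delta)
            - big(abs(j) * delta)) / delta**2

delta, q = 5.0, 10.0                  # hypothetical time step and scale parameter
g = lambda m: gamma_markov(m, lam=1.0, q=q)

c = [cd_from_climacogram(j, delta, g) for j in range(5)]
ratios = [c[j + 1] / c[j] for j in range(4)]   # c(1)/c(0), c(2)/c(1), ...
ar1_ratio = math.exp(-delta / q)               # constant ratio an AR(1) would give
```

Here `ratios[0]` is noticeably larger than `ar1_ratio`, whereas `ratios[1]`, `ratios[2]`, ... coincide with it, so a single AR(1) coefficient cannot reproduce both the variance and the lagged autocovariances of the time-averaged process.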
Acknowledgement
This paper was partly funded by the Greek General Secretariat for Research and Technology through the research project “Combined REnewable Systems for Sustainable ENergy DevelOpment” (CRESSENDO; programme ARISTEIA II; grant number 5145). We thank the anonymous Associate Editor and the three anonymous Reviewers for their constructive comments, which helped us to improve the paper, as well as the Springer Correction Team for editing the manuscript.
References
Chen, Y., R. Sun and A. Zhou (2010), An improved Hurst parameter estimator based on fractional Fourier transform, Telecommunication Systems, 43(3/4), 197–206.
Dimitriadis, P., D. Koutsoyiannis and Y. Markonis (2012), Spectrum vs Climacogram, European Geosciences Union General Assembly 2012, Geophysical Research Abstracts, Vienna, Session HS7.5/NP8.3: Hydroclimatic stochastics, EGU2012-993.
Fleming, S.W. (2008), Approximate record length constraints for experimental identification of dynamical fractals, Ann. Phys. (Berlin), 17(12), 955–969.
Fourier, J. (1822), Théorie analytique de la chaleur, Firmin Didot Père et Fils, Paris.
Gilgen, H.J. (2006), Univariate time series in geosciences: Theory and examples, Springer, Berlin.
Hassani, H. (2010), A note on the sum of the sample autocorrelation function, Physica A, 389, 1601–1606.
Hassani, H. (2012), The sample autocorrelation function and the detection of long-memory processes, Physica A, 391, 6367–6379.
Hurst, H.E. (1951), Long term storage capacities of reservoirs, Trans. Am. Soc. Civil Engrs., 116, 776–808.
Kang, H.S., S. Chester and C. Meneveau (2003), Decaying turbulence in an active-grid-generated flow and comparisons with large-eddy simulation, J. Fluid Mech., 480, 129–160.
Khintchine, A. (1934), Korrelationstheorie der stationären stochastischen Prozesse, Mathematische Annalen, 109(1), 604–615.
Kolmogorov, A.N. (1941), Dissipation energy in locally isotropic turbulence, Dokl. Akad. Nauk. SSSR, 32, 16–18.
Koutsoyiannis, D. (2002), The Hurst phenomenon and fractional Gaussian noise made easy, Hydrological Sciences Journal, 47(4), 573–595.
Koutsoyiannis, D. (2003), Climate change, the Hurst phenomenon, and hydrological statistics, Hydrological Sciences Journal, 48(1), 3–24.
Koutsoyiannis, D. (2010), A random walk on water, Hydrology and Earth System Sciences, 14, 585–601.
Koutsoyiannis, D. (2012), Re-establishing the link of hydrology with engineering, Invited lecture at the National Institute of Agronomy of Tunis (INAT), Tunis, Tunisia.
Koutsoyiannis, D. (2013a), Encolpion of stochastics: Fundamentals of stochastic processes, 12 pages, Department of Water Resources and Environmental Engineering, National Technical University of Athens, Athens.
Koutsoyiannis, D. (2013b), Climacogram-based pseudospectrum: a simple tool to assess scaling properties, European Geosciences Union General Assembly 2013, Geophysical Research Abstracts, Vol. 15, Vienna, EGU2013-4209, European Geosciences Union.
Lombardo, F., E. Volpi and D. Koutsoyiannis (2013), Effect of time discretization and finite record length on continuous-time stochastic properties, IAHS - IAPSO - IASPEI Joint Assembly, Gothenburg, Sweden, International Association of Hydrological Sciences, International Association for the Physical Sciences of the Oceans, International Association of Seismology and Physics of the Earth's Interior.
Lombardo, F., E. Volpi, S. Papalexiou and D. Koutsoyiannis (2014), Just two moments! A cautionary note against use of high-order moments in multifractal models in hydrology, Hydrol. Earth Syst. Sci.
Mandelbrot, B.B. (1977), The Fractal Geometry of Nature, Freeman, New York, USA.
Papalexiou, S.M., D. Koutsoyiannis and C. Makropoulos (2013), How extreme is extreme? An assessment of daily rainfall distribution tails, Hydrology and Earth System Sciences.
Papoulis, A. (1991), Probability, Random Variables and Stochastic Processes, 3rd edition, McGraw Hill.
Press, W.H., S.A. Teukolsky, W.T. Vetterling and B.P. Flannery (2007), Numerical Recipes: The Art of Scientific Computing, 3rd edition, Cambridge University Press, New York.
Pope, S.B. (2000), Turbulent Flows, Cambridge University Press.
Stoica, P. and R. Moses (2004), Spectral Analysis of Samples, Prentice Hall.
Tyralis, H. and D. Koutsoyiannis (2011), Simultaneous estimation of the parameters of the Hurst-Kolmogorov stochastic process, Stochastic Environmental Research & Risk Assessment, 25(1), 21–33.
Wiener, N. (1930), Generalized Harmonic Analysis, Acta Mathematica, 55, 117–258.
Appendix
Here, we express the expected value of the discrete time autocovariance in terms only of its true continuous time value, using the corresponding true climacogram. This is very useful in stochastic modelling, as it saves computational time (compared to a direct calculation, which requires a sum over all the discrete time autocovariances) and also gives a physical interpretation of the expected discrete time autocovariance.
Eq. 2 can be expressed in terms of the true discrete autocovariance:

$$\gamma(k\Delta) = \frac{1}{k^{2}} \sum_{i=1}^{k} \sum_{j=1}^{k} c_d^{(\Delta)}(i-j) = \frac{2}{k^{2}} \sum_{i=1}^{k-1} (k-i)\, c_d^{(\Delta)}(i) + \frac{\gamma(\Delta)}{k} \qquad (37)$$
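The estimation of autocovariance in eq. 9 can be analysed to:

$$
\begin{aligned}
\mathrm{E}\!\left[\hat{c}_d^{(\Delta)}(j)\right]
&= \frac{1}{\zeta(j)}\, \mathrm{E}\!\left[ \sum_{i=1}^{n-j} \left( x_i^{(\Delta)} - \frac{1}{n}\sum_{l=1}^{n} x_l^{(\Delta)} \right) \left( x_{i+j}^{(\Delta)} - \frac{1}{n}\sum_{l=1}^{n} x_l^{(\Delta)} \right) \right] \\
&= \frac{1}{\zeta(j)}\, \mathrm{E}\!\left[ \sum_{i=1}^{n-j} \left( \left(x_i^{(\Delta)} - \mu\right) - \left(\frac{1}{n}\sum_{l=1}^{n} x_l^{(\Delta)} - \mu\right) \right) \left( \left(x_{i+j}^{(\Delta)} - \mu\right) - \left(\frac{1}{n}\sum_{l=1}^{n} x_l^{(\Delta)} - \mu\right) \right) \right] \\
&= \frac{1}{\zeta(j)} \sum_{i=1}^{n-j} \Bigg(
\underbrace{\mathrm{E}\!\left[\left(x_i^{(\Delta)} - \mu\right)\left(x_{i+j}^{(\Delta)} - \mu\right)\right]}_{E_1}
- \underbrace{\mathrm{E}\!\left[\left(x_i^{(\Delta)} - \mu\right)\left(\frac{1}{n}\sum_{l=1}^{n} x_l^{(\Delta)} - \mu\right)\right]}_{E_2} \\
&\qquad\qquad\quad
- \underbrace{\mathrm{E}\!\left[\left(x_{i+j}^{(\Delta)} - \mu\right)\left(\frac{1}{n}\sum_{l=1}^{n} x_l^{(\Delta)} - \mu\right)\right]}_{E_3}
+ \underbrace{\mathrm{E}\!\left[\left(\frac{1}{n}\sum_{l=1}^{n} x_l^{(\Delta)} - \mu\right)^{2}\right]}_{E_4}
\Bigg)
\end{aligned}
\qquad (38)
$$

where $\mu = \mathrm{E}\big[x_i^{(\Delta)}\big]$.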
Below we express the sums of the expressions E1, E2, E3 and E4 in terms of the true climacogram $\gamma(k\Delta)$ and the true autocovariance in discrete time $c_d^{(\Delta)}(j)$ for j ≥ 1. Firstly, the sum of E1 is:

$$\sum_{i=1}^{n-j} E_1 = (n-j)\, c_d^{(\Delta)}(j) \qquad (39)$$
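We observe that $\sum_{i=1}^{n-j} E_2 = \sum_{i=1}^{n-j} E_3$ and thus, we only calculate the sum of E3:

$$
\begin{aligned}
\sum_{i=1}^{n-j} E_3
&= \frac{1}{n} \sum_{i=1}^{n-j} \sum_{l=1}^{n} \mathrm{E}\!\left[\left(x_{i+j}^{(\Delta)} - \mu\right)\left(x_l^{(\Delta)} - \mu\right)\right]
= \frac{1}{n} \sum_{i=1}^{n-j} \sum_{l=1}^{n} c_d^{(\Delta)}(l-i-j) \\
&= \frac{(n-j)^{2}\, \gamma\big((n-j)\Delta\big)}{n}
+ \underbrace{\frac{1}{n} \sum_{i=1}^{n-j} \sum_{l=1}^{j} c_d^{(\Delta)}(l-i-j)}_{E_5}
\end{aligned}
\qquad (40)
$$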
The sum of E4 can be expressed in terms of the true climacogram:

$$\sum_{i=1}^{n-j} E_4 = (n-j)\, \mathrm{Var}\!\left[\frac{1}{n}\sum_{l=1}^{n} x_l^{(\Delta)}\right] = (n-j)\, \gamma(n\Delta) \qquad (41)$$
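For the estimation of E5, we distinguish two cases, j ≤ n/2 and j > n/2. For the first case, we have:

$$
\begin{aligned}
E_5\,(j \le n/2)
&= \frac{j}{n} \sum_{i=j}^{n-j} c_d^{(\Delta)}(i)
+ \frac{1}{n} \sum_{i=1}^{j-1} i\, c_d^{(\Delta)}(i)
+ \frac{1}{n} \sum_{i=n-j+1}^{n-1} (n-i)\, c_d^{(\Delta)}(i) \\
&= \frac{j\,\gamma(\Delta) - j^{2}\,\gamma(j\Delta)}{2n}
+ \underbrace{\frac{1}{n} \sum_{i=n-j+1}^{n-1} (n-i)\, c_d^{(\Delta)}(i)
+ \frac{j}{n} \sum_{i=1}^{n-j} c_d^{(\Delta)}(i)}_{E_6}
\end{aligned}
\qquad (42)
$$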
For the estimation of E6, we have:

$$E_6 = \frac{n\,\gamma(n\Delta)}{2} - \frac{\gamma(\Delta)}{2} + \underbrace{\frac{1}{n} \sum_{i=1}^{n-j} (j-n+i)\, c_d^{(\Delta)}(i)}_{E_7} \qquad (43)$$
and E7 can be expressed as:

$$E_7 = \frac{(n-j)\,\gamma(\Delta)}{2n} - \frac{(n-j)^{2}}{2n}\, \gamma\big((n-j)\Delta\big) \qquad (44)$$
For j > n/2, E5 is the same as for j ≤ n/2 but with j replaced by n - j, and thus, in the general case of E5:

$$E_5 = \frac{n\,\gamma(n\Delta)}{2} - \frac{j^{2}}{2n}\, \gamma(j\Delta) - \frac{(n-j)^{2}}{2n}\, \gamma\big((n-j)\Delta\big) \qquad (45)$$
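Thus, eq. 38 results in:

$$\mathrm{E}\!\left[\hat{c}_d^{(\Delta)}(j)\right] = \frac{1}{\zeta(j)} \left( (n-j)\, c_d^{(\Delta)}(j) + \frac{j^{2}}{n}\, \gamma(j\Delta) - j\, \gamma(n\Delta) - \frac{(n-j)^{2}}{n}\, \gamma\big((n-j)\Delta\big) \right) \qquad (46)$$

where ζ(j) is usually taken as n, n - 1 or n - j.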
It is interesting to notice that, using eq. 7, we can express the above expected discrete time autocovariance using only the true climacogram.
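The closed form of Eq. 46 can be checked against a direct evaluation of the expectation from the double sums of covariances. The sketch below does this for a discrete-time HK process (fractional Gaussian noise) with Δ = 1, for which γ(k) = k^(2H-2); this choice of process, the form assumed for Eq. 7, the value of H and all function names are illustrative assumptions rather than part of the derivation.

```python
import numpy as np

H = 0.8  # hypothetical Hurst parameter

def gamma(k):
    """True climacogram of the HK process at scale k (Delta = 1, unit variance)."""
    return float(k) ** (2 * H - 2)

def c_d(j):
    """True discrete-time autocovariance from the climacogram (assumed Eq. 7)."""
    def big(x):                       # Gamma(x) = x^2 * gamma(x), with Gamma(0) = 0
        return float(x) ** (2 * H) if x > 0 else 0.0
    j = abs(j)
    return 0.5 * big(j + 1) + 0.5 * big(abs(j - 1)) - big(j)

def expected_acov_closed(j, n, zeta):
    """Right-hand side of Eq. 46 (valid for 1 <= j <= n - 1)."""
    return ((n - j) * c_d(j) + j**2 / n * gamma(j) - j * gamma(n)
            - (n - j) ** 2 / n * gamma(n - j)) / zeta

def expected_acov_direct(j, n, zeta):
    """Direct expectation: sum over i of E1 - E2 - E3 + E4 (Eq. 38)."""
    idx = np.arange(1, n + 1)
    C = np.array([[c_d(a - b) for b in idx] for a in idx])  # covariance matrix
    cov_with_mean = C.mean(axis=1)    # cov of each x_i with the sample mean
    var_mean = C.mean()               # variance of the sample mean
    i = np.arange(0, n - j)           # zero-based positions of x_1 .. x_{n-j}
    total = np.sum(C[i, i + j] - cov_with_mean[i] - cov_with_mean[i + j] + var_mean)
    return total / zeta

n = 50
diffs = [abs(expected_acov_closed(j, n, n) - expected_acov_direct(j, n, n))
         for j in range(1, n)]
max_diff = max(diffs)                 # agreement up to floating-point rounding
lag1_expected, lag1_true = expected_acov_closed(1, n, n), c_d(1)
```

The direct evaluation needs the full n × n covariance matrix, whereas the closed form needs only four terms per lag; comparing `lag1_expected` with `lag1_true` also shows how strongly the classical estimator is biased downwards for a persistent process and a short sample.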