Revealing hidden persistence in maximum rainfall records
Theano Iliopoulou¹* and Demetris Koutsoyiannis¹
¹ Department of Water Resources, Faculty of Civil Engineering, National Technical University of Athens, Heroon Polytechneiou 5, GR-157 80 Zografou, Greece
* Corresponding author. Tel.: +30 6978580613
E-mail address: tiliopoulou@hydro.ntua.gr
Abstract
Clustering of extremes is critical for hydrological design and risk management and challenges the popular assumption of independence of extremes. We investigate the links between clustering of extremes and long-term persistence, also known as Hurst-Kolmogorov (HK) dynamics, in the parent process, exploring the possibility of inferring the latter from the former. We find that a) identifiability of persistence from maxima depends foremost on the choice of the threshold for extremes, the skewness and kurtosis of the parent process, and less on sample size; and b) existing indices for inferring dependence from series of extremes are biased downward when applied to non-Gaussian processes. We devise a probabilistic index based on the probability of occurrence of peak-over-threshold events across multiple scales, which can reveal clustering, linking it to the persistence of the parent process. Its application shows that rainfall extremes may exhibit noteworthy departures from independence and consistency with an HK model.
Keywords: extremes, clustering, HK dynamics, persistence, peaks over threshold, rainfall
1. Introduction
The identification of clusters in series of extreme events is an ongoing research topic in geosciences, including hydrology, one that is particularly challenging due to the large estimation uncertainties involved when studying series of rare events. Regardless of the complications, this question has multiple important implications for earth sciences, which range from understanding natural variability and process dynamics to correctly applying stochastic models for the purposes of inference and prediction. This is evident as most relevant hydrological and engineering applications require settling this issue at the early stage of the analysis, by either assuming independence (e.g. Coles et al., 2001; Kottegoda and Rosso, 2008) or ‘ensuring’ it through ‘adequate’ sampling techniques (Ferro and Segers, 2003). Thereby, the research focus can be uniquely placed on the more straightforward task of characterizing the probability distribution of extremes. For example, typical flood guidelines suggest that successive flood events have at least a certain separation lag time in order to be considered independent for the application of models (Lang et al., 1999). In light of concerns for intensification of hydrological extremes due to anthropogenic forcing, the investigation of clustering receives additional interest (Ntegeka and Willems, 2008; Tye et al., 2018; Merz et al., 2016; Serinaldi and Kilsby, 2018), as attribution of trends to an external deterministic forcing presupposes that at least the presence of natural inherent variability has been properly accounted for beforehand. In this respect, increasing evidence reporting the presence of persistence in various hydroclimatic variables (Hurst, 1951; Koutsoyiannis, 2003; Montanari, 2003; Markonis and Koutsoyiannis, 2016; O’Connell et al., 2016; Iliopoulou et al., 2016; Tegos et al., 2017; Dimitriadis, 2017) gives rise to the question of whether or not, and to what extent, a regular behaviour of the extremes originating from persistent processes could be misinterpreted as a result of an anthropogenic cause.
This study deals with the investigation of clustering behaviour in records of maxima with a special focus on long-term daily rainfall observational records. As recent studies reported evidence on the presence of persistence in annual rainfall (Iliopoulou et al., 2016; Tyralis et al., 2018), the question of possible propagation of persistence to rainfall extremes naturally arises. Therefore, the research objectives can be articulated as follows: a) what are the links between persistence in the parent process and clustering of extreme events and can we infer the one from the other? and, b) what constitutes an informative characterization for clustering?
Typically, the assessment of clustering properties of extremes from a timeseries implies the selection of a threshold based on which the sampling of ‘extreme’ events is performed. Then, clustering is quantified based on the departure of the properties of extremes from the ones of a purely random process. This evaluation is performed either by considering the series of the inter-arrival times of extremes or, equivalently, the series of counts of extreme events over counting windows. There is a direct correspondence between the two; it is well known, for example, that when the data come from a Poisson process, their inter-arrival times are exponentially distributed (Papoulis, 1991).
In the hydrological literature, various ad-hoc, sometimes visual and subjective approaches are used in order to quantify departures of extremes, typically floods, from independence and characterize clustering. The most systematic usually consist of some type of ‘window’ analysis, where the timeseries is split into subperiods which are examined for presence of perturbations in the statistics of extreme events, often corroborated by statistical testing (Marani and Zanetti, 2015; Ntegeka and Willems, 2008; Willems, 2013). Avoiding the need for selection of time windows to study, Merz et al. (2016) applied a dispersion index, although they mostly focused on a combination of kernel-based methods coupled with statistical significance tests to identify flood-rich and flood-poor periods in Germany. Yet, with a few exceptions only (Eichner et al., 2011; Serinaldi and Kilsby, 2016, 2018), the majority of clustering characterizations for hydrological extremes are not studied in relation to the dependence properties of the parent process, which is the focal point here.
To evaluate the clustering properties in a more comprehensive framework, two established indices are used in geophysical timeseries analysis, especially for the clustering analysis of earthquakes (Telesca et al., 2002) and storms (Vitolo et al., 2009), and are based on the ‘counts’ approach: the index of dispersion and the Allan factor. Both can be used to formally test the data against the Poissonian assumption (Serinaldi, 2013; Serinaldi and Kilsby, 2013) and it is reported that their scaling behaviour can also reveal the fractal properties of the underlying process for ideal rate fractal processes (Thurner et al., 1997). The latter is related to the asymptotic dependence property for large time horizons, i.e. long-term dependence, quantified by the Hurst parameter. For revealing the HK dynamics, a number of methods examining the original series also exist, with the climacogram (Koutsoyiannis, 2010), i.e. the variance of the aggregated process over scales, shown to be the most robust (Dimitriadis and Koutsoyiannis, 2015).
We briefly review the above methods based on their performance in revealing the clustering of extremes sampled from synthetic timeseries generated to exhibit various degrees of persistence and different marginal distributions. We assess their degree of generality and showcase their shortcomings when extremes arise from complex processes. We show how the interplay of persistence and moments of order higher than 2 (skewness, kurtosis) can obscure the identification of persistence from extremes. Accordingly, we propose an alternative characterization of clustering based on a probabilistic index with distinctive features and test the proposed method on synthetic and real-world rainfall data. We find that the index exhibits some advantageous characteristics, namely it is capable of quantifying clustering by probabilistic means, linking it to the scaling behavior of the parent process for a range of distributional and dependence properties. It also enables modelling the probabilities of threshold exceedances across multiple timescales, which can be used as a simulation tool, an important advance over existing methods that have mainly an inferential character.
2. Dataset
An extended dataset comprising the 60 longest available daily rainfall records is investigated in terms of its extreme properties. The data used in this study are collected from global datasets, i.e. the Global Historical Climatology Network Daily database (Menne et al., 2012) and the European Climate Assessment and Dataset (Klein Tank et al., 2002), and from third parties acknowledged in the Acknowledgments section. They present an update of the previous dataset explored in Iliopoulou et al. (2018) of long rainfall records surpassing 150 years of daily values. The geographic location of the rain gauges is shown in Figure 1. The length of the timeseries enables the investigation of clustering on extended time horizons from daily to yearly timescales.
3. Methodological framework
3.1 Definition of notation and mathematical formulation
Let $x_i$ be a stationary stochastic process in discrete time $i$, i.e. a collection of random variables $x_i$, and $x := \{x_1, \ldots, x_n\}$ a single realization (observation) of the latter, i.e. a timeseries. Now for $u$ being a threshold, $u \in \mathbb{R}$, we define the process of peaks over the threshold (POT) consisting of events surpassing the threshold $u$, i.e.,
$$y_i := \begin{cases} x_i, & x_i > u \\ 0, & x_i \le u \end{cases} \qquad (1)$$
Let also $N(t)$ be a counting process of POT occurrences in time, which is an increasing function of time $t$. We then define the process $z_q^{(k)} := N(qk) - N((q-1)k)$ as the number of occurrences of POT at timescale $k$ and at discrete time $q = 1, \ldots, n/k$.
We also define by $m_q^{(k)} := \max_{(q-1)k < j \le qk} \{x_j\}$ the block maxima series, which is formed by extracting the maximum order statistic of the observations divided in non-overlapping equally sized periods of length (timescale) $k$. In the following, we refer to the timescale $k$ as the timescale of filtering of the maxima. Figure 2 visualizes all the above at two temporal scales for a realization of a random process with Hurst parameter H = 0.8 and the first four moments following a generalized Pareto distribution.
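As a hedged illustration of the definitions above, the following minimal Python sketch computes the counts of threshold exceedances $z_q^{(k)}$ and the block maxima $m_q^{(k)}$ over non-overlapping blocks of length $k$. The function names and the toy series are assumptions made for illustration and are not part of the study; an incomplete trailing block is simply discarded.

```python
def pot_counts(x, u, k):
    """Number of values exceeding threshold u in each non-overlapping block of length k."""
    n_blocks = len(x) // k  # an incomplete trailing block is dropped
    return [sum(1 for v in x[q * k:(q + 1) * k] if v > u) for q in range(n_blocks)]


def block_maxima(x, k):
    """Maximum of each non-overlapping block of length k."""
    n_blocks = len(x) // k
    return [max(x[q * k:(q + 1) * k]) for q in range(n_blocks)]


# Toy series (hypothetical values), threshold u = 4, timescale k = 2
x = [1, 5, 2, 7, 3, 0, 9, 4]
counts = pot_counts(x, u=4, k=2)   # exceedances are counted with a strict inequality
maxima = block_maxima(x, k=2)
```

With a daily series, calling `block_maxima(x, 365)` would give an approximate annual maxima series under the assumption of 365-day years.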
3.2 Generation of benchmark synthetic timeseries
To evaluate the ability of clustering indices to discern the dependence characteristics of the parent (extreme generating) process, we first produce a set of synthetic timeseries $x_i$ with different dependence properties and marginal distributions. For the generation scheme, we employ a simulation procedure proposed by Dimitriadis and Koutsoyiannis (2018) which is capable of generating timeseries explicitly reproducing chosen theoretical moments up to any order together with any (long-term) persistence structure, i.e. the HK dynamics. We focus here only on processes exhibiting persistence as these are the ones assumed consistent with the natural phenomena studied and also known to produce long-term clustering. For the marginal distribution, we generate timeseries preserving up to the 4th order moments following the normal, generalized Pareto and gamma distributions. The higher-order moments of the generated timeseries follow the entropic distribution. Because the generation scheme preserves up to a specific number of moments from a distribution, the final shape may be slightly distorted with respect to the theoretical one, and therefore, we denote the generated series as type-gamma and type-Pareto, instead of gamma and generalized Pareto, respectively. For a detailed explanation of the generation scheme, the reader is referred to Dimitriadis and Koutsoyiannis (2018). We focus only on the first four moments as higher-order classical moments cannot be reliably estimated from ordinary sample sizes (Lombardo et al., 2014).
The properties of these timeseries are chosen in order to cover a range of statistical and stochastic characteristics in terms of skewness, kurtosis and H parameter, and therefore, provide a good benchmark sample for testing the indices in typical but also more extreme cases. Their properties are summarized in Table 1. We note that these timeseries are meant as theoretical case studies to test the appropriateness of the indices and are not to be considered as synthetic series of daily rainfall, which are the real-world data in question. However, since only the sequence of counts of extremes is of interest, and not their actual values, it is not necessary to strictly preserve other properties of daily rainfall, e.g. intermittency, and therefore in this sense comparison to the synthetic series is allowed. A sample of the timeseries is plotted in Figure 3.
In addition to the above benchmark timeseries, we generate ensembles of shorter timeseries having lengths of 150 × 365 values, i.e. equal to the minimum record length of the rainfall data, and preserving the same moments as the benchmark timeseries. These series are produced using fewer weights for the symmetric moving average (SMA) scheme, up to 2000, but applying a proper weight adjustment scheme (Koutsoyiannis, 2016). They reproduce two dependence structures, white noise, and HK with H parameter 0.7, considered a representative value for hydrological processes. The purpose of the second benchmark sample is to test the methods in ‘realistic’ record lengths and to evaluate estimation uncertainty by Monte Carlo simulations that require significantly less computational effort compared to the first benchmark sample, which is generated using $10^6$ weights, i.e. equal to the series length.
3.3 Second-order characterization of extremes
The Hurst parameter is a well-established measure of persistence. It can be estimated from the slope of the double logarithmic plot of the standard deviation of the averaged process versus the averaging timescale, i.e. the climacogram (Koutsoyiannis, 2010). To test how the estimator is impacted when extremes are used instead of the original values, we compute the H parameter for extremes extracted from windows (scales) of length 1 to N/10, where N is the timeseries length. An example is provided in Figure 4. The first H value (scale = 1) is the value for the original data (the parent timeseries) and as the scale increases progressively the timeseries is filtered to show only the most extreme data. For instance, if the basic timescale is daily, the estimated H parameter at timescale k = 365 corresponds to the H parameter of the annual maxima. To reduce computational time, we perform estimation every 50 scales. The results shown in Figure 4 are for the normal and the other benchmark timeseries. The impact of skewness and kurtosis on the estimator is striking: in the case of non-Gaussian timeseries, the H parameter quickly decays to 0.5, as if there were independence. On the contrary, for the normal timeseries it yields an almost stable value. To verify that this is not due to the impact of standard deviation bias induced by dependence, we also performed estimation for selected timescales with the LSSV estimator, which is unbiased with respect to the standard deviation (Koutsoyiannis, 2003; Tyralis and Koutsoyiannis, 2011). We also repeat the estimation for the shorter timeseries and plot the average values at each scale obtained from the Monte Carlo experiments. The same conclusion can be drawn. The climacogram estimator is severely biased downward for extremes originating from non-Gaussian processes and falsely indicates independence after a few scales of filtering. Therefore, we do not consider the climacogram estimator for the rest of the analysis on empirical data. Since it has been shown that the climacogram is closely related to other second-order characterizations, i.e. spectrum and autocovariance (Dimitriadis and Koutsoyiannis, 2015), we also expect similar results from the latter. Furthermore, Barunik and Kristoufek (2010) have shown that even for the underlying process (the parent), the sampling properties of the Hurst parameter estimation by some other approaches, i.e. the multifractal detrended fluctuation analysis and the detrending moving average, are also greatly impacted by heavy tails.
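The climacogram-based estimation described above can be sketched as follows. This is a minimal illustration, not the estimator used in the study: it assumes the HK scaling of the standard deviation of the $k$-averaged process, $\sigma^{(k)} \propto k^{H-1}$, so that H is one plus the log-log slope, and it applies no bias correction (unlike the LSSV estimator). All function names, the scales and the white noise test series are hypothetical choices.

```python
import math
import random
import statistics


def block_means(x, k):
    """Averages of non-overlapping blocks of length k."""
    n_blocks = len(x) // k
    return [sum(x[q * k:(q + 1) * k]) / k for q in range(n_blocks)]


def block_maxima(x, k):
    """Maxima of non-overlapping blocks of length k (filtering timescale k)."""
    n_blocks = len(x) // k
    return [max(x[q * k:(q + 1) * k]) for q in range(n_blocks)]


def loglog_slope(scales, values):
    """Ordinary least-squares slope of ln(values) against ln(scales)."""
    lx = [math.log(s) for s in scales]
    ly = [math.log(v) for v in values]
    mx, my = statistics.fmean(lx), statistics.fmean(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


def hurst_from_climacogram(x, scales):
    """H = 1 + slope of ln(std of the k-averaged series) versus ln(k)."""
    stds = [statistics.stdev(block_means(x, k)) for k in scales]
    return 1.0 + loglog_slope(scales, stds)


random.seed(1)
white_noise = [random.gauss(0.0, 1.0) for _ in range(20000)]

# H of the parent series and of its maxima filtered at timescale 10
h_parent = hurst_from_climacogram(white_noise, [1, 2, 4, 8, 16, 32])
h_maxima = hurst_from_climacogram(block_maxima(white_noise, 10), [1, 2, 4, 8, 16])
```

For white noise both estimates should lie close to 0.5; the interesting case discussed in the text is a persistent non-Gaussian parent, for which the estimate from the maxima drifts towards 0.5 even though the parent has H > 0.5.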
3.4 Clustering indices: the dispersion index
A well-known measure of clustering of events is the index of dispersion of counts, also known as the Fano factor (e.g. Thurner et al., 1997), which is defined as the ratio of the variance of the counts of events to their mean number at a specific timescale $k$, i.e.:
$$d^{(k)} := \frac{\mathrm{Var}\left[z^{(k)}\right]}{\mathrm{E}\left[z^{(k)}\right]} \qquad (2)$$
For a Poisson point process, the dispersion index is unity for all timescales. According to the literature (Thurner et al., 1997) the dispersion index exhibits power-law scaling behavior which is linked to the underlying persistence structure. Although the exact form of the equation provided could not be theoretically validated per se at small scales, we have confirmed the power-law scaling at large scales, which by revising the original equation (Thurner et al., 1997), can be expressed as:
$$d^{(k)} = c\,k^{2H-1}, \quad k \ge k_0 \qquad (3)$$
where $c$ is a real parameter and $k_0$ denotes the scaling onset timescale (a minimum timescale, for which the above scaling law applies). It follows that the exponent $2H - 1$ can be obtained as the slope of the double logarithmic plot of the dispersion index versus the timescale for $k \ge k_0$, and therefore the Hurst parameter, H, ranging in the [0,1] interval, can be estimated accordingly. An example is provided in Figure 5.
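A hedged sketch of the counts-based procedure follows: the dispersion index is computed as the variance-to-mean ratio of exceedance counts in non-overlapping blocks, and H is recovered from a log-log slope assumed equal to $2H - 1$. The function names, the uniform white noise series and the synthetic power law are illustrative assumptions only. Note that for independent exceedances in discrete time the counts are binomial, so the ratio equals one minus the exceedance probability, slightly below the Poisson value of unity.

```python
import math
import random
import statistics


def dispersion_index(x, u, k):
    """Variance-to-mean ratio of the exceedance counts in blocks of length k."""
    n_blocks = len(x) // k
    counts = [sum(1 for v in x[q * k:(q + 1) * k] if v > u) for q in range(n_blocks)]
    return statistics.variance(counts) / statistics.fmean(counts)


def hurst_from_dispersion(scales, d_values):
    """H = (slope + 1) / 2, with the slope fitted to ln d(k) versus ln k."""
    lx = [math.log(k) for k in scales]
    ly = [math.log(d) for d in d_values]
    mx, my = statistics.fmean(lx), statistics.fmean(ly)
    slope = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    return 0.5 * (slope + 1.0)


# Independent uniform values with a 5% exceedance threshold
random.seed(7)
iid = [random.random() for _ in range(100000)]
d_iid = dispersion_index(iid, u=0.95, k=50)

# Exact power law with exponent 0.6, i.e. 2H - 1 = 0.6
scales = [500, 1000, 2000, 4000]
h_exact = hurst_from_dispersion(scales, [2.0 * k ** 0.6 for k in scales])
```

In practice only scales at or above the chosen onset scale would be passed to `hurst_from_dispersion`, which is where the subjective choice discussed in this section enters.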
We test the dispersion index against samples of Gaussian and non-Gaussian timeseries exhibiting HK dynamics. Namely, we use a) the two long benchmark series ($N = 10^6$), the normal and the type-Pareto, both exhibiting H = 0.8, and b) the ensemble of simulations of shorter length (equal to daily values for 150 years) for three different distributions, normal, type-gamma with shape parameter α = 0.01 and type-Pareto with α = 0.2, all exhibiting H = 0.7. For the second sample, we provide the average value estimated from the $10^3$ Monte Carlo simulations of the dispersion index at each scale. Results are shown in Figure 5.
First, it is worth noting that the onset scale, from which scaling arises, appears to be smaller for the long compared to the shorter timeseries. The related H parameters are estimated from Eq. 3 for onset scale $k_0 = 500$, for both cases, in order to ensure a more robust estimate (yet fitted lines are extrapolated backwards to scale 365). It can be seen that the index yields satisfactory approximations of H only for the normal distribution and the long benchmark series (estimated H = 0.77, theoretical H = 0.8), whereas results are biased downward for the non-Gaussian one (estimated H = 0.67, theoretical H = 0.8). In the case of the shorter record length, the bias severely increases as the index yields H parameters falsely denoting independence (average H = 0.54). There is also a considerable degree of ambiguity regarding the selection of the onset time, a task that requires visual examination and subjective judgement. For the above reasons, and in particular the observed underestimation of persistence for common record lengths, we do not consider the index for the rest of the analysis. A more sophisticated use of the dispersion index as well as bias correction methods may be possible but remain out of the scope of the paper. For more information on a related index, the Allan factor, and its properties for testing independence, the reader is referred to Serinaldi and Kilsby (2013).
3.5 A new probabilistic index to identify multi-scale clustering behaviour
The above review highlights the complexity involved in identifying clustering of extremes and the need to devise an informative and objective characterization able to reveal persistence even for non-Gaussian series, which are usually the ones of interest in geophysical studies. To address this, we formulate a straightforward and assumption-free representation of clustering by estimating the probability of occurrence of extreme events across multiple scales. The proposed probabilistic index is defined as follows.
We set a threshold to the original timeseries and select the data surpassing the threshold as extreme events, hence forming the Peaks Over Threshold series, $y_i$. Accordingly, we form the series of counts of the POT events for each scale, $z^{(k)}$, as explained in Section 3.1 (see also Fig. 2). We additionally define the binary process $b_q^{(k)}$ to denote the event of exceedance of the threshold at each time interval $q$ of size $k$, $q = 1, \ldots, n/k$:
$$b_q^{(k)} := \begin{cases} 1, & z_q^{(k)} > 0 \\ 0, & z_q^{(k)} = 0 \end{cases} \qquad (4)$$
Then, the probability of exceedance of the threshold for timescale $k$ is obtained as the frequency of exceedances estimated from all $n/k$ intervals:
$$1 - p^{(k)} = \frac{k}{n} \sum_{q=1}^{n/k} b_q^{(k)} \qquad (5)$$
The latter is the exceedance probability (of the threshold) versus the scale (EPvS) and its complement, $p^{(k)}$, is the non-exceedance probability versus scale (NEPvS). Evidently, at scale k = 1 the EPvS is an estimate of the exceedance probability of the threshold value, $F(u)$, and the NEPvS is an estimate of $1 - F(u)$. For example, in the previous applications, the threshold value was selected so that F(u) = 0.05. For a purely random process, the NEPvS is obtained as:
$$p^{(k)} = p^k \qquad (6)$$
where $p$ is the probability of non-exceedance at the basic scale k = 1 and equals $1 - F(u)$. Therefore, for white noise processes, the probability of occurrence of extremes across scales is fully determined by the choice of the threshold (controlling its probability at the basic scale) and the scale. For HK processes though, a different behaviour is revealed, with the probabilities of non-exceedance of the threshold being larger than those obtained under independence. This property of HK is discussed and investigated extensively in Section 4.
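The empirical index and its white noise benchmark can be illustrated with the short sketch below. It assumes that the NEPvS at scale $k$ is estimated as the fraction of non-overlapping blocks of length $k$ containing no value above the threshold, and that the benchmark under independence is $p^k$; the function names and test series are hypothetical.

```python
import random


def nepvs(x, u, k):
    """Empirical non-exceedance probability at scale k: the fraction of
    non-overlapping blocks of length k with no value above threshold u."""
    n_blocks = len(x) // k
    empty = sum(1 for q in range(n_blocks) if max(x[q * k:(q + 1) * k]) <= u)
    return empty / n_blocks


def nepvs_white_noise(p, k):
    """Benchmark under independence: p**k, with p the NEPvS at scale 1."""
    return p ** k


# Toy series (hypothetical): blocks [3, 0] and [1, 1] hold no exceedance of u = 4
toy = [1, 5, 2, 7, 3, 0, 9, 4, 1, 1]
toy_nepvs = nepvs(toy, u=4, k=2)

# Independent uniform values with a 5% exceedance threshold
random.seed(3)
iid = [random.random() for _ in range(100000)]
p1 = nepvs(iid, u=0.95, k=1)
empirical = nepvs(iid, u=0.95, k=20)
benchmark = nepvs_white_noise(p1, 20)
```

For a persistent series one would expect `empirical` to exceed `benchmark`, which is the departure from independence that the index is meant to expose.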
To model the NEPvS, we revisit a probabilistic model proposed by Koutsoyiannis (2006) to describe the clustering behaviour of dry spells in rainfall timeseries. The model derives from an entropy-maximization framework and was originally proposed to describe the probability dry across different timescales. The latter, according to our definition, corresponds to a threshold taking the value of 0. Therefore, in a similar manner to the probability dry, we obtain the probability of non-exceedance of the threshold at scale $k$ as:
$$p^{(k)} = p^{\left[1 + \left(\xi^{-1/\eta} - 1\right)(k - 1)\right]^{\eta}} \qquad (7)$$
where $p = 1 - F(u)$ as in Eq. 6, $u$ is the threshold parameter and η, ξ ∈ [0, 1]. For η = 1 and ξ = 0.5, Eq. 7 describes the white noise process. To allow backward extendibility to scale k = 0, the positivity of the base should be ensured and therefore the following inequality should hold: $\xi^{-1/\eta} < 2$. We apply both the index and the proposed model to the synthetic series as well as to the rainfall data and assess their performance in characterizing clustering. We evaluate the index’s ability to reveal dependence by examining its performance for all the benchmark timeseries and we test its robustness by varying all the involved factors, i.e. sample size, marginal distribution’s properties and threshold value.
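The following sketch illustrates the two-parameter model numerically. It rests on an assumption that should be checked against Koutsoyiannis (2006): the model is taken in the form $p^{(k)} = p^{[1 + (\xi^{-1/\eta} - 1)(k - 1)]^{\eta}}$, which reduces to the white noise case $p^k$ for η = 1 and ξ = 0.5. The grid search is a crude stand-in for a fitting procedure, not the one used in the study, and all names are hypothetical.

```python
import math


def nepvs_model(p, k, eta, xi):
    """Assumed model form: p ** ((1 + (xi ** (-1 / eta) - 1) * (k - 1)) ** eta)."""
    base = 1.0 + (xi ** (-1.0 / eta) - 1.0) * (k - 1.0)
    return p ** (base ** eta)


def extendable_to_zero(eta, xi):
    """True if the base stays positive at k = 0, i.e. xi ** (-1 / eta) < 2."""
    return xi ** (-1.0 / eta) < 2.0


def fit_grid(p, scales, observed, steps=20):
    """Least-squares grid search over eta and xi in (0, 1] (illustrative only)."""
    best = None
    for i in range(1, steps + 1):
        for j in range(1, steps + 1):
            eta, xi = i / steps, j / steps
            err = sum((nepvs_model(p, k, eta, xi) - o) ** 2
                      for k, o in zip(scales, observed))
            if best is None or err < best[0]:
                best = (err, eta, xi)
    return best[1], best[2]


# White noise check: eta = 1, xi = 0.5 should reproduce p ** k
white = nepvs_model(0.95, 10, 1.0, 0.5)

# Recover known parameters from noise-free synthetic NEPvS values
scales = [1, 2, 5, 10, 50, 100]
synthetic = [nepvs_model(0.95, k, 0.8, 0.7) for k in scales]
eta_hat, xi_hat = fit_grid(0.95, scales, synthetic)
```

Under this assumed form the value at scale 2 is $p^{1/\xi}$, so ξ controls the departure from independence at the smallest scales while η controls how the exponent grows with scale.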
4. Results
4.1 Relating multi-scale clustering to LRD behavior
We estimate the NEPvS index for the synthetic benchmark timeseries setting the threshold of extremes to 5%. The benchmark series have length $10^6$ and therefore for a 5% threshold we obtain 50 000 extreme values (POT events). We investigate the temporal scales 1 to 1000, since the index’s applicability to larger scales is to some extent also conditioned by the available sample size (this feature is discussed in Section 4.2).
Results from the NEPvS application are demonstrated on a double logarithmic plot of minus the natural logarithm of the non-exceedance probability of the threshold versus the scale, which for most cases yields a straight line (Fig. 6). Some interesting insights can be derived. As persistence increases, the probability of no occurrences of extremes in a scale progressively increases (equivalently, its minus logarithm shown in the plots decreases), which is true for all the examined distribution types. As already mentioned, there is a maximum temporal scale up to which the index is informative. The latter, which we will call the max-discernible scale, is the scale for which the estimated (from the simulated series) non-exceedance probability equals zero as at least one extreme event is encountered in every one of the $n/k$ intervals. In this case, the minus logarithm of the NEPvS tends to infinity and is not shown on the plots. For a given number of extremes and thus, sample size, the max-discernible scale depends on the H parameter; the larger the persistence, the more timescales are required in order to ‘encounter’ the extremes. This can be explained by considering that another manifestation of clustering of extremes is the existence of prolonged periods of time with no extreme occurrences.
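A minimal sketch of the quantities plotted in this kind of graph is given below. It assumes the max-discernible scale is taken as the smallest examined scale at which the empirical NEPvS drops to zero; that reading, the function names and the periodic toy series are illustrative assumptions.

```python
import math


def nepvs(x, u, k):
    """Fraction of non-overlapping blocks of length k with no value above u."""
    n_blocks = len(x) // k
    empty = sum(1 for q in range(n_blocks) if max(x[q * k:(q + 1) * k]) <= u)
    return empty / n_blocks


def minus_log_nepvs(x, u, scales):
    """Pairs (k, -ln NEPvS) for the scales where the estimate is still positive."""
    points = []
    for k in scales:
        p_k = nepvs(x, u, k)
        if p_k > 0.0:
            points.append((k, -math.log(p_k)))
    return points


def max_discernible_scale(x, u, scales):
    """Smallest examined scale at which every block contains an exceedance."""
    for k in sorted(scales):
        if nepvs(x, u, k) == 0.0:
            return k
    return None  # not reached within the examined scales


# Hypothetical series with one exceedance in every third time step
x = [0, 0, 1] * 10
points = minus_log_nepvs(x, u=0.5, scales=[1, 2, 3])
k_max = max_discernible_scale(x, u=0.5, scales=[1, 2, 3])
```

Plotting the pairs in `points` on double logarithmic axes gives the type of graph described above; scales at or beyond `k_max` carry no information because the logarithm is undefined there.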
It is worth noticing that the marginal properties are irrelevant for the NEPvS of the white noise process. This is also confirmed in Fig. 6, as the lines of all the white noise timeseries with different marginals are identical, for which there is a theoretical justification. Likewise, for H parameters not far from 0.5 the different non-Gaussian distributions (Fig. 6a) yield negligible differences on the NEPvS plots. However, notable differences appear for H > 0.7. Specifically, the non-Gaussian NEPvS plots evidently differ from the NEPvS of the normal distribution, especially for large H values, with the latter showing more apparent clustering behaviour.
The NEPvS model (Eq. 7) fits perfectly the whole range of non-Gaussian distributions, with a slight exception for the normal timeseries at small scales (k < 50) and very large H parameter (H = 0.9).
4.2 NEPvS index sensitivity to sample size, threshold selection and distribution type
Having established that a representation in terms of the minus logarithm of probability vs. timescale, like that of Fig. 6, reflects the presence of persistence for a range of distribution types, we aim to frame its statistical behaviour for different configurations of extreme value analysis. For this purpose, the statistical behaviour of this graph is investigated by means of Monte Carlo simulation starting from the white noise case, which will serve as a benchmark model for identifying dependence from the rainfall data.
4.2.1 Sample size impact
We generate two ensembles of $10^3$ white noise timeseries with sample sizes of 150 years (150 × 365 daily values) and 300 years respectively, thus covering the whole range of observed record lengths of our dataset, and we produce the NEPvS plots for both lengths, shown in Figure 7. As expected, the larger sample size produces narrower Monte Carlo Prediction Limits (MCPL), yet the difference is almost negligible. The fact that sample sizes of this order of magnitude yield only minimal differences in the MCPL gives confidence in attributing the differences between the models that are examined next to other factors instead. The essential change, however, between the two sample sizes is the propagation of the max-discernible scale to a larger timescale for the longer timeseries (Fig. 7). The latter is due to the fact that ‘extremes’ are distributed in longer time periods for the longer series, and therefore, the longer the series the more timescales may be inspected for clustering.
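The Monte Carlo step can be sketched as below for a single scale. This is a simplified illustration under stated assumptions: the white noise is uniform, the threshold is set from the empirical order statistics of each series, the prediction limits are plain empirical quantiles of the simulated NEPvS values, and the ensemble (200 series of 2000 values) is much smaller than the one described above to keep the run short. The function names are hypothetical.

```python
import random


def nepvs(x, u, k):
    """Fraction of non-overlapping blocks of length k with no value above u."""
    n_blocks = len(x) // k
    empty = sum(1 for q in range(n_blocks) if max(x[q * k:(q + 1) * k]) <= u)
    return empty / n_blocks


def empirical_threshold(x, exceedance_prob):
    """Order statistic leaving roughly the given fraction of values above it."""
    ordered = sorted(x)
    return ordered[round((1.0 - exceedance_prob) * len(x)) - 1]


def white_noise_limits(n, k, exceedance_prob, n_sim, level=0.95, seed=0):
    """Empirical prediction limits of the NEPvS at scale k under independence."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sim):
        x = [rng.random() for _ in range(n)]
        u = empirical_threshold(x, exceedance_prob)
        estimates.append(nepvs(x, u, k))
    estimates.sort()
    lower = estimates[int((1.0 - level) / 2.0 * n_sim)]
    upper = estimates[int((1.0 + level) / 2.0 * n_sim) - 1]
    return lower, upper


lower, upper = white_noise_limits(n=2000, k=10, exceedance_prob=0.05, n_sim=200)
theoretical = 0.95 ** 10  # p ** k for a 5% threshold at scale 10
```

An observed NEPvS value above `upper` at a given scale would then point to more clustering than independence can explain; repeating the calculation over a range of scales gives limits of the kind drawn in the figures.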
4.2.2 Threshold impact
The selection of the threshold is the most important choice when analysing records of maxima. It is generally acknowledged that choosing ‘high’ thresholds for the extremes results in observations that are located far in the right tail of the distribution, and therefore they are of interest, but simultaneously increases uncertainty as the sampled observations are fewer. The exact opposite is true for lower thresholds. Therefore, one has to seek an optimal threshold balancing this trade-off.
We first evaluate the choice of the threshold by examining four different thresholds associated with exceedance probabilities 0.5%, 1%, 5% and 10% respectively, applied for the benchmark case of independence, as seen in Figure 8. It is interesting to note that the main effect of the threshold for the iid case is the opportunity to apply the index to larger scales if the threshold is increased (smaller probability of exceedance). This is due to the fact that for the same record length, fewer extreme events are likely to be more separated in time and therefore, require longer timescales to be grouped.
We also inspect the impact of the threshold in relation to the H parameter of the parent process for three distribution types from the benchmark series, type-Pareto with α = 0.2, type-gamma with α = 0.01 and the normal. In this case, we evaluate three different thresholds, 5%, 10% and 20%. Although the latter threshold would be considered ‘low’ for most extreme value analyses, here it is of interest, as by varying the threshold we aim to investigate the limits of identifiability of the HK behaviour, and not to focus on the exact shape of the distribution tail. To this aim, we fit the probabilistic model introduced in Eq. 7 to each timeseries and evaluate the ability to reveal persistence through the identifiability of the fitted parameters, η and ξ. In Fig. 9, the impact of the threshold is striking within the same distribution, with lower threshold values (e.g. 20%) increasing identifiability of the parameters by more than 10%. Additionally, it can be seen that the η parameter is more sensitive to the normal distribution, while on the contrary the ξ parameter is sensitive to increasing skewness and kurtosis.
By performing the above experiments, we have demonstrated the twofold effect of the threshold: ‘lower’ thresholds (higher probability of exceedance) enable better identifiability of persistence, yet they limit application of the index to fewer scales, and vice versa.
4.2.3 Distribution type
At this stage, for the same threshold (5%), sample size (150 years) and H parameter (0.7), we estimate the NEPvS index for the shorter benchmark series characterized by different marginal properties, and thus distribution tails, so as to focus solely on the impact of skewness and kurtosis on the index. Results are plotted in Figure 10. Two important conclusions can be drawn: a) clustering of extremes and its identifiability are, in this case too, greater for the normal distribution (Fig. 10a) and b) for a specified non-Gaussian distribution, clustering is greater and also more visible for increasing skewness and kurtosis (Fig. 10b). The latter is a significant advance, as the tools reviewed in sections 3.3 and 3.4 showed very high downward bias for increasing higher-order moments of the non-Gaussian distributions and practically no difference among them for the record lengths available (150 years). We also provide the plots of the fitting of the η and ξ parameters computed for the long benchmark series with H parameters ranging in [0.5, 0.99], as well as their comparison, in the Appendix (Fig. A1-A3). All three plots confirm the above observations.
4.3 Clustering in real world rainfall extremes I: identifying clustering mechanisms in the parent process
Rainfall is a complex geophysical process whose stochastic modelling must take into account its mixed-type marginal distribution (due to intermittency), the presence of cyclo-stationarity (seasonality and also the diurnal cycle for sub-daily scales), and its scale dependence structure (Markonis and Koutsoyiannis, 2016). It is expected that all these mechanisms affect the clustering process of extremes.
In the following, we investigate their impact separately, although we note that the interplay among them may not necessarily allow the robust disentanglement of their effects at the different scales.
4.3.1 Influence of probability dry
The most distinctive feature of the rainfall process is its highly intermittent nature at fine temporal scales (Koutsoyiannis, 2006). To statistically account for intermittency, the marginal distribution is formed as a mixed (discrete-continuous) type, having a probability mass concentrated at 0 and a probability distribution function to describe the nonzero values. Therefore, if pd is the probability of no rain, termed probability dry, then the cumulative distribution function for the whole rainfall record, F(x), can be defined in terms of the conditional distribution of wet days, F(x | x > 0), as:
$F(x) = p_d + (1 - p_d)\,F(x \mid x > 0)$ (8)
Since the threshold of extremes u is obtained as the quantile with a chosen probability of exceedance, it is evident that in the case of mixed-type processes, as in daily rainfall, the same threshold value will have a different probability of exceedance for the whole process and for the wet process (the nonzero rainfall). By simple probabilistic statements, it follows that the two exceedance probabilities of the threshold u for the compound and the wet process, pc(u) and pw(u), respectively, are related as:
$p_c(u) = (1 - p_d)\,p_w(u)$ (9)
where pd = 1 - pc(0) is the probability dry. Therefore, the exceedance probability for the same threshold is higher for the wet series, which means that, depending on the probability dry, the values surpassing the same threshold may not necessarily belong to the right tail of the wet series as ‘extremes’. For instance, a threshold u with associated exceedance probability 5% for the whole rainfall record with probability dry equal to 80% yields exceedance probability 25% for the wet series, and therefore the resulting series of POT events would also include lower rainfall values. While this is not a limitation of the methodology, it should be properly accounted for in order to a) ensure that the resulting extremes are indeed towards the right end of the wet series tail and b) make meaningful comparisons among stations with different values of the probability dry. For this reason, we compute pd for all stations in order to make sure that the resulting extremes surpass relevant thresholds. As previously shown, the latter is important since the threshold is the key control on the results.
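The conversion between the two exceedance probabilities is simple enough to sketch directly. The snippet uses the relation pc(u) = (1 - pd) pw(u) and reproduces the numerical example given in the text (5% on the whole record, 80% probability dry, 25% on the wet series). The function names are hypothetical helpers introduced only for illustration.

```python
def wet_exceedance_probability(p_compound, p_dry):
    """Exceedance probability on the wet series for a threshold whose
    exceedance probability on the whole (compound) record is p_compound."""
    if not 0.0 <= p_dry < 1.0:
        raise ValueError("p_dry must lie in [0, 1)")
    return p_compound / (1.0 - p_dry)

def compound_probability_for_wet_target(p_wet_target, p_dry):
    """Exceedance probability to request on the whole record so that the
    threshold has exceedance probability p_wet_target on the wet series."""
    if not 0.0 <= p_dry < 1.0:
        raise ValueError("p_dry must lie in [0, 1)")
    return (1.0 - p_dry) * p_wet_target

# Example from the text: 5% on the whole record with probability dry 80%.
p_wet = wet_exceedance_probability(0.05, 0.80)

# Inverse use: which whole-record probability gives a 5% wet-series threshold?
p_comp = compound_probability_for_wet_target(0.05, 0.80)
```

The inverse helper is one possible way to make thresholds comparable across stations with different probability dry. Whether to fix the wet-series or the whole-record probability remains a modelling choice.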
4.3.2 Influence of seasonality
Seasonality may in some cases be an important attribute of extreme rainfall, impacting the central tendency of rainfall maxima belonging to different seasons and inducing temporal clustering in the series of extremes (Iliopoulou et al., 2018). Since our aim is to focus on the impact of HK dynamics on clustering of extremes, we apply deseasonalization schemes to the original series in order to smooth out the seasonal components and reduce the associated clustering. By doing so, we may perform Monte Carlo simulations with one marginal distribution per station for the validation of the chosen models. We note that a perfect separation of the impact of seasonality from HK dynamics may not always be possible, as in stations exhibiting strong seasonality we anticipate interplay between the two.
We consider two different methods for removing seasonality. The first one, termed M1, is a simple standardization scheme performed on a monthly basis. The daily values xi belonging to each month m = 1,..,12 are transformed by subtracting the mean and dividing by the standard deviation of all daily values belonging to the same month, as follows: $z_i = (x_i - \mu_m)/\sigma_m$, where $\mu_m$ and $\sigma_m$ denote the mean and standard deviation of the daily values of month m. This method effectively removes seasonality from the first two moments of the data. In order to deal with higher order moments, we apply a second deseasonalization scheme, denoted M2, which is based on the Normal Quantile Transformation (NQT), also applied on a monthly basis. The daily series for each month m are transformed to standard Gaussian quantiles through the inverse function of the standard Gaussian cumulative distribution, $z_i = \Phi^{-1}(F(x_i))$, with their cumulative probability F(x) estimated via their Weibull plotting position. Consequently, after the transformation, all daily values of each month follow the standard normal distribution. We found that the two schemes show minimal differences in the index’s behaviour, with the most apparent ones belonging to the stations of Athens (Fig. 11b), Palermo and Lisbon.
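The two deseasonalization schemes can be sketched as below. This is a minimal illustration under stated assumptions and not the authors' implementation: the function names, the synthetic gamma-distributed input and the 30-day "months" are hypothetical. The Weibull plotting position is taken as rank/(n + 1), and ties (such as the many zeros in a real rainfall record) are broken by order of appearance, which is a simplification.

```python
import numpy as np
from statistics import NormalDist

def deseasonalize_m1(values, months):
    """M1: subtract the monthly mean and divide by the monthly standard deviation."""
    values = np.asarray(values, dtype=float)
    months = np.asarray(months)
    out = np.empty_like(values)
    for m in np.unique(months):
        sel = months == m
        out[sel] = (values[sel] - values[sel].mean()) / values[sel].std(ddof=1)
    return out

def deseasonalize_m2(values, months):
    """M2: monthly normal quantile transformation with Weibull plotting positions."""
    values = np.asarray(values, dtype=float)
    months = np.asarray(months)
    inv_cdf = NormalDist().inv_cdf  # inverse standard Gaussian CDF
    out = np.empty_like(values)
    for m in np.unique(months):
        sel = months == m
        x = values[sel]
        # Ordinal ranks 1..n; ties are broken by position (simplifying assumption).
        ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable") + 1
        weibull_p = ranks / (x.size + 1.0)
        out[sel] = [inv_cdf(float(p)) for p in weibull_p]
    return out

# Synthetic example: 20 "years" of 12 months x 30 days with a seasonal scale.
rng = np.random.default_rng(42)
months = np.tile(np.repeat(np.arange(1, 13), 30), 20)
values = rng.gamma(shape=0.8, scale=1.0 + months / 4.0)

z1 = deseasonalize_m1(values, months)
z2 = deseasonalize_m2(values, months)
```

M1 leaves the monthly skewness and kurtosis untouched, whereas M2 forces every month onto the same standard normal marginal. This is why M2 is the stronger of the two schemes.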
In Figure 11, we plot three characteristic cases of the NEPvS behaviours found in the data: a) in a typical station with minimal to no seasonality (Oxford), extremes are not affected by deseasonalization schemes (Fig. 11a), b) in a station with prominent seasonality (Athens, Fig. 11b), a stronger deseasonalization scheme (M2) may be required, and c) in an intermediate case (Helsinki, Fig. 11c), the seasonal component in extremes is effectively dealt with by the simpler scheme (M1). The majority of the stations (40) belong to the third category, while for 17 stations accounting for seasonality yields minimal to no difference. These findings are consistent in general with the analysis of Iliopoulou et al. (2018) on the presence of seasonality in extreme rainfall.
4.3.3 Rainfall scaling regimes
In order to highlight the motivation behind selecting daily rainfall as a case study for the method and to establish the ‘target’ persistence structure that we aim to reveal, we estimate the persistence properties of the previously deseasonalized daily rainfall series. To this aim, we compute the H parameter through the climacogram, as introduced in section 3.3. All the empirical climacograms are plotted in Figure 12. The estimated average persistence (Table 2) is close to, but even larger than, the global estimate (H ≈ 0.6) of Iliopoulou et al. (2016) concerning annual rainfall. Remarkably, in many stations we observe a change of the scaling regime, namely an intensification of persistence, at scales above yearly. A similar result was observed in the work of Markonis and Koutsoyiannis (2016) for rainfall records at the over-decadal scale. This behaviour is also evident in Table 2, which reports the estimated H parameters for the daily and above-yearly scales.
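A climacogram-based estimate of H can be sketched as follows. It assumes the standard definition of the climacogram as the variance of the time-averaged process versus the averaging scale k, for which an HK process scales as k^(2H - 2). The simple log-log regression used here does not include any bias correction, so it is only an illustration and not the estimator of section 3.3. All names and the white-noise input are hypothetical.

```python
import numpy as np

def climacogram(x, scales):
    """Sample variance of the block-averaged series at each scale k."""
    x = np.asarray(x, dtype=float)
    variances = []
    for k in scales:
        n_blocks = x.size // k
        block_means = x[: n_blocks * k].reshape(n_blocks, k).mean(axis=1)
        variances.append(block_means.var(ddof=1))
    return np.array(variances)

def hurst_from_climacogram(x, scales):
    """H from the log-log slope of the climacogram: slope = 2H - 2."""
    gamma = climacogram(x, scales)
    slope, _intercept = np.polyfit(np.log(scales), np.log(gamma), 1)
    return 1.0 + slope / 2.0

# White noise should give an estimate near H = 0.5.
rng = np.random.default_rng(1)
white_noise = rng.standard_normal(2 ** 15)
scales = np.array([1, 2, 4, 8, 16, 32, 64, 128])
h_white = hurst_from_climacogram(white_noise, scales)
```

Fitting the slope separately over daily-to-yearly and above-yearly scales would be one way to expose a change of scaling regime, at the cost of very few blocks at the largest scales.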
4.4 Clustering in real world rainfall extremes II: HK dynamics?
4.4.1 Analysis of daily rainfall extremes in the Netherlands
It should be evident by now that the clustering dynamics of extremes depend not only on the persistence properties of the parent process but on its higher-order moments as well. The identifiability of clustering also varies depending on the choice of the threshold, which may need to be modified for mixed-type processes, as discussed before. In our case, this means that, depending on the probability dry of each station, the chosen threshold will correspond to a different one for the ‘wet’ record of each station. Therefore, a blind comparison of different stations with the obtained MCPL for a given threshold could be uninformative, depending on the variability of probability dry in the sample of the stations. In order to apply the methodology effectively in as many stations as possible, we assume a climatically homogeneous region in which the rainfall timeseries can be regarded as realizations of a single process. For this purpose, we select the region of the Netherlands, in which 28 out of the 60 stations are located and for which preliminary analysis showed small variability of the summary statistics. We estimate the average values of the first four moments of the deseasonalized records for all 28 stations and we also estimate the H parameter resulting from the analysis of the daily values. We form an ensemble of $10^3$ Monte Carlo simulations for the average number of years of the sample (160 years) with an HK model preserving the first four moments and subsequently compare its clustering behaviour with the one observed from the sample of the stations. We also repeat the Monte Carlo simulation for a white-noise process. We present both in Fig. 13. It is evident that the assumed model is consistent with the majority of the observed records, with only a few stations located in the south-west of the Netherlands exhibiting even stronger clustering, outside the 95% region of the assumed HK model. As expected, as the threshold increases, evidence of persistence is progressively ‘lost’ and the probabilistic behaviour of POT occurrences approaches a random one.
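The white-noise part of such a Monte Carlo experiment can be sketched as follows. The sketch covers only the independence benchmark. Generating HK series that preserve four moments requires a dedicated scheme (the text refers to the SMA codes of Dimitriadis and Koutsoyiannis, 2018) and is not attempted here. The estimator, the function names, the ensemble size of 200 and the record length are illustrative assumptions.

```python
import numpy as np

def nep_by_scale(x, exceed_prob, scales):
    """Fraction of non-overlapping windows of length k with no exceedance
    of the empirical (1 - exceed_prob) quantile (assumed estimator)."""
    u = np.quantile(x, 1.0 - exceed_prob)
    exceed = x > u
    values = []
    for k in scales:
        n_win = exceed.size // k
        windows = exceed[: n_win * k].reshape(n_win, k)
        values.append(np.mean(~windows.any(axis=1)))
    return np.array(values)

def white_noise_limits(n, exceed_prob, scales, n_sim=200, level=0.95, seed=0):
    """Monte Carlo prediction limits of the index under independence."""
    rng = np.random.default_rng(seed)
    sims = np.array([
        nep_by_scale(rng.standard_normal(n), exceed_prob, scales)
        for _ in range(n_sim)
    ])
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(sims, [tail, 100.0 - tail], axis=0)
    return lower, upper

scales = [1, 2, 4, 8, 16, 32]
lower, upper = white_noise_limits(n=5000, exceed_prob=0.05, scales=scales)

# Theoretical iid curve, expected to fall inside the simulated band.
iid_theory = np.array([(1.0 - 0.05) ** k for k in scales])
```

An observed curve lying above `upper` at several scales would indicate more clustering than independence can explain. Replacing the white-noise generator with an HK generator would give the corresponding limits for the persistent model.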
4.4.2 Stykkisholmur case study
As a second case study we select a single station located in Stykkisholmur, Iceland, which is the station with the most peculiar behaviour among all those we analysed. We repeat the Monte Carlo analysis for both a white-noise process and an HK process preserving the first four moments and the H parameter (= 0.65) of the record. Results are shown in Figure 14. It is interesting to note that clustering in this case appears stronger than predicted by the HK model. The Monte Carlo experiment is repeated for H = 0.7 to explore the possible impact of estimation uncertainty due to the standard deviation bias in finite sample sizes (Koutsoyiannis and Montanari, 2007). In this case, the MCPL approach the observed data for the lower threshold, yet the impact is lower for the higher threshold. A similar behaviour was found in the station of Uppsala. We hypothesize that this ‘discrepancy’ between the persistence found in the parent process and the stronger one implied by the extremes might be explained by the impact of large-scale atmospheric circulation patterns (such as the NAO) on rainfall extremes, which might need even longer record lengths in order to be effectively summarized by the second-order characterization provided by the H parameter.
4.4.3 Modeling the clustering behavior
We apply the NEPvS model to both the seasonal and the deseasonalized timeseries of the rainfall data of all 60 stations in order to assess its applicability in all cases. We employ the deseasonalization scheme M1. In Fig. 15 we plot the boxplots of the estimated parameters η and ξ as well as the RMSE for the seasonal and the deseasonalized series for three different thresholds, 1%, 5% and 10%. The fitted parameters reaffirm that, as the threshold decreases, the estimates of the parameters deviate from the ones obtained for the iid case (ξ = 0.5 and η = 1). From the RMSE (Fig. 15c), it can be seen that the proposed model describes the deseasonalized data very well and the original observations fairly well, and in both cases the modelling efficiency improves for lower thresholds. Seasonality is associated with increased temporal clustering at the intermediate scales (approx. 20-150 days), which manifests as a curvature in the NEPvS plots that the model captures less efficiently compared to the deseasonalized case, which typically produces a straight-line plot. Also, it is evident that results concerning the threshold are not as robust for this case, since the impact of the threshold on seasonal clustering may vary depending on the specific seasonal regime. For instance, it is expected that for stations with prominent seasonality, high thresholds will show increased clustering only in the wettest season, whereas lower thresholds will enable inspection of clustering in more seasons. However, depending on the characteristics of the seasonal regime and the intensity of the specific seasons, the temporal mixture of extremes from the different seasons differs from case to case, and thus it is not straightforward to discern the impact of seasonality from a bulk fitting to all cases. On the other hand, for the deseasonalized cases it is clear that ‘dependence’ emerges as the threshold lowers.
5. Discussion
Clustering of extreme events is related to the presence of persistence, or HK dynamics, in natural processes. Here we approached this relationship with a twofold intention: first, to ‘retrieve’ persistence from records of maxima, and second, to characterize it by probabilistic means. To this aim, we have introduced the NEPvS index, for which we also propose a model. The index examines the probabilistic behaviour of POT occurrences across multiple scales and proved successful in revealing persistence from extremes of various non-Gaussian timeseries, for which well-known tools performed poorly.
It seems difficult, though, to establish general analytical relationships linking the NEPvS behaviour to the H parameter of the parent process, and this holds even without considering the uncertainty involved in estimating H from small record lengths in the first place. As the H parameter is a second-order characterization of a process, generation schemes reproducing the same H behaviour but coupled with different marginal distributions (having different high-order moments) will yield different behaviours of extremes. For instance, clustering of extremes and its identifiability appear to be much more prominent in Gaussian processes. The task, therefore, of linking clustering of extremes to the H parameter without also accounting for the specific high-order moments of the timeseries seems infeasible. We showed, though, that the threshold is a key determinant in this respect, as lowering the threshold, i.e. moving towards the central tendency of the data, enables better identification of persistence. On the contrary, as the threshold increases, evidence of persistence is progressively lost and the behaviour of extremes may falsely suggest independence of the parent process.
Application of the NEPvS index to daily rainfall data showed that there may exist significant departures from independence, particularly for lower thresholds, which are dependent on the location and specific climatic region. In general, the behaviour of rainfall extremes in multiple case studies (28 stations in the Netherlands and 1 in Iceland) was found, by means of extensive Monte Carlo simulations, to be consistent with HK dynamics characterized by moderate H parameters (in the range 0.6-0.7). The NEPvS model showed a very good fit to the probabilistic behaviour of exceedances for the seasonal and deseasonalized observations across multiple scales for all 60 stations. As a similar version of the model has been previously proposed to describe the probability dry across multiple scales (Koutsoyiannis, 2006), this result suggests that there exists a probabilistic law which effectively describes the multi-level exceedances of rainfall thresholds across scales, from zero-crossings (wet days) to high-level crossings, such as the ones examined here.
From a theoretical point of view, these findings suggest that it is important to study change and clustering in a consistent stochastic framework examining the whole process behaviour, in order to better understand the process dynamics and avoid retaining ‘preconceived’ assumptions, such as iid, which may be inconsistent with the physical reality. For instance, various trend tests assume iid for the examined process, while modified tests accounting for persistence (Hamed, 2008) also do not consider its interplay with the higher-order moments. Therefore, it is likely that they fail to account for extremes from complex processes, leaving aside issues regarding problematic applications due to misinterpretation of stationarity (Koutsoyiannis and Montanari, 2015; Montanari and Koutsoyiannis, 2014). Overdispersion in POT rainfall events has also been studied lately and attributed to a mixture of Poisson models representing different climate regimes (Tye et al., 2018) as well as to seasonality mechanisms (Serinaldi and Kilsby, 2013). Although we have also found that in some cases seasonality accounts for most of the observed clustering in the rainfall extremes, by performing multiple MC experiments focusing on the deseasonalized extremes we have revealed consistency with HK dynamics. We note, though, that as the H parameter for rainfall revolves around the value of 0.6 and rainfall is a heavily skewed process, it is expected that identifiability of persistence from extremes will be limited unless one lowers the threshold. Nevertheless, this highlights an alternative scientific hypothesis to be considered in ‘attribution’ studies, which is the emergence of clustering and overdispersion of extremes from persistence in the parent process.
From a practical point of view, the presence of persistence in the parent process affects estimation of extreme values, and therefore various design outcomes, in multiple ways. Although the theoretical definition of return period is still valid in the presence of persistence (Koutsoyiannis, 2008; Volpi et al., 2015), the statistical estimates of distribution quantiles for a specified return period are severely impacted. Other important implications concern flood risk underestimation under persistence (Serinaldi and Kilsby, 2016), as well as underestimation of IDF curves when the temporal dependence is disregarded (Roy et al., 2019). Therefore, although persistence of the parent process is less evident in the series of its extremes, and it is highly unlikely that it can be fully retrieved except for very low thresholds, its impact cannot be disregarded when studying extremes, even if the latter appear independent. Theoretical arguments do exist concerning the validity of well-known theorems under relaxed iid assumptions, for instance fundamental EVT results (limiting distributions etc.), which hold true under weak presence of persistence (Leadbetter, 1983). However, for scientific applications, which involve estimation from data of finite and typically small record lengths, the presence of persistence in the process induces uncertainty in the estimation, as the actual information content of the data is lower than that for iid conditions (Koutsoyiannis and Montanari, 2007), and this uncertainty inevitably propagates into the extreme value estimates.
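The loss of information content can be made concrete for the simplest statistic, the sample mean. For an HK process the variance of the mean of n values is σ²/n^(2 - 2H) (Koutsoyiannis, 2003), so the record carries as much information about the mean as n' = n^(2(1 - H)) independent values. The snippet below evaluates this for the record length and H value used in the benchmark experiments. It is an illustration of the general point and not a computation reported by the authors, and the function name is hypothetical.

```python
def equivalent_sample_size(n, hurst):
    """Equivalent iid sample size for estimating the mean of an HK process,
    n' = n ** (2 * (1 - H)); applies to the mean only, not to quantiles."""
    if not 0.0 < hurst < 1.0:
        raise ValueError("hurst must lie in (0, 1)")
    return n ** (2.0 * (1.0 - hurst))

# 150 annual values: independence (H = 0.5) versus moderate persistence (H = 0.7).
n_iid = equivalent_sample_size(150, 0.5)
n_hk = equivalent_sample_size(150, 0.7)
```

With H = 0.7, a 150-value record is worth roughly 20 independent values as far as the mean is concerned, which gives a sense of how strongly persistence can inflate estimation uncertainty.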
The existence of clustering also strengthens the arguments for the use of the POT method for sampling of extremes, instead of block maxima approaches, which tend to hide dependence, as is also evident in Fig. 2. As the threshold plays a vital role, using POT approaches with more than one event per year on average, which is the common practice, is equally important. Empirical declustering approaches (Lang et al., 1999) may also be ineffective if they do not take into account the characteristics of each process. In this regard, we argue that instead of seeking to resort to independence, often at the cost of reducing the available information (e.g. by discounting ‘dependent’ data), accounting for dependence is a more viable and consistent way forward. In fact, the use of the whole set of observations has been recently advocated (Volpi et al., 2019), while the emergence of new types of high-order moments (Koutsoyiannis, 2019) that exploit the whole set of observations provides an improved stochastic framework for applying this principle.
6. Conclusions
This research deals with the question of identifying the links between persistence in the parent process and clustering of extremes, with the specific aim to ‘rediscover’ the usually ‘lost’ persistence when one examines records of maxima. This is achieved by devising a probabilistic characterization of clustering of extremes. The main findings are summarized below:
a. There is significant influence from both the second-order properties and the high-order moments of the parent process on the generated extremes, and therefore characterizations of clustering of extremes need to account for both.
b. Identifiability of persistence from records of maxima is in general limited and weakens as the threshold for extremes increases.
c. The estimates of the Hurst parameter from the climacogram analysis and from the dispersion index are found to be severely biased downward when derived from extremes originating from non-Gaussian processes.
d. A new probabilistic index is proposed to represent clustering based on the probability of non-exceedance of a given threshold across scales, called the NEPvS (non-exceedance probability vs scale) index.
e. The NEPvS exhibits scaling behaviour which is described by a proposed model accurately simulating the probability of exceedance of a threshold at multiple temporal scales.
f. The index is transparent and can be directly used for statistical testing of departures from independence. Case-specific Monte Carlo simulations are needed to validate more complicated models coupling persistence with different marginal properties.
g. The POT approach applied with ‘low’ thresholds is a robust and informative way to reveal the clustering dynamics of extremes, in contrast to the block maxima method which hinders identifiability of persistence.
h. Deseasonalized daily rainfall POT events may show prominent departures from independence especially at lower thresholds, which may become important depending on the climatic region. Extensive station-specific Monte Carlo experiments showed consistency of clustering of extremes for various examined thresholds with assumed HK models fitted based on the properties of the parent process.
Further research is required in order to obtain analytical mathematical results for extremes arising from persistent processes, with the aim of constructing estimators for any distribution type and dependence structure without the need for Monte Carlo validations. However, the feasibility of the latter is doubtful, since extremes over scales are controlled by higher-order moments, which are also difficult to estimate correctly from data (Lombardo et al., 2014). Recently proposed moment types with unbiased estimators across all orders, which can also model joint properties of processes, could provide a way to circumvent this (Koutsoyiannis, 2019).
We conclude that extremes tend to ‘hide’ the persistence of the parent process, often falsely signalling independence. Regardless, however, of the strength of the evidence, the impact of persistence in the parent process on the estimation of extreme values is nonetheless present. In this respect, more research should focus on the stochastic properties of extremes from natural processes, where dependence mechanisms manifest themselves across various temporal scales and challenge common assumptions and practices.
Acknowledgments
We greatly thank the Radcliffe Meteorological Station, the Icelandic Meteorological Office (Trausti Jónsson), the Czech Hydrometeorological Institute, the Finnish Meteorological Institute, the National Observatory of Athens, the Department of Earth Sciences of the Uppsala University and the Regional Hydrologic Service of the Tuscany Region (servizio.idrologico@regione.toscana.it) for providing the required data for each region respectively. We are also grateful to Professor Ricardo Machado Trigo (University of Lisbon) for providing the Lisbon timeseries, to Professor Marco Marani (University of Padua) for providing the Padua timeseries and to Professor Joo-Heon Lee (Joongbu University) for providing the Seoul timeseries. All the above data were freely provided after contacting the acknowledged sources. The remaining timeseries are publicly available from the data providers in the ECA&D project (http://www.ecad.eu) and in the GHCN-Daily database (https://data.noaa.gov/dataset/global-historical-climatology-network-daily-ghcn-daily-version-3). The analyses were performed in Python 2.6 (Python Software Foundation. Python Language Reference, version 2.7, available at http://www.python.org) using the contributed packages pandas, scipy and seaborn. The codes used for the generation of the synthetic SMA series (Dimitriadis and Koutsoyiannis, 2018) are available at: https://www.itia.ntua.gr/en/docinfo/1656/. We are grateful to the Associate Editor Elena Volpi and the anonymous reviewer for the encouraging and constructive comments.
References
Barunik, J., Kristoufek, L., 2010. On Hurst exponent estimation under heavy-tailed distributions. Physica A: Statistical Mechanics and its Applications 389, 3844-3855. https://doi.org/10.1016/j.physa.2010.05.025
Coles, S., Bawa, J., Trenner, L., Dorazio, P., 2001. An introduction to statistical modeling of extreme values. Springer.
Dimitriadis, P., 2017. Hurst-Kolmogorov dynamics in hydrometeorological processes and in the microscale of turbulence.
Dimitriadis, P., Koutsoyiannis, D., 2018. Stochastic synthesis approximating any process dependence and distribution. Stoch Environ Res Risk Assess 32, 1493-1515. https://doi.org/10.1007/s00477-018-1540-2
Dimitriadis, P., Koutsoyiannis, D., 2015. Climacogram versus autocovariance and power spectrum in stochastic modelling for Markovian and Hurst-Kolmogorov processes. Stochastic environmental research and risk assessment 29, 1649-1669.
Eichner, J.F., Kantelhardt, J.W., Bunde, A., Havlin, S., 2011. The statistics of return intervals, maxima, and centennial events under the influence of long-term correlations, in: In Extremis. Springer, pp. 2-43.
Ferro, C.A., Segers, J., 2003. Inference for clusters of extreme values. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65, 545-556.
Hamed, K.H., 2008. Trend detection in hydrologic data: the Mann-Kendall trend test under the scaling hypothesis. Journal of hydrology 349, 350-363.
Hurst, H.E., 1951. Long-term storage capacity of reservoirs. Trans. Amer. Soc. Civil Eng. 116, 770-808.
Iliopoulou, T., Koutsoyiannis, D., Montanari, A., 2018. Characterizing and modeling seasonality in extreme rainfall. Water Resources Research 54, 6242-6258.
Iliopoulou, T., Papalexiou, S.M., Markonis, Y., Koutsoyiannis, D., 2016. Revisiting long-range dependence in annual precipitation. Journal of Hydrology.
Klein Tank, A.M.G., Wijngaard, J.B., Können, G.P., Böhm, R., Demarée, G., Gocheva, A., Mileta, M., Pashiardis, S., Hejkrlik, L., Kern-Hansen, C., 2002. Daily dataset of 20th-century surface air temperature and precipitation series for the European Climate Assessment. International journal of climatology 22, 1441-1453.
Kottegoda, N.T., Rosso, R., 2008. Applied statistics for civil and environmental engineers. Blackwell Malden, MA.
Koutsoyiannis, D., 2019. Knowable moments for high-order stochastic characterization and modelling of hydrological processes. Hydrological Sciences Journal 0, 1-15. https://doi.org/10.1080/02626667.2018.1556794
Koutsoyiannis, D., 2016. Generic and parsimonious stochastic modelling for hydrology and beyond. Hydrological Sciences Journal 61, 225-244.
Koutsoyiannis, D., 2010. HESS Opinions "A random walk on water". Hydrology and Earth System Sciences 14, 585-601.
Koutsoyiannis, D., 2008. Probability and statistics for geophysical processes, doi:10.13140/RG.2.1.2300.1849/1, National Technical University of Athens, Athens.
Koutsoyiannis, D., 2006. An entropic-stochastic representation of rainfall intermittency: The origin of clustering and persistence. Water Resources Research 42.
Koutsoyiannis, D., 2003. Climate change, the Hurst phenomenon, and hydrological statistics. Hydrological Sciences Journal 48, 3-24.
Koutsoyiannis, D., Montanari, A., 2015. Negligent killing of scientific concepts: the stationarity case. Hydrological Sciences Journal 60, 1174-1183.
Koutsoyiannis, D., Montanari, A., 2007. Statistical analysis of hydroclimatic time series: Uncertainty and insights. Water resources research 43.
Lang, M., Ouarda, T., Bobée, B., 1999. Towards operational guidelines for over-threshold
682
modeling. Journal of hydrology 225, 103117.
683
Leadbetter, M.R., 1983. Extremes and local dependence in stationary sequences. Probability
684
Theory and Related Fields 65, 291306.
685
Lombardo, F., Volpi, E., Koutsoyiannis, D., Papalexiou, S.M., 2014. Just two moments! A
686
cautionary note against use of high-order moments in multifractal models in hydrology.
687
Hydrology and Earth System Sciences 18, 243255.
688
Marani, M., Zanetti, S., 2015. Long-term oscillations in rainfall extremes in a 268 year daily time
689
series. Water Resources Research 51, 639647.
690
Markonis, Y., Koutsoyiannis, D., 2016. Scale-dependence of persistence in precipitation records.
691
Nature Climate Change 6, 399.
692
Menne, M.J., Durre, I., Vose, R.S., Gleason, B.E., Houston, T.G., 2012. An Overview of the
693
Global Historical Climatology Network-Daily Database. J. Atmos. Oceanic Technol. 29,
694
897910. https://doi.org/10.1175/JTECH-D-11-00103.1
695
Merz, B., Nguyen, V.D., Vorogushyn, S., 2016. Temporal clustering of floods in Germany: Do
696
flood-rich and flood-poor periods exist? Journal of Hydrology 541, 824838.
697
Montanari, A., 2003. Long-range dependence in hydrology. Theory and applications of long-range
698
dependence 461472.
699
Montanari, A., Koutsoyiannis, D., 2014. Modeling and mitigating natural hazards: Stationarity is
700
immortal! Water Resources Research 50, 97489756.
701
Ntegeka, V., Willems, P., 2008. Trends and multidecadal oscillations in rainfall extremes, based
702
on a more than 100-year time series of 10 min rainfall intensities at Uccle, Belgium. Water
703
Resources Research 44.
704
O’Connell, P.E., Koutsoyiannis, D., Lins, H.F., Markonis, Y., Montanari, A., Cohn, T., 2016. The
705
scientific legacy of Harold Edwin Hurst (18801978). Hydrological Sciences Journal 61,
706
15711590.
707
Papoulis, A., 1991. Probability, Random Variables, and Stochastic Processes, 3rd ed. McGraw-Hill, New York.
Roy, T., Dimitriadis, P., Iliopoulou, T., Koutsoyiannis, D., 2019. A probabilistic Intensity-Duration-Frequency framework considering temporal dependence (in preparation).
Serinaldi, F., 2013. On the relationship between the index of dispersion and Allan factor and their power for testing the Poisson assumption. Stochastic Environmental Research and Risk Assessment 27, 1773–1782.
Serinaldi, F., Kilsby, C.G., 2018. Unsurprising Surprises: The Frequency of Record-breaking and Overthreshold Hydrological Extremes Under Spatial and Temporal Dependence. Water Resources Research 54, 6460–6487.
Serinaldi, F., Kilsby, C.G., 2016. Understanding persistence to avoid underestimation of collective flood risk. Water 8, 152.
Serinaldi, F., Kilsby, C.G., 2013. On the sampling distribution of Allan factor estimator for a homogeneous Poisson process and its use to test inhomogeneities at multiple scales. Physica A: Statistical Mechanics and its Applications 392, 1080–1089.
Tegos, A., Tyralis, H., Koutsoyiannis, D., Hamed, K., 2017. An R function for the estimation of trend significance under the scaling hypothesis-application in PET parametric annual time series. Open Water Journal 4, 6.
Telesca, L., Cuomo, V., Lapenna, V., Macchiato, M., 2002. On the methods to identify clustering properties in sequences of seismic time-occurrences. Journal of Seismology 6, 125–134.
Thurner, S., Lowen, S.B., Feurstein, M.C., Heneghan, C., Feichtinger, H.G., Teich, M.C., 1997. Analysis, synthesis, and estimation of fractal-rate stochastic point processes. Fractals 5, 565–595.
Tye, M.R., Katz, R.W., Rajagopalan, B., 2019. Climate change or climate regimes? Examining multi-annual variations in the frequency of precipitation extremes over the Argentine Pampas. Climate Dynamics 53, 245–260. https://doi.org/10.1007/s00382-018-4581-9
Tyralis, H., Dimitriadis, P., Koutsoyiannis, D., O’Connell, P.E., Tzouka, K., Iliopoulou, T., 2018. On the long-range dependence properties of annual precipitation using a global network of instrumental measurements. Advances in Water Resources 111, 301–318.
Tyralis, H., Koutsoyiannis, D., 2011. Simultaneous estimation of the parameters of the Hurst-Kolmogorov stochastic process. Stochastic Environmental Research and Risk Assessment 25, 21–33.
Vitolo, R., Stephenson, D.B., Cook, I.M., Mitchell-Wallace, K., 2009. Serial clustering of intense European storms. Meteorologische Zeitschrift 18, 411–424.
Volpi, E., Fiori, A., Grimaldi, S., Lombardo, F., Koutsoyiannis, D., 2015. One hundred years of return period: Strengths and limitations. Water Resources Research 51, 8570–8585.
Volpi, E., et al., 2019. Save hydrological observations! Return period estimation without data decimation. Journal of Hydrology 571, 782–792. https://doi.org/10.1016/j.jhydrol.2019.02.017
Willems, P., 2013. Adjustment of extreme rainfall statistics accounting for multidecadal climate oscillations. Journal of Hydrology 490, 126–133.
Figures
Figure 1. Map of the 60 stations with longest records used in the analysis.
Figure 2. Explanatory graph of the mathematical formulation. (a) Parent timeseries, (b) POT series, (c) temporal distribution of counts of POT occurrences at basic scale k=1, (d) temporal distribution of counts of POT occurrences at scale k=10 and (e) block maxima series at scale k=10.
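The quantities named in panels (b) to (e) of Figure 2 can be illustrated with a minimal Python sketch. It is not the authors' code: the helper names `pot_indicator` and `block_reduce`, the 5% exceedance probability, and the use of an independent gamma series in place of an HK process are all assumptions made for illustration only.

```python
import numpy as np

def pot_indicator(x, exceed_prob):
    """Mark with 1 the values above the empirical (1 - exceed_prob) quantile."""
    threshold = np.quantile(x, 1.0 - exceed_prob)
    return (np.asarray(x) > threshold).astype(int)

def block_reduce(x, k, func):
    """Apply func to consecutive non-overlapping blocks of length k.

    Any trailing remainder shorter than k is dropped.
    """
    x = np.asarray(x)
    n_blocks = len(x) // k
    return func(x[: n_blocks * k].reshape(n_blocks, k), axis=1)

rng = np.random.default_rng(0)
# Hypothetical parent series: independent gamma values, NOT an HK process.
parent = rng.gamma(shape=0.1, scale=5.1, size=3650)

pot = pot_indicator(parent, exceed_prob=0.05)    # (b) POT occurrences
counts_k1 = block_reduce(pot, 1, np.sum)         # (c) counts at scale k=1
counts_k10 = block_reduce(pot, 10, np.sum)       # (d) counts at scale k=10
maxima_k10 = block_reduce(parent, 10, np.max)    # (e) block maxima at k=10
```

At scale k=1 the counts coincide with the POT indicator itself, while at larger scales they aggregate occurrences over each block, which is what makes clustering visible.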
Figure 3. Visualization of three timeseries with H=0.8 and different marginal distributions generated from the 4-moment SMA scheme (Dimitriadis and Koutsoyiannis, 2018). The legends report the mean, standard deviation, coefficient of skewness and coefficient of kurtosis of each distribution.
Figure 4. H parameters estimated from block maxima series at increasing scale of filtering for (a) benchmark series of length 10⁶ from HK models with H=0.8 following normal and type-Pareto distributions and (b) average H values from 10³ Monte Carlo simulations for HK models with H=0.7 and three different marginal distributions, type-gamma, type-Pareto and normal.
Figure 5. Index of dispersion of POT occurrences versus scale (double logarithmic axes) and estimated H parameters for scales > 500 for (a) benchmark series of length 10⁶ from HK models with theoretical H=0.8 following normal and type-Pareto distributions and (b) average values from 10³ Monte Carlo simulations for HK models with theoretical H=0.7 and three different marginal distributions, type-gamma, type-Pareto and normal.
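As a reading aid for Figure 5, the sketch below computes an index of dispersion (variance-to-mean ratio) of POT counts at several scales. It assumes non-overlapping blocks, a 5% empirical threshold and a Gaussian white-noise series; the function name `index_of_dispersion` is hypothetical, and the estimation of H from the index is deliberately left out because its exact form is given in the main text rather than here.

```python
import numpy as np

def index_of_dispersion(indicator, k):
    """Variance-to-mean ratio of exceedance counts in non-overlapping blocks of length k."""
    indicator = np.asarray(indicator)
    n_blocks = len(indicator) // k
    counts = indicator[: n_blocks * k].reshape(n_blocks, k).sum(axis=1)
    return counts.var(ddof=1) / counts.mean()

rng = np.random.default_rng(1)
x = rng.standard_normal(200_000)                      # independent series (assumption)
indicator = (x > np.quantile(x, 0.95)).astype(int)    # 5% of values exceed the threshold

scales = [1, 10, 100, 1000]
dispersion = {k: index_of_dispersion(indicator, k) for k in scales}
```

For an independent series the counts are approximately binomial, so the index stays near 1 - p (here about 0.95) at every scale; positive dependence in the occurrences would instead make it grow with scale.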
Figure 6. Minus natural logarithm of non-exceedance probability versus scale (NEPvS) index on double logarithmic axes along with the fit of the proposed model (Eq. 2) for (a) benchmark non-Gaussian timeseries (type-gamma and type-Pareto) and (b) benchmark normal timeseries, for a range of H parameters.
Figure 7. Minus natural logarithm of non-exceedance probability versus scale (NEPvS) index on double logarithmic axes for white noise timeseries and two sample lengths, 150×365 and 300×365.
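The white-noise case of Figure 7 has a simple reference behaviour that a short sketch can make concrete. The code below assumes that the non-exceedance probability at scale k is estimated as the fraction of non-overlapping windows of length k containing no POT event; this estimator, the function name `nepvs` and the 5% threshold are illustrative assumptions, not a reproduction of the paper's procedure. For independent values with exceedance probability p, the probability of no exceedance in k steps is (1 - p)^k, so the index equals -k ln(1 - p).

```python
import numpy as np

def nepvs(indicator, scales):
    """-ln of the fraction of non-overlapping windows of length k with no exceedance."""
    indicator = np.asarray(indicator)
    values = []
    for k in scales:
        n_blocks = len(indicator) // k
        counts = indicator[: n_blocks * k].reshape(n_blocks, k).sum(axis=1)
        p_none = np.mean(counts == 0)
        # If every window contains an exceedance the index is unbounded.
        values.append(-np.log(p_none) if p_none > 0 else np.inf)
    return np.array(values)

rng = np.random.default_rng(2)
n = 150 * 365                       # sample length quoted in the caption
p = 0.05                            # assumed exceedance probability
x = rng.standard_normal(n)          # Gaussian white noise
indicator = (x > np.quantile(x, 1 - p)).astype(int)

scales = np.array([1, 2, 5, 10, 20])
empirical = nepvs(indicator, scales)
independent = -scales * np.log(1 - p)   # theoretical line under independence
```

On double logarithmic axes the independence line has unit slope, so a persistent parent process would show up as a departure of the empirical points from that line.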
Figure 8. Minus natural logarithm of non-exceedance probability versus scale (NEPvS) index on double logarithmic axes for white noise timeseries (length 150×365) and variations of the sampling threshold of extremes.
Figure 9. (a) Parameter η variation for increasing H parameter and different combinations of the sampling threshold and distribution type. (b) Parameter ξ variation for increasing H parameter and different combinations of the sampling threshold and distribution type.
Figure 10. Minus natural logarithm of non-exceedance probability versus scale (NEPvS) index on double logarithmic axes along with 95% MCPL for (a) H=0.7 with type-gamma (α=0.1) and type-Pareto (α=0.2), and white noise and (b) H=0.7 for two type-gamma distributions with α=0.1 and α=0.01.
Figure 11. Minus natural logarithm of non-exceedance probability versus scale (NEPvS) index on double logarithmic axes for white noise timeseries and seasonal and deseasonalized series by methods 1 (M1) and 2 (M2) for the stations of Oxford (a), Athens (b) and Helsinki (c).
Figure 12. Empirical climacograms of the 60 daily rainfall series used in the analysis along with theoretical lines for H=0.5, 0.6, 0.7, 0.8.
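To make the comparison in Figure 12 concrete, the following sketch computes an empirical climacogram (variance of the time-averaged series versus averaging scale k) and the HK reference line, whose variance scales as k^(2H-2). The function names, the white-noise test series and the log-log regression used to read off H are assumptions for illustration; the plain sample variance used here is biased downward under persistence, so this is not a substitute for a bias-aware estimator.

```python
import numpy as np

def climacogram(x, scales):
    """Sample variance of the series averaged over non-overlapping windows of length k."""
    x = np.asarray(x, dtype=float)
    values = []
    for k in scales:
        n_blocks = len(x) // k
        means = x[: n_blocks * k].reshape(n_blocks, k).mean(axis=1)
        values.append(means.var(ddof=1))
    return np.array(values)

def hk_climacogram(var1, scales, H):
    """Theoretical HK line: variance at scale k equals var1 * k**(2H - 2)."""
    return var1 * np.asarray(scales, dtype=float) ** (2 * H - 2)

rng = np.random.default_rng(3)
x = rng.standard_normal(100_000)            # white noise, so H should be near 0.5
scales = np.array([1, 2, 4, 8, 16, 32, 64])

empirical = climacogram(x, scales)
slope = np.polyfit(np.log(scales), np.log(empirical), 1)[0]
H_est = 1 + slope / 2                       # invert slope = 2H - 2
```

A series with H above 0.5 would show a milder decay of variance with scale, that is, a log-log slope closer to zero than the -1 obtained for white noise.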
Figure 13. Minus natural logarithm of non-exceedance probability versus scale (NEPvS) index on double logarithmic axes for deseasonalized series for the 28 rainfall records in the Netherlands along with 95% MCPL of the fitted model with H=0.7, for four different thresholds: (a) 10%, (b) 5%, (c) 1% and (d) 0.5%.
Figure 14. Minus natural logarithm of non-exceedance probability versus scale (NEPvS) index on double logarithmic axes for the deseasonalized series of Stykkisholmur in Iceland along with 95% MCPL of the fitted models with H=0.65 and H=0.7, for four different thresholds: (a) 10%, (b) 5%, (c) 1% and (d) 0.5%.
Figure 15. Boxplots of (a) parameter η, (b) parameter ξ and (c) RMSE from the fitting of the model to the seasonal and deseasonalized series by M1 for three different thresholds (1%, 5% and 10%).
Tables
Table 1 Properties of the benchmark samples used in the experiments.

| Distribution type | Shape parameter | Scale parameter | Location parameter | Mean | Variance | Skewness | Kurtosis | H | Length |
|---|---|---|---|---|---|---|---|---|---|
| Normal | - | 2.6 | 1.25 | 1.25 | 2.6 | 0 | 3 | 0.5–0.99 | 10⁶ |
| Gamma | 0.1 | 5.1 | - | 0.51 | 2.6 | 6.325 | 63 | 0.5–0.99 | 10⁶ |
| Gamma | 0.01 | 16.125 | - | 0.16 | 2.6 | 20 | 603 | 0.5–0.99 | 10⁶ |
| Pareto | 0.1 | 1 | 0 | 1.11 | 1.54 | 2.81 | 17.83 | 0.5–0.99 | 10⁶ |
| Pareto | 0.2 | 1 | 0 | 1.25 | 2.6 | 4.65 | 73.8 | 0.5–0.99 | 10⁶ |

Table 2 Summary statistics (first and third quartiles, Q1 and Q3, mean and standard deviation, St.Dev.) of the properties of the rainfall dataset. Mean, Variance, Skewness and Kurtosis are estimated for the wet record.

| Statistic | Mean | Variance | Skewness | Kurtosis | Prob. Dry | H (daily) | H (annual) | Years | Missing % |
|---|---|---|---|---|---|---|---|---|---|
| Q1 | 3.68 | 24.85 | 2.9 | 17.28 | 0.47 | 0.56 | 0.55 | 153 | 0.75 |
| Mean | 4.98 | 64.85 | 3.39 | 24.03 | 0.55 | 0.63 | 0.67 | 169.25 | 2.62 |
| Q3 | 5.91 | 64.64 | 3.54 | 25.85 | 0.61 | 0.7 | 0.77 | 173 | 1.31 |
| St.Dev. | 2.27 | 94.15 | 0.72 | 10.94 | 0.11 | 0.09 | 0.13 | 24.66 | 5.11 |

Appendix I
Figure A1. Plots of η and ξ parameters versus the H parameter and polynomial fitting for the (a) type-Pareto with α=0.1, (b) type-Pareto with α=0.2, (c) type-gamma with α=0.1 and (d) type-gamma with α=0.01.
Figure A2. Plots of η and ξ parameters versus the H parameter and polynomial fitting for the normal distribution.
Figure A3. Plots of η and ξ parameters versus the H parameter for the type-Pareto with α=0.1 and α=0.2, type-gamma with α=0.1 and α=0.01 and the normal.
... Neglecting dependence results in underestimation of extremes. On the other hand, the procedure of extracting block maxima leads to severe distortion of the dependence structure (Iliopoulou and Koutsoyiannis, 2019;Koutsoyiannis, 2021a), whereas the concept of taking values over threshold relies on a tacit assumption of time independence, which may be inappropriate, particularly for the streamflow process . ...
... These effects are particularly important when we study maxima, neglecting the small values (below a high threshold), a practice that tends to hide the existence of long-range dependence even in long records (see Iliopoulou and Koutsoyiannis, 2019). ...
Preprint
Full-text available
This is a working draft of a book in preparation. Current version 0.4 – uploaded on ResearchGate on 25 January 2022. (Earlier versions: 0.3 – uploaded on ResearchGate on 17 January 2022. 0.2 – uploaded on ResearchGate on 3 January 2022. 0.1 (initial) – uploaded on ResearchGate on 1 January 2022.) Some stuff is copied from Koutsoyiannis (2021, https://www.researchgate.net/ publication/351081149). Comments and suggestions will be greatly appreciated and acknowledged.
... Earth and atmospheric sciences Precipitation [21,22,91,92,93,94,95,96,97,98,99,100,101,102,103] [ 103,104,105,106,107,108,109,110,111,112] Temperature [3,99,113,114,115,116,117,118,119,120,121,122,123,124] [ 125,126,127,128,129,130,131,132,133,134,135,136,137] Soil moisture [138,139,140,141,142,143,144,145,146,147,148,149,150,151] Climate processes [4,152,153,154,155,156,157,158,159] Fog events [160,161,162] Sea and reservoir level [163,164,165,166,167,168,169] Atmospheric pollution [170,171,172,173,174,175,176,177,178,179,180] Geophysics and seismology [2,25,26,181,182,183,184,185,186,187,188,189,190] Energy resources Solar [20,191,192,193,194,195,194,195,196,197,198,199,200] Wind [19,201,202,203,204,205,206] Complex Networks ...
... The study is focused on discussing the multi-fractal temporal scaling properties of precipitation and river discharge records on large timescales. In [106] the links between clustering of rainfall extremes and long-term persistence (characterized by the Hurst coefficient) is studied. In [107] a DFA and MF-DFA algorithms have been applied to analyze the long-term persistence of river runoff fluctuations, using data of 12 mayor rivers in China. ...
Preprint
Full-text available
Persistence is an important characteristic of many complex systems in nature, related to how long the system remains at a certain state before changing to a different one. The study of complex systems' persistence involves different definitions and uses different techniques, depending on whether short-term or long-term persistence is considered. In this paper we discuss the most important definitions, concepts, methods, literature and latest results on persistence in complex systems. Firstly, the most used definitions of persistence in short-term and long-term cases are presented. The most relevant methods to characterize persistence are then discussed in both cases. A complete literature review is also carried out. We also present and discuss some relevant results on persistence, and give empirical evidence of performance in different detailed case studies, for both short-term and long-term persistence. A perspective on the future of persistence concludes the work.
... An important remark is that when LRD is present and explicitly preserved, the first four moments can capture a vast range of scales of a synthetic time series (see discussion, algorithms and applications in [28,29]). Additionally, in an analysis of any range of scales or related to extremes, more moments may need to be preserved (see discussion, algorithms and applications in [8,30,42]), or even additional attributes related to streamflow [43]. ...
Article
Full-text available
The identification of the second-order dependence structure of streamflow has been one of the oldest challenges in hydrological sciences, dating back to the pioneering work of H.E Hurst on the Nile River. Since then, several large-scale studies have investigated the temporal structure of streamflow spanning from the hourly to the climatic scale, covering multiple orders of magnitude. In this study, we expanded this range to almost eight orders of magnitude by analysing small-scale streamflow time series (in the order of minutes) from ground stations and large-scale streamflow time series (in the order of hundreds of years) acquired from paleoclimatic reconstructions. We aimed to determine the fractal behaviour and the long-range dependence behaviour of the stream-flow. Additionally, we assessed the behaviour of the first four marginal moments of each time series to test whether they follow similar behaviours as suggested in other studies in the literature. The results provide evidence in identifying a common stochastic structure for the streamflow process, based on the Pareto-Burr-Feller marginal distribution and a generalized Hurst-Kolmogorov (HK) dependence structure.
... Persistence governs the mechanism of interarrival times of extreme events including floods (Villarini et al. 2009) and droughts (Pelletier and Turcotte 1997). It also influences the duration of successive peaks over threshold in streamflow (Dimitriadis and Koutsoyiannis 2020) rather than the magnitude of extreme streamflow (Iliopoulou and Koutsoyiannis 2019). ...
Article
In this study, catchments are considered as complex systems, and information-theoretic measures are used to capture temporal streamflow characteristics. Emergence and self-organization are used to quantify information production and order in streamflow time series, respectively. The measure complexity is used to quantify the balance between emergence and self-organization in streamflow variability. The complexity measure is found to be effective in distinguishing streamflow variability for high and low snow-dominated catchments. The state of persistence-reflecting the memory of streamflow time series, is shown to be related to the complexity of streamflow. Moreover, it is observed that conventional causal detection methods are constrained by the state of persistence, and more robust methods are needed in hydrological applications considering persistence.
... Particularly, a high Hurst parameter (high above 0.5) indicates a physical process with high variability that can remain generally at high levels or at low levels for extended periods of time. Analysis of the time dependence of rainfall and flood extremes may be erroneous if the LRD behavior is not taken into account in a statistical analysis, such as in a flood risk assessment or in rainfall extremes (Iliopoulou and Koutsoyiannis 2019). ...
... This approach may seem at odds with the common practice of using block maxima or a certain amount of values over threshold, yet it is sounder in terms of retained information on the process properties. Series of extremes tend to hide the persistence of the parent process (Iliopoulou and Koutsoyiannis, 2019), while discarding the body of the distribution, in favour of modelling its tails, has been criticized as wasteful usage of data (Volpi et al., 2019). Current approaches promote the use of the parent process as the natural basis for estimating design quantities in hydrological design. ...
Chapter
Full-text available
Ombrian curves, i.e. mathematic relationships linking average rainfall intensity to time scale of averaging and return period, also known as IDF (intensity-duration-frequency) curves, are essential tools in hydrology and engineering. Their use is supported by long-term hydrological experience, yet related formulas remain mostly empirical and lack a theoretical basis. As such, they entail several theoretical inconsistencies, particularly over large scales, while they cannot be applied in simulation. This Chapter reviews the typical form of ombrian curves along with its merits and limitations, and presents a modelling framework to overcome the latter by advancing curves to stochastic models of rainfall intensity. This is achieved through stochastic modelling of the joint second-order and marginal higher-order properties of the parent process. Two variants of the ombrian model are presented; a full version valid over time scales spanning multiple orders of magnitude, and a simplified relationship applicable over fine scales of the order of common applications, i.e. sub-hourly to daily. Specific emphasis is given to the fitting procedure combining multiple data sources and addressing bias in the estimation induced by temporal dependence. A detailed application of the ombrian model is performed for the rainfall station in Bologna (Italy), highlighting the efficiency of the resulting curves over multiple scales.
... Particularly, a high Hurst parameter (high above 0.5) indicates a physical process with high variability that can remain generally at high levels or at low levels for extended periods of time. Analysis of the time dependence of rainfall and flood extremes may be erroneous if the LRD behavior is not taken into account in a statistical analysis, such as in a flood risk assessment or in rainfall extremes (Iliopoulou and Koutsoyiannis 2019). ...
Article
This review provides a broad overview of the current state of flood research, current challenges, and future directions. Beginning with a discussion of flood generating mechanisms, the review synthesizes the literature on flood forecasting, multivariate and non-stationary flood frequency analysis, urban flooding, and the remote sensing of floods. Challenges and future flood research directions are outlined and highlight emerging topics where more work is needed to help mitigate flood risks. It is anticipated that the future urban systems will likely have more significant flood risk due to the compounding effects of continued climate change and land-use intensification. The timely prediction of urban floods, quantification of the socio-economic impacts of flooding, and developing mitigation strategies will continue to be challenging. There is a need to bridge the scales between model capabilities and end-user needs by integrating multiscale models, stakeholder input, and social and citizen science input for flood monitoring, mapping, and dissemination. Although much progress has been made in using remote sensing for flood applications, recent and upcoming Earth Observations provide excellent potential to unlock additional benefits for flood applications. The flood community can benefit from more downscaled, as well as ensemble scenarios that consider climate and land-use changes. Efforts are also needed for data assimilation approaches, especially, to ingest local, citizen and social media data. Also needed are enhanced capabilities to model compound hazards and assess as well as help reduce social vulnerability and impacts. The dynamic and complex interactions between climate, societal change, watershed processes, and human factors often confronted with deep uncertainty highlights the need for transdisciplinary research between science, policymakers, and stakeholders to reduce flood risk and social vulnerability.
Article
Full-text available
In flood frequency analysis (FFA), annual maximum (AM) model is widely adopted in practice due to its straightforward sampling process. However, AM model has been criticized for its limited flexibility. FFA using peaks-over-threshold (POT) model is an alternative to AM model, which offers several theoretical advantages; however, this model is currently underemployed internationally. This study aims to bridge the current knowledge gap by conducting a scoping review covering several aspects of the POT approach including model assumptions, independence criteria, threshold selection, parameter estimation, probability distribution, regionalization and stationarity. We have reviewed the previously published articles on POT model to investigate: (a) possible reasons for underemployment of the POT model in FFA; and (b) challenges in applying the POT model. It is highlighted that the POT model offers a greater flexibility compared to the AM model due to the nature of sampling process associated with the POT model. The POT is more capable of providing less biased flood estimates for frequent floods. The underemployment of POT model in FFA is mainly due to the complexity in selecting a threshold (e.g., physical threshold to satisfy independence criteria and statistical threshold for Generalized Pareto distribution – the most commonly applied distribution in POT modelling). It is also found that the uncertainty due to individual variable and combined effects of the variables are not well assessed in previous research, and there is a lack of established guideline to apply POT model in FFA.
Article
Persistence is an important characteristic of many complex systems in nature, related to how long the system remains at a certain state before changing to a different one. The study of complex systems’ persistence involves different definitions and uses different techniques, depending on whether short-term or long-term persistence is considered. In this paper we discuss the most important definitions, concepts, methods, literature and latest results on persistence in complex systems. Firstly, the most used definitions of persistence in short-term and long-term cases are presented. The most relevant methods to characterize persistence are then discussed in both cases. A complete literature review is also carried out. We also present and discuss some relevant results on persistence, and give empirical evidence of performance in different detailed case studies, for both short-term and long-term persistence. A perspective on the future of persistence concludes the work.
Article
Recent advances in the study of extreme values, namely the Metastatistical Extreme Value (MEV) framework, showed good performances for the estimation of extremes in several fields. Here we adopt MEV for flood frequency analysis and leverage its intrinsic property of allowing for the choice of the distribution which best describes ordinary peaks to improve flood estimation. To this end, we develop a non-parametric approach to select ex ante the most suitable distribution of ordinary peaks between Gamma and Log-Normal. The method relies on the tail ratio, which we define as the ratio between the empirical 99th and 95th percentile of the ordinary peaks, and is tested by using daily streamflow time series from 182 gauges in Germany. Based on the value of the tail ratio index, we choose either the Gamma or the Log-Normal distributions to represent the ordinary peaks in each gauge. The approach correctly identifies the most suitable distribution of ordinary peaks in a large majority of the analyzed basins, and is robust to changes of the considered dataset. The preliminary selection of the ordinary distribution based on the tail ratio index improves the estimation of frequent and rare floods with respect to MEV applied with a single distribution not tailored on the specific statistical properties of the ordinary peaks. Finally, by comparing the developed methodology with the standard Generalized Extreme Value (GEV) distribution, we show that we are able to reduce the estimation uncertainty of high flood quantiles.
Article
Full-text available
The concept of return period and its estimation are pivotal in risk management for many geophysical applications. Return period is usually estimated by inferring a probability distribution from an observed series of the random process of interest and then applying the classical equation, i.e. the inverse of the exceedance probability. Traditionally, we form a statistical sample by selecting, from the “complete” time series (e.g. at the daily scale), those values that can reasonably be considered as realizations of independent extremes, e.g. annual maxima or peaks over a certain high threshold. Such a selection procedure entails that a large number of observations are discarded; this wastage of information could have important consequences in practical problems, where the reduction of the already small size of common hydrological records significantly affects the reliability of the estimates. Under such circumstances, it is crucial to exploit all the available information. To this end, we investigate the advantages of estimating the return period without any data decimation, by using the full data-set. The proposed procedure, denoted as Complete Time-series Analysis (CTA), exploits the property that the average interarrival time (i.e. return period) of potentially damaging events is not affected by the dependence structure of the underlying process, even for cyclo-stationary (e.g. seasonal) processes. For the sake of illustration, the CTA is compared to that based on annual maxima selection, through a simple non-parametric approach, discussing advantages and limitations of the method. Results suggest that the proposed CTA approach provides a more conservative return period estimation in an holistic implementation framework within a broader range of return period values than that pertaining to other methods, which means not only the largest extremes that are the focus of extreme value theory.
Article
Full-text available
Classical moments, raw or central, express important theoretical properties of probability distributions but can hardly be estimated from typical hydrological samples for orders beyond two. L-moments are better estimated, but they all are of first order in terms of the process of interest; while they are effective in inferring the marginal distribution of stochastic processes, they cannot characterize even second-order dependence of processes (autocovariance, climacogram, power spectrum) and thus they cannot help in stochastic modelling. Picking from both categories, we introduce knowable (K-) moments, which combine advantages of both classical and L-moments, and enable reliable estimation from samples and effective description of high-order statistics, useful for marginal and joint distributions of stochastic processes. Further, we extend recent stochastic tools by introducing the K-climacogram and the K-climacospectrum, which enable characterization, in terms of univariate functions, of high-order properties of stochastic processes, as well as preservation thereof in simulations.
Article
Full-text available
A recent period of increased precipitation over the Argentinian Pampas expanded the boundary of rain-fed agriculture. However, such changes may not be sustainable if they arose from transient climate regime shifts. Considerable research exists on trends and cycles in sub-daily to annual precipitation metrics including the frequency and intensity of extreme precipitation. However, efforts to identify wetter and drier phases (or regimes) in this region are scant. This article aims to bridge that gap and advance our understanding of the multi-annual behavior of regional precipitation extremes, which can have the greatest impacts. It is unlikely that all extreme events are drawn from a single probability distribution or generated by the same physical processes. Hence, hidden mixtures of Poisson distributions are fitted to several precipitation frequency metrics to explore whether the annual to decadal variations in extreme precipitation frequency are greater than anticipated from a single system, and representative of regime shifts. Statistically significant improvements in the fit over single distributions were found for statistical mixture models of the frequency of very wet days, and the frequency of wet spells. This supports the hypothesis that multiple weather regimes exist giving rise to wetter or drier epochs. Posterior probabilities of hidden states from the fitted mixture distributions were used to identify wetter and drier years for comparison with sea surface temperature anomalies. This confirmed the presence of two distinct regimes, supporting other research, into the dynamical influences of precipitation behavior in the Argentine Pampas.
Article
Full-text available
A comprehensive understanding of seasonality in extreme rainfall is essential for climate studies, flood prediction and various hydrological applications such as scheduling season‐specific engineering works, intra‐annual management of reservoirs, seasonal flood risk mitigation and stormwater management. To identify seasonality in extreme rainfall and quantify its impact in a theoretically consistent yet practically appealing manner, we investigate a dataset of 27 daily rainfall records spanning at least 150 years. We aim to objectively identify periods within the year with distinct seasonal properties of extreme rainfall by employing the Akaike Information Criterion (AIC). Optimal partitioning of seasons is identified by minimizing the within‐season variability of extremes. The statistics of annual and seasonal extremes are evaluated by fitting a generalized extreme value (GEV) distribution to the annual and seasonal block maxima series. The results indicate that seasonal properties of rainfall extremes mainly affect the average values of seasonal maxima and their variability, while the shape of their probability distribution and its tail do not substantially vary from season to season. Uncertainty in the estimation of the GEV parameters is quantified by employing three different estimation methods (Maximum Likelihood, Method of Moments and Least Squares) and the opportunity for joint parameter estimation of seasonal and annual probability distributions of extremes is discussed. The effectiveness of the proposed scheme for seasonal characterization and modeling is highlighted when contrasted to results obtained from the conventional approach of using fixed climatological seasons.
Article
Full-text available
An extension of the symmetric-moving-average (SMA) scheme is presented for stochastic synthesis of a stationary process for approximating any dependence structure and marginal distribution. The extended SMA model can exactly preserve an arbitrary second-order structure as well as the high order moments of a process, thus enabling a better approximation of any type of dependence (through the second-order statistics) and marginal distribution function (through statistical moments), respectively. Interestingly, by explicitly preserving the coefficient of kurtosis, it can also simulate certain aspects of intermittency, often characterizing the geophysical processes. Several applications with alternative hypothetical marginal distributions, as well as with real world processes, such as precipitation, wind speed and grid-turbulence, highlight the scheme’s wide range of applicability in stochastic generation and Monte-Carlo analysis. Particular emphasis is given on turbulence, in an attempt to simulate in a simple way several of its characteristics regarded as puzzles.
Thesis
The high complexity and uncertainty of atmospheric dynamics have long been identified through the observation and analysis of hydroclimatic processes such as temperature, dew point, humidity, atmospheric wind, precipitation, atmospheric pressure, and river discharge and stage. In particular, all these processes seem to exhibit high unpredictability due to the clustering of events, a behaviour first identified in Nature by H. E. Hurst in 1951 while working on the River Nile, although its mathematical description is attributed to A. N. Kolmogorov, who developed it while studying turbulence in 1940. To credit both scientists, this behaviour is called Hurst-Kolmogorov (HK) dynamics. To properly study the clustering of events, as well as the stochastic behaviour of hydroclimatic processes in general, we would require numerous measurements at the annual scale. Unfortunately, long, high-quality annual records are rarely available for hydroclimatic processes. However, the microscopic processes that drive and generate the hydroclimatic ones are governed by turbulence. By studying turbulent phenomena in situ, we may be able to understand certain aspects of the related macroscopic processes in the field. Strong advantages of studying microscopic turbulent processes in situ are the recording of very long time series, the high resolution of the records and the controlled environment of the laboratory. The analysis of these time series offers the opportunity for better comprehension, control and comparison of the two scientific methods, the deterministic and the stochastic approach. In this thesis, we explore and further advance the second-order stochastic framework for the empirical as well as theoretical estimation of the marginal characteristics and dependence structure of a process (from small to extreme behaviour in time and state). 
We also develop and apply explicit and implicit algorithms for the stochastic synthesis of mathematical processes as well as the stochastic prediction of physical processes. Moreover, we analyze several turbulent processes and estimate the Hurst parameter (H >> 0.5 in all cases) and the drop of variance with scale, based on experiments on turbulent jets conducted in the laboratory. Additionally, we propose a stochastic model for the behaviour of a process from the micro to the macro scale that results from the maximization of entropy for both the marginal distribution and the dependence structure. Finally, we apply this model to microscale turbulent processes, as well as to hydroclimatic ones extracted from thousands of stations around the globe and comprising vast amounts of data. The most important innovation of this thesis is that, to the Author’s knowledge, a unique framework (based on a common expression for both the marginal density function and the second-order dependence structure) is presented that can include the simulation of the discretization effect, the statistical bias, certain aspects of the turbulent intermittent (or else fractal) behaviour (at the microscale of the dependence structure) and the long-term behaviour (at the macroscale of the dependence structure), and the extreme events (at the left and right tails of the marginal distribution), together with applications to 13 turbulent and hydroclimatic processes including experimentation and global analyses of surface stations (overall, several billions of observations). In summary, the major innovations of the thesis are: (a) the further development, and extensive application to numerous processes, of the classical second-order stochastic framework, including innovative approaches to account for intermittency, discretization effects and statistical bias; (b) the further development of stochastic generation schemes such as the Sum of Autoregressive (SAR) models, e.g. 
AR(1) or ARMA(1,1), the Symmetric-Moving-Average (SMA) scheme in many dimensions (which can generate any second-order dependence structure of a process, approximate any marginal distribution to the desired level of accuracy and simulate certain aspects of intermittent behaviour) and explicit and implicit (pseudo) cyclo-stationary schemes (pCSAR and pCSMA) for simulating the deterministic periodicities of a process, such as seasonal and diurnal ones; and (c) the introduction and application of an extended stochastic model (with an innovative identical expression for a four-parameter marginal density function and correlation structure, i.e. g(x;C) = λ/[(1+|x/a+b|^c)]^d, with C = [λ,a,b,c,d]) that encloses a large variety of distributions (ranging from Gaussian to powered-exponential and Pareto) as well as dependence structures (such as white noise, Markov and HK), and is in agreement (in this form or through more simplified versions) with an interestingly large variety of turbulent processes (such as horizontal and vertical positively buoyant thermal jets observed using laser-induced-fluorescence techniques, as well as grid turbulence generated within a wind tunnel), geostatistical processes (such as 2D rock formations), and hydroclimatic processes (such as temperature, atmospheric wind, dew point and thus humidity, precipitation, atmospheric pressure, river discharge and solar radiation at the global scale, as well as a very long time series of river stage, and wave height and period). Remarkably, all examined physical processes (13 overall) exhibited long-range dependence and, in particular, most of them (if treated properly within a robust physical and statistical framework, e.g. 
by adjusting the process for sampling errors as well as discretization and bias effects) exhibited a mean long-term persistence parameter equal to H ≈ 5/6 (as in the case of isotropic grid turbulence) and, for the processes examined at the microscale (such as atmospheric wind, surface temperature and dew point at the global scale, a long-duration discharge time series, and a storm event in terms of precipitation and wind), a powered-exponential behaviour with a fractal parameter close to M ≈ 1/3 (as in the case of isotropic grid turbulence).
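The "drop of variance with scale" mentioned above can be turned into a simple Hurst parameter estimate: for an HK process the variance of the scale-averaged process at scale k behaves as k^(2H-2), so the slope of log variance against log scale gives H. The sketch below is a minimal version under stated assumptions: it uses plain least squares on a handful of hypothetical scales and applies none of the adjustments for bias, discretization or sampling error that the thesis describes, so it would tend to underestimate H for strongly persistent series. It is checked here on synthetic white noise, for which H should be near 0.5.

```python
import math
import random


def sample_variance(values):
    """Unbiased (for independent data) sample variance."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def climacogram(series, scales):
    """Variance of the series averaged over non-overlapping blocks
    of length k, for each k in scales."""
    result = []
    for k in scales:
        blocks = len(series) // k
        averaged = [sum(series[b * k:(b + 1) * k]) / k for b in range(blocks)]
        result.append(sample_variance(averaged))
    return result


def hurst_from_climacogram(series, scales):
    """Least-squares slope of log variance vs log scale; H = 1 + slope / 2."""
    xs = [math.log(k) for k in scales]
    ys = [math.log(v) for v in climacogram(series, scales)]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return 1.0 + slope / 2.0


rng = random.Random(42)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(8192)]
h_white = hurst_from_climacogram(white_noise, [1, 2, 4, 8, 16, 32])
```

The maximum scale is kept well below the series length so that each variance is estimated from a few hundred blocks; that choice is a judgment call rather than a rule from the source.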
Article
Long-range dependence (LRD) is considered an inherent property of geophysical processes, whose presence increases uncertainty. Here we examine the spatial behaviour of LRD in precipitation by regressing the Hurst parameter estimates of instrumental mean annual precipitation records, which span 1916-2015 and cover a large part of the Earth's surface, on the location characteristics of the measuring stations. Furthermore, we apply the Mann-Kendall test under the LRD assumption (MKt-LRD) to reassess the significance of observed trends. In summary, LRD is spatially clustered and seems to depend mostly on the location of the stations, while the predictive value of the regression model is good. Thus, when investigating LRD properties, we recommend that local characteristics be considered. The application of the MKt-LRD suggests that no significant monotonic trend appears in global precipitation, except in climate type D (snow) regions, where significant positive trends appear. Supplementary information files are hosted at: https://doi.org/10.6084/m9.figshare.4892447.v1
Article
We present an R function for testing the significance of trends in time series. The function calculates trend significance using a modified Mann-Kendall test, which takes into account the well-known physical behavior of Hurst-Kolmogorov dynamics. The function is tested at 10 stations in Greece with approximately 50 years of PET data, using a recent parametric approach. A significant downward trend was detected at two stations. The R software is now suitable for extensive use in several fields of the scientific community, allowing physically consistent trend analysis.
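For orientation, the classical Mann-Kendall test on which such a function builds can be sketched as follows. This is not the R function described above: it assumes independent data without ties and omits the Hurst-Kolmogorov adjustment of the variance of the test statistic, so for persistent series it would be expected to overstate significance. The function name is hypothetical and Python is used purely for illustration.

```python
import math


def mann_kendall(series):
    """Classical Mann-Kendall trend test (no ties, independent data).

    Returns (S, Z, two-sided p-value)."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # 2 * (1 - Phi(|z|))
    return s, z, p_value


s_up, z_up, p_up = mann_kendall(list(range(10)))
s_down, z_down, p_down = mann_kendall(list(range(10, 0, -1)))
```

A modified test of the kind described would replace `var_s` with a variance that accounts for the dependence structure, which generally widens it and reduces the number of trends declared significant.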
Book
Chapter 1: The utility of probability – Chapter 2: Basic concepts of probability – Chapter 3: Elementary statistical concepts – Chapter 4: Special concepts of probability theory in geophysical applications – Chapter 5: Typical univariate statistical analysis in geophysical processes – Chapter 6: Typical distribution functions in geophysics, hydrology and water resources – Appendix (Statistical tables)
Article
Record-breaking (RB) events are the highest or lowest values assumed by a given variable, such as temperature and precipitation, since the beginning of the observation period. Research in hydroclimatic fluctuations and their link with this kind of extreme events recently renewed the interest in RB events. However, empirical analyses of RB events usually rely on statistical techniques based on too restrictive hypotheses such as independent and identically distributed (i/id) random variables or nongeneral numerical methods. In this study, we propose some exact distributions along with accurate approximations describing the occurrence probability of RB and peak-over-threshold (POT) events under general spatiotemporal dependence, which enable analyses based on more appropriate assumptions. We show that (i) the Poisson binomial distribution is the exact distribution of the number of RB events under i/id, (ii) equivalent binomial distributions are accurate approximations under i/id, (iii) beta-binomial distributions provide the exact distribution of POT occurrences under spatiotemporal dependence, and (iv) equivalent beta-binomial distributions provide accurate approximations for the distribution of RB occurrences under spatiotemporal dependence. To perform numerical validations, we also introduce a generator of spatially and temporally correlated binary processes, called BetaBitST. As examples of application, we study RB and POT occurrences for monthly precipitation and temperature over the conterminous United States and reanalyze Mauna Loa daily temperature data. Results show that accounting for spatiotemporal dependence yields strikingly different conclusions, making the observed frequencies of RB and POT events much less surprising than expected and calling into question previous results reported in the literature.
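Result (i) can be illustrated numerically. For independent and identically distributed continuous observations, the k-th observation is an upper record with probability 1/k, independently of the others, so the number of records in n observations follows a Poisson binomial distribution with success probabilities 1, 1/2, ..., 1/n. The sketch below builds that distribution by simple convolution; it covers only the independent case, not the dependent-case (beta-binomial) results, and the function name is hypothetical.

```python
def record_count_distribution(n):
    """P(number of upper records = r), r = 0..n, for n iid continuous
    observations, as a Poisson binomial with p_k = 1/k."""
    dist = [1.0]  # before any observation: zero records with certainty
    for k in range(1, n + 1):
        p = 1.0 / k  # probability that observation k is a record
        new = [0.0] * (len(dist) + 1)
        for r, prob in enumerate(dist):
            new[r] += prob * (1.0 - p)
            new[r + 1] += prob * p
        dist = new
    return dist


dist10 = record_count_distribution(10)
expected_records = sum(r * p for r, p in enumerate(dist10))
harmonic_10 = sum(1.0 / k for k in range(1, 11))
```

The mean of the distribution equals the harmonic number 1 + 1/2 + ... + 1/n, which is why records become rare so quickly in an independent series and why observed record counts under dependence can differ markedly from this baseline.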