All content in this area was uploaded by Theano (Any) Iliopoulou on Sep 09, 2019
Revealing hidden persistence in maximum rainfall records
Theano Iliopoulou¹* and Demetris Koutsoyiannis¹
¹ Department of Water Resources, Faculty of Civil Engineering, National Technical University of Athens, Heroon Polytechneiou 5, GR-157 80 Zografou, Greece
* Corresponding author. Tel.: +30 6978580613
Email address: tiliopoulou@hydro.ntua.gr
Abstract
Clustering of extremes is critical for hydrological design and risk management and challenges the popular assumption of independence of extremes. We investigate the links between clustering of extremes and long-term persistence, also known as Hurst-Kolmogorov (HK) dynamics, in the parent process, exploring the possibility of inferring the latter from the former. We find that (a) identifiability of persistence from maxima depends foremost on the choice of the threshold for extremes and on the skewness and kurtosis of the parent process, and less on sample size; and (b) existing indices for inferring dependence from series of extremes are biased downward when applied to non-Gaussian processes. We devise a probabilistic index based on the probability of occurrence of peak-over-threshold events across multiple scales, which can reveal clustering and link it to the persistence of the parent process. Its application shows that rainfall extremes may exhibit noteworthy departures from independence and consistency with an HK model.
Keywords: extremes, clustering, HK dynamics, persistence, peaks over threshold, rainfall
1. Introduction
The identification of clusters in series of extreme events is an ongoing research topic in geosciences, including hydrology, and one that is particularly challenging because of the large estimation uncertainties involved in studying series of rare events. Regardless of the complications, the question has important implications for earth sciences, ranging from understanding natural variability and process dynamics to correctly applying stochastic models for inference and prediction. Most relevant hydrological and engineering applications must settle the issue at an early stage of the analysis, by either assuming independence (e.g. Coles et al., 2001; Kottegoda and Rosso, 2008) or ‘ensuring’ it through ‘adequate’ sampling techniques (Ferro and Segers, 2003). The research focus can then be placed solely on the more straightforward task of characterizing the probability distribution of extremes. For example, typical flood guidelines suggest that successive flood events be separated by at least a certain lag time in order to be considered independent for the application of models (Lang et al., 1999). In light of concerns about intensification of hydrological extremes due to anthropogenic forcing, the investigation of clustering receives additional interest (Ntegeka and Willems, 2008; Tye et al., 2018; Merz et al., 2016; Serinaldi and Kilsby, 2018), as attributing trends to an external deterministic forcing presupposes that natural inherent variability has first been properly accounted for. In this respect, increasing evidence of persistence in various hydroclimatic variables (Hurst, 1951; Koutsoyiannis, 2003; Montanari, 2003; Markonis and Koutsoyiannis, 2016; O’Connell et al., 2016; Iliopoulou et al., 2016; Tegos et al., 2017; Dimitriadis, 2017) raises the question of whether, and to what extent, a regular behaviour of extremes originating from persistent processes could be misinterpreted as the result of an anthropogenic cause.
This study investigates clustering behaviour in records of maxima, with a special focus on long-term daily rainfall observational records. As recent studies have reported evidence of persistence in annual rainfall (Iliopoulou et al., 2016; Tyralis et al., 2018), the question of whether persistence propagates to rainfall extremes naturally arises. The research objectives can therefore be articulated as follows: (a) what are the links between persistence in the parent process and clustering of extreme events, and can we infer the one from the other? and (b) what constitutes an informative characterization of clustering?
Typically, assessing the clustering properties of extremes from a timeseries requires selecting a threshold on which the sampling of ‘extreme’ events is based. Clustering is then quantified by the departure of the properties of the extremes from those of a purely random process. The evaluation considers either the series of inter-arrival times of extremes or, equivalently, the series of counts of extreme events over counting windows. There is a direct correspondence between the two; it is well known, for example, that when events occur according to a Poisson process, their inter-arrival times are exponentially distributed (Papoulis, 1991).
In the hydrological literature, various ad hoc, sometimes visual and subjective, approaches are used to quantify departures of extremes (typically floods) from independence and to characterize clustering. The most systematic usually consist of some type of ‘window’ analysis, in which the timeseries is split into sub-periods that are examined for perturbations in the statistics of extreme events, often corroborated by statistical testing (Marani and Zanetti, 2015; Ntegeka and Willems, 2008; Willems, 2013). Avoiding the need to select time windows, Merz et al. (2016) applied a dispersion index, although they mostly focused on kernel-based methods coupled with statistical significance tests to identify flood-rich and flood-poor periods in Germany. Yet, with only a few exceptions (Eichner et al., 2011; Serinaldi and Kilsby, 2016, 2018), most clustering characterizations for hydrological extremes are not studied in relation to the dependence properties of the parent process, which is the focal point here.
To evaluate clustering properties in a more comprehensive framework, two established indices are used in geophysical timeseries analysis, especially for the clustering analysis of earthquakes (Telesca et al., 2002) and storms (Vitolo et al., 2009). Both are based on the ‘counts’ approach: the index of dispersion and the Allan factor. Both can be used to formally test the data against the Poissonian assumption (Serinaldi, 2013; Serinaldi and Kilsby, 2013), and it is reported that their scaling behaviour can also reveal the fractal properties of the underlying process for ideal rate fractal processes (Thurner et al., 1997). The latter is related to the asymptotic dependence property for large time horizons, i.e. long-term dependence, quantified by the Hurst parameter. For revealing HK dynamics, a number of methods examining the original series also exist, with the climacogram (Koutsoyiannis, 2010), i.e. the variance of the aggregated process over scales, shown to be the most robust (Dimitriadis and Koutsoyiannis, 2015).
We briefly review the above methods based on their performance in revealing the clustering of extremes sampled from synthetic timeseries generated to exhibit various degrees of persistence and different marginal distributions. We assess their degree of generality and showcase their shortcomings when extremes arise from complex processes. We show how the interplay of persistence and moments of order higher than two (skewness, kurtosis) can obscure the identification of persistence from extremes. Accordingly, we propose an alternative characterization of clustering based on a probabilistic index with distinctive features, and we test the proposed method on synthetic and real-world rainfall data. We find that the index has some advantageous characteristics: it quantifies clustering by probabilistic means and links it to the scaling behaviour of the parent process for a range of distributional and dependence properties. It also enables modelling the probabilities of threshold exceedances across multiple timescales and can therefore be used as a simulation tool, an important advance over existing methods, which are mainly inferential.
2. Dataset
An extended dataset comprising the 60 longest available daily rainfall records is investigated in terms of the properties of its extremes. The data are collected from global datasets, i.e. the Global Historical Climatology Network Daily database (Menne et al., 2012) and the European Climate Assessment and Dataset (Klein Tank et al., 2002), and from third parties listed in the Acknowledgments section. They update the previous dataset of long rainfall records surpassing 150 years of daily values explored in Iliopoulou et al. (2018). The geographic locations of the rain gauges are shown in Figure 1. The length of the timeseries enables the investigation of clustering over extended time horizons, from daily to yearly timescales.
3. Methodological framework
3.1 Definition of notation and mathematical formulation
Let $x_i$ be a stationary stochastic process in discrete time $i$, i.e. a collection of random variables $x_i$, and $x := \{x_1, \ldots, x_n\}$ a single realization (observation) of it, i.e. a timeseries. For a threshold $u \in \mathbb{R}$, we define the process of peaks over threshold (POT), consisting of the events surpassing the threshold $u$, i.e.
$$y_i := \{x_i \mid x_i > u\} \qquad (1)$$
Let also $N(t)$ be a counting process of POT occurrences in time, which is a non-decreasing function of time $t$. We then define the process $z_q^{(k)} := N(qk) - N((q-1)k)$ as the number of POT occurrences at timescale $k$ and at discrete time $q = 1, \ldots, n/k$.
We also define by $m_q^{(k)} := \max_{(q-1)k < j \le qk} \{x_j\}$ the block maxima series, formed by extracting the maximum order statistic of the observations divided into non-overlapping, equally sized periods of length (timescale) $k$. In the following, we refer to the timescale $k$ as the timescale of filtering of the maxima. Figure 2 visualizes all the above at two temporal scales for a realization of a random process with Hurst parameter H = 0.8 and the first four moments following a generalized Pareto distribution.
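The counts of POT events and the block maxima defined in this section can be illustrated with a short Python sketch. It assumes a plain list of observations, zero-based non-overlapping blocks of length k, and that an incomplete trailing block is simply dropped; the function names `pot_counts` and `block_maxima` are illustrative and are not part of the study.

```python
def pot_counts(x, u, k):
    """Counts of values exceeding threshold u in non-overlapping blocks of length k."""
    n_blocks = len(x) // k  # an incomplete trailing block is dropped (assumption)
    return [sum(1 for v in x[q * k:(q + 1) * k] if v > u) for q in range(n_blocks)]


def block_maxima(x, k):
    """Maximum of each non-overlapping block of length k."""
    n_blocks = len(x) // k
    return [max(x[q * k:(q + 1) * k]) for q in range(n_blocks)]


x = [1, 5, 2, 7, 3, 9, 4, 0]          # toy series, not rainfall data
counts_k2 = pot_counts(x, u=4, k=2)   # counts per block of two values
maxima_k2 = block_maxima(x, k=2)      # block maxima at the same scale
```

Because the blocks are nested, the total number of exceedances is the same at every scale that divides the series length; only its distribution among the blocks changes, which is what the clustering indices examine.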
3.2 Generation of benchmark synthetic timeseries
To evaluate the ability of clustering indices to discern the dependence characteristics of the parent (extreme-generating) process, we first produce a set of synthetic timeseries $x_i$ with different dependence properties and marginal distributions. For the generation scheme, we employ a simulation procedure proposed by Dimitriadis and Koutsoyiannis (2018), which is capable of generating timeseries that explicitly reproduce chosen theoretical moments up to any order together with any (long-term) persistence structure, i.e. the HK dynamics. We focus here only on processes exhibiting persistence, as these are assumed consistent with the natural phenomena studied and are also known to produce long-term clustering. For the marginal distribution, we generate timeseries preserving moments up to the fourth order of the normal, generalized Pareto and gamma distributions. The higher-order moments of the generated timeseries follow the entropic distribution. Because the generation scheme preserves only a specific number of moments of a distribution, the final shape may be slightly distorted with respect to the theoretical one; we therefore denote the generated series as type-gamma and type-Pareto, instead of gamma and generalized Pareto, respectively. For a detailed explanation of the generation scheme, the reader is referred to Dimitriadis and Koutsoyiannis (2018). We focus only on the first four moments, as higher-order classical moments cannot be reliably estimated from ordinary sample sizes (Lombardo et al., 2014).
The properties of these timeseries are chosen to cover a range of statistical and stochastic characteristics in terms of skewness, kurtosis and H parameter, and therefore provide a good benchmark sample for testing the indices in typical but also more ‘extreme’ cases. Their properties are summarized in Table 1. We note that these timeseries are meant as theoretical case studies to test the appropriateness of the indices and are not to be regarded as synthetic series of daily rainfall, which are the real-world data in question. However, since only the sequence of counts of extremes is of interest, and not their actual values, it is not necessary to strictly preserve other properties of daily rainfall, such as intermittency, and in this sense comparison with the synthetic series is permissible. A sample of the timeseries is plotted in Figure 3.
In addition to the above benchmark timeseries, we generate ensembles of shorter timeseries with lengths of 150 × 365 values, i.e. equal to the minimum record length of the rainfall data, preserving the same moments as the benchmark timeseries. These series are produced using fewer weights for the symmetric moving average (SMA) scheme, up to 2000, but applying a proper weight adjustment scheme (Koutsoyiannis, 2016). They reproduce two dependence structures: white noise, and HK with H parameter 0.7, considered a representative value for hydrological processes. The purpose of the second benchmark sample is to test the methods at ‘realistic’ record lengths and to evaluate estimation uncertainty by Monte Carlo simulations, which require significantly less computational effort than the first benchmark sample, which is generated using $10^6$ weights, i.e. equal to the series length.
3.3 Second-order characterization of extremes
The Hurst parameter is a well-established measure of persistence. It can be estimated from the slope of the double logarithmic plot of the standard deviation of the averaged process versus the averaging timescale, i.e. the climacogram (Koutsoyiannis, 2010). To test how the estimator is affected when extremes are used instead of the original values, we compute the H parameter for extremes extracted from windows (scales) of length 1 to N/10, where N is the timeseries length. The first H value (scale = 1) is the value for the original data (the parent timeseries); as the scale increases, the timeseries is progressively filtered to retain only the most ‘extreme’ data. For instance, if the basic timescale is daily, the estimated H parameter at timescale k = 365 corresponds to the H parameter of the annual maxima. To reduce computational time, we perform the estimation every 50 scales. The results for the normal and the other benchmark timeseries are shown in Figure 4. The impact of skewness and kurtosis on the estimator is striking: for the non-Gaussian timeseries, the H parameter quickly decays to 0.5, as if there were independence. On the contrary, for the normal timeseries it remains almost stable. To verify that this is not due to the bias in standard deviation induced by dependence, we also performed the estimation for selected timescales with the LSSV estimator, which is unbiased with respect to standard deviation (Koutsoyiannis, 2003; Tyralis and Koutsoyiannis, 2011). We also repeat the estimation for the shorter timeseries and plot the average values at each scale obtained from the Monte Carlo experiments. The same conclusion can be drawn: the climacogram estimator is severely biased downward for extremes originating from non-Gaussian processes and falsely indicates independence after a few scales of filtering. Therefore, we do not consider the climacogram estimator in the rest of the analysis on empirical data. Since it has been shown that the climacogram is closely related to other second-order characterizations, i.e. the spectrum and the autocovariance (Dimitriadis and Koutsoyiannis, 2015), we expect similar results from the latter. Furthermore, Barunik and Kristoufek (2010) have shown that even for the underlying (parent) process, the sampling properties of Hurst parameter estimation by some other approaches, i.e. multifractal detrended fluctuation analysis and the detrending moving average, are also greatly affected by heavy tails.
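As a rough illustration of the climacogram-based estimate described in this section, the following Python sketch fits the log-log slope of the standard deviation of the averaged process against the averaging scale and returns H = 1 + slope, which follows from the HK scaling of the standard deviation as k^(H − 1). It is a minimal sketch of the simple slope estimator, not the LSSV estimator, and the function names, the chosen scales and the white-noise test series are assumptions made for the example. To examine extremes, one would pass a block maxima series instead of the parent series.

```python
import math
import random
import statistics


def loglog_slope(xs, ys):
    """Least-squares slope of ln(ys) against ln(xs)."""
    lx = [math.log(v) for v in xs]
    ly = [math.log(v) for v in ys]
    mx, my = statistics.fmean(lx), statistics.fmean(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx


def climacogram_hurst(x, scales):
    """Estimate H from the slope (H - 1) of the std of the averaged process vs scale."""
    stds = []
    for k in scales:
        n_blocks = len(x) // k
        means = [statistics.fmean(x[q * k:(q + 1) * k]) for q in range(n_blocks)]
        stds.append(statistics.pstdev(means))
    return 1.0 + loglog_slope(scales, stds)


random.seed(1)
white_noise = [random.gauss(0.0, 1.0) for _ in range(20000)]  # independent values
h_white = climacogram_hurst(white_noise, scales=[1, 2, 5, 10, 20, 50])
```

For independent data the estimate should lie close to 0.5, which is the value the estimator also drifts towards for heavily filtered non-Gaussian maxima according to the results above.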
3.4 Clustering indices: the dispersion index
A well-known measure of clustering of events is the index of dispersion of counts, also known as the Fano factor (e.g. Thurner et al., 1997), defined as the ratio of the variance of the counts of events to their mean at a specific timescale k, i.e.
$$d^{(k)} := \frac{\mathrm{var}[z_q^{(k)}]}{\mathrm{E}[z_q^{(k)}]} \qquad (2)$$
For a Poisson point process, the dispersion index is unity for all timescales. According to the literature (Thurner et al., 1997), the dispersion index exhibits power-law scaling behaviour, which is linked to the underlying persistence structure. Although the exact form of the equation provided there could not be theoretically validated per se at small scales, we have confirmed the power-law scaling at large scales, which, by revising the original equation (Thurner et al., 1997), can be expressed as
$$d^{(k)} = c\,k^{2H-1}, \quad k \ge k_0 \qquad (3)$$
where c is a real parameter and $k_0$ denotes the scaling onset timescale (the minimum timescale for which the above scaling law applies). It follows that the exponent 2H − 1 can be obtained as the slope of the double logarithmic plot of the dispersion index versus the timescale for $k \ge k_0$, and therefore the Hurst parameter H, ranging in the [0, 1] interval, can be estimated accordingly. An example is provided in Figure 5.
We test the dispersion index against samples of Gaussian and non-Gaussian timeseries exhibiting HK dynamics. Namely, we use (a) the two long benchmark series (N = $10^6$), the normal and the type-Pareto, both with H = 0.8, and (b) the ensemble of simulations of shorter length (equal to daily values for 150 years) for three different distributions, normal, type-gamma with shape parameter α = 0.01 and type-Pareto with α = 0.2, all with H = 0.7. For the second sample, we provide the average value of the dispersion index at each scale, estimated from the $10^3$ Monte Carlo simulations. Results are shown in Figure 5.
First, it is worth noting that the onset scale, from which scaling arises, appears to be smaller for the long than for the shorter timeseries. The related H parameters are estimated from Eq. 3 for onset scale $k_0$ = 500 in both cases, in order to ensure a more robust estimate (yet the fitted lines are extrapolated backwards to scale 365). The index yields a satisfactory approximation of H only for the normal distribution and the long benchmark series (estimated H = 0.77, theoretical H = 0.8), whereas results are biased downward for the non-Gaussian one (estimated H = 0.67, theoretical H = 0.8). For the shorter record length, the bias increases severely, as the index yields H parameters falsely denoting independence (average H = 0.54). There is also considerable ambiguity regarding the selection of the onset time, a task that requires visual examination and subjective judgement. For these reasons, and chiefly because of the observed underestimation of persistence for common record lengths, we do not consider the index in the rest of the analysis. A more sophisticated use of the dispersion index, as well as bias correction methods, may be possible but remains outside the scope of the paper. For more information on a related index, the Allan factor, and its properties for testing independence, the reader is referred to Serinaldi and Kilsby (2013).
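The dispersion index and the associated slope-based estimate of H can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the input is a 0/1 indicator series of threshold exceedances, the variance is the population variance of the block counts, and the scales passed to the fit are assumed to lie above the onset scale; the names `dispersion_index` and `hurst_from_dispersion` are hypothetical. Note that for independent exceedances in discrete time the block counts are binomial, so the index equals 1 − p rather than exactly 1, where p is the exceedance probability.

```python
import math
import random
import statistics


def dispersion_index(exceed, k):
    """Variance-to-mean ratio of exceedance counts in non-overlapping blocks of length k."""
    n_blocks = len(exceed) // k
    counts = [sum(exceed[q * k:(q + 1) * k]) for q in range(n_blocks)]
    return statistics.pvariance(counts) / statistics.fmean(counts)


def hurst_from_dispersion(exceed, scales):
    """H from the log-log slope (2H - 1) of the dispersion index over the given scales."""
    lx = [math.log(k) for k in scales]
    ly = [math.log(dispersion_index(exceed, k)) for k in scales]
    mx, my = statistics.fmean(lx), statistics.fmean(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return (sxy / sxx + 1.0) / 2.0


random.seed(7)
p_exceed = 0.05  # exceedance probability of the threshold (illustrative value)
iid_exceed = [1 if random.random() < p_exceed else 0 for _ in range(200000)]
d20 = dispersion_index(iid_exceed, 20)
h_iid = hurst_from_dispersion(iid_exceed, scales=[10, 20, 50, 100])
```

For the independent series the index stays flat across scales, so the fitted slope is near zero and the implied H is near 0.5.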
3.5 A new probabilistic index to identify multiscale clustering behaviour
The above review highlights the complexity involved in identifying clustering of extremes and the need for an informative and objective characterization able to reveal persistence even for non-Gaussian series, which are usually the ones of interest in geophysical studies. To address this, we formulate a straightforward and assumption-free representation of clustering by estimating the probability of occurrence of extreme events across multiple scales. The proposed probabilistic index is defined as follows.
We set a threshold on the original timeseries and select the data surpassing the threshold as extreme events, thus forming the peaks-over-threshold series, $y_i$. Accordingly, we form the series of counts of the POT events for each scale, $z^{(k)}$, as explained in Section 3.1 (see also Fig. 2). We additionally define the binary process $b_q^{(k)}$ to denote the event of exceedance of the threshold at each time interval q of size k, q = 1, …, n/k:
$$b_q^{(k)} := \begin{cases} 1, & z_q^{(k)} > 0 \\ 0, & z_q^{(k)} = 0 \end{cases} \qquad (4)$$
Then, the probability of exceedance of the threshold at timescale k is obtained as the frequency of exceedances estimated from all intervals:
$$p_e^{(k)} := \frac{k}{n} \sum_{q=1}^{n/k} b_q^{(k)} \qquad (5)$$
The latter is the exceedance probability (of the threshold) versus scale (EPvS), and its complement, $p^{(k)} := 1 - p_e^{(k)}$, is the non-exceedance probability versus scale (NEPvS). Evidently, at scale k = 1 the EPvS is an estimate of the probability of exceedance of the threshold value, F(u), and the NEPvS is 1 – F(u). For example, in the previous applications, the threshold value was selected so that F(u) = 0.05. For a purely random process, the NEPvS is obtained as:
$$p^{(k)} = p^{k} \qquad (6)$$
where p is the probability of non-exceedance at the basic scale k = 1 and equals 1 – F(u). Therefore, for white noise processes, the probability of occurrence of extremes across scales is fully determined by the choice of the threshold (which controls the probability at the basic scale) and the scale. For HK processes, though, a different behaviour is revealed, with the probabilities of non-exceedance of the threshold being larger than those obtained under independence. This property of HK processes is discussed and investigated extensively in Section 4.
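The empirical NEPvS and its white-noise reference can be sketched in a few lines of Python. The sketch assumes non-overlapping blocks with any incomplete trailing block dropped, and it uses an independent uniform series purely as a stand-in for a white noise process; the function names are illustrative.

```python
import random


def nepvs(x, u, k):
    """Fraction of non-overlapping blocks of length k with no value above threshold u."""
    n_blocks = len(x) // k
    empty = sum(1 for q in range(n_blocks) if max(x[q * k:(q + 1) * k]) <= u)
    return empty / n_blocks


def nepvs_white_noise(p, k):
    """Non-exceedance probability at scale k under independence: p ** k."""
    return p ** k


random.seed(3)
series = [random.random() for _ in range(200000)]  # iid uniform values on [0, 1)
u = 0.95                                           # exceeded with probability 0.05
empirical = {k: nepvs(series, u, k) for k in (1, 10, 30)}
theoretical = {k: nepvs_white_noise(0.95, k) for k in (1, 10, 30)}
```

For an independent series the two dictionaries should agree up to sampling error; a persistent series would give empirical values above the white-noise reference at scales larger than one.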
To model the NEPvS, we revisit a probabilistic model proposed by Koutsoyiannis (2006) to describe the clustering behaviour of dry spells in rainfall timeseries. The model derives from an entropy-maximization framework and was originally proposed to describe the probability dry across different timescales. The latter, according to our definition, corresponds to a threshold taking the value of 0. Therefore, in a similar manner to the probability dry, we obtain the probability of non-exceedance of the threshold at scale k as:
$$p^{(k)} = p^{\left[1 + \left(\xi^{-1/\eta} - 1\right)(k - 1)\right]^{\eta}} \qquad (7)$$
where p = 1 – F(u) is set by the threshold u, and η, ξ ∈ [0, 1] are parameters. For η = 1 and ξ = 0.5, the exponent reduces to k and Eq. 7 describes the white noise process. To allow backward extendibility to scale k = 0, the positivity of the base should be ensured and therefore the following inequality should hold: $\xi > 2^{-\eta}$. We apply both the index and the proposed model to the synthetic series as well as to the rainfall data and assess their performance in characterizing clustering. We evaluate the index’s ability to reveal dependence by examining its performance for all the benchmark timeseries, and we test its robustness by varying all the factors involved, i.e. sample size, properties of the marginal distribution and threshold value.
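The behaviour of the model can be explored with the short Python sketch below. It assumes that Eq. 7 takes the form of the probability-dry expression of Koutsoyiannis (2006), p^(k) = p^{[1 + (ξ^(−1/η) − 1)(k − 1)]^η}, which reproduces the two properties stated above (white noise for η = 1 and ξ = 0.5, and a positive base at k = 0 when ξ > 2^(−η)). The function names and the parameter values used in the example are illustrative assumptions.

```python
def nepvs_model(k, p, eta, xi):
    """Assumed NEPvS model: p ** ((1 + (xi ** (-1 / eta) - 1) * (k - 1)) ** eta)."""
    base = 1.0 + (xi ** (-1.0 / eta) - 1.0) * (k - 1.0)
    if base <= 0.0:
        raise ValueError("non-positive base; parameters outside the admissible range")
    return p ** (base ** eta)


def extends_to_scale_zero(eta, xi):
    """True if the base stays positive at k = 0, i.e. xi > 2 ** (-eta)."""
    return xi > 2.0 ** (-eta)


p = 0.95                                              # non-exceedance probability at k = 1
white = nepvs_model(10, p, eta=1.0, xi=0.5)           # reduces to p ** 10
clustered = nepvs_model(10, p, eta=0.8, xi=0.7)       # illustrative persistent case
```

With η < 1 and ξ > 0.5 the exponent grows more slowly than k, so the modelled non-exceedance probability stays above the white-noise value, which is the signature of clustering described in the text.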
4. Results
4.1 Relating multiscale clustering to LRD behavior
We estimate the NEPvS index for the synthetic benchmark timeseries, setting the threshold of extremes to 5%. The benchmark series have length $10^6$, and therefore a 5% threshold yields 50 000 extreme values (POT events). We investigate the temporal scales 1 to 1000, since the index’s applicability to larger scales is to some extent conditioned by the available sample size (this feature is discussed in Section 4.2).
Results from the NEPvS application are shown on a double logarithmic plot of the minus natural logarithm of the non-exceedance probability of the threshold versus the scale, which in most cases yields a straight line (Fig. 6). Some interesting insights can be derived. As persistence increases, the probability of no occurrences of extremes at a given scale progressively increases (equivalently, its minus logarithm, shown in the plots, decreases), which is true for all the examined distribution types. As already mentioned, there is a maximum temporal scale up to which the index is informative. This scale, which we call the ‘max-discernible’ scale, is the scale at which the non-exceedance probability estimated from the simulated series equals zero, as at least one extreme event is encountered in every one of the n/k intervals. In this case, the minus logarithm of the NEPvS tends to infinity and is not shown on the plots. For a given number of extremes, and thus sample size, the max-discernible scale depends on the H parameter: the larger the persistence, the larger the timescales required in order to ‘encounter’ the extremes. This can be explained by considering that another manifestation of clustering of extremes is the existence of prolonged periods with no extreme occurrences.
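The notion of the max-discernible scale can be made concrete with a small Python sketch. It assumes a boolean indicator series of threshold exceedances and takes the max-discernible scale to be the smallest of the candidate scales at which every non-overlapping block contains at least one exceedance; the function name and the toy series are illustrative.

```python
def max_discernible_scale(exceed, scales):
    """Smallest scale in `scales` at which every non-overlapping block holds an exceedance."""
    for k in sorted(scales):
        n_blocks = len(exceed) // k
        if n_blocks == 0:
            break  # scale longer than the series; nothing left to inspect
        if all(any(exceed[q * k:(q + 1) * k]) for q in range(n_blocks)):
            return k
    return None  # the empirical non-exceedance probability never reaches zero


exceed = [True, False, False, True, False, True, False, False]  # toy indicator series
k_max = max_discernible_scale(exceed, scales=[1, 2, 4, 8])
```

In this toy series the blocks of length 2 still include one block without exceedances, while both blocks of length 4 contain one, so the empirical non-exceedance probability first drops to zero at scale 4.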
It is worth noting that the marginal properties are irrelevant for the NEPvS of the white noise process. This is also confirmed in Fig. 6, where the lines of all the white noise timeseries with different marginals are identical, for which there is a theoretical justification. Likewise, for H parameters not far from 0.5, the different non-Gaussian distributions (Fig. 6a) yield negligible differences on the NEPvS plots. However, notable differences appear for H > 0.7. Specifically, the non-Gaussian NEPvS plots differ clearly from the NEPvS of the normal distribution, especially for large H values, with the latter showing more apparent clustering behaviour.
The NEPvS model (Eq. 7) fits perfectly the whole range of non-Gaussian distributions; a slight departure is seen only for the normal timeseries at small scales (k < 50) and very large H parameter (H = 0.9).
4.2 NEPvS index sensitivity to sample size, threshold selection and distribution type
Having established that a representation in terms of the minus logarithm of probability versus timescale, like that of Fig. 6, reflects the presence of persistence for a range of distribution types, we aim to frame its statistical behaviour for different configurations of extreme value analysis. For this purpose, the statistical behaviour of this graph is investigated by means of Monte Carlo simulation, starting from the white noise case, which will serve as a benchmark model for identifying dependence in the rainfall data.
4.2.1 Sample size impact
We generate two ensembles of $10^3$ white noise timeseries with sample sizes of 150 years (150 × 365 daily values) and 300 years, respectively, thus covering the whole range of observed record lengths in our dataset, and we produce the NEPvS plots for both lengths, shown in Figure 7. As expected, the larger sample size produces narrower Monte Carlo Prediction Limits (MCPL), yet the difference is almost negligible. The fact that sample sizes of this order of magnitude yield only minimal differences in the MCPL gives confidence in attributing the differences between the models examined next to other factors. The essential change between the two sample sizes, however, is the shift of the max-discernible scale to a larger timescale for the longer timeseries (Fig. 7). This is because ‘extremes’ are spread over longer time periods in the longer series; therefore, the longer the series, the more timescales can be inspected for clustering.
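A scaled-down version of such a Monte Carlo experiment can be sketched in Python as follows. The sketch is an illustration under several assumptions: exceedance indicators are drawn directly as independent Bernoulli events (permissible for white noise, where the marginal distribution does not matter), the ensemble holds 200 series of 10 years instead of 10^3 series of 150 or 300 years to keep the run short, and the prediction limits are taken as empirical quantiles of the simulated NEPvS values. All function and parameter names are hypothetical.

```python
import random


def nepvs_binary(exceed, k):
    """Fraction of non-overlapping blocks of length k containing no exceedance."""
    n_blocks = len(exceed) // k
    empty = sum(1 for q in range(n_blocks) if not any(exceed[q * k:(q + 1) * k]))
    return empty / n_blocks


def white_noise_mcpl(n, p_exceed, scales, n_sim=200, level=0.95, seed=0):
    """Empirical Monte Carlo prediction limits of the NEPvS under independence."""
    rng = random.Random(seed)
    sims = {k: [] for k in scales}
    for _ in range(n_sim):
        exceed = [rng.random() < p_exceed for _ in range(n)]
        for k in scales:
            sims[k].append(nepvs_binary(exceed, k))
    lo_idx = int((1 - level) / 2 * (n_sim - 1))  # index of the lower quantile
    hi_idx = int((1 + level) / 2 * (n_sim - 1))  # index of the upper quantile
    return {k: (sorted(v)[lo_idx], sorted(v)[hi_idx]) for k, v in sims.items()}


limits = white_noise_mcpl(n=10 * 365, p_exceed=0.05, scales=[1, 7, 30])
```

The returned limits should bracket the theoretical white-noise value at each scale; an observed NEPvS lying above the upper limit would then point to a departure from independence.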
4.2.2 Threshold impact
The selection of the threshold is the most important choice when analysing records of maxima. It is generally acknowledged that choosing ‘high’ thresholds for the extremes results in observations located far in the right tail of the distribution, which are therefore of interest, but simultaneously increases uncertainty, as fewer observations are sampled. The exact opposite is true for lower thresholds. Therefore, one has to seek an optimal threshold that balances this trade-off.
We first evaluate the choice of the threshold by examining four thresholds associated with exceedance probabilities of 0.5%, 1%, 5% and 10%, respectively, applied to the benchmark case of independence, as seen in Figure 8. It is interesting to note that the main effect of the threshold in the iid case is that the index can be applied to larger scales if the threshold is increased (smaller probability of exceedance). This is because, for the same record length, fewer extreme events are likely to be more widely separated in time and therefore require longer timescales to be grouped.
We also inspect the impact of the threshold in relation to the H parameter of the parent process for three distribution types from the benchmark series: type-Pareto with α = 0.2, type-gamma with α = 0.01 and the normal. In this case, we evaluate three thresholds, 5%, 10% and 20%. Although the last threshold would be considered ‘low’ for most extreme value analyses, here it is of interest because by varying the threshold we aim to investigate the limits of identifiability of the HK behaviour, and not to focus on the exact shape of the distribution tail. To this end, we fit the probabilistic model introduced in Eq. 7 to each timeseries and evaluate the ability to reveal persistence through the identifiability of the fitted parameters, η and ξ. In Fig. 9, the impact of the threshold within the same distribution is striking, with lower threshold values (e.g. 20%) increasing the identifiability of the parameters by more than 10%. Additionally, it can be seen that the η parameter is more sensitive in the case of the normal distribution, whereas the ξ parameter is sensitive to increasing skewness and kurtosis.
By performing the above experiments, we have demonstrated the twofold effect of the threshold: ‘lower’ thresholds (higher probability of exceedance) enable better identifiability of persistence, yet they limit application of the index to fewer scales, and vice versa.
4.2.3 Distribution type
At this stage, for the same threshold (5%), sample size (150 years) and H parameter (0.7), we estimate the NEPvS index for the shorter benchmark series characterized by different marginal properties, and thus distribution tails, so as to focus solely on the impact of skewness and kurtosis on the index. Results are plotted in Figure 10. Two important conclusions can be drawn: a) clustering of extremes and its identifiability are, in this case too, greater for the normal distribution (Fig. 10a) and b) for a specified non-Gaussian distribution, clustering is greater and also more visible for increasing skewness and kurtosis (Fig. 10b). The latter is a significant advance, as the tools reviewed in sections 3.3 and 3.4 showed very high downward bias for increasing higher-order moments of the non-Gaussian distributions and practically no difference among them for the record lengths available (150 years). We also provide the plots of the fitted η and ξ parameters computed for the long benchmark series with H parameters ranging in [0.5, 0.99], as well as their comparison, in the Appendix (Fig. A1-A3). All three plots confirm the above observations.
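As an illustration of the kind of quantity estimated here, the following minimal Python sketch computes a non-exceedance probability at several scales. It assumes that, at scale k, the index is estimated as the fraction of non-overlapping windows of k time steps in which the threshold is never exceeded; this is one reading of the definition and the authors' exact estimator may differ. The function name `nep_vs_scale` and the toy series are hypothetical.

```python
def nep_vs_scale(series, threshold, scales):
    """Empirical non-exceedance probability of `threshold` at each scale k.

    Assumption: at scale k the series is cut into non-overlapping windows of
    k time steps, and the estimate is the fraction of windows in which no
    value exceeds the threshold.
    """
    exceed = [x > threshold for x in series]
    nep = {}
    for k in scales:
        n_windows = len(exceed) // k
        quiet = sum(
            1 for j in range(n_windows) if not any(exceed[j * k:(j + 1) * k])
        )
        nep[k] = quiet / n_windows
    return nep


# Two toy series with the same number of exceedances (two values above 1).
spread = [0, 0, 5, 0, 0, 0, 0, 5]      # exceedances far apart
clustered = [5, 5, 0, 0, 0, 0, 0, 0]   # exceedances adjacent

print(nep_vs_scale(spread, 1, [1, 2, 4]))     # {1: 0.75, 2: 0.5, 4: 0.0}
print(nep_vs_scale(clustered, 1, [1, 2, 4]))  # {1: 0.75, 2: 0.75, 4: 0.5}
```

In the toy example both series have the same non-exceedance probability at scale 1, but the clustered one retains more exceedance-free windows at larger scales, which is the signature the index is meant to expose.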
4.3 Clustering in real world rainfall extremes I: identifying clustering mechanisms in the parent process
Rainfall is a complex geophysical process. Its stochastic modelling must take into account its mixed-type marginal distribution (due to intermittency), the presence of cyclostationarity (seasonality, and also the diurnal cycle at sub-daily scales), as well as its scale-dependence structure (Markonis and Koutsoyiannis, 2016). All these mechanisms are expected to affect the clustering of extremes.
In the following, we investigate their impact separately, although we note that the interplay among them may not allow a robust disentanglement of their effects at the different scales.
4.3.1 Influence of probability dry
The most distinctive feature of the rainfall process is its highly intermittent nature at fine temporal scales (Koutsoyiannis, 2006). To statistically account for intermittency, the marginal distribution is formed as a mixed (discrete-continuous) type, having a probability mass concentrated at 0 and a probability distribution function describing the nonzero values. Therefore, if $p_d$ is the probability of no rain, termed probability dry, then the cumulative distribution function $F(x)$ for the whole rainfall record can be defined in terms of the conditional distribution $F_w(x)$ of wet days as:

$$F(x) = p_d + (1 - p_d)\,F_w(x) \qquad (8)$$
Since the threshold of extremes u is obtained as the quantile with a chosen probability of exceedance, it is evident that in the case of mixed-type processes, as in daily rainfall, the same threshold value will have a different probability of exceedance for the whole process and for the wet process (the nonzero rainfall). By simple probabilistic statements, it follows that the two exceedance probabilities of the threshold u for the compound and the wet process, $p_c(u)$ and $p_w(u)$, respectively, are related as:

$$p_c(u) = (1 - p_d)\,p_w(u) \qquad (9)$$

where $p_d = 1 - p_c(0)$ is the probability dry. Therefore, the exceedance probability for the same threshold is higher for the wet series, which means that, depending on the probability dry, the values surpassing the same threshold may not necessarily belong to the right tail of the wet series as ‘extremes’. For instance, a threshold u with associated exceedance probability 5% for the whole rainfall record with probability dry equal to 80% yields exceedance probability 25% for the wet series, and therefore the resulting series of POT events would also include lower rainfall values. While this is not a limitation of the methodology, it should be properly accounted for in order to a) ensure that the resulting extremes are indeed towards the right end of the wet series tail and b) make meaningful comparisons among stations with different values of the probability dry. For this reason, we compute $p_d$ for all stations in order to make sure that the resulting extremes surpass relevant thresholds. As previously shown, the latter is important since the threshold is the key control on the results.
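The arithmetic of the example above (5% exceedance for the whole record and 80% probability dry giving 25% for the wet series) can be checked with a short Python sketch. The function names and the toy record are hypothetical, and the empirical quantile rule used to pick the threshold is a simple assumption, not necessarily the one used in the analysis.

```python
def wet_exceedance_probability(p_compound, p_dry):
    """Exceedance probability of a threshold in the wet (nonzero) series,
    given its exceedance probability in the whole series: p_w = p_c / (1 - p_d)."""
    if not 0.0 <= p_dry < 1.0:
        raise ValueError("p_dry must lie in [0, 1)")
    p_wet = p_compound / (1.0 - p_dry)
    if p_wet > 1.0:
        raise ValueError("p_compound cannot exceed 1 - p_dry for a positive threshold")
    return p_wet


def empirical_check(series, p_compound):
    """Pick the threshold u as an empirical quantile of the whole series with
    exceedance probability p_compound, then measure how often u is exceeded
    in the whole series and in the wet (nonzero) series."""
    ordered = sorted(series)
    n = len(ordered)
    # Simple quantile rule (an assumption): the value ranked round((1 - p) * n).
    idx = min(n - 1, int(round((1.0 - p_compound) * n)) - 1)
    u = ordered[idx]
    wet = [x for x in series if x > 0.0]
    freq_all = sum(x > u for x in series) / n
    freq_wet = sum(x > u for x in wet) / len(wet)
    return u, freq_all, freq_wet


# Toy record: 80 dry days and 20 wet days with depths 1..20 (arbitrary units).
record = [0.0] * 80 + [float(v) for v in range(1, 21)]
print(wet_exceedance_probability(0.05, 0.8))
print(empirical_check(record, 0.05))
```

For the toy record, the threshold selected at 5% exceedance of the whole series is exceeded by a quarter of the wet days, in line with the relation between the two exceedance probabilities.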
4.3.2 Influence of seasonality
Seasonality may in some cases be an important attribute of extreme rainfall, affecting the central tendency of rainfall maxima belonging to different seasons and inducing temporal clustering in the series of extremes (Iliopoulou et al. 2018). Since our aim is to focus on the impact of HK dynamics on clustering of extremes, we apply deseasonalization schemes to the original series in order to smooth out the seasonal components and reduce the associated clustering. By doing so, we may perform Monte Carlo simulations with one marginal distribution per station for the validation of the chosen models. We note that a perfect separation of the impact of seasonality from HK dynamics may not always be possible, as in stations exhibiting strong seasonality we anticipate interplay between the two.
We consider two different methods for removing seasonality. The first one, termed M1, is a simple standardization scheme performed on a monthly basis. The daily values $x_i$ belonging to each month m = 1,...,12 are transformed by subtracting the mean and dividing by the standard deviation of all daily values belonging to the same month, as follows:

$$z_i = \frac{x_i - \mu_m}{\sigma_m}$$

where $\mu_m$ and $\sigma_m$ denote the mean and the standard deviation of the daily values of month m. This method effectively removes seasonality from the first two moments of the data. In order to deal with higher-order moments, we apply a second deseasonalization scheme, denoted M2, which is based on the Normal Quantile Transformation (NQT), also applied on a monthly basis. The daily series for each month m are transformed to standard Gaussian quantiles through the inverse function of the standard Gaussian cumulative distribution, $z_i = \Phi^{-1}(F(x_i))$, with their cumulative probability F(x) estimated via their Weibull plotting position. Consequently, after the transformation, all daily values of each month follow the standard normal distribution. We found that the two schemes show minimal differences in the index’s behaviour, with the most apparent ones found at the stations of Athens (Fig. 11b), Palermo and Lisbon.
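The two schemes can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch under stated assumptions, not the code used for the analysis: the function names are hypothetical, the population standard deviation is assumed for M1, the Weibull plotting position is taken as rank/(n + 1), and tied values (such as the many zeros of daily rainfall) are ranked in their order of appearance.

```python
from statistics import NormalDist, mean, pstdev


def deseasonalize_m1(values, months):
    """M1 sketch: standardize each value by the mean and standard deviation
    of all values of the same calendar month (population std assumed)."""
    by_month = {}
    for x, m in zip(values, months):
        by_month.setdefault(m, []).append(x)
    stats = {m: (mean(v), pstdev(v)) for m, v in by_month.items()}
    return [(x - stats[m][0]) / stats[m][1] for x, m in zip(values, months)]


def deseasonalize_m2(values, months):
    """M2 sketch: normal quantile transformation per calendar month, with
    cumulative probabilities from the Weibull plotting position rank/(n + 1).
    Ties are ranked in order of appearance (an arbitrary choice)."""
    positions = {}
    for i, m in enumerate(months):
        positions.setdefault(m, []).append(i)
    std_normal = NormalDist()
    out = [0.0] * len(values)
    for idx in positions.values():
        ranked = sorted(idx, key=lambda i: values[i])
        n = len(ranked)
        for rank, i in enumerate(ranked, start=1):
            out[i] = std_normal.inv_cdf(rank / (n + 1))
    return out


# Toy input: three values in January (month 1) and three in July (month 7).
values = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
months = [1, 1, 1, 7, 7, 7]
print(deseasonalize_m1(values, months))
print(deseasonalize_m2(values, months))
```

In the toy input the two months differ in both mean and spread, yet they map to identical transformed values under either scheme, which is the intended removal of the monthly signal.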
In Figure 11, we plot three characteristic cases of the NEPvS behaviours found in the data: a) in a typical station with minimal to no seasonality (Oxford), extremes are not affected by deseasonalization schemes (Fig. 11a), b) in a station with prominent seasonality (Athens, Fig. 11b), a stronger deseasonalization scheme (M2) may be required, and c) in an intermediate case (Helsinki, Fig. 11c), the seasonal component in extremes is effectively dealt with by the simpler scheme (M1). The majority of the stations (40) belong to the third category, while for 17 stations accounting for seasonality yields minimal to no difference. These findings are in general consistent with the analysis of Iliopoulou et al. (2018) on the presence of seasonality in extreme rainfall.
4.3.3 Rainfall scaling regimes
In order to highlight the motivation behind selecting daily rainfall as a case study for the method, and to establish the ‘target’ persistence structure that we aim to reveal, we estimate the persistence properties of the previously deseasonalized daily rainfall series. To this aim, we compute the H parameter through the climacogram, as introduced in section 3.3. All the empirical climacograms are plotted in Figure 12. The estimated average persistence (Table 2) is close to, and even larger than, the global estimate (H≈0.6) of Iliopoulou et al. (2016) concerning annual rainfall. Remarkably, in many stations we observe a change of the scaling regime, namely an intensification of persistence, at scales above yearly. A similar result was observed in the work of Markonis and Koutsoyiannis (2016) for rainfall records at the over-decadal scale. This behaviour is also evident in Table 2, which reports the estimated H parameters for the daily and above-yearly scales.
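A minimal Python sketch of a climacogram-based estimate of H is given below. It assumes the standard HK scaling of the variance of the time-averaged process, which is proportional to k^(2H - 2) at scale k, and fits the slope by ordinary least squares in log-log space. The function names are hypothetical and no correction is applied for the bias of the variance estimator under persistence, so the sketch tends to underestimate H for persistent series and is not a substitute for the estimator of section 3.3.

```python
import math
import random
from statistics import variance


def climacogram(series, scales):
    """Sample variance of the time-averaged series at each averaging scale k."""
    gamma = {}
    for k in scales:
        n_blocks = len(series) // k
        averages = [sum(series[j * k:(j + 1) * k]) / k for j in range(n_blocks)]
        gamma[k] = variance(averages)
    return gamma


def hurst_from_climacogram(gamma):
    """Least-squares slope of log(variance) against log(scale).

    For an HK process the slope equals 2H - 2, hence H = 1 + slope / 2.
    No bias correction is applied (a simplifying assumption).
    """
    xs = [math.log(k) for k in gamma]
    ys = [math.log(v) for v in gamma.values()]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return 1.0 + (num / den) / 2.0


# White noise should give an estimate close to H = 0.5.
rng = random.Random(1)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
gamma = climacogram(white_noise, [1, 2, 4, 8, 16, 32, 64])
print(hurst_from_climacogram(gamma))
```

Restricting the fit to scales above one year, or below it, would give separate slopes for the two scaling regimes discussed above; the choice of scales is left to the user in this sketch.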
4.4 Clustering in real world rainfall extremes II: HK dynamics?
4.4.1 Analysis of daily rainfall extremes in the Netherlands
It should be evident by now that the clustering dynamics of extremes depend not only on the persistence properties of the parent process but on its higher-order moments as well. The identifiability of clustering also varies with the choice of the threshold, which may need to be modified for mixed-type processes, as discussed before. In our case, this means that, depending on the probability dry of each station, the chosen threshold will correspond to a different exceedance probability for the ‘wet’ record of each station. Therefore, a blind comparison of different stations with the obtained MCPL for a given threshold could be uninformative, depending on the variability of probability dry in the sample of stations. In order to apply the methodology effectively in as many stations as possible, we assume a climatically homogeneous region in which the rainfall timeseries can be regarded as realizations of a single process. For this purpose, we select the region of the Netherlands, in which 28 out of the 60 stations are located and for which preliminary analysis showed small variability of the summary statistics. We estimate the average values of the first four moments of the deseasonalized records for all 28 stations, as well as the H parameter resulting from the analysis of the daily values. We form an ensemble of $10^3$ Monte Carlo simulations for the average record length of the sample (160 years) with an HK model preserving the first four moments and, subsequently, compare its clustering behaviour with the one observed in the sample of stations. We also repeat the Monte Carlo simulation for a white noise process. Both are presented in Fig. 13. It is evident that the assumed model is consistent with the majority of the observed records, with only a few stations located in the southwest of the Netherlands exhibiting even stronger clustering, outside the 95% region of the assumed HK model. As expected, as the threshold increases, evidence of persistence is progressively ‘lost’ and the probabilistic behaviour of POT occurrences approaches a random one.
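The white noise benchmark can be sketched in Python as follows. This is a simplified illustration and not the simulation scheme of the paper: it covers only the independent case (the HK model with preserved moments requires a dedicated generator), it draws exceedances directly with a fixed probability rather than thresholding a simulated series at its empirical quantile, and it assumes the same window-based estimate of the non-exceedance probability as a reading of the index. All function names are hypothetical.

```python
import random


def nep_at_scale(exceed, k):
    """Fraction of non-overlapping windows of k steps with no exceedance."""
    n_windows = len(exceed) // k
    quiet = sum(
        1 for j in range(n_windows) if not any(exceed[j * k:(j + 1) * k])
    )
    return quiet / n_windows


def white_noise_limits(n, p_exceed, scales, n_sim=200, level=0.95, seed=0):
    """Monte Carlo prediction limits of the non-exceedance probability at
    each scale for an independent process of length n."""
    rng = random.Random(seed)
    sims = {k: [] for k in scales}
    for _ in range(n_sim):
        # Under independence only the exceedance indicator matters, so the
        # marginal distribution is not simulated explicitly (simplification).
        exceed = [rng.random() < p_exceed for _ in range(n)]
        for k in scales:
            sims[k].append(nep_at_scale(exceed, k))
    tail = int((1.0 - level) / 2.0 * n_sim)
    limits = {}
    for k in scales:
        ordered = sorted(sims[k])
        limits[k] = (ordered[tail], ordered[n_sim - 1 - tail])
    return limits


p = 0.05
limits = white_noise_limits(n=2000, p_exceed=p, scales=[1, 10, 50], n_sim=100)
for k, (low, high) in limits.items():
    # Under independence the expected value at scale k is (1 - p) ** k.
    print(k, low, (1 - p) ** k, high)
```

An observed non-exceedance probability lying above the upper limit at several scales would indicate more exceedance-free windows than independence predicts, that is, clustering.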
4.4.2 Stykkisholmur case study
As a second case study we select a single station located in Stykkisholmur, Iceland, which is the station with the most peculiar behaviour among all those we analysed. We repeat the Monte Carlo analysis for both a white noise process and an HK process preserving the first four moments and the H parameter (= 0.65) of the record. Results are shown in Figure 14. It is interesting to note that clustering in this case appears stronger than predicted by the HK model. The Monte Carlo experiment is repeated for H = 0.7 to explore the possible impact of estimation uncertainty due to the standard deviation bias in finite sample sizes (Koutsoyiannis and Montanari, 2007). In this case, the MCPL approach the observed data for the lower threshold, yet the impact is smaller for the higher threshold. A similar behaviour was found at the station of Uppsala. We hypothesize that this ‘discrepancy’ between the persistence found in the parent process and the stronger one implied by the extremes might be explained by the impact of large-scale atmospheric circulation patterns (such as the NAO) on rainfall extremes, which might need even longer record lengths in order to be effectively summarized by the second-order characterization provided by the H parameter.
4.4.3 Modeling the clustering behavior
We apply the NEPvS model to both seasonal and deseasonalized timeseries of the rainfall data of all 60 stations in order to assess its applicability in all cases. We employ the deseasonalization scheme M1. In Fig. 15 we plot the boxplots of the estimated parameters η and ξ, as well as the RMSE, for the seasonal and the deseasonalized series for three different thresholds, 1%, 5% and 10%. The fitted parameters reaffirm that, as the threshold decreases, the parameter estimates deviate from the ones obtained for the iid case (ξ = 0.5 and η = 1). From the RMSE (Fig. 15c), it can be seen that the proposed model describes the deseasonalized data very well and the original observations fairly well, and in both cases the modelling efficiency improves for lower thresholds. Seasonality is associated with increased temporal clustering at the intermediate scales (approx. 20-150 days), which manifests as a curvature in the NEPvS plots that the model captures less efficiently than in the deseasonalized case, which typically produces a straight-line plot. It is also evident that results concerning the threshold are not as robust in this case, since the impact of the threshold on seasonal clustering may vary depending on the specific seasonal regime. For instance, it is expected that for stations with prominent seasonality, high thresholds will show increased clustering only in the wettest season, whereas lower thresholds will enable inspection of clustering in more seasons. However, depending on the characteristics of the seasonal regime and the intensity of the specific seasons, the temporal mixture of extremes from the different seasons differs from case to case, and thus it is not straightforward to discern the impact of seasonality from a bulk fitting to all cases. On the other hand, for the deseasonalized cases it is clear that ‘dependence’ emerges as the threshold lowers.
5. Discussion
Clustering of extreme events is related to the presence of persistence, or HK dynamics, in natural processes. Here we approached this relationship with a twofold intention: first, to ‘retrieve’ persistence from records of maxima, and second, to characterize it by probabilistic means. To this aim, we have introduced the NEPvS index, for which we also propose a model. The index examines the probabilistic behaviour of POT occurrences across multiple scales and proved successful in revealing persistence from extremes of various non-Gaussian timeseries, for which well-known tools performed poorly.
It seems difficult, though, to establish general analytical relationships linking the NEPvS behaviour to the H parameter of the parent process, even without considering the uncertainty involved in estimating H from small record lengths in the first place. As the H parameter is a second-order characterization of a process, generation schemes reproducing the same H behaviour but coupled with different marginal distributions (having different high-order moments) will yield different behaviours of extremes. For instance, clustering of extremes and its identifiability appear to be much more prominent in Gaussian processes. The task, therefore, of linking clustering of extremes to the H parameter without also accounting for the specific high-order moments of the timeseries seems infeasible. We showed, though, that the threshold is a key determinant in this respect, as lowering the threshold, i.e. moving towards the central tendency of the data, enables better identification of persistence. On the contrary, as the threshold increases, evidence of persistence is progressively lost and the behaviour of extremes may falsely suggest independence of the parent process.
Application of the NEPvS index to daily rainfall data showed that there may exist significant departures from independence, particularly for lower thresholds, which depend on the location and specific climatic region. In general, the behaviour of rainfall extremes in multiple case studies (28 stations in the Netherlands and 1 in Iceland) was found, by means of extensive Monte Carlo simulations, to be consistent with HK dynamics characterized by moderate H parameters (in the range 0.6-0.7). The NEPvS model showed a very good fit to the probabilistic behaviour of exceedances for the seasonal and deseasonalized observations across multiple scales for all 60 stations. As a similar version of the model has been previously proposed to describe the probability dry across multiple scales (Koutsoyiannis, 2006), this result suggests that there exists a probabilistic law which effectively describes the multilevel exceedances of rainfall thresholds across scales, from zero-crossings (wet days) to high-level crossings, such as the ones examined here.
From a theoretical point of view, these findings suggest that it is important to study change and clustering in a consistent stochastic framework examining the whole process behaviour, in order to better understand the process dynamics and avoid retaining ‘preconceived’ assumptions, such as iid, which may be inconsistent with the physical reality. For instance, various trend tests assume iid for the examined process, while modified tests accounting for persistence (Hamed, 2008) also do not consider its interplay with the higher-order moments. Therefore, it is likely that they fail to account for extremes from complex processes, leaving aside issues regarding problematic applications due to misinterpretation of stationarity (Koutsoyiannis and Montanari, 2015; Montanari and Koutsoyiannis, 2014). Overdispersion in POT rainfall events has also been studied lately and attributed to a mixture of Poisson models representing different climate regimes (Tye et al., 2019), as well as to seasonality mechanisms (Serinaldi and Kilsby, 2013). Although we have also found that in some cases seasonality accounts for most of the observed clustering in the rainfall extremes, by performing multiple MC experiments focusing on the deseasonalized extremes we have revealed consistency with HK dynamics. We note, though, that as the H parameter for rainfall revolves around the value of 0.6 and rainfall is a heavily skewed process, it is expected that identifiability of persistence from extremes will be limited, unless one lowers the threshold. Nevertheless, this highlights an alternative scientific hypothesis to be considered in ‘attribution’ studies, which is the emergence of clustering and overdispersion of extremes from persistence in the parent process.
From a practical point of view, the presence of persistence in the parent process affects estimation of extreme values, and therefore various design outcomes, in multiple ways. Although the theoretical definition of return period is still valid in the presence of persistence (Koutsoyiannis, 2008; Volpi et al., 2015), the statistical estimates of distribution quantiles for a specified return period are severely impacted. Other important implications concern flood risk underestimation under persistence (Serinaldi and Kilsby, 2016), as well as underestimation of IDF curves when the temporal dependence is disregarded (Roy et al. 2019). Therefore, although persistence of the parent process is less evident in the series of its extremes, and it is highly unlikely that it can be fully retrieved except for very low thresholds, its impact cannot be disregarded when studying extremes, even if the latter appear independent. Theoretical arguments do exist concerning the validity of well-known theorems under relaxed iid assumptions, for instance fundamental EVT results (limiting distributions etc.), which hold true under weak persistence (Leadbetter, 1983). However, for scientific applications, which involve estimation from data of finite, and typically small, record lengths, the presence of persistence in the process induces uncertainty in the estimation, as the actual information content of the data is lower than that for iid conditions (Koutsoyiannis and Montanari, 2007), and this uncertainty inevitably propagates into the extreme value estimates.
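The loss of information content under persistence can be made concrete with a small Python sketch. It relies on the standard HK result that the variance of the sample mean of n values decays as n^(2H - 2) instead of 1/n, so that, in terms of the mean only, n persistent values carry the information of n^(2 - 2H) independent ones. The function name is hypothetical, and this equivalence concerns the sample mean; it is used here only as an indication and is not a statement about the uncertainty of extreme quantiles.

```python
def equivalent_sample_size(n, hurst):
    """Number of independent values giving the same variance of the sample
    mean as n values of an HK process with Hurst parameter `hurst`."""
    if not 0.0 < hurst < 1.0:
        raise ValueError("the Hurst parameter must lie in (0, 1)")
    return n ** (2.0 - 2.0 * hurst)


# 150 annual values, as in the benchmark experiments, for several H values.
for h in (0.5, 0.6, 0.7):
    print(h, round(equivalent_sample_size(150, h), 1))
```

For H = 0.5 the equivalent size equals the actual one, while for the moderate values found for rainfall (0.6-0.7) it drops substantially.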
The existence of clustering also strengthens the argument for using the POT method for sampling of extremes, instead of block maxima approaches, which tend to hide dependence, as is also evident in Fig. 2. As the threshold plays a vital role, it is equally important to use POT approaches with more than one event per year on average, which is the common practice. Empirical declustering approaches (Lang et al., 1999) may also be ineffective if they do not take into account the characteristics of each process. In this regard, we argue that instead of seeking to resort to independence, often at the cost of reducing the available information (e.g. by discounting ‘dependent’ data), accounting for dependence is a more viable and consistent way forward. In fact, the use of the whole set of observations has been recently advocated (Volpi et al., 2019), while the emergence of new types of high-order moments (Koutsoyiannis, 2019) that exploit the whole set of observations provides an improved stochastic framework for applying this principle.
6. Conclusions
This research deals with the question of identifying the links between persistence in the parent process and clustering of extremes, with the specific aim to ‘rediscover’ the usually ‘lost’ persistence when one examines records of maxima. This is achieved by devising a probabilistic characterization of clustering of extremes. The main findings are summarized below:
a. There is significant influence from both the second-order properties and the high-order moments of the parent process on the generated extremes, and therefore characterizations of clustering of extremes need to account for both.
b. Identifiability of persistence from records of maxima is in general limited and weakens as the threshold for extremes increases.
c. The estimates of the Hurst parameter from the climacogram analysis and from the dispersion index are found to be severely biased downward when derived from extremes originating from non-Gaussian processes.
d. A new probabilistic index is proposed to represent clustering based on the probability of non-exceedance of a given threshold across scales, called the NEPvS (non-exceedance probability vs scale) index.
e. The NEPvS exhibits scaling behaviour which is described by a proposed model accurately simulating the probability of exceedance of a threshold at multiple temporal scales.
f. The index is transparent and can be directly used for statistical testing of departures from independence. Case-specific Monte Carlo simulations are needed to validate more complicated models coupling persistence with different marginal properties.
g. The POT approach applied with ‘low’ thresholds is a robust and informative way to reveal the clustering dynamics of extremes, in contrast to the block maxima method which hinders identifiability of persistence.
h. Deseasonalized daily rainfall POT events may show prominent departures from independence especially at lower thresholds, which may become important depending on the climatic region. Extensive station-specific Monte Carlo experiments showed consistency of clustering of extremes for various examined thresholds with assumed HK models fitted based on the properties of the parent process.
Further research is required in order to obtain analytical mathematical results for extremes arising from persistent processes, with the aim of constructing estimators for any distribution type and dependence structure without the need for Monte Carlo validations. However, the feasibility of this task is doubtful, since extremes over scales are controlled by higher-order moments, which are also difficult to estimate correctly from data (Lombardo et al., 2014). Recently proposed moment types with unbiased estimators across all orders, which can also model joint properties of processes, could provide a way to circumvent this (Koutsoyiannis, 2019).
We conclude that extremes tend to ‘hide’ the persistence of the parent process, often falsely signalling independence. Regardless of the strength of the evidence, however, the impact of persistence in the parent process on the estimation of extreme values is nonetheless present. In this respect, more research should focus on the stochastic properties of extremes from natural processes, where dependence mechanisms manifest themselves across various temporal scales and challenge common assumptions and practices.
Acknowledgments
We greatly thank the Radcliffe Meteorological Station, the Icelandic Meteorological Office (Trausti Jónsson), the Czech Hydrometeorological Institute, the Finnish Meteorological Institute, the National Observatory of Athens, the Department of Earth Sciences of the Uppsala University and the Regional Hydrologic Service of the Tuscany Region (servizio.idrologico@regione.toscana.it) for providing the required data for each region respectively. We are also grateful to Professor Ricardo Machado Trigo (University of Lisbon) for providing the Lisbon timeseries, to Professor Marco Marani (University of Padua) for providing the Padua timeseries and to Professor JooHeon Lee (Joongbu University) for providing the Seoul timeseries. All the above data were freely provided after contacting the acknowledged sources. The remaining timeseries are made publicly available by the data providers in the ECA&D project (http://www.ecad.eu) and in the GHCN-Daily database (https://data.noaa.gov/dataset/globalhistoricalclimatologynetworkdailyghcndailyversion3). The analyses were performed in Python 2.6 (Python Software Foundation. Python Language Reference, version 2.7, available at http://www.python.org) using the contributed packages pandas, scipy and seaborn. The codes used for the generation of the synthetic SMA series (Dimitriadis and Koutsoyiannis, 2018) are available at: https://www.itia.ntua.gr/en/docinfo/1656/. We are grateful to the Associate Editor Elena Volpi and the anonymous reviewer for the encouraging and constructive comments.
References
Barunik, J., Kristoufek, L., 2010. On Hurst exponent estimation under heavy-tailed distributions. Physica A: Statistical Mechanics and its Applications 389, 3844–3855. https://doi.org/10.1016/j.physa.2010.05.025
Coles, S., Bawa, J., Trenner, L., Dorazio, P., 2001. An introduction to statistical modeling of extreme values. Springer.
Dimitriadis, P., 2017. Hurst-Kolmogorov dynamics in hydrometeorological processes and in the microscale of turbulence.
Dimitriadis, P., Koutsoyiannis, D., 2018. Stochastic synthesis approximating any process dependence and distribution. Stoch Environ Res Risk Assess 32, 1493–1515. https://doi.org/10.1007/s0047701815402
Dimitriadis, P., Koutsoyiannis, D., 2015. Climacogram versus autocovariance and power spectrum in stochastic modelling for Markovian and Hurst–Kolmogorov processes. Stochastic Environmental Research and Risk Assessment 29, 1649–1669.
Eichner, J.F., Kantelhardt, J.W., Bunde, A., Havlin, S., 2011. The statistics of return intervals, maxima, and centennial events under the influence of long-term correlations, in: In Extremis. Springer, pp. 2–43.
Ferro, C.A., Segers, J., 2003. Inference for clusters of extreme values. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65, 545–556.
Hamed, K.H., 2008. Trend detection in hydrologic data: the Mann–Kendall trend test under the scaling hypothesis. Journal of Hydrology 349, 350–363.
Hurst, H.E., 1951. Long-term storage capacity of reservoirs. Trans. Amer. Soc. Civil Eng. 116, 770–808.
Iliopoulou, T., Koutsoyiannis, D., Montanari, A., 2018. Characterizing and modeling seasonality in extreme rainfall. Water Resources Research 54, 6242–6258.
Iliopoulou, T., Papalexiou, S.M., Markonis, Y., Koutsoyiannis, D., 2016. Revisiting long-range dependence in annual precipitation. Journal of Hydrology.
Klein Tank, A.M.G., Wijngaard, J.B., Können, G.P., Böhm, R., Demarée, G., Gocheva, A., Mileta, M., Pashiardis, S., Hejkrlik, L., Kern-Hansen, C., 2002. Daily dataset of 20th-century surface air temperature and precipitation series for the European Climate Assessment. International Journal of Climatology 22, 1441–1453.
Kottegoda, N.T., Rosso, R., 2008. Applied statistics for civil and environmental engineers. Blackwell Malden, MA.
Koutsoyiannis, D., 2019. Knowable moments for high-order stochastic characterization and modelling of hydrological processes. Hydrological Sciences Journal 0, 1–15. https://doi.org/10.1080/02626667.2018.1556794
Koutsoyiannis, D., 2016. Generic and parsimonious stochastic modelling for hydrology and beyond. Hydrological Sciences Journal 61, 225–244.
Koutsoyiannis, D., 2010. HESS Opinions "A random walk on water". Hydrology and Earth System Sciences 14, 585–601.
Koutsoyiannis, D., 2008. Probability and statistics for geophysical processes, doi:10.13140/RG.2.1.2300.1849/1, National Technical University of Athens, Athens.
Koutsoyiannis, D., 2006. An entropic-stochastic representation of rainfall intermittency: The origin of clustering and persistence. Water Resources Research 42.
Koutsoyiannis, D., 2003. Climate change, the Hurst phenomenon, and hydrological statistics. Hydrological Sciences Journal 48, 3–24.
Koutsoyiannis, D., Montanari, A., 2015. Negligent killing of scientific concepts: the stationarity case. Hydrological Sciences Journal 60, 1174–1183.
Koutsoyiannis, D., Montanari, A., 2007. Statistical analysis of hydroclimatic time series: Uncertainty and insights. Water Resources Research 43.
Lang, M., Ouarda, T., Bobée, B., 1999. Towards operational guidelines for over-threshold modeling. Journal of Hydrology 225, 103–117.
Leadbetter, M.R., 1983. Extremes and local dependence in stationary sequences. Probability Theory and Related Fields 65, 291–306.
Lombardo, F., Volpi, E., Koutsoyiannis, D., Papalexiou, S.M., 2014. Just two moments! A cautionary note against use of high-order moments in multifractal models in hydrology. Hydrology and Earth System Sciences 18, 243–255.
Marani, M., Zanetti, S., 2015. Long-term oscillations in rainfall extremes in a 268 year daily time series. Water Resources Research 51, 639–647.
Markonis, Y., Koutsoyiannis, D., 2016. Scale-dependence of persistence in precipitation records. Nature Climate Change 6, 399.
Menne, M.J., Durre, I., Vose, R.S., Gleason, B.E., Houston, T.G., 2012. An Overview of the Global Historical Climatology Network-Daily Database. J. Atmos. Oceanic Technol. 29, 897–910. https://doi.org/10.1175/JTECHD1100103.1
Merz, B., Nguyen, V.D., Vorogushyn, S., 2016. Temporal clustering of floods in Germany: Do flood-rich and flood-poor periods exist? Journal of Hydrology 541, 824–838.
Montanari, A., 2003. Long-range dependence in hydrology. Theory and applications of long-range dependence 461–472.
Montanari, A., Koutsoyiannis, D., 2014. Modeling and mitigating natural hazards: Stationarity is immortal! Water Resources Research 50, 9748–9756.
Ntegeka, V., Willems, P., 2008. Trends and multidecadal oscillations in rainfall extremes, based on a more than 100-year time series of 10 min rainfall intensities at Uccle, Belgium. Water Resources Research 44.
O’Connell, P.E., Koutsoyiannis, D., Lins, H.F., Markonis, Y., Montanari, A., Cohn, T., 2016. The scientific legacy of Harold Edwin Hurst (1880–1978). Hydrological Sciences Journal 61, 1571–1590.
Papoulis, A., 1991. Probability, Random Variables, and Stochastic Processes, 3rd ed. McGraw-Hill, New York.
Roy, T., Dimitriadis, P., Iliopoulou, T., Koutsoyiannis, D., 2019. A probabilistic Intensity-Duration-Frequency framework considering temporal dependence (in preparation).
Serinaldi, F., 2013. On the relationship between the index of dispersion and Allan factor and their power for testing the Poisson assumption. Stochastic Environmental Research and Risk Assessment 27, 1773–1782.
Serinaldi, F., Kilsby, C.G., 2018. Unsurprising Surprises: The Frequency of Record-breaking and Over-threshold Hydrological Extremes Under Spatial and Temporal Dependence. Water Resources Research 54, 6460–6487.
Serinaldi, F., Kilsby, C.G., 2016. Understanding persistence to avoid underestimation of collective flood risk. Water 8, 152.
Serinaldi, F., Kilsby, C.G., 2013. On the sampling distribution of Allan factor estimator for a homogeneous Poisson process and its use to test inhomogeneities at multiple scales. Physica A: Statistical Mechanics and its Applications 392, 1080–1089.
Tegos, A., Tyralis, H., Koutsoyiannis, D., Hamed, K., 2017. An R function for the estimation of trend significance under the scaling hypothesis-application in PET parametric annual time series. Open Water Journal 4, 6.
Telesca, L., Cuomo, V., Lapenna, V., Macchiato, M., 2002. On the methods to identify clustering properties in sequences of seismic time-occurrences. Journal of Seismology 6, 125–134.
31
Thurner, S., Lowen, S.B., Feurstein, M.C., Heneghan, C., Feichtinger, H.G., Teich, M.C., 1997. Analysis, synthesis, and estimation of fractal-rate stochastic point processes. Fractals 5, 565–595.
Tye, M.R., Katz, R.W., Rajagopalan, B., 2019. Climate change or climate regimes? Examining multi-annual variations in the frequency of precipitation extremes over the Argentine Pampas. Climate Dynamics 53(1–2), 245–260. doi:10.1007/s00382-018-4581-9.
Tyralis, H., Dimitriadis, P., Koutsoyiannis, D., O’Connell, P.E., Tzouka, K., Iliopoulou, T., 2018. On the long-range dependence properties of annual precipitation using a global network of instrumental measurements. Advances in Water Resources 111, 301–318.
Tyralis, H., Koutsoyiannis, D., 2011. Simultaneous estimation of the parameters of the Hurst–Kolmogorov stochastic process. Stochastic Environmental Research and Risk Assessment 25, 21–33.
Vitolo, R., Stephenson, D.B., Cook, I.M., Mitchell-Wallace, K., 2009. Serial clustering of intense European storms. Meteorologische Zeitschrift 18, 411–424.
Volpi, E., Fiori, A., Grimaldi, S., Lombardo, F., Koutsoyiannis, D., 2015. One hundred years of return period: Strengths and limitations. Water Resources Research 51, 8570–8585.
Volpi, E., et al., 2019. Save hydrological observations! Return period estimation without data decimation. Journal of Hydrology 571, 782–792. doi:10.1016/j.jhydrol.2019.02.017.
Willems, P., 2013. Adjustment of extreme rainfall statistics accounting for multidecadal climate oscillations. Journal of Hydrology 490, 126–133.
Figures
Figure 1. Map of the 60 stations with longest records used in the analysis.
Figure 2. Explanatory graph of mathematical formulation. (a) Parent timeseries, (b) POT series, (c) temporal distribution of counts of POT at basic scale k=1, (d) temporal distribution of counts of POT occurrences at scale k=10 and (e) block maxima series at scale k=10.
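
The quantities named in the Figure 2 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the use of an empirical quantile as the POT threshold, the use of non-overlapping windows, and the gamma white-noise test series are all hypothetical choices made here. The last function estimates the minus natural logarithm of the non-exceedance probability at scale k as the fraction of windows that contain no POT event.

```python
import numpy as np

def pot_indicator(x, exceed_prob):
    """Mark values above the empirical (1 - exceed_prob) quantile with 1, others with 0."""
    threshold = np.quantile(x, 1.0 - exceed_prob)
    return (np.asarray(x) > threshold).astype(int)

def block_reshape(x, k):
    """Trim x to a multiple of k and split it into non-overlapping blocks of length k."""
    x = np.asarray(x)
    n_blocks = len(x) // k
    return x[: n_blocks * k].reshape(n_blocks, k)

def pot_counts(indicator, k):
    """Number of POT occurrences in each block of length k (panels c and d)."""
    return block_reshape(indicator, k).sum(axis=1)

def block_maxima(x, k):
    """Maximum of the parent series in each block of length k (panel e)."""
    return block_reshape(x, k).max(axis=1)

def neg_log_nonexceedance(indicator, k):
    """-ln of the empirical probability that a block of length k has no POT event."""
    p0 = np.mean(pot_counts(indicator, k) == 0)
    return -np.log(p0) if p0 > 0 else np.inf

# Assumed demo input: independent gamma noise, 150 years of daily values.
rng = np.random.default_rng(0)
series = rng.gamma(shape=0.1, scale=5.1, size=150 * 365)
events = pot_indicator(series, exceed_prob=0.05)
for k in (1, 10, 100):
    print(k, pot_counts(events, k).mean(), neg_log_nonexceedance(events, k))
```

For an independent series with exceedance probability p, the probability of an empty window of length k is (1-p)^k, so the last quantity should grow roughly linearly with k; departures from that line are what a clustering diagnostic of this kind looks for.
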
Figure 3. Visualization of three timeseries with H=0.8 and different marginal distributions generated from the 4-moment SMA scheme (Dimitriadis and Koutsoyiannis, 2018). The legends report the mean, standard deviation, coefficient of skewness and coefficient of kurtosis of each distribution.
Figure 4. H parameters estimated from block maxima series at increasing scale of filtering for (a) benchmark series of length 10^6 from HK models with H=0.8 following normal and type-Pareto distributions and (b) average H values from 10^3 Monte Carlo simulations for HK models with H=0.7 and three different marginal distributions, type-gamma, type-Pareto and normal.
Figure 5. Index of dispersion of POT occurrences versus scale (double logarithmic axes) and estimated H parameters for scales > 500 for (a) benchmark series of length 10^6 from HK models with theoretical H=0.8 following normal and type-Pareto distributions and (b) average values from 10^3 Monte Carlo simulations for HK models with theoretical H=0.7 and three different marginal distributions, type-gamma, type-Pareto and normal.
Figure 6. Minus natural logarithm of non-exceedance probability versus scale (NEPvS) index on double logarithmic axes along with the fit of the proposed model (Eq. 2) for (a) benchmark non-Gaussian timeseries (type-gamma and type-Pareto) and (b) benchmark normal timeseries, for a range of H parameters.
Figure 7. Minus natural logarithm of non-exceedance probability versus scale (NEPvS) index on double logarithmic axes for white noise timeseries and two sample lengths, 150×365 and 300×365.
Figure 8. Minus natural logarithm of non-exceedance probability versus scale (NEPvS) index on double logarithmic axes for white noise timeseries (length 150×365) and variations of the sampling threshold of extremes.
Figure 9. (a) Parameter η variation for increasing H parameter and different combinations of the sampling threshold and distribution type. (b) Parameter ξ variation for increasing H parameter and different combinations of the sampling threshold and distribution type.
Figure 10. Minus natural logarithm of non-exceedance probability versus scale (NEPvS) index on double logarithmic axes along with 95% MCPL for (a) H=0.7 with type-gamma (α=0.1) and type-Pareto (α=0.2), and white noise and (b) H=0.7 for two type-gamma distributions with α=0.1 and α=0.01.
Figure 11. Minus natural logarithm of non-exceedance probability versus scale (NEPvS) index on double logarithmic axes for white noise timeseries and seasonal and deseasonalized series by methods 1 (M1) and 2 (M2) for the stations of Oxford (a), Athens (b) and Helsinki (c).
Figure 12. Empirical climacograms of the 60 daily rainfall series used in the analysis along with theoretical lines for H=0.5, 0.6, 0.7, 0.8.
Figure 13. Minus natural logarithm of non-exceedance probability versus scale (NEPvS) index on double logarithmic axes for deseasonalized series for the 28 rainfall records in the Netherlands along with 95% MCPL of the fitted model with H=0.7, for four different thresholds: (a) 10%, (b) 5%, (c) 1% and (d) 0.5%.
Figure 14. Minus natural logarithm of non-exceedance probability versus scale (NEPvS) index on double logarithmic axes for the deseasonalized series of Stykkisholmur in Iceland along with 95% MCPL of the fitted models with H=0.65 and H=0.7, for four different thresholds: (a) 10%, (b) 5%, (c) 1% and (d) 0.5%.
Figure 15. Boxplots of (a) parameter η, (b) parameter ξ and (c) RMSE from the fitting of the model to the seasonal and deseasonalized series by M1 for three different thresholds (1%, 5% and 10%).
Tables
Table 1 Properties of the benchmark samples used in the experiments.
Distribution   Parameters                   Mean   Variance   Skewness   Kurtosis   H          Length
type           Shape   Scale    Location
Normal         -       2.6      1.25        1.25   2.6        0          3          0.5–0.99   10^6
Gamma          0.1     5.1      -           0.51   2.6        6.325      63         0.5–0.99   10^6
Gamma          0.01    16.125   -           0.16   2.6        20         603        0.5–0.99   10^6
Pareto         0.1     1        0           1.11   1.54       2.81       17.83      0.5–0.99   10^6
Pareto         0.2     1        0           1.25   2.6        4.65       73.8       0.5–0.99   10^6
Table 2 Summary statistics (first and third quartiles, Q1 and Q3, mean and standard deviation, St.Dev.) of the properties of the rainfall dataset. Mean, Variance, Skewness and Kurtosis are estimated for the wet record.
Statistic   Mean   Variance   Skewness   Kurtosis   Prob. Dry   Hdaily   Hannual   Years    Missing %
Q1          3.68   24.85      2.9        17.28      0.47        0.56     0.55      153      0.75
Mean        4.98   64.85      3.39       24.03      0.55        0.63     0.67      169.25   2.62
Q3          5.91   64.64      3.54       25.85      0.61        0.7      0.77      173      1.31
St.Dev.     2.27   94.15      0.72       10.94      0.11        0.09     0.13      24.66    5.11
Appendix I
Figure A1. Plots of η and ξ parameters versus the H parameter and polynomial fitting for the (a) type-Pareto with α=0.1, (b) type-Pareto with α=0.2, (c) type-gamma with α=0.1 and (d) type-gamma with α=0.01.
Figure A2. Plots of η and ξ parameters versus the H parameter and polynomial fitting for the normal distribution.
Figure A3. Plots of η and ξ parameters versus the H parameter for the type-Pareto with α=0.1 and α=0.2, type-gamma with α=0.1 and α=0.01 and the normal.