Accepted for publication in Water Resources Research
On the exact distribution of correlated extremes in hydrology

F. Lombardo1,2, F. Napolitano1, F. Russo1, and D. Koutsoyiannis1,3

1 Dipartimento di Ingegneria Civile, Edile e Ambientale, Sapienza Università di Roma, Via Eudossiana, 18 00184 Rome, Italy.
2 Corpo Nazionale dei Vigili del Fuoco, Ministero dell’Interno, Piazza del Viminale, 1 00184 Rome, Italy.
3 Department of Water Resources and Environmental Engineering, National Technical University of Athens, Heroon Polytechneiou 5, GR-157 80 Zographou, Greece.

Corresponding author: Federico Lombardo (federico.lombardo@uniroma1.it)

Key Points:
  • We propose a non-asymptotic closed-form distribution for dependent maxima.
  • We introduce a new efficient generator of Markov chains with arbitrary marginals.
  • We contribute to developing more reliable data-rich-based analyses of extreme values.
Abstract

The analysis of hydrological hazards usually relies on asymptotic results of extreme value theory (EVT), which commonly deals with block maxima (BM) or peaks over threshold (POT) data series. However, data quality and quantity of BM and POT hydrological records do not usually fulfill the basic requirements of EVT, thus making its application questionable and results prone to high uncertainty and low reliability. An alternative approach to better exploit the available information of continuous time series and non-extreme records is to build the exact distribution of maxima (i.e., non-asymptotic extreme value distributions) from a sequence of low-threshold POT. Practical closed-form results for this approach exist only for independent high-threshold POT series with Poisson occurrences. This study introduces new closed-form equations of the exact distribution of maxima taken from low-threshold POT with magnitudes characterized by an arbitrary marginal distribution and first-order Markovian dependence, and negative binomial occurrences. The proposed model encompasses and generalizes the independent-Poisson model and allows for analyses relying on significantly larger samples of low-threshold POT values exhibiting dependence, temporal clustering and overdispersion. To check the analytical results, we also introduce a new generator (called Gen2Mp) of proper first-order Markov chains with arbitrary marginal distributions. An illustrative application to long-term rainfall and streamflow data series shows that our model for the distribution of extreme maxima under dependence takes a step forward in developing more reliable data-rich-based analyses of extreme values.

1 Introduction
The study of hydrological extremes has a long history in research applied to the design and management of water supply (e.g. Hazen, 1914) and flood protection works (e.g. Fuller, 1914). Almost half a century after the first pioneering empirical studies, Gumbel (1958) provided a general framework linking the theoretical properties of probabilities of extreme values (e.g. Fisher and Tippett, 1928) to the empirical basis of hydrological frequency curves. Since then, extreme value theory (EVT) applied to hydrological analyses has been a matter of primary concern in the literature (see e.g. Papalexiou and Koutsoyiannis, 2013; Serinaldi and Kilsby, 2014 for a detailed overview). EVT aims at modeling the extremal behavior of observed phenomena by asymptotic probability distributions, and observations to which such distributions are allegedly related should meet the following important conditions:
1. They should resemble samples of independent and identically distributed (i.i.d.) random variables. Then, extreme events arise from a stationary distribution and are independent of one another.
2. Their number should be large. How large depends on the characteristics of the parent distribution from which the extreme values are taken (e.g. the tail behavior) and the degree of precision we seek.
Most of these assumptions, commonly made in classical statistical analyses, are hardly ever realized in hydrological applications, especially when studying extremes. Specifically, the traditional analysis of hydrological extremes is based on statistical samples that are formed by selecting from the entire data series (e.g. at the daily scale) those values that can reasonably be considered as realizations of independent extremes, e.g. annual maxima or peaks over a certain high threshold. Thus, many observations are discarded and the reduction of the already small size of common hydrological records significantly affects the reliability of the estimates (Koutsoyiannis, 2004a,b; Volpi et al., 2019). In addition, Koutsoyiannis (2004a) showed that the convergence to the asymptotic distributions can be extremely slow and may require a huge number of events. Thus, a typical number of extreme hydrological events does not guarantee convergence in applications.
Furthermore, the long-term behavior of the hydrological cycle and its driving forces provide the context to understand that correlations between hydrological samples not only occur, but can also persist for a long time (see O’Connell et al., 2016 for a recent review). While Leadbetter (1974, 1983) demonstrated that distributions based on dependent events (with limited long-term persistence at extreme levels) share the same asymptotic properties as distributions based on independent trials, there is evidence that correlation has a strong influence on the exact statistical properties of extreme values and slows down the already slow rate of convergence (e.g. Eichner et al., 2011; Bogachev and Bunde, 2012; Volpi et al., 2015; Serinaldi and Kilsby, 2016). In essence, correlation inflates the variability of the expected values and the width of confidence intervals (CIs) due to information redundancy, and a typical effect is reflected in the tendency of hydrological extremes to cluster in space and time (e.g. Serinaldi and Kilsby, 2018 and references therein). Moreover, focusing on extreme data values, such as annual maxima, hinders reliable retrieval of the dependence structure characterizing the underlying process because of sampling effects of data selection (Serinaldi et al., 2018; Iliopoulou and Koutsoyiannis, 2019). Then, correlation structures and variability of hydrological processes might easily be underestimated, further compromising the attempt to draw conclusions about trends spanning the period of records (see Serinaldi et al., 2018, for a detailed discussion). In other words, the lately growing body of publications examining “nonstationarity” in hydrological extremes (see Salas et al., 2018 and references therein) may likely reflect time dependence of such extremes within a stationary setting, as observed patterns are usually compatible with stationary correlated random processes (Koutsoyiannis and Montanari, 2015; Luke et al., 2017; Serinaldi and Kilsby, 2018).
In classical statistical analyses of hydrological extremes, to form data samples we commonly use two alternative strategies referred to as “block maxima” (BM) and “peaks over threshold” (POT) methods. The former is to choose the highest of all recorded values in each year (for a given time scale, e.g. daily rainfall) and form a sample with size equal to the number of years of the record. The POT method is to form a sample with all recorded values exceeding a certain threshold irrespective of the year they occurred, which increases the available information by using more than one extreme value per year (Coles, 2001; Claps and Laio, 2003).
The fact that observed hydrological extremes tend to cluster in time strengthens the arguments for the POT sampling method over block maxima approaches, which tend to hide dependence (Iliopoulou and Koutsoyiannis, 2019). Such clustering reflects dependence (at least) in the neighboring excesses of a threshold, invalidating the basic assumption of independence made in classical POT analyses. Therefore, the standard approach in case studies is to fix a (somewhat subjective) high threshold, and then filter the clusters of exceedances so as to obtain a set of observations that can be considered mutually independent. Such a declustering procedure involves using empirical rules to define clusters (e.g. setting a run length that represents a minimum timespan between consecutive clusters, meaning that a cluster ends when the separation between two consecutive threshold exceedances is greater than the fixed run length) and then selecting only the maximum excess within each cluster (Coles, 2001; Ferro and Segers, 2003; Bernardara et al., 2014; Bommier, 2014). Declustering results in a significant loss of data that can potentially provide additional information about extreme values.
In this paper, we aim to overcome these problems by investigating the exact distribution of correlated extremes. Hence, we can set considerably lower thresholds with respect to the standard POT analyses and avoid declustering procedures whose effectiveness is called into question if we do not account for the process characteristics. The proposed approach provides new insight into probabilistic methods devised for extreme value analysis taking into account the clustering dynamics of extremes, and it is consistent with the general principle of allowing maximal use of information (Volpi et al., 2019).
In summary, hydrological applications have made wide recourse to asymptotes or limiting extreme value distributions, while exact distributions for real-world finite-size samples are barely used in stochastic hydrology because their evaluation requires the parent distribution to be known. However, the small size of common hydrological records (e.g. a few tens of years) and the impact of correlations on the information content of observed extremes cannot provide sufficient empirical evidence to estimate limiting extreme value distributions with precision. Therefore, we believe that non-asymptotic analytical models for extremes arising from correlated processes should receive renewed research interest (Iliopoulou and Koutsoyiannis, 2019).
This paper is concerned with a theoretical approach to the exact distribution of high extremes based on the pioneering work by Todorovic and Zelenhasic (1970), who proposed a general stationary stochastic model to describe and predict the behavior of the maximum term among a random number of random variables in an interval of time $(0, t]$ assuming independence. As verified in several studies mentioned above, to make a realistic stochastic model of hydrological processes, we are forced to confront the fact that dependence should necessarily be taken into consideration. The dilemma is that dependence structures make for realistic models, but also reduce the possibility for explicit probability calculations (i.e., analytical derivations of joint probability distributions are more complicated than under independence). The challenge of this paper is to propose a stochastic model of extremes with dependencies allowing for acceptable realism, but also permitting sufficient mathematical tractability. In this context, short-range dependence structures, such as Pólya’s and Markov’s schemes, offer a good trade-off between these two demands, when hydrological maxima satisfy Leadbetter’s condition of the absence of long-range dependence (Koutsoyiannis, 2004a).
In the remainder of this paper, we first introduce a novel theoretical framework to model the exact distribution of correlated extremes in Section 2. In Section 3, we present a new generator, called Gen2Mp, of correlated processes with arbitrary marginal distributions and Markovian dependence, and use it to validate the theoretical reasoning described in Section 2. Then, Section 4 deals with case studies in order to test the capability of our model to reproduce the statistical behavior of extremes of long-term rainfall and streamflow time series from the real world. Concluding remarks are reported in Section 5.

2 Theoretical framework
We use herein the POT approach to analyze the extreme maxima, and assume the number of peaks (e.g., flood peak discharges or maximum rainfall depths) exceeding a certain threshold $u$ and their magnitudes to be random variables. The threshold simplifies the study and helps focus the attention on the distribution tails, as they are important to know in engineering design (Papalexiou et al., 2013). In the following, we use upper case letters for random variables or distribution functions, and lower case letters for values, parameters or constants.
If we consider only those peaks $Y_i$ in $(0, t]$ exceeding $u$, then we can define the strictly positive random variable

$$Z_i = Y_i - u > 0 \tag{1}$$

for all $i = 1, 2, \ldots, N$, where $N$ is the number of exceedances in $(0, t]$. Clearly, $N$ is a non-increasing function of $u$ for a given $t$, but we assume herein that $u$ is a fixed constant.
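It is recalled from probability theory that, given a fixed number $n$ of i.i.d. random variables $Z_1, \ldots, Z_n$ with common parent distribution $F(x) = \Pr\{Z_i \le x\}$, the largest order statistic $X = \max(Z_1, \ldots, Z_n)$ has a probability distribution fully dependent on the joint distribution function of $Z_i$, that is

$$H_n(x) = \Pr\{Z_1 \le x, \ldots, Z_n \le x\} = (F(x))^n \tag{2}$$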
In hydrological applications, it may be assumed that the number of values of $Z_i$ in $(0, t]$ (e.g. the number of storms or floods per year), whose maximum $X$ is the variable of interest (e.g. the maximum rainfall depth or flood discharge), is not constant but is a realization of a random variable $N$. Therefore, we are interested in the maximum term $X$ among a random number $N$ of a sequence of random variables $Z_i$ in an interval of time $(0, t]$.
In the following, we attempt to determine the one-dimensional distribution function of $X$ that is defined as $H(x) = \Pr\{X \le x\}$. Since the magnitude of exceedances $Z_i$ and their number $N$ are supposed to be random variables, Todorovic (1970) derived the distribution of the extreme maximum of such a particular class of stochastic processes as

$$H(x) = \Pr\{N = 0\} + \sum_{k=1}^{\infty} \Pr\left\{ \bigcap_{i=1}^{k} \{Z_i \le x\} \cap \{N = k\} \right\} \tag{3}$$

which represents the probability that all exceedances $Z_i$ in $(0, t]$ are less than or equal to $x$. If $x = 0$, then $H(0) = \Pr\{N = 0\}$ is the probability that there are no exceedances in $(0, t]$.
Todorovic and Zelenhasic (1970) proposed the simplest form of the general model in eq. (3) for use in hydrological statistics, which is now the benchmark against which we measure frequency analysis of extreme events (e.g. Koutsoyiannis and Papalexiou, 2017). Its basic assumptions are that $Z_i$ is a sequence of independent random variables with common parent distribution $F(x) = \Pr\{Z_i \le x\}$, and $N$ is a Poisson-distributed random variable independent of $Z_i$ with mean $\lambda$, i.e. $\Pr\{N = k\} = \lambda^k e^{-\lambda} / k!$. Then, recalling that $e^{y} = \sum_{k=0}^{\infty} y^k / k!$, eq. (3) becomes

$$H(x) = \sum_{k=0}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!} (F(x))^k = \exp\left(-\lambda \left(1 - F(x)\right)\right) \tag{4}$$

It can be shown that $H(x) \approx (F(x))^{\lambda}$ with satisfactory approximation (Koutsoyiannis, 2004a).
As stated above, the derivation of eq. (4) includes strong assumptions, such as independence, and the purpose of this paper is to modify and test this equation under suitable dependence conditions.
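
The independent-Poisson result in eq. (4) can be checked numerically. The following is a minimal sketch, assuming that `F_x` stands for the value of the parent distribution at the level of interest and `lam` for the Poisson mean; the function names and the test values are illustrative choices, not taken from the paper. It sums the series of eq. (3) under independence and compares it with the closed form and with the power-law approximation.

```python
import math


def h_closed_form(F_x, lam):
    """Closed form of eq. (4): exp(-lam * (1 - F(x)))."""
    return math.exp(-lam * (1.0 - F_x))


def h_series(F_x, lam, kmax=200):
    """Truncated series of eq. (3) for independent magnitudes and Poisson N.

    Each term is Pr{N = k} * F(x)**k; terms are built recursively to avoid
    computing large factorials.
    """
    total = 0.0
    term = math.exp(-lam)          # k = 0 term: Pr{N = 0}
    for k in range(kmax):
        total += term
        term *= lam * F_x / (k + 1)
    return total


if __name__ == "__main__":
    for F_x in (0.5, 0.9, 0.99):
        lam = 5.0
        print(F_x, h_series(F_x, lam), h_closed_form(F_x, lam), F_x ** lam)
```

The last printed column, `F_x ** lam`, approaches the closed form as `F_x` tends to 1, which is the regime of interest for extremes.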
Firstly, we suppose that $Z_i$ is a sequence of random variables with common parent distribution $F(x) = \Pr\{Z_i \le x\}$ and a particular Markovian dependence that gives rise to the two-state Markov-dependent process (2Mp, see next Section for further details). Specifically, we let the occurrences of the event $\{Z_i \le x\}$ evolve according to a Markov chain with two states, whose probabilities are:

$$\Pr\{Z_i \le x\} = F(x), \qquad \Pr\{Z_i > x\} = 1 - F(x) \tag{5}$$

and the transition probabilities (see also Lombardo et al., 2017, appendix C) are:

$$\begin{aligned}
\Pr\{Z_i \le x \mid Z_{i-1} \le x\} &= F(x) + \rho_1 \left(1 - F(x)\right) \\
\Pr\{Z_i > x \mid Z_{i-1} \le x\} &= \left(1 - \rho_1\right)\left(1 - F(x)\right) \\
\Pr\{Z_i \le x \mid Z_{i-1} > x\} &= \left(1 - \rho_1\right) F(x) \\
\Pr\{Z_i > x \mid Z_{i-1} > x\} &= 1 - F(x) + \rho_1 F(x)
\end{aligned} \tag{6}$$

where $\rho_1$ is the lag-one autocorrelation coefficient of the Markov chain.
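
As a small illustration of a two-state chain of this kind, the sketch below builds the four transition probabilities from the stationary probability of the state "below the level" and the lag-one autocorrelation of the chain. The parameterization is the standard one for a stationary two-state Markov chain; the names `p_below` and `rho1` are assumptions introduced here for illustration.

```python
def two_state_transitions(p_below, rho1):
    """Transition probabilities of a stationary two-state Markov chain.

    p_below : stationary probability of the state {Z <= x}
    rho1    : lag-one autocorrelation coefficient of the chain
    Keys are (previous state, current state).
    """
    p_above = 1.0 - p_below
    return {
        ("below", "below"): p_below + rho1 * p_above,
        ("below", "above"): (1.0 - rho1) * p_above,
        ("above", "below"): (1.0 - rho1) * p_below,
        ("above", "above"): p_above + rho1 * p_below,
    }


if __name__ == "__main__":
    t = two_state_transitions(0.8, 0.5)
    # Stationarity: the probability of "below" is preserved after one step.
    print(0.8 * t[("below", "below")] + 0.2 * t[("above", "below")])
    # Lag-one autocorrelation recovered from the transition probabilities.
    print(t[("below", "below")] - t[("above", "below")])
```

With `rho1 = 0` every row of the table equals the stationary probabilities, i.e. the chain reduces to independent trials.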
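It follows that, for the process $Z_i$, the probability of the state $\{Z_i \le x\}$ at a given time $i$ depends solely on the state $\{Z_{i-1} \le x\}$ at the previous time step $i - 1$. Then, for a fixed number of exceedances $N = n$, the Markov property yields:

$$\Pr\{Z_n \le x \mid Z_{n-1} \le x, \ldots, Z_1 \le x\} = \Pr\{Z_n \le x \mid Z_{n-1} \le x\} \tag{7}$$

Applying the chain rule of probability theory to the distribution function of the maximum term $X$, $H_n(x) = \Pr\{Z_1 \le x, \ldots, Z_n \le x\}$, we obtain

$$H_n(x) = \Pr\{Z_1 \le x\} \prod_{i=2}^{n} \Pr\{Z_i \le x \mid Z_{i-1} \le x\} \tag{8}$$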
From the above it follows that $H_n(x)$ can be determined in terms of the conditional probabilities $\Pr\{Z_i \le x \mid Z_{i-1} \le x\}$ and the parent univariate distribution function $F(x)$. As the random variables $Z_i$ are identically distributed, they correspond to a stationary stochastic process, and then the function $\Pr\{Z_i \le x \mid Z_{i-1} \le x\}$ is invariant to a shift of the origin. In this case, $H_n(x)$ is determined in terms of the second-order (bivariate) distribution $F_2(x, x) = \Pr\{Z_i \le x, Z_{i-1} \le x\}$ and the first-order (univariate) parent distribution $F(x)$. Indeed, from eq. (8) we obtain

$$H_n(x) = F(x) \left( \frac{F_2(x, x)}{F(x)} \right)^{n-1} \tag{9}$$

It can be easily shown that eq. (9) reduces to eq. (2) in case of independence, i.e. $F_2(x, x) = (F(x))^2$.
Secondly, we assume that exceedances $Z_i$ have positively correlated occurrences causing a larger variance than if they were independent, i.e. the occurrences are overdispersed with respect to a Poisson distribution, for which the mean is equal to the variance. Therefore, we assume that the random number $N$ of occurrences in a specific interval of time $(0, t]$ follows the negative binomial distribution (e.g. Calenda et al., 1977; Eastoe and Tawn, 2010), which allows adjusting the variance independently of the mean. The negative binomial distribution (known as the limiting form of the Pólya distribution, cf. Feller, 1968, p. 143) is a compound probability distribution that results from assuming that the random variable $N$ is distributed according to a Poisson distribution whose mean $\lambda$ varies randomly following a gamma distribution with shape parameter $\alpha > 0$ and scale parameter $\beta > 0$, so that its density is

$$g(\lambda) = \frac{\lambda^{\alpha - 1} e^{-\lambda / \beta}}{\beta^{\alpha}\, \Gamma(\alpha)} \tag{10}$$

Then, the probability distribution function of $N$ conditional on $\lambda$ is

$$\Pr\{N = k \mid \lambda\} = \frac{\lambda^k e^{-\lambda}}{k!} \tag{11}$$

We can derive the unconditional distribution of $N$ by marginalizing over the distribution of $\lambda$, i.e., by integrating out the unknown parameter $\lambda$ as

$$\Pr\{N = k\} = \int_0^{\infty} \Pr\{N = k \mid \lambda\}\, g(\lambda)\, d\lambda \tag{12}$$

Substituting eqs. (10) and (11) into eq. (12), we have

$$\Pr\{N = k\} = \frac{1}{k!\, \beta^{\alpha}\, \Gamma(\alpha)} \int_0^{\infty} \lambda^{\alpha + k - 1} e^{-\lambda (1 + 1/\beta)}\, d\lambda \tag{13}$$

Recalling that the gamma function is defined as $\Gamma(z) = \int_0^{\infty} y^{z-1} e^{-y}\, dy$, then multiplying and dividing eq. (13) by $(1 + 1/\beta)^{\alpha + k}$ and integrating by substitution, we obtain after algebraic manipulations

$$\Pr\{N = k\} = \frac{\Gamma(\alpha + k)}{k!\, \Gamma(\alpha)} \frac{\beta^k}{(1 + \beta)^{\alpha + k}} \tag{14}$$
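
A quick way to see that a gamma-mixed Poisson count is overdispersed is to evaluate its probability mass function and moments numerically. The sketch below assumes the shape/scale parameterization of the gamma mixing density (written here as `alpha` and `beta`, names chosen for illustration), for which the mixture has mean `alpha * beta` and variance `alpha * beta * (1 + beta)`.

```python
import math


def negbin_pmf(k, alpha, beta):
    """Pr{N = k} for a Poisson count whose mean is gamma(shape=alpha, scale=beta)."""
    log_p = (math.lgamma(alpha + k) - math.lgamma(alpha) - math.lgamma(k + 1)
             + k * math.log(beta) - (alpha + k) * math.log1p(beta))
    return math.exp(log_p)


def negbin_moments(alpha, beta, kmax=5000):
    """Total probability, mean and variance from a truncated sum over k."""
    total = mean = second = 0.0
    for k in range(kmax):
        p = negbin_pmf(k, alpha, beta)
        total += p
        mean += k * p
        second += k * k * p
    return total, mean, second - mean ** 2


if __name__ == "__main__":
    alpha, beta = 2.5, 1.5
    total, mean, var = negbin_moments(alpha, beta)
    print(total, mean, var)                       # variance exceeds the mean
    print(alpha * beta, alpha * beta * (1 + beta))
```

The variance-to-mean ratio equals `1 + beta`, so the Poisson case is approached as `beta` tends to zero with `alpha * beta` held fixed.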
To summarize, we specialize the general model in eq. (3) for the following conditions:
1. $Z_i$ is a sequence of correlated random variables with 2Mp dependence and common parent distribution $F(x)$.
2. $N$ is a negative binomial random variable independent of $Z_i$ with mean $\alpha\beta$ and variance $\alpha\beta(1 + \beta)$.
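Under the above assumptions, from eq. (3) we can derive the conditional distribution function of the maximum $X$ as

$$H(x \mid \lambda) = \Pr\{N = 0 \mid \lambda\} + \sum_{k=1}^{\infty} \Pr\left\{ \bigcap_{i=1}^{k} \{Z_i \le x\} \right\} \Pr\{N = k \mid \lambda\} \tag{15}$$

where for $Z_i$ of 2Mp

$$\Pr\left\{ \bigcap_{i=1}^{k} \{Z_i \le x\} \right\} = F(x) \left( \frac{F_2(x, x)}{F(x)} \right)^{k-1} \tag{16}$$

Substituting eqs. (11) and (16) in eq. (15), we obtain

$$H(x \mid \lambda) = e^{-\lambda} + \frac{(F(x))^2}{F_2(x, x)} \sum_{k=1}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!} \left( \frac{F_2(x, x)}{F(x)} \right)^{k} \tag{17}$$

Then, adding and subtracting the term $\frac{(F(x))^2}{F_2(x, x)} e^{-\lambda}$ yields

$$H(x \mid \lambda) = \left( 1 - \frac{(F(x))^2}{F_2(x, x)} \right) e^{-\lambda} + \frac{(F(x))^2}{F_2(x, x)} \sum_{k=0}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!} \left( \frac{F_2(x, x)}{F(x)} \right)^{k} \tag{18}$$

and thus

$$H(x \mid \lambda) = \left( 1 - \frac{(F(x))^2}{F_2(x, x)} \right) e^{-\lambda} + \frac{(F(x))^2}{F_2(x, x)} \exp\left( -\lambda \left( 1 - \frac{F_2(x, x)}{F(x)} \right) \right) \tag{19}$$

which is the conditional distribution function of the maximum term $X$ among a Poisson-distributed random number $N$ with gamma-distributed mean $\lambda$ of 2Mp random variables $Z_i$ in an interval of time $(0, t]$. It can be shown that eq. (4) is easily recovered assuming independence, i.e. $F_2(x, x) = (F(x))^2$ and $\lambda$ is a fixed constant.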
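The unconditional distribution of $X$ is derived by substituting eqs. (14) and (16) into eq. (3) as follows

$$H(x) = (1 + \beta)^{-\alpha} + \frac{(F(x))^2}{F_2(x, x)} \sum_{k=1}^{\infty} \frac{\Gamma(\alpha + k)}{k!\, \Gamma(\alpha)} \frac{\beta^k}{(1 + \beta)^{\alpha + k}} \left( \frac{F_2(x, x)}{F(x)} \right)^{k} \tag{20}$$

Then, adding and subtracting the term $\frac{(F(x))^2}{F_2(x, x)} (1 + \beta)^{-\alpha}$ and denoting by $(\alpha)_k = \Gamma(\alpha + k) / \Gamma(\alpha)$ the Pochhammer’s symbol (Abramowitz and Stegun, 1972, p. 256) yields

$$H(x) = \left( 1 - \frac{(F(x))^2}{F_2(x, x)} \right) (1 + \beta)^{-\alpha} + \frac{(F(x))^2}{F_2(x, x)} (1 + \beta)^{-\alpha} \sum_{k=0}^{\infty} \frac{(\alpha)_k}{k!} \left( \frac{\beta}{1 + \beta} \frac{F_2(x, x)}{F(x)} \right)^{k} \tag{21}$$

Since $\left| \frac{\beta}{1 + \beta} \frac{F_2(x, x)}{F(x)} \right| < 1$ and $\alpha$ is a real number, then this series is known as a binomial series (Graham et al., 1994, p. 162), and, setting $z = \frac{\beta}{1 + \beta} \frac{F_2(x, x)}{F(x)}$, it converges to $(1 - z)^{-\alpha}$, thus

$$H(x) = \left( 1 - \frac{(F(x))^2}{F_2(x, x)} \right) (1 + \beta)^{-\alpha} + \frac{(F(x))^2}{F_2(x, x)} \left( 1 + \beta \left( 1 - \frac{F_2(x, x)}{F(x)} \right) \right)^{-\alpha} \tag{22}$$

which is the unconditional distribution of the extreme maximum $X$. The parameters of the model in eq. (22) are $\alpha$ and $\beta$ along with those of the models chosen for both the parent distribution, $F(x)$, and the bivariate distribution $F_2(x, x)$ (see Sect. 4 for further details).
In the case of independence, where $F_2(x, x) = (F(x))^2$, eq. (22) reduces to

$$H(x) = \left( 1 + \beta \left( 1 - F(x) \right) \right)^{-\alpha} \tag{23}$$

As shown in later examples and case studies, eq. (22) yields probabilities of non-exceedance that are systematically larger than those under independence, i.e. those given by eq. (23).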
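
The closed form for the maximum under negative binomial occurrences and first-order Markov dependence can be cross-checked against a direct summation of the series in eq. (3). The following is a minimal sketch, assuming a gamma shape `alpha` and scale `beta` for the mixing distribution, `F` and `F2` for the univariate and bivariate non-exceedance probabilities at a fixed level, and a joint non-exceedance probability of `F * (F2 / F)**(k - 1)` for `k` consecutive values. Function names and test values are illustrative.

```python
import math


def negbin_pmf(k, alpha, beta):
    """Pr{N = k} for a gamma(shape=alpha, scale=beta) mixed Poisson count."""
    log_p = (math.lgamma(alpha + k) - math.lgamma(alpha) - math.lgamma(k + 1)
             + k * math.log(beta) - (alpha + k) * math.log1p(beta))
    return math.exp(log_p)


def h_dependent(F, F2, alpha, beta):
    """Closed-form distribution of the maximum under Markov dependence."""
    w = F * F / F2
    q = F2 / F
    return (1.0 - w) * (1.0 + beta) ** (-alpha) + w * (1.0 + beta * (1.0 - q)) ** (-alpha)


def h_independent(F, alpha, beta):
    """Special case F2 = F**2 (independent magnitudes)."""
    return (1.0 + beta * (1.0 - F)) ** (-alpha)


def h_series(F, F2, alpha, beta, kmax=2000):
    """Direct truncated summation: Pr{N=0} + sum_k Pr{N=k} * F * (F2/F)**(k-1)."""
    q = F2 / F
    total = negbin_pmf(0, alpha, beta)
    for k in range(1, kmax):
        total += negbin_pmf(k, alpha, beta) * F * q ** (k - 1)
    return total


if __name__ == "__main__":
    F, rho1, alpha, beta = 0.9, 0.5, 2.0, 3.0
    F2 = F * (F + rho1 * (1.0 - F))    # bivariate probability of a two-state chain
    print(h_series(F, F2, alpha, beta), h_dependent(F, F2, alpha, beta))
    print(h_independent(F, alpha, beta))
```

For positively dependent magnitudes (`F2 > F**2`) the dependent value printed in the first row exceeds the independent one in the second row.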
3 Gen2Mp: An Algorithm to Simulate the Two-State Markov-Dependent Process (2Mp) with Arbitrary Marginal Distribution
To check the performance of our stochastic model for correlated extremes, we need to simulate a random process with any marginal distribution and Markovian dependence. First, however, we must clarify what the Markovian dependence refers to here. As stated in the previous Section, we assume that a Markov chain with two states (which may represent for example flood or no flood, dry or wet year, etc.) governs the excursions above/below any level $x$ (threshold) of the process $Z_i$ (see e.g. Fernández and Salas, 1999). We refer to this process as 2Mp (Volpi et al. 2015). For such a process, the Markov property is valid because the probability of the state $\{Z_i \le x\}$ at a given time $i$ depends solely on the state $\{Z_{i-1} \le x\}$ at the previous time step $i - 1$, i.e., $\Pr\{Z_i \le x \mid Z_{i-1} \le x, \ldots, Z_1 \le x\} = \Pr\{Z_i \le x \mid Z_{i-1} \le x\}$.
One can be tempted to use the classical AR(1) (first-order autoregressive) model to simulate the 2Mp. However, this is not appropriate in general, as we show in the following by a numerical experiment that provides insights into an effective simulation strategy. Let us define the random variable $M_j$ in such a way that for $j = 1, 2, \ldots$, it is

$$M_j = \max(Z_1, \ldots, Z_j) \tag{24}$$

Then, by definition of conditional probability, we may write e.g. for $j = 3$

$$\Pr\{Z_3 \le x \mid Z_2 \le x, Z_1 \le x\} = \frac{\Pr\{Z_1 \le x, Z_2 \le x, Z_3 \le x\}}{\Pr\{Z_1 \le x, Z_2 \le x\}} = \frac{\Pr\{M_3 \le x\}}{\Pr\{M_2 \le x\}} \tag{25}$$

In our case the Markov property yields

$$\Pr\{Z_3 \le x \mid Z_2 \le x, Z_1 \le x\} = \Pr\{Z_3 \le x \mid Z_2 \le x\} = \frac{\Pr\{M_2 \le x\}}{\Pr\{M_1 \le x\}} \tag{26}$$

where $\Pr\{Z_3 \le x \mid Z_2 \le x\} = \Pr\{Z_2 \le x \mid Z_1 \le x\}$ because $Z_i$ is stationary. From eqs. (25) and (26), it is easily understood that we seek a modelling framework for which the ratio $r_j = \Pr\{M_{j+1} \le x\} / \Pr\{M_j \le x\}$ should be constant for every $j$, depending solely on the value of the threshold $x$. In order to show that this is generally not valid for AR(1) processes, we compute such a ratio from a sequence of 100000 random numbers generated by a standard Gaussian AR(1) model with lag-one correlation equal to 0.85. In particular, we calculate four ratios ($r_j$, $j = 1, \ldots, 4$) for various threshold values ($x$) selected randomly over the entire range of the standard Gaussian distribution. Then, as the ratio values depend on the threshold, for each $x$ we “standardize” the results by taking the absolute difference between each ratio $r_j$ and its mean $\bar{r}$ computed over $j$, i.e. $|r_j - \bar{r}|$, then dividing all by $\bar{r}$; hence, we obtain the relative difference $\varepsilon_j = |r_j - \bar{r}| / \bar{r}$.
We seek a model with a particular Markovian dependence so that $\varepsilon_j = 0$ for all $j$ and $x$. In Fig. 1, we show the boxplots depicting the variability of $\varepsilon_j$ (percent) over all threshold values with $j = 1, \ldots, 4$. In the left panel, we display the results for the AR(1) model described above. It can be noted that the $\varepsilon_j$ values are not only significantly different from zero (especially if compared with the results shown in the right panel of Fig. 1, based on the simulation algorithm described below), but their variability also changes strongly with the index $j$. Then, we conclude that AR(1) models are not appropriate for our purposes. As shown later, despite sharing similar dependence structures (see Fig. 2), Gen2Mp outperforms AR(1) in terms of $\varepsilon_j$.
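
The AR(1) experiment described above can be sketched as follows. This is an illustrative reading, not the authors' code: the ratios are taken as successive quotients of the empirical probabilities that `j + 1` and `j` consecutive values all stay at or below a threshold, and the relative differences are the absolute deviations of the four ratios from their mean, divided by that mean. Function names, the seed and the thresholds are assumptions made for the example.

```python
import random


def ar1_gaussian(n, rho, seed=0):
    """Standard Gaussian AR(1) series with lag-one correlation rho."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0)]
    s = (1.0 - rho * rho) ** 0.5
    for _ in range(n - 1):
        z.append(rho * z[-1] + s * rng.gauss(0.0, 1.0))
    return z


def joint_nonexceedance(series, x, jmax):
    """Empirical Pr{j consecutive values <= x} for j = 1..jmax (sliding windows)."""
    counts = [0] * (jmax + 1)
    run = 0                                   # current run of values <= x
    for v in series:
        run = run + 1 if v <= x else 0
        for j in range(1, min(run, jmax) + 1):
            counts[j] += 1                    # a window of length j ends here
    n = len(series)
    return [counts[j] / (n - j + 1) for j in range(1, jmax + 1)]


def relative_differences(series, x, nratios=4):
    """Relative deviation of each ratio from the mean ratio (zero for a 2Mp).

    Very low thresholds can give empty counts and a division by zero.
    """
    probs = joint_nonexceedance(series, x, nratios + 1)
    ratios = [probs[j + 1] / probs[j] for j in range(nratios)]
    mean = sum(ratios) / nratios
    return [abs(r - mean) / mean for r in ratios]


if __name__ == "__main__":
    z = ar1_gaussian(100_000, 0.85)
    for x in (-1.0, 0.0, 1.0):
        print(x, [round(100 * e, 3) for e in relative_differences(z, x)])
```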
Figure 1. Box plots of four ($j = 1, \ldots, 4$) relative differences $\varepsilon_j = |r_j - \bar{r}| / \bar{r}$ for various threshold values ($x$) selected at random from the parent (standard Gaussian) distribution, where $r_j = \Pr\{Z_1 \le x, \ldots, Z_{j+1} \le x\} / \Pr\{Z_1 \le x, \ldots, Z_j \le x\}$ and $\bar{r} = \frac{1}{4} \sum_{j=1}^{4} r_j$. The red line inside each box is the median and the box edges are the 25th and 75th percentiles of the samples. The left panel depicts results for the AR(1) model, while the right panel shows boxplots of synthetic data from the Gen2Mp algorithm.
3.1 Description of the Gen2Mp simulation algorithm
We introduce herein a new generator, which enables the Monte Carlo materialization of a 2Mp with any arbitrary marginal distribution. It is worth stressing that the theoretical considerations discussed above result in a conceptually simple simulation algorithm, whose scheme consists of an iteration procedure with the following steps:
a) We start by generating two sequences $Z^{(1)}_i$ and $Z^{(2)}_i$ of independent random numbers with the same arbitrary distribution but conditional on being higher ($Z^{(1)}_i$) or lower ($Z^{(2)}_i$) than the median.
b) Then, we generate the series $B_i$ sampled from i.i.d. Bernoulli random variables taking values 1 and 0 with probability $p$ and $1 - p$, respectively.
c) The events $\{B_i = 1\}$ in the Bernoulli series determine the alternation between the two states of our target process, i.e. higher (state 1) and lower (state 2) than the median. In other words, the series $B_i$ determines the holding times before our process switches (jumps) from one state to the other, because we assume that the state remains the same up to the “time” $i$ when there comes a state change. We can now simulate the state-of-generation sequence $D_i$ taking values 1 when the state of our process is higher than the median (i.e., $Z_i = Z^{(1)}_i$) and 2 otherwise (i.e., $Z_i = Z^{(2)}_i$).
d) Consequently, the sequence $D_i$ is a sample of a Markov chain with state space $\{1, 2\}$. Since the holding times of each state are completely random, the state probabilities are $\Pr\{D_i = 1\} = \Pr\{D_i = 2\} = 0.5$. On the other hand, as the jumps arrive randomly according to the Bernoulli process, the transition probabilities are $\Pr\{D_i = 2 \mid D_{i-1} = 1\} = \Pr\{D_i = 1 \mid D_{i-1} = 2\} = p$ and $\Pr\{D_i = 1 \mid D_{i-1} = 1\} = \Pr\{D_i = 2 \mid D_{i-1} = 2\} = 1 - p$. Therefore, the dependence structure of $D_i$ is completely specified in terms of the lag-one autocorrelation coefficient $\rho_1 = 1 - 2p$ (see e.g. Lombardo et al., 2017).
e) We can now obtain the target correlated sequence $Z_i$ as follows:

$$Z_i = \begin{cases} Z^{(1)}_i & \text{if } D_i = 1 \\ Z^{(2)}_i & \text{if } D_i = 2 \end{cases} \tag{27}$$

f) As the resulting sequence $Z_i$ generally does not satisfy the properties of the process we are interested in, we must subdivide each of the cases “> median” and “< median” into two subcases. Specifically, we generate the i.i.d. sequences $Z^{(1,1)}_i$, $Z^{(1,2)}_i$ and $Z^{(2,1)}_i$, $Z^{(2,2)}_i$ conditional on being, respectively, “> 75th percentile”, “(median, 75th percentile)”, “(25th percentile, median)” and “< 25th percentile”. Then we generate two other Bernoulli series $B^{(1)}_i$ and $B^{(2)}_i$ with the same parameter $p$ as above, and consequently derive the corresponding state-of-generation sequences $D^{(1)}_i$ (taking values 1 when the state of our process is higher than the 75th percentile, and 2 if it belongs to the interval (median, 75th percentile)) and $D^{(2)}_i$ (taking values 1 when the state belongs to the interval (25th percentile, median), and 2 if it is lower than the 25th percentile). We can now obtain the target correlated sequence $Z_i$ as follows:

$$Z_i = \begin{cases} Z^{(1,1)}_i & \text{if } D_i = 1 \text{ and } D^{(1)}_i = 1 \\ Z^{(1,2)}_i & \text{if } D_i = 1 \text{ and } D^{(1)}_i = 2 \\ Z^{(2,1)}_i & \text{if } D_i = 2 \text{ and } D^{(2)}_i = 1 \\ Z^{(2,2)}_i & \text{if } D_i = 2 \text{ and } D^{(2)}_i = 2 \end{cases} \tag{27’}$$

g) We continue to subdivide until the relative difference $\varepsilon_j$ converges to zero for any $j$. In any subdivision step, we follow the same procedure as that described above with a fixed parameter $p$, until a convergence threshold is achieved (here, a mean absolute error equal to 0.002 for $\varepsilon_j$ is used in the numerical examples below, which is obtained after 9 subdivision steps for the value of $p$ adopted there).
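
The steps above can be sketched as a binary tree of two-state chains over quantile intervals. This is a minimal illustration of one possible reading of the procedure, not the authors' implementation: each tree node carries its own symmetric two-state chain, `p_switch` is assumed to be the per-step probability of a state change, the number of subdivision `levels` is fixed in advance instead of being chosen by a convergence criterion, and the final value is drawn uniformly within the selected quantile interval and mapped through the inverse CDF of the marginal (standard Gaussian by default). All names are hypothetical.

```python
import random
from statistics import NormalDist


def gen2mp_sketch(n, p_switch, levels=6, inv_cdf=NormalDist().inv_cdf, seed=0):
    """Generate n values with the given marginal and tree-of-chains dependence."""
    rng = random.Random(seed)
    n_nodes = 2 ** levels - 1                 # one two-state chain per tree node
    # True = upper half of the node's quantile interval, False = lower half.
    state = [rng.random() < 0.5 for _ in range(n_nodes)]
    out = []
    for _ in range(n):
        lo, hi, node = 0.0, 1.0, 0
        for _level in range(levels):          # walk from the root to a leaf
            mid = 0.5 * (lo + hi)
            if state[node]:
                lo, node = mid, 2 * node + 2
            else:
                hi, node = mid, 2 * node + 1
        u = lo + (hi - lo) * rng.random()     # independent draw inside the leaf
        u = min(max(u, 1e-12), 1.0 - 1e-12)   # keep inv_cdf away from 0 and 1
        out.append(inv_cdf(u))
        for k in range(n_nodes):              # advance every chain by one step
            if rng.random() < p_switch:
                state[k] = not state[k]
    return out


def lag1_autocorrelation(x):
    """Sample lag-one autocorrelation coefficient."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[i] - m) * (x[i + 1] - m) for i in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den


if __name__ == "__main__":
    z = gen2mp_sketch(5000, p_switch=0.1)
    print(sum(z) / len(z), lag1_autocorrelation(z))
```

Because every chain is symmetric and stationary, each leaf interval is visited with equal probability, so the marginal distribution is preserved by construction; with `p_switch = 0.5` the output reduces to white noise.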
3.2 Numerical simulations
We show some Monte Carlo experiments assuming the standard Gaussian probability model as parent distribution, but it can be changed to any distribution function. We generate a correlated series of 100000 standard Gaussian random numbers using Gen2Mp with a fixed parameter $p$. Such a parameter completely determines the dependence structure of the 2Mp process. For $p < 0.5$ the process is positively correlated, while it reduces to white noise for $p = 0.5$. For $p > 0.5$ we get an anticorrelated series. The particular value of $p$ is chosen in order to have the dependence structure of the generated series similar to that of the AR(1) model with lag-one correlation equal to 0.85 (see Fig. 2). Such a value of $p$ has been determined numerically exploiting the fact that the dependence structure of the generated series is closely related (showing slight downward bias) to that of the Markov chain $D_i$ defined above, whose lag-one autocorrelation is $\rho_1 = 1 - 2p$ (see Fig. 2). Then, to a first approximation, we start assuming $\rho_1 = 0.85$, and progressively increase it until the dependence structures of the 2Mp and AR(1) match.
Figure 2. Comparison of the empirical autocorrelation functions (EACFs) resulting from time series generated by Gen2Mp ($Z_i$) and the Markov chain ($D_i$) with parameter $p$, and by the AR(1) model with lag-one correlation equal to 0.85.
351
Then, even though Gen2Mp and the classical AR(1) algorithms generate time series exhibiting analogous dependence structures, the former significantly outperforms the latter in terms of , as shown in Fig. 1 (right panel). Furthermore, we generate an independent series of 100000 standard Gaussian random numbers as a benchmark using classical generators (e.g. Press et al., 2007). As can be seen from the probability-probability (PP) and quantile-quantile (QQ) plots in Fig. 3, the marginal distribution of the final dependent series (corresponding to a 2Mp) is the same as that of the benchmark series. In summary, the important achievement is that Gen2Mp does not alter the parent distribution, but only induces time dependence in a Markov chain sense.

[Figure 2 plot: EACF (0 to 1) versus Lag (0 to 30); legend: Gen2Mp, AR(1), Markov chain Di]

Figure 3. Probability-Probability plot (left) and Quantile-Quantile plot (right) comparing the marginal distribution of a benchmark series (i.i.d. standard Gaussian random numbers) to that of the correlated series generated using Gen2Mp.

Focusing on the frequency analysis of maxima, we investigate the distribution of the maximum term among a random number of a sequence of standard Gaussian random variables . Specifically, we assume that follows a negative binomial distribution in eq. (14), while the variables form a 2Mp stochastic process. Based on such hypotheses, in the previous Section we derived the corresponding theoretical probability distribution function  given by eq. (22). To check this numerically, we generate the random numbers (where ) from the negative binomial distribution with parameters and , then we form the target sample by taking the maximum of non-overlapping sequences of consecutive random numbers . We allow two different dependence structures for . In the first case we assume that are sampled from i.i.d. random variables, while in the second case are sampled from a 2Mp stochastic process with parameter , which is simulated by Gen2Mp.
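
The independent case of this experiment can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the negative binomial is parameterized as the number of failures before the r-th success (integer `r`, success probability `p`), the values `R`, `P` and `M` are arbitrary, and the theoretical distribution is taken as the negative binomial probability generating function evaluated at the standard Gaussian CDF, which is the textbook result for i.i.d. terms and is assumed here to play the role of the independent-case formula. The dependent (Gen2Mp) case is not reproduced.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard Gaussian CDF


def neg_binomial(rng, r, p):
    """Failures before the r-th success (integer r, success probability p)."""
    log_q = math.log(1.0 - p)
    # Each term is a geometric variate obtained by inversion.
    return sum(int(math.log(1.0 - rng.random()) / log_q) for _ in range(r))


def max_cdf_independent(z, r, p):
    """P(max <= z): negative binomial PGF evaluated at Phi(z)."""
    return (p / (1.0 - (1.0 - p) * PHI(z))) ** r


def simulate_maxima(m, r, p, seed=1):
    """Sorted maxima of m non-overlapping blocks of random length."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(m):
        n = neg_binomial(rng, r, p)
        # An empty block has no value; -inf keeps P(max <= z) consistent.
        maxima.append(max((rng.gauss(0.0, 1.0) for _ in range(n)),
                          default=-math.inf))
    return sorted(maxima)


R, P, M = 5, 0.1, 4000  # illustrative values, not taken from the paper
sample = simulate_maxima(M, R, P)
# PP-plot coordinates: Weibull plotting position versus theoretical CDF.
pp = [((i + 1) / (M + 1), max_cdf_independent(z, R, P))
      for i, z in enumerate(sample)]
max_gap = max(abs(e - t) for e, t in pp)
```

If the points in `pp` lie close to the line of equality (small `max_gap`), the empirical distribution of the simulated maxima agrees with the theoretical one, which is the check that a PP plot conveys graphically.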
Results in the form of PP plots are depicted in Fig. 4. In the left panel, we show the independent case, and it can be noticed how the empirical distribution of is closely matched by eq. (23), i.e. the PP plot (blue line) follows a straight line oriented from (0, 0) to (1, 1). In other words, when are i.i.d., eq. (23) proves to be a good model for the theoretical distribution of .

In the right panel of Fig. 4, we show the dependent case where the joint probability  in eq. (22) is determined numerically. Clearly, if we apply eq. (23) to the correlated sample , then the corresponding plot (blue line) shows a marked departure from the 45° line (i.e., the line of equality). By contrast, the theoretical distribution that we propose in eq. (22) reasonably models the empirical distribution of correlated maxima in all respects (see black line). Therefore, when the belong to 2Mp, eq. (22) (black line) largely outperforms eq. (23) (blue line) in modelling the extreme maxima.

Figure 4. Probability-Probability plots of the maximum term among a (negative binomial) random number of a sequence of i.i.d. (left panel) and 2Mp (right panel) standard Gaussian random variables .
[Figure 4 plot, both panels: axes "Theoretical distribution" and "Empirical distribution" (0 to 1); legend: independent, 45° line, dependent]

4 Applications to Rainfall and Streamflow Data

To provide some insight into the capability of the proposed methodology to reproduce the statistical pattern of observed hydrological extremes, we use long-term daily rainfall and streamflow time series with no missing values, or as few as possible, to fulfil the requirements of POT analyses. In more detail, we use three daily precipitation time series recorded by rain gages located at Groningen (north-eastern Netherlands), Middelburg (south-western Netherlands) and Bologna (northern Italy), respectively ranging from 1847 to 2017 (171 years, no missing values), from 1855 to 2017 (163 years, no missing values) and from 1813 to 2018 (206 years, only three missing values). Raw data, retrieved through the Royal Netherlands Meteorological Institute (KNMI) Climate Explorer web site, are available at https://climexp.knmi.nl/data/bpeca147.dat (accessed on 26 October 2019) for Groningen station, at https://climexp.knmi.nl/data/bpeca2474.dat (accessed on 26 October 2019) for Middelburg station and at https://climexp.knmi.nl/data/pgdcnITE00100550.dat (accessed on 26 October 2019) for Bologna station in the period 1813-2007 (see Klein Tank et al., 2002; Menne et al., 2012). For the most recent period, 2008-2018, daily data for Bologna station are provided by the Dext3r public repository (http://www.smr.arpa.emr.it/dext3r/) (accessed on 26 October 2019) of the Regional Agency for Environmental Protection and Energy (Arpae) of Emilia Romagna, Italy (retrieved and processed by Koutsoyiannis for the book: Stochastics of Hydroclimatic Extremes, in preparation for 2020).

Furthermore, we analyze one daily streamflow time series of the Po River recorded at Pontelagoscuro, northern Italy (see Montanari, 2012 for further details). The data series, spanning from 1920 to 2017 (98 years, no missing values), is made publicly available by Prof. Alberto Montanari at https://distart119.ing.unibo.it/albertonew/sites/default/files/uploadedfiles/po-pontelagoscuro.txt (accessed on 26 October 2019) for the period 1920-2009, while the remainder (2010-2017) has been retrieved through the Dext3r repository.

Since it has been shown that seasonality affects the distribution of hydrological extremes (Allamano et al., 2011), our analyses are performed on a seasonal basis; we distinguish four seasons, each consisting of three months, such that autumn comprises September, October, and November. Winter, spring, and summer are defined similarly. We prefer not to use deseasonalization procedures to avoid possible artifacts that may affect the results. Furthermore, as daily rainfall and streamflow processes exhibit very different marginal distributional properties, all recorded values exceeding a certain threshold are transformed to normality by normal quantile transformation (NQT) for the sake of comparison (Krzysztofowicz, 1997). In practice, observed exceedances are transformed to , where  is the quantile function of the standard Gaussian distribution and is the Weibull plotting position of the ordered sample. In addition, all datasets used in this study have been preprocessed by removing leap days, because February 29th was already removed from all leap years of the 1920-2009 Po river discharge dataset.
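
The normal quantile transformation described above can be sketched as follows. This is a minimal illustration, assuming the Weibull plotting position i/(n+1) for the i-th smallest of n values and breaking ties by position in the input; the function name and the sample values are hypothetical.

```python
from statistics import NormalDist


def normal_quantile_transform(values):
    """Map values to standard Gaussian scores via Weibull plotting positions."""
    n = len(values)
    # Indices of the values in ascending order (ties keep input order).
    order = sorted(range(n), key=lambda i: values[i])
    inv = NormalDist().inv_cdf  # standard Gaussian quantile function
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = inv(rank / (n + 1))  # plotting position i / (n + 1)
    return scores


exceedances = [12.4, 3.1, 47.0, 8.8, 5.6]  # hypothetical threshold excesses
z = normal_quantile_transform(exceedances)
```

The transformation preserves the ranks of the data, so any rank-based dependence measure (such as Kendall's tau) is unchanged, while the marginal distribution becomes approximately standard Gaussian.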
We now investigate the frequency analysis of observed hydrological maxima. For each season of any dataset, we use for example the value of the threshold corresponding to the 5th percentile (excluding zeros for rainfall datasets for simplicity, but we checked that results do not vary considerably if we include zeros), whose exceedances are normalized to for each sample. As stated in Sect. 1, we are interested in the statistical behavior of the maximum term among a random number of equally distributed random variables (i.e., belonging to a certain season) in an interval of time (we assume one year). First, we form the POT samples for each year of the record, consisting of (i.e., number of years) sequences of threshold excesses each of size (for ); second, we form the sample of annual extremes by taking the maximum of each POT series. In other words, is a sample of annual maxima of size (i.e., the number of years of the given dataset) taken from annual POT series of size (i.e., the number of exceedances in the k-th year for the considered season). It follows that the sample size used in classical BM analysis is , while that used in our approach is  . As detailed below, all parameter values (see, e.g., Tables 1 and 2) are estimated from the POT series by the maximum likelihood method.
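
The construction of annual POT series and annual maxima can be sketched as below. This is an illustrative outline only: the record layout (pairs of year and value for a single season), the function name and the toy numbers are assumptions, and years without any exceedance are simply omitted.

```python
def annual_pot_maxima(records, threshold):
    """Group threshold exceedances by year for one season.

    records: iterable of (year, value) pairs.
    Returns {year: (number of exceedances, annual maximum)} for years
    with at least one exceedance.
    """
    pot = {}
    for year, value in records:
        if value > threshold:
            pot.setdefault(year, []).append(value)
    return {year: (len(vals), max(vals)) for year, vals in pot.items()}


# Hypothetical toy record for one season
records = [(2001, 3.0), (2001, 9.5), (2001, 7.2),
           (2002, 1.0), (2002, 6.4),
           (2003, 8.1), (2003, 12.3), (2003, 5.5), (2003, 0.4)]
summary = annual_pot_maxima(records, threshold=2.0)
bm_size = len(summary)                          # one maximum per year
pot_size = sum(n for n, _ in summary.values())  # all exceedances
```

In this toy case the block-maxima sample has 3 values while the POT sample has 7, which illustrates why working with the exceedances gives access to a larger sample than working with annual maxima alone.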
We compare the empirical distribution of to the theoretical probability distribution function  given by eq. (4) (i.e., the classical method) assuming Poisson occurrences of independent exceedances, and by eq. (22) (i.e., the proposed method) assuming negative binomial occurrences of 2Mp exceedances. Parameters of the Poisson and negative binomial distributions are obtained by maximum likelihood estimation from the annual counts for each season of each dataset. To a first approximation, we assume statistical independence of , having checked that, for each dataset, the empirical autocorrelations between the numbers of exceedances of subsequent years are negligible (not shown). Furthermore, we assume that the joint probability of exceedances  in eq. (22) can be written in terms of the univariate marginal distribution (which is the standard normal in case of normal quantile transformation) and a bivariate copula that describes the dependence structure between the variables (Salvadori et al., 2007). Several bivariate families of copulas have been presented in the literature, allowing the selection of different dependence frameworks (Favre et al., 2004). For the sake of simplicity, we choose the following three types of copulas that have been in common use:
1. The Gaussian copula (Salvadori et al., 2007 pp. 254-256), which implies the elliptical shape of isolines of the pairwise joint distribution that in our case is given by a bivariate normal distribution with zero mean and covariance matrix $\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$, where the parameter $\rho$ is the average (over years) lag-one autocorrelation coefficient of the annual POT series.
2. The Clayton copula (Salvadori et al., 2007 pp. 237-240), which exhibits upper tail independence and lower tail dependence (Salvadori et al., 2007 pp. 170-175), and in our case yields
$$C(u, v) = \left(u^{-\theta} + v^{-\theta} - 1\right)^{-1/\theta} \qquad (28)$$
where $u$ and $v$ are the marginal probabilities of the two variables, and the parameter $\theta$ can be written in terms of the Kendall's tau correlation coefficient $\tau$ as $\theta = 2\tau/(1-\tau)$, with $\tau$ the average (over years) of the lag-one Kendall's tau autocorrelation coefficient of the annual POT series.
3. The Gumbel-Hougaard copula (Salvadori et al., 2007 pp. 236-237), which exhibits upper tail dependence and lower tail independence, and in our case yields
$$C(u, v) = \exp\left\{-\left[(-\ln u)^{\theta} + (-\ln v)^{\theta}\right]^{1/\theta}\right\} \qquad (29)$$
where the parameter $\theta$ is again written in terms of the Kendall's tau correlation coefficient as $\theta = 1/(1-\tau)$.
All parameter values for all seasons and datasets are reported in Table 1.
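
To make the role of the copula parameter concrete, the sketch below evaluates the Clayton and Gumbel-Hougaard copulas in their standard textbook forms, with the usual conversions from Kendall's tau (2τ/(1-τ) for Clayton and 1/(1-τ) for Gumbel-Hougaard). The function names, the value of `tau` and the evaluation point are illustrative assumptions and are not taken from the paper's computations.

```python
import math


def clayton_theta(tau):
    """Clayton parameter from Kendall's tau."""
    return 2.0 * tau / (1.0 - tau)


def gumbel_theta(tau):
    """Gumbel-Hougaard parameter from Kendall's tau."""
    return 1.0 / (1.0 - tau)


def clayton(u, v, theta):
    """Clayton copula C(u, v) for theta > 0 and u, v in (0, 1]."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def gumbel(u, v, theta):
    """Gumbel-Hougaard copula C(u, v) for theta >= 1 and u, v in (0, 1]."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))


tau = 0.82   # illustrative strong dependence
u = v = 0.9  # both variables below their 90% quantile
joint_c = clayton(u, v, clayton_theta(tau))
joint_g = gumbel(u, v, gumbel_theta(tau))
independent = u * v
```

With positive dependence both joint probabilities lie between the independence value `u * v` and the upper bound `min(u, v)`. At this high probability level the Gumbel-Hougaard value is the larger of the two, reflecting its upper tail dependence.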
Figure 5. Probability-Probability plots of Groningen dataset of daily rainfall. The empirical distributions of maximum terms among annual exceedances of the 5th percentile threshold for winter (top left), spring (top right), summer (bottom left) and autumn (bottom right) seasons are compared to the corresponding theoretical distributions assuming both Poisson (P) occurrences (with parameter ) of independent exceedances (eq. 4), and negative binomial occurrences (with parameters and ) of correlated exceedances (eq. 22) with pairwise joint distribution described by the Gaussian (N), Clayton (C, eq. 28) and Gumbel (G, eq. 29) copulas, with parameters and as detailed in the text. All parameter values are reported in Table 1.

Figure 6. Same as Fig. 5 for Middelburg dataset of daily rainfall.

Figure 7. Same as Fig. 5 for Bologna dataset of daily rainfall.

In Figs. 5-7 we may observe that for all daily rainfall datasets the magnitudes of extreme events taken from excesses of a low threshold (the 5th percentile of the nonzero sample) can be considered independent and identically distributed, and this is consistent with the results shown in the literature using different approaches (see e.g. Marani and Ignaccolo, 2015; Zorzetto et al., 2016; De Michele and Avanzi, 2018). In addition, we may notice that the classical model of POT analyses assuming Poisson occurrences (see eq. (4)) seems to be appropriate to study rainfall extremes. Analogous considerations obviously apply to higher thresholds (not shown). Our model of correlated extremes in eq. (22) is capable of capturing such a behavior with precision.

After showing the results with daily rainfall, we also analyze rainfall records at finer time resolution (hourly scale), whose correlation can be stronger than that pertaining to daily data. To this end, we use hourly rainfall data of the "Bologna idrografico" station for the period 1990-2013 provided by the Dext3r repository (23 years of full coverage, while the entire 2008 is missing). We checked that such hourly rainfall data aggregated at the daily scale are consistent with the daily data recorded in the same period by the Bologna station above (not shown).

Figure 8. Same as Fig. 5 for Bologna dataset of hourly rainfall.

Comparing Figs. 7 and 8, we note that extremes of hourly rainfall data are more affected by correlation than daily data (see e.g. winter and autumn seasons, respectively top left and bottom right panels). This is also the case if we consider the same period of record (1990-2013) for both datasets (not shown). We may then conclude that low thresholds can be used for classical POT analyses (assuming independence) of rainfall time series at the daily scale (or above), while further investigations of different datasets are required to describe the impact of dependence on the extremal behavior of the rainfall process at finer time scales. Besides, other interesting future analyses could investigate the extremes of areal rainfall, as for example weather radar data will become more reliable and will accumulate in time, providing samples long enough to enable reliable investigation of the probability distribution of areal rainfall (Lombardo et al., 2006a,b; Lombardo et al., 2009).

By contrast, results change significantly when analyzing extremes of streamflow time series. In fact, we present a case study that shows how models assuming independence among magnitudes of extreme events prove to be inadequate to study the probability distribution of discharge maxima.

Figure 9. Same as Fig. 5 for the Po River dataset of daily discharge.

In Fig. 9, we show the PP plots of the distribution of extreme maxima taken from annual exceedances of the 5th percentile thresholds for the four seasons of the Po River discharge dataset, recorded at Pontelagoscuro station. Contrary to the rainfall case studies, the classical model assuming independent magnitudes with Poisson (P) occurrences shows marked departures from the 45° line. The theoretical distribution is usually much lower than its empirical counterpart, meaning that, under the popular assumption of independent extremes, the theoretical probability of an extreme event of given magnitude being exceeded is significantly higher than the corresponding observed frequency of exceedance. Fig. 9 shows that our 2Mp model of correlated extremes outperforms the widely used independent model. In particular, the distribution of maxima that has a Gumbel copula seems to be more consistent with observed extreme values, denoting dependence in the upper tail of the bivariate distribution  (Schmidt, 2005). In summary, daily streamflow extremes may exhibit noteworthy departures from independence which are consistent with a stochastic process characterized by a 2Mp behavior and upper tail dependence.

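Table 1. Parameter values for all normalized case studies detailed in the text. For each station, the rows list, in order: the parameter for Poisson (P) occurrences (eq. 4); the two parameters for negative binomial occurrences (eq. 22); the parameter for Clayton (C) and Gumbel (G) copulas (eqs. 28-29); and the parameter for the Gaussian copula.

| Station | Parameter | Winter | Spring | Summer | Autumn |
|---|---|---|---|---|---|
| Groningen | Poisson | 50.04 | 41.56 | 45.04 | 50.05 |
| | Negative binomial (first) | 76.24 | 73.15 | 150.54 | 164.94 |
| | Negative binomial (second) | 0.66 | 0.57 | 0.30 | 0.30 |
| | Clayton and Gumbel copulas | 0.08 | 0.04 | 0.02 | 0.1 |
| | Gaussian copula | 0.10 | 0.05 | 0.04 | 0.13 |
| Middelburg | Poisson | 48.41 | 40.00 | 38.42 | 47.16 |
| | Negative binomial (first) | 35.71 | 40.22 | 35.47 | 61.68 |
| | Negative binomial (second) | 1.36 | 0.99 | 1.08 | 0.76 |
| | Clayton and Gumbel copulas | 0.09 | 0.04 | 0.02 | 0.09 |
| | Gaussian copula | 0.12 | 0.06 | 0.02 | 0.14 |
| Bologna daily | Poisson | 20.92 | 25.39 | 16.59 | 24.67 |
| | Negative binomial (first) | 7.20 | 22.57 | 20.98 | 21.14 |
| | Negative binomial (second) | 2.91 | 1.13 | 0.79 | 1.17 |
| | Clayton and Gumbel copulas | 0.03 | 0.02 | -0.05 | -0.01 |
| | Gaussian copula | 0.05 | 0.02 | -0.06 | 0.01 |
| Bologna hourly | Poisson | 127.59 | 128.87 | 54.09 | 129.74 |
| | Negative binomial (first) | 5.27 | 14.14 | 4.55 | 12.32 |
| | Negative binomial (second) | 24.22 | 9.12 | 11.90 | 10.53 |
| | Clayton and Gumbel copulas | 0.43 | 0.30 | 0.17 | 0.33 |
| | Gaussian copula | 0.54 | 0.38 | 0.20 | 0.41 |
| Pontelagoscuro | Poisson | 85.48 | 87.40 | 87.39 | 86.41 |
| | Negative binomial (first) | 67.02 | 136.59 | 81.95 | 245.15 |
| | Negative binomial (second) | 1.28 | 0.64 | 1.07 | 0.35 |
| | Clayton and Gumbel copulas | 0.82 | 0.81 | 0.84 | 0.84 |
| | Gaussian copula | 0.92 | 0.92 | 0.94 | 0.93 |
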
The above results are also evident if we compare theoretical and empirical distributions of streamflow maxima by plotting their quantiles against each other. We use real values for this example (i.e., we do not apply the normal quantile transformation to the data series); therefore, empirical quantiles equal the observed annual maxima. Theoretical quantiles referring to eqs. (4) and (22) (the latter specialized for Gaussian, Clayton and Gumbel copulas) are computed by numerically solving for the root of the equation for a given probability value, (i.e., the Weibull plotting position of observed annual maxima), assuming the classical generalized Pareto (GPD) with zero lower bound as parent distribution of threshold excesses:

(30)

where is the shape parameter and is the scale parameter, which we estimate through the maximum likelihood method applied to the entire POT series of each season.
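
The numerical root-finding step can be illustrated for the independent-Poisson case. The sketch below is hedged in several respects: the GPD is written in the common form 1 - (1 + shape·x/scale)^(-1/shape), whose sign convention for the shape parameter is an assumption and may differ from the one used in the paper; the distribution of the maximum is taken as the classical independent-Poisson expression exp(-λ[1 - F(x)]), assumed here to correspond to the classical model; and all numerical values are illustrative. A plain bisection is used in place of whatever solver the authors adopted.

```python
import math


def gpd_cdf(x, shape, scale):
    """Generalized Pareto CDF with zero lower bound (sign convention assumed)."""
    if x <= 0.0:
        return 0.0
    if abs(shape) < 1e-12:
        return 1.0 - math.exp(-x / scale)  # exponential limit
    base = 1.0 + shape * x / scale
    if base <= 0.0:  # beyond the upper bound when shape < 0
        return 1.0
    return 1.0 - base ** (-1.0 / shape)


def poisson_max_cdf(x, lam, shape, scale):
    """CDF of the maximum of independent excesses with Poisson(lam) counts."""
    return math.exp(-lam * (1.0 - gpd_cdf(x, shape, scale)))


def quantile_by_bisection(cdf, prob, lo, hi, tol=1e-9):
    """Solve cdf(x) = prob on [lo, hi]; cdf must be nondecreasing."""
    while hi - lo > tol * max(1.0, abs(hi)):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


LAM, SHAPE, SCALE = 20.0, 0.1, 1000.0  # illustrative values only
p = 0.9
x_num = quantile_by_bisection(
    lambda x: poisson_max_cdf(x, LAM, SHAPE, SCALE), p, 0.0, 1e6)

# In this special case the quantile is also available in closed form.
f_target = 1.0 + math.log(p) / LAM
x_exact = SCALE / SHAPE * ((1.0 - f_target) ** (-SHAPE) - 1.0)
```

The closed-form check is only available for the independent-Poisson case; for a dependent model the same bisection routine can be applied to any nondecreasing distribution function supplied as `cdf`.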
In Fig. 10, QQ plots of Po river discharge for the spring season are shown when varying the threshold (from the 5th, Q5, to the 75th, Q75, percentiles) to form POT series. It can be noticed that for low thresholds there is a shift in variance between theoretical (i.e., derived from eq. (22) with Gumbel copula) and empirical quantiles, namely the variance of theoretical annual maxima underestimates its empirical counterpart. This may be due to the fitting performance of the marginal generalized Pareto, which does not reproduce well the tail behavior of observed data (not shown). Fig. 10 shows that increasing the threshold value helps focus the attention on the distribution tail to better capture the behavior of maxima. This is also the case if we compare streamflow quantiles resulting from our model with those estimated through the "classical" Generalized Extreme Value (GEV) distribution fitted to the observed annual maxima. All parameter values are reported in Table 2. We note that the three GEV parameters are estimated on  data points, while the five parameters of our model in eq. (22) (the two negative binomial parameters, the copula parameter, and the two parameters of the GPD with zero lower bound) are estimated on data, which are , , , and  for Q5, Q25, Q50, Q75, respectively.

As the threshold increases, evidence of persistence is progressively reduced, as expected; however, we also note in Fig. 10 that the theoretical quantiles derived from the classical independent Poisson method always show a shift in mean with respect to observed maxima (i.e., under independence, theoretical streamflow quantiles systematically and significantly overestimate observed streamflow maxima).

Figure 10. Quantile-Quantile plots of Po river discharge (m³/s) for spring season. The observed maximum terms among annual peaks over the 5th percentile (top left), 25th percentile (top right), 50th percentile (bottom left) and 75th percentile (bottom right) thresholds are compared to the corresponding theoretical quantiles. In all cases, we assume the Generalized Pareto as parent distribution of daily streamflow (with shape , scale and threshold parameters), and compute quantiles specializing eq. (22) for Poisson (P) occurrences (with parameter , eq. 4) of independent exceedances, and for negative binomial occurrences (with parameters and ) of correlated exceedances with pairwise joint distribution described by the Gaussian (N), Clayton (C) and Gumbel (G) copulas, with parameters and as detailed in the text. We also plot theoretical quantiles from GEV distribution (with shape , scale and location parameters) fitted to the observed annual maxima. All parameter values are reported in Table 2.

To summarize, our model provides a closed-form expression of the exact distribution for dependent hydrological maxima, which is capable of capturing the behavior of observed extremes of long-term hydrological records. In particular, while rainfall extremes do not seem to be significantly affected by correlation at the daily scale, so that the classical Poisson model can be appropriate for use in POT analyses of daily rainfall time series, the influence of correlation is prominent in the streamflow process at the daily scale and it is important to preserve it in simulation and analysis of extremes.

586
Model
Parameter /
Threshold
Q5
Q25
Q50
Q75
Generalized Pareto
-0.10
-0.03
-0.05
-0.03
1220.16
1044.03
1065.80
998.06
653.00
998.00
1410.00
2133.00
Poisson
87.40
68.97
45.89
22.99
Negative Binomial
136.59
5.89
1.74
0.71
0.64
11.71
26.45
32.22
Clayton & Gumbel
copulas
0.82
0.76
0.63
0.48
Gaussian copula
0.91
0.86
0.75
0.61
GEV
-0.11
-0.11
-0.08
-0.07
1463.94
1463.94
1399.01
1273.31
3309.91
3309.91
3369.76
3739.46
5 Conclusions

The study of hydrological extremes faces the chronic lack of sufficient data to perform reliable analyses. This is partly related to the inherent nature of extreme values, which are rare by definition, and partly related to the relative shortness of systematic records from hydro-meteorological gauge networks. The limited availability of data poses serious problems for an effective and reliable use of asymptotic results provided by EVT.

Alternative methods focusing on the exact distribution of extreme maxima extracted from POT sequences of random size over fixed time windows have been proposed in the past. However, closed-form analytical results were developed only for independent data with Poisson occurrences. Even though these assumptions may be sufficiently reliable for high-threshold POT values, this type of data still generates relatively small sample sizes. In order to better exploit the available information, it can be convenient to consider lower thresholds. However, the effect of lower thresholds is twofold: on the one hand the sample size increases, but on the other hand the hypotheses of independent magnitudes and Poisson occurrences of POT values are no longer reliable.

In this study, we have introduced closed-form analytical formulae for the exact distribution of maxima from POT sequences that generalize the classical independent model, overcoming its limits and enabling the study of maxima taken from dependent low-threshold POT values with arbitrary marginal distribution, first-order Markov dependence structure, and negative binomial occurrences, and tested real data against this hypothesis. Even though the framework can be further generalized by introducing arbitrary dependence structures and models for POT occurrences, first-order Markov chains and negative binomial distributions provide a good compromise between flexibility and the possibility to obtain simple ready-to-use formulae. In this respect, it should be noted that our model of correlated extremes can cover a sufficient range of cases. We have shown that the modulation of the lag-one autocorrelation coefficient of the annual sequences of POT values (i.e. the Markov chain parameter) gives a set of extremal distributions that include the empirical distribution of maxima for rainfall data series, and for highly correlated low-threshold discharge POT series. On the other hand, the negative binomial model is a widely used and theoretically well-established model for occurrences exhibiting clustering and overdispersion, which are common characteristics of POT events resulting from persistent processes, such as river discharge.

The relationship between our model and its classical independent version (i.e. eqs. (22) and (4)), along with the results of the case studies, shows that the distribution of extreme maxima under dependence yields probabilities of exceedance that are systematically lower than those under independence, and that are also consistent with traditional approaches (GEV), based on extreme value theory, applied to long annual maxima series.

Finally, we stress that our model of the exact distribution of correlated extremes requires knowledge or fitting of a bivariate distribution (and therefore its univariate marginal distribution). In particular, while the extremal behavior of the rainfall process does not seem to be significantly affected by dependence at the daily scale, so that the classical Poisson model can be appropriate for use in POT analyses of daily rainfall time series, the influence of correlation is prominent in the streamflow process at the daily scale and it also appears in the rainfall process at the hourly scale. It is therefore important to account for such dependence in extreme value analyses, which are crucial to hydrological design and risk management, because critical values can be less extreme and more frequent than expected under the classical independent models. Comparing the Gaussian, Clayton and Gumbel bivariate copulas, describing different dependence structures, and the standard Gaussian and Generalized Pareto marginal distributions, we found that the distribution of maxima that has a Gumbel copula seems to be more consistent with streamflow extreme values, denoting dependence in the upper tail of the bivariate distribution. However, these aspects require further investigation from both theoretical and empirical standpoints, and will be the subject of future research. In the spirit of the recent literature on the topic, we believe that the present study will contribute to developing more reliable data-rich-based analyses of extreme values.

Acknowledgments

All data used in this study are freely available online, as described in Section 4 above. The associate editor, an eponymous reviewer, Geoff Pegram, and two anonymous reviewers are gratefully acknowledged for their constructive comments that helped to substantially improve the paper. We also thank Alessio Domeneghetti for providing the first author with detailed information on the Dext3r public repository.

References

Abramowitz, M., and Stegun, I. A. (1972). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 9th printing. New York: Dover.
Allamano, P., Laio, F., and Claps, P. (2011). Effects of disregarding seasonality on the distribution of hydrological extremes. Hydrology and Earth System Sciences, 15, 3207-3215.
Bernardara, P., Mazas, F., Kergadallan, X., and Hamm, L. (2014). A two-step framework for over-threshold modelling of environmental extremes. Natural Hazards and Earth System Sciences, 14(3), 635-647.
Bogachev, M. I., and Bunde, A. (2012). Universality in the precipitation and river runoff. EPL (Europhysics Letters), 97(4), 48011.
Bommier, E. (2014). Peaks-Over-Threshold Modelling of Environmental Data. U.U.D.M. Project Report 2014:33, Department of Mathematics, Uppsala University.
Calenda, G., Petaccia, A., and Togna, A. (1977). Theoretical probability distribution of critical hydrologic events by the partial-duration series method. Journal of Hydrology, 33(3-4), 233-245.
Claps, P., and Laio, F. (2003). Can continuous streamflow data support flood frequency analysis? An alternative to the partial duration series approach. Water Resources Research, 39(8), 1216.
Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values, Springer Series in Statistics, Springer, London.
De Michele, C., and Avanzi, F. (2018). Superstatistical distribution of daily precipitation extremes: A worldwide assessment. Scientific Reports, 8, 14204.
Eastoe, E. F., and Tawn, J. A. (2010). Statistical models for overdispersion in the frequency of peaks over threshold data for a flow series. Water Resources Research, 46(2).
Eichner, J. F., Kantelhardt, J. W., Bunde, A., and Havlin, S. (2011). The statistics of return intervals, maxima, and centennial events under the influence of long-term correlations. In J. Kropp and H.-J. Schellnhuber (Eds.), In Extremis (pp. 2-43). Berlin, Heidelberg: Springer.
Favre, A. C., El Adlouni, S., Perreault, L., Thiémonge, N., and Bobée, B. (2004). Multivariate hydrological frequency analysis using copulas. Water Resources Research, 40(1).
Feller, W. (1968). An Introduction to Probability Theory and Its Applications, vol. I, 3rd edition, London-New York-Sydney-Toronto, John Wiley & Sons.
Fernández, B., and Salas, J. D. (1999). Return period and risk of hydrologic events. I: mathematical formulation. Journal of Hydrologic Engineering, 4(4), 297-307.
Ferro, C. A., and Segers, J. (2003). Inference for clusters of extreme values. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(2), 545-556.
Fisher, R., and Tippett, L. (1928). Limiting forms of the frequency distribution of the largest or smallest member of a sample. Mathematical Proceedings of the Cambridge Philosophical Society, 24(2), 180-190.
Fuller, W. E. (1914). Flood flows. Transactions of the American Society of Civil Engineers, 77, 564-617.
Graham, R. L., Knuth, D. E., and Patashnik, O. (1994). Concrete Mathematics: A Foundation for Computer Science, 2nd ed. Reading, MA: Addison-Wesley.
Hazen, A. (1914). The storage to be provided in impounding reservoirs for municipal water supply. Transactions of the American Society of Civil Engineers, 77, 1539-1669.
Klein Tank, A.M.G. et al. (2002). Daily dataset of 20th-century surface air temperature and precipitation series for the European Climate Assessment. International Journal of Climatology, 22(12), 1441-1453.
Koutsoyiannis, D. (2004a). Statistics of extremes and estimation of extreme rainfall: I. Theoretical investigation. Hydrological Sciences Journal, 49(4), 575-590.
Koutsoyiannis, D. (2004b). Statistics of extremes and estimation of extreme rainfall: II. Empirical investigation of long rainfall records. Hydrological Sciences Journal, 49(4), 591-610.
Koutsoyiannis, D., and Montanari, A. (2015). Negligent killing of scientific concepts: the stationarity case. Hydrological Sciences Journal, 60(7-8), 1174-1183.
Koutsoyiannis, D., and Papalexiou, S.M. (2017). Extreme rainfall: Global perspective, Handbook of Applied Hydrology, Second Edition, edited by V.P. Singh, 74.1-74.16, McGraw-Hill, New York.
Krzysztofowicz, R. (1997). Transformation and normalization of variates with specified distributions. Journal of Hydrology, 197(1-4), 286-292.
Iliopoulou, T., and Koutsoyiannis, D. (2019). Revealing hidden persistence in maximum rainfall records. Hydrological Sciences Journal, doi: 10.1080/02626667.2019.1657578.
Leadbetter, M. R. (1974). On extreme values in stationary sequences. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 28, 289-303.
Leadbetter, M. R. (1983). Extremes and local dependence in stationary sequences. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 65, 291-306.
Lombardo, F., Volpi, E., Koutsoyiannis, D., and Serinaldi, F. (2017). A theoretically consistent stochastic cascade for temporal disaggregation of intermittent rainfall. Water Resources Research, 53(6), 4586-4605.
Lombardo, F., Montesarchio, V., Napolitano, F., Russo, F., and Volpi, E. (2009). Operational applications of radar rainfall data in urban hydrology. In Proceedings of a symposium on the role of hydrology in water resources management, Capri, Italy, October 2008. (pp. 258-265). IAHS Press.
Lombardo, F., Napolitano, F., & Russo, F. (2006a). On the use of radar reflectivity for estimation of the areal reduction factor. Natural Hazards and Earth System Sciences, 6(3), 377-386.
Lombardo, F., Napolitano, F., Russo, F., Scialanga, G., Baldini, L., and Gorgucci, E. (2006b). Rainfall estimation and ground clutter rejection with dual polarization weather radar. Advances in Geosciences, 7, 127-130.
Luke, A., Vrugt, J. A., AghaKouchak, A., Matthew, R., and Sanders, B. F. (2017). Predicting nonstationary flood frequencies: Evidence supports an updated stationarity thesis in the United States. Water Resources Research, 53(7), 5469-5494.
Marani, M., and Ignaccolo, M. (2015). A metastatistical approach to rainfall extremes. Advances in Water Resources, 79, 121-126.
Menne, M. J., Durre, I., Vose, R. S., Gleason, B. E., and Houston, T. G. (2012). An overview of the global historical climatology network-daily database. Journal of Atmospheric and Oceanic Technology, 29(7), 897-910.
Montanari, A. (2012). Hydrology of the Po River: looking for changing patterns in river discharge. Hydrology and Earth System Sciences, 16, 3739-3747.
O’Connell, P. E., Koutsoyiannis, D., Lins, H. F., Markonis, Y., Montanari, A., and Cohn, T. (2016). The scientific legacy of Harold Edwin Hurst (1880-1978). Hydrological Sciences Journal, 61(9), 1571-1590.
Papalexiou, S. M., and Koutsoyiannis, D. (2013). Battle of extreme value distributions: A global survey on extreme daily rainfall. Water Resources Research, 49, 187-201.
Papalexiou, S. M., Koutsoyiannis, D., and Makropoulos, C. (2013). How extreme is extreme? An assessment of daily rainfall distribution tails. Hydrology and Earth System Sciences, 17, 851-862.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (2007). Numerical recipes 3rd edition: The art of scientific computing. Cambridge University Press.
Salas, J. D., Obeysekera, J., & Vogel, R. M. (2018). Techniques for assessing water infrastructure for nonstationary extreme events: a review. Hydrological Sciences Journal, 63(3), 325-352.
Salvadori, G., De Michele, C., Kottegoda, N. T., & Rosso, R. (2007). Extremes in nature: an approach using copulas. Vol. 56. Springer Science & Business Media.
Schmidt, R. (2005). Tail dependence. In Statistical Tools for Finance and Insurance (pp. 65-91). Springer, Berlin, Heidelberg.
Serinaldi, F., and Kilsby, C. G. (2014). Rainfall extremes: Toward reconciliation after the battle of distributions. Water Resources Research, 50(1), 336-352.
Serinaldi, F., and Kilsby, C. G. (2016). Understanding persistence to avoid underestimation of collective flood risk. Water, 8(4), 152.
Serinaldi, F., and Kilsby, C. G. (2018). Unsurprising Surprises: The Frequency of Record-breaking and Overthreshold Hydrological Extremes Under Spatial and Temporal Dependence. Water Resources Research, 54(9), 6460-6487.
Serinaldi, F., Kilsby, C. G., and Lombardo, F. (2018). Untenable nonstationarity: An assessment of the fitness for purpose of trend tests in hydrology. Advances in Water Resources, 111, 132-155.
Todorovic, P. (1970). On some problems involving random number of random variables. The Annals of Mathematical Statistics, 41(3), 1059-1063.
Todorovic, P., and Zelenhasic, E. (1970). A stochastic model for flood analysis. Water Resources Research, 6(6), 1641-1648.
Volpi, E., Fiori, A., Grimaldi, S., Lombardo, F., and Koutsoyiannis, D. (2015). One hundred years of return period: Strengths and limitations. Water Resources Research, 51(10), 8570-8585.
Volpi, E., Fiori, A., Grimaldi, S., Lombardo, F., and Koutsoyiannis, D. (2019). Save hydrological observations! Return period estimation without data decimation. Journal of Hydrology, 571, 782-792.
Zorzetto, E., Botter, G., and Marani, M. (2016). On the emergence of rainfall extremes from ordinary events. Geophysical Research Letters, 43(15), 8076-8082.
... Thus, by definition their design and management have to take into consideration the probabilistic behaviour of extremes, i.e., account for the distribution's tails (in particular the right one for maxima), where the extremes live. This criticality has motivated a significant amount of research in the domain of hydrological extremes, offering a variety of approaches (Buishand 1989, 1991; Pilon et al. 1991; Wilks 1993; Koutsoyiannis et al. 1998; Koutsoyiannis 1999, 2004, 2020; Katz et al. 2002; Park and Jung 2002; Coles et al. 2003; Favre et al. 2004; Wilson and Toumi 2005; Deidda and Puliga 2006; Calenda et al. 2009; Svensson and Jones 2010; Volpi and Fiori 2012, 2014; Cavanaugh et al. 2015; Marani and Ignaccolo 2015; Volpi et al. 2015, 2019; Zorzetto et al. 2016; Blum et al. 2017; Salas et al. 2018; Ye et al. 2018; De Michele and Avanzi 2018; Salas and Obeysekera 2019; Benestad et al. 2019; Courty et al. 2019; De Michele 2019; Lombardo et al. 2019; Iliopoulou and Koutsoyiannis 2020; Serinaldi et al. 2020), just to name a few. For a thorough discussion on hydroclimatic extremes, and associated methodological approaches, the interested reader is referred to the recent book of Koutsoyiannis (2020). ...
... Probably due to, 1) the distribution of the base/parent processes has to be a priori known (or inferred from data), and 2) the convenience offered (in terms of data storage/management and computation) by utilizing limiting laws that imply the use of subsets of data (i.e., inference on the distribution of maxima using only BM or POT observations). Of course, this convenience comes at the cost of neglecting the effect of temporal dependence, as well as neglecting observations per se (e.g., the second and third largest maxima within a block), facts that arguably affect the inference about the extremes' behaviour (Volpi et al. 2019; Lombardo et al. 2019; Koutsoyiannis 2020; Serinaldi et al. 2020). ...
... See also the recent review on the topic by De Michele (2019), focusing on the exact distribution of maximum annual daily precipitation. Notable exceptions are the works of Lombardo et al. (2019) and Serinaldi et al. (2020), which regard the exact distribution of k-length block maxima under the assumptions of Markov and general autocorrelation structures, respectively. In particular, the latter work, via thorough analyses and insightful discussions, clarifies and brings order to many delicate matters that concern the non-asymptotic distribution of maxima of autocorrelated processes (highlighting also important links with the seminal works of Todorovic and Zelenhasic (1970) and Todorovic (1970)). ...
Article
Focal point of this work is the estimation of the distribution of maxima without the use of classic extreme value theory and asymptotic properties, which may not be ideal for hydrological processes. The problem is revisited from the perspective of non-asymptotic conditions, and regards the so-called exact distribution of block-maxima of finite-sized k-length blocks. First, we review existing non-asymptotic approaches/models, and also introduce an alternative and fast model. Next, through simulations and comparisons (using asymptotic and non-asymptotic models), involving intermittent processes (e.g., rainfall), we highlight the capability of non-asymptotic approaches to model the distribution of maxima with reduced uncertainty and variability. Finally, we discuss an alternative use of such models that concerns the theoretical estimation of the multi-scale probability of obtaining a zero value. A useful finding when the scope is the multi-scale modeling of intermittent hydrological processes (e.g., intensity-duration-frequency models). The work also entails step-by-step recipes and an R-package.
... A more general algorithm for generation of any type of marginal distribution was recently proposed by Lombardo et al. [28], but only under the condition of Markov dependence, thus leaving out problems with more complex dependence, including LRD. Recent advances include the use of machine learning methods in stochastic simulation, e.g., [29], which, however, have the disadvantages of being implicit in their mathematical structure, and non-parsimonious. ...
... For a process with exponential distribution, which is a subcase of the gamma distribution, there exist generation algorithms for the case of short-range (Markov) dependence (e.g., [46]). As already mentioned, a more general algorithm for generation of any type of marginal distribution has recently been proposed by Lombardo et al. [28], but again under the condition of the Markov dependence. However, the method proposed here can generate such a process irrespective of the type of the dependence, whether SRD or LRD. ...
Article
We outline and test a new methodology for genuine simulation of stochastic processes with any dependence structure and any marginal distribution. We reproduce time dependence with a generalized, time symmetric or asymmetric, moving-average scheme. This implements linear filtering of non-Gaussian white noise, with the weights of the filter determined by analytical equations, in terms of the autocovariance of the process. We approximate the marginal distribution of the process, irrespective of its type, using a number of its cumulants, which in turn determine the cumulants of white noise, in a manner that can readily support the generation of random numbers from that approximation, so that it be applicable for stochastic simulation. The simulation method is genuine as it uses the process of interest directly, without any transformation (e.g., normalization). We illustrate the method in a number of synthetic and real-world applications, with either persistence or antipersistence, and with non-Gaussian marginal distributions that are bounded, thus making the problem more demanding. These include distributions bounded from both sides, such as uniform, and bounded from below, such as exponential and Pareto, possibly having a discontinuity at the origin (intermittence). All examples studied show the satisfactory performance of the method.
... On the other hand, processes with asymmetric distributions can also exhibit asymmetry in time. A more general algorithm for generation of any type of marginal distribution has recently been proposed by Lombardo et al. [28] but only under the condition of Markov dependence, thus leaving out problems with more complex dependence, including LRD. For these reasons, it is necessary to develop genuine stochastic simulation procedures which will be able to generate non-Gaussian processes without transformations to a Gaussian or other distribution. Such procedures have already been discussed in earlier works, referring to the explicit preservation of four moments in a time-symmetric setting [29] as well as preservation of distributions in terms of cumulants, rather than moments [30,31]. ...
... [47]). As already mentioned, a more general algorithm for generation of any type of marginal distribution has recently been proposed by Lombardo et al. [28] but again under the condition of Markov dependence. However, the method proposed here can generate such a process irrespective of the type of the dependence, whether SRD or LRD. ...
Preprint
We outline and test a new methodology for genuine simulation of stochastic processes with any dependence and any marginal distribution. We reproduce time dependence with a generalized, time symmetric or asymmetric, moving-average scheme. This implements linear filtering of non-Gaussian white noise, with the weights of the filter determined by analytical equations in terms of the autocovariance of the process. We approximate the marginal distribution of the process, irrespective of its type, using a number of its cumulants, which in turn determine the cumulants of white noise in a manner that can readily support the generation of random numbers from that approximation, so that it be applicable for stochastic simulation. The simulation method is genuine as it uses the process of interest directly without any transformation (e.g. normalization). We illustrate the method in a number of synthetic and real-world applications with either persistence or antipersistence, and with non-Gaussian marginal distributions that are bounded, thus making the problem more demanding. These include distributions bounded from both sides, such as uniform, and bounded from below, such as exponential and Pareto, possibly having a discontinuity at the origin (intermittence). All examples studied show the satisfactory performance of the method.
... On the other hand, the non-asymptotic distributions have attracted more attention in the S-HFA recently (De Michele, 2019;Lombardo et al., 2019;Marani and Ignaccolo, 2015;Marra et al., 2020). Unlike the EVT-based distributions, the non-asymptotic distributions are advantageous in i) exploiting more available information, as they use the ordinary events rather than only the extremes, ii) relaxing the requirement of asymptotic convergence, and/or iii) avoiding the additional assumption of the Poisson distributed event arrival (as in the GP distribution). ...
Article
The implementation of nonstationary hydrological frequency analysis (NS-HFA) has often been hampered by the relatively short datasets and the resulting high uncertainty. Most recently, the non-asymptotic Metastatistical extreme value (MEV) and simplified MEV (SMEV) distributions, which rely on the ordinary events rather than the extremes only, have attracted attention in the HFA of both rainfall and streamflow. Despite their use for trend detection/attribution and producing future projections, their practical implementation for the NS-HFA is absent in the literature. This paper therefore implemented these models (called MEV and SMEV-based models) in the NS-HFA and comprehensively assessed their performance from the perspectives of fitting efficiency, accuracy, and uncertainty for both in-sample fitting and out-of-sample prediction purposes. The asymptotic models based on the generalized extreme value (GEV) distribution were used as the benchmark. The assessment employed synthetic and real rainfall datasets that exhibit stationarity in the number of events per year. All the nonstationary ordinary-event datasets followed the Weibull distribution with linearly changing parameters, while their standardized annual maximum series aligned with the GEV distribution. Thus, the MEV, SMEV- and GEV-based models could be fairly assessed and compared. The regula-falsi profile likelihood method was extended to quantify the uncertainty of the MEV and SMEV-based models. The results demonstrated that the MEV model was not advantageous over other models in terms of all three evaluation perspectives. Whereas the SMEV-based models demonstrated superiority due to their higher accuracy, equivalent or better fitting efficiency, as well as lower uncertainty compared to all other models. Therefore, this paper advocates the use of the SMEV distribution to advance the NS-HFA by harnessing the information from the ordinary events.
... In addition, considerable literature applied sophisticated algorithms, such as machine learning, genetic algorithms, principal component analysis, etc. [6,35-37], or multitemporal acquisitions, all methods that can involve a considerable computational effort and an emergency response timing not adequate. Besides, the more complex the model, the harder it is to seek its theoretical consistency in statistical terms (see e.g., [38,39]). On the contrary, we used a rapid, relatively simple method that had success also thanks to the availability of very high-resolution SAR images. ...
Article
The increasing availability of satellite Synthetic Aperture Radar (SAR) images is opening new opportunities for operational support to predictive maintenance and emergency actions. With the purpose of investigating the performances of SAR images characterized by different geometric resolutions for post-earthquake damage detection and mapping, we analyzed three SAR image datasets (Sentinel-1, COSMO-SkyMed Spotlight, and COSMO-SkyMed StripMap) available in Norcia (Central Italy) that were severely affected by a strong seismic sequence in 2016. By applying the amplitude and the coherent change detection processing tools, we compared pairs of images with equivalent features collected before and after the main shock on 30 October 2016 (at 06:40, UTC). Results were compared against each other and then measured against the findings of post-earthquake field surveys for damage assessment, performed by the Italian National Fire and Rescue Service (Corpo Nazionale dei Vigili del Fuoco—CNVVF). Thanks to the interesting and very rare opportunity to have pre-event COSMO-SkyMed Spotlight images, we determined that 1 × 1-m nominal geometric resolutions can provide very detailed single-building damage mapping, while COSMO-SkyMed StripMap HIMAGE images at 3 × 3-m resolutions return relatively good detections of damaged buildings; and, the Sentinel-1 images did not allow acquiring information on single buildings—they simply provided approximate identifications of the most severely damaged sectors. The main outcomes of the performance investigation we carried out in this work can be exploited considering the exponentially growing satellite market in terms of revisit time and image resolution.
Article
At macroscales (e.g., large river basins, nations, continents, or the globe), hydro-hazard mitigation under climate change requires accurate, applicable hydrologic models, especially over cold regions of complicated hydroclimatic relations. As an effort to address such a challenge, we innovate a macroscale distributed hydro-modeling method, Bayesian principal-monotonicity inference (BaPMI), based on climate classification, representative-grids selection, statistical hydrology, surrogates-based Bayesian optimization, and variance-based sensitivity analysis. The method is applied to a typical large cold-region watershed, i.e., Athabasca River Basin in Canada, which reveals a series of findings such as the following representatives. BaPMI shows promising skills in quantitatively characterizing macroscale cold-region hydrology under overall impacts of heterogeneous climates. Bayesian optimization of which hyperparameters are insensitive decreases hydro-modeling computational time by 96.3% and, together with climate classification, largely enhances BaPMI applicability while ensuring modeling accuracies. Individual BaPMI hydro-model parameters can explain over ¾ of the variation of hydro-modeling accuracies and, due to their interaction, joint calibration is required for avoiding underestimation of the impacts. Climatic conditions dominating cross-scale uppermost hydrologic variations consist of night temperature (40%), day temperature (38%), and precipitation (22%) for all catchments; all river flows along the mainstem are associated with upstream climatic events such as glacier melts. Climatic impacts decline from upstream to downstream and from summer, spring, winter to autumn, which may relate to tempo-spatial heterogeneity of soil, vegetation, geology, human interference and other non-climatic factors. 
Without this study, macroscale distributed hydrologic modeling would lack one advanced method (i.e., BaPMI) that can immunize against effects of data uncertainty, subjective judgement and climatic collinearity, adapt to complicated hydroclimatic relations and diverse hydro-variable(s) distributions, reveal cross-scale uppermost hydrologic variations and, through inexpensive computations, enhance hydro-modeling accuracies to avoid climatic-impact underestimation. The findings of this study could facilitate BaPMI applications and advance macroscale (cold-region) hydrology.
Article
In any statistical investigation, we deal with the applications of probability theory to real problems, and the conclusions are inferences based on observations. To obtain plausible inferences, statistical analysis requires careful understanding of the underlying probabilistic model, which constrains the extraction and interpretation of information from observational data, and must be preliminarily checked under controlled conditions. However, these very first principles of statistical analysis are often neglected in favor of superficial and automatic application of increasingly available ready-to-use software, which might result in misleading conclusions, confusing the effect of model constraints with meaningful properties of the process of interest. To illustrate the consequences of this approach, we consider the emerging research area of so-called ‘compound events’, defined as a combination of multiple drivers and/or hazards that contribute to hydro-climatological risk. In particular, we perform an independent validation analysis of a statistical testing procedure applied to binary series describing the joint occurrence of hydro-climatological events or extreme values, which is supposed to be superior to classical analysis based on Pearson correlation coefficient. To this aim, we suggest a theoretically grounded model relying on Pearson correlation coefficient and marginal rates of occurrence, which enables accurate reproduction of the observed joint behavior of binary series, and offers a sound simulation tool useful for informing risk assessment procedures. Our discussion on compound events highlights the dangers of renaming known topics, using imprecise definitions and overlooking or misusing existing statistical methods. On the other hand, our model-based approach reveals that consistent statistical analyses should rely on informed stochastic modeling in order to avoid the proposal of flawed methods, and the untimely dismissal of well-devised theories.
Article
Recent advances in the study of extreme values, namely the Metastatistical Extreme Value (MEV) framework, showed good performances for the estimation of extremes in several fields. Here we adopt MEV for flood frequency analysis and leverage its intrinsic property of allowing for the choice of the distribution which best describes ordinary peaks to improve flood estimation. To this end, we develop a non-parametric approach to select ex ante the most suitable distribution of ordinary peaks between Gamma and Log-Normal. The method relies on the tail ratio, which we define as the ratio between the empirical 99th and 95th percentile of the ordinary peaks, and is tested by using daily streamflow time series from 182 gauges in Germany. Based on the value of the tail ratio index, we choose either the Gamma or the Log-Normal distributions to represent the ordinary peaks in each gauge. The approach correctly identifies the most suitable distribution of ordinary peaks in a large majority of the analyzed basins, and is robust to changes of the considered dataset. The preliminary selection of the ordinary distribution based on the tail ratio index improves the estimation of frequent and rare floods with respect to MEV applied with a single distribution not tailored on the specific statistical properties of the ordinary peaks. Finally, by comparing the developed methodology with the standard Generalized Extreme Value (GEV) distribution, we show that we are able to reduce the estimation uncertainty of high flood quantiles.
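The tail ratio described in this abstract can be illustrated with a minimal Python sketch. It only computes the ratio of the empirical 99th to 95th percentile; the percentile interpolation rule and the function names (`empirical_percentile`, `tail_ratio`) are assumptions made here for illustration, and the abstract does not state the threshold used to choose between Gamma and Log-Normal, so none is implied.

```python
import math


def empirical_percentile(values, q):
    """Empirical percentile (q in [0, 100]) using linear interpolation
    between order statistics. The interpolation rule is an assumption."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return ordered[lower] * (1.0 - frac) + ordered[upper] * frac


def tail_ratio(ordinary_peaks):
    """Ratio between the empirical 99th and 95th percentiles of the peaks."""
    return empirical_percentile(ordinary_peaks, 99) / empirical_percentile(ordinary_peaks, 95)


if __name__ == "__main__":
    # Hypothetical ordinary peaks: the integers 1..101 stand in for real data.
    peaks = list(range(1, 102))
    print(tail_ratio(peaks))
```

A larger ratio indicates a heavier upper tail of the ordinary peaks; how that value maps to a distribution choice is left to the cited study.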
Preprint
This is a working draft of a book in preparation. Current version 0.4 – uploaded on ResearchGate on 25 January 2022. (Earlier versions: 0.3 – uploaded on ResearchGate on 17 January 2022. 0.2 – uploaded on ResearchGate on 3 January 2022. 0.1 (initial) – uploaded on ResearchGate on 1 January 2022.) Some stuff is copied from Koutsoyiannis (2021, https://www.researchgate.net/publication/351081149). Comments and suggestions will be greatly appreciated and acknowledged.
Article
There has long been interest in making inferences about future low-probability natural events that have magnitudes greater than any in the past record. Given a stationary time series, the unbounded Type 1 and Type 2 asymptotic extreme value distributions are often invoked as giving theoretical justification for extrapolating to large magnitudes and long return periods for hydrological variables such as rainfall and river discharge. However, there is a problem in that environmental extremes are bounded above by the bounded nature of their causal variables. Extrapolation using unbounded asymptotic models therefore cannot be justified from extreme value theory because at some point they will over-predict future magnitudes. This leaves the apparent contradiction, for example, of annual rainfall maxima being well approximated by Type 2 extreme value distributions despite the fact of the true bounded nature of rainfall magnitudes. An alternative asymptotic extreme value approach is suggested for further investigation, with the model being the asymptotic distribution of minima (Weibull distribution) applied to block maxima reciprocals. Two examples are presented where data that is well matched by Type 1 or Type 2 extreme value distributions give reciprocals suggestive of lower bounds (upper bound γ to the original data). This corresponds to a three-parameter Weibull distribution for the reciprocals, with location parameter γ⁻¹. When this situation is demonstrated, parameter estimation can be carried out with respect to the distribution of reciprocals of three-parameter Weibull random variables. This distribution is referenced here as the bounded inverse Weibull distribution. A maximum likelihood parameter estimation methodology is presented, together with a parametric bootstrap approach for obtaining one sided upper confidence limits to γ. 
Where data permits estimation of γ, the bounded inverse Weibull distribution is suggested as an improved alternative to Type 1 or Type 2 extreme value distributions because the upper bound reality is recognised. However, extensive application to many data sets is required to evaluate the practical utility of the bounded approach for extrapolating beyond the largest recorded event.
Thesis
Review on the choice of threshold in the Peaks-over-Threshold method Application on temperatures records in Uppsala, Sweden, during the time period 1840-2012
Article
Clustering of extremes is critical for hydrological design and risk management and challenges the popular assumption of independence of extremes. We investigate the links between clustering of extremes and long-term persistence, else Hurst-Kolmogorov (HK) dynamics, in the parent process exploring the possibility of inferring the latter from the former. We find that (a) identifiability of persistence from maxima depends foremost on the choice of the threshold for extremes, the skewness and kurtosis of the parent process, and less on sample size; and (b) existing indices for inferring dependence from series of extremes are downward biased when applied to non-Gaussian processes. We devise a probabilistic index based on the probability of occurrence of peak-over-threshold events across multiple scales, which can reveal clustering, linking it to the persistence of the parent process. Its application shows that rainfall extremes may exhibit noteworthy departures from independence and consistency with an HK model.
Article
The concept of return period and its estimation are pivotal in risk management for many geophysical applications. Return period is usually estimated by inferring a probability distribution from an observed series of the random process of interest and then applying the classical equation, i.e. the inverse of the exceedance probability. Traditionally, we form a statistical sample by selecting, from the “complete” time series (e.g. at the daily scale), those values that can reasonably be considered as realizations of independent extremes, e.g. annual maxima or peaks over a certain high threshold. Such a selection procedure entails that a large number of observations are discarded; this wastage of information could have important consequences in practical problems, where the reduction of the already small size of common hydrological records significantly affects the reliability of the estimates. Under such circumstances, it is crucial to exploit all the available information. To this end, we investigate the advantages of estimating the return period without any data decimation, by using the full data-set. The proposed procedure, denoted as Complete Time-series Analysis (CTA), exploits the property that the average interarrival time (i.e. return period) of potentially damaging events is not affected by the dependence structure of the underlying process, even for cyclo-stationary (e.g. seasonal) processes. For the sake of illustration, the CTA is compared to that based on annual maxima selection, through a simple non-parametric approach, discussing advantages and limitations of the method. Results suggest that the proposed CTA approach provides a more conservative return period estimation in an holistic implementation framework within a broader range of return period values than that pertaining to other methods, which means not only the largest extremes that are the focus of extreme value theory.
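The "classical equation" mentioned in this abstract, return period as the inverse of the exceedance probability, can be sketched non-parametrically on a complete series. This is only an illustration of that equation and not the authors' CTA procedure; the function name `empirical_return_period` and the synthetic series are assumptions.

```python
import math


def empirical_return_period(series, level):
    """Return period of exceeding `level`, in units of the series time step,
    computed as 1 / P(X > level) with the empirical exceedance frequency."""
    if not series:
        raise ValueError("series must be non-empty")
    exceedances = sum(1 for value in series if value > level)
    if exceedances == 0:
        # The level is never exceeded in the record, so the estimate is unbounded.
        return math.inf
    # len(series) / exceedances equals 1 / (exceedances / len(series)).
    return len(series) / exceedances


if __name__ == "__main__":
    # Hypothetical record: 1000 time steps, 10 of which exceed the level 1.0.
    record = [0.0] * 990 + [5.0] * 10
    print(empirical_return_period(record, 1.0))
```

Dividing the result by the number of time steps per year converts it to years, for example 365.25 for a daily series.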
Article
Maximum annual daily precipitation is a fundamental hydrologic variable that does not attain asymptotic conditions. Thus the classical extreme value theory (i.e., the Fisher-Tippett's theorem) does not apply and the recurrent use of the Generalized Extreme Value distribution (GEV) to estimate precipitation quantiles for structural-design purposes could be inappropriate. In order to address this issue, we first determine the exact distribution of maximum annual daily precipitation starting from a Markov chain and in a closed analytical form under the hypothesis of stochastic independence. As a second step, we formulate a superstatistics conjecture of daily precipitation, meaning that we assume that the parameters of this exact distribution vary from a year to another according to probability distributions, which is supported by empirical evidence. We test this conjecture using the world GHCN database to perform a worldwide assessment of this superstatistical distribution of daily precipitation extremes. The performances of the superstatistical distribution and the GEV are tested against data using the Kolmogorov-Smirnov statistic. By considering the issue of model's extrapolation, that is, the evaluation of the estimated model against data not used in calibration, we show that the superstatistical distribution provides more robust estimations than the GEV, which tends to underestimate (7-13%) the quantile associated to the largest cumulative frequency. The superstatistical distribution, on the other hand, tends to overestimate (10-14%) this quantile, which is a safer option for hydraulic design. The parameters of the proposed superstatistical distribution are made available for all 20,561 worldwide sites considered in this work.
Article
Record-breaking (RB) events are the highest or lowest values assumed by a given variable, such as temperature and precipitation, since the beginning of the observation period. Research into hydroclimatic fluctuations and their link with this kind of extreme event has recently renewed the interest in RB events. However, empirical analyses of RB events usually rely on statistical techniques based on overly restrictive hypotheses, such as independent and identically distributed (i/id) random variables, or on nongeneral numerical methods. In this study, we propose some exact distributions along with accurate approximations describing the occurrence probability of RB and peak-over-threshold (POT) events under general spatiotemporal dependence, which enable analyses based on more appropriate assumptions. We show that (i) the Poisson binomial distribution is the exact distribution of the number of RB events under i/id, (ii) equivalent binomial distributions are accurate approximations under i/id, (iii) beta-binomial distributions provide the exact distribution of POT occurrences under spatiotemporal dependence, and (iv) equivalent beta-binomial distributions provide accurate approximations for the distribution of RB occurrences under spatiotemporal dependence. To perform numerical validations, we also introduce a generator of spatially and temporally correlated binary processes, called BetaBitST. As examples of application, we study RB and POT occurrences for monthly precipitation and temperature over the conterminous United States and reanalyze Mauna Loa daily temperature data. Results show that accounting for spatiotemporal dependence yields strikingly different conclusions, making the observed frequencies of RB and POT events much less surprising than expected and calling into question previous results reported in the literature.
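Result (i) of this abstract can be sketched numerically for a single series. For i/id continuous observations, the i-th value is an upper record with probability 1/i, and these record indicators are mutually independent (a classical result), so the number of records in n observations is Poisson binomial with success probabilities 1, 1/2, ..., 1/n. The function name `record_count_pmf` is hypothetical, and the sketch covers only the temporal i/id case, not the spatiotemporal dependence treated in the paper.

```python
import numpy as np

def record_count_pmf(n):
    """PMF of the number of upper records among n i/id continuous values.

    Returns an array of length n + 1 whose k-th entry is P(number of records = k).
    """
    pmf = np.array([1.0])  # zero observations: zero records with certainty
    for i in range(1, n + 1):
        p = 1.0 / i  # probability that the i-th observation is a record
        # Adding an independent Bernoulli(p) convolves the PMF with [1 - p, p].
        pmf = np.convolve(pmf, [1.0 - p, p])
    return pmf
```

The mean of this distribution is the harmonic number 1 + 1/2 + ... + 1/n, so records become rare quickly: about 5.2 are expected in 100 i/id observations.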
Article
Statistical and physically-based methods have been used for designing and assessing water infrastructure such as spillways and stormwater drainage systems. Traditional approaches assume that hydrological processes evolve in an environment where the hydrological cycle is stationary over time. However, in recent years, it has become increasingly evident that in many areas of the world this assumption may no longer apply, owing to anthropogenic and climate-induced stressors that cause nonstationary conditions. This has attracted the attention of national and international agencies, research institutions, academia, and practicing water specialists, and has led to the development of new techniques that may be useful in those cases where there is good evidence and attribution of nonstationarity. We review the various techniques proposed in the field and point out some of the challenges ahead in future developments and applications. Our review emphasizes hydrologic design to protect against extreme events such as floods and low flows.
Book
The study of the statistics of extreme events is an essential first step in the mitigation of natural catastrophes, which often cause severe economic losses worldwide. This book is about the theoretical and practical aspects of the statistics of Extreme Events in Nature. Most importantly, this is the first text in which Copulas are introduced and used in Geophysics. Several topics are fully original, and show how standard models and calculations can be improved by exploiting the opportunities offered by Copulas. In addition, new quantities useful for design and risk assessment are introduced. Practitioners in all research areas of Geosciences and extreme events (including Finance and Insurance, closely related to natural disasters) will definitely benefit from the new Copula-approach outlined in the book. Audience: This volume will be of interest to researchers and practitioners in the fields of civil and environmental engineering, geophysics, geosciences, geography and environmental science. Scientists and students in water resources and hydrology, from undergraduate to postgraduate level, will also find valuable information in this book.