Modelling Extreme Rain Accumulation with an Application to the 2011 Lake Champlain Flood

A simple strategy is proposed to model total accumulation in non‐overlapping clusters of extreme values from a stationary series of daily precipitation. Assuming that each cluster contains at least one value above a high threshold, the cluster sum S is expressed as the ratio S=M/P of the cluster maximum M and a random scaling factor P ∈ (0,1]. The joint distribution for the pair (M,P) is then specified by coupling marginal distributions for M and P with a copula. Although the excess distribution of M is well approximated by a generalized Pareto distribution, it is argued that, conditionally on P<1, a scaled beta distribution may already be sufficiently rich to capture the behaviour of P. An appropriate copula for the pair (M,P) can also be selected by standard rank‐based techniques. This approach is used to analyse rainfall data from Burlington, Vermont, and to estimate the return period of the spring 2011 precipitation accumulation which was a key factor in that year's devastating flood in the Richelieu Valley Basin in Québec, Canada.
©2019 The Authors Journal of the Royal Statistical Society: Series C (Applied Statistics)
Published by John Wiley & Sons Ltd on behalf of the Royal Statistical Society.
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which
permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used
for commercial purposes.
0035–9254/19/68000
Appl. Statist. (2019)
Jonathan Jalbert,
Polytechnique Montréal, Canada
and Orla A. Murphy, Christian Genest and Johanna G. Nešlehová
McGill University, Montréal, Canada
[Received June 2017. Final revision January 2019]
Keywords: Clusters of extremes; Copula; High precipitation; Peaks over threshold; Time
series extremes
1. Introduction
Lake Champlain is a natural freshwater lake located primarily in the north-eastern USA, whose
only outlet is the Richelieu River (Québec, Canada). In spring 2011, the lake level reached
an unprecedented height, leading to a major flood in its surroundings and in the Richelieu
Valley. The flood stage was reached on April 14th and continued for over 2 months, forcing
the evacuation of thousands of residents and causing an estimated US $100 million in damages
(International Joint Commission, 2013). As part of an effort to understand this phenomenon
and to develop appropriate mitigation solutions, it is thus of interest to estimate the return
period of catastrophic events of this magnitude.
Fig. 1 shows Lake Champlain’s annual maxima of daily water levels as measured since 1907
at the Burlington gauge station located in Vermont. The data are freely available from the
US Geological Survey database (https://waterdata.usgs.gov). The series seems to be
stationary; for example, the p-value of the Mann–Kendall test is about 0.54. To see whether
the lake’s 2011 historical high of 31.45 m could be predicted from this record, one could fit
a generalized extreme value (GEV) distribution to the annual maxima from the period 1907–
2010 spanning 104 years. Recall that the GEV distribution is the limiting distribution of properly
normalized sample maxima whose distribution function $H_{\mu,\sigma,\xi}$ with location $\mu \in \mathbb{R}$, scale $\sigma > 0$ and shape $\xi \in \mathbb{R}$ is given, for all $z \in \mathbb{R}$, by
$$
H_{\mu,\sigma,\xi}(z) =
\begin{cases}
\exp\left[-\left\{1 + \xi (z - \mu)/\sigma\right\}^{-1/\xi}\right], & \xi \neq 0 \text{ and } 1 + \xi (z - \mu)/\sigma > 0, \\
\exp\left[-\exp\left\{-(z - \mu)/\sigma\right\}\right], & \xi = 0.
\end{cases}
$$
For background on this class of distributions, see, for example, Coles (2001). The maximum likelihood estimates of the parameters are $\hat{\mu} = 30.239$, $\hat{\sigma} = 0.392$ and $\hat{\xi} = -0.440$. Because $\hat{\xi}$ is negative, the fitted GEV distribution has a finite upper end point whose estimate is 31.13 m. At 31.45 m, the 2011 peak water level thus lies beyond the 95% confidence interval for this upper end point, i.e. (30.048, 31.419). Based on this classical GEV analysis, Lake Champlain’s 2011 water level maximum seems nearly impossible to predict from past lake level records. This is thus a ‘Black Swan’ in the sense of Taleb (2007).
Fig. 1. Lake Champlain’s annual maxima of daily water level at the Burlington gauge station in Vermont (axes: year; lake level in m)
Address for correspondence: Jonathan Jalbert, Département de mathématiques et de génie industriel, École Polytechnique de Montréal, CP 6179, Succursale Centre-ville, Montréal, Québec, H3C 3A7, Canada.
E-mail: jonathan.jalbert@polymtl.ca
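As a quick arithmetic check on the GEV figures quoted in the Introduction, the following minimal sketch evaluates the GEV distribution function and its upper end point $\mu - \sigma/\xi$ (finite only when $\xi < 0$) at the reported maximum likelihood estimates. The function names are illustrative and are not taken from the authors’ code.

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function H_{mu,sigma,xi}(z)."""
    if xi == 0.0:
        return math.exp(-math.exp(-(z - mu) / sigma))
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0.0:
        # Outside the support: below the lower end point when xi > 0,
        # above the upper end point when xi < 0.
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_upper_endpoint(mu, sigma, xi):
    """Upper end point mu - sigma / xi, which is finite only when xi < 0."""
    return mu - sigma / xi if xi < 0.0 else math.inf

# Maximum likelihood estimates reported in the text.
mu_hat, sigma_hat, xi_hat = 30.239, 0.392, -0.440

endpoint = gev_upper_endpoint(mu_hat, sigma_hat, xi_hat)
prob_below_2011_level = gev_cdf(31.45, mu_hat, sigma_hat, xi_hat)

print(round(endpoint, 2))        # 31.13
print(prob_below_2011_level)     # 1.0, i.e. 31.45 m lies beyond the fitted support
```

Under the fitted model the 2011 level of 31.45 m therefore has exceedance probability 0, which is why no finite return period can be attached to it.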
The inability of the GEV model to predict Lake Champlain’s 2011 flood is not surprising. In
this northern watershed, the maximum water level is observed during snow melt, which always
occurs between April and June. The yearly maximum is thus taken over this period, which
comprises only 91 days. In addition, the daily water levels exhibit strong auto-correlation, as
illustrated for spring 2011 in Fig. 2(a); this further reduces the effective block size on which
the asymptotic theory relies.
To estimate the return period of Lake Champlain’s spring 2011 flood, we focus instead
on daily precipitation, which is the most critical factor influencing floods in this watershed.
Using a hydrological model, Riboust and Brissette (2016) could indeed show that, although the
spring freshet in northern watersheds is typically the result of the snow melt and concurrent
precipitation, the snowpack played a minor role in Lake Champlain’s spring 2011 flood. For
example, the largest snowpack was actually recorded during the spring of 2008, and yet it was
not an unusual year for the annual water level maximum (see Fig. 1). Combining the 2008
snowpack observations with the 2011 precipitation series in their model, Riboust and Brissette
(2016) found that the simulated flood was not much larger than the actual 2011 flood. They also
noted that the temperature that was recorded that spring did not play a major role.
Fig. 2. (a) Daily Lake Champlain water levels for spring 2011 and (b) spring rainfall accumulations from 1884 to 2011 at Burlington, Vermont
A boxplot of the annual spring accumulation recorded in Burlington, Vermont, is shown in
Fig. 2(b). The data are for the years 1884–2011; the 2011 value, which is marked by a cross, is
an obvious outlier. Again, a simple approach would be to fit a generalized Pareto distribution
(GPD) to the tail of the observed spring accumulations. On the basis of standard tools, the
threshold can be fixed at the 75th percentile (u=278 mm). We then find 32 exceedances between
1884 and 2010, inclusively. The maximum likelihood estimates for the GPD parameters are
$\hat{\sigma} = 63.364$ and $\hat{\xi} = -0.278$. Because $\hat{\xi}$ is again negative, the fitted GPD has a finite upper end
point whose estimate is 506.2 mm. At 510 mm, the spring 2011 accumulation thus lies beyond
the support of the fitted model and the return period is undetermined. The 95% confidence
interval for the upper end point of the GPD is (261.6, 750.9).
Fig. 3. Daily precipitation at Burlington Airport and the daily Lake Champlain water levels for spring 2011: broken line, 95th centile of non-zero precipitation (u=21.6 mm) observed between April and June from 1884 to 2010
To motivate an alternative approach, consider Fig. 3, which shows the daily precipitation
recorded at the Burlington Airport station between April and June 2011. The red broken line is
the 95th centile of non-zero precipitation (u=21.6 mm) observed during these 3 months over
the entire record, which extends from 1884 to 2010. As can be seen, this threshold was exceeded
on 8 days marked by blue asterisks between April and June 2011. Also highlighted in blue in this
picture are five clusters of high precipitation, defined here as streaks of consecutive rainy days
containing at least one exceedance above the threshold u=21.6 mm. In two cases, an extreme
rainfall was preceded by a day of medium rainfall that was due to the same weather system.
Comparing with Fig. 2(a), we can see that the lake level rose sharply following the 4-day cluster
that cumulated a total of 103 mm of precipitation, and that it only began to sink gradually after
the heavy spring rains had passed. This tendency of threshold exceedances to occur in streaks
can actually be observed in the entire Burlington spring daily precipitation series. Between 1884
and 2010, there were 233 daily exceedances of u=21.6 mm, only 48 of which were isolated events
with no rain either on the previous day or the next. Because the total accumulation per streak
can be much larger than a given exceedance, a proper assessment of accumulation thus requires
modelling clusters of high precipitation.
In this paper, we propose an extension of the peaks-over-threshold (POT) approach to model
accumulations within clusters of high precipitation. Whereas the classical POT model considers
only the frequency and severity of cluster maxima, rain accumulation in each cluster is needed
to assess flood risk properly. The new model scales up each cluster maximum by a possibly
dependent random factor. The dependence between the cluster maximum and the scaling factor
is modelled through a copula. As we demonstrate with the Burlington precipitation data, this
random-scale model is simple to implement and leads to a realistic estimate of the return period
of the spring 2011 flood, which could not easily be done, either with the standard approaches
that were described above or more advanced techniques that are reviewed in Section 5.
The rest of the paper is organized as follows. The new random-scale model is presented and
motivated in Section 2. The model is then fitted to the Burlington precipitation data in Section 3.
In Section 4, the return period of Lake Champlain’s spring 2011 flood is computed by using
only the precipitation as the proxy for flood. Comparisons with existing models are discussed in
Section 5. Conclusions are presented in Section 6. Appendix A reports the results of a small-scale
simulation study. Note that the code used for estimating the random-scale model is available
from https://github.com/jojal5.
2. Random-scale model for cluster accumulation
Let $Y_1, Y_2, \ldots$ be a stationary time series of non-negative measurements. In the present context,
these values will represent daily precipitations and will be called as such in what follows, but of
course the model can be used for other types of data as well. Suppose that $n$ clusters of high
precipitation, say $C_1, \ldots, C_n$, were identified by using some high threshold $u$. The exact cluster
definition is not important at this point; it is only assumed that each cluster contains at least one
exceedance, that every exceedance belongs to a cluster and that the clusters are non-overlapping.
2.1. Model description
For each $i \in \{1, \ldots, n\}$, let $\mathbf{Y}_i = (Y_j : j \in C_i)$ be the vector of daily precipitation amounts corresponding to cluster $C_i$. Let also $M_i$ and $S_i$ respectively denote the maximum daily precipitation and total precipitation in cluster $C_i$, i.e.
$$
M_i = \|\mathbf{Y}_i\|_\infty = \max(Y_j, j \in C_i), \qquad S_i = \sum_{j \in C_i} Y_j.
$$
Further let $L_i = |C_i|$ be the size of $C_i$ and $P_i = M_i / S_i$ denote the ratio of the cluster maximum to the cluster sum. The quantity $L_i P_i$ is often referred to as the peak-to-average ratio in engineering; see, for example, Morrison and Tobias (1965). For this reason, we propose to call $P_i$ the peak-to-sum ratio associated with cluster $C_i$.
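The cluster summaries just defined are simple to compute. The sketch below, in which the function name and the three daily amounts are hypothetical, returns the cluster maximum, sum, length and peak-to-sum ratio, and checks that the product of length and peak-to-sum ratio equals the peak-to-average ratio.

```python
import math

def cluster_summary(values):
    """Return (M, S, L, P) for one cluster of strictly positive daily amounts."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("a cluster must contain only strictly positive amounts")
    m = max(values)        # cluster maximum M
    s = sum(values)        # cluster sum S
    length = len(values)   # cluster length L
    p = m / s              # peak-to-sum ratio P, always in [1/L, 1]
    return m, s, length, p

# Hypothetical 3-day cluster (amounts in mm), for illustration only.
m, s, length, p = cluster_summary([5.0, 30.0, 15.0])
peak_to_average = m / (s / length)

print(m, s, length, p)                             # 30.0 50.0 3 0.6
print(math.isclose(length * p, peak_to_average))   # True
```

The cluster sum is recovered from the pair as `m / p`, which is the representation on which the random-scale model rests.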
We regard $(M_1, P_1), \ldots, (M_n, P_n)$ as mutually independent copies of a pair $(M, P)$ corresponding to a generic cluster $C$ of length $L$. The assumption of independence between clusters is motivated by theorem 4.5 of Hsing (1987). We then seek a joint distribution for $(M, P)$ given $M > u$, from which the cluster sum $S$ can be recovered as $S = M/P$.
For this, first note that $P = 1$ when $L = 1$, in which case the distribution of $P$ is degenerate. Let $\omega_u = \Pr(P = 1 \mid M > u)$ and $F_u$ be the excess distribution of $M$, i.e. the conditional distribution function of $M - u$ given $M > u$. Assume that, for all $m \in [0, \infty)$, we have $\Pr(M - u \le m \mid M > u, P = 1) = F_u(m)$, which also implies that $\Pr(M - u \le m \mid M > u, P < 1) = F_u(m)$ for all $m \in [0, \infty)$. The expression
$$
\Pr(M - u \le m, P \le p \mid M > u) = \omega_u \mathbf{1}(p = 1) F_u(m) + (1 - \omega_u) \Pr(M - u \le m, P \le p \mid M > u, P < 1) \tag{1}
$$
is then valid for all $m \in [0, \infty)$ and $p \in (0, 1]$. Let also $G_u$ denote the distribution of $P$ given $M > u$ and $P < 1$. We then call on Sklar’s representation theorem (Nelsen, 2006) to write, for all $m \in [0, \infty)$ and $p \in (0, 1)$,
$$
\Pr(M - u \le m, P \le p \mid M > u, P < 1) = D\{F_u(m), G_u(p)\} \tag{2}
$$
in terms of a copula $D$, i.e. a joint cumulative distribution function having standard uniform margins $\mathcal{U}(0, 1)$. Equations (1) and (2) together imply that the marginal distributions are respectively given, for all $m \in (0, \infty)$ and $p \in (0, 1]$, by
$$
\Pr(M - u \le m \mid M > u) = F_u(m),
$$
$$
\Pr(P \le p \mid M > u) = \omega_u \mathbf{1}(p = 1) + (1 - \omega_u) G_u(p).
$$
The random-scale model is then specified by selecting suitable classes of univariate distributions for $F_u$ and $G_u$, as well as a family of bivariate copulas for $D$. These issues are addressed in turn in Sections 2.2 and 2.3.
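For convenience, the step from equations (1) and (2) to the two margins can be spelled out as follows. This is only a sketch: it uses nothing beyond the boundary properties $D(v, 1) = D(1, v) = v$ of a copula, the fact that $G_u(1) = 1$, and the limit $F_u(m) \to 1$ as $m \to \infty$. Setting $p = 1$ in equation (1) gives the first line, and letting $m \to \infty$ gives the second.

```latex
\begin{align*}
\Pr(M - u \le m \mid M > u)
  &= \omega_u F_u(m) + (1 - \omega_u)\, D\{F_u(m), 1\}
   = \omega_u F_u(m) + (1 - \omega_u)\, F_u(m)
   = F_u(m), \\
\Pr(P \le p \mid M > u)
  &= \omega_u \mathbf{1}(p = 1) + (1 - \omega_u)\, D\{1, G_u(p)\}
   = \omega_u \mathbf{1}(p = 1) + (1 - \omega_u)\, G_u(p).
\end{align*}
```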
2.2. Choice of marginal distributions
To choose a model for the excess distribution $F_u$, recall from the Pickands–Balkema–de Haan theorem that, if the univariate marginal distribution of the underlying time series is in the domain of attraction of an extreme value distribution, $F_u$ is well approximated by the GPD with scale $\sigma_u > 0$ and shape $\xi \in \mathbb{R}$, i.e., for all $m \in (0, \infty)$,
$$
\Pr(M - u \le m \mid M > u) \approx F_{\sigma_u, \xi}(m) =
\begin{cases}
1 - (1 + \xi m / \sigma_u)^{-1/\xi}, & \xi \neq 0 \text{ and } 1 + \xi m / \sigma_u > 0, \\
1 - \exp(-m / \sigma_u), & \xi = 0.
\end{cases}
$$
As will be seen in Section 3, the GPD approximation works well for the Burlington precipitation data.
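The GPD distribution function above and its inverse are easy to code directly. The following minimal sketch uses illustrative function names and hypothetical parameter values, except for the last line, which reuses the spring accumulation fit quoted in the Introduction; the inverse is what one would use to simulate excesses by inversion.

```python
import math

def gpd_cdf(m, sigma, xi):
    """GPD distribution function F_{sigma,xi}(m) for an excess m."""
    if m <= 0.0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-m / sigma)
    t = 1.0 + xi * m / sigma
    if t <= 0.0:
        # Only possible when xi < 0: m is beyond the upper end point -sigma/xi.
        return 1.0
    return 1.0 - t ** (-1.0 / xi)

def gpd_quantile(q, sigma, xi):
    """Inverse of gpd_cdf for a probability q in [0, 1)."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must lie in [0, 1)")
    if xi == 0.0:
        return -sigma * math.log1p(-q)
    return sigma / xi * ((1.0 - q) ** (-xi) - 1.0)

# Hypothetical parameters, for illustration only.
excess_90 = gpd_quantile(0.9, sigma=10.0, xi=0.1)
print(round(excess_90, 2))                          # 25.89
print(round(gpd_cdf(excess_90, 10.0, 0.1), 6))      # 0.9

# With the negative shape reported for the spring accumulations, any excess
# beyond the upper end point has probability 1 of not being exceeded.
print(gpd_cdf(1000.0, 63.364, -0.278))              # 1.0
```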
To find a suitable distribution $G_u$ for $P$ given $M > u$ and $P < 1$, first note that, if a generic cluster $C$ contains no 0s almost surely (as in our application), then the events $\{P < 1\}$ and $\{L > 1\}$ are equal almost surely and thus $G_u$ is the distribution function of $P$ given $M > u$ and $L > 1$. Second, $G_u$ clearly depends on $L$ because we always have $S \le LM$ and hence $P \in [1/L, 1]$. Given $L = l \in \{2, 3, \ldots\}$, a convenient choice for the conditional density of $P$ would be defined, for all $p \in (1/l, 1)$, by
$$
f_{(P \mid M > u, L = l)}(p) = \mathcal{B}_{(1/l, 1)}(p \mid \alpha_{l,u}, \beta_{l,u}),
$$
where $\mathcal{B}_{(\theta, 1)}(p \mid \alpha, \beta)$ denotes the density of the random variable $(1 - \theta) X + \theta$, where $X$ has a $\mathcal{B}(\alpha, \beta)$ beta distribution.
To construct $G_u$, we could thus use a hierarchical model in which the cluster length $L$ is modelled at the first level and the above conditional distribution for $P$ given $L$ is used at the second level. The distribution of the cluster length is generally cumbersome and, more importantly, depends on the way in which the clusters are defined. For example, in Markovich (2014), a geometric-like distribution involving the extremal index is proposed for the number of consecutive threshold exceedances; the case where the extremal index is 0 was considered in Markovich (2017). However, these results are not applicable to clusters that can also include non-exceedances, as in our application.
To circumvent having to model cluster length, we advocate here a simpler solution that happens to work well for the Burlington precipitation data, as we demonstrate in Section 3. Specifically, we propose to model $G_u$ directly with the $\mathcal{B}_{(\theta_u, 1)}(p \mid \alpha_u, \beta_u)$ distribution, where $\theta_u \in (0, 1)$ is an additional parameter that accounts for the variable cluster length. This proposal effectively pools all clusters of length $l > 1$ without imposing any upper bound on cluster length. Although $\theta_u$ does not have a direct interpretation in terms of cluster length, small values of this parameter are indicative of the presence of long clusters with several large values. The fitted scaled beta distribution can moreover be used to make probabilistic statements of the following kind. If $l$ is an integer such that $1/l > \theta_u$, then $L > l$ with probability at least $\Pr(P < 1/l)$. This is because $P \ge 1/L$, so $P < 1/l$ implies that $L > l$.
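To make the last statement concrete, the sketch below evaluates the scaled beta distribution function by a plain midpoint rule (an illustrative numerical choice, adequate when both shape parameters are at least 1) and computes the lower bounds $G_u(1/l)$ on the conditional probability that $L > l$ given $P < 1$. The parameter values are the point estimates reported in Section 3.5; the function names are hypothetical.

```python
import math

def scaled_beta_pdf(p, theta, alpha, beta):
    """Density of (1 - theta) * X + theta, where X ~ Beta(alpha, beta)."""
    if not theta < p < 1.0:
        return 0.0
    x = (p - theta) / (1.0 - theta)
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_const = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    log_kernel = (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x)
    return math.exp(log_const + log_kernel) / (1.0 - theta)

def scaled_beta_cdf(p, theta, alpha, beta, steps=20000):
    """Distribution function obtained by midpoint-rule integration of the density."""
    if p <= theta:
        return 0.0
    if p >= 1.0:
        return 1.0
    h = (p - theta) / steps
    return h * sum(
        scaled_beta_pdf(theta + (k + 0.5) * h, theta, alpha, beta)
        for k in range(steps)
    )

# Point estimates reported in Section 3.5.
theta_hat, alpha_hat, beta_hat = 0.205, 1.92, 1.14

# Admissible integers l satisfy 1/l > theta_hat, i.e. l = 2, 3, 4 here.
bounds = {
    l: scaled_beta_cdf(1.0 / l, theta_hat, alpha_hat, beta_hat)
    for l in (2, 3, 4)
}
for l, bound in bounds.items():
    print(f"Pr(L > {l} | P < 1) is at least {bound:.3f}")
```

Multiplying each bound by $1 - \omega_u$ would give the corresponding statement without conditioning on $P < 1$.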
2.3. Choice of dependence structure
Finally, a parametric copula family must be chosen for $D$. Although elements of theory that could inform this choice are scant, a few things can be said. For example, suppose that a generic cluster $C$ has length $l$ and that the vector $\mathbf{Y}$ of elements of $C$ is multivariate regularly varying (Resnick, 1987). This implies that if ‘$\|\cdot\|$’ denotes the max-norm, then there is a real $\eta > 0$ and a probability distribution $\varsigma$ on the unit simplex $\{\mathbf{x} \in [0, 1]^l : \|\mathbf{x}\| = 1\}$ such that
$$
\frac{\Pr(\|\mathbf{Y}\| > yt, \ \mathbf{Y}/\|\mathbf{Y}\| \in \cdot)}{\Pr(\|\mathbf{Y}\| > t)} \xrightarrow{w} y^{-\eta} \varsigma(\cdot) \tag{3}
$$
for all $y > 0$ as $t \to \infty$, where ‘$\xrightarrow{w}$’ denotes weak convergence. In view of corollary 5.18 in Resnick (1987), $\mathbf{Y}$ is then in the domain of attraction of a multivariate extreme value distribution and $M = \|\mathbf{Y}\|$ is in the domain of attraction of the Fréchet distribution with parameter $\eta$. More to the point, expression (3) implies that, if $M > u$ for some high threshold $u$, $M$ and $\mathbf{Y}/M$ are nearly independent. Thus, given $M > u$, we also have approximate independence between $M$ and $P$. We can then take $D$ in equation (2) to be the product copula $\Pi$ defined, for all $u, v \in [0, 1]$, by $\Pi(u, v) = uv$. As explained below, if $\mathbf{Y}$ is regularly varying, the tail of $S$ is correctly specified in the random-scale model with $D = \Pi$.
Remark 1. Let $\mathbf{Y} = (Y_1, \ldots, Y_l) \in \mathbb{R}^l$ be a multivariate regularly varying random vector with non-negative components. Set $M = \max(Y_1, \ldots, Y_l)$ and $S = Y_1 + \cdots + Y_l$. Then there is a Radon measure $Q$ on $\mathbb{R}^l \setminus \{\mathbf{0}\}$ such that $\Pr(\mathbf{Y}/t \in \cdot)/\Pr(M > t) \xrightarrow{v} Q$ as $t \to \infty$, where ‘$\xrightarrow{v}$’ refers to vague convergence. In view of lemma 3.9 in Jessen and Mikosch (2006), we have
$$
\lim_{t \to \infty} \frac{\Pr(S > t)}{\Pr(M > t)} = \kappa \equiv Q\{(x_1, \ldots, x_l) \in (0, \infty)^l : x_1 + \cdots + x_l > 1\}.
$$
Given that $\Pr(S > t) \ge \Pr(M > t)$, we have $\kappa \ge 1$ and hence $S$ and $M$ are tail equivalent; in fact, they are both in the domain of attraction of the Fréchet distribution with the same shape parameter. This tail equivalence between $S$ and $M$ is preserved when $S = M/P$ with $P$ independent of $M$, provided that $M$ is in the domain of attraction of the Fréchet distribution with shape parameter $\eta$ and $E(1/P^{\eta + \epsilon}) < \infty$ for some real $\epsilon > 0$. This result, which follows from Breiman’s lemma (Jessen and Mikosch (2006), lemma 4.2), holds in particular when $P$ is bounded from below.
Multivariate regular variation is not the only scenario under which the independence copula $\Pi$ may be a suitable choice for $D$ when $u$ is sufficiently high. Suppose for example that the vector $\mathbf{Y}$ admits the representation
$$
(Y_1, \ldots, Y_l) = R \times (Z_1, \ldots, Z_l) / \max(Z_1, \ldots, Z_l),
$$
where $Z_1, \ldots, Z_l$ and $R$ are mutually independent strictly positive random variables, and $Z_1, \ldots, Z_l$ are identically distributed. By construction, we then have independence between $M = \|\mathbf{Y}\| = R$ and
$$
P = M/(Y_1 + \cdots + Y_l) = \max(Z_1, \ldots, Z_l)/(Z_1 + \cdots + Z_l).
$$
In this construction, the distribution of $R$ can be arbitrary; in particular, it need not be heavy tailed. The simulation study that is reported in Appendix A suggests that $D = \Pi$ also holds (at least approximately) in other settings involving vectors $\mathbf{Y}$ whose components have light-tailed distributions and are asymptotically independent, provided that a sufficiently high threshold is selected. However, the simulation study also reveals that there are cases in which $D = \Pi$ is a poor choice.
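The construction $\mathbf{Y} = R \times \mathbf{Z}/\max(\mathbf{Z})$ can be checked numerically. In the sketch below the distributions of $R$ and of the $Z_j$ are arbitrary illustrative choices (a shifted exponential and a log-normal), not those of Appendix A. The code verifies that the cluster maximum coincides with $R$, that $P$ depends on $\mathbf{Z}$ only, and that the sample correlation between $M$ and $P$ is close to 0; a vanishing correlation is of course only a necessary consequence of independence.

```python
import math
import random

def simulate_cluster(length, rng):
    """Draw Y = R * Z / max(Z) with R independent of Z (illustrative laws)."""
    r = 20.0 + rng.expovariate(0.1)                               # light-tailed R
    z = [rng.lognormvariate(0.0, 1.0) for _ in range(length)]     # positive, i.i.d.
    z_max = max(z)
    y = [r * (zi / z_max) for zi in z]
    return r, z, y

def pearson(a, b):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (w - mean_b) for x, w in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((w - mean_b) ** 2 for w in b)
    return cov / math.sqrt(var_a * var_b)

rng = random.Random(0)
maxima, ratios = [], []
max_equals_r, ratio_from_z_only = True, True
for _ in range(20000):
    r, z, y = simulate_cluster(3, rng)
    m, p = max(y), max(y) / sum(y)
    max_equals_r &= (m == r)
    ratio_from_z_only &= math.isclose(p, max(z) / sum(z))
    maxima.append(m)
    ratios.append(p)

correlation = pearson(maxima, ratios)
print(max_equals_r, ratio_from_z_only, abs(correlation) < 0.05)
```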
If the hypothesis of independence between $M$ and $P$ is rejected, a suitable copula family for $D$ can be chosen, fitted and validated by using rank-based techniques, as described, for example, in Genest and Favre (2007) or Genest and Nešlehová (2012). Because $D$ is bivariate, there is a wealth of models to tap into. In the example detailed in Appendix A, the asymmetric Gumbel (or logistic) family appears to be a suitable choice. As will be seen in Section 3, however, the independence assumption seems reasonable for the Burlington precipitation data.
3. Application to the Burlington precipitation data
To illustrate the use of the random-scale model proposed here, it will now be fitted to the
precipitation series measured at Burlington before 2011. The model will then be used in Section
4 to estimate the return period of the 2011 flood.
3.1. Data description
Daily precipitation in millimetres was considered for the months of April–June, for the period
1884–2010. The data were extracted from the web site of the National Centers for
Environmental Information of the US National Oceanic and Atmospheric Administration; see
https://www.ncdc.noaa.gov/. For the period 1884–1943, we used the measurements
that were taken at a weather station 3 km from the airport in Burlington, Vermont. As this
station was then closed, we resorted to data that were collected at the airport itself for the years
1944–2010. To justify pooling the two series, we checked that the years 1943 and 1944 were not
change points in the combined series of annual maxima. We also tested the stationarity of this
series and its two subseries. In particular, the p-values of the Mann–Kendall test were 0.52, 0.53
and 0.24 for the pooled series and the first and second subseries respectively.
The stationarity of the total spring accumulations before 2011 was also checked by using the
Mann–Kendall trend test (p-value 0.048) and the stationarity test of Priestley and Subba Rao
(1969), whose p-value was 0.475. Moreover, we investigated the stationarity of the non-extreme
and extreme accumulations separately, i.e. the accumulation due to precipitation excluding the
clusters of high precipitation, and accumulation stemming from clusters of high precipitation
only. The Mann–Kendall test did not reject the hypothesis of no trend at the 5%
level in either case; the p-value was 0.072 for non-extreme accumulations and 0.479 for extreme
accumulations.
3.2. Cluster definition
Before the random-scale model can be fitted to the Burlington data, non-overlapping clusters
of high precipitation must be constructed. This requires the selection of a high threshold $u$ and
a cluster definition which ensures that each cluster contains at least one exceedance above $u$,
and that each exceedance belongs to one and only one cluster.
After considering different options, we defined a cluster of high precipitation as a streak (i.e.
an uninterrupted sequence) of consecutive days with non-zero precipitation containing at least
one value above a high threshold u. This way, each cluster is then separated from any other by at
least 1 day without rain. This definition leads to somewhat different clusters from the classical
runs method (O’Brien, 1987; Smith and Weissman, 1994), which puts threshold exceedances
in the same cluster unless they are separated by at least $r$ non-exceedances. An advantage of
the present cluster definition is that it allows clusters of high precipitation to start or end with
a non-exceedance. This is convenient because rainfall that is associated with a given weather
system may intensify gradually. This was so for four of the five clusters in spring 2011, as can
be seen in Fig. 3.
Fig. 4. Daily precipitation series at Burlington, Vermont, with the 95th centile of non-zero daily precipitation amounts (u=21.6 mm)
Using the 95th centile of non-zero daily precipitation amounts u=21.6 mm as the threshold,
there were 233 exceedances between 1884 and 2010. The series is displayed in Fig. 4, along with
the threshold. There were 220 clusters of high precipitation as per our definition; 208 contained
one exceedance, 11 contained two, and one contained three. There were 48, 65, 44 and 16 clusters
of length 1, 2, 3 and 4 respectively; the largest cluster was of size 14.
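The cluster definition adopted above translates into a few lines of code. The following is a minimal sketch with a hypothetical function name and a made-up daily series; it assumes that the series passed in covers a single spring, so that streaks never span two different years.

```python
def find_clusters(daily, u):
    """Return the streaks of consecutive wet days (amount > 0) that contain
    at least one amount strictly above the threshold u."""
    clusters, streak = [], []
    for amount in daily:
        if amount > 0:
            streak.append(amount)
        else:
            # A dry day closes the current streak, if any.
            if streak and max(streak) > u:
                clusters.append(streak)
            streak = []
    if streak and max(streak) > u:   # streak still open at the end of the series
        clusters.append(streak)
    return clusters

# Made-up daily amounts (mm), for illustration only.
daily = [0, 3.0, 25.0, 0, 0, 30.0, 0, 5.0, 2.0, 0, 12.0, 40.0, 22.0, 1.0]
clusters = find_clusters(daily, u=21.6)

print(clusters)                    # [[3.0, 25.0], [30.0], [12.0, 40.0, 22.0, 1.0]]
print([len(c) for c in clusters])  # [2, 1, 4]
```

In this toy series the streak `[5.0, 2.0]` is discarded because it contains no exceedance, whereas the last cluster contains two exceedances and both starts and ends with a non-exceedance.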
As a preliminary step, the pairs $(P_1, S_1), \ldots, (P_{220}, S_{220})$ are visualized in Fig. 5(a). The clusters
of length 1 correspond to the 48 points on the vertical line $P = 1$. The rank plot of the pairs
$(P_1, S_1), \ldots, (P_{220}, S_{220})$ in Fig. 5(b) clearly exhibits negative association between $P$ and $S$, and
in particular the clumping of points in the top left-hand corner. These points correspond to
clusters with a high precipitation accumulation but a small peak-to-sum ratio associated with
potentially dangerous weather systems with several days of heavy rain.
3.3. Choice of dependence structure
Fig. 6 shows the rank plot derived from the 172 pairs $(M_i, P_i)$ of cluster maxima and peak-to-sum ratios for which $P_i < 1$. We cannot discern any particular pattern in Fig. 6, which suggests that the assumption of independence between $M$ and $P$ given $P < 1$ is appropriate at threshold level u=21.6 mm. This conclusion is further supported by a p-value of 0.76 for the consistent Cramér–von Mises test of independence based on the $L_2$-distance between the product copula $\Pi$ and an asymptotically unbiased rank-based estimate of the true underlying copula $D$; for details about this test, which is available in the R package copula, see Genest and Rémillard (2004). In contrast, modelling the dependence between $P$ and $S$ would be much more challenging, as evidenced by the rank plot in Fig. 5(b).
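For readers who wish to reproduce a rank plot, the sketch below computes scaled ranks (ranks divided by $n + 1$, one common convention that is assumed here) and Kendall’s tau as a simple rank-based summary of association. It is not the Cramér–von Mises test used in the paper; it assumes no ties, and the four data pairs are made up.

```python
def scaled_ranks(x):
    """Ranks of x divided by n + 1 (assumes there are no ties)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return [rank / (n + 1) for rank in ranks]

def kendall_tau(x, y):
    """Kendall's tau: (concordant - discordant pairs) / total pairs, no ties."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return 2.0 * score / (n * (n - 1))

# Made-up cluster maxima (mm) and peak-to-sum ratios, for illustration only.
maxima = [22.0, 35.5, 28.1, 60.2]
ratios = [0.9, 0.4, 0.7, 0.3]

print(scaled_ranks(maxima))          # [0.2, 0.6, 0.4, 0.8]
print(scaled_ranks(ratios))          # [0.8, 0.4, 0.6, 0.2]
print(kendall_tau(maxima, ratios))   # -1.0
```

Plotting one list of scaled ranks against the other gives the rank plot; under independence the points should scatter evenly over the unit square and tau should be close to 0.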
Fig. 5. (a) Scatter plot and (b) rank plot of the pairs $(P_1, S_1), \ldots, (P_{220}, S_{220})$ of peak-to-sum ratios and cluster sums
Fig. 6. Rank plot derived from the 172 pairs $(M_i, P_i)$ of cluster maxima and peak-to-sum ratios for which $P_i < 1$
3.4. Bayesian fitting of the distribution of cluster maxima
As stated in Section 2.2, suppose that the excess distribution $F_u$ of cluster maxima is a GPD with scale $\sigma > 0$ and shape $\xi \in \mathbb{R}$. Further assume an improper prior for these parameters given, for all $\sigma > 0$ and $\xi \in \mathbb{R}$, by $f_{(\sigma, \xi)}(\sigma, \xi) \propto 1/\sigma$. Note that this prior yields a proper posterior as long as the sample size is greater than 2 (Northrop and Attalides, 2016), which is the case here. Bayesian estimates and associated 95% credible intervals for the parameters are then given by
$$
\hat{\sigma} = 8.6086 \ (7.1258, 10.2472), \qquad \hat{\xi} = 0.0630 \ (0.0464, 0.2056).
$$
The Bayesian QQ-plot displayed in Fig. 7(a) suggests an adequate fit, though the most extreme precipitation observation is underestimated. To check the adequacy of this model further, the fitted distribution $F_u$ was used to estimate the return period of the extreme rainfall of 69.6 mm that occurred on April 26th, 2011; the estimate is 66 years. This may seem low, but it does make good sense given that rainfalls of similar (or even higher) magnitude had already been recorded in the past.
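A rough plug-in version of this return period calculation can be sketched as follows. It is not the Bayesian predictive computation behind the reported 66 years: it assumes that the return period (in years) is the reciprocal of the yearly cluster rate multiplied by the GPD probability of exceeding the level, it takes the empirical rate of 220 clusters in 127 springs, and it plugs in the point estimates of $\sigma$ and $\xi$.

```python
def gpd_survival(m, sigma, xi):
    """Pr(M - u > m | M > u) under the GPD, for xi != 0 and m inside the support."""
    return (1.0 + xi * m / sigma) ** (-1.0 / xi)

u = 21.6                            # threshold (mm)
sigma_hat, xi_hat = 8.6086, 0.0630  # point estimates reported in the text
clusters_per_year = 220 / 127       # empirical rate (plug-in assumption)

level = 69.6                        # rainfall of April 26th, 2011 (mm)
p_exceed = gpd_survival(level - u, sigma_hat, xi_hat)
return_period = 1.0 / (clusters_per_year * p_exceed)

print(round(p_exceed, 4))           # about 0.0084
print(round(return_period))         # about 69 years
```

The plug-in value of about 69 years is of the same order as the reported 66 years; some difference is expected because the sketch ignores parameter uncertainty.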
3.5. Bayesian fitting of the peak-to-sum ratio
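When the scaled beta distribution for $G_u$ is used, the marginal distribution of $P$ given $M > u$ is the 1-inflated scaled beta distribution defined, for all $p \in [0, 1]$, by
$$
\mathcal{IB}(p \mid \omega, \theta, \alpha, \beta) = \omega \, \delta_{\{1\}}(p) + (1 - \omega) \, \mathcal{B}_{(\theta, 1)}(p \mid \alpha, \beta),
$$
where $\delta_{\{1\}}$ denotes a Dirac mass at 1. To fit this distribution, it was first reparameterized by setting $\nu = \alpha/(\alpha + \beta)$ and $\gamma = \alpha + \beta$, so that the following non-informative priors could be used:
$$
\begin{aligned}
f_\omega(\omega) &\propto \omega^{-1} (1 - \omega)^{-1}, & \omega &\in (0, 1); \\
f_\theta(\theta) &= 1, & \theta &\in (0, 1); \\
f_\nu(\nu) &\propto 1, & \nu &\in (0, 1); \\
f_\gamma(\gamma) &\propto 1/\gamma, & \gamma &\in (0, \infty).
\end{aligned}
$$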
Fig. 7. (a) QQ-plot of the GPD fitted to the 220 cluster maxima and (b) QQ-plot of the 1-inflated beta distribution fitted to the 220 peak-to-sum ratios, showing the data and 95% credible bounds
Fig. 8. (a) QQ-plot of the cluster sums from the random-scale model and (b) rank plot of pairs $(P, S)$ derived from one random sample of size 220 from the fitted random-scale model
The posterior for the lower bound $\theta$ is insensitive to this choice of prior (not shown). The QQ-plot of the fitted 1-inflated scaled beta distribution is displayed in Fig. 7(b). It suggests a good fit, particularly in the lower tail. This is important because low values of $P$ typically correspond to long clusters with several days of heavy rain. The Bayesian point estimates of the 1-inflated scaled beta distribution are $\hat{\theta} = 0.205$, $\hat{\alpha} = 1.92$, $\hat{\beta} = 1.14$ and $\hat{\omega} = 0.207$.
Fig. 8 provides two additional diagnostic plots attesting to the good fit of the random-scale model. Fig. 8(a) displays the QQ-plot of the cluster sums in which the theoretical quantiles were computed by a Monte Carlo procedure. The fit of the cluster sum distribution derived from the random-scale model is acceptable; in spite of a slight overestimation in the interval (80, 120), the right-hand tail is well estimated. Fig. 8(b) shows the rank plot of the pairs $(P, S)$ for one random sample of size 220 generated from the fitted random-scale model. Comparing Fig. 8(b) with Fig. 5(b), we can see that the dependence between $P$ and $S$ is well captured.
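A Monte Carlo procedure of the kind used for these diagnostics can be sketched as follows. The sketch plugs in the point estimates reported in Sections 3.4 and 3.5 rather than posterior draws, and it takes $D = \Pi$, so that $M$ and $P$ are drawn independently; the function name and the seed are arbitrary.

```python
import random

U = 21.6                                           # threshold (mm)
SIGMA, XI = 8.6086, 0.0630                         # GPD estimates for the excess of M
OMEGA, THETA, ALPHA, BETA = 0.207, 0.205, 1.92, 1.14   # 1-inflated scaled beta estimates

def sample_cluster(rng):
    """Draw (M, P, S) from the random-scale model with independence copula."""
    # Cluster maximum: threshold plus a GPD excess obtained by inversion.
    v = rng.random()
    m = U + SIGMA / XI * ((1.0 - v) ** (-XI) - 1.0)
    # Peak-to-sum ratio: point mass at 1 with probability OMEGA,
    # scaled beta on (THETA, 1) otherwise.
    if rng.random() < OMEGA:
        p = 1.0
    else:
        p = THETA + (1.0 - THETA) * rng.betavariate(ALPHA, BETA)
    return m, p, m / p                             # cluster sum S = M / P

rng = random.Random(2011)
draws = [sample_cluster(rng) for _ in range(20000)]
share_single_day = sum(1 for _, p, _ in draws if p == 1.0) / len(draws)

print(round(share_single_day, 2))                  # close to OMEGA
print(max(s for _, _, s in draws))                 # largest simulated cluster sum (mm)
```

Sorting the simulated sums and comparing their empirical quantiles with the 220 observed cluster sums would give a plot in the spirit of Fig. 8(a).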
4. Computation of the return period of the spring 2011 flood
In the Lake Champlain watershed, the value $T$ of the spring precipitation accumulation is the main contributing factor to floods. As mentioned before and illustrated in Fig. 2(b), the value of $T$ observed in 2011 was very high: 510 mm. Because of the presence of extreme rainfall, it is natural to regard $T$ as the sum $Z + W$ of two independent components, namely the accumulation $Z$ of non-extreme rainfall and the accumulation $W$ of precipitation from the clusters of high precipitation. For any given year $k \in \{1, \ldots, 127\}$ between 1884 and 2010, the observed value $Z_k$ is simply the total precipitation accumulation in year $k$ minus the accumulation $W_k$ of rain from clusters of high precipitation in the same year. The independence between $Z$ and $W$ was assessed by using the tie-corrected version of the Cramér–von Mises rank test of independence that is described in Genest et al. (2019) (p-value approximately 0.098); the rank plot is displayed in Fig. 9. The horizontal line of points in Fig. 9 corresponds to years with no extreme precipitation.
Fig. 9. Rank plot derived from the pairs $(W_1, Z_1), \ldots, (W_{127}, Z_{127})$ of total extreme and non-extreme precipitations for the years 1884–2010
Fig. 10. QQ-plots of the non-extreme spring accumulation: (a) random-scale model; (b) M3–Dirichlet model; (c) conditional exceedance model fitted by using constrained maximum likelihood; (d) conditional exceedance model fitted by using the semiparametric Bayesian method; (e) first-order Markov chain model with asymptotic independence; (f) first-order Markov chain model with asymptotic dependence
Because $Z_k$ is a sum of daily rainfall amounts, none of which is extreme, and given that the entire series is stationary, it seems reasonable to assume that $Z_1, \ldots, Z_{127}$ form a normal random sample. This assumption was validated by using a Shapiro–Wilk normality test (p-value approximately 0.67). The predictive distribution of the accumulation $Z$ of non-extreme rainfall was found to be Student $t$ with $n - 1 = 126$ degrees of freedom, location $\bar{z} = 161.2$ and scale $\sqrt{(n + 1)s^2/n}$ with $s^2 = 1739.9$; the corresponding 95% credible intervals are $(221.487, 245.91)$ and $(61.9103, 79.3274)$. These Bayesian estimates were obtained by using the reference prior defined, for all $\tau > 0$, by $f(\nu, \tau) \propto 1/\tau^2$. From the QQ-plot of non-extreme accumulations displayed in Fig. 10(a), the fit is good.
Using the random-scale model, the distribution of $W = W_k$ for any given year $k \in \{1, \ldots, 127\}$ can be approximated by Monte Carlo sampling, as follows. First, the number $N_k$ of clusters of high precipitation in spring $k$ is drawn from the predictive distribution given, for all $n \in \mathbb{N}$, by

$$f_{(N_k \mid Y = y)}(n) = \int_0^{\infty} \mathcal{P}(n \mid 91\lambda)\, \mathcal{G}(\lambda \mid a; b)\, d\lambda, \tag{4}$$

where $\mathcal{P}(\cdot \mid \zeta)$ denotes the Poisson distribution with mean $\zeta$ and $\mathcal{G}(\cdot \mid a; b)$ is the gamma distribution with mean $a/b$; the latter distribution is the posterior for $\lambda$ given $Y$ since the improper prior $f_\lambda(\lambda) \propto 1/\lambda$ was assumed for the frequency of clusters. Here $\zeta = 91\lambda$, where the factor 91 denotes a period of 91 days, i.e. the months of April, May and June which constitute the spring season. Furthermore, $a = 220$ corresponds to the number of cluster maxima and $b = 11557$ corresponds to the number of days of observations (127 years with 91 spring days per year).
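Because a Poisson distribution mixed over a gamma distribution is negative binomial, the integral in equation (4) has a closed form. The following minimal sketch evaluates it, assuming that $\mathcal{G}(\cdot \mid a; b)$ is parameterized by shape $a$ and rate $b$ (which is what a mean of $a/b$ suggests); the function name `predictive_cluster_pmf` is hypothetical and is introduced here for illustration only.

```python
import math


def predictive_cluster_pmf(n, a=220.0, b=11557.0, days=91.0):
    """P(N_k = n) when N_k | lam ~ Poisson(days * lam) and lam ~ Gamma(shape a, rate b).

    Integrating lam out gives a negative binomial probability mass function.
    """
    log_pmf = (
        math.lgamma(a + n) - math.lgamma(a) - math.lgamma(n + 1)
        + a * math.log(b / (b + days))
        + n * math.log(days / (b + days))
    )
    return math.exp(log_pmf)


# Probabilities for n = 0, ..., 59; the mass beyond 59 clusters is negligible here.
pmf = [predictive_cluster_pmf(n) for n in range(60)]
total_mass = sum(pmf)
mean_clusters = sum(n * p for n, p in enumerate(pmf))  # equals days * a / b
```

The predictive mean is $91a/b$, i.e. fewer than two clusters of high precipitation per spring on average.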
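Second, given a number $N_k = n_k$ of clusters of high precipitation, the cluster maxima $M_1, \ldots, M_{n_k}$ are drawn independently from the predictive distribution obtained from the POT model given, for all $z > 0$, by

$$f_{(M - u \mid Y = y)}(z) = \int_{-\infty}^{\infty} \int_0^{\infty} \mathrm{GP}(z \mid \sigma, \xi)\, f_{[(\sigma, \xi) \mid Y = y]}(\sigma, \xi)\, d\sigma\, d\xi, \tag{5}$$

where $\mathrm{GP}(\cdot \mid \sigma, \xi)$ denotes the GPD. Third, the peak-to-sum ratios $P_1, \ldots, P_{n_k}$ are drawn independently from the predictive distribution defined, for all $p > 0$, by

$$f_{(P \mid Y = y)}(p) = \int_0^1 \int_0^1 \int_0^1 \int_0^{\infty} \mathrm{IB}(p \mid \omega, \theta, \nu, \gamma)\, f_{[(\omega, \theta, \nu, \gamma) \mid Y = y]}(\omega, \theta, \nu, \gamma)\, d\gamma\, d\nu\, d\theta\, d\omega. \tag{6}$$

The total amount of rain from clusters of high precipitation is then given by $W = M_1/P_1 + \cdots + M_{n_k}/P_{n_k}$. This procedure is summarized in algorithm 1 in Table 1.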
Table 1. Algorithm 1: generating rainfall accumulation for spring $k$ from $N_k$ clusters of high precipitation
Step 1: draw the number $N_k = n_k$ of clusters of high precipitation from distribution (4)
Step 2: draw the excesses $M_1 - u, \ldots, M_{n_k} - u$ from distribution (5)
Step 3: draw the peak-to-sum ratios $P_1, \ldots, P_{n_k}$ from distribution (6)
Step 4: draw the accumulation of precipitation pertaining to clusters of high precipitation, $W_k = M_1/P_1 + \cdots + M_{n_k}/P_{n_k}$
Step 5: draw the accumulation $Z$ of non-extreme rainfall from its predictive distribution
Step 6: compute the total spring accumulation $T_k = Z_k + W_k$

Fig. 11. QQ-plots of the total spring accumulation (simulated against observed spring accumulations, both in mm): (a) random-scale model; (b) M3–Dirichlet model; (c) conditional exceedance model fitted by using constrained maximum likelihood; (d) conditional exceedance model fitted by using the semiparametric Bayesian method; (e) first-order Markov chain model with asymptotic independence; (f) first-order Markov chain model with asymptotic dependence

Fig. 12. QQ-plots of the extreme spring accumulation (simulated against observed spring accumulations of extremes, both in mm): (a) random-scale model; (b) M3–Dirichlet model; (c) conditional exceedance model fitted by using constrained maximum likelihood; (d) conditional exceedance model fitted by using the semiparametric Bayesian method; (e) first-order Markov chain model with asymptotic independence; (f) first-order Markov chain model with asymptotic dependence

Fig. 13. Predictive density of the return period estimated with the spring accumulation of the period 1884–2010 (horizontal axis, return period in years; vertical axis in units of $10^{-3}$)

The QQ-plots of the total and extreme spring precipitation accumulation corresponding to the fitted random-scale model are displayed in Figs 11 and 12 respectively. In both cases, the fit is excellent.
The probability that $T$ surpasses the value that was observed in spring 2011, i.e. $\Pr(T > 510\ \mathrm{mm})$, can then be estimated from the predictive distribution, leading to a return period of 430 years; the corresponding one-sided 95% credible interval is $[231, \infty)$. The predictive distribution of the return period is displayed in Fig. 13. Thus although the heavy rain of 69.6 mm that was recorded on April 26th, 2011, is not uncommon, as already mentioned in Section 3.4, the total spring 2011 rainfall accumulation does qualify as a rare event according to the random-scale model.
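The six steps of algorithm 1, followed by a Monte Carlo estimate of $\Pr(T > 510\ \mathrm{mm})$, can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used for the results above: the GPD and peak-to-sum parameters are hypothetical plug-in values (the analysis in the text integrates over their posterior distributions instead), the 1-inflated scaled beta is taken to mean "$P = 1$ with probability $\omega$, otherwise $\theta + (1 - \theta)$ times a beta variable", and the predictive scale of $Z$ is read as $\sqrt{(n + 1)s^2/n}$. Only $a$, $b$, the 91-day season, $\bar{z}$, $s^2$ and $u = 21.6$ mm are taken from the text.

```python
import math
import random

# Values taken from the text.
A_POST, B_POST, SPRING_DAYS = 220.0, 11557.0, 91   # gamma posterior of lambda, season length
Z_BAR, S2, N_YEARS = 161.2, 1739.9, 127            # summary of non-extreme accumulations
U = 21.6                                           # threshold (mm)
# Hypothetical plug-in values, NOT estimates from the paper.
SIGMA, XI = 10.0, 0.1      # GPD scale and shape
OMEGA = 0.3                # probability that P equals 1
THETA = 0.2                # assumed lower end of the scaled beta support
ALPHA, BETA = 2.0, 2.0     # assumed beta shape parameters


def draw_poisson(rng, mean):
    """Knuth's multiplication method; adequate for small means."""
    limit, count, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def draw_gpd_excess(rng, sigma, xi):
    """Inverse-CDF draw of a generalized Pareto excess."""
    v = 1.0 - rng.random()  # uniform on (0, 1]
    if abs(xi) < 1e-12:
        return -sigma * math.log(v)
    return sigma * (v ** (-xi) - 1.0) / xi


def draw_peak_to_sum(rng):
    """Assumed 1-inflated scaled beta draw for the peak-to-sum ratio."""
    if rng.random() < OMEGA:
        return 1.0
    return THETA + (1.0 - THETA) * rng.betavariate(ALPHA, BETA)


def draw_student_t(rng, df):
    """Standard Student t draw: normal over sqrt(chi-square / df)."""
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / df)


def simulate_spring(rng):
    lam = rng.gammavariate(A_POST, 1.0 / B_POST)       # posterior draw of lambda
    n_clusters = draw_poisson(rng, SPRING_DAYS * lam)  # Step 1
    w = 0.0
    for _ in range(n_clusters):
        m = U + draw_gpd_excess(rng, SIGMA, XI)        # Step 2
        p = draw_peak_to_sum(rng)                      # Step 3
        w += m / p                                     # Step 4
    scale = math.sqrt((N_YEARS + 1) * S2 / N_YEARS)
    z = Z_BAR + scale * draw_student_t(rng, N_YEARS - 1)  # Step 5
    return z + w, w, n_clusters                        # Step 6


rng = random.Random(2011)
draws = [simulate_spring(rng) for _ in range(20000)]
exceed_prob = sum(t > 510.0 for t, _, _ in draws) / len(draws)
```

When `exceed_prob` is positive, its reciprocal is a plug-in return period in years. With the hypothetical parameter values above it is not expected to match the estimate of 430 years, which is derived from the full predictive distribution.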
Spring 2011 was also atypical in that five clusters of high precipitation were recorded and the total rain accumulation in these clusters was 318 mm. On the basis of the random-scale model, the probability of observing five or more clusters in a given spring is $3.62 \times 10^{-2}$; the corresponding Bayesian estimate of the return period is 33 years, which is not so high. However, $\Pr(W > 318\ \mathrm{mm}) \approx 3.13 \times 10^{-3}$, which corresponds to a return period of 302 years.
It would also have been possible to sample directly from the observed peak-to-sum ratios $P_1, \ldots, P_n$ in algorithm 1 rather than from the fitted 1-inflated scaled beta distribution. Such a bootstrapping approach would possibly make very good sense when a large data set is available. In the present application, the parametric and non-parametric approaches lead to virtually the same predictive distribution of the return period.
5. Comparisons with existing models
In this section, we briefly review existing approaches for the modelling of clusters of extreme
events and use the Burlington precipitation series to discuss their pros and cons with respect
to the random-scale model that is advocated here. We consider the M3–Dirichlet approach in
Section 5.1, the conditional exceedance model in Section 5.2 and a first-order Markov chain
model in Section 5.3.
5.1. The M3–Dirichlet model
Süveges and Davison (2012) studied a disastrous rainfall that occurred in coastal Venezuela in December 1999. As for the Burlington precipitation data that are considered here, standard extremal models failed to account for this catastrophe because clusters of heavy precipitation were not appropriately accounted for. To model such clusters, Süveges and Davison (2012) proposed to rely on the moving maximum process M3 due to Smith and Weissman (1996). Recall that a univariate stationary time series $(Y_i : i \in \mathbb{Z})$ is said to be an M3-process if, for each $i \in \mathbb{Z}$, we can write $Y_i = \max_{k \in \mathbb{Z}} \max_{l \in \mathbb{N}} a_{l,k} X_{l,i-k}$ in terms of mutually independent unit Fréchet random variables $(X_{l,k} : l \in \mathbb{N}, k \in \mathbb{Z})$ and a so-called filter matrix $A = (a_{l,k} : l \in \mathbb{N}, k \in \mathbb{Z})$ of non-negative constants summing to 1. It is typically assumed, as Süveges and Davison (2012) did, that $a_{l,k} > 0$ only when $l \in \{1, \ldots, L\}$ and $k \in \{1, \ldots, K\}$ so that all profiles are of the same fixed length $K$. When normalized by the sum of its components, i.e. $(c_{l,1}, \ldots, c_{l,K}) = (a_{l,1}, \ldots, a_{l,K})/(a_{l,1} + \cdots + a_{l,K})$, the $l$th row of $A$ is referred to as the signature of the $l$th cluster type.
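To make the definition concrete, the following sketch simulates a finite M3-process and computes the signatures of its filter matrix. The matrix `A` (with $L = 2$ cluster types and $K = 3$) is a hypothetical example chosen only so that its entries sum to 1; the function names are likewise introduced here for illustration.

```python
import math
import random


def unit_frechet(rng):
    """Draw from the unit Frechet law, F(x) = exp(-1/x), by inversion."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -1.0 / math.log(u)


def simulate_m3(filter_matrix, length, rng):
    """Y_i = max over l and k = 1..K of a[l][k] * X[l][i - k]."""
    n_types, k_max = len(filter_matrix), len(filter_matrix[0])
    # x[l][j] stores X_{l, j - k_max}, so the index i - k + k_max is never negative.
    x = [[unit_frechet(rng) for _ in range(length + k_max)] for _ in range(n_types)]
    series = []
    for i in range(length):
        series.append(max(
            filter_matrix[l][k - 1] * x[l][i - k + k_max]
            for l in range(n_types)
            for k in range(1, k_max + 1)
        ))
    return series


def signatures(filter_matrix):
    """Normalize each row of the filter matrix by its sum."""
    return [[a / sum(row) for a in row] for row in filter_matrix]


# Hypothetical filter matrix; its entries sum to 1.
A = [[0.10, 0.30, 0.10],
     [0.05, 0.05, 0.40]]

rng = random.Random(1)
y = simulate_m3(A, 1000, rng)
sig = signatures(A)
```

In such a simulated series, a very large value of one $X_{l,j}$ dominates $K$ consecutive observations, so that the corresponding stretch of the series is proportional to the $l$th row of `A`; this is the mechanism behind the signatures.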
Süveges and Davison (2012) argued that, when the threshold $u$ is sufficiently high, any cluster $(Y_j : j \in C)$ of extremes, once normalized by the sum of its components, i.e.

$$W = (W_j : j \in C) = \frac{1}{\sum_{j \in C} Y_j}\, (Y_j : j \in C), \tag{7}$$

corresponds to a noisy version of one of the signatures. This intuition is rooted in a result of Zhang and Smith (2004) stating that, if $(Y_i : i \in \mathbb{Z})$ is an M3-process, then, for each $l \in \{1, \ldots, L\}$,

$$\Pr\left\{ \frac{(Y_{t+1}, \ldots, Y_{t+K})}{Y_{t+1} + \cdots + Y_{t+K}} = (c_{l,1}, \ldots, c_{l,K}) \text{ infinitely often} \right\} = 1.$$

Therefore, Süveges and Davison (2012) proposed
(a) to normalize the series so that its marginals are approximately unit Fréchet;
(b) to identify clusters of extremes of a fixed length $K$ through an elaborate algorithm and
(c) to model the normalized cluster profiles $W$ with a finite Dirichlet mixture.
The number of mixing components is at least $L$ and an estimate of the filter matrix $A$ is then obtained from the fitted Dirichlet parameters.
To apply the M3–Dirichlet model to the Burlington precipitation data, we considered three thresholds set at the 95th, 97th and 98th centiles of precipitation (including the 0s) corresponding to $u = 14.2$, 18.0, 21.6 mm respectively. The last value of $u$ corresponds to the 95th centile of non-zero precipitation that was used earlier. Three possible run lengths and five choices for the number of components for the Dirichlet mixture were considered, namely $r \in \{1, 2, 3\}$ and $m \in \{1, \ldots, 5\}$; however, $r = 1$ could not be used with $u = 14.2$ mm as it resulted in many overlaps between clusters. To estimate the return period of the spring 2011 accumulation of 510 mm for each different combination of $u$, $r$ and $m$, 100000 spring extreme rainfalls were simulated as follows.
(a) First generate the total number of profiles of extreme precipitation from a Poisson distribution whose intensity is the mean number of profiles observed for the combination of $u$, $r$ and $m$ under consideration, e.g. 2.44 when $u = 18.0$ mm, $r = 3$ and $m = 1$.
(b) Next, when $m > 1$, generate the number of profiles in each mixture component from a multinomial distribution whose parameters are the number of profiles per group divided by the total number of observed profiles.
(c) Finally, for a profile in a given group, sample the total accumulation by drawing independently the profile maximum from the fitted GPD and divide it by the maximum of the Dirichlet vector $W$.
As in the random-scale model, the total spring non-extreme accumulation was assumed independent of the extreme accumulation and was modelled by using the normal distribution. The Bayesian information criterion BIC and QQ-plots of the spring non-extreme, extreme and total precipitation accumulations were used to select the best-fitting model, which had threshold $u = 18.0$ mm, run length $r = 3$ and a Dirichlet distribution (i.e. $m = 1$). The parameter estimates of the GPD were $\hat{\sigma} = 9.54$ and $\hat{\xi} = 1.64 \times 10^{-7}$ and those of the Dirichlet distribution were $(0.427, 0.526, 5.246, 0.508, 0.460)$.
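For the best-fitting case $m = 1$, the simulation of the extreme spring accumulation can be sketched as follows with the estimates quoted above. This is an illustration only: it assumes that the fitted GPD describes the excess of the profile maximum over $u$ (so that $u$ is added back), it plugs in point estimates, and it skips the multinomial step, which is needed only when $m > 1$. The function names are hypothetical.

```python
import math
import random

U = 18.0                                          # threshold (mm)
SIGMA_HAT, XI_HAT = 9.54, 1.64e-7                 # GPD estimates quoted in the text
DIRICHLET = (0.427, 0.526, 5.246, 0.508, 0.460)   # Dirichlet estimates quoted in the text
MEAN_PROFILES = 2.44                              # Poisson intensity for u = 18.0, r = 3, m = 1


def draw_poisson(rng, mean):
    """Knuth's multiplication method; adequate for small means."""
    limit, count, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def draw_gpd_excess(rng, sigma, xi):
    """Inverse-CDF draw of a generalized Pareto excess, stable for xi near 0."""
    v = 1.0 - rng.random()  # uniform on (0, 1]
    if xi == 0.0:
        return -sigma * math.log(v)
    return sigma * math.expm1(-xi * math.log(v)) / xi


def draw_dirichlet(rng, alphas):
    """Dirichlet draw obtained by normalizing independent gamma variables."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [v / total for v in g]


def simulate_extreme_accumulation(rng):
    total = 0.0
    for _ in range(draw_poisson(rng, MEAN_PROFILES)):           # step (a)
        profile_max = U + draw_gpd_excess(rng, SIGMA_HAT, XI_HAT)
        profile = draw_dirichlet(rng, DIRICHLET)
        total += profile_max / max(profile)                     # step (c)
    return total


rng = random.Random(1999)
totals = [simulate_extreme_accumulation(rng) for _ in range(5000)]
```

Dividing by the largest component of the normalized profile plays the same role here as dividing the cluster maximum by the peak-to-sum ratio in the random-scale model.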
The QQ-plot of the spring precipitation total of the best-fitting model is displayed in Fig. 11. The fit looks good overall, as does the fit of the extreme subtotal that is displayed in Fig. 12. However, the fit of the non-extreme subtotal shown in Fig. 10 is somewhat less satisfactory. There is no evidence of dependence between these two subtotals; the test of independence due to Genest et al. (2019) yielded a p-value of 0.328. The QQ-plot of the cluster sums in Fig. 14 exhibits an overestimation of the upper tail. This model leads to an estimated return period of the 2011 observation which is smaller than with the random-scale model, namely 290 years.
Compared with the random-scale model, the M3–Dirichlet approach has the advantage of modelling the entire normalized profile, thus allowing for inference about other quantities than the cluster sum. In this application, however, it is precisely the normalized profile distribution which is problematic. The fixed profile length $K$ ranged from 4 to 6 for the various combinations of $u$, $r$ and $m$ considered; we found $K = 5$ for the best-fitting model ($u = 18.0$ mm, $r = 3$ and $m = 1$). Some of the profiles included days without rain, which seems unreasonable. More importantly, the marginal PP- and QQ-plots suggest that the Dirichlet distribution fits the normalized profiles poorly. This problem occurred for all combinations of $u$, $r$ and $m$ that were considered. Moreover, the return period estimates were rather unstable as a function of $u$, $r$ and $m$ with values ranging from 40 to over 100000 years.
5.2. The conditional exceedance model
Following Keef et al. (2009) and Winter and Tawn (2016), one could also adapt the conditional exceedance model of Heffernan and Tawn (2004) to account for clusters of extreme values in the Burlington series. Given that a daily precipitation $Y_i$ exceeds some threshold $u$, this approach provides a convenient semiparametric model for $Y_{i+1}, \ldots, Y_{i+\tau}$, where $\tau$ is the lag beyond which observations can be deemed independent of $Y_i$. Because independence appears to hold at any lag $\tau > 1$ for the thresholds $u \in \{14.2, 18.0, 21.6\}$ that were considered in Section 5.1, we chose $\tau = 1$. On transformation to Laplace margins, the model boils down to assuming that $\Pr\{Y_i - u > x, (Y_{i+1} - aY_i)/Y_i^b \le z \mid Y_i > u\} \approx \exp(-x)\, G(z)$ for some non-degenerate distribution $G$ which can be estimated either non-parametrically (Keef, Papastathopoulos and Tawn, 2013; Keef, Tawn and Lamb, 2013) or via a Bayesian semiparametric procedure (Lugrin et al., 2016).
Fig. 14. QQ-plots of the cluster sums (simulated against observed cluster accumulations, both in mm): (a) random-scale model; (b) M3–Dirichlet model; (c) conditional exceedance model fitted by using constrained maximum likelihood; (d) conditional exceedance model fitted by using the semiparametric Bayesian method; (e) first-order Markov chain model with asymptotic independence; (f) first-order Markov chain model with asymptotic dependence

Both estimation approaches were used and, in each case, the return period for the spring 2011 event was estimated by using 100000 samples of total spring accumulations. To do this, the total extreme and non-extreme accumulations were assumed independent and the latter was taken to be normal. The total number of clusters of extreme precipitation in spring $k \in \{1, \ldots, 100000\}$ was then drawn from the Poisson distribution and each cluster of extreme precipitation was simulated by using the method of Rootzén (1998), i.e. $Y_i - u$ was first generated and the conditional exceedance model was then used to simulate the following day, and so forth, until an observation dropped below $u$.
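The forward simulation of one cluster on the Laplace scale can be sketched as follows. This is a minimal illustration of the recursion described above, not the procedure of Rootzén (1998) or the authors' code: the residual distribution $G$ is replaced by resampling from a short hypothetical list `RESIDUALS`, the threshold is taken to be the 98th centile of the standard Laplace distribution (an assumption made here to mirror $u = 21.6$ mm), and the back-transformation to millimetres is omitted. The values of $a$ and $b$ are the posterior medians reported for the semiparametric Bayesian fit.

```python
import math
import random

A_HAT, B_HAT = 0.00406, 0.00463             # posterior medians quoted in the text
U_LAPLACE = -math.log(2.0 * (1.0 - 0.98))   # assumed threshold: 98th Laplace centile
# Hypothetical stand-in for draws from the residual distribution G.
RESIDUALS = [-1.2, -0.6, -0.3, 0.0, 0.2, 0.5, 0.9, 1.6, 2.8, 3.6]


def simulate_cluster(rng, max_length=50):
    """Simulate consecutive exceedances until a value drops to or below the threshold."""
    y = U_LAPLACE + rng.expovariate(1.0)    # excess over u is unit exponential
    cluster = [y]
    while len(cluster) < max_length:
        z = rng.choice(RESIDUALS)
        y_next = A_HAT * y + (y ** B_HAT) * z   # Y_{i+1} = a * Y_i + Y_i^b * Z
        if y_next <= U_LAPLACE:
            break
        cluster.append(y_next)
        y = y_next
    return cluster


rng = random.Random(1998)
clusters = [simulate_cluster(rng) for _ in range(1000)]
share_length_one = sum(len(c) == 1 for c in clusters) / len(clusters)
```

With $a$ and $b$ both close to 0, the next value is essentially a fresh draw from $G$, so most simulated clusters have length 1.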
Based on the constrained likelihood approach of Keef, Papastathopoulos and Tawn (2013), the estimate of $(a, b)$ was $(0.0286, 0.123)$ when $u = 14.2$, $(0.00285, 0.406)$ when $u = 18$ and $(0.0129, 0.895)$ when $u = 21.6$. The best fit of the extreme precipitation totals was obtained when $u = 21.6$ mm, leading to a return period of 2631.6 years. The QQ-plot of the spring precipitation totals displayed in Fig. 11 looks decent, but the QQ-plot of the extreme spring accumulation in Fig. 12 reveals that the fit in the upper tail is rather poor; the underestimation of the upper tail is worse at lower thresholds. The situation is much improved when the Bayesian semiparametric method of Lugrin et al. (2016) is used. The optimal choice of threshold here is again $u = 21.6$ mm. The medians of the posterior samples of $a$ and $b$ were 0.00406 and 0.00463 respectively, hinting at asymptotic independence, and the estimated return period is 1481.6 years. The QQ-plots of the total and total extreme spring accumulation are displayed in Figs 11 and 12 respectively. In this case, both plots look fine. From Fig. 14, the fit of the cluster sums is particularly good but this is in fact largely due to the excellent fit of the GPD for the cluster maximum because over 95% of clusters are of length 1 or 2 and, in most cases, the second day contains only traces of precipitation. For both the constrained likelihood and the semiparametric Bayesian approach, the normal distribution fits the non-extreme spring accumulations very well (see Fig. 10) and there is no reason to suspect that the extreme and non-extreme precipitation totals are dependent; the test of independence that is described in Genest et al. (2019) yielded a p-value of 0.688 when the Bayesian semiparametric method or the constrained likelihood approach was used.
In conclusion, the conditional exceedance model fitted by using the semiparametric Bayesian
method is good at capturing precipitation accumulation, but the estimated return period for
the 2011 event is much higher than with the random-scale model. As with the M3–Dirichlet
approach, the conditional exceedance model can be used to perform inference on other quantities
than just the cluster sum, but it is more complex than the approach that is presented here.
5.3. First-order Markov chain model
Given that the first-order Markov assumption corresponding to a lag $\tau = 1$ in the conditional exceedance model is reasonable, we also considered the first-order Markov chain model of Smith et al. (1997) with the asymmetric logistic distribution, and its extension due to Ramos and Ledford (2009) that incorporates cases of asymptotic independence and uses a modified version of the asymmetric logistic dependence structure. We tried the same thresholds $u \in \{14.2, 18.0, 21.6\}$ as in the previous subsections but found that, in both cases, $u = 21.6$ mm was again the best choice. As before, we assumed that the non-extreme precipitation totals are normal and independent of extreme precipitation totals. The hypothesis of independence was not rejected by using the test of Genest et al. (2019); p-values of 0.228 and 0.992 were found for the asymptotic dependence and independence models respectively.
The parameter estimates in the model of Ramos and Ledford (2009) were $\hat{\eta} = 0.999$, $\hat{\rho} = 2.32$ and $\hat{\alpha} = 1$. Because $\hat{\eta}$ is close to 1, this model hints at asymptotic dependence but it is obvious from the QQ-plots in Figs 11, 12 and 14 that this model gives a poor fit, particularly in the upper tail which is badly underestimated. It is thus not surprising that it leads to a very large return period of over 10000 years. The underestimation is worse at lower thresholds, because $\hat{\eta}$ decreases.
The asymptotic dependence model fits the data better as evidenced by Figs 11, 12 and 14, although it does not perform as well as the random-scale model and the conditional exceedance model when the semiparametric Bayesian method is used. The asymmetric logistic parameter estimates are 0.942 for the dependence parameter and $(\hat{\theta}_1, \hat{\theta}_2) = (0.282, 0.999)$ for the asymmetry parameters. This means that there seems to be a very slight positive dependence, but samples from the asymmetric logistic distribution with the estimated parameters are almost indistinguishable from independence. As $u$ increases, the estimate of the dependence parameter decreases and hence the association increases, which leads to a better fit in the upper tail of both total and extreme total precipitation. The estimated return period of the spring 2011 precipitation total is 1053 years, which is still much larger than suggested by the random-scale model.
6. Conclusion
In this paper, precipitation recorded at Burlington, Vermont, was used to estimate the return
period of the spring 2011 Lake Champlain flood. This series contains several clusters of extreme
values, which need to be taken into account for flood risk estimation. For this, a simple extension
of the POT model, called the random-scale model, was proposed in which each cluster maximum
is scaled up by a random factor referred to as the peak-to-sum ratio. In this model, a GPD is used
for the excess of cluster maxima beyond a high threshold and the peak-to-sum ratios are taken
to follow a 1-inflated beta distribution. In principle, these two variables could be dependent, in
which case their association could be modelled by a copula. In the application that is considered
here, however, it could realistically be assumed that they are independent; this assumption also
seems theoretically sensible at high thresholds in various contexts, including when the underlying
series is regularly varying. Although the approach is tailored for precipitation data in this paper,
it could be used in other situations where cluster totals are of interest.
The random-scale model was seen to fit the Burlington precipitation data well. Through
Monte Carlo simulations, it led to a high, yet realistic, estimate of 430 years for the return period
of the spring 2011 accumulation of 510 mm. Assuming stationarity of the precipitation series,
the probability that such an event will occur again thus remains small. In fact, the estimated
100-year return level of a spring accumulation is 446 mm, which is 70 mm less than the value
that was observed in 2011. The estimate of the return period of the 2011 flood should help the
International Joint Commission on the Lake Champlain and the Richelieu River in identifying
the causes and effects of flooding, and in developing appropriate mitigation solutions and
recommendations.
Table 2. Summary performance of the models considered†
Model; cluster sum S; cluster accumulation W; non-extreme Z; total T; estimated return period (years)
RAN-SCL 430
M3–Dirichlet Overestimates × 290
CE-ML × × × 2632
CE-SB ≈ ≈ 1481.6
MC-IND × × Tail underestimates × >10000
MC-DEP × ≈ Tail underestimates × 1053
Plots: Fig. 14; Fig. 12; Fig. 10; Fig. 11
†RAN-SCL, random-scale model; CE-SB, conditional exceedance model fitted with the semiparametric Bayesian method; CE-ML, conditional exceedance model fitted with the restricted maximum likelihood method; MC-DEP, first-order Markov chain with asymptotic dependence; MC-IND, first-order Markov chain with asymptotic independence; , good; , so-so; ×, bad.

In the context of the present data application, the random-scale model was compared with other models that have been proposed in the literature (Table 2). Out of these, the conditional exceedance model fitted with the semiparametric Bayesian approach proposed by Lugrin et al. (2016) was the closest competitor. The random-scale model and conditional exceedance model could both adequately capture the accumulations of a streak of large rainfall, although the two models use different cluster definitions. The benefits of the random-scale model are its simplicity and the flexibility of the cluster definition that can be used. Here, we used a cluster definition that is intuitive to hydrologists but the definition given by the runs method could also be directly implemented. Possibly the main advantage of the conditional exceedance model over the present approach is that it captures the entire cluster dynamics and can be used to estimate other quantities than cluster sums such as cluster length or the probability of consecutive threshold exceedances. If cluster summaries such as these were deemed useful for assessing flood risk, the random-scale model would need to be extended, but it is not obvious how this could be done.
In the future, it may be interesting to explore the effect of other variables such as snowpack,
and to take into account rainfall in the entire watershed, not just at Burlington. It also seems
that rainfalls in the area have intensified since 2011, and it may be worthwhile to investigate
whether this phenomenon is transient or whether it is a trend that may be attributed to climate
change or other factors.
Acknowledgements
Thanks are due to the National Centers for Environmental Information of the US National Oceanic and Atmospheric Administration for freely providing the precipitation data that were used in this study. Funding in partial support of this work was provided by the Canada Research Chairs Program, the Natural Sciences and Engineering Research Council (grants RGPIN/2018–04481, RGPIN/2016–04720 and RGPIN/06801–2015), the Canadian Statistical Sciences Institute and the Fonds de recherche du Québec – Nature et technologies (grant 2015–PR–183236), as well as by the Mitacs Elevate Program.
Appendix A
This appendix reports the results of a small-scale simulation study that was run to test the hypothesis of independence between the cluster maximum $M$ and the peak-to-sum ratio $P = M/S$. For simplicity, clusters of length 2 only were considered. Various combinations of marginal behaviour and extremal dependence were investigated for the pair $(Y_1, Y_2)$ through the following scenarios:
(a) light-tailed margins and asymptotic independence,
(i) a bivariate normal distribution with correlation $\rho = 0.4$ and standard margins,
(ii) a bivariate normal distribution with correlation $\rho = 0.7$ and standard margins and
(iii) a Liouville distribution with gamma radius, i.e. the distribution of a pair $(Y_1, Y_2) = R \times (U, 1 - U)$, where $U$ is uniform on $(0, 1)$ and independent of the gamma variable $R$, whose shape and scale parameters were set to $\theta = 3$ and $\sigma = 1$ respectively;
(b) heavy-tailed margins and asymptotic independence, a distribution whose copula is Gaussian with parameter $\rho = 0.4$ and whose margins are identical Pareto distributions with parameter $\kappa = 3$;
(c) heavy-tailed margins and asymptotic dependence, a Liouville distribution with unit Pareto radius, i.e. as in scenario (a)(iii) but with Pareto variable $R$ having parameter $\kappa = 3$;
(d) independence between $M$ and $P$ holds by design, a max-norm symmetric distribution with gamma radius, i.e. $(Y_1, Y_2) = R \times (U, V)/\max(U, V)$, where $U$ and $V$ are independent uniform random variables on $(0, 1)$ which are independent of the radial variable $R$, chosen to be gamma with shape and scale parameters $\theta = 3$ and $\sigma = 1$ respectively.
For details about why these distributions have the claimed tail behaviour, see Ledford and Tawn (1996) and Belzile and Nešlehová (2017).
Fig. 15. (a) Histogram of $P_1, \ldots, P_m$ and (b) rank plot of the pairs $(M_1, P_1), \ldots, (M_m, P_m)$ for one sample of size $n = 10^5$ under scenario (b) when $m = 500$
Table 3. Percentage of rejection of the null hypothesis $H_0: D = \Pi$ based on 1000 independent samples of size $n \in \{5000, 10^5\}$ from five distributions defined in the text

n        Distribution                     % of rejection for the following numbers of exceedances:
                                          2500     1000     500      100
5000     (a)(i) Gauss ($\rho = 0.4$)      100      99.5     48.8     7.8
         (a)(ii) Gauss ($\rho = 0.7$)     100      100      83.0     7.7
         (a)(iii) Gamma–Liouville         100      100      93.8     18.1
         (b) Gauss–Pareto                 100.0    100.0    100.0    98.7
         (c) Pareto–Liouville             6.9      5.4      4.1      3.9
         (d) Max-norm symmetric           3.7      4.7      4.4      4.0
100000   (a)(i) Gauss ($\rho = 0.4$)      92.6     27.4     7.8      4.3
         (a)(ii) Gauss ($\rho = 0.7$)     97.7     42.5     12.9     4.5
         (a)(iii) Gamma–Liouville         100      92.9     53.1     7.9
         (b) Gauss–Pareto                 100.0    100.0    100.0    94.8
         (c) Pareto–Liouville             5.6      5.7      3.1      4.5
         (d) Max-norm symmetric           6.4      6.1      4.1      5.0
From each of these models, $N = 1000$ samples of size $n \in \{5000, 10^5\}$ were drawn. For each sample, four thresholds were chosen as the quantiles that lead to the number of exceedances $m \in \{100, 500, 1000, 2500\}$. For each set of exceedances, the pairs $(M_1, P_1), \ldots, (M_m, P_m)$ were computed and the independence hypothesis $H_0: D = \Pi$ was tested at the 5% level by using the consistent rank-based Cramér–von Mises test of independence from Genest and Rémillard (2004), implemented in the R package copula.
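One replicate of this design can be sketched as follows for scenario (d). Two points are assumptions made for the illustration: the threshold is applied to the cluster maximum $M$, so that the $m$ exceedances are the $m$ clusters with the largest maxima, and Spearman's rho is computed as a simple descriptive measure of association in place of the Cramér–von Mises test used in the study. All function names are hypothetical.

```python
import random


def simulate_max_norm_pairs(n, rng, shape=3.0, scale=1.0):
    """Scenario (d): (Y1, Y2) = R * (U, V) / max(U, V) with a gamma radius R."""
    pairs = []
    for _ in range(n):
        r = rng.gammavariate(shape, scale)
        u, v = 1.0 - rng.random(), 1.0 - rng.random()  # uniform on (0, 1]
        top = max(u, v)
        pairs.append((r * u / top, r * v / top))
    return pairs


def maxima_and_ratios(pairs, m):
    """Keep the m clusters with the largest maximum; return (M, P = M / S) for each."""
    summaries = [(max(y1, y2), max(y1, y2) / (y1 + y2)) for y1, y2 in pairs]
    summaries.sort(key=lambda mp: mp[0], reverse=True)
    return summaries[:m]


def ranks(values):
    """Ranks 1..n of the values (ties are not expected for continuous data)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


rng = random.Random(42)
sample = simulate_max_norm_pairs(5000, rng)
top = maxima_and_ratios(sample, 500)
rho = spearman_rho([m for m, _ in top], [p for _, p in top])
```

Under scenario (d), $M = R$ and $P = \max(U, V)/(U + V)$ are functions of independent variables, so `rho` should be close to 0 whatever the threshold. For clusters of length 2, every value of $P$ lies between 0.5 and 1.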
Table 3 reports the percentages of rejection of $H_0$. As expected, $H_0$ is rejected in approximately 5% of cases under scenario (d). Here, the distribution is constructed in a way that $M$ and $P$ are independent for any threshold. Also, $H_0$ seems to hold under scenario (c) even for rather low thresholds when $n = 5000$. This is because $(Y_1, Y_2)$ is regularly varying in this case. Interestingly, the independence assumption seems plausible under scenario (a) if the threshold is sufficiently high. However, the meaning of ‘sufficiently high’ depends on the underlying distribution, and it may be that there are other light-tailed distributions with asymptotic independence for which $H_0$ is not reasonable even at high thresholds.
Finally, under scenario (b), $H_0$ is rejected nearly always, even at very high thresholds when $n = 10^5$. To illustrate, Fig. 15 displays the histogram of $P$ and the rank plot of the pairs $(M_1, P_1), \ldots, (M_m, P_m)$ when $n = 10^5$ and $m = 500$. The lack of independence between $P$ and $M$ is clearly visible from the rank plot, which also exhibits asymmetry and dependence in the upper tail. The dependence is due to the fact that, when $M$ is large, $(Y_1, Y_2)$ tends to lie close to one of the axes because of asymptotic independence. When this happens, $M \approx S$ and $P \approx 1$. A suitable dependence model in this case may thus be the asymmetric Gumbel (or logistic) family.
References
Belzile, L. R. and Neˇ
slehov´
a, J. G. (2017) Extremal attractors of Liouville copulas. J. Multiv. Anal.,160, 68–92.
Coles, S. (2001) An Introduction to Statistical Modeling of Extreme Values. London: Springer.
Genest, C. and Favre, A.-C. (2007) Everything you always wanted to know about copula modeling but were afraid
to ask. J. Hydrol. Engng,12, 347–368.
Genest, C. and Neˇ
slehov´
a, J. (2012) Copulas and copula models. In Encyclopedia of Environmetrics (eds A. H.
El-Shaarawi and W. W. Piegorsch), 2nd edn, pp. 541–553. Chichester: Wiley.
Genest, C., Neˇ
slehov´
a, J. G., R´
emillard, B. and Murphy, O. A. (2019) Testing independence in multivariate
distributions. Biometrika,106, 47–68.
Genest, C. and R´
emillard, B. (2004) Tests of independence and randomness based on the empirical copula process.
Test,13, 335–369.
Heffernan, J. E. and Tawn, J. A. (2004) A conditional approach for multivariate extreme values (with discussion).
J. R. Statist. Soc. B, 66, 497–546.
28 J. Jalbert, O. A. Murphy, C. Genest and J. G. Neˇslehov ´a
Hsing, T. (1987) On the characterization of certain point processes. Stoch. Processes Appl., 26, 297–316.
International Joint Commission (2013) The identification of measures to mitigate flooding and the impacts of
flooding of Lake Champlain and Richelieu River. Technical Report. International Joint Commission, Ottawa.
Jessen, A. H. and Mikosch, T. (2006) Regularly varying functions. Publ. Inst. Math. Beograd, 80, 171–192.
Keef, C., Papastathopoulos, I. and Tawn, J. A. (2013) Estimation of the conditional distribution of a multivariate
variable given that one of its components is large: additional constraints for the Heffernan and Tawn model.
J. Multiv. Anal., 115, 396–404.
Keef, C., Svensson, C. and Tawn, J. A. (2009) Spatial dependence in extreme river flows and precipitation for
Great Britain. J. Hydrol., 378, 240–252.
Keef, C., Tawn, J. A. and Lamb, R. (2013) Estimating the probability of widespread flood events. Environmetrics,
24, 13–21.
Ledford, A. W. and Tawn, J. A. (1996) Statistics for near independence in multivariate extreme values. Biometrika,
83, 169–187.
Lugrin, T., Davison, A. C. and Tawn, J. A. (2016) Bayesian uncertainty management in temporal dependence of
extremes. Extremes, 19, 491–515.
Markovich, N. M. (2014) Modeling clusters of extreme values. Extremes, 17, 97–125.
Markovich, N. M. (2017) Clusters of extremes: modeling and examples. Extremes, 20, 519–538.
Morrison, M. and Tobias, F. (1965) Some statistical characteristics of a peak to average ratio. Technometrics, 7,
379–385.
Nelsen, R. B. (2006) An Introduction to Copulas, 2nd edn. New York: Springer.
Northrop, P. J. and Attalides, N. (2016) Posterior propriety in Bayesian extreme value analyses using reference
priors. Statist. Sin., 26, 721–743.
O’Brien, G. L. (1987) Extreme values for stationary and Markov sequences. Ann. Probab., 15, 281–291.
Priestley, M. B. and Subba Rao, T. (1969) A test for non-stationarity of time-series. J. R. Statist. Soc. B, 31,
140–149.
Ramos, A. and Ledford, A. (2009) A new class of models for bivariate joint tails. J. R. Statist. Soc. B, 71, 219–241.
Resnick, S. I. (1987) Extreme Values, Regular Variation and Point Processes. New York: Springer.
Riboust, P. and Brissette, F. (2016) Analysis of Lake Champlain/Richelieu River’s historical 2011 flood. Can. Wat.
Resour. J., 41, 174–185.
Rootzén, H. (1998) Maxima and exceedances of stationary Markov chains. Adv. Appl. Probab., 20, 371–390.
Smith, R. L., Tawn, J. A. and Coles, S. G. (1997) Markov chain models for threshold exceedances. Biometrika,
84, 249–268.
Smith, R. L. and Weissman, I. (1994) Estimating the extremal index. J. R. Statist. Soc. B, 56, 515–528.
Smith, R. L. and Weissman, I. (1996) Characterization and estimation of the multivariate extremal index. Technical
Report. University of North Carolina, Chapel Hill.
Süveges, M. and Davison, A. C. (2012) A case study of a “Dragon-King”: the 1999 Venezuelan catastrophe.
Eur. Phys. J. Specl Top., 205, 131–146.
Taleb, N. N. (2007) The Black Swan: the Impact of the Highly Improbable. New York: Random House.
Winter, H. C. and Tawn, J. A. (2016) Modelling heatwaves in central France: a case-study in extremal dependence.
Appl. Statist., 65, 345–365.
Zhang, Z. and Smith, R. L. (2004) The behavior of multivariate maxima of moving maxima processes. J. Appl.
Probab., 41, 1113–1123.