©2019 The Authors Journal of the Royal Statistical Society: Series C (Applied Statistics)
Published by John Wiley & Sons Ltd on behalf of the Royal Statistical Society.
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which
permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used
for commercial purposes.
0035–9254/19/68000
Appl. Statist. (2019)
Modelling extreme rain accumulation with an
application to the 2011 Lake Champlain flood
Jonathan Jalbert,
Polytechnique Montréal, Canada
and Orla A. Murphy, Christian Genest and Johanna G. Nešlehová
McGill University, Montréal, Canada
[Received June 2017. Final revision January 2019]
Summary. A simple strategy is proposed to model total accumulation in non-overlapping clusters
of extreme values from a stationary series of daily precipitation. Assuming that each cluster
contains at least one value above a high threshold, the cluster sum $S$ is expressed as the ratio
$S = M/P$ of the cluster maximum $M$ and a random scaling factor $P \in (0, 1]$. The joint distribution
for the pair $(M, P)$ is then specified by coupling marginal distributions for $M$ and $P$ with a copula.
Although the excess distribution of $M$ is well approximated by a generalized Pareto distribution,
it is argued that, conditionally on $P < 1$, a scaled beta distribution may already be sufficiently rich
to capture the behaviour of $P$. An appropriate copula for the pair $(M, P)$ can also be selected by
standard rank-based techniques. This approach is used to analyse rainfall data from Burlington,
Vermont, and to estimate the return period of the spring 2011 precipitation accumulation which
was a key factor in that year’s devastating flood in the Richelieu Valley Basin in Québec, Canada.
Keywords: Clusters of extremes; Copula; High precipitation; Peaks over threshold; Time
series extremes
1. Introduction
Lake Champlain is a natural freshwater lake located primarily in the north-eastern USA, whose
only outlet is the Richelieu River (Québec, Canada). In spring 2011, the lake level reached
an unprecedented height, leading to a major flood in its surroundings and in the Richelieu
Valley. The flood stage was reached on April 14th and continued for over 2 months, forcing
the evacuation of thousands of residents and causing an estimated US $100 million in damages
(International Joint Commission, 2013). As part of an effort to understand this phenomenon
and to develop appropriate mitigation solutions, it is thus of interest to estimate the return
period of catastrophic events of this magnitude.
Fig. 1 shows Lake Champlain’s annual maxima of daily water levels as measured since 1907
at the Burlington gauge station located in Vermont. The data are freely available from the
US Geological Survey database (https://waterdata.usgs.gov). The series seems to be
stationary; for example, the p-value of the Mann–Kendall test is about 0.54. To see whether
the lake’s 2011 historical high of 31.45 m could be predicted from this record, one could fit
a generalized extreme value (GEV) distribution to the annual maxima from the period 1907–
2010 spanning 104 years. Recall that the GEV distribution is the limiting distribution of properly
Address for correspondence: Jonathan Jalbert, Département de mathématiques et de génie industriel,
École Polytechnique de Montréal, CP 6179, Succursale Centre-ville, Montréal, Québec, H3C 3A7, Canada.
E-mail: jonathan.jalbert@polymtl.ca
Fig. 1. Lake Champlain’s annual maxima of daily water level at the Burlington gauge station in Vermont
(horizontal axis: year, 1907–2016; vertical axis: lake level in metres)
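normalized sample maxima whose distribution function $H_{\mu,\sigma,\xi}$ with location $\mu \in \mathbb{R}$, scale $\sigma > 0$
and shape $\xi \in \mathbb{R}$ is given, for all $z \in \mathbb{R}$, by
\[
H_{\mu,\sigma,\xi}(z) =
\begin{cases}
\exp\left[-\left\{1 + \xi\,\dfrac{z-\mu}{\sigma}\right\}^{-1/\xi}\right], & \xi \neq 0 \text{ and } 1 + \xi(z-\mu)/\sigma > 0,\\[2ex]
\exp\left[-\exp\left\{-\dfrac{z-\mu}{\sigma}\right\}\right], & \xi = 0.
\end{cases}
\]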
For background on this class of distributions, see, for example, Coles (2001). The maximum
likelihood estimates of the parameters are $\hat{\mu} = 30.239$, $\hat{\sigma} = 0.392$ and $\hat{\xi} = -0.440$. Because $\hat{\xi}$ is
negative, the fitted GEV distribution has a finite upper end point whose estimate is 31.13 m. At
31.45 m, the 2011 peak water level thus lies beyond the 95% confidence interval for this upper
end point, i.e. (30.048, 31.419). Based on this classical GEV analysis, Lake Champlain’s 2011
water level maximum seems nearly impossible to predict from past lake level records. This is
thus a ‘Black Swan’ in the sense of Taleb (2007).
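The end point quoted above can be checked directly from the reported estimates. The following is a minimal sketch, assuming only the GEV distribution function given earlier and the standard fact that, for a negative shape, the upper end point is $\mu - \sigma/\xi$; the function names are illustrative and are not taken from the authors' code.

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function H_{mu,sigma,xi}(z)."""
    if xi == 0.0:
        return math.exp(-math.exp(-(z - mu) / sigma))
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0.0:
        # Outside the support: below the lower end point when xi > 0,
        # above the upper end point when xi < 0.
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_upper_end_point(mu, sigma, xi):
    """Finite upper end point mu - sigma/xi when xi < 0, else +infinity."""
    return mu - sigma / xi if xi < 0.0 else math.inf

# Maximum likelihood estimates reported in the text (lake level in metres).
MU_HAT, SIGMA_HAT, XI_HAT = 30.239, 0.392, -0.440

end_point = gev_upper_end_point(MU_HAT, SIGMA_HAT, XI_HAT)
prob_below_2011 = gev_cdf(31.45, MU_HAT, SIGMA_HAT, XI_HAT)
print(f"estimated upper end point: {end_point:.2f} m")
print(f"fitted Pr(annual maximum <= 31.45 m): {prob_below_2011:.3f}")
```

Because the 2011 level of 31.45 m exceeds the estimated end point, the fitted distribution function equals 1 there, which is another way of saying that the fitted model assigns zero probability to a level that high.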
The inability of the GEV model to predict Lake Champlain’s 2011 flood is not surprising. In
this northern watershed, the maximum water level is observed during snow melt, which always
occurs between April and June. The yearly maximum is thus taken over this period, which
comprises only 91 days. In addition, the daily water levels exhibit strong auto-correlation, as
illustrated for spring 2011 in Fig. 2(a); this further reduces the effective block size on which
the asymptotic theory relies.
To estimate the return period of Lake Champlain’s spring 2011 flood, we focus instead
on daily precipitation, which is the most critical factor influencing floods in this watershed.
Using a hydrological model, Riboust and Brissette (2016) could indeed show that, although the
spring freshet in northern watersheds is typically the result of the snow melt and concurrent
Fig. 2. (a) Daily Lake Champlain water levels for spring 2011 (lake level in metres against date, April–July 2011)
and (b) spring rainfall accumulations (mm) from 1884 to 2011 at Burlington, Vermont
precipitation, the snowpack played a minor role in Lake Champlain’s spring 2011 flood. For
example, the largest snowpack was actually recorded during the spring of 2008, and yet it was
not an unusual year for the annual water level maximum (see Fig. 1). Combining the 2008
snowpack observations with the 2011 precipitation series in their model, Riboust and Brissette
(2016) found that the simulated flood was not much larger than the actual 2011 flood. They also
noted that the temperature that was recorded that spring did not play a major role.
A boxplot of the annual spring accumulation recorded in Burlington, Vermont, is shown in
Fig. 2(b). The data are for the years 1884–2011; the 2011 value, which is marked by a cross, is
an obvious outlier. Again, a simple approach would be to fit a generalized Pareto distribution
(GPD) to the tail of the observed spring accumulations. On the basis of standard tools, the
threshold can be fixed at the 75th percentile (u=278 mm). We then find 32 exceedances between
1884 and 2010, inclusively. The maximum likelihood estimates for the GPD parameters are
$\hat{\sigma} = 63.364$ and $\hat{\xi} = -0.278$. Because $\hat{\xi}$ is again negative, the fitted GPD has a finite upper end
point whose estimate is 506.2 mm. At 510 mm, the spring 2011 accumulation thus lies beyond
Fig. 3. Daily precipitation (mm) at Burlington Airport and the daily Lake Champlain water levels for spring 2011
(April–July): the broken line marks the 95th centile of non-zero precipitation (u = 21.6 mm) observed between
April and June from 1884 to 2010
the support of the fitted model and the return period is undetermined. The 95% confidence
interval for the upper end point of the GPD is (261.6, 750.9).
To motivate an alternative approach, consider Fig. 3, which shows the daily precipitation
recorded at the Burlington Airport station between April and June 2011. The red broken line is
the 95th centile of non-zero precipitation (u = 21.6 mm) observed during these 3 months over
the entire record, which extends from 1884 to 2010. As can be seen, this threshold was exceeded
on 8 days, marked by blue asterisks, between April and June 2011. Also highlighted in blue in this
picture are five clusters of high precipitation, defined here as streaks of consecutive rainy days
containing at least one exceedance above the threshold u = 21.6 mm. In two cases, an extreme
rainfall was preceded by a day of medium rainfall that was due to the same weather system.
Comparing with Fig. 2(a), we can see that the lake level rose sharply following the 4-day cluster
that accumulated a total of 103 mm of precipitation, and that it began to recede gradually only after
the heavy spring rains had passed. This tendency of threshold exceedances to occur in streaks
can actually be observed in the entire Burlington spring daily precipitation series. Between 1884
and 2010, there were 233 daily exceedances of u = 21.6 mm, only 48 of which were isolated events
with no rain on either the previous day or the next. Because the total accumulation per streak
can be much larger than a given exceedance, a proper assessment of accumulation thus requires
modelling clusters of high precipitation.
In this paper, we propose an extension of the peaks-over-threshold (POT) approach to model
accumulations within clusters of high precipitation. Whereas the classical POT model considers
only the frequency and severity of cluster maxima, rain accumulation in each cluster is needed
to assess flood risk properly. The new model scales up each cluster maximum by a possibly
dependent random factor. The dependence between the cluster maximum and the scaling factor
is modelled through a copula. As we demonstrate with the Burlington precipitation data, this
random-scale model is simple to implement and leads to a realistic estimate of the return period
of the spring 2011 flood, which could not easily be done, either with the standard approaches
that were described above or more advanced techniques that are reviewed in Section 5.
The rest of the paper is organized as follows. The new random-scale model is presented and
motivated in Section 2. The model is then fitted to the Burlington precipitation data in Section 3.
In Section 4, the return period of Lake Champlain’s spring 2011 flood is computed by using
only the precipitation as the proxy for flood. Comparisons with existing models are discussed in
Section 5. Conclusions are presented in Section 6. Appendix A reports the results of a small-scale
simulation study. Note that the code used for estimating the random-scale model is available
from https://github.com/jojal5.
2. Random-scale model for cluster accumulation
Let $Y_1, Y_2, \ldots$ be a stationary time series of non-negative measurements. In the present context,
these values will represent daily precipitations and will be called as such in what follows, but of
course the model can be used for other types of data as well. Suppose that $n$ clusters of high
precipitation, say $C_1, \ldots, C_n$, were identified by using some high threshold $u$. The exact cluster
definition is not important at this point; it is only assumed that each cluster contains at least one
exceedance, that every exceedance belongs to a cluster and that the clusters are non-overlapping.
2.1. Model description
For each $i \in \{1, \ldots, n\}$, let $\mathbf{Y}_i = (Y_j : j \in C_i)$ be the vector of daily precipitation amounts
corresponding to cluster $C_i$. Let also $M_i$ and $S_i$ respectively denote the maximum daily precipitation
and total precipitation in cluster $C_i$, i.e.
\[
M_i = \|\mathbf{Y}_i\|_\infty = \max(Y_j, j \in C_i), \qquad S_i = \sum_{j \in C_i} Y_j .
\]
Further let $L_i = |C_i|$ be the size of $C_i$ and $P_i = M_i/S_i$ denote the ratio of the cluster maximum to
the cluster sum. The quantity $L_i P_i$ is often referred to as the peak-to-average ratio in engineering;
see, for example, Morrison and Tobias (1965). For this reason, we propose to call $P_i$ the
peak-to-sum ratio associated with cluster $C_i$.
We regard $(M_1, P_1), \ldots, (M_n, P_n)$ as mutually independent copies of a pair $(M, P)$ corresponding
to a generic cluster $C$ of length $L$. The assumption of independence between clusters
is motivated by theorem 4.5 of Hsing (1987). We then seek a joint distribution for $(M, P)$ given
$M > u$, from which the cluster sum $S$ can be recovered as $S = M/P$.
For this, first note that $P = 1$ when $L = 1$, in which case the distribution of $P$ is degenerate. Let
$\omega_u = \Pr(P = 1 \mid M > u)$ and $F_u$ be the excess distribution of $M$, i.e. the conditional distribution
function of $M - u$ given $M > u$. Assume that, for all $m \in [0, \infty)$, we have
$\Pr(M - u \le m \mid M > u, P = 1) = F_u(m)$, which also implies that
$\Pr(M - u \le m \mid M > u, P < 1) = F_u(m)$ for all $m \in [0, \infty)$.
The expression
\[
\Pr(M - u \le m, P \le p \mid M > u)
= \omega_u \mathbf{1}(p = 1) F_u(m) + (1 - \omega_u) \Pr(M - u \le m, P \le p \mid M > u, P < 1) \tag{1}
\]
is then valid for all $m \in [0, \infty)$ and $p \in (0, 1]$. Let also $G_u$ denote the distribution of $P$ given
$M > u$ and $P < 1$. We then call on Sklar’s representation theorem (Nelsen, 2006) to write, for all
$m \in [0, \infty)$ and $p \in (0, 1)$,
\[
\Pr(M - u \le m, P \le p \mid M > u, P < 1) = D\{F_u(m), G_u(p)\} \tag{2}
\]
in terms of a copula $D$, i.e. a joint cumulative distribution function having standard uniform
margins $\mathcal{U}(0, 1)$. Equations (1) and (2) together imply that the marginal distributions are
respectively given, for all $m \in (0, \infty)$ and $p \in (0, 1]$, by
\[
\Pr(M - u \le m \mid M > u) = F_u(m), \qquad
\Pr(P \le p \mid M > u) = \omega_u \mathbf{1}(p = 1) + (1 - \omega_u) G_u(p).
\]
The random-scale model is then specified by selecting suitable classes of univariate distributions
for $F_u$ and $G_u$, as well as a family of bivariate copulas for $D$. These issues are addressed in turn
in Sections 2.2 and 2.3.
2.2. Choice of marginal distributions
To choose a model for the excess distribution $F_u$, recall from the Pickands–Balkema–de Haan
theorem that, if the univariate marginal distribution of the underlying time series is in the domain
of attraction of an extreme value distribution, $F_u$ is well approximated by the GPD with scale
$\sigma_u > 0$ and shape $\xi \in \mathbb{R}$, i.e., for all $m \in (0, \infty)$,
\[
\Pr(M - u \le m \mid M > u) \approx F_{\sigma_u,\xi}(m) =
\begin{cases}
1 - (1 + \xi m/\sigma_u)^{-1/\xi}, & \xi \neq 0 \text{ and } 1 + \xi m/\sigma_u > 0,\\
1 - \exp(-m/\sigma_u), & \xi = 0.
\end{cases}
\]
As will be seen in Section 3, the GPD approximation works well for the Burlington precipitation
data.
To find a suitable distribution $G_u$ for $P$ given $M > u$ and $P < 1$, first note that, if a generic
cluster $C$ contains no 0s almost surely (as in our application), then the events $\{P < 1\}$ and $\{L > 1\}$
are equal almost surely and thus $G_u$ is the distribution function of $P$ given $M > u$ and $L > 1$.
Second, $G_u$ clearly depends on $L$ because we always have $S \le LM$ and hence $P \in [1/L, 1]$. Given
$L = l \in \{2, 3, \ldots\}$, a convenient choice for the conditional density of $P$ would be defined, for all
$p \in (1/l, 1)$, by
\[
f_{(P \mid M > u, L = l)}(p) = \mathcal{B}_{(1/l, 1)}(p \mid \alpha_{l,u}, \beta_{l,u}),
\]
where $\mathcal{B}_{(\theta, 1)}(p \mid \alpha, \beta)$ denotes the density of the random variable $(1 - \theta)X + \theta$, where $X$ has a
$\mathcal{B}(\alpha, \beta)$ beta distribution.
To construct Gu, we could thus use a hierarchical model in which the cluster length Lis
modelled at the first level and the above conditional distribution for Pgiven Lis used at the
second level. The distribution of the cluster length is generally cumbersome and, more im-
portantly, depends on the way in which the clusters are defined. For example, in Markovich
(2014), a geometric-like distribution involving the extremal index is proposed for the number
of consecutive threshold exceedances; the case where the extremal index is 0 was considered in
Markovich (2017). However, these results are not applicable to clusters that can also include
non-exceedances, as in our application.
To circumvent having to model cluster length, we advocate here a simpler solution that happens
to work well for the Burlington precipitation data, as we demonstrate in Section 3. Specifically,
we propose to model $G_u$ directly with the $\mathcal{B}_{(\theta_u, 1)}(p \mid \alpha_u, \beta_u)$ distribution, where $\theta_u \in (0, 1)$ is
an additional parameter that accounts for the variable cluster length. This proposal effectively
pools all clusters of length $l > 1$ without imposing any upper bound on cluster length. Although
$\theta_u$ does not have a direct interpretation in terms of cluster length, small values of this parameter
are indicative of the presence of long clusters with several large values. The fitted scaled beta
distribution can moreover be used to make probabilistic statements of the following kind. If $l$
is an integer such that $1/l > \theta_u$, then $L > l$ with probability at least $\Pr(P < 1/l)$. This is because
$P \ge 1/L$, so $P < 1/l$ implies that $L > l$.
2.3. Choice of dependence structure
Finally, a parametric copula family must be chosen for $D$. Although elements of theory that
could inform this choice are scant, a few things can be said. For example, suppose that a generic
cluster $C$ has length $l$ and that the vector $\mathbf{Y}$ of elements of $C$ is multivariate regularly varying
(Resnick, 1987). This implies that if ‘$\|\cdot\|_\infty$’ denotes the max-norm, then there is a real $\eta > 0$ and
a probability distribution $\varsigma$ on the unit simplex $\{x \in [0, 1]^l : \|x\|_\infty = 1\}$ such that
\[
\frac{\Pr(\|\mathbf{Y}\|_\infty > yt,\ \mathbf{Y}/\|\mathbf{Y}\|_\infty \in \cdot)}{\Pr(\|\mathbf{Y}\|_\infty > t)} \rightsquigarrow y^{-\eta} \varsigma(\cdot) \tag{3}
\]
for all $y > 0$ as $t \to \infty$, where ‘$\rightsquigarrow$’ denotes weak convergence. In view of corollary 5.18 in Resnick
(1987), $\mathbf{Y}$ is then in the domain of attraction of a multivariate extreme value distribution and
$M = \|\mathbf{Y}\|_\infty$ is in the domain of attraction of the Fréchet distribution with parameter $\eta$. More to
the point, expression (3) implies that, if $M > u$ for some high threshold $u$, $M$ and $\mathbf{Y}/M$ are nearly
independent. Thus, given $M > u$, we also have approximate independence between $M$ and $P$.
We can then take $D$ in equation (2) to be the product copula $\Pi$ defined, for all $u, v \in [0, 1]$, by
$\Pi(u, v) = uv$. As explained below, if $\mathbf{Y}$ is regularly varying, the tail of $S$ is correctly specified in
the random-scale model with $D = \Pi$.
Remark 1. Let $\mathbf{Y} = (Y_1, \ldots, Y_l) \in \mathbb{R}^l$ be a multivariate regularly varying random vector with
non-negative components. Set $M = \max(Y_1, \ldots, Y_l)$ and $S = Y_1 + \cdots + Y_l$. Then there is a Radon
measure $Q$ on $\mathbb{R}^l \setminus \{0\}$ such that $\Pr(\mathbf{Y}/t \in \cdot)/\Pr(M > t) \Rightarrow Q$ as $t \to \infty$, where ‘$\Rightarrow$’ refers to vague
convergence. In view of lemma 3.9 in Jessen and Mikosch (2006), we have
\[
\lim_{t \to \infty} \frac{\Pr(S > t)}{\Pr(M > t)} = \kappa \equiv Q\{(x_1, \ldots, x_l) \in (0, \infty)^l : x_1 + \cdots + x_l > 1\}.
\]
Given that $\Pr(S > t) \ge \Pr(M > t)$, we have $\kappa \ge 1$ and hence $S$ and $M$ are tail equivalent; in fact, they
are both in the domain of attraction of the Fréchet distribution with the same shape parameter.
This tail equivalence between $S$ and $M$ is preserved when $S = M/P$ with $P$ independent of $M$,
provided that $M$ is in the domain of attraction of the Fréchet distribution with shape parameter
$\eta$ and $E(1/P^{\eta + \epsilon}) < \infty$ for some real $\epsilon > 0$. This result, which follows from Breiman’s lemma
(Jessen and Mikosch (2006), lemma 4.2), holds in particular when $P$ is bounded from below.
Multivariate regular variation is not the only scenario under which the independence copula
$\Pi$ may be a suitable choice for $D$ when $u$ is sufficiently high. Suppose for example that the vector
$\mathbf{Y}$ admits the representation
\[
(Y_1, \ldots, Y_l) = R \times (Z_1, \ldots, Z_l)/\max(Z_1, \ldots, Z_l),
\]
where $Z_1, \ldots, Z_l$ and $R$ are mutually independent strictly positive random variables, and
$Z_1, \ldots, Z_l$ are identically distributed. By construction, we then have independence between
$M = \|\mathbf{Y}\|_\infty = R$ and
\[
P = M/(Y_1 + \cdots + Y_l) = \max(Z_1, \ldots, Z_l)/(Z_1 + \cdots + Z_l).
\]
In this construction, the distribution of $R$ can be arbitrary; in particular, it need not be heavy
tailed. The simulation study that is reported in Appendix A suggests that $D = \Pi$ also holds (at
least approximately) in other settings involving vectors Ywhose components have light-tailed
distributions and are asymptotically independent, provided that a sufficiently high threshold
is selected. However, the simulation study also reveals that there are cases in which D=Πis a
poor choice.
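The representation $\mathbf{Y} = R \times \mathbf{Z}/\max(\mathbf{Z})$ is easy to simulate, which gives a quick sanity check that $M$ coincides with $R$ and shows no linear association with $P$. The sketch below is illustrative only: the choices of a shifted exponential law for $R$ and uniform laws for the $Z_j$ are assumptions made here, not part of the paper, and a near-zero sample correlation is merely consistent with independence rather than a proof of it.

```python
import math
import random

def simulate_pairs(n, l, seed=2011):
    """Draw n triples (R, M, P) from Y = R * Z / max(Z), with Z_1..Z_l i.i.d."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        r = 1.0 + rng.expovariate(1.0)              # strictly positive, light tailed (assumed law)
        z = [1.0 - rng.random() for _ in range(l)]  # i.i.d. uniform on (0, 1] (assumed law)
        z_max = max(z)
        y = [r * zj / z_max for zj in z]            # the cluster vector Y
        m = max(y)                                  # cluster maximum, equal to R up to rounding
        p = m / sum(y)                              # peak-to-sum ratio max(Z) / sum(Z)
        draws.append((r, m, p))
    return draws

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

draws = simulate_pairs(n=5000, l=3)
maxima = [m for _, m, _ in draws]
ratios = [p for _, _, p in draws]
corr = pearson(maxima, ratios)
print("largest |M - R|:", max(abs(m - r) for r, m, _ in draws))
print("sample correlation of (M, P):", round(corr, 3))
```

In practice a rank-based test, such as the one applied to the Burlington data in Section 3.3, would be preferable to a correlation coefficient for assessing independence.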
If the hypothesis of independence between Mand Pis rejected, a suitable copula family for D
can be chosen, fitted and validated by using rank-based techniques, as described, for example,
in Genest and Favre (2007) or Genest and Nešlehová (2012). Because $D$ is bivariate, there is a
wealth of models to tap into. In the example detailed in Appendix A, the asymmetric Gumbel
(or logistic) family appears to be a suitable choice. As will be seen in Section 3, however, the
independence assumption seems reasonable for the Burlington precipitation data.
3. Application to the Burlington precipitation data
To illustrate the use of the random-scale model proposed here, it will now be fitted to the
precipitation series measured at Burlington before 2011. The model will then be used in Section
4 to estimate the return period of the 2011 flood.
3.1. Data description
Daily precipitation in millimetres was considered for the months of April–June, for the pe-
riod 1884–2010. The data were extracted from the web site of the National Centers for En-
vironmental Information of the US National Oceanic and Atmospheric Administration; see
https://www.ncdc.noaa.gov/. For the period 1884–1943, we used the measurements
that were taken at a weather station 3 km from the airport in Burlington, Vermont. As this
station was then closed, we resorted to data that were collected at the airport itself for the years
1944–2010. To justify pooling the two series, we checked that the years 1943 and 1944 were not
change points in the combined series of annual maxima. We also tested the stationarity of this
series and its two subseries. In particular, the p-values of the Mann–Kendall test were 0.52, 0.53
and 0.24 for the pooled series and the first and second subseries respectively.
The stationarity of the total spring accumulations before 2011 was also checked by using the
Mann–Kendall trend test (p-value 0.048) and the stationarity test of Priestley and Subba Rao
(1969), whose p-value was 0.475. Moreover, we investigated the stationarity of the non-extreme
and extreme accumulations separately, i.e. the accumulation due to precipitation excluding the
clusters of high precipitation, and accumulation stemming from clusters of high precipitation
only. The Mann–Kendall test did not reject the hypothesis of no trend at the 5%
level in either case; the p-value was 0.072 for non-extreme accumulations and 0.479 for extreme
accumulations.
3.2. Cluster definition
Before the random-scale model can be fitted to the Burlington data, non-overlapping clusters
of high precipitation must be constructed. This requires the selection of a high threshold uand
a cluster definition which ensures that each of them contains at least one exceedance above u,
and each exceedance belongs to one and only one cluster.
After considering different options, we defined a cluster of high precipitation as a streak (i.e.
an uninterrupted sequence) of consecutive days with non-zero precipitation containing at least
one value above a high threshold u. This way, each cluster is then separated from any other by at
least 1 day without rain. This definition leads to somewhat different clusters from the classical
runs method (O’Brien, 1987; Smith and Weissman, 1994), which puts threshold exceedances
in the same cluster unless they are separated by at least $r$ non-exceedances. An advantage of
Fig. 4. Daily precipitation series (mm) at Burlington, Vermont, against year (1883–2010); the threshold shown is
the 95th centile of non-zero daily precipitation amounts (u = 21.6 mm)
the present cluster definition is that it allows clusters of high precipitation to start or end with
a non-exceedance. This is convenient because rainfall that is associated with a given weather
system may intensify gradually. This was so for four of the five clusters in spring 2011, as can
be seen in Fig. 3.
Using the 95th centile of non-zero daily precipitation amounts u = 21.6 mm as the threshold,
there were 233 exceedances between 1884 and 2010. The series is displayed in Fig. 4, along with
the threshold. There were 220 clusters of high precipitation as per our definition; 208 contained
one exceedance, 11 contained two, and one contained three. There were 48, 65, 44 and 16 clusters
of length 1, 2, 3 and 4 respectively; the largest cluster was of size 14.
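The cluster definition used here (a maximal streak of consecutive days with non-zero precipitation containing at least one value above $u$) translates into a single pass over the series. The following is a minimal sketch on a made-up toy series; the function names and the toy values are hypothetical and do not come from the Burlington record or from the authors' repository.

```python
def find_clusters(precip, u):
    """Return clusters of high precipitation as lists of day indices.

    A cluster is a maximal run of consecutive days with non-zero
    precipitation that contains at least one value above the threshold u.
    """
    clusters, run = [], []
    # A trailing dry day acts as a sentinel that closes the last run.
    for day, amount in enumerate(list(precip) + [0.0]):
        if amount > 0.0:
            run.append(day)
        else:
            if run and any(precip[d] > u for d in run):
                clusters.append(run)
            run = []
    return clusters

def summarize(precip, cluster):
    """Length L, maximum M, sum S and peak-to-sum ratio P = M/S of one cluster."""
    values = [precip[d] for d in cluster]
    m, s = max(values), sum(values)
    return {"length": len(values), "max": m, "sum": s, "peak_to_sum": m / s}

# Hypothetical daily amounts in mm; 0.0 marks a day without rain.
toy = [0.0, 5.0, 30.0, 2.0, 0.0, 25.0, 0.0, 3.0, 4.0, 0.0, 10.0, 22.0, 40.0]
U = 21.6

clusters = find_clusters(toy, U)
summaries = [summarize(toy, c) for c in clusters]
for c, s in zip(clusters, summaries):
    print(c, s)
```

On this toy series the rainy streak on days 7 and 8 is discarded because it never exceeds the threshold, the isolated exceedance on day 5 forms a cluster of length 1 with $P = 1$, and the last streak contains two exceedances but still counts as a single cluster.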
As a preliminary step, the pairs $(P_1, S_1), \ldots, (P_{220}, S_{220})$ are visualized in Fig. 5(a). The clusters
of length 1 correspond to the 48 points on the vertical line $P = 1$. The rank plot of the pairs
$(P_1, S_1), \ldots, (P_{220}, S_{220})$ in Fig. 5(b) clearly exhibits negative association between $P$ and $S$, and
in particular the clumping of points in the top left-hand corner. These points correspond to
clusters with a high precipitation accumulation but a small peak-to-sum ratio associated with
potentially dangerous weather systems with several days of heavy rain.
3.3. Choice of dependence structure
Fig. 6 shows the rank plot derived from the 172 pairs $(M_i, P_i)$ of cluster maxima and peak-to-sum
ratios for which $P_i < 1$. We cannot discern any particular pattern in Fig. 6, which suggests that
the assumption of independence between $M$ and $P$ given $P < 1$ is appropriate at threshold
level u = 21.6 mm. This conclusion is further supported by a p-value of 0.76 for the consistent
Cramér–von Mises test of independence based on the $L_2$-distance between the product copula $\Pi$
and an asymptotically unbiased rank-based estimate of the true underlying copula $D$; for details
about this test, which is available in the R package copula, see Genest and Rémillard (2004).
In contrast, modelling the dependence between $P$ and $S$ would be much more challenging, as
evidenced by the rank plot in Fig. 5(b).
3.4. Bayesian fitting of the distribution of cluster maxima
As stated in Section 2.2, suppose that the excess distribution $F_u$ of cluster maxima is a GPD
with scale $\sigma > 0$ and shape $\xi \in \mathbb{R}$. Further assume an improper prior for these parameters given,
for all $\sigma > 0$ and $\xi \in \mathbb{R}$, by $f_{(\sigma,\xi)}(\sigma, \xi) \propto 1/\sigma$. Note that this prior yields a proper posterior as
Fig. 5. (a) Scatter plot ($S$ against $P$) and (b) rank plot (rank of $S$ against rank of $P$) of the pairs
$(P_1, S_1), \ldots, (P_{220}, S_{220})$ of peak-to-sum ratios and cluster sums
Fig. 6. Rank plot (rank of $P$ against rank of $M$) derived from the 172 pairs $(M_i, P_i)$ of cluster maxima and
peak-to-sum ratios for which $P_i < 1$
long as the sample size is greater than 2 (Northrop and Attalides, 2016), which is the case here.
Bayesian estimates and associated 95% credible intervals for the parameters are then given by
\[
\hat{\sigma} = 8.6086 \in (7.1258, 10.2472), \qquad
\hat{\xi} = 0.0630 \in (-0.0464, 0.2056).
\]
The Bayesian QQ-plot displayed in Fig. 7(a) suggests an adequate fit, though the most extreme
precipitation observation is underestimated. To check the adequacy of this model further, the
fitted distribution $F_u$ was used to estimate the return period of the extreme rainfall
of 69.6 mm that occurred on April 26th, 2011; the estimate is 66 years. This may seem low, but it
does make good sense given that rainfalls of similar (or even higher) magnitude had already been
recorded in the past.
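A rough plug-in version of this return period calculation can be sketched as follows. Two assumptions are made here that the text does not spell out: the return period is taken to be the reciprocal of the expected number of cluster maxima per spring exceeding 69.6 mm, and the point estimates of $\sigma$ and $\xi$ are plugged into the GPD instead of integrating over the posterior. The result therefore need not coincide exactly with the 66 years reported above, although it lands in the same range.

```python
import math

def gpd_survival(m, sigma, xi):
    """Pr(M - u > m | M > u) for m >= 0 under the GPD with scale sigma and shape xi."""
    if xi == 0.0:
        return math.exp(-m / sigma)
    t = 1.0 + xi * m / sigma
    # For m >= 0, t <= 0 can only happen when xi < 0, beyond the upper end point.
    return t ** (-1.0 / xi) if t > 0.0 else 0.0

# Values taken from the text.
U = 21.6                             # threshold (mm)
SIGMA_HAT, XI_HAT = 8.6086, 0.0630   # Bayesian point estimates of the GPD parameters
N_CLUSTERS, N_YEARS = 220, 127       # clusters observed over the springs of 1884-2010

clusters_per_spring = N_CLUSTERS / N_YEARS
tail_prob = gpd_survival(69.6 - U, SIGMA_HAT, XI_HAT)

# Plug-in return period in years (assumed definition, see the lead-in).
return_period = 1.0 / (clusters_per_spring * tail_prob)
print(f"Pr(cluster maximum > 69.6 mm | M > u) = {tail_prob:.4f}")
print(f"plug-in return period = {return_period:.1f} years")
```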
3.5. Bayesian fitting of the peak-to-sum ratio
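When the scaled beta distribution for $G_u$ is used, the marginal distribution of $P$ given $M > u$ is
the 1-inflated scaled beta distribution defined, for all $p \in [0, 1]$, by
\[
\mathcal{IB}(p \mid \omega, \theta, \alpha, \beta) = \omega\, \delta_{\{1\}}(p) + (1 - \omega)\, \mathcal{B}_{(\theta, 1)}(p \mid \alpha, \beta),
\]
where $\delta_{\{1\}}$ denotes a Dirac mass at 1. To fit this distribution, it was first reparameterized by
setting $\nu = \alpha/(\alpha + \beta)$ and $\gamma = \alpha + \beta$, so that the following non-informative priors could be used:
\[
\begin{aligned}
f_\omega(\omega) &\propto \omega^{-1}(1 - \omega)^{-1}, & \omega &\in (0, 1);\\
f_\theta(\theta) &= 1, & \theta &\in (0, 1);\\
f_\nu(\nu) &\propto 1, & \nu &\in (0, 1);\\
f_\gamma(\gamma) &\propto 1/\gamma, & \gamma &\in (0, \infty).
\end{aligned}
\]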
Fig. 7. (a) QQ-plot of the GPD fitted to the 220 cluster maxima (quantiles of the generalized Pareto distribution
against sample quantiles, in mm) and (b) QQ-plot of the 1-inflated beta distribution fitted to the 220 peak-to-sum
ratios (quantiles of the scaled beta against sample quantiles); both panels show the data together with 95%
credible bounds
Fig. 8. (a) QQ-plot of the cluster sums from the random-scale model (simulated against observed cluster
accumulation, in mm) and (b) rank plot of pairs $(P, S)$ derived from one random sample of size 220 from the
fitted random-scale model
The posterior for the lower bound $\theta$ is insensitive to this choice of prior (not shown). The
QQ-plot of the fitted 1-inflated scaled beta distribution is displayed in Fig. 7(b). It suggests
a good fit, particularly in the lower tail. This is important because low values of $P$ typically
correspond to long clusters with several days of heavy rain. The Bayesian point estimates of the
1-inflated scaled beta distribution are $\hat{\theta} = 0.205$, $\hat{\alpha} = 1.92$, $\hat{\beta} = 1.14$ and $\hat{\omega} = 0.207$.
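To make the fitted scaled beta component concrete, the sketch below evaluates its density and distribution function at the point estimates just reported, converts them to the $(\nu, \gamma)$ parameterization, and computes $G_u(1/l)$ for the integers $l$ with $1/l > \hat{\theta}$. By the argument of Section 2.2, each such value is a lower bound on the probability that a cluster with $P < 1$ is longer than $l$ days. The function names are illustrative, the distribution function is obtained by a simple midpoint rule chosen here for self-containedness, and plugging in point estimates is a simplification of the Bayesian analysis.

```python
import math

def scaled_beta_pdf(p, theta, alpha, beta):
    """Density of (1 - theta) * X + theta, where X ~ Beta(alpha, beta)."""
    if not theta < p < 1.0:
        return 0.0
    x = (p - theta) / (1.0 - theta)
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    log_pdf = log_norm + (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x)
    return math.exp(log_pdf) / (1.0 - theta)

def scaled_beta_cdf(p, theta, alpha, beta, steps=20000):
    """G_u(p), approximated by the midpoint rule on (theta, min(p, 1))."""
    if p <= theta:
        return 0.0
    upper = min(p, 1.0)
    h = (upper - theta) / steps
    return h * sum(scaled_beta_pdf(theta + (i + 0.5) * h, theta, alpha, beta)
                   for i in range(steps))

# Point estimates reported in the text.
THETA, ALPHA, BETA = 0.205, 1.92, 1.14

nu = ALPHA / (ALPHA + BETA)      # mean of the underlying beta distribution
gamma = ALPHA + BETA             # concentration of the underlying beta distribution
mean_p = THETA + (1.0 - THETA) * nu
print(f"nu = {nu:.3f}, gamma = {gamma:.2f}, E(P | P < 1) = {mean_p:.3f}")

# Lower bounds on Pr(L > l | M > u, P < 1) for integers l with 1/l > theta.
bounds = {l: scaled_beta_cdf(1.0 / l, THETA, ALPHA, BETA)
          for l in range(2, 10) if 1.0 / l > THETA}
for l, bound in bounds.items():
    print(f"Pr(L > {l}) is at least G_u(1/{l}) = {bound:.4f}")
```

With $\hat{\theta} = 0.205$, only $l = 2, 3, 4$ satisfy $1/l > \hat{\theta}$, so the fitted model says nothing directly about clusters longer than five days.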
Fig. 8 provides two additional diagnostic plots attesting to the good fit of the random-scale
model. Fig. 8(a) displays the QQ-plot of the cluster sums in which the theoretical quantiles were
computed by a Monte Carlo procedure. The fit of the cluster sum distribution derived from the
random-scale model is acceptable; in spite of a slight overestimation in the interval (80, 120),
the right-hand tail is well estimated. Fig. 8(b) shows the rank plot of the pairs $(P, S)$ for one
random sample of size 220 generated from the fitted random-scale model. Comparing Fig. 8(b)
with Fig. 5(b), we can see that the dependence between $P$ and $S$ is well captured.
4. Computation of the return period of the spring 2011 flood
In the Lake Champlain watershed, the value $T$ of the spring precipitation accumulation is the
main contributing factor to floods. As mentioned before and illustrated in Fig. 2(b), the value
of $T$ observed in 2011 was very high: 510 mm. Because of the presence of extreme rainfall, it is
natural to regard $T$ as the sum $Z + W$ of two independent components, namely the accumulation
$Z$ of non-extreme rainfall and the accumulation $W$ of precipitation from the clusters of high
precipitation. For any given year $k \in \{1, \ldots, 127\}$ between 1884 and 2010, the observed value $Z_k$
is simply the total precipitation accumulation in year $k$ minus the accumulation $W_k$ of rain from
clusters of high precipitation in the same year. The independence between $Z$ and $W$ was assessed
by using the tie-corrected version of the Cramér–von Mises rank test of independence that is
described in Genest et al. (2019) (p-value approximately 0.098); the rank plot is displayed in
Fig. 9. The horizontal line of points in Fig. 9 corresponds to years with no extreme precipitation.
Fig. 9. Rank plot (rank of $Z$ against rank of $W$) derived from the pairs $(W_1, Z_1), \ldots, (W_{127}, Z_{127})$ of total
extreme and non-extreme precipitations for the years 1884–2010
Fig. 10. QQ-plots of the non-extreme spring accumulation (quantiles of the normal distribution against sample
quantiles, in mm): (a) random-scale model; (b) M3–Dirichlet model; (c) conditional exceedance model fitted by
using constrained maximum likelihood; (d) conditional exceedance model fitted by using the semiparametric
Bayesian method; (e) first-order Markov chain model with asymptotic independence; (f) first-order Markov chain
model with asymptotic dependence
Because $Z_k$ is a sum of daily rainfall amounts, none of which is extreme, and given that
the entire series is stationary, it seems reasonable to assume that $Z_1, \ldots, Z_{127}$ form a normal
random sample. This assumption was not rejected by a Shapiro–Wilk normality test (p-value
approximately 0.67). The predictive distribution of the accumulation $Z$ of non-extreme rainfall
was found to be Student $t$ with $n - 1 = 126$ degrees of freedom, location $\bar{z} = 161.2$ and scale
$\sqrt{(n+1)s^2/n}$ with $s^2 = 1739.9$; the corresponding 95% credible intervals are $(221.487, 245.91)$
and $(61.9103, 79.3274)$. These Bayesian estimates were obtained by using the reference prior
defined, for all $\tau > 0$, by $f(\nu, \tau) \propto 1/\tau^2$. From the QQ-plot of non-extreme accumulations
displayed in Fig. 10(a), the fit is good.
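As a worked check of the scale of this Student $t$ predictive distribution, the values $n = 127$ and $s^2 = 1739.9$ quoted above can simply be plugged in; the rounding to one decimal is ours:

$$\sqrt{\frac{(n+1)\,s^2}{n}} = \sqrt{\frac{128 \times 1739.9}{127}} \approx 41.9, \qquad Z \mid y \;\sim\; \bar{z} + 41.9\, t_{126}.$$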
Using the random-scale model, the distribution of $W = W_k$ for any given year $k \in \{1, \ldots, 127\}$
can be approximated by Monte Carlo sampling, as follows. First, the number $N_k$ of clusters of
high precipitation in spring $k$ is drawn from the predictive distribution given, for all $n \in \mathbb{N}$, by

$$f_{(N_k \mid Y = y)}(n) = \int_0^\infty \mathcal{P}(n \mid 91\lambda)\, \mathcal{G}(\lambda \mid a; b)\, d\lambda, \tag{4}$$

where $\mathcal{P}(\cdot \mid \zeta)$ denotes the Poisson distribution with mean $\zeta$ and $\mathcal{G}(\cdot \mid a; b)$ is the gamma
distribution with mean $a/b$; the latter distribution is the posterior for $\lambda$ given $Y$ since the improper
prior $f_\lambda(\lambda) \propto 1/\lambda$ was assumed for the frequency of clusters. Here $\zeta = 91\lambda$, where the factor 91
denotes a period of 91 days, i.e. the months of April, May and June which constitute the spring season.
Furthermore, $a = 220$ corresponds to the number of cluster maxima and $b = 11557$ corresponds
to the number of days of observations (127 years with 91 spring days per year).
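The integral in distribution (4) has a well-known closed form: mixing a Poisson count with mean $91\lambda$ over a gamma distribution with shape $a$ and rate $b$ gives a negative binomial distribution. The following Python sketch evaluates it with the values $a = 220$ and $b = 11557$ from the text; the function names are ours and purely illustrative.

```python
import math

def cluster_count_pmf(n, a=220.0, b=11557.0, days=91.0):
    """Predictive probability of n clusters in one spring.

    Mixing a Poisson(days * lam) count over lam ~ Gamma(shape=a, rate=b)
    yields a negative binomial with size a and success probability
    b / (b + days).
    """
    log_pmf = (math.lgamma(a + n) - math.lgamma(a) - math.lgamma(n + 1)
               + a * math.log(b / (b + days))
               + n * math.log(days / (b + days)))
    return math.exp(log_pmf)

def prob_at_least(n_min, a=220.0, b=11557.0, days=91.0):
    """Predictive probability of observing n_min or more clusters."""
    return 1.0 - sum(cluster_count_pmf(n, a, b, days) for n in range(n_min))

# Predictive mean of the cluster count (series truncated at 200 terms).
mean_count = sum(n * cluster_count_pmf(n) for n in range(200))
```

The predictive mean equals $91a/b$, i.e. about 1.73 clusters per spring with these values.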
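Second, given a number $N_k = n_k$ of clusters of high precipitation, the cluster maxima
$M_1, \ldots, M_{n_k}$ are drawn independently from the predictive distribution obtained from the POT
model given, for all $z > 0$, by

$$f_{(M - u \mid Y = y)}(z) = \int_{-\infty}^{\infty} \int_0^\infty \mathrm{GP}(z \mid \sigma, \xi)\, f_{[(\sigma, \xi) \mid Y = y]}(\sigma, \xi)\, d\sigma\, d\xi, \tag{5}$$

where $\mathrm{GP}(\cdot \mid \sigma, \xi)$ denotes the GPD. Third, the peak-to-sum ratios $P_1, \ldots, P_{n_k}$ are drawn
independently from the predictive distribution defined, for all $p > 0$, by

$$f_{(P \mid Y = y)}(p) = \int_0^1 \int_0^1 \int_0^1 \int_0^\infty \mathrm{IB}(p \mid \omega, \theta, \nu, \gamma)\, f_{[(\omega, \theta, \nu, \gamma) \mid Y = y]}(\omega, \theta, \nu, \gamma)\, d\gamma\, d\nu\, d\theta\, d\omega. \tag{6}$$

The total amount of rain from clusters of high precipitation is then given by
$W = M_1/P_1 + \cdots + M_{n_k}/P_{n_k}$. This procedure is summarized in algorithm 1 in Table 1.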
The QQ-plots of the total and extreme spring precipitation accumulation corresponding to
Table 1. Algorithm 1: generating rainfall accumulation for spring $k$ from $N_k$ clusters of
high precipitation
Step 1: draw the number $N_k = n_k$ of clusters of high precipitation from distribution (4)
Step 2: draw the excesses $M_1 - u, \ldots, M_{n_k} - u$ from distribution (5)
Step 3: draw the peak-to-sum ratios $P_1, \ldots, P_{n_k}$ from distribution (6)
Step 4: compute the accumulation of precipitation pertaining to clusters of high precipitation,
$W_k = M_1/P_1 + \cdots + M_{n_k}/P_{n_k}$
Step 5: draw the accumulation $Z_k$ of non-extreme rainfall from its predictive distribution
Step 6: compute the total spring accumulation $T_k = Z_k + W_k$
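The six steps of algorithm 1 can be sketched in Python as follows. This is a simplified illustration rather than the authors' implementation: the gamma posterior for $\lambda$, the threshold $u = 21.6$ mm and the Student $t$ predictive distribution for $Z$ use values quoted in the text, but the GPD parameters and the 1-inflated scaled beta sampler are hypothetical plug-in stand-ins, so posterior uncertainty in steps 2 and 3 is ignored.

```python
import math
import random

U = 21.6                         # threshold (mm) quoted in the text
A, B = 220.0, 11557.0            # gamma posterior for lambda: shape, rate
N_YEARS, Z_BAR, S2 = 127, 161.2, 1739.9

# Hypothetical plug-in values, NOT estimates reported in the paper.
SIGMA, XI = 10.0, 0.1            # GPD scale and shape
OMEGA = 0.3                      # probability that P equals 1
P_MIN, ALPHA, BETA = 0.3, 2.0, 2.0   # lower bound and beta shapes for P < 1

def draw_poisson(mean, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def draw_gpd_excess(sigma, xi, rng):
    """Inverse-CDF draw from the generalized Pareto distribution."""
    v = 1.0 - rng.random()       # uniform on (0, 1]
    if abs(xi) < 1e-12:
        return -sigma * math.log(v)
    return sigma * (v ** (-xi) - 1.0) / xi

def draw_peak_to_sum(rng):
    """Stand-in for the 1-inflated scaled beta: P = 1 w.p. OMEGA."""
    if rng.random() < OMEGA:
        return 1.0
    return P_MIN + (1.0 - P_MIN) * rng.betavariate(ALPHA, BETA)

def draw_student_t(df, rng):
    """Standard Student t draw as normal / sqrt(chi-square / df)."""
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / df)

def simulate_spring(rng):
    """One pass through steps 1-6; returns (W_k, Z_k, T_k) in mm."""
    lam = rng.gammavariate(A, 1.0 / B)                 # posterior draw of lambda
    n_k = draw_poisson(91.0 * lam, rng)                # step 1
    maxima = [U + draw_gpd_excess(SIGMA, XI, rng) for _ in range(n_k)]  # step 2
    ratios = [draw_peak_to_sum(rng) for _ in range(n_k)]                # step 3
    w_k = sum(m / p for m, p in zip(maxima, ratios))   # step 4
    scale = math.sqrt((N_YEARS + 1) * S2 / N_YEARS)
    z_k = Z_BAR + scale * draw_student_t(N_YEARS - 1, rng)              # step 5
    return w_k, z_k, w_k + z_k                         # step 6

rng = random.Random(2011)
draws = [simulate_spring(rng) for _ in range(10000)]
```

Because every peak-to-sum ratio is at most 1, each simulated cluster sum is at least as large as its maximum, so $W_k$ is either 0 (no cluster) or at least $u$.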
Fig. 11. QQ-plots of the total spring accumulation (axes: observed and simulated spring accumulations
(mm)): (a) random-scale model; (b) M3–Dirichlet model; (c) conditional exceedance model fitted by using
constrained maximum likelihood; (d) conditional exceedance model fitted by using the semiparametric
Bayesian method; (e) first-order Markov chain model with asymptotic independence; (f) first-order
Markov chain model with asymptotic dependence
Fig. 12. QQ-plots of the extreme spring accumulation (axes: observed and simulated spring accumulations
of extremes (mm)): (a) random-scale model; (b) M3–Dirichlet model; (c) conditional exceedance model
fitted by using constrained maximum likelihood; (d) conditional exceedance model fitted by using the
semiparametric Bayesian method; (e) first-order Markov chain model with asymptotic independence;
(f) first-order Markov chain model with asymptotic dependence
Fig. 13. Predictive density of the return period estimated with the spring accumulation of the period
1884–2010 (horizontal axis: return period (years))
the fitted random-scale model are displayed in Figs 11 and 12 respectively. In both cases, the fit
is excellent.
The probability that $T$ surpasses the value that was observed in spring 2011, i.e. $\Pr(T > 510\ \text{mm})$,
can then be estimated from the predictive distribution, leading to a return period
of 430 years; the corresponding one-sided 95% credible interval is $[231, \infty)$. The predictive
distribution of the return period is displayed in Fig. 13. Thus although the heavy rain of 69.6 mm
that was recorded on April 26th, 2011, is not uncommon, as already mentioned in Section 3.4,
the total spring 2011 rainfall accumulation does qualify as a rare event according to the random-
scale model.
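For illustration, a return period and a return level can be read off a set of simulated spring totals as sketched below. This is a plain Monte Carlo frequency calculation with function names of our own choosing; it is not the Bayesian predictive estimate of the return period reported in the paper, which also accounts for parameter uncertainty.

```python
def return_period(samples, level):
    """Return period (years) of exceeding `level`, i.e. 1 / Pr(T > level)."""
    exceed = sum(1 for t in samples if t > level)
    if exceed == 0:
        return float("inf")      # level never exceeded in the simulation
    return len(samples) / exceed

def return_level(samples, period):
    """Empirical quantile exceeded on average once every `period` years."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(len(ordered) * (1.0 - 1.0 / period)))
    return ordered[index]

# Toy totals 1, 2, ..., 1000 (mm), used only to exercise the functions.
toy_totals = list(range(1, 1001))
toy_period = return_period(toy_totals, 990)
```

With simulated totals from algorithm 1, `return_period(totals, 510.0)` would give the plug-in analogue of the quantity discussed above.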
Spring 2011 was also atypical in that five clusters of high precipitation were recorded and
the total rain accumulation in these clusters was 318 mm. On the basis of the random-scale
model, the probability of observing five or more clusters in a given spring is $3.62 \times 10^{-2}$; the
corresponding Bayesian estimate of the return period is 33 years, which is not so high. However,
$\Pr(W > 318\ \text{mm}) \approx 3.13 \times 10^{-3}$, which corresponds to a return period of 302 years.
It would also have been possible to sample directly from the observed peak-to-sum ratios
$P_1, \ldots, P_n$ in algorithm 1 rather than from the fitted 1-inflated scaled beta distribution. Such a
bootstrapping approach would make good sense when a large data set is available.
In the present application, the parametric and non-parametric approaches lead to virtually the
same predictive distribution of the return period.
5. Comparisons with existing models
In this section, we briefly review existing approaches for the modelling of clusters of extreme
events and use the Burlington precipitation series to discuss their pros and cons with respect
to the random-scale model that is advocated here. We consider the M3–Dirichlet approach in
Section 5.1, the conditional exceedance model in Section 5.2 and a first-order Markov chain
model in Section 5.3.
5.1. The M3–Dirichlet model
Süveges and Davison (2012) studied a disastrous rainfall that occurred in coastal Venezuela in
December 1999. As for the Burlington precipitation data that are considered here, standard
extremal models failed to account for this catastrophe because clusters of heavy precipitation
were not appropriately accounted for. To model such clusters, Süveges and Davison (2012)
proposed to rely on the moving maximum process M3 due to Smith and Weissman (1996).
Recall that a univariate stationary time series $(Y_i : i \in \mathbb{Z})$ is said to be an M3-process if, for each
$i \in \mathbb{Z}$, we can write $Y_i = \max_{k \in \mathbb{Z}} \max_{l \in \mathbb{N}} a_{l,k} X_{l,i-k}$ in terms of mutually independent unit Fréchet
random variables $(X_{l,k} : l \in \mathbb{N}, k \in \mathbb{Z})$ and a so-called filter matrix $A = (a_{l,k} : l \in \mathbb{N}, k \in \mathbb{Z})$ of
non-negative constants summing to 1. It is typically assumed, as Süveges and Davison (2012) did, that
$a_{l,k} > 0$ only when $l \in \{1, \ldots, L\}$ and $k \in \{1, \ldots, K\}$ so that all profiles are of the same fixed length
$K$. When normalized by the sum of its components, i.e.
$(c_{l,1}, \ldots, c_{l,K}) = (a_{l,1}, \ldots, a_{l,K})/(a_{l,1} + \cdots + a_{l,K})$, the $l$th row of $A$ is referred to as
the signature of the $l$th cluster type.
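To make the definition concrete, the following Python sketch simulates a short stretch of an M3-process with a finite filter matrix and computes the signatures. The filter matrix is a hypothetical example with $L = 2$ and $K = 3$ whose entries sum to 1; it is not taken from the paper.

```python
import math
import random

def unit_frechet(rng):
    """Unit Frechet draw by inversion of F(x) = exp(-1/x)."""
    v = rng.random()
    while v == 0.0:              # avoid log(0)
        v = rng.random()
    return -1.0 / math.log(v)

def simulate_m3(filter_matrix, length, rng):
    """Return Y_1..Y_length with Y_i = max_l max_k a[l][k] * X[l][i-k]."""
    n_types, k_max = len(filter_matrix), len(filter_matrix[0])
    # Shocks X[l][j] are needed for j = 1-K, ..., length-1; the list index
    # is shifted by K so that it stays non-negative.
    shocks = [[unit_frechet(rng) for _ in range(length + k_max)]
              for _ in range(n_types)]
    return [max(filter_matrix[l][k - 1] * shocks[l][i - k + k_max]
                for l in range(n_types)
                for k in range(1, k_max + 1))
            for i in range(1, length + 1)]

def signatures(filter_matrix):
    """Rows of the filter matrix normalized to sum to 1."""
    return [[a / sum(row) for a in row] for row in filter_matrix]

# Hypothetical filter matrix (L = 2 cluster types, K = 3), entries sum to 1.
A_EXAMPLE = [[0.10, 0.30, 0.10],
             [0.05, 0.05, 0.40]]
series = simulate_m3(A_EXAMPLE, 500, random.Random(1999))
sigs = signatures(A_EXAMPLE)
```

Because the filter coefficients sum to 1, each simulated $Y_i$ is again unit Fréchet, and a very large shock $X_{l,j}$ imprints a scaled copy of the $l$th signature on $K$ consecutive values.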
Süveges and Davison (2012) argued that, when the threshold $u$ is sufficiently high, any cluster
$(Y_j : j \in C)$ of extremes, once normalized by the sum of its components, i.e.

$$W = (W_j : j \in C) = \frac{1}{\sum_{j \in C} Y_j}\, (Y_j : j \in C), \tag{7}$$

corresponds to a noisy version of one of the signatures. This intuition is rooted in a result of
Zhang and Smith (2004) stating that, if $(Y_i : i \in \mathbb{Z})$ is an M3-process, then, for each $l \in \{1, \ldots, L\}$,

$$\Pr\left\{ \frac{(Y_{t+1}, \ldots, Y_{t+K})}{Y_{t+1} + \cdots + Y_{t+K}} = (c_{l,1}, \ldots, c_{l,K}) \text{ infinitely often} \right\} = 1.$$

Therefore, Süveges and Davison (2012) proposed
(a) to normalize the series so that its marginals are approximately unit Fréchet;
(b) to identify clusters of extremes of a fixed length $K$ through an elaborate algorithm and
(c) to model the normalized cluster profiles $W$ with a finite Dirichlet mixture.
The number of mixing components is at least $L$ and an estimate of the filter matrix $A$ is then
obtained from the fitted Dirichlet parameters.
To apply the M3–Dirichlet model to the Burlington precipitation data, we considered three
thresholds set at the 95th, 97th and 98th centiles of precipitation (including the 0s) corresponding
to $u = 14.2$, 18.0 and 21.6 mm respectively. The last value of $u$ corresponds to the 95th centile of
non-zero precipitation that was used earlier. Three possible run lengths and five choices for
the number of components for the Dirichlet mixture were considered, namely $r \in \{1, 2, 3\}$ and
$m \in \{1, \ldots, 5\}$; however, $r = 1$ could not be used with $u = 14.2$ mm as it resulted in many overlaps
between clusters. To estimate the return period of the spring 2011 accumulation of 510 mm for
each different combination of $u$, $r$ and $m$, 100000 spring extreme rainfalls were simulated as
follows.
(a) First generate the total number of profiles of extreme precipitation from a Poisson
distribution whose intensity is the mean number of profiles observed for the combination of $u$,
$r$ and $m$ under consideration, e.g. 2.44 when $u = 18.0$ mm, $r = 3$ and $m = 1$.
(b) Next, when $m > 1$, generate the number of profiles in each mixture component from a
multinomial distribution whose parameters are the number of profiles per group divided
by the total number of observed profiles.
(c) Finally, for a profile in a given group, sample the total accumulation by drawing
independently the profile maximum from the fitted GPD and divide it by the maximum of the
Dirichlet vector $W$.
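Steps (a) to (c) can be sketched in Python for the single-component case $m = 1$, in which step (b) is skipped. The Poisson intensity 2.44, the threshold $u = 18.0$ mm, the GPD estimates and the Dirichlet parameters are the values quoted in this section for the best-fitting model; the function names and the sampling helpers are ours, and the sketch should be read as an illustration rather than the code that was actually used.

```python
import math
import random

U, SIGMA, XI = 18.0, 9.54, 1.64e-7            # threshold and GPD estimates
DIRICHLET = (0.427, 0.526, 5.246, 0.508, 0.460)
MEAN_PROFILES = 2.44                          # Poisson intensity per spring

def draw_poisson(mean, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def draw_gpd_excess(sigma, xi, rng):
    """Inverse-CDF draw from the generalized Pareto distribution."""
    v = 1.0 - rng.random()                    # uniform on (0, 1]
    if abs(xi) < 1e-12:
        return -sigma * math.log(v)
    return sigma * (v ** (-xi) - 1.0) / xi

def draw_dirichlet(alphas, rng):
    """Dirichlet draw obtained by normalizing independent gamma variables."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

def simulate_extreme_total(rng):
    """Total extreme accumulation (mm) for one simulated spring, m = 1."""
    n_profiles = draw_poisson(MEAN_PROFILES, rng)          # step (a)
    total = 0.0
    for _ in range(n_profiles):                            # step (c)
        peak = U + draw_gpd_excess(SIGMA, XI, rng)
        profile = draw_dirichlet(DIRICHLET, rng)
        total += peak / max(profile)                       # sum = max / max(W)
    return total

rng = random.Random(18)
extreme_totals = [simulate_extreme_total(rng) for _ in range(5000)]
```

Dividing the profile maximum by the largest Dirichlet component recovers the profile sum, since the largest component is the share of the sum carried by the peak day.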
As in the random-scale model, the total spring non-extreme accumulation was assumed
independent of the extreme accumulation and was modelled by using the normal distribution. The
Bayesian information criterion BIC and QQ-plots of the spring non-extreme, extreme and
total precipitation accumulations were used to select the best-fitting model, which had threshold
$u = 18.0$ mm, run length $r = 3$ and a single Dirichlet distribution (i.e. $m = 1$). The parameter
estimates of the GPD were $\hat{\sigma} = 9.54$ and $\hat{\xi} = 1.64 \times 10^{-7}$ and those of the Dirichlet distribution were
$(0.427, 0.526, 5.246, 0.508, 0.460)$.
The QQ-plot of the spring precipitation total of the best-fitting model is displayed in Fig. 11.
The fit looks good overall, as does the fit of the extreme subtotal that is displayed in Fig. 12.
However, the fit of the non-extreme subtotal shown in Fig. 10 is somewhat less satisfactory.
There is no evidence of dependence between these two subtotals; the test of independence due
to Genest et al. (2019) yielded a p-value of 0.328. The QQ-plot of the cluster sums in Fig. 14
exhibits an overestimation of the upper tail. This model leads to an estimated return period of
the 2011 observation which is smaller than with the random-scale model, namely 290 years.
Compared with the random-scale model, the M3–Dirichlet approach has the advantage of
modelling the entire normalized profile, thus allowing for inference about other quantities than
the cluster sum. In this application, however, it is precisely the normalized profile distribution
which is problematic. The fixed profile length $K$ ranged from 4 to 6 for the various combinations
of $u$, $r$ and $m$ considered; we found $K = 5$ for the best-fitting model ($u = 18.0$ mm, $r = 3$ and $m = 1$).
Some of the profiles included days without rain, which seems unreasonable. More importantly,
the marginal PP- and QQ-plots suggest that the Dirichlet distribution fits the normalized profiles
poorly. This problem occurred for all combinations of $u$, $r$ and $m$ that were considered. Moreover,
the return period estimates were rather unstable as a function of $u$, $r$ and $m$ with values ranging
from 40 to over 100000 years.
5.2. The conditional exceedance model
Following Keef et al. (2009) and Winter and Tawn (2016), one could also adapt the conditional
exceedance model of Heffernan and Tawn (2004) to account for clusters of extreme values
in the Burlington series. Given that a daily precipitation $Y_i$ exceeds some threshold $u$, this
approach provides a convenient semiparametric model for $Y_{i+1}, \ldots, Y_{i+\tau}$, where $\tau$ is the lag
beyond which observations can be deemed independent of $Y_i$. Because independence appears to
hold at any lag $\tau > 1$ for the thresholds $u \in \{14.2, 18.0, 21.6\}$ that were considered in Section 5.1,
we chose $\tau = 1$. On transformation to Laplace margins, the model boils down to assuming that
$\Pr\{Y_i - u > x, (Y_{i+1} - aY_i)/Y_i^b \le z \mid Y_i > u\} \approx \exp(-x)\, G(z)$ for some non-degenerate distribution
$G$ which can be estimated either non-parametrically (Keef, Papastathopoulos and Tawn, 2013;
Keef, Tawn and Lamb, 2013) or via a Bayesian semiparametric procedure (Lugrin et al., 2016).
Both estimation approaches were used and, in each case, the return period for the spring
2011 event was estimated by using 100000 samples of total spring accumulations. To do this,
the total extreme and non-extreme accumulations were assumed independent and the latter
was taken to be normal. The total number of clusters of extreme precipitation in spring
$k \in \{1, \ldots, 100000\}$ was then drawn from the Poisson distribution and each cluster of extreme
precipitation was simulated by using the method of Rootzén (1998), i.e. $Y_i - u$ was first generated
Fig. 14. QQ-plots of the cluster sums (axes: observed and simulated cluster accumulations (mm)):
(a) random-scale model; (b) M3–Dirichlet model; (c) conditional exceedance model fitted by using
constrained maximum likelihood; (d) conditional exceedance model fitted by using the semiparametric
Bayesian method; (e) first-order Markov chain model with asymptotic independence; (f) first-order
Markov chain model with asymptotic dependence
and the conditional exceedance model was then used to simulate the following day, and so forth,
until an observation dropped below u.
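The forward simulation of a cluster under the conditional exceedance model with lag $\tau = 1$ can be sketched in Python as follows. The sketch works on the Laplace scale only, takes the threshold as the 0.98 Laplace quantile (the centile to which $u = 21.6$ mm corresponds in Section 5.1) and uses the constrained likelihood estimates $(a, b) = (0.0129, -0.895)$ reported for that threshold. The residual sample standing in for $G$ is hypothetical (standard normal draws), and the back-transformation to millimetres is omitted.

```python
import math
import random

def simulate_cluster(a, b, threshold, residuals, rng, max_len=1000):
    """Simulate one cluster on the Laplace scale.

    The first value exceeds the threshold by a unit exponential amount;
    each following value is a * y + y**b * z with z resampled from the
    residuals, and the cluster ends when a value drops below the threshold.
    """
    current = threshold + rng.expovariate(1.0)
    cluster = []
    while current >= threshold and len(cluster) < max_len:
        cluster.append(current)
        z = rng.choice(residuals)            # draw from the empirical G
        current = a * current + current ** b * z
    return cluster

rng = random.Random(2004)
# Laplace quantile at probability 0.98: -log(2 * (1 - 0.98)), about 3.22.
THRESHOLD = -math.log(2.0 * (1.0 - 0.98))
# Hypothetical residual sample; in practice these come from the fitted model.
RESIDUALS = [rng.gauss(0.0, 1.0) for _ in range(500)]
clusters = [simulate_cluster(0.0129, -0.895, THRESHOLD, RESIDUALS, rng)
            for _ in range(2000)]
```

With values of $a$ close to 0 and negative $b$, the day following an exceedance rarely stays above the threshold, so most simulated clusters have length 1.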
Based on the constrained likelihood approach of Keef, Papastathopoulos and Tawn (2013),
the estimate of $(a, b)$ was $(-0.0286, 0.123)$ when $u = 14.2$, $(-0.00285, -0.406)$ when $u = 18$ and
$(0.0129, -0.895)$ when $u = 21.6$. The best fit of the extreme precipitation totals was obtained
when $u = 21.6$ mm, leading to a return period of 2631.6 years. The QQ-plot of the spring
precipitation totals displayed in Fig. 11 looks decent, but the QQ-plot of the extreme spring
accumulation in Fig. 12 reveals that the fit in the upper tail is rather poor; the underestimation
of the upper tail is worse at lower thresholds. The situation is much improved when the Bayesian
semiparametric method of Lugrin et al. (2016) is used. The optimal choice of threshold here
is again $u = 21.6$ mm. The medians of the posterior samples of $a$ and $b$ were $-0.00406$ and
0.00463 respectively, hinting at asymptotic independence, and the estimated return period is
1481.6 years. The QQ-plots of the total and total extreme spring accumulation are displayed
in Figs 11 and 12 respectively. In this case, both plots look fine. From Fig. 14, the fit of the
cluster sums is particularly good but this is in fact largely due to the excellent fit of the GPD
for the cluster maximum because over 95% of clusters are of length 1 or 2 and, in most cases,
the second day contains only traces of precipitation. For both the constrained likelihood and
the semiparametric Bayesian approach, the normal distribution fits the non-extreme spring
accumulations very well (see Fig. 10) and there is no reason to suspect that the extreme and
non-extreme precipitation totals are dependent; the test of independence that is described in
Genest et al. (2019) yielded a p-value of 0.688 when the Bayesian semiparametric method or
the constrained likelihood approach was used.
In conclusion, the conditional exceedance model fitted by using the semiparametric Bayesian
method is good at capturing precipitation accumulation, but the estimated return period for
the 2011 event is much higher than with the random-scale model. As with the M3–Dirichlet
approach, the conditional exceedance model can be used to perform inference on other quantities
than just the cluster sum, but it is more complex than the approach that is presented here.
5.3. First-order Markov chain model
Given that the first-order Markov assumption corresponding to a lag $\tau = 1$ in the conditional
exceedance model is reasonable, we also considered the first-order Markov chain model of Smith
et al. (1997) with the asymmetric logistic distribution, and its extension due to Ramos and
Ledford (2009) that incorporates cases of asymptotic independence and uses a modified version of
the asymmetric logistic dependence structure. We tried the same thresholds $u \in \{14.2, 18.0, 21.6\}$
as in the previous subsections but found that, in both cases, $u = 21.6$ mm was again the best
choice. As before, we assumed that the non-extreme precipitation totals are normal and
independent of extreme precipitation totals. The hypothesis of independence was not rejected by
using the test of Genest et al. (2019); p-values of 0.228 and 0.992 were found for the asymptotic
dependence and independence models respectively.
The parameter estimates in the model of Ramos and Ledford (2009) were $\hat{\eta} = 0.999$, $\hat{\rho} = 2.32$
and $\hat{\alpha} = 1$. Because $\hat{\eta}$ is close to 1, this model hints at asymptotic dependence but it is obvious from
the QQ-plots in Figs 11, 12 and 14 that this model gives a poor fit, particularly in the upper tail
which is badly underestimated. It is thus not surprising that it leads to a very large return period
of over 10000 years. The underestimation is worse at lower thresholds, because $\hat{\eta}$ decreases.
The asymptotic dependence model fits the data better as evidenced by Figs 11, 12 and 14,
although it does not perform as well as the random-scale model and the conditional exceedance
model when the semiparametric Bayesian method is used. The asymmetric logistic parameter
estimates are 0.942 for the dependence parameter and $(\hat{\theta}_1, \hat{\theta}_2) = (0.282, 0.999)$ for the
asymmetry parameters. This means that
there seems to be a very slight positive dependence, but samples from the asymmetric logistic
distribution with the estimated parameters are almost indistinguishable from independence.
As $u$ increases, the estimate of the dependence parameter decreases and hence the association
increases, which leads
to a better fit in the upper tail of both total and extreme total precipitation. The estimated
return period of the spring 2011 precipitation total is 1053 years, which is still much larger than
suggested by the random-scale model.
6. Conclusion
In this paper, precipitation recorded at Burlington, Vermont, was used to estimate the return
period of the spring 2011 Lake Champlain flood. This series contains several clusters of extreme
values, which need to be taken into account for flood risk estimation. For this, a simple extension
of the POT model, called the random-scale model, was proposed in which each cluster maximum
is scaled up by a random factor referred to as the peak-to-sum ratio. In this model, a GPD is used
for the excess of cluster maxima beyond a high threshold and the peak-to-sum ratios are taken
to follow a 1-inflated beta distribution. In principle, these two variables could be dependent, in
which case their association could be modelled by a copula. In the application that is considered
here, however, it could realistically be assumed that they are independent; this assumption also
seems theoretically sensible at high thresholds in various contexts, including when the underlying
series is regularly varying. Although the approach is tailored for precipitation data in this paper,
it could be used in other situations where cluster totals are of interest.
The random-scale model was seen to fit the Burlington precipitation data well. Through
Monte Carlo simulations, it led to a high, yet realistic, estimate of 430 years for the return period
of the spring 2011 accumulation of 510 mm. Assuming stationarity of the precipitation series,
the probability that such an event will occur again thus remains small. In fact, the estimated
100-year return level of a spring accumulation is 446 mm, which is 64 mm less than the value
that was observed in 2011. The estimate of the return period of the 2011 flood should help the
International Joint Commission on the Lake Champlain and the Richelieu River in identifying
the causes and effects of flooding, and in developing appropriate mitigation solutions and
recommendations.
In the context of the present data application, the random-scale model was compared with
other models that have been proposed in the literature (Table 2). Out of these, the conditional
Table 2. Summary performance of the models considered†
Model    Cluster sum S    Cluster accumulation W    Non-extreme Z    Total T    Estimated return period (years)
RAN-SCL ≈ 430
M3–Dirichlet Overestimates ≈×290
CE-ML ××≈×2632
CE-SB ≈≈1481.6
MC-IND ××Tail underestimates ×>10000
MC-DEP ×≈Tail underestimates ×1053
Plots Fig. 14 Fig. 12 Fig. 10 Fig. 11 —
†RAN-SCL, random-scale model; CE-SB, conditional exceedance model fitted with the semiparametric Bayesian
method; CE-ML, conditional exceedance model fitted with the restricted maximum likelihood method; MC-DEP,
first-order Markov chain with asymptotic dependence; MC-IND, first-order Markov chain with asymptotic
independence; , good; ≈, so-so; ×, bad.
exceedance model fitted with the semiparametric Bayesian approach proposed by Lugrin et al.
(2016) was the closest competitor. The random-scale model and conditional exceedance model
could both adequately capture the accumulations of a streak of large rainfall, although the
cluster definitions of both models are different. The benefits of the random-scale model are
its simplicity and the flexibility of the cluster definition that can be used. Here, we used a
cluster definition that is intuitive to hydrologists but the definition based on the runs method
could also be directly implemented. Possibly the main advantage of the conditional exceedance
model over the present approach is that it captures the entire cluster dynamics and can be
used to estimate other quantities than cluster sums such as cluster length or the probability of
consecutive threshold exceedances. If cluster summaries such as these were deemed useful for
assessing flood risk, the random-scale model would need to be extended, but it is not obvious
how this could be done.
In the future, it may be interesting to explore the effect of other variables such as snowpack,
and to take into account rainfall in the entire watershed, not just at Burlington. It also seems
that rainfalls in the area have intensified since 2011, and it may be worthwhile to investigate
whether this phenomenon is transient or whether it is a trend that may be attributed to climate
change or other factors.
Acknowledgements
Thanks are due to the National Centers for Environmental Information of the US National
Oceanic and Atmospheric Administration for freely providing the precipitation data that were
used in this study. Funding in partial support of this work was provided by the Canada Research
Chairs Program, the Natural Sciences and Engineering Research Council (grants RGPIN/2018–04481,
RGPIN/2016–04720 and RGPIN/06801–2015), the Canadian Statistical Sciences Institute
and the Fonds de recherche du Québec – Nature et technologies (grant 2015–PR–183236),
as well as by the Mitacs Elevate Program.
Appendix A
This appendix reports the results of a small-scale simulation study that was run to test the hypothesis of
independence between the cluster maximum $M$ and the peak-to-sum ratio $P = M/S$. For simplicity, clusters
of length 2 only were considered. Various combinations of marginal behaviour and extremal dependence
were investigated for the pair $(Y_1, Y_2)$ through the following scenarios:
(a) light-tailed margins and asymptotic independence,
(i) a bivariate normal distribution with correlation $\rho = 0.4$ and standard margins,
(ii) a bivariate normal distribution with correlation $\rho = 0.7$ and standard margins and
(iii) a Liouville distribution with gamma radius, i.e. the distribution of a pair
$(Y_1, Y_2) = R \times (U, 1 - U)$, where $U$ is uniform on $(0, 1)$ and independent of the gamma variable $R$,
whose shape and scale parameters were set to $\theta = 3$ and $\sigma = 1$ respectively;
(b) heavy-tailed margins and asymptotic independence, a distribution whose copula is Gaussian with
parameter $\rho = 0.4$ and whose margins are identical Pareto distributions with parameter $\kappa = 3$;
(c) heavy-tailed margins and asymptotic dependence, a Liouville distribution with unit Pareto radius,
i.e. as in scenario (a)(iii) but with Pareto variable $R$ having parameter $\kappa = 3$;
(d) independence between $M$ and $P$ holds by design, a max-norm symmetric distribution with gamma
radius, i.e. $(Y_1, Y_2) = R \times (U, V)/\max(U, V)$, where $U$ and $V$ are independent uniform random
variables on $(0, 1)$ which are independent of the radial variable $R$, chosen to be gamma with shape and
scale parameters $\theta = 3$ and $\sigma = 1$ respectively.
For details about why these distributions have the claimed tail behaviour, see Ledford and Tawn (1996)
and Belzile and Nešlehová (2017).
Fig. 15. (a) Histogram of $P_1, \ldots, P_m$ (axes: P and frequency) and (b) rank plot of the pairs
$(M_1, P_1), \ldots, (M_m, P_m)$ (axes: rank of M and rank of P) for one sample of size $n = 10^5$ under
scenario (b) when $m = 500$
Table 3. Percentage of rejection of the null hypothesis $H_0: D = \Pi$ based on 1000 independent
samples of size $n \in \{5000, 10^5\}$ from six distributions defined in the text

n        Distribution                     % of rejection for the following numbers of exceedances:
                                          2500     1000     500      100
5000     (a)(i) Gauss ($\rho = 0.4$)      100      99.5     48.8     7.8
         (a)(ii) Gauss ($\rho = 0.7$)     100      100      83.0     7.7
         (a)(iii) Gamma–Liouville         100      100      93.8     18.1
         (b) Gauss–Pareto                 100.0    100.0    100.0    98.7
         (c) Pareto–Liouville             6.9      5.4      4.1      3.9
         (d) Max-norm symmetric           3.7      4.7      4.4      4.0
100000   (a)(i) Gauss ($\rho = 0.4$)      92.6     27.4     7.8      4.3
         (a)(ii) Gauss ($\rho = 0.7$)     97.7     42.5     12.9     4.5
         (a)(iii) Gamma–Liouville         100      92.9     53.1     7.9
         (b) Gauss–Pareto                 100.0    100.0    100.0    94.8
         (c) Pareto–Liouville             5.6      5.7      3.1      4.5
         (d) Max-norm symmetric           6.4      6.1      4.1      5.0
From each of these models, $N = 1000$ samples of size $n \in \{5000, 10^5\}$ were drawn. For each sample, four
thresholds were chosen as the quantiles that lead to the number of exceedances $m \in \{100, 500, 1000, 2500\}$.
For each set of exceedances, the pairs $(M_1, P_1), \ldots, (M_m, P_m)$ were computed and the independence
hypothesis $H_0: D = \Pi$ was tested at the 5% level by using the consistent rank-based Cramér–von Mises test
of independence from Genest and Rémillard (2004), implemented in the R package copula.
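One replicate of this experiment can be sketched in Python for scenario (d). Two points are assumptions of the sketch rather than details from the text: the $m$ exceedances are taken to be the $m$ pairs with the largest maximum, and Spearman's rank correlation is used as a crude stand-in for the Cramér–von Mises test, which is not reimplemented here.

```python
import random

def draw_max_norm_pair(rng, shape=3.0, scale=1.0):
    """Scenario (d): (Y1, Y2) = R * (U, V) / max(U, V) with gamma radius R."""
    r = rng.gammavariate(shape, scale)
    u, v = 1.0 - rng.random(), 1.0 - rng.random()   # uniform on (0, 1]
    top = max(u, v)
    return r * u / top, r * v / top

def exceedance_pairs(sample, m):
    """Return (M, P) for the m pairs with the largest maximum M."""
    pairs = [(max(y1, y2), max(y1, y2) / (y1 + y2)) for y1, y2 in sample]
    return sorted(pairs, reverse=True)[:m]

def ranks(values):
    """Ranks 1..n of the values (ties are not expected for continuous data)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out

def spearman_rho(pairs):
    """Spearman's rank correlation between the two coordinates."""
    rm = ranks([m for m, _ in pairs])
    rp = ranks([p for _, p in pairs])
    n = len(pairs)
    d2 = sum((i - j) ** 2 for i, j in zip(rm, rp))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

rng = random.Random(15)
sample = [draw_max_norm_pair(rng) for _ in range(5000)]
top_pairs = exceedance_pairs(sample, 500)
rho = spearman_rho(top_pairs)
```

Under scenario (d), $M = R$ and $P = \max(U, V)/(U + V)$, so the two coordinates are independent by construction and `rho` should be close to 0; for clusters of length 2, $P$ always lies in $[0.5, 1]$.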
Table 3 reports the percentages of rejection of $H_0$. As expected, $H_0$ is rejected in approximately 5% of
cases under scenario (d). Here, the distribution is constructed in a way that $M$ and $P$ are independent for
any threshold. Also, $H_0$ seems to hold under scenario (c) even for rather low thresholds when $n = 5000$.
This is because $(Y_1, Y_2)$ is regularly varying in this case. Interestingly, the independence assumption seems
plausible under scenario (a) if the threshold is sufficiently high. However, the meaning of ‘sufficiently high’
depends on the underlying distribution, and it may be that there are other light-tailed distributions with
asymptotic independence for which $H_0$ is not reasonable even at high thresholds.
Finally, under scenario (b), $H_0$ is rejected nearly always, even at very high thresholds when $n = 10^5$. To
illustrate, Fig. 15 displays the histogram of $P$ and the rank plot of the pairs $(M_1, P_1), \ldots, (M_m, P_m)$ when
$n = 10^5$ and $m = 500$. The lack of independence between $P$ and $M$ is clearly visible from the rank plot,
which also exhibits asymmetry and dependence in the upper tail. The dependence is due to the fact that,
when $M$ is large, $(Y_1, Y_2)$ tends to lie close to one of the axes because of asymptotic independence. When
this happens, $M \approx S$ and $P \approx 1$. A suitable dependence model in this case may thus be the asymmetric
Gumbel (or logistic) family.
References
Belzile, L. R. and Nešlehová, J. G. (2017) Extremal attractors of Liouville copulas. J. Multiv. Anal., 160, 68–92.
Coles, S. (2001) An Introduction to Statistical Modeling of Extreme Values. London: Springer.
Genest, C. and Favre, A.-C. (2007) Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Engng, 12, 347–368.
Genest, C. and Nešlehová, J. (2012) Copulas and copula models. In Encyclopedia of Environmetrics (eds A. H. El-Shaarawi and W. W. Piegorsch), 2nd edn, pp. 541–553. Chichester: Wiley.
Genest, C., Nešlehová, J. G., Rémillard, B. and Murphy, O. A. (2019) Testing independence in multivariate distributions. Biometrika, 106, 47–68.
Genest, C. and Rémillard, B. (2004) Tests of independence and randomness based on the empirical copula process. Test, 13, 335–369.
Heffernan, J. E. and Tawn, J. A. (2004) A conditional approach for multivariate extreme values (with discussion). J. R. Statist. Soc. B, 66, 497–546.
Hsing, T. (1987) On the characterization of certain point processes. Stoch. Processes Appl.,26, 297–316.
International Joint Commission (2013) The identification of measures to mitigate flooding and the impacts of
flooding of Lake Champlain and Richelieu River. Technical Report. International Joint Commission, Ottawa.
Jessen, A. H. and Mikosch, T. (2006) Regularly varying functions. Publ. Inst. Math. Beograd,80, 171–192.
Keef, C., Papastathopoulos, I. and Tawn, J. A. (2013) Estimation of the conditional distribution of a multivariate
variable given that one of its components is large: additional constraints for the Heffernan and Tawn model.
J. Multiv. Anal.,115, 396–404.
Keef, C., Svensson, C. and Tawn, J. A. (2009) Spatial dependence in extreme river flows and precipitation for
Great Britain. J. Hydrol.,378, 240–252.
Keef, C., Tawn, J. A. and Lamb, R. (2013) Estimating the probability of widespread flood events. Environmetrics,
24, 13–21.
Ledford, A. W. and Tawn, J. A. (1996) Statistics for near independence in multivariate extreme values. Biometrika,
83, 169–187.
Lugrin, T., Davison, A. C. and Tawn, J. A. (2016) Bayesian uncertainty management in temporal dependence of
extremes. Extremes, 19, 491–515.
Markovich, N. M. (2014) Modeling clusters of extreme values. Extremes, 17, 97–125.
Markovich, N. M. (2017) Clusters of extremes: modeling and examples. Extremes, 20, 519–538.
Morrison, M. and Tobias, F. (1965) Some statistical characteristics of a peak to average ratio. Technometrics, 7,
379–385.
Nelsen, R. B. (2006) An Introduction to Copulas, 2nd edn. New York: Springer.
Northrop, P. J. and Attalides, N. (2016) Posterior propriety in Bayesian extreme value analyses using reference
priors. Statist. Sin., 26, 721–743.
O’Brien, G. L. (1987) Extreme values for stationary and Markov sequences. Ann. Probab., 15, 281–291.
Priestley, M. B. and Subba Rao, T. (1969) A test for non-stationarity of time-series. J. R. Statist. Soc. B, 31,
140–149.
Ramos, A. and Ledford, A. (2009) A new class of models for bivariate joint tails. J. R. Statist. Soc. B, 71, 219–241.
Resnick, S. I. (1987) Extreme Values, Regular Variation and Point Processes. New York: Springer.
Riboust, P. and Brissette, F. (2016) Analysis of Lake Champlain/Richelieu River’s historical 2011 flood. Can. Wat.
Resour. J., 41, 174–185.
Rootzén, H. (1998) Maxima and exceedances of stationary Markov chains. Adv. Appl. Probab., 20, 371–390.
Smith, R. L., Tawn, J. A. and Coles, S. G. (1997) Markov chain models for threshold exceedances. Biometrika,
84, 249–268.
Smith, R. L. and Weissman, I. (1994) Estimating the extremal index. J. R. Statist. Soc. B, 56, 515–528.
Smith, R. L. and Weissman, I. (1996) Characterization and estimation of the multivariate extremal index. Technical
Report. University of North Carolina, Chapel Hill.
Süveges, M. and Davison, A. C. (2012) A case study of a “Dragon-King”: the 1999 Venezuelan catastrophe.
Eur. Phys. J. Specl Top., 205, 131–146.
Taleb, N. N. (2007) The Black Swan: the Impact of the Highly Improbable. New York: Random House.
Winter, H. C. and Tawn, J. A. (2016) Modelling heatwaves in central France: a case-study in extremal dependence.
Appl. Statist., 65, 345–365.
Zhang, Z. and Smith, R. L. (2004) The behavior of multivariate maxima of moving maxima processes. J. Appl.
Probab., 41, 1113–1123.