
# How extreme is extreme? An assessment of daily rainfall distribution tails


## Abstract

The upper part of a probability distribution, usually known as the tail, governs both the magnitude and the frequency of extreme events. The tail behaviour of all probability distributions may be, loosely speaking, categorized into two families: heavy-tailed and light-tailed distributions, with the latter generating "milder" and less frequent extremes compared to the former. This emphasizes how important for hydrological design it is to assess the tail behaviour correctly. Traditionally, the wet-day daily rainfall has been described by light-tailed distributions like the Gamma distribution, although heavier-tailed distributions have also been proposed and used, e.g., the Lognormal, the Pareto, the Kappa, and other distributions. Here we investigate the distribution tails for daily rainfall by comparing the upper part of empirical distributions of thousands of records with four common theoretical tails: those of the Pareto, Lognormal, Weibull and Gamma distributions. Specifically, we use 15 029 daily rainfall records from around the world with record lengths from 50 to 172 yr. The analysis shows that heavier-tailed distributions are in better agreement with the observed rainfall extremes than the more often used lighter tailed distributions. This result has clear implications on extreme event modelling and engineering design.
Hydrol. Earth Syst. Sci., 17, 851–862, 2013
www.hydrol-earth-syst-sci.net/17/851/2013/
doi:10.5194/hess-17-851-2013
S. M. Papalexiou, D. Koutsoyiannis, and C. Makropoulos
Department of Water Resources, Faculty of Civil Engineering, National Technical University of Athens,
Heroon Polytechneiou 5, 157 80 Zographou, Greece
Correspondence to: S. M. Papalexiou (smp@itia.ntua.gr)
Received: 6 April 2012 – Published in Hydrol. Earth Syst. Sci. Discuss.: 2 May 2012
Revised: 6 February 2013 – Accepted: 6 February 2013 – Published: 28 February 2013
1 Introduction
Heavy rainfall may induce serious infrastructure failures and
may even result in loss of human lives. It is common then
to characterize such rainfall with adjectives like “abnormal”,
“rare” or “extreme”. But what can be considered “extreme”
rainfall? Behind any discussion on the subjective nature of
such pronouncements, there lies the fundamental issue of in-
frastructure design, and the crucial question of the threshold
beyond which events need not be taken into account as they
are considered too rare for practical purposes. This question
is all the more pertinent in view of the EU Flooding Direc-
tive’s requirement to consider “extreme (ﬂood) event scenar-
ios” (European Commission, 2007).
Although short-term prediction of rainfall is possible to a
degree (and useful for operational purposes), long-term pre-
diction, on which infrastructure design is based, is infeasible
in deterministic terms. We thus treat rainfall in a probabilistic
manner, i.e., we consider rainfall as a random variable (RV)
governed by a distribution law. Such a distribution law en-
ables us to assign a return period to any rainfall amount, so
that we can then reasonably argue that a rainfall event, e.g.,
with return period 1000 yr or more, is indeed an extreme. Yet,
which distribution law we should choose is still a matter of
debate.
The typical procedure for selecting a distribution law for
rainfall is to (a) try some of many, a priori chosen, parametric
families of distributions, (b) estimate the parameters accord-
ing to one of many existing ﬁtting methods, and (c) choose
the one best ﬁtted according to some metric or ﬁtting test.
Nevertheless, this procedure does not guarantee that the se-
lected distribution will model adequately the tail, which is
the upper part of the distribution that controls both the mag-
nitude and frequency of extreme events. On the contrary, as
only a very small portion of the empirical data belongs to the
tail (unless a very large sample is available), all ﬁtting meth-
ods will be “biased” against the tail, since the estimated ﬁt-
ting parameters will point towards the distribution that best
describes the largest portion of the data (by deﬁnition not
belonging to the tail). Clearly, an ill-ﬁtted tail may result
in serious errors in terms of extreme event modelling with
potentially severe consequences for hydrological design. For
example, in Fig. 1 where four different distributions are ﬁtted
to the empirical distribution tail, it can be observed that the
predicted magnitude of the 1000-yr event varies signiﬁcantly.
The distributions can be classiﬁed according to the asymp-
totic behaviour of their tail into two general classes: (a) the
subexponential class with tails tending to zero less rapidly
than an exponential tail (here the term “exponential tail” is
used to describe the tail of the exponential distribution), and
(b) the hyperexponential or the superexponential class, with
tails approaching zero more rapidly than an exponential tail
(Teugels, 1975; Klüppelberg, 1988, 1989). Mathematically, this “intuitive” definition of the subexponential class for a distribution function $F$ is expressed as

$$\lim_{x\to\infty} \frac{1 - F(x)}{\exp(-x/\beta)} = \infty \quad \forall\, \beta > 0, \tag{1}$$

while several equivalent mathematical conditions, in order to
classify a distribution as subexponential, have been proposed
(see, e.g., Embrechts et al., 1997; Goldie and Klüppelberg, 1998). Furthermore, this is not the only classification, as several others exist (see, e.g., El Adlouni et al., 2008, and references therein). In addition, many different terms have been
used in the literature to refer to tails “heavier” than the expo-
nential, e.g., “heavy tails”, “fat tails”, “thick tails”, or, “long
tails”, that may lead to some ambiguity: see for example the
various deﬁnitions that exist for the class of heavy-tailed dis-
tributions discussed by Werner and Upper (2004). Here, we
use the term “heavy tail” in an intuitive and general way, i.e.,
to refer to tails approaching zero less rapidly than an expo-
nential tail.
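
As a minimal numerical sketch of the limit in Eq. (1), the ratio of a candidate tail to an exponential tail can be tracked in log space as x grows. The Weibull tail exp(-(x/β)^γ) is used here only as a convenient test case; the function names and the parameter values (β = 1, exponential scale 2) are illustrative assumptions, not values from the study.

```python
def weibull_log_epf(x, beta, gamma):
    """Log of the Weibull exceedance probability, log exp(-(x/beta)**gamma)."""
    return -((x / beta) ** gamma)


def log_tail_ratio(x, beta, gamma, beta_exp):
    """Log of [Weibull tail at x] / [exponential tail exp(-x/beta_exp) at x].

    Working in log space avoids floating-point underflow for large x.
    """
    return weibull_log_epf(x, beta, gamma) + x / beta_exp


# Hypothetical parameter values, chosen only for illustration.
xs = (1e2, 1e3, 1e4)
heavy = [log_tail_ratio(x, beta=1.0, gamma=0.7, beta_exp=2.0) for x in xs]
light = [log_tail_ratio(x, beta=1.0, gamma=1.5, beta_exp=2.0) for x in xs]
print("shape 0.7 (ratio grows):  ", heavy)
print("shape 1.5 (ratio vanishes):", light)
```

With shape 0.7 the log-ratio keeps increasing, as the subexponential definition requires, while with shape 1.5 it decreases without bound. A finite range of x can only suggest the limiting behaviour, not prove it.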
The practical implication of a heavy tail is that it predicts
more frequent larger magnitude rainfall compared to light
tails. Hence, if heavy tails are more suitable for modelling
extreme events, the usual approach of adopting light-tailed
models (e.g., the Gamma distribution) and ﬁtting them on
the whole sample of empirical data would result in a signif-
icant underestimation of risk with potential implications for
human lives. However, there are signiﬁcant indications that
heavy tailed distributions may be more suitable. For exam-
ple, in a pioneering study Mielke (1973) proposed the use
of the Kappa distribution, a power-type distribution, to de-
scribe daily rainfall. Today there are large databases of rain-
fall records that allow us to investigate the appropriateness of
light or heavy tails for modelling extreme events. This is the
subject in which this paper aims to contribute.
2 The dataset
Fig. 1. Four different distribution tails fitted to an empirical tail (P, LN, W and G stand for the Pareto, the Lognormal, the Weibull and the Gamma distributions). A wrong choice may lead to severely underestimated or overestimated rainfall for large return periods.

The data used in this study are daily rainfall records from the Global Historical Climatology Network-Daily database (version 2.60, www.ncdc.noaa.gov/oa/climate/ghcn-daily), which includes over 40 000 stations worldwide. Many of the records, however, are too short, have many missing data, or contain data that are suspect in terms of quality (for details regarding the quality flags refer to the Network’s website above).
Thus, only records fulfilling the following criteria were selected for the analysis: (a) record length greater than or equal to 50 yr, (b) missing data less than 20 %, and (c) data assigned with “quality flags” less than 0.1 %. Among the several different quality flags assigned to measurements, we screened against two: values with quality flags “G” (failed gap check) or “X” (failed bounds check). These were used to flag suspiciously large values, i.e., a sample value that is orders of magnitude larger than the second largest value in the sample.
Whenever such a value existed in the records it was deleted
(this, however, occurred in only 594 records in total, and in
each of these records typically one or two values had to be
deleted). Screening with these criteria resulted in 15137 sta-
tions. The locations of these stations as well as their record
lengths can be seen in Fig. 2, while Table 1 presents some ba-
sic summary statistics of the nonzero daily rainfall of those
records.
We note that we did not ﬁll any missing values as we
deemed it meaningless for this study, focusing on extreme
rainfall, because any regression-type technique would under-
estimate the real values. Missing values only affect the effec-
tive record length and, given the relatively high lower limit of
record length we set (50yr, while much smaller records are
often used in hydrology, e.g., 10–30yr), the resulting prob-
lem was not serious. Additionally, the percentage 20% of
missing daily values refers to the worst case and is actually
much smaller in the majority of the records; thus, missing
values would not alter or modify the conclusions drawn.
Finally, we note that the statistical procedure we describe
next failed in a few records, for reasons of algorithmic con-
vergence or time limits. Excluding these records, the total
number of records where the analysis was applied is 15029.
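
The three screening criteria listed above can be sketched as a simple filter. This is only an illustration: the `RecordSummary` container, its field names and the example station identifiers are hypothetical and do not correspond to the GHCN-Daily file format.

```python
from dataclasses import dataclass


@dataclass
class RecordSummary:
    """Hypothetical per-station summary; field names are assumptions."""
    station_id: str
    length_years: float       # record length in years
    missing_fraction: float   # fraction of missing daily values
    flagged_fraction: float   # fraction of values carrying a quality flag


def passes_screening(rec: RecordSummary) -> bool:
    """Apply criteria (a)-(c): length >= 50 yr, missing < 20 %, flagged < 0.1 %."""
    return (
        rec.length_years >= 50
        and rec.missing_fraction < 0.20
        and rec.flagged_fraction < 0.001
    )


# Made-up example records.
records = [
    RecordSummary("STATION_A", 72, 0.05, 0.0002),
    RecordSummary("STATION_B", 35, 0.01, 0.0),      # too short
    RecordSummary("STATION_C", 90, 0.25, 0.0),      # too many gaps
]
selected = [r.station_id for r in records if passes_screening(r)]
print(selected)
```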
Fig. 2. Locations of the stations studied (a total of 15137 daily rainfall records with time series length greater than 50 yr). Note that there are
overlaps with points corresponding to high record lengths shadowing (being plotted in front of) points of lower record lengths.
3 Defining and fitting the tail
The marginal distribution of rainfall, particularly at small
time scales like the daily, belongs to the so-called mixed type
distributions, with a discrete part describing the probability
of zero rainfall, or the probability dry, and a continuous part
expressing the magnitude of the nonzero (wet-day) rainfall.
As suggested earlier, studying extreme rainfall requires fo-
cusing on the behaviour of the distribution’s right tail, which
governs the frequency and the magnitude of extremes.
Table 1. Some basic statistics of the 15 137 records of daily rainfall. Apart from probability dry (Pdry), these statistics are for the nonzero daily rainfall.

| | Pdry (%) | No. of nonzero values | Median (mm) | Mean (mm) | SD (mm) | Skew |
|---|---|---|---|---|---|---|
| min | 15.11 | 320 | 0.40 | 1.00 | 1.76 | 1.37 |
| Q5 | 53.92 | 2121 | 1.70 | 3.61 | 5.01 | 2.36 |
| Q25 | 68.55 | 4038 | 3.00 | 6.18 | 8.28 | 2.85 |
| Q50 (Median) | 76.35 | 5973 | 4.80 | 9.27 | 12.08 | 3.28 |
| Q75 | 83.65 | 8497 | 6.90 | 12.65 | 16.42 | 3.94 |
| Q95 | 91.36 | 13060 | 10.20 | 17.75 | 24.25 | 5.38 |
| max | 98.25 | 27867 | 25.70 | 83.96 | 158.02 | 26.31 |
| Mean | 75.13 | 6604 | 5.18 | 9.77 | 12.97 | 3.56 |
| SD | 11.46 | 3508 | 2.70 | 4.60 | 6.20 | 1.31 |
| Skew | 0.74 | 1.12 | 1.03 | 1.16 | 1.88 | 5.58 |

If we denote the rainfall with $X$, and the nonzero rainfall with $X \mid X > 0$, then the exceedence probability function (EPF; also known as survival function, complementary distribution function, or tail function) of the nonzero rainfall, using common notation, is defined as

$$P(X > x \mid X > 0) = \bar{F}_{X|X>0}(x) = 1 - F_{X|X>0}(x), \tag{2}$$

where $F_{X|X>0}(x)$ is any valid probability distribution function chosen to describe nonzero rainfall. It should be clear that the unconditional EPF is easily derived if the probability dry $p_0$ is known: $\bar{F}_X(x) = (1 - p_0)\,\bar{F}_{X|X>0}(x)$. Since we focus on the continuous part of the distribution, and more specifically on the right tail, from this point on, for notational simplicity we omit the subscript in $\bar{F}_{X|X>0}(x)$, denoting the conditional EPF simply as $\bar{F}(x)$. To avoid ambiguity due to the term “tail function” for EPF, we clarify that we use the term “tail” to refer only to the upper part of the EPF, i.e., the part that describes the extremes.
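
The relation between the conditional and the unconditional EPF can be sketched in a few lines. The conversion to a return period in years is an added assumption (one over the expected number of exceedances per year, with 365.25 days per year); it is a common convention but is not spelled out in this section, and the function names are illustrative.

```python
DAYS_PER_YEAR = 365.25  # average number of days in a year


def unconditional_epf(conditional_epf, p_dry):
    """P(X > x) from P(X > x | X > 0) and the probability dry p0."""
    return (1.0 - p_dry) * conditional_epf


def return_period_years(conditional_epf, p_dry, days_per_year=DAYS_PER_YEAR):
    """Assumed convention: T = 1 / (expected number of exceedances per year)."""
    return 1.0 / (days_per_year * unconditional_epf(conditional_epf, p_dry))


# Example with made-up inputs: a wet-day exceedance probability of 0.01
# at a station that is dry 75 % of the time.
p_uncond = unconditional_epf(0.01, 0.75)
period = return_period_years(0.01, 0.75)
print(p_uncond, period)
```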
At this point, however, we need to define what we consider as the upper part. A common practice is to set a lower threshold value $x_L$ (see, e.g., Cunnane, 1973; Tavares and Da Silva, 1983; Ben-Zvi, 2009) and study the behaviour for values greater than $x_L$. Yet, there is no universally accepted method to choose this lower value. A commonly accepted method (known as partial duration series method) is to determine the threshold indirectly based on the empirical distribution, in such a way that the number of values above the threshold equals the number of years $N$ of the record (see, e.g., Cunnane, 1973). The resulting series, defined in this way, is known in the literature as annual exceedance series and is a standard method for studying extremes in hydrology (see, e.g., Chow, 1964; Gupta, 2011).
This may look similar to another common method in which the $N$ annual maxima of the $N$ years are extracted and studied. However, the method of annual maxima, by selecting the maximum value of each year, may distort the tail behaviour (e.g., when the three largest daily values occur within a single year, it only takes into account the largest of them). For this reason, instead of studying the $N$ daily annual maxima, here we focus on the $N$ largest daily values of the record, assuming that these values are representative of the distribution’s tail and can provide information for its behaviour, better representing the exact tail of the parent distribution.
It is worth noting that a common method of studying se-
ries above a threshold value is based on the results obtained
by Balkema and de Haan (1974) and Pickands III (1975).
Loosely speaking, according to these results the conditional
distribution above the threshold converges to the General-
ized Pareto as the threshold tends to inﬁnity. The latter in-
cludes, as a special case, the Exponential distribution. We
note, though, that these results are asymptotic results, i.e.,
valid (or providing a good approximation) if this threshold
value tends to inﬁnity (or if it is very large). In the case
where the parent distribution is of power type or of expo-
nential type, the theory is applicable even for not so large
threshold values because the convergence of the tail is fast.
In other cases, e.g., Lognormal or Stretched Exponential dis-
tributions, the convergence is very slow. The same applies to
the classical extreme value theory (EVT), which predicts that
the distribution of maxima converges to one of the three ex-
treme value distributions. For some examples illustrating the
slow convergence to the asymptotic distributions of EVT (the
same philosophy applies for Balkema–de Haan–Pickands
theorem), see, e.g., Papalexiou and Koutsoyiannis (2013) and
Koutsoyiannis (2004a).
Given that each station has an $N$-year record of daily values and a total number $n$ of nonzero values, we define the empirical EPF $\bar{F}_N(x_i)$, conditional on nonzero rainfall, as the empirical probability of exceedence (according to the Weibull plotting position):

$$\bar{F}_N(x_i) = 1 - \frac{r(x_i)}{n+1}, \tag{3}$$

where $r(x_i)$ is the rank of the value $x_i$, i.e., the position of $x_i$ in the ordered sample $x_{(1)} \le \dots \le x_{(n)}$ of the nonzero values. Thus, the empirical tail is determined by the $N$ largest nonzero rainfall values of $\bar{F}_N(x_{(i)})$ with $n - N + 1 \le i \le n$ (note that $x_L = x_{(n-N+1)}$). Some basic summary statistics of the series of the $N$ largest nonzero rainfall values are presented in Table 2.
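
The construction of the empirical tail from Eq. (3) can be sketched as follows. The function name and the toy input are illustrative assumptions; tied values simply receive consecutive ranks in sorted order.

```python
def empirical_tail(nonzero_values, n_years):
    """Return the N largest nonzero values with their empirical exceedance
    probabilities 1 - r/(n + 1) (Weibull plotting position), in ascending order.
    """
    xs = sorted(nonzero_values)
    n = len(xs)
    if not 0 < n_years <= n:
        raise ValueError("n_years must be between 1 and the sample size")
    tail = []
    # Ranks n-N+1, ..., n correspond to the N largest values.
    for rank in range(n - n_years + 1, n + 1):
        tail.append((xs[rank - 1], 1.0 - rank / (n + 1)))
    return tail


# Toy sample of nine "wet-day" values and a 3-year record (made-up numbers).
sample = [4.0, 9.0, 1.0, 7.0, 3.0, 8.0, 2.0, 6.0, 5.0]
tail = empirical_tail(sample, n_years=3)
threshold_x_l = tail[0][0]  # x_L = x_(n-N+1)
print(tail, threshold_x_l)
```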
Obviously the number of nonzero daily rainfall values is $n = (1 - p_0)\, n_d N$, where $n_d = 365.25$ is the average number of days in a year. According to the Weibull plotting position given in Eq. (3), the exceedence probability $\bar{p}(x_L)$ of $x_L$ will be

$$\bar{p}(x_L) = 1 - \frac{n - N + 1}{n + 1} = \frac{N}{(1 - p_0)\, n_d N + 1} \approx \frac{1}{(1 - p_0)\, n_d}. \tag{4}$$

This shows that the exceedence probability of the threshold $x_L$ depends only on the probability dry $p_0$. Interestingly, the average $p_0$ of the records analysed in this study is approximately 0.75, which implies that the exceedence probability of $x_L$ is on average as low as 0.01, while even for $p_0 = 0.95$ its value is 0.055. We deem that values above this threshold can be assumed to belong to the tail of the distribution. We note that there are studies (see e.g., Beguería et al., 2009)
in which the threshold value was chosen to correspond to
the 90th percentile, a value much smaller than the one cor-
responding to our choice of threshold. In Sect. 6 we discuss
further the selection of the threshold, also in comparison with
different methods of selection.
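
The two numerical values quoted above follow directly from Eq. (4). A small sketch, assuming a 50-yr record for the exact expression (the function name is illustrative):

```python
def threshold_exceedance_probability(p_dry, n_years, days_per_year=365.25):
    """Exceedance probability of the threshold x_L.

    Returns (exact, approximate): N / (n + 1) with n = (1 - p0) * n_d * N,
    and the approximation 1 / ((1 - p0) * n_d).
    """
    n = (1.0 - p_dry) * days_per_year * n_years
    exact = n_years / (n + 1.0)
    approx = 1.0 / ((1.0 - p_dry) * days_per_year)
    return exact, approx


exact_075, approx_075 = threshold_exceedance_probability(0.75, n_years=50)
exact_095, approx_095 = threshold_exceedance_probability(0.95, n_years=50)
print(round(approx_075, 3), round(approx_095, 3))
```

The exact and approximate values differ only marginally for record lengths of 50 yr or more, which is why the threshold probability is effectively a function of the probability dry alone.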
The fitting method we follow here is straightforward, i.e., we directly fit and compare the performance of different theoretical distribution tails to the empirical tails estimated from the daily rainfall records previously described. The theoretical tails are fitted to the empirical ones by minimizing numerically a modified mean square error (MSE) norm N1 defined as

$$\mathrm{N1} = \frac{1}{N} \sum_{i=n-N+1}^{n} \left( \frac{\bar{F}(x_{(i)})}{\bar{F}_N(x_{(i)})} - 1 \right)^2. \tag{5}$$

A complete verification of the method and a comparison with other norms is presented in Sect. 6. Here we only note that its rationale (and advantage over classical square error norms) is that it properly “weights” each point that contributes in the sum. Namely, it considers the relative error between the theoretical and the empirical values rather than using the $x$ values themselves. For example, if we consider the classical square error, i.e., $(x_i - x_u)^2$, with $x_u$ denoting the quantile value for probability $u$ equal to the empirical probability of the value $x_i$, then large values would contribute much more to the total error than the smaller ones. This may be a problem especially for rainfall records where the values usually differ more than one order of magnitude, e.g., from 0.1 mm to more than 100 mm. Obviously, the best fitted tail for a specific record is considered to be the one with the smallest MSE.
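
The norm of Eq. (5) and its minimization can be sketched as below for a Weibull tail. The text does not specify the numerical optimizer used, so a coarse grid search stands in for it here purely for illustration; the synthetic tail, the grid values and all function names are assumptions.

```python
import math


def weibull_epf(x, beta, gamma):
    """Weibull exceedance probability exp(-(x/beta)**gamma)."""
    return math.exp(-((x / beta) ** gamma))


def norm_n1(epf, tail):
    """Eq. (5): mean squared relative error between theoretical and empirical
    exceedance probabilities. `tail` is a list of (x, empirical_probability)."""
    return sum((epf(x) / p_emp - 1.0) ** 2 for x, p_emp in tail) / len(tail)


def fit_weibull_tail_grid(tail, betas, gammas):
    """Crude stand-in for a numerical minimizer: evaluate N1 on a grid."""
    best = None
    for beta in betas:
        for gamma in gammas:
            err = norm_n1(lambda x: weibull_epf(x, beta, gamma), tail)
            if best is None or err < best[0]:
                best = (err, beta, gamma)
    return best


# Synthetic tail: n = 5000 wet days, N = 50 years, so the empirical
# probabilities are 50/5001, ..., 1/5001; x values are exact Weibull
# quantiles for the made-up parameters beta = 6, gamma = 0.7.
true_beta, true_gamma = 6.0, 0.7
probs = [k / 5001 for k in range(50, 0, -1)]
tail = [(true_beta * (-math.log(p)) ** (1.0 / true_gamma), p) for p in probs]

best_err, best_beta, best_gamma = fit_weibull_tail_grid(
    tail, betas=[4.0, 5.0, 6.0, 7.0, 8.0], gammas=[0.5, 0.6, 0.7, 0.8, 0.9]
)
print(best_err, best_beta, best_gamma)
```

Because every term is a ratio of probabilities, a point deep in the tail counts as much as a point near the threshold. In practice a continuous optimizer would replace the grid.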
The proposed approach, which fits the theoretical distribution only to the largest points of each dataset, ensures that the fitted distribution provides the best possible description of the tail and is not affected by lower values. As an example of the fitting method, Fig. 3 depicts the Weibull distribution fitted to an empirical sample (the station was randomly selected and has code IN00121070) by minimizing the norm given by Eq. (5) in two ways: (a) in all the points of the empirical distribution, and (b) in only the largest $N$ points. It is clear that the first approach (dashed line) does not adequately describe the tail.

Fig. 3. Explanatory diagram of the fitting approach followed. The dashed line depicts a Weibull distribution fitted to the whole empirical distribution points, while the solid red line depicts the distribution fitted only to the tail points.
It is well known that several other methods have been ex-
tensively used to estimate the parameters of candidate dis-
tributions, e.g., the lognormal maximum likelihood and the
log-probability plot regression (Kroll and Stedinger, 1996),
and more recently the log partial probability weighted mo-
ments and the partial L-moments (Wang, 1996; Bhattarai,
2004; Moisello, 2007). Yet, the advantage of the proposed
method is that any tail can be ﬁtted in the same manner and
can be directly compared with other ﬁtted tails since the re-
sulting MSE value can clearly indicate the best ﬁtted; in the
aforementioned methods an additional measure has to be es-
timated in order to compare the performance of the ﬁtted dis-
tributions.
4 The fitted distribution tails
It is clear from the previous section that any tail can be fitted to the empirical ones. Nevertheless, in this study we fit and compare the performance of four different and common distribution tails, i.e., the tails of the Pareto type II (PII), the Lognormal (LN), the Weibull (W), and the Gamma (G) distributions. These distributions were chosen for their simplicity, popularity, as well as for being tail-equivalent (or for having similar asymptotic behaviour) with many other more complicated distributions. Recall that two distribution functions $F$ and $G$ with support unbounded to the right are called tail-equivalent if $\lim_{x\to\infty} \bar{F}(x)/\bar{G}(x) = c$ with $0 < c < \infty$.
Table 2. Some basic statistics of the 15 137 tail samples defined for an $N$-year record as the $N$ largest nonzero values.

| | No. of tail values | Median (mm) | Mean (mm) | SD (mm) | Max (mm) |
|---|---|---|---|---|---|
| min | 50 | 8.90 | 10.42 | 3.01 | 21.50 |
| Q5 | 52 | 28.30 | 31.71 | 8.61 | 68.60 |
| Q25 | 61 | 43.55 | 48.24 | 13.85 | 110.00 |
| Q50 (Median) | 70 | 62.75 | 69.12 | 19.01 | 152.40 |
| Q75 | 97 | 85.30 | 93.72 | 27.59 | 218.40 |
| Q95 | 122 | 130.30 | 144.70 | 47.48 | 357.60 |
| max | 172 | 977.00 | 1041.02 | 395.96 | 1750.00 |
| Mean | 79 | 68.78 | 76.01 | 22.50 | 175.06 |
| SD | 23 | 34.84 | 38.20 | 13.21 | 93.42 |
| Skew | 0.80 | 2.73 | 2.58 | 3.55 | 1.79 |

The Pareto and the Lognormal distributions belong to the subexponential class and are considered heavy-tailed distributions; the Weibull can belong to either class, depending on the value of its shape parameter, while the Gamma distribution has essentially an exponential tail but not precisely (see below). From a practical point of view, the ordering of these distributions, from heavier to lighter tail, is Pareto, Lognormal, Weibull with shape parameter < 1, Gamma and Weibull with shape parameter > 1 (see, e.g., El Adlouni et al., 2008). Note that Pareto is the only power-type distribution while the other three are of exponential form.
Specifically, the Pareto type II distribution is the simplest power-type distribution defined in $[0, \infty)$. Its probability density function (PDF) and EPF are given, respectively, by

$$f_{\mathrm{PII}}(x) = \frac{1}{\beta} \left( 1 + \gamma \frac{x}{\beta} \right)^{-\frac{1}{\gamma} - 1} \tag{6}$$

$$\bar{F}_{\mathrm{PII}}(x) = \left( 1 + \gamma \frac{x}{\beta} \right)^{-\frac{1}{\gamma}}, \tag{7}$$

and it is defined by the scale parameter $\beta > 0$ and the shape parameter $\gamma \ge 0$ that controls the asymptotic behaviour of the tail. Namely, as the value of $\gamma$ increases, the tail becomes heavier and consequently extreme values occur more frequently. For $\gamma = 0$ it degenerates to the exponential tail while for $\gamma \ge 0.5$ the distribution has infinite variance. Many other power-type distributions are tail-equivalent, i.e., exhibiting asymptotic behaviour similar to $x^{-1/\gamma}$, with the Pareto type II tail, e.g., the Burr type XII (Burr, 1942; Tadikamalla, 1980), the two- and three-parameter Kappa (Mielke, 1973), the Log-Logistic (e.g., Ahmad et al., 1988) and the Generalized Beta of the second kind (Mielke Jr. and Johnson, 1974).
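
A short sketch of the Pareto type II EPF of Eq. (7), including the exponential limit for a vanishing shape parameter. The function name and the test values are illustrative; the comparison at 100 mm uses parameter values of the same order as those reported later in the paper, chosen here only as an example.

```python
import math


def pareto2_epf(x, beta, gamma):
    """Pareto type II exceedance probability (1 + gamma*x/beta)**(-1/gamma).

    For gamma == 0 the exponential tail exp(-x/beta) is returned, which is
    the limit of the expression as gamma -> 0.
    """
    if gamma == 0:
        return math.exp(-x / beta)
    # log1p keeps the computation accurate when gamma*x/beta is tiny.
    return math.exp(-math.log1p(gamma * x / beta) / gamma)


near_exponential = pareto2_epf(10.0, beta=5.0, gamma=1e-8)
exponential = pareto2_epf(10.0, beta=5.0, gamma=0.0)

# Same scale, different shape: the power-type tail assigns a much larger
# probability to a 100 mm exceedance than the exponential tail does.
heavy = pareto2_epf(100.0, beta=8.8, gamma=0.14)
light = pareto2_epf(100.0, beta=8.8, gamma=0.0)
print(near_exponential, exponential, heavy, light)
```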
Another very common distribution used in hydrology is the Lognormal with PDF and EPF, respectively,

$$f_{\mathrm{LN}}(x) = \frac{1}{\sqrt{\pi}\,\gamma x} \exp\left( -\ln^2 \left( \frac{x}{\beta} \right)^{1/\gamma} \right) \tag{8}$$

$$\bar{F}_{\mathrm{LN}}(x) = \frac{1}{2}\, \mathrm{erfc}\left( \ln \left( \frac{x}{\beta} \right)^{1/\gamma} \right) \tag{9}$$

where $\mathrm{erfc}(x) = 2\pi^{-1/2} \int_x^\infty e^{-t^2}\, \mathrm{d}t$. The distribution comprises the scale parameter $\beta > 0$ and the parameter $\gamma > 0$ that controls the shape and the behaviour of the tail. Lognormal is also considered a heavy-tailed distribution (it belongs to the subexponential family) and can approximate power-law distributions for a large portion of the distribution’s body (Mitzenmacher, 2004). Notice that the notation in Eqs. (8) and (9) differs from the common one and illustrates more clearly the kind of the two parameters (scale and shape).
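
Since the parameterization in Eqs. (8) and (9) differs from the usual one, a small check may help. Writing the argument of erfc as ln(x/β)/γ, the EPF coincides with the standard Lognormal survival function with μ = ln β and σ = γ/√2. This mapping is derived here from Eq. (9) rather than stated in the text; the function names and the numerical values are illustrative.

```python
import math
from statistics import NormalDist


def lognormal_epf(x, beta, gamma):
    """Eq. (9): 0.5 * erfc(ln((x/beta)**(1/gamma)))."""
    return 0.5 * math.erfc(math.log(x / beta) / gamma)


def lognormal_epf_standard(x, mu, sigma):
    """Usual parameterization: 1 - Phi((ln x - mu) / sigma)."""
    return 1.0 - NormalDist(mu, sigma).cdf(math.log(x))


beta, gamma = 9.46, 1.087          # example values only
mu, sigma = math.log(beta), gamma / math.sqrt(2.0)

p_paper = lognormal_epf(50.0, beta, gamma)
p_standard = lognormal_epf_standard(50.0, mu, sigma)
print(p_paper, p_standard)
```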
The Weibull distribution, which can be considered as a generalization of the exponential distribution, is another common model in hydrology (Heo et al., 2001a, b) and its PDF and EPF are given, respectively, by

$$f_{\mathrm{W}}(x) = \frac{\gamma}{\beta} \left( \frac{x}{\beta} \right)^{\gamma - 1} \exp\left( -\left( \frac{x}{\beta} \right)^{\gamma} \right) \tag{10}$$

$$\bar{F}_{\mathrm{W}}(x) = \exp\left( -\left( \frac{x}{\beta} \right)^{\gamma} \right). \tag{11}$$

The parameter $\beta > 0$ is a scale parameter, while the shape parameter $\gamma > 0$ governs also the tail’s asymptotic behaviour. For $\gamma < 1$ the distribution belongs to the subexponential family with a tail heavier than the exponential one, while for $\gamma > 1$ the distribution is characterized as hyperexponential with a tail thinner than the exponential. Many distributions can be assumed tail-equivalent with the Weibull for a specific value of the parameter $\gamma$, e.g., the Generalized Exponential, the Logistic and the Normal.
Finally, one of the most popular models for describing daily rainfall is the Gamma distribution (e.g., Buishand, 1978), which, like the Weibull distribution, belongs to the exponential family. Its PDF and EPF are given, respectively, by

$$f_{\mathrm{G}}(x) = \frac{1}{\beta\, \Gamma(\gamma)} \left( \frac{x}{\beta} \right)^{\gamma - 1} \exp\left( -\frac{x}{\beta} \right) \tag{12}$$

$$\bar{F}_{\mathrm{G}}(x) = \Gamma\left( \gamma, \frac{x}{\beta} \right) \Big/ \Gamma(\gamma) \tag{13}$$

with $\Gamma(s, x) = \int_x^\infty t^{s-1} e^{-t}\, \mathrm{d}t$ and $\Gamma(s) = \int_0^\infty t^{s-1} e^{-t}\, \mathrm{d}t$.

Generally, we can assume that the Gamma tail behaves similar to the exponential tail. Yet, this is only approximately correct as the Gamma distribution belongs to a class of distributions (denoted as $S(\gamma)$; see, e.g., Embrechts and Goldie, 1982; Klüppelberg, 1989; Alsmeyer and Sgibnev, 1998) that irrespective of its parameter values cannot be classified as subexponential, while it is not tail-equivalent with the exponential. This can be seen from the fact that $\lim_{x\to\infty} \bar{F}_{\mathrm{G}}(x)/\bar{G}(x)$ is 0 for $\beta < \beta_E$ and $\infty$ for $\beta > \beta_E$, where $\bar{G}(x) = \exp(-x/\beta_E)$ is the exponential tail. Yet it is noted that if compared with an exponential tail with $\beta = \beta_E$, then

$$\lim_{x\to\infty} \frac{\bar{F}_{\mathrm{G}}(x)}{\bar{G}(x)} = \begin{cases} 0, & 0 < \gamma < 1 \\ 1, & \gamma = 1 \\ \infty, & \gamma > 1. \end{cases} \tag{14}$$

Therefore, in this case and practically speaking, for $0 < \gamma < 1$ the Gamma distribution has a “slightly lighter” tail than the exponential tail as it decreases faster, while for $\gamma > 1$ it exhibits a “slightly heavier” tail as it decreases more slowly than the exponential tail.
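
The three cases of Eq. (14) can be recovered in one step from the standard large-argument behaviour of the upper incomplete gamma function. The following worked step is a sketch added for clarity, using only the definitions in Eqs. (12) and (13):

```latex
% Leading-order asymptotics of the upper incomplete gamma function:
%   \Gamma(s, z) \sim z^{s-1} e^{-z}  as  z \to \infty .
% With z = x/\beta and an exponential tail of the same scale, \bar{G}(x) = e^{-x/\beta}:
\frac{\bar{F}_{\mathrm{G}}(x)}{\bar{G}(x)}
  = \frac{\Gamma(\gamma, x/\beta)}{\Gamma(\gamma)\, e^{-x/\beta}}
  \sim \frac{1}{\Gamma(\gamma)} \left( \frac{x}{\beta} \right)^{\gamma - 1}
  \xrightarrow[x \to \infty]{}
  \begin{cases}
    0,      & 0 < \gamma < 1 \\
    1,      & \gamma = 1 \\
    \infty, & \gamma > 1 .
  \end{cases}
```

The ratio grows or decays only as a power of x, which is why the Gamma tail is described as "slightly" lighter or heavier than the exponential one.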
All four distributions we compare here, and consequently their tails, have similarities in their structure as all have two parameters and specifically one scale parameter and one shape parameter. Nevertheless, among the various distributions with the same parameter structure, inevitably some are more flexible than others. One way to quantify this flexibility is by comparing them in terms of various shape measures (e.g., skewness, kurtosis, etc.). For example, the feasible ranges of skewness for the Pareto, Lognormal, Weibull and Gamma are, respectively, $(2, \infty)$, $(0, \infty)$, $(-1.14, \infty)$ and $(0, \infty)$. Therefore, the Weibull distribution seems to be the most “flexible” distribution among them and the Pareto the least. Yet this argument is not valid when we focus on the tail because the general shape of the tail is basically similar and what differs is the rate at which the tail approaches zero.
5 Results and discussion
The basic statistical results from ﬁtting the four distribution
tails, following the methodology described, to the 15029
daily rainfall records are given in Table 3. In order to assess
which tail has the best ﬁt, the four tails were compared in
couples in terms of the resulting MSE, i.e., the tail with the
smaller MSE is considered better ﬁtted. As shown in Fig. 4,
the Pareto tail, when compared with the other three distribu-
tions, was better ﬁtted in about 60% of the stations. Interest-
ingly, the distribution with the heavier tail of each couple, in
all cases, was better ﬁtted in a higher percentage of the sta-
tions, which implies a rule of thumb of the type “the heavier,
the better”!
Table 3. Summary statistics from the fitting of the four distribution tails into the 15 029 tail-samples of daily rainfall (expressed in mm).

| | Pareto MSE | Pareto β | Pareto γ | Lognormal MSE | Lognormal β | Lognormal γ |
|---|---|---|---|---|---|---|
| Min | 0.002 | 0.42 | 0.001 | 0.002 | 1.22 | 0.531 |
| Mode | 0.011 | 7.54 | 0.134 | 0.012 | 8.78 | 1.060 |
| Mean | 0.017 | 8.80 | 0.140 | 0.018 | 9.46 | 1.087 |
| Median | 0.021 | 9.51 | 0.145 | 0.022 | 10.59 | 1.107 |
| Max | 0.336 | 54.79 | 0.797 | 0.322 | 76.74 | 2.284 |
| SD | 0.015 | 4.92 | 0.076 | 0.015 | 6.44 | 0.214 |
| Skew | 2.910 | 1.23 | 0.495 | 2.755 | 1.73 | 0.561 |

| | Weibull MSE | Weibull β | Weibull γ | Gamma MSE | Gamma β | Gamma γ |
|---|---|---|---|---|---|---|
| Min | 0.002 | 0.02 | 0.230 | 0.002 | 3.79 | 0.010 |
| Mode | 0.013 | 4.33 | 0.661 | 0.015 | 17.50 | 0.092 |
| Mean | 0.019 | 5.91 | 0.678 | 0.023 | 23.15 | 0.219 |
| Median | 0.022 | 6.88 | 0.692 | 0.032 | 28.18 | 0.294 |
| Max | 0.298 | 52.72 | 1.491 | 0.482 | 120.00 | 2.433 |
| SD | 0.015 | 4.69 | 0.139 | 0.034 | 17.30 | 0.269 |
| Skew | 2.151 | 1.82 | 0.668 | 4.377 | 1.65 | 2.567 |

Note: The mode was estimated from the empirical density function (histogram) after smoothing.

Another comparison revealing the overall performance of the fitted tails was based on their average rank. That is, the fitted tails in each record were ranked according to their MSE, i.e., the tail with the smallest MSE was ranked as 1 and the one with the largest as 4. Figure 5 depicts the average rank of each tail for all stations. Again, the Pareto performed best, while the most popular model for rainfall, the Gamma distribution, performed the worst. The percentages of each distribution tail that was best fitted are 30.7 % for Pareto, 29.8 % for Lognormal, 13.6 % for Weibull and 25.8 % for Gamma. Again, the Pareto distribution is best according to these percentages; interestingly, however, the Gamma distribution has a relatively high percentage, higher than the Weibull. This does not contradict the conclusion derived by the average rank. The explanation is that the Gamma distribution was ranked as best in some cases, but when it was not the best fitted, it was probably the worst fitted.
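
The pairwise and average-rank comparisons described above can be sketched as follows. The three records and their MSE values are made up solely to show how a tail can be best fitted in some records and still have the worst average rank; the function names are illustrative.

```python
from itertools import combinations


def average_ranks(mse_by_record):
    """Average rank of each tail over all records (1 = smallest MSE)."""
    names = list(mse_by_record[0])
    totals = {name: 0 for name in names}
    for mses in mse_by_record:
        ordered = sorted(names, key=lambda name: mses[name])
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    return {name: totals[name] / len(mse_by_record) for name in names}


def pairwise_win_percentages(mse_by_record):
    """For each pair (a, b): percentage of records where a has the smaller MSE."""
    names = list(mse_by_record[0])
    result = {}
    for a, b in combinations(names, 2):
        wins_a = sum(1 for mses in mse_by_record if mses[a] < mses[b])
        result[(a, b)] = 100.0 * wins_a / len(mse_by_record)
    return result


# Hypothetical MSE values for three records (not taken from the study).
mse_by_record = [
    {"Pareto": 0.010, "Lognormal": 0.012, "Weibull": 0.015, "Gamma": 0.030},
    {"Pareto": 0.012, "Lognormal": 0.011, "Weibull": 0.014, "Gamma": 0.025},
    {"Pareto": 0.020, "Lognormal": 0.018, "Weibull": 0.016, "Gamma": 0.009},
]
ranks = average_ranks(mse_by_record)
wins = pairwise_win_percentages(mse_by_record)
print(ranks)
print(wins)
```

In this toy example the Gamma tail is the best fit in one record out of three, yet it has the worst average rank because it is last everywhere else.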
Figure 6 depicts the empirical distributions of the shape
parameters of the ﬁtted tails. It is well-known that the most
probable values are the ones around the mode, which for the
Pareto shape parameter is 0.134. Interestingly, this value is
close to the one determined in a different context by Kout-
soyiannis (1999) using Hershﬁeld’s (1961) dataset. This im-
plies that power-type distributions, which asymptotically be-
have like the Pareto, will not have ﬁnite power moments of
order greater than 1/0.134 ≈ 7.5. Moreover, as the empirical
distribution of the Pareto shape parameter in Fig. 6 attests,
values around 0.2 are also common, implying non-existence
of moments greater than the ﬁfth order. We should thus bear
in mind that sample moments of that or higher order (some-
times appearing in research papers) may not exist. Regarding
the Weibull tail, the estimated mode of its shape parameter
is 0.661, implying a much heavier tail compared to the ex-
ponential one. Finally, it is worth noting that the estimated
mode of the Gamma shape parameter is as low as 0.092. The
shape parameter of the Gamma distribution controls mainly
the behaviour of the left tail, resulting in J- or bell-shaped
densities (loosely speaking, the right tail is dominated by the exponential function and thus behaves like an exponential tail). A value that low corresponds to an extraordinarily J-shaped density, which would be unrealistic for describing the whole distribution body of daily rainfall. In other words, a Gamma distribution fitted to the whole set of points would most probably underestimate the behaviour of extremes.

[Figure 4: bar chart of pairwise comparisons (Pareto vs. Lognormal, Pareto vs. Weibull, Pareto vs. Gamma, Lognormal vs. Weibull, Lognormal vs. Gamma, Weibull vs. Gamma); vertical axis: records better fitted (%), 0 to 100. Bar labels surviving in the extraction, in order: Pareto 60 %; Pareto 59 %, Weibull 41 %; Pareto 67 %, Gamma 33 %; Weibull 42 %; Lognormal 66 %, Gamma 34 %; Weibull 73 %, Gamma 27 %.]
Fig. 4. Comparison of the fitted tails in couples in terms of the resulting MSE. The heavier tail of each couple is better fitted to the empirical points in a higher percentage of the records.
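The moment bound quoted above follows from a short calculation, sketched here under the assumption that the survival function behaves asymptotically as a power law with exponent 1/γ (a Pareto-type tail). The symbols q, c and x_0 are introduced only for this illustration.

```latex
% Moment of order q of a non-negative variable, written via the survival function
E\left[X^{q}\right] = \int_{0}^{\infty} q\, x^{q-1}\, \bar{F}(x)\, \mathrm{d}x ,
\qquad
\bar{F}(x) \sim c\, x^{-1/\gamma} \quad (x \to \infty).

% For large x the integrand behaves like q c x^{q-1-1/\gamma}, hence
\int_{x_0}^{\infty} x^{\,q-1-1/\gamma}\, \mathrm{d}x < \infty
\iff q - \frac{1}{\gamma} < 0
\iff q < \frac{1}{\gamma}.

% Numerical values used in the text
\gamma = 0.134 \;\Rightarrow\; q < 7.46 , \qquad
\gamma = 0.2 \;\Rightarrow\; q < 5 .
```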
We searched for the existence of any geographical pat-
terns, potentially deﬁning climatic zones, in the best ﬁtted
tails, i.e., the existence of zones in the world where the ma-
jority of the records were better described by one of the stud-
ied distribution tails. The maps in Fig. 7, which depict the
locations of the stations where each distribution tail was best
ﬁtted, did not unveil any regular patterns in terms of the best
ﬁtted distribution but rather seem to follow a random varia-
tion.
Another way to investigate for geographical patterns, as
the previous map did not reveal any useful information, is
to study the ﬁtted tails grouped into two coarser groups: the
subexponential group and the exponential-hyperexponential
group. The former includes the Pareto, the Lognormal and
the Weibull with γ < 1 tails, while the latter includes the
Gamma and the Weibull with γ ≥ 1 tails. Among the 15029
records, subexponential tails were best ﬁtted in 10911 cases
or in 72.6% while exponential-hyperexponential tails were
best ﬁtted in 4118 or in 27.4%. Further, in order to get a
clearer picture instead of constructing maps with the loca-
tions where the ﬁrst-group or the second-group tails were
best ﬁtted, we studied the percentage of subexponential tails
that were best fitted in large regions. Specifically, we constructed
a grid covering the entire earth using a latitude
difference of 2.5° and a longitude difference of 5°. In
each grid cell we estimated the percentage of the best ﬁtted
subexponential tails simply by counting the number of the
best ﬁtted subexponential tails divided by the total number
of records within the cell. We present these percentages in
the form of a map in Fig. 8, using a colour scale as shown
in the map’s legend. The cells plotted in the map are those
containing at least two records, so that the calculation of percentages
has some meaning.
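The per-cell calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the choice of grid origin (90°S, 180°W) and the station coordinates are assumptions made for the example.

```python
import numpy as np

def subexp_percentage_grid(lat, lon, is_subexp, dlat=2.5, dlon=5.0, min_records=2):
    """Percentage of records per grid cell whose best fitted tail is subexponential.

    Returns a dict mapping (row, col) cell indices to percentages, keeping
    only cells that contain at least `min_records` records.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    is_subexp = np.asarray(is_subexp, dtype=bool)

    # Cell indices; the origin at 90°S, 180°W is an assumption of this sketch.
    rows = np.floor((lat + 90.0) / dlat).astype(int)
    cols = np.floor((lon + 180.0) / dlon).astype(int)

    counts, hits = {}, {}
    for cell, flag in zip(zip(rows.tolist(), cols.tolist()), is_subexp.tolist()):
        counts[cell] = counts.get(cell, 0) + 1
        hits[cell] = hits.get(cell, 0) + int(flag)

    return {cell: 100.0 * hits[cell] / n
            for cell, n in counts.items() if n >= min_records}

# Hypothetical stations: three fall in the same cell, two are isolated.
lat = [40.1, 41.0, 40.9, -33.5, 51.2]
lon = [22.3, 23.9, 21.0, 151.0, 0.1]
is_subexp = [True, True, False, True, False]
print(subexp_percentage_grid(lat, lon, is_subexp))
```

Cells with a single record are dropped, mirroring the two-record minimum used for the map.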
Fig. 5. Mean ranks of the tails for all records. The best-ﬁtted tail
is ranked as 1 while the worst-ﬁtted as 4. A lower average rank
indicates a better performance.
The map of Fig. 8 clearly shows that in the vast majority
of cells subexponential tails dominate (percentage>60 %).
Particularly, out of 532 cells having at least two records, 255
and 163 have percentages of subexponential tails 60–80%
and >80%, respectively. In contrast, in only 35 and 79 cells
are the percentage values in the ranges 0–40% and 40–60 %,
respectively.
6 Veriﬁcation of the ﬁtting method
The use of a different norm for ﬁtting the tail into the em-
pirical data could potentially modify the conclusions drawn.
Nevertheless, this argument is pointless in the sense that the
main concern should be the efﬁciency of the norm used, i.e.,
if it possesses desired properties, e.g., if it is unbiased and has
lower variance in comparison to other candidates. Usually,
the error is expressed in terms of random variable values,
e.g., rainfall values, and not in terms of probability. However,
a literature search did not reveal or verify that the commonly
used norms, e.g., the classical MSE norm, are better than the
norm N1 used here (see Eq. 5).
For this reason, we implemented a Monte Carlo scheme,
which actually replicates the method we followed, where we
evaluate the performance of the norm N1 and also compare
it with the more common norms N2 and N3 deﬁned as
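$$N2 = \frac{1}{N} \sum_{i=n-N+1}^{n} \left( \frac{x_u}{x_{(i)}} - 1 \right)^{2} \qquad (15)$$

$$N3 = \frac{1}{N} \sum_{i=n-N+1}^{n} \left( x_u - x_{(i)} \right)^{2} . \qquad (16)$$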
Here, $x_u = Q(u)$ is the value predicted by the quantile function $Q$ of the distribution under study for $u$ equal to the empirical probability of $x_{(i)}$ (the ith element of the sample ranked in ascending order) according to the Weibull plotting position. The norm N2 has the same rationale as the one we used but the error is estimated in terms of rainfall values, rather than in terms of probability, while the norm N3 is the classical and most commonly used MSE norm.

Fig. 6. Histograms of the shape parameters of the fitted tails.
The Monte Carlo scheme we performed can be summarized in the following steps: (a) we generated 1000 random samples from each one of the four distributions we studied, with sample size equal to 6600 values, which is approximately the average number of nonzero daily rainfall values per record; (b) we selected the scale and the shape parameter values to be approximately equal to the median values resulting from the analysis of the real-world dataset (see Table 3), so that the generated random samples are representative of the real data; and (c) we fitted each distribution to its corresponding random sample and estimated the parameters by applying our method for each one of the three norms, setting N equal to 80, which is approximately the average record length in years.
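The steps above can be sketched for a single distribution as follows. This is a hedged illustration rather than the authors' implementation, and it rests on several assumptions: the Pareto II survival function is taken as (1 + γx/β)^(−1/γ), the optimizer is SciPy's Nelder-Mead applied to log-parameters, only the norms N2 and N3 of Eqs. (15) and (16) are used (the definition of N1 lies outside this section), and 20 replications stand in for the 1000 of the study. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def pareto2_quantile(u, beta, gamma):
    """Quantile function, assuming survival function (1 + gamma*x/beta)**(-1/gamma)."""
    return beta / gamma * ((1.0 - u) ** (-gamma) - 1.0)

def tail_sample(x, N):
    """N largest values (ascending) and their Weibull plotting positions i/(n+1)."""
    n = x.size
    ranks = np.arange(n - N + 1, n + 1)          # i = n-N+1, ..., n
    return np.sort(x)[-N:], ranks / (n + 1.0)

def norm_n2(log_params, x_tail, u):
    """Eq. (15): mean squared relative error in terms of rainfall values."""
    beta, gamma = np.exp(log_params)             # log-parameters keep both positive
    return np.mean((pareto2_quantile(u, beta, gamma) / x_tail - 1.0) ** 2)

def norm_n3(log_params, x_tail, u):
    """Eq. (16): classical mean squared error."""
    beta, gamma = np.exp(log_params)
    return np.mean((pareto2_quantile(u, beta, gamma) - x_tail) ** 2)

def fit(norm, x_tail, u, start=(8.0, 0.2)):
    """Minimize the chosen norm; returns the estimated (beta, gamma)."""
    res = minimize(norm, np.log(start), args=(x_tail, u), method="Nelder-Mead")
    return np.exp(res.x)

rng = np.random.default_rng(0)
true_beta, true_gamma = 9.51, 0.145              # Pareto medians from Table 3
n, N, reps = 6600, 80, 20                        # 20 replications to keep the sketch fast

est = {"N2": [], "N3": []}
for _ in range(reps):
    # Inverse-transform sampling from the assumed Pareto II distribution.
    x = pareto2_quantile(rng.uniform(size=n), true_beta, true_gamma)
    x_tail, u = tail_sample(x, N)
    est["N2"].append(fit(norm_n2, x_tail, u))
    est["N3"].append(fit(norm_n3, x_tail, u))

for name, vals in est.items():
    vals = np.array(vals)
    print(name, "mean (beta, gamma):", vals.mean(axis=0), "sd:", vals.std(axis=0))
```

Comparing the spread of the estimates across replications with the true values gives the same kind of bias and variance picture as the box plots, although a faithful replication would need the N1 norm and all four distributions.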
The results are presented in Fig. 9. The whiskers of the
box plots express the 95% Monte Carlo conﬁdence interval
of the parameters while the dashed lines show the true param-
eter values. It is clear that the norm N1 we used results in al-
most unbiased estimation of the parameters while, especially
for the Pareto and the Lognormal distributions, it results in
markedly smaller variance compared to the classical norm
N3. The norm N2 seems to perform very well for the Pareto,
Lognormal and Weibull distributions (although somewhat bi-
ased) but the results are poor for the Gamma distribution.
The classical and most commonly used norm N3 is by far the worst in terms of bias except for the Gamma distribution, for which it performs as well as N1. In particular, for the subexponential distributions of this simulation, i.e., the Pareto, the Lognormal and the Weibull, the classical norm N3 fails to provide good results. This may point to a more general conclusion, i.e., that the classical MSE, which is inspired by properties of the normal distribution, is not a good choice for subexponential distributions. This needs to be further investigated; however, we deem that there is a rationale supporting the following conclusion: subexponential distributions can generate “extremely” extreme values compared to the main “body” of values, and thus, in the classical norm these values will contribute “extremely” to the total error, heavily affecting the fitting results.

Fig. 7. Geographical depiction of the 15 029 stations where the best fitted tail is (a) Pareto in 4621, (b) Lognormal in 4486, (c) Weibull in 2051, and (d) Gamma in 3871.

Fig. 8. Geographical variation of the percentage of best fitted subexponential tails in cells defined by a latitude difference of 2.5° and a longitude difference of 5°. In total, in 72.6 % of the 15 029 records analysed, the subexponential tails were the best fitted.

Fig. 9. Results of a Monte Carlo scheme implemented to evaluate the performance of the norm N1 used in fitting of tails in this study, in comparison to commonly used ones (N2, N3).
Another issue of potential concern for the validity of the conclusions drawn is the impact of the sample size, i.e., the number of the largest events N, or equivalently the threshold $x_L$, for which the four distribution tails are fitted. As mentioned before, we used the annual exceedance series, a standard method in hydrology in which N equals the number of the record’s years. Obviously, N can be defined in many different ways, either with reference to record length or as a fixed number for every record studied.
In order to assess the impact of the number of events on the performance of the four fitted distribution tails, we randomly selected 2000 records among the 15 029 analysed and fitted the four distribution tails using six different methods for defining N. The first method (M1) is the one we used for all above analyses, in which N equals the number of the record’s years. In the second (M2) and third (M3) methods we defined the threshold $x_L$ as the 90th and the 95th percentiles, respectively, so that N equals the number of events included in the upper 10 % and 5 %, respectively, of the nonzero values. Obviously, in these two methods N varies from record to record depending on the total number of nonzero values, and on average it equals 667 and 333 values for M2 and M3, respectively. In the remaining three methods (M4, M5 and M6) N is defined as a fixed number for every record, i.e., 50, 100 and 200 values, respectively.

Fig. 10. Performance results of the four fitted tails in 2000 randomly selected records using six different methods for selecting the sample size: (top panel) percentage of records in which each distribution tail was best fitted; (bottom panel) average ranks of the fitted tails (lower average rank indicates better performance).
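The six definitions of N can be written compactly, as in the following sketch. The function name and the synthetic record are hypothetical, and counting the values strictly above the percentile threshold is an assumption about how ties and interpolation are handled.

```python
import numpy as np

def tail_size(nonzero_values, record_years, method):
    """Number N of largest events used for tail fitting under methods M1 to M6."""
    x = np.asarray(nonzero_values, dtype=float)
    if method == "M1":                       # annual exceedance series
        return int(record_years)
    if method in ("M2", "M3"):               # threshold x_L at the 90th or 95th percentile
        x_L = np.percentile(x, 90.0 if method == "M2" else 95.0)
        return int(np.sum(x > x_L))
    fixed = {"M4": 50, "M5": 100, "M6": 200}  # fixed N for every record
    return fixed[method]

# Hypothetical record: 6600 nonzero daily values over 80 years.
rng = np.random.default_rng(1)
record = rng.exponential(scale=10.0, size=6600)
for m in ("M1", "M2", "M3", "M4", "M5", "M6"):
    print(m, tail_size(record, 80, m))
```

For a record of about 6600 nonzero values, M2 and M3 retain far more events than M1, which is the contrast the comparison in Fig. 10 is designed to probe.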
The performance results comparing the six methods are summarized in Fig. 10, which depicts (a) the percentage of cases in which each distribution was best fitted and (b) the average rank of each distribution tail. Again the Pareto II tail was best fitted in a higher percentage of records in all cases (M1–M6), with the percentage values varying in a narrow range. The results are essentially the same as those obtained from the analysis of the whole database. The only noticeable difference regards the method M2, in which the Weibull tail sometimes seems to “gain ground” over the Gamma and the Lognormal tails. In general it seems that the Weibull tail improves its performance as N increases. Thus, in M4 where N has the lowest value, i.e., 50 values, it performs the worst, while in M2 where N is maximum (667 values on average), it performs the best. The average rank, which is a better measure of the overall performance of the distribution tails, remains essentially the same for each distribution in all methods. An exception is observed again in M2, where the Weibull tail performs better than the Lognormal tail. Apart from this exception the general conclusion is again that the Pareto II performs the best, followed by the Lognormal and the Weibull tails, while the Gamma tail performs the worst in all cases.
7 Summary and conclusions
Daily rainfall records from 15029 stations are used to inves-
tigate the performance of four common tails that correspond
to the Pareto, the Weibull, the Lognormal and the Gamma
distributions. These theoretical tails were ﬁtted to the empir-
ical tails of the records and their ability to adequately capture
the behaviour of extreme events was quantiﬁed by comparing
the resulting MSE. The ranking from best to worst in terms
of their performance is (a) the Pareto, (b) the Lognormal,
(c) the Weibull, and (d) the Gamma distributions. The analysis
suggests that heavier-tailed distributions in general performed
better than their lighter-tailed counterparts. In particular,
in 72.6 % of the records subexponential tails were better
fitted, while the exponential-hyperexponential tails were better
fitted in only 27.4 %. It is instructive that the most popular
model used in practice, the Gamma distribution, performed
the worst, revealing that the use of this distribution under-
estimates in general the frequency and the magnitude of ex-
treme events. Nevertheless, we must not neglect the fact that
the Gamma distribution was the best ﬁtted in 25.8% of the
records.
Additionally, we note that heavy tails tend to be hidden
(see, e.g., Koutsoyiannis, 2004a, b; Papalexiou and Kout-
soyiannis, 2013), especially when the sample size is small.
Thus, we believe that even in the cases where the Gamma tail
performed well, the true underlying distribution tail may be
heavier. This leads to the recommendation that heavy-tailed
distributions are preferable as a means to model extreme rainfall events worldwide. We also note that the tails studied here are as simple as possible, i.e., only one shape parameter controls their asymptotic behaviour. Yet there are many distributions with more than one shape parameter, which may affect their tail behaviour. In particular, the Generalized Gamma (Stacy, 1962) and the Burr type XII distributions were compared as candidates for the daily rainfall (based on L-moments) in another study, using thousands of empirical daily records, and the former performed better (Papalexiou and Koutsoyiannis, 2012).
The key implication of this analysis is that the frequency
and the magnitude of extreme events have generally been un-
derestimated in the past. Engineering practice needs to ac-
knowledge that extreme events are not as rare as previously
thought and to shift toward more heavy-tailed probability dis-
tributions.
Acknowledgements. Four eponymous reviewers, Aaron Clauset, Roberto Deidda, Salvatore Grimaldi and Francesco Laio, and four commenters, Santiago Beguería, Federico Lombardo, Chris Onof and Patrick Willems, are acknowledged for their public review. Most of the comments helped us to improve the original manuscript.
Edited by: P. Molnar
References
Ahmad, M. I., Sinclair, C. D., and Werritty, A.: Log-logistic ﬂood
frequency analysis, J. Hydrol., 98, 205–224, doi:10.1016/0022-
1694(88)90015-7, 1988.
Alsmeyer, G. and Sgibnev, M.: On the tail behaviour of the supre-
mum of a random walk deﬁned on a Markov chain, available at:
http://kamome.lib.ynu.ac.jp/dspace/handle/10131/5689 (last ac-
cess: 10 November 2012), 1998.
Balkema, A. A. and De Haan, L.: Residual Life Time at Great Age,
Ann. Probab., 2, 792–804, doi:10.1214/aop/1176996548, 1974.
Beguería, S., Vicente-Serrano, S. M., López-Moreno, J. I., and
García-Ruiz, J. M.: Annual and seasonal mapping of peak intensity,
magnitude and duration of extreme precipitation events
across a climatic gradient, northeast Spain, Int. J. Climatol., 29,
1759–1779, 2009.
Ben-Zvi, A.: Rainfall intensity–duration–frequency relationships
derived from large partial duration series, J. Hydrol., 367, 104–
114, doi:10.1016/j.jhydrol.2009.01.007, 2009.
Bhattarai, K. P.: Partial L-moments for the analysis of censored
ﬂood samples, Hydrolog. Sci. J., 49, 855–868, 2004.
Buishand, T. A.: Some remarks on the use of daily rainfall mod-
els, J. Hydrol., 36, 295–308, doi:10.1016/0022-1694(78)90150-
6, 1978.
Burr, I. W.: Cumulative Frequency Functions, Ann. Math. Stat., 13,
215–232, 1942.
Chow, V. T.: Handbook of applied hydrology: a compendium of
water-resources technology, McGraw-Hill, 1964.
Cunnane, C.: A particular comparison of annual maxima and partial
duration series methods of ﬂood frequency prediction, J. Hydrol.,
18, 257–271, doi:10.1016/0022-1694(73)90051-6, 1973.
El Adlouni, S., Bobée, B., and Ouarda, T. B. M. J.: On the tails of
extreme event distributions in hydrology, J. Hydrol., 355, 16–33,
doi:10.1016/j.jhydrol.2008.02.011, 2008.
Embrechts, P. and Goldie, C. M.: On convolution tails, Stoch. Proc.
Appl., 13, 263–278, doi:10.1016/0304-4149(82)90013-8, 1982.
Embrechts, P., Klüppelberg, C., and Mikosch, T.: Modelling extremal
events for insurance and finance, Springer Verlag, Berlin
Heidelberg, 1997.
European Commission: Directive 2007/60/EC of the European Par-
liament and of the Council of 23 October 2007 on the assessment
and management of ﬂood risks, Ofﬁcial Journal of the European
Communities, L, 288(6.11), 27–34, 2007.
Goldie, C. M. and Klüppelberg, C.: Subexponential distributions,
in: A Practical Guide to Heavy Tails: Statistical Techniques and
Applications, edited by: Adler, R., Feldman, R., and Taqqu, M.
S., 435–459, Birkhäuser Boston, 1998.
Gupta, S. K.: Modern Hydrology and Sustainable Water Develop-
ment, John Wiley & Sons, 2011.
Heo, J. H., Boes, D. C., and Salas, J. D.: Regional ﬂood fre-
quency analysis based on a Weibull model: Part 1. Estimation
and asymptotic variances, J. Hydrol., 242, 157–170, 2001a.
Heo, J. H., Salas, J. D., and Boes, D. C.: Regional ﬂood frequency
analysis based on a Weibull model: Part 2. Simulations and ap-
plications, J. Hydrol., 242, 171–182, 2001b.
Hershﬁeld, D. M.: Estimating the probable maximum precipitation,
J. Hydraul. Eng.-ASCE, 87, 99–106, 1961.
Klüppelberg, C.: Subexponential Distributions and Integrated Tails,
J. Appl. Probab., 25, 132–141, doi:10.2307/3214240, 1988.
Klüppelberg, C.: Subexponential distributions and characterizations
of related classes, Probab. Theory Rel., 82, 259–269,
doi:10.1007/BF00354763, 1989.
Koutsoyiannis, D.: A probabilistic view of Hershﬁeld’s method for
estimating probable maximum precipitation, Water Resour. Res.,
35, 1313–1322, 1999.
Koutsoyiannis, D.: Statistics of extremes and estimation of extreme
rainfall, 1, Theoretical investigation, Hydrolog. Sci. J., 49, 575–
590, 2004a.
Koutsoyiannis, D.: Statistics of extremes and estimation of extreme
rainfall, 2, Empirical investigation of long rainfall records, Hy-
drolog. Sci. J., 49, 591–610, 2004b.
Kroll, C. N. and Stedinger, J. R.: Estimation of moments and quan-
tiles using censored data, Water Resour. Res., 32, 1005–1012,
1996.
Mielke Jr., P. W.: Another Family of Distributions for Describing
and Analyzing Precipitation Data, J. Appl. Meteorol., 12, 275–
280, 1973.
Mielke Jr., P. W. and Johnson, E. S.: Some generalized beta distri-
butions of the second kind having desirable application features
in hydrology and meteorology, Water Resour. Res., 10, 223–226,
1974.
Mitzenmacher, M.: A brief history of generative models for power
law and lognormal distributions, Internet Mathematics, 1, 226–
251, 2004.
Moisello, U.: On the use of partial probability weighted moments
in the analysis of hydrological extremes, Hydrol. Process., 21,
1265–1279, 2007.
Papalexiou, S. M. and Koutsoyiannis, D.: Entropy based derivation
of probability distributions: A case study to daily rainfall, Adv.
Water Resour., 2012.
Papalexiou, S. M. and Koutsoyiannis, D.: Battle of extreme value
distributions: A global survey on extreme daily rainfall, Water
Resour. Res., online ﬁrst, doi:10.1029/2012WR012557, 2013.
Pickands III, J.: Statistical Inference Using Extreme Order Statis-
tics, Ann. Stat., 3, 119–131, 1975.
Stacy, E. W.: A Generalization of the Gamma Distribution, Ann.
Math. Stat., 33, 1187–1192, 1962.
Tadikamalla, P. R.: A Look at the Burr and Related Distributions,
Int. Stat. Rev., 48, 337–344, 1980.
Tavares, L. V. and Da Silva, J. E.: Partial duration series method re-
visited, J. Hydrol., 64, 1–14, doi:10.1016/0022-1694(83)90056-
2, 1983.
Teugels, J.: Class of subexponential distributions, Ann. Probab., 3,
1000–1011, doi:10.1214/aop/1176996225, 1975.
Wang, Q. J.: Using partial probability weighted moments to ﬁt the
extreme value distributions to censored samples, Water Resour.
Res., 32, 1767–1771, 1996.
Werner, T. and Upper, C.: Time variation in the tail behavior of
Bund future returns, J. Future Markets, 24, 387–398, 2004.
Hydrol. Earth Syst. Sci., 17, 851–862, 2013 www.hydrol-earth-syst-sci.net/17/851/2013/
... These distributions are common in hydrological practice, and were already tested and recommended in previous SPI studies [13,23,24,45]. We fit and compare the fitting performance using a modified Mean Square Error Norm (MSEN) thanks to its proven reliability and simplicity [46][47][48]. Then, the metrics used to quantify the differences between the SPI estimation approaches are described. ...
... , 12 denotes a specific month of the year, F N x i,j and F(x i ) are the empirical and the theoretical exceedance probabilities of the monthly rainfall amount x i,j . The main advantage of this method is that it allows the simultaneous estimation of the unknown parameters and the identification of the best suitable distribution, between the candidates, for each analyzed sample [19,[46][47][48]. ...
Article
Full-text available
Drought is ranked second in type of natural phenomena associated with billion dollars weather disaster during the past years. It is estimated that in EU countries the number of people affected by drought was increased by 20% over the last decades. It is widely recognized that the Standardized Precipitation Index (SPI) can effectively provide drought characteristics in time and space. The paper questions the standard approach to estimate the SPI based on the Gamma probability distribution function, assessing the fitting performance of different biparametric distribution laws to monthly precipitation data. We estimate SPI time series, for different scale of temporal aggregation, on an unprecedented dataset consisting of 332 rain gauge stations deployed across Italy with observations recorded between 1951 and 2000. Results show that the Lognormal distribution performs better than the Gamma in fitting the monthly precipitation data at all time scales, affecting drought characteristics estimated from SPI signals. However, drought events detected using the original and the best fitting approaches does not diverge consistently in terms of return period. This suggests that the SPI in its original formulation can be applied for a reliable detection of drought events and for promoting mitigation strategies over the Italian peninsula.
... It is also worth mentioning that a few stations in our data set contain missing values (Table S1). Since they are below the 20% criteria (Papalexiou et al., 2013), instead of removing these stations or replacing the missing values, we excluded the missing values. ...
Article
In this study, we present the long‐term daily and sub‐daily station data of annual extreme rainfall (1970‐2015) and the trend analyses in rainfall regimes in Turkey. Trends in 5, 10, 15, 30 minutes, 1, 2, 6, 12, and 24 hours of extreme rainfall in seven different rainfall regimes are estimated through non‐parametric tests. The trends in return levels (2‐year, 20‐year, and 100‐year) are defined by an appropriate three‐parameter generalized extreme value distribution and are evaluated in the climatological context of rainfall regimes. Overall, from 5‐min to 2‐hr durations, magnitudes of trends in extreme rainfall constantly increase in all rainfall regimes, which may be attributed to the intensified contribution of convective rainfall as a response to warming. Trend analysis of return levels reveals that, compared with 2‐yr return levels, low probability high impact extreme rainfall events generally have the lowest estimated median trends until 30‐min to 2‐hr range. A shift in the magnitudes of trends occurs generally at 30‐min and 1‐hr durations; trends of rare intense events increase at the expense of less intense extreme rainfall. Thus, the intensification of 1‐hr to 2‐hr extreme rainfall events can arise from both the increasing trends of more common events in shorter rainfall durations and from the increase in the trends of low probability high impact extreme rainfall events at this range. Moreover, explicitly in continental rainfall regimes, increases in the magnitudes of the trends in 30‐min to 2‐hr duration range are accompanied by various declines in 6‐hr to 24‐hr duration range. Although coastal regimes generally have increasing trend values from 5‐min to 24‐hr durations, in northern and southern clusters, changes occur in the variability of extreme rainfall and the trend values of rarer (20‐year and 100‐yr return periods) extreme rainfall exhibit increases in 6‐hr to 12‐hr range.
... It is not surprising to find that the best-fit of Equation (3) is a logarithm. The Lognormal is a right-skewed continuous probability distribution introduced by Galton [31], frequently used to represent physical variables that never take negative values, or to represent risk as a function of the time [32][33][34]. This is consistent with the extreme value theory (EVT), where extreme events are those contained in the tail distribution of a given variable [35]. ...
Article
Full-text available
The European Standard EN 15757: 2010 ‘Conservation of Cultural Property—Specifications for temperature and relative humidity to limit climate-induced mechanical damage in organic hygroscopic materials’ is a guide specifying the allowed limits of variability of the indoor climate, in particular relative humidity (RH) to preserve cultural heritage objects and collections composed of climate-vulnerable materials. This paper is finalized to provide useful elements to improve the Standard at its next revision, based on focused research. The methodologies and the mathematical tools used are performed on 18 case studies representing different buildings, climates, and use, including heated and unheated buildings, museums, churches, concert halls, archives, and storage rooms. The first aim is to compare the method based on the centred moving average suggested by Annex A of EN15757 with an alternative method based on percentile interpolation to calculate the reference RH values, and in particular the safe band of RH variability, as well as the upper and lower risky bands. It has been found that the two methods provided the same results, but the latter is easier to manage. The second aim is to verify if the duration of the record necessary for the determination of the safe band is really 13 months of measurements as required by the Standard to account for the specific request of the centred moving average with a 30-day time window. This paper demonstrates that the same goal may be reached with a 12-month record, but extracting from the record itself the two periods required by the time window, i.e., the last 15 days of the year will be copied before the start of the record, and the same with the first 15 days after the end. The third aim is to test if the particular choice of the width of the time window is influential on the width of the safe band, and to assess the relationship between the width of the safe band and the width of the time window. 
The results show that the safe band logarithmically depends on the length of the time window, so it is crucial to respect the 30-day window established by the Standard.
... For smaller thresholds, such as those less than the 98th percentile, non-extreme rainfall values were incorporated in the POT samples, which lead to POT samples that may not be suitable for the J-shaped GPD, but for a bell-shaped distribution (Serinaldi and Kilsby, 2014;Papalexiou and Koutsoyiannis, 2012). However, for extreme rainfall, heavy tailed distributions such as GPD are more suitable (Moccia et al., 2021), while the rule "the heavier, the better" can be applied (Papalexiou et al., 2013;Adlouni et al., 2008). Therefore, the 98th percentile was a suitable threshold. ...
Article
Full-text available
Nonstationary frequency analysis of peak over threshold (POT) extreme rainfall series is of crucial in hydrology. Most previous studies on the nonstationary frequency analyses of POT extreme rainfall series use only a single threshold, ignoring the differences in the statistical characteristics of POT extreme rainfall series extracted with different thresholds. This study investigated the impact of thresholds on the nonstationary frequency analyses of POT extreme rainfall series using a three-step method. First, the non-stationarities in POT extreme rainfall series extracted with different thresholds were assessed using the Iterative Mann-Kendall test and the Mood test. Second, the nonstationary POT extreme rainfall series were modeled using the generalized Pareto distribution (GPD) with the scale parameter to be linked with physical covariates. Third, the uncertainties in the parameters and return level of the optimal model for the POT extreme rainfall series were evaluated using the Markov chain Monte Carlo method. This method was applied to the daily rainfall at 48 stations in the Pearl River Basin (PRB) from 1979 to 2020. By comparing the optimal models of POT extreme rainfall series extracted with different percentile thresholds, we can draw the following conclusions. (1) As the threshold increases from the 90th percentile to the 99.7th percentile, the percentage of nonstationary POT extreme rainfall series gradually decreases from 92% to 13%, with most stations with a significant increasing trend changing to most stations with a significant decreasing trend, especially when the threshold is greater than the 98th percentile. (2) The uncertainty in the scale parameter, shape parameter, and return level increases as the threshold increases, and increases significantly when the threshold is greater than the 98th percentile, especially for the scale and shape parameters. 
(3) The 98th percentile is suggested as the optimal threshold for the PRB; (4) For the 98th percentile thresholds, the total column water vapor in the convective indices is the most significant covariate over the PRB.
... In general, identifying the tail type or quantifying the tail-heaviness is not trivial and many methods have been invented and tested (see e.g., El Adlouni et al., 2008;Embrechts et al., 1997;Langousis et al., 2016;Nerantzaki & Papalexiou, 2019;Serinaldi, 2013;Smith, 1987;Wietzke et al., 2020). Tail-type identification or tail-heaviness estimates can be more robust if informed by global or regional studies (e.g., Papalexiou et al., 2013;Rajulapati et al., 2020;Serinaldi & Kilsby, 2014). For example, Papalexiou, AghaKouchak, and Foufoula-Georgiou (2018) used regional tail estimates to fix the tail parameter in the  and XII distributions before fitting them to the whole sample; this approach was used to assess hourly precipitation depths at large return periods. ...
Article
Full-text available
What elements should a parsimonious model reproduce at a single scale to precisely simulate rainfall at many scales? We posit these elements are: (a) the probability of dry and linear correlation structure of the wet/dry sequence as a proxy reproducing the distribution of wet/dry spells, and (b) the marginal distribution of nonzero rainfall and its correlation structure. We build a two‐state rainfall model, the CoSMoS‐2s, that explicitly reproduces these elements and is easily applicable at any timescale. Additionally, the paper: (a) introduces the Generalized Exponential (GE $\mathcal{G}\mathcal{E}$) distribution system comprising six flexible distributions with desired properties to describe nonzero rainfall and facilitate time series generation; (b) extends the CoSMoS framework to allow simulations with negative correlations; (c) simplifies the generation of binary sequences with any correlation structure by analytical approximations; (d) introduces the rank‐based CoSMoS‐2s that preserves Spearman's correlations, has an analytical formulation, and is also applicable for infinite variance time series, (e) introduces the copula‐based CoSMoS‐2s enabling intermittent times series generation with nonzero values having the dependence structure of any desired copula, and (f) offers conceptual generalizations for rainfall modeling and beyond, with specific ideas for future improvements and extensions. The CoSMoS‐2s is tested using four long hourly rainfall records; the simulations reproduce rainfall properties at multiple scales including the wet/dry spells, probability of dry, characteristics of nonzero rainfall, and the behavior of extremes.
Article
Many areas in South Africa are prone to localized flooding. With climate change already said to affect the intensity of rainfall, there is a need to investigate if there is a change in the probability of significant to extreme daily rainfall across South Africa. This was investigated through the analysis of the daily time series of 70 manual rainfall stations, over the period 1921 to 2020. The analysis period was divided into two equal periods of 50 years for comparison. With the application of the gamma distribution, it is shown that most stations experienced an increase in the probability of receiving more than 50 mm per day, defined as significant rainfall, in the latter half of the analysis period. Also, most stations showed an increase in their 1:50- and 1:100-year return period values, with some stations over the eastern parts showing increases of over 100 mm. There was also an increase in the probability of “heavy rainfall” (>75 mm) and “very heavy rainfall” events (>115 mm) between the first and second half of the analysis period for most stations over the country when applying the Peak-Over-Threshold approach. In summary, the results indicate that, although the number of rain days has remained near-constant over the 1921–2020 period, the probability of experiencing significant and extreme daily rainfall events has increased generally for most regions in South Africa. This is of concern as rainfall of this nature can have serious consequences in terms of flooding, erosion, and damage to agriculture and infrastructure.
Article
Statistical distributions of flood peak discharge often show heavy tail behavior, that is, extreme floods are more likely to occur than would be predicted by commonly used distributions that have exponential asymptotic behavior. This heavy tail behavior may surprise flood managers and citizens, as human intuition tends to expect light tail behavior, and the heaviness of the tails is very difficult to predict, which may lead to unnecessarily high flood damage. Despite its high importance, the literature on the heavy tail behavior of flood distributions is rather fragmented. In this review, we provide a coherent overview of the processes causing heavy flood tails and the implications for science and practice. Specifically, we propose nine hypotheses on the mechanisms causing heavy tails in flood peak distributions related to processes in the atmosphere, the catchment, and the river system. We then discuss to which extent the current knowledge supports or contradicts these hypotheses. We also discuss the statistical conditions for the emergence of heavy tail behavior based on derived distribution theory and relate them to the hypotheses and flood generation mechanisms. We review the degree to which the heaviness of the tails can be predicted from process knowledge and data. Finally, we recommend further research toward testing the hypotheses and improving the prediction of heavy tails.
Article
In some catchments, the distribution of annual maximum streamflow shows heavy tail behavior, meaning the occurrence probability of extreme events is higher than if the upper tail decayed exponentially. Neglecting heavy tail behavior can lead to an underestimation of the likelihood of extreme floods and the associated risk. Partly contradictory results regarding the controls of heavy tail behavior exist in the literature and the knowledge is still very dispersed and limited. To better understand the drivers, we analyze the upper tail behavior and its controls for 480 catchments in Germany and Austria over a period of more than 50 years. The catchments span from quickly reacting mountain catchments to large lowland catchments, allowing for general conclusions. We compile a wide range of event and catchment characteristics and investigate their association with an indicator of the tail heaviness of flood distributions, namely the shape parameter of the GEV distribution. Following univariate analyses of these characteristics, along with an evaluation of different aggregations of event characteristics, multiple linear regression models, as well as random forests, are constructed. A novel slope indicator, which represents the relation between the return period of flood peaks and event characteristics, captures the controls of heavy tails best. Variables describing the catchment response are found to dominate the heavy tail behavior, followed by event precipitation, flood seasonality, and catchment size. The pre‐event moisture state in a catchment has no relevant impact on the tail heaviness even though it does influence flood magnitudes.
Article
The probability distribution of precipitation amount strongly depends on geography, climate zone, and time scale considered. Closed-form parametric probability distributions are not sufficiently flexible to provide accurate and universal models for precipitation amount over different time scales. This paper derives non-parametric estimates of the cumulative distribution function (CDF) of precipitation amount for wet periods. The CDF estimates are obtained by integrating the kernel density estimator leading to semi-explicit CDF expressions for different kernel functions. An adaptive plug-in bandwidth estimator (KCDE) is investigated, using both synthetic data sets and reanalysis precipitation data from the Mediterranean island of Crete (Greece). It is shown that KCDE provides better estimates of the probability distribution than the standard empirical (staircase) estimate and kernel-based estimates that use the normal reference bandwidth. It is also demonstrated that KCDE enables the simulation of non-parametric precipitation amount distributions by means of the inverse transform sampling method.
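The idea of integrating a kernel density estimate to get a smooth CDF, then simulating by inverse transform sampling, can be sketched as follows. This is a minimal illustration under stated assumptions, not the KCDE method of the abstract: it uses a Gaussian kernel with the normal reference bandwidth (the baseline the abstract compares against), and all function names are hypothetical. A Gaussian kernel also places a little probability mass below zero, which a dedicated precipitation estimator would need to handle.

```python
import math
import random
import statistics


def normal_reference_bandwidth(sample):
    """Normal reference rule h = 1.06 * s * n^(-1/5) (assumed baseline, not KCDE)."""
    n = len(sample)
    return 1.06 * statistics.stdev(sample) * n ** (-0.2)


def kernel_cdf(x, sample, h):
    """Integrated Gaussian kernel density: the mean of Phi((x - x_i) / h)."""
    scale = h * math.sqrt(2.0)
    total = sum(0.5 * (1.0 + math.erf((x - xi) / scale)) for xi in sample)
    return total / len(sample)


def kernel_quantile(u, sample, h, tol=1e-9):
    """Invert the smooth CDF by bisection; it is strictly increasing in x."""
    lo = min(sample) - 10.0 * h
    hi = max(sample) + 10.0 * h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, sample, h) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate(sample, h, size, rng):
    """Inverse transform sampling: map uniform draws through the quantile function."""
    return [kernel_quantile(rng.random(), sample, h) for _ in range(size)]
```

For example, with hypothetical wet-period amounts `sample = [1.0, 2.0, 4.0, 8.0, 16.0]` and `h = normal_reference_bandwidth(sample)`, calling `simulate(sample, h, 100, random.Random(0))` returns 100 synthetic values drawn from the smoothed distribution.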
Article
The popular approach to selecting a suitable distribution to characterize extreme rainfall events relies on the assessment of its descriptive performance. This study examines an alternative approach to this task that evaluates, in addition to the descriptive performance of the models, their performance in estimating out-of-sample events (predictive performance). With a numerical experiment and a case study in São Paulo state, Brazil, we evaluated the adequacy of seven probability distributions widely used in hydrological analysis to characterize extreme events in the region and compared the selection process of the popular and alternative frameworks. The results indicate that (1) the popular approach is not capable of selecting distributions with good predictive performance and (2) combining different predictive and descriptive tests can improve the reliability of extreme event prediction. The proposed framework allowed the assessment of model suitability from a regional perspective, identifying the Generalized Extreme Value (GEV) distribution as the most adequate to characterize extreme rainfall events in the region.
Article
Theoretically, if the distribution of daily rainfall is known or justifiably assumed, then one could argue, based on extreme value theory, that the distribution of the annual maxima of daily rainfall would resemble one of the three limiting types: (a) type I, known as Gumbel; (b) type II, known as Fréchet; and (c) type III, known as reversed Weibull. Yet, the parent distribution usually is not known and often only records of annual maxima are available. Thus, the question that naturally arises is which one of the three types better describes the annual maxima of daily rainfall. The question is of great importance as the naive adoption of a particular type may lead to serious underestimation or overestimation of the return period assigned to specific rainfall amounts. To answer this question, we analyze the annual maximum daily rainfall of 15,137 records from all over the world, with lengths varying from 40 to 163 years. We fit the generalized extreme value (GEV) distribution, which comprises the three limiting types as special cases for specific values of its shape parameter, and analyze the fitting results focusing on the behavior of the shape parameter. The analysis reveals that (a) the record length strongly affects the estimate of the GEV shape parameter and long records are needed for reliable estimates; (b) when the effect of the record length is corrected, the shape parameter varies in a narrow range; (c) the geographical location on the globe may affect the value of the shape parameter; and (d) the winner of this battle is the Fréchet law.
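For orientation, the way the GEV distribution nests the three limiting types can be written out. The parameterization below is one common convention and is an assumption here, not necessarily the notation of the cited paper: location $\mu$, scale $\sigma > 0$, and shape $\xi$.

```latex
F(x) =
\begin{cases}
\exp\left\{ -\left[ 1 + \xi \, \dfrac{x - \mu}{\sigma} \right]^{-1/\xi} \right\},
  & \xi \neq 0, \quad 1 + \xi \, \dfrac{x - \mu}{\sigma} > 0, \\[2ex]
\exp\left\{ -\exp\left[ -\dfrac{x - \mu}{\sigma} \right] \right\},
  & \xi = 0 .
\end{cases}
```

Under this convention, $\xi > 0$ gives the Fréchet type (heavy, power-law upper tail), $\xi = 0$ the Gumbel type (exponential upper tail), and $\xi < 0$ the reversed Weibull type (bounded from above). Some software uses the opposite sign for the shape parameter, so the convention should be checked before comparing estimates.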
Article
The upper part of a probability distribution, usually known as the tail, governs both the magnitude and the frequency of extreme events. The tail behaviour of all probability distributions may be, loosely speaking, categorized into two families: heavy-tailed and light-tailed distributions, with the latter generating "milder" and less frequent extremes compared to the former. This emphasizes how important it is for hydrological design to assess the tail behaviour correctly. Traditionally, the wet-day daily rainfall has been described by light-tailed distributions like the Gamma, although heavier-tailed distributions have also been proposed and used, e.g., the Lognormal, the Pareto, the Kappa, and others. Here, we investigate the issue of tails for daily rainfall by comparing the upper part of empirical distributions of thousands of records with four common theoretical tails: those of the Pareto, Lognormal, Weibull and Gamma distributions. Specifically, we use 15 029 daily rainfall records from around the world with record lengths from 50 to 163 yr. The analysis shows that heavier-tailed distributions are in better agreement with the observed rainfall extremes than the more often used lighter-tailed distributions, with clear implications for extreme event modelling and engineering design.
Article
The principle of maximum entropy, along with empirical considerations, can provide a consistent basis for constructing a probability distribution model for highly varying geophysical processes. Here we examine the potential of using this principle with the Boltzmann–Gibbs–Shannon entropy definition in the probabilistic modeling of rainfall in different areas worldwide. We define and theoretically justify specific simple and general entropy maximization constraints which lead to two flexible distributions, i.e., the three-parameter Generalized Gamma (GG) and the four-parameter Generalized Beta of the second kind (GB2), with the former being a particular limiting case of the latter. We test the theoretical results on 11,519 daily rainfall records across the globe. The GB2 distribution seems to be able to describe all empirical records, while two of its specific three-parameter cases, the GG and the Burr Type XII distributions, perform very well, describing 97.6% and 87.7% of the empirical records, respectively.
Article
A parameter estimation method is proposed for fitting the generalized extreme value (GEV) distribution to censored flood samples. Partial L-moments (PL-moments), which are variants of L-moments and analogous to "partial probability weighted moments", are defined for the analysis of such flood samples. Expressions are derived to calculate PL-moments directly from uncensored annual floods, and to fit the parameters of the GEV distribution using PL-moments. Results of a Monte Carlo simulation study show that the sampling properties of PL-moments, with censoring of flood samples of up to 30%, are similar to those of simple L-moments, and also that both PL-moments and LH-moments (higher-order L-moments) have similar sampling properties. Finally, simple L-moments, LH-moments, and PL-moments are used to fit the GEV distribution to 75 annual maximum flow series of Nepalese and Irish catchments, and it is found that, in some situations, both LH- and PL-moments can produce a better fit to the larger flow values than simple L-moments.
Article
Let F be a distribution function on [0, ∞) with finite expectation. In terms of the hazard rate of F several conditions are given which simultaneously imply subexponentiality of F and of its integrated tail distribution F1. These conditions apply to a wide class of longtailed distributions, and they can also be used in connection with certain random walks which occur in risk theory and queueing theory.