Chapter · February 2007

DOI: 10.1201/9780203945988.ch7


A CRITICAL REVIEW OF PROBABILITY OF EXTREME

RAINFALL: PRINCIPLES AND MODELS

Demetris Koutsoyiannis

Department of Water Resources, Faculty of Civil Engineering, National Technical

University of Athens, Heroon Polytechneiou 5, GR-157 80 Zographou, Greece

(http://www.itia.ntua.gr/dk/, dk@itia.ntua.gr)

Abstract

Probabilistic modelling of extreme rainfall has a crucial role in flood risk estimation

and consequently in the design and management of flood protection works. This is

particularly the case for urban floods, where the plethora of flow control sites and the

scarcity of flow measurements make the use of rainfall data indispensable. For half a

century, the Gumbel distribution has been the prevailing model of extreme rainfall.

Several arguments including theoretical reasons and empirical evidence are supposed

to support the appropriateness of the Gumbel distribution, which corresponds to an

exponential parent distribution tail. Recently, the applicability of this distribution has

been criticized both on theoretical and empirical grounds. Thus, new theoretical

arguments based on comparisons of actual and asymptotic extreme value distributions

as well as on the principle of maximum entropy indicate that the Extreme Value Type

2 distribution should replace the Gumbel distribution. In addition, several empirical

analyses using long rainfall records agree with the new theoretical findings. Further-

more, the empirical analyses show that the Gumbel distribution may significantly

underestimate the largest extreme rainfall amounts (albeit its predictions for small

return periods of 5-10 years are satisfactory), whereas this distribution would seem as

an appropriate model if fewer years of measurements were available (i.e., parts of the

long records were used).


1. Introduction

The design and management of flood protection works and measures requires reliable

estimation of flood probability and risk. A solid empirical basis for this estimation can

be offered by flow observation records with an appropriate length, sufficient to

include a sample of representative floods. In practice, however, flow measurements

are never enough to support flood modelling. Particularly, in urban floods the control

points are numerous and the flow gauge sites scarce or nonexistent (for example

in Athens, a city with a history extended over several millennia, traversed by the

Kephisos and Ilisos Rivers and other urban streams, no flow gauge with systematic

measurements has ever operated). The obvious alternative is the use of hydrological

models with rainfall input data and the substitution of rainfall for streamflow empirical information. Notably, even when flow records exist, rainfall probability still has a major role in hydrologic practice; for instance, in major hydraulic structures, the

design floods are generally estimated from appropriately synthesised design storms

(e.g. U.S. Department of the Interior, Bureau of Reclamation, 1977, 1987; Sutcliffe,

1978).

However, since the birth of science, which is typically placed in the era of the Ionian philosophers (6th century BC), it has been known that empirical evidence alone

never suffices to form a comprehensive and consistent picture of natural phenomena

and behaviours. A theory, based on reasoning, is required to interpret empirical obser-

vations and draw such a picture. Such a theory has been sought for more than 26 cen-

turies, since the formulation of the first logical explanations of hydrometeorological

phenomena by Anaximander (c. 610- c. 547 BC) and Anaximenes (585-525 BC) of

Miletus, who studied the formation of clouds, rain and hail (Koutsoyiannis and Xanthopoulos, 1999; Koutsoyiannis et al., 2006). However, the state of affairs regarding the understanding and description of these phenomena and their behaviours may still not be satisfactory.

Some of the questions in seeking a fundament for a theory are philosophical ques-

tions; for instance the concepts of infinite vs. finite and of determinism vs. indetermi-

nism, including the notions of probability and entropy. It is necessary to briefly dis-

cuss these questions because they greatly influence our perception of hydrometeo-

rological phenomena including rainfall and flood.

The history of infinite goes back to the 6th century BC, with Anaximander, who

regarded infinite as the cosmological principle, and continues with Zeno of Elea (c.

490- c. 430 BC) and his famous paradoxes, and later with Aristotle (384-322 BC) who

introduced the notion of potential infinite, as opposed to the actual or complete infi-

nite. The Aristotelian potential infinite “exists in no other way, but … potentially or

by reduction” (Physics, 3.7, 206b16). It is generally claimed that the problem of

mathematical infinite was tackled in the late 19th century. According to Bertrand Russell, Zeno’s paradoxes “after two thousand years of continual refutation, … made the foundation of a mathematical renaissance” (Russell, 1903). Furthermore, “for over two

thousand years the human intellect was baffled by the problem [of infinity]… The

definite solution to the difficulties is due to Georg Cantor” (Russell, 1926; see also

Crossley et al., 1990 and Priest, 2002).

In hydrometeorology, however, the concept of infinity is still not understood and this

situation has led to fallacies of upper bounds in precipitation and flood, the well-known concepts of the probable maximum precipitation (PMP) and probable maximum flood (PMF) (World Meteorological Organization, 1986). These contradictory

concepts are still in wide use, even though merely the Aristotelian notion of potential

infinite would suffice to abandon them. To quote, for example, Dingman (1994, p.

141) “conceptually, we can always imagine that a few more molecules of water could

fall beyond any specified limit.” This thinking is absolutely consistent with the Aris-

totelian potential infinite.

Criticisms of the PMP and PMF concepts must have started from the 1970s; among

them, one of the neatest was offered by Benson (1973):

“The ‘probable maximum’ concept began as ‘maximum possible’ because it was

considered that maximum limits exist for all the elements that act together to pro-

duce rainfall, and that these limits could be defined by a study of the natural proc-

esses. This was found to be impossible to accomplish – basically because nature is

not constrained to limits ... At this point, the concept should have been abandoned

and admitted to be a failure. Instead, it was salvaged by the device of renaming it

‘probable maximum’ instead of ‘maximum possible’. This was done, however, at a

sacrifice of any meaning or logical consistency that may have existed originally ...

The only merit in the value arrived at is that it is a very large one. However, in

some instances, maximum probable precipitation or flood values have been

exceeded shortly after or before publication, whereas, in some instances, values

have been considered by competent scientists to be absurdly high … The method

is, therefore, subject to serious criticism on both technical and ethical grounds –

technical because of a preponderance of subjective factors in the computation

process, and because of a lack of specific or consistent meaning in the result; ethi-

cal because of the implication that the design value is virtually free from risk.”

More recently, the particular hypotheses and methodological elements of the different approaches for estimating PMP have also been criticized. The so-called statistical

approach to PMP, based on the studies of Hershfield (1961a, 1965) has been revisited

recently (Koutsoyiannis, 1999) and it was concluded that the data used by Hershfield

do not suggest the existence of an upper limit. To formulate his method, Hershfield

compiled a huge and worldwide rainfall data set (a total of 95 000 station-years of

annual maximum rainfall belonging to 2645 stations, of which about 90% were in the

USA), standardized each record and found the maximum over the 95 000 standardized

values, which he asserted to be the PMP. Clearly then, the PMP hypothesis is based on the incorrect interpretation that an observed maximum in precipitation is a physical upper limit; had the sample size been greater, the estimated PMP value would have been greater, too.
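The dependence of an observed maximum on the sample size can be illustrated with a small simulation. The sketch below is not Hershfield's procedure: it assumes, purely for illustration, that the standardized annual maxima follow a standard Gumbel distribution (an assumption, not something taken from his data set), and it tracks the largest value seen as the number of station-years grows. The names `gumbel_sample` and `running_maxima` are hypothetical helpers.

```python
import math
import random

def gumbel_sample(rng):
    """Draw one standard Gumbel variate by inverting H(z) = exp(-exp(-z))."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def running_maxima(sizes, seed=1):
    """Largest value observed within the first n draws, for each n in sizes."""
    rng = random.Random(seed)
    result = {}
    current_max = -math.inf
    drawn = 0
    for n in sorted(sizes):
        while drawn < n:
            current_max = max(current_max, gumbel_sample(rng))
            drawn += 1
        result[n] = current_max
    return result

# Sample sizes are illustrative; 95 000 echoes the station-years quoted above.
maxima = running_maxima([950, 9_500, 95_000])
for n, m in maxima.items():
    print(f"{n:>6} station-years: observed maximum = {m:.2f}")
```

Because the samples are nested, the observed maximum can only stay the same or grow as more station-years are added; under the Gumbel assumption its typical value grows like ln n and never levels off at a limit.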

The situation is perhaps even worse with the so-called moisture maximization

approach of PMP estimation (World Meteorological Organization, 1986), which

seemingly is more physically based than the statistical approach of Hershfield. This is

the most representative and widely used approach to PMP, and is based on the

“maximization” of the observed atmospheric moisture content (i.e. to a maximum

observed value) and on the assumption that if the moisture content were maximum,

then the rainfall depth would be greater than observed by a factor equal to the ratio of

the maximum over the observed moisture content. Applying this “maximization” procedure to all observed storms, the PMP value is assumed to be the maximum over all

maximized depths.


Clearly, then, the approach suffers twice from the incorrect interpretation that an observed maximum is a physical upper limit. This logic is used for the first time to determine the maximum moisture content (formally, the maximum dew point,

assuming that the observed maximum in a record of about 50 years is a physical limit;

obviously, had the record length been 100 or 200 years the observed maximum dew

point would most likely be higher). This logic is also used for a second time to deter-

mine the PMP as the maximum of observed maximized values. Papalexiou and Kout-

soyiannis (2006) have demonstrated the arbitrariness of the approach and its enor-

mous sensitivity to the observation records (e.g. a missing rainfall observation could

result in 25% reduction of the PMP value). The arbitrary assumptions of the approach

extend beyond the confusion of maximum observed quantities with physical limits.

For example, the logic of moisture maximization at a particular location is unsup-

ported given that a large storm at this location depends on the convergence of atmos-

pheric moisture from much greater areas.

In conclusion, it is surprising that the contradictory PMP and PMF concepts are

regarded by many as concepts more physically based than a probabilistic approach to

extreme rainfall and flood. This is particularly the case because the PMP and PMF

concepts are greatly based on probabilistic or statistical assumptions, which in addi-

tion are rather misrepresentations of physical phenomena and indicate confused inter-

pretation of probability. In turn, as will be discussed in the next section, these very

concepts may have also affected probabilistic approaches of hydrological processes,

in an attempt to make them consistent with the unsupported assumption of an upper

bound.

This situation harmonizes with a dominant logic in hydrometeorology that probability

does not offer a physical insight and is not related to understanding of physical phe-

nomena, but rather it is only an unavoidable modelling tool. In contrast, understanding

and insights are regarded as pertinent to deterministic thinking and to mechanistic

explanations of phenomena. This logic, however, ought to have been abandoned at the

end of the 19th century, after the development of statistical thermophysics and later of

the quantum physics which rely upon the concepts of probability and statistics and

depart from mechanistic physics. More recently, the study of chaotic dynamical systems and the astonishing result that the evolution of even the simplest nonlinear systems is unpredictable after a short lead time have demonstrated the ineffectiveness

of deterministic thinking. In this respect, even a faithful follower of determinism is

inevitably forced to accept probabilistic description of phenomena for practical prob-

lems. However, when using probabilistic descriptions the gain may be greater if these

descriptions are not regarded as incomprehensible mathematical models but rather as

insightful physical descriptions.

The notion of indeterminism is at least as old as Heraclitus (c. 535 - 475 BC) and the

notion of probability is the extension (quantifying transformation) of the Aristotelian

idea of “potentia” (Popper, 1982, p. 133). The mathematical formalism of probability

is much older than the recent notion of chaotic systems, although its concrete foundation was offered in the 20th century by Kolmogorov (1933). The notion of probability

may imply indeterminism from the outset (all events are possible, usually with differ-

ent probabilities, but eventually one occurs) and may differ from the deterministic

thinking (only one event is possible but it may be difficult to predict which one).

The notion of probability in synergy with the notion of infinite can remove paradoxical impressions related to upper bounds of physical quantities such as rainfall: the probability that rainfall exceeds any positive number x decreases toward zero as x increases, becomes inconceivably small for very high x and becomes precisely zero

for x = ∞. So, there is no need to assume such controversial concepts as PMP. This

was explained half a century ago by the famous statistician Feller (1950), using

another example, the age of a person:

“The question then arises as to which numbers can actually represent the life span

of a person. Is there a maximal age beyond which life is impossible, or is any age

conceivable? We hesitate to admit that man can grow 1000 years old, and yet cur-

rent actuarial practice admits no bounds to the possible duration of life. According

to formulas on which modern mortality tables are based, the proportion of men

surviving 1000 years is of the order of magnitude of one in $10^{10^{36}}$, a number with
$10^{27}$ billions of zeros. This statement does not make sense from a biological or
sociological point of view, but considered exclusively from a statistical standpoint
it certainly does not contradict any experience. There are fewer than $10^{10}$ people
born in a century. To test the contention statistically, more than $10^{10^{35}}$ centuries
would be required, which is considerably more than $10^{10^{34}}$ lifetimes of the earth.

Obviously, such extremely small probabilities are compatible with our notion of

impossibility. Their use may appear utterly absurd, but it does no harm and is con-

venient in simplifying many formulas. Moreover, if we were seriously to discard

the possibility of living 1000 years, we should have to accept the existence of a

maximum age, and the assumption that it should be possible to live x years and

impossible to live x years and two seconds is as unappealing as the idea of unlim-

ited life.”

In hydrometeorology, the introduction and development of the concepts of probability

and statistics have been closely related to the study of extreme rainfall and flood and

were greatly determined by the design needs of flood protection works. Empirical

ideas similar to the modern probability concepts had been formulated in hydrology

about a century ago (for instance, the hydrological frequency curves known as “dura-

tion curves”; Hazen, 1914). At about the same time, great mathematicians were

developing the theoretical foundation of probability of extreme values (von Bort-

kiewicz, 1922a, b; von Mises, 1923; Fréchet, 1927; Fisher and Tippet, 1928; Gne-

denco, 1941). Around the 1950s the empirical and theoretical approaches converged

to form the branch of hydrology now called hydrologic statistics, whose founders

were Jenkinson (1955), Gumbel (1958) and later Chow (1964). However, as already

stated above, based on the PMP example, the current state of knowledge is not satis-

factory and several important questions still wait for answers. For instance, Klemeš

(2000) argues that “The distribution models used now, though disguised in rigorous

mathematical garb, are no more, and quite likely less, valid for estimating the prob-

abilities of rare events than were the extensions ‘by eye’ of duration curves employed

50 years ago.” Obviously, however, the probabilistic approach to extreme values of

hydrological processes signifies a major progress in hydrological science and engi-

neering as it quantifies risk and disputes arbitrary and rather irrational concepts and

approaches.

The most important questions that have not received definite answers yet are related

in one or another manner to the notion of infinite. These questions concern the

asymptotic distribution of maxima, a distribution that assumes a number of events


tending to infinity, and are focused on the distribution tails, i.e. the behaviour of the

distribution function as the hydrological quantity of interest tends to infinity.

Thus, if one is freed from the concept of an upper limit to a hydrological quantity

and adopts a probabilistic approach, one will accept that the quantity may grow to

infinity with decreasing probability of exceedence. In this case, as probability of

exceedence tends to zero, there exists a lower limit to the rate of growth which is

mathematically proven. This lower limit is represented by the Gumbel distribution,

which has the “lightest” possible tail. So, abandoning the PMP concept and adopting

the Gumbel distribution can be thought of as a step from a finite upper limit to infin-

ity, but with the slowest possible growth rate towards infinity. Does nature follow the

slowest path to infinity? This question is not a philosophical one but has strong engi-

neering implications. If the answer is positive, the design values for flood protection

structures or measures will be the smallest possible ones (among those obtained by the

probabilistic approach), otherwise they will be higher. These questions are studied in

this article with the help of some recent works.

2. Basic concepts of extreme value distributions

It is recalled from probability theory that, given a number n of independent identically

distributed random variables, the largest (in the sense of a specific realization) of them

(more precisely, the largest order statistic), i.e.:

$$X := \max\{Y_1, Y_2, \ldots, Y_n\} \qquad (1)$$

has probability distribution function

$$H_n(x) = [F(x)]^n \qquad (2)$$

where $F(x) := P\{Y_i \le x\}$ is the common probability distribution function of each of $Y_i$.

Herein, F(x) will be referred to as parent distribution. If n is not constant but rather

can be regarded as a realisation of a random variable with Poisson distribution with

mean ν, then the distribution of X becomes (e.g. Todorovic and Zelenhasic, 1970;

Rossi et al., 1984),

$$H'_\nu(x) = \exp\{-\nu[1 - F(x)]\} \qquad (3)$$

Since $\ln [F(x)]^n = n \ln\{1 - [1 - F(x)]\} = n\{-[1 - F(x)] - [1 - F(x)]^2/2 - \ldots\} \approx -n[1 - F(x)]$, it turns out that for large n or large F(x), $H_n(x) \approx H'_n(x)$. Numerical investigation shows that even for relatively small n, the difference between $H_n(x)$ and $H'_n(x)$ is small (e.g., for n = 10, the relative error in estimating the exceedence probability $1 - H_n(x)$ from (3) rather than from (2) is about 3% at most).
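The size of this discrepancy can be checked directly. The following minimal sketch scans the parent probability F(x) over a grid and compares the exceedence probability from equation (2) with that from equation (3), taking ν = n. The function names and the grid resolution are illustrative choices, not part of the original analysis.

```python
import math

def exceedance_exact(F, n):
    """1 - H_n(x) from equation (2), with F = F(x)."""
    return 1.0 - F ** n

def exceedance_poisson(F, nu):
    """1 - H'_nu(x) from equation (3), with F = F(x)."""
    return -math.expm1(-nu * (1.0 - F))

def max_relative_error(n, steps=10_000):
    """Scan F over (0, 1); return the largest relative error and where it occurs."""
    worst, worst_F = 0.0, None
    for i in range(1, steps):
        F = i / steps
        exact = exceedance_exact(F, n)
        approx = exceedance_poisson(F, n)
        err = abs(approx - exact) / exact
        if err > worst:
            worst, worst_F = err, F
    return worst, worst_F

err, at_F = max_relative_error(10)
print(f"n = 10: largest relative error {err:.1%} at F(x) = {at_F:.3f}")
```

For n = 10 the largest relative error found this way should be of the order of 3%, and it shrinks as n increases.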

In hydrological applications concerning the distribution of annual maximum rainfall

or flood, it may be assumed that the number of values of Yi (e.g., the number of

storms or floods per year), whose maximum is the variable of interest X (e.g. the

maximum rainfall intensity or flood discharge), is not constant. Besides, the Poisson

model can be regarded as an acceptable approximation for such applications. Given

also the small difference between (3) and (2), it can be concluded that (3) should be

regarded as an appropriate model for the practical hydrological applications discussed

in this article.


The exact distributions (2) or (3), whose evaluation requires the parent distribution to

be known, have rarely been used in hydrological statistics. Instead, hydrological

applications have made wide use of asymptotes or limiting extreme value distribu-

tions, which are obtained from the exact distributions when n tends to infinity. Gum-

bel (1958) developed a comprehensive theory of extreme value distributions.

According to this, as n tends to infinity $H_n(x)$ converges to one of three possible asymptotes, depending on the mathematical form of F(x). Obviously, the same limiting distributions may also result from $H'_\nu(x)$ as ν tends to infinity. All three asymptotes can be described by a single mathematical expression introduced by Jenkinson (1955, 1969), which has become known as the general extreme value (GEV) distribution.

This expression is

$$H(x) = \exp\left\{-\left[1 + \kappa\left(\frac{x}{\lambda} - \psi\right)\right]^{-1/\kappa}\right\}, \qquad \kappa x \ge \kappa\lambda(\psi - 1/\kappa) \qquad (4)$$

where ψ, λ > 0 and κ are location, scale and shape parameters, respectively; ψ and κ

are dimensionless whereas λ has same units as x. (Note that the sign convention of κ

in (4) may differ in some hydrological texts). Leadbetter (1974) showed that this

holds not only for maxima of independent random variables but for dependent random

variables, as well, provided that there is no long-range dependence of high-level

exceedences.

When κ = 0, the type I distribution of maxima (EV1 or Gumbel distribution) is

obtained. Using simple calculus it is found that in this case, (4) takes the form

$$H(x) = \exp[-\exp(-x/\lambda + \psi)] \qquad (5)$$

which is unbounded both from above and from below ($-\infty < x < +\infty$).

When κ > 0, H(x) represents the extreme value distribution of maxima of type II

(EV2). In this case the variable is bounded from below and unbounded from above

(λψ – λ/κ ≤ x < +∞). A special case is obtained when the left bound becomes zero (ψ =

1/κ). This special two-parameter distribution is

$$H(x) = \exp\left\{-\left(\frac{\lambda}{\kappa x}\right)^{1/\kappa}\right\}, \qquad x \ge 0 \qquad (6)$$

In some texts, (6) is referred to as the EV2 distribution. Here, as in Gumbel (1958),

the name EV2 distribution is used for the complete three-parameter form (equation

(4)) with κ > 0. Distribution (6) is referred to as the Fréchet distribution.

When κ < 0, H(x) represents the type III (EV3) distribution of maxima. This, how-

ever, is of no practical interest in hydrology as it refers to random variables bounded

from above (–∞ < x ≤ λψ – λ/κ). As discussed in the introduction, many regard an

upper bound in hydrological quantities as reasonable. Even Jenkinson (1955) regards

the EV3 distribution as “the most frequently found in nature, since it is reasonable to

expect the maximum values to have an upper bound”. However, he leaves out rainfall

from this conjecture saying “to a considerable extent rainfall amounts are ‘uncon-

trolled’ and high falls may be recorded”. In fact, he proposes the EV2 distribution for

rainfall (note that he uses a different convention, referring to EV2 as type I). In a

recent study, Sisson et al. (2006), even though detecting EV2 behaviour of rainfall maxima, attempt to incorporate the idea of a PMP upper bound within an EV2 modelling framework (see also Francés, this volume).

The simplicity of the above mathematical expressions is remarkable. This extends to

the inverse function $x(H) \equiv x_H$ that is used to estimate a distribution quantile for a given non-exceedence probability H. This is

$$x_H = (\lambda/\kappa)[\exp(\kappa z_H) - 1] + \lambda\psi \qquad (7)$$

where $z_H$ is the so-called Gumbel reduced variate, defined as

$$z_H := -\ln(-\ln H) \qquad (8)$$

For the Gumbel distribution, (7) takes the special form

$$x_H = \lambda(z_H + \psi) \qquad (9)$$

which implies a linear plot of $x_H$ versus $z_H$ (a plot known as the Gumbel probability plot). For the Fréchet distribution, (7) takes the form

$$x_H = \lambda\psi \exp(\kappa z_H) \qquad (10)$$

which implies a linear plot of $\ln x_H$ versus $z_H$ (a plot referred to as the Fréchet probability plot).
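Equations (4), (5) and (7) to (9) translate almost line by line into code. The sketch below is one possible transcription, using the sign convention of κ adopted here; the function names and the parameter values (λ = 10, ψ = 3, κ = 0.15) are hypothetical and chosen only for illustration. It evaluates a quantile and then feeds it back into the distribution function as a consistency check.

```python
import math

def gev_cdf(x, kappa, lam, psi):
    """Equation (4); falls back to the Gumbel form (5) when kappa == 0."""
    if kappa == 0.0:
        return math.exp(-math.exp(-x / lam + psi))
    base = 1.0 + kappa * (x / lam - psi)
    if base <= 0.0:
        # Outside the support: below the lower bound (kappa > 0)
        # or above the upper bound (kappa < 0).
        return 0.0 if kappa > 0.0 else 1.0
    return math.exp(-base ** (-1.0 / kappa))

def reduced_variate(H):
    """Gumbel reduced variate z_H of equation (8)."""
    return -math.log(-math.log(H))

def gev_quantile(H, kappa, lam, psi):
    """Equation (7); falls back to the Gumbel form (9) when kappa == 0."""
    z = reduced_variate(H)
    if kappa == 0.0:
        return lam * (z + psi)
    return (lam / kappa) * math.expm1(kappa * z) + lam * psi

# Hypothetical parameter values, chosen only for illustration.
lam, psi = 10.0, 3.0
for kappa in (0.0, 0.15):
    x = gev_quantile(0.99, kappa, lam, psi)
    H_back = gev_cdf(x, kappa, lam, psi)
    print(f"kappa = {kappa}: x_0.99 = {x:.1f}, H(x_0.99) = {H_back:.4f}")
```

Treating κ = 0 as a separate branch avoids a division by zero; for very small nonzero κ the general expression approaches the Gumbel one, as the limit leading to equation (5) implies.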

The close relationship between the distribution of maxima H(x) and the tail of the

parent distribution F(x) allows for the determination of the latter if the former is

known. The tail of F(x) can be represented by the distribution of x conditional on

being greater than a certain threshold ξ, i.e. $G_\xi(x) := F(x \mid x > \xi)$, for which:

$$1 - G_\xi(x) = \frac{1 - F(x)}{1 - F(\xi)}, \qquad x \ge \xi \qquad (11)$$

If one chooses ξ so that the exceedence probability 1 – F(ξ) equals 1/ν, the reciprocal

of the mean number of events in a year (this is implied when the partial duration

series is formed from a time series of measurements, by choosing a number of events

equal to the number of years of record), and denote G(x) the conditional distribution

for this specific value, then:

$$1 - G(x) = \nu[1 - F(x)] \qquad (12)$$

Combining equation (12) with equation (3) it is obtained that:

$$G(x) = 1 + \ln H'_\nu(x) \qquad (13)$$

If $H'_\nu(x)$ is given by the limit distribution H(x) in equation (4), then it is concluded that for κ > 0:

$$G(x) = 1 - \left[1 + \kappa\left(\frac{x}{\lambda} - \psi\right)\right]^{-1/\kappa}, \qquad x \ge \lambda\psi \qquad (14)$$

which is the generalized Pareto distribution. Similarly, for κ = 0:

$$G(x) = 1 - \exp(-x/\lambda + \psi), \qquad x \ge \lambda\psi \qquad (15)$$

which is the exponential distribution. For the special case ψ = 1/κ:

$$G(x) = 1 - \left(\frac{\lambda}{\kappa x}\right)^{1/\kappa}, \qquad x \ge \lambda/\kappa \qquad (16)$$

In this way, a one to one correspondence between the type of the extreme value distri-

bution and the type of the tail of the parent distribution is established. The EV1 distri-

bution (κ = 0, equation (5)) corresponds to an exponential parent distribution tail

(equation (15)), else known as short tail, or light tail. The EV2 distribution (κ > 0,

equation (4) including the special case (6)) corresponds to an over-exponential parent

distribution tail (equation (14) including the special case (16)), else known as hyper-

exponential tail, Pareto tail, power-law tail, algebraic tail, long tail, heavy tail and fat

tail.
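The correspondence between the distribution of maxima and the parent tail can be verified numerically. The sketch below applies equation (13) to the limiting distribution H(x) and compares the result with the closed forms (14) and (15); it then prints the exceedence probability 1 − G(x) of the two tail types side by side. All function names and the parameter values (λ = 10, ψ = 3, κ = 0.15) are hypothetical, used only for illustration.

```python
import math

def gev_cdf(x, kappa, lam, psi):
    """Limiting distribution of maxima: equation (4), or (5) when kappa == 0."""
    if kappa == 0.0:
        return math.exp(-math.exp(-x / lam + psi))
    return math.exp(-(1.0 + kappa * (x / lam - psi)) ** (-1.0 / kappa))

def tail_from_maxima(x, kappa, lam, psi):
    """Parent tail G(x) obtained from H(x) through equation (13)."""
    return 1.0 + math.log(gev_cdf(x, kappa, lam, psi))

def tail_closed_form(x, kappa, lam, psi):
    """Generalized Pareto (14) for kappa > 0, exponential (15) for kappa == 0."""
    if kappa == 0.0:
        return 1.0 - math.exp(-x / lam + psi)
    return 1.0 - (1.0 + kappa * (x / lam - psi)) ** (-1.0 / kappa)

# Hypothetical values; both tails start at x = lam * psi = 30.
lam, psi = 10.0, 3.0
for x in (40.0, 80.0, 160.0):
    exp_tail = 1.0 - tail_closed_form(x, 0.0, lam, psi)
    par_tail = 1.0 - tail_closed_form(x, 0.15, lam, psi)
    print(f"x = {x:5.0f}: 1 - G exponential = {exp_tail:.2e}, Pareto = {par_tail:.2e}")
```

With these illustrative parameters the two tails are close just above the threshold, while far into the tail the power-law exceedence probability is larger by orders of magnitude.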

From the distribution functions H(x) and G(x), two return periods can be defined as

follows:

$$T := \delta / [1 - G(x)], \qquad T' := \delta / [1 - H(x)] \qquad (17)$$

where δ is the mean interarrival time of an event that is represented by the variable X.

In both cases X represents annual values, so δ = 1 year; δ is most commonly omitted

but here we kept it for dimensional consistency, given that the return period has units

of time, typically expressed in years.

Equation (16) is precisely a power law relationship between the distribution quantile x

and the return period T:

$$x = (\lambda/\kappa)(T/\delta)^\kappa \qquad (18)$$

In the generalized Pareto case (equation (14)), the corresponding relationship is

$$x = (\lambda/\kappa)[(T/\delta)^\kappa - 1 + \kappa\psi] \qquad (19)$$

whereas in the exponential case the corresponding relationship is

$$x = \lambda[\ln(T/\delta) + \psi] \qquad (20)$$
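The three quantile relationships (18) to (20) can be compared with a few lines of code. The sketch below is a direct transcription under assumed, purely illustrative parameter values (λ = 10, ψ = 3, κ = 0.15, δ = 1 year); the function names are hypothetical.

```python
import math

DELTA = 1.0  # mean interarrival time delta, in years

def quantile_power_law(T, kappa, lam):
    """Equation (18): tail (16), i.e. the special case psi = 1/kappa."""
    return (lam / kappa) * (T / DELTA) ** kappa

def quantile_pareto(T, kappa, lam, psi):
    """Equation (19): generalized Pareto tail (14)."""
    return (lam / kappa) * ((T / DELTA) ** kappa - 1.0 + kappa * psi)

def quantile_exponential(T, lam, psi):
    """Equation (20): exponential tail (15)."""
    return lam * (math.log(T / DELTA) + psi)

# Hypothetical values, for illustration only.
lam, psi, kappa = 10.0, 3.0, 0.15
for T in (5, 10, 100, 1000, 10000):
    x_par = quantile_pareto(T, kappa, lam, psi)
    x_exp = quantile_exponential(T, lam, psi)
    print(f"T = {T:>5} years: Pareto x = {x_par:6.1f}, exponential x = {x_exp:6.1f}")
```

Setting ψ = 1/κ in equation (19) reproduces equation (18), and letting κ approach zero reproduces equation (20), which is a quick way to check that the three expressions are mutually consistent.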

3. The dominance of the Gumbel distribution

Due to their simplicity and generality, the limiting extreme value distributions have

become very widespread in hydrology. In particular, EV1 has been by far the most

popular model. In hydrological education it is so prevalent that most textbooks contain

the EV1 distribution only, omitting EV2. In hydrological engineering studies, espe-

cially those analysing rainfall maxima, the use of EV1 has become so common that its

adoption is almost automatic, without any reasoning or comparison with other possi-

ble models. Sometimes, it is also suggested, or even required, by the guidelines or

regulations of several organizations, institutes and country services. Historically,

several reasons have contributed to the prevalence of the Gumbel distribution:

Theoretical reasons. Most types of parent distribution functions that are used in

hydrology, such as exponential, gamma, Weibull, normal, lognormal, and the EV1

itself (e.g. Kottegoda and Rosso, 1997) belong to the domain of attraction of the

Gumbel distribution. In contrast, the domain of attraction of the EV2 distribution

includes parent distributions such as Pareto, Cauchy, log-gamma (also called log-

Pearson type 3), and the EV2, which traditionally are not in very common use in

hydrology, particularly in rainfall modelling.

Simplicity. The mathematical handling of the two-parameter EV1 is simpler than that

of the three-parameter EV2.

Accuracy of estimated parameters. Obviously, two parameters are more accurately

estimated than three. For the former case, mean and standard deviation (or second L-

moment) suffice, whereas in the latter case the skewness is also required and its esti-

mation is extremely uncertain for typical small-size hydrological samples.

Practical reasons. Probability plots are the most common tools used by practitioners,

engineers and hydrologists, to choose an appropriate distribution function. As

explained earlier, EV1 offers a linear Gumbel probability plot of observed xH versus

observed zH (which is estimated in terms of plotting positions, i.e. sample estimates of

probability of non-exceedence). In contrast, a linear probability plot for the three-

parameter EV2 is not possible to construct (unless the shape parameter κ is fixed).

This may be regarded as a primary reason for choosing EV1 over the three-parameter EV2 in practice. For the two-parameter EV2 (Fréchet) distribution, a linear plot

(ln xH versus zH) is possible as discussed earlier. However, empirical evidence shows

that, in most cases, plots of xH versus zH give more straight-line arrangements than

plots of ln xH versus zH.
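The construction of the probability plots mentioned above can be sketched as follows. The example assumes the Weibull plotting position i/(n + 1) as the sample estimate of the non-exceedence probability (one common choice among several, and an assumption of this sketch), and it uses a made-up sample rather than real observations.

```python
import math

def gumbel_plot_points(sample):
    """Return (z_H, x_H) pairs for a Gumbel probability plot.

    Non-exceedence probabilities are estimated with the Weibull plotting
    position i / (n + 1); this choice is an assumption of the sketch.
    """
    ordered = sorted(sample)
    n = len(ordered)
    points = []
    for i, x in enumerate(ordered, start=1):
        H = i / (n + 1)
        z = -math.log(-math.log(H))  # Gumbel reduced variate, equation (8)
        points.append((z, x))
    return points

# Hypothetical annual maximum daily rainfall depths in mm (not real data).
annual_maxima = [42.0, 55.5, 38.2, 71.0, 49.3, 60.8, 95.4, 44.1, 52.7, 66.9]
points = gumbel_plot_points(annual_maxima)
for z, x in points:
    print(f"z_H = {z:6.2f}   x_H = {x:5.1f}   ln x_H = {math.log(x):.2f}")
```

Plotting the second column against the first gives the Gumbel probability plot; plotting the third column against the first gives the Fréchet probability plot.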

From a practical point of view, the choice of an EV1 over an EV2 distribution may be

immaterial if small return periods T are considered. For instance, in typical storm

sewer networks, designed on the basis of return periods of about 5-10 years, the difference between the two distributions is negligible; besides, for such return periods even

interpolation from the empirical distribution would suffice. However, for large T (>

50 years), for which extrapolation is required, EV1 results in probability of

exceedence of a certain value significantly lower than EV2. That is, for large rainfall

depths, EV1 yields the lowest possible probability of exceedence (the highest possible

T) in comparison to those of EV2 for any value of κ. For T > 1000 years, the return period

estimated by EV1 could be orders of magnitude higher than that of EV2 (see Figure 3

and its discussion in section 5).
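A rough numerical illustration of this divergence follows. The sketch computes the return period T΄ of equation (17) for the same rainfall depth under EV1 and EV2. It assumes, for simplicity, that both distributions share the same hypothetical λ = 10 and ψ = 3 and that κ = 0.15 for EV2; in a real study each distribution would be fitted to data, so the numbers indicate only the order of magnitude of the effect.

```python
import math

def neg_log_H(x, kappa, lam, psi):
    """-ln H(x) from equation (4), or from (5) when kappa == 0."""
    if kappa == 0.0:
        return math.exp(-x / lam + psi)
    return (1.0 + kappa * (x / lam - psi)) ** (-1.0 / kappa)

def return_period(x, kappa, lam, psi, delta=1.0):
    """T' = delta / [1 - H(x)], equation (17), with delta in years."""
    exceedence = -math.expm1(-neg_log_H(x, kappa, lam, psi))
    return delta / exceedence

# Hypothetical parameters shared by both models (an assumption of this sketch).
lam, psi, kappa = 10.0, 3.0, 0.15
for x in (50.0, 100.0, 150.0):
    T_ev1 = return_period(x, 0.0, lam, psi)
    T_ev2 = return_period(x, kappa, lam, psi)
    print(f"x = {x:5.0f}: T(EV1) = {T_ev1:10.0f} years, T(EV2) = {T_ev2:7.0f} years")
```

With these assumed values the two models give similar return periods for moderate depths, while for the largest depth the EV1 return period exceeds the EV2 one by more than two orders of magnitude.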

This should be regarded as a strong disadvantage of EV1 from the engineering point

of view. Normally, this would be a sufficient reason to avoid the use of EV1 in engi-

neering studies. Obviously, this disadvantage of EV1 would be counterbalanced only

by strong empirical evidence and theoretical reasoning. In practice, the small size of

common hydrological records (e.g. a few tens of years) cannot provide sufficient

empirical evidence for preferring EV1 over EV2. This will be discussed further in

section 5. In addition, the theoretical reasons, exhibited above, are not strong enough

to justify the adoption of the Gumbel distribution. This will be discussed in section 4.


4. Theoretical justification of the distribution type

As discussed above, the rainfall process at fine time scales (hourly, daily) has been

modelled by distributions belonging to the domain of attraction of EV1 such as

gamma or Weibull. However, the adoption of these distributions is rather empirical,

not based on theoretical reasoning. Thus, the above theoretical justification of the

EV1 distribution is inconsistent. In contrast, recently three arguments have been for-

mulated that favour the EV2 over the EV1 distribution, which are summarized below.

Argument 1: Asymptotic vs. actual distribution. What matters in hydrological

applications, is the actual distribution of maxima, i.e. Hn(x) or H΄n(x) as given in (2) or

(3), respectively. The asymptotic distribution H(x) for n → ∞ provides a useful indica-

tion of the behaviour in the tails but not necessarily a model for practical use. It has

been observed (Koutsoyiannis, 2004a) that the convergence of Hn(x) to H(x) may be

enormously slow. This is demonstrated in Figure 1, which depicts Gumbel probability

plots of the exact distribution functions of maxima Hn(x) for n = 10^3 and 10^6 for a

parent distribution function that is Weibull (F(y) = 1 – exp(–y^k)) with shape parameter

k = 0.5. The parent distribution belongs to the domain of attraction of the Gumbel

limiting distribution, so the Gumbel probability plot tends to a straight line as n → ∞.

However, even for n as high as 10^6 the curvature of the distribution function is

apparent. Obviously, in hydrological applications, such a high number of events within,

say, a year, is not realistic (it can be expected that the number of storms or floods in a

location will not exceed the order of 10 to 10^2). Thus, the limiting distribution for n → ∞

may not be useful. The slow convergence in this case should be contrasted with the fast

convergence in other limiting situations; for example, the convergence of the distribution of the

sum of a number of variables to the normal distribution, according to the central limit theorem,

is very fast, so that about 10-30 events suffice to obtain an almost perfect

approximation to the normal distribution.
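The slow convergence can be checked directly from the exact distribution of maxima, Hn(x) = [F(x)]^n. The sketch below, which is only an illustration and not the computation behind Figure 1, evaluates the quantile of Hn for a Weibull parent with k = 0.5 as a function of the Gumbel reduced variate z = –ln(–ln H), and compares the local slope of the curve at two values of z. A ratio of 1 would indicate a straight line on a Gumbel plot. The function names and the chosen values of z are assumptions made for the example.

```python
import math

def max_quantile(z, n, k=0.5):
    """Quantile x of H_n(x) = F(x)**n at Gumbel reduced variate z,
    for a Weibull parent F(y) = 1 - exp(-y**k)."""
    log_H = -math.exp(-z)              # ln H, from z = -ln(-ln H)
    tail = -math.expm1(log_H / n)      # 1 - H**(1/n), computed without cancellation
    return (-math.log(tail)) ** (1.0 / k)

def slope_ratio(n, z_low=0.0, z_high=6.0, dz=1e-3):
    """Ratio of the slopes dx/dz at z_high and z_low (central differences)."""
    def slope(z):
        return (max_quantile(z + dz, n) - max_quantile(z - dz, n)) / (2.0 * dz)
    return slope(z_high) / slope(z_low)

for n in (10**3, 10**6):
    print(n, round(slope_ratio(n), 2))
```

For k = 0.5 the quantile behaves approximately as (z + ln n)^2, so the slope ratio is close to (6 + ln n)/ln n: roughly 1.9 for n = 10^3 and still about 1.4 for n = 10^6, which is the persistent curvature described above.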

Let us assume that the Weibull distribution (which belongs to the domain of attraction

of EV1) with shape parameter smaller than 1 (e.g. k = 0.5 as in the example of Figure

1) can be a plausible parent distribution of storms and floods at a fine time scale,

which is known to be positively skewed and with J-shaped density function. Accord-

ingly, as observed in Figure 1, the probability plot of the exact distribution of maxima

should be a convex curve, rather than a straight line, which indicates that, for a rela-

tively small n, a three-parameter EV2 distribution may approximate sufficiently the

exact distribution. Thus, even if the parent distribution belongs to the domain of

attraction of the Gumbel distribution, an EV2 distribution can be a choice better than

EV1.

Argument 2: Change of domain of attraction due to parameter changes. In argu-

ment 1 it was assumed that the random variables Yi whose maximum values are stud-

ied are independent and identically distributed ones. However, it is more plausible to

assume that different Yi have the same type of distribution function Fi(y) but with

different parameters. The statistical characteristics (e.g., averages, standard deviations

etc.) and, consequently, the parameters of distribution functions exhibit seasonal

variation. In addition, evidence from long geophysical records shows that there exist

random fluctuations of the statistical properties on multiple large time scales (e.g.,

tens of years, hundreds of years, etc.).

In this respect, it has been shown theoretically that a gamma parent distribution,


which belongs to the domain of attraction of EV1, switches to the EV2 domain of

attraction if its scale parameter varies randomly following another gamma distribution

function (Koutsoyiannis, 2004a). This point was also made by Katz et al. (2005) for

an exponential parent distribution, which is a special case of the gamma distribution

function. In addition, it was demonstrated using Monte Carlo simulations (Koutsoy-

iannis, 2004a) that a gamma parent distribution function with constant shape parame-

ter and scale parameter shifting between two values, which are sampled at random

with specified probabilities, results in an actual (for n = 5) extreme value distribution

which is closely approximated by an EV2 distribution, whereas the EV1 distribution

departs significantly from the simulated actual distribution.
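The switch of domain of attraction can be illustrated for the simplest special case, an exponential parent. The sketch below is an assumption-laden example rather than a reproduction of the cited studies: it treats the varying parameter as the rate λ of the exponential distribution F(y) = 1 – exp(–λy) and lets λ follow a gamma distribution with shape `shape` and scale `rate_scale`. Under that assumption the mixture has the Pareto-type exceedance probability (1 + rate_scale·y)^(–shape), which the simulation compares with the empirical frequency. All names and parameter values are hypothetical.

```python
import math
import random

def simulate_mixed_exponential(n_samples, shape, rate_scale, seed=1):
    """Exponential variates whose rate is redrawn from a gamma distribution each time."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        rate = rng.gammavariate(shape, rate_scale)  # second argument is the gamma scale
        samples.append(rng.expovariate(rate))       # argument is the exponential rate
    return samples

def empirical_exceedance(samples, y):
    """Fraction of samples larger than y."""
    return sum(1 for s in samples if s > y) / len(samples)

def pareto_exceedance(y, shape, rate_scale):
    """Analytical exceedance probability of the gamma-mixed exponential."""
    return (1.0 + rate_scale * y) ** (-shape)

KAPPA = 0.15                    # tail index used elsewhere in the text
SHAPE = 1.0 / KAPPA             # gamma shape giving a Pareto tail with this kappa
RATE_SCALE = KAPPA              # makes the mean rate equal to 1
data = simulate_mixed_exponential(200_000, SHAPE, RATE_SCALE)
print(empirical_exceedance(data, 5.0), pareto_exceedance(5.0, SHAPE, RATE_SCALE), math.exp(-5.0))
```

The last printed value is the exceedance probability of a plain exponential with the same mean rate; the mixed distribution places noticeably more probability in the tail.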

Argument 3: Principle of maximum entropy. The principle of maximum entropy is

a well established mathematical and physical principle, defined on grounds of prob-

ability theory, that can infer the detailed structure or behaviour of a system from

rough (macroscopical) information of the system. For a stochastic system, the princi-

ple can determine the distribution function of the system states, from assumed macro-

scopical constraints (e.g. moments) of the system. The classical definition of entropy

φ, known as the Boltzmann-Gibbs-Shannon entropy, is

$$\varphi := E[-\ln f(Y)] = -\int_{-\infty}^{\infty} f(y) \ln f(y)\, dy \qquad (9)$$

where f(y) := dF(y)/dy denotes the probability density function of the parent variable

and E[.] denotes expectation.

In a recent study, Koutsoyiannis (2005a) has shown that the principle of maximum

entropy can predict and explain the distribution functions of hydrological variables

using only two “macroscopic” statistical properties of observed time series (equality

constraints), the mean µ and the standard deviation σ, as well as the inequality con-

straint that the variables under study are non-negative quantities. For variables with

high variation (σ/µ > 1) the classical entropy φ fails to apply with these constraints. In

this case, a generalized definition of entropy, due to Tsallis (1988, 2004) should be

used instead. This is

$$\varphi_q = \frac{1 - \int_{0}^{\infty} [f(x)]^{q}\, dx}{q - 1} \qquad (17)$$

and precisely reproduces φ in the limit q → 1. Maximization of φq with the aforementioned

constraints results in Pareto tail of the parent distribution with shape parameter κ =

(1 – q)/q. Now, there is sufficient empirical evidence that at small time scales rainfall

exhibits high variation (σ/µ > 1). In this case, maximization of Tsallis entropy yields

power-type (Pareto) distribution.
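The relation between the two entropy definitions can be verified numerically on a simple case. The sketch below is an illustration only: it uses an exponential density with mean µ, for which the integral of [f(x)]^q equals µ^(1–q)/q and the classical entropy equals 1 + ln µ (both closed forms are derived here for the example and do not appear in the chapter). It shows that the Tsallis entropy approaches the classical value as q approaches 1, and evaluates the tail shape parameter κ = (1 – q)/q quoted in the text. Function names are hypothetical.

```python
import math

def classical_entropy_exponential(mu):
    """Boltzmann-Gibbs-Shannon entropy of an exponential density with mean mu."""
    return 1.0 + math.log(mu)

def tsallis_entropy_exponential(mu, q):
    """Tsallis entropy of the same density, valid for q > 0 and q != 1."""
    integral = mu ** (1.0 - q) / q   # closed form of the integral of f(x)**q
    return (1.0 - integral) / (q - 1.0)

def tail_shape(q):
    """Shape parameter of the Pareto tail, kappa = (1 - q)/q."""
    return (1.0 - q) / q

mu = 2.0
for q in (0.9, 0.99, 0.999999):
    print(q, tsallis_entropy_exponential(mu, q), classical_entropy_exponential(mu))
print(tail_shape(1.0 / 1.15))  # the q that corresponds to kappa = 0.15
```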

5. Empirical justification of the distribution type of extreme rainfall

In seeking empirical evidence to justify the distribution type, one must be aware of

bias in statistical estimations and error probability in statistical tests that emerge from

typical hydrological samples. In fact, estimation bias and error probability are very

large and this explains why the inappropriateness of the EV1 distribution was not

understood for so many years. Specifically, typical annual maximum rainfall series with

record lengths 20–50 years completely hide the EV2 distribution and display EV1

behaviour. This was initially demonstrated by Koutsoyiannis and Baloutsos (2000)

using an annual series of maximum daily rainfall in Athens, Greece, extending

through 1860–1995 (136 years). This series was found to follow EV2 distribution, but

if smaller parts of the series were analysed, the EV1 distribution seemed to be an

appropriate model.

A systematic Monte Carlo simulation study to address this problem has been done in

Koutsoyiannis (2004a). Some of the results, concerning the estimation bias, are

depicted in Figure 2. A negative bias, defined as estimated κ minus true κ, is apparent,

for both the moments and L-moments estimators. It can be observed that for true κ =

0.15 (a value that is typical for extreme rainfall, as will be discussed later) and for a

record length of 20 years the bias of the method of moments is –0.15, which means

that the estimated κ will be zero! Even for a record length of 50 years the negative

bias is high (b = –0.12), so that κ will be estimated at 0.03, a value that will not give

good reason for preferring EV2 to EV1.
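The sign and rough size of this bias can be reproduced with a small Monte Carlo experiment. The sketch below is not the simulation of Koutsoyiannis (2004a); it is a simplified stand-in that draws samples from a standardized GEV distribution with true κ = 0.15, computes the sample skewness with the plain (uncorrected) moment estimator, and converts it to κ with the approximate relation given later as equation (24). The number of replications, the seed and the choice of skewness estimator are assumptions, so the resulting figures need not coincide with those of Figure 2.

```python
import math
import random

def sample_gev(n, kappa, rng):
    """Draw n values from a standardized GEV (zero location, unit scale) by inversion."""
    values = []
    for _ in range(n):
        u = rng.uniform(1e-12, 1.0 - 1e-12)  # keep away from 0 and 1
        values.append(((-math.log(u)) ** (-kappa) - 1.0) / kappa)
    return values

def skewness(xs):
    """Plain moment estimator of the skewness coefficient (no bias correction)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def kappa_from_skewness(cs):
    """Approximate moment-based estimate of kappa, equation (24)."""
    a = 0.91 * cs
    return 1.0 / 3.0 - 1.0 / (0.31 + a + math.sqrt(a * a + 1.8))

def mean_bias(true_kappa, record_length, replications=2000, seed=1):
    """Average of (estimated kappa - true kappa) over many synthetic records."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replications):
        cs = skewness(sample_gev(record_length, true_kappa, rng))
        total += kappa_from_skewness(cs)
    return total / replications - true_kappa

for years in (20, 50):
    print(years, round(mean_bias(0.15, years), 3))
```

The experiment yields a clearly negative mean bias for both record lengths, with the shorter record being the worse of the two.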

The situation is improved if L-moments estimators are used as the resulting bias is

much lower. However the method of L-moments is relatively new (Hosking et al.,

1985; Hosking, 1990) and its use has not been very common so far. In addition, even

the method of L-moments is susceptible to type II error (no rejection of the false null

hypothesis of an EV1 distribution against the true alternative hypothesis of an EV2

distribution) with a high probability. As demonstrated in Koutsoyiannis (2004a) for κ =

0.15 and record length 20 years the frequency of not rejecting the EV1 distribution is

80%. Even for record length 50 years this frequency is high: 62%.

The results of this analysis show that (a) only long records (e.g. 100 years or more)

could provide evidence of the distribution type of extreme rainfall, and (b) even with

these records, the estimation of the shape parameter κ of the GEV distribution is

highly uncertain, and an ensemble of many records should be used to obtain a reliable

estimate.

In this respect, Koutsoyiannis (2004b) compiled an ensemble of annual maximum

daily rainfall series from 169 stations of the Northern Hemisphere (28 from Europe

and 141 from the USA) roughly belonging to six major climatic zones. All series had

lengths from 100 to 154 years, the top three (in terms of length) being Florence,

Genoa and Athens, with record lengths 154, 148 and 143 years respectively. The

empirical distribution of one of the stations (Athens, Greece) is shown in Figure 3, on

Gumbel probability plot, along with the theoretical EV2 and EV1 distributions fitted

by several methods. The plot clearly shows that (a) the EV2 distribution fits the

empirical one better than the EV1 distribution; (b) for the highest observed daily rainfall

(~150 mm), EV2 and EV1 assign return periods of ~200 and ~1000 years (differing

by a factor of 5), respectively; and (c) for a rainfall depth of ~220 mm, EV2 and EV1 assign

return periods of ~1000 and ~100 000 years (differing by two orders of magnitude),

respectively. These observations demonstrate how important the correct choice of the

theoretical model is and how much the EV1 distribution overestimates the return

period (that is, underestimates the probability of exceedance) of extreme rainfall.

In addition, a PMP value, estimated by Hershfield’s method is also plotted in Figure

3. As discussed above, this value should not be regarded as an upper bound of rainfall


but just as a value with high return period. It turns out from Figure 3 that the return

period of this PMP value is around 50 000 years. It may be useful to mention that the

aforementioned critical revisit (Koutsoyiannis, 1999) of Hershfield’s data set, on

which his method was based, revealed that Hershfield’s PMP should be regarded as a

rainfall value with return period of about 60 000 years.

These findings are representative of a general behaviour of all 169 rainfall records. In

fact, in more than 90% of the records the values of κ estimated by the methods of maximum

likelihood and L-moments were positive. The small percentage of non-positive κ in

the remaining records is fully explained as a statistical sampling effect. This provides

sufficient support for a general applicability of the EV2 distribution worldwide.

Furthermore, the ensemble of all samples was analysed in combination and it was found

that several dimensionless statistics, including the coefficient of variation of the

annual maximum series, are virtually constant worldwide, except for an error that can

be attributed to a pure statistical sampling effect. This enabled the formation of a

compound series of annual maxima, after standardization by mean, for all 169 sta-

tions. The empirical distribution of the compound series is shown in Figure 4, on

Gumbel probability plot, along with the theoretical EV2 and EV1 distributions fitted

by several methods. The plot clearly shows that the EV2 distribution fits the empirical

one whereas the EV1 distribution is totally inappropriate. The compound series also

supported the estimation of a unique κ for all stations, which was found to be 0.15.

The same data set was revisited in Koutsoyiannis (2005a) in a framework investigat-

ing the applicability of the maximum entropy principle in hydrology. In this case,

instead of series of annual maxima, the series-above-threshold were constructed for

168 out of 169 records (in the Athens case only the annual maximum values were

available, and thus the construction of a series-above-threshold was not possible). All

series were standardized by their mean and merged in one sample with length 17 922

station-years. The empirical distribution of this sample is depicted in Figure 5 (double

logarithmic plot), where values lower than 0.79 are not shown, as this number is the

lowest value of the merged series-above-threshold. In addition, several theoretical

distribution functions are also plotted. Among these, the Pareto distribution is

obtained by the maximum entropy principle for coefficient of variation σ/µ = 1.19.

The agreement of the Pareto distribution with the empirical one is remarkable. The

Pareto distribution is precisely consistent with the EV2 distribution of the annual

maximum, as justified in section 2. The shape parameter of the Pareto distribution, as

obtained by the maximum entropy principle, is 0.15, the same value as the one

obtained by fitting the EV2 distribution in the compound series of annual maximum

rainfall.

Additional empirical evidence with the same conclusions is provided by the aforementioned

data set of Hershfield (1961a). Koutsoyiannis (1999) showed that this is

consistent with the EV2 distribution with κ = 0.13. The plot of Figure 6 (EV2 probability

plot with fixed κ = 0.15, which is further explained in section 7) indicates that the

value κ = 0.15 can be acceptable for that data set too. This enhances the trust that an

EV2 distribution with κ = 0.15 can be thought of as a generalized model appropriate

for mid-latitude areas of the Northern Hemisphere.

Additional empirical evidence pointing in the same direction was provided by Chaouche

(2001) and Chaouche et al. (2002). Chaouche (2001) exploited a database of 200

rainfall series of various time steps (month, day, hour, minute) from the five

continents, each including more than 100 years of data. Using multifractal analyses he

showed that (a) an EV2/Pareto type law describes the rainfall amounts for large return

periods; (b) the exponent of this law is scale invariant over scales greater than an

hour; and (c) this exponent is almost space invariant.

Other studies have also expressed scepticism for the appropriateness of the Gumbel

distribution for the case of rainfall extremes and suggested hyper-exponential tail

behaviour. Thus, Wilks (1993), who investigated empirically several distributions

which are potentially suitable for describing extreme rainfall, using rainfall records of

13 stations in the USA with lengths ranging from 39 to 91 years, noted that EV1 often

underestimates the largest extreme rainfall amounts and suggested an update and

revision of the Technical Paper 40 (Hershfield, 1961b), a widely used climatological

atlas of United States that was compiled fitting EV1 distributions to annual extreme

rainfall data. Coles et al. (2003) and Coles and Pericchi (2003) concluded that infer-

ence based on the Gumbel model to annual maxima may result in unrealistically high

return periods for certain observed events and suggested a number of modifications to

standard methods, among which is the replacement of the Gumbel model with the

GEV model. Mora et al. (2005) confirmed that rainfall in Marseille (a raingauge

included in the study by Koutsoyiannis, 2004b) shows hyper-exponential tail behav-

iour. They also provided two regional studies in the Languedoc-Roussillon region

(south of France) with 15 and 23 gauges, for which they found that a similar distribu-

tion with hyper-exponential tail could be fitted; this, when compared with previous

estimations, leads to a significant increase in the depth of rare rainfall. On the same

lines, Bacro and Chaouche (2006) showed that the distribution of extreme daily rain-

fall at Marseille is not in the Gumbel law domain. Sisson et al. (2006) highlighted the

fact that standard Gumbel analyses routinely assign near-zero probability to subse-

quently observed disasters, and that for San Juan, Puerto Rico, standard 100-year

predicted rainfall estimates may be routinely underestimated by a factor of two.

Schaefer et al. (2006) using the methodology by Hosking and Wallis (1997) for

regional precipitation-frequency analysis and spatial mapping for 24-hour and 2-hour

durations for Washington State, USA, found that the distribution of rainfall

maxima in this State generally follows the EV2 distribution type.

6. The distribution tails in other hydrological processes

The theoretical arguments presented in section 4 that support the EV2 over the EV1

distribution are not related merely to rainfall but rather to any process with high vari-

ability. Thus, it could be expected that other processes should also exhibit a similar

behaviour.

This is the case for flood runoff. In fact, as demonstrated by Koutsoyiannis (2005b, c)

and Gaume (2006), there are theoretical reasons by which we can conclude that the

type of extreme value distribution in rainfall and runoff will be the same. If rainfall

follows the EV2 distribution, then it can be shown that runoff will also follow the

EV2 distribution. Conversely, if runoff follows the EV2 distribution, then rainfall

should necessarily follow the EV2 distribution. Perhaps the EV2 distribution in floods

is easier to verify empirically (due to magnification of the variability of extremes) and

thus the EV1 distribution has not been as standard in flood modelling as it is in rainfall

modelling. Accordingly, the log-gamma model, which belongs to the domain of attraction of

EV2, has been more frequently used in flood modelling. For instance, this model is the


federally adopted approach to flood frequency in the USA (US Water Resources

Council, 1982). But when flood frequency is estimated from rainfall, which is modelled

using the EV1 model, then the flood frequency necessarily becomes consistent

with the EV1, as explained above. Several more recent studies have also supported a

three-parameter GEV over an EV1 distribution for floods (Farquharson et al., 1992;

Madsen et al., 1997).

Similar results have been provided by fractal/multifractal analyses. Thus, Turcotte

(1994) studied flood peaks over threshold in 1200 stations in the United States and

concluded that they follow a fractal law, which essentially is described by equation

(18). Pandey et al. (1998) established power-law distributions for daily mean stream-

flows in 19 river basins in the USA. Similarly, Malamud and Turcotte (2006) exam-

ined six river basins from different climatic regions and hydrologic conditions in the

USA and arrived at power-law distributions using either flood peaks over threshold

or all daily mean streamflows, also considering in some cases paleoflood data.

Naturally, other hydrological processes driven by runoff are anticipated to follow

long-tail distributions, too. However, it may be more difficult to verify empirically the

type of distribution tail in such cases, because instrumental records are typically much

shorter. Nevertheless, reconstructions of time series are possible in some other cases,

for instance, in sediment yield time series from sediment deposits. Thus, Katz et al.

(2005) were able to detect long tail behaviour in the annual sediment yield time series

of Nicolay Lake on Cornwall Island, Canada. In addition, Katz et al. (2005) provide an

excellent review of the tail behaviours of several ecological variables.

7. Practical issues for the application of the EV2 distribution

As discussed in section 3, the simplicity and the two-parameter form of the EV1

distribution are strong points that made it prevail in hydrology. However, if the shape

parameter of the EV2 distribution is fixed (in extreme rainfall κ = 0.15, as discussed

in section 5) the general handling of the distribution becomes as simple as that of the

EV1 distribution. For example, the estimation of the remaining two parameters

becomes similar to that of the EV1 distribution. That is, the scale parameter can be

estimated by the method of moments from:

λ = c1σ (21)

where c1 = κ/√[Γ(1 – 2κ) – Γ²(1 – κ)] or c1 = 0.61 for κ = 0.15, while in the EV1 case c1

= 0.78. The relevant estimate for the method of L-moments is:

λ = c2λ2 (22)

where λ2 is the second L-moment and c2 = κ/[Γ(1 – κ)(2^κ – 1)] or c2 = 1.23 for κ =

0.15, while in the EV1 case c2 = 1.443. The estimate of the location parameter for

both the method of moments and L-moments is:

ψ = µ/λ – c3 (23)

where c3 = [Γ(1 – κ) – 1]/κ or c3 = 0.75 for κ = 0.15, while in the EV1 case c3 = 0.577.
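The three coefficients are straightforward to evaluate. The following sketch, offered only as an illustration, computes c1, c2 and c3 for a given κ, confirms the rounded values quoted for κ = 0.15, and applies equations (21) and (23) to a hypothetical sample mean and standard deviation. The function names and the sample statistics are assumptions.

```python
import math

def c1(kappa):
    """Coefficient of equation (21): scale parameter from the standard deviation."""
    g1 = math.gamma(1.0 - kappa)
    g2 = math.gamma(1.0 - 2.0 * kappa)
    return kappa / math.sqrt(g2 - g1 * g1)

def c2(kappa):
    """Coefficient of equation (22): scale parameter from the second L-moment."""
    return kappa / (math.gamma(1.0 - kappa) * (2.0 ** kappa - 1.0))

def c3(kappa):
    """Coefficient of equation (23): offset used for the location parameter."""
    return (math.gamma(1.0 - kappa) - 1.0) / kappa

def fit_fixed_shape(mean, std, kappa=0.15):
    """Method-of-moments estimates (lambda, psi) with the shape parameter fixed."""
    lam = c1(kappa) * std
    psi = mean / lam - c3(kappa)
    return lam, psi

print(round(c1(0.15), 2), round(c2(0.15), 2), round(c3(0.15), 2))
print(fit_fixed_shape(50.0, 20.0))  # hypothetical mean and standard deviation in mm
```

As κ tends to zero the coefficients tend to the EV1 values, namely √6/π ≈ 0.78, 1/ln 2 ≈ 1.443 and Euler's constant ≈ 0.577.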


If, in addition to λ and ψ, the shape parameter is to be estimated directly from the

sample (which is not advisable but it may be useful for comparisons) the following

approximate equations can be used (Koutsoyiannis, 2004b):

$$\kappa = \frac{1}{3} - \frac{1}{0.31 + 0.91 C_s + \sqrt{(0.91 C_s)^2 + 1.8}} \qquad (24)$$

$$\kappa = 8c - 3c^2, \quad c := \frac{\ln 2}{\ln 3} - \frac{2}{3 + \tau_3} \qquad (25)$$

where Cs and τ3 are the regular and L skewness coefficients, respectively. The former

corresponds to the method of moments and the resulting error is smaller than ±0.01

for –1 < κ < 1/3 (–2 < Cs< ∞). The latter corresponds to the method of L-moments and

the resulting error is smaller than ±0.008 for –1 < κ < 1 (–1/3 < τ3 < 1).
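Both approximations can be checked against known reference points. The sketch below implements equations (24) and (25) and tests them on the Gumbel case (κ = 0, for which Cs ≈ 1.1395 and τ3 = ln(9/8)/ln 2) and on κ = 0.15. The expression used for the theoretical L-skewness of the GEV distribution is taken from the standard L-moment literature, with the sign of the shape parameter adapted to the convention of this chapter; it is an assumption of the example, as are the function names.

```python
import math

def kappa_from_skewness(cs):
    """Equation (24): shape parameter from the regular skewness coefficient."""
    a = 0.91 * cs
    return 1.0 / 3.0 - 1.0 / (0.31 + a + math.sqrt(a * a + 1.8))

def kappa_from_l_skewness(tau3):
    """Equation (25): shape parameter from the L-skewness coefficient."""
    c = math.log(2.0) / math.log(3.0) - 2.0 / (3.0 + tau3)
    return 8.0 * c - 3.0 * c * c

def gev_l_skewness(kappa):
    """Theoretical L-skewness of the GEV distribution (kappa != 0, kappa < 1)."""
    return 2.0 * (1.0 - 3.0 ** kappa) / (1.0 - 2.0 ** kappa) - 3.0

GUMBEL_CS = 1.1395
GUMBEL_TAU3 = math.log(9.0 / 8.0) / math.log(2.0)

print(kappa_from_skewness(GUMBEL_CS), kappa_from_l_skewness(GUMBEL_TAU3))
print(kappa_from_l_skewness(gev_l_skewness(0.15)))
```

In the Gumbel case equation (25) returns zero up to rounding error, because 3 + τ3 then equals 2 ln 3/ln 2, while equation (24) returns a value within the stated tolerance of ±0.01.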

The construction of linear probability plots is also easy if κ is fixed. It suffices to

replace in the horizontal axis the Gumbel reduced variate zH = –ln(–ln H) (equation

(8)) with the GEV reduced variate zH = [(–ln H)^(–κ) – 1]/κ. An example of such a plot is

depicted in Figure 6.
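The coordinates of such a plot can be produced in a few lines. The sketch below is a minimal illustration: it computes the GEV reduced variate for fixed κ and pairs it with the sorted sample values. The empirical non-exceedance probability is taken here as i/(n + 1), which is an assumption of the example; the chapter does not state which plotting position is used. The sample values and function names are hypothetical.

```python
import math

def gumbel_reduced_variate(H):
    """Gumbel reduced variate, z = -ln(-ln H)."""
    return -math.log(-math.log(H))

def gev_reduced_variate(H, kappa=0.15):
    """GEV reduced variate for a fixed shape parameter kappa (kappa != 0)."""
    return ((-math.log(H)) ** (-kappa) - 1.0) / kappa

def probability_plot_points(sample, kappa=0.15):
    """(z, x) pairs for a linear EV2 probability plot of the sample."""
    xs = sorted(sample)
    n = len(xs)
    return [(gev_reduced_variate(i / (n + 1.0), kappa), x)
            for i, x in enumerate(xs, start=1)]

annual_maxima = [30.0, 55.0, 42.0, 90.0, 61.0]  # hypothetical depths in mm
for z, x in probability_plot_points(annual_maxima):
    print(round(z, 3), x)
```

If the sample follows an EV2 distribution with the chosen κ, the points fall close to a straight line whose slope is the scale parameter λ.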

8. Resulting intensity-duration-frequency curves

The construction of rainfall intensity-duration-frequency (IDF) relationships or curves

is one of the most common practical tasks related to the probabilistic description of

extreme rainfall. Unfortunately, however, the construction is typically performed by

empirical procedures (e.g. Chow et al., 1988). Even the terms “duration” and “fre-

quency” in IDF are misnomers; in fact, “duration” should read “timescale” (in order

not to be confused with the duration of a rainfall event) and “frequency” should read

“return period”. Thus, the IDF relationships are mathematical expressions of the

rainfall intensity i(d, T) averaged over timescale d and exceeded, on average, once per return period T.

The recent theoretical advances in the probabilistic description can support a more

theoretically based, mathematically consistent, and physically sound approach. A few

assumptions are needed to support such an approach, namely:

1. The separability assumption, according to which the influences of return period

and timescale are separable (Koutsoyiannis et al., 1998), i.e.,

i(d, T) = a(T) / b(d) (26)

where a(T) and b(d) are mathematical expressions to be determined.

2. The similarity assumption, according to which the distribution of average rain-

fall intensity conditional on being wet is statistically similar for all time scales

(Koutsoyiannis, 2006).

3. A stochastic description of rainfall intermittency, which, as suggested by Kout-

soyiannis (2006) should be a generalization of a Markov chain process that

results applying the maximum entropy principle to the rainfall occurrence

process.


4. A probabilistic distribution of the rainfall depth at any scale, which as discussed

above should be of Pareto/EV2 type.

Based on assumptions 1-3, Koutsoyiannis (2006) showed that the function b(d) can be

approximated for relatively short timescales by the expression (here written in slightly

different form)

$$b(d) = (1 + d/\theta)^{\eta} \qquad (27)$$

where θ > 0 is a parameter with units same as the timescale d and η is a dimensionless

parameter with values in the interval (0, 1). This resembles an expression historically

established with empirical considerations. The approximate character of (27) as well

as that of assumptions 1 and 2 should be underlined. At the same time, it should be

noted that (27) is more accurate than a pure power law of b(d), which has been

suggested by modern fractal approaches. Particularly, (27) implies a decrease of rainfall

intensity on small timescales, as compared to what is predicted by a power law. This

is very important for the design of urban drainage networks that have small concen-

tration times.

Furthermore, assumption 4 combined with (19) results in

$$i(d, T) = (\lambda/\kappa)\,[(T/\delta)^{\kappa} - 1 + \kappa\psi] \qquad (28)$$

By comparison of (28) with (26), we conclude that only the scale parameter λ should

be a function of timescale d and particularly that λ ~ (1 + d/θ)^(–η). We easily then

deduce that the final form of the IDF will be

$$i(d, T) = \lambda' \, \frac{(T/\delta)^{\kappa} - \psi'}{(1 + d/\theta)^{\eta}} \qquad (29)$$

where ψ΄ := 1 – κψ and λ΄ := (λ/κ) (1 + d/θ)^η, which should be constant, independent

of d. Notice that (29) is dimensionally consistent and that the return period T refers to

the parent distribution (and thus it can take values smaller than δ = 1 year, but necessarily

greater than δψ΄^(1/κ)). Also, notice that the numerator of (29) differs from a pure

power law that has been commonly used in engineering practice. By virtue of (13) and

(17), (29) can be easily converted in terms of the return period of the distribution of

maxima and takes the form

$$i(d, T) = \lambda' \, \frac{[-\ln(1 - \delta/T')]^{-\kappa} - \psi'}{(1 + d/\theta)^{\eta}} \qquad (30)$$

In the latter case, obviously T΄ should be greater than δ = 1 year. All parameters are

precisely the same in both (29) and (30). Consistent parameter estimation techniques

for these relationships have been discussed in Koutsoyiannis et al. (1998).
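The two forms of the IDF relationship can be coded side by side. The sketch below is illustrative only: it implements (29) and (30) with δ = 1 year and checks that they agree when the return period of the parent distribution, T, and that of the distribution of maxima, T΄, are linked by T/δ = 1/[–ln(1 – δ/T΄)], which is the substitution that turns the numerator of (29) into that of (30). The parameter values in `PARAMS` are hypothetical and do not correspond to any station.

```python
import math

def idf_parent(d, T, lam, kappa, psi, theta, eta, delta=1.0):
    """Equation (29): intensity for timescale d and parent return period T."""
    return lam * ((T / delta) ** kappa - psi) / (1.0 + d / theta) ** eta

def idf_maxima(d, T_max, lam, kappa, psi, theta, eta, delta=1.0):
    """Equation (30): intensity for timescale d and return period T_max of the maxima."""
    w = -math.log1p(-delta / T_max)  # -ln(1 - delta/T_max); requires T_max > delta
    return lam * (w ** (-kappa) - psi) / (1.0 + d / theta) ** eta

def parent_return_period(T_max, delta=1.0):
    """Parent return period T equivalent to the return period T_max of the maxima."""
    return delta / (-math.log1p(-delta / T_max))

# Hypothetical parameters: lam (lambda') in mm/h, theta in h, kappa, psi (psi') and eta dimensionless
PARAMS = dict(lam=200.0, kappa=0.15, psi=0.6, theta=0.2, eta=0.75)

for T_max in (2.0, 10.0, 100.0):
    print(T_max, round(parent_return_period(T_max), 2),
          round(idf_maxima(1.0, T_max, **PARAMS), 1))
```

For large return periods T and T΄ nearly coincide, so the two forms differ appreciably only for return periods of a few years or less.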

9. Conclusions

Historically, the modelling of rainfall has suffered from several fallacies, such as the

existence of an upper bound (PMP), and empirical practices that do not have theoreti-

cal support. Rational thinking and fundamental scientific principles, formulated since

the birth of science in ancient Greece, can help combat such fallacies.


Probability, statistics and stochastic processes have offered a better alternative in

perceiving and modelling of the rainfall process. However, even the probabilistic

approaches have suffered from misconceptions and bad practices that have resulted in

underestimation of rainfall variability and uncertainty. Among them is the wide appli-

cation of the Gumbel or EV1 distribution, which has been the prevailing model for

rainfall extremes despite the fact that it yields unsafe (the smallest possible) design

rainfall values.

More recent studies have provided theoretical arguments and general empirical

evidence from many rainfall records worldwide, which suggest a long distribution tail

and favour the EV2 distribution of maxima. Simultaneously, they explain that the

broad use of the EV1 distribution worldwide is in fact related to statistical biases and

errors due to small sample sizes, rather than to the real behaviour of rainfall maxima,

which should be better described by the EV2 distribution. Similar behaviours have

been also detected in other hydrological processes such as streamflow and sediment

transport.

The new methodological framework is more theoretically consistent, and more

mathematically and physically sound (justified by the physico-mathematical principle

of maximum entropy). Simultaneously, it is very simple so as to allow its easy

implementation in typical engineering tasks such as estimation and prediction of

design parameters, including the construction of IDF curves. The new framework

imposes also some requirements for stochastic models of rainfall, many of which are

currently not consistent with the long tail behaviour of the rainfall distribution.

References

Bacro, J.-N. and A. Chaouche (2006), Incertitude d’estimation des pluies extrêmes du pourtour

méditerranéen: illustration par les données de Marseille, Hydrol. Sci. J., 51(3), 389-405.

Benson, M.A. (1973), Thoughts on the design of design floods, in Floods and Droughts, Proc. 2nd

Intern. Symp. in Hydrology, pp. 27-33, Water Resources Publications, Fort Collins, Colorado.

Chaouche K. (2001), Approche multifractale de la modélisation stochastique en hydrologie. Thèse,

Ecole Nationale du Génie Rural, des Eaux et des Forêts, Centre de Paris, France

(http://www.engref.fr/ thesechaouche.htm).

Chaouche, K., P. Hubert and G. Lang (2002), Graphical characterisation of probability distribution

tails. Stoch. Environ. Res. Risk Assess. 16(5), 342–357.

Chow, V.T. (1964), Statistical and probability analysis of hydrology data, Part I, Frequency analysis,

In: Chow, V.T. (Ed.), Handbook of Applied Hydrology, McGraw-Hill, New York, pp. 8.1–8.42

(Section 8-I).

Chow, V.T., D.R. Maidment and L.W. Mays (1988), Applied Hydrology, McGraw-Hill.

Coles, S. and L. Pericchi (2003) Anticipating catastrophes through extreme value modelling. Appl.

Statist., 52, 405–416.

Coles, S., L.R. Pericchi and S. Sisson (2003), A fully probabilistic approach to extreme rainfall mod-

eling, J. Hydrol., 273(1–4), 35–50.

Crossley, J.N., C.J. Ash, C.J. Brickhill, J.C. Stillwell and N.H. Williams (1990), What Is Mathematical

Logic?, Dover, New York.

Dingman, S.L. (1994), Physical Hydrology, Prentice Hall, Englewood Cliffs, New Jersey.

Farquharson, F.A.K., J.R. Meigh and J.V. Sutcliffe (1992), Regional flood frequency analysis in arid

and semi-arid areas, J. Hydrol., 138, 487–501.

Feller, W. (1950), An introduction to Probability Theory and its Applications, Wiley, New York.

Fisher, R.A., and L.H.C. Tippett (1928), Limiting forms of the frequency distribution of the largest or

smallest member of a sample, Proc. Cambridge Phil. Soc., 24, 180-190.

Fréchet, M. (1927), Sur la loi de probabilité de l’écart maximum, Ann. de la Soc. Polonaise de Math.,

Cracow, 6, 93-117.


Gaume, E. (2006), On the asymptotic behavior of flood peak distributions, Hydrol. Earth Syst. Sci., 10,

233–243.

Gnedenco, B.V. (1941), Limit theorems for the maximal term of a variational series, Doklady Akad.

Nauk SSSR, Moscow, 32, 37 (in Russian).

Gumbel, E.J. (1958), Statistics of Extremes, Columbia University Press, New York.

Hazen, A. (1914), Storage to be provided in impounding reservoirs for municipal water supply. Trans.

Am. Soc. Civil Engrs, 77, 1539–1640.

Hershfield, D.M. (1961a), Estimating the probable maximum precipitation, Proc. ASCE, J. Hydraul.

Div., 87(HY5), 99-106.

Hershfield, D.M. (1961b), Rainfall Frequency Atlas of the United States, U.S. Weather Bur. Tech. Pap.

TP-40, Washington, DC.

Hershfield, D.M. (1965), Method for estimating probable maximum precipitation, J. American Water-

works Assoc., 57, 965-972.

Hosking, J.R.M. (1990), L-moments: analysis and estimation of distributions using linear combinations

of order statistics. J. Roy. Statist. Soc. Ser. B 52, 105–124.

Hosking, J.R.M., J.R. Wallis and E.F. Wood (1985), Estimation of the generalized extreme value

distribution by the method of probability weighted moments. Technometrics 27(3), 251–261.

Hosking, J.R.M., and J.R. Wallis (1997), Regional Frequency Analysis—An Approach Based on L-

Moments, Cambridge Univ. Press, New York.

Jenkinson, A.F. (1955), The frequency distribution of the annual maximum (or minimum) value of

meteorological elements, Q. J. Royal Meteorol. Soc., 81, 158-171.

Jenkinson, A.F. (1969), Estimation of maximum floods, World Meteorological Organization, Techni-

cal Note No 98, ch. 5, 183-257.

Katz, R.W., G.S. Brush and M.B. Parlange (2005), Statistics of extremes: modeling ecological distur-

bances, Ecology, 86(5), 1124–1134.

Klemeš, V. (2000), Tall tales about tails of hydrological distributions, J. Hydrol. Engineering, 5(3),

227-231 and 232-239.

Kolmogorov, A. N. (1933), Grundbegriffe der Wahrscheinlichkeitsrechnung, Springer, Berlin. Pub-

lished in English in 1950 as Foundations of the Theory of Probability, Chelsea, New York.

Kottegoda, N.T., and R. Rosso (1997), Statistics, Probability, and Reliability for Civil and Environmental Engineers, McGraw-Hill, New York.

Koutsoyiannis, D. (1999), A probabilistic view of Hershfield’s method for estimating probable maximum precipitation, Water Resour. Res., 35(4), 1313–1322.

Koutsoyiannis, D. (2004a), Statistics of extremes and estimation of extreme rainfall, 1, Theoretical investigation, Hydrol. Sci. J., 49(4), 575–590.

Koutsoyiannis, D. (2004b), Statistics of extremes and estimation of extreme rainfall, 2, Empirical investigation of long rainfall records, Hydrol. Sci. J., 49(4), 591–610.

Koutsoyiannis, D. (2005a), Uncertainty, entropy, scaling and hydrological stochastics, 1, Marginal distributional properties of hydrological processes and state scaling, Hydrol. Sci. J., 50(3), 381–404.

Koutsoyiannis, D. (2005b), Interactive comment on “On the asymptotic behavior of flood peak distributions – theoretical results” by E. Gaume, Hydrol. Earth Syst. Sci. Discuss., 2, S792–S796 (www.copernicus.org/EGU/hess/hessd/2/S792/).

Koutsoyiannis, D. (2005c), Interactive comment on “On the asymptotic behavior of flood peak distributions – theoretical results” by E. Gaume, Hydrol. Earth Syst. Sci. Discuss., 2, S838–S840 (www.copernicus.org/EGU/hess/hessd/2/S838/).

Koutsoyiannis, D. (2006), An entropic-stochastic representation of rainfall intermittency: The origin of clustering and persistence, Water Resour. Res., 42(1), W01401.

Koutsoyiannis, D., and G. Baloutsos (2000), Analysis of a long record of annual maximum rainfall in Athens, Greece, and design rainfall inferences, Natural Hazards, 22(1), 31–51.

Koutsoyiannis, D., D. Kozonis, and A. Manetas (1998), A mathematical framework for studying rainfall intensity-duration-frequency relationships, J. Hydrol., 206(1-2), 118–135.

Koutsoyiannis, D., N. Mamassis and A. Tegos (2006), Logical and illogical exegeses of hydrometeorological phenomena in ancient Greece, Proceedings of the 1st IWA International Symposium on Water and Wastewater Technologies in Ancient Civilizations, 135–143, International Water Association, Iraklio, 2006.

Koutsoyiannis, D. and T. Xanthopoulos (1999), Τεχνική Υδρολογία (Engineering Hydrology), 3rd Ed., National Technical University of Athens, Athens (in Greek).

Leadbetter, M.R. (1974), On extreme values in stationary sequences, Z. Wahrscheinlichkeitstheorie u. Verwandte Gebiete, 28, 289–303.

Madsen, H., C.P. Pearson and D. Rosbjerg (1997), Comparison of annual maximum series and partial duration series methods for modeling extreme hydrologic events, 2, Regional modeling, Water Resour. Res., 33(4), 759–769.

Malamud, B.D. and D.L. Turcotte (2006), The applicability of power-law frequency statistics to floods, J. Hydrol., 322, 168–180.

Mora, R.D., C. Bouvier, L. Neppel and H. Niel (2005), Approche régionale pour l’estimation des distributions ponctuelles des pluies journalières dans le Languedoc-Roussillon (France), Hydrol. Sci. J., 50(1), 17–29.

Papalexiou, S., and D. Koutsoyiannis (2006), A probabilistic approach to the concept of probable maximum precipitation, Advances in Geosciences, 7, 51–54.

Pandey, G., S. Lovejoy and D. Schertzer (1998), Multifractal analysis of daily river flows including extremes for basins of five to two million square kilometres, one day to 75 years, J. Hydrol., 208, 62–81.

Popper, K. (1982), Quantum Physics and the Schism in Physics, Unwin Hyman, London.

Priest, G. (2002), Beyond the Limits of Thought, Oxford.

Rossi, F., M. Fiorentino and P. Versace (1984), Two-component extreme value distribution for flood frequency analysis, Water Resour. Res., 20(7), 847–856.

Russell, B. (1903), Principles of Mathematics, Allen and Unwin.

Russell, B. (1926), Our Knowledge of the External World, revised edition, Allen and Unwin.

Schaefer, M.G., B.L. Barker, G.H. Taylor and J.R. Wallis (2006), Regional precipitation-frequency analysis and spatial mapping for 24-hour and 2-hour durations for Washington State, Geophysical Research Abstracts, Vol. 8, 10899.

Sisson, S.A., L.R. Pericchi and S.G. Coles (2006), A case for a reassessment of the risks of extreme hydrological hazards in the Caribbean, Stoch. Environ. Res. Risk Assess., 20, 296–306.

Sutcliffe, J.V. (1978), Methods of Flood Estimation, A Guide to Flood Studies Report, Report 49, Institute of Hydrology, Wallingford, UK.

Todorovic, P., and E. Zelenhasic (1970), A stochastic model for flood analysis, Water Resour. Res., 6(6), 1641–1648.

Tsallis, C. (1988), Possible generalization of Boltzmann-Gibbs statistics. J. Statist. Phys. 52, 479–487.

Tsallis, C. (2004), Nonextensive statistical mechanics: construction and physical interpretation, in Nonextensive Entropy, Interdisciplinary Applications (ed. by M. Gell-Mann and C. Tsallis), Oxford University Press, New York, NY.

Turcotte, D.L. (1994), Fractal theory and the estimation of extreme floods, J. Res. Nat. Inst. Stand. Technol., 99(4), 377–389.

US Department of the Interior, Bureau of Reclamation (1977), Design of Arch Dams, US Government Printing Office, Denver, Colorado, USA.

US Department of the Interior, Bureau of Reclamation (1987), Design of Small Dams, third edn., US Government Printing Office, Denver, Colorado, USA.

US Water Resources Council (1982), Guidelines for Determining Flood Flow Frequency, Bull. 17B, Hydrol. Subcom., Office of Water Data Coordination, US Geological Survey, Reston, VA.

von Bortkiewicz, L. (1922a), Variationsbreite und mittlerer Fehler, Sitzungsberichte d. Berliner Math. Ges., 21, 3.

von Bortkiewicz, L. (1922b), Die Variationsbreite beim Gauss’schen Fehlergesetz, Nordisk Statistik Tidskrift, 1(1), 11 & 1(2), 13.

von Mises, R. (1923), Über die Variationsbreite einer Beobachtungsreihe, Sitzungsber. d. Berliner Math. Ges., 22, 3.

Wilks, D.S. (1993), Comparison of three-parameter probability distributions for representing annual extreme and partial duration precipitation series, Water Resour. Res., 29(10), 3543–3549.

World Meteorological Organization (1986), Manual for Estimation of Probable Maximum Precipitation, Operational Hydrology Report 1, 2nd edition, Publication 332, World Meteorological Organization, Geneva.

[Figure 1 plot. Horizontal axis: Gumbel reduced variate (−2 to 10); vertical axis: standardised distribution quantile (0 to 1); curves for $n = 1$, $n = 10^3$ and $n = 10^6$.]

Figure 1: Gumbel probability plots of the exact distribution function of maxima $H_n(x)$ for $n = 10^3$ and $10^6$, also in comparison with the parent distribution function $F(y) \equiv H_1(y)$, which is Weibull with shape parameter $k = 0.5$. The distribution quantile has been standardised by $x_{0.9999}$, corresponding to $z_H = 9.21$ (from Koutsoyiannis, 2004a).
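
The construction behind Figure 1 can be illustrated with a minimal Python sketch. It assumes independent, identically distributed parent variables, so that the exact distribution of the maximum of n of them is H_n(x) = [F(x)]^n, and it assumes a unit-scale Weibull parent F(y) = 1 − exp(−y^k) (the scale cancels after standardisation). The function names are hypothetical and are not taken from the cited paper.

```python
import math

def gumbel_reduced_variate(u):
    """Gumbel reduced variate z = -ln(-ln u) for a non-exceedance probability u."""
    return -math.log(-math.log(u))

def weibull_max_quantile(u, n, k=0.5):
    """Quantile x solving H_n(x) = u, with H_n = F**n and F(y) = 1 - exp(-y**k).

    Assumption: unit-scale Weibull parent and i.i.d. variables.
    """
    # 1 - u**(1/n), computed with expm1 to keep precision for large n
    parent_tail = -math.expm1(math.log(u) / n)
    return (-math.log(parent_tail)) ** (1.0 / k)

def standardised_curve(n, probs, k=0.5, u_ref=0.9999):
    """Points (z, x / x_ref) of the Gumbel probability plot of H_n."""
    x_ref = weibull_max_quantile(u_ref, n, k)
    return [(gumbel_reduced_variate(u), weibull_max_quantile(u, n, k) / x_ref)
            for u in probs]

if __name__ == "__main__":
    probs = [0.5, 0.9, 0.99, 0.999, 0.9999]
    for n in (1, 10**3, 10**6):
        print(n, [(round(z, 2), round(q, 3)) for z, q in standardised_curve(n, probs)])
```

On a Gumbel probability plot an exact Gumbel distribution is a straight line, so the curvature of the printed points for each n gives a rough indication of how far H_n still is from its Gumbel limit.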

[Figure 2 plot. Horizontal axis: shape parameter, κ (0 to 0.3); vertical axis: estimation bias (−0.25 to 0); curves for the L-moments estimator and the moments estimator at sample sizes 20, 30, 50, 100 and 150.]

Figure 2: Bias in estimating the shape parameter κ of the GEV distribution using the methods of moments and L-moments (from Koutsoyiannis, 2004a).

[Figure 3 plot. Horizontal axis: Gumbel reduced variate (−2 to 12), with a secondary axis of return period from 1.01 to 100 000 years; vertical axis: rainfall depth (0 to 450 mm); series: Empirical, EV2/L-moments, EV2/Max likelihood, EV2/Moments, EV1/L-moments; the estimated PMP value is marked.]

Figure 3: Empirical distribution and theoretical EV2 and EV1 distributions fitted by several methods for the annual maximum daily rainfall series of Athens, National Observatory, Greece (Gumbel probability plot; from Koutsoyiannis, 2004b). The PMP value (424.1 mm) was estimated by Koutsoyiannis and Baloutsos (2000).
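
The two horizontal scales of the Gumbel probability plots (reduced variate and return period) are linked by a simple transformation, sketched below in Python. The sketch assumes annual maxima, so that the non-exceedance probability for a return period of T years is 1 − 1/T; the function names are hypothetical.

```python
import math

def reduced_variate_from_return_period(T):
    """Gumbel reduced variate z = -ln(-ln(1 - 1/T)) for a return period T > 1 (years)."""
    # log1p keeps precision when 1/T is very small
    return -math.log(-math.log1p(-1.0 / T))

def return_period_from_reduced_variate(z):
    """Inverse mapping: T = 1 / (1 - exp(-exp(-z)))."""
    return -1.0 / math.expm1(-math.exp(-z))

if __name__ == "__main__":
    for T in (1.01, 2, 10, 100, 1000, 100000):
        print(T, round(reduced_variate_from_return_period(T), 2))
```

Under this assumption the return periods 1.01 and 100 000 years fall at reduced variates of roughly −1.5 and 11.5, which matches the −2 to 12 range of the plotted axis.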

[Figure 4 plot. Horizontal axis: Gumbel reduced variate (−2 to 12), with a secondary axis of return period from 1.01 to 100 000 years; vertical axis: rescaled rainfall depth (0 to 8); series: Empirical, EV2/Least squares, EV2/Moments, EV2/L-moments, EV2/Max likelihood, EV1/L-moments.]

Figure 4: Empirical distribution and theoretical EV2 and EV1 distributions fitted by several methods for the unified record of all 169 annual maximum rescaled daily rainfall series (18 065 station-years; from Koutsoyiannis, 2004b).

[Figure 5 plot. Horizontal axis: return period T (years), logarithmic, 0.1 to 100 000; vertical axis: x, logarithmic, 0.1 to 10; series: Empirical, Pareto, Exponential, Truncated Normal, Normal.]

Figure 5: Plot of daily rainfall depth from the unified standardized sample above threshold, formed from data of 168 stations worldwide, vs return period, in comparison to Pareto, exponential, truncated normal and normal distributions (adapted from Koutsoyiannis, 2005a).

[Figure 6 plot. Horizontal axis: GEV reduced variate (−2 to 33), with a secondary axis of return period from 1.01 to 100 000 years; vertical axis: Hershfield-standardised rainfall depth (0 to 16); series: Empirical, κ = 0.15, κ = 0.13 (Koutsoyiannis, 1999).]

Figure 6: Empirical distribution of standardized rainfall depth k = (X – µ)/σ for Hershfield’s (1961a) data set (95 000 station-years from 2645 stations), as determined by Koutsoyiannis (1999), and fitted EV2 distributions with κ = 0.13 (Koutsoyiannis, 1999) and κ = 0.15 (Koutsoyiannis, 2004b) (EV2 probability plots with fixed κ = 0.15).
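
The axes of Figure 6 can be sketched as follows. The standardisation k = (X − µ)/σ is taken from the caption. The definition of the GEV reduced variate used here, [(−ln u)^(−κ) − 1]/κ for non-exceedance probability u, is an assumption: it is one common form that reduces to the Gumbel reduced variate as κ → 0, and it is not stated in this section. The function names are hypothetical.

```python
import math

def hershfield_standardise(x, mean, std):
    """Standardised rainfall depth k = (X - mu) / sigma, as in the caption of Figure 6."""
    return (x - mean) / std

def gev_reduced_variate(u, kappa=0.15):
    """Assumed GEV reduced variate ((-ln u)**(-kappa) - 1) / kappa.

    For kappa = 0 the Gumbel reduced variate -ln(-ln u) is returned instead.
    """
    if kappa == 0:
        return -math.log(-math.log(u))
    return ((-math.log(u)) ** (-kappa) - 1.0) / kappa

if __name__ == "__main__":
    for T in (1.01, 2, 10, 100, 1000, 100000):
        u = 1.0 - 1.0 / T  # annual non-exceedance probability (assumption)
        print(T, round(gev_reduced_variate(u), 2))
```

With κ = 0.15 this form places the return periods 1.01 and 100 000 years at roughly −1.4 and 30.8, which is consistent with the −2 to 33 range of the plotted axis. On such a plot a GEV distribution with the same fixed κ appears as a straight line.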
