All content in this area was uploaded by Demetris Koutsoyiannis on Nov 25, 2018

Entropy based derivation of probability distributions: A case study to daily rainfall

Simon Michael Papalexiou, Demetris Koutsoyiannis

Department of Water Resources, Faculty of Civil Engineering, National Technical University

of Athens, Heroon Polytechneiou 5, GR-157 80 Zographou, Greece (sp@itia.ntua.gr)

Abstract

The principle of maximum entropy, along with empirical considerations, can provide a

consistent basis for constructing a probability distribution model for highly varying

geophysical processes. Here we examine the potential of using this principle with the

Boltzmann-Gibbs-Shannon entropy definition in the probabilistic modelling of rainfall in

different areas worldwide. We define and theoretically justify specific simple and general

entropy maximization constraints which lead to two flexible distributions, i.e., the three-

parameter Generalized Gamma (GG) and the four-parameter Generalized Beta of the second

kind (GB2), with the former being a particular limiting case of the latter. We test the

theoretical results in 11 519 daily rainfall records across the globe. The GB2 distribution

seems to be able to describe all empirical records, while two of its specific three-parameter

cases, the GG and the Burr Type XII distributions, perform very well, describing 97.6%

and 87.7% of the empirical records, respectively.

Keywords: maximum entropy; daily rainfall; Generalized Gamma distribution; Generalized

Beta distribution; Burr distribution

1. Introduction

Even though long-term predictions of rainfall are not possible in deterministic terms (e.g.,

weather forecasts are skillful for no more than a week ahead), in probabilistic terms it is

possible to assign a stochastic model or a probabilistic law and to any rainfall amount assign a

return period or a probability of exceedance. Actually, most infrastructures affected by

rainfall and flood are designed this way. Rainfall is generally characterized as an intermittent

stochastic process (for fine timescales), with a mixed-type marginal distribution, partly

discrete and partly continuous. The discrete part is concentrated at zero and defines the

probability dry, while the rest is continuously spread over the positive real axis and

determines the nonzero rainfall distribution. The discrete part of the rainfall distribution can

be easily estimated as the ratio of the number of dry days to the total number of days. On the

contrary, the continuous part of the distribution cannot be easily assessed.
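The estimate of the discrete part can be sketched in a few lines. This is a minimal illustration, assuming a daily record held as a Python list in millimetres with `None` marking missing days; the function name `split_dry_wet` and the zero wet-day threshold are assumptions made here, not choices stated in the study.

```python
def split_dry_wet(daily_rain_mm, wet_threshold=0.0):
    """Return (probability dry, list of wet-day amounts) for one record."""
    values = [v for v in daily_rain_mm if v is not None]  # drop missing days
    if not values:
        raise ValueError("no valid observations")
    wet = [v for v in values if v > wet_threshold]
    # Probability dry: number of dry days over the total number of valid days
    p_dry = (len(values) - len(wet)) / len(values)
    return p_dry, wet


# Hypothetical 11-day record with one missing value
record = [0.0, 0.0, 3.2, 0.0, 12.5, None, 0.0, 0.4, 0.0, 0.0, 0.0]
p_dry, wet = split_dry_wet(record)
print(p_dry, wet)
```

The list `wet` is the sample to which a continuous wet-day distribution would then be fitted.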

Rainfall is usually studied in many different timescales, e.g., from sub-hourly to yearly,

yet, the daily timescale is one of the most convenient and important in hydrological design.

Specifically, it is the smallest timescale for which thousands of records exist with some of

them being more than a century long. Nevertheless, and although daily rainfall has been

extensively studied over the years, a search in the literature reveals that a universally accepted

model for the wet-day daily rainfall distribution does not exist. On the contrary, many

distributions have been proposed in specific studies for specific locations of the world

including, e.g., the two-parameter Gamma, which is probably the prevailing model, the two-

and three-parameter Lognormal, the Generalized Logistic, the Pearson Type III, the Pareto

and the Generalized Pareto, the three- and four-parameter Kappa distributions, and many

more.

The common method to construct an appropriate probability distribution model for

describing one or more samples is to try a variety of different models and choose the best

fitted using a particular mathematical norm, e.g., a least square error or a likelihood norm.

Nevertheless, this approach is rather naïve and laborious; first, there are (at least theoretically)

infinitely many different models to try, and second, this method does not offer any theoretical

justification for the final choice, thus making it an ad hoc empirical choice. This practice

explains why so many models have been proposed. Here, we use the principle of

maximum entropy as a solid theoretical background for constructing an appropriate

probability distribution for rainfall and for geophysical processes in general. Our study is both

theoretical, in terms of using the principle of maximum entropy and seeking the appropriate

constraints for entropy maximization, and also empirical, since we test the theoretical results

using 11 519 daily rainfall samples across the world. Our main target is to assess whether a

single generalized model could be appropriate for all rainfall records worldwide.

2. The entropic framework

2.1 Entropy measures

The concept of entropy dates back to the works of Rudolf Clausius in 1850, yet, it was

Ludwig Boltzmann around 1870 who gave entropy a statistical meaning and related it to

statistical mechanics. The concept of entropy was advanced later in the works of J. Willard

Gibbs in thermodynamics and Von Neumann in quantum mechanics, and was reintroduced in

information theory by Claude Shannon [17] in 1948, who showed that entropy is a purely

probabilistic concept, a measure of the uncertainty related to a random variable (RV).

The most famous and well-justified measure of entropy for continuous RVs is the

Boltzmann-Gibbs-Shannon (BGS) entropy, which for a non-negative RV X is

$$S_X = -\int_0^\infty f_X(x) \ln f_X(x)\,\mathrm{d}x \qquad (1)$$

where $f_X(x)$ is the probability density function of X. The BGS entropy is not the only

entropy measure. A search in the literature reveals that more than twenty different entropy

measures have been proposed, mainly generalizations of BGS entropy (for a summary of

entropy measures see [2]). Among those measures, it is worth noting the Rényi entropy,

introduced by the Hungarian mathematician Alfréd Rényi in 1961, which has been used in

many different disciplines, e.g., ecology and statistics. It is also worth noting another entropy

measure that has gained much popularity in the last decade, the Havrda-Charvat-Tsallis

(HCT) entropy. It was initially proposed by Havrda and Charvat [4], and was reintroduced

and applied to physics by Tsallis [21]. Apart from its use in physics, the HCT entropy has also

been used more recently in hydrology [11,18] as it gives rise to power-type distributions. The

HCT entropy is a generalization of the BGS entropy given by

$$S_q^X = \frac{1 - \int_0^\infty [f_X(x)]^q\,\mathrm{d}x}{q - 1} \qquad (2)$$

It is easy to verify that for $q \to 1$ it becomes identical to the BGS entropy.
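The limiting relation between the two entropy measures can be checked numerically. The sketch below assumes a unit exponential density as a test case (chosen here only because both integrals are then known in closed form: the BGS entropy equals 1 and the HCT entropy equals 1/q); the helper names and the truncation of the infinite integration range at 50 are illustrative assumptions.

```python
import math


def midpoint_integral(func, a, b, n=50_000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def bgs_entropy(pdf, upper=50.0):
    """Eq. (1), with the infinite upper limit truncated at `upper`."""
    return -midpoint_integral(lambda x: pdf(x) * math.log(pdf(x)), 0.0, upper)


def hct_entropy(pdf, q, upper=50.0):
    """Eq. (2), with the infinite upper limit truncated at `upper`."""
    return (1.0 - midpoint_integral(lambda x: pdf(x) ** q, 0.0, upper)) / (q - 1.0)


def unit_exponential(x):
    return math.exp(-x)


s_bgs = bgs_entropy(unit_exponential)
for q in (2.0, 1.5, 1.1, 1.01):
    s_hct = hct_entropy(unit_exponential, q)
    print(f"q = {q}: HCT = {s_hct:.4f}, BGS = {s_bgs:.4f}")
```

As q approaches 1 the printed HCT values approach the BGS value, which is the behaviour the text describes.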

2.2 The principle of maximum entropy

The principle of maximum entropy was established, as a tool for inference under uncertainty,

by Edwin Jaynes [6,7]. In essence, the principle of maximum entropy relies on finding the

most suitable probability distribution under the available information. As Jaynes [6] expressed

it, the resulting maximum entropy distribution “is the least biased estimate possible on the

given information; i.e., it is maximally noncommittal with regard to missing information”.

In a mathematical frame, the given information used in the principle of maximum

entropy is expressed as a set of constraints formed as expectations of functions $g_j(\cdot)$ of X, i.e.,

$$E[g_j(X)] = \int_0^\infty g_j(x) f_X(x)\,\mathrm{d}x = c_j, \quad j = 1, \ldots, n \qquad (3)$$

The resulting maximum entropy distributions emerge by maximizing the selected form of

entropy with constraints (3), and with the obvious additional constraint

$$\int_0^\infty f_X(x)\,\mathrm{d}x = 1 \qquad (4)$$

The maximization is accomplished by using the calculus of variations and the method of

Lagrange multipliers. Particularly, the general solutions for the maximum entropy distributions

resulting from the maximization of the BGS entropy and the HCT entropy, assuming arbitrary

constraints, are, respectively,

$$f_X(x) = \exp\left(-\lambda_0 - \sum_{j=1}^{n} \lambda_j g_j(x)\right) \qquad (5)$$

$$f_X(x) = \left[1 + (1 - q)\left(\lambda_0 + \sum_{j=1}^{n} \lambda_j g_j(x)\right)\right]^{-\frac{1}{1-q}} \qquad (6)$$

where $\lambda_j$, with $j = 1, \ldots, n$, are the Lagrange multipliers linked to the constraints (3) and $\lambda_0$ is the

multiplier linked to the constraint (4), i.e., $\lambda_0$ guarantees the legitimacy of the distribution.
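For the BGS case, the step from constraints (3) and (4) to solution (5) can be sketched as follows. This is the standard Lagrange-multiplier argument, written under the assumption that the multiplier of constraint (4) is taken as $\lambda_0 - 1$ so that the result matches the form of equation (5); the functional $\mathcal{L}$ is notation introduced only for this sketch.

```latex
\begin{align*}
\mathcal{L}[f_X] &= -\int_0^\infty f_X \ln f_X \,\mathrm{d}x
   - (\lambda_0 - 1)\left(\int_0^\infty f_X \,\mathrm{d}x - 1\right)
   - \sum_{j=1}^{n} \lambda_j \left(\int_0^\infty g_j f_X \,\mathrm{d}x - c_j\right) \\
\frac{\delta \mathcal{L}}{\delta f_X(x)} &= -\ln f_X(x) - 1 - (\lambda_0 - 1)
   - \sum_{j=1}^{n} \lambda_j g_j(x) = 0 \\
\Rightarrow\quad f_X(x) &= \exp\left(-\lambda_0 - \sum_{j=1}^{n} \lambda_j g_j(x)\right)
\end{align*}
```

The values of the multipliers then follow from substituting this density back into constraints (3) and (4).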

2.3 Justification of the constraints

It becomes clear from the above discourse that the resulting maximum entropy distribution is

uniquely defined by the choice of the imposed constraints. This implies that this choice is the

most important and determinative part of the method. Constraints express our state of

knowledge concerning a RV and should summarize all the available information from

observations or from theoretical considerations. Nevertheless, choosing constraints is not

trivial; they are introduced as expectations of RV functions without any intrinsic limitation on

the form of those functions.

So, how should we choose the appropriate constraints among an infinite number of

choices? In classical statistical mechanics, these constraints are imposed by physical

principles such as the mass, momentum and energy conservation. However, in complex

geophysical processes, these principles cannot help. In geophysical processes, the standard

procedure to assign a probability law is to study the available observations and infer the

underlying distribution without entropy considerations. However, whatever we infer in this

way, is in fact based on a small portion of the past (the available record), which may (or may

not) change in the future. Nevertheless, we can reasonably assume that some RV features may

be more likely to be approximately preserved in the future than others, e.g., coarse features

like the mean and the variance are less likely to change in the future [8] than finer features

based on higher moments (e.g., it is well known that the kurtosis coefficient is extremely

sensitive to observations and additional observations may radically alter it). Therefore, as a

first rule, constraints should be simple and express those features that are likely to be

preserved in the future.

The previous rule is rather subjective in the sense that it is difficult to distinguish between

simple and not simple constraints or to foresee what RV quantities will be preserved.

Furthermore, the use of a particular set of “simple” constraints may lead to a distribution that

is not supported by the empirical data. Obviously, it is difficult to reject or verify the detailed

shape features of a distribution based on a small sample which apparently does not provide

the sufficient amount of information needed. Nonetheless, many geophysical processes, even

if long records do not exist for particular regions, are extensively recorded worldwide e.g.,

thousands of stations record precipitation, temperature, etc. Thus, the study of this massive

amount of information may lead to determining some important prior characteristics of the

underlying distribution that should be preserved, e.g., a J- or bell-shaped distribution or a

heavy- or light-tailed distribution. Therefore, constraints should be chosen not only based on

simplicity, but also on the appropriateness of the resulting distribution given the empirical

evidence.

Commonly used constraints in maximizing entropy assume known mean and variance,

i.e., known first and second moments, which are clearly two very simple constraints.

Particularly, entropy maximization assuming known first two moments leads: (a) to the

celebrated normal distribution in the BGS entropy case, or, to the truncated normal if the

mandatory constraint of non-negativity for geophysical processes is imposed, and (b) to a

symmetric bell-shaped distribution with power-type tails in the HCT entropy case, or, its

truncated version for a non-negative RV. The distribution arising in the HCT case for zero

mean is now known as the Tsallis distribution. For non-zero mean the resulting distribution is

the Pearson type VII introduced by Pearson in 1916, whose special case is the Tsallis

distribution. Both these distributions are symmetric bell-shaped, in which asymmetry can only

emerge by truncation at zero. As a consequence, those distributions may likely fail to describe

sufficiently many geophysical processes that exhibit a rich pattern of asymmetries (e.g., it is

well known that the rainfall in small time scales is heavily skewed and likely heavy tailed).

Accordingly, in this study we aim to define some simple and general constraints

alternative to those of the first two moments that lead to suitable probability distributions for

geophysical processes, particularly for rainfall. Additionally, we aim to only use the BGS

entropy, which is theoretically justified and widely accepted, avoiding the use of generalized

entropy measures.

The mean is one of the most commonly used constraints, as it is a classical measure of

central tendency. Another useful measure of central tendency, exhibiting the convenient

property for geophysical processes to be defined only for positive values, is the geometric

mean $\mu_G$. An estimate of this, from a sample of size n, is given by

$$\mu_G = \left(\prod_{i=1}^{n} x_i\right)^{1/n} = \exp\left(\frac{1}{n}\sum_{i=1}^{n} \ln x_i\right) = \exp\left(\overline{\ln x}\right) \qquad (7)$$

where the overbar stands for the sample average. The sample geometric mean (also referred to as

a constraint in [9]) is smaller than the arithmetic mean. Intuitively, this leads us to formulate

the following constraint for entropy maximization

$$E[\ln X] = \ln \mu_G \qquad (8)$$

The expectation of ln X, apart from its relationship to the geometric mean and its simplicity,

makes an essential constraint for positively skewed RVs. To clarify, samples drawn from

positively skewed distributions and, even more so, drawn from heavy-tailed distributions,

exhibit values located on the right area very far from the mean value; in a sense, those values

act like outliers and consequently strongly influence the sample moments, especially those of

higher order. Therefore, it is not rational to assume that sample moments, especially based on

samples drawn from heavy-tailed distributions, are likely to be preserved. On the contrary, the

function ln x applied to this kind of samples eliminates the influence of those “extreme”

values and offers a very robust measure that is more likely to be preserved than the estimated

sample moments. Essentially for this reason, the logarithmic transformation is probably the

most common transformation used in hydrology as it tends to normalize positively skewed

data.
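The robustness argument can be illustrated with a small numerical experiment. The sketch below computes the sample geometric mean of equation (7) and compares how the arithmetic and geometric means react when a single extreme value is appended; the sample values are hypothetical and serve only as an illustration.

```python
import math


def geometric_mean(sample):
    """Sample geometric mean, eq. (7): exp of the average of ln x."""
    if any(x <= 0 for x in sample):
        raise ValueError("the geometric mean requires strictly positive values")
    return math.exp(sum(math.log(x) for x in sample) / len(sample))


def arithmetic_mean(sample):
    return sum(sample) / len(sample)


base = [1.0, 2.0, 4.0, 8.0]      # hypothetical wet-day amounts
with_extreme = base + [400.0]    # same sample plus one extreme value

for name, stat in (("arithmetic", arithmetic_mean), ("geometric", geometric_mean)):
    before, after = stat(base), stat(with_extreme)
    print(f"{name} mean: {before:.2f} -> {after:.2f} (x{after / before:.1f})")
```

In this example the extreme value inflates the arithmetic mean far more than the geometric mean, which is the sense in which $E[\ln X]$ is the more robust quantity.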

As stated above, the link of the mean and variance with the physical principles of

momentum and energy conservation is invalid in geophysical processes. For example, the

mean of the rainfall is not its momentum and its variance is not its energy. Even in these

processes, mean and variance (as measures of central tendency and dispersion) provide useful

information, which can at least explain general behaviours and shapes of probability density

functions [11]. However, this information is good only for explanatory purposes and does not

enable detailed and accurate modelling, because there are no theoretical arguments (apart

from simplicity and conceptual meaning as measures of central tendency and dispersion)

that favor the mean and variance over, e.g., fractional moments of small or even

negative order. For example, if the second moment is likely to be preserved, then one could think

that the square root moment is more likely to be preserved as it is more robust to outliers.

Additionally, we can relate low order fractional moments with the ln x function, as it is well

known that

$$\lim_{q \to 0} \frac{x^q - 1}{q} = \ln x \qquad (9)$$

Thus, we may say that the function $x^q$ for small values of q behaves similarly to $\ln x$, thus

exhibiting properties similar to those of the logarithmic function described above.

Based on this reasoning we deem that, instead of choosing the order of moments a

priori, it is better to let the order unspecified, so that any value can be a posteriori chosen,

including small fractional values. This leads to imposing as a constraint any moment $m_q$ of

order q, i.e.,

$$m_q = E[X^q] = \int_0^\infty x^q f_X(x)\,\mathrm{d}x \qquad (10)$$

One reason that many entropy generalizations have emerged was to explain many

empirically detected deviations from exponential type distributions that arise from the BGS

entropy using standard moment constraints. Yet, generalized entropy measures have been

criticized for lacking theoretical consistency and for being arbitrary, a reasonable argument

considering the large number of entropy generalizations available in the literature. Here,

instead of using generalized entropy measures that may result in power-law distributions, we

generalize the important notion of moments inspired by the limiting definition of the

exponential function, i.e., $\exp(x^q) = \lim_{p \to 0} (1 + p x^q)^{1/p}$. We first define the function $x_p^q$ as

$$x_p^q := \ln(1 + p x^q)/p \qquad (11)$$

which for p = 0 becomes the familiar power function $x^q$, as $x_0^q = \lim_{p \to 0} \ln(1 + p x^q)/p = x^q$.

Thus, we can define a generalization of the classical moments, for which we use the name p-

moments, by

$$m_p^q = E[X_p^q] = \frac{1}{p} \int_0^\infty \ln(1 + p x^q) f_X(x)\,\mathrm{d}x \qquad (12)$$

Arguably, this generalization is arbitrary and many other moment generalizations can be (and

in fact are) constructed. Nonetheless, we believe that there is a rationale that supports the use

of p-moments, which can be summarized as follows: (a) if generalized entropy measures,

considered by many as arbitrary, have been successfully used, then there is no reason to avoid

using generalized moments; (b) maximization of the BGS entropy using p-moments leads, as

will become apparent in the next section, to flexible power-type distributions (including the

Pareto and Tsallis distributions for q = 1 and q = 2, respectively); (c) p-moments are simple

and, for p = 0, become identical to the ordinary moments; and (d) they are based on the

$x_p^q$ function that exhibits all the desired properties, like those of the $\ln x$ function described

above, and thus are suitable for positively skewed RVs; additionally, compared to

$E[\ln X]$, they are always positive.
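The properties listed above can be checked directly from definitions (11) and (12). The following sketch evaluates the generalized power function and the sample counterpart of a p-moment; the function names and the sample values are hypothetical, and the sample average is used here as the natural estimator of the expectation in (12), which is an assumption of this illustration.

```python
import math


def x_pq(x, q, p):
    """Generalized power function of eq. (11): ln(1 + p x^q) / p, with x^q at p = 0."""
    if p == 0.0:
        return x ** q
    return math.log1p(p * x ** q) / p


def sample_p_moment(sample, q, p):
    """Sample counterpart of eq. (12): the average of x_p^q over the sample."""
    return sum(x_pq(x, q, p) for x in sample) / len(sample)


sample = [0.4, 1.3, 2.0, 7.5, 60.0]   # hypothetical positive values
q = 0.8
for p in (1.0, 0.1, 0.001, 0.0):
    print(f"p = {p}: p-moment = {sample_p_moment(sample, q, p):.4f}")
```

As p decreases the p-moment approaches the ordinary moment of order q, and for p > 0 the logarithm damps the contribution of the largest value.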

2.4 The resulting entropy distributions

Entropy optimization can be accomplished in many different combinations of the previously

defined constraints; however, here we use two simple combinations of the aforementioned

constraints based on the type and the generality of the distributions that emerge. We combine

the $E[\ln X]$ constraint, first, with classical moments, and second, with p-moments, leaving the

moment order arbitrary in both cases.

In the first case, the maximization of the BGS entropy, given in (1), with constraints (8)

and (10) results in the density function

$$f_X(x) = \exp\left(-\lambda_0 - \lambda_1 \ln x - \lambda_2 x^q\right) \qquad (13)$$

which after algebraic manipulations and parameter renaming can be written as

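$$f_X(x) = \frac{\gamma_2}{\beta\, \Gamma(\gamma_1/\gamma_2)} \left(\frac{x}{\beta}\right)^{\gamma_1 - 1} \exp\left[-\left(\frac{x}{\beta}\right)^{\gamma_2}\right], \quad x \geq 0 \qquad (14)$$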

corresponding to the distribution function

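$$F_X(x) = 1 - \Gamma\left(\frac{\gamma_1}{\gamma_2}, \left(\frac{x}{\beta}\right)^{\gamma_2}\right) \Big/ \Gamma\left(\frac{\gamma_1}{\gamma_2}\right) \qquad (15)$$

where $\Gamma(\cdot)$ is the Gamma function and $\Gamma(\cdot,\cdot)$ is the upper incomplete Gamma function.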

This distribution, commonly attributed to Stacy [19], appeared much earlier in the

literature in the works of Amoroso around 1920 [10], and seems to have been rediscovered

many times under different forms (see e.g., [10]). Here, we use a slightly different form from

that proposed by Stacy. Essentially, it is a generalization of the Gamma distribution and will

be denoted by $\mathrm{GG}(\beta, \gamma_1, \gamma_2)$, or simply GG. It is a very flexible distribution that includes

many other well-known distributions as particular cases, e.g., the Gamma, the Weibull, the

Exponential, or even the Chi-squared distributions and others.

The distribution includes the scale parameter $\beta > 0$, and the shape parameters $\gamma_1 > 0$

and $\gamma_2 > 0$. The parameter $\gamma_1$ controls the behavior of the left tail, i.e., if $0 < \gamma_1 < 1$ the density

function is J-shaped and for $x \to 0$, $f_X(x) \to \infty$; if $\gamma_1 > 1$ the density function is bell-shaped

and mainly positively skewed; yet, for certain values of $\gamma_1$ and $\gamma_2$ it can be symmetric or even

negatively skewed, and for $x = 0$, $f_X(x) = 0$; finally, for $\gamma_1 = 1$ the distribution degenerates

to a generalized exponential function and for $x = 0$, $f_X(0) < \infty$. The parameter $\gamma_2$ is very

important as for fixed $\gamma_1$ it controls the behavior of the right tail, i.e., it determines the

frequency and the magnitude of the extreme events. Generally and loosely speaking, for

$\gamma_2 < 1$ the distribution can be characterized as sub-exponential or heavy-tailed, and for $\gamma_2 > 1$

as hyper-exponential or light-tailed (for a classification of distribution tails see [3]).
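The special cases and the role of the tail parameter can be verified numerically from equation (14). The sketch below uses only the Python standard library; the comparison densities (`gamma_pdf`, `weibull_pdf`) and the chosen parameter values are illustrative assumptions, not values taken from the study.

```python
import math


def gg_pdf(x, beta, g1, g2):
    """Generalized Gamma density of eq. (14) for x > 0."""
    t = x / beta
    return g2 / (beta * math.gamma(g1 / g2)) * t ** (g1 - 1.0) * math.exp(-(t ** g2))


def gamma_pdf(x, beta, shape):
    """Two-parameter Gamma density (scale beta), used only for comparison."""
    t = x / beta
    return t ** (shape - 1.0) * math.exp(-t) / (beta * math.gamma(shape))


def weibull_pdf(x, beta, shape):
    """Weibull density (scale beta), used only for comparison."""
    t = x / beta
    return shape / beta * t ** (shape - 1.0) * math.exp(-(t ** shape))


x = 3.0
print(gg_pdf(x, 2.0, 0.7, 1.0), gamma_pdf(x, 2.0, 0.7))      # g2 = 1 gives the Gamma
print(gg_pdf(x, 2.0, 1.4, 1.4), weibull_pdf(x, 2.0, 1.4))    # g1 = g2 gives the Weibull

# Right tail for fixed beta and g1: a smaller g2 leaves more density at large x
for g2 in (2.0, 1.0, 0.53):
    print(g2, gg_pdf(10.0, 1.0, 1.0, g2))
```

The last loop keeps the scale and the first shape parameter fixed, so the differences in density at x = 10 are due to the tail parameter alone.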

Notably, the distribution is also valid if the shape parameters are simultaneously

negative (a generalized inverse Gamma distribution); however, the distribution loses some

important shape characteristics and seems not suitable for geophysical RVs like rainfall. Thus,

here the distribution is only considered for positive shape parameters.

In the second case, the maximization of the BGS entropy with constraints (8) and (12)

results in the density function

$$f_X(x) = \exp\left(-\lambda_0 - \lambda_1 \ln x - \lambda_2 \ln(1 + p x^q)/p\right) \qquad (16)$$

which after algebraic manipulations and parameter renaming can be written as

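$$f_X(x) = \frac{\gamma_3}{\beta\, \mathrm{B}(\gamma_1, \gamma_2)} \left(\frac{x}{\beta}\right)^{\gamma_1 \gamma_3 - 1} \left[1 + \left(\frac{x}{\beta}\right)^{\gamma_3}\right]^{-(\gamma_1 + \gamma_2)}, \quad x \geq 0 \qquad (17)$$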

corresponding to the distribution function

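$$F_X(x) = \mathrm{B}_z(\gamma_1, \gamma_2) / \mathrm{B}(\gamma_1, \gamma_2), \quad \text{where } z = \left[1 + (x/\beta)^{-\gamma_3}\right]^{-1} \qquad (18)$$

where $\mathrm{B}(\cdot,\cdot)$ is the Beta function and $\mathrm{B}_z(\cdot,\cdot)$ is the incomplete Beta function.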

This distribution has not been formed earlier on a similar rationale, yet, a search in the

literature reveals that it has been rediscovered many times under different names and

parameterizations. It is most commonly known as the Generalized Beta of the second kind,

hereafter denoted as $\mathrm{GB2}(\beta, \gamma_1, \gamma_2, \gamma_3)$, or simply GB2. It seems that Mielke and Johnson [14]

were the first to form this distribution, and proposed it for describing hydrological and

meteorological variables. It has also been used in different disciplines, e.g., McDonald [12]

used the GB2 as an income distribution. Nevertheless, the distribution can be considered as a

simple generalization of many well-known and much earlier introduced distributions, e.g., the

F-distribution or the Pearson type VI of the celebrated Pearson system.

The GB2 distribution is a very flexible four-parameter distribution with $\beta > 0$ being the

scale parameter, and $\gamma_1 > 0$, $\gamma_2 > 0$ and $\gamma_3 > 0$ being the three shape parameters, allowing the

distribution to form very many different shapes. The GB2 distribution includes as special or

limiting cases many of the well-known distributions, e.g., the Beta of the second kind, the

Pareto type II, the Loglogistic, the Burr type XII, even the Generalized Gamma (for a

complete account see [10,13]).
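The statement that the GG is a limiting case of the GB2 can be illustrated numerically. The sketch below evaluates density (17) in log space and lets $\gamma_2$ grow while rescaling the scale parameter as $\beta\,\gamma_2^{1/\gamma_3}$; this particular rescaling is an assumption of the illustration (one common way to take the limit, not a step spelled out in the text), under which the GB2 density approaches $\mathrm{GG}(\beta, \gamma_1\gamma_3, \gamma_3)$.

```python
import math


def gb2_pdf(x, beta, g1, g2, g3):
    """GB2 density of eq. (17) for x > 0, evaluated in log space for stability."""
    t = x / beta
    log_beta_fn = math.lgamma(g1) + math.lgamma(g2) - math.lgamma(g1 + g2)
    log_pdf = (math.log(g3) - math.log(beta) - log_beta_fn
               + (g1 * g3 - 1.0) * math.log(t)
               - (g1 + g2) * math.log1p(t ** g3))
    return math.exp(log_pdf)


def gg_pdf(x, beta, g1, g2):
    """Generalized Gamma density of eq. (14) for x > 0."""
    t = x / beta
    return g2 / (beta * math.gamma(g1 / g2)) * t ** (g1 - 1.0) * math.exp(-(t ** g2))


beta, g1, g3, x = 2.0, 1.5, 0.8, 5.0
target = gg_pdf(x, beta, g1 * g3, g3)
for g2 in (10.0, 1e3, 1e6):
    rescaled_beta = beta * g2 ** (1.0 / g3)   # assumed rescaling, see the note above
    print(g2, gb2_pdf(x, rescaled_beta, g1, g2, g3), target)
```

The printed GB2 values move towards the GG value as the second shape parameter increases.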

Obviously, the flexibility of the GB2 distribution makes it a good model for describing

rainfall—we have already used the GB2, under the name JH distribution, to describe the

rainfall in a large range of timescales [15] and to construct theoretically consistent IDF curves

[16]. Nonetheless, as a general rule based on the principle of parsimony, a three-parameter

model is preferable to a four-parameter model, provided that the simpler model describes

the data adequately. Additionally, it is not reasonable to compare the performance of the GG

distribution, which is a three-parameter model, with GB2, which is a four-parameter model.

Thus, a simpler form of the GB2 distribution is selected based on its flexibility and its simple

analytical expression of the distribution function, and consequently, of the quantile function.

A simple three-parameter form of GB2 is derived by setting $\gamma_1 = 1$ in equation (17). By

renaming the parameters and after algebraic manipulations we obtain a distribution known as

the Burr type XII [1] (denoted hereafter as BurrXII), which was introduced by Burr in 1942 in

the framework of a distribution system similar to Pearson’s. Its probability density function is

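$$f_X(x) = \frac{1}{\beta} \left(\frac{x}{\beta}\right)^{\gamma_1 - 1} \left[1 + \gamma_2 \left(\frac{x}{\beta}\right)^{\gamma_1}\right]^{-\frac{1}{\gamma_1 \gamma_2} - 1}, \quad x \geq 0 \qquad (19)$$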

and its distribution function is

$$F_X(x) = 1 - \left[1 + \gamma_2 \left(\frac{x}{\beta}\right)^{\gamma_1}\right]^{-\frac{1}{\gamma_1 \gamma_2}} \qquad (20)$$

The BurrXII distribution is a flexible power-type distribution that comprises the scale

parameter $\beta > 0$ and the shape parameters $\gamma_1 > 0$ and $\gamma_2 \geq 0$.

The form of the BurrXII distribution we use here is not the one found in the literature (see

e.g. [20]). We preferred the expression (19) because it is suggestive of a generalization of the

familiar Weibull distribution (for $\gamma_2 \to 0$) and also because the asymptotic behavior of the

right tail is solely controlled by the parameter $\gamma_2$ (for large values of X,

$P\{X > x\} \sim \gamma_2^{-1/(\gamma_1 \gamma_2)} (x/\beta)^{-1/\gamma_2}$, as follows from equation (20)). The distribution has finite variance for

$0 \leq \gamma_2 < 0.5$ and finite mean for $0 \leq \gamma_2 < 1$. Finally, the shape parameter $\gamma_1$ controls the left tail: for

$0 < \gamma_1 < 1$ the distribution is J-shaped, for $\gamma_1 > 1$ it is bell-shaped, and for $\gamma_1 = 1$ it degenerates to the

familiar Pareto type II distribution.
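The simple analytical form of the BurrXII distribution function makes its quantile function available in closed form. The sketch below inverts equation (20) and checks the Weibull limit and the power-law tail numerically; the quantile expression and the tail constant are obtained here by elementary algebra from (20), and the parameter values are hypothetical.

```python
import math


def burr_sf(x, beta, g1, g2):
    """Exceedance probability 1 - F_X(x) from eq. (20), for x >= 0 and g2 > 0."""
    return (1.0 + g2 * (x / beta) ** g1) ** (-1.0 / (g1 * g2))


def burr_cdf(x, beta, g1, g2):
    """BurrXII distribution function of eq. (20)."""
    return 1.0 - burr_sf(x, beta, g1, g2)


def burr_quantile(u, beta, g1, g2):
    """Inverse of eq. (20): the value x with F_X(x) = u, for 0 <= u < 1."""
    return beta * (((1.0 - u) ** (-g1 * g2) - 1.0) / g2) ** (1.0 / g1)


def weibull_limit_cdf(x, beta, g1):
    """Limit of eq. (20) as g2 -> 0."""
    return 1.0 - math.exp(-((x / beta) ** g1) / g1)


beta, g1, g2 = 10.0, 0.9, 0.19
x99 = burr_quantile(0.99, beta, g1, g2)
print(x99, burr_cdf(x99, beta, g1, g2))   # round trip through the quantile
print(burr_cdf(25.0, beta, g1, 1e-8), weibull_limit_cdf(25.0, beta, g1))

# Right tail: the exceedance probability decays like x**(-1/g2)
x_large = 1e6
exact = burr_sf(x_large, beta, g1, g2)
approx = g2 ** (-1.0 / (g1 * g2)) * (x_large / beta) ** (-1.0 / g2)
print(exact, approx)
```

The exceedance probability is computed directly rather than as one minus the distribution function, because the latter loses all precision far in the tail.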

3. Application

To test the applicability of the above theoretical framework, a large data set of daily rainfall

observations was used. This is a subset of the Global Historical Climatology Network – Daily

database (http://www.ncdc.noaa.gov/oa/climate/ghcn-daily) that includes data recorded at

over 43 000 stations. A total of 11 519 stations were selected from this database with the

following criteria: (a) record length of over 50 years; (b) percentage of missing values less

than 1%; and (c) percentage of flags for suspect quality less than 1% (see the above web site

for the details about flags). The locations of the selected stations are shown in Fig. 1.
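The three selection criteria translate into a simple filter. The sketch below assumes that each station has already been summarized by its record length and by the fractions of missing and flagged values; the `StationSummary` structure, its field names, and the example stations are hypothetical and do not reflect the actual GHCN-Daily file format.

```python
from dataclasses import dataclass


@dataclass
class StationSummary:
    """Hypothetical per-station summary; field names are illustrative only."""
    station_id: str
    record_years: float       # record length in years
    missing_fraction: float   # share of missing daily values
    flagged_fraction: float   # share of values flagged as suspect


def passes_criteria(s, min_years=50.0, max_missing=0.01, max_flagged=0.01):
    """Criteria (a) to (c): long record, few missing values, few suspect flags."""
    return (s.record_years > min_years
            and s.missing_fraction < max_missing
            and s.flagged_fraction < max_flagged)


stations = [
    StationSummary("A", 112.0, 0.002, 0.000),
    StationSummary("B", 48.5, 0.001, 0.001),    # record too short
    StationSummary("C", 75.0, 0.030, 0.002),    # too many missing values
    StationSummary("D", 61.0, 0.004, 0.015),    # too many suspect flags
]
selected = [s.station_id for s in stations if passes_criteria(s)]
print(selected)
```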

For each of the 11 519 daily rainfall samples we tested the suitability of the

entropy-derived distributions described above, using evaluation tools based on the L-moments ratio

plots. A typical approach to identify a suitable distribution for one or more samples is to

compare empirically derived L-ratios with the respective theoretical quantities. The latter are

plotted on a graph as a point, line or area (depending on the number and the type of the

distributional parameters). The most common plot of this type is the L-kurtosis vs. L-

skewness plot. Nevertheless, here we prefer the L-skewness vs. L-variation plot because, as

we deal with positive RVs, the L-variation is meaningful (nonzero and nonnegative, with

values ranging from 0 to 1), and evidently its estimation is more robust than that of the L-

kurtosis.
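The empirical coordinates of a record on the L-skewness vs. L-variation plot can be computed as sketched below. The code uses the standard unbiased estimators based on probability weighted moments; whether the study used exactly these estimators is an assumption, and the wet-day values are hypothetical.

```python
def sample_l_ratios(sample):
    """Return (mean, L-variation, L-skewness) from unbiased sample L-moments."""
    x = sorted(sample)
    n = len(x)
    if n < 3:
        raise ValueError("at least three observations are needed")
    # Probability weighted moments b0, b1, b2 (i is the 1-based rank)
    b0 = sum(x) / n
    b1 = sum((i - 1) / (n - 1) * xi for i, xi in enumerate(x, start=1)) / n
    b2 = sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * xi
             for i, xi in enumerate(x, start=1)) / n
    # First three L-moments and the two ratios used in the plot
    l1, l2, l3 = b0, 2.0 * b1 - b0, 6.0 * b2 - 6.0 * b1 + b0
    return l1, l2 / l1, l3 / l2


print(sample_l_ratios([1.0, 2.0, 3.0, 4.0, 5.0]))   # symmetric sample: zero L-skewness
wet_days = [0.3, 0.5, 0.8, 1.0, 1.5, 2.3, 4.1, 6.6, 12.0, 45.2]  # hypothetical
print(sample_l_ratios(wet_days))
```

For a positive sample the L-variation lies between 0 and 1, which is the property that makes this plot convenient for rainfall.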

The most general distribution described above is the four parameter GB2 distribution,

whose graphical depiction in terms of L-ratio plots is not feasible as different parameter sets

may correspond to the same point on an L-ratio plot. Therefore, we made plots for the three-

parameter special cases, i.e. the GG and the BurrXII distributions, which have one scale

parameter and two shape parameters. Evidently, the suitability of one of those two entails also

the suitability of the more general GB2 distribution. Each of the GG and the BurrXII

distributions forms an area on the L-ratio plot. The area was depicted by drawing several

theoretical lines that correspond to specific values of the tail parameter $\gamma_2$ (with varying

parameter $\gamma_1$). The theoretical points of L-variation and L-skewness were calculated

numerically based on the integral definition of the L-moments [5].
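One way to obtain such theoretical points is to integrate the quantile function against the shifted Legendre polynomials. The sketch below does this with a plain midpoint rule for the BurrXII quantile function obtained by inverting equation (20); the integration scheme, the number of nodes, and the parameter values are assumptions of this illustration and not necessarily the procedure used in the study.

```python
import math


def l_ratios_from_quantile(quantile, n=100_000):
    """L-variation and L-skewness by midpoint integration of the quantile function."""
    h = 1.0 / n
    lam1 = lam2 = lam3 = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        xu = quantile(u)
        lam1 += xu                                   # weight P0*(u) = 1
        lam2 += xu * (2.0 * u - 1.0)                 # weight P1*(u)
        lam3 += xu * (6.0 * u * u - 6.0 * u + 1.0)   # weight P2*(u)
    lam1, lam2, lam3 = lam1 * h, lam2 * h, lam3 * h
    return lam2 / lam1, lam3 / lam2


def burr_quantile(u, g1, g2, beta=1.0):
    """Inverse of the BurrXII distribution function, eq. (20)."""
    return beta * (((1.0 - u) ** (-g1 * g2) - 1.0) / g2) ** (1.0 / g1)


# One theoretical line: fixed tail parameter g2, varying g1
g2 = 0.2
for g1 in (0.5, 0.75, 1.0, 1.5, 2.0):
    tau2, tau3 = l_ratios_from_quantile(lambda u: burr_quantile(u, g1, g2))
    print(f"g1 = {g1}: L-variation = {tau2:.3f}, L-skewness = {tau3:.3f}")
```

Both ratios are independent of the scale parameter, so the scale is fixed at 1; repeating the loop for several values of the tail parameter traces out the area covered by the distribution.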

Fig. 2 and Fig. 3 depict the theoretical area of the GG and BurrXII distributions,

respectively, in the L-skewness vs. L-variation plot, as well as the empirical points of the

corresponding sample statistics of the 11 519 records. In Fig. 2, 97.6% of the empirical points

lie in the theoretical area of the GG distribution for a tail parameter $0.1 \leq \gamma_2 \leq 2$. It is worth

noting that the vast majority of the empirical points are located above the line of the Gamma

distribution ($\gamma_2 = 1$), resulting in an average tail parameter $\gamma_2 = 0.53$. This indicates that the

Gamma distribution, which has probably been the most common model for daily rainfall,

should not be uncritically adopted as it would seriously underestimate the frequency and the

magnitude of extreme events. Specifically, for $\gamma_2 < 1$ (as observed in the vast majority of

cases of Fig. 2), the GG distribution exhibits a tail heavier than that of the Gamma

distribution. In addition, in points where the GG distribution is unsuitable, a heavier type of

distribution tail is needed.

Regarding the BurrXII distribution in Fig. 3, 87.7% of the empirical points lie in a space

corresponding to a tail parameter $0 \leq \gamma_2 \leq 0.7$. The average value of the tail parameter is

$\gamma_2 = 0.19$, meaning that approximately only moments of up to order 1/0.19 ≈ 5 exist. Notably,

the empirical points that the BurrXII fails to describe can be described by the GG distribution

and vice versa. Particularly, the empirical points in Fig. 3 that lie above the line $\gamma_2 = 0.5$

(more or less the ones that the GG distribution fails to describe) correspond to BurrXII

distributions with infinite variance and thus are hard to describe by an exponential-type

distribution, all of whose moments are finite.

An additional observation, which concerns both distributions and is also recognized empirically

(see, for example, the empirical densities in Fig. 2), is that a large percentage of the empirical

points of daily rainfall are better described by bell-shaped distributions (highly positively

skewed though) rather than J-shaped. Thus, a general model for rainfall should accommodate

both J-shaped and bell-shaped densities. This excludes simple models like the Exponential,

the Pareto, and the Lognormal distributions (commonly used as rainfall models) because the

first two can only be J-shaped and the third only bell-shaped. Finally, as mentioned before,

both distributions are special cases of the GB2 distribution and thus the GB2 distribution

describes 100% of the empirical points, being thus, a suitable model for rainfall, not only for

the daily scale examined here, but also for finer (e.g. sub-hourly) and coarser (e.g. annual)

scales [15].

4. Summary and conclusions

In order to derive statistical distributions suitable for geophysical processes, and particularly

for rainfall, we propose a rationale for defining and selecting constraints within a BGS

entropy maximization framework. Entropy maximization offers a solid theoretical basis for

identifying a probabilistic law based on the available information, in contrast to the common

technique of choosing a distribution from a repertoire based on trial-and-error methods. This

rationale is based on the premises that the constraints should be as few and simple as possible

and incorporate prior information on the process of interest. This prior information may

concern the general shapes of densities and could be obtained by studying the process

worldwide. We studied and justified conceptually three particular constraints, related to the

logarithmic and power functions, which are suitable for positive, highly varying and

asymmetric RVs. Namely, the constraints are the expected values of (a) $\ln x$; (b) $x^q$; and (c) $\ln(1 + p x^q)/p$. The last constraint generalizes the classical moments and naturally leads to power-type distributions, avoiding generalized entropy measures.
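The two statements about constraint (c) can be made concrete with a short sketch. The notation is generic: the Lagrange multipliers $\lambda_i$ and the normalizing constant $C$ are symbols assumed here and are not necessarily those used elsewhere in the paper, and the pairing of constraints shown is assumed to be the one intended.

```latex
% Constraint (c) recovers the classical moment of order q in the limit p -> 0
\lim_{p \to 0} \frac{\ln\left(1 + p\,x^{q}\right)}{p} = x^{q}

% Generic BGS maximum entropy solution for constraints E[g_i(x)] = c_i
f(x) = \exp\Big(-\lambda_0 - \sum_i \lambda_i\, g_i(x)\Big)

% Constraints (a) and (b): exponential-type density
f(x) = C\, x^{-\lambda_1} \exp\left(-\lambda_2\, x^{q}\right)

% Constraints (a) and (c): power-type density
f(x) = C\, x^{-\lambda_1} \left(1 + p\,x^{q}\right)^{-\lambda_2 / p}
```

As $p \to 0$ the factor $(1 + p\,x^{q})^{-\lambda_2/p}$ tends to $\exp(-\lambda_2 x^{q})$, so the power-type form reduces to the exponential-type form in that limit.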

The BGS entropy maximization under two combinations of these constraints leads to two flexible distributions, i.e., a three-parameter exponential type, known as the Generalized Gamma (GG), and a four-parameter power type, known as the Generalized Beta of the second kind (GB2); the former is a particular limiting case of the latter. Another three-parameter model, known as the Burr type XII (power type), which is easily derived from the GB2, also proves useful. In order to evaluate the performance of the three-parameter entropy-derived distributions, we used a very large database with thousands of daily rainfall records across the world. We formed the theoretical area of those distributions in an L-skewness vs. L-variation plot and compared it with the corresponding sample statistics of 11 519 daily rainfall records. Both the GG and BurrXII distributions performed very well, describing 97.6% and 87.7%, respectively, of the empirical points. Notably, the two distributions are complementary in the sense that empirical points that cannot be described by one can be described by the other. Consequently, as both distributions are special cases of the GB2, the latter can describe 100% of the empirical points and is thus a model suitable for all daily rainfall records.
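For readers who wish to place a record of their own on an L-skewness vs. L-variation diagram, the sketch below computes the two sample statistics from the unbiased probability weighted moment estimators described by Hosking [5]. The function names and the example values are hypothetical, and this is not necessarily the exact procedure applied to the 11 519 records.

```python
def sample_l_moments(data):
    """Return the first three sample L-moments (l1, l2, l3).

    Uses the unbiased probability weighted moment estimators b0, b1, b2
    computed from the sample sorted in ascending order.
    """
    x = sorted(data)
    n = len(x)
    if n < 3:
        raise ValueError("at least three values are needed")
    # With zero-based index i, (rank - 1) = i and (rank - 1)(rank - 2) = i(i - 1).
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    l1 = b0
    l2 = 2.0 * b1 - b0
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    return l1, l2, l3


def l_ratios(data):
    """Return (L-variation, L-skewness) = (l2 / l1, l3 / l2)."""
    l1, l2, l3 = sample_l_moments(data)
    return l2 / l1, l3 / l2


if __name__ == "__main__":
    # Hypothetical daily rainfall depths in mm, for illustration only.
    rainfall = [0.3, 0.5, 0.8, 1.2, 2.0, 2.5, 4.1, 7.6, 12.4, 35.0]
    l_variation, l_skewness = l_ratios(rainfall)
    print("L-variation:", l_variation, "L-skewness:", l_skewness)
```

The resulting pair of values is one empirical point; whether it falls inside the theoretical area of a given distribution still has to be checked against that distribution's L-moment relationships.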

Both the empirical analysis of this massive number of records and the distributions tested lead to two useful conclusions regarding the shape characteristics of the daily rainfall distribution. First, a suitable distribution for daily rainfall must be able to form both J- and bell-shaped densities, with the J-shaped densities also having the property that $f_X(x) \to \infty$ as $x \to 0$. This excludes commonly used models like the Exponential, the Pareto or the Lognormal distributions and many more. Of course, these distributions may be, and actually are, suitable for particular cases or for specific rainfall ranges (above certain thresholds), but they cannot be proposed as general models for daily rainfall. Second, regarding the right distribution tail (a very important feature as it controls the behavior of extremes), the analysis showed that heavy-tailed distributions describe the vast majority of the empirical points better than light-tailed distributions. Consequently, the Gamma distribution (probably the most common daily rainfall model) is rejected as a general model for daily rainfall, because in the majority of rainfall records it would seriously underestimate the extreme events.
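The practical consequence of the tail type can be sketched numerically. The example below compares the exceedance probability of an Exponential distribution (the Gamma special case with unit shape, used here because its survival function has a closed form) with that of a Burr type XII distribution having the same mean. The Burr type XII parametrization, the function names and the parameter values are assumptions made for illustration only.

```python
import math


def exponential_sf(x, scale):
    """Exceedance probability P(X > x) of an Exponential distribution."""
    return math.exp(-x / scale)


def burr_xii_sf(x, beta, g1, g2):
    """Exceedance probability of a Burr type XII distribution.

    ASSUMED parametrization: P(X > x) = (1 + g2 * (x / beta)**g1) ** (-1 / (g1 * g2)),
    which decays like a power law with exponent 1 / g2 for large x.
    """
    return (1.0 + g2 * (x / beta) ** g1) ** (-1.0 / (g1 * g2))


if __name__ == "__main__":
    # Hypothetical parameters. For g1 = 1 the Burr type XII mean is
    # beta / (1 - g2), so the exponential scale is set to the same value.
    beta, g1, g2 = 1.0, 1.0, 0.2
    mean = beta / (1.0 - g2)
    for x in (1.0, 5.0, 10.0, 20.0):
        light = exponential_sf(x, mean)
        heavy = burr_xii_sf(x, beta, g1, g2)
        print("x = %5.1f  light tail: %.3e  heavy tail: %.3e  ratio: %.1f"
              % (x, light, heavy, heavy / light))
```

For large values the light-tailed model assigns exceedance probabilities that are orders of magnitude smaller than the power-type model, which is the sense in which an exponential-type tail would underestimate extremes when the data are heavy-tailed.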

Evidently, rainfall in different places of Earth is influenced by local characteristics such

as climate, topography, distance from the sea and many more. The diversity of such

characteristics produces different rainfall patterns across the globe. This does not contradict

our finding that a single flexible probabilistic law (the GB2 distribution) or simpler special

cases thereof (the GG and BurrXII distributions) can model rainfall over all examined cases.

The diversity of characteristics is rather reflected in the diversity of shapes that the GB2

distribution can produce, as well as in the wide range of feasible parameter values.

References

[1] Burr IW. Cumulative Frequency Functions. The Annals of Mathematical Statistics.

1942;13(2):215-232.

[2] Esteban MD, Morales D. A summary on entropy statistics. Kybernetika.

1995;31(4):337-346.

[3] Goldie CM, Klüppelberg C. Subexponential distributions. A Practical Guide to Heavy

Tails: Statistical Techniques and Applications. 1998:435–459.

[4] Havrda J, Charvát F. Concept of structural a-entropy. Kybernetika. 1967;3:30–35.

[5] Hosking JR. L-moments: analysis and estimation of distributions using linear

combinations of order statistics. Journal of the Royal Statistical Society. Series B

(Methodological). 1990;52(1):105–124.

[6] Jaynes ET. Information Theory and Statistical Mechanics. Phys. Rev. 1957;106(4):620.

[7] Jaynes ET. Information Theory and Statistical Mechanics. II. Phys. Rev.

1957;108(2):171.

[8] Jaynes ET. Probability: The logic of science. Cambridge University Press; 2003.

[9] Kapur JN. Maximum-entropy models in science and engineering. John Wiley & Sons;

1989.

[10] Kleiber C, Kotz S. Statistical size distributions in economics and actuarial sciences.

Wiley-Interscience; 2003.

[11] Koutsoyiannis D. Uncertainty, entropy, scaling and hydrological stochastics, 1,

Marginal distributional properties of hydrological processes and state scaling.

Hydrological Sciences Journal. 2005;50(3):381-404.

[12] McDonald JB. Some Generalized Functions for the Size Distribution of Income.

Econometrica. 1984;52(3):647-663.

[13] McDonald JB, Xu YJ. A generalization of the beta distribution with applications.

Journal of Econometrics. 1995;66(1-2):133-152.

[14] Mielke Jr PW, Johnson ES. Some generalized beta distributions of the second kind

having desirable application features in hydrology and meteorology. Water Resources

Research. 1974;10(2):223–226.

[15] Papalexiou SM, Koutsoyiannis D. Probabilistic description of rainfall intensity at

multiple time scales. In: IHP 2008 Capri Symposium: “The Role of Hydrology in Water

Resources Management”; 2008. Available at: http://www.itia.ntua.gr/en/docinfo/884/

[Accessed October 6, 2010].

[16] Papalexiou SM, Koutsoyiannis D. Ombrian curves in a maximum entropy framework.

In: European Geosciences Union General Assembly 2008; 2008:00702. Available at:

http://www.itia.ntua.gr/en/docinfo/851/ [Accessed October 6, 2010].

[17] Shannon CE. The mathematical theory of communication. Bell System Technical

Journal. 1948;27:379–423.

[18] Singh VP. Entropy theory for derivation of infiltration equations. Water Resour. Res.

2010;46:20.

[19] Stacy EW. A Generalization of the Gamma Distribution. The Annals of Mathematical

Statistics. 1962;33(3):1187-1192.

[20] Tadikamalla PR. A Look at the Burr and Related Distributions. International Statistical

Review / Revue Internationale de Statistique. 1980;48(3):337-344.

[21] Tsallis C. Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical

Physics. 1988;52(1):479-487.

Figures

Fig. 1 Locations of the raingauge stations of the study, which are a subset of the Global Historical Climatology

Network-Daily database containing those stations with daily rainfall record length of over 50 years (a total of

11 519 stations with very few missing values).

Fig. 2 Theoretical relationships of L-skewness vs. L-variation of the Generalized Gamma distribution and empirical points of the corresponding sample statistics of the 11 519 records. 97.6% of the empirical points lie in the space of the GG for $0.1 < \gamma_2 < 2$. For two of the records with the indicated positions on the L-moments diagram the empirical probability density functions are also shown. [In-figure annotations: average of all records; Gamma line ($\gamma_2 = 1$).]

Fig. 3 Theoretical relationships of L-skewness vs. L-variation of the Burr type XII distribution and empirical points of the corresponding sample statistics of the 11 519 records. 87.7% of the empirical points lie in the space of the BurrXII for $0 < \gamma_2 < 0.7$. For two of the records with the indicated positions on the L-moments diagram the empirical probability density functions are also shown. [In-figure annotations: average of all records; Pareto line.]