Entropy based derivation of probability distributions: A case study to
daily rainfall
SimonMichael Papalexiou, Demetris Koutsoyiannis
Department of Water Resources, Faculty of Civil Engineering, National Technical University
of Athens, Heroon Polytechneiou 5, GR-157 80 Zographou, Greece (sp@itia.ntua.gr)
Abstract
The principle of maximum entropy, along with empirical considerations, can provide a
consistent basis for constructing a probability distribution model for highly varying
geophysical processes. Here we examine the potential of using this principle with the
Boltzmann-Gibbs-Shannon entropy definition in the probabilistic modelling of rainfall in
different areas worldwide. We define and theoretically justify specific simple and general
entropy maximization constraints which lead to two flexible distributions, i.e., the three-
parameter Generalized Gamma (GG) and the four-parameter Generalized Beta of the second
kind (GB2), with the former being a particular limiting case of the latter. We test the
theoretical results in 11 519 daily rainfall records across the globe. The GB2 distribution
seems to be able to describe all empirical records while two of its specific three-parameter
cases, the GG and the Burr Type XII distributions perform very well by describing the 97.6%
and 87.7% of the empirical records, respectively.
Keywords: maximum entropy; daily rainfall; Generalized Gamma distribution; Generalized
Beta distribution; Burr distribution
1. Introduction
Even though long-term predictions of rainfall are not possible in deterministic terms (e.g.,
weather forecasts are skillful for no more than a week ahead), in probabilistic terms it is
possible to assign a stochastic model or a probabilistic law, and to assign to any rainfall
amount a return period or a probability of exceedance. Actually, most infrastructures affected by
rainfall and flood are designed this way. Rainfall is generally characterized as an intermittent
stochastic process (for fine timescales), with a mixed-type marginal distribution, partly
discrete and partly continuous. The discrete part is concentrated at zero and defines the
probability dry, while the rest is continuously spread over the positive real axis and
determines the nonzero rainfall distribution. The discrete part of the rainfall distribution can
be easily estimated as the ratio of the number of dry days to total number of days. On the
contrary, the continuous part of the distribution cannot be easily assessed.
Rainfall is usually studied at many different timescales, e.g., from sub-hourly to yearly;
yet the daily timescale is one of the most convenient and important in hydrological design.
Specifically, it is the smallest timescale for which thousands of records exist with some of
them being more than a century long. Nevertheless, and although daily rainfall has been
extensively studied over the years, a search in the literature reveals that a universally accepted
model for the wet-day daily rainfall distribution does not exist. On the contrary, many
distributions have been proposed in specific studies for specific locations of the world
including, e.g., the two-parameter Gamma, which is probably the prevailing model, the two-
and three-parameter Lognormal, the Generalized Logistic, the Pearson Type III, the Pareto
and the Generalized Pareto, the three- and four-parameter Kappa distributions, and many
more.
The common method to construct an appropriate probability distribution model for
describing one or more samples is to try a variety of different models and choose the best
fitted using a particular mathematical norm, e.g., a least square error or a likelihood norm.
Nevertheless, this approach is rather naïve and laborious; first, there are (at least theoretically)
infinitely many different models to try, and second, this method does not offer any theoretical
justification for the final choice, thus making it an ad hoc empirical choice. This practice
explains why so many models have been proposed. Here, we use the principle of
maximum entropy as a solid theoretical background for constructing an appropriate
probability distribution for rainfall and for geophysical processes in general. Our study is both
theoretical, in terms of using the principle of maximum entropy and seeking the appropriate
constraints for entropy maximization, and also empirical, since we test the theoretical results
using 11 519 daily rainfall samples across the world. Our main target is to assess whether a
single generalized model could be appropriate for all rainfall records worldwide.
2. The entropic framework
2.1 Entropy measures
The concept of entropy dates back to the works of Rudolf Clausius in 1850, yet, it was
Ludwig Boltzmann around 1870 who gave entropy a statistical meaning and related it to
statistical mechanics. The concept of entropy was advanced later in the works of J. Willard
Gibbs in thermodynamics and John von Neumann in quantum mechanics, and was reintroduced in
information theory by Claude Shannon [17] in 1948, who showed that entropy is a purely
probabilistic concept, a measure of the uncertainty related to a random variable (RV).
The most famous and well-justified measure of entropy for continuous RVs is the
Boltzmann-Gibbs-Shannon (BGS) entropy, which for a non-negative RV X is

S_X = -\int_0^\infty f_X(x) \ln f_X(x)\, \mathrm{d}x \quad (1)

where f_X(x) is the probability density function of X. The BGS entropy is not the only
entropy measure. A search in the literature reveals that more than twenty different entropy
measures have been proposed, mainly generalizations of BGS entropy (for a summary of
entropy measures see [2]). Among those measures, it is worth noting the Rényi entropy,
introduced by the Hungarian mathematician Alfréd Rényi in 1961, which has been used in
many different disciplines, e.g., ecology and statistics. It is also worth noting another entropy
measure that has gained much popularity in the last decade, the Havrda-Charvat-Tsallis
(HCT) entropy. It was initially proposed by Havrda and Charvat [4], and was reintroduced
and applied to physics by Tsallis [21]. Apart from its use in physics, the HCT entropy has also
been used more recently in hydrology [11,18] as it gives rise to power-type distributions. The
HCT entropy is a generalization of the BGS entropy, given by
S_q = \frac{1}{q - 1}\left(1 - \int_0^\infty f_X(x)^q\, \mathrm{d}x\right) \quad (2)

It is easy to verify that for q \to 1 it becomes identical to the BGS entropy.
2.2 The principle of maximum entropy
The principle of maximum entropy was established, as a tool for inference under uncertainty,
by Edwin Jaynes [6,7]. In essence, the principle of maximum entropy relies on finding the
most suitable probability distribution given the available information. As Jaynes [6] expressed
it, the resulting maximum entropy distribution “is the least biased estimate possible on the
given information; i.e., it is maximally noncommittal with regard to missing information”.
In a mathematical frame, the given information used in the principle of maximum
entropy is expressed as a set of constraints formed as expectations of functions g_j(·) of X, i.e.,

E[g_j(X)] = \int_0^\infty g_j(x) f_X(x)\, \mathrm{d}x = c_j, \quad j = 1, \dots, n \quad (3)
The resulting maximum entropy distributions emerge by maximizing the selected form of
entropy with constraints (3), and with the obvious additional constraint

\int_0^\infty f_X(x)\, \mathrm{d}x = 1 \quad (4)
The maximization is accomplished by using calculus of variations and the method of
Lagrange multipliers. Particularly, the general solutions of the maximum entropy distributions
resulting from the maximization of the BGS entropy and the HCT entropy, assuming arbitrary
constraints, are, respectively,

f_X(x) = \exp\left(-\lambda_0 - \sum_{j=1}^{n} \lambda_j g_j(x)\right) \quad (5)

f_X(x) = \left[1 + (1 - q)\left(\lambda_0 + \sum_{j=1}^{n} \lambda_j g_j(x)\right)\right]^{1/(1-q)} \quad (6)

where λ_j, with j = 1,…,n, are the Lagrange multipliers linked to the constraints (3), and λ_0 is the
multiplier linked to the constraint (4), i.e., λ_0 guarantees the legitimacy of the distribution.
2.3 Justification of the constraints
It becomes clear from the above discourse that the resulting maximum entropy distribution is
uniquely defined by the choice of the imposed constraints. This implies that this choice is the
most important and determinative part of the method. Constraints express our state of
knowledge concerning a RV and should summarize all the available information from
observations or from theoretical considerations. Nevertheless, choosing constraints is not
trivial; they are introduced as expectations of RV functions without any intrinsic limitation on
the form of those functions.
So, how should we choose the appropriate constraints among an infinite number of
choices? In classical statistical mechanics, these constraints are imposed by physical
principles such as the mass, momentum and energy conservation. However, in complex
geophysical processes, these principles cannot help. In geophysical processes, the standard
procedure to assign a probability law is to study the available observations and infer the
underlying distribution without entropy considerations. However, whatever we infer in this
way, is in fact based on a small portion of the past (the available record), which may (or may
not) change in the future. Nevertheless, we can reasonably assume that some RV features may
be more likely to be approximately preserved in the future than others, e.g., coarse features
like the mean and the variance are less likely to change in the future [8] than finer features
based on higher moments (e.g., it is well known that the kurtosis coefficient is extremely
sensitive to observations and additional observations may radically alter it). Therefore, as a
first rule, constraints should be simple and express those features that are likely to be
preserved in the future.
The previous rule is rather subjective in the sense that it is difficult to distinguish between
simple and non-simple constraints or to foresee which RV quantities will be preserved.
Furthermore, the use of a particular set of “simple” constraints may lead to a distribution that
is not supported by the empirical data. Obviously, it is difficult to reject or verify the detailed
shape features of a distribution based on a small sample which apparently does not provide
the sufficient amount of information needed. Nonetheless, many geophysical processes, even
if long records do not exist for particular regions, are extensively recorded worldwide e.g.,
thousands of stations record precipitation, temperature, etc. Thus, the study of this massive
amount of information may help determine some important prior characteristics of the
underlying distribution that should be preserved, e.g., a J- or bell-shaped distribution or a
heavy- or light-tailed distribution. Therefore, constraints should be chosen not only based on
simplicity, but also on the appropriateness of the resulting distribution given the empirical
evidence.
Commonly used constraints in maximizing entropy assume known mean and variance,
i.e., known first and second moments, which are clearly two very simple constraints.
Particularly, entropy maximization assuming known first two moments leads: (a) to the
celebrated normal distribution in the BGS entropy case, or, to the truncated normal if the
mandatory constraint of non-negativity for geophysical processes is imposed, and (b) to a
symmetric bell-shaped distribution with power-type tails in the HCT entropy case, or, its
truncated version for a non-negative RV. The distribution arising in the HCT case for zero
mean is now known as the Tsallis distribution. For non-zero mean the resulting distribution is
the Pearson type VII introduced by Pearson in 1916, whose special case is the Tsallis
distribution. Both these distributions are symmetric bell-shaped, in which asymmetry can only
emerge by truncation at zero. As a consequence, those distributions may likely fail to describe
sufficiently many geophysical processes that exhibit a rich pattern of asymmetries (e.g., it is
well known that the rainfall in small time scales is heavily skewed and likely heavy tailed).
Accordingly, in this study we aim to define some simple and general constraints
alternative to those of the first two moments that lead to suitable probability distributions for
geophysical processes, particularly for rainfall. Additionally, we aim to only use the BGS
entropy, which is theoretically justified and widely accepted, avoiding the use of generalized
entropy measures.
The mean is one of the most commonly used constraints, as it is a classical measure of
central tendency. Another useful measure of central tendency, which has the property,
convenient for geophysical processes, of being defined only for positive values, is the
geometric mean μ_G. An estimate of it, from a sample of size n, is given by

\hat{\mu}_G = \left(\prod_{i=1}^{n} x_i\right)^{1/n} = \exp\left(\frac{1}{n} \sum_{i=1}^{n} \ln x_i\right) = \exp\left(\overline{\ln x}\right) \quad (7)

where the overbar stands for the sample average. The sample geometric mean (also referred to as
a constraint in [9]) is smaller than the arithmetic mean. Intuitively, this leads us to formulate
the following constraint for entropy maximization

E[\ln X] = \ln \mu_G \quad (8)
The expectation of ln X, apart from its relationship to the geometric mean and its simplicity,
makes an essential constraint for positively skewed RVs. To clarify, samples drawn from
positively skewed distributions and, even more so, drawn from heavy-tailed distributions,
exhibit values located on the right area very far from the mean value; in a sense, those values
act like outliers and consequently strongly influence the sample moments, especially those of
higher order. Therefore, it is not rational to assume that sample moments, especially based on
samples drawn from heavy-tailed distributions, are likely to be preserved. On the contrary, the
function ln x applied to this kind of samples eliminates the influence of those “extreme”
values and offers a very robust measure that is more likely to be preserved than the estimated
sample moments. Essentially for this reason, the logarithmic transformation is probably the
most common transformation used in hydrology as it tends to normalize positively skewed
data.
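The robustness argument can be illustrated with a toy sample (the numbers below are assumed, purely for illustration):

```python
import math

# One heavy-tailed "extreme" value shifts the arithmetic mean dramatically,
# while the log-based statistic behind Eq. (8) barely moves.
sample = [2.1, 0.8, 3.4, 1.2, 5.0, 0.6, 2.7, 1.9]
with_outlier = sample + [400.0]

def arith_mean(xs):
    return sum(xs) / len(xs)

def geo_mean(xs):                      # Eq. (7): exp of the mean log
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

print(arith_mean(sample), arith_mean(with_outlier))  # jumps from ~2.2 to ~46
print(geo_mean(sample), geo_mean(with_outlier))      # moves far less
```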
As stated above, the link of the mean and variance with the physical principles of
momentum and energy conservation is invalid in geophysical processes. For example, the
mean of the rainfall is not its momentum and its variance is not its energy. Even in these
processes, mean and variance (as measures of central tendency and dispersion) provide useful
information, which can at least explain general behaviours and shapes of probability density
functions [11]. However, this information is good only for explanatory purposes and does not
enable detailed and accurate modelling. There are no theoretical arguments (apart
from simplicity and their conceptual meaning as measures of central tendency and dispersion)
that favor the mean and variance over, e.g., fractional moments of small or even negative
order. For example, if the second moment is likely to be preserved, then one could argue
that the square-root moment is even more likely to be preserved, as it is more robust to outliers.
Additionally, we can relate low-order fractional moments to the ln x function, as it is well
known that

\lim_{q \to 0} \frac{x^q - 1}{q} = \ln x \quad (9)

Thus, we may say that the function x^q for small values of q behaves similarly to ln x, thus
exhibiting properties similar to those of the logarithmic function described above.
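A quick numerical check of the limit in Eq. (9), with arbitrary test values:

```python
import math

# For small q, the fractional-moment kernel (x^q - 1)/q is nearly
# indistinguishable from ln x.
def frac_kernel(x, q):
    return (x ** q - 1.0) / q

for x in (0.5, 2.0, 50.0):
    print(x, math.log(x), frac_kernel(x, 0.01))
```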
Based on this reasoning we deem that, instead of choosing the order of moments a
priori, it is better to leave the order unspecified, so that any value can be chosen a posteriori,
including small fractional values. This leads to imposing as a constraint any moment m_q of
order q, i.e.,

m_q = E[X^q] = \int_0^\infty x^q f_X(x)\, \mathrm{d}x \quad (10)
One reason that many entropy generalizations have emerged was to explain empirically
detected deviations from the exponential-type distributions that arise from the BGS entropy
under standard moment constraints. Yet, generalized entropy measures have been criticized
for lacking theoretical consistency and for being arbitrary, a reasonable argument
considering the large number of entropy generalizations available in the literature. Here,
instead of using generalized entropy measures that may result in power-law distributions, we
generalize the important notion of moments, inspired by the limiting definition of the
exponential function, i.e., \exp(x^q) = \lim_{p \to 0} (1 + p\, x^q)^{1/p}. We first define the
function x^q_p as

x^q_p := \ln(1 + p\, x^q)/p \quad (11)

which for p \to 0 becomes the familiar power function x^q, as \lim_{p \to 0} \ln(1 + p\, x^q)/p = x^q.
Thus, we can define a generalization of the classical moments, for which we use the name p-
moments, by

m^q_p = E\left[X^q_p\right] = \frac{1}{p} \int_0^\infty \ln(1 + p\, x^q)\, f_X(x)\, \mathrm{d}x \quad (12)
Arguably, this generalization is arbitrary and many other moment generalizations can be (and
in fact are) constructed. Nonetheless, we believe that there is a rationale that supports the use
of p-moments, which can be summarized as follows: (a) if generalized entropy measures,
considered by many as arbitrary, have been successfully used, then there is no reason to avoid
using generalized moments; (b) maximization of the BGS entropy using p-moments leads, as
will become apparent in the next section, to flexible power-type distributions (including the
Pareto and Tsallis distributions for q = 1 and q = 2, respectively); (c) p-moments are simple
and, for p = 0, become identical to the ordinary moments; and (d) they are based on the x^q_p
function, which exhibits all the desired properties, like those of the ln x function described
above, and thus they are suitable for positively skewed RVs; additionally, compared to E[ln X],
they are always positive.
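Eq. (12) suggests a plug-in sample estimator; the estimator and the sample below are our own illustrative assumptions, not from the paper:

```python
import math

# Sample p-moment of order q: the mean of ln(1 + p x^q)/p over the sample.
# For p -> 0 it reduces to the ordinary sample moment of order q.
def p_moment(xs, q, p):
    if p == 0.0:
        return sum(x ** q for x in xs) / len(xs)
    return sum(math.log(1.0 + p * x ** q) for x in xs) / (p * len(xs))

data = [0.3, 1.1, 2.4, 0.7, 5.6, 12.0, 0.9]      # assumed positive sample
print(p_moment(data, 1.0, 0.0))                  # ordinary mean
print(p_moment(data, 1.0, 1e-6))                 # ≈ the same
print(p_moment(data, 1.0, 1.0))                  # damped: large values count less
```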
2.4 The resulting entropy distributions
Entropy optimization can be accomplished with many different combinations of the previously
defined constraints; however, here we use two simple combinations of the aforementioned
constraints, based on the type and the generality of the distributions that emerge. We combine
the E[ln X] constraint, first, with classical moments and, second, with p-moments, leaving in
both cases the moment order arbitrary.
In the first case, the maximization of the BGS entropy, given in (1), with constraints (8)
and (10) results in the density function

f_X(x) = \exp(-\lambda_0 - \lambda_1 \ln x - \lambda_2 x^q) \quad (13)
which after algebraic manipulations and parameter renaming can be written as

f_X(x) = \frac{\gamma_2}{\beta\, \Gamma(\gamma_1/\gamma_2)} \left(\frac{x}{\beta}\right)^{\gamma_1 - 1} \exp\left[-\left(\frac{x}{\beta}\right)^{\gamma_2}\right], \quad x \ge 0 \quad (14)
corresponding to the distribution function

F_X(x) = 1 - \Gamma\left(\gamma_1/\gamma_2,\ (x/\beta)^{\gamma_2}\right) / \Gamma\left(\gamma_1/\gamma_2\right) \quad (15)

where Γ(·) is the Gamma function and Γ(·,·) is the upper incomplete Gamma function.
This distribution, commonly attributed to Stacy [19], appeared much earlier in the
literature, in the works of Amoroso around 1920 [10], and seems to have been rediscovered
many times under different forms (see e.g., [10]). Here, we use a slightly different form from
that proposed by Stacy. Essentially, it is a generalization of the Gamma distribution and will
be denoted by GG(β, γ₁, γ₂), or simply GG. It is a very flexible distribution that includes
many other well-known distributions as particular cases, e.g., the Gamma, the Weibull, the
Exponential, and even the Chi-squared distribution.
The distribution includes the scale parameter β > 0 and the shape parameters γ₁ > 0
and γ₂ > 0. The parameter γ₁ controls the behavior of the left tail: if 0 < γ₁ < 1 the density
function is J-shaped and f_X(x) → ∞ as x → 0; if γ₁ > 1 the density function is bell-shaped
and mainly positively skewed, yet for certain values of γ₁ and γ₂ it can be symmetric or even
negatively skewed, and f_X(0) = 0; finally, for γ₁ = 1 the distribution degenerates
to a generalized exponential function and f_X(0) < ∞. The parameter γ₂ is very
important as, for fixed γ₁, it controls the behavior of the right tail, i.e., it determines the
frequency and the magnitude of the extreme events. Generally and loosely speaking, for
γ₂ < 1 the distribution can be characterized as sub-exponential or heavy-tailed, and for γ₂ > 1
as hyper-exponential or light-tailed (for a classification of distribution tails see [3]).
Notably, the distribution is also valid if the shape parameters are simultaneously
negative (a generalized inverse Gamma distribution); however, it then loses some
important shape characteristics and seems unsuitable for geophysical RVs like rainfall. Thus,
here the distribution is considered only for positive shape parameters.
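A brief sketch of the GG density of Eq. (14), with arbitrary illustrative parameter values, not fitted to any record (γ₂ = 0.5 < 1 gives the sub-exponential tail discussed above); the numerical integral confirms it is a proper density:

```python
import math

# GG density of Eq. (14): scale beta, shapes g1 (left tail) and g2 (right tail).
def gg_pdf(x, beta, g1, g2):
    return (g2 / (beta * math.gamma(g1 / g2))) \
        * (x / beta) ** (g1 - 1.0) * math.exp(-((x / beta) ** g2))

def integrate(g, a, b, n=200000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# Normalization check over a generous range (the stretched-exponential tail
# with g2 = 0.5 decays slowly, so the upper limit is taken large).
total = integrate(lambda x: gg_pdf(x, beta=1.0, g1=1.2, g2=0.5), 0.0, 400.0)
print(total)   # ≈ 1.0
```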
In the second case, the maximization of the BGS entropy with constraints (8) and (12)
results in the density function

f_X(x) = \exp\left[-\lambda_0 - \lambda_1 \ln x - \lambda_2 \ln(1 + p\, x^q)\right] \quad (16)
which after algebraic manipulations and parameter renaming can be written as

f_X(x) = \frac{\gamma_3}{\beta\, \mathrm{B}(\gamma_1, \gamma_2)} \left(\frac{x}{\beta}\right)^{\gamma_1 \gamma_3 - 1} \left[1 + \left(\frac{x}{\beta}\right)^{\gamma_3}\right]^{-(\gamma_1 + \gamma_2)}, \quad x \ge 0 \quad (17)
corresponding to the distribution function

F_X(x) = \mathrm{B}\left(z;\ \gamma_1, \gamma_2\right) / \mathrm{B}\left(\gamma_1, \gamma_2\right), \quad \text{where } z = \left[1 + (x/\beta)^{-\gamma_3}\right]^{-1} \quad (18)

where B(·,·) is the Beta function and B(z; ·,·) is the incomplete Beta function.
This distribution has not been formed earlier on a similar rationale, yet a search in the
literature reveals that it has been rediscovered many times under different names and
parameterizations. It is most commonly known as the Generalized Beta of the second kind,
hereafter denoted as GB2(β, γ₁, γ₂, γ₃), or simply GB2. It seems that Mielke and Johnson [14]
were the first to form this distribution, and they proposed it for describing hydrological and
meteorological variables. It has also been used in different disciplines, e.g., McDonald [12]
used the GB2 as an income distribution. Nevertheless, the distribution can be considered as a
simple generalization of many well-known and much earlier introduced distributions, e.g., the
F-distribution or the Pearson type VI of the celebrated Pearson system.
The GB2 distribution is a very flexible four-parameter distribution, with β > 0 being the
scale parameter, and γ₁ > 0, γ₂ > 0 and γ₃ > 0 being the three shape parameters, allowing the
distribution to form a great variety of shapes. The GB2 distribution includes as special or
limiting cases many of the well-known distributions, e.g., the Beta of the second kind, the
Pareto type II, the Loglogistic, the Burr type XII, and even the Generalized Gamma (for a
complete account see [10,13]).
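A brief sketch of the GB2 density in the parameterization used here (illustrative, assumed parameter values; the Beta function is built from math.gamma); the numerical integral confirms it is a proper density:

```python
import math

def beta_fn(a, b):
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

# GB2 density of Eq. (17): scale beta, shapes g1, g2 in the Beta function,
# and the power g3.
def gb2_pdf(x, beta, g1, g2, g3):
    t = (x / beta) ** g3
    return (g3 / (beta * beta_fn(g1, g2))) \
        * (x / beta) ** (g1 * g3 - 1.0) * (1.0 + t) ** (-(g1 + g2))

def integrate(g, a, b, n=200000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# The right tail is power-type (~ x^(-g2*g3 - 1)), so the upper limit of the
# normalization check must be generous.
total = integrate(lambda x: gb2_pdf(x, beta=1.0, g1=1.5, g2=3.0, g3=1.2), 0.0, 2000.0)
print(total)   # ≈ 1.0
```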
Obviously, the flexibility of the GB2 distribution makes it a good model for describing
rainfall; indeed, we have already used the GB2, under the name JH distribution, to describe
rainfall over a large range of timescales [15] and to construct theoretically consistent IDF curves
[16]. Nonetheless, as a general rule based on the principle of parsimony, a three-parameter
model is preferable to a four-parameter model, provided that the simpler model describes
the data adequately. Additionally, it is not reasonable to compare the performance of the GG
distribution, which is a three-parameter model, with the GB2, which is a four-parameter model.
Thus, a simpler form of the GB2 distribution is selected, based on its flexibility and the simple
analytical expression of its distribution function, and consequently, of its quantile function.
A simple three-parameter form of the GB2 is derived by setting γ₁ = 1 in equation (17). By
renaming the parameters and after algebraic manipulations we obtain a distribution known as
the Burr type XII [1] (denoted hereafter as BurrXII), which was introduced by Burr in 1942 in
the framework of a distribution system similar to Pearson’s. Its probability density function is
f_X(x) = \frac{1}{\beta} \left(\frac{x}{\beta}\right)^{\gamma_1 - 1} \left[1 + \gamma_2 \left(\frac{x}{\beta}\right)^{\gamma_1}\right]^{-\frac{1}{\gamma_1 \gamma_2} - 1}, \quad x \ge 0 \quad (19)
and its distribution function is

F_X(x) = 1 - \left[1 + \gamma_2 \left(\frac{x}{\beta}\right)^{\gamma_1}\right]^{-\frac{1}{\gamma_1 \gamma_2}} \quad (20)
The BurrXII distribution is a flexible power-type distribution that comprises the scale
parameter β > 0 and the shape parameters γ₁ > 0 and γ₂ ≥ 0.
The form of the BurrXII distribution we use here is not the one usually found in the literature (see
e.g. [20]). We preferred the expression (19) because it is suggestive of a generalization of the
familiar Weibull distribution (recovered in the limit γ₂ → 0) and also because the asymptotic
behavior of the right tail is solely controlled by the parameter γ₂ (for large values of x,
P(X > x) ≈ γ₂^{-1/(γ₁γ₂)} (x/β)^{-1/γ₂}). The distribution has finite variance for
0 ≤ γ₂ < 0.5 and finite mean for 0 ≤ γ₂ < 1. Finally, the shape parameter γ₁ controls the left
tail: for 0 < γ₁ < 1 the distribution is J-shaped, for γ₁ > 1 bell-shaped, and for γ₁ = 1 it
degenerates to the familiar Pareto type II distribution.
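A small self-consistency sketch for Eqs. (19) and (20), with arbitrary illustrative parameters: the density should equal the derivative of the distribution function:

```python
# BurrXII in the form used here: scale beta, shapes g1 and g2.
def burr12_cdf(x, beta, g1, g2):
    return 1.0 - (1.0 + g2 * (x / beta) ** g1) ** (-1.0 / (g1 * g2))

def burr12_pdf(x, beta, g1, g2):
    return (1.0 / beta) * (x / beta) ** (g1 - 1.0) \
        * (1.0 + g2 * (x / beta) ** g1) ** (-1.0 / (g1 * g2) - 1.0)

beta, g1, g2 = 10.0, 1.2, 0.3     # g2 < 0.5: finite variance, per the text
for x in (2.0, 8.0, 30.0):
    h = 1e-5
    deriv = (burr12_cdf(x + h, beta, g1, g2)
             - burr12_cdf(x - h, beta, g1, g2)) / (2 * h)
    print(x, burr12_pdf(x, beta, g1, g2), deriv)   # the two columns agree
```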
3. Application
To test the applicability of the above theoretical framework, a large data set of daily rainfall
observations was used. This is a subset of the Global Historical Climatology Network – Daily
database (http://www.ncdc.noaa.gov/oa/climate/ghcn-daily) that includes data recorded at
over 43 000 stations. A total of 11 519 stations were selected from this database with the
following criteria: (a) record length of over 50 years; (b) percentage of missing values less
than 1%; and (c) percentage of flags for suspect quality less than 1% (see the above web site
for the details about flags). The locations of the selected stations are shown in Fig. 1.
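The selection step can be sketched as a simple filter; the field names and example records below are hypothetical, and the actual GHCN-Daily metadata format differs:

```python
# Hypothetical station metadata records (illustration only).
stations = [
    {"id": "ST001", "years": 72, "missing_pct": 0.4, "flagged_pct": 0.2},
    {"id": "ST002", "years": 35, "missing_pct": 0.1, "flagged_pct": 0.0},
    {"id": "ST003", "years": 55, "missing_pct": 2.5, "flagged_pct": 0.3},
    {"id": "ST004", "years": 101, "missing_pct": 0.9, "flagged_pct": 0.8},
]

def selected(st):
    # Criteria (a)-(c) from the text: >50 years of record, <1% missing
    # values, <1% suspect-quality flags.
    return st["years"] > 50 and st["missing_pct"] < 1.0 and st["flagged_pct"] < 1.0

kept = [st["id"] for st in stations if selected(st)]
print(kept)   # ['ST001', 'ST004']
```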
For each of the 11 519 daily rainfall samples we tested the suitability of the above-
described, entropy-derived distributions, using evaluation tools based on L-moment ratio
plots. A typical approach to identify a suitable distribution for one or more samples is to
compare empirically derived L-ratios with the respective theoretical quantities. The latter are
plotted on a graph as a point, line or area (depending on the number and the type of the
distributional parameters). The most common plot of this type is the L-kurtosis vs.
L-skewness plot. Nevertheless, here we prefer the L-skewness vs. L-variation plot because, as
we deal with positive RVs, the L-variation is meaningful (with values ranging from 0 to 1),
and its estimation is evidently more robust than that of the L-kurtosis.
The most general distribution described above is the four-parameter GB2 distribution,
whose graphical depiction in terms of L-ratio plots is not feasible, as different parameter sets
may correspond to the same point on an L-ratio plot. Therefore, we made plots for the three-
parameter special cases, i.e., the GG and the BurrXII distributions, which have one scale
parameter and two shape parameters. Evidently, the suitability of one of those two entails also
the suitability of the more general GB2 distribution. Each of the GG and the BurrXII
distributions forms an area on the L-ratio plot. The area was depicted by drawing several
theoretical lines that correspond to specific values of the tail parameter γ₂ (with varying
parameter γ₁). The theoretical points of L-variation and L-skewness were calculated
numerically based on the integral definition of the L-moments [5].
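The sample L-moment ratios placed on such a plot can be computed with the standard unbiased b_r estimators (the sample below is an arbitrary stand-in for a wet-day rainfall record):

```python
from math import comb

# First three sample L-moments via the probability-weighted moments b_0, b_1,
# b_2; returns (L-variation, L-skewness) = (l2/l1, l3/l2).
def l_ratios(xs):
    x = sorted(xs)
    n = len(x)
    b = [sum(comb(j, r) * x[j] for j in range(n)) / (n * comb(n - 1, r))
         for r in range(3)]                      # b_0, b_1, b_2
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    return l2 / l1, l3 / l2

sample = [0.5, 1.2, 2.0, 3.1, 4.4, 6.0, 9.3, 15.2, 27.8]
tau2, tau3 = l_ratios(sample)
print(tau2, tau3)    # both in (0, 1) for this positive, right-skewed sample
```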
Fig. 2 and Fig. 3 depict the theoretical areas of the GG and BurrXII distributions,
respectively, in the L-skewness vs. L-variation plot, as well as the empirical points of the
corresponding sample statistics of the 11 519 records. In Fig. 2, 97.6% of the empirical points
lie in the theoretical area of the GG distribution for a tail parameter 0.1 ≤ γ₂ ≤ 2. It is worth
noting that the vast majority of the empirical points are located above the line of the Gamma
distribution (γ₂ = 1), resulting in an average tail parameter γ₂ = 0.53. This indicates that the
Gamma distribution, which has probably been the most common model for daily rainfall,
should not be uncritically adopted, as it would seriously underestimate the frequency and the
magnitude of extreme events. Specifically, for γ₂ < 1 (as observed in the vast majority of
cases in Fig. 2), the GG distribution exhibits a tail heavier than that of the Gamma
distribution. In addition, for points where the GG distribution is unsuitable, a heavier type of
distribution tail is needed.
Regarding the BurrXII distribution in Fig. 3, 87.7% of the empirical points lie in the area
corresponding to a tail parameter 0 ≤ γ₂ ≤ 0.7. The average value of the tail parameter is
γ₂ = 0.19, meaning that approximately only moments of up to order 1/0.19 ≈ 5 exist. Notably,
the empirical points that the BurrXII fails to describe can be described by the GG distribution
and vice versa. Particularly, the empirical points in Fig. 3 that lie above the line γ₂ = 0.5
(more or less the ones that the GG distribution fails to describe) correspond to BurrXII
distributions with infinite variance and are thus hard to describe by an exponential-type
distribution, all of whose moments are finite.
An additional comment that regards both distributions, and is also empirically recognized
(see for example the empirical densities in Fig. 2), is that a large percentage of the empirical
points of daily rainfall are better described by bell-shaped distributions (though highly positively
skewed) rather than J-shaped ones. Thus, a general model for rainfall should comprise
both J-shaped and bell-shaped densities. This excludes simple models like the Exponential,
the Pareto, and the Lognormal distributions (commonly used as rainfall models), because the
first two can only be J-shaped and the third only bell-shaped. Finally, as mentioned before,
both distributions are special cases of the GB2 distribution, and thus the GB2 distribution
describes 100% of the empirical points, being thus a suitable model for rainfall, not only for
the daily scale examined here but also for finer (e.g. sub-hourly) and coarser (e.g. annual)
scales [15].
4. Summary and conclusions
In order to derive statistical distributions suitable for geophysical processes, and particularly
for rainfall, we propose a rationale for defining and selecting constraints within a BGS
entropy maximization framework. Entropy maximization offers a solid theoretical basis for
identifying a probabilistic law based on the available information, in contrast to the common
technique of choosing a distribution from a repertoire based on trial-and-error methods. This
rationale is based on the premises that the constraints should be as few and simple as possible
and incorporate prior information on the process of interest. This prior information may
concern the general shapes of densities and could be obtained by studying the process
worldwide. We studied and justified conceptually three particular constraints, related to the
logarithmic and power functions, which are suitable for positive, highly varying and
asymmetric RVs. Namely, the constraints are the expected values of (a) ln x; (b) x^q; and (c)
ln(1 + p x^q)/p. The last constraint generalizes the classical moments and naturally leads to
power-type distributions while avoiding generalized entropy measures.
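The sense in which the last constraint generalizes the classical moments can be made explicit by a one-line limit, using the expansion ln(1 + u) = u + O(u²) for small u:

```latex
\lim_{p \to 0} \operatorname{E}\!\left[\frac{\ln(1 + p\,x^{q})}{p}\right]
  = \lim_{p \to 0} \operatorname{E}\!\left[\frac{p\,x^{q} + O\!\left(p^{2} x^{2q}\right)}{p}\right]
  = \operatorname{E}\!\left[x^{q}\right]
```

So as p → 0 the generalized constraint collapses to the classical q-moment constraint, while for p > 0 the logarithm grows slowly in x and the maximization yields power-type (heavy-tailed) distributions.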
The BGS entropy maximization under two combinations of these constraints leads to
two flexible distributions, i.e., a three-parameter exponential type, known as the Generalized
Gamma (GG), and a four-parameter power type, known as the Generalized Beta of the
second kind (GB2); the former is a particular limiting case of the latter. Another three-
parameter model, the power-type Burr type XII, easily derived from the GB2, also proves
useful. In order to evaluate the performance of the three-parameter entropy-derived
distributions, we used a very large database with thousands of daily rainfall records
across the world. We delineated the theoretical area covered by these distributions in an
L-skewness vs. L-variation plot and compared it with the corresponding sample statistics of 11 519 daily
rainfall records. Both the GG and BurrXII distributions performed very well by describing
97.6% and 87.7%, respectively, of the empirical points. Notably, the two distributions are
complementary in the sense that empirical points that cannot be described by one can be
described by the other. Consequently, as both distributions are special cases of the GB2, the
latter can describe 100% of the empirical points being thus a model suitable for all daily
rainfall records.
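The comparison described above is easy to reproduce. The following minimal sketch (the helper `l_ratios` is ours) computes the sample L-variation τ2 = λ2/λ1 and L-skewness τ3 = λ3/λ2 from Hosking's unbiased probability-weighted-moment estimators [5]:

```python
import numpy as np

def l_ratios(sample):
    """Sample L-variation (tau_2) and L-skewness (tau_3) from Hosking's
    unbiased probability-weighted-moment estimators b0, b1, b2."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((i - 1) / (n - 1) * x) / n
    b2 = np.sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x) / n
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    return l2 / l1, l3 / l2

# Sanity check: the Exponential distribution has tau_2 = 1/2 and tau_3 = 1/3
rng = np.random.default_rng(1)
t2, t3 = l_ratios(rng.exponential(size=200_000))
assert abs(t2 - 0.5) < 0.01 and abs(t3 - 1 / 3) < 0.01
```

Plotting the (τ3, τ2) pair of each record against the theoretical curves of a candidate distribution yields an L-moments diagram of the kind shown in Figs. 2 and 3.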
Both the empirical analysis of this massive number of records and the distributions
tested lead to two useful conclusions regarding the shape characteristics of the daily rainfall
distribution. First, a suitable distribution for daily rainfall must be able to form both J- and
bell-shaped densities, with the J-shaped densities also having the property f_X(x) → ∞ as
x → 0. This excludes commonly used models like the Exponential, the Pareto or the
Lognormal distributions, and many more. Of course, these distributions may be, and in
particular cases actually are, suitable for specific rainfall ranges (e.g., above certain
thresholds), but they cannot be proposed as general models for daily rainfall. Second, regarding the right
distribution tail (a very important feature as it controls the behavior of extremes) the analysis
showed that heavy-tailed distributions describe the vast majority of the empirical points
better than light-tailed distributions. Consequently, the Gamma distribution (probably the
most common daily rainfall model) is rejected as a general model for daily rainfall, because for
the majority of rainfall records it would seriously underestimate the extreme events.
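Both conclusions are straightforward to check numerically. A minimal sketch with SciPy's `gengamma` (note that SciPy's (a, c) parameterization of the Stacy distribution differs from the γ notation used here; all parameter values below are illustrative) shows the J-/bell-shaped dichotomy:

```python
from scipy.stats import gengamma

# SciPy's gengamma(a, c) has density  c * x**(a*c - 1) * exp(-x**c) / Gamma(a),
# which is J-shaped (unbounded at 0) when a*c < 1 and bell-shaped when a*c > 1.
j_shaped = gengamma(a=0.5, c=0.7)     # a*c = 0.35 < 1
bell_shaped = gengamma(a=2.0, c=1.5)  # a*c = 3.0  > 1

assert j_shaped.pdf(1e-8) > j_shaped.pdf(1e-2) > j_shaped.pdf(1.0)
assert bell_shaped.pdf(1e-8) < 1e-6   # vanishes at the origin
```

Note that the Gamma special case (c = 1) always has an exponential, light tail, whatever its shape parameter, which is precisely why it fails on the heavy-tailed records discussed above.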
Evidently, rainfall in different places on Earth is influenced by local characteristics such
as climate, topography, distance from the sea, and many more. The diversity of such
characteristics produces different rainfall patterns across the globe. This does not contradict
our finding that a single flexible probabilistic law (the GB2 distribution) or simpler special
cases thereof (the GG and BurrXII distributions) can model rainfall over all examined cases.
The diversity of characteristics is rather reflected in the diversity of shapes that the GB2
distribution can produce, as well as in the wide range of feasible parameter values.
References
[1] Burr IW. Cumulative Frequency Functions. The Annals of Mathematical Statistics.
1942;13(2):215-232.
[2] Esteban MD, Morales D. A summary on entropy statistics. Kybernetika.
1995;31(4):337-346.
[3] Goldie CM, Klüppelberg C. Subexponential distributions. A Practical Guide to Heavy
Tails: Statistical Techniques and Applications. 1998:435–459.
[4] Havrda J, Charvát F. Concept of structural a-entropy. Kybernetika. 1967;3:30–35.
[5] Hosking JR. L-moments: analysis and estimation of distributions using linear
combinations of order statistics. Journal of the Royal Statistical Society. Series B
(Methodological). 1990;52(1):105–124.
[6] Jaynes ET. Information Theory and Statistical Mechanics. Phys. Rev. 1957;106(4):620.
[7] Jaynes ET. Information Theory and Statistical Mechanics. II. Phys. Rev.
1957;108(2):171.
[8] Jaynes ET. Probability: The logic of science. Cambridge University Press; 2003.
[9] Kapur JN. Maximum-entropy models in science and engineering. John Wiley & Sons;
1989.
[10] Kleiber C, Kotz S. Statistical size distributions in economics and actuarial sciences.
Wiley-Interscience; 2003.
[11] Koutsoyiannis D. Uncertainty, entropy, scaling and hydrological stochastics, 1,
Marginal distributional properties of hydrological processes and state scaling.
Hydrological Sciences Journal. 2005;50(3):381-404.
[12] McDonald JB. Some Generalized Functions for the Size Distribution of Income.
Econometrica. 1984;52(3):647-663.
[13] McDonald JB, Xu YJ. A generalization of the beta distribution with applications.
Journal of Econometrics. 1995;66(1-2):133-152.
[14] Mielke Jr PW, Johnson ES. Some generalized beta distributions of the second kind
having desirable application features in hydrology and meteorology. Water Resources
Research. 1974;10(2):223–226.
[15] Papalexiou SM, Koutsoyiannis D. Probabilistic description of rainfall intensity at
multiple time scales. In: IHP 2008 Capri Symposium: “The Role of Hydrology in Water
Resources Management”; 2008. Available at: http://www.itia.ntua.gr/en/docinfo/884/
[Accessed October 6, 2010].
[16] Papalexiou SM, Koutsoyiannis D. Ombrian curves in a maximum entropy framework.
In: European Geosciences Union General Assembly 2008; 2008:00702. Available at:
http://www.itia.ntua.gr/en/docinfo/851/ [Accessed October 6, 2010].
[17] Shannon CE. The mathematical theory of communication. Bell System Technical
Journal. 1948;27:379–423.
[18] Singh VP. Entropy theory for derivation of infiltration equations. Water Resour. Res.
2010;46:20.
[19] Stacy EW. A Generalization of the Gamma Distribution. The Annals of Mathematical
Statistics. 1962;33(3):1187-1192.
[20] Tadikamalla PR. A Look at the Burr and Related Distributions. International Statistical
Review / Revue Internationale de Statistique. 1980;48(3):337-344.
[21] Tsallis C. Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical
Physics. 1988;52(1):479-487.
Figures
Fig. 1 Locations of the raingauge stations of the study, which are a subset of the Global Historical Climatology
Network-Daily database containing those stations with daily rainfall record length of over 50 years (a total of
11 519 stations with very few missing values).
[In-figure labels: “Average of all records”; “Gamma line (γ2 = 1)”]
Fig. 2 Theoretical relationships of L-skewness vs. L-variation of the Generalized Gamma distribution and
empirical points of the corresponding sample statistics of the 11 519 records. 97.6% of the empirical points lie in
the space of the GG for 0.1 < γ2 < 2. For two of the records with the indicated positions on the L-moments
diagram the empirical probability density functions are also shown.
[In-figure labels: “Average of all records”; “Pareto line”]
Fig. 3 Theoretical relationships of L-skewness vs. L-variation of the Burr type XII distribution and empirical
points of the corresponding sample statistics of the 11 519 records. 87.7% of the empirical points lie in the space
of the BurrXII for 0 < γ2 < 0.7. For two of the records with the indicated positions on the L-moments diagram the
empirical probability density functions are also shown.