arXiv:0709.3820v1 [cond-mat.stat-mech] 24 Sep 2007
AN ENTROPIC PATHWAY TO MULTIVARIATE GAUSSIAN DENSITY

H.J. Haubold^1, A.M. Mathai^2, S. Thomas^3

^1 Office for Outer Space Affairs, United Nations, Vienna International Centre, P.O. Box 500, A-1400, Vienna, Austria.
^2 Centre for Mathematical Sciences Pala Campus, Arunapuram P.O., Pala-686 574, Kerala, India and Department of Mathematics and Statistics, McGill University, Montreal, Canada H3A 2K6.
^3 Department of Statistics, St. Thomas College, Palai, Arunapuram P.O., Pala-686 574, Kerala, India.
Abstract
A general principle called "conservation of the ellipsoid of concentration" is introduced and a generalized entropic form of order α is optimized under this principle. It is shown that this can produce a density which can act as a pathway to the multivariate Gaussian density. The resulting entropic pathway contains as special cases the Boltzmann-Gibbs (Shannon) and Tsallis (Havrda-Charvát) entropic forms.

Key words: Multivariate Gaussian density; pathway model; generalized entropic form of order α; ellipsoid of concentration; conservation principle.
1 Introduction
The normal (Gaussian) distribution is a family of continuous probability distributions and is ubiquitous in the field of statistics and probability (Feller [4]). The importance of the normal distribution as a model of quantitative phenomena is due to the central limit theorem. The normal distribution maximizes Shannon entropy among all distributions with known mean and variance, and in information theory Shannon entropy is the measure of uncertainty associated with a random variable.
In statistical mechanics, the Gaussian (Maxwell-Boltzmann) distribution maximizes the Boltzmann-Gibbs entropy under appropriate constraints (Gell-Mann and Tsallis [7]). Given a probability distribution $P = \{p_i\}$, $i = 1, \ldots, N$, with $p_i$ representing the probability of the system being in the $i$-th microstate, the Boltzmann-Gibbs entropy is $S(P) = -k \sum_{i=1}^{N} p_i \ln p_i$, where $k$ is the Boltzmann constant and $N$ the total number of microstates. If all states are equally probable this leads to the Boltzmann principle $S = k \ln W$ ($N = W$). The Boltzmann-Gibbs entropy is equivalent to Shannon's entropy if $k = 1$.
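As a quick numerical illustration (ours, not part of the paper; the function name is our own), the discrete Boltzmann-Gibbs entropy and the equiprobable special case $S = k \ln W$ can be checked directly:

```python
import math

def boltzmann_gibbs_entropy(p, k=1.0):
    """S(P) = -k * sum_i p_i ln p_i; with k = 1 this is Shannon's entropy."""
    return -k * sum(pi * math.log(pi) for pi in p if pi > 0)

# Equiprobable case: N = W microstates with p_i = 1/W gives S = k ln W.
W = 8
S_uniform = boltzmann_gibbs_entropy([1.0 / W] * W)
```

Since the uniform distribution maximizes this entropy, any non-uniform distribution on the same number of states yields a smaller value.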
A generalization of Boltzmann-Gibbs extensive statistical mechanics is known as Tsallis non-extensive statistical mechanics (Swinney and Tsallis [5], Abe and Okamoto [6]). Tsallis generalized Shannon's entropy to the non-extensive setting as $S(P, q) = (\sum_{i=1}^{N} p_i^q - 1)/(1 - q)$. For $q \to 1$, Shannon's entropy is recovered. Tsallis introduced the scaled probabilities $p_i^q$, now called q-probabilities, where $q$ is a real parameter, accommodating the fact that non-extensive systems are better described by power-law distributions.

Corresponding author: Email address: HANS.HAUBOLD@UNVIENNA.ORG
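The $q \to 1$ recovery of Shannon's entropy can be verified numerically; this is a sketch with our own function names:

```python
import math

def tsallis_entropy(p, q):
    """S(P, q) = (sum_i p_i**q - 1) / (1 - q) for q != 1."""
    return (sum(pi ** q for pi in p) - 1.0) / (1.0 - q)

def shannon_entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

p = [0.5, 0.3, 0.2]
# As q -> 1, the Tsallis entropy approaches Shannon's entropy.
near_shannon = tsallis_entropy(p, 1.000001)
```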
This paper, in Section 2, introduces a general principle called conservation of the ellipsoid of concentration and, in Section 3, maximizes under this principle a generalized entropic form of order α containing the Shannon (Boltzmann-Gibbs), Rényi, and Havrda-Charvát (Tsallis) entropies as special cases. Normalizing constants are derived in Section 3.1, and the mean value and covariance matrix in Section 3.2, for the cases α < 1, α > 1, and α = 1. The pathway, characterized by α, is shown to produce multivariate type-1 beta, Gaussian, and type-2 beta densities, respectively. In Section 3.3 a graphical representation of the pathway surface is shown. Section 4 draws conclusions.
2 Conservation of the ellipsoid of concentration
Consider a $q \times 1$ vector $X$, $X' = (x_1, \ldots, x_q)$, where a prime denotes the transpose. The components $x_1, \ldots, x_q$ may be real scalar mathematical variables or random variables describing various components in a physical system. Each component in $X$ can be assumed to have a finite mean value and variance. If $E$ denotes the expected value, the value on the average in the long run, then we can assume $E(x_i) = \mu_i < \infty$ for $i = 1, \ldots, q$. Let $\mu' = (\mu_1, \ldots, \mu_q)$. Similarly one can assume the expected dispersion in each component to be finite. The square of a measure of dispersion is given by the variance $\mathrm{Var}(x_i)$; that is, $\mathrm{Var}(x_i) < \infty$. The components may be correlated or may have pair-wise joint variations. A measure of pair-wise joint variation is the covariance between $x_i$ and $x_j$, $\mathrm{Cov}(x_i, x_j) = E[x_i - E(x_i)][x_j - E(x_j)] = v_{ij}$, so that when $i = j$ we have $\mathrm{Var}(x_i) = v_{ii}$. The matrix of such variances and covariances is the covariance matrix of $X$, denoted by $\mathrm{Cov}(X) = E(X - E(X))(X - E(X))' = V = (v_{ij})$. Note that $V$ is real symmetric when the $x_i$, $i = 1, \ldots, q$, are real, and $V$ is at least non-negative definite. Let us assume that no component in the $q \times 1$ vector $X$ is a linear function of the other components, so that we can take $V$ to be nonsingular. This then implies that $V$ is positive definite; that is, $V = V' > 0$. Let $V^{1/2}$ be the positive definite square root of the positive definite matrix $V$.
Standardization of a component $x_i$ is achieved by relocating it at $\mu_i$ and by rescaling it, taking $y_i = (x_i - \mu_i)/\sqrt{\mathrm{Var}(x_i)}$, so that $E(y_i) = 0$ and $\mathrm{Var}(y_i) = 1$. Similarly, standardization of the $q \times 1$ vector $X$ is achieved by a linear transformation on $X - \mu$, namely $Y = V^{-1/2}(X - \mu)$, so that $E(Y) = O$ and $\mathrm{Cov}(Y) = I$, where $I$ is the identity matrix. The Euclidean norm of $Y$ is then $[Y'Y]^{1/2} = [(X - \mu)' V^{-1} (X - \mu)]^{1/2}$. The scalar quantity $(X - \mu)' V^{-1} (X - \mu)$ has many interpretations in different disciplines. A measure of distance between $X$ and $\mu$ is any norm $\|X - \mu\|$. But if we want to accommodate the joint variations in the components $x_1, \ldots, x_q$, as well as the fact that the variances of the components may be different, then we consider a generalized distance between $X$ and $\mu$. One such squared generalized distance is the square of the Euclidean norm of $Y$, namely $Y'Y = (X - \mu)' V^{-1} (X - \mu)$. For a given constant $c > 0$, $(X - \mu)' V^{-1} (X - \mu) = c$ defines the surface of an ellipsoid, since $V$ is positive definite. This ellipsoid is known as the ellipsoid of concentration of $X$ around its expected value $\mu$. If we assume that $c$ is fixed, for example $c = 1$, which implies $(X - \mu)' V^{-1} (X - \mu) = 1$, then this assumption is equivalent to saying that the standardized $X$, namely $Y$, is a point on the surface of a hypersphere of radius 1. When it is assumed that the ellipsoid of concentration is a fixed finite quantity, what we are saying is that the generalized distance of $X$ from $\mu$ is fixed and finite. This is the principle of conservation of the ellipsoid of concentration.
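As an illustration (our own sketch, not from the paper), the generalized squared distance $(X - \mu)' V^{-1} (X - \mu)$ can be computed explicitly for a hypothetical $q = 2$ case; points giving the same value $c$ lie on the same ellipsoid of concentration:

```python
# Hypothetical mean vector mu and positive definite 2x2 covariance matrix V.
mu = [1.0, 2.0]
V = [[4.0, 1.0],
     [1.0, 2.0]]

def generalized_distance_sq(x, mu, V):
    """(X - mu)' V^{-1} (X - mu) for q = 2, via the explicit 2x2 inverse."""
    d0, d1 = x[0] - mu[0], x[1] - mu[1]
    det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
    inv = [[ V[1][1] / det, -V[0][1] / det],
           [-V[1][0] / det,  V[0][0] / det]]
    return (d0 * (inv[0][0] * d0 + inv[0][1] * d1)
            + d1 * (inv[1][0] * d0 + inv[1][1] * d1))

c = generalized_distance_sq([3.0, 3.0], mu, V)   # positive for any x != mu
```

By symmetry, a point $X$ and its reflection $2\mu - X$ lie on the same ellipsoid.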
3 Generalized entropic form of order α
Let $f(X)$ be a real-valued scalar function of $X$, where $X$ could be a scalar quantity, a $q \times 1$ vector with $q > 1$, or a $p \times q$ matrix with $p > 1$, $q > 1$. Let us assume that the elements of $X$ are real scalar random variables. Then $f(X)$ can define a density provided $\int_X f(X)\,dX = 1$ and $f(X) \ge 0$ for all $X$. If $\int_X f(X)\,dX = h < \infty$ then $g(X) = \frac{1}{h} f(X)$ is a density provided $f(X) \ge 0$ for all $X$. Here $dX$ denotes the wedge product of the differentials in $X$; for example, $dX = dx_{11} \wedge dx_{12} \wedge \cdots \wedge dx_{1q} \wedge dx_{21} \wedge \cdots \wedge dx_{pq}$ if $X$ is $p \times q$ and all elements in $X$ are functionally independent. A measure of uncertainty or information in $X$, or in $f(X)$, is the Shannon entropy defined by

$$S(f) = -\int_X f(X)\,\ln f(X)\,dX \qquad (3.1)$$

when $f$ is continuous, where $X$ may be a scalar, a vector, or a general matrix and $f$ is the density of $X$. There are generalizations of $S(f)$, some of which are listed in Mathai and Rathie [1]. Some of these are the following (Mathai and Haubold [2]):
Rényi's entropy: $R_\alpha(f) = \dfrac{\ln\left[\int_X \{f(X)\}^{\alpha}\,dX\right]}{1 - \alpha}$, $\alpha \ne 1$, $\alpha > 0$

Havrda-Charvát entropy: $H_\alpha(f) = \dfrac{\int_X [f(X)]^{\alpha}\,dX - 1}{2^{1-\alpha} - 1}$, $\alpha \ne 1$, $\alpha > 0$

Tsallis' non-extensive entropy: $T_\alpha(f) = \dfrac{\int_X [f(X)]^{\alpha}\,dX - 1}{1 - \alpha}$, $\alpha \ne 1$, $\alpha > 0$

Non-extensive generalized entropic form: $M_\alpha(f) = \dfrac{\int_X [f(X)]^{2-\alpha}\,dX - 1}{\alpha - 1}$, $\alpha \ne 1$, $\alpha < 2$

Extensive generalized entropic form: $M^{*}_\alpha(f) = \dfrac{\ln\left[\int_X \{f(X)\}^{2-\alpha}\,dX\right]}{\alpha - 1}$, $\alpha \ne 1$, $\alpha < 2$.
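To make the common $\alpha \to 1$ limit concrete, the Rényi, Tsallis, and $M_\alpha$ forms can be evaluated numerically for a simple density; here $f(x) = e^{-x}$ on $(0, \infty)$, for which $S(f) = 1$, discretized by a midpoint rule (a numerical sketch, not from the paper):

```python
import math

# Midpoint-rule discretization of f(x) = exp(-x) on [0, 40].
dx = 0.001
f = [math.exp(-(i + 0.5) * dx) for i in range(40000)]

def integral(values):
    return sum(values) * dx

def shannon(f):
    return -integral([fi * math.log(fi) for fi in f])

def renyi(f, alpha):
    return math.log(integral([fi ** alpha for fi in f])) / (1.0 - alpha)

def tsallis(f, alpha):
    return (integral([fi ** alpha for fi in f]) - 1.0) / (1.0 - alpha)

def m_alpha(f, alpha):
    """Non-extensive generalized entropic form of order alpha, alpha < 2."""
    return (integral([fi ** (2.0 - alpha) for fi in f]) - 1.0) / (alpha - 1.0)
```

For $\alpha$ near 1 all three forms agree with the Shannon value $S(f) = 1$.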
Let us look into the problem of optimizing the non-extensive generalized entropic form $M_\alpha(f)$ under the principle of the conservation of the ellipsoid of concentration; that is, to optimize $M_\alpha(f)$ over all functionals $f$, subject to the conditions

$$\text{(i)} \int_X f(X)\,dX = 1; \qquad \text{(ii)} \int_X (X - \mu)' V^{-1} (X - \mu)\,f(X)\,dX = \text{constant}$$
for $f \ge 0$ for all $X$. If we apply the calculus of variations, the Euler equation becomes

$$\frac{\partial}{\partial f}\left[f^{2-\alpha} - \lambda_1 f + \lambda_2 (X - \mu)' V^{-1} (X - \mu)\,f\right] = 0, \qquad \alpha < 2,$$

where $\lambda_1$ and $\lambda_2$ are Lagrangian multipliers, observing that, since $\alpha$ is fixed, optimization of $\frac{f^{2-\alpha}}{\alpha - 1}$ is equivalent to optimizing $f^{2-\alpha}$ over all functionals $f$. That is,

$$f^{1-\alpha} = \frac{\lambda_1}{2 - \alpha}\left[1 - \frac{\lambda_2}{\lambda_1}\,(X - \mu)' V^{-1} (X - \mu)\right].$$

Either by taking $\frac{\lambda_2}{\lambda_1} = a(1 - \alpha)$, $a > 0$, or by taking the second condition as the expected value of $(1 - \alpha)(X - \mu)' V^{-1} (X - \mu)$ being 1, where $1 - \alpha$ denotes the strength of information in $f(X)$ (see Mathai and Haubold [2]), we have

$$f = \lambda\left[1 - a(1 - \alpha)(X - \mu)' V^{-1} (X - \mu)\right]^{\frac{1}{1-\alpha}} \qquad (3.2)$$

where $\lambda$ is the normalizing constant and $1 - a(1 - \alpha)(X - \mu)' V^{-1} (X - \mu) > 0$. Observe that when $\alpha < 1$ the form in (3.2) is that of a multivariate type-1 beta density. When $\alpha > 1$, writing $1 - \alpha = -(\alpha - 1)$, we have

$$f = \lambda\left[1 + a(\alpha - 1)(X - \mu)' V^{-1} (X - \mu)\right]^{-\frac{1}{\alpha-1}}, \qquad \alpha > 1,\ a > 0. \qquad (3.3)$$

Note that (3.3) is a multivariate type-2 beta density. But when $\alpha \to 1$ in (3.2) and (3.3) we have the form

$$f = \lambda\,e^{-a(X - \mu)' V^{-1} (X - \mu)}. \qquad (3.4)$$

Note that $\lambda$ in (3.2), (3.3) and (3.4) are different and are to be evaluated separately for the three cases $\alpha < 1$, $\alpha > 1$ and $\alpha \to 1$. Thus (3.2) and (3.3) provide a pathway to the multivariate Gaussian density in (3.4). The normalizing constant in (3.4) is

$$\lambda = \frac{a^{q/2}}{\pi^{q/2}\,|V|^{1/2}} = \frac{1}{(2\pi)^{q/2}\,|V|^{1/2}} \quad \text{for } a = \tfrac{1}{2}, \qquad (3.5)$$

or when $\alpha \to 1$ in (3.2) and (3.3).
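The pathway behaviour of (3.2)-(3.4) is easy to check numerically in terms of the scalar distance $s = (X - \mu)' V^{-1} (X - \mu)$; the following sketch (ours, with $a = 1/2$) shows the type-1 and type-2 beta kernels closing in on the Gaussian kernel $e^{-as}$ as $\alpha \to 1$:

```python
import math

def pathway_kernel(s, alpha, a=0.5):
    """Unnormalized pathway form in s = (X - mu)' V^{-1} (X - mu).

    alpha < 1 gives the type-1 beta form (3.2), with finite support;
    alpha > 1 gives the type-2 beta form (3.3); both tend to exp(-a*s).
    """
    if alpha < 1.0:
        base = 1.0 - a * (1.0 - alpha) * s
        return base ** (1.0 / (1.0 - alpha)) if base > 0.0 else 0.0
    return (1.0 + a * (alpha - 1.0) * s) ** (-1.0 / (alpha - 1.0))

s = 1.7
gaussian_kernel = math.exp(-0.5 * s)   # the limiting form (3.4) with a = 1/2
```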
3.1 The normalizing constant λ

Let us consider the case $\alpha < 1$ first. Since the total integral is 1 we have

$$1 = \int_X f(X)\,dX = \lambda\,|V|^{1/2} \int_Y \left[1 - a(1 - \alpha)(y_1^2 + \cdots + y_q^2)\right]^{\frac{1}{1-\alpha}} dY, \quad Y = V^{-1/2}(X - \mu) \Rightarrow dX = |V|^{1/2}\,dY,$$

$$= \lambda\,2^q\,|V|^{1/2} \int \cdots \int_{y_j > 0,\ j = 1, \ldots, q,\ 1 - a(1-\alpha)(y_1^2 + \cdots + y_q^2) > 0} \left[1 - a(1 - \alpha)(y_1^2 + \cdots + y_q^2)\right]^{\frac{1}{1-\alpha}} dY.$$

Put $u_j = a(1 - \alpha)\,y_j^2 \Rightarrow dy_j = \frac{1}{2}\,u_j^{\frac{1}{2} - 1}\,du_j / [a(1 - \alpha)]^{1/2}$, $\alpha < 1$. Then
$$1 = \frac{\lambda\,|V|^{1/2}}{[a(1 - \alpha)]^{q/2}} \int \cdots \int_{1 - u_1 - \cdots - u_q > 0,\ 0 < u_j < 1,\ j = 1, \ldots, q} u_1^{\frac{1}{2} - 1} \cdots u_q^{\frac{1}{2} - 1}\,(1 - u_1 - \cdots - u_q)^{\frac{1}{1-\alpha}}\,du_1 \wedge \cdots \wedge du_q$$

$$= \frac{\lambda\,|V|^{1/2}}{[a(1 - \alpha)]^{q/2}} \cdot \frac{\left[\Gamma\!\left(\tfrac{1}{2}\right)\right]^q\,\Gamma\!\left(\tfrac{1}{1-\alpha} + 1\right)}{\Gamma\!\left(\tfrac{1}{1-\alpha} + 1 + \tfrac{q}{2}\right)},$$

by evaluating the integral with the help of a type-1 Dirichlet integral (Mathai [3]). Thus

$$\lambda = \frac{[a(1 - \alpha)]^{q/2}\,\Gamma\!\left(\tfrac{1}{1-\alpha} + 1 + \tfrac{q}{2}\right)}{\Gamma\!\left(\tfrac{1}{1-\alpha} + 1\right)\,|V|^{1/2}\,\pi^{q/2}} \quad \text{for } \alpha < 1. \qquad (3.6)$$
For $\alpha > 1$, writing $1 - \alpha = -(\alpha - 1)$, proceeding as above and finally evaluating the integral with the help of a type-2 Dirichlet integral (Mathai [3]), we have

$$\lambda = \frac{[a(\alpha - 1)]^{q/2}\,\Gamma\!\left(\tfrac{1}{\alpha-1}\right)}{|V|^{1/2}\,\pi^{q/2}\,\Gamma\!\left(\tfrac{1}{\alpha-1} - \tfrac{q}{2}\right)}, \qquad \frac{1}{\alpha - 1} - \frac{q}{2} > 0,\ \alpha > 1. \qquad (3.7)$$
When $\alpha \to 1$, do (3.6) and (3.7) go to (3.5)? This can be checked with the help of Stirling's formula, which states that for $|z| \to \infty$ and $\varepsilon$ a bounded quantity,

$$\Gamma(z + \varepsilon) \approx \sqrt{2\pi}\,z^{z + \varepsilon - \frac{1}{2}}\,e^{-z}. \qquad (3.8)$$
Note that for $\alpha < 1$ and $\alpha \to 1$, $\frac{1}{1-\alpha} \to \infty$. Then applying Stirling's formula to $\Gamma\!\left(\tfrac{1}{1-\alpha} + 1 + \tfrac{q}{2}\right)$ and $\Gamma\!\left(\tfrac{1}{1-\alpha} + 1\right)$ in (3.6) we have

$$\lambda \to \frac{\sqrt{2\pi}\left(\tfrac{1}{1-\alpha} + 1 + \tfrac{q}{2}\right)^{\frac{1}{1-\alpha} + 1 + \frac{q}{2} - \frac{1}{2}} e^{-\frac{1}{1-\alpha}}\,[a(1 - \alpha)]^{q/2}}{\sqrt{2\pi}\left(\tfrac{1}{1-\alpha} + 1\right)^{\frac{1}{1-\alpha} + 1 - \frac{1}{2}} e^{-\frac{1}{1-\alpha}}\,|V|^{1/2}\,\pi^{q/2}} = \frac{a^{q/2}}{\pi^{q/2}\,|V|^{1/2}},$$

which is the value of $\lambda$ in (3.5). Thus when $\alpha$ approaches 1 from the left, (3.6) goes to (3.5). Similarly we can see that (3.7) also goes to (3.5) when $\alpha \to 1$ from the right. This constitutes the pathway to the multivariate Gaussian density.
3.2 The mean value and covariance matrix of X in (3.2)

$$E(X) = \int_X X f(X)\,dX = \mu \int_X f(X)\,dX + \int_X (X - \mu) f(X)\,dX = \mu + \lambda\,|V|^{1/2}\,V^{1/2} \int_Y Y\left[1 - a(1 - \alpha)\,Y'Y\right]^{\frac{1}{1-\alpha}} dY,$$

since $\int_X f(X)\,dX = 1$ and since $X - \mu = V^{1/2} Y$ when $Y = V^{-1/2}(X - \mu) \Rightarrow dX = |V|^{1/2}\,dY$. But $Y[1 - a(1 - \alpha)\,Y'Y]^{\frac{1}{1-\alpha}}$ is an odd function and hence the integral over $Y$ is null. Hence $E(X) = \mu$.
$$\mathrm{Cov}(X) = E(X - E(X))(X - E(X))' = E(X - \mu)(X - \mu)' = V^{1/2}\{E(YY')\}V^{1/2}, \quad Y = V^{-1/2}(X - \mu),$$

$$= \lambda\,|V|^{1/2}\,V^{1/2}\left\{\int_Y YY'\left[1 - a(1 - \alpha)\,Y'Y\right]^{\frac{1}{1-\alpha}} dY\right\} V^{1/2}.$$
Note that $YY'$ is a $q \times q$ matrix whose $(i, j)$-th element is $y_i y_j$. For $i \ne j$ the integral over $Y$ is zero since $y_i y_j [1 - a(1 - \alpha)\,Y'Y]^{\frac{1}{1-\alpha}}$ is an odd function in $y_i$ as well as in $y_j$. The diagonal elements of $YY'$ are $y_1^2, \ldots, y_q^2$. The integral over one of them will be of the form

$$\int_Y y_1^2\left[1 - a(1 - \alpha)\,Y'Y\right]^{\frac{1}{1-\alpha}} dY \quad \text{for } 1 - a(1 - \alpha)\,Y'Y > 0,\ \alpha < 1,$$

$$= 2^q \int \cdots \int y_1^2\left[1 - a(1 - \alpha)\,Y'Y\right]^{\frac{1}{1-\alpha}} dY \quad \text{for } y_j > 0,\ j = 1, \ldots, q,\ \alpha < 1 \text{ and } 1 - a(1 - \alpha)(y_1^2 + \cdots + y_q^2) > 0,$$

$$= \frac{1}{[a(1 - \alpha)]^{\frac{q}{2} + 1}} \int \cdots \int u_1^{\frac{3}{2} - 1}\,u_2^{\frac{1}{2} - 1} \cdots u_q^{\frac{1}{2} - 1}\,(1 - u_1 - \cdots - u_q)^{\frac{1}{1-\alpha}}\,du_1 \wedge \cdots \wedge du_q$$

$$= \frac{1}{2} \cdot \frac{\left[\Gamma\!\left(\tfrac{1}{2}\right)\right]^q\,\Gamma\!\left(\tfrac{1}{1-\alpha} + 1\right)}{[a(1 - \alpha)]^{\frac{q}{2} + 1}\,\Gamma\!\left(\tfrac{1}{1-\alpha} + 1 + \tfrac{q}{2} + 1\right)},$$
by using a type-1 Dirichlet integral. Now, substituting back into the expression for $\mathrm{Cov}(X)$, with $\lambda$ given by (3.6), we have

$$\mathrm{Cov}(X) = \frac{1}{2a(1 - \alpha)\left(\tfrac{1}{1-\alpha} + 1 + \tfrac{q}{2}\right)}\,V = \frac{1}{2a\left[1 + (1 - \alpha)\left(1 + \tfrac{q}{2}\right)\right]}\,V, \qquad \alpha < 1. \qquad (3.9)$$

Observe that this is an interesting result because the covariance matrix of $X$ is not the parameter matrix $V$ in the model (3.2) and (3.6). For $\alpha > 1$, proceeding as before, one has

$$\mathrm{Cov}(X) = \frac{1}{2a\left[1 - (\alpha - 1)\left(\tfrac{q}{2} + 1\right)\right]}\,V, \qquad (3.10)$$

for $\alpha > 1$, $1 - (\alpha - 1)\left(\tfrac{q}{2} + 1\right) > 0$, which implies $1 < \alpha < 1 + \frac{1}{\frac{q}{2} + 1}$. Observe that when $a = \frac{1}{2}$ and $\alpha \to 1$, then (3.9) and (3.10) give the covariance matrix as $V$, which agrees with the multivariate Gaussian density. Hence the pathway for the covariance matrix is given in (3.9) and (3.10).
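Formula (3.9) can likewise be spot-checked numerically (our sketch) for $q = 1$, $V = 1$: the second moment of the standardized pathway density should equal $1/(2a[1 + (1 - \alpha)(1 + q/2)])$:

```python
import math

def second_moment(alpha, a=1.0, n=200_000):
    """Numerical E(y^2) under the q = 1, V = 1 pathway density, alpha < 1."""
    g = 1.0 / (1.0 - alpha)
    lam = math.exp(0.5 * math.log(a * (1.0 - alpha))
                   + math.lgamma(g + 1.5) - math.lgamma(g + 1.0)
                   - 0.5 * math.log(math.pi))
    r = 1.0 / math.sqrt(a * (1.0 - alpha))
    dy = 2.0 * r / n
    total = 0.0
    for i in range(n):
        y = -r + (i + 0.5) * dy          # midpoint rule over the support
        total += lam * y * y * (1.0 - a * (1.0 - alpha) * y * y) ** g * dy
    return total

def covariance_formula(alpha, a=1.0, q=1):
    """Equation (3.9) with V = 1."""
    return 1.0 / (2.0 * a * (1.0 + (1.0 - alpha) * (1.0 + q / 2.0)))
```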
3.3 The pathway surface

Let us look into the pathway model for the standard case. That is, for $\alpha < 1$,

$$g_1(Y) = \frac{[a(1 - \alpha)]^{q/2}\,\Gamma\!\left(\tfrac{1}{1-\alpha} + 1 + \tfrac{q}{2}\right)}{\Gamma\!\left(\tfrac{1}{1-\alpha} + 1\right)\,\pi^{q/2}}\left[1 - a(1 - \alpha)(y_1^2 + \cdots + y_q^2)\right]^{\frac{1}{1-\alpha}}, \qquad \alpha < 1,$$

with $1 - a(1 - \alpha)(y_1^2 + \cdots + y_q^2) > 0$. This is plotted for $q = 2$, $a = 1$ and for $\alpha = -0.5, 0, 0.5$.
For $\alpha > 1$,

$$g_2(Y) = \frac{[a(\alpha - 1)]^{q/2}\,\Gamma\!\left(\tfrac{1}{\alpha-1}\right)}{\Gamma\!\left(\tfrac{1}{\alpha-1} - \tfrac{q}{2}\right)\,\pi^{q/2}}\left[1 + a(\alpha - 1)(y_1^2 + \cdots + y_q^2)\right]^{-\frac{1}{\alpha-1}}, \qquad \frac{1}{\alpha - 1} - \frac{q}{2} > 0,\ \alpha > 1.$$

This is plotted for $q = 2$, $a = 1$, and for $\alpha = 1.1, 1.5, 1.7$.
For $\alpha \to 1$,

$$g_3(Y) = \frac{a^{q/2}}{\pi^{q/2}}\,e^{-a(y_1^2 + \cdots + y_q^2)}.$$

This is plotted for $a = 1$. The nature of the pathway surface as $\alpha$ moves from $-0.5$ to 1 can be seen from Figures 1a-1c and Figure 3. The nature of the movement as $\alpha$ moves from 1 to 1.7 can be seen from Figure 3 and Figures 2a-2c.
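A small numerical check of the surfaces (ours, not from the paper): for $q = 2$ the gamma ratios simplify, and the peak heights $g_1(0)$ and $g_2(0)$ approach $g_3(0) = a^{q/2}/\pi^{q/2}$ as $\alpha \to 1$ from either side:

```python
import math

def g1_peak(alpha, a=1.0):
    """g_1(0) for q = 2, alpha < 1 (lgamma avoids overflow as alpha -> 1)."""
    g = 1.0 / (1.0 - alpha)
    return math.exp(math.log(a * (1.0 - alpha))
                    + math.lgamma(g + 2.0) - math.lgamma(g + 1.0)) / math.pi

def g2_peak(alpha, a=1.0):
    """g_2(0) for q = 2, 1 < alpha < 2 (so that 1/(alpha-1) - q/2 > 0)."""
    h = 1.0 / (alpha - 1.0)
    return math.exp(math.log(a * (alpha - 1.0))
                    + math.lgamma(h) - math.lgamma(h - 1.0)) / math.pi

g3_peak = 1.0 / math.pi   # g_3(0) for q = 2, a = 1
```

For $q = 2$ both peaks reduce to the closed form $a(2 - \alpha)/\pi$, which makes the $\alpha \to 1$ limit transparent.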
4 Conclusions

The multivariate Gaussian density and its central place in the procedure of maximizing a generalized entropic form of order α is the core result of this paper. It contributes to the understanding of different entropic forms and of how they relate to each other through the parameter α (Mathai and Rathie [1], Masi [8]). This makes visible the pathway from type-1 beta, through Gaussian, to type-2 beta densities as they emerge depending on α, and shows the relation to the entropies of Boltzmann-Gibbs and Tsallis statistical mechanics (Hilhorst and Schehr [9], Vignat and Plastino [10]). While the generalized entropic form of order α may not have direct applications in statistical mechanics, it might be of interest to information theory and to a better understanding of attempts to unify entropic forms under either mathematical or physical principles. A graphical representation of the pathway is given in Figures 1, 2, and 3.

Acknowledgement: The authors would like to thank the Department of Science and Technology, Government of India, New Delhi, for the financial assistance for this work under project No. SR/S4/MS:287/05, which made this collaboration possible.
References

[1] A.M. Mathai and P.N. Rathie, Basic Concepts in Information Theory and Statistics: Axiomatic Foundations and Applications, Wiley Halsted, New York 1975.

[2] A.M. Mathai and H.J. Haubold, Pathway model, superstatistics, Tsallis statistics, and a generalized measure of entropy, Physica A 375 (2007) 110-122.

[3] A.M. Mathai, A review of the recent developments on generalized complex matrix-variate Dirichlet integrals, in Proceedings of the 7th International Conference of the Society for Special Functions and their Applications (SSFA), Pune, India, 21-23 February 2006, Ed. A.K. Agarwal, Published by SSFA, pp. 131-142.

[4] W. Feller, An Introduction to Probability Theory and Its Applications, Volume I, Third Edition, John Wiley and Sons, New York 1968.

[5] H.L. Swinney and C. Tsallis (Eds.), Anomalous Distributions, Nonlinear Dynamics, and Nonextensivity, Physica D 193 (2004) 1-356.

[6] S. Abe and Y. Okamoto (Eds.), Nonextensive Statistical Mechanics and Its Applications, Springer, Heidelberg 2001.

[7] M. Gell-Mann and C. Tsallis (Eds.), Nonextensive Entropy: Interdisciplinary Applications, Oxford University Press, New York 2004.

[8] M. Masi, A step beyond Tsallis and Rényi entropies, Physics Letters A 338 (2005) 217-224.

[9] H.J. Hilhorst and G. Schehr, A note on q-Gaussians and non-Gaussians in statistical mechanics, Journal of Statistical Mechanics: Theory and Experiment, 2007, P06003.

[10] C. Vignat and A. Plastino, Scale invariance and related properties of q-Gaussian systems, Physics Letters A 365 (2007) 370-375.