All content in this area was uploaded by Abraham Nunes on Jun 17, 2020
Article
Multiplicative Decomposition of Heterogeneity in
Mixtures of Continuous Distributions
Abraham Nunes 1,†,*, Martin Alda 1 and Thomas Trappenberg 2
1Department of Psychiatry, Dalhousie University, Halifax, Nova Scotia, Canada
2Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
*Correspondence: nunes@dal.ca
† Current address: 5909 Veterans Memorial Lane (8th Floor), Abbie J. Lane Memorial Building, QEII Health
Sciences Centre, Halifax, Nova Scotia, B3H 2E2, Canada
Received: date; Accepted: date; Published: date
Abstract: A system’s heterogeneity (diversity) is the effective size of its event space, and can be
quantified using the Rényi family of indices (also known as Hill numbers in ecology or Hannah-Kay
indices in economics), which are indexed by an elasticity parameter $q \geq 0$. Under these indices, the
heterogeneity of a composite system (the $\gamma$-heterogeneity) is decomposable into heterogeneity arising
from variation within and between component subsystems (the $\alpha$- and $\beta$-heterogeneity, respectively).
Since the average heterogeneity of a component subsystem should not be greater than that of the
pooled system, we require that $\gamma \geq \alpha$. There exists a multiplicative decomposition for Rényi
heterogeneity of composite systems with discrete event spaces, but less attention has been paid to
decomposition in the continuous setting. We therefore describe multiplicative decomposition of
the Rényi heterogeneity for continuous mixture distributions under parametric and non-parametric
pooling assumptions. Under non-parametric pooling, the $\gamma$-heterogeneity must often be estimated
numerically, but the multiplicative decomposition holds such that $\gamma \geq \alpha$ for $q > 0$. Conversely,
under parametric pooling, $\gamma$-heterogeneity can be computed efficiently in closed form, but the
$\gamma \geq \alpha$ condition holds reliably only at $q = 1$. Our findings will further contribute to heterogeneity
measurement in continuous systems.
Keywords: Heterogeneity, Diversity, Decomposition, Gaussian mixture
1. Introduction
Measurement of heterogeneity is important across many scientific disciplines. Ecologists are
interested in the heterogeneity of ecosystems’ biological composition (biodiversity) [1], economists are
interested in the heterogeneity of resource ownership (wealth equality) [2], and medical researchers
and physicians are interested in the heterogeneity of diseases and their presentations [3]. Using Rényi
heterogeneity [3–5], which for categorical random variables corresponds to ecologists’ Hill numbers
[6] and economists’ Hannah-Kay indices [7], one can measure a system’s heterogeneity as its effective
number of distinct configurations.
The heterogeneity of a mixture or ensemble of systems is often known as $\gamma$-heterogeneity, and is
generated by variation occurring within and between constituent subsystems. A good heterogeneity
measure will facilitate decomposition of $\gamma$-heterogeneity into $\alpha$ (within subsystem) and $\beta$ (between
subsystem) components. Under this decomposition, we require that $\gamma \geq \alpha$, since it is counterintuitive
that the heterogeneity of the overall ensemble should be less than that of any of its constituents, let alone the
“average” subsystem [8,9]. Such a decomposition was introduced by Jost [9] for systems represented
on discrete event spaces (such as representations of organisms by species labels). However, many data
are better modeled by continuous embeddings, including word semantics [10–12], genetic population
structure [13], and natural images [14]. Unfortunately, considerably less is understood about how
to decompose Rényi heterogeneity in such cases, where data are represented on non-categorical spaces
[4]. Although there are decomposable functional diversity indices expressed in numbers equivalent,
they require categorical partitioning of the data (in order to supply species (dis)similarity matrices)
[15–18] and setting sensitivity or threshold parameters for (dis)similarities [16,18]. For many research
applications, such as those in psychiatry [3,4,19] or involving unsupervised learning [13,14], we may
not have categorical partitions of the observable space that are valid, reliable, and of semantic relevance.
If we are to apply Rényi heterogeneity to such continuous-space systems, then we must demonstrate
that its multiplicative decomposition of $\gamma$-heterogeneity into $\alpha$ and $\beta$ components is retained.
Therefore, our present work extends the Jost [9] multiplicative decomposition of Rényi
heterogeneity to the analysis of continuous systems, and provides conditions under which the
$\gamma \geq \alpha$ condition is satisfied. In Section 2, we introduce decomposition of the Rényi heterogeneity
in categorical and continuous systems. Specifically, we highlight that the most important decision guiding the
availability of a decomposition is how one defines the distribution over the mixture of subsystems. We
show that for non-parametrically pooled systems (i.e. finite mixture models, illustrated in Section 3), the
$\gamma \geq \alpha$ condition can hold for all values of the Rényi elasticity parameter $q > 0$, but that
$\gamma$-heterogeneity will generally require numerical estimation. Section 4 introduces decomposition of
Rényi heterogeneity under parametric assumptions on the pooled system’s distribution. In this case, which
amounts to a Gaussian mixed-effects model (as commonly implemented in biomedical meta-analyses), we show
that $\gamma \geq \alpha$ will hold at $q = 1$, though not necessarily at $q \neq 1$. Finally, in Section 5, we discuss the
implications of our findings and scenarios in which parametric or non-parametric pooling assumptions
might be particularly useful.
2. Background
2.1. Categorical Rényi Heterogeneity Decomposition
In this section, we consider the definition and decomposition of Rényi heterogeneity for a
composite random variable (or “system”) that we call a discrete mixture (Definition 1).
Definition 1 (Discrete Mixture). A random variable or system $X$ is called a discrete mixture when it is
defined on an $n$-dimensional discrete state space $\mathcal{X} = \{1, 2, \ldots, n\}$ with probability distribution
$\bar{p} = (\bar{p}_i)_{i=1,2,\ldots,n}$, where $\bar{p}_i$ is the probability that $X$ is observed in state $i \in \mathcal{X}$.
Furthermore, let $X$ be an aggregation of $N$ component subsystems $X_1, X_2, \ldots, X_N$ with corresponding
probability distributions $P = (p_{ij})^{j=1,2,\ldots,n}_{i=1,2,\ldots,N}$. The proportion of $X$ attributable to each
component is governed by the weights $w = (w_i)_{i=1,2,\ldots,N}$, where $0 \leq w_i \leq 1$ and $\sum_{i=1}^{N} w_i = 1$.
Let $X$ be a discrete mixture. The Rényi heterogeneity for the $i$th component is

$$\Pi_q(X_i) = \left( \sum_{j=1}^{n} p_{ij}^{q} \right)^{\frac{1}{1-q}}, \tag{1}$$

which is the effective number of states in $X_i$. Assuming the pooled distribution over discrete mixture
$X$ is a weighted average of subsystem distributions, $\bar{p} = P^\top w$, the $\gamma$-heterogeneity is thus

$$\Pi^{\gamma}_q(X) = \left( \sum_{i=1}^{n} \bar{p}_i^{q} \right)^{\frac{1}{1-q}}, \tag{2}$$

which we interpret as the effective number of states in the pooled system $X$.
Jost [9] proposed the following decomposition of $\gamma$-heterogeneity:

$$\Pi^{\gamma}_q(X) = \Pi^{\alpha}_q(X)\, \Pi^{\beta}_q(X), \tag{3}$$

where $\Pi^{\alpha}_q(X)$ and $\Pi^{\beta}_q(X)$ are summary measures of heterogeneity due to variation within and between
subsystems, respectively. Since the $\gamma$ factor has units of effective number of states in the pooled system,
and $\alpha$ has units of effective number of states per component, then

$$\Pi^{\beta}_q(X) = \frac{\Pi^{\gamma}_q(X)}{\Pi^{\alpha}_q(X)} \tag{4}$$

yields the effective number of components in $X$.
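For discrete mixtures, Jost [9] specified the functional form for $\alpha$-heterogeneity as

$$\Pi^{\alpha}_q(X) = \begin{cases} \left( \sum_{i=1}^{N} \frac{w_i^{q}}{\sum_{k=1}^{N} w_k^{q}} \sum_{j=1}^{n} p_{ij}^{q} \right)^{\frac{1}{1-q}} & q \neq 1 \\ \exp\left\{ -\sum_{i=1}^{N} w_i \sum_{j=1}^{n} p_{ij} \log p_{ij} \right\} & q = 1 \end{cases}, \tag{5}$$

which allows the decomposition in Equation 3 to satisfy the following desiderata: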
1. The $\alpha$ and $\beta$ components are independent [20]
2. The within-group heterogeneity is a lower bound on total heterogeneity [8]: $\Pi^{\alpha}_q \leq \Pi^{\gamma}_q$
3. The $\alpha$-heterogeneity is a form of average heterogeneity over groups
4. The $\alpha$ and $\beta$ components are both expressed in numbers equivalent.
Specifically, Jost [9] proved that $\Pi^{\gamma}_q(X) \geq \Pi^{\alpha}_q(X)$ is guaranteed for all $q \geq 0$ when $w_i = w_j$ for all
$(i, j) \in \{1, 2, \ldots, N\}$, or for unequal weights $w$ if the elasticity is set to the Shannon limit of $q \to 1$.
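The discrete decomposition in Equations 1 to 5 can be sketched numerically as follows. This is a minimal illustration in Python with NumPy, not code from the original work; the function names `renyi_heterogeneity` and `decompose_discrete` and the toy matrix `P` are assumptions made for the example, and the sketch assumes $q > 0$.

```python
import numpy as np


def renyi_heterogeneity(p, q):
    """Effective number of states of a discrete distribution p (Equation 1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # zero-probability states do not contribute for q > 0
    if np.isclose(q, 1.0):
        return float(np.exp(-np.sum(p * np.log(p))))  # Shannon limit q -> 1
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))


def decompose_discrete(P, w, q):
    """Return (gamma, alpha, beta) for an N x n matrix P whose rows are component distributions."""
    P = np.asarray(P, dtype=float)
    w = np.asarray(w, dtype=float)
    gamma = renyi_heterogeneity(P.T @ w, q)  # pooled distribution, Equation 2
    if np.isclose(q, 1.0):
        log_P = np.log(np.where(P > 0, P, 1.0))  # log(1) = 0 wherever p_ij = 0
        alpha = float(np.exp(-np.sum(w[:, None] * P * log_P)))  # Equation 5, q = 1
    else:
        wq = w ** q / np.sum(w ** q)
        alpha = float(np.sum(wq * np.sum(P ** q, axis=1)) ** (1.0 / (1.0 - q)))  # Equation 5, q != 1
    return gamma, alpha, gamma / alpha  # beta from Equation 4


# Hypothetical toy system: two equally weighted components on disjoint pairs of states.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]])
w = np.array([0.5, 0.5])
gamma, alpha, beta = decompose_discrete(P, w, q=2.0)
print(gamma, alpha, beta)  # 4 pooled states, 2 states per component, 2 effective components
```

In this toy case the components do not overlap, so the effective number of components equals the actual number of components.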
2.2. Continuous Rényi Heterogeneity Decomposition
Let $X$ be a non-parametric continuous mixture according to Definition 2. Despite individual
mixture components in $X$ potentially having parametric probability density functions, we call this a
“non-parametric” mixture because the distribution over pooled components does not assume the form
of a known parametric family.
Definition 2 (Non-parametric Continuous Mixture). A non-parametric continuous mixture is a random
variable $X$ defined on an $n$-dimensional continuous space $\mathcal{X} \subseteq \mathbb{R}^n$, and composed of subsystems
$X_1, X_2, \ldots, X_N$, with respective probability density functions $f(x) = \{f_i(x)\}_{i=1,2,\ldots,N}$ and weights
$w = (w_i)_{i=1,2,\ldots,N}$ such that $\sum_{i=1}^{N} w_i = 1$ and $0 \leq w_i \leq 1$. The pooled probability density over $X$ is
defined as

$$\bar{f}(x) = \sum_{i=1}^{N} w_i f_i(x). \tag{6}$$
The continuous Rényi heterogeneity for the $i$th subsystem of $X$ is

$$\Pi_q(X_i) = \left( \int_{\mathcal{X}} f_i^{q}(x)\, dx \right)^{\frac{1}{1-q}}, \tag{7}$$

whose interpretation is given by Proposition 1 (see Proposition A3 in Nunes et al. [5] for the proof),
which we henceforth call the “effective volume” of the event space or domain of $X_i$.
Proposition 1 (Rényi Heterogeneity of a Continuous Random Variable). The Rényi heterogeneity of a
continuous random variable $X$ defined on event space $\mathcal{X} \subseteq \mathbb{R}^n$ with probability density function $f$ is
equal to the magnitude of the volume of an $n$-cube over which there is a uniform probability density with
the same Rényi heterogeneity as that in $X$.
Given the pooled distribution as defined in Equation 6, the Rényi heterogeneity over the mixture,
which is the $\gamma$-heterogeneity, is

$$\Pi^{\gamma}_q(X) = \left( \int_{\mathcal{X}} \bar{f}^{q}(x)\, dx \right)^{\frac{1}{1-q}}. \tag{8}$$

The $\gamma$-heterogeneity is thus the total effective volume of $X$’s domain. The $\alpha$-heterogeneity represents
the effective volume per mixture component in $X$, and is computed as follows:

$$\Pi^{\alpha}_q(X) = \left( \sum_{i=1}^{N} \frac{w_i^{q}}{\sum_{k=1}^{N} w_k^{q}} \int_{\mathcal{X}} f_i^{q}(x)\, dx \right)^{\frac{1}{1-q}}. \tag{9}$$
Given Equations 8 and 9, the following theorem provides conditions under which $\gamma \geq \alpha$ is
satisfied for a non-parametric continuous mixture. The proof is analogous to that given by Jost [9] for
discrete mixtures, and is detailed in Appendix A.

Theorem 1. If $X$ is a non-parametric continuous mixture (Definition 2), with $\gamma$-heterogeneity specified by
Equation 8 and $\alpha$-heterogeneity given by Equation 9, then

$$\Pi^{\beta}_q(X) = \frac{\Pi^{\gamma}_q(X)}{\Pi^{\alpha}_q(X)} \geq 1 \tag{10}$$

under the following conditions:
1. $q = 1$
2. $q > 0$ when weights are equal for all mixture components.
If $\int_{\mathcal{X}} f_i^{q}(x)\, dx$ is analytically tractable for all $i \in \{1, 2, \ldots, N\}$, then a closed form expression
for $\Pi^{\alpha}_q(X)$ will be available. If $\int_{\mathcal{X}} \bar{f}^{q}(x)\, dx$ is also analytically tractable, then so too will be $\Pi^{\beta}_q(X)$.
However, this will depend entirely on the functional form of $\bar{f}$, and will rarely be the case using real
world data. In the majority of cases, $\int_{\mathcal{X}} \bar{f}^{q}(x)\, dx$ will have to be computed numerically.
3. Rényi Heterogeneity Decomposition under a Non-parametric Pooling Distribution
Definition 3 defines a general Gaussian mixture $X$ as a weighted combination of component
Gaussian random variables, without identifying the functional form of the composition. The
non-parametric Gaussian mixture, where the distribution over $X$ is a simple model average over
its Gaussian components, is specified in Definition 4.
Definition 3 (Gaussian Mixture). The $n$-dimensional Gaussian mixture $X$ is a weighted combination of the set
of $n$-dimensional Gaussian random variables $\{X_i\}_{i=1,2,\ldots,N}$ with component weights $w = (w_i)_{i=1,2,\ldots,N}$ such
that $0 \leq w_i \leq 1$ and $\sum_{i=1}^{N} w_i = 1$. The probability density function of component $X_i$ is denoted
$\mathcal{N}(x \mid \mu_i, \Sigma_i)$, and is parameterized by an $n \times 1$ mean vector $\mu_i$ and $n \times n$ covariance matrix $\Sigma_i$.
Definition 4 (Non-parametric Gaussian Mixture). We define the random variable $X$ as a non-parametric
Gaussian mixture if it is a Gaussian mixture (Definition 3) whose probability density function is defined as

$$\bar{f}(x \mid \mu_{1:N}, \Sigma_{1:N}, w) = \sum_{i=1}^{N} w_i\, \mathcal{N}(x \mid \mu_i, \Sigma_i), \tag{11}$$

where $\mu_{1:N}$ and $\Sigma_{1:N}$ denote the set of component mean vectors $\mu_1, \ldots, \mu_N$ and covariance matrices
$\Sigma_1, \ldots, \Sigma_N$, respectively.
We now introduce the Rényi heterogeneity of a single $n$-dimensional Gaussian random
variable (Proposition 2) and subsequently characterize the $\gamma$-, $\alpha$-, and $\beta$-heterogeneity values for
a non-parametric Gaussian mixture.
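Proposition 2 (Rényi Heterogeneity of a Multivariate Gaussian). The Rényi heterogeneity of an
$n$-dimensional Gaussian random variable $X$ with mean $\mu$ and covariance matrix $\Sigma$ is

$$\Pi_q(X) = \begin{cases} \text{Undefined} & q = 0 \\ (2\pi e)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}} & q = 1 \\ (2\pi)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}} & q = \infty \\ (2\pi)^{\frac{n}{2}} q^{\frac{n}{2(q-1)}} |\Sigma|^{\frac{1}{2}} & q \notin \{0, 1, \infty\} \end{cases}. \tag{12}$$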
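Equation 12 is straightforward to evaluate in code. The following sketch is illustrative only: the function name `gaussian_heterogeneity` is an assumption, and the univariate comparison against a Riemann-sum approximation of Equation 7 is included simply as a sanity check of the closed form.

```python
import numpy as np


def gaussian_heterogeneity(cov, q):
    """Closed-form Rényi heterogeneity of an n-dimensional Gaussian (Equation 12)."""
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    n = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    if q == 0:
        raise ValueError("heterogeneity is undefined at q = 0")
    log_base = 0.5 * n * np.log(2.0 * np.pi) + 0.5 * logdet  # log of (2 pi)^(n/2) |Sigma|^(1/2)
    if np.isinf(q):
        return float(np.exp(log_base))
    if np.isclose(q, 1.0):
        return float(np.exp(log_base + 0.5 * n))
    return float(np.exp(log_base + 0.5 * n * np.log(q) / (q - 1.0)))


# Univariate check at q = 2 against numerical integration of the squared density.
sigma = 0.5
x = np.linspace(-10.0, 10.0, 200001)
pdf = np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
numeric_q2 = (np.sum(pdf ** 2) * (x[1] - x[0])) ** (1.0 / (1.0 - 2.0))
closed_q2 = gaussian_heterogeneity(sigma ** 2, q=2.0)
print(numeric_q2, closed_q2)  # both approximately 1.77
```

Working with the log-determinant rather than the determinant itself is a design choice that avoids overflow or underflow when $n$ is large.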
The proof of Proposition 2 is included in Appendix A. Unfortunately, a closed form solution such
as Equation 12 cannot be obtained for the $\gamma$-heterogeneity of a non-parametric Gaussian mixture,

$$\Pi^{\gamma}_q(X) = \left( \int_{\mathcal{X}} \left( \sum_{i=1}^{N} w_i\, \mathcal{N}(x \mid \mu_i, \Sigma_i) \right)^{q} dx \right)^{\frac{1}{1-q}}, \tag{13}$$

which must be computed numerically to yield the effective size of the mixture’s domain. This process
may be computationally expensive, particularly in high dimensions. Conversely, Equation 9, which
yields the effective size of the domain per mixture component, can be evaluated in closed form for a
Gaussian mixture:
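$$\Pi^{\alpha}_q(X) = \begin{cases} \text{Undefined} & q = 0 \\ \exp\left\{ \frac{1}{2} \left( n + \sum_{i=1}^{N} w_i \log |2\pi\Sigma_i| \right) \right\} & q = 1 \\ 0 & q = \infty \\ (2\pi)^{\frac{n}{2}} \left( \sum_{i=1}^{N} \frac{w_i^{q}}{\sum_{j=1}^{N} w_j^{q}} \frac{|\Sigma_i|^{\frac{1-q}{2}}}{q^{\frac{n}{2}}} \right)^{\frac{1}{1-q}} & q \notin \{0, 1, \infty\} \end{cases}. \tag{14}$$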
The $\beta$-heterogeneity, which returns the effective number of components in the mixture, can then
be computed using Equation 4. Example 1 demonstrates an important property of considering $X$ as a
non-parametric Gaussian mixture: that low-probability regions of the domain between well-separated
components will have little to no effect on the $\gamma$- or $\beta$-heterogeneity estimates.
Example 1 (Decomposition of Rényi heterogeneity in a univariate Gaussian mixture). Consider three
non-parametric Gaussian mixtures $X^{(1)}, X^{(2)}, X^{(3)}$ defined on $\mathbb{R}$ whose numbers of components are respectively
$N_1 = 2$, $N_2 = 3$, and $N_3 = 4$. Components in each mixture are equally weighted (that is, the components
of mixture $X^{(j)}$ have weights $w^{(j)}_i = 1/N_j$ for all $i \in \{1, 2, \ldots, N_j\}$) and have equal standard deviation
$\sigma = 0.5$. This yields a per-component Rényi heterogeneity of approximately 2.07, which is also consequently the
$\alpha$-heterogeneity for each Gaussian mixture.
Figure 1 demonstrates the multiplicative decomposition of Rényi heterogeneity (at $q = 1$) in these Gaussian
mixtures, where $\gamma$-heterogeneity was computed numerically, across varying separations of respective mixtures’
component means. Note that the $\beta$-heterogeneity in this case represents the effective number of distinct
components in the mixture distribution, and is bounded between 1 (when all components overlap) and $N_j$ (when
all components are well separated). Further separating the mixture components beyond the point at which
$\beta$-heterogeneity reaches $N_j$ yielded no further increase in $\beta$-heterogeneity.
[Figure 1: a 3 × 4 grid of panels, each plotting Density against x. Panel headings (γ, β), by row and
from left to right:
2 components: (2.1, 1.0), (4.1, 2.0), (4.1, 2.0), (4.1, 2.0)
3 components: (2.1, 1.0), (5.7, 2.8), (6.2, 3.0), (6.2, 3.0)
4 components: (2.1, 1.0), (5.9, 2.9), (8.1, 3.9), (8.3, 4.0)]
Figure 1. Demonstration of the multiplicative decomposition of Rényi heterogeneity in Gaussian
mixture models, where $\gamma$-heterogeneity is computed using numerical integration. Each row represents
a different number of mixture components (from top to bottom: 2, 3, and 4 univariate Gaussians with
$\sigma = 0.5$, respectively). Each column shows a case in which the component locations are progressively
further separated ($\max_i \mu_i - \min_i \mu_i$ distance from left to right: 0, 2, 4, 6). The $\alpha$-heterogeneity in all
scenarios was $\approx 2.07$. The headings on each panel show the resulting $\gamma$- and $\beta$-heterogeneity values.
Assuming sufficiently accurate approximation of the integral in Equation 13, the $\gamma$-heterogeneity
in Example 1 appears to reach a limit corresponding to the sum of effective domain sizes under
all mixture components, and the $\beta$-heterogeneity reaches a limit corresponding to the number of
individual mixture components.
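The limiting behaviour described in Example 1 can be explored with a short numerical sketch at $q = 1$. Several details here are assumptions rather than facts from the text: the component means are taken to be equally spaced over the stated separation, the integral is approximated by a Riemann sum on a fixed grid, and the function name `example1_decomposition` is hypothetical.

```python
import numpy as np


def example1_decomposition(n_components, spread, sigma=0.5):
    """(gamma, alpha, beta) at q = 1 for equally weighted univariate Gaussians.

    Assumes the component means are equally spaced over an interval of width `spread`.
    """
    means = np.linspace(-spread / 2.0, spread / 2.0, n_components)
    x = np.linspace(-20.0, 20.0, 80001)
    dx = x[1] - x[0]
    z = (x[None, :] - means[:, None]) / sigma
    pdfs = np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    pooled = pdfs.mean(axis=0)  # equal weights, Equation 11
    safe = np.where(pooled > 0, pooled, 1.0)  # avoids log(0) where the density underflows
    gamma = np.exp(-np.sum(pooled * np.log(safe)) * dx)  # Equation 13 in the limit q -> 1
    alpha = np.sqrt(2.0 * np.pi * np.e) * sigma  # Equation 14 at q = 1 with equal variances
    return float(gamma), float(alpha), float(gamma / alpha)


for spread in (0.0, 2.0, 4.0, 6.0):
    gamma, alpha, beta = example1_decomposition(2, spread)
    print(f"spread={spread:.0f}  gamma={gamma:.2f}  alpha={alpha:.2f}  beta={beta:.2f}")
```

With two components, $\beta$ starts at 1 when the means coincide and approaches 2 once the components are well separated, after which further separation changes nothing.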
Unfortunately, computation of $\beta$-heterogeneity in a non-parametric Gaussian mixture will yield
results whose accuracy will depend on the error of numerical integration, and which may consume
significant computational resources when evaluated for large $N$ (many components) and large $n$ (high
dimension). Although the non-parametric pooling approach may be the only available method for
many distribution classes, a computationally efficient parametric pooling approach exists for Gaussian
mixtures, to which we now turn our attention.
4. Rényi Heterogeneity Decomposition Under a Parametric Pooling Distribution
This section introduces the parametric Gaussian mixture (Definition 5), and subsequently
provides conditions under which decomposition of its heterogeneity satisfies the requirement that
α-heterogeneity be a lower bound on γ-heterogeneity (Theorem 2).
Definition 5 (Parametric Gaussian Mixture). We define the random variable $X$ as an $n$-dimensional
parametric Gaussian mixture if it is a Gaussian mixture (Definition 3) whose probability density function
is defined as

$$\bar{f}(x \mid \mu_*, \Sigma_*) = \mathcal{N}(x \mid \mu_*, \Sigma_*), \tag{15}$$

with pooled mean vector

$$\mu_* = \sum_{i=1}^{N} w_i \mu_i, \tag{16}$$

and pooled covariance matrix

$$\Sigma_* = -\mu_* \mu_*^\top + \sum_{i=1}^{N} w_i \left( \Sigma_i + \mu_i \mu_i^\top \right). \tag{17}$$
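The pooled parameters of Definition 5 can be computed directly from the component parameters. The sketch below is one possible implementation, with hypothetical function names (`pool_gaussians`, `gamma_parametric_q1`) and a hypothetical two-component example; the $\gamma$-heterogeneity is evaluated at $q = 1$ by substituting the pooled covariance into Equation 12.

```python
import numpy as np


def pool_gaussians(means, covs, w):
    """Pooled mean (Equation 16) and covariance (Equation 17) of a parametric Gaussian mixture."""
    means = np.asarray(means, dtype=float)  # shape (N, n)
    covs = np.asarray(covs, dtype=float)    # shape (N, n, n)
    w = np.asarray(w, dtype=float)          # shape (N,)
    mu_star = w @ means
    outer = np.einsum("ij,ik->ijk", means, means)  # mu_i mu_i^T for each component
    second_moment = np.einsum("i,ijk->jk", w, covs + outer)
    return mu_star, second_moment - np.outer(mu_star, mu_star)


def gamma_parametric_q1(means, covs, w):
    """Gamma-heterogeneity at q = 1: Equation 12 evaluated at the pooled covariance."""
    _, cov_star = pool_gaussians(means, covs, w)
    n = cov_star.shape[0]
    _, logdet = np.linalg.slogdet(cov_star)
    return float(np.exp(0.5 * n * (1.0 + np.log(2.0 * np.pi)) + 0.5 * logdet))


# Hypothetical example: two univariate components at -1 and +1 with variance 0.25 each.
means = [[-1.0], [1.0]]
covs = [[[0.25]], [[0.25]]]
mu_star, cov_star = pool_gaussians(means, covs, [0.5, 0.5])
print(mu_star, cov_star)  # pooled mean 0, pooled variance 0.25 + 1 = 1.25
```

No numerical integration is involved, which is the source of the computational saving relative to Equation 13.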
The advantage of assuming a parametric, rather than non-parametric, Gaussian mixture is that
$\gamma$-heterogeneity for the former may be computed in closed form using Equation 12 (it is simply a
function of Equation 17). However, the critical difference between the parametric and non-parametric
Gaussian mixture assumptions is that $\gamma$-heterogeneity (and therefore $\beta$-heterogeneity) will depend
on the component means $\mu_{1:N}$, according to the following Lemma.
Lemma 1 (Relationship of $\gamma$-Heterogeneity to Component Dispersion). Let $X$ and $X'$ be $N$-component
parametric Gaussian mixtures on $\mathbb{R}^n$ with component-wise mean vectors $\mu_{1:N} = \{\mu_i\}_{i=1,2,\ldots,N}$ and
$\mu'_{1:N} = \{\sqrt{c}\,\mu_i\}_{i=1,2,\ldots,N}$, where $c \geq 1$ is a scaling factor. The component-wise weights $w$ and covariance
matrices $\Sigma_{1:N} = \{\Sigma_i\}_{i=1,2,\ldots,N}$ are identical between $X$ and $X'$. Finally, let $\Sigma_*$ and $\Sigma'_*$ be the pooled
covariance matrices for $X$ and $X'$, respectively. Then, for all $c \geq 1$, we have that

$$\Pi^{\gamma}_q(X') \geq \Pi^{\gamma}_q(X), \tag{18}$$

with equality if $c = 1$.
Lemma 1, whose proof is detailed in Appendix A, implies that the resulting $\beta$-heterogeneity of a
parametric Gaussian mixture will increase as the mixture component means are spread further apart.
This follows from the fact that Equation 14, which is computed component-wise and does not involve the
component means, remains a valid expression of the $\alpha$-heterogeneity in a parametric Gaussian mixture.
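A small numerical sketch can make the consequence of Lemma 1 concrete at $q = 1$. The function name `beta_parametric_q1` and the two-component example are assumptions made for illustration; the ratio uses Equation 12 at the pooled covariance of Equation 17 for $\gamma$ and the $q = 1$ case of Equation 14 for $\alpha$.

```python
import numpy as np


def beta_parametric_q1(means, covs, w):
    """Beta-heterogeneity at q = 1 for a parametric Gaussian mixture (Equations 12, 14 and 17)."""
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    w = np.asarray(w, dtype=float)
    mu_star = w @ means
    outer = np.einsum("ij,ik->ijk", means, means)
    cov_star = np.einsum("i,ijk->jk", w, covs + outer) - np.outer(mu_star, mu_star)
    log_gamma = 0.5 * np.linalg.slogdet(cov_star)[1]
    log_alpha = 0.5 * np.sum(w * np.linalg.slogdet(covs)[1])
    # The (2 pi e)^(n/2) factors of gamma and alpha cancel in the ratio.
    return float(np.exp(log_gamma - log_alpha))


# Hypothetical mixture: two univariate components at -1 and +1, variance 0.25, equal weights.
means = np.array([[-1.0], [1.0]])
covs = np.array([[[0.25]], [[0.25]]])
w = np.array([0.5, 0.5])
betas = [beta_parametric_q1(np.sqrt(c) * means, covs, w) for c in (1.0, 4.0, 16.0)]
print(betas)  # increases with the scaling factor c
```

In this sketch $\beta$ is not capped at the number of components, which reflects the fact that the parametric pooled density also covers the region between the component means.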
Before stating the conditions under which $\alpha$ is a lower bound on $\gamma$ for a parametric Gaussian
mixture (Theorem 2), we introduce the following Lemma, whose proof is left to Appendix A.
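Lemma 2. If $\{\Sigma_i\}_{i=1,2,\ldots,N}$ is a set of $N \in \mathbb{N}_{\geq 2}$ positive semidefinite $n \times n$ matrices with corresponding
weights $w = (w_i)_{i=1,2,\ldots,N}$ such that $0 \leq w_i \leq 1$ and $\sum_{i=1}^{N} w_i = 1$, then

$$\left| \sum_{i=1}^{N} w_i \Sigma_i \right|^{\frac{1}{2}} \geq \prod_{i=1}^{N} |\Sigma_i|^{\frac{w_i}{2}}. \tag{19}$$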
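The inequality in Lemma 2 can be spot-checked numerically. The sketch below is only a sanity check under stated assumptions, not a substitute for the proof: it draws random positive definite matrices (a stricter condition than the lemma's positive semidefiniteness, so that all log-determinants are finite), and the function name `lemma2_gap` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def lemma2_gap(covs, w):
    """log of the left side of Equation 19 minus log of its right side; Lemma 2 says this is >= 0."""
    covs = np.asarray(covs, dtype=float)
    w = np.asarray(w, dtype=float)
    pooled = np.einsum("i,ijk->jk", w, covs)  # weighted average of the matrices
    lhs = 0.5 * np.linalg.slogdet(pooled)[1]
    rhs = 0.5 * np.sum(w * np.linalg.slogdet(covs)[1])
    return float(lhs - rhs)


gaps = []
for _ in range(100):
    A = rng.normal(size=(4, 3, 3))
    covs = A @ np.swapaxes(A, 1, 2) + 0.1 * np.eye(3)  # random positive definite 3 x 3 matrices
    w = rng.dirichlet(np.ones(4))                      # random weights summing to 1
    gaps.append(lemma2_gap(covs, w))
print(min(gaps) >= 0.0)
```

The gap is zero when all matrices are identical, which corresponds to $\beta = 1$ for a zero-mean parametric mixture.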
Theorem 2. The Rényi $\beta$-heterogeneity of order $q = 1$ of a parametric Gaussian mixture $X$ (Definition 5) has a
lower bound of 1:

$$\Pi^{\beta}_1(X) = \frac{\Pi^{\gamma}_1(X)}{\Pi^{\alpha}_1(X)} \geq 1 \tag{20}$$
Proof. Recall that $\Pi^{\alpha}_q(X)$ is independent of the mean vectors of components in $X$ (Equation 14).
Furthermore, it follows from Lemma 1 that if $\mu_{1:N} = \{\mathbf{0}\}_{i=1,2,\ldots,N}$, where $\mathbf{0}$ is an $n \times 1$ zero vector,
then for any parametric Gaussian mixture $X'$ with means $\mu'_{1:N}$ we will have $\Pi^{\gamma}_q(X') \geq \Pi^{\gamma}_q(X)$, where
equality is obtained if $\mu'_{1:N}$ are also zero vectors, or the covariance of mean vectors in $X'$,

$$\mathrm{Cov}[\mu'] = \mathrm{E}[\mu' \mu'^\top] - \mathrm{E}[\mu']\, \mathrm{E}[\mu']^\top, \tag{21}$$

is otherwise singular. Thus, it suffices to prove our theorem under the assumption that
$\mu_{1:N} = \{\mathbf{0}\}_{i=1,2,\ldots,N}$, where the pooled covariance of $X$ is redefined as

$$\Sigma_* = \sum_{i=1}^{N} w_i \Sigma_i. \tag{22}$$

The expression for $\Pi^{\gamma}_1(X) \geq \Pi^{\alpha}_1(X)$ is

$$(2\pi e)^{\frac{n}{2}} |\Sigma_*|^{\frac{1}{2}} \geq \exp\left\{ \frac{1}{2} \left( n + \sum_{i=1}^{N} w_i \log |2\pi\Sigma_i| \right) \right\}, \tag{23}$$

which after simplification,

$$|\Sigma_*|^{\frac{1}{2}} \geq \prod_{i=1}^{N} |\Sigma_i|^{\frac{w_i}{2}}, \tag{24}$$

is seen to hold by Lemma 2.
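The simplification from Equation 23 to Equation 24 can be spelled out as follows. This is a worked restatement that uses only $\sum_{i=1}^{N} w_i = 1$ and $|2\pi\Sigma_i| = (2\pi)^n |\Sigma_i|$; it introduces no assumptions beyond those of the theorem.

```latex
\begin{align*}
\exp\left\{ \tfrac{1}{2} \left( n + \sum_{i=1}^{N} w_i \log |2\pi\Sigma_i| \right) \right\}
  &= e^{\frac{n}{2}} \prod_{i=1}^{N} |2\pi\Sigma_i|^{\frac{w_i}{2}} \\
  &= e^{\frac{n}{2}} \prod_{i=1}^{N} (2\pi)^{\frac{n w_i}{2}} |\Sigma_i|^{\frac{w_i}{2}} \\
  &= (2\pi e)^{\frac{n}{2}} \prod_{i=1}^{N} |\Sigma_i|^{\frac{w_i}{2}}.
\end{align*}
```

Dividing both sides of Equation 23 by $(2\pi e)^{\frac{n}{2}}$ then leaves Equation 24.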
Although Theorem 2 highlights the reliability and flexibility of using elasticity $q = 1$, we must
emphasize that $q = 1$ may not be the only condition under which $\Pi^{\gamma}_q(X) \geq \Pi^{\alpha}_q(X)$, as suggested by
Example 2. Indeed, Example 2 suggests that the integrity of this bound on $\beta$-heterogeneity at elasticity
values $q \neq 1$ may depend in various ways on the unique combination of component-wise parameters
in a parametric Gaussian mixture.
Example 2 (Decomposition of Rényi Heterogeneity in a Parametric Gaussian Mixture). Consider a
parametric Gaussian mixture $X$ with four components defined on $\mathbb{R}$ (for instance, Figure 2A). The components’
respective standard deviations are $\sigma = (0.5, 0.8, 1.1, 1.6)$. We vary the column vector of mixture component
weights $w = (w_i)_{i=1,\ldots,4}$ according to the following function,

$$w(a) = \begin{cases} (1, 0, 0, 0)^\top & a = 0 \\ (0.25, 0.25, 0.25, 0.25)^\top & a = 1 \\ (0, 0, 0, 1)^\top & a = \infty \\ \left( \frac{a^{\frac{1}{3}} - 1}{a^{\frac{4}{3}} - 1}\, a^{\frac{i-1}{3}} \right)_{i=1,\ldots,4} & a \notin \{0, 1, \infty\} \end{cases} \tag{25}$$

which “skews” the distribution of weights over components in $X$ according to the value of a skew parameter
$a \geq 0$ (shown in Figure 2B). As the parameter $a$ decreases further below 1, components $X_1$ and $X_2$ (which have
the narrowest distributions) become preferentially weighted. Conversely, as $a$ increases above 1, components
$X_3$ and $X_4$ are preferentially weighted. At $a = 1$, all components are equally weighted (depicted as the dashed black
lines in Figure 2B-F).
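The weighting function in Equation 25 is a normalized geometric sequence with ratio $a^{1/3}$, which the following sketch makes explicit. The function name `skew_weights` is an assumption made for illustration, and the special cases are handled separately because the general expression is indeterminate at $a = 1$.

```python
import numpy as np


def skew_weights(a):
    """Weights over four components as a function of the skew parameter a >= 0 (Equation 25)."""
    if a == 0:
        return np.array([1.0, 0.0, 0.0, 0.0])
    if np.isinf(a):
        return np.array([0.0, 0.0, 0.0, 1.0])
    if np.isclose(a, 1.0):
        return np.full(4, 0.25)
    r = a ** (1.0 / 3.0)  # common ratio between successive weights
    return (r - 1.0) / (r ** 4 - 1.0) * r ** np.arange(4)


for a in (0.0, 0.125, 1.0, 8.0, np.inf):
    print(a, skew_weights(a))
```

For example, $a = 8$ gives a ratio of 2 and weights proportional to $(1, 2, 4, 8)$, while $a = 1/8$ reverses that ordering.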
[Figure 2: six panels. (A) Mixture Components: pdf against x for components X1 to X4. (B) Mixture Weight
Skews: Weight against Mixture Component (X1 to X4). (C) γ-Heterogeneity against q. (D) α-Heterogeneity
against q. (E) β-Heterogeneity (q ≠ 1) against q. (F) β-Heterogeneity at q = 1 against Weight Skew (log
scale, from X1 to X4). Legend: Mixture Weighting W1 to W7.]
Figure 2. Graphical counterexample showing that $\alpha$-heterogeneity is not always a lower bound on
$\gamma$-heterogeneity when $q \neq 1$ for a parametric Gaussian mixture. Panel A: Four univariate Gaussian
components used in the mixture distribution evaluated. Panel B: Mixture component weights. Each
colored line (see bottom right of Figure for legend) represents a different distribution of weights on the
mixture components, such that in some settings, the most narrow components are weighted highest,
and vice versa. Panel C: $\gamma$-heterogeneity as computed by pooling the mixture components from Panel
A according to Equation 15, for each weighting scheme at $q \neq 1$. Panel D: The $\alpha$-heterogeneity for
each weighting scheme at $q \neq 1$. Panel E: The $\beta$-heterogeneity across each weighting scheme at $q \neq 1$.
Panel F: The $\beta$-heterogeneity across various weighting schemes (plotted on the x-axis in log scale) at
$q = 1$. The vertical coloured lines correspond to the values of $\Pi^{\beta}_1(X)$ across the weighting schemes
$W_{1:7}$ shown in the legend of Panel C.
Figures 2C-E plot the $\gamma$-, $\alpha$-, and $\beta$-heterogeneity for the parametric Gaussian mixture at $q \neq 1$, respectively,
while Figure 2F computes the $\beta$-heterogeneity at $q = 1$ for variously skewed weight distributions. When the
skew parameter results in a distribution of weights whose ranking of components agrees with the rank order
of component distribution widths ($\sigma$), then $\beta$-heterogeneity appears to exceed 1 for $q > 1$. However, when the
component weights and distribution widths are anti-correlated (in terms of rank order), then we observe values
of $\beta$-heterogeneity below 1 at values of $q > 1$, as well as for some values of $q < 1$.
5. Discussion
This paper provided approaches for multiplicative decomposition of heterogeneity in
continuous mixture distributions, thereby extending the earlier work on discrete space heterogeneity
decomposition presented by Jost [9]. Two approaches were offered, dependent upon whether the
distribution over the pooled system is defined either parametrically or non-parametrically. Our results
improve the understanding of heterogeneity measurement in non-categorical systems by providing
conditions under which decomposition of heterogeneity into $\alpha$ and $\beta$ components conforms to the
intuitive property that $\gamma \geq \alpha$.
If one defines the pooled mixture non-parametrically, as in a finite mixture model, heterogeneity is
decomposable such that $\gamma \geq \alpha$ for all $q > 0$ (if component weights are uniform, or at $q = 1$ otherwise),
and $\beta$ may be interpreted as the discrete number of distinct mixture components (Sections 2.2 and 3).
This has the advantage of conforming with the original discrete decomposition by Jost [9], insofar as
probability mass in the mixture is recorded only where it is observed in the data, and not elsewhere,
as would be assumed under a parametric model of the pooled system. Consequently, one achieves a
more precise estimate of the size of the pooled system’s base of support. The primary limitation arises
from the need to numerically integrate the $\gamma$-heterogeneity, which can become prohibitively expensive
in higher dimensions. Future work should investigate the error bounds on numerically integrated $\gamma$.
A more computationally efficient approach for decomposition of continuous Rényi heterogeneity
is to assume that the pooled mixture has an overall parametric distribution. A common application for
which this assumption is generally made is in mixed-effects meta-analysis [21]. An important departure
from the non-parametric pooling approach of finite mixture models is that non-trivial probability mass
may now be assigned to regions not covered by any of the constituent component distributions. From
another perspective, one may appreciate that the non-parametric approach to pooling is insensitive to
the distance between component distributions, and rather only measures the effective volume of event
space to which component distributions assign probability. Conversely, assumption of a parametric
distribution over the mixture (in the case of Section 4, a Gaussian) incorporates the distance between
the component distributions into the calculation of $\gamma$-heterogeneity. This would be appropriate in
scenarios where one assumes that the observed components undersample the true distribution on
the pooled system. For example, in the case of mixed-effects meta-analysis, the available research
studies for inclusion may differ significantly in terms of their means, but one might assume that there
is a significant probability of a new study yielding an effect somewhere in between. Specifying a
parametric distribution over the pooled system would capture this assumption.
One limitation of the present study is the use of a Gaussian model for the pooled system
distribution. This was chosen on account of (A) its prevalence in the scientific literature and (B)
analytical tractability. Future work should expand these results to other distributions. Notwithstanding,
we have demonstrated the decomposition of Rényi $\gamma$-heterogeneity into its $\alpha$ and $\beta$ components for
continuous systems. There are (broadly) two approaches, based on whether parametric assumptions
are made about the pooled system distribution. Under these assumptions applied to Gaussian mixture
distributions, we provided conditions under which the criterion that $\gamma \geq \alpha$ is satisfied. Future
studies should evaluate this method as an alternative approach for the measurement of meta-analytic
heterogeneity, and expand these results to other parametric distributions over the pooled system.
Author Contributions:
Conceptualization, A.N.; methodology, A.N.; validation, A.N.; formal analysis, A.N.;
investigation, A.N.; writing–original draft preparation, A.N.; writing–review and editing, M.A. and T.T.;
visualization, A.N.; supervision, M.A. and T.T.
Funding: This research received no external funding.
Conflicts of Interest: The authors declare no conflict of interest.
Appendix A. Proofs
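Proof of Theorem 1. Following Jost [9] (proof 2), in the limit $q \to 1$, one obtains the following
inequality

$$-\sum_{i=1}^{N} w_i \int_{\mathcal{X}} f_i(x) \log f_i(x)\, dx \leq -\int_{\mathcal{X}} \bar{f}(x) \log \bar{f}(x)\, dx, \tag{A1}$$

whereas when $w_i = w_j$ for all $(i, j) \in \{1, 2, \ldots, N\}$, for $q > 1$ we have

$$\frac{1}{N} \sum_{i=1}^{N} \int_{\mathcal{X}} f_i^{q}(x)\, dx \geq \int_{\mathcal{X}} \left( \frac{1}{N} \sum_{i=1}^{N} f_i(x) \right)^{q} dx, \tag{A2}$$

and for $0 < q < 1$ we have

$$\frac{1}{N} \sum_{i=1}^{N} \int_{\mathcal{X}} f_i^{q}(x)\, dx \leq \int_{\mathcal{X}} \left( \frac{1}{N} \sum_{i=1}^{N} f_i(x) \right)^{q} dx, \tag{A3}$$

all of which hold by Jensen’s inequality. Raising both sides of Equations A2 and A3 to the power
$1/(1-q)$, which is negative for $q > 1$ and positive for $0 < q < 1$, gives $\Pi^{\alpha}_q(X) \leq \Pi^{\gamma}_q(X)$ in both cases.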
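Proof of Proposition 2. We must solve the following integral:

$$\Pi_q(X) = \left( (2\pi)^{-\frac{qn}{2}} |\Sigma|^{-\frac{q}{2}} \int_{\mathbb{R}^n} e^{-\frac{q}{2} (x-\mu)^\top \Sigma^{-1} (x-\mu)}\, dx \right)^{\frac{1}{1-q}} \tag{A4}$$

The eigendecomposition of the inverse of the covariance matrix $\Sigma^{-1}$ into an orthonormal matrix
of eigenvectors $U$ and an $n \times n$ diagonal matrix of eigenvalues $\Lambda = (\delta_{ij} \lambda_i)^{j=1,2,\ldots,n}_{i=1,2,\ldots,n}$, where $\delta_{ij}$ is
Kronecker’s delta, facilitates the substitution $y = U^{-1}(x - \mu)$ required for Gaussian integration, by
which we obtain the following solution for $q \notin \{0, 1, \infty\}$:

$$\Pi_q(X) = q^{\frac{n}{2(q-1)}} (2\pi)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}}. \tag{A5}$$

L’Hôpital’s rule facilitates computation of the limit as $q \to 1$:

$$\begin{aligned} \lim_{q \to 1} \log \Pi_q(X) &= \lim_{q \to 1} \frac{n}{2(q-1)} \log q + \frac{n}{2} \log(2\pi) + \frac{1}{2} \log |\Sigma| \\ &= \frac{n}{2} + \frac{n}{2} \log(2\pi) + \frac{1}{2} \log |\Sigma|, \end{aligned} \tag{A6}$$

giving the perplexity,

$$\Pi_1(X) = (2\pi e)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}}. \tag{A7}$$

By the same procedure, we can compute the limit as $q \to \infty$,

$$\Pi_\infty(X) = (2\pi)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}}, \tag{A8}$$

as well as show that $\Pi_0(X)$ is undefined.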
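Proof of Lemma 1. For all $q > 0$, proving $\Pi^{\gamma}_q(X') \geq \Pi^{\gamma}_q(X)$ amounts to proving $|\Sigma'_*|^{\frac{1}{2}} \geq |\Sigma_*|^{\frac{1}{2}}$. To
this end, we have

$$\begin{align} \Sigma'_* &= \sum_{i=1}^{N} w_i \Sigma_i + \sum_{i=1}^{N} w_i \left( \sqrt{c}\,\mu_i \right) \left( \sqrt{c}\,\mu_i \right)^\top - \left( \sum_{i=1}^{N} w_i \sqrt{c}\,\mu_i \right) \left( \sum_{i=1}^{N} w_i \sqrt{c}\,\mu_i \right)^\top \tag{A9} \\ &= \sum_{i=1}^{N} w_i \Sigma_i + c \left( \sum_{i=1}^{N} w_i \mu_i \mu_i^\top - \mu_* \mu_*^\top \right) \tag{A10} \\ &= \hat{\Sigma} + c\, C[\mu] \tag{A11} \end{align}$$

and

$$\Sigma_* = \hat{\Sigma} + C[\mu], \tag{A12}$$

where we denoted $\hat{\Sigma} = \sum_{i=1}^{N} w_i \Sigma_i$ and $C[\mu] = \sum_{i=1}^{N} w_i \mu_i \mu_i^\top - \mu_* \mu_*^\top$ for notational parsimony. Clearly,
when $c = 1$, we have $\Sigma'_* = \Sigma_*$.
By the Minkowski determinant inequality, we have that

$$\begin{align} |\Sigma'_*|^{\frac{1}{2}} &\geq |\hat{\Sigma}|^{\frac{1}{2}} + c^{\frac{n}{2}} |C[\mu]|^{\frac{1}{2}} \tag{A13} \\ |\Sigma_*|^{\frac{1}{2}} &\geq |\hat{\Sigma}|^{\frac{1}{2}} + |C[\mu]|^{\frac{1}{2}}, \tag{A14} \end{align}$$

which, since $c \geq 1$, implies the first line is greater than or equal to the second. Subtracting the second
line from the first and simplifying yields

$$\frac{|\Sigma'_*|^{\frac{1}{2}} - |\Sigma_*|^{\frac{1}{2}}}{|C[\mu]|^{\frac{1}{2}}} \geq c^{\frac{n}{2}} - 1 \tag{A15}$$

At $c = 1$ Equation A15 reduces to an equality, and since $c \geq 1$ and $n \geq 1$, Equation A15 establishes that
$|\Sigma'_*|^{\frac{1}{2}} \geq |\Sigma_*|^{\frac{1}{2}}$.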
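Proof of Lemma 2. Since $\Sigma_{1:N}$ are positive semidefinite matrices, then for all $x \in \mathbb{R}^n$, we have that
$-\frac{1}{2} x^\top (w_i \Sigma_i) x \leq 0$, and thus $-\frac{1}{2} x^\top \left( \sum_{i=1}^{N} w_i \Sigma_i \right) x \leq 0$. By exponentiating the quadratic term, we have

$$e^{-\frac{1}{2} x^\top \left( \sum_{i=1}^{N} w_i \Sigma_i \right) x} = \prod_{i=1}^{N} \left( e^{-\frac{1}{2} x^\top \Sigma_i x} \right)^{w_i}. \tag{A16}$$

We obtain the following expressions by applying Gaussian integration to the left hand side,

$$\int_{\mathbb{R}^n} e^{-\frac{1}{2} x^\top \left( \sum_{i=1}^{N} w_i \Sigma_i \right) x}\, dx = (2\pi)^{\frac{n}{2}} \left| \sum_{i=1}^{N} w_i \Sigma_i \right|^{-\frac{1}{2}}, \tag{A17}$$

as well as to a bound on the right hand side obtained by Hölder’s inequality,

$$\begin{align} \int_{\mathbb{R}^n} \prod_{i=1}^{N} \left( e^{-\frac{1}{2} x^\top \Sigma_i x} \right)^{w_i} dx &\leq \prod_{i=1}^{N} \left( \int_{\mathbb{R}^n} e^{-\frac{1}{2} x^\top \Sigma_i x}\, dx \right)^{w_i} \tag{A18} \\ &= (2\pi)^{\frac{n}{2}} \left( \prod_{i=1}^{N} |\Sigma_i|^{-\frac{w_i}{2}} \right). \tag{A19} \end{align}$$

Substituting Equations A17 and A19 into Equation A16 and simplifying terms yields

$$\left| \sum_{i=1}^{N} w_i \Sigma_i \right|^{\frac{1}{2}} \geq \prod_{i=1}^{N} |\Sigma_i|^{\frac{w_i}{2}}. \tag{A20}$$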
References
1. Hooper, D.; Chapin, F.; Ewel, J.; Hector, A.; Inchausti, P.; Lavorel, S.; Lawton, J.; Lodge, D.; Loreau,
M.; Naeem, S.; Schmid, B.; Setälä, H.; Symstad, A.; Vandermeer, J.; Wardle, D. Effects of biodiversity
on ecosystem functioning: A consensus of current knowledge. Ecological Monographs 2005, 75, 3–35,
[arXiv:1011.1669v3]. doi:10.1890/04-0922.
2. Cowell, F. Measuring Inequality, 2nd ed.; Oxford University Press: Oxford, UK, 2011.
3. Nunes, A.; Trappenberg, T.; Alda, M. We need an operational framework for heterogeneity in psychiatric
research. Journal of Psychiatry and Neuroscience 2020, 45, 3–6. doi:10.1503.jpn/190198.
4. Nunes, A.; Trappenberg, T.; Alda, M. The Definition and Measurement of Heterogeneity. PsyArXiv 2020.
doi:10.31234/osf.io/3hykf.
5. Nunes, A.; Alda, M.; Bardouille, T.; Trappenberg, T. Representational Rényi heterogeneity. Entropy 2020,
22(4).
6. Hill, M. Diversity and Evenness: A Unifying Notation and Its Consequences. Ecology 1973, 54, 427–432,
[arXiv:arXiv:astro-ph/0507464v2]. doi:10.2307/1934352.
7. Hannah, L.; Kay, J. Concentration in Modern Industry: Theory, Measurement, and the U.K. Experience.; The
MacMillan Press: London, UK, 1977.
8. Lande, R. Statistics and partitioning of species diversity and similarity among multiple communities. Oikos
1996, 76, 5–13.
9. Jost, L. Partitioning Diversity into Independent Alpha and Beta Components. Ecology 2007, 88, 2427–2439,
[1106.4388]. doi:10.1002/ecy.2039.
10. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Distributed representations of words and phrases and their
compositionality. NIPS, 2013, pp. 1–9, [1310.4546].
11. Pennington, J.; Socher, R.; Manning, C. Glove: Global Vectors for Word Representation. Proceedings of the
2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543,
[1504.06654]. doi:10.3115/v1/D14-1162.
12. Nickel, M.; Kiela, D. Poincaré embeddings for learning hierarchical representations. Advances in Neural
Information Processing Systems, 2017, Vol. 2017-Decem, pp. 6339–6348.
13. Price, A.L.; Patterson, N.J.; Plenge, R.M.; Weinblatt, M.E.; Shadick, N.a.; Reich, D. Principal components
analysis corrects for stratification in genome-wide association studies. Nature genetics 2006, 38, 904–909.
doi:10.1038/ng1847.
14. Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and Improving the Image
Quality of StyleGAN. CoRR 2019, abs/1912.04958.
15. Ricotta, C.; Szeidl, L. Diversity partitioning of Rao’s quadratic entropy. Theoretical Population Biology 2009,
76, 299–302.
16. Leinster, T.; Cobbold, C. Measuring diversity: The importance of species similarity. Ecology 2012,
93, 477–489, [1106.4388].
17. Chiu, C.; Chao, A. Distance-based functional diversity measures and their decomposition: A framework
based on hill numbers. PLoS ONE 2014, 9.
18. Chao, A.; Chiu, C.H.; Villéger, S.; Sun, I.F.; Thorn, S.; Lin, Y.C.; Chiang, J.M.; Sherwin, W.B. An
attribute-diversity approach to functional diversity, functional beta diversity, and related (dis)similarity
measures. Ecological Monographs 2019, 89, e01343. doi:10.1002/ecm.1343.
19. Marquand, A.; Wolfers, T.; Mennes, M.; Buitelaar, J.; Beckmann, C. Beyond Lumping and Splitting: A
Review of Computational Approaches for Stratifying Psychiatric Disorders. Biological Psychiatry: Cognitive
Neuroscience and Neuroimaging 2016, 1, 433–447.
20. Wilson, M.; Shmida, A. Measuring Beta Diversity with Presence-Absence Data. The Journal of Ecology 1984,
72, 1055. doi:10.2307/2259551.
21. DerSimonian, R.; Laird, N. Meta-analysis in clinical trials. Controlled Clinical Trials 1986, 7, 177–188.
doi:10.1016/0197-2456(86)90046-2.