Multiplicative Decomposition of Heterogeneity in
Mixtures of Continuous Distributions
Abraham Nunes 1,†,*, Martin Alda 1 and Thomas Trappenberg 2
1 Department of Psychiatry, Dalhousie University, Halifax, Nova Scotia, Canada
2 Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
* Correspondence: nunes@dal.ca
† Current address: 5909 Veterans Memorial Lane (8th Floor), Abbie J. Lane Memorial Building, QE I.I. Health Sciences Centre, Halifax, Nova Scotia, B3H 2E2, Canada
Received: date; Accepted: date; Published: date
Abstract: A system’s heterogeneity (diversity) is the effective size of its event space, and can be quantified using the Rényi family of indices (also known as Hill numbers in ecology or Hannah-Kay indices in economics), which are indexed by an elasticity parameter $q \geq 0$. Under these indices, the heterogeneity of a composite system (the $\gamma$-heterogeneity) is decomposable into heterogeneity arising from variation within and between component subsystems (the $\alpha$- and $\beta$-heterogeneity, respectively). Since the average heterogeneity of a component subsystem should not be greater than that of the pooled system, we require that $\gamma \geq \alpha$. There exists a multiplicative decomposition for Rényi heterogeneity of composite systems with discrete event spaces, but less attention has been paid to decomposition in the continuous setting. We therefore describe multiplicative decomposition of the Rényi heterogeneity for continuous mixture distributions under parametric and non-parametric pooling assumptions. Under non-parametric pooling, the $\gamma$-heterogeneity must often be estimated numerically, but the multiplicative decomposition holds such that $\gamma \geq \alpha$ for $q > 0$. Conversely, under parametric pooling, $\gamma$-heterogeneity can be computed efficiently in closed form, but the $\gamma \geq \alpha$ condition holds reliably only at $q = 1$. Our findings will further contribute to heterogeneity measurement in continuous systems.
Keywords: Heterogeneity, Diversity, Decomposition, Gaussian mixture
1. Introduction
Measurement of heterogeneity is important across many scientific disciplines. Ecologists are interested in the heterogeneity of ecosystems’ biological composition (biodiversity) [1], economists are interested in the heterogeneity of resource ownership (wealth equality) [2], and medical researchers and physicians are interested in the heterogeneity of diseases and their presentations [3]. Using Rényi heterogeneity [3–5], which for categorical random variables corresponds to ecologists’ Hill numbers [6] and economists’ Hannah-Kay indices [7], one can measure a system’s heterogeneity as its effective number of distinct configurations.
The heterogeneity of a mixture or ensemble of systems is often known as $\gamma$-heterogeneity, and is generated by variation occurring within and between constituent subsystems. A good heterogeneity measure will facilitate decomposition of $\gamma$-heterogeneity into $\alpha$ (within subsystem) and $\beta$ (between subsystem) components. Under this decomposition, we require that $\gamma \geq \alpha$, since it is counterintuitive that the heterogeneity of the overall ensemble should be less than any of its constituents, let alone the “average” subsystem [8,9]. Such a decomposition was introduced by Jost [9] for systems represented on discrete event spaces (such as representations of organisms by species labels). However, many data are better modeled by continuous embeddings, including word semantics [10–12], genetic population structure [13], and natural images [14]. Unfortunately, considerably less is understood about how to decompose Rényi heterogeneity in such cases where data are represented on non-categorical spaces [4]. Although there are decomposable functional diversity indices expressed in numbers equivalent, they require categorical partitioning of the data (in order to supply species (dis)similarity matrices) [15–18] and setting sensitivity or threshold parameters for (dis)similarities [16,18]. For many research applications, such as those in psychiatry [3,4,19] or involving unsupervised learning [13,14], we may not have categorical partitions of the observable space that are valid, reliable, and of semantic relevance. If we are to apply Rényi heterogeneity to such continuous-space systems, then we must demonstrate that its multiplicative decomposition of $\gamma$-heterogeneity into $\alpha$ and $\beta$ components is retained.
Therefore, our present work extends the Jost [9] multiplicative decomposition of Rényi heterogeneity to the analysis of continuous systems, and provides conditions under which the $\gamma \geq \alpha$ condition is satisfied. In Section 2, we introduce decomposition of the Rényi heterogeneity in categorical and continuous systems. Specifically, we highlight that the most important decision guiding the availability of a decomposition is how one defines the distribution over the mixture of subsystems. We show that for non-parametrically pooled systems (i.e. finite mixture models, illustrated in Section 3), the $\gamma \geq \alpha$ condition can hold for all values of the Rényi elasticity parameter $q > 0$, but that $\gamma$-heterogeneity will generally require numerical estimation. Section 4 introduces decomposition of Rényi heterogeneity under parametric assumptions on the pooled system’s distribution. In this case, which amounts to a Gaussian mixed-effects model (as commonly implemented in biomedical meta-analyses), we show that $\gamma \geq \alpha$ will hold at $q = 1$, though not necessarily at $q \neq 1$. Finally, in Section 5, we discuss the implications of our findings and scenarios in which parametric or non-parametric pooling assumptions might be particularly useful.
2. Background
2.1. Categorical Rényi Heterogeneity Decomposition
In this section, we consider the definition and decomposition of Rényi heterogeneity for a
composite random variable (or “system”) that we call a discrete mixture (Definition 1).
Definition 1 (Discrete Mixture). A random variable or system $X$ is called a discrete mixture when it is defined on an $n$-dimensional discrete state space $\mathcal{X} = \{1, 2, \ldots, n\}$ with probability distribution $\bar{p} = (\bar{p}_i)_{i=1,2,\ldots,n}$, where $\bar{p}_i$ is the probability that $X$ is observed in state $i \in \mathcal{X}$. Furthermore, let $X$ be an aggregation of $N$ component subsystems $X_1, X_2, \ldots, X_N$ with corresponding probability distributions $P = (p_{ij})_{i=1,2,\ldots,N}^{j=1,2,\ldots,n}$. The proportion of $X$ attributable to each component is governed by the weights $w = (w_i)_{i=1,2,\ldots,N}$, where $0 \leq w_i \leq 1$ and $\sum_{i=1}^{N} w_i = 1$.
Let $X$ be a discrete mixture. The Rényi heterogeneity for the $i$th component is
$$\Pi_q(X_i) = \left( \sum_{j=1}^{n} p_{ij}^q \right)^{\frac{1}{1-q}}, \tag{1}$$
which is the effective number of states in $X_i$. Assuming the pooled distribution over discrete mixture $X$ is a weighted average of subsystem distributions, $\bar{p} = P^\top w$, the $\gamma$-heterogeneity is thus
$$\Pi_q^\gamma(X) = \left( \sum_{i=1}^{n} \bar{p}_i^q \right)^{\frac{1}{1-q}}, \tag{2}$$
which we interpret as the effective number of states in the pooled system $X$.
Jost [9] proposed the following decomposition of $\gamma$-heterogeneity:
$$\Pi_q^\gamma(X) = \Pi_q^\alpha(X) \, \Pi_q^\beta(X), \tag{3}$$
where $\Pi_q^\alpha(X)$ and $\Pi_q^\beta(X)$ are summary measures of heterogeneity due to variation within and between subsystems, respectively. Since the $\gamma$ factor has units of effective number of states in the pooled system, and $\alpha$ has units of effective number of states per component, then
$$\Pi_q^\beta(X) = \frac{\Pi_q^\gamma(X)}{\Pi_q^\alpha(X)} \tag{4}$$
yields the effective number of components in $X$.
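For discrete mixtures, Jost [9] specified the functional form for $\alpha$-heterogeneity as
$$\Pi_q^\alpha(X) = \begin{cases} \left( \sum_{i=1}^{N} \frac{w_i^q}{\sum_{k=1}^{N} w_k^q} \sum_{j=1}^{n} p_{ij}^q \right)^{\frac{1}{1-q}} & q \neq 1 \\ \exp\left\{ -\sum_{i=1}^{N} w_i \sum_{j=1}^{n} p_{ij} \log p_{ij} \right\} & q = 1 \end{cases}, \tag{5}$$
which allows the decomposition in Equation 3 to satisfy the following desiderata: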
1. The $\alpha$ and $\beta$ components are independent [20].
2. The within-group heterogeneity is a lower bound on total heterogeneity [8]: $\Pi_q^\alpha \leq \Pi_q^\gamma$.
3. The $\alpha$-heterogeneity is a form of average heterogeneity over groups.
4. The $\alpha$ and $\beta$ components are both expressed in numbers equivalent.
Specifically, Jost [9] proved that $\Pi_q^\gamma(X) \geq \Pi_q^\alpha(X)$ is guaranteed for all $q \geq 0$ when $w_i = w_j$ for all $i, j \in \{1, 2, \ldots, N\}$, or for unequal weights $w$ if the elasticity is set to the Shannon limit of $q \to 1$.
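As an illustration of Equations 1 to 5, the following is a minimal sketch (not the authors' implementation) of the discrete decomposition using only the Python standard library. The function names `hill_number` and `decompose_discrete` are hypothetical, and the rows of `P` are assumed to be the component distributions $p_{i\cdot}$.

```python
import math


def hill_number(p, q):
    """Effective number of states (Equations 1 and 2) of a probability vector."""
    support = [x for x in p if x > 0.0]
    if q == 1:
        # Shannon limit: exponential of the entropy.
        return math.exp(-sum(x * math.log(x) for x in support))
    return sum(x ** q for x in support) ** (1.0 / (1.0 - q))


def decompose_discrete(P, w, q):
    """Return (gamma, alpha, beta) for component rows P and weights w."""
    n_states = len(P[0])
    # Pooled distribution: weighted average of the component distributions.
    pooled = [sum(wi * row[j] for wi, row in zip(w, P)) for j in range(n_states)]
    gamma = hill_number(pooled, q)
    if q == 1:
        alpha = math.exp(-sum(
            wi * sum(x * math.log(x) for x in row if x > 0.0)
            for wi, row in zip(w, P)))
    else:
        z = sum(wi ** q for wi in w)  # normaliser of the q-th power weights
        inner = sum((wi ** q / z) * sum(x ** q for x in row if x > 0.0)
                    for wi, row in zip(w, P))
        alpha = inner ** (1.0 / (1.0 - q))
    return gamma, alpha, gamma / alpha  # beta from Equation 4
```

For two equally weighted components that are each uniform over two states and share no states, this sketch returns $\gamma = 4$, $\alpha = 2$ and $\beta = 2$, i.e. two effective components.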
2.2. Continuous Rényi Heterogeneity Decomposition
Let $X$ be a non-parametric continuous mixture according to Definition 2. Despite individual mixture components in $X$ potentially having parametric probability density functions, we call this a “non-parametric” mixture because the distribution over pooled components does not assume the form of a known parametric family.
Definition 2 (Non-parametric Continuous Mixture). A non-parametric continuous mixture is a random variable $X$ defined on an $n$-dimensional continuous space $\mathcal{X} \subseteq \mathbb{R}^n$, and composed of subsystems $X_1, X_2, \ldots, X_N$, with respective probability density functions $f(x) = \{f_i(x)\}_{i=1,2,\ldots,N}$ and weights $w = (w_i)_{i=1,2,\ldots,N}$ such that $\sum_{i=1}^{N} w_i = 1$ and $0 \leq w_i \leq 1$. The pooled probability density over $X$ is defined as
$$\bar{f}(x) = \sum_{i=1}^{N} w_i f_i(x). \tag{6}$$
The continuous Rényi heterogeneity for the $i$th subsystem of $X$ is
$$\Pi_q(X_i) = \left( \int_{\mathcal{X}} f_i^q(x) \, dx \right)^{\frac{1}{1-q}}, \tag{7}$$
whose interpretation, which we henceforth call the “effective volume” of the event space or domain of $X_i$, is given by Proposition 1 (see Proposition A3 in Nunes et al. [5] for the proof).
Proposition 1 (Rényi Heterogeneity of a Continuous Random Variable). The Rényi heterogeneity of a continuous random variable $X$ defined on event space $\mathcal{X} \subseteq \mathbb{R}^n$ with probability density function $f$ is equal to the magnitude of the volume of an $n$-cube over which there is a uniform probability density with the same Rényi heterogeneity as that in $X$.
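The “effective volume” reading can be checked directly on the uniform case. The short derivation below is an illustrative sketch rather than part of the cited proof; the symbols $\mathcal{C}$ (an $n$-cube) and $V$ (its volume) are introduced here for illustration only.

```latex
% Uniform density on an n-cube C of volume V: f(x) = 1/V on C, 0 elsewhere.
\begin{align*}
\int_{\mathcal{C}} f^q(x)\,dx &= V \cdot V^{-q} = V^{1-q}, \\
\Pi_q &= \left( V^{1-q} \right)^{\frac{1}{1-q}} = V \qquad (q \neq 1), \\
\lim_{q \to 1} \Pi_q &= \exp\left\{ -\int_{\mathcal{C}} \tfrac{1}{V} \log \tfrac{1}{V}\,dx \right\} = \exp\{\log V\} = V .
\end{align*}
```

Under this sketch, the heterogeneity of a uniform density equals the volume of its support at every order $q$ considered, which is why a general density's heterogeneity can be read as the volume of an equivalent uniform cube.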
Given the pooled distribution as defined in Equation 6, the Rényi heterogeneity over the mixture, which is the $\gamma$-heterogeneity, is
$$\Pi_q^\gamma(X) = \left( \int_{\mathcal{X}} \bar{f}^q(x) \, dx \right)^{\frac{1}{1-q}}. \tag{8}$$
The $\gamma$-heterogeneity is thus the total effective volume of $X$’s domain. The $\alpha$-heterogeneity represents the effective volume per mixture component in $X$, and is computed as follows:
$$\Pi_q^\alpha(X) = \left( \sum_{i=1}^{N} \frac{w_i^q}{\sum_{k=1}^{N} w_k^q} \int_{\mathcal{X}} f_i^q(x) \, dx \right)^{\frac{1}{1-q}}. \tag{9}$$
Given Equations 8 and 9, the following theorem provides conditions under which $\gamma \geq \alpha$ is satisfied for a non-parametric continuous mixture. The proof is analogous to that given by Jost [9] for discrete mixtures, and is detailed in Appendix A.
Theorem 1. If $X$ is a non-parametric continuous mixture (Definition 2), with $\gamma$-heterogeneity specified by Equation 8 and $\alpha$-heterogeneity given by Equation 9, then
$$\Pi_q^\beta(X) = \frac{\Pi_q^\gamma(X)}{\Pi_q^\alpha(X)} \geq 1 \tag{10}$$
under the following conditions:
1. $q = 1$
2. $q > 0$ when weights are equal for all mixture components.
If $\int_{\mathcal{X}} f_i^q(x) \, dx$ is analytically tractable for all $i \in \{1, 2, \ldots, N\}$, then a closed form expression for $\Pi_q^\alpha(X)$ will be available. If $\int_{\mathcal{X}} \bar{f}^q(x) \, dx$ is also analytically tractable, then so too will be $\Pi_q^\beta(X)$. However, this will depend entirely on the functional form of $\bar{f}$, and will rarely be the case using real world data. In the majority of cases, $\int_{\mathcal{X}} \bar{f}^q(x) \, dx$ will have to be computed numerically.
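To make the numerical route concrete, here is a minimal one-dimensional sketch of Equations 8 to 10 for $q > 0$, $q \neq 1$. It assumes a simple midpoint rule on a bounded interval, which is a choice made here for illustration and not a method prescribed by the text; the helper names (`integrate_midpoint`, `decompose_numeric`, `uniform_pdf`) are hypothetical.

```python
import math


def integrate_midpoint(g, lo, hi, cells=5000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / cells
    return h * sum(g(lo + (k + 0.5) * h) for k in range(cells))


def decompose_numeric(densities, weights, q, lo, hi, cells=5000):
    """Return (gamma, alpha, beta) for a 1-D mixture, q > 0 and q != 1."""
    if q <= 0 or q == 1:
        raise ValueError("this sketch covers q > 0 with q != 1 only")

    def pooled(x):
        # Equation 6: weighted average of the component densities.
        return sum(w * f(x) for w, f in zip(weights, densities))

    # Equation 8: gamma from the pooled density.
    gamma = integrate_midpoint(lambda x: pooled(x) ** q, lo, hi, cells) ** (1.0 / (1.0 - q))
    # Equation 9: alpha from the component-wise integrals.
    z = sum(w ** q for w in weights)
    inner = sum((w ** q / z) * integrate_midpoint(lambda x, f=f: f(x) ** q, lo, hi, cells)
                for w, f in zip(weights, densities))
    alpha = inner ** (1.0 / (1.0 - q))
    return gamma, alpha, gamma / alpha  # Equation 10


def uniform_pdf(a, b):
    """Density of a uniform distribution on [a, b), used only as a test case."""
    return lambda x: 1.0 / (b - a) if a <= x < b else 0.0
```

With two equally weighted uniform components on the disjoint intervals $[0, 1)$ and $[2, 3)$, the sketch recovers a total effective volume of 2, an effective volume per component of 1, and therefore two effective components. In higher dimensions a grid of this kind grows exponentially, which is the cost the text refers to.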
3. Rényi Heterogeneity Decomposition under a Non-parametric Pooling Distribution
Definition 3 defines a general Gaussian mixture $X$ as a weighted combination of component Gaussian random variables, without identifying the functional form of the composition. The non-parametric Gaussian mixture, where the distribution over $X$ is a simple model average over its Gaussian components, is specified in Definition 4.
Definition 3 (Gaussian Mixture). The $n$-dimensional Gaussian mixture $X$ is a weighted combination of the set of $n$-dimensional Gaussian random variables $\{X_i\}_{i=1,2,\ldots,N}$ with component weights $w = (w_i)_{i=1,2,\ldots,N}$ such that $0 \leq w_i \leq 1$ and $\sum_{i=1}^{N} w_i = 1$. The probability density function of component $X_i$ is denoted $\mathcal{N}(x \mid \mu_i, \Sigma_i)$, and is parameterized by an $n \times 1$ mean vector $\mu_i$ and $n \times n$ covariance matrix $\Sigma_i$.
Definition 4 (Non-parametric Gaussian Mixture). We define the random variable $X$ as a non-parametric Gaussian mixture if it is a Gaussian mixture (Definition 3) whose probability density function is defined as
$$\bar{f}(x \mid \mu_{1:N}, \Sigma_{1:N}, w) = \sum_{i=1}^{N} w_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i), \tag{11}$$
where $\mu_{1:N}$ and $\Sigma_{1:N}$ denote the set of component mean vectors $\mu_1, \ldots, \mu_N$ and covariance matrices $\Sigma_1, \ldots, \Sigma_N$, respectively.
We now introduce the Rényi heterogeneity of a single $n$-dimensional Gaussian random variable (Proposition 2) and subsequently characterize the $\gamma$-, $\alpha$-, and $\beta$-heterogeneity values for a non-parametric Gaussian mixture.
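Proposition 2 (Rényi Heterogeneity of a Multivariate Gaussian). The Rényi heterogeneity of an $n$-dimensional Gaussian random variable $X$ with mean $\mu$ and covariance matrix $\Sigma$ is
$$\Pi_q(X) = \begin{cases} \text{Undefined} & q = 0 \\ (2\pi e)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}} & q = 1 \\ (2\pi)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}} & q = \infty \\ (2\pi)^{\frac{n}{2}} q^{\frac{n}{2(q-1)}} |\Sigma|^{\frac{1}{2}} & q \notin \{0, 1, \infty\} \end{cases}. \tag{12}$$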
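Equation 12 depends on the covariance only through its determinant, so it can be sketched in a few lines. The function below is an illustrative helper (the name `gaussian_heterogeneity` and its signature are assumptions made here), taking $|\Sigma|$, the dimension $n$ and the order $q$.

```python
import math


def gaussian_heterogeneity(det_sigma, n, q):
    """Closed-form Renyi heterogeneity of an n-dimensional Gaussian (Equation 12)."""
    if q == 0:
        raise ValueError("undefined at q = 0")
    base = (2.0 * math.pi) ** (n / 2.0) * math.sqrt(det_sigma)
    if q == 1:
        return math.exp(n / 2.0) * base      # (2*pi*e)^(n/2) |Sigma|^(1/2)
    if math.isinf(q):
        return base                          # (2*pi)^(n/2) |Sigma|^(1/2)
    return q ** (n / (2.0 * (q - 1.0))) * base
```

For a univariate Gaussian with $\sigma = 0.5$ (so $|\Sigma| = 0.25$), this gives roughly 2.07 at $q = 1$, and the value at $q$ slightly above or below 1 approaches the $q = 1$ case, as expected from the limit in the proposition.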
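The proof of Proposition 2 is included in Appendix A. Unfortunately, a closed form solution such as Equation 12 cannot be obtained for the $\gamma$-heterogeneity of a non-parametric Gaussian mixture,
$$\Pi_q^\gamma(X) = \left( \int_{\mathcal{X}} \left( \sum_{i=1}^{N} w_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i) \right)^q dx \right)^{\frac{1}{1-q}}, \tag{13}$$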
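which must be computed numerically to yield the effective size of the mixture’s domain. This process may be computationally expensive, particularly in high dimensions. Conversely, Equation 9, which yields the effective size of the domain per mixture component, can be evaluated in closed form for a Gaussian mixture:
$$\Pi_q^\alpha(X) = \begin{cases} \text{Undefined} & q = 0 \\ \exp\left\{ \frac{1}{2} \left( n + \sum_{i=1}^{N} w_i \log |2\pi\Sigma_i| \right) \right\} & q = 1 \\ 0 & q = \infty \\ (2\pi)^{\frac{n}{2}} \left( \sum_{i=1}^{N} \frac{w_i^q}{\sum_{j=1}^{N} w_j^q} |\Sigma_i|^{\frac{1-q}{2}} q^{-\frac{n}{2}} \right)^{\frac{1}{1-q}} & q \notin \{0, 1, \infty\} \end{cases}. \tag{14}$$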
The $\beta$-heterogeneity, which returns the effective number of components in the mixture, can then be computed using Equation 4. Example 1 demonstrates an important property of considering $X$ as a non-parametric Gaussian mixture: that low-probability regions of the domain between well-separated components will have little to no effect on the $\gamma$- or $\beta$-heterogeneity estimates.
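The closed form in Equation 14 can be sketched as follows for finite $q > 0$. This is an illustrative helper, not the authors' code: the name `alpha_gaussian` is hypothetical, and the components are passed as determinants $|\Sigma_i|$, using $|2\pi\Sigma_i| = (2\pi)^n |\Sigma_i|$ for the $q = 1$ branch.

```python
import math


def alpha_gaussian(dets, weights, n, q):
    """Closed-form alpha-heterogeneity of a Gaussian mixture (Equation 14).

    dets    : determinants |Sigma_i| of the component covariance matrices
    weights : mixture weights w_i (non-negative, summing to 1)
    n       : dimension of the event space
    q       : finite elasticity with q > 0
    """
    if q <= 0 or math.isinf(q):
        raise ValueError("this sketch covers finite q > 0 only")
    if q == 1:
        log_terms = sum(w * (n * math.log(2.0 * math.pi) + math.log(d))
                        for w, d in zip(weights, dets))
        return math.exp(0.5 * (n + log_terms))
    z = sum(w ** q for w in weights)
    inner = sum((w ** q / z) * d ** ((1.0 - q) / 2.0) * q ** (-n / 2.0)
                for w, d in zip(weights, dets))
    return (2.0 * math.pi) ** (n / 2.0) * inner ** (1.0 / (1.0 - q))
```

When all components share the same covariance, the weights cancel and the result reduces to the single-Gaussian value of Equation 12, which is a convenient sanity check on an implementation.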
Example 1 (Decomposition of Rényi heterogeneity in a univariate Gaussian mixture). Consider three non-parametric Gaussian mixtures $X^{(1)}, X^{(2)}, X^{(3)}$ defined on $\mathbb{R}$ whose numbers of components are respectively $N_1 = 2$, $N_2 = 3$, and $N_3 = 4$. Components in each mixture are equally weighted (that is, the components of mixture $X^{(j)}$ have weights $w_i^{(j)} = 1/N_j$ for all $i \in \{1, 2, \ldots, N_j\}$) and have equal standard deviation $\sigma = 0.5$. At $q = 1$, this yields a per-component Rényi heterogeneity of approximately 2.07, which is consequently also the $\alpha$-heterogeneity for each Gaussian mixture.
Figure 1 demonstrates the multiplicative decomposition of Rényi heterogeneity (at $q = 1$) in these Gaussian mixtures, where $\gamma$-heterogeneity was computed numerically, across varying separations of the respective mixtures’ component means. Note that the $\beta$-heterogeneity in this case represents the effective number of distinct components in the mixture distribution, and is bounded between 1 (when all components overlap) and $N_j$ (when all components are well separated). Further separating the mixture components beyond the point at which $\beta$-heterogeneity reaches $N_j$ yielded no further increase in $\beta$-heterogeneity.
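The behaviour described in Example 1 can be reproduced approximately with a short numerical sketch at $q = 1$. Two assumptions are made here that the example does not state: the component means are taken to be evenly spaced across the total separation, and the integral is approximated by a midpoint rule on an interval extending ten standard deviations beyond the outermost means. The function names are hypothetical.

```python
import math

SIGMA = 0.5  # common component standard deviation from Example 1


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def beta_q1(n_components, spread, cells=20000):
    """beta-heterogeneity at q = 1 for equally weighted, evenly spaced components.

    spread is the distance between the largest and smallest component mean.
    """
    if n_components == 1 or spread == 0.0:
        mus = [0.0] * n_components
    else:
        step = spread / (n_components - 1)
        mus = [-spread / 2.0 + k * step for k in range(n_components)]
    lo, hi = -spread / 2.0 - 10.0 * SIGMA, spread / 2.0 + 10.0 * SIGMA
    h = (hi - lo) / cells
    entropy = 0.0
    for k in range(cells):
        x = lo + (k + 0.5) * h
        f = sum(normal_pdf(x, m, SIGMA) for m in mus) / n_components
        if f > 0.0:  # skip points where the density underflows to zero
            entropy -= f * math.log(f) * h
    gamma = math.exp(entropy)                           # Equation 13 in the limit q -> 1
    alpha = math.sqrt(2.0 * math.pi * math.e) * SIGMA   # Equation 12 with n = 1, q = 1
    return gamma / alpha
```

Under these assumptions, fully overlapping components give a $\beta$-heterogeneity of about 1, and components separated by many standard deviations give a value close to the number of components, with no further growth as the separation increases.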
Figure 1. Demonstration of the multiplicative decomposition of Rényi heterogeneity in Gaussian mixture models, where $\gamma$-heterogeneity is computed using numerical integration. Each row represents a different number of mixture components (from top to bottom: 2, 3, and 4 univariate Gaussians with $\sigma = 0.5$, respectively). Each column shows a case in which the component locations are progressively further separated ($\max_i \mu_i - \min_i \mu_i$ distance from left to right: 0, 2, 4, 6). The $\alpha$-heterogeneity in all scenarios was 2.07. The headings on each panel show the resulting $\gamma$- and $\beta$-heterogeneity values.
Assuming sufficiently accurate approximation of the integral in Equation 13, the $\gamma$-heterogeneity in Example 1 appears to reach a limit corresponding to the sum of effective domain sizes under all mixture components, and the $\beta$-heterogeneity reaches a limit corresponding to the number of individual mixture components.
Unfortunately, computation of $\beta$-heterogeneity in a non-parametric Gaussian mixture will yield results whose accuracy will depend on the error of numerical integration, and which may consume significant computational resources when evaluated for large $N$ (many components) and large $n$ (high dimension). Although the non-parametric pooling approach may be the only available method for many distribution classes, a computationally efficient parametric pooling approach exists for Gaussian mixtures, to which we now turn our attention.
4. Rényi Heterogeneity Decomposition Under a Parametric Pooling Distribution
This section introduces the parametric Gaussian mixture (Definition 5), and subsequently
provides conditions under which decomposition of its heterogeneity satisfies the requirement that
α-heterogeneity be a lower bound on γ-heterogeneity (Theorem 2).
Definition 5 (Parametric Gaussian Mixture). We define the random variable $X$ as an $n$-dimensional parametric Gaussian mixture if it is a Gaussian mixture (Definition 3) whose probability density function is defined as
$$\bar{f}(x \mid \mu, \Sigma) = \mathcal{N}(x \mid \mu, \Sigma), \tag{15}$$
with pooled mean vector
$$\mu = \sum_{i=1}^{N} w_i \mu_i, \tag{16}$$
and pooled covariance matrix
$$\Sigma = -\mu\mu^\top + \sum_{i=1}^{N} w_i \left( \Sigma_i + \mu_i \mu_i^\top \right). \tag{17}$$
The efficiency of assuming a parametric, rather than non-parametric, Gaussian mixture is that $\gamma$-heterogeneity for the former may be computed in closed form using Equation 12 (it is simply a function of Equation 17). However, the critical difference between the parametric and non-parametric Gaussian mixture assumptions is that, under the parametric assumption, $\gamma$-heterogeneity (and therefore $\beta$-heterogeneity) will depend on the component means $\mu_{1:N}$, according to the following Lemma.
Lemma 1 (Relationship of $\gamma$-Heterogeneity to Component Dispersion). Let $X$ and $X'$ be $N$-component parametric Gaussian mixtures on $\mathbb{R}^n$ with component-wise mean vectors $\mu_{1:N} = \{\mu_i\}_{i=1,2,\ldots,N}$ and $\mu'_{1:N} = \{c\mu_i\}_{i=1,2,\ldots,N}$, where $c \geq 1$ is a scaling factor. The component-wise weights $w$ and covariance matrices $\Sigma_{1:N} = \{\Sigma_i\}_{i=1,2,\ldots,N}$ are identical between $X$ and $X'$. Finally, let $\Sigma$ and $\Sigma'$ be the pooled covariance matrices for $X$ and $X'$, respectively. Then, for all $c \geq 1$, we have that
$$\Pi_q^\gamma(X') \geq \Pi_q^\gamma(X), \tag{18}$$
with equality if $c = 1$.
Lemma 1, whose proof is detailed in Appendix A, implies that the resulting $\beta$-heterogeneity of a parametric Gaussian mixture will increase as the mixture component means are spread further apart. This follows from the fact that Equation 14, which is computed component-wise, remains a valid expression of the $\alpha$-heterogeneity in a parametric Gaussian mixture.
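The pooled moments of Definition 5 and the monotonicity in Lemma 1 can be illustrated in the univariate case, where covariance matrices reduce to variances. The following is a minimal sketch under that simplifying assumption; `pooled_moments` and `gamma_parametric` are hypothetical names introduced here.

```python
import math


def pooled_moments(means, variances, weights):
    """Pooled mean and variance of a univariate parametric Gaussian mixture
    (Equations 16 and 17 with n = 1)."""
    mu = sum(w * m for w, m in zip(weights, means))
    var = -mu * mu + sum(w * (v + m * m) for w, v, m in zip(weights, variances, means))
    return mu, var


def gamma_parametric(means, variances, weights, q=1.0):
    """gamma-heterogeneity from Equation 12 applied to the pooled variance."""
    _, var = pooled_moments(means, variances, weights)
    base = math.sqrt(2.0 * math.pi * var)
    if q == 1:
        return math.sqrt(math.e) * base
    return q ** (1.0 / (2.0 * (q - 1.0))) * base


def scaled_gammas(means, variances, weights, factors, q=1.0):
    """gamma-heterogeneity after multiplying every component mean by each factor c."""
    return [gamma_parametric([c * m for m in means], variances, weights, q)
            for c in factors]
```

For example, two equally weighted components with means $\pm 1$ and variance 0.25 have pooled variance 1.25, and scaling the means by $c = 1, 2, 4$ yields a non-decreasing sequence of $\gamma$-heterogeneity values while the $\alpha$-heterogeneity is unchanged.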
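Before stating the conditions under which $\alpha$ is a lower bound on $\gamma$ for a parametric Gaussian mixture (Theorem 2), we introduce the following Lemma, whose proof is left to Appendix A.
Lemma 2. If $\{\Sigma_i\}_{i=1,2,\ldots,N}$ is a set of $N \in \mathbb{N}_{\geq 2}$ positive semidefinite $n \times n$ matrices with corresponding weights $w = (w_i)_{i=1,2,\ldots,N}$ such that $0 \leq w_i \leq 1$ and $\sum_{i=1}^{N} w_i = 1$, then
$$\left| \sum_{i=1}^{N} w_i \Sigma_i \right|^{\frac{1}{2}} \geq \prod_{i=1}^{N} |\Sigma_i|^{\frac{w_i}{2}}. \tag{19}$$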
Theorem 2. The Rényi $\beta$-heterogeneity of order $q = 1$ of a parametric Gaussian mixture $X$ (Definition 5) has a lower bound of 1:
$$\Pi_1^\beta(X) = \frac{\Pi_1^\gamma(X)}{\Pi_1^\alpha(X)} \geq 1. \tag{20}$$
Proof. Recall that $\Pi_q^\alpha(X)$ is independent of the mean vectors of components in $X$ (Equation 14). Furthermore, it follows from Lemma 1 that if $\mu_{1:N} = \{0\}_{i=1,2,\ldots,N}$, where $0$ is an $n \times 1$ zero vector, then for any parametric Gaussian mixture $X'$ with means $\mu'_{1:N}$ we will have $\Pi_q^\gamma(X') \geq \Pi_q^\gamma(X)$, where equality is obtained if $\mu'_{1:N}$ are also zero vectors, or the covariance of mean vectors in $X'$,
$$\mathrm{Cov}[\mu'] = \mathbb{E}[\mu' \mu'^\top] - \mathbb{E}[\mu'] \, \mathbb{E}[\mu']^\top, \tag{21}$$
is otherwise singular. Thus, it suffices to prove our theorem under the assumption that $\mu_{1:N} = \{0\}_{i=1,2,\ldots,N}$, where the pooled covariance of $X$ is redefined as
$$\Sigma = \sum_{i=1}^{N} w_i \Sigma_i. \tag{22}$$
The expression for $\Pi_1^\gamma(X) / \Pi_1^\alpha(X)$ is
$$\frac{(2\pi e)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}}}{\exp\left\{ \frac{1}{2} \left( n + \sum_{i=1}^{N} w_i \log |2\pi\Sigma_i| \right) \right\}}, \tag{23}$$
which after simplification,
$$\frac{|\Sigma|^{\frac{1}{2}}}{\prod_{i=1}^{N} |\Sigma_i|^{\frac{w_i}{2}}}, \tag{24}$$
can be appreciated to satisfy Lemma 2.
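The simplification from Equation 23 to Equation 24 can be spelled out as follows. This is an added worked step, using only $|2\pi\Sigma_i| = (2\pi)^n |\Sigma_i|$ for $n \times n$ matrices and $\sum_i w_i = 1$.

```latex
\begin{align*}
\exp\left\{ \tfrac{1}{2}\left( n + \sum_{i=1}^{N} w_i \log |2\pi\Sigma_i| \right) \right\}
  &= e^{\frac{n}{2}} \prod_{i=1}^{N} |2\pi\Sigma_i|^{\frac{w_i}{2}} \\
  &= e^{\frac{n}{2}} \prod_{i=1}^{N} (2\pi)^{\frac{n w_i}{2}} |\Sigma_i|^{\frac{w_i}{2}} \\
  &= (2\pi e)^{\frac{n}{2}} \prod_{i=1}^{N} |\Sigma_i|^{\frac{w_i}{2}},
\end{align*}
% so the factor (2 pi e)^{n/2} cancels in the ratio:
\begin{equation*}
\frac{\Pi_1^\gamma(X)}{\Pi_1^\alpha(X)}
  = \frac{|\Sigma|^{\frac{1}{2}}}{\prod_{i=1}^{N} |\Sigma_i|^{\frac{w_i}{2}}} .
\end{equation*}
```

In the univariate case this ratio is the square root of a weighted arithmetic mean of the variances divided by the square root of their weighted geometric mean, so the bound of 1 reduces to the weighted AM-GM inequality.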
Although Theorem 2 highlights the reliability and flexibility of using elasticity $q = 1$, we must emphasize that $q = 1$ may not be the only condition under which $\Pi_q^\gamma(X) \geq \Pi_q^\alpha(X)$, as suggested by Example 2. Indeed, Example 2 suggests that the integrity of this bound on $\beta$-heterogeneity at elasticity values $q \neq 1$ may depend in various ways on the unique combination of component-wise parameters in a parametric Gaussian mixture.
Example 2 (Decomposition of Rényi Heterogeneity in a Parametric Gaussian Mixture). Consider a parametric Gaussian mixture $X$ with four components defined on $\mathbb{R}$ (for instance, Figure 2A). The components’ respective standard deviations are $\sigma = (0.5, 0.8, 1.1, 1.6)$. We vary the column vector of mixture component weights $w = (w_i)_{i=1,\ldots,4}$ according to the following function,
$$w(a) = \begin{cases} (1, 0, 0, 0)^\top & a = 0 \\ (0.25, 0.25, 0.25, 0.25)^\top & a = 1 \\ (0, 0, 0, 1)^\top & a = \infty \\ \left( \frac{a^{1/3} - 1}{a^{4/3} - 1} \, a^{\frac{i-1}{3}} \right)_{i=1,\ldots,4} & a \notin \{0, 1, \infty\} \end{cases} \tag{25}$$
which “skews” the distribution of weights over components in $X$ according to the value of a skew parameter $a \geq 0$ (shown in Figure 2B). As the parameter $a$ decreases further below 1, components $X_1$ and $X_2$ (which have the narrowest distributions) become preferentially weighted. Conversely, as $a$ increases above 1, components $X_3$ and $X_4$ are preferentially weighted. At $a = 1$, all components are equally weighted (depicted as the dashed black lines in Figure 2B-F).
Figure 2. Graphical counterexample showing that $\alpha$-heterogeneity is not always a lower bound on $\gamma$-heterogeneity when $q \neq 1$ for a parametric Gaussian mixture. Panel A: Four univariate Gaussian components used in the mixture distribution evaluated. Panel B: Mixture component weights. Each colored line (see bottom right of Figure for legend) represents a different distribution of weights on the mixture components, such that in some settings, the most narrow components are weighted highest, and vice versa. Panel C: $\gamma$-heterogeneity as computed by pooling the mixture components from Panel A according to Equation 15, for each weighting scheme at $q \neq 1$. Panel D: The $\alpha$-heterogeneity for each weighting scheme at $q \neq 1$. Panel E: The $\beta$-heterogeneity across each weighting scheme at $q \neq 1$. Panel F: The $\beta$-heterogeneity across various weighting schemes (plotted on the x-axis in log scale) at $q = 1$. The vertical coloured lines correspond to the values of $\Pi_1^\beta(X)$ across the weighting schemes W1:7 shown in the legend of Panel C.
Figures 2C-E plot the $\gamma$-, $\alpha$-, and $\beta$-heterogeneity for the parametric Gaussian mixture at $q \neq 1$, respectively, while Figure 2F computes the $\beta$-heterogeneity at $q = 1$ for variously skewed weight distributions. When the skew parameter results in a distribution of weights whose ranking of components agrees with the rank order of component distribution widths ($\sigma$), then $\beta$-heterogeneity appears to exceed 1 for $q > 1$. However, when the component weights and distribution widths are anti-correlated (in terms of rank order), then we observe values of $\beta$-heterogeneity below 1 at values of $q > 1$, as well as for some values of $q < 1$.
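A small sketch can show that violations of $\beta \geq 1$ are possible away from $q = 1$ under parametric pooling. The component means used for Figure 2 are not given in the text, so the sketch below assumes all four means are equal (zero) unless others are supplied; which weightings violate the bound depends on that choice, so its output should not be read as a reproduction of the figure. The weights are computed as $a^{(i-1)/3}$ normalised to sum to one, which matches the general case of Equation 25 for finite $a > 0$. Function names are hypothetical.

```python
import math

SIGMAS = (0.5, 0.8, 1.1, 1.6)  # component standard deviations from Example 2


def skewed_weights(a):
    """Weights proportional to a**((i - 1) / 3) for i = 1..4, with a > 0 finite."""
    raw = [a ** ((i - 1) / 3.0) for i in range(1, 5)]
    total = sum(raw)
    return [r / total for r in raw]


def beta_parametric(a, q, means=(0.0, 0.0, 0.0, 0.0)):
    """beta-heterogeneity of the univariate parametric Gaussian mixture.

    The default means are an assumption made for this sketch only.
    """
    w = skewed_weights(a)
    mu = sum(wi * m for wi, m in zip(w, means))
    var = -mu * mu + sum(wi * (s * s + m * m) for wi, s, m in zip(w, SIGMAS, means))
    if q == 1:
        gamma = math.sqrt(2.0 * math.pi * math.e * var)
        alpha = math.exp(0.5 * (1.0 + sum(wi * math.log(2.0 * math.pi * s * s)
                                          for wi, s in zip(w, SIGMAS))))
    else:
        gamma = q ** (1.0 / (2.0 * (q - 1.0))) * math.sqrt(2.0 * math.pi * var)
        z = sum(wi ** q for wi in w)
        inner = sum((wi ** q / z) * (s * s) ** ((1.0 - q) / 2.0) * q ** -0.5
                    for wi, s in zip(w, SIGMAS))
        alpha = math.sqrt(2.0 * math.pi) * inner ** (1.0 / (1.0 - q))
    return gamma / alpha
```

With the assumed equal means, `beta_parametric(100.0, 2.0)` falls below 1 (roughly 0.96), whereas at $q = 1$ the value stays at or above 1 for every skew parameter tried, consistent with Theorem 2. Spreading the assumed means apart increases $\gamma$ and can restore $\beta \geq 1$.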
5. Discussion
This paper provided approaches for multiplicative decomposition of heterogeneity in continuous mixture distributions, thereby extending the earlier work on discrete space heterogeneity decomposition presented by Jost [9]. Two approaches were offered, dependent upon whether the distribution over the pooled system is defined either parametrically or non-parametrically. Our results improve the understanding of heterogeneity measurement in non-categorical systems by providing conditions under which decomposition of heterogeneity into $\alpha$ and $\beta$ components conforms to the intuitive property that $\gamma \geq \alpha$.
If one defines the pooled mixture non-parametrically, as in a finite mixture model, heterogeneity is decomposable such that $\gamma \geq \alpha$ for all $q > 0$ (if component weights are uniform, or at $q = 1$ otherwise), and $\beta$ may be interpreted as the effective number of distinct mixture components (Sections 2.2 & 3). This has the advantage of conforming with the original discrete decomposition by Jost [9], insofar as probability mass in the mixture is recorded only where it is observed in the data, and not elsewhere, as would be assumed under a parametric model of the pooled system. Consequently, one achieves a more precise estimate of the size of the pooled system’s base of support. The primary limitation arises from the need to numerically integrate the $\gamma$-heterogeneity, which can become prohibitively expensive in higher dimensions. Future work should investigate the error bounds on numerically integrated $\gamma$.
A more computationally efficient approach for decomposition of continuous Rényi heterogeneity
is to assume that the pooled mixture has an overall parametric distribution. A common application for
which this assumption is generally made is in mixed-effects meta-analysis [
21
]. An important departure
from the non-parametric pooling approach of finite mixture models is that non-trivial probability mass
may now be assigned to regions not covered by any of the constituent component distributions. From
another perspective, one may appreciate that the non-parametric approach to pooling is insensitive to
the distance between component distributions, and rather only measures the effective volume of event
space to which component distributions assign probability. Conversely, assuming a parametric
distribution over the mixture (in the case of Section 4, a Gaussian) incorporates the distance between
the component distributions into the calculation of $\gamma$-heterogeneity. This would be appropriate in
scenarios where one assumes that the observed components undersample the true distribution on
the pooled system. For example, in the case of mixed-effects meta-analysis, the available research
studies for inclusion may differ significantly in terms of their means, but one might assume that there
is a significant probability of a new study yielding an effect somewhere in between. Specifying a
parametric distribution over the pooled system would capture this assumption.
One limitation of the present study is the use of a Gaussian model for the pooled system
distribution. This was chosen on account of (A) its prevalence in the scientific literature and (B)
analytical tractability. Future work should expand these results to other distributions. Notwithstanding,
we have demonstrated the decomposition of $\gamma$ Rényi heterogeneity into its $\alpha$ and $\beta$ components for
continuous systems. There are (broadly) two approaches, based on whether parametric assumptions
are made about the pooled system distribution. Under these assumptions applied to Gaussian mixture
distributions, we provided conditions under which the criterion that $\gamma \geq \alpha$ is satisfied. Future
studies should evaluate this method as an alternative approach for the measurement of meta-analytic
heterogeneity, and expand these results to other parametric distributions over the pooled system.
Author Contributions:
Conceptualization, A.N.; methodology, A.N.; validation, A.N.; formal analysis, A.N.;
investigation, A.N.; writing–original draft preparation, A.N.; writing–review and editing, M.A. and T.T.;
visualization, A.N.; supervision, M.A. and T.T.
Funding: This research received no external funding.
Conflicts of Interest: The authors declare no conflict of interest.
Appendix A. Proofs
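Proof of Theorem 1. Following Jost [9] (proof 2), in the limit $q \to 1$, one obtains the following inequality
$$-\sum_{i=1}^{N} w_i \int_{\mathcal{X}} f_i(x) \log f_i(x) \, dx \leq -\int_{\mathcal{X}} \bar{f}(x) \log \bar{f}(x) \, dx, \tag{A1}$$
whereas when $w_i = w_j$ for all $i, j \in \{1, 2, \ldots, N\}$, for $q > 1$ we have
$$\frac{1}{N} \sum_{i=1}^{N} \int_{\mathcal{X}} f_i^q(x) \, dx \geq \int_{\mathcal{X}} \left( \frac{1}{N} \sum_{i=1}^{N} f_i(x) \right)^q dx, \tag{A2}$$
and for $0 < q < 1$ we have
$$\frac{1}{N} \sum_{i=1}^{N} \int_{\mathcal{X}} f_i^q(x) \, dx \leq \int_{\mathcal{X}} \left( \frac{1}{N} \sum_{i=1}^{N} f_i(x) \right)^q dx, \tag{A3}$$
all of which hold by Jensen’s inequality.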
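Proof of Proposition 2. We must solve the following integral:
$$\Pi_q(X) = \left( (2\pi)^{-\frac{qn}{2}} |\Sigma|^{-\frac{q}{2}} \int_{\mathbb{R}^n} e^{-\frac{q}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu)} \, dx \right)^{\frac{1}{1-q}}. \tag{A4}$$
The eigendecomposition of the inverse of the covariance matrix $\Sigma^{-1}$ into an orthonormal matrix of eigenvectors $U$ and an $n \times n$ diagonal matrix of eigenvalues $\Lambda = (\delta_{ij} \lambda_i)_{i=1,2,\ldots,n}^{j=1,2,\ldots,n}$, where $\delta_{ij}$ is Kronecker’s delta, facilitates the substitution $y = U^{-1}(x - \mu)$ required for Gaussian integration, by which we obtain the following solution for $q \notin \{0, 1, \infty\}$:
$$\Pi_q(X) = q^{\frac{n}{2(q-1)}} (2\pi)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}}. \tag{A5}$$
L’Hôpital’s rule facilitates computation of the limit as $q \to 1$:
\begin{align}
\lim_{q \to 1} \log \Pi_q(X) &= \lim_{q \to 1} \frac{n}{2(q-1)} \log q + \frac{n}{2} \log(2\pi) + \frac{1}{2} \log |\Sigma| \nonumber \\
&= \frac{n}{2} + \frac{n}{2} \log(2\pi) + \frac{1}{2} \log |\Sigma|, \tag{A6}
\end{align}
giving the perplexity,
$$\Pi_1(X) = (2\pi e)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}}. \tag{A7}$$
By the same procedure, we can compute the limit as $q \to \infty$,
$$\Pi_\infty(X) = (2\pi)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}}, \tag{A8}$$
as well as show that $\Pi_0(X)$ is undefined.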
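Proof of Lemma 1. For all $q > 0$, proving $\Pi_q^\gamma(X') \geq \Pi_q^\gamma(X)$ amounts to proving $|\Sigma'|^{\frac{1}{2}} \geq |\Sigma|^{\frac{1}{2}}$. To this end, we have
\begin{align}
\Sigma' &= \sum_{i=1}^{N} w_i \Sigma_i + \sum_{i=1}^{N} w_i (c\mu_i)(c\mu_i)^\top - \left( \sum_{i=1}^{N} w_i c\mu_i \right) \left( \sum_{i=1}^{N} w_i c\mu_i \right)^\top \tag{A9} \\
&= \sum_{i=1}^{N} w_i \Sigma_i + c^2 \left( \sum_{i=1}^{N} w_i \mu_i \mu_i^\top - \mu\mu^\top \right) \tag{A10} \\
&= \hat{\Sigma} + c^2 C[\mu] \tag{A11}
\end{align}
and
$$\Sigma = \hat{\Sigma} + C[\mu], \tag{A12}$$
where we denoted $\hat{\Sigma} = \sum_{i=1}^{N} w_i \Sigma_i$ and $C[\mu] = \sum_{i=1}^{N} w_i \mu_i \mu_i^\top - \mu\mu^\top$ for notational parsimony. Clearly, when $c = 1$, we have $\Sigma' = \Sigma$.
By the Minkowski determinant inequality, we have that
\begin{align}
|\Sigma'|^{\frac{1}{2}} &\geq |\hat{\Sigma}|^{\frac{1}{2}} + c^{n} |C[\mu]|^{\frac{1}{2}} \tag{A13} \\
|\Sigma|^{\frac{1}{2}} &\geq |\hat{\Sigma}|^{\frac{1}{2}} + |C[\mu]|^{\frac{1}{2}}, \tag{A14}
\end{align}
which, since $c \geq 1$, implies the first line is greater than or equal to the second. Subtracting the second line from the first and simplifying yields
$$|\Sigma'|^{\frac{1}{2}} - |\Sigma|^{\frac{1}{2}} \geq |C[\mu]|^{\frac{1}{2}} \left( c^{n} - 1 \right). \tag{A15}$$
At $c = 1$ Equation A15 reduces to an equality, and since $c \geq 1$ and $n \geq 1$, Equation A15 establishes that $|\Sigma'|^{\frac{1}{2}} \geq |\Sigma|^{\frac{1}{2}}$.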
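Proof of Lemma 2. Since $\Sigma_{1:N}$ are positive semidefinite matrices, then for all $x \in \mathbb{R}^n$, we have that $-\frac{1}{2} x^\top (w_i \Sigma_i) x \leq 0$, and thus $-\frac{1}{2} x^\top \left( \sum_{i=1}^{N} w_i \Sigma_i \right) x \leq 0$. By exponentiating the quadratic term, we have
$$e^{-\frac{1}{2} x^\top \left( \sum_{i=1}^{N} w_i \Sigma_i \right) x} = \prod_{i=1}^{N} \left( e^{-\frac{1}{2} x^\top \Sigma_i x} \right)^{w_i}. \tag{A16}$$
We obtain the following expressions by applying Gaussian integration to the left hand side,
$$\int_{\mathbb{R}^n} e^{-\frac{1}{2} x^\top \left( \sum_{i=1}^{N} w_i \Sigma_i \right) x} \, dx = (2\pi)^{\frac{n}{2}} \left| \sum_{i=1}^{N} w_i \Sigma_i \right|^{-\frac{1}{2}}, \tag{A17}$$
as well as to a bound on the right hand side obtained by Hölder’s inequality,
\begin{align}
\int_{\mathbb{R}^n} \prod_{i=1}^{N} \left( e^{-\frac{1}{2} x^\top \Sigma_i x} \right)^{w_i} dx &\leq \prod_{i=1}^{N} \left( \int_{\mathbb{R}^n} e^{-\frac{1}{2} x^\top \Sigma_i x} \, dx \right)^{w_i} \tag{A18} \\
&= (2\pi)^{\frac{n}{2}} \left( \prod_{i=1}^{N} |\Sigma_i|^{-\frac{w_i}{2}} \right). \tag{A19}
\end{align}
Substituting Equations A17 and A19 into Equation A16 and simplifying terms yields
$$\left| \sum_{i=1}^{N} w_i \Sigma_i \right|^{\frac{1}{2}} \geq \prod_{i=1}^{N} |\Sigma_i|^{\frac{w_i}{2}}. \tag{A20}$$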
References
1. Hooper, D.; Chapin, F.; Ewel, J.; Hector, A.; Inchausti, P.; Lavorel, S.; Lawton, J.; Lodge, D.; Loreau, M.; Naeem, S.; Schmid, B.; Setälä, H.; Symstad, A.; Vandermeer, J.; Wardle, D. Effects of biodiversity on ecosystem functioning: A consensus of current knowledge. Ecological Monographs 2005, 75, 3–35, [arXiv:1011.1669v3]. doi:10.1890/04-0922.
2. Cowell, F. Measuring Inequality, 2nd ed.; Oxford University Press: Oxford, UK, 2011.
3. Nunes, A.; Trappenberg, T.; Alda, M. We need an operational framework for heterogeneity in psychiatric research. Journal of Psychiatry and Neuroscience 2020, 45, 3–6. doi:10.1503.jpn/190198.
4. Nunes, A.; Trappenberg, T.; Alda, M. The Definition and Measurement of Heterogeneity. PsyArXiv 2020. doi:10.31234/osf.io/3hykf.
5. Nunes, A.; Alda, M.; Bardouille, T.; Trappenberg, T. Representational Rényi heterogeneity. Entropy 2020, 22(4).
6. Hill, M. Diversity and Evenness: A Unifying Notation and Its Consequences. Ecology 1973, 54, 427–432, [arXiv:arXiv:astro-ph/0507464v2]. doi:10.2307/1934352.
7. Hannah, L.; Kay, J. Concentration in Modern Industry: Theory, Measurement, and the U.K. Experience; The MacMillan Press: London, UK, 1977.
8. Lande, R. Statistics and partitioning of species diversity and similarity among multiple communities. Oikos 1996, 76, 5–13.
13 of 13
9.
Jost, L. Partitioning Diversity into Independent Alpha and Beta Components. Ecology
2007
,88, 2427–2439,
[1106.4388]. doi:10.1002/ecy.2039.
10.
Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Distributed representations of words and hrases and their
compositionality. NIPS, 2013, pp. 1–9, [1310.4546].
11.
Pennington, J.; Socher, R.; Manning, C. Glove: Global Vectors for Word Representation. Proceedings of the
2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543,
[1504.06654]. doi:10.3115/v1/D14-1162.
12.
Nickel, M.; Kiela, D. Poincaré embeddings for learning hierarchical representations. Advances in Neural
Information Processing Systems, 2017, Vol. 2017-Decem, pp. 6339–6348.
13.
Price, A.L.; Patterson, N.J.; Plenge, R.M.; Weinblatt, M.E.; Shadick, N.a.; Reich, D. Principal components
analysis corrects for stratification in genome-wide association studies. Nature genetics
2006
,38, 904–909.
doi:10.1038/ng1847.
14.
Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and Improving the Image
Quality of StyleGAN. CoRR 2019,abs/1912.04958.
15.
Ricotta, C.; Szeidl, L. Diversity partitioning of Rao’s quadratic entropy. Theoretical Population Biology
2009
,
76, 299–302.
16.
Leinster, T.; Cobbold, C. Measuring diversity: The importance of species similarity. Ecology
2012
,
93, 477–489, [1106.4388].
17.
Chiu, C.; Chao, A. Distance-based functional diversity measures and their decomposition: A framework
based on hill numbers. PLoS ONE 2014,9.
18.
Chao, A.; Chiu, C.H.; Villéger, S.; Sun, I.F.; Thorn, S.; Lin, Y.C.; Chiang, J.M.; Sherwin, W.B. An
attribute-diversity approach to functional diversity, functional beta diversity, and related (dis)similarity
measures. Ecological Monographs 2019,89, e01343. doi:10.1002/ecm.1343.
19.
Marquand, A.; Wolfers, T.; Mennes, M.; Buitelaar, J.; Beckmann, C. Beyond Lumping and Splitting: A
Review of Computational Approaches for Stratifying Psychiatric Disorders. Biological Psychiatry: Cognitive
Neuroscience and Neuroimaging 2016,1, 433–447.
20.
Wilson, M.; Shmida, A. Measuring Beta Diversity with Presence-Absence Data. The Journal of Ecology
1984
,
72, 1055. doi:10.2307/2259551.
21.
DerSimonian, R.; Laird, N. Meta-analysis in clinical trials. Controlled Clinical Trials
1986
,7, 177–188.
doi:10.1016/0197-2456(86)90046-2.