Article

Multiplicative Decomposition of Heterogeneity in Mixtures of Continuous Distributions

Abraham Nunes 1,†,*, Martin Alda 1 and Thomas Trappenberg 2

1 Department of Psychiatry, Dalhousie University, Halifax, Nova Scotia, Canada
2 Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
* Correspondence: nunes@dal.ca
† Current address: 5909 Veterans Memorial Lane (8th Floor), Abbie J. Lane Memorial Building, QE II Health Sciences Centre, Halifax, Nova Scotia, B3H 2E2, Canada


Abstract: A system's heterogeneity (diversity) is the effective size of its event space, and can be quantified using the Rényi family of indices (also known as Hill numbers in ecology or Hannah-Kay indices in economics), which are indexed by an elasticity parameter $q \geq 0$. Under these indices, the heterogeneity of a composite system (the $\gamma$-heterogeneity) is decomposable into heterogeneity arising from variation within and between component subsystems (the $\alpha$- and $\beta$-heterogeneity, respectively). Since the average heterogeneity of a component subsystem should not be greater than that of the pooled system, we require that $\gamma \geq \alpha$. There exists a multiplicative decomposition for Rényi heterogeneity of composite systems with discrete event spaces, but less attention has been paid to decomposition in the continuous setting. We therefore describe multiplicative decomposition of the Rényi heterogeneity for continuous mixture distributions under parametric and non-parametric pooling assumptions. Under non-parametric pooling, the $\gamma$-heterogeneity must often be estimated numerically, but the multiplicative decomposition holds such that $\gamma \geq \alpha$ for $q > 0$. Conversely, under parametric pooling, the $\gamma$-heterogeneity can be computed efficiently in closed form, but the $\gamma \geq \alpha$ condition holds reliably only at $q = 1$. Our findings will further contribute to heterogeneity measurement in continuous systems.

Keywords: Heterogeneity, Diversity, Decomposition, Gaussian mixture

1. Introduction

Measurement of heterogeneity is important across many scientific disciplines. Ecologists are interested in the heterogeneity of ecosystems' biological composition (biodiversity) [1], economists are interested in the heterogeneity of resource ownership (wealth equality) [2], and medical researchers and physicians are interested in the heterogeneity of diseases and their presentations [3]. Using Rényi heterogeneity [3–5], which for categorical random variables corresponds to ecologists' Hill numbers [6] and economists' Hannah-Kay indices [7], one can measure a system's heterogeneity as its effective number of distinct configurations.

The heterogeneity of a mixture or ensemble of systems is often known as $\gamma$-heterogeneity, and is generated by variation occurring within and between constituent subsystems. A good heterogeneity measure will facilitate decomposition of $\gamma$-heterogeneity into $\alpha$ (within-subsystem) and $\beta$ (between-subsystem) components. Under this decomposition, we require that $\gamma \geq \alpha$, since it is counterintuitive that the heterogeneity of the overall ensemble should be less than that of any of its constituents, let alone the "average" subsystem [8,9]. Such a decomposition was introduced by Jost [9] for systems represented on discrete event spaces (such as representations of organisms by species labels). However, many data are better modeled by continuous embeddings, including word semantics [10–12], genetic population structure [13], and natural images [14]. Unfortunately, considerably less is understood about how to decompose Rényi heterogeneity in cases where data are represented on non-categorical spaces [4]. Although there are decomposable functional diversity indices expressed in numbers equivalent, they require categorical partitioning of the data (in order to supply species (dis)similarity matrices) [15–18] and setting sensitivity or threshold parameters for (dis)similarities [16,18]. For many research applications, such as those in psychiatry [3,4,19] or involving unsupervised learning [13,14], we may not have categorical partitions of the observable space that are valid, reliable, and of semantic relevance. If we are to apply Rényi heterogeneity to such continuous-space systems, then we must demonstrate that its multiplicative decomposition of $\gamma$-heterogeneity into $\alpha$ and $\beta$ components is retained.

Therefore, our present work extends the Jost [9] multiplicative decomposition of Rényi heterogeneity to the analysis of continuous systems, and provides conditions under which the $\gamma \geq \alpha$ condition is satisfied. In Section 2, we introduce decomposition of the Rényi heterogeneity in categorical and continuous systems. Specifically, we highlight that the most important decision guiding the availability of a decomposition is how one defines the distribution over the mixture of subsystems. We show that for non-parametrically pooled systems (i.e., finite mixture models, illustrated in Section 3), the $\gamma \geq \alpha$ condition can hold for all values of the Rényi elasticity parameter $q > 0$, but that the $\gamma$-heterogeneity will generally require numerical estimation. Section 4 introduces decomposition of Rényi heterogeneity under parametric assumptions on the pooled system's distribution. In this case, which amounts to a Gaussian mixed-effects model (as commonly implemented in biomedical meta-analyses), we show that $\gamma \geq \alpha$ will hold at $q = 1$, though not necessarily at $q \neq 1$. Finally, in Section 5, we discuss the implications of our findings and scenarios in which parametric or non-parametric pooling assumptions might be particularly useful.

2. Background

2.1. Categorical Rényi Heterogeneity Decomposition

In this section, we consider the definition and decomposition of Rényi heterogeneity for a composite random variable (or "system") that we call a discrete mixture (Definition 1).

Definition 1 (Discrete Mixture). A random variable or system $X$ is called a discrete mixture when it is defined on an $n$-dimensional discrete state space $\mathcal{X} = \{1, 2, \ldots, n\}$ with probability distribution $\bar{p} = (\bar{p}_i)_{i=1,2,\ldots,n}$, where $\bar{p}_i$ is the probability that $X$ is observed in state $i \in \mathcal{X}$. Furthermore, let $X$ be an aggregation of $N$ component subsystems $X_1, X_2, \ldots, X_N$ with corresponding probability distributions $P = \left(p_{ij}\right)^{j=1,2,\ldots,n}_{i=1,2,\ldots,N}$. The proportion of $X$ attributable to each component is governed by the weights $w = (w_i)_{i=1,2,\ldots,N}$, where $0 \leq w_i \leq 1$ and $\sum_{i=1}^{N} w_i = 1$.

Let $X$ be a discrete mixture. The Rényi heterogeneity for the $i$th component is

$$\Pi_q(X_i) = \left( \sum_{j=1}^{n} p_{ij}^q \right)^{\frac{1}{1-q}}, \tag{1}$$

which is the effective number of states in $X_i$. Assuming the pooled distribution over the discrete mixture $X$ is a weighted average of the subsystem distributions, $\bar{p} = P^\top w$, the $\gamma$-heterogeneity is thus

$$\Pi_q^\gamma(X) = \left( \sum_{i=1}^{n} \bar{p}_i^q \right)^{\frac{1}{1-q}}, \tag{2}$$

which we interpret as the effective number of states in the pooled system $X$.

Jost [9] proposed the following decomposition of $\gamma$-heterogeneity:

$$\Pi_q^\gamma(X) = \Pi_q^\alpha(X)\, \Pi_q^\beta(X), \tag{3}$$


where $\Pi_q^\alpha(X)$ and $\Pi_q^\beta(X)$ are summary measures of heterogeneity due to variation within and between subsystems, respectively. Since the $\gamma$ factor has units of effective number of states in the pooled system, and $\alpha$ has units of effective number of states per component, then

$$\Pi_q^\beta(X) = \frac{\Pi_q^\gamma(X)}{\Pi_q^\alpha(X)} \tag{4}$$

yields the effective number of components in $X$.

For discrete mixtures, Jost [9] specified the functional form for the $\alpha$-heterogeneity as

$$\Pi_q^\alpha(X) = \begin{cases} \left( \sum_{i=1}^{N} \frac{w_i^q}{\sum_{k=1}^{N} w_k^q} \sum_{j=1}^{n} p_{ij}^q \right)^{\frac{1}{1-q}} & q \neq 1 \\ \exp\left\{ -\sum_{i=1}^{N} w_i \sum_{j=1}^{n} p_{ij} \log p_{ij} \right\} & q = 1 \end{cases}, \tag{5}$$

which allows the decomposition in Equation 3 to satisfy the following desiderata:

1. The $\alpha$ and $\beta$ components are independent [20].
2. The within-group heterogeneity is a lower bound on the total heterogeneity [8]: $\Pi_q^\alpha \leq \Pi_q^\gamma$.
3. The $\alpha$-heterogeneity is a form of average heterogeneity over groups.
4. The $\alpha$ and $\beta$ components are both expressed in numbers equivalent.

Specifically, Jost [9] proved that $\Pi_q^\gamma(X) \geq \Pi_q^\alpha(X)$ is guaranteed for all $q \geq 0$ when $w_i = w_j$ for all $(i,j) \in \{1, 2, \ldots, N\}$, or for unequal weights $w$ if the elasticity is set to the Shannon limit of $q \to 1$.
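To make the discrete decomposition concrete, the quantities in Equations 1–5 can be sketched in a few lines of Python. This is our own minimal illustration (not code from the paper), and the function names are hypothetical:

```python
import math

def renyi_heterogeneity(p, q):
    """Effective number of states (Hill number) of a distribution p at elasticity q."""
    if q == 1:  # Shannon limit: perplexity (exponential of Shannon entropy)
        return math.exp(-sum(pi * math.log(pi) for pi in p if pi > 0))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

def decompose(P, w, q):
    """Return (gamma, alpha, beta) for subsystem distributions P (rows) and weights w."""
    n = len(P[0])
    # Pooled distribution: weighted average of subsystem distributions (Eq. 2's p-bar)
    p_bar = [sum(w[i] * P[i][j] for i in range(len(P))) for j in range(n)]
    gamma = renyi_heterogeneity(p_bar, q)
    if q == 1:  # Eq. 5, q = 1 branch: exponential of the weighted mean entropy
        alpha = math.exp(sum(w[i] * -sum(pij * math.log(pij) for pij in P[i] if pij > 0)
                             for i in range(len(P))))
    else:       # Eq. 5, q != 1 branch
        wq = [wi ** q for wi in w]
        alpha = sum(wq[i] / sum(wq) * sum(pij ** q for pij in P[i])
                    for i in range(len(P))) ** (1.0 / (1.0 - q))
    return gamma, alpha, gamma / alpha  # beta from Eq. 4

# Two equally weighted subsystems occupying disjoint pairs of states
P = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
gamma, alpha, beta = decompose(P, [0.5, 0.5], q=1)
# gamma = 4 effective states, alpha = 2 per subsystem, beta = 2 effective subsystems
```

For two equally weighted subsystems on disjoint pairs of states, this returns $\gamma = 4$, $\alpha = 2$, and $\beta = 2$ at $q = 1$: four effective states overall, two per subsystem, and two effectively distinct subsystems.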

2.2. Continuous Rényi Heterogeneity Decomposition

Let $X$ be a non-parametric continuous mixture according to Definition 2. Despite individual mixture components in $X$ potentially having parametric probability density functions, we call this a "non-parametric" mixture because the distribution over the pooled components does not assume the form of a known parametric family.

Definition 2 (Non-parametric Continuous Mixture). A non-parametric continuous mixture is a random variable $X$ defined on an $n$-dimensional continuous space $\mathcal{X} \subseteq \mathbb{R}^n$, and composed of subsystems $X_1, X_2, \ldots, X_N$ with respective probability density functions $f(x) = \{f_i(x)\}_{i=1,2,\ldots,N}$ and weights $w = (w_i)_{i=1,2,\ldots,N}$ such that $\sum_{i=1}^{N} w_i = 1$ and $0 \leq w_i \leq 1$. The pooled probability density over $X$ is defined as

$$\bar{f}(x) = \sum_{i=1}^{N} w_i f_i(x). \tag{6}$$

The continuous Rényi heterogeneity for the $i$th subsystem of $X$ is

$$\Pi_q(X_i) = \left( \int_{\mathcal{X}} f_i^q(x)\,\mathrm{d}x \right)^{\frac{1}{1-q}}, \tag{7}$$

whose interpretation is given by Proposition 1 (see Proposition A3 in Nunes et al. [5] for the proof), and which we henceforth call the "effective volume" of the event space or domain of $X_i$.

Proposition 1 (Rényi Heterogeneity of a Continuous Random Variable). The Rényi heterogeneity of a continuous random variable $X$ defined on event space $\mathcal{X} \subseteq \mathbb{R}^n$ with probability density function $f$ is equal to the magnitude of the volume of an $n$-cube over which a uniform probability density has the same Rényi heterogeneity as that of $X$.

Given the pooled distribution as defined in Equation 6, the Rényi heterogeneity over the mixture, which is the $\gamma$-heterogeneity, is

$$\Pi_q^\gamma(X) = \left( \int_{\mathcal{X}} \bar{f}^q(x)\,\mathrm{d}x \right)^{\frac{1}{1-q}}. \tag{8}$$

The $\gamma$-heterogeneity is thus the total effective volume of $X$'s domain. The $\alpha$-heterogeneity represents the effective volume per mixture component in $X$, and is computed as follows:

$$\Pi_q^\alpha(X) = \left( \sum_{i=1}^{N} \frac{w_i^q}{\sum_{k=1}^{N} w_k^q} \int_{\mathcal{X}} f_i^q(x)\,\mathrm{d}x \right)^{\frac{1}{1-q}}. \tag{9}$$

Given Equations 8 and 9, the following theorem provides conditions under which $\gamma \geq \alpha$ is satisfied for a non-parametric continuous mixture. The proof is analogous to that given by Jost [9] for discrete mixtures, and is detailed in Appendix A.

Theorem 1. If $X$ is a non-parametric continuous mixture (Definition 2), with $\gamma$-heterogeneity specified by Equation 8 and $\alpha$-heterogeneity given by Equation 9, then

$$\Pi_q^\beta(X) = \frac{\Pi_q^\gamma(X)}{\Pi_q^\alpha(X)} \geq 1 \tag{10}$$

under the following conditions:

1. $q = 1$
2. $q > 0$ when weights are equal for all mixture components.

If $\int_{\mathcal{X}} f_i^q(x)\,\mathrm{d}x$ is analytically tractable for all $i \in \{1, 2, \ldots, N\}$, then a closed-form expression for $\Pi_q^\alpha(X)$ will be available. If $\int_{\mathcal{X}} \bar{f}^q(x)\,\mathrm{d}x$ is also analytically tractable, then so too will be $\Pi_q^\beta(X)$. However, this depends entirely on the functional form of $\bar{f}$, and will rarely be the case with real-world data. In the majority of cases, $\int_{\mathcal{X}} \bar{f}^q(x)\,\mathrm{d}x$ will have to be computed numerically.

3. Rényi Heterogeneity Decomposition under a Non-parametric Pooling Distribution

Definition 3 defines a general Gaussian mixture $X$ as a weighted combination of component Gaussian random variables, without identifying the functional form of the composition. The non-parametric Gaussian mixture, where the distribution over $X$ is a simple model average over its Gaussian components, is specified in Definition 4.

Definition 3 (Gaussian Mixture). The $n$-dimensional Gaussian mixture $X$ is a weighted combination of the set of $n$-dimensional Gaussian random variables $\{X_i\}_{i=1,2,\ldots,N}$ with component weights $w = (w_i)_{i=1,2,\ldots,N}$ such that $0 \leq w_i \leq 1$ and $\sum_{i=1}^{N} w_i = 1$. The probability density function of component $X_i$ is denoted $\mathcal{N}(x|\mu_i, \Sigma_i)$, and is parameterized by an $n \times 1$ mean vector $\mu_i$ and an $n \times n$ covariance matrix $\Sigma_i$.

Definition 4 (Non-parametric Gaussian Mixture). We define the random variable $X$ as a non-parametric Gaussian mixture if it is a Gaussian mixture (Definition 3) whose probability density function is defined as

$$\bar{f}(x|\mu_{1:N}, \Sigma_{1:N}, w) = \sum_{i=1}^{N} w_i\, \mathcal{N}(x|\mu_i, \Sigma_i), \tag{11}$$

where $\mu_{1:N}$ and $\Sigma_{1:N}$ denote the sets of component mean vectors $\mu_1, \ldots, \mu_N$ and covariance matrices $\Sigma_1, \ldots, \Sigma_N$, respectively.

We now introduce the Rényi heterogeneity of a single $n$-dimensional Gaussian random variable (Proposition 2) and subsequently characterize the $\gamma$-, $\alpha$-, and $\beta$-heterogeneity values for a non-parametric Gaussian mixture.

Proposition 2 (Rényi Heterogeneity of a Multivariate Gaussian). The Rényi heterogeneity of an $n$-dimensional Gaussian random variable $X$ with mean $\mu$ and covariance matrix $\Sigma$ is

$$\Pi_q(X) = \begin{cases} \text{Undefined} & q = 0 \\ (2\pi e)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}} & q = 1 \\ (2\pi)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}} & q = \infty \\ (2\pi)^{\frac{n}{2}}\, q^{\frac{n}{2(q-1)}}\, |\Sigma|^{\frac{1}{2}} & q \notin \{0, 1, \infty\} \end{cases}. \tag{12}$$
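Equation 12 is straightforward to sanity-check numerically. The following sketch (our own, with assumed helper names) compares the closed form against a quadrature estimate of Equation 7 for a univariate Gaussian:

```python
import math

def renyi_het_gaussian(sigma, q):
    """Closed-form Renyi heterogeneity (Eq. 12, n = 1) of a univariate Gaussian."""
    return math.sqrt(2 * math.pi) * q ** (1.0 / (2.0 * (q - 1.0))) * sigma

def renyi_het_numeric(sigma, q, lo=-30.0, hi=30.0, m=60001):
    """Quadrature estimate of Eq. 7 for the same density on a uniform grid."""
    dx = (hi - lo) / (m - 1)
    total = 0.0
    for k in range(m):
        x = lo + k * dx
        f = math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        total += f ** q * dx
    return total ** (1.0 / (1.0 - q))

# Closed form and quadrature agree, e.g. for sigma = 1.3 and q = 2
```

At, for example, $\sigma = 1.3$ and $q = 2$, both routes give $2\sqrt{\pi}\,\sigma \approx 4.61$.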

The proof of Proposition 2 is included in Appendix A. Unfortunately, a closed-form solution such as Equation 12 cannot be obtained for the $\gamma$-heterogeneity of a non-parametric Gaussian mixture,

$$\Pi_q^\gamma(X) = \left( \int_{\mathcal{X}} \left( \sum_{i=1}^{N} w_i\, \mathcal{N}(x|\mu_i, \Sigma_i) \right)^q \mathrm{d}x \right)^{\frac{1}{1-q}}, \tag{13}$$

which must be computed numerically to yield the effective size of the mixture's domain. This process may be computationally expensive, particularly in high dimensions. Conversely, Equation 9, which yields the effective size of the domain per mixture component, can be evaluated in closed form for a Gaussian mixture:

$$\Pi_q^\alpha(X) = \begin{cases} \text{Undefined} & q = 0 \\ \exp\left\{ \frac{1}{2}\left( n + \sum_{i=1}^{N} w_i \log|2\pi\Sigma_i| \right) \right\} & q = 1 \\ 0 & q = \infty \\ (2\pi)^{\frac{n}{2}}\, q^{\frac{n}{2(q-1)}} \left( \sum_{i=1}^{N} \frac{w_i^q}{\sum_{j=1}^{N} w_j^q}\, |\Sigma_i|^{\frac{1-q}{2}} \right)^{\frac{1}{1-q}} & q \notin \{0, 1, \infty\} \end{cases}. \tag{14}$$

The $\beta$-heterogeneity, which returns the effective number of components in the mixture, can then be computed using Equation 4. Example 1 demonstrates an important property of considering $X$ as a non-parametric Gaussian mixture: low-probability regions of the domain between well-separated components will have little to no effect on the $\gamma$- or $\beta$-heterogeneity estimates.
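As an illustration of the non-parametric decomposition, the following sketch (our own, not the paper's code) computes the $q = 1$ quantities for a univariate Gaussian mixture: the $\alpha$-heterogeneity via the closed form in Equation 14, and the $\gamma$-heterogeneity by trapezoidal quadrature of the pooled density's entropy (the $q \to 1$ limit of Equation 13):

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def decompose_q1(mus, sigmas, weights, lo=-20.0, hi=20.0, m=20001):
    """(gamma, alpha, beta) at q = 1 for a univariate non-parametric Gaussian mixture."""
    # alpha (Eq. 14 at q = 1, n = 1): exp{ (1/2)(1 + sum_i w_i log(2*pi*sigma_i^2)) }
    alpha = math.exp(0.5 * (1 + sum(w * math.log(2 * math.pi * s ** 2)
                                    for w, s in zip(weights, sigmas))))
    # gamma (q -> 1 limit of Eq. 13): exp of the pooled density's differential
    # entropy, approximated by quadrature on a uniform grid
    dx = (hi - lo) / (m - 1)
    h = 0.0
    for k in range(m):
        x = lo + k * dx
        f = sum(w * normal_pdf(x, mu, s) for w, mu, s in zip(weights, mus, sigmas))
        if f > 0:
            h -= f * math.log(f) * dx
    gamma = math.exp(h)
    return gamma, alpha, gamma / alpha

# Two well-separated components with sigma = 0.5
gamma, alpha, beta = decompose_q1([-3.0, 3.0], [0.5, 0.5], [0.5, 0.5])
# alpha ~ 2.07 (effective volume per component); beta ~ 2 (effective number of components)
```

For two components with $\sigma = 0.5$ separated well beyond their overlap, this gives $\alpha \approx 2.07$ and $\beta \approx 2$, consistent with Example 1.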

Example 1 (Decomposition of Rényi heterogeneity in a univariate Gaussian mixture). Consider three non-parametric Gaussian mixtures $X^{(1)}, X^{(2)}, X^{(3)}$ defined on $\mathbb{R}$, whose numbers of components are respectively $N_1 = 2$, $N_2 = 3$, and $N_3 = 4$. Components in each mixture are equally weighted (that is, the components of mixture $X^{(j)}$ have weights $w_i^{(j)} = 1/N_j$ for all $i \in \{1, 2, \ldots, N_j\}$) and have equal standard deviation $\sigma = 0.5$. This yields a per-component Rényi heterogeneity of approximately 2.07, which is consequently also the $\alpha$-heterogeneity of each Gaussian mixture.

Figure 1 demonstrates the multiplicative decomposition of Rényi heterogeneity (at $q = 1$) in these Gaussian mixtures, where the $\gamma$-heterogeneity was computed numerically, across varying separations of the respective mixtures' component means. Note that the $\beta$-heterogeneity in this case represents the effective number of distinct components in the mixture distribution, and is bounded between 1 (when all components overlap) and $N_j$ (when all components are well separated). Separating the mixture components beyond the point at which the $\beta$-heterogeneity reaches $N_j$ yielded no further increase in $\beta$-heterogeneity.

[Figure 1: twelve density panels (3 rows of mixtures, 4 columns of increasing separation); panel headings report ($\gamma$, $\beta$) values: row 1: (2.1, 1.0), (4.1, 2.0), (4.1, 2.0), (4.1, 2.0); row 2: (2.1, 1.0), (5.7, 2.8), (6.2, 3.0), (6.2, 3.0); row 3: (2.1, 1.0), (5.9, 2.9), (8.1, 3.9), (8.3, 4.0).]

Figure 1. Demonstration of the multiplicative decomposition of Rényi heterogeneity in Gaussian mixture models, where $\gamma$-heterogeneity is computed using numerical integration. Each row represents a different number of mixture components (from top to bottom: 2, 3, and 4 univariate Gaussians with $\sigma = 0.5$, respectively). Each column shows a case in which the component locations are progressively further separated ($\max_i \mu_i - \min_i \mu_i$ distance from left to right: 0, 2, 4, 6). The $\alpha$-heterogeneity in all scenarios was $\approx 2.07$. The headings on each panel show the resulting $\gamma$- and $\beta$-heterogeneity values.

Assuming sufficiently accurate approximation of the integral in Equation 13, the $\gamma$-heterogeneity in Example 1 appears to reach a limit corresponding to the sum of effective domain sizes under all mixture components, and the $\beta$-heterogeneity reaches a limit corresponding to the number of individual mixture components.

Unfortunately, computation of $\beta$-heterogeneity in a non-parametric Gaussian mixture will yield results whose accuracy depends on the error of numerical integration, and which may consume significant computational resources when evaluated for large $N$ (many components) and large $n$ (high dimension). Although the non-parametric pooling approach may be the only available method for many distribution classes, a computationally efficient parametric pooling approach exists for Gaussian mixtures, to which we now turn our attention.

4. Rényi Heterogeneity Decomposition Under a Parametric Pooling Distribution

This section introduces the parametric Gaussian mixture (Definition 5), and subsequently provides conditions under which decomposition of its heterogeneity satisfies the requirement that $\alpha$-heterogeneity be a lower bound on $\gamma$-heterogeneity (Theorem 2).

Definition 5 (Parametric Gaussian Mixture). We define the random variable $X$ as an $n$-dimensional parametric Gaussian mixture if it is a Gaussian mixture (Definition 3) whose probability density function is defined as

$$\bar{f}(x|\mu_*, \Sigma_*) = \mathcal{N}(x|\mu_*, \Sigma_*), \tag{15}$$

with pooled mean vector

$$\mu_* = \sum_{i=1}^{N} w_i \mu_i, \tag{16}$$

and pooled covariance matrix

$$\Sigma_* = -\mu_*\mu_*^\top + \sum_{i=1}^{N} w_i \left( \Sigma_i + \mu_i\mu_i^\top \right). \tag{17}$$

The efficiency of assuming a parametric, rather than non-parametric, Gaussian mixture is that the $\gamma$-heterogeneity of the former may be computed in closed form using Equation 12 (it is simply a function of Equation 17). However, the critical difference between the parametric and non-parametric Gaussian mixture assumptions is that the $\gamma$-heterogeneity, and therefore the $\beta$-heterogeneity, will depend on the component means $\mu_{1:N}$, according to the following lemma.
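At $q = 1$ the entire parametric decomposition is closed-form. A minimal univariate sketch (our own, with assumed function names) combines the moment matching of Equations 16 and 17 with the Gaussian heterogeneity of Equation 12:

```python
import math

def parametric_decomposition_q1(mus, sigmas, weights):
    """(gamma, alpha, beta) at q = 1 for a univariate parametric Gaussian mixture."""
    # Pooled moments (Eqs. 16-17, n = 1): mu* = sum_i w_i mu_i;
    # var* = -mu*^2 + sum_i w_i (sigma_i^2 + mu_i^2)
    mu_star = sum(w * m for w, m in zip(weights, mus))
    var_star = -mu_star ** 2 + sum(w * (s ** 2 + m ** 2)
                                   for w, m, s in zip(weights, mus, sigmas))
    # gamma (Eq. 12 at q = 1, n = 1): (2*pi*e)^(1/2) * |Sigma*|^(1/2)
    gamma = math.sqrt(2 * math.pi * math.e * var_star)
    # alpha (Eq. 14 at q = 1, n = 1)
    alpha = math.exp(0.5 * (1 + sum(w * math.log(2 * math.pi * s ** 2)
                                    for w, s in zip(weights, sigmas))))
    return gamma, alpha, gamma / alpha

gamma, alpha, beta = parametric_decomposition_q1([-3.0, 3.0], [0.5, 0.5], [0.5, 0.5])
```

With all means equal and all variances equal, $\Sigma_*$ reduces to the common variance and $\beta = 1$; spreading the means apart inflates $\Sigma_*$, and hence $\beta$ grows without the upper bound of $N$ seen in the non-parametric case.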

Lemma 1 (Relationship of $\gamma$-Heterogeneity to Component Dispersion). Let $X$ and $X'$ be $N$-component parametric Gaussian mixtures on $\mathbb{R}^n$ with component-wise mean vectors $\mu_{1:N} = \{\mu_i\}_{i=1,2,\ldots,N}$ and $\mu'_{1:N} = \{\sqrt{c}\,\mu_i\}_{i=1,2,\ldots,N}$, where $c \geq 1$ is a scaling factor. The component-wise weights $w$ and covariance matrices $\Sigma_{1:N} = \{\Sigma_i\}_{i=1,2,\ldots,N}$ are identical between $X$ and $X'$. Finally, let $\Sigma_*$ and $\Sigma'_*$ be the pooled covariance matrices for $X$ and $X'$, respectively. Then, for all $c \geq 1$, we have that

$$\Pi_q^\gamma(X') \geq \Pi_q^\gamma(X), \tag{18}$$

with equality if $c = 1$.

Lemma 1, whose proof is detailed in Appendix A, implies that the resulting $\beta$-heterogeneity of a parametric Gaussian mixture will increase as the mixture component means are spread further apart. This follows from the fact that Equation 14, which is computed component-wise, remains a valid expression of the $\alpha$-heterogeneity in a parametric Gaussian mixture.

Before stating the conditions under which $\alpha$ is a lower bound on $\gamma$ for a parametric Gaussian mixture (Theorem 2), we introduce the following lemma, whose proof is left to Appendix A.

Lemma 2. If $\{\Sigma_i\}_{i=1,2,\ldots,N}$ is a set of $N \in \mathbb{N}_{\geq 2}$ positive semidefinite $n \times n$ matrices with corresponding weights $w = (w_i)_{i=1,2,\ldots,N}$ such that $0 \leq w_i \leq 1$ and $\sum_{i=1}^{N} w_i = 1$, then

$$\left| \sum_{i=1}^{N} w_i \Sigma_i \right|^{\frac{1}{2}} \geq \prod_{i=1}^{N} |\Sigma_i|^{\frac{w_i}{2}}. \tag{19}$$

Theorem 2. The Rényi $\beta$-heterogeneity of order $q = 1$ of a parametric Gaussian mixture $X$ (Definition 5) has a lower bound of 1:

$$\Pi_1^\beta(X) = \frac{\Pi_1^\gamma(X)}{\Pi_1^\alpha(X)} \geq 1. \tag{20}$$

Proof. Recall that $\Pi_q^\alpha(X)$ is independent of the mean vectors of the components in $X$ (Equation 14). Furthermore, it follows from Lemma 1 that if $\mu_{1:N} = \{0\}_{i=1,2,\ldots,N}$, where $0$ is an $n \times 1$ zero vector, then for any parametric Gaussian mixture $X'$ with means $\mu'_{1:N}$ we will have $\Pi_q^\gamma(X') \geq \Pi_q^\gamma(X)$, where equality is obtained if $\mu'_{1:N}$ are also zero vectors, or if the covariance of the mean vectors in $X'$,

$$\mathrm{Cov}[\mu'] = \mathbb{E}[\mu'\mu'^\top] - \mathbb{E}[\mu']\,\mathbb{E}[\mu']^\top, \tag{21}$$

is otherwise singular. Thus, it suffices to prove our theorem under the assumption that $\mu_{1:N} = \{0\}_{i=1,2,\ldots,N}$, in which case the pooled covariance of $X$ reduces to

$$\Sigma_* = \sum_{i=1}^{N} w_i \Sigma_i. \tag{22}$$

The expression for $\Pi_1^\gamma(X) \geq \Pi_1^\alpha(X)$ is

$$(2\pi e)^{\frac{n}{2}} |\Sigma_*|^{\frac{1}{2}} \geq \exp\left\{ \frac{1}{2}\left( n + \sum_{i=1}^{N} w_i \log|2\pi\Sigma_i| \right) \right\}, \tag{23}$$

which, after simplification,

$$|\Sigma_*|^{\frac{1}{2}} \geq \prod_{i=1}^{N} |\Sigma_i|^{\frac{w_i}{2}}, \tag{24}$$

can be appreciated to satisfy Lemma 2.

Although Theorem 2 highlights the reliability and flexibility of using elasticity $q = 1$, we must emphasize that $q = 1$ may not be the only condition under which $\Pi_q^\gamma(X) \geq \Pi_q^\alpha(X)$, as suggested by Example 2. Indeed, Example 2 suggests that the integrity of this bound on the $\beta$-heterogeneity at elasticity values $q \neq 1$ may depend in various ways on the unique combination of component-wise parameters in a parametric Gaussian mixture.

Example 2 (Decomposition of Rényi Heterogeneity in a Parametric Gaussian Mixture). Consider a parametric Gaussian mixture $X$ with four components defined on $\mathbb{R}$ (for instance, Figure 2A). The components' respective standard deviations are $\sigma = (0.5, 0.8, 1.1, 1.6)$. We vary the column vector of mixture component weights $w = (w_i)_{i=1,\ldots,4}$ according to the following function,

$$w(a) = \begin{cases} (1, 0, 0, 0)^\top & a = 0 \\ (0.25, 0.25, 0.25, 0.25)^\top & a = 1 \\ (0, 0, 0, 1)^\top & a = \infty \\ \dfrac{a^{\frac{1}{3}} - 1}{a^{\frac{4}{3}} - 1} \left( a^{\frac{i-1}{3}} \right)_{i=1,\ldots,4} & a \notin \{0, 1, \infty\} \end{cases} \tag{25}$$

which "skews" the distribution of weights over components in $X$ according to the value of a skew parameter $a \geq 0$ (shown in Figure 2B). As the parameter $a$ decreases further below 1, components $X_1$ and $X_2$ (which have the narrowest distributions) become preferentially weighted. Conversely, as $a$ increases above 1, components $X_3$ and $X_4$ are preferentially weighted. At $a = 1$, all components are equally weighted (depicted as the dashed black lines in Figures 2B-F).

[Figure 2: six panels. (A) Mixture component densities $X_1$–$X_4$; (B) mixture weight skews across components; (C) $\gamma$-heterogeneity versus $q$ for weighting schemes $W_1$–$W_7$; (D) $\alpha$-heterogeneity versus $q$; (E) $\beta$-heterogeneity versus $q$; (F) $\beta$-heterogeneity at $q = 1$ versus weight skew (log scale).]

Figure 2. Graphical counterexample showing that $\alpha$-heterogeneity is not always a lower bound on $\gamma$-heterogeneity when $q \neq 1$ for a parametric Gaussian mixture. Panel A: Four univariate Gaussian components used in the mixture distribution evaluated. Panel B: Mixture component weights. Each colored line (see bottom right of figure for legend) represents a different distribution of weights on the mixture components, such that in some settings the narrowest components are weighted highest, and vice versa. Panel C: $\gamma$-heterogeneity as computed by pooling the mixture components from Panel A according to Equation 15, for each weighting scheme at $q \neq 1$. Panel D: The $\alpha$-heterogeneity for each weighting scheme at $q \neq 1$. Panel E: The $\beta$-heterogeneity across each weighting scheme at $q \neq 1$. Panel F: The $\beta$-heterogeneity across various weighting schemes (plotted on the x-axis in log scale) at $q = 1$. The vertical coloured lines correspond to the values of $\Pi_1^\beta(X)$ across the weighting schemes $W_{1:7}$ shown in the legend of Panel C.

Figures 2C-E plot the $\gamma$-, $\alpha$-, and $\beta$-heterogeneity for the parametric Gaussian mixture at $q \neq 1$, respectively, while Figure 2F shows the $\beta$-heterogeneity at $q = 1$ for variously skewed weight distributions. When the skew parameter results in a distribution of weights whose ranking of components agrees with the rank order of the component distribution widths ($\sigma$), then the $\beta$-heterogeneity appears to exceed 1 for $q > 1$. However, when the component weights and distribution widths are anti-correlated (in terms of rank order), then we observe values of $\beta$-heterogeneity below 1 at values of $q > 1$, as well as for some values of $q < 1$.
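The kind of check underlying Example 2 can be sketched numerically as follows. This is our own two-component configuration rather than the paper's four-component one, and the closed forms assume Equations 12 and 14 with $n = 1$:

```python
import math

def parametric_beta(mus, sigmas, weights, q):
    """beta-heterogeneity at elasticity q (q not in {0, 1}) of a univariate
    parametric Gaussian mixture, via the closed forms of Eqs. 12 and 14 (n = 1)."""
    # Pooled moments (Eqs. 16-17)
    mu_star = sum(w * m for w, m in zip(weights, mus))
    var_star = -mu_star ** 2 + sum(w * (s ** 2 + m ** 2)
                                   for w, m, s in zip(weights, mus, sigmas))
    prefactor = math.sqrt(2 * math.pi) * q ** (1.0 / (2.0 * (q - 1.0)))
    gamma = prefactor * math.sqrt(var_star)           # Eq. 12 on the pooled Gaussian
    wq = [w ** q for w in weights]
    alpha = prefactor * sum(wi / sum(wq) * s ** (1.0 - q)
                            for wi, s in zip(wq, sigmas)) ** (1.0 / (1.0 - q))
    return gamma / alpha

# Zero-mean components with almost all weight on the widest one
beta = parametric_beta([0.0, 0.0], [0.5, 1.6], [0.01, 0.99], q=2)
```

In this configuration the computed $\beta$ falls slightly below 1 at $q = 2$, so the $\gamma \geq \alpha$ condition can indeed fail away from $q = 1$, as Example 2 illustrates for the four-component mixture.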

5. Discussion

This paper provided approaches for the multiplicative decomposition of heterogeneity in continuous mixture distributions, thereby extending the earlier work on discrete-space heterogeneity decomposition presented by Jost [9]. Two approaches were offered, depending on whether the distribution over the pooled system is defined parametrically or non-parametrically. Our results improve the understanding of heterogeneity measurement in non-categorical systems by providing conditions under which the decomposition of heterogeneity into $\alpha$ and $\beta$ components conforms to the intuitive property that $\gamma \geq \alpha$.

If one defines the pooled mixture non-parametrically, as in a finite mixture model, heterogeneity is decomposable such that $\gamma \geq \alpha$ for all $q > 0$ (if component weights are uniform, or at $q = 1$ otherwise), and $\beta$ may be interpreted as the discrete number of distinct mixture components (Sections 2.2 & 3). This has the advantage of conforming with the original discrete decomposition by Jost [9], insofar as probability mass in the mixture is recorded only where it is observed in the data, and not elsewhere, as would be assumed under a parametric model of the pooled system. Consequently, one achieves a more precise estimate of the size of the pooled system's base of support. The primary limitation arises from the need to numerically integrate the $\gamma$-heterogeneity, which can become prohibitively expensive in higher dimensions. Future work should investigate the error bounds on the numerically integrated $\gamma$.


A more computationally efficient approach for decomposition of continuous Rényi heterogeneity is to assume that the pooled mixture has an overall parametric distribution. A common application for which this assumption is generally made is mixed-effects meta-analysis [21]. An important departure from the non-parametric pooling approach of finite mixture models is that non-trivial probability mass may now be assigned to regions not covered by any of the constituent component distributions. From another perspective, one may appreciate that the non-parametric approach to pooling is insensitive to the distance between component distributions, and rather only measures the effective volume of event space to which the component distributions assign probability. Conversely, assuming a parametric distribution over the mixture (in the case of Section 4, a Gaussian) incorporates the distance between the component distributions into the calculation of $\gamma$-heterogeneity. This would be appropriate in scenarios where one assumes that the observed components undersample the true distribution of the pooled system. For example, in the case of mixed-effects meta-analysis, the available research studies for inclusion may differ significantly in terms of their means, but one might assume that there is a significant probability of a new study yielding an effect somewhere in between. Specifying a parametric distribution over the pooled system would capture this assumption.

One limitation of the present study is the use of a Gaussian model for the pooled system distribution. This was chosen on account of (A) its prevalence in the scientific literature and (B) its analytical tractability. Future work should expand these results to other distributions. Notwithstanding, we have demonstrated the decomposition of $\gamma$ Rényi heterogeneity into its $\alpha$ and $\beta$ components for continuous systems. There are (broadly) two approaches, based on whether parametric assumptions are made about the pooled system distribution. Under these assumptions applied to Gaussian mixture distributions, we provided conditions under which the criterion that $\gamma \geq \alpha$ is satisfied. Future studies should evaluate this method as an alternative approach for the measurement of meta-analytic heterogeneity, and expand these results to other parametric distributions over the pooled system.

Author Contributions: Conceptualization, A.N.; methodology, A.N.; validation, A.N.; formal analysis, A.N.; investigation, A.N.; writing–original draft preparation, A.N.; writing–review and editing, M.A. and T.T.; visualization, A.N.; supervision, M.A. and T.T.

Funding: This research received no external funding.

Conﬂicts of Interest: The authors declare no conﬂict of interest.

Appendix A. Proofs

Proof of Theorem 1. Following Jost [9] (proof 2), in the limit $q \to 1$ one obtains the inequality

$$-\sum_{i=1}^{N} w_i \int_{\mathcal{X}} f_i(x) \log f_i(x)\,\mathrm{d}x \leq -\int_{\mathcal{X}} \bar{f}(x) \log \bar{f}(x)\,\mathrm{d}x, \tag{A1}$$

whereas when $w_i = w_j$ for all $(i,j) \in \{1, 2, \ldots, N\}$, for $q > 1$ we have

$$\frac{1}{N}\sum_{i=1}^{N} \int_{\mathcal{X}} f_i^q(x)\,\mathrm{d}x \geq \int_{\mathcal{X}} \left( \frac{1}{N}\sum_{i=1}^{N} f_i(x) \right)^q \mathrm{d}x, \tag{A2}$$

and for $0 < q < 1$ we have

$$\frac{1}{N}\sum_{i=1}^{N} \int_{\mathcal{X}} f_i^q(x)\,\mathrm{d}x \leq \int_{\mathcal{X}} \left( \frac{1}{N}\sum_{i=1}^{N} f_i(x) \right)^q \mathrm{d}x, \tag{A3}$$

all of which hold by Jensen's inequality.

Proof of Proposition 2. We must solve the following integral:

$$\Pi_q(X) = \left( (2\pi)^{-\frac{qn}{2}} |\Sigma|^{-\frac{q}{2}} \int_{\mathbb{R}^n} e^{-\frac{q}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)}\,\mathrm{d}x \right)^{\frac{1}{1-q}}. \tag{A4}$$

The eigendecomposition of the inverse covariance matrix $\Sigma^{-1}$ into an orthonormal matrix of eigenvectors $U$ and an $n \times n$ diagonal matrix of eigenvalues $\Lambda = (\delta_{ij}\lambda_i)_{i,j=1,2,\ldots,n}$, where $\delta_{ij}$ is Kronecker's delta, facilitates the substitution $y = U^{-1}(x - \mu)$ required for Gaussian integration, by which we obtain the following solution for $q \notin \{0, 1, \infty\}$:

$$\Pi_q(X) = q^{\frac{n}{2(q-1)}} (2\pi)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}}. \tag{A5}$$

L'Hôpital's rule facilitates computation of the limit as $q \to 1$:

$$\lim_{q\to 1} \log \Pi_q(X) = \lim_{q\to 1} \left[ \frac{n \log q}{2(q-1)} \right] + \frac{n}{2}\log(2\pi) + \frac{1}{2}\log|\Sigma| = \frac{n}{2} + \frac{n}{2}\log(2\pi) + \frac{1}{2}\log|\Sigma|, \tag{A6}$$

giving the perplexity

$$\Pi_1(X) = (2\pi e)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}}. \tag{A7}$$

By the same procedure, we can compute the limit as $q \to \infty$,

$$\Pi_\infty(X) = (2\pi)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}}, \tag{A8}$$

as well as show that $\Pi_0(X)$ is undefined.

Proof of Lemma 1. For all $q > 0$, proving $\Pi_q^\gamma(X') \geq \Pi_q^\gamma(X)$ amounts to proving $|\Sigma'_*|^{\frac{1}{2}} \geq |\Sigma_*|^{\frac{1}{2}}$. To this end, we have

$$\Sigma'_* = \sum_{i=1}^{N} w_i \Sigma_i + \sum_{i=1}^{N} w_i \left(\sqrt{c}\,\mu_i\right)\left(\sqrt{c}\,\mu_i\right)^\top - \left( \sum_{i=1}^{N} w_i \sqrt{c}\,\mu_i \right)\left( \sum_{i=1}^{N} w_i \sqrt{c}\,\mu_i \right)^\top \tag{A9}$$
$$= \sum_{i=1}^{N} w_i \Sigma_i + c\left( \sum_{i=1}^{N} w_i \mu_i \mu_i^\top - \mu_* \mu_*^\top \right) \tag{A10}$$
$$= \hat{\Sigma} + c\,\mathrm{C}[\mu], \tag{A11}$$

and

$$\Sigma_* = \hat{\Sigma} + \mathrm{C}[\mu], \tag{A12}$$

where we denote $\hat{\Sigma} = \sum_{i=1}^{N} w_i \Sigma_i$ and $\mathrm{C}[\mu] = \sum_{i=1}^{N} w_i \mu_i \mu_i^\top - \mu_* \mu_*^\top$ for notational parsimony. Clearly, when $c = 1$, we have $\Sigma'_* = \Sigma_*$.

By the Minkowski determinant inequality, we have that

$$|\Sigma'_*|^{\frac{1}{2}} \geq |\hat{\Sigma}|^{\frac{1}{2}} + c^{\frac{n}{2}} |\mathrm{C}[\mu]|^{\frac{1}{2}} \tag{A13}$$
$$|\Sigma_*|^{\frac{1}{2}} \geq |\hat{\Sigma}|^{\frac{1}{2}} + |\mathrm{C}[\mu]|^{\frac{1}{2}}, \tag{A14}$$

which, since $c \geq 1$, implies the first line is greater than or equal to the second. Subtracting the second line from the first and simplifying yields

$$\frac{|\Sigma'_*|^{\frac{1}{2}} - |\Sigma_*|^{\frac{1}{2}}}{|\mathrm{C}[\mu]|^{\frac{1}{2}}} \geq c^{\frac{n}{2}} - 1. \tag{A15}$$

At $c = 1$, Equation A15 reduces to an equality, and since $c \geq 1$ and $n \geq 1$, Equation A15 establishes that $|\Sigma'_*|^{\frac{1}{2}} \geq |\Sigma_*|^{\frac{1}{2}}$.

Proof of Lemma 2. Since $\Sigma_{1:N}$ are positive semidefinite matrices, for all $x \in \mathbb{R}^n$ we have $-\frac{1}{2}x^\top (w_i \Sigma_i) x \leq 0$, and thus $-\frac{1}{2}x^\top \left( \sum_{i=1}^{N} w_i \Sigma_i \right) x \leq 0$. By exponentiating the quadratic term, we have

$$e^{-\frac{1}{2}x^\top \left( \sum_{i=1}^{N} w_i \Sigma_i \right) x} = \prod_{i=1}^{N} \left( e^{-\frac{1}{2}x^\top \Sigma_i x} \right)^{w_i}. \tag{A16}$$

We obtain the following expression by applying Gaussian integration to the left-hand side,

$$\int_{\mathbb{R}^n} e^{-\frac{1}{2}x^\top \left( \sum_{i=1}^{N} w_i \Sigma_i \right) x}\,\mathrm{d}x = (2\pi)^{\frac{n}{2}} \left| \sum_{i=1}^{N} w_i \Sigma_i \right|^{-\frac{1}{2}}, \tag{A17}$$

as well as a bound on the right-hand side obtained by Hölder's inequality,

$$\int_{\mathbb{R}^n} \prod_{i=1}^{N} \left( e^{-\frac{1}{2}x^\top \Sigma_i x} \right)^{w_i} \mathrm{d}x \leq \prod_{i=1}^{N} \left( \int_{\mathbb{R}^n} e^{-\frac{1}{2}x^\top \Sigma_i x}\,\mathrm{d}x \right)^{w_i} \tag{A18}$$
$$= (2\pi)^{\frac{n}{2}} \prod_{i=1}^{N} |\Sigma_i|^{-\frac{w_i}{2}}. \tag{A19}$$

Substituting Equations A17 and A19 into Equation A16 and simplifying terms yields

$$\left| \sum_{i=1}^{N} w_i \Sigma_i \right|^{\frac{1}{2}} \geq \prod_{i=1}^{N} |\Sigma_i|^{\frac{w_i}{2}}. \tag{A20}$$

References

1. Hooper, D.; Chapin, F.; Ewel, J.; Hector, A.; Inchausti, P.; Lavorel, S.; Lawton, J.; Lodge, D.; Loreau, M.; Naeem, S.; Schmid, B.; Setälä, H.; Symstad, A.; Vandermeer, J.; Wardle, D. Effects of biodiversity on ecosystem functioning: A consensus of current knowledge. Ecological Monographs 2005, 75, 3–35. doi:10.1890/04-0922.
2. Cowell, F. Measuring Inequality, 2nd ed.; Oxford University Press: Oxford, UK, 2011.
3. Nunes, A.; Trappenberg, T.; Alda, M. We need an operational framework for heterogeneity in psychiatric research. Journal of Psychiatry and Neuroscience 2020, 45, 3–6. doi:10.1503/jpn.190198.
4. Nunes, A.; Trappenberg, T.; Alda, M. The Definition and Measurement of Heterogeneity. PsyArXiv 2020. doi:10.31234/osf.io/3hykf.
5. Nunes, A.; Alda, M.; Bardouille, T.; Trappenberg, T. Representational Rényi heterogeneity. Entropy 2020, 22(4).
6. Hill, M. Diversity and Evenness: A Unifying Notation and Its Consequences. Ecology 1973, 54, 427–432. doi:10.2307/1934352.
7. Hannah, L.; Kay, J. Concentration in Modern Industry: Theory, Measurement, and the U.K. Experience; The MacMillan Press: London, UK, 1977.
8. Lande, R. Statistics and partitioning of species diversity and similarity among multiple communities. Oikos 1996, 76, 5–13.
9. Jost, L. Partitioning Diversity into Independent Alpha and Beta Components. Ecology 2007, 88, 2427–2439.
10. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Distributed representations of words and phrases and their compositionality. NIPS, 2013, pp. 1–9. [arXiv:1310.4546].
11. Pennington, J.; Socher, R.; Manning, C. GloVe: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543. doi:10.3115/v1/D14-1162.
12. Nickel, M.; Kiela, D. Poincaré embeddings for learning hierarchical representations. Advances in Neural Information Processing Systems, 2017, pp. 6339–6348.
13. Price, A.L.; Patterson, N.J.; Plenge, R.M.; Weinblatt, M.E.; Shadick, N.A.; Reich, D. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics 2006, 38, 904–909. doi:10.1038/ng1847.
14. Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and Improving the Image Quality of StyleGAN. CoRR 2019, abs/1912.04958.
15. Ricotta, C.; Szeidl, L. Diversity partitioning of Rao's quadratic entropy. Theoretical Population Biology 2009, 76, 299–302.
16. Leinster, T.; Cobbold, C. Measuring diversity: The importance of species similarity. Ecology 2012, 93, 477–489. [arXiv:1106.4388].
17. Chiu, C.; Chao, A. Distance-based functional diversity measures and their decomposition: A framework based on Hill numbers. PLoS ONE 2014, 9.
18. Chao, A.; Chiu, C.H.; Villéger, S.; Sun, I.F.; Thorn, S.; Lin, Y.C.; Chiang, J.M.; Sherwin, W.B. An attribute-diversity approach to functional diversity, functional beta diversity, and related (dis)similarity measures. Ecological Monographs 2019, 89, e01343. doi:10.1002/ecm.1343.
19. Marquand, A.; Wolfers, T.; Mennes, M.; Buitelaar, J.; Beckmann, C. Beyond Lumping and Splitting: A Review of Computational Approaches for Stratifying Psychiatric Disorders. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging 2016, 1, 433–447.
20. Wilson, M.; Shmida, A. Measuring Beta Diversity with Presence-Absence Data. The Journal of Ecology 1984, 72, 1055. doi:10.2307/2259551.
21. DerSimonian, R.; Laird, N. Meta-analysis in clinical trials. Controlled Clinical Trials 1986, 7, 177–188. doi:10.1016/0197-2456(86)90046-2.