Indian Buffet Processes with Power-law Behavior
Yee Whye Teh and Dilan Görür
Gatsby Computational Neuroscience Unit, UCL
17 Queen Square, London WC1N 3AR, United Kingdom
{ywteh,dilan}@gatsby.ucl.ac.uk
Abstract
The Indian buffet process (IBP) is an exchangeable distribution over binary matrices used in Bayesian nonparametric featural models. In this paper we propose a three-parameter generalization of the IBP exhibiting power-law behavior. We achieve this by generalizing the beta process (the de Finetti measure of the IBP) to the stable-beta process and deriving the IBP corresponding to it. We find interesting relationships between the stable-beta process and the Pitman-Yor process (another stochastic process used in Bayesian nonparametric models with interesting power-law properties). We derive a stick-breaking construction for the stable-beta process, and find that our power-law IBP is a good model for word occurrences in document corpora.
1 Introduction
The Indian buffet process (IBP) is an infinitely exchangeable distribution over binary matrices with
a finite number of rows and an unbounded number of columns [1, 2]. It has been proposed as a
suitable prior for Bayesian nonparametric featural models, where each object (row) is modeled with
a potentially unbounded number of features (columns). Applications of the IBP include Bayesian
nonparametric models for ICA [3], choice modeling [4], similarity judgements modeling [5], dyadic
data modeling [6] and causal inference [7].
In this paper we propose a three-parameter generalization of the IBP with power-law behavior. Using the usual analogy of customers entering an Indian buffet restaurant and sequentially choosing dishes from an infinitely long buffet counter, our generalization with parameters α > 0, c > −σ and σ ∈ [0,1) is simply as follows:
• Customer 1 tries Poisson(α) dishes.
• Subsequently, customer n + 1:
  – tries dish k with probability $\frac{m_k - \sigma}{n + c}$, for each dish that has previously been tried;
  – tries $\mathrm{Poisson}\left(\alpha \frac{\Gamma(1+c)\Gamma(n+c+\sigma)}{\Gamma(n+1+c)\Gamma(c+\sigma)}\right)$ new dishes.

where $m_k$ is the number of previous customers who tried dish k. The dishes and the customers correspond to the columns and the rows of the binary matrix respectively, with an entry of the matrix being one if the corresponding customer tried the dish (and zero otherwise). The mass parameter α controls the total number of dishes tried by the customers, the concentration parameter c controls the number of customers that will try each dish, and the stability exponent σ controls the power-law behavior of the process. When σ = 0 the process does not exhibit power-law behavior and reduces to the usual two-parameter IBP [2].
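The customer-by-customer process above can be simulated directly. The following is a minimal Python sketch using only the standard library, under our own assumptions: the names `stable_beta_ibp`, `new_dish_rate` and `_poisson` are hypothetical, Poisson draws use a simple exponential-arrival counter (adequate for moderate rates), and dishes are indexed by order of first appearance rather than by atom locations drawn from a base distribution.

```python
import math
import random


def _poisson(lam, rng):
    """Poisson(lam) draw: count unit-rate exponential arrivals in [0, lam)."""
    k, t = 0, rng.expovariate(1.0)
    while t < lam:
        k += 1
        t += rng.expovariate(1.0)
    return k


def new_dish_rate(n, alpha, c, sigma):
    """Mean number of new dishes tried by customer n + 1 (after n customers)."""
    return alpha * math.exp(math.lgamma(1 + c) + math.lgamma(n + c + sigma)
                            - math.lgamma(n + 1 + c) - math.lgamma(c + sigma))


def stable_beta_ibp(num_customers, alpha, c, sigma, seed=None):
    """Sample a binary matrix: rows are customers, columns are dishes."""
    if not (alpha > 0 and 0 <= sigma < 1 and c > -sigma):
        raise ValueError("need alpha > 0, 0 <= sigma < 1 and c > -sigma")
    rng = random.Random(seed)
    counts = []  # counts[k] = m_k, number of customers so far who tried dish k
    rows = []    # rows[i] = set of dish indices tried by customer i + 1
    for n in range(num_customers):  # n = number of previous customers
        # Old dishes: each tried independently with prob (m_k - sigma)/(n + c).
        tried = {k for k, m_k in enumerate(counts)
                 if rng.random() < (m_k - sigma) / (n + c)}
        for k in tried:
            counts[k] += 1
        # New dishes: a Poisson number with the rate from the list above.
        for _ in range(_poisson(new_dish_rate(n, alpha, c, sigma), rng)):
            tried.add(len(counts))  # a new dish takes the next column index
            counts.append(1)
        rows.append(tried)
    return [[int(k in row) for k in range(len(counts))] for row in rows]


Z = stable_beta_ibp(10, alpha=3.0, c=1.0, sigma=0.5, seed=0)
print(len(Z), "customers,", len(Z[0]), "dishes")
```

For n = 0 the new-dish rate reduces to α, so the first customer tries Poisson(α) dishes as stated; setting σ = 0 gives the two-parameter IBP.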
Many naturally occurring phenomena exhibit power-law behavior, and it has been argued that using
models that can capture this behavior can improve learning [8]. Recent examples where this has led
to significant improvements include unsupervised morphology learning [8], language modeling [9]
and image segmentation [10]. These examples are all based on the Pitman-Yor process [11, 12, 13], a generalization of the Dirichlet process [14] with power-law properties. Our generalization of the IBP extends the ability to model power-law behavior to featural models, and we expect it to lead to a wealth of novel applications not previously well handled by the IBP.
The approach we take in this paper is to first define the underlying de Finetti measure, then to derive the conditional distributions of Bernoulli process observations with the de Finetti measure integrated out. This automatically ensures that the resulting power-law IBP is infinitely exchangeable. We call the de Finetti measure of the power-law IBP the stable-beta process. It is a novel generalization of the beta process [15] (which is the de Finetti measure of the normal two-parameter IBP [16]) with characteristics reminiscent of the stable process [17, 11] (in turn related to the Pitman-Yor process). We will see that the stable-beta process has a number of properties similar to the Pitman-Yor process.
In the following section we first give a brief description of completely random measures, a class of random measures which includes the stable-beta and the beta processes. In Section 3 we introduce the stable-beta process, a three-parameter generalization of the beta process, and derive the power-law IBP based on the stable-beta process. Based on the proposed model, in Section 4 we construct a model of word occurrences in a document corpus. We conclude with a discussion in Section 5.
2 Completely Random Measures
In this section we give a brief description of completely random measures [18]. Let Θ be a measure space with Ω its σ-algebra. A random variable whose values are measures on (Θ, Ω) is referred to as a random measure. A completely random measure (CRM) µ over (Θ, Ω) is a random measure such that $\mu(A) \perp\!\!\!\perp \mu(B)$ for all disjoint measurable subsets A, B ∈ Ω. That is, the (random) masses assigned to disjoint subsets are independent. An important implication of this property is that the whole distribution over µ is determined (under technical assumptions that are usually satisfied) once the distributions of µ(A) are given for all A ∈ Ω.
CRMs can always be decomposed into a sum of three independent parts: a (nonrandom) measure,
an atomic measure with fixed atoms but random masses, and an atomic measure with random atoms
and masses. CRMs in this paper will only contain the second and third components. In this case we
can write µ in the form,
$$\mu = \sum_{k=1}^{N} u_k \delta_{\phi_k} + \sum_{l=1}^{M} v_l \delta_{\psi_l}, \tag{1}$$
where $u_k, v_l > 0$ are the random masses, $\phi_k \in \Theta$ are the fixed atoms, $\psi_l \in \Theta$ are the random atoms, and $N, M \in \mathbb{N} \cup \{\infty\}$. To describe µ fully it is sufficient to specify N and $\{\phi_k\}$, and to describe the joint distribution over the random variables $\{u_k\}$, $\{v_l\}$, $\{\psi_l\}$ and M. Each $u_k$ has to be independent from everything else and has some distribution $F_k$. The random atoms and their weights $\{v_l, \psi_l\}$ are jointly drawn from a 2D Poisson process over $(0,\infty] \times \Theta$ with some nonatomic rate measure Λ called the Lévy measure. The rate measure Λ has to satisfy a number of technical properties; see [18, 19] for details. If $\int_\Theta \int_{(0,\infty]} \Lambda(du \times d\theta) = M^* < \infty$ then the number of random atoms M in µ is Poisson distributed with mean $M^*$; otherwise there are an infinite number of random atoms. If µ is described by Λ and $\{\phi_k, F_k\}_{k=1}^{N}$ as above, we write,
$$\mu \sim \mathrm{CRM}(\Lambda, \{\phi_k, F_k\}_{k=1}^{N}). \tag{2}$$
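When the total Lévy mass $M^*$ is finite, a draw of the atoms in (1) can be simulated by sampling each fixed-atom mass from its $F_k$, drawing $M \sim \mathrm{Poisson}(M^*)$, and then drawing M iid (location, mass) pairs from the normalized Lévy measure. The Python sketch below assumes, for illustration only, that the normalized Lévy measure factorizes into a mass part and a location part supplied as sampling functions; `sample_finite_crm`, `mass_sampler` and `atom_sampler` are hypothetical names. The SBP introduced in Section 3 has infinite total Lévy mass (its density behaves like $u^{-\sigma-1}$ near 0), so it falls outside this finite case.

```python
import random


def _poisson(lam, rng):
    """Poisson(lam) draw: count unit-rate exponential arrivals in [0, lam)."""
    k, t = 0, rng.expovariate(1.0)
    while t < lam:
        k += 1
        t += rng.expovariate(1.0)
    return k


def sample_finite_crm(fixed_atoms, total_mass, mass_sampler, atom_sampler, seed=None):
    """Draw the atoms of mu in (1) as two lists of (location, mass) pairs.

    fixed_atoms:  list of (phi_k, F_k), where F_k(rng) returns a mass u_k > 0.
    total_mass:   M* = Lambda((0, inf] x Theta), assumed finite.
    mass_sampler, atom_sampler: draw v_l and psi_l from the normalized Levy
        measure, assumed here to factorize into mass and location parts.
    """
    rng = random.Random(seed)
    fixed = [(phi, F(rng)) for phi, F in fixed_atoms]
    M = _poisson(total_mass, rng)  # number of random atoms
    random_atoms = [(atom_sampler(rng), mass_sampler(rng)) for _ in range(M)]
    return fixed, random_atoms


fixed, rand = sample_finite_crm(
    fixed_atoms=[(0.25, lambda r: r.betavariate(2, 5))],
    total_mass=4.0,
    mass_sampler=lambda r: r.betavariate(1, 3),
    atom_sampler=lambda r: r.random(),  # illustrative H = Uniform(0, 1)
    seed=0,
)
```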
3 The Stable-beta Process
In this section we introduce a novel CRM called the stable-beta process (SBP). It has no fixed atoms, while its Lévy measure is defined over (0,1) × Θ:
$$\Lambda_0(du \times d\theta) = \alpha \frac{\Gamma(1+c)}{\Gamma(1-\sigma)\Gamma(c+\sigma)} u^{-\sigma-1}(1-u)^{c+\sigma-1}\, du\, H(d\theta), \tag{3}$$
where the parameters are: a mass parameter α > 0, a concentration parameter c > −σ, a stability exponent 0 ≤ σ < 1, and a smooth base distribution H. The mass parameter controls the overall mass of the process and the base distribution gives the distribution over the random atom locations.
The mean of the SBP can be shown to be E[µ(A)] = αH(A) for each A ∈ Ω, while $\mathrm{var}(\mu(A)) = \frac{\alpha(1-\sigma)}{1+c} H(A)$. Thus the concentration parameter and the stability exponent both affect the variability of the SBP around its mean. The stability exponent also governs the power-law behavior of the SBP. When σ = 0 the SBP does not have power-law behavior and reduces to a normal two-parameter beta process [15, 16]. When c = 1 − σ the stable-beta process describes the random atoms with masses < 1 in a stable process [17, 11]. The SBP is so named as it can be seen as a generalization of both the stable and the beta processes. Both the concentration parameter and the stability exponent can be generalized to functions over Θ, though we will not deal with this generalization here.
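A short derivation of the mean and variance follows, assuming Campbell's theorem for Poisson processes (a standard result not stated in the text): for a CRM without fixed atoms, $\mathbb{E}[\mu(A)] = \int u\,\Lambda_0(du \times A)$ and $\mathrm{var}(\mu(A)) = \int u^2\,\Lambda_0(du \times A)$. Both integrals reduce to beta functions.

```latex
\begin{aligned}
\mathbb{E}[\mu(A)]
  &= \alpha H(A) \frac{\Gamma(1+c)}{\Gamma(1-\sigma)\Gamma(c+\sigma)}
     \int_0^1 u^{-\sigma}(1-u)^{c+\sigma-1}\,du
   = \alpha H(A) \frac{\Gamma(1+c)}{\Gamma(1-\sigma)\Gamma(c+\sigma)}
     \cdot \frac{\Gamma(1-\sigma)\Gamma(c+\sigma)}{\Gamma(1+c)}
   = \alpha H(A), \\
\mathrm{var}(\mu(A))
  &= \alpha H(A) \frac{\Gamma(1+c)}{\Gamma(1-\sigma)\Gamma(c+\sigma)}
     \int_0^1 u^{1-\sigma}(1-u)^{c+\sigma-1}\,du
   = \alpha H(A) \frac{\Gamma(1+c)}{\Gamma(1-\sigma)\Gamma(c+\sigma)}
     \cdot \frac{\Gamma(2-\sigma)\Gamma(c+\sigma)}{\Gamma(2+c)}
   = \frac{\alpha(1-\sigma)}{1+c} H(A),
\end{aligned}
```

where the last step uses $\Gamma(2-\sigma) = (1-\sigma)\Gamma(1-\sigma)$ and $\Gamma(2+c) = (1+c)\Gamma(1+c)$.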
3.1 Posterior Stable-beta Process
Consider the following hierarchical model:
$$\mu \sim \mathrm{CRM}(\Lambda_0, \{\}), \qquad Z_i \mid \mu \sim \mathrm{BernoulliP}(\mu) \quad \text{iid, for } i = 1, \ldots, n. \tag{4}$$
The random measure µ is an SBP with no fixed atoms and with Lévy measure (3), while $Z_i \sim \mathrm{BernoulliP}(\mu)$ is a Bernoulli process with mean µ [16]. This is also a CRM: in a small neighborhood dθ around θ ∈ Θ it has a probability µ(dθ) of having a unit mass atom in dθ; otherwise it does not have an atom in dθ. If µ has an atom at θ, the probability of $Z_i$ having an atom at θ as well is µ({θ}). If µ has a smooth component, say $\mu_0$, then $Z_i$ will have random atoms drawn from a Poisson process with rate measure $\mu_0$. In typical applications to featural models the atoms in $Z_i$ give the features associated with data item i, while the weights of the atoms in µ give the prior probabilities of the corresponding features occurring in a data item.
We are interested in both the posterior of µ given $Z_1, \ldots, Z_n$, as well as the conditional distribution of $Z_{n+1} \mid Z_1, \ldots, Z_n$ with µ marginalized out. Let $\theta^*_1, \ldots, \theta^*_K$ be the K unique atoms among $Z_1, \ldots, Z_n$, with atom $\theta^*_k$ occurring $m_k$ times. Theorem 3.3 of [20] shows that the posterior of µ given $Z_1, \ldots, Z_n$ is still a CRM, but now including fixed atoms given by $\theta^*_1, \ldots, \theta^*_K$. Its updated Lévy measure and the distribution of the mass at each fixed atom $\theta^*_k$ are,
$$\mu \mid Z_1, \ldots, Z_n \sim \mathrm{CRM}(\Lambda_n, \{\theta^*_k, F_{nk}\}_{k=1}^{K}), \tag{5}$$
where
$$\Lambda_n(du \times d\theta) = \alpha \frac{\Gamma(1+c)}{\Gamma(1-\sigma)\Gamma(c+\sigma)} u^{-\sigma-1}(1-u)^{n+c+\sigma-1}\, du\, H(d\theta), \tag{6a}$$
$$F_{nk}(du) = \frac{\Gamma(n+c)}{\Gamma(m_k-\sigma)\Gamma(n-m_k+c+\sigma)} u^{m_k-\sigma-1}(1-u)^{n-m_k+c+\sigma-1}\, du. \tag{6b}$$

Intuitively, the posterior is obtained as follows. Firstly, the posterior of µ must be a CRM since both the prior of µ and the likelihood of each $Z_i \mid \mu$ factorize over disjoint subsets of Θ. Secondly, µ must have fixed atoms at each $\theta^*_k$ since otherwise the probability that there will be atoms among $Z_1, \ldots, Z_n$ at precisely $\theta^*_k$ is zero. The posterior mass at $\theta^*_k$ is obtained by multiplying a Bernoulli “likelihood” $u^{m_k}(1-u)^{n-m_k}$ (since there are $m_k$ occurrences of the atom $\theta^*_k$ among $Z_1, \ldots, Z_n$) to the “prior” $\Lambda_0(du \times d\theta^*_k)$ in (3) and normalizing, giving us (6b). Finally, outside of these K atoms there are no other atoms among $Z_1, \ldots, Z_n$. We can think of this as n observations of 0 among n iid Bernoulli variables, so a “likelihood” of $(1-u)^n$ is multiplied into $\Lambda_0$ (without normalization), giving the updated Lévy measure in (6a).

Let us inspect the distributions (6) of the fixed and random atoms in the posterior µ in turn. The random mass at $\theta^*_k$ has a distribution $F_{nk}$ which is simply a beta distribution with parameters $(m_k - \sigma, n - m_k + c + \sigma)$. This differs from the usual beta process in the subtraction of σ from $m_k$ and addition of σ to $n - m_k + c$. This is reminiscent of the Pitman-Yor generalization to the Dirichlet process [11, 12, 13], where a discount parameter is subtracted from the number of customers seated around each table, and added to the chance of sitting at a new table. On the other hand, the Lévy measure of the random atoms of µ is still a Lévy measure corresponding to an SBP with updated parameters
$$\alpha' \leftarrow \alpha \frac{\Gamma(1+c)\Gamma(n+c+\sigma)}{\Gamma(n+1+c)\Gamma(c+\sigma)}, \qquad c' \leftarrow c + n, \qquad \sigma' \leftarrow \sigma, \qquad H' \leftarrow H. \tag{7}$$
Note that the update depends only on n, not on $Z_1, \ldots, Z_n$. In summary, the posterior of µ is simply an independent sum of an SBP with updated parameters and of fixed atoms with beta distributed masses. Observe that the posterior µ is not itself an SBP. In other words, the SBP is not conjugate to Bernoulli process observations. This is different from the beta process and again reminiscent of Pitman-Yor processes, where the posterior is also a sum of a Pitman-Yor process with updated parameters and fixed atoms with random masses, but not a Pitman-Yor process [11]. Fortunately, the non-conjugacy of the SBP does not preclude efficient inference. In the next subsections we describe an Indian buffet process and a stick-breaking construction corresponding to the SBP. Efficient inference techniques based on both representations for the beta process can be straightforwardly generalized to the SBP [1, 16, 21].
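The update (7) and the beta parameters of (6b) are straightforward to compute. The Python sketch below (the function name `posterior_sbp` is hypothetical) works in log-gamma space to avoid overflow for large n, and returns the updated SBP parameters together with the (a, b) parameters of each fixed-atom beta distribution $F_{nk}$; the base distribution H is left unchanged, as in (7).

```python
import math


def posterior_sbp(alpha, c, sigma, n, counts):
    """Posterior of an SBP after n Bernoulli process observations.

    counts[k] = m_k, the number of observations containing atom theta*_k.
    Returns ((alpha', c', sigma'), [(a_k, b_k), ...]) following (7) and (6b).
    """
    if any(not 1 <= m <= n for m in counts):
        raise ValueError("each m_k must satisfy 1 <= m_k <= n")
    log_ratio = (math.lgamma(1 + c) + math.lgamma(n + c + sigma)
                 - math.lgamma(n + 1 + c) - math.lgamma(c + sigma))
    alpha_post = alpha * math.exp(log_ratio)
    # F_nk = Beta(m_k - sigma, n - m_k + c + sigma), with mean (m_k - sigma)/(n + c)
    beta_params = [(m - sigma, n - m + c + sigma) for m in counts]
    return (alpha_post, c + n, sigma), beta_params


params, betas = posterior_sbp(2.0, 1.0, 0.0, 4, [1, 3])
print(params, betas)  # alpha' is 2 * 1! * 4! / 5! = 0.4 up to floating-point rounding
```

With σ = 0 the fixed-atom parameters reduce to the familiar beta process posterior $(m_k, n - m_k + c)$.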
3.2 The Stable-beta Indian Buffet Process
We can derive an Indian buffet process (IBP) corresponding to the SBP by deriving, for each n, the distribution of $Z_{n+1}$ conditioned on $Z_1, \ldots, Z_n$, with µ marginalized out. This derivation is straightforward and follows closely that for the beta process [16]. For each of the atoms $\theta^*_k$ the posterior of $\mu(\theta^*_k)$ given $Z_1, \ldots, Z_n$ is beta distributed with mean $\frac{m_k - \sigma}{n + c}$. Thus
$$p(Z_{n+1}(\theta^*_k) = 1 \mid Z_1, \ldots, Z_n) = \mathbb{E}[\mu(\theta^*_k) \mid Z_1, \ldots, Z_n] = \frac{m_k - \sigma}{n + c}. \tag{8}$$
Metaphorically speaking, customer n + 1 tries dish k with probability $\frac{m_k - \sigma}{n + c}$. Now for the random atoms. Let $\theta \in \Theta \setminus \{\theta^*_1, \ldots, \theta^*_K\}$. In a small neighborhood dθ around θ, we have:
$$\begin{aligned}
p(Z_{n+1}(d\theta) = 1 \mid Z_1, \ldots, Z_n) &= \mathbb{E}[\mu(d\theta) \mid Z_1, \ldots, Z_n] = \int_0^1 u\, \Lambda_n(du \times d\theta) \\
&= \int_0^1 u\, \alpha \frac{\Gamma(1+c)}{\Gamma(1-\sigma)\Gamma(c+\sigma)} u^{-1-\sigma}(1-u)^{n+c+\sigma-1}\, du\, H(d\theta) \\
&= \alpha \frac{\Gamma(1+c)}{\Gamma(1-\sigma)\Gamma(c+\sigma)} H(d\theta) \int_0^1 u^{-\sigma}(1-u)^{n+c+\sigma-1}\, du \\
&= \alpha \frac{\Gamma(1+c)\Gamma(n+c+\sigma)}{\Gamma(n+1+c)\Gamma(c+\sigma)} H(d\theta).
\end{aligned} \tag{9}$$
Since $Z_{n+1}$ is completely random and H is smooth, the above shows that on $\Theta \setminus \{\theta^*_1, \ldots, \theta^*_K\}$, $Z_{n+1}$ is simply a Poisson process with rate measure $\alpha \frac{\Gamma(1+c)\Gamma(n+c+\sigma)}{\Gamma(n+1+c)\Gamma(c+\sigma)} H$. In particular, it will have $\mathrm{Poisson}\left(\alpha \frac{\Gamma(1+c)\Gamma(n+c+\sigma)}{\Gamma(n+1+c)\Gamma(c+\sigma)}\right)$ new atoms, each independently and identically distributed according to H. In the IBP metaphor, this corresponds to customer n + 1 trying new dishes, with each dish associated with a new draw from H. The resulting Indian buffet process is as described in the introduction. It is automatically infinitely exchangeable since it was derived from the conditional distributions of the hierarchical model (4).

Multiplying the conditional probabilities of each $Z_n$ given previous ones together, we get the joint probability of $Z_1, \ldots, Z_n$ with µ marginalized out:
$$p(Z_1, \ldots, Z_n) = \exp\left(-\alpha \sum_{i=1}^{n} \frac{\Gamma(1+c)\Gamma(i-1+c+\sigma)}{\Gamma(i+c)\Gamma(c+\sigma)}\right) \prod_{k=1}^{K} \frac{\Gamma(m_k-\sigma)\Gamma(n-m_k+c+\sigma)\Gamma(1+c)}{\Gamma(1-\sigma)\Gamma(c+\sigma)\Gamma(n+c)}\, \alpha h(\theta^*_k), \tag{10}$$
where there are K atoms (dishes) $\theta^*_1, \ldots, \theta^*_K$ among $Z_1, \ldots, Z_n$ with atom k appearing $m_k$ times, and h is the density of H. (10) is to be contrasted with (4) in [1]. The $K_h!$ terms in [1] are absent as we have to distinguish among these $K_h$ dishes in assigning each of them a distinct atom (this also contributes the $h(\theta^*_k)$ terms). The fact that (10) is invariant to permuting the ordering among $Z_1, \ldots, Z_n$ also indicates the infinite exchangeability of the stable-beta IBP.
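As a numerical check on (10), the Python sketch below evaluates its logarithm for a binary matrix Z (rows are customers, every column must contain at least one 1) and, separately, multiplies the conditionals (8) and (9) customer by customer. The names `log_joint` and `log_sequential` are hypothetical, and the $\log h(\theta^*_k)$ terms are dropped as our own simplification. In the sequential version, each new dish contributes one factor of the Poisson rate and each customer a factor $\exp(-\text{rate})$, which together form the density of the set of new atoms under the Poisson process. The two should agree for any row order.

```python
import math


def _new_dish_rate(i, alpha, c, sigma):
    """Poisson rate of new dishes for customer i (i - 1 previous customers), from (9)."""
    return alpha * math.exp(math.lgamma(1 + c) + math.lgamma(i - 1 + c + sigma)
                            - math.lgamma(i + c) - math.lgamma(c + sigma))


def log_joint(Z, alpha, c, sigma):
    """log of (10) for a binary matrix Z, without the log h(theta*_k) terms."""
    n = len(Z)
    K = len(Z[0]) if n else 0
    total = -sum(_new_dish_rate(i, alpha, c, sigma) for i in range(1, n + 1))
    for k in range(K):
        m = sum(row[k] for row in Z)
        if m == 0:
            raise ValueError("every column must contain at least one 1")
        total += (math.log(alpha) + math.lgamma(m - sigma)
                  + math.lgamma(n - m + c + sigma) + math.lgamma(1 + c)
                  - math.lgamma(1 - sigma) - math.lgamma(c + sigma)
                  - math.lgamma(n + c))
    return total


def log_sequential(Z, alpha, c, sigma):
    """Same quantity, built by multiplying the conditionals (8) and (9) in row order."""
    K = len(Z[0]) if Z else 0
    counts = [0] * K
    total = 0.0
    for i, row in enumerate(Z, start=1):
        rate = _new_dish_rate(i, alpha, c, sigma)
        total -= rate  # exp(-rate) factor of the new-dish Poisson process
        for k in range(K):
            if counts[k] > 0:  # old dish: Bernoulli with probability (8)
                p = (counts[k] - sigma) / (i - 1 + c)
                total += math.log(p if row[k] else 1.0 - p)
            elif row[k]:  # dish first tried by customer i: one factor of rate
                total += math.log(rate)
        for k in range(K):
            counts[k] += row[k]
    return total


Z = [[1, 0, 1], [1, 1, 0], [0, 1, 0], [1, 0, 0]]
print(log_joint(Z, 2.0, 0.5, 0.3), log_sequential(Z, 2.0, 0.5, 0.3))
```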
3.3 Stick-breaking constructions
In this section we describe stick-breaking constructions for the SBP generalizing those for the beta
process. The first is based on the size-biased ordering of atoms induced by the IBP [16], while
the second is based on the inverse Lévy measure method [22], and produces a sequence of random
atoms of strictly decreasing masses [21].
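The size-biased construction is straightforward: we use the IBP to generate the atoms (dishes) in the SBP; each time a dish is newly generated the atom is drawn from H and its mass from $F_{nk}$. This leads to the following procedure:
$$\begin{aligned}
&\text{for } n = 1, 2, \ldots: \quad J_n \sim \mathrm{Poisson}\left(\alpha \frac{\Gamma(1+c)\Gamma(n-1+c+\sigma)}{\Gamma(n+c)\Gamma(c+\sigma)}\right), \\
&\qquad \text{for } k = 1, \ldots, J_n: \quad v_{nk} \sim \mathrm{Beta}(1-\sigma,\, n-1+c+\sigma), \quad \psi_{nk} \sim H, \\
&\mu = \sum_{n=1}^{\infty} \sum_{k=1}^{J_n} v_{nk}\, \delta_{\psi_{nk}}.
\end{aligned} \tag{11}$$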
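The procedure (11) runs over infinitely many rounds, so any simulation must truncate it. The Python sketch below stops after a fixed number of rounds `num_rounds`, which is our own truncation choice rather than part of (11); `size_biased_sbp` and `base_sampler` are hypothetical names, and H is supplied as a sampling function.

```python
import math
import random


def _poisson(lam, rng):
    """Poisson(lam) draw: count unit-rate exponential arrivals in [0, lam)."""
    k, t = 0, rng.expovariate(1.0)
    while t < lam:
        k += 1
        t += rng.expovariate(1.0)
    return k


def size_biased_sbp(num_rounds, alpha, c, sigma, base_sampler, seed=None):
    """Truncation of (11): (mass, location) pairs from rounds 1..num_rounds."""
    rng = random.Random(seed)
    atoms = []
    for n in range(1, num_rounds + 1):
        rate = alpha * math.exp(math.lgamma(1 + c) + math.lgamma(n - 1 + c + sigma)
                                - math.lgamma(n + c) - math.lgamma(c + sigma))
        for _ in range(_poisson(rate, rng)):      # J_n new atoms in round n
            v = rng.betavariate(1 - sigma, n - 1 + c + sigma)  # v_nk
            atoms.append((v, base_sampler(rng)))  # psi_nk ~ H
    return atoms


atoms = size_biased_sbp(50, alpha=2.0, c=1.0, sigma=0.3,
                        base_sampler=lambda r: r.random(), seed=0)
```

The expected mass added in round n shrinks roughly like $n^{\sigma-2}$, so the neglected tail decays polynomially in `num_rounds`, and more slowly for larger σ.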
The inverse Lévy measure method is a general method of generating from a Poisson process with non-uniform rate measure. It essentially transforms the Poisson process into one with uniform rate, generates a sample, and transforms the sample back. This method is more involved for the SBP because the inverse transform has no analytically tractable form. The Lévy measure $\Lambda_0$ of the SBP factorizes into a product $\Lambda_0(du \times d\theta) = L(du)H(d\theta)$ of a σ-finite measure $L(du) = \alpha \frac{\Gamma(1+c)}{\Gamma(1-\sigma)\Gamma(c+\sigma)} u^{-\sigma-1}(1-u)^{c+\sigma-1}\,du$ over (0,1) and a probability measure H over Θ. This implies that we can generate a sample $\{v_l, \psi_l\}_{l=1}^{\infty}$ of the random atoms of µ and their masses by first sampling the masses $\{v_l\}_{l=1}^{\infty} \sim \mathrm{PoissonP}(L)$ from a Poisson process on (0,1) with rate measure L, and associating each $v_l$ with an iid draw $\psi_l \sim H$ [19]. Now consider the mapping $T : (0,1) \to (0,\infty)$ given by
$$T(u) = \int_u^1 L(du') = \int_u^1 \alpha \frac{\Gamma(1+c)}{\Gamma(1-\sigma)\Gamma(c+\sigma)} u'^{-\sigma-1}(1-u')^{c+\sigma-1}\,du'. \tag{12}$$
T is bijective and monotonically decreasing. The Mapping Theorem for Poisson processes [19] shows that $\{v_l\}_{l=1}^{\infty} \sim \mathrm{PoissonP}(L)$ if and only if $\{T(v_l)\}_{l=1}^{\infty} \sim \mathrm{PoissonP}(\mathcal{L})$, where $\mathcal{L}$ is Lebesgue measure on $(0,\infty)$. A sample $\{t_l\}_{l=1}^{\infty} \sim \mathrm{PoissonP}(\mathcal{L})$ can be easily drawn by letting $e_l \sim \mathrm{Exponential}(1)$ and setting $t_l = \sum_{i=1}^{l} e_i$ for all l. Transforming back with $v_l = T^{-1}(t_l)$, we have $\{v_l\}_{l=1}^{\infty} \sim \mathrm{PoissonP}(L)$. As $t_1, t_2, \ldots$ is an increasing sequence and T is decreasing, $v_1, v_2, \ldots$ is a decreasing sequence of masses. Deriving the density of $v_l$ given $v_{l-1}$, we get:
$$p(v_l \mid v_{l-1}) = \left|\frac{dt_l}{dv_l}\right| p(t_l \mid t_{l-1}) = \alpha \frac{\Gamma(1+c)}{\Gamma(1-\sigma)\Gamma(c+\sigma)} v_l^{-\sigma-1}(1-v_l)^{c+\sigma-1} \exp\left(-\int_{v_l}^{v_{l-1}} L(du)\right). \tag{13}$$
In general these densities do not simplify and we have to resort to solving for $T^{-1}(t_l)$ numerically. There are two cases for which they do simplify. For c = 1, σ = 0, the density function reduces to $p(v_l \mid v_{l-1}) = \alpha v_l^{\alpha-1} / v_{l-1}^{\alpha}$, leading to the stick-breaking construction of the single parameter IBP [21]. In the stable process case when c = 1 − σ and σ ≠ 0, the density of $v_l$ simplifies to:
$$\begin{aligned}
p(v_l \mid v_{l-1}) &= \alpha \frac{\Gamma(2-\sigma)}{\Gamma(1-\sigma)\Gamma(1)} v_l^{-\sigma-1} \times \exp\left(-\int_{v_l}^{v_{l-1}} \alpha \frac{\Gamma(2-\sigma)}{\Gamma(1-\sigma)\Gamma(1)} u^{-\sigma-1}\,du\right) \\
&= \alpha(1-\sigma) v_l^{-\sigma-1} \exp\left(-\frac{\alpha(1-\sigma)}{\sigma}\left(v_l^{-\sigma} - v_{l-1}^{-\sigma}\right)\right).
\end{aligned} \tag{14}$$
Doing a change of variables to $y_l = v_l^{-\sigma}$, we get:
$$p(y_l \mid y_{l-1}) = \frac{\alpha(1-\sigma)}{\sigma} \exp\left(-\frac{\alpha(1-\sigma)}{\sigma}(y_l - y_{l-1})\right). \tag{15}$$
That is, each $y_l$ is exponentially distributed with rate $\frac{\alpha(1-\sigma)}{\sigma}$ and offset by $y_{l-1}$. For general values of the parameters we do not have an analytic stick-breaking form. However, note that the weights generated using this method are still going to be strictly decreasing.
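The two closed-form cases translate directly into samplers. In the stable case (c = 1 − σ, 0 < σ < 1), (15) says that $y_l = v_l^{-\sigma}$ follows a random walk with Exponential increments of rate α(1 − σ)/σ, started at $y_0 = 1$ (that is, $v_0 = 1$, since $t_0 = 0$ and T(1) = 0); inverting gives $v_l = y_l^{-1/\sigma}$. For c = 1, σ = 0, the density $\alpha v_l^{\alpha-1}/v_{l-1}^{\alpha}$ on $(0, v_{l-1})$ has CDF $(v_l/v_{l-1})^{\alpha}$, so $v_l = v_{l-1} U^{1/\alpha}$ with U uniform on (0, 1]. The Python sketch below follows these two observations; the function names are hypothetical, only the first `num_atoms` masses are returned, and atom locations (iid draws from H) are omitted.

```python
import random


def stable_case_masses(num_atoms, alpha, sigma, seed=None):
    """First num_atoms masses v_1 > v_2 > ... when c = 1 - sigma, via (15)."""
    if not 0 < sigma < 1:
        raise ValueError("the stable case needs 0 < sigma < 1")
    rng = random.Random(seed)
    rate = alpha * (1 - sigma) / sigma
    y, masses = 1.0, []  # y_0 = v_0^(-sigma) = 1
    for _ in range(num_atoms):
        y += rng.expovariate(rate)          # y_l - y_{l-1} ~ Exponential(rate)
        masses.append(y ** (-1.0 / sigma))  # v_l = y_l^(-1/sigma)
    return masses


def one_parameter_ibp_masses(num_atoms, alpha, seed=None):
    """First num_atoms masses when c = 1, sigma = 0: v_l = v_{l-1} * U_l^(1/alpha)."""
    rng = random.Random(seed)
    v, masses = 1.0, []  # v_0 = 1
    for _ in range(num_atoms):
        v *= (1.0 - rng.random()) ** (1.0 / alpha)  # 1 - U lies in (0, 1]
        masses.append(v)
    return masses


print(stable_case_masses(5, alpha=2.0, sigma=0.5, seed=0))
print(one_parameter_ibp_masses(5, alpha=2.0, seed=0))
```

A quick sanity check is that the number of masses exceeding u should be Poisson with mean T(u), which is $\frac{\alpha(1-\sigma)}{\sigma}(u^{-\sigma} - 1)$ in the stable case and $\alpha \log(1/u)$ when c = 1, σ = 0.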
3.4 Power-law Properties
The SBP has a number of appealing power-law properties. In this section we shall assume σ > 0 since the case σ = 0 reduces the SBP to the usual beta process with less interesting power-law properties. Derivations are given in the appendix.