FAST ADAPTIVE VARIATIONAL SPARSE BAYESIAN LEARNING WITH AUTOMATIC RELEVANCE DETERMINATION

ICASSP 2011 · 978-1-4577-0539-7/11/$26.00 ©2011 IEEE
Dmitriy Shutin†, Thomas Buchgraber⋆, Sanjeev R. Kulkarni†, H. Vincent Poor†
†Department of Electrical Engineering, Princeton University, USA
⋆Signal Processing and Speech Comm. Lab., Graz University of Technology, Austria
ABSTRACT
In this work a new adaptive fast variational sparse Bayesian learn-
ing (V-SBL) algorithm is proposed that is a variational counterpart
of the fast marginal likelihood maximization approach to SBL. It al-
lows one to adaptively construct a sparse regression or classification
function as a linear combination of a few basis functions by mini-
mizing the variational free energy. In the case of non-informative
hyperpriors, also referred to as automatic relevance determination,
the minimization of the free energy can be efficiently realized by
computing the fixed points of the update expressions for the varia-
tional distribution of the sparsity parameters. The criteria that estab-
lish convergence to these fixed points, termed pruning conditions,
allow an efficient addition or removal of basis functions; they also
have a simple and intuitive interpretation in terms of a component’s
signal-to-noise ratio. It has been demonstrated that this interpreta-
tion allows a simple empirical adjustment of the pruning conditions,
which in turn improves sparsity of SBL and drastically accelerates
the convergence rate of the algorithm. The experimental evidence
collected with synthetic data demonstrates the effectiveness of the
proposed learning scheme.
1. INTRODUCTION
During the past decade, research on sparse signal representations has
received considerable attention [1–5]. With a few minor variations,
the general goal of sparse reconstruction is to optimally estimate the
parameters of the following canonical model:
t = Φw + ξ,
(1)
where t ∈ R^N is a vector of targets, Φ = [φ_1, ..., φ_L] is a design matrix with L columns corresponding to basis functions φ_l ∈ R^N, l = 1, ..., L, and w = [w_1, ..., w_L]^T is a vector of weights that are to be estimated. The additive perturbation ξ is typically assumed to be a white Gaussian random vector with zero mean and covariance matrix Σ = τ^{-1} I, where τ is a noise precision parameter. Imposing constraints on the model parameters w is key to sparse signal modeling [3].
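As a concrete illustration, the canonical model (1) can be simulated in a few lines. The dimensions, sparsity level, and noise precision below are arbitrary choices made for this sketch, not values taken from the paper.

```python
# Sketch of the canonical model (1): t = Phi w + xi, where xi is white
# Gaussian noise with precision tau. All dimensions here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N, L, tau = 100, 20, 100.0                       # tau: noise precision (1/variance)

Phi = rng.standard_normal((N, L))                # design matrix; columns = basis functions
w = np.zeros(L)
w[rng.choice(L, size=3, replace=False)] = 1.0    # a sparse weight vector
xi = rng.standard_normal(N) / np.sqrt(tau)       # noise with covariance (1/tau) I

t = Phi @ w + xi                                 # vector of targets
```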
(This work was supported in part by an Erwin Schrödinger Postdoctoral Fellowship, FWF Project J2909-N23, in part by the Austrian Science Fund (FWF) under Award S10604-N13 within the national research network SISE, and in part by the U.S. Office of Naval Research under Grant N00014-09-1-0342, the U.S. Army Research Office under Grant W911NF-07-1-0185, and by the NSF Science and Technology Center Grant CCF-0939370.)

In sparse Bayesian learning (SBL) [2,4,6] the weights w are constrained using a parametric prior probability density function (pdf) p(w|α) = N(w|0, diag(α)^{-1}), with the prior parameters α = [α_1, ..., α_L]^T, also called sparsity parameters, being inversely proportional to the width of the pdf. Naturally, a large value of α_l will drive the corresponding weight w_l to zero, thus encouraging a solution with only a few nonzero coefficients.
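The shrinkage role of α_l can be seen directly by sampling from the prior; the particular values below are illustrative, not from the paper.

```python
# The sparsity prior p(w|alpha) = N(w | 0, diag(alpha)^{-1}): a large
# alpha_l concentrates the prior mass of w_l around zero.
import numpy as np

rng = np.random.default_rng(1)
alpha = np.array([1.0, 1e6])                          # small vs. large sparsity parameter
w = rng.standard_normal((10000, 2)) / np.sqrt(alpha)  # draws from the prior

print(w.std(axis=0))   # empirical std follows alpha^{-1/2}
```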
In the relevance vector machine (RVM) approach to the SBL
problem [2] the sparsity parameters α are estimated by maximizing
the marginal likelihood p(t|α,τ) = ∫ p(t|w,τ) p(w|α) dw, which is also termed the model evidence [2,6]; the corresponding estimation approach is then referred to as the Evidence Procedure (EP) [2]. Unfortunately, the RVM solution is known to converge rather slowly and the computational complexity of the algorithm scales as O(L^3) [2,7]; this makes the application of RVMs to large data sets impractical. In [7] an alternative learning scheme was proposed to alleviate
this drawback. This scheme exploits the structure of the marginal
likelihood function to accelerate the maximization via a sequential
addition and deletion of candidate basis functions, thus allowing ef-
ficient implementations of SBL even for “very wide” matrices Φ.
An alternative approach to SBL is based on approximating the
posterior p(w,τ,α|t) with a variational proxy pdf q(w,τ,α) =
q(w)q(τ)q(α) [8] such as to minimize the variational free energy
[9]. There are several advantages of the variational solution to SBL
as compared to that proposed in [2] and [7]: first, the distributions
rather than point estimates of the unobserved variables can be ob-
tained. Second, the variational approach to SBL allows one to ob-
tain analytical approximations to the distributions of interest even
when exact inference of these distributions is intractable. Finally,
the variational methodology provides a general tool for inference on
graphical models that represent extensions of (1), e.g., different pri-
ors, parametric design matrices, etc. Unfortunately, the variational
approach in [8] is equivalent to RVM in terms of estimation com-
plexity and rate of convergence. Also, due to the nature of the vari-
ational approximation, it is no longer possible to exploit the struc-
ture of the marginal likelihood function to implement the learning
more efficiently: the pdfs q(α) and q(τ) are estimated such as to
approximate the true posterior pdfs, thus obscuring the structure of
the marginal likelihood that was exploited in [7].
Nonetheless, it can be shown [10] that by computing the fixed
points of the update expressions for the variational parameters of
q(α) one can establish a dependency between the optimum of the
variational free energy and a sparsity parameter of a single basis
function φl. Our goal in this paper is to extend these results by
constructing a fast adaptive variational SBL scheme that can be used
to implement adaptive SBL by allowing deletion and addition of new
basis functions. We show that the criteria that guarantee the conver-
gence of a sparsity parameter for a basis function, which we term
the pruning condition, can be used either to prune or to add new ba-
sis functions. The computation of the pruning conditions requires
knowing only the target vector t and the posterior covariance matrix
of the weights w. Moreover, the proposed adaptive scheme requires
only O(L^2) operations for adding or deleting a basis function. We
also show that, when adding a new basis function to the model, the
fixed points of V-SBL also maximize the marginal likelihood func-
tion. However, in contrast to the pruning conditions used in [7], our
conditions have a simple interpretation that allows constructing a
sparse representation that guarantees a certain desired quality of the
estimated components in terms of their individual signal-to-noise ra-
tios (SNRs).
Throughout the paper we shall make use of the following nota-
tion. Vectors and matrices are represented as respectively boldface
lowercase letters, e.g., x, and boldface uppercase letters, e.g., X.
For vectors and matrices (·)^T denotes the transpose. The expression [B]_lk denotes a matrix obtained by deleting the lth row and kth column from the matrix B; similarly, [b]_l denotes a vector obtained by deleting the lth element from the vector b. With a slight abuse of notation we will sometimes refer to a matrix as a set of column vectors; for instance we write a ∈ X to imply that a is a column in X, and X \ a to denote a matrix obtained by deleting the column vector a ∈ X. We use e_l = [0, ..., 0, 1, 0, ..., 0]^T to denote a canonical vector of appropriate dimension. Finally, for a random vector x, N(x|a,B) denotes a multivariate Gaussian pdf with mean a and covariance matrix B; similarly, for a random variable x, Ga(x|a,b) = (b^a / Γ(a)) x^{a−1} exp(−bx) denotes a gamma pdf with parameters a and b.
2. VARIATIONAL SPARSE BAYESIAN LEARNING
For the purpose of further analysis let us assume that we have a dic-
tionary D of some potential basis functions. D is assumed to consist
of an active dictionary Φ used in (1) and a passive dictionary Φ^C such that D = Φ ∪ Φ^C and Φ ∩ Φ^C = ∅.
In SBL it is assumed that the joint pdf of all the variables factors as p(w,τ,α,t) = p(t|w,τ) p(w|α) p(α) p(τ) [2,4,6]. Under the Gaussian noise assumption, p(t|w,τ) = N(t|Φw, τ^{-1}I). The sparsity prior p(w|α) is assumed to factor as p(w|α) = ∏_{l=1}^{L} p(w_l|α_l), where p(w_l|α_l) = N(w_l|0, α_l^{-1}). The choice of the prior p(τ) is arbitrary in the context of this work; a convenient choice would be a gamma distribution due to conjugacy properties, e.g., p(τ) = Ga(τ|c,d). The prior p(α_l), also called the hyperprior of the lth component, is selected as a gamma pdf Ga(α_l|a_l,b_l). We will however consider an automatic relevance determination scenario, obtained when a_l = b_l = 0 for all components; this choice renders the hyperpriors non-informative [2,6].

The variational solution to SBL is obtained by finding an approximating pdf q(w,α,τ) = q(w) q(τ) ∏_{k=1}^{L} q(α_k), where q(w) = N(w|ŵ, Ŝ), q(α_l) = Ga(α_l|â_l, b̂_l), and q(τ) = Ga(τ|ĉ, d̂) are the variational approximating factors. With this choice of q(w,α,τ) the variational parameters {ŵ, Ŝ, ĉ, d̂, â_1, b̂_1, ..., â_L, b̂_L} can be found in closed form as follows [8]:
Ŝ = (τ̂ Φ^T Φ + diag(α̂))^{-1},   ŵ = τ̂ Ŝ Φ^T t,   (2)

â_l = a_l + 1/2,   b̂_l = b_l + (ŵ_l^2 + Ŝ_ll)/2,   (3)

ĉ = c + N/2,   d̂ = d + (‖t − Φŵ‖^2 + Trace(Ŝ Φ^T Φ))/2,   (4)

where τ̂ = E_{q(τ)}{τ} = ĉ/d̂, α̂_l = E_{q(α_l)}{α_l} = â_l/b̂_l, ŵ_l is the lth element of the vector ŵ, and Ŝ_ll is the lth element on the main diagonal of the matrix Ŝ.
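A single sweep of the closed-form updates (2)-(4) under the ARD choice a_l = b_l = c = d = 0 can be sketched as follows. The synthetic data, the initial values of α̂ and τ̂, and the variable names are our own illustrative choices.

```python
# One pass of the variational updates (2)-(4) with a_l = b_l = c = d = 0 (ARD).
import numpy as np

rng = np.random.default_rng(2)
N, L = 50, 5
Phi = rng.standard_normal((N, L))
t = Phi @ np.array([1.0, 0, 0, 2.0, 0]) + 0.01 * rng.standard_normal(N)

alpha_hat = np.ones(L)          # current E{alpha_l}
tau_hat = 1.0                   # current E{tau}

# (2): q(w) = N(w | w_hat, S_hat)
S_hat = np.linalg.inv(tau_hat * Phi.T @ Phi + np.diag(alpha_hat))
w_hat = tau_hat * S_hat @ Phi.T @ t

# (3): q(alpha_l) = Ga(a_hat_l, b_hat_l), with a_l = b_l = 0
a_hat = 0.5 * np.ones(L)
b_hat = 0.5 * (w_hat**2 + np.diag(S_hat))
alpha_hat = a_hat / b_hat       # E{alpha_l} = a_hat_l / b_hat_l

# (4): q(tau) = Ga(c_hat, d_hat), with c = d = 0
c_hat = N / 2.0
d_hat = 0.5 * (np.sum((t - Phi @ w_hat) ** 2) + np.trace(S_hat @ Phi.T @ Phi))
tau_hat = c_hat / d_hat
```

After one sweep the irrelevant components already receive much larger sparsity parameters than the active ones, which is the mechanism the pruning analysis below builds on.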
2.1. Adaptive fast variational SBL
Although expressions (2)-(4) reduce to those obtained in [2] when the approximating factors q(τ) and q(α_l) are chosen as Dirac measures on the corresponding domains¹, they do not reveal the structure of the marginal likelihood function that leads to an efficient SBL algorithm in [7]. Nonetheless, an analysis similar to [7] can be performed by computing the fixed points of the update expression for the variational parameters of a single factor q(α_l) [10]. In [10] we have shown that, for a given basis function φ_l ∈ Φ, the sequence of estimates {α̂_l^[m]}_{m=1}^{M}, obtained by successively updating the pdfs q(w) and q(α_l), converges to the following fixed point α̂_l^[∞] as M → ∞:²

α̂_l^[∞] = (ω_l^2 − ς_l)^{-1}  if ω_l^2 > ς_l,   and   α̂_l^[∞] = ∞  if ω_l^2 ≤ ς_l,   (5)
where ς_l and ω_l are the pruning parameters defined as

ς_l = (τ̂ φ_l^T φ_l − τ̂^2 φ_l^T Φ_l Ŝ_l Φ_l^T φ_l)^{-1},   (6)

ω_l^2 = (τ̂ ς_l φ_l^T t − τ̂^2 ς_l φ_l^T Φ_l Ŝ_l Φ_l^T t)^2.   (7)
The parameters ς_l and ω_l^2 depend on Φ_l = Φ \ φ_l, α̂_l = [α̂]_l, and

Ŝ_l = (τ̂ Φ_l^T Φ_l + diag(α̂_l))^{-1} = [ Ŝ − (Ŝ e_l e_l^T Ŝ)/(e_l^T Ŝ e_l) ]_ll,   (8)
where (8) is the posterior covariance matrix of the weights obtained
when the basis function φ_l is removed from Φ. The result (5) provides a simple criterion for pruning a basis function from the active dictionary: a finite value of α̂_l^[∞] instructs us to keep the lth component in the model since it should minimize the free energy, while an infinite value of α̂_l^[∞] indicates that the basis function φ_l is superfluous. It can be shown [10] that the test parameters ω_l^2 and ς_l are the squared weight of the basis function φ_l and the weight's variance computed when α̂_l equals zero – a fact that will become more evident later when we consider the inclusion of a basis function in the active dictionary. The pruning test is applied sequentially to all the basis functions in the active dictionary Φ to determine which components should be pruned. Algorithm 1 summarizes the key steps of this procedure. For further details we refer the reader to [10].
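Expressions (6)-(7) and the fixed point (5) translate directly into code. The helper below is a sketch: the function name is ours, and the leave-one-out covariance S_l of (8) is assumed to be supplied by the caller.

```python
# Pruning parameters (6)-(7) and the fixed point (5) for one component.
import numpy as np

def fixed_point_alpha(l, Phi, t, tau, S_l):
    """Return (varsigma_l, omega2_l, alpha_fixed) for basis function l.

    S_l is the posterior weight covariance with component l removed, as in (8).
    alpha_fixed is np.inf when the component should be pruned."""
    phi_l = Phi[:, l]
    Phi_l = np.delete(Phi, l, axis=1)
    # (6): varsigma_l = (tau phi'phi - tau^2 phi' Phi_l S_l Phi_l' phi)^{-1}
    varsigma = 1.0 / (tau * phi_l @ phi_l
                      - tau**2 * phi_l @ Phi_l @ S_l @ Phi_l.T @ phi_l)
    # (7): omega_l^2 = (tau vs phi't - tau^2 vs phi' Phi_l S_l Phi_l' t)^2
    omega2 = (tau * varsigma * phi_l @ t
              - tau**2 * varsigma * phi_l @ Phi_l @ S_l @ Phi_l.T @ t) ** 2
    # (5): finite fixed point only if omega2 exceeds varsigma
    alpha = 1.0 / (omega2 - varsigma) if omega2 > varsigma else np.inf
    return varsigma, omega2, alpha
```

For a strong component, ω_l^2 estimates the squared weight while ς_l is its variance, so the test ω_l^2 > ς_l is comfortably satisfied.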
Algorithm 1 A test for a deletion of a basis function φ_l ∈ Φ
function TestComponentPrune(l)
  Compute ς_l from (6) and ω_l^2 from (7)
  if ω_l^2 > ς_l then
    Δα̂_l ← (ω_l^2 − ς_l)^{-1} − α̂_l
    Ŝ ← Ŝ − (Ŝ e_l e_l^T Ŝ)/(Δα̂_l^{-1} + e_l^T Ŝ e_l),   α̂_l ← (ω_l^2 − ς_l)^{-1}
  else
    Ŝ ← Ŝ_l,  α̂ ← [α̂]_l,  Φ^C ← [Φ^C, φ_l],  Φ ← Φ \ φ_l
  end if
It is natural to ask whether this scheme can be made fully adaptive by also allowing inclusion of new basis functions from the passive dictionary Φ^C. Let us assume that at some iteration of the algorithm we have L basis functions in the active dictionary Φ and an estimate of Ŝ and α̂. Our goal is to test whether a basis function φ_{L+1} ∈ Φ^C should be included in Φ.

¹In [2] the posterior pdf of the weights w is Gaussian; its parameters coincide with the variational parameters of q(w) in (2).

²Notice that since a_l = b_l = 0, the parameters of q(α_l) in (3) can be specified as â_l = 1/2 and b̂_l = 1/(2α̂_l), where α̂_l = 1/(ŵ_l^2 + Ŝ_ll). Thus, it makes sense to study the fixed point of the variational update expressions in terms of α̂_l, rather than in terms of â_l and b̂_l.
Assume for the moment that φ_{L+1} is in the active dictionary and that Φ_{L+1} = [Φ, φ_{L+1}], α̂_{L+1} = [α̂^T, α̂_{L+1}]^T, and Ŝ_{L+1} = (τ̂ Φ_{L+1}^T Φ_{L+1} + diag(α̂_{L+1}))^{-1} are available. Then we can compute α̂_{L+1}^[∞] from (5) and determine whether the basis function φ_{L+1} should be kept in the model. This can be done efficiently using only the current active dictionary Φ and the corresponding covariance matrix Ŝ. First, consider a matrix S_{L+1} obtained from Ŝ_{L+1} by setting α̂_{L+1} = 0:

S_{L+1} = [ τ̂ Φ^T Φ + diag(α̂)    τ̂ Φ^T φ_{L+1}        ]^{-1}
          [ τ̂ φ_{L+1}^T Φ         τ̂ φ_{L+1}^T φ_{L+1}  ]

        = [ X_{L+1}^{-1}                       −τ̂ Ŝ Φ^T φ_{L+1} y_{L+1}^{-1} ]
          [ −τ̂ y_{L+1}^{-1} φ_{L+1}^T Φ Ŝ     y_{L+1}^{-1}                   ],   (9)

where X_{L+1} = Ŝ^{-1} − τ̂ Φ^T φ_{L+1} (φ_{L+1}^T φ_{L+1})^{-1} φ_{L+1}^T Φ and

y_{L+1} = τ̂ φ_{L+1}^T φ_{L+1} − τ̂^2 φ_{L+1}^T Φ Ŝ Φ^T φ_{L+1}.   (10)
By comparing (6) and (10) we immediately notice that ς_{L+1} = y_{L+1}^{-1}, which is the variance of the weight for φ_{L+1} when α̂_{L+1} = 0. Thus, ω_{L+1}^2 can be computed from Φ_{L+1} and S_{L+1} as

ω_{L+1}^2 = e_{L+1}^T (τ̂ S_{L+1} Φ_{L+1}^T t)(τ̂ S_{L+1} Φ_{L+1}^T t)^T e_{L+1}.   (11)

By substituting (9) into (11) and simplifying the resulting expression we finally obtain

ω_{L+1}^2 = (τ̂ ς_{L+1} φ_{L+1}^T t − τ̂^2 ς_{L+1} φ_{L+1}^T Φ Ŝ Φ^T t)^2,   (12)

which is identical to (7) with the exception that (12) uses the new basis function φ_{L+1}, the current design matrix Φ, and the weight covariance matrix Ŝ. Once the test parameters ω_{L+1}^2 and ς_{L+1} are computed, we can test whether the basis function φ_{L+1} should be included in the active dictionary Φ. It should be mentioned that in [7] the authors compute parameters q_{L+1} and s_{L+1} that maximize the marginal likelihood function (see Eq. (19) in [7]). In fact, it is easy to notice that the expressions for ω_{L+1}^2 in (12) and ς_{L+1} = y_{L+1}^{-1} in (10) coincide respectively with q_{L+1}^2/s_{L+1}^2 and s_{L+1}^{-1}. Consequently, the corresponding pruning test and the value of the sparsity parameter coincide. In the case of pruning an existing basis function the relationship is not that straightforward; nonetheless, the simulation results indicate that both adaptive schemes achieve almost identical performance in terms of the mean squared error and the sparsity of the estimated models.

In case the new basis function is accepted, the weight covariance matrix has to be updated as well. Luckily this can be done quite simply as follows:
Ŝ_{L+1} = [ X̆_{L+1}^{-1}                                      −τ̂ Ŝ Φ^T φ_{L+1} / (α̂_{L+1}^[∞] + y_{L+1}) ]
          [ −τ̂ φ_{L+1}^T Φ Ŝ / (α̂_{L+1}^[∞] + y_{L+1})       (α̂_{L+1}^[∞] + y_{L+1})^{-1}               ],   (13)

where X̆_{L+1} = Ŝ^{-1} − τ̂ Φ^T φ_{L+1} φ_{L+1}^T Φ / (α̂_{L+1}^[∞] + φ_{L+1}^T φ_{L+1}). The inverse of the Schur complement X̆_{L+1} can be computed efficiently using a rank-one update [11].
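The structure of (13) can be sanity-checked numerically: by the block-matrix inversion formula, the bottom-right entry of the exact augmented covariance equals (α̂_{L+1}^[∞] + y_{L+1})^{-1}. The sketch below uses direct inversion for clarity (the paper's rank-one update [11] avoids the full O(L^3) inversion); all values are synthetic.

```python
# Check on (13): after appending phi_new with sparsity parameter alpha_new,
# the bottom-right entry of the augmented covariance S_{L+1} must equal
# (alpha_new + y_{L+1})^{-1}, with y_{L+1} as in (10).
import numpy as np

rng = np.random.default_rng(6)
N, L, tau = 40, 3, 5.0
Phi = rng.standard_normal((N, L))
alpha = np.ones(L)
S = np.linalg.inv(tau * Phi.T @ Phi + np.diag(alpha))

phi_new, alpha_new = rng.standard_normal(N), 0.7
y = tau * phi_new @ phi_new - tau**2 * phi_new @ Phi @ S @ Phi.T @ phi_new  # (10)

Phi_aug = np.column_stack([Phi, phi_new])
alpha_aug = np.append(alpha, alpha_new)
S_aug = np.linalg.inv(tau * Phi_aug.T @ Phi_aug + np.diag(alpha_aug))       # direct

assert np.isclose(S_aug[-1, -1], 1.0 / (alpha_new + y))
```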
In Algorithm 2 we summarize the main steps of this procedure. Observe that the conditions for adding or deleting a basis function depend exclusively on the measurement t and the matrix Ŝ, which essentially determines how well a basis function "aligns" or correlates with the other basis functions in the active dictionary.

Algorithm 2 A test for an addition of a basis function φ_{L+1} ∈ Φ^C
function TestComponentAdd(φ_{L+1})
  Compute ς_{L+1} = y_{L+1}^{-1} from (10) and ω_{L+1}^2 from (12)
  if ω_{L+1}^2 > ς_{L+1} then
    α̂_{L+1}^[∞] ← (ω_{L+1}^2 − ς_{L+1})^{-1};  α̂ ← [α̂^T, α̂_{L+1}^[∞]]^T
    Φ ← [Φ, φ_{L+1}];  Φ^C ← Φ^C \ φ_{L+1};  update Ŝ from (13)
  else
    Reject φ_{L+1}
  end if
Notice that the ratio ω_l^2/ς_l can be interpreted as an estimate of the component signal-to-noise ratio³, SNR_l = ω_l^2/ς_l [10]. Thus, SBL prunes a component if its estimated SNR is below 0 dB. This interpretation allows a simple adjustment of the pruning condition as follows:

ω_l^2 > ς_l × SNR*,   (14)

where SNR* is the adjustment SNR. This modified pruning condition (14) can be used both when adding as well as when pruning components. Such an adjustment might be of practical interest in scenarios for which the true SNR is known and the goal is to delete spurious components introduced by SBL due to the "imperfection" of the Gaussian sparsity prior, or when we wish a sparse estimator that guarantees a certain quality of the estimated components in terms of their SNRs.
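The adjusted test (14) is a one-liner; expressing the threshold in dB, as done in the experiments, is our convenience choice.

```python
# SNR-adjusted pruning condition (14): keep a component only if its
# estimated component SNR omega^2/varsigma exceeds SNR* (linear scale).

def keep_component(omega2, varsigma, snr_star_db=0.0):
    snr_star = 10.0 ** (snr_star_db / 10.0)   # dB -> linear
    return omega2 > varsigma * snr_star

# With the default 0 dB threshold this reduces to the original test (5).
print(keep_component(2.0, 1.0))                     # estimated SNR ~ 3 dB -> True
print(keep_component(2.0, 1.0, snr_star_db=10.0))   # needs >= 10 dB -> False
```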
2.2. Implementation aspects
Variational inference typically requires choosing initial values for
the variational parameters of q(w,α,τ). Obviously, the adaptive
ability of the algorithm can be exploited to recover the initial factorization by assuming Φ = ∅ and Φ^C = D, and by selecting the initial value of the noise precision τ̂. The algorithm sequentially adds com-
ponents using Alg. 2 and prunes irrelevant ones using Alg. 1. The
pdfs q(w) and q(τ) can be re-computed from (2) and (4) at any stage
of the algorithm. It is important to mention that the order in which
basis functions are added from the passive dictionary influences the
final sparsity of the algorithm, which, as has been pointed out in [7],
is related to the greediness of the fast SBL method. In our imple-
mentation of the fast adaptive V-SBL algorithm we rank all the com-
ponents in ΦCby pre-computing their sparsity parameters ˆ αl and
begin the inclusion with those basis functions that have the smallest
value of α̂_l, i.e., those functions that are best aligned with the measurement t. Also notice that updating q(τ) requires re-computing Ŝ, which requires O(L^3) operations.
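The candidate ranking described above can be sketched as follows: each passive basis function is scored against an empty active dictionary (so (10) and (12) collapse to simple inner products) and candidates are tried in order of ascending α̂_l. The helper name and the scoring shortcut are our interpretation of this ranking step.

```python
# Rank passive-dictionary candidates by their would-be sparsity parameter,
# computed against an empty active dictionary; smallest alpha first.
import numpy as np

def rank_candidates(Phi_passive, t, tau):
    """Return candidate column indices sorted by ascending alpha_hat."""
    scores = []
    for k in range(Phi_passive.shape[1]):
        phi = Phi_passive[:, k]
        y = tau * phi @ phi                      # (10) with Phi empty
        varsigma = 1.0 / y
        omega2 = (tau * varsigma * phi @ t) ** 2  # (12) with Phi empty
        alpha = 1.0 / (omega2 - varsigma) if omega2 > varsigma else np.inf
        scores.append(alpha)                      # inf = would be rejected
    return np.argsort(scores)
```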
3. SIMULATION RESULTS
In this section we compare the performance results of the fast adap-
tive V-SBL with the standard RVM algorithm [2], the fast marginal
likelihood maximization method [7], and fast adaptive variational
SBL with an SNR-adjusted threshold, via simulations.

³To gain some intuition into why this is so, consider (9) and (11) when L = 0 and Φ = ∅.

The standard RVM scheme is non-adaptive; we thus assume that all the available basis functions are included in the active dictionary. Note that the
standard RVM algorithm requires one to specify a threshold for the
hyperparameters α̂_l, ∀l, to "detect" the divergence of the hyperparameter values. Obviously, this affects the performance of the standard RVM algorithm. In order to simplify the analysis of the simulation results we assumed the variance τ̂^{-1} of the noise to be known
in all simulations. For all compared algorithms the same conver-
gence criterion is used: an algorithm stops when (i) the number of
basis functions between two consecutive iterations has stabilized and
when (ii) the ℓ2-norm of the difference between the values of hyperparameters at two consecutive iterations is less than 10^{-4}.
To test the algorithms we use basis functions φ_k ∈ R^N, k = 1, ..., K, generated by drawing samples from a multivariate Gaussian distribution with zero mean and covariance matrix I, and a sparse vector w with L = 10 nonzero elements equal to 1 at random locations. With this setting we aim to test how the algorithm's pruning mechanism performs when the exact sparsity of the model is known. We set N = 100 and K = 200. The target vector t
is generated according to (1). The performance of the tested algorithms, averaged over 100 independent runs, is summarized in Fig. 1.
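The synthetic setup described above can be reproduced along these lines. How the noise variance is matched to a target SNR is our assumption, since the paper does not spell this detail out.

```python
# Synthetic setup of Section 3: K = 200 Gaussian basis functions of length
# N = 100, and a weight vector with L = 10 unit entries at random locations.
import numpy as np

rng = np.random.default_rng(7)
N, K, L_true = 100, 200, 10

D = rng.standard_normal((N, K))                  # full dictionary
w = np.zeros(K)
w[rng.choice(K, size=L_true, replace=False)] = 1.0

snr_db = 30.0                                    # target SNR for this run
signal = D @ w
noise_var = np.mean(signal**2) / 10.0 ** (snr_db / 10.0)  # assumed SNR definition
t = signal + np.sqrt(noise_var) * rng.standard_normal(N)  # targets per (1)
```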
Fig. 1. SBL performance. (a) Normalized mean-square error (NMSE) versus the signal-to-noise ratio (SNR); (b) the estimated number of components versus the SNR; (c) the estimated number of components versus the number of iterations for SNR = 30 dB. Compared: standard RVM (threshold 10^4), standard RVM (threshold 10^14), fast marginal likelihood, fast adaptive SBL, and fast adaptive SBL (SNR-adjusted).

For adaptive algorithms each iteration corresponds to two steps: 1) adding components from the passive dictionary and 2) removing
components from the active dictionary. As we see in Fig. 1(a) and
1(b) the variational SBL with adjusted pruning condition (14) out-
performs the other estimation methods in terms of normalized mean-
square error (NMSE) as well as in terms of the number of estimated
components. In fact, it is able to recover the true model sparsity for
SNR > 10dB. The standard RVM recovers the true model spar-
sity only for SNR > 40dB with a pruning threshold 104; increas-
ing this threshold leads to an over-estimation of the true sparsity.
In Fig. 1(c) we plot the convergence rate of the algorithms for the
SNR = 30dB. Here the variational SBL with SNR-adjusted prun-
ing is also a clear winner, reaching the stopping criterion in less
than 10 iterations. Note, however, that both the fast variational SBL
algorithm without the SNR-adjusted pruning and the fast marginal
likelihood maximization algorithm exhibit very fast convergence;
nonetheless they tend to overestimate the true model sparsity, as seen
from Fig. 1(b).
4. CONCLUSION
In this work a fast adaptive variational Sparse Bayesian Learning
(V-SBL) framework has been considered. The fast V-SBL algorithm
optimizes a variational free energy with respect to variational param-
eters of the pdf of a single component. The fixed points of sparsity
parameter update expressions as well as conditions that guarantee
convergence to these fixed points – pruning conditions – have been
obtained in a closed form. This significantly improves the conver-
gence rate of V-SBL. The pruning conditions also reveal the rela-
tionship between the performance of SBL in terms of the number
of estimated components and a measure of SNR. This relationship
enables an empirical adjustment that allows inclusion of the compo-
nents that guarantee a predefined quality in terms of their individ-
ual SNRs. Setting the adjustment parameter to the true SNR allows
extraction of the true sparsity in simulated scenarios. Simulation
studies demonstrate that this adjustment further accelerates the con-
vergence rate of the algorithm.
5. REFERENCES
[1] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic de-
composition by basis pursuit,” SIAM J. Scientific Computing,
vol. 20, pp. 33–61, 1998.
[2] M. Tipping, “Sparse Bayesian learning and the relevance vec-
tor machine,” J. Machine Learning Res., vol. 1, pp. 211–244,
June 2001.
[3] E. J. Candès and M. B. Wakin, “An introduction to compressive sampling,” IEEE Signal Process. Mag., vol. 25, no. 2, pp. 21–30, Mar. 2008.
[4] D. G. Tzikas, A. C. Likas, and N. P. Galatsanos, “The vari-
ational approximation for Bayesian inference,” IEEE Signal
Process. Mag., vol. 25, no. 6, pp. 131–146, November 2008.
[5] W. Bajwa, J. Haupt, A. Sayeed, and R. Nowak, “Compressed
channel sensing: A new approach to estimating sparse multi-
path channels,” Proceedings of the IEEE, vol. 98, no. 6, pp.
1058–1076, Jun. 2010.
[6] D. Wipf and B. Rao, “Sparse Bayesian learning for basis selec-
tion,” IEEE Trans. on Signal Process., vol. 52, no. 8, pp. 2153
– 2164, Aug. 2004.
[7] M. E. Tipping and A. C. Faul, “Fast marginal likelihood max-
imisation for sparse Bayesian models,” in Proc. 9th Int. Work-
shop Artif. Intelligence and Stat., Key West, FL, January 2003.
[8] C. M. Bishop and M. E. Tipping, “Variational relevance vec-
tor machines,” in Proc. 16th Conf. Uncer. in Artif. Intell.
San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.,
2000, pp. 46–53.
[9] C. M. Bishop, Pattern Recognition and Machine Learning.
New York: Springer, 2006.
[10] D. Shutin, T. Buchgraber, S. R. Kulkarni, and H. V. Poor, “Fast
variational sparse Bayesian learning with automatic relevance
determination,” submitted to IEEE Trans. on Signal Process.
[11] W. W. Hager, “Updating the inverse of a matrix,” SIAM Review, vol. 31, no. 2, pp. 221–239, 1989.