A sparse variational Bayesian approach for fMRI data analysis.
ABSTRACT The aim of this work is to propose a new approach for the determination of the design matrix in fMRI experiments. The design matrix embodies all available knowledge about experimentally controlled factors and potential confounds. This knowledge is expressed through the regressors of the design matrix. However, in a particular fMRI time series some of those regressors may not be present. To take this prior information into account, a Bayesian approach based on a hierarchical prior, which expresses the sparsity of the design matrix, is used over the parameters of the general linear model. The proposed method automatically prunes the columns of the design matrix that are irrelevant to the generation of the data. The evaluation of the proposed approach on simulated and real experiments has shown higher performance compared to the conventional t-test approach.

Vangelis P. Oikonomou, Evanthia E. Tripoliti and Dimitrios I. Fotiadis
I. INTRODUCTION
Functional magnetic resonance imaging (fMRI) is a procedure
that uses MR imaging to measure the tiny metabolic
changes that take place in an active part of the brain. fMRI
is becoming the diagnostic method of choice for learning
how a normal, diseased or injured brain is working, as
well as for assessing the potential risks of surgery or other
invasive treatments of the brain. Functional MRI is based
on the increase in blood flow to the local vasculature that
accompanies neural activity of the brain [11], [5]. When
neurons are activated, the resulting increased need for oxygen
is overcompensated by a large increase in perfusion. As a
result, the venous oxyhemoglobin concentration increases
and the deoxyhemoglobin concentration decreases. The latter
has paramagnetic properties and the intensity of the fMRI
images increases in the activated areas. As the conditions
are alternated, the signal in the activated voxels increases and
decreases according to the paradigm. fMRI detects changes
of deoxyhemoglobin levels and generates blood oxygen level
dependent (BOLD) signals related to the activation of the
neurons [11], [5].
The objective of fMRI data analysis is to detect the weak BOLD
signal in the noisy data and to determine the activated regions of
the brain. The analysis of fMRI images consists of two basic
stages, preprocessing and statistical analysis. Data preprocessing
is carried out in four stages: slice timing, motion correction,
spatial normalization and spatial smoothing [5]. All the
preprocessing stages prepare the fMRI time series for statistical
analysis. In the statistical analysis a general linear model (GLM)
[6] is used to make inference about the parameters of the model,
and a statistic, usually a t or F statistic [5], is then calculated
to decide whether there is activation. For the estimation of the
parameters of the GLM two general frameworks exist, which lead to
different statistics about the activation: the classical approach
and the Bayesian approach. A comparison between the classical and
the Bayesian approach in fMRI data analysis is beyond the scope of
this work; the interested reader can refer to [7], [9], [16].
V.P. Oikonomou is with the Unit of Medical Technology and Intelligent
Information Systems, Dept. of Computer Science, University of Ioannina,
GR 45110 Ioannina, Greece (voikonom@cs.uoi.gr).
E.E. Tripoliti is with the Unit of Medical Technology and Intelligent
Information Systems, Dept. of Computer Science, University of Ioannina,
GR 45110 Ioannina, Greece (evi@cs.uoi.gr).
D.I. Fotiadis is with the Unit of Medical Technology and Intelligent
Information Systems, Dept. of Computer Science, University of Ioannina,
GR 45110 Ioannina, Greece, and with the Biomedical Research Institute -
FORTH, GR 45110 Ioannina, Greece (fotiadis@cs.uoi.gr).
The Bayesian framework is not new in fMRI data analysis, and many
works have addressed different issues in this area. In [12] the
authors use the Bayesian framework to estimate the parameters of
the GLM. However, they use a non-informative prior over the
parameters of the GLM, since no prior knowledge about the
parameters is assumed. In [17] the authors concentrate mostly on
the estimation of the noise, which is modeled using an AR model,
rather than on the estimation of the parameters of the GLM. In
[18], [20] Bayesian approaches that use the spatial domain are
presented. In [13] a Bayesian approach is presented which
determines the design matrix in a flexible (automatic) way. To do
that, the authors assume sparsity over the parameters of the GLM.
The sparsity is modeled by a hierarchical prior called Automatic
Relevance Determination (ARD) [15]. However, the hyperparameters
are estimated using a maximum likelihood (ML) principle, which
does not take into account the variability of the hyperparameters.
To account for it, a full Bayesian approach must be used [2]. Our
work addresses this problem.
The use of the Bayesian approach in our work is twofold: first, to
introduce any prior knowledge about the problem and, second, to
determine the design matrix of the experiment automatically. These
two goals can be achieved through the choice of the prior
distribution in the Bayesian framework. The objective in a
Bayesian approach is to obtain the posterior distribution and to
make inference about the parameters of the GLM. However, this is
not an easy task, as the multiple integrations involved are
intractable and approximate approaches must be used. In the
following, the GLM is first presented together with a general
Bayesian approach and a discussion of the prior that we use. Next,
the Variational Bayesian Methodology is presented. After that,
experiments using simulated and real data are provided to compare
the proposed approach with the conventional GLM analysis, and
finally some conclusions are presented.
II. GENERAL LINEAR MODEL AND BAYESIAN
APPROACH
One of the most used models for fMRI data analysis is the GLM. The
GLM is described as:
\[ y = Xw + e, \tag{1} \]
where y is an N x 1 vector containing the fMRI time series, X is a
known N x p design matrix, w is a p x 1 vector of parameters to be
estimated and e is an N x 1 vector containing the noise. In our
study we assume that the noise is white Gaussian with precision
(inverse variance) λ. When the noise is colored rather than white,
a prewhitening procedure can be applied [5]. The unknown
quantities, the parameters w and the precision of the noise λ,
must be estimated from the data. The aim in fMRI data analysis is
to determine the activated regions of the brain through the
parameters w. The parameters w can be estimated using least
squares (LS) as:
\[ \hat{w} = (X^T X)^{-1} X^T y. \tag{2} \]
However, the LS solution does not utilize any prior information
about the parameters. Under the white Gaussian noise assumption
the LS solution also corresponds to the Maximum Likelihood (ML)
solution. To be able to use prior information a Bayesian approach
must be adopted. The prior information is encoded through the
prior distribution.
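As an illustration of the LS estimate in Eq. (2), the following is a minimal sketch in Python with NumPy. The toy data, the sizes N = 50 and p = 3, and the function name least_squares_glm are assumptions made for the example and do not come from the experiments reported here.

```python
import numpy as np

def least_squares_glm(X, y):
    """Least-squares estimate of w in y = Xw + e (Eq. 2)."""
    # lstsq minimises ||y - Xw||^2; for a full-column-rank X this equals
    # (X^T X)^{-1} X^T y but avoids forming the inverse explicitly.
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w_hat

# Hypothetical toy data: N = 50 scans, p = 3 regressors.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
w_true = np.array([1.0, 0.0, -2.0])
y = X @ w_true + 0.1 * rng.standard_normal(50)

w_hat = least_squares_glm(X, y)
w_normal_eq = np.linalg.inv(X.T @ X) @ X.T @ y  # literal form of Eq. (2)
```

Both routes give the same estimate for a well-conditioned design matrix; the explicit inverse is shown only to mirror the formula.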
In Bayesian inference we are interested in the posterior
probability density function (pdf) of the parameters, p(w,λ|y),
given the fMRI time series, which from Bayes' theorem is:
\[ p(w,\lambda \mid y) = \frac{p(y \mid w,\lambda)\, p(w,\lambda)}{p(y)}. \tag{3} \]
In the above equation p(w,λ) is the prior distribution of the
parameters. This pdf should express all the prior knowledge we
have about the values of the parameters. The pdf p(y) is a
function of the fMRI time series only and is constant with respect
to the parameters. Also, the density p(y|w,λ) can be viewed as a
function of the parameters, since the fMRI time series is known.
In that case it is written as L(w,λ;y) and is called the
likelihood function. The posterior distribution can now be written
as:
\[ p(w,\lambda \mid y) \propto L(w,\lambda; y)\, p(w,\lambda). \tag{4} \]
When constructing the prior distribution it is useful to determine
which of the parameters are independently distributed. For this
study we assume that the precision of the noise is independent of
the parameters w, so that:
\[ p(w,\lambda) = p(w)\, p(\lambda). \tag{5} \]
Then, if there is information about the values a particular
parameter may take, it should be used to specify the functional
form of the prior. When nothing is known about the parameters,
this can be expressed by a non-informative prior distribution [3].
In our study we exploit the sparsity of the parameters, hence a
natural choice for the prior distribution is the Automatic
Relevance Determination (ARD) prior [15]. More specifically, the
parameter vector w is treated as a random variable with a Gaussian
prior of zero mean and variance a_i^{-1} for each element in the
vector w:
\[ p(w \mid a) = \prod_{i=1}^{p} N(0, a_i^{-1}). \tag{6} \]
As we can observe, new parameters a_i are introduced. These
parameters are called hyperparameters and control the prior
distribution of the parameter vector w. The ARD prior is a
hierarchical prior [4]. Hierarchical priors are often designed
using conjugate distributions, for analytical convenience and
because prior knowledge can be readily expressed. Empirical Bayes
refers to the practice of optimizing the hyperparameters of the
priors so as to maximize the marginal likelihood of the data set.
This practice is suboptimal since it ignores the uncertainty of
the hyperparameters. A more robust alternative is to define priors
over the hyperparameters. This leads us to a full Bayesian model.
Now the overall prior over the parameters and the hyperparameters
is:
\[ p(w,\lambda,a) = p(w \mid a)\, p(\lambda)\, p(a), \tag{7} \]
and the posterior distribution is:
\[ p(w,\lambda,a \mid y) \propto L(w,\lambda; y)\, p(w \mid a)\, p(\lambda)\, p(a). \tag{8} \]
There are two main goals in Bayesian learning. The first is to
obtain the marginal likelihood in order to perform model
comparison, and the second is to obtain the posterior distribution
of the parameters in order to draw conclusions about the specific
problem, such as which voxels are activated. In both cases the
quantities of interest cannot be evaluated analytically, as the
multiple integrations entering the problem are intractable. In our
case we are interested mainly in the posterior distribution of the
parameters, which cannot be evaluated in closed form. In such
cases approximate approaches must be used, and one such approach
is the Variational Bayesian Methodology [1].
III. VARIATIONAL BAYESIAN METHODOLOGY
Several methods exist to solve the problem of hyperparameter
estimation. In [14] a comparison between the ML-II method
(evidence framework) and the variational approach is presented.
The main conclusion of that work is that the evidence framework
and the variational approach have the same minimum in the limiting
case of the uninformative prior. However, the variational approach
provides an EM-like algorithm, and hence a convergence criterion.
Also, as reported in [4], the evidence framework (or Empirical
Bayes) gives point estimates of the hyperparameters, so it is
prone to overfitting due to the ML estimation. These observations
lead us to use a full Bayesian framework, which overcomes the
problem of overfitting and provides a convergence criterion.
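In the following equations θ represents the quantities to be
estimated; in our case these are the parameters of the GLM, w, the
precision of the noise, λ, and the hyperparameters of the prior,
a_i, i.e. θ = [w, λ, a]. The log-likelihood of the data can be
written as:
\[ \log p(y) = \log \int p(y,\theta)\, d\theta = \log \int q(\theta) \frac{p(y,\theta)}{q(\theta)}\, d\theta \geq \int q(\theta) \log \frac{p(y,\theta)}{q(\theta)}\, d\theta = F(q,\theta), \tag{9} \]
where the inequality follows from Jensen's inequality. It can also
be written as:
\[ \begin{aligned}
\log p(y) &= \int q(\theta) \log p(y)\, d\theta
 = \int q(\theta) \log \frac{p(y,\theta)}{p(\theta \mid y)}\, d\theta \\
&= \int q(\theta) \log \left[ \frac{p(y,\theta)}{q(\theta)} \cdot \frac{q(\theta)}{p(\theta \mid y)} \right] d\theta \\
&= \int q(\theta) \log \frac{p(y,\theta)}{q(\theta)}\, d\theta + \int q(\theta) \log \frac{q(\theta)}{p(\theta \mid y)}\, d\theta \\
&= F(q,\theta) + KL(q \,\|\, p(\theta \mid y)).
\end{aligned} \tag{10} \]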
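The decomposition of log p(y) into a lower bound plus a KL divergence can be checked numerically on a small discrete model, where the integrals become sums. The sketch below is only illustrative: the three-valued θ, the prior and the likelihood values are made up for the example.

```python
import math

# Hypothetical discrete toy model: theta takes three values.
prior = [0.5, 0.3, 0.2]            # p(theta)
likelihood = [0.10, 0.40, 0.70]    # p(y | theta) for the observed y
joint = [p * l for p, l in zip(prior, likelihood)]   # p(y, theta)
evidence = sum(joint)                                # p(y)
posterior = [j / evidence for j in joint]            # p(theta | y)

def free_energy(q):
    """F(q) = sum_theta q log(p(y, theta) / q)."""
    return sum(qi * math.log(ji / qi) for qi, ji in zip(q, joint) if qi > 0)

def kl(q, p):
    """KL(q || p) = sum_theta q log(q / p)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

q = [1 / 3, 1 / 3, 1 / 3]          # an arbitrary approximate posterior
log_evidence = math.log(evidence)
gap = log_evidence - free_energy(q)  # should equal KL(q || posterior)
```

For any choice of q the gap between log p(y) and the bound equals the KL divergence to the true posterior, and the gap closes when q is the posterior itself.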
Maximizing F(q,θ) is equivalent to minimizing the KL divergence
between the approximate posterior and the true posterior. The
variational free energy F(q,θ) is evaluated as:
\[ \begin{aligned}
F(q,\theta) &= \int q(\theta) \log \frac{p(y,\theta)}{q(\theta)}\, d\theta
 = \int q(\theta) \log \frac{p(y \mid \theta)\, p(\theta)}{q(\theta)}\, d\theta \\
&= \int q(\theta) \log p(y \mid \theta)\, d\theta - \int q(\theta) \log \frac{q(\theta)}{p(\theta)}\, d\theta \\
&= \langle \log p(y \mid \theta) \rangle_{q(\theta)} - KL(q \,\|\, p(\theta)),
\end{aligned} \tag{11} \]
where <·>_{q(θ)} is the expectation with respect to the
approximate posterior of the parameters θ. Note that the KL
divergence in Eq. (10) is between the approximate posterior and
the true posterior, while in Eq. (11) it is between the
approximate posterior and the prior of the parameters.
The goal in a variational approach is to choose a suitable form of
q(θ) so that the lower bound can be evaluated. In general, we
choose a family of q distributions and seek the best approximation
within this family by maximizing the lower bound. Since the true
log-likelihood is independent of q, this is equivalent to
minimizing the KL divergence. The KL divergence between the two
distributions q(θ) and p(θ|y) is minimized when q(θ) = p(θ|y) and,
thus, the optimal solution for q(θ) is the true posterior. This
solution does not simplify the problem, so to make progress we
consider a more restricted range of q distributions. One approach
is to consider a parametric form for q(θ) such that q(θ,φ) is
governed by a set of parameters φ [10]. We then minimize the KL
divergence with respect to φ, finding the best approximation
within this family. An alternative approach is to restrict the
functional form of q(θ) by assuming that it factorizes over the
component variables {θ_i} in θ [1]:
\[ q(\theta) = \prod_{i} q_i(\theta_i). \tag{12} \]
Minimizing the KL divergence over all the factorial distributions
q_i(θ_i), we obtain the following result:
\[ q_i(\theta_i) \propto \exp \langle \ln p(y,\theta) \rangle_{k \neq i}, \tag{13} \]
where <·>_{k≠i} denotes expectation with respect to the
distributions q_k(θ_k) for all k ≠ i.
To apply the VB methodology to our problem we approximate the
posterior distribution with the factorized density:
\[ q(w,a,\lambda \mid y) = q(w)\, q(a)\, q(\lambda). \tag{14} \]
We also place a gamma distribution as prior over the precision λ
and over each hyperparameter a_i:
\[ p(a_i) = \Gamma(a_i; b_{a_i}, c_{a_i}), \tag{15} \]
\[ p(\lambda) = \Gamma(\lambda; b_{\lambda}, c_{\lambda}), \tag{16} \]
where
\[ \Gamma(x; b, c) = \frac{1}{\Gamma(c)} \frac{x^{c-1}}{b^{c}} \exp\left\{-\frac{x}{b}\right\}. \tag{17} \]
The overall prior over all hyperparameters is given by:
\[ p(a) = \prod_{i=1}^{p} p(a_i). \tag{18} \]
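In the gamma density of Eq. (17), b acts as a scale parameter and c as a shape parameter, so the mean of the distribution is the product b·c. The following sketch checks this numerically with a simple midpoint rule; the values b = 2 and c = 3, the integration limits and the function names are arbitrary choices for the illustration.

```python
import math

def gamma_pdf(x, b, c):
    """Density of Eq. (17): scale b, shape c."""
    return x ** (c - 1) * math.exp(-x / b) / (math.gamma(c) * b ** c)

def gamma_mean_numeric(b, c, upper=100.0, steps=100_000):
    """Midpoint-rule approximation of the integral of x * pdf(x) on [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x * gamma_pdf(x, b, c)
    return total * h

b, c = 2.0, 3.0                  # hypothetical scale and shape
mean_numeric = gamma_mean_numeric(b, c)
mean_closed_form = b * c         # mean of a gamma with scale b and shape c
```

This product form is what the posterior mean of a gamma-distributed precision or hyperparameter reduces to once its updated scale and shape are known.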
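The posterior over the parameter vector w is a Normal distribution
N(ŵ, C_w) with mean and covariance:
\[ \hat{w} = \hat{\lambda}\, C_w X^T y, \tag{19} \]
\[ C_w = (\hat{\lambda} X^T X + A)^{-1}, \tag{20} \]
where A is a diagonal matrix with the hyperparameters a_i,
evaluated at their current estimates â_i, on its diagonal. The
posterior over the parameter λ is a Gamma distribution with
parameters:
\[ \frac{1}{b'_{\lambda}} = \frac{1}{2}\left( y^T y - 2 y^T X \hat{w} + \mathrm{Tr}\left( X^T X (C_w + \hat{w}\hat{w}^T) \right) \right) + \frac{1}{b_{\lambda}}, \tag{21} \]
\[ c'_{\lambda} = \frac{N}{2} + c_{\lambda}, \tag{22} \]
\[ \hat{\lambda} = b'_{\lambda} c'_{\lambda}. \tag{23} \]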
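Finally, the posterior over each hyperparameter a_i is a Gamma
distribution with parameters:
\[ \frac{1}{b'_{a_i}} = \frac{\langle w_i^2 \rangle}{2} + \frac{1}{b_{a_i}}, \tag{24} \]
\[ c'_{a_i} = \frac{1}{2} + c_{a_i}, \tag{25} \]
\[ \hat{a}_i = b'_{a_i} c'_{a_i}, \tag{26} \]
where, under the Normal posterior of w, <w_i^2> is the squared
posterior mean of w_i plus the i-th diagonal element of C_w.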
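The update equations for q(w), q(λ) and q(a_i) can be iterated in turn until the estimates stop changing. The sketch below is one possible reading of that iteration, not the authors' implementation. It assumes the posterior mean of w has the standard form λ̂ C_w X^T y, uses the same near-flat Gamma prior (b0, c0) for λ and for every a_i, and monitors the change in ŵ instead of the free energy; the function name, the default values and the toy data are all hypothetical.

```python
import numpy as np

def vb_ard_glm(X, y, b0=1e6, c0=1e-6, n_iter=200, tol=1e-8):
    """Mean-field VB for y = Xw + e with an ARD prior on w.

    b0 and c0 are the scale and shape of the Gamma priors on lambda and
    on each a_i; the near-flat defaults are an assumption of this sketch.
    """
    n, p = X.shape
    XtX, Xty, yty = X.T @ X, X.T @ y, y @ y
    lam_hat = 1.0                 # initial E[lambda]
    a_hat = np.ones(p)            # initial E[a_i]
    w_hat = np.zeros(p)
    for _ in range(n_iter):
        w_old = w_hat
        # q(w): Gaussian with covariance C_w and mean w_hat
        C_w = np.linalg.inv(lam_hat * XtX + np.diag(a_hat))
        w_hat = lam_hat * C_w @ Xty
        # q(lambda): Gamma; expected_sse is E[||y - Xw||^2] under q(w)
        second_moment = C_w + np.outer(w_hat, w_hat)      # E[w w^T]
        expected_sse = yty - 2 * w_hat @ Xty + np.trace(XtX @ second_moment)
        b_lam = 1.0 / (0.5 * expected_sse + 1.0 / b0)
        c_lam = 0.5 * n + c0
        lam_hat = b_lam * c_lam
        # q(a_i): Gamma; diag(second_moment) holds E[w_i^2]
        b_a = 1.0 / (0.5 * np.diag(second_moment) + 1.0 / b0)
        c_a = 0.5 + c0
        a_hat = b_a * c_a
        if np.max(np.abs(w_hat - w_old)) < tol:
            break
    return w_hat, C_w, lam_hat, a_hat

# Hypothetical toy problem: two relevant and two irrelevant regressors.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 4))
w_true = np.array([2.0, 0.0, 0.0, -1.0])
y = X @ w_true + 0.5 * rng.standard_normal(100)
w_hat, C_w, lam_hat, a_hat = vb_ard_glm(X, y)
```

In a run like this the hyperparameters of the irrelevant regressors grow much larger than those of the relevant ones, which shrinks the corresponding weights toward zero; this is the pruning behaviour the ARD prior is meant to produce.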
A. Discussion about the prior over the parameters w
The prior over one parameter w_i depends on the hyperparameter
a_i. The "true" prior is given by integrating over the
hyperparameter:
\[ p(w_i) = \int p(w_i \mid a_i)\, p(a_i)\, da_i. \tag{27} \]
The prior over the hyperparameter is given by Eq. (15), while the
conditional density p(w_i|a_i) is given by Eq. (6). Carrying out
the above integration we obtain for the parameter prior:
\[ p(w_i) \propto \left( \frac{1}{b_{a_i}} + \frac{w_i^2}{2} \right)^{-(c_{a_i} + \frac{1}{2})}, \tag{28} \]
which is the kernel of a Student-t density. If we let c_{a_i} → 0
and b_{a_i} → ∞ we obtain the hyperprior:
\[ p(a_i) \propto \frac{1}{a_i}, \tag{29} \]
which is a non-informative prior [3]. Now, the true prior for one
parameter, w_i, is
\[ p(w_i) \propto \frac{1}{|w_i|}, \tag{30} \]
and for all parameters:
\[ p(w) \propto \prod_{i=1}^{p} \frac{1}{|w_i|}. \tag{31} \]
This prior is recognized as sparse due to its heavy tails and its
sharp peak at zero [2], [19].
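For completeness, the integration that leads from the Gaussian conditional prior and the gamma hyperprior to the Student-t kernel can be written out as follows. This is a standard calculation using the identity \int_0^\infty x^{k-1} e^{-\beta x} dx = \Gamma(k) \beta^{-k}; constants that do not depend on w_i are dropped.

```latex
\begin{aligned}
p(w_i) &= \int_0^\infty N(w_i; 0, a_i^{-1})\, \Gamma(a_i; b_{a_i}, c_{a_i})\, da_i \\
&\propto \int_0^\infty a_i^{1/2} e^{-a_i w_i^2 / 2}\; a_i^{c_{a_i} - 1} e^{-a_i / b_{a_i}}\, da_i \\
&= \int_0^\infty a_i^{(c_{a_i} + 1/2) - 1}
   \exp\left\{ -a_i \left( \frac{1}{b_{a_i}} + \frac{w_i^2}{2} \right) \right\} da_i \\
&= \Gamma\!\left(c_{a_i} + \tfrac{1}{2}\right)
   \left( \frac{1}{b_{a_i}} + \frac{w_i^2}{2} \right)^{-(c_{a_i} + \frac{1}{2})}.
\end{aligned}
```

Letting c_{a_i} → 0 and b_{a_i} → ∞ in the last line leaves (w_i^2/2)^{-1/2}, which is proportional to 1/|w_i|.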
IV. RESULTS
A. Simulated Data
We estimated the parameters of the GLM using the LS approach and
the proposed approach. The statistical evaluation is performed by
computing the t-test value for each voxel of the image [11]. The
t-statistic is given as:
\[ t = \frac{c^T \hat{w}}{\sqrt{c^T C_w c}}, \tag{32} \]
where c is a contrast vector, ŵ are the parameters estimated by
each method and c^T C_w c is the variance of the effect under each
method. Although the use of the t-test for the Bayesian approach
is inconsistent, experimental results presented in the literature
have shown the usefulness of this approach [13]. For a discussion
on this subject the interested reader can refer to [9]. The
contrast vector c specifies particular differences of the
parameters w. It has the same length as w and specifies a linear
combination of the parameters, c^T w.
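A minimal sketch of the contrast statistic, assuming the parameter estimate and its covariance are already available from either method. The three-regressor example values and the function name contrast_t are hypothetical.

```python
import numpy as np

def contrast_t(c, w_hat, C_w):
    """t = c^T w_hat / sqrt(c^T C_w c)."""
    c = np.asarray(c, dtype=float)
    return float(c @ w_hat / np.sqrt(c @ C_w @ c))

# Hypothetical estimates for a design with three regressors.
w_hat = np.array([0.1, 2.0, 5.0])
C_w = np.diag([0.04, 0.25, 0.01])
c = [0, 1, 0]                      # test the second regressor only
t_value = contrast_t(c, w_hat, C_w)   # 2.0 / sqrt(0.25) = 4.0
```

With a contrast that selects a single regressor, the statistic reduces to that regressor's estimate divided by its standard deviation.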
Fig. 1. ROC curves for simulated data.
The detection ability of the proposed method and of the
conventional t-test is compared using receiver operating
characteristic (ROC) analysis. ROC analysis reflects the ability
of a processing method to detect most of the real activations
while minimizing the detection of false activations. In ROC
analysis two values must be computed: the true positive ratio
(TPR) and the false positive ratio (FPR). The ROC curve is a plot
of TPR versus FPR for different threshold values. For the
simulated activated voxels the fMRI time series was modeled as the
BOLD response plus a constant mean value plus noise, while in the
non-activated voxels the BOLD response was absent. The design
matrix contains eight regressors: six for the motion effects, one
for the BOLD response and one for the constant mean value. The
parameters w_i corresponding to the motion regressors were set to
zero for both conditions in the construction of the simulated
data. We created 2000 fMRI time series, of which 1000 correspond
to activated voxels and the other 1000 to non-activated voxels.
The SNR between the BOLD response and the noise in the activated
voxels was 9 dB. In our experiments the contrast vector is
c = [0 0 0 0 0 0 1 0], which means that we examine the stimulus
condition versus rest. The zeros exclude the irrelevant
parameters, in our case the movement parameters and the neutral
condition (mean value). This means that the estimate of the effect
is c^T ŵ = ŵ_7. The ROC curves for the two methods are shown in
Fig. 1. As we can observe, the proposed approach detects more real
activations at the same FPR, which shows the higher performance of
our method.
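The ROC points can be computed directly from the per-voxel statistics and the known ground truth of a simulation. The sketch below uses six made-up t-values and labels only to show the bookkeeping; the function name roc_points and the threshold list are assumptions.

```python
def roc_points(scores, labels, thresholds):
    """Return (FPR, TPR) pairs; a voxel is declared active when score > threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in thresholds:
        tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s > thr)
        fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s > thr)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical t-values for six voxels (1 = truly active, 0 = not active).
scores = [5.1, 3.2, 0.4, 2.8, -0.3, 1.0]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels, thresholds=[4.0, 2.0, 0.5, -1.0])
```

Lowering the threshold moves along the curve from the origin toward the point (1, 1); a method whose curve lies above another detects more true activations at the same false positive ratio.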
B. Real fMRI Data
The proposed method is validated on real fMRI data from a block
design experiment. This fMRI experiment was designed for an
auditory processing task on a healthy volunteer. It consisted of
96 acquisitions. The acquisitions were made in blocks of 6, giving
16 blocks of 42 sec duration. The condition for successive blocks
alternated between rest and auditory stimulation, starting with
rest. Auditory stimulation was with bisyllabic words presented
binaurally at a rate of 60 words per minute. The functional data
starts at acquisition 16. Due to T1 effects the first two blocks
were discarded. The whole brain BOLD/EPI images were acquired on a
modified 2T Siemens MAGNETOM Vision system. Each acquisition
consisted of 64 slices (64x64x64, 3mm x 3mm x 3mm voxels).
Acquisition lasted 6.05 sec, with the scan-to-scan repetition time
set to 7 sec. After preprocessing, functional images consisted of
68 slices (79x95x68, 2mm x 2mm x 2mm voxels). The data were
downloaded from http://www.fil.ion.ucl.ac.uk/spm/.
The design matrix used to model the fMRI experiment consisted of
84 rows (one for each observation) and 8 columns. The first 6
columns contain the regressors due to motion (realignment
parameters computed in the preprocessing stage) and the other 2
columns contain the regressors for the BOLD response and a
constant mean value.
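The layout of such a design matrix can be sketched as follows. This is an illustration under stated assumptions rather than the matrix used in the experiment: the BOLD regressor is a plain boxcar without convolution with a haemodynamic response function, the retained scans are assumed to start with a rest block, and the motion columns are random placeholders standing in for the realignment parameters.

```python
import numpy as np

def block_design_matrix(motion, block_len=6, start_with_rest=True):
    """Stack [motion | boxcar | constant] into an N x (k + 2) design matrix."""
    n_scans = motion.shape[0]
    block_index = np.arange(n_scans) // block_len
    # Odd-numbered blocks are stimulation when the run starts with rest.
    if start_with_rest:
        active = block_index % 2 == 1
    else:
        active = block_index % 2 == 0
    boxcar = active.astype(float)       # 1 during stimulation, 0 during rest
    constant = np.ones(n_scans)         # constant mean value
    return np.column_stack([motion, boxcar, constant])

# Placeholder motion parameters; real ones come from realignment.
rng = np.random.default_rng(0)
motion = 0.01 * rng.standard_normal((84, 6))
X = block_design_matrix(motion)
```

With 84 scans in blocks of 6 this gives 14 alternating blocks, so the boxcar column is active for 42 scans, and the resulting matrix has the 84 x 8 shape described in the text.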
Fig. 2 shows the activation maps obtained from SPM using an
uncorrected height threshold (conventional t-test) and the
posterior probability map [8] of the proposed method, together
with a comparison between them. More specifically, Fig. 2(a) and
Fig. 2(b) depict the activated regions detected by SPM and by the
proposed method, respectively. Those images were then converted to
binary images (Fig. 3(a) and Fig. 3(b)) in order to extract the
perimeter of the activated regions detected using the posterior
probability map. The boundaries were superimposed on the first
statistical activation map; the result is depicted in Fig. 3(c).
We counted the activated voxels in each case: the t-test approach
found 914 activated voxels, while the proposed approach found 364.
The proposed method detects activation in the expected regions of
the auditory cortex with fewer erratic points. The activated
regions of the proposed approach comprise voxels with stronger
activation (high values of the statistical test).
V. CONCLUSIONS
The GLM is a useful tool for fMRI data analysis. At the core of
GLM analysis is the design matrix, which describes the various
effects of the experiment. The construction of the design matrix
is critical for the conclusions drawn. In the classical approach,
the design matrix is fixed before the analysis. To construct a
more flexible design matrix the Bayesian approach is used, which
makes it possible to use prior knowledge about the design matrix.
The proposed method automatically prunes the columns of the design
matrix that are irrelevant to the generation of the data. This
property gives us a design matrix that is defined during the
analysis of the data rather than before it. The experiments, based
on real and simulated data, have shown the usefulness of the
proposed approach compared to the conventional t-test analysis.
REFERENCES
[1] M. Beal. Variational Algorithms for Approximate Bayesian Inference.
PhD thesis, Gatsby Computational Neuroscience Unit, Univ. College
London, London, U.K., 2003.
[2] C.M. Bishop and M.E. Tipping. Variational relevance vector machines.
Proc. 16th Conf. Uncertainty in Artificial Intelligence, pages 46–53,
2000.
[3] G.E.P. Box and G.C. Tiao. Bayesian Inference in Statistical Analysis.
John Wiley and Sons, Inc, 1973.
[4] B.P. Carlin and T.A. Louis. Bayes and Empirical Bayes Methods for
Data Analysis. CRC Press, New York, NY, 2000.
[5] R.S.J. Frackowiak, J.T. Ashburner, W.D. Penny, S. Zeki, K.J. Friston,
C.D. Frith, R.J. Dolan, and C.J. Price. Human Brain Function, Second
Edition. Elsevier Science, USA, 2004.
[6] K. J. Friston. Analysis of fMRI time series revisited. NeuroImage,
2:45–53, 1995.
[7] K. J. Friston, D. E. Glaser, R. N. A. Henson, S. Kiebel, C. Phillips,
and J. Ashburner. Classical and Bayesian inference in neuroimaging:
Applications. NeuroImage, 16:484–512, June 2002.
[8] K. J. Friston and W. Penny. Posterior probability maps and SPMs.
NeuroImage, 19, July 2003.
[9] K. J. Friston, W. Penny, C. Phillips, S. Kiebel, G. Hinton, and
J. Ashburner. Classical and Bayesian inference in neuroimaging:
Theory. NeuroImage, 16:465–483, June 2002.
[10] T.S. Jaakkola. Variational methods for inference and learning in
graphical models. PhD thesis, Mass. Inst. Technol., Cambridge, MA,
1997.
[11] P. Jezzard, P. M. Matthews, and S. M. Smith. Functional MRI: An
Introduction to Methods. Oxford University Press, USA, 2001.
[12] J. Kershaw, B.A. Ardekani, and I. Kanno. Application of Bayesian
inference to fMRI data analysis. IEEE Transactions on Medical
Imaging, 18(12):1138–1153, Dec. 1999.
[13] H. Luo and S. Puthusserypady. A sparse Bayesian method for
determination of flexible design matrix for fMRI data analysis. IEEE
Transactions on Circuits and Systems I: Regular Papers, 52(12):2699–
2706, Dec. 2005.
[14] D. MacKay. Probable networks and plausible predictions - a review of
practical Bayesian methods for supervised neural networks. Network:
Computation in Neural Systems, 6:469–505, 1995.
[15] D. J. MacKay. Bayesian interpolation. Neural Computation, 4:415–
447, 1992.
[16] M.A. Mohamed, F. Abou-Chadi, and B.K. Ouda. Analysis of fMRI data
using classical and Bayesian approaches: A comparative study. IFMBE
Proceedings, World Congress on Medical Physics and Biomedical
Engineering 2006, 14:924–931, 2006.
[17] W. Penny, S. Kiebel, and K. Friston. Variational Bayesian inference
for fMRI time series. NeuroImage, 19:727–741, July 2003.
[18] W. D. Penny, N. J. Trujillo-Barreto, and K. J. Friston. Bayesian fMRI
time series analysis with spatial priors. NeuroImage, 24:350–362, Jan.
2005.
[19] D.P. Wipf and B.D. Rao. Sparse Bayesian learning for basis selection.
IEEE Transactions on Signal Processing, 52:2153–2164, August 2004.
[20] M.W. Woolrich, M. Jenkinson, J.M. Brady, and S.M. Smith. Fully
Bayesian spatio-temporal modeling of fMRI data. IEEE Transactions on
Medical Imaging, 23(2):213–231, Feb. 2004.