
FMRI ANALYSIS THROUGH BAYESIAN VARIABLE SELECTION WITH A SPATIAL PRIOR

Jing Xia1, Feng Liang1, Yongmei Michelle Wang1,2,3

Departments of Statistics1, Psychology2, and Bioengineering3

University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA

ABSTRACT

This paper presents a novel spatial Bayesian method for si-

multaneous activation detection and hemodynamic response

function (HRF) estimation of functional magnetic resonance

imaging (fMRI) data. A Bayesian variable selection approach

is used to induce shrinkage and sparsity, with a spatial prior

on latent variables representing activated hemodynamic re-

sponse components. Then, the activation map is generated

from the full spectrum of posterior inference constructed

through a Markov chain Monte Carlo scheme, and HRFs at

different voxels are estimated non-parametrically with in-

formation pooling from neighboring voxels. By integrating

functional activation detection and HRF estimation in a unified

framework, our method is more robust to noise and less

sensitive to model mis-specification.

Index Terms— Bayesian variable selection, hemody-

namic response function, activation detection, spatial prior,

Markov chain Monte Carlo (MCMC).

1. INTRODUCTION

Functional magnetic resonance imaging (fMRI) is an emerg-

ing modality that provides insights into the study of brain

functions. Most fMRI techniques are based on the so-called

blood oxygen level dependent (BOLD) contrast. One aim of

fMRI analysis is to detect regions that respond to certain stim-

uli, i.e., activation regions. In fact, the fMRI measurement is

indirectly related to neuronal activity through a process that

is still under investigation. A challenge in fMRI is that the

fMRI response to stimuli is not instantaneous, but lagged and

damped by the so-called hemodynamic response. Estimating

hemodynamic response functions (HRFs) has attracted increasing

interest recently, since it provides not only a deep insight

into the underlying dynamics of human brain, but also a basis

for making inference of brain activation regions [1]. In this

paper, we propose a spatial Bayesian method for simultane-

ous activation detection and HRF estimation through variable

selection in an fMRI regression model.

There is a rich literature on fMRI analysis. In the stan-

dard general linear model (GLM) approach [2], HRFs are as-

sumed to be the same at each voxel and take a certain shape

such as a Gamma function or a linear combination of Gamma

functions. This is over-constrained and may lead to inaccu-

rate detection of activation regions. In addition, the voxels

are processed and analyzed separately with little considera-

tion of the spatial correlations among neighboring voxels. Re-

cently, a number of Bayesian methods have been proposed to

model the spatial dependence in fMRI data [3, 4, 5]. Some of

these methods place prior distributions on the activation amplitudes,

which may over-smooth anatomically meaningful peaks

or edges between active and inactive regions;

some assume specific parametric HRFs that could introduce

bias in activation detection due to model mis-specification.

We propose a new spatial Bayesian method to detect ac-

tivation regions and estimate HRFs simultaneously. The two

goals are achieved through Bayesian variable selection for

an fMRI regression model with nonparametric HRFs and a

Markov random field (MRF) prior. Compared with existing

methods for fMRI analysis [6], our work has several dis-

tinct features and advantages. First, the HRF at each voxel

is modeled non-parametrically, which reduces bias due to

model mis-specification and enhances the power of activation

detection. Second, neighborhood information is incorporated

through a spatial MRF prior placed on latent variables

representing activated hemodynamic response components.

This leads to more accurate HRF estimation. Third, detect-

ing functional activation and estimating HRFs are integrated

in a unified (rather than two-stage) framework, which makes

our method more robust and less sensitive to model mis-

specification. Last, the activation map is generated from the

full spectrum of posterior inference constructed through a

Markov chain Monte Carlo (MCMC) scheme.

2. STATISTICAL MODELING IN FMRI

Let $\{y_{it};\ t = t_1, \cdots, t_n\}$ denote the fMRI signal at voxel $i$. We assume that the observed data consist of the BOLD signal response $r_{it}$ and a noise process $e_{it}$, and model $r_{it}$ as a convolution, assuming a linear time-invariant system:

$$y_{it} = s_t * h_{it} + e_{it}, \tag{1}$$

where $*$ denotes the convolution operator, $s_t$ is the external input stimulus at time $t$, with $s_t = 1$ or $0$ indicating the presence or absence of a stimulus, and $h_{it}$ is the HRF at time $t$ after neural activity. There are several sources of fMRI noise, such as system noise and motion noise. Following [5], we model the noise $e_{it}$ as two parts: the large-scale variation or deterministic drift, and the stationary short-scale stochastic variation. The deterministic drift can be removed in a preprocessing stage, for example, by the SPM package. The short-scale stochastic variation can be modeled by AR(1) or ARMA(1,1), as suggested in [2].

978-1-4244-3932-4/09/$25.00 ©2009 IEEE, ISBI 2009


Equivalently, model (1) can be expressed in matrix notation as

$$y_i = S h_i + e_i, \tag{2}$$

where $S$ is a Toeplitz matrix with $(k, l)$th entry equal to $s(t_k - t_l)$ if $k \ge l$ and 0 otherwise, and $y_i = (y_{i1}, \cdots, y_{in})^T$, $h_i = (h_{i1}, \cdots, h_{in})^T$ and $e_i = (e_{i1}, \cdots, e_{in})^T$ represent the data, the HRF and the noise at the $n$ time points at voxel $i$,

respectively. An advantage of our model (2) is that it models

the HRF nonparametrically instead of assuming any paramet-

ric form such as a Gamma or linear combination of Gam-

mas. Further, an individual HRF is estimated for each voxel,

which relaxes the unrealistic assumption, made in many other papers, that all voxels share

the same HRF, and allows heterogeneity

in hemodynamic responses among voxels.
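The equivalence between the convolution form (1) and the matrix form (2) can be checked numerically. The following is a minimal sketch, assuming uniformly spaced time points (so that the matrix entry depends only on the lag) and using made-up toy values for the stimulus and the HRF; the function name `stimulus_design_matrix` is hypothetical and not part of the paper.

```python
import numpy as np

def stimulus_design_matrix(s):
    """Lower-triangular Toeplitz matrix S with S[k, l] = s[k - l] for k >= l, else 0."""
    s = np.asarray(s, dtype=float)
    n = s.size
    lag = np.arange(n)[:, None] - np.arange(n)[None, :]   # lag[k, l] = k - l
    return np.where(lag >= 0, s[np.clip(lag, 0, n - 1)], 0.0)

# Hypothetical toy inputs: a binary stimulus and a short, sparse HRF.
s = np.array([0, 1, 1, 0, 0, 1, 0, 0], dtype=float)
h = np.array([0.5, 1.0, 0.3, 0, 0, 0, 0, 0], dtype=float)

S = stimulus_design_matrix(s)
signal_matrix = S @ h                       # noise-free part of the matrix form (2)
signal_conv = np.convolve(s, h)[: s.size]   # noise-free part of the convolution form (1)
```

Both `signal_matrix` and `signal_conv` hold the same noise-free response, which is why the nonparametric HRF can be treated as an ordinary regression coefficient vector.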

However, an immediate concern with model (2) is overfitting: with a limited sample available for each voxel, one cannot reliably estimate a high-dimensional parameter $h_i$. We address this concern in two ways: first, we pool information across neighboring voxels for estimation, as discussed in the next Section; second, we take into account the sparsity of $h_i$.

As discussed in [7], it is reasonable to assume that $h_i$ is a sparse vector, in the sense that only a small fraction of its $n$ elements is non-zero. In some experiments, if voxel $i$ does not respond to the stimuli, there is no hemodynamic response, that is, all elements in $h_i$ are zero; if a voxel responds to the stimuli, the corresponding HRF usually does not span the whole domain, but first climbs to a peak value and then drops quickly to zero, so only the first few elements in $h_i$ are non-zero. So for each voxel, we introduce a latent variable $\gamma_i \in \{0, 1, \cdots, q\}$ to indicate that only the first $\gamma_i$ hemodynamic response components are non-zero. In particular, $\gamma_i = 0$ corresponds to a non-activated voxel. Data-driven selection of the maximum number of components $q$ can be made by applying change-point detection or other model-selection methods [7].

Given $\gamma_i$, we can further simplify model (2) as

$$y_i = S(\gamma_i)\, h_i(\gamma_i) + e_i, \tag{3}$$

where $h_i(\gamma_i)$ is the vector of the first $\gamma_i$ non-zero coefficients in $h_i$, and $S(\gamma_i)$ is the corresponding $n \times \gamma_i$ design matrix.
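The reduction from the full model to the truncated one can be illustrated with a small numerical check. This is only a sketch under assumed toy values: the helper `truncate_model` is a hypothetical name, and the random lower-triangular matrix merely stands in for a real stimulus design matrix.

```python
import numpy as np

def truncate_model(S, h, gamma):
    """Return S(gamma), the first `gamma` columns of S, and h(gamma), the first `gamma` entries of h."""
    return S[:, :gamma], h[:gamma]

rng = np.random.default_rng(0)
n, gamma = 6, 2
# Stand-in lower-triangular design matrix (assumption, for illustration only).
S = np.tril(rng.integers(0, 2, size=(n, n)).astype(float))
h = np.zeros(n)
h[:gamma] = [0.8, 0.4]            # only the first gamma components are non-zero

S_g, h_g = truncate_model(S, h, gamma)
full_signal = S @ h               # signal under the full model
reduced_signal = S_g @ h_g        # signal under the truncated model
```

Because the trailing entries of the HRF are zero, the two signals coincide, so selecting $\gamma_i$ amounts to selecting how many leading columns of the design matrix enter the regression.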

3. BAYESIAN INFERENCE

We derive a Bayesian approach for statistical inference on the $\gamma_i$'s and $h_i$'s. In this Section, we first describe our prior choice on the unknown parameters, and then the MCMC sampling scheme we use for posterior inference.

3.1. A Variable Selection Prior

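We employ a normal prior on the coefficients $h_i(\gamma_i)$, centered at the null assumption, i.e., $h_i(\gamma_i) = 0$:

$$h_i(\gamma_i) \mid \sigma_i^2, \gamma_i \sim N\!\left(0,\; n\sigma_i^2 \left(S(\gamma_i)^T \Sigma^{-1} S(\gamma_i)\right)^{-1}\right), \tag{4}$$

which is a special case of the popular g-prior and is related to the BIC criterion [8].

One advantage of this prior is that it leads to a closed-form expression of the integrated likelihood, which makes our sampling algorithm more efficient. With a standard non-informative prior for $\sigma_i^2$, $p(\sigma_i^2) \propto 1/\sigma_i^2$, we have

$$p(y_i \mid \gamma_i) \propto R_i(\gamma_i)^{-n/2}\, |\Sigma|^{-1/2}\, (1 + n)^{-\gamma_i/2}, \tag{5}$$

where $R_i(\gamma_i) = y_i^T \Sigma^{-1} y_i - \frac{n}{n+1}\, y_i^T \Sigma^{-1} S(\gamma_i) \left(S(\gamma_i)^T \Sigma^{-1} S(\gamma_i)\right)^{-1} S(\gamma_i)^T \Sigma^{-1} y_i$.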
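To make the integrated likelihood (5) concrete, the sketch below evaluates its logarithm up to an additive constant. It is an illustration under stated assumptions rather than the authors' implementation: the function name is hypothetical, white noise ($\Sigma = I$) is assumed when no `Sigma_inv` is supplied, and the factor $|\Sigma|^{-1/2}$ is dropped because it does not depend on $\gamma_i$. The toy stimulus and HRF values are made up.

```python
import numpy as np

def log_integrated_likelihood(y, S, gamma, Sigma_inv=None):
    """Log of (5) up to a constant: -(n/2) log R(gamma) - (gamma/2) log(1 + n)."""
    n = y.size
    if Sigma_inv is None:                  # assumption: white noise, Sigma = I
        Sigma_inv = np.eye(n)
    quad = y @ Sigma_inv @ y               # y^T Sigma^{-1} y
    if gamma > 0:
        S_g = S[:, :gamma]                 # truncated design matrix S(gamma)
        A = S_g.T @ Sigma_inv @ S_g        # S^T Sigma^{-1} S
        b = S_g.T @ Sigma_inv @ y          # S^T Sigma^{-1} y
        quad = quad - n / (n + 1) * (b @ np.linalg.solve(A, b))
    return -0.5 * n * np.log(quad) - 0.5 * gamma * np.log(1 + n)

# Toy example: unit lower-triangular Toeplitz design and a noise-free two-component response.
s = [1, 0, 1, 1, 0, 0, 1, 0]
n = len(s)
S = np.array([[s[k - l] if k >= l else 0.0 for l in range(n)] for k in range(n)])
y = S[:, :2] @ np.array([1.0, 0.5])

ll_inactive = log_integrated_likelihood(y, S, gamma=0)
ll_two = log_integrated_likelihood(y, S, gamma=2)
```

In this noise-free toy case the two-component model scores higher than the inactive model, while the $(1 + n)^{-\gamma_i/2}$ term penalizes each additional component.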

3.2. An Information Pooling Prior

Let $\delta_i$ denote the set of neighboring voxels around voxel $i$, which, in our case, consists of the 5 (if pixel $i$ is on the boundary) or 8 (if pixel $i$ is not on the boundary) immediately neighboring pixels for 2D data, and 11 or 26 voxels for 3D data.
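The neighborhood sizes quoted above follow from counting all in-bounds grid cells that differ by at most one step along every axis. The sketch below, with the hypothetical helper `neighbors`, reproduces those counts; the grid shapes are arbitrary examples.

```python
from itertools import product

def neighbors(index, shape):
    """Immediate neighbours of `index` in a 2D or 3D grid of the given shape:
    all in-bounds cells that differ by at most 1 along every axis."""
    result = []
    for offset in product((-1, 0, 1), repeat=len(shape)):
        if not any(offset):
            continue                       # skip the cell itself
        cell = tuple(i + d for i, d in zip(index, offset))
        if all(0 <= c < size for c, size in zip(cell, shape)):
            result.append(cell)
    return result

interior_2d = len(neighbors((5, 5), (40, 40)))            # interior pixel
edge_2d = len(neighbors((0, 5), (40, 40)))                # pixel on an image edge
interior_3d = len(neighbors((5, 5, 5), (10, 10, 10)))     # interior voxel
edge_3d = len(neighbors((0, 0, 5), (10, 10, 10)))         # voxel on an edge of the volume
```

Under this definition a corner pixel of a 2D image has only 3 neighbors, a case the counts in the text do not list separately.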

To pool information across neighboring voxels for both activation detection and HRF estimation, we introduce a spatial MRF prior on $\gamma = (\gamma_1, \cdots, \gamma_N)$:

$$p(\gamma \mid \theta) \propto \exp\left\{ \sum_{i=1}^{N} \alpha_i(\gamma_i) + \sum_{i=1}^{N} \sum_{j \in \delta_i} \theta\, \omega_{ij} Q_{ij} \right\}, \tag{6}$$

where $Q_{ij} = 1(\gamma_i = \gamma_j = 0) + 1(\gamma_i > 0, \gamma_j > 0)$, with $1(A)$ being the indicator function equal to 1 if $A$ is true and 0 otherwise. The first term in (6) denotes the “external field” and usually takes a linear form with fixed $\alpha_i$'s. The second term denotes the interaction effect of neighboring elements in $\gamma$. The weight $\omega_{ij}$ measures the interaction between neighboring voxels $i$ and $j \in \delta_i$. We set $\omega_{ij}$ to be the inverse of the “functional distance” of two voxels $i$ and $j$, which is defined as $\frac{\|y_i, y_j\|}{|\mathrm{corr}(y_i, y_j)|}$. If $\alpha_i(\gamma_i) = 0$ for all $i$, the marginal probability $p(\gamma_i = 1, \cdots, q \mid \theta) = 1/q$ for all $i$, and $\theta$ is purely a spatial smoothing parameter; if $\alpha_i(\gamma_i) \neq 0$ for some $i$, $\theta$ is not only a spatial smoothing parameter but also determines the marginal probability of $\gamma$. In both cases, the elements of $\gamma$ are independent when $\theta = 0$. Our prior (6) can be viewed as a modified version of the Ising prior used in [4].
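As an illustration of how the exponent of the MRF prior (6) is assembled, the sketch below evaluates it for a toy chain of three voxels. The function names, the neighbor lists and the unit weights are all assumptions made for the example; in particular, the weights are passed in directly rather than computed from the functional distance.

```python
def q_agree(gamma_i, gamma_j):
    """Q_ij: 1 if both voxels are inactive or both are active, else 0."""
    return int((gamma_i == 0) == (gamma_j == 0))

def log_prior_unnormalized(gamma, neighbors, weights, theta, alpha=None):
    """Exponent of (6).
    gamma: list of non-negative ints; neighbors[i]: neighbour indices of voxel i;
    weights[(i, j)]: omega_ij; alpha: optional function alpha(i, gamma_i)."""
    total = 0.0
    for i, g_i in enumerate(gamma):
        if alpha is not None:
            total += alpha(i, g_i)                         # external field term
        for j in neighbors[i]:
            total += theta * weights[(i, j)] * q_agree(g_i, gamma[j])
    return total

# Toy chain of three voxels: 0 - 1 - 2, with unit weights (assumed).
chain_neighbors = [[1], [0, 2], [1]]
unit_weights = {(0, 1): 1.0, (1, 0): 1.0, (1, 2): 1.0, (2, 1): 1.0}

mixed = log_prior_unnormalized([0, 0, 3], chain_neighbors, unit_weights, theta=0.5)
all_active = log_prior_unnormalized([2, 5, 3], chain_neighbors, unit_weights, theta=0.5)
```

Configurations in which neighbors agree on being active or inactive receive a larger exponent, and the particular non-zero value of each $\gamma_i$ does not matter for the interaction term.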

3.3. Posterior Inference via MCMC

For efficient sampling, we integrate out the HRF coefficients $h_i$, which can be retrieved later. The joint posterior distribution of all indicator variables can be calculated by combining the spatial prior and the integrated likelihood as follows:

$$p(\gamma \mid \theta, y) \propto \prod_{i=1}^{N} p(y_i \mid \gamma_i)\, p(\gamma_i \mid \theta) \propto \exp\left\{ \sum_{i=1}^{N} \left( l_i(\gamma_i) + \alpha_i(\gamma_i) + \sum_{j \in \delta_i} \theta\, \omega_{ij} Q_{ij} \right) \right\}, \tag{7}$$

where $l_i(\gamma_i) = \log p(y_i \mid \gamma_i)$ denotes the log of the integrated likelihood.

As an effective sampling scheme, Metropolis-Hastings sampling is used to draw samples from the posterior distribution. The proposal distribution for $\gamma_i \in \{1, \cdots, q\}$ is

$$\pi(\gamma_i = k) = \left( \sum_{l=0}^{q} \exp\left\{ \sum_{i=1}^{N} \alpha_i(l - k) + \theta \sum_{j \in \delta_i} \omega_{ij} W_{ljk} \right\} \right)^{-1},$$

where $W_{ljk} = \left(1(l = 0, \gamma_j = 0) + 1(l > 0, \gamma_j > 0)\right) - \left(1(k = 0, \gamma_j = 0) + 1(k > 0, \gamma_j > 0)\right)$.
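The proposal distribution can be read as the conditional prior probability of each state of one voxel given the current states of its neighbors. The sketch below computes it in the equivalent normalized-exponential form; this is an interpretation under assumptions, keeping only the external-field term of the voxel being updated, and the function name and toy inputs are hypothetical.

```python
import math

def proposal_probabilities(q, neighbor_gammas, neighbor_weights, theta, alpha=None):
    """pi(gamma_i = k) for k = 0..q, given the neighbours' current states.
    alpha: optional list of q + 1 external-field values (defaults to zeros)."""
    if alpha is None:
        alpha = [0.0] * (q + 1)
    scores = []
    for k in range(q + 1):
        # Sum of weights of neighbours that agree with state k on active/inactive.
        agree = sum(w for g, w in zip(neighbor_gammas, neighbor_weights)
                    if (g == 0) == (k == 0))
        scores.append(alpha[k] + theta * agree)
    top = max(scores)                       # subtract the max for numerical stability
    unnorm = [math.exp(sc - top) for sc in scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Toy example: one inactive and two active neighbours, unit weights (assumed).
probs = proposal_probabilities(q=3, neighbor_gammas=[0, 2, 4],
                               neighbor_weights=[1.0, 1.0, 1.0], theta=0.7)
flat = proposal_probabilities(q=3, neighbor_gammas=[0, 2, 4],
                              neighbor_weights=[1.0, 1.0, 1.0], theta=0.0)
```

Dividing each exponential by the sum over all states is algebraically the same as taking the inverse of a sum of exponentiated differences, which is where the difference term $W_{ljk}$ comes from. With a zero external field and $\theta = 0$ the proposal is uniform over the $q + 1$ states.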

For a voxel $i$, the sampling steps are:

1) Start with an initial value $\gamma_i^{(0)} \in \{0, \cdots, q\}$;

2) Draw a proposal $\gamma_i^*$ from $\pi(\gamma_i)$;

3) Given the candidate $\gamma_i^*$, calculate the integrated likelihood of the candidate, $p(y_i \mid \gamma_i^*)$, and of the current state, $p(y_i \mid \gamma_i^{(m-1)})$, and the ratio $\alpha = p(y_i \mid \gamma_i^*) / p(y_i \mid \gamma_i^{(m-1)})$;

4) If the jump increases the density ($\alpha > 1$), we accept the candidate point, let $\gamma_i^{(m)} = \gamma_i^*$, and return to step 2. If the jump decreases the density ($\alpha < 1$), we accept the candidate point with probability $\alpha$; otherwise we reject it and return to step 2.
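The sampling steps above can be sketched for a single voxel as follows. This is a minimal illustration, not the authors' Matlab code: the function name is hypothetical, the proposal is held fixed (in the full sampler it would be recomputed as neighboring states change), the toy log-likelihood values are made up, and on rejection the chain records the current state again, as in standard Metropolis-Hastings.

```python
import math
import random

def metropolis_hastings(log_lik, proposal, n_iter, start=0, seed=0):
    """Sample gamma_i in {0..q} with target proportional to
    proposal[k] * exp(log_lik[k]), proposing independently from `proposal`."""
    rng = random.Random(seed)
    states = list(range(len(proposal)))
    current = start
    chain = []
    for _ in range(n_iter):
        candidate = rng.choices(states, weights=proposal)[0]
        # With this proposal the acceptance ratio reduces to the likelihood ratio.
        log_ratio = log_lik[candidate] - log_lik[current]
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = candidate
        chain.append(current)               # on rejection the current state is kept
    return chain

# Toy example: the data strongly favour gamma_i = 3 (hypothetical log-likelihoods).
toy_log_lik = [-50.0, -50.0, -50.0, 0.0, -50.0]
uniform_proposal = [0.2] * 5
chain = metropolis_hastings(toy_log_lik, uniform_proposal, n_iter=200)
```

Working with log-likelihood differences rather than raw ratios avoids overflow and underflow when the integrated likelihoods differ by many orders of magnitude.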

3.4. Monte Carlo Estimates

An advantage of the MCMC estimation is that a full spectrum

of posterior inference can be constructed. Here, we simulta-

neously make two types of inference: 1) recovering the HRFs

for the activated voxels; 2) testing whether a voxel is activated

or not.

Let $\{\gamma_i^{(m)};\ m = 1, \cdots, M\}$ denote a series of Monte Carlo iterates from our sampling scheme for voxel $i$. A “model average” estimate of the posterior mean can be used to recover the HRF:

$$E(h_i \mid y) \approx \frac{1}{M} \sum_{m=1}^{M} E(h_i \mid \gamma^{(m)}, y) = \frac{1}{M} \sum_{m=1}^{M} \frac{n}{n+1} \left(S(\gamma_i^{(m)})^T \Sigma^{-1} S(\gamma_i^{(m)})\right)^{-1} S(\gamma_i^{(m)})^T \Sigma^{-1} y_i. \tag{8}$$

A Monte Carlo estimate for the latent variable $\gamma_i$ is given by

$$E(\gamma_i \mid y) \approx \frac{1}{M} \sum_{m=1}^{M} \gamma_i^{(m)}. \tag{9}$$

Further, we produce an “activation map” that consists of voxels whose estimate (9) is greater than 0.5.
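The two Monte Carlo estimates can be sketched as below. Several choices here are assumptions made for the illustration: white noise ($\Sigma = I$) when no `Sigma_inv` is given, zero-padding of each draw to a common length $q$ so that draws with different $\gamma_i$ can be averaged, and hypothetical function names and toy data.

```python
import numpy as np

def posterior_mean_hrf(y, S, gamma_samples, q, Sigma_inv=None):
    """Model-averaged HRF estimate (8); each draw is zero-padded to length q."""
    n = y.size
    if Sigma_inv is None:                   # assumption: white noise, Sigma = I
        Sigma_inv = np.eye(n)
    estimate = np.zeros(q)
    for g in gamma_samples:
        if g == 0:
            continue                        # an inactive draw contributes a zero HRF
        S_g = S[:, :g]
        A = S_g.T @ Sigma_inv @ S_g
        b = S_g.T @ Sigma_inv @ y
        estimate[:g] += n / (n + 1) * np.linalg.solve(A, b)
    return estimate / len(gamma_samples)

def posterior_mean_gamma(gamma_samples):
    """Monte Carlo estimate (9)."""
    return float(np.mean(gamma_samples))

# Toy example: noise-free data generated from a two-component HRF.
s = [1, 0, 1, 1, 0, 0, 1, 0]
n = len(s)
S = np.array([[s[k - l] if k >= l else 0.0 for l in range(n)] for k in range(n)])
h_true = np.array([1.0, 0.5])
y = S[:, :2] @ h_true

hrf_hat = posterior_mean_hrf(y, S, gamma_samples=[2] * 10, q=4)
gamma_hat = posterior_mean_gamma([0, 0, 2, 2])
is_active = gamma_hat > 0.5
```

The factor $n/(n+1)$ shrinks each conditional estimate slightly towards zero, so even in this noise-free example the recovered coefficients are $n/(n+1)$ times the true ones.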

4. RESULTS

4.1. Simulated Results

In the simulated experiments, we focus on testing whether the proposed method can detect the activation map and estimate various HRFs when the linear trend is removed in a separate preprocessing step.

The synthetic data are composed of white noise on a base

2D image with size 40 × 40 × 130. Some randomly selected

regions are further summed with different types of signal time

series generated from different HRFs defined by the SPM

software package, to simulate activated voxels. We specify

q = 16. The definition of signal-to-noise ratio (SNR) is

SNR = var(Sh)/var(e).
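To show how a target SNR under this definition can be imposed on synthetic data, the sketch below rescales white Gaussian noise so that the variance ratio is met exactly. The function name and the sinusoidal stand-in for the noise-free signal are assumptions for illustration only; the paper's signals come from SPM-defined HRFs.

```python
import numpy as np

def add_noise_at_snr(signal, snr, rng):
    """Add white Gaussian noise scaled so that var(signal) / var(noise) == snr."""
    noise = rng.standard_normal(signal.shape)
    noise *= np.sqrt(signal.var() / (snr * noise.var()))
    return signal + noise, noise

rng = np.random.default_rng(0)
# Hypothetical stand-in for the noise-free response S h at 130 time points.
signal = np.sin(np.linspace(0, 6 * np.pi, 130))
noisy, noise = add_noise_at_snr(signal, snr=0.5, rng=rng)
achieved_snr = signal.var() / noise.var()
```

Rescaling by the empirical noise variance, rather than the nominal one, makes the achieved ratio match the target for every simulated voxel.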

We generate 4 types of HRFs, $h_1, h_2, h_3, h_4$, and a mixture of them, i.e. $h_5 = \frac{1}{4}(h_1 + h_2 + h_3 + h_4)$. The delay parameter controls the length of the non-zero part, corresponding to $\gamma$. The “ground truth” values of $\gamma$ for $h_1, h_2, h_3, h_4$ are 5, 6, 10 and 15, respectively. Figure 1 shows the HRF patterns of $h_1, h_2, h_3, h_4$ and the corresponding four ground truth time series without any noise on the HRFs.

Fig. 1. The HRFs at different delays of response and the corresponding time series. The first row shows the different HRFs; from left to right, γ is 5, 6, 10 and 15. The second row shows the corresponding time series of activated voxels without noise on the HRF.

We get estimators and recover the HRFs by using (8). Fig-

ure 2 shows the comparison of the estimation results from

the proposed method with those from the ordinary least squares method,

demonstrating that our method leads to smoother and more

accurate estimation of the HRFs.

Fig. 2. The ground truth of the HRFs and their estimators for the simulated data without

noise on HRF and the overall SNR = 1. The black line (with gray dots) is the ground

truth; the green line (with black squares) shows the estimators from the present method;

and the red line (with black squares) shows the estimators from the ordinary least squares

method.

Figure 3 shows the comparison of the activation map from

the proposed method and the GLM at different SNR levels.

Both methods can detect functional activation regions, but

there are more false positives with the GLM method. Fur-

thermore, our proposed method can distinguish the difference

among HRFs, while the GLM cannot. All procedures are

implemented in Matlab 7.0. The computation times reported

are obtained on a computer with an Intel Core 2 2.4

GHz processor and 2 gigabytes of memory. For the simulated

data set, the computation time is 2 minutes for the GLM and 13 minutes for our

proposed method.

Fig. 3. “Activation map” comparison of the present method and the GLM. Columns A and B: results of the proposed method when SNR = 0.5 and SNR = 1.0; columns C and D: results of the GLM when SNR = 0.5 and SNR = 1.0. First row: HRF without noise; second and third rows: HRF with noise of variance 0.005 and 0.05, respectively.

In our method, the hyper-parameter θ is pre-specified rather than estimated. Here, we test the sensitivity of the method to θ with the noisiest simulated data (var(hrf) = 0.1 and SNR = 0.5). Figure 4 (left) shows the likelihood functions over MCMC iterations at different θ values, which turn out to be insensitive to θ. For different θ values, the likelihood functions converge to the same value, though at different convergence rates. In addition, the detected “activation maps” are the same at different θ values (shown in Figure 4, right).

Fig. 4. Likelihood function at different θ values and “Activation Map”.

4.2. Real fMRI Data Results

The visual motion task real fMRI data (53 × 63 × 46 × 360) were obtained from the SPM data site (http://www.fil.ion.ucl.ac.uk/spm/data/attention.html). The subject was scanned during four runs, with 90 image volumes in each run. Four conditions (“fixation”, “attention”, “no attention” and “stationary”) were used, and there were 10 multi-slice volumes per condition. The SPM package is used for the standard preprocessing.

We empirically fix the length of the HRF at 16. The

activation maps generated from our method are shown in Figure 5.

In the map, all colored regions are activation regions.

The V1 left and right and the posterior parietal (PP) cortices

are shown as activated regions. Furthermore, the activated

regions show slightly different HRFs. V1 right and left re-

gions have very similar HRFs and the HRFs in these regions

are stronger than those in the PP cortex when a visual mo-

tion task is present. All results are consistent with previous

findings [9].

Fig. 5. Activation map of the real fMRI data, showing V1 Left/Right, V5 Left/Right and

the posterior parietal (PP) cortices.

5. SUMMARY

We present a Bayesian method for simultaneously generat-

ing activation maps and learning HRFs. HRFs at different

voxels are estimated in a nonparametric way, and a Bayesian

variable selection approach is used to induce shrinkage and

sparsity. An MRF prior is placed on latent variables representing

activated hemodynamic response components, which

incorporates the similarity of neighboring voxels in fMRI data.

We have demonstrated that our method produces successful

functional activation detection and HRF estimation in both

simulated data and real applications.

6. REFERENCES

[1] R. B. Buxton, K. Uludag, D. J. Dubowitz, and T. T. Liu, “Modeling the hemodynamic response to brain activation,” NeuroImage, pp. S220–S233, 2004.

[2] K. Friston, A. Holmes, K. Worsley, J. Poline, C. Frith, and R. Frackowiak, “Statistical parametric maps in functional imaging: a general linear approach,” Human Brain Mapping, pp. 189–210, 1995.

[3] W. D. Penny, N. J. Trujillo-Barreto, and K. J. Friston, “Bayesian fMRI time series analysis with spatial priors,” NeuroImage, pp. 350–362, 2005.

[4] M. Smith and L. Fahrmeir, “Spatial Bayesian variable selection with application to functional magnetic resonance imaging,” Journal of the American Statistical Association, pp. 417–431, 2007.

[5] M. W. Woolrich, M. Jenkinson, J. M. Brady, and S. M. Smith, “Fully Bayesian spatio-temporal modeling of fMRI data,” IEEE Transactions on Medical Imaging, pp. 213–231, 2004.

[6] J. Coelho, J. Sanches, and M. H. Lauterbach, “fMRI binary detection of brain activated regions with graph-cuts,” 30th Annual International IEEE EMBS Conference, 2008.

[7] C. Zhang and T. Yu, “Semiparametric detection of significant activation for brain fMRI,” Annals of Statistics, pp. 1693–1725, 2007.

[8] F. Liang, R. Paulo, G. Molina, M. Clyde, and J. Berger, “Mixtures of g-priors for Bayesian variable selection,” Journal of the American Statistical Association, 2008.

[9] C. Buchel and K. J. Friston, “Modulation of connectivity in visual pathways by attention: Cortical interactions evaluated with structural equation modeling and fMRI,” Cerebral Cortex, pp. 768–778, 1997.
