Investigations into Resting-State Connectivity
using Independent Component Analysis
FMRIB Technical Report TR05CB1
(A related paper has been accepted for publication in
Philosophical Transactions of the Royal Society,
Special Issue on ’Multimodal neuroimaging of brain connectivity’)
Christian F. Beckmann, Marilena DeLuca, Joseph T. Devlin and Stephen M. Smith
Oxford Centre for Functional Magnetic Resonance Imaging of the Brain (FMRIB),
Department of Clinical Neurology, University of Oxford, John Radcliffe Hospital,
Headley Way, Headington, Oxford, UK
Corresponding author is Christian F. Beckmann: beckmann@fmrib.ox.ac.uk
Abstract
Inferring resting-state connectivity patterns from functional magnetic resonance imaging (FMRI) data is a challenging
task for any analytical technique. In this paper we review a probabilistic independent component analysis (PICA) approach,
optimised for the analysis of FMRI data (Beckmann and Smith, 2004), and discuss the role which this exploratory technique
can take in scientific investigations into the structure of these effects. We apply PICA to FMRI data acquired at rest in order
to characterise the spatio-temporal structure of such data, and demonstrate that this is an effective and robust tool for the
identification of low-frequency resting-state patterns from data acquired at various spatial and temporal resolutions.
We show that these networks exhibit high spatial consistency across subjects and closely resemble discrete cortical functional
networks such as visual cortical areas or sensory-motor cortex.
Keywords: Functional Magnetic Resonance Imaging; brain connectivity; resting-state fluctuations; Independent Component
Analysis
1 Introduction
Functional Magnetic Resonance Imaging (FMRI) has become an important neuroscientific tool for probing neural mechanisms
in the human brain. Typical FMRI experiments have focused on the acquisition of T2*-sensitive MR images during periods
of increased oxygen consumption (due to neuronal response to externally controlled experimental conditions) and contrast
the measured image intensities with recordings obtained at 'rest'. Critically, some important quantitative concepts in FMRI
analysis, such as the calculation of percent signal change or the interpretation of deactivation, implicitly hinge on a suitable
definition of this baseline/rest signal. The baseline 'resting state' of the brain itself, however, is a somewhat ill-defined and
poorly understood concept.
Of particular interest in this context are certain low-frequency fluctuations of the measured cerebral haemodynamics
(around 0.01–0.1 Hz) which exhibit complex spatial structure reminiscent of FMRI 'activation maps' and which can be identified
in FMRI data acquired both under rest conditions and under external stimulation. Recently, some attention has been focused
on the characterisation of these maps and the identification of possible origins of slow variations in the measured blood oxygen
level dependent (BOLD) signal. Various researchers have suggested that these signal variations, temporally correlated across the brain,
are of neuronal origin and correspond to functional resting-state networks (RSNs) which jointly characterise the neuronal
baseline activity of the human brain in the absence of deliberate and/or externally stimulated neuronal activity, and may reflect
functionally distinct networks.
Biswal et al. (1995) first demonstrated the feasibility of using FMRI to detect such spatially distributed networks within
primary motor cortex during resting state by calculating temporal correlations across the brain with the time course from a
seed voxel whose spatial location was chosen from a prior finger-tapping study. The temporal signal from a seed voxel in the
motor cortex was correlated with other motor cortex voxels and uncorrelated with other voxels, with major frequency peaks in
the resting correlations at around 0.02 Hz. Lowe et al. (1998) found similar results using both single-slice acquisitions with a short
repetition time (TR of 130 ms) and whole-head volumes with a longer TR (2000 ms), while Xiong et al. (1999) describe functional connectivity
maps that cover additional non-motor areas. Based also on findings from PET studies, the existence of a 'default mode' brain
network involving several regions including the posterior cingulate cortex has been proposed (Raichle et al., 2001; Shulman
et al., 1997; Mazoyer et al., 2001). Using simultaneously acquired EEG and FMRI data at rest, Goldman et al. (2002)
have shown that variation in the EEG alpha rhythm (8–12 Hz) is correlated with the FMRI measurements. In particular,
the authors report that increased alpha power was correlated with decreased BOLD signal in multiple regions of occipital,
superior temporal, inferior frontal, and cingulate cortex, and with increased signal in the thalamus and insula. These results
have important implications for the interpretation of RSNs as they suggest a neuronal cause for these fluctuations.
Alternatively, it has been argued that these effects simply reflect vascular processes unrelated to neuronal function, which
would make RSNs of less interest to neuroscience (though still of potential clinical interest). Physiological noise in the
resting brain and its echo-time and field-strength dependencies were investigated by Kruger and Glover (2001), who showed
that physiological noise exhibits a field-strength dependency, exceeds both the thermal and the scanner noise at 3 T, and is
increased in grey matter (see also Woolrich et al. (2001)). Various researchers have investigated the relation between low-frequency
fluctuations in the measured BOLD signal and other physiological observations: Obrig et al. (2000) reviewed and
studied low-frequency variations in oxygenation, cerebral blood flow (CBF) and metabolism, and report significant correlations
with similar fluctuations observed by near-infrared spectroscopy (NIRS). More recently, Wise et al. (2004) have investigated
the influence of arterial carbon dioxide fluctuations by using the end-tidal level of exhaled carbon dioxide as a covariate of
interest in a General Linear Model (GLM) analysis. The most significant changes were concentrated in the occipital, parietal
and temporal lobes as well as in the cingulate cortex, suggesting that vascular processes (unrelated to neuronal function) play
a significant role in the generation of such resting-state patterns.
Estimating the temporal and spatial characteristics of these low-frequency fluctuations from FMRI data presents a formidable
challenge to analytical techniques. In the majority of existing studies, resting patterns are inferred by a correlation analysis
of the voxel-wise FMRI recordings against a reference time course obtained from secondary recordings (e.g. from EEG,
NIRS or physiological measurements like the carbon dioxide concentration) or simply by regressing against a single voxel's
time course from resting data which is believed to be of functional relevance (seed-voxel based correlation analysis). These
techniques fundamentally test very specific hypotheses about the temporal structure of these effects. Recently, however,
Independent Component Analysis has successfully been applied to the estimation of certain low-frequency patterns (Goldman
and Cohen, 2003; Kiviniemi et al., 2003; Greicius et al., 2004). An important benefit of such exploratory techniques over
more hypothesis-based techniques is the ability to identify various types of signal fluctuations by virtue of their spatial and/or
temporal characteristics without the need to specify an explicit temporal model. Such flexibility in data modelling is essential
in cases where the effects of interest are not well understood and cannot be predicted accurately.
This paper is organised as follows: in section 2 we review a probabilistic approach to Independent Component Analysis
(PICA) specifically optimised for the analysis of FMRI data (Beckmann and Smith, 2004). Section 3 discusses the constraints
of this exploratory data analysis technique when used for the identification of large-scale noise fluctuations. In particular,
we demonstrate that optimisation for maximally independent spatial sources does not imply an inability to estimate largely
overlapping spatial maps. We demonstrate the ability of PICA to extract resting fluctuations and apply the technique to FMRI
resting data in order to test a set of important hypotheses about the structure of resting-state connectivity in the human brain.
In particular, we investigate (i) if and how estimated source processes are driven by less interesting physiological effects
such as the cardiac or respiratory cycle, (ii) the spatial characteristics of estimated maps in terms of locality within grey matter
and (iii) the consistency of maps obtained from multiple subjects.
2 Decomposing FMRI data using ICA
Independent Component Analysis (ICA; Comon, 1994; Bell and Sejnowski, 1995; McKeown et al., 1998) is a technique
which decomposes a 2-dimensional (time × voxels) data matrix into a set of time courses and associated spatial maps which
jointly describe the temporal and spatial characteristics of underlying hidden signals (components). (Here, we only discuss the
case of a decomposition into spatially independent source signals; the reason for this will become apparent later.) A probabilistic ICA model
extends this by assuming that the p-dimensional vectors of observations (time series in the case of FMRI data) are generated
from a set of q (< p) statistically independent non-Gaussian sources (spatial maps) via a linear and instantaneous 'mixing'
process corrupted by additive Gaussian noise η(t):

    x_i = A s_i + η_i        (1)
Here, x_i denotes the individual measurements at voxel location i (for simplicity we assume demeaned data), s_i denotes the
non-Gaussian source signals contained in the data and η_i denotes Gaussian noise, η_i ∼ G(0, σ²Σ_i); the noise covariance is allowed
to be voxel-dependent in order to encode the vastly different noise covariances observed within different tissue types (Woolrich et al., 2001).

The p × q dimensional mixing matrix A is assumed to be non-degenerate, i.e. of rank q. Solving the blind separation
problem requires finding a linear 'unmixing' matrix W of dimension q × p such that

    ŝ = Wx

is a good approximation to the true source signals s. The PICA model is similar to the standard GLM, with the difference that, unlike
the design matrix in the GLM, the mixing matrix A is no longer pre-specified prior to model fitting but is estimated from the data.
The spatial source signals correspond to parameter estimate images in the GLM, with the additional constraint of being statistically
independent of each other.
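The generative model in equation 1 can be made concrete with a small numerical sketch; all dimensions, distributions and the noise level below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n_vox = 100, 3, 2000      # time points, sources, voxels (illustrative)

S = rng.laplace(size=(q, n_vox))     # non-Gaussian spatial source maps s_i
A = rng.normal(size=(p, q))          # mixing matrix A (columns = time courses)
sigma = 0.5                          # isotropic noise level
X = A @ S + sigma * rng.normal(size=(p, n_vox))   # x_i = A s_i + eta_i

# each column of X is one p-dimensional observation vector x_i
print(X.shape)   # (100, 2000)
```

The sources are deliberately heavy-tailed (Laplace) here: under the model, any departure of the data from Gaussianity is attributed to signal.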
2.1 Parameter estimation
Without loss of generality we can assume that the source signals have unit variance. If the noise covariance Σ_i is known,
we can pre-whiten the data and obtain a new representation x̄_i = Ā s_i + η̄_i, where η̄_i ∼ G(0, σ²I), i.e. where the noise
covariance is isotropic at every voxel location. To simplify notation, we will henceforth assume isotropic noise and drop the
additional bar.

Since noise and signal are uncorrelated, the data covariance matrix R_x = ⟨x_i x_i^t⟩ = AA^t + σ²I, i.e. the unknown mixing
matrix A can be estimated as the matrix square root of R_x − σ²I: let X be a p × N matrix containing all N different FMRI
time series in its columns, and let X = U(NΛ)^(1/2)V be the singular value decomposition of X. Then

    Â_ML = U_q (Λ_q − σ²I_q)^(1/2) Q^t,        (2)

where U_q and Λ_q contain the first q Eigenvectors and Eigenvalues. The matrix Q denotes a q × q orthogonal rotation matrix,
i.e. a matrix with QQ^t = I. This matrix is not directly identifiable from the data covariance matrix, since R_x is invariant
under post-multiplication of A by any orthogonal rotation Q̄, given that (AQ̄)(AQ̄)^t = AQ̄Q̄^tA^t = AA^t = R_x − σ²I.
Estimating the mixing matrix A, however, reduces to identifying the square matrix Q after whitening the data with respect
to the noise covariance Σ_i and projecting the temporally whitened observations onto the space spanned by the q Eigenvectors
of R_x with largest Eigenvalues. The maximum likelihood estimates of the sources and of σ² are obtained using generalised least
squares:

    ŝ_ML = Ŵx  with  Ŵ = (Â^tÂ)^(−1)Â^t  and  σ̂²_ML = 1/(p−q) Σ_{l=q+1…p} λ_l.        (3)

Solving the model in the case of an unknown noise covariance can be achieved by iterating estimates of the mixing matrix
and the sources and re-estimating the noise covariances from the residuals η̂. The form of Σ_i typically is constrained by a
suitable parameterisation; here we use common approaches to FMRI noise modelling (Bullmore et al., 1996; Woolrich
et al., 2001) and restrict the structure to autoregressive noise. However, since the exploratory approach allows modelling of
various sources of variability, e.g. temporally consistent physiological noise, as part of the signal in equation 1, the noise
model itself can actually be quite simplistic.

A consequence of the isotropic noise model is that, as an initial preprocessing step, we modify the original data time
courses to be normalised to zero mean and unit variance. This preconditions the data under the null hypothesis of no signal:
the data matrix X is identical (up to second-order statistics) to a simple set of realisations from a G(0, I) noise process. Any
signal will have to reveal itself via its deviation from Gaussianity.

The maximum likelihood estimators depend on knowledge of the number of underlying sources q. In the noise-free case
this quantity can easily be deduced from the rank of the covariance of the observations R_x, which is of rank q. In the presence
of isotropic noise, however, the covariance matrix will be of full rank, the additional noise having the effect of raising all
Eigenvalues of the covariance matrix by σ² (Roberts and Everson, 2001). Inferring the number of estimable source processes
amounts to testing for sphericity of Eigenspaces beyond a given threshold level (Beckmann and Smith, 2004). Simplistic
criteria like the reconstruction error or predictive likelihood will naturally predict that the accuracy steadily increases with
increased dimensionality. Thus, criteria like retaining 99.9% of the variability result in arbitrary threshold levels (Beckmann et al., 2001).
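Equations 2 and 3 translate directly into a few lines of linear algebra. The sketch below (synthetic data; Q fixed to the identity, since the rotation is not identifiable from the covariance alone) is an illustration of the estimators, not FMRIB's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, N = 60, 4, 5000
A_true = rng.normal(size=(p, q))
X = A_true @ rng.laplace(size=(q, N)) + 0.5 * rng.normal(size=(p, N))

# singular value decomposition X = U (N Lambda)^(1/2) V
U, sv, Vt = np.linalg.svd(X, full_matrices=False)
lam = sv**2 / N                 # Eigenvalues of the sample covariance R_x

# ML noise variance: average of the p - q trailing Eigenvalues (equation 3)
sigma2_ml = lam[q:].mean()

# ML mixing matrix up to the rotation Q (equation 2, with Q = I)
A_ml = U[:, :q] * np.sqrt(lam[:q] - sigma2_ml)

# generalised least squares source estimates (equation 3)
W = np.linalg.solve(A_ml.T @ A_ml, A_ml.T)
S_hat = W @ X

print(A_ml.shape, S_hat.shape, round(sigma2_ml, 2))
```

The residual rotation Q is subsequently fixed by the negentropy-based optimisation described in the text.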
This problem is intensified by the fact that the data covariance R_x is estimated by the sample covariance
matrix. In the absence of any source signals, the Eigenspectrum of this sample covariance matrix is not identical to σ²I_p
but is instead distributed skewed around the true noise covariance: the Eigenspectrum will depict an apparent difference in the
significance of individual directions within the noise (Everson and Roberts, 2000), even in the absence of signal. In the case
of Gaussian noise, however, this 'skew' of the Eigenspectrum is of analytic form: the Eigenvalues have a Wishart distribution,
and we can adjust the observed Eigenspectrum by the quantiles of the predicted cumulative distribution of Eigenvalues from
Gaussian noise (Johnstone, 2000) prior to estimating the model order. If we assume that the source distributions p(s) are
Gaussian, the model reduces to probabilistic PCA (Tipping and Bishop, 1999) and we can use Bayesian model selection
criteria. Within the PICA approach, we use the Laplace approximation to the posterior distribution of the model evidence,
which can be calculated efficiently from the adjusted Eigenspectrum (Minka, 2000; Beckmann and Smith, 2004).
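The sampling skew of the Eigenspectrum is easy to reproduce: even for pure G(0, I) noise, the Eigenvalues of the sample covariance spread well away from the true value of one (dimensions below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
p, N = 50, 200                      # few samples relative to p: strong skew
noise = rng.normal(size=(p, N))     # pure G(0, I) noise, no signal at all

lam = np.sort(np.linalg.eigvalsh(noise @ noise.T / N))[::-1]

# the true covariance is the identity, yet the sample Eigenspectrum is
# spread roughly between (1 - sqrt(p/N))^2 and (1 + sqrt(p/N))^2
print(round(lam[0], 2), round(lam[-1], 2))
```

An unadjusted threshold on such a spectrum would suggest spurious 'signal' directions; adjusting by the quantiles of the predicted (Wishart) Eigenvalue distribution removes this bias before the model order is estimated.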
In order to complete the estimation of the mixing matrix and the sources, we need to optimise an orthogonal rotation
matrix Q in the space of whitened observations:

    ŝ = Wx = Qx̃,        (4)

where x̃ = (Λ_q − σ²I_q)^(−1/2) U_q^t x denotes the spatially whitened data.

Hyvärinen and Oja (1997) have presented an elegant fixed-point algorithm that uses approximations to negentropy in
order to optimise for non-Gaussian source distributions, and give a clear account of the relation between this approach and
statistical independence. In brief, the individual sources are obtained by projecting the data x onto the individual rows of Q,
i.e. the r-th source is estimated as

    ŝ_r = v_r^t x̃,        (5)

where v_r^t denotes the r-th row of Q. In order to optimise for non-Gaussian source estimates, Hyvärinen and Oja (1997) propose
the following contrast function:

    J(s_r) ∝ [E{F(ŝ_r)} − E{F(ν)}],

where ν denotes a standardised Gaussian variable, E denotes the expectation and F is a general non-quadratic function that
combines the high-order moments of s_r in order to estimate the amount of non-Gaussianity in the individual sources. From
equation 5, the vectors v_r^t are optimised to maximise J(ŝ_r) using an approximative Newton method (Hyvärinen and Oja,
1997).
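A minimal symmetric version of this fixed-point iteration can be sketched as follows, using F(u) = log cosh(u) (so that F′ = tanh); this is an illustrative reimplementation under simplified assumptions, not the modified scheme used in PICA:

```python
import numpy as np

def fastica_symmetric(x_tilde, n_iter=200, seed=0):
    """Symmetric negentropy-based fixed-point iteration for the rotation Q,
    applied to spatially whitened data x_tilde of shape (q, N)."""
    q, N = x_tilde.shape
    rng = np.random.default_rng(seed)
    Q = np.linalg.qr(rng.normal(size=(q, q)))[0]     # random orthogonal start
    for _ in range(n_iter):
        s = Q @ x_tilde
        g, g_prime = np.tanh(s), 1.0 - np.tanh(s) ** 2
        Q_new = (g @ x_tilde.T) / N - np.diag(g_prime.mean(axis=1)) @ Q
        u, _, vt = np.linalg.svd(Q_new)              # symmetric decorrelation
        Q = u @ vt                                   # keeps Q orthogonal
    return Q

# toy check: two unit-variance non-Gaussian sources mixed by a rotation
rng = np.random.default_rng(3)
S = rng.laplace(size=(2, 20000))
S = (S - S.mean(axis=1, keepdims=True)) / S.std(axis=1, keepdims=True)
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
x_tilde = R @ S                        # already (approximately) white

Q = fastica_symmetric(x_tilde)
S_hat = Q @ x_tilde                    # estimated sources, equations 4 and 5
corr = np.abs(np.corrcoef(np.vstack([S, S_hat]))[:2, 2:])
print(corr.max(axis=1))                # each true source matched up to sign/order
```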
2.2 Inference

After estimating the mixing matrix Â, the source estimates are calculated by projecting each voxel's time course onto the time
courses contained in the columns of the unmixing matrix Ŵ. In the case where the model order q was estimated correctly,
the estimated noise is a linear projection of the true noise and is unconfounded by residual signal. At every voxel location we
have preconditioned the data such that x_i has unit standard deviation, and the estimate of the noise variance σ̂²_i at each voxel
location will approximately equal the true variance of the noise. We can thus convert the individual spatial IC maps s_r· into
'Z-statistic maps' z_r· by dividing the raw IC maps by the standard error of the residual noise.

In order to assess the Z-maps for 'significantly activated' voxels, we employ mixture modelling of the probability density
of the Z-statistic spatial maps. From equation 3 it follows that ŝ_i = ŴAs_i + Ŵη_i, i.e. the noise term in equation 1 manifests
itself as additive Gaussian noise in the estimated sources. We therefore model the distribution of the spatial intensity values of
each Z-map by a mixture of one Gaussian and two Gamma distributions, modelling background noise and positive and negative
BOLD effects respectively (Hartvig and Jensen, 2000; Beckmann et al., 2003). The mixture is fitted using an expectation-maximisation
algorithm (Dempster et al., 1977). In cases where the number of 'active' voxels is very small, the relative proportions of the Gamma
densities in the overall mixture distribution might be estimated as zero. In this case, a simple transformation to spatial Z-scores and
subsequent thresholding is appropriate, i.e. reverting to null-hypothesis testing instead of the otherwise preferable alternative-hypothesis
testing. Otherwise we can evaluate the fitted mixture model to calculate the posterior probability of 'activation' (understood here as
signal that cannot be explained as a random correlation) as the ratio of the probability of an intensity value under the 'activation'
Gamma densities relative to the total probability under the full mixture. Any threshold level, though arbitrary, directly relates to the
loss function we would like to associate with the estimation process; e.g. a threshold level of 0.5 places an equal loss on false
positives and false negatives (Hartvig and Jensen, 2000).
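Once the mixture parameters have been fitted, the posterior probability of activation at a given Z-value follows from Bayes' rule. The sketch below evaluates this for a hand-picked (purely illustrative, not fitted) Gaussian/Gamma mixture:

```python
import numpy as np
from math import gamma as gamma_fn

pi_noise, pi_pos, pi_neg = 0.90, 0.07, 0.03   # mixture weights (made up)
a, scale = 3.0, 1.5                            # Gamma shape/scale (made up)

def norm_pdf(z):
    """Standard Gaussian density for the background noise component."""
    return np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)

def gamma_pdf(x):
    """Gamma density, zero for negative arguments."""
    x = np.maximum(x, 0.0)
    return x**(a - 1) * np.exp(-x / scale) / (gamma_fn(a) * scale**a)

def p_activation(z):
    """Posterior probability that intensity z belongs to one of the
    'activation' Gamma components rather than the background Gaussian."""
    z = np.asarray(z, dtype=float)
    f_noise = pi_noise * norm_pdf(z)
    f_pos = pi_pos * gamma_pdf(z) * (z > 0)       # positive BOLD tail
    f_neg = pi_neg * gamma_pdf(-z) * (z < 0)      # mirrored negative tail
    return (f_pos + f_neg) / (f_noise + f_pos + f_neg)

for z in (0.5, 3.0, 6.0):
    print(z, round(float(p_activation(z)), 3))
```

Thresholding these posteriors at 0.5 corresponds to placing equal loss on false positives and false negatives.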
2.3 PICA Algorithm Overview

The individual steps that constitute Probabilistic Independent Component Analysis are illustrated in figure 1. The demeaned
original data is first normalised to unit variance at each voxel location. If appropriate spatial information is available,
Figure 1: Schematic illustration of the probabilistic ICA model (Beckmann and Smith, 2004)
this is encoded in the estimation of the sample covariance matrix R_x. Individual voxel weights, e.g. from a grey-matter segmentation,
can be used to calculate a weighted covariance matrix, while voxel-pair weightings can be used to calculate the within-group
covariance (Beckmann and Smith, 2004). Probabilistic PCA is used to infer the unknown number of sources and results
in an estimate of the noise and a set of spatially whitened observations. We can estimate the noise covariance structure Σ_i from
the residuals in order to voxel-wise (temporally) pre-whiten and re-normalise the data and iterate the entire cycle. Estimation
of Σ_i from residuals in the case of autocorrelated noise can be achieved as described by Woolrich et al. (2001). In practice,
the output results do not suggest a strong dependency on the form of Σ_i, and preliminary results suggest that it is sufficient to
iterate these steps only once. From the spatially whitened observations, the individual component maps are obtained using a
modified fixed-point iteration scheme (FastICA; Hyvärinen and Oja, 1997) to optimise for non-Gaussian source estimates via
maximising the negentropy. These maps are separately transformed to Z-scores. In contrast to raw IC estimates, which only
encode the estimated signal, these Z-score maps depend on the amount of variability explained by the entire decomposition
at each voxel location relative to the residual noise, similar to statistical parametric maps from a GLM analysis. This is an
important aspect of the probabilistic ICA model, as these maps also reflect the degree to which the signal explained
within the model fits the data and, unlike standard ICA, no longer ignore the signal variation which remains unaccounted
for. Finally, Gaussian/Gamma mixture models are fitted to the individual Z-maps in order to infer voxel locations that are
significantly modulated by the associated time course.
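The steps of figure 1 can be condensed into a single sketch: model order fixed by hand, one pass, no Σ_i re-estimation and no mixture-model inference — an illustrative skeleton of the pipeline rather than the actual PICA implementation:

```python
import numpy as np

def pica_sketch(X, q, n_iter=200, seed=0):
    """Condensed PICA-style pipeline for a (time x voxels) data matrix X.
    The model order q is fixed by hand here; PICA estimates it via the
    Laplace approximation to the model evidence."""
    rng = np.random.default_rng(seed)
    p, n_vox = X.shape
    X = (X - X.mean(axis=0)) / X.std(axis=0)        # variance-normalise voxels
    U, sv, _ = np.linalg.svd(X, full_matrices=False)
    lam = sv**2 / n_vox                             # sample covariance spectrum
    sigma2 = lam[q:].mean()                         # noise variance estimate
    x_tilde = np.diag((lam[:q] - sigma2) ** -0.5) @ U[:, :q].T @ X
    Q = np.linalg.qr(rng.normal(size=(q, q)))[0]    # random orthogonal start
    for _ in range(n_iter):                         # FastICA-style iteration
        s = Q @ x_tilde
        Qn = (np.tanh(s) @ x_tilde.T) / n_vox \
            - np.diag((1 - np.tanh(s) ** 2).mean(axis=1)) @ Q
        u, _, vt = np.linalg.svd(Qn)
        Q = u @ vt                                  # symmetric decorrelation
    S = Q @ x_tilde                                 # raw IC spatial maps
    A = X @ S.T @ np.linalg.inv(S @ S.T)            # associated time courses
    Z = S / (X - A @ S).std(axis=0)                 # voxel-wise Z-statistic maps
    return A, S, Z

rng = np.random.default_rng(4)
data = rng.normal(size=(80, 2)) @ rng.laplace(size=(2, 3000)) \
    + 0.5 * rng.normal(size=(80, 3000))
A, S, Z = pica_sketch(data, q=2)
print(A.shape, S.shape, Z.shape)   # (80, 2) (2, 3000) (2, 3000)
```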
3 Estimating overlapping maps using ICA

The choice of optimising for independence between spatial maps could equally well be replaced by optimising for independence
between time courses. Different authors have argued in favour of one or the other technique, where the main objection
appears to revolve around the question of whether orthogonality (i.e. uncorrelatedness) between estimated sources should
be enforced in the temporal or the spatial domain (Friston, 1998; Petersen et al., 2000). At a conceptual level, the notion of
orthogonality is overly restrictive in either domain: for temporal modes, the existence of stimulus-correlated effects (e.g. motion
artefacts or higher-order brain function) means that enforced orthogonality necessarily results in a misrepresentation of
underlying temporal signals. Similarly, for spatial modes, Friston (1998) has argued that even though different brain functions
might be spatially localised, the principle of 'functional integration' might imply that neuronal processes share a large proportion
of cortical anatomy. These arguments suggest that independence and implied orthogonality are always suboptimal for
the analysis of data as complicated as that obtained from functional MRI experiments.
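This point — that optimising for spatial independence does not force the estimated maps to be non-overlapping — can be probed with a toy example: two source maps confined to an identical spatial support, mixed and then unmixed by a tanh-based fixed-point iteration (an illustrative stand-in for the PICA decomposition, not the paper's analysis):

```python
import numpy as np

rng = np.random.default_rng(5)
n_vox = 20000
support = rng.random(n_vox) < 0.3       # one shared support (30% of voxels)
m1 = rng.laplace(size=n_vox) * support  # two maps living on the SAME voxels
m2 = rng.laplace(size=n_vox) * support
S = np.vstack([m1, m2])
X = rng.normal(size=(40, 2)) @ S        # 40 time points of the mixed maps

# whiten spatially, then run a symmetric tanh-based fixed-point iteration
Xc = X - X.mean(axis=1, keepdims=True)
U, sv, _ = np.linalg.svd(Xc, full_matrices=False)
xw = np.sqrt(n_vox) * (U[:, :2].T @ Xc) / sv[:2, None]
Q = np.linalg.qr(rng.normal(size=(2, 2)))[0]
for _ in range(100):
    s = Q @ xw
    Qn = (np.tanh(s) @ xw.T) / n_vox \
        - np.diag((1 - np.tanh(s) ** 2).mean(axis=1)) @ Q
    u, _, vt = np.linalg.svd(Qn)
    Q = u @ vt                          # symmetric decorrelation
S_hat = Q @ xw

overlap = (m1 != 0) & (m2 != 0)         # both maps non-zero on the same voxels
c = np.abs(np.corrcoef(np.vstack([S, S_hat]))[:2, 2:])
print(round(float(overlap.mean()), 2), c.max(axis=1).min() > 0.9)
```

Although the two maps are dependent through their shared support (and hence only approximately independent), the decomposition still separates them: spatial overlap per se is not the obstacle.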
From a signal detection point of view, however, it is important to consider the extent to which signal ’appears’ in space or