An Empirical Comparison of SPM Preprocessing Parameters
to the Analysis of fMRI Data
Valeria Della-Maggiore, Wilkin Chau, Pedro R. Peres-Neto,* and Anthony R. McIntosh
Rotman Research Institute of Baycrest Centre, Toronto, Ontario M6A 2E1, Canada; and
*Department of Zoology, University of Toronto, Toronto, Ontario M5S 3G5, Canada
Received June 27, 2001
We present the results from two sets of Monte Carlo simulations aimed at evaluating the robustness of some preprocessing parameters of SPM99 for the analysis of functional magnetic resonance imaging (fMRI). Statistical robustness was estimated by implementing parametric and nonparametric simulation approaches based on the images obtained from an event-related fMRI experiment. Simulated datasets were tested for combinations of the following parameters: basis function, global scaling, low-pass filter, high-pass filter, and autoregressive modeling of serial autocorrelation. Based on single-subject SPM analysis, we derived the following conclusions that may serve as a guide for initial analysis of fMRI data using SPM99: (1) The canonical hemodynamic response function is a more reliable basis function to model the fMRI time series than HRF with time derivative. (2) Global scaling should be avoided since it may significantly decrease the power depending on the experimental design. (3) The use of a high-pass filter may be beneficial for event-related designs with fixed interstimulus intervals. (4) When dealing with fMRI time series with short interstimulus intervals (<8 s), the use of a first-order autoregressive model is recommended over a low-pass filter (HRF) because it reduces the risk of inferential bias while providing relatively good power. For datasets with interstimulus intervals longer than 8 s, temporal smoothing is not recommended since it decreases power. While the generalizability of our results may be limited, the methods we employed can be easily implemented by other scientists to determine the best parameter combination to analyze their data. © 2002 Elsevier Science (USA)
One of the primary statistical tools to examine
changes in brain activity from fMRI datasets is Statis-
tical Parametric Mapping (SPM) (Friston et al., 1994,
1995a,b). SPM utilizes a general linear model (GLM) to
assess task-specific, voxel-based differences in the
magnitude of the blood-oxygenation-level-dependent
(BOLD) signal. The fMRI time series is modeled at
each voxel with a linear combination of explanatory
functions plus a residual error term (Friston et al., 1995a,b).
As with other parametric approaches, the applica-
tion of GLM is contingent on two statistical assump-
tions of the data: normal distribution and indepen-
dence of the error term. It is common to apply temporal
and spatial smoothing to ensure the time series con-
forms to a Gaussian Random Field (GRF), which vali-
dates the application of parametric statistical assess-
ment (Friston et al., 1995b; Worsley and Friston, 1995).
In addition to smoothing, other preprocessing steps
have been implemented to further enhance statistical
power while respecting the GLM’s assumptions. Sev-
eral factors contribute to image intensity changes in
fMRI; these include the physiological component gen-
erated by alterations in the BOLD signal which vary in
the order of 1 to 5% (Jezzard and Song, 1996; Turner et
al., 1998), and irrelevant noise of physiological and
nonphysiological origin. Nonphysiological noise can be
instrumental (e.g., thermal noise) or due to head move-
ments, whereas physiological noise originates mainly
from cardiac and respiratory cycles. SPM offers several
options to model changes in the BOLD signal, includ-
ing a canonical hemodynamic response function (HRF)
and an HRF with time derivative (HRF/TD) aimed at
adjusting for delays in the onset of the hemodynamic
response. To eliminate random components of low fre-
quency noise mentioned above, a high pass filter can be
selected (Holmes et al., 1997). Temporally correlated
noise can be smoothed by convolving the time series
with a known HRF function (low-pass filter), and/or by
eliminating the temporal autocorrelation using a first-
order autoregressive model (AR1) (Friston et al., 2000).
To date, the basis for selecting the appropriate pa-
rameters to preprocess a fMRI dataset has been theo-
retical, a necessary but not sufficient criterion to en-
sure the appropriate statistical treatment of the data.
There is little available information on the actual be-
havior of these parameters. The “sensitivity” of SPM
parameters has been estimated empirically by evalu-
ating the effect of preprocessing a real fMRI dataset
NeuroImage 17, 19–28 (2002). © 2002 Elsevier Science (USA). All rights reserved.
with different parameter combinations (Hopfinger et
al., 2000). However, statistical inferences based on this
type of empirical testing may not be valid: on one hand,
the estimation of power may be highly biased by the
small number of sample tests (n = number of subjects);
on the other hand, given that the magnitude of the
signal and its spatial localization remain unknown to
the experimenter, true activations cannot be distin-
guished from false positives. To our knowledge, no
systematic study has been conducted to evaluate the
robustness of these parameters.
In this paper we present the results derived from two
sets of Monte Carlo simulations generated to evaluate
the robustness of some of the SPM preprocessing pa-
rameters. Both power (i.e., the probability of detecting
an activation if it exists) and type I error (i.e., the
probability of detecting an activation if it does not
exist) were estimated for five hundred datasets gener-
ated based on fMRI images obtained from an event-
related study. Parametric and nonparametric ap-
proaches were used to generate the two sets of
simulations. The parametric simulation entailed the
generation of a white-noise baseline (plus AR1 corre-
lated noise), and the addition of an HRF signal (Cohen,
1997). Given that we generated the signal, we could
test the impact of varying the experimental design on
the generality of the SPM results. The nonparametric
simulation consisted in using the original fMRI data as
the population from which simulated datasets were
sampled. Although the statistical distribution of the
original data remained unknown, this approach pre-
sented the advantage of preserving the spatial and
temporal structure of real fMRI data. Power and false
positive rate were assessed for combinations of the
following parameters: basis (modeling) function, global
scaling, low pass filter, high pass filter and autoregres-
sive modeling of temporal correlations.
MATERIALS AND METHODS
Monte Carlo Simulations
The power of a statistical test can be estimated using
analytical or empirical methods. Analytical methods
are based on the same probability theory and assump-
tions that are used to identify the appropriate statisti-
cal distribution for any traditional statistical method.
Several assembled tables (e.g., Cohen, 1988) and com-
puter software packages (see Thomas and Krebs, 1997)
based on numerical solutions are available for estimat-
ing the power of most commonly used statistical tests.
However, when analytical formulae for estimating
power have not been derived, or when there is interest
in assessing the power of a test in which statistical
assumptions have been violated, power tables can be
generated using a Monte Carlo approach (e.g., Ste-
phens, 1974). In this case, one simulates statistical
populations and manipulates them in order to intro-
duce a desirable effect size (e.g., difference between
means) or sample variability (e.g., variance). Following
this, a large number of samples are taken and the test
statistic is calculated each time (Oden, 1991). If the
effect size is manipulated to be zero (i.e., the null
hypothesis is true), the probability of committing a
type I error is estimated as the proportion of tests that
erroneously rejected the null hypothesis. If the effect
size is set to be different from zero, the proportion of
cases in which the null hypothesis was correctly re-
jected is used as an estimate of statistical power. A
comprehensive simulation study of this kind can pro-
vide a basis for understanding the behavior of any
particular test and for comparing different tests
(Peres-Neto and Olden, 2001). This aids in identifying
the most appropriate statistical test for a particular
scenario (i.e., combinations of factors that can influ-
ence the statistical test).
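The power and type I error logic described above can be sketched in a few lines. This is an illustrative toy assuming a simple two-sided z test with unit variance, not SPM's voxelwise GLM:

```python
import numpy as np

def monte_carlo_rejection_rate(effect_size, n=20, n_sims=2000, seed=0):
    """Proportion of simulated samples whose two-sided z test (sigma = 1,
    alpha = 0.05) rejects the null. With effect_size == 0 this estimates
    the type I error; with effect_size != 0 it estimates power."""
    rng = np.random.default_rng(seed)
    crit = 1.96  # two-sided 5% critical value of the standard normal
    means = rng.normal(effect_size, 1.0, size=(n_sims, n)).mean(axis=1)
    z = means * np.sqrt(n)
    return float(np.mean(np.abs(z) > crit))
```

With effect_size = 0 the rejection rate should hover near the nominal 0.05; with effect_size = 1 and n = 20 it approaches 1, mirroring the type I error versus power distinction drawn above.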
In the present study we used Monte Carlo simula-
tions to compare the robustness of 16 different SPM
models. These models were defined by several combi-
nations of four preprocessing parameters (see Table 1):
global scaling (remove global effects or not), low pass
filter (HRF or none), high pass filter (default cutoff or
none), modeling of temporal autocorrelation (AR1 or
none). To enhance the reliability of the study, datasets
were generated using two different simulation ap-
proaches: parametric and nonparametric. All simu-
lated data was spatially smoothed with a 10-mm full-
width half-maximum (FWHM) filter.
Two basis functions were initially evaluated: HRF
and HRF/TD. HRF/TD is aimed at correcting for occa-
sional delays in the onset of the hemodynamic re-
sponse. However, modeling 500 datasets from the nonparametric
simulation with HRF/TD reduced the power compared to HRF alone. Further
investigation of the efficiency of HRF/TD by generating
datasets with 0-, 1-, or 2-s delay using the parametric
approach, indicated that in fact, the effect of HRF/TD
varied with the duration of the delay. Altogether, these
results suggest that HRF/TD may be detrimental in
estimating changes in brain activity when applied to
the whole brain (see Results for details). Thus, HRF
was the only basis function tested for all models.
To study the performance of each parameter setting,
a total of 500 datasets were simulated based on T2*-
weighted Echo-Planar Images (EPI) obtained from five
subjects scanned with a GE 1.5T scanner during a
visual attention study (100 datasets per subject) (Gies-
brecht et al., 2000). Each real dataset consisted of 24
axial slices (64 × 64) of 180 volumes; voxel size = 3.8 ×
3.8 × 5.0 mm. For practical reasons, only 5 of the 24
slices were used to generate the simulated datasets.
We first generated the baseline activity of the simu-
lated datasets by using a first-order autoregressive
plus white-noise model derived empirically by Purdon
and Weisskoff (1998).
The model with additive white noise can be ex-
pressed as a recursive filter:
x[n] = [(1 − q) · w[n] + q · x[n − 1]] + v[n],
where w[n] and v[n] constitute the white noise, and q
represents the degree of correlation between adjacent
samples of the AR1 process. The AR1 component rep-
resents physiological and non-physiological low fre-
quency noise characteristic of fMRI time series, while
the white noise represents nonphysiological, scanner
noise. The value of q was set to 0.82, and the variances
of w[n] and v[n] were set to 1.16 and 3.52, respec-
tively. These values were chosen so that the temporal
autocorrelation of the simulated baseline was similar
to that found in a real fMRI time series with a TR of 3 s.
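Following the recursion and the parameter values above (q = 0.82, Var(w) = 1.16, Var(v) = 3.52), one voxel's baseline could be simulated as follows. This is a sketch that transcribes the printed equation literally, with the initial state taken as zero:

```python
import numpy as np

def ar1_baseline(n_scans=180, q=0.82, var_w=1.16, var_v=3.52, seed=0):
    """Baseline time series from the recursive filter
    x[n] = [(1 - q) * w[n] + q * x[n - 1]] + v[n],
    with w and v independent Gaussian white noise (x[-1] taken as 0)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, np.sqrt(var_w), n_scans)
    v = rng.normal(0.0, np.sqrt(var_v), n_scans)
    x = np.zeros(n_scans)
    prev = 0.0
    for n in range(n_scans):
        x[n] = (1.0 - q) * w[n] + q * prev + v[n]
        prev = x[n]
    return x
```

Because the innovation (1 − q)·w[n] + v[n] is itself white, the lag-1 autocorrelation of x approaches q for long series, which is what ties the simulated baseline to the autocorrelation of a real time series.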
Since the time series of each voxel was generated in-
dependently, the spatial autocorrelation present in the
real dataset was lost (Petersson et al., 1999). To solve
this problem, a Gaussian low-pass spatial filter with a
kernel size of 3 × 3 × 3 voxels was applied to all
simulated datasets. The kernel weight was determined
empirically, so that the standard deviation of the vox-
el's time series was similar to that of the original dataset.
Five “active” regions were defined for each subject
(Fig. 1a). Each region consisted of 3 × 3 × 2 voxels (i.e.,
11.4 × 11.4 × 10 mm). To simulate the signal, a hemo-
dynamic response function derived from Cohen and
collaborators (Cohen, 1997) was added to the baseline:

h(t) = t^8.6 e^(−t/0.547),
where t is time. Except for those datasets used to
examine the effect of HRF/TD, the response latency of
each time series was 1 s.
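The response function above can be evaluated directly; the peak of t^a e^(−t/b) falls at t = a·b ≈ 8.6 × 0.547 ≈ 4.7 s, a plausible BOLD peak latency. A sketch (the scaling to a 1% signal change is left out):

```python
import numpy as np

def cohen_hrf(t):
    """h(t) = t**8.6 * exp(-t / 0.547), t in seconds (Cohen, 1997)."""
    t = np.asarray(t, dtype=float)
    return t**8.6 * np.exp(-t / 0.547)

# Sample at the paper's TR of 3 s and normalize to unit peak,
# ready to be scaled and added to a baseline time series.
t = np.arange(0.0, 30.0, 3.0)
h = cohen_hrf(t)
h_unit = h / h.max()
```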
Simulated datasets were not spatially normalized to
avoid the introduction of additional artifacts from non-
linear warping. Instead, to maintain the same anatom-
ical coordinates of the "active" regions across the 500
datasets we followed these steps: we first defined the
five active regions from one subject's anatomical
image. Using AFNI (Medical College of Wisconsin; Cox,
1996) we determined the coordinates of the five regions
in Talairach coordinate space. The Talairach coordi-
nates were then used to identify the corresponding
locations in the native coordinate space of the other
four subjects' brains.
To enhance the reliability of the results, the robust-
ness of the 16 SPM models was tested on two possible
scenarios generated following an event-related experi-
mental design with either fixed or variable interstimu-
lus interval (ISI). The variable ISI simulated random
presentation of the stimuli.

Scenario 1: 1% signal change, constant ISI
Scenario 2: 1% signal change, variable ISI (mean ISI = 31 s)
These particular scenarios were chosen to represent
common cases found in the fMRI literature (e.g.,
D’Esposito et al., 1999; Hopfinger et al., 2000). For a
TABLE 1

[Table flattened in extraction. Columns: models 1–16; recoverable row labels: Remove global effects, High pass filter, Low pass filter (HRF); the x marks defining each model were lost.]

Note. x indicates the parameter settings that defined each of the 16 models tested for all simulated scenarios.
given scenario, the simulated percent signal change
was constant within a voxel's time series (thereby
simulating only one experimental condition) and
across spatial location (i.e., across the five "active" re-
gions). Figures 2a and 2b illustrate scenarios 1 and 2,
respectively, for one voxel of an "active" region. Both
scenarios were simulated using the same TR = 3 s.
Because the parametric approach described above
may not maintain the spatial and temporal structure of
real fMRI data, we designed an alternative nonpara-
metric approach where these features were kept. This
protocol was adapted from the parametric bootstrap
(Efron and Tibshirani, 1993), where samples are
drawn from a multivariate normal distribution with the
variance/covariance structure obtained from empirical
data. The original data used for this purpose was one
session of the visual attention fMRI study described
above. The experiment followed an event-related de-
sign with variable ISI (mean ISI = 26.4 s), where the
stimuli were presented semi-randomly; TR = 2 s, and
the average signal change was 1%. The simulation
protocol was as follows. Images were spatially normal-
ized so that the between-subjects standard deviation of
each scan could be computed. From the 10 subjects of
the original fMRI experiment, we selected those who
exhibited task-related changes for the comparison of
two conditions, “target” versus “cue” (for this purpose
the data was analyzed with SPM using HRF alone).
Five subjects who showed bilateral activation of the
occipital cortex at a corrected alpha of 0.05 were cho-
sen, and the data of each of them was designated as an
fMRI population. For each population, two areas (voxel
size = 3 × 3 × 2) from the occipital cortex were defined
as the "active" regions (Fig. 1b). The location of these
areas varied slightly (±3 voxels) across the five sub-
jects according to individual anatomical differences.

FIG. 1. Anatomical localization of "active" and "inactive" regions.
Shown are the t values obtained from one dataset of the parametric
(a) and nonparametric (b) simulations, overlaid on four horizontal
slices of a T1 image. The data displayed in the figure have not been
spatially smoothed. Selected regions are signaled with a white box.
"Active" regions are indicated with white arrows, whereas "inactive"
regions are indicated with yellow arrows.

FIG. 2. Simulated BOLD signal. The figure shows a portion of
the simulated time series corresponding to one "active" voxel for a
fixed (a) and a variable (b) interstimulus interval of the parametric
approach, and the nonparametric approach (c). Baseline noise is
indicated in black (solid line); 1% signal is indicated in red. Vertical
bars show the onset of the stimulus for each experimental design (for
practical reasons, only the onsets for the cue, but not for the tar-
get, are illustrated in Fig. 2c).
The standard deviation was computed for each scan of
the brain across the 10 subjects. One hundred datasets
were generated per population (total = 500) by adding
to each scan a normally distributed random error
based on the standard deviation of the corresponding
volume. The Pearson correlation between the time se-
ries of the simulated datasets and the population was
around 0.75. Although we used normally distributed
errors, this approach was considered as nonparametric
in the sense that signal and experimental design were
not manipulated. Our rationale for using the standard
deviation for the 10 initial subjects, instead of the one
based on the 5 subjects from whom the spatiotemporal
series were generated, was that they provided a better
estimate of the between-subjects error.
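A minimal sketch of this generation step (array names and shapes are illustrative assumptions, not from the paper): each simulated dataset is the 4-D population image plus Gaussian noise whose standard deviation varies by scan.

```python
import numpy as np

def simulate_from_population(population, sd_per_scan, n_datasets=100, seed=0):
    """Yield simulated datasets: population array of shape (x, y, z, scan)
    plus normally distributed error scaled by the between-subjects SD of
    the corresponding scan."""
    rng = np.random.default_rng(seed)
    assert sd_per_scan.shape == (population.shape[-1],)
    for _ in range(n_datasets):
        # sd_per_scan broadcasts over the trailing scan axis
        noise = rng.normal(size=population.shape) * sd_per_scan
        yield population + noise
```

Because signal and experimental design are left untouched and only measured between-subjects variability is injected, this step preserves the spatial and temporal structure of the original data, as the text notes.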
Estimation of Power and Type I Error
Once all simulated datasets were generated follow-
ing the two approaches, they were statistically ana-
lyzed using the SPM correction for multiple compari-
sons based on the theory of GRF (Adler, 1981; Worsley
et al., 1992). Corrected P values were obtained for all
voxels, but only the peak voxel of each “active” region
was kept for the estimation of power. The same crite-
rion was used to estimate the false positive rate (see
below).
Power and false positive rate were assessed at a
familywise alpha level (i.e., the alpha level obtained
TABLE 2

[Table flattened in extraction. Columns: models 1–16; surviving row fragments: Variable ISI (0.918), Nonparametric (0.916); the remaining power, false positive, and cluster size values were lost.]

Note. Shown are the power and number of false positives for the parametric and nonparametric simulations corresponding to the models defined in Table 1. Power and false positives were obtained by averaging across all datasets and across all active and inactive regions, respectively. Cluster size indicates the average number of voxels whose corrected P value was smaller than the familywise alpha level of 0.05 for all active regions. The first column indicates the simulation approach used to estimate power and number of false positives.
FIG. 3. Effect of using the time derivative of HRF on power
estimation. Average power estimates for models 1 and 2 (without
time derivative and with time derivative, respectively) are displayed
for the nonparametric and the parametric simulations. The latter
includes the results from using datasets with response latencies of 0,
1, or 2 s.

FIG. 4. Hemodynamic response functions (HRF). Shown are the
HRF generated for the parametric simulation (Cohen, 1997)
(clear blue), the same function with 1-s delay (black) and 2-s delay
(green), and the ideal HRF from SPM (pink). The HRF from the
nonparametric simulation (orange) is the average HRF estimated
from one "active" region of the occipital lobe.
after correcting for multiple comparisons) of 0.05.
Power was estimated as the ratio between the number
of times that a model yielded a significant outcome for
an “active” area and the total number of samples (true
positive rate) (n = 500). Because the familywise alpha
level obtained after adjusting for multiple comparisons
was extremely low (α ≈ 0.00005, as estimated by in-
terpolating the "threshold" t value and the degrees of
freedom reported by SPM into the t probability distri-
bution), we were not able to determine type I error with
our sample size (n = 500 datasets). In fact, around
20,000 simulated datasets would have been needed to
compute the type I error correctly. However, given that
the number of false positives (the number of times that
a model reported a significant outcome for an "inactive"
area) out of 500 datasets ranged from 0 to 3, we were not
too concerned about underestimating type I error rates
by using our current sample size. Thus, instead of
reporting the type I error, we showed the number of
false positives out of 500 datasets (Table 2). To assess
the number of false positives from the parametric sim-
ulations, five 3 × 3 × 2 "inactive" areas, i.e., areas
composed of voxels where no activation was added to
the baseline noise, were selected (Fig. 1a). Two "inac-
tive" regions of the same dimensions were designated
for the nonparametric datasets, from areas of the brain
where the t statistic was close to zero (Fig. 1b). Both
power and false positives were assessed from different
regions of the same datasets. The average number of
voxels that reached statistical significance for “active”
regions was quantified and is displayed on Table 2.
As with any other statistical measure, power esti-
mates are subject to random variation and therefore
confidence intervals are needed to compare differences
between models. The most common way of constructing
confidence intervals around power estimates generated
through Monte Carlo simulations is by assuming a
binomial distribution, because each individual sample
test contributes with one out of two possible mutually
exclusive outcomes (i.e., reject or accept the null hy-
pothesis). However, in cases where the sample varia-
tion within simulation scenarios (i.e., models) is
greater than random, a binomial approach would over-
estimate power differences across models. In the
present study, the large variation observed within and
between subjects suggests that using a binomial dis-
tribution would not be appropriate. An alternative ap-
proach, commonly used in the realms of robust estima-
tion (e.g., Dryden and Walker, 1999), is to construct
confidence intervals empirically by resampling the
original data (i.e., sample test probability values) a
large number of times and calculating the power for
each subset. The confidence interval is then con-
structed based on the variation around the values for
the subsets. These intervals will be influenced by the
sampling variability due to subjects and regions, thus
providing a more conservative approach for comparing
power between models. Our protocol for estimating
confidence intervals was as follows: (1) sample with
replacement 100 probability values out of the total
values available per model (i.e., 100 sample tests × 5
subjects × 5 regions for the parametric simulation, and
100 × 5 subjects × 2 regions for the nonparametric
simulation), using this subset to calculate power; (2)
repeat step 1, 1000 times; (3) based on 1000 values
generated in step 2, construct a 95% percentile confi-
dence interval. A sampling size of 100 probability val-
ues was chosen to estimate the confidence intervals
because it represented the smallest sample unit where
sampling variation was only due to chance (i.e., num-
ber of sample tests generated per subject). Because
type I error rates were generally smaller than the
specified alpha level for all scenarios, confidence inter-
vals for these estimates were not constructed.
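The resampling protocol above maps directly onto a short routine. A sketch, assuming the per-model probability values are simply collected into one array:

```python
import numpy as np

def bootstrap_power_ci(p_values, alpha=0.05, subset_size=100,
                       n_boot=1000, seed=0):
    """95% percentile confidence interval for power: resample `subset_size`
    p-values with replacement, compute the power of each subset (the
    proportion of values below alpha), repeat n_boot times, and take the
    2.5th and 97.5th percentiles of the resulting power values."""
    rng = np.random.default_rng(seed)
    p_values = np.asarray(p_values, dtype=float)
    powers = np.empty(n_boot)
    for b in range(n_boot):
        subset = rng.choice(p_values, size=subset_size, replace=True)
        powers[b] = np.mean(subset < alpha)
    return float(np.percentile(powers, 2.5)), float(np.percentile(powers, 97.5))
```

Because the resampled p-values carry the between-subject and between-region variability, the resulting interval is wider (more conservative) than one built from a binomial assumption, which is the motivation given above.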
RESULTS AND DISCUSSION
Overall, the results from our simulations indicate
that, despite particular differences, the parameter
combination yielding the most powerful results was
consistent across the four scenarios of the parametric
approach and the nonparametric approach. Specifi-
cally, the selection of HRF as the basis function and the
high-pass filter (models 1 and 3) were more efficient
than any of the other parameters in modeling the fMRI
data. Moreover, it is worth emphasizing that the pat-
tern of results obtained using the nonparametric ap-
proach resembled closely that from scenario 2 of the
parametric approach. Interestingly, although the spa-
tial and temporal structure for the two sets of simula-
tions was very different, their experimental design fol-
lowed a variable ISI. This finding is important as it
reinforces the generalizability of our work. Finally,
except for the effect of global scaling, the pattern of
results obtained for scenario 1 of the parametric sim-
ulation was similar to that obtained for scenario 2 and
the nonparametric simulation. However, regardless of
the SPM model, datasets generated with a fixed ISI
yielded less powerful results. This observation is con-
sistent with the results of a recent simulation study
indicating that event-related designs with fixed ISI are
less efficient for power detection than those where the
presentation of the stimulus is random or semi-random
(Liu et al., 2001). A detailed discussion concerning the
impact of each SPM preprocessing parameter on power
and type I error, follows below.
Basis Function: HRF Alone versus HRF/TD
As mentioned in the methods section, the optimal
basis function to model fMRI time series was deter-
mined before running all Monte Carlo simulations. To
decide whether HRF/TD would be evaluated as an-
other preprocessing parameter, the impact of HRF/TD
in modeling fMRI data was assessed by testing 500
nonparametric datasets with and without HRF/TD.
Figure 3 shows the power computed from averaging
the number of true positives across all regions for mod-
els 1 (with HRF alone) and 2 (HRF/TD). The results
indicate that power was drastically reduced when
HRF/TD was selected. The efficiency of this parameter
in correcting for differences in response latency was
further assessed using the parametric approach by
simulating 500 datasets with 0-, 1-, or 2-s delay in the
onset of the hemodynamic response curve. To make
these results comparable to those obtained using the
nonparametric approach, the data was generated ac-
cording to scenario 2 (variable ISI). The normalized
shape and time course of the hemodynamic response
curve corresponding to SPM, the parametric (with the
three delay conditions) and nonparametric simulations
are illustrated by Fig. 4. The outcomes, depicted in Fig.
3, indicate that including HRF/TD as an extra covariate in the GLM increased the power only for the 0-s
delay condition. However, it attenuated the power sig-
nificantly for datasets with a response latency of 1 s
and drastically for datasets with a response latency of
2 s. Together with the onsets displayed by Fig. 4, these
results served to explain why the power was so low
when nonparametric datasets were modeled using HRF/TD.
Further comparison of the parameter estimates
(beta coefficients), the variance, and the residuals for
HRF and HRF/TD, using SPM's test of statistical inference¹ (Worsley and Friston, 1995), helped us
understand the nature of these results. Table 3 displays
the results obtained for one active voxel of one dataset
of the parametric simulation (scenario 2) and the non-
parametric simulation. The addition of the time deriv-
ative decreased the curve fitting residuals, but also
increased the residual variance used for calculation of
the t statistic and hence did not always yield higher t
values. A higher t value was obtained for the 0-s delay
condition because the effect variance for the time series was significantly augmented by HRF/TD. Conversely, the slightly higher effect variance obtained for
the 1-s delay condition was not enough to overcome the
high residual variance associated with the inclusion of
the derivative, resulting in a lower t value. Finally, the
addition of the time derivative to model the 2-s delay
condition and the nonparametric time series yielded
a lower effect variance, drastically reducing both t values.
These findings are consistent with the results of our
simulations and support the observation that the effi-
cacy of the time derivative in accounting for delays in
the response onset depends on the magnitude of the
delay. Moreover, they suggest that improving the
model fitness does not always lead to higher power
estimates. Based on one real fMRI dataset, Hopfinger
and collaborators (2000) reported similar sensitivity for HRF and HRF/TD. Given the interchangeability
of the two basis functions, the authors suggested
using HRF/TD to account for occasional delays in the HRF.
Our results, however, indicate that depending on the
response latency, HRF/TD may drastically reduce the
power of the analysis. For these reasons, HRF/TD was
not considered as a parameter for further evaluation in this study.
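The latency effect discussed above can be sketched numerically. The snippet below is a rough illustration, not a reproduction of the study's pipeline: it uses a difference-of-gammas approximation to the canonical HRF (the parameter values are invented, not SPM99's exact ones), builds the HRF/TD basis, and fits a response delayed by 0, 1, or 2 s, showing how the first-order (Taylor) correction provided by the time derivative degrades as the delay grows.

```python
import numpy as np
from math import gamma

def double_gamma_hrf(t, p1=6.0, p2=16.0, ratio=1.0 / 6.0):
    """Difference-of-gammas HRF (a common approximation; parameters illustrative)."""
    t = np.clip(np.asarray(t, dtype=float), 0.0, None)  # response is zero before onset
    return (t ** (p1 - 1) * np.exp(-t)) / gamma(p1) \
        - ratio * (t ** (p2 - 1) * np.exp(-t)) / gamma(p2)

t = np.arange(0.0, 32.0, 0.5)       # 32-s window at 0.5-s resolution
h = double_gamma_hrf(t)             # canonical regressor
dh = np.gradient(h, t)              # temporal-derivative regressor
basis = np.column_stack([h, dh])    # HRF/TD basis for one event

r2 = {}
for delay in (0.0, 1.0, 2.0):       # the onset delays examined in the text
    y = double_gamma_hrf(t - delay)              # delayed hemodynamic response
    beta, *_ = np.linalg.lstsq(basis, y, rcond=None)
    resid = y - basis @ beta
    r2[delay] = 1.0 - (resid @ resid) / (y @ y)  # goodness of fit
print(r2)
```

The fit is exact at 0-s delay and worsens monotonically with the delay, mirroring the pattern of power loss reported above.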
Global Intensity Normalization: Scale or None?
Scaling the data by the global mean attenuated
power for those datasets with variable ISI (i.e., scenario 2 of the parametric simulation and the nonparametric simulation) (Fig. 5). This effect was particularly pronounced for all models of the nonparametric simulation.
¹ t values were computed according to the formula T = cb/(σ̂²c(G*ᵀG*)⁻¹G*ᵀVG*(G*ᵀG*)⁻¹cᵀ)^(1/2), where c represents the contrast
of interest, b is the parameter estimate, σ̂² is an unbiased estimator
of the error variance σ², V = KKᵀ, where K is a matrix whose rows
represent the hemodynamic response function, G* = KG, where
G is the design matrix, and ᵀ indicates the matrix transpose.
TABLE 3
Efficiency of HRF/TD in Adjusting for Delays in the Onset of HRF
Model 1    Model 2
Note. Shown are the t values obtained by dividing the effect variance by the residual variance according to SPM's formula (for details see
Discussion), T = cb/(σ̂²c(G*ᵀG*)⁻¹G*ᵀVG*(G*ᵀG*)⁻¹cᵀ)^(1/2), where the numerator is the effect variance and the denominator the residual
variance. The fitting residuals were obtained from fitting the time series of one active voxel with HRF (Model 1) and HRF/TD (Model 2). The
variables were measured from one dataset derived from a representative subject of each simulation approach. Parametric simulations
were generated with a response latency of either 0, 1, or 2 s.
ROBUSTNESS OF SPM PREPROCESSING PARAMETERS
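As a rough numerical illustration of the t statistic defined in the footnote, the sketch below builds a toy design (boxcar plus intercept), smooths model and data with a Gaussian convolution matrix K, and evaluates T = cb/(σ̂²c(G*ᵀG*)⁻¹G*ᵀVG*(G*ᵀG*)⁻¹cᵀ)^(1/2) with V = KKᵀ and G* = KG. The design, kernel width, effect size, and noise level are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 128                                        # number of scans (toy length)

# Toy design matrix G: alternating 16-scan boxcar plus intercept
box = ((np.arange(n) // 16) % 2 == 1).astype(float)
G = np.column_stack([box, np.ones(n)])

# Gaussian convolution (temporal smoothing) matrix K, rows normalized
lags = np.arange(n)
K = np.exp(-0.5 * ((lags[:, None] - lags[None, :]) / 2.0) ** 2)
K /= K.sum(axis=1, keepdims=True)

y = G @ np.array([1.5, 100.0]) + rng.normal(0.0, 1.0, n)  # synthetic time series

Gs = K @ G                                     # G* = KG
ys = K @ y                                     # smoothed data
V = K @ K.T                                    # V = K Kᵀ
pinv = np.linalg.pinv(Gs)
b = pinv @ ys                                  # parameter estimates
R = np.eye(n) - Gs @ pinv                      # residual-forming matrix
res = R @ ys
sigma2 = (res @ res) / np.trace(R @ V)         # unbiased error-variance estimate
c = np.array([1.0, 0.0])                       # contrast on the boxcar effect
GtG_inv = np.linalg.pinv(Gs.T @ Gs)
var_cb = sigma2 * (c @ GtG_inv @ Gs.T @ V @ Gs @ GtG_inv @ c)
T = (c @ b) / np.sqrt(var_cb)
print(round(float(T), 2))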
The use of global scaling to process neuroimaging
data remains controversial. Global signals, i.e., varia-
tions in signal that are common to the entire brain
volume, were initially considered to represent the un-
derlying background to regional changes in activity
(Ramsay et al., 1993). When the global mean is inde-
pendent of the experimental condition, scaling by the
grand mean can be beneficial because it reduces inter-
subject variability thereby improving the sensitivity at
the group level of analysis (McIntosh et al., 1996;
Aguirre et al., 1998). However, there is likely little
benefit for the analysis of single subjects. This hypoth-
esis is consistent with our results. The reason why
global scaling was only detrimental to datasets with
variable ISI is, however, unclear.
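One candidate mechanism can be demonstrated with a toy dataset (all sizes and amplitudes invented): when active voxels raise the per-scan global mean, proportional scaling divides the signal by a task-dependent quantity and thereby subtracts part of the effect of interest.

```python
import numpy as np

rng = np.random.default_rng(1)
n_scans, n_vox = 120, 50
task = np.tile(np.r_[np.zeros(10), np.ones(10)], 6)   # alternating rest/task blocks

data = 100.0 + rng.normal(0.0, 1.0, (n_scans, n_vox))
data[:, :5] += 3.0 * task[:, None]                    # 5 "active" voxels, ~3% signal

g = data.mean(axis=1)                                 # per-scan global mean
scaled = data / g[:, None] * 100.0                    # proportional scaling to a grand mean of 100

def task_effect(d):
    v = d[:, 0]                                       # one active voxel
    return v[task == 1].mean() - v[task == 0].mean()

raw_effect = task_effect(data)
scaled_effect = task_effect(scaled)
print(raw_effect, scaled_effect)
```

Because the task raises the global mean during active blocks, the scaled effect comes out smaller than the raw effect, consistent with the power loss observed for scaled datasets.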
Temporal Filtering: High Pass Filter, Low Pass
Filter, and Autoregressive Models
Due to the serial dependency of the physiological and
nonphysiological components of the noise, fMRI time series
violate the independence of the error term, one of the
assumptions of the general linear model. This colored
noise represents a problem for inference based on time
series regression parameters (Bullmore et al., 2001),
and thus should be controlled for. Uncorrelated low-
frequency noise can be removed by using a high-pass
filter. Colored high-frequency noise can be either
treated with a low-pass filter that convolves the time
series with a Gaussian filter of the width of HRF (Fris-
ton et al., 1995b; Worsley and Friston, 1995; Zarahn et
al., 1997) or removed by using an autoregressive model
(Friston et al., 2000). In a recent paper, Friston and
collaborators (Friston et al., 2000) reported that modeling unwanted frequency components by a combination of a high-pass and a low-pass filter (temporal smoothing) provided good parameter estimation for the GLM
while protecting against inferential bias. Using a first-order
autoregressive model (AR1) was more efficient for parameter estimation but significantly increased inferential bias.
Our results comparing the confidence intervals for
all simulations (Fig. 5) indicate that the use of a high-pass filter set at the default cutoff may be beneficial
depending on the experimental design. Indeed, although
no obvious improvement was observed for those simu-
lations with variable ISI, those with fixed ISI showed
higher power when the high-pass filter was included
(see Fig. 5a, models 3 and 4). Nevertheless, it is important to keep in mind that the benefit of a high-pass filter
will depend on the amount of low-frequency noise,
which varies with the scanner.
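High-pass filtering of this kind can be sketched with a discrete cosine basis, the approach SPM99 itself takes; the drift shape, cutoff, and task-band signal below are invented for illustration, and slow components are removed by projecting them out of the time series.

```python
import numpy as np

n, TR, cutoff = 200, 2.0, 128.0                 # scans, TR (s), high-pass cutoff (s)
scan = np.arange(n)
t = scan * TR

signal = np.sin(2 * np.pi * t / 20.0)           # task-band component (20-s period)
drift = 0.8 * np.sin(2 * np.pi * t / 300.0)     # slow scanner drift (300-s period)
y = signal + drift

# Discrete cosine regressors spanning periods longer than the cutoff
order = int(np.floor(2.0 * n * TR / cutoff)) + 1
X = np.column_stack(
    [np.cos(np.pi * k * (2 * scan + 1) / (2.0 * n)) for k in range(order)]
)

# Project the low-frequency subspace out of the data
y_filtered = y - X @ (np.linalg.pinv(X) @ y)
```

The drift, whose period exceeds the cutoff, is largely removed, while the faster task-band component survives nearly intact.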
The efficacy of using the low-pass filter or AR1 to
model temporal autocorrelation should be evaluated in
relation with the type I error rates, which varied with
the experimental design (i.e., fixed or variable ISI) and
the ISI. Although the number of false positives obtained for the parametric simulations with fixed ISI
was 0 of 500 datasets for all models, those obtained for
the parametric simulations with variable ISI were
higher for the models with higher power. Although at
first glance the average number of false positives for
the first three models does not appear particularly high
for a nominal alpha of 0.05 (Table 2: 3.4, 2, and 2.2 of 500
for models 1, 2, and 3, respectively), they are certainly
much higher than the familywise alpha resulting after
correction for multiple comparisons, i.e., around 5 E
FIG. 5. Power estimates of SPM models. Shown are the confidence intervals for all 16 models, corresponding to scenarios 1 (a) and
2 (b) of the parametric approach and the nonparametric approach (c)
(n = 500). Confidence intervals were obtained from spatially
smoothed datasets with a 10-mm FWHM.
Note that the implementation of the low-pass filter or
AR1 reduced the number of false positives (see models
5 to 16 from Table 2), suggesting that they were effi-
cient in modeling serial autocorrelations. However, the
number of false positives for the nonparametric simu-
lation, where stimuli were also presented with a vari-
able ISI, was not high (only 1 false positive was ob-
tained for model 3, whereas the rest had no false
positives). We think that this difference may have orig-
inated in the length of the ISI. Figure 2 indicates the
ISI between some of the stimuli of the parametric
simulation was very short (as short as 3 or 6 s on
occasion), whereas the minimum ISI for the nonparametric simulation was 8 s (the minimum ITI was also 8 s).
Although the inactive areas lacked any activation, the
regression model used to fit the fMRI time series for
the inactive voxels was determined by the stimuli on-
sets of the active regions. We hypothesize that a re-
gression model specified according to the onsets dis-
played by Fig. 2b (corresponding to the parametric
simulation with variable ISI) would result in a better
fit for the noise of the inactive voxels than that speci-
fied according to the onsets displayed by Fig. 2c (corresponding to the nonparametric simulation, also with
variable ISI). As a result, the number of false positives
occurring in those areas would increase for the model
specified by the parametric simulation with variable
ISI but not as much for the nonparametric simulation
with longer ISI. That was, in fact, the pattern obtained
for the type I errors (Table 2). We confirmed our hypothesis by running 500 additional parametric simulations with variable ISI in which we increased the ISI
to at least 8 s; these simulations showed a reduction in type I error
with no changes in the power estimates (data not shown).
Based on these findings, we conclude that the effi-
ciency of the low-pass filter and AR1 in modeling serial
autocorrelations depends on the ISI. In our case, the
relatively low incidence of false positives associated
with the most powerful models (1 and 3), suggest that
the implementation of a low-pass filter or AR1 is not
necessary for valid inference. However, when dealing
with fMRI time series where stimuli are presented
close together, such as in rapid presentation event-
related designs, the risk of inferential bias would increase. In those cases, the use of AR1 is recommended
over the low-pass filter, as it appears to decrease the
number of false positives while maintaining a relatively high power (see model 9 from Table 2).
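A first-order version of such autoregressive handling can be sketched as follows: estimate the lag-1 autocorrelation of the noise and prewhiten by subtracting the scaled previous sample. This is a minimal illustration of the AR(1) idea, not SPM's exact estimation scheme; the AR coefficient and series length are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho_true = 1000, 0.4           # series length and AR(1) coefficient (invented)

# Simulate AR(1) noise: x_t = rho * x_{t-1} + e_t
e = rng.normal(size=n)
x = np.empty(n)
x[0] = e[0]
for i in range(1, n):
    x[i] = rho_true * x[i - 1] + e[i]

# Estimate rho from the lag-1 autocorrelation of the series
rho_hat = np.corrcoef(x[:-1], x[1:])[0, 1]

# Prewhiten: w_t = x_t - rho_hat * x_{t-1}
w = x[1:] - rho_hat * x[:-1]
lag1_after = np.corrcoef(w[:-1], w[1:])[0, 1]
print(rho_hat, lag1_after)
```

After prewhitening, the residual lag-1 autocorrelation is close to zero, so the error term better satisfies the independence assumption of the GLM.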
CONCLUSIONS
Our main goal was to conduct a simulation study
in which differences in the performance of SPM preprocessing parameters could be contrasted and revealed. Several conclusions can be extracted from assessing the
robustness of the SPM preprocessing parameters using
simulated fMRI datasets. It is important to keep in
mind that these conclusions are based on the scenarios
considered in this study for individual-subject analysis
and thus may not apply to all fMRI experiments. However, we have adopted a framework that is sufficiently
general to guide SPM users in assessing the robustness
of other datasets or scenarios that may be more appropriate to their specific questions.
To begin, given the inconsistencies associated with
the use of HRF/TD, it would seem wise to use HRF over
HRF/TD to model fMRI data. The use of an HRF that
is empirically derived for each voxel, rather than a
canonical HRF, may prove to be the best overall solution to the discrepancy. The use of the high-pass filter
is recommended when analyzing fMRI datasets with
fixed ISI; no improvement in power over the use of
HRF alone was evident when it was applied to datasets with
variable ISI. The use of global scaling for individual
analysis should be avoided since it can significantly
reduce the power, in particular for datasets with variable ISI.
Finally, given that both the low-pass filter and the
first-order autoregressive model decrease power, their use is
recommended only for fMRI datasets with
short ISI (<8 s), which are more susceptible to inferential bias. In those cases, AR1 appears to be more
efficient than the low-pass filter in that it controls the incidence
of false positives while maintaining a relatively high
power. The effect of using the Gaussian filter, alternative hemodynamic response functions, and a putatively
more efficient high-pass filter remains to be tested.
ACKNOWLEDGMENTS
The first two authors of the paper contributed equally to this
work. We thank Barry Giesbrecht and George R. Mangun for providing us with the fMRI data used to generate the Monte Carlo
simulations. The computer code may be obtained by contacting Dr.
Wilkin Chau at email@example.com. We are also grateful
to Craig Easdon for his helpful comments on our manuscript. Funded
by grants from the Natural Sciences and Engineering Research Council and the Canadian Institutes of Health Research held by A. R. McIntosh.
REFERENCES
Adler, R. J. 1981. The Geometry of Random Fields. Wiley, New York.
Aguirre, G. K., Zarahn, E., and D'Esposito, M. 1998. The inferential impact of global signal covariates in functional neuroimaging analyses. NeuroImage 8(3): 302–306.
Cohen, J. 1988. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. L. Erlbaum, Hillsdale, NJ.
Cohen, M. S. 1997. Parametric analysis of fMRI data using linear systems methods. NeuroImage 6: 93–103.
Cox, R. W. 1996. AFNI: Software for analysis and visualization of functional magnetic resonance neuroimages. Comput. Biomed. Res. 29(3): 162–173.
D'Esposito, M., Zarahn, E., and Aguirre, G. K. 1999. Event-related functional MRI: Implications for cognitive psychology. Psychol. Bull. 125(1): 155–164.
Dryden, I. L., and Walker, G. 1999. Highly resistant regression and object matching. Biometrics 55: 820–825.
Efron, B., and Tibshirani, R. J. 1993. An Introduction to the Bootstrap. Chapman & Hall.
Friston, K. J., Jezzard, P., and Turner, R. 1994. Analysis of functional MRI time series. Hum. Brain Mapp. 1: 153–171.
Friston, K. J., et al. 1995. Statistical parametric maps in functional imaging: A general linear approach. Hum. Brain Mapp. 2: 189–210.
Friston, K. J., Frith, C. D., Turner, R., and Frackowiak, R. S. 1995a. Characterizing evoked hemodynamics with fMRI. NeuroImage.
Friston, K. J., Holmes, A. P., Poline, J. B., Grasby, P. J., Williams, S. C., Frackowiak, R. S., and Turner, R. 1995b. Analysis of fMRI time-series revisited. NeuroImage 2(1): 45–53.
Friston, K. J., Josephs, O., Zarahn, E., Holmes, A. P., Rouquette, S., and Poline, J. 2000. To smooth or not to smooth? Bias and efficiency in fMRI time-series analysis. NeuroImage 12(2): 196–208.
Giesbrecht, B., Woldorff, M. G., Fichtenholtz, H. M., and Mangun, G. R. 2000. Isolating the neural mechanisms of spatial and nonspatial attentional control. 30th Annual Meeting of the Society for Neuroscience, New Orleans, LA.
Holmes, A. P., Josephs, O., Büchel, C., and Friston, K. J. 1997. Statistical modeling of low-frequency confounds in fMRI. Proceedings of the 3rd International Conference on Functional Mapping of the Human Brain, S480.
Hopfinger, J. B., Büchel, C., Holmes, A. P., and Friston, K. J. 2000. A study of analysis parameters that influence the sensitivity of event-related fMRI analyses. NeuroImage 11(4): 326–333.
Jezzard, P., and Song, A. W. 1996. Technical foundations and pitfalls of clinical fMRI. NeuroImage 4(3 Pt 3): 63–75.
Liu, T. T., Frank, L. R., Wong, E. C., and Buxton, R. B. 2001. Detection power, estimation efficiency, and predictability in event-related fMRI. NeuroImage 13: 759–773.
McIntosh, A. R., Grady, C. L., Haxby, J. V., Maisog, J. Ma., Horwitz, B., and Clark, C. M. 1996. Within-subject transformations of PET regional cerebral blood flow data: ANCOVA, ratio, and Z-score adjustments on empirical data. Hum. Brain Mapp. 4: 93–102.
Oden, N. L. 1991. Allocation of effort in Monte Carlo simulation for power of permutation tests. J. Am. Stat. Assoc. 86: 1074–1076.
Peres-Neto, P., and Olden, J. D. 2001. Assessing the robustness of randomization tests: Examples from behavioural studies. Anim. Behav. 61: 79–86.
Petersson, K. M., Nichols, T. E., Poline, J.-B., and Holmes, A. P. 1999. Statistical limitations in functional neuroimaging II. Signal detection and statistical inference. Philos. Trans. R. Soc. Lond. B.
Purdon, P. L., and Weisskoff, R. M. 1998. Effect of temporal autocorrelation due to physiological noise and stimulus paradigm on voxel-level false-positive rates in fMRI. Hum. Brain Mapp. 6(4).
Ramsay, S. C., Murphy, K., Shea, S. A., Friston, K. J., Lammertsma, A. A., Clark, J. C., Adams, L., Guz, A., and Frackowiak, R. S. 1993. Changes in global cerebral blood flow in humans: Effect on regional cerebral blood flow during a neural activation task. J. Physiol. 471: 521–534.
Stephens, M. A. 1974. EDF statistics for goodness of fit and some comparisons. J. Am. Stat. Assoc. 69: 730–737.
Thomas, L., and Krebs, C. 1997. A review of statistical power analysis software. Bull. Ecol. Soc. Am. 78: 126–140.
Turner, R., Howseman, A., Rees, G. E., Josephs, O., and Friston, K. 1998. Functional magnetic resonance imaging of the human brain: Data acquisition and analysis. Exp. Brain Res. 123(1–2): 5–12.
Worsley, K. J., Evans, A. C., Marrett, S., and Neelin, P. 1992. A three-dimensional statistical analysis for CBF activation studies in human brain. J. Cereb. Blood Flow Metab. 12(6): 900–918.
Worsley, K. J., and Friston, K. J. 1995. Analysis of fMRI time-series revisited—Again. NeuroImage 2(3): 173–181.
Worsley, K. J., Marrett, S., Neelin, P., Vandal, A. C., Friston, K. J., and Evans, A. C. 1996. A unified statistical approach for determining significant signals in images of cerebral activation. Hum. Brain Mapp. 4: 58–73.
Zarahn, E., Aguirre, G. K., and D'Esposito, M. 1997. Empirical analyses of BOLD fMRI statistics. I. Spatially unsmoothed data collected under null-hypothesis conditions. NeuroImage 5(3): 179–197.