Test-retest and between-site reliability in a multicenter fMRI study

Department of Psychiatry and Human Behavior, University of California Irvine, Irvine, California, USA.
Human Brain Mapping (Impact Factor: 5.97). 08/2008; 29(8):958-72. DOI: 10.1002/hbm.20440
Source: PubMed


In the present report, estimates of test-retest and between-site reliability of fMRI assessments were produced in the context of a multicenter fMRI reliability study (FBIRN Phase 1, www.nbirn.net). Five subjects were scanned on 10 MRI scanners on two occasions. The fMRI task was a simple block design sensorimotor task. The impulse response functions to the stimulation block were derived using an FIR-deconvolution analysis with FMRISTAT. Six functionally-derived ROIs covering the visual, auditory and motor cortices, created from a prior analysis, were used. Two dependent variables were compared: percent signal change and contrast-to-noise ratio. Reliability was assessed with intraclass correlation coefficients derived from a variance components analysis. Test-retest reliability was high, but initially, between-site reliability was low, indicating a strong contribution from site and site-by-subject variance. However, a number of factors that can markedly improve between-site reliability were uncovered, including increasing the size of the ROIs, adjusting for smoothness differences, and inclusion of additional runs. By employing multiple steps, between-site reliability for 3T scanners was increased by 123%. Dropping one site at a time and assessing reliability can be a useful method of assessing the sensitivity of the results to particular sites. These findings should provide guidance to others on the best practices for future multicenter studies.
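
As a rough illustration of the two ideas named in the abstract (an intraclass correlation coefficient and a drop-one-site sensitivity check), the sketch below computes a standard two-way random-effects, absolute-agreement ICC (Shrout and Fleiss ICC(2,1)) from a subjects-by-sites table. This is not the variance components model used in the study, which also involved repeated visits and site-by-subject terms. The function names and the toy numbers are hypothetical and chosen only to show how a single offset site can depress between-site agreement.

```python
from statistics import mean


def icc_absolute_agreement(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    data: list of rows, one row per subject, one value per site
    (e.g. percent signal change in one ROI). Assumes a complete table
    with at least two subjects, two sites, and some variance.
    """
    n = len(data)       # number of subjects
    k = len(data[0])    # number of sites
    grand = mean(v for row in data for v in row)
    subj_means = [mean(row) for row in data]
    site_means = [mean(row[j] for row in data) for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication.
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_site = n * sum((m - grand) ** 2 for m in site_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_err = ss_total - ss_subj - ss_site

    ms_subj = ss_subj / (n - 1)
    ms_site = ss_site / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    denom = ms_subj + (k - 1) * ms_err + k * (ms_site - ms_err) / n
    return (ms_subj - ms_err) / denom


def drop_one_site(data):
    """Recompute the ICC with each site (column) left out in turn."""
    k = len(data[0])
    return {
        j: icc_absolute_agreement(
            [[v for i, v in enumerate(row) if i != j] for row in data]
        )
        for j in range(k)
    }


# Hypothetical toy data: 3 subjects x 3 sites; the third site reads 5 units high.
toy = [[1, 1, 6], [2, 2, 7], [3, 3, 8]]
full_icc = icc_absolute_agreement(toy)
by_dropped_site = drop_one_site(toy)
```

In this toy table the subjects are ranked identically at every site, so the low full-sample ICC is driven entirely by the site offset; leaving out the third site restores perfect agreement, which is the kind of pattern a drop-one-site analysis is meant to expose.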

Available from: Jessica Ann Turner
    • "This refers to the stability of estimates obtained when applying the model to multiple datasets over time, acquired under the same condition in the same subject. While reliability has been investigated frequently in the context of conventional fMRI activation (e.g., Aron et al., 2006; Brandt et al., 2013; Fliessbach et al., 2010; Friedman et al., 2008; Loubinoux et al., 2001; Plichta et al., 2012; Raemaekers et al., 2007) and functional connectivity studies (e.g., Braun et al., 2012; Birn et al., 2014), the reliability of DCM estimates has received less attention. To date, only two studies have addressed test-retest reliability in the context of DCM for fMRI. "
    ABSTRACT: Dynamic causal modeling (DCM) is a Bayesian framework for inferring effective connectivity among brain regions from neuroimaging data. While the validity of DCM has been investigated in various previous studies, the reliability of DCM parameter estimates across sessions has been examined less systematically. Here, we report results of a software comparison with regard to test-retest reliability of DCM for fMRI, using a challenging scenario where complex models with many parameters were applied to relatively few data points. Specifically, we examined the reliability of different DCM implementations (in terms of the intra-class correlation coefficient, ICC) based on fMRI data from 35 human subjects performing a simple motor task in two separate sessions, one month apart. We constructed DCMs of motor regions with fair to excellent reliability of conventional activation measures. Using classical DCM (cDCM) in SPM5, we found that the test-retest reliability of DCM results was high, both concerning the model evidence (ICC = 0.94) and the model parameter estimates (median ICC = 0.47). However, when using a more recent DCM version (DCM10 in SPM8), test-retest reliability was reduced notably. Analyses indicated that, in our particular case, the prior distributions played a crucial role in this change in reliability across software versions. Specifically, when using cDCM priors for model inversion in DCM10, this not only restored reliability but yielded even better results than in cDCM. Analyzing each component of the objective function in DCM, we found a selective change in the reliability of posterior mean estimates. This suggests that tighter regularization afforded by cDCM priors reduces the possibility of local extrema in the objective function. We conclude this paper with an outlook to ongoing developments for overcoming the software-dependency of reliability observed in this study, including global optimization and empirical Bayesian procedures.
    NeuroImage 05/2015; 117. DOI:10.1016/j.neuroimage.2015.05.040 · 6.36 Impact Factor
    • "For example, if there are significant differences in the ratio of case versus control participants across sites in a multisite study, adjusting for site in the analysis may not be sufficient to eliminate all site effects. In such circumstances, assessment of reliability at an absolute level would inform the extent to which data are interchangeable across sites and thus the extent to which merging fMRI data across sites is valid [Friedman et al., 2008]. The most appropriate reliability measure, therefore, depends on study design and the research question at hand. "
    ABSTRACT: Multisite neuroimaging studies can facilitate the investigation of brain-related changes in many contexts, including patient groups that are relatively rare in the general population. Though multisite studies have characterized the reliability of brain activation during working memory and motor functional magnetic resonance imaging tasks, emotion processing tasks, pertinent to many clinical populations, remain less explored. A traveling participants study was conducted with eight healthy volunteers scanned twice on consecutive days at each of the eight North American Longitudinal Prodrome Study sites. Tests derived from generalizability theory showed excellent reliability in the amygdala ( = 0.82), inferior frontal gyrus (IFG; = 0.83), anterior cingulate cortex (ACC; = 0.76), insula ( = 0.85), and fusiform gyrus ( = 0.91) for maximum activation and fair to excellent reliability in the amygdala ( = 0.44), IFG ( = 0.48), ACC ( = 0.55), insula ( = 0.42), and fusiform gyrus ( = 0.83) for mean activation across sites and test days. For the amygdala, habituation ( = 0.71) was more stable than mean activation. In a second investigation, data from 111 healthy individuals across sites were aggregated in a voxelwise, quantitative meta-analysis. When compared with a mixed effects model controlling for site, both approaches identified robust activation in regions consistent with expected results based on prior single-site research. Overall, regions central to emotion processing showed strong reliability in the traveling participants study and robust activation in the aggregation study. These results support the reliability of blood oxygen level-dependent signal in emotion processing areas across different sites and scanners and may inform future efforts to increase efficiency and enhance knowledge of rare conditions in the population through multisite neuroimaging paradigms. Hum Brain Mapp, 2015. © 2015 Wiley Periodicals, Inc.
    Human Brain Mapping 03/2015; 36(7). DOI:10.1002/hbm.22791 · 5.97 Impact Factor
    • "The nature of the in-scanner task and the analysis model have an important effect on reliability [Bennett and Miller, 2013; Caceres et al., 2009; Clement and Belleville, 2009]. Whereas reliability for some simpler sensorimotor tasks may be very high in neocortical regions [Friedman et al., 2008; Loubinoux et al., 2001; Raemaekers et al., 2007], memory fMRI studies "
    ABSTRACT: fMRI is increasingly implemented in the clinic to assess memory function. There are multiple approaches to memory fMRI, but limited data on advantages and reliability of different methods. Here, we compared effect size, activation lateralisation, and between-sessions reliability of seven memory fMRI protocols: Hometown Walking (block design), Scene encoding (block design and event-related design), Picture encoding (block and event-related), and Word encoding (block and event-related). All protocols were performed on three occasions in 16 patients with temporal lobe epilepsy (TLE). Group T-maps showed activity bilaterally in medial temporal lobe for all protocols. Using ANOVA, there was an interaction between hemisphere and seizure-onset lateralisation (P = 0.009) and between hemisphere, protocol and seizure-onset lateralisation (P = 0.002), showing that the distribution of memory-related activity between left and right temporal lobes differed between protocols and between patients with left-onset and right-onset seizures. Using the voxelwise intraclass correlation coefficient, between-sessions reliability was best for Hometown and Scenes (block and event). The between-sessions spatial overlap of activated voxels was also greatest for Hometown and Scenes. Lateralisation of activity between hemispheres was most reliable for Scenes (block and event) and Words (event). Using receiver operating characteristic analysis to explore the ability of each fMRI protocol to classify patients as left-onset or right-onset TLE, only the Words (event) protocol achieved a significantly above-chance classification of patients at all three sessions. We conclude that the Words (event) protocol shows the best combination of between-sessions reliability of the distribution of activity between hemispheres and reliable ability to distinguish between left-onset and right-onset patients. Hum Brain Mapp, 2015. © 2015 The Authors Human Brain Mapping Published by Wiley Periodicals, Inc.
    Human Brain Mapping 03/2015; 36(4). DOI:10.1002/hbm.22726 · 5.97 Impact Factor