Mapping reliability in multicenter MRI: Voxel-based morphometry and cortical thickness

Department of Psychiatry, Rudolf Magnus Institute of Neuroscience, University Medical Center Utrecht, Utrecht, The Netherlands.
Human Brain Mapping (Impact Factor: 5.97). 12/2010; 31(12):1967-82. DOI: 10.1002/hbm.20991
Source: PubMed


Multicenter structural MRI studies can have greater statistical power than single-center studies. However, across-center differences in contrast sensitivity, spatial uniformity, etc., may lead to tissue classification or image registration differences that could reduce or wholly offset the enhanced statistical power of multicenter data. Prior work has validated volumetric multicenter MRI, but robust methods for assessing reliability and power of multisite analyses with voxel-based morphometry (VBM) and cortical thickness measurement (CORT) are not yet available. We developed quantitative methods to investigate the reproducibility of VBM and CORT to detect group differences and estimate heritability when MRI scans from different scanners running different acquisition protocols in a multicenter setup are included. The method produces brain maps displaying information such as lowest detectable effect size (or heritability) and effective number of subjects in the multicenter study. We applied the method to a five-site multicenter calibration study using scanners from four different manufacturers, running different acquisition protocols. The reliability maps showed an overall good comparability between the sites, providing a reasonable gain in sensitivity in most parts of the brain. In large parts of the cerebrum and cortex, scan pooling improved heritability estimates, with "effective-N" values up to the theoretical maximum. For some areas, "optimal-pool" maps indicated that leaving out a site would give better results. The reliability maps also reveal which brain regions are in any case difficult to measure reliably (e.g., around the thalamus). These tools will facilitate the design and analysis of multisite VBM and CORT studies for detecting group differences and estimating heritability.

    • "T1-weighted (T1w) MRIs were acquired twice on 40 healthy control subjects at one of four different imaging locations, using three different brands of magnet and software (General Electric (GE), Philips, and Siemens) with standardized acquisition protocols. This sample size (N = 40), is comparable to ([Jovicich et al., 2013], N = 40) or larger (N = 6–30 [Desikan et al., 2006b; Dickerson et al., 2008; Han et al., 2006; Schnack et al., 2010; Wonderlick et al., 2009]) than the other studies that analyzed the test–retest reliability, except two recent studies (N = 1205 [Tustison et al., 2014]; N = 189 [Liem et al., 2015]). "
    ABSTRACT: In the last decade, many studies have used automated processes to analyze magnetic resonance imaging (MRI) data such as cortical thickness, which is one indicator of neuronal health. Due to the convenience of image processing software (e.g., FreeSurfer), standard practice is to rely on automated results without performing visual inspection of intermediate processing. In this work, structural MRIs of 40 healthy controls who were scanned twice were used to determine the test-retest reliability of FreeSurfer-derived cortical measures in four groups of subjects: the 25 that passed visual inspection (approved), the 15 that failed visual inspection (disapproved), a combined group, and a subset of 10 subjects (Travel) whose test and retest scans occurred at different sites. Test-retest correlation (TRC), intraclass correlation coefficient (ICC), and percent difference (PD) were used to measure the reliability in the Destrieux and Desikan-Killiany (DK) atlases. In the approved subjects, the reliability values for cortical thickness/surface area/volume (DK atlas only) were: TRC (0.82/0.88/0.88), ICC (0.81/0.87/0.88), PD (0.86/1.19/1.39), which represent a significant improvement over these measures when disapproved subjects are included. Travel subjects' results show that cortical thickness reliability is more sensitive to site differences than the cortical surface area and volume. To determine the effect of visual inspection on sample size required for studies of MRI-derived cortical thickness, the number of subjects required to show group differences was calculated. Significant differences observed across imaging sites, between visually approved/disapproved subjects, and across regions with different sizes suggest that these measures should be used with caution. Hum Brain Mapp, 2015. © 2015 Wiley Periodicals, Inc.
    Human Brain Mapping 05/2015; 36(9). DOI:10.1002/hbm.22856 · 5.97 Impact Factor
    • "The sample size was too small to permit separate analyses by site. However, prior studies using VBM with larger samples have shown that multi-site acquisitions result in reliable findings (Bendfeldt et al. 2012; Schnack et al. 2010). The authors worked closely with the radiology staff at each site to duplicate the imaging sequences across different platforms, and both human and machine phantoms were used to insure reliability during the study. "
    ABSTRACT: This study examined the associations among brain volumes, theory of mind (ToM), peer relationships, and psychosocial adjustment in children with traumatic brain injury (TBI). Participants included 8- to 13-year-old children, 82 with TBI and 61 with orthopedic injuries (OIs). Children completed three measures of ToM. Classmates provided ratings of participants’ peer relationships, acceptance, and friendships. Parents rated children’s psychosocial adjustment. MRI was used to determine brain volumes. Brain volumes were associated with ToM, which in turn was associated with peer rejection/victimization. Peer rejection/victimization in the classroom was associated with peer acceptance, friendship, social withdrawal, and general psychopathology. Brain volumes, ToM, peer relationships, and social adjustment show significant links among children with TBI and those with OI. The findings support a multilevel model of social competence in childhood TBI.
    01/2014; 2(1):97–107. DOI:10.1177/2167702613499734
    • "The use of automated segmentation algorithms is desirable, as these algorithms are (i) much faster than manual segmentations and (ii) user independent, that is, they do not depend on expert knowledge in neuroanatomy. However, significant challenges exist as differences in brain structure between groups, or changes within subjects are often very subtle (please see, e.g., [15], [16]). Therefore, it is crucially important that (i) automated segmentation algorithms are able to precisely determine the exact amount of, for example, gray matter tissue in an MRI image (cf. "
    ABSTRACT: Automated gray matter segmentation of magnetic resonance imaging data is essential for morphometric analyses of the brain, particularly when large sample sizes are investigated. However, although detection of small structural brain differences may fundamentally depend on the method used, both accuracy and reliability of different automated segmentation algorithms have rarely been compared. Here, performance of the segmentation algorithms provided by SPM8, VBM8, FSL and FreeSurfer was quantified on simulated and real magnetic resonance imaging data. First, accuracy was assessed by comparing segmentations of 20 simulated and 18 real T1 images with corresponding ground truth images. Second, reliability was determined in ten T1 images from the same subject and in ten T1 images of different subjects scanned twice. Third, the impact of preprocessing steps on segmentation accuracy was investigated. VBM8 showed a very high accuracy and a very high reliability. FSL achieved the highest accuracy but demonstrated poor reliability, and FreeSurfer showed the lowest accuracy but high reliability. A universally valid recommendation on how to implement morphometric analyses is not warranted due to the vast number of scanning and analysis parameters. However, our analysis suggests that researchers can optimize their individual processing procedures with respect to final segmentation quality and exemplifies adequate performance criteria.
    PLoS ONE 09/2012; 7(9):e45081. DOI:10.1371/journal.pone.0045081 · 3.23 Impact Factor