Jovicich J, Czanner S, Han X, Salat D, van der Kouwe A, Quinn B, et al. MRI-derived measurements of human subcortical, ventricular and intracranial brain volumes: reliability effects of scan sessions, acquisition sequences, data analyses, scanner upgrade, scanner vendors and field strengths. NeuroImage 2009; 46(1): 177-192.

NeuroImage 03/2009; 46(1): 177-192. DOI: 10.1016/j.neuroimage.2009.02.010


Automated MRI-derived measurements of in-vivo human brain volumes provide novel insights into normal and abnormal neuroanatomy, but little is known about measurement reliability. Here we assess the impact of image acquisition variables (scan session, MRI sequence, scanner upgrade, vendor and field strengths), FreeSurfer segmentation pre-processing variables (image averaging, B1 field inhomogeneity correction) and segmentation analysis variables (probabilistic atlas) on resultant image segmentation volumes from one older group (n = 15, mean age 69.5) and two younger groups (n = 5 each, mean ages 34 and 36.5) of healthy subjects. The variability between hippocampal, thalamic, caudate, putamen, lateral ventricular and total intracranial volume measures across sessions on the same scanner on different days is less than 4.3% for the older group and less than 2.3% for the younger group. Within-scanner measurements are remarkably reliable across scan sessions, being minimally affected by averaging of multiple acquisitions, B1 correction, acquisition sequence (MPRAGE vs. multi-echo-FLASH), major scanner upgrades (Sonata–Avanto, Trio–TrioTIM), and segmentation atlas (MPRAGE or multi-echo-FLASH). Volume measurements across platforms (Siemens Sonata vs. GE Signa) and field strengths (1.5 T vs. 3 T) result in a volume difference bias but with variance comparable to that measured within scanner, implying that multi-site studies may not necessarily require a much larger sample to detect a specific effect. These results suggest that volumes derived from automated segmentation of T1-weighted structural images are reliable measures within the same scanner platform, even after upgrades; however, combining data across platforms and across field strengths introduces a bias that should be considered in the design of multi-site studies, such as clinical drug trials. The results derived from the young groups (scanner upgrade effects and B1 inhomogeneity correction effects) should be considered preliminary and in need of further validation with a larger dataset.
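The within-scanner variability figures quoted above (less than 4.3% and 2.3%) refer to volume differences between repeat sessions. The sketch below shows one common way to compute such a statistic, the absolute percent volume difference relative to the session mean. This is a minimal Python sketch: the structure names and volumes are hypothetical, and the paper's exact variability definition may differ in detail.

```python
# Sketch: within-scanner test-retest variability as absolute percent
# volume difference, the kind of metric behind the "< 4.3%" figures.
# All volume values below are hypothetical, not taken from the paper.

def percent_volume_difference(v1: float, v2: float) -> float:
    """Absolute volume difference as a percentage of the session mean."""
    return 100.0 * abs(v1 - v2) / ((v1 + v2) / 2.0)

# Hypothetical segmentation volumes (mm^3) from two scan sessions
session1 = {"Left-Hippocampus": 4012.0, "Left-Thalamus": 7530.0}
session2 = {"Left-Hippocampus": 4078.0, "Left-Thalamus": 7481.0}

for structure in session1:
    pvd = percent_volume_difference(session1[structure], session2[structure])
    print(f"{structure}: {pvd:.2f}% test-retest difference")
```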



    • "Therefore, an automated or semi-automated segmentation algorithm is highly desired to adequately reduce the time and workload required to obtain the ventricle volume from the 3-D US images in lieu of manual segmentation. Cerebral ventricle segmentation algorithms have been developed for computed tomography (Liu et al. 2010; Poh et al. 2012; Qian et al. 2013) or magnetic resonance (Coup e et al. 2011; Jovicich et al. 2009; Liu et al. 2009; Schnack et al. 2001) images and used primarily in adult populations. Wang et al. (2011, 2014) proposed two automatic level set-based neonatal brain segmentation approaches, but their methods were applied to magnetic resonance images. "
ABSTRACT: A three-dimensional (3-D) ultrasound (US) system has been developed to monitor the intracranial ventricular system of preterm neonates with intraventricular hemorrhage (IVH) and the resultant dilation of the ventricles (ventriculomegaly). To measure ventricular volume from 3-D US images, a semi-automatic convex optimization-based approach is proposed for segmentation of the cerebral ventricular system in preterm neonates with IVH from 3-D US images. The proposed semi-automatic segmentation method makes use of the convex optimization technique supervised by user-initialized information. Experiments using 3-D US images from 58 patients reveal that our proposed approach yielded a mean Dice similarity coefficient of 78.2% compared with the surfaces that were manually contoured, suggesting good agreement between these two segmentations. Additional metrics, the mean absolute distance of 0.65 mm and the maximum absolute distance of 3.2 mm, indicated small distance errors for a voxel spacing of 0.22 × 0.22 × 0.22 mm³. The Pearson correlation coefficient (r = 0.97, p < 0.001) indicated a significant correlation of algorithm-generated ventricular system volume (VSV) with the manually generated VSV. The calculated minimal detectable difference in ventricular volume change indicated that the proposed segmentation approach with 3-D US images is capable of detecting a VSV difference of 6.5 cm3 with 95% confidence, suggesting that this approach might be used for monitoring IVH patients' ventricular changes using 3-D US imaging. The mean segmentation times of the graphics processing unit (GPU)- and central processing unit-implemented algorithms were 50 ± 2 and 205 ± 5 s for one 3-D US image, respectively, in addition to 120 ± 10 s for initialization, less than the approximately 35 min required by manual segmentation. In addition, repeatability experiments indicated that the intra-observer variability ranges from 6.5% to 7.5%, and the inter-observer variability is 8.5% in terms of the coefficient of variation of the Dice similarity coefficient. The intra-class correlation coefficient for ventricular system volume measurements for each independent observer ranged from 0.988 to 0.996 and was 0.945 for three different observers. The coefficient of variation and intra-class correlation coefficient revealed that the intra- and inter-observer variability of the proposed approach introduced by the user initialization was small, indicating good reproducibility, independent of different users. (A minimal sketch of the Dice overlap metric follows this entry.)
    Ultrasound in Medicine & Biology 12/2015; 41(2): 542-556. DOI: 10.1016/j.ultrasmedbio.2014.09.019
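The Dice similarity coefficient reported in the abstract above is the standard overlap measure 2|A ∩ B| / (|A| + |B|) between two binary segmentations. Below is a minimal NumPy sketch on toy volumes; it is a generic illustration, not the authors' implementation or data.

```python
# Sketch: Dice similarity coefficient between two binary segmentations.
# The arrays here are toy volumes, not the paper's 3-D US data.
import numpy as np

def dice_coefficient(seg_a: np.ndarray, seg_b: np.ndarray) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for boolean volumes."""
    a = seg_a.astype(bool)
    b = seg_b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

# Toy example: a mask and a slightly perturbed copy of it
rng = np.random.default_rng(0)
truth = rng.random((32, 32, 32)) > 0.5
algo = truth.copy()
algo[0:2] = ~algo[0:2]          # flip the first two slices
print(f"Dice: {dice_coefficient(truth, algo):.3f}")
```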
    • "When using any automated method to examine neurobiology , it is important to properly validate the results. As such, several studies have analyzed the reliability of automated cortical thickness measurements using test–retest studies [Desikan et al., 2006b; Dickerson et al., 2008; Han et al., 2006; Jovicich et al., 2013; Liem et al., 2015; Schnack et al., 2010; Wonderlick et al., 2009]. For example, Eggert et al. investigated the reliability and the accuracy of different segmentation algorithms in SPM8 (http://www.fil.ion. "
    ABSTRACT: In the last decade, many studies have used automated processes to analyze magnetic resonance imaging (MRI) data such as cortical thickness, which is one indicator of neuronal health. Due to the convenience of image processing software (e.g., FreeSurfer), standard practice is to rely on automated results without performing visual inspection of intermediate processing. In this work, structural MRIs of 40 healthy controls who were scanned twice were used to determine the test-retest reliability of FreeSurfer-derived cortical measures in four groups of subjects: the 25 who passed visual inspection (approved), the 15 who failed visual inspection (disapproved), a combined group, and a subset of 10 subjects (Travel) whose test and retest scans occurred at different sites. Test-retest correlation (TRC), intraclass correlation coefficient (ICC), and percent difference (PD) were used to measure the reliability in the Destrieux and Desikan-Killiany (DK) atlases. In the approved subjects, reliability of cortical thickness/surface area/volume (DK atlas only) were: TRC (0.82/0.88/0.88), ICC (0.81/0.87/0.88), PD (0.86/1.19/1.39), which represent a significant improvement over these measures when disapproved subjects are included. Travel subjects' results show that cortical thickness reliability is more sensitive to site differences than the cortical surface area and volume. To determine the effect of visual inspection on sample size required for studies of MRI-derived cortical thickness, the number of subjects required to show group differences was calculated. Significant differences observed across imaging sites, between visually approved/disapproved subjects, and across regions with different sizes suggest that these measures should be used with caution. (A sketch of the ICC and percent-difference computations follows this entry.)
    Human Brain Mapping 05/2015; 36(9). DOI: 10.1002/hbm.22856
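The reliability statistics named in the abstract above (ICC and percent difference) can be computed as in the hedged sketch below. It uses the standard Shrout-Fleiss ICC(2,1) formula and a symmetric percent difference; these are common textbook definitions that may differ in detail from the study's exact choices, and all data here are simulated.

```python
# Sketch: ICC(2,1) and symmetric percent difference for test-retest data.
# Simulated data only; formulas are the common textbook definitions.
import numpy as np

def icc_2_1(x: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement.
    x has shape (n_subjects, k_sessions), one measure per cell."""
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # sessions
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_err / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

def percent_difference(test, retest) -> float:
    """Mean absolute percent difference between paired scans."""
    test, retest = np.asarray(test), np.asarray(retest)
    return float(np.mean(200.0 * np.abs(test - retest) / (test + retest)))

# Toy data: 40 subjects scanned twice, small session-to-session noise
rng = np.random.default_rng(0)
truth = rng.normal(2.5, 0.15, size=40)   # "true" mean thickness, mm
scans = np.column_stack([truth + rng.normal(0, 0.03, 40) for _ in range(2)])
print(f"ICC(2,1): {icc_2_1(scans):.3f}")
print(f"PD:       {percent_difference(scans[:, 0], scans[:, 1]):.2f}%")
```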
    • "to yield reproducible quantitative results across platforms and field strengths has been well documented. Specifically, the published literature indicates that our selected analysis method (e.g., the FreeSurfer pipeline) is quite robust against site differences in imaging platforms, field strengths, and sequence types because volume measurements and surface-based measures of cortical thickness exhibit comparable variance as that measured within the same scanner (Han & Fischl, 2007; Han et al., 2006; Jovicich et al., 2009). As part of a larger study, multiweighted sequences were acquired from each research participant (as approved by institutional review boards in Houston and Toronto). "
    ABSTRACT: Objective: Weak structure-function relations for brain and behavior may stem from problems in estimating these relations in small clinical samples with frequently occurring outliers. In the current project, we focused on the utility of using alternative statistics to estimate these relations. Method: Fifty-four children with spina bifida meningomyelocele performed attention tasks and received MRI of the brain. Using a bootstrap sampling process, the Pearson product-moment correlation was compared with 4 robust correlations: the percentage bend correlation, the Winsorized correlation, the skipped correlation using the Donoho-Gasko median, and the skipped correlation using the minimum volume ellipsoid estimator. Results: All methods yielded similar estimates of the relations between measures of brain volume and attention performance. The similarity of estimates across correlation methods suggested that the weak structure-function relations previously found in many studies are not readily attributable to the presence of outlying observations and other factors that violate the assumptions behind the Pearson correlation. Conclusions: Given the difficulty of assembling large samples for brain-behavior studies, estimating correlations using multiple, robust methods may enhance the statistical conclusion validity of studies yielding small, but often clinically significant, correlations. (A sketch of the Winsorized correlation with a percentile bootstrap follows this entry.)
    Neuropsychology 12/2014; 29(2). DOI: 10.1037/neu0000166
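One of the robust estimators compared in the abstract above, the Winsorized correlation, is sketched below together with a percentile bootstrap confidence interval, echoing the bootstrap sampling process the authors describe. This is a generic SciPy-based illustration with simulated data, not the authors' code; the 20% Winsorizing level is a common default assumed here.

```python
# Sketch: Winsorized correlation with a percentile bootstrap CI.
# Generic implementation on simulated data; gamma = 0.2 is an assumed default.
import numpy as np
from scipy import stats

def winsorized_r(x: np.ndarray, y: np.ndarray, gamma: float = 0.2) -> float:
    """Pearson r computed after Winsorizing each variable at the
    gamma and 1 - gamma quantiles (values are clipped, not trimmed)."""
    xw = np.asarray(stats.mstats.winsorize(x, limits=(gamma, gamma)))
    yw = np.asarray(stats.mstats.winsorize(y, limits=(gamma, gamma)))
    return stats.pearsonr(xw, yw)[0]

def bootstrap_ci(x, y, n_boot=1000, seed=0):
    """95% percentile bootstrap CI for the Winsorized correlation."""
    rng = np.random.default_rng(seed)
    n = len(x)
    rs = [winsorized_r(x[idx], y[idx])
          for idx in (rng.integers(0, n, n) for _ in range(n_boot))]
    return np.percentile(rs, [2.5, 97.5])

# Toy data with one gross outlier, sample size echoing the study's n = 54
rng = np.random.default_rng(1)
x = rng.normal(size=54)
y = 0.3 * x + rng.normal(scale=0.9, size=54)
x[0], y[0] = 6.0, -6.0                     # outlying observation
print(f"Pearson r:    {stats.pearsonr(x, y)[0]:.2f}")
print(f"Winsorized r: {winsorized_r(x, y):.2f}")
print(f"95% CI:       {bootstrap_ci(x, y)}")
```

The point of the design is that Winsorizing bounds the influence of extreme observations before the ordinary Pearson formula is applied, so a single outlier cannot dominate the estimate the way it can with the raw correlation.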