
(free)Surfing ANTs: a comparative study


Abstract

Robust, automated tools for brain segmentation and quantification, such as FreeSurfer and ANTs, have been central to recent advances in neuroimaging. However, the reproducibility and variability of their results are often uncertain. In this study, we analyzed the reproducibility of both tools and compared their outputs across brain structures. We observe that both tools are highly reproducible for volumetric measurements and give similar results for most of the examined structures. However, some structures (pallidum, rostral anterior cingulate) show more pronounced and significant differences. Our results also indicate slightly better reproducibility for ANTs than for FreeSurfer.
Mint Labs
Santi Puch, Paulo Rodrigues, David Moreno-Dominguez, Marc Ramos, and Vesna Prčkovska
Reproducibility of brain segmentation and quantification
Robust and automated tools for brain segmentation and morphometric analysis
Reproducibility and variability of results are often uncertain
Assess reproducibility
Compare outcomes
Previous Work
Test-Retest dataset from Maclaren et al.
Established a unique dataset to assess brain segmentation repeatability
120 T1-weighted volumes from 3 subjects
20 sessions in 31 days, two scans per session
ADNI protocol
Assessed intra- and inter-session variability of FreeSurfer v5.1
Brain segmentation pipelines
FreeSurfer pipeline: standard FreeSurfer v5.3
ANTs pipeline:
AFNI [4] skull strip
Registration to OASIS-30 [5]
N4 bias field correction [6]
Atropos [7] tissue segmentation
ANTs label propagation
Mint CloudN
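The ANTs pipeline steps above can be sketched as a sequence of AFNI and ANTs command lines. This is a minimal sketch and not the authors' CloudN implementation. The file names, the extra `3dcalc` masking step, the K-means initialization of Atropos and every parameter value are illustrative assumptions. The steps follow the order the poster lists them.

```python
from pathlib import Path


def ants_labeling_commands(t1, atlas_brain, atlas_labels, out_dir):
    """Build (but do not run) command lines for an ANTs-style labeling run.

    Hypothetical helper: file names and parameter values are illustrative
    assumptions, not the settings used on the poster.
    """
    out = Path(out_dir)
    brain = str(out / "brain.nii.gz")
    mask = str(out / "brain_mask.nii.gz")
    reg_prefix = str(out / "atlas_to_subject_")
    n4 = str(out / "brain_n4.nii.gz")
    seg = str(out / "tissue_seg.nii.gz")
    labels = str(out / "labels_in_subject.nii.gz")
    return [
        # AFNI skull stripping, then a binary brain mask from nonzero voxels.
        ["3dSkullStrip", "-input", str(t1), "-prefix", brain],
        ["3dcalc", "-a", brain, "-expr", "step(a)", "-prefix", mask],
        # Affine + SyN registration: atlas brain (moving) to subject brain (fixed).
        ["antsRegistrationSyN.sh", "-d", "3", "-f", brain,
         "-m", str(atlas_brain), "-o", reg_prefix],
        # N4 bias field correction restricted to the brain mask.
        ["N4BiasFieldCorrection", "-d", "3", "-i", brain, "-x", mask, "-o", n4],
        # Atropos three-class tissue segmentation with a K-means start and MRF smoothing.
        ["Atropos", "-d", "3", "-a", n4, "-x", mask, "-i", "KMeans[3]",
         "-c", "[5,0]", "-m", "[0.1,1x1x1]", "-o", seg],
        # Label propagation: warp atlas labels into subject space.
        # Nearest-neighbour interpolation keeps the label values integer.
        # The last -t (affine) is applied first, then the SyN warp.
        ["antsApplyTransforms", "-d", "3", "-i", str(atlas_labels), "-r", n4,
         "-o", labels, "-n", "NearestNeighbor",
         "-t", reg_prefix + "1Warp.nii.gz",
         "-t", reg_prefix + "0GenericAffine.mat"],
    ]
```

To execute the commands, pass each list to `subprocess.run(cmd, check=True)`. This requires AFNI and ANTs on `PATH`. Building the commands separately from running them keeps the sketch testable without either toolkit installed.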
Statistical analysis
Inter-session variability: total standard deviation
Intra-session variability: paired-data standard deviation
Monte Carlo permutation test
Null hypothesis: CV_s = CV_t
Intra-session coefficient of variation: CV_s = paired-data SD / mean volume
Inter-session coefficient of variation: CV_t = total SD / mean volume
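The poster's formulas did not survive extraction. This is a minimal NumPy sketch that assumes standard definitions:

- The total SD is the sample SD of all scans of one subject.
- The paired-data SD is the duplicate-measurement estimator sqrt(Σd²/2n), where d is the difference between the two scans of each session.
- Each CV is the corresponding SD divided by the mean volume.
- The Monte Carlo permutation test shuffles scans across sessions, which are exchangeable under the null hypothesis.

These definitions, the one-sided direction of the test and the function names are assumptions for illustration, not the authors' code.

```python
import numpy as np


def total_sd(volumes):
    """Inter-session variability: sample SD over every scan of one subject."""
    return float(np.std(np.asarray(volumes, dtype=float), ddof=1))


def paired_sd(volumes):
    """Intra-session variability from the two back-to-back scans per session.

    `volumes` has shape (n_sessions, 2). Uses sqrt(sum(d_i**2) / (2 * n)),
    where d_i is the difference between the two scans of session i.
    """
    v = np.asarray(volumes, dtype=float)
    d = v[:, 0] - v[:, 1]
    return float(np.sqrt(np.sum(d ** 2) / (2 * d.size)))


def coefficient_of_variation(sd, volumes):
    """CV = SD / mean volume."""
    return sd / float(np.mean(volumes))


def permutation_test(volumes, n_perm=10_000, seed=0):
    """Monte Carlo permutation test of H0: CV_s = CV_t (assumed construction).

    Under H0 the scans of a subject are exchangeable across sessions, so all
    scans are shuffled and re-paired. The mean and CV_t are unchanged by a
    shuffle; only CV_s changes. The one-sided p-value is the fraction of
    shuffles whose CV_s is at most the observed CV_s.
    """
    rng = np.random.default_rng(seed)
    v = np.asarray(volumes, dtype=float)
    cv_s = coefficient_of_variation(paired_sd(v), v)
    cv_t = coefficient_of_variation(total_sd(v), v)
    flat = v.ravel()
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(flat).reshape(v.shape)
        if coefficient_of_variation(paired_sd(shuffled), v) <= cv_s:
            hits += 1
    # The add-one correction keeps the Monte Carlo p-value strictly positive.
    p_value = (hits + 1) / (n_perm + 1)
    return cv_s, cv_t, p_value
```

Consider an array of shape (20, 2) in which the two scans of a session are nearly identical but session means drift apart. In that case CV_s falls well below CV_t, and the p-value approaches its floor of 1/(n_perm + 1).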
Results comparison
Percent variability difference [9]
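The exact form of the percent variability difference [9] used on the poster was lost in extraction. This minimal sketch assumes a signed difference between the two tools' volume estimates, normalized by their mean. The signed form lets the comparison report which tool gives the larger volume. Whether it matches the definition in [9] exactly is an assumption, and the function name is hypothetical.

```python
import numpy as np


def percent_variability_difference(vol_ants, vol_fs):
    """Signed difference of two volume estimates, as a percent of their mean.

    Positive values mean ANTs reports the larger volume; negative values mean
    FreeSurfer does. Accepts scalars or arrays (e.g. one entry per structure).
    """
    a = np.asarray(vol_ants, dtype=float)
    f = np.asarray(vol_fs, dtype=float)
    return 100.0 * (a - f) / ((a + f) / 2.0)
```

For example, volumes of 1100 (ANTs) and 900 (FreeSurfer) give +20%. Normalizing by the mean of both tools, rather than by one tool's value, avoids treating either tool as the reference.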
ANTs showed less intra-session and inter-session variability than FreeSurfer
Total tissue volumes
FreeSurfer gives results similar to those of Maclaren et al.
Monte Carlo permutation test
Percent variability difference
The vast majority of structures show similar volumetric measures, with a slight tendency for ANTs to give larger volumes
Larger differences are observed in the globus pallidus and the rostral anterior cingulate
Tissue differences
Deep brain structures and white matter yield larger volumes in ANTs than in FreeSurfer
The brainstem yields a larger volume in FreeSurfer than in ANTs
Evaluation of two of the most widely used morphometric pipelines: ANTs and FreeSurfer
We analyzed the reproducibility of these tools, showing:
High reproducibility of both tools
Slightly higher reproducibility of ANTs than of FreeSurfer
Future work will evaluate a much larger dataset
[1] Fischl B, Salat DH, Busa E, Albert M, Dieterich M, Haselgrove C, van der Kouwe A, Killiany R, Kennedy D, Klaveness S, Montillo A, Makris N, Rosen B, Dale AM. (2002). Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron, 33, 341–355.
[2] Avants BB, Tustison NJ, Song G, Cook PA, Klein A, Gee JC. (2011). A reproducible evaluation of ANTs similarity metric performance in brain image registration. NeuroImage, 54(3), 2033–2044.
[3] Maclaren J, Han Z, Vos SB, Fischbein N, Bammer R. (2014). Reliability of brain volume measurements: a test-retest dataset. Scientific Data, 1, 140037. figshare.929651
[4] Cox RW. (1996). AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Comput. Biomed. Res., 29(3), 162–173.
[5] Klein A, Tourville J. (2012). 101 labeled brain images and a consistent human cortical labeling protocol. Frontiers in Brain Imaging Methods.
[6] Tustison NJ, Avants BB, Cook PA, Zheng Y, Egan A, Yushkevich PA, Gee JC. (2010). N4ITK: improved N3 bias correction. IEEE Trans Med Imaging, 29.
[7] Avants BB, Tustison NJ, Wu J, Cook PA, Gee JC. (2011). An open source multivariate framework for n-tissue segmentation with evaluation on public data. Neuroinformatics, 9(4), 381–400. doi:10.1007/s12021-011-9109-y.
[8] Mint Labs CloudN neuroimaging platform:
[9] Tustison NJ, Cook PA, Klein A, Song G, Das SR, Duda JT, Kandel BM, van Strien N, Stone JR, Gee JC, Avants BB. (2014). Large-scale evaluation of ANTs and FreeSurfer cortical thickness measurements. NeuroImage, 99, 166–179.
Thank you