Accuracy assessment of global and local atrophy measurement techniques with realistic simulated longitudinal Alzheimer's disease images

Centre for Medical Image Computing, Department of Medical Physics and Bioengineering, University College London, WC1E 6BT, UK.
NeuroImage (Impact Factor: 6.36). 06/2008; 42(2):696-709. DOI: 10.1016/j.neuroimage.2008.04.259
Source: PubMed

ABSTRACT The evaluation of atrophy quantification methods based on magnetic resonance imaging has usually been hindered by the lack of realistic gold standard data against which to judge these methods or to help refine them. Recently [Camara, O., Schweiger, M., Scahill, R., Crum, W., Sneller, B., Schnabel, J., Ridgway, G., Cash, D., Hill, D., Fox, N., 2006. Phenomenological model of diffuse global and regional atrophy using finite-element methods. IEEE Trans. Med. Imaging 25, 1417-1430], we presented a technique in which atrophy is realistically simulated in different tissue compartments or neuroanatomical structures with a phenomenological model. In this study, we have generated a cohort of realistic simulated Alzheimer's disease (AD) images with known amounts of atrophy, mimicking a set of 19 real controls and 27 probable AD subjects, with an improved version of our atrophy simulation methodology. This database was then used to assess the accuracy of several well-known computational anatomy methods which provide global (BSI and SIENA) or local (Jacobian integration) estimates of longitudinal atrophy in brain structures using MR images.
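As a rough illustration of what a local, Jacobian-integration estimate computes, the sketch below averages per-voxel Jacobian determinants over a baseline region to obtain a relative volume change. It is a minimal sketch under stated assumptions, not the pipeline evaluated in the paper: the function name `jacobian_integration`, the flat-list data layout, and the convention that the deformation maps baseline to follow-up (so a determinant below 1 means local contraction) are all assumptions made for illustration.

```python
def jacobian_integration(jacobian, mask):
    """Relative volume change of a region from per-voxel Jacobian determinants.

    jacobian: Jacobian determinants of a baseline-to-follow-up deformation,
              one value per voxel (assumed convention: < 1 means contraction).
    mask:     region membership per voxel in baseline space (0/1, or a
              fractional weight for partial-volume voxels).
    """
    if len(jacobian) != len(mask):
        raise ValueError("jacobian and mask must have the same length")
    baseline_volume = sum(mask)  # in voxel units
    if baseline_volume == 0:
        raise ValueError("mask is empty")
    # Summing the determinant over the baseline region gives the volume
    # that the region occupies at follow-up.
    followup_volume = sum(j * m for j, m in zip(jacobian, mask))
    return followup_volume / baseline_volume - 1.0


# Hypothetical four-voxel example; the last voxel lies outside the region.
change = jacobian_integration([0.98, 0.95, 1.00, 1.02], [1, 1, 1, 0])
print(f"relative volume change: {change:.2%}")
```

With simulated images the true volume change is known, so an estimate like this can be compared directly against the value imposed by the simulation.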

    • "Twenty articles [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25] used either the correlation coefficient, coefficient of determination, comparison of means, or a combination of these methods in the analysis of agreement. Table 3 shows some of the examples of inappropriate applications and interpretations of statistical analysis in the analysis of agreement found in this review. "
    ABSTRACT: Accurate values are a must in medicine. An important parameter in determining the quality of a medical instrument is agreement with a gold standard. Various statistical methods have been used to test for agreement. Some of these methods have been shown to be inappropriate. This can result in misleading conclusions about the validity of an instrument. The Bland-Altman method is the most popular method judging by the many citations of the article proposing this method. However, the number of citations does not necessarily mean that this method has been applied in agreement research. No previous study has been conducted to look into this. This is the first systematic review to identify statistical methods used to test for agreement of medical instruments. The proportion of various statistical methods found in this review will also reflect the proportion of medical instruments that have been validated using those particular methods in current clinical practice. Five electronic databases were searched between 2007 and 2009 to look for agreement studies. A total of 3,260 titles were initially identified. Only 412 titles were potentially related, and finally 210 fitted the inclusion criteria. The Bland-Altman method is the most popular method with 178 (85%) studies having used this method, followed by the correlation coefficient (27%) and means comparison (18%). Some of the inappropriate methods highlighted by Altman and Bland since the 1980s are still in use. This study finds that the Bland-Altman method is the most popular method used in agreement research. There are still inappropriate applications of statistical methods in some studies. It is important for a clinician or medical researcher to be aware of this issue because misleading conclusions from inappropriate analyses will jeopardize the quality of the evidence, which in turn will influence quality of care given to patients in the future.
    PLoS ONE 05/2012; 7(5):e37908. DOI:10.1371/journal.pone.0037908 · 3.23 Impact Factor
    • "Furthermore, to add to the challenge, it is difficult to obtain a gold standard. Previous authors have used simulated MRI phantoms (Lee et al., 2006) at one time point, or simulations of atrophy (Camara et al., 2008; Lerch and Evans, 2005) for longitudinal studies, however providing a physiologically plausible simulation of atrophy is itself a difficult task. For this reason, we chose to compare the performance of the algorithm according to reproducibility and both cross-sectional and longitudinal group differentiation, which are common applications within the literature. "
    ABSTRACT: Cortical thickness estimation performed in-vivo via magnetic resonance imaging is an important technique for the diagnosis and understanding of the progression of neurodegenerative diseases. Currently, two different computational paradigms exist, with methods generally classified as either surface or voxel-based. This paper provides a much needed comparison of the surface-based method FreeSurfer and two voxel-based methods using clinical data. We test the effects of computing regional statistics using two different atlases and demonstrate that this makes a significant difference to the cortical thickness results. We assess reproducibility, and show that FreeSurfer has a regional standard deviation of thickness difference on same day scans that is significantly lower than either a Laplacian or Registration based method and discuss the trade off between reproducibility and segmentation accuracy caused by bending energy constraints. We demonstrate that voxel-based methods can detect similar patterns of group-wise differences as well as FreeSurfer in typical applications such as producing group-wise maps of statistically significant thickness change, but that regional statistics can vary between methods. We use a Support Vector Machine to classify patients against controls and did not find statistically significantly different results with voxel based methods compared to FreeSurfer. Finally we assessed longitudinal performance and concluded that currently FreeSurfer provides the most plausible measure of change over time, with further work required for voxel based methods.
    NeuroImage 05/2011; 57(3):856-65. DOI:10.1016/j.neuroimage.2011.05.053 · 6.36 Impact Factor
    • "Similarly to our study, lower atrophy rates were obtained using Jacobian integration compared with segmentation and subtraction of serial hippocampal volumes (Barnes et al., 2007, 2008). Underestimation of global brain atrophy was also found when applying this Jacobian integration technique to MRI on which atrophy has been simulated (Camara et al., 2008). One reason that may underlie this finding could be the inclusion of partial volume and CSF voxels in the region of interest, in our case the baseline GM. "
    ABSTRACT: Global gray matter (GM) atrophy rates were quantified from magnetic resonance imaging (MRI) over 6- and 12-month intervals in 37 patients with Alzheimer's disease (AD) and 19 controls using: (1) nonlinear registration and integration of Jacobian values, and (2) segmentation and subtraction of serial GM volumes. Sample sizes required to power treatment trials using global GM atrophy rate as an outcome measure were estimated and compared between the 2 techniques, and to global brain atrophy measures quantified using the boundary shift integral (brain boundary shift integral; BBSI) and structural image evaluation, using normalization, of atrophy (SIENA). Increased GM atrophy rates (approximately 2% per year) were observed in patients compared with controls. Although mean atrophy rates provided by Jacobian integration were smaller than those from segmentation and subtraction of GM volumes, measurement variance was reduced. The number of patients required per treatment arm to detect a 20% reduction in GM atrophy rate over a 12-month follow-up (90% power) was 202 (95% confidence interval [CI], 118-423) using Jacobian integration and 2047 (95% CI 271 to > 10,000) using segmentation and subtraction. Comparable sample sizes for whole brain atrophy were 240 (95% CI, 142-469) using the BBSI and 196 (95% CI, 110-425) using SIENA. Jacobian integration could be useful for measuring GM atrophy rate in Alzheimer's disease as a marker of disease progression and treatment efficacy.
    Neurobiology of aging 12/2010; 33(7):1194-202. DOI:10.1016/j.neurobiolaging.2010.11.001 · 4.85 Impact Factor
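The sample sizes quoted in the last abstract (patients per arm needed to detect a 20% reduction in atrophy rate with 90% power) are the kind of figure a standard two-arm power calculation produces. The sketch below uses the usual normal-approximation formula n = 2 (z_{1-alpha/2} + z_{power})^2 sd^2 / effect^2. It is only an assumed form: the abstract does not state the significance level or the exact procedure used, so the two-sided alpha of 0.05, the function name `sample_size_per_arm`, and the input values are all hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(mean_rate, sd_rate, reduction=0.20, power=0.90, alpha=0.05):
    """Patients per arm needed to detect a fractional reduction in mean atrophy rate.

    Normal-approximation formula for comparing two means with equal variance.
    alpha is two-sided and is an assumption, not a value taken from the source.
    """
    if sd_rate <= 0 or mean_rate == 0 or not 0 < reduction <= 1:
        raise ValueError("need sd_rate > 0, mean_rate != 0 and 0 < reduction <= 1")
    effect = reduction * abs(mean_rate)            # absolute difference to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    n = 2 * (z_alpha + z_power) ** 2 * sd_rate ** 2 / effect ** 2
    return math.ceil(n)


# Hypothetical inputs: mean atrophy rate 2.0 %/year, standard deviation 1.0 %/year.
print(sample_size_per_arm(2.0, 1.0))
```

Because the required n scales with the square of sd/mean, a technique that reports smaller mean rates can still need fewer patients if its measurement variance falls faster, which is the trade-off the abstract describes for Jacobian integration.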