Discovering genetic associations with high-dimensional neuroimaging phenotypes: A sparse reduced-rank regression approach.
ABSTRACT: There is growing interest in performing genome-wide searches for associations between genetic variants and brain imaging phenotypes. While much work has focused on single, scalar-valued summaries of brain phenotype, accounting for the richness of imaging data requires a brain-wide, genome-wide search. In particular, the standard approach based on mass-univariate linear modelling (MULM) does not account for the structured patterns of correlation present in each domain. In this work, we propose sparse reduced-rank regression (sRRR), a strategy for multivariate modelling of high-dimensional imaging responses (measurements taken over regions of interest or individual voxels) and genetic covariates (single nucleotide polymorphisms or copy number variations) which enforces sparsity in the regression coefficients. Such sparsity constraints ensure that the model performs simultaneous genotype and phenotype selection. Using simulation procedures that accurately reflect realistic human genetic variation and imaging correlations, we present detailed evaluations of the sRRR method in comparison with the more traditional MULM approach. In all settings considered, sRRR has greater power than MULM to detect deleterious genetic variants. Important issues concerning model selection and connections to existing latent variable models are also discussed. This work shows that sRRR offers a promising alternative for detecting brain-wide, genome-wide associations.
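The abstract does not give the sRRR estimation details, but the idea of fitting a low-rank coefficient matrix with an L1 penalty on the genotype loadings can be illustrated with a minimal rank-1 sketch in numpy. Everything here (the alternating update scheme, the ISTA inner loop, the penalty value `lam`) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding operator, the proximal map of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def srrr_rank1(X, Y, lam=0.1, n_iter=50):
    """Rank-1 sparse reduced-rank regression sketch: fit Y ~ X b a^T,
    with an L1 penalty on the genotype loadings b (SNP selection) and a
    unit-norm phenotype loading a (phenotype weighting).

    X: (n, p) genotype matrix; Y: (n, q) imaging phenotype matrix.
    """
    p = X.shape[1]
    # initialise the phenotype loading from the dominant right singular vector of Y
    a = np.linalg.svd(Y, full_matrices=False)[2][0]
    a /= np.linalg.norm(a)
    b = np.zeros(p)
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the gradient step
    for _ in range(n_iter):
        # b-step: lasso regression of the projected phenotype (Y a) on X,
        # solved approximately by proximal gradient (ISTA) iterations
        z = Y @ a
        for _ in range(50):
            grad = X.T @ (X @ b - z)
            b = soft_threshold(b - grad / L, lam / L)
        # a-step: closed-form least-squares update under the unit-norm constraint
        v = Y.T @ (X @ b)
        nv = np.linalg.norm(v)
        if nv == 0.0:
            break
        a = v / nv
    return b, a
```

On synthetic data with a handful of truly associated SNPs, the recovered `b` is sparse with its support on those SNPs, which is the "simultaneous genotype and phenotype selection" property the abstract describes; the full method would fit several such rank components and tune `lam` by model selection.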
- Source available from: Giovanni Montana
ABSTRACT: Cardiac phenotypes, such as left ventricular (LV) mass, demonstrate high heritability, although most genes associated with these complex traits remain unidentified. Genome-wide association studies (GWAS) have relied on conventional 2D cardiovascular magnetic resonance (CMR) as the gold standard for phenotyping. However, this technique is insensitive to the regional variations in wall thickness that are often associated with left ventricular hypertrophy, and large cohorts are required to reach significance. Here we test whether automated cardiac phenotyping using high spatial resolution CMR atlases can achieve improved precision for mapping wall thickness in healthy populations and whether smaller sample sizes are required compared to conventional methods. LV short-axis cine images were acquired in 138 healthy volunteers using standard 2D imaging and 3D high spatial resolution CMR. A multi-atlas technique was used to segment and co-register each image. The agreement between methods for end-diastolic volume and mass was assessed using Bland-Altman analysis in 20 subjects. The 3D and 2D segmentations of the LV were compared to manual labeling by the proportion of concordant voxels (Dice coefficient) and the distances separating corresponding points. Parametric and non-parametric data were analysed with paired t-tests and Wilcoxon signed-rank tests, respectively. Voxelwise power calculations used the interstudy variances of wall thickness. The 3D volumetric measurements showed no bias compared to 2D imaging. The segmented 3D images were more accurate than 2D images for defining the epicardium (Dice: 0.95 vs 0.93, P < 0.001; mean error 1.3 mm vs 2.2 mm, P < 0.001) and endocardium (Dice: 0.95 vs 0.93, P < 0.001; mean error 1.1 mm vs 2.0 mm, P < 0.001). The 3D technique resulted in significant differences in wall thickness assessment at the base, septum and apex of the LV compared to 2D (P < 0.001).
Fewer subjects were required for 3D imaging to detect a 1 mm difference in wall thickness (72 vs 56, P < 0.001). High spatial resolution CMR with automated phenotyping provides greater power for mapping wall thickness than conventional 2D imaging and enables a reduction in the sample size required for studies of environmental and genetic determinants of LV wall thickness.
Journal of Cardiovascular Magnetic Resonance 02/2014; 16(1):16.
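Two quantities this abstract leans on, the Dice coefficient and the paired sample-size calculation, are simple enough to sketch. This is a generic illustration of the standard formulas, not the study's actual pipeline; the function names and default `alpha`/`power` values are assumptions:

```python
import numpy as np
from statistics import NormalDist

def dice_coefficient(seg_a, seg_b):
    """Dice overlap between two binary segmentations: twice the number of
    concordant (shared) voxels divided by the total voxel count of both masks."""
    a = np.asarray(seg_a, dtype=bool)
    b = np.asarray(seg_b, dtype=bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def paired_sample_size(sd_diff, delta, alpha=0.05, power=0.8):
    """Subjects needed to detect a mean paired difference `delta` (e.g. 1 mm
    of wall thickness) given the interstudy standard deviation of the
    difference `sd_diff`, using the normal-approximation formula
    n = ((z_{1-alpha/2} + z_{power}) * sd / delta)^2."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0) + NormalDist().inv_cdf(power)
    return int(np.ceil((z * sd_diff / delta) ** 2))
```

Run voxelwise with each voxel's own interstudy variance, the second function yields a sample-size map; lower interstudy variance (as with the 3D technique) directly translates into fewer required subjects.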
ABSTRACT: This article reviews work published by the ENIGMA Consortium and its Working Groups (http://enigma.ini.usc.edu). It was written collaboratively; P.T. wrote the first draft and all listed authors revised and edited the document for important intellectual content, using Google Docs for parallel editing, and approved it. Some ENIGMA investigators contributed to the design and implementation of ENIGMA or provided data but did not participate in the analysis or writing of this report. A complete listing of ENIGMA investigators is available at http://enigma.ini.usc.edu/publications/the-enigma-consortium-in-review/. For ADNI, some investigators contributed to the design and implementation of ADNI or provided data but did not participate in the analysis or writing of this report. A complete listing of ADNI investigators is available at http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf. The work reviewed here was funded by a large number of federal and private agencies worldwide, listed in Stein et al. (2012); the funding for the listed consortia is also itemized in Stein et al. (2012).
Brain Imaging and Behavior 01/2014.
ABSTRACT: Brain imaging is a natural intermediate phenotype for understanding the link between genetic information and behavior or risk factors for brain pathologies. Massive efforts have been made in the last few years to acquire high-dimensional neuroimaging and genetic data on large cohorts of subjects. The statistical analysis of such data is carried out with increasingly sophisticated techniques and represents a great computational challenge. Fortunately, the increasing computational power of distributed architectures can be harnessed if new neuroinformatics infrastructures are designed and training to use these new tools is provided. Combining a MapReduce framework (TomusBLOB) with machine learning algorithms (the Scikit-learn library), we designed a scalable analysis tool that can deal with non-parametric statistics on high-dimensional data. End-users describe the statistical procedure to perform and can then test the model on their own computers before running the very same code in the cloud at a larger scale. We illustrate the potential of our approach on real data with an experiment showing how the functional signal in subcortical brain regions can be significantly predicted from genome-wide genotypes. This experiment demonstrates the scalability and reliability of our framework in the cloud, with a two-week deployment on hundreds of virtual machines.
Frontiers in Neuroinformatics 01/2014; 8:31.
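The reason non-parametric statistics map so well onto a MapReduce framework is that permutation batches are independent: each worker can score its own permutations and only the null statistics need to be pooled. The abstract does not show the TomusBLOB/Scikit-learn code, so the following is a hypothetical single-machine numpy sketch of that map/reduce split, using a max-correlation statistic as an illustrative choice:

```python
import numpy as np

def max_abs_corr(G, y):
    """Test statistic: the largest absolute correlation between any
    genotype column of G (n subjects x p variants) and the phenotype y."""
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    r = (Gc.T @ yc) / np.sqrt((Gc ** 2).sum(axis=0) * (yc ** 2).sum())
    return float(np.abs(r).max())

def map_permutations(G, y, n_perm, seed):
    """'Map' step: one worker evaluates the statistic on its own batch of
    phenotype permutations, using a worker-specific seed."""
    rng = np.random.default_rng(seed)
    return [max_abs_corr(G, rng.permutation(y)) for _ in range(n_perm)]

def reduce_pvalue(observed, null_batches):
    """'Reduce' step: pool the null statistics from every worker and return
    a family-wise-error-controlling permutation p-value (with the standard
    +1 correction so the estimate is never exactly zero)."""
    null = np.concatenate([np.asarray(b) for b in null_batches])
    return (1 + int((null >= observed).sum())) / (1 + null.size)
```

In a cloud deployment, each `map_permutations` call would run on a separate virtual machine and only the short lists of null maxima would travel back for the reduce step, which is what makes the scheme scale to genome-wide genotype matrices.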