Discovering genetic associations with high-dimensional neuroimaging phenotypes: A sparse reduced-rank regression approach
ABSTRACT There is growing interest in performing genome-wide searches for associations between genetic variants and brain imaging phenotypes. While much work has focused on single scalar-valued summaries of brain phenotype, accounting for the richness of imaging data requires a brain-wide, genome-wide search. In particular, the standard approach based on mass-univariate linear modelling (MULM) does not account for the structured patterns of correlations present in each domain. In this work, we propose sparse reduced-rank regression (sRRR), a strategy for multivariate modelling of high-dimensional imaging responses (measurements taken over regions of interest or individual voxels) and genetic covariates (single nucleotide polymorphisms or copy number variations), which enforces sparsity in the regression coefficients. Such sparsity constraints ensure that the model performs simultaneous genotype and phenotype selection. Using simulation procedures that accurately reflect realistic human genetic variation and imaging correlations, we present detailed evaluations of the sRRR method in comparison with the more traditional MULM approach. In all settings considered, sRRR has higher power than MULM to detect deleterious genetic variants. Important issues concerning model selection and connections to existing latent variable models are also discussed. This work shows that sRRR offers a promising alternative for detecting brain-wide, genome-wide associations.
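To make the idea of simultaneous genotype and phenotype selection concrete, here is a minimal rank-one sketch in Python. It is not the authors' implementation: it assumes an identity weighting of the responses, treats the genotype covariance as approximately the identity so that each update reduces to soft-thresholding, and sets each penalty as a fraction of the largest coefficient (`frac_b`, `frac_a`). All function names, parameter names, and the simulated data are hypothetical.

```python
import numpy as np

def soft_threshold(v, lam):
    """Element-wise soft-thresholding, the proximal step for an l1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def unit_norm(v):
    """Scale v to unit Euclidean length (an all-zero vector is returned as-is)."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def rank_one_srrr(X, Y, frac_b=0.5, frac_a=0.5, n_iter=100, tol=1e-8):
    """Alternate sparse updates of genotype weights b and phenotype weights a.

    Assumption (not from the abstract): X'X is treated as the identity, so
    each update is a soft-thresholded projection of the cross-product X'Y.
    Penalties are expressed as fractions of the largest absolute coefficient.
    """
    C = X.T @ Y                                  # p x q cross-product matrix
    _, _, Vt = np.linalg.svd(C, full_matrices=False)
    a = Vt[0]                                    # start from leading right singular vector
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        zb = C @ a
        b_new = unit_norm(soft_threshold(zb, frac_b * np.max(np.abs(zb))))
        za = C.T @ b_new
        a_new = unit_norm(soft_threshold(za, frac_a * np.max(np.abs(za))))
        change = np.linalg.norm(a_new - a) + np.linalg.norm(b_new - b)
        a, b = a_new, b_new
        if change < tol:
            break
    return b, a

# Simulated example: SNPs 0-2 influence phenotypes 0-4 (purely illustrative data).
rng = np.random.default_rng(0)
n, p, q = 500, 50, 30
X = rng.standard_normal((n, p))
W = np.zeros((p, q))
W[:3, :5] = 1.0
Y = X @ W + rng.standard_normal((n, q))
X = X - X.mean(axis=0)                           # centre columns
Y = Y - Y.mean(axis=0)

b, a = rank_one_srrr(X, Y)
print("selected SNPs:      ", np.flatnonzero(b))
print("selected phenotypes:", np.flatnonzero(a))
```

The nonzero entries of `b` and `a` are the selected genetic variants and imaging phenotypes respectively. MULM, by contrast, would fit each of the p × q SNP-phenotype pairs separately.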
- Source available from: Behnood Rasti
ABSTRACT: In this paper, a method called wavelet-based sparse reduced-rank regression (WSRRR) is proposed for hyperspectral image restoration. The method is based on minimizing a sparse regularization problem subject to an orthogonality constraint. A cyclic descent-type algorithm is derived for solving the minimization problem. For selecting the tuning parameters, we propose a method based on Stein's unbiased risk estimation. It is shown that the hyperspectral image can be restored using a few sparse components. The method is evaluated using signal-to-noise ratio and spectral angle distance for a simulated noisy data set and by classification accuracies for a real data set. Two different classifiers, namely, support vector machines and random forest, are used in this paper. The method is compared to other restoration methods, and it is shown that WSRRR outperforms them for the simulated noisy data set. It is also shown in the experiments on a real data set that WSRRR not only effectively removes noise but also maintains more fine features compared to other methods used. WSRRR also gives higher classification accuracies.
IEEE Transactions on Geoscience and Remote Sensing 10/2014; 52(10):6688-6698. DOI:10.1109/TGRS.2014.2301415 · 2.93 Impact Factor
- ABSTRACT: Testing the independence between two random variables $x$ and $y$ is an important problem in statistics and machine learning, and kernel-based tests of independence have recently received attention as a way to study dependence. The advantage of the kernel framework rests on its flexibility in the choice of kernel. The Hilbert-Schmidt Independence Criterion (HSIC) was shown to be equivalent to a class of tests, where the tests are based on different distance-induced kernel pairs. In this work, we propose to select the optimal kernel pair by considering local alternatives, and evaluate the efficiency using the quadratic-time estimator of HSIC. The local alternative offers the advantage that the measure of efficiency does not depend on a particular alternative and requires only knowledge of the asymptotic null distribution of the test. We show in our experiments that the proposed strategy results in higher power than other existing kernel selection approaches.
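For readers unfamiliar with the quadratic-time HSIC estimator mentioned above, the following Python sketch computes the standard biased (V-statistic) estimate, trace(KHLH)/n², where K and L are Gram matrices and H is the centring matrix. The Gaussian kernels and fixed bandwidths are assumptions made for illustration only; the abstract concerns how to choose the kernel pair and does not prescribe these. Function names are hypothetical.

```python
import numpy as np

def gaussian_gram(z, bandwidth):
    """Gram matrix of a Gaussian kernel (an assumed kernel choice)."""
    z = np.asarray(z, dtype=float).reshape(len(z), -1)
    sq = np.sum(z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * z @ z.T, 0.0)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def hsic_biased(x, y, bw_x=1.0, bw_y=1.0):
    """Biased quadratic-time HSIC estimate: trace(K H L H) / n^2."""
    n = len(x)
    K = gaussian_gram(x, bw_x)
    L = gaussian_gram(y, bw_y)
    H = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    return np.trace(K @ H @ L @ H) / n ** 2

# Illustrative data: y depends on x nonlinearly, y_indep does not depend on x.
rng = np.random.default_rng(1)
x = rng.standard_normal(300)
y_dep = x ** 2 + 0.1 * rng.standard_normal(300)
y_indep = rng.standard_normal(300)

hsic_dep = hsic_biased(x, y_dep)
hsic_ind = hsic_biased(x, y_indep)
print("HSIC (dependent):  ", hsic_dep)
print("HSIC (independent):", hsic_ind)
```

The statistic should be noticeably larger for the dependent pair even though `x` and `y_dep` are nearly uncorrelated. A full test would additionally compare the statistic against its null distribution, for example by permuting `y`.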
- ABSTRACT: Recent advances in acquiring high-throughput neuroimaging and genomics data provide exciting new opportunities to study the influence of genetic variation on brain structure and function. Research in this emergent field, known as imaging genetics, aims to identify associations between genetic variations such as single nucleotide polymorphisms (SNPs) and neuroimaging quantitative traits (QTs). Sparse canonical correlation analysis (SCCA) is a bi-multivariate analysis method that has the potential to reveal complex multi-SNP-multi-QT associations. However, the scale and complexity of imaging genetic data have presented critical computational bottlenecks requiring new concepts and enabling tools. In this paper, we present our initial efforts on developing a set of massively parallel strategies to accelerate a widely used SCCA implementation provided by the Penalized Multivariate Analysis (PMA) software package. In particular, we exploit parallel packages of R, optimized mathematical libraries, and the automatic offload model for the Intel Many Integrated Core (MIC) architecture to accelerate SCCA. We create several simulated imaging genetics data sets of different sizes and use these synthetic data to perform a comparative study. Our performance evaluation demonstrates that a 2-fold speedup can be achieved by the proposed acceleration. The preliminary results show that by combining a data-parallel strategy with the offload model for MIC, we can significantly reduce the time to knowledge discovery when applying SCCA to large brain imaging genetics data.