Fast Identification of Biological Pathways Associated with a Quantitative Trait Using Group Lasso with Overlaps

Imperial College London, UK.
Statistical Applications in Genetics and Molecular Biology (Impact Factor: 1.52). 01/2012; 11(1):7-7. DOI: 10.2202/1544-6115.1755
Source: RePEc

ABSTRACT Where causal SNPs (single nucleotide polymorphisms) tend to accumulate within biological pathways, the incorporation of prior pathways information into a statistical model is expected to increase the power to detect true associations in a genetic association study. Most existing pathways-based methods rely on marginal SNP statistics and do not fully exploit the dependence patterns among SNPs within pathways.

We use a sparse regression model, with SNPs grouped into pathways, to identify causal pathways associated with a quantitative trait. Notable features of our “pathways group lasso with adaptive weights” (P-GLAW) algorithm include the incorporation of all pathways in a single regression model, an adaptive pathway weighting procedure that accounts for factors biasing pathway selection, and the use of a bootstrap sampling procedure for the ranking of important pathways. P-GLAW takes account of the presence of overlapping pathways and uses a novel combination of techniques to optimise model estimation, making it fast to run, even on whole genome datasets.
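To make the idea of a group lasso with overlapping pathways concrete, here is a minimal sketch in Python. It is not the P-GLAW implementation. It assumes the standard column-duplication trick for overlaps, a plain proximal gradient solver, and square-root-of-group-size weights in place of the adaptive weights described above. The function names (`expand_overlaps`, `group_lasso`, `selected_groups`) and all parameter values are hypothetical, and the bootstrap ranking step is omitted.

```python
import numpy as np

def expand_overlaps(X, groups):
    """Duplicate columns so every pathway gets its own copy of each SNP it contains."""
    cols = np.concatenate([np.asarray(g, dtype=int) for g in groups])
    bounds = np.cumsum([0] + [len(g) for g in groups])
    slices = [slice(bounds[i], bounds[i + 1]) for i in range(len(groups))]
    return X[:, cols], slices

def group_lasso(X, y, slices, lam, weights=None, n_iter=500):
    """Minimise (1/2n)||y - X b||^2 + lam * sum_g w_g ||b_g||_2 by proximal gradient."""
    n, p = X.shape
    if weights is None:
        # Assumed default: sqrt(group size), NOT the paper's adaptive weights.
        weights = [np.sqrt(s.stop - s.start) for s in slices]
    # Step size 1/L, where L = ||X||_2^2 / n is the gradient's Lipschitz constant.
    step = n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y) / n)
        # Group soft-thresholding: shrink each pathway block, zeroing weak ones.
        for s, w in zip(slices, weights):
            norm = np.linalg.norm(z[s])
            scale = max(0.0, 1.0 - step * lam * w / norm) if norm > 0 else 0.0
            z[s] = scale * z[s]
        beta = z
    return beta

def selected_groups(beta, slices):
    """Indices of pathways whose coefficient block is not identically zero."""
    return [i for i, s in enumerate(slices) if np.linalg.norm(beta[s]) > 0]
```

A typical use would build `groups` as lists of SNP column indices per pathway (a SNP may appear in several lists), call `expand_overlaps` once, fit with `group_lasso`, and read off the selected pathways with `selected_groups`. Duplicating columns lets a SNP contribute through one pathway while its copy in another pathway is shrunk to zero, which is what allows whole pathways to be selected or dropped despite the overlap.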

In a comparison study with an alternative pathways method based on univariate SNP statistics, our method demonstrates high sensitivity and specificity for the detection of important pathways, showing the greatest relative gains in performance where marginal SNP effect sizes are small.

  •
    ABSTRACT: The Genetics Core of the Alzheimer's Disease Neuroimaging Initiative (ADNI), formally established in 2009, aims to provide resources and facilitate research related to genetic predictors of multidimensional Alzheimer's disease (AD)-related phenotypes. Here, we provide a systematic review of genetic studies published between 2009 and 2012 where either ADNI APOE genotype or genome-wide association study (GWAS) data were used. We review and synthesize ADNI genetic associations with disease status or quantitative disease endophenotypes including structural and functional neuroimaging, fluid biomarker assays, and cognitive performance. We also discuss the diverse analytical strategies used in these studies, including univariate and multivariate analysis, meta-analysis, pathway analysis, and interaction and network analysis. Finally, we perform pathway and network enrichment analyses of these ADNI genetic associations to highlight key mechanisms that may drive disease onset and trajectory. Major ADNI findings included all the top 10 AD genes and several of these (e.g., APOE, BIN1, CLU, CR1, and PICALM) were corroborated by ADNI imaging, fluid and cognitive phenotypes. ADNI imaging genetics studies discovered novel findings (e.g., FRMD6) that were later replicated on different data sets. Several other genes (e.g., APOC1, FTO, GRIN2B, MAGI2, and TOMM40) were associated with multiple ADNI phenotypes, warranting further investigation on other data sets. The broad availability and wide scope of ADNI genetic and phenotypic data has advanced our understanding of the genetic basis of AD and has nominated novel targets for future studies employing next-generation sequencing and convergent multi-omics approaches, and for clinical drug and biomarker development.
    Brain Imaging and Behavior 10/2013; DOI:10.1007/s11682-013-9262-z · 2.67 Impact Factor
  •
    ABSTRACT: The development of advanced medical imaging technologies and high-throughput genomic measurements has enhanced our ability to understand their interplay as well as their relationship with human behavior by integrating these two types of datasets. However, the high dimensionality and heterogeneity of these datasets presents a challenge to conventional statistical methods; there is a high demand for the development of both correlative and integrative analysis approaches. Here, we review our recent work on developing sparse representation based approaches to address this challenge. We show how sparse models are applied to the correlation and integration of imaging and genetic data for biomarker identification. We present examples on how these approaches are used for the detection of risk genes and classification of complex diseases such as schizophrenia. Finally, we discuss future directions on the integration of multiple imaging and genomic datasets including their interactions such as epistasis.
    Journal of Neuroscience Methods 11/2014; 237. DOI:10.1016/j.jneumeth.2014.09.001 · 1.96 Impact Factor
  •
    ABSTRACT: Standard approaches to data analysis in genome-wide association studies (GWAS) ignore any potential functional relationships between gene variants. In contrast, gene pathways analysis uses prior information on functional structure within the genome to identify pathways associated with a trait of interest. In a second step, important single nucleotide polymorphisms (SNPs) or genes may be identified within associated pathways. The pathways approach is motivated by the fact that genes do not act alone, but instead have effects that are likely to be mediated through their interaction in gene pathways. Where this is the case, pathways approaches may reveal aspects of a trait's genetic architecture that would otherwise be missed when considering SNPs in isolation. Most pathways methods begin by testing SNPs one at a time, and so fail to capitalise on the potential advantages inherent in a multi-SNP, joint modelling approach. Here, we describe a dual-level, sparse regression model for the simultaneous identification of pathways and genes associated with a quantitative trait. Our method takes account of various factors specific to the joint modelling of pathways with genome-wide data, including widespread correlation between genetic predictors, and the fact that variants may overlap multiple pathways. We use a resampling strategy that exploits finite sample variability to provide robust rankings for pathways and genes. We test our method through simulation, and use it to perform pathways-driven gene selection in a search for pathways and genes associated with variation in serum high-density lipoprotein cholesterol levels in two separate GWAS cohorts of Asian adults. By comparing results from both cohorts we identify a number of candidate pathways including those associated with cardiomyopathy, and T cell receptor and PPAR signalling. Highlighted genes include those associated with the L-type calcium channel, adenylate cyclase, integrin, laminin, MAPK signalling and immune function.
    PLoS Genetics 11/2013; 9(11):e1003939. DOI:10.1371/journal.pgen.1003939 · 8.17 Impact Factor
