Fast Identification of Biological Pathways Associated with a Quantitative Trait Using Group Lasso with Overlaps

Imperial College London, UK.
Statistical Applications in Genetics and Molecular Biology (Impact Factor: 1.13). 01/2012; 11(1):7-7. DOI: 10.2202/1544-6115.1755
Source: RePEc


Where causal SNPs (single nucleotide polymorphisms) tend to accumulate within biological pathways, the incorporation of prior pathways information into a statistical model is expected to increase the power to detect true associations in a genetic association study. Most existing pathways-based methods rely on marginal SNP statistics and do not fully exploit the dependence patterns among SNPs within pathways.

We use a sparse regression model, with SNPs grouped into pathways, to identify causal pathways associated with a quantitative trait. Notable features of our “pathways group lasso with adaptive weights” (P-GLAW) algorithm include the incorporation of all pathways in a single regression model, an adaptive pathway weighting procedure that accounts for factors biasing pathway selection, and the use of a bootstrap sampling procedure for the ranking of important pathways. P-GLAW takes account of the presence of overlapping pathways and uses a novel combination of techniques to optimise model estimation, making it fast to run, even on whole genome datasets.
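
The ideas in the paragraph above can be illustrated with a minimal sketch. It is not the P-GLAW implementation: it assumes overlaps are handled by duplicating shared SNP columns (a common device for group lasso with overlaps), fits the model by plain proximal gradient descent, and uses square-root-of-group-size weights as a stand-in for the paper's adaptive weights. The function names, the penalty value `lam`, and the simulated data are all hypothetical.

```python
import numpy as np

def expand_overlaps(X, pathways):
    """Give each pathway its own copy of any SNP it shares with another pathway."""
    cols = np.concatenate([np.asarray(p) for p in pathways])
    bounds = np.cumsum([0] + [len(p) for p in pathways])
    slices = [slice(bounds[i], bounds[i + 1]) for i in range(len(pathways))]
    return X[:, cols], slices

def group_lasso(X, y, slices, lam, weights=None, n_iter=200):
    """Proximal gradient for (1/2n)||y - Xb||^2 + lam * sum_g w_g ||b_g||_2."""
    n, p = X.shape
    if weights is None:
        # Assumption: sqrt(group size) weights, not the paper's adaptive weights.
        weights = [np.sqrt(s.stop - s.start) for s in slices]
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y)) / n
        for s, w in zip(slices, weights):
            norm = np.linalg.norm(z[s])
            # Group soft-thresholding: a whole pathway is shrunk, possibly to zero.
            scale = max(0.0, 1.0 - step * lam * w / norm) if norm > 0 else 0.0
            beta[s] = scale * z[s]
    return beta

def selection_frequency(X, y, pathways, lam, n_boot=20, seed=0):
    """Fraction of bootstrap resamples in which each pathway is selected."""
    rng = np.random.default_rng(seed)
    Xe, slices = expand_overlaps(X, pathways)
    counts = np.zeros(len(pathways))
    n = len(y)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        beta = group_lasso(Xe[idx], y[idx] - y[idx].mean(), slices, lam)
        counts += np.array([np.any(beta[s] != 0) for s in slices])
    return counts / n_boot

# Simulated example: 12 SNPs, 4 pathways, SNPs 3 and 9 each shared by two pathways.
rng = np.random.default_rng(1)
X = rng.binomial(2, 0.3, size=(200, 12)).astype(float)  # minor-allele counts
X = (X - X.mean(axis=0)) / X.std(axis=0)
pathways = [[0, 1, 2, 3], [3, 4, 5, 6], [7, 8, 9], [9, 10, 11]]
y = X[:, :3].sum(axis=1) + 0.5 * rng.standard_normal(200)  # causal SNPs in pathway 0

freq = selection_frequency(X, y, pathways, lam=0.3)
ranking = np.argsort(-freq)  # pathways ordered by selection frequency
```

Ranking pathways by how often they are selected across resamples, rather than by a single fit, makes the ordering less sensitive to the particular sample and to the exact penalty value.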

In a comparison study with an alternative pathways method based on univariate SNP statistics, our method demonstrates high sensitivity and specificity for the detection of important pathways, showing the greatest relative gains in performance where marginal SNP effect sizes are small.



Available from: Giovanni Montana, Jan 01, 2014
  •
    ABSTRACT: The Alzheimer's Disease Neuroimaging Initiative (ADNI) is an ongoing, longitudinal, multicenter study designed to develop clinical, imaging, genetic, and biochemical biomarkers for the early detection and tracking of Alzheimer's disease (AD). The initial study, ADNI-1, enrolled 400 subjects with early mild cognitive impairment (MCI), 200 with early AD, and 200 cognitively normal elderly controls. ADNI-1 was extended by a 2-year Grand Opportunities grant in 2009 and by a competitive renewal, ADNI-2, which enrolled an additional 550 participants and will run until 2015. This article reviews all papers published since the inception of the initiative and summarizes the results to the end of 2013. The major accomplishments of ADNI have been as follows: (1) the development of standardized methods for clinical tests, magnetic resonance imaging (MRI), positron emission tomography (PET), and cerebrospinal fluid (CSF) biomarkers in a multicenter setting; (2) elucidation of the patterns and rates of change of imaging and CSF biomarker measurements in control subjects, MCI patients, and AD patients. CSF biomarkers are largely consistent with disease trajectories predicted by β-amyloid cascade (Hardy, J Alzheimer's Dis 2006;9(Suppl 3):151-3) and tau-mediated neurodegeneration hypotheses for AD, whereas brain atrophy and hypometabolism levels show predicted patterns but exhibit differing rates of change depending on region and disease severity; (3) the assessment of alternative methods of diagnostic categorization. Currently, the best classifiers select and combine optimum features from multiple modalities, including MRI, [(18)F]-fluorodeoxyglucose-PET, amyloid PET, CSF biomarkers, and clinical tests; (4) the development of blood biomarkers for AD as potentially noninvasive and low-cost alternatives to CSF biomarkers for AD diagnosis and the assessment of α-syn as an additional biomarker; (5) the development of methods for the early detection of AD. CSF biomarkers, β-amyloid 42 and tau, as well as amyloid PET may reflect the earliest steps in AD pathology in mildly symptomatic or even nonsymptomatic subjects and are leading candidates for the detection of AD in its preclinical stages; (6) the improvement of clinical trial efficiency through the identification of subjects most likely to undergo imminent future clinical decline and the use of more sensitive outcome measures to reduce sample sizes. Multimodal methods incorporating APOE status and longitudinal MRI proved most highly predictive of future decline. Refinements of clinical tests used as outcome measures such as clinical dementia rating-sum of boxes further reduced sample sizes; (7) the pioneering of genome-wide association studies that leverage quantitative imaging and biomarker phenotypes, including longitudinal data, to confirm recently identified loci, CR1, CLU, and PICALM and to identify novel AD risk loci; (8) worldwide impact through the establishment of ADNI-like programs in Japan, Australia, Argentina, Taiwan, China, Korea, Europe, and Italy; (9) understanding the biology and pathobiology of normal aging, MCI, and AD through integration of ADNI biomarker and clinical data to stimulate research that will resolve controversies about competing hypotheses on the etiopathogenesis of AD, thereby advancing efforts to find disease-modifying drugs for AD; and (10) the establishment of infrastructure to allow sharing of all raw and processed data without embargo to interested scientific investigators throughout the world. Published by Elsevier Inc.
    Alzheimer's & dementia: the journal of the Alzheimer's Association 10/2011; 8(1 Suppl):S1-68. DOI:10.1016/j.jalz.2011.09.172 · 12.41 Impact Factor
  •
    ABSTRACT: Scanning the entire genome in search of variants related to imaging phenotypes holds great promise in elucidating the genetic etiology of neurodegenerative disorders. Here we discuss the application of a penalized multivariate model, sparse reduced-rank regression (sRRR), for the genome-wide detection of markers associated with voxel-wise longitudinal changes in the brain caused by Alzheimer's disease (AD). Using a sample from the Alzheimer's Disease Neuroimaging Initiative database, we performed three separate studies that each compared two groups of individuals to identify genes associated with disease development and progression. For each comparison we took a two-step approach: initially, using penalized linear discriminant analysis, we identified voxels that provide an imaging signature of the disease with high classification accuracy; then we used this multivariate biomarker as a phenotype in a genome-wide association study, carried out using sRRR. The genetic markers were ranked in order of importance of association to the phenotypes using a data re-sampling approach. Our findings confirmed the key role of the APOE and TOMM40 genes but also highlighted some novel potential associations with AD.
    NeuroImage 12/2011; 60(1):700-16. DOI:10.1016/j.neuroimage.2011.12.029 · 6.36 Impact Factor
  •
    ABSTRACT: The task of analyzing high-dimensional single nucleotide polymorphism (SNP) data in a case-control design using multivariable techniques has only recently been tackled. While many available approaches investigate only main effects in a high-dimensional setting, we propose a more flexible technique, cluster-localized regression (CLR), based on localized logistic regression models, that allows different SNPs to have an effect for different groups of individuals. Separate multivariable regression models are fitted for the different groups of individuals by incorporating weights into componentwise boosting, which provides simultaneous variable selection, hence sparse fits. For model fitting, these groups of individuals are identified using a clustering approach, where each group may be defined via different SNPs. This allows for representing complex interaction patterns, such as compositional epistasis, that might not be detected by a single main effects model. In a simulation study, the CLR approach results in improved prediction performance, compared to the main effects approach, and identification of important SNPs in several scenarios. Improved prediction performance is also obtained for an application example considering urinary bladder cancer. Some of the identified SNPs are predictive for all individuals, while others are only relevant for a specific group. Together with the sets of SNPs that define the groups, potential interaction patterns are uncovered.
    Statistical Applications in Genetics and Molecular Biology 01/2012; 11(4). DOI:10.1515/1544-6115.1694 · 1.13 Impact Factor