A dynamic model for genome-wide association studies

Department of Statistics, The Pennsylvania State University, University Park, PA, USA.
Human Genetics (Impact Factor: 4.82). 02/2011; 129(6):629-39. DOI: 10.1007/s00439-011-0960-6
Source: PubMed


Although genome-wide association studies (GWAS) are widely used to identify the genetic and environmental etiology of a trait, several key issues related to their statistical power and biological relevance have remained unexplored. Here, we describe a novel statistical approach, called functional GWAS or fGWAS, that analyzes the genetic control of traits by integrating biological principles of trait formation into the GWAS framework through mathematical and statistical bridges. fGWAS can address many fundamental questions, such as the patterns of genetic control over development, the duration of genetic effects, and what causes developmental trajectories to change or stop changing. Statistically, fGWAS displays increased power for gene detection by capitalizing on cumulative phenotypic variation in a longitudinal trait over time, and increased robustness in handling sparse longitudinal data.
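One way to see how a longitudinal trait can add power is to test, for a single SNP, whether each genotype follows its own mean trajectory rather than one shared trajectory. The sketch below is illustrative only and is not the fGWAS implementation: it assumes straight-line trajectories and independent Gaussian errors (so it ignores within-subject correlation), and the names `fit_line_rss` and `trajectory_lr` are hypothetical. Subjects may contribute different numbers of time points, which is how sparse data enter; each genotype group needs at least two distinct time points.

```python
import math


def fit_line_rss(points):
    """Least-squares fit of y = a + b*t; returns the residual sum of squares."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    stt = sum((t - mean_t) ** 2 for t, _ in points)
    sty = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = sty / stt  # assumes at least two distinct time points
    intercept = mean_y - slope * mean_t
    return sum((y - intercept - slope * t) ** 2 for t, y in points)


def trajectory_lr(genotypes, trajectories):
    """Likelihood-ratio statistic: genotype-specific lines vs. one shared line.

    genotypes    -- one genotype code per subject (e.g. 0, 1, 2)
    trajectories -- one list of (time, value) pairs per subject; lengths may differ
    Assumption: observations are treated as independent with equal variance.
    """
    pooled = [p for traj in trajectories for p in traj]
    rss_null = fit_line_rss(pooled)
    rss_alt = 0.0
    for g in set(genotypes):
        group = [p for gi, traj in zip(genotypes, trajectories) if gi == g
                 for p in traj]
        rss_alt += fit_line_rss(group)
    # Gaussian likelihood-ratio statistic with the error variance profiled out.
    return len(pooled) * math.log(rss_null / rss_alt)
```

A large statistic suggests that the genotypes differ in their trajectories. In a genome-wide scan it would be computed for every SNP and calibrated with a multiple-testing correction or permutation.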



    • "Genome-wide association studies (GWAS) have been a powerful tool for genetic and biomedical research. The past decade has witnessed the rapid development of GWAS and the substantial contributions it has made [Altshuler, Daly and Lander (2008); Psychiatric GCCC (2009); Hirschhorn (2009); Das et al. (2011)]. With advances in high-throughput genotyping techniques and modern statistics, GWAS have been helping investigators understand the genetic basis of many complex traits or diseases, providing valuable clues to the genetic predisposition of common diseases and drug responses [Burton et al. (2007); Daly (2010)], among others. "
    ABSTRACT: With the recent advent of high-throughput genotyping techniques, genetic data for genome-wide association studies (GWAS) have become increasingly available, which calls for the development of efficient and effective statistical approaches. Although many such approaches have been developed and used to identify single-nucleotide polymorphisms (SNPs) that are associated with complex traits or diseases, few are able to detect gene-gene interactions among different SNPs. Genetic interactions, also known as epistasis, have been recognized to play a pivotal role in contributing to the genetic variation of phenotypic traits. However, because of the extremely large number of SNP-SNP combinations in GWAS, the model dimensionality can quickly become so overwhelming that no prevailing variable selection method is capable of handling this problem. In this paper, we present a statistical framework for characterizing main genetic effects and epistatic interactions in a GWAS. Specifically, we first propose a two-stage sure independence screening (TS-SIS) procedure and generate a pool of candidate SNPs and interactions, which serve as predictors to explain and predict the phenotypes of a complex trait. We also propose a rates adjusted thresholding estimation (RATE) approach to determine the size of the reduced model selected by an independence screening. Regularized regression methods, such as LASSO or SCAD, are then applied to further identify important genetic effects. Simulation studies show that the TS-SIS procedure is computationally efficient and has outstanding finite-sample performance in selecting potential SNPs as well as gene-gene interactions. We apply the proposed framework to analyze an ultrahigh-dimensional GWAS data set from the Framingham Heart Study, and select 23 active SNPs and 24 active epistatic interactions for the body mass index variation. This demonstrates the capability of our procedure to resolve the complexity of genetic control.
    The Annals of Applied Statistics 02/2015; 8(4). DOI:10.1214/14-AOAS771 · 1.46 Impact Factor
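The screening idea in the abstract above can be sketched as follows. This is a minimal illustration and not the authors' TS-SIS: it assumes that stage one ranks SNPs by absolute marginal correlation with the phenotype and that stage two ranks pairwise products of the surviving SNPs in the same way. The cut-offs `keep_main` and `keep_pairs` are set by hand here (the RATE procedure is not implemented), and all function names are hypothetical.

```python
import math
from itertools import combinations


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def screen(columns, y, keep):
    """Return the `keep` keys of `columns` with the largest |correlation| to y."""
    ranked = sorted(columns, key=lambda name: abs_corr(columns[name], y),
                    reverse=True)
    return ranked[:keep]


def two_stage_screen(snps, y, keep_main, keep_pairs):
    """Stage 1: screen single SNPs. Stage 2: screen products of the survivors.

    snps -- dict mapping SNP name to a list of genotype codes, one per subject
    y    -- list of phenotype values, one per subject
    """
    main = screen(snps, y, keep_main)
    products = {
        (a, b): [u * v for u, v in zip(snps[a], snps[b])]
        for a, b in combinations(main, 2)
    }
    pairs = screen(products, y, keep_pairs)
    return main, pairs
```

The retained SNPs and pairs would then serve as predictors for a penalized regression such as LASSO or SCAD, as the abstract describes. One consequence of this simplified design is that a pair whose members both lack a marginal signal never reaches stage two.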
    • "Recent genetic association studies have been performed on longitudinal cohorts to take advantage of repeat measurements of time-varying variables [Das et al., 2011; Fan et al., 2012; Furlotte et al., 2012]. Longitudinal analysis in genetic studies offers several advantages. "
    ABSTRACT: The analysis of whole-genome sequence (WGS) data using longitudinal phenotypes offers a potentially rich resource for the examination of the genetic variants and their covariates that affect complex phenotypes over time. We summarize eight contributions to the Genetic Analysis Workshop 18, which applied a diverse array of statistical genetic methods to analyze WGS data in combination with data from genome-wide association studies (GWAS) from up to four different time points on blood pressure phenotypes. The common goal of these analyses was to develop and apply appropriate methods that utilize longitudinal repeated measures to potentially increase the analytic efficiency of WGS and GWAS data. These diverse methods can be grouped into two categories, based on the way they model dependence structures: (1) linear mixed-effects (LME) models, where the random effect terms in the linear models are used to capture the dependence structures; and (2) variance-components models, where the dependence structures are constructed directly based on multiple components of variance-covariance matrices for the multivariate Gaussian responses. Despite the heterogeneous nature of these analytical methods, the group came to the following conclusions: (1) the use of repeated measurements can increase the power to identify variants associated with the phenotype; (2) the inclusion of family data may correct genotyping errors and allow for more accurate detection of rare variants than using unrelated individuals only; and (3) fitting mixed-effects and variance-components models for longitudinal data presents computational challenges. The challenges and computational burden demanded by WGS data were addressed in the eight contributions.
    Genetic Epidemiology 09/2014; 38(S1):S74-S80. DOI:10.1002/gepi.21829 · 2.60 Impact Factor
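For orientation, the two categories named in the abstract above can be written in generic textbook form. The notation is assumed here and is not taken from any of the eight contributions: $y_{ij}$ is the phenotype of subject $i$ at visit $j$, $t_{ij}$ the measurement time, $g_i$ the genotype, $b_i$ a subject-level random intercept, and the $V_k$ are known matrices (for example a kinship matrix and an identity matrix).

```latex
% (1) Linear mixed-effects model: the random effect b_i induces
%     dependence among repeated measures of the same subject.
y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_g g_i + b_i + \varepsilon_{ij},
\qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma_e^2)

% (2) Variance-components model: the dependence is specified directly
%     through a sum of covariance components.
\mathbf{y} \sim N\!\left(X\boldsymbol{\beta},\; \sum_{k} \sigma_k^2 V_k\right)
```

In both forms the genetic association test concerns a fixed effect ($\beta_g$, or the corresponding column of $X$), while the random terms or covariance components account for the correlation among repeated or related observations.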
    • "Statistical learning (SL) methods are designed to find important predictors in large data sets, and can be applied in cases where the number of predictors is much larger than the number of subjects. Different learning methods such as Random Forests and Bayesian Lasso have recently been applied to genome-wide data [14-19]. "
    ABSTRACT: Typically, genome-wide association studies consist of regressing the phenotype on each SNP separately using an additive genetic model. Although statistical models for recessive, dominant, SNP-SNP, or SNP-environment interactions exist, the testing burden makes an evaluation of all possible effects impractical for genome-wide data. We advocate a two-step approach where the first step consists of a filter that is sensitive to different types of SNP main and interaction effects. The aim is to substantially reduce the number of SNPs such that more specific modeling becomes feasible in a second step. We provide an evaluation of a statistical learning method called “gradient boosting machine” (GBM) that can be used as a filter. GBM does not require an a priori specification of a genetic model, and permits inclusion of large numbers of covariates. GBM can therefore be used to explore multiple GxE interactions, which would not be feasible within the parametric framework used in GWAS. We show in a simulation that GBM performs well even under conditions favorable to the standard additive regression model commonly used in GWAS, and is sensitive to interaction effects even if one of the interacting variables has a zero main effect. The latter would not be detected in GWAS. Our evaluation is accompanied by an analysis of empirical data concerning hair morphology. We estimate the phenotypic variance explained by increasing numbers of highest ranked SNPs, and show that it is sufficient to select 10K-20K SNPs in the first step of a two-step approach.
    10/2013; 4. DOI:10.4172/2153-0602.1000143
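The filtering step described in the abstract above can be illustrated with a small gradient boosting routine. This is a sketch under stated assumptions, not the GBM configuration evaluated in the paper: it uses squared-error loss with one-split trees (stumps), ranks SNPs by their accumulated split gain, and keeps the top `keep`. All names and parameter defaults are hypothetical. Because stumps are additive, this version cannot detect an interaction whose members have no main effect; deeper trees would be needed for that.

```python
def best_stump(columns, residuals):
    """Best single split over all columns.

    Returns (gain, name, threshold, left_mean, right_mean), or None if no
    column can be split. `gain` is the reduction in squared error relative
    to predicting the mean residual.
    """
    n = len(residuals)
    total = sum(residuals)
    best = None
    for name, col in columns.items():
        for thr in sorted(set(col))[:-1]:  # dropping the max keeps the right side non-empty
            left = [r for x, r in zip(col, residuals) if x <= thr]
            n_left, left_sum = len(left), sum(left)
            n_right, right_sum = n - n_left, total - left_sum
            gain = (left_sum ** 2 / n_left + right_sum ** 2 / n_right
                    - total ** 2 / n)
            if best is None or gain > best[0]:
                best = (gain, name, thr, left_sum / n_left, right_sum / n_right)
    return best


def gbm_filter(columns, y, rounds=50, rate=0.1, keep=2):
    """Boost stumps on residuals; return the top-`keep` columns and all gains."""
    mean_y = sum(y) / len(y)
    pred = [mean_y] * len(y)
    importance = {name: 0.0 for name in columns}
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = best_stump(columns, residuals)
        if stump is None or stump[0] <= 1e-12:
            break  # nothing left to explain
        gain, name, thr, left_mean, right_mean = stump
        importance[name] += gain
        # Shrunken update: move predictions a fraction `rate` toward the stump.
        pred = [p + rate * (left_mean if x <= thr else right_mean)
                for p, x in zip(pred, columns[name])]
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:keep], importance
```

In a two-step analysis, the SNPs returned by `gbm_filter` would be passed to a more specific model in the second step.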