A dynamic model for genome-wide association studies

Department of Statistics, The Pennsylvania State University, University Park, PA, USA.
Human Genetics (Impact Factor: 4.82). 02/2011; 129(6):629-39. DOI: 10.1007/s00439-011-0960-6
Source: PubMed


Although genome-wide association studies (GWAS) are widely used to identify the genetic and environmental etiology of a trait, several key issues related to their statistical power and biological relevance have remained unexplored. Here, we describe a novel statistical approach, called functional GWAS (fGWAS), that analyzes the genetic control of traits by integrating biological principles of trait formation into the GWAS framework through mathematical and statistical bridges. fGWAS can address many fundamental questions, such as the patterns of genetic control over development, the duration of genetic effects, and what causes developmental trajectories to change or stop changing. Statistically, fGWAS offers increased power for gene detection by capitalizing on cumulative phenotypic variation in a longitudinal trait over time, and increased robustness when handling sparse longitudinal data.
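To make the idea of testing a SNP against a whole trajectory concrete, here is a minimal sketch. It is not the model from the paper: it assumes straight-line trajectories, independent Gaussian residuals with a common variance, and ignores within-subject correlation. The function names (`fit_line_rss`, `trajectory_lr_statistic`, `simulate_cohort`) and all simulation parameters are hypothetical and chosen only for illustration. The sketch pools every subject's sparse, irregularly timed measurements and compares genotype-specific lines against one shared line with a likelihood-ratio statistic.

```python
import math
import random


def fit_line_rss(times, values):
    """Least-squares fit of y = a + b*t; returns the residual sum of squares."""
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return sum((y - intercept - slope * t) ** 2 for t, y in zip(times, values))


def trajectory_lr_statistic(genotypes, times, traits):
    """Likelihood-ratio statistic: genotype-specific lines vs. one shared line.

    genotypes: one genotype code (0, 1, or 2) per subject
    times:     per-subject lists of measurement times (lengths may differ)
    traits:    per-subject lists of trait values, aligned with `times`
    Assumes independent Gaussian residuals with a common variance.
    """
    all_t = [t for ts in times for t in ts]
    all_y = [y for ys in traits for y in ys]
    rss_null = fit_line_rss(all_t, all_y)  # one line for everyone

    rss_alt = 0.0  # a separate line for each genotype group
    for g in sorted(set(genotypes)):
        g_t = [t for gi, ts in zip(genotypes, times) if gi == g for t in ts]
        g_y = [y for gi, ys in zip(genotypes, traits) if gi == g for y in ys]
        rss_alt += fit_line_rss(g_t, g_y)

    return len(all_t) * math.log(rss_null / rss_alt)


def simulate_cohort(n_subjects, slope_per_allele, seed):
    """Simulate sparse longitudinal data (hypothetical parameters throughout)."""
    rng = random.Random(seed)
    genotypes, times, traits = [], [], []
    for _ in range(n_subjects):
        g = rng.choice([0, 1, 2])
        n_visits = rng.randint(2, 5)  # each subject has its own sparse schedule
        ts = sorted(rng.uniform(0.0, 10.0) for _ in range(n_visits))
        ys = [1.0 + (0.3 + slope_per_allele * g) * t + rng.gauss(0.0, 1.0)
              for t in ts]
        genotypes.append(g)
        times.append(ts)
        traits.append(ys)
    return genotypes, times, traits


if __name__ == "__main__":
    for effect in (0.0, 0.2):
        g, t, y = simulate_cohort(200, effect, seed=1)
        print(effect, round(trajectory_lr_statistic(g, t, y), 2))
```

Because every measurement from every visit contributes to the fit, subjects with only two or three observations still add information. A realistic analysis would replace the straight line with a biologically motivated curve and model the correlation among repeated measurements on the same subject.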


Available from: Yao li, Mar 06, 2015
  • Source
    • [Garbled table excerpt; recoverable labels: "Bayesian group lasso", "Das et al. (2011)"]
    ABSTRACT: Although genome-wide association studies (GWAS) have proven powerful for comprehending the genetic architecture of complex traits, they are challenged by a high dimension of single-nucleotide polymorphisms (SNPs) as predictors, the presence of complex environmental factors, and the longitudinal or functional nature of many complex traits or diseases. To address these challenges, we propose a high-dimensional varying-coefficient model for incorporating functional aspects of phenotypic traits into GWAS to formulate a so-called functional GWAS or fGWAS. The Bayesian group lasso and the associated MCMC algorithms are developed to identify significant SNPs and estimate how they affect longitudinal traits through time-varying genetic actions. The model is generalized to analyze the genetic control of complex traits using subject-specific sparse longitudinal data. The statistical properties of the new model are investigated through simulation studies. We use the new model to analyze a real GWAS data set from the Framingham Heart Study, leading to the identification of several significant SNPs associated with age-specific changes of body mass index. The fGWAS model, equipped with the Bayesian group lasso, will provide a useful tool for genetic and developmental analysis of complex traits or diseases.
    The Annals of Applied Statistics 09/2015; 9(2). DOI:10.1214/15-AOAS808 · 1.46 Impact Factor
  • Source
    • "Genome-wide association studies (GWAS) have been a powerful tool for genetic and biomedical research. The past decade has witnessed the rapid development of GWAS and the substantial contributions it has made [Altshuler, Daly and Lander (2008); Psychiatric GCCC (2009); Hirschhorn (2009); Das et al. (2011)]. With advances in high-throughput genotyping techniques and modern statistics, GWAS have been helping investigators understand the genetic basis of many complex traits or diseases, providing valuable clues to the genetic predisposition of common diseases and drug responses [Burton et al. (2007); Daly (2010)], among others. "
    ABSTRACT: With the recent advent of high-throughput genotyping techniques, genetic data for genome-wide association studies (GWAS) have become increasingly available, which calls for the development of efficient and effective statistical approaches. Although many such approaches have been developed and used to identify single-nucleotide polymorphisms (SNPs) that are associated with complex traits or diseases, few are able to detect gene-gene interactions among different SNPs. Genetic interactions, also known as epistasis, have been recognized to play a pivotal role in contributing to the genetic variation of phenotypic traits. However, because of an extremely large number of SNP-SNP combinations in GWAS, the model dimensionality can quickly become so overwhelming that no prevailing variable selection methods are capable of handling this problem. In this paper, we present a statistical framework for characterizing main genetic effects and epistatic interactions in a GWAS. Specifically, we first propose a two-stage sure independence screening (TS-SIS) procedure and generate a pool of candidate SNPs and interactions, which serve as predictors to explain and predict the phenotypes of a complex trait. We also propose a rates adjusted thresholding estimation (RATE) approach to determine the size of the reduced model selected by an independence screening. Regularization regression methods, such as LASSO or SCAD, are then applied to further identify important genetic effects. Simulation studies show that the TS-SIS procedure is computationally efficient and has outstanding finite-sample performance in selecting potential SNPs as well as gene-gene interactions. We apply the proposed framework to analyze an ultrahigh-dimensional GWAS data set from the Framingham Heart Study, and select 23 active SNPs and 24 active epistatic interactions for the body mass index variation. This result shows the capability of our procedure to resolve the complexity of genetic control.
    The Annals of Applied Statistics 02/2015; 8(4). DOI:10.1214/14-AOAS771 · 1.46 Impact Factor
  • Source
    • "Recent genetic association studies have been performed on longitudinal cohorts to take advantage of repeat measurements of time-varying variables [Das et al., 2011; Fan et al., 2012; Furlotte et al., 2012]. Longitudinal analysis in genetic studies offers several advantages. "
    ABSTRACT: The analysis of whole-genome sequence (WGS) data using longitudinal phenotypes offers a potentially rich resource for the examination of the genetic variants and their covariates that affect complex phenotypes over time. We summarize eight contributions to the Genetic Analysis Workshop 18, which applied a diverse array of statistical genetic methods to analyze WGS data in combination with data from genome-wide association studies (GWAS) from up to four different time points on blood pressure phenotypes. The common goal of these analyses was to develop and apply appropriate methods that utilize longitudinal repeated measures to potentially increase the analytic efficiency of WGS and GWAS data. These diverse methods can be grouped into two categories, based on the way they model dependence structures: (1) linear mixed-effects (LME) models, where the random effect terms in the linear models are used to capture the dependence structures; and (2) variance-components models, where the dependence structures are constructed directly based on multiple components of variance-covariance matrices for the multivariate Gaussian responses. Despite the heterogeneous nature of these analytical methods, the group came to the following conclusions: (1) the use of repeat measurements can gain power to identify variants associated with the phenotype; (2) the inclusion of family data may correct genotyping errors and allow for more accurate detection of rare variants than using unrelated individuals only; and (3) fitting mixed-effects and variance-components models for longitudinal data presents computational challenges. The challenges and computational burden demanded by WGS data were addressed in the eight contributions.
    Genetic Epidemiology 09/2014; 38(S1):S74-S80. DOI:10.1002/gepi.21829 · 2.60 Impact Factor