Methods for investigating gene-environment interactions in candidate pathway and genome-wide association studies.

Department of Preventive Medicine, University of Southern California, Los Angeles, California, 90089-9011, USA.
Annual Review of Public Health (Impact Factor: 6.63). 04/2010; 31:21-36. DOI: 10.1146/annurev.publhealth.012809.103619
Source: PubMed

ABSTRACT Despite considerable enthusiasm about the yield of novel and replicated discoveries of genetic associations from the new generation of genome-wide association studies (GWAS), the proportion of heritability explained for most complex diseases studied to date remains small. Some of this "dark matter" could be due to gene-environment (G × E) interactions or more complex pathways involving multiple genes and exposures. We review the basic epidemiologic study designs and statistical analysis approaches for studying G × E interactions individually and then consider more comprehensive approaches to studying entire pathways or GWAS data. In addition to the usual issues in genetic association studies, particular care is needed in exposure assessment, and very large sample sizes are required. Although hypothesis-driven pathway-based approaches and agnostic GWAS approaches are generally viewed as opposite poles, we suggest that the two can be usefully married using hierarchical modeling strategies that exploit external pathway knowledge in mining genome-wide data.
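
As a rough illustration of what testing a single G × E interaction involves, the sketch below computes a multiplicative interaction odds ratio from case-control counts stratified by a binary genotype and a binary exposure, alongside the case-only estimate. The abstract does not give these formulas; they are the standard textbook estimators. The counts, the function names, and the binary coding of G and E are all assumptions made up for the example, and the counts were chosen so that G and E are independent among controls, which is the condition under which the two estimates coincide.

```python
# Hypothetical counts keyed by (g, e): g = 1 for carriers, e = 1 for exposed.
# These numbers are invented for illustration; they are not from any study.
cases = {(1, 1): 60, (1, 0): 30, (0, 1): 40, (0, 0): 40}
controls = {(1, 1): 20, (1, 0): 30, (0, 1): 40, (0, 0): 60}


def odds_ratio(a, b, c, d):
    """Odds ratio of the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def genetic_or_within_exposure(cases, controls, e):
    """Genotype-disease odds ratio within one exposure stratum."""
    return odds_ratio(cases[(1, e)], controls[(1, e)],
                      cases[(0, e)], controls[(0, e)])


def interaction_or(cases, controls):
    """Multiplicative interaction: genetic OR in exposed over unexposed."""
    return (genetic_or_within_exposure(cases, controls, 1)
            / genetic_or_within_exposure(cases, controls, 0))


def case_only_or(cases):
    """Genotype-exposure odds ratio among cases only.

    Approximates the interaction OR when G and E are independent
    in the source population.
    """
    return odds_ratio(cases[(1, 1)], cases[(1, 0)],
                      cases[(0, 1)], cases[(0, 0)])


if __name__ == "__main__":
    print("genetic OR, exposed:  ", genetic_or_within_exposure(cases, controls, 1))
    print("genetic OR, unexposed:", genetic_or_within_exposure(cases, controls, 0))
    print("interaction OR:       ", interaction_or(cases, controls))
    print("case-only OR:         ", case_only_or(cases))
```

Point estimates alone say nothing about precision. The interaction estimate depends on all eight cells, so its variance is driven by the sparsest ones, which is one way to see why very large sample sizes are required.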


Available from: Duncan C Thomas, May 22, 2015
  • ABSTRACT: The recent successes of genome-wide association studies (GWAS) have renewed interest in genome environment wide interaction studies (GEWIS) to discover genetic factors that modulate penetrance of environmental exposures to human diseases. Indeed, gene-environment interactions (G × E), which have not been emphasized in the GWAS era, could be a source contributing to the missing heritability, a major bottleneck limiting continuing GWAS successes. In this manuscript, we describe a design and analytic strategy to focus on G × E using only exposed subjects, dubbed e-GEWIS. Operationally, an e-GEWIS analysis is equivalent to a GWAS analysis on exposed subjects only, and it has actually been used in some earlier GWAS without being explicitly identified as such. Through both analytics and simulations, e-GEWIS is shown to be more efficient than the usual cross-product-based analysis of G × E interaction with both cases and controls (cc-GEWIS) and comparably efficient to case-only analysis of G × E (c-GEWIS), with potentially smaller sample sizes. The formalization of e-GEWIS here provides a theoretical basis to legitimize this framework for routine investigation of G × E, for more efficient G × E study designs, and for improvement of reproducibility in replicating GEWIS findings. As an illustration, we apply e-GEWIS to a lung cancer GWAS data set to perform a GEWIS, focusing on gene and smoking interaction. The e-GEWIS analysis successfully uncovered positive genetic associations on chromosome 15 among current smokers, suggesting a gene-smoking interaction. Although this signal was detected earlier, the current finding serves as a positive control in support of this e-GEWIS strategy. © 2015 WILEY PERIODICALS, INC.
    Genetic Epidemiology 02/2015; DOI:10.1002/gepi.21890 · 2.95 Impact Factor
  • ABSTRACT: For many complex diseases, prognosis is of essential importance. It has been shown that, beyond the main effects of genetic (G) and environmental (E) risk factors, the gene-environment (G × E) interactions also play a critical role. In practice, the prognosis outcome data can be contaminated, and most of the existing methods are not robust to data contamination. In the literature, it has been shown that even a single contaminated observation can lead to severely biased model estimation. In this study, we describe prognosis using an accelerated failure time (AFT) model. An exponential squared loss is proposed to accommodate possible data contamination. A penalization approach is adopted for regularized estimation and marker selection. The proposed method is realized using an effective coordinate descent (CD) and minorization-maximization (MM) algorithm. Simulation shows that without contamination, the proposed method has performance comparable to or better than the nonrobust alternative. With contamination, it outperforms the nonrobust alternative and, under certain scenarios, can be superior to the robust method based on quantile regression. The proposed method is applied to the analysis of TCGA (The Cancer Genome Atlas) lung cancer data. It identifies interactions different from those using the alternatives. The identified markers have important implications and satisfactory stability.
  • ABSTRACT: Accounting for gene-environment (G × E) interactions in complex trait association studies can facilitate our understanding of genetic heterogeneity under different environmental exposures, improve the ability to discover susceptibility genes that exhibit little marginal effect, provide insight into the biological mechanisms of complex diseases, help to identify high-risk subgroups in the population, and uncover hidden heritability. However, significant G × E interactions can be difficult to find. The sample sizes required for sufficient power to detect association are much larger than those needed for genetic main effects, and interactions are sensitive to misspecification of the main effects model. These issues are exacerbated when working with binary phenotypes and rare variants, which bear less information on association. In this work, we present a similarity-based regression method for evaluating G × E interactions for rare variants with binary traits. The proposed model aggregates the genetic and G × E information across markers using genetic similarity, thus increasing the ability to detect G × E signals. The model has a random effects interpretation, which leads to robustness against main effect misspecifications when evaluating G × E interactions. We construct score tests to examine G × E interactions and a computationally efficient EM algorithm to estimate the nuisance variance components. Using simulations and data applications, we show that the proposed method is a flexible and powerful tool to study the G × E effect in common or rare variant studies with binary traits. Copyright © 2015, The Genetics Society of America.
    Genetics 01/2015; 199(3). DOI:10.1534/genetics.114.171686 · 4.87 Impact Factor