Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood

The University of Queensland, Queensland Brain Institute, Brisbane, QLD 4072, The University of Queensland Diamantina Institute, Princess Alexandra Hospital, Brisbane, QLD 4102 and Department of Agriculture and Food Systems, University of Melbourne, VIC 3010, Melbourne, Australia.
Bioinformatics (Impact Factor: 4.62). 07/2012; 28(19):2540-2. DOI: 10.1093/bioinformatics/bts474
Source: PubMed

ABSTRACT Genetic correlations are the genome-wide aggregate effects of causal variants affecting multiple traits. Traditionally, genetic correlations between complex traits are estimated from pedigree studies, but such estimates can be confounded by shared environmental factors. Moreover, for diseases, low prevalence rates imply that even if the true genetic correlation between disorders was high, co-aggregation of disorders in families might not occur or could not be distinguished from chance. We have developed and implemented statistical methods based on linear mixed models to obtain unbiased estimates of the genetic correlation between pairs of quantitative traits or pairs of binary traits of complex diseases using population-based case-control studies with genome-wide single-nucleotide polymorphism data. The method is validated in a simulation study and applied to estimate genetic correlation between various diseases from Wellcome Trust Case Control Consortium data in a series of bivariate analyses. We estimate a significant positive genetic correlation between risk of Type 2 diabetes and hypertension of ~0.31 (SE 0.14, P = 0.024).
Our methods, appropriate for both quantitative and binary traits, are implemented in the freely available software GCTA.
Supplementary Information: Supplementary data are available at Bioinformatics online.
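
As an illustration of the quantities named in the abstract, the sketch below computes a genetic correlation from additive genetic (co)variance components using the standard definition r_g = cov_g12 / sqrt(var_g1 * var_g2), and then runs a simple two-sided Wald check on the reported estimate of 0.31 (SE 0.14). The function names and the variance-component values are hypothetical and chosen only for illustration; this is not GCTA's implementation. The abstract does not state which test produced P = 0.024, so the Wald approximation here is an assumption and need not reproduce that value exactly.

```python
import math


def genetic_correlation(var_g1, var_g2, cov_g12):
    """Standard definition: r_g = cov_g12 / sqrt(var_g1 * var_g2)."""
    if var_g1 <= 0 or var_g2 <= 0:
        raise ValueError("genetic variances must be positive")
    return cov_g12 / math.sqrt(var_g1 * var_g2)


def wald_two_sided_p(estimate, se):
    """Two-sided p-value for H0: parameter = 0, normal approximation."""
    z = estimate / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical variance components (not taken from the paper).
r_g = genetic_correlation(var_g1=0.40, var_g2=0.25, cov_g12=0.10)

# Estimate and standard error as reported in the abstract.
p = wald_two_sided_p(estimate=0.31, se=0.14)

print(f"r_g = {r_g:.3f}")
print(f"Wald two-sided p = {p:.3f}")
```

Under the normal approximation the p-value comes out close to, but not identical to, the reported figure, which is consistent with the authors having used a different test statistic.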

  • ABSTRACT: Age-related effects are often included as covariates in the analytical model for genome-wide association analysis of quantitative traits reflecting human health. Nevertheless, previous studies have hardly examined the effects of age on the proportion of variation explained by single nucleotide polymorphisms (PVSNP) in these traits. In this study, the PVSNP estimates of body mass index (BMI), waist-to-hip ratio, pulse pressure, high-density lipoprotein cholesterol level, triglyceride level (TG), low-density lipoprotein cholesterol level, and glucose level were obtained from Korean consortium metadata partitioned by gender or by age. Restricted maximum likelihood estimates of the PVSNP were obtained in a mixed model framework. Previous studies using pedigree data suggested possible differential heritability of certain traits with regard to gender, which we observed in our current study (BMI and TG; P < 0.05). However, the PVSNP analysis based on age revealed that, with respect to every trait tested, individuals aged 40 to 49 exhibited significantly lower PVSNP estimates than individuals aged 50 to 59 or 60 to 69 (P < 0.05). The consistent heterogeneous PVSNP with respect to age may be due to degenerated genetic functions in individuals between the ages of 50 and 69. Our results suggest the genetic mechanism of age- and gender-dependent PVSNP of quantitative traits related to human health should be further examined.
    Journal of the American Aging Association 04/2015; 37(2):9756. DOI:10.1007/s11357-015-9756-2 · 3.45 Impact Factor
  • ABSTRACT: Genetic risk prediction has several potential applications in medical research and clinical practice and could be used, for example, to stratify a heterogeneous population of patients by their predicted genetic risk. However, for polygenic traits, such as psychiatric disorders, the accuracy of risk prediction is low. Here we use a multivariate linear mixed model and apply multi-trait genomic best linear unbiased prediction for genetic risk prediction. This method exploits correlations between disorders and simultaneously evaluates individual risk for each disorder. We show that the multivariate approach significantly increases the prediction accuracy for schizophrenia, bipolar disorder, and major depressive disorder in the discovery as well as in independent validation datasets. By grouping SNPs based on genome annotation and fitting multiple random effects, we show that the prediction accuracy could be further improved. The gain in prediction accuracy of the multivariate approach is equivalent to an increase in sample size of 34% for schizophrenia, 68% for bipolar disorder, and 76% for major depressive disorders using single trait models. Because our approach can be readily applied to any number of GWAS datasets of correlated traits, it is a flexible and powerful tool to maximize prediction accuracy. With current sample size, risk predictors are not useful in a clinical setting but already are a valuable research tool, for example in experimental designs comparing cases with high and low polygenic risk. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
    The American Journal of Human Genetics 01/2015; DOI:10.1016/j.ajhg.2014.12.006 · 10.99 Impact Factor
  • ABSTRACT: Multiple trait association mapping, in which multiple traits are used simultaneously in the identification of genetic variants affecting those traits, has recently attracted interest. One class of approaches for this problem builds on classical variance component methodology, utilizing a multi-trait version of a linear mixed-model. These approaches both increase power and provide insights into the genetic architecture of multiple traits. In particular, it is possible to estimate the genetic correlation which is a measure of the portion of the total correlation between traits that is due to additive genetic effects. Unfortunately, the practical utility of these methods is limited since they are computationally intractable for large sample sizes. In this paper, we introduce a reformulation of the multiple trait association mapping approach by defining the matrix-variate linear mixed model. Our approach reduces the computational time necessary to perform maximum-likelihood inference in a multiple trait model by utilizing a data transformation. By utilizing a well-studied human cohort, we show that our approach provides more than a 10-fold speed up, making multiple trait association feasible in a large population cohort on the genome-wide scale. We take advantage of the efficiency of our approach to analyze gene expression data. By decomposing gene coexpression into a genetic and environmental component, we show that our method provides fundamental insights into the nature of co-expressed genes. An implementation of this method is available at Copyright © 2015, The Genetics Society of America.
    Genetics 02/2015; 200(1). DOI:10.1534/genetics.114.171447 · 4.87 Impact Factor