Beyond Missing Heritability: Prediction of Complex Traits

Department of Biostatistics, University of Alabama at Birmingham, Alabama, United States of America.
PLoS Genetics (Impact Factor: 8.17). 04/2011; 7(4):e1002051. DOI: 10.1371/journal.pgen.1002051
Source: PubMed

ABSTRACT Despite rapid advances in genomic technology, our ability to account for phenotypic variation using genetic information remains limited for many traits. This has unfortunately resulted in limited application of genetic data towards preventive and personalized medicine, one of the primary impetuses of genome-wide association studies. Recently, a large proportion of the "missing heritability" for human height was statistically explained by modeling thousands of single nucleotide polymorphisms concurrently. However, it is currently unclear how gains in explained genetic variance will translate to the prediction of yet-to-be observed phenotypes. Using data from the Framingham Heart Study, we explore the genomic prediction of human height in training and validation samples while varying the statistical approach used, the number of SNPs included in the model, the validation scheme, and the number of subjects used to train the model. In our training datasets, we are able to explain a large proportion of the variation in height (h² up to 0.83, R² up to 0.96). However, the proportion of variance accounted for in validation samples is much smaller (ranging from 0.15 to 0.36 depending on the degree of familial information used in the training dataset). While such R² values vastly exceed what has been previously reported using a reduced number of pre-selected markers (<0.10), given the heritability of the trait (~0.80), substantial room for improvement remains.
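The gap between training and validation R² described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' analysis: it uses simulated, unrelated individuals rather than Framingham data, ridge regression as a stand-in for the whole-genome models compared in the paper, and hypothetical parameter values (sample sizes, number of SNPs, `ridge_lambda`) chosen only for illustration.

```python
import numpy as np


def squared_correlation(y, y_hat):
    """Squared Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y, y_hat)[0, 1] ** 2)


def train_vs_validation_r2(n_train=300, n_valid=200, n_snps=2000,
                           h2=0.8, ridge_lambda=100.0, seed=0):
    """Simulate a polygenic trait and compare in-sample and out-of-sample R².

    All arguments are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    n = n_train + n_valid

    # Genotypes coded 0/1/2, one allele frequency per SNP.
    freqs = rng.uniform(0.1, 0.9, size=n_snps)
    X = rng.binomial(2, freqs, size=(n, n_snps)).astype(float)

    # Every SNP has a small effect; scale genetic values so Var(g) = h2.
    beta = rng.normal(size=n_snps)
    g = X @ beta
    g = (g - g.mean()) / g.std() * np.sqrt(h2)
    y = g + rng.normal(scale=np.sqrt(1.0 - h2), size=n)

    X_train, X_valid = X[:n_train], X[n_train:]
    y_train, y_valid = y[:n_train], y[n_train:]

    # Center markers and phenotype using training-sample means only.
    marker_means = X_train.mean(axis=0)
    Z_train = X_train - marker_means
    Z_valid = X_valid - marker_means
    y_mean = y_train.mean()

    # Ridge regression in its dual form, convenient when n_snps > n_train:
    # beta_hat = Z' (Z Z' + lambda I)^{-1} (y - mean(y))
    K = Z_train @ Z_train.T
    alpha = np.linalg.solve(K + ridge_lambda * np.eye(n_train),
                            y_train - y_mean)
    beta_hat = Z_train.T @ alpha

    r2_train = squared_correlation(y_train, Z_train @ beta_hat + y_mean)
    r2_valid = squared_correlation(y_valid, Z_valid @ beta_hat + y_mean)
    return r2_train, r2_valid


if __name__ == "__main__":
    r2_train, r2_valid = train_vs_validation_r2()
    print(f"training R^2:   {r2_train:.2f}")
    print(f"validation R^2: {r2_valid:.2f}")
```

With many more markers than training subjects, the fitted model reproduces the training phenotypes almost exactly, while prediction in the held-out individuals is far weaker. Because the simulated individuals are unrelated, the validation R² here would be expected to sit below the range the abstract reports for samples that share familial information.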

  • ABSTRACT: The influence of genetic interactions (epistasis) on the genetic variance of quantitative traits is a major unresolved problem relevant to medical, agricultural, and evolutionary genetics. The additive genetic component is typically a high proportion of the total genetic variance in quantitative traits, despite that underlying genes must interact to determine phenotype. This study estimates direct and interaction effects for 11 pairs of Quantitative Trait Loci (QTLs) affecting floral traits within a single population of Mimulus guttatus. With estimates of all 9 genotypes for each QTL pair, we are able to map from QTL effects to variance components as a function of population allele frequencies, and thus predict changes in variance components as allele frequencies change. This mapping requires an analytical framework that properly accounts for bias introduced by estimation errors. We find that even with abundant interactions between QTLs, most of the genetic variance is likely to be additive. However, the strong dependency of allelic average effects on genetic background implies that epistasis is a major determinant of the additive genetic variance, and thus, the population's ability to respond to selection.
    PLoS Genetics 05/2015; 11(5):e1005201. DOI:10.1371/journal.pgen.1005201 · 8.17 Impact Factor
  • ABSTRACT: The ability to predict quantitative trait phenotypes from molecular polymorphism data will revolutionize evolutionary biology, medicine and human biology, and animal and plant breeding. Efforts to map quantitative trait loci have yielded novel insights into the biology of quantitative traits, but the combination of individually significant quantitative trait loci typically has low predictive ability. Utilizing all segregating variants can give good predictive ability in plant and animal breeding populations, but gives little insight into trait biology. Here, we used the Drosophila Genetic Reference Panel to perform both a genome wide association analysis and genomic prediction for the fitness-related trait chill coma recovery time. We found substantial total genetic variation for chill coma recovery time, with a genetic architecture that differs between males and females, a small number of molecular variants with large main effects, and evidence for epistasis. Although the top additive variants explained 36% (17%) of the genetic variance among lines in females (males), the predictive ability using genomic best linear unbiased prediction and a relationship matrix using all common segregating variants was very low for females and zero for males. We hypothesized that the low predictive ability was due to the mismatch between the infinitesimal genetic architecture assumed by the genomic best linear unbiased prediction model and the true genetic architecture of chill coma recovery time. Indeed, we found that the predictive ability of the genomic best linear unbiased prediction model is markedly improved when we combine quantitative trait locus mapping with genomic prediction by only including the top variants associated with main and epistatic effects in the relationship matrix. This trait-associated prediction approach has the advantage that it yields biologically interpretable prediction models.
    PLoS ONE 05/2015; 10(5):e0126880. DOI:10.1371/journal.pone.0126880 · 3.53 Impact Factor
  • ABSTRACT: We explore the prediction of individuals' phenotypes for complex traits using genomic data. We compare several widely used prediction models, including Ridge Regression, LASSO and Elastic Nets estimated from cohort data, and polygenic risk scores constructed using published summary statistics from genome-wide association meta-analyses (GWAMA). We evaluate the interplay between relatedness, trait architecture and optimal marker density, by predicting height, body mass index (BMI) and high-density lipoprotein level (HDL) in two data cohorts, originating from Croatia and Scotland. We empirically demonstrate that dense models are better when all genetic effects are small (height and BMI) and target individuals are related to the training samples, while sparse models predict better in unrelated individuals and when some effects have moderate size (HDL). For HDL sparse models achieved good across-cohort prediction, performing similarly to the GWAMA risk score and to models trained within the same cohort, which indicates that, for predicting traits with moderately-sized effects, large sample sizes and familial structure become less important, though still potentially useful. Finally, we propose a novel ensemble of whole-genome predictors with GWAMA risk scores and demonstrate that the resulting meta-model achieves higher prediction accuracy than either model on its own. We conclude that although current genomic predictors are not accurate enough for diagnostic purposes, performance can be improved without requiring access to large-scale individual-level data. Our methodologically simple meta-model is a means of performing predictive meta-analysis for optimising genomic predictions and can be easily extended to incorporate multiple population-level summary statistics or other domain knowledge. © The Author 2015. Published by Oxford University Press.
    Human Molecular Genetics 04/2015; DOI:10.1093/hmg/ddv145 · 6.68 Impact Factor
