Beyond Missing Heritability: Prediction of Complex Traits

Department of Biostatistics, University of Alabama at Birmingham, Alabama, United States of America.
PLoS Genetics (Impact Factor: 7.53). 04/2011; 7(4):e1002051. DOI: 10.1371/journal.pgen.1002051
Source: PubMed


Despite rapid advances in genomic technology, our ability to account for phenotypic variation using genetic information remains limited for many traits. This has unfortunately resulted in limited application of genetic data towards preventive and personalized medicine, one of the primary impetuses of genome-wide association studies. Recently, a large proportion of the "missing heritability" for human height was statistically explained by modeling thousands of single nucleotide polymorphisms concurrently. However, it is currently unclear how gains in explained genetic variance will translate to the prediction of yet-to-be observed phenotypes. Using data from the Framingham Heart Study, we explore the genomic prediction of human height in training and validation samples while varying the statistical approach used, the number of SNPs included in the model, the validation scheme, and the number of subjects used to train the model. In our training datasets, we are able to explain a large proportion of the variation in height (h² up to 0.83, R² up to 0.96). However, the proportion of variance accounted for in validation samples is much smaller (ranging from 0.15 to 0.36 depending on the degree of familial information used in the training dataset). While such R² values vastly exceed what has been previously reported using a reduced number of pre-selected markers (<0.10), given the heritability of the trait (~0.80), substantial room for improvement remains.
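
The gap between training and validation fit described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' analysis: it assumes simulated genotypes for unrelated individuals, arbitrary sample sizes, and ridge regression as a stand-in for "modeling thousands of SNPs concurrently". All function names and parameter values are hypothetical, and R² is taken here as the squared correlation between observed and predicted phenotypes.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def squared_corr(u, v):
    """Squared Pearson correlation, used here as R^2."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv * suv / (suu * svv)

def train_and_validate(n_train=60, n_val=60, n_snps=200,
                       h2=0.8, lam=1.0, seed=1):
    """Return (training R^2, validation R^2) for a simulated trait.

    Assumptions (illustrative only): every SNP is causal, individuals
    are unrelated, and `lam` is an arbitrary ridge penalty.
    """
    rng = random.Random(seed)
    freqs = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
    effects = [rng.gauss(0.0, 1.0) for _ in range(n_snps)]
    # Expected genetic variance, used to scale noise to the target h2.
    var_g = sum(2 * f * (1 - f) * b * b for f, b in zip(freqs, effects))
    noise_sd = (var_g * (1 - h2) / h2) ** 0.5

    def sample(n):
        # Genotypes coded as 0/1/2 copies of the reference allele.
        geno = [[(rng.random() < f) + (rng.random() < f) for f in freqs]
                for _ in range(n)]
        pheno = [dot(row, effects) + rng.gauss(0.0, noise_sd) for row in geno]
        return geno, pheno

    x_tr, y_tr = sample(n_train)
    x_va, y_va = sample(n_val)

    # Center genotypes and phenotypes using training-sample means only.
    col_mean = [sum(row[j] for row in x_tr) / n_train for j in range(n_snps)]
    y_mean = sum(y_tr) / n_train
    z_tr = [[x - m for x, m in zip(row, col_mean)] for row in x_tr]
    z_va = [[x - m for x, m in zip(row, col_mean)] for row in x_va]
    y_c = [y - y_mean for y in y_tr]

    # Ridge regression in dual form: alpha = (Z Z' + lam I)^-1 y,
    # which only needs an n_train x n_train system when SNPs >> subjects.
    k = [[dot(a, b) for b in z_tr] for a in z_tr]
    for i in range(n_train):
        k[i][i] += lam
    alpha = solve(k, y_c)
    beta = [sum(alpha[i] * z_tr[i][j] for i in range(n_train))
            for j in range(n_snps)]

    fit_tr = [dot(row, beta) for row in z_tr]
    fit_va = [dot(row, beta) for row in z_va]
    return squared_corr(y_tr, fit_tr), squared_corr(y_va, fit_va)
```

Calling `train_and_validate()` returns the two R² values; with more SNPs than training subjects, the training value is expected to sit close to 1 while the validation value stays well below it, which is the qualitative pattern the abstract reports.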

  • Source
    • "This method relies on both the additive relationship matrix between the individuals in the population, which are traditionally obtained from pedigree records, and on phenotypic records of the candidates to selection. Such is the power of BLUP that it is actually not only used in breeding programmes, but also in evolutionary ecology to estimate the strength of selection and evolutionary change (see Hadfield et al., 2010 for a review) and more recently in human genetics for the prediction of complex traits (Makowsky et al., 2011). With the advent of high-throughput genotyping techniques and the development of chips containing thousands of single nucleotide polymorphisms (SNPs) at a reasonable cost, the implementation of genome-wide evaluations (Meuwissen et al., 2001; Goddard and Hayes, 2007) is routinely used in many breeding programs, and conventional BLUP selection based on pedigrees is now migrating to genomic selection. "
    ABSTRACT: Estimated breeding values (EBVs) are traditionally obtained from pedigree information. However, EBVs from high-density genotypes can have higher accuracy than EBVs from pedigree information. At the same time, it has been shown that EBVs from genomic data lead to lower increases in inbreeding compared with traditional selection based on genealogies. Here we compare the performance of BLUP selection based on genealogical coancestry with that of BLUP selection based on three different genome-based coancestry estimates: (1) an estimate based on shared segments of homozygosity, (2) an approach based on SNP-by-SNP count corrected by allelic frequencies, and (3) the identity by state methodology. We evaluate the effect of different population sizes, different numbers of genomic markers, and several heritability values for a quantitative trait. The performance of the different measures of coancestry in BLUP is evaluated in terms of the true breeding values after truncation selection and also in terms of coancestry and diversity maintained. Accordingly, cross-performances were also carried out, that is, how prediction based on genealogical records impacts the three other measures of coancestry and inbreeding, and vice versa. Our results show that the genetic gains are very similar for all four coancestries, but the genomic-based methods are superior to using genealogical coancestries in terms of maintaining diversity measured as observed heterozygosity. Furthermore, the measure of coancestry based on shared segments of the genome seems to provide slightly better results in some scenarios, and the increase in inbreeding and loss in diversity is only slightly larger than for the other genomic selection methods in those scenarios. Our results shed light on genomic selection vs. traditional genealogical-based BLUP and make the case for managing population variability using genomic information to preserve the future success of selection programmes.
    Frontiers in Genetics 04/2015; 6:127. DOI:10.3389/fgene.2015.00127
  • Source
    • "height in humans). Some strategies have been proposed to reduce C_miss: (1) increasing the sample size in order to also detect genes with smaller effects, (2) expanding the studies to non-European samples in human genetics, (3) enlarging the collection of phenotypes to explore gene-gene interactions, (4) changing the structure of the training population, mainly in terms of the relatedness of the included individuals, and (5) moving to the genomic selection approach instead of estimating the marker effect for each SNP individually [13,19,20]. In animal breeding, some results suggest that the Illumina Bovine54K chip array (Illumina Inc., San Diego, CA) does not capture all the additive genetic variation for all dairy traits [21-23], even when using the GS approach, it estimates simultaneously all the SNP effects. "
    ABSTRACT: Background: Genomic selection estimates genetic merit based on dense SNP (single nucleotide polymorphism) genotypes and phenotypes. This requires that SNPs explain a large fraction of the genetic variance. The objectives of this work were: (1) to estimate the fraction of genetic variance explained by dense genome-wide markers using 54 K SNP chip genotyping, and (2) to evaluate the effect of alternative marker-based relationship matrices and corrections for the base population on the fraction of the genetic variance explained by markers. Methods: Two alternative marker-based relationship matrices were estimated using 35 706 SNPs on 1086 dairy bulls. Both pedigree- and marker-based relationship matrices were fitted simultaneously or separately in an animal model to estimate the fraction of variance not explained by the markers, i.e. the fraction explained by the pedigree. The phenotypes considered in the analysis were the deregressed estimated breeding values (dEBV) for milk, fat and protein yield and for somatic cell score (SCS). Results: When dEBV were not sufficiently accurate (50 or 70%), the estimated fraction of the genetic variance explained by the markers was around 65% for yield traits and 45% for SCS. Scaling marker genotypes with locus-specific frequencies of heterozygotes slightly increased the variance explained by markers, compared with scaling with the average frequency of heterozygotes across loci. The estimated fraction of the genetic variance explained by the markers when the two relationship matrices were fitted separately followed the same trends, but the results were underestimated. With less accurate dEBV estimates, the fraction of the genetic variance explained by markers was underestimated, which is probably an artifact due to the dEBV being estimated by a pedigree-based animal model. Conclusions: When using only highly accurate dEBV, the proportion of the genetic variance explained by the Illumina 54 K SNP chip was approximately 80% for Brown Swiss cattle. These results depend on the SNP chip used and the family structure of the population, i.e. more dense SNPs and closer family relationships are expected to result in a higher fraction of the variance explained by the SNPs.
    Genetics Selection Evolution 06/2014; 46(1):36. DOI:10.1186/1297-9686-46-36 · 3.82 Impact Factor
  • Source
    • "Depending on the genetic architecture of the complex trait under study, additive genetic variability explained by SNPs may be related to the allele frequency spectrum of causal variants (Manolio 2009), density of markers (Makowsky et al. 2011) and contribution of different segments of the genome, such as different chromosomes (Lee et al. 2013). "
    ABSTRACT: The aim of this study was to separate marked additive genetic variability for three quantitative traits in chickens into components associated with classes of minor allele frequency (MAF), individual chromosomes and marker density using the genomewide complex trait analysis (GCTA) approach. Data were from 1351 chickens measured for body weight (BW), ultrasound of breast muscle (BM) and hen house egg production (HHP), each bird with 354 364 SNP genotypes. Estimates of variance components show that SNPs on commercially available genotyping chips marked a large amount of genetic variability for all three traits. The estimated proportion of total variation tagged by all autosomal SNPs was 0.30 (SE 0.04) for BW, 0.33 (SE 0.04) for BM, and 0.19 (SE 0.05) for HHP. We found that a substantial proportion of this variation was explained by low frequency variants (MAF <0.20) for BW and BM, and variants with MAF 0.10–0.30 for HHP. The marked genetic variance explained by each chromosome was linearly related to its length (R² = 0.60) for BW and BM. However, for HHP, there was no linear relationship between estimates of variance and length of the chromosome (R² = 0.01). Our results suggest that the contribution of SNPs to marked additive genetic variability is dependent on the allele frequency spectrum. For the sample of birds analysed, it was found that increasing marker density beyond 100K SNPs did not capture additional additive genetic variance.
    Journal of Animal Breeding and Genetics 06/2014; DOI:10.1111/jbg.12079 · 1.57 Impact Factor
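
The Genetics Selection Evolution abstract above contrasts two ways of scaling marker genotypes when building a marker-based relationship matrix: by the average heterozygote frequency across loci, or by locus-specific heterozygote frequencies. A minimal sketch of one common reading of those two scalings follows. The exact formulas are an assumption, since the abstract does not give them, and the function names are hypothetical. Genotypes are assumed to be coded as 0/1/2 allele counts, with allele frequencies estimated from the sample itself.

```python
def allele_freqs(geno):
    """Sample allele frequency per SNP from 0/1/2 genotype counts."""
    n = len(geno)
    return [sum(row[j] for row in geno) / (2.0 * n)
            for j in range(len(geno[0]))]

def relationship_matrices(geno):
    """Return (g_avg, g_loc), two marker-based relationship matrices.

    g_avg: Z Z' divided by the sum over loci of 2pq, i.e. scaled by the
           average heterozygote frequency across loci.
    g_loc: each locus contributes z_i * z_k / (2pq) for that locus, and
           the contributions are averaged over loci.
    Both formulas are assumptions made for illustration.
    """
    p = allele_freqs(geno)
    het = [2.0 * f * (1.0 - f) for f in p]
    # Drop monomorphic SNPs, whose expected heterozygosity is zero.
    keep = [j for j, h in enumerate(het) if h > 0.0]
    het = [het[j] for j in keep]
    # Center each genotype by its expected value 2p.
    z = [[row[j] - 2.0 * p[j] for j in keep] for row in geno]

    n, m = len(geno), len(keep)
    total_het = sum(het)
    g_avg = [[sum(a * b for a, b in zip(z[i], z[k])) / total_het
              for k in range(n)] for i in range(n)]
    g_loc = [[sum(a * b / h for a, b, h in zip(z[i], z[k], het)) / m
              for k in range(n)] for i in range(n)]
    return g_avg, g_loc
```

When every SNP has the same allele frequency the two matrices coincide; they diverge as the frequency spectrum widens, because the locus-specific scaling gives more weight to SNPs with rare alleles.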