Increased accuracy of selection by using the realised relationship matrix

Biosciences Research Division, Department of Primary Industries Victoria, 1 Park Drive, Bundoora 3083, Australia.
Genetics Research (Impact Factor: 1.47). 03/2009; 91(1):47-60. DOI: 10.1017/S0016672308009981
Source: PubMed


Dense marker genotypes allow the construction of the realized relationship matrix between individuals, with elements the realized proportion of the genome that is identical by descent (IBD) between pairs of individuals. In this paper, we demonstrate that by replacing the average relationship matrix derived from pedigree with the realized relationship matrix in best linear unbiased prediction (BLUP) of breeding values, the accuracy of the breeding values can be substantially increased, especially for individuals with no phenotype of their own. We further demonstrate that this method of predicting breeding values is exactly equivalent to the genomic selection methodology where the effects of quantitative trait loci (QTLs) contributing to variation in the trait are assumed to be normally distributed. The accuracy of breeding values predicted using the realized relationship matrix in the BLUP equations can be deterministically predicted for known family relationships, for example half sibs. The deterministic method uses the effective number of independently segregating loci controlling the phenotype that depends on the type of family relationship and the length of the genome. The accuracy of predicted breeding values depends on this number of effective loci, the family relationship and the number of phenotypic records. The deterministic prediction demonstrates that the accuracy of breeding values can approach unity if enough relatives are genotyped and phenotyped. For example, when 1000 full sibs per family were genotyped and phenotyped, and the heritability of the trait was 0.5, the reliability of predicted genomic breeding values (GEBVs) for individuals in the same full sib family without phenotypes was 0.82. These results were verified by simulation. A deterministic prediction was also derived for random mating populations, where the effective population size is the key parameter determining the effective number of independently segregating loci. 
If the effective population size is large, a very large number of individuals must be genotyped and phenotyped in order to accurately predict breeding values for unphenotyped individuals from the same population. If the heritability of the trait is 0.3 and N_e = 100, approximately 12474 individuals with genotypes and phenotypes are required in order to predict GEBVs of unphenotyped individuals in the same population with an accuracy of 0.7 [corrected].
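
The equivalence stated in the abstract, between BLUP with the realized relationship matrix and a marker-effect model with normally distributed effects, can be checked numerically. The following is a minimal sketch under stated assumptions, not the authors' implementation: the relationship matrix is taken as G = ZZ'/m with Z the column-centred genotype matrix (one common scaling; the abstract does not specify one), there are no fixed effects, the variance ratio lam = sigma_e^2 / sigma_g^2 is treated as known, and the genotypes and phenotypes are simulated. All function names are hypothetical.

```python
import numpy as np


def centred_markers(genotypes):
    """Column-centre a (individuals x markers) matrix of 0/1/2 genotype codes."""
    return genotypes - genotypes.mean(axis=0)


def realized_relationship_matrix(z):
    """Marker-based relationship matrix G = Z Z' / m (assumed scaling)."""
    return z @ z.T / z.shape[1]


def gblup(g_matrix, y_train, train_idx, lam):
    """BLUP of breeding values for all individuals using G.

    g_hat = G[all, train] (G[train, train] + lam * I)^-1 y_train
    """
    g_tt = g_matrix[np.ix_(train_idx, train_idx)]
    g_at = g_matrix[:, train_idx]
    weights = np.linalg.solve(g_tt + lam * np.eye(len(train_idx)), y_train)
    return g_at @ weights


def snp_blup(z, y_train, train_idx, lam):
    """Ridge regression on markers, then sum marker effects per individual.

    With G = Z Z' / m, the matching ridge penalty is lam * m.
    """
    m = z.shape[1]
    z_t = z[train_idx]
    u_hat = np.linalg.solve(z_t.T @ z_t + lam * m * np.eye(m), z_t.T @ y_train)
    return z @ u_hat


def demo(seed=0, n=60, n_train=40, m=200, h2=0.5):
    """Simulate data and return breeding values predicted by both routes."""
    rng = np.random.default_rng(seed)
    genotypes = rng.binomial(2, 0.5, size=(n, m)).astype(float)
    z = centred_markers(genotypes)

    # Simulated true breeding values, rescaled to variance h2.
    g_true = z @ rng.normal(size=m)
    g_true *= np.sqrt(h2) / g_true.std()
    y = g_true + rng.normal(scale=np.sqrt(1.0 - h2), size=n)

    train_idx = np.arange(n_train)  # the remaining individuals have no phenotype
    lam = (1.0 - h2) / h2
    g_matrix = realized_relationship_matrix(z)
    return (
        gblup(g_matrix, y[train_idx], train_idx, lam),
        snp_blup(z, y[train_idx], train_idx, lam),
    )


if __name__ == "__main__":
    from_g, from_markers = demo()
    print(np.allclose(from_g, from_markers))
```

Both routes return predictions for every individual, including those without a phenotype, and they agree up to floating-point error. The relationship-matrix form solves a system whose size is the number of phenotyped individuals, whereas the marker-effect form solves one whose size is the number of markers, so which is cheaper depends on the data set.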

  • Source
    • "This scenario may be rather specific for canola, in which modern, adapted breeding pools have a particularly narrow genetic basis due to conscious selection for specific traits[31,32,33]. The situation is very different in maize or cattle, for example, where genetic differentiation among subpopulations or races is highly pronounced and population differences in gene and allele content are therefore often decisive[22,68,69]. We conclude that adjustment of prediction models on a case-by-case basis in canola can potentially give small improvement in prediction of specific traits depending on the variance within a given breeding population. "
    ABSTRACT: Genomic selection (GS) is a modern breeding approach where genome-wide single-nucleotide polymorphism (SNP) marker profiles are simultaneously used to estimate performance of untested genotypes. In this study, the potential of genomic selection methods to predict testcross performance for hybrid canola breeding was applied for various agronomic traits based on genome-wide marker profiles. A total of 475 genetically diverse spring-type canola pollinator lines were genotyped at 24,403 single-copy, genome-wide SNP loci. In parallel, the 950 F1 testcross combinations between the pollinators and two representative testers were evaluated for a number of important agronomic traits including seedling emergence, days to flowering, lodging, oil yield and seed yield along with essential seed quality characters including seed oil content and seed glucosinolate content. A ridge-regression best linear unbiased prediction (RR-BLUP) model was applied in combination with 500 cross-validations for each trait to predict testcross performance, both across the whole population as well as within individual subpopulations or clusters, based solely on SNP profiles. Subpopulations were determined using multidimensional scaling and K-means clustering. Genomic prediction accuracy across the whole population was highest for seed oil content (0.81) followed by oil yield (0.75) and lowest for seedling emergence (0.29). For seed yield, seed glucosinolate, lodging resistance and days to onset of flowering (DTF), prediction accuracies were 0.45, 0.61, 0.39 and 0.56, respectively. Prediction accuracies could be increased for some traits by treating subpopulations separately; a strategy which only led to moderate improvements for some traits with low heritability, like seedling emergence. No useful or consistent increase in accuracy was obtained by inclusion of a population substructure covariate in the model. 
Testcross performance prediction using genome-wide SNP markers shows considerable potential for pre-selection of promising hybrid combinations prior to resource-intensive field testing over multiple locations and years.
    Full-text · Article · Jan 2016 · PLoS ONE
  • Source
    • "Therefore, genomic fingerprinting data permit the accurate estimation of the realized relationships among any set of individuals, irrespective of their genealogy, to construct the realized genomic relationship matrix (G-matrix) which can be used to substitute the A-matrix (VanRaden 2008). This advancement represents a clear quantitative genetics watershed as the [...] Thomas et al. 2002; Frentiu et al. 2008; Hayes et al. 2009; El-Kassaby et al. 2012; Gay et al. 2013; Porth et al. 2013; Zapata-Valenzuela et al. 2013; Klápště et al. 2014; Muñoz et al. 2014). "
    ABSTRACT: The open-pollinated (OP) family testing combines the simplest known progeny evaluation and quantitative genetics analyses as candidates' offspring are assumed to represent independent half-sib families. The accuracy of genetic parameter estimates is often questioned as the assumption of "half-sibling" in OP families may often be violated. We compared the pedigree- versus marker-based genetic models by analyzing 22-year height and 30-year wood density for 214 white spruce (Picea glauca (Moench) Voss) OP families represented by 1,694 individuals growing on one site in Quebec, Canada. Assuming half-sibling, the pedigree-based model was limited to estimating the additive genetic variances which, in turn, were grossly overestimated as they were confounded by very minor dominance and major additive-by-additive epistatic genetic variances. In contrast, the implemented genomic pairwise realized relationship models allowed the disentanglement of additive from all non-additive factors through genetic variance decomposition. The marker-based models produced more realistic narrow-sense heritability estimates and, for the first time, allowed estimating the dominance and epistatic genetic variances from OP testing. In addition, the genomic models showed better prediction accuracies compared to pedigree models and were able to predict individual breeding values for new individuals from untested families, which was not possible using the pedigree-based model. Clearly, the use of the marker-based relationship approach is effective in estimating the quantitative genetic parameters of complex traits even under simple and shallow pedigree structure.
    Full-text · Article · Jan 2016 · G3-Genes Genomes Genetics
  • Source
    • "However, in genomic selection models it is not clear how to account for selection (Aguilar et al., 2010; Chen et al., 2011). Furthermore, it must be assumed that the genotyped animals come from an unselected population (Hayes et al., 2009), something that is often not true in practice. "
    ABSTRACT: The objective of this study was to compare the multi-trait model using pedigree information and a model using genomic information in addition to pedigree information. We used data from 5896 lactations of 2021 buffalo cows, of which 384 were genotyped using the Illumina Infinium® bovine HD BeadChip, considering seven traits related to milk yield (MY305), fat (FY305), protein (PY305), and lactose (LY305), percentages of fat (%F) and protein (%P), and somatic cell score (SCS). We carried out two analyses, one using phenotype and pedigree information (matrix A) and the other using the relationship matrix based on pedigree and genomics information (a single step, matrix H). The (co)variance components were estimated using multiple-trait analysis by the Bayesian inference method. The model included the fixed effects of contemporary groups (herd-year and calving season), and the age of cow at calving as (co)variables (quadratic and linear effect). The additive genetic, permanent environmental, and residual effects were included as random effects in the model. The estimates of heritability using matrix A were 0.25, 0.22, 0.26, 0.25, 0.37, 0.42, and 0.17, while using matrix H the heritability values were 0.25, 0.24, 0.26, 0.26, 0.38, 0.47, and 0.18 for MY305, FY305, PY305, LY305, %F, %P, and SCS, respectively. The estimates of breeding values in the two analyses were similar for the traits studied, but the accuracies were greater when using matrix H (higher than 8% in the traits studied). Therefore, the use of genomic information in the analyses improved the accuracy.
    Full-text · Article · Dec 2015 · Genetics and molecular research: GMR