Moser G, Tier B, Crump RE, Khatkar MS, Raadsma HW. A comparison of five methods to predict genomic breeding values of dairy bulls from genome-wide SNP markers. Genet Sel Evol 41: 56

The CRC for Innovative Dairy Products, Australia.
Genetics Selection Evolution (Impact Factor: 3.82). 12/2009; 41(1):56. DOI: 10.1186/1297-9686-41-56
Source: PubMed


Genomic selection (GS) uses molecular breeding values (MBV) derived from dense markers across the entire genome for selection of young animals. The accuracy of MBV prediction is important for a successful application of GS. Recently, several methods have been proposed to estimate MBV. Initial simulation studies have shown that these methods can accurately predict MBV. In this study we compared the accuracies and possible bias of five different regression methods in an empirical application in dairy cattle.
Genotypes of 7,372 SNP and highly accurate EBV of 1,945 dairy bulls were used to predict MBV for protein percentage (PPT) and a profit index (Australian Selection Index, ASI). Marker effects were estimated by least squares regression (FR-LS), Bayesian regression (Bayes-R), random regression best linear unbiased prediction (RR-BLUP), partial least squares regression (PLSR) and nonparametric support vector regression (SVR) in a training set of 1,239 bulls. Accuracy and bias of MBV prediction were calculated from cross-validation of the training set and tested against a test team of 706 young bulls.
For both traits, FR-LS using a subset of SNP was significantly less accurate than all other methods, which used all SNP. Accuracies obtained by Bayes-R, RR-BLUP, PLSR and SVR were very similar for ASI (0.39-0.45) and for PPT (0.55-0.61). Overall, SVR gave the highest accuracy. All methods resulted in biased MBV predictions for ASI; for PPT, only RR-BLUP and SVR predictions were unbiased. A significant decrease in accuracy of prediction of ASI was seen in young test cohorts of bulls compared to the accuracy derived from cross-validation of the training set. This reduction was not apparent for PPT. Combining MBV predictions with pedigree-based predictions gave accuracies 1.05-1.34 times higher than predictions based on pedigree alone. Computational requirements differed considerably among methods, with PLSR and RR-BLUP requiring the least computing time.
The four methods that use information from all SNP, namely RR-BLUP, Bayes-R, PLSR and SVR, generate similar accuracies of MBV prediction for genomic selection, and their use in the selection of immediate future generations in dairy cattle will be comparable. The use of FR-LS in genomic selection is not recommended.
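As a rough illustration of the RR-BLUP idea described in the abstract (all SNP effects estimated jointly under shrinkage, with accuracy measured as the correlation between predicted and observed values in animals held out of training), the following is a minimal sketch in Python. The simulated genotypes, the shrinkage value `lam`, the data sizes and all function names are assumptions made for illustration; this is not the authors' implementation or data.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ridge(genotypes, ebv, lam):
    """Ridge estimates of marker effects: solve (Z'Z + lam*I) b = Z'(y - mean)."""
    n, p = len(genotypes), len(genotypes[0])
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    z = [[row[j] - col_means[j] for j in range(p)] for row in genotypes]
    y_mean = sum(ebv) / n
    yc = [v - y_mean for v in ebv]
    ztz = [[sum(z[i][j] * z[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    zty = [sum(z[i][j] * yc[i] for i in range(n)) for j in range(p)]
    return solve(ztz, zty), col_means, y_mean


def predict(genotypes, effects, col_means, y_mean):
    """Predicted value = mean + sum of centred genotype times marker effect."""
    return [y_mean + sum((g - m) * b for g, m, b in zip(row, col_means, effects))
            for row in genotypes]


def pearson(x, y):
    """Pearson correlation, used here as the accuracy of prediction."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def demo(seed=1, n_train=80, n_test=40, n_snp=30, lam=1.0):
    """Simulate toy data (hypothetical sizes), train on one set, test on another."""
    rng = random.Random(seed)
    true_effects = [rng.gauss(0.0, 1.0) for _ in range(n_snp)]
    genotypes = [[rng.choice((0, 1, 2)) for _ in range(n_snp)]
                 for _ in range(n_train + n_test)]
    ebv = [sum(g * b for g, b in zip(row, true_effects)) + rng.gauss(0.0, 1.0)
           for row in genotypes]
    effects, col_means, y_mean = fit_ridge(genotypes[:n_train], ebv[:n_train], lam)
    mbv = predict(genotypes[n_train:], effects, col_means, y_mean)
    return pearson(mbv, ebv[n_train:])


print("accuracy in held-out animals:", round(demo(), 3))
```

In practice the shrinkage parameter would be chosen by cross-validation or derived from variance components rather than fixed, and a numerical library would replace the hand-written solver; the plain-Python version is used here only to keep the sketch dependency-free.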

    • "In BayesB, SNPs are divided into two groups, one containing major SNPs that exert large effects and the other including SNPs that exert minor effects, and these SNP effects are fitted using two distinct distributions; however, in GBLUP, all SNP effects are assumed to be distributed normally, and the effects are fitted using a single distribution. Previous research has indicated that the accuracy of GBLUP is similar to that of BayesB (Hayes et al. 2009; Heslot et al. 2012; Le Roy et al. 2012; Luan et al. 2009; Moser et al. 2009; Pszczola et al. 2011). GBLUP is an extension of BLUP under the infinitesimal hypothesis, but considerable quantitative trait locus (QTL) mapping experience and breeding practice have validated the existence of major genes. "
    ABSTRACT: Genetic prediction of quantitative traits is a critical task in plant and animal breeding. Genomic selection is an accurate and efficient method of estimating genetic merit by using high-density genome-wide single nucleotide polymorphisms (SNPs). In the framework of linear mixed models, we extended genomic best linear unbiased prediction (GBLUP) by including additional quantitative trait locus (QTL) information that was extracted from high-throughput SNPs by using the least absolute shrinkage and selection operator (LASSO). GBLUP was combined with three LASSO methods, namely standard LASSO (SLGBLUP), adaptive LASSO (ALGBLUP), and elastic net (ENGBLUP), which were used to detect QTLs. These QTLs were fitted as fixed effects, and the remaining SNPs were fitted using a realized genetic relationship matrix. Simulations performed under distinct scenarios revealed that (1) the prediction accuracy of SLGBLUP was the lowest; (2) the prediction accuracies of ALGBLUP and ENGBLUP were equivalent to or higher than that of GBLUP, except under scenarios in which the number of QTLs was large; and (3) the persistence of prediction accuracy over generations was strongest in the case of ENGBLUP. Building on the favorable computational characteristics of GBLUP, ENGBLUP enables robust modeling and efficient computation to be performed for genomic selection.
    Genetica 02/2015; 143(3). DOI:10.1007/s10709-015-9826-5 · 1.40 Impact Factor
    • "This trade-off is encoded in smoothing parameters (λ) that can be estimated by cross-validation or by Bayesian methods as in any other machine learning approach. Moser et al. (2009) found a similar predictive ability of SVM, Bayes R and ridge regression BLUP. Long et al. (2011a, 2011b) used SVM to predict sire estimated breeding values in dairy cattle with a Gaussian radial basis function and a linear kernel, with similar predictive ability as the Bayesian Lasso. "
    ABSTRACT: Genome-wide prediction of complex traits has become increasingly important in animal and plant breeding, and is receiving increasing attention in human genetics. Most common approaches are whole-genome regression models where phenotypes are regressed on thousands of markers concurrently, applying different prior distributions to marker effects. While use of shrinkage or regularization in SNP regression models has delivered improvements in predictive ability in genome-based evaluations, serious over-fitting problems may be encountered as the ratio between markers and available phenotypes continues to increase. Machine learning is an alternative approach for prediction and classification, capable of dealing with the dimensionality problem in a computationally flexible manner. In this article we provide an overview of non-parametric and machine learning methods used in genome-wide prediction, and discuss their similarities as well as their relationship to some well-known parametric approaches. Although the most suitable method is usually case dependent, we suggest use of support vector machines and random forests for classification problems, whereas Reproducing Kernel Hilbert Spaces regression and boosting may be better suited to regression problems, with the former having the more consistently higher predictive ability. Neural Networks may suffer from over-fitting and may be too computationally demanding when the number of neurons is large. We further discuss the metrics used to evaluate predictive ability in model comparison under cross-validation from a genomic selection point of view. We suggest use of predictive mean squared error as the main, but not the only, metric for model comparison. Visual tools may greatly assist in the choice of the most accurate model.
    Livestock Science 08/2014; 166(1). DOI:10.1016/j.livsci.2014.05.036 · 1.17 Impact Factor
    • "This advantage has not been confirmed in many previous studies that compared different methods using real data. In studies using real data, GBLUP performed comparably or better than variable selection methods [4,7,10,11], although there is evidence that substantially higher accuracies can be achieved using variable selection methods for traits that are known to be affected by genes of moderate-to-large effects (e.g. traits affected by DGAT1, [6,11]). "
    ABSTRACT: Nellore cattle play an important role in beef production in tropical systems and there is great interest in determining if genomic selection can contribute to accelerate genetic improvement of production and fertility in this breed. We present the first results of the implementation of genomic prediction in a Bos indicus (Nellore) population. Influential bulls were genotyped with the Illumina Bovine HD chip in order to assess genomic predictive ability for weight and carcass traits, gestation length, scrotal circumference and two selection indices. 685 samples and 320 238 single nucleotide polymorphisms (SNPs) were used in the analyses. A forward-prediction scheme was adopted to predict the genomic breeding values (DGV). In the training step, the estimated breeding values (EBV) of bulls were deregressed (dEBV) and used as pseudo-phenotypes to estimate marker effects using four methods: genomic BLUP with or without a residual polygenic effect (GBLUP20 and GBLUP0, respectively), a mixture model (Bayes C) and Bayesian LASSO (BLASSO). Empirical accuracies of the resulting genomic predictions were assessed based on the correlation between DGV and dEBV for the testing group. Accuracies of genomic predictions ranged from 0.17 (navel at weaning) to 0.74 (finishing precocity). Across traits, Bayesian regression models (Bayes C and BLASSO) were more accurate than GBLUP. The average empirical accuracies were 0.39 (GBLUP0), 0.40 (GBLUP20) and 0.44 (Bayes C and BLASSO). Bayes C and BLASSO tended to produce deflated predictions (i.e. slope of the regression of dEBV on DGV greater than 1). Further analyses suggested that higher-than-expected accuracies were observed for traits for which EBV means differed significantly between two breeding subgroups that were identified in a principal component analysis based on genomic relationships. 
Bayesian regression models are of interest for future applications of genomic selection in this population, but further improvements are needed to reduce deflation of their predictions. Recurrent updates of the training population would be required to enable accurate prediction of the genetic merit of young animals. The technical feasibility of applying genomic prediction in a Bos indicus (Nellore) population was demonstrated. Further research is needed to permit cost-effective selection decisions using genomic information.
    Genetics Selection Evolution 02/2014; 46(1):17. DOI:10.1186/1297-9686-46-17 · 3.82 Impact Factor
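The bias check mentioned in the abstracts above (the slope of the regression of the observed value, such as dEBV, on the genomic prediction, such as DGV) can be sketched as follows. The toy numbers and the function name `regression_slope` are assumptions for illustration only. A slope near 1 indicates predictions on the right scale, and a slope greater than 1 corresponds to what the Nellore abstract calls deflated predictions.

```python
def regression_slope(observed, predicted):
    """OLS slope of observed on predicted: cov(observed, predicted) / var(predicted)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var = sum((p - mean_pred) ** 2 for p in predicted)
    return cov / var


# Hypothetical deregressed EBV for five animals.
debv = [1.0, 2.0, 3.0, 4.0, 5.0]

# Predictions shrunk to half the true spread: slope of dEBV on DGV is above 1.
deflated_dgv = [0.5 * v for v in debv]
# Predictions with twice the true spread: slope is below 1.
inflated_dgv = [2.0 * v for v in debv]

print(regression_slope(debv, debv))          # well-scaled predictions
print(regression_slope(debv, deflated_dgv))  # deflated predictions
print(regression_slope(debv, inflated_dgv))  # inflated predictions
```

Note that the slope measures scale only: predictions can be perfectly correlated with the observed values, as in this toy example, and still be biased in this sense.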