A comparison of five methods to predict genomic breeding values of dairy bulls from genome-wide SNP markers.

The CRC for Innovative Dairy Products, Australia.
Genetics Selection Evolution (Impact Factor: 3.75). 01/2009; 41:56. DOI: 10.1186/1297-9686-41-56
Source: PubMed

ABSTRACT Genomic selection (GS) uses molecular breeding values (MBV) derived from dense markers across the entire genome for selection of young animals. The accuracy of MBV prediction is important for a successful application of GS. Recently, several methods have been proposed to estimate MBV. Initial simulation studies have shown that these methods can accurately predict MBV. In this study we compared the accuracies and possible bias of five different regression methods in an empirical application in dairy cattle.
Genotypes of 7,372 SNP and highly accurate EBV of 1,945 dairy bulls were used to predict MBV for protein percentage (PPT) and a profit index (Australian Selection Index, ASI). Marker effects were estimated by least squares regression (FR-LS), Bayesian regression (Bayes-R), random regression best linear unbiased prediction (RR-BLUP), partial least squares regression (PLSR) and nonparametric support vector regression (SVR) in a training set of 1,239 bulls. Accuracy and bias of MBV prediction were calculated from cross-validation of the training set and tested against a test team of 706 young bulls.
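As a rough illustration of the RR-BLUP step described above, the following minimal sketch treats RR-BLUP as ridge regression of EBV on centred SNP genotype codes, fitted in a training set and applied to a held-out test set. It is not the authors' implementation: the toy data, the function names (`solve`, `rr_blup`, `predict_mbv`) and the shrinkage value `lam` are assumptions made purely for illustration. In RR-BLUP, `lam` is usually taken as the ratio of residual variance to marker-effect variance.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def rr_blup(genotypes, ebv, lam):
    """Ridge estimates of marker effects u from (Z'Z + lam*I) u = Z'(y - mean)."""
    n, p = len(genotypes), len(genotypes[0])
    mu = sum(ebv) / n
    col_means = [sum(g[j] for g in genotypes) / n for j in range(p)]
    z = [[g[j] - col_means[j] for j in range(p)] for g in genotypes]
    ztz = [[sum(z[i][j] * z[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    zty = [sum(z[i][j] * (ebv[i] - mu) for i in range(n)) for j in range(p)]
    return mu, col_means, solve(ztz, zty)

def predict_mbv(genotype, mu, col_means, effects):
    """MBV = mean + sum of centred genotype codes times marker effects."""
    return mu + sum((g - c) * u for g, c, u in zip(genotype, col_means, effects))

# Hypothetical toy data: 60 bulls, 8 SNPs coded 0/1/2, simulated EBV.
rng = random.Random(1)
true_effects = [0.5, -0.3, 0.0, 0.8, 0.0, -0.6, 0.2, 0.0]
bulls = [[rng.choice((0, 1, 2)) for _ in true_effects] for _ in range(60)]
ebv = [sum(g * u for g, u in zip(b, true_effects)) + rng.gauss(0, 0.3)
       for b in bulls]

# Estimate marker effects in a training set, then predict the test bulls.
train_x, test_x = bulls[:40], bulls[40:]
train_y, test_y = ebv[:40], ebv[40:]
mu, col_means, effects = rr_blup(train_x, train_y, lam=1.0)
test_mbv = [predict_mbv(g, mu, col_means, effects) for g in test_x]
```

With thousands of SNPs, as in the study, the marker-by-marker system built here would be too large for this naive solver; the sketch is meant only to show the structure of the calculation.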
For both traits, FR-LS using a subset of SNP was significantly less accurate than all other methods, which used all SNP. Accuracies obtained by Bayes-R, RR-BLUP, PLSR and SVR were very similar for ASI (0.39-0.45) and for PPT (0.55-0.61). Overall, SVR gave the highest accuracy. All methods resulted in biased MBV predictions for ASI; for PPT, only RR-BLUP and SVR predictions were unbiased. A significant decrease in accuracy of prediction of ASI was seen in young test cohorts of bulls compared to the accuracy derived from cross-validation of the training set. This reduction was not apparent for PPT. Combining MBV predictions with pedigree-based predictions gave accuracies 1.05 to 1.34 times higher than predictions based on pedigree alone. The methods differ greatly in their computational requirements, with PLSR and RR-BLUP requiring the least computing time.
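The abstract does not spell out how accuracy and bias were computed. A minimal sketch, assuming a convention common in genomic selection studies: accuracy is the Pearson correlation between MBV and EBV, and bias is judged from the slope of the regression of EBV on MBV, where a slope of 1 indicates unbiased predictions. The function names and the toy numbers are hypothetical.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    """Sample covariance of two equally long sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def accuracy(mbv, ebv):
    """Pearson correlation between predicted MBV and EBV."""
    return covariance(mbv, ebv) / sqrt(covariance(mbv, mbv) * covariance(ebv, ebv))

def bias_slope(mbv, ebv):
    """Slope of the regression of EBV on MBV (1.0 means unbiased)."""
    return covariance(mbv, ebv) / covariance(mbv, mbv)

# Hypothetical toy values: EBV is exactly 0.5 * MBV + 1, so the ranking is
# perfect but the MBV are spread twice as widely as the EBV.
toy_mbv = [1.0, 2.0, 3.0, 4.0]
toy_ebv = [1.5, 2.0, 2.5, 3.0]

toy_accuracy = accuracy(toy_mbv, toy_ebv)
toy_slope = bias_slope(toy_mbv, toy_ebv)
```

The toy case shows why both statistics are reported: a method can rank animals well (high correlation) while still over- or under-dispersing its predictions (slope away from 1).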
The four methods that use information from all SNP, namely RR-BLUP, Bayes-R, PLSR and SVR, generate similar accuracies of MBV prediction for genomic selection, and their use in the selection of immediate future generations in dairy cattle will be comparable. The use of FR-LS in genomic selection is not recommended.

  • ABSTRACT: This paper proposes a new methodology for simultaneously selecting the most relevant SNP markers to characterize any measurable phenotype described by a continuous variable, using Support Vector Regression with the Pearson Universal kernel (PUK) as the fitness function of a binary genetic algorithm. The methodology is multi-attribute, in that it considers several markers simultaneously to explain the phenotype, and it draws jointly on statistical tools, machine learning and computational intelligence. The method showed potential on simulated database 1, which has additive effects only, and on a real database. In simulated database 1, which has 1,000 markers, of which 7 have a major effect on the phenotype and the other 993 represent noise, the method identified 21 markers: 5 of the 7 relevant SNPs and 16 false positives. In the real database, the initial 50,752 SNPs were reduced to 3,073 markers, which increased the accuracy of the model. In simulated database 2, with additive effects and interactions (epistasis), the proposed method matched the methodology most commonly used in GWAS. The method is effective in explaining the real phenotype (PTA for milk): applying the wrapper based on the genetic algorithm and Support Vector Regression with the PUK eliminated many redundant markers, which increased the prediction accuracy of the model on the real database without quality control filters. The PUK was also shown to replicate the performance of linear and RBF kernels.
    BMC Genomics 10/2014; 15(Suppl 7):S4. DOI:10.1186/1471-2164-15-S7-S4 · 4.04 Impact Factor
  • ABSTRACT: The aim of this study was to compare the accuracy of genomic selection (i.e., selection based on genome-wide markers) to phenotypic selection through simulations based on real barley (Hordeum vulgare L.) single nucleotide polymorphism (SNP) data (1325 SNPs by 863 breeding lines). We simulated 100 quantitative trait loci (QTL) at randomly picked SNPs, which were dropped from the marker data. The total heritability across all QTL was set to 0.1, 0.2, 0.4, or 0.6. We generated 100 datasets for each simulation condition. Each dataset was then separated into training (N = 200, 400, or 600) and validation sets. Bayesian methods, multivariate regression methods (partial least squares and ridge regression), and machine learning methods (random forest and support vector machine) were used for building prediction models. The prediction accuracy was high for the Bayesian methods and ridge regression. Under medium and high heritability (h² = 0.4 and 0.6), the mean of predictions from all methods was more accurate than predictions based on any single method, suggesting that different methods captured different aspects of genotype-phenotype associations. The advantage of genomic over phenotypic selection was larger under lower heritability and with a larger training dataset. The difference in prediction accuracy between polygenic and oligogenic traits was small. The models were also useful in increasing the accuracy of predictions on breeding lines with phenotypic records. The results indicate that genomic selection can be efficiently used in barley breeding programs.
    Crop Science 09/2011; 51(5):1915. DOI:10.2135/cropsci2010.12.0732 · 1.48 Impact Factor
  • ABSTRACT: Genetic prediction of quantitative traits is a critical task in plant and animal breeding. Genomic selection is an accurate and efficient method of estimating genetic merits by using high-density genome-wide single nucleotide polymorphisms (SNP). In the framework of linear mixed models, we extended genomic best linear unbiased prediction (GBLUP) by including additional quantitative trait locus (QTL) information that was extracted from high-throughput SNPs by using the least absolute shrinkage and selection operator (LASSO). GBLUP was combined with three LASSO methods, namely standard LASSO (SLGBLUP), adaptive LASSO (ALGBLUP), and elastic net (ENGBLUP), which were used for detecting QTLs. These QTLs were fitted as fixed effects, and the remaining SNPs were fitted using a realized genetic relationship matrix. Simulations performed under distinct scenarios revealed that (1) the prediction accuracy of SLGBLUP was the lowest; (2) the prediction accuracies of ALGBLUP and ENGBLUP were equivalent to or higher than that of GBLUP, except under scenarios in which the number of QTLs was large; and (3) the persistence of prediction accuracy over generations was strongest in the case of ENGBLUP. Building on the favorable computational characteristics of GBLUP, ENGBLUP enables robust modeling and efficient computation to be performed for genomic selection.
    Genetica 02/2015; DOI:10.1007/s10709-015-9826-5 · 1.75 Impact Factor
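
The last abstract above fits the non-QTL SNPs through a realized genetic relationship matrix but does not say how that matrix is built. A minimal sketch of one widely used construction, assumed here for illustration only: centre the 0/1/2 genotype codes by twice the allele frequency and scale the cross-product by 2Σp(1−p). The function name and the toy genotypes are hypothetical, and the cited study may use a different formulation.

```python
def genomic_relationship(genotypes):
    """Realized relationship matrix G = ZZ' / (2 * sum of p * (1 - p)).

    genotypes: one list per individual, SNPs coded as 0, 1 or 2 copies
    of the counted allele. Allele frequencies p are estimated from the
    data themselves (an assumption of this sketch).
    """
    n, p = len(genotypes), len(genotypes[0])
    freqs = [sum(g[j] for g in genotypes) / (2 * n) for j in range(p)]
    # Centre each SNP column by its expected genotype code, 2p.
    z = [[g[j] - 2 * freqs[j] for j in range(p)] for g in genotypes]
    scale = 2 * sum(f * (1 - f) for f in freqs)
    return [[sum(z[a][j] * z[b][j] for j in range(p)) / scale
             for b in range(n)] for a in range(n)]

# Hypothetical toy genotypes: 4 individuals by 3 SNPs.
toy_genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [1, 2, 1],
]
G = genomic_relationship(toy_genotypes)
```

Because the frequencies are estimated from the same individuals, every centred SNP column sums to zero, so the entries of `G` also sum to zero; relationships are expressed relative to the average of the genotyped sample.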
