ABSTRACT: Fertility is a major concern in the dairy cattle industry and has been the subject of numerous studies over the past 20 years. Surprisingly, most of these studies focused on rough female phenotypes and, despite their important role in reproductive success, male- and embryo-related traits have been poorly investigated. In recent years, the rapid and important evolution of technologies in genetic research has led to the development of genomic selection. The generalisation of this method, in combination with the achievements of the artificial insemination (AI) industry, has led to the constitution of large databases of genotyping and sequencing data, as well as refined phenotypes and pedigree records. These resources offer unprecedented opportunities in terms of fundamental and applied research. Here we present five such examples with a focus on reproduction-related traits: (1) detection of quantitative trait loci (QTL) for male fertility and semen quality traits; (2) detection of QTL for refined phenotypes associated with female fertility; (3) identification of recessive embryonic lethal mutations by depletion of homozygous haplotypes; (4) identification of recessive embryonic lethal mutations by mining whole-genome sequencing data; and (5) the contribution of high-density single nucleotide polymorphism chips, whole-genome sequencing and imputation to increasing the power of QTL detection methods and to the identification of causal variants.
Reproduction Fertility and Development 12/2014; 27(1):14-21. DOI:10.1071/RD14379 · 2.40 Impact Factor
ABSTRACT: Genotyping with the medium-density Bovine SNP50 BeadChip® (50K) is now standard in cattle. The high-density BovineHD BeadChip®, which contains 777 609 single nucleotide polymorphisms (SNPs), was developed in 2010. Increasing marker density increases the level of linkage disequilibrium between quantitative trait loci (QTL) and SNPs and the accuracy of QTL localization and genomic selection. However, re-genotyping all animals with the high-density chip is not economically feasible. An alternative strategy is to genotype part of the animals with the high-density chip and to impute high-density genotypes for animals already genotyped with the 50K chip. Thus, it is necessary to investigate the error rate when imputing from the 50K to the high-density chip.
Five thousand one hundred and fifty three animals from 16 breeds (89 to 788 per breed) were genotyped with the high-density chip. Imputation error rates from the 50K to the high-density chip were computed for each breed with a validation set that included the 20% youngest animals. Marker genotypes were masked for animals in the validation population in order to mimic 50K genotypes. Imputation was carried out using the Beagle 3.3.0 software.
Mean allele imputation error rates ranged from 0.31% to 2.41% depending on the breed. In total, 1980 SNPs had high imputation error rates in several breeds, which is probably due to genome assembly errors, and we recommend discarding these SNPs in future studies. Differences in imputation accuracy between breeds were related to the high-density-genotyped sample size and to the genetic relationship between reference and validation populations, whereas differences in effective population size and level of linkage disequilibrium showed limited effects. Accordingly, imputation accuracy was higher in breeds with large populations and in dairy breeds than in beef breeds. More than 99% of the alleles were correctly imputed if more than 300 animals were genotyped at high density. No improvement was observed when multi-breed imputation was performed.
In all breeds, imputation accuracy was higher than 97%, which indicates that imputation to the high-density chip was accurate. Imputation accuracy depends mainly on the size of the reference population and the relationship between reference and target populations.
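The allele imputation error rate reported above can be illustrated with a minimal Python sketch. It assumes genotypes are coded as allele counts (0, 1 or 2) and that the error rate is the number of wrongly imputed alleles divided by the total number of alleles at the masked markers. The abstract does not spell out its exact formula, so this definition, the function name `allele_error_rate` and the toy data are all illustrative assumptions.

```python
def allele_error_rate(true_genotypes, imputed_genotypes):
    """Return the fraction of alleles imputed incorrectly.

    Genotypes are allele counts coded 0, 1 or 2 at each masked marker
    (an assumed coding). A 0-versus-2 mismatch counts as two wrong
    alleles, a 0-versus-1 or 1-versus-2 mismatch as one.
    """
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype lists must have the same length")
    if not true_genotypes:
        raise ValueError("no genotypes to compare")
    wrong_alleles = sum(
        abs(t - i) for t, i in zip(true_genotypes, imputed_genotypes)
    )
    # Each genotype carries two alleles.
    return wrong_alleles / (2 * len(true_genotypes))


# Hypothetical toy data, not taken from the study.
true_hd = [0, 1, 2, 2, 1, 0, 0, 1]
imputed_hd = [0, 1, 2, 1, 1, 0, 2, 1]

rate = allele_error_rate(true_hd, imputed_hd)
print(f"allele error rate: {rate:.4f}")
```

In a validation like the one described, `true_hd` would hold the observed high-density genotypes of the youngest animals at the masked markers and `imputed_hd` the genotypes returned by the imputation software.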
ABSTRACT: Recently, the amount of available single nucleotide polymorphism (SNP) marker data has considerably increased in dairy cattle breeds, both for research purposes and for application in commercial breeding and selection programs. Bayesian methods are currently used in the genomic evaluation of dairy cattle to handle very large sets of explanatory variables with a limited number of observations. In this study, we applied 2 Bayesian methods, BayesCπ and Bayesian least absolute shrinkage and selection operator (LASSO), to 2 genotyped and phenotyped reference populations consisting of 3,940 Holstein bulls and 1,172 Montbéliarde bulls with approximately 40,000 polymorphic SNP. We compared the accuracy of the Bayesian methods for the prediction of 3 traits (milk yield, fat content, and conception rate) with pedigree-based BLUP, genomic BLUP, partial least squares (PLS) regression, and sparse PLS regression, a variable selection PLS variant. The results showed that the correlations between observed and predicted phenotypes were similar in BayesCπ (with or without pedigree information) and Bayesian LASSO for most of the traits, regardless of breed. In the Holstein breed, Bayesian methods led to higher correlations than other approaches for fat content and were similar to genomic BLUP for milk yield and to genomic BLUP and PLS regression for the conception rate. In the Montbéliarde breed, no method dominated the others, except BayesCπ for fat content. The better performance of the Bayesian methods for fat content in Holstein and Montbéliarde breeds is probably due to the effect of the DGAT1 gene. The SNP identified by the BayesCπ, Bayesian LASSO, and sparse PLS regression methods, based on their effect on the different traits of interest, were located at almost the same position on the genome.
As the Bayesian methods resulted in regressions of direct genomic values on daughter trait deviations closer to 1 than for the other methods tested in this study, Bayesian methods are suggested for genomic evaluations of French dairy cattle.
ABSTRACT: In many situations, genome-wide association studies are performed in populations presenting stratification. Mixed models including a kinship matrix accounting for genetic relatedness among individuals have been shown to correct for population and/or family structure. Here we extend this methodology to generalized linear mixed models which properly model data under various distributions. In addition we perform association with ancestral haplotypes inferred using a hidden Markov model.
The method was shown to properly account for stratification under various simulated scenarios presenting population and/or family structure. Use of ancestral haplotypes resulted in higher power than SNPs on simulated datasets. Application to real data demonstrates the usefulness of the developed model. Full analysis of a dataset with 4600 individuals and 500 000 SNPs was performed in 2 h 36 min and required 2.28 Gb of RAM.
The software GLASCOW can be freely downloaded from www.giga.ulg.ac.be/jcms/prod_381171/software.
Supplementary data are available at Bioinformatics online.
ABSTRACT: Genomic selection involves computing a prediction equation from the estimated effects of a large number of DNA markers based on a limited number of genotyped animals with phenotypes. The number of observations is much smaller than the number of independent variables, and the challenge is to find methods that perform well in this context. Partial least squares regression (PLS) and sparse PLS were used with a reference population of 3,940 genotyped and phenotyped French Holstein bulls and 39,738 polymorphic single nucleotide polymorphism markers. Partial least squares regression reduces the number of variables by projecting independent variables onto latent structures. Sparse PLS combines variable selection and modeling in a one-step procedure. Correlations between observed phenotypes and phenotypes predicted by PLS and sparse PLS were similar, but sparse PLS highlighted some genome regions more clearly. Both PLS and sparse PLS were more accurate than pedigree-based BLUP and generally provided lower correlations between observed and predicted phenotypes than did genomic BLUP. Furthermore, PLS and sparse PLS required similar computing time to genomic BLUP for the study of 6 traits.
ABSTRACT: Genomic selection is implemented in French Holstein, Montbéliarde, and Normande breeds (70%, 16% and 12% of French dairy cows, respectively). A characteristic of the model for genomic evaluation is the use of haplotypes instead of single-nucleotide polymorphisms (SNPs), so as to maximise linkage disequilibrium between markers and quantitative trait loci (QTLs). For each trait, a QTL-BLUP model (i.e. a best linear unbiased prediction model including QTL random effects) includes 300–700 trait-dependent chromosomal regions selected either by linkage disequilibrium and linkage analysis or by elastic net. This model requires considerable effort to phase genotypes, detect QTLs and select SNPs, but was found to be the most efficient of all the models tested. QTLs are defined within breed and many of them were found to be breed specific. Reference populations include 1800 and 1400 bulls in Montbéliarde and Normande breeds. In Holstein, the very large reference population of 18 300 bulls originates from the EuroGenomics consortium. Since 2008, ~65 000 animals have been genotyped for selection by Labogena with the 50k chip. Bull genomic estimated breeding values (GEBVs) were made official in June 2009. In 2010, the market share of the young bulls reached 30% and is expected to increase rapidly. Advertising campaigns have been undertaken to recommend a time-restricted use of young bulls with a limited number of doses. In January 2011, genomic selection was opened to all farmers for females. Current developments focus on the extension of the method to a multi-breed context, to use all reference populations simultaneously in genomic evaluation.
Animal Production Science 01/2012; 52(3). DOI:10.1071/AN11119 · 1.29 Impact Factor
ABSTRACT: The European Brown Swiss federation, in collaboration with Interbull, funded and managed a project named Intergenomics. The goal of this project is to perform genomic evaluations of sires based on a joint analysis of all the genotypes collected around Europe. To date, six countries are involved in Intergenomics and, depending on the country, between 3 and 15 traits are available. In this study, we compare a panel of 4 genomic selection approaches with the pedigree-based BLUP (Best Linear Unbiased Predictor). Among these 4 methodologies, the performance of genomic BLUP (GBLUP) was compared with that of 2 Bayesian approaches (Bayesian LASSO and Bayes Cπ) and a variable selection approach (Elastic Net or EN). Except for GBLUP, the genomic selection approaches deal with the p>>n problem (the number of single nucleotide polymorphisms, or SNPs (p), is much higher than the number of bulls (n)). We compare the correlations between observed and predicted deregressed proofs for the different traits, the different country scales and the different methods. Compared with the pedigree-based BLUP, genomic selection approaches allow a gain in correlation of between 6.5 and 20.9%. Bayesian LASSO, Bayes Cπ and EN give the best results, with a gain in correlation of around 3% compared with GBLUP. The slope of regression is also lower with these three methods than with the pedigree-based BLUP and the GBLUP. Consequently, across the different country scales, the mean number of traits that pass the Interbull test (slope of regression between 0.8 and 1.2) is lower for the pedigree-based BLUP (6.4 traits on average) than for the Bayesian LASSO, Bayes Cπ and EN (between 7.8 and 8 traits on average).
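The slope criterion mentioned above (slope of regression between 0.8 and 1.2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it regresses observed values on predicted values by ordinary least squares, which is one common reading of the validation described, and the function names and toy numbers are hypothetical rather than taken from the Intergenomics project.

```python
def regression_slope(predicted, observed):
    """Ordinary least-squares slope of observed values regressed on predictions."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    covariance = sum(
        (p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed)
    )
    variance = sum((p - mean_p) ** 2 for p in predicted)
    if variance == 0:
        raise ValueError("predictions have zero variance")
    return covariance / variance


def passes_slope_test(slope, lower=0.8, upper=1.2):
    """Check the slope against the 0.8 to 1.2 interval quoted in the abstract."""
    return lower <= slope <= upper


# Hypothetical toy values for one trait (not real deregressed proofs).
predicted = [1.0, 2.0, 3.0, 4.0]
observed = [1.1, 1.9, 3.2, 3.8]

slope = regression_slope(predicted, observed)
print(f"slope = {slope:.2f}, passes: {passes_slope_test(slope)}")
```

A slope well below 1 would suggest that the predictions are over-dispersed relative to the observed values, which is why a trait falling outside the interval fails the test.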
ABSTRACT: Size of the reference population and reliability of phenotypes are crucial factors influencing the reliability of genomic predictions. It is therefore useful to combine closely related populations. Increased accuracies of genomic predictions depend on the number of individuals added to the reference population, the reliability of their phenotypes, and the relatedness of the populations that are combined.
This paper assesses the increase in reliability achieved when combining four Holstein reference populations of 4000 bulls each, from European breeding organizations, i.e. UNCEIA (France), VikingGenetics (Denmark, Sweden, Finland), DHV-VIT (Germany) and CRV (The Netherlands, Flanders). Each partner validated its own bulls using first its national reference data and then the combined data.
Combining the data significantly increased the reliability of genomic predictions for bulls in all four populations. Reliabilities increased by 10%, compared to reliabilities obtained with national reference populations alone, when they were averaged over countries and the traits evaluated. For different traits and countries, the increase in reliability ranged from 2% to 19%.
Genomic selection programs benefit greatly from combining data from several closely related populations into a single large reference population.
ABSTRACT: For genomic selection methods, the statistical challenge is to estimate the effect of each of the available single-nucleotide polymorphisms (SNPs). In a context where the number of SNPs (p) is much higher than the number of bulls (n), this task may lead to a poor estimation of these SNP effects if, as for genomic BLUP (gBLUP), all SNPs have a non-null effect. An alternative is to use approaches that have been developed specifically to solve the 'p > n' problem. This is the case for variable selection methods and, among them, we focus on the Elastic-Net (EN) algorithm, which is a penalized regression approach. Performances of EN, gBLUP and pedigree-based BLUP were compared with data from three French dairy cattle breeds, giving very encouraging results for EN. We further explored the idea of improving SNP effect estimates by considering fewer of them. This variable selection strategy was considered both in the case of gBLUP and EN by adding an SNP pre-selection step based on quantitative trait locus (QTL) detection. Similar results were observed with or without a pre-selection step, in terms of correlations between direct genomic value (DGV) and observed daughter yield deviation in a validation data set. However, when applied to the EN algorithm, this strategy led to a substantial reduction of the number of SNPs included in the prediction equation. In a context where the number of genotyped animals and the number of SNPs get larger and larger, SNP pre-selection strongly alleviates computing requirements and ensures that national evaluations can be completed within a reasonable time frame.
Genetics Research 12/2011; 93(6):409-17. DOI:10.1017/S0016672311000358 · 1.47 Impact Factor
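For readers unfamiliar with the Elastic-Net penalty referred to in the abstract above, one common way of writing the estimator is given below. The abstract does not state which parameterisation the authors used, so the form and the symbols here are assumptions: y is the vector of n phenotypes, X the n × p matrix of SNP genotypes, β the vector of SNP effects, λ ≥ 0 the overall penalty strength and α ∈ [0, 1] the mixing weight.

```latex
\hat{\beta} = \arg\min_{\beta}\;
  \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  + \lambda \left( \alpha \lVert \beta \rVert_1
  + \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2 \right)
```

With α = 1 the penalty reduces to the LASSO and with α = 0 to ridge regression. The L1 term is what sets some SNP effects exactly to zero, which is how the method performs variable selection when p is much larger than n.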
ABSTRACT: The purpose of this study was to investigate the imputation error and loss of reliability of direct genomic values (DGV) or genomically enhanced breeding values (GEBV) when using genotypes imputed from a 3,000-marker single nucleotide polymorphism (SNP) panel to a 50,000-marker SNP panel. Data consisted of genotypes of 15,966 European Holstein bulls from the combined EuroGenomics reference population. Genotypes with the low-density chip were created by erasing markers from 50,000-marker data. The studies were performed in the Nordic countries (Denmark, Finland, and Sweden) using a BLUP model for prediction of DGV and in France using a genomic marker-assisted selection approach for prediction of GEBV. Imputation in both studies was done using a combination of the DAGPHASE 1.1 and Beagle 2.1.3 software. Traits considered were protein yield, fertility, somatic cell count, and udder depth. Imputation of missing markers and prediction of breeding values were performed using 2 different reference populations in each country: either a national reference population or a combined EuroGenomics reference population. Validation for accuracy of imputation and genomic prediction was done based on national test data. Mean imputation error rates when using national reference animals were 5.5 and 3.9% in the Nordic countries and France, respectively, whereas imputation based on the EuroGenomics reference data set gave mean error rates of 4.0 and 2.1%, respectively. Prediction of GEBV based on genotypes imputed with a national reference data set gave an absolute loss of 0.05 in mean reliability of GEBV in the French study, whereas a loss of 0.03 was obtained for reliability of DGV in the Nordic study. When genotypes were imputed using the EuroGenomics reference, a loss of 0.02 in mean reliability of GEBV was detected in the French study, and a loss of 0.06 was observed for the mean reliability of DGV in the Nordic study.
Consequently, the reliability of DGV using the imputed SNP data was 0.38 based on national reference data, and 0.48 based on EuroGenomics reference data in the Nordic validation, and the reliability of GEBV using the imputed SNP data was 0.41 based on national reference data, and 0.44 based on EuroGenomics reference data in the French validation.