ABSTRACT: Genotyping with the medium-density Bovine SNP50 BeadChip(R) (50K) is now standard in cattle. The high-density BovineHD BeadChip(R), which contains 777 609 single nucleotide polymorphisms (SNPs), was developed in 2010. Increasing marker density increases the level of linkage disequilibrium between quantitative trait loci (QTL) and SNPs and the accuracy of QTL localization and genomic selection. However, re-genotyping all animals with the high-density chip is not economically feasible. An alternative strategy is to genotype part of the animals with the high-density chip and to impute high-density genotypes for animals already genotyped with the 50K chip. Thus, it is necessary to investigate the error rate when imputing from the 50K to the high-density chip.
Five thousand one hundred and fifty three animals from 16 breeds (89 to 788 per breed) were genotyped with the high-density chip. Imputation error rates from the 50K to the high-density chip were computed for each breed with a validation set that included the 20% youngest animals. Marker genotypes were masked for animals in the validation population in order to mimic 50K genotypes. Imputation was carried out using the Beagle 3.3.0 software.
Mean allele imputation error rates ranged from 0.31% to 2.41% depending on the breed. In total, 1980 SNPs had high imputation error rates in several breeds, probably because of genome assembly errors, and we recommend discarding them in future studies. Differences in imputation accuracy between breeds were related to the high-density-genotyped sample size and to the genetic relationship between reference and validation populations, whereas differences in effective population size and level of linkage disequilibrium showed limited effects. Accordingly, imputation accuracy was higher in breeds with large populations and in dairy breeds than in beef breeds. More than 99% of the alleles were correctly imputed if more than 300 animals were genotyped at high density. No improvement was observed when multi-breed imputation was performed.
In all breeds, imputation accuracy was higher than 97%, which indicates that imputation to the high-density chip was accurate. Imputation accuracy depends mainly on the size of the reference population and the relationship between reference and target populations.
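The validation design described above (youngest 20% of animals held out, then imputed genotypes compared with the true ones) can be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: the function names, the 0/1/2 allele-count coding, and the definition of the allele error rate as wrong alleles divided by total alleles are assumptions, and the imputation step itself (Beagle in the study) is not reproduced here.

```python
def split_youngest(birth_years, fraction=0.2):
    """Split animal ids into (reference, validation) lists.

    birth_years maps a hypothetical animal id to its birth year; the
    validation set is the youngest `fraction` of the animals.
    """
    ordered = sorted(birth_years, key=birth_years.get)  # oldest first
    n_validation = round(len(ordered) * fraction)
    cut = len(ordered) - n_validation
    return ordered[:cut], ordered[cut:]


def allele_error_rate(true_genotypes, imputed_genotypes):
    """Share of alleles imputed incorrectly (assumed definition).

    Genotypes are coded as counts of one allele (0, 1 or 2), so a
    0-versus-2 mismatch counts as two wrong alleles and a 0-versus-1
    mismatch as one wrong allele.
    """
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype vectors must have the same length")
    if not true_genotypes:
        raise ValueError("at least one genotype is required")
    wrong = sum(abs(t - i) for t, i in zip(true_genotypes, imputed_genotypes))
    return wrong / (2 * len(true_genotypes))


# Toy data, made up for illustration only.
reference, validation = split_youngest(
    {"a": 2001, "b": 2005, "c": 2003, "d": 2008, "e": 2002}
)
rate = allele_error_rate([0, 1, 2, 2], [0, 1, 1, 0])
```

In this toy call, three of the eight alleles differ between the true and imputed vectors; on real data the rate would be computed only over the markers that were masked to mimic the 50K chip.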
ABSTRACT: Recently, the amount of available single nucleotide polymorphism (SNP) marker data has considerably increased in dairy cattle breeds, both for research purposes and for application in commercial breeding and selection programs. Bayesian methods are currently used in the genomic evaluation of dairy cattle to handle very large sets of explanatory variables with a limited number of observations. In this study, we applied 2 Bayesian methods, BayesCπ and Bayesian least absolute shrinkage and selection operator (LASSO), to 2 genotyped and phenotyped reference populations consisting of 3,940 Holstein bulls and 1,172 Montbéliarde bulls with approximately 40,000 polymorphic SNP. We compared the accuracy of the Bayesian methods for the prediction of 3 traits (milk yield, fat content, and conception rate) with pedigree-based BLUP, genomic BLUP, partial least squares (PLS) regression, and sparse PLS regression, a variable selection PLS variant. The results showed that the correlations between observed and predicted phenotypes were similar in BayesCπ (with or without pedigree information) and Bayesian LASSO for most of the traits, regardless of breed. In the Holstein breed, Bayesian methods led to higher correlations than other approaches for fat content and were similar to genomic BLUP for milk yield and to genomic BLUP and PLS regression for the conception rate. In the Montbéliarde breed, no method dominated the others, except BayesCπ for fat content. The better performance of the Bayesian methods for fat content in the Holstein and Montbéliarde breeds is probably due to the effect of the DGAT1 gene. The SNP identified by the BayesCπ, Bayesian LASSO, and sparse PLS regression methods, based on their effect on the different traits of interest, were located at almost the same position on the genome.
As the Bayesian methods resulted in regressions of direct genomic values on daughter trait deviations closer to 1 than for the other methods tested in this study, Bayesian methods are suggested for genomic evaluations of French dairy cattle.
Journal of Dairy Science 11/2012; · 2.57 Impact Factor
ABSTRACT: In many situations, genome-wide association studies are performed in populations presenting stratification. Mixed models including a kinship matrix accounting for genetic relatedness among individuals have been shown to correct for population and/or family structure. Here we extend this methodology to generalized linear mixed models which properly model data under various distributions. In addition we perform association with ancestral haplotypes inferred using a hidden Markov model.
The method was shown to properly account for stratification under various simulated scenarios presenting population and/or family structure. Use of ancestral haplotypes resulted in higher power than SNPs on simulated datasets. Application to real data demonstrates the usefulness of the developed model. Full analysis of a dataset with 4600 individuals and 500 000 SNPs was performed in 2 h 36 min and required 2.28 Gb of RAM.
The software GLASCOW can be freely downloaded from www.giga.ulg.ac.be/jcms/prod_381171/software.
Supplementary data are available at Bioinformatics online.
ABSTRACT: Genomic selection involves computing a prediction equation from the estimated effects of a large number of DNA markers based on a limited number of genotyped animals with phenotypes. The number of observations is much smaller than the number of independent variables, and the challenge is to find methods that perform well in this context. Partial least squares regression (PLS) and sparse PLS were used with a reference population of 3,940 genotyped and phenotyped French Holstein bulls and 39,738 polymorphic single nucleotide polymorphism markers. Partial least squares regression reduces the number of variables by projecting independent variables onto latent structures. Sparse PLS combines variable selection and modeling in a one-step procedure. Correlations between observed phenotypes and phenotypes predicted by PLS and sparse PLS were similar, but sparse PLS highlighted some genome regions more clearly. Both PLS and sparse PLS were more accurate than pedigree-based BLUP and generally provided lower correlations between observed and predicted phenotypes than did genomic BLUP. Furthermore, PLS and sparse PLS required similar computing time to genomic BLUP for the study of 6 traits.
Journal of Dairy Science 04/2012; 95(4):2120-31. · 2.57 Impact Factor
ABSTRACT: Size of the reference population and reliability of phenotypes are crucial factors influencing the reliability of genomic predictions. It is therefore useful to combine closely related populations. Increased accuracies of genomic predictions depend on the number of individuals added to the reference population, the reliability of their phenotypes, and the relatedness of the populations that are combined.
This paper assesses the increase in reliability achieved when combining four Holstein reference populations of 4000 bulls each, from European breeding organizations, i.e. UNCEIA (France), VikingGenetics (Denmark, Sweden, Finland), DHV-VIT (Germany) and CRV (The Netherlands, Flanders). Each partner validated its own bulls using their national reference data and the combined data, respectively.
Combining the data significantly increased the reliability of genomic predictions for bulls in all four populations. Reliabilities increased by 10%, compared to reliabilities obtained with national reference populations alone, when they were averaged over countries and the traits evaluated. For different traits and countries, the increase in reliability ranged from 2% to 19%.
Genomic selection programs benefit greatly from combining data from several closely related populations into a single large reference population.
ABSTRACT: For genomic selection methods, the statistical challenge is to estimate the effect of each of the available single-nucleotide polymorphisms (SNPs). In a context where the number of SNPs (p) is much higher than the number of bulls (n), this task may lead to a poor estimation of these SNP effects if, as for genomic BLUP (gBLUP), all SNPs have a non-null effect. An alternative is to use approaches that have been developed specifically to solve the 'p > n' problem. This is the case of variable selection methods and, among them, we focus on the Elastic-Net (EN) algorithm, which is a penalized regression approach. Performances of EN, gBLUP and pedigree-based BLUP were compared with data from three French dairy cattle breeds, giving very encouraging results for EN. We tried to push further the idea of improving SNP effect estimates by considering fewer of them. This variable selection strategy was considered both in the case of gBLUP and EN by adding an SNP pre-selection step based on quantitative trait locus (QTL) detection. Similar results were observed with or without a pre-selection step, in terms of correlations between direct genomic value (DGV) and observed daughter yield deviation in a validation data set. However, when applied to the EN algorithm, this strategy led to a substantial reduction of the number of SNPs included in the prediction equation. In a context where the number of genotyped animals and the number of SNPs keep increasing, SNP pre-selection strongly alleviates computing requirements and ensures that national evaluations can be completed within a reasonable time frame.
Genetics Research 12/2011; 93(6):409-17. · 2.00 Impact Factor
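For readers unfamiliar with the Elastic-Net penalty mentioned in this abstract, its textbook form is given below. This is the standard "naive" elastic-net objective, not necessarily the exact parameterization used by the authors; the symbols (y for the vector of phenotypes of the n bulls, X for the n × p matrix of SNP genotypes, β for the SNP effects, and λ₁, λ₂ for the two penalty weights) are introduced here for illustration only.

```latex
\hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^{p}}
  \; \lVert y - X\beta \rVert_2^{2}
  \;+\; \lambda_1 \lVert \beta \rVert_1
  \;+\; \lambda_2 \lVert \beta \rVert_2^{2},
\qquad \lambda_1 \ge 0,\ \lambda_2 \ge 0 .
```

The L1 term sets many SNP effects exactly to zero, which is what makes the method a variable selection approach, while the L2 term shrinks the remaining effects; with λ₁ = 0 the criterion reduces to ridge regression, in which every SNP keeps a non-null effect.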
ABSTRACT: The purpose of this study was to investigate the imputation error and loss of reliability of direct genomic values (DGV) or genomically enhanced breeding values (GEBV) when using genotypes imputed from a 3,000-marker single nucleotide polymorphism (SNP) panel to a 50,000-marker SNP panel. Data consisted of genotypes of 15,966 European Holstein bulls from the combined EuroGenomics reference population. Genotypes with the low-density chip were created by erasing markers from 50,000-marker data. The studies were performed in the Nordic countries (Denmark, Finland, and Sweden) using a BLUP model for prediction of DGV and in France using a genomic marker-assisted selection approach for prediction of GEBV. Imputation in both studies was done using a combination of the DAGPHASE 1.1 and Beagle 2.1.3 software. Traits considered were protein yield, fertility, somatic cell count, and udder depth. Imputation of missing markers and prediction of breeding values were performed using 2 different reference populations in each country: either a national reference population or a combined EuroGenomics reference population. Validation for accuracy of imputation and genomic prediction was done based on national test data. Mean imputation error rates when using national reference animals were 5.5 and 3.9% in the Nordic countries and France, respectively, whereas imputation based on the EuroGenomics reference data set gave mean error rates of 4.0 and 2.1%, respectively. Prediction of GEBV based on genotypes imputed with a national reference data set gave an absolute loss of 0.05 in mean reliability of GEBV in the French study, whereas a loss of 0.03 was obtained for reliability of DGV in the Nordic study. When genotypes were imputed using the EuroGenomics reference, a loss of 0.02 in mean reliability of GEBV was detected in the French study, and a loss of 0.06 was observed for the mean reliability of DGV in the Nordic study.
Consequently, the reliability of DGV using the imputed SNP data was 0.38 based on national reference data, and 0.48 based on EuroGenomics reference data in the Nordic validation, and the reliability of GEBV using the imputed SNP data was 0.41 based on national reference data, and 0.44 based on EuroGenomics reference data in the French validation.
Journal of Dairy Science 07/2011; 94(7):3679-86. · 2.57 Impact Factor
ABSTRACT: Empirical experience with genomic selection in dairy cattle suggests that the distribution of the effects of single nucleotide polymorphisms (SNPs) might be far from normality for some traits. An alternative, avoiding the use of arbitrary prior information, is the Bayesian Lasso (BL). Regular BL uses a common variance parameter for residual and SNP effects (BL1Var). We propose here a BL with different residual and SNP effect variances (BL2Var), equivalent to the original Lasso formulation. The λ parameter in Lasso is related to genetic variation in the population. We also suggest precomputing individual variances of SNP effects by BL2Var, to be later used in a linear mixed model (HetVar-GBLUP). Models were tested in a cross-validation design including 1756 Holstein and 678 Montbéliarde French bulls, with 1216 and 451 bulls used as training data; 51 325 and 49 625 polymorphic SNP were used. Milk production traits were tested. Other methods tested included linear mixed models using variances inferred from pedigree estimates or integrated out from the data. Estimates of genetic variation in the population were close to pedigree estimates in BL2Var but not in BL1Var. BL1Var shrank breeding values too little because of the common variance. BL2Var was the most accurate method for prediction and accommodated well major genes, in particular for fat percentage. BL1Var was the least accurate. HetVar-GBLUP was almost as accurate as BL2Var and allows for simple computations and extensions.
Genetics Research 12/2010; 93(1):77-87. · 2.00 Impact Factor
ABSTRACT: Dense marker maps require efficient statistical methods for QTL fine mapping that work fast and efficiently with a large number of markers. In this study, the simulated dataset for the XIIth QTLMAS workshop was analyzed using a QTL fine mapping set of tools.
The QTL fine-mapping strategy was based on the use of statistical methods combining linkage and linkage disequilibrium analysis. Variance component based linkage analysis provided confidence intervals for the QTL. Within these regions, two additional analyses combining both linkage analysis and linkage disequilibrium information were applied. The first method estimated identity-by-descent probabilities among base haplotypes that were used to group them in different clusters. The second method constructed haplotype groups based on identity-by-state probabilities.
Two QTL explaining 9.4% and 3.3% of the genetic variance were found with high significance on chromosome 1 at positions 19.5 and 76.6 cM. On chromosome 2, two QTL were also detected at positions 26.0 and 53.2 cM, explaining 9.0% and 7.8% of the total genetic variance, respectively. The QTL detected on chromosome 3 at position 11.9 cM (5% of variance) was less important. The QTL with the highest effect (37% of variance) was detected on chromosome 4 at position 3.1 cM and another QTL (13.6% of variance) was detected on chromosome 5 at position 93.9 cM.
The proposed strategy for fine-mapping of QTL combining linkage and linkage disequilibrium analysis allowed detecting the most important QTL with an additive effect in a short period but it should be extended in the future in order to fine-map linked and epistatic QTL.
ABSTRACT: The purpose of this study was to map quantitative trait loci (QTL) influencing female fertility estimated by non-return rate (NRR) in the French dairy cattle breeds Prim'Holstein, Normande and Montbéliarde. The first step was a QTL detection study on NRR at 281 days after artificial insemination (AI) on 78 half-sib families including 4993 progeny tested bulls. In Prim'Holstein, three QTL were identified on Bos taurus chromosomes BTA01, BTA02 and BTA03 (p < 0.01), whereas one QTL was identified in Normande on BTA01 (p < 0.05). The second step aimed at confirming these three QTL and refining their location by selecting and genotyping additional microsatellite markers on a sub-sample of 41 families from the three breeds using NRR within 56, 90 and 281 days after AI. Only the three QTL initially detected in Prim'Holstein were confirmed. Moreover, the analysis of NRR within 56, 90 and 281 days after AI allowed us to distinguish two female fertility QTL on BTA02 in Prim'Holstein, one for NRR56 and one for NRR90. Estimated QTL variance was 18%, 14%, 11.5% and 14% of the total genetic variance, respectively, for QTL mapping to BTA01, BTA02 (NRR90 and NRR56) and BTA03.
Journal of Animal Breeding and Genetics 09/2008; 125(4):280-8. · 1.65 Impact Factor
ABSTRACT: French artificial insemination companies have been running a marker-assisted selection program since 2001 to determine which young bulls should be progeny tested. A first batch of 899 Holstein sires receiving their first proofs based on progeny daughters has been studied. Estimated breeding values with or without marker information were computed based on information available in April 2004, and correlated to daughter yield deviations available in 2007 for production traits. Marker-assisted estimated breeding values presented greater correlations with daughter yield deviations than those calculated using only pedigree index. The average improvement in correlation was 0.043 and ranged from +0.001 for protein yield to +0.103 for fat percentage. This gain was based on the initial and suboptimal conditions of the program and is expected to increase in the coming years because of several improvements implemented since the start of the marker-assisted selection program.
Journal of Dairy Science 07/2008; 91(6):2520-2. · 2.57 Impact Factor
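The comparison reported in this abstract, the correlation of each set of estimated breeding values with later daughter yield deviations, can be illustrated with a small sketch. The helper names and the toy numbers are hypothetical and are not taken from the study; only the idea of comparing two Pearson correlations comes from the abstract.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_gain(dyd, ebv_pedigree, ebv_marker_assisted):
    """Gain in correlation with daughter yield deviations (DYD) when
    marker information is added to the pedigree-based evaluation."""
    return pearson(ebv_marker_assisted, dyd) - pearson(ebv_pedigree, dyd)


# Toy values, made up for illustration only.
dyd = [1.0, 2.0, 3.0, 4.0]
gain = correlation_gain(dyd, [1.0, 3.0, 2.0, 4.0], [1.0, 2.0, 3.0, 4.0])
```

A positive gain means that the marker-assisted values track the later daughter performance more closely than the pedigree index does.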
ABSTRACT: Fertility quantitative trait loci (QTL) are of high interest in dairy cattle since insemination failure has dramatically increased in some breeds such as Holstein. High-throughput SNP analysis and SNP microarrays give the opportunity to genotype many animals for hundreds of SNPs per chromosome. In this study, these techniques made it possible to use a dense SNP marker map to fine map a QTL underlying nonreturn rate measured 90 days after artificial insemination, which had previously been detected with a low-density microsatellite marker map. A granddaughter design with 17 Holstein half-sib families (926 offspring) was genotyped for a set of 437 SNPs mapping to BTA3. Linkage analysis was performed by both regression and variance components analysis. An additional analysis combining both linkage analysis and linkage-disequilibrium information was applied. This method first estimated identity-by-descent probabilities among base haplotypes. These probabilities were then used to group the base haplotypes in different clusters. A QTL explaining 14% of the genetic variance was found with high significance (P < 0.001) at position 19 cM with the linkage analysis and four sires were estimated to be heterozygous (P < 0.05). Addition of linkage-disequilibrium information refined the QTL position to a set of narrow peaks. The use of the haplotypes of heterozygous sires offered the possibility to give confidence in some peaks while others could be discarded. Two peaks with high likelihood-ratio test values, in a region where heterozygous sires shared a common haplotype, appeared particularly interesting. Although the analysis did not fine map the QTL to a unique narrow region, the method proved able to handle a large amount of information efficiently and automatically and to refine the QTL position to a small set of narrow intervals.
In addition, the QTL identified was confirmed to have a large effect (explaining 13.8% of the genetic variance) on dairy cow fertility as estimated by nonreturn rate at 90 days.
ABSTRACT: The efficiency of the French marker-assisted selection (MAS) was estimated by a simulation study. The data files of two different time periods were used: April 2004 and 2006. The simulation method used the structure of the existing French MAS: same pedigree, same marker genotypes and same animals with records. The program simulated breeding values and new records based on this existing structure and knowledge on the QTL used in MAS (variance and frequency). Reliabilities of genetic values of young animals (less than one year old) obtained with and without marker information were compared to assess the efficiency of MAS for evaluation of milk, fat and protein yields and fat and protein contents. Mean gains of reliability ranged from 0.015 to 0.094 and from 0.038 to 0.114 in 2004 and 2006, respectively. The larger number of animals genotyped and the use of a new set of genetic markers can explain the improvement of MAS reliability from 2004 to 2006. This improvement was also observed by analysis of information content for young candidates. The gain of MAS reliability with respect to classical selection was larger for sons of sires with genotyped progeny daughters with records. Finally, it was shown that when superiority of MAS over classical selection was estimated with daughter yield deviations obtained after progeny test instead of true breeding values, the gain was underestimated.
ABSTRACT: Two quantitative trait loci (QTL) affecting female fertility were mapped in French dairy cattle. Phenotypes were non-return rates at 28, 56, 90 and 282 days after insemination. On chromosome 3, a QTL was significant at 1% for non-return rate at 90 days, suggesting that it affects early fertility events. An analysis of SLC35A3, which causes complex vertebral malformation, excluded this gene from the QTL interval. On chromosome 7, a QTL was almost significant (P = 0.05) for non-return rate at 282 days. This QTL was associated with abortion and stillbirth problems. Use of appropriate phenotypes appeared important for fine-mapping QTL associated with fertility.
ABSTRACT: 1. Historical context and major players. The French programme of Marker Assisted Selection (MAS) was initiated in 2001 by INRA, LABOGENA and UNCEIA (on behalf of 8 breeding companies [1]) for the three major French dairy breeds: Holstein, Montbéliarde and Normande. INRA (Institut National de la Recherche Agronomique) is the French institute for research in Agriculture, Environment and Nutrition. UNCEIA (Union Nationale des Coopératives d'Elevage et d'Insémination Animale) federates nearly all AI breeding organisations. LABOGENA is a genotyping and parentage testing laboratory with seven shareholders including INRA and UNCEIA. The setting up of a first MAS programme (MAS1) followed the discovery of many QTL in a large collaborative research programme at the end of the nineties (Boichard et al., 2003). From 2001 until September 2008, more than 70000 animals (male and female selection candidates as well as many relatives) were genotyped on 45 microsatellites covering 14 chromosomal regions of 10 to 30 cM each (Boichard et al., 2002). The advent of dense SNP marker chips led to a new research project called Cartofine (="Fine Mapping") with the same partners and supported by both ANR, the funding agency under the umbrella of the French Ministry of Research, and GIS AGENAE, a joint initiative between INRA, UNCEIA and the French breeding companies. This research programme benefited from the availability of the Illumina BovineSNP50 Chip. Around 3300 proven bulls from large families of
[1] AMELIS, CREAVIA, GDO, MIDATEST and DYNAM'IS for the Holstein breed, UMOTEST and JURA BETAIL for the Montbéliarde breed and GNA for the Normande breed.