ABSTRACT: Background
Advances in human genomics have allowed unprecedented productivity in terms of algorithms, software, and literature available for translating raw next-generation sequence data into high-quality information. The challenges of variant identification in organisms with lower quality reference genomes are less well documented. We explored the consequences of commonly recommended preparatory steps and the effects of single and multi sample variant identification methods using four publicly available software applications (Platypus, HaplotypeCaller, Samtools and UnifiedGenotyper) on whole genome sequence data of 65 key ancestors of Swiss dairy cattle populations. Accuracy of calling next-generation sequence variants was assessed by comparison to the same loci from medium and high-density single nucleotide variant (SNV) arrays.
The total number of SNVs identified varied by software and method, with single (multi) sample results ranging from 17.7 to 22.0 (16.9 to 22.0) million variants. Computing time varied considerably between software. Preparatory realignment of insertions and deletions and subsequent base quality score recalibration had only minor effects on the number and quality of SNVs identified by different software, but increased computing time considerably. Average concordance for single (multi) sample results with high-density chip data was 58.3% (87.0%) and average genotype concordance in correctly identified SNVs was 99.2% (99.2%) across software. The average quality of SNVs identified, measured as the ratio of transitions to transversions, was higher using single sample methods than multi sample methods. A consensus approach using results of different software generally provided the highest variant quality in terms of transition / transversion ratio.
Our findings serve as a reference for variant identification pipeline development in non-human organisms and help assess the implication of preparatory steps in next-generation sequencing pipelines for organisms with incomplete reference genomes (pipeline code is included). Benchmarking this information should prove particularly useful in processing next-generation sequencing data for use in genome-wide association studies and genomic selection.
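The two quality measures used in the abstract above, the transition/transversion ratio and the genotype concordance between sequence-derived and array genotypes, can be sketched as follows. This is a minimal illustration, not the pipeline code the authors mention: the function names, the (ref, alt) tuple input and the dictionary-based genotype encoding are all assumptions made for the example, and the abstract does not state how concordance was defined in detail.

```python
# Hypothetical helpers; names and data layout are assumptions for illustration.
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(snvs):
    """Ratio of transitions to transversions over (ref, alt) pairs."""
    transitions = 0
    transversions = 0
    for ref, alt in snvs:
        if ref.upper() == alt.upper():
            continue  # not a substitution, skip
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions; ratio undefined")
    return transitions / transversions


def genotype_concordance(seq_calls, array_calls):
    """Fraction of loci present in both call sets with identical genotypes.

    Both arguments map a locus key, e.g. (chromosome, position), to a
    genotype coded as the alternate-allele count (0, 1 or 2).
    """
    shared = seq_calls.keys() & array_calls.keys()
    if not shared:
        raise ValueError("no shared loci")
    matches = sum(1 for locus in shared if seq_calls[locus] == array_calls[locus])
    return matches / len(shared)


if __name__ == "__main__":
    print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))
    print(genotype_concordance({("1", 100): 0, ("1", 200): 1},
                               {("1", 100): 0, ("1", 200): 2}))
```

Restricting the comparison to loci found in both call sets mirrors the idea of measuring genotype concordance only among SNVs that were identified by sequencing at array positions.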
ABSTRACT: Inherited developmental diseases can cause severe animal welfare and economic problems in dairy cattle. The use of a small number of bulls for artificial insemination (AI) carries a risk that recessive defects rapidly enrich in the population. In recent years, an increasing number of Finnish Ayrshire calves have been identified with signs of ptosis, intellectual disability, retarded growth and mortality, which constitute an inherited disorder classified as PIRM syndrome.
ABSTRACT: High-density genotyping data are indispensable for genomic analyses of complex traits in animal and crop species. Maize is one of the most important crop plants worldwide; however, a high-density SNP genotyping array for analysis of its large and highly dynamic genome has not been available so far.
ABSTRACT: Advantages of using whole genome sequence data to predict genomic estimated breeding values (GEBV) include better persistence of accuracy of GEBV across generations and more accurate GEBV across breeds. The 1000 Bull Genomes Project provides a database of whole genome sequenced key ancestor bulls, for imputing sequence variant genotypes into reference sets for genomic prediction. Run 3.0 included 429 sequences, with 31.8 million variants detected. BayesRC, a new method for genomic prediction, addresses some challenges associated with using the sequence data, and takes advantage of biological information. In a dairy data set, predictions using BayesRC and imputed sequence data from 1000 Bull Genomes were 2% more accurate than with 800k data. We demonstrated that the method identified causal mutations in some cases. Further improvements will come from more accurate imputation of sequence variant genotypes and improved biological information.
Keywords: Genomic prediction, whole genome sequence, biological information
10th World Congress on Genetics Applied to Livestock Production; 08/2014
ABSTRACT: Sequence data were generated from 157 animals of the Fleckvieh population. A pre-phasing based approach was used to impute genotypes for 21,045,178 polymorphic sites into 10,363 target animals genotyped with high-density arrays. Imputed sequence variants were used in an association study with daughter-derived phenotypes for milk fat percentage. The association study identified ten QTL controlling fat percentage in Fleckvieh cattle. Two postulated causal variants in the DGAT1 and GHR genes yielded the most significant association signals. Sequence-based association studies for udder conformation traits demonstrated a complex genetic architecture of mammary gland development in cattle. The association studies identified eight, six and seven QTL underlying udder depth, teat length and teat thickness, respectively. Imputed sequence variants captured genetic effects at a better resolution than array-based genotypes. However, even when considering large-scale imputed sequence variants, a significant fraction of the heritability remains "missing".
10th World Congress on Genetics Applied to Livestock Production, Vancouver, CA; 08/2014
ABSTRACT: Bovine hereditary zinc deficiency (BHZD) is an autosomal recessive disorder of cattle, first described in Holstein-Friesian animals. Affected calves suffer from severe skin lesions and show a poor general health status. Recently, eight calves with the phenotypic appearance of BHZD have been reported in the Fleckvieh cattle population.
ABSTRACT: The 1000 bull genomes project supports the goal of accelerating the rates of genetic gain in domestic cattle while at the same time considering animal health and welfare by providing the annotated sequence variants and genotypes of key ancestor bulls. In the first phase of the 1000 bull genomes project, we sequenced the whole genomes of 234 cattle to an average of 8.3-fold coverage. This sequencing includes data for 129 individuals from the global Holstein-Friesian population, 43 individuals from the Fleckvieh breed and 15 individuals from the Jersey breed. We identified a total of 28.3 million variants, with an average of 1.44 heterozygous sites per kilobase for each individual. We demonstrate the use of this database in identifying a recessive mutation underlying embryonic death and a dominant mutation underlying lethal chondrodysplasia. We also performed genome-wide association studies for milk production and curly coat, using imputed sequence variants, and identified variants associated with these traits in cattle.
ABSTRACT: Human-driven selection during domestication and subsequent breed formation has likely left detectable signatures within the genome of modern cattle. The elucidation of these signatures of selection is of interest from the perspective of evolutionary biology, and for identifying domestication-related genes that ultimately may help to further genetically improve this economically important animal. To this end, we employed a panel of more than 15 million autosomal SNPs identified from re-sequencing of 43 Fleckvieh animals. We mainly applied two somewhat complementary statistics, the integrated Haplotype Homozygosity Score (iHS) reflecting primarily ongoing selection, and the Composite of Likelihood Ratio (CLR) having the most power to detect completed selection after fixation of the advantageous allele. We find 106 candidate selection regions, many of which harbor genes related to phenotypes relevant in domestication, such as coat coloring pattern, neurobehavioral functioning and sensory perception, including KIT, MITF, MC1R, NRG4, ERBB4, TMEM132D and TAS2R16, among others. To further investigate the relationship between genes with signatures of selection and genes identified in QTL mapping studies, we use a sample of 3062 animals to perform four genome-wide association analyses using appearance traits, body size and somatic cell count. We show that regions associated with coat coloring significantly (P<0.0001) overlap with the candidate selection regions, suggesting that the selection signals we identify are associated with traits known to be affected by selection during domestication. Results also provide further evidence regarding the complexity of the genetics underlying coat coloring in cattle.
This study illustrates the potential of population genetic approaches for identifying genomic regions affecting domestication-related phenotypes and further helps to identify specific regions targeted by selection during speciation, domestication and breed formation of cattle. We also show that Linkage Disequilibrium (LD) decays in cattle at a much faster rate than previously thought.
ABSTRACT: Genetic variants underlying reduced male reproductive performance have been identified in humans and model organisms, most of them compromising semen quality. Occasionally, male fertility is severely compromised although semen analysis remains without any apparent pathological findings (i.e., idiopathic subfertility). Artificial insemination (AI) in most cattle populations requires close examination of all ejaculates before insemination. Although anomalous ejaculates are rejected, insemination success varies considerably among AI bulls. In an attempt to identify genetic causes of such variation, we undertook a genome-wide association study (GWAS). Imputed genotypes of 652,856 SNPs were available for 7962 AI bulls of the Fleckvieh (FV) population. Male reproductive ability (MRA) was assessed based on 15.3 million artificial inseminations. The GWAS uncovered a strong association signal on bovine chromosome 19 (P = 4.08 × 10^-59). Subsequent autozygosity mapping revealed a common 1386 kb segment of extended homozygosity in 40 bulls with exceptionally poor reproductive performance. Only 1.7% of 35,671 inseminations with semen samples of those bulls were successful. None of the bulls with normal reproductive performance was homozygous, indicating recessive inheritance. Exploiting whole-genome re-sequencing data of 43 animals revealed a candidate causal nonsense mutation (rs378652941, c.483C>A, p.Cys161X) in the transmembrane protein 95 encoding gene TMEM95, which was subsequently validated in 1990 AI bulls. Immunohistochemical investigations showed that TMEM95 is located at the surface of spermatozoa of fertile animals, whereas it is absent in spermatozoa of subfertile animals. These findings imply that integrity of TMEM95 is required for undisturbed fertilisation.
Our results demonstrate that deficiency of TMEM95 severely compromises male reproductive performance in cattle and reveal for the first time a phenotypic effect associated with genomic variation in TMEM95.
ABSTRACT: The implementation of high-throughput genotyping arrays in routine applications in livestock breeding programs yields genotypes for a large number of single nucleotide polymorphisms (SNPs). Dense SNP information enables both genomic predictions and the identification of quantitative trait loci (QTL) via genome-wide association studies (GWAS). Current genome-wide analyses of cattle populations rely on genotypes of 45,000 SNPs. We demonstrate that increasing the marker density only marginally increases the power of GWAS in the Fleckvieh population. However, sufficiently sized samples are crucial for successful genome-wide analyses of complex traits. Using phenotypes for milk fat percentage, we show that identifying QTL that explain only a small fraction (< 1%) of the genetic variation requires genotypes of several thousand individuals. Identifying the underlying genomic variation is mandatory to permanently integrate QTL information in breeding programs. A key factor for the identification of causal trait variants is the availability of sequencing data of a population's key animals. Sequence-derived variants can be imputed for any individual with high-density genotypes. This makes it possible to perform sequence-based association studies and to test putatively causal variants directly for association with phenotypes of interest.
ABSTRACT: This study investigated reliability of genomic predictions using medium-density (40,089; 50K) or high-density (HD; 388,951) marker sets. We developed an approximate method to test differences in validation reliability for significance. Model-based reliability and the effect of HD genotypes on inflation of predictions were analyzed additionally. Genomic breeding values were predicted for at least 1,321 validation bulls based on phenotypes and genotypes of at least 5,324 calibration bulls by means of a linear model in milk, fat, and protein yield; somatic cell score; milkability; muscling; udder, feet, and legs score as well as stature. In total, 1,485 bulls were actually HD genotyped and HD genotypes of the other animals were imputed from 50K genotypes using FImpute software. Validation reliability was measured as the coefficient of determination of the weighted regression of daughter yield deviations on predicted breeding values divided by the reliability of daughter yield deviations and inflation was evaluated by the slope of this regression. Model-based reliability was calculated from the model. Distributions for validation reliability of 50K markers were derived by repeated sampling of 50,000-marker samples from HD to test differences in validation reliability statistically. Additionally, the benefit of HD genotypes in validation reliability was tested by repeated sampling of validation groups and calculation of the difference in validation reliability between HD and 50K genotypes for the sampled groups of bulls. The mean benefit in validation reliability of HD genotypes was 0.015 compared with real 50K genotypes and 0.028 compared with 50K samples from HD affected by imputation error and was significant for all traits. The model-based reliability was, on average, 0.036 lower and the regression coefficient was 0.036 closer to the expected value with HD genotypes.
The observed gain in validation reliability with HD genotypes was similar to expectations based on the number of markers and the effective number of segregating chromosome segments. Sampling error in the marker-based relationship coefficients causing overestimation of the model-based reliability was smaller with HD genotypes. Inflation of the genomic predictions was reduced with HD genotypes, accordingly. Similar effects on model-based reliability and inflation, but not on the validation reliability, were obtained by shrinkage estimation of the realized relationship matrix from 50K genotypes.
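The validation statistics described in the abstract above (coefficient of determination of a weighted regression of daughter yield deviations on predicted breeding values, divided by the reliability of the daughter yield deviations, with the regression slope as the inflation measure) can be sketched as below. This is only an illustrative reading of that one sentence: the function name, the use of a single average reliability value as divisor, and the choice of weights are assumptions, not details taken from the study.

```python
def weighted_validation(dyd, gebv, weights, dyd_reliability):
    """Return (validation_reliability, slope) from a weighted regression.

    dyd             -- daughter yield deviations of the validation bulls
    gebv            -- predicted genomic breeding values of the same bulls
    weights         -- regression weights (hypothetical choice, one per bull)
    dyd_reliability -- reliability of the DYD, here one average value (assumption)
    """
    n = len(dyd)
    if n < 2 or n != len(gebv) or n != len(weights):
        raise ValueError("inputs must have equal length of at least 2")
    if not 0 < dyd_reliability <= 1:
        raise ValueError("dyd_reliability must be in (0, 1]")

    w_sum = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, gebv)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, dyd)) / w_sum

    # Weighted sums of squares and cross-products around the weighted means.
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, gebv))
    syy = sum(w * (y - y_bar) ** 2 for w, y in zip(weights, dyd))
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, gebv, dyd))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in predictions or phenotypes")

    slope = sxy / sxx                   # inflation measure
    r_squared = sxy ** 2 / (sxx * syy)  # weighted coefficient of determination
    return r_squared / dyd_reliability, slope


if __name__ == "__main__":
    rel, b = weighted_validation(
        dyd=[2.1, 3.9, 6.2, 7.8],
        gebv=[1.0, 2.0, 3.0, 4.0],
        weights=[10, 20, 15, 5],
        dyd_reliability=0.9,
    )
    print(f"validation reliability = {rel:.3f}, slope = {b:.3f}")
```

Dividing by the reliability of the daughter yield deviations corrects the squared correlation for the fact that the validation phenotype is itself only an imperfect measure of the true breeding value.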
Journal of Dairy Science 11/2013; · 2.57 Impact Factor
ABSTRACT: With the Weihenstephan funnel nest box, 12 laying hen flocks were tested for their individual laying performance, egg quality, and nesting behavior in a noncage environment. During the whole observation period of 8 yr, a transponder-based data recording system was continuously improved and resulted in a recording accuracy of 97%. At peak production, heritabilities for the number of eggs laid are in some flocks higher than expected. With improved data accuracy, heritability estimates on individual egg weights are more stable. Heritabilities for nesting behavior traits range between a low to moderate level, providing very useful information for laying hen selection to help improve traits that cannot be recorded in cages. Over the years, the benefits of the Weihenstephan funnel nest box for laying hen breeders have grown. This is due to higher data recording accuracies and extended testing capacities, which result in more reliable genetic parameters.
ABSTRACT: Genome- and population-wide re-sequencing would allow for most efficient detection of causal trait variants. However, despite a strong decrease of costs for next-generation sequencing in the last few years, re-sequencing of large numbers of individuals is not yet affordable. We therefore resorted to re-sequencing of a limited number of bovine animals selected to explain a major proportion of the population's genomic variation, so called key animals, in order to provide a catalogue of functional variants and a substrate for population- and genome-wide imputation of variable sites.
Forty-three animals accounting for about 69 percent of the genetic diversity of the Fleckvieh population, a cattle breed of Southern Germany and Austria, were sequenced with coverages ranging from 4.17 to 24.98 and averaging 7.46. After alignment to the reference genome (UMD3.1) and multi-sample variant calling, more than 17 million variant positions were identified, about 90 percent biallelic single nucleotide variants (SNVs) and 10 percent short insertions and deletions (InDels). The comparison with high-density chip data revealed a sensitivity of at least 92 percent and a specificity of 81 percent for sequencing-based genotyping, and 97 percent and 93 percent when an imputation step was included. There are 91,733 variants in coding regions of 18,444 genes, 46 percent being non-synonymous exchanges, of which 575 variants are predicted to cause premature stop codons. Three variants are listed in the OMIA database as causal for specific phenotypes.
Low- to medium-coverage re-sequencing of individuals explaining a major fraction of a population's genomic variation allows for the efficient and reliable detection of most variants. Imputation strongly improves genotype quality of lowly covered samples and thus enables maximum density genotyping by sequencing. The functional annotation of variants provides the basis for exhaustive genotype imputation in the population, e.g., for highest-resolution genome-wide association studies.
ABSTRACT: BACKGROUND: Currently, genome-wide evaluation of cattle populations is based on SNP-genotyping using ~ 54 000 SNP. Increasing the number of markers might improve genomic predictions and power of genome-wide association studies. Imputation of genotypes makes it possible to extrapolate genotypes from lower to higher density arrays based on a representative reference sample for which genotypes are obtained at higher density. METHODS: Genotypes using 639 214 SNP were available for 797 bulls of the Fleckvieh cattle breed. The data set was divided into a reference and a validation population. Genotypes for all SNP except those included in the BovineSNP50 Bead chip were masked and subsequently imputed for animals of the validation population. Imputation of genotypes was performed with Beagle, findhap.f90, MaCH and Minimac. The accuracy of the imputed genotypes was assessed for four different scenarios including 50, 100, 200 and 400 animals as reference population. The reference animals were selected to account for 78.03%, 89.21%, 97.47% and > 99% of the gene pool of the genotyped population, respectively. RESULTS: Imputation accuracy increased as the number of animals and relatives in the reference population increased. Population-based algorithms provided highly reliable imputation of genotypes, even for scenarios with 50 and 100 reference animals only. Using MaCH and Minimac, the correlation between true and imputed genotypes was > 0.975 with 100 reference animals only. Pre-phasing the genotypes of both the reference and validation populations not only provided highly accurate imputed genotypes but was also computationally efficient. Genome-wide analysis of imputation accuracy led to the identification of many misplaced SNP. CONCLUSIONS: Genotyping key animals at high density and subsequent population-based genotype imputation yield high imputation accuracy.
Pre-phasing the genotypes of the reference and validation populations is computationally efficient and results in high imputation accuracy, even when the reference population is small.
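The accuracy measure reported in the abstract above, the correlation between true and imputed genotypes at SNP that were masked before imputation, can be sketched as follows. The imputation itself (Beagle, findhap.f90, MaCH, Minimac) is not reproduced here; the function names, the 0/1/2 allele-count coding and the index-based masking are assumptions made for the illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two sequences of equal length >= 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined without variation")
    return sxy / sqrt(sxx * syy)


def imputation_accuracy(true_genotypes, imputed_genotypes, masked_indices):
    """Correlation between true and imputed genotypes at masked SNP only.

    Genotypes are coded as allele counts (0, 1, 2); masked_indices lists the
    positions that were hidden from the imputation software (hypothetical).
    """
    true_masked = [true_genotypes[i] for i in masked_indices]
    imputed_masked = [imputed_genotypes[i] for i in masked_indices]
    return pearson(true_masked, imputed_masked)


if __name__ == "__main__":
    true_gt = [0, 1, 2, 1, 0, 2]
    imputed_gt = [0, 1, 2, 2, 0, 2]   # one genotype imputed incorrectly
    masked = [1, 2, 3, 4]             # SNP absent from the lower-density array
    print(round(imputation_accuracy(true_gt, imputed_gt, masked), 3))
```

Only the masked positions enter the correlation, because SNP that remained visible to the imputation software would inflate the apparent accuracy.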
ABSTRACT: Impaired migration of primordial germ cells during embryonic development causes hereditary gonadal hypoplasia in both sexes of Northern Finncattle and Swedish Mountain cattle. The affected gonads exhibit a lack of or, in rare cases, a reduced number of germ cells. Most affected animals present left-sided gonadal hypoplasia. However, right-sided and bilateral cases are also found. This type of gonadal hypoplasia prevails in animals with white coat colour. Previous studies indicated that gonadal hypoplasia is inherited in an autosomal recessive fashion with incomplete penetrance. In order to identify genetic regions underlying gonadal hypoplasia, a genome-wide association study (GWAS) and a copy number variation (CNV) analysis were performed with 94 animals, including 21 affected animals, using bovine 777,962 SNP arrays. The GWAS and CNV results revealed two significantly associated regions on bovine chromosomes (BTA) 29 and 6, respectively (P = 2.19 × 10^-13 and P = 5.65 × 10^-6). Subsequent cytogenetic and PCR analyses demonstrated that homozygosity of a ~500 kb chromosomal segment translocated from BTA6 to BTA29 (Cs29 allele) is the underlying genetic mechanism responsible for gonadal hypoplasia. The duplicated segment includes the KIT gene that is known to regulate the migration of germ cells and precursors of melanocytes. This duplication is also one of the two translocations associated with colour sidedness in various cattle breeds.
PLoS ONE 01/2013; 8(9):e75659. · 3.53 Impact Factor
ABSTRACT: Genome-wide association studies and genomic evaluation using a dense set of genetic markers both require a large number of genotyped individuals. Collection of the respective samples contributes substantially to the cost of the approach. In dairy cattle research, the use of residues from routine milk recording would be a cost-saving alternative to obtain samples for an appropriate number of individuals with specific phenotypes in a very short time. To assess the suitability of milk recording residues, we concurrently investigated milk residues obtained after standardized milk recording procedures and blood samples from 115 cows originating from 3 farms with different milking systems by genotyping 15 microsatellite markers. We found that 4% of the milk samples were possibly assigned to the wrong animal (i.e., conflicts) and that at least 27% of the milk residues were contaminated, as indicated by an extra allele not present in the blood sample. These additional alleles primarily originated from a sample with a higher somatic cell score that went through the milk sample analyzer in the milk laboratory before the target sample. Furthermore, additional allele carryover was observed across more than one sample, when the difference in somatic cell count between samples exceeded 100,000 cells/mL. Finally, in several samples, the extra allele could not be traced back to previous samples passing through the milk sample analyzer. One source of those contaminations might be sample collection on-farm due to milk traces from the previously milked cow in the hose. No correlation was found between the farm management and conflicts or contaminations. We conclude that residues from routine milk recording are not suitable for genomic evaluation or genome-wide association studies because of the high prevalence of contamination generated at several steps during the collection and processing of milk residual samples.
Journal of Dairy Science 09/2012; 95(9):5436-41. · 2.57 Impact Factor
ABSTRACT: Next-generation sequencing of 43 key and contemporary animals, explaining 68% of the gene pool of the German FV population, and subsequent multi-sample variant calling yielded genotypes at 17.3 million sites. Pre-phasing both the sequence and the array data with Beagle and subsequent population-wide imputation with MiniMac made it possible to extrapolate genotypes at 12 million SNPs and 1.5 million InDels for 3668 FV animals via high- and medium-density genotypes. The accuracy of the imputed genotypes exceeded 95%. The 13.5 million imputed variants were used in genome-wide association studies (GWAS) with progeny-derived phenotypes for milk-fat content at different lactation stages. The sequence-based GWAS identified nine QTL, among them a highly significantly associated QTL for milk-fat content in early lactation on chromosome 27. The QTL contains GPAT4, which encodes a rate-limiting enzyme in the triacylglycerol biosynthesis pathway and plays a key role in milk fat biosynthesis. The association was more significant than that obtained from using array-based markers only (5.79 × 10^-20 vs. 3.63 × 10^-11). The most significantly associated SNP is located in the 3'-UTR of GPAT4 and affects a putative microRNA binding site. The SNP reached high significance (4.01 × 10^-17) in an independent validation study with 2327 animals of the German Holstein-Friesian population and is an excellent candidate to be the underlying QTN for the milk-fat content QTL on chromosome 27.
International Society for Animal Genetics, Cairns; 07/2012
ABSTRACT: Supernumerary teats (hyperthelia, SNTs) are a common abnormality of the bovine udder with a medium to high heritability and a postulated oligogenic or polygenic inheritance pattern. SNTs not only negatively affect machine milking ability but also act as a reservoir for bacteria. A genome-wide association study was carried out to identify genes involved in the development of SNTs in the dual-purpose Fleckvieh breed. A total of 2467 progeny-tested bulls were genotyped at 43 698 single nucleotide polymorphisms, and daughter yield deviations (DYDs) for 'udder clearness' (UC) were used as high-heritability phenotypes. Massive structuring of the study population was accounted for by principal components analysis-based and mixed model-based approaches. Four loci on BTA5, BTA6, BTA11 and BTA17 were significantly associated with the UC DYD. Three associated regions contain genes of the highly conserved Wnt signalling pathway. The four QTL together account for 10.7% of the variance of the UC DYD, whereas the major fraction of the DYD variance is attributable to chromosomes with no identified QTL. Our results support both an oligogenic and a polygenic inheritance pattern of SNTs in cattle. The identified candidate genes permit insights into the genetic architecture of teat malformations in cattle and provide clues to unravel the molecular mechanisms of mammary gland alterations in cattle and other species.
ABSTRACT: Hitchhiking mapping and association studies are two popular approaches to map genotypes to phenotypes. In this study we combine both approaches to complement their specific strengths and weaknesses, resulting in a method with higher statistical power and fewer false positive signals. We applied our approach to dairy cattle as they underwent extremely successful selection for milk production traits and since an excellent phenotypic record is available. We performed whole genome association tests with a new mixed model approach to account for stratification, which we validated via Monte Carlo simulations. Selection signatures were inferred with the integrated haplotype score and a locus specific permutation based integrated haplotype score that works with a folded frequency spectrum and provides a formal test of significance to identify selection signatures.
About 1,600 out of 34,851 SNPs showed signatures of selection and the locus specific permutation based integrated haplotype score showed overall good accordance with the whole genome association study. Each approach provides distinct information about the genomic regions that influence complex traits. Combining whole genome association with hitchhiking mapping yielded two significant loci for the trait protein yield. These regions agree well with previous results from other selection signature scans and whole genome association studies in cattle.
We show that the combination of whole genome association and selection signature mapping based on the same SNPs increases the power to detect loci influencing complex traits. The locus specific permutation based integrated haplotype score provides a formal test of significance in selection signature mapping. Importantly it does not rely on knowledge of ancestral and derived allele states.
ABSTRACT: Genomic selection, where selection decisions are based on estimates of breeding value from genome-wide marker effects, has enormous potential to improve genetic gain in dairy and beef cattle. Although it has been successful in dairy cattle, some major challenges remain: 1) only a proportion of the genetic variance is captured, particularly for some traits; 2) marker effects are rarely consistent across breeds; 3) accuracy of genomic predictions decays rapidly over time. Using full genome sequences rather than DNA markers in genomic selection could address these challenges. However, sequencing all individuals in the very large resource populations required to estimate the typically small effects of mutations on target traits would be prohibitively expensive. An alternative is to sequence key ancestors contributing most of the genetic material of the current population, and to use this reference for imputation of sequence from SNP chip data. The reference set must still be large, in order to capture, for example, rare variants which are likely to explain some of the variation in our target traits. Recognising the need for a comprehensive “reference set” of key ancestors by many groups undertaking cattle research and cattle breeding programs, we have initiated the 1000 bull genomes project. The project will assemble whole genome sequences of cattle from institutions around the world, to provide an extended database for imputation of genetic variants. This will enable the bovine genomics community to impute full genome sequence from SNP genotypes, and then use these data for genomic selection and rapid discovery of causal mutations. Some preliminary results from the variant detection pipeline will be reported.
International Plant and Animal Genome Conference XX 2012; 01/2012