Based on pairwise identity-by-state (IBS) distances and whole-genome SNP data, kinship was investigated in the Israeli Holstein population. A total of 789 bulls, including most of the artificial insemination sires in service since 1987, were genotyped with the BovineSNP50 BeadChip. This sample spanned up to five generations. For each bull-by-bull combination, three states are possible at each marker: no match, a single allele matching, or both alleles matching. Summing over all markers, the 932 598 IBS scores (three match frequencies × 310 866 bull-by-bull combinations) were visualized as three-dimensional coordinates corresponding to the frequencies of the three possible states. Results were reduced to two dimensions using the transformations x′ = 0.7071(1 + freq1 − freq2) and y′ = 1.2247·freq0. Bull-by-bull pairs were grouped according to their level of kinship, and canonical scores were calculated by discriminant analysis using the x′ and y′ features. Of the 474 recorded maternal grandsire-grandson pairs in which both individuals were genotyped, 28 (about 5.9%) had a low probability (P < 0.05) of belonging to this level of kinship. Because this rate accumulates over the two generations separating a grandsire from his grandson, it suggests an error rate of around 3% per generation in pedigree determination.
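The IBS summary and the two-dimensional projection can be sketched as follows. This is a minimal illustration, not the authors' code, and it rests on assumptions: genotypes are coded as 0, 1 or 2 copies of one allele (missing calls are not handled), and freq0, freq1 and freq2 are taken to be the fractions of markers at which a pair shares 0, 1 or 2 alleles IBS. The constants 0.7071 and 1.2247 match 1/√2 and √(3/2), which place the three pure states on the corners of an equilateral triangle with side √2. The discriminant-analysis step is not shown.

```python
from itertools import combinations
from math import sqrt


def n_pairs(n_animals):
    """Number of distinct animal-by-animal combinations."""
    return n_animals * (n_animals - 1) // 2


def ibs_state(g1, g2):
    """Alleles shared identical by state at one SNP (genotypes coded 0/1/2)."""
    return 2 - abs(g1 - g2)


def ibs_frequencies(geno_a, geno_b):
    """Return (freq0, freq1, freq2): fractions of SNPs sharing 0, 1 or 2 alleles IBS."""
    counts = [0, 0, 0]
    for g1, g2 in zip(geno_a, geno_b):
        counts[ibs_state(g1, g2)] += 1
    n = len(geno_a)
    return tuple(c / n for c in counts)


def project(freq0, freq1, freq2):
    """Map the three frequencies (which sum to 1) onto the plane.

    Uses the exact values behind the rounded constants 0.7071 and 1.2247.
    """
    x = (1 + freq1 - freq2) / sqrt(2)
    y = sqrt(1.5) * freq0
    return x, y


def all_pair_points(genotypes):
    """Project every pair; genotypes maps animal ID -> list of 0/1/2 codes."""
    return {
        (a, b): project(*ibs_frequencies(genotypes[a], genotypes[b]))
        for a, b in combinations(sorted(genotypes), 2)
    }
```

For 789 bulls, `n_pairs(789)` gives 310 866 combinations, and three frequencies per pair give the 932 598 scores quoted in the text.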
... Similar recommendations can be given for the foreseeable exclusive use of SNPs in the near future. The evaluation and applicability of SNPs in parentage control have been shown in several studies [3,30,. However, it is also clear that the currently recommended minimum number of SNPs might not be sufficient to eliminate false-negative results [28,47]. ...
Methods for parentage control in cattle have changed since their initial implementation in the late 1950s, from blood group typing to the more recent determination of single nucleotide polymorphisms. In the early 1990s, 12 microsatellites were selected by the International Society for Animal Genetics based on their informativeness and robustness in a variety of cattle breeds. Since then, this panel has been used as the standard in cattle herd book breeding, and its application is accompanied by recurrent international comparison tests that ensure its continued validity for the most common commercial dairy and beef cattle breeds, for example Holstein Friesian, Simmental, Angus and Hereford. Although nearly every parentage case can be resolved using these microsatellites, cases involving very close relatives have become increasingly difficult to resolve in recent years. This is mainly due to increasing monomorphism and a trend towards allele fixation, even though no direct selection against their variability was applied. Other effects must therefore be presumed to cause the observed loss of polymorphism information content, heterozygosity and exclusion probabilities.
To determine changes in allele frequencies and exclusion probabilities, we analyzed the development of these parameters for the 12 microsatellites from 2004 to 2014. A total of 168,000 recorded Holstein Friesian cattle genotypes were evaluated. During this period, the frequencies of certain alleles at nine microsatellites increased significantly (t-values >5). Exclusion probabilities calculated for 11 microsatellites decreased in all three situations: one parent wrongly identified (p = 0.01), both parents wrongly identified (p = 0.005), and the genotype of one parent missing (p = 0.048). The addition of BM1818 to the marker set in 2009 reversed this trend, leading to significant increases in exclusion probabilities. Although the exclusion probabilities for the three family situations using the 12 microsatellites exceed 99%, 142 of 40,000 relationships in which one parent is missing would still remain unresolved. Twenty-five sires were identified that are responsible for the most significant microsatellite allele increases in the population. The corresponding alleles are mainly associated with milk protein and fat yield, body weight at birth and weaning, somatic cell score, milk fat percentage and longissimus muscle area.
Our data show that most of the microsatellites used for parentage control in cattle exhibit directional changes in allele frequencies consistent with the history of artificial selection in the German Holstein population.
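The exclusion probabilities discussed above can be illustrated for the case in which one parent's genotype is missing. The sketch below uses a widely used closed-form single-locus expression for exclusion when only the alleged parent and the offspring are typed, and combines loci under the assumption that they are independent. Whether the study used exactly this formulation is an assumption, and the allele frequencies are supplied by the caller. As a consistency check on the figures in the text, 142 unresolved cases in 40,000 correspond to a combined exclusion probability of about 1 − 142/40,000 ≈ 99.6%.

```python
from math import prod


def power_sum(freqs, k):
    """Sum of allele frequencies raised to the k-th power."""
    return sum(p ** k for p in freqs)


def ep_one_parent_missing(freqs):
    """Single-locus exclusion probability when only the alleged parent and
    offspring are genotyped (closed-form expression; freqs must sum to 1)."""
    a2, a3, a4 = (power_sum(freqs, k) for k in (2, 3, 4))
    return 1 - 4 * a2 + 2 * a2 ** 2 + 4 * a3 - 3 * a4


def combined_ep(per_locus_eps):
    """Independent loci: a false parent escapes only if no locus excludes it."""
    return 1 - prod(1 - ep for ep in per_locus_eps)


def expected_unresolved(n_cases, ep):
    """Expected number of false parentages that the marker set cannot exclude."""
    return round(n_cases * (1 - ep))
```

For a biallelic marker with equal allele frequencies the single-locus value is 0.125: exclusion happens only when the tested animal and the offspring are opposite homozygotes, which occurs with probability 2 × 0.25 × 0.25.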
The common approach to the multiplicity problem calls for controlling the familywise error rate (FWER). This approach, though, has faults, and we point out a few. A different approach to problems of multiple significance testing is presented. It calls for controlling the expected proportion of falsely rejected hypotheses – the false discovery rate. This error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise. Therefore, in problems where the control of the false discovery rate rather than that of the FWER is desired, there is potential for a gain in power. A simple sequential Bonferroni-type procedure is proved to control the false discovery rate for independent test statistics, and a simulation study shows that the gain in power is substantial. The use of the new procedure and the appropriateness of the criterion are illustrated with examples.
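The sequential Bonferroni-type procedure described here is what is now usually called the Benjamini-Hochberg step-up procedure. The following minimal sketch assumes independent p-values and a target false discovery rate `q`. It sorts the m p-values, finds the largest rank k with p₍ₖ₎ ≤ k·q/m, and rejects the k hypotheses with the smallest p-values.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected."""
    m = len(p_values)
    # Indices of the hypotheses ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank that meets its threshold,
        # even if some smaller ranks did not.
        if p_values[idx] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

With p-values `[0.01, 0.045, 0.025, 0.005, 0.2]` and `q = 0.05`, three hypotheses are rejected, whereas a Bonferroni threshold of 0.05/5 = 0.01 would reject only the two with p = 0.005 and p = 0.01. This illustrates the gain in power the abstract describes.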
Using SNP genotypes to apply genomic selection in breeding programs is becoming common practice. Tools to edit and check the quality of genotype data are required. Checking for Mendelian inconsistencies makes it possible to identify animals for which pedigree information and genotype information are not in agreement.
Straightforward tests exist to detect Mendelian inconsistencies by counting the number of opposing homozygous marker (e.g. SNP) genotypes between parent and offspring (PAR-OFF). Here, we develop two tests to identify Mendelian inconsistencies between sibs. The first test counts SNP with opposing homozygous genotypes between sib pairs (SIBCOUNT). The second test compares pedigree and SNP-based relationships (SIBREL). All tests iteratively remove animals in order of decreasing numbers of inconsistent parents and offspring or sibs. The PAR-OFF test, followed by either SIB test, was applied to a dataset comprising 2,078 genotyped cows and 211 genotyped sires. Theoretical expectations for the distributions of the test statistics of all three tests were calculated and compared with empirically derived values. Type I and II error rates were calculated by applying the tests to the edited data after Mendelian inconsistencies had been introduced by permuting pedigree against genotype data for various proportions of animals.
Both SIB tests identified animal pairs whose pedigree and genomic relationships could be considered inconsistent by visual inspection of a scatter plot of pairwise pedigree and SNP-based relationships. After removal of 235 animals with the PAR-OFF test, SIBCOUNT (SIBREL) identified 18 (22) additional inconsistent animals. Seventeen animals were identified by both methods. The numbers of incorrectly deleted animals (Type I error) were equally low for both methods, while the numbers of incorrectly retained animals (Type II error) were considerably higher for SIBREL than for SIBCOUNT.
Tests to remove Mendelian inconsistencies between sibs should be preceded by a test for parent-offspring inconsistencies. This parent-offspring test should not only consider parent-offspring pairs based on pedigree data, but also those based on SNP information. Both SIB tests could identify pairs of sibs with Mendelian inconsistencies. Based on type I and II error rates, counting opposing homozygotes between sibs (SIBCOUNT) appears slightly more precise than comparing genomic and pedigree relationships (SIBREL) to detect Mendelian inconsistencies between sibs.
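The counting test and the iterative removal step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Genotypes are coded 0/1/2 with -1 for missing calls, and `threshold` is a hypothetical cut-off on the number of opposing homozygotes. For sibs, the cut-off must allow for opposing homozygotes that can occur legitimately when a shared parent is heterozygous. SIBREL, which compares relationship coefficients, is not shown. The same counting function serves a PAR-OFF style test when the pairs are parent-offspring pairs, and a SIBCOUNT style test when they are sib pairs.

```python
from collections import Counter


def opposing_homozygotes(geno_a, geno_b):
    """Count SNPs where one animal is coded 0 and the other 2 (-1 = missing)."""
    return sum(
        1 for a, b in zip(geno_a, geno_b)
        if a >= 0 and b >= 0 and abs(a - b) == 2
    )


def iterative_removal(pairs, genotypes, threshold):
    """Drop animals one at a time, starting with the animal involved in the
    most inconsistent pairs, until no inconsistent pair remains.

    pairs: (id1, id2) tuples that the pedigree says are related.
    threshold: hypothetical maximum number of tolerated opposing homozygotes.
    """
    inconsistent = {
        (a, b) for a, b in pairs
        if opposing_homozygotes(genotypes[a], genotypes[b]) > threshold
    }
    removed = []
    while inconsistent:
        counts = Counter(animal for pair in inconsistent for animal in pair)
        worst, _ = counts.most_common(1)[0]
        removed.append(worst)
        # Pairs involving the removed animal no longer need explaining.
        inconsistent = {p for p in inconsistent if worst not in p}
    return removed
```

Removing the animal with the most conflicts first means that a wrongly recorded parent shared by several offspring is deleted once, instead of deleting every offspring.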
Nearly 57,000 single-nucleotide polymorphisms (SNP) genotyped with the Illumina BovineSNP50 BeadChip (Illumina Inc., San Diego, CA) were investigated to determine the usefulness of the associated SNP for genomic prediction. Genotypes were obtained for 12,591 bulls and cows, and SNP were selected based on 5,503 bulls with genotypes from a larger set of SNP. The following SNP were deleted: 6,572 that were monomorphic, 3,213 with scoring problems (primarily because of poor definition of clusters and an excess number of clusters), and 3,649 with a minor allele frequency of <2%. The number of SNP in each minor allele frequency class (≥2%) was fairly uniform (777 to 1,004). For 5 contiguous SNP assigned to chromosome 7, no bulls were heterozygous, which indicated that those SNP are actually on the nonpseudoautosomal portion of the X chromosome. Another 178 SNP that were not assigned to a chromosome but that had many fewer heterozygotes than expected were also assigned to the X chromosome. The existence of Hardy-Weinberg equilibrium was investigated by comparing observed with expected heterozygosity. For 11 SNP, the observed percentage of heterozygous individuals differed from the expected by >15%; therefore, those SNP were deleted. For 2,628 SNP, the genotype at another SNP was highly correlated (i.e., genotypes were identical for >99.5% of bulls), and those were deleted. After edits, 40,874 SNP remained. A parent-progeny conflict was declared when the genotypes were alternate homozygotes. The mean number of conflicts was 2.3 when the pedigree was correct and 2,411 when it was incorrect. The sire was genotyped for >93% of animals. The maternal grandsire genotype was similarly checked; however, because alternate homozygotes could be valid, a conflict threshold of 16% was used to indicate a need for further investigation. Genotyping consistency was investigated for 21 bulls genotyped twice; differences arose primarily from SNP that were not scored in one of the two genotypings.
Concordance for readable SNP was extremely high (99.96-100%). Thousands of SNP that were polymorphic in Holsteins were monomorphic in Jerseys or Brown Swiss, which indicated that breed-specific SNP sets are required or that all breeds need to be considered in the SNP selection process. Genotypes from the Illumina BovineSNP50 BeadChip are of high accuracy and provide the basis for genomic evaluations in the United States and Canada.
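Two of the SNP edits described above, the minor allele frequency filter and the heterozygosity check against Hardy-Weinberg expectation, can be sketched per SNP as follows. This is a hedged illustration, not the published editing pipeline. Genotypes are assumed to be coded 0/1/2 with -1 for missing. The ">15%" criterion is interpreted here, as an assumption, as an absolute difference in the proportion of heterozygotes. Scoring-problem and correlated-SNP edits are not shown.

```python
def allele_frequency(called):
    """Frequency of the allele counted by the 0/1/2 code among called genotypes."""
    return sum(called) / (2 * len(called))


def snp_passes_qc(calls, min_maf=0.02, max_het_dev=0.15):
    """Return True if a SNP survives the MAF and heterozygosity filters.

    max_het_dev is treated as an absolute difference between observed and
    expected heterozygosity (an assumed reading of the 15% criterion).
    """
    called = [g for g in calls if g >= 0]  # drop missing (-1) calls
    if not called:
        return False
    p = allele_frequency(called)
    maf = min(p, 1 - p)
    if maf < min_maf:  # also removes monomorphic SNP (maf == 0)
        return False
    observed_het = sum(1 for g in called if g == 1) / len(called)
    expected_het = 2 * p * (1 - p)  # Hardy-Weinberg expectation
    return abs(observed_het - expected_het) <= max_het_dev
```

A SNP at which every animal is heterozygous has an allele frequency of 0.5 but observed heterozygosity 1.0 against an expected 0.5, so it fails the check. Such a pattern often points to a genotype-calling problem rather than real biology.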
We present a simple algorithm for reconstruction of haplotypes from a sample of multilocus genotypes. The algorithm is aimed specifically at the analysis of very large pedigrees for small chromosomal segments, within which the recombination frequency can be assumed to be zero. The algorithm was tested both on simulated pedigrees of 155 individuals in a family structure of three generations and on real data of 1149 animals from the Israeli Holstein dairy cattle population, including 406 genotyped bulls but no genotyped females. The rate of haplotype resolution for the simulated data was >91%, with a standard deviation of 2%. With 20% missing data, the rate of haplotype resolution was 67.5%, with a standard deviation of 1.3%. In both cases, all recovered haplotypes were correct. In the real data, allele origin was resolved for 22% of the heterozygous genotypes, even though 70% of the genotypes were missing. Haplotypes were resolved for 36% of the males. Computing time was negligible for both data sets. Despite the intricacy of large-scale real pedigree genotypes, the proposed algorithm provides a practical rule-based solution for resolving haplotypes for small chromosomal segments in commercial animal populations.
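The kind of rule such a rule-based approach relies on can be shown with its simplest case: resolving the paternal allele of a son from his sire's genotype within a segment assumed to be free of recombination. This is a deliberately simplified, hypothetical illustration, not the published algorithm, which also propagates information through larger pedigrees. Genotypes are coded 0/1/2 as copies of a hypothetical allele "B" (the other allele is "A"), `None` marks a missing sire genotype, and Mendelian conflicts are not checked.

```python
def paternal_allele(son, sire):
    """Allele a son received from his sire at one biallelic SNP.

    Returns "A", "B", or None when the origin cannot be resolved from the
    sire alone. Mendelian conflicts (e.g. son 0, sire 2) are not detected.
    """
    if son == 0:
        return "A"  # homozygous son: both alleles are A
    if son == 2:
        return "B"
    # Heterozygous son: origin is clear only if the sire is homozygous.
    if sire == 0:
        return "A"
    if sire == 2:
        return "B"
    return None  # sire heterozygous or missing: unresolved


def paternal_haplotype(son_genos, sire_genos):
    """Apply the rule SNP by SNP across a short segment."""
    return [paternal_allele(s, f) for s, f in zip(son_genos, sire_genos)]
```

Positions returned as `None` are where extra relatives, such as the maternal grandsire, would be needed. This is consistent with the text's observation that only part of the heterozygous genotypes could be resolved when many genotypes were missing.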
Incorrect paternity assignment in cattle can have a major effect on rates of genetic gain. Of the 576 Israeli Holstein bulls genotyped with the BovineSNP50 BeadChip, there were 204 bulls whose sire was also genotyped. The results of 38 828 valid single nucleotide polymorphisms (SNPs) were used to validate paternity, determine genotyping error rates and establish criteria for deleting defective SNPs from further analysis. Based on the criterion of >2% conflicts between the genotypes of the putative sire and son, paternity was rejected for seven bulls (3.5%). The remaining bulls had one or two orders of magnitude fewer conflicts. Excluding these seven bulls, all other discrepancies between sire and son genotypes were assumed to be caused by genotyping mistakes. The frequency of discrepancies was >0.07 for nine SNPs and >0.025 for 81 SNPs. The overall frequency of discrepancies was reduced from 0.00017 to 0.00010 after deletion of these 81 SNPs, and the total expected fraction of genotyping errors was estimated to be 0.05%. Paternity of bulls that are genotyped for genomic selection may be verified or traced against candidate sires at virtually no additional cost.
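The two checks in this abstract, rejecting a recorded sire when more than 2% of SNPs conflict and flagging SNPs whose discrepancy frequency across accepted sire-son pairs is high, can be sketched as follows. This is a minimal sketch under assumptions: genotypes are coded 0/1/2 with -1 for missing, a "conflict" is taken to mean opposite homozygotes, and the function names are hypothetical.

```python
def conflict_rate(son, sire):
    """Fraction of jointly called SNPs at which sire and son are opposite homozygotes."""
    called = [(a, b) for a, b in zip(son, sire) if a >= 0 and b >= 0]
    if not called:
        return 0.0
    return sum(1 for a, b in called if abs(a - b) == 2) / len(called)


def check_paternity(son, sire, max_conflict=0.02):
    """Accept the recorded sire only if at most 2% of SNPs conflict."""
    return conflict_rate(son, sire) <= max_conflict


def snp_discrepancy_frequency(pairs):
    """Per-SNP discrepancy frequency over accepted pairs.

    pairs: list of (son, sire) genotype lists of equal length.
    """
    n_snps = len(pairs[0][0])
    freqs = []
    for j in range(n_snps):
        called = [(son[j], sire[j]) for son, sire in pairs
                  if son[j] >= 0 and sire[j] >= 0]
        bad = sum(1 for a, b in called if abs(a - b) == 2)
        freqs.append(bad / len(called) if called else 0.0)
    return freqs
```

SNPs to delete would then be those with a frequency above the chosen cut-off, for example `[j for j, f in enumerate(freqs) if f > 0.025]`.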
Efficient methods for processing genomic data were developed to increase reliability of estimated breeding values and to estimate thousands of marker effects simultaneously. Algorithms were derived and computer programs tested with simulated data for 2,967 bulls and 50,000 markers distributed randomly across 30 chromosomes. Estimation of genomic inbreeding coefficients required accurate estimates of allele frequencies in the base population. Linear model predictions of breeding values were computed by 3 equivalent methods: 1) iteration for individual allele effects followed by summation across loci to obtain estimated breeding values, 2) selection index including a genomic relationship matrix, and 3) mixed model equations including the inverse of genomic relationships. A blend of first- and second-order Jacobi iteration using 2 separate relaxation factors converged well for allele frequencies and effects. Reliability of predicted net merit for young bulls was 63%, compared with 32% using the traditional relationship matrix. Nonlinear predictions were also computed using iteration on data and nonlinear regression on marker deviations; an additional gain of about 3% in reliability for young bulls increased average reliability to 66%. Computing times increased linearly with the number of genotypes. Estimation of allele frequencies required 2 processor days, and genomic predictions required <1 d per trait, with traits processed in parallel. Information from genotyping was equivalent to about 20 daughters with phenotypic records. Actual gains may differ because the simulation did not account for linkage disequilibrium in the base population or selection in subsequent generations.
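To show why base-population allele frequencies matter for the genomic relationship matrix and genomic inbreeding, here is a sketch of one common construction: G = ZZ′ / (2Σpᵢ(1 − pᵢ)), where Z is the 0/1/2 genotype matrix centred by 2pᵢ at each SNP. Genomic inbreeding is then read off as the diagonal minus one. The exact scaling used in the study is not stated in this abstract, so treat this formulation as an assumption. Pure Python lists are used to keep the sketch dependency-free.

```python
def genomic_relationship(genotypes, base_freqs):
    """G = Z Z' / (2 * sum p(1 - p)), with Z = M - 2p and M coded 0/1/2.

    genotypes: one list of allele counts per animal.
    base_freqs: assumed base-population allele frequency for each SNP.
    """
    scale = 2 * sum(p * (1 - p) for p in base_freqs)
    z = [[g - 2 * p for g, p in zip(row, base_freqs)] for row in genotypes]
    n = len(z)
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
        for i in range(n)
    ]


def genomic_inbreeding(g_matrix):
    """Genomic inbreeding coefficients taken as diagonal elements minus one."""
    return [g_matrix[i][i] - 1 for i in range(len(g_matrix))]
```

Because every element of Z depends on pᵢ, biased allele frequencies shift both the relationships and the inbreeding coefficients. This is consistent with the abstract's remark that inbreeding estimation required accurate base-population frequencies.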
The effect of pedigree errors on estimated breeding value and genetic gain for a sex-limited trait with heritability of 0.25 was evaluated. Ten populations of 100,000 milking cows were simulated with correct paternity identification for all animals, and 10 populations were simulated with 10% incorrect paternal identification. The initial populations consisted of 100,000 unrelated individuals, and simulations were continued for 20 yr. The BLUP genetic evaluations were computed every year by an animal model analysis for each complete population. Estimated breeding values for the populations with 10% incorrect paternity were biased, especially in the later generations. Genetic gains were 4.3% higher with correct paternity identification. Reduction of pedigree errors by paternity confirmation of daughters of test sires by DNA microsatellites may result in considerable economic benefits, depending on the cost of testing in each country.
Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.
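The IBS and IBD summaries mentioned here link directly to the IBS match frequencies used in the first abstract. The sketch below computes two quantities as they are documented for PLINK's `--genome` output: the proportion of the genome shared IBD (PI_HAT = P(IBD = 2) + 0.5·P(IBD = 1)) and the IBS similarity reported in the DST column ((IBS2 + 0.5·IBS1) / N). The sketch assumes the IBD probabilities Z0, Z1 and Z2 have already been estimated. It does not reproduce PLINK's method-of-moments estimator for them.

```python
def pi_hat(z0, z1, z2):
    """Proportion of the genome shared identical by descent."""
    if abs(z0 + z1 + z2 - 1) > 1e-9:
        raise ValueError("IBD probabilities must sum to 1")
    return z2 + 0.5 * z1


def ibs_share(n_ibs0, n_ibs1, n_ibs2):
    """IBS similarity (IBS2 + 0.5 * IBS1) / N over N compared SNPs."""
    n = n_ibs0 + n_ibs1 + n_ibs2
    return (n_ibs2 + 0.5 * n_ibs1) / n
```

Parent-offspring pairs (Z = 0, 1, 0) and full sibs (Z = 0.25, 0.5, 0.25) both give PI_HAT = 0.5, which is why the separate Z values, or the kind of two-dimensional IBS display used in the first abstract, are needed to tell such relationships apart.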
Calus M.P.L., Mulder H.A. & Bastiaansen J.W.M. (2011) Identification of Mendelian inconsistencies between SNP and pedigree information of sibs. Genetics, Selection, Evolution 43, 34.
Israel C. & Weller J.I. (2000) Effect of misidentification on genetic gain and estimation of breeding value in dairy cattle populations. Journal of Dairy Science 83, 181–7.