A Unified Approach to Genotype Imputation and Haplotype-Phase Inference for Large Data Sets of Trios and Unrelated Individuals

Department of Statistics, University of Auckland, Auckland 1142, New Zealand.
The American Journal of Human Genetics (Impact Factor: 10.99). 03/2009; 84(2):210-23. DOI: 10.1016/j.ajhg.2009.01.005
Source: PubMed

ABSTRACT We present methods for imputing data for ungenotyped markers and for inferring haplotype phase in large data sets of unrelated individuals and parent-offspring trios. Our methods make use of known haplotype phase when it is available, and our methods are computationally efficient so that the full information in large reference panels with thousands of individuals is utilized. We demonstrate that substantial gains in imputation accuracy accrue with increasingly large reference panel sizes, particularly when imputing low-frequency variants, and that unphased reference panels can provide highly accurate genotype imputation. We place our methodology in a unified framework that enables the simultaneous use of unphased and phased data from trios and unrelated individuals in a single analysis. For unrelated individuals, our imputation methods produce well-calibrated posterior genotype probabilities and highly accurate allele-frequency estimates. For trios, our haplotype-inference method is four orders of magnitude faster than the gold-standard PHASE program and has excellent accuracy. Our methods enable genotype imputation to be performed with unphased trio or unrelated reference panels, thus accounting for haplotype-phase uncertainty in the reference panel. We present a useful measure of imputation accuracy, allelic R², and show that this measure can be estimated accurately from posterior genotype probabilities. Our methods are implemented in version 3.0 of the BEAGLE software package.
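
The abstract states that allelic R² can be estimated from posterior genotype probabilities. The following is a minimal sketch of one plug-in way to do this, not necessarily the exact estimator implemented in BEAGLE. It assumes the posterior probabilities are well calibrated, treats allelic R² as the squared correlation between the allele dosage of the most likely genotype and the unobserved true dosage, and replaces the unknown sums over true dosages with their posterior expectations. The function name `estimated_allelic_r2` and the input layout (one row per individual, columns for 0, 1, and 2 copies of the counted allele) are illustrative assumptions.

```python
import numpy as np


def estimated_allelic_r2(posteriors):
    """Plug-in estimate of allelic R^2 for one marker.

    posteriors: array of shape (n, 3); row i holds the posterior
    probabilities that individual i carries 0, 1, or 2 copies of the
    counted allele (hypothetical input layout).
    """
    post = np.asarray(posteriors, dtype=float)
    if post.ndim != 2 or post.shape[1] != 3:
        raise ValueError("expected an (n, 3) array of genotype probabilities")
    if not np.allclose(post.sum(axis=1), 1.0):
        raise ValueError("each row of probabilities must sum to 1")

    n = post.shape[0]
    z = post.argmax(axis=1).astype(float)  # dosage of the most likely genotype
    u = post[:, 1] + 2.0 * post[:, 2]      # posterior mean of the true dosage
    w = post[:, 1] + 4.0 * post[:, 2]      # posterior mean of the squared dosage

    # Sums of squares and cross-products, with sums over the unobserved
    # true dosage replaced by their posterior expectations.
    cross = np.sum(z * u) - u.sum() * z.sum() / n
    ss_true = w.sum() - u.sum() ** 2 / n
    ss_call = np.sum(z * z) - z.sum() ** 2 / n

    if ss_true <= 0.0 or ss_call <= 0.0:
        return float("nan")  # correlation undefined for a monomorphic marker
    return cross ** 2 / (ss_true * ss_call)


if __name__ == "__main__":
    example = [
        [0.9, 0.1, 0.0],
        [0.1, 0.8, 0.1],
        [0.0, 0.2, 0.8],
        [0.6, 0.4, 0.0],
    ]
    print(estimated_allelic_r2(example))
```

When every genotype is called with certainty, the estimate equals 1; growing posterior uncertainty pulls it toward 0, which is the behavior one would want from an accuracy measure that can be computed without knowing the true genotypes.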

  • ABSTRACT: Background: Both genome-wide association (GWA) studies and genomic selection depend on the level of non-random association of alleles at different loci, i.e. linkage disequilibrium (LD), across the genome. Therefore, characterizing LD is of fundamental importance to implement both approaches. In this study, using a 60K single nucleotide polymorphism (SNP) panel, we estimated LD and haplotype structure in crossbred broiler chickens and their component pure lines (one male and two female lines) and calculated the consistency of LD between these populations. Results: The average level of LD (measured by r²) between adjacent SNPs across the chicken autosomes studied here ranged from 0.34 to 0.40 in the pure lines but was only 0.24 in the crossbred populations, with 28.4% of adjacent SNP pairs having an r² higher than 0.3. Compared with the pure lines, the crossbred populations consistently showed a lower level of LD, smaller haploblock sizes and lower haplotype homozygosity on macro-, intermediate and micro-chromosomes. Furthermore, correlations of LD between markers at short distances (0 to 10 kb) were high between crossbred and pure lines (0.83 to 0.94). Conclusions: Our results suggest that using crossbred populations instead of pure lines can be advantageous for high-resolution QTL (quantitative trait loci) mapping in GWA studies and to achieve good persistence of accuracy of genomic breeding values over generations in genomic selection. These results also provide useful information for the design and implementation of GWA studies and genomic selection using crossbred populations. Electronic supplementary material: The online version of this article (doi:10.1186/s12711-015-0098-4) contains supplementary material, which is available to authorized users.
    Genetics Selection Evolution 02/2015; 47(1). DOI:10.1186/s12711-015-0098-4 · 3.75 Impact Factor
  • ABSTRACT: Background: Cattle breeding populations are susceptible to the propagation of recessive diseases. Individual sires generate tens of thousands of progeny via artificial insemination. The frequency of deleterious alleles carried by such sires may increase considerably within a few generations. Deleterious alleles often manifest themselves as missing homozygosity resulting from embryonic/fetal, perinatal or juvenile lethality of homozygotes. Results: A scan for homozygous haplotype deficiency in 25,544 Fleckvieh cattle uncovered four haplotypes affecting reproductive and rearing success. Exploiting whole-genome resequencing data from 263 animals made it possible to pinpoint putatively causal mutations in two of these haplotypes. A mutation causing an evolutionarily unlikely substitution in SUGT1 was perfectly associated with a haplotype compromising insemination success. The mutation was not found in homozygous state in 10,363 animals (P = 1.79 × 10⁻⁵) and is thus likely to cause lethality of homozygous embryos. A frameshift mutation in SLC2A2 encoding glucose transporter 2 (GLUT2) compromises calf survival. The mutation leads to premature termination of translation and activates cryptic splice sites resulting in multiple exon variants also with premature translation termination. The affected calves exhibit stunted growth, resembling the phenotypic appearance of Fanconi-Bickel syndrome in humans (OMIM 227810), which is also caused by mutations in SLC2A2. Conclusions: Exploiting comprehensive genotype and sequence data enabled us to reveal two deleterious alleles in SLC2A2 and SUGT1 that compromise pre- and postnatal survival in homozygous state. Our results provide the basis for genome-assisted approaches to avoiding inadvertent carrier matings and to improving reproductive and rearing success in Fleckvieh cattle. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-1483-7) contains supplementary material, which is available to authorized users.
    BMC Genomics 04/2015; 16(1). DOI:10.1186/s12864-015-1483-7 · 4.04 Impact Factor
  • Nature Genetics 02/2015; 47(2):172-179. · 29.65 Impact Factor
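
The first abstract in the list above reports linkage disequilibrium between adjacent SNPs as r². As a minimal sketch, assuming phased haplotypes coded 0/1 at two biallelic SNPs (the cited study may have worked from genotype data with different software), the standard pairwise r² can be computed as below. The function name `ld_r2` is hypothetical.

```python
import numpy as np


def ld_r2(hap_a, hap_b):
    """Pairwise LD r^2 between two biallelic SNPs.

    hap_a, hap_b: equal-length sequences of 0/1 alleles, one entry per
    phased haplotype (assumed input format).
    """
    a = np.asarray(hap_a, dtype=float)
    b = np.asarray(hap_b, dtype=float)
    if a.ndim != 1 or a.shape != b.shape or a.size == 0:
        raise ValueError("expected two non-empty 1-D arrays of equal length")

    p_a = a.mean()           # frequency of allele 1 at the first SNP
    p_b = b.mean()           # frequency of allele 1 at the second SNP
    p_ab = np.mean(a * b)    # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b     # disequilibrium coefficient D

    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        return float("nan")  # r^2 is undefined if either SNP is monomorphic
    return d * d / denom


if __name__ == "__main__":
    print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))  # identical SNPs
    print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))  # independent SNPs
```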

