Article

A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals.

Department of Statistics, University of Auckland, Auckland 1142, New Zealand.
The American Journal of Human Genetics (Impact Factor: 11.2). 03/2009; 84(2):210-23. DOI: 10.1016/j.ajhg.2009.01.005
Source: PubMed

ABSTRACT We present methods for imputing data for ungenotyped markers and for inferring haplotype phase in large data sets of unrelated individuals and parent-offspring trios. Our methods make use of known haplotype phase when it is available, and our methods are computationally efficient so that the full information in large reference panels with thousands of individuals is utilized. We demonstrate that substantial gains in imputation accuracy accrue with increasingly large reference panel sizes, particularly when imputing low-frequency variants, and that unphased reference panels can provide highly accurate genotype imputation. We place our methodology in a unified framework that enables the simultaneous use of unphased and phased data from trios and unrelated individuals in a single analysis. For unrelated individuals, our imputation methods produce well-calibrated posterior genotype probabilities and highly accurate allele-frequency estimates. For trios, our haplotype-inference method is four orders of magnitude faster than the gold-standard PHASE program and has excellent accuracy. Our methods enable genotype imputation to be performed with unphased trio or unrelated reference panels, thus accounting for haplotype-phase uncertainty in the reference panel. We present a useful measure of imputation accuracy, allelic R(2), and show that this measure can be estimated accurately from posterior genotype probabilities. Our methods are implemented in version 3.0 of the BEAGLE software package.

1 Bookmark
 · 
203 Views
  • [Show abstract] [Hide abstract]
    ABSTRACT: A major use of the 1000 Genomes Project (1000GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000GP haplotype reference set for use by the human genetic community. Using a set of validation genotypes at SNP and bi-allelic indels we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants.
    Nature Communications 01/2014; 5:3934. DOI:10.1038/ncomms4934 · 10.74 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Higher-order chromatin structure is emerging as an important regulator of gene expression. Although dynamic chromatin structures have been identified in the genome, the full scope of chromatin dynamics during mammalian development and lineage specification remains to be determined. By mapping genome-wide chromatin interactions in human embryonic stem (ES) cells and four human ES-cell-derived lineages, we uncover extensive chromatin reorganization during lineage specification. We observe that although self-associating chromatin domains are stable during differentiation, chromatin interactions both within and between domains change in a striking manner, altering 36% of active and inactive chromosomal compartments throughout the genome. By integrating chromatin interaction maps with haplotype-resolved epigenome and transcriptome data sets, we find widespread allelic bias in gene expression correlated with allele-biased chromatin states of linked promoters and distal enhancers. Our results therefore provide a global view of chromatin dynamics and a resource for studying long-range control of gene expression in distinct human cell lineages.
    Nature 02/2015; 518(7539):331-6. DOI:10.1038/nature14222 · 42.35 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Exploitation of heterosis is one of the most important applications of genetics in agriculture. However, the genetic mechanisms of heterosis are only partly understood, and a global view of heterosis from a representative number of hybrid combinations is lacking. Here we develop an integrated genomic approach to construct a genome map for 1,495 elite hybrid rice varieties and their inbred parental lines. We investigate 38 agronomic traits and identify 130 associated loci. In-depth analyses of the effects of heterozygous genotypes reveal that there are only a few loci with strong overdominance effects in hybrids, but a strong correlation is observed between the yield and the number of superior alleles. While most parental inbred lines have only a small number of superior alleles, high-yielding hybrid varieties have several. We conclude that the accumulation of numerous rare superior alleles with positive dominance is an important contributor to the heterotic phenomena.
    Nature Communications 02/2015; 6:6258. DOI:10.1038/ncomms7258 · 10.74 Impact Factor

Preview

Download
4 Downloads
Available from