Article

Haplotype-resolved genome sequencing of a Gujarati Indian individual.

Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, USA.
Nature Biotechnology (Impact Factor: 39.08). 01/2011; 29(1):59-63. DOI: 10.1038/nbt.1740
Source: PubMed

ABSTRACT Haplotype information is essential to the complete description and interpretation of genomes, genetic diversity and genetic ancestry. Although individual human genome sequencing is increasingly routine, nearly all such genomes are unresolved with respect to haplotype. Here we combine the throughput of massively parallel sequencing with the contiguity information provided by large-insert cloning to experimentally determine the haplotype-resolved genome of a South Asian individual. A single fosmid library was split into a modest number of pools, each providing ∼3% physical coverage of the diploid genome. Sequencing of each pool yielded reads overwhelmingly derived from only one homologous chromosome at any given location. These data were combined with whole-genome shotgun sequence to directly phase 94% of ascertained heterozygous single nucleotide polymorphisms (SNPs) into long haplotype blocks (N50 of 386 kilobases (kbp)). This method also facilitates the analysis of structural variation, for example, to anchor novel insertions to specific locations and haplotypes.
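Two quantities in the abstract can be checked with a little arithmetic. The sketch below is illustrative only and is not the authors' pipeline. It assumes that clones land uniformly at random, so the number of clones covering a position on one homolog is Poisson-distributed with mean equal to the pool's physical coverage (about 0.03). It also uses the common definition of N50 relative to the total length of all blocks, which the abstract does not spell out. The function names `both_homologs_fraction` and `n50` are hypothetical.

```python
import math


def both_homologs_fraction(pool_coverage):
    """Fraction of covered positions in one pool where BOTH homologs are present.

    Assumption (not from the paper): clone depth on each homolog is Poisson
    with mean `pool_coverage`, and the two homologs are independent.
    """
    # Probability that a given homolog is covered by at least one clone.
    p = 1.0 - math.exp(-pool_coverage)
    # P(both covered) / P(at least one covered) = p^2 / (1 - (1 - p)^2) = p / (2 - p)
    return p / (2.0 - p)


def n50(block_lengths):
    """Length L such that blocks of length >= L contain at least half the total length."""
    lengths = sorted(block_lengths, reverse=True)
    half = sum(lengths) / 2.0
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("block_lengths must be non-empty")


if __name__ == "__main__":
    # At ~3% physical coverage, only a small fraction of covered positions
    # in a pool would carry clones from both homologs under this model.
    print(f"{both_homologs_fraction(0.03):.4f}")
    # Toy haplotype block lengths in kbp (made-up values for illustration).
    print(n50([2, 3, 4, 5, 6]))
```

Under these assumptions, a pool at roughly 3% coverage mixes the two homologs at only a percent or two of its covered positions, which is consistent with the statement that reads from each pool derive overwhelmingly from one homologous chromosome.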

Available from: Alexandra P Mackenzie, Jun 28, 2015
  • Source
    ABSTRACT: We describe a method that exploits contiguity preserving transposase sequencing (CPT-seq) to facilitate the scaffolding of de novo genome assemblies. CPT-seq is an entirely in vitro means of generating libraries comprised of 9,216 indexed pools, each of which contains thousands of sparsely sequenced long fragments ranging from 5 kilobases to over 1 megabase. These pools are 'sub-haploid', in that the lengths of fragments contained in each pool sum to approximately 5 to 10% of the full genome. The scaffolding approach described here, termed fragScaff, leverages coincidences between the content of different pools as a source of contiguity information. Specifically, CPT-seq data are mapped to a de novo genome assembly, followed by the identification of pairs of contigs or scaffolds whose ends disproportionately co-occur in the same indexed pools, consistent with true adjacency in the genome. Such candidate 'joins' are used to construct a graph, which is then resolved by a minimum spanning tree. As a proof-of-concept, we apply CPT-seq and fragScaff to substantially boost the contiguity of de novo assemblies of the human, mouse, and fly genomes, increasing the scaffold N50 of de novo assemblies by 8- to 57-fold with high accuracy. We also demonstrate that fragScaff is complementary to Hi-C-based contact probability maps, providing mid-range contiguity to support robust, accurate chromosome-scale de novo genome assemblies without the need for laborious in vivo cloning steps. Finally, we demonstrate CPT-seq as a means of anchoring unplaced novel human contigs to the reference genome as well as for detecting misassembled sequences.
    Genome Research 10/2014; 24(12). DOI:10.1101/gr.178319.114 · 13.85 Impact Factor
  • Source
    ABSTRACT: There is increasing evidence that the phenotypic effects of genomic sequence variants are best understood in terms of variant haplotypes rather than as isolated polymorphisms. Haplotype analysis is also critically important for uncovering population histories, and for the study of evolutionary genetics. Although the sequencing of individual human genomes to reveal personal collections of sequence variants is now well established, there has been slower progress in the phasing of these variants into pairs of haplotypes along each pair of chromosomes. Here, we have developed a distinct approach to haplotyping that can yield chromosome-length haplotypes, including the vast majority of heterozygous SNPs in an individual human genome. This approach exploits the haploid nature of sperm cells, and employs a combination of genotyping and low-coverage sequencing on a short-read platform. In addition to generating chromosome-length haplotypes, the approach can directly identify recombination events (averaging 1.1 per chromosome) with a median resolution of less than 100 kb.
    Genome Research 01/2013; DOI:10.1101/gr.144600.112 · 13.85 Impact Factor
  • Source
    ABSTRACT: While standard DNA-sequencing approaches readily yield genotypic sequence data, haplotype information is often of greater utility for population genetic analyses. However, obtaining individual haplotype sequences can be costly and time-consuming and sometimes requires statistical reconstruction approaches that are subject to bias and error. Advancements have recently been made in determining individual chromosomal sequences in large-scale genomic studies, yet few options exist for obtaining this information from large numbers of highly polymorphic individuals in a cost-effective manner. As a solution, we developed a simple PCR-based method for obtaining sequence information from individual DNA strands using standard laboratory equipment. The method employs a water-in-oil emulsion to separate the PCR mixture into thousands of individual microreactors. PCR within these small vesicles results in amplification from only a single starting DNA template molecule and thus a single haplotype. We improved upon previous approaches by including SYBR Green I and a melted agarose solution in the PCR, allowing easy identification and separation of individually amplified DNA molecules. We demonstrate the use of this method on a highly polymorphic estuarine population of the copepod Eurytemora affinis for which current molecular and computational methods for haplotype determination have been inadequate.
    Molecular Ecology Resources 01/2013; 13(1):135-43. DOI:10.1111/1755-0998.12034 · 5.63 Impact Factor
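
The pool co-occurrence idea in the fragScaff abstract listed above (contig ends that share indexed pools are candidate joins, and the resulting graph is resolved by a minimum spanning tree) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the fragScaff implementation: the Jaccard score, the `min_shared` cutoff, the end names such as `c1_R`, and both function names are hypothetical choices made for this example.

```python
from itertools import combinations


def shared_pool_scores(end_pools, min_shared=2):
    """Score pairs of contig ends by how strongly their pool sets overlap.

    end_pools maps a contig-end name to the set of pool ids in which reads
    mapped to that end. The Jaccard index is an assumed scoring choice.
    """
    scores = {}
    for a, b in combinations(sorted(end_pools), 2):
        shared = len(end_pools[a] & end_pools[b])
        if shared >= min_shared:
            union = len(end_pools[a] | end_pools[b])
            scores[(a, b)] = shared / union
    return scores


def spanning_tree(scores):
    """Kruskal's algorithm with cost = 1 - score (so best-scoring joins go first)."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for (a, b), _score in sorted(scores.items(), key=lambda kv: -kv[1]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # keep the edge only if it does not close a cycle
            parent[root_a] = root_b
            tree.append((a, b))
    return tree


if __name__ == "__main__":
    # Made-up pool memberships for the ends of three contigs.
    end_pools = {
        "c1_R": {1, 2, 3, 4},
        "c2_L": {2, 3, 4, 5},
        "c2_R": {7, 8, 9},
        "c3_L": {7, 8, 9, 10},
        "c3_R": {1, 20},
    }
    print(spanning_tree(shared_pool_scores(end_pools)))
```

A real scaffolder would also need to account for contig orientation, the link between the two ends of the same contig, and a statistical test for "disproportionate" co-occurrence, none of which this sketch attempts.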