-
Brock A Peters, Bahram G Kermani, Andrew B Sparks, Oleg Alferov, Peter Hong, Andrei Alexeev, Yuan Jiang, Fredrik Dahl, Y Tom Tang, Juergen Haas, [......], Kaliprasad Pothuraju, Karel Konvicka, Mike Tsoupko-Sitnikov, Krishna P Pant, Jessica C Ebert, Geoffrey B Nilsen, Jonathan Baccash, Aaron L Halpern, George M Church, Radoje Drmanac
ABSTRACT: Recent advances in whole-genome sequencing have brought the vision of personal genomics and genomic medicine closer to reality. However, current methods lack clinical accuracy and the ability to describe the context (haplotypes) in which genome variants co-occur in a cost-effective manner. Here we describe a low-cost DNA sequencing and haplotyping process, long fragment read (LFR) technology, which is similar to sequencing long single DNA molecules without cloning or separation of metaphase chromosomes. In this study, ten LFR libraries were made using only ∼100 picograms of human DNA per sample. Up to 97% of the heterozygous single nucleotide variants were assembled into long haplotype contigs. Removal of false positive single nucleotide variants not phased by multiple LFR haplotypes resulted in a final genome error rate of 1 in 10 megabases. Cost-effective and accurate genome sequencing and haplotyping from 10-20 human cells, as demonstrated here, will enable comprehensive genetic studies and diverse clinical applications.
Nature 07/2012; 487(7406):190-5. · 36.28 Impact Factor
-
Paolo Carnevali, Jonathan Baccash, Aaron L Halpern, Igor Nazarenko, Geoffrey B Nilsen, Krishna P Pant, Jessica C Ebert, Anushka Brownley, Matt Morenzoni, Vitali Karpinchyk, Bruce Martin, Dennis G Ballinger, Radoje Drmanac
ABSTRACT: Unchained base reads on self-assembling DNA nanoarrays have recently emerged as a promising approach to low-cost, high-quality resequencing of human genomes. Because of unique characteristics of these mated pair reads, existing computational methods for resequencing assembly, such as those based on map-consensus calling, are not adequate for accurate variant calling. We describe novel computational methods developed for accurate calling of SNPs and short substitutions and indels (<100 bp); the same methods apply to evaluation of hypothesized larger, structural variations. We use an optimization process that iteratively adjusts the genome sequence to maximize its a posteriori probability given the observed reads. For each candidate sequence, this probability is computed using Bayesian statistics with a simple read generation model and simplifying assumptions that make the problem computationally tractable. The optimization process iteratively applies one-base substitutions, insertions, and deletions until convergence is achieved to an optimum diploid sequence. A local de novo assembly procedure that generalizes approaches based on De Bruijn graphs is used to seed the optimization process in order to reduce the chance of converging to local optima. Finally, a correlation-based filter is applied to reduce the false positive rate caused by the presence of repetitive regions in the reference genome.
Journal of computational biology: a journal of computational molecular cell biology 12/2011; 19(3):279-92. · 1.69 Impact Factor
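The iterative optimization summarized in this abstract can be illustrated with a toy hill-climbing sketch. This is not the authors' implementation. It assumes a haploid sequence rather than a diploid one, a flat prior (so the posterior is proportional to the likelihood), ungapped best-placement scoring of each read, and an illustrative error rate. It also omits the De Bruijn-style seeding and the correlation-based filter. All names (`ERROR_RATE`, `read_log_likelihood`, `log_posterior`, `neighbors`, `optimize`) are hypothetical.

```python
import math

ALPHABET = "ACGT"
ERROR_RATE = 0.01  # assumed per-base read error rate, for illustration only


def read_log_likelihood(read, candidate):
    """Log-likelihood of one read at its best ungapped placement in candidate."""
    log_miss = math.log(ERROR_RATE / 3)
    log_match = math.log(1 - ERROR_RATE)
    if len(read) > len(candidate):
        # Read cannot be placed at all: score it as if every base mismatched.
        return len(read) * log_miss
    best = -math.inf
    for start in range(len(candidate) - len(read) + 1):
        window = candidate[start:start + len(read)]
        mismatches = sum(r != c for r, c in zip(read, window))
        score = mismatches * log_miss + (len(read) - mismatches) * log_match
        best = max(best, score)
    return best


def log_posterior(candidate, reads):
    """Flat prior assumed, so this is the log-likelihood up to a constant."""
    return sum(read_log_likelihood(r, candidate) for r in reads)


def neighbors(seq):
    """All sequences one single-base edit away from seq."""
    for i in range(len(seq)):
        yield seq[:i] + seq[i + 1:]  # deletion
        for b in ALPHABET:
            if b != seq[i]:
                yield seq[:i] + b + seq[i + 1:]  # substitution
    for i in range(len(seq) + 1):
        for b in ALPHABET:
            yield seq[:i] + b + seq[i:]  # insertion


def optimize(seed, reads):
    """Greedily apply the best single-base edit until no edit improves the score."""
    current, current_score = seed, log_posterior(seed, reads)
    while True:
        best, best_score = max(
            ((n, log_posterior(n, reads)) for n in neighbors(current)),
            key=lambda pair: pair[1],
        )
        if best_score <= current_score:
            return current  # converged (possibly to a local optimum)
        current, current_score = best, best_score


# Usage: error-free 6-base reads tiled over a short "true" sequence,
# with a seed that differs from it by one substitution.
truth = "ACGTTGCAAGGCTA"
reads = [truth[i:i + 6] for i in range(len(truth) - 5)]
result = optimize("ACGTTGCTAGGCTA", reads)
```

Greedy search of this kind can stall in a local optimum, which is the risk the abstract says the local de novo assembly seeding is meant to reduce.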
-
Radoje Drmanac, Andrew B Sparks, Matthew J Callow, Aaron L Halpern, Norman L Burns, Bahram G Kermani, Paolo Carnevali, Igor Nazarenko, Geoffrey B Nilsen, George Yeung, [......], Dylan Vu, Alexander Wait Zaranek, Xiaodi Wu, Snezana Drmanac, Arnold R Oliphant, William C Banyai, Bruce Martin, Dennis G Ballinger, George M Church, Clifford A Reid
ABSTRACT: Genome sequencing of large numbers of individuals promises to advance the understanding, treatment, and prevention of human diseases, among other applications. We describe a genome sequencing platform that achieves efficient imaging and low reagent consumption with combinatorial probe anchor ligation chemistry to independently assay each base from patterned nanoarrays of self-assembling DNA nanoballs. We sequenced three human genomes with this platform, generating an average of 45- to 87-fold coverage per genome and identifying 3.2 to 4.5 million sequence variants per genome. Validation of one genome data set demonstrates a sequence accuracy of about 1 false variant per 100 kilobases. The high accuracy, affordable cost of $4400 for sequencing consumables, and scalability of this platform enable complete human genome sequencing for the detection of rare variants in large-scale genetic studies.
Science 11/2009; 327(5961):78-81. · 31.20 Impact Factor
-
Kelly A Frazer, Eleazar Eskin, Hyun Min Kang, Molly A Bogue, David A Hinds, Erica J Beilharz, Robert V Gupta, Julie Montgomery, Matt M Morenzoni, Geoffrey B Nilsen, Charit L Pethiyagoda, Laura L Stuve, Frank M Johnson, Mark J Daly, Claire M Wade, David R Cox
ABSTRACT: A dense map of genetic variation in the laboratory mouse genome will provide insights into the evolutionary history of the species and lead to an improved understanding of the relationship between inter-strain genotypic and phenotypic differences. Here we resequence the genomes of four wild-derived and eleven classical strains. We identify 8.27 million high-quality single nucleotide polymorphisms (SNPs) densely distributed across the genome, and determine the locations of the high (divergent subspecies ancestry) and low (common subspecies ancestry) SNP-rate intervals for every pairwise combination of classical strains. Using these data, we generate a genome-wide haplotype map containing 40,898 segments, each with an average of three distinct ancestral haplotypes. For the haplotypes in the classical strains that are unequivocally assigned ancestry, the genetic contributions of the Mus musculus subspecies--M. m. domesticus, M. m. musculus, M. m. castaneus and the hybrid M. m. molossinus--are 68%, 6%, 3% and 10%, respectively; the remaining 13% of haplotypes are of unknown ancestral origin. The considerable regional redundancy of the SNP data will facilitate imputation of the majority of these genotypes in less-densely typed classical inbred strains to provide a complete view of variation in additional strains.
Nature 09/2007; 448(7157):1050-3. · 36.28 Impact Factor
-
ABSTRACT: Individual differences in DNA sequence are the genetic basis of human variability. We have characterized whole-genome patterns of common human DNA variation by genotyping 1,586,383 single-nucleotide polymorphisms (SNPs) in 71 Americans of European, African, and Asian ancestry. Our results indicate that these SNPs capture most common genetic variation as a result of linkage disequilibrium, the correlation among common SNP alleles. We observe a strong correlation between extended regions of linkage disequilibrium and functional genomic elements. Our data provide a tool for exploring many questions that remain regarding the causal role of common human DNA variation in complex human traits and for investigating the nature of genetic variation within and between human populations.
Science 03/2005; 307(5712):1072-9. · 31.20 Impact Factor