Accurate and comprehensive sequencing of personal genomes.

Genome Informatics Section, Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.
Genome Research (Impact Factor: 13.85). 07/2011; 21(9):1498-505. DOI: 10.1101/gr.123638.111
Source: PubMed

ABSTRACT As whole-genome sequencing becomes commoditized and we begin to sequence and analyze personal genomes for clinical and diagnostic purposes, it is necessary to understand what constitutes a complete sequencing experiment for determining genotypes and detecting single-nucleotide variants. Here, we show that the current recommendation of ∼30× coverage is not adequate to produce genotype calls across a large fraction of the genome with acceptably low error rates. Our results are based on analyses of a clinical sample sequenced on two related Illumina platforms, GAII(x) and HiSeq 2000, to a very high depth (126×). We used these data to establish genotype-calling filters that dramatically increase accuracy. We also empirically determined how the callable portion of the genome varies as a function of the amount of sequence data used. These results help provide a "sequencing guide" for future whole-genome sequencing decisions and metrics by which coverage statistics should be reported.


Available from: Hatice Ozel Abaan, Feb 21, 2014
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Next-generation DNA sequencing has revolutionized the field of genetics and genomics, providing researchers with the tools to efficiently identify novel rare and low frequency risk variants, which was not practical with previously available methodologies. These methods allow for the sequence capture of a specific locus or small genetic region all the way up to the entire six billion base pairs of the diploid human genome. Rheumatic diseases are a huge burden on the US population, affecting more than 46 million Americans. Those afflicted suffer from one or more of the more than 100 diseases characterized by inflammation and loss of function, mainly of the joints, tendons, ligaments, bones, and muscles. While genetics studies of many of these diseases (for example, systemic lupus erythematosus, rheumatoid arthritis, and inflammatory bowel disease) have had major successes in defining their genetic architecture, causal alleles and rare variants have still been elusive. This review describes the current high-throughput DNA sequencing methodologies commercially available and their application to rheumatic diseases in both case–control as well as family-based studies.
    Arthritis research & therapy 11/2014; 16(6):490. DOI:10.1186/s13075-014-0490-4 · 4.12 Impact Factor
  • [Show abstract] [Hide abstract]
    ABSTRACT: As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but costly. Here we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR or ensemble genotyping based filtering, false negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous SNVs; 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to the filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and ensemble genotyping would be essential to minimize false positive DNM candidates.This article is protected by copyright. All rights reserved
    Human Mutation 08/2014; 35(8). DOI:10.1002/humu.22587 · 5.05 Impact Factor
  • [Show abstract] [Hide abstract]
    ABSTRACT: To date, hundreds of fungal genomes have been sequenced and many more are in progress. This wealth of genomic information has provided new directions to study fungal biodiversity. However, to further dissect and understand the complicated biological mechanisms involved in fungal life styles, functional studies beyond genomes are required. Thanks to the developments of current -omics techniques, it is possible to produce large amounts of fungal functional data in a high-throughput fashion (e.g. transcriptome, proteome, etc.). The increasing ease of creating -omics data has also created a major challenge for downstream data handling and analysis. Numerous databases, tools and software have been created to meet this challenge. Facing such a richness of techniques and information, hereby we provide a brief roadmap on current wet-lab and bioinformatics approaches to study functional genomics in fungi.
    Briefings in functional genomics 07/2014; DOI:10.1093/bfgp/elu028 · 3.43 Impact Factor