Automating resequencing-based detection of insertion-deletion polymorphisms.

Department of Bioengineering, University of Washington, Seattle, Washington 98195, USA.
Nature Genetics (Impact Factor: 29.65). 01/2007; 38(12):1457-62. DOI: 10.1038/ng1925
Source: PubMed

ABSTRACT Structural and insertion-deletion (indel) variants have received considerable recent attention, partly because of their phenotypic consequences. Among these variants, the most common are small indels (approximately 1-30 bp). Identifying and genotyping indels using sequence traces obtained from diploid samples requires extensive manual review, which makes large-scale studies inconvenient. We report a new algorithm, implemented in available software (PolyPhred version 6.0), to help automate detection and genotyping of indels from sequence traces. The algorithm identifies heterozygous individuals, which permits the discovery of low-frequency indels. It finds 80% of all indel polymorphisms with almost no false positives and finds 97% with a false discovery rate of 10%. Additionally, genotyping accuracy exceeds 99%, and it correctly infers indel length in 96% of the cases. Using this approach, we identify indels in the HapMap ENCODE regions, providing the first report of these polymorphisms in this data set.
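
To make the idea of inferring indel length from a heterozygous trace concrete, here is a minimal Python sketch. It is not PolyPhred's algorithm, which the abstract does not describe; it only illustrates why a heterozygous deletion yields mixed peaks downstream of the breakpoint, and how a shift could be recovered from them. The function name `infer_het_indel_shift`, the representation of peaks as sets of bases, and the assumption that the breakpoint position is already known are all hypothetical simplifications.

```python
def infer_het_indel_shift(ref, peaks, start, max_shift=30):
    """Guess the length of a heterozygous deletion from mixed peaks.

    ref       : reference sequence (str)
    peaks     : list of sets; peaks[i] holds the base(s) seen at trace position i
    start     : first position after the suspected breakpoint (assumed known)
    max_shift : largest shift to try (30 mirrors the 1-30 bp range in the abstract)

    Returns (best_shift, fraction_of_positions_explained).
    """
    best_shift, best_score = None, 0.0
    for d in range(1, max_shift + 1):
        # Positions where both the trace and the shifted reference exist.
        positions = range(start, min(len(peaks), len(ref) - d))
        if len(positions) == 0:
            continue
        # One allele reads ref[i]; the allele carrying the deletion reads ref[i + d].
        hits = sum(1 for i in positions if peaks[i] == {ref[i], ref[i + d]})
        score = hits / len(positions)
        if score > best_score:  # strict '>' keeps the smallest shift on ties
            best_shift, best_score = d, score
    return best_shift, best_score


# Toy example: simulate a heterozygous 2 bp deletion starting at position 8.
ref = "ACGTTGCAAGCTTAGGCTAACGT"
peaks = [{ref[i]} for i in range(8)] + [{ref[i], ref[i + 2]} for i in range(8, 21)]
print(infer_het_indel_shift(ref, peaks, start=8))  # (2, 1.0)
```

Real traces are noisy, the breakpoint must itself be detected, and insertions need the mirror-image comparison, so this sketch should be read only as an illustration of the underlying signal.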

  • ABSTRACT: Background: Genome editing techniques, including ZFN, TALEN and CRISPR, have created a need to rapidly screen many F1 individuals to identify carriers of indels and determine the sequences of the mutations. Current techniques require multiple clones of the targeted region to be sequenced for each individual, which is inefficient when many individuals must be analyzed. Direct Sanger sequencing of a PCR-amplified region surrounding the target site is efficient, but Sanger sequencing of genomes heterozygous for an indel results in a string of “double peaks” due to the mismatched region. Results: In order to facilitate indel identification, we developed an online tool called Poly Peak Parser that is able to separate chromatogram data containing ambiguous base calls into wild-type and mutant allele sequences. This tool allows the nature of the indel to be determined from a single sequencing run per individual performed directly on a PCR product spanning the targeted site, without cloning. Conclusions: The method and algorithm described here facilitate rapid identification and sequence characterization of heterozygous mutant carriers generated by genome editing. Although designed for screening F1 individuals, this tool can also be used to identify heterozygous indels in many contexts.
    Developmental Dynamics 12/2014; 243(12). DOI:10.1002/dvdy.24183 · 2.67 Impact Factor
  • ABSTRACT: Building a population-specific catalogue of single nucleotide variants (SNVs), indels and structural variants (SVs) with frequencies, termed a national pan-genome, is critical for further advancing clinical and public health genetics in large cohorts. Here we report a Danish pan-genome obtained from sequencing 10 trios to high depth (50×). We report 536k novel SNVs and 283k novel short indels from mapping approaches and develop a population-wide de novo assembly approach to identify 132k novel indels larger than 10 nucleotides with low false discovery rates. We identify a higher proportion of indels and SVs than previous efforts showing the merits of high coverage and de novo assembly approaches. In addition, we use trio information to identify de novo mutations and use a probabilistic method to provide direct estimates of 1.27e-8 and 1.5e-9 per nucleotide per generation for SNVs and indels, respectively.
    Nature Communications 01/2015; 6:5969. DOI:10.1038/ncomms6969 · 10.74 Impact Factor
  • ABSTRACT: Complex human diseases usually have multifactorial causes, and may develop as a result of the collective effects of multiple genetic variants, complex gene-gene/gene-environment interactions, rare sequence variants, copy number alterations, epigenetic modifications, etc. Understanding the genetic aetiology of complex human diseases requires a comprehensive assessment of these causes. Recently, penalised regression methods have gained popularity in genetic research, aiming to detect genetic, epigenetic and environmental factors contributing to complex human diseases. In this article, we attempt to provide a brief overview of these methods in light of their applications in various contexts of genetic research.

