Article

Automating resequencing-based detection of insertion-deletion polymorphisms

Department of Bioengineering, University of Washington, Seattle, Washington 98195, USA.
Nature Genetics. 01/2007; 38(12):1457-62. DOI: 10.1038/ng1925

ABSTRACT

Structural and insertion-deletion (indel) variants have received considerable recent attention, partly because of their phenotypic consequences. Among these variants, the most common are small indels (approximately 1–30 bp). Identifying and genotyping indels using sequence traces obtained from diploid samples requires extensive manual review, which makes large-scale studies impractical. We report a new algorithm, implemented in available software (PolyPhred version 6.0), to help automate detection and genotyping of indels from sequence traces. The algorithm identifies heterozygous individuals, which permits the discovery of low-frequency indels. It finds 80% of all indel polymorphisms with almost no false positives and 97% with a false discovery rate of 10%. Genotyping accuracy exceeds 99%, and the algorithm correctly infers indel length in 96% of cases. Using this approach, we identify indels in the HapMap ENCODE regions, providing the first report of these polymorphisms in this data set.
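The abstract does not describe how PolyPhred 6.0 detects an indel or infers its length, but the signal it works from is well known: in a heterozygote both alleles are sequenced together, so downstream of the indel the trace shows two overlapping copies of the sequence, offset by the indel length. The Python sketch below illustrates only that idea and is not the PolyPhred algorithm. It assumes the indel position is already known, that the trace has been reduced to a set of called bases per position (two bases at a double peak), and it tries every candidate length up to 30 bp, the upper end of the small-indel range. The names `expected_calls`, `infer_indel`, and `min_sites` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# One set of called bases per trace position: one base for a clean peak,
# two bases where two peaks overlap.
Calls = List[Set[str]]


def expected_calls(ref: str, kind: str, length: int, n: int) -> List[Optional[Set[str]]]:
    """Bases expected at the first n positions downstream of a heterozygous indel.

    Allele 1 follows the reference; allele 2 is offset by `length` bases.
    Positions where allele 2's base is unknown (or off the end of `ref`) are None.
    """
    out: List[Optional[Set[str]]] = []
    for i in range(n):
        # A deletion makes allele 2 skip ahead; an insertion makes it lag behind
        # while the (unknown) inserted bases are being read.
        j = i + length if kind == "deletion" else i - length
        if i < len(ref) and 0 <= j < len(ref):
            out.append({ref[i], ref[j]})
        else:
            out.append(None)
    return out


def infer_indel(ref: str, observed: Calls, max_len: int = 30,
                min_sites: int = 10) -> Tuple[str, int, float]:
    """Return (kind, length, score) for the offset that best explains `observed`.

    score is the fraction of comparable positions whose observed base set exactly
    matches the expected one. Ties keep the first hypothesis tried (deletions
    before insertions, shorter lengths first).
    """
    best: Tuple[str, int, float] = ("none", 0, 0.0)
    for kind in ("deletion", "insertion"):
        for length in range(1, max_len + 1):
            expected = expected_calls(ref, kind, length, len(observed))
            pairs = [(e, o) for e, o in zip(expected, observed) if e is not None]
            if len(pairs) < min_sites:
                continue  # too few informative positions to judge this length
            score = sum(e == o for e, o in pairs) / len(pairs)
            if score > best[2]:
                best = (kind, length, score)
    return best


if __name__ == "__main__":
    ref = "ACGTTGCAAGTCCGATGCTAGGCATTACGGATCCTAGCAATGCGTACTTAGCCATG"
    # Simulated heterozygous 3-bp deletion starting at position 0 of `ref`.
    observed = [{ref[i], ref[i + 3]} for i in range(len(ref) - 3)]
    print(infer_indel(ref, observed))  # ('deletion', 3, 1.0)
```

Real traces add noise, and in repetitive or low-complexity sequence several candidate lengths can explain the calls almost equally well, so a practical tool would also need tie-breaking rules and a score threshold rather than simply taking the maximum.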

CITING ARTICLES

    • "Construction of a high-density linkage map using a large pedigree population and high-quality molecular markers is urgently required for MAS breeding in trees. Gene-derived markers, including simple sequence repeat (SSR) and small insertion and deletion (InDels of 1–30 bp) markers, in the coding or regulatory regions of genes, can alter gene function, transcription, or translation (Bhangale et al., 2006; Du et al., 2013b). Indeed, compared with nongenic markers, gene-derived markers are more reliable for construction of a high-resolution linkage map and can uncover a detailed picture of the QTL responsible for complex traits (Fukuoka et al., 2012). "
    ABSTRACT: Deciphering the genetic architecture underlying polygenic traits in perennial species can inform molecular marker-assisted breeding. Recent advances in high-throughput sequencing have enabled strategies that integrate linkage–linkage disequilibrium (LD) mapping in Populus. We used an integrated method of quantitative trait locus (QTL) dissection with a high-resolution linkage map and multi-gene association mapping to decipher the nature of genetic architecture (additive, dominant, and epistatic effects) of potential QTLs for growth traits in a Populus linkage population (1200 progeny) and a natural population (435 individuals). Seventeen QTLs for tree height, diameter at breast height, and stem volume mapped to 11 linkage groups (logarithm of odds (LOD) ≥ 2.5), and explained 2.7–18.5% of the phenotypic variance. After comparative mapping and transcriptome analysis, 187 expressed genes (10 046 common single nucleotide polymorphisms (SNPs)) were selected from the segmental homology regions (SHRs) of 13 QTLs. Using multi-gene association models, we observed 202 significant SNPs in 63 promising genes from 10 QTLs (P ≤ 0.0001; FDR ≤ 0.10) that exhibited reproducible associations with additive/dominant effects, and further determined 11 top-ranked genes tightly linked to the QTLs. Epistasis analysis uncovered a uniquely interconnected gene–gene network for each trait. This study opens up opportunities to uncover the causal networks of interacting genes in plants using an integrated linkage–LD mapping approach.
    Article · Oct 2015 · New Phytologist
    • "Indels are frequently binned into the categories of 'small' and 'large' based on sequence length. Small indels span ∼1–30 bp, whereas large indels can add or remove thousands of base pairs (Bhangale et al. 2006). The molecular mechanisms underlying large indels are fairly well understood; these include transposable element proliferation, transposable element-mediated ectopic recombination, slipped-strand mispairing, and nonhomologous end joining (Petrov et al. 2003; Bennetzen et al. 2005; Ju et al. 2011). "
    ABSTRACT: Evolutionary changes in genome size result from the combined effects of mutation, natural selection, and genetic drift. Insertion and deletion mutations (indels) directly impact genome size by adding or removing sequences. Most species lose more DNA through small indels (i.e. ∼1-30 bp) than they gain, which can result in genome reduction over time. Because this rate of DNA loss varies across species, small indel dynamics have been suggested to contribute to genome size evolution. Species with extremely large genomes provide interesting test cases for exploring the link between small indels and genome size; however, most large genomes remain relatively unexplored. Here, we examine rates of DNA loss in the tetrapods with the largest genomes: the salamanders. We used low-coverage genomic shotgun sequence data from four salamander species to examine patterns of insertion, deletion, and substitution in neutrally evolving non-LTR retrotransposon sequences. For comparison, we estimated genome-wide DNA loss rates in non-LTR retrotransposon sequences from five other vertebrate genomes: Anolis carolinensis, Danio rerio, Gallus gallus, Homo sapiens, and Xenopus tropicalis. Our results show that salamanders have significantly lower rates of DNA loss than do other vertebrates. More specifically, salamanders experience lower numbers of deletions relative to insertions, and both deletions and insertions are skewed towards smaller sizes. Based on these patterns, we conclude that slow DNA loss contributes to genomic gigantism in salamanders. We also identify candidate molecular mechanisms underlying these differences and suggest that natural variation in indel dynamics provides a unique opportunity to study the basis of genome stability.
    Article · Nov 2012 · Genome Biology and Evolution
    • "In examining the MAF distribution of indels detected from NGS, we found that the distribution was consistent with studies of indels detected from sequence traces [1,6]. MAF distributions for CEU and CHB+JPT indels, excluding singletons, were found to have similar proportions of low frequency (MAF < 5%) and common indel variants (MAF > 5%), whereas YRI showed enrichment of low frequency indels relative to the other populations (Figure S1 in Additional file 1). "
    ABSTRACT: Indels are an important cause of human variation and central to the study of human disease. The 1000 Genomes Project Low-Coverage Pilot identified over 1.3 million indels shorter than 50 bp, of which over 890 were identified as potentially disruptive variants. Yet, despite their ubiquity, the local genomic characteristics of indels remain unexplored. Herein we describe population- and minor allele frequency-based differences in linkage disequilibrium and imputation characteristics for indels included in the 1000 Genomes Project Low-Coverage Pilot for the CEU, YRI and CHB+JPT populations. Common indels were well tagged by nearby SNPs in all studied populations, and were also tagged at a similar rate to common SNPs. Both neutral and functionally deleterious common indels were imputed with greater than 95% concordance from HapMap Phase 3 and OMNI SNP sites. Further, 38 to 56% of low frequency indels were tagged by low frequency SNPs. We were able to impute heterozygous low frequency indels with over 50% concordance. Lastly, our analysis also revealed evidence of ascertainment bias. This bias prevents us from extending the applicability of our results to highly polymorphic indels that could not be identified in the Low-Coverage Pilot. Although further scope exists to improve the imputation of low frequency indels, our study demonstrates that there are already ample opportunities to retrospectively impute indels for prior genome-wide association studies and to incorporate indel imputation into future case/control studies.
    Article · Feb 2012 · Genome Biology
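The last citing article relies on two simple quantities: minor allele frequency (MAF), used to split indels into low-frequency (MAF < 5%) and common (MAF > 5%) classes after excluding singletons, and genotype concordance, used to score imputation. The Python sketch below assumes biallelic sites with diploid genotypes coded as alternate-allele counts (0, 1, 2). The function names, the assignment of a MAF of exactly 5% to "common", and the definition of heterozygous concordance as the fraction of truly heterozygous genotypes imputed correctly are assumptions for illustration, not details taken from that study.

```python
import math
from typing import Dict, List, Sequence

# Genotypes are alternate-allele counts per diploid individual: 0, 1 or 2.
Genotypes = Sequence[int]


def minor_allele_count(genotypes: Genotypes) -> int:
    """Number of copies of the rarer allele across all individuals."""
    alt = sum(genotypes)
    return min(alt, 2 * len(genotypes) - alt)


def minor_allele_frequency(genotypes: Genotypes) -> float:
    if not genotypes:
        raise ValueError("no genotypes")
    return minor_allele_count(genotypes) / (2 * len(genotypes))


def bin_by_maf(sites: Dict[str, Genotypes], threshold: float = 0.05) -> Dict[str, List[str]]:
    """Split sites into low-frequency (MAF < threshold) and common classes.

    Singletons (minor allele seen once) and monomorphic sites are skipped.
    A MAF exactly equal to the threshold is counted as common (an assumption).
    """
    bins: Dict[str, List[str]] = {"low_frequency": [], "common": []}
    for site_id, genotypes in sites.items():
        if minor_allele_count(genotypes) <= 1:
            continue
        maf = minor_allele_frequency(genotypes)
        bins["low_frequency" if maf < threshold else "common"].append(site_id)
    return bins


def concordance(imputed: Genotypes, truth: Genotypes, heterozygous_only: bool = False) -> float:
    """Fraction of imputed genotypes equal to the true genotype.

    With heterozygous_only=True, only individuals whose true genotype is 1 count.
    Returns NaN when no genotypes qualify.
    """
    pairs = [(i, t) for i, t in zip(imputed, truth) if not heterozygous_only or t == 1]
    if not pairs:
        return math.nan
    return sum(i == t for i, t in pairs) / len(pairs)


if __name__ == "__main__":
    sites = {
        "indel_a": [1] * 3 + [0] * 17,  # MAF 3/40 = 0.075 -> common
        "indel_b": [1] + [0] * 19,      # singleton -> excluded
        "indel_c": [1, 1] + [0] * 48,   # MAF 2/100 = 0.02 -> low frequency
    }
    print(bin_by_maf(sites))  # {'low_frequency': ['indel_c'], 'common': ['indel_a']}
    print(concordance([0, 1, 2, 1, 0], [0, 1, 1, 1, 0]))  # 0.8
```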
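The salamander genome-size abstract among the citing articles states that most species lose more DNA through small deletions than they gain through small insertions. The bookkeeping behind such a statement can be sketched in Python as below, assuming indel events are given as signed lengths (negative for deletions, positive for insertions) and using the ∼30 bp small-indel boundary cited in that article's excerpt. All names, and the normalization by substitution count in `net_loss_per_substitution`, are hypothetical illustrations rather than the method used in that study.

```python
from dataclasses import dataclass
from typing import Iterable

SMALL_MAX_BP = 30  # approximate upper bound for "small" indels


@dataclass
class SmallIndelTally:
    deletions: int = 0
    insertions: int = 0
    bp_deleted: int = 0
    bp_inserted: int = 0

    @property
    def net_bp_change(self) -> int:
        """Inserted minus deleted bases; negative means net DNA loss."""
        return self.bp_inserted - self.bp_deleted

    @property
    def deletion_to_insertion_ratio(self) -> float:
        return self.deletions / self.insertions if self.insertions else float("inf")


def tally_small_indels(events: Iterable[int], small_max: int = SMALL_MAX_BP) -> SmallIndelTally:
    """events: signed lengths, -n for an n-bp deletion and +n for an n-bp insertion."""
    tally = SmallIndelTally()
    for length in events:
        size = abs(length)
        if size == 0 or size > small_max:
            continue  # ignore empty entries and large indels
        if length < 0:
            tally.deletions += 1
            tally.bp_deleted += size
        else:
            tally.insertions += 1
            tally.bp_inserted += size
    return tally


def net_loss_per_substitution(tally: SmallIndelTally, substitutions: int) -> float:
    """Hypothetical normalization: net bases lost to small indels per substitution."""
    return -tally.net_bp_change / substitutions


if __name__ == "__main__":
    events = [-3, -1, +2, -12, +1, -250, +40]  # -250 and +40 exceed 30 bp and are skipped
    tally = tally_small_indels(events)
    print(tally.net_bp_change)                    # -13
    print(tally.deletion_to_insertion_ratio)      # 1.5
    print(net_loss_per_substitution(tally, 100))  # 0.13
```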