CHILD: a new tool for detecting low-abundance insertions and deletions in standard sequence traces

National Institute for Biotechnology in the Negev, Beer Sheva 84105, Israel.
Nucleic Acids Research (Impact Factor: 8.81). 04/2011; 39(7):e47. DOI: 10.1093/nar/gkq1354
Source: PubMed

ABSTRACT Several methods have been proposed for detecting insertions/deletions (indels) from chromatograms generated by Sanger sequencing. However, most such methods are unsuitable when the mutated and normal variants occur at unequal ratios, as is expected in cancer, with organellar DNA or with alternatively spliced RNAs. In addition, the current methods do not provide robust estimates of the statistical confidence of their results, and the sensitivity of this approach has not been rigorously evaluated. Here, we present CHILD, a tool specifically designed for indel detection in mixtures where one variant is rare. CHILD makes use of standard sequence alignment statistics to evaluate the significance of the results. The sensitivity of CHILD was tested by sequencing controlled mixtures of deleted and undeleted plasmids at various ratios. Our results indicate that CHILD can identify deleted molecules present as just 5% of the mixture. Notably, the results were plasmid/primer-specific; for some primers and/or plasmids, the deleted molecule was only detected when it comprised 10% or more of the mixture. The false positive rate was estimated to be lower than 0.4%. CHILD was implemented as a user-oriented web site, providing a sensitive and experimentally validated method for the detection of rare indel-carrying molecules in common Sanger sequence reads.
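The general idea of finding an indel in a mixed trace can be illustrated with a short sketch. This is not CHILD's algorithm, which the abstract does not describe. It assumes the chromatogram has already been reduced to two strings of base calls, `primary` (major peaks) and `secondary` (minor peaks, with `N` where no distinct minor peak exists); those names, the `best_shift` function and the `max_shift` parameter are all hypothetical. The sketch looks for the offset at which the minor calls read as a shifted copy of the major calls, which is the signature a rare indel-carrying molecule leaves downstream of its breakpoint. It does not attempt the alignment-statistics significance test mentioned in the abstract.

```python
def best_shift(primary, secondary, max_shift=20):
    """Find the offset that best explains `secondary` as a shifted copy of `primary`.

    Returns (shift, matches, compared). A positive shift means the minor
    variant reads ahead of the major one, as it would after a deletion.
    Illustrative only; this is not the published CHILD method.
    """
    best = (0, 0, 0)
    for size in range(1, max_shift + 1):
        for shift in (size, -size):
            matches = compared = 0
            for i, base in enumerate(secondary):
                j = i + shift
                # Skip positions without a minor peak or outside the read.
                if base == "N" or not (0 <= j < len(primary)):
                    continue
                compared += 1
                matches += base == primary[j]
            if matches > best[1]:
                best = (shift, matches, compared)
    return best


# Toy data: the minor variant lacks 3 bases starting at position 8.
wild = "ACGTTGCAAGCTTAGGCTAACGGTACCATG"
deleted = wild[:8] + wild[11:]
# A minor peak is only visible where the two variants disagree.
secondary = "".join(d if d != w else "N" for w, d in zip(wild, deleted))

shift, matches, compared = best_shift(wild, secondary)
print(shift, matches, compared)
```

In this toy case every visible minor call agrees with the major sequence read three bases further along, so the recovered shift equals the deletion length. Real traces are noisier, and a match count alone says nothing about statistical confidence.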


Available from: Eitan Rubin, May 06, 2015
  • Source
    ABSTRACT: Designer nucleases such as TALENs and Cas9 have opened new opportunities to scarlessly edit the mammalian genome. Here we explored several parameters that influence Cas9-mediated scarless genome editing efficiency in murine embryonic stem cells. Optimization of transfection conditions and enriching for transfected cells are critical for efficiently recovering modified clones. Paired gRNAs and wild-type Cas9 efficiently create programmed deletions, which facilitate identification of targeted clones, while paired gRNAs and the Cas9D10A nickase generated smaller targeted indels with a lower chance of off-target mutagenesis. Genome editing is also useful for programmed introduction of exogenous DNA sequences at a target locus. Increasing the length of the homology arms of the homology-directed repair template strongly enhanced targeting efficiency, while increasing the length of the DNA insert reduced it. Together our data provide guidance on optimal design of scarless gene knockout, modification, or knock-in experiments using Cas9 nuclease.
    PLoS ONE 08/2014; 9(8):e105779. DOI:10.1371/journal.pone.0105779 · 3.53 Impact Factor
  • Source
    ABSTRACT: The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; and (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3.
    The Scientific World Journal 06/2012; 2012:365104. DOI:10.1100/2012/365104 · 1.73 Impact Factor
  • Source
    ABSTRACT: Sanger sequencing is a common method of reading DNA sequences. It is less expensive than high-throughput methods, and it is appropriate for numerous applications including molecular diagnostics. However, sequencing mixtures of similar DNA of pathogens with this method is challenging. This is important because most clinical samples contain such mixtures, rather than pure single strains. The traditional solution is to sequence selected clones of PCR products, a complicated, time-consuming, and expensive procedure. Here, we propose the base-calling with vocabulary (BCV) method that computationally deciphers Sanger chromatograms obtained from mixed DNA samples. The inputs to the BCV algorithm are a chromatogram and a dictionary of sequences that are similar to those we expect to obtain. We apply the base-calling function on a test dataset of chromatograms without ambiguous positions, as well as one with 3-14% sequence degeneracy. Furthermore, we use BCV to assemble a consensus sequence for an HIV genome fragment in a sample containing a mixture of viral DNA variants and to determine the positions of the indels. Finally, we detect drug-resistant Mycobacterium tuberculosis strains carrying frameshift mutations mixed with wild-type bacteria in the pncA gene, and roughly characterize bacterial communities in clinical samples by direct 16S rRNA sequencing.
    PLoS ONE 01/2013; 8(1):e54835. DOI:10.1371/journal.pone.0054835 · 3.53 Impact Factor