Comprehensive and relaxed search for oligonucleotide signatures in hierarchically clustered sequence datasets

Department of Informatics, Technische Universität München, Boltzmannstrasse 3, 85748 Garching, Germany.
Bioinformatics (Impact Factor: 4.98). 04/2011; 27(11):1546-54. DOI: 10.1093/bioinformatics/btr161
Source: PubMed


Results: We have developed CaSSiS, a fast and scalable method for computing comprehensive collections of sequence- and sequence group-specific oligonucleotide signatures from large sets of hierarchically clustered nucleic acid sequence data. Based on the ARB Positional Tree (PT-)Server and a newly developed BGRT data structure, CaSSiS not only determines sequence-specific signatures and perfect group-covering signatures for every node within the cluster (i.e. target groups), but also signatures with maximal group coverage (sensitivity) within a user-defined range of non-target hits (specificity) for groups lacking a perfect common signature. An upper limit of tolerated mismatches within the target group, as well as the minimum number of mismatches with non-target sequences, can be predefined. Test runs with one of the largest phylogenetic gene sequence datasets available indicate good runtime and memory performance, and in silico spot tests have shown the usefulness of the resulting signature sequences as blueprints for group-specific oligonucleotide probes.
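The selection criterion described in the abstract (maximal group coverage within a user-defined limit of non-target hits) can be illustrated with a small sketch. This is not the CaSSiS implementation: it uses a plain dictionary instead of the ARB PT-Server and the BGRT data structure, considers exact matches only (no mismatch tolerance), and the names `kmers`, `best_signatures` and `max_outgroup_hits` are hypothetical, chosen here for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring (candidate signature) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def best_signatures(sequences, group, k, max_outgroup_hits=0):
    """Return (coverage, signatures) for one target group.

    sequences: dict mapping sequence id -> nucleotide string
    group: set of sequence ids forming the target group
    k: signature length
    max_outgroup_hits: tolerated number of matched non-target sequences

    Assumption: exact matching only; mismatch thresholds are not modelled.
    """
    # Map each k-mer to the set of sequences that contain it.
    hits = defaultdict(set)
    for seq_id, seq in sequences.items():
        for sig in kmers(seq, k):
            hits[sig].add(seq_id)

    best_cov, best = 0, []
    for sig, matched in hits.items():
        ingroup = len(matched & group)    # sensitivity: target sequences hit
        outgroup = len(matched - group)   # specificity: non-target sequences hit
        if ingroup == 0 or outgroup > max_outgroup_hits:
            continue
        if ingroup > best_cov:
            best_cov, best = ingroup, [sig]
        elif ingroup == best_cov:
            best.append(sig)
    return best_cov, sorted(best)


# Toy data set; a real run would iterate over every node of the cluster tree.
toy = {"a": "ACGTAC", "b": "ACGTTT", "c": "GGGTAC", "d": "TTTTTT"}
print(best_signatures(toy, {"a", "b"}, k=4))
print(best_signatures(toy, {"a", "b", "c"}, k=4))
```

In the toy data, the group `{a, b}` has a perfect group-covering signature, whereas `{a, b, c}` has none, so the sketch falls back to the signatures with the highest partial coverage. Raising `max_outgroup_hits` relaxes the specificity constraint in the same spirit as the user-defined range of non-target hits mentioned above.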



Available from: Christian Grothoff, Jan 11, 2015
    • "Thus, to accelerate the computations, the probe design software tools have to be retooled to permit the computation of many probes based on large sequence datasets. Alignment-free strategies CaSSiS (Comprehensive and Sensitive Signature Search) was developed to address the limited ability of previous probe design software to handle large collections of sequences (Bader et al., 2011). CaSSiS is able to perform fast and comprehensive probe design based on a three-step algorithm. "

    No preview · Chapter · Aug 2014
    • "Taking into account this amount of data, high-throughput tools using the SSU rRNA biomarker such as phylogenetic oligonucleotide arrays (POAs) have been developed. Several tools were therefore proposed to select phylogenetic probes such as PRIMROSE (7), ARB PROBE_DESIGN (8), ORMA (9) or CaSSiS (10, 11). Unfortunately, most of these programs are not well-suited for large-scale probe design. "
    ABSTRACT: In recent years, high-throughput molecular tools have led to an exponential growth of available 16S rRNA gene sequences. Incorporating such data, molecular tools based on target-probe hybridization were developed to monitor microbial communities within complex environments. Unfortunately, only a few 16S rRNA gene-targeted probe collections were described. Here, we present PhylOPDb, an online resource for a comprehensive phylogenetic oligonucleotide probe database. PhylOPDb provides a convivial and easy-to-use web interface to browse both regular and explorative 16S rRNA-targeted probes. Such probes set or subset could be used to globally monitor known and unknown prokaryotic communities through various techniques including DNA microarrays, polymerase chain reaction (PCR), fluorescent in situ hybridization (FISH), targeted gene capture or in silico rapid sequence identification. PhylOPDb contains 74 003 25-mer probes targeting 2178 genera including Bacteria and Archaea. Database URL:
    Full-text · Article · Jan 2014 · Database The Journal of Biological Databases and Curation
    • "Consequently, many algorithms have been developed for PCR-assay design. Existing algorithms can be divided into four categories based on the number of target and related non-target sequences that are evaluated during design: (i) a single target sequence (1–4), (ii) multiple target sequences (5–15), (iii) a single target sequence and multiple non-targets (16–18) and (iv) multiple target sequences and multiple non-targets (19–24). The last category is the most general and most challenging problem—the group-specific assay design problem. "
    ABSTRACT: Environmental biosurveillance and microbial ecology studies use PCR-based assays to detect and quantify microbial taxa and gene sequences within a complex background of microorganisms. However, the fragmentary nature and growing quantity of DNA-sequence data make group-specific assay design challenging. We solved this problem by developing a software platform that enables PCR-assay design at an unprecedented scale. As a demonstration, we developed quantitative PCR assays for a globally widespread, ecologically important bacterial group in soil, Acidobacteria Group 1. A total of 33 684 Acidobacteria 16S rRNA gene sequences were used for assay design. Following 1 week of computation on a 376-core cluster, 83 assays were obtained. We validated the specificity of the top three assays, collectively predicted to detect 42% of the Acidobacteria Group 1 sequences, by PCR amplification and sequencing of DNA from soil. Based on previous analyses of 16S rRNA gene sequencing, Acidobacteria Group 1 species were expected to decrease in response to elevated atmospheric CO2. Quantitative PCR results, using the Acidobacteria Group 1-specific PCR assays, confirmed the expected decrease and provided higher statistical confidence than the 16S rRNA gene-sequencing data. These results demonstrate a powerful capacity to address previously intractable assay design challenges.
    Full-text · Article · Mar 2012 · Nucleic Acids Research