Comprehensive and relaxed search for oligonucleotide signatures in hierarchically clustered sequence datasets.

Department of Informatics, Technische Universität München, Boltzmannstrasse 3, 85748 Garching, Germany.
Bioinformatics (Impact Factor: 5.47). 04/2011; 27(11):1546-54. DOI: 10.1093/bioinformatics/btr161
Source: PubMed

ABSTRACT: PCR, hybridization, DNA sequencing and other important methods in molecular diagnostics rely on both sequence-specific and sequence group-specific oligonucleotide primers and probes. Their design depends on the identification of oligonucleotide signatures in whole genome or marker gene sequences. Although genome and gene databases are generally available and regularly updated, collections of valuable signatures are rare. Even for single requests, the search for signatures becomes computationally expensive when working with large collections of target (and non-target) sequences. Moreover, with growing dataset sizes, the chance of finding exact group-matching signatures decreases, necessitating the application of relaxed search methods. The resultant substantial increase in complexity is exacerbated by the dearth of algorithms able to solve these problems efficiently.
We have developed CaSSiS, a fast and scalable method for computing comprehensive collections of sequence- and sequence group-specific oligonucleotide signatures from large sets of hierarchically clustered nucleic acid sequence data. Based on the ARB Positional Tree (PT-)Server and a newly developed BGRT data structure, CaSSiS not only determines sequence-specific signatures and perfect group-covering signatures for every node within the cluster (i.e. target groups), but also signatures with maximal group coverage (sensitivity) within a user-defined range of non-target hits (specificity) for groups lacking a perfect common signature. An upper limit of tolerated mismatches within the target group, as well as the minimum number of mismatches with non-target sequences, can be predefined. Test runs with one of the largest phylogenetic gene sequence datasets available indicate good runtime and memory performance, and in silico spot tests have shown the usefulness of the resulting signature sequences as blueprints for group-specific oligonucleotide probes.
Software and Supplementary Material are available at
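
To make the relaxed search described in the abstract concrete, the following is a minimal brute-force sketch of the underlying question: for a target group, which signatures cover the most target sequences while hitting at most a given number of non-target sequences? It is not the CaSSiS implementation. It uses neither the ARB PT-Server nor the BGRT structure, it matches exactly (no mismatch tolerance), and all function and parameter names (`build_index`, `best_signatures`, `max_outgroup_hits`) are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring (candidate signature) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(sequences, k):
    """Map each k-mer to the set of sequence IDs that contain it exactly."""
    index = defaultdict(set)
    for seq_id, seq in sequences.items():
        for sig in kmers(seq, k):
            index[sig].add(seq_id)
    return index


def best_signatures(index, group, max_outgroup_hits=0):
    """Return (coverage, signatures) for the signatures with maximal
    in-group coverage among those within the non-target hit limit."""
    group = set(group)
    best_cov, best = 0, []
    for sig, hits in index.items():
        ingroup = len(hits & group)    # sensitivity: target sequences matched
        outgroup = len(hits - group)   # specificity: non-target sequences matched
        if ingroup == 0 or outgroup > max_outgroup_hits:
            continue
        if ingroup > best_cov:
            best_cov, best = ingroup, [sig]
        elif ingroup == best_cov:
            best.append(sig)
    return best_cov, sorted(best)


# Toy data (assumed for illustration only).
toy = {"a": "ACGTAC", "b": "ACGTTT", "c": "GGGTAC", "d": "TTTTTT"}
index = build_index(toy, k=4)
print(best_signatures(index, {"a", "b"}))  # (2, ['ACGT'])
```

The sketch scans every signature for every group, which is exactly the cost that grows prohibitive on large hierarchies and that a dedicated index and data structure are meant to avoid.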

  • ABSTRACT: This article presents a new algorithm for finding oligonucleotide signatures that are specific and sensitive for organisms or groups of organisms in large-scale sequence datasets. We assume that the organisms have been organized in a hierarchy, for example, a phylogenetic tree. The resulting signatures, binding sites for primers and probes, match the maximum possible number of organisms in the target group while having at most k matches outside of the target group. The key step in the algorithm is the use of the lowest common ancestor (LCA) to search the organism hierarchy; this allows the combinatorial problem to be solved in almost linear time (empirically observed). The presented algorithm improves performance by several orders of magnitude in terms of both memory consumption and runtime when compared to the best-known previous algorithms while giving identical, exact solutions. This article gives a formal description of the algorithm, discusses details of our concrete, publicly available implementation, and presents the results from our performance evaluation.
    Journal of Experimental Algorithmics 07/2012;
  • ABSTRACT: Background: PRISE2 is a new software tool for designing sequence-selective PCR primers and probes. To achieve a high level of selectivity, PRISE2 allows the user to specify a collection of target sequences that the primers are supposed to amplify, as well as non-target sequences that should not be amplified. The program emphasizes primer selectivity on the 3' end, which is crucial for selective amplification of conserved sequences such as rRNA genes. In PRISE2, users can specify desired properties of primers, including length, GC content, and others. They can interactively manipulate the list of candidate primers to choose primer pairs that are best suited for their needs. A similar process is used to add probes to selected primer pairs. More advanced features include, for example, the capability to define a custom mismatch penalty function. PRISE2 is equipped with a graphical, user-friendly interface, and it runs on Windows, Macintosh or Linux machines. Results: PRISE2 has been tested on two very similar strains of the fungus Dactylella oviparasitica, and it was able to create highly selective primers and probes for each of them, demonstrating the ability to create useful sequence-selective assays. Conclusions: PRISE2 is a user-friendly, interactive software package that can be used to design high-quality selective primers for PCR experiments. In addition to choosing primers, users have an option to add a probe to any selected primer pair, enabling the design of TaqMan and other primer-probe based assays. PRISE2 can also be used to design probes for FISH and other hybridization-based assays.
    BMC Bioinformatics 09/2014; 15:317. · 3.02 Impact Factor
  • ABSTRACT: In recent years, high-throughput molecular tools have led to an exponential growth of available 16S rRNA gene sequences. Incorporating such data, molecular tools based on target-probe hybridization were developed to monitor microbial communities within complex environments. Unfortunately, only a few 16S rRNA gene-targeted probe collections have been described. Here, we present PhylOPDb, an online resource for a comprehensive phylogenetic oligonucleotide probe database. PhylOPDb provides a convivial and easy-to-use web interface to browse both regular and explorative 16S rRNA-targeted probes. Such a probe set or subset could be used to globally monitor known and unknown prokaryotic communities through various techniques including DNA microarrays, polymerase chain reaction (PCR), fluorescent in situ hybridization (FISH), targeted gene capture or in silico rapid sequence identification. PhylOPDb contains 74 003 25-mer probes targeting 2178 genera including Bacteria and Archaea. Database URL:
    Database The Journal of Biological Databases and Curation 01/2014; 2014:bau036. · 4.20 Impact Factor
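
The first abstract in the list above names the lowest common ancestor (LCA) as the key step for searching the organism hierarchy. The sketch below illustrates only the general idea and is not the published implementation: given the leaves a signature matches, their LCA is the smallest group containing all matches, so the groups in which the signature has zero non-target hits are exactly the LCA and its ancestors. The tree encoding (a `parent` dictionary) and the function names are assumptions made for illustration.

```python
def ancestors(node, parent):
    """Return the path from node up to the root, node first."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def lowest_common_ancestor(leaves, parent):
    """Return the deepest node whose subtree contains every given leaf."""
    leaves = list(leaves)
    common = ancestors(leaves[0], parent)
    for leaf in leaves[1:]:
        seen = set(ancestors(leaf, parent))
        # Keep only nodes shared with this leaf's path; order (deepest first) is preserved.
        common = [n for n in common if n in seen]
    return common[0]


def groups_without_outgroup_hits(matched_leaves, parent):
    """Groups (tree nodes) that contain every match of a signature."""
    lca = lowest_common_ancestor(matched_leaves, parent)
    return ancestors(lca, parent)


# Toy hierarchy (assumed): root -> g1 -> {a, b}, root -> g2 -> {c, d}.
parent = {"root": None, "g1": "root", "g2": "root",
          "a": "g1", "b": "g1", "c": "g2", "d": "g2"}
print(groups_without_outgroup_hits({"a", "b"}, parent))  # ['g1', 'root']
```

This naive version walks a full root path per leaf; it is meant to show why one LCA lookup per signature can replace testing the signature against every group, not to reproduce the near-linear runtime reported in the abstract.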