Comprehensive and relaxed search for oligonucleotide signatures in hierarchically clustered sequence datasets.

Services Department of Informatics, Technische Universität München, Boltzmannstrasse 3, 85748 Garching, Germany.
Bioinformatics (Impact Factor: 4.62). 04/2011; 27(11):1546-54. DOI: 10.1093/bioinformatics/btr161
Source: PubMed

ABSTRACT: PCR, hybridization, DNA sequencing and other important methods in molecular diagnostics rely on both sequence-specific and sequence group-specific oligonucleotide primers and probes. Their design depends on the identification of oligonucleotide signatures in whole genome or marker gene sequences. Although genome and gene databases are generally available and regularly updated, collections of valuable signatures are rare. Even for single requests, the search for signatures becomes computationally expensive when working with large collections of target (and non-target) sequences. Moreover, with growing dataset sizes, the chance of finding exact group-matching signatures decreases, necessitating the application of relaxed search methods. The resultant substantial increase in complexity is exacerbated by the dearth of algorithms able to solve these problems efficiently.
We have developed CaSSiS, a fast and scalable method for computing comprehensive collections of sequence- and sequence group-specific oligonucleotide signatures from large sets of hierarchically clustered nucleic acid sequence data. Based on the ARB Positional Tree (PT-)Server and a newly developed BGRT data structure, CaSSiS not only determines sequence-specific signatures and perfect group-covering signatures for every node within the cluster (i.e. target groups), but also signatures with maximal group coverage (sensitivity) within a user-defined range of non-target hits (specificity) for groups lacking a perfect common signature. An upper limit of tolerated mismatches within the target group, as well as the minimum number of mismatches with non-target sequences, can be predefined. Test runs with one of the largest phylogenetic gene sequence datasets available indicate good runtime and memory performance, and in silico spot tests have shown the usefulness of the resulting signature sequences as blueprints for group-specific oligonucleotide probes.
Software and Supplementary Material are available at http://cassis.in.tum.de/.
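
The relaxed search described in the abstract can be illustrated with a minimal sketch. It is not the CaSSiS implementation: it does not use the ARB PT-Server or the BGRT data structure, it considers exact matches only (no mismatch tolerance), and it enumerates k-mers by brute force. The function names `kmers` and `best_signatures` and the parameter `max_outgroup_hits` are hypothetical and chosen only for illustration. For one target group, the sketch returns the fixed-length signatures that cover the most group members while hitting no more than a chosen number of non-target sequences.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_signatures(sequences, group, k, max_outgroup_hits=0):
    """Find the k-mers with maximal coverage of `group`.

    sequences: dict mapping sequence id -> nucleotide string
    group: ids of the target sequences
    max_outgroup_hits: tolerated number of matching non-target sequences
    Returns (coverage, sorted list of k-mers reaching that coverage).
    Illustrative brute-force approach, exact matching only.
    """
    group = set(group)
    hits = defaultdict(set)  # k-mer -> ids of sequences containing it
    for seq_id, seq in sequences.items():
        for kmer in kmers(seq, k):
            hits[kmer].add(seq_id)

    best_cov, best = 0, []
    for kmer, ids in hits.items():
        ingroup = len(ids & group)   # sensitivity: target sequences matched
        outgroup = len(ids - group)  # specificity: non-target sequences matched
        if ingroup == 0 or outgroup > max_outgroup_hits:
            continue
        if ingroup > best_cov:
            best_cov, best = ingroup, [kmer]
        elif ingroup == best_cov:
            best.append(kmer)
    return best_cov, sorted(best)
```

Calling `best_signatures(sequences, group, k)` once per node of a cluster tree would mimic the per-group search in spirit. When no k-mer matches every member of a group, the returned coverage is simply smaller than the group size, which corresponds to the case of a group lacking a perfect common signature.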

  • ABSTRACT: Signatures are short sequences that are unique and not similar to any other sequence in a database that can be used as the basis to identify different species. Even though several signature discovery algorithms have been proposed in the past, these algorithms require the entirety of databases to be loaded in the memory, thus restricting the amount of data that they can process. It makes those algorithms unable to process databases with large amounts of data. Also, those algorithms use sequential models and have slower discovery speeds, meaning that the efficiency can be improved.
    BMC Bioinformatics 10/2014; 15(1):339. · 2.67 Impact Factor
  • ABSTRACT: Thermophilic bacteria have recently attracted greater attention because of their potential application in improving different biochemical processes like anaerobic digestion of various substrates, wastewater treatment or hydrogen production. In this study we report on the design of a specific 16S rRNA-targeted oligonucleotide probe for detecting members of Coprothermobacter genus characterized by a strong protease activity to degrade proteins and peptides. The newly designed CTH485 probe and helper probes hCTH429 and hCTH439 were optimized for use in fluorescence in-situ hybridization on thermophilic anaerobic sludge samples. In-situ probing revealed that thermo-adaptive mechanisms shaping the 16S rRNA may affect the identification of thermophilic microorganisms. The novel developed FISH probe extends the possibility to study the widespread thermophilic syntrophic interaction of Coprothermobacter spp. with hydrogenotrophic methanogenic archaea, whose establishment is a great benefit for the whole anaerobic system.
    FEMS Microbiology Letters 07/2014; · 2.72 Impact Factor
  • ABSTRACT: Background: PRISE2 is a new software tool for designing sequence-selective PCR primers and probes. To achieve high level of selectivity, PRISE2 allows the user to specify a collection of target sequences that the primers are supposed to amplify, as well as non-target sequences that should not be amplified. The program emphasizes primer selectivity on the 3' end, which is crucial for selective amplification of conserved sequences such as rRNA genes. In PRISE2, users can specify desired properties of primers, including length, GC content, and others. They can interactively manipulate the list of candidate primers, to choose primer pairs that are best suited for their needs. A similar process is used to add probes to selected primer pairs. More advanced features include, for example, the capability to define a custom mismatch penalty function. PRISE2 is equipped with a graphical, user-friendly interface, and it runs on Windows, Macintosh or Linux machines. Results: PRISE2 has been tested on two very similar strains of the fungus Dactylella oviparasitica, and it was able to create highly selective primers and probes for each of them, demonstrating the ability to create useful sequence-selective assays. Conclusions: PRISE2 is a user-friendly, interactive software package that can be used to design high-quality selective primers for PCR experiments. In addition to choosing primers, users have an option to add a probe to any selected primer pair, enabling design of Taqman and other primer-probe based assays. PRISE2 can also be used to design probes for FISH and other hybridization-based assays.
    BMC Bioinformatics 09/2014; 15:317. · 2.67 Impact Factor