MATRIX SEARCH 1.0: a computer program that scans DNA sequences for transcriptional elements using a database of weight matrices.

Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder 80309-0347, USA.
Computer applications in the biosciences: CABIOS 11/1995; 11(5):563-6.
Source: DBLP

ABSTRACT The information matrix database (IMD), a database of weight matrices of transcription factor binding sites, is developed. MATRIX SEARCH, a program which can find potential transcription factor binding sites in DNA sequences using the IMD database, is also developed and accompanies the IMD database. MATRIX SEARCH adopts a user interface very similar to that of the SIGNAL SCAN program. MATRIX SEARCH allows the user to search an input sequence with the IMD automatically, to visualize the matrix representations of sites for particular factors, and to retrieve journal citations. The source code for MATRIX SEARCH is in the 'C' language, and the program is available for unix platforms.
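
As a rough illustration of what scanning a sequence with a weight matrix involves, the sketch below slides a log-odds matrix along a DNA string and reports windows that score at or above a threshold. It is not the MATRIX SEARCH algorithm and does not use IMD data: the abstract does not describe the scoring scheme, so the count matrix, the pseudocount, the uniform background frequency, the threshold, and the function names `log_odds_matrix` and `scan` are all assumptions made for this example.

```python
import math

# Hypothetical count matrix for a 4-bp TATA-like site (NOT taken from the IMD).
# One dict per motif position, giving how often each base was observed there.
COUNTS = [
    {"A": 1, "C": 1, "G": 0, "T": 8},
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 0, "T": 8},
    {"A": 8, "C": 0, "G": 1, "T": 1},
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds weights.

    Assumes a uniform background and a flat pseudocount; both are
    illustrative choices, not values from the original program.
    """
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, window, score) for each window scoring >= threshold."""
    seq = sequence.upper()
    width = len(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Skip windows containing ambiguous bases such as N.
        if any(base not in "ACGT" for base in window):
            continue
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    weights = log_odds_matrix(COUNTS)
    for start, window, score in scan("GGTATACC", weights, threshold=4.0):
        print(start, window, round(score, 2))
```

A tool that searches against a whole database of matrices would repeat this scan once per matrix, and would typically also scan the reverse complement; both are omitted here to keep the sketch short.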

  • ABSTRACT: Sequence alignment of proteins and nucleic acids is a routine task in bioinformatics. Although the comparison of complete peptides, genes or genomes can be undertaken with a great variety of tools, the alignment of short DNA sequences and motifs entails pitfalls that have not been fully addressed yet. Here we confront the structural superposition of transcription factors with the sequence alignment of their recognized cis elements. Our goals are (i) to test TFcompare, a structural alignment method for protein-DNA complexes; (ii) to benchmark the pairwise alignment of regulatory elements; (iii) to define the confidence limits and the twilight zone of such alignments; and (iv) to evaluate the relevance of these thresholds with elements obtained experimentally. We find that the structure of cis elements and protein-DNA interfaces is significantly more conserved than their sequence, and we measure how this correlates with alignment errors when only sequence information is considered. Our results confirm that DNA motifs in the form of matrices produce better alignments than individual sequences. Finally, we report that empirical and theoretically derived twilight thresholds are useful for estimating the natural plasticity of regulatory sequences, and hence for filtering out unreliable alignments.
    Nucleic Acids Research 12/2012; · 8.81 Impact Factor
  • ABSTRACT: Typical approaches for predicting transcription factor binding sites (TFBSs) involve use of a position-specific weight matrix (PWM) to statistically characterize the sequences of the known sites. Recently, an alternative physicochemical approach, called SiteSleuth, was proposed. In this approach, a linear support vector machine (SVM) classifier is trained to distinguish TFBSs from background sequences based on local chemical and structural features of DNA. SiteSleuth appears to generally perform better than PWM-based methods. Here, we improve the SiteSleuth approach by considering both new physicochemical features and algorithmic modifications. New features are derived from Gibbs energies of amino acid-DNA interactions and hydroxyl radical cleavage profiles of DNA. Algorithmic modifications consist of inclusion of a feature selection step, use of a nonlinear kernel in the SVM classifier, and use of a consensus-based post-processing step for predictions. We also considered SVM classification based on letter features alone to distinguish performance gains from use of SVM-based models versus use of physicochemical features. The accuracy of each of the variant methods considered was assessed by cross validation using data available in the RegulonDB database for 54 Escherichia coli TFs, as well as by experimental validation using published ChIP-chip data available for Fis and Lrp.
    Nucleic Acids Research 08/2012; · 8.81 Impact Factor
  • ABSTRACT: The purpose of this work is to present the possibility of using CLR object classes, created on the .NET platform and mapped to Transact-SQL procedural elements, in complex processing algorithms. The theoretical foundations of string-matching algorithms for finite alphabets are presented. For the infinite alphabets introduced here, these solutions cannot be easily modified, so the DTW (Dynamic Time Warping) algorithm is proposed and implemented using the rules for mapping to objects of the SQL procedural extension. Elements of the practical implementation are presented, and the results of a numerical experiment in gesture matching are discussed.
    Zeszyty Naukowe Wyższej Szkoły Informatyki. 01/2013; 12(1):91-111.
