MATRIX SEARCH 1.0: a computer program that scans DNA sequences for transcriptional elements using a database of weight matrices

Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder 80309-0347, USA.
Computer applications in the biosciences: CABIOS 11/1995; 11(5):563-6. DOI: 10.1093/bioinformatics/11.5.563
Source: PubMed

ABSTRACT The information matrix database (IMD), a database of weight matrices of transcription factor binding sites, has been developed. MATRIX SEARCH, a program that finds potential transcription factor binding sites in DNA sequences using the IMD, has also been developed and accompanies the database. MATRIX SEARCH adopts a user interface very similar to that of the SIGNAL SCAN program. It allows the user to search an input sequence against the IMD automatically, to visualize the matrix representations of sites for particular factors, and to retrieve journal citations. The source code for MATRIX SEARCH is written in C, and the program is available for Unix platforms.
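
As a rough illustration of what scanning a sequence with a weight matrix involves, the sketch below builds a log-odds matrix from a few aligned sites and scores every window of an input sequence. It is not the MATRIX SEARCH implementation (which is written in C) and does not use IMD data: the toy sites, the uniform background, the pseudocount, the threshold, and the function names `build_weight_matrix` and `scan` are all assumptions made for this example.

```python
import math

BASES = "ACGT"


def build_weight_matrix(sites, background=None, pseudocount=1.0):
    """Return a list of per-position dicts of log2-odds weights.

    Hypothetical helper: sites are equal-length aligned binding sites.
    """
    if background is None:
        # Assumption: uniform base composition.
        background = {b: 0.25 for b in BASES}
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for i in range(length):
        # Pseudocounts keep unseen bases from producing log(0).
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i].upper()] += 1
        total = sum(counts.values())
        matrix.append(
            {b: math.log2((counts[b] / total) / background[b]) for b in BASES}
        )
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, window, score) for forward-strand windows scoring >= threshold."""
    sequence = sequence.upper()
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(ch not in BASES for ch in window):
            continue  # skip windows containing ambiguity codes such as N
        score = sum(matrix[i][ch] for i, ch in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    # Toy TATA-like sites, invented for illustration only.
    toy_sites = ["TATAAA", "TATAAT", "TATATA", "TATAAA"]
    wm = build_weight_matrix(toy_sites)
    for pos, window, score in scan("GGCGTATAAAGGCC", wm, threshold=5.0):
        print(pos, window, round(score, 2))
```

A real scan would typically also score the reverse complement and use a threshold chosen per matrix; both are left out here to keep the sketch short.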

Available from: Gary D Stormo, Jul 06, 2015
  • Source
    ABSTRACT: Known transcription regulatory signals, which generally act as transcription factor binding sites (TFs), differ significantly in their base composition. Therefore, their occurrence in a genome largely depends on the local base composition. In an attempt to initiate a whole-human-genome analysis of the occurrence of potential TFs, we systematically analyzed the GC content of distinct functional regions (e.g., upstream and downstream gene regions, exons, long and short introns, repetitive elements) and correlated it with the frequencies of potential binding sites of a representative set of TFs in these regions. For these analyses, we used the pattern collection of the TRANSFAC database on transcriptional regulation, the information about functionally relevant combinations of them from the database TRANSCompel, and our new resource, TRANSGenome™, which provides an overall annotation of the human genome with emphasis on its regulatory characteristics. We show that the occurrence of sequence patterns with regulatory potential may be supported by, but cannot be fully explained by, the GC content of a whole chromosome, the GC content of its putative promoter regions, or the information content of the patterns. Several patterns (HNF-3, NFAT, and GC box) show a clear overrepresentation in all promoter groups as well as in all chromosomes. Other patterns, like E2F and CRE-BP1, are underrepresented in all promoter groups as well as in all chromosomes in comparison with random sequences. At the same time, both patterns are overrepresented in promoters in comparison with repetitive elements. We define several structural characteristics of the proximal promoters that differentiate them from other functional genomic regions. Two well-known promoter elements, GC and TATA boxes, are statistically enriched in promoters in comparison with random sequences, repetitive elements, and exons. Altogether, our findings provide insights into the macroheterogeneity among the individual chromosomes and into the microheterogeneity among different functional regions of individual chromosomes, contribute to further understanding of the structural organization of gene regulatory regions, and give first hints on the development of regulatory features during evolution.
    In silico biology 02/2003; 3(1-2):145-71.
  • Source
    ABSTRACT: The TRANSFAC system of databases provides information about the molecular mechanisms of transcriptional regulation and pathological dysregulation, certain aspects of chromatin structure, and signal transduction. Integrating this information into a comprehensive system makes it possible to model complex regulatory networks of experimentally known as well as hypothetical components, the latter being predicted by the application of state-of-the-art bioinformatics tools. Altogether, it will provide core information with which to tackle most of the problems in the field of “functional genomics”.
    Gene Function & Disease 10/2002; 3(1-2):3-11. DOI:10.1002/1438-826X(200210)3:1/23.0.CO;2-S
  • Source
    ABSTRACT: Scaffold/matrix attachment regions (S/MARs) are essential regulatory DNA elements of eukaryotic cells. They are major determinants of locus control of gene expression and can shield gene expression from position effects. Experimental detection of S/MARs requires substantial effort and is not suitable for large-scale screening of genomic sequences. In silico prediction of S/MARs can provide a crucial first selection step to reduce the number of candidates. We used experimentally defined S/MAR sequences as the training set and generated a library of new S/MAR-associated, AT-rich patterns described as weight matrices. A new tool called SMARTest was developed that identifies potential S/MARs by performing a density analysis based on the S/MAR matrix library. S/MAR predictions were evaluated using six genomic sequences from animals and plants for which S/MARs and non-S/MARs were experimentally mapped. SMARTest reached a sensitivity of 38% and a specificity of 68%. In contrast to previous algorithms, the SMARTest approach does not depend on the sequence context and is suitable for analyzing long genomic sequences up to the size of whole chromosomes. To demonstrate the feasibility of large-scale S/MAR prediction, we analyzed the recently published chromosome 22 sequence and found 1198 S/MAR candidates.
    Genome Research 03/2002; 12(2):349-54. DOI:10.1101/gr.206602.