Conference Paper

Selection of putative cis-regulatory motifs through regional and global conservation

Nat. Res. Council of Canada, Ottawa, Ont., Canada
DOI: 10.1109/CSB.2004.1332545
Conference: Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings. 2004 IEEE
Source: IEEE Xplore

ABSTRACT Cis-regulatory motifs are often overrepresented in promoters and may exhibit frequency biases in subpromoter regions (SPRs). Many probabilistic algorithms have been used to predict such motifs, but they tend to generate many false positives. We devised a novel algorithm, MotifFilter, that computes representation indices (RIs) for putative motifs. MotifFilter's RI is the ratio of the actual to the expected frequency of a motif in promoters, SPRs, or random genomic DNA, where the expected frequency takes into account the nucleotide probability distributions in these regions. This approach was applied to a genome-wide survey of putative cAMP-response elements (CREs) for motifs generated by a profile hidden Markov model. Twenty of the 144 putative CRE motifs found in the survey were retained by MotifFilter.
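The abstract does not give the exact RI formula, so the following is only a minimal sketch of an actual-over-expected ratio. It assumes that the expected count comes from a zero-order model (independent positions, with nucleotide probabilities estimated from the region itself) and that overlapping occurrences are counted. All function names are hypothetical and are not taken from MotifFilter.

```python
from collections import Counter

BASES = "ACGT"

def nucleotide_probs(region):
    """Estimate the nucleotide probability distribution of a region."""
    counts = Counter(region)
    total = sum(counts[b] for b in BASES)
    return {b: counts[b] / total for b in BASES}

def observed_count(region, motif):
    """Count occurrences of the motif in the region, overlaps included."""
    k = len(motif)
    return sum(1 for i in range(len(region) - k + 1) if region[i:i + k] == motif)

def expected_count(region, motif, probs):
    """Expected count under an assumed zero-order (independent-position) model."""
    p = 1.0
    for base in motif:
        p *= probs[base]
    return p * (len(region) - len(motif) + 1)

def representation_index(region, motif):
    """Ratio of actual to expected motif count (illustrative, not the paper's exact RI)."""
    probs = nucleotide_probs(region)
    expected = expected_count(region, motif, probs)
    if expected == 0:
        return float("nan")  # motif uses a base that is absent from the region
    return observed_count(region, motif) / expected

if __name__ == "__main__":
    promoter = "ACGTACGTACGTACGT"  # toy sequence, not real promoter data
    print(representation_index(promoter, "ACGT"))
```

Under these assumptions, a value well above 1 indicates that the motif occurs more often than the region's base composition alone would predict. Computing the index separately for promoters, SPRs, and random genomic DNA would allow the kind of comparison the abstract describes.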

  • Source
    ABSTRACT: Discovery of transcription regulatory elements has been an enormous challenge, both to biologists and computational scientists. Over the last three decades, significant progress has been achieved by various laboratories around the world. Earlier, laborious experimental methods were used to detect one or a handful of elements at a time. With recent advances in DNA sequencing technology, many completed genomes have become available, and high-throughput biological techniques and computational methods have emerged. Comparative genomic approaches and their integration with microarray gene expression data have provided promising results. In this review, we discuss the development of technology to decipher the complex transcription regulation system, with a focus on the discovery of cis-regulatory elements in eukaryotes.
    Current Bioinformatics 08/2006; 1(3):321-336. DOI:10.2174/157489306777828026 · 1.73 Impact Factor
  • Source
    ABSTRACT: In biological sequence research, the positional weight matrix (PWM) is often used for motif signal detection. A set of experimentally verified oligonucleotides known to be functional subsequences, such as transcription factor (TF) binding sites, translational initiation sites, or pre-mRNA splicing sites, is collected and aligned. The frequency of each nucleotide (A, C, G, or T) at each column of the alignment is recorded in the matrix. Once a PWM is constructed, it can be used to search a nucleotide sequence for subsequences that possibly perform the same function. The match between a subsequence and a PWM is usually described by a score function, which measures the closeness of the subsequence to the PWM as compared with the given background. However, selecting threshold scores that legitimately qualify a functional subsequence has been a great challenge. Many laboratories have attempted to tackle this problem, but there has been no significant breakthrough so far. In this chapter, we discuss the characteristics of a PWM and the factors that affect motif predictions, and we propose a new score function that is tied to the information content and statistical expectation of a PWM. We also apply this score function to PWMs from public databases and find that it compares favorably with the broadly used Match method.
    Oligonucleotide Array Sequence Analysis, Edited by M.K. Moretti, L.J. Rizzo, 01/2008: chapter 15: pages 421-440; Nova Science Publishers, ISBN: 978-1-60456-542-3
  • Article: Famili
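
The PWM workflow described in the second abstract above (align known sites, tabulate per-column nucleotide frequencies, then score subsequences against a background) can be sketched as follows. This is the conventional log-odds score, not the new score function proposed in that chapter and not the Match method. The pseudocount, the uniform background, the threshold, and the example sites are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(sites, background=None, pseudocount=1.0):
    """Build a log-odds PWM from equal-length aligned sites."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for col in range(len(sites[0])):
        column = [site[col] for site in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pwm

def score(pwm, subseq):
    """Sum the per-column log-odds values for one subsequence."""
    return sum(col[base] for col, base in zip(pwm, subseq))

def scan(pwm, seq, threshold):
    """Return (position, score) for every window scoring at or above threshold."""
    k = len(pwm)
    hits = []
    for i in range(len(seq) - k + 1):
        s = score(pwm, seq[i:i + k])
        if s >= threshold:
            hits.append((i, s))
    return hits

if __name__ == "__main__":
    # Toy aligned sites resembling the CRE consensus TGACGTCA; not curated data.
    sites = ["TGACGTCA", "TGACGTCA", "TGACGCAA", "TGACGTAA"]
    pwm = build_pwm(sites)
    print(scan(pwm, "CCTGACGTCACC", threshold=8.0))
```

The threshold of 8.0 here is arbitrary, which is exactly the difficulty the abstract points to: the hits reported depend heavily on where the cutoff is placed.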
