TFCONES: A database of vertebrate transcription factor-encoding genes and their associated conserved noncoding elements

Institute of Molecular and Cell Biology, 61 Biopolis Drive, Singapore 138673, Singapore.
BMC Genomics (Impact Factor: 4.04). 02/2007; 8:441. DOI: 10.1186/1471-2164-8-441
Source: PubMed

ABSTRACT Transcription factors (TFs) regulate gene transcription and play pivotal roles in various biological processes such as development, cell cycle progression, cell differentiation and tumor suppression. Identifying cis-regulatory elements associated with TF-encoding genes is a crucial step in understanding gene regulatory networks. To this end, we have used a comparative genomics approach to identify putative cis-regulatory elements associated with TF-encoding genes in vertebrates.
We have created a database named TFCONES (Transcription Factor Genes & Associated COnserved Noncoding ElementS), which contains all human, mouse and fugu TF-encoding genes and the conserved noncoding elements (CNEs) associated with them. The CNEs were identified by gene-by-gene alignments of orthologous TF-encoding gene loci using MLAGAN. We also predicted putative transcription factor binding sites within the CNEs. A significant proportion of human-fugu CNEs contain experimentally defined binding sites for transcriptional activators and repressors, indicating that a majority of the CNEs may function as transcriptional regulatory elements. The TF-encoding genes that are involved in nervous system development are generally enriched for human-fugu CNEs. Users can retrieve TF-encoding genes and their associated CNEs by conducting a keyword search or by selecting a family of DNA-binding proteins.
The conserved noncoding elements identified in TFCONES represent a catalog of highly prioritized putative cis-regulatory elements of TF-encoding genes and are candidates for functional assay.

  • ABSTRACT: Biclustering, the discovery of sets of objects with a coherent pattern across a subset of conditions, is a critical task in the study of a wide set of biomedical problems, where molecular units or patients are meaningfully related with a set of properties. The challenging combinatorial nature of this task led to the development of approaches with restrictions on the allowed type, number and quality of biclusters. In contrast, recent biclustering approaches relying on pattern mining methods can exhaustively discover flexible structures of robust biclusters. However, these approaches are only prepared to discover constant biclusters and their underlying contributions remain dispersed. The proposed BicPAM biclustering approach integrates existing principles made available by state-of-the-art pattern-based approaches with two new contributions. First, BicPAM is the first efficient attempt to exhaustively mine non-constant types of biclusters, including additive and multiplicative coherencies in the presence or absence of symmetries. Second, BicPAM provides strategies to effectively compose different biclustering structures and to handle arbitrary levels of noise inherent to data and to discretization procedures. Results show BicPAM's superiority against its peers and its ability to retrieve unique types of biclusters of interest, to efficiently deliver exhaustive solutions and to successfully recover planted biclusters in datasets with varying levels of missing values and noise. Its application over gene expression data leads to unique solutions with heightened biological relevance. BicPAM approaches integrate existing dispersed efforts towards pattern-based biclustering and provide the first critical strategies to efficiently discover exhaustive solutions of biclusters with shifting, scaling and symmetric assumptions with varying quality and underlying structures. Additionally, BicPAM dynamically adapts its behavior to mine data with different levels of missing values and noise.
    Algorithms for Molecular Biology 12/2014; 9:27. DOI:10.1186/s13015-014-0027-z · 1.86 Impact Factor
  • ABSTRACT: In vertebrates, conserved noncoding elements (CNEs) are functionally constrained sequences that can show striking conservation over >400 million years of evolutionary distance and frequently are located megabases away from target developmental genes. Conserved noncoding sequences (CNSs) in plants are much shorter, and it has been difficult to detect conservation among distantly related genomes. In this article, we show not only that CNS sequences can be detected throughout the eudicot clade of flowering plants, but also that a subset of 37 CNSs can be found in all flowering plants (diverging ∼170 million years ago). These CNSs are functionally similar to vertebrate CNEs, being highly associated with transcription factor and development genes and enriched in transcription factor binding sites. Some of the most highly conserved sequences occur in genes encoding RNA binding proteins, particularly the RNA splicing-associated SR genes. Differences in sequence conservation between plants and animals are likely to reflect differences in the biology of the organisms, with plants being much more able to tolerate genomic deletions and whole-genome duplication events due, in part, to their far greater fecundity compared with vertebrates.
    The Plant Cell 03/2014; 26(3). DOI:10.1105/tpc.113.121905 · 9.58 Impact Factor
  • ABSTRACT: UCbase 2.0 is an update, extension and evolution of UCbase, a Web tool dedicated to the analysis of ultraconserved sequences (UCRs). UCRs are 481 sequences >200 bases sharing 100% identity among human, mouse and rat genomes. They are frequently located in genomic regions known to be involved in cancer or differentially expressed in human leukemias and carcinomas. UCbase 2.0 is a platform-independent Web resource that includes the updated version of the human genome annotation (hg19), information linking disorders to chromosomal coordinates based on the Systematized Nomenclature of Medicine classification, a query tool to search for Single Nucleotide Polymorphisms (SNPs) and a new text box to directly interrogate the database using a MySQL interface. To facilitate the interactive visual interpretation of UCR chromosomal positioning, UCbase 2.0 now includes a graph visualization interface directly linked to the UCSC genome browser.
    Database: The Journal of Biological Databases and Curation 01/2014; 2014. DOI:10.1093/database/bau062 · 4.46 Impact Factor
