Barrasa, M.I., Vaglio, P., Cavasino, F., Jacotot, L. & Walhout, A.J.M. EDGEdb: a transcription factor-DNA interaction database for the analysis of C. elegans differential gene expression. BMC Genomics 8, 21 (2007).

Program in Gene Function and Expression and Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA.
BMC Genomics (Impact Factor: 3.99). 02/2007; 8(1):21. DOI: 10.1186/1471-2164-8-21
Source: PubMed


Transcription regulatory networks are composed of protein-DNA interactions between transcription factors and their target genes. A long-term goal in genome biology is to map protein-DNA interaction networks of all regulatory regions in a genome of interest. Both transcription factor- and gene-centered methods can be used to systematically identify such interactions. We use high-throughput yeast one-hybrid assays as a gene-centered method to identify protein-DNA interactions between regulatory sequences (e.g. gene promoters) and transcription factors in the nematode Caenorhabditis elegans. We have already mapped several hundred protein-DNA interactions. For some of these, we have analyzed the transcriptional consequences by comparing expression of the target genes in the presence and absence of an upstream regulator. The rapidly increasing amount of genome-scale protein-DNA interaction data requires a database that facilitates efficient data storage, retrieval and integration.
Here, we report the implementation of a C. elegans differential gene expression database (EDGEdb). This database enables the storage and retrieval of protein-DNA interactions and other data that relate to differential gene expression. Specifically, EDGEdb contains: i) sequence information of regulatory elements, including gene promoters, ii) sequence information of all 934 predicted transcription factors, their DNA binding domains, and, where available, their dimerization partners and consensus DNA binding sites, iii) protein-DNA interactions between regulatory elements and transcription factors, and iv) expression patterns conferred by regulatory elements, and how such patterns are affected by interacting transcription factors.
EDGEdb provides a protein-DNA and protein-protein interaction resource for C. elegans transcription factors and a framework for similar databases for other organisms. The database is available at
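The four kinds of content EDGEdb is described as storing map naturally onto a relational layout. These are regulatory element sequences, transcription factors with their DNA binding domains, dimerization partners and consensus sites, protein-DNA interactions, and expression patterns.

The sketch below is a hypothetical schema, not EDGEdb's actual one. The following are all invented for illustration, using Python's built-in sqlite3 module:
- every table and column name
- the 'Y1H' method label
- the placeholder rows (promoter_X, TF_A, TF_B, TF_C)

```python
import sqlite3

# Hypothetical schema mirroring the data types listed in the abstract;
# names are illustrative assumptions, not EDGEdb's real tables.
SCHEMA = """
CREATE TABLE regulatory_element (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,        -- e.g. a promoter identifier
    kind TEXT NOT NULL,               -- 'promoter', 'enhancer', ...
    sequence TEXT NOT NULL
);
CREATE TABLE transcription_factor (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    dbd_family TEXT,                  -- DNA binding domain family
    protein_sequence TEXT,
    consensus_site TEXT               -- NULL when no site is known
);
CREATE TABLE tf_dimer (               -- dimerization partners
    tf_a INTEGER REFERENCES transcription_factor(id),
    tf_b INTEGER REFERENCES transcription_factor(id),
    PRIMARY KEY (tf_a, tf_b)
);
CREATE TABLE protein_dna_interaction (
    element_id INTEGER REFERENCES regulatory_element(id),
    tf_id INTEGER REFERENCES transcription_factor(id),
    method TEXT NOT NULL,             -- e.g. 'Y1H' (hypothetical label)
    PRIMARY KEY (element_id, tf_id, method)
);
CREATE TABLE expression_pattern (
    element_id INTEGER REFERENCES regulatory_element(id),
    tissue TEXT NOT NULL,
    perturbed_tf INTEGER REFERENCES transcription_factor(id),  -- NULL = unperturbed
    effect TEXT                       -- e.g. 'lost', 'gained', 'unchanged'
);
"""


def tfs_binding(conn, element_name):
    """Return the sorted names of TFs recorded as binding a regulatory element."""
    rows = conn.execute(
        """SELECT DISTINCT tf.name
           FROM protein_dna_interaction AS pdi
           JOIN transcription_factor AS tf ON tf.id = pdi.tf_id
           JOIN regulatory_element AS re ON re.id = pdi.element_id
           WHERE re.name = ?
           ORDER BY tf.name""",
        (element_name,),
    ).fetchall()
    return [row[0] for row in rows]


def demo():
    """Build an in-memory database filled with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO regulatory_element VALUES (1, 'promoter_X', 'promoter', 'ACGT')"
    )
    conn.executemany(
        "INSERT INTO transcription_factor (id, name) VALUES (?, ?)",
        [(1, "TF_B"), (2, "TF_A"), (3, "TF_C")],
    )
    conn.executemany(
        "INSERT INTO protein_dna_interaction VALUES (?, ?, ?)",
        [(1, 1, "Y1H"), (1, 2, "Y1H")],
    )
    return conn


if __name__ == "__main__":
    print(tfs_binding(demo(), "promoter_X"))
```

Keeping interactions in their own table, keyed by element, factor and method, lets the same pair be recorded by more than one assay. It also makes "which factors bind this promoter?" a single join.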



Available from: Fabien Cavasino,
  • Source
    • "To validate inferred integrated networks, we use known interactions in TRANSFAC [26], REDfly [27] and EdgeDB [28] "
    ABSTRACT: Network alignment refers to the problem of finding a bijective mapping across vertices of two or more graphs to maximize the number of overlapping edges and/or to minimize the number of mismatched interactions across networks. This paper introduces a network alignment algorithm inspired by eigenvector analysis that creates a simple relaxation for the underlying quadratic assignment problem. Our method relaxes binary assignment constraints along the leading eigenvector of an alignment matrix that captures the structure of matched and mismatched interactions across networks. Our proposed algorithm, denoted EigenAlign, has two steps. First, it computes the Perron-Frobenius eigenvector of the alignment matrix. Second, it uses this eigenvector in a linear optimization framework of maximum weight bipartite matching to infer bijective mappings across vertices of two graphs. Unlike existing network alignment methods, EigenAlign considers both matched and mismatched interactions in its optimization and is therefore effective in aligning networks even with low similarity. We show that, when certain technical conditions hold, the relaxation given by EigenAlign is asymptotically exact over Erdos-Renyi graphs with high probability. Moreover, for modular network structures, we show that EigenAlign can be used to split the large quadratic assignment optimization into small subproblems, enabling the use of computationally expensive but tight semidefinite relaxations over each subproblem. Through simulations, we show the effectiveness of the EigenAlign algorithm in aligning various network structures, including Erdos-Renyi, power law, and stochastic block models, under different noise models. Finally, we apply EigenAlign to compare gene regulatory networks across human, fly and worm species, which we infer by integrating genome-wide functional and physical genomics datasets from the ENCODE and modENCODE consortia.
EigenAlign infers conserved regulatory interactions across these species despite the large evolutionary distances spanned. We find strong conservation of centrally-connected genes and some biological pathways, especially for human-fly comparisons.
  • Source
    • "The prolific output of the -omics technologies has been matched by an ever-increasing number of databases that organize data on biological molecules and their interactions in human cells and in model organisms, such as yeast, E. coli, C. elegans, Drosophila, and others. For example, IntAct, STRING, HPRD, BioGRID, WI8, DroID, YEASTRACT, and SGD [2]–[9] store curated information about protein interactions; PHOSIDA, PhosphoSitePlus, PhosphoELM, NetPhosK, NetworKIN, PREDIKIN, and Scansite [10]–[15] accumulate knowledge about protein phosphorylation and increasingly also about other PTMs; EdgeDB, REDfly, JASPAR, ENCODE, PAZAR, ABS, ORegAnno, and others [16]–[22] provide information about transcriptional regulatory interactions; miRBase, PutMir, Miranda, TargetScan, and miRecords [23]–[27] contain information on miRNAs and mRNA targets of miRNAs; and PutMir, TransmiR, and ENCODE [19], [25], [28] supply information about TFs regulating miRNA expressions. Many of these databases are highly comprehensive in their specialized areas, yet they do not provide an integrated picture of how multiple layers of biological regulation (PPI, PTM, TF-DNA interactions, and transcriptional and translational feedbacks) cooperate to enable the signal integration and processing that determine cellular responses. "
    ABSTRACT: The ever-increasing capacity of biological molecular data acquisition outpaces our ability to understand the meaningful relationships between molecules in a cell. Multiple databases were developed to store and organize these molecular data. However, emerging fundamental questions about concerted functions of these molecules in hierarchical cellular networks are poorly addressed. Here we review recent advances in the development of publicly available databases that help us analyze the signal integration and processing by multilayered networks that specify biological responses in model organisms and human cells.
    PLoS Computational Biology 02/2014; 10(2):e1003385. DOI:10.1371/journal.pcbi.1003385 · 4.62 Impact Factor
  • Source
    • "Genes having inconsistent expression calls across technical replicates were filtered out. We obtained the list of candidate TFs in C. elegans from EDGEdb [44], and obtained 1,000 bp promoter sequence of genes from BioMart [45]. "
    ABSTRACT: Arsenic, a known human carcinogen, is widely distributed around the world and found in particularly high concentrations in certain regions, including the Southwestern US, Eastern Europe, India, China, Taiwan and Mexico. Chronic arsenic poisoning affects millions of people worldwide and is associated with increased risk of many diseases, including arthrosclerosis, diabetes and cancer. In this study, we explored genome-level global responses to high and low levels of arsenic exposure in Caenorhabditis elegans using Affymetrix expression microarrays. This experimental design allows microarray analysis of dose-response relationships of global gene expression patterns. High-dose (0.03%) exposure caused stronger global gene expression changes than low-dose (0.003%) exposure, suggesting a positive dose-response correlation. Biological processes such as oxidative stress and iron metabolism, which were previously reported to be involved in arsenic toxicity in studies using cultured cells, experimental animals, and humans, were found to be affected in C. elegans. We performed genome-wide gene expression comparisons between our microarray data and publicly available C. elegans microarray datasets of cadmium exposure and of sediment exposure samples from the German rivers Rhine and Elbe. Bioinformatic analysis of arsenic-responsive regulatory networks was performed using the FastMEDUSA program. FastMEDUSA analysis identified cancer-related genes, particularly genes associated with leukemia, such as dnj-11, which encodes a protein orthologous to the mammalian ZRF1/MIDA1/MPP11/DNAJC2 family of ribosome-associated molecular chaperones. We analyzed the protective functions of several of the identified genes using RNAi. Our study indicates that C. elegans could serve as a substitute model for studying the mechanism of metal toxicity using high-throughput expression data and bioinformatics tools such as FastMEDUSA.
    PLoS ONE 07/2013; 8(7):e66431. DOI:10.1371/journal.pone.0066431 · 3.23 Impact Factor
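The EigenAlign abstract in the list above describes a two-step procedure. First it takes the leading (Perron-Frobenius) eigenvector of an alignment matrix, then it runs maximum-weight bipartite matching. Both steps can be sketched in a few lines of pure Python.

This is an illustrative reconstruction, not the authors' code, and it rests on several assumptions:
- The three weights for matched, neutral and mismatched vertex pairs are assumed values.
- The eigenvector is found by power iteration.
- A brute-force search over all bijections stands in for a real matching solver such as the Hungarian algorithm.
- Because of the brute-force search, it only handles very small graphs with equal vertex counts.

```python
from itertools import permutations


def alignment_matrix(adj1, adj2, s_match=3.0, s_neutral=1.0, s_mismatch=0.1):
    """Entry ((i, a), (j, b)) scores mapping i->a and j->b together.

    s_match: edge i-j in G1 and edge a-b in G2; s_neutral: neither edge;
    s_mismatch: exactly one edge. Weights are illustrative assumptions
    with s_match > s_neutral > s_mismatch > 0.
    """
    n = len(adj1)
    size = n * n
    m = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for a in range(n):
            for j in range(n):
                for b in range(n):
                    e1, e2 = adj1[i][j], adj2[a][b]
                    if e1 and e2:
                        w = s_match
                    elif not e1 and not e2:
                        w = s_neutral
                    else:
                        w = s_mismatch
                    m[i * n + a][j * n + b] = w
    return m


def perron_vector(m, iters=200):
    """Power iteration; converges because every entry of m is positive."""
    v = [1.0] * len(m)
    for _ in range(iters):
        w = [sum(mij * vj for mij, vj in zip(row, v)) for row in m]
        total = sum(w)
        v = [x / total for x in w]  # keep the vector normalized to sum 1
    return v


def eigenalign(adj1, adj2):
    """Return a dict mapping each vertex of G1 to a vertex of G2."""
    n = len(adj1)
    # Step 1: leading eigenvector of the alignment matrix.
    v = perron_vector(alignment_matrix(adj1, adj2))
    # Step 2: maximum-weight bipartite matching on the eigenvector scores.
    # Brute force over all n! bijections; only practical for tiny n.
    best = max(
        permutations(range(n)),
        key=lambda p: sum(v[i * n + p[i]] for i in range(n)),
    )
    return dict(enumerate(best))


if __name__ == "__main__":
    g1 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path 0-1-2, hub is vertex 1
    g2 = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # path 1-0-2, hub is vertex 0
    print(eigenalign(g1, g2))  # maps hub 1 of g1 to hub 0 of g2
```

For these two three-vertex paths, the eigenvector scores favour pairing the two hubs, so vertex 1 of `g1` is mapped to vertex 0 of `g2`. The two leaves can be assigned either way, since the graphs are symmetric. A practical version would replace the permutation search with a polynomial-time assignment solver.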
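The arsenic study quoted above obtained 1,000 bp promoter sequences from BioMart. As a hypothetical illustration of what such a retrieval returns, the function below extracts the region immediately upstream of a transcription start site (TSS) from an in-memory chromosome string.

The following are assumptions made for this sketch, and it is not BioMart's interface:
- the 0-based coordinate convention
- the function names
- the toy sequence in the tests

```python
# Complement table for plain and soft-masked (lowercase) bases.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_promoter(chrom_seq, tss, strand, length=1000):
    """Return up to `length` bp immediately upstream of a TSS.

    Assumes `tss` is the 0-based position of the first transcribed base.
    The result is written 5'->3' on the gene's own strand and is
    truncated if the chromosome end is reached.
    """
    if strand == "+":
        # Upstream of a plus-strand gene lies at lower coordinates.
        start = max(0, tss - length)
        return chrom_seq[start:tss]
    if strand == "-":
        # Upstream of a minus-strand gene lies at higher coordinates;
        # reverse-complement so the sequence reads on the gene's strand.
        end = min(len(chrom_seq), tss + 1 + length)
        return reverse_complement(chrom_seq[tss + 1:end])
    raise ValueError("strand must be '+' or '-'")
```

The minus-strand branch is the usual source of off-by-one and orientation errors. Returning the reverse complement means a promoter always ends at the base adjacent to the TSS, regardless of strand, which keeps downstream motif scanning strand-agnostic.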