
Barrasa, M.I., Vaglio, P., Cavasino, F., Jacotot, L. & Walhout, A.J.M. EDGEdb: a transcription factor-DNA interaction database for the analysis of C. elegans differential gene expression. BMC Genomics 8, 21 (2007).

Program in Gene Function and Expression and Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA.
BMC Genomics (Impact Factor: 4.04). 02/2007; 8(1):21. DOI: 10.1186/1471-2164-8-21
Source: PubMed

ABSTRACT Transcription regulatory networks are composed of protein-DNA interactions between transcription factors and their target genes. A long-term goal in genome biology is to map protein-DNA interaction networks of all regulatory regions in a genome of interest. Both transcription factor- and gene-centered methods can be used to systematically identify such interactions. We use high-throughput yeast one-hybrid assays as a gene-centered method to identify protein-DNA interactions between regulatory sequences (e.g. gene promoters) and transcription factors in the nematode Caenorhabditis elegans. We have already mapped several hundred protein-DNA interactions and analyzed the transcriptional consequences of some by examining differential gene expression of targets in the presence or absence of an upstream regulator. The rapidly increasing amount of protein-DNA interaction data at a genome scale requires a database that facilitates efficient data storage, retrieval and integration.
Here, we report the implementation of a C. elegans differential gene expression database (EDGEdb). This database enables the storage and retrieval of protein-DNA interactions and other data that relate to differential gene expression. Specifically, EDGEdb contains: i) sequence information of regulatory elements, including gene promoters, ii) sequence information of all 934 predicted transcription factors, their DNA binding domains, and, where available, their dimerization partners and consensus DNA binding sites, iii) protein-DNA interactions between regulatory elements and transcription factors, and iv) expression patterns conferred by regulatory elements, and how such patterns are affected by interacting transcription factors.
EDGEdb provides a protein-DNA and protein-protein interaction resource for C. elegans transcription factors and a framework for similar databases for other organisms. The database is available at http://edgedb.umassmed.edu.
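
The four content categories listed in the abstract can be pictured as a small relational data model. The sketch below is only an illustration under stated assumptions: the abstract does not describe EDGEdb's actual schema, so every class, field and function name here (RegulatoryElement, EdgeStore, factors_binding and so on) is hypothetical, and the identifiers and sequences in the example are placeholders rather than real C. elegans data.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class RegulatoryElement:
    """(i) A regulatory element, e.g. a gene promoter, with its sequence."""
    element_id: str
    gene: str
    sequence: str


@dataclass(frozen=True)
class TranscriptionFactor:
    """(ii) A predicted transcription factor and its annotations."""
    tf_id: str
    sequence: str
    dna_binding_domains: tuple[str, ...] = ()
    dimerization_partners: tuple[str, ...] = ()    # only where available
    consensus_binding_site: Optional[str] = None   # only where available


@dataclass(frozen=True)
class ProteinDNAInteraction:
    """(iii) An interaction between a regulatory element and a factor."""
    element_id: str
    tf_id: str
    method: str = "Y1H"  # yeast one-hybrid, the assay named in the abstract


@dataclass(frozen=True)
class ExpressionPattern:
    """(iv) A pattern conferred by an element, optionally with a factor perturbed."""
    element_id: str
    pattern: str
    perturbed_tf_id: Optional[str] = None  # None means unperturbed background


@dataclass
class EdgeStore:
    """Hypothetical in-memory store linking the four record types."""
    elements: dict[str, RegulatoryElement] = field(default_factory=dict)
    factors: dict[str, TranscriptionFactor] = field(default_factory=dict)
    interactions: list[ProteinDNAInteraction] = field(default_factory=list)
    patterns: list[ExpressionPattern] = field(default_factory=list)

    def add_interaction(self, interaction: ProteinDNAInteraction) -> None:
        # Reject interactions that point at unknown elements or factors.
        if interaction.element_id not in self.elements:
            raise KeyError(f"unknown element: {interaction.element_id}")
        if interaction.tf_id not in self.factors:
            raise KeyError(f"unknown factor: {interaction.tf_id}")
        self.interactions.append(interaction)

    def factors_binding(self, element_id: str) -> list[TranscriptionFactor]:
        """Gene-centered query: which factors bind this element?"""
        return [self.factors[i.tf_id] for i in self.interactions
                if i.element_id == element_id]

    def targets_of(self, tf_id: str) -> list[RegulatoryElement]:
        """Factor-centered query: which elements does this factor bind?"""
        return [self.elements[i.element_id] for i in self.interactions
                if i.tf_id == tf_id]


def build_example() -> EdgeStore:
    """Build a toy store with placeholder records (not real data)."""
    store = EdgeStore()
    store.elements["P1"] = RegulatoryElement("P1", "geneA", "ACGTACGT")
    store.factors["TF1"] = TranscriptionFactor("TF1", "MKV", ("homeodomain",))
    store.add_interaction(ProteinDNAInteraction("P1", "TF1"))
    store.patterns.append(ExpressionPattern("P1", "placeholder pattern"))
    return store
```

Keeping interactions as separate records that refer to elements and factors by identifier lets the same data answer both gene-centered and transcription factor-centered queries, the two perspectives the abstract mentions.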

Available from: Fabien Cavasino, Aug 19, 2015
    • "To validate inferred integrated networks, we use known interactions in TRANSFAC [26], REDfly [27] and EdgeDB [28] "
    ABSTRACT: Network alignment refers to the problem of finding a bijective mapping across vertices of two or more graphs to maximize the number of overlapping edges and/or to minimize the number of mismatched interactions across networks. This paper introduces a network alignment algorithm inspired by eigenvector analysis which creates a simple relaxation for the underlying quadratic assignment problem. Our method relaxes binary assignment constraints along the leading eigenvector of an alignment matrix which captures the structure of matched and mismatched interactions across networks. Our proposed algorithm, denoted EigenAlign, has two steps. First, it computes the Perron-Frobenius eigenvector of the alignment matrix. Second, it uses this eigenvector in a linear optimization framework of maximum weight bipartite matching to infer bijective mappings across vertices of two graphs. Unlike existing network alignment methods, EigenAlign considers both matched and mismatched interactions in its optimization and therefore is effective in aligning networks even with low similarity. We show that, when certain technical conditions hold, the relaxation given by EigenAlign is asymptotically exact over Erdős-Rényi graphs with high probability. Moreover, for modular network structures, we show that EigenAlign can be used to split the large quadratic assignment optimization into small subproblems, enabling the use of computationally expensive, but tight semidefinite relaxations over each subproblem. Through simulations, we show the effectiveness of the EigenAlign algorithm in aligning various network structures including Erdős-Rényi, power law, and stochastic block models, under different noise models. Finally, we apply EigenAlign to compare gene regulatory networks across human, fly and worm species which we infer by integrating genome-wide functional and physical genomics datasets from ENCODE and modENCODE consortia. EigenAlign infers conserved regulatory interactions across these species despite large evolutionary distances spanned. We find strong conservation of centrally-connected genes and some biological pathways, especially for human-fly comparisons.
    • "We also identified significant motifs for 12 factors with at least 10 promoter binding observations from the EDGE database of Yeast-One Hybrid (Y1H) experiments (Barrasa et al., 2007), though again the correct motifs for these sites are not known a priori. The Oreganno database lists 187 different experimentally tested binding sites and cis-regulatory modules in C. elegans (Montgomery et al., 2006; Griffith et al., 2008), which includes the annotated bound factors for several sites. "
    ABSTRACT: Regulatory sites that control gene expression are essential to the proper functioning of cells, and identifying them is critical for modeling regulatory networks. We have developed Magma (Multiple Aligner of Genomic Multiple Alignments), a software tool for multiple species, multiple gene motif discovery. Magma identifies putative regulatory sites that are conserved across multiple species and occur near multiple genes throughout a reference genome. Magma takes as input multiple alignments that can include gaps. It uses efficient clustering methods that make it about 70 times faster than PhyloNet, a previous program for this task, with slightly greater sensitivity. We ran Magma on all non-coding DNA conserved between Caenorhabditis elegans and five additional species, about 70 Mbp in total, in <4 h. We obtained 2,309 motifs with lengths of 6-20 bp, each occurring at least 10 times throughout the genome, which collectively covered about 566 kbp of the genomes, approximately 0.8% of the input. Predicted sites occurred in all types of non-coding sequence but were especially enriched in the promoter regions. Comparisons to several experimental datasets show that Magma motifs correspond to a variety of known regulatory motifs.
    Journal of computational biology: a journal of computational molecular cell biology 02/2012; 19(2):139-47. DOI:10.1089/cmb.2011.0249 · 1.67 Impact Factor
    • "Myriad cellular processes require sequence specific recognition of a nucleic acid by a protein. Transcription factors bind to specific DNA elements to enhance or repress transcription at a defined locus (Deplancke et al. 2006; Barrasa et al. 2007; Carrera and Treisman 2008; Noyes et al. 2008). RNA-binding proteins coordinate translation, mRNA localization and stability, and pre-mRNA splicing through association with defined sequences in target transcripts (Varnum et al. 1991; Johnstone and Lasko 2001; Nilsen 2002; Jurica and Moore 2003; Singh and Valcarcel 2005; Iwasaki et al. 2009). "
    ABSTRACT: Sequence-specific recognition of nucleic acids by proteins is required for nearly every aspect of gene expression. Quantitative binding experiments are a useful tool to measure the ability of a protein to distinguish between multiple sequences. Here, we describe the use of fluorophore-labeled oligonucleotide probes to quantitatively monitor protein/nucleic acid interactions. We review two complementary experimental methods, fluorescence polarization and fluorescence electrophoretic mobility shift assays, that enable the quantitative measurement of binding affinity. We also present two strategies for post-synthetic end-labeling of DNA or RNA oligonucleotides with fluorescent dyes. The approaches discussed here are efficient and sensitive, providing a safe and accessible alternative to the more commonly used radio-isotopic methods.
    RNA 01/2011; 17(1):14-20. DOI:10.1261/rna.2428111 · 4.62 Impact Factor