Use of a Drosophila Genome-Wide Conserved Sequence Database to Identify Functionally Related cis-Regulatory Enhancers

Neural Cell-Fate Determinants Section, NINDS, NIH, Bethesda, Maryland 20892, USA.
Developmental Dynamics (Impact Factor: 2.67). 01/2012; 241(1):169-89. DOI: 10.1002/dvdy.22728
Source: PubMed

ABSTRACT Phylogenetic footprinting has revealed that cis-regulatory enhancers consist of conserved DNA sequence clusters (CSCs). Currently, there is no systematic approach for enhancer discovery and analysis that takes full advantage of the sequence information within enhancer CSCs.
We have generated a Drosophila genome-wide database of conserved DNA consisting of >100,000 CSCs derived from EvoPrints spanning over 90% of the genome. cis-Decoder database search and alignment algorithms enable the discovery of functionally related enhancers. The program first identifies conserved repeat elements within an input enhancer and then searches the database for CSCs that score highly against the input CSC. Scoring is based on shared repeats as well as uniquely shared matches, and includes measures of the balance of shared elements, a diagnostic that has proven useful in predicting cis-regulatory function. To demonstrate the utility of these tools, a temporally restricted CNS neuroblast enhancer was used to identify other functionally related enhancers and analyze their structural organization.
cis-Decoder reveals that co-regulating enhancers consist of combinations of overlapping shared sequence elements, providing insights into the mode of integration of multiple regulating transcription factors. The database and accompanying algorithms should prove useful in the discovery and analysis of enhancers involved in any developmental process.
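
The search-and-score procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the cis-Decoder implementation: the fixed word length `k`, exact forward-strand matching, the `repeat_weight` value, the min/max balance ratio, and all function names are assumptions chosen only to make the idea concrete.

```python
from collections import Counter


def kmer_counts(seq, k=6):
    """Count every overlapping word of length k in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def score_pair(query, candidate, k=6, repeat_weight=2):
    """Score a candidate CSC against a query CSC (hypothetical scheme).

    Shared words that are repeated within the query count more than
    shared words that occur only once in the query. The weight is an
    arbitrary illustrative choice, not a published parameter.
    """
    q = kmer_counts(query, k)
    c = kmer_counts(candidate, k)
    shared = set(q) & set(c)

    # Shared words that occur more than once in the query.
    shared_repeats = sorted(w for w in shared if q[w] > 1)
    # Shared words that occur exactly once in the query.
    single_matches = sorted(w for w in shared if q[w] == 1)

    # Balance: how evenly the shared words are distributed between
    # the two sequences (1.0 = equal occurrence counts).
    q_hits = sum(q[w] for w in shared)
    c_hits = sum(c[w] for w in shared)
    balance = min(q_hits, c_hits) / max(q_hits, c_hits) if shared else 0.0

    return {
        "shared_repeats": shared_repeats,
        "single_matches": single_matches,
        "score": repeat_weight * len(shared_repeats) + len(single_matches),
        "balance": balance,
    }


def rank_database(query, database, k=6):
    """Rank named database sequences by score, then by balance."""
    scored = [(name, score_pair(query, seq, k)) for name, seq in database.items()]
    return sorted(
        scored,
        key=lambda item: (item[1]["score"], item[1]["balance"]),
        reverse=True,
    )
```

Calling `rank_database(query_sequence, {"name": sequence, ...})` returns the candidates ordered from best to worst match under these assumed rules. A real search would also need to handle reverse complements and variable-length conserved blocks, which this sketch omits.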

  • ABSTRACT: Cis-regulatory modules (CRMs) and motifs play a central role in tissue- and condition-specific gene expression. Here we present Imogene, an ensemble of statistical tools that we have developed to facilitate their identification and implemented in publicly available software. Starting from a small training set of mammalian or fly CRMs that drive similar gene expression profiles, Imogene determines de novo cis-regulatory motifs that underlie this co-expression. It can then predict on a genome-wide scale other CRMs with a regulatory potential similar to the training set. Imogene bypasses the need for large datasets for statistical analyses by making central use of the information provided by the sequenced genomes of multiple species, based on the developed statistical tools and explicit models for transcription factor binding site evolution. We test Imogene on characterized tissue-specific mouse developmental CRMs. Its ability to identify CRMs with the same specificity based on its de novo created motifs is comparable to that of previously evaluated ‘motif-blind’ methods. We further show, both in flies and in mammals, that Imogene de novo generated motifs are sufficient to discriminate CRMs related to different developmental programs. Notably, purely relying on sequence data, Imogene performs as well in this discrimination task as a previously reported learning algorithm based on Chromatin Immunoprecipitation (ChIP) data for multiple transcription factors at multiple developmental stages.
    Nucleic Acids Research 03/2014; 42(10). DOI:10.1093/nar/gku209 · 8.81 Impact Factor
  • ABSTRACT: Developmental progression is driven by specific spatiotemporal domains of gene expression, which give rise to stereotypically patterned embryos even in the presence of environmental and genetic variation. Views of how transcription factors regulate gene expression are changing owing to recent genome-wide studies of transcription factor binding and RNA expression. Such studies reveal patterns that, at first glance, seem to contrast with the robustness of the developmental processes they encode. Here, we review our current knowledge of transcription factor function from genomic and genetic studies and discuss how different strategies, including extensive cooperative regulation (both direct and indirect), progressive priming of regulatory elements, and the integration of activities from multiple enhancers, confer specificity and robustness to transcriptional regulation during development.
    Nature Reviews Genetics 08/2012; 13(9):613-26. DOI:10.1038/nrg3207 · 39.79 Impact Factor
  • ABSTRACT: Analysis of cis-regulatory enhancers has revealed that they consist of clustered blocks of highly conserved sequences. Although most characterized enhancers reside near their target genes, a growing number of studies have shown that enhancers located over 50 kb from their minimal promoter(s) are required for appropriate gene expression and many of these 'long-range' enhancers are found in genomic regions that are devoid of identified exons. To gain insight into the complexity of Drosophila cis-regulatory sequences within exon-poor regions, we have undertaken an evolutionary analysis of 39 of these regions located throughout the genome. This survey revealed that within these genomic expanses, clusters of conserved sequence blocks (CSBs) are positioned once every 1.1 kb, on average, and that a typical cluster contains multiple (5 to 30 or more) CSBs that have been maintained for at least 190 My of evolutionary divergence. As an initial step toward assessing the cis-regulatory activity of conserved clusters within gene-free genomic expanses, we have tested the in vivo enhancer activity of 19 consecutive CSB clusters located in the middle of a 115 kb gene-poor region on the 3rd chromosome. Our studies revealed that each cluster functions independently as a specific spatial/temporal enhancer. In total, the enhancers possess a diversity of regulatory functions, including dynamically activating expression in defined patterns within subsets of cells in discrete regions of the embryo, larva and/or adult. We also observed that many of the enhancers are multifunctional; that is, they activate expression during multiple developmental stages. By extending these results to the rest of the Drosophila genome, which contains over 70,000 non-coding CSB clusters, we suggest that most function as enhancers.
    PLoS ONE 04/2013; 8(4):e60137. DOI:10.1371/journal.pone.0060137 · 3.53 Impact Factor
