Why Transcription Factor Binding Sites Are Ten Nucleotides Long

University of Pennsylvania.
Genetics. 08/2012; DOI: 10.1534/genetics.112.143370
Source: PubMed

ABSTRACT: Gene expression is controlled primarily by transcription factors, whose DNA binding sites are typically 10 nucleotides long. We develop a population-genetic model to understand how the length and information content of such binding sites evolve. Our analysis is based on an inherent tradeoff between specificity, which is greater in long binding sites, and robustness to mutation, which is greater in short binding sites. The evolutionarily stable distribution of binding site lengths predicted by the model agrees with the empirical distribution (5 nt to 31 nt, with a mean of 9.9 nt for eukaryotes), and it is remarkably robust to variation in the underlying parameters: population size, mutation rate, number of transcription factor targets, and the strength of selection for proper binding and against improper binding. In a systematic dataset of eukaryotic and prokaryotic transcription factors, we also uncover strong relationships between the length of a binding site and its information content per nucleotide, and between the number of targets a transcription factor regulates and the information content of its binding sites. Our analysis explains these features, as well as the remarkable conservation of binding site characteristics across diverse taxa.
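
The two sides of the length tradeoff, and the "information content per nucleotide" measure, can be made concrete with a deliberately crude back-of-envelope calculation. This is a minimal sketch, not the authors' population-genetic model. It assumes a uniform random genome scanned on both strands with exact-match recognition (specificity side), independent per-position mutation with a hypothetical probability `mu` (robustness side), and the standard Shannon information of a count matrix with a uniform background and no small-sample correction (the paper may define information content differently). All function names and parameter values are hypothetical.

```python
import math
from typing import Sequence


def expected_spurious_matches(length: int, genome_size: int) -> float:
    """Expected chance occurrences of one exact L-mer when both strands of a
    uniform random genome are scanned (a crude proxy for poor specificity)."""
    return 2 * genome_size * 0.25 ** length


def min_specific_length(genome_size: int) -> int:
    """Smallest site length whose expected chance matches drop below one."""
    length = 1
    while expected_spurious_matches(length, genome_size) >= 1.0:
        length += 1
    return length


def prob_site_intact(length: int, mu: float) -> float:
    """Probability that none of the L positions mutates, assuming an independent
    per-position mutation probability mu (a crude proxy for robustness)."""
    return (1.0 - mu) ** length


def information_per_nucleotide(counts: Sequence[Sequence[float]]) -> float:
    """Mean information (bits) per position of a count matrix whose rows are
    positions and whose columns are A, C, G, T. Uses a uniform background and
    no small-sample correction, so values range from 0 to 2 bits."""
    total_bits = 0.0
    for row in counts:
        n = sum(row)
        entropy = -sum((c / n) * math.log2(c / n) for c in row if c > 0)
        total_bits += 2.0 - entropy
    return total_bits / len(counts)


if __name__ == "__main__":
    G, MU = 100_000_000, 1e-3  # hypothetical genome size (bp) and mutation probability
    print("shortest length with < 1 chance match:", min_specific_length(G))
    for L in (5, 10, 15, 20):
        print(f"L={L:2d}  chance matches={expected_spurious_matches(L, G):10.3g}  "
              f"P(intact)={prob_site_intact(L, MU):.4f}")
    # Hypothetical 3-position matrix: fixed A, A/C split, uniform -> mean 1.0 bit.
    print(information_per_nucleotide([[10, 0, 0, 0], [5, 5, 0, 0], [1, 1, 1, 1]]))
```

With these made-up numbers, chance matches stop being expected only at about 14 nt, while every extra nucleotide multiplies the probability that the site survives intact by (1 - mu). The model described above adds population size, selection and many targets on top of this, so this toy does not by itself predict the ~10 nt lengths reported in the abstract.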

  • ABSTRACT: The genomes of large multicellular eukaryotes are mostly composed of non-protein-coding DNA. Although there has been much agreement that a small fraction of these genomes has important biological functions, there has been much debate as to whether the rest contributes to development and/or homeostasis. Much of the speculation has centered on the genomic regions that are transcribed into RNA at some low level. Unfortunately, these RNAs have been arbitrarily assigned various names, such as "intergenic RNA" and "long non-coding RNA," which has led to some confusion in the field. Many researchers believe that these transcripts represent a vast, uncharted world of functional non-coding RNAs (ncRNAs), simply because they exist. However, there are reasons to question this Panglossian view, because it ignores our current understanding of how evolution shapes eukaryotic genomes and how the gene expression machinery works in eukaryotic cells. Although there are undoubtedly many more functional ncRNAs yet to be discovered and characterized, it is also likely that many of these transcripts are simply junk. Here, we discuss how to determine whether any given ncRNA has a function. Importantly, we advocate that in the absence of any such data, the appropriate null hypothesis is that the RNA in question is junk.
    Frontiers in Genetics 01/2015; 6:2. DOI:10.3389/fgene.2015.00002
  • ABSTRACT: Odor perception requires that each olfactory sensory neuron (OSN) class continuously express a single odorant receptor (OR) regardless of changes in the environment. However, little is known about how this robust, class-specific OR expression is controlled. Here, we investigate the cis-regulatory mechanisms and components that generate robust and OSN class-specific OR expression in Drosophila. Our results demonstrate that the spatial restriction of expression to a single OSN class is directed by clusters of transcription-factor DNA binding motifs. Our dissection of motif clusters of differing complexity demonstrates that structural components such as motif overlap and motif order integrate transcription factor combinations and chromatin status to form a spatially restricted pattern. We further demonstrate that changes in metabolism or temperature perturb the function of complex clusters. We show that cooperative regulation between motifs around and within the cluster generates robust, class-specific OR expression.
    PLoS Genetics 03/2015; 11(3):e1005051. DOI:10.1371/journal.pgen.1005051
  • ABSTRACT: Many cellular functions depend on highly specific intermolecular interactions, for example transcription factors and their DNA binding sites, microRNAs and their RNA binding sites, the interfaces between heterodimeric protein molecules, the stems in RNA molecules, and kinases and their response regulators in signal-transduction systems. Despite the need for complementarity between interacting partners, such pairwise systems seem to be capable of high levels of evolutionary divergence, even when subject to strong selection. Such behavior is a consequence of the diminishing advantages of increasing binding affinity between partners, the multiplicity of evolutionary pathways between selectively equivalent alternatives, and the stochastic nature of evolutionary processes. Because mutation pressure toward reduced affinity conflicts with selective pressure for greater interaction, situations can arise in which the expected distribution of the degree of matching between interacting partners is bimodal, even in the face of constant selection. Although biomolecules with larger numbers of interacting partners are subject to increased levels of evolutionary conservation, their more numerous partners need not converge on a single sequence motif or be increasingly constrained in more complex systems. These results suggest that most phylogenetic differences in the sequences of binding interfaces are not the result of adaptive fine tuning but a simple consequence of random genetic drift.
    Proceedings of the National Academy of Sciences 12/2014; 112(1). DOI:10.1073/pnas.1421641112
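
The bimodal matching distribution described in the last abstract (mutation pressure toward fewer matches opposing constant selection with diminishing returns) can be illustrated with a toy calculation. This is a minimal sketch, not the model of that PNAS paper. It assumes a commonly used weak-mutation approximation in which a genotype's stationary probability is proportional to exp(2·Ne·s), a 10-nt interface matched against a fixed partner, and a logistic (saturating) fitness curve. The function names and every parameter value are hypothetical.

```python
import math
from typing import List


def stationary_match_distribution(length: int, scaled_selection: float,
                                  midpoint: float, steepness: float) -> List[float]:
    """Stationary probability of k matching positions (k = 0..length) between
    a site and a fixed partner.

    Assumptions (illustrative, not the paper's model):
    - weak mutation, so P(genotype) is proportional to exp(2*Ne*s(genotype));
      `scaled_selection` plays the role of 2*Ne*s_max;
    - each mismatched position can be any of the 3 non-matching bases, so
      C(length, k) * 3**(length - k) genotypes share the same k;
    - the selective benefit saturates with k (a logistic curve).
    """
    log_weights = []
    for k in range(length + 1):
        log_multiplicity = math.log(math.comb(length, k)) + (length - k) * math.log(3)
        benefit = 1.0 / (1.0 + math.exp(-(k - midpoint) / steepness))
        log_weights.append(log_multiplicity + scaled_selection * benefit)
    top = max(log_weights)  # subtract the max before exponentiating for stability
    weights = [math.exp(w - top) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def modes(dist: List[float]) -> List[int]:
    """Indices higher than each existing neighbour (local maxima)."""
    n = len(dist)
    return [k for k in range(n)
            if (k == 0 or dist[k] > dist[k - 1])
            and (k == n - 1 or dist[k] > dist[k + 1])]


if __name__ == "__main__":
    # Hypothetical parameters: a 10-nt interface under saturating selection.
    dist = stationary_match_distribution(10, scaled_selection=8.0, midpoint=6.0, steepness=0.5)
    for k, p in enumerate(dist):
        print(f"{k:2d} matches: {p:.4f}")
    print("modes:", modes(dist))  # two peaks: k = 2 and k = 7
    neutral = stationary_match_distribution(10, 0.0, 6.0, 0.5)
    print("neutral modes:", modes(neutral))  # single peak at k = 2
```

Mutation pressure enters only through the multiplicity term: far more genotypes have few matches than many. With these made-up values, selection raises a second peak at 7 matches while the mutation-favoured peak at 2 matches persists; with no selection only the peak at 2 remains.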


Available from
May 22, 2014