Why Transcription Factor Binding Sites are Ten Nucleotides Long.

University of Pennsylvania.
Genetics (Impact Factor: 4.87). 08/2012; DOI: 10.1534/genetics.112.143370
Source: PubMed

ABSTRACT Gene expression is controlled primarily by transcription factors, whose DNA binding sites are typically 10 nucleotides long. We develop a population-genetic model to understand how the length and information content of such binding sites evolve. Our analysis is based on an inherent tradeoff between specificity, which is greater in long binding sites, and robustness to mutation, which is greater in short binding sites. The evolutionarily stable distribution of binding site lengths predicted by the model agrees with the empirical distribution (5 nt to 31 nt, with a mean of 9.9 nt for eukaryotes), and it is remarkably robust to variation in the underlying parameters of population size, mutation rate, number of transcription factor targets, and strength of selection for proper binding and against improper binding. In a systematic dataset of eukaryotic and prokaryotic transcription factors, we also uncover strong relationships between the length of a binding site and its information content per nucleotide, as well as between the number of targets a transcription factor regulates and the information content in its binding sites. Our analysis explains these features as well as the remarkable conservation of binding site characteristics across diverse taxa.
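
As a rough illustration of the "information content per nucleotide" mentioned in the abstract, the sketch below computes the standard Shannon-style information content of a binding-site motif from per-position base counts. It assumes a uniform background over the four bases and applies no small-sample correction; the abstract does not state which definition the authors use, so this is one common convention and not necessarily theirs. The names `column_information`, `site_information`, and `toy_site` are hypothetical, and the counts are made up for the example.

```python
import math

BASES = "ACGT"


def column_information(counts):
    """Information content (bits) of one motif position.

    Assumes a uniform background, so the maximum is log2(4) = 2 bits.
    `counts` maps each base to how often it occurs at this position.
    """
    total = sum(counts[b] for b in BASES)
    info = 2.0  # log2(4): value for a perfectly conserved position
    for b in BASES:
        p = counts[b] / total
        if p > 0:  # treat 0 * log2(0) as 0
            info += p * math.log2(p)
    return info


def site_information(columns):
    """Return (total bits, bits per nucleotide) for a whole binding site."""
    per_position = [column_information(c) for c in columns]
    total = sum(per_position)
    return total, total / len(per_position)


# Toy 3-nt motif with invented counts (illustration only, not real data).
toy_site = [
    {"A": 10, "C": 0, "G": 0, "T": 0},  # fully conserved position
    {"A": 5, "C": 5, "G": 0, "T": 0},   # two equally likely bases
    {"A": 5, "C": 5, "G": 5, "T": 5},   # no preference among bases
]

total_bits, bits_per_nt = site_information(toy_site)
print(f"total = {total_bits:.2f} bits, per nucleotide = {bits_per_nt:.2f} bits")
```

Under this convention a longer site can carry the same total information as a shorter one if each position is less conserved, which is the kind of length versus per-nucleotide relationship the abstract refers to.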

  • ABSTRACT: The genomes of large multicellular eukaryotes are composed mostly of non-protein-coding DNA. Although there has been much agreement that a small fraction of these genomes has important biological functions, there has been much debate as to whether the rest contributes to development and/or homeostasis. Much of the speculation has centered on the genomic regions that are transcribed into RNA at some low level. Unfortunately, these RNAs have been arbitrarily assigned various names, such as "intergenic RNA," "long non-coding RNAs," etc., which has led to some confusion in the field. Many researchers believe that these transcripts represent a vast, uncharted world of functional non-coding RNAs (ncRNAs), simply because they exist. However, there are reasons to question this Panglossian view because it ignores our current understanding of how evolution shapes eukaryotic genomes and how the gene expression machinery works in eukaryotic cells. Although there are undoubtedly many more functional ncRNAs yet to be discovered and characterized, it is also likely that many of these transcripts are simply junk. Here, we discuss how to determine whether any given ncRNA has a function. Importantly, we advocate that in the absence of any such data, the appropriate null hypothesis is that the RNA in question is junk.
    Frontiers in Genetics 01/2015; 6:2. DOI:10.3389/fgene.2015.00002
  • ABSTRACT (Author Summary): Our neurons can become over a hundred years old. Even if neurons are restructured and remodeled by their constant work of receiving, storing and sending information, they stay devoted to one single task and retain their identity for their whole life. How a neuron keeps its identity is not well understood. In the olfactory system, the identity of the olfactory sensory neuron (OSN) is a result of the expression of a single odorant receptor (OR) from a large receptor gene repertoire in the genome. Neurons that share an expressed receptor make up a functional class. Here, we identify clusters of transcription factor binding motifs as the smallest unit that drives expression in a single olfactory sensory neuron class. We further demonstrate that it is the structure of the cluster that determines the class-specific expression. However, environmental stress, such as temperature changes or starvation, destabilizes the expression produced by the cluster. Our results demonstrate that stable expression is generated from redundant motifs outside the cluster and suggest that cooperative regulation generates robust expression of the genes that determine neuronal identity and function.
    PLoS Genetics 03/2015; 11(3):e1005051. DOI:10.1371/journal.pgen.1005051 · 8.17 Impact Factor
  • ABSTRACT: Many cellular functions depend on highly specific intermolecular interactions, for example transcription factors and their DNA binding sites, microRNAs and their RNA binding sites, the interfaces between heterodimeric protein molecules, the stems in RNA molecules, and kinases and their response regulators in signal-transduction systems. Despite the need for complementarity between interacting partners, such pairwise systems seem to be capable of high levels of evolutionary divergence, even when subject to strong selection. Such behavior is a consequence of the diminishing advantages of increasing binding affinity between partners, the multiplicity of evolutionary pathways between selectively equivalent alternatives, and the stochastic nature of evolutionary processes. Because mutation pressure toward reduced affinity conflicts with selective pressure for greater interaction, situations can arise in which the expected distribution of the degree of matching between interacting partners is bimodal, even in the face of constant selection. Although biomolecules with larger numbers of interacting partners are subject to increased levels of evolutionary conservation, their more numerous partners need not converge on a single sequence motif or be increasingly constrained in more complex systems. These results suggest that most phylogenetic differences in the sequences of binding interfaces are not the result of adaptive fine-tuning but a simple consequence of random genetic drift.
    Proceedings of the National Academy of Sciences 12/2014; 112(1). DOI:10.1073/pnas.1421641112 · 9.81 Impact Factor

