Why Transcription Factor Binding Sites Are Ten Nucleotides Long

University of Pennsylvania.
Genetics (Impact Factor: 5.96). 08/2012; 192(3). DOI: 10.1534/genetics.112.143370
Source: PubMed


Gene expression is controlled primarily by transcription factors, whose DNA binding sites are typically 10 nucleotides long. We develop a population-genetic model to understand how the length and information content of such binding sites evolve. Our analysis is based on an inherent tradeoff between specificity, which is greater in long binding sites, and robustness to mutation, which is greater in short binding sites. The evolutionarily stable distribution of binding site lengths predicted by the model agrees with the empirical distribution (5 nt to 31 nt, with mean 9.9 nt for eukaryotes), and it is remarkably robust to variation in the underlying parameters of population size, mutation rate, number of transcription factor targets, and strength of selection for proper binding and selection against improper binding. In a systematic dataset of eukaryotic and prokaryotic transcription factors, we also uncover strong relationships between the length of a binding site and its information content per nucleotide, as well as between the number of targets a transcription factor regulates and the information content in its binding sites. Our analysis explains these features as well as the remarkable conservation of binding site characteristics across diverse taxa.
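
To make "information content per nucleotide" concrete, here is a minimal sketch of one common way to compute it. The abstract does not say which measure the authors use, so the sketch assumes the standard relative-entropy definition against a uniform background (0.25 per base, giving a maximum of 2 bits per position). The function names and the four-position motif are hypothetical and chosen only for illustration.

```python
import math


def information_content(motif, background=0.25):
    """Total information content of a motif, in bits.

    motif: list of positions, each a dict mapping a base to its frequency.
    background: assumed uniform background frequency of each base.
    """
    total = 0.0
    for column in motif:
        for p in column.values():
            if p > 0:  # a zero-frequency base contributes nothing
                total += p * math.log2(p / background)
    return total


def information_per_nucleotide(motif, background=0.25):
    """Average information content per position, in bits."""
    return information_content(motif, background) / len(motif)


# Hypothetical motif: two fully specified positions, one two-fold
# degenerate position, and one position carrying no information.
example_motif = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
]

total_bits = information_content(example_motif)
bits_per_nt = information_per_nucleotide(example_motif)
fraction_of_max = bits_per_nt / 2.0  # 2 bits is the per-position maximum

print(total_bits, bits_per_nt, fraction_of_max)
```

Under these assumptions a longer site can hold the same total information at a lower value per nucleotide, which is the kind of length-versus-information relationship the abstract refers to.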



Available from: Alexander J. Stewart, Jan 02, 2014
    • "Information Maintenance Mutator mutations increase evolvability A high mutation rate leads to the cost in deleterious mutations A high mutation rate in E. coli was initially beneficial in the mouse gut because it allowed faster adaptation, but this benefit disappeared once adaptation was achieved. [79] Translational accuracy Translational rate The bacterial ribosome's interactions with mRNA govern its translation rate, which is controlled by trade-offs between site [80,81] Transcription factor binding sites are subject to a trade-off between specificity, which is greater in long binding sites, and robustness to mutation, which is greater in short binding sites. [82] Fitness through reduced genome by loss of biosynthetic genes in rich media Low fitness in minimal media, smaller niche Selective advantages can explain the prevalent loss of biosynthetic genes in E. coli, Shigella, and Acinetobacter baylyi, but with loss of prototrophy. [36,37] Chemosensory "
    ABSTRACT: Strain-to-strain variations in bacterial biofilm formation, metabolism, motility, virulence, evolvability, DNA repair and resistance (to phage, antibiotics, or environmental stresses) each contribute to bacterial diversity. Microbiologists should be aware that all of these traits are subject to constraints imposed by trade-offs, so adaptations improving one trait may be at the cost of another. A deeper appreciation of trade-offs is thus crucial for assessing the mechanistic limits on important bacterial characteristics. Studies of the negative correlations between various traits have revealed three molecular mechanisms, namely, trade-offs involving resource allocation, design constraint, and information processing. This review further discusses why these trade-off mechanisms are important in the establishment of models capable of predicting bacterial competition, coexistence, and sources of diversity.
    Article · Dec 2015 · Trends in Microbiology
    • "e binding sites are on average merely ten nucleotides long (Stewart et al., 2012), regulatory regions comprise promoters and enhancers that span thousands of nucleotides (The ENCODE Project Consortium, 2012), and the average information content per nucleotide of binding sites is roughly 65% of the maximum, indicating modest specificity (Stewart et al., 2012). Taken together with evidence that synthetically-added regulatory interactions rarely impact phenotype (Isalan et al., 2008), these observations suggest that mutational robustness may contribute to the apparent complexity of transcriptional regulatory networks. What is more, non-functional regulatory interactions may form "
    ABSTRACT: Robustness is the invariance of a phenotype in the face of environmental or genetic change. The phenotypes produced by transcriptional regulatory circuits are gene expression patterns that are to some extent robust to mutations. Here we review several causes of this robustness. They include robustness of individual transcription factor binding sites, homotypic clusters of such sites, redundant enhancers, transcription factors, redundant transcription factors, and the wiring of transcriptional regulatory circuits. Such robustness can either be an adaptation by itself, a byproduct of other adaptations, or the result of biophysical principles and non-adaptive forces of genome evolution. The potential consequences of such robustness include complex regulatory network topologies that arise through neutral evolution, as well as cryptic variation, i.e., genotypic divergence without phenotypic divergence. On the longest evolutionary timescales, the robustness of transcriptional regulation has helped shape life as we know it, by facilitating evolutionary innovations that helped organisms such as flowering plants and vertebrates diversify.
    Article · Oct 2015 · Frontiers in Genetics
    • "Eukaryotic transcription factor binding sites are the result of a trade-off between the specificity offered by longer stretches of DNA and the robustness to mutation offered by shorter sequences and vary in length between 5 and >30 nt, with an average length of 10 nt (Stewart et al. 2012). It has been estimated that eukaryotic promoters may contain 10-50 binding sites for 5-15 different transcription factors (Wray et al. 2003). "
    ABSTRACT: Snake venom has been hypothesised to have originated and diversified via a process that involves duplication of genes encoding body proteins with subsequent recruitment of the copy to the venom gland, where natural selection acts to develop or increase toxicity. However, gene duplication is known to be a rare event in vertebrate genomes and the recruitment of duplicated genes to a novel expression domain (neofunctionalisation) is an even rarer process that requires the evolution of novel combinations of transcription factor binding sites in upstream regulatory regions. Therefore, whilst this hypothesis concerning the evolution of snake venom is very unlikely and should be regarded with caution, it is nonetheless often assumed to be established fact, hindering research into the true origins of snake venom toxins. To critically evaluate this hypothesis we have generated transcriptomic data for body tissues and salivary and venom glands from five species of venomous and non-venomous reptiles. Our comparative transcriptomic analysis of these data reveals that snake venom does not evolve via the hypothesised process of duplication and recruitment of genes encoding body proteins. Indeed, our results show that many proposed venom toxins are in fact expressed in a wide variety of body tissues, including the salivary gland of non-venomous reptiles, and that these genes have therefore been restricted to the venom gland following duplication, not recruited. Thus snake venom evolves via the duplication and subfunctionalisation of genes encoding existing salivary proteins. These results highlight the danger of the elegant and intuitive "just-so story" in evolutionary biology.
    Article · Jul 2014 · Genome Biology and Evolution