The role of DNA shape in protein-DNA recognition

Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biophysics, Columbia University, 1130 Saint Nicholas Avenue, New York, New York 10032, USA.
Nature (Impact Factor: 41.46). 10/2009; 461(7268):1248-53. DOI: 10.1038/nature08473
Source: PubMed


The recognition of specific DNA sequences by proteins is thought to depend on two types of mechanism: one that involves the formation of hydrogen bonds with specific bases, primarily in the major groove, and one involving sequence-dependent deformations of the DNA helix. By comprehensively analysing the three-dimensional structures of protein-DNA complexes, here we show that the binding of arginine residues to narrow minor grooves is a widely used mode for protein-DNA recognition. This readout mechanism exploits the phenomenon that narrow minor grooves strongly enhance the negative electrostatic potential of the DNA. The nucleosome core particle offers a prominent example of this effect. Minor-groove narrowing is often associated with the presence of A-tracts, AT-rich sequences that exclude the flexible TpA step. These findings indicate that the ability to detect local variations in DNA shape and electrostatic potential is a general mechanism that enables proteins to use information in the minor groove, which otherwise offers few opportunities for the formation of base-specific hydrogen bonds, to achieve DNA-binding specificity.
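
The abstract's description of A-tracts as AT-rich sequences that exclude the TpA step can be illustrated with a short sequence scan. This is only a minimal sketch and not the authors' analysis pipeline: the function name `find_a_tracts` and the `min_len` threshold are assumptions introduced here, since the abstract does not state a minimum tract length.

```python
import re

def find_a_tracts(seq, min_len=4):
    """Return (start, end, tract) for each run of A/T bases with no TpA step.

    min_len is an assumed, adjustable threshold; it is not taken from the abstract.
    Coordinates are 0-based and half-open, as in Python slicing.
    """
    seq = seq.upper()
    tracts = []
    # A run of A/T bases without a TpA step is some A's followed by some T's
    # (the pattern A+T*) or a run of T's alone (T+). A 'T' followed by an 'A'
    # ends the current match, so each TpA step splits the run.
    for match in re.finditer(r"A+T*|T+", seq):
        if match.end() - match.start() >= min_len:
            tracts.append((match.start(), match.end(), match.group()))
    return tracts

if __name__ == "__main__":
    # AAATTT contains ApA, ApT and TpT steps but no TpA step.
    print(find_a_tracts("GCAAATTTGC"))
    # TTTAAA is split at its central TpA step into two runs that are too short.
    print(find_a_tracts("GCTTTAAAGC"))
```

The scan only flags candidate sequences. Whether a flagged stretch actually has a narrow minor groove is a structural question that sequence matching alone cannot answer.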


Available from: Remo Rohs
    • "Remarkably, the CXC domain uses a single arginine to directly read out dinucleotide sequences from the minor groove of DNA, distinct from other DNA-binding domains that commonly recognize DNA sequence from the major groove with large secondary structure elements (Freemont et al. 1991). Arginine has been documented to interact with the minor groove but, in most cases, indirectly reads out DNA sequences by binding narrow minor grooves adopted by AT-rich sequences (Rohs et al. 2009). A single small CXC domain offers only limited binding specificity and affinity."
    ABSTRACT: The male-specific lethal dosage compensation complex (MSL-DCC) selectively assembles on the X chromosome in Drosophila males and activates gene transcription by twofold through histone acetylation. An MSL recognition element (MRE) sequence motif nucleates the initial MSL association, but how it is recognized remains unknown. Here, we identified the CXC domain of MSL2 specifically recognizing the MRE motif and determined its crystal structure bound to specific and nonspecific DNAs. The CXC domain primarily contacts one strand of DNA duplex and employs a single arginine to directly read out dinucleotide sequences from the minor groove. The arginine is flexible when bound to nonspecific sequences. The core region of the MRE motif harbors two binding sites on opposite strands that can cooperatively recruit a CXC dimer. Specific DNA-binding mutants of MSL2 are impaired in MRE binding and X chromosome localization in vivo. Our results reveal multiple dynamic DNA-binding modes of the CXC domain that target the MSL-DCC to X chromosomes.
    Genes & development 12/2014; 28(28):2652-2662. DOI:10.1101/gad.250936.114 · 10.80 Impact Factor
    • "GC-content and ORChID2 (Rohs et al. 2009; Bishop et al. 2011) scores were calculated from the nucleotide sequences of the CREs."
    ABSTRACT: The histone modification state of genomic regions is hypothesized to reflect the regulatory activity of the underlying genomic DNA. Based on this hypothesis, the ENCODE Project Consortium measured the status of multiple histone modifications across the genome in several cell types and used these data to segment the genome into regions with different predicted regulatory activities. We measured the cis-regulatory activity of more than 2000 of these predictions in the K562 leukemia cell line. We tested genomic segments predicted to be Enhancers, Weak Enhancers, or Repressed elements in K562 cells, along with other sequences predicted to be Enhancers specific to the H1 human embryonic stem cell line (H1-hESC). Both Enhancer and Weak Enhancer sequences in K562 cells were more active than negative controls, although surprisingly, Weak Enhancer segmentations drove expression higher than did Enhancer segmentations. Lower levels of the covalent histone modifications H3K36me3 and H3K27ac, thought to mark active enhancers and transcribed gene bodies, associate with higher expression and partly explain the higher activity of Weak Enhancers over Enhancer predictions. While DNase I hypersensitivity (HS) is a good predictor of active sequences in our assay, transcription factor (TF) binding models need to be included in order to accurately identify highly expressed sequences. Overall, our results show that a significant fraction (~26%) of the ENCODE enhancer predictions have regulatory activity, suggesting that histone modification states can reflect the cis-regulatory activity of sequences in the genome, but that specific sequence preferences, such as TF-binding sites, are the causal determinants of cis-regulatory activity.
    Genome Research 07/2014; 24(10). DOI:10.1101/gr.173518.114 · 14.63 Impact Factor
    • "This co-occurrence explains the ability of the same DNA sequence features that discriminate bound from unbound Pu.1 sites to predict nucleosome occupancy in cells that do not express Pu.1. Whether such co-occurrence underlies direct causal relationships between DNA features that control TF recruitment (such as DNA shape characteristics) (Rohs et al., 2009) and nucleosome assembly remains to be determined. Moreover, the general relevance of this model outside of this specific set of regulatory sites will have to be assessed."
    ABSTRACT: Transcription factors (TFs) preferentially bind sites contained in regions of computationally predicted high nucleosomal occupancy, suggesting that nucleosomes are gatekeepers of TF binding sites. However, because of their complexity mammalian genomes contain millions of randomly occurring, unbound TF consensus binding sites. We hypothesized that the information controlling nucleosome assembly may coincide with the information that enables TFs to bind cis-regulatory elements while ignoring randomly occurring sites. Hence, nucleosomes would selectively mask genomic sites that can be contacted by TFs and thus be potentially functional. The hematopoietic pioneer TF Pu.1 maintained nucleosome depletion at macrophage-specific enhancers that displayed a broad range of nucleosome occupancy in other cell types and in reconstituted chromatin. We identified a minimal set of DNA sequence and shape features that accurately predicted both Pu.1 binding and nucleosome occupancy genome-wide. These data reveal a basic organizational principle of mammalian cis-regulatory elements whereby TF recruitment and nucleosome deposition are controlled by overlapping DNA sequence features.
    Molecular cell 05/2014; DOI:10.1016/j.molcel.2014.04.006 · 14.02 Impact Factor