The role of DNA shape in protein-DNA recognition

Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biophysics, Columbia University, 1130 Saint Nicholas Avenue, New York, New York 10032, USA.
Nature (Impact Factor: 42.35). 10/2009; 461(7268):1248-53. DOI: 10.1038/nature08473
Source: PubMed

ABSTRACT The recognition of specific DNA sequences by proteins is thought to depend on two types of mechanism: one that involves the formation of hydrogen bonds with specific bases, primarily in the major groove, and one involving sequence-dependent deformations of the DNA helix. By comprehensively analysing the three-dimensional structures of protein-DNA complexes, here we show that the binding of arginine residues to narrow minor grooves is a widely used mode for protein-DNA recognition. This readout mechanism exploits the phenomenon that narrow minor grooves strongly enhance the negative electrostatic potential of the DNA. The nucleosome core particle offers a prominent example of this effect. Minor-groove narrowing is often associated with the presence of A-tracts, AT-rich sequences that exclude the flexible TpA step. These findings indicate that the ability to detect local variations in DNA shape and electrostatic potential is a general mechanism that enables proteins to use information in the minor groove, which otherwise offers few opportunities for the formation of base-specific hydrogen bonds, to achieve DNA-binding specificity.
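The abstract describes A-tracts as AT-rich sequences that exclude the flexible TpA step. As a rough illustration only, that description can be turned into a sequence scan. The sketch below is not the authors' method: the function name `find_a_tracts` and the `min_length=4` threshold are assumptions made for this example, and the abstract does not give a minimum tract length. It relies on the fact that a run of A and T bases with no TpA step must have the form of some A's followed by some T's.

```python
import re

def find_a_tracts(sequence, min_length=4):
    """Return (start, end, tract) for runs of A/T bases that contain no TpA step.

    Hypothetical helper: min_length is an assumed threshold, not a value
    taken from the abstract. Coordinates are 0-based, end-exclusive.
    """
    seq = sequence.upper()
    tracts = []
    # A run of A/T without a "TA" step is A's followed by T's, or T's alone.
    for match in re.finditer(r"A+T*|T+", seq):
        if len(match.group()) >= min_length:
            tracts.append((match.start(), match.end(), match.group()))
    return tracts

if __name__ == "__main__":
    # AAATTT qualifies; TATA is broken up by its TpA steps.
    print(find_a_tracts("GCAAATTTGCTATAGC"))
```

A stretch such as TTTTAAAA is reported as two separate tracts, because the TpA step in the middle splits it.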

Available from: Remo Rohs, Aug 11, 2015
    • "Remarkably, the CXC domain uses a single arginine to directly read out dinucleotide sequences from the minor groove of DNA, distinct from other DNA-binding domains that commonly recognize DNA sequence from the major groove with large secondary structure elements (Freemont et al. 1991). Arginine has been documented to interact with the minor groove but, in most cases, indirectly reads out DNA sequences by binding narrow minor grooves adopted by AT-rich sequences (Rohs et al. 2009). A single small CXC domain offers only limited binding specificity and affinity. "
    ABSTRACT: The male-specific lethal dosage compensation complex (MSL-DCC) selectively assembles on the X chromosome in Drosophila males and activates gene transcription by twofold through histone acetylation. An MSL recognition element (MRE) sequence motif nucleates the initial MSL association, but how it is recognized remains unknown. Here, we identified the CXC domain of MSL2 specifically recognizing the MRE motif and determined its crystal structure bound to specific and nonspecific DNAs. The CXC domain primarily contacts one strand of DNA duplex and employs a single arginine to directly read out dinucleotide sequences from the minor groove. The arginine is flexible when bound to nonspecific sequences. The core region of the MRE motif harbors two binding sites on opposite strands that can cooperatively recruit a CXC dimer. Specific DNA-binding mutants of MSL2 are impaired in MRE binding and X chromosome localization in vivo. Our results reveal multiple dynamic DNA-binding modes of the CXC domain that target the MSL-DCC to X chromosomes.
    [Keywords: dosage compensation; crystal structure; DNA–protein complex; nonspecific DNA binding]
    The evolution of species with sexual dimorphism commonly involved converting a pair of autosomes into heteromorphic sex chromosomes. In humans and fruit flies, two X chromosomes define the female sex, whereas males have only one X in addition to the Y chromosome. Avoiding recombination between the sex chromosomes, the proto-Y chromosome lost most of its resident genes, leaving the proto-X monosomic in the males. This unbalanced situation diminishes the vitality of the organism and therefore generated an evolutionary pressure to compensate for the reduced dosage of X chromosomal genes. In mammals and fruit flies, this is achieved by selective transcriptional activation of X chromosomal genes through histone acetylation (Straub and Becker 2011; Deng et al. 2013). Whereas in Drosophila melanogaster, the X chromosome is only boosted in males, in mammals, all X chromosomes in both sexes are activated followed by the selective inactivation of one X in females (Disteche 2012).
    One of the fundamental questions of outstanding interest is how an entire sex chromosome is molecularly distinguished from the autosomes. This question can be addressed conveniently in the Drosophila model, where a basic set of dosage compensation factors has been found following the male-specific lethal (MSL) loss-of-function phenotype. These so-called MSL proteins and noncoding roX (or RNA on the X) RNAs form a regulatory complex (the MSL dosage compensation complex [MSL-DCC]) that selectively associates with the X chromosome. The MSL-DCC consists of MSL1, MSL2, MSL3, the RNA helicase maleless (MLE), the histone acetyltransferase MOF (males absent on the first), and roX RNAs. MSL1 is a dimeric scaffolding protein that interacts with MSL2, MSL3, and MOF. The structural basis for these
    Genes & development 12/2014; 28(28):2652-2662. DOI:10.1101/gad.250936.114 · 12.64 Impact Factor
    • "This co-occurrence explains the ability of the same DNA sequence features that discriminate bound from unbound Pu.1 sites to predict nucleosome occupancy in cells that do not express Pu.1. Whether such co-occurrence underlies direct causal relationships between DNA features that control TF recruitment (such as DNA shape characteristics) (Rohs et al., 2009) and nucleosome assembly remains to be determined. Moreover, the general relevance of this model outside of this specific set of regulatory sites will have to be assessed. "
    ABSTRACT: Transcription factors (TFs) preferentially bind sites contained in regions of computationally predicted high nucleosomal occupancy, suggesting that nucleosomes are gatekeepers of TF binding sites. However, because of their complexity mammalian genomes contain millions of randomly occurring, unbound TF consensus binding sites. We hypothesized that the information controlling nucleosome assembly may coincide with the information that enables TFs to bind cis-regulatory elements while ignoring randomly occurring sites. Hence, nucleosomes would selectively mask genomic sites that can be contacted by TFs and thus be potentially functional. The hematopoietic pioneer TF Pu.1 maintained nucleosome depletion at macrophage-specific enhancers that displayed a broad range of nucleosome occupancy in other cell types and in reconstituted chromatin. We identified a minimal set of DNA sequence and shape features that accurately predicted both Pu.1 binding and nucleosome occupancy genome-wide. These data reveal a basic organizational principle of mammalian cis-regulatory elements whereby TF recruitment and nucleosome deposition are controlled by overlapping DNA sequence features.
    Molecular cell 05/2014; DOI:10.1016/j.molcel.2014.04.006 · 14.46 Impact Factor
    • "of dA:dT [poly(dA:dT) tracts] (Adachi et al., 1989; Käs et al., 1989; Rohs et al., 2009) and regions of strand unpairing (Bode et al., 1992), both of which are common features of AT-rich DNA. However, not all AT-rich DNA fragments bind to the nuclear matrix (von Kries et al., 1991; Dickinson et al. "
    ABSTRACT: Scaffold or matrix attachment regions (S/MARs) are found in all eukaryotes. The pattern of distribution and genomic context of S/MARs is thought to be important for processes such as chromatin organization and modulation of gene expression. Despite the importance of such processes, much is unknown about the large-scale distribution and sequence content of S/MARs in vivo. Here, we report the use of tiling microarrays to map 1358 S/MARs on Arabidopsis thaliana chromosome 4 (chr4). S/MARs occur throughout chr4, spaced much more closely than in the large plant and animal genomes that have been studied to date. Arabidopsis S/MARs can be divided into five clusters based on their association with other genomic features, suggesting a diversity of functions. While some Arabidopsis S/MARs may define structural domains, most occur near the transcription start sites of genes. Genes associated with these S/MARs have an increased probability of expression, which is particularly pronounced in the case of transcription factor genes. Analysis of sequence motifs and 6-mer enrichment patterns show that S/MARs are preferentially enriched in poly(dA:dT) tracts, sequences that resist nucleosome formation, and the majority of S/MARs contain at least one nucleosome-depleted region. This global view of S/MARs provides a framework to begin evaluating genome-scale models for S/MAR function.
    The Plant Cell 01/2014; 26(1). DOI:10.1105/tpc.113.121194 · 9.58 Impact Factor