Using a structural and logics systems approach to infer bHLH–DNA binding specificity determinants

Department of Medicine, Division of Genetics, Brigham & Women's Hospital and Harvard Medical School, Boston, MA 02115, USA.
Nucleic Acids Research (Impact Factor: 9.11). 02/2011; 39(11):4553-63. DOI: 10.1093/nar/gkr070
Source: PubMed

ABSTRACT Numerous efforts are underway to determine gene regulatory networks that describe physical relationships between transcription factors (TFs) and their target DNA sequences. Members of paralogous TF families typically recognize similar DNA sequences. Knowledge of the molecular determinants of protein-DNA recognition by paralogous TFs is of central importance for understanding how small differences in DNA specificities can dictate target gene selection. Previously, we determined the in vitro DNA binding specificities of 19 Caenorhabditis elegans basic helix-loop-helix (bHLH) dimers using protein binding microarrays. These TFs bind E-box (CANNTG) and E-box-like sequences. Here, we combine these data with logics, bHLH-DNA co-crystal structures and computational modeling to infer which bHLH monomer can interact with which CAN E-box half-site and we identify a critical residue in the protein that dictates this specificity. Validation experiments using mutant bHLH proteins provide support for our inferences. Our study provides insights into the mechanisms of DNA recognition by bHLH dimers as well as a blueprint for system-level studies of the DNA binding determinants of other TF families in different model organisms and humans.
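
As a minimal illustration of the sequence terms used in the abstract, the sketch below scans a DNA string for E-box sites (CANNTG) and splits each site into its two CAN half-sites. It is not the authors' method. The function names `find_eboxes` and `reverse_complement` are hypothetical, and reading the second half-site on the opposite strand is an assumption, based on each monomer of a bHLH dimer contacting one half-site.

```python
import re

# E-box consensus CANNTG, where N is any nucleotide (as given in the abstract).
# The lookahead allows overlapping sites to be reported.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_eboxes(seq):
    """Return (position, site, (left_half, right_half)) for each CANNTG match.

    Assumption for illustration: the right half-site (NTG) is read on the
    opposite strand, so it is reverse-complemented into CAN form.
    """
    seq = seq.upper()
    hits = []
    for match in EBOX.finditer(seq):
        site = match.group(1)
        left = site[:3]
        right = reverse_complement(site[3:])
        hits.append((match.start(), site, (left, right)))
    return hits
```

For example, `find_eboxes("CACCTG")` reports one site whose half-sites are `CAC` and `CAG`. This is the kind of asymmetric E-box for which the question of which monomer contacts which half-site arises.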

Available from: Federico De Masi, Mar 12, 2014
  • Source
    • "This observation is important because it suggests that the sequence preferences of TFs may be broadly inferred from data for only a small subset of TFs (Alleyne et al., 2009; Berger et al., 2008; Bernard et al., 2012; Noyes et al., 2008). However, these analyses have utilized data for only a handful of DBD classes and species and they contrast with numerous demonstrations that mutation of one or a few critical DBD AAs can alter the sequence preferences of a TF (Aggarwal et al., 2010; Cook et al., 1994; De Masi et al., 2011; Mathias et al., 2001; Noyes et al., 2008), which suggest that prediction of DNA binding preferences by homology should be highly error-prone. To our knowledge, rigorous and exhaustive analyses of the accuracy and limitations of inference approaches to predicting TF DNA-binding motifs using DBD sequences has not been done. "
    ABSTRACT: Transcription factor (TF) DNA sequence preferences direct their regulatory activity, but are currently known for only ∼1% of eukaryotic TFs. Broadly sampling DNA-binding domain (DBD) types from multiple eukaryotic clades, we determined DNA sequence preferences for >1,000 TFs encompassing 54 different DBD classes from 131 diverse eukaryotes. We find that closely related DBDs almost always have very similar DNA sequence preferences, enabling inference of motifs for ∼34% of the ∼170,000 known or predicted eukaryotic TFs. Sequences matching both measured and inferred motifs are enriched in chromatin immunoprecipitation sequencing (ChIP-seq) peaks and upstream of transcription start sites in diverse eukaryotic lineages. SNPs defining expression quantitative trait loci in Arabidopsis promoters are also enriched for predicted TF binding sites. Importantly, our motif "library" can be used to identify specific TFs whose binding may be altered by human disease risk alleles. These data present a powerful resource for mapping transcriptional networks across eukaryotes.
    Cell 09/2014; 158(6):1431-43. DOI:10.1016/j.cell.2014.08.009 · 33.12 Impact Factor
  • Source
    • "Moreover, our approach is not limited to array designs based on de Bruijn sequences, but rather can be applied to any data sets using PBMs or other assays for which binding scores for k-mers are generated. Numerous studies have focused on different TF structural classes, with the goal of identifying recognition rules underlying protein-DNA binding specificity (Benos, et al., 2002; De Masi, et al., 2011; Noyes, et al., 2008; Suzuki and Yagi, 1994). Precise classification of TFs according to their DNA binding sequence preferences together with identification of those sets of preferred sequences, as provided by our modeling approach, will permit more detailed studies of the molecular determinants of TF-DNA binding specificity. "
    ABSTRACT: Motivation: Sequence-specific transcription factors (TFs) regulate the expression of their target genes through interactions with specific DNA-binding sites in the genome. Data on TF-DNA binding specificities are essential for understanding how regulatory specificity is achieved. Results: Numerous studies have used universal protein-binding microarray (PBM) technology to determine the in vitro binding specificities of hundreds of TFs for all possible 8 bp sequences (8mers). We have developed a Bayesian analysis of variance (ANOVA) model that decomposes these 8mer data into background noise, TF familywise effects and effects due to the particular TF. Adjusting for background noise improves PBM data quality and concordance with in vivo TF binding data. Moreover, our model provides simultaneous identification of TF subclasses and their shared sequence preferences, and also of 8mers bound preferentially by individual members of TF subclasses. Such results may aid in deciphering cis-regulatory codes and determinants of protein–DNA binding specificity. Availability and implementation: Source code, compiled code and R and Python scripts are available from Contact: or Supplementary information: Supplementary data are available at Bioinformatics online.
    Bioinformatics 04/2013; 29(11):1390-1398. DOI:10.1093/bioinformatics/btt152 · 4.62 Impact Factor
  • Source
    • "Henceforth, we will refer to the two base pairs immediately upstream and downstream of the E-box as the 'proximal flanks' and the base pairs more than two positions away from the E-box as the 'distal flanks' (Figure 3A). Previous studies of bHLH DNA binding specificity focused either on the core E-box or the 2 bp proximal flanks (e.g., De Masi et al., 2011; Fong et al., 2012; Grove et al., 2009; Maerkl and Quake, 2007; Wang et al., 2012). Our analyses of the gcPBM data revealed that in addition to the E-box site and the proximal flanks, the distal flanks also contribute to the differential DNA binding specificities of Cbf1 and Tye7. "
    ABSTRACT: DNA sequence is a major determinant of the binding specificity of transcription factors (TFs) for their genomic targets. However, eukaryotic cells often express, at the same time, TFs with highly similar DNA binding motifs but distinct in vivo targets. Currently, it is not well understood how TFs with seemingly identical DNA motifs achieve unique specificities in vivo. Here, we used custom protein-binding microarrays to analyze TF specificity for putative binding sites in their genomic sequence context. Using yeast TFs Cbf1 and Tye7 as our case studies, we found that binding sites of these bHLH TFs (i.e., E-boxes) are bound differently in vitro and in vivo, depending on their genomic context. Computational analyses suggest that nucleotides outside E-box binding sites contribute to specificity by influencing the three-dimensional structure of DNA binding sites. Thus, the local shape of target sites might play a widespread role in achieving regulatory specificity within TF families.
    Cell Reports 04/2013; 3(4). DOI:10.1016/j.celrep.2013.03.014 · 7.21 Impact Factor