Increasing Coverage of Transcription Factor Position Weight Matrices through Domain-level Homology

Institute for Systems Biology, Seattle, Washington, United States of America.
PLoS ONE (Impact Factor: 3.23). 08/2012; 7(8):e42779. DOI: 10.1371/journal.pone.0042779
Source: PubMed


Transcription factor-DNA interactions, central to cellular regulation and control, are commonly described by position weight matrices (PWMs). These matrices are frequently used to predict transcription factor binding sites in regulatory regions of DNA to complement and guide further experimental investigation. The DNA sequence preferences of transcription factors, encoded in PWMs, are dictated primarily by select residues within the DNA binding domain(s) that interact directly with DNA. Therefore, the DNA binding properties of homologous transcription factors with identical DNA binding domains may be characterized by PWMs derived from different species. Accordingly, we have implemented a fully automated domain-level homology searching method for identical DNA binding sequences.
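To illustrate how a PWM can be used to predict binding sites, here is a minimal Python sketch that converts a count matrix into log-odds scores and slides it along a DNA sequence. The count matrix, pseudocount, uniform background, and function names are all hypothetical choices made for illustration. They are not taken from JASPAR, TRANSFAC, or the authors' implementation.

```python
import math

# Hypothetical 4-position count matrix (one dict per motif position).
# The counts are invented for illustration; the consensus is "TGCA".
COUNTS = [
    {"A": 1, "C": 0, "G": 1, "T": 8},
    {"A": 0, "C": 1, "G": 8, "T": 1},
    {"A": 1, "C": 8, "G": 0, "T": 1},
    {"A": 8, "C": 1, "G": 1, "T": 0},
]


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts into log2-odds scores.

    Assumes a uniform background and a fixed pseudocount; both are
    illustrative defaults, not values from the paper.
    """
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of the sequence; returns (offset, window, score).

    Only the forward strand is scanned, and the sequence is assumed to
    contain only A, C, G, and T.
    """
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = sum(pwm[j][base] for j, base in enumerate(window))
        hits.append((i, window, score))
    return hits


def best_hit(sequence, pwm):
    """Return the highest-scoring window as (offset, window, score)."""
    return max(scan(sequence, pwm), key=lambda hit: hit[2])
```

In practice a score threshold, reverse-strand scanning, and an organism-specific background model would normally be added. They are left out here to keep the sketch short.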
By applying the domain-level homology search to transcription factors with existing PWMs in the JASPAR and TRANSFAC databases, we were able to significantly increase coverage in terms of the total number of PWMs associated with a given species, assign PWMs to transcription factors that did not previously have any associations, and increase the number of species represented by PWMs by over an order of magnitude. Additionally, using protein binding microarray (PBM) data, we have validated the domain-level method by demonstrating that transcription factor pairs with matching DNA binding domains exhibit comparable DNA binding specificity predictions to transcription factor pairs with completely identical sequences.
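The idea of assigning PWMs through identical DNA binding domains can be sketched as follows. This is only an assumption-laden illustration, not the authors' pipeline: the record layout, the identifiers, the toy protein sequences, and the rule that every DNA binding domain must match exactly are hypothetical, and the domain coordinates are assumed to come from a separate domain annotation step that is not shown.

```python
from collections import defaultdict

# Toy records, invented for illustration. "domains" holds 0-based,
# half-open (start, end) coordinates of DNA binding domains.
ANNOTATED = [
    {"id": "tfA_species1", "seq": "MKRRAAQELLSPG",
     "domains": [(2, 8)], "pwms": ["PWM_0001"]},
]
UNANNOTATED = [
    {"id": "tfB_species2", "seq": "MSTTRRAAQEGGH", "domains": [(4, 10)]},
    {"id": "tfC_species3", "seq": "MSTTRRAVQEGGH", "domains": [(4, 10)]},
]


def dbd_key(protein_seq, domains):
    """Return the ordered tuple of DNA binding domain subsequences."""
    return tuple(protein_seq[start:end] for start, end in sorted(domains))


def transfer_pwms(annotated, unannotated):
    """Map unannotated TF ids to PWM ids of TFs with identical domains.

    Assumption: a match requires all domains to be identical, in order.
    """
    index = defaultdict(set)
    for tf in annotated:
        index[dbd_key(tf["seq"], tf["domains"])].update(tf["pwms"])

    mapping = {}
    for tf in unannotated:
        key = dbd_key(tf["seq"], tf["domains"])
        # Skip TFs without any annotated domain (empty key).
        if key and key in index:
            mapping[tf["id"]] = sorted(index[key])
    return mapping
```

With the toy data, only tfB_species2 shares the domain sequence of the annotated factor, so it is the only one that inherits a PWM. tfC_species3 differs by a single residue inside the domain and is left unassigned.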
The increased coverage achieved herein demonstrates the potential for more thorough species-associated investigation of protein-DNA interactions using existing resources. The PWM scanning results highlight the challenging nature of transcription factors that contain multiple DNA binding domains, as well as the impact of motif discovery on the ability to predict DNA binding properties. The method is additionally suitable for identifying domain-level homology mappings to enable utilization of additional information sources in the study of transcription factors. The domain-level homology search method, resulting PWM mappings, web-based user interface, and web API are publicly available at

Available from: Ilya Shmulevich, Jun 23, 2015
  • Source
    ABSTRACT: Transcription factor (TF) DNA sequence preferences direct their regulatory activity, but are currently known for only ∼1% of eukaryotic TFs. Broadly sampling DNA-binding domain (DBD) types from multiple eukaryotic clades, we determined DNA sequence preferences for >1,000 TFs encompassing 54 different DBD classes from 131 diverse eukaryotes. We find that closely related DBDs almost always have very similar DNA sequence preferences, enabling inference of motifs for ∼34% of the ∼170,000 known or predicted eukaryotic TFs. Sequences matching both measured and inferred motifs are enriched in chromatin immunoprecipitation sequencing (ChIP-seq) peaks and upstream of transcription start sites in diverse eukaryotic lineages. SNPs defining expression quantitative trait loci in Arabidopsis promoters are also enriched for predicted TF binding sites. Importantly, our motif "library" can be used to identify specific TFs whose binding may be altered by human disease risk alleles. These data present a powerful resource for mapping transcriptional networks across eukaryotes.
    Cell 09/2014; 158(6):1431-43. DOI:10.1016/j.cell.2014.08.009 · 32.24 Impact Factor
  • Source
    ABSTRACT: High protein secretion capacity in filamentous fungi requires an extremely efficient system for protein synthesis, folding and transport. When the folding capacity of the endoplasmic reticulum (ER) is exceeded, a pathway known as the unfolded protein response (UPR) is triggered, allowing cells to mitigate and cope with this stress. In yeast, this pathway relies on the transcription factor Hac1, which mediates the up-regulation of several genes required under these stressful conditions. In this work, we identified and characterized the ortholog of the yeast HAC1 gene in the filamentous fungus Neurospora crassa. We show that its mRNA undergoes an ER stress-dependent splicing reaction, which in N. crassa removes a 23 nt intron and leads to a change in the open reading frame. By disrupting the N. crassa hac-1 gene, we determined it to be crucial for activating UPR and for proper growth in the presence of ER stress-inducing chemical agents. Neurospora is naturally found growing on dead plant material, composed primarily of lignocellulose, and is a model organism for the study of plant cell wall deconstruction. Notably, we found that growth on cellulose, a substrate that requires secretion of numerous enzymes, imposes major demands on ER function and is dramatically impaired in the absence of hac-1, thus broadening the range of physiological functions of the UPR in filamentous fungi. Growth on hemicellulose, however, another carbon source that necessitates the secretion of various enzymes for its deconstruction, is not impaired in the mutant, nor is the amount of protein secreted on this substrate, suggesting that secretion, as a whole, is unaltered in the absence of hac-1. The characterization of this signaling pathway in N. crassa will help in the study of plant cell wall deconstruction by fungi, and its manipulation may result in important industrial biotechnological applications.
    PLoS ONE 07/2015; 10(7):e0131415. DOI:10.1371/journal.pone.0131415 · 3.23 Impact Factor