-
ABSTRACT: Erythropoiesis is dependent upon the lineage-specific transcription factors, Gata1, Tal1 and Klf1. Several erythroid genes have been shown to require all three factors for their expression suggesting that they function synergistically; however, there is little direct evidence for widespread cooperation. Gata1 and Tal1 can assemble within higher order protein complexes (Ldb1-complexes) that include the adapter molecules Lmo2 and Ldb1. Ldb1 proteins are capable of co-association and long-range Ldb1-mediated oligomerization of enhancer and promoter bound Ldb1-complexes has been shown to be required for β-globin gene expression. In this study, we generated a genome-wide map of Ldb1-complex binding sites that revealed widespread binding at erythroid genes and at known erythroid enhancer elements. Ldb1-complex binding sites frequently co-localized with Klf1 binding sites and with consensus binding motifs for other erythroid transcription factors. Transcriptomic analysis demonstrated strong correlation between Ldb1-complex binding and Ldb1 dependency for gene expression and identified a large cohort of genes co-regulated by Ldb1-complexes and Klf1. Together, these results provide a foundation for defining the mechanism and scope of Ldb1-complex activity during erythropoiesis.
Blood 04/2013; · 9.90 Impact Factor
-
ABSTRACT: In this study, we demonstrate that the lack of retinoic acid-related orphan receptor (ROR) γ or α expression in mice significantly reduced the peak expression level of Cry1, Bmal1, E4bp4, Rev-Erbα and Per2 in an ROR isotype- and tissue-selective manner without affecting the phase of their rhythmic expression. Analysis of RORγ/RORα double knockout mice indicated that in certain tissues RORγ and RORα exhibited a certain degree of redundancy in regulating clock gene expression. Reporter gene analysis showed that RORγ was able to induce reporter gene activity through the RORE-containing regulatory regions of Cry1, Bmal1, Rev-Erbα and E4bp4. Co-expression of Rev-Erbα or addition of a novel ROR antagonist repressed this activation. ChIP-Seq and ChIP-Quantitative real-time polymerase chain reaction (QPCR) analysis demonstrated that in vivo RORγ regulates these genes directly and in a Zeitgeber time (ZT)-dependent manner through these ROREs. This transcriptional activation by RORs was associated with changes in histone acetylation and chromatin accessibility. The rhythmic expression of RORγ1 by clock proteins may lead to the rhythmic expression of RORγ1 target genes. The presence of RORγ binding sites and its down-regulation in RORγ(-/-) liver suggest that the rhythmic expression of Avpr1a depends on RORγ, consistent with the concept that RORγ1 provides a link between the clock machinery and its regulation of metabolic genes.
Nucleic Acids Research 06/2012; 40(17):8519-35. · 8.03 Impact Factor
-
ABSTRACT: Complex regulatory networks orchestrate most cellular processes in biological systems. Genes in such networks are subject to expression noise, resulting in isogenic cell populations exhibiting cell-to-cell variation in protein levels. Increasing evidence suggests that cells have evolved regulatory strategies to limit, tolerate or amplify expression noise. In this context, fundamental questions arise: how can the architecture of gene regulatory networks generate, make use of or be constrained by expression noise? Here, we discuss the interplay between expression noise and gene regulatory networks at different levels of organization, ranging from a single regulatory interaction to entire regulatory networks. We then consider how this interplay impacts a variety of phenomena, such as pathogenicity, disease, adaptation to changing environments, differential cell-fate outcome and incomplete or partial penetrance effects. Finally, we highlight recent technological developments that permit measurements at the single-cell level, and discuss directions for future research.
Trends in Genetics 02/2012; 28(5):221-32. · 10.06 Impact Factor
-
ABSTRACT: Embryonic stem cell (ESC) identity and self-renewal are maintained by extrinsic signaling pathways and intrinsic gene regulatory networks. Here, we show that three members of the Ccr4-Not complex, Cnot1, Cnot2, and Cnot3, play critical roles in maintaining mouse and human ESC identity as a protein complex and inhibit differentiation into the extraembryonic lineages. Enriched in the inner cell mass of blastocysts, these Cnot genes are highly expressed in ESCs and downregulated during differentiation. In mouse ESCs, Cnot1, Cnot2, and Cnot3 are important for maintenance in both normal conditions and the 2i/LIF medium that supports the ground state pluripotency. Genetic analysis indicated that they do not act through known self-renewal pathways or core transcription factors. Instead, they repress the expression of early trophectoderm (TE) transcription factors such as Cdx2. Importantly, these Cnot genes are also necessary for the maintenance of human ESCs, and silencing them mainly leads to TE and primitive endoderm differentiation. Together, our results indicate that Cnot1, Cnot2, and Cnot3 represent a novel component of the core self-renewal and pluripotency circuitry conserved in mouse and human ESCs.
Stem Cells 02/2012; 30(5):910-22. · 7.78 Impact Factor
-
ABSTRACT: Protein-DNA interactions play key roles in determining gene-expression programs during cellular development and differentiation. Chromatin immunoprecipitation (ChIP) is the most widely used assay for probing such interactions. With recent advances in sequencing technology, ChIP-Seq, an approach that combines ChIP and next-generation parallel sequencing, is fast becoming the method of choice for mapping protein-DNA interactions on a genome-wide scale. Here, we briefly review the ChIP-Seq approach for mapping protein-DNA interactions and describe the use of the SISSRs peak-finder, a software tool for precise identification of protein-DNA binding sites from sequencing data generated using ChIP-Seq.
Methods in molecular biology (Clifton, N.J.) 01/2012; 802:305-22.
-
ABSTRACT: The TET family of Fe(II) and 2-oxoglutarate-dependent enzymes (Tet1/2/3) promote DNA demethylation by converting 5-methylcytosine to 5-hydroxymethylcytosine (5hmC), which they further oxidize into 5-formylcytosine and 5-carboxylcytosine. Tet1 is robustly expressed in mouse embryonic stem cells (mESCs) and has been implicated in mESC maintenance. Here we demonstrate that, unlike genetic deletion, RNAi-mediated depletion of Tet1 in mESCs led to a significant reduction in 5hmC and loss of mESC identity. The differentiation phenotype due to Tet1 depletion positively correlated with the extent of 5hmC loss. Meta-analyses of genomic data sets suggested interaction between Tet1 and leukemia inhibitory factor (LIF) signaling. LIF signaling is known to promote self-renewal and pluripotency in mESCs partly by opposing MAPK/ERK-mediated differentiation. Withdrawal of LIF leads to differentiation of mESCs. We discovered that Tet1 depletion impaired LIF-dependent Stat3-mediated gene activation by affecting Stat3's ability to bind to its target sites on chromatin. Nanog overexpression or inhibition of MAPK/ERK signaling, both known to maintain mESCs in the absence of LIF, rescued Tet1 depletion, further supporting the dependence of LIF/Stat3 signaling on Tet1. These data support the conclusion that analysis of mESCs in the hours/days immediately following efficient Tet1 depletion reveals Tet1's normal physiological role in maintaining the pluripotent state that may be subject to homeostatic compensation in genetic models.
Nucleic Acids Research 12/2011; 40(8):3364-77. · 8.03 Impact Factor
-
Gang Wei, Brian J Abraham, Ryoji Yagi, Raja Jothi, Kairong Cui, Suveena Sharma, Leelavati Narlikar, Daniel L Northrup, Qingsong Tang, William E Paul, Jinfang Zhu, Keji Zhao
ABSTRACT: The transcription factor GATA3 plays an essential role during T cell development and T helper 2 (Th2) cell differentiation. To understand GATA3-mediated gene regulation, we identified genome-wide GATA3 binding sites in ten well-defined developmental and effector T lymphocyte lineages. In the thymus, GATA3 directly regulated many critical factors, including Th-POK, Notch1, and T cell receptor subunits. In the periphery, GATA3 induced a large number of Th2 cell-specific as well as Th2 cell-nonspecific genes, including several transcription factors. Our data also indicate that GATA3 regulates both active and repressive histone modifications of many target genes at their regulatory elements near GATA3 binding sites. Overall, although GATA3 binding exhibited both shared and cell-specific patterns among various T cell lineages, many genes were either positively or negatively regulated by GATA3 in a cell type-specific manner, suggesting that GATA3-mediated gene regulation depends strongly on cofactors existing in different T cells.
Immunity 08/2011; 35(2):299-311. · 21.64 Impact Factor
-
ABSTRACT: Maintaining a steady pool of self-renewing hematopoietic stem cells (HSCs) is critical for sustained production of multiple blood lineages. Many transcription factors and molecules involved in chromatin and epigenetic modifications have been found to be critical for HSC self-renewal and differentiation; however, their interplay is less understood. The transcription factor GA binding protein (GABP), consisting of DNA-binding subunit GABPα and transactivating subunit GABPβ, is essential for lymphopoiesis as shown in our previous studies. Here we demonstrate cell-intrinsic, absolute dependence on GABPα for maintenance and differentiation of hematopoietic stem/progenitor cells. Through genome-wide mapping of GABPα binding and transcriptomic analysis of GABPα-deficient HSCs, we identified Zfx and Etv6 transcription factors and prosurvival Bcl-2 family members including Bcl-2, Bcl-X(L), and Mcl-1 as direct GABP target genes, underlying its pivotal role in HSC survival. GABP also directly regulates Foxo3 and Pten and hence sustains HSC quiescence. Furthermore, GABP activates transcription of DNA methyltransferases and histone acetylases including p300, contributing to regulation of HSC self-renewal and differentiation. These systematic analyses revealed a GABP-controlled gene regulatory module that programs multiple aspects of HSC biology. Our studies thus constitute a critical first step in decoding how transcription factors are orchestrated to regulate maintenance and multipotency of HSCs.
Blood 02/2011; 117(7):2166-78. · 9.90 Impact Factor
-
LiQi Li, Raja Jothi, Kairong Cui, Jan Y Lee, Tsadok Cohen, Marat Gorivodsky, Itai Tzchori, Yangu Zhao, Sandra M Hayes, Emery H Bresnick, Keji Zhao, Heiner Westphal, Paul E Love
ABSTRACT: The nuclear adaptor Ldb1 functions as a core component of multiprotein transcription complexes that regulate differentiation in diverse cell types. In the hematopoietic lineage, Ldb1 forms a complex with the non-DNA-binding adaptor Lmo2 and the transcription factors E2A, Scl and GATA-1 (or GATA-2). Here we demonstrate a critical and continuous requirement for Ldb1 in the maintenance of both fetal and adult mouse hematopoietic stem cells (HSCs). Deletion of Ldb1 in hematopoietic progenitors resulted in the downregulation of many transcripts required for HSC maintenance. Genome-wide profiling by chromatin immunoprecipitation followed by sequencing (ChIP-Seq) identified Ldb1 complex-binding sites at highly conserved regions in the promoters of genes involved in HSC maintenance. Our results identify a central role for Ldb1 in regulating the transcriptional program responsible for the maintenance of HSCs.
Nature Immunology 02/2011; 12(2):129-36. · 26.01 Impact Factor
-
ABSTRACT: DOMINE is a comprehensive collection of known and predicted domain-domain interactions (DDIs) compiled from 15 different sources. The updated DOMINE includes 2285 new DDIs inferred from experimentally characterized high-resolution three-dimensional structures, and about 3500 novel predictions by five computational approaches published over the last 3 years. These additions bring the total number of unique DDIs in the updated version to 26,219 among 5140 unique Pfam domains, a 23% increase compared to 20,513 unique DDIs among 4346 unique domains in the previous version. The updated version now contains 6634 known DDIs, and features a new classification scheme to assign confidence levels to predicted DDIs. DOMINE will serve as a valuable resource to those studying protein and domain interactions. Most importantly, DOMINE will serve as an excellent reference not only to bench scientists testing for new interactions but also to bioinformaticians seeking to predict novel protein-protein interactions based on the DDIs. The contents of DOMINE are available at http://domine.utdallas.edu.
Nucleic Acids Research 01/2011; 39(Database issue):D730-5. · 8.03 Impact Factor
-
ABSTRACT: Signalling by the cytokine LIF and its downstream transcription factor, STAT3, prevents differentiation of pluripotent embryonic stem cells (ESCs). This contrasts with most cell types where STAT3 signalling induces differentiation. We find that STAT3 binding across the pluripotent genome is dependent on Brg1, the ATPase subunit of a specialized chromatin remodelling complex (esBAF) found in ESCs. Brg1 is required to establish chromatin accessibility at STAT3 binding targets, preparing these sites to respond to LIF signalling. Brg1 deletion leads to rapid polycomb (PcG) binding and H3K27me3-mediated silencing of many Brg1-activated targets genome wide, including the target genes of the LIF signalling pathway. Hence, one crucial role of Brg1 in ESCs involves its ability to potentiate LIF signalling by opposing PcG. Contrary to expectations, Brg1 also facilitates PcG function at classical PcG targets, including all four Hox loci, reinforcing their repression in ESCs. Therefore, esBAF does not simply antagonize PcG. Rather, the two chromatin regulators act both antagonistically and synergistically with the common goal of supporting pluripotency.
Nature Cell Biology 01/2011; 13(8):903-13. · 19.49 Impact Factor
-
ABSTRACT: GA binding protein (GABP) consists of GABPα and GABPβ subunits. GABPα is a member of Ets family transcription factors and binds DNA via its conserved Ets domain, whereas GABPβ does not bind DNA but possesses transactivation activity. In T cells, GABP has been demonstrated to regulate the gene expression of interleukin-7 receptor α chain (IL-7Rα) and postulated to be critical in T cell development. To directly investigate its function in early thymocyte development, we used GABPα conditional knock-out mice where the exons encoding the Ets DNA-binding domain are flanked with LoxP sites. Ablation of GABPα with the Lck-Cre transgene greatly diminished thymic cellularity, blocked thymocyte development at the double negative 3 (DN3) stage, and resulted in reduced expression of T cell receptor (TCR) β chain in DN4 thymocytes. By chromatin immunoprecipitation, we demonstrated in DN thymocytes that GABPα is associated with transcription initiation sites of genes encoding key molecules in TCR rearrangements. Among these GABP-associated genes, knockdown of GABPα expression by RNA interference diminished expression of DNA ligase IV, Artemis, and Ku80 components in DNA-dependent protein kinase complex. Interestingly, forced expression of prearranged TCR but not IL-7Rα can alleviate the DN3 block in GABPα-targeted mice. Our observations collectively indicate that in addition to regulating IL-7Rα expression, GABP is critically required for TCR rearrangements and hence normal T cell development.
Journal of Biological Chemistry 04/2010; 285(14):10179-10188. · 4.77 Impact Factor
-
ABSTRACT: Protein-protein interactions (PPIs), though extremely valuable towards a better understanding of protein functions and cellular processes, do not provide any direct information about the regions/domains within the proteins that mediate the interaction. Most often, it is only a fraction of a protein that directly interacts with its biological partners. Thus, understanding interaction at the domain level is a critical step towards (i) thorough understanding of PPI networks; (ii) precise identification of binding sites; (iii) acquisition of insights into the causes of deleterious mutations at interaction sites; and (iv) most importantly, development of drugs to inhibit pathological protein interactions. In addition, knowledge derived from known domain-domain interactions (DDIs) can be used to understand binding interfaces, which in turn can help discover unknown PPIs.
Here, we describe a novel method called K-GIDDI (knowledge-guided inference of DDIs) to narrow down the PPI sites to smaller regions/domains. K-GIDDI constructs an initial DDI network from cross-species PPI networks, and then expands the DDI network by inferring additional DDIs using a divide-and-conquer biclustering algorithm guided by Gene Ontology (GO) information, which identifies partial-complete bipartite sub-networks in the DDI network and makes them complete bipartite sub-networks by adding edges. Our results indicate that K-GIDDI can reliably predict DDIs. Most importantly, K-GIDDI's novel network expansion procedure allows prediction of DDIs that are otherwise not identifiable by methods that rely only on PPI data.
http://www.ittc.ku.edu/~xwchen/domainNetwork/ddinet.html
Bioinformatics 09/2009; 25(19):2492-9. · 5.47 Impact Factor
-
ABSTRACT: Chromatin modifications have been implicated in the regulation of gene expression. While association of certain modifications with expressed or silent genes has been established, it remains unclear how changes in chromatin environment relate to changes in gene expression. In this article, we used ChIP-seq (chromatin immunoprecipitation with massively parallel sequencing) to analyze the genome-wide changes in chromatin modifications during activation of total human CD4(+) T cells by T-cell receptor (TCR) signaling. Surprisingly, we found that the chromatin modification patterns at many induced and silenced genes are relatively stable during the short-term activation of resting T cells. Active chromatin modifications were already in place for a majority of inducible protein-coding genes, even while the genes were silent in resting cells. Similarly, genes that were silenced upon T-cell activation retained positive chromatin modifications even after being silenced. To investigate if these observations are also valid for miRNA-coding genes, we systematically identified promoters for known miRNA genes using epigenetic marks and profiled their expression patterns using deep sequencing. We found that chromatin modifications can poise miRNA-coding genes as well. Our data suggest that miRNA- and protein-coding genes share similar mechanisms of regulation by chromatin modifications, which poise inducible genes for activation in response to environmental stimuli.
Genome Research 09/2009; 19(10):1742-51. · 13.61 Impact Factor
-
ABSTRACT: Polycomb group (PcG) proteins control organism development by regulating the expression of developmental genes. Transcriptional regulation by PcG proteins is achieved, at least partly, through the PRC2-mediated methylation on lysine 27 of histone H3 (H3K27) and PRC1-mediated ubiquitylation on lysine 119 of histone H2A (uH2A). As an integral component of PRC1, Bmi1 has been demonstrated to be critical for H2A ubiquitylation. Although recent studies have revealed the genome-wide binding patterns of some of the PRC1 and PRC2 components, as well as the H3K27me3 mark, there have been no reports describing genome-wide localization of uH2A. Using the recently developed ChIP-Seq technology, here, we report genome-wide localization of the Bmi1-dependent uH2A mark in MEF cells. Gene promoter averaging analysis indicates a peak of uH2A just inside the transcription start site (TSS) of well-annotated genes. This peak is enriched at promoters containing the H3K27me3 mark and represents the least expressed genes in WT MEF cells. In addition, peak finding reveals regions of local uH2A enrichment throughout the mouse genome, including almost 700 gene promoters. Genes with promoter peaks of uH2A exhibit lower-level expression when compared to genes that do not contain promoter peaks of uH2A. Moreover, we demonstrate that genes with uH2A peaks have increased expression upon Bmi1 knockout. Importantly, local enrichment of uH2A is not limited to regions containing the H3K27me3 mark. We describe the enrichment of H2A ubiquitylation at high-density CpG promoters and provide evidence to suggest that DNA methylation may be linked to uH2A at these regions. Thus, our work not only reveals Bmi1-dependent H2A ubiquitylation, but also suggests that uH2A targeting in differentiated cells may employ a different mechanism from that in ES cells.
PLoS Genetics 07/2009; 5(6):e1000506. · 8.69 Impact Factor
-
ABSTRACT: Distinctive SWI/SNF-like ATP-dependent chromatin remodeling esBAF complexes are indispensable for the maintenance and pluripotency of mouse embryonic stem (ES) cells [Ho L, et al. (2009) Proc Natl Acad Sci USA 10.1073/pnas.0812889106]. To understand the mechanism underlying the roles of these complexes in ES cells, we performed high-resolution genome-wide mapping of the core ATPase subunit, Brg, using ChIP-Seq technology. We find that esBAF, as represented by Brg, binds to genes encoding components of the core ES transcriptional circuitry, including Polycomb group proteins. esBAF colocalizes extensively with transcription factors Oct4, Sox2 and Nanog genome-wide, and shows distinct functional interactions with Oct4 and Sox2 at its target genes. Surprisingly, no significant colocalization of esBAF with PRC2 complexes, represented by Suz12, is observed. Lastly, esBAF colocalizes with Stat3 and Smad1 genome-wide, consistent with a direct and critical role in LIF and BMP signaling for maintaining self-renewal. Taken together, our studies indicate that esBAF is an essential component of the core pluripotency transcriptional network, and might also be a critical component of the LIF and BMP signaling pathways essential for maintenance of self-renewal and pluripotency.
Proceedings of the National Academy of Sciences 04/2009; 106(13):5187-91. · 9.68 Impact Factor
-
ABSTRACT: Although several studies have provided important insights into the general principles of biological networks, the link between network organization and the genome-scale dynamics of the underlying entities (genes, mRNAs, and proteins) and its role in systems behavior remain unclear. Here we show that transcription factor (TF) dynamics and regulatory network organization are tightly linked. By classifying TFs in the yeast regulatory network into three hierarchical layers (top, core, and bottom) and integrating diverse genome-scale datasets, we find that the TFs have static and dynamic properties that are similar within a layer and different across layers. At the protein level, the top-layer TFs are relatively abundant, long-lived, and noisy compared with the core- and bottom-layer TFs. Although variability in expression of top-layer TFs might confer a selective advantage, as this permits at least some members in a clonal cell population to initiate a response to changing conditions, tight regulation of the core- and bottom-layer TFs may minimize noise propagation and ensure fidelity in regulation. We propose that the interplay between network organization and TF dynamics could permit differential utilization of the same underlying network by distinct members of a clonal cell population.
Molecular Systems Biology 02/2009; 5:294. · 8.63 Impact Factor
-
ABSTRACT: Insulators are DNA elements that prevent inappropriate interactions between the neighboring regions of the genome. They can be functionally classified as either enhancer blockers or domain barriers. CTCF (CCCTC-binding factor) is the only known major insulator-binding protein in the vertebrates and has been shown to bind many enhancer-blocking elements. However, it is not clear whether it plays a role in chromatin domain barriers between active and repressive domains. Here, we used ChIP-seq to map the genome-wide binding sites of CTCF in three cell types and identified significant binding of CTCF to the boundaries of repressive chromatin domains marked by H3K27me3. Although we find an extensive overlapping of CTCF-binding sites across the three cell types, its association with the domain boundaries is cell-type-specific. We further show that the nucleosomes flanking CTCF-binding sites are well positioned. Interestingly, we found a complementary pattern between the repressive H3K27me3 and the active H2AK5ac regions, which are separated by CTCF. Our data indicate that CTCF may play important roles in the barrier activity of insulators, and this study provides a resource for further investigation of the CTCF function in organizing chromatin in the human genome.
Genome Research 01/2009; 19(1):24-32. · 13.61 Impact Factor
-
ABSTRACT: ChIP-Seq, which combines chromatin immunoprecipitation (ChIP) with ultra high-throughput massively parallel sequencing, is increasingly being used for mapping protein-DNA interactions in vivo on a genome scale. Typically, short sequence reads from ChIP-Seq are mapped to a reference genome for further analysis. Although genomic regions enriched with mapped reads could be inferred as approximate binding regions, short read lengths (approximately 25-50 nt) pose challenges for determining the exact binding sites within these regions. Here, we present SISSRs (Site Identification from Short Sequence Reads), a novel algorithm for precise identification of binding sites from short reads generated from ChIP-Seq experiments. The sensitivity and specificity of SISSRs are demonstrated by applying it to ChIP-Seq data for three widely studied and well-characterized human transcription factors: CTCF (CCCTC-binding factor), NRSF (neuron-restrictive silencer factor) and STAT1 (signal transducer and activator of transcription protein 1). We identified 26 814, 5813 and 73 956 binding sites for CTCF, NRSF and STAT1 proteins, respectively, which is 32, 299 and 78% more than that inferred previously for the respective proteins. Motif analysis revealed that an overwhelming majority of the identified binding sites contained the previously established consensus binding sequence for the respective proteins, thus attesting to SISSRs' accuracy. SISSRs' sensitivity and precision facilitated further analyses of ChIP-Seq data revealing interesting insights, which we believe will serve as guidance for designing ChIP-Seq experiments to map in vivo protein-DNA interactions. We also show that tag densities at the binding sites are a good indicator of protein-DNA binding affinity, which could be used to distinguish and characterize strong and weak binding sites. Using tag density as an indicator of DNA-binding affinity, we have identified core residues within the NRSF and CTCF binding sites that are critical for a stronger DNA binding.
Nucleic Acids Research 10/2008; 36(16):5221-31. · 8.03 Impact Factor