A Machine Learning Approach for Identifying Novel Cell Type–Specific Transcriptional Regulators of Myogenesis

Article in PLoS Genetics 8(3):e1002531 · March 2012
DOI: 10.1371/journal.pgen.1002531 · Source: PubMed
Abstract
Transcriptional enhancers integrate the contributions of multiple classes of transcription factors (TFs) to orchestrate the myriad spatio-temporal gene expression programs that occur during development. A molecular understanding of enhancers with similar activities requires the identification of both their unique and their shared sequence features. To address this problem, we combined phylogenetic profiling with a DNA-based enhancer sequence classifier that analyzes the TF binding sites (TFBSs) governing the transcription of a co-expressed gene set. We first assembled a small number of enhancers that are active in Drosophila melanogaster muscle founder cells (FCs) and other mesodermal cell types. Using phylogenetic profiling, we increased the number of enhancers by incorporating orthologous but divergent sequences from other Drosophila species. Functional assays revealed that the diverged enhancer orthologs were active in patterns largely similar to those of their D. melanogaster counterparts, although there was extensive evolutionary shuffling of known TFBSs. We then built and trained a classifier using this enhancer set and identified additional related enhancers based on the presence or absence of known and putative TFBSs. Predicted FC enhancers were over-represented in proximity to known FC genes, and many of the TFBSs learned by the classifier were found to be critical for enhancer activity, including POU homeodomain, Myb, Ets, Forkhead, and T-box motifs. Empirical testing also revealed that the T-box TF encoded by org-1 is a previously uncharacterized regulator of muscle cell identity. Finally, we found extensive diversity in the composition of TFBSs within known FC enhancers, suggesting that motif combinatorics plays an essential role in the cellular specificity exhibited by such enhancers.
In summary, machine learning combined with evolutionary sequence analysis is useful for recognizing novel TFBSs and for facilitating the identification of cognate TFs that coordinate cell type-specific developmental gene expression patterns.
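The idea of classifying enhancers by the presence or absence of TFBSs can be illustrated with a minimal sketch. It is not the authors' classifier: the abstract does not name the model, so a Bernoulli naive Bayes is swapped in purely for illustration. The motif names and consensus strings in `TOY_MOTIFS`, and the helper functions, are hypothetical placeholders rather than motifs or code from the paper.

```python
import math
import re

# Hypothetical toy motifs (IUPAC consensus strings); NOT taken from the paper.
TOY_MOTIFS = {
    "motifA": "TGACGT",
    "motifB": "GGAAR",   # R = A or G
    "motifC": "AGGTGW",  # W = A or T
}

# IUPAC nucleotide codes translated to regular-expression character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
         "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def presence_vector(seq, motifs=TOY_MOTIFS):
    """Binary feature vector: 1 if a motif occurs on either strand, else 0."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    features = []
    for consensus in motifs.values():
        pattern = re.compile("".join(IUPAC[base] for base in consensus))
        features.append(int(bool(pattern.search(seq) or pattern.search(rc))))
    return features


def train_bernoulli_nb(X, y):
    """Fit per-class motif-presence probabilities with Laplace smoothing."""
    model = {}
    n_features = len(X[0])
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        log_prior = math.log(len(rows) / len(X))
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                 for j in range(n_features)]
        model[label] = (log_prior, probs)
    return model


def predict(model, x):
    """Return the class label with the highest log-posterior score."""
    def score(label):
        log_prior, probs = model[label]
        return log_prior + sum(math.log(p if xi else 1.0 - p)
                               for xi, p in zip(x, probs))
    return max(model, key=score)
```

In such a sketch, each training enhancer (including orthologs from other species) would contribute one presence/absence vector labelled as enhancer or background, and candidate genomic windows would then be scored with `predict`.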
    • "The following mutant alleles, deficiencies and transgenes were used: Df(1) CHES-1-like1, Df(3R)Exel6157 and Ubi-polo (Ahmad et al., 2012); twi-GAL4 (Greig and Akam, 1993); Mef2-GAL4 (Ranganayakulu et al., 1996); tinD-GAL4 (Yin et al., 1997); Hand-GAL4 (Han and Olson, 2005); UAS-2EGFP (Halfon et al., 2002); UAS-DN-htl (Michelson et al., 1998b); htl AB42 (Gisselbrecht et al., 1996); stumps YY202 (Michelson et al., 1998a); UAS-htl (Michelson et al., 1998b); fz R52 (Jones et al., 1996); CHES-1-like KK101264 (Dietzl et al., 2007); jumu GL00363 (Ni et al., 2011) and UAS-fz1-1 (Boutros et al., 2000). Wild-type and mutated ChIPCRM2610 enhancers were synthesized by Integrated DNA Technologies, cloned into the pWattB-nlacZ vector (Busser et al., 2012) and targeted to the attP40 site (Markstein et al., 2008) by phiC31-mediated integration (Groth et al., 2004) to create the ChIPCRM2610 WT and ChIPCRM2610 Fkh lacZ reporter-driving lines."
    ABSTRACT: Cardiogenesis involves the coordinated regulation of multiple biological processes by a finite set of transcription factors (TFs). Here we show that the Forkhead TFs, Checkpoint suppressor homologue (CHES-1-like) and Jumeau (Jumu), which govern cardiac progenitor cell divisions by regulating Polo kinase activity, play an additional, mutually redundant role in specifying the cardiac mesoderm (CM) since eliminating the functions of both Forkhead genes in the same embryo results in defective hearts with missing hemisegments. This process is mediated by the Forkhead TFs regulating the fibroblast growth factor receptor Heartless (Htl) and the Wnt receptor Frizzled (Fz): CHES-1-like and jumu exhibit synergistic genetic interactions with htl and fz in CM specification, thereby implying function through the same genetic pathways, and transcriptionally activate the expression of both receptor-encoding genes. Furthermore, ectopic overexpression of either htl or fz in the mesoderm partially rescues the defective CM specification phenotype in embryos lacking both Forkhead genes. Together, these data emphasize the functional redundancy that leads to robustness in the cardiac progenitor specification process, and illustrate the pleiotropic functions of Forkhead TFs in different aspects of cardiogenesis.
    Full-text · Article · Jan 2016
    • "The efficiency of RNAi knockdown was enhanced by both collecting embryos at 29°C and using UAS-dcr2. Embryo collections, fixation and antibody stains followed standard procedures [19–23]. The Zfh1 antibody was used at 1:1500 and was provided by J. Skeath (Washington University, St Louis, MO) and the Tin antibody was created according to previously published procedures and used at 1:500 [18]."
    ABSTRACT: Here we used predictive gene expression signatures within a multi-species framework to identify the genes that underlie cardiac cell fate decisions in differentiating embryonic stem cells. We show that the overlapping orthologous mouse and human genes are the most accurate candidate cardiogenic genes as these genes identified the most conserved developmental pathways that characterize the cardiac lineage. An RNAi-based screen of the candidate genes in Drosophila uncovered numerous novel cardiogenic genes. shRNA knockdown combined with transcriptome profiling of the newly-identified transcription factors zinc finger protein 503 and zinc finger E-box binding homeobox 2 and the well-known cardiac regulatory factor NK2 homeobox 5 revealed that zinc finger E-box binding homeobox 2 activates terminal differentiation genes required for cardiomyocyte structure and function whereas zinc finger protein 503 and NK2 homeobox 5 are required for specification of the cardiac lineage. We further demonstrated that an essential role of NK2 homeobox 5 and zinc finger protein 503 in specification of the cardiac lineage is the repression of gene expression programs characteristic of alternative cell fates. Collectively, these results show that orthologous gene expression signatures can be used to identify conserved cardiogenic pathways.
    Full-text · Article · Oct 2015
    • "A single gene may be regulated by multiple enhancers in the same cell type, and such regulatory relationships have been shown to span large genomic distances [9]. Methods that predict active enhancers [10–16] have observed widespread changes in enhancer activity in different cell types [17]. It has been suggested that differential enhancer usage implements both cell-state specific and cell-state independent gene regulation [18]."
    ABSTRACT: RNA Polymerase II ChIA-PET data has revealed enhancers that are active in a profiled cell type and the genes that the enhancers regulate through chromatin interactions. The most commonly used computational method for analyzing ChIA-PET data, the ChIA-PET Tool, discovers interaction anchors at a spatial resolution that is insufficient to accurately identify individual enhancers. We introduce Germ, a computational method that estimates the likelihood that any two narrowly defined genomic locations are jointly occupied by RNA Polymerase II. Germ takes a blind deconvolution approach to simultaneously estimate the likelihood of RNA Polymerase II occupation as well as a model of the arrangement of read alignments relative to locations occupied by RNA Polymerase II. Both types of information are utilized to estimate the likelihood that RNA Polymerase II jointly occupies any two genomic locations. We apply Germ to RNA Polymerase II ChIA-PET data from embryonic stem cells to identify the genomic locations that are jointly occupied along with transcription start sites. We show that these genomic locations align more closely with features of active enhancers measured by ChIP-Seq than the locations identified using the ChIA-PET Tool. We also apply Germ to RNA Polymerase II ChIA-PET data from motor neuron progenitors. Based on the Germ results, we observe that a combination of cell type specific and cell type independent regulatory interactions are utilized by cells to regulate gene expression.
    Full-text · Article · May 2015