Article

A Machine Learning Approach for Identifying Novel Cell Type–Specific Transcriptional Regulators of Myogenesis

Laboratory of Developmental Systems Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, United States of America.
PLoS Genetics (Impact Factor: 7.53). 03/2012; 8(3):e1002531. DOI: 10.1371/journal.pgen.1002531
Source: PubMed

ABSTRACT

Transcriptional enhancers integrate the contributions of multiple classes of transcription factors (TFs) to orchestrate the myriad spatio-temporal gene expression programs that occur during development. A molecular understanding of enhancers with similar activities requires the identification of both their unique and their shared sequence features. To address this problem, we combined phylogenetic profiling with a DNA-based enhancer sequence classifier that analyzes the TF binding sites (TFBSs) governing the transcription of a co-expressed gene set. We first assembled a small number of enhancers that are active in Drosophila melanogaster muscle founder cells (FCs) and other mesodermal cell types. Using phylogenetic profiling, we increased the number of enhancers by incorporating orthologous but divergent sequences from other Drosophila species. Functional assays revealed that the diverged enhancer orthologs were active in patterns largely similar to those of their D. melanogaster counterparts, although there was extensive evolutionary shuffling of known TFBSs. We then built and trained a classifier using this enhancer set and identified additional related enhancers based on the presence or absence of known and putative TFBSs. Predicted FC enhancers were over-represented in proximity to known FC genes, and many of the TFBSs learned by the classifier were found to be critical for enhancer activity, including POU homeodomain, Myb, Ets, Forkhead, and T-box motifs. Empirical testing also revealed that the T-box TF encoded by org-1 is a previously uncharacterized regulator of muscle cell identity. Finally, we found extensive diversity in the composition of TFBSs within known FC enhancers, suggesting that motif combinatorics plays an essential role in the cellular specificity exhibited by such enhancers.
In summary, machine learning combined with evolutionary sequence analysis is useful for recognizing novel TFBSs and for facilitating the identification of cognate TFs that coordinate cell type-specific developmental gene expression patterns.
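
As a rough illustration of the "presence or absence of known and putative TFBSs" representation mentioned in the abstract, the sketch below encodes a DNA sequence as a binary vector over a set of motifs. It is only a minimal sketch under stated assumptions: the motif names and patterns (`TOY_MOTIFS`), the helper functions, and the example sequences are all hypothetical placeholders, and the abstract does not specify the actual motif models or classifier used in the study.

```python
import re

# Hypothetical toy motifs written as regular expressions. These are
# placeholders for illustration only, not motifs from the study.
TOY_MOTIFS = {
    "motif_A": re.compile(r"TGACA"),
    "motif_B": re.compile(r"GG[AT]A"),
    "motif_C": re.compile(r"CACCT"),
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def presence_vector(seq, motifs=TOY_MOTIFS):
    """Return a 0/1 list with one entry per motif.

    An entry is 1 if the motif pattern matches anywhere on either
    strand of the sequence, and 0 otherwise.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    return [int(bool(p.search(seq) or p.search(rc))) for p in motifs.values()]


# Hypothetical example sequences (not real enhancers).
toy_enhancers = {"enh1": "AATGACATTGGAAC", "enh2": "CCCCAGGTGCCC"}
features = {name: presence_vector(s) for name, s in toy_enhancers.items()}
```

Vectors of this kind could serve as input features to a generic sequence classifier; which learning algorithm and motif scoring scheme the authors used is not stated in the abstract.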

Available from: Alan M Michelson
  • Source
    • "The following mutant alleles, deficiencies and transgenes were used: Df(1)CHES-1-like1, Df(3R)Exel6157 and Ubi-polo (Ahmad et al., 2012); twi-GAL4 (Greig and Akam, 1993); Mef2-GAL4 (Ranganayakulu et al., 1996); tinD-GAL4 (Yin et al., 1997); Hand-GAL4 (Han and Olson, 2005); UAS-2EGFP (Halfon et al., 2002); UAS-DN-htl (Michelson et al., 1998b); htl AB42 (Gisselbrecht et al., 1996); stumps YY202 (Michelson et al., 1998a); UAS-htl (Michelson et al., 1998b); fz R52 (Jones et al., 1996); CHES-1-like KK101264 (Dietzl et al., 2007); jumu GL00363 (Ni et al., 2011) and UAS-fz1-1 (Boutros et al., 2000). Wild-type and mutated ChIPCRM2610 enhancers were synthesized by Integrated DNA Technologies, cloned into the pWattB-nlacZ vector (Busser et al., 2012) and targeted to the attP40 site (Markstein et al., 2008) by phiC31-mediated integration (Groth et al., 2004) to create the ChIPCRM2610 WT and ChIPCRM2610 Fkh lacZ reporter-driving lines. "
    ABSTRACT: Cardiogenesis involves the coordinated regulation of multiple biological processes by a finite set of transcription factors (TFs). Here we show that the Forkhead TFs, Checkpoint suppressor homologue (CHES-1-like) and Jumeau (Jumu), which govern cardiac progenitor cell divisions by regulating Polo kinase activity, play an additional, mutually redundant role in specifying the cardiac mesoderm (CM) since eliminating the functions of both Forkhead genes in the same embryo results in defective hearts with missing hemisegments. This process is mediated by the Forkhead TFs regulating the fibroblast growth factor receptor Heartless (Htl) and the Wnt receptor Frizzled (Fz): CHES-1-like and jumu exhibit synergistic genetic interactions with htl and fz in CM specification, thereby implying function through the same genetic pathways, and transcriptionally activate the expression of both receptor-encoding genes. Furthermore, ectopic overexpression of either htl or fz in the mesoderm partially rescues the defective CM specification phenotype in embryos lacking both Forkhead genes. Together, these data emphasize the functional redundancy that leads to robustness in the cardiac progenitor specification process, and illustrate the pleiotropic functions of Forkhead TFs in different aspects of cardiogenesis.
    Full-text · Article · Jan 2016 · Development
  • Source
    • "We next examined mid expression in the progenitors for LO1/ventral transverse 1 (VT1) and the SBM, since these muscles appeared to exhibit defects in mid mutant embryos. The founder cells for LO1 and the SBM have been shown to express Org-1, while the SBM expresses Ladybird (Lb) (Busser et al. 2012; Schaub et al. 2012). Thus we used these markers to identify the LO1/VT1 and SBM progenitors. "
    ABSTRACT: Drosophila Midline (Mid) is an ortholog of vertebrate Tbx20, which plays roles in the developing heart, migrating cranial motor neurons and endothelial cells. Mid functions in cell fate specification and differentiation of tissues that include the ectoderm, cardioblasts, neuroblasts, and egg chambers; however, a role in the somatic musculature has not been described. We identified mid in genetic and molecular screens for factors contributing to somatic muscle morphogenesis. Mid is expressed in founder cells (FCs) for several muscle fibers, and functions cooperatively with the T-box protein H15 in lateral oblique muscle 1 and the segment border muscle. Mid is particularly important for the specification and development of the lateral transverse (LT) muscles LT3 and LT4, which arise by asymmetric division of a single muscle progenitor. Mid is expressed in this progenitor and its two sibling FCs, but is maintained only in the LT4 FC. Both muscles were frequently missing in mid mutant embryos, and LT4-associated expression of the transcription factor Krüppel (Kr) was lost. When present, LT4 adopted an LT3-like morphology. Coordinately, mid misexpression caused LT3 to adopt an LT4-like morphology and was associated with ectopic Kr expression. From these data, we concluded that mid functions first in the progenitor to direct development of LT3 and LT4, and later in the FCs to influence which of these differentiation profiles is selected. Mid is the first T-box factor shown to influence LT3 and LT4 muscle identity and, along with the T-box protein Optomotor-blind-related-gene-1 (Org-1), is representative of a new class of transcription factors in muscle specification. Copyright © 2015, The Genetics Society of America.
    Full-text · Article · Jan 2015 · Genetics