Joshua J Waterfall

NCI-Frederick, Maryland, United States

Publications (23) · 182.7 Total impact

  • Joshua J Waterfall, J Keith Killian, Paul S Meltzer
    ABSTRACT: Genetic mutations, metabolic dysfunction, and epigenetic misregulation are commonly considered to play distinct roles in tumor development and maintenance. However, intimate relationships between these mechanisms are now emerging. In particular, mutations in genes for the core metabolic enzymes IDH, SDH, and FH are significant drivers of diverse tumor types. In each case, the resultant accumulation of particular metabolites inhibits TET enzymes responsible for oxidizing 5-methylcytosine, leading to pervasive DNA hypermethylation.
    Biochemical and Biophysical Research Communications 08/2014; · 2.41 Impact Factor
  • ABSTRACT: To understand the genetic mechanisms driving variant and IGHV4-34-expressing hairy-cell leukemias, we performed whole-exome sequencing of leukemia samples from ten affected individuals, including six with matched normal samples. We identified activating mutations in the MAP2K1 gene (encoding MEK1) in 5 of these 10 samples and in 10 of 21 samples in a validation set (overall frequency of 15/31), suggesting potential new strategies for treating individuals with these diseases.
    Nature Genetics 11/2013; · 35.21 Impact Factor
  • ABSTRACT: Gastrointestinal stromal tumors (GIST) harbor driver mutations of signal transduction kinases such as KIT, or alternatively, manifest loss-of-function defects in the mitochondrial succinate dehydrogenase (SDH) complex, a component of the Krebs cycle and electron transport chain. We have uncovered a striking divergence between the DNA methylation profiles of SDH-deficient GIST (N=24) versus KIT tyrosine kinase pathway mutated GIST (N=39). Infinium 450K methylation array analysis of fixed (FFPE) tissues disclosed an order of magnitude greater genomic hypermethylation from gastric smooth muscle reference in SDH-deficient GIST versus the KIT mutant group (84.9K vs. 8.4K targets). Epigenomic divergence was further found among SDH-mutant paraganglioma/pheochromocytoma (N=29), a developmentally distinct SDH-deficient tumor system. Comparison of SDH-mutant GIST with isocitrate dehydrogenase (IDH)-mutant glioma, another Krebs-cycle defective tumor type, revealed comparable measures of global hypo- and hypermethylation. These data expose a vital connection between succinate metabolism and genomic DNA methylation during tumorigenesis, and generally implicate the mitochondrial Krebs cycle in nuclear epigenomic maintenance.
    Cancer Discovery 04/2013; · 10.14 Impact Factor
  • ABSTRACT: Recent genome-wide studies in metazoans have shown that RNA polymerase II (Pol II) accumulates to high densities on many promoters at a rate-limited step in transcription. However, the status of this Pol II remains an area of debate. Here, we compare quantitative outputs of a global run-on sequencing assay and chromatin immunoprecipitation sequencing assays and demonstrate that the majority of the Pol II on Drosophila promoters is transcriptionally engaged; very little exists in a preinitiation or arrested complex. These promoter-proximal polymerases are inhibited from further elongation by detergent-sensitive factors, and knockdown of negative elongation factor, NELF, reduces their levels. These results not only solidify the notion that pausing occurs at most promoters, but demonstrate that it is the major rate-limiting step in early transcription at these promoters. Finally, the divergent elongation complexes seen at mammalian promoters are far less prevalent in Drosophila, and this specificity in orientation correlates with directional core promoter elements, which are abundant in Drosophila.
    Cell Reports 10/2012; · 7.21 Impact Factor
  • ABSTRACT: Previously, we identified 8-bp-long DNA sequences (8-mers) that localize in human proximal promoters and grouped them into known transcription factor binding sites (TFBS). We now examine split 8-mers consisting of two 4-mers separated by 1 bp to 30 bp (X(4)-N(1-30)-X(4)) to identify pairs of TFBS that localize in proximal promoters at a precise distance. These include two overlapping TFBS: the ETS⇔ETS motif ((C/G)CCGGAAGCGGAA) and the ETS⇔CRE motif ((C/G)CGGAAGTGACGTCAC). The nucleotides in bold are part of both TFBS. Molecular modeling shows that the ETS⇔CRE motif can be bound simultaneously by both the ETS and the B-ZIP domains without protein-protein clashes. The electrophoretic mobility shift assay (EMSA) shows that the ETS protein GABPα and the B-ZIP protein CREB preferentially bind to the ETS⇔CRE motif only when the two TFBS overlap precisely. In contrast, the ETS domain of ETV5 and CREB interfere with each other for binding the ETS⇔CRE. The 11-mer (CGGAAGTGACG), the conserved part of the ETS⇔CRE motif, occurs 226 times in the human genome and 83% are in known regulatory regions. In vivo GABPα and CREB ChIP-seq peaks identified the ETS⇔CRE as the most enriched motif occurring in promoters of genes involved in mRNA processing, cellular catabolic processes, and stress response, suggesting that a specific class of genes is regulated by this composite motif.
    G3: Genes, Genomes, Genetics 10/2012; 2(10):1243-56. · 1.79 Impact Factor
  • Joshua J Waterfall, Paul S Meltzer
    ABSTRACT: Like many sarcomas, synovial sarcoma is driven by a characteristic oncogenic transcription factor fusion, SS18-SSX. In this issue of Cancer Cell, Su et al. elucidate the protein partners necessary for target gene misregulation and demonstrate a direct effect of histone deacetylase inhibitors on the SS18-SSX complex composition, expression misregulation, and apoptosis.
    Cancer Cell 03/2012; 21(3):323-4. · 25.29 Impact Factor
  • ABSTRACT: We broadly profiled DNA methylation in breast cancers (n = 351) and benign parenchyma (n = 47) for correspondence with disease phenotype, using FFPE diagnostic surgical pathology specimens. Exploratory analysis revealed a distinctive primary invasive carcinoma subclass featuring extreme global methylation deviation. Subsequently, we tested the correlation between methylation remodeling pervasiveness and malignant biological features. A methyl deviation index (MDI) was calculated for each lesion relative to terminal ductal-lobular unit baseline, and group comparisons revealed that high-grade and short-survival estrogen receptor-positive (ER(+)) cancers manifest a significantly higher MDI than low-grade and long-survival ER(+) cancers. In contrast, ER(-) cancers display a significantly lower MDI, revealing a striking epigenomic distinction between cancer hormone receptor subtypes. Kaplan-Meier survival curves of MDI-based risk classes showed significant divergence between low- and high-risk groups. MDI showed superior prognostic performance to crude methylation levels, and MDI retained prognostic significance (P < 0.01) in Cox multivariate analysis, including clinical stage and pathological grade. Most MDI targets individually are significant markers of ER(+) cancer survival. Lymphoid and mesenchymal indexes were not substantially different between ER(+) and ER(-) groups and do not explain MDI dichotomy. However, the mesenchymal index was associated with ER(+) cancer survival, and a high lymphoid index was associated with medullary carcinoma. Finally, a comparison between metastases and primary tumors suggests methylation patterns are established early and maintained through disease progression for both ER(+) and ER(-) tumors.
    American Journal of Pathology 07/2011; 179(1):55-65. · 4.60 Impact Factor
  • ABSTRACT: We report the immediate effects of estrogen signaling on the transcriptome of breast cancer cells using global run-on and sequencing (GRO-seq). The data were analyzed using a new bioinformatic approach that allowed us to identify transcripts directly from the GRO-seq data. We found that estrogen signaling directly regulates a strikingly large fraction of the transcriptome in a rapid, robust, and unexpectedly transient manner. In addition to protein-coding genes, estrogen regulates the distribution and activity of all three RNA polymerases and virtually every class of noncoding RNA that has been described to date. We also identified a large number of previously undetected estrogen-regulated intergenic transcripts, many of which are found proximal to estrogen receptor binding sites. Collectively, our results provide the most comprehensive measurement of the primary and immediate estrogen effects to date and a resource for understanding rapid signal-dependent transcription in other systems.
    Cell 05/2011; 145(4):622-34. · 31.96 Impact Factor
  • ABSTRACT: Transitions between pluripotent stem cells and differentiated cells are executed by key transcription regulators. Comparative measurements of RNA polymerase distribution over the genome's primary transcription units in different cell states can identify the genes and steps in the transcription cycle that are regulated during such transitions. To identify the complete transcriptional profiles of RNA polymerases with high sensitivity and resolution, as well as the critical regulated steps upon which regulatory factors act, we used genome-wide nuclear run-on (GRO-seq) to map the density and orientation of transcriptionally engaged RNA polymerases in mouse embryonic stem cells (ESCs) and mouse embryonic fibroblasts (MEFs). In both cell types, progression of a promoter-proximal, paused RNA polymerase II (Pol II) into productive elongation is a rate-limiting step in transcription of ∼40% of mRNA-encoding genes. Importantly, quantitative comparisons between cell types reveal that transcription is controlled frequently at paused Pol II's entry into elongation. Furthermore, "bivalent" ESC genes (exhibiting both active and repressive histone modifications) bound by Polycomb group complexes PRC1 (Polycomb-repressive complex 1) and PRC2 show dramatically reduced levels of paused Pol II at promoters relative to an average gene. In contrast, bivalent promoters bound by only PRC2 allow Pol II pausing, but it is confined to extremely 5' proximal regions. Altogether, these findings identify rate-limiting targets for transcription regulation during cell differentiation.
    Genes & Development 04/2011; 25(7):742-54. · 12.08 Impact Factor
  • Leighton J Core, Joshua J Waterfall, John T Lis
    ABSTRACT: RNA polymerases are highly regulated molecular machines. We present a method (global run-on sequencing, GRO-seq) that maps the position, amount, and orientation of transcriptionally engaged RNA polymerases genome-wide. In this method, nuclear run-on RNA molecules are subjected to large-scale parallel sequencing and mapped to the genome. We show that peaks of promoter-proximal polymerase reside on approximately 30% of human genes, transcription extends beyond pre-messenger RNA 3' cleavage, and antisense transcription is prevalent. Additionally, most promoters have an engaged polymerase upstream and in an orientation opposite to the annotated gene. This divergent polymerase is associated with active genes but does not elongate effectively beyond the promoter. These results imply that the interplay between polymerases and regulators over broad promoter regions dictates the orientation and efficiency of productive transcription.
    Science 01/2009; 322(5909):1845-8. · 31.20 Impact Factor
  • ABSTRACT: We demonstrate the use of a variational method to determine a quantitative lower bound on the rate of convergence of Markov chain Monte Carlo (MCMC) algorithms as a function of the target density and proposal density. The bound relies on approximating the second largest eigenvalue in the spectrum of the MCMC operator using a variational principle and the approach is applicable to problems with continuous state spaces. We apply the method to one-dimensional examples with Gaussian and quartic target densities, and we contrast the performance of the random walk Metropolis-Hastings algorithm with a "smart" variant that incorporates gradient information into the trial moves, a generalization of the Metropolis-adjusted Langevin algorithm. We find that the variational method agrees quite closely with numerical simulations. We also see that the smart MCMC algorithm often fails to converge geometrically in the tails of the target density except in the simplest case we examine, and even then care must be taken to choose the appropriate scaling of the deterministic and random parts of the proposed moves. Again, this calls into question the utility of smart MCMC in more complex problems. Finally, we apply the same method to approximate the rate of convergence in multidimensional Gaussian problems with and without importance sampling. There we demonstrate the necessity of importance sampling for target densities which depend on variables with a wide range of scales.
    Physical Review E 11/2008; 78(4 Pt 2):046704. · 2.31 Impact Factor
  • ABSTRACT: Successful predictions are among the most compelling validations of any model. Extracting falsifiable predictions from nonlinear multiparameter models is complicated by the fact that such models are commonly sloppy, possessing sensitivities to different parameter combinations that range over many decades. Here we discuss how sloppiness affects the sorts of data that best constrain model predictions, makes linear uncertainty approximations dangerous, and introduces computational difficulties in Monte-Carlo uncertainty analysis. We also present a useful test problem and suggest refinements to the standards by which models are communicated.
    Annals of the New York Academy of Sciences 01/2008; 1115:203-11. · 4.38 Impact Factor
  • ABSTRACT: Quantitative computational models play an increasingly important role in modern biology. Such models typically involve many free parameters, and assigning their values is often a substantial obstacle to model development. Directly measuring in vivo biochemical parameters is difficult, and collectively fitting them to other experimental data often yields large parameter uncertainties. Nevertheless, in earlier work we showed in a growth-factor-signaling model that collective fitting could yield well-constrained predictions, even when it left individual parameters very poorly constrained. We also showed that the model had a "sloppy" spectrum of parameter sensitivities, with eigenvalues roughly evenly distributed over many decades. Here we use a collection of models from the literature to test whether such sloppy spectra are common in systems biology. Strikingly, we find that every model we examine has a sloppy spectrum of sensitivities. We also test several consequences of this sloppiness for building predictive models. In particular, sloppiness suggests that collective fits to even large amounts of ideal time-series data will often leave many parameters poorly constrained. Tests over our model collection are consistent with this suggestion. This difficulty with collective fits may seem to argue for direct parameter measurements, but sloppiness also implies that such measurements must be formidably precise and complete to usefully constrain many model predictions. We confirm this implication in our growth-factor-signaling model. Our results suggest that sloppy sensitivity spectra are universal in systems biology models. The prevalence of sloppiness highlights the power of collective fits and suggests that modelers should focus on predictions rather than on parameters.
    PLoS Computational Biology 11/2007; 3(10):1871-78. · 4.87 Impact Factor
  • ABSTRACT: We apply the methods of optimal experimental design to a differential equation model for epidermal growth factor receptor signalling, trafficking and down-regulation. The model incorporates the role of a recently discovered protein complex made up of the E3 ubiquitin ligase, Cbl, the guanine exchange factor (GEF), Cool-1 (β-Pix) and the Rho family G protein Cdc42. The complex has been suggested to be important in disrupting receptor down-regulation. We demonstrate that the model interactions can accurately reproduce the experimental observations, that they can be used to make predictions with accompanying uncertainties, and that we can apply ideas of optimal experimental design to suggest new experiments that reduce the uncertainty on unmeasurable components of the system.
    IET Systems Biology 06/2007; 1(3):190-202. · 1.54 Impact Factor
  • ABSTRACT: Directly measuring the parameters involved in dynamical models of cellular processes is typically very difficult, and collectively fitting such parameters to other data often yields large parameter uncertainties. Nonetheless, a collective fit which only weakly constrains model parameters may strongly constrain model predictions, if the model is ill-conditioned: much more sensitive to some directions in parameter space than others. In the quadratic approximation, the model sensitivities are proportional to the inverse square roots of the Hessian matrix eigenvalues. Using a collection of 14 models from the systems biology literature, we show that for large systems the eigenvalue spectra are universally sloppy; they span huge ranges (> 10^6) and have approximately constant logarithmic spacing. Thus the models are ill-conditioned and have no well-defined cutoff between important and unimportant parameter combinations. This universal sloppiness suggests that collective fits will often poorly constrain parameters but usefully constrain many predictions.
    01/2007;
  • PLOS Comput Biol. 01/2007; 3(10):e189.
  • ABSTRACT: We apply the methods of optimal experimental design to a differential equation model for epidermal growth factor receptor (EGFR) signaling, trafficking, and down-regulation. The model incorporates the role of a recently discovered protein complex made up of the E3 ubiquitin ligase, Cbl, the guanine exchange factor (GEF), Cool-1 (β-Pix), and the Rho family G protein Cdc42. The complex has been suggested to be important in disrupting receptor down-regulation. We demonstrate that the model interactions can accurately reproduce the experimental observations, that they can be used to make predictions with accompanying uncertainties, and that we can apply ideas of optimal experimental design to suggest new experiments that reduce the uncertainty on unmeasurable components of the system.
    11/2006;
  • ABSTRACT: In a variety of contexts, physicists study complex, nonlinear models with many unknown or tunable parameters to explain experimental data. We explain why such systems so often are sloppy: the system behavior depends only on a few "stiff" combinations of the parameters and is unchanged as other "sloppy" parameter combinations vary by orders of magnitude. We observe that the eigenvalue spectra for the sensitivity of sloppy models have a striking, characteristic form with a density of logarithms of eigenvalues which is roughly constant over a large range. We suggest that the common features of sloppy models indicate that they may belong to a common universality class. In particular, we motivate focusing on a Vandermonde ensemble of multiparameter nonlinear models and show in one limit that they exhibit the universal features of sloppy models.
    Physical Review Letters 11/2006; 97(15):150601. · 7.73 Impact Factor
  • ABSTRACT: We apply the ideas of optimal experimental design to systems biology models: minimizing a design criterion based on the average variance of predictions, we suggest new experiments that need to be performed to optimally test a given biological hypothesis. The estimated variance in predictions is derived from the sensitivities of protein and chemical species in the model to changes in reaction rates. The sensitivities also allow us to determine which interactions in the biological network dominate the system behavior. To test the design principles, we have developed a differential equation model incorporating the processes of endocytosis, recycling and degradation of activated epidermal growth factor (EGF) receptor in a mammalian cell line. Recent experimental work has discovered mutant proteins that cause receptor accumulation and a prolonged growth signal. Our model is optimized to fit this mutant experimental data and wild type data for a variety of experimental conditions. Of biological interest is the effect on surface and internalized receptor levels after the overexpression or inactivation of regulator proteins in the network: the optimal design method allows us to fine tune the conditions to best predict the behavior of these unknown components of the system.
    03/2006;
  • ABSTRACT: Models of biological networks such as those involved in signal transduction, development, and the cell cycle routinely contain dozens of parameters. Even if high quality data on the dynamics of every form of every chemical species were available for such networks, some parameter combinations would be orders of magnitude more constrained than other combinations, a feature we term sloppiness. In order to understand this shared, possibly universal, behavior we turn to mathematically well-defined classes of models: multiple linear regression, sums of polynomials and sums of exponentials. The origins of sloppiness turn out to have nothing to do with how much data is available or how many parameters a model has, but are instead the scale of description at which a model is constructed and how the parameters of the model map to the data. Thus describing a cloud of points by a plane, the core of linear regression, is not sloppy while describing complex biological networks by the biochemical reactions, just as fitting sums of exponentials or polynomials, is unavoidably sloppy.
    01/2006;

Publication Stats

1k Citations
182.70 Total Impact Points

Institutions

  • 2014
    • NCI-Frederick
      Maryland, United States
  • 2011–2013
    • National Cancer Institute (USA)
      • Genetics Branch
      Maryland, United States
  • 2006–2012
    • Cornell University
      • Department of Molecular Biology and Genetics
      • Laboratory of Atomic and Solid State Physics
      Ithaca, NY, United States
  • 2008
    • University College Dublin
      • Complex and Adaptive Systems Laboratory
      Dublin, Ireland