Wing H Wong

Stanford University, Palo Alto, CA, United States

Publications (63), 616.71 Total Impact Points

  • ABSTRACT: Molecular insights into somatic cell reprogramming to induced pluripotent stem cells (iPS) would aid regenerative medicine, but are difficult to elucidate in iPS because of their heterogeneity, as relatively few cells undergo reprogramming (0.1-1%). To identify early acting regulators, we capitalized on non-dividing heterokaryons (mouse embryonic stem cells fused to human fibroblasts), in which reprogramming towards pluripotency is efficient and rapid, enabling the identification of transient regulators required at the onset. We used bi-species transcriptome-wide RNA-seq to quantify transcriptional changes in the human somatic nucleus during reprogramming towards pluripotency in heterokaryons. During heterokaryon reprogramming, the cytokine interleukin 6 (IL6), which is not detectable at significant levels in embryonic stem cells, was induced 50-fold. A 4-day culture with IL6 at the onset of iPS reprogramming replaced stably transduced oncogenic c-Myc such that transduction of only Oct4, Klf4 and Sox2 was required. IL6 also activated another Jak/Stat target, the serine/threonine kinase gene Pim1, which accounted for the IL6-mediated twofold increase in iPS frequency. In contrast, LIF, another induced GP130 ligand, failed to increase iPS frequency or activate c-Myc or Pim1, thereby revealing a differential role for the two Jak/Stat inducers in iPS generation. These findings demonstrate the power of heterokaryon bi-species global RNA-seq to identify early acting regulators of reprogramming, for example, extrinsic replacements for stably transduced transcription factors such as the potent oncogene c-Myc.
    Nature Cell Biology 09/2013; · 20.76 Impact Factor
  • Junhee Seok, Lu Tian, Wing H Wong
    ABSTRACT: Analyzing the failure times of multiple events is of interest in many fields. Estimating the joint distribution of the failure times in a non-parametric way is not straightforward because some failure times are often right-censored and only known to be greater than observed follow-up times. Although it has been studied, there is no universally optimal solution for this problem. It is still challenging and important to provide alternatives that may be more suitable than existing ones in specific settings. Related problems of the existing methods are not only limited to infeasible computations, but also include the lack of optimality and possible non-monotonicity of the estimated survival function. In this paper, we proposed a non-parametric Bayesian approach for directly estimating the density function of multivariate survival times, where the prior is constructed based on the optional Pólya tree. We investigated several theoretical aspects of the procedure and derived an efficient iterative algorithm for implementing the Bayesian procedure. The empirical performance of the method was examined via extensive simulation studies. Finally, we presented a detailed analysis using the proposed method on the relationship among organ recovery times in severely injured patients. From the analysis, we suggested interesting medical information that can be further pursued in clinics.
    Biostatistics 07/2013; · 2.43 Impact Factor
  • ABSTRACT: OBJECTIVE: To test whether the probability of having a live birth (LB) with the first IVF cycle (C1) can be predicted and personalized for patients in diverse environments. DESIGN: Retrospective validation of multicenter prediction model. SETTING: Three university-affiliated outpatient IVF clinics located in different countries. PATIENT(S): Using primary models aggregated from >13,000 C1s, we applied the boosted tree method to train a preIVF-diversity model (PreIVF-D) with 1,061 C1s from 2008 to 2009, and validated predicted LB probabilities with an independent dataset comprising 1,058 C1s from 2008 to 2009. INTERVENTION(S): None. MAIN OUTCOME MEASURE(S): Predictive power, reclassification, receiver operating characteristic analysis, calibration, dynamic range. RESULT(S): Overall, with PreIVF-D, 86% of cases had significantly different LB probabilities compared with age control, and more than one-half had higher LB probabilities. Specifically, 42% of patients could have been identified by PreIVF-D to have a personalized predicted success rate >45%, whereas an age-control model could not differentiate them from others. Furthermore, PreIVF-D showed improved predictive power, with 36% improved log-likelihood (or 9.0-fold by log-scale; >1,000-fold linear scale), and prediction errors for subgroups ranged from 0.9% to 3.7%. CONCLUSION(S): Validated prediction of personalized LB probabilities from diverse multiple sources identifies excellent prognoses in more than one-half of patients.
    Fertility and Sterility 03/2013; · 3.97 Impact Factor
  • ABSTRACT: Landmark events occur in a coordinated manner during pre-implantation development of the mammalian embryo, yet the regulatory network that orchestrates these events remains largely unknown. Here, we present the first systematic investigation of the network in pre-implantation mouse embryos using morpholino-mediated gene knockdowns of key embryonic stem cell (ESC) factors followed by detailed transcriptome analysis of pooled embryos, single embryos, and individual blastomeres. We delineated the regulons of Oct4, Sall4, and Nanog and identified a set of metabolism- and transport-related genes that were controlled by these transcription factors in embryos but not in ESCs. Strikingly, the knockdown embryos arrested at a range of developmental stages. We provided evidence that the DNA methyltransferase Dnmt3b has a role in determining the extent to which a knockdown embryo can develop. We further showed that the feed-forward loop comprising Dnmt3b, the pluripotency factors, and the miR-290-295 cluster exemplifies a network motif that buffers embryos against gene expression noise. Our findings indicate that Oct4, Sall4, and Nanog form a robust and integrated network to govern mammalian pre-implantation development.
    Molecular Systems Biology 01/2013; 9:632. · 11.34 Impact Factor
  • Luo Lu, Hui Jiang, Wing H. Wong
    ABSTRACT: Consider a class of densities that are piecewise constant functions over partitions of the sample space defined by sequential coordinate partitioning. We introduce a prior distribution for a density in this function class and derive in closed form the marginal posterior distribution of the corresponding partition. A computationally efficient method, based on sequential importance sampling, is presented for the inference of the partition from this posterior distribution. Compared to traditional approaches such as the kernel method or the histogram, the Bayesian sequential partitioning (BSP) method proposed here is capable of providing much more accurate estimates when the sample space is of moderate to high dimension. We illustrate this by simulated as well as real data examples. The examples also demonstrate how BSP can be used to design new classification methods competitive with the state of the art.
    Journal of the American Statistical Association 01/2013; 108(504). · 1.83 Impact Factor
  • ABSTRACT: In the vertebrate neural tube, regional Sonic hedgehog (Shh) signaling invokes a time- and concentration-dependent induction of six different cell populations mediated through Gli transcriptional regulators. Elsewhere in the embryo, Shh/Gli responses invoke different tissue-appropriate regulatory programs. A genome-scale analysis of DNA binding by Gli1 and Sox2, a pan-neural determinant, identified a set of shared regulatory regions associated with key factors central to cell fate determination and neural tube patterning. Functional analysis in transgenic mice validates core enhancers for each of these factors and demonstrates the dual requirement for Gli1 and Sox2 inputs for neural enhancer activity. Furthermore, through an unbiased determination of Gli-binding site preferences and analysis of binding site variants in the developing mammalian CNS, we demonstrate that differential Gli-binding affinity underlies threshold-level activator responses to Shh input. In summary, our results highlight Sox2 input as a context-specific determinant of the neural-specific Shh response and differential Gli-binding site affinity as an important cis-regulatory property critical for interpreting Shh morphogen action in the mammalian neural tube.
    Genes & Development 12/2012; 26(24):2802-16. · 12.08 Impact Factor
  • ABSTRACT: Retroviral overexpression of reprogramming factors (Oct4, Sox2, Klf4, c-Myc) generates induced pluripotent stem cells (iPSCs). However, the integration of foreign DNA could induce genomic dysregulation. Cell-permeant proteins (CPPs) could overcome this limitation. To date, this approach has proved exceedingly inefficient. We discovered a striking difference in the pattern of gene expression induced by viral versus CPP-based delivery of the reprogramming factors, suggesting that a signaling pathway required for efficient nuclear reprogramming was activated by the retroviral, but not CPP approach. In gain- and loss-of-function studies, we find that the toll-like receptor 3 (TLR3) pathway enables efficient induction of pluripotency by viral or mmRNA approaches. Stimulation of TLR3 causes rapid and global changes in the expression of epigenetic modifiers to enhance chromatin remodeling and nuclear reprogramming. Activation of inflammatory pathways is required for efficient nuclear reprogramming in the induction of pluripotency.
    Cell 10/2012; 151(3):547-58. · 31.96 Impact Factor
  • ABSTRACT: Next-generation sequence analysis has become an important task both in laboratory and clinical settings. A key stage in the majority of sequence analysis workflows, such as resequencing, is the alignment of genomic reads to a reference genome. The accurate alignment of reads with large indels is a computationally challenging task for researchers. We introduce SeqAlto as a new algorithm for read alignment. For reads of 100 bp or longer, SeqAlto is up to 10× faster than existing algorithms, while retaining high accuracy and the ability to align reads with large (up to 50 bp) indels. This improvement in efficiency is particularly important in the analysis of future sequencing data where the number of reads approaches many billions. Furthermore, SeqAlto uses less than 8 GB of memory to align against the human genome. SeqAlto is benchmarked against several existing tools with both real and simulated data. Linux and Mac OS X binaries free for academic use are available at http://www.stanford.edu/group/wonglab/seqalto. Contact: whwong@stanford.edu.
    Bioinformatics 07/2012; 28(18):2366-73. · 5.47 Impact Factor
  • ABSTRACT: OBJECTIVE: To report and evaluate the performance and utility of an approach to predicting IVF-double embryo transfer (DET) multiple birth risks that is evidence-based, clinic-specific, and considers each patient's clinical profile. DESIGN: Retrospective prediction modeling. SETTING: An outpatient university-affiliated IVF clinic. PATIENT(S): We used boosted tree methods to analyze 2,413 independent IVF-DET treatment cycles that resulted in live births. The IVF cycles were retrieved from a database that comprised more than 33,000 IVF cycles. INTERVENTION(S): None. MAIN OUTCOME MEASURE(S): The performance of this prediction model, MBP-BIVF, was validated with an independent data set to evaluate predictive power, discrimination, dynamic range, and reclassification. RESULT(S): Multiple birth probabilities ranging from 11.8% to 54.8% were predicted by the model and were significantly different from control predictions in more than half of the patients. The prediction model showed an improvement of 146% in predictive power and 16.0% in discrimination over control. The population standard error was 1.8%. CONCLUSION(S): We showed that IVF patients have inherently different risks of multiple birth, even when DET is specified, and this risk can be predicted before ET. The use of clinic-specific prediction models provides an evidence-based and personalized method to counsel patients.
    Fertility and Sterility 05/2012; 98(1):69-76. · 3.97 Impact Factor
  • Nature 01/2012; 489(7414):57-74. · 38.60 Impact Factor
  • Martin A. Tanner, Wing H. Wong
    ABSTRACT: It was known from Metropolis et al. [J. Chem. Phys. 21 (1953) 1087-1092] that one can sample from a distribution by performing Monte Carlo simulation from a Markov chain whose equilibrium distribution is equal to the target distribution. However, it took several decades before the statistical community embraced Markov chain Monte Carlo (MCMC) as a general computational tool in Bayesian inference. The usual reasons that are advanced to explain why statisticians were slow to catch on to the method include lack of computing power and unfamiliarity with the early dynamic Monte Carlo papers in the statistical physics literature. We argue that there was a deeper reason, namely, that the structure of problems in statistical mechanics differs from that of problems in the standard statistical literature. To make the methods usable in standard Bayesian problems, one had to exploit the power that comes from the introduction of judiciously chosen auxiliary variables and collective moves. This paper examines the development in the critical period 1980-1990, when the ideas of Markov chain simulation from the statistical physics literature and the latent variable formulation in maximum likelihood computation (i.e., the EM algorithm) came together to spark the widespread application of MCMC methods in Bayesian computation.
    Statistical Science 04/2011; 25(2010). · 2.24 Impact Factor
  • ABSTRACT: Effective clinical management of prostate cancer (PCA) has been challenged by significant intratumoural heterogeneity on the genomic and pathological levels and limited understanding of the genetic elements governing disease progression. Here, we exploited the experimental merits of the mouse to test the hypothesis that pathways constraining progression might be activated in indolent Pten-null mouse prostate tumours and that inactivation of such progression barriers in mice would engender a metastasis-prone condition. Comparative transcriptomic and canonical pathway analyses, followed by biochemical confirmation, of normal prostate epithelium versus poorly progressive Pten-null prostate cancers revealed robust activation of the TGFβ/BMP-SMAD4 signalling axis. The functional relevance of SMAD4 was further supported by emergence of invasive, metastatic and lethal prostate cancers with 100% penetrance upon genetic deletion of Smad4 in the Pten-null mouse prostate. Pathological and molecular analysis as well as transcriptomic knowledge-based pathway profiling of emerging tumours identified cell proliferation and invasion as two cardinal tumour biological features in the metastatic Smad4/Pten-null PCA model. Follow-on pathological and functional assessment confirmed cyclin D1 and SPP1 as key mediators of these biological processes, which together with PTEN and SMAD4, form a four-gene signature that is prognostic of prostate-specific antigen (PSA) biochemical recurrence and lethal metastasis in human PCA. This model-informed progression analysis, together with genetic, functional and translational studies, establishes SMAD4 as a key regulator of PCA progression in mice and humans.
    Nature 02/2011; 470(7333):269-73. · 38.60 Impact Factor
  • Li Ma, Wing H. Wong
    ABSTRACT: Testing and characterizing the difference between two data samples is of fundamental interest in statistics. Existing methods such as the Kolmogorov-Smirnov and Cramér-von Mises tests do not scale well as the dimensionality increases and provide no easy way to characterize the difference should it exist. In this work, we propose a theoretical framework for inference that addresses these challenges in the form of a prior for Bayesian nonparametric analysis. The new prior is constructed based on a random-partition-and-assignment procedure similar to the one that defines the standard optional Pólya tree distribution, but has the ability to generate multiple random distributions jointly. These random probability distributions are allowed to "couple", that is, to have the same conditional distribution, on subsets of the sample space. We show that this "coupling optional Pólya tree" prior provides a convenient and effective way both to test for a two-sample difference and to learn the underlying structure of the difference. In addition, we discuss some practical issues in the computational implementation of this prior and provide several numerical examples to demonstrate how it works.
    Journal of the American Statistical Association 11/2010; · 1.83 Impact Factor
  • Wing H. Wong, Li Ma
    ABSTRACT: We introduce an extension of the Pólya tree approach for constructing distributions on the space of probability measures. By using optional stopping and optional choice of splitting variables, the construction gives rise to random measures that are absolutely continuous with piecewise smooth densities on partitions that can adapt to fit the data. The resulting "optional Pólya tree" distribution has large support in total variation topology and yields posterior distributions that are also optional Pólya trees with computable parameter values. Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/09-AOS755.
    The Annals of Statistics 10/2010; · 2.53 Impact Factor
  • ABSTRACT: Nearly 75% of in vitro fertilization (IVF) treatments do not result in live births, and patients are largely guided by a generalized age-based prognostic stratification. We sought to provide personalized and validated prognosis by using available clinical and embryo data from prior, failed treatments to predict live birth probabilities in the subsequent treatment. We generated a boosted tree model, IVFBT, by training it with IVF outcomes data from 1,676 first cycles (C1s) from 2003-2006, followed by external validation with 634 cycles from 2007-2008. We tested whether this model could predict the probability of having a live birth in the subsequent treatment (C2). By using nondeterministic methods to identify prognostic factors and their relative nonredundant contribution, we generated a prediction model, IVFBT, that was superior to the age-based control by providing over 1,000-fold improvement to fit new data (p<0.05), and increased discrimination by receiver operating characteristic analysis (area under the curve, 0.80 vs. 0.68 for C1, 0.68 vs. 0.58 for C2). IVFBT provided predictions that were more accurate for approximately 83% of C1 and approximately 60% of C2 cycles that were out of the range predicted by age. Over half of those patients were reclassified to have higher live birth probabilities. We showed that data from a prior cycle could be used effectively to provide personalized and validated live birth probabilities in a subsequent cycle. Our approach may be replicated and further validated in other IVF clinics.
    Proceedings of the National Academy of Sciences 08/2010; 107(31):13570-5. · 9.81 Impact Factor
  • ABSTRACT: Due to the complex nature of common diseases, their etiology is likely to involve "uncommon but strong" (UBS) interactive effects, i.e., allelic combinations that are each present in only a small fraction of the patients but associated with high disease risk. However, the identification of such effects using standard methods for testing association can be difficult. In this work, we introduce a method for testing interactions that is particularly powerful in detecting UBS effects. The method consists of two modules: a pattern counting algorithm designed for efficiently evaluating the risk significance of each marker combination, and a sequential permutation scheme for multiple testing correction. We demonstrate our method using a candidate gene data set for cardiovascular and coronary diseases with an injected UBS three-locus interaction. In addition, we investigate the power and false rejection properties of our method using data sets simulated from a joint dominance three-locus model that gives rise to UBS interactive effects. The results show that our method can be much more powerful than standard approaches such as the trend test and multifactor dimensionality reduction for detecting UBS interactions.
    Genetic Epidemiology 07/2010; 34(5):434-43. · 4.02 Impact Factor
  • ABSTRACT: Many genes initially identified for their roles in cell fate determination or signaling during development can have a significant impact on tumorigenesis. In the developing cerebellum, Sonic hedgehog (Shh) stimulates the proliferation of granule neuron precursor cells (GNPs) by activating the Gli transcription factors. Inappropriate activation of Shh target genes results in unrestrained cell division and eventually medulloblastoma, the most common pediatric brain malignancy. We find dramatic differences in the gene networks that are directly driven by the Gli1 transcription factor in GNPs and medulloblastoma. Gli1 binding location analysis revealed hundreds of genomic loci bound by Gli1 in normal and cancer cells. Only one third of the genes bound by Gli1 in GNPs were also bound in tumor cells. Correlation with gene expression levels indicated that 116 genes were preferentially transcribed in tumors, whereas 132 genes were target genes in both GNPs and medulloblastoma. Quantitative PCR and in situ hybridization for some putative target genes support their direct regulation by Gli. The results indicate that transformation of normal GNPs into deadly tumor cells is accompanied by a distinct set of Gli-regulated genes and may provide candidates for targeted therapies.
    Proceedings of the National Academy of Sciences 05/2010; 107(21):9736-41. · 9.81 Impact Factor
  • ABSTRACT: Complex interactions between genes or proteins contribute substantially to phenotypic evolution. We present a probabilistic model and a maximum likelihood approach for cross-species clustering analysis and for identification of conserved as well as species-specific co-expression modules. This model enables a "soft" cross-species clustering (SCSC) approach by encouraging but not enforcing orthologous genes to be grouped into the same cluster. SCSC is therefore robust to obscure orthologous relationships and can reflect different functional roles of orthologous genes in different species. We generated a time-course gene expression dataset for differentiating mouse embryonic stem (ES) cells, and compiled a dataset of published gene expression data on differentiating human ES cells. Applying SCSC to analyze these datasets, we identified conserved and species-specific gene regulatory modules. Together with protein-DNA binding data, an SCSC cluster specifically induced in murine ES cells indicated that the KLF2/4/5 transcription factors, although critical to maintaining the pluripotent phenotype in mouse ES cells, were decoupled from the OCT4/SOX2/NANOG regulatory module in human ES cells. Two of the target genes of murine KLF2/4/5, LIN28 and NODAL, were rewired to be targets of OCT4/SOX2/NANOG in human ES cells. Moreover, by mapping SCSC clusters onto KEGG signaling pathways, we identified the signal transduction components that were induced in pluripotent ES cells in either a conserved or a species-specific manner. These results suggest that the pluripotent cell identity can be established and maintained through more than one gene regulatory network.
    PLoS Computational Biology 01/2010; 6(3):e1000707. · 4.87 Impact Factor
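
The Statistical Science entry (Tanner and Wong) starts from the Metropolis et al. observation that sampling from a target distribution reduces to simulating a Markov chain whose equilibrium distribution is that target. Below is a minimal random-walk Metropolis sketch, assuming a standard normal target and a Gaussian proposal; the target, step size, chain length, burn-in, and the names `log_target` and `metropolis` are illustrative assumptions, not taken from the paper, and the sketch leaves out the auxiliary-variable and collective-move ideas the paper emphasizes.

```python
import math
import random
from typing import Callable


def log_target(x: float) -> float:
    """Unnormalized log-density of a standard normal (illustrative target)."""
    return -0.5 * x * x


def metropolis(log_p: Callable[[float], float], x0: float, n_steps: int,
               step: float, seed: int = 0) -> list[float]:
    """Random-walk Metropolis chain whose equilibrium distribution is p."""
    rng = random.Random(seed)
    x, lp = x0, log_p(x0)
    chain = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        lp_new = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(x)); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < lp_new - lp:
            x, lp = proposal, lp_new
        chain.append(x)  # on rejection the current state is repeated
    return chain


# Hypothetical usage: start far from the mode and discard a burn-in period.
draws = metropolis(log_target, x0=3.0, n_steps=50_000, step=1.0, seed=42)[5_000:]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(f"mean ~ {mean:.3f}, variance ~ {var:.3f}")
```

Because the Gaussian proposal is symmetric, the acceptance ratio reduces to p(proposal)/p(x), so the target only needs to be known up to a normalizing constant.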
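
Several entries above (the Annals of Statistics, JASA 11/2010, and Biostatistics papers) build on the Pólya tree prior. For orientation only, here is a sketch of the plain Pólya tree on [0, 1) that the optional Pólya tree generalizes: fixed dyadic splits, a Beta(a_j, a_j) prior on each split probability with the common choice a_j = c * j^2, and conjugate Beta updates from the data counts. It does not implement optional stopping or optional choice of splitting variables, and the function name `polya_tree_density` and the values of `depth` and `c` are hypothetical.

```python
import random
from typing import Sequence


def polya_tree_density(x: float, data: Sequence[float], depth: int = 8,
                       c: float = 1.0) -> float:
    """Posterior mean density at x in [0, 1) under a plain Pólya tree prior.

    Partitions are fixed dyadic intervals; the split at level j has a
    Beta(a_j, a_j) prior with a_j = c * j**2, updated by the data counts.
    """
    lo, hi = 0.0, 1.0
    pts = [d for d in data if 0.0 <= d < 1.0]  # points in the current node
    density = 1.0
    for j in range(1, depth + 1):
        a = c * j * j
        mid = (lo + hi) / 2.0
        left = [d for d in pts if d < mid]
        right = [d for d in pts if d >= mid]
        n = len(pts)
        if x < mid:
            prob, hi, pts = (a + len(left)) / (2.0 * a + n), mid, left
        else:
            prob, lo, pts = (a + len(right)) / (2.0 * a + n), mid, right
        # The child interval is half as wide, so the density scales by 2 * prob.
        density *= 2.0 * prob
    return density


# Hypothetical usage on simulated Beta(2, 5) data.
rng = random.Random(0)
sample = [rng.betavariate(2.0, 5.0) for _ in range(500)]
print(polya_tree_density(0.2, sample), polya_tree_density(0.9, sample))
```

Since the left and right posterior probabilities at every node sum to one, the returned piecewise-constant density integrates to one over [0, 1).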
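
The Genetic Epidemiology entry pairs its pattern counting algorithm with a sequential permutation scheme for multiple testing correction, whose details are not given in the abstract. As a swapped-in illustration of the general idea of stopping permutations early, here is a sketch of the Besag-Clifford sequential Monte Carlo p-value for a single two-sample permutation test. The statistic (absolute difference of means), the stopping count `h = 10`, the budget `n = 1000`, and the function names are assumptions; this is not the multiple-testing scheme from the paper.

```python
import random
from typing import Callable, Sequence

Statistic = Callable[[Sequence[float], Sequence[float]], float]


def abs_mean_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Illustrative test statistic: absolute difference of group means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


def sequential_perm_pvalue(a: Sequence[float], b: Sequence[float],
                           stat: Statistic = abs_mean_diff,
                           h: int = 10, n: int = 1000, seed: int = 0) -> float:
    """Besag-Clifford sequential Monte Carlo p-value for a permutation test.

    Draw random relabelings until h of them are at least as extreme as the
    observed statistic (stop early after L draws: p = h / L), or until n - 1
    relabelings have been drawn (p = (g + 1) / n, with g exceedances).
    """
    rng = random.Random(seed)
    observed = stat(a, b)
    pooled = list(a) + list(b)
    k = len(a)
    g = 0
    for drawn in range(1, n):
        rng.shuffle(pooled)  # random relabeling of the pooled sample
        if stat(pooled[:k], pooled[k:]) >= observed:
            g += 1
            if g == h:
                return h / drawn  # early stop: clearly non-significant
    return (g + 1) / n


# Hypothetical usage: well-separated groups versus identical groups.
p_sep = sequential_perm_pvalue(list(range(10)), list(range(100, 110)))
p_null = sequential_perm_pvalue([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
print(p_sep, p_null)
```

Early stopping spends only a few permutations on clearly non-significant statistics and reserves the full budget for promising ones, which matters when many marker combinations must be tested.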

Publication Stats

4k Citations
616.71 Total Impact Points

Institutions

  • 2005–2013
    • Stanford University
      • Department of Health Research and Policy
      • Department of Statistics
      • Department of Obstetrics and Gynecology
      Palo Alto, CA, United States
    • University of Southern California
      Los Angeles, California, United States
    • University of Pittsburgh
      Pittsburgh, Pennsylvania, United States
  • 2002–2012
    • Harvard University
      • Department of Stem Cell and Regenerative Biology
      • Department of Molecular and Cell Biology
      • Department of Statistics
      • Department of Biostatistics
      Boston, MA, United States
  • 2004–2011
    • Dana-Farber Cancer Institute
      • Department of Medical Oncology
      Boston, MA, United States
  • 2006–2009
    • Harvard Medical School
      • Department of Obstetrics, Gynecology, and Reproductive Biology
      • Department of Neurology
      Boston, Massachusetts, United States
    • Tsinghua University
      • Department of Automation
      Beijing, Beijing Shi, China
  • 2008
    • University of California, Berkeley
      • Department of Statistics
      Berkeley, CA, United States
    • Johns Hopkins Bloomberg School of Public Health
      • Department of Biostatistics
      Baltimore, MD, United States
  • 2005–2006
    • Georgia Health Sciences University
      • Center for Biotechnology and Genomic Medicine
      Augusta, GA, United States