Mike West

Duke University, Durham, North Carolina, United States

Publications (168) · 562.87 Total impact

  • ABSTRACT: This study aims to explore gene expression profiles that are associated with locoregional (LR) recurrence in breast cancer after mastectomy. A total of 94 breast cancer patients who underwent mastectomy between 1990 and 2001 and whose primary tumor tissues had been studied by DNA microarray were chosen for this study. Eligible patients had either no evidence of LR recurrence without postmastectomy radiotherapy (PMRT) after a minimum of 3 years of follow-up (n = 67) or any LR recurrence (n = 27). They were randomly split into training and validation sets. Statistical classification tree analysis and proportional hazards models were developed to identify and validate gene expression profiles related to LR recurrence. Our study identifies two sets of gene expression profiles (one of 258 genes and the other of 34 genes) with predictive value for LR recurrence. The overall accuracy of the prediction tree model in the validation sets is estimated at 75% to 78%. Among patients in the validation data set, the 3-year LR control rate is 91% for those with a predictive index above 0.8 derived from the 34-gene prediction model, and 40% for those with a predictive index of 0.8 or less (P = .008). Multivariate analysis of all patients shows that estrogen receptor status and the genomic predictive index are independent prognostic factors for LR control. Using gene expression profiles to develop prediction tree models effectively identifies breast cancer patients who are at higher risk of LR recurrence. This gene expression-based predictive index can be used to select patients for PMRT.
    Journal of Clinical Oncology 11/2006; 24(28):4594-602. DOI:10.1200/JCO.2005.02.5676 · 17.88 Impact Factor
  • Carlos M Carvalho, Mike West
    ABSTRACT: This paper introduces a novel class of Bayesian models for multivariate time series analysis based on a synthesis of dynamic linear models and graphical models. The synthesis uses sparse graphical modelling ideas to introduce structured, conditional independence relationships in the time-varying, cross-sectional covariance matrices of multiple time series. We define this new class of models and their theoretical structure involving novel matrix-normal/hyper-inverse Wishart distributions. We then describe the resulting Bayesian methodology and computational strategies for model fitting and prediction. This includes novel stochastic evolution theory for time-varying, structured variance matrices, and the full sequential and conjugate updating, filtering and forecasting analysis. The models are then applied in the context of financial time series for predictive portfolio analysis. The improvements defined in optimal Bayesian decision analysis in this example context vividly illustrate the practical benefits of the parsimony induced via appropriate graphical model structuring in multivariate dynamic modelling. We discuss theoretical and empirical aspects of the conditional independence structures in such models, issues of model uncertainty and search, and the relevance of this new framework as a key step towards scaling multivariate dynamic Bayesian modelling methodology to time series of increasing dimension and complexity.
    10/2006; DOI:10.1214/07-BA204
  • ABSTRACT: Clinical trials have indicated a benefit of adjuvant chemotherapy for patients with stage IB, II, or IIIA, but not stage IA, non-small-cell lung cancer (NSCLC). This classification scheme is probably an imprecise predictor of the prognosis of an individual patient. Indeed, approximately 25 percent of patients with stage IA disease have a recurrence after surgery, suggesting the need to identify patients in this subgroup for more effective therapy. We identified gene-expression profiles that predicted the risk of recurrence in a cohort of 89 patients with early-stage NSCLC (the lung metagene model). We evaluated the predictor in two independent groups of 25 patients from the American College of Surgeons Oncology Group (ACOSOG) Z0030 study and 84 patients from the Cancer and Leukemia Group B (CALGB) 9761 study. The lung metagene model predicted recurrence for individual patients significantly better than did clinical prognostic factors and was consistent across all early stages of NSCLC. Applied to the cohorts from the ACOSOG Z0030 trial and the CALGB 9761 trial, the lung metagene model had an overall predictive accuracy of 72 percent and 79 percent, respectively. The predictor also identified a subgroup of patients with stage IA disease who were at high risk for recurrence and who might be best treated by adjuvant chemotherapy. The lung metagene model provides a potential mechanism to refine the estimation of a patient's risk of disease recurrence and, in principle, to alter decisions regarding the use of adjuvant chemotherapy in early-stage NSCLC.
    New England Journal of Medicine 09/2006; 355(6):570-80. DOI:10.1056/NEJMoa060467 · 54.42 Impact Factor
  • ABSTRACT: Although the transition from early- to advanced-stage ovarian cancer is a critical determinant of survival, little is known about the molecular underpinnings of ovarian metastasis. We hypothesize that microarray analysis of global gene expression patterns in primary ovarian cancer and metastatic omental implants can identify genes that underlie the metastatic process in epithelial ovarian cancer. We utilized Affymetrix U95Av2 microarrays to characterize the molecular alterations that underlie omental metastasis in 47 epithelial ovarian cancer samples collected from multiple sites in 20 patients undergoing primary surgical cytoreduction for advanced-stage (IIIC/IV) serous ovarian cancer. Fifty-six genes demonstrated differential expression between ovarian and omental samples (P < 0.01), and 20 of these 56 differentially expressed genes have previously been implicated in metastasis, cell motility, or cytoskeletal function. Ten of the 56 genes are involved in p53 gene pathways. A Bayesian statistical tree analysis was used to identify a 27-gene expression pattern that could accurately predict the site of tumor (ovary versus omentum). This predictive model was evaluated using an external data set. Nine of the 27 predictive genes have previously been shown to be involved in oncogenesis and/or metastasis, and 10 of the 27 genes have been implicated in p53 pathways. Microarray findings were validated by real-time quantitative PCR. We conclude that gene expression patterns that distinguish omental metastasis from primary epithelial ovarian cancer can be identified and that many of the genes have functions that are biologically consistent with a role in oncogenesis, metastasis, and p53 gene networks.
    International Journal of Gynecological Cancer 08/2006; 16(5):1733-45. DOI:10.1111/j.1525-1438.2006.00660.x · 1.95 Impact Factor
  • ABSTRACT: Objective Bayesian inference for the multivariate normal distribution is illustrated, using different types of formal objective priors (Jeffreys, invariant, reference and matching), different modes of inference (Bayesian and frequentist), and different criteria involved in selecting optimal objective priors (ease of computation, frequentist performance, marginalization paradoxes, and decision-theoretic evaluation). In the course of the investigation of the bivariate normal model in Berger and Sun (2006), a variety of surprising results were found, including the availability of objective priors that yield exact frequentist inferences for many functions of the bivariate normal parameters, such as the correlation coefficient. Some of these results are generalized to the multivariate normal situation.
  • Carlos M Carvalho, Mike West
    ABSTRACT: This paper introduces a novel class of Bayesian models for multivariate time series analysis based on a synthesis of dynamic linear models and graphical models. The models are then applied in the context of financial time series for predictive portfolio analysis, providing a significant improvement in the performance of optimal investment decisions.
  • ABSTRACT: Numerous recent studies have demonstrated the use of genomic data, particularly gene expression signatures, as clinical prognostic factors in cancer and other complex diseases. Such studies herald the future of genomic medicine and the opportunity for personalized prognosis in a variety of clinical contexts that utilizes genome-scale molecular information. The scale, complexity, and information content of high-throughput gene expression data, as one example of complex genomic information, are often underappreciated, as many analyses continue to focus on defining individual rather than multiplex biomarkers for patient stratification. Indeed, this complexity of genomic data is often, rather paradoxically, viewed as a barrier to its utility. To the contrary, the complexity and scale of global genomic data, as representing the many dimensions of biology, must be embraced for the development of more precise clinical prognostics. The need is for integrated analyses: approaches that embrace the complexity of genomic data, including multiple forms of genomic data, and aim to explore and understand multiple, interacting, and potentially conflicting predictors of risk, rather than continuing on the current and traditional path that oversimplifies and ignores the information content in the complexity. All forms of potentially relevant data should be examined, with particular emphasis on understanding the interactions, complementarities, and possible conflicts among gene expression, genetic, and clinical markers of risk.
    Genome Research 06/2006; 16(5):559-66. DOI:10.1101/gr.3851306 · 13.85 Impact Factor
  • ABSTRACT: The aim was to develop clinical prediction models for local regional recurrence (LRR) of breast carcinoma after mastectomy that are superior to the conventional measures of tumor size and nodal status. Clinical information from 1,010 invasive breast cancer patients who had primary modified radical mastectomy formed the database for training and testing clinical prognostic and prediction models of LRR. Cox proportional hazards analysis and Bayesian tree analysis were the core methodologies from which these models were built. To generate a prognostic index model, 15 clinical variables were examined for their impact on LRR. Patients were stratified by lymph node involvement (<4 vs. ≥4) and local regional status (recurrent vs. control) and then, within strata, randomly split into training and test data sets of equal size. To establish prediction tree models, 255 patients were selected who had either LRR (53 patients) or no evidence of LRR without postmastectomy radiotherapy (PMRT) (202 patients). With these models, patients can be divided into low-, intermediate-, and high-risk groups on the basis of axillary nodal status, estrogen receptor status, lymphovascular invasion, and age at diagnosis. In the low-risk group, PMRT has no influence on either LRR or survival. For intermediate-risk patients, PMRT improves LR control but not metastasis-free or overall survival. For high-risk patients, however, PMRT improves LR control as well as metastasis-free and overall survival. The prognostic score and predictive index are useful methods for estimating the risk of LRR in breast cancer patients after mastectomy and for estimating the potential benefits of PMRT. These models provide additional criteria for selecting patients for PMRT, compared with the traditional selection criteria of nodal status and tumor size.
    International Journal of Radiation Oncology, Biology, Physics 04/2006; 64(5):1401-9. DOI:10.1016/j.ijrobp.2005.11.015 · 4.18 Impact Factor
  • ABSTRACT: Breast cancer is a heterogeneous disease, and markers for disease subtypes and therapy response remain poorly defined. For that reason, we employed a prospective neoadjuvant study in locally advanced breast cancer to identify molecular signatures of gene expression correlating with known prognostic clinical phenotypes, such as inflammatory breast cancer or the presence of hypoxia. In addition, we defined molecular signatures that correlate with response to neoadjuvant chemotherapy. Tissue was collected under ultrasound guidance from patients with stage IIB/III breast cancer before four cycles of neoadjuvant liposomal doxorubicin paclitaxel chemotherapy combined with local whole breast hyperthermia. Gene expression analysis was done using Affymetrix U133 Plus 2.0 GeneChip arrays. Gene expression patterns were identified that defined the phenotypes of inflammatory breast cancer as well as tumor hypoxia. In addition, molecular signatures were identified that predicted the persistence of malignancy in the axillary lymph nodes after neoadjuvant chemotherapy. This persistent lymph node signature significantly correlated with disease-free survival in two separate large populations of breast cancer patients. Gene expression signatures have the capacity to identify clinically significant features of breast cancer and can predict which individual patients are likely to be resistant to neoadjuvant therapy, thus providing the opportunity to guide treatment decisions.
    Clinical Cancer Research 03/2006; 12(3 Pt 1):819-26. DOI:10.1158/1078-0432.CCR-05-1447 · 8.19 Impact Factor
  • ABSTRACT: We describe and illustrate approaches to data augmentation in multi-way contingency tables for which partial information, in the form of subsets of marginal totals, is available. In such problems, interest lies in questions of inference about the parameters of models underlying the table together with imputation for the individual cell entries. We discuss questions of structure related to the implications for inference on cell counts arising from assumptions about log-linear model forms, and a class of simple and useful prior distributions on the parameters of log-linear models. We then discuss “local move” and “global move” Metropolis–Hastings simulation methods for exploring the posterior distributions for parameters and cell counts, focusing particularly on higher-dimensional problems. As a by-product, we note potential uses of the “global move” approach for inference about numbers of tables consistent with a prescribed subset of marginal counts. Illustration and comparison of MCMC approaches are given, and we conclude with discussion of areas for further development and current open issues.
    Journal of Statistical Planning and Inference 02/2006; DOI:10.1016/j.jspi.2004.07.002 · 0.60 Impact Factor
  • ABSTRACT: The development of an oncogenic state is a complex process involving the accumulation of multiple independent mutations that lead to deregulation of cell signalling pathways central to the control of cell growth and cell fate. The ability to define cancer subtypes, recurrence of disease and response to specific therapies using DNA microarray-based gene expression signatures has been demonstrated in multiple studies. Various studies have also demonstrated the potential for using gene expression profiles for the analysis of oncogenic pathways. Here we show that gene expression signatures can be identified that reflect the activation status of several oncogenic pathways. When evaluated in several large collections of human cancers, these gene expression signatures identify patterns of pathway deregulation in tumours and clinically relevant associations with disease outcomes. Combining signature-based predictions across several pathways identifies coordinated patterns of pathway deregulation that distinguish between specific cancers and tumour subtypes. Clustering tumours based on pathway signatures further defines prognosis in respective patient subsets, demonstrating that patterns of oncogenic pathway deregulation underlie the development of the oncogenic phenotype and reflect the biology and outcome of specific cancers. Predictions of pathway deregulation in cancer cell lines are also shown to predict the sensitivity to therapeutic agents that target components of the pathway. Linking pathway deregulation with sensitivity to therapeutics that target components of the pathway provides an opportunity to make use of these oncogenic pathway signatures to guide the use of targeted therapeutics.
    Nature 02/2006; 439(7074):353-7. DOI:10.1038/nature04296 · 42.35 Impact Factor
  • Bayesian Inference for Gene Expression and Proteomics, edited by K.A. Do, P. Mueller and M. Vannucci, 01/2006: pages 155-176; Cambridge University Press.
  • ABSTRACT: The integrated likelihood (also called the marginal likelihood or the normalizing constant) is a central quantity in Bayesian model selection and model averaging. It is defined as the integral over the parameter space of the likelihood times the prior density. The Bayes factor for model comparison and Bayesian testing is a ratio of integrated likelihoods, and the model weights in Bayesian model averaging are proportional to the integrated likelihoods. We consider the estimation of the integrated likelihood from posterior simulation output, aiming at a generic method that uses only the likelihoods from the posterior simulation iterations. The key is the harmonic mean identity, which says that the reciprocal of the integrated likelihood is equal to the posterior mean of the reciprocal of the likelihood; equivalently, the integrated likelihood is the posterior harmonic mean of the likelihood. The simplest estimator based on the identity is thus the harmonic mean of the likelihoods. Its reciprocal, the average of the reciprocal likelihoods, is an unbiased and simulation-consistent estimator of the reciprocal of the integrated likelihood, but it can have infinite variance, so the harmonic mean estimator is unstable in general. We describe two methods for stabilizing the harmonic mean estimator. In the first, the parameter space is reduced in such a way that the modified estimator involves a harmonic mean of heavier-tailed densities, thus resulting in a finite variance estimator. The resulting estimator is stable. It is also self-monitoring, since it obeys the central limit theorem, and so confidence intervals are available. We discuss general conditions under which this reduction is applicable. The second method is based on the fact that the posterior distribution of the log-likelihood is approximately a gamma distribution. This leads to an estimator of the maximum achievable likelihood, and also an estimator of the effective number of parameters that is extremely simple to compute from the log-likelihoods, independent of the model parametrization, and always positive. This yields estimates of the log integrated likelihood, and posterior simulation-based analogues of the BIC and AIC model selection criteria, called BICM and AICM. We provide standard errors for these criteria. We illustrate the proposed methods through several examples.
  • Beatrix Jones, Mike West
    ABSTRACT: The covariance between two variables in a multivariate Gaussian distribution is decomposed into a sum of path weights for all paths connecting the two variables in an undirected independence graph. These weights are useful in determining which variables are important in mediating correlation between the two path endpoints. The decomposition arises in undirected Gaussian graphical models and does not require or involve any assumptions of causality. This covariance decomposition is derived using basic linear algebra. The decomposition is feasible for very large numbers of variables if the corresponding precision matrix is sparse, a circumstance that arises in examples such as gene expression studies in functional genomics. Additional computational efficiencies are possible when the undirected graph is derived from an acyclic directed graph.
    Biometrika 12/2005; 92(4):779-786. DOI:10.1093/biomet/92.4.779 · 1.51 Impact Factor
  • ABSTRACT: Atherosclerosis is a chronic inflammatory process and progresses through characteristic morphologic stages. We have shown previously that chronically injecting bone-marrow-derived vascular progenitor cells can effect arterial repair. This repair capacity depends on the age of the injected marrow cells, suggesting a progressive decline in progenitor cell function. We hypothesized that the progression of atherosclerosis coincides with the deteriorating repair capacity of the bone marrow. Here, we ascribe patterns of gene expression that accurately and reproducibly identify specific disease states in murine atherosclerosis. We then use these expression patterns to determine the point in the disease process at which the repair of arteries by competent bone marrow cells ceases to be efficient. We show that the loss of the molecular signature for competent repair is concurrent with the initiation of atherosclerotic lesions. This work provides a previously unreported comprehensive molecular data set using broad-based analysis that links the loss of successful repair with the progression of a chronic illness.
    Proceedings of the National Academy of Sciences 12/2005; 102(46):16789-94. DOI:10.1073/pnas.0507718102 · 9.81 Impact Factor
  • ABSTRACT: The E2F family of transcription factors provides essential activities for coordinating the control of cellular proliferation and cell fate. Both E2F1 and E2F3 proteins have been shown to be particularly important for cell proliferation, whereas the E2F1 protein has the capacity to promote apoptosis. To explore the basis for this specificity of function, we used DNA microarray analysis to probe for the distinctions in the two E2F activities. Gene expression profiles that distinguish either E2F1- or E2F3-expressing cells from quiescent cells are enriched in genes encoding cell cycle and DNA replication activities, consistent with many past studies. The E2F1 profile is also enriched in genes known to function in apoptosis. We also identified patterns of gene expression that specifically differentiate the activity of E2F1 and E2F3; this profile is enriched in genes known to function in mitosis. The specificity of E2F function has been attributed to protein interactions mediated by the marked box domain, and we now show that chimeric E2F proteins generate expression signatures that reflect the origin of the marked box, thus linking the biochemical mechanism for specificity of function with specificity of gene activation.
    Proceedings of the National Academy of Sciences 12/2005; 102(44):15948-53. DOI:10.1073/pnas.0504300102 · 9.81 Impact Factor
  • ABSTRACT: We discuss the implementation, development and performance of methods of stochastic computation in Gaussian graphical models. We view these methods from the perspective of high-dimensional model search, with a particular interest in the scalability with dimension of Markov chain Monte Carlo (MCMC) and other stochastic search methods. After reviewing the structure and context of undirected Gaussian graphical models and model uncertainty (covariance selection), we discuss prior specifications, including new priors over models, and then explore a number of examples using various methods of stochastic computation. Traditional MCMC methods are the point of departure for this experimentation; we then develop alternative stochastic search ideas and contrast this new approach with MCMC. Our examples range from low (12–20) to moderate (150) dimension, and combine simple synthetic examples with data analysis from gene expression studies. We conclude with comments about the need and potential for new computational methods in far higher dimensions, including constructive approaches to Gaussian graphical modeling and computation.
    Statistical Science 11/2005; DOI:10.1214/088342305000000304 · 1.69 Impact Factor
  • ABSTRACT: We describe a database and information discovery system named DIG (Duke Integrated Genomics) designed to facilitate the process of gene annotation and the discovery of functional context. The DIG system collects and organizes gene annotation and functional information, and includes tools that support an understanding of genes in a functional context by providing a framework for integrating and visualizing gene expression, protein interaction and literature-based interaction networks.
    Bioinformatics 08/2005; 21(13):2957-9. DOI:10.1093/bioinformatics/bti467 · 4.62 Impact Factor
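
The harmonic mean identity described in the integrated-likelihood abstract above (the reciprocal of p(y) equals the posterior mean of 1/L(θ)) can be checked numerically. What follows is a minimal Python sketch, not the authors' code: it assumes a toy conjugate model y_i ~ N(θ, σ²) with known σ² and prior θ ~ N(μ0, τ0²), chosen because p(y) is then available exactly, and every function name and data value in it is hypothetical. In this toy model the average of 1/L(θ) has finite posterior variance only when τ0² < σ²/n, so the sketch deliberately uses a fairly informative prior; with a diffuse prior the same estimator becomes the unstable one the abstract warns about.

```python
import math
import random

# Hypothetical toy model (not from the paper): y_i ~ N(theta, sigma2), theta ~ N(mu0, tau2).


def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_lik(theta, y, sigma2):
    """Log-likelihood of the data y given theta."""
    return sum(log_normal_pdf(yi, theta, sigma2) for yi in y)


def posterior_params(y, sigma2, mu0, tau2):
    """Conjugate normal posterior mean and variance of theta."""
    var_n = 1.0 / (1.0 / tau2 + len(y) / sigma2)
    mean_n = var_n * (mu0 / tau2 + sum(y) / sigma2)
    return mean_n, var_n


def exact_log_marginal(y, sigma2, mu0, tau2, theta=None):
    """Exact log p(y) = log p(y|theta) + log p(theta) - log p(theta|y), valid for any theta."""
    mean_n, var_n = posterior_params(y, sigma2, mu0, tau2)
    if theta is None:
        theta = mean_n
    return (log_lik(theta, y, sigma2)
            + log_normal_pdf(theta, mu0, tau2)
            - log_normal_pdf(theta, mean_n, var_n))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def harmonic_mean_log_marginal(loglik_draws):
    """Log of the harmonic mean of the likelihoods: log S - log sum_s exp(-loglik_s)."""
    return math.log(len(loglik_draws)) - log_sum_exp([-ll for ll in loglik_draws])


y = [0.2, -0.1, 0.4, 0.0, 0.3]
sigma2, mu0, tau2 = 1.0, 0.0, 0.09  # tau2 < sigma2 / n keeps the estimator's variance finite here
mean_n, var_n = posterior_params(y, sigma2, mu0, tau2)
rng = random.Random(0)
draws = [rng.gauss(mean_n, math.sqrt(var_n)) for _ in range(20000)]  # exact posterior draws
logliks = [log_lik(t, y, sigma2) for t in draws]

exact = exact_log_marginal(y, sigma2, mu0, tau2)
estimate = harmonic_mean_log_marginal(logliks)
print(f"exact log p(y) = {exact:.4f}, harmonic-mean estimate = {estimate:.4f}")
```

Working on the log scale through a log-sum-exp keeps the reciprocal likelihoods from overflowing when log-likelihoods are large and negative; in a real application the exact posterior draws would be replaced by MCMC output.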

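The covariance decomposition in the Jones and West (Biometrika, 2005) abstract can likewise be sketched in a few lines. This sketch assumes the decomposition takes the form σ_xy = Σ_P (−1)^(m+1) ω_{p1p2} ω_{p2p3} ... ω_{p(m−1)pm} det(Ω_∖P) / det(Ω), where the sum runs over the simple paths P = (p1, ..., pm) from x to y in the graph whose edges are the nonzero off-diagonal entries of the precision matrix Ω, and Ω_∖P deletes the rows and columns of the vertices on P. It enumerates paths by brute force, so it only suits small graphs, and it checks the total against the cofactor formula for the same entry of Σ = Ω⁻¹. The function names and the example matrix are illustrative assumptions.

```python
# Hypothetical sketch (not the authors' code): path weights for one covariance entry,
# computed from a precision matrix omega and checked against the cofactor formula for inv(omega).


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting; a 0x0 matrix has determinant 1."""
    a = [list(row) for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def without_vertices(omega, removed):
    """omega with the rows and columns of the removed vertices deleted."""
    keep = [i for i in range(len(omega)) if i not in removed]
    return [[omega[i][j] for j in keep] for i in keep]


def simple_paths(omega, start, end):
    """All simple paths start -> end in the graph with an edge i-j wherever omega[i][j] != 0."""
    paths = []

    def extend(path):
        last = path[-1]
        if last == end:
            paths.append(tuple(path))
            return
        for nxt in range(len(omega)):
            if nxt not in path and omega[last][nxt] != 0.0:
                path.append(nxt)
                extend(path)
                path.pop()

    extend([start])
    return paths


def path_weights(omega, x, y):
    """Map each simple path from x to y (x != y) to its weight; the weights sum to Sigma[x][y]."""
    d = det(omega)
    weights = {}
    for p in simple_paths(omega, x, y):
        prod = 1.0
        for a, b in zip(p, p[1:]):
            prod *= omega[a][b]
        sign = (-1) ** (len(p) + 1)  # (-1)^(m+1), m = number of vertices on the path
        weights[p] = sign * prod * det(without_vertices(omega, set(p))) / d
    return weights


def covariance_entry(omega, x, y):
    """Sigma[x][y] by cofactors: (-1)^(x+y) * det(omega without row y and column x) / det(omega)."""
    n = len(omega)
    minor = [[omega[i][j] for j in range(n) if j != x] for i in range(n) if i != y]
    return (-1) ** (x + y) * det(minor) / det(omega)


# Illustrative positive definite (diagonally dominant) precision matrix; edges 0-1, 0-2, 1-2, 2-3.
omega = [
    [2.0, -0.8, 0.3, 0.0],
    [-0.8, 2.0, -0.6, 0.0],
    [0.3, -0.6, 2.0, -0.5],
    [0.0, 0.0, -0.5, 1.5],
]
weights = path_weights(omega, 0, 3)
for path, w in weights.items():
    print(path, round(w, 6))
print("sum of path weights:", sum(weights.values()))
print("Sigma[0][3] by cofactors:", covariance_entry(omega, 0, 3))
```

Here variable 0 reaches variable 3 along two paths, 0-2-3 and 0-1-2-3, and their weights have opposite signs, so each path's contribution to the covariance can be read off separately. Pure-Python determinants keep the sketch dependency-free; the abstract's point about sparse precision matrices is what would make a scalable version feasible.
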
Publication Stats

8k Citations

Institutions

  • 1998–2013
    • Duke University
      • Department of Statistical Science
      • Department of Molecular Genetics and Microbiology
      Durham, North Carolina, United States
  • 2010
    • University of Chicago
      Chicago, Illinois, United States
    • Paris Dauphine University
      Paris, Île-de-France, France
  • 2008
    • University of Miami Miller School of Medicine
      • Division of Hospital Medicine
      Miami, Florida, United States
  • 2005
    • Duke University Medical Center
      • Department of Medicine
      Durham, North Carolina, United States
  • 2004
    • Federal University of Rio de Janeiro
      Rio de Janeiro, Rio de Janeiro, Brazil
  • 2001–2003
    • Howard Hughes Medical Institute
      Ashburn, Virginia, United States
  • 2002
    • University of Cambridge
      • Department of Engineering
      Cambridge, England, United Kingdom