ABSTRACT: Although the transition from early- to advanced-stage ovarian cancer is a critical determinant of survival, little is known about the molecular underpinnings of ovarian metastasis. We hypothesize that microarray analysis of global gene expression patterns in primary ovarian cancer and metastatic omental implants can identify genes that underlie the metastatic process in epithelial ovarian cancer. We utilized Affymetrix U95Av2 microarrays to characterize the molecular alterations that underlie omental metastasis from 47 epithelial ovarian cancer samples collected from multiple sites in 20 patients undergoing primary surgical cytoreduction for advanced-stage (IIIC/IV) serous ovarian cancer. Fifty-six genes demonstrated differential expression between ovarian and omental samples (P < 0.01), and twenty of these 56 differentially expressed genes have previously been implicated in metastasis, cell motility, or cytoskeletal function. Ten of the 56 genes are involved in p53 gene pathways. A Bayesian statistical tree analysis was used to identify a 27-gene expression pattern that could accurately predict the site of tumor (ovary versus omentum). This predictive model was evaluated using an external data set. Nine of the 27 predictive genes have previously been shown to be involved in oncogenesis and/or metastasis, and 10/27 genes have been implicated in p53 pathways. Microarray findings were validated by real-time quantitative PCR. We conclude that gene expression patterns that distinguish omental metastasis from primary epithelial ovarian cancer can be identified and that many of the genes have functions that are biologically consistent with a role in oncogenesis, metastasis, and p53 gene networks.
International Journal of Gynecological Cancer 08/2006; 16(5):1733-45. · 1.95 Impact Factor
ABSTRACT: Objective Bayesian inference for the multivariate normal distribution is illustrated, using different types of formal objective priors (Jeffreys, invariant, reference and matching), different modes of inference (Bayesian and frequentist), and different criteria involved in selecting optimal objective priors (ease of computation, frequentist performance, marginalization paradoxes, and decision-theoretic evaluation). In the course of the investigation of the bivariate normal model in Berger and Sun (2006), a variety of surprising results were found, including the availability of objective priors that yield exact frequentist inferences for many functions of the bivariate normal parameters, such as the correlation coefficient. Certain of these results are generalized to the multivariate normal situation.
ABSTRACT: This paper introduces a novel class of Bayesian models for multivariate time series analysis based on a synthesis of dynamic linear models and graphical models. The models are then applied in the context of financial time series for predictive portfolio analysis, providing a significant improvement in performance of optimal investment decisions.
World Meeting on Bayesian Statistics Benidorm. 07/2006;
ABSTRACT: Numerous recent studies have demonstrated the use of genomic data, particularly gene expression signatures, as clinical prognostic factors in cancer and other complex diseases. Such studies herald the future of genomic medicine and the opportunity for personalized prognosis in a variety of clinical contexts that utilizes genome-scale molecular information. The scale, complexity, and information content of high-throughput gene expression data, as one example of complex genomic information, are often under-appreciated, as many analyses continue to focus on defining individual rather than multiplex biomarkers for patient stratification. Indeed, this complexity of genomic data is often, rather paradoxically, viewed as a barrier to its utility. To the contrary, the complexity and scale of global genomic data, as representing the many dimensions of biology, must be embraced for the development of more precise clinical prognostics. The need is for integrated analyses: approaches that embrace the complexity of genomic data, including multiple forms of genomic data, and aim to explore and understand multiple, interacting, and potentially conflicting predictors of risk, rather than continuing on the current and traditional path that oversimplifies and ignores the information content in the complexity. All forms of potentially relevant data should be examined, with particular emphasis on understanding the interactions, complementarities, and possible conflicts among gene expression, genetic, and clinical markers of risk.
Genome Research 06/2006; 16(5):559-66. · 13.85 Impact Factor
ABSTRACT: To develop clinical prediction models for local regional recurrence (LRR) of breast carcinoma after mastectomy that will be superior to the conventional measures of tumor size and nodal status.
Clinical information from 1,010 invasive breast cancer patients who had primary modified radical mastectomy formed the database for the training and testing of clinical prognostic and prediction models of LRR. Cox proportional hazards analysis and Bayesian tree analysis were the core methodologies from which these models were built. To generate a prognostic index model, 15 clinical variables were examined for their impact on LRR. Patients were stratified by lymph node involvement (<4 vs. ≥4) and local regional status (recurrent vs. control) and then, within strata, randomly split into training and test data sets of equal size. To establish prediction tree models, 255 patients were selected by the criteria of having had LRR (53 patients) or no evidence of LRR without postmastectomy radiotherapy (PMRT) (202 patients).
With these models, patients can be divided into low-, intermediate-, and high-risk groups on the basis of axillary nodal status, estrogen receptor status, lymphovascular invasion, and age at diagnosis. In the low-risk group, there is no influence of PMRT on either LRR or survival. For intermediate-risk patients, PMRT improves LR control but not metastases-free or overall survival. For the high-risk patients, however, PMRT improves both LR control and metastasis-free and overall survival.
The prognostic score and predictive index are useful methods to estimate the risk of LRR in breast cancer patients after mastectomy and for estimating the potential benefits of PMRT. These models provide additional information criteria for selection of patients for PMRT, compared with the traditional selection criteria of nodal status and tumor size.
International Journal of Radiation Oncology, Biology, Physics 04/2006; 64(5):1401-9. · 4.18 Impact Factor
ABSTRACT: Breast cancer is a heterogeneous disease, and markers for disease subtypes and therapy response remain poorly defined. For that reason, we employed a prospective neoadjuvant study in locally advanced breast cancer to identify molecular signatures of gene expression correlating with known prognostic clinical phenotypes, such as inflammatory breast cancer or the presence of hypoxia. In addition, we defined molecular signatures that correlate with response to neoadjuvant chemotherapy.
Tissue was collected under ultrasound guidance from patients with stage IIB/III breast cancer before four cycles of neoadjuvant liposomal doxorubicin paclitaxel chemotherapy combined with local whole breast hyperthermia. Gene expression analysis was done using Affymetrix U133 Plus 2.0 GeneChip arrays.
Gene expression patterns were identified that defined the phenotypes of inflammatory breast cancer as well as tumor hypoxia. In addition, molecular signatures were identified that predicted the persistence of malignancy in the axillary lymph nodes after neoadjuvant chemotherapy. This persistent lymph node signature significantly correlated with disease-free survival in two separate large populations of breast cancer patients.
Gene expression signatures have the capacity to identify clinically significant features of breast cancer and can predict which individual patients are likely to be resistant to neoadjuvant therapy, thus providing the opportunity to guide treatment decisions.
Clinical Cancer Research 03/2006; 12(3 Pt 1):819-26. · 8.19 Impact Factor
ABSTRACT: The development of an oncogenic state is a complex process involving the accumulation of multiple independent mutations that lead to deregulation of cell signalling pathways central to the control of cell growth and cell fate. The ability to define cancer subtypes, recurrence of disease and response to specific therapies using DNA microarray-based gene expression signatures has been demonstrated in multiple studies. Various studies have also demonstrated the potential for using gene expression profiles for the analysis of oncogenic pathways. Here we show that gene expression signatures can be identified that reflect the activation status of several oncogenic pathways. When evaluated in several large collections of human cancers, these gene expression signatures identify patterns of pathway deregulation in tumours and clinically relevant associations with disease outcomes. Combining signature-based predictions across several pathways identifies coordinated patterns of pathway deregulation that distinguish between specific cancers and tumour subtypes. Clustering tumours based on pathway signatures further defines prognosis in respective patient subsets, demonstrating that patterns of oncogenic pathway deregulation underlie the development of the oncogenic phenotype and reflect the biology and outcome of specific cancers. Predictions of pathway deregulation in cancer cell lines are also shown to predict the sensitivity to therapeutic agents that target components of the pathway. Linking pathway deregulation with sensitivity to therapeutics that target components of the pathway provides an opportunity to make use of these oncogenic pathway signatures to guide the use of targeted therapeutics.
ABSTRACT: We describe and illustrate approaches to data augmentation in multi-way contingency tables for which partial information, in the form of subsets of marginal totals, is available. In such problems, interest lies in questions of inference about the parameters of models underlying the table together with imputation for the individual cell entries. We discuss questions of structure related to the implications for inference on cell counts arising from assumptions about log-linear model forms, and a class of simple and useful prior distributions on the parameters of log-linear models. We then discuss “local move” and “global move” Metropolis–Hastings simulation methods for exploring the posterior distributions for parameters and cell counts, focusing particularly on higher-dimensional problems. As a by-product, we note potential uses of the “global move” approach for inference about numbers of tables consistent with a prescribed subset of marginal counts. Illustration and comparison of MCMC approaches is given, and we conclude with discussion of areas for further developments and current open issues.
Journal of Statistical Planning and Inference 02/2006; · 0.60 Impact Factor
ABSTRACT: The integrated likelihood (also called the marginal likelihood or the normalizing constant) is a central quantity in Bayesian model selection and model averaging. It is defined as the integral over the parameter space of the likelihood times the prior density. The Bayes factor for model comparison and Bayesian testing is a ratio of integrated likelihoods, and the model weights in Bayesian model averaging are proportional to the integrated likelihoods. We consider the estimation of the integrated likelihood from posterior simulation output, aiming at a generic method that uses only the likelihoods from the posterior simulation iterations. The key is the harmonic mean identity, which says that the reciprocal of the integrated likelihood is equal to the posterior harmonic mean of the likelihood. The simplest estimator based on the identity is thus the harmonic mean of the likelihoods. While this is an unbiased and simulation-consistent estimator, its reciprocal can have infinite variance and so it is unstable in general. We describe two methods for stabilizing the harmonic mean estimator. In the first one, the parameter space is reduced in such a way that the modified estimator involves a harmonic mean of heavier-tailed densities, thus resulting in a finite variance estimator. The resulting estimator is stable. It is also self-monitoring, since it obeys the central limit theorem, and so confidence intervals are available. We discuss general conditions under which this reduction is applicable. The second method is based on the fact that the posterior distribution of the log-likelihood is approximately a gamma distribution. This leads to an estimator of the maximum achievable likelihood, and also an estimator of the effective number of parameters that is extremely simple to compute from the log-likelihoods, independent of the model parametrization, and always positive. This yields estimates of the log integrated likelihood, and posterior simulation-based analogues of the BIC and AIC model selection criteria, called BICM and AICM. We provide standard errors for these criteria. We illustrate the proposed methods through several examples.
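The harmonic mean identity described in this abstract can be illustrated with a minimal sketch. It covers only the simple (unstabilized) estimator, not the two stabilized methods the abstract proposes. The function name `log_harmonic_mean_estimate` and the input values are hypothetical, and the input is assumed to be a list of log-likelihood values evaluated at posterior draws. The log-sum-exp step is an implementation choice to avoid numerical underflow.

```python
import math

def log_harmonic_mean_estimate(logliks):
    """Log of the harmonic mean of likelihoods, given their logs.

    Uses the identity 1/m = E_posterior[1/L], so that
    log m_hat = log(B) - logsumexp(-loglik_1, ..., -loglik_B).
    """
    if not logliks:
        raise ValueError("need at least one log-likelihood value")
    neg = [-ll for ll in logliks]
    # Subtract the maximum before exponentiating (log-sum-exp trick).
    top = max(neg)
    lse = top + math.log(sum(math.exp(v - top) for v in neg))
    return math.log(len(neg)) - lse

# Hypothetical log-likelihoods from four posterior draws.
draws = [-120.3, -119.8, -121.1, -120.0]
estimate = log_harmonic_mean_estimate(draws)
```

Because the estimate is dominated by the smallest likelihood values in the sample, a single draw with very low likelihood can shift it sharply, which is one way to see the instability the abstract mentions.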
ABSTRACT: The E2F family of transcription factors provides essential activities for coordinating the control of cellular proliferation and cell fate. Both E2F1 and E2F3 proteins have been shown to be particularly important for cell proliferation, whereas the E2F1 protein has the capacity to promote apoptosis. To explore the basis for this specificity of function, we used DNA microarray analysis to probe for the distinctions in the two E2F activities. Gene expression profiles that distinguish either E2F1- or E2F3-expressing cells from quiescent cells are enriched in genes encoding cell cycle and DNA replication activities, consistent with many past studies. The E2F1 profile is also enriched in genes known to function in apoptosis. We also identified patterns of gene expression that specifically differentiate the activity of E2F1 and E2F3; this profile is enriched in genes known to function in mitosis. The specificity of E2F function has been attributed to protein interactions mediated by the marked box domain, and we now show that chimeric E2F proteins generate expression signatures that reflect the origin of the marked box, thus linking the biochemical mechanism for specificity of function with specificity of gene activation.
Proceedings of the National Academy of Sciences 12/2005; 102(44):15948-53. · 9.81 Impact Factor
ABSTRACT: Atherosclerosis is a chronic inflammatory process and progresses through characteristic morphologic stages. We have shown previously that chronically injecting bone-marrow-derived vascular progenitor cells can effect arterial repair. This repair capacity depends on the age of the injected marrow cells, suggesting a progressive decline in progenitor cell function. We hypothesized that the progression of atherosclerosis coincides with the deteriorating repair capacity of the bone marrow. Here, we ascribe patterns of gene expression that accurately and reproducibly identify specific disease states in murine atherosclerosis. We then use these expression patterns to determine the point in the disease process at which the repair of arteries by competent bone marrow cells ceases to be efficient. We show that the loss of the molecular signature for competent repair is concurrent with the initiation of atherosclerotic lesions. This work provides a previously unreported comprehensive molecular data set using broad-based analysis that links the loss of successful repair with the progression of a chronic illness.
Proceedings of the National Academy of Sciences 12/2005; 102(46):16789-94. · 9.81 Impact Factor
ABSTRACT: We discuss the implementation, development and performance of methods of stochastic computation in Gaussian graphical models. We view these methods from the perspective of high-dimensional model search, with a particular interest in the scalability with dimension of Markov chain Monte Carlo (MCMC) and other stochastic search methods. After reviewing the structure and context of undirected Gaussian graphical models and model uncertainty (covariance selection), we discuss prior specifications, including new priors over models, and then explore a number of examples using various methods of stochastic computation. Traditional MCMC methods are the point of departure for this experimentation; we then develop alternative stochastic search ideas and contrast this new approach with MCMC. Our examples range from low (12–20) to moderate (150) dimension, and combine simple synthetic examples with data analysis from gene expression studies. We conclude with comments about the need and potential for new computational methods in far higher dimensions, including constructive approaches to Gaussian graphical modeling and computation.
ABSTRACT: We describe a database and information discovery system named DIG (Duke Integrated Genomics) designed to facilitate the process of gene annotation and the discovery of functional context. The DIG system collects and organizes gene annotation and functional information, and includes tools that support an understanding of genes in a functional context by providing a framework for integrating and visualizing gene expression, protein interaction and literature-based interaction networks.
ABSTRACT: A better understanding of the underlying biology of invasive serous ovarian cancer is critical for the development of early detection strategies and new therapeutics. The objective of this study was to define gene expression patterns associated with favorable survival.
RNA from 65 serous ovarian cancers was analyzed using Affymetrix U133A microarrays. This included 54 stage III/IV cases (30 short-term survivors who lived <3 years and 24 long-term survivors who lived >7 years) and 11 stage I/II cases. Genes were screened on the basis of their level of and variability in expression, leaving 7,821 for use in developing a predictive model for survival. A composite predictive model was developed that combines Bayesian classification tree and multivariate discriminant models. Leave-one-out cross-validation was used to select and evaluate models.
Patterns of genes were identified that distinguish short-term and long-term ovarian cancer survivors. The expression model developed for advanced stage disease classified all 11 early-stage ovarian cancers as long-term survivors. The MAL gene, which has been shown to confer resistance to cancer therapy, was most highly overexpressed in short-term survivors (3-fold compared with long-term survivors, and 29-fold compared with early-stage cases). These results suggest that gene expression patterns underlie differences in outcome, and an examination of the genes that provide this discrimination reveals that many are implicated in processes that define the malignant phenotype.
Differences in survival of advanced ovarian cancers are reflected by distinct patterns of gene expression. This biological distinction is further emphasized by the finding that early-stage cancers share expression patterns with the advanced stage long-term survivors, suggesting a shared favorable biology.
Clinical Cancer Research 05/2005; 11(10):3686-96. · 8.19 Impact Factor
ABSTRACT: Despite the strikingly grave prognosis for older patients with glioblastomas, significant variability in patient outcome is experienced. To explore the potential for developing improved prognostic capabilities based on the elucidation of potential biological relationships, we did analyses of genes commonly mutated, amplified, or deleted in glioblastomas and DNA microarray gene expression data from tumors of glioblastoma patients of age >50 for whom survival is known. No prognostic significance was associated with genetic changes in epidermal growth factor receptor (amplified in 17 of 41 patients), TP53 (mutated in 11 of 41 patients), p16INK4A (deleted in 15 of 33 patients), or phosphatase and tensin homologue (mutated in 15 of 41 patients). Statistical analysis of the gene expression data in connection with survival involved exploration of regression models on small subsets of genes, based on computational search over multiple regression models with cross-validation to assess predictive validity. The analysis generated a set of regression models that, when weighted and combined according to posterior probabilities implied by the statistical analysis, identify patterns in expression of a small subset of genes that are associated with survival and have value in assessing survival risks. The dominant genes across such multiple regression models involve three key genes, SPARC (Osteonectin), Doublecortex, and Semaphorin3B, which play key roles in cellular migration processes. Additional analysis, based on statistical graphical association models constructed using similar computational analysis methods, reveals other genes which support the view that multiple mediators of tumor invasion may be important prognostic factors in glioblastomas in older patients.
Cancer Research 05/2005; 65(10):4051-8. · 9.28 Impact Factor
ABSTRACT: Model search in regression with very large numbers of candidate predictors raises challenges for both model specification and computation, and standard approaches such as Markov chain Monte Carlo (MCMC) and step-wise methods are often infeasible or ineffective. We describe a novel shotgun stochastic search (SSS) approach that explores "interesting" regions of the resulting, very high-dimensional model spaces to quickly identify regions of high posterior probability over models. We describe algorithmic and modeling aspects, priors over the model space that induce sparsity and parsimony over and above the traditional dimension penalization implicit in Bayesian and likelihood analyses, and parallel computation using cluster computers. We discuss an example from gene expression cancer genomics, comparisons with MCMC and other methods, and theoretical and simulation-based aspects of performance characteristics in large-scale regression model search. We also provide software implementing the methods.
Journal of the American Statistical Association 04/2005; 102(June). · 2.11 Impact Factor
ABSTRACT: The covariance between two variables in a multivariate Gaussian distribution is decomposed into a sum of path weights for all paths connecting the two variables in an undirected independence graph. These weights are useful in determining which variables are important in mediating correlation between the two path endpoints. The decomposition arises in undirected Gaussian graphical models and does not require or involve any assumptions of causality. This covariance decomposition is derived using basic linear algebra. The decomposition is feasible for very large numbers of variables if the corresponding precision matrix is sparse, a circumstance that arises in examples such as gene expression studies in functional genomics. Additional computational efficiencies are possible when the undirected graph is derived from an acyclic directed graph. Copyright 2005, Oxford University Press.
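The path-weight idea can be made concrete with a small sketch. The abstract does not state the formula, so the form used here is an assumption: for a precision matrix Ω and a simple path P = (p1, ..., pm), the weight is taken as (-1)^(m+1) · ω(p1,p2) ··· ω(p(m-1),pm) · det(Ω with the path's rows and columns removed) / det(Ω), which follows from cofactor expansion of the inverse. The helper names (`det`, `submatrix`, `path_weights`) and the example matrix are hypothetical. Paths are enumerated by brute force, so this only suits tiny examples and does not exploit the sparsity the abstract refers to.

```python
from itertools import permutations

def det(m):
    """Determinant via Gaussian elimination; the empty matrix has determinant 1."""
    a = [list(map(float, row)) for row in m]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-15:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def submatrix(m, drop):
    """Copy of m without the rows and columns whose indices are in drop."""
    keep = [i for i in range(len(m)) if i not in drop]
    return [[m[i][j] for j in keep] for i in keep]

def path_weights(omega, x, y):
    """Yield (path, weight) for each simple path from x to y in the graph of omega."""
    full = det(omega)
    others = [i for i in range(len(omega)) if i not in (x, y)]
    for k in range(len(others) + 1):
        for mid in permutations(others, k):
            path = (x, *mid, y)
            prod = 1.0
            for u, v in zip(path, path[1:]):
                prod *= omega[u][v]
            if prod == 0.0:  # a missing edge means this is not a path in the graph
                continue
            sign = (-1) ** (len(path) + 1)
            yield path, sign * prod * det(submatrix(omega, set(path))) / full

# Hypothetical precision matrix for a chain graph 0 - 1 - 2.
omega = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
weights = dict(path_weights(omega, 0, 2))
```

In this chain example the only path from variable 0 to variable 2 passes through variable 1, so that single weight accounts for the whole covariance between the endpoints.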