A Partitioning Deletion/Substitution/Addition Algorithm for Creating Survival Risk Groups

Division of Biostatistics, Yale University Schools of Public Health and Medicine, 60 College Street, New Haven, Connecticut 06519, U.S.A.
Department of Biological Statistics and Computational Biology and Department of Statistical Science, Cornell University, Ithaca, New York, U.S.A.
Departments of Neurological Surgery and Epidemiology and Biostatistics, University of California San Francisco, 505 Parnassus Avenue, San Francisco, California 94117, U.S.A.
Biometrics (Impact Factor: 1.57). 04/2012; 68(4). DOI: 10.1111/j.1541-0420.2012.01756.x
Source: PubMed


Accurately assessing a patient's risk of a given event is essential in making informed treatment decisions. One approach is to stratify patients into two or more distinct risk groups with respect to a specific outcome using both clinical and demographic variables. Outcomes may be categorical or continuous in nature; important examples in cancer studies might include level of toxicity or time to recurrence. Recursive partitioning methods are ideal for building such risk groups. Two such methods are Classification and Regression Trees (CART) and a more recent competitor known as the partitioning Deletion/Substitution/Addition (partDSA) algorithm. Both utilize loss functions (e.g., squared error for a continuous outcome) as the basis for building, selecting, and assessing predictors, but they differ in the manner by which regression trees are constructed. Recently, we have shown that partDSA often outperforms CART in so-called "full data" settings (e.g., uncensored outcomes). However, when confronted with censored outcome data, the loss functions used by both procedures must be modified. There have been several attempts to adapt CART for right-censored data. This article describes two such extensions for partDSA that make use of observed data loss functions constructed using inverse probability of censoring weights. Such loss functions are consistent estimates of their uncensored counterparts provided that the corresponding censoring model is correctly specified. The relative performance of these new methods is evaluated via simulation studies and illustrated through an analysis of clinical trial data on brain cancer patients. The implementation of partDSA for uncensored and right-censored outcomes is publicly available in the R package partDSA.
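
To make the idea of an observed data loss function concrete, the following is a minimal Python sketch of an inverse probability of censoring weighted (IPCW) squared error loss. It is an illustration under stated assumptions, not the partDSA implementation: the censoring survival function G is estimated with a Kaplan-Meier estimator that ignores covariates, the outcome is the raw event time, and the function names (`censoring_survival_left_limit`, `ipcw_squared_error`) are hypothetical. The abstract does not spell out the two loss functions used in the article, so the form below is only one standard choice.

```python
from typing import List, Sequence


def censoring_survival_left_limit(
    times: Sequence[float], events: Sequence[int]
) -> List[float]:
    """Kaplan-Meier estimate of G(t-) = P(C >= t) at each subject's observed time.

    times  : observed follow-up time, min(event time, censoring time)
    events : 1 if the event was observed, 0 if the subject was censored

    Assumption: censoring does not depend on covariates, and at tied times
    events are treated as occurring just before censorings.
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    g_minus = [1.0] * n
    surv = 1.0      # running estimate of G just before the current time
    at_risk = n     # subjects with observed time >= current time
    k = 0
    while k < n:
        t = times[order[k]]
        j = k
        n_event = 0
        n_cens = 0
        # Process all subjects tied at time t together.
        while j < n and times[order[j]] == t:
            i = order[j]
            g_minus[i] = surv  # left limit: value before any jump at t
            if events[i] == 1:
                n_event += 1
            else:
                n_cens += 1
            j += 1
        if n_cens > 0:
            # Censoring is the "event" here; observed events at t leave first.
            surv *= 1.0 - n_cens / (at_risk - n_event)
        at_risk -= n_event + n_cens
        k = j
    return g_minus


def ipcw_squared_error(
    times: Sequence[float],
    events: Sequence[int],
    predictions: Sequence[float],
) -> float:
    """IPCW squared error: (1/n) * sum_i [delta_i / G(T_i-)] * (T_i - f_i)^2.

    Censored subjects (delta_i = 0) get weight zero; uncensored subjects are
    up-weighted by the inverse probability of remaining uncensored.
    """
    if not (len(times) == len(events) == len(predictions)):
        raise ValueError("times, events and predictions must have equal length")
    g_minus = censoring_survival_left_limit(times, events)
    total = 0.0
    for t, d, f, g in zip(times, events, predictions, g_minus):
        if d == 1:
            total += (t - f) ** 2 / g
    return total / len(times)
```

With no censoring every weight equals 1 and the loss reduces to the ordinary mean squared error. When the censoring model is misspecified (for example, if censoring in fact depends on covariates), the weights are wrong and the consistency property described above no longer holds.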



Available from: Robert L Strawderman, Aug 27, 2014
  • ABSTRACT: Time-to-event regression models are a critical tool for associating survival time outcomes with molecular data. Despite mounting evidence that genetic subgroups of the same clinical disease exist, little attention has been given to exploring how this heterogeneity affects time-to-event model building and how to accommodate it. Methods able to diagnose and model heterogeneity should be valuable additions to the biomarker discovery toolset. We propose a mixture of survival functions that classifies subjects with similar relationships to a time-to-event response. This model incorporates multivariate regression and model selection and can be fit with an EM algorithm that we call Cox-Assisted Clustering (CAC). We illustrate a likely manifestation of genetic heterogeneity and demonstrate how it may affect survival models with little warning. An application to gene expression in ovarian cancer DNA repair pathways illustrates how the model may be used to learn new genetic subsets for risk stratification. We explore the implications of this model for censored observations and the effect on genomic predictors and diagnostic analysis. An R implementation of CAC using standard packages is available at https://gist.github.com/programeng/8620b85146b14b6edf8f. Data used in the analysis are publicly available. Contact: kevin.eng@roswellpark.org.
    Bioinformatics 02/2014; 30(12). DOI:10.1093/bioinformatics/btu065 · 4.98 Impact Factor
  • ABSTRACT: For estimating conditional survival functions, non-parametric estimators can be preferred to parametric and semi-parametric estimators due to relaxed assumptions that enable robust estimation. Yet, even when misspecified, parametric and semi-parametric estimators can possess better operating characteristics in small sample sizes due to smaller variance than non-parametric estimators. Fundamentally, this is a bias-variance trade-off situation in that the sample size is not large enough to take advantage of the low bias of non-parametric estimation. Stacked survival models estimate an optimally weighted combination of models that can span parametric, semi-parametric, and non-parametric models by minimizing prediction error. An extensive simulation study demonstrates that stacked survival models consistently perform well across a wide range of scenarios by adaptively balancing the strengths and weaknesses of individual candidate survival models. In addition, stacked survival models perform as well as or better than the model selected through cross-validation. Finally, stacked survival models are applied to a well-known German breast cancer study.
    Biostatistics 02/2015; 16(3). DOI:10.1093/biostatistics/kxv001 · 2.65 Impact Factor
  • ABSTRACT: The effect of timing of initiation of concurrent radiation and chemotherapy after surgery on outcome of patients with glioblastoma (GBM) remains unclear. To further explore this issue, we analyzed 4 clinical trials for patients newly diagnosed with GBM receiving concurrent and adjuvant temozolomide. The cohort study included 198 adult patients with newly diagnosed supratentorial GBM who were enrolled from 2004 to 2010 in 4 clinical trials consisting of radiation plus temozolomide and an experimental agent. The interval to initiation of therapy was determined from the time of surgical resection. The partitioning deletion/substitution/addition algorithm was used to determine the cutoff points for timing of chemoradiation at which there was a significant difference in overall survival (OS) and progression-free survival (PFS). The median wait time between surgery and initiation of concurrent chemoradiation was 29.5 days (range, 7-56 days). A short delay in chemoradiation administration (at 30-34 days) was predictive of prolonged OS (hazard ratio [HR]: 0.63, P = .03) and prolonged PFS (HR: 0.68, P = .06) compared with early initiation of concurrent chemoradiation (<30 days), after adjusting for protocol and baseline prognostic variables including extent of resection by multivariate analysis. A longer delay to chemoradiation beyond 34 days was not associated with improved OS or PFS compared with early initiation (HR: 0.94, P = .77 and HR: 0.91, P = .63, respectively). A short delay in the start of concurrent chemoradiation is beyond the classic paradigm of 4 weeks post-resection and may be associated with prolonged OS and PFS. Abbreviations: GBM, glioblastoma; KPS, Karnofsky Performance Score; OS, overall survival; partDSA, partitioning deletion/substitution/addition algorithm; PFS, progression-free survival; RT, radiation therapy; TMZ, temozolomide.
    Neurosurgery 04/2015; Publish Ahead of Print(2). DOI:10.1227/NEU.0000000000000766 · 3.62 Impact Factor