A Partitioning Deletion/Substitution/Addition Algorithm for Creating Survival Risk Groups

Division of Biostatistics, Yale University Schools of Public Health and Medicine, 60 College Street, New Haven, Connecticut 06519, U.S.A.
Department of Biological Statistics and Computational Biology and Department of Statistical Science, Cornell University, Ithaca, New York, U.S.A.
Departments of Neurological Surgery and Epidemiology and Biostatistics, University of California San Francisco, 505 Parnassus Avenue, San Francisco, California 94117, U.S.A.
email: .
Biometrics (Impact Factor: 1.52). 04/2012; 68(4). DOI: 10.1111/j.1541-0420.2012.01756.x
Source: PubMed

ABSTRACT Accurately assessing a patient's risk of a given event is essential in making informed treatment decisions. One approach is to stratify patients into two or more distinct risk groups with respect to a specific outcome using both clinical and demographic variables. Outcomes may be categorical or continuous; important examples in cancer studies include level of toxicity and time to recurrence. Recursive partitioning methods are well suited to building such risk groups. Two such methods are Classification and Regression Trees (CART) and a more recent competitor, the partitioning Deletion/Substitution/Addition (partDSA) algorithm. Both use loss functions (e.g., squared error for a continuous outcome) as the basis for building, selecting, and assessing predictors, but they differ in how the regression trees are constructed. We have recently shown that partDSA often outperforms CART in so-called "full data" settings (e.g., uncensored outcomes). When the outcome data are censored, however, the loss functions used by both procedures must be modified. There have been several attempts to adapt CART to right-censored data. This article describes two analogous extensions of partDSA that use observed-data loss functions constructed with inverse probability of censoring weights. Such loss functions are consistent estimates of their uncensored counterparts, provided that the corresponding censoring model is correctly specified. The relative performance of these new methods is evaluated through simulation studies and illustrated through an analysis of clinical trial data on brain cancer patients. The implementation of partDSA for uncensored and right-censored outcomes is publicly available in the R package partDSA.
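The central idea of the abstract, an observed-data loss built from inverse probability of censoring weights (IPCW), can be illustrated with a small Python sketch. It assumes independent censoring, estimates the censoring survival function G(t) = P(C > t) with a Kaplan-Meier estimator (one simple choice of censoring model; the models used in the paper may differ), and applies the weights to a squared-error loss on the observed time. The function names `censoring_survival_left_limit` and `ipcw_squared_error` are hypothetical and are not part of the partDSA package API.

```python
from typing import Callable, Sequence


def censoring_survival_left_limit(
    times: Sequence[float], events: Sequence[int]
) -> Callable[[float], float]:
    """Kaplan-Meier estimate of the censoring survival G(t) = P(C > t).

    Assumption: censoring is independent of the event time. Censorings
    (event == 0) play the role of "events" here. The returned function
    evaluates the left limit G(t-), i.e. it only uses censoring times
    strictly before t. Ties are handled naively.
    """
    cens_times = sorted({t for t, d in zip(times, events) if d == 0})
    steps = []  # (censoring time, estimated G just after that time)
    surv = 1.0
    for c in cens_times:
        at_risk = sum(1 for t in times if t >= c)
        n_cens = sum(1 for t, d in zip(times, events) if t == c and d == 0)
        surv *= 1.0 - n_cens / at_risk
        steps.append((c, surv))

    def g_minus(t: float) -> float:
        # Step function: last KM value at a censoring time strictly before t.
        value = 1.0
        for c, s in steps:
            if c >= t:
                break
            value = s
        return value

    return g_minus


def ipcw_squared_error(
    times: Sequence[float], events: Sequence[int], preds: Sequence[float]
) -> float:
    """IPCW estimate of the full-data mean squared error E[(T - pred)^2].

    Only uncensored subjects contribute, each weighted by 1 / G(T_i-).
    Hypothetical helper; the loss here uses the raw observed time.
    """
    g_minus = censoring_survival_left_limit(times, events)
    total = 0.0
    for t, d, p in zip(times, events, preds):
        if d == 1:
            total += (t - p) ** 2 / g_minus(t)
    return total / len(times)


if __name__ == "__main__":
    times = [2.0, 3.0, 4.0, 5.0]  # observed follow-up times
    events = [1, 0, 1, 1]         # 1 = event observed, 0 = censored
    print(ipcw_squared_error(times, events, [3.0, 3.0, 3.0, 3.0]))
```

Each uncensored subject is up-weighted by 1/G(T_i-) to stand in for similar subjects who were censored before their event. If the censoring model is correct, the weighted average is consistent for the full-data mean squared error, which is the property the abstract relies on.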



Available from: Robert L Strawderman, Aug 27, 2014
    ABSTRACT: Time-to-event regression models are a critical tool for associating survival time outcomes with molecular data. Despite mounting evidence that genetic subgroups of the same clinical disease exist, little attention has been given to how this heterogeneity affects time-to-event model building and how to accommodate it. Methods able to diagnose and model heterogeneity should be valuable additions to the biomarker discovery toolset. We propose a mixture of survival functions that classifies subjects with similar relationships to a time-to-event response. This model incorporates multivariate regression and model selection and can be fit with an EM algorithm; we call the approach Cox-Assisted Clustering (CAC). We illustrate a likely manifestation of genetic heterogeneity and demonstrate how it may affect survival models with little warning. An application to gene expression in ovarian cancer DNA repair pathways illustrates how the model may be used to learn new genetic subsets for risk stratification. We explore the implications of this model for censored observations and its effect on genomic predictors and diagnostic analysis. R implementation of CAC using standard packages is available at Data used in the analysis are publicly available.
    Bioinformatics 02/2014; 30(12). DOI:10.1093/bioinformatics/btu065 · 4.62 Impact Factor
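The mixture-of-survival-functions idea in the abstract above can be illustrated with a deliberately simplified Python sketch. It is not CAC: it drops the covariates and replaces the Cox components with two constant-hazard (exponential) survival models, which keeps both EM steps in closed form. Under component k, an observed event at time t contributes the density lambda_k * exp(-lambda_k * t) to the likelihood, and a censored subject contributes the survival probability exp(-lambda_k * t). The function name `em_exponential_mixture` and its initialization are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def em_exponential_mixture(
    times: Sequence[float], events: Sequence[int], n_iter: int = 200
) -> Tuple[List[float], List[float], List[List[float]]]:
    """EM for a two-component mixture of exponential survival models.

    Simplified stand-in (assumption): no covariates, constant hazards.
    Returns (mixing proportions, hazards, per-subject responsibilities).
    """
    # Start both components from the pooled exponential MLE, spread apart
    # so that they can separate.
    pooled = sum(events) / sum(times)
    lam = [pooled * 0.25, pooled * 4.0]
    pi = [0.5, 0.5]
    resp: List[List[float]] = []
    for _ in range(n_iter):
        # E-step: posterior probability that each subject belongs to each
        # component. Likelihood: lambda^d * exp(-lambda * t), d = event flag.
        # (Very large times could underflow both terms to 0; not guarded here.)
        resp = []
        for t, d in zip(times, events):
            lik = [pi[k] * lam[k] ** d * math.exp(-lam[k] * t) for k in range(2)]
            total = sum(lik)
            resp.append([x / total for x in lik])
        # M-step: weighted closed-form updates.
        for k in range(2):
            r_k = [r[k] for r in resp]
            pi[k] = sum(r_k) / len(times)
            # Weighted exponential MLE: weighted events / weighted exposure.
            lam[k] = sum(r * d for r, d in zip(r_k, events)) / sum(
                r * t for r, t in zip(r_k, times)
            )
    return pi, lam, resp


if __name__ == "__main__":
    times = [0.1, 0.2, 0.15, 0.3, 0.25, 10.0, 12.0, 9.0, 11.0, 14.0]
    events = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]  # last subject is censored
    pi, lam, resp = em_exponential_mixture(times, events)
    print(pi, lam)
```

The responsibilities act as soft cluster labels for each subject. In a covariate-adjusted version, each M-step would instead require fitting a weighted survival regression model per component rather than a closed-form hazard update.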