
Abstract

Accurately predicting the time of occurrence of an event of interest is a critical problem in longitudinal data analysis. One of the main challenges in this context is the presence of instances whose event outcomes become unobservable after a certain time point or when some instances do not experience any event during the monitoring period. Such a phenomenon is called censoring which can be effectively handled using survival analysis techniques. Traditionally, statistical approaches have been widely developed in the literature to overcome this censoring issue. In addition, many machine learning algorithms are adapted to effectively handle survival data and tackle other challenging problems that arise in real-world data. In this survey, we provide a comprehensive and structured review of the representative statistical methods along with the machine learning techniques used in survival analysis and provide a detailed taxonomy of the existing methods. We also discuss several topics that are closely related to survival analysis and illustrate several successful applications in various real-world application domains. We hope that this paper will provide a more thorough understanding of the recent advances in survival analysis and offer some guidelines on applying these approaches to solve new problems that arise in applications with censored data.
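As a concrete illustration of how censored observations enter a survival estimate, the following minimal numpy sketch implements the Kaplan-Meier product-limit estimator; the data and names are illustrative only and are not taken from the survey.

```python
import numpy as np

def kaplan_meier(time, event):
    """Kaplan-Meier product-limit estimator.

    time  : observed times (event or censoring time)
    event : indicators, 1 = event observed, 0 = right-censored
    Returns the distinct event times and the estimated survival curve.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)              # subjects still under observation at t
        d = np.sum((time == t) & (event == 1))   # events occurring at t
        s *= 1.0 - d / at_risk
        surv.append(s)
    return event_times, np.array(surv)

# Toy example: three events and two censored follow-up times.
t_km, s_km = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print(dict(zip(t_km, np.round(s_km, 3))))
```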
... Survival analysis is an important and fundamental tool for modelling applications using time-to-event data [1], which is encountered in medicine, reliability, safety, finance, etc. This is one reason why many machine learning models have been developed to deal with time-to-event data and to solve the corresponding problems in the framework of survival analysis [2,3,4]. The crucial peculiarity of time-to-event data is that a training set consists of censored and uncensored observations. ...
... Many survival models are available to cover various cases of the time-to-event probability distributions and their parameters [3]. One of the important models is the Cox proportional hazards model [5], which can be regarded as a semi-parametric regression model. ...
... The importance of survival analysis applications can be regarded as one of the reasons for developing many machine learning methods dealing with censored time-to-event data. A comprehensive review of machine learning survival models is presented in [3]. A large part of these models builds on the Cox model, which can be viewed as a simple and widely applicable survival model that establishes a relationship between covariates and outcomes. ...
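For reference alongside these excerpts, the Cox proportional hazards model assumes h(t | x) = h0(t) exp(x^T beta); the sketch below is a minimal numpy implementation of its negative partial log-likelihood (the quantity minimized when fitting beta), with illustrative toy data and no claim to match any particular implementation cited here.

```python
import numpy as np

def cox_neg_partial_loglik(beta, X, time, event):
    """Negative Cox partial log-likelihood.

    Model: h(t | x) = h0(t) * exp(x @ beta). The baseline hazard h0(t)
    cancels out of the partial likelihood, so only beta appears here.
    """
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    eta = X @ np.asarray(beta, dtype=float)    # linear predictor per subject
    loglik = 0.0
    for i in np.flatnonzero(event == 1):       # sum over observed events only
        risk_set = time >= time[i]             # subjects still at risk at t_i
        loglik += eta[i] - np.log(np.sum(np.exp(eta[risk_set])))
    return -loglik

# Toy usage: two covariates, one right-censored subject (event = 0).
X = np.array([[1.0, 0.2], [0.5, 1.1], [0.0, 0.7], [1.3, 0.1]])
time = np.array([5.0, 3.0, 8.0, 2.0])
event = np.array([1, 1, 0, 1])
print(cox_neg_partial_loglik(np.zeros(2), X, time, event))
```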
Preprint
A method for estimating the conditional average treatment effect under censored time-to-event data, called BENK (the Beran Estimator with Neural Kernels), is proposed. The main idea behind the method is to apply the Beran estimator to estimate the survival functions of controls and treatments. Instead of typical kernel functions in the Beran estimator, it is proposed to implement kernels in the form of neural networks of a specific form called neural kernels. The conditional average treatment effect is estimated by using the survival functions as outcomes of the control and treatment neural networks, which consist of a set of neural kernels with shared parameters. The neural kernels are more flexible and can accurately model a complex location structure of feature vectors. Various numerical simulation experiments illustrate BENK and compare it with the well-known T-learner, S-learner and X-learner for several types of control and treatment outcome functions based on the Cox models, the random survival forest and Nadaraya-Watson regression with Gaussian kernels. The code of the proposed algorithms implementing BENK is available at https://github.com/Stasychbr/BENK.
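As a rough illustration of the Beran estimator that BENK builds on, the sketch below computes a conditional Kaplan-Meier curve using a plain Gaussian kernel; the neural kernels, shared-parameter networks and treatment-effect estimation of the actual method are not reproduced, and all names and data are illustrative assumptions.

```python
import numpy as np

def beran_survival(t_grid, x_query, X, time, event, bandwidth=1.0):
    """Beran (conditional Kaplan-Meier) estimator of S(t | x).

    Kernel weights localize the Kaplan-Meier estimator around x_query;
    here a Gaussian kernel stands in for the neural kernels used in BENK.
    """
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    order = np.argsort(time)                      # ascending event/censoring times
    time, event, X = time[order], event[order], X[order]
    d2 = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    w = w / w.sum()                               # normalized kernel weights
    cum_w = np.concatenate(([0.0], np.cumsum(w)[:-1]))   # weight of earlier subjects
    factors = np.where(event == 1, 1.0 - w / np.maximum(1.0 - cum_w, 1e-12), 1.0)
    return np.array([np.prod(factors[time <= t]) for t in np.atleast_1d(t_grid)])

# Toy usage: survival curve at a query point x = (0.5, 0.5).
X = np.array([[0.1, 0.9], [0.4, 0.6], [0.8, 0.2], [0.6, 0.4]])
time = np.array([2.0, 5.0, 3.0, 7.0])
event = np.array([1, 0, 1, 1])
print(beran_survival([2.0, 4.0, 8.0], np.array([0.5, 0.5]), X, time, event))
```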
... One essential line of work to tackle this challenge adopts the proportional hazards (PH) assumption [5], instead of attempting to fully model the survival function. More recently, traditional survival analysis methods [5] have been complemented and then superseded by machine learning approaches; for a survey, see [39]. For example, with the increasing success of deep learning methods, DeepSurv [18] has reported a significant increase in performance by employing a neural network with a loss function adapted to respect the assumed proportionality of hazards. ...
... Ridge-Cox [36] and lasso-Cox [37] add ℓ2 and ℓ1 regularization terms to the original Cox model, respectively. Wang, Li, and Reddy [39] present a recent survey on the intersection between survival analysis and machine learning research. The random survival forest (RSF) adopts ensemble learning to cope with censored cases [14,15], and Khan and Zubek [20] introduce support vector regression for censored data (SVRC). ...
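For reference, the penalized objectives mentioned in this excerpt take the following form, where ℓ(β) denotes the Cox partial log-likelihood and λ a tuning parameter; this is a standard restatement, not code from the cited works.

```latex
% Penalized Cox objectives (sketch): ridge uses a squared L2 penalty,
% the lasso an L1 penalty, added to the negative partial log-likelihood.
\hat{\beta}_{\text{ridge}} = \arg\min_{\beta}\Bigl\{-\ell(\beta) + \lambda \lVert\beta\rVert_2^2\Bigr\},
\qquad
\hat{\beta}_{\text{lasso}} = \arg\min_{\beta}\Bigl\{-\ell(\beta) + \lambda \lVert\beta\rVert_1\Bigr\}
```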
Preprint
Survival analysis is the branch of statistics that studies the relation between the characteristics of living entities and their respective survival times, taking into account the partial information held by censored cases. A good analysis can, for example, determine whether one medical treatment for a group of patients is better than another. With the rise of machine learning, survival analysis can be modeled as learning a function that maps studied patients to their survival times. To succeed with that, there are three crucial issues to be tackled. First, some patient data is censored: we do not know the true survival times for all patients. Second, data is scarce, which led past research to treat different illness types as domains in a multi-task setup. Third, there is the need for adaptation to new or extremely rare illness types, where little or no labels are available. In contrast to previous multi-task setups, we want to investigate how to efficiently adapt to a new survival target domain from multiple survival source domains. For this, we introduce a new survival metric and the corresponding discrepancy measure between survival distributions. These allow us to define domain adaptation for survival analysis while incorporating censored data, which would otherwise have to be dropped. Our experiments on two cancer data sets show superior performance on target domains, better treatment recommendations, and a weight matrix with a plausible explanation.
... The proposed model achieved better prediction performance than DeepSurv and CPH in terms of discrimination, calibration, and risk stratification ability. The multivariate CPH model is the most widely used survival model for fitting the relationship between patients' covariates and outcomes [27]. However, its prediction ability may be limited by its linearity assumptions. ...
Article
Heart failure (HF) is challenging public medical and healthcare systems. This study aimed to develop and validate a novel deep learning-based prognostic model to predict the risk of all-cause mortality for patients with HF. We also compared the performance of the proposed model with those of classical deep learning- and traditional statistical-based models. The present study enrolled 730 patients with HF hospitalized at Toho University Ohashi Medical Center between April 2016 and March 2020. A recurrent neural network-based model (RNNSurv) involving time-varying covariates was developed and validated. The proposed RNNSurv showed better prediction performance than a deep feed-forward neural network-based model (referred to as “DeepSurv”) and a multivariate Cox proportional hazards model in terms of discrimination (C-index: 0.839 vs. 0.755 vs. 0.762, respectively), calibration (better fit with a 45-degree line), and risk stratification ability, especially in identifying patients at high risk of mortality. The proposed RNNSurv demonstrated improved prediction performance by taking temporal information from time-varying covariates into account, which could assist clinical decision-making. Additionally, this study found that significant risk and protective factors of mortality were specific to risk levels, highlighting the demand for an individual-specific clinical strategy instead of a uniform one for all patients.
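Since the comparison above is reported in terms of the C-index, a minimal numpy sketch of Harrell's concordance index is given below; it is a generic implementation with illustrative data, not the evaluation code of the cited study.

```python
import numpy as np

def harrell_c_index(time, event, risk):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when the subject with the shorter observed
    time actually experienced the event (time_i < time_j and event_i == 1);
    it is concordant when that subject also has the higher predicted risk.
    Ties in predicted risk count as 0.5.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if event[i] != 1:
            continue
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy usage: higher risk scores should pair with shorter event times.
print(harrell_c_index([2, 4, 6, 5], [1, 1, 0, 1], [0.9, 0.6, 0.1, 0.3]))
```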
... Machine learning methods have demonstrated the ability to learn feature representations automatically from the numeric data under analysis. These features are then used to automatically build models with well-known classification algorithms [18]. In detail, supervised machine learning algorithms can be exploited to build predictive models from labeled radiomic characteristics. ...
Article
The Gleason score was originally formulated to represent the heterogeneity of prostate cancer and helps to stratify the risk of patients affected by this tumor. Gleason score assignment is a task performed by pathologists on H&E-stained slides during histopathological examination of needle biopsies or surgical specimens. In this paper, we propose an approach focused on automatic Gleason score classification. We exploit a set of 18 radiomic features. The radiomic feature set is directly obtainable from segmented magnetic resonance images. We build several models considering supervised machine learning techniques, obtaining with the RandomForest classification algorithm a precision ranging from 0.803 to 0.888 and a recall from 0.873 to 0.899. Moreover, with the aim of improving the detection of never-seen instances, we exploit sigmoid calibration to better tune the built model.
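A minimal sketch of the modelling recipe described above, using scikit-learn's random forest with sigmoid (Platt) calibration of its probabilities; the synthetic 18-feature matrix stands in for the radiomic features and is not the paper's data.

```python
# Illustrative sketch only: a generic 18-feature matrix and labels are assumed;
# the paper's actual radiomic feature set and data are not reproduced here.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=18, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Random forest classifier wrapped with sigmoid (Platt) calibration.
model = CalibratedClassifierCV(RandomForestClassifier(random_state=0),
                               method="sigmoid", cv=5)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print(precision_score(y_te, pred), recall_score(y_te, pred))
```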
... The analysis developed by the National Bank of the Italian Republic focuses on internal and external factors and causes related to the realization of a construction work, following a statistical methodology which allows different factors to be correlated. The specific time elapsing between a fixed starting point and a closing event is considered in a statistical method named survival analysis [10]. ...
Chapter
The average incidence of design variation and unexpected events related to the delivery of construction sites is about 13% compared to project planning. The objective of this study is to lower this percentage to about 10–11%, using methodologies aimed at configuring effective digital management strategies. The main purpose is to enhance process efficiency and the optimization of construction management strategies, as BIM-based digital management approaches make it possible to predict unforeseen events, reducing negative variations in time and costs. The proposed application case concerns an applied methodology conducted on a 35,000 m2 historical building renovation project, in a central urban context of Rome, owned by a public institutional real estate company. The implementation of the proposed BIM-based digital information management strategies enhanced efficiency in site management, reducing the incidence of delays on construction site delivery by about 3%. Such improvement is related to the reduction of delays deriving from the prediction of unexpected events. The application of the proposed methodology radically improved the traditional site management strategy used by the construction company, generating a significant reduction of wasted time and resources; in fact, the use of AI and ML systems promptly supported decision-making processes. The result is the configuration of a digital process allowing an optimized time and material management process through real-time monitoring of on-site activities, configuring an effective decision-making support system. Moreover, the information model was also developed according to Asset Information Model (AIM) requirements, able to provide a reliable database for the operation and maintenance phase. Keywords: Digital construction management; BIM; Machine learning
Chapter
Recent research has shown the potential for neural networks to improve upon classical survival models such as the Cox model, which is widely used in clinical practice. Neural networks, however, typically rely on data that are centrally available, whereas healthcare data are frequently held in secure silos. We present a federated Cox model that accommodates this data setting and also relaxes the proportional hazards assumption, allowing time-varying covariate effects. In this latter respect, our model does not require explicit specification of the time-varying effects, reducing upfront organisational costs compared to previous works. We experiment with publicly available clinical datasets and demonstrate that the federated model is able to perform as well as a standard model. Keywords: Survival analysis; Federated learning; Non-proportional hazards
Article
Biomedical multi-modality data (also named multi-omics data) refer to data that span different types and derive from multiple sources in clinical practice (e.g. gene sequences, proteomics and histopathological images), which can provide comprehensive perspectives for cancers and generally improve the performance of survival models. However, the performance improvement of multi-modality survival models may be hindered by two key issues as follows: (1) how to learn and fuse modality-sharable and modality-individual representations from multi-modality data; (2) how to explore the potential risk-aware characteristics in each risk subgroup, which is beneficial to risk stratification and prognosis evaluation. Additionally, learning-based survival models generally involve numerous hyper-parameters, which require time-consuming parameter setting and might result in a suboptimal solution. In this paper, we propose an adaptive risk-aware sharable and individual subspace learning method for cancer survival analysis. The proposed method jointly learns sharable and individual subspaces from multi-modality data, whereas two auxiliary terms (i.e. intra-modality complementarity and inter-modality incoherence) are developed to preserve the complementary and distinctive properties of each modality. Moreover, it is equipped with a grouping co-expression constraint for obtaining risk-aware representations and preserving local consistency. Furthermore, an adaptive-weighted strategy is employed to efficiently estimate crucial parameters during the training stage. Experimental results on three public datasets demonstrate the superiority of our proposed model.
Article
An attention-based random survival forest (Att-RSF) is presented in the paper. The first main idea behind this model is to adapt Nadaraya-Watson kernel regression to the random survival forest so that the regression weights or kernels can be regarded as trainable attention weights, under the important condition that the predictions of the random survival forest are represented in the form of functions, for example, the survival function and the cumulative hazard function. Each trainable weight assigned to a tree and a training or testing example is defined by two factors: by the ability of the corresponding tree to predict and by the peculiarity of the example which falls into a leaf of the tree. The second main idea behind Att-RSF is to apply Huber's contamination model to represent the attention weights as a linear function of the trainable attention parameters. Harrell's C-index (concordance index), which measures the prediction quality of the random survival forest, is used to form the loss function for training the attention weights. The C-index jointly with the contamination model leads to a standard quadratic optimization problem for computing the weights, for which many simple solution algorithms exist. Numerical experiments with real datasets containing survival data illustrate Att-RSF.
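As a reference point for the first idea behind Att-RSF, the sketch below shows plain Nadaraya-Watson kernel regression, i.e. a kernel-weighted average of stored outputs; the trainable attention parameters, Huber's contamination model and the C-index-based quadratic program of the actual method are not reproduced, and all names and data are illustrative.

```python
import numpy as np

def nadaraya_watson(x_query, X, Y, bandwidth=1.0):
    """Nadaraya-Watson kernel regression: a kernel-weighted average of outputs.

    In Att-RSF the outputs would be per-tree survival (or cumulative hazard)
    functions and the weights would be made trainable; here a fixed Gaussian
    kernel and scalar outputs illustrate only the weighting idea.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    d2 = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    w = w / w.sum()                 # attention-style weights summing to one
    return w @ Y                    # weighted average of the stored outputs

# Toy usage: outputs of four examples averaged with kernel weights.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y = np.array([0.9, 0.7, 0.6, 0.2])
print(nadaraya_watson(np.array([0.2, 0.1]), X, Y, bandwidth=0.5))
```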
Chapter
This chapter provides a comprehensive overview of data-driven disease progression modeling techniques. It adopts a broad approach to disease progression, focusing on all computational methods able to model any temporal aspect of disease progression. Consequently, we have focused on three classes of analysis: staging and trajectory estimation analysis to better understand the course of a disease, predictive classification analysis for the prediction of important disease-related events, and time-to-event analysis with survival models to estimate when clinically significant events are expected to occur during the progression of a disease. We describe the state of the art in each of these classes, together with discussions on challenges and opportunities for additional research.
Conference Paper
We introduce a novel check-in time prediction problem. The goal is to predict the time a user will check in to a given location. We formulate check-in prediction as a survival analysis problem and propose a Recurrent-Censored Regression (RCR) model. We address the key challenge of check-in data scarcity, which is due to the uneven distribution of check-ins among users/locations. Our idea is to enrich the check-in data with potential visitors, i.e., users who have not visited the location before but are likely to do so. RCR uses a recurrent neural network to learn latent representations from historical check-ins of both actual and potential visitors, which are then incorporated with censored regression to make predictions. Experiments show RCR outperforms state-of-the-art event time prediction techniques on real-world datasets.
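The abstract does not spell out the censored-regression component, so the sketch below shows one common formulation that could play that role: a Tobit-style right-censored Gaussian loss over model predictions (e.g. log check-in times). This is an assumption for illustration, not the RCR loss itself.

```python
import numpy as np
from scipy.stats import norm

def censored_regression_nll(y, observed, mu, sigma=1.0):
    """Negative log-likelihood of a Tobit-style right-censored Gaussian model.

    y        : observed value, or the censoring threshold when observed == 0
    observed : 1 if the true value was observed, 0 if it is only known to
               exceed y (right-censoring)
    mu       : model predictions (e.g. outputs of a recurrent network)
    """
    y, mu = np.asarray(y, dtype=float), np.asarray(mu, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    z = (y - mu) / sigma
    nll = np.where(observed,
                   -norm.logpdf(z) + np.log(sigma),   # exact observation
                   -norm.logsf(z))                    # only a lower bound is known
    return nll.sum()

# Toy usage: two observed targets and one right-censored target.
print(censored_regression_nll([1.2, 0.5, 2.0], [1, 1, 0], [1.0, 0.8, 1.5]))
```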
Article
Right censoring is a common phenomenon that arises in many longitudinal studies where an event of interest could not be recorded within the given time frame. Censoring causes missing time-to-event labels, and this effect is compounded when dealing with datasets which have high amounts of censored instances. In addition, dependent censoring, where censoring depends on the covariates in the data, leads to bias in standard survival estimators (such as Kaplan-Meier). This motivates us to propose an imputed censoring approach which calibrates the right censored (RC) times in an attempt to reduce the bias in the survival estimators. This calibration is done using an imputation method which estimates the sparse inverse covariance matrix over the dataset in an iterative convergence framework. During estimation, we apply row- and column-based regularization to account for both row- and column-wise correlations between different instances while imputing them. We evaluate the goodness of our approach using crowdfunding data, electronic health records (EHRs) and synthetic censored datasets. Experimental results indicate that our approach helps in improving the AUC values of survival learners, compared to applying them directly on the original survival data.
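Only the sparse inverse covariance step lends itself to a generic illustration; the sketch below uses scikit-learn's graphical lasso on a stand-in data matrix and does not reproduce the paper's iterative, row- and column-regularized imputation of censored times.

```python
# Illustrative sketch of sparse inverse covariance estimation only; the data
# matrix is a random stand-in, not a (partially imputed) survival dataset.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))      # stand-in for the data matrix being imputed

model = GraphicalLasso(alpha=0.1)   # alpha controls sparsity of the precision matrix
model.fit(X)
precision = model.precision_        # sparse inverse covariance estimate
print(np.round(precision[:3, :3], 3))
```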
Article
Prognostic classification schemes have often been used in medical applications, but rarely subjected to a rigorous examination of their adequacy. For survival data, the statistical methodology to assess such schemes consists mainly of a range of ad hoc approaches, and there is an alarming lack of commonly accepted standards in this field. We review these methods and develop measures of inaccuracy which may be calculated in a validation study in order to assess the usefulness of estimated patient‐specific survival probabilities associated with a prognostic classification scheme. These measures are meaningful even when the estimated probabilities are misspecified, and asymptotically they are not affected by random censorship. In addition, they can be used to derive R²‐type measures of explained residual variation. A breast cancer study will serve for illustration throughout the paper. Copyright © 1999 John Wiley & Sons, Ltd.
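The measures of inaccuracy developed in this paper are commonly cited in the censoring-weighted Brier score form sketched below, evaluated at a fixed time t*, with Ĝ the Kaplan-Meier estimate of the censoring distribution; this is a standard restatement rather than a verbatim excerpt.

```latex
% Censoring-weighted Brier score at a fixed time t* (expected quadratic loss);
% \hat{G} is the Kaplan-Meier estimate of the censoring distribution.
\mathrm{BS}(t^{*}) = \frac{1}{N}\sum_{i=1}^{N}
\left[
  \frac{\hat{S}(t^{*}\mid x_i)^{2}\,\mathbf{1}\{T_i \le t^{*},\,\delta_i = 1\}}{\hat{G}(T_i)}
  +
  \frac{\bigl(1-\hat{S}(t^{*}\mid x_i)\bigr)^{2}\,\mathbf{1}\{T_i > t^{*}\}}{\hat{G}(t^{*})}
\right]
```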
Book
There is a huge amount of literature on statistical models for the prediction of survival after diagnosis of a wide range of diseases like cancer, cardiovascular disease, and chronic kidney disease. Current practice is to use prediction models based on the Cox proportional hazards model and to present those as static models for remaining lifetime after diagnosis or treatment. In contrast, Dynamic Prediction in Clinical Survival Analysis focuses on dynamic models for the remaining lifetime at later points in time, for instance using landmark models. Designed to be useful to applied statisticians and clinical epidemiologists, each chapter in the book has a practical focus on the issues of working with real life data. Chapters conclude with additional material either on the interpretation of the models, alternative models, or theoretical background. The book consists of four parts:
• Part I deals with prognostic models for survival data using (clinical) information available at baseline, based on the Cox model.
• Part II is about prognostic models for survival data using (clinical) information available at baseline, when the proportional hazards assumption of the Cox model is violated.
• Part III is dedicated to the use of time-dependent information in dynamic prediction.
• Part IV explores dynamic prediction models for survival data using genomic data.
Dynamic Prediction in Clinical Survival Analysis summarizes cutting-edge research on the dynamic use of predictive models with traditional and new approaches. Aimed at applied statisticians who actively analyze clinical data in collaboration with clinicians, the analyses of the different data sets throughout the book demonstrate how predictive models can be obtained from proper data sets.
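A minimal worked form of the dynamic prediction targeted by landmark models: the conditional probability of surviving to a horizon t given survival up to the landmark time s, stated here as a generic identity rather than the book's specific models.

```latex
% Conditional (dynamic) survival probability underlying landmark prediction:
% the chance of surviving to horizon t given survival up to landmark time s.
S(t \mid T > s, x) = \frac{S(t \mid x)}{S(s \mid x)}, \qquad t \ge s
```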
Article
We propose a new method for estimation in linear models. The ‘lasso’ minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree‐based models are briefly described.
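Restating the constraint described in the abstract in symbols, with β0 an intercept and s the tuning constant:

```latex
% The lasso: least squares subject to an L1 constraint on the coefficients,
% with s the tuning constant mentioned in the abstract.
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta}\;
\sum_{i=1}^{N}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^{2}
\quad \text{subject to} \quad \sum_{j=1}^{p}\lvert\beta_j\rvert \le s
```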
Book
This greatly expanded second edition of Survival Analysis - A Self-Learning Text provides a highly readable description of state-of-the-art methods of analysis of survival/event-history data. This text is suitable for researchers and statisticians working in the medical and other life sciences as well as statisticians in academia who teach introductory and second-level courses on survival analysis. The second edition continues to use the unique "lecture-book" format of the first (1996) edition with the addition of three new chapters on advanced topics: Chapter 7 (Parametric Models), Chapter 8 (Recurrent Events), and Chapter 9 (Competing Risks). Also, the Computer Appendix has been revised to provide step-by-step instructions for using the computer packages STATA (Version 7.0), SAS (Version 8.2), and SPSS (Version 11.5) to carry out the procedures presented in the main text. The original six chapters have been modified slightly to expand and clarify aspects of survival analysis in response to suggestions by students, colleagues and reviewers, and to add theoretical background, particularly regarding the formulation of the (partial) likelihood functions for proportional hazards, stratified, and extended Cox regression models. David Kleinbaum is Professor of Epidemiology at the Rollins School of Public Health at Emory University, Atlanta, Georgia. Dr. Kleinbaum is internationally known for innovative textbooks and teaching on epidemiological methods, multiple linear regression, logistic regression, and survival analysis. He has provided extensive worldwide short-course training in over 150 short courses on statistical and epidemiological methods. He is also the author of ActivEpi (2002), an interactive computer-based instructional text on fundamentals of epidemiology, which has been used in a variety of educational environments including distance learning. Mitchel Klein is Research Assistant Professor with a joint appointment in the Department of Environmental and Occupational Health (EOH) and the Department of Epidemiology, also at the Rollins School of Public Health at Emory University. Dr. Klein is also co-author with Dr. Kleinbaum of the second edition of Logistic Regression - A Self-Learning Text (2002). He has regularly taught epidemiologic methods courses at Emory to graduate students in public health and in clinical medicine. He is responsible for the epidemiologic methods training of physicians enrolled in Emory’s Master of Science in Clinical Research Program, and has collaborated with Dr. Kleinbaum both nationally and internationally in teaching several short courses on various topics in epidemiologic methods.
Conference Paper
Technological advances have created a great opportunity to provide multi-view data for patients. However, due to the large discrepancy between different heterogeneous views, traditional survival models are unable to efficiently handle multi-modality data or to learn the very complex interactions that can affect survival outcomes in various ways. In this paper, we develop a Deep Correlational Survival Model (DeepCorrSurv) for the integration of multi-view data. The proposed network consists of two sub-networks, a view-specific and a common sub-network. To remove the view discrepancy, the proposed DeepCorrSurv first explicitly maximizes the correlation among the views. Then it transfers feature hierarchies from the view commonality and fine-tunes them specifically on the survival regression task. Extensive experiments on real lung and brain tumor data sets demonstrated the effectiveness of the proposed DeepCorrSurv model using multi-modality data across different tumor types.