Predicting kidney transplant survival using tree-based modeling.
ABSTRACT Predicting the outcome of kidney transplantation is clinically important and computationally challenging. The goal of this project was to develop models predicting the probability of kidney allograft survival at 1, 3, 5, 7, and 10 years. Kidney transplant data from the United States Renal Data System (January 1, 1990, to December 31, 1999, with follow-up through December 31, 2000) were used (n = 92,844). Independent variables included recipient demographic and anthropometric data, end-stage renal disease course, comorbidity information, donor data, and transplant procedure variables. Tree-based models predicting the probability of allograft survival were generated using roughly two-thirds of the data (training set), with the remaining one-third set aside for model validation (testing set). Prediction of the probability of graft survival in the independent testing dataset correlated well with observed survival (r = 0.94, 0.98, 0.99, 0.93, and 0.98) and achieved relatively high areas under the receiver operating characteristic curve (0.63, 0.64, 0.71, 0.82, and 0.90) for 1-, 3-, 5-, 7-, and 10-year survival prediction, respectively. The models predicting the probability of 1-, 3-, 5-, 7-, and 10-year allograft survival were validated on the independent dataset and demonstrated performance that may support implementation in a clinical decision support system.
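The workflow described above (a roughly two-thirds/one-third train-test split, a tree-based classifier producing survival probabilities, and evaluation by area under the ROC curve on the held-out set) can be sketched as follows. This is a minimal illustration on synthetic data, not the USRDS cohort; the feature set and model settings are stand-ins, not the authors' actual variables or parameters.

```python
# Hedged sketch: tree-based graft-survival prediction evaluated on a held-out
# test set. Data are synthetic stand-ins for recipient/donor variables.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

# Synthetic binary outcome (e.g., graft surviving vs. failing at a horizon).
X, y = make_classification(n_samples=3000, n_features=20, n_informative=8,
                           random_state=0)

# Roughly two-thirds for training, one-third set aside for validation.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)

model = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)
p_te = model.predict_proba(X_te)[:, 1]   # predicted survival probability
auc = roc_auc_score(y_te, p_te)          # area under the ROC curve
print(f"test AUC = {auc:.2f}")
```

The same split-train-score pattern generalizes to one model per horizon (1, 3, 5, 7, and 10 years), each scored on its own held-out outcomes.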
- Source: Petra Reinke
ABSTRACT: Renal transplantation has dramatically improved the survival rate of hemodialysis patients. However, with a growing proportion of marginal organs and improved immunosuppression, it is necessary to verify that the established allocation system, based mostly on human leukocyte antigen matching, still meets today's needs. The authors turn to machine-learning techniques to predict, from donor-recipient data, the estimated glomerular filtration rate (eGFR) of the recipient 1 year after transplantation. The patient's eGFR was predicted using donor-recipient characteristics available at the time of transplantation. Donors' data were obtained from Eurotransplant's database, while recipients' details were retrieved from Charité Campus Virchow-Klinikum's database. A total of 707 renal transplantations from cadaveric donors were included. Two separate datasets were created, taking features with <10% missing values for one and <50% missing values for the other. Four established regressors were run on both datasets, with and without feature selection. The authors obtained a Pearson correlation coefficient between predicted and real eGFR (COR) of 0.48. The best model was a Gaussian support vector machine with recursive feature elimination on the more inclusive dataset. All results are available at http://transplant.molgen.mpg.de/. For now, missing values in the data must be predicted and filled in. The performance is not as high as hoped, but the dataset appears to be the main limiting factor. Predicting the outcome is possible with the dataset at hand (COR = 0.48). Valuable features include the age and creatinine levels of the donor, as well as the sex and weight of the recipient. Journal of the American Medical Informatics Association 08/2011; 19(2):255-62. · 3.57 Impact Factor
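The core of this approach, regressing post-transplant eGFR on donor-recipient features with a Gaussian (RBF-kernel) support vector machine and scoring by the Pearson correlation between predicted and observed values, can be sketched as below. The data are synthetic and the hyperparameters are illustrative assumptions; the recursive feature elimination step from the paper is omitted for brevity.

```python
# Hedged sketch: RBF-kernel SVM regression of a continuous outcome (eGFR
# stand-in) from donor-recipient features, scored by Pearson correlation (COR).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in: ~700 transplants, 15 candidate features.
X, y = make_regression(n_samples=700, n_features=15, n_informative=6,
                       noise=30.0, random_state=0)
y = (y - y.mean()) / y.std()   # standardize the target for SVR stability

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Feature scaling matters for kernel SVMs; C=10 is an illustrative choice.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X_tr, y_tr)
y_hat = model.predict(X_te)

# Pearson correlation between predicted and observed values (the COR metric).
cor = np.corrcoef(y_te, y_hat)[0, 1]
print(f"COR = {cor:.2f}")
```
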
-
ABSTRACT: Predicting the outcome of kidney transplantation is important in optimizing transplantation parameters and in modifying factors related to the recipient, donor, and transplant procedure. As patients with end-stage renal disease (ESRD) secondary to lupus nephropathy are generally younger than typical ESRD patients and also appear to have inferior transplant outcomes, developing an outcome prediction model for this patient category has high clinical relevance. The goal of this study was to compare methods of building prediction models of kidney transplant outcome that could be useful for clinical decision support. We applied three well-known data mining methods (classification trees, logistic regression, and artificial neural networks) to data describing recipients with systemic lupus erythematosus (SLE) in the US Renal Data System (USRDS) database. The 95% confidence interval (CI) of the area under the receiver operating characteristic curve (AUC) was used to measure the discrimination ability of the prediction models. Two groups of predictors were selected to build the prediction models. Using input variables selected with Weka (an open-source machine learning package) supplemented with additional variables of known clinical relevance (38 total predictors), logistic regression performed the best overall (AUC: 0.74, 95% CI: 0.72-0.77), significantly better (p < 0.05) than the classification trees (AUC: 0.70, 95% CI: 0.67-0.72) but not significantly better (p = 0.218) than the artificial neural networks (AUC: 0.71, 95% CI: 0.69-0.73). The performance of the artificial neural networks was not significantly better than that of the classification trees (p = 0.693).
Using the more parsimonious subset of variables (six variables), logistic regression (AUC: 0.73, 95% CI: 0.71-0.75) did not perform significantly better than either the classification tree (AUC: 0.70, 95% CI: 0.68-0.73) or the artificial neural network (AUC: 0.73, 95% CI: 0.70-0.75) models. We generated several models predicting 3-year allograft survival in kidney transplant recipients with SLE that could potentially be used in practice. The performance of logistic regression and classification trees was not inferior to that of the more complex artificial neural networks. Prediction models may be used in clinical practice to identify patients at risk. ASAIO Journal (American Society for Artificial Internal Organs: 1992) 01/2011; 57(4):300-9. · 1.39 Impact Factor
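The comparison above, three classifier families scored by AUC with a 95% confidence interval, can be sketched as follows. This illustration uses synthetic data rather than the USRDS SLE cohort, and estimates the CI by bootstrap resampling of the test set, which is one common approach; the paper's exact CI method is not specified here.

```python
# Hedged sketch: comparing logistic regression, a classification tree, and a
# small neural network by test-set AUC with a bootstrap 95% CI.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=10, n_informative=6,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "classification tree": DecisionTreeClassifier(max_depth=4, random_state=1),
    "neural network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                                    random_state=1),
}

rng = np.random.default_rng(1)
results = {}
for name, m in models.items():
    p = m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    auc = roc_auc_score(y_te, p)
    # Bootstrap 95% CI: resample the test set and recompute the AUC.
    boot = []
    for _ in range(200):
        i = rng.integers(0, len(y_te), len(y_te))
        if y_te[i].min() == y_te[i].max():   # need both classes present
            continue
        boot.append(roc_auc_score(y_te[i], p[i]))
    lo, hi = np.percentile(boot, [2.5, 97.5])
    results[name] = (auc, lo, hi)
    print(f"{name}: AUC {auc:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Overlapping intervals, as between the logistic regression and neural network above, correspond to the non-significant differences reported in the abstract.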
- Nonlinear Analysis: Real World Applications 10/2011; · 2.34 Impact Factor