Nonlinear Kernel-Based Approaches for Predicting Normal Tissue Toxicities
ABSTRACT Since the early demonstration of the curative potential of radiation therapy for tumor sterilization, normal tissue toxicity has continued to be dose limiting. Accurate prediction of a patient's complication risk would allow personalization of treatment planning decisions. Nonlinear kernel methods can provide a robust framework for learning complex interactions between observed toxicities and treatment, anatomical, and patient-related variables. However, proper application of these powerful methods requires a better understanding of the high-dimensional feature space spanned by all these variables. In this work, we investigate methods for visualizing this high-dimensional space and compare different approaches for extracting discriminant features. Our preliminary results demonstrate that principal component analysis is a valuable tool for visualizing high-dimensional data and for determining the proper kernel type. In addition, variable selection based on resampling methods within the logistic regression framework seemed to yield improved prediction performance compared to the recursive feature elimination method.
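As a rough illustration of the visualization step mentioned in the abstract, the sketch below projects a small table of variables onto its first two principal components. It is not the authors' implementation: the variable names (mean lung dose, V20, age) and all values are made up for illustration, the helper functions are hypothetical, and power iteration with deflation is used only so that the example runs with the Python standard library.

```python
import math
import random

def standardize(rows):
    """Center each column and scale it to unit sample variance (z-scores)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(d)]
    return [[(r[j] - means[j]) / sds[j] for j in range(d)] for r in rows]

def covariance(z):
    """Sample covariance of already-centered rows."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)]
            for i in range(d)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def leading_component(cov, iters=500, seed=0):
    """Power iteration: unit eigenvector and eigenvalue of the largest eigenvalue."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in cov]
    for _ in range(iters):
        w = matvec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(a * b for a, b in zip(v, matvec(cov, v)))
    return v, eigval

def deflate(cov, v, eigval):
    """Remove the component already found so the next one can be extracted."""
    d = len(cov)
    return [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]

def pca_2d(rows):
    """Return 2-D scores, the two loading vectors, and their eigenvalues."""
    z = standardize(rows)
    cov = covariance(z)
    v1, l1 = leading_component(cov)
    v2, l2 = leading_component(deflate(cov, v1, l1), seed=1)
    scores = [(sum(a * b for a, b in zip(r, v1)),
               sum(a * b for a, b in zip(r, v2))) for r in z]
    return scores, (v1, v2), (l1, l2)

# Made-up toy data: columns are mean lung dose (Gy), V20 (%), age (years).
toy_rows = [
    [12.0, 18.0, 61.0],
    [15.0, 24.0, 70.0],
    [18.0, 27.0, 55.0],
    [21.0, 35.0, 66.0],
    [24.0, 38.0, 59.0],
    [27.0, 45.0, 72.0],
]
scores, components, eigvals = pca_2d(toy_rows)
explained = sum(eigvals) / len(toy_rows[0])  # total variance is d after z-scoring
for (pc1, pc2) in scores:
    print(f"PC1={pc1:+.3f}  PC2={pc2:+.3f}")
print(f"variance explained by two components: {explained:.3f}")
```

Plotting the two score columns with markers colored by observed toxicity gives a quick view of whether the classes look linearly separable, which is one informal way to judge whether a nonlinear kernel is worth trying.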
- Source available from: Ellen Huang
ABSTRACT: Tumor control probability (TCP) in radiotherapy is determined by complex interactions between tumor biology, tumor microenvironment, radiation dosimetry, and patient-related variables. The complexity of these heterogeneous variable interactions constitutes a challenge for building predictive models for routine clinical practice. We describe a datamining framework that can unravel the higher-order relationships among dosimetric dose-volume prognostic variables, interrogate various radiobiological processes, and generalize to previously unseen data when applied prospectively. Several datamining approaches are discussed, including dose-volume metrics, equivalent uniform dose, the mechanistic Poisson model, and model building methods using statistical regression and machine learning techniques. Institutional datasets of non-small cell lung cancer (NSCLC) patients are used to demonstrate these methods. The performance of the different methods was evaluated using bivariate Spearman rank correlations (rs). Over-fitting was controlled via resampling methods. Using a dataset of 56 patients with primary NSCLC tumors and 23 candidate variables, we estimated GTV volume and V75 to be the best model parameters for predicting TCP using statistical resampling and a logistic model. Using these variables, the support vector machine (SVM) kernel method provided superior performance for TCP prediction, with rs=0.68 on leave-one-out testing, compared to logistic regression (rs=0.4), the Poisson-based TCP model (rs=0.33), and the cell kill equivalent uniform dose model (rs=0.17). The prediction of treatment response can be improved by utilizing datamining approaches, which are able to unravel important nonlinear complex interactions among model variables and have the capacity to predict on unseen data for prospective clinical applications. Acta Oncologica (Stockholm, Sweden) 03/2010; 49(8):1363-73. · 2.27 Impact Factor
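The evaluation described in this abstract (Spearman rank correlation between predictions and outcomes under leave-one-out testing) can be sketched as follows. This is a minimal illustration under stated assumptions, not the published pipeline: a one-variable logistic regression fitted by gradient ascent stands in for the models compared in the study, the predictor values and outcomes are invented, and all function names are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """One-variable logistic regression fitted by batch gradient ascent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def leave_one_out_predictions(xs, ys):
    """Predict each case from a model trained on all the other cases."""
    preds = []
    for i in range(len(xs)):
        b0, b1 = fit_logistic(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(sigmoid(b0 + b1 * xs[i]))
    return preds

# Invented toy data: a standardized predictor and a binary outcome.
toy_x = [-1.5, -1.0, -0.7, -0.2, 0.1, 0.4, 0.9, 1.3]
toy_y = [0, 0, 0, 1, 0, 1, 1, 1]

loo_preds = leave_one_out_predictions(toy_x, toy_y)
rs = spearman(loo_preds, toy_y)
print(f"leave-one-out Spearman rs = {rs:.2f}")
```

Because each prediction comes from a model that never saw the corresponding case, the resulting rs is a less optimistic estimate than one computed on the training data itself.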
- ABSTRACT: Purpose: Radiation pneumonitis (RP) is a potentially fatal side effect arising in lung cancer patients who receive radiotherapy as part of their treatment. For the modeling of RP outcomes data, several predictive models based on traditional statistical methods and machine learning techniques have been reported. However, no guidance on how their performance varies has been provided to date. Materials and methods: In this study, we explore several machine learning algorithms for classification of RP data. The performance of these classification algorithms is investigated in conjunction with several feature selection strategies, and the impact of the feature selection strategy on performance is further evaluated. The extracted features include patients' demographic, clinical, and pathological variables, treatment techniques, and dose-volume metrics. In conjunction, we have been developing an in-house Matlab-based open-source software tool, called dose-response explorer system (DREES), customized for modeling and exploring dose response in radiation oncology. This software has been upgraded with a popular classification algorithm, the support vector machine (SVM), which seems to provide improved performance in our exploratory analysis and has strong potential to strengthen the ability of radiotherapy modelers to analyze radiotherapy outcomes data. These tools are demonstrated on an institutional non-small cell lung carcinoma (NSCLC) dataset of patients who received radiotherapy. Results: Our methods were applied to an NSCLC dataset consisting of 209 patients, each described by 160 variables. Relevant features were searched for using several feature selection methods. Subsequently, various classification algorithms were tested with the selected features. Through these experiments, we showed the usefulness of machine learning methods in the analysis of radiation oncology datasets.
Conclusions: We have presented an open-source software tool and several machine learning algorithms for analyzing radiotherapy outcomes. We demonstrated the tool on a lung cancer patient dataset. We believe that the improved tool will provide radiation oncology modelers with new means to analyze radiation response data.