Conference Paper

Predictive modeling of post radiation-therapy recurrence for gynecological cancer patients using clinical and histopathology imaging features

Authors:
Yujing Zou, B.Sc.1†, Magali Lecavalier-Barsoum, MD2, Manuela Pelmus, MD3, Farhad Maleki,
PhD4, Shirin Abbasinejad Enger, PhD 1,3,5,6
1 Medical Physics Unit, Department of Oncology, Faculty of Medicine, McGill University,
Montreal, Quebec, Canada
2 Department of Radiation Oncology, Jewish General Hospital, Montreal, QC, Canada
3 Department of Pathology, Faculty of Medicine, McGill University, Montreal, Quebec, Canada
4 Department of Radiology, The Research Institute of the McGill University Health Centre, McGill
University, Montreal, QC, Canada
5,6 Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec,
Canada
Corresponding author: Yujing Zou†, yujing.zou@mail.mcgill.ca, Tel: 514-961-2768
Purpose:
To build a machine-learning (ML) classifier that predicts the clinical endpoint of
post-radiation-therapy (RT) recurrence in gynecological cancer patients, and to explore how well
pre-treatment histopathology image features (cell spacing and nuclei size) and clinical variables
predict that outcome.
Materials and Methods:
Thirty-six patients with gynecological (i.e., cervical, vaginal, or vulvar) cancer (median age at
diagnosis = 59.5 years; median follow-up time = 25.7 months) were included in this analysis; nine
of them (event rate of 25%) experienced post-RT recurrence. Patient-specific nuclei size and cell
spacing distributions were extracted from cancerous and non-tumoral regions of pre-treatment
hematoxylin and eosin (H&E) stained digital histopathology whole-slide images (WSIs). The mean
and standard deviation of these distributions were computed as imaging features for each WSI.
The clinical features were clinical and radiological stage at the time of radiation, p16 status,
age at diagnosis, and cancer type. Uniquely, a Tree-based Pipeline Optimization Tool (TPOT)
AutoML approach, including hyperparameter tuning, was used to find the best-performing pipeline
for this small, class-imbalanced dataset. In the selected pipeline, a Radial Basis Function (RBF)
kernel sampler (gamma = 0.25) was applied to the combined imaging and clinical input variables.
The resulting features were fed into an XGBoost (i.e., eXtreme gradient-boosting) classifier
(learning rate = 0.1), whose outputs were propagated as “synthetic features” and followed by
polynomial feature transforms. A decision tree classifier was then trained on all raw and
transformed features. Model evaluation metrics were averaged over a 10-fold stratified shuffle
split cross-validation. A permutation test (n = 1000 permutations) was performed to assess the
significance of the classification scores.
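The pipeline described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic, `GradientBoostingClassifier` stands in for XGBoost so that the example needs only scikit-learn, `AppendPredictions` is a hypothetical helper that mimics propagating classifier outputs as synthetic features, and `n_components`, the polynomial degree, and `test_size` are values chosen for illustration that the abstract does not report.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.kernel_approximation import RBFSampler
from sklearn.model_selection import StratifiedShuffleSplit, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.tree import DecisionTreeClassifier

METRICS = ["accuracy", "balanced_accuracy", "precision", "recall", "f1", "roc_auc"]


class AppendPredictions(BaseEstimator, TransformerMixin):
    """Hypothetical helper: keep the inputs and append a classifier's outputs."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def transform(self, X):
        proba = self.estimator_.predict_proba(X)            # class probabilities
        label = self.estimator_.predict(X).reshape(-1, 1)   # predicted class
        return np.hstack([X, proba, label])                 # inputs + synthetic features


def build_pipeline(seed=0):
    # Stand-in for XGBoost (assumption); learning rate 0.1 as in the abstract.
    booster = GradientBoostingClassifier(learning_rate=0.1, random_state=seed)
    return make_pipeline(
        RBFSampler(gamma=0.25, n_components=20, random_state=seed),  # n_components assumed
        AppendPredictions(booster),
        PolynomialFeatures(degree=2, include_bias=False),            # degree assumed
        DecisionTreeClassifier(random_state=seed),
    )


def make_toy_cohort(seed=0):
    # Synthetic stand-in for the cohort: 36 patients, 9 recurrences, 13 arbitrary features.
    rng = np.random.default_rng(seed)
    y = np.array([1] * 9 + [0] * 27)
    X = rng.normal(size=(36, 13))
    X[y == 1] += 0.8  # artificial signal so that the toy task is learnable
    return X, y


def cross_validated_scores(X, y, seed=0):
    # Ten stratified random splits; metrics are averaged over the splits.
    cv = StratifiedShuffleSplit(n_splits=10, test_size=0.25, random_state=seed)
    result = cross_validate(build_pipeline(seed), X, y, cv=cv, scoring=METRICS)
    return {m: float(np.mean(result[f"test_{m}"])) for m in METRICS}


X, y = make_toy_cohort()
scores = cross_validated_scores(X, y)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Because the data here are random numbers with an injected shift, the printed scores say nothing about the reported results; the sketch only shows how the stages chain together and how the stratified shuffle split keeps the 25% event rate in every split.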
Results:
In 10-fold stratified shuffle split cross-validation, our model achieved a mean accuracy of 0.87,
a mean balanced accuracy of 0.92, a precision of 0.78, a recall of 1.00, an F1 score of 0.85,
and an area under the receiver operating characteristic curve (AUC) of 0.92 for predicting our
patient cohort’s binary post-RT recurrence outcome. The permutation test yielded a p-value of
0.036. This indicates that the classifier learned real dependencies between our combined imaging
and clinical features and the outcome, and that the promising model performance is unlikely to
be due to chance.
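A label-permutation test of this kind can be sketched with scikit-learn's `permutation_test_score`, which refits the model on shuffled labels and computes the p-value as (C + 1) / (n + 1), where C is the number of shuffled-label scores at least as good as the real one and n is the number of permutations. The abstract does not say which implementation or score was used, so the function choice, the balanced-accuracy score, the plain decision tree, the synthetic data, and the reduced permutation count below are all assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit, permutation_test_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cohort: 36 patients, 9 recurrences, 13 arbitrary features.
rng = np.random.default_rng(0)
y = np.array([1] * 9 + [0] * 27)
X = rng.normal(size=(36, 13))
X[y == 1] += 0.8  # artificial signal

cv = StratifiedShuffleSplit(n_splits=10, test_size=0.25, random_state=0)
n_permutations = 200  # the abstract reports 1000; reduced here to keep the sketch fast

# Score on the true labels, then on n_permutations shuffled copies of the labels.
score, null_scores, p_value = permutation_test_score(
    DecisionTreeClassifier(random_state=0),  # simple stand-in for the full pipeline
    X,
    y,
    cv=cv,
    scoring="balanced_accuracy",
    n_permutations=n_permutations,
    random_state=0,
)

# Recompute the p-value by hand to make the formula explicit.
n_at_least_as_good = int(np.sum(null_scores >= score))
manual_p = (n_at_least_as_good + 1) / (n_permutations + 1)

print(f"score on true labels: {score:.2f}")
print(f"permutation p-value:  {p_value:.3f} (manual: {manual_p:.3f})")
```

Under this formula, the smallest p-value attainable with 1000 permutations is 1/1001, and a p-value near 0.036 would correspond to roughly 35 of the 1000 shuffled-label runs scoring at least as well as the model trained on the true labels.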
Conclusions:
As a proof of concept, and despite the small dataset and low event rate, we showed that a
decision-tree-based ML classification algorithm incorporating XGBoost can use combined imaging
(cell spacing and nuclei size) and clinical features to predict post-RT outcomes for
gynecological cancer patients.