Impact of machine learning assistance on the quality of life prediction for breast cancer patients

Mikko Nuutinen1,a, Sonja Korhonen1, Anna-Maria Hiltunen1, Ira Haavisto1, Paula Poikonen-Saksela2, Johanna Mattson2 and Riikka-Leena Leskelä1
1 Nordic Healthcare Group, Helsinki, Finland
2 Helsinki University Hospital Comprehensive Cancer Center and Helsinki University, Finland
a https://orcid.org/0000-0002-7429-3710
Keywords: Clinical decision support system, breast cancer, resilience, machine learning
Abstract: Proper and well-timed interventions may improve breast cancer patients' adaptation, resilience and quality of life (QoL) during the treatment process and the time after the disease. The challenge is to identify those patients who would benefit most from a particular intervention. The aim of this study was to measure whether a machine learning prediction incorporated in a clinical decision support system (CDSS) improves clinicians' ability to predict patients' QoL during the treatment process. We conducted an experimental setup in which six clinicians used the CDSS and predicted QoL for 60 breast cancer patients. Each patient was evaluated both with and without the aid of the machine learning prediction. The clinicians were also interviewed with open-ended questions to investigate the usage and perceived benefits of the CDSS with the machine learning prediction aid. The clinicians' performance in evaluating the patients' QoL was higher with the aid of the machine learning predictions than without it. The AUROC of the clinicians was .777 (95% CI .691-.857) with the aid and .755 (95% CI .664-.840) without the aid. When the machine learning model's prediction was correct, the average accuracy (ACC) of the clinicians was .788 (95% CI .739-.838) with the aid and .717 (95% CI .636-.798) without the aid.
1 INTRODUCTION
Breast cancer is a major socio-economic challenge due to its high prevalence. In 2018, more than 2 million new breast cancer patients were diagnosed worldwide (Bray et al., 2018), and 28% of all cancers in Europe were breast cancers. The concept of resilience refers to a person's ability to adapt and bounce back from a challenging event (Deshields et al., 2016; Rutter, 2006). How a breast cancer patient adapts to the treatment process and the time after the disease greatly affects the patient's quality of life (QoL). Proper and well-timed interventions may be important in improving patient adaptation and resilience. The challenge is to identify, in advance and in a timely manner, those patients who would benefit most from a particular intervention.
Advanced machine learning algorithms integrated into a clinical decision support system (CDSS; Sutton et al., 2020) can help a clinician to identify target patients and to determine appropriate interventions. As far as we know, no previous studies have investigated the aid of a machine learning prediction integrated into a CDSS for identifying patients who may need attention and intervention to support resilience during the breast cancer treatment process and survival.
In this study, we investigated the use of a machine learning prediction integrated into a CDSS to identify breast cancer patients who may need help. We conducted a user experiment in which the clinicians' task was to predict patients' quality of life six months after the diagnosis of breast cancer. The independent variable of the user experiment was the aid of the machine learning algorithm. The aim was to measure whether the machine learning prediction improves clinicians' ability to predict the QoL of patients. In addition, we conducted an open-ended interview with each clinician to determine how this kind of decision support tool could be used, who would benefit from it and how.
2 METHODS
2.1 Dataset
Patient data used in this study were collected from four clinical sites: (1) Helsinki University Hospital Comprehensive Cancer Center (HUS), (2) Hebrew University in Jerusalem, Israel, (3) Champalimaud Breast Unit (CHAMP) and (4) European Institute of Oncology (IEO). The study was approved by the European Institute of Oncology, Applied Research Division for Cognitive and Psychological Science (Approval No R868/18 – IEO 916) and the clinical ethical committees of each hospital.
The retrospective data set contains sociodemographic and lifestyle, medical and treatment, and psychosocial assessment values for 608 breast cancer patients (HUS: 185 patients, Hebrew University: 138 patients, CHAMP: 108 patients and IEO: 177 patients). For the user experiment, we selected 60 HUS patients (test set). The remaining 548 patients (train set) were used for training the machine learning algorithm. The target variable for both the machine learning algorithm and the user experiment was the patients' self-assessed quality of life (QoL) value evaluated six months (Month 6) after the baseline (Month 0). Each patient's baseline was 3-4 weeks after breast cancer was diagnosed. The QoL value was measured using the EORTC QLQ Global QoL scale (Aaronson et al., 1993).
Table 1 presents a descriptive analysis of the sociodemographic and lifestyle, medical and treatment, and psychosocial assessment values for the patients of the test set. These variables were presented to the clinicians in the user interface of the user experiment (Fig 1). In Table 1, the patients are divided into low and high QoL groups. The grouping threshold is a QoL value of 75: patients whose self-assessed QoL value was higher than 75 at 6 months after the baseline were placed in the high QoL group. The same grouping was used for training the machine learning classifier (Section 2.2). Table 1 shows that the high QoL patients were significantly older and had a lower BMI, better baseline values for overall health and quality of life, and a lower distress level than the low QoL group.
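As an illustration, a group comparison of the kind reported in Table 1 can be reproduced with standard statistical tests. The sketch below is a minimal example on toy data with hypothetical column names (qol_month6, age, chemotherapy); it is not the authors' analysis code.

    # Minimal sketch of a Table 1 style group comparison; data and column names are made up.
    import pandas as pd
    from scipy.stats import fisher_exact, mannwhitneyu

    # Toy stand-in for the test-set data; real data would have one row per patient.
    df = pd.DataFrame({
        "qol_month6":   [90, 60, 80, 50, 95, 70, 40, 85],
        "age":          [63, 52, 61, 55, 66, 54, 50, 64],
        "chemotherapy": [0, 1, 0, 1, 0, 1, 1, 0],
    })
    df["high_qol"] = df["qol_month6"] > 75   # grouping threshold used in the paper

    low, high = df[~df["high_qol"]], df[df["high_qol"]]

    # Continuous variable (e.g. age): Mann-Whitney U test between the groups
    u_stat, p_age = mannwhitneyu(low["age"], high["age"])

    # Binary variable (e.g. chemotherapy): 2x2 table and Fisher's exact test
    table = pd.crosstab(df["high_qol"], df["chemotherapy"])
    odds_ratio, p_chemo = fisher_exact(table)

    print(f"Age: p = {p_age:.3f}, chemotherapy: p = {p_chemo:.3f}")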
2.2 Machine learning model
The training data set (n=548) was used to train the machine learning model (a random forest classifier). The task of the machine learning model was to classify a patient as belonging either to the low QoL or the high QoL group 6 months after the baseline. The performance of the trained machine learning model in classifying high and low QoL patients was evaluated on the test data set (n=60) by calculating standard performance metrics such as the area under the receiver operating characteristic curve (AUROC), recall and precision.
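The following sketch shows how such a model and its test-set metrics can be produced with scikit-learn. It uses placeholder data with the paper's sample sizes and a hypothetical feature count; the actual features, preprocessing and hyperparameters of the study are not specified here.

    # Minimal sketch of a random forest QoL classifier; the data are placeholders.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score, recall_score, precision_score

    rng = np.random.default_rng(0)
    X_train, y_train = rng.normal(size=(548, 20)), rng.integers(0, 2, 548)  # 548 training patients
    X_test,  y_test  = rng.normal(size=(60, 20)),  rng.integers(0, 2, 60)   # 60 test patients

    model = RandomForestClassifier(n_estimators=500, random_state=0)
    model.fit(X_train, y_train)

    prob_high = model.predict_proba(X_test)[:, 1]   # probability of the high QoL class
    y_pred = (prob_high >= 0.60).astype(int)        # classification threshold of .60

    print("AUROC:",     roc_auc_score(y_test, prob_high))
    print("Recall:",    recall_score(y_test, y_pred))
    print("Precision:", precision_score(y_test, y_pred))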
2.3 User experiment
The standard performance measurements of machine learning algorithms are not sufficient to show that a CDSS is also effective in a real clinical environment. The human decision-making process is complex and biased, and it cannot be assumed that clinicians will always closely follow the recommendations of a machine learning model (Vasey et al., 2021; Ginestra et al., 2019). In this study we conducted a user experiment to measure decision-making performance by simulating the use of the CDSS with or without the aid of the machine learning prediction. The independent variable was the aid of the machine learning prediction. The dependent variable was the QoL value predicted for the patients. The QoL values were given on a continuous scale from 0 (low QoL) to 100 (high QoL).
2.3.1 User interface
Fig 1 presents the user interface of the user experiment in which the machine learning prediction was shown to the participants. In the condition without the aid of the prediction, only the patient background information and patient questionnaire data were shown. The participants entered their QoL predictions with the slider at the bottom of the user interface (inside the red square).
The patient background information presented on the user interface was intended to mirror the information clinicians use at a normal patient admission. It is important to note that the results of the experiment are comparable to normal patient examinations only if all of a patient's background information relevant to the task is presented on the user interface. The background information presented in this study was based on consultation with two medical oncologists with long experience of breast cancer treatment (HUS: Paula Poikonen-Saksela and Leena Vehmanen, 5 October 2020). The selected background variables were related to the patient's age, BMI, education, working life, physical activity, family relationships, chemotherapy treatment and mental health background. Previous research (Bonanno et al., 2007; Molina et al., 2014) also supports that variables such as age, socioeconomic and marital status and social support are important factors of resilience.
Furthermore, the user interface presented the patients' answers to three psychosocial questions. The questions were related to the patient's health, quality of life and distress level at the baseline, and were selected according to the variable importance values of the trained machine learning model (Table 3).
Table 1: Sociodemographic and lifestyle, medical and treatment, and psychosocial assessment values for the test set patient cohort. The patients are divided into low and high QoL (quality of life) groups according to the Global QoL value from the European Organization for Research and Treatment of Cancer QoL Core Questionnaire 30. The grouping threshold is a QoL value of 75: patients whose self-assessed QoL value was higher than 75 at 6 months after the baseline were grouped in the high QoL group. P values by Fisher's exact or Mann-Whitney U test. Avg. = Average, SD = Standard deviation, BMI = Body mass index.
Variable group | Variable | Low QoL (0-75) | High QoL (75-100) | P value
 | Number of patients | 38 | 22 |
Sociodemographic and lifestyle | Age, Avg. (±SD) | 54.6 (7.61) | 62.2 (6.38) | < .001
 | BMI, Avg. (±SD) | 27.3 (4.62) | 24.2 (3.76) | 0.006
 | Higher education^1, n (%) | 38 (100.0) | 21 (95.5) | 0.367
 | Part time or unemployment, n (%) | 1 (2.6) | 2 (9.1) | 0.548
 | Low income^2, n (%) | 6 (15.8) | 1 (4.5) | 0.246
 | No exercise, n (%) | 3 (7.9) | 1 (4.5) | 1
 | Living alone, n (%) | 15 (39.5) | 10 (45.5) | 0.787
 | Number of children, Avg. (±SD) | 1.4 (1.15) | 1.8 (1.1) | 0.064
Medical and treatment | Chemotherapy treatment, n (%) | 26 (68.4) | 12 (54.5) | 0.405
 | Preexisting mental illness, n (%) | 15 (39.5) | 5 (22.7) | 0.258
 | Chronic depression, n (%) | 5 (13.2) | 0 (0.0) | 0.148
Psychosocial assessment | Baseline self-assessment: Overall quality of life^3, Avg. (±SD) | 5.3 (1.12) | 6.2 (1.18) | < .001
 | Baseline self-assessment: Overall health^4, Avg. (±SD) | 5.1 (1.04) | 6.3 (1.08) | < .001
 | Baseline self-assessment: Distress level^5, Avg. (±SD) | 4.7 (2.68) | 1.6 (1.65) | < .001
 | Month 6 self-assessment: Global QLQ^6, Avg. (±SD) | 55.7 (14.77) | 92.0 (7.03) | < .001
^1 Bachelor, high school, postgraduate school or vocational non-academic diploma
^2 Net monthly income 0-1500 €
^3 QLQ30-29 (Aaronson et al., 1993): How would you rate your overall quality of life during the past week? 1 (very poor) - 7 (excellent)
^4 QLQ30-30 (Aaronson et al., 1993): How would you rate your overall health during the past week? 1 (very poor) - 7 (excellent)
^5 NCCN distress thermometer (Goebel and Mehdorn, 2011): Please circle the number (0-10) that best describes how much distress you have been experiencing in the past week, including today: 1 (No distress) - 10 (Extreme distress)
^6 QLQ30 functional scale Global (Aaronson et al., 1993), range 0-100, higher is better
Figure 1: User interface of the user experiment. After participants have analyzed the patient background information and selected patient questionnaires, they rate a quality of life value for the patient with or without the aid of the machine learning prediction. The quality of life values were given on a continuous scale from 0 (low QoL) to 100 (high QoL) (inside the red square). In this user interface example, the aid of the machine learning prediction has been presented.
2.3.2 Participants and samples
To compare the performance of clinicians with and without the aid of the prediction, six clinicians diagnosed three sets of 20 patients twice, in two separate sessions, according to the crossover design detailed in Fig 2. The participants were oncologists with a median of 7.5 (range 4-18) years of experience in treating breast cancer patients. During each session, clinicians interpreted half of the patients with the machine learning prediction value and half without. After a washout period the clinicians diagnosed the same set of 20 patients with the aid status reversed: the patients that were reviewed with the aid of the predictions in the first session were reviewed without the aid during the second session, and vice versa. That is, the 60 patients (test set) were randomly grouped into three groups of 20 patients each, and each clinician evaluated all the patients in one group both with and without the aid. Thus, each patient group was evaluated by two different clinicians.
To establish familiarity with the CDSS and the machine learning predictions, each session began with an introduction and 4 training patients (2 with and 2 without the aid) that were not part of the test patients. The study administrator also answered any questions about the functionality and the variables of the user experiment.
The washout period between the two sessions of the crossover design was 2-4 weeks. According to the recommendation of Pantanowitz et al. (2013), the washout period should be at least 2 weeks. On the other hand, with a long washout period the participants' diagnostic criteria could have changed over time; for example, participants could have gained more experience or changed their attitude toward the diagnostic criteria (Nielsen et al., 2010).
An overly long experiment causes fatigue, which lowers the quality of the input values. With a pilot study we confirmed that the length of a single session with 20 patients was no more than 30 minutes; according to the standard (ITU-R, 2012), the duration of an experiment should be less than 60 minutes.
2.3.3 Open-ended interview
After the second session of the user experiment, an open-ended interview was conducted with the participants. The interview data were analyzed following thematic analysis and the approach described by Clarke and Braun (2014). The interview included the following questions:
• Could you make use of this kind of decision support tool when taking care of a patient, and how?
• How would you envision it to be used in your organisation / department?
• Who (what role/s) in your organisation would use such a tool?
• Who (what role/s) in your organisation would make use of the information?
• How might the predicted score affect the patient care processes from your perspective / in your organisation?
• Do you think the patients could benefit from this kind of prediction? Under which conditions?
• What aspects should be taken into consideration when further developing the decision support tool?
Figure 2: Experimental design. Each of the 6 clinicians was randomly assigned to either test order 1 or 2. Each test began with a brief practice block of 4 patient cases (2 with the aid and 2 without the aid), followed by 4 experiment blocks of 5 patients, with order 1 beginning with the aid of the machine learning predictions and order 2 beginning without the aid.
2.3.4 Statistical analyses
The performance of the clinicians with and without the aid was evaluated by calculating the performance metrics of the area under the receiver operating characteristic curve (AUROC), recall, precision and balanced accuracy (ACC). Furthermore, we measured the participants' review time when decisions were made with or without the aid of the predictions. We used bootstrapping (Seabold and Perktold, 2010) to compute 95% confidence intervals (CI) and p-values for the performance metrics.
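As a minimal sketch of the bootstrap procedure for one metric, the function below resamples patients with replacement and reports a percentile 95% CI for AUROC. The variable names and the number of resamples are assumptions; the paper does not specify the exact bootstrap settings.

    # Minimal bootstrap 95% CI for AUROC; resampling settings are assumptions.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    def bootstrap_auroc_ci(y_true, scores, n_boot=2000, seed=0):
        rng = np.random.default_rng(seed)
        y_true, scores = np.asarray(y_true), np.asarray(scores)
        aucs = []
        for _ in range(n_boot):
            idx = rng.integers(0, len(y_true), len(y_true))  # resample patients with replacement
            if len(np.unique(y_true[idx])) < 2:              # AUROC needs both classes present
                continue
            aucs.append(roc_auc_score(y_true[idx], scores[idx]))
        lo, hi = np.percentile(aucs, [2.5, 97.5])
        return roc_auc_score(y_true, scores), (lo, hi)

    # Example with toy data: binary QoL class vs. 0-100 clinician ratings
    y = np.array([0, 1, 0, 1, 1, 0, 1, 0])
    ratings = np.array([30, 80, 45, 70, 90, 55, 60, 20])
    print(bootstrap_auroc_ci(y, ratings))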
3 RESULTS
3.1 Machine learning model
The AUROC value of the trained machine learning model (random forest) for the test data set was .832 (95% CI .757-.900). The recall and precision values were .727 (95% CI .583-.857) and .727 (95% CI .589-.854), respectively, when the classification threshold of the model was .60. Table 2 presents the confusion matrix of the trained machine learning model for the test data set when the threshold was .60 or .70. With the threshold of .60, the model classified 6/38 low QoL patients into the high QoL group (false positives). With the threshold of .70, 1/38 low QoL patients was classified into the high QoL group.
Table 3 lists the 10 most important variables of the trained machine learning model according to the random forest feature importance values. The variables of Global QLQ, mental health (HADS) and distress level at the baseline (Month 0) were important psychosocial factors. Age, BMI and monthly income were important sociodemographic and lifestyle factors.
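The importance values of Table 3 correspond to scikit-learn's impurity-based feature importances. The sketch below only illustrates the mechanics on placeholder data; the feature names are taken from Table 3, and the fitted values will not match the paper's.

    # Minimal sketch of extracting Table 3 style importance values; the data are placeholders.
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    feature_names = ["global_qlq", "hads", "age", "distress", "qlq30_30",
                     "qlq30_29", "panas_upset", "net_income", "cbi", "bmi"]
    rng = np.random.default_rng(0)
    X = rng.normal(size=(548, len(feature_names)))   # placeholder training features
    y = rng.integers(0, 2, 548)                      # placeholder low/high QoL labels

    model = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

    importances = pd.Series(model.feature_importances_, index=feature_names)
    print(importances.sort_values(ascending=False).head(10).round(3))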
Table 2: Confusion matrix for the trained machine learning model when the classification threshold (th) was .60 or .70. QoL = Quality of life.
th = .60 | Predicted Low QoL | Predicted High QoL
Low QoL | 32 | 6
High QoL | 6 | 16
th = .70 | Predicted Low QoL | Predicted High QoL
Low QoL | 37 | 1
High QoL | 15 | 7
Table 3: Variable importance values of the trained model.
Variable | Importance value
Global QLQ^1 | .106
Mental health, HADS^2 | .079
Age | .072
Distress level^3 | .071
Overall QoL, QLQ30-30^1 | .057
Overall health, QLQ30-29^1 | .048
Upset, PANAS 5^4 | .044
Net income | .043
Coping with cancer, CBI^5 | .041
BMI | .040
The psychosocial variables were from the following questionnaires:
^1 EORTC quality of life questionnaire (QLQ-C30)
^2 Hospital Anxiety and Depression Scale (HADS)
^3 NCCN distress thermometer
^4 Positive and Negative Affectivity - short form (PANAS)
^5 Cancer Behavior Inventory (self-efficacy in coping with cancer) (CBI-B)
3.2 User experiment
Table 4 presents the performance values for the machine learning model and over all clinicians with and without the aid of the predictions. The overall receiver operating characteristic (ROC) curves are shown in Fig 3. The AUROC of the clinicians was .755 (95% CI .664-.840) without the aid and .777 (95% CI .691-.857) with the aid. The AUROC of the machine learning model was .832 (95% CI .757-.900), which is not statistically significantly higher than the AUROC of the clinicians with or without the aid (p=.53 and p=.135, respectively).
Figure 3: The overall receiver operating characteristic (ROC) curves for the machine learning (ML) model and the clinicians with/without the aid of the machine learning prediction. AUROC = Area under the receiver operating characteristic curve.
The AUROC values of the individual clinicians with and without the aid for the evaluated patient groups (n=20) are presented in Fig 4. The AUROC values of the machine learning model are shown with the dashed lines. Two clinicians (#1 and #5) with the aid had a higher AUROC than the machine learning model had for the same 20-patient group. Four clinicians (#1, #3, #4, #5) had a higher AUROC with the aid than without the aid. Two clinicians (#2, #6) had a higher AUROC without the aid than with the aid.
Figure 4: Area under the receiver operating characteristic curve (AUROC) values for the 6 clinicians with and without the aid of the machine learning prediction. The performance of the machine learning algorithm is shown with the dashed lines.
As can be seen from the results, on average, all performance values (AUROC, recall, precision, ACC) were better with the aid than without the aid. It is also clear that the performance of the machine learning model was higher than that of the clinicians except for the recall measure. However, recall and precision can be traded off by thresholding the classifier: with a lower probability threshold, recall increases and precision decreases, and vice versa.
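The sketch below illustrates this trade-off by sweeping the classification threshold over placeholder predictions; the labels and scores are synthetic and only demonstrate the mechanics.

    # Minimal sketch of the recall/precision trade-off across thresholds; data are placeholders.
    import numpy as np
    from sklearn.metrics import precision_score, recall_score

    rng = np.random.default_rng(1)
    y_test = rng.integers(0, 2, 60)                                  # placeholder labels, 60 test patients
    prob_high = np.clip(0.4 * y_test + 0.6 * rng.random(60), 0, 1)   # placeholder predicted probabilities

    for th in (0.4, 0.5, 0.6, 0.7):
        y_pred = (prob_high >= th).astype(int)
        print(f"th={th:.2f}  recall={recall_score(y_test, y_pred):.3f}  "
              f"precision={precision_score(y_test, y_pred, zero_division=0):.3f}")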
Table 5 presents the review times of the clinicians with and without the aid. The average review time was 34.01 s (95% CI 31.49 s - 36.53 s) without the aid and 38.63 s (95% CI 36.58 s - 40.67 s) with the aid. The difference is statistically significant (p< .001). Only one clinician (#6) was faster in giving the prediction with the aid than without the aid.
Table 6 presents the accuracy values (ACC) of the clinicians when the machine learning model predicted the QoL classes correctly or incorrectly. When the machine learning model predicted the QoL classes correctly, the average accuracy of the clinicians was .720 (95% CI .644-.797) without the aid and .793 (95% CI .754-.832) with the aid (p=.040). When the machine learning model predicted the QoL classes incorrectly, the average accuracy of the clinicians was .532 (95% CI .264-.799) without the aid and .476 (95% CI .208-.745) with the aid (p=.363). That is, when the prediction of the machine learning model was correct, the clinicians' predictions with the aid were more accurate on average.
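A Table 6 style breakdown can be computed by stratifying the clinicians' ratings by aid condition and by whether the model's class was correct. The sketch below uses a small made-up long-format table with hypothetical column names; it is not the study data.

    # Minimal sketch of accuracy stratified by aid condition and model correctness; data are made up.
    import pandas as pd

    ratings = pd.DataFrame({
        "clinician_pred_high": [1, 0, 1, 1, 0, 1],   # clinician rating > 75 -> high QoL
        "true_high":           [1, 0, 0, 1, 1, 1],   # observed QoL class at Month 6
        "model_pred_high":     [1, 0, 1, 1, 0, 0],   # ML model class at threshold .60
        "with_aid":            [1, 1, 0, 0, 1, 0],   # 1 = prediction shown to the clinician
    })

    ratings["model_correct"] = ratings["model_pred_high"] == ratings["true_high"]
    ratings["clinician_correct"] = ratings["clinician_pred_high"] == ratings["true_high"]

    # Average clinician accuracy split by aid condition and by model correctness
    print(ratings.groupby(["with_aid", "model_correct"])["clinician_correct"].mean())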
3.3 Open-ended interview
All participants found the CDSS to be useful if incorporated into the care of breast cancer patients. According to the participants, the information provided by the CDSS would not likely affect the actual breast cancer treatment of patients or the choice of therapies, but rather influence the psychosocial support and other possible interventions offered to patients. However, there was a consensus that for the resilience prediction to be valuable it must lead to an actual intervention for the patient. The usefulness of the tool is therefore affected by the availability of interventions to improve resilience. Furthermore, one participant thought that the prediction would be most useful and informative in cases where the predicted resilience is lower than the clinician's intuitive prediction. Several participants viewed that the CDSS would be most useful if it could identify the patients with weak resilience 12 months after the end of treatment, at which point in time a portion of patients are generally less vigorous than the majority. The resilience prediction could then be used to target specific, individually planned interventions and a higher level of support to this group of patients. The optimal timing for the use of the CDSS was thought to differ between patients, varying from the time of planning adjuvant treatment to the post-treatment period.
Most participants thought that both doctors and nurses may be possible users of the CDSS and could make use of the resilience prediction information. However, the suitable user depends on which interventions would follow from the prediction, as offering certain interventions may require a referral from a doctor. The likelihood of and motivation for clinicians to use the CDSS was generally believed to be significantly affected by the ease of use and convenience of the tool. With regard to breast cancer patients, there were conflicting views on whether the information provided by the tool would be useful to share with patients. While some participants viewed that learning their resilience prediction may motivate and encourage patients through their treatment and rehabilitation process, some participants worried that a poor predicted resilience may cause discouragement and increase stress. Therefore, if the resilience prediction is shared with patients, attention must be paid to the manner in which the information is communicated.
Regarding further development of the CDSS, one participant highlighted the importance of incorporating more parameters concerning breast cancer treatment and possible comorbidities into the CDSS, while another hoped for more detailed information on the mental health and possible medications of the patient. Furthermore, one participant also hoped that the patient perspective, in terms of their feelings towards learning their predicted resilience, would be explored further.
4 DISCUSSION
The aim of this study was to measure whether a machine learning prediction integrated into the CDSS affects clinicians' ability to predict the quality of life of breast cancer patients during the treatment process. Based on the results, the aid of the machine learning prediction improved the ability of clinicians to predict patients' quality of life. The clinicians' performance improved at a statistically significant level for the patients for whom the machine learning model was able to predict the correct outcome. The same result has also been observed in a previous study (Kiani et al., 2020).
Traditional performance measurements, such as AUROC, accuracy and sensitivity, capture numerical accuracy only. A deeper understanding of the advantages and disadvantages of a CDSS requires different measures and methods. Previous studies (Lee et al., 2020; Jang et al., 2020) have measured, for example, the clinician's confidence in his or her own assessment when the prediction of a machine learning model was visible. In this study, in addition to traditional performance measures, we conducted an open-ended interview with the participants.
Table 4: The performance measurements of area under the receiver operating characteristic curve (AUROC), recall, precision and balanced accuracy (ACC) for the machine learning (ML) model and over all participants with and without the aid of the machine learning prediction. Recall, precision and ACC were calculated for the participants by using the threshold QoL value of 75 and for the machine learning model by using the threshold probability value of .60. Avg. = Average, CI = Confidence interval.
Set | AUROC | Recall | Precision | ACC
ML, Avg. (95% CI) | .832 (.757-.900) | .727 (.587-.854) | .727 (.589-.854) | .785 (.704-.863)
With aid, Avg. (95% CI) | .777 (.686-.858) | .818 (.696-.930) | .590 (.465-.707) | .745 (.663-.820)
Without aid, Avg. (95% CI) | .755 (.656-.840) | .773 (.636-.887) | .540 (.412-.662) | .696 (.607-.778)
Table 5: Review time of the participants with and without the aid of predictions. s = seconds, Avg. = Average.
Participant | Review time with aid (s) | Review time without aid (s)
1 | 38.99 | 29.05
2 | 40.78 | 32.96
3 | 35.02 | 29.32
4 | 38.60 | 35.55
5 | 32.99 | 27.81
6 | 45.38 | 49.40
Avg. | 38.63 | 34.02
The interview gathered information on the development and use of the decision support tool. Based on the results, this kind of decision support tool was found to be useful. However, this requires that the use of the tool leads to real interventions, which in turn requires that interventions are available and possible to apply. This finding limits the usefulness of the tool to hospitals which have interventions supporting resilience in place or the possibility to add such interventions to the breast cancer care process.
The research setup of this experimental study simulated the use of the decision support tool. This is a research setup that should be conducted after the performance validation of the machine learning algorithm but before a field study, whose goal is to validate the tool in a real operating environment. In other words, the results of this study determined whether the tool needs to be further developed and what improvements are needed before a field study. Based on the results, several improvements are needed before the field study phase. First, based on the open-ended interview, the patient's medication, treatment and other conditions should be presented at a more detailed level. More specific information may improve clinicians' confidence in both their own assessments and the predictions provided by the CDSS. Second, clinicians' performance improved only slightly when the prediction of the machine learning method was available; if the prediction of the machine learning model was correct, the performance of the clinicians improved at a statistically significant level. Based on this, the performance of the machine learning model should be improved for the field study phase: the number of false predictions should be minimized so that the usefulness of the tool in actual use can be higher. Third, the machine learning model outputs only a single prediction value for the time point six months after the baseline. The CDSS could be more useful if more endpoints (e.g., 6, 9 and 12 months) were predicted and/or timeline-type QoL trajectories were possible. Furthermore, from the point of view of the interviewed clinicians, the resilience prediction would be most useful for the time point 12 months after the end of treatment, as more variance in patients' resilience is often observed at this point in time.
As a follow-up study, the effects of the clinicians' experience and of the test environment on performance should also be investigated. Previous research (Cai et al., 2019) has shown that the aid of machine learning benefits inexperienced clinicians more. All participants in this study were experienced clinicians; with inexperienced clinicians, the benefit from the aid of machine learning predictions could be higher. The test environment of this study did not correspond to a real clinical environment: there were no unrelated distractions or other examinations requiring the clinicians' attention. In a noisy, real clinical environment the benefit of machine learning predictions may be higher, which should be studied.
5 CONCLUSIONS
Based on the study, the machine learning model integrated into the CDSS improved clinicians' performance in predicting patients' quality of life six months after the baseline. Performance improved especially in the cases where the machine learning model was able to correctly predict the patient's QoL value. It should be noted, however, that based on the open-ended interview, this kind of tool is considered useful only when interventions can be implemented for the patients identified as having low predicted resilience.
Table 6: Accuracy of individual participants when the machine learning algorithm predicted the QoL (quality of life) class for the patients correctly or incorrectly. Avg. = Average, CI = Confidence interval.
Participant | With aid: correct prediction | With aid: incorrect prediction | Without aid: correct prediction | Without aid: incorrect prediction
1 | .765 | .667 | .647 | .667
2 | .846 | .429 | .846 | .429
3 | .722 | .000 | .667 | .000
4 | .833 | 1.000 | .722 | 1.000
5 | .824 | .333 | .824 | .667
6 | .769 | .429 | .615 | .429
Avg. (95% CI) | .793 (.754-.832) | .476 (.208-.745) | .720 (.644-.797) | .532 (.264-.799)
REFERENCES
Aaronson, N. K., Ahmedzai, S., Bergman, B., Bullinger,
M., Cull, A., Duez, N. J., Filiberti, A., Flechtner, H.,
Fleishman, S. B., and de Haes, J. C. (1993). The Euro-
pean Organization for Research and Treatment of Cancer
QLQ-C30: a quality-of-life instrument for use in inter-
national clinical trials in oncology. J Natl Cancer Inst,
85(5):365–376.
Bonanno, G. A., Galea, S., Bucciarelli, A., and Vlahov,
D. (2007). What predicts psychological resilience after
disaster? The role of demographics, resources, and life
stress. J Consult Clin Psychol, 75(5):671–682.
Bray, F., Ferlay, J., Soerjomataram, I., Siegel, R. L., Torre,
L. A., and Jemal, A. (2018). Global cancer statistics
2018: GLOBOCAN estimates of incidence and mortality
worldwide for 36 cancers in 185 countries. CA Cancer J
Clin, 68(6):394–424.
Cai, S. L., Li, B., Tan, W. M., Niu, X. J., Yu, H. H., Yao,
L. Q., Zhou, P. H., Yan, B., and Zhong, Y. S. (2019). Us-
ing a deep learning system in endoscopy for screening of
early esophageal squamous cell carcinoma (with video).
Gastrointest Endosc, 90(5):745–753.
Clarke, V. and Braun, V. (2014). Thematic Analysis, vol-
ume 3, pages 1947–1952.
Deshields, T. L., Heiland, M. F., Kracen, A. C., and Dua, P.
(2016). Resilience in adults with cancer: development of
a conceptual model. Psychooncology, 25(1):11–18.
Ginestra, J. C., Giannini, H. M., Schweickert, W. D.,
Meadows, L., Lynch, M. J., Pavan, K., Chivers, C. J.,
Draugelis, M., Donnelly, P. J., Fuchs, B. D., and Um-
scheid, C. A. (2019). Clinician Perception of a Machine
Learning-Based Early Warning System Designed to Pre-
dict Severe Sepsis and Septic Shock. Crit Care Med,
47(11):1477–1484.
Goebel, S. and Mehdorn, H. M. (2011). Measurement of
psychological distress in patients with intracranial tu-
mours: the NCCN distress thermometer. J Neurooncol,
104(1):357–364.
ITU-R (2012). Itu-r rec. bt.500-13, methodology for the
subjective assessment of the quality of television pic-
tures. Report A 70000, ITU Radiocommunication Sec-
tor.
Jang, S., Song, H., Shin, Y. J., Kim, J., Kim, J., Lee, K. W.,
Lee, S. S., Lee, W., Lee, S., and Lee, K. H. (2020). Deep
Learning-based Automatic Detection Algorithm for Re-
ducing Overlooked Lung Cancers on Chest Radiographs.
Radiology, 296(3):652–661.
Kiani, A., Uyumazturk, B., Rajpurkar, P., Wang, A., Gao,
R., Jones, E., Yu, Y., Langlotz, C. P., Ball, R. L., Mon-
tine, T. J., Martin, B. A., Berry, G. J., Ozawa, M. G.,
Hazard, F. K., Brown, R. A., Chen, S. B., Wood, M.,
Allard, L. S., Ylagan, L., Ng, A. Y., and Shen, J. (2020).
Impact of a deep learning assistant on the histopathologic
classification of liver cancer. NPJ Digit Med, 3:23.
Lee, J. H., Ha, E. J., Kim, D., Jung, Y. J., Heo, S., Jang,
Y. H., An, S. H., and Lee, K. (2020). Application of
deep learning to the diagnosis of cervical lymph node
metastasis from thyroid cancer with CT: external valida-
tion and clinical utility for resident training. Eur Radiol,
30(6):3066–3072.
Molina, Y., Yi, J. C., Martinez-Gutierrez, J., Reding, K. W.,
Yi-Frazier, J. P., and Rosenberg, A. R. (2014). Resilience
among patients across the cancer continuum: diverse per-
spectives. Clin J Oncol Nurs, 18(1):93–101.
Nielsen, P. S., Lindebjerg, J., Rasmussen, J., Starklint, H.,
Waldstrøm, M., and Nielsen, B. (2010). Virtual mi-
croscopy: an evaluation of its validity and diagnostic per-
formance in routine histologic diagnosis of skin tumors.
Hum Pathol, 41(12):1770–1776.
Pantanowitz, L., Sinard, J. H., Henricks, W. H., Fatheree,
L. A., Carter, A. B., Contis, L., Beckwith, B. A., Evans,
A. J., Lal, A., and Parwani, A. V. (2013). Validating
whole slide imaging for diagnostic purposes in pathol-
ogy: guideline from the College of American Pathol-
ogists Pathology and Laboratory Quality Center. Arch
Pathol Lab Med, 137(12):1710–1722.
Rutter, M. (2006). Implications of resilience concepts for
scientific understanding. Ann N Y Acad Sci, 1094:1–12.
Seabold, S. and Perktold, J. (2010). Statsmodels: Econo-
metric and statistical modeling with python. Proceedings
of the 9th Python in Science Conference, 2010.
Sutton, R. T., Pincock, D., Baumgart, D. C., Sadowski,
D. C., Fedorak, R. N., and Kroeker, K. I. (2020). An
overview of clinical decision support systems: benefits,
risks, and strategies for success. NPJ Digit Med, 3:17.
Vasey, B., Clifton, D. A., Collins, G. S., Denniston,
A. K., Faes, L., Geerts, B. F., Liu, X., Morgan, L.,
Watkinson, P., and McCulloch, P. (2021). DECIDE-AI:
new reporting guidelines to bridge the development-to-
implementation gap in clinical artificial intelligence. Nat
Med, 27(2):186–187.