Encoding Clinical Priori in 3D Convolutional Neural
Networks for Prostate Cancer Detection in bpMRI
Anindo Saha, Matin Hosseinzadeh, Henkjan Huisman
Diagnostic Image Analysis Group, Radboud University Medical Center
Nijmegen 6525 GA, The Netherlands
{anindya.shaha,matin.hosseinzadeh,henkjan.huisman}@radboudumc.nl
Abstract
We hypothesize that anatomical priors can be viable mediums to infuse domain-
specific clinical knowledge into state-of-the-art convolutional neural networks
(CNN) based on the U-Net architecture. We introduce a probabilistic population
prior which captures the spatial prevalence and zonal distinction of clinically
significant prostate cancer (csPCa), in order to improve its computer-aided detection
(CAD) in bi-parametric MR imaging (bpMRI). To evaluate performance, we train
3D adaptations of the U-Net, U-SEResNet, UNet++ and Attention U-Net using
800 institutional training-validation scans, paired with radiologically-estimated
annotations and our computed prior. For 200 independent testing bpMRI scans with
histologically-confirmed delineations of csPCa, our proposed method of encoding
clinical priori demonstrates a strong ability to improve patient-based diagnosis
(up to 8.70% increase in AUROC) and lesion-level detection (average increase of
1.08 pAUC between 0.1–10 false positives per patient) across all four architectures.
1 Introduction
State-of-the-art CNN architectures are often conceived as one-size-fits-all solutions to computer vision
challenges, where objects can belong to one of 1000 different classes and occupy any part of natural
color images [1]. In contrast, medical imaging modalities in radiology and nuclear medicine exhibit
much lower inter-sample variability, where the spatial content of a scan is limited by the underlying
imaging protocols and human anatomy. In agreement with recent studies [2–4], we hypothesize that
variant architectures of U-Net can exploit this property via an explicit anatomical prior, particularly
at the task of csPCa detection in bpMRI. To this end, we present a probabilistic population prior $P$,
constructed using radiologically-estimated csPCa annotations and CNN-generated prostate zonal
segmentations of 700 training samples. We propose $P$ as a powerful means of encoding clinical priori
to improve patient-based diagnosis and lesion-level detection on histologically-confirmed cases. We
evaluate its efficacy across a range of popular 3D U-Net architectures that are widely adapted for
biomedical applications [5–9].
Related Work
Traditional image analysis techniques, such as MALF [10], can benefit from spatial
priori in the form of atlases or multi-expert labeled template images reflecting the target organ
anatomy. Meanwhile, machine learning models can adapt several techniques, such as reference
coordinate systems [11, 12] or anatomical maps [2], to integrate domain-specific priori into CNN
architectures. In recent years, the inclusion of zonal priors [4] and prevalence maps [3] has yielded
similar benefits in 2D CAD systems for prostate cancer.
Anatomical Priors
For the $i$-th bpMRI scan in the training dataset, let us define its specific
prevalence map as $p^i = (p^i_1, p^i_2, \ldots, p^i_n)$, where $n$ represents the total number of voxels per channel.
Let us further define the binary masks for the prostatic transitional zone (TZ), peripheral zone (PZ)
and malignancy (M), if present, in this sample as $B_{TZ}$, $B_{PZ}$ and $B_M$, respectively.
Figure 1: (a) Prevalence Prior: $P$ at $\mu = 0.00$ is equivalent to the mean csPCa annotation in the
training dataset, mapping the common sizes, shapes and locations of malignant lesions. (b) Hybrid
Prior: $P$ at $\mu = 0.01$ blends the information of csPCa annotations with that of the prostate zonal
segmentations. (c) Zonal Prior: $P$ at $\mu = 0.33$ is approximately equivalent to the weighted average
of all prostate zonal segmentations in the training dataset. (d): Schematic of the pipeline used to
train/evaluate each candidate 3D CNN model with a variant of the prior $P$, in separate turns.
We can compute the value of the $j$-th voxel in $p^i$ as follows:

$$
f(p^i_j) =
\begin{cases}
0.00 & \text{if } p^i_j \in (B_{TZ} \cup B_{PZ} \cup B_M)' \\
\mu & \text{if } p^i_j \in B_{TZ} \cap B_M' \\
3\mu & \text{if } p^i_j \in B_{PZ} \cap B_M' \\
1.00 & \text{if } p^i_j \in B_M
\end{cases}
$$
Here, $f(p^i_j)$ aims to model the spatial likelihood of csPCa by drawing upon the empirical distribution
of the training dataset. Nearly 75% and 25% of all malignant lesions emerge from PZ and TZ,
respectively [13, 14]. Thus, similar to PI-RADS v2 [15], $f(p^i_j)$ incorporates the importance of
zonal distinction during the assessment of csPCa. In terms of the likelihood of carrying csPCa,
it assumes that voxels belonging to the background class are not likely ($f(p^i_j) = 0.00$), those
belonging to TZ are more likely ($f(p^i_j) = \mu$), those belonging to PZ are three times as likely
as TZ ($f(p^i_j) = 3\mu$), and those containing csPCa are the most likely ($f(p^i_j) = 1.00$), in any
given scan. All the computed specific prevalence maps can be generalized to a single probabilistic
population prior, $P = \left( \sum_{i=1}^{N} p^i \right) / N \in [0, 1]$, where $N$ represents the total number of training samples.
The value of $\mu \in [0, 0.33]$ is a hyperparameter that regulates the relative contribution of benign
prostatic regions in the composition of each $p^i$, and subsequently of our proposed prior $P$ (refer to
Fig. 1(a-c)). Due to the standardized bpMRI imaging protocol [15], inter-sample alignment of the
prostate gland is effectively preserved, with minimal spatial shifts observed across different patient
scans. Prior-to-image correspondence is established at both train-time and inference by using the
case-specific prostate segmentations to translate, orient and scale $P$ accordingly for each bpMRI
scan. No additional non-rigid registration techniques are applied throughout this process.
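For illustration, the construction and placement of $P$ can be sketched in a few lines of numpy. The following is a minimal reconstruction under our own assumptions (co-registered boolean masks per case, bounding-box-based rigid placement); it is not the released implementation, and all function names are ours:

```python
import numpy as np
from skimage.transform import resize

def specific_prevalence_map(tz, pz, m, mu=0.01):
    """Per-case prevalence map p_i implementing f(p_ij) above.
    tz, pz, m: co-registered boolean 3D masks (TZ, PZ, csPCa); m may be
    all-False for benign cases. Zonal masks are assumed disjoint."""
    p = np.zeros(tz.shape, dtype=np.float32)  # background          -> 0.00
    p[tz] = mu                                # benign TZ           -> mu
    p[pz] = 3.0 * mu                          # benign PZ           -> 3*mu
    p[m] = 1.0                                # csPCa (either zone) -> 1.00
    return p

def population_prior(cases, mu=0.01):
    """P = (sum_i p_i) / N: voxel-wise mean over all N training cases."""
    return np.mean([specific_prevalence_map(*c, mu=mu) for c in cases], axis=0)

def align_prior_to_case(prior, prostate_mask):
    """Rigid prior-to-image correspondence: rescale the prior's gland
    extent onto the case-specific prostate bounding box (translation
    and scaling only; no non-rigid registration)."""
    out = np.zeros(prostate_mask.shape, dtype=np.float32)
    src = tuple(slice(i.min(), i.max() + 1) for i in np.nonzero(prior > 0))
    dst = tuple(slice(i.min(), i.max() + 1) for i in np.nonzero(prostate_mask))
    shape = tuple(s.stop - s.start for s in dst)
    out[dst] = resize(prior[src], shape, order=1, preserve_range=True)
    return out
```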
2 Experimental Analysis
Materials
To train and tune each model, we use 800 prostate bpMRI (T2W, high b-value DWI,
computed ADC) scans from Radboud University Medical Center, paired with fully delineated
annotations of csPCa. Annotations are estimated by a consensus of expert radiologists via PI-RADS
v2 [15], where any lesion marked PI-RADS $\geq 4$ constitutes csPCa. From here, 700 and 100
patient scans are partitioned into training and validation sets, respectively, via stratified sampling. To
evaluate performance, we use 200 testing scans from Ziekenhuisgroep Twente. Here, annotations are
clinically confirmed by independent pathologists [16, 17], with Gleason Score $> 3+3$ corresponding
to csPCa. TZ and PZ segmentations are generated for every scan using a multi-planar, anisotropic 3D
U-Net from a separate study [18], where the network achieves an average Dice Similarity Coefficient
of $0.90 \pm 0.01$ for whole-gland segmentation over $5 \times 5$ nested cross-validation. The network is
trained on a subset of 47 bpMRI scans from the training dataset, and its output zonal segmentations
are used to construct and apply the anatomical priors (as detailed in Section 1). Special care is taken
to ensure mutually exclusive patients between the training, validation and testing datasets.
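As a small illustration of the patient-level partitioning described above, the sketch below uses scikit-learn's stratified split. The variable names, the placeholder labels and the choice to stratify on a per-patient csPCa label are our assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
patient_ids = np.arange(800)              # one scan per unique patient (illustrative)
has_cspca = rng.integers(0, 2, size=800)  # placeholder per-patient binary labels

# 700 training / 100 validation scans, preserving the csPCa-positive
# fraction in both sets via stratified sampling.
train_ids, val_ids = train_test_split(
    patient_ids, test_size=100, stratify=has_cspca, random_state=42)
```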
Experiments
Adjusting the value of $\mu$ can lead to remarkably different priors, as seen in Fig. 1(a-c).
We test three different priors, switching the value of $\mu$ between 0.00, 0.01 and 0.33, to investigate
the range of its impact on csPCa detection. Based on our observations in previous work [4], we opt
for an early fusion of the probabilistic priori, where each variant of $P$ is stacked as an additional
channel in the input image volume (refer to Fig. 1(d)) in separate turns. Candidate CNN models
include 3D adaptations of the stand-alone U-Net [5], an equivalent network composed of Squeeze-
and-Excitation residual blocks [6] termed U-SEResNet, the UNet++ [7] and the Attention U-Net [8]
architectures. All models are trained using intensity-normalized (mean = 0, stdev = 1), center-cropped
($144 \times 144 \times 18$) images with $0.5 \times 0.5 \times 3.6$ mm$^3$ resolution. A minibatch size of 4 is used with an
exponentially decaying cyclic learning rate [19] oscillating between $10^{-6}$ and $2.5 \times 10^{-4}$. Focal loss
($\alpha = 0.75$, $\gamma = 2.00$) [20] is used to counter the 1:153 voxel-level class imbalance [21] in the training
dataset, with the Adam optimizer [22] in backpropagation. Train-time augmentations include horizontal
flip, rotation ($-7.5^{\circ}$ to $7.5^{\circ}$), translation (0–5% horizontal/vertical shifts) and scaling (0–5%), centered
along the axial plane. During inference, we apply test-time augmentation by averaging predictions
over the original and horizontally-flipped images.
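To make this configuration concrete, here is a minimal PyTorch sketch of the key pieces: the focal loss, the early fusion of $P$ as a fourth input channel, the exp-range cyclic learning-rate schedule, and the flip-averaged test-time augmentation. The stand-in Conv3d model, the exact decay factor and all wiring are illustrative assumptions, not the authors' released code:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.75, gamma=2.0):
    # Voxel-wise binary focal loss (Lin et al. [20]); down-weights easy
    # negatives to counter heavy voxel-level class imbalance.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                                  # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1.0 - p_t) ** gamma * bce).mean()

def early_fusion(bpmri, prior):
    # Early fusion: stack the aligned prior P as a 4th input channel.
    # bpmri: (B, 3, D, H, W) T2W/DWI/ADC volume; prior: (B, 1, D, H, W).
    return torch.cat([bpmri, prior], dim=1)

# Exponentially decaying cyclic learning rate oscillating between 1e-6 and
# 2.5e-4; cycle_momentum=False is required when pairing CyclicLR with Adam.
model = torch.nn.Conv3d(4, 1, kernel_size=3, padding=1)    # stand-in for a 3D U-Net
optimizer = torch.optim.Adam(model.parameters(), lr=1e-6)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-6, max_lr=2.5e-4,
    mode="exp_range", gamma=0.99995, cycle_momentum=False)

def predict_with_tta(model, x):
    # Test-time augmentation: average predictions over the original and the
    # horizontally-flipped volume (flipping the second prediction back first).
    p = torch.sigmoid(model(x))
    p_flip = torch.sigmoid(model(torch.flip(x, dims=[-1])))
    return 0.5 * (p + torch.flip(p_flip, dims=[-1]))
```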
3 Results and Discussion
Patient-based diagnosis and lesion-level detection performance on the testing set are noted in Table
1 and Fig 2, respectively. For every combination of the 3D CNN models and a variant of the prior
P
, we observe improvements in performance over the baseline. Notably, the hybrid prior, which
retains a blend of both csPCa prevalence and zonal priori, shares the highest increases of 7.32–8.70%
in patient-based AUROC.
P
demonstrates a similar ability to enhance csPCa localization, with an
average increase of 1.08 in pAUC between 0.1–10 false positives per patient across all FROC setups.
Table 1: Patient-based diagnosis performance of each 3D CNN model paired with different variants of
the anatomical prior $P$. Performance scores indicate the mean metric followed by the 95% confidence
interval, estimated as twice the standard deviation from 1000 replications of bootstrapping.

                       Area Under Receiver Operating Characteristic (AUROC)
Architecture           Baseline          Prevalence Prior   Zonal Prior       Hybrid Prior
                       (without prior)   (µ = 0.00)         (µ = 0.33)        (µ = 0.01)
U-Net [5]              0.690 ± 0.079     0.737 ± 0.076      0.740 ± 0.073     0.763 ± 0.071
U-SEResNet [6]         0.694 ± 0.077     0.732 ± 0.077      0.748 ± 0.080     0.777 ± 0.072
UNet++ [7]             0.694 ± 0.078     0.734 ± 0.080      0.752 ± 0.079     0.781 ± 0.069
Attention U-Net [8]    0.711 ± 0.078     0.736 ± 0.079      0.750 ± 0.071     0.790 ± 0.066
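The confidence intervals in Table 1 follow a standard patient-level bootstrap; a minimal sketch of that estimator (our reconstruction of the stated procedure, assuming 1D numpy arrays of patient-level labels and scores) is:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auroc(y_true, y_score, n_boot=1000, seed=0):
    """Mean AUROC with a 95% CI taken as +/- 2 standard deviations over
    bootstrap replicates, as described in the Table 1 caption."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample patients with replacement
        if len(np.unique(y_true[idx])) < 2:   # skip degenerate one-class resamples
            continue
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    aucs = np.asarray(aucs)
    return aucs.mean(), 2.0 * aucs.std()
```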
Figure 2: Lesion-level Free-Response Receiver Operating Characteristic (FROC) analyses of each
3D CNN model paired with different variants of the anatomical prior $P$: (a) U-Net, (b) U-SEResNet,
(c) UNet++, (d) Attention U-Net. Transparent areas indicate the 95% confidence intervals estimated
from 1000 replications of bootstrapping.
In this research, we demonstrate how the standardized imaging protocol of prostate bpMRI can be
leveraged to construct explicit anatomical priors, which can subsequently be used to encode clinical
priori into state-of-the-art U-Net architectures. By doing so, we are able to provide a higher degree of
train-time supervision and boost overall model performance in csPCa detection, even in the presence
of a limited training dataset with inaccurate annotations. In future work, we aim to investigate the
prospects of integrating our proposed prior in the presence of larger training datasets, as well as
quantitatively deduce its capacity to guide model generalization to histologically-confirmed testing
cases beyond the radiologically-estimated training annotations.
Broader Impact
Prostate cancer is one of the most prevalent cancers in men worldwide [23]. In the absence of
experienced radiologists, its multifocality, morphological heterogeneity and strong resemblance to
numerous non-malignant conditions in MR imaging can lead to low inter-reader agreement (<50%)
and sub-optimal interpretation [13, 24, 25]. The development of automated, reliable detection
algorithms has therefore become an important research focus in medical image computing, offering
the potential to support radiologists with consistent quantitative analysis in order to improve their
diagnostic accuracy, and in turn, minimize unnecessary biopsies in patients [26, 27].
Data scarcity and inaccurate annotations are frequent challenges in the medical domain, where
they hinder the ability of CNN models to capture a complete, visual representation of the target
class(es). Thus, we look towards leveraging the breadth of clinical knowledge established in the
field, well beyond the training dataset, to compensate for these limitations. The promising results of
this study verify and further motivate the ongoing development of state-of-the-art techniques to
incorporate clinical priori into CNN architectures, as an effective and practical solution to improve
overall performance.
Population priors for prostate cancer can be susceptible to biases that indicate asymmetrical prevalence.
For instance, the computed prior may exhibit a relatively higher response on one side (left/right),
stemming from an imbalanced spatial distribution of the malignant lesions sampled for the training
dataset. We strongly recommend adequate train-time augmentations (as detailed in Section 2) to
mitigate this challenge.
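One simple way to screen a computed prior for the left/right bias described above is to compare it against its mirror image. The following numpy sketch is our suggestion for such a sanity check, not a procedure from this study:

```python
import numpy as np

def lr_asymmetry(prior, axis=-1):
    """Mean absolute difference between the prior and its left-right mirror,
    normalized by the prior's mean intensity; values near 0 suggest a
    balanced prior. Assumes `axis` is the anatomical left-right direction."""
    mirrored = np.flip(prior, axis=axis)
    return np.abs(prior - mirrored).mean() / max(prior.mean(), 1e-8)
```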
Acknowledgments and Disclosure of Funding
The authors would like to acknowledge the contributions of Maarten de Rooij and Ilse Slootweg
from Radboud University Medical Center during the annotation of fully delineated masks of prostate
cancer for every bpMRI scan used in this study. This research is supported in part by the European
Union H2020: ProCAncer-I project (EU grant 952159). Anindo Saha is supported by the Erasmus+:
EMJMD scholarship in Medical Imaging and Applications (MaIA) program.
References
[1] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In 2009 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 248–255, 2009.

[2] A.V. Dalca, J. Guttag, and M.R. Sabuncu. Anatomical Priors in Convolutional Networks for Unsupervised Biomedical Segmentation. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9290–9299, 2018.

[3] R. Cao, X. Zhong, F. Scalzo, S. Raman, and K. Sung. Prostate Cancer Inference via Weakly-Supervised Learning using a Large Collection of Negative MRI. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pages 434–439, 2019.

[4] M. Hosseinzadeh, P. Brand, and H. Huisman. Effect of Adding Probabilistic Zonal Prior in Deep Learning-based Prostate Cancer Detection. In International Conference on Medical Imaging with Deep Learning – Extended Abstract Track, 2019.

[5] O. Çiçek, A. Abdulkadir, S.S. Lienkamp, T. Brox, and O. Ronneberger. 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI), pages 424–432, 2016.

[6] J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu. Squeeze-and-Excitation Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 7132–7141, 2019.

[7] Z. Zhou, M.M.R. Siddiquee, N. Tajbakhsh, and J. Liang. UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation. IEEE Transactions on Medical Imaging, 39(6):1856–1867, 2020.

[8] J. Schlemper, O. Oktay, M. Schaap, M. Heinrich, B. Kainz, B. Glocker, and D. Rueckert. Attention Gated Networks: Learning to Leverage Salient Regions in Medical Images. Medical Image Analysis, 53:197–207, 2019.

[9] L. Rundo, C. Han, Y. Nagano, J. Zhang, R. Hataya, C. Militello, A. Tangherloni, M.S. Nobile, C. Ferretti, D. Besozzi, M.C. Gilardi, S. Vitabile, G. Mauri, H. Nakayama, and P. Cazzaniga. USE-Net: Incorporating Squeeze-and-Excitation Blocks into U-Net for Prostate Zonal Segmentation of Multi-Institutional MRI Datasets. Neurocomputing, 365:31–43, 2019.

[10] H. Wang, J.W. Suh, S.R. Das, J.B. Pluta, C. Craige, and P.A. Yushkevich. Multi-Atlas Segmentation with Joint Label Fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(3):611–623, 2013.

[11] T. Kooi, G. Litjens, B. van Ginneken, A. Gubern-Mérida, C.I. Sánchez, R. Mann, A. den Heeten, and N. Karssemeijer. Large Scale Deep Learning for Computer Aided Detection of Mammographic Lesions. Medical Image Analysis, 35:303–312, 2017.

[12] C. Wachinger, M. Reuter, and T. Klein. DeepNAT: Deep Convolutional Neural Network for Segmenting Neuroanatomy. NeuroImage, 170:434–445, 2018.

[13] B. Israël, M. van der Leest, M. Sedelaar, A.R. Padhani, P. Zámecnik, and J.O. Barentsz. Multiparametric Magnetic Resonance Imaging for the Detection of Clinically Significant Prostate Cancer: What Urologists Need to Know. Part 2: Interpretation. European Urology, 77(4):469–480, 2020.

[14] M.E. Chen, D.A. Johnston, K. Tang, R.J. Babaian, and P. Troncoso. Detailed Mapping of Prostate Carcinoma Foci: Biopsy Strategy Implications. Cancer, 89(8):1800–1809, 2000.

[15] J.C. Weinreb, J.O. Barentsz, P.L. Choyke, and F. Cornud. PI-RADS Prostate Imaging – Reporting and Data System: 2015, Version 2. European Urology, 69(1):16–40, 2016.

[16] M. van der Leest, E. Cornel, B. Israël, and R. Hendriks. Head-to-head Comparison of Transrectal Ultrasound-guided Prostate Biopsy Versus Multiparametric Prostate Resonance Imaging with Subsequent Magnetic Resonance-guided Biopsy in Biopsy-naïve Men with Elevated Prostate-specific Antigen: A Large Prospective Multicenter Clinical Study. European Urology, 75(4):570–578, 2019.

[17] J.I. Epstein, L. Egevad, M.B. Amin, and B. Delahunt. The 2014 International Society of Urological Pathology (ISUP) Consensus Conference on Gleason Grading of Prostatic Carcinoma: Definition of Grading Patterns and Proposal for a New Grading System. Am. J. Surg. Pathol., 40(2):244–252, 2016.

[18] T. Riepe, M. Hosseinzadeh, P. Brand, and H. Huisman. Anisotropic Deep Learning Multi-planar Automatic Prostate Segmentation. In Proceedings of the 28th International Society for Magnetic Resonance in Medicine Annual Meeting, 2020. URL http://indexsmart.mirasmart.com/ISMRM2020/PDFfiles/3518.html.

[19] L.N. Smith. Cyclical Learning Rates for Training Neural Networks. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 464–472, 2017.

[20] T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár. Focal Loss for Dense Object Detection. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2999–3007, 2017.

[21] R. Cao, A. Mohammadian Bajgiran, S. Afshari Mirak, S. Shakeri, X. Zhong, D. Enzmann, S. Raman, and K. Sung. Joint Prostate Cancer Detection and Gleason Score Prediction in mp-MRI via FocalNet. IEEE Transactions on Medical Imaging, 38(11):2496–2506, 2019.

[22] D.P. Kingma and J. Ba. Adam: A Method for Stochastic Optimization. In International Conference on Learning Representations (ICLR), 2015. URL http://arxiv.org/abs/1412.6980.

[23] K.D. Miller, L. Nogueira, A.B. Mariotto, J.H. Rowland, K.R. Yabroff, C.M. Alfano, A. Jemal, J.L. Kramer, and R.L. Siegel. Cancer Treatment and Survivorship Statistics, 2019. CA: A Cancer Journal for Clinicians, 69(5):363–385, 2019.

[24] C.P. Smith, S.A. Harmon, T. Barrett, and L.K. Bittencourt. Intra- and Interreader Reproducibility of PI-RADS v2: A Multireader Study. Journal of Magnetic Resonance Imaging, 49(6):1694–1703, 2019.

[25] A.B. Rosenkrantz, L.A. Ginocchio, D. Cornfeld, and A.T. Froemming. Interobserver Reproducibility of the PI-RADS Version 2 Lexicon: A Multicenter Study of Six Experienced Prostate Radiologists. Radiology, 280(3):793–804, 2016.

[26] M.M.C. Elwenspoek, A.L. Sheppard, M.D.F. McInnes, and P. Whiting. Comparison of Multiparametric Magnetic Resonance Imaging and Targeted Biopsy With Systematic Biopsy Alone for the Diagnosis of Prostate Cancer: A Systematic Review and Meta-Analysis. JAMA Network Open, 2(8):e198427, 2019.

[27] P. Schelb, S. Kohl, J.P. Radtke, and D. Bonekamp. Classification of Cancer at Prostate MRI: Deep Learning versus Clinical PI-RADS Assessment. Radiology, 293(3):607–617, 2019.
Appendix: Model Predictions
(a): Histologically-confirmed clinically significant prostate cancer emerging from PZ.
(b): Histologically-confirmed clinically significant prostate cancer emerging from TZ.
Figure 3: Mid-axial bpMRI slice of the prostate gland and its corresponding model predictions
(overlaid on T2W images) for two different patient scans in the testing dataset. In each case, the
patient is afflicted by a single instance of csPCa localized in a different part of the prostate anatomy.