Encoding Clinical Priori in 3D Convolutional Neural Networks for Prostate Cancer Detection in bpMRI

Anindo Saha, Matin Hosseinzadeh, Henkjan Huisman
Diagnostic Image Analysis Group, Radboud University Medical Center
Nijmegen 6525 GA, The Netherlands
{anindya.shaha,matin.hosseinzadeh,henkjan.huisman}@radboudumc.nl
Abstract
We hypothesize that anatomical priors can be viable mediums to infuse domain-
specific clinical knowledge into state-of-the-art convolutional neural networks
(CNN) based on the U-Net architecture. We introduce a probabilistic population
prior which captures the spatial prevalence and zonal distinction of clinically
significant prostate cancer (csPCa), in order to improve its computer-aided detection
(CAD) in bi-parametric MR imaging (bpMRI). To evaluate performance, we train
3D adaptations of the U-Net, U-SEResNet, UNet++ and Attention U-Net using
800 institutional training-validation scans, paired with radiologically-estimated
annotations and our computed prior. For 200 independent testing bpMRI scans with
histologically-confirmed delineations of csPCa, our proposed method of encoding
clinical priori demonstrates a strong ability to improve patient-based diagnosis
(up to 8.70% increase in AUROC) and lesion-level detection (average increase of
1.08 pAUC between 0.1–10 false positives per patient) across all four architectures.
1 Introduction
State-of-the-art CNN architectures are often conceived as one-size-fits-all solutions to computer vision
challenges, where objects can belong to one of 1000 different classes and occupy any part of natural
color images [1]. In contrast, medical imaging modalities in radiology and nuclear medicine exhibit
much lower inter-sample variability, where the spatial content of a scan is limited by the underlying
imaging protocols and human anatomy. In agreement with recent studies [2–4], we hypothesize that
variant architectures of U-Net can exploit this property via an explicit anatomical prior, particularly
at the task of csPCa detection in bpMRI. To this end, we present a probabilistic population prior $P$,
constructed using radiologically-estimated csPCa annotations and CNN-generated prostate zonal
segmentations of 700 training samples. We propose $P$ as a powerful means of encoding clinical priori
to improve patient-based diagnosis and lesion-level detection on histologically-confirmed cases. We
evaluate its efficacy across a range of popular 3D U-Net architectures that are widely adapted for
biomedical applications [5–9].
Related Work
Traditional image analysis techniques, such as multi-atlas label fusion (MALF) [10], can benefit from
spatial priori in the form of atlases or multi-expert labeled template images reflecting the target organ
anatomy. Meanwhile, machine learning models can adapt several techniques, such as reference
coordinate systems [11, 12] or anatomical maps [2], to integrate domain-specific priori into CNN
architectures. In recent years, the inclusion of zonal priors [4] and prevalence maps [3] has yielded
similar benefits in 2D CAD systems for prostate cancer.
34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.
arXiv:2011.00263v4 [eess.IV] 21 Sep 2021
Figure 1: (a) Prevalence Prior: $P$ at $\mu = 0.00$ is equivalent to the mean csPCa annotation in the
training dataset, mapping the common sizes, shapes and locations of malignant lesions. (b) Hybrid
Prior: $P$ at $\mu = 0.01$ blends the information of csPCa annotations with that of the prostate zonal
segmentations. (c) Zonal Prior: $P$ at $\mu = 0.33$ is approximately equivalent to the weighted average
of all prostate zonal segmentations in the training dataset. (d) Schematic of the pipeline used to
train/evaluate each candidate 3D CNN model with a variant of the prior $P$, in separate turns.
Anatomical Priors
For the $i$-th bpMRI scan in the training dataset, let us define its specific prevalence map as
$p^i = (p^i_1, p^i_2, \ldots, p^i_n)$, where $n$ represents the total number of voxels per channel.
Let us define the binary masks for the prostatic transitional zone (TZ), peripheral zone (PZ) and
malignancy (M), if present, in this sample as $B_{TZ}$, $B_{PZ}$ and $B_M$, respectively. We can compute the
value of the $j$-th voxel in $p^i$ as follows:
$$
f(p^i_j) =
\begin{cases}
0.00, & p^i_j \in (B_{TZ} \cup B_{PZ} \cup B_M)' \\
\mu, & p^i_j \in B_{TZ} \cap B_M' \\
3\mu, & p^i_j \in B_{PZ} \cap B_M' \\
1.00, & p^i_j \in B_M
\end{cases}
$$
Here, $f(p^i_j)$ aims to model the spatial likelihood of csPCa by drawing upon the empirical distribution
of the training dataset. Nearly 75% and 25% of all malignant lesions emerge from PZ and TZ,
respectively [13, 14]. Thus, similar to PI-RADS v2 [15], $f(p^i_j)$ incorporates the importance of
zonal distinction during the assessment of csPCa. In terms of the likelihood of carrying csPCa,
it assumes that voxels belonging to the background class are not likely ($f(p^i_j) = 0.00$), those
belonging to TZ are more likely ($f(p^i_j) = \mu$), those belonging to PZ are three times as likely
as TZ ($f(p^i_j) = 3\mu$), and those containing csPCa are the most likely ($f(p^i_j) = 1.00$), in any
given scan. All the computed specific prevalence maps can be generalized to a single probabilistic
population prior, $P = (\sum_i p^i)/N \in [0, 1]$, where $N$ represents the total number of training samples.
The value of $\mu \in [0, 0.33]$ is a hyperparameter that regulates the relative contribution of benign
prostatic regions in the composition of each $p^i$ and subsequently our proposed prior $P$ (refer to
Fig. 1(a-c)). Due to the standardized bpMRI imaging protocol [15], inter-sample alignment of the
prostate gland is effectively preserved with minimal spatial shifts observed across different patient
scans. Prior-to-image correspondence is established at both train-time and inference by using the
case-specific prostate segmentations to translate, orient and scale $P$, accordingly, for each bpMRI
scan. No additional non-rigid registration techniques have been applied throughout this process.
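The construction above can be illustrated with a minimal NumPy sketch. It assumes that the binary masks of every scan have already been brought onto a common voxel grid; the function names (`specific_prevalence_map`, `population_prior`) and the toy masks are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np

def specific_prevalence_map(b_tz, b_pz, b_m, mu):
    """Per-scan map: 0 for background, mu for TZ, 3*mu for PZ, 1 for csPCa."""
    b_tz, b_pz, b_m = (np.asarray(b, dtype=bool) for b in (b_tz, b_pz, b_m))
    p = np.zeros(b_m.shape, dtype=np.float32)
    p[b_tz & ~b_m] = mu          # benign transitional zone
    p[b_pz & ~b_m] = 3.0 * mu    # benign peripheral zone, three times as likely as TZ
    p[b_m] = 1.0                 # annotated csPCa overrides the zonal values
    return p

def population_prior(maps):
    """Voxel-wise mean of N aligned specific prevalence maps, so P stays in [0, 1]."""
    return np.mean(np.stack(maps, axis=0), axis=0)

# Toy 1 x 1 x 4 "volume": background, TZ, PZ, and a PZ voxel that holds a lesion in scan 1.
tz = np.array([[[0, 1, 0, 0]]])
pz = np.array([[[0, 0, 1, 1]]])
m1 = np.array([[[0, 0, 0, 1]]])  # scan 1 contains a lesion
m2 = np.zeros_like(m1)           # scan 2 is benign
mu = 0.01                        # hybrid prior setting

prior = population_prior([
    specific_prevalence_map(tz, pz, m1, mu),
    specific_prevalence_map(tz, pz, m2, mu),
])
print(prior)  # approximately [0, 0.01, 0.03, 0.515] along the last axis
```

With $\mu = 0$ the same code reduces to the mean csPCa annotation, which corresponds to the prevalence prior of Fig. 1(a).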
2 Experimental Analysis
Materials
To train and tune each model, we use 800 prostate bpMRI (T2W, high b-value DWI,
computed ADC) scans from Radboud University Medical Center, paired with fully delineated
annotations of csPCa. Annotations are estimated by a consensus of expert radiologists via PI-RADS
v2 [15], where any lesion marked PI-RADS ≥ 4 is considered csPCa. From here, 700 and 100
patient scans are partitioned into training and validation sets, respectively, via stratified sampling. To
evaluate performance, we use 200 testing scans from Ziekenhuisgroep Twente. Here, annotations are
clinically confirmed by independent pathologists [16, 17], with Gleason Score > 3 + 3 corresponding
to csPCa. TZ and PZ segmentations are generated for every scan using a multi-planar, anisotropic 3D
U-Net from a separate study [18], where the network achieves an average Dice Similarity Coefficient
of 0.90 ± 0.01 for whole-gland segmentation over 5×5 nested cross-validation. The network is
trained on a subset of 47 bpMRI scans from the training dataset and its output zonal segmentations
are used to construct and apply the anatomical priors (as detailed in Section 1). Special care is taken
to ensure mutually exclusive patients between the training, validation and testing datasets.
Experiments
Adjusting the value of $\mu$ can lead to remarkably different priors, as seen in Fig. 1(a-c).
We test three different priors, switching the value of $\mu$ between 0.00, 0.01 and 0.33, to investigate
the range of its impact on csPCa detection. Based on our observations in previous work [4], we opt
for an early fusion of the probabilistic priori, where each variant of $P$ is stacked as an additional
channel in the input image volume (refer to Fig. 1(d)), in separate turns. Candidate CNN models
include 3D adaptations of the stand-alone U-Net [5], an equivalent network composed of
Squeeze-and-Excitation residual blocks [6] termed U-SEResNet, the UNet++ [7] and the Attention U-Net [8]
architectures. All models are trained using intensity-normalized (mean=0, stdev=1), center-cropped
(144×144×18) images with 0.5×0.5×3.6 mm³ resolution. A minibatch size of 4 is used with an
exponentially decaying cyclic learning rate [19] oscillating between $10^{-6}$ and $2.5 \times 10^{-4}$. Focal loss
($\alpha = 0.75$, $\gamma = 2.00$) [20] is used to counter the 1:153 voxel-level class imbalance [21] in the training
dataset, with Adam optimizer [22] in backpropagation. Train-time augmentations include horizontal
flip, rotation (−7.5° to 7.5°), translation (0–5% horizontal/vertical shifts) and scaling (0–5%) centered
along the axial plane. During inference, we apply test-time augmentations by averaging predictions
over the original and horizontally-flipped images.
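Three of the steps described above (early fusion of the prior, the focal loss and flip-based test-time augmentation) can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the training code used in this study: volumes are assumed to be laid out as (channels, depth, height, width) with the last axis running left to right, $\alpha$ is assumed to weight the positive class, and `model` stands for any hypothetical callable that maps an input volume to a voxel-wise probability map.

```python
import numpy as np

def fuse_prior(image, prior):
    """Early fusion: append the prior as one extra input channel.

    image: (C, D, H, W), e.g. C = 3 for T2W, DWI and ADC; prior: (D, H, W).
    """
    return np.concatenate([image, prior[None]], axis=0)

def focal_loss(prob, target, alpha=0.75, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over voxels, following Lin et al. [20]."""
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target)
    p_t = np.where(target == 1, prob, 1.0 - prob)        # probability of the true class
    alpha_t = np.where(target == 1, alpha, 1.0 - alpha)  # class-balancing weight
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

def predict_with_flip_tta(model, volume):
    """Average predictions over the original and the horizontally flipped input."""
    direct = model(volume)
    flipped = model(volume[..., ::-1])[..., ::-1]  # flip the prediction back before averaging
    return 0.5 * (direct + flipped)

# Usage with placeholder data of the input size reported above (144 x 144 x 18).
image = np.zeros((3, 18, 144, 144), dtype=np.float32)
prior = np.full((18, 144, 144), 0.5, dtype=np.float32)
fused = fuse_prior(image, prior)
print(fused.shape)  # (4, 18, 144, 144)
```

The factor $(1 - p_t)^\gamma$ shrinks the contribution of voxels that are already classified confidently, which is why the loss suits the heavy voxel-level class imbalance reported above.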
3 Results and Discussion
Patient-based diagnosis and lesion-level detection performance on the testing set are noted in Table
1 and Fig. 2, respectively. For every combination of the 3D CNN models and a variant of the prior
$P$, we observe improvements in performance over the baseline. Notably, the hybrid prior, which
retains a blend of both csPCa prevalence and zonal priori, yields the highest increases of 7.32–8.70%
(absolute) in patient-based AUROC. $P$ demonstrates a similar ability to enhance csPCa localization, with an
average increase of 1.08 in pAUC between 0.1–10 false positives per patient across all FROC setups.
Table 1: Patient-based diagnosis performance of each 3D CNN model paired with different variants of
the anatomical prior $P$. Performance scores indicate the mean metric followed by the 95% confidence
interval estimated as twice the standard deviation from 1000 replications of bootstrapping.
                       Area Under Receiver Operating Characteristic (AUROC)
Architecture           Baseline           Prevalence Prior   Zonal Prior    Hybrid Prior
                       (without prior)    (µ = 0.00)         (µ = 0.33)     (µ = 0.01)
U-Net [5]              0.690±0.079        0.737±0.076        0.740±0.073    0.763±0.071
U-SEResNet [6]         0.694±0.077        0.732±0.077        0.748±0.080    0.777±0.072
UNet++ [7]             0.694±0.078        0.734±0.080        0.752±0.079    0.781±0.069
Attention U-Net [8]    0.711±0.078        0.736±0.079        0.750±0.071    0.790±0.066
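The interval estimate described in the caption of Table 1 (mean, followed by twice the standard deviation over 1000 bootstrap replications) can be sketched as below. The rank-based AUROC, the patient-level resampling and the redrawing of single-class resamples are assumptions made for this illustration, and the labels and scores are synthetic placeholders rather than study data.

```python
import numpy as np

def auroc(labels, scores):
    """AUROC as the probability that a positive case outranks a negative one (ties count 0.5)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

def bootstrap_auroc(labels, scores, n_replications=1000, seed=0):
    """Return (mean, 2 * std) of AUROC over resamples drawn with replacement."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    values = []
    while len(values) < n_replications:
        idx = rng.integers(0, labels.size, size=labels.size)
        if labels[idx].all() or not labels[idx].any():
            continue  # assumption: redraw when a resample lacks one of the classes
        values.append(auroc(labels[idx], scores[idx]))
    values = np.asarray(values)
    return values.mean(), 2.0 * values.std()

# Synthetic example: 200 patients, positives score higher on average.
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=200)
s = rng.normal(loc=y * 0.8, scale=1.0)
mean_auc, half_width = bootstrap_auroc(y, s)
print(f"AUROC = {mean_auc:.3f} ± {half_width:.3f}")
```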
Figure 2: Lesion-level Free-Response Receiver Operating Characteristic (FROC) analyses of each
3D CNN model paired with different variants of the anatomical prior $P$: (a) U-Net, (b) U-SEResNet,
(c) UNet++, (d) Attention U-Net. Transparent areas indicate the 95% confidence intervals estimated
from 1000 replications of bootstrapping.
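For orientation, the partial area under an FROC curve between 0.1 and 10 false positives per patient can be computed as in the sketch below. It assumes linear interpolation and integration on a linear false-positive axis, which may differ from the exact scheme used for Fig. 2; the function name `froc_pauc` and the sample curve are hypothetical.

```python
import numpy as np

def froc_pauc(fp_per_patient, sensitivity, lo=0.1, hi=10.0, n_grid=1000):
    """Area under the FROC curve for false positives per patient in [lo, hi]."""
    fp = np.asarray(fp_per_patient, dtype=float)
    sens = np.asarray(sensitivity, dtype=float)
    order = np.argsort(fp)
    grid = np.linspace(lo, hi, n_grid)
    # np.interp holds the end values constant outside the measured range.
    curve = np.interp(grid, fp[order], sens[order])
    # Trapezoidal rule written out explicitly.
    return float(np.sum(0.5 * (curve[1:] + curve[:-1]) * np.diff(grid)))

# Hypothetical operating points of a detector (not taken from Fig. 2).
fp_points = [0.05, 0.5, 1.0, 2.0, 5.0, 10.0]
sens_points = [0.20, 0.55, 0.70, 0.80, 0.90, 0.95]
print(round(froc_pauc(fp_points, sens_points), 3))
```

Under these assumptions a detector with perfect sensitivity at every operating point would reach a pAUC of 9.9, which gives a scale for the reported average increase of 1.08.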
In this research, we demonstrate how the standardized imaging protocol of prostate bpMRI can be
leveraged to construct explicit anatomical priors, which can subsequently be used to encode clinical
priori into state-of-the-art U-Net architectures. By doing so, we are able to provide a higher degree of
train-time supervision and boost overall model performance in csPCa detection, even in the presence
of a limited training dataset with inaccurate annotations. In future work, we aim to investigate the
prospects of integrating our proposed prior in the presence of larger training datasets, as well as to
quantify its capacity to guide model generalization to histologically-confirmed testing
cases beyond the radiologically-estimated training annotations.
Broader Impact
Prostate cancer is one of the most prevalent cancers in men worldwide [23]. In the absence of
experienced radiologists, its multifocality, morphological heterogeneity and strong resemblance to
numerous non-malignant conditions in MR imaging can lead to low inter-reader agreement (<50%)
and sub-optimal interpretation [13, 24, 25]. The development of automated, reliable detection
algorithms has therefore become an important research focus in medical image computing, offering
the potential to support radiologists with consistent quantitative analysis in order to improve their
diagnostic accuracy, and in turn, minimize unnecessary biopsies in patients [26, 27].
Data scarcity and inaccurate annotations are frequent challenges in the medical domain, where
they hinder the ability of CNN models to capture a complete, visual representation of the target
class(es). Thus, we look towards leveraging the breadth of clinical knowledge established in the
field, well beyond the training dataset, to compensate for these limitations. The promising results of
this study verify and further motivate the ongoing development of state-of-the-art techniques to
incorporate clinical priori into CNN architectures, as an effective and practical solution to improve
overall performance.
Population priors for prostate cancer can be susceptible to biases that indicate asymmetrical prevalence.
For instance, the computed prior may exhibit a relatively higher response on one side (left/right),
stemming from an imbalanced spatial distribution of the malignant lesions sampled for the training
dataset. We strongly recommend adequate train-time augmentations (as detailed in Section 2) to
mitigate this challenge.
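A simple way to gauge the left/right bias described above, together with a horizontal-flip augmentation of the kind listed in Section 2, is sketched below. The asymmetry score is an illustrative diagnostic that is not part of this study, and the sketch assumes that the last array axis runs left to right and that the prior is stored as a channel of the input volume, so that it is flipped together with the images.

```python
import numpy as np

def left_right_asymmetry(prior):
    """Mean absolute difference between the prior and its left-right mirror image."""
    prior = np.asarray(prior, dtype=float)
    return float(np.mean(np.abs(prior - prior[..., ::-1])))

def random_horizontal_flip(volume, rng, p=0.5):
    """Flip every channel (images and prior alike) with probability p."""
    return volume[..., ::-1] if rng.random() < p else volume

# Toy prior that responds more strongly on one side of the gland.
toy_prior = np.zeros((18, 144, 144))
toy_prior[:, 60:90, 30:60] = 0.4   # stronger response on one side
toy_prior[:, 60:90, 84:114] = 0.2  # weaker response on the mirrored side
print(left_right_asymmetry(toy_prior))

rng = np.random.default_rng(0)
volume = np.stack([toy_prior] * 4)  # placeholder 4-channel input volume
augmented = random_horizontal_flip(volume, rng)
print(augmented.shape)  # (4, 18, 144, 144)
```

A score of zero indicates a perfectly mirror-symmetric prior; larger values suggest that the sampled lesions are unevenly distributed between the two sides.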
Acknowledgments and Disclosure of Funding
The authors would like to acknowledge the contributions of Maarten de Rooij and Ilse Slootweg
from Radboud University Medical Center during the annotation of fully delineated masks of prostate
cancer for every bpMRI scan used in this study. This research is supported in part by the European
Union H2020: ProCAncer-I project (EU grant 952159). Anindo Saha is supported by the Erasmus+:
EMJMD scholarship in Medical Imaging and Applications (MaIA) program.
References
[1] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image
Database. In 2009 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages
248–255, 2009.
[2] A.V. Dalca, J. Guttag, and M.R. Sabuncu. Anatomical Priors in Convolutional Networks for Unsupervised
Biomedical Segmentation. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
(CVPR), pages 9290–9299, 2018.
[3] R. Cao, X. Zhong, F. Scalzo, S. Raman, and K. Sung. Prostate Cancer Inference via Weakly-Supervised
Learning using a Large Collection of Negative MRI. In 2019 IEEE/CVF International Conference on
Computer Vision Workshop (ICCVW), pages 434–439, 2019.
[4] M. Hosseinzadeh, P. Brand, and H. Huisman. Effect of Adding Probabilistic Zonal Prior in Deep
Learning-based Prostate Cancer Detection. In International Conference on Medical Imaging with Deep
Learning – Extended Abstract Track, 2019.
[5] O. Çiçek, A. Abdulkadir, S.S. Lienkamp, T. Brox, and O. Ronneberger. 3D U-Net: Learning Dense
Volumetric Segmentation from Sparse Annotation. In Medical Image Computing and Computer-Assisted
Intervention (MICCAI), pages 424–432, 2016.
[6] J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu. Squeeze-and-Excitation Networks. IEEE Transactions on
Pattern Analysis and Machine Intelligence, pages 7132–7141, 2019.
[7] Z. Zhou, M.M.R. Siddiquee, N. Tajbakhsh, and J. Liang. UNet++: Redesigning Skip Connections to
Exploit Multiscale Features in Image Segmentation. IEEE Transactions on Medical Imaging, 39(6):
1856–1867, 2020.
[8] J. Schlemper, O. Oktay, M. Schaap, M. Heinrich, B. Kainz, B. Glocker, and D. Rueckert. Attention Gated
Networks: Learning to Leverage Salient Regions in Medical Images. Medical Image Analysis, 53:197–207,
2019.
[9] L. Rundo, C. Han, Y. Nagano, J. Zhang, R. Hataya, C. Militello, A. Tangherloni, M.S. Nobile, C. Ferretti,
D. Besozzi, M.C. Gilardi, S. Vitabile, G. Mauri, H. Nakayama, and P. Cazzaniga. USE-Net: Incorporating
Squeeze-and-Excitation Blocks into U-Net for Prostate Zonal Segmentation of Multi-Institutional MRI
Datasets. Neurocomputing, 365:31–43, 2019.
[10] H. Wang, J.W. Suh, S.R. Das, J.B. Pluta, C. Craige, and P.A. Yushkevich. Multi-Atlas Segmentation with
Joint Label Fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(3):611–623,
2013.
[11] T. Kooi, G. Litjens, B. van Ginneken, A. Gubern-Mérida, C.I. Sánchez, R. Mann, A. den Heeten, and
N. Karssemeijer. Large Scale Deep Learning for Computer Aided Detection of Mammographic Lesions.
Medical Image Analysis, 35:303–312, 2017.
[12] C. Wachinger, M. Reuter, and T. Klein. DeepNAT: Deep Convolutional Neural Network for Segmenting
Neuroanatomy. NeuroImage, 170:434–445, 2018.
[13] B. Israël, M. van der Leest, M. Sedelaar, A.R. Padhani, P. Zámecnik, and J.O. Barentsz. Multiparametric
Magnetic Resonance Imaging for the Detection of Clinically Significant Prostate Cancer: What Urologists
Need to Know. Part 2: Interpretation. European Urology, 77(4):469–480, 2020.
[14] M.E. Chen, D.A. Johnston, K. Tang, R.J. Babaian, and P. Troncoso. Detailed Mapping of Prostate
Carcinoma Foci: Biopsy Strategy Implications. Cancer, 89(8):1800–1809, 2000.
[15] J.C. Weinreb, J.O. Barentsz, P.L. Choyke, and F. Cornud. PI-RADS Prostate Imaging – Reporting and
Data System: 2015, Version 2. European Urology, 69(1):16–40, 2016.
[16] M. van der Leest, E. Cornel, B. Israël, and R. Hendriks. Head-to-head Comparison of Transrectal
Ultrasound-guided Prostate Biopsy Versus Multiparametric Prostate Resonance Imaging with Subsequent
Magnetic Resonance-guided Biopsy in Biopsy-naïve Men with Elevated Prostate-specific Antigen: A
Large Prospective Multicenter Clinical Study. European Urology, 75(4):570–578, 2019.
[17] J.I. Epstein, L. Egevad, M.B. Amin, and B. Delahunt. The 2014 International Society of Urological
Pathology (ISUP) Consensus Conference on Gleason Grading of Prostatic Carcinoma: Definition of
Grading Patterns and Proposal for a New Grading System. Am. J. Surg. Pathol., 40(2):244–252, 2016.
[18] T. Riepe, M. Hosseinzadeh, P. Brand, and H. Huisman. Anisotropic Deep Learning Multi-planar Automatic
Prostate Segmentation. In Proceedings of the 28th International Society for Magnetic Resonance in
Medicine Annual Meeting, 2020. URL
http://indexsmart.mirasmart.com/ISMRM2020/PDFfiles/3518.html.
[19] L.N. Smith. Cyclical Learning Rates for Training Neural Networks. In 2017 IEEE Winter Conference on
Applications of Computer Vision (WACV), pages 464–472, 2017.
[20] T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár. Focal Loss for Dense Object Detection. In 2017 IEEE
International Conference on Computer Vision (ICCV), pages 2999–3007, 2017.
[21] R. Cao, A. Mohammadian Bajgiran, S. Afshari Mirak, S. Shakeri, X. Zhong, D. Enzmann, S. Raman, and
K. Sung. Joint Prostate Cancer Detection and Gleason Score Prediction in mp-MRI via FocalNet. IEEE
Transactions on Medical Imaging, 38(11):2496–2506, 2019.
[22] D.P. Kingma and J. Ba. Adam: A Method for Stochastic Optimization. In International Conference on
Learning Representations (ICLR), 2015. URL http://arxiv.org/abs/1412.6980.
[23] K.D. Miller, L. Nogueira, A.B. Mariotto, J.H. Rowland, K.R. Yabroff, C.M. Alfano, A. Jemal, J.L. Kramer,
and R.L. Siegel. Cancer Treatment and Survivorship Statistics, 2019. CA: A Cancer Journal for Clinicians,
69(5):363–385, 2019.
[24] C.P. Smith, S.A. Harmon, T. Barrett, and L.K. Bittencourt. Intra- and Interreader Reproducibility of
PI-RADS v2: A Multireader Study. Journal of Magnetic Resonance Imaging, 49(6):1694–1703, 2019.
[25] A.B. Rosenkrantz, L.A. Ginocchio, D. Cornfeld, and A.T. Froemming. Interobserver Reproducibility of the
PI-RADS Version 2 Lexicon: A Multicenter Study of Six Experienced Prostate Radiologists. Radiology,
280(3):793–804, 2016.
[26] M.M.C. Elwenspoek, A.L. Sheppard, M.D.F. McInnes, and P. Whiting. Comparison of Multiparametric
Magnetic Resonance Imaging and Targeted Biopsy With Systematic Biopsy Alone for the Diagnosis of
Prostate Cancer: A Systematic Review and Meta-Analysis. JAMA Network Open, 2(8):e198427, 2019.
[27] P. Schelb, S. Kohl, J.P. Radtke, and D. Bonekamp. Classification of Cancer at Prostate MRI: Deep
Learning versus Clinical PI-RADS Assessment. Radiology, 293(3):607–617, 2019.
Appendix: Model Predictions
(a): Histologically-confirmed clinically significant prostate cancer emerging from PZ.
(b): Histologically-confirmed clinically significant prostate cancer emerging from TZ.
Figure 3: Mid-axial bpMRI slice of the prostate gland and its corresponding model predictions
(overlaid on T2W images) for two different patient scans in the testing dataset. In each case, the
patient is afflicted by a single instance of csPCa localized in a different part of the prostate anatomy.
Article
Importance: The current diagnostic pathway for patients with suspected prostate cancer (PCa) includes prostate biopsy. A large proportion of individuals who undergo biopsy have either no PCa or low-risk disease that does not require treatment. Unnecessary biopsies may potentially be avoided with prebiopsy imaging. Objective: To compare the performance of systematic transrectal ultrasonography-guided prostate biopsy vs prebiopsy biparametric or multiparametric magnetic resonance imaging (MRI) followed by targeted biopsy with or without systematic biopsy. Data sources: MEDLINE, Embase, Cochrane, Web of Science, clinical trial registries, and reference lists of recent reviews were searched through December 2018 for randomized clinical trials using the terms "prostate cancer" and "MRI." Study selection: Randomized clinical trials comparing diagnostic pathways including prebiopsy MRI vs systematic transrectal ultrasonography-guided biopsy in biopsy-naive men with a clinical suspicion of PCa. Data extraction and synthesis: Data were pooled using random-effects meta-analysis. Risk of bias was assessed using the revised Cochrane tool. Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines were followed. All review stages were conducted by 2 reviewers. Main outcomes and measures: Detection rate of clinically significant and insignificant PCa, number of biopsy procedures, number of biopsy cores taken, and complications. Results: Seven high-quality trials (2582 patients) were included. Compared with systematic transrectal ultrasonography-guided biopsy alone, MRI with or without targeted biopsy was associated with a 57% (95% CI, 2%-141%) improvement in the detection of clinically significant PCa, a 33% (95% CI, 23%-45%) potential reduction in the number of biopsy procedures, and a 77% (95% CI, 60%-93%) reduction in the number of cores taken per procedure. One trial showed reduced pain and bleeding adverse effects. 
Systematic sampling of the prostate in addition to the acquisition of targeted cores did not significantly improve the detection of clinically significant PCa compared with systematic biopsy alone. Conclusions and relevance: In this meta-analysis, prebiopsy MRI combined with targeted biopsy vs systematic transrectal ultrasonography-guided biopsy alone was associated with improved detection of clinically significant PCa, despite substantial heterogeneity among trials. Prebiopsy MRI was associated with a reduced number of individual biopsy cores taken per procedure and with reduced adverse effects, and it potentially prevented unnecessary biopsies in some individuals. This evidence supports implementation of prebiopsy MRI into diagnostic pathways for suspected PCa.
Article
We propose a novel attention gate (AG) model for medical image analysis that automatically learns to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task. This enables us to eliminate the necessity of using explicit external tissue/organ localisation modules when using convolutional neural networks (CNNs). AGs can be easily integrated into standard CNN models such as VGG or U-Net architectures with minimal computational overhead while increasing the model sensitivity and prediction accuracy. The proposed AG models are evaluated on a variety of tasks, including medical image classification and segmentation. For classification, we demonstrate the use case of AGs in scan plane detection for fetal ultrasound screening. We show that the proposed attention mechanism can provide efficient object localisation while improving the overall prediction performance by reducing false positives. For segmentation, the proposed architecture is evaluated on two large 3D CT abdominal datasets with manual annotations for multiple organs. Experimental results show that AG models consistently improve the prediction performance of the base architectures across different datasets and training sizes while preserving computational efficiency. Moreover, AGs guide the model activations to be focused around salient regions, which provides better insights into how model predictions are made. The source code for the proposed AG models is publicly available.
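The gating idea in this abstract, suppressing irrelevant regions while keeping salient features, can be sketched for a single voxel. The additive form below (a ReLU over summed linear projections, followed by a sigmoid that yields one coefficient in (0, 1)) is an assumption about how such a gate is typically built; the abstract itself gives no formula. All names, shapes, and weights are hypothetical.

```python
import math

# Single-voxel sketch of an additive attention gate. The formula and all
# parameter names are assumptions for illustration, not the authors' code.

def attention_gate(x, g, w_x, w_g, b, psi, b_psi):
    """Return (alpha, gated_x) for one voxel.

    x     : skip-connection features, list of floats
    g     : gating features from a coarser scale, list of floats
    w_x   : one weight row per intermediate channel, applied to x
    w_g   : one weight row per intermediate channel, applied to g
    b     : bias per intermediate channel
    psi   : weights that collapse the intermediate channels to one logit
    b_psi : scalar bias of that logit
    """
    inter = []
    for k in range(len(b)):
        s = b[k]
        s += sum(w * xi for w, xi in zip(w_x[k], x))
        s += sum(w * gi for w, gi in zip(w_g[k], g))
        inter.append(max(0.0, s))  # ReLU
    logit = b_psi + sum(p * q for p, q in zip(psi, inter))
    alpha = 1.0 / (1.0 + math.exp(-logit))  # sigmoid, always in (0, 1)
    # The same coefficient scales every channel of the skip features.
    return alpha, [alpha * xi for xi in x]

if __name__ == "__main__":
    alpha, gated = attention_gate(
        x=[1.0, -2.0], g=[0.5],
        w_x=[[0.3, 0.1], [-0.2, 0.4]], w_g=[[0.7], [0.1]],
        b=[0.0, 0.0], psi=[1.0, -1.0], b_psi=0.0,
    )
    print(alpha, gated)
```

A coefficient near 0 suppresses the skip features at that voxel and a coefficient near 1 passes them through, which is how a gate of this kind can stand in for an explicit localisation module.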
Article
Background: There is growing interest to implement multiparametric magnetic resonance imaging (mpMRI) and MR-guided biopsy (MRGB) for biopsy-naïve men with suspected prostate cancer. Objective: Primary objective was to compare and evaluate an MRI pathway and a transrectal ultrasound-guided biopsy (TRUSGB) pathway in biopsy-naïve men with prostate-specific antigen levels of ≥3 ng/ml. Design, setting, and population: A prospective, multicenter, powered, comparative effectiveness study included 626 biopsy-naïve patients (from February 2015 to February 2018). Intervention: All patients underwent prebiopsy mpMRI followed by systematic TRUSGB. Men with suspicious lesions on mpMRI also underwent MRGB prior to TRUSGB. MRGB was performed using the in-bore approach. Outcome measurements and statistical analysis: Clinically significant prostate cancer (csPCa) was defined as grade group ≥2 (Gleason score ≥3+4) in any core. The main secondary objectives were the number of men who could avoid biopsy after nonsuspicious mpMRI, the number of biopsy cores taken, and oncologic follow-up. Differences in proportions were tested using McNemar's test with adjusted Wald confidence intervals for differences of proportions with matched pairs. Results and limitations: The MRI pathway detected csPCa in 159/626 (25%) patients and insignificant prostate cancer (insignPCa) in 88/626 patients (14%). TRUSGB detected csPCa in 146/626 patients (23%) and insignPCa in 155/626 patients (25%). Relative sensitivity of the MRI pathway versus the TRUSGB pathway was 1.09 for csPCa (p=0.17) and 0.57 for insignPCa (p<0.0001). The total number of biopsy cores reduced from 7512 to 849 (-89%). The MRI pathway enabled biopsy avoidance in 309/626 (49%) patients due to nonsuspicious mpMRI. Immediate TRUSGB detected csPCa in only 3% (10/309) of these patients, increasing to 4% (13/309) with 1-yr follow-up. At the same time, TRUSGB would overdetect insignPCa in 20% (63/309). 
"Focal saturation" by four additional perilesional cores to MRGB improved the detection of csPCa in 21/317 (7%) patients. Compared with the literature, our proportion of nonsuspicious mpMRI cases is significantly higher (27-36% vs 49%) and that of equivocal cases is lower (15-28% vs 6%). This is probably due to the high-quality standard in this study. Therefore, a limitation is the duplication of these results in less experienced centers. Conclusions: In biopsy-naïve men, the MRI pathway compared with the TRUSGB pathway results in an identical detection rate of csPCa, with significantly fewer insignPCa cases. In this high-quality standard study, almost half of men have nonsuspicious MRI, which is higher compared with other studies. Not performing TRUS biopsy is at the cost of missing csPCa only in 4%. Patient summary: We compared magnetic resonance imaging (MRI) with MRI-guided biopsy against standard transrectal ultrasound biopsy for the diagnosis of prostate cancer in biopsy-naïve men. Our results show that patients can benefit from MRI because biopsy may be omitted in half of men, and fewer indolent cancers are detected, without compromising the detection of harmful disease. Men also need fewer needles to make a diagnosis.
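The headline ratios in the abstract above can be re-derived from the counts it reports. The short check below uses only those counts; the variable and function names are hypothetical. McNemar's test is not reproduced, because the discordant-pair counts it requires are not given in the abstract.

```python
# Arithmetic check using only counts quoted in the abstract above.
# Names are illustrative; no data beyond the abstract is assumed.

N_PATIENTS = 626
MRI_CS, TRUS_CS = 159, 146            # clinically significant cancers found
MRI_INSIGN, TRUS_INSIGN = 88, 155     # insignificant cancers found
CORES_TRUS, CORES_MRI = 7512, 849     # total biopsy cores per pathway
NONSUSPICIOUS = 309                   # men with nonsuspicious mpMRI

def relative_sensitivity(detected_a, detected_b):
    """Ratio of detections by pathway A to detections by pathway B."""
    return detected_a / detected_b

def percent(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)

rel_cs = relative_sensitivity(MRI_CS, TRUS_CS)
rel_insign = relative_sensitivity(MRI_INSIGN, TRUS_INSIGN)
core_change_pct = round(100 * (CORES_MRI - CORES_TRUS) / CORES_TRUS)

if __name__ == "__main__":
    print(f"relative sensitivity, csPCa:     {rel_cs:.2f}")
    print(f"relative sensitivity, insignPCa: {rel_insign:.2f}")
    print(f"change in biopsy cores:          {core_change_pct}%")
    print(f"biopsy avoided:                  {percent(NONSUSPICIOUS, N_PATIENTS)}%")
```

Because both pathways were applied to the same 626 men, the relative sensitivity reduces to a ratio of detection counts.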
Article
Background Men suspected of having clinically significant prostate cancer (sPC) increasingly undergo prostate MRI. The potential of deep learning to provide diagnostic support for human interpretation requires further evaluation. Purpose To compare the performance of clinical assessment to a deep learning system optimized for segmentation trained with T2-weighted and diffusion MRI in the task of detection and segmentation of lesions suspicious for sPC. Materials and Methods In this retrospective study, T2-weighted and diffusion prostate MRI sequences from consecutive men examined with a single 3.0-T MRI system between 2015 and 2016 were manually segmented. Ground truth was provided by combined targeted and extended systematic MRI-transrectal US fusion biopsy, with sPC defined as International Society of Urological Pathology Gleason grade group greater than or equal to 2. By using split-sample validation, U-Net was internally validated on the training set (80% of the data) through cross validation and subsequently externally validated on the test set (20% of the data). U-Net-derived sPC probability maps were calibrated by matching sextant-based cross-validation performance to clinical performance of Prostate Imaging Reporting and Data System (PI-RADS). Performance of PI-RADS and U-Net were compared by using sensitivities, specificities, predictive values, and Dice coefficient. Results A total of 312 men (median age, 64 years; interquartile range [IQR], 58-71 years) were evaluated. The training set consisted of 250 men (median age, 64 years; IQR, 58-71 years) and the test set of 62 men (median age, 64 years; IQR, 60-69 years). In the test set, PI-RADS cutoffs greater than or equal to 3 versus cutoffs greater than or equal to 4 on a per-patient basis had sensitivity of 96% (25 of 26) versus 88% (23 of 26) at specificity of 22% (eight of 36) versus 50% (18 of 36). 
U-Net at probability thresholds of greater than or equal to 0.22 versus greater than or equal to 0.33 had sensitivity of 96% (25 of 26) versus 92% (24 of 26) (both P > .99) with specificity of 31% (11 of 36) versus 47% (17 of 36) (both P > .99), not statistically different from PI-RADS. Dice coefficients were 0.89 for prostate and 0.35 for MRI lesion segmentation. In the test set, coincidence of PI-RADS greater than or equal to 4 with U-Net lesions improved the positive predictive value from 48% (28 of 58) to 67% (24 of 36) for U-Net probability thresholds greater than or equal to 0.33 (P = .01), while the negative predictive value remained unchanged (83% [25 of 30] vs 83% [43 of 52]; P > .99). Conclusion U-Net trained with T2-weighted and diffusion MRI achieves similar performance to clinical Prostate Imaging Reporting and Data System assessment. © RSNA, 2019 Online supplemental material is available for this article. See also the editorial by Padhani and Turkbey in this issue.
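The per-patient figures in the abstract above follow directly from the counts it quotes, and the Dice coefficient it reports has a simple set-based definition. The sketch below recomputes the percentages from those counts and shows Dice on two toy masks; the toy masks and all names are hypothetical and are not data from the study.

```python
# Recomputes percentages from counts quoted in the abstract above.
# The toy voxel sets used for Dice are made up for illustration.

def percent(numerator, denominator):
    """Proportion as a percentage, rounded to the nearest integer."""
    return round(100 * numerator / denominator)

def dice(mask_a, mask_b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two sets of voxel indices."""
    if not mask_a and not mask_b:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))

# (true positives, positives, true negatives, negatives) in the test set
REPORTED = {
    "PI-RADS >= 3": (25, 26, 8, 36),
    "PI-RADS >= 4": (23, 26, 18, 36),
    "U-Net >= 0.22": (25, 26, 11, 36),
    "U-Net >= 0.33": (24, 26, 17, 36),
}

if __name__ == "__main__":
    for name, (tp, pos, tn, neg) in REPORTED.items():
        print(f"{name}: sensitivity {percent(tp, pos)}%, "
              f"specificity {percent(tn, neg)}%")
    print("PPV, PI-RADS >= 4 alone:      ", percent(28, 58), "%")
    print("PPV, coincident with U-Net:   ", percent(24, 36), "%")
    print("toy Dice:", dice({1, 2, 3, 4}, {3, 4, 5, 6}))
```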
Article
Multi-parametric MRI (mp-MRI) is considered the best non-invasive imaging modality for diagnosing prostate cancer (PCa). However, mp-MRI for PCa diagnosis is currently limited by the qualitative or semi-quantitative interpretation criteria, leading to inter-reader variability and a suboptimal ability to assess lesion aggressiveness. Convolutional neural networks (CNNs) are a powerful method to automatically learn the discriminative features for various tasks, including cancer detection. We propose a novel multi-class CNN, FocalNet, to jointly detect PCa lesions and predict their aggressiveness using Gleason score (GS). FocalNet characterizes lesion aggressiveness and fully utilizes distinctive knowledge from mp-MRI. We collected a prostate mp-MRI dataset from 417 patients who underwent 3T mp-MRI exams prior to robotic-assisted laparoscopic prostatectomy (RALP). FocalNet is trained and evaluated in this large study cohort with 5-fold cross-validation. In the free-response receiver operating characteristics (FROC) analysis for lesion detection, FocalNet achieved 89.7% and 87.9% sensitivity for index lesions and clinically significant lesions at 1 false positive per patient, respectively. For GS classification, evaluated by the receiver operating characteristics (ROC) analysis, FocalNet received the area under the curve (AUC) of 0.81 and 0.79 for the classifications of clinically significant PCa (GS≥3+4) and PCa with GS≥4+3, respectively. With the comparison to the prospective performance of radiologists using the current diagnostic guideline, FocalNet demonstrated comparable detection sensitivity for index lesions and clinically significant lesions, only 3.4% and 1.5% lower than highly experienced radiologists without statistical significance.
Article
Background The Prostate Imaging Reporting and Data System version 2 (PI‐RADSv2) has been in use since 2015; while interreader reproducibility has been studied, there has been a paucity of studies investigating the intrareader reproducibility of PI‐RADSv2. Purpose To evaluate both intra‐ and interreader reproducibility of PI‐RADSv2 in the assessment of intraprostatic lesions using multiparametric magnetic resonance imaging (mpMRI). Study Type Retrospective. Population/Subjects In all, 102 consecutive biopsy‐naïve patients who underwent prostate MRI and subsequent MR/transrectal ultrasonography (MR/TRUS)‐guided biopsy. Field Strength/Sequences Prostate mpMRI at 3T using endorectal with phased array surface coils (T2W MRI, DW MRI with ADC maps and b2000 DW MRI, DCE MRI). Assessment Previously detected and biopsied lesions were scored by four readers from four different institutions using PI‐RADSv2. Readers scored lesions during two readout rounds with a 4‐week washout period. Statistical Tests Kappa (κ) statistics and specific agreement (Po) were calculated to quantify intra‐ and interreader reproducibility of PI‐RADSv2 scoring. Lesion measurement agreement was calculated using the intraclass correlation coefficient (ICC). Results Overall intrareader reproducibility was moderate to substantial (κ = 0.43–0.67, Po = 0.60–0.77), while overall interreader reproducibility was poor to moderate (κ = 0.24, Po = 0.46). Readers with more experience showed greater interreader reproducibility than readers with intermediate experience in the whole prostate (P = 0.026) and peripheral zone (P = 0.002). Sequence‐specific interreader agreement for all readers was similar to the overall PI‐RADSv2 score, with κ = 0.24, 0.24, and 0.23 and Po = 0.47, 0.44, and 0.54 in T2‐weighted, diffusion‐weighted imaging (DWI), and dynamic contrast‐enhanced (DCE), respectively. Overall intrareader and interreader ICC for lesion measurement was 0.82 and 0.71, respectively. 
Data Conclusion PI‐RADSv2 provides moderate intrareader reproducibility, poor interreader reproducibility, and moderate interreader lesion measurement reproducibility. These findings suggest a need for more standardized reader training in prostate MRI. Level of Evidence: 2 Technical Efficacy: Stage 2
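The kappa statistic used in the abstract above corrects raw agreement for the agreement expected by chance. The sketch below computes Cohen's kappa and the plain proportion of agreement for two readers. The scores are invented for illustration, the function names are hypothetical, and the study's exact variants (four readers, specific agreement) may be computed differently.

```python
from collections import Counter

# Two-reader sketch of chance-corrected agreement. The scores below are
# invented; they are not data from the study summarised above.

def observed_agreement(scores_a, scores_b):
    """Proportion of lesions given the same score by both readers."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    return sum(a == b for a, b in zip(scores_a, scores_b)) / len(scores_a)

def cohen_kappa(scores_a, scores_b):
    """Cohen's kappa: (Po - Pe) / (1 - Pe).

    Undefined (division by zero) when chance agreement Pe equals 1,
    i.e. when both readers give every lesion the same single score.
    """
    n = len(scores_a)
    p_o = observed_agreement(scores_a, scores_b)
    counts_a, counts_b = Counter(scores_a), Counter(scores_b)
    # Chance agreement from each reader's marginal score frequencies.
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

if __name__ == "__main__":
    reader_a = [3, 4, 4, 5, 2, 3, 4, 5, 3, 4]  # hypothetical PI-RADS scores
    reader_b = [3, 4, 3, 5, 2, 4, 4, 4, 3, 5]
    print("observed agreement:", observed_agreement(reader_a, reader_b))
    print("Cohen's kappa:     ", round(cohen_kappa(reader_a, reader_b), 2))
```

Kappa falls below the raw agreement whenever the readers' score distributions make some agreement likely by chance, which is why the abstract reports both quantities.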