Scientic Reports | (2020) 10:17177 | 
www.nature.com/scientificreports
A critical evaluation of visual proportion of Gleason 4 and maximum cancer core length quantified by histopathologists
Lina Maria Carmona Echeverria1,2*, Aiman Haider3, Alex Freeman3,
Urszula Stopka‑Farooqui1, Avi Rosenfeld4, Benjamin S. Simpson1, Yipeng Hu5,
David Hawkes5, Hayley Pye1, Susan Heavey1, Vasilis Stavrinides1,2, Joseph M. Norris1,2,
Ahmed El‑Shater Bosaily2,6, Cristina Cardona Barrena1, Simon Bott7, Louise Brown8,
Nick Burns‑Cox9, Tim Dudderidge10, Alastair Henderson11, Richard Hindley12,
Richard Kaplan8, Alex Kirkham5,13, Robert Oldroyd14, Maneesh Ghei15, Raj Persad16,
Shonit Punwani5,13, Derek Rosario17, Iqbal Shergill18, Mathias Winkler19,
Hashim U. Ahmed19,20, Mark Emberton2 & Hayley C. Whitaker1
Gleason score 7 prostate cancer with a higher proportion of pattern 4 (G4) has been linked to genomic
heterogeneity and poorer patient outcome. The current assessment of G4 proportion uses estimation
by a pathologist, with a higher proportion of G4 more likely to trigger additional imaging and
treatment over active surveillance. This estimation method has been shown to have inter‑observer
variability. Fifteen patients with Prostate Grade Group (GG) 2 (Gleason 3 + 4) and fifteen patients with
GG3 (Gleason 4 + 3) disease were selected from the PROMIS study with 192 haematoxylin and eosin‑
stained slides scanned. Two experienced uropathologists assessed the maximum cancer core length
(MCCL) and G4 proportion using the current standard method (visual estimation) followed by detailed
digital manual annotation of each G4 area and measurement of MCCL (planimetric estimation) using
freely available software by the same two experts. We aimed to compare visual estimation of G4
and MCCL to a pathologist‑driven digital measurement. We show that the visual and digital MCCL
measurement differs by up to 2 mm in 76.6% of patients (23/30), with a high degree of agreement between the two measurements: visual assessment gave a median MCCL of 10 ± 2.70 mm (IQR 4, range 5–15 mm) compared with a digital median of 9.88 ± 3.09 mm (IQR 3.82, range 5.01–15.7 mm) (p = 0.64). The visual method for assessing G4 proportion over-estimates in all patients compared with digital measurements [median 11.2% (IQR 38.75, range 4.7–17.9%) vs 30.4% (IQR 18.37, range 12.9–50.76%)]. The discordance was higher as the amount of G4 increased (bias 18.71, CI 33.87–48.75, r 0.7, p < 0.0001). Further work on assessing actual G4 burden calibrated to clinical outcomes might lead to the use of differing G4 thresholds of significance if visual estimation is used, or to the incorporation of semi-automated methods for G4 burden measurement.

Molecular Diagnostics and Therapeutics Group, Division of Surgery and Interventional Science, University College
Division of Surgery and Interventional
Centre for Medical Image Computing, University College London, Charles Bell House,
Department of Urology, Hampshire
Department of Urology, Imperial College London, South
Imperial Prostate, Division of Surgery, Department of Surgery and
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Gleason pattern 4 (G4) prostate cancer is genetically distinct from Gleason pattern 3 and correlates with worse cancer control outcomes, either on active surveillance or following active treatment1,2. In 2013, Pierorazio et al. retrospectively reviewed 7850 radical prostatectomy specimens to investigate short-term biochemical outcome using a prognosis-based scoring system, the Prostate Grade Group (GG). By separating the Gleason sum 7 group into 3 + 4 and 4 + 3, the authors found that men with 4 + 3 disease had worse outcomes, defined as biochemical recurrence-free survival3. These findings were further validated and subsequently endorsed by the 2014 International Society of Urological Pathology (ISUP) Consensus Conference and the World Health Organization (WHO)4–6. Additionally, there is some uncertainty about whether %G4 in 3 + 4 cancers is also relevant to management and outcome7–9.
is new classication system calls for improved categorisation of the percentage of G4 (%G4) in Prostate
Cancer (PCa) to allow for better risk stratication and inform treatment decisions7,912. e distinction between
Gleason 3 + 4 (GG2) and 4 + 3 (GG3) is made when %G4 falls below or above 50%, respectively, as visually
estimated by a uropathologist5. Additionally, the maximum amount of cancer in any core (maximum cancer
core length, MCCL) has been used as a proxy for tumour volume estimation and can be used to dene clinical
signicance13,14.
Most histological prostate cancer burden studies have been performed on radical prostatectomy specimens or in men who have undergone transrectal systematic biopsies. The Prostate MR Imaging Study (PROMIS) included biopsy-naïve men whose prostates were systematically sampled every 5 mm, providing a unique opportunity to perform an in-depth, pathologist-driven annotation and digital analysis of the pathology slides and to compare this with the visually reported %G4 and MCCL15.
In this study, we aimed to compare %G4 and MCCL as visually estimated by pathologists in standard practice with the burden calculated from slides digitally annotated by the same pathologists, in thirty patients from the PROMIS study with GG2 and GG3 PCa.
Results
Comparison between visual and digital MCCL. When comparing visual versus digital MCCL, the difference was up to ± 2 mm in 23 of the 30 patients; taking into account both positive and negative values, the median difference was 0.58 mm (range −4.12 to +5.52 mm, t-test, p = 0.64) (Fig. 1A,B). Seven patients had measurements that differed by more than 2 mm between digital and visual estimation. When viewed as a density plot, the visual method tended to overestimate MCCL in the 3 + 4 group and underestimate it in the 4 + 3 group (Fig. 1C). To quantify the agreement between the two measurements, a Bland–Altman analysis was performed16. There was no systematic difference (bias) between the visual and digital assessment of MCCL, and there was no correlation between increasing MCCL and the level of disagreement between the two measurements (Supplementary Figure S1).
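The tally above (the median paired difference and how many patients fall within ± 2 mm) can be sketched as follows. This is an illustrative Python function with hypothetical names and made-up measurements, not the authors' R code. It takes the difference as digital − visual, following the formula given in the Figure 1B caption, and it counts a difference of exactly 2 mm as within tolerance (an assumption).

```python
from statistics import median


def mccl_agreement(visual_mm, digital_mm, tolerance_mm=2.0):
    """Summarise paired visual/digital MCCL measurements, one pair per patient.

    Differences are digital - visual. Returns the median difference and how
    many pairs lie within +/- tolerance_mm (boundary counted as within).
    """
    if len(visual_mm) != len(digital_mm) or not visual_mm:
        raise ValueError("visual and digital lists must be non-empty and paired")
    diffs = [d - v for v, d in zip(visual_mm, digital_mm)]
    within = sum(1 for x in diffs if abs(x) <= tolerance_mm)
    return {
        "median_difference_mm": median(diffs),
        "within_tolerance": within,
        "outside_tolerance": len(diffs) - within,
        "fraction_within": within / len(diffs),
    }


# Hypothetical measurements for four patients (not study data).
summary = mccl_agreement([8.0, 10.0, 6.0, 12.0], [9.12, 9.5, 8.6, 12.0])
print(summary)
```

Reporting both the count within tolerance and the signed median difference separates "how often the methods disagree" from "whether one method is systematically longer".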
Gleason 4. Visual assessment overestimated %G4 burden compared with the digital assessment in all cases (Fig. 2A). The 4 + 3 group had a mean difference of +26.6% (range 9.6–41.9%) compared with +10.8% (range 1.3–24.9%) for the 3 + 4 group (t-test, p = 1.9 × 10⁻⁵). The average %G4 in patients graded 3 + 4 was 11.2% (range 4.7–17.9%) compared with 30.4% (range 12.9–50.6%) in the 4 + 3 group (t-test, p < 0.0001). When the pathologists were asked to assess the overall Gleason score based on the digital images (visual %G4), both pathologists downgraded two patients from their original clinical grading of 4 + 3 to 3 + 4 (see yellow bars for patients 23 and 18 in Fig. 2A).
Using the established 50% G4 threshold to designate a 4 + 3 cancer, and based on the digital %G4 (blue bars), only one patient (number 19 in Fig. 2A) would be classified as 4 + 3. When the digital %G4 was divided into quartiles, two patients in the original 4 + 3 group (18 and 30) had less %G4 than the upper quartile of the 3 + 4 group. In other words, these two patients had less %G4 than the men with the highest %G4 in the original 3 + 4 group. Figure 2B shows the Bland–Altman analysis, which reveals a bias towards overestimation by visual estimation: all values lie above the line of complete agreement (complete agreement would give a difference of zero). The disagreement was larger when more than 20% G4 was present (R 0.79, p < 0.0001).
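The quartile comparison above can be sketched with the standard library. The patient IDs and values are hypothetical. The quantile method is an assumption: the study does not say how quartiles were computed, so the sketch uses `method="inclusive"`, which we understand to correspond to R's default (type 7) quantile.

```python
from statistics import quantiles


def below_upper_quartile(g4_34, g4_43):
    """Find 4 + 3 patients whose digital %G4 is below the 3 + 4 upper quartile.

    g4_34 and g4_43 map patient IDs to digital %G4 for the original 3 + 4 and
    4 + 3 groups. Quartiles use method="inclusive" (assumed to match R's
    default type 7); the study does not state its quartile method.
    """
    _, _, q3 = quantiles(list(g4_34.values()), n=4, method="inclusive")
    flagged = sorted(pid for pid, value in g4_43.items() if value < q3)
    return q3, flagged


# Hypothetical patients and values, for illustration only.
group_34 = {"A": 5.0, "B": 10.0, "C": 15.0, "D": 20.0, "E": 25.0}
group_43 = {"F": 18.0, "G": 35.0, "H": 19.5}
print(below_upper_quartile(group_34, group_43))
```

With only 15 patients per group, the choice of quantile method can move the upper quartile noticeably, so it is worth fixing and reporting.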
Examination of the index block (the block with the highest Gleason score and MCCL) revealed the same findings as seen with all tumour-containing cores (Fig. 3A). Visual assessment of the digitised images downgraded four patients' index blocks from 4 + 3 to 3 + 4 (patients 18, 30, 16 and 23). When examining the digital %G4, only two patients reached the 50% G4 threshold (27 and 19), so these would be the only two patients with 4 + 3 disease based on digital measurement. The Bland–Altman analysis revealed a similar trend to that of the overall %G4 analysis. One measurement showed complete agreement between the digital and visual estimates (patient 6 in Fig. 3A). One patient had a higher digital than visual estimate (patient 4 in Fig. 3A); this is represented by the only dot in the negative area of Fig. 3B. The disagreement between measurements increased as %G4 increased (R 0.6, p < 0.0001).
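Disagreement that grows with the amount of G4 is usually examined by relating each pair's difference to its mean, as the regression lines on the Bland–Altman plots do. The sketch below computes a Pearson correlation and a least-squares slope for that relationship. It is a stand-in written for illustration with hypothetical data, and we do not know whether the reported R values were derived in exactly this way.

```python
import math


def proportional_bias(visual, digital):
    """Relate each pair's disagreement to its size, Bland-Altman style.

    x = mean of the two methods, y = visual - digital (the axes of Figs. 2B
    and 3B). Returns the Pearson correlation r between x and y and the
    least-squares slope of y on x.
    """
    if len(visual) != len(digital) or len(visual) < 3:
        raise ValueError("need at least three paired measurements")
    xs = [(v + d) / 2 for v, d in zip(visual, digital)]
    ys = [v - d for v, d in zip(visual, digital)]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("means or differences do not vary")
    return sxy / math.sqrt(sxx * syy), sxy / sxx


# Hypothetical data in which visual is always 1.5 x digital.
digital = [10.0, 20.0, 30.0, 40.0]
visual = [1.5 * d for d in digital]
r, slope = proportional_bias(visual, digital)
print(round(r, 3), round(slope, 3))
```

A positive slope means the visual excess grows with the overall amount of G4, which is the pattern described above.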
When patients were classied using the clinical signicance criteria used in PROMIS in which MCCL and
Gleason score were combined to derive denitions 1 (≥ 4 + 3 or 6mm) and 2 (≥ 3 + 4 or 4mm) the digital
analysis reclassied four patients’ index block as lower risk13. When all blocks were compared using this system,
20 patients had discrepancy between the visual and digital classication, leading to reclassication to higher or
lower risk in six and fourteen patients, respectively (Supplementary FigureS2).
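The two PROMIS definitions each combine a Gleason criterion with an MCCL criterion, so a change in either measurement can move a patient across a definition. The hypothetical function below encodes them, reading "Gleason ≥ 4 + 3" as Grade Group ≥ 3 and "Gleason ≥ 3 + 4" as Grade Group ≥ 2 via the standard Grade Group mapping. That reading and all example values are our assumptions.

```python
def promis_significant(primary: int, secondary: int, mccl_mm: float) -> dict:
    """Apply PROMIS definitions 1 and 2 to one Gleason score and MCCL.

    Definition 1: Gleason >= 4 + 3 or MCCL >= 6 mm.
    Definition 2: Gleason >= 3 + 4 or MCCL >= 4 mm.
    Gleason criteria are read via Grade Groups (assumption): 4 + 3 -> GG3,
    3 + 4 -> GG2, score <= 6 -> GG1, score 8 -> GG4, score 9-10 -> GG5.
    """
    if not (1 <= primary <= 5 and 1 <= secondary <= 5):
        raise ValueError("Gleason patterns must be between 1 and 5")
    total = primary + secondary
    if total <= 6:
        grade_group = 1
    elif total == 7:
        grade_group = 2 if primary == 3 else 3
    elif total == 8:
        grade_group = 4
    else:
        grade_group = 5
    return {
        "definition_1": grade_group >= 3 or mccl_mm >= 6.0,
        "definition_2": grade_group >= 2 or mccl_mm >= 4.0,
    }


# Hypothetical patient: visual 4 + 3 with 7.0 mm, digital 3 + 4 with 5.5 mm.
print(promis_significant(4, 3, 7.0))
print(promis_significant(3, 4, 5.5))
```

In this made-up case the patient meets definition 1 on visual assessment but not on digital assessment, while still meeting definition 2 either way.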
Discussion
We have presented an in-depth analysis of 30 men from the PROMIS trial to establish the level of agreement between gold-standard visual estimation of MCCL and %G4 and measurements from digitally annotated images. This study has several limitations. The presence of cribriform pattern was not recorded separately or included in the final analysis. In addition, the pathologists retrospectively assessed %G4 on annotated images, introducing potential bias in their assessment. Finally, no long-term follow-up currently exists for the PROMIS study, so we are unable to determine the prognostic significance of our findings.

Thresholds of 4 mm and 6 mm have been shown to correlate with 95% of lesions with a volume greater than 0.2 mL and 0.5 mL, respectively13. Simopoulos et al. found that an MCCL greater than 10 mm can predict T3 disease and large tumour volumes, with a hazard ratio (HR) of 5.7314. Using these thresholds and taking into account the difference between the MCCL measurements, there is a potential impact on the treatment options offered. For instance, patients reclassified as having an MCCL < 6 mm could be candidates for active surveillance instead of radical therapy (patients 6, 18, 19 and 20) (Fig. 1A,B). Interestingly, visual measurement tended to overestimate MCCL in men with 3 + 4 disease and to underestimate it in men with 4 + 3 disease (Fig. 1C). Despite these differences, the Bland–Altman analysis showed good concordance between the two measurements; thus, the accuracy of the MCCL is not compromised when a digital tool is used.

Figure 1. Objective measurement of MCCL shows a discrepancy with the pathologists' visual estimation. (A) The difference between visual and digital MCCL shows under-estimation by visual compared with digital MCCL. Bar plot of visual MCCL in yellow and digital MCCL in blue, organised by Gleason score. MCCL is plotted on the y-axis; each patient is plotted on the x-axis. Red dashed lines represent a threshold of 6 mm as the MCCL criterion for significance (PROMIS definition 1). Patients highlighted in red were over- or underestimated in the original visual measurement. (B) Waterfall plot of the difference between visual and digital measurements, calculated as digital MCCL − visual MCCL, by Gleason score (y-axis); patients are plotted on the x-axis. Visual Gleason score is represented in yellow for 3 + 4 and blue for 4 + 3. Bars with a negative value represent measurements where the visual MCCL was shorter than the digital MCCL (underestimation). Bars with a positive value represent cases where the visual MCCL was longer than the digital MCCL. The difference in 80% of cases is within ± 2 mm (n = 24); red dashed lines mark differences of −2 and 2 mm. (C) Density plots of the MCCL distribution for visual and digital measurements by Gleason score. The y-axis represents the kernel density estimate; the x-axis contains MCCL values. Visual MCCL is represented in yellow and digital MCCL in blue. The mean visual MCCL was 9.53 mm (5–15 mm) and the mean digital MCCL was 9.88 mm (5.01–15.74 mm).

Figure 2. Visual Gleason 4 appraisal overestimates burden of disease. (A) Bar plot of the proportion of Gleason 4 estimated visually (average of two uropathologists, yellow) and digitally (blue). %G4 is plotted on the y-axis; each patient is plotted on the x-axis. A threshold of 50% G4 for clinical significance is shown as a red dashed line. Patient numbers on the x-axis are in bold and underlined if the digital measurement of their %G4 would lead to reclassification. The patient marked with * has 50% G4 in the digital measurement. (B) Bland–Altman plot showing the difference in measurement on the y-axis as visual %G4 − digital %G4. The x-axis represents the mean %G4 of both techniques, (visual %G4 + digital %G4)/2. The bold black line represents complete agreement at 0. The purple dashed line corresponds to the bias at 18.71; the dotted purple line corresponds to the bias confidence interval (33.87–48.75). The upper and lower limits of agreement are shown as dash and dotted blue lines, with their confidence intervals as dotted blue lines. Upper limit of agreement: 41.31 (33.87–48.75); lower limit of agreement: −3.87 (−11.31 to 3.56). The regression line is plotted as a continuous blue line.
In our study, the visual estimation of %G4 differed from the digital one; accurate measurement of G4 burden has been shown to help risk-stratify patients9,17. In a study by de Souza et al., 20% of Gleason 3 + 4 tumours had more extensive G4 disease than the first quartile of 4 + 3 tumours in radical prostatectomy specimens18. In 2014, Huang et al. found that 45% of men with ≤ 5% G4 on prostate biopsy had insignificant cancer at radical prostatectomy7. Additionally, several papers have shown that tumours with lower %G4 behave more like GG1 tumours3,8,9,19–22.
Figure3. Objective measurement of Gleason 4 burden shows a discrepancy between visual measurement
and the digital measurement for the index block. (A) Visual %G4 for the index block 30 patients shown in
yellow overlaid with digital %G4 in blue. Patients separated by original Gleason grade grouping; 3 + 4 or
4 + 3, and organized by visual %G4. A threshold of 50% G4 for clinical signicance is shown as a red dashed
line. Patient number on the x-axis highlighted in bold and underlined if the objective measurement of their
%G4 would cause reclassication. (B) Bland–Altman plot representing the dierence in measurement in the
y-axis as visual %G4−digital %G4. e x-axis represents the mean %G4 measurement of both techniques as
(visual %G4 + digital %G4)/2. e bold black line represents complete agreement at 0. e purple dashed line
corresponds to the bias at 14.36; the dotted purple line corresponds to the bias condence interval (9.78–18.94).
Dash and dotted blue lines correspond to the upper and lower limit of agreement and condence intervals are
plotted with dotted blue lines. Upper limit of agreement: 38.40 (30.49–46.32), lower limit of agreement: −9.67
(−17.59 to −1.76). e regression line is plotted as a continuous blue line.
In this study, we found that visual estimation always overestimated the amount of G4 compared with digitally calculated %G4. For all of these patients, reclassification of %G4 would potentially lead to a change in treatment options and imaging follow-up. For example, patient 18 was reclassified after digital assessment and would be downgraded from 4 + 3 with an MCCL > 6 mm to 3 + 4 with an MCCL < 6 mm (Fig. 2A). The same was found when we examined the index block only.
Integration of %G4 reporting in biopsies and radical prostatectomy specimens is already recommended6. The findings of our study suggest that a re-assessment of %G4 estimation may be required. Reclassification of G4 could lead to a re-evaluation of previously published biomarker and clinical studies and redefine the reference standard for research. The heterogeneity of studies on the prognostic importance of Gleason 3 + 4 compared with Gleason 4 + 3 disease may reflect uncertainty about how much G4 is actually present in specimens; this is particularly relevant to treatments such as radiotherapy or ablation, where there is no whole-mount radical prostatectomy specimen to analyse.
As we move toward the inclusion of digital pathology in standard clinical practice, it will be essential to investigate the differences between human and digital estimation of key pathological parameters and the potential impact these could have on patient care. This will involve adapting the current visual classification to digitally derived grading. This study does not aim to highlight human error or to criticise the visual estimation of pathologists, but to encourage the use of technology to improve our understanding of MCCL and G4 burden in prostate cancer, and to seek novel methods to quantify and study the disease. Whilst this type of analysis would currently be challenging to embed directly into clinical practice because of the time taken to contour each region, work is already ongoing to automate this process23–28. Identifying relatively overlooked elements, such as %G4, improves the accuracy of machine learning models29; as such, future algorithms could be trained to specifically identify %G4 rather than GG alone.
Further research is also needed to develop and validate new thresholds of the burden of G4 against large
cohorts with medium and long-term cancer control outcomes.
Materials and methods
Patients. Two hundred and twenty-six patients from University College London Hospital took part in the PROMIS trial. Men underwent 5 mm sampling using a transperineal template mapping procedure. Of 113 men with Gleason 7 PCa, 85 had significant disease (PROMIS definition 1: Gleason 4 + 3 or MCCL ≥ 6 mm). Fifteen patients with Gleason 3 + 4 and fifteen patients with 4 + 3 disease were selected from these 85 using a random number generator (Table 1; Fig. 4A). A mean of 14.2 ± 8.05 cores per patient (IQR 9, range 2–34) were taken. In total, 192 H&E slides from these 30 patients were scanned using a NanoZoomer-SQ digital slide scanner (Hamamatsu).
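An equal-sized random draw from each Gleason group can be sketched as below. The study states only that "a random number generator" was used, so `random.Random`, the seed and sampling without replacement are our assumptions, and the patient IDs are hypothetical (sized like the UCH groups in Table 1).

```python
import random


def select_patients(ids_34, ids_43, per_group=15, seed=None):
    """Draw the same number of patients at random from each Gleason 7 group.

    Sketch only: generator, seed and sampling without replacement are
    assumptions, not the study's documented procedure.
    """
    rng = random.Random(seed)  # seeded generator makes the draw reproducible
    chosen_34 = sorted(rng.sample(list(ids_34), per_group))
    chosen_43 = sorted(rng.sample(list(ids_43), per_group))
    return chosen_34, chosen_43


# Hypothetical IDs sized like the UCH groups in Table 1 (67 and 18 patients).
cohort_34 = [f"P34-{i:03d}" for i in range(1, 68)]
cohort_43 = [f"P43-{i:03d}" for i in range(1, 19)]
chosen_34, chosen_43 = select_patients(cohort_34, cohort_43, seed=2020)
print(len(chosen_34), len(chosen_43))
```

Recording the seed alongside the cohort list would let the selection be reproduced exactly.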
Digital scan annotations and data collection. Two experienced UCH uropathologists, with 16 years' (AF) and 1.5 years' (AH) experience, were involved in this study. The 30 cases included in this study were originally reported by AF as part of the PROMIS trial. The pathologists were blinded to the PROMIS Gleason score; scans were shown in random order and assessed by both uropathologists (AF/AH) using NDP.View 2 software. Each slide was systematically assessed as follows:
1. Each core was numbered from left to right.
2. The length of cancer was measured (Fig. 4B).
3. Areas containing any cancer were contoured in yellow (Fig. 4C).
4. Areas containing G4 were contoured in black (Fig. 4C).

Table 1. Gleason 7 patients in the PROMIS cohort and the 30 patients selected for in-depth analysis. The table compares the Gleason 7 patients from University College London Hospital (UCH) within the PROMIS study (UCH PROMIS cohort: 4 + 3 or ≥ 6 mm MCCL) with the 30 selected patients. The number of patients per group by Gleason score is given as n, with the percentage in parentheses. Age, prostate volume, presenting PSA and PSA density are given as mean values, with the range in parentheses. Age is in years, prostate volume in cubic centimetres (cc), PSA in ng/mL, and PSA density is calculated as PSA/prostate volume. Likert scores are given as the number of patients, with the percentage in parentheses; Likert NA indicates that no Likert score was given. p values compare 3 + 4 vs 4 + 3 within each cohort: *unpaired t-test; **Mann–Whitney test.

Characteristic | UCH PROMIS 3 + 4 | UCH PROMIS 4 + 3 | p value | Selected 3 + 4 | Selected 4 + 3 | p value
n | 67 (78%) | 18 (22%) | | 15 (50%) | 15 (50%) |
Age (years) | 63 (43–77) | 64 (48–79) | 0.44* | 62 (50–72) | 65 (48–79) | 0.30*
Prostate volume (cc) | 38.34 (16–83) | 38.18 (26–55) | 0.65** | 34 (21–62) | 38 (26–55) | 0.11**
Presenting PSA (ng/mL) | 7.46 (1.30–13) | 10.76 (5.7–15) | < 0.0001* | 7.60 (4.9–10.1) | 10.74 (6.2–15) | 0.0005*
PSA density (PSAd) | 0.22 (0.06–0.59) | 0.29 (0.11–0.53) | 0.002** | 0.24 (0.10–0.38) | 0.29 (0.11–0.53) | 0.14*
Likert 2 | 1 (1.4%) | 0 | | 0 | 0 |
Likert 3 | 8 (11.9%) | 3 (16.6%) | | 1 (6.6%) | 0 |
Likert 4 | 21 (31.3%) | 3 (16.6%) | | 6 (40%) | 4 (26.6%) |
Likert 5 | 5 (7.46%) | 12 (66.6%) | | 8 (53.3%) | 11 (73.3%) |
Likert NA | 4 (5.87%) | 0 | | 0 | 1 (6.6%) |
The MCCL was reported prospectively by the pathologists during the trial using the integrated ruler in the microscope; this measurement was assigned as the 'visual' MCCL. In PROMIS, the MCCL was reported in two ways: taking into account intervening benign glands (ISUP) and measuring cancer only. For the purposes of this study, the ISUP measurement was used. The 'digital' MCCL was derived as follows: if a core was straight, a single measurement was performed; if there was any curvature, manual sequential measurements were performed along the core axis and combined to give the final measurement.
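The sequential-measurement rule amounts to the length of a polyline traced along the core axis. In the study this was done by hand in NDP.View 2; the function below is a hypothetical stand-in with made-up coordinates, not that software's API.

```python
import math


def polyline_length(points):
    """Length of a path traced through consecutive (x, y) points, in mm.

    Two points give a single straight measurement; extra points along a
    curved core axis split it into segments whose lengths are summed.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


# Hypothetical coordinates (mm) for a straight and a bent core.
print(polyline_length([(0.0, 0.0), (3.0, 4.0)]))
print(polyline_length([(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]))

# Summing the three segment lengths reported in Fig. 4B.
print(round(sum([2.53, 2.11, 4.48]), 2))
```

A single straight chord across a curved core is always shorter than the traced path, which is one reason a straight-ruler reading can under-measure a bent core.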
%G4 was not collected as part of the original trial, so the pathologists retrospectively estimated the %G4 per patient visually, to the nearest 10%, using the annotated images. This was assigned as the 'visual' %G4. For digital %G4, the software performs instant area measurements. The resulting areas (for each yellow and black contour) were prospectively recorded, and an objective percentage of G4 was calculated as shown in equation 1 (Fig. 4D). This total was assigned as the 'digital' %G4. A separate analysis of the index block was also performed. The index block was defined as the block with the highest Gleason score and MCCL that was also concordant with the index lesion on multiparametric MRI (mpMRI).
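Equation 1 appears only in Fig. 4D and is not reproduced in the text. Assuming it is the total G4 (black-contoured) area divided by the total cancer (yellow-contoured) area, expressed as a percentage, i.e. %G4 = 100 × ΣA(G4) / ΣA(cancer), it can be sketched as below. NDP.View 2 reports contour areas directly; the shoelace-formula area function is only a stand-in, and it assumes simple, non-overlapping polygons with each G4 contour lying inside a cancer contour. Function names and coordinates are hypothetical.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its (x, y) vertices (shoelace formula)."""
    n = len(vertices)
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def percent_g4(cancer_contours, g4_contours):
    """Digital %G4: total G4 (black) area over total cancer (yellow) area."""
    cancer_area = sum(polygon_area(c) for c in cancer_contours)
    if cancer_area == 0:
        raise ValueError("no cancer area annotated")
    g4_area = sum(polygon_area(c) for c in g4_contours)
    return 100.0 * g4_area / cancer_area


# Hypothetical contours (mm): a 4 x 1 cancer region containing a 1 x 1 G4 region.
cancer = [[(0, 0), (4, 0), (4, 1), (0, 1)]]
g4 = [[(0, 0), (1, 0), (1, 1), (0, 1)]]
print(percent_g4(cancer, g4))
```

Summing areas across all contours before dividing weights each region by its size, which is what distinguishes an area-based %G4 from an average of per-core percentages.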
Statistical analysis. Patients were divided according to the original Gleason score from the PROMIS trial into 3 + 4 and 4 + 3 groups. The routinely performed 'visual' estimation of both measurements was used as the reference standard for all comparisons. When the two groups being compared were normally distributed (Shapiro–Wilk test) with equal variances (F-test), a Student's t-test was applied; whenever data were not normally distributed, a Mann–Whitney test was performed. To quantify the agreement between the two methods, the Bland–Altman method was used, with the visual method as the standard for comparison; bias was defined as the mean of the differences between the two methods, and 95% limits of agreement were calculated. All analyses were performed using R: A Language and Environment for Statistical Computing30. The Bland–Altman analysis was performed using the blandr package for R31.
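The agreement statistics described above can be sketched in plain Python. This is not the blandr implementation and has not been checked against its output. It assumes differences are taken as visual − digital (as in Figs. 2B and 3B), 95% limits of agreement at bias ± 1.96 SD, and Bland and Altman's usual approximate standard errors for the bias (s/√n) and for each limit (√(3s²/n)). The example data are hypothetical.

```python
import math
import statistics


def bland_altman(visual, digital, z=1.96, t_crit=2.045):
    """Bias and 95% limits of agreement, with approximate confidence intervals.

    Differences are visual - digital, so a positive bias means visual
    estimates are larger. t_crit defaults to the two-sided 95% Student t
    quantile for 29 degrees of freedom (n = 30) and should be changed for
    other sample sizes.
    """
    if len(visual) != len(digital) or len(visual) < 2:
        raise ValueError("need two paired lists with at least two pairs")
    diffs = [v - d for v, d in zip(visual, digital)]
    n = len(diffs)
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    lower, upper = bias - z * sd, bias + z * sd
    se_bias = sd / math.sqrt(n)            # standard error of the mean difference
    se_limit = math.sqrt(3 * sd ** 2 / n)  # approximate SE of each limit
    half_bias = t_crit * se_bias
    half_limit = t_crit * se_limit
    return {
        "bias": bias,
        "bias_ci": (bias - half_bias, bias + half_bias),
        "lower_loa": lower,
        "lower_loa_ci": (lower - half_limit, lower + half_limit),
        "upper_loa": upper,
        "upper_loa_ci": (upper - half_limit, upper + half_limit),
    }


# Hypothetical paired %G4 values (not study data).
result = bland_altman([30.0, 45.0, 20.0, 60.0], [20.0, 30.0, 15.0, 35.0])
print(result["bias"], result["lower_loa"], result["upper_loa"])
```

For the study's n = 30 the default t_crit applies; for other sample sizes, the matching quantile can be obtained, for example, with `scipy.stats.t.ppf(0.975, n - 1)` if SciPy is available.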
Figure 4. Patient selection and methods of digital manual annotation. (A) Euler diagram representing the selection of 30 patients for in-depth analysis. (B) NDP.View 2 image of a scanned H&E slide of prostate cores from transperineal biopsies, with nuclei shown in blue and other structures in pink. From left to right: MCCL measurement in a straight core of 8.5 mm. The approximate visual pathologist measurement is marked with a red line (7.76 mm). Following the axis of the core, three measurements in black of 2.53 mm, 2.11 mm and 4.48 mm give a total of 9.12 mm for the digital measurement. (C) Three prostate cores; areas with cancer were contoured in yellow and areas with Gleason 4 in black. A close-up of the contours is shown in the black box. Non-contoured areas correspond to benign prostatic tissue. (D) Equation used to derive percentage Gleason 4.

Ethical approval. All clinical samples were collected from University College London Hospital NHS Trust patients who had provided informed consent. Ethics committee approval was granted by the National Research Ethics Service Committee London (reference 11/LO/0185). Access to biobank samples was obtained [reference (EC/21.16)]. All analyses were performed in accordance with relevant guidelines and regulations.
Received: 30 April 2020; Accepted: 28 August 2020
References
1. Rubin, M. A., Girelli, G. & Demichelis, F. Genomic correlates to the newly proposed grading prognostic groups for prostate cancer.
Eur. Urol. 69, 557–560 (2016).
2. Sowalsky, A. G. et al. Gleason score 7 prostate cancers emerge through branched evolution of clonal Gleason pattern 3 and 4. Clin.
Cancer Res. 23, 3823–3833 (2017).
3. Pierorazio, P. M., Walsh, P. C., Partin, A. W., Epstein, J. I. & Epstein, J. Prognostic Gleason grade grouping: data based on the
modied Gleason scoring system. BJU Int. 111, 753–760 (2013).
4. Epstein, J. I. et al. A contemporary prostate cancer grading system: a validated alternative to the Gleason score. Eur. Urol. 69,
428–435 (2016).
5. Epstein, J. I. et al. e 2014 International Society of Urological Pathology (ISUP) consensus conference on Gleason grading of
prostatic carcinoma denition of grading patterns and proposal for a new grading system. Am. J. Surg. Pathol. 40, 244–252 (2016).
6. Humphrey, P. A., Moch, H., Cubilla, A. L., Ulbright, T. M. & Reuter, V. E. e 2016 WHO classication of tumours of the urinary
system and male genital organs—Part B: prostate and bladder tumours. Eur. Urol. 70, 106–119 (2016).
7. Huang, C. C. et al. Gleason score 3+4=7 prostate cancer with minimal quantity of Gleason pattern 4 on needle biopsy is associated
with low-risk tumor in radical prostatectomy specimen. Am. J. Surg. Pathol. 38, 1096–1101 (2014).
8. Sato, S. et al. Cases having a Gleason Score 3+4=7 with <5% of Gleason pattern 4 in prostate needle biopsy show similar failure-
free survival and adverse pathology prevalence to Gleason Score 6 cases in a radical prostatectomy cohort. Am. J. Surg. Pathol. 43,
1560–1565 (2019).
9. Sauter, G. et al. Clinical utility of quantitative Gleason grading in prostate biopsies and prostatectomy specimens. Eur. Urol. 69,
592–598 (2016).
10. Cole, A. I. et al. Prognostic value of percent Gleason grade 4 at prostate biopsy in predicting prostatectomy pathology and recurrence. J. Urol. 196, 405–411 (2016).
11. Stark, J. R. et al. Gleason score and lethal prostate cancer: does 3 + 4 = 4 + 3?. J. Clin. Oncol. 27, 3459–3464 (2009).
12. Berney, D. M. et al. The percentage of high grade disease in prostate biopsies significantly improves on grade groups in prediction of prostate cancer death. Histopathology 75(4), 589–597 (2019).
13. Ahmed, H. U. et al. Characterizing clinically significant prostate cancer using template prostate mapping biopsy. J. Urol. 186, 458–464 (2011).
14. Simopoulos, D. N. et al. Cancer core length from targeted biopsy: an index of prostate cancer volume and pathological stage. BJU Int. https://doi.org/10.1111/bju.14691 (2019).
15. Ahmed, H. U. et al. Diagnostic accuracy of multi-parametric MRI and TRUS biopsy in prostate cancer (PROMIS): a paired validating confirmatory study. Lancet 389, 815–822 (2017).
16. Bland, J. M. & Altman, D. G. Applying the right statistics: analyses of measurement studies. Ultrasound Obstet. Gynecol. 22, 85–93 (2003).
17. Sharma, M. & Miyamoto, H. Percent Gleason pattern 4 in stratifying the prognosis of patients with intermediate-risk prostate
cancer. Transl. Androl. Urol. 7, S484–S489 (2018).
18. de Souza, M. F., de Azevedo Araujo, A. L. C., da Silva, M. T. & Athanazio, D. A. The Gleason pattern 4 in radical prostatectomy specimens in current practice—quantification, morphology and concordance with biopsy. Ann. Diagn. Pathol. 34, 13–17 (2018).
19. Berg, K. D., Roder, M. A., Brasso, K., Vainer, B. & Iversen, P. Primary Gleason pattern in biopsy Gleason score 7 is predictive of
adverse histopathological features and biochemical failure following radical prostatectomy. Scand. J. Urol. 48, 168–176 (2014).
20. Helpap, B. et al. The significance of accurate determination of Gleason score for therapeutic options and prognosis of prostate cancer. Pathol. Oncol. Res. 22, 349–356 (2016).
21. Miyake, H. et al. Prognostic significance of primary Gleason pattern in Japanese men with Gleason score 7 prostate cancer treated with radical prostatectomy. Urol. Oncol. Semin. Orig. Invest. 31, 1511–1516 (2013).
22. Khoddami, S. M. et al. Predictive value of primary Gleason pattern 4 in patients with Gleason score 7 tumours treated with radical
prostatectomy. BJU Int. 94, 42–46 (2004).
23. Esteva, A. et al. A guide to deep learning in healthcare. Nat. Med. 25(1), 24–29 (2019).
24. Ström, P. et al. Artificial intelligence for diagnosis and grading of prostate cancer in biopsies: a population-based, diagnostic study. Lancet Oncol. https://doi.org/10.1016/S1470-2045(19)30738-7 (2020).
25. Nir, G. et al. Automatic grading of prostate cancer in digitized histopathology images: learning from multiple experts. Med. Image
Anal. 50, 167–180 (2018).
26. Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat.
Med. 25, 1301–1309 (2019).
27. Arvaniti, E. et al. Automated Gleason grading of prostate cancer tissue microarrays via deep learning. Sci. Rep. 8, 12054 (2018).
28. Nagpal, K. et al. Development and validation of a deep learning algorithm for improving Gleason scoring of prostate cancer. npj
Digit. Med. 2, 1–10 (2019).
29. Rosenfeld, A., Graham, D. G., Hamoudi, R., Butawan, R., Eneh, V., Khan, S. et al. MIAT: a novel attribute selection approach to better predict upper gastrointestinal cancer. in Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015 (Institute of Electrical and Electronics Engineers Inc., 2015). https://doi.org/10.1109/DSAA.2015.7344866.
30. R Core Team. R: A Language and Environment for Statistical Computing (2019). https://www.r-project.org/.
31. Datta, D. blandr: A Bland–Altman Method Comparison Package for R (2017). https://doi.org/10.5281/zenodo.824514.
Author contributions
L.M.C.E., H.C.W., H.U.A., and M.E. conceived and designed the study. L.M.C.E., U.S., A.H., A.F. and C.C.B. collected the data. L.M.C.E. analysed the data with guidance from Y.H., A.R. and L.B. All authors were involved in writing the paper and had final approval of the submitted and published versions.
Competing interests
Ahmed currently receives funding from the Wellcome Trust, Prostate Cancer UK, Medical Research Council (UK), Cancer Research UK, The Urology Foundation, BMA Foundation, Imperial Healthcare Charity, Sonacare Inc., Trod Medical and Sophiris Biocorp for trials in prostate cancer. Ahmed is a paid medical consultant for Sophiris Biocorp, Sonacare Inc., BTG and Boston for trials work and proctoring. Emberton receives funding from NIHR-i4i, MRC, Cancer Research UK, Sonacare Inc. and Sophiris Biocorp for trials in prostate cancer. Emberton is a medical consultant to Sonacare Inc., Sophiris Biocorp, Steba Biotech, Exact Imaging and Profound Medical. Ahmed and Emberton are proctors for HIFU and are paid for training other surgeons in this procedure. Ahmed is a proctor for cryotherapy using the Galil/BTG system. Emberton is a proctor for irreversible electroporation (Nanoknife). The remaining authors declare no conflicts of interest.
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41598-020-73524-z.
Correspondence and requests for materials should be addressed to L.M.C.E.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access is article is licensed under a Creative Commons Attribution 4.0 International
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or
format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons licence, and indicate if changes were made. e images or other third party material in this
article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.
© The Author(s) 2020
... Cores were measured immediately after each biopsy (to minimise desiccation) by an independent operator (J.B.), in consensus with an operator not performing the biopsy procedure, to minimise bias. Each core was straightened (if necessary) on a piece of white card; the core length (CL; the total length of the biopsy core, target plus background material) and the target length (TL; target material only) were marked alongside the core with an ultrafine (0.1 mm) black-tip pen (adapted from [13]) and measured with a ruler to the nearest 0.5 mm. ...
Background Magnetic resonance imaging (MRI) can be used to target tumour components in biopsy procedures, and the ability to precisely correlate histology with MRI signal is crucial for imaging biomarker validation. Robotic MRI/computed tomography (CT) fusion biopsy offers this potential without in-gantry biopsy, although the technique requires development. Methods Test–retest T1 and T2 relaxation times, attenuation (Hounsfield units, HU), and biopsy core quality were prospectively assessed (January–December 2021) in a range of gelatin, agar, and mixed gelatin/agar solutions of differing concentrations on days 1 and 8 after manufacture. Suitable materials were chosen, and four biopsy phantoms were constructed with twelve spherical 1–3-cm diameter targets visible on MRI but not on CT. A technical pipeline was developed, and intraoperator and interoperator reliability was tested in four operators performing a total of 96 biopsies. Statistical analysis included Bland–Altman analysis of T1, T2, and HU repeatability, the Dice similarity coefficient (DSC), and intraoperator and interoperator reliability. Results T1, T2, and HU repeatability had 95% limits of agreement of 8.3%, 3.4%, and 17.9%, respectively. The phantom was highly reproducible, with a DSC of 0.93 versus 0.92 for scanning the same or two different phantoms, respectively. The hit rate was 100% (96/96 targets), and all operators performed robotic biopsies using a single volumetric acquisition. The fastest procedure time was 32 min for all 12 targets. Conclusions A reproducible biopsy phantom was developed, validated, and used to test robotic MRI/CT-fusion biopsy. The technique was highly accurate, reliable, and achievable within clinically acceptable timescales, meaning it is suitable for clinical application.
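The repeatability and overlap statistics named in the abstract above can be illustrated in a few lines. The following is a minimal Python sketch, not the study's analysis code: it computes the Bland–Altman bias and 95% limits of agreement (bias ± 1.96 × SD of the paired differences) and a Dice similarity coefficient between two segmentations held as sets of voxel indices. The function names, the set-of-voxels representation and the `percent` option (each difference expressed relative to its pair mean) are assumptions for illustration; the abstract does not state how its percentage limits were derived.

```python
from statistics import mean, stdev


def bland_altman_limits(a, b, percent=False, z=1.96):
    """Return (bias, lower, upper): Bland-Altman bias and limits of agreement.

    a, b: paired measurements of the same quantity (e.g. test and retest).
    percent: if True, each difference is expressed as a percentage of the
             pair mean (an assumed convention, not taken from the study).
    z: multiplier for the limits; 1.96 gives approximate 95% limits.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired measurements")
    if percent:
        diffs = [100.0 * (x - y) / ((x + y) / 2.0) for x, y in zip(a, b)]
    else:
        diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias, bias - z * sd, bias + z * sd


def dice(voxels_a, voxels_b):
    """Dice similarity coefficient 2*|A & B| / (|A| + |B|) for two voxel sets."""
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        return 1.0  # convention: two empty segmentations count as identical
    return 2.0 * len(a & b) / (len(a) + len(b))


# Hypothetical test-retest values and two tiny voxel masks.
print(bland_altman_limits([10.0, 12.0, 14.0], [9.0, 11.0, 12.0]))
print(dice({(0, 0), (0, 1), (1, 1)}, {(0, 1), (1, 1), (2, 2)}))
```

Expressing differences relative to the pair mean is a common choice when the spread of differences grows with the size of the measurement; whether that holds for these T1, T2 or HU values is not established here.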
... grade groups 1 vs. 2) and that artificial intelligence performs remarkably well in this regard, highlighting the advantage of artificial intelligence in making consistent decisions. In addition to the Gleason score, it is recommended to report the proportion of tumor in the biopsy and the proportion of Gleason pattern 4; the latter is a somewhat tedious task that has been shown to be concerningly variable [9] and, in one study, was overestimated compared with digital analysis (mean bias of +14%) [10]. ...
Purpose of review: Artificial intelligence has entered mainstream applications of daily life, but the clinical deployment of artificial intelligence-supported histological analysis is still in its infancy. Recent years have seen a surge in technological advances in the use of artificial intelligence in pathology, in particular in the diagnosis of prostate cancer. Recent findings: We review first impressions of how artificial intelligence impacts the clinical performance of pathologists in the analysis of prostate tissue. Several challenges in the deployment of artificial intelligence remain to be overcome. Finally, we discuss how artificial intelligence can help generate new knowledge that is interpretable by humans. Summary: It is evident that artificial intelligence has the potential to outperform most pathologists in detecting prostate cancer, and it does not suffer from inherent interobserver variability. Nonetheless, large clinical validation studies that unequivocally prove the benefit of artificial intelligence support in pathology are necessary. Regardless, artificial intelligence may soon automate and standardize many facets of routine work, including qualitative measures (e.g. Gleason grading) and quantitative measures (e.g. proportion of Gleason grades and tumor volume). For the near future, a model in which pathologists are augmented by second-review or real-time artificial intelligence systems appears to be the most promising approach.
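The contrast in the excerpt above between a visually estimated Gleason pattern 4 proportion and a digitally measured one can be sketched as follows. This assumes a hypothetical export of manual annotations as polygon vertex lists in a common unit such as micrometres. The shoelace formula gives each outline's area, %G4 is the annotated G4 area divided by the annotated tumour area, and the mean bias is the average visual minus digital difference, so a positive value means visual over-estimation. It is an illustrative sketch only, not the method of the digital-analysis study cited in the excerpt or of any particular annotation software.

```python
def polygon_area(vertices):
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula.

    vertices: list of (x, y) points, e.g. annotation coordinates in micrometres.
    """
    n = len(vertices)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def percent_g4(tumour_polygons, g4_polygons):
    """Planimetric %G4: annotated G4 area as a percentage of annotated tumour area.

    Assumes G4 outlines lie inside the tumour outlines and do not overlap.
    """
    tumour = sum(polygon_area(p) for p in tumour_polygons)
    if tumour == 0:
        raise ValueError("no tumour area annotated")
    g4 = sum(polygon_area(p) for p in g4_polygons)
    return 100.0 * g4 / tumour


def mean_bias(visual, digital):
    """Mean (visual - digital) difference; positive means visual over-estimates."""
    diffs = [v - d for v, d in zip(visual, digital)]
    if not diffs:
        raise ValueError("need at least one paired estimate")
    return sum(diffs) / len(diffs)


# Hypothetical annotations for one core (units: micrometres).
tumour = [[(0, 0), (1000, 0), (1000, 200), (0, 200)]]
g4 = [[(0, 0), (300, 0), (300, 200), (0, 200)]]
print(percent_g4(tumour, g4))  # 30.0
```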
Introduction: Gleason score 7 prostate cancer comprises a wide spectrum of disease risk, and precise sub-stratification is paramount. Our group previously demonstrated that the total length of Gleason pattern 4 (GP4) is a better predictor than %GP4 of adverse pathologic outcomes at radical prostatectomy. We aimed to determine the association of GP4 length on prostate biopsy with post-prostatectomy oncologic outcomes. Methods: We compared four GP4 quantification methods (maximum %GP4 in any single core, overall %GP4, total length of GP4 (mm) across all cores, and length of GP4 (mm) in the highest-volume core) for prediction of biochemical recurrence (BCR)-free survival after radical prostatectomy, using multivariable Cox proportional hazards regression. Results: A total of 457 men with Grade Group 2 prostate cancer on biopsy subsequently underwent radical prostatectomy. The 3-year BCR-free survival probability was 85% (95% CI 81–88%). On multivariable analysis, all four GP4 quantification methods were associated with BCR: maximum %GP4 (HR = 1.30; 95% CI 1.07–1.59; p = 0.009), overall %GP4 (HR = 1.61; 95% CI 1.21–2.15; p = 0.001), total length of GP4 (HR = 2.48; 95% CI 1.36–4.52; p = 0.003), and length of GP4 in the highest core (HR = 1.32; 95% CI 1.11–1.57; p = 0.001). However, given the relatively low event rate, we were unable to identify differences between the quantification methods. Conclusions: These findings support further studies of GP4 quantification, in addition to the ratio of GP3 to GP4, to classify prostate cancer risk. Research should also examine whether GP4 quantification could provide a surrogate endpoint for disease progression in active surveillance trials.
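The four GP4 summaries compared above differ only in how per-core values are pooled. A minimal sketch, assuming a hypothetical per-core record of cancer length and GP4 length in millimetres; it approximates %GP4 from lengths and treats the core with the longest cancer length as the "highest volume core", both of which are assumptions rather than the study's stated definitions:

```python
from dataclasses import dataclass


@dataclass
class Core:
    """One biopsy core; lengths in mm (hypothetical input format)."""
    cancer_mm: float  # total cancer length in the core
    gp4_mm: float     # length of Gleason pattern 4 within that cancer


def gp4_metrics(cores):
    """Return the four GP4 summaries for one patient's biopsy cores.

    %GP4 is approximated from lengths (GP4 length / cancer length) and the
    "highest volume core" is the core with the longest cancer length;
    both are illustrative assumptions.
    """
    positive = [c for c in cores if c.cancer_mm > 0]
    if not positive:
        raise ValueError("no cancer-positive cores")
    total_cancer = sum(c.cancer_mm for c in positive)
    total_gp4 = sum(c.gp4_mm for c in positive)
    highest = max(positive, key=lambda c: c.cancer_mm)
    return {
        "max_pct_gp4": max(100.0 * c.gp4_mm / c.cancer_mm for c in positive),
        "overall_pct_gp4": 100.0 * total_gp4 / total_cancer,
        "total_gp4_mm": total_gp4,
        "gp4_mm_highest_core": highest.gp4_mm,
    }


print(gp4_metrics([Core(10.0, 2.0), Core(5.0, 4.0), Core(0.0, 0.0)]))
```

Pooling lengths before dividing (overall %GP4) weights long cores more heavily than the per-core maximum does, which is one reason the two percentages can diverge for the same patient.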
The development of decision support systems for pathology and their deployment in clinical practice have been hindered by the need for large manually annotated datasets. To overcome this problem, we present a multiple instance learning-based deep learning system that uses only the reported diagnoses as labels for training, thereby avoiding expensive and time-consuming pixel-wise manual annotations. We evaluated this framework at scale on a dataset of 44,732 whole slide images from 15,187 patients without any form of data curation. Tests on prostate cancer, basal cell carcinoma and breast cancer metastases to axillary lymph nodes resulted in areas under the curve above 0.98 for all cancer types. Its clinical application would allow pathologists to exclude 65–75% of slides while retaining 100% sensitivity. Our results show that this system has the ability to train accurate classification models at unprecedented scale, laying the foundation for the deployment of computational decision support systems in clinical practice.
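The triage claim above (excluding a share of slides while retaining 100% sensitivity) corresponds to choosing an operating threshold no higher than the lowest score given to any cancer-positive slide. A minimal sketch, assuming hypothetical per-slide malignancy scores and binary labels; the abstract does not say how its threshold was chosen, and picking a threshold on the same slides it is evaluated on gives an optimistic estimate:

```python
def triage_at_full_sensitivity(scores, labels):
    """Return (threshold, fraction of slides excluded) keeping every positive.

    scores: per-slide model probability of malignancy (hypothetical input).
    labels: True for slides whose reported diagnosis is cancer.
    Slides scoring strictly below the lowest positive score are excluded.
    """
    positive_scores = [s for s, y in zip(scores, labels) if y]
    if not positive_scores:
        raise ValueError("need at least one positive slide")
    threshold = min(positive_scores)
    excluded = sum(1 for s in scores if s < threshold)
    return threshold, excluded / len(scores)


print(triage_at_full_sensitivity([0.9, 0.6, 0.1, 0.2, 0.7],
                                 [True, True, False, False, False]))
```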
For prostate cancer patients, the Gleason score is one of the most important prognostic factors, potentially determining treatment independent of the stage. However, Gleason scoring is based on subjective microscopic examination of tumor morphology and suffers from poor reproducibility. Here we present a deep learning system (DLS) for Gleason scoring whole-slide images of prostatectomies. Our system was developed using 112 million pathologist-annotated image patches from 1226 slides, and evaluated on an independent validation dataset of 331 slides. Compared to a reference standard provided by genitourinary pathology experts, the mean accuracy among 29 general pathologists was 0.61 on the validation set. The DLS achieved a significantly higher diagnostic accuracy of 0.70 (p = 0.002) and trended towards better patient risk stratification in correlations to clinical follow-up data. Our approach could improve the accuracy of Gleason scoring and subsequent therapy decisions, particularly where specialist expertise is unavailable. The DLS also goes beyond the current Gleason system to more finely characterize and quantitate tumor morphology, providing opportunities for refinement of the Gleason system itself.
It has been recommended that the percentage of high-grade (HG) Gleason patterns 4 and 5 be quantitated in prostate cancer (PCa). However, this has not been assessed in a cohort using PCa death as an outcome, and there is debate as to whether the %HG of the ‘worst’ biopsy or an ‘overall’ %HG should be reported. Such data may assist in active surveillance decisions. Men with clinically localized PCa diagnosed by needle biopsy from 1990 to 2003 were included. The endpoint was prostate cancer death. Clinical variables included Gleason score (GS), PSA, age, clinical stage, and disease extent. Deaths were divided into those from prostate cancer and those from other causes, according to WHO criteria. 988 biopsy cases were centrally reviewed using criteria agreed at the Chicago ISUP conference in 2014. Cores were given individual GS and Grade Groups (GG), and a percentage of each grade was given for each core. The ‘worst’ %HG disease seen in a biopsy series was calculated, as well as the ‘overall’ %HG disease. The overall percentage of HG disease was highly significant, with an HR of 4.45 for the interquartile range (CI 3.30–6.01, p < 2.2 × 10⁻¹⁶), and similar to the %HG seen in the worst core. In multivariate analysis, both were highly significant. GG2 with ≤ 5% Gleason pattern 4 showed similar survival to GG1 cases. These data validate the use of %HG disease to predict PCa death. As both worst and overall %HG disease are powerful predictors of outcome, either could be chosen to provide prognostic information.
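The interquartile-range hazard ratio reported above (4.45) compares patients at the 75th and 25th percentiles of overall %HG rather than patients one percentage point apart. Assuming %HG enters the Cox model linearly on the log-hazard scale (the abstract does not state the model form), that ratio is exp(β × (Q3 − Q1)), where β is the per-unit coefficient. A minimal sketch of the conversion in both directions:

```python
import math


def hazard_ratio_over_range(beta, low, high):
    """Hazard ratio comparing predictor value `high` with `low`.

    beta: Cox log-hazard coefficient per unit of the predictor, assuming the
    predictor enters the model linearly, so HR = exp(beta * (high - low)).
    """
    return math.exp(beta * (high - low))


def per_unit_beta_from_range_hr(hr, low, high):
    """Inverse: per-unit coefficient implied by a hazard ratio over [low, high]."""
    if high == low:
        raise ValueError("range must have non-zero width")
    return math.log(hr) / (high - low)


# A per-unit HR of 2 over a three-unit range gives 2 ** 3.
print(hazard_ratio_over_range(math.log(2.0), 0.0, 3.0))
```

If the model used a non-linear term for %HG (for example splines), the IQR hazard ratio would no longer reduce to this single exponential.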
Here we present deep-learning techniques for healthcare, centering our discussion on deep learning in computer vision, natural language processing, reinforcement learning, and generalized methods. We describe how these computational techniques can impact a few key areas of medicine and explore how to build end-to-end systems. Our discussion of computer vision focuses largely on medical imaging, and we describe the application of natural language processing to domains such as electronic health record data. Similarly, reinforcement learning is discussed in the context of robotic-assisted surgery, and generalized deep-learning methods for genomics are reviewed.
The Gleason grading system has remained the most powerful prognostic predictor for patients with prostate cancer since the 1960s. Its application requires highly trained pathologists, is tedious, and suffers from limited inter-pathologist reproducibility, especially for the intermediate Gleason score 7. Automated annotation procedures constitute a viable solution to these limitations. In this study, we present a deep learning approach for automated Gleason grading of prostate cancer tissue microarrays with Hematoxylin and Eosin (H&E) staining. Our system was trained using detailed Gleason annotations on a discovery cohort of 641 patients and was then evaluated on an independent test cohort of 245 patients annotated by two pathologists. On the test cohort, the inter-annotator agreements between the model and each pathologist, quantified via Cohen's quadratic kappa statistic, were 0.75 and 0.71, respectively, comparable with the inter-pathologist agreement (kappa = 0.71). Furthermore, the model's Gleason score assignments achieved pathology expert-level stratification of patients into prognostically distinct groups, on the basis of the disease-specific survival data available for the test cohort. Overall, our study shows promising results regarding the applicability of deep learning-based solutions to more objective and reproducible prostate cancer grading, especially for cases with heterogeneous Gleason patterns.
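Cohen's quadratic (quadratically weighted) kappa, used above to compare the model with each pathologist, penalises a disagreement by the squared distance between ordered categories, so a 3+3 versus 4+4 disagreement costs more than 3+4 versus 4+3. A minimal, self-contained sketch; the function name and the ordered category list are illustrative assumptions. scikit-learn's `cohen_kappa_score` with `weights="quadratic"` computes the same statistic.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, categories):
    """Cohen's kappa with quadratic disagreement weights.

    rater_a, rater_b: paired ordinal labels for the same samples.
    categories: every possible label, listed in increasing order.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need equal-length, non-empty rating lists")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two ordered categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    # Observed confusion counts and each rater's marginal counts.
    observed = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    hist_a = Counter(index[a] for a in rater_a)
    hist_b = Counter(index[b] for b in rater_b)
    weighted_obs = weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2  # 0 on the diagonal, 1 at the extremes
            weighted_obs += w * observed[(i, j)]
            weighted_exp += w * hist_a[i] * hist_b[j] / n  # chance-expected count
    if weighted_exp == 0:
        raise ValueError("kappa undefined: no expected disagreement")
    return 1.0 - weighted_obs / weighted_exp


grades = ["3+3", "3+4", "4+3", "4+4"]
model = ["3+3", "3+4", "4+3", "4+4", "3+4"]
pathologist = ["3+3", "4+3", "4+3", "4+4", "3+4"]
print(round(quadratic_weighted_kappa(model, pathologist, grades), 3))
```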
Background: An increasing volume of prostate biopsies and a worldwide shortage of urological pathologists puts a strain on pathology departments. Additionally, the high intra-observer and inter-observer variability in grading can result in overtreatment and undertreatment of prostate cancer. To alleviate these problems, we aimed to develop an artificial intelligence (AI) system with clinically acceptable accuracy for prostate cancer detection, localisation, and Gleason grading. Methods: We digitised 6682 slides from needle core biopsies from 976 randomly selected participants aged 50-69 in the Swedish prospective and population-based STHLM3 diagnostic study done between May 28, 2012, and Dec 30, 2014 (ISRCTN84445406), and another 271 from 93 men from outside the study. The resulting images were used to train deep neural networks for assessment of prostate biopsies. The networks were evaluated by predicting the presence, extent, and Gleason grade of malignant tissue for an independent test dataset comprising 1631 biopsies from 246 men from STHLM3 and an external validation dataset of 330 biopsies from 73 men. We also evaluated grading performance on 87 biopsies individually graded by 23 experienced urological pathologists from the International Society of Urological Pathology. We assessed discriminatory performance by receiver operating characteristics and tumour extent predictions by correlating predicted cancer length against measurements by the reporting pathologist. We quantified the concordance between grades assigned by the AI system and the expert urological pathologists using Cohen's kappa. Findings: The AI achieved an area under the receiver operating characteristics curve of 0·997 (95% CI 0·994-0·999) for distinguishing between benign (n=910) and malignant (n=721) biopsy cores on the independent test dataset and 0·986 (0·972-0·996) on the external validation dataset (benign n=108, malignant n=222). 
The correlation between cancer length predicted by the AI and assigned by the reporting pathologist was 0·96 (95% CI 0·95-0·97) for the independent test dataset and 0·87 (0·84-0·90) for the external validation dataset. For assigning Gleason grades, the AI achieved a mean pairwise kappa of 0·62, which was within the range of the corresponding values for the expert pathologists (0·60-0·73). Interpretation: An AI system can be trained to detect and grade cancer in prostate needle biopsy samples at a ranking comparable to that of international experts in prostate pathology. Clinical application could reduce pathology workload by reducing the assessment of benign biopsies and by automating the task of measuring cancer length in positive biopsy cores. An AI system with expert-level grading performance might contribute a second opinion, aid in standardising grading, and provide pathology expertise in parts of the world where it does not exist. Funding: Swedish Research Council, Swedish Cancer Society, Swedish eScience Research Center, EIT Health.
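The "mean pairwise kappa" above averages the agreement between one grader (here the AI system) and each of the other graders in turn. A minimal sketch, assuming every rater graded the same biopsies in the same order; the abstract does not state whether a weighted kappa was used, so this version uses unweighted Cohen's kappa, and the rater names are hypothetical:

```python
from collections import Counter


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two raters' paired labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need equal-length, non-empty label lists")
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance from each rater's label frequencies.
    p_exp = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_exp == 1.0:
        raise ValueError("kappa undefined: chance agreement is 1")
    return (p_obs - p_exp) / (1.0 - p_exp)


def mean_pairwise_kappa(ratings, target):
    """Mean kappa between rater `target` and every other rater.

    ratings: dict mapping a rater name to grades for the same biopsies.
    """
    others = [r for r in ratings if r != target]
    if not others:
        raise ValueError("need at least one other rater")
    return sum(cohen_kappa(ratings[target], ratings[r]) for r in others) / len(others)


ratings = {"ai": [1, 1, 2, 2], "path_1": [1, 1, 2, 2], "path_2": [1, 2, 1, 2]}
print(mean_pairwise_kappa(ratings, "ai"))
```

Repeating the calculation with each pathologist as `target` gives the per-expert values against which the AI's figure can be compared.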
Recent discussions have suggested expanding the inclusion criteria for active prostate cancer surveillance to include cases with a Gleason score (GS) of 3+4=7. In this study, we examined this proposed use of a limited percent Gleason pattern 4 (%GP4) to identify candidates for active surveillance among 315 patients who underwent radical prostatectomy for prostate cancer with a GS of 6 or 3+4=7 on needle biopsy. The latter cases were divided into 4 groups using highest or overall %GP4 cut-off values of 5% and 10% as determined from prostate needle biopsies. The frequency of adverse pathology and the risk of biochemical recurrence were compared between the GS 6 and both GS 3+4=7 groups. Adverse pathology was defined as a GS 4+3=7 or higher, pT3b stage or positive lymph node metastasis. Notably, by the highest method, the Gleason pattern 4 <5% and GS 6 groups did not differ significantly in the frequency of adverse pathology or the risk of biochemical recurrence. However, the other highest Gleason pattern 4 categories had significantly higher frequencies and risks. Using the overall method, even the Gleason pattern 4 <5% group had a significantly higher frequency of adverse pathology and risk of biochemical recurrence relative to the GS 6 group. In conclusion, our findings suggest that patients with a GS 3+4=7 on biopsy and a highest %GP4 <5% are candidates for active surveillance similar to men with GS 6 cancers.
Objective To study the relationship of maximum cancer core length, on targeted biopsy of MRI-visible index lesions, to the volume of that tumor found at prostatectomy. Patients and Methods 205 men undergoing fusion biopsy and radical prostatectomy were divided into two groups: 136 in whom the maximum cancer core length came from an index MRI-visible lesion (targeted) and 69 in whom the maximum cancer core length came from a non-targeted lesion. MRI was 3T multi-parametric, and biopsy was via MRI-US fusion. Results In the targeted biopsy group, maximum cancer core length correlated with the volume of clinically significant index tumors (ρ = 0.44–0.60, p < 0.01). The correlation was similar for first and repeat biopsy and for transition and peripheral zone lesions (ρ = 0.42–0.49, p < 0.01). No correlations were found in the non-targeted group. Targeted maximum cancer core length (6–10 mm and >10 mm) and MRI lesion diameter (>20 mm) were independently associated with tumor volume. Targeted maximum cancer core lengths >10 mm and Gleason scores >7 were each associated with pathological T3 disease (OR 5.73 and 5.04, respectively), but MRI lesion diameter was not. Conclusions Maximum cancer core length (MCCL) on a targeted biopsy from an MRI-visible lesion is an independent predictor of both cancer volume and pathologic stage. This relationship does not exist for MCCL from a non-targeted biopsy core. Quantifying cancer core length on MRI-targeted biopsies may have value, not previously described, for risk-stratifying patients with prostate cancer before treatment.
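Maximum cancer core length is the longest per-core cancer length for a patient; the banding below mirrors the 6–10 mm and >10 mm groups reported above. A minimal sketch, assuming per-core cancer lengths have already been measured in millimetres under a consistent convention (for example, how gaps between discontinuous foci are handled, which this code does not address); treating exactly 10 mm as part of the 6–10 mm band is also an assumption:

```python
def max_cancer_core_length(core_lengths_mm):
    """MCCL: the longest cancer length across a patient's biopsy cores (mm)."""
    if not core_lengths_mm:
        raise ValueError("no cores supplied")
    return max(core_lengths_mm)


def mccl_band(mccl_mm):
    """Band an MCCL value into <6 mm, 6-10 mm or >10 mm."""
    if mccl_mm > 10:
        return ">10 mm"
    if mccl_mm >= 6:
        return "6-10 mm"
    return "<6 mm"


cores_mm = [2.5, 7.0, 4.0]  # hypothetical per-core cancer lengths
print(mccl_band(max_cancer_core_length(cores_mm)))
```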
Prostate cancer (PCa) is a heterogeneous disease that is manifested in a diverse range of histologic patterns and its grading is therefore associated with an inter-observer variability among pathologists, which may lead to an under- or over-treatment of patients. In this work, we develop a computer aided diagnosis system for automatic grading of PCa in digitized histopathology images using supervised learning methods. Our pipeline comprises extraction of multi-scale features that include glandular, cellular, and image-based features. A number of novel features are proposed based on intra- and inter-nuclei properties; these features are shown to be among the most important ones for classification. We train our classifiers on 333 tissue microarray (TMA) cores that were sampled from 231 radical prostatectomy patients and annotated in detail by six pathologists for different Gleason grades. We also demonstrate the TMA-trained classifier's performance on additional 230 whole-mount slides of 56 patients, independent of the training dataset, by examining the automatic grading on manually marked lesions and randomly sampled 10% of the benign tissue. For the first time, we incorporate a probabilistic approach for supervised learning by multiple experts to account for the inter-observer grading variability. Through cross-validation experiments, the overall grading agreement of the classifier with the pathologists was found to be an unweighted kappa of 0.51, while the overall agreements between each pathologist and the others ranged from 0.45 to 0.62. These results suggest that our classifier's performance is within the inter-observer grading variability levels across the pathologists in our study, which are also consistent with those reported in the literature.
The Gleason score remains the most reliable prognosticator in men with prostate cancer. One of the recent important modifications in the Gleason grading system recommended from the International Society of Urological Pathology consensus conference is recording the percentage of Gleason pattern 4 in the pathology reports of prostate needle biopsy and radical prostatectomy cases with Gleason score 7 prostatic adenocarcinoma. Limited data have indeed suggested that the percent Gleason pattern 4 contributes to stratifying the prognosis of patients who undergo radical prostatectomy. An additional obvious benefit of reporting percent pattern 4 includes providing critical information for treatment decisions. This review summarizes and discusses available studies assessing the utility of the percentage of Gleason pattern 4 in the management of prostate cancer patients.