Deep learning for necrosis detection using canine perivascular wall tumour whole slide images

Springer Nature
Scientific Reports
Scientic Reports | (2022) 12:10634 | https://doi.org/10.1038/s41598-022-13928-1
www.nature.com/scientificreports
Deep learning for necrosis
detection using canine perivascular
wall tumour whole slide images
Taranpreet Rai1*, Ambra Morisi2, Barbara Bacci5, Nicholas J. Bacon6, Michael J. Dark7,
Tawfik Aboellail8, Spencer Angus Thomas4,9, Miroslaw Bober1, Roberto La Ragione2,3 &
Kevin Wells1
Necrosis seen in histopathology Whole Slide Images is a major criterion that contributes towards scoring tumour grade, which in turn determines treatment options. However, conventional manual assessment suffers from poor inter-operator reproducibility, impacting grading precision. To address this, automatic necrosis detection using AI may be used to assess necrosis for final scoring that contributes towards the final clinical grade. Using deep learning AI, we describe a novel approach for automating necrosis detection in Whole Slide Images, tested on a canine Soft Tissue Sarcoma (cSTS) data set consisting of canine Perivascular Wall Tumours (cPWTs). A patch-based deep learning approach was developed in which different variations of training a DenseNet-161 Convolutional Neural Network architecture were investigated, as well as a stacking ensemble. An optimised DenseNet-161 with post-processing produced a hold-out test F1-score of 0.708, demonstrating state-of-the-art performance. This represents the first automated necrosis detection method in the cSTS domain, and specifically in cPWTs, demonstrating a significant step forward in reproducible and reliable necrosis assessment for improving the precision of tumour grading.
Canine Soft Tissue Sarcomas (cSTS) are a heterogeneous group of neoplasms that derive from tissues of mesenchymal origin1–6. The anatomical site of cSTS varies significantly, but mostly involves the cutaneous and subcutaneous tissues7. cSTS can be broken down into several subtypes, but these are nonetheless grouped together due to the similarities of their microscopic and clinical features. The general treatment of choice for cSTS is surgical removal of cutaneous and subcutaneous sarcomas, which have a low recurrence rate after surgical excision. However, higher-grade tumours can prove problematic, leading to poorer prognosis and outcomes. Histological grade is the most important prognostic factor in human Soft Tissue Sarcoma (STS), and is likely one of the most validated criteria for predicting outcome following surgery in canine patients8–11. It is widely accepted that the histological grading system for cSTS is applied to all cSTS subtypes for simplicity. However, there can also be inconsistent naming of subtypes, which can lead to poor correlation between the classification of tumours and their histogenesis (tissue of origin). This sometimes causes confusion for pathologists, highlighting a need for standardisation7.
Due to poor agreement when identifying sarcoma subtypes, we focus on one common subtype found in canines: the canine Perivascular Wall Tumour (cPWT). cPWTs arise from vascular mural cells and can be recognised by their vascular growth patterns, which include staghorn, placentoid, perivascular whorling, and bundles from the tunica media12,13.
The scoring for cSTS grading is broken down into three major criteria: differentiation, mitotic index and necrosis7. This study focuses on necrosis detection, an important indicator of disease progression and severity.
A subfield of machine learning known as deep learning is used for necrosis detection in this work. Deep learning algorithms are abundant in the medical imaging field, and especially in digital pathology, assisting in
1Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, UK. 2School of
Veterinary Medicine, University of Surrey, Guildford GU2 7AL, UK. 3School of Biosciences and Medicine, University
of Surrey, Guildford GU2 7XH, UK. 4National Physical Laboratory, London TW11 0LW, UK. 5Department of
Veterinary Medical Sciences, University of Bologna, 40126 Bologna, Italy. 6Fitzpatrick Referrals Oncology and Soft
Tissue, Guildford, UK. 7Department of Comparative, Diagnostic, and Population Medicine, College of Veterinary
Medicine, University of Florida, Gainesville, FL, USA. 8Department of Microbiology, Immunology and Pathology,
Colorado State University, Fort Collins, CO, USA. 9Department of Computer Science, University of Surrey,
Guildford GU2 7XH, UK. *email: t.rai@surrey.ac.uk
Content courtesy of Springer Nature, terms of use apply. Rights reserved
computer-aided diagnosis to classify images or automatically detect diseases. Deep learning has become increasingly ubiquitous and has proven very successful in recent image classification tasks in digital pathology14–19. The digitisation of histological slides into Whole Slide Images (WSI) has created the field of digital pathology, which has allowed cellular pathology labs to move to digital workflows20. This has resulted in a change in working practices, as clinical pathologists are no longer required to be present at the same location as pathology equipment. Potential benefits of this innovation include remote working across borders, collaborative reporting, and the curation of large teaching databases. Nevertheless, several pathology tasks remain exposed to inter-observer variability, where two or more pathologists will differ in their assessment of a histological slide16. As a result, there is much interest in improving and automating pathology workflows whilst promoting standardisation, with greater reproducibility, for scoring certain criteria within grading. Automatic necrosis detection in cSTS could decrease viewing times for pathologists and reduce inter- and intra-observer variability, positively impacting accuracy in tumour diagnosis and prognosis.
The study presented here aimed to classify regions demonstrating necrosis against regions that do not, in canine Perivascular Wall Tumour (cPWT) Whole Slide Images (WSIs), using deep learning models such as pretrained Convolutional Neural Networks (DenseNet-161). In the literature, relatively few authors have investigated necrosis detection using machine learning methods. As necrosis detection is typically an image classification task, depending on the image resolution and "field of view" (size of image), it can be considered a texture detection problem. Earlier work in necrosis detection applied machine learning methods in which texture features were used for Support Vector Machine (SVM) classification21. The same authors later published work comparing deep learning with traditional computer vision machine learning methods in digital pathology. For necrosis detection, their proposed deep learning Convolutional Neural Network (CNN) architecture performed best, with an average test accuracy of 81.44%22. Another set of authors investigated necrosis detection, comparing an SVM machine learning model and deep learning for viable and necrotic tumour assessment in human osteosarcoma WSIs23. The aim was to label regions of the WSI as viable tumour, necrotic tumour, or non-tumour. For evaluation, the Volume Under the Surface (VUS) score was computed for non-tumour versus viable tumour versus necrotic tumour. Their models produced VUS scores of 0.922 and 0.959 for the SVM and deep learning models, respectively. Nevertheless, these works do not investigate canine Soft Tissue Sarcoma (cSTS), and so this paper addresses whether such deep learning models can also positively impact necrosis detection in cSTS.
Several methods of training deep learning models were investigated in this work: a pretrained DenseNet-161 (with and without augmentations), an extension of training this model via hard negative mining to reduce false positive (FP) predictions, and a stacking ensemble model. To the best of our knowledge, this is the first work on automated detection of necrosis in cPWTs, as well as in cSTS, and thus this methodology could be used for necrosis scoring in an automated detection and grading system for cSTS. These results represent the highest F1-scores reported to date for cPWT necrosis detection.
Methods
Data description and patch extraction process. A set of canine Soft Tissue Sarcoma (cSTS) histology slides obtained from the Department of Microbiology, Immunology and Pathology, Colorado State University were diagnosed by a veterinary pathologist. A senior pathologist at the University of Surrey confirmed the grade of each case (patient) and chose a representative histological slide for each patient. These slides were then digitised using a Hamamatsu NDP slide scanner (Hamamatsu Nanozoomer 2.0 HT) and viewed with the NDP.viewer platform. The slides were scanned at 40x magnification (0.23 µm/pixel) with a scanning speed of approximately 150 s at 40x mode (15 mm × 15 mm) to create a digital Whole Slide Image (WSI).
Two pathologists independently annotated the WSIs for necrosis, as contours around the necrotic regions, using the open-source Automated Slide Analysis Platform (ASAP) software24. The pathologists used different magnifications (ranging from 5x to 40x) to analyse the necrotic regions before drawing contours. Two class labels were created from these annotations, positive (necrosis) and negative, for subsequent analysis as a binary patch-based classification problem. In order to categorise a region as containing necrosis, both pathologist annotators needed to form an "agreement". Therefore, the intersection of the necrosis annotations was labelled as necrosis. Similarly, areas agreed to have no necrosis were labelled as negative. We used these annotations to create image masks for the patch extraction process, and applied Otsu thresholding to remove non-tissue background from both classes, creating tissue masks. A patch-based approach was applied due to the large size of Whole Slide Images, which typically comprise gigapixel images in the higher-resolution layers of the pyramid format. Such large images cannot be fed directly into machine learning models, so smaller patches are extracted from WSIs for further analysis. Using the aforementioned intersection of the annotators' necrosis binary maps, non-overlapping patches of size 256 × 256 pixels were extracted from both necrosis and negative regions (2 classes). The patch extraction process is visualised in Fig. 1.
The study used 10x magnification for necrosis detection, as suggested by the on-board pathologists, who chose 10x over 5x, 20x and 40x as the ideal resolution for this task. Non-overlapping patches of 256 × 256 pixels were extracted from regions of both classes using a minimum decision threshold for the percentage of necrosis present in a patch. At 10x magnification, 30% of the patch had to contain necrosis pixels (determined from the expert-defined labels) in order for it to be labelled as necrosis. A threshold of 30% was chosen to take into account boundary effects of the necrosis clusters in the images: patches extracted from the boundaries of a necrosis cluster would almost certainly contain non-necrotic tissue, yet a suitable amount of necrotic tissue (in this case 30%) is required to justifiably label a patch as necrosis for the effective training of deep learning models. A
higher number would risk dismissing useful necrotic patches, whereas a lower number (thus more negative tissue
in a patch) would likely cause confusion during the training of deep learning models.
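As a sketch of this labelling rule (our illustration in NumPy, not the authors' released code), a patch is labelled as necrosis when at least 30% of its mask pixels fall inside the agreed necrosis region:

```python
import numpy as np

NECROSIS_FRACTION_THRESHOLD = 0.30  # 30% of patch pixels, as described above

def label_patch(necrosis_mask_patch: np.ndarray) -> str:
    """Label a 256x256 binary mask patch as 'necrosis' or 'negative'.

    necrosis_mask_patch holds 1 where both annotators agreed on necrosis.
    """
    fraction = necrosis_mask_patch.mean()
    return "necrosis" if fraction >= NECROSIS_FRACTION_THRESHOLD else "negative"

# Example: a patch whose top-left quadrant (25%) is necrotic falls below the
# threshold, while a half-necrotic patch exceeds it.
quarter = np.zeros((256, 256)); quarter[:128, :128] = 1
half = np.zeros((256, 256)); half[:128, :] = 1
```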
Deep learning model and experimental set-up. In order to evaluate the robustness and veracity of our approach, we performed 3-fold cross-validation, with a hold-out test set created to compare the models trained on the three different folds. In total, we extracted patches from 32 patients (WSIs) to create our train, validation and test sets.
There were a total of 5784 necrosis patches from 20 slides for training/validation and 1151 necrosis patches from 12 slides for testing. Additionally, there were a total of 50,975 negative patches for training/validation and 31,351 negative patches for testing.
Class imbalance is apparent throughout the different folds of the dataset. To address the large variation of the negative class relative to the small presence of the necrosis class, we reduced the class imbalance by randomly extracting 800 negative patches per WSI and using these alongside all necrosis patches per WSI. This reduces the class imbalance to approximately 1:4 for necrosis to negative, respectively. Weighted cross-entropy loss was applied to mitigate the remaining class imbalance. It was found that models trained with this level of class imbalance performed marginally better. We therefore opted to train with this mild imbalance ratio, as balancing the dataset to 50/50 would risk discarding useful information from the negative class.
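A minimal NumPy sketch of weighted cross-entropy, mirroring PyTorch's weighted CrossEntropyLoss with mean reduction (the weight values and example logits here are illustrative, not those used in the paper):

```python
import numpy as np

def weighted_cross_entropy(logits, targets, class_weights):
    """Mean weighted cross-entropy over a batch.

    logits: (N, C); targets: (N,) integer class ids; class_weights: (C,).
    Per-class weights scale the negative log-likelihood, and the mean is
    normalised by the sum of the applied weights (as in PyTorch)."""
    z = logits - logits.max(axis=1, keepdims=True)           # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    w = class_weights[targets]
    nll = -log_probs[np.arange(len(targets)), targets]
    return (w * nll).sum() / w.sum()

# Example: with equal logits and uniform weights the loss is ln(2) ≈ 0.693.
loss = weighted_cross_entropy(np.zeros((2, 2)), np.array([0, 1]), np.array([1.0, 1.0]))
```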
The deep learning model implemented transfer learning via bottleneck feature extraction. Previous investigations of several pretrained networks, including VGG and ResNet architectures, have shown a positive impact in digital pathology25,26. However, DenseNet-161 was chosen due to its leading performance in previous works27. According to one study, DenseNet-161 can be used for fast and accurate classification of digital pathology images to assist pathologists in daily clinical tasks28. Thus, bottleneck features were extracted from DenseNet-161, producing an output of 2208 features. These features were then fed into a classification layer to classify the "necrosis" or "negative" class per patch. As DenseNet has been pretrained on ImageNet, its standard output is for 1000 classes. However, as necrosis detection is considered a binary problem, a binary classification layer replaces the original multiclass classification layer. See Fig. 2a.
Comparative experiments were implemented comparing the pretrained DenseNet-161 to Sharma et al.'s proposed CNN model and an AlexNet22. At the initial 50% decision threshold, these models produce higher F1-scores. However, it should be noted that the sensitivities of the AlexNet and proposed CNN models were lower than those of our previously implemented DenseNet-161 models. Our DenseNet-161 model also provided a higher AUC value prior to thresholding and post-processing (see Supplementary Table S1). Therefore, the pretrained
Figure1. In (a), Annotations by ”Annotator 1” and ”Annotator 2” applied to the same canine Perivascular
Wall Tumour (cPWT) Whole Slide Image (WSI). For the patch extraction process, binary masks (or maps)
are generated, (shown in (b). A necrosis mask is created, highlighting the intersection agreement between
both annotators, when considering a region as necrotic. Any disagreement is dismissed from the necrosis and
negative binary masks. From applying Otsu thresholding we dismissed any non-tissue related regions and by
using the intersection agreement for both annotators, we created a ”negative mask”, highlighting in white regions
that do not contain necrosis. We used these masks to extract patches, as shown in (c). In this case we extract 10x
magnication necrosis and negative patches of 256
×
256 pixels.
Figure2. (a) Bottleneck feature extraction using DenseNet-161. A patch size of 256 x 256 pixels is fed into
a DenseNet-161 feature extractor, where bottleneck features are obtained. ese features are then fed into a
classication layer for further training and validation, classifying necrosis or negative patches. (b) Hard negative
mining approach to train the model with additional ”dicult” examples presented to the network. (c) Stacking
ensemble. e input X is fed into M base-level member models: DenseNet-161 model, the DenseNet-161
model with augmentations and the hard negative mining model. e prediction outputs of these models
ˆyM
are
combined and fed into a logistic regression meta-model as new feature inputs. New coecients are learnt in this
logistic regression model, before nal predictions are output ˆ
yfinal
.
DenseNet-161 was the primary baseline model for this paper, as it had greater capacity for further performance improvement via thresholding and post-processing.
A grid search was implemented in preliminary experiments to determine optimal hyperparameters. The ranges investigated were batch sizes of 16, 32 and 48; learning rates of 0.00001, 0.0001, 0.001 and 0.01; and different variations of scheduler steps. As a result, for all experiments a batch size of 32 was used and the loss function was cross-entropy loss. The Adam optimiser was initialised with a learning rate of 0.0001, with a scheduler step of 20 and a scheduler gamma of 0.5 as the optimal values29. This optimised set of hyperparameters resulted in smooth, stable training behaviour. We calculated the RGB mean and standard deviation values per fold for patch image normalisation. For every fold, each model was trained for 100 epochs, and the model from the epoch with the lowest validation loss was automatically chosen as the best-performing model. This selected model was then applied to the validation and test sets for evaluation during training and final testing, respectively.
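The training scheme can be sketched as follows (a toy linear model and random tensors stand in for DenseNet-161 and the patch datasets; the optimiser, scheduler and best-epoch selection use the values stated above):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(8, 2)                       # stand-in for DenseNet-161
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Halve the learning rate every 20 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)

x_train, y_train = torch.randn(64, 8), torch.randint(0, 2, (64,))
x_val, y_val = torch.randn(32, 8), torch.randint(0, 2, (32,))

best_val_loss, best_state = float("inf"), None
for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(x_train), y_train)
    loss.backward()
    optimizer.step()
    scheduler.step()

    # Keep the model state from the epoch with the lowest validation loss.
    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(x_val), y_val).item()
    if val_loss < best_val_loss:
        best_val_loss, best_state = val_loss, model.state_dict()
```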
The models were implemented in Python using the PyTorch deep learning framework. Other notable Python packages used for pre-processing and post-processing tasks included OpenSlide, NumPy, Pandas, OpenCV and the math module. Both training and testing of the deep learning models were performed using GPU programming. The hardware used for implementation was a Dell T630 system with 2 Intel Xeon E5 v4 series 8-core CPUs at 3.2 GHz, 128 GB of RAM, and 4 NVIDIA Titan X (Pascal, Compute 6.1, single precision) GPUs.
Other types of models investigated. In this section we describe the different types of deep learning and ensemble models investigated for necrosis detection. Apart from the ensembles, all deep learning models used the same hyperparameters and training scheme as described in the previous section.
DenseNet-161 with augmentations. Adding augmented patch images to a training set is a common strategy to mitigate the limited variation and size of small datasets30. Modifications were thus applied to existing images to produce new images, using random horizontal/vertical flips and colour jitter (random changes to the brightness, contrast, saturation and hue of a patch image). A change of up to ±40% is randomly applied for brightness, contrast and saturation.
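A NumPy sketch of these augmentations (hue jitter is omitted for brevity; the flip probabilities and the jitter formulas are our illustrative choices, consistent with the up-to-±40% range stated above):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(patch: np.ndarray) -> np.ndarray:
    """Random flips plus colour jitter on an RGB patch with values in [0, 1].

    Brightness, contrast and saturation factors are drawn from [0.6, 1.4],
    i.e. up to ±40%."""
    if rng.random() < 0.5:
        patch = patch[:, ::-1]                           # horizontal flip
    if rng.random() < 0.5:
        patch = patch[::-1, :]                           # vertical flip
    b, c, s = rng.uniform(0.6, 1.4, size=3)
    patch = patch * b                                    # brightness
    patch = patch.mean() + (patch - patch.mean()) * c    # contrast
    grey = patch.mean(axis=2, keepdims=True)
    patch = grey + (patch - grey) * s                    # saturation
    return np.clip(patch, 0.0, 1.0)

augmented = augment(rng.random((256, 256, 3)))
```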
Hard negative mining model. Hard negative mining was performed in order to reduce the number of false positives. This approach allowed us to train the model with additional "difficult" examples presented to the network31. Firstly, a "full" training set was created, in which we extracted every negative-class patch from the training set. Secondly, we applied the best model (that with the lowest validation loss) to the full training set to infer a new set of predictions. Any patch predicted as a false positive that did not exist in the original dataset was added to the sub-sampled dataset, creating a new dataset known as the "hard negative training set". Lastly, we trained the DenseNet-161 model using this new dataset and evaluated it as before. A flow diagram of the hard negative experiments is shown in Fig. 2b.
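The construction of the hard negative training set can be sketched in plain Python (the predict function and the patch identifiers are stand-ins for inference with the selected model and for the dataset bookkeeping):

```python
def build_hard_negative_set(subsampled_set, full_negative_patches, predict):
    """Add to the sub-sampled set every false-positive negative patch
    from the full training set that is not already present."""
    hard_negative_set = list(subsampled_set)
    existing = {patch_id for patch_id, _ in subsampled_set}
    for patch_id, patch in full_negative_patches:
        # A negative patch predicted "necrosis" is a false positive.
        if predict(patch) == "necrosis" and patch_id not in existing:
            hard_negative_set.append((patch_id, patch))
    return hard_negative_set

# Toy example: a stand-in model that (wrongly) calls values above 0.5 necrosis.
predict = lambda patch: "necrosis" if patch > 0.5 else "negative"
subsampled = [("n1", 0.2)]
full_negatives = [("n1", 0.2), ("n2", 0.9), ("n3", 0.1)]
hard_set = build_hard_negative_set(subsampled, full_negatives, predict)
```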
Ensemble model. A common approach to boost the performance of machine learning models is the employment of ensemble models; the ensemble process is depicted in Fig. 2c. The basic concept of ensembles is to train multiple models (base-member models) and combine their predictions into one single output32. Ensemble models typically outperform single models (individual base-member models) on their respective target datasets. This can be seen in recent machine learning competitions, such as those on the Kaggle platform and at MICCAI33–35.
The ensemble model was trained on the sub-sampled training dataset. To make use of all the data from the training WSIs, we made inferences on the full training set patches (including the original training set patches). There are various methods and combinations for creating ensemble models; however, preliminary experiments demonstrated that an ideal combination of base-member models was the DenseNet-161 model, the DenseNet-161 model with augmentations and the hard negative mining model.
Preliminary experiments investigated combining ensemble predictions and training a logistic regression model (as a meta-model) on the full training set of patches to produce prediction probability outputs. We then tested this trained logistic regression model on the validation and hold-out test sets. Pseudocode for the stacking ensemble is presented in Algorithm 1.
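The stacking step can be sketched as follows: the base members' necrosis probabilities become the feature matrix for a logistic regression meta-model (a simple gradient-descent fit in NumPy stands in for whatever solver was actually used; the toy data is ours):

```python
import numpy as np

def fit_logistic_meta_model(base_probs, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression coefficients on stacked base-model outputs.

    base_probs: (N, M) necrosis probabilities from M base members;
    labels: (N,) 0/1 ground truth."""
    n, m = base_probs.shape
    X = np.hstack([np.ones((n, 1)), base_probs])     # bias column
    w = np.zeros(m + 1)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - labels) / n             # gradient of log-loss
    return w

def predict_meta(base_probs, w):
    X = np.hstack([np.ones((len(base_probs), 1)), base_probs])
    return 1.0 / (1.0 + np.exp(-X @ w))

# Toy data: three base members whose probabilities track the true label.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 200)
base_probs = np.clip(labels[:, None] * 0.6 + rng.random((200, 3)) * 0.4, 0, 1)
w = fit_logistic_meta_model(base_probs, labels)
final = (predict_meta(base_probs, w) >= 0.5).astype(int)
```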
Figure3. Histograms of the initial classication results based on a standard 0.5 (50%) probability decision
threshold. Depicted are true negatives (TN), false positives (FP), false negatives (FN) and true positives (TP) for the
DenseNet-161 model. On the le side depicts histogram plots of TN and FP for each validation fold, whereas on the
right side depicts histogram plots of FN and TP. ese combinations were chosen for the plots as they complement
each other. It can be seen that all three folds are characteristically similar in distribution. TN and TP predictions
typically produce high probabilities, as can be seen by the frequency of such predicted probabilities. Increasing the
probability threshold would increase the number of true negatives and reduce the number of false positives. However,
this would subsequently increase the number of false negatives and reduce the number of true positives.
Post-processing. It is important to note that although sensitivity is a vital measure in the medical domain, the number of false positives greatly influences the score for necrosis, thus impacting overall grading. For our problem, precision and sensitivity were considered equally important, and therefore the F1-score was used to determine the optimal threshold for each fold's validation set. These thresholds were then applied to our hold-out test set for each fold. Additionally, for comparison, the mean optimal threshold across the three folds was computed and applied to our hold-out test set. The F1-score is the harmonic mean of precision and sensitivity, producing a weighted average of the two metrics. Both precision (formula 2) and sensitivity (formula 3) contribute equally to the F1-score (formula 5):
Figure4. Line graphs that depict the sensitivity, specicity and weighted F1-score calculated for each
probability threshold, for the three validation folds from the DenseNet-161 and ensemble models. To determine
the optimal probability threshold, we choose the threshold with the highest F1-score. In the above plots, these
are denoted as ”Best threshold”. For example, for the ensemble model, in validation fold 1, this threshold was
0.86, for fold 2 it was 0.65 and for fold 3 it was 0.97.
where TP, FP and FN are true positives, false positives and false negatives, respectively.
Figure 3 depicts histograms showing true positives, true negatives, false positives and false negatives for the DenseNet-161 model. The y-axis is log-normalised to compress and better visualise the frequency of predictions, as the number of true negatives significantly outweighs the number of true positives. The x-axis (probabilities) is split into 100 bins. The left side of this figure shows true negative (TN) and false positive (FP) histograms for each validation fold, whereas the right depicts histograms of false negatives (FN) and true positives (TP). The validation set was used to choose optimal thresholds. The hold-out test set was used purely for evaluation and did not contribute towards any change in strategy (i.e. to prevent data/information leaks). These classification output combinations were chosen as they complement one another. For example, increasing the probability decision threshold would increase the number of TNs and reduce the number of FPs; however, this would subsequently increase the number of FNs and thus reduce the number of TPs. For all folds, the TN and FP plots show a wide range of prediction probabilities, with a heavy skew towards the lower probabilities. The FN and TP plot for fold 3 (validation) appears to show slight sparsity among FN predictions in comparison to the other folds.
The sensitivity, specificity and F1-scores were calculated for several probability thresholds for each validation fold, as shown in Fig. 4. For both the DenseNet-161 model and the ensemble model, it was apparent that high probability thresholds had an adverse effect on sensitivity. The F1-score was used as the metric to determine optimal thresholds.
Accuracy = (TP + TN) / (TP + TN + FN + FP)    (1)

Precision = TP / (TP + FP)    (2)

Sensitivity = TP / (TP + FN)    (3)

Specificity = TN / (TN + FP)    (4)

F1 = 2 × (Sensitivity × Precision) / (Sensitivity + Precision)    (5)
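Formulas (2), (3) and (5) translate directly into code (an illustrative sketch using raw TP/FP/FN counts):

```python
def precision(tp, fp):
    # Formula (2)
    return tp / (tp + fp)

def sensitivity(tp, fn):
    # Formula (3)
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    # Formula (5): harmonic mean of precision and sensitivity.
    p, s = precision(tp, fp), sensitivity(tp, fn)
    return 2 * s * p / (s + p)

# Example: 50 TPs, 30 FPs, 20 FNs gives precision 0.625, sensitivity ~0.714
# and an F1-score of exactly 2/3.
f1 = f1_score(50, 30, 20)
```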
Figure5. e post-processing step to remove predicted single necrosis tiles is depicted. e necrosis
predictions are applied to a binary mask which is a downsized binary map of the original WSI, by a factor of
32x, for computational eciency. Connected components analysis is subsequently performed, where if a tile of
a xed size (in this case an area of 32 pixels squared) is not connected to other tiles, horizontally, vertically, or
diagonally, it is removed from the mask. Final predictions are updated based on using these binary masks.
The probability thresholds t ranged from 0.01 to 1, so choosing the optimal validation threshold T for the F1-score F1 can be represented formally as shown in Eq. (6). In general, the DenseNet-161 model followed a similar trend for all 3 folds, where the optimal threshold was high (between 0.88 and 0.93). The ensemble model demonstrated similar results, apart from fold 2, where the optimal probability decision threshold was found to be 0.65.
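The threshold search of Eq. (6) can be sketched as follows (an illustrative implementation sweeping t from 0.01 to 1 in steps of 0.01):

```python
import numpy as np

def optimal_threshold(probs, labels):
    """T = argmax_t F1(t): sweep decision thresholds on the validation fold.

    probs: predicted necrosis probabilities; labels: 0/1 ground truth."""
    best_t, best_f1 = 0.01, -1.0
    for t in np.arange(0.01, 1.001, 0.01):
        preds = (probs >= t).astype(int)
        tp = int(((preds == 1) & (labels == 1)).sum())
        fp = int(((preds == 1) & (labels == 0)).sum())
        fn = int(((preds == 0) & (labels == 1)).sum())
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = round(float(t), 2), f1
    return best_t, best_f1

# Toy example: the two necrotic patches carry the two highest probabilities,
# so some threshold separates the classes perfectly (F1 = 1.0).
t, f1 = optimal_threshold(np.array([0.1, 0.4, 0.6, 0.9]), np.array([0, 0, 1, 1]))
```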
Necrosis detection in WSIs often displays sporadic false positive predictions. From domain knowledge and discussions with the on-board pathologists, it was determined that single-tile (patch) necrosis predictions in a WSI would typically not be considered necrotic in most circumstances if surrounded by non-necrotic tissue. This is because the size of an isolated region plays a part in determining whether it should be considered necrotic and scored. Of course, single patches could be necrotic; however, this would be analogous to outlier detection or statistical noise and fluctuations. As a result, a post-processing step was applied to remove these single-tile necrosis predictions. The necrosis predictions are applied to a binary mask whose dimensions are downsized by a factor of 32 compared with the original WSI. Connected components analysis is then performed36. In this case, if a tile of a fixed size (32 × 32 pixels) is not connected to any neighbouring tiles (or necrotic regions) horizontally, vertically, or diagonally, it is automatically removed from the mask. The final predictions are then updated based on these binary masks. This process is depicted in Fig. 5.
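The single-tile removal can be sketched with a plain connected-components pass over the downsized tile mask (our implementation for illustration; production code might use OpenCV or SciPy labelling instead):

```python
import numpy as np
from collections import deque

def remove_isolated_tiles(mask: np.ndarray) -> np.ndarray:
    """Remove single-tile predictions from a binary tile mask.

    Components are found with 8-connectivity (horizontal, vertical and
    diagonal neighbours); any component containing only one tile is deleted."""
    mask = mask.astype(bool).copy()
    seen = np.zeros_like(mask)
    h, w = mask.shape
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                component, queue = [], deque([(i, j)])
                seen[i, j] = True
                while queue:                         # BFS over the component
                    y, x = queue.popleft()
                    component.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                                seen[ny, nx] = True
                                queue.append((ny, nx))
                if len(component) == 1:              # isolated single tile
                    mask[i, j] = False
    return mask

# The lone tile at (0, 0) is removed; the connected pair on the bottom row stays.
tiles = np.array([[1, 0, 0],
                  [0, 0, 0],
                  [0, 1, 1]])
cleaned = remove_isolated_tiles(tiles)
```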
Results
Individual base-member model results. Results for the DenseNet-161 model, the DenseNet-161 with augmentations model and the hard negative mining model are presented in Table 1. It must be noted that these results are patch-based and not WSI (or patient) based. Therefore, classification results are based on whether a sampled patch is considered necrotic or not. Furthermore, these results are based on a default "decision threshold" (or simply "threshold") of 50%. This means that if a patch is predicted with a probability confidence of more than 50%, it is considered necrosis; anything less than 50% is considered negative (not necrotic).
The addition of augmentations appeared to show a marginal improvement in sensitivity, as shown with the validation (italicised) and hold-out test sets. The hard negative mining model demonstrated an improvement in the F1-score and specificity scores across the 3 folds, compared to the DenseNet-161 model, for both the validation and test sets. However, sensitivity was adversely affected across the average of the validation and test sets. Nevertheless, across all models, sensitivity scores were higher in the test set than in validation. This is most likely due to the models finding unencountered tissue types to be suspicious during testing, a consequence of training with limited datasets.
The highest validation and test specificity was produced by the ensemble model: the model trained on the combined probability outputs of the DenseNet-161 model, the DenseNet-161 with augmentations model and the hard negative mining model. Consequently, the highest validation and test F1-scores also come from the ensemble model. As the F1-score is used as the basis for choosing the best performing model, we continued with the ensemble model and used the DenseNet-161 model as a comparison for further post-processing.
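The stacking ensemble can be sketched as follows. The discussion of this paper describes logistic regression as the ensemble's backbone; the per-patch probabilities below are synthetic stand-ins for the three base models' outputs, so this is a minimal illustration of the technique rather than the authors' exact pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins for the per-patch necrosis probabilities produced by
# the three base members (DenseNet-161, DenseNet-161 with augmentations,
# hard negative mining model), stacked column-wise as meta-features.
n_patches = 200
y = rng.integers(0, 2, size=n_patches)          # ground-truth patch labels
base_probs = np.clip(
    y[:, None] * 0.7 + rng.normal(0.15, 0.2, size=(n_patches, 3)), 0.0, 1.0
)

# Meta-learner: logistic regression trained on the combined probability
# outputs of the base models.
meta = LogisticRegression().fit(base_probs, y)

# The ensemble's necrosis probability is the meta-learner's positive-class
# output, which can then be thresholded like any base model's output.
ensemble_prob = meta.predict_proba(base_probs)[:, 1]
```

In practice the meta-learner would be fitted on held-out validation predictions rather than the base models' training outputs, to avoid leaking overfitted confidences into the ensemble.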
T = arg max_t F1(t)    (6)
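Eq. (6) amounts to a sweep over candidate decision thresholds, keeping the one that maximises the F1-score on each validation fold. The sketch below uses synthetic fold data, and the 0.01 step size is an assumption, so it illustrates the procedure rather than reproducing the authors' code:

```python
import numpy as np
from sklearn.metrics import f1_score

def optimal_threshold(y_true, y_prob, thresholds=np.arange(0.01, 1.0, 0.01)):
    """Return the decision threshold T maximising the F1-score (Eq. 6)."""
    scores = [
        f1_score(y_true, (y_prob >= t).astype(int), zero_division=0)
        for t in thresholds
    ]
    return float(thresholds[int(np.argmax(scores))])

# Hypothetical per-fold validation labels and predicted probabilities
rng = np.random.default_rng(1)
fold_thresholds = []
for _ in range(3):
    y_true = rng.integers(0, 2, size=500)
    y_prob = np.clip(y_true * 0.6 + rng.normal(0.2, 0.15, size=500), 0.0, 1.0)
    fold_thresholds.append(optimal_threshold(y_true, y_prob))

# "Mean 3-fold threshold" variant: average the per-fold optima before
# applying them to the validation folds and hold-out test set.
mean_threshold = float(np.mean(fold_thresholds))
```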
Table 1. 3-fold averaged results for the DenseNet-161 model, the hard negative mining model and the DenseNet-161 with augmentations model, with reported mean sensitivity, specificity and F1-scores averaged across all three folds. The highest score for a metric is highlighted in bold for the test set and italicised for the validation set. Plus/minus values show the difference between the mean and the highest and lowest results from the 3-fold experiments, for each model.
| Model | Set | Sensitivity | Specificity | F1 score |
|---|---|---|---|---|
| DenseNet-161 | Validation | 0.928 (+0.029/−0.004) | 0.928 (+0.024/−0.032) | 0.724 (+0.060/−0.096) |
| | Test | 0.939 (+0.003/−0.004) | 0.907 (+0.020/−0.019) | 0.404 (+0.053/−0.049) |
| Hard negative model | Validation | 0.906 (+0.033/−0.040) | 0.943 (+0.021/−0.030) | 0.753 (+0.060/−0.096) |
| | Test | 0.917 (+0.013/−0.010) | 0.926 (+0.011/−0.016) | 0.449 (+0.053/−0.049) |
| DenseNet-161 with augmentations | Validation | 0.930 (+0.029/−0.052) | 0.922 (+0.018/−0.022) | 0.710 (+0.041/−0.075) |
| | Test | 0.944 (+0.014/−0.008) | 0.900 (+0.020/−0.017) | 0.389 (+0.046/−0.041) |
| Ensemble | Validation | 0.910 (+0.051/−0.037) | 0.955 (+0.029/−0.038) | 0.793 (+0.011/−0.009) |
| | Test | 0.924 (+0.022/−0.039) | 0.943 (+0.031/−0.031) | 0.535 (+0.028/−0.025) |
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Scientific Reports | (2022) 12:10634 | https://doi.org/10.1038/s41598-022-13928-1
After applying optimal thresholds and post-processing. The post-processing was applied to the results after obtaining optimal thresholds for each fold; in this case, for the DenseNet-161 model and the ensemble model. Results are presented in Table 2. From this table it is clear that post-processing has a significant impact, especially on specificity and F1-scores. The best sensitivities for both validation and test sets were found in the DenseNet-161 model results. However, the best performing specificity result was from DenseNet-161 (threshold per fold + post-processed), with 0.992 and 0.984 for validation and test, respectively. As a result, the highest performing test F1-score also came from this model (0.708).
From Table 1, it can be seen that the models produced sub-optimal F1 scores on the test data, suggesting that the models, based on a 50% probability decision threshold, did not generalise at an optimal standard. However, Table 2 demonstrates that after thresholding and post-processing the F1 scores significantly improve, suggesting that higher decision thresholds may be required when applied to unseen data. This could be due to textures and different colours, introduced by the staining process, residing in the test data. As a result, these unseen artefacts could lead to an increase in low confidence necrosis predictions, thus producing false positives. This also suggests that structures related to necrosis are learnt well using the training data, as true positive necrosis predictions tend to produce high confidence predictions. Table 3 depicts the confusion matrix value results after applying optimal thresholds and the single tile removal post-processing. It can be seen that after applying optimal thresholds and post-processing, the number of false positives (FPs) significantly decreases for both the DenseNet-161 and ensemble models, with a slight reduction in the number of true positives (TPs).
Additionally, accuracy and the average of the sensitivity and specificity (denoted as Sensitivity/Specificity Average) are also introduced in Table 2. The Sensitivity/Specificity Average can be directly compared to the results from Sharma et al.22, where the authors averaged their necrosis and non-necrotic classification results
Table 2. 3-fold averaged results after applying optimal thresholds and the single tile removal post-processing to the DenseNet-161, Sharma et al.'s22 proposed CNN and ensemble models, for the validation and test sets. Presented in this table are the mean sensitivity, specificity and F1-scores averaged across all three folds. "Threshold per fold + post-processed" is where optimal thresholds derived from three-way cross-validation followed by single tile removal were applied to all validation folds and the hold-out test set. "Mean threshold + post-processed" is where the optimal thresholds for each fold have been averaged and then applied to each fold's validation and hold-out test set. The highest score for a certain metric is highlighted in bold for the test set and italicised for the validation set. Plus/minus values show the difference between the mean and the highest and lowest results from the 3-fold experiments, for each model.
| Model | Set | Sensitivity | Specificity | F1 score | Accuracy | Sens./Spec. Avg. |
|---|---|---|---|---|---|---|
| DenseNet-161 | Validation | 0.928 (+0.029/−0.004) | 0.928 (+0.024/−0.032) | 0.724 (+0.060/−0.096) | 0.927 (+0.017/−0.026) | 0.928 (+0.010/−0.009) |
| | Test | 0.939 (+0.003/−0.004) | 0.907 (+0.020/−0.019) | 0.404 (+0.053/−0.049) | 0.908 (+0.019/−0.019) | 0.923 (+0.011/−0.012) |
| Sharma et al. proposed CNN | Validation | 0.794 (+0.078/−0.097) | 0.954 (+0.029/−0.031) | 0.727 (+0.031/−0.020) | 0.938 (+0.015/−0.021) | 0.874 (+0.023/−0.034) |
| | Test | 0.719 (+0.046/−0.060) | 0.951 (+0.004/−0.005) | 0.455 (+0.025/−0.014) | 0.944 (+0.002/−0.004) | 0.835 (+0.023/−0.028) |
| DenseNet-161 (threshold per fold + post-processed) | Validation | 0.808 (+0.022/−0.032) | 0.992 (+0.001/−0.001) | 0.860 (+0.015/−0.011) | 0.973 (+0.003/−0.005) | 0.900 (+0.011/−0.015) |
| | Test | 0.807 (+0.028/−0.032) | 0.984 (+0.001/−0.001) | 0.708 (+0.010/−0.011) | 0.978 (+0.000/−0.000) | 0.896 (+0.013/−0.016) |
| DenseNet-161 (mean 3-fold threshold + post-processed) | Validation | 0.813 (+0.042/−0.069) | 0.991 (+0.006/−0.005) | 0.855 (+0.020/−0.016) | 0.972 (+0.003/−0.005) | 0.902 (+0.018/−0.032) |
| | Test | 0.807 (+0.010/−0.011) | 0.983 (+0.004/−0.004) | 0.702 (+0.033/−0.033) | 0.977 (+0.004/−0.004) | 0.895 (+0.005/−0.003) |
| Ensemble | Validation | 0.910 (+0.051/−0.037) | 0.955 (+0.029/−0.038) | 0.793 (+0.078/−0.112) | 0.950 (+0.022/−0.029) | 0.933 (+0.006/−0.004) |
| | Test | 0.924 (+0.022/−0.039) | 0.943 (+0.031/−0.031) | 0.535 (+0.133/−0.120) | 0.943 (+0.029/−0.029) | 0.934 (+0.011/−0.007) |
| Ensemble (threshold per fold + post-processed) | Validation | 0.853 (+0.009/−0.015) | 0.990 (+0.001/−0.001) | 0.878 (+0.011/−0.006) | 0.976 (+0.002/−0.004) | 0.922 (+0.004/−0.008) |
| | Test | 0.846 (+0.050/−0.060) | 0.981 (+0.031/−0.051) | 0.704 (+0.026/−0.019) | 0.977 (+0.003/−0.004) | 0.914 (+0.022/−0.028) |
| Ensemble (mean 3-fold threshold + post-processed) | Validation | 0.868 (+0.070/−0.048) | 0.982 (+0.012/−0.018) | 0.851 (+0.022/−0.043) | 0.969 (+0.006/−0.008) | 0.925 (+0.015/−0.022) |
| | Test | 0.871 (+0.032/−0.059) | 0.974 (+0.015/−0.014) | 0.673 (+0.089/−0.085) | 0.971 (+0.012/−0.012) | 0.923 (+0.026/−0.018) |
producing 81.44%. When using our dataset, their proposed CNN model produced 83.5% for the Sensitivity/Specificity Average.
Comparatively, our 3-fold scores range from 89.5 to 93.4%, thus producing the highest accuracies on this metric for necrosis versus non-necrotic (negative) classification.
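As a consistency check, the reported test metrics for the best model can be recovered directly from the confusion-matrix counts given in Table 3 for DenseNet-161 (threshold per fold + post-processed):

```python
def classification_metrics(tn, tp, fn, fp):
    """Derive the reported metrics from raw confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                     # recall on necrosis
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sens_spec_avg = (sensitivity + specificity) / 2  # Sharma et al.'s metric
    return sensitivity, specificity, f1, accuracy, sens_spec_avg

# 3-fold averaged hold-out test counts for DenseNet-161 (threshold per fold
# + post-processed), taken from Table 3
sens, spec, f1, acc, avg = classification_metrics(tn=33505, tp=929, fn=222, fp=546)
# Rounded to three decimals, these recover Table 2's reported test values
# for this model: 0.807 / 0.984 / 0.708 / 0.978 / 0.896.
```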
A set of exemplar spatial confusion maps, in which the results of the tile-based classification are overlaid onto the original WSIs from the hold-out test set, is shown in Fig. 6. In this figure, we depict results from the DenseNet-161 model and the ensemble model. The left side shows results with the standard 50% probability threshold applied, whereas the right side shows the optimal validation threshold applied to the fold 3 test set results, with single tile removal. It is apparent that there are far fewer FPs after the post-processing for both the DenseNet-161 and ensemble models.
The post-processing improved the DenseNet-161 results, making it the top performing model. This is attributed to spatially sparse FP predictions in slides. All models experienced a slight reduction in sensitivity after applying the optimal thresholds and post-processing. There is a slight increase in false negatives, especially around the borders of the necrosis clusters (TPs) in the images. However, the ensemble model spatial confusion matrices depict fewer FN predictions in the middle of the clusters, in comparison to DenseNet-161. This is important and allows us to understand the limitations of patch-based approaches, and may in fact highlight disagreement between annotators. We are aware that boundary cases may exist around the borders of these clusters due to the annotation and patch extraction process. The deep learning models may reflect these uncertainties by producing less confident predictions in these areas. The ensemble model also demonstrates the power of combining multiple different models, mimicking the combination of different "teachers" or "experts", as there are fewer FNs in the middle of the necrosis clusters compared to the DenseNet-161 model.
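A spatial confusion map of the kind shown in Fig. 6 can be sketched by comparing the ground-truth and predicted tile masks and colouring each tile by its confusion category (TP red, FN green, FP yellow, TN transparent). The tiny masks below are hypothetical; a real map would have one entry per 32 × 32 tile of the WSI:

```python
import numpy as np

# Colour code matching Fig. 6: TP red, FN green, FP yellow, TN transparent
COLOURS = {
    "TP": (255, 0, 0),
    "FN": (0, 255, 0),
    "FP": (255, 255, 0),
}

def spatial_confusion_map(truth, pred):
    """Build an RGBA overlay (one pixel per tile) from binary tile masks."""
    h, w = truth.shape
    overlay = np.zeros((h, w, 4), dtype=np.uint8)  # alpha 0 = transparent TN
    categories = {
        "TP": (truth == 1) & (pred == 1),
        "FN": (truth == 1) & (pred == 0),
        "FP": (truth == 0) & (pred == 1),
    }
    for name, where in categories.items():
        overlay[where, :3] = COLOURS[name]
        overlay[where, 3] = 255  # fully opaque for TP/FN/FP tiles
    return overlay

truth = np.array([[1, 1, 0], [0, 1, 0]])
pred = np.array([[1, 0, 1], [0, 1, 0]])
overlay = spatial_confusion_map(truth, pred)
```

The resulting RGBA array can then be alpha-blended over a downsized thumbnail of the WSI to produce the overlays shown in the figure.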
Discussion
A necrosis detection method was created after investigating a pretrained DenseNet-161 model, hard negative mining and ensemble models. We further investigated the application of optimal thresholds and further post-processing. This is the first known necrosis detection model for cPWTs and, in general, cSTS. As a result, we also produce state-of-the-art performance metrics, especially regarding accuracy and sensitivity/specificity averages for necrosis detection in cPWTs.
The post-processing, alongside applying optimal thresholds, allowed the DenseNet-161 model to produce the best F1-scores: the key metric for evaluation in this work. This is most likely due to the DenseNet-161 model generating more "sparse" FP predictions than the ensemble model, where there are more clustered FP predictions. Nevertheless, although producing the highest F1-scores, this model was only marginally better than the ensemble model with post-processing.
However, upon inspection of the spatial confusion matrix heatmaps, it was observed that the two optimised models differed slightly, especially in regards to false negative (FN) predictions, with slightly fewer FNs inside the true positive clusters for the ensemble model. This further demonstrates the difference in learning between alternative types of machine learning models, such as deep learning models and ensembles with logistic regression as their backbone. This paper demonstrates that deep learning models can be successfully used as a diagnostic support tool for grading cPWT in cSTS. Necrosis detection should also be investigated with other cSTS subtypes.
The study presented here has the potential to improve the veterinary anatomic pathology workflow. The application of deep learning-based methods to diagnostic veterinary pathology will improve the accuracy of diagnosis and allow pathology laboratories to handle larger clinical caseloads. These methods could also be
Table 3. 3-fold averaged confusion matrix value results after applying optimal thresholds and the single tile removal post-processing to the DenseNet-161 and ensemble models, for the hold-out test sets. Presented in this table are the true negative (TN), true positive (TP), false negative (FN) and false positive (FP) values averaged across all three folds, rounded to the nearest whole figure. Plus/minus values show the difference between the mean and the highest and lowest results from the 3-fold experiments, for each model.
| Model | TN | TP | FN | FP |
|---|---|---|---|---|
| DenseNet-161 | 30880 (+672/−651) | 1081 (+3/−5) | 70 (+5/−3) | 3171 (+651/−672) |
| DenseNet-161 (threshold per fold + post-processed) | 33505 (+27/−19) | 929 (+32/−37) | 222 (+37/−32) | 546 (+19/−27) |
| DenseNet-161 (mean 3-fold threshold + post-processed) | 33479 (+143/−126) | 929 (+12/−12) | 222 (+12/−12) | 572 (+126/−143) |
| Ensemble | 32119 (+1053/−1048) | 1063 (+26/−44) | 88 (+44/−26) | 1932 (+1048/−1053) |
| Ensemble (threshold per fold + post-processed) | 33221 (+101/−184) | 983 (+57/−69) | 168 (+69/−57) | 830 (+184/−101) |
| Ensemble (mean 3-fold threshold + post-processed) | 33180 (+504/−460) | 1002 (+37/−68) | 149 (+68/−37) | 871 (+460/−504) |
Figure6. Sample Whole Slide Image (WSI) spatial confusion maps before and aer applying optimal threshold
(determined from the fold 3 validation set) and post-processing; removing single tile predictions. e le side
images shows predictions from the DenseNet-161 and ensemble models with the standard 50% probability
decision thresholds. e right side shows predictions aer applying the optimal threshold and post-processing.
True positives (TP) are displayed in red, false negatives (FN) in green, false positives (FP) in yellow and true
negatives (TN) in clear.
applied to other imaging modalities in veterinary health, including clinical veterinary pathology, such as cytology, and diagnostic imaging, such as MRI or radiography.
Future work
Future work should include exploring further alternative convolutional neural networks that have not previously been thoroughly investigated for use in digital pathology. Although this necrosis model has been developed and trained using cPWT, it would also be of interest to apply these necrosis detection models to other closely related cSTS subtypes.
Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Received: 13 March 2022; Accepted: 30 May 2022
References
1. Bostock, D. & Dye, M. Prognosis after surgical excision of canine fibrous connective tissue sarcomas. Vet. Pathol. 17, 581–588 (1980).
2. Dernell, W. S., Withrow, S. J., Kuntz, C. A. & Powers, B. E. Principles of treatment for soft tissue sarcoma. Clin. Tech. Small Anim. Pract. 13, 59–64 (1998).
3. Ehrhart, N. Soft-tissue sarcomas in dogs: a review. J. Am. Anim. Hosp. Assoc. 41, 241–246 (2005).
4. Mayer, M. N. & LaRue, S. M. Soft tissue sarcomas in dogs. Can. Vet. J. 46, 1048 (2005).
5. Cavalcanti, E. B. et al. Correlation of clinical, histopathological and histomorphometric features of canine soft tissue sarcomas. Braz. J. Vet. Pathol. 14, 151–158 (2021).
6. Torrigiani, F., Pierini, A., Lowe, R., Simčič, P. & Lubas, G. Soft tissue sarcoma in dogs: a treatment review and a novel approach using electrochemotherapy in a case series. Vet. Comp. Oncol. 17, 234–241 (2019).
7. Dennis, M. et al. Prognostic factors for cutaneous and subcutaneous soft tissue sarcomas in dogs. Vet. Pathol. 48, 73–84 (2011).
8. Kuntz, C. et al. Prognostic factors for surgical treatment of soft-tissue sarcomas in dogs: 75 cases (1986–1996). J. Am. Vet. Med. Assoc. 211, 1147–1151 (1997).
9. McSporran, K. Histologic grade predicts recurrence for marginally excised canine subcutaneous soft tissue sarcomas. Vet. Pathol. 46, 928–933 (2009).
10. Bray, J. P., Polton, G. A., McSporran, K. D., Bridges, J. & Whitbread, T. M. Canine soft tissue sarcoma managed in first opinion practice: outcome in 350 cases. Vet. Surg. 43, 774–782 (2014).
11. Avallone, G. et al. Review of histological grading systems in veterinary medicine. Vet. Pathol. 58, 809–828 (2021).
12. Avallone, G. et al. The spectrum of canine cutaneous perivascular wall tumors: morphologic, phenotypic and clinical characterization. Vet. Pathol. 44, 607–620 (2007).
13. Loures, F. et al. Histopathology and immunohistochemistry of peripheral neural sheath tumor and perivascular wall tumor in dog. Arq. Bras. Med. Vet. Zootec. 71, 1100–1106 (2019).
14. Xing, F., Xie, Y., Su, H., Liu, F. & Yang, L. Deep learning in microscopy image analysis: a survey. IEEE Trans. Neural Netw. Learn. Syst. 29, 4550–4568 (2017).
15. Ing, N. et al. Semantic segmentation for prostate cancer grading by convolutional neural networks. In Medical Imaging 2018: Digital Pathology, vol. 10581, 105811B (International Society for Optics and Photonics, 2018).
16. Ertosun, M. G. & Rubin, D. L. Automated grading of gliomas using deep learning in digital pathology images: a modular approach with ensemble of convolutional neural networks. In AMIA Annual Symposium Proceedings, vol. 2015, 1899 (American Medical Informatics Association, 2015).
17. Madabhushi, A. & Lee, G. Image analysis and machine learning in digital pathology: challenges and opportunities. Med. Image Anal. 33, 170–175 (2016).
18. Klein, C. et al. Artificial intelligence for solid tumour diagnosis in digital pathology. Br. J. Pharmacol. 178, 4291–4315 (2021).
19. Sethy, P. K. & Behera, S. K. Automatic classification with concatenation of deep and handcrafted features of histological images for breast carcinoma diagnosis. Multimed. Tools Appl. 81, 9631–9643 (2022).
20. Cross, S., Dennis, T. & Start, R. Telepathology: current status and future prospects in diagnostic histopathology. Histopathology 41, 91–109 (2002).
21. Sharma, H. et al. Appearance-based necrosis detection using textural features and SVM with discriminative thresholding in histopathological whole slide images. In 2015 IEEE 15th International Conference on Bioinformatics and Bioengineering (BIBE), 1–6 (IEEE, 2015).
22. Sharma, H., Zerbe, N., Klempert, I., Hellwich, O. & Hufnagl, P. Deep convolutional neural networks for automatic classification of gastric carcinoma using whole slide images in digital histopathology. Comput. Med. Imaging Graph. 61, 2–13 (2017).
23. Arunachalam, H. B. et al. Viable and necrotic tumor assessment from whole slide images of osteosarcoma using machine-learning and deep-learning models. PLoS One 14, e0210706 (2019).
24. Litjens, G. Automated slide analysis platform (ASAP) (2017).
25. Rai, T. et al. Can ImageNet feature maps be applied to small histopathological datasets for the classification of breast cancer metastatic tissue in whole slide images? In Medical Imaging 2019: Digital Pathology, vol. 10956, 109560V (International Society for Optics and Photonics, 2019).
26. Rai, T. et al. An investigation of aggregated transfer learning for classification in digital pathology. In Medical Imaging 2019: Digital Pathology, vol. 10956, 109560U (International Society for Optics and Photonics, 2019).
27. Huang, G., Liu, Z., Van Der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4700–4708 (2017).
28. Talo, M. Automated classification of histopathology images using transfer learning. Artif. Intell. Med. 101, 101743 (2019).
29. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
30. Shorten, C. & Khoshgoftaar, T. M. A survey on image data augmentation for deep learning. J. Big Data 6, 1–48 (2019).
31. Li, M. et al. Deep instance-level hard negative mining model for histopathology images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 514–522 (Springer, 2019).
32. Yang, Y. Temporal Data Mining via Unsupervised Ensemble Learning (Elsevier, 2016).
33. Kang, J. & Gwak, J. Ensemble of instance segmentation models for polyp segmentation in colonoscopy images. IEEE Access 7, 26440–26447 (2019).
34. Ataloglou, D., Dimou, A., Zarpalas, D. & Daras, P. Fast and precise hippocampus segmentation through deep convolutional neural network ensembles and transfer learning. Neuroinformatics 17, 563–582 (2019).
35. Qummar, S. et al. A deep learning ensemble approach for diabetic retinopathy detection. IEEE Access 7, 150530–150539 (2019).
36. Spizhevoi, A. & Rybnikov, A. OpenCV 3 Computer Vision with Python Cookbook: Leverage the Power of OpenCV 3 and Python to Build Computer Vision Applications (Packt Publishing Ltd, 2018).
Acknowledgements
We would like to thank the Doctoral College, University of Surrey (UK), the National Physical Laboratory (UK) and Zoetis, through the vHive initiative, for making this research possible.
Author contributions
T.R. conducted all experiments. A.M. and B.B. conducted the annotation process. T.R., M.B., R.L., K.W. and S.T. analysed the results. All authors contributed to manuscript preparation and reviewed the manuscript.
Competing interests
The authors declare no competing interests.
Additional information
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1038/s41598-022-13928-1.
Correspondence and requests for materials should be addressed to T.R.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2022
... There were several motivations for this study. We previously published [27] the first report on the use of deep learning to detect cSTSs in haematoxylin and eosin (H&E)-stained whole slides. However, the study reported here builds on the initial study and focuses on grading. ...
... In this study, we present a detailed report describing the automatic assessment of WSIs for the detection and quantification of necrosis in cSTSs, providing further insight and analysis from our baseline approach as previously published [27]. The experiments presented in this study confirmed that DenseNet161 is able to recognise areas of necrosis with high accuracy (92.7%). ...
... In this study, we present a detailed report describing the automatic assessment of WSIs for the detection and quantification of necrosis in cSTSs, providing further insight and analysis from our baseline approach as previously published [27]. The experiments pre-sented in this study confirmed that DenseNet161 is able to recognise areas of necrosis with high accuracy (92.7%). ...
Article
Full-text available
Simple Summary Canine soft-tissue sarcomas are a group of tumours that arise from the skin and subcutaneous connective tissue. The most common method used to predict the behaviour of these tumours is grading. The grading system used for soft-tissue sarcomas is derived from a combined score calculated by evaluating the mitotic count, percentage of tumour necrosis and degree of cellular differentiation. However, these parameters are highly subjective and a high inter-observer variability has been reported in grading these tumours, which can result in complications regarding treatment plans. Manual identification of areas of necrosis is a time-consuming task that is prone to observer error. Artificial-intelligence algorithms and, in particular, machine learning, can help improve grading by automatically detecting regions of necrosis. The aim of this study was to differentiate image regions in order to automatically identify tumour necrosis in digitised canine soft-tissue sarcoma slides. This method showed an accuracy of 92.7% which represents the number of correctly classified data instances over the total number of data instances. Therefore, the proposed method is a promising tool to minimise human error in the evaluation of necrosis in soft-tissue sarcomas, and hence increase the efficiency and accuracy of histopathological grading of canine soft-tissue sarcomas. Abstract The definitive diagnosis of canine soft-tissue sarcomas (STSs) is based on histological assessment of formalin-fixed tissues. Assessment of parameters, such as degree of differentiation, necrosis score and mitotic score, give rise to a final tumour grade, which is important in determining prognosis and subsequent treatment modalities. However, grading discrepancies are reported to occur in human and canine STSs, which can result in complications regarding treatment plans. 
The introduction of digital pathology has the potential to help improve STS grading via automated determination of the presence and extent of necrosis. The detected necrotic regions can be factored in the grading scheme or excluded before analysing the remaining tissue. Here we describe a method to detect tumour necrosis in histopathological whole-slide images (WSIs) of STSs using machine learning. Annotated areas of necrosis were extracted from WSIs and the patches containing necrotic tissue fed into a pre-trained DenseNet161 convolutional neural network (CNN) for training, testing and validation. The proposed CNN architecture reported favourable results, with an overall validation accuracy of 92.7% for necrosis detection which represents the number of correctly classified data instances over the total number of data instances. The proposed method, when vigorously validated represents a promising tool to assist pathologists in evaluating necrosis in canine STS tumours, by increasing efficiency, accuracy and reducing inter-rater variation.
... In this study we used a canine Perivascular Wall Tumour (cPWT) dataset to train our models [11] [12][13] [14]. A veterinary pathologist diagnosed canine Soft Tissue Sarcoma (cSTS) using histology slides collected from the Department of Microbiology, Immunology and Pathology, Colorado State University. ...
... These scores ranged from 0 to 1, where 1 would highlight the model is 100% certain that the candidate is mitosis and 0.01 would describe a prediction that is very low in confidence. We optimised our models based on the F1-score [44][45][46]. The probability thresholds t ranged from 0.01 to 1, and so choosing the optimal threshold T for the F1-score F1 can be represented formally as: ...
Article
Full-text available
Simple Summary Performing a mitosis count (MC) is essential in grading canine Soft Tissue Sarcoma (cSTS) and canine Perivascular Wall Tumours (cPWTs), although it is subject to inter- and intra-observer variability. To enhance standardisation, an artificial intelligence mitosis detection approach was investigated. A two-step annotation process was utilised with a pre-trained Faster R-CNN model, refined through veterinary pathologists’ reviews of false positives, and subsequently optimised using an F1-score thresholding method to maximise accuracy measures. The study achieved a best F1-score of 0.75, demonstrating competitiveness in the field of canine mitosis detection. Abstract Performing a mitosis count (MC) is the diagnostic task of histologically grading canine Soft Tissue Sarcoma (cSTS). However, mitosis count is subject to inter- and intra-observer variability. Deep learning models can offer a standardisation in the process of MC used to histologically grade canine Soft Tissue Sarcomas. Subsequently, the focus of this study was mitosis detection in canine Perivascular Wall Tumours (cPWTs). Generating mitosis annotations is a long and arduous process open to inter-observer variability. Therefore, by keeping pathologists in the loop, a two-step annotation process was performed where a pre-trained Faster R-CNN model was trained on initial annotations provided by veterinary pathologists. The pathologists reviewed the output false positive mitosis candidates and determined whether these were overlooked candidates, thus updating the dataset. Faster R-CNN was then trained on this updated dataset. An optimal decision threshold was applied to maximise the F1-score predetermined using the validation set and produced our best F1-score of 0.75, which is competitive with the state of the art in the canine mitosis domain.
... Convolutional neural networks have been used in various applications related to breast cancer, such as in medical imaging for the diagnosis and prognosis of breast cancer. In veterinary medicine, several manuscripts have described the potential of this technology, but no consistent efforts have been undertaken regarding canine mammary tumors [69][70][71][72][73][74][75]. Deep learning-based algorithms can assist the pathologist in classifying tumors on standard hematoxylin and eosin images. ...
Article
Full-text available
Simple Summary Digital pathology (DP) and computer-aided diagnosis (CAD) are rapidly evolving fields that have great potential for improving the accuracy and efficiency of cancer diagnosis, including that of canine mammary tumors (CMTs), the most common neoplasm in female dogs. The work presents a study on the development of CAD systems for the automated classification of CMTs utilizing convolutional neural networks (CNNs) to extract features from histopathological images of CMTs and classify them into benign or malignant tumors. The study shows that the proposed framework can accurately distinguish between benign and malignant CMTs, with testing accuracies ranging from 0.63 to 0.85. The study emphasizes how digital pathology and CAD could help veterinarians and pathologists in accurately diagnosing the tumor type, which is crucial in determining the optimal course of treatment. Overall, digital pathology and CAD are promising tools that could improve the accuracy and efficiency of cancer diagnosis, including that of canine mammary tumors. Abstract Histopathology, the gold-standard technique in classifying canine mammary tumors (CMTs), is a time-consuming process, affected by high inter-observer variability. Digital (DP) and Computer-aided pathology (CAD) are emergent fields that will improve overall classification accuracy. In this study, the ability of the CAD systems to distinguish benign from malignant CMTs has been explored on a dataset—namely CMTD—of 1056 hematoxylin and eosin JPEG images from 20 benign and 24 malignant CMTs, with three different CAD systems based on the combination of a convolutional neural network (VGG16, Inception v3, EfficientNet), which acts as a feature extractor, and a classifier (support vector machines (SVM) or stochastic gradient boosting (SGB)), placed on top of the neural net. 
Based on a human breast cancer dataset (i.e., BreakHis) (accuracy from 0.86 to 0.91), our models were applied to the CMT dataset, showing accuracy from 0.63 to 0.85 across all architectures. The EfficientNet framework coupled with SVM resulted in the best performances with an accuracy from 0.82 to 0.85. The encouraging results obtained by the use of DP and CAD systems in CMTs provide an interesting perspective on the integration of artificial intelligence and machine learning technologies in cancer-related research.
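The "frozen CNN feature extractor plus shallow classifier" design described in this abstract can be sketched without any deep learning dependencies. Here both components are stand-ins: `extract_features` takes the place of a pre-trained CNN such as EfficientNet, and a nearest-centroid rule stands in for the SVM/SGB head; the toy "images" and labels are invented for illustration.

```python
# Sketch of the two-stage pipeline: a frozen feature extractor maps each
# image to a fixed-length vector, and a shallow classifier trained on
# those vectors makes the benign/malignant call.

def extract_features(image):
    """Stand-in for a frozen CNN: map an 'image' (flat list of pixel
    intensities) to a fixed-length feature vector (mean, max)."""
    return (sum(image) / len(image), max(image))

class NearestCentroid:
    """Stand-in for the SVM/SGB head placed on top of the extractor."""
    def fit(self, features, labels):
        self.centroids = {}
        for label in set(labels):
            pts = [f for f, y in zip(features, labels) if y == label]
            self.centroids[label] = tuple(sum(v) / len(pts) for v in zip(*pts))
        return self

    def predict(self, feature):
        def dist(c):
            return sum((a - b) ** 2 for a, b in zip(feature, c))
        return min(self.centroids, key=lambda label: dist(self.centroids[label]))

# Toy patches: benign examples are dim, malignant examples are bright.
train_images = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.3],
                [0.8, 0.9, 0.7], [0.9, 0.8, 0.9]]
train_labels = ["benign", "benign", "malignant", "malignant"]

clf = NearestCentroid().fit([extract_features(x) for x in train_images],
                            train_labels)
prediction = clf.predict(extract_features([0.85, 0.9, 0.8]))
```

The design choice the paper exploits is that only the small head is trained on the veterinary data, so the limited CMT dataset does not have to support training a full CNN from scratch.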
... Another issue is the subjective nature of the inflammation score-although the differentiation score it replaces in the Trojani scheme [32] is also subjective, it would be advantageous if all criteria within any grading system were objective, readily obtainable from routinely-stained haematoxylin and eosin sections and easy to reproduce, thereby reducing variability between pathologists and laboratories [48,49]. With the advent and increasing adoption of image analysis within veterinary pathology [50] it may be that artificial intelligence plays an important role in quantifying criteria such as inflammation, as well as other features such as the extent of necrosis [51], and mitotic counts [52] with increased accuracy. ...
Article
Full-text available
Simple Summary Soft tissue sarcomas are a common form of cancer arising in the skin and connective tissues of domestic cats. Soft tissue sarcomas encompass a group of different histological subtypes of tumours, which can behave in a range of different ways in the patient. In dogs and in humans, this group of tumours can be given a histological score (“grade”) at the time of diagnosis, which is prognostic, but there is no equivalent, well-established grading system for these tumours in cats. This review looks at soft tissue sarcomas in terms of which histological subtypes of tumour should be included in this group, and how pathologists approach their grading, comparing feline tumours with their human and canine counterparts. Abstract Soft tissue sarcomas are one of the most commonly diagnosed tumours arising in the skin and subcutis of our domestic cats, and are malignant neoplasms with a range of histological presentations and potential biological behaviours. However, unlike their canine and human counterparts, there is no well-established histological grading system for pathologists to apply to these tumours, in order to provide a more accurate and refined prognosis. The situation is further complicated by the presence of feline injection site sarcomas as an entity, as well as confusion over terminology for this group of tumours and which histological types should be included. There is also an absence of large scale studies. This review looks at these tumours in domestic cats, their classification and histological grading, with comparisons to the human and canine grading system.
Article
Full-text available
Breast cancer is the second leading cause of cancer death among women. In order to prevent avoidable deaths, early detection is extremely necessary. Malignancy evaluation of tissue biopsies, however, is complicated and based on observer subjectivity. In addition, histological images stained with hematoxylin and eosin (H&E) exhibit a highly variable appearance, even at the same degree of malignancy. In this paper, we propose a classification model based on KNN with a combination of deep and handcrafted features using histological images to diagnose breast cancer. Here, four malignancy levels are considered, namely normal, benign, in situ, and invasive. The classification of the four malignancy levels is examined with three classifiers, three sets of deep features and three handcrafted features. The deep features are extracted from the fc6 layer of three pre-trained networks: alexnet, vgg16, and vgg19. The handcrafted features are GLCM, HOG, and LBP. After evaluation, the top-performing classifier, deep feature, and handcrafted feature are combined to frame the classification model. The classification model based on fine-KNN combining the features of vgg16 and LBP achieved a satisfactory diagnostic effectiveness (accuracy) of 84.2% and an area under the curve (AUC) of 0.85. Further, the likelihood ratio for positive results (LR+) is greater than 10, i.e., 12.5, which indicates that the proposed method makes a significant contribution to diagnosis and is a good diagnostic test.
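The handcrafted side of the pipeline in this abstract can be illustrated with the Local Binary Pattern (LBP) code of a single pixel, followed by the concatenation of handcrafted and deep features into one vector for a k-NN classifier. The 3x3 patch and the "deep" vector below are invented placeholders; in the paper the deep part would be the vgg16 fc6 activations.

```python
# LBP code of the centre pixel of a 3x3 patch, plus concatenation of
# handcrafted and (placeholder) deep features into one k-NN input vector.

def lbp_code(patch):
    """8-bit LBP code of the centre of a 3x3 patch: each neighbour,
    visited clockwise from the top-left, contributes one bit
    (1 if it is >= the centre value)."""
    centre = patch[1][1]
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    return sum((1 << i) for i, v in enumerate(neighbours) if v >= centre)

patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
code = lbp_code(patch)

deep_features = [0.12, 0.80, 0.33]      # placeholder CNN activations
handcrafted = [code / 255.0]            # normalised LBP code
combined = deep_features + handcrafted  # single vector fed to k-NN
```

In practice the handcrafted descriptor would be a histogram of such codes over a whole image region rather than one pixel's code, but the bit-pattern construction is the same.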
Article
Full-text available
Soft-tissue sarcomas (STS) represent a heterogeneous group of tumours with similar histological characteristics and biological behaviour. This study aimed to describe the correlation between clinical, histopathological and histomorphometric features of STS in dogs. Medical records were reviewed to identify all dogs in which an STS was diagnosed between 2006-2017. Thirty cases were included, and tumour samples and medical records were recovered. Most of the dogs were mixed breed (40%) and 80% of the STS were located in the subcutaneous connective tissue. Histopathological classification showed that undifferentiated sarcoma (17%) and peripheral nerve sheath tumour (30%) were the most common STS. Grade I STS were found in 50% of cases (15/30), and grade II or III tumours comprised 43% (13/30) and 7% (2/30) respectively. The mitotic index ranged from zero to 26 (5.8 ± 7.5). Increased nucleus:cytoplasm ratio was moderately associated with higher tumour grade (p = 0.05; rS = 0.361) and mitotic index (p = 0.05; rS = 0.355), while the number of microvessels was positively correlated with degree of differentiation (p = 0.05; rS = 0.362) and nuclear pleomorphism (p = 0.036; rS = 0.384). Histomorphometry proved to be useful in the evaluation of STS, representing an additional tool correlated with well-established prognostic factors (histopathological grade, degree of differentiation, nuclear pleomorphism).
Article
Full-text available
Tumour diagnosis relies on the visual examination of histological slides by pathologists through a microscope eyepiece. Digital pathology, the digitalization of histological slides at high magnification with slide scanners, has created the opportunity to extract quantitative information through image analysis. In the last decade, medical image analysis has made exceptional progress due to the development of artificial intelligence (AI) algorithms. AI has been successfully used in the field of medical imaging and more recently in digital pathology. The feasibility and usefulness of AI-assisted pathology tasks have been demonstrated in the last few years, and we can expect those developments to be applied to routine histopathology in the future. In this review, we describe and illustrate this technique and present the most recent applications in the field of tumour histopathology. LINKED ARTICLES This article is part of a themed issue on Molecular imaging ‐ visual themed issue. To view the other articles in this section visit http://onlinelibrary.wiley.com/doi/10.1111/bph.v178.21/issuetoc
Article
Full-text available
Natural Language Processing (NLP) is one of the most captivating applications of Deep Learning. In this survey, we consider how the Data Augmentation training strategy can aid in its development. We begin with the major motifs of Data Augmentation summarized into strengthening local decision boundaries, brute force training, causality and counterfactual examples, and the distinction between meaning and form. We follow these motifs with a concrete list of augmentation frameworks that have been developed for text data. Deep Learning generally struggles with the measurement of generalization and characterization of overfitting. We highlight studies that cover how augmentations can construct test sets for generalization. NLP is at an early stage in applying Data Augmentation compared to Computer Vision. We highlight the key differences and promising ideas that have yet to be tested in NLP. For the sake of practical implementation, we describe tools that facilitate Data Augmentation such as the use of consistency regularization, controllers, and offline and online augmentation pipelines, to preview a few. Finally, we discuss interesting topics around Data Augmentation in NLP such as task-specific augmentations, the use of prior knowledge in self-supervised learning versus Data Augmentation, intersections with transfer and multi-task learning, and ideas for AI-GAs (AI-Generating Algorithms). We hope this paper inspires further research interest in Text Data Augmentation.
Article
Full-text available
Diabetic Retinopathy (DR) is an ophthalmic disease that damages retinal blood vessels. DR causes impaired vision and may even lead to blindness if it is not diagnosed in the early stages. DR has five stages or classes, namely normal, mild, moderate, severe and PDR (Proliferative Diabetic Retinopathy). Normally, highly trained experts examine the colored fundus images to diagnose this fatal disease. Manual diagnosis of this condition by clinicians is tedious and error-prone. Therefore, various computer vision-based techniques have been proposed to automatically detect DR and its different stages from retina images. However, these methods are unable to encode the underlying complicated features and can only classify DR’s different stages with very low accuracy, particularly for the early stages. In this research, we used the publicly available Kaggle dataset of retina images to train an ensemble of five deep Convolutional Neural Network (CNN) models (Resnet50, Inceptionv3, Xception, Dense121, Dense169) to encode the rich features and improve the classification for different stages of DR. The experimental results show that the proposed model detects all the stages of DR, unlike current methods, and performs better compared to state-of-the-art methods on the same Kaggle dataset.
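The soft-voting idea behind the five-model ensemble in this abstract can be sketched as averaging each model's class-probability vector and taking the argmax. The probability vectors below are made up for illustration; real ones would come from the five trained networks (Resnet50, Inceptionv3, Xception, Dense121, Dense169).

```python
# Soft-voting ensemble: average per-model probability vectors over the
# five DR stages, then pick the class with the highest mean probability.

def ensemble_predict(model_probs):
    """Return (winning class index, averaged probability vector)."""
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    avg = [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c]), avg

# Five models scoring one fundus image over the 5 DR stages
# (normal, mild, moderate, severe, PDR).
probs = [
    [0.10, 0.20, 0.50, 0.15, 0.05],
    [0.05, 0.25, 0.45, 0.20, 0.05],
    [0.15, 0.10, 0.55, 0.15, 0.05],
    [0.10, 0.30, 0.40, 0.15, 0.05],
    [0.10, 0.15, 0.50, 0.20, 0.05],
]
stage, averaged = ensemble_predict(probs)
```

Averaging probabilities (rather than majority-voting hard labels) lets a model that is confidently right outweigh several that are weakly wrong, which is one common motivation for soft voting.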
Article
Full-text available
Soft tissue sarcomas (STS) comprise a heterogeneous group of malignancies derived from extra-skeletal mesenchymal tissues that may show similar histopathological changes. Histopathologic patterns suggestive of perivascular wall tumors (PWT) and peripheral nerve sheath tumors (PNST) have been described. This study investigated the histogenesis in a series of 71 cases of canine STS that showed morphological compatibility with what is described for PWT and PNST. Immunohistochemistry analysis were done to CD56, S100, SMA, Desmin, Von Willebrand Factor, NSE and GFAP. Twenty-one cases (29.6%) showed histopathologic features compatible with PWT, 23 cases (32.4%) with PNST and 27 cases (38.0%) shared both histopathological features. By immunohistochemistry, 59 (83.1%) cases showed positivity only for neural markers and 12 (16.9%) had simultaneous positivity for both neural and muscle markers. PNST was the most prevalent neoplasm and none of the cases were positive for muscle markers only. The histopathologic features were not useful to define the diagnosis of PWT, since most tumors were negative for muscle markers but positive for neural markers. Due to this immunoreactivity and the morphologic features, future studies may propose guidelines for the classification of these neoplasms.
Article
Full-text available
Abstract Deep convolutional neural networks have performed remarkably well on many Computer Vision tasks. However, these networks are heavily reliant on big data to avoid overfitting. Overfitting refers to the phenomenon when a network learns a function with very high variance such as to perfectly model the training data. Unfortunately, many application domains do not have access to big data, such as medical image analysis. This survey focuses on Data Augmentation, a data-space solution to the problem of limited data. Data Augmentation encompasses a suite of techniques that enhance the size and quality of training datasets such that better Deep Learning models can be built using them. The image augmentation algorithms discussed in this survey include geometric transformations, color space augmentations, kernel filters, mixing images, random erasing, feature space augmentation, adversarial training, generative adversarial networks, neural style transfer, and meta-learning. The application of augmentation methods based on GANs are heavily covered in this survey. In addition to augmentation techniques, this paper will briefly discuss other characteristics of Data Augmentation such as test-time augmentation, resolution impact, final dataset size, and curriculum learning. This survey will present existing methods for Data Augmentation, promising developments, and meta-level decisions for implementing Data Augmentation. Readers will understand how Data Augmentation can improve the performance of their models and expand limited datasets to take advantage of the capabilities of big data.
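Two of the geometric transformations listed in this survey, horizontal flipping and 90-degree rotation, can be shown on a tiny image stored as a list of rows; the 2x2 "image" below is purely illustrative.

```python
# Toy geometric augmentations: horizontal flip and 90-degree
# counter-clockwise rotation of an image stored as a list of rows.

def hflip(image):
    """Mirror each row left-to-right."""
    return [list(reversed(row)) for row in image]

def rot90(image):
    """Rotate 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*image)][::-1]

img = [[1, 2],
       [3, 4]]
flipped = hflip(img)
rotated = rot90(img)
```

For histopathology patches these label-preserving transforms are attractive because tissue has no canonical orientation, so every flip and rotation of a patch is an equally valid training example.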
Article
Tumor grading is a method to quantify the putative clinical aggressiveness of a neoplasm based on specific histological features. A good grading system should be simple, easy to use, reproducible, and accurately segregate tumors into those with low versus high risk. The aim of this review is to summarize the histological and, when available, cytological grading systems applied in veterinary pathology, providing information regarding their prognostic impact, reproducibility, usefulness, and shortcomings. Most of the grading schemes used in veterinary medicine are developed for common tumor entities. Grading systems exist for soft tissue sarcoma, osteosarcoma, multilobular tumor of bone, mast cell tumor, lymphoma, mammary carcinoma, pulmonary carcinoma, urothelial carcinoma, renal cell carcinoma, prostatic carcinoma, and central nervous system tumors. The prognostic relevance of many grading schemes has been demonstrated, but for some tumor types the usefulness of grading remains controversial. Furthermore, validation studies are available only for a minority of the grading systems. Contrasting data on the prognostic power of some grading systems, lack of detailed instructions in the materials and methods in some studies, and lack of data on reproducibility and validation studies are discussed for the relevant grading systems. Awareness of the limitations of grading is necessary for pathologists and oncologists to use these systems appropriately and to drive initiatives for their improvement.
Article
Early and accurate diagnosis of diseases can often save lives. Diagnosis of diseases from tissue samples is performed manually by pathologists. The diagnostic process is usually time-consuming and expensive. Hence, automated analysis of tissue samples from histopathology images has critical importance for early diagnosis and treatment. Computer-aided systems can improve the quality of diagnoses and give pathologists a second opinion for critical cases. In this study, a deep learning-based transfer learning approach is proposed to classify histopathology images automatically. Two well-known and current pre-trained convolutional neural network (CNN) models, ResNet-50 and DenseNet-161, were trained and tested using color and grayscale images. DenseNet-161, tested on grayscale images, obtained the best classification accuracy of 97.89%. Additionally, the ResNet-50 pre-trained model was tested on the color images of the Kimia Path24 dataset and achieved the highest classification accuracy of 98.87%. According to the obtained results, the proposed pre-trained models can be used for the fast and accurate classification of histopathology images and assist pathologists in their daily clinical tasks.
Chapter
Histopathology image analysis can be considered a multiple instance learning (MIL) problem, where the whole slide histopathology image (WSI) is regarded as a bag of instances (i.e., patches) and the task is to assign a single class label to the WSI. However, in many real-life applications such as computational pathology, discovering the key instances that trigger the bag label is of great interest because it provides reasons for the decision made by the system. In this paper, we propose a deep convolutional neural network (CNN) model that addresses the primary task of bag classification on a histopathology image and also learns to identify the response of each instance to provide interpretable results for the final prediction. We incorporate an attention mechanism into the proposed model to transform the instances and learn attention weights that allow us to find key patches. To achieve balanced training, we introduce adaptive weighting in each training bag to explicitly adjust the weight distribution and concentrate more on the contribution of hard samples. Based on the learned attention weights, we further develop a solution to boost the classification performance by generating bags with hard negative instances. We conduct extensive experiments on colon and breast cancer histopathology data and show that our framework achieves state-of-the-art performance.
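The attention-based MIL pooling described in this chapter can be sketched as follows: each instance (patch) embedding receives a softmax-normalised attention weight, and the bag (WSI) representation is the attention-weighted sum of instance embeddings. In the paper the attention scores come from a learned network; here, for illustration, they come from a fixed scoring vector, and the patch embeddings are invented.

```python
# Attention pooling over a bag of instance embeddings: score each
# instance, softmax-normalise the scores into weights, and form the
# bag embedding as the weighted sum of instances.
import math

def attention_pool(instances, scoring_vector):
    """Return (bag embedding, attention weights) for a bag of instances."""
    scores = [sum(w * x for w, x in zip(scoring_vector, inst))
              for inst in instances]
    m = max(scores)                              # stabilise the softmax
    exp = [math.exp(s - m) for s in scores]
    total = sum(exp)
    weights = [e / total for e in exp]
    dim = len(instances[0])
    bag = [sum(weights[i] * instances[i][d] for i in range(len(instances)))
           for d in range(dim)]
    return bag, weights

# Three patch embeddings from one slide; the second is the 'key instance'.
patches = [[0.1, 0.0], [0.9, 1.0], [0.2, 0.1]]
bag_embedding, attn = attention_pool(patches, scoring_vector=[1.0, 1.0])
```

The learned weights serve double duty: they pool the bag for classification, and inspecting the highest-weighted patches gives the interpretable "key instance" output the chapter emphasises.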