Deep Learning for Identifying Metastatic Breast Cancer
Dayong Wang    Aditya Khosla*    Rishab Gargeya    Humayun Irshad    Andrew H. Beck
Beth Israel Deaconess Medical Center, Harvard Medical School
*CSAIL, Massachusetts Institute of Technology
Abstract

The International Symposium on Biomedical Imaging
(ISBI) held a grand challenge to evaluate computational
systems for the automated detection of metastatic breast
cancer in whole slide images of sentinel lymph node biop-
sies. Our team won both competitions in the grand chal-
lenge, obtaining an area under the receiver operating characteristic curve (AUC) of 0.925 for the task of whole slide image classification and a score of 0.7051 for the tumor localization task.
A pathologist independently reviewed the same images, ob-
taining a whole slide image classification AUC of 0.966 and
a tumor localization score of 0.733. Combining our deep
learning system’s predictions with the human pathologist’s
diagnoses increased the pathologist’s AUC to 0.995, rep-
resenting an approximately 85 percent reduction in human
error rate. These results demonstrate the power of using
deep learning to produce significant improvements in the
accuracy of pathological diagnoses.
1. Introduction
The medical specialty of pathology is tasked with pro-
viding definitive disease diagnoses to guide patient treat-
ment and management decisions [4]. Standardized, accu-
rate and reproducible pathological diagnoses are essential
for advancing precision medicine. Since the mid-19th cen-
tury, the primary tool used by pathologists to make diag-
noses has been the microscope [1]. Limitations of the qualitative visual analysis of microscopic images include lack of standardization, diagnostic errors, and the significant cognitive load required to manually evaluate millions of cells across hundreds of slides in a typical pathologist's workday [15, 17, 7]. Consequently, over the past several decades
there has been increasing interest in developing computa-
tional methods to assist in the analysis of microscopic im-
ages in pathology [9,8].
From October 2015 to April 2016, the International
Symposium on Biomedical Imaging (ISBI) held the Came-
lyon Grand Challenge 2016 (Camelyon16) to identify top-
performing computational image analysis systems for the
task of automatically detecting metastatic breast cancer in
digital whole slide images (WSIs) of sentinel lymph node biopsies. The evaluation of breast sentinel lymph nodes is
an important component of the American Joint Committee
on Cancer’s TNM breast cancer staging system, in which
patients with a sentinel lymph node positive for metastatic
cancer will receive a higher pathologic TNM stage than pa-
tients negative for sentinel lymph node metastasis [6], fre-
quently resulting in more aggressive clinical management,
including axillary lymph node dissection [13,14].
The manual pathological review of sentinel lymph nodes
is time-consuming and laborious, particularly in cases in
which the lymph nodes are negative for cancer or contain
only small foci of metastatic cancer. Many centers have
implemented testing of sentinel lymph nodes with immuno-
histochemistry for pancytokeratins [5], which are proteins
expressed on breast cancer cells and not normally present
in lymph nodes, to improve the sensitivity of cancer metas-
tasis detection. However, limitations of pancytokeratin immunohistochemistry testing of sentinel lymph nodes include: increased cost, increased time for slide preparation,
and increased number of slides required for pathological
review. Further, even with immunohistochemistry-stained
slides, the identification of small cancer metastases can be
tedious and inaccurate.
Computer-assisted image analysis systems have been de-
veloped to aid in the detection of small metastatic foci
from pancytokeratin-stained immunohistochemistry slides
of sentinel lymph nodes [22]; however, these systems are
not used clinically. Thus, the development of effective and
cost efficient methods for sentinel lymph node evaluation
remains an active area of research [11], as there would be
value to a high-performing system that could increase accu-
racy and reduce cognitive load at low cost.
arXiv:1606.05718v1 [q-bio.QM] 18 Jun 2016

Here, we present a deep learning-based approach for the identification of cancer metastases from whole slide images of breast sentinel lymph nodes. Our approach uses
millions of training patches to train a deep convolutional
neural network to make patch-level predictions to discrim-
inate tumor-patches from normal-patches. We then aggre-
gate the patch-level predictions to create tumor probability
heatmaps and perform post-processing over these heatmaps
to make predictions for the slide-based classification task
and the tumor-localization task. Our system won both com-
petitions at the Camelyon Grand Challenge 2016, with per-
formance approaching human level accuracy. Finally, com-
bining the predictions of our deep learning system with a
pathologist’s interpretations produced a significant reduc-
tion in the pathologist’s error rate.
2. Dataset and Evaluation Metrics
In this section, we describe the Camelyon16 dataset pro-
vided by the organizers of the competition and the evalua-
tion metrics used to rank the participants.
2.1. Camelyon16 Dataset
The Camelyon16 dataset consists of a total of 400 whole
slide images (WSIs) split into 270 for training and 130 for
testing. Both splits contain samples from two institutions
(Radboud UMC and UMC Utrecht), with specific details provided in Table 1.
Table 1: Number of slides in the Camelyon16 dataset.

Institution     Train (cancer)   Train (normal)   Test
Radboud UMC     90               70               80
UMC Utrecht     70               40               50
Total           160              110              130
The ground truth data for the training slides consists of
a pathologist’s delineation of regions of metastatic cancer
on WSIs of sentinel lymph nodes. The data was provided in
two formats: XML files containing vertices of the annotated
contours of the locations of cancer metastases and WSI bi-
nary masks indicating the location of the cancer metastasis.
2.2. Evaluation Metrics
Submissions to the competition were evaluated on the
following two metrics:
Slide-based Evaluation: For this metric, teams were
judged on performance at discriminating between
slides containing metastasis and normal slides. Com-
petition participants submitted a probability for each
test slide indicating its predicted likelihood of contain-
ing cancer. The competition organizers measured the participants' performance using the area under the receiver operating characteristic curve (AUC).
Lesion-based Evaluation: For this metric, partic-
ipants submitted a probability and a corresponding
(x, y) location for each predicted cancer lesion within the WSI. The competition organizers measured participant performance as the average sensitivity for detecting all true cancer lesions in a WSI across 6 false positive rates: 1/4, 1/2, 1, 2, 4, and 8 false positives per WSI.
3. Method
In this section, we describe our approach to cancer
metastasis detection.
3.1. Image Pre-processing
Figure 1: Visualization of tissue region detection during image pre-processing (described in Section 3.1). Detected tissue regions are highlighted with the green curves.
To reduce computation time and to focus our analysis
on regions of the slide most likely to contain cancer metas-
tasis, we first identify tissue within the WSI and exclude
background white space. To achieve this, we adopt a thresh-
old based segmentation method to automatically detect the
background region. In particular, we first transfer the orig-
inal image from the RGB color space to the HSV color
space, then the optimal threshold values in each channel are
computed using the Otsu algorithm [16], and the final mask
images are generated by combining the masks from H and
S channels. The detection results are visualized in Fig. 1,
where the tissue regions are highlighted using green curves.
According to the detection results, the average percentage
of background region per WSI is approximately 82%.
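The pre-processing step above can be sketched as follows. This is a minimal NumPy illustration of Otsu thresholding on the H and S channels, not the authors' code; 8-bit channel values and the combination rule (logical AND of the two per-channel masks) are assumptions.

```python
import numpy as np

def otsu_threshold(channel, nbins=256):
    """Compute Otsu's optimal threshold for a single 8-bit channel."""
    hist, _ = np.histogram(channel, bins=nbins, range=(0, 256))
    hist = hist.astype(float)
    total = hist.sum()
    # Cumulative class probability and cumulative mean at each candidate threshold
    w0 = np.cumsum(hist) / total
    w1 = 1.0 - w0
    levels = np.arange(nbins)
    mu = np.cumsum(hist * levels) / total
    mu_total = mu[-1]
    # Between-class variance; undefined at the extremes, so map NaN/inf to 0
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * w0 - mu) ** 2 / (w0 * w1)
    sigma_b = np.nan_to_num(sigma_b)
    return int(np.argmax(sigma_b))

def tissue_mask(hsv_image):
    """Combine per-channel Otsu masks from the H and S channels (assumed AND rule)."""
    h, s = hsv_image[..., 0], hsv_image[..., 1]
    return (h > otsu_threshold(h)) & (s > otsu_threshold(s))
```

In practice a library implementation (e.g. `skimage.filters.threshold_otsu` or OpenCV's `cv2.threshold` with `THRESH_OTSU`) would be used; the sketch just makes the between-class-variance criterion explicit.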
3.2. Cancer Metastasis Detection Framework
Our cancer metastasis detection framework consists of a
patch-based classification stage and a heatmap-based post-
processing stage, as depicted in Fig. 2.
During model training, the patch-based classification
stage takes as input whole slide images and the ground
truth image annotation, indicating the locations of regions
of each WSI containing metastatic cancer.

Figure 2: The framework of cancer metastasis detection. Overlapping image patches from the whole slide image are classified by the deep model, and the patch predictions are assembled into a tumor probability map.

We randomly extract millions of small positive and negative patches from
the set of training WSIs. If a small patch is located in a tumor region, it is a tumor (positive) patch and labeled 1; otherwise, it is a normal (negative) patch and labeled 0. Following selection of positive and negative training
examples, we train a supervised classification model to dis-
criminate between these two classes of patches, and we em-
bed all the prediction results into a heatmap image. In the
heatmap-based post-processing stage, we use the tumor
probability heatmap to compute the slide-based evaluation
and lesion-based evaluation scores for each WSI.
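The patch labeling rule above can be sketched as follows. This is a toy illustration only: the paper does not specify how patches partially overlapping a tumor region are handled, so the centre-point rule and the helper name are assumptions.

```python
import numpy as np

def label_patch(tumor_mask, x, y, size=256):
    """Label a training patch from the ground-truth tumor mask.

    Assumption (not specified in the paper): a patch is called positive
    when its centre pixel falls inside the annotated tumor region.
    """
    cy, cx = y + size // 2, x + size // 2
    return 1 if tumor_mask[cy, cx] else 0

# Toy example: a 512x512 mask with a tumor block in the upper-left corner
mask = np.zeros((512, 512), dtype=bool)
mask[0:200, 0:200] = True
positive = label_patch(mask, x=0, y=0)      # centre (128, 128) lies in the tumor
negative = label_patch(mask, x=256, y=256)  # centre (384, 384) is normal tissue
```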
3.3. Patch-based Classification Stage
During training, this stage uses as input 256×256 pixel patches from positive and negative regions of the WSIs
and trains a classification model to discriminate between
the positive and negative patches. We evaluated the per-
formance of four well-known deep learning network ar-
chitectures for this classification task: GoogLeNet [20], AlexNet [12], VGG16 [19], and a face-oriented deep network (FaceNet) [21], as shown in Table 2. The two deeper networks
(GoogLeNet and VGG16) achieved the best patch-based
classification performance. In our framework, we adopt
GoogLeNet as our deep network structure since it is gen-
erally faster and more stable than VGG16. The network
structure of GoogLeNet consists of 27 layers in total and more than 6 million parameters.
Table 2: Evaluation of various deep models.

Model             Patch classification accuracy
GoogLeNet [20]    98.4%
AlexNet [12]      92.1%
VGG16 [19]        97.9%
FaceNet [21]      96.8%
In our experiments, we evaluated a range of magnification levels, including 40×, 20×, and 10×, and we obtained the best performance with 40× magnification. We used only the 40× magnification in the experimental results reported for the Camelyon competition.
After generating tumor-probability heatmaps using
GoogLeNet across the entire training dataset, we noted that
a significant proportion of errors were due to false positive
classification from histologic mimics of cancer. To improve
model performance on these regions, we extract additional
training examples from these difficult negative regions and
retrain the model with a training set enriched for these hard
negative patches.
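The hard-negative mining step above can be sketched as follows. This is a simplified illustration under stated assumptions: the selection threshold and the list-based bookkeeping are ours, not details from the paper.

```python
def select_hard_negatives(patches, labels, scores, threshold=0.5):
    """Pick negative patches that the current model wrongly scores as tumor.

    These false positives (often histologic mimics of cancer) are added
    back into the training set before retraining the model.
    """
    return [p for p, y, s in zip(patches, labels, scores)
            if y == 0 and s > threshold]

# Toy example: four patches with ground-truth labels and model scores
patches = ["p0", "p1", "p2", "p3"]
labels = [0, 0, 1, 0]          # p2 is a true tumor patch
scores = [0.9, 0.2, 0.8, 0.7]  # p0 and p3 are confident false positives
hard = select_hard_negatives(patches, labels, scores)
```

The retrained model then sees a training set enriched with `hard`, which is the essence of the hard-negative mining described above.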
We present one of our results in Fig. 3. Given a whole
slide image (Fig. 3(a)) and a deep learning based patch clas-
sification model, we generate the corresponding tumor re-
gion heatmap (Fig. 3(b)), which highlights the tumor area.
Figure 3: Visualization of tumor region detection. (a) Tumor slide; (b) heatmap; (c) heatmap overlaid on the slide.
3.4. Post-processing of tumor heatmaps to compute
slide-based and lesion-based probabilities
After completion of the patch-based classification stage,
we generate a tumor probability heatmap for each WSI. On
these heatmaps, each pixel contains a value between 0 and
1, indicating the probability that the pixel contains tumor.
We now perform post-processing to compute slide-based
and lesion-based scores for each heatmap.
3.4.1 Slide-based Classification
For the slide-based classification task, the post-processing
takes as input a heatmap for each WSI and produces as out-
put a single probability of tumor for the entire WSI. Given
a heatmap, we extract 28 geometrical and morphological
features from each heatmap, including the percentage of tu-
mor region over the whole tissue region, the area ratio be-
tween tumor region and the minimum surrounding convex
region, the average prediction values, and the longest axis
of the tumor region. We compute these features over tu-
mor probability heatmaps across all training cases, and we
build a random forest classifier to discriminate the WSIs
with metastases from the negative WSIs. On the test cases,
our slide-based classification method achieved an AUC of
0.925, making it the top-performing system for the slide-
based classification task in the Camelyon grand challenge.
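A few of the 28 heatmap features described above might be computed as follows. This is an illustrative NumPy sketch of four features only; the exact feature definitions (e.g. the bounding-box diagonal used here as a proxy for the longest axis) are assumptions, not the authors' code.

```python
import numpy as np

def heatmap_features(heatmap, tissue_mask, threshold=0.5):
    """Compute a small illustrative subset of slide-level heatmap features."""
    tumor = heatmap >= threshold
    tissue_area = tissue_mask.sum()
    feats = {
        # Percentage of tumor region over the whole tissue region
        "tumor_over_tissue": tumor.sum() / max(tissue_area, 1),
        # Average prediction value over tissue
        "mean_probability": heatmap[tissue_mask].mean() if tissue_area else 0.0,
        "max_probability": float(heatmap.max()),
    }
    # Longest axis of the tumor region: bounding-box diagonal as a simple proxy
    ys, xs = np.nonzero(tumor)
    if len(ys):
        feats["longest_axis"] = float(np.hypot(ys.max() - ys.min(),
                                               xs.max() - xs.min()))
    else:
        feats["longest_axis"] = 0.0
    return feats
```

Feature vectors like these, computed over all training heatmaps, would then be fed to a random forest classifier (e.g. scikit-learn's `RandomForestClassifier`) for the slide-level decision.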
3.4.2 Lesion-based Detection
For the lesion-based detection post-processing, we aim to
identify all cancer lesions within each WSI with few false
positives. To achieve this, we first train a deep model (D-I)
using our initial training dataset described above. We then
train a second deep model (D-II) with a training set that is
enriched for tumor-adjacent negative regions. This model
(D-II) produces fewer false positives than D-I but has re-
duced sensitivity. In our framework, we first threshold the
heatmap produced from D-I at 0.90, which creates a binary
heatmap. We then identify connected components within
the tumor binary mask, and we use the central point as the
tumor location for each connected component. To estimate
the probability of tumor at each of these (x, y)locations, we
take the average of the tumor probability predictions gener-
ated by D-I and D-II across each connected component. The
scoring metric for Camelyon16 was defined as the average
sensitivity at 6 predefined false positive rates: 1/4, 1/2, 1, 2,
4, and 8 FPs per whole slide image. Our system achieved
a score of 0.7051, which was the highest score in the com-
petition and was 22 percent higher than the second-ranking
score (0.5761).
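The lesion-based post-processing above can be sketched as follows. This is a minimal NumPy illustration with a hand-rolled 4-connectivity labeling; a library routine such as `scipy.ndimage.label` does the same job, and the centre/averaging details are assumptions rather than the authors' exact implementation.

```python
import numpy as np

def lesion_candidates(heatmap_d1, heatmap_d2, threshold=0.90):
    """Threshold the D-I heatmap, find connected components, and report each
    component's centre plus the averaged D-I/D-II tumor probability."""
    binary = heatmap_d1 >= threshold
    labels = np.zeros(binary.shape, dtype=int)
    current = 0
    # Flood-fill each unlabeled foreground pixel (4-connectivity)
    for seed in zip(*np.nonzero(binary)):
        if labels[seed]:
            continue
        current += 1
        labels[seed] = current
        stack = [seed]
        while stack:
            y, x = stack.pop()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < binary.shape[0] and 0 <= nx < binary.shape[1]
                        and binary[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = current
                    stack.append((ny, nx))
    results = []
    for k in range(1, current + 1):
        ys, xs = np.nonzero(labels == k)
        # Average the two models' probabilities over the component
        prob = 0.5 * (heatmap_d1[ys, xs].mean() + heatmap_d2[ys, xs].mean())
        results.append(((float(xs.mean()), float(ys.mean())), float(prob)))
    return results
```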
4. Experimental Results
4.1. Evaluation Results from Camelyon16
In this section, we briefly present the evaluation results generated by the Camelyon16 organizers, which are also available on the competition website. There were two evaluations in Camelyon16, slide-based and lesion-based, and we won both of these challenging tasks.
Slide-based Evaluation: The merits of the algorithms
were assessed for discriminating between slides containing
metastasis and normal slides. Receiver operating characteristic (ROC) analysis at the slide level was performed, and the measure used for comparing the algorithms was the area under the ROC curve (AUC). Our submitted result was generated by the algorithm described in Section 3.4.1. As shown in Fig. 4, the AUC is 0.9250. Notably, our algorithm performed much better than the second-best method when the false positive rate (FPR) is low.
Lesion-based Evaluation: For the lesion-based evaluation, the free-response receiver operating characteristic (FROC) curve was used. The FROC curve is defined as the plot of sensitivity versus the average number of false positives per image. Our submitted result was generated by the algorithm described in Section 3.4.2. As shown in Fig. 5, we can make two observations: first, the pathologist did not make any false positive predictions; second, when the average number of false positives is larger than 2 (that is, two false positive alerts per slide on average), our performance in terms of sensitivity even outperformed the pathologist.

Figure 4: Receiver operating characteristic (ROC) curve of the slide-based classification.

Figure 5: Free-response receiver operating characteristic (FROC) curve of the lesion-based detection.
4.2. Combining Deep Learning System with a Hu-
man Pathologist
To evaluate the top-ranking deep learning systems
against a human pathologist, the Camelyon16 organizers
had a pathologist examine the test images used in the com-
petition. For the slide-based classification task, the human pathologist achieved an AUC of 0.9664, reflecting a 3.4 percent error rate. When the predictions of our deep learning system were combined with the predictions of the human pathologist, the AUC was raised to 0.9948, reflecting a drop in the error rate to 0.52 percent.
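The combination reported above might be reproduced along the following lines. This is a hedged sketch: the paper does not state how the two sets of predictions were combined, so the simple averaging and the toy slide scores below are assumptions for illustration only.

```python
import numpy as np

def auc(scores, labels):
    """AUC via the Mann-Whitney U statistic: the probability that a random
    positive slide outranks a random negative slide (ties count half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# Hypothetical slide-level scores for four test slides
model = np.array([0.9, 0.4, 0.8, 0.2])
pathologist = np.array([1.0, 0.0, 0.0, 0.0])  # misses the third positive slide
truth = np.array([1, 0, 1, 0])
combined = 0.5 * (model + pathologist)  # assumed averaging rule
```

Because the model's and pathologist's errors are weakly correlated in this toy setup, the averaged scores rank the slides at least as well as either reader alone, which mirrors the qualitative effect reported above.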
5. Discussion
Here we present a deep learning-based system for the
automated detection of metastatic cancer from whole slide
images of sentinel lymph nodes. Key aspects of our sys-
tem include: enrichment of the training set with patches
from regions of normal lymph node that the system was
initially mis-classifying as cancer; use of a state-of-the-
art deep learning model architecture; and careful design of
post-processing methods for the slide-based classification
and lesion-based detection tasks.
Historically, approaches to histopathological image anal-
ysis in digital pathology have focused primarily on low-
level image analysis tasks (e.g., color normalization, nu-
clear segmentation, and feature extraction), followed by
construction of classification models using classical ma-
chine learning methods, including: regression, support vec-
tor machines, and random forests. Typically, these algo-
rithms take as input relatively small sets of image features
(on the order of tens) [9,10]. Building on this framework,
approaches have been developed for the automated extrac-
tion of moderately high dimensional sets of image features
(on the order of thousands) from histopathological images
followed by the construction of relatively simple, linear
classification models using methods designed for dimen-
sionality reduction, such as sparse regression [2].
Since 2012, deep learning-based approaches have con-
sistently shown best-in-class performance in major com-
puter vision competitions, such as the ImageNet Large
Scale Visual Recognition Competition (ILSVRC) [18].
Deep learning-based approaches have also recently shown
promise for applications in pathology. A team from the re-
search group of Juergen Schmidhuber used a deep learning-
based approach to win the ICPR 2012 and MICCAI 2013 challenges focused on algorithm development for mitotic figure detection [3]. In contrast to the types of machine
learning approaches historically used in digital pathology,
in deep learning-based approaches there tend to be no dis-
crete human-directed steps for object detection, object seg-
mentation, and feature extraction. Instead, the deep learn-
ing algorithms take as input only the images and the image
labels (e.g., 1 or 0) and learn a very high-dimensional and
complex set of model parameters with supervision coming
only from the image labels.
Our winning approach in the Camelyon Grand Challenge
2016 utilized a 27-layer deep network architecture and ob-
tained near human-level classification performance on the
test data set. Importantly, the errors made by our deep
learning system were not strongly correlated with the errors
made by a human pathologist. Thus, although the patholo-
gist alone is currently superior to our deep learning system
alone, combining deep learning with the pathologist pro-
duced a major reduction in pathologist error rate, reducing
it from over 3 percent to less than 1 percent. More generally,
these results suggest that integrating deep learning-based
approaches into the work-flow of the diagnostic pathologist
could drive improvements in the reproducibility, accuracy
and clinical value of pathological diagnoses.
6. Acknowledgments
We thank all the Camelyon Grand Challenge 2016 or-
ganizers with special acknowledgments to lead coordinator
Babak Ehteshami Bejnordi. AK and AHB are co-founders
of PathAI, Inc.
References

[1] E. H. Ackerknecht et al. Rudolf Virchow: Doctor, Statesman, Anthropologist. 1953. 1
[2] A. H. Beck, A. R. Sangoi, S. Leung, R. J. Marinelli, T. O.
Nielsen, M. J. van de Vijver, R. B. West, M. van de Rijn, and
D. Koller. Systematic analysis of breast cancer morphology
uncovers stromal features associated with survival. Science
translational medicine, 3(108):108ra113–108ra113, 2011. 5
[3] D. C. Cires¸an, A. Giusti, L. M. Gambardella, and J. Schmid-
huber. Mitosis detection in breast cancer histology images
with deep neural networks. In Medical Image Computing
and Computer-Assisted Intervention–MICCAI 2013, pages
411–418. Springer, 2013. 5
[4] R. S. Cotran, V. Kumar, T. Collins, and S. L. Robbins. Rob-
bins pathologic basis of disease. 1999. 1
[5] B. J. Czerniecki, A. M. Scheff, L. S. Callans, F. R.
Spitz, I. Bedrosian, E. F. Conant, S. G. Orel, J. Berlin,
C. Helsabeck, D. L. Fraker, et al. Immunohistochemistry
with pancytokeratins improves the sensitivity of sentinel
lymph node biopsy in patients with breast carcinoma. Can-
cer, 85(5):1098–1103, 1999. 1
[6] S. B. Edge and C. C. Compton. The american joint com-
mittee on cancer: the 7th edition of the ajcc cancer staging
manual and the future of tnm. Annals of surgical oncology,
17(6):1471–1474, 2010. 1
[7] J. G. Elmore, G. M. Longton, P. A. Carney, B. M. Geller,
T. Onega, A. N. Tosteson, H. D. Nelson, M. S. Pepe, K. H.
Allison, S. J. Schnitt, et al. Diagnostic concordance among
pathologists interpreting breast biopsy specimens. Jama,
313(11):1122–1132, 2015. 1
[8] F. Ghaznavi, A. Evans, A. Madabhushi, and M. Feldman.
Digital imaging in pathology: whole-slide imaging and be-
yond. Annual Review of Pathology: Mechanisms of Disease,
8:331–359, 2013. 1
[9] M. N. Gurcan, L. E. Boucheron, A. Can, A. Madabhushi,
N. M. Rajpoot, and B. Yener. Histopathological image anal-
ysis: a review. Biomedical Engineering, IEEE Reviews in,
2:147–171, 2009. 1,5
[10] H. Irshad, A. Veillard, L. Roux, and D. Racoceanu. Methods for nuclei detection, segmentation, and classification in digital histopathology: A review, current status and future potential. Biomedical Engineering, IEEE Reviews in, 7:97–114, 2014. 5
[11] S. Jaffer and I. J. Bleiweiss. Evolution of sentinel lymph
node biopsy in breast cancer, in and out of vogue? Advances
in anatomic pathology, 21(6):433–442, 2014. 1
[12] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet
classification with deep convolutional neural networks. In
F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger,
editors, Advances in Neural Information Processing Systems
25, pages 1097–1105. Curran Associates, Inc., 2012. 3
[13] G. H. Lyman, A. E. Giuliano, M. R. Somerfield, A. B. Ben-
son, D. C. Bodurka, H. J. Burstein, A. J. Cochran, H. S.
Cody, S. B. Edge, S. Galper, et al. American society of clini-
cal oncology guideline recommendations for sentinel lymph
node biopsy in early-stage breast cancer. Journal of Clinical
Oncology, 23(30):7703–7720, 2005. 1
[14] G. H. Lyman, S. Temin, S. B. Edge, L. A. Newman, R. R.
Turner, D. L. Weaver, A. B. Benson, L. D. Bosserman, H. J.
Burstein, H. Cody, et al. Sentinel lymph node biopsy for
patients with early-stage breast cancer: American society of
clinical oncology clinical practice guideline update. Journal
of Clinical Oncology, 32(13):1365–1383, 2014. 1
[15] R. E. Nakhleh. Error reduction in surgical pathology.
Archives of pathology & laboratory medicine, 130(5):630–
632, 2006. 1
[16] N. Otsu. A Threshold Selection Method from Gray-level
Histograms. IEEE Transactions on Systems, Man and Cy-
bernetics, 9(1):62–66, 1979. 2
[17] S. S. Raab, D. M. Grzybicki, J. E. Janosky, R. J. Zarbo, F. A.
Meier, C. Jensen, and S. J. Geyer. Clinical impact and fre-
quency of anatomic pathology errors in cancer diagnoses.
Cancer, 104(10):2205–2213, 2005. 1
[18] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh,
S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein,
et al. Imagenet large scale visual recognition challenge.
International Journal of Computer Vision, 115(3):211–252,
2015. 5
[19] K. Simonyan and A. Zisserman. Very deep convolu-
tional networks for large-scale image recognition. CoRR,
abs/1409.1556, 2014. 3
[20] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed,
D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich.
Going deeper with convolutions. In CVPR 2015, 2015. 3
[21] D. Wang, C. Otto, and A. K. Jain. Face Search at Scale: 80
Million Gallery. 2015. 3
[22] D. L. Weaver, D. N. Krag, E. A. Manna, T. Ashikaga,
S. P. Harlow, and K. D. Bauer. Comparison of pathologist-
detected and automated computer-assisted image analysis
detected sentinel lymph node micrometastases in breast can-
cer. Modern pathology, 16(11):1159–1163, 2003. 1
... The accuracy of the proposed scheme reached 97.6%. The author of [20] used the EVI time series to create deep neural networks with different architectures to classify summer crops. In [21], the author has used thermal imaging technology to identify crop disease. ...
... The proposed strategy had a 97.6 percent accuracy rate. To categorize summer crops, the author of [20] used the EVI time data to develop deep neural networks with various architectures. The author used thermal imaging technology to detect crop disease in [21]. ...
Full-text available
Agriculture is essential to the growth of every country. Cotton and other major crops fall into the cash crops. Cotton is affected by most of the diseases that cause significant crop damage. Many diseases affect yield through the leaf. Detecting disease early saves crop from further damage. Cotton is susceptible to several diseases, including leaf spot, target spot, bacterial blight, nutrient deficiency, powdery mildew, leaf curl, etc. Accurate disease identification is important for taking effective measures. Deep learning in the identification of plant disease plays an important role. The proposed model based on meta Deep Learning is used to identify several cotton leaf diseases accurately. We gathered cotton leaf images from the field for this study. The dataset contains 2385 images of healthy and diseased leaves. The size of the dataset was increased with the help of the data augmentation approach. The dataset was trained on Custom CNN, VGG16 Transfer Learning, ResNet50, and our proposed model: the meta deep learn leaf disease identification model. A meta learning technique has been proposed and implemented to provide a good accuracy and generalization. The proposed model has outperformed the Cotton Dataset with an accuracy of 98.53%.
... 40 We must better understand the complex and evolving relationship between clinicians and human-centered AI tools in an evolving clinical environment. [41][42][43][44][45] It is still challenging to provide diagnosis and treatment recommendations through AI-based systems; however, we expect AI will ultimately master that domain. Given the rapid advances in image analysis, it is most likely to impact the field of medicine-radiology and pathology first, leading to images being solely examined by a machine. ...
... Wang et al demonstrate using an automated system to identify metastatic breast cancer with improved accuracy. 5 The exposure to new technology like artificial intelligence may provide additional insight and a new perspective. ...
Full-text available
Fortis Gaba,1 Qassi Q Gaba,2 Dilini Fernando3 1Department of Medicine, Queen Elizabeth University Hospital, NHS Greater Glasgow and Clyde, Glasgow, G51 4TF, UK; 2University of Oxford, Medical Sciences, Oxford, OX1 3PL, UK; 3Ninewells Hospital and Medical School, Dundee, DD19SY, UKCorrespondence: Fortis Gaba, Department of Medicine, Queen Elizabeth University Hospital, 1345 Govan Road, Glasgow, G51 4TF, UK, Email View the original paper by Mr Teshome and colleagues
... Extending deep convolutional networks for medical image classification has received a lot of attention. The VGG16 deep network was used by Wang et al. [41] to detect breast cancer. Esteva et al. [42] did research on skin cancer detection using Inception V3, with the goal of classifying malignancy status. ...
Full-text available
In this work, we introduce a spatially transformed DenseNet architecture for transformation invariant classification of cancer tissue. Our architecture increases the accuracy of the base DenseNet architecture while adding the ability to operate in a transformation invariant way while simultaneously being simpler than other models that try to provide some form of invariance.
Full-text available
Introduction: The pathological rare category of thyroid is a type of lesion with a low incidence rate and is easily misdiagnosed in clinical practice, which directly affects a patient’s treatment decision. However, it has not been adequately investigated to recognize the rare, benign, and malignant categories of thyroid using the deep learning method and recommend the rare to pathologists. Methods: We present an empirical decision tree based on the binary classification results of the patch-based UNet model to predict rare categories and recommend annotated lesion areas to be rereviewed by pathologists. Results: Applying this framework to 1,374 whole-slide images (WSIs) of frozen sections from thyroid lesions, we obtained an area under a curve of 0.946 and 0.986 for the test datasets with and without WSIs, respectively, of rare types. However, the recognition error rate for the rare categories was significantly higher than that for the benign and malignant categories (p < 0.00001). For rare WSIs, the addition of the empirical decision tree obtained a recall rate and precision of 0.882 and 0.498, respectively; the rare types (only 33.4% of all WSIs) were further recommended to be rereviewed by pathologists. Additionally, we demonstrated that the performance of our framework was comparable to that of pathologists in clinical practice for the predicted benign and malignant sections. Conclusion: Our study provides a baseline for the recommendation of the uncertain predicted rare category to pathologists, offering potential feasibility for the improvement of pathologists’ work efficiency.
Full-text available
Artificial Intelligence (AI) has revolutionized the way organizations face decision-making issues. One of these crucial elements is the implementation of organizational changes. There has been a wide-spread adoption of AI techniques in the private sector, whereas in the public sector their use has been recently extended. One of the greatest challenges that European governments have to face is the implementation of a wide variety of European Union (EU) funding programs which have evolved in the context of the EU long-term budget. In the current study, the Balanced Scorecard (BSC) and Artificial Neural Networks (ANNs) are intertwined with forecasting the outcomes of a co-financed EU program by means of its impact on the non-financial measures of the government body that materialized it. The predictive accuracy of the present model advanced in this research study takes into account all the complexities of the business environment, within which the provided dataset is produced. The outcomes of the study showed that the measures taken to enhance customer satisfaction allows for further improvement. The utilization of the proposed model could facilitate the decision-making process and initiate changes to the administrational issues of the available funding programs.
Fully automatic and autonomous medical systems are already released and being used. Nurses and doctors have started adopting the technology to reduce manual work, and to provide more accurate service and impactful interventions to patients. Increased access, better outcomes, reduced costs and more personal and customized healthcare are the promise of AI. But unlike other commercial systems where performance is paramount, in healthcare, patient safety is the primary concern. There is a tremendous drive to capitalize on AI capabilities as soon as possible and as much as possible. However, there is a risk to AI's success. People expect infallibility from AI – far more than they expect from human physicians. As a result, only a few catastrophic events involving AI could spell doom for AI in healthcare.
In pathology, tissue samples are assessed using multiple staining techniques to enhance contrast in unique histologic features. In this paper, we introduce a multimodal CNN-GNN based graph fusion approach that leverages complementary information from multiple non-registered histopathology images to predict pathologic scores. We demonstrate this approach in nonalcoholic steatohepatitis (NASH) by predicting CRN fibrosis stage and NAFLD Activity Score (NAS). Primary assessment of NASH typically requires liver biopsy evaluation on two histological stains: Trichrome (TC) and hematoxylin and eosin (H&E). Our multimodal approach learns to extract complementary information from TC and H&E graphs corresponding to each stain while simultaneously learning an optimal policy to combine this information. We report up to 20% improvement in predicting fibrosis stage and NAS component grades over single-stain modeling approaches, measured by computing linearly weighted Cohen's kappa between machine-derived vs. pathologist consensus scores. Broadly, this paper demonstrates the value of leveraging diverse pathology images for improved ML-powered histologic assessment.
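The evaluation metric named above, linearly weighted Cohen's kappa between machine-derived and pathologist consensus scores, is a standard statistic for ordinal agreement and can be computed directly; a minimal stdlib implementation:

```python
from collections import Counter

def linear_weighted_kappa(rater_a, rater_b, num_classes):
    """Linearly weighted Cohen's kappa between two lists of ordinal
    scores (e.g. machine-derived vs. pathologist fibrosis stages)."""
    assert len(rater_a) == len(rater_b) and num_classes > 1
    n = len(rater_a)

    def weight(i, j):  # linear disagreement weight in [0, 1]
        return abs(i - j) / (num_classes - 1)

    # Observed weighted disagreement.
    observed = sum(weight(a, b) for a, b in zip(rater_a, rater_b)) / n
    # Expected weighted disagreement under independent marginals.
    pa, pb = Counter(rater_a), Counter(rater_b)
    expected = sum(weight(i, j) * (pa[i] / n) * (pb[j] / n)
                   for i in range(num_classes)
                   for j in range(num_classes))
    return 1.0 - observed / expected

# Perfect agreement on 4 ordinal stages gives kappa = 1.
print(linear_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 4))
```

Linear weights penalize a one-stage disagreement less than a three-stage one, which is why this metric (rather than plain accuracy) is used for ordinal grades like fibrosis stage.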
Building on a growing number of pathology labs having a full digital infrastructure for pathology diagnostics, there is a growing interest in implementing artificial intelligence (AI) algorithms for diagnostic purposes. This article provides an overview of the current status of the digital pathology infrastructure at the University Medical Center Utrecht and our roadmap for implementing AI algorithms in the next few years.
Use of artificial intelligence (AI) is a burgeoning field in otolaryngology and the communication sciences. A virtual symposium on the topic was convened from Duke University on October 26, 2020, and was attended by more than 170 participants worldwide. This review presents summaries of all but one of the talks presented during the symposium; recordings of all the talks, along with the discussions for the talks, are available online. Each of the summaries is about 2500 words in length and includes two figures. This level of detail far exceeds the brief summaries presented in traditional reviews and thus provides a more informed glimpse into the power and diversity of current AI applications in otolaryngology and the communication sciences and how to harness that power for future applications.
Due to the prevalence of social media websites, one challenge facing computer vision researchers is to devise methods to process and search for persons of interest among the billions of shared photos on these websites. Facebook revealed in a 2013 white paper that its users have uploaded more than 250 billion photos, and are uploading 350 million new photos each day. Due to this humongous amount of data, large-scale face search for mining web images is both important and challenging. Despite significant progress in face recognition, searching a large collection of unconstrained face images has not been adequately addressed. To address this challenge, we propose a face search system which combines a fast search procedure, coupled with a state-of-the-art commercial off the shelf (COTS) matcher, in a cascaded framework. Given a probe face, we first filter the large gallery of photos to find the top-k most similar faces using deep features generated from a convolutional neural network. The k candidates are re-ranked by combining similarities from deep features and the COTS matcher. We evaluate the proposed face search system on a gallery containing 80 million web-downloaded face images. Experimental results demonstrate that the deep features are competitive with state-of-the-art methods on unconstrained face recognition benchmarks (LFW and IJB-A). Further, the proposed face search system offers an excellent trade-off between accuracy and scalability on datasets consisting of millions of images. Additionally, in an experiment involving searching for face images of the Tsarnaev brothers, convicted of the Boston Marathon bombing, the proposed face search system could find the younger brother's (Dzhokhar Tsarnaev) photo at rank 1 in 1 second on a 5M gallery and at rank 8 in 7 seconds on an 80M gallery.
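The two-stage cascade described above (cheap deep-feature filtering to the top-k candidates, then score fusion with an expensive matcher on only those k) can be sketched on toy vectors. The 3-D "features", the stand-in matcher, and the fusion weight below are illustrative only:

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def cascade_search(probe, gallery, expensive_matcher, k=2, alpha=0.5):
    # Stage 1: rank the whole gallery by fast deep-feature similarity,
    # keep only the top-k candidates.
    shortlist = sorted(gallery, key=lambda g: cosine(probe, g), reverse=True)[:k]
    # Stage 2: fuse fast and slow scores on the k survivors only, so the
    # expensive matcher never touches the full gallery.
    fused = [(alpha * cosine(probe, g)
              + (1 - alpha) * expensive_matcher(probe, g), g)
             for g in shortlist]
    return [g for _, g in sorted(fused, reverse=True)]

gallery = [(1, 0, 0), (0.9, 0.1, 0), (0, 1, 0), (0, 0, 1)]
slow = lambda p, g: cosine(p, g)  # stand-in for the COTS matcher
print(cascade_search((1, 0, 0), gallery, slow, k=2))
```

The accuracy/scalability trade-off reported in the abstract comes from this structure: only k candidates, not 80 million, ever reach the slow matcher.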
A breast pathology diagnosis provides the basis for clinical treatment and management decisions; however, its accuracy is inadequately understood. To quantify the magnitude of diagnostic disagreement among pathologists compared with a consensus panel reference diagnosis and to evaluate associated patient and pathologist characteristics. Study of pathologists who interpret breast biopsies in clinical practices in 8 US states. Participants independently interpreted slides between November 2011 and May 2014 from test sets of 60 breast biopsies (240 total cases, 1 slide per case), including 23 cases of invasive breast cancer, 73 ductal carcinoma in situ (DCIS), 72 with atypical hyperplasia (atypia), and 72 benign cases without atypia. Participants were blinded to the interpretations of other study pathologists and consensus panel members. Among the 3 consensus panel members, unanimous agreement of their independent diagnoses was 75%, and concordance with the consensus-derived reference diagnoses was 90.3%. The proportions of diagnoses overinterpreted and underinterpreted relative to the consensus-derived reference diagnoses were assessed. Sixty-five percent of invited, responding pathologists were eligible and consented to participate. Of these, 91% (N = 115) completed the study, providing 6900 individual case diagnoses. Compared with the consensus-derived reference diagnosis, the overall concordance rate of diagnostic interpretations of participating pathologists was 75.3% (95% CI, 73.4%-77.0%; 5194 of 6900 interpretations). 
Among invasive carcinoma cases (663 interpretations), 96% (95% CI, 94%-97%) were concordant, and 4% (95% CI, 3%-6%) were underinterpreted; among DCIS cases (2097 interpretations), 84% (95% CI, 82%-86%) were concordant, 3% (95% CI, 2%-4%) were overinterpreted, and 13% (95% CI, 12%-15%) were underinterpreted; among atypia cases (2070 interpretations), 48% (95% CI, 44%-52%) were concordant, 17% (95% CI, 15%-21%) were overinterpreted, and 35% (95% CI, 31%-39%) were underinterpreted; and among benign cases without atypia (2070 interpretations), 87% (95% CI, 85%-89%) were concordant and 13% (95% CI, 11%-15%) were overinterpreted. Disagreement with the reference diagnosis was statistically significantly higher among biopsies from women with higher (n = 122) vs lower (n = 118) breast density on prior mammograms (overall concordance rate, 73% [95% CI, 71%-75%] for higher vs 77% [95% CI, 75%-80%] for lower, P < .001), and among pathologists who interpreted lower weekly case volumes (P < .001) or worked in smaller practices (P = .034) or nonacademic settings (P = .007). In this study of pathologists, in which diagnostic interpretation was based on a single breast biopsy slide, overall agreement between the individual pathologists' interpretations and the expert consensus-derived reference diagnoses was 75.3%, with the highest level of concordance for invasive carcinoma and lower levels of concordance for DCIS and atypia. Further research is needed to understand the relationship of these findings with patient management.
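The headline concordance figure can be checked directly from the raw counts reported in the abstract. Note that the published 95% CI accounts for clustering by pathologist, so a naive binomial interval would not reproduce it; only the point estimate is recomputed here:

```python
# Sanity-check the headline number: 5194 concordant interpretations out
# of 6900 total, as reported in the abstract.
concordant, total = 5194, 6900
rate = 100 * concordant / total
print(f"overall concordance: {rate:.1f}%")  # 75.3%
```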
We propose a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. This was achieved by a carefully crafted design that allows for increasing the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC 2014 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.
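The paper's central budget trick, placing 1x1 "reduction" convolutions before expensive spatial filters, is easy to quantify. A back-of-envelope multiply-add count for one 3x3 branch, using illustrative channel sizes in the spirit of the Inception module (not the exact GoogLeNet configuration):

```python
# Multiply-add count for a 3x3 branch on a 28x28x192 input, with and
# without a 1x1 bottleneck. Channel sizes here are illustrative.

def conv_ops(h, w, c_in, k, c_out):
    """Multiply-adds for a k x k convolution with 'same' output size."""
    return h * w * c_out * k * k * c_in

# 3x3 convolution applied directly to all 192 input channels.
direct = conv_ops(28, 28, 192, 3, 96)
# 1x1 reduction to 64 channels first, then the 3x3 convolution.
reduced = conv_ops(28, 28, 192, 1, 64) + conv_ops(28, 28, 64, 3, 96)
print(f"direct:  {direct:,}")   # 130,056,192
print(f"reduced: {reduced:,}")  # 52,985,856
```

The bottlenecked branch produces the same 96 output channels for well under half the compute, which is what lets the network grow deeper and wider at a constant computational budget.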
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
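The quoted "60 million parameters" can be verified from the published layer shapes; conv2, conv4, and conv5 see only half the input channels because of the two-GPU split described in the paper:

```python
# Back-of-envelope check of AlexNet's "60 million parameters" figure,
# using the layer shapes from the paper.

def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out + c_out  # weights + biases

def fc_params(n_in, n_out):
    return n_in * n_out + n_out

layers = [
    conv_params(11, 3, 96),        # conv1
    conv_params(5, 48, 256),       # conv2 (sees 48 of 96 channels: GPU split)
    conv_params(3, 256, 384),      # conv3
    conv_params(3, 192, 384),      # conv4 (split)
    conv_params(3, 192, 256),      # conv5 (split)
    fc_params(6 * 6 * 256, 4096),  # fc6
    fc_params(4096, 4096),         # fc7
    fc_params(4096, 1000),         # fc8 (1000-way softmax)
]
total = sum(layers)
print(f"{total:,} parameters")  # roughly 61 million
```

The two fully-connected layers fc6 and fc7 alone account for about 90% of the total, which is why the paper applies dropout specifically to the fully-connected layers.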
Sentinel lymph node biopsy (SLNB) was introduced 2 decades ago and thereafter validated for routine surgical management of breast cancer, including cases treated with neoadjuvant chemotherapy. As the number of lymph nodes for staging has decreased, pathologists have scrutinized SLN with a combination of standard hematoxylin and eosin, levels, immunohistochemistry (IHC), and molecular methods. An epidemic of small-volume metastases thereby arose, leading to modifications in the American Joint Committee on Cancer staging to accommodate findings such as isolated tumor cells (ITC) and micrometastases. With the goal of determining the significance of these findings, retrospective followed by prospective trials were performed, showing mixed results. The ACOSOG Z0010 and NSABP B-32 trials both independently showed that ITC and micrometastases were not significant and thus discouraged the use of levels and IHC for detecting them. However, the Surveillance Epidemiology and End Results database showed that patients with micrometastases had an overall decreased survival. In addition, the MIRROR (Micrometastases and ITC: Relevant and Robust or Rubbish?) trial showed that patients with ITC and micrometastases treated with adjuvant therapy had lower hazard ratios compared with untreated patients. Subsequently, the ACOSOG Z0011 trial randomized patients with up to 2 positive SLN to axillary lymph node dissection (ALND) or not, all treated with radiation and chemotherapy, showing no difference in survival or recurrence rates between the 2 groups and causing a shift from ALND. As the rate of ALND has declined, the necessity of performing levels, IHC, frozen section, and molecular studies on SLN needs to be revisited.
In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively.