Deep Learning for Identifying Metastatic Breast Cancer

Dayong Wang, Aditya Khosla*, Rishab Gargeya, Humayun Irshad, Andrew H. Beck
Beth Israel Deaconess Medical Center, Harvard Medical School
*CSAIL, Massachusetts Institute of Technology
{dwang5,hirshad,abeck2}@bidmc.harvard.edu, khosla@csail.mit.edu, rishab.gargeya@gmail.com

arXiv:1606.05718v1 [q-bio.QM], 18 Jun 2016
Abstract
The International Symposium on Biomedical Imaging (ISBI) held a grand challenge to evaluate computational systems for the automated detection of metastatic breast cancer in whole slide images of sentinel lymph node biopsies. Our team won both competitions in the grand challenge, obtaining an area under the receiver operating characteristic curve (AUC) of 0.925 for the task of whole slide image classification and a score of 0.7051 for the tumor localization task. A pathologist independently reviewed the same images, obtaining a whole slide image classification AUC of 0.966 and a tumor localization score of 0.733. Combining our deep learning system's predictions with the human pathologist's diagnoses increased the pathologist's AUC to 0.995, representing an approximately 85 percent reduction in human error rate. These results demonstrate the power of using deep learning to produce significant improvements in the accuracy of pathological diagnoses.
1. Introduction
The medical specialty of pathology is tasked with providing definitive disease diagnoses to guide patient treatment and management decisions [4]. Standardized, accurate, and reproducible pathological diagnoses are essential for advancing precision medicine. Since the mid-19th century, the primary tool used by pathologists to make diagnoses has been the microscope [1]. Limitations of the qualitative visual analysis of microscopic images include a lack of standardization, diagnostic errors, and the significant cognitive load required to manually evaluate millions of cells across hundreds of slides in a typical pathologist's workday [15, 17, 7]. Consequently, over the past several decades there has been increasing interest in developing computational methods to assist in the analysis of microscopic images in pathology [9, 8].
From October 2015 to April 2016, the International Symposium on Biomedical Imaging (ISBI) held the Camelyon Grand Challenge 2016 (Camelyon16) to identify top-performing computational image analysis systems for the task of automatically detecting metastatic breast cancer in digital whole slide images (WSIs) of sentinel lymph node biopsies (http://camelyon16.grand-challenge.org/). The evaluation of breast sentinel lymph nodes is an important component of the American Joint Committee on Cancer's TNM breast cancer staging system, in which patients with a sentinel lymph node positive for metastatic cancer receive a higher pathologic TNM stage than patients negative for sentinel lymph node metastasis [6], frequently resulting in more aggressive clinical management, including axillary lymph node dissection [13, 14].
The manual pathological review of sentinel lymph nodes is time-consuming and laborious, particularly in cases in which the lymph nodes are negative for cancer or contain only small foci of metastatic cancer. To improve the sensitivity of cancer metastasis detection, many centers have implemented testing of sentinel lymph nodes with immunohistochemistry for pancytokeratins [5], proteins expressed on breast cancer cells and not normally present in lymph nodes. However, limitations of pancytokeratin immunohistochemistry testing of sentinel lymph nodes include increased cost, increased time for slide preparation, and an increased number of slides required for pathological review. Further, even with immunohistochemistry-stained slides, the identification of small cancer metastases can be tedious and inaccurate.
Computer-assisted image analysis systems have been developed to aid in the detection of small metastatic foci from pancytokeratin-stained immunohistochemistry slides of sentinel lymph nodes [22]; however, these systems are not used clinically. Thus, the development of effective and cost-efficient methods for sentinel lymph node evaluation remains an active area of research [11], as there would be value in a high-performing system that could increase accuracy and reduce cognitive load at low cost.
Here, we present a deep learning-based approach for the identification of cancer metastases from whole slide images of breast sentinel lymph nodes. Our approach uses millions of training patches to train a deep convolutional neural network to make patch-level predictions that discriminate tumor patches from normal patches. We then aggregate the patch-level predictions to create tumor probability heatmaps and perform post-processing over these heatmaps to make predictions for the slide-based classification task and the tumor localization task. Our system won both competitions at the Camelyon Grand Challenge 2016, with performance approaching human-level accuracy. Finally, combining the predictions of our deep learning system with a pathologist's interpretations produced a significant reduction in the pathologist's error rate.
2. Dataset and Evaluation Metrics
In this section, we describe the Camelyon16 dataset provided by the organizers of the competition and the evaluation metrics used to rank the participants.
2.1. Camelyon16 Dataset
The Camelyon16 dataset consists of a total of 400 whole slide images (WSIs), split into 270 for training and 130 for testing. Both splits contain samples from two institutions (Radboud UMC and UMC Utrecht), with specific details provided in Table 1.
Table 1: Number of slides in the Camelyon16 dataset.

  Institution    Train (cancer)   Train (normal)   Test
  Radboud UMC          90               70           80
  UMC Utrecht          70               40           50
  Total               160              110          130
The ground truth data for the training slides consists of a pathologist's delineation of regions of metastatic cancer on WSIs of sentinel lymph nodes. The data was provided in two formats: XML files containing vertices of the annotated contours of the locations of cancer metastases, and WSI binary masks indicating the location of the cancer metastasis.
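The contour annotations can be rasterized into training masks. Below is a minimal Python sketch, assuming the ASAP-style XML layout distributed with the challenge (Annotation elements containing Coordinate elements with X and Y attributes); the element names are an assumption and should be verified against the actual files.

```python
import xml.etree.ElementTree as ET

import cv2
import numpy as np

def contours_from_xml(path, scale=1.0):
    """Read annotated tumor contours from a Camelyon16-style XML file.

    Assumes ASAP-style markup: Annotation -> Coordinates -> Coordinate
    elements with X and Y attributes (an assumption; check the files).
    """
    root = ET.parse(path).getroot()
    contours = []
    for annotation in root.iter("Annotation"):
        points = [(float(c.get("X")) * scale, float(c.get("Y")) * scale)
                  for c in annotation.iter("Coordinate")]
        contours.append(np.array(points, dtype=np.int32))
    return contours

def mask_from_contours(contours, shape):
    """Rasterize the contours into a binary mask (1 = tumor)."""
    mask = np.zeros(shape, dtype=np.uint8)
    cv2.fillPoly(mask, contours, 1)
    return mask
```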
2.2. Evaluation Metrics
Submissions to the competition were evaluated on the following two metrics:
Slide-based Evaluation: For this metric, teams were judged on performance at discriminating between slides containing metastasis and normal slides. Competition participants submitted a probability for each test slide indicating its predicted likelihood of containing cancer. The competition organizers measured participant performance using the area under the receiver operating characteristic curve (AUC).
Lesion-based Evaluation: For this metric, participants submitted a probability and a corresponding (x, y) location for each predicted cancer lesion within the WSI. The competition organizers measured participant performance as the average sensitivity for detecting all true cancer lesions in a WSI across six false positive rates: 1/4, 1/2, 1, 2, 4, and 8 false positives per WSI (a scoring sketch follows below).
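For concreteness, this score reduces to averaging sensitivity at six operating points of the FROC curve. A minimal sketch, assuming the curve is given as an ascending array of false positive rates with matching sensitivities; linear interpolation between measured points is our assumption:

```python
import numpy as np

def camelyon16_froc_score(fps_per_wsi, sensitivities):
    """Average sensitivity at the six predefined FP rates.

    `fps_per_wsi` must be in ascending order; values between the
    measured points are linearly interpolated (our assumption).
    """
    targets = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
    return float(np.mean(np.interp(targets, fps_per_wsi, sensitivities)))
```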
3. Method
In this section, we describe our approach to cancer metastasis detection.
3.1. Image Pre-processing
[Figure 1: Visualization of tissue region detection during image pre-processing (described in Section 3.1). Detected tissue regions are highlighted with green curves.]
To reduce computation time and to focus our analysis on regions of the slide most likely to contain cancer metastasis, we first identify tissue within the WSI and exclude background white space. To achieve this, we adopt a threshold-based segmentation method to automatically detect the background region. In particular, we first convert the original image from the RGB color space to the HSV color space; the optimal threshold values in each channel are then computed using the Otsu algorithm [16], and the final mask images are generated by combining the masks from the H and S channels. The detection results are visualized in Fig. 1, where the tissue regions are highlighted using green curves. According to these results, the average percentage of background region per WSI is approximately 82%.
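A minimal sketch of this tissue-detection step, assuming OpenCV and a uint8 RGB thumbnail of the WSI; combining the H and S masks with a logical AND is our reading of "combining the masks":

```python
import cv2
import numpy as np

def tissue_mask(rgb_thumbnail: np.ndarray) -> np.ndarray:
    """Separate tissue from background in a uint8 RGB thumbnail.

    Converts RGB to HSV, Otsu-thresholds the H and S channels, and
    combines the two channel masks (AND is our assumption).
    """
    hsv = cv2.cvtColor(rgb_thumbnail, cv2.COLOR_RGB2HSV)
    h, s, _ = cv2.split(hsv)
    # Otsu picks the threshold minimizing intra-class variance [16].
    _, mask_h = cv2.threshold(h, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    _, mask_s = cv2.threshold(s, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return cv2.bitwise_and(mask_h, mask_s)  # 255 where tissue is detected
```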
3.2. Cancer Metastasis Detection Framework
Our cancer metastasis detection framework consists of a patch-based classification stage and a heatmap-based post-processing stage, as depicted in Fig. 2.
During model training, the patch-based classification stage takes as input whole slide images and the ground truth image annotations indicating the locations of regions of each WSI containing metastatic cancer.

[Figure 2: The framework of cancer metastases detection. Training: normal and tumor patches are sampled from the training WSIs to fit a deep model. Testing: the trained model scores overlapping image patches to produce a tumor probability map with values from 0.0 to 1.0.]

We randomly extract millions of small positive and negative patches from the set of training WSIs. If a small patch is located in a tumor region, it is a tumor (positive) patch and labeled 1; otherwise, it is a normal (negative) patch and labeled 0. Following selection of positive and negative training examples, we train a supervised classification model to discriminate between these two classes of patches, and we embed all of the prediction results into a heatmap image. In the heatmap-based post-processing stage, we use the tumor probability heatmap to compute the slide-based and lesion-based evaluation scores for each WSI.
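A sketch of the patch-sampling step, assuming an OpenSlide-style slide handle and a level-0 binary tumor mask; labeling a patch by its center pixel is our assumption, since the paper says only that patches located in a tumor region are positive:

```python
import numpy as np

def sample_patch(slide, tumor_mask, patch_size=256, rng=None):
    """Randomly sample one training patch and its 0/1 label.

    `slide.read_region` follows the OpenSlide signature (level-0
    (x, y) location, pyramid level, size); `tumor_mask` is a binary
    ground-truth mask at the same resolution.
    """
    rng = rng or np.random.default_rng()
    height, width = tumor_mask.shape
    x = int(rng.integers(0, width - patch_size))
    y = int(rng.integers(0, height - patch_size))
    patch = slide.read_region((x, y), 0, (patch_size, patch_size)).convert("RGB")
    # Center-pixel labeling: 1 if the patch center lies in tumor.
    label = int(tumor_mask[y + patch_size // 2, x + patch_size // 2])
    return patch, label
```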
3.3. Patch-based Classification Stage
During training, this stage takes as input 256×256 pixel patches from positive and negative regions of the WSIs and trains a classification model to discriminate between the positive and negative patches. We evaluated the performance of four well-known deep learning network architectures for this classification task: GoogLeNet [20], AlexNet [12], VGG16 [19], and a face-oriented deep network [21], as shown in Table 2. The two deeper networks (GoogLeNet and VGG16) achieved the best patch-based classification performance. In our framework, we adopt GoogLeNet as our deep network structure, since it is generally faster and more stable than VGG16. The GoogLeNet architecture consists of 27 layers in total and more than 6 million parameters.
Table 2: Evaluation of various deep models.

  Model            Patch classification accuracy
  GoogLeNet [20]              98.4%
  AlexNet [12]                92.1%
  VGG16 [19]                  97.9%
  FaceNet [21]                96.8%
In our experiments, we evaluated a range of magnification levels, including 40×, 20×, and 10×, and we obtained the best performance at 40× magnification. We used only the 40× magnification in the experimental results reported for the Camelyon competition.
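As an illustration of the patch-classifier setup, here is a minimal training-step sketch using a modern PyTorch reimplementation; the original work predates PyTorch, so torchvision's GoogLeNet stands in for the paper's network, adapted to the two patch classes, and the optimizer settings are illustrative assumptions:

```python
import torch
import torch.nn as nn
from torchvision import models

# GoogLeNet adapted to two classes: normal (0) vs. tumor (1).
model = models.googlenet(weights=None, num_classes=2,
                         aux_logits=False, init_weights=True)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def train_step(patches: torch.Tensor, labels: torch.Tensor) -> float:
    """One SGD step on a batch of 256x256 RGB patch tensors."""
    optimizer.zero_grad()
    loss = criterion(model(patches), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```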
After generating tumor probability heatmaps with GoogLeNet across the entire training dataset, we noted that a significant proportion of errors were due to false positive classifications of histologic mimics of cancer. To improve model performance on these regions, we extracted additional training examples from these difficult negative regions and retrained the model with a training set enriched for these hard negative patches.
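A sketch of this hard-negative mining loop; the helper iter_tissue_patches, the prediction method name, and the score threshold are all hypothetical, since the paper does not specify them:

```python
def mine_hard_negatives(model, normal_slides, score_threshold=0.5):
    """Collect false-positive patches from cancer-free tissue.

    Patches from normal slides that the current model scores above
    `score_threshold` are kept as hard negatives (label 0) and added
    to the training set before retraining.
    """
    hard_negatives = []
    for slide in normal_slides:
        for patch in iter_tissue_patches(slide):  # hypothetical helper
            if model.predict_tumor_probability(patch) > score_threshold:
                hard_negatives.append((patch, 0))
    return hard_negatives
```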
We present one of our results in Fig. 3. Given a whole slide image (Fig. 3(a)) and a deep learning-based patch classification model, we generate the corresponding tumor region heatmap (Fig. 3(b)), which highlights the tumor area.

[Figure 3: Visualization of tumor region detection: (a) tumor slide, (b) heatmap, (c) heatmap overlaid on slide.]
3.4. Post-processing of Tumor Heatmaps to Compute Slide-based and Lesion-based Probabilities
After completion of the patch-based classification stage, we generate a tumor probability heatmap for each WSI. On these heatmaps, each pixel contains a value between 0 and 1, indicating the probability that the pixel contains tumor. We now perform post-processing to compute slide-based and lesion-based scores for each heatmap.
3.4.1 Slide-based Classification
For the slide-based classification task, the post-processing takes as input a heatmap for each WSI and produces as output a single probability of tumor for the entire WSI. From each heatmap, we extract 28 geometrical and morphological features, including the percentage of tumor region over the whole tissue region, the area ratio between the tumor region and its minimum surrounding convex region, the average prediction values, and the longest axis of the tumor region. We compute these features over the tumor probability heatmaps of all training cases, and we build a random forest classifier to discriminate the WSIs with metastases from the negative WSIs. On the test cases, our slide-based classification method achieved an AUC of 0.925, making it the top-performing system for the slide-based classification task in the Camelyon grand challenge.
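A sketch of this slide-level post-processing; the paper lists only a few of its 28 features, so the four below are illustrative, and the 0.5 binarization threshold is our assumption:

```python
import numpy as np
from scipy import ndimage
from sklearn.ensemble import RandomForestClassifier

def heatmap_features(heatmap, tissue_mask, threshold=0.5):
    """Four illustrative heatmap features (the paper uses 28)."""
    binary = heatmap > threshold
    labeled, num_lesions = ndimage.label(binary)
    sizes = (ndimage.sum(binary, labeled, range(1, num_lesions + 1))
             if num_lesions else np.array([0.0]))
    return np.array([
        binary.sum() / max(tissue_mask.sum(), 1),         # tumor fraction of tissue
        heatmap[binary].mean() if binary.any() else 0.0,  # mean tumor probability
        float(np.max(sizes)),                             # largest lesion area
        float(num_lesions),                               # number of candidate lesions
    ])

# Slide-level classifier over per-slide feature vectors:
clf = RandomForestClassifier(n_estimators=100, random_state=0)
# clf.fit(np.stack([heatmap_features(h, m) for h, m in train_heatmaps]), train_labels)
# slide_probabilities = clf.predict_proba(test_features)[:, 1]
```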
3.4.2 Lesion-based Detection
For the lesion-based detection post-processing, we aim to identify all cancer lesions within each WSI with few false positives. To achieve this, we first train a deep model (D-I) using our initial training dataset described above. We then train a second deep model (D-II) with a training set that is enriched for tumor-adjacent negative regions. This model (D-II) produces fewer false positives than D-I but has reduced sensitivity. In our framework, we first threshold the heatmap produced by D-I at 0.90, which creates a binary heatmap. We then identify connected components within the tumor binary mask, and we use the central point of each connected component as its tumor location. To estimate the probability of tumor at each of these (x, y) locations, we take the average of the tumor probability predictions generated by D-I and D-II across each connected component. The scoring metric for Camelyon16 was defined as the average sensitivity at six predefined false positive rates: 1/4, 1/2, 1, 2, 4, and 8 false positives per whole slide image. Our system achieved a score of 0.7051, which was the highest score in the competition and 22 percent higher than the second-ranking score (0.5761).
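A sketch of this two-model lesion detection using SciPy's connected-component utilities; taking the component centroid as the "central point" is our reading:

```python
import numpy as np
from scipy import ndimage

def detect_lesions(heatmap_d1, heatmap_d2, threshold=0.90):
    """Return (x, y, probability) lesion candidates for one WSI.

    Thresholds the D-I heatmap at 0.90, treats each connected
    component as one lesion located at its centroid, and scores it
    with the mean of the D-I and D-II probabilities over the
    component.
    """
    binary = heatmap_d1 >= threshold
    labeled, num_components = ndimage.label(binary)
    detections = []
    for i in range(1, num_components + 1):
        component = labeled == i
        y, x = ndimage.center_of_mass(component)
        prob = 0.5 * (heatmap_d1[component].mean() + heatmap_d2[component].mean())
        detections.append((x, y, prob))
    return detections
```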
4. Experimental Results
4.1. Evaluation Results from Camelyon16
In this section, we briefly present the evaluation results generated by the Camelyon16 organizers, which are also available on the challenge website (http://camelyon16.grand-challenge.org/results/). There were two kinds of evaluation in Camelyon16: slide-based evaluation and lesion-based evaluation. We won both of these challenging tasks.
Slide-based Evaluation: The merits of the algorithms were assessed by their ability to discriminate between slides containing metastasis and normal slides. Receiver operating characteristic (ROC) analysis was performed at the slide level, and the measure used to compare the algorithms was the area under the ROC curve (AUC). Our submitted result was generated by the algorithm in Section 3.4.1. As shown in Fig. 4, the AUC is 0.9250. Notice that our algorithm performed much better than the second-best method when the false positive rate (FPR) is low.

[Figure 4: Receiver operating characteristic (ROC) curve for slide-based classification.]

Lesion-based Evaluation: For the lesion-based evaluation, the free-response receiver operating characteristic (FROC) curve was used; it is defined as the plot of sensitivity versus the average number of false positives per image. Our submitted result was generated by the algorithm in Section 3.4.2. As shown in Fig. 5, we can make two observations: first, the pathologist did not make any false positive predictions; second, when the average number of false positives exceeds 2 (i.e., an average of two false positive alerts per slide), our system's sensitivity exceeds that of the pathologist.

[Figure 5: Free-response receiver operating characteristic (FROC) curve for lesion-based detection.]
4.2. Combining the Deep Learning System with a Human Pathologist
To evaluate the top-ranking deep learning systems against a human pathologist, the Camelyon16 organizers had a pathologist examine the test images used in the competition. For the slide-based classification task, the human pathologist achieved an AUC of 0.9664, reflecting a 3.4 percent error rate. When the predictions of our deep learning system were combined with the predictions of the human pathologist, the AUC was raised to 0.9948, reflecting a drop in the error rate to 0.52 percent.
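The error rates quoted here are (1 - AUC); a quick check of the roughly 85 percent relative reduction reported in the abstract:

```python
pathologist_auc, combined_auc = 0.9664, 0.9948
err_alone = 1 - pathologist_auc    # 0.0336, quoted as "3.4 percent"
err_combined = 1 - combined_auc    # 0.0052, quoted as "0.52 percent"
reduction = (err_alone - err_combined) / err_alone
print(f"{err_alone:.2%} -> {err_combined:.2%} ({reduction:.0%} relative reduction)")
# prints: 3.36% -> 0.52% (85% relative reduction)
```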
5. Discussion
Here we present a deep learning-based system for the automated detection of metastatic cancer from whole slide images of sentinel lymph nodes. Key aspects of our system include: enrichment of the training set with patches from regions of normal lymph node that the system was initially misclassifying as cancer; use of a state-of-the-art deep learning model architecture; and careful design of post-processing methods for the slide-based classification and lesion-based detection tasks.
Historically, approaches to histopathological image analysis in digital pathology have focused primarily on low-level image analysis tasks (e.g., color normalization, nuclear segmentation, and feature extraction), followed by construction of classification models using classical machine learning methods, including regression, support vector machines, and random forests. Typically, these algorithms take as input relatively small sets of image features (on the order of tens) [9, 10]. Building on this framework, approaches have been developed for the automated extraction of moderately high-dimensional sets of image features (on the order of thousands) from histopathological images, followed by the construction of relatively simple, linear classification models using methods designed for dimensionality reduction, such as sparse regression [2].
Since 2012, deep learning-based approaches have consistently shown best-in-class performance in major computer vision competitions, such as the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [18]. Deep learning-based approaches have also recently shown promise for applications in pathology. A team from the research group of Juergen Schmidhuber used a deep learning-based approach to win the ICPR 2012 and MICCAI 2013 challenges focused on algorithm development for mitotic figure detection [3]. In contrast to the types of machine learning approaches historically used in digital pathology, deep learning-based approaches tend to have no discrete human-directed steps for object detection, object segmentation, and feature extraction. Instead, the deep learning algorithms take as input only the images and the image labels (e.g., 1 or 0) and learn a very high-dimensional and complex set of model parameters, with supervision coming only from the image labels.
Our winning approach in the Camelyon Grand Challenge 2016 utilized a 27-layer deep network architecture and obtained near human-level classification performance on the test data set. Importantly, the errors made by our deep learning system were not strongly correlated with the errors made by a human pathologist. Thus, although the pathologist alone is currently superior to our deep learning system alone, combining deep learning with the pathologist produced a major reduction in pathologist error rate, reducing it from over 3 percent to less than 1 percent. More generally, these results suggest that integrating deep learning-based approaches into the workflow of the diagnostic pathologist could drive improvements in the reproducibility, accuracy, and clinical value of pathological diagnoses.
6. Acknowledgments
We thank all the Camelyon Grand Challenge 2016 organizers, with special acknowledgment to lead coordinator Babak Ehteshami Bejnordi. AK and AHB are co-founders of PathAI, Inc.
References

[1] E. H. Ackerknecht. Rudolf Virchow: Doctor, Statesman, Anthropologist. 1953.
[2] A. H. Beck, A. R. Sangoi, S. Leung, R. J. Marinelli, T. O. Nielsen, M. J. van de Vijver, R. B. West, M. van de Rijn, and D. Koller. Systematic analysis of breast cancer morphology uncovers stromal features associated with survival. Science Translational Medicine, 3(108):108ra113, 2011.
[3] D. C. Ciresan, A. Giusti, L. M. Gambardella, and J. Schmidhuber. Mitosis detection in breast cancer histology images with deep neural networks. In Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2013, pages 411-418. Springer, 2013.
[4] R. S. Cotran, V. Kumar, T. Collins, and S. L. Robbins. Robbins Pathologic Basis of Disease. 1999.
[5] B. J. Czerniecki, A. M. Scheff, L. S. Callans, F. R. Spitz, I. Bedrosian, E. F. Conant, S. G. Orel, J. Berlin, C. Helsabeck, D. L. Fraker, et al. Immunohistochemistry with pancytokeratins improves the sensitivity of sentinel lymph node biopsy in patients with breast carcinoma. Cancer, 85(5):1098-1103, 1999.
[6] S. B. Edge and C. C. Compton. The American Joint Committee on Cancer: the 7th edition of the AJCC Cancer Staging Manual and the future of TNM. Annals of Surgical Oncology, 17(6):1471-1474, 2010.
[7] J. G. Elmore, G. M. Longton, P. A. Carney, B. M. Geller, T. Onega, A. N. Tosteson, H. D. Nelson, M. S. Pepe, K. H. Allison, S. J. Schnitt, et al. Diagnostic concordance among pathologists interpreting breast biopsy specimens. JAMA, 313(11):1122-1132, 2015.
[8] F. Ghaznavi, A. Evans, A. Madabhushi, and M. Feldman. Digital imaging in pathology: whole-slide imaging and beyond. Annual Review of Pathology: Mechanisms of Disease, 8:331-359, 2013.
[9] M. N. Gurcan, L. E. Boucheron, A. Can, A. Madabhushi, N. M. Rajpoot, and B. Yener. Histopathological image analysis: a review. IEEE Reviews in Biomedical Engineering, 2:147-171, 2009.
[10] H. Irshad, A. Veillard, L. Roux, and D. Racoceanu. Methods for nuclei detection, segmentation, and classification in digital histopathology: a review - current status and future potential. IEEE Reviews in Biomedical Engineering, 7:97-114, 2014.
[11] S. Jaffer and I. J. Bleiweiss. Evolution of sentinel lymph node biopsy in breast cancer, in and out of vogue? Advances in Anatomic Pathology, 21(6):433-442, 2014.
[12] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pages 1097-1105. Curran Associates, Inc., 2012.
[13] G. H. Lyman, A. E. Giuliano, M. R. Somerfield, A. B. Benson, D. C. Bodurka, H. J. Burstein, A. J. Cochran, H. S. Cody, S. B. Edge, S. Galper, et al. American Society of Clinical Oncology guideline recommendations for sentinel lymph node biopsy in early-stage breast cancer. Journal of Clinical Oncology, 23(30):7703-7720, 2005.
[14] G. H. Lyman, S. Temin, S. B. Edge, L. A. Newman, R. R. Turner, D. L. Weaver, A. B. Benson, L. D. Bosserman, H. J. Burstein, H. Cody, et al. Sentinel lymph node biopsy for patients with early-stage breast cancer: American Society of Clinical Oncology clinical practice guideline update. Journal of Clinical Oncology, 32(13):1365-1383, 2014.
[15] R. E. Nakhleh. Error reduction in surgical pathology. Archives of Pathology & Laboratory Medicine, 130(5):630-632, 2006.
[16] N. Otsu. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man and Cybernetics, 9(1):62-66, 1979.
[17] S. S. Raab, D. M. Grzybicki, J. E. Janosky, R. J. Zarbo, F. A. Meier, C. Jensen, and S. J. Geyer. Clinical impact and frequency of anatomic pathology errors in cancer diagnoses. Cancer, 104(10):2205-2213, 2005.
[18] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211-252, 2015.
[19] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
[20] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR 2015, 2015.
[21] D. Wang, C. Otto, and A. K. Jain. Face search at scale: 80 million gallery. 2015.
[22] D. L. Weaver, D. N. Krag, E. A. Manna, T. Ashikaga, S. P. Harlow, and K. D. Bauer. Comparison of pathologist-detected and automated computer-assisted image analysis detected sentinel lymph node micrometastases in breast cancer. Modern Pathology, 16(11):1159-1163, 2003.