A Hypersensitive Breast Cancer Detector
Stefano Pedemonte1, Brent Mombourquette1, Alexis Goh1, Trevor Tsue1, Aaron Long1,
Sadanand Singh1, Thomas Paul Matthews1, Meet Shah1, and Jason Su1
1Whiterabbit AI, Inc., Santa Clara, CA, USA
ABSTRACT
Early detection of breast cancer through screening mammography yields a 20-35% increase in survival rate;1
however, there are not enough radiologists to serve the growing population of women seeking screening mam-
mography.2 Although commercial computer aided detection (CADe) software has been available to radiologists
for decades, it has failed to improve the interpretation of full-field digital mammography (FFDM) images due
to its low sensitivity over the spectrum of findings. In this work, we leverage a large set of FFDM images with
loose bounding boxes of mammographically significant findings to train a deep learning detector with extreme
sensitivity. Building upon work from the Hourglass architecture,3 we train a model that produces segmentation-
like images with high spatial resolution, with the aim of producing 2D Gaussian blobs centered on ground-truth
boxes. We replace the pixel-wise $L_2$ norm with a weak-supervision loss designed to achieve high sensitivity,
asymmetrically penalizing false positives and false negatives while softening the noise of the loose bounding
boxes by permitting a tolerance in misaligned predictions. The resulting system achieves a sensitivity for ma-
lignant findings of 0.99 with only 4.8 false positive markers per image. When utilized in a CADe system, this
model could enable a novel workflow where radiologists can focus their attention with trust on only the locations
proposed by the model, expediting the interpretation process and bringing attention to potential findings that
could otherwise have been missed. Due to its nearly perfect sensitivity, the proposed detector can also be used
as a high-performance proposal generator in two-stage detection systems.
Keywords: radiology, mammography, cancer, CAD, CADe, AI, deep learning, segmentation, weak supervision
1. PURPOSE
With nearly 269,000 new cases and 42,000 deaths each year, breast cancer is the most common and the second
most deadly cancer for women.4 Breast cancer screening using full-field digital mammography (FFDM) has led
to a reduction in deaths from this disease.1 Developed to aid radiologists in screening, traditional computer
Further author information: (Send correspondence to whiterabbit.ai)
whiterabbit.ai: E-mail: research@whiterabbit.ai
Figure 1. Two examples of detections. Red blobs represent the detector output. Red boxes are malignant findings marked
by a radiologist. Left: three false positives. Right: one false positive.
aided detection (CADe) highlighted suspicious regions in the image. However, it failed to achieve clinical utility
due to its low sensitivity, high false positive rate, and reliance on hand-designed features.5,6
Recently, convolutional neural networks (CNN) have achieved superhuman performance on many imaging
tasks and have been adapted to FFDM to improve cancer detection rates.7 In particular, CNN detection
models localize and classify different findings in images. However, most detection models are trained on well-
labeled, balanced datasets with tight bounding boxes. These models can struggle to adapt to datasets without
these qualities, such as mammography, where the cancer incidence rate is 0.51% and most bounding boxes
annotated in the routine clinical workflow only loosely encapsulate the finding.8 Thus, we propose a new weakly-
supervised loss function for a detector that marks malignant findings on mammograms, enabling us to achieve
extreme sensitivity with a false positive rate of less than a handful of marks per image.
2. METHODS
2.1 Hourglass Model
Two principal approaches for object detection can be identified in the literature: (1) detectors that localize and
predict the extent of objects by drawing bounding boxes around them,9,10 and (2) detectors that localize objects
by estimating their center coordinates.3 This second class of detectors has been developed mostly to localize
body joints for human pose estimation. In this context, a refreshingly simple and high-performance solution
was proposed by Newell et al.3 Their solution consists of a simple CNN trained to produce a Gaussian blob
at the location of each joint. This CNN, named Hourglass, is composed of several U-Nets, which are stacks of
residual modules that progressively downsample and then upsample the features.11 Before each downsampling,
skip connections are added across modules of identical resolution to facilitate gradient propagation. The key
characteristic of the U-Net and Hourglass architectures is to enable the model to produce outputs with high
spatial resolution while considering a wide receptive field for each output pixel. These end-to-end convolutional
networks are not much different from VGG-style networks, but have the advantage of removing the trade-off
between the size of the field-of-view and the resolution of the model output.
The authors of Hourglass achieve state-of-the-art performance on the body joint localization task by minimiz-
ing the $L_2$ norm between the network output and a stack of reference images, each representing a Gaussian blob
located at a different joint location. In our approach for the localization of malignant lesions on mammograms,
we maintain the Hourglass architecture but introduce a new loss that promotes high sensitivity.
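To make the architecture concrete, the following is a minimal PyTorch sketch of a single hourglass module: residual blocks arranged in a recursive downsample/upsample path with skip connections across features of identical resolution. It is an illustrative sketch under assumed hyperparameters (channel count, depth, pooling), not the authors' implementation.

```python
import torch.nn as nn

class Residual(nn.Module):
    """Basic residual block used throughout the hourglass."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return x + self.conv(x)

class Hourglass(nn.Module):
    """Recursive downsample/upsample module; a skip connection merges
    features of identical resolution on the way back up."""
    def __init__(self, channels, depth=4):
        super().__init__()
        self.skip = Residual(channels)    # same-resolution branch
        self.down = Residual(channels)    # processed after pooling
        self.inner = Hourglass(channels, depth - 1) if depth > 1 else Residual(channels)
        self.up = Residual(channels)      # processed before upsampling
        self.pool = nn.MaxPool2d(2)
        self.upsample = nn.Upsample(scale_factor=2, mode='nearest')

    def forward(self, x):
        skip = self.skip(x)
        y = self.upsample(self.up(self.inner(self.down(self.pool(x)))))
        return skip + y
```

The single-channel blob image O described in Section 2.2 can then be obtained from the hourglass features with a final 1 × 1 convolution.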
2.2 Blobs to coordinates
In pose estimation, where there is a fixed set of points, the Hourglass is trained to produce a single blob output
channel for each joint location in the form $O \in \mathbb{R}^{N_{joints} \times N_x \times N_y}$. We modify the last layer of the network such
that the output is $O \in \mathbb{R}^{1 \times N_x \times N_y}$, capturing all potential findings in one channel. We then define a new loss
that promotes high sensitivity and that allows for the prediction of any number of blobs at multiple locations.
Conversion of $O$ to an array of 2-dimensional coordinates is achieved by applying a simple peak-finding algorithm
(see Figure 2-A-D).
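The peak-finding step is not detailed in the text; the following is a plausible minimal sketch that marks local maxima of the blob image as detections, with the confidence threshold and neighborhood size as assumed hyperparameters.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def blobs_to_coordinates(output, threshold=0.5, window=15):
    """Convert the single-channel blob image O into peak coordinates.

    output:    2-D array, the model output for one image
    threshold: minimum blob intensity to report a detection (assumed)
    window:    neighborhood size for non-maximum suppression (assumed)
    """
    # A pixel is a peak if it equals the maximum of its neighborhood
    # and exceeds the confidence threshold.
    local_max = maximum_filter(output, size=window)
    peaks = (output == local_max) & (output > threshold)
    ys, xs = np.nonzero(peaks)
    scores = output[ys, xs]
    order = np.argsort(-scores)          # highest-confidence peaks first
    return [(int(xs[i]), int(ys[i]), float(scores[i])) for i in order]
```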
2.3 An innovative loss that promotes sensitivity
In Hourglass,3 the loss is the $L_2$ norm of the difference between the predicted images and a stack of images
containing 2-D Gaussian blobs centered on each joint annotation. We replace the $L_2$ norm of the Hourglass
model with one designed around four principles aimed at promoting sensitivity:
(a) Loss should remain unaffected by small mis-alignments between the predicted and ground truth locations.
(b) Loss should remain unaffected by size variations of the predicted blob.
(c) A small number of false positives per image is acceptable.
(d) False positive marks should be penalized less than false negative marks.
Figure 2. Hourglass network with loss that promotes high sensitivity. Panels: (A) annotated input image, (B) backbone
model, (C) model output, (D) peak-finding, (E) background loss mask, (F) detection loss.
The loss that we propose is composed of two terms: a detection loss $L_{DET}$ and a background loss $L_{BG}$. The
detection loss operates on pixels surrounding the area of an annotation, measuring the similarity between the
model output and a 2-D Gaussian blob. The background loss operates on pixels far away from annotations,
measuring the similarity of the model output to zero.
To calculate the detection loss, first a patch $O_{DET}$ (marked in green in Figure 2-D,F) of a chosen fixed size
centered on the ground truth annotation is extracted from the model output $O$ (in a practical implementation,
this operation happens in-place to enable propagation of the loss gradient). The detection loss is then calculated
as the $L_2$ norm of the difference between $O_{DET}$ and a reference patch $R$. Following principles (a) and (b), we
incorporate in the loss tolerances to errors in the location of annotations and to the varying extent of the findings
by constructing the reference patch $R$ adaptively, as a function of the model output $O_{DET}$. Invariance of the
loss to small mis-alignments between the predicted and ground truth locations (a) is obtained by generating a
reference 2-D Gaussian blob centered on the model's predicted blob (see Figure 2-F). Invariance of the loss to the
predicted blob size (b) is obtained by comparing the model output $O_{DET}$ with a bank of reference blobs $R_i$ of
different size $\sigma_i$, with $i = 1, \ldots, N_{RefBlobs}$ (see Figure 2-F). Only the most similar blob (i.e., the blob that yields
the smallest $L_2$ norm) contributes to the detection loss. Denoting with $\mathcal{N}$ a 2-D Gaussian over patch coordinates $x$:

$$x_0 = \text{center of mass}(O_{DET}), \qquad (1)$$
$$R_i(x) = \mathcal{N}(x;\, x_0, \sigma_i), \qquad (2)$$
$$R = \arg\min_i \| O_{DET} - R_i \|_2, \qquad (3)$$
$$L_{DET} = \| O_{DET} - R \|_2. \qquad (4)$$
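A minimal PyTorch sketch of Eqs. (1)-(4) follows. It is an illustration rather than the authors' code: the patch size, the bank of σ values, and detaching the predicted center of mass from the gradient are assumptions.

```python
import torch

def gaussian_blob(center, sigma, patch_size, device):
    """2-D Gaussian blob N(x; center, sigma) over a square patch."""
    coords = torch.arange(patch_size, dtype=torch.float32, device=device)
    yy, xx = torch.meshgrid(coords, coords, indexing='ij')
    cy, cx = center
    return torch.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2.0 * sigma ** 2))

def detection_loss(o_det, sigmas=(8.0, 16.0, 32.0)):
    """Detection loss L_DET of Eqs. (1)-(4).

    o_det:  model output patch of shape (P, P) centered on an annotation
    sigmas: bank of reference blob sizes sigma_i (assumed values)
    """
    p, device = o_det.shape[-1], o_det.device
    # Eq. (1): center of mass of the predicted patch; detached so the
    # reference blob acts as a target rather than a gradient path.
    coords = torch.arange(p, dtype=torch.float32, device=device)
    yy, xx = torch.meshgrid(coords, coords, indexing='ij')
    mass = o_det.detach().clamp(min=0)
    total = mass.sum() + 1e-8
    center = ((yy * mass).sum() / total, (xx * mass).sum() / total)
    # Eqs. (2)-(3): build the bank of reference blobs and keep the one
    # closest to the prediction in the L2 sense.
    losses = [torch.norm(o_det - gaussian_blob(center, s, p, device)) for s in sigmas]
    # Eq. (4): the minimum L2 distance is the detection loss.
    return torch.stack(losses).min()
```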
The background loss forces the model to produce values close to zero in areas far away from annotations,
evaluating the L2-norm of the model output in the region outside of the detector patch ODET (see Figure 2-E).
In order to promote sensitivity, we relax the effect of the background loss by allowing for a small number of false
positives in each image (c). This is achieved by masking out patches of the model output centered on the top-k
highest-confidence blobs generated by the model. Figure 2-E presents an example of the background mask $M(k)$ for
$k = 2$.

$$L_{BG} = \| M(k) \odot O \|_2, \qquad (5)$$

where $\odot$ denotes element-wise multiplication. Finally, sensitivity is promoted by penalizing false positive marks
less than false negative marks (d). This is achieved by down-weighting the background loss by a factor $\omega \in [0, 1]$,
producing the final loss:

$$L = L_{DET} + \omega L_{BG}. \qquad (6)$$
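The background term and combined loss of Eqs. (5)-(6) could be sketched as follows. This is a simplified illustration: the k masked locations are taken as the k highest-valued pixels, whereas in practice the peak-finding of Section 2.2 would select k distinct blobs, the region covered by the detection patches would also be excluded, and the masked patch size is an assumption.

```python
import torch

def background_loss(output, k=2, patch=64):
    """Background loss L_BG of Eq. (5): L2 norm of the model output with
    square patches around the top-k highest-confidence locations masked out.

    output: model output O of shape (H, W)
    k:      number of tolerated false-positive blobs per image
    patch:  side length of each masked-out square (assumed)
    """
    h, w = output.shape
    mask = torch.ones_like(output)
    _, idx = torch.topk(output.detach().flatten(), k)
    for i in idx.tolist():
        y, x = divmod(i, w)
        y0, y1 = max(0, y - patch // 2), min(h, y + patch // 2)
        x0, x1 = max(0, x - patch // 2), min(w, x + patch // 2)
        mask[y0:y1, x0:x1] = 0.0      # this blob does not contribute to L_BG
    return torch.norm(mask * output)  # Eq. (5), with M(k) the binary mask

def total_loss(l_det, l_bg, omega=0.01):
    """Eq. (6): the background term is down-weighted so that false positive
    marks are penalized less than false negative marks."""
    return l_det + omega * l_bg
```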
2.4 Dataset
                 Train     Val      Test
Patients         49748     6214     6194
Exams            158159    19872    19556
Images           665805    84135    82450
Annotations
  Benign         11930     1556     1380
  High Risk      891       124      76
  Malignant      2699      331      293
Normal Images    653232    82520    81009

Figure 3. Image-level statistics for training (Train), validation (Val), and testing (Test) datasets and annotation counts.
A total of 197,587 screening mammography exams
(832,390 FFDM images) spanning 62,156 patients
were collected from an academic medical center in the
United States. The exams were interpreted by one of
11 radiologists with breast imaging experience ranging
from 2 to 30 years. Annotations were collected as part
of the routine clinical workflow. The intended clinical
use case of these annotations did not require precise
segmentations nor rigorous definitions of the physical
extent of a finding. Therefore, the tightness of an an-
notation to the finding’s boundary varies from case
to case. The data encompasses all types of mammo-
graphically significant findings that could be encoun-
tered in a screening setting except for breast implants
which were excluded from both training and evalua-
tion. Screening exams were associated with subsequent biopsy events by clinical staff for regulatory compliance
purposes through dedicated mammography reporting software (Magview 7.1, Burtonsville, Maryland). This
provided a structured way to directly link annotations on screening exams to pathology results from biopsies.
Pathology cell type information was mapped to the labels of benign, high risk, or malignant by a fellowship
trained breast imaging radiologist.
The images were classified into four classes: (1) normal, no suspicious tissue was found by a radiologist, (2)
benign, benign tissue was found during screening or biopsy, (3) high risk, tissue likely to develop into cancer
was found during biopsy, (4) malignant, malignant tissue was found during biopsy. During training, the model
was trained to identify high risk (3) and malignant (4) as the positive cases with normal (1) and benign (2)
as the negative cases. Our analysis focused on the detection of biopsy proven malignant findings. Therefore,
though we considered high risk findings as positive examples in training to promote sensitivity, during evaluation
sensitivity was evaluated with malignant (4) as the positive class and all others as the negative class. Patients
were randomly selected for model training, validation, or testing according to an 80:10:10 split. These splits were
on the patient level, so there are no overlapping images, exams, or patients in the different datasets. Training,
hyperparameter tuning, and model selection were completed using only the training and validation sets. The
final performance was evaluated once on the test set after all models had been frozen. Statistics of the three
data sets are reported in Figure 3.
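As a sketch of the patient-level split described above (the exact procedure beyond a random, patient-level 80:10:10 assignment is not specified, and the record format and seed below are assumptions):

```python
import random
from collections import defaultdict

def split_by_patient(image_records, seed=0):
    """Assign every image of a patient to exactly one of train/val/test
    (80:10:10), so no patient, exam, or image crosses split boundaries.

    image_records: iterable of dicts with a 'patient_id' key (assumed format)
    """
    patients = sorted({r['patient_id'] for r in image_records})
    random.Random(seed).shuffle(patients)
    cut_train, cut_val = int(0.8 * len(patients)), int(0.9 * len(patients))
    assignment = {
        pid: 'train' if i < cut_train else 'val' if i < cut_val else 'test'
        for i, pid in enumerate(patients)
    }
    splits = defaultdict(list)
    for r in image_records:
        splits[assignment[r['patient_id']]].append(r)
    return splits
```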
2.5 Training
The model was trained on FFDM images resized to 3840 ×3840 then downsampled by a factor of 2.5 in each
dimension. Training occurred for 80 epochs using the Adam optimizer with a learning rate of 1e-4 and background
weight ω= 0.01. One epoch consisted of 8,000 images evenly sampled among the three annotation types and the
set of images without findings. Owing to the fully-convolutional architecture, the size of the input image can be
different for each batch. Thus, in order to speed-up experimentation, we trained the model in two phases. First,
we pre-trained the model on patches of size 1280 ×1280 centered on image annotations, downsampled by 2.5,
and then fine-tuned it on the whole-image input with size 3840 ×3840, downsampled by 2.5.
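A sketch of the epoch construction described above is shown below. The epoch size (8,000 images), optimizer (Adam, learning rate 1e-4), and background weight (ω = 0.01) follow the text, while the even class sampling with replacement and the data structures are assumptions.

```python
import random

def sample_epoch(images_by_class, epoch_size=8000, seed=None):
    """Draw one epoch of images, sampled evenly among the three annotation
    types and the set of images without findings (Section 2.5).

    images_by_class: dict mapping 'benign', 'high_risk', 'malignant' and
                     'normal' to lists of image identifiers (assumed format)
    """
    rng = random.Random(seed)
    per_class = epoch_size // len(images_by_class)
    epoch = []
    for items in images_by_class.values():
        # Sample with replacement, since rare classes (e.g. malignant)
        # contain fewer images than per_class.
        epoch.extend(rng.choices(items, k=per_class))
    rng.shuffle(epoch)
    return epoch

# Optimizer configuration reported above ('model' is the hourglass network):
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# omega = 0.01  # background-loss weight of Eq. (6)
```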
3. RESULTS
The hypersensitive loss detector achieves a sensitivity for malignancies of 0.99, generating only 4.8 false positive
marks per image. The Hourglass model with the original $L_2$ loss achieves only 0.42 sensitivity, indicating that standard
detection loss functions struggle on this challenging detection problem and highly imbalanced dataset. Addition-
ally, we compare our approach to other CADe software in Figure 4. Our approach is the only method that both
evaluates on a highly imbalanced dataset that reflects the natural distribution (0.86% malignancy case occur-
rence) and detects masses, microcalcifications, architectural distortions, and asymmetries. These two features
are crucial for a clinically-functional model, as they reflect what a model would encounter in an actual screening
Method                      Sensitivity  FPI  Data Type   Malignant/Total Cases  Dataset
Malich et al. (2001)12      0.900        1.3  M, C        150/150                Private
Petrick et al. (2002)13     0.870        1.5  M           156/156                Private
Baker et al. (2003)6        0.380        0.7  D           45/45                  Private
Dhungel et al. (2015)14     0.750        4.8  M           40/40                  DDSM-BCRP
Morra et al. (2015)15       0.890        2.7  M, C        123/175                Private
Ribli et al. (2018)16       0.900        0.3  M, C, D, A  115/115                INbreast
Teuwen et al. (2018)17      0.969        3.6  M, D, A     1153/2878*             Private
de Moor et al. (2018)18     0.940        7.9  M, D, A     1153/2878*             Private
Agarwal et al. (2019)19     0.980        1.7  M           211/223                DDSM + INbreast
Ours                        0.990        4.8  M, C, D, A  168/19556              Private
Figure 4. Sensitivity and false positives per image (FPI) for different CADe systems. M = Masses, C = Microcalcifications,
D = Architectural Distortions, A = Asymmetries. * Approximation (exact numbers not given)
population, where only 0.51% of exams have cancer and exams can have any type of lesion.8 Additionally, the
experiments in Figure 5 demonstrate the positive effect on sensitivity of each of the four innovative components
of the loss described in Section 2.3.
4. CONCLUSIONS
This work presents a novel loss function for training mammography CADe models. By considering the specific
properties of both the problem domain and the data, design and optimization of this loss produce a hypersensitive
detector with an acceptable false positive rate. Moreover, this loss function, though inspired by mammography
CADe, has components directly applicable to many image-based detection tasks. Data with large variances in
both segmentation quality and physical extent of findings to be detected are exceedingly common, especially in
the medical imaging domain. The proposed loss function makes no assumptions about the underlying model and
can be used with any segmentation-based network architecture.
The low sensitivity of existing CADe systems limits their applicability in the radiologists’ interpretation
workflow. Since malignant findings can be missed by a low-sensitivity CADe system, the human reader cannot
rely on software-generated annotations. Therefore, existing CADe systems are utilized as second readers to draw
the radiologist's attention back to image areas that the algorithm considers suspicious. However, unlike other
CADe systems, our model performs with a nearly perfect sensitivity and an acceptable rate of false positives on
a dataset that reflects the natural distribution, with a 0.86% cancer occurrence rate and all the different types
of lesions: masses, microcalcifications, architectural distortions, and asymmetries. Thus, this nearly perfect
sensitivity to malignant findings enabled by the new loss function described in this work could potentially allow
radiologists to safely ignore regions of the images not highlighted by the algorithm. We believe that these are
important steps forward for improving, simplifying, and substantially expediting radiologists’ interpretations of
mammograms.
5. FUTURE WORK
The high sensitivity of this detection model allows the detection of nearly all malignant findings in all images,
making it an ideal proposal-generator for a two-stage detection system.20 In the first stage, this detection model
identifies the suspicious regions of each full image, ideally capturing all the malignant lesions with high sensitivity.
In the second stage, we can train a classifier on image patches from the proposals generated by the first stage.
This second model takes only the smaller patches as input and is trained specifically on these proposals, which are the
most suspicious and challenging image areas to classify. Such a two-stage model could increase the specificity
while maintaining the high level of sensitivity.
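As a hedged sketch of how the two stages could be connected, the snippet below crops fixed-size patches around the first-stage peaks (e.g. from the peak-finding sketch in Section 2.2) to feed a second-stage patch classifier; the patch size and interfaces are assumptions.

```python
import numpy as np

def extract_proposals(image, peaks, patch_size=256):
    """Crop fixed-size patches around first-stage peak locations so that a
    second-stage classifier only re-scores the suspicious regions.

    image: 2-D array, the full mammogram
    peaks: list of (x, y, score) tuples from the first-stage detector
    """
    h, w = image.shape
    half = patch_size // 2
    patches = []
    for x, y, score in peaks:
        # Clamp the crop window so it stays inside the image.
        x0 = int(np.clip(x - half, 0, w - patch_size))
        y0 = int(np.clip(y - half, 0, h - patch_size))
        patches.append((image[y0:y0 + patch_size, x0:x0 + patch_size], score))
    return patches
```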
6. DISCLOSURE
This work has not been submitted to any journal or conference for publication or presentation consideration.
Figure 5. Effect of removing the four aspects of the loss on sensitivity and false positives per image (FPI). The plots
represent sensitivity as a function of the average number of false positive marks per image for increasing values of a
threshold applied to the predicted Gaussian blob intensity. All aspects increase the sensitivity: (a) invariance to blob
alignment, (b) invariance to blob size, (c) accepting top-k false positives per image (k = 3), (d) down-weighting the
background loss (ω = 0.01). For several of the curves (baseline, (a), (c), (d)), the sensitivity does not reach a value greater
than 0.96 even when all blobs are accepted, confirming the importance of each component of the loss. Blob size invariance
(b) has the effect of reducing the false positive rate while maintaining maximum sensitivity.
REFERENCES
[1] Elmore, J. G., Armstrong, K., Lehman, C. D., and Fletcher, S. W., “Screening for breast cancer,”
JAMA 293(10), 1245–1256 (2005).
[2] Baxi, S. S., Liberman, L., Lee, C., and Elkin, E. B., “Breast imaging fellowships in the united states: who,
what, and where?,” American Journal of Roentgenology 192(2), 403–407 (2009).
[3] Newell, A., Yang, K., and Deng, J., “Stacked hourglass networks for human pose estimation,” in [European
conference on computer vision], 483–499, Springer (2016).
[4] Siegel, R. L., Miller, K. D., and Jemal, A., “Cancer statistics, 2019,” CA: A Cancer Journal for Clini-
cians 69(1), 7–34 (2019).
[5] Lehman, C. D., Wellman, R. D., Buist, D. S., Kerlikowske, K., Tosteson, A. N., and Miglioretti, D. L., “Di-
agnostic accuracy of digital screening mammography with and without computer-aided detection,” JAMA
internal medicine 175(11), 1828–1837 (2015).
[6] Baker, J. A., Rosen, E. L., Lo, J. Y., Gimenez, E. I., Walsh, R., and Soo, M. S., “Computer-aided detec-
tion (cad) in screening mammography: Sensitivity of commercial cad systems for detecting architectural
distortion,” American Journal of Roentgenology 181, 1083–1088 (Oct 2003).
[7] Wu, N., Phang, J., Park, J., Shen, Y., Huang, Z., Zorin, M., Jastrzębski, S., Févry, T., Katsnelson, J.,
Kim, E., Wolfson, S., Parikh, U., Gaddam, S., Lin, L. L. Y., Ho, K., Weinstein, J. D., Reig, B., Gao, Y.,
Toth, H., Pysarenko, K., Lewin, A., Lee, J., Airola, K., Mema, E., Chung, S., Hwang, E., Samreen, N.,
Kim, S. G., Heacock, L., Moy, L., Cho, K., and Geras, K. J., “Deep neural networks improve radiologists’
performance in breast cancer screening,” CoRR abs/1903.08297 (2019).
[8] Lehman, C. D., Arao, R. F., Sprague, B. L., Lee, J. M., Buist, D. S., Kerlikowske, K., Henderson, L. M.,
Onega, T., Tosteson, A. N., Rauscher, G. H., and Miglioretti, D. L., “National Performance Benchmarks
for Modern Screening Digital Mammography: Update from the Breast Cancer Surveillance Consortium,”
Radiology 283, 49–58 (04 2017).
[9] He, K., Gkioxari, G., Dollar, P., and Girshick, R., “Mask r-cnn,” in [The IEEE International Conference
on Computer Vision (ICCV)], (Oct 2017).
[10] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A. C., “Ssd: Single shot
multibox detector,” in [European conference on computer vision], 21–37, Springer (2016).
[11] Ronneberger, O., Fischer, P., and Brox, T., “U-net: Convolutional networks for biomedical image seg-
mentation,” in [International Conference on Medical image computing and computer-assisted intervention],
234–241, Springer (2015).
[12] Malich, A., Marx, C., Facius, M., Boehm, T., Fleck, M., and Kaiser, W. A., “Tumour detection rate of
a new commercially available computer-aided detection system,” European Radiology 11, 2454–2459 (Dec
2001).
[13] Petrick, N., Sahiner, B., Chan, H.-P., Helvie, M. A., Paquerault, S., and Hadjiiski, L. M., “Breast cancer
detection: Evaluation of a mass-detection algorithm for computer-aided diagnosis: experience in 263 patients,”
Radiology 224(1), 217–224 (2002). PMID: 12091686.
[14] Dhungel, N., Carneiro, G., and Bradley, A. P., “Automated mass detection in mammograms using cas-
caded deep learning and random forests,” in [2015 International Conference on Digital Image Computing:
Techniques and Applications (DICTA)], 1–8 (Nov 2015).
[15] Morra, L., Sacchetto, D., Durando, M., Agliozzo, S., Carbonaro, L. A., Delsanto, S., Pesce, B., Persano, D.,
Mariscotti, G., Marra, V., Fonio, P., and Bert, A., “Breast cancer: Computer-aided detection with digital
breast tomosynthesis,” Radiology 277(1), 56–63 (2015). PMID: 25961633.
[16] Ribli, D., Horváth, A., Unger, Z., Pollner, P., and Csabai, I., “Detecting and classifying lesions in mammo-
grams with deep learning,” Scientific Reports 8(1) (2018).
[17] Teuwen, J., van de Leemput, S. C., et al., “Soft tissue lesion detection in
mammography using deep neural networks for object detection,” (2018).
[18] de Moor, T., Rodr´ıguez-Ruiz, A., Mann, R., and Teuwen, J., “Automated soft tissue lesion detection
and segmentation in digital mammography using a u-net deep learning network,” CoRR abs/1802.06865
(2018).
[19] Agarwal, R., Diaz, O., Lladó, X., Yap, M. H., and Martí, R., “Automatic mass detection in mammograms using deep
convolutional neural networks,” Journal of Medical Imaging 6(3), 1–9 (2019).
[20] Ren, S., He, K., Girshick, R., and Sun, J., “Faster r-cnn: Towards real-time object detection with region
proposal networks,” in [Advances in Neural Information Processing Systems 28], Cortes, C., Lawrence,
N. D., Lee, D. D., Sugiyama, M., and Garnett, R., eds., 91–99, Curran Associates, Inc. (2015).