Lung Nodule Detection using 3D Convolutional Neural
Networks Trained on Weakly Labeled Data
Rushil Anirudh¹, Jayaraman J. Thiagarajan², Timo Bremer², and Hyojin Kim²
¹School of Electrical, Computer and Energy Engineering, Arizona State University
²Center for Applied Scientific Computing, Lawrence Livermore National Laboratory
Early detection of lung nodules is currently one of the most effective ways to predict and treat lung cancer.
As a result, the past decade has seen a lot of focus on computer aided diagnosis (CAD) of lung nodules, whose
goal is to efficiently detect and segment lung nodules and classify them as benign or malignant. Effective
detection of such nodules remains a challenge due to their arbitrariness in shape, size and texture. In this paper,
we propose to employ 3D convolutional neural networks (CNN) to learn highly discriminative features for nodule
detection in lieu of hand-engineered ones such as geometric shape or texture. While 3D CNNs are promising
tools to model the spatio-temporal statistics of data, they are limited by their need for detailed 3D labels, which
can be prohibitively expensive compared to obtaining 2D labels. Existing CAD methods rely on detailed labels
for lung nodules to train models, which is also unrealistic and time consuming. To alleviate this
challenge, we propose a solution wherein the expert needs to provide only a point label, i.e., the central pixel
of the nodule, and its largest expected size. We use unsupervised segmentation to grow out a 3D region, which is
used to train the CNN. Using experiments on the SPIE-LUNGx dataset, we show that the network trained using
these weak labels can produce reasonably low false positive rates with a high sensitivity, even in the absence of
accurate 3D labels.
The last decade has seen significant advances in using machine learning for computer aided diagnosis (CAD),
which can significantly improve efficiency and reduce costs. The continued success of CAD tools can be
attributed to the development of feature representations that work well under several different conditions, with
invariances to properties such as brightness, shape, size and geometric transformations. More recently, advances
in representation learning (e.g. deep neural networks) have enabled inference of features from training data
in lieu of hand-tuned feature design by an expert [1]. These have resulted in significant boosts in accuracy for
tasks such as image recognition, natural language understanding, and speech recognition [1]. However, there are
significant hurdles before such successes can be transferred to benefit the medical imaging community. A major
limiting factor is the difficulty in obtaining annotated data, which is significantly more expensive than in
traditional computer vision. In this paper, we consider the problem of detecting early stage lung nodules
based on learned representations. This is a crucial problem in medical diagnosis since lung and bronchus cancer
was estimated to cause more deaths than any other cancer in 2015 [2]. Classical approaches
typically segment the lung, extract features from the training data and train a classifier to detect potential
nodules [3]. However, lung nodule detection is inherently challenging due to the high variability of nodule shape,
size, and texture. As a result, nodule detection techniques that employ classifiers learned using hand-engineered
features often generalize poorly to novel test data. More recent approaches that employ deep neural
networks in their pipeline have achieved state-of-the-art detection performance. For example, Kumar et al. [4]
use an autoencoder (an unsupervised learning network) to extract useful features from annotated nodules; these
features are then used to classify nodules as malignant or benign. Next, van Ginneken et al. [5] have shown
promising results using an off-the-shelf convolutional neural network (CNN), one that is pre-trained for an image
recognition task. They use the network to obtain features which are used for classification. Two dimensional
CNNs have been used in other CAD methods such as pancreas segmentation and lymph node and colonic polyp
detection [6]. Most of these methods are trained individually on 2D images with 2D convolutional filters, whereas
the data at hand is inherently 3 dimensional.

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National
Laboratory under Contract DE-AC52-07NA27344. Corresponding author email: email@example.com

Roth et al. [7] have addressed this by considering a ‘2.5D’ representation
that takes slices of the images from a point of interest in 3 orthogonal views. These slices are combined
and treated as a 3-channel image, which is used to train a deep network. In contrast, we propose to train a
full 3D convolutional network that can directly learn 3D convolutional filters from the data. Such filters are
beneficial because they can capture the full range of variations expected from the lung nodules. However, there
are two crucial challenges in generalizing 2D convolutional networks to 3D. The foremost challenge is the need
for labeled training data, which can be prohibitively expensive to obtain for 3D images. In fact, most existing
systems for detection rely on detailed nodule segmentations provided by an expert for model training, which
is particularly unrealistic in the case of convolutional neural nets since they require a much larger training
dataset for learning features with high representative power. In several computer vision applications, this problem
is addressed by outsourcing labeling using services such as Amazon's MTurk, which cannot be readily adapted for
medical image analysis since experts are required to effectively interpret the data. Consequently, the proposed
detection system reduces the labeling effort on experts by working with “point labels”, which are essentially
single pixel locations potentially indicating the center of the nodules. By using unsupervised learning methods
to estimate the true label from the weak information, we show that we can reduce the labeling effort required of
the expert, while being able to train 3D networks that can discriminate effectively. The second challenge
pertains to the computational burden of 3D neural networks. 3D convolutions are expensive, particularly
for processing full 3D scans (typically 512 × 512 × 200), and building a single network that can handle nodules
of varying sizes in different regions of the scan is hard. To circumvent this, we propose to train our network
on smaller 3D regions centered around the nodule instead of the whole image, and simultaneously build two
networks with different context sizes, 41 × 41 × 7 and 25 × 25 × 7 respectively. The final detection is obtained
as the consensus of the two networks. Our primary contributions can be summarized as follows:
1. We present a modular system that leverages the robustness of 3D convolutional neural networks for the
problem of lung nodule detection. Our system learns the most discriminative features for nodule detection
instead of working with hand-engineered features such as shape and texture. To the best of our knowledge,
we are the first to explore lung nodule detection using 3D convolutional filters.
2. Our system works with point labels, which specify a single voxel location that indicates the presence of
a nodule, and its largest cross sectional area. This is much more time efficient than providing detailed
annotations of every nodule in the training set, which is highly impractical since experts are needed to provide
these. Using unsupervised learning methods, we estimate a final 3D label which is used to train our 3D
3. By learning two different networks with varying context sizes, our detection system achieves improved
4. We demonstrate promising results on the AAPM-SPIE-LungX nodule classification dataset.
In this section we outline the different aspects of our system, which makes predictions on 3D volumes of CT scans.
First we address the label estimation procedure for training data. Next we use these estimated labels to train
our 3D convolutional network.
2.1 Estimating weak labels
A limiting factor for using 3D CNNs is the cost of obtaining detailed 3D labels, which are significantly harder
to obtain than 2D labels. This is exacerbated by the fact that experts such as radiologists are needed to label
lung nodules, as opposed to crowd sourcing platforms such as Amazon MTurk, which have become a norm in
computer vision. Therefore, we begin by using only a single voxel location or a point label, which indicates the
presence of a nodule. Such point labels are a natural way for experts to annotate lung scans efficiently. We
process the slices in 2D, and combine them using 3D Gaussian filtering. First, we obtain 2D SLIC superpixels [8]
to oversegment each slice, as shown in Figure 1. These superpixels find contiguous regions in the image, which
are used to eliminate regions that are obviously not nodules based on size and intensity. The 3D Gaussian filtering
reduces noise and combines the 2D slices to form a coherent 3D nodule. We are able to do this accurately because
we are looking at a local neighborhood around the nodule. The size of the local neighborhood is determined
by the largest cross sectional area of the nodule, as given by the expert. The superpixels can effectively aid in
capturing nodules that are hard to distinguish at times, such as when they are touching a lung wall.

Figure 1: Estimating the ground truth per slice from a point label given by the expert. These 3D labels are used to train
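The label-growing idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure used in the paper: a plain per-slice intensity threshold stands in for the SLIC superpixel selection, the Gaussian filter is a small separable NumPy implementation, and every name and parameter (`grow_label`, `max_diameter`, `n_slices`, `intensity_thresh`, `sigma`) is hypothetical.

```python
import numpy as np

def gaussian_smooth3d(vol, sigma=1.0):
    """Separable 3D Gaussian smoothing (assumes every axis is at least 2*radius+1 long)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    out = vol.astype(float)
    for axis in range(3):
        out = np.apply_along_axis(
            lambda line: np.convolve(line, kernel, mode="same"), axis, out)
    return out

def grow_label(volume, point, max_diameter, intensity_thresh, n_slices=3, sigma=1.0):
    """Grow a 3D binary label from a point label (x, y, z).

    Only a local neighborhood, sized by the expert-provided largest extent,
    is examined. NOTE: thresholding replaces the superpixel step here.
    """
    x, y, z = point
    r = max_diameter // 2
    xs = slice(max(x - r, 0), x + r + 1)
    ys = slice(max(y - r, 0), y + r + 1)
    label = np.zeros(volume.shape, dtype=float)
    # Process the slices one at a time in 2D.
    for k in range(max(z - n_slices, 0), min(z + n_slices + 1, volume.shape[2])):
        label[xs, ys, k] = volume[xs, ys, k] > intensity_thresh
    # Combine the per-slice masks into one coherent 3D region.
    return gaussian_smooth3d(label, sigma) > 0.5
```

The returned boolean volume plays the role of the estimated 3D label; a real implementation would swap the threshold for superpixel-based region selection.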
The 3D CNN is trained to predict whether or not a single voxel is likely to be a nodule, based on the
spatio-temporal statistics around it. For example, if the location of the nodule is at $V(x, y, z)$, where $V$ is the
entire CT volume, we choose the input volume to be $\hat{v} = V(x-w : x+w,\; y-w : y+w,\; z-h : z+h)$, where $w$ is the
window size in the X, Y planes and $h$ in the Z plane. We used values in the range $w = 10$ to $25$ and $h = 3, 5$. The
volume is thinner in the Z plane because CT scans are typically sampled much more densely in the X, Y planes than
in Z. There are at most 2 nodules per scan, but training a 3D CNN requires many examples to effectively
learn the filters. Therefore, in order to inflate our training set, we treat different voxels within the same nodule
as different positive examples. A typical nodule can range from 3 to 28 pixels wide at its largest size, and typically
spans 3 to 7 slices. We center our volume at several different randomly sampled voxels within the nodule and
pick the resulting volume for a given $w, h$ as a positive training example. Inflating training sets has been useful
for training networks that achieve robustness and avoid overfitting [7, 9]. The negative set is much harder to obtain
than the positive set because it is hard to define. A negative class contains all examples from the lung that are
not the nodule. A smart approach can provide a much better definition of what a negative sample should be.
In the ideal case, we only need to choose negative samples to be those which are expected to be easily mistaken
for nodules by our network. Therefore, we restrict the negative space to lie within the lung, since it is highly
unlikely for a nodule to be found outside it. Next, we randomly sample locations which have an intensity above a
threshold (≈ 400 to 500 on the Hounsfield scale). These sampling methods resulted in about 15K positive samples and
around 20K negative samples from the AAPM-SPIE-LungX dataset [10, 11].
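A minimal NumPy sketch of this sampling scheme is given below. With inclusive index ranges, a half-width $w$ and half-depth $h$ give a patch of size $(2w+1) \times (2w+1) \times (2h+1)$, so $w = 12, h = 3$ yields 25 × 25 × 7 and $w = 20, h = 3$ yields 41 × 41 × 7. The function names, the mask inputs, and the default `neg_thresh` (taken from the lower end of the range quoted above) are assumptions made for illustration.

```python
import numpy as np

def extract_patch(volume, center, w, h):
    """Return V(x-w:x+w, y-w:y+w, z-h:z+h) with inclusive bounds."""
    x, y, z = center
    return volume[x - w:x + w + 1, y - w:y + w + 1, z - h:z + h + 1]

def sample_centers(mask, n, w, h, rng):
    """Randomly pick up to n voxels where mask is True and a full patch fits."""
    fits = np.zeros(mask.shape, dtype=bool)
    fits[w:mask.shape[0] - w, w:mask.shape[1] - w, h:mask.shape[2] - h] = True
    coords = np.argwhere(mask & fits)
    if len(coords) == 0:
        return coords
    picked = rng.choice(len(coords), size=min(n, len(coords)), replace=False)
    return coords[picked]

def build_training_set(volume, nodule_mask, lung_mask, w=12, h=3,
                       n_pos=10, n_neg=10, neg_thresh=400.0, seed=0):
    """Positives: patches centered on voxels inside the nodule label.
    Negatives: patches centered on bright lung voxels outside the nodule."""
    rng = np.random.default_rng(seed)
    neg_mask = lung_mask & ~nodule_mask & (volume > neg_thresh)
    pos = [extract_patch(volume, c, w, h)
           for c in sample_centers(nodule_mask, n_pos, w, h, rng)]
    neg = [extract_patch(volume, c, w, h)
           for c in sample_centers(neg_mask, n_neg, w, h, rng)]
    return pos, neg
```

Centering several patches on different voxels of the same nodule is what inflates the positive set; restricting negatives to bright voxels inside the lung keeps the negative set focused on structures likely to be confused with nodules.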
2.3 Architecture of the Convolutional Neural Net
We trained a 3D CNN using the MatConvNet toolbox for MATLAB [12]. The toolbox allows us to specify the kind
of layers, and the number of filters needed. The network was designed to be similar to most of the popular models
for image recognition [9]. As shown in Figure 2, our network contains 5 convolutional layers, which are followed
by Rectified Linear Unit (ReLU) activation layers [9], 2 max-pooling layers, and a final 2-way softmax layer for
classification. We also use dropout [13] to regularize the learning problem. Of the five convolutional layers, two are
fully connected (FC) with convolution kernels of size 1 × 1. The generalization from 2D convolutional networks to
3D networks is straightforward, in that the filters that are learned are 3 dimensional. Since we use a multiscale
approach, we train a separate network for each of the two scales; the convolutional filter sizes for the larger 3D
CNN are shown in Figure 2. For the smaller scale, we use the same architecture, and modify the sizes of kernels accordingly.

Figure 2: Overall design of the 3D convolutional neural network trained for lung nodule detection.
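To make concrete what a three-dimensional filter does, the sketch below applies a single 3D kernel to an input volume, followed by a ReLU. It is written in NumPy purely for illustration (the network itself was built with MatConvNet), it computes a "valid" cross-correlation as is conventional in CNN toolboxes, and the kernel size used in the usage lines is a placeholder rather than one of the sizes in Figure 2.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv3d_valid(volume, kernel, bias=0.0):
    """Slide a 3D kernel over a 3D volume without padding.

    The output shrinks by (kernel size - 1) along each axis.
    """
    # windows has shape (out_x, out_y, out_z, k_x, k_y, k_z)
    windows = sliding_window_view(volume, kernel.shape)
    return np.einsum("xyzijk,ijk->xyz", windows, kernel) + bias

def relu(x):
    """Rectified Linear Unit activation."""
    return np.maximum(x, 0.0)

# Usage: one hypothetical 3x3x3 filter applied to a 41x41x7 input patch.
rng = np.random.default_rng(0)
patch = rng.standard_normal((41, 41, 7))
kernel = rng.standard_normal((3, 3, 3))
feature_map = relu(conv3d_valid(patch, kernel))  # shape (39, 39, 5)
```

A full layer would hold many such kernels, one per output channel, with the kernel values learned during training.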
2.4 Testing and candidate generation
Our network is trained end-to-end, i.e. it is able to make predictions regarding the presence of a nodule directly
from a CT volume of the appropriate size. However, a typical scan is 512 × 512 × 200 in size and searching the
entire 3D volume to make predictions is highly impractical. Instead, we reduce the search space significantly by
ruling out parts of the scan that are very unlikely to contain a lung nodule. Since the lung nodule is expected to be
inside the lung, we perform lung segmentation using morphological operations on each 2D slice. The segmentation
of the lung itself is a hard problem, and there are dedicated systems to perform effective segmentation in 2D and
3D. We also observed that most of the false positive detections of the system were caused by the airways, which
are part of the lung but look a lot like nodules when observed locally. Therefore a robust 3D segmentation can
significantly reduce the false positive rate, and improve speed of detection. Next, for each voxel we apply the dot
enhancement filter using the 3D Hessian. The resulting “dot score” is high if the region around the current voxel
is spherical [6]. The dot score map is thresholded in each local neighborhood to provide the final list of candidates.
This method can be very effective when the nodules are expected to be approximately round in shape. The dot
score is computed as $|\lambda_3|^2 / \lambda_1$, where $\lambda_1, \lambda_3$ are the first and third eigenvalues of the 3D Hessian. The dot score
for a given volume essentially provides an estimate of its roundness, such that a high score indicates a tendency
towards roundness. We set a low threshold to eliminate obviously non nodule-like elements, and run a 3D Gaussian
smoothing filter to remove smaller stray particles within the volume. These steps significantly reduce the false
positives, resulting in around 80-200 3D nodule-like candidates per scan. After smoothing, these can be
identified efficiently using a 3D connected component algorithm. Finally, we center a test volume at multiple
locations inside each candidate and obtain a prediction from the deep network. This also allows us to perform
voting in order to eliminate noisy predictions by running a smoothing filter on the predictions.
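The dot score can be sketched in NumPy as shown below. Several details are assumptions that the text does not spell out: the eigenvalues are ordered by decreasing magnitude, the denominator uses $|\lambda_1|$ so that the score is non-negative, the score is set to zero unless all three eigenvalues are negative (a bright blob on a darker background, as in common formulations of the dot enhancement filter), and the Hessian is estimated by finite differences without any pre-smoothing. The function names are hypothetical.

```python
import numpy as np

def hessian_eigenvalues(vol):
    """Per-voxel Hessian eigenvalues, sorted so that |l1| >= |l2| >= |l3|."""
    first = np.gradient(vol.astype(float))
    hess = np.empty(vol.shape + (3, 3))
    for i, g in enumerate(first):
        second = np.gradient(g)
        for j in range(3):
            hess[..., i, j] = second[j]
    hess = 0.5 * (hess + np.swapaxes(hess, -1, -2))  # enforce symmetry
    eig = np.linalg.eigvalsh(hess)
    order = np.argsort(-np.abs(eig), axis=-1)
    return np.take_along_axis(eig, order, axis=-1)

def dot_score(vol):
    """Dot enhancement score |l3|^2 / |l1|, zero where not all eigenvalues are negative."""
    lam = hessian_eigenvalues(vol)
    l1, l3 = lam[..., 0], lam[..., 2]
    bright = np.all(lam < 0, axis=-1)
    score = np.zeros(vol.shape)
    score[bright] = np.abs(l3[bright]) ** 2 / np.abs(l1[bright])
    return score
```

For a spherical blob the three eigenvalues have similar magnitude, so the score is large; for a tube-like structure such as a vessel or airway, the eigenvalue along the tube axis is close to zero and the score collapses.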
In this section we describe the dataset, experimental conditions, and results obtained for lung nodule detection.
SPIE-AAPM-LUNGx dataset: The dataset has been published for nodule classification, which requires
labeling each nodule as benign or malignant. We use the dataset for detection because it does not contain detailed
labels for nodules and hence is a realistic test case. Of the 70 scans, we used 20 for training and 47 for
testing. Three scans were discarded because there was ambiguity regarding the presence of a nodule at the
specified location. The label is provided as an (x, y, z) location along with information on the largest cross
sectional area of the nodule. We did not use this information; however, it could be used to estimate better labels.
3.1 Evaluation settings
For the test scans, we first generate ground truth labels in a similar fashion as described for the training data.
These estimated labels are used to evaluate the performance of our system on the dataset.
Multiscale CNN: The lung nodules vary significantly in size, typically from around 3 mm to 20 mm. Many
successful detection systems employ a multi-scale architecture. Since we are interested in 3D volumes, there are
several ways to choose the scale. We chose two scales, 25 × 25 × 7 and 41 × 41 × 7, experimentally. We train the
two CNNs separately and combine their predictions to obtain the final result. As expected, the combination
performed much better in terms of sensitivity and accuracy. Finally, we generate the free-response receiver
operating characteristic (FROC) curves at various detection thresholds. At a particular threshold, we declare a
match if there is a detection within a small radius (typically 5 to 10) of the ground truth. This is done by first
estimating the centroid of each 3D blob in the test prediction that is greater than the threshold. Next, we find the
distances from each centroid to the ground truth. Only the one that is closest and within a distance threshold is
considered a positive; the rest are considered false positives. For each threshold, the total number of false
positives divided by the number of scans gives the average false positive rate.
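The matching rule above can be written down compactly. The sketch below computes one point of the FROC curve, i.e. the sensitivity and the average number of false positives per scan at a single detection threshold; the function names and the data layout (lists of centroid and ground-truth coordinates per scan) are assumptions made for illustration.

```python
import math

def match_detections(centroids, truths, max_dist):
    """Return (true_positives, false_positives) for one scan.

    For each ground-truth location, only the closest unmatched centroid
    within max_dist counts as a positive; every other centroid is a
    false positive.
    """
    matched = set()
    tp = 0
    for t in truths:
        best, best_d = None, max_dist
        for i, c in enumerate(centroids):
            d = math.dist(c, t)
            if i not in matched and d <= best_d:
                best, best_d = i, d
        if best is not None:
            matched.add(best)
            tp += 1
    return tp, len(centroids) - len(matched)

def froc_point(scans, max_dist):
    """scans: list of (centroids, truths) pairs obtained at one threshold.

    Returns (sensitivity, average false positives per scan).
    """
    tp = fp = n_truth = 0
    for centroids, truths in scans:
        t, f = match_detections(centroids, truths, max_dist)
        tp += t
        fp += f
        n_truth += len(truths)
    sensitivity = tp / n_truth if n_truth else 0.0
    return sensitivity, fp / len(scans)
```

Sweeping the detection threshold, recomputing the blob centroids each time, and calling `froc_point` once per threshold traces out the full curve.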
Results: We compute the free-response receiver operating characteristic (FROC) for our system, which plots the
sensitivity against the average number of false positives per scan. The results are shown in Figure 3a. As can be
seen, even with a weakly labeled system, we achieve a sensitivity of 80% at 10 false positives per scan. Sample
predictions are shown in Figure 3b.
Figure 3: Detection performance on the SPIE-AAPM LUNGx dataset. (a) ROC curve (plot title: Free Receiver Operating
Characteristic; x-axis: Average False Positives Per Scan; legend: 2 scale CNN). (b) Sample results.
4. CONCLUSION & FUTURE WORK:
We have presented a system for lung nodule detection that works with 3D convolutional networks trained using
weak label information. While the initial results look promising, there are areas in which the system can be improved.
Our system currently processes superpixels in 2D; this could be improved with a 3D superpixel system
that clusters coherent spatio-temporal regions. A 3D lung segmentation approach could also eliminate the air
tracts, which are a primary cause of false positives but cannot be differentiated when observed in 2D.
Next, the training set can be inflated even further using 3D transforms of existing labels, which has been done
for 2D CNNs. Such a technique should also ensure there is little overlap between the original label and the
transformed label to avoid overfitting.
REFERENCES
[1] LeCun, Y., Bengio, Y., and Hinton, G., “Deep learning,” Nature 521(7553), 436–444 (2015).
[2] http://www.lung.org/lung-disease/lung-cancer/lung-cancer-screening-guidelines/lung-cancer-screening-for-patients.pdf. [accessed 13-Aug-2015].
[3] Dhara, A. K., Mukhopadhyay, S., and Khandelwal, N., “Computer-aided detection and analysis of pulmonary
nodule from CT images: A survey,” IETE Technical Review 29(4), 265–275 (2012).
[4] Kumar, D., Wong, A., and Clausi, D. A., “Lung nodule classification using deep features in CT images,” in
[Computer and Robot Vision (CRV), 2015 12th Conference on], 133–138, IEEE (2015).
[5] van Ginneken, B., Setio, A. A., Jacobs, C., and Ciompi, F., “Off-the-shelf convolutional neural network
features for pulmonary nodule detection in computed tomography scans,” in [Biomedical Imaging (ISBI),
2015 IEEE 12th International Symposium on], 286–289, IEEE (2015).
[6] Choi, W.-J. and Choi, T.-S., “Automated pulmonary nodule detection based on three-dimensional shape-based
feature descriptor,” Computer Methods and Programs in Biomedicine 113(1), 37–54 (2014).
[7] Roth, H. R., Lu, L., Seff, A., Cherry, K. M., Hoffman, J., Wang, S., Liu, J., Turkbey, E., and Summers,
R. M., “A new 2.5D representation for lymph node detection using random sets of deep convolutional neural
network observations,” in [Medical Image Computing and Computer-Assisted Intervention–MICCAI 2014],
520–527, Springer International Publishing (2014).
[8] Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., and Susstrunk, S., “SLIC superpixels compared
to state-of-the-art superpixel methods,” Pattern Analysis and Machine Intelligence, IEEE Transactions
on 34(11), 2274–2282 (2012).
[9] Krizhevsky, A., Sutskever, I., and Hinton, G. E., “ImageNet classification with deep convolutional neural
networks,” in [Advances in Neural Information Processing Systems], 1097–1105 (2012).
[10] “SPIE-AAPM-NCI lung nodule classification challenge dataset.” https://wiki.cancerimagingarchive.net/display/DOI/SPIE-AAPM-NCI+Lung+Nodule+Classification+Challenge+Dataset. [accessed 13-Aug-
[11] Armato, III, S. G., Hadjiiski, L., Tourassi, G. D., Drukker, K., Giger, M. L., Li, F., Redmond, G.,
Farahani, K., Kirby, J. S., and Clarke, L. P., “Guest editorial: LUNGx challenge for computerized lung nodule
classification: reflections and lessons learned,” Journal of Medical Imaging 2(2), 020103 (2015).
[12] Vedaldi, A. and Lenc, K., “MatConvNet-convolutional neural networks for MATLAB,” arXiv preprint
[13] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R., “Dropout: A simple way
to prevent neural networks from overfitting,” The Journal of Machine Learning Research 15(1), 1929–1958