CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning

Pranav Rajpurkar*1, Jeremy Irvin*1, Kaylie Zhu1, Brandon Yang1, Hershel Mehta1, Tony Duan1, Daisy Ding1, Aarti Bagul1, Curtis Langlotz2, Katie Shpanskaya2, Matthew P. Lungren2, Andrew Y. Ng1
Abstract

We develop an algorithm that can detect pneumonia from chest X-rays at a level exceeding practicing radiologists. Our algorithm, CheXNet, is a 121-layer convolutional neural network trained on ChestX-ray14, currently the largest publicly available chest X-ray dataset, containing over 100,000 frontal-view X-ray images with 14 diseases. Four practicing academic radiologists annotate a test set, on which we compare the performance of CheXNet to that of radiologists. We find that CheXNet exceeds average radiologist performance on pneumonia detection on both sensitivity and specificity. We extend CheXNet to detect all 14 diseases in ChestX-ray14 and achieve state-of-the-art results on all 14 diseases.
1. Introduction

More than 1 million adults are hospitalized with pneumonia and around 50,000 die from the disease every year in the US alone (CDC, 2017). Chest X-rays are currently the best available method for diagnosing pneumonia (WHO, 2001), playing a crucial role in clinical care (Franquet, 2001) and epidemiological studies (Cherian et al., 2005). However, detecting pneumonia in chest X-rays is a challenging task that relies on the availability of expert radiologists. In this work, we present a model that can automatically detect pneumonia from chest X-rays at a level exceeding practicing radiologists.
*Equal contribution. 1Stanford University Department of Computer Science, 2Stanford University School of Medicine. Correspondence to: Pranav Rajpurkar <pranavsr@cs.stanford.edu>, Jeremy Irvin <jirvin16@cs.stanford.edu>. Project website at https://stanfordmlgroup.github.io/projects/chexnet
Figure 1. CheXNet is a 121-layer convolutional neural network that takes a chest X-ray image as input and outputs the probability of a pathology. In this example, CheXNet correctly detects pneumonia (output: Pneumonia Positive, 85%) and also localizes the areas of the image most indicative of the pathology.
Our model, CheXNet (shown in Figure 1), is a 121-layer convolutional neural network that inputs a chest X-ray image and outputs the probability of pneumonia along with a heatmap localizing the areas of the image most indicative of pneumonia. We train CheXNet on the recently released ChestX-ray14 dataset (Wang et al., 2017), which contains 112,120 frontal-view chest X-ray images individually labeled with up to 14 different thoracic diseases, including pneumonia.
Figure 2. CheXNet outperforms the average of the radiologists at pneumonia detection using X-ray images. CheXNet is tested against 4 practicing radiologists on sensitivity (the proportion of positives that are correctly identified as such) and specificity (the proportion of negatives that are correctly identified as such). A single radiologist's performance is represented by an orange marker, while the average is represented by green. CheXNet outputs the probability of detecting pneumonia in a chest X-ray, and the blue curve is generated by varying the thresholds used for the classification boundary. The sensitivity-specificity point for each radiologist and for the average lie below the blue curve, signifying that CheXNet is able to detect pneumonia at a level matching or exceeding radiologists.
We use dense connections (Huang et al., 2016) and batch normalization (Ioffe & Szegedy, 2015) to make the optimization of such a deep network tractable.
Detecting pneumonia in chest radiography can be difficult for radiologists. The appearance of pneumonia in X-ray images is often vague, can overlap with other diagnoses, and can mimic many other benign abnormalities. These discrepancies cause considerable variability among radiologists in the diagnosis of pneumonia (Neuman et al., 2012; Davies et al., 1996; Hopstaken et al., 2004). To estimate radiologist performance, we collect annotations from four practicing academic radiologists on a subset of 420 images from ChestX-ray14. On these 420 images, we measure the performance of individual radiologists using the majority vote of the other radiologists as ground truth, and similarly measure model performance.
We find that the model exceeds the average radiologist performance at the pneumonia detection task on both sensitivity and specificity. To compare CheXNet against previous work using ChestX-ray14, we make simple modifications to CheXNet to detect all 14 diseases in ChestX-ray14, and find that we outperform the best published results on all 14 diseases. Automated detection of diseases from chest X-rays at the level of expert radiologists would not only have tremendous benefit in clinical settings; it would also be invaluable in the delivery of health care to populations with inadequate access to diagnostic imaging specialists.
2. CheXNet
2.1. Problem Formulation
The pneumonia detection task is a binary classification problem, where the input is a frontal-view chest X-ray image X and the output is a binary label y \in \{0, 1\} indicating the absence or presence of pneumonia respectively. For a single example in the training set, we optimize the weighted binary cross entropy loss

L(X, y) = -w_+ \cdot y \log p(Y = 1|X) - w_- \cdot (1 - y) \log p(Y = 0|X),

where p(Y = i|X) is the probability that the network assigns to the label i, w_+ = |N|/(|P| + |N|), and w_- = |P|/(|P| + |N|), with |P| and |N| the number of positive cases and negative cases of pneumonia in the training set respectively.
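The paper does not include reference code; the following is a minimal sketch of this weighted loss in PyTorch, where the function name, the epsilon clamp, and the illustrative class counts are our own assumptions rather than the paper's.

```python
import torch

def weighted_bce(p, y, w_pos, w_neg, eps=1e-7):
    """Weighted binary cross-entropy from Section 2.1.

    p     : predicted probability p(Y = 1 | X)
    y     : ground-truth label in {0, 1}
    w_pos : |N| / (|P| + |N|), up-weighting the rarer positive class
    w_neg : |P| / (|P| + |N|)
    """
    p = p.clamp(eps, 1 - eps)  # guard against log(0)
    return -(w_pos * y * torch.log(p) + w_neg * (1 - y) * torch.log(1 - p))

# Illustrative class counts (not the paper's exact training statistics):
n_pos, n_neg = 1_000, 99_000
w_pos = n_neg / (n_pos + n_neg)  # rare positives get weight close to 1
w_neg = n_pos / (n_pos + n_neg)

loss = weighted_bce(torch.tensor([0.85]), torch.tensor([1.0]), w_pos, w_neg)
```

Weighting each term by the prevalence of the opposite class keeps the rare pneumonia-positive examples from being drowned out by the far more numerous negatives.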
2.2. Model Architecture and Training
CheXNet is a 121-layer Dense Convolutional Network (DenseNet) (Huang et al., 2016) trained on the ChestX-ray14 dataset. DenseNets improve the flow of information and gradients through the network, making the optimization of very deep networks tractable. We replace the final fully connected layer with one that has a single output, after which we apply a sigmoid nonlinearity.

The weights of the network are initialized with weights from a model pretrained on ImageNet (Deng et al., 2009). The network is trained end-to-end using Adam with standard parameters (β1 = 0.9 and β2 = 0.999) (Kingma & Ba, 2014). We train the model using minibatches of size 16. We use an initial learning rate of 0.001 that is decayed by a factor of 10 each time the validation loss plateaus after an epoch, and pick the model with the lowest validation loss.
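A minimal sketch of this setup using torchvision's DenseNet-121 implementation; the specific torchvision calls and the use of ReduceLROnPlateau are our assumptions, since the paper states only the architecture, initialization, optimizer settings, and decay schedule.

```python
import torch
import torch.nn as nn
import torchvision

# DenseNet-121 initialized with ImageNet-pretrained weights; the final
# fully connected layer is replaced by a single-output head + sigmoid.
model = torchvision.models.densenet121(pretrained=True)
model.classifier = nn.Sequential(
    nn.Linear(model.classifier.in_features, 1),
    nn.Sigmoid(),
)

# Adam with standard parameters and the paper's initial learning rate.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))

# Decay the learning rate by a factor of 10 when validation loss plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min',
                                                       factor=0.1, patience=1)

# Training loop outline (mini-batches of size 16):
#   for each epoch: train, compute val_loss, call scheduler.step(val_loss),
#   and checkpoint the model whenever val_loss reaches a new minimum.
```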
3. Data
3.1. Training
We use the ChestX-ray14 dataset released by Wang et al. (2017), which contains 112,120 frontal-view X-ray images of 30,805 unique patients. Wang et al. (2017) annotate each image with up to 14 different thoracic pathology labels using automatic extraction methods on radiology reports. We label images that have pneumonia as one of the annotated pathologies as positive examples and label all other images as negative examples. For the pneumonia detection task, we randomly split the dataset into training (28,744 patients, 98,637 images), validation (1,672 patients, 6,351 images), and test (389 patients, 420 images) sets. There is no patient overlap between the sets.

Before inputting the images into the network, we downscale the images to 224×224 and normalize based on the mean and standard deviation of images in the ImageNet training set. We also augment the training data with random horizontal flipping.
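A sketch of this preprocessing pipeline with torchvision transforms; the normalization constants below are the standard ImageNet channel statistics used with torchvision's pretrained models, which we assume here.

```python
import torchvision.transforms as transforms

# Standard ImageNet channel statistics (mean, std per RGB channel).
imagenet_mean = [0.485, 0.456, 0.406]
imagenet_std = [0.229, 0.224, 0.225]

# Training: downscale to 224x224, random horizontal flip, normalize.
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(imagenet_mean, imagenet_std),
])

# Validation/test: same pipeline without the flip augmentation.
eval_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(imagenet_mean, imagenet_std),
])
```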
3.2. Test
We collected a test set of 420 frontal chest X-rays. Annotations were obtained independently from four practicing radiologists at Stanford University, who were asked to label all 14 pathologies in Wang et al. (2017). The radiologists had 4, 7, 25, and 28 years of experience, and one of the radiologists is a subspecialty fellowship-trained thoracic radiologist. Radiologists did not have access to any patient information or knowledge of disease prevalence in the data. Labels were entered into a standardized data entry program.
4. CheXNet vs. Radiologist Performance

4.1. Comparison
We assess radiologist performance on the test set on the pneumonia detection task. Recall that each of the images in the test set has a ground truth label from four practicing radiologists. We evaluate the performance of an individual radiologist by using the majority vote of the other three radiologists as ground truth. Similarly, we evaluate CheXNet using the majority vote of three of the four radiologists, repeated four times to cover all groups of three.

We compare CheXNet against radiologists on the Receiver Operating Characteristic (ROC) curve, which plots model sensitivity against 1 − specificity. Figure 2 illustrates the model ROC curve as well as the four radiologist and average radiologist operating points: a single radiologist's performance is represented by an orange marker, while the average is represented by green. CheXNet outputs the probability of detecting pneumonia in a chest X-ray, and the ROC curve is generated by varying the thresholds used for the classification boundary. CheXNet has an AUROC of 0.828 on the test set. The sensitivity-specificity point for each radiologist and for the average lie below the ROC curve, signifying that CheXNet is able to detect pneumonia at a level matching or exceeding radiologists.
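A sketch of how this evaluation could be reproduced with scikit-learn; the labels and probabilities below are toy stand-ins, not the paper's data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Toy stand-ins: y_true would be the majority vote of three radiologists,
# y_prob the model's predicted pneumonia probabilities on the test set.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.85, 0.20, 0.65, 0.90, 0.40, 0.10, 0.55, 0.30])

auroc = roc_auc_score(y_true, y_prob)  # the paper reports 0.828

# Each threshold on y_prob yields one (sensitivity, specificity) operating
# point; a radiologist corresponds to a single such point on this plane.
fpr, tpr, _ = roc_curve(y_true, y_prob)
sensitivity, specificity = tpr, 1 - fpr
```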
4.2. Limitations
We identify three limitations of this comparison. First, neither the model nor the radiologists were permitted to use prior examinations or patient history, the absence of which has been shown to decrease radiologist performance (Berbaum et al., 1985; Potchen et al., 1979). Second, only frontal radiographs were presented to the radiologists and model during diagnosis, but it has been shown that up to 15% of accurate diagnoses require the lateral view (Raoof et al., 2012); we thus expect that this setup provides a conservative estimate of performance. Third, neither the model nor the radiologists were permitted to use patient history, which has been shown to affect radiologist diagnostic performance in interpreting chest radiographs (for example, given a pulmonary abnormality with a history of fever and cough, pneumonia would be appropriate rather than less specific terms such as infiltration or consolidation) (Potchen et al., 1979).
Pathology            Wang et al. (2017)   Yao et al. (2017)   CheXNet (ours)
Atelectasis          0.716                0.772               0.8094
Cardiomegaly         0.807                0.904               0.9248
Effusion             0.784                0.859               0.8638
Infiltration         0.609                0.695               0.7345
Mass                 0.706                0.792               0.8676
Nodule               0.671                0.717               0.7802
Pneumonia            0.633                0.713               0.7680
Pneumothorax         0.806                0.841               0.8887
Consolidation        0.708                0.788               0.7901
Edema                0.835                0.882               0.8878
Emphysema            0.815                0.829               0.9371
Fibrosis             0.769                0.767               0.8047
Pleural Thickening   0.708                0.765               0.8062
Hernia               0.767                0.914               0.9164

Table 1. CheXNet outperforms the best published results on all 14 pathologies in the ChestX-ray14 dataset. In detecting Mass, Nodule, Pneumonia, and Emphysema, CheXNet has a margin of >0.05 AUROC over previous state-of-the-art results.
5. CheXNet vs. Previous State of the Art on the ChestX-ray14 Dataset
We extend the algorithm to classify multiple thoracic pathologies by making three changes. First, instead of outputting one binary label, CheXNet outputs a vector t of binary labels indicating the absence or presence of each of the following 14 pathology classes: Atelectasis, Cardiomegaly, Consolidation, Edema, Effusion, Emphysema, Fibrosis, Hernia, Infiltration, Mass, Nodule, Pleural Thickening, Pneumonia, and Pneumothorax. Second, we replace the final fully connected layer in CheXNet with a fully connected layer producing a 14-dimensional output, after which we apply an elementwise sigmoid nonlinearity. The final output is the predicted probability of the presence of each pathology class. Third, we modify the loss function to optimize the sum of unweighted binary cross entropy losses
L(X, y) = \sum_{c=1}^{14} [-y_c \log p(Y_c = 1|X) - (1 - y_c) \log p(Y_c = 0|X)],

where p(Y_c = 1|X) is the predicted probability that the image contains the pathology c and p(Y_c = 0|X) is the predicted probability that the image does not contain the pathology c.
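Under the same torchvision-based assumptions as the sketches in Section 2, the three changes could look as follows; the mean reduction over the batch is our choice, since the paper defines the loss for a single example.

```python
import torch
import torch.nn as nn
import torchvision

N_CLASSES = 14  # Atelectasis, Cardiomegaly, ..., Pneumothorax

# Changes 1 and 2: a 14-dimensional fully connected head with an
# elementwise sigmoid, so each output is p(Y_c = 1 | X).
model = torchvision.models.densenet121(pretrained=True)
model.classifier = nn.Sequential(
    nn.Linear(model.classifier.in_features, N_CLASSES),
    nn.Sigmoid(),
)

# Change 3: sum of unweighted binary cross-entropy losses over classes.
def multilabel_bce(p, y, eps=1e-7):
    """p, y: (batch, 14) tensors of predicted probabilities / 0-1 labels."""
    p = p.clamp(eps, 1 - eps)
    per_class = -(y * torch.log(p) + (1 - y) * torch.log(1 - p))
    return per_class.sum(dim=1).mean()  # sum over classes, mean over batch
```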
We randomly split the dataset into training (70%), validation (10%), and test (20%) sets, following previous work on ChestX-ray14 (Wang et al., 2017; Yao et al., 2017). We ensure that there is no patient overlap between the splits. We compare the per-class AUROC of the model against the previous state of the art held by Yao et al. (2017) on 13 classes and Wang et al. (2017) on the remaining 1 class.

We find that CheXNet achieves state-of-the-art results on all 14 pathology classes. Table 1 illustrates the per-class AUROC comparison on the test set. On Mass, Nodule, Pneumonia, and Emphysema, we outperform the previous state of the art considerably (>0.05 increase in AUROC).
6. Model Interpretation
To interpret the network predictions, we also produce heatmaps to visualize the areas of the image most indicative of the disease using class activation mappings (CAMs) (Zhou et al., 2016). To generate the CAMs, we feed an image into the fully trained network and extract the feature maps that are output by the final convolutional layer. Let f_k be the kth feature map and let w_{c,k} be the weight in the final classification layer for feature map k leading to pathology c. We obtain a map M_c of the most salient features used in classifying the image as having pathology c by taking the weighted sum of the feature maps using their associated weights. Formally,

M_c = \sum_k w_{c,k} f_k.
Figure 3. CheXNet localizes pathologies it identifies using Class Activation Maps, which highlight the areas of the X-ray that are most important for making a particular pathology classification. The captions for each image are provided by one of the practicing radiologists. (a) Patient with multifocal community-acquired pneumonia. The model correctly detects the airspace disease in the left lower and right upper lobes to arrive at the pneumonia diagnosis. (b) Patient with a left lung nodule. The model identifies the left lower lobe lung nodule and correctly classifies the pathology. (c) Patient with primary lung malignancy and two large masses, one in the left lower lobe and one in the right upper lobe adjacent to the mediastinum. The model correctly identifies both masses in the X-ray. (d) Patient with a right-sided pneumothorax and chest tube. The model detects the abnormal lung to correctly predict the presence of pneumothorax (collapsed lung). (e) Patient with a large right pleural effusion (fluid in the pleural space). The model correctly labels the effusion and focuses on the right lower chest. (f) Patient with congestive heart failure and cardiomegaly (enlarged heart). The model correctly identifies the enlarged cardiac silhouette.
We identify the most important features used by the model in its prediction of the pathology c by upscaling the map M_c to the dimensions of the image and overlaying the image.

Figure 3 shows several examples of CAMs on the pneumonia detection task as well as the 14-class pathology classification task.
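A sketch of this CAM computation for the torchvision DenseNet-121 used in the earlier snippets; the attribute names (model.features, model.classifier[0]) and the bilinear upsampling are assumptions tied to those sketches, not details given in the paper.

```python
import torch
import torch.nn.functional as F

def class_activation_map(model, image, class_idx):
    """Compute M_c = sum_k w_{c,k} f_k for one image.

    model     : DenseNet-121 with a linear classification head
    image     : tensor of shape (1, 3, 224, 224)
    class_idx : index c of the target pathology
    """
    # Feature maps output by the final convolutional layer.
    features = F.relu(model.features(image))          # (1, K, h, w)

    # Weights of the final classification layer for pathology c
    # (assumes model.classifier[0] is the nn.Linear head, as above).
    weights = model.classifier[0].weight[class_idx]   # (K,)

    # Weighted sum of feature maps: M_c = sum_k w_{c,k} * f_k.
    cam = torch.einsum('k,khw->hw', weights, features[0])

    # Upscale to the input resolution so it can be overlaid on the image.
    cam = F.interpolate(cam[None, None], size=image.shape[-2:],
                        mode='bilinear', align_corners=False)[0, 0]
    return cam
```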
7. Related Work
Recent advancements in deep learning and large datasets have enabled algorithms to surpass the performance of medical professionals in a wide variety of medical imaging tasks, including diabetic retinopathy detection (Gulshan et al., 2016), skin cancer classification (Esteva et al., 2017), arrhythmia detection (Rajpurkar et al., 2017), and hemorrhage identification (Grewal et al., 2017).
Automated diagnosis from chest radiographs has received increasing attention, with algorithms for pulmonary tuberculosis classification (Lakhani & Sundaram, 2017) and lung nodule detection (Huang et al., 2017). Islam et al. (2017) studied the performance of various convolutional architectures on different abnormalities using the publicly available OpenI dataset (Demner-Fushman et al., 2015). Wang et al. (2017) released ChestX-ray14, an order of magnitude larger than previous datasets of its kind, and also benchmarked different convolutional neural network architectures pretrained on ImageNet. Recently, Yao et al. (2017) exploited statistical dependencies between labels in order to make more accurate predictions, outperforming Wang et al. (2017) on 13 of 14 classes.
8. Conclusion
Pneumonia accounts for a significant proportion of patient morbidity and mortality (Gonçalves-Pereira et al., 2013). Early diagnosis and treatment of pneumonia is critical to preventing complications including death (Aydogdu et al., 2010). With approximately 2 billion procedures per year, chest X-rays are the most common imaging examination tool used in practice, critical for screening, diagnosis, and management of a variety of diseases including pneumonia (Raoof et al., 2012). However, two thirds of the global population lacks access to radiology diagnostics, according to an estimate by the World Health Organization (Mollura et al., 2010). There is a shortage of experts who can interpret X-rays, even when imaging equipment is available, leading to increased mortality from treatable diseases (Kesselman et al., 2016).

We develop an algorithm which exceeds the performance of radiologists in detecting pneumonia from frontal-view chest X-ray images. We also show that a simple extension of our algorithm to detect multiple diseases outperforms the previous state of the art on ChestX-ray14, the largest publicly available chest X-ray dataset. With automation at the level of experts, we hope that this technology can improve healthcare delivery and increase access to medical imaging expertise in parts of the world where access to skilled radiologists is limited.
9. Acknowledgements
We would like to acknowledge the Stanford Center for Artificial Intelligence in Medicine and Imaging for clinical dataset infrastructure support (AIMI.stanford.edu).
References
Aydogdu, M., Ozyilmaz, E., Aksoy, Handan, Gursel, G., and Ekim, Numan. Mortality prediction in community-acquired pneumonia requiring mechanical ventilation; values of pneumonia and intensive care unit severity scores. Tuberk Toraks, 58(1):25–34, 2010.

Berbaum, K., Franken Jr., E. A., and Smith, W. L. The effect of comparison films upon resident interpretation of pediatric chest radiographs. Investigative Radiology, 20(2):124–128, 1985.

CDC, 2017. URL https://www.cdc.gov/features/pneumonia/index.html.

Cherian, Thomas, Mulholland, E. Kim, Carlin, John B., Ostensen, Harald, Amin, Ruhul, Campo, Margaret de, Greenberg, David, Lagos, Rosanna, Lucero, Marilla, Madhi, Shabir A., et al. Standardized interpretation of paediatric chest radiographs for the diagnosis of pneumonia in epidemiological studies. Bulletin of the World Health Organization, 83(5):353–359, 2005.

Davies, H. Dele, Wang, Elaine E.-L., Manson, David, Babyn, Paul, and Shuckett, Bruce. Reliability of the chest radiograph in the diagnosis of lower respiratory infections in young children. The Pediatric Infectious Disease Journal, 15(7):600–604, 1996.

Demner-Fushman, Dina, Kohli, Marc D., Rosenman, Marc B., Shooshan, Sonya E., Rodriguez, Laritza, Antani, Sameer, Thoma, George R., and McDonald, Clement J. Preparing a collection of radiology examinations for distribution and retrieval. Journal of the American Medical Informatics Association, 23(2):304–310, 2015.

Deng, Jia, Dong, Wei, Socher, Richard, Li, Li-Jia, Li, Kai, and Fei-Fei, Li. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition (CVPR 2009), pp. 248–255. IEEE, 2009.

Esteva, Andre, Kuprel, Brett, Novoa, Roberto A., Ko, Justin, Swetter, Susan M., Blau, Helen M., and Thrun, Sebastian. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639):115–118, 2017.

Franquet, T. Imaging of pneumonia: trends and algorithms. European Respiratory Journal, 18(1):196–208, 2001.

Gonçalves-Pereira, João, Conceição, Catarina, and Póvoa, Pedro. Community-acquired pneumonia: identification and evaluation of nonresponders. Therapeutic Advances in Infectious Disease, 1(1):5–17, 2013.

Grewal, Monika, Srivastava, Muktabh Mayank, Kumar, Pulkit, and Varadarajan, Srikrishna. RADnet: Radiologist level accuracy using deep learning for hemorrhage detection in CT scans. arXiv preprint arXiv:1710.04934, 2017.
Gulshan, Varun, Peng, Lily, Coram, Marc, Stumpe, Martin C., Wu, Derek, Narayanaswamy, Arunachalam, Venugopalan, Subhashini, Widner, Kasumi, Madams, Tom, Cuadros, Jorge, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA, 316(22):2402–2410, 2016.

Hopstaken, R. M., Witbraad, T., van Engelshoven, J. M. A., and Dinant, G. J. Inter-observer variation in the interpretation of chest radiographs for pneumonia in community-acquired lower respiratory tract infections. Clinical Radiology, 59(8):743–752, 2004.

Huang, Gao, Liu, Zhuang, Weinberger, Kilian Q., and van der Maaten, Laurens. Densely connected convolutional networks. arXiv preprint arXiv:1608.06993, 2016.

Huang, Peng, Park, Seyoun, Yan, Rongkai, Lee, Junghoon, Chu, Linda C., Lin, Cheng T., Hussien, Amira, Rathmell, Joshua, Thomas, Brett, Chen, Chen, et al. Added value of computer-aided CT image features for early lung cancer diagnosis with small pulmonary nodules: A matched case-control study. Radiology, pp. 162725, 2017.

Ioffe, Sergey and Szegedy, Christian. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pp. 448–456, 2015.

Islam, Mohammad Tariqul, Aowal, Md Abdul, Minhaz, Ahmed Tahseen, and Ashraf, Khalid. Abnormality detection and localization in chest X-rays using deep convolutional neural networks. arXiv preprint arXiv:1705.09850, 2017.

Kesselman, Andrew, Soroosh, Garshasb, Mollura, Daniel J., and the RAD-AID Conference Writing Group. 2015 RAD-AID conference on international radiology for developing countries: The evolving global radiology landscape. Journal of the American College of Radiology, 13(9):1139–1144, 2016.

Kingma, Diederik and Ba, Jimmy. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

Lakhani, Paras and Sundaram, Baskaran. Deep learning at chest radiography: Automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology, pp. 162326, 2017.

Mollura, Daniel J., Azene, Ezana M., Starikovsky, Anna, Thelwell, Aduke, Iosifescu, Sarah, Kimble, Cary, Polin, Ann, Garra, Brian S., DeStigter, Kristen K., Short, Brad, et al. White paper report of the RAD-AID conference on international radiology for developing countries: identifying challenges, opportunities, and strategies for imaging services in the developing world. Journal of the American College of Radiology, 7(7):495–500, 2010.

Neuman, Mark I., Lee, Edward Y., Bixby, Sarah, Diperna, Stephanie, Hellinger, Jeffrey, Markowitz, Richard, Servaes, Sabah, Monuteaux, Michael C., and Shah, Samir S. Variability in the interpretation of chest radiographs for the diagnosis of pneumonia in children. Journal of Hospital Medicine, 7(4):294–298, 2012.

Potchen, E. J., Gard, J. W., Lazar, P., Lahaie, P., and Andary, M. Effect of clinical history data on chest film interpretation: direction or distraction. In Investigative Radiology, volume 14, pp. 404, 1979.

Rajpurkar, Pranav, Hannun, Awni Y., Haghpanahi, Masoumeh, Bourn, Codie, and Ng, Andrew Y. Cardiologist-level arrhythmia detection with convolutional neural networks. arXiv preprint arXiv:1707.01836, 2017.

Raoof, Suhail, Feigin, David, Sung, Arthur, Raoof, Sabiha, Irugulpati, Lavanya, and Rosenow, Edward C. Interpretation of plain chest roentgenogram. CHEST Journal, 141(2):545–558, 2012.

Wang, Xiaosong, Peng, Yifan, Lu, Le, Lu, Zhiyong, Bagheri, Mohammadhadi, and Summers, Ronald M. ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. arXiv preprint arXiv:1705.02315, 2017.

WHO. Standardization of interpretation of chest radiographs for the diagnosis of pneumonia in children. 2001.

Yao, Li, Poblenz, Eric, Dagunts, Dmitry, Covington, Ben, Bernard, Devon, and Lyman, Kevin. Learning to diagnose from scratch by exploiting dependencies among labels. arXiv preprint arXiv:1710.10501, 2017.

Zhou, Bolei, Khosla, Aditya, Lapedriza, Agata, Oliva, Aude, and Torralba, Antonio. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929, 2016.