COVID-19/Pneumonia Classification Based on Guided Attention

Preprints and early-stage research may not have been peer reviewed yet.

Abstract

With the novel coronavirus disease 2019 (COVID-19) continuing to have a devastating effect around the globe, many scientists and clinicians are actively seeking new techniques to help tackle this disease. Modern machine learning methods have shown promise in assisting the health care industry through data- and analytics-driven decision making, inspiring researchers to develop new angles to fight the virus. In this paper, we aim to develop a robust method for the detection of COVID-19 from patients' chest X-ray images. Despite recent progress, scarcity of data has thus far limited the development of a robust solution. We extend existing work by combining publicly available data from 5 different sources and carefully annotating the images into three categories: normal, pneumonia, and COVID-19. To achieve a high classification accuracy, we propose a training pipeline based on the directed guidance of traditional classification networks, where the guidance is provided by an external segmentation network. With this pipeline, we observed that widely used, standard networks can achieve an accuracy comparable to models tailor-made for COVID-19; furthermore, one network, VGG-16, outperformed the best of the tailor-made models.
Viacheslav Danilov
Alex Karpovsky
Kanda Software
Alexander Kirpich
Georgia State University
Diana Litmanovich
Beth Israel Deaconess Medical Center
Dato Nefaridze
Oleg Talalov
Semyon Semyonov
Alexander Proutski
Vladimir Koniukhovskii
EPAM Systems
Vladimir Shvartc
EPAM Systems
Yuriy Gankin ( )
Research Article
Keywords: COVID-19, pneumonia, classification, guided attention, deep learning
License: This work is licensed under a Creative Commons Attribution 4.0 International License. 
Research Highlights
1. Both direct and indirect supervision allow for networks to focus more on the desired object.
2. Basing network training on the guided attention mechanism results in accuracies comparable to
tailor-made networks made for distinguishing between COVID-19 and pneumonia.
3. Direct supervision based on U-net influences network performance more than indirect supervision
based on Grad-CAM.
4. VGG-16 trained using guided attention demonstrated the most accurate classification, at the level
of 88% and 84% on the validation and testing subsets, respectively.
1. Introduction
Since its introduction into the human population in late 2019, COVID-19 continues to have a devastating
effect on the global populace with the number of infected individuals steadily rising 1. With widely
available treatments still outstanding and the continued strain placed on many healthcare systems
across the world, efficient screening of suspected COVID-19 patients and their subsequent isolation is of
paramount importance to mitigate the further spread of the virus. Presently, the accepted gold standard
for patient screening is reverse transcriptase-polymerase chain reaction (RT-PCR) where the presence of
COVID-19 is inferred from analysis of respiratory samples 2. Despite its success, RT-PCR is a highly
involved manual process with slow turnaround times, with results becoming available up to several
days after the test is performed. Furthermore, its variable sensitivity, lack of standardized reporting, and
widely ranging total positive rate 3–5 call for alternative screening methods.
Chest radiography imaging (such as X-ray or computed tomography (CT) imaging) has gained traction as
a powerful alternative, where the diagnosis is administered by expert radiologists who analyze the
resulting images and infer the presence of COVID-19 through subtle visual cues 6–10. Of the two imaging
methods studied, X-ray imaging has distinct advantages with regards to accessibility, availability, and
rate of testing 11. Furthermore, portable X-ray imaging systems remove the need for patient
transportation or physical contact between healthcare professionals and suspected infected individuals,
thus allowing for efficient virus isolation and a safer testing methodology. Despite its obvious promise,
the main challenge facing radiography examination is the scarcity of trained experts who can conduct
the analysis at a time when the number of possible patients continues to rise. As such, a computer
system that could accurately analyze and interpret chest X-ray images could significantly alleviate the
burden placed on expert radiologists and further streamline patient care. Image identification techniques
are readily adopted in Artificial Intelligence (AI) and could prove to be a powerful solution to the problem
at hand.
Despite recent progress in the development of AI algorithms 12–15, one of the fundamental issues facing
the development of a robust solution is the scarcity of publicly available data. We extend upon existing
works by combining various publicly available data sources and carefully annotating the images across
three classes: normal, pneumonia, and COVID-19. The data is then divided into training, validation, and
testing subsets with an 8:1:1 split respectively with a strict class balance maintained across all sets.
Deep learning models, such as convolutional neural networks (CNNs), have gained traction in the field of
medical imaging 16 and here we train 10 promising CNNs for the purpose of COVID-19 classification in
chest X-ray images. To assist the models, we utilize a purpose-built extraction mask as part of a three-
stage procedure. The mask accurately extracts the lung areas from the CXRs, with the subsequent
images fed into one of the CNNs. To better quantify the performance of our proposed framework we
benchmark our results against recently developed COVID-Net models 12. To ensure consistency we utilize
our dataset to output predictions across an array of different COVID-Net models.
The structure of the rest of this paper is as follows. Section 2 summarizes the data collected from the 5
most relevant datasets. Section 3 describes the proposed three-stage workflow using a guided attention
mechanism. The results obtained during all stages, further improvements of the proposed workflow, its
advantages over other models, and its possible implementation are shown in Sect. 4. Section 5 presents a
synthesis of the key points of the developed model based on the guided attention mechanism.
2. Data
To train a high-precision classifier, we collected data from different publicly available sources. At the time
of publication, we identified the following five datasets: Covid Chest X-Ray Dataset (CCXRD) 17,18,
Actualmed COVID-19 Chest X-Ray Dataset (ACCXRD) 19, Figure 1 COVID-19 Chest X-Ray Dataset (FCCXRD)
20, COVID-19 Radiography Database (CRD) 21,22, and RSNA Pneumonia Detection Dataset (RSNAPDD) 23.
Since the datasets include different labels for their findings, we conducted the following mapping. We
assigned viral and bacterial pneumonias to the “Pneumonia” label; SARS, MERS-CoV, COVID-19, and
COVID-19 (ARDS) to the “COVID-19” label; “no findings” and “normal” diagnosis to the “Normal” label.
Table 1 summarizes statistical information of the study data set.
Table 1. Statistical information on the dataset used in the study
Dataset    Diagnosis                                 Images in dataset
           Normal        Pneumonia     COVID-19
CCXRD      18            162           503           683
ACCXRD     127           –             58            185
FCCXRD     3             2             35            40
CRD        –             –             219           219
RSNAPDD    800           700           –             1500
Total      952 (34.1%)   918 (32.9%)   920 (33.0%)   2790 (100%)
It should be noted that the RSNAPDD dataset includes only normal and pneumonia cases. Originally, this
dataset consisted of 20672 normal cases and 9555 cases of pneumonia. To keep the class
balance in our dataset, we added only 800 normal and 700 pneumonia cases. It is worth noting that
normal and pneumonia cases from the CRD dataset were excluded because they duplicated images from
the CCXRD dataset.
The final dataset only includes images acquired from the anterior-posterior (AP) and posteroanterior (PA)
directions. Lateral CXR has no clinical applicability to distinguish COVID-19 patients 24.
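The label mapping and view filter described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the raw label strings, the `finding` and `view` field names, and the `harmonise` helper are all assumptions, since each source dataset spells its findings differently.

```python
# Hypothetical label harmonisation; the raw strings and field names are assumptions.
LABEL_MAP = {
    "viral pneumonia": "Pneumonia",
    "bacterial pneumonia": "Pneumonia",
    "sars": "COVID-19",
    "mers-cov": "COVID-19",
    "covid-19": "COVID-19",
    "covid-19 (ards)": "COVID-19",
    "no findings": "Normal",
    "normal": "Normal",
}
ALLOWED_VIEWS = {"AP", "PA"}  # lateral views are excluded from the final dataset


def harmonise(record):
    """Return (label, keep) for a record dict with 'finding' and 'view' keys."""
    label = LABEL_MAP.get(record["finding"].strip().lower())
    keep = label is not None and record["view"].strip().upper() in ALLOWED_VIEWS
    return label, keep


records = [
    {"finding": "COVID-19 (ARDS)", "view": "PA"},
    {"finding": "Bacterial pneumonia", "view": "AP"},
    {"finding": "No findings", "view": "L"},
]
for r in records:
    print(r["finding"], "->", harmonise(r))
```

Records with an unknown finding or a lateral view are flagged with `keep=False` rather than silently dropped, which makes it easy to audit what was excluded.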
3. Methods
The proposed workflow in this study is divided into three stages. During the first stage, 10 industry-
standard networks, namely MobileNet V2, DenseNet-121, EfficientNet B0, EfficientNet B1, EfficientNet B3,
EfficientNet B5, VGG-16, ResNet-50 V2, Inception V3, and Inception ResNet V2, were trained on the
prepared dataset. All of these networks are de facto industry standards in the field of deep learning.
During the second stage, we chose the 4 most accurate networks, which were then fine-tuned. In the
process of fine-tuning, both the feature extractor and the classifier were trained. In the third stage, the
networks were trained with a guided attention mechanism. This mechanism is based on the
U-net segmentation network, whose output is used to focus the classifier on the lung area of an image.
Besides direct guidance by U-net, the network is additionally trained with indirect supervision through
the application of Grad-CAM. Indirect supervision is used in the training process since Grad-CAM’s
attention heatmaps reflect the areas of an input image supporting the network’s prediction. In this regard,
the prediction is based on the areas which we expect the network to focus on, while indirect supervision
forces networks to focus on the desired object in the image rather than its other parts. The training
workflow of the model is shown in Fig. 1. All three stages are described in more detail in Sect. 3.1 and 3.2.
It should be noted that different COVID-Net models 12 are also considered in this study. To date, COVID-
Net models are state-of-the-art models used for distinguishing COVID-19 and pneumonia cases. All
COVID-Net models are abbreviated CXR further in the paper.
3.1. Stage I and Stage II
As mentioned above, we chose 10 deep learning networks in order to find out which network
architectures are most effective in recognizing COVID-19 and pneumonia. The networks vary in the
number of weights, architecture topology, the way data are processed, etc. Additionally, CXR models are
used for comparison purposes. Table 2 summarizes information about the networks we used in the first
stage.
Table 2. Description of the models used during the first stage
Model                 Input image size   Output feature vector size   Parameters, millions   Size, MB   Ref.
MobileNet V2          224x224            7x7x1280                     2.6                    14         25
DenseNet-121          224x224            7x7x1024                     7.2                    33         26
EfficientNet B0       224x224            7x7x1280                     4.2                    29         27
EfficientNet B1       240x240            8x8x1280                     6.7                    31         27
EfficientNet B3       300x300            10x10x1536                   11.0                   48         27
EfficientNet B5       456x456            15x15x2048                   28.8                   75         27
VGG-16                224x224            7x7x512                      14.9                   528        28
ResNet-50 V2          224x224            7x7x2048                     25.6                   98         29
Inception V3          299x299            8x8x2048                     22.1                   92         30
Inception ResNet V2   299x299            5x5x1536                     54.5                   215        31
CXR Small             224x224            7x7x2048                     117.4                  1448       32
CXR Large             224x224            7x7x2048                     127.4                  1486       32
CXR-3A                480x480            13x13x1536                   40.2                   617        32
CXR-3B                480x480            15x15x2048                   11.7                   293        32
CXR-3C                480x480            15x15x2048                   9.2                    210        32
CXR-4A                480x480            13x13x1536                   40.2                   617        32
CXR-4B                480x480            15x15x2048                   11.7                   293        32
CXR-4C                480x480            15x15x2048                   9.2                    210        32
To train the abovementioned networks, we used their bodies (backbones) with frozen ImageNet weights.
Using Amazon SageMaker 33, we tuned each model and found its best version through a series of
training jobs run on the collected dataset. Hyperparameter tuning based on a Bayesian
optimization strategy yielded the set of hyperparameter values for the best-performing model, as
measured by validation accuracy. The optimal architecture of the network head consists of the
following layers:
Global Average Pooling layer;
Densely-connected layer with 128 neurons and ELU activation;
Dropout layer with dropout rate equal to 0.10;
Densely-connected layer with 64 neurons and ELU activation;
Dropout layer with dropout rate equal to 0.05;
Densely-connected layer with 3 neurons;
Softmax activation layer.
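To make the layer list above concrete, the following NumPy sketch runs a forward pass through a head with the same layer order and sizes. It is an illustration under stated assumptions, not the authors' implementation: the weight initialisation, the inverted-dropout formulation, and the helper names `init_head` and `head_forward` are hypothetical, and the 7x7x512 input simply mirrors the VGG-16 feature size listed in Table 2.

```python
import numpy as np


def elu(x, alpha=1.0):
    # ELU activation: identity for positive inputs, alpha * (exp(x) - 1) otherwise.
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def init_head(channels, rng):
    # Dense 128 -> Dense 64 -> Dense 3; the initialisation scale is an assumption.
    sizes = [(channels, 128), (128, 64), (64, 3)]
    return [(rng.normal(0.0, 0.05, s), np.zeros(s[1])) for s in sizes]


def head_forward(features, params, training=False, rng=None):
    """features: (batch, H, W, C) output of a frozen backbone."""
    x = features.mean(axis=(1, 2))  # global average pooling -> (batch, C)
    for (w, b), rate in zip(params[:2], [0.10, 0.05]):
        x = elu(x @ w + b)
        if training:
            rng = rng or np.random.default_rng()
            keep = rng.random(x.shape) >= rate
            x = x * keep / (1.0 - rate)  # inverted dropout, active only in training
    w, b = params[2]
    return softmax(x @ w + b)  # probabilities for normal, pneumonia, COVID-19


rng = np.random.default_rng(0)
params = init_head(512, rng)
probs = head_forward(rng.normal(size=(4, 7, 7, 512)), params)
print(probs.shape, probs.sum(axis=1))
```

Dropout is skipped at inference time, so the same function serves both training and evaluation.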
It is important to note that in the first stage only the classification heads were trained, with the body
weights frozen. According to the results of the hyperparameter tuning procedure, the stochastic gradient
descent (SGD) optimizer with a learning rate of $10^{-4}$ proved to be optimal. Having trained several
state-of-the-art networks, we found that most of them diverged. For this reason, L2 regularization with λ of
0.001 was applied to all trained networks. All networks were trained with a batch size of 32. To
avoid overfitting during network training, we applied Early Stopping, monitoring the validation
loss with a patience of 10 epochs. For training the networks in both the first and second stages we used
the cross-entropy loss, calculated as follows:
$$L = -\sum_{i=1}^{C} y_i \log(p_i + \varepsilon),$$
where $C$ is the number of classes (3 in our study), $p_i$ is the predicted probability, $y_i$ is the ground-truth label
(ternary indicator), and $\varepsilon$ is a small positive constant.
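As a quick numerical illustration of the cross-entropy with a small stabilising constant, the sketch below evaluates the loss for one confident and one poor prediction. The function name `cross_entropy` and the value `eps=1e-7` are assumptions chosen for the example; the source does not state the constant used.

```python
import math


def cross_entropy(y_true, p_pred, eps=1e-7):
    """Cross-entropy for one sample: -sum_i y_i * log(p_i + eps)."""
    return -sum(y * math.log(p + eps) for y, p in zip(y_true, p_pred))


# Classes ordered as (normal, pneumonia, COVID-19); the sample is a COVID-19 case.
loss_good = cross_entropy([0, 0, 1], [0.05, 0.05, 0.90])
loss_bad = cross_entropy([0, 0, 1], [0.60, 0.30, 0.10])
print(round(loss_good, 3), round(loss_bad, 3))

# The constant keeps the loss finite even when the true class gets probability 0.
print(cross_entropy([1, 0, 0], [0.0, 0.5, 0.5]))
```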
For training and testing the networks during the first stage, the dataset was split in an 8:1:1 ratio, i.e. the
training subset includes 2122 images (80.7%), the validation subset 242 images (9.2%), and the testing
subset 267 images (10.1%). The split of the data into training, validation, and testing subsets was
performed according to the distribution shown in Table 3.
Table 3. Description of the data distribution within training, validation, and testing
Dataset    Diagnosis    Training        Validation    Testing
CCXRD      Normal       14              2             2
           Pneumonia    133             15            17
           COVID-19     407             46            51
ACCXRD     Normal       102             12            13
           Pneumonia    0               0             0
           COVID-19     46              6             6
FCCXRD     Normal       1               1             1
           Pneumonia    0               1             1
           COVID-19     27              4             4
CRD        Normal       0               0             0
           Pneumonia    0               0             0
           COVID-19     177             20            22
RSNAPDD    Normal       567             63            70
           Pneumonia    648             72            80
           COVID-19     0               0             0
Total                   2122 (80.7%)    242 (9.2%)    267 (10.1%)
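An 8:1:1 split that preserves the proportions within each dataset and diagnosis group, as in Table 3, can be sketched with the standard library alone. This is a hedged illustration: the `stratified_split` helper, the rounding rule, and the toy record counts are assumptions, so the resulting subset sizes will not reproduce Table 3 exactly.

```python
import random
from collections import defaultdict


def stratified_split(items, key, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split items into train/val/test so every group given by key keeps the ratios."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    train, val, test = [], [], []
    for members in groups.values():
        rng.shuffle(members)
        n_train = int(round(ratios[0] * len(members)))
        n_val = int(round(ratios[1] * len(members)))
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]  # the remainder goes to the testing subset
    return train, val, test


# Toy records: (dataset, diagnosis, image id); the counts are illustrative only.
items = [("RSNAPDD", "Normal", i) for i in range(700)]
items += [("CCXRD", "COVID-19", i) for i in range(500)]
train, val, test = stratified_split(items, key=lambda r: (r[0], r[1]))
print(len(train), len(val), len(test))
```

Grouping by the (dataset, diagnosis) pair rather than by diagnosis alone keeps each source represented in all three subsets whenever it has enough images.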
3.2. Stage III
Once the performance and accuracy metrics of all networks were estimated, the 4 networks that showed the
best results in the first stage were chosen for fine-tuning. Besides training both the bodies and heads of the
networks, we introduced a guided attention mechanism for the considered networks. We were inspired by
34, where the authors proposed a framework that provides guidance on the attention maps generated by a
weakly supervised deep neural network. The attention block in our pipeline is based on
U-net 35. As shown in Fig. 1, the proposed algorithm applies segmentation masks to the features of the
network body (feature extractor) using multiplication. Applying an attention block to the output feature
vector of the network’s backbone allows the networks to put more weight on the features that are more
relevant for distinguishing the classes. Additionally, during this stage we applied attention
maps obtained with the help of the Grad-CAM technique 36. Furthermore, the loss differs from the one used in
Stage I and Stage II and is calculated as follows:
$$L = L_{clas} + \alpha L_{attn},$$
where $L_{clas}$ is the cross-entropy loss, $L_{attn}$ is the attention loss, and $\alpha$ is the coefficient used to scale
the attention component within the total loss. $L_{attn}$ is calculated according to Eq. (5) in 34.
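The mask-by-feature multiplication and the combined loss can be illustrated with a small NumPy sketch. Several things here are assumptions rather than details from the source: the average-pooling used to bring the lung mask down to the feature-map grid, the helper names `resize_mask`, `apply_attention`, and `total_loss`, the weighted-sum form of the loss, and the value `alpha=0.5`. The attention loss itself is taken as a given number, since its definition lives in the cited work.

```python
import numpy as np


def resize_mask(mask, out_h, out_w):
    """Average-pool a binary lung mask (H, W) down to the feature-map grid."""
    h, w = mask.shape
    ys = np.linspace(0, h, out_h + 1).astype(int)
    xs = np.linspace(0, w, out_w + 1).astype(int)
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = mask[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
    return out


def apply_attention(features, mask):
    """Multiply backbone features (H, W, C) by the mask, broadcast over channels."""
    small = resize_mask(mask, features.shape[0], features.shape[1])
    return features * small[:, :, None]


def total_loss(l_clas, l_attn, alpha=0.5):
    # Assumed weighted sum of classification and attention losses.
    return l_clas + alpha * l_attn


mask = np.zeros((224, 224))
mask[40:200, 30:100] = 1.0    # toy "left lung" region
mask[40:200, 124:194] = 1.0   # toy "right lung" region
features = np.ones((7, 7, 512))
attended = apply_attention(features, mask)
print(attended.shape, attended[0, 0, 0], attended[3, 1, 0])
```

Feature cells that fall entirely outside the lungs are zeroed, cells inside keep their value, and cells on the boundary are scaled by the fraction of lung pixels they cover.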
To correctly apply U-net in the guided attention mechanism, we trained this network on the lung
segmentation task. The data used for training this network were taken from the V7 Labs repository 37.
The segmentation dataset contains 6500 AP/PA chest X-ray images with pixel-level polygonal
lung segmentations. Some examples of COVID-19 affected patients with segmented lung areas are
shown in Fig. 2.
3.3. Visual model validation
While modern neural networks enable superior performance, their lack of decomposability into intuitive
and understandable components makes them hard to interpret. Model transparency is therefore
useful for explaining their predictions. One of the techniques currently used for
model interpretation is the Class Activation Map (CAM) 38. Though CAM is a good technique for
demystifying the workings of CNNs, it suffers from some limitations. One of the drawbacks of CAM is that it
requires feature maps to directly precede the softmax layers, so it applies only to a particular kind of network
architecture that performs global average pooling over convolutional maps immediately before
prediction. Such architectures may achieve inferior accuracies compared to general networks on some
tasks or simply be inapplicable to new tasks. Deeper representations in a CNN capture
higher-level visual constructs. Furthermore, convolutional layers naturally retain spatial information, which is
lost in fully connected layers, so we expect the last convolutional layer to have the best tradeoff between
high-level semantics and detailed spatial information. For this reason, we decided to use another popular
technique known as Grad-CAM. This model interpretation technique, published in 36, aims to address the
shortcomings of CAM and claims to be compatible with any kind of architecture. The technique does not
require any modifications to the existing model architecture, which allows its application to any CNN-
based architecture. Unlike CAM, Grad-CAM uses the gradient information flowing into the last
convolutional layer of a CNN to understand the importance of each neuron for a decision of interest.
Grad-CAM improves on its predecessor and provides better localization and clear class-discriminative saliency maps.
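The core Grad-CAM computation is short enough to sketch directly. Given the activations of the last convolutional layer and the gradients of the target class score with respect to them, the channel weights are the spatially averaged gradients, and the heatmap is the ReLU of the weighted sum of activation maps. In the sketch below the inputs are random placeholders, and the final scaling to [0, 1] is a display convenience added for the example; in practice the gradients would come from a framework's automatic differentiation.

```python
import numpy as np


def grad_cam(activations, gradients):
    """activations, gradients: (H, W, K) arrays for the last convolutional layer.

    gradients holds d(score of the target class) / d(activations).
    """
    weights = gradients.mean(axis=(0, 1))                      # one weight per channel
    cam = np.tensordot(activations, weights, axes=([2], [0]))  # weighted sum -> (H, W)
    cam = np.maximum(cam, 0.0)                                 # ReLU keeps positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam                     # scale to [0, 1] for display


rng = np.random.default_rng(0)
acts = rng.random((7, 7, 512))          # placeholder activations
grads = rng.normal(size=(7, 7, 512))    # placeholder gradients
heatmap = grad_cam(acts, grads)
print(heatmap.shape, float(heatmap.min()), float(heatmap.max()))
```

The resulting 7x7 map is coarse; it is usually upsampled to the input resolution and overlaid on the X-ray for inspection.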
4. Results
4.1. Stage I
Having trained 10 neural networks, we found that 2 tend to diverge more than the others. This is likely
connected with the normalization layers. Networks such as MobileNet V2 and VGG-16 do not have
Batch/Instance/Layer/Group Normalization layers in their architecture. As a result, these networks start
diverging (MobileNet V2) or hit a validation loss/accuracy plateau (VGG-16) after approximately 100
epochs. Popular regularization techniques such as Lasso Regression (L1 regularization), Ridge
Regression (L2 regularization), ElasticNet (L1-L2 regularization), Dropout, and Early Stopping may help to
avoid this problem. We therefore applied Ridge Regression, Dropout layers, and Early Stopping in our
training pipeline. The remaining networks did not suffer from overfitting; however, they could
not reach better validation loss/accuracy values. When a given model reached its best validation loss, we
saved the associated model weights using a saving callback. Figure 3 demonstrates how the networks
were trained during the first stage. Blue asterisks reflect the best value of the accuracy on the validation
subsets.
Since the loss value is hard to interpret, we compared commonly used network metrics such as
accuracy and F1-score. Table 4 and Table 5 summarize these metrics estimated in the first stage. As
seen, MobileNet V2, EfficientNet B1, EfficientNet B3, and VGG-16 achieved better results than the other
networks.
Table 4. Performance metrics within different subsets obtained after the first stage
Model                 Accuracy                            F1-score
                      Training   Validation   Testing     Training   Validation   Testing
MobileNet V2          0.95       0.79         0.77        0.95       0.80         0.78
DenseNet-121          0.76       0.72         0.74        0.76       0.72         0.75
EfficientNet B0       0.95       0.79         0.70        0.95       0.80         0.70
EfficientNet B1       0.79       0.76         0.74        0.79       0.76         0.75
EfficientNet B3       0.77       0.75         0.71        0.78       0.75         0.72
EfficientNet B5       0.77       0.74         0.70        0.77       0.74         0.70
VGG-16                0.90       0.79         0.78        0.90       0.80         0.79
ResNet-50 V2          0.80       0.71         0.69        0.80       0.71         0.70
Inception V3          0.77       0.71         0.73        0.77       0.71         0.74
Inception ResNet V2   0.71       0.68         0.70        0.71       0.67         0.70
Table 5. Performance metrics within different classes obtained after the first stage
Model                 Accuracy                          F1-score
                      Normal   Pneumonia   COVID-19     Normal   Pneumonia   COVID-19
MobileNet V2          0.70     0.78        0.83         0.74     0.75        0.83
DenseNet-121          0.75     0.82        0.63         0.76     0.73        0.73
EfficientNet B0       0.74     0.69        0.66         0.71     0.66        0.72
EfficientNet B1       0.73     0.73        0.75         0.74     0.69        0.79
EfficientNet B3       0.70     0.72        0.72         0.70     0.69        0.75
EfficientNet B5       0.66     0.75        0.67         0.68     0.68        0.73
VGG-16                0.80     0.76        0.78         0.77     0.75        0.82
ResNet-50 V2          0.68     0.70        0.68         0.69     0.65        0.74
Inception V3          0.74     0.77        0.68         0.75     0.71        0.75
Inception ResNet V2   0.70     0.76        0.61         0.70     0.68        0.70
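For readers who want to reproduce metrics of this kind, overall accuracy and per-class F1-score can both be derived from a confusion matrix. The sketch below is a generic illustration with made-up counts; it does not use the study's predictions, and the source does not say how the per-class figures were averaged, so no averaging is applied here.

```python
def per_class_f1(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, actually other
        fn = sum(confusion[c]) - tp                       # actually c, predicted other
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def accuracy(confusion):
    total = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(len(confusion))) / total


# Made-up 3-class confusion matrix (rows: true Normal, Pneumonia, COVID-19).
cm = [[8, 1, 1],
      [2, 7, 1],
      [0, 1, 9]]
print(round(accuracy(cm), 3), [round(f, 3) for f in per_class_f1(cm)])
```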
4.2. Stage II
Based on the results of the first stage, MobileNet V2, EfficientNet B1, EfficientNet B3, and VGG-16
demonstrated their ability to distinguish COVID-19 and pneumonia on X-ray images much better than other
networks. For this reason, these networks were chosen for fine-tuning. Additionally, we compared how the
fine-tuned networks differ from the best networks of the first stage. The results of the models’ performance
are shown in Fig. 4, where blue asterisks reflect the best value of the accuracy on the validation subsets.
Having compared accuracy and F1-score values in the first (Table 4 and Table 5) and second stages
(Table 6 and Table 7), we can state that MobileNet V2 and VGG-16 have a larger boost in accuracy than
the EfficientNet models. Once the fine-tuning was performed, MobileNet V2 and VGG-16 got a +6% and +9%
accuracy change on the validation subset and a +1% and +4% accuracy change on the testing subset.
On the other hand, EfficientNet B1 and EfficientNet B3 got a +2% and +3% accuracy change on the validation
subset and a -1% and +6% accuracy change on the testing subset. It should also be noted that the
largest boost in the classification of COVID-19 was achieved by VGG-16. This network had an +11% boost,
while MobileNet V2, EfficientNet B1, and EfficientNet B3 reached +2%, 0%, and +6%, respectively.
Table 6. Performance metrics within different subsets obtained after the second stage
Model             Accuracy                            F1-score
                  Training   Validation   Testing     Training   Validation   Testing
MobileNet V2      1.00       0.85         0.78        1.00       0.85         0.79
EfficientNet B1   0.83       0.78         0.73        0.83       0.78         0.74
EfficientNet B3   0.83       0.78         0.77        0.83       0.78         0.77
VGG-16            1.00       0.87         0.82        1.00       0.87         0.83
Table 7. Performance metrics within different classes obtained after the second stage
Model             Accuracy                          F1-score
                  Normal   Pneumonia   COVID-19     Normal   Pneumonia   COVID-19
MobileNet V2      0.74     0.75        0.85         0.75     0.74        0.85
EfficientNet B1   0.70     0.74        0.75         0.72     0.71        0.78
EfficientNet B3   0.77     0.75        0.78         0.76     0.74        0.81
VGG-16            0.81     0.78        0.89         0.80     0.79        0.89
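The accuracy changes between the first and second stages can be recomputed from the tabulated values. The sketch below copies the validation and testing accuracies from Table 4 and Table 6 and reports the difference in percentage points; the `gains` helper is introduced for the example only. Because the tables are rounded to two decimals, a result may differ by one point from a figure quoted in the text, as happens for VGG-16 on the validation subset (+8 from the rounded values).

```python
# (validation, testing) accuracy copied from Table 4 (Stage I) and Table 6 (Stage II).
stage1 = {"MobileNet V2": (0.79, 0.77), "EfficientNet B1": (0.76, 0.74),
          "EfficientNet B3": (0.75, 0.71), "VGG-16": (0.79, 0.78)}
stage2 = {"MobileNet V2": (0.85, 0.78), "EfficientNet B1": (0.78, 0.73),
          "EfficientNet B3": (0.78, 0.77), "VGG-16": (0.87, 0.82)}


def gains(before, after):
    """Change in percentage points, as (validation, testing), for each model."""
    return {m: tuple(round(100 * (a - b)) for b, a in zip(before[m], after[m]))
            for m in before}


for model, (d_val, d_test) in gains(stage1, stage2).items():
    print(f"{model}: validation {d_val:+d}, testing {d_test:+d}")
```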
4.3. Stage III
Having trained the chosen networks according to the pipeline described in Sect. 3 and 3.2, we compared
them on the validation and testing subsets (Fig. 5 and Fig. 6). Based on the obtained results, we
established that the proposed pipeline boosts model accuracy. VGG-16 and MobileNet
V2 showed the best accuracy on the validation and testing subsets. It is worth noting that the VGG-16
network outperformed the best CXR model (CXR-4A) on these subsets. The performance of the other CXR
models is shown in Appendix A. The VGG-16 (S3) network trained with the
proposed pipeline has a +9% and +1% accuracy boost on the validation subset compared to VGG-16
(S1) and VGG-16 (S2), respectively. Similar positive dynamics from using our pipeline are observed for the other
models as well. It should be noted that CXR-4A and the lightweight MobileNet V2 have almost the same
accuracy, while the complexity of the latter is 15.5 times lower: the MobileNet V2 network includes 2.6
million weights, while CXR-4A includes 40.2 million.
4.4. Model validation using Grad-CAM
As we mentioned in Sect.3.3, despite deep learning models having facilitated unprecedented accuracy in
image classication, one of their biggest problems is model interpretability representing a core
component in understanding and debugging of a model. We used the Grad-CAM technique to validate the
models and their correct/incorrect ability for making predictions, and to verify which series of neurons
activated in the forward-pass during the prediction. For the sake of visualization, we choose 3 patients
Page 13/16
with different ndings: normal, pneumonia, and COVID-19. Source images of these ndings with their
ground truth (GT) heatmaps are shown in Fig.7 and Fig.8.
Using Grad-CAM, we validated where our 4 best networks (MobileNet V2, EfficientNet B1, EfficientNet B3,
VGG-16) are looking, verifying that they attend to the correct patterns in the image and
activate around those patterns. The Grad-CAM technique uses the gradients flowing into the final
convolutional layer to produce a coarse localization heatmap highlighting the regions in the
image that are important for predicting the target concept, i.e. COVID-19 or pneumonia areas. However, the localization
heatmaps may differ from traditional localization outputs such as segmentation masks or
bounding boxes. For this reason, these heatmaps are used only for approximate localization.
To interpret the models, Fig. 7 and Fig. 8 show the visualization of gradient class activation
maps. Additional cases of the networks’ heatmaps are shown in Appendix B and Appendix C. Based on
the obtained results, we may state that training the models using masks (Stage III) has a positive
effect on the search for the correct patterns by the models. Networks such as MobileNet V2 (Fig. 7c and
Fig. 8c) and VGG-16 (Fig. 7f and Fig. 8f) identify affected areas correctly, despite inaccuracies in the
location of the heatmaps. On the other hand, interpretation of the EfficientNet networks showed that they
are not activating around the proper patterns in the image. This allows us to assume that EfficientNet B1
and EfficientNet B3 have not properly learned the underlying patterns in our dataset and/or that we may need
to collect additional data.
5. Conclusion
In this study, we demonstrated a training pipeline based on directed guidance for neural networks. This
guidance forces the neural networks to pay attention to the areas obtained by the external network.
Having trained a set of deep learning models, we found that the proposed pipeline allows for increased
classification accuracy. This pipeline was used for the detection of COVID-19 and for distinguishing its
presence from that of pneumonia. Of the obtained results, MobileNet V2 performed comparably to the
tailor-made CXR model CXR-4A, despite being 15 times less complex. According to the performed
experiments, the networks trained with the proposed pipeline perform comparably to practicing
radiologists when it comes to the classification of multiple thoracic pathologies in chest X-ray
radiographs. Our pipeline may have the potential to improve healthcare delivery and increase access to
chest radiograph expertise for the detection of a variety of acute diseases.
The study was supported in part by the Ministry of Science and Higher Education, project No. FFSWW-
2020-0014 “Development of the technology for robotic multiparametric tomography based on big data
processing and machine learning methods for studying promising composite materials”.
Author Contributions
Y.G., S.S., and O.T. conceived the idea of the study. V.D., Y.G., and O.T. developed the plan of execution. V.D.
and O.T. collected and annotated the data. V.D. and D.N. developed, trained, and analyzed the
performance of deep learning networks on the collected data. A.P. tested the performance of purpose-built
deep learning networks (COVID-Nets) on the collected data. V.D. and A.P. wrote the manuscript with input
from all the co-authors. A.K., A.K., V.K., V.S., and D.L. assisted in study direction and data quality
discussions. Y.G. supervised the project.
Competing interests
The authors have no competing interests as defined by Nature Research, or other interests that might be
perceived to influence the results and/or discussion reported in this paper.
Additional information
Correspondence and requests for materials should be addressed to V.D. and Y.G.
Reprints and permissions information is available at
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.
References
1. COVID-19 Virus Pandemic - Worldometer.
2. Wang, W. et al. Detection of SARS-CoV-2 in Different Types of Clinical Specimens. JAMA - Journal of the American Medical Association. 323, 1843–1844 (2020).
3. Wikramaratna, P., Paton, R., Ghafari, M. & Lourenco, J. Estimating false-negative detection rate of 2020.04.05.20053355 (2020) doi:10.1101/2020.04.05.20053355.
4. Yang, Y. et al. Evaluating the accuracy of different respiratory specimens in the laboratory diagnosis and monitoring the viral shedding of 2019-nCoV infections. 2020.02.11.20021493 (2020)
5. Fang, Y. et al. Sensitivity of Chest CT for COVID-19: Comparison to RT-PCR. 296, E115–E117 (2020).
6. Guan, W. et al. Clinical Characteristics of Coronavirus Disease 2019 in China. N. Engl. J. Med. 1708–1720 (2020).
7. Huang, C. et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. 395, 497–506 (2020).
8. Ng, M. Y.
et al.
Imaging Prole of the COVID-19 Infection: Radiologic Findings and Literature Review.
Radiol. Cardiothorac. Imaging.
2, e200034 (2020).
9. Kanne, J. P., Little, B. P., Chung, J. H., Elicker, B. M. & Ketai, L. H. Essentials for radiologists on COVID-
19: An update-radiology scientic expert panel. Radiology vol.296 E113–E114(2020).
10. Ai, T.
et al.
Correlation of Chest CT and RT-PCR Testing for Coronavirus Disease 2019 (COVID-19) in
China: A Report of 1014 Cases.
296, E32–E40 (2020).
11. Rubin, G. D.
et al.
The Role of Chest Imaging in Patient Management During the COVID-19 Pandemic:
A Multinational Consensus Statement From the Fleischner Society.
158, 106–116 (2020).
12. Wang, L., Lin, Z. Q. & Wong, A. COVID-Net: a tailored deep convolutional neural network design for
detection of COVID-19 cases from chest X-ray images.
Sci. Rep.
10, 19549 (2020).
13. Mahmud, T., Rahman, M. A., Fattah, S. A. & CovXNet: A multi-dilation convolutional neural network
for automatic COVID-19 and other pneumonia detection from chest X-ray images with transferable
multi-receptive feature optimization.
Comput. Biol. Med.
122, 103869 (2020).
14. Farooq, M., Hafeez, A. & COVID-ResNet: A Deep Learning Framework for Screening of COVID19 from
Radiographs. arXiv(2020).
15. Minaee, S., Kafieh, R., Sonka, M., Yazdani, S. & Jamalipour Soufi, G. Deep-COVID: Predicting COVID-19 from chest X-ray images using deep transfer learning. Med. Image Anal. 65, 101794 (2020).
16. Baltruschat, I. M., Nickisch, H., Grass, M., Knopp, T. & Saalbach, A. Comparison of Deep Learning Approaches for Multi-Label Chest X-Ray Classification. Sci. Rep. 9, 6381 (2019).
17. Cohen, J. P. et al. COVID-19 Image Data Collection: Prospective Predictions Are the Future.
18. Cohen, J. P., Morrison, P. & Dao, L. COVID-19 Image Data Collection. arXiv (2020).
19. Wang, L. et al. Actualmed COVID-19 Chest X-ray Dataset Initiative. (2020).
20. Wang, L. et al. Figure 1 COVID-19 Chest X-ray Dataset Initiative. (2020).
21. COVID-19 Radiography Database | Kaggle.
22. Chowdhury, M. E. H. et al. Can AI Help in Screening Viral and COVID-19 Pneumonia? IEEE Access 132665–132676 (2020).
23. RSNA Pneumonia Detection Challenge | Kaggle.
24. Litmanovich, D. E., Chung, M., Kirkbride, R. R., Kicska, G. & Kanne, J. P. Review of Chest Radiograph Findings of COVID-19 Pneumonia and Suggested Reporting Language. J. Thorac. Imaging 35, 354–360 (2020).
25. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L. C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 4510–4520 (2018).
26. Huang, G., Liu, Z., Van Der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. in Proceedings – 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR vols 2017-Janua 2261–2269 (Institute of Electrical and Electronics Engineers Inc., 2017).
27. Tan, M. & Le, Q. V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. 36th Int. Conf. Mach. Learn. ICML 2019 2019-June, 10691–10700 (2019).
28. Liu, S. & Deng, W. Very deep convolutional neural network based image classification using small training sample size. in Proceedings – 3rd IAPR Asian Conference on Pattern Recognition, ACPR 2015 730–734 (Institute of Electrical and Electronics Engineers Inc., 2016).
29. He, K., Zhang, X., Ren, S. & Sun, J. Identity Mappings in Deep Residual Networks. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) 9908 LNCS, 630–645.
30. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. & Wojna, Z. Rethinking the Inception Architecture for Computer Vision. in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition vols 2016-Decem 2818–2826 (IEEE Computer Society, 2016).
31. Szegedy, C., Ioffe, S., Vanhoucke, V. & Alemi, A. A. Inception-v4, inception-ResNet and the impact of residual connections on learning. in 31st AAAI Conference on Artificial Intelligence, AAAI 2017 4284 (AAAI press, 2017).
32. Wang, L. et al. COVID-Net: COVID-Net Open Source Initiative.
33. Amazon SageMaker – Machine Learning – Amazon Web Services.
34. Li, K., Wu, Z., Peng, K. C., Ernst, J. & Fu, Y. Tell Me Where to Look: Guided Attention Inference Network. in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 9215–9223 (IEEE Computer Society, 2018). doi:10.1109/CVPR.2018.00960.
35. Ronneberger, O., Fischer, P. & Brox, T. U-net: Convolutional networks for biomedical image segmentation. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) vol. 9351 234–241 (Springer Verlag, 2015).
36. Selvaraju, R. R. et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. in Proceedings of the IEEE International Conference on Computer Vision vols 2017-Octob 618–626 (Institute of Electrical and Electronics Engineers Inc., 2017).
37. COVID-19 X-ray dataset.
38. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A. & Torralba, A. Learning Deep Features for Discriminative Localization. in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition vols 2016-Decem 2921–2929 (IEEE Computer Society, 2016).