PrepNet: A Convolutional Auto-Encoder to
Homogenize CT Scans for Cross-Dataset Medical
Image Analysis
Mohammadreza Amirian∗† , Javier A. Montoya-Zegarra, Jonathan Gruss, Yves D. Stebler,
Ahmet Selman Bozkir, Marco Calandri, Friedhelm Schwenker† and Thilo Stadelmann∗§
∗ZHAW School of Engineering, 8400 Winterthur, Switzerland
†Ulm University, Institute of Neural Information Processing, 89081 Ulm, Germany
‡University of Turin, Department of Oncology, 10124 Turin, Italy
§Fellow, ECLT European Centre for Living Technology, 30123 Venice, Italy
Abstract—With the spread of COVID-19 over the world, the
need arose for fast and precise automatic triage mechanisms
to decelerate the spread of the disease by reducing human
efforts e.g. for image-based diagnosis. Although the literature
has shown promising efforts in this direction, reported results
do not consider the variability of CT scans acquired under
varying circumstances, thus rendering resulting models unfit for
use on data acquired using e.g. different scanner technologies.
While COVID-19 diagnosis can now be done efficiently using
PCR tests, this use case exemplifies the need for a methodology
to overcome data variability issues in order to make medical
image analysis models more widely applicable. In this paper,
we explicitly address the variability issue using the example of
COVID-19 diagnosis and propose a novel generative approach
that aims at erasing the differences induced by e.g. the imaging
technology while simultaneously introducing minimal changes
to the CT scans through leveraging the idea of deep auto-
encoders. The proposed prepossessing architecture (PrepNet) (i)
is jointly trained on multiple CT scan datasets and (ii) is capable
of extracting improved discriminative features for improved
diagnosis. Experimental results on three public datasets (SARS-
COVID-2, UCSD COVID-CT, MosMed) show that our model
improves cross-dataset generalization by up to 11.84 percentage
points despite a minor drop in within dataset performance.
Index Terms—Adaptive preprocessing, domain adaptation, auto-encoder

I. INTRODUCTION
A major challenge in rolling out machine learned models
to a broad user base is the variability of data encountered in
the real world. Models can only be expected to work well on
data of similar distribution as has been used for training, but
ubiquitously, differences e.g. in image acquisition setup hinder
the applicability of a once developed model in novel settings.
A recent example of the negative effects of such a failure to
adapt between differing domains was given at the start of
the COVID-19 pandemic: as of 2nd February 2021, this disease
had caused over 100 million infections worldwide and over
2 million deaths according to the World Health Organisation
(WHO) [1]. To
alleviate this, rapid diagnosis of COVID-19 cases has proven
to be effective for decelerating the spread of the disease [2].
According to [2], [3], reverse transcriptase quantitative
polymerase chain reaction (RT-qPCR) tests are accepted as the
gold standard for the identification of positive cases. However,
this type of test was not available in sufficient numbers at
the beginning of the pandemic, is time-consuming and relies
on both human effort and expert knowledge. Thus, there
arose a need for automatic diagnostic methods that can assist
experts and reduce human efforts by targeting the automatic
identification of COVID-19 positive cases. The literature has
shown promising efforts in the automatic identification of
COVID-19 cases from lung computed tomography (CT) scans
using computer vision methods [4], [5], [6], [7]. Furthermore,
Lessmann et al. addressed cross-vendor analysis (between
different CT scanners such as Varian, Siemens, GE Health-
care, Philips and Canon) for 3D CT scans successfully [8].
However, it has been demonstrated that a considerable drop in
cross-dataset performance appears for the diagnosis of 2D CT
scans acquired via different devices. Thus, this cross-dataset
variability has the potential to discourage the community
from merging and annotating data from multiple sources.
As a result, combining datasets is a challenge posed not
only for COVID detection but also for other applications
in diagnosis and segmentation. In this paper, we address
domain adaptation of medical image analysis methods by
proposing a deep convolutional neural network (CNN) for
preprocessing 2D CT scans: it is trained to fool a classifier
that discriminates between various CT datasets, thus aiming
to remove the cross-dataset variability. We evaluate the
performance of the suggested method on the exemplary use
case of predicting COVID-19 positive cases, due to the global
variability in the respective datasets and the many available
opportunities for comparison. The methodology is inspired by
generative adversarial learning [9], [10]. Our contribution is
twofold: (i) we propose a novel trainable preprocessing CNN
architecture with a dual training objective that is capable of
equalizing the variability of different CT-scanner technologies
in the image domain as a pre-processor (PrepNet); (ii) we
validate this model by showing the transferability of its diag-
nostic capabilities between different CT data sources based on
common public benchmarks. We conduct experiments on the
SARS-CoV-2 CT-scan dataset [11] and the UCSD COVID-CT
dataset [12] as well as MosMed dataset [13]. Our results show
that our PrepNet model improves the cross-dataset COVID-19
diagnosis performance (i.e., training on one dataset, testing on
another) by 11.84 percentage points (pp) through creating a
unified representation of multi-dataset CT scans.
II. RELATED WORK

With the emergence of COVID-19, many studies and
datasets have been proposed in the literature, showing an
increase in data diversity over time and a growing range of
computer vision methods to deal with it [14], [15]. Horry
et al. [2] utilize a transfer learning scheme to build various
COVID-19 classifiers based on several off-the-shelf CNN models
such as VGG16/19 [16], Resnet50 [17], InceptionV3 [18],
Xception [19], and InceptionResnet [20]. They compared the
generalization capability across various image sources such as
X-ray, CT and ultrasound images and developed a pre-processing
scheme for X-ray images to reduce noise in non-lung areas in
order to decrease the effect of quality imbalance among the
employed images. A VGG19 [16] coupled with ultrasound
images is found to yield the best validation accuracy of 99%,
while 84% was achieved using CT scans [21].
He et al. [21] propose a sample-efficient learning con-
cept called “Self-Trans” via synergetically combining transfer
learning and contrastive self-supervised learning. They seek
intrinsic visual patterns in CT scans without relying on labels
created with human effort. Besides, they open-sourced their
CT dataset involving 349 COVID-19 positive patients and
397 COVID-19 negatives [12]. They achieve an accuracy of
86% through unbiased feature representations together with a
reduction of overfitting.
Mobiny et al. [22] propose the DECAPS approach with the
following contributions: (i) inverted dynamic routing [23] to
avoid seeking visual features from non-related regions, (ii)
training with a two-stage patch crop and drop strategy to
encourage the network to focus on the useful areas, (iii)
employing conditional generative adversarial networks for data
augmentation. Experiments result in 84.3% precision and 91.5%
recall along with 87.6% accuracy. They additionally report
results for the conventional deep classifiers DenseNet121
[24] and Resnet50 [17], yielding 82.5% and 80.8% accuracy,
respectively. In contrast to this study, Pham [25] points out
the negative impact of data augmentation in the context
of CT-based COVID-19 image classification. In his study,
the author fine-tunes various well-known pre-trained CNN
models ranging from AlexNet [26] to NasNet-Large [27].
Experiments conducted on the already introduced CT dataset
[12] credit a DenseNet-201 with the best accuracy of 96.2%.
However, data augmentation using random vertical/horizontal
flips (p=0.5), vertical/horizontal translation (±30 pixels) and
scaling (±10%) yields a 6% accuracy drop on average.
Chaganti et al. [28] suggest a deep-reinforcement-learning-
based scheme focusing on seeking doubtful lung areas on
CT scans to localize abnormal portions. In a recent study
[15], a novel architecture called “COVID-Net-CT-2” is proposed,
which utilizes machine-driven design exploration based on
iterative constrained optimization [29]. The authors point
out that one of the subtle problems of earlier studies is the
limited number of patients and poor diversity of CT scans in
terms of multi-nationality. Therefore, they introduce two
large-scale COVID-19 CT datasets called “COVIDx CT-2A”
and “COVIDx CT-2B”, gathered from 4,501 patients from at
least 15 countries and comprising 194,922 and 201,103
images respectively. Experiments show that the architecture
achieves a sensitivity of 99.0% and an accuracy of 98.1%,
which competes with radiologist-level decision-making capa-
bility. The study deals with variability in the patients’ ethnicity,
while CT scans generated by various vendors’ devices exhibit
visual differences, artifacts, and variable intensities that have
not been addressed so far. Thus, independently of the reported
success of some deep learning architectures, a drop in prediction
accuracy is likely during inference when a test image is taken
with a different device than was used for training. Motivated
by this issue, we propose to employ
a pre-processing network (PrepNet) to standardize CT images
with respect to the visual differences among datasets prior to
training of any final diagnosis model, relying on generative
architectures since they showed very promising results for
similar tasks [22]. An advantage of this approach is that the
PrepNet can be combined with any downstream diagnosis
model, thus leveraging future progress there without additional
costs while improving cross-dataset performance.
Two research papers closely related to the goal of do-
main adaptation in this study are presented by Lessmann et
al., addressing cross-vendor diagnosis [8], and Amyar et al.,
using auto-encoders in multi-task learning [30]. Neverthe-
less, Lessmann et al. did not face a considerable cross-
vendor performance drop because they used a richer source
of information (3D scans), as explained in [31]. Amyar et
al. leveraged multi-task learning and trained an auto-encoder
besides a segmentation and classification model for COVID-
19 diagnosis. However, they did not aim at removing the
cross-dataset variability of the scans. This study focuses on
homogenizing 2D CT scans by reducing cross-dataset
variability.

III. METHOD
In this section, we give details of our PrepNet model
in terms of network architecture, core modules, and loss
functions. The architecture of our proposed model is presented
in Figure 1. For a group of $N$ input CT scans $\{X_n\}_{n=1}^{N}$
coming from different CT vendors’ devices, our model ex-
tracts multi-scale discriminative feature maps through an auto-
encoder and reconstructs the original CT scans $\{\hat{X}_n\}_{n=1}^{N}$. The
reconstructed CT scans are next fed into a dataset/technology
classification branch. The dataset classifier branch acts as a
pseudo-label classifier and is responsible for discriminating
among different CT datasets.

Fig. 1. The architecture of our proposed PrepNet model consists of three main modules: (i) an auto-encoder that acts as a CT cross-dataset homogenizer; (ii)
a multi CT-technology classifier; and (iii) a COVID-19 binary classifier. The auto-encoder and the multi CT-technology classifier are trained adversarially.
The binary COVID-19 classifier is independently trained using the auto-encoder’s output.

Once this model is trained end-
to-end in an adversarial way, the reconstructed CT scans are
fed into a COVID-19 classifier which is trained directly on the
reconstructed CT-scans. The COVID-19 classification branch
is responsible for the classification of healthy vs. non-healthy
patients. The complete network model with its main modules
are described next in more detail.
A. Model Architecture
Auto-Encoder Module: We feed a CT scan image $X_n$ into our
auto-encoder ($E_a$ and $D_a$) and obtain a reconstructed version
$\hat{X}_n$ given by $\hat{X}_n = D_a(E_a(X_n))$. The encoder $E_a$ is based on
the standard classification network VGG-Net [16], whilst the
decoder $D_a$ is a convolutional network with the same number
of layers as the encoder. We add skip-connections from $E_a$ to
$D_a$ to recover the spatial information lost during the down-
sampling operations.
Dataset Classifier Module: The CT dataset classifier $E_t$
receives as input the reconstructed CT scan $\hat{X}_n$ from the
auto-encoder and feeds it into an encoder branch $E_t(\hat{X}_n)$ that
classifies the CT dataset/technology. In our experiments, $E_t$
relies on the VGG-Net architecture as well.
COVID-19 Classifier Module: The COVID-19 classifier $E_c$
is evaluated with several backbone architectures. Given a recon-
structed CT scan $\hat{X}_n$, it outputs COVID vs. non-COVID
predictions, i.e. $E_c(\hat{X}_n)$.
B. Loss Functions and Evaluation Metric
The complete loss function of PrepNet is based on the vari-
ous terms presented in Figure 1. It comprises a reconstruction
loss $L_{rec}$ and two classification losses $L_{pseu}$ and $L_{covid}$:

$$L_{total} = L_{rec} + L_{pseu} + L_{covid} \quad (1)$$

Given the labeled dataset $D=\{(X_n, y_n, p_n)\}_{n=1}^{N}$ of
the CT scans $X_n$ together with their binary COVID label
$y_n$ and the CT-dataset pseudo label $p_n$, the auto-encoder
reconstruction loss is given by $L_{rec} = \sum_n \|X_n - \hat{X}_n\|^2$;
the COVID-19 binary classification loss is the binary cross-entropy
$L_{covid} = -\sum_n \left[ y_n \log \hat{y}_n + (1 - y_n) \log(1 - \hat{y}_n) \right]$;
the CT-dataset pseudo-label loss is the cross-entropy
$L_{pseu} = -\sum_n p_n \log \hat{p}_n$.
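For concreteness, the three loss terms can be sketched for a single sample in plain Python (a minimal sketch with names of our own choosing, not the authors' code; in practice these losses operate on image tensors and mini-batches):

```python
import math

def total_loss(x, x_hat, y, y_hat, p_onehot, p_hat):
    """L_total = L_rec + L_covid + L_pseu for one sample.

    x/x_hat: original and reconstructed pixels; y/y_hat: COVID label and
    predicted probability (assumes 0 < y_hat < 1); p_onehot/p_hat: one-hot
    dataset pseudo label and predicted dataset probabilities.
    """
    l_rec = sum((a - b) ** 2 for a, b in zip(x, x_hat))               # squared reconstruction error
    l_covid = -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))  # binary cross-entropy
    l_pseu = -sum(p * math.log(q) for p, q in zip(p_onehot, p_hat))   # cross-entropy over datasets
    return l_rec + l_covid + l_pseu
```

During adversarial training the auto-encoder is updated to make the pseudo-label term hard for the dataset classifier, while the reconstruction term keeps its output close to the input scan.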
To measure the COVID-19 detection performance and to
minimize the effect of class imbalance in the datasets, we use
the balanced accuracy metric (BA) [32]:

$$BA = \frac{1}{2}\left(\frac{TP}{P} + \frac{TN}{N}\right) \quad (2)$$

where $P$ and $N$ are the number of positive and negative
samples, respectively, and $TP$ and $TN$ denote the number
of true positive and true negative predictions, respectively. In
addition, we also use specificity, sensitivity, and area under
the curve (AUC) to evaluate the COVID-19 performance results.
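The balanced-accuracy computation can be illustrated with a small plain-Python helper (our own sketch, assuming binary labels with 1 = COVID-positive):

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (TP/P) and specificity (TN/N) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)   # P: number of positive samples
    neg = len(y_true) - pos                  # N: number of negative samples
    return 0.5 * (tp / pos + tn / neg)
```

Unlike plain accuracy, this weighs both classes equally, so a model that always predicts the majority class cannot score above 0.5.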
IV. EXPERIMENTS

A. Datasets
We use three public datasets to validate our approach exper-
imentally. The SARS-CoV-2 CT-scan dataset [11] comprises a
total of 4,173 CT images of real patients from the Public
Hospital of the Government Employees of São Paulo (HSPM)
and the Metropolitan Hospital of Lapa, both in São Paulo,
Brazil (2,168 positive/infected and 768 healthy patients).
Moreover, 1,247 CT scans belong to patients who have other
pulmonary diseases. The CT image annotations (positive vs.
negative) have been done by three different clinicians. Note
that during our visual inspection we found two erroneous
images (i.e. unrelated to the problem domain) and excluded
them from the dataset. In addition, we also excluded the 1,247
pulmonary diseased patients.
The UCSD COVID-CT dataset [12] has been collected
in the Tongji Hospital in Wuhan, China during the out-
break of COVID-19 between the months of January/2020 and
April/2020. This dataset contains 349 CT images from infected
patients and 397 from non-infected patients. All images have
been annotated by a senior radiologist of the same hospital.
As reported by [22], heights of the images in this dataset
range between 153 and 1,853 pixels with an average of
491 pixels, whereas the widths vary between 124 and 1,458
pixels (average of 383 pixels). For partitioning, we follow
the splitting guideline provided by the authors of the dataset.
Table I summarizes the train, validation and test splits for each
dataset.
The MosMed dataset [13] was collected by the Moscow
Health Care Department from different municipal hospitals
in Russia between March/2020 and April/2020. The dataset
contains axial CT images from 1,110 patients with different
levels of COVID-19 severity, ranging from mild to critical
cases and also healthy patients. Some image samples of each
dataset are provided in Figure 2.
Fig. 2. COVID-19 positive and negative samples for each used dataset. Note
the variabilities in terms of texture, size, and shape across datasets.
B. Implementation Details
We run all our experiments using the publicly available
PyTorch 1.5.0 library and an NVIDIA V100 GPU (32 GB of
VRAM). During network training, each image is first resized
based on the input size of the classifiers’ backbones; we use
histogram equalization as a fixed preprocessing step, then
normalize with the mean and standard deviation of the ImageNet-
pretrained models. We train PrepNet using the AdamW optimizer [33].
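The fixed preprocessing step can be illustrated with a minimal plain-Python histogram equalization over 8-bit intensities (our own sketch, not the authors' code; the actual pipeline operates on resized images and is followed by the ImageNet mean/std normalization):

```python
def equalize_histogram(pixels, levels=256):
    """Standard histogram equalization: map each intensity v through the
    normalized cumulative distribution, (cdf(v) - cdf_min) / (n - cdf_min)."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)   # first non-zero CDF value
    n = len(pixels)
    if n == cdf_min:                          # constant image: nothing to spread
        return list(pixels)
    return [round((cdf[v] - cdf_min) / (n - cdf_min) * (levels - 1)) for v in pixels]
```

Stretching each scan's intensity distribution this way reduces brightness and contrast differences between scanners before the images reach the network.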
We perform a 24-hour hyperparameter search with six parallel
runs, using a Bayesian search strategy with Hyperband
early stopping on one GPU [34]. The hyperparameter search
reduces the risk of reporting a local minimum and helps present
near-optimal results for every configuration. The best model is
selected based on the best validation performance. During
training, we first train the auto-encoder for 20 epochs and
warm up the dataset classification branch for 2 epochs before
we start the adversarial training. Once the adversarial
training is finished, we train the COVID classification branch
independently from the other two branches, using the output
of the auto-encoder/PrepNet.
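The training schedule described above can be summarized in a small sketch (plain Python; the step functions are placeholders for the actual optimization steps, and the number of adversarial epochs is an assumption, as it is determined during the hyperparameter search):

```python
def training_schedule(autoencoder_step, dataset_classifier_step,
                      adversarial_step, covid_classifier_step,
                      ae_epochs=20, warmup_epochs=2, adv_epochs=10):
    """Phase ordering of PrepNet training; epoch counts for the first two
    phases follow the paper, adv_epochs is a placeholder."""
    log = []
    for _ in range(ae_epochs):           # 1) self-supervised auto-encoder pre-training
        autoencoder_step()
        log.append("ae")
    for _ in range(warmup_epochs):       # 2) warm up the dataset classification branch
        dataset_classifier_step()
        log.append("warmup")
    for _ in range(adv_epochs):          # 3) adversarial training: AE tries to fool the classifier
        adversarial_step()
        log.append("adv")
    covid_classifier_step()              # 4) COVID classifier trained on reconstructed scans
    log.append("covid")
    return log
```

Keeping the COVID classifier out of the adversarial phase ensures the homogenizer is trained only to remove dataset-specific cues, not to optimize diagnosis directly.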
C. Experimental Results
The within- and cross-dataset performance of the proposed
preprocessing schemes is presented in Table II. In order to
observe possible overfitting, we report the held-out test set
performance on each dataset. The cross-dataset performance
is evaluated by measuring the balanced accuracy (minimizing
the effect of class imbalance) of the models trained on one
dataset and tested on the other. We report results using the
balanced accuracy of the models trained on the SARS-COV-2
and UCSD COVID-CT datasets. Further metrics also include
sensitivity (Sens), specificity (Spec) and area under the curve
(AUC). In the rows, we present the datasets used during
training. Furthermore, we group the results by model. The first
group of results relates to the COVID classifier (VGG-19
pre-trained model) that is trained and evaluated on the original
CT scans. The second group of results relates to the auto-
encoder alone, trained on both datasets in a self-supervised
manner to minimize the reconstruction loss. The third group
of results relates to the full PrepNet preprocessing applied
before training the classifiers.
The results in Table II show that the average cross-dataset
performance (over all dataset splits) of models trained on
original data increases by 6.77pp after using the pure auto-
encoder model, and by 11.84pp through PrepNet. However,
the average test accuracy for within-dataset evaluation declines
by 0.32pp and 1.83pp after applying the baseline auto-encoder
or PrepNet, respectively. A discussion regarding this effect is
presented in the next section.
In our experiments, we use VGG19 [16] as the baseline
model because it is more straightforward to train and has
shown good generalization properties on 2D medical images
in previous practical experiments. Besides that, the
VGG architecture has also been successfully applied to
COVID-19 identification [2], [21].
As part of our ablation study, we also evaluated how
different backbones affect the COVID-19 diagnosis accuracy
of PrepNet. More precisely, we replicate the experiments
for each dataset (SARS-COV-2 and UCSD COVID-CT) and
evaluate different CNN architectures as part of our COVID-
classifier Module (See Section III-A for more information).
The CNN architectures include ResNet18 [17], Inception [35],
and EfficientNet-B0 [36]. We report results in Table III.
Experimental results show that for almost all backbones, the
average cross-dataset performance increases at the cost of
a small decrease in within-dataset accuracy.
Finally, in order to evaluate the generalisation capabil-
ities of PrepNet and our baselines, we evaluate how our
trained models perform on an unseen dataset, i.e. the MosMed
dataset [13]. The results in Table IV show the improvements
of our AutoEncoder and PrepNet models in terms of BA and
sensitivity, albeit with a decrease in specificity and AUC
when compared with the COVID-19 classifier. Despite the
decrease in specificity, we argue that especially for medical
diagnosis and screening, a low specificity is less harmful
than a reduction in sensitivity, as false positive cases can be
discarded by additional examinations. In contrast, a high
sensitivity is important as false negatives should be rare.
TABLE I
DATASET PORTIONS

Dataset            | Type  | Size    | Country | Train       | Validation | Test
SARS-COV-2 [11]    | 2D CT | Various | Brazil  | 2,046 (70%) | 439 (15%)  | 439 (15%)
UCSD COVID-CT [12] | 2D CT | Various | China   | 423 (57%)   | 116 (16%)  | 201 (27%)
MosMed [13]        | 3D CT | Various | Russia  | 1,100 images used as unseen test dataset

TABLE II
WITHIN- AND CROSS-DATASET PERFORMANCE

                  | Test: SARS-COV-2            | Test: UCSD COVID-CT         | Within-test | Cross-dataset | Pre-trained
Training dataset  | BA     Sens   Spec   AUC    | BA     Sens   Spec   AUC    | average     | average       | encoder
COVID classifier
SARS-COV-2        | 0.8924 0.9292 0.7876 0.8584 | 0.4433 0.7835 0.1262 0.4548 | 0.8587      | 0.4159        | Yes
UCSD COVID-CT     | 0.3295 0.3476 0.2743 0.3110 | 0.8250 0.7113 0.9320 0.8216 | (baseline)  | (baseline)    |
Auto-encoder
SARS-COV-2        | 0.8956 0.9907 0.6460 0.8183 | 0.4983 0.9175 0.0970 0.5073 | 0.8555      | 0.4836        | Yes
UCSD COVID-CT     | 0.49405 0.6030 0.3008 0.4519 | 0.8154 0.7216 0.8846 0.8031 | (−0.32 pp) | (+6.77 pp)    |
PrepNet
SARS-COV-2        | 0.9007 0.9353 0.7982 0.8668 | 0.5157 0.9175 0.1067 0.5121 | 0.8404      | 0.5343        | Yes
UCSD COVID-CT     | 0.5545 0.6446 0.1858 0.4852 | 0.7800 0.8556 0.7087 0.7822 | (−1.83 pp)  | (+11.84 pp)   |

D. Discussion

The baseline and proposed pre-processing approaches in-
troduce performance drops when applied before within-dataset
classification. These approaches usually reduce the test accura-
cies when trained and evaluated on the same dataset using the
corresponding dataset splits. Therefore, we further investigate
the intermediate results of the baseline auto-encoder and
PrepNet on a case-by-case basis. Severe cases of generated
artifacts through reconstruction via the baseline auto-encoder
and the PrepNet are presented in Figure 3. We conjecture
that the drop in within-dataset test performance is caused by
occasional artifacts such as these. These quality drops can
be clearly seen in the reconstruction loss. However, it is not
straightforward to correct them. We could potentially overcome
this by investigating different data-augmentation strategies
and by improving the network architecture of our auto-encoder.
Additionally, in Figure 4 we depict sample images for which the
models failed to make a correct decision after auto-encoder
or PrepNet preprocessing. The limited amount of training data
and the noisy labels of public datasets are other factors
contributing to low classification accuracies. One possible way
to tackle this
limitation is to rely on weakly supervised learning methods
to improve the COVID-19 classification accuracy with the
methodology summarized in [37].
V. CONCLUSIONS

In this paper, we introduced a novel approach to unify
several CT scan datasets with respect to varying image
properties and acquisition circumstances, such as CT scanner
technology, by training an adaptive pre-processing network
that removes such specificities from the images themselves.
Additionally, we presented initial results demonstrating
the applicability of the method on three publicly available
benchmark datasets. This way, it is possible to shift the
focus of model training from merely optimizing hold-out test
set performance on the same data distribution (which likely
does not transfer to any other environment) towards cross-
dataset detection accuracy. The proposed PrepNet improves
the cross-dataset balanced accuracy by a margin of 11.84
percentage points (SARS-CoV-2 CT-scan dataset [11]) at the
expense of a decline in the within-dataset test performance of
ca. 1.83 pp (UCSD COVID-CT database [12]). These results
suggest that the trainable preprocessing network erases some
of the necessary information for diagnosis, due to artifacts.
This information could be partially retained by propagating
the gradients of the COVID-19 classifier network through the
preprocessing model, and generated artifacts could be detected
automatically by monitoring the reconstruction loss of the
auto-encoder module. This, together with further investigations
of the applicability and generality of the proposed approach
to combining multiple datasets, is an intriguing theme for future
work.
TABLE III
ABLATION OVER COVID-CLASSIFIER BACKBONES

                  | Test: SARS-COV-2            | Test: UCSD COVID-CT          | Within-test | Cross-dataset | Pre-trained
Training dataset  | BA     Sens   Spec   AUC    | BA     Sens   Spec    AUC    | average     | average       | encoder
VGG-19 (PrepNet)
SARS-COV-2        | 0.9007 0.9353 0.7982 0.8668 | 0.5157 0.9175 0.1067  0.5121 | 0.8404      | 0.5343        | Yes
UCSD COVID-CT     | 0.5545 0.6446 0.1858 0.4852 | 0.7800 0.8556 0.7087  0.7822 | (−1.83 pp)  | (+11.84 pp)   |
ResNet18
SARS-COV-2        | 0.7462 0.7046 0.8584 0.7815 | 0.4728 0.8144 0.1538  0.4841 | 0.7345      | 0.4940        | Yes
UCSD COVID-CT     | 0.5152 0.6246 0.1947 0.4096 | 0.7228 0.8351 0.6154  0.7252 | (−12.42 pp) | (+7.81 pp)    |
Inception
SARS-COV-2        | 0.8553 0.9046 0.7080 0.8063 | 0.4703 0.9485 0.02885 0.4886 | 0.8286      | 0.3995        | Yes
UCSD COVID-CT     | 0.3288 0.36308 0.2212 0.2922 | 0.8020 0.8351 0.7692 0.8021 | (−3.01 pp)  | (−1.64 pp)    |
EfficientNet-B0
SARS-COV-2        | 0.8735 0.8923 0.8142 0.8532 | 0.5223 0.5979 0.4519  0.5249 | 0.8253      | 0.4835        | Yes
UCSD COVID-CT     | 0.4447 0.5015 0.2743 0.3879 | 0.7772 0.8041 0.7500  0.7771 | (−3.34 pp)  | (+6.76 pp)    |

Fig. 3. Severe cases of artifacts generated by the baseline auto-encoder and the proposed PrepNet (columns: dataset, original, baseline auto-encoder,
PrepNet). The images demonstrate different levels of distortion.

TABLE IV
RESULTS ON THE UNSEEN MOSMED TEST DATASET

Preprocessing    | BA                 | Sens               | Spec               | AUC               | Pre-trained encoder
COVID-classifier | 0.6066 (baseline)  | 0.5246 (baseline)  | 0.8771 (baseline)  | 0.7009 (baseline) | Yes
AutoEncoder      | 0.6693 (+6.27 pp)  | 0.7142 (+18.96 pp) | 0.5175 (−35.96 pp) | 0.6159 (−8.50 pp) | Yes
PrepNet          | 0.7073 (+10.07 pp) | 0.7558 (+23.12 pp) | 0.5438 (−33.33 pp) | 0.6498 (−5.11 pp) | Yes

ACKNOWLEDGMENT

This research was financially supported by the ZHAW
Digital Futures Fund under contracts “SDMCT—Standardized
Data and Modeling for AI-based CoVID-19 Diagnosis Support
on CT Scans” as well as “Synthetic data generation of CoVID-
19 CT/X-rays images for enabling fast triage of healthy vs.
unhealthy patients”.
REFERENCES

[1] WHO. (2021) WHO COVID-19 situation reports. [Online]. Available:
novel-coronavirus-2019/situation-reports
Fig. 4. Sample CT scans that are wrongly classified after the trainable
preprocessing (columns: dataset, pre-processed, initial reproduction, PrepNet reproduction).
[2] M. J. Horry, S. Chakraborty, M. Paul, A. Ulhaq, B. Pradhan, M. Saha,
and N. Shukla, “Covid-19 detection through transfer learning using
multimodal imaging data,” IEEE Access, vol. 8, pp. 149 808–149 824,
[3] C. Chen, G. Gao, Y. Xu, L. Pu, Q. Wang, L. Wang, W. Wang, Y. Song,
M. Chen, L. Wang et al., “Sars-cov-2–positive sputum and feces after
conversion of pharyngeal samples in patients with covid-19,” Annals of
Internal Medicine, vol. 172, no. 12, pp. 832–834, 2020.
[4] X. Mei, H.-C. Lee, K.-y. Diao, M. Huang, B. Lin, C. Liu, Z. Xie,
Y. Ma, P. M. Robson, M. Chung, A. Bernheim, V. Mani, C. Calcagno,
K. Li, S. Li, H. Shan, J. Lv, T. Zhao, J. Xia, Q. Long, S. Steinberger,
A. Jacobi, T. Deyer, M. Luksza, F. Liu, B. P. Little, Z. A. Fayad, and
Y. Yang, “Artificial intelligence–enabled rapid diagnosis of patients with
covid-19,” Nature Medicine, vol. 26, pp. 1224–1228, 2020.
[5] S. A. Harmon, T. H. Sanford, S. Xu, E. B. Turkbey, H. Roth, Z. Xu,
D. Yang, A. Myronenko, V. Anderson, A. Amalou, M. Blain, M. Kassin,
D. Long, N. Varble, S. M. Walker, U. Bagci, A. M. Ierardi, E. Stellato,
G. G. Plensich, G. Franceschelli, C. Girlando, G. Irmici, D. Labella,
D. Hammoud, A. Malayeri, E. Jones, R. M. Summers, P. L. Choyke,
D. Xu, M. Flores, K. Tamura, H. Obinata, H. Mori, F. Patella, M. Cariati,
G. Carrafiello, P. An, B. J. Wood, and B. Turkbey, “Artificial intelligence
for the detection of covid-19 pneumonia on chest ct using multinational
datasets,” Nature Communications, vol. 11, p. 4080, 2020.
[6] L. Li, L. Qin, Z. Xu, Y. Yin, X. Wang, B. Kong, J. Bai, Y. Lu, Z. Fang,
Q. Song, K. Cao, D. Liu, G. Wang, Q. Xu, X. Fang, S. Zhang, J. Xia, and
J. Xia, “Using artificial intelligence to detect covid-19 and community-
acquired pneumonia based on pulmonary ct: Evaluation of the diagnostic
accuracy,Radiology, vol. 296, no. 2, pp. E65–E71, 2020.
[7] X. Wang, X. Deng, Q. Fu, Q. Zhou, J. Feng, H. Ma, W. Liu, and
C. Zheng, “A weakly-supervised framework for covid-19 classification
and lesion localization from chest ct,” IEEE Transactions on Medical
Imaging, vol. 39, no. 8, pp. 2615–2625, 2020.
[8] N. Lessmann, C. I. Sánchez, L. Beenen, L. H. Boulogne, M. Brink,
E. Calli, J.-P. Charbonnier, T. Dofferhoff, W. M. van Everdingen, P. K.
Gerke et al., “Automated assessment of covid-19 reporting and data
system and chest ct severity scores in patients suspected of having covid-
19 using artificial intelligence,” Radiology, vol. 298, no. 1, pp. E18–E28,
[9] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,
S. Ozair, A. C. Courville, and Y. Bengio, “Generative adversarial nets,”
in Advances in Neural Information Processing Systems 27: Annual
Conference on Neural Information Processing Systems 2014, December
8-13 2014, Montreal, Quebec, Canada, Z. Ghahramani, M. Welling,
C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds., 2014, pp.
2672–2680. [Online]. Available:
[10] J. Schmidhuber, “Generative adversarial networks are special cases
of artificial curiosity (1990) and also closely related to predictability
minimization (1991),” Neural Networks, vol. 127, pp. 58–66, 2020.
[11] E. Soares and P. Angelov, “A large dataset of real patients
CT scans for COVID-19 identification,” 2020. [Online]. Available:
[12] J. Zhao, Y. Zhang, X. He, and P. Xie, “Covid-ct-dataset: a ct scan dataset
about covid-19,” arXiv preprint arXiv:2003.13865, 2020.
[13] S. P. Morozov, A. E. Andreychenko, N. A. Pavlov, A. V. Vladzymyrskyy,
N. V. Ledikhova, V. A. Gombolevskiy, I. A. Blokhin, P. B. Gelezhe,
A. V. Gonchar, and V. Y. Chernina, “Mosmeddata: Chest ct scans with
covid-19 related findings dataset,” 2020.
[14] J. P. Cohen, P. Morrison, and L. Dao, “Covid-19 image data
collection,” arXiv 2003.11597, 2020. [Online]. Available: https:
[15] H. Gunraj, A. Sabri, D. Koff, and A. Wong, “Covid-net ct-2: En-
hanced deep neural networks for detection of covid-19 from chest
ct images through bigger, more diverse learning,” arXiv preprint
arXiv:2101.07433, 2021.
[16] K. Simonyan and A. Zisserman, “Very deep convolutional networks for
large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[17] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in Proceedings of the IEEE conference on computer vision
and pattern recognition, 2016, pp. 770–778.
[18] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2818–2826.
[19] F. Chollet, “Xception: Deep learning with depthwise separable convolu-
tions,” in Proceedings of the IEEE conference on computer vision and
pattern recognition, 2017, pp. 1251–1258.
[20] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, “Inception-v4,
inception-resnet and the impact of residual connections on learning,” in
Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31,
no. 1, 2017.
[21] X. He, X. Yang, S. Zhang, J. Zhao, Y. Zhang, E. Xing, and P. Xie,
“Sample-efficient deep learning for covid-19 diagnosis based on ct
scans,” MedRxiv, 2020.
[22] A. Mobiny, P. A. Cicalese, S. Zare, P. Yuan, M. Abavisani, C. C. Wu,
J. Ahuja, P. M. de Groot, and H. Van Nguyen, “Radiologist-level covid-
19 detection using ct scans with detail-oriented capsule networks,” arXiv
preprint arXiv:2004.07407, 2020.
[23] S. Sabour, N. Frosst, and G. E. Hinton, “Dynamic routing between
capsules,” arXiv preprint arXiv:1710.09829, 2017.
[24] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in CVPR, vol. 1, 2017, p. 3.
[25] T. D. Pham, "A comprehensive study on classification of covid-19 on computed tomography with pretrained convolutional neural networks," Scientific Reports, vol. 10, no. 1, pp. 1–8, 2020.
[26] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," Advances in neural information processing systems, vol. 25, pp. 1097–1105, 2012.
[27] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, "Learning transferable architectures for scalable image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 8697–8710.
[28] S. Chaganti, A. Balachandran, G. Chabin, S. Cohen, T. Flohr,
B. Georgescu, P. Grenier, S. Grbic, S. Liu, F. Mellot et al., “Quan-
tification of tomographic patterns associated with covid-19 from chest
ct,” arXiv preprint arXiv:2004.01279, 2020.
[29] A. Wong, “Netscore: towards universal metrics for large-scale perfor-
mance analysis of deep neural networks for practical on-device edge
usage,” in International Conference on Image Analysis and Recognition.
Springer, 2019, pp. 15–26.
[30] A. Amyar, R. Modzelewski, H. Li, and S. Ruan, “Multi-task deep learn-
ing based ct imaging analysis for covid-19 pneumonia: Classification
and segmentation,” Computers in Biology and Medicine, vol. 126, p.
104037, 2020.
[31] C. de Vente, L. H. Boulogne, K. V. Venkadesh, C. Sital, N. Lessmann, C. Jacobs, C. I. Sánchez, and B. van Ginneken, "Improving automated covid-19 grading with convolutional neural networks in computed tomography scans: An ablation study," arXiv preprint arXiv:2009.09725, 2020.
[32] K. H. Brodersen, C. S. Ong, K. E. Stephan, and J. M. Buhmann, "The balanced accuracy and its posterior distribution," in 2010 20th international conference on pattern recognition. IEEE, 2010, pp. 3121–3124.
[33] I. Loshchilov and F. Hutter, "Decoupled weight decay regularization," arXiv preprint arXiv:1711.05101, 2017.
[34] L. Tuggener, M. Amirian, K. Rombach, S. Lörwald, A. Varlet, C. Westermann, and T. Stadelmann, "Automated machine learning in practice: state of the art and recent results," in 2019 6th Swiss Conference on Data Science (SDS). IEEE, 2019, pp. 31–36.
[35] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Computer Vision and Pattern Recognition (CVPR), 2015.
[36] M. Tan and Q. Le, “Efficientnet: Rethinking model scaling for con-
volutional neural networks,” in International Conference on Machine
Learning. PMLR, 2019, pp. 6105–6114.
[37] N. Simmler, P. Sager, P. Andermatt, R. Chavarriaga, F.-P. Schilling, M. Rosenthal, and T. Stadelmann, "A survey of un-, weakly-, and semi-supervised learning methods for noisy, missing and partial labels in industrial vision applications," in Proceedings of the 8th SDS. IEEE, 2021.