
Scientic Reports | (2021) 11:14358 | 
www.nature.com/scientificreports
CAD systems for colorectal cancer
from WSI are still not ready
for clinical acceptance
Sara P. Oliveira1,2,6*, Pedro C. Neto1,2,6, João Fraga3,6, Diana Montezuma3,4,5, Ana Monteiro3,
João Monteiro3, Liliana Ribeiro3, Sofia Gonçalves3, Isabel M. Pinto3 & Jaime S. Cardoso1,2
Most oncological cases can be detected by imaging techniques, but diagnosis is based on pathological
assessment of tissue samples. In recent years, the pathology field has evolved to a digital era where
tissue samples are digitised and evaluated on screen. As a result, digital pathology opened up many
research opportunities, allowing the development of more advanced image processing techniques,
as well as artificial intelligence (AI) methodologies. Nevertheless, despite colorectal cancer (CRC)
being the second deadliest cancer type worldwide, with increasing incidence rates, the application
of AI for CRC diagnosis, particularly on whole-slide images (WSI), is still a young field. In this review,
we analyse some relevant works published on this particular task and highlight the limitations that
hinder the application of these works in clinical practice. We also empirically investigate the feasibility
of using weakly annotated datasets to support the development of computer-aided diagnosis systems
for CRC from WSI. Our study underscores the need for large datasets in this field and the use of an
appropriate learning methodology to gain the most benefit from partially annotated datasets. The
CRC WSI dataset used in this study, containing 1,133 colorectal biopsy and polypectomy samples, is
available upon reasonable request.
Pathologists are responsible for the diagnosis of samples collected during biopsies. Digital pathology methods have been increasing due to technological advances, and their implementation can support the work conducted by pathologists. And while this multi-step process requires an additional scanning step (Fig. 1), the benefits far outweigh the increased initial overhead: access to old cases, collaboration with external laboratories, and data sharing are all made easier, and peer review of a whole-slide image (WSI) is completed at a quicker pace in a digital pathology workflow. In addition, the ability to easily access images mitigates the risk of errors, making diagnosis more auditable.
Over the last decade, the advent of digitised tissue samples, the wider adoption of digital workflows in pathology labs, and the consequent availability of more data, combined with a shortage of pathologists, enabled the evolution of the computational pathology field and the integration of automatic image analysis into clinical practice, mainly based on Artificial Intelligence (AI) methodologies1–4. Researchers have been exploring the implementation of computer-aided diagnosis (CAD) systems for several different tasks regarding cancer WSI. The most popular are the detection, grading and/or segmentation of lesions. Additionally, there are also predictive systems that attempt to estimate the patient's probability of survival.
Despite the ever-growing number of publications on machine learning (ML) methods applied to CAD systems, there is a dearth of published work on the task of joint detection and classification of colorectal lesions from WSI, leaving colorectal cancer (CRC) lagging behind pathologies such as breast cancer and prostate cancer. Furthermore, a significant amount of the work developed does not use the entire WSI but instead uses crops and regions of interest extracted from these images. While these latter works show significant results, their applicability in clinical practice is limited. Similarly, publicly available datasets often consist of crops instead of the original image. Others include only abnormal tissue, limiting the development of CRC diagnostic systems and the detection task.
As a rst step in addressing these limitations, in this work, we examine and identify the benets and short-
comings of the current body of work on CAD systems for CRC diagnosis from WSI. Since the development of
such systems typically requires large and diverse datasets, any review would be incomplete without a concurrent
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Scientic Reports | (2021) 11:14358 | 
www.nature.com/scientificreports/
discussion of the existing data and associated annotation metadata. Following this initial reflection, and in an attempt to better understand the limitations of the data and methods used to develop CAD systems for CRC diagnosis, we analyse the impact of using partially annotated datasets, as well as the effect of increasing training instances, on the robustness of learning. We conduct this empirical study using in-house data, an ongoing effort to develop a reference dataset for CRC automatic diagnosis. These comparisons serve two distinct purposes: first, they validate the annotation and labelling efforts used in the dataset construction; second, public data is used to evaluate the dependency of the performance on the number of used samples. One method was assessed for its applicability within a framework for pathological diagnosis in a digital workflow. In an attempt to address the lack of public CRC datasets with large numbers of slides, the colorectal data used in this study, totalling 1,133 colorectal biopsy and polypectomy samples, is available upon reasonable request.
This paper distinguishes itself from previous reviews5,6 by focusing only on CRC histopathological slide classification, distinguishing works that evaluate complete slides from those that use only smaller portions of the image, without returning a final diagnosis for each case. Moreover, we discuss their major shortcomings, especially with regard to the properties of the datasets that are used. In this review, we provide a detailed description of the proposed methods, including most of the references of the previous reviews and, in addition, some more recent works, making this a timely paper.
In addition to this introductory section, this paper consists of three other main sections and a conclusion. The development of AI models for medical applications should always take into consideration the clinical background, and thus the "Clinical insights" section introduces the main clinical notions regarding CRC and overviews the process of classifying lesions. Afterwards, the "Computational pathology" section provides a detailed review of CAD systems for CRC diagnosis from WSI, discussing the current problems regarding the analysis of these images. Finally, before the conclusion and discussion of future work in the "Conclusions" section, the "Feasibility study on the use of weakly annotated and larger datasets for CRC diagnosis" section not only describes the workflow of the CRC dataset construction, but also shows the results of our feasibility study on the use of weakly annotated and larger datasets.
Clinical insights
CRC epidemiology. Colorectal cancer (CRC) represents one of the major public health problems today. A striking and often unknown fact is that CRC is the second most deadly cancer7,8. Globocan estimated data for 2020 show that CRC is the third most incident cancer (10% of all cancers) and the second most deadly (9.4%; surpassed by lung cancer, 18%)8. CRC is the third most common cancer in men (after lung and prostate cancer) and the second most common cancer in women (after breast cancer). CRC is a disease of modern times: the highest incidence rates occur in developed countries9. As the world becomes richer, and people shift to a western lifestyle, the incidence of CRC is expected to increase, since it is a multifactorial disease resulting from lifestyle, genetic, and environmental factors9,10. Population growth and ageing lead to an increasing incidence of the disease, as do better and more numerous screening programs for early detection and prevention. The prevalence of screening among individuals aged 50 years and older increased from 38% in 2000 to 66% in 2018, according to data from the National Center for Health Statistics (NHIS)11. Importantly, CRC is a preventable and curable cancer if detected early on, and, therefore, screening is an effective tumour prevention measure12. Screening decreases mortality through the timely detection and removal of adenomatous polypoid (pre-malignant) lesions, interrupting the progression to cancer. It should begin with colonoscopy
in asymptomatic individuals aged 50 years or over (and without personal or family risk factors for CRC) and repeated every ten years if normal13.

Figure 1. Digital pathology workflow, from collecting the biopsy sample to the WSI visualisation.

It is worth mentioning that, due to the Covid-19 pandemic, CRC screening
programmes have been disrupted worldwide. As such, it is crucial that catch-up screening is provided as soon and as effectively as possible, hoping to mitigate the impact on CRC deaths14,15. Computer-aided diagnosis (CAD) solutions in CRC could help in this task, contributing to improved pathology diagnostic capacity.
CRC pathological assessment. During pathological assessment, colorectal biopsies/polyps can be stratified into non-neoplastic, low-grade dysplasia (LGD), high-grade dysplasia (HGD, including intramucosal carcinomas) and invasive carcinomas, regarding their development sequence. Colorectal dysplasia refers to the pre-malignant abnormal development of cells/tissues, which can eventually progress to tumour lesions, and is classified into low- and high-grade, with the latter conferring a relatively higher risk of cancer (Fig. 2).
The case for grading dysplasia. It is well known that grading colorectal dysplasia is a somewhat subjective issue. In a study to evaluate inter-observer variability in HGD diagnosis, five gastrointestinal pathologists conducted a consensus conference in which criteria for colorectal HGD were developed16. When grading the same 107 polyps, the inter-observer agreement was found to be poor both before and after the consensus. Other studies have also shown sub-optimal agreement in grading colorectal dysplasia17,18. Despite this, the most recent guidelines from the European Society of Gastrointestinal Endoscopy (ESGE), as well as those from the US multi-society task force on CRC, continue to recommend surveillance for polyps with high-grade dysplasia regardless of their
size13,19. Patients requiring surveillance after polypectomy include those with complete removal of:
- at least one adenoma ≥ 10 mm or with high-grade dysplasia;
- five or more adenomas;
- any serrated polyp ≥ 10 mm or with dysplasia13.
As such, it remains a current practice in most countries (although not in every laboratory) to evaluate and grade
colorectal dysplasia.
Dysplasia grading. As previously stated, various studies have shown good concordance among pathologists in recognising adenomatous features, but lower levels of agreement when evaluating dysplasia grade, with significant inter-observer variability16–18. To date, there are still no tangible criteria on what distinguishes the high end of LGD from the low end of HGD. Although there are some reporting guidelines regarding grading dysplasia in colorectal biopsies20–22, objective criteria are still lacking. It is fairly easy for a pathologist to diagnose a typical low-grade or high-grade adenoma but, since in fact these lesions exist in a continuum, the correct assessment of many intermediate cases is more difficult. Nevertheless, protocols such as the English National Health System (NHS) bowel cancer screening programme guidance, with guidelines from the bowel cancer screening programme pathology group20, or the Pan-Canadian consensus for colorectal polyps report22, can aid pathologists in grading colorectal lesions more objectively. Additional information from reference books, such as the World Health Organization (WHO) Classification of Tumours: digestive system tumours7, can also assist in this task. The most relevant characteristics that differentiate low- and high-grade dysplasia are detailed in Table 1.

Figure 2. Normal colonic mucosa and dysplastic progression.
Computational pathology
The digitisation of histopathology images opened many research opportunities for the field of computer-aided image analysis23,24. In fact, due to the high-resolution and complex nature of whole-slide image (WSI) evaluation, advances in image analysis are required, which provides the opportunity to apply and advance image processing techniques, as well as AI methodologies, such as machine learning (ML) and deep learning (DL) algorithms3,4,23,24. Moreover, the integration of AI into healthcare routines is a required milestone for the years to come, and thus, in terms of pathology-focused research, many DL architectures have been applied with many different tasks in mind, either to predict diagnoses or even to identify new biomarkers3.
Regarding the eld of AI, and its application in computational pathology, DL models25, which consist of
multiple layers of processing to learn dierent levels of data representation, are the most common and promising
methods nowadays. e networks are composed of multiple layers, each with multiple nodes. e large numbers
of hidden layers confer depth to the networks, hence the name. Each node performs a weighted sum of its inputs
and then feeds it into a non-linear function, the result of which is passed forward as input to the following layer
and so on until the last layer, which provides the network output. In this way, these models have the intrinsic abil-
ity to learn features, directly from the input data, useful for the task at hand25. In particular, convolutional neural
networks (CNN) are applied to images and automatically extract features, which are then used to identify objects/
regions of interest or to classify the underlying diagnosis26. In digital pathology, this type of models is used, for
example, for mitosis detection27,28, tissue segmentation29,30, cancer grading31,32 or histological classication33,34.
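The layer-by-layer computation just described can be sketched in a few lines of NumPy; the layer sizes, the ReLU non-linearity and the softmax output are illustrative choices, not tied to any of the systems reviewed here:

```python
import numpy as np

def forward(x, layers):
    """Propagate input x through a list of (weights, bias) layers.
    Each node computes a weighted sum of its inputs followed by a
    non-linearity (ReLU here); the last layer applies softmax to
    produce class probabilities."""
    for W, b in layers[:-1]:
        x = np.maximum(0.0, x @ W + b)   # weighted sum + ReLU
    W, b = layers[-1]
    logits = x @ W + b
    e = np.exp(logits - logits.max())    # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
# Toy network: 8 input features -> 16 hidden nodes -> 3 diagnostic classes
layers = [(rng.normal(size=(8, 16)), np.zeros(16)),
          (rng.normal(size=(16, 3)), np.zeros(3))]
probs = forward(rng.normal(size=8), layers)
print(probs)  # three non-negative scores summing to 1
```

A CNN replaces the dense weighted sums of the early layers with convolutions, but the forward-pass principle is the same.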
Challenges of whole-slide image analysis. Despite the popularity, clear potential, progress and good results of DL in computer vision, and in medical imaging in particular, researchers should carefully consider and manage its pros and cons4,35. Indeed, digital pathology brings some specific challenges that need to be addressed:
- High dimensionality of data. Histology images are extremely informative, but at the cost of high dimensionality, usually over 50,000 × 50,000 pixels35. Hence, these images do not fit in the memory of a Graphics Processing Unit (GPU), which is usually needed to train DL models. Current methods either downsample the original image or extract multiple smaller patches, choosing between the cost of losing pixel information or losing spatial information, respectively;
- Data variability, due to the nearly infinite patterns resulting from the basic tissue types, and the lack of standardisation in tissue preparation, staining and scanning;
- Lack of annotated data, since extensive annotation is subjective, tedious, expensive and time-consuming;
- Non-boolean diagnosis, especially in difficult and rare cases, which makes the diagnosis process more complex;
- Need for interpretability/explainability, in order to be reliable, easily debugged, trusted and approved4,35,36.
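The trade-off in the first challenge (downsampling versus patch extraction) can be illustrated with a toy array standing in for a WSI; the sizes are scaled down for illustration, and the non-overlapping tiling that discards border remainders is a deliberate simplification:

```python
import numpy as np

def extract_tiles(image, tile):
    """Split an image into non-overlapping tile x tile patches,
    discarding incomplete border tiles (a common simplification)."""
    h, w = image.shape[:2]
    return [image[r:r + tile, c:c + tile]
            for r in range(0, h - tile + 1, tile)
            for c in range(0, w - tile + 1, tile)]

wsi = np.zeros((1000, 1500))      # stand-in for a giant slide
tiles = extract_tiles(wsi, 500)   # keeps full resolution, loses global context
down = wsi[::4, ::4]              # keeps the layout, loses pixel detail
print(len(tiles), down.shape)     # 6 (250, 375)
```

Real pipelines read tiles lazily from the pyramidal slide file rather than loading the full array, but the choice between the two losses is the same.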
Therefore, the research community has the opportunity to develop robust algorithms with high performance, transparent and as interpretable as possible, always designed and validated in partnership with pathologists. To this end, one can take advantage of some well-known techniques such as transfer learning (using pre-trained networks instead of training from scratch), weakly supervised/unsupervised learning (analysing images with only slide-level labelling), generative frameworks (by learning to generate images, the algorithm can understand their main distinctive features) or multitask learning (learning interrelated concepts may produce better generalisations)35.
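As a minimal illustration of the weakly supervised setting, the classic multiple-instance assumption labels a slide positive if any of its tiles is positive, which at inference reduces to max-pooling over tile probabilities; the tile scores and threshold below are invented for illustration:

```python
def slide_prediction(tile_probs, threshold=0.5):
    """Standard MIL-style aggregation: a slide is called positive when
    its most suspicious tile exceeds the threshold. Training such a
    model needs only the slide-level label, not tile annotations."""
    score = max(tile_probs)  # max-pooling over tile probabilities
    return ("positive" if score >= threshold else "negative"), score

# Hypothetical tumour probabilities for the tiles of one slide
label, score = slide_prediction([0.05, 0.12, 0.81, 0.30])
print(label, score)  # positive 0.81
```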
Computational pathology on colorectal cancer. As mentioned earlier, the rise of DL and its application in computer vision has been critical to computer-aided diagnosis (CAD) research. Several researchers have sought to work alongside pathologists to improve or reduce the workload of diagnosing cancer using histopathological images.

Table 1. Colorectal low- and high-grade dysplasia characterisation. *Architectural features: gland morphology and placement; **Cytological features: cell-level characteristics.

Extension
  Low-grade dysplasia: –
  High-grade dysplasia: Changes must involve more than two glands (except in tiny biopsies of polyps)
Low-power magnification
  Low-grade dysplasia: Lack of architectural complexity suggests low-grade dysplasia throughout
  High-grade dysplasia: Alterations have to be enough to be identified at low power: complex architectural abnormalities; epithelium looks thick, blue, disorganised and "dirty"
Cytology/architecture
  Low-grade dysplasia: Does not combine cytological high-grade dysplasia with architectural high-grade features
  High-grade dysplasia: Needs to combine high-grade cytological and high-grade architectural alterations
Architectural features*
  Low-grade dysplasia: Gland crowding, showing parallel disposition, with no complexity (no back-to-back or cribriforming); global architecture may vary from tubular to villous
  High-grade dysplasia: Complex glandular crowding and irregularity; prominent budding; cribriform appearance and back-to-back glands; prominent intra-luminal papillary tufting
Cytological features**
  Low-grade dysplasia: Nuclei are enlarged and hyperchromatic, many times cigar-shaped; nuclei maintain basal orientation (only up to the lower half of the height of the epithelium, although in some cases we can see glands with full-thickness nuclear stratification; this is not HGD if the architecture is bland); there is no loss of cell polarity or pleomorphism; no atypical mitosis; maintained cytological maturation (mucin)
  High-grade dysplasia: Noticeably enlarged nuclei, often with a dispersed chromatin pattern and evident nucleoli; loss of cell polarity or nuclear stratification to the extent that the nuclei are distributed within all thirds of the height of the epithelium; atypical mitoses; prominent apoptosis/necrosis, giving the lesion a "dirty" appearance; lack of cytological maturation (loss of mucin)

However, the development of AI applications for colorectal cancer (CRC) diagnosis on WSI is still limited, as noted by Thakur et al.5: of the 30 papers reviewed, only 20% have diagnosis as a final goal. In fact,
the majority of the papers deal with a wide variety of tasks, with a particular focus on tissue segmentation, the goal of 62% of the reviewed papers5. Last year, Wang et al.6 also published a review on the application of AI to CRC diagnosis and therapy, reflecting the same trend. However, CRC diagnosis is a growing application, with an increasing number of publications in recent years. In the next section, we collect and describe the published works on CRC diagnosis, with a particular focus on slide diagnosis (Table 2), but also summarising some works using partial regions of tissue (region crops or tiles) without aggregation to the WSI level.
CRC diagnosis on WSI. In 2012, Kalkan et al.37 proposed a method for CRC automatic detection from Haematoxylin and Eosin (H&E) slides, combining textural and structural features of smaller patches (1024 × 1024 pixels). Firstly, the patches are classified into normal, inflamed, adenomatous or cancer with a k-NN classifier, based on local shape and textural features, such as Haralick features, Gabor filter features and colour histogram features. Then, the (up to) 300 patches representing the slide are summarised in the average probabilities for all four primary classes, used as a feature vector for a logistic-linear regressor to obtain a final slide diagnosis: normal or cancer. The proposed method was trained on 120 H&E stained slides and achieved an Area Under the Curve (AUC) of 0.90 and an average accuracy of 87.69%, with accuracies of 79.17% and 92.68% for cancer and normal slides, respectively. Similarly, using traditional computer vision techniques, Yoshida et al.38 presented an approach to classify CRC H&E slides into 4 types: non-neoplastic, adenoma, carcinoma and unclassifiable. For each WSI, all tissue regions are identified, summing 1328 sections from 1068 H&E slides. Then, each section is processed for blur detection and colour normalisation before the analysis in two steps: cytological atypia analysis and structural atypia analysis. In the first step, the method proposed by Cosatto et al.39 is used, based on a multiple instance learning (MIL) formulation using a Multilayer Perceptron (MLP), to grade the degree of cytological alteration of the tissue (high or low).
Then, the image is classified into low atypia, intermediate atypia, high atypia or unclassifiable, based on structural nuclear features and cytoplasmatic features, extracted from consecutive ROIs, that are summarised by the mean-square of the top 3 ROIs. Finally, each image is classified based on the combination of the structural atypia analysis result (high, intermediate or low) and the cytological atypia analysis result (high or low), given that carcinoma presents higher atypia values. The model has an undetected carcinoma rate of 9.3%, an undetected adenoma rate of 0.0% and an over-detection proportion of 27.1%.
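Since the exact combination of the two atypia results is not fully specified above, the following sketch is a hypothetical decision table, guided only by the stated principle that carcinoma presents the highest atypia; the mapping is ours, not Yoshida et al.'s:

```python
def combine_atypia(structural, cytological):
    """Illustrative combination of structural atypia (low / intermediate /
    high / unclassifiable) and cytological atypia (low / high) into a
    slide class. Hypothetical reconstruction, not the authors' rule."""
    if structural == "unclassifiable":
        return "unclassifiable"
    if structural == "high" and cytological == "high":
        return "carcinoma"            # both analyses maximally atypical
    if structural == "low" and cytological == "low":
        return "non-neoplastic"       # neither analysis flags atypia
    return "adenoma"                  # intermediate combinations

print(combine_atypia("high", "high"))  # carcinoma
```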
e rst DL application model was presented in 2017, by Korbaret al.40, to automatically classify colorec-
tal polyps on H&E stained slides into ve classes: normal, hyperplastic, sessile serrated, traditional serrated
adenoma, tubular adenoma and tubulovillous/villous adenoma. e 697 H&E stained slides (annotated by a
pathologist) were cropped into ROIs of
811 ×984
pixels (mean size), and then divided into overlapping smaller
patches. ese patches were classied using the ResNet-152 and the prediction of the slide was obtained as the
most common colorectal polyp class among all patches of the slide. However, if no more than ve patches are
identied with the most common class, with a condence higher than 70%, the slide is classied as normal.
The proposed system achieved 93.0% accuracy, 89.7% precision, 88.3% recall and 88.8% F1-score. Later, the authors proposed a visualisation method41, based on this approach, to identify highly-specific ROIs for each type of colorectal polyp within a slide, using the Guided Grad-CAM method42 and a subset of data (176 H&E colorectal slides).

Table 2. Literature overview on colorectal whole-slide image diagnosis. CRC: Colorectal Cancer; AD: Adenoma; CA: Carcinoma; ADC: Adenocarcinoma; H&E: Haematoxylin & Eosin; px: pixels; k-NN: k Nearest Neighbours; ROI: Region of Interest; CNN: Convolutional Neural Network; SVM: Support Vector Machine; MLP: Multi-Layer Perceptron; MIL: Multiple Instance Learning; Acc.: Accuracy; AUC: Area Under the ROC Curve; FNR/FPR: False Negative/Positive Rate.

Kalkan et al.37 (2012). Task: CRC detection (normal vs cancer). Dataset: 120 H&E slides (tile annotations). Description: 1024 × 1024 px tiles; k-NN classifier + logistic-linear classifier. Results: Acc.: 87.69%; AUC: 0.90.
Korbar et al.40 (2017). Task: polyp classification (6-class): normal, hyperplastic, sessile serrated, traditional serrated, tubular and tubulovillous/villous. Dataset: 697 H&E slides (annotated). Description: 811 × 984 px ROIs (mean size); ResNet-152 + argmax of tile class frequency. Results: Acc.: 93%; Precision: 89.7%; Recall: 88.3%; F1-score: 88.8%.
Yoshida et al.38 (2017). Task: CRC classification (4-class): unclassifiable, non-neoplastic, adenoma and CA. Dataset: 1068 H&E slides (w/ labelled tissue sections). Description: tissue section crops + cytological atypia analysis + structural atypia analysis + overall classification. Results: FNR (CA): 9.3%; FNR (adenoma): 0%; FPR: 27.1%.
Iizuka et al.43 (2020). Task: CRC classification (3-class): non-neoplastic, AD and ADC. Dataset: 4536 H&E slides (annotated) + 547 H&E slides from the TCGA-COAD collection. Description: 512 × 512 px tiles at 20×; Inception-v3 + RNN. Results: AUC: 0.962 (ADC), 0.993 (AD); AUC (TCGA-COAD subset): 0.982 (ADC).
Song et al.46 (2020). Task: colorectal adenoma detection (normal vs adenoma). Dataset: 411 H&E slides (annotated) + external set: 168 H&E slides. Description: 640 × 640 px tiles at 10×; modified DeepLab-v2 + 15th largest pixel probability. Results: AUC: 0.92; Acc. (external set): >90%.
Wei et al.45 (2020). Task: polyp classification (5-class): normal, hyperplastic, tubular, tubulovillous/villous, sessile serrated. Dataset: 508 H&E slides (annotated) + external set: 238 H&E slides. Description: 224 × 224 px tiles at 40×; ResNet models ensemble + hierarchical classification. Results: Acc.: 93.5%; Acc. (external set): 87%.
Xu et al.47 (2020). Task: CRC detection (normal vs cancer). Dataset: 307 H&E slides (annotated) + 50 H&E slides (external set). Description: 768 × 768 px tiles; Inception-v3 + tile tumour probability thresholding. Results: Acc.: >93%; Acc. (external set): >87%.
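The patch-vote aggregation described for Korbar et al. can be sketched as follows; the function and parameter names are ours, and the rule is our reading of the textual description rather than the authors' code:

```python
from collections import Counter

def korbar_aggregate(tile_preds, min_tiles=5, min_conf=0.70):
    """Slide label = most common tile class, unless no more than
    min_tiles tiles predict that class with confidence above min_conf,
    in which case the slide falls back to 'normal'."""
    classes = [c for c, _ in tile_preds]
    top, _ = Counter(classes).most_common(1)[0]
    confident = sum(1 for c, p in tile_preds if c == top and p > min_conf)
    return top if confident > min_tiles else "normal"

# Hypothetical (class, confidence) predictions for the tiles of one slide
preds = [("tubular adenoma", 0.9)] * 8 + [("hyperplastic", 0.6)] * 3
print(korbar_aggregate(preds))  # tubular adenoma
```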
In 2020, several authors presented solutions for CRC diagnosis, with varying degrees of detail. Iizuka et al.43 proposed the combination of an Inception-v3 network with a recurrent neural network (RNN) to classify H&E colorectal WSI into non-neoplastic, adenoma and adenocarcinoma. Each slide was divided into patches of 512 × 512 pixels (at 20× magnification, with a sliding window of 256 pixels) and assigned to one of the three diagnostic classes. Then, all tiles are aggregated using a RNN, trained to combine the features outputted by the CNN. The dataset consists of subsets from two different institutions, summing 4536 WSIs. Moreover, the model was also evaluated on a subset of 547 colon surgical resection cases from The Cancer Genome Atlas (TCGA) repository44, containing adenocarcinoma and normal samples (TCGA-COAD collection). On the private dataset, the proposed approach measured AUCs of 0.962 and 0.992 for colorectal adenocarcinomas and adenomas, respectively. On the public dataset, the model achieved a 0.982 AUC for adenocarcinomas. It is noteworthy that the authors report that, since the samples from the external subset are much larger than the biopsies used for training, the RNN aggregation was replaced by a max-pooling aggregation. Meanwhile, Wei et al.45 aimed to identify five types of polyps in H&E stained colorectal slides: normal, tubular adenoma, tubulovillous or villous adenoma, hyperplastic polyp, and sessile serrated adenoma. To train the model, the authors used 509 slides (with annotations of relevant areas by five specialised pathologists) and, for further testing, they used an external set of 238 slides obtained from different institutions. The model consists of an ensemble of versions of ResNet (namely, networks with 34, 50, 101, and 152 layers) to classify tiles of 224 × 224 pixels (at 40× magnification). Then, the patches are combined with a hierarchical classifier to predict a slide diagnosis. Based on the predicted tile classes, the model first classifies a polyp as adenomatous or serrated, by comparing the frequency of tile classes (tubular, tubulovillous, or villous vs. hyperplastic or sessile serrated). Adenomatous polyps with more than 30% tubulovillous or villous adenoma tiles are classified within this class, and the remaining are classified as tubular adenoma. Serrated polyps with more than 1.5% of sessile serrated tiles are classified within this class, and the remaining are classified as hyperplastic. The thresholds were set with a grid search over the training set, reaching an accuracy of 93.5% on the internal test set and 87.0% on the external test set.
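The two-stage rule described for Wei et al. can be sketched as follows; the text does not state whether the thresholds are taken over all tiles or only over the winning branch's tiles, so the denominator (all tiles) and the tie-break are our assumptions, and the names are ours:

```python
def wei_hierarchy(tile_classes):
    """Hierarchical slide rule: first split adenomatous vs serrated by
    tile-class frequency, then apply the grid-searched thresholds
    (30% tubulovillous/villous, 1.5% sessile serrated).
    Hypothetical reconstruction of the published description."""
    n = len(tile_classes)
    aden = sum(c in ("tubular", "tubulovillous", "villous") for c in tile_classes)
    serr = sum(c in ("hyperplastic", "sessile serrated") for c in tile_classes)
    if aden >= serr:  # adenomatous branch (tie-break is our assumption)
        tv = sum(c in ("tubulovillous", "villous") for c in tile_classes)
        return "tubulovillous/villous" if tv / n > 0.30 else "tubular adenoma"
    ss = sum(c == "sessile serrated" for c in tile_classes)
    return "sessile serrated" if ss / n > 0.015 else "hyperplastic"

print(wei_hierarchy(["tubular"] * 60 + ["villous"] * 40))  # tubulovillous/villous
```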
Moreover, also during last year, two other authors proposed segmenting colorectal tissue simultaneously with the diagnosis. Song et al.46 presented an approach based on a modified DeepLab-v2 network on 640 × 640 pixel tiles, at 10× magnification. The dataset consists of 411 annotated slides, labelled as colorectal adenomas or normal mucosa (which includes chronic inflammation), and a subset of 168 slides collected from two other institutions, to serve as an external test. The authors modified the DeepLab-v2 network by introducing a skip layer that combines the upsampled lower layers with the higher layers, in order to retain semantic details of the tiles. Then, the 15th largest pixel-level probability is used for the slide-level prediction. In the inference phase, the slide is decomposed into tiles of 2200 × 2200 pixels. The proposed approach achieved an AUC of 0.92 and, when tested on the independent dataset, an accuracy over 90%. In turn, the model of Xu et al.47 was trained on a set of 307 slides (normal and CRC), with tissue boundaries manually annotated by a pathologist, achieving a mean accuracy of 99.9% for normal slides and 93.6% for cancer slides, and a mean Dice coefficient of 88.5%. For further testing, the model was also evaluated on an external set of 50 CRC slides and achieved a mean accuracy of 87.8% and a mean Dice coefficient of 87.2%. The method uses the Inception-v3 architecture, pre-trained on the ImageNet dataset, to classify patches of 768 × 768 pixels, resized to 299 × 299 pixels. The final tumour regions and slide diagnosis are obtained by thresholding the tile predictions: tiles with tumour probability above 0.65 are considered as cancer.
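The thresholding step described for Xu et al. can be sketched as follows; the tile probabilities are invented, and the rule that any flagged tile makes the slide cancerous is our reading of the description rather than the authors' implementation:

```python
def xu_slide_call(tile_tumour_probs, prob_thr=0.65):
    """Tiles with tumour probability above prob_thr are flagged as
    cancer; the flagged tile indices form the tumour region, and a
    slide with any flagged tile is called cancerous (our reading)."""
    tumour_tiles = [i for i, p in enumerate(tile_tumour_probs) if p > prob_thr]
    return ("cancer" if tumour_tiles else "normal"), tumour_tiles

# Hypothetical tumour probabilities for four tiles of one slide
label, region = xu_slide_call([0.1, 0.7, 0.9, 0.2])
print(label, region)  # cancer [1, 2]
```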
While some of the reported results are impressive and show high potential, there are still some obvious shortcomings that need to be addressed. One of the issues is model evaluation: most of the papers analysed have not used any form of external evaluation on public benchmark datasets, as can be seen from the dataset descriptions in Table 2. This validation is necessary to understand and compare the performance of models that, otherwise, cannot be directly compared to each other due to the use of distinct datasets. It also limits the study of the robustness of a model when it is exposed to data from sources other than those used for training. On the other hand, as with any DL problem, the size of the dataset is crucial. Although, as mentioned earlier, it is expensive to collect the necessary amount of data to develop a robust model, it is noticeable that the reviewed articles could greatly benefit from an increase in the volume of data, since most of the works are trained on only a few hundred slides (Table 2). Describing and sharing how the data collection and annotation processes were performed, for example, the number of annotators, their experience in the field, and how their discrepancies were resolved, is also crucial to assess the quality of the dataset and of the annotations. However, this description was not a common practice in the articles reviewed. Moreover, comparing models becomes more complicated when one realises that the number of classes used for the classification tasks is not standardised across published work. Therefore, together with the difference in the kind of metrics presented, direct comparisons should be made with caution.
Other CRC classication approaches. Despite the small number of published works on colorectal WSI diagnosis
(Table2), there is a myriad of other articles also working on CRC classication using information from smaller
tissue regions, that can be exploited as a basis for general diagnostic systems. Despite the dierent task, these
works that use image crops, or even small patches4852, can be leveraged for slide diagnosis, in combination with
aggregation methods that combine all the extracted information in a single prediction.
As for WSI classication, there are also approaches for crop images classication based on traditional com-
puter vision methods or DL models, and even a combination of both. In 2017, Xuet al.53 proposed the combina-
tion of an Alexnet (pre-trained on ImageNet dataset) as feature extractor and a SVM classier to develop both
a binary (normal vs. cancer) and a multiclass (CRC type) classication approach for cropped images (variable
size, 40X magnication) from CRC H&E slides. e latter goal is to distinguish between 6 classes: normal,
adenocarcinoma, mucinous carcinoma, serrated carcinoma, papillary carcinoma, and cribriform adenocarcinoma.
Each image is divided into overlapping patches of 672 × 672 pixels (then resized to 224 × 224 pixels), from which
4096-dimensional feature vectors are extracted. For cancer detection, features are selected based on the differences
between positive and negative labels: the top 100 feature components (ranked from the largest differences to
the smallest) are kept. Then the final prediction is obtained with a linear SVM (one-vs-rest classification for CRC
type). The CRC detection model has an accuracy of 98% and the CRC type classification model has an accuracy
of 87.2%, trained on 717 image crops. In 2019, Yang et al.54 and Ribeiro et al.55 proposed works based
on colour and geometric features, and classical ML methods, to classify CRC. With colour pictures (350 × 350
pixels, 20× magnification) from H&E stained colorectal tissue sections (labelled and marked by professional
pathologists), Yang et al.54 proposed a method based on sub-patch weight colour histogram features, the ReliefF-based
forward selection algorithm and a Morlet wavelet kernel-based least-squares SVM classifier. The method
was developed using a total of 180 images and obtained an AUC and accuracy of 0.85 and 83.13%, respectively.
Ribeiro et al.55 associated multidimensional fractal geometry, curvelet transforms and Haralick features, and tested
several classifiers on 151 cropped images (775 × 522 pixels, 20× magnification) from 16 H&E adenocarcinoma
samples. The best result, an AUC of 0.994, was achieved with multiscale and multidimensional percolation
features (from curvelet sub-images with scales 1 and 4), quantifications performed with multiscale and multidimensional
lacunarity (from input images and their curvelet sub-images with scale 1), and a polynomial classifier.
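The feature-selection step reported for Xu et al.53, ranking feature components by the difference between positive and negative examples, can be sketched in a few lines (a hypothetical re-implementation on plain lists rather than 4096-dimensional CNN features; the function name is ours):

```python
def top_k_features(X, y, k):
    """Rank features by the absolute difference between the class means
    (positive vs. negative examples) and keep the k largest."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    n_features = len(X[0])
    gaps = []
    for j in range(n_features):
        mean_pos = sum(x[j] for x in pos) / len(pos)
        mean_neg = sum(x[j] for x in neg) / len(neg)
        gaps.append((abs(mean_pos - mean_neg), j))
    gaps.sort(reverse=True)  # largest class-mean gap first
    return [j for _, j in gaps[:k]]
```

The kept components would then feed a linear SVM, as in the reviewed work.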
Regarding DL models, there are also several proposed approaches for several CRC classification tasks. In
2017, Haj-Hassan et al.56 proposed a method based on multispectral images and a custom CNN to predict 3 CRC
types: benign hyperplasia, intraepithelial neoplasia and carcinoma. From the H&E stained tissue samples of 30
patients, 16 multispectral images of 512 × 512 pixels are acquired, in a wavelength range of 500–600 nm. After
a CRC tissue segmentation with an Active Contour algorithm, images are cropped into smaller tiles of 60 × 60
pixels (with the same label as the slide) and fed to a custom CNN (input size of 60 × 60 × 16), reaching an
accuracy of 99.17%. In 2018, Ponzio et al.57 adapted a pre-trained VGG16 network for CRC classification into
adenocarcinoma, tubulovillous adenoma and healthy tissue. They used large tissue-subtype ROIs, identified by a skilled
pathologist from 27 H&E stained slides of colorectal tissue from a public repository58, that were then cropped
into 1089 × 1089 patches, at a magnification level of 40×. By freezing the weights up to the most discriminative
pooling layer (determined by t-SNE) and training only the final layers of the network, the solution provided a
classification accuracy over 90%. The system was evaluated at two levels: the patch score (fraction of patches
that were correctly classified) and the patient score (per-patient patch score, averaged over all cases), which reached
96.82% and 96.78%, respectively. In 2019, Sena et al.59 proposed a custom CNN to classify four stages of CRC
tissue development: normal mucosa, early pre-neoplastic lesion, adenoma and carcinoma. The dataset consists of
393 images from H&E colorectal slides (20× magnification), cropped into nine subimages of 864 × 548 pixels.
For further validation on significantly different images, the authors also used the GLaS challenge dataset60,61,
with 151 cropped images. Since the two datasets differ in resolution, the GLaS images were resized with bi-cubic
interpolation and centrally cropped. The proposed method obtained an overall accuracy of 95.3% and the external
validation returned an accuracy of 81.7%. Meanwhile, Zhou et al.62 proposed a pipeline to classify colorectal
adenocarcinomas based on the recent graph neural networks, converting each histopathological image into a
graph, with nuclei and cellular interactions represented by nodes and edges, respectively. The authors
also propose a new graph convolution module, called Adaptive GraphSage, to combine multilevel features.
With 139 images (4548 × 7520 pixels, 20× magnification) cropped from WSI labelled as normal, low grade and
high grade, the method achieved an accuracy of 97%. For the same classification task, in 2020, Shaban et al.63
proposed a context-aware convolutional neural network to incorporate contextual information in the training
phase. Firstly, tissue regions (1792 × 1792 pixels) are decomposed into local representations by a CNN (224 × 224
pixels input), and the final prediction is obtained by combining all contextual information with a representation
aggregation network, considering the spatial organisation of the smaller tiles. This method was developed on 439
images (5000 × 7300 pixels, 20× magnification) and achieved an average accuracy of 99.28% and 95.70% for
a binary and a three-class setup, respectively.
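The context decomposition used by Shaban et al.63 amounts to splitting each 1792 × 1792 region into a grid of 224 × 224 tiles whose spatial layout is preserved for the aggregation network. A sketch of the geometry only (the helper name is ours, not from the paper):

```python
def context_grid(region_size=1792, tile_size=224):
    """Top-left coordinates of the non-overlapping tiles that decompose a
    square region; an 8 x 8 grid for the sizes reported above."""
    n = region_size // tile_size
    return [(row * tile_size, col * tile_size)
            for row in range(n) for col in range(n)]
```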
Feasibility study on the use of weakly annotated and larger datasets for CRC diagnosis
Collecting and labelling data for computational pathology problems is a lengthy and expensive process. As seen
in the previous section, research is often conducted on small datasets containing a high granularity of annotations
per sample. Despite the benefits of detailed annotations, researchers have recently turned their attention
to weakly-supervised approaches. These approaches, notwithstanding the simplified annotation, can leverage
larger datasets for learning. More importantly, weakly-supervised learning techniques are less prone to bias in
data collection. Performance is also evaluated on a more extensive test set, and thus, the real-world behaviour
of the model can be estimated much more accurately. In this section, we conduct a feasibility study on
the use of efficiently annotated datasets to drive the development of computer-aided diagnosis (CAD) systems
for colorectal cancer (CRC) from whole-slide images (WSI). We attempt to answer the question of the required
dimension of the dataset, as well as the extension of the annotations, to enable the robust learning of predictive
models. We also analyse the advantage of using a loss function adapted to the ordinal nature of the classes
corresponding to the CRC scores.
Datasets. is feasibility study was conducted on two datasets: the rst contains colorectal haematoxylin
& eosin (H&E) stained slides (CRC dataset), whereas the second includes prostate cancer H&E-stained biopsy
slides (PCa dataset). As mentioned in “Computational pathology on colorectal cancer” section, there is a short-
age of large public datasets containing CRC WSIs and most of the existing ones are based on cropped regions
instead of entire slides. Hence, we relied on a PCa dataset that, while not fully transferring to the colorectal use case,
is one of the largest WSI datasets publicly available. This amount of data allowed us to study the data requirements
of a weakly supervised approach and how the performance evolved with the growth of the dataset.
CRC dataset. e CRC dataset contains 1133 colorectal biopsy and polypectomy slides and is the result of our
ongoing eorts to contribute to CRC diagnosis with a reference dataset. We aim to detect high-grade lesions
with high sensitivity. High-grade lesions encompass conventional adenomas with high-grade dysplasia (includ-
ing intra-mucosal carcinomas) and invasive adenocarcinomas. In addition, we also intend to identify low-grade
lesions (corresponding to conventional adenomas with low-grade dysplasia). Accordingly, we created three
diagnostic categories for the algorithm, labelled as non-neoplastic, low-grade and high-grade lesions (Table3).
In the non-neoplastic category, cases with suspicion/known history of inammatory bowel disease/infection/
etc. were omitted. We selected conventional adenomas as they were the largest group on daily routine (serrated
lesions, and other polyp types, were omitted).
All cases were retrieved from the data archive of the IMP Diagnostics laboratory, Portugal, were digitised
by two Leica GT450 WSI scanners, and were evaluated by one of two pathologists (Fig. 3a). Data collection and usage
were performed in accordance with national legal and ethical standards applicable to this type of data. Since the
study is retrospectively designed, no protected health information was used and patient informed consent is
exempted from being requested.
Diagnoses were made using a medical-grade monitor (LG 27HJ712C-W) and the Aperio eSlide Manager software.
When reviewing the cases, most diagnoses coincided with the initial pathology report and no further
assessment was made. In case of disagreement, the case was rechecked and decided between the two pathologists.
A small number of cases (n=100) were further annotated with region marks (Fig. 3b) by one of the pathologists
and then rechecked by the other, using the Sedeen Viewer software64.
Corrections were made when considered necessary by both. For complex cases, or when agreement could not
be reached, both the label and/or annotation were reevaluated by a third pathologist. Case classification followed
the criteria previously described in the "Dysplasia grading" section. Accordingly, cases with only minimal high-grade
dysplasia (HGD) areas (only one or two glands), or with areas of florid high-grade cytological features but without
associated worrisome architecture, were kept in the low-grade dysplasia class, as were cases with cytological
high-grade dysplasia only seen on the surface. It is worth noting that some cases may be more difficult to grade
and have to be decided on a case-by-case basis, preferentially by consensus. Additionally, as recommended by
the World Health Organization (WHO), intramucosal carcinomas were included in the HGD class20,65.
Table 3. CRC dataset class definition.
Algorithm data classes Pathological diagnosis
Non-neoplastic Normal CR mucosa, non-specific inflammation, hyperplasia
Low-grade lesion Low-grade conventional adenoma
High-grade lesion High-grade conventional adenoma and invasive adenocarcinoma
Figure 3. Example of a whole slide (a) from the CRC dataset. Manual segmentations (b) include regions
annotated as non-neoplastic (white), low-grade lesions (blue), high-grade lesions (pink), lymphocytes (green) and
fulguration (yellow).
Regarding the distribution of slide labels, while the annotated samples are considerably imbalanced, as seen
in Fig. 4a, when combined with the non-annotated samples, the distribution of the labels is significantly more
even. Figure 4b shows this final distribution, which is closer to what is seen in clinical practice. Moreover, it was
important to fully annotate cases that are especially difficult or high-grade, so the model can learn more about
these critical cases. The CRC dataset was used not only to develop the proposed methodology, but also to evaluate
the relevance of annotations in model pre-training: can a small set of annotated images leverage the overall
performance of the weakly supervised model?
PCa dataset. Besides the influence of the level of annotations, we also aimed to evaluate the proposed
classification methodology on a larger dataset, also with a multiclass formulation, to investigate the impact of the
dataset size on the performance of the algorithm. In this sense, we used 9,825 PCa biopsy slides from the dataset
of the Prostate cANcer graDe Assessment (PANDA) challenge66. The available full training set consists of 10,616
WSI of digitised H&E stained PCa biopsies (we excluded cases with some type of error), obtained from two
centres: the Radboud University Medical Centre and the Karolinska Institute, and includes both labelling and
tissue segmentation. Each image is labelled with the corresponding ISUP grade and includes tissue annotation,
differentiating tumour areas from normal tissue. The International Society of Urological Pathology (ISUP) grading
system is the current standard to grade PCa; it is based on the modified Gleason system (a score based on
glandular architecture within the tumour) and provides accurate stratification of PCa67.
The PCa dataset contains six different labels, corresponding to the five ISUP grades plus the normal label,
whereas the CRC dataset has only three different labels. Histopathological slides are quite different for different
types of cancer; for instance, the quantity of tissue varies significantly. The images require some pre-processing
that creates the tiles from the WSI. Such processing removes the background, and thus, tissue variations deeply
affect the number of tiles of one slide. Table 4 displays an illustrative example of this, by comparing the number
of tiles and the mean number of tiles per slide included in both datasets. The average number of tiles per slide is
approximately 82× and 45× higher, respectively on the CRC annotated subset and on the complete CRC dataset, when
compared to the PCa dataset. Because of this variation in tissue proportion, despite having 8.6× more slides, the
PCa dataset still has around 5× fewer tiles.
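The background-removal step explains why tissue quantity, not slide count, drives the tile totals in Table 4. A toy version of the tiling on a small binary mask (the real pipeline works on 512 × 512 tiles of the WSI; names and the mask representation are ours):

```python
def tissue_tiles(tissue_mask, tile=4, min_fraction=1.0):
    """Coordinates of tiles kept after background removal.

    tissue_mask is a 2-D list of 0/1 values (1 = tissue, e.g. obtained by
    thresholding stain saturation). A tile is kept when its tissue fraction
    reaches min_fraction, so slides with little tissue yield few tiles.
    """
    rows, cols = len(tissue_mask), len(tissue_mask[0])
    kept = []
    for r in range(0, rows - tile + 1, tile):
        for c in range(0, cols - tile + 1, tile):
            total = sum(tissue_mask[r + i][c + j]
                        for i in range(tile) for j in range(tile))
            if total / (tile * tile) >= min_fraction:
                kept.append((r, c))
    return kept
```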
Figure 4. Slide class distribution in the CRC dataset.
Table 4. Comparison between the number of tiles extracted from the PCa slides and the CRC slides.
Dataset # Slides # Tiles Mean # Tiles per slide
PCa 9825 253,291 25.78
CRC all 1133 1,322,596 1167.34
CRC annotated 100 211,235 2112.35
Methodology. Traditional supervised learning techniques would require all the patches extracted from the
original image to be labelled. However, cancer grading (in clinical practice) aims to classify the whole-slide
image, not individual tiles. Moreover, labelling the tiles represents a significant effort with regard to the workload
of the pathologists. Therefore, techniques such as multiple instance learning (MIL) have been adapted to
computational pathology problems68–70. MIL only requires slide-level labels and the original supervised problem
is converted into a weakly-supervised problem. The nature of the problem allows the implementation of this
technique knowing that, if a WSI is classified with a label Y, no tile belongs to a more severe class than Y and at least
one tile belongs to the label Y. Therefore, using the MIL concept, we propose a workflow (Fig. 5) based on the
work of Campanella et al.69 with several adaptations:
1. Ordinal labels: First, the problem at hand has a multiclass formulation, whereas the original had only two
labels. To reconcile the premises of the MIL method with the clinical information, the labels must not be
seen as independent and their relation must be modelled. For instance, normal tissue is closer to low-grade
lesions than to high-grade dysplasias. Thus, there is an order among the labels.
2. Removal of recurrent aggregation: The original approach leveraged a Recurrent Neural Network (RNN) to
aggregate predictions of individual tiles into a final prediction. None of the tests conducted for the feasibility
results showed any benefit from this RNN aggregation; in fact, the performance degraded. Thus, it was
removed from the pipeline.
3. Tile ranking using the expected value: Using a single tile for the prediction requires a ranking rule in order to
select the most representative of potentially thousands of tiles. Since the problem is non-binary, the original
rule is not applicable69. Therefore, to create a ranking of tiles that is meaningful for the final prediction,
the backbone network is used to compute the outputs of each tile and the expected value is then computed
from these outputs:

tile_score = Σ_{i=1}^{n_classes} x_i × p_i,

with n_classes the number of classes, x_i the class and p_i the corresponding probability;
4. Loss function: The problem includes ordinal labels, so the minimisation of the cross-entropy fails to fully
capture the model's behaviour. As mentioned before, the distance between labels is different, and cross-entropy
treats them as if they were equally distant. Thus, in an attempt to improve on the initial baseline
experiments, the model is optimised to minimise an approximation of the Quadratic Weighted Kappa (QWK)71.
Training details. The setup of the experiments was similar across datasets: ResNet-34 as the backbone, batch
size of 32, the Adaptive Moment Estimation (Adam) algorithm with a learning rate of 1 × 10^-4 as optimiser, tiles
of 512 × 512 pixels that include 100% tissue, and mixed-precision training from the available PyTorch package. Only
one tile was used for predicting the label of the slide (MIL formulation); thus, training considered only
the selected tile. As for hardware, all the experiments were conducted using an Nvidia Tesla V100 (32 GB) GPU.
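The expected-value ranking in adaptation 3 can be written directly from the formula (a minimal sketch; class indices 0, 1 and 2 stand for non-neoplastic, low-grade and high-grade, and the helper names are ours):

```python
def tile_score(class_probs):
    """Expected value of the class index under the tile's predicted
    probabilities: sum_i x_i * p_i, with x_i the (ordinal) class index."""
    return sum(i * p for i, p in enumerate(class_probs))

def most_severe_tile(all_tile_probs):
    """MIL selection: index of the tile with the highest expected severity,
    used as the single representative of the slide."""
    return max(range(len(all_tile_probs)),
               key=lambda t: tile_score(all_tile_probs[t]))
```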
Figure5. Proposed workow for colorectal cancer diagnosis on whole-slide images.
Experimental results and discussion. In deep learning problems, it is not always trivial to determine the
required dataset size to achieve the expected performance. Usually, it is expected that increasing the size of the
dataset increases the model performance. However, this is not always true. Hence, to fully understand the impact
of the dataset size in the computational pathology domain, the developed approach was trained on several subsets
of the original PCa training set with different sizes: 80, 160, 500, 1000, 2500, 5000 and 8348 (complete
training set). For a fair comparison, all the experiments were evaluated on the same test set, which included 1477
slides (15% of the total dataset) independent from the training set. As can be seen in Table 5, the model is able
to leverage more data in order to achieve better performance. Moreover, in line with these results, Campanella et
al.69 stated that for the MIL approach to work properly, the dataset must contain at least 10,000 samples. In our
experiments, the performance with 5000 slides was already close to the best performance.
To further infer the generalisation capability, an extra model was trained on 80 slides and evaluated on 20
slides randomly sampled from the 1477-slide test set. As seen in Table 6, as expected, when the size of the test set
increases, the performance rapidly degrades, reinforcing the concerns about, and the need for, larger datasets. It is also
worth noting that the performance of the model is poor in terms of accuracy. The QWK, on
the other hand, records reasonable values. This difference means that, while the model misclassifies
about 40% of the slides, it assigns them to classes neighbouring the ground truth. One possible reason
for this could be the noise present in the labels of this specific dataset.
The third set of experiments explores the potential to leverage the annotations of a subset of data in order to
improve the performance of the overall MIL method. Table 7 shows the results of the best epoch of each of the
experiments. There are notable performance gains in both the accuracy and the QWK score as the number of
training samples increases. However, perhaps the most exciting performance gain is related to pre-training
the backbone network on the 100 annotated samples for only two epochs before the start of the MIL training.
This experiment is able to outperform the best epoch of the experiment without pre-training in only 7 epochs,
in other words, 12 hours of training, with 84.94% accuracy and a 0.803 QWK score. Moreover, these values kept
increasing until the last training epoch, reaching an accuracy and QWK score of 88.42% and 0.863, respectively.
The final results presented in Table 7 can be extended with a sensitivity to lesions of 93.33% and 95.74% for the
last two entries, respectively. The training set comprises 874 samples (100 annotated and 774 non-annotated),
whereas the test set has 259 WSI.
e results shown in Fig.6a,b, respectively for the QWK and the accuracy, are representative of the gains that
both the number of samples and the use of annotations bring to the model. Moreover, the use of annotations
Table 5. Evolution of the model performance when trained on subsets of the PCa dataset with different sizes,
keeping the test set size constant (n=1,477). Bold values indicate best results.
# Train slides # Train tiles QWK score Accuracy (%)
80 1919 0.497 32.36
160 3851 0.586 37.71
500 12,714 0.628 41.28
1000 25,757 0.692 47.66
2500 64,697 0.738 50.03
5000 129,734 0.771 58.43
8348 215,116 0.789 59.40
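Since QWK is the headline metric in Tables 5, 6 and 7, a plain reference implementation of the evaluation metric may be useful (this computes the metric itself, not the differentiable training approximation71; integer class labels are assumed):

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic disagreement weights: 1 means perfect
    agreement, 0 chance-level agreement, negative values worse than chance."""
    # Observed confusion matrix.
    O = [[0.0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        O[t][p] += 1
    # Expected matrix under independence, from the marginal histograms.
    hist_t = [sum(1 for t in y_true if t == i) for i in range(n_classes)]
    hist_p = [sum(1 for p in y_pred if p == i) for i in range(n_classes)]
    n = len(y_true)
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = ((i - j) ** 2) / ((n_classes - 1) ** 2)  # quadratic weight
            num += w * O[i][j]
            den += w * hist_t[i] * hist_p[j] / n
    return 1.0 - num / den
```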
Table 6. Performance comparison of the model trained on a subset of the PCa dataset, when evaluated on test
sets with different sizes. Bold value indicates best results.
Dataset # Train slides # Test slides # Train tiles # Test tiles QWK score
PCa 80 1,477 1,919 38,175 0.497
PCa 80 20 1,919 579 0.591
Table 7. Performance of the model in the different experiments on the CRC dataset. Bold values indicate best
results.
Dataset Pre-train QWK score Accuracy (%) Convergence Time (Epoch)
CRC annotated (n=100) No 0.583 75.00 6.5 h (13)
CRC All (n=1,133) No 0.795 84.17 2 days and 19 h (27)
CRC All (n=1,133) Yes 0.863 88.42 4 days (40)
appears not only to speed up convergence to high values, but also to increase the model's ability to keep learning
in later epochs.
The findings of these feasibility experiments support the need for larger datasets. Not only that, but they also increase
the confidence in the performance of weakly-supervised learning techniques, especially if it is possible to include
some supervised training at some point to propel the performance even further. It is expected that these novel
techniques and larger datasets will converge to models that are closer to being deployed in clinical practice.
Conclusions
Despite the growing popularity and availability of computational pathology works, there are relatively few published
works on colorectal cancer (CRC) diagnosis from whole-slide images (WSI), as shown in the "CRC diagnosis
on WSI" section. Moreover, the reported results are based on relatively small and private datasets, which renders
them fragile and makes direct comparisons unfair. The reviewed papers on CRC diagnosis leverage different
techniques and approaches to solve the same problem, from attention to graph neural networks and even weakly-supervised
learning. However, there is still room for progress in each of these techniques, both individually and
in combination. Nevertheless, it is difficult to perform a proper evaluation when the size of the dataset does not
favour the use of these techniques and limits the research. Therefore, the first and perhaps most crucial step for the
further development of computer-aided diagnosis (CAD) systems for CRC is to establish a large and meaningful
dataset. As studied in the "Feasibility study on the use of weakly annotated and larger datasets for CRC diagnosis"
section, increasing the number of WSI in the training data leads to an increase in performance, as does detailed
annotation of, at least, part of the dataset.
Nonetheless, the construction of larger datasets with extensive annotations is not an easy and expeditious
task. Hence, there is still a plethora of techniques to be explored with weakly labelled datasets. One of these
is multiple instance learning (MIL) and, while it has been employed several times on these types
of problems, it can still be improved to achieve more accurate results. As shown in the "Experimental results and
discussion" section, the performance of MIL systems is greatly improved by pre-training on the 10% of the
dataset that is annotated. Future research is needed to delineate the lowest percentage of the dataset that needs
to be annotated for this approach to improve both accuracy and convergence times. Moreover, there are other
techniques that can leverage weak labels. For instance, self-supervised learning methods leverage all the pixels
of the images as "labels". On this note, it is also important to evaluate whether there is a feasible way to combine the
benefits of both methodologies.
Since the main goal of deep learning (DL) in computational pathology is to develop a solution that can be
deployed in a clinical environment, it is important to develop it in a fashion similar to clinical practice, in
other words, to handle the same type of data given to pathologists: WSI. In that sense, the work proposed in this
paper is considerably more in line with the end goal of CAD systems for computational pathology: our proposal
can be directly applied to a lab workflow.
Figure 6. Performance evaluated on the CRC dataset.
Future work. Other approaches to be explored in the future are based on improving the supervised pre-training
stage. For example, could pre-training be used, not as a separate initial process, but, instead,
with the annotated data acting as an extra label in a multi-task learning configuration? Can pre-training leverage
synthetic data to improve performance? All these questions represent open research directions that could
be explored in the future. Recently, researchers have also approached a more human type of learning, known as
curriculum learning. This could also be used effectively if, for example, the pre-training and MIL training are
distinct stages: both stages can be iterated multiple times, with hard samples added at each iteration.
Despite the use of attention in some of the articles reviewed, they still incorporate these mechanisms merely as a type
of aggregation layer for their networks. However, recent work in natural language processing and in computer
vision shows that these mechanisms are considerably more powerful than originally thought72. Transformers
(neural network architectures based only on attention layers) have shown impressive results. Both Facebook73
and Google74 have already started exploring these networks for vision, and a similar path could also be interesting
for CAD systems in pathology.
From a dataset point of view, there is one more issue that has been largely ignored in computational pathology
research: a developed model should not be specific to the output of one scanning machine or to particular laboratory
configurations. The broad knowledge acquired by the model during the training phase should not be wasted or
rendered useless. It is then necessary that models can either be directly generalised to other scanning machines, or that
a small sample of non-annotated WSIs is sufficient to fine-tune a model to a performance similar to the original.
Thus, broader studies should be conducted in this area; on a positive note, they do not require large datasets.
In order for these approaches to be used in practice, it is important that researchers develop techniques to
inform pathologists about the spatial locations that were most responsible for the diagnosis and to explain the
reasons for the prediction. Interpretability and explainability have been explored in medical applications of DL75,
and so they should also be present in computational pathology use cases76, such as CRC diagnosis. The ultimate goal
is to create transparent systems that medical professionals can trust and rely on.
Data availability
The CRC dataset, from IMP Diagnostics, used in the current study is available on reasonable request through
the following email contact: cadpath.ai@impdiagnostics.com. The external PCa dataset was obtained from the
Prostate cANcer graDe Assessment (PANDA) challenge and is publicly available through the competition website:
https://www.kaggle.com/c/prostate-cancer-grade-assessment.
Received: 26 April 2021; Accepted: 28 June 2021
References
1. Madabhushi, A. & Lee, G. Image analysis and machine learning in digital pathology: challenges and opportunities. Med. Image Anal. 33, 170–175. https://doi.org/10.1016/j.media.2016.06.037 (2016).
2. Dimitriou, N., Arandjelović, O. & Caie, P. D. Deep learning for whole slide image analysis: An overview. Front. Med. 6, 264. https://doi.org/10.3389/fmed.2019.00264 (2019).
3. Acs, B., Rantalainen, M. & Hartman, J. Artificial intelligence as the next step towards precision pathology. J. Intern. Med. 288, 62–81. https://doi.org/10.1111/joim.13030 (2020).
4. Rakha, E. A. et al. Current and future applications of artificial intelligence in pathology: A clinical perspective. J. Clin. Pathol. https://doi.org/10.1136/jclinpath-2020-206908 (2020).
5. Thakur, N., Yoon, H. & Chong, Y. Current trends of artificial intelligence for colorectal cancer pathology image analysis: A systematic review. Cancers 12, 1884. https://doi.org/10.3390/cancers12071884 (2020).
6. Wang, Y. et al. Application of artificial intelligence to the diagnosis and therapy of colorectal cancer. Am. J. Cancer Res. 10, 3575–3598 (2020).
7. WHO Classification of Tumours Editorial Board. Digestive System Tumours. WHO Tumour Series 5th edn. (International Agency for Research on Cancer, 2019).
8. International Agency for Research on Cancer (IARC). Global Cancer Observatory. https://gco.iarc.fr/ (2021).
9. Brody, H. Colorectal cancer. Nature 521, S1. https://doi.org/10.1038/521S1a (2015).
10. Holmes, D. A disease of growth. Nature 521, S2–S3. https://doi.org/10.1038/521S2a (2015).
11. Siegel, R. L. et al. Colorectal cancer statistics 2020. CA Cancer J. Clin. 70, 145–164. https://doi.org/10.3322/caac.21601 (2020).
12. Digestive Cancers Europe (DiCE). Colorectal screening in Europe. www.digestivecancers.eu/wp-content/uploads/2020/02/466-Document-DiCEWhitePaper2019.pdf (2019).
13. Hassan, C. et al. Post-polypectomy colonoscopy surveillance: European Society of Gastrointestinal Endoscopy guideline—Update 2020. Endoscopy 52, 687–700. https://doi.org/10.1055/a-1185-3109 (2020).
14. de Jonge, L. et al. Impact of the COVID-19 pandemic on faecal immunochemical test-based colorectal cancer screening programmes in Australia, Canada, and the Netherlands: A comparative modelling study. Lancet Gastroenterol. Hepatol. 6, 304–314. https://doi.org/10.1016/S2468-1253(21)00003-0 (2021).
15. Ricciardiello, L. et al. Impact of SARS-CoV-2 pandemic on colorectal cancer screening delay: Effect on stage shift and increased mortality. Clin. Gastroenterol. Hepatol. https://doi.org/10.1016/j.cgh.2020.09.008 (2020).
16. Mahajan, D. et al. Reproducibility of the villous component and high-grade dysplasia in colorectal adenomas <1 cm: Implications for endoscopic surveillance. Am. J. Surg. Pathol. 37, 427–433. https://doi.org/10.1097/PAS.0b013e31826cf50f (2013).
17. Turner, J. K., Williams, G. T., Morgan, M., Wright, M. & Dolwani, S. Interobserver agreement in the reporting of colorectal polyp pathology among bowel cancer screening pathologists in Wales. Histopathology 62, 916–924. https://doi.org/10.1111/his.12110 (2013).
18. Osmond, A. et al. Interobserver variability in assessing dysplasia and architecture in colorectal adenomas: A multicentre Canadian study. J. Clin. Pathol. 67, 781–786. https://doi.org/10.1136/jclinpath-2014-202177 (2014).
19. Gupta, S. et al. Recommendations for follow-up after colonoscopy and polypectomy: A consensus update by the US Multi-Society Task Force on Colorectal Cancer. Gastrointest. Endosc. https://doi.org/10.1016/j.gie.2020.01.014 (2020).
20. Public Health England. Reporting lesions in the NHS BCSP: Guidelines from the bowel cancer screening programme pathology group. https://www.gov.uk/government/publications/bowel-cancer-screening-reporting-lesions#history (2018).
21. Quirke, P., Risio, M., Lambert, R., von Karsa, L. & Vieth, M. Quality assurance in pathology in colorectal cancer screening and diagnosis—European recommendations. Virchows Arch. 458, 1–19. https://doi.org/10.1007/s00428-010-0977-6 (2011).
22. Pathology Working Group of the Canadian Partnership Against Cancer. Pathological reporting of colorectal polyps: Pan-Canadian consensus guidelines. http://canadianjournalofpathology.ca/wp-content/uploads/2016/11/cjp-volume-4-isuue-3.pdf (2012).
23. Veta, M., Pluim, J., Diest, P. & Viergever, M. Breast cancer histopathology image analysis: A review. IEEE Trans. Biomed. Eng. 61, 1400–1411. https://doi.org/10.1109/TBME.2014.2303852 (2014).
24. Robertson, S., Azizpour, H., Smith, K. & Hartman, J. Digital image analysis in breast pathology—from image processing techniques to artificial intelligence. Transl. Res. 194, 19–35. https://doi.org/10.1016/j.trsl.2017.10.010 (2018).
25. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444. https://doi.org/10.1038/nature14539 (2015).
26. Janowczyk, A. & Madabhushi, A. Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases. J. Pathol. Inform. 7, 29. https://doi.org/10.4103/2153-3539.186902 (2016).
27. Roux, L. et al. Mitosis detection in breast cancer histological images: An ICPR 2012 contest. J. Pathol. Inform. https://doi.org/10.4103/2153-3539.112693 (2013).
28. Veta, M. et al. Assessment of algorithms for mitosis detection in breast cancer histopathology images. Med. Image Anal. 20, 237–248. https://doi.org/10.1016/j.media.2014.11.010 (2015).
29. Ing, N. et al. Semantic segmentation for prostate cancer grading by convolutional neural networks. In Medical Imaging 2018: Digital Pathology, vol. 10581 (eds Tomaszewski, J. E. & Gurcan, M. N.) 343–355 (SPIE, 2018). https://doi.org/10.1117/12.2293000.
30. Wang, S., Yang, D., Rong, R., Zhan, X. & Xiao, G. Pathology image analysis using segmentation deep learning algorithms. Am. J. Pathol. 189, 1686–1698. https://doi.org/10.1016/j.ajpath.2019.05.007 (2019).
31. Wan, T., Cao, J., Chen, J. & Qin, Z. Automated grading of breast cancer histopathology using cascaded ensemble with combination of multi-level image features. Neurocomputing 229, 34–44. https://doi.org/10.1016/j.neucom.2016.05.084 (2017).
32. Truong, A. H., Sharmanska, V., Limbäck-Stanic, C. & Grech-Sollars, M. Optimization of deep learning methods for visualization of tumor heterogeneity and brain tumor grading through digital pathology. Neuro-Oncol. Adv. https://doi.org/10.1093/noajnl/vdaa110 (2020).
33. Vo, D. M., Nguyen, N. & Lee, S.-W. Classification of breast cancer histology images using incremental boosting convolution networks. Inf. Sci. 482, 123–138. https://doi.org/10.1016/j.ins.2018.12.089 (2019).
34. Tsaku, N. Z. et al. Texture-based deep learning for effective histopathological cancer image classification. In 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 973–977 (2019). https://doi.org/10.1109/BIBM47256.2019.8983226.
35. Tizhoosh, H. & Pantanowitz, L. Artificial intelligence and digital pathology: challenges and opportunities. J. Pathol. Inf. https://doi.org/10.4103/jpi.jpi_53_18 (2018).
36. Abels, E. et al. Computational pathology definitions, best practices, and recommendations for regulatory guidance: A white paper from the Digital Pathology Association. J. Pathol. 249, 286–294. https://doi.org/10.1002/path.5331 (2019).
37. Kalkan, H., Nap, M., Duin, R. P. W. & Loog, M. Automated colorectal cancer diagnosis for whole-slice histopathology. In Medical Image Computing and Computer-Assisted Intervention (MICCAI) (eds Ayache, N., Delingette, H., Golland, P. & Mori, K.), 550–557. https://doi.org/10.1007/978-3-642-33454-2_68 (2012).
38. Yoshida, H. et al. Automated histological classification of whole slide images of colorectal biopsy specimens. Oncotarget 8, 90719–90729. https://doi.org/10.18632/oncotarget.21819 (2017).
39. Cosatto, E. et al. Automated gastric cancer diagnosis on H&E-stained sections; training a classifier on a large scale with multiple instance machine learning. In Medical Imaging 2013: Digital Pathology (eds Gurcan, M. N. & Madabhushi, A.), vol. 8676, 51–59 (SPIE, 2013). https://doi.org/10.1117/12.2007047.
40. Korbar, B. et al. Deep learning for classification of colorectal polyps on whole-slide images. J. Pathol. Inf. https://doi.org/10.4103/jpi.jpi_34_17 (2017).
41. Korbar, B. et al. Looking under the hood: Deep neural network visualization to interpret whole-slide image analysis outcomes for colorectal polyps. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 821–827. https://doi.org/10.1109/CVPRW.2017.114 (2017).
42. Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In IEEE International Conference on Computer Vision (ICCV), 618–626. https://doi.org/10.1109/ICCV.2017.74 (2017).
43. Iizuka, O. et al. Deep learning models for histopathological classification of gastric and colonic epithelial tumours. Sci. Rep. https://doi.org/10.1038/s41598-020-58467-9 (2020).
44. Weinstein, J. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120. https://doi.org/10.1038/ng.2764 (2013).
45. Wei, J. W. et al. Evaluation of a deep neural network for automated classification of colorectal polyps on histopathologic slides. JAMA Netw. Open https://doi.org/10.1001/jamanetworkopen.2020.3398 (2020).
46. Song, Z. et al. Automatic deep learning-based colorectal adenoma detection system and its similarities with pathologists. BMJ Open https://doi.org/10.1136/bmjopen-2019-036423 (2020).
47. Xu, L. et al. Colorectal cancer detection based on deep learning. J. Pathol. Inf. https://doi.org/10.4103/jpi.jpi_68_19 (2020).
48. Yoon, H. et al. Tumor identification in colorectal histology images using a convolutional neural network. J. Digit. Imaging 32, 131–140. https://doi.org/10.1007/s10278-018-0112-9 (2019).
49. Sabol, P. et al. Explainable classifier for improving the accountability in decision-making for colorectal cancer diagnosis from histopathological images. J. Biomed. Inform. https://doi.org/10.1016/j.jbi.2020.103523 (2020).
50. Teh, E. W. & Taylor, G. W. Learning with less data via weakly labeled patch classification in digital pathology. In IEEE 17th International Symposium on Biomedical Imaging (ISBI), 471–475. https://doi.org/10.1109/ISBI45749.2020.9098533 (2020).
51. Ohata, E. F. et al. A novel transfer learning approach for the classification of histological images of colorectal cancer. J. Supercomput. https://doi.org/10.1007/s11227-020-03575-6 (2021).
52. Kim, S.-H., Koh, H. M. & Lee, B.-D. Classification of colorectal cancer in histological images using deep neural networks: An investigation. Multimedia Tools Appl. https://doi.org/10.1007/s11042-021-10551-6 (2021).
53. Xu, Y. et al. Large scale tissue histopathology image classification, segmentation, and visualization via deep convolutional activation features. BMC Bioinform. 18, 1–17. https://doi.org/10.1186/s12859-017-1685-x (2017).
54. Yang, K., Zhou, B., Yi, F., Chen, Y. & Chen, Y. Colorectal cancer diagnostic algorithm based on sub-patch weight color histogram in combination of improved least squares support vector machine for pathological image. J. Med. Syst. 43, 1–9. https://doi.org/10.1007/s10916-019-1429-8 (2019).
55. Ribeiro, M. G. et al. Classification of colorectal cancer based on the association of multidimensional and multiresolution features. Expert Syst. Appl. 120, 262–278. https://doi.org/10.1016/j.eswa.2018.11.034 (2019).
56. Haj-Hassan, H. et al. Classifications of multispectral colorectal cancer tissues using convolution neural network. J. Pathol. Inf. 8, 1. https://doi.org/10.4103/jpi.jpi_47_16 (2017).
57. Ponzio, F., Macii, E., Ficarra, E. & Cataldo, S. D. Colorectal cancer classification using deep convolutional networks—An experimental study. In Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOIMAGING), 58–66. https://doi.org/10.5220/0006643100580066 (2018).
58. University of Leeds. Virtual pathology at the University of Leeds. http://www.virtualpathology.leeds.ac.uk/ (2018).
59. Sena, P. et al. Deep learning techniques for detecting preneoplastic and neoplastic lesions in human colorectal histological images. Oncol. Lett. 18, 6101–6107. https://doi.org/10.3892/ol.2019.10928 (2019).
60. Sirinukunwattana, K., Snead, D. R. J. & Rajpoot, N. M. A stochastic polygons model for glandular structures in colon histology images. IEEE Trans. Med. Imaging 34, 2366–2378. https://doi.org/10.1109/TMI.2015.2433900 (2015).
61. Kainz, P., Pfeiffer, M. & Urschler, M. Segmentation and classification of colon glands with deep convolutional neural networks and total variation regularization. PeerJ 5, e3874. https://doi.org/10.7717/peerj.3874 (2017).
62. Zhou, Y. et al. CGC-Net: Cell graph convolutional network for grading of colorectal cancer histology images. In IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 388–398. https://doi.org/10.1109/ICCVW.2019.00050 (2019).
63. Shaban, M. et al. Context-aware convolutional neural network for grading of colorectal cancer histology images. IEEE Trans. Med. Imaging 39, 2395–2405. https://doi.org/10.1109/tmi.2020.2971006 (2020).
64. Pathcore. Sedeen viewer. https://pathcore.com/sedeen (2020).
65. Winawer, S. J. et al. Risk and surveillance of individuals with colorectal polyps. WHO Collaborating Centre for the Prevention of Colorectal Cancer. Bull. World Health Organ. 68, 789–795 (1990).
66. Radboud University Medical Center and Karolinska Institute. Prostate cancer grade assessment (PANDA) challenge: Prostate cancer diagnosis using the Gleason grading system. https://www.kaggle.com/c/prostate-cancer-grade-assessment (2020).
67. Epstein, J. I. et al. The 2014 International Society of Urological Pathology (ISUP) consensus conference on Gleason grading of prostatic carcinoma. Am. J. Surg. Pathol. 40, 244–252. https://doi.org/10.1097/PAS.0000000000000530 (2016).
68. Lu, M. Y., Chen, R. J., Wang, J., Dillon, D. & Mahmood, F. Semi-supervised histology classification using deep multiple instance learning and contrastive predictive coding. https://arxiv.org/abs/1910.10825 (2019).
69. Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309. https://doi.org/10.1038/s41591-019-0508-1 (2019).
70. Oliveira, S. P. et al. Weakly-supervised classification of HER2 expression in breast cancer haematoxylin and eosin stained slides. Appl. Sci. 10, 4728. https://doi.org/10.3390/app10144728 (2020).
71. Cohen, J. Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychol. Bull. 70, 213–220. https://doi.org/10.1037/h0026256 (1968).
72. Khan, S. et al. Transformers in vision: A survey. https://arxiv.org/abs/2101.01169v2 (2021).
73. Carion, N. et al. End-to-end object detection with transformers. In Computer Vision—ECCV 2020 (eds Vedaldi, A., Bischof, H., Brox, T. & Frahm, J.-M.), 213–229. https://doi.org/10.1007/978-3-030-58452-8_13 (2020).
74. Dosovitskiy, A. et al. An image is worth 16×16 words: Transformers for image recognition at scale. https://arxiv.org/abs/2010.11929 (2020).
75. Silva, W., Fernandes, K., Cardoso, M. J. & Cardoso, J. S. Towards complementary explanations using deep neural networks. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications (iMIMIC) (ed. Stoyanov, D.) 133–140 (Springer, 2018). https://doi.org/10.1007/978-3-030-02628-8_15.
76. Pocevičiūtė, M., Eilertsen, G. & Lundström, C. Survey of XAI in digital pathology. In Artificial Intelligence and Machine Learning for Digital Pathology: State-of-the-Art and Future Challenges (eds Holzinger, A. et al.) 56–88 (Springer, 2020). https://doi.org/10.1007/978-3-030-50402-1_4.
Acknowledgements
is work is nanced by the ERDF - European Regional Development Fund through the Operational Programme
for Competitiveness and Internationalisation - COMPETE 2020 Programme within project POCI-01-0247-
FEDER-045413 and by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência
e a Tecnologia, within theproject UIDB/50014/2020 andthe PhD grant SFRH/BD/139108/2018. e authors
would like to thank Manuela Felizardo for creating the image of Fig.1.
Author contributions
S.P.O., P.C.N. and J.F. contributed equally to this work; S.P.O., P.C.N. and J.S.C. designed the experiments; S.P.O. and P.C.N. conducted the experiments and the analysis of the results; J.M., L.R. and S.G. prepared the histopathological samples; J.F. and D.M. collected, reviewed and annotated the histopathological cases; S.P.O., P.C.N., J.F. and D.M. wrote the manuscript; I.M.P. clinically supervised the project; J.S.C. technically supervised the project. All authors reviewed the manuscript and agreed to its published version.
Competing interests
The authors declare no competing interests.
Additional information
Correspondence and requests for materials should be addressed to S.P.O.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access is article is licensed under a Creative Commons Attribution 4.0 International
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or
format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons licence, and indicate if changes were made. e images or other third party material in this
article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.
© e Author(s) 2021
Content courtesy of Springer Nature, terms of use apply. Rights reserved
1.
2.
3.
4.
5.
6.
Terms and Conditions
Springer Nature journal content, brought to you courtesy of Springer Nature Customer Service Center GmbH (“Springer Nature”).
Springer Nature supports a reasonable amount of sharing of research papers by authors, subscribers and authorised users (“Users”), for small-
scale personal, non-commercial use provided that all copyright, trade and service marks and other proprietary notices are maintained. By
accessing, sharing, receiving or otherwise using the Springer Nature journal content you agree to these terms of use (“Terms”). For these
purposes, Springer Nature considers academic use (by researchers and students) to be non-commercial.
These Terms are supplementary and will apply in addition to any applicable website terms and conditions, a relevant site licence or a personal
subscription. These Terms will prevail over any conflict or ambiguity with regards to the relevant terms, a site licence or a personal subscription
(to the extent of the conflict or ambiguity only). For Creative Commons-licensed articles, the terms of the Creative Commons license used will
apply.
We collect and use personal data to provide access to the Springer Nature journal content. We may also use these personal data internally within
ResearchGate and Springer Nature and as agreed share it, in an anonymised way, for purposes of tracking, analysis and reporting. We will not
otherwise disclose your personal data outside the ResearchGate or the Springer Nature group of companies unless we have your permission as
detailed in the Privacy Policy.
While Users may use the Springer Nature journal content for small scale, personal non-commercial use, it is important to note that Users may
not:
use such content for the purpose of providing other users with access on a regular or large scale basis or as a means to circumvent access
control;
use such content where to do so would be considered a criminal or statutory offence in any jurisdiction, or gives rise to civil liability, or is
otherwise unlawful;
falsely or misleadingly imply or suggest endorsement, approval , sponsorship, or association unless explicitly agreed to by Springer Nature in
writing;
use bots or other automated methods to access the content or redirect messages
override any security feature or exclusionary protocol; or
share the content in order to create substitute for Springer Nature products or services or a systematic database of Springer Nature journal
content.
In line with the restriction against commercial use, Springer Nature does not permit the creation of a product or service that creates revenue,
royalties, rent or income from our content or its inclusion as part of a paid for service or for other commercial gain. Springer Nature journal
content cannot be used for inter-library loans and librarians may not upload Springer Nature journal content on a large scale into their, or any
other, institutional repository.
These terms of use are reviewed regularly and may be amended at any time. Springer Nature is not obligated to publish any information or
content on this website and may remove it or features or functionality at our sole discretion, at any time with or without notice. Springer Nature
may revoke this licence to you at any time and remove access to any copies of the Springer Nature journal content which have been saved.
To the fullest extent permitted by law, Springer Nature makes no warranties, representations or guarantees to Users, either express or implied
with respect to the Springer nature journal content and all parties disclaim and waive any implied warranties or warranties imposed by law,
including merchantability or fitness for any particular purpose.
Please note that these rights do not automatically extend to content, data or other material published by Springer Nature that may be licensed
from third parties.
If you would like to use or distribute our Springer Nature journal content to a wider audience or on a regular basis or in any other manner not
expressly permitted by these Terms, please contact Springer Nature at
onlineservice@springernature.com
... Thus, the development of robust and high-performance algorithms, which are transparent and as interpretable as possible, can be valuable to assist pathologists in their daily workload [11,12]. However, despite its clinical relevance, the application of AI for CRC diagnosis from WSI is still poorly explored and there are some limitations that hinder its application in clinical practice, as recently highlighted by Oliveira et al. [17]. ...
... Currently, most of the works published on CRC diagnosis focus on classifying cropped regions of interest, or even small tiles, rather than the much more challenging task of assessing the entire WSI, as noted in recent reviews [17][18][19][20]. Nevertheless, some authors have already presented approaches based on the evaluation of the entire slide of colorectal samples. ...
... Therefore, the availability of WSIs is limited, and, when available, they often lack meaningful labelling: while slide-level diagnoses are generally available, detailed spatial annotations are almost always lacking. A prototypical example is the CRC dataset [17], containing 1133 colorectal H&E samples with slide-level diagnoses. ...
Article
Full-text available
Colorectal cancer (CRC) diagnosis is based on samples obtained from biopsies, assessed in pathology laboratories. Due to population growth and ageing, as well as better screening programs, the CRC incidence rate has been increasing, leading to a higher workload for pathologists. In this sense, the application of AI for automatic CRC diagnosis, particularly on whole-slide images (WSI), is of utmost relevance, in order to assist professionals in case triage and case review. In this work, we propose an interpretable semi-supervised approach to detect lesions in colorectal biopsies with high sensitivity, based on multiple-instance learning and feature aggregation methods. The model was developed on an extended version of the recent, publicly available CRC dataset (the CRC+ dataset with 4433 WSI), using 3424 slides for training and 1009 slides for evaluation. The proposed method attained 90.19% classification ACC, 98.8% sensitivity, 85.7% specificity, and a quadratic weighted kappa of 0.888 at slide-based evaluation. Its generalisation capabilities are also studied on two publicly available external datasets.
... The publicly available test partition includes 11,888 images collected from seven publicly available datasets. The test partition includes WSIs (UNITOPATHO 31,32 , TCGA-COAD 33 , AIDA 34 , IMP-CRC 35 ) and cropped sections of WSIs (GlaS 36 , CRC 37 , UNITOPATHO 31,32 , Xu 38 ). Sections of WSIs are treated as WSIs, since they are provided with labels referring to the whole image. ...
Article
Full-text available
The digitalization of clinical workflows and the increasing performance of deep learning algorithms are paving the way towards new methods for tackling cancer diagnosis. However, the availability of medical specialists to annotate digitized images and free-text diagnostic reports does not scale with the need for large datasets required to train robust computer-aided diagnosis methods that can target the high variability of clinical cases and data produced. This work proposes and evaluates an approach to eliminate the need for manual annotations to train computer-aided diagnosis tools in digital pathology. The approach includes two components, to automatically extract semantically meaningful concepts from diagnostic reports and use them as weak labels to train convolutional neural networks (CNNs) for histopathology diagnosis. The approach is trained (through 10-fold cross-validation) on 3’769 clinical images and reports, provided by two hospitals and tested on over 11’000 images from private and publicly available datasets. The CNN, trained with automatically generated labels, is compared with the same architecture trained with manual labels. Results show that combining text analysis and end-to-end deep neural networks allows building computer-aided diagnosis tools that reach solid performance (micro-accuracy = 0.908 at image-level) based only on existing clinical data without the need for manual annotations.
... Thus, DL algorithms could provide valuable results for diagnoses in clinical practice, especially when inconsistencies occur. The available scanned histological images can be reviewed and examined by the collaboration of pathologists simultaneously, from different locations [121,122]. For an efficient fully digital workflow, however, the development of technology infrastructure, including computers, scanners, workstations and medical displays is necessary. ...
Article
Full-text available
Colorectal cancer (CRC) is the second most common cancer in women and the third most common in men, with an increasing incidence. Pathology diagnosis complemented with prognostic and predictive biomarker information is the first step for personalized treatment. The increased diagnostic load in the pathology laboratory, combined with the reported intra- and inter-variability in the assessment of biomarkers, has prompted the quest for reliable machine-based methods to be incorporated into the routine practice. Recently, Artificial Intelligence (AI) has made significant progress in the medical field, showing potential for clinical applications. Herein, we aim to systematically review the current research on AI in CRC image analysis. In histopathology, algorithms based on Deep Learning (DL) have the potential to assist in diagnosis, predict clinically relevant molecular phenotypes and microsatellite instability, identify histological features related to prognosis and correlated to metastasis, and assess the specific components of the tumor microenvironment.
... These methods can be made to work efficiently on thousands of WSIs, often after dividing them into smaller parts as image tiles (or patches), during the model training. Recently, Oliveira et al. have identified limitations of existing algorithms and underscored the need for more accurate methodologies for use in clinical practice (14). They highlighted the need for larger datasets and the use of appropriate learning methodology to improve prediction accuracy. ...
Preprint
Full-text available
Histopathological examination is a pivotal step in the diagnosis and treatment planning of many major diseases. To facilitate the diagnostic decision-making and reduce the workload of pathologists, we present an AI-based pre-screening tool capable of identifying normal and neoplastic colon biopsies. To learn the differential histological patterns from whole slides images (WSIs) stained with hematoxylin and eosin (H&E), our proposed weakly supervised deep learning method requires only slide-level labels and no detailed cell or region-level annotations. The proposed method was developed and validated on an internal cohort of biopsy slides (n=4 292) from two hospitals labeled with corresponding diagnostic categories assigned by pathologists after reviewing case reports. Performance of the proposed colon cancer pre-screening tool was evaluated in a cross-validation setting using the internal cohort (n=4 292) and also by an external validation on The Cancer Genome Atlas (TCGA) cohort (n=731). With overall cross-validated classification accuracy (AUROC = 0.9895) and external validation accuracy (AUROC = 0.9746), the proposed tool promises high accuracy to assist with the pre-screening of colorectal biopsies in clinical practice. Analysis of saliency maps confirms the representation of disease heterogeneity in model predictions and their association with relevant pathological features. The proposed AI tool correctly reported some slides as neoplastic while clinical reports suggested they were normal. Additionally, we analyzed genetic mutations and gene enrichment analysis of AI-generated neoplastic scores to gain further insight into the model predictions and explore the association between neoplastic histology and genetic heterogeneity through representative genes and signaling pathways.
... As a way to test the DP system, before full implementation, we took advantage of our R&D department's simultaneous AI project in colorectal cancer samples [15] and we evaluated the quality of 2963 digitized slides (1664 archive cases and 1299 routine cases). We divided the errors by those detected by the scanners and those detected in the subsequent pathology QC check. ...
Article
Full-text available
Digital pathology (DP) is being deployed in many pathology laboratories, but most reported experiences refer to public health facilities. In this paper, we report our experience in DP transition at a high-volume private laboratory, addressing the main challenges in DP implementation in a private practice setting and how to overcome these issues. We started our implementation in 2020 and we are currently scanning 100% of our histology cases. Pre-existing sample tracking infrastructure facilitated this process. We are currently using two high-capacity scanners (Aperio GT450DX) to digitize all histology slides at 40×. Aperio eSlide Manager WebViewer viewing software is bidirectionally linked with the laboratory information system. Scanning error rate, during the test phase, was 2.1% (errors detected by the scanners) and 3.5% (manual quality control). Pre-scanning phase optimizations and vendor feedback and collaboration were crucial to improve WSI quality and are ongoing processes. Regarding pathologists’ validation, we followed the Royal College of Pathologists recommendations for DP implementation (adapted to our practice). Although private sector implementation of DP is not without its challenges, it will ultimately benefit from DP safety and quality-associated features. Furthermore, DP deployment lays the foundation for artificial intelligence tools integration, which will ultimately contribute to improving patient care.
... In the last few decades, there has been massive growth in the amount of digital color image content in the medical field, due to the spread of advanced multimedia devices and digital services capable of acquiring, transmitting, and storing digital data [8]. Digital pathology images (whole-slide images, WSI) are about 10× larger than radiology images, in most cases exceeding 1 GB in size; such images require better storage management throughout their useful life cycle in the clinical workflow [7,30]. Thus, image quality assessment (IQA) methods are crucial: acting as a filter at the first stage of acquisition, they can improve storage management. ...
Conference Paper
Full-text available
Medical image quality assessment plays an important role not only in the design and manufacturing of image acquisition systems but also in the optimization of decision support systems. This work introduces a new deep ordinal learning approach for focus assessment in whole-slide images. From the blurred image to the focused image there is an ordinal progression that contains relevant knowledge for more robust learning of the models. With this new method, it is possible to infer quality without losing ordinal information about focus, since the models are trained with ordinal losses instead of the nominal cross-entropy loss. Our proposed model is contrasted against other state-of-the-art methods in the literature. A first conclusion is the benefit of using data-driven methods instead of knowledge-based methods. Additionally, the proposed model is found to be the top performer on several metrics. The best-performing model scores an accuracy of 94.4% on a 12-class classification problem in the FocusPath database.
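The ordinal-loss idea above can be made concrete with a minimal numpy sketch (an illustration of the general principle, not the paper's actual loss): nominal cross-entropy gives the same penalty whether the model confuses a slightly blurred image with a sharp one or with a completely out-of-focus one, whereas an ordinal loss such as the expected absolute distance between predicted and true focus level penalizes distant mistakes more.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(logits, y):
    """Nominal loss: only the probability assigned to the true class matters."""
    return float(-np.log(softmax(logits)[y]))

def ordinal_loss(logits, y):
    """Expected |predicted level - true level|: far-off focus errors cost more."""
    p = softmax(logits)
    levels = np.arange(len(p))
    return float(np.sum(p * np.abs(levels - y)))
```

With 12 focus levels and true level 0, a model that puts its mass on level 1 and one that puts it on level 11 receive identical cross-entropy but very different ordinal losses, which is the extra supervision signal ordinal training exploits.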
Conference Paper
Manual assessment of fragments during the processing of pathology specimens is critical to ensure that the material available for slide analysis matches that captured during grossing, without losing valuable material in the process. However, this step is still performed manually, resulting in lost time and delays in making the complete case available for evaluation by the pathologist. To overcome this limitation, we developed an autonomous system that can detect and count the number of fragments contained on each slide. We applied and compared two different approaches: conventional machine learning methods and deep convolutional network methods. For conventional machine learning, we tested a two-stage approach with a supervised classifier followed by unsupervised hierarchical clustering. In addition, Fast R-CNN and YOLOv5, two state-of-the-art deep learning detection models, were used and compared. All experiments were performed on a dataset comprising 1276 images of colorectal biopsy and polypectomy specimens manually labeled for fragment/set detection. The best results were obtained with the YOLOv5 architecture, with an mAP@0.5 of 0.977 for fragment/set detection.
Article
Identification of nuclear components in the histology landscape is an important step towards developing computational pathology tools for profiling the tumor microenvironment. Most existing methods for identifying such components are limited in scope due to the heterogeneous nature of the nuclei. Graph-based methods offer a natural way to formulate the nucleus classification problem so as to incorporate both the appearance and the geometric locations of the nuclei. The main challenge is to define models that can handle such an unstructured domain. Current approaches focus on learning better features and then employ well-known classifiers for identifying distinct nuclear phenotypes. In contrast, we propose a message passing network that is a fully learnable framework built on a classical network-flow formulation. Based on the physical interactions of the nuclei, a nearest-neighbor graph is constructed in which the nodes represent nuclei centroids. For each edge and node, appearance and geometric features are computed, which are then used to construct messages that diffuse contextual information to neighboring nodes. Such an algorithm can infer global information over an entire network and predict biologically meaningful nuclear communities. We show that learning such communities improves the performance of the nucleus classification task in histology images. The proposed algorithm can be used as a component in existing state-of-the-art methods, resulting in improved nucleus classification performance across four different publicly available datasets.
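A single round of the neighbor-aggregation idea behind such message passing networks can be sketched as follows (a simplified illustration, not the paper's exact update rule; the weight matrices `W_self` and `W_msg` stand in for learned parameters): each nucleus node mixes its own features with the mean of its graph neighbors' features, so that contextual information diffuses across the nearest-neighbor graph.

```python
import numpy as np

def message_passing_step(node_feats, adjacency, W_self, W_msg):
    """One aggregation round on a nucleus nearest-neighbor graph.

    node_feats: (n_nodes, d) appearance/geometric features per nucleus
    adjacency:  (n_nodes, n_nodes) 0/1 neighbor matrix
    """
    deg = np.clip(adjacency.sum(axis=1, keepdims=True), 1, None)
    neighbor_mean = (adjacency @ node_feats) / deg   # mean incoming message
    return np.tanh(node_feats @ W_self + neighbor_mean @ W_msg)
```

Stacking several such rounds lets each nucleus see progressively larger neighborhoods, which is what allows prediction of community-level labels rather than purely local ones.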
Preprint
Multiple Instance Learning (MIL) methods have become increasingly popular for classifying giga-pixel sized Whole-Slide Images (WSIs) in digital pathology. Most MIL methods operate at a single WSI magnification, by processing all the tissue patches. Such a formulation induces high computational requirements, and constrains the contextualization of the WSI-level representation to a single scale. A few MIL methods extend to multiple scales, but are computationally more demanding. In this paper, inspired by the pathological diagnostic process, we propose ZoomMIL, a method that learns to perform multi-level zooming in an end-to-end manner. ZoomMIL builds WSI representations by aggregating tissue-context information from multiple magnifications. The proposed method outperforms the state-of-the-art MIL methods in WSI classification on two large datasets, while significantly reducing the computational demands with regard to Floating-Point Operations (FLOPs) and processing time by up to 40x.
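The MIL setting these methods share can be illustrated with a generic attention-based pooling sketch (this is not ZoomMIL's multi-magnification aggregator; the parameter names `V` and `w` are our own): a gigapixel WSI is treated as a bag of patch embeddings, and a learned attention weight per patch produces a single slide-level representation from which the slide label is predicted.

```python
import numpy as np

def attention_mil_pool(patch_feats, V, w):
    """Pool a bag of patch embeddings into one slide-level vector.

    patch_feats: (n_patches, d) patch embeddings from one WSI
    V: (d, h) and w: (h,) are learned attention parameters
    """
    scores = np.tanh(patch_feats @ V) @ w   # one attention logit per patch
    a = np.exp(scores - scores.max())
    a = a / a.sum()                         # softmax over patches in the bag
    slide_vec = patch_feats.T @ a           # attention-weighted average
    return a, slide_vec
```

Because only the slide label supervises training, the attention weights must learn on their own which patches are diagnostically relevant; the computational cost ZoomMIL attacks comes from scoring every patch at full magnification.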
Preprint
Full-text available
A review of over 4,000 articles published in 2021 related to artificial intelligence in healthcare. A BrainX Community exclusive annual publication with trends, specialist editorials, and categorized references readily available to provide insights into related 2021 publications. Cite as: Mathur P, Mishra S, Awasthi R, Cywinski J, et al. (2022). Artificial Intelligence in Healthcare: 2021 Year in Review. DOI: 10.13140/RG.2.2.25350.24645/1
Article
Full-text available
Colorectal cancer refers to cancer of the colon or rectum; and has high incidence rates worldwide. Colorectal cancer most often occurs in the form of adenocarcinoma, which is known to arise from adenoma, a precancerous lesion. In general, colorectal tissue collected through a colonoscopy is prepared on glass slides and diagnosed by a pathologist through a microscopic examination. In the pathological diagnosis, an adenoma is relatively easy to diagnose because the proliferation of epithelial cells is simple and exhibits distinct changes compared to normal tissue. Conversely, in the case of adenocarcinoma, the degree of fusion and proliferation of epithelial cells is complex and shows continuity. Thus, it takes a considerable amount of time to diagnose adenocarcinoma and classify the degree of differentiation, and discordant diagnoses may arise between the examining pathologists. To address these difficulties, this study performed pathological examinations of colorectal tissues based on deep learning. The approach was tested experimentally with images obtained via colonoscopic biopsy from Gyeongsang National University Changwon Hospital from March 1, 2016, to April 30, 2019. Accordingly, this study demonstrates that deep learning can perform a detailed classification of colorectal tissues, including colorectal cancer. To the best of our knowledge, there is no previous study which has conducted a similarly detailed feasibility analysis of a deep learning-based colorectal cancer classification solution.
Article
Full-text available
Colorectal cancer (CRC) is the second most diagnosed cancer in the United States. It is identified by histopathological evaluations of microscopic images of the cancerous region, relying on a subjective interpretation. The Colorectal Histology dataset used in this study contains 5000 images, made available by the University Medical Center Mannheim. This approach proposes the automatic identification of eight types of tissues found in CRC histopathological evaluation. We apply Transfer Learning from architectures of Convolutional Neural Networks (CNNs). We modify the structures of CNNs to extract features from the images and input them to well-known machine learning methods: Naive Bayes, Multilayer Perceptron, k-Nearest Neighbors, Random Forest, and Support Vector Machine (SVM). We evaluated 108 extractor–classifier combinations. The one that achieved the best results is DenseNet169 with SVM (RBF), reaching an Accuracy of 92.083% and F1-Score of 92.117%. Therefore, our approach is capable of distinguishing tissues found in CRC histopathological evaluation.
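The extractor-classifier pattern evaluated above can be sketched generically (a toy illustration only: the study uses frozen CNNs such as DenseNet169 as extractors and classifiers such as an RBF SVM, whereas here the extractor is any callable and the classifier a simple k-NN):

```python
import numpy as np

def extract_features(images, extractor):
    """Map each image to a fixed-length feature vector with a frozen extractor."""
    return np.stack([extractor(im) for im in images])

def knn_predict(train_feats, train_labels, test_feats, k=3):
    """Classify by majority vote among the k nearest training features."""
    preds = []
    for f in test_feats:
        dists = np.linalg.norm(train_feats - f, axis=1)
        nearest = train_labels[np.argsort(dists)[:k]]
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)
```

The appeal of this two-stage design is that the expensive component (the CNN) is trained once on a large generic corpus, while only the lightweight classifier is fit to the histology data, which is why 108 extractor-classifier combinations could be screened.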
Article
Full-text available
Artificial intelligence (AI) is a relatively new branch of computer science involving many disciplines and technologies, including robotics, speech recognition, natural language and image recognition or processing, and machine learning. Recently, AI has been widely applied in the medical field. The effective combination of AI and big data can provide convenient and efficient medical services for patients. Colorectal cancer (CRC) is a common type of gastrointestinal cancer. The early diagnosis and treatment of CRC are key factors affecting its prognosis. This review summarizes the research progress and clinical application value of AI in the investigation, early diagnosis, treatment, and prognosis of CRC, to provide a comprehensive theoretical basis for AI as a promising diagnostic and treatment tool for CRC.
Article
Full-text available
Introduction An electronic Patient-Reported Outcome (ePRO) platform is needed for implementing evidence-based symptom management in outpatients with advanced cancer. We describe the overall protocol and the methodology for measuring symptom burden, to provide critical parameters needed to implement symptom management on the ePRO platform. Methods and analysis The study focusses on patients with advanced lung cancer, stomach cancer, oesophagus cancer, liver cancer, colorectal cancer or breast cancer. The primary outcome is the change of symptom burden. MD Anderson Symptom Inventory, and other PRO instruments (Insomnia Severity Index, Hospital Anxiety and Depression Scale, 9-item Patient Health Questionnaire and EuroQol-5 dimensions-5 levels version) were used. The secondary outcomes include feasibility of using ePRO, symptom-related quality of life, reasons for no improvement of symptoms, defining frequency of PRO assessments and cut-points, items for screening and management of comorbidity and satisfaction with ePRO platform in patients and health providers. After initial outpatient visit for baseline assessment, ePRO system will automatically send follow-up notification seven times over 4 weeks to patients. The characteristics and changing trajectory of symptoms of patients will be described. Parameters for using PROs, such as optimal time points for follow-up and cut-off point for alert will be determined. The feasibility of ePRO platform to track the changes of target symptoms in outpatients will be evaluated. Ethics and dissemination The study protocol and related documents were approved by the Institutional Research Board (IRB) of Peking University Cancer Hospital on 13 February 2019 (2019YJZ07). The results of this study will be disseminated through academic workshops, peer-reviewed publications and conferences. Trial registration number ChiCTR1900023560.
Article
Full-text available
Health problems caused by airborne particulate matter with a diameter less than 2.5 μm (PM2.5), especially in the respiratory system, have become a worldwide problem, but the influence and mechanisms of PM2.5 on the ocular surface have not been sufficiently elucidated. We investigated in vitro the onset and pathogenesis of corneal damage induced by PM2.5. Two types of PM2.5 samples originating from Beijing (designated #28) and the Gobi Desert (designated #30) were added to the culture medium of immortalized cultured human corneal epithelial cells (HCECs) to examine the effects on survival rates, autophagy, and proinflammatory cytokine production. Both types of PM2.5 significantly reduced the HCEC survival rate in a concentration-dependent manner by triggering autophagy. In particular, compared with #30, #28 induced much more severe damage in HCECs. Physical contact between PM2.5 and HCECs was not a primary contributor to PM2.5-induced HCEC damage. Among the 38 proinflammatory cytokines examined in this study, significant increases in the granulocyte macrophage colony-stimulating factor (GM-CSF) and interleukin-6 levels and a significant reduction in the interleukin-8 level were detected in culture medium of PM2.5-exposed HCECs. Simultaneous addition of a GM-CSF inhibitor, suramin, alleviated the HCEC impairment induced by PM2.5. In conclusion, PM2.5 induces HCEC death by triggering autophagy. Some cytokines that are released from HCECs, including GM-CSF, may be involved in HCEC damage caused by PM2.5 exposure.
Article
Full-text available
Importance The association between human papillomavirus (HPV) infection status and the natural process of kidney diseases has been neglected as an area of research. Further studies are needed to clarify factors that may alter the progression of end-stage kidney disease (ESKD). Objective To describe the rates of ESKD among patients with and without HPV infection. Design, Setting, and Participants In this nationwide, population-based retrospective cohort study, data were collected from the National Health Insurance Research Database of Taiwan. A total of 76 088 individuals with HPV infection were enrolled from January 1, 2000, to December 31, 2012, and compared with a control group of 76 088 individuals who had never been diagnosed with HPV infection (at a 1:1 ratio propensity-score matched by age, sex, index year, and comorbidities) in the context of the risk of developing ESKD. Statistical analysis was performed between November 2019 and July 2020. Exposures HPV infection was defined according to the International Classification of Diseases, Ninth Revision, Clinical Modification codes. Main Outcomes and Measures The main outcome was ESKD, as recorded in the Catastrophic Illness Patients database. Cox proportional hazards models were used to estimate hazard ratios (HRs) and 95% CIs, with the control group as a reference. Results Of 152 176 individuals (79 652 [52.3%] women; mean [SD] age, 34.4 [19.1] years), 76 088 individuals (50.0%) had HPV and 463 individuals (0.3%) developed ESKD. Incidence of ESKD was lower in individuals with HPV history than in those without HPV history (3.64 per 10 000 person-years vs 4.80 per 10 000 person-years). In the fully adjusted multivariate Cox proportional hazards regression model, individuals with a history of HPV infection had a significant decrease in risk of ESKD (adjusted HR, 0.72; 95% CI, 0.60-0.87) after adjusting for demographic characteristics, comorbidities, and comedications. 
In the subgroup analysis, individuals ages 50 to 64 years with HPV infection had a statistically significantly lower risk of ESKD compared with individuals ages 50 to 64 years with no HPV infection (adjusted HR, 0.48; 95% CI, 0.34-0.68; P < .001), while there was no significant reduction in risk for the other age groups (ie, 0-19, 20-49, and 65-100 years). Conclusions and Relevance In this study, a history of HPV infection was associated with a lower risk of subsequent ESKD. The mechanism behind this protective association remains uncertain. Future studies are required to clarify the possible biological mechanisms.
Article
Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems. Among their salient benefits, Transformers enable modeling long dependencies between input sequence elements and support parallel processing of sequences, as compared to recurrent networks, e.g., long short-term memory (LSTM). Unlike convolutional networks, Transformers require minimal inductive biases in their design and are naturally suited as set functions. Furthermore, the straightforward design of Transformers allows processing of multiple modalities (e.g., images, videos, text, and speech) using similar processing blocks, and demonstrates excellent scalability to very large capacity networks and huge datasets. These strengths have led to exciting progress on a number of vision tasks using Transformer networks. This survey aims to provide a comprehensive overview of Transformer models in the computer vision discipline. We start with an introduction to the fundamental concepts behind the success of Transformers, i.e., self-attention, large-scale pre-training, and bidirectional feature encoding. We then cover extensive applications of Transformers in vision, including popular recognition tasks (e.g., image classification, object detection, action recognition, and segmentation), generative modeling, multi-modal tasks (e.g., visual question answering, visual reasoning, and visual grounding), video processing (e.g., activity recognition, video forecasting), low-level vision (e.g., image super-resolution, image enhancement, and colorization), and 3D analysis (e.g., point cloud classification and segmentation). We compare the respective advantages and limitations of popular techniques, both in terms of architectural design and experimental value. Finally, we provide an analysis of open research directions and possible future work.
We hope this effort will ignite further interest in the community to solve current challenges towards the application of transformer models in computer vision.
Article
Background Colorectal cancer screening programmes worldwide have been disrupted during the COVID-19 pandemic. We aimed to estimate the impact of hypothetical disruptions to organised faecal immunochemical test-based colorectal cancer screening programmes on short-term and long-term colorectal cancer incidence and mortality in three countries using microsimulation modelling. Methods In this modelling study, we used four country-specific colorectal cancer microsimulation models (Policy1-Bowel in Australia, OncoSim in Canada, and ASCCA and MISCAN-Colon in the Netherlands) to estimate the potential impact of COVID-19-related disruptions to screening on colorectal cancer incidence and mortality in Australia, Canada, and the Netherlands, annually for the period 2020–24 and cumulatively for the period 2020–50. Modelled scenarios varied by duration of disruption (3, 6, and 12 months), decreases in screening participation after the period of disruption (0%, 25%, or 50% reduction), and catch-up screening strategies (within 6 months after the disruption period, or all screening delayed by 6 months). Findings Without catch-up screening, among individuals aged 50 years and older, our analysis predicted that a 3-month disruption would result in 414–902 additional new colorectal cancer diagnoses (relative increase 0·1–0·2%) and 324–440 additional deaths (relative increase 0·2–0·3%) in the Netherlands, 1672 additional diagnoses (relative increase 0·3%) and 979 additional deaths (relative increase 0·5%) in Australia, and 1671 additional diagnoses (relative increase 0·2%) and 799 additional deaths (relative increase 0·3%) in Canada between 2020 and 2050, compared with undisrupted screening.
A 6-month disruption would result in 803–1803 additional diagnoses (relative increase 0·2–0·4%) and 678–881 additional deaths (relative increase 0·4–0·6%) in the Netherlands, 3552 additional diagnoses (relative increase 0·6%) and 1961 additional deaths (relative increase 1·0%) in Australia, and 2844 additional diagnoses (relative increase 0·3%) and 1319 additional deaths (relative increase 0·4%) in Canada between 2020 and 2050, compared with undisrupted screening. A 12-month disruption would result in 1619–3615 additional diagnoses (relative increase 0·4–0·9%) and 1360–1762 additional deaths (relative increase 0·8–1·2%) in the Netherlands, 7140 additional diagnoses (relative increase 1·2%) and 3968 additional deaths (relative increase 2·0%) in Australia, and 5212 additional diagnoses (relative increase 0·6%) and 2366 additional deaths (relative increase 0·8%) in Canada between 2020 and 2050, compared with undisrupted screening. Providing immediate catch-up screening could minimise the impact of the disruption, restricting the relative increase in colorectal cancer incidence and deaths between 2020 and 2050 to less than 0·1% in all countries. A post-disruption decrease in participation could increase colorectal cancer incidence by 0·2–0·9% and deaths by 0·6–1·6% between 2020 and 2050, compared with undisrupted screening. Interpretation Although the projected effect of short-term disruption to colorectal cancer screening is modest, such disruption will have a marked impact on colorectal cancer incidence and deaths between 2020 and 2050 attributable to missed screening. Thus, it is crucial that, if disrupted, screening programmes ensure participation rates return to previously observed rates and provide catch-up screening wherever possible, since this could mitigate the impact on colorectal cancer deaths. Funding Cancer Council New South Wales, Health Canada, and Dutch National Institute for Public Health and Environment.
Article
Background and Aims Gastroparesis (GP) is a multifactorial disease associated with a large burden on healthcare systems. Pyloric-directed therapies, including G-POEM, can effectively improve patient quality of life and reduce symptom severity. We report on the safety and efficacy of G-POEM and its impact on the quality of life of patients managed at a large referral center. Methods Consecutive patients with confirmed GP referred for G-POEM after failure of medical therapy were included. All patients were assessed at baseline and then at 1, 3, 6, 12, and 24 months after G-POEM using validated symptom and QOL instruments, including the Gastroparesis Cardinal Symptom Index (GCSI), the Patient Assessment of Gastrointestinal Disorders Symptom Severity Index (PAGI-SYM), and the SF-36. Patients were evaluated before and 6 months after the procedure with EGD, 4-hour scintigraphy, and pyloric EndoFLIP. Technical success was defined as the ability to perform full-thickness pyloromyotomy. Clinical response was defined as an improvement of ≥1 point on the GCSI. Results Fifty-two patients (median age: 48 years, range 25-80, 88% female) underwent G-POEM between February 2018 and September 2020 for the following GP phenotypes: vomiting-predominant (n=30), dyspepsia-predominant (n=16), and regurgitation-predominant (n=6). The technical success rate was 100%. Adverse events occurred in 3 of 52 patients (5.77%) and were all successfully managed endoscopically. Clinical response was achieved in 68%, 58%, and 48% of patients at 1-month, 6-month, and 12-month follow-up (p < 0.001, p < 0.001, and p < 0.01, respectively). When classified by etiology of GP, the clinical response rates were 64% (11/17) for diabetic GP, 67% (6/9) for postsurgical GP, and 72% (13/18) for idiopathic GP. A statistically significant improvement in PAGI-SYM scores was observed at 1, 3, 6, 12, and 24 months, in addition to significant improvement in several domains of the SF-36.
Mean 4-hour gastric emptying was reduced 6 months after G-POEM (10.2%) compared with baseline (36.5%, p < 0.001). We also report a significant reduction in the number of emergency department visits and days spent in the hospital up to 24 months after G-POEM. Conclusions G-POEM appears to be a safe and feasible treatment alternative for refractory GP, with significant short-term and mid-term improvements in overall symptoms, QOL scores, and healthcare utilization.
Chapter
We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression procedure or anchor generation that explicitly encode our prior knowledge about the task. The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture. Given a fixed small set of learned object queries, DETR reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel. The new model is conceptually simple and does not require a specialized library, unlike many other modern detectors. DETR demonstrates accuracy and run-time performance on par with the well-established and highly-optimized Faster R-CNN baseline on the challenging COCO object detection dataset. Moreover, DETR can be easily generalized to produce panoptic segmentation in a unified manner. We show that it significantly outperforms competitive baselines. Training code and pretrained models are available at https://github.com/facebookresearch/detr.
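The bipartite matching at the heart of DETR's set loss can be illustrated in miniature (DETR uses the Hungarian algorithm on a cost that mixes class probability and box overlap; the exhaustive search below is only viable for toy-sized sets): each predicted object is assigned to exactly one ground-truth object so that the total matching cost is minimal, which is what forces unique predictions without non-maximum suppression.

```python
import numpy as np
from itertools import permutations

def best_matching(cost):
    """Minimal-cost one-to-one assignment between predictions and targets.

    cost[i, j] = cost of matching prediction i to ground-truth j.
    Brute force over permutations (fine for small n; DETR uses Hungarian).
    """
    n = cost.shape[0]
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        c = sum(cost[i, j] for i, j in enumerate(perm))
        if c < best_cost:
            best_cost, best_perm = c, perm
    return best_cost, best_perm
```

Training then backpropagates the loss only through the matched pairs, so duplicate predictions of the same object are explicitly penalized by the assignment.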