Joint Segmentation of Multiple Thoracic Organs in CT
Images with Two Collaborative Deep Architectures
Roger Trullo1,2, Caroline Petitjean1, Dong Nie2, Dinggang Shen2, and Su Ruan1
1Normandie Univ, UNIROUEN, UNIHAVRE, INSA Rouen, LITIS, 76000 Rouen, France,
2Department of Radiology and BRIC, UNC-Chapel Hill, USA
Abstract. Computed Tomography (CT) is the standard imaging technique for ra-
diotherapy planning. The delineation of Organs at Risk (OAR) in thoracic CT im-
ages is a necessary step before radiotherapy, for preventing irradiation of healthy
organs. However, due to low contrast, multi-organ segmentation is a challenge. In
this paper, we focus on developing a novel framework for automatic delineation
of OARs. Different from previous works in OAR segmentation where each or-
gan is segmented separately, we propose two collaborative deep architectures to
jointly segment all organs, including esophagus, heart, aorta and trachea. Since
most of the organ borders are ill-defined, we believe spatial relationships must be
taken into account to overcome the lack of contrast. The aim of combining two
networks is to learn anatomical constraints with the first network, which will be
used in the second network, when each OAR is segmented in turn. Specifically,
we use the first deep architecture, a deep SharpMask architecture, for providing
an effective combination of low-level representations with deep high-level fea-
tures, and then take into account the spatial relationships between organs by the
use of Conditional Random Fields (CRF). Next, the second deep architecture is
employed to refine the segmentation of each organ by using the maps obtained
on the first deep architecture to learn anatomical constraints for guiding and re-
fining the segmentations. Experimental results show superior performance on 30
CT scans, compared with other state-of-the-art methods.
Keywords: Anatomical constraints, CT Segmentation, Fully Convolutional Networks
(FCN), CRF, CRFasRNN, Auto-Context Model
1 Introduction
In medical image segmentation, many clinical settings include the delineation of mul-
tiple objects or organs, e.g., the cardiac ventricles, and thoracic or abdominal organs.
From a methodological point of view, the ways of performing multi-organ segmenta-
tion are diverse. For example, multi-atlas approaches in a patch based setting have been
shown effective for segmenting abdominal organs [11]. Many other approaches com-
bine several techniques, such as in [4], where thresholding, the generalized Hough transform and an atlas-registration based method are used. The performance of these approaches is limited by their reliance on several separate methods, which can also be computationally expensive. Usually, organs are segmented individually, ignoring their spatial relationships, although this information could be valuable to the segmentation process.
Fig. 1: Typical CT scan with manual segmentations of the esophagus, heart, trachea and
aorta.
In this paper, we focus on the segmentation of OAR, namely the aorta, esophagus,
trachea and heart, in thoracic CT (Fig. 1), an important prerequisite for radiotherapy
planning in order to prevent irradiation of healthy organs. Routinely, the delineation
is largely manual, with poor intra- or inter-practitioner agreement. Note that the automated segmentation of the esophagus has hardly been addressed in research works, as it is exceptionally challenging: its boundaries in CT images are almost invisible (Fig. 2). Radiotherapists manually segment it based not only on intensity information, but also on anatomical knowledge, i.e., the esophagus is located behind the trachea in
the upper part, behind the heart in the lower part, and also next to the aorta in several
parts. More generally, this observation can be made for the other organs as well. Our
aim is to design a framework that learns this kind of constraint automatically
to improve the segmentation of all OAR and the esophagus in particular. We propose
to tackle the problem of segmenting OAR in a joint manner through the application of
two collaborative deep architectures, which will implicitly learn anatomical constraints
in each of the organs to mitigate the difficulty caused by lack of image contrast. In
particular, we perform an initial segmentation by using a first deep Sharpmask network,
inspired by the refinement framework presented in [8] which allows an effective combi-
nation of low-level features and deep high-level representations. In order to enforce the
spatial and intensity relationships between the organs, the initial segmentation result is
further refined by Conditional Random Fields (CRF) with the CRFasRNN architecture.
We propose a second deep architecture, designed to make use of the segmentation maps of all organs obtained by the first deep architecture, in order to learn anatomical constraints for the organ whose segmentation is currently being refined. We show experimentally that our framework outperforms other state-of-
the-art methods. Note that our framework is also generic enough to be applied to other
multi-label joint segmentation problems.
2 Method
2.1 SharpMask feature fusion architecture and CRF refinement
The first deep architecture performs initial segmentation, with its output as a probability
map of each voxel belonging to background, esophagus, heart, aorta, or trachea. In order
to alleviate the loss of image resolution due to the use of pooling operations in regular
Fig. 2: CT scan with manual delineation of the esophagus. Note how the esophagus is
hardly distinguishable.
Convolutional Neural Networks (CNN), Fully Convolutional Networks (FCN) [5] and other recent works such as the U-Net [9] and Facebook's SharpMask (SM) [8]
have used skip connections, outperforming many traditional architectures. The main
idea is to add connections from early to deep layers, which can be viewed as a form of
multiscale feature fusion, where low-level features are combined with highly-semantic
representations from deep layers.
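To make the skip-connection idea concrete, the following PyTorch sketch shows a SharpMask-style top-down refinement block that fuses a low-level feature map with an upsampled deep feature map. The module names and channel sizes are illustrative assumptions, not the configuration used in the paper.

```python
# Minimal PyTorch sketch of a SharpMask-style top-down refinement block.
# Channel sizes and module names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefinementBlock(nn.Module):
    """Fuses a low-level skip feature map with an upsampled deep feature map."""
    def __init__(self, skip_channels, deep_channels, out_channels):
        super().__init__()
        self.reduce_skip = nn.Conv2d(skip_channels, out_channels, kernel_size=3, padding=1)
        self.reduce_deep = nn.Conv2d(deep_channels, out_channels, kernel_size=3, padding=1)
        self.fuse = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, skip_feat, deep_feat):
        # Upsample the deep, low-resolution map to the skip map's resolution.
        deep_up = F.interpolate(deep_feat, size=skip_feat.shape[2:],
                                mode="bilinear", align_corners=False)
        # Combine low-level detail with high-level semantics, then refine.
        fused = F.relu(self.reduce_skip(skip_feat) + self.reduce_deep(deep_up))
        return F.relu(self.fuse(fused))

# Example: merge a 64-channel early feature map (128x128) with a
# 256-channel deep feature map (32x32).
block = RefinementBlock(skip_channels=64, deep_channels=256, out_channels=64)
out = block(torch.randn(1, 64, 128, 128), torch.randn(1, 256, 32, 32))
print(out.shape)  # torch.Size([1, 64, 128, 128])
```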
In this work, we use an SM architecture that has been shown superior to the regular
FCNs for thoracic CT segmentation [12]. The CRF refinement is done subsequently
with the CRFasRNN architecture, which formulates the mean-field approximation using differentiable (back-propagatable) operations [15], allowing the CRF to be part of the network (instead of a separate post-processing step) and even to learn some of its parameters. Thus, a new training pass is performed to fine-tune the weights learned in the first step and also to learn some parameters of the CRF [12]. In the second deep architecture, described below, the initial segmentation results of the surrounding organs produced by the first deep architecture are used to refine the segmentation of each target organ separately.
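As a rough illustration of the CRF refinement step, the sketch below runs a few mean-field iterations of a dense CRF using only a spatial Gaussian kernel. The actual CRFasRNN layer [15] also applies bilateral (intensity-dependent) filtering and learns the kernel and compatibility weights end-to-end, so this is a simplified approximation rather than the implementation used here.

```python
# Simplified, spatial-only mean-field iteration for a dense CRF, intended only
# to illustrate the idea behind CRFasRNN [15]; the real layer also applies
# bilateral filtering and learns its compatibility weights end-to-end.
import numpy as np
from scipy.ndimage import gaussian_filter

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mean_field_refine(unary_logits, n_iters=5, sigma=3.0, pairwise_weight=1.0):
    """unary_logits: (L, H, W) per-class scores from the first network."""
    n_labels = unary_logits.shape[0]
    # Potts-model compatibility: penalize disagreement between distinct labels.
    compat = pairwise_weight * (1.0 - np.eye(n_labels))
    q = softmax(unary_logits, axis=0)
    for _ in range(n_iters):
        # Message passing: smooth each label map with a Gaussian spatial kernel.
        msg = np.stack([gaussian_filter(q[l], sigma=sigma) for l in range(n_labels)])
        # Compatibility transform, then local update with the unary scores.
        pairwise = np.tensordot(compat, msg, axes=([1], [0]))
        q = softmax(unary_logits - pairwise, axis=0)
    return q

refined = mean_field_refine(np.random.randn(5, 160, 160))  # 5 classes: bg + 4 OARs
```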
2.2 Learning anatomical constraints
The second deep architecture, using SharpMask, is trained to distinguish between back-
ground and each target organ under separate refinement. This architecture has two sets
of inputs, i.e., 1) the original CT image and 2) the initial segmentation results of the
neighbouring organs around the target organ under refinement of segmentation. The
main difference of this second deep architecture, compared to the first deep architecture
with multiple output channels representing different organs and background, is that it
only has two output channels in the last layer, i.e., a probability map representing each
voxel belonging to background or a target organ under refinement of segmentation. The
basic assumption is that the second deep architecture will learn the anatomical con-
straints around the target organ under refinement of segmentation and thus help to
produce better segmentation for the target organ. In Fig. 3 we show the full framework
with both the first deep architecture (top) and the second deep architecture.
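The sketch below illustrates, under an assumed class ordering and tensor layout, how the inputs of this second network can be assembled for one target organ: the CT slice concatenated with the first network's probability maps of the other organs only.

```python
# Sketch of assembling the inputs of the second (refinement) network for one
# target organ: the CT slice plus the first network's probability maps of the
# *other* organs. Class ordering and tensor layout are assumptions.
import numpy as np

CLASSES = ["background", "esophagus", "heart", "aorta", "trachea"]

def build_refinement_input(ct_slice, stage1_probs, target_organ):
    """ct_slice: (H, W); stage1_probs: (5, H, W) softmax output of network 1."""
    target_idx = CLASSES.index(target_organ)
    # Keep only the neighbouring-organ maps (drop background and the target itself),
    # so the network sees complementary anatomical context, not its own answer.
    neighbour_idx = [i for i in range(1, len(CLASSES)) if i != target_idx]
    context = stage1_probs[neighbour_idx]                  # (3, H, W)
    x = np.concatenate([ct_slice[None], context], axis=0)  # (4, H, W) network input
    # The refinement network itself outputs 2 channels: background vs. target organ.
    return x

x = build_refinement_input(np.zeros((160, 160)), np.random.rand(5, 160, 160), "esophagus")
print(x.shape)  # (4, 160, 160)
```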
Note that our framework (using two deep architectures) shares some similarity with
a refinement step, called AutoContext Model (ACM) [13], which has been successfully
applied to brain image segmentation [14], by using traditional classifiers such as Ran-
dom Forests. The main idea of ACM is to iteratively refine the posterior probability
Fig. 3: Proposed architecture for multi-organ segmentation. The core SharpMask net-
work is detailed on the right. Numbers indicate the number of channels at each layer.
maps over the labels, given not only the input features, but also the previous proba-
bilities of a large number of context locations which provide information about neigh-
boring organs, forcing the deep network to learn the spatial constraints for each target organ [14]. In practice, this translates to training several classifiers iteratively, where each
classifier is trained not only with the original image data, but also with the probability
maps obtained from the previous classifier, which gives additional context information
to the new classifier. Comparing our proposed framework with the ACM, we use a deep
architecture. Overall, our method has three advantages: 1) it can avoid the design of
hand-crafted features, 2) our network can automatically learn the selection of context
features, and 3) our method uses less redundant information. Note that, in the classical
ACM, the selection of these features must be hard-coded; that is, the algorithm designer
has to select a specific number of sparse locations (e.g., sparse points along rays at
different angles from a center point [13]), which makes the context information limited
within a certain range by the algorithm designer. On the other hand, in our method,
the context information can be automatically learned by the deep network, and limited
only by the receptive field of the network (which can even be the whole range of the
image in deep networks). Regarding the redundancy, ACM uses the probability maps
of all organs as input, which are often very similar to the ground-truth label maps. In that case, the ACM is not able to further refine the results. In our method, we use only
the complementary information, the label probability maps of the neighboring organs
around the target organ under refinement of segmentation.
3 Experiments
In our implementation, the full slices of the CT scan are the inputs to our proposed
framework. Both the first and second architectures use large filters (i.e., 7×7, or 7×7×7),
as large filters have been shown beneficial for CT segmentation [2]. Both 2D and 3D
settings are implemented in our study. We have found that, unlike in MRI segmentation [7], small patches are not able to produce good results for CT segmentation. Thus, we use patches of size 160 × 160 × 48 as the training samples. Specifically, we
first build a 3D mesh model for each organ in all the training CT images, and then de-
fine each vertex as the mean of a certain Gaussian distribution with diagonal covariance,
from which we can sample points as the centers of the respective patches. In this way,
the training samples will contain important boundary information and also background
information. In particular, the diagonal elements are set to 5, so that the kernel, when centered on the boundary, still covers the sampled points. In addition, since it is also important to sample inside the organs, we also sample centers on a uniform grid.
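The following sketch illustrates this patch-center sampling strategy: Gaussian perturbations (diagonal covariance with value 5) around organ-surface vertices, plus a uniform grid inside the volume. The use of marching cubes for surface extraction, the grid step, and the number of samples per vertex are illustrative assumptions.

```python
# Sketch of the patch-centre sampling described above: centres drawn from
# Gaussians (diagonal covariance = 5) placed at organ-surface vertices, plus a
# regular grid. Mesh extraction via marching cubes and the grid step are
# illustrative assumptions.
import numpy as np
from skimage.measure import marching_cubes

def sample_patch_centers(label_volume, organ_label, n_per_vertex=1, grid_step=48, var=5.0):
    # Surface vertices of the organ, used as the Gaussian means.
    verts, _, _, _ = marching_cubes(
        (label_volume == organ_label).astype(np.float32), level=0.5)
    rng = np.random.default_rng(0)
    boundary_centers = rng.normal(loc=np.repeat(verts, n_per_vertex, axis=0),
                                  scale=np.sqrt(var))
    # Uniform grid so that the interior of organs is also covered.
    zz, yy, xx = np.meshgrid(*[np.arange(0, s, grid_step) for s in label_volume.shape],
                             indexing="ij")
    grid_centers = np.stack([zz.ravel(), yy.ravel(), xx.ravel()], axis=1)
    return np.vstack([boundary_centers, grid_centers])
```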
3.1 Dataset and pre-processing
The dataset used in this paper contains 30 CT scans, each from a patient with lung cancer or Hodgkin lymphoma, and 6-fold cross-validation is performed. Manual segmentations of the four OAR are available for each CT scan, along with the body contour (which can be used to remove background voxels during training). The scans have 512 × 512 × (150–284) voxels with a resolution of 0.98 × 0.98 × 2.5 mm³. For each CT scan, its intensities are
normalized to have zero mean and unit variance, and it is also augmented to generate
more CT samples (for improving the robustness of training) through a set of random
affine transformations and random deformation fields (generated with B-spline interpo-
lation [6]). In particular, an angle between −5 and 5 degrees and a scale factor between 0.9 and 1.1 were randomly selected for each CT scan to produce the random affine transformation. These values were selected empirically, aiming to produce realistic CT scans
similar to those of the available dataset.
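A minimal sketch of the per-scan intensity normalization and random affine augmentation is given below; the rotation plane, interpolation orders, and the omission of the B-spline deformation-field augmentation are assumptions made for illustration.

```python
# Sketch of per-scan intensity normalisation and random affine augmentation
# (rotation in [-5, 5] degrees, scaling in [0.9, 1.1]) described above. The
# B-spline deformation-field augmentation is omitted; the axial rotation plane
# and interpolation orders are assumptions.
import numpy as np
from scipy.ndimage import affine_transform, rotate

def normalize(ct):
    # Zero mean, unit variance per scan.
    return (ct - ct.mean()) / (ct.std() + 1e-8)

def random_affine(volume, labels, rng=np.random.default_rng()):
    angle = rng.uniform(-5.0, 5.0)
    scale = rng.uniform(0.9, 1.1)
    # Rotate in the axial plane; nearest-neighbour interpolation for labels.
    vol = rotate(volume, angle, axes=(1, 2), reshape=False, order=1)
    lab = rotate(labels, angle, axes=(1, 2), reshape=False, order=0)
    # Isotropic scaling about the volume centre.
    center = 0.5 * np.array(volume.shape)
    matrix = np.eye(3) / scale
    offset = center - matrix @ center
    vol = affine_transform(vol, matrix, offset=offset, order=1)
    lab = affine_transform(lab, matrix, offset=offset, order=0)
    return normalize(vol), lab
```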
3.2 Training
For organ segmentation in CT images, the data samples are often highly imbalanced,
i.e., with more background voxels than the target organ voxels. This needs to be con-
sidered when computing the loss function in the training. We utilize a weighted cross-
entropy loss function, where each weight is calculated as the complement of the proba-
bility of each class. In this way, more importance is given to small organs, and each class contributes to the loss function in a more balanced way. We have found that this loss function leads to better performance than using a regular (equally-weighted) loss function. However, the results still do not reach the expected level. For further
improvement, we use our above-obtained weights as initialization for the network, and
then fine-tune them by using the regular cross-entropy loss. This two-stage strategy consistently outperforms using either the weighted or the regular cross-entropy loss alone. In our
optimization, stochastic gradient descent is used as the optimizer, with an initial learning
rate of 0.1 that is divided by 10 every 20 epochs, and the network weights are initialized
with the Xavier algorithm [3].
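The sketch below summarizes this training setup: class weights taken as the complement of each class's empirical frequency, Xavier initialization, and SGD with the stated learning-rate schedule. The choice of PyTorch and the exact way class statistics are gathered are assumptions; the fine-tuning pass would then reuse the learned weights with the regular (unweighted) cross-entropy.

```python
# Sketch of the training setup described above: class weights as the complement
# of each class's frequency, Xavier init, SGD with lr 0.1 divided by 10 every
# 20 epochs. Framework choice (PyTorch) and weight computation are assumptions.
import torch
import torch.nn as nn

def class_weights_from_labels(labels, n_classes=5):
    """labels: LongTensor of voxel labels; weight = complement of class frequency."""
    counts = torch.bincount(labels.flatten(), minlength=n_classes).float()
    freq = counts / counts.sum()
    return 1.0 - freq  # rare classes (small organs) get weights close to 1

def xavier_init(module):
    if isinstance(module, (nn.Conv2d, nn.Conv3d, nn.Linear)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

def make_training_objects(model, labels_for_stats, weighted=True):
    model.apply(xavier_init)
    weight = class_weights_from_labels(labels_for_stats) if weighted else None
    criterion = nn.CrossEntropyLoss(weight=weight)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
    return criterion, optimizer, scheduler

# A second fine-tuning run would reuse the learned network weights with
# weighted=False, i.e. the regular (unweighted) cross-entropy loss.
```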
3.3 Results
In Fig. 4, we illustrate the improvement on the esophagus segmentation by using our
proposed framework with learned anatomical constraints. The last column shows the
results using the output of the first network as anatomical constraints. We can see how
the anatomical constraints can help produce a more accurate result on the segmentation
of the esophagus, even when air is present inside it (black voxels inside the esophagus).
Interestingly, the results obtained by using the output of the first network or the ground-
truth manual labels as anatomical constraints are very similar, almost with negligible
differences. Similar conclusions can also be drawn for segmentations of other organs.
In Fig. 5, we show the segmentation results for the aorta, trachea, and heart, with and
without anatomical constraints. In the cases of segmenting the aorta and trachea, the
use of anatomical constraints improves the segmentation accuracy. For the trachea, our
network is able to generalize and segment the whole portion on the right lung side (i.e., the left side of the image), even though it was only partially segmented in the manual ground truth.
On the other hand, for the heart, there are some false positives when using anatomical
constraints, as can be seen in the third column. However, accurate contours are obtained,
which are even better than those obtained without anatomical constraints, as can be seen
in the fourth column.
Fig. 4: Segmentation results for the esophagus. a) Input data to the second network,
with the anatomical constraints overlapped; results using b) only the first network with-
out anatomical constraints, c) manual labels on the neighboring organs as anatomical
constraints, d) the output of the first network as anatomical constraints.
In Table 1, we report the Dice ratio (DR) obtained using each of the comparison
methods, including a state-of-the-art 3D multi-atlas patch-based method (OPAL: Optimized PatchMatch for Near Real Time and Accurate Label Fusion [10]) and dif-
ferent deep architecture variants using 2D or 3D as well as different combinations of
strategies. Specifically, SM2D and SM3D refer to the use of Network 1 in Fig. 3 in 2D or 3D, respectively. We also tested their refinement with ACM and CRF, and
finally, the proposed framework is denoted as SM2D+Constraints. As OPAL mainly
compares patches for guiding the segmentation, OPAL should be effective in segmenting clearly observable organs, such as the trachea (an identifiable black area), which the table confirms. However, for organs with either low contrast or large intensity variation across slices and subjects, which is the case for the esophagus, its performance is seriously affected, as the table shows. The highest perfor-
mance for each organ is obtained by the SM2D-based architectures, while none of the 3D-based architectures improves the segmentation performance. This is possibly due to
Fig. 5: Segmentation without (1st row) and with (2nd row) anatomical constraints.
Green contours denote manual ground-truths, and red contours denote our automatic
segmentation results. Right panel shows the 3D rendering for our segmented four or-
gans, i.e., aorta (blue), heart (beige), trachea (brown), and esophagus (green).
the large slice thickness of the CT scans, as also noticed in [1], where the authors preferred to handle the third dimension with recurrent neural networks instead of 3D convolutions. Another observation is that the ACM model is not able to outperform the CRF refinement. We believe this is mainly because the CRF we use is fully connected, rather than restricted to neighboring regions; a neighborhood-based CRF was the baseline used in the ACM work [13] to argue that its advantage comes from the range of context information the framework can reach. On the other hand, our proposed framework improves the performance for all organs except the heart, whose quantitative results are very similar to those obtained by the first network alone; the heart is already well segmented by the first network, which leverages its large size and the good image contrast around it. Even so, the contours obtained with the proposed framework are of better quality, as shown in Fig. 5. Although room for improvement remains for the esophagus (mean DR of 0.69), the experimental results show that our proposed framework does bring an improvement compared to the other methods.
Table 1: Comparison of mean DR ± stdev obtained by the different methods. The last column is our proposed framework (SM2D+Constraints).

Organ   OPAL       SM3D       SM3D+ACM   SM2D       SM2D+CRF   SM2D+ACM   SM2D+Constraints
Esoph.  0.39±0.05  0.55±0.08  0.56±0.05  0.66±0.08  0.67±0.04  0.67±0.04  0.69±0.05
Heart   0.62±0.07  0.77±0.05  0.83±0.02  0.89±0.02  0.90±0.01  0.91±0.01  0.90±0.03
Trach.  0.80±0.03  0.71±0.06  0.82±0.03  0.83±0.06  0.82±0.06  0.79±0.06  0.87±0.02
Aorta   0.49±0.10  0.79±0.06  0.77±0.04  0.85±0.06  0.86±0.05  0.85±0.06  0.89±0.04
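For reference, the Dice ratio (DR) reported per organ in Table 1 can be computed with a straightforward NumPy implementation such as the one below; the handling of empty masks is an assumption.

```python
# Dice ratio (DR) between a predicted label volume and the manual ground truth
# for one organ label; returning 1.0 for two empty masks is an assumption.
import numpy as np

def dice_ratio(pred, gt, label):
    p = (pred == label)
    g = (gt == label)
    denom = p.sum() + g.sum()
    return 2.0 * np.logical_and(p, g).sum() / denom if denom > 0 else 1.0
```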
4 Conclusions
We have proposed a novel framework for joint segmentation of OAR in CT images.
It provides a way to learn the relationship between organs which can give anatomical
contextual constraints in the segmentation refinement procedure to improve the perfor-
mance. Our proposed framework includes two collaborative architectures, both based
on the SharpMask network, which allows for effective combination of low-level fea-
tures and deep highly-semantic representations. The main idea is to implicitly learn the
spatial anatomical constraints in the second deep architecture, by using the initial segmentations of all organs (except the target organ under refinement of segmentation) from the
first deep architecture. Our experiments have shown that the initial segmentations of the
surrounding organs can effectively guide the refinement of segmentation of the target
organ. An interesting observation is that our network is able to automatically learn spa-
tial constraints, without specific manual guidance.
Acknowledgment This work is co-financed by the European Union with the Euro-
pean regional development fund (ERDF, HN0002137) and by the Normandie Regional
Council via the M2NUM project.
References
1. J. Chen, L. Yang, Y. Zhang, M. Alber, D. Z. Chen: Combining fully convolutional and recurrent neural networks for 3D biomedical image segmentation. In: NIPS. pp. 3036–44 (2016)
2. Q. Dou, H. Chen, Y. Jin, L. Yu, J. Qin, P. Heng: 3D deeply supervised network for automatic liver segmentation from CT volumes. CoRR abs/1607.00582 (2016)
3. X. Glorot, Y. Bengio: Understanding the difficulty of training deep feedforward neural net-
works. In: AISTATS (2010)
4. M. Han, J. Ma, Y. Li, M. Li, Y. Song, Q. Li: Segmentation of organs at risk in CT volumes of head, thorax, abdomen, and pelvis. In: Proc. SPIE. vol. 9413, pp. 94133J–6 (2015)
5. J. Long, E. Shelhamer, T. Darrell: Fully convolutional networks for semantic segmentation.
In: CVPR (2015)
6. F. Milletari, N. Navab, S. Ahmadi: V-net: Fully convolutional neural networks for volumetric
medical image segmentation. CoRR abs/1606.04797 (2016)
7. D. Nie, L. Wang, Y. Gao, D. Shen: Fully convolutional networks for multi-modality isoin-
tense infant brain image segmentation. In: ISBI. pp. 1342–1345 (2016)
8. P. H. O. Pinheiro, T. Lin, R. Collobert, P. Dollár: Learning to refine object segments. CoRR abs/1603.08695 (2016)
9. O. Ronneberger, P. Fischer, T. Brox: U-net: Convolutional networks for biomedical image
segmentation. In: MICCAI. LNCS, vol. 9351, pp. 234–241 (2015)
10. V.-T. Ta, R. Giraud, D. L. Collins, P. Coupé: Optimized PatchMatch for near real time and accurate label fusion. In: MICCAI. pp. 105–112 (2014)
11. T. Tong, et al.: Discriminative dictionary learning for abdominal multi-organ segmentation.
Medical Image Analysis 23, 92–104 (2015)
12. R. Trullo, C. Petitjean, S. Ruan, B. Dubray, D. Nie, D. Shen: Segmentation of organs at risk
in thoracic CT images using a sharpmask architecture and conditional random fields. In: ISBI
(2017)
13. Z. Tu, X. Bai: Auto-context and its application to high-level vision tasks and 3d brain image
segmentation. IEEE Trans on Pattern Anal Mach Intell 32(10), 1744–1757 (2010)
14. L. Wang, et al.: Links: Learning-based multi-source integration framework for segmentation
of infant brain images. NeuroImage 108, 160 – 172 (2015)
15. S. Zheng, et al.: Conditional random fields as recurrent neural networks. In: ICCV (2015)