Medical Image Analysis (2023)
Multi-site, Multi-domain Airway Tree Modeling (ATM’22): A Public Benchmark for
Pulmonary Airway Segmentation
Minghui Zhanga,1, Yangqian Wua,1, Hanxiao Zhanga,1 , Yulei Qina,1, Hao Zhenga,1 , Wen Tangc,2, Corey Arnoldq,2 , Chenhao Peic,2,
Pengxin Yuc,2, Yang Nand,2, Guang Yangd,2, Simon Walshd,2, Dominic C. Marshallp,2, Matthieu Komorowskip,2, Puyang Wange,2,
Dazhou Guof,2, Dakai Jinf,2 , Ya’nan Wug,2, Shuiqing Zhaog,2, Runsheng Changg,2 , Boyu Zhangh,2, Xing Lvi,2 , Abdul Qayyumj,2,
Moona Mazherr,2, Qi Suk,2, Yonghuang Wul,2, Ying’ao Lium,2, Yufei Zhun,2, Jiancheng Yangn,o,2, Ashkan Pakzadt,2, Bojidar
Rangelovu,2, Raul San Jose Estepars,2 , Carlos Cano Espinosas,2, Jiayuan Sunb,1 , Guang-Zhong Yanga,1,∗, Yun Gua,1,∗
aInstitute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, 200240, China
bDepartment of Respiratory and Critical Care Medicine, Department of Respiratory Endoscopy, Shanghai Chest Hospital, Shanghai, China
cInferVision Medical Technology Co., Ltd., Beijing, China
dImperial College London, London, UK
eAlibaba DAMO Academy, 969 West Wen Yi Road, Hangzhou, Zhejiang, China
fAlibaba DAMO Academy USA, 860 Washington Street, 8F, New York, USA
gCollege of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China
hA.I. R&D Center, Sanmed Biotech Inc., No. 266 Tongchang Road, Xiangzhou District, Zhuhai, Guangdong, China
iA.I. R&D Center, Sanmed Biotech Inc., T220 Trade St., San Diego, CA, USA
jENIB, UMR CNRS 6285 LabSTICC, Brest, 29238, France
kShanghai Jiao Tong University, Shanghai, China
lSchool of Information Science and Technology, Fudan University, Shanghai, China
mUniversity of Science and Technology of China, Hefei, Anhui, China
nDianei Technology, Shanghai, China
oEPFL, Lausanne, Switzerland
pDepartment of Surgery and Cancer, Imperial College London, London, UK
qUniversity of California, Los Angeles, CA, USA
rDepartment of Computer Engineering and Mathematics, University Rovira I Virgili, Tarragona, Spain
sBrigham and Women’s Hospital, Harvard Medical School, Somerville, MA 02145, USA
tMedical Physics and Biomedical Engineering Department, University College London, London, UK
uCenter for Medical Image Computing, University College London, London, UK
ARTICLE INFO
Keywords: Pulmonary Airway Segmentation, Traditional and Deep-Learning Methods, Topological Prior Knowledge.
ABSTRACT
Open international challenges have become the de facto standard for assessing computer
vision and image analysis algorithms. In recent years, new methods have pushed
pulmonary airway segmentation ever closer to the limit of image resolution. Since the
EXACT’09 airway segmentation challenge, however, limited effort has been directed to
the quantitative comparison of newly emerged algorithms, despite the maturity of deep
learning based approaches and extensive clinical efforts to resolve finer details of
distal airways for early intervention in pulmonary diseases. Thus far, publicly
available annotated datasets are extremely limited, hindering the development of
data-driven methods and the detailed performance evaluation of new algorithms. To
provide a benchmark for the medical imaging community, we organized the Multi-site,
Multi-domain Airway Tree Modeling challenge (ATM’22), which was held as an official
challenge event during the MICCAI
∗Corresponding authors.
e-mail: gzyang@sjtu.edu.cn (Guang-Zhong Yang), geron762@sjtu.edu.cn (Yun Gu)
1Belongs to the ATM’22 Organizers.
2Belongs to the ATM’22 Participants.
2022 conference. ATM’22 provides large-scale CT scans with detailed pulmonary air-
way annotation, including 500 CT scans (300 for training, 50 for validation, and 150 for
testing). The dataset was collected from different sites and it further included a portion
of noisy COVID-19 CTs with ground-glass opacity and consolidation. Twenty-three
teams participated in the entire phase of the challenge and the algorithms for the top ten
teams are reviewed in this paper. Both quantitative and qualitative results revealed that
deep learning models embedded with the topological continuity enhancement achieved
superior performance in general. The ATM’22 challenge remains an open call: the
training data and the gold-standard evaluation are available upon successful
registration via its homepage (https://atm22.grand-challenge.org/).
©2023 Elsevier B. V. All rights reserved.
1. Introduction
1.1. Background
Deep learning methods are reshaping the general practice of
image segmentation. In addition to novel network designs, the
performance of these algorithms is largely dependent on the
scale of the training data set and clinical accuracy of the annota-
tion used. For fair assessment of these algorithms, many grand-
challenges have been organized, focusing on organs including
the brain (Mendrik et al., 2015), abdominal multi-organs (Ma
et al., 2021), heart (Zhuang et al., 2019), skin lesion (Codella
et al., 2018a) and breast cancer (Aresta et al., 2019).
For pulmonary airway segmentation, limited attention has
been paid since the EXACT’09 challenge (Lo et al., 2012).
Clinically, accurate segmentation of the pulmonary airway
based on Computed Tomography (CT) is the prerequisite to
the diagnosis and treatment of small airway diseases. It also
plays an important role for pre-operative planning and intra-
operative guidance of minimally invasive endobronchial inter-
ventions. With increasing miniaturization of bronchoscopes
empowered by robot assistance, small branches beyond the the
5th generation of airways are routinely treated. Due to the
fine-grained pulmonary airway structure further complicated
by complex bifurcating topology, manual annotation is time-
consuming, error-prone, and requires a high level of clinical
skills.
As an example, Fig.1 presents a pulmonary airway structure
with different levels of annotations: The branch-wise anatom-
ical airway blended with the original CT is shown in Fig.1(a)
where the generations of bronchi are also annotated. Typi-
cal workflow involves the following steps: The binary mask
of airway is first segmented based on CT images as shown in
Fig.1(b); Based on the binary mask, the skeleton or centreline
of the airway in Fig.1(c) can be extracted via the morphological
operations; By detecting the branching points and ending points
as shown in Fig.1(d), the generations of airway can be then de-
termined. Since the morphology of the small distal bronchi can
be fine-grained, it is challenging to delineate the airways from
scratch for each patient. To expedite the exploration of the air-
ways, automatic airway segmentation algorithms are in high de-
mand clinically.
1.2. Challenges of Pulmonary Airway Segmentation
Detailed pulmonary airway segmentation traditionally works at
the level of the trachea and bronchi, and would ideally reach all
the way to the alveoli should the imaging resolution permit.
However, acquiring the fine-grained airway tree structure is
practically difficult. The main challenges involve the following
aspects.
Challenge 1: Leakage (C1). The leakage phenomenon, as
seen in Figure 2.a) (excerpted from (Charbonnier et al., 2017)),
is a common problem for pulmonary airway segmentation
algorithms. Leakage usually occurs on small airway branches or
in lesion-surrounding areas (e.g., emphysema and bronchiectasis)
(Pu et al., 2012).
The segmentation methods are likely to leak into the adjacent
lung parenchyma through blurred airway walls or soft bound-
aries due to the highly variable intensity levels in the lumen
area. The traditional methods suffer the leakage problem more
severely than the deep learning methods since they usually
function on the low-level features of images. For example,
the intensity-based region growing methods often leak to lung
parenchyma through blurred/broken boundaries at small air-
ways. Rule-based method (Sonka et al., 1996) and morphology-
based method (Aykac et al., 2003) encounter similar problems
as well. The deep learning methods can extract semantic high-
level features that are discriminative to airways, which allevi-
ates the leakage problem. Unlike the massive leakage that occurs
in traditional methods, the leakage encountered by deep learning
methods typically takes the form of predictions that are thicker
than the ground truth. This phenomenon is defined as gradient
dilation (Zheng et al., 2021b) and arises when larger weights are
assigned to the peripheral bronchi.
Challenge 2: Breakage (C2). The breakage phenomenon
refers to the discontinuity of segmented structures due to noise
or errors, as seen in Figure 2.b) (adapted from (Zheng et al.,
2021b)). Breakage induces only marginal voxel-level errors;
however, the topological structure is completely changed after
the largest connected component is extracted. The breakage
problem is detrimental to the airway segmentation task because
only the largest connected component of the airway result is
useful for bronchoscopic-assisted surgery, and the presence of
breakages causes interrupted trajectories. Different from the
leakage problem, the deep learning
Fig. 1. The hierarchical illustration of the pulmonary airway structure. a) represents the branch-wise anatomical airway, overlapping with the original
CT. b) and c) represent the binary airway and the centerline, respectively. The branch points (green) and end points (yellow) of the airway tree can be seen
in d). Best viewed in color.
method is more likely to generate breakages than conventional
methods. Conventional methods usually rely on intensity
constraints, so connectivity can be guaranteed, especially for
region-growing algorithms. However, as discussed in
(Nadeem et al., 2020), the intensity of the airway wall varies
significantly from proximal to distal sites; consequently,
region-growing methods only function reliably in the trachea
and main bronchi. Breakages in deep learning methods can be
ascribed to three aspects. First,
the intrinsic class imbalance distribution (Zheng et al., 2021b;
Zhang et al., 2022a) adds difficulty in extracting the whole airway
tree structure with good connectivity.

Fig. 2. Four main challenges of the pulmonary airway segmentation task.
a) Leakage. b) Breakage. c) Robustness and generalization. d) Beyond
pixel-wise extraction.

The class imbalance includes two types of imbalance: inter-class
imbalance and intra-class imbalance. Inter-class imbalance
means that the number of the airway voxels is far fewer than
that of background, and the intra-class refers to the relative total
volume difference of trachea, principal bronchi, lobar bronchi,
and distal segmental bronchi. Such imbalanced distribution in-
fluences the data-driven deep learning methods, leading to the
breakages of peripheral bronchi. The uncertainty of the airway
lumen is the second aspect that may cause breakage. The un-
certainty includes low contrast, complex topological structures,
and imaging noise. Third, overlap-based loss functions, e.g.,
the Dice loss (Milletari et al., 2016), are widely used for
medical image segmentation tasks; however, they cannot guaran-
tee topological accuracy under a severe intra-class imbalance
distribution. It is known that deep learning models trained with
class-imbalanced data may perform poorly in the minor classes
with scarce training data (Buda et al., 2018; Liu et al., 2019).
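Because only the largest connected component is retained for downstream use, even a single breakage can discard an entire subtree. As a minimal illustrative sketch (not part of the challenge's official tooling; the function name and 26-connectivity choice are our own assumptions), the largest connected component of a binary prediction can be extracted as follows:

```python
import numpy as np
from scipy import ndimage

def largest_connected_component(pred: np.ndarray) -> np.ndarray:
    """Keep only the largest 26-connected component of a binary 3D airway mask."""
    structure = np.ones((3, 3, 3), dtype=bool)          # 26-connectivity in 3D
    labeled, num = ndimage.label(pred > 0, structure=structure)
    if num == 0:                                        # empty prediction
        return np.zeros_like(pred, dtype=bool)
    component_sizes = ndimage.sum(pred > 0, labeled, index=np.arange(1, num + 1))
    return labeled == (np.argmax(component_sizes) + 1)
```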
Challenge 3: Robustness and Generalization (C3). Diseases
such as bronchiectasis, emphysema, and COVID-19 can alter the
airway morphology or the characteristics of CT images. Here, we
put our emphasis on the COVID-19 pandemic. As seen in Figure
2.c), normal CT scans can be categorized into the clean domain,
where the airway lumen is relatively explicit. In contrast,
COVID-19 CT scans can be deemed the noisy domain since they
introduce biasing attributes, e.g., bilaterally scattered
irregular patches of ground-glass opacity, thickening of
inter-lobular or intra-lobular septa, and consolidation.
Preliminary experiments (Zhang et al., 2021a, 2022b) have
demonstrated that models trained on the clean domain generalize
poorly to the noisy CTs. Improving the robustness and
generalization ability across domains is therefore also critical
when measuring the performance of airway segmentation algorithms.
Challenge 4: Beyond Pixel-wise Extraction (C4). Currently,
the airway tree modeling task is treated as a pixel- or
voxel-wise segmentation task. One critical purpose of airway
tree modeling is the navigation of bronchoscopic-assisted
surgery; however, there is currently a gap between the two. CT
values are discrete signals, and the airway predictions obtained
by CNNs are likewise dense, discrete volumes. Volume rendering
algorithms are necessary to acquire continuous results (e.g., a
mesh). Topological relations are weak in voxel-wise data but
very strong in continuous data. As seen in Figure 2.d), the
curved centerline of the airways can be extracted on the Voronoi
diagram via the Eikonal equation, and further geometric
attributes of the airway, including bifurcations, radii, and
centerline directions, can then be derived. Hence, we envision a
future airway tree modeling paradigm that takes discrete CT
scans as input and outputs continuous airway results. This is
challenging because it requires developing a novel methodology
that goes beyond pixel-wise extraction.
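As a simplified sketch of how such geometric attributes can be approximated from a voxel-wise prediction (using plain skeletonization and a Euclidean distance transform rather than the Voronoi/Eikonal formulation described above; the function is hypothetical, and skeletonize handles 3D input only in recent scikit-image versions):

```python
import numpy as np
from scipy import ndimage
from skimage.morphology import skeletonize

def centerline_and_radius(mask: np.ndarray, spacing=(1.0, 1.0, 1.0)):
    """Approximate the airway centerline and a per-centerline-voxel radius estimate."""
    mask = mask.astype(bool)
    centerline = skeletonize(mask)                                   # discrete medial axis
    dist_to_wall = ndimage.distance_transform_edt(mask, sampling=spacing)
    radius = np.where(centerline, dist_to_wall, 0.0)                 # lumen radius sampled on the centerline
    return centerline, radius
```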
Our organized ATM’22 focuses on the above challenges. The
large-scale dataset with full annotation encourages participants
to develop novel methods to harness the intrinsic topological
airway tree knowledge and achieve remarkable segmentation
performance.
1.3. Limitation of Previous Datasets
As reported in Table 1, the main drawbacks of recent representative
airway segmentation works lie in the limited scale of the datasets
and incomplete evaluation metrics. Each of these benchmark datasets
contains fewer than one hundred scans, which is not sufficient for
training deep learning models, and the models are prone to
overfitting due to the small training sets. Furthermore, these
small datasets still need to be split into non-overlapping
training/validation/test sets; consequently, the validation set,
containing only a handful of samples, is inadequate to guarantee
the robustness and generalization ability of the trained models.
In addition, the in-house datasets and incomplete evaluation
metrics degrade the transparency of the results and hinder fair
comparison among the various methods.
Considering the public benchmarks, the most famous air-
way segmentation challenge in the past few decades is the ’Ex-
traction of Airways From CT’ (EXACT’09) organized by (Lo
et al., 2012). It was held at the Second International Work-
shop on Pulmonary Image Analysis, in conjunction with the
12th International Conference on Medical Image Computing
and Computer-Assisted Intervention (MICCAI 2009)3. The
EXACT’09 provided 40 CT scans, the first 20 scans were des-
ignated as the training set and the remaining 20 scans were
set as the testing set to evaluate different algorithms. It also
provided a platform4 for comparing airway extraction algorithms
with standard evaluation metrics. Since machine-learning and
deep-learning methods were not yet the dominant methods in the
early 2000s, the majority of the algorithms were based on
traditional image processing, such as morphological filtering
(Irving et al., 2009; Fetita et al., 2009) and region growing
(Pinho et al., 2009; Feuerstein et al., 2009; Wiemker et al.,
2009). The EXACT’09 challenge aimed to promote the development of
automatic airway segmentation algorithms; however, these methods
often fail to extract the smaller peripheral bronchi due to the
lack of robust features. Further, the EXACT’09 challenge did not
publish the manual airway annotations, and the number of training
samples is limited, which is unfavorable for the burgeoning
data-driven deep-learning methods.

3 https://www.lungworkshop.org/2009/index.html
4 http://image.diku.dk/exact/
Therefore, we organize the Multi-site, Multi-domain Air-
way Tree Modeling (ATM’22) Challenge, which was held
in conjunction with MICCAI 2022. Our challenge aims
to revolutionize the pulmonary airway segmentation task
compared with the EXACT’09 from three aspects:
1) More Annotated Data. The EXACT’09 only provided 40
CT scans without airway labels. In ATM’22, we collected 500
CT scans with elaborated airway labels, each delineated by
three experienced radiologists. We believe that a large num-
ber of CT scans and airway labels could boost the development
of robust airway segmentation algorithms based on deep neural
networks.
Compared with recent datasets adopted in deep-learning
methods, as seen in Table 1, ATM’22 expands the scale of the
dataset with a record number of cases. The ATM’22 dataset is
split into 300/50/150 scans for training, validation, and
testing. Current deep-learning methods are mainly data-driven,
and a large training set is critical to obtaining robust models.
A large validation set (larger than the entire datasets used in
previous work) helps avoid over-fitting. The test set contains
150 CT scans that are inaccessible to the participants.
Only Docker-based submissions are accepted for evaluating
the segmentation algorithms, which guarantees the fairness
and reproducibility of the benchmark. Further, the ATM’22
challenge covers CT scans from multiple sites, which also allows
the generalization ability of the models to be evaluated. Detailed
information on the dataset is provided in Section 3.1.
2) More Comprehensive Metrics. In the EXACT’09 chal-
lenge, the groundtruth of the airway was constructed from the
results of the participants. Specifically, they first divided the air-
way tree into branch segments. These segments were then scored
by experienced observers to determine whether each was correctly
segmented. Finally, the reference airway trees were
constructed by gathering the union of all correctly extracted
branch segments. Since EXACT’09 did not acquire the fine-
grained annotation of the airways, their benchmark was de-
signed to only evaluate the depth of the predicted airway while
neglecting the exact airway shape and dimensions.
To comprehensively assess the algorithms, ATM’22 consid-
ered both the depth of the airway trees and airway dimensions
via the fine-grained annotations by experienced radiologists.
ATM’22 aims to evaluate the airway segmentation algorithms
from two perspectives including Topological Completeness
and Topological Correctness. The topological completeness is
measured by the tree length detected rate (TD, %) and branch
detected rate (BD, %), which are introduced by EXACT’09.
Both TD and BD were evaluated on the largest component
of the prediction, reflecting the topological completeness and
continuity of the models. The high quality of the topological
completeness is essential to the navigation usage for endo-
Table 1. Comparisons between the ATM’22 challenge and other related airway segmentation works in terms of dataset and evaluation. Commonly used
metrics are the Tree length detected rate (TD, %), Branch detected rate (BD, %), Dice Similarity Coefficient (DSC, %), Precision (Pre, %),
Sensitivity (Sen, %), and Specificity (Spe, %). Reporting the false positive rate (FPR) is equivalent to reporting the specificity, and reporting the
true positive rate (TPR) is equivalent to reporting the sensitivity. In the Dataset column, ✓ denotes an open-source dataset and ✗ an in-house dataset;
in the metric columns, ✓/✗ indicate whether the corresponding metric was reported.

Method | Model Description & Characteristic | Dataset | TD | BD | DSC | Pre | Sen | Spe
Meng et al. (2017b) | U-Net with the tracking algorithm | ✗ 50 scans, Tokushima University | ✗ | ✓ | ✓ | ✗ | ✗ | ✗
Jin et al. (2017) | U-Net with graph refinement | ✓ 40 scans, EXACT’09; ✓ 20 scans, LTRC (Karwoski et al., 2008) | ✓ | ✓ | ✗ | ✗ | ✗ | ✗
Charbonnier et al. (2017) | U-Net with leak detection | ✗ 45 scans, COPDGene (Regan et al., 2011) | ✓ | ✗ | ✗ | ✗ | ✗ | ✓
Garcia-Uceda Juarez et al. (2019) | U-Net with a GCN module | ✗ 32 scans, DLCST (Pedersen et al., 2009) | ✓ | ✗ | ✓ | ✗ | ✓ | ✗
Wang et al. (2019) | Spatial CNN with radial distance loss | ✗ 38 private scans | ✓ | ✗ | ✓ | ✗ | ✗ | ✓
Nadeem et al. (2020) | U-Net with freeze-and-grow algorithm | ✗ 32 scans, SPIROMICS (Couper et al., 2014) | ✗ | ✗ | ✗ | ✗ | ✗ | ✓
Garcia-Uceda et al. (2021) | Efficient 3D-UNet | ✗ 24 scans, CF-CT (Kuo et al., 2017), DLCST, EXACT’09 | ✓ | ✗ | ✓ | ✗ | ✗ | ✓
Qin et al. (2021) | UNet with attention module | ✓ 90 scans, BAS (Qin et al., 2020) | ✓ | ✓ | ✓ | ✗ | ✓ | ✓
Zheng et al. (2021b) | WingsNet with general union loss | ✓ BAS | ✓ | ✓ | ✗ | ✓ | ✗ | ✗
ATM’22 Challenge (Ours) | ———— | ✓ 500 scans | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
bronchial interventions. The topological correctness represents
the overlap-wise accuracy of the segmentation models. We
adopted the Dice Similarity Coefficient (DSC, %) and Precision
(%) for the quantitative measurement of pulmonary airways,
which plays a critical role in abnormality analysis. The
selection criteria and standard formulas of these metrics are
reported in Section 3.2.
3) More Powerful Platform. The ATM’22 challenge is one
of the Satellite Events in conjunction with MICCAI 20225, and
hosted on grand-challenge.org6, which allows the flexible and
extendible management of benchmarks. Compared with the
EXACT’09 challenge, our website not only provides the reg-
istration and dataset access but also supports the submission of
prediction results and prompt feedback. The submissions will
be evaluated automatically by the executable docker and the
metrics can be presented on the leaderboard in a few minutes.
This improvement in the evaluation procedure significantly
accelerates research, as researchers no longer need to wait for
official results by e-mail as in EXACT’09. In addition, ATM’22
maintains a live leaderboard that presents all valid results from
the different teams; the public leaderboard ensures fairness in
evaluating the various algorithms. In conclusion, the ATM’22
challenge is deployed on a more effective platform, which benefits
the research community. The detailed evaluation procedure is
described in Section 3.4.
1.4. Contributions
Our challenge was accepted as a Satellite Event of the
MICCAI 2022 challenge, and our official challenge web-
site is constructed and maintained via the platform of grand-
challenge.org. The contribution of our organized challenge can
be briefly summarized below:
•ATM’22 is a critical milestone that establishes a standard
for the airway segmentation field in the deep learning era.
To the best of our knowledge, ATM’22 is the first challenge to
provide a large-scale dataset of 500 CT scans with full
pulmonary airway annotation. Such a large
5 MICCAI 2022 challenge list: https://conferences.miccai.org/2022/en/MICCAI2022-CHALLENGES.html
6 ATM’22 website: https://atm22.grand-challenge.org/
dataset is beneficial to the development of deep-learning
based algorithms. Further, our challenge is deployed on a
public platform that executes the evaluation promptly and
presents the results online. Hence, it is convenient to
compare different algorithms and to speed up the research
process.
•ATM’22 prompts a rethinking of airway segmentation as a
task that goes beyond pixel-wise segmentation. For most
common segmentation tasks, overlap-based and surface
distance-based measures are sufficient to evaluate the
performance of algorithms. However, these measures only
consider the topological correctness of airway segmentation
methods; topological completeness is another significant
aspect of their performance. ATM’22 is the first to establish
a comprehensive evaluation system, including both topological
correctness and topological completeness, to determine the
performance of the algorithms. Combined with the large-scale
dataset, the intrinsic topological features of airways are
expected to be harnessed.
•ATM’22 focuses on the generalization ability of automatic
airway segmentation algorithms. ATM’22 contains diverse
data from multiple sites and domains, so the deep learning
models are expected to capture more substantive
characteristics of the pulmonary airway in order to perform
well across different sites and domains. In addition, ATM’22
provides a valuable database for various clinical centers
worldwide, which could leverage it to pre-train models before
applying them to their in-house data.
The rest of the paper is organized as follows: Section 2 summa-
rizes the previous work related to pulmonary airway segmenta-
tion. Section 3 provides the details of the materials, evaluation
framework, and participation procedure in our challenge. Sec-
tion 4 introduces and compares the top 10 methods ranked in
this challenge, along with our insights. Section 5 presents the
quantitative and qualitative results of the validation phase and
the final test phase, followed by the discussion in Section 6.
Finally, we conclude our work in Section 7.
2. Related Work
2.1. EXACT’09 Challenge
To compare different airway segmentation algorithms using
a standard dataset and performance evaluation method, the Ex-
traction of Airways From CT challenge (EXACT’09) (Lo et al., 2012)
was successfully hosted in 2009. The EXACT’09 dataset provided
40 CT scans, including 20 scans for training and 20 scans for
testing. They evaluated 15 airway tree
extraction algorithms from different research groups. In that
pre-deep learning era, most of the participants adopted region-
growing and vessel filters to address this problem. The results
of the participants were further used by the organizers to con-
struct the golden standard of the airway reference. Specifically,
the airway prediction of different participants was first subdi-
vided into individual branches, and then visually scored by the
trained observers. The correctly segmented branches were re-
tained while the incorrect branches were rejected. Finally, all
accepted branches were aggregated to acquire the final refer-
ence standard. However, the trained observers merely decided
whether the individual branches were acceptable or not; they
did not annotate the original CT scans for those branches
missed by all the algorithms. In addition, due to the lack
of precise voxel-wise annotation, the evaluation of EXACT’09
was designed to only take the extracted airway tree length into
consideration without the shape and dimension.
EXACT’09 contributed to the field of pulmonary airway seg-
mentation as they established a framework to evaluate the air-
way extraction algorithms in a standard manner. They had es-
tablished their own website7, where detailed information and
challenge results are presented. However, because registration
and submission on this website are handled manually, the feedback
period is extremely long, which is ill-suited to the pace of
current research.
2.2. Deep Learning Methods for Airway Segmentation
Since EXACT’09, several methods that employed techniques
such as adaptive thresholding, region growing, and filtering-
based enhancement were proposed. These methods success-
fully segmented the trachea and main bronchi but often failed
to extract peripheral bronchi because the intensity contrast be-
tween the airway lumen and wall weakens as airways bifurcate
into thinner branches. Xu et al. (2015) proposed the hybrid
multi-scale fuzzy connectedness framework cooperating with
morphological reconstruction and multi-scale vessel enhance-
ment for airway lumen segmentation. As presented in Fig.3,
the recent progress of deep learning, especially Convolutional
Neural Networks (CNNs) have promoted the research on air-
way segmentation (Charbonnier et al., 2017; Jin et al., 2017;
Meng et al., 2017b,a; Selvan et al., 2018; Nadeem et al., 2018;
Garcia-Uceda Juarez et al., 2018; Zhao et al., 2019; Yun et al.,
2019; Qin et al., 2019; Wang et al., 2019; Garcia-Uceda Juarez
et al., 2019; Nadeem et al., 2020; Selvan et al., 2020; Qin et al.,
2021; Zheng et al., 2021b,a; Garcia-Uceda et al., 2021; Wu
et al., 2022; Yu et al., 2022b; Nan et al., 2022; Zhang et al.,
2022a).

7 EXACT’09 Website: http://image.diku.dk/exact/
To reduce false positives and increase the length of the detected
airway tree, 2-D CNNs (Yun et al., 2019) and 2.5-D CNNs
(Charbonnier et al., 2017) were applied to refine the coarse
segmentation. 3D CNNs
were developed to handle the airway segmentation task via ei-
ther the fixed-stride patch-wise sliding window fashion (Garcia-
Uceda Juarez et al., 2018) or a dynamic VOI-based tracking
way (Meng et al., 2017b). To further extract discriminative
features, specific designs of neural networks were also incor-
porated into the 3D UNet. Graph refinement (Selvan et al.,
2020; Garcia-Uceda Juarez et al., 2019) was explored to incor-
porate neighborhood knowledge of airways in feature aggrega-
tion. Wang et al. (2019) proposed a spatial propagation layer
and radial distance loss for tubular topology perception. Qin
et al. (2021) designed a feature calibration and attention dis-
tillation module to strengthen the 3D UNet on tenuous peripheral
bronchioles. Zhao et al. (2019) proposed a
linear-programming tracking method to combine the results of
3D CNNs and 2D CNNs.
Meanwhile, the importance of the connectivity of the airway
prediction also raised attention. AirwayNet (Qin et al., 2019)
was proposed to transform the binary airway segmentation task
into 26-neighborhood connectivity prediction problem. Wu
et al. (2022) utilized the long-range slice continuity information
to enhance the connectedness of airway prediction. The con-
nectivity attribute was further explored by Zheng et al. (2021b)
and Zhang et al. (2022a). Zheng et al. put forward the class
imbalance problem in the airway segmentation task, while Zhang
et al. pointed out that a satisfactory trade-off between
topological completeness and correctness should be achieved.
The WingsNet was adopted by Zheng et al. (2021b) and Yu et al.
(2022b) as the backbone for a multi-stage training solution.
Zheng et al. designed a general union loss (GUL) to alleviate
the intra-class imbalance problem, while Yu et al. resolved
the problem via a breakage-sensitive loss. To further tackle the
topology-preserving challenge, Zhang et al. (2022a) proposed a
convolutional distance transform (CDT) module to refine the
fractured areas that are critical to the topological structures.
Nan et al. (2022) designed a continuity and accumulation map-
ping (CAM) loss, which enhanced the continuity degree and
minimized projection errors of airway predictions.
The EXACT’09 challenge was hosted more than a decade ago;
it is time to promote the airway segmentation task to a new
level for the next generation of medical image analysis and
bronchoscopic-assisted surgery. The ATM’22 challenge aims
to revolutionize this field by providing more annotated data,
more comprehensive evaluation, and more efficient feedback
for the research community. A promising trend in pulmonary
airway segmentation is to harness the intrinsic topological
features from the large amount of annotated data.
2.3. Topological Prior Knowledge
The most relevant task to airway segmentation is tubular ob-
ject segmentation, where topological prior knowledge plays a
critical role.

Fig. 3. The road map of the representative airway segmentation works from the EXACT’09 challenge to the ATM’22 challenge (panels: EXACT’09, Xu’15, Meng’17, Jean-Paul’17, Qin’19, Wang’19, Selvan’20, Nadeem’21, Zheng’21, ATM’22).

A typical class of tubular objects shares a tree-like
structures (Li et al., 2022a), such as blood vessel (Lyu et al.,
2022), coronary artery (Kong et al., 2020), neuron images (Li
and Shen, 2019), and the airway. Despite the powerful data-
fitting ability of the deep learning models, they can barely learn
the extrinsic topological features. For example, it is extremely
difficult for deep learning models to represent the property
that "an object consists of one single connected domain". The poor
representation of the topology leads to the discontinuity prob-
lem that often happens in tubular object segmentation tasks. To
alleviate this problem, previous works could be categorized into
three dimensions: 1) Enhancing the representation ability of the
deep learning models. 2) Designing surrogate objective func-
tions to increase the topological accuracy. 3) Adding the ex-
plicit topological restriction to the optimization procedure.
Mosinska (Agarap, 2018) discriminated the higher-order
topological features of linear structures by adding the restric-
tion term to minimize the differences between the VGG19 de-
scriptor of the ground-truth images and the corresponding pre-
diction delineations. The Local Intensity Order Transforma-
tion (LIOT) (Shi et al., 2022) was dedicated to representing the
tubular structure, which is invariant to the increasing change of
the contrast. LIOT transformed the original image into a fea-
ture map with four channels, reinforcing the network to learn
more discriminative features. A Joint Topology-preserving and
Feature-refinement Network (JTFN) (Cheng et al., 2021) was
designed to jointly handle the global topology and refined fea-
tures via an iterative feedback learning strategy.
In addition to enhancing the representation ability of deep
learning models, other works endeavored to achieve this goal
by designing surrogate objective functions to increase topolog-
ical accuracy. Distance transform is a natural alternative (Ma
et al., 2020) used in medical image analysis to uncover topology
information. Kervadec et al. (2019) focused on the boundary of
the distance map and designed the boundary loss to minimize
the boundary variations between prediction and ground-truth
via an integral approach. Xue et al. (2020) directly regressed
the signed distance map (SDM), followed by the least absolute
error loss to penalize the output SDM with the wrong sign. To
repair the fractured areas, a convolutional distance transform
(CDT) module (Zhang et al., 2022a) was proposed to be per-
ceptible to the breakage. Other topological elements were also
investigated in the tubular object segmentation. The center-
line, bifurcation, local radius, curvature, normal, and so on are
the ponderable characteristics for the representation of tubular
structures. Wang et al. (Wang et al., 2020) presented tubular
shapes as the envelope of a family of spheres with continuously
changing center points and radii. They rephrased the distance
map prediction as a quantified classification based on the cen-
ter points and radii. Shit et al. (Shit et al., 2021) proposed a
differentiable measurement, CenterlineDice (clDice), to simul-
taneously handle the over- or under-segmentation phenomenon.
However, the centerline ground-truth of volumetric data is not
easily acquired. Although it can be approximately computed via
3D skeletonization (Lee et al., 1994), curve-skeleton/medial-axis
extraction from a 3D mesh representation is itself an open,
challenging, and unsolved problem (Au et al., 2008; Dey and Sun,
2006; Cornea et al., 2005).
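As a brief illustration of the distance-transform surrogates discussed above (e.g., the signed distance map regressed by Xue et al. (2020)), a signed distance map can be derived from a binary mask as in the following sketch; sign conventions vary between papers, and this only shows one common choice:

```python
import numpy as np
from scipy import ndimage

def signed_distance_map(mask: np.ndarray) -> np.ndarray:
    """Signed distance map: negative inside the object, positive outside (one common convention)."""
    mask = mask.astype(bool)
    inside = ndimage.distance_transform_edt(mask)       # distance to the boundary from inside
    outside = ndimage.distance_transform_edt(~mask)     # distance to the boundary from outside
    return outside - inside
```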
Third, critical properties in the algebraic topology were grad-
ually applied to add the explicit topological restriction to the
optimization procedure. Persistent homology (Edelsbrunner
et al., 2000; Cornea et al., 2005) is a topological data analysis
method for calculating the robustness of topological features
of a dataset at different scales. Persistent homology involves
counting the number of topological features from different di-
mensions, termed Betti numbers. The Betti numbers are crucial
topological invariants that count the number of features of di-
mension k, where β0, β1, and β2 represent the number of connected
components, the number of loops or holes, and the number of hollow
voids, respectively. Clough et al. (Clough et al.,
2020) first analyzed the Betti numbers as a set of birth and death
threshold values for each topological feature, which can be rep-
resented in a barcode diagram. They then specified the desired
topology of the segmented objects and adopted the Persistent
homology upon the candidate segmentation to reinforce it to
share the specified topological features. Similarly, Hu et al. (Hu
et al., 2019) optimized the persistence diagram to emphasize
one-dimensional topological features, i.e., the connected com-
ponents. The Morse theory (Milnor, 2016), which captures the
singularities of the gradient vector field of the likelihood func-
tion, was also investigated to identify critical global structures,
including 1D skeletons and 2D patches (Hu et al., 2021). Zhang
et al. explored several unsupervised geometry-based methods
for tubular object reconstruction. The divergence prior (Zhang
et al., 2019) and confluence property (Zhang et al., 2021b) were
incorporated as the explicit constraints to improve reconstruc-
tion accuracy.
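For instance, the zeroth Betti number β0 of a binary segmentation, i.e., the number of connected components mentioned above, can be counted directly (a toy sketch; a well-segmented airway tree should ideally have β0 = 1):

```python
import numpy as np
from scipy import ndimage

def betti_zero(mask: np.ndarray) -> int:
    """Betti number beta_0: the number of 26-connected components of a binary 3D mask."""
    _, num_components = ndimage.label(mask.astype(bool), structure=np.ones((3, 3, 3)))
    return num_components
```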
3. Challenge Setup
3.1. Dataset
3.1.1. Dataset Information
We collected and annotated 500 chest CT scans from multiple
sites. The CT scans were collected from the public LIDC-IDRI
dataset (Armato III et al., 2011) and the Shanghai Chest Hospital,
with the ethics approval number KS(Y)21328. All participants
agreed to the CC BY-NC license. The chest CT scans were acquired
with scanners from three vendors: Philips iCT 256, GE MEDICAL
SYSTEMS LightSpeed16, and TOSHIBA Aquilion. The health conditions
of the scanned subjects are diverse, ranging from healthy people
to patients with severe pulmonary disease. The information of
patients and scanners was manually anonymized. The selected 500 CT
scans were then annotated by three experienced radiologists. The
annotation details are elaborated in Section 3.1.2.
Each chest CT scan consists of a varying number of slices,
ranging from 157 to 1125, with a slice thickness of 0.450-1.000
mm. The axial size of all slices is 512×512 pixels with an
in-plane resolution of 0.500-0.919 mm. The training set consists
of 300 chest CT scans, while the validation and test sets contain
50 and 150 CT scans, respectively. The properties of the training,
validation, and test sets are summarized in Table 2.
3.1.2. Annotation Details
To acquire the fine-grained annotations of the airway from
chest CT scans, each CT scan was first preprocessed by the
models of (Zheng et al., 2021b; Çiçek et al., 2016; Yu et al.,
2022b; Zhang et al., 2022a) trained on the BAS dataset (Qin et al.,
2021). The results were then ensembled by a majority voting
strategy to acquire a preliminary segmentation. These preliminary
annotations were carefully delineated and manually double-checked
by three radiologists with more than five years of professional
experience to obtain the final refined airway tree structure,
which took 60-90 minutes per CT scan. The main challenge was to
preserve the topological completeness of the airways, since
breakages tend to occur in the pre-trained models due to feature
inhomogeneity. The experienced radiologists carefully repaired
breakages and removed several small leakages based on the original
CT information and anatomical prior knowledge. The organizers
spent almost one year collecting the 500 chest CT scans from
different sites and carefully delineating the refined airway
annotations for each scan. Throughout the annotation process, we
ensured that each radiologist adhered to the same annotation
principles, thereby guaranteeing the consistency of the airway
annotations.
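A minimal sketch of the majority-voting ensemble used to produce the preliminary annotations could look as follows (illustrative only; the organizers' actual preprocessing pipeline and tie-breaking rule may differ):

```python
import numpy as np

def majority_vote(predictions):
    """Fuse several binary airway predictions (a list of equally shaped 3D arrays) by majority voting."""
    stacked = np.stack([p.astype(np.uint8) for p in predictions], axis=0)
    votes = stacked.sum(axis=0)
    return (2 * votes > len(predictions)).astype(np.uint8)   # strict majority
```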
3.2. Evaluation Metrics
As presented in Table 1, previous works adopted incomplete
metrics to measure the performance. In this challenge, we
established a comprehensive evaluation system. Following
(Maier-Hein et al., 2022), we chose two types of metrics to
evaluate the airway segmentation algorithms: common segmentation
metrics and specific property-related metrics. Specifically, we
adopted the Dice Similarity Coefficient (DSC, %) and Precision (%)
to measure the overlap-based, voxel-wise segmentation accuracy.
Let Y and Ŷ denote the binary ground-truth label and the
prediction result, respectively. DSC and Precision are computed as:

$$\mathrm{DSC} = \frac{2\,|\hat{Y} \cap Y|}{|\hat{Y}| + |Y|}, \quad (1)$$

$$\mathrm{Precision} = \frac{|\hat{Y} \cap Y|}{|\hat{Y}|}, \quad (2)$$

where | · | denotes the sum operation that returns the number of
voxels. In addition, we also supplemented the evaluation of the
voxel-wise segmentation accuracy with the Sensitivity (Sen, %) and
Specificity (Spe, %), which are associated with the true positive
(TP) and true negative (TN) volume fractions, respectively:

$$\mathrm{Sen} = \frac{|TP|}{|TP| + |FN|} = \frac{|\hat{Y} \cap Y|}{|Y|}, \quad (3)$$

$$\mathrm{Spe} = \frac{|TN|}{|TN| + |FP|} = \frac{|I| - |\hat{Y} \cup Y|}{|I| - |Y|}, \quad (4)$$

where FN denotes the false negative volume fraction, FP the false
positive volume fraction, and I the image to segment.
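For reference, the four voxel-wise metrics of Eqs. (1)-(4) can be computed from binary volumes as in the following sketch (the authoritative implementation is the one in the challenge repository; this snippet omits edge cases such as empty masks):

```python
import numpy as np

def overlap_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Voxel-wise metrics of Eqs. (1)-(4) for binary prediction and ground-truth volumes."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    return {
        "DSC": 2.0 * tp / (pred.sum() + gt.sum()),
        "Precision": tp / (tp + fp),
        "Sensitivity": tp / (tp + fn),
        "Specificity": tn / (tn + fp),
    }
```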
As for the specific property-related metric, topological com-
pleteness is the most critical attribute in the airway segmenta-
tion challenge. Following (Lo et al., 2012), we defined the Tree
length detected rate (TD, %) and Branch detected rate (BD, %)
to measure the performance of algorithms in detecting the air-
way. TD is defined as the fraction of the tree length that is
Table 2. The summarized properties of the training, validation, and test sets.

Dataset    | Scanner                                            | Slice Number | Slice Thickness (mm) | In-Plane Resolution (mm)
Training   | Philips iCT 256, GE LightSpeed16                   | 157-1125     | 0.500-1.000          | 0.514-0.919
Validation | Philips iCT 256, GE LightSpeed16                   | 408-803      | 0.500-0.750          | 0.531-0.822
Test       | Philips iCT 256, GE LightSpeed16, TOSHIBA Aquilion | 257-830      | 0.450-0.801          | 0.500-0.859
detected appropriately with regard to the length of the airway
tree in the ground-truth:
$$\mathrm{TD} = \frac{T_{det}}{T_{ref}}, \quad (5)$$

where T_det denotes the total length of all branches detected in
the prediction, and T_ref represents the whole tree length in the
ground-truth. BD denotes the percentage of airway branches that
are correctly detected with respect to the whole number of
branches in the ground-truth:

$$\mathrm{BD} = \frac{B_{det}}{B_{ref}}, \quad (6)$$

where B_det denotes the number of correctly detected branches in
the prediction, and B_ref represents the whole number of branches
in the ground-truth. Note that a branch in the prediction is
identified as 'correct' only if more than 80% of the centerline
voxels extracted from that branch lie within the ground-truth.
TD and BD were adopted to measure the topological completeness of
the segmentation algorithms, while DSC and Precision were chosen
as topological correctness measurements. Since all metrics are
normalized into [0%, 100%], the mean score is adopted as the
ranking criterion:

$$\text{Mean Score} = 0.25 \cdot \mathrm{TD} + 0.25 \cdot \mathrm{BD} + 0.25 \cdot \mathrm{DSC} + 0.25 \cdot \mathrm{Precision}. \quad (7)$$
The implementation of the evaluation code can be found in our
official ATM’22 repository8.
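As an illustration of how TD (Eq. 5) and the ranking criterion (Eq. 7) can be approximated, the sketch below measures how much of the ground-truth centerline is covered by the prediction; it relies on scikit-image skeletonization (3D support in recent versions) and omits the branch parsing needed for BD. The official evaluation code in the ATM'22 repository should be used for actual scoring.

```python
import numpy as np
from skimage.morphology import skeletonize

def tree_length_detected_rate(pred: np.ndarray, gt: np.ndarray) -> float:
    """Approximate TD (Eq. 5): fraction of ground-truth centerline voxels covered by the prediction."""
    centerline = skeletonize(gt.astype(bool))              # ground-truth tree centerline
    detected = np.logical_and(centerline, pred.astype(bool)).sum()
    return detected / max(int(centerline.sum()), 1)

def mean_score(td: float, bd: float, dsc: float, precision: float) -> float:
    """Ranking criterion of Eq. (7): equal-weight average of TD, BD, DSC, and Precision."""
    return 0.25 * (td + bd + dsc + precision)
```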
3.3. Participants
As an open-call challenge, ATM’22 received 305 registration
requests before the MICCAI 2022 conference (September 22, 2022),
among which 30 teams successfully participated in the validation
phase before its submission deadline (August 17, 2022). 22 teams
successfully submitted algorithm dockers in the test phase before
its submission deadline (August 31, 2022). As detailed in Figure
4, these 22 teams come from 9 different countries. The information
of all teams is reported in Table 3. Note that we assign a unique
team index to each team in the validation phase and the test phase,
respectively, and adopt the team index to refer to their
methodologies and results for simplicity. 10 representative
algorithms were selected to be reported in detail, considering
both novelty and evaluation performance. All teams agreed to
include their methods and results in this publication.
8 https://github.com/Puzzled-Hui/ATM-22-Related-Work/tree/main/evaluation
(Fig. 4 panel data: 22 teams from 9 countries — China 8/22, USA 4/22, Germany 2/22, France 2/22, UK 2/22, India 1/22, Brazil 1/22, Switzerland 1/22, United Arab Emirates 1/22.)
Fig. 4. Statistics of the teams that successfully completed full participation in the ATM’22 challenge.
3.4. Challenge Phases
The challenge includes three phases. First, to complete
registration and gain access to the dataset, participants must
register on the official challenge website, sign the data
agreement file, send the scanned file via e-mail to the
organizers, and promise to abide by the challenge rules. Second,
participants take part in the validation phase, in which binary
predictions are required to be submitted. The evaluation is
executed automatically on the grand-challenge.org platform, and
the leaderboard is presented online and updated promptly9. Third,
participants take part in the final test phase to complete their
full participation in this challenge. To guarantee the fairness of
the competition, a packaged docker is the only valid submission in
the test stage, and only a complete prediction of all test cases
is considered successful participation. Prizes are awarded to the
top three teams; the organizers' institutions are not eligible for
awards. The instructions for preparing the dockers are provided in
the ATM’22 challenge repository10. This repository also provides a
basic pipeline for packaging models into a docker image, which is
helpful for those with little Docker expertise.
4. Methodologies
In this section, overall comparisons of different methods are
first reported. We then examine in detail the top 10 methods
ranked in the final test phase. For each method, we summarize
9 The leaderboard of the validation phase: https://atm22.grand-challenge.org/evaluation/validation-phase-1-live-leaderboard/leaderboard/
10 Docker submission guideline of the test phase: https://github.com/Puzzled-Hui/ATM-22-Related-Work/tree/main/baseline-and-docker-example
Table 3. The list and details of the teams who successfully participated in the validation phase (full submission, including 50 binary predictions
and a qualified short paper, received before Aug 17, 2022) and the testing phase (full submission, including an executable docker and a qualified short
paper, received before Aug 31, 2022). For simplicity, the short team index is used in the main text to refer to the different teams: e.g., V1 means
validation-phase team 1, representing the team Sanmed AI; validation indices are assigned randomly. T1 means test-phase team 1, also representing the
team Sanmed AI; test indices depend on the order of successful submission.
Validation index Test index Team name Affiliation Location
V1 T1 Sanmed AI A.I. R&D Center, Sanmed Biotech Inc. Guangdong, China
V2 T4 YangLab National Heart and Lung Institute, Imperial College London London, UK
V3 T20 notbestme School of information science and technology, Fudan University Shanghai, China
V4 - xiaqi Hygea Medical Technology Corporation Beijing, China
V5 T22 cvhthreedee Department of Informatics, Karlsruhe Institute of Technology Karlsruhe, Germany
V6 T3 LinkStartHao College of Physics and Information Engineering, Fuzhou University Fujian, China
V7 T7 neu204 College of Medicine and Biological Information Engineering, Northeastern University Liaoning, China
V8 T12 miclab Department of Computer Engineering and Industrial Automation, University of Campinas Campinas, Brazil
V9 T8 blackbean Shanghai AI Lab Shanghai, China
V10 T19 Median Median Technologies Valbonne, France
V11 T9 lya University of Science and Technology of China Hefei, China
V12 T18 satsuma Centre for Medical Image Computing, University College London London, UK
V13 - ailab Shanghai AI Lab Shanghai, China
V14 T6 timi InferVision Medical Technology Co., Ltd. Beijing, China
V15 T17 suqi School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University Shanghai, China
V16 - MibotTeam Smart surgery, Alg Department, Microport Shanghai, China
V17 T13 CITI-SJTU School of Biomedical Engineering, Shanghai Jiao Tong University Shanghai, China
V18 - SEU Key Laboratory of Computer Network and Information Integration, Southeast University Nanjing, China
V19 T14 deeptree damo Alibaba DAMO Academy Hangzhou, China
V20 T15 CBT IITDELHI Indian Institute of Technology Delhi(IITD) Delhi, India
V21 T5 dolphins Computer Science Department, National Engineering School of Brest Brest, France
V22 T11 bms410 National Yang Ming Chiao Tung University Yangming Campus Taipei, Taiwan, China
V23 - airwayseg Center of Product Research&Development, Keya Medical Shenzhen, China
V24 - atmmodeling2022 Pittsburgh Institute, Sichuan University Sichuan, China
V25 T16 bwhacil Applied Chest Imaging Laboratory, Brigham and Women’s Hospital, Harvard Medical School Boston, USA
V26 T10 dnai Diannei Technology Shanghai, China
V27 - mlers R&D, Microport Shanghai, China
V28 T2 fme Fraunhofer Institute for Digital Medicine MEVIS Bremen, Germany
V29 T21 biomedia Mohamed bin Zayed University of Artificial Intelligence, UAE Abu Dhabi, United Arab Emirates
V30 - ntflow Mathematics, Nanjing University Nanjing, China
Table 4. Descriptions of the notation.

Notation | Description    | Notation | Description
x        | Input          | y        | Label
X        | Feature space  | Y        | Label space
L(·,·)   | Loss function  | ŷ        | Likelihood map
the main contributions and report the implementation details.
The potential directions of improvements are finally discussed.
For simplicity, Table 4 lists the frequently-used notation. The
order of method description is in accordance with the perfor-
mance ranking of the final test stage.
4.1. Overall Comparison
In this section, we focus on the overall comparison of the
top 10 methods. Table 5 summarizes the main characteristics
of the top 10 models, including the backbone architectures,
the pre-process procedures, data augmentation strategies, and
the post-process procedures. 3D UNet (Çiçek et al., 2016) and
nnUNet (Isensee et al., 2021) are the common choices of backbone.
nnUNet (used by T14 and T17) adopts percentile clipping instead
of a lung window to truncate CT values. Generally speaking,
nnUNet conducts more comprehensive data augmentation than the
other methods. All top-10 methods perform intensity
normalization, and some methods (T6, T4, T7, T1, T20) adopt lung
region extraction as a pre-processing step to crop unrelated
regions.
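As a simple sketch of the lung-window pre-processing and intensity normalization listed in Table 5 (the exact window bounds differ per team, and the function below is illustrative rather than any team's actual code):

```python
import numpy as np

def preprocess_ct(volume: np.ndarray, lung_window=(-1000.0, 500.0)) -> np.ndarray:
    """Clip CT values (HU) to a lung window and rescale them to [0, 1]."""
    lo, hi = lung_window
    clipped = np.clip(volume.astype(np.float32), lo, hi)
    return (clipped - lo) / (hi - lo)
```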
Table 6 presents a brief summary of the top 10 methods. In
this table, we compare the key components and training strate-
gies among these methods.
4.2. Participants Methods
Next, we will report the top 10 ranked methods and highlight
the key novelty or component of each method.
4.2.1. A.timi
The team of timi (T6) proposed a well-designed three-stage
deep learning pipeline for the airway segmentation, as seen in
Figure 5. The WingsNet (Zheng et al., 2021b) was adopted as
the backbone architecture. In the first stage, the network was
trained with only the dice loss and the random crop sampling
strategy. Their contribution is concentrated on the second stage,
where the loss function and training procedure were carefully
designed. Inspired by the local-imbalance-based weight (Zheng
et al., 2021a), they designed a variant of the general union loss
(GUL) (Zheng et al., 2021b), which adjusted the weight factor
to focus on the small airways according to the different sizes of
branches. They derived w_p from the local foreground rate within a
pre-defined neighborhood. Furthermore, similar to (Zheng et al.,
2021b; Zhang et al., 2022a), the voxels near the centerline of the
airway were assigned more attention. This weight, w_d, was defined
as the inverse square of the Euclidean distance from the current
voxel to its nearest voxel on the centerline. In conclusion, the
final weight of each voxel was defined as w = w_p + w_d, and the
loss function was defined as:

$$\mathcal{L}(y, \hat{y}) = 1 - \frac{\sum_{i=1}^{N} w_i\, \hat{y}_i^{\gamma}\, y_i}{\sum_{i=1}^{N} w_i\, (\alpha \hat{y}_i + \beta y_i)}, \quad (8)$$

where γ, α, and β were set to 0.7, 0.2, and 0.8, respectively. To
improve the efficiency of the training procedure, the small air-
way over-sampling and the skeleton-based hard-mining were
Table 5. Characteristics of the top 10 models. Abbr: Lung Window (LW), Lung Region Extraction (LRE), Intensity Normalization (Norm), Rotation (R),
Flip (F), Scale (S), Jitter (J), Gaussian Noise (GN), Brightness (B), Gamma (GA). It is noted that the largest connected component extraction is executed
on all methods by the official organizers to evaluate the final metrics.
Team | Backbone | Pre-Process (LW, LRE, Norm) | Data Augmentation: Spatial-based (R, F, S, J), Intensity-based (GN, B, GA), Others | Post-Process
T6 WingsNet (Zheng et al., 2021b) [-1000,500] ! ! N/A fill holes
T4 Attention UNet (Oktay et al., 2018) [-1200,600] ! ! ! ! N/A region grow
T14 nnUNet (Isensee et al., 2021) N/A! ! ! ! ! ! ! N/A TTA
T7 3D UNet (C¸ic¸ek et al., 2016) [-1000,600] ! ! ! ! N/A ensemble
T1 3D UNet (C¸ic¸ek et al., 2016) [-1400,200] ! ! ! ! N/A N/A
T5 3DResNet (Tran et al., 2018) N/A! ! ! ! pseudo label N/A
T17 nnUNet (Isensee et al., 2021) N/A! ! ! ! ! ! ! N/A N/A
T20 Transformer N/A! ! N/A Resize
T9 nnUNet (Isensee et al., 2021) [-1200,600] ! ! ! ! ! deformation N/A
T10 3D UNet (C¸ ic¸ek et al., 2016) [-1028,266] ! ! ! N/A ensemble
adopted in different stages. The small airway over-sampling
strategy means that the cropped patches around the small
airways (diameter less than 2 pixels) were densely over-sampled.
The prediction of the first stage was the prerequisite of the
skeleton-based hard-mining strategy: the misclassified voxels
on the skeleton were defined as hard-mining voxels, around which
the cropped patches were densely extracted for training.
Fig. 5. The three-stage deep learning pipeline for airway segmentation by the team timi (T6).
In the third stage, the variant of GUL, combined
with a weighted common Dice loss was adopted to fine-tune the
model. The main novelty of T6 method can be summarized as
follows: 1) Adopt the local-imbalance and centerline-distance
based weight to dynamically re-weight each voxel. 2) Design
the small airway oversampling and skeleton-based hard-mining
strategies.
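A minimal PyTorch sketch of this re-weighting scheme and the loss of Eq. (8) is given below; the exact form of the local-imbalance weight w_p (here the inverse of a locally average-pooled foreground rate) and the neighborhood size are our assumptions, not the team's released code.

```python
import torch
import torch.nn.functional as F

def voxel_weights(label, centerline_dist, kernel=9, eps=1e-6):
    # label: (1, 1, D, H, W) binary ground-truth (float);
    # centerline_dist: Euclidean distance of each voxel to its nearest centerline voxel.
    fg_rate = F.avg_pool3d(label, kernel_size=kernel, stride=1, padding=kernel // 2)
    w_p = 1.0 / (fg_rate + eps)                 # local-imbalance weight (assumed inverse form)
    w_d = 1.0 / (centerline_dist ** 2 + eps)    # inverse-square centerline-distance weight
    return w_p + w_d                            # w = w_p + w_d

def gul_variant_loss(pred, label, w, gamma=0.7, alpha=0.2, beta=0.8):
    # Eq. (8): 1 - sum_i(w_i * pred_i^gamma * y_i) / sum_i(w_i * (alpha*pred_i + beta*y_i))
    num = (w * pred.pow(gamma) * label).sum()
    den = (w * (alpha * pred + beta * label)).sum()
    return 1.0 - num / den
```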
4.2.2. B.YangLab
The team of YangLab (T4) designed a novel fuzzy attention gate (FAG) and a Jaccard continuity and accumulation mapping (JCAM) loss for pulmonary airway segmentation.
Fig. 6. The two modules proposed by the team YangLab (T4): a) the fuzzy attention gate (FAG); b) the Jaccard continuity and accumulation mapping (JCAM) loss.
The fuzzy attention gate was designed to tackle the uncertainty of annotations and the inhomogeneous intensity within the airway regions.
They followed the paradigm of the attention gate (Oktay
et al., 2018) while replacing the sigmoid function with the train-
able Gaussian membership functions. The Gaussian member-
ship functions are favored to specify the deep fuzzy sets due
to the smoothness and concise notation. Moreover, they advo-
cated designing the channel-specific attention gate instead of
assigning the same coefficient to all channels that belong to the
same spatial feature point. This way aimed to extract reliable
feature representations in different channels since they are pro-
cessed by different kernels. Motivated by the strength of the
uncertainty reduction in original data by fuzzy logic and neural
networks (Deng et al., 2016), they applied the fuzzy logic with
the FAG with trainable Gaussian membership functions to help the network focus on the regions of interest. The
diagram of the FAG is shown in Figure 6.a). Specifically,
assuming that $X$ has the shape $C \times D \times H \times W$, each feature
map was filtered by $M$ Gaussian membership functions with
trainable mean $\mu_{m,c}$ and standard deviation $\sigma_{m,c}$:

$$f_{m,c}(X, \mu, \sigma) = e^{-\frac{(X_c - \mu_{m,c})^2}{2\sigma_{m,c}^2}}, \qquad (9)$$
where $m = 1, 2, \ldots, M$ and $c = 1, 2, \ldots, C$. The 'OR' operator
was adopted to aggregate the fuzzy sets; to guarantee differentiability,
the max operation was used as its realization. The overall fuzzy
attention gate on the $c$-th channel is finally derived as:
$$f_c(X, \mu, \sigma) = \bigvee_{m=1}^{M} e^{-\frac{(X_c - \mu_{m,c})^2}{2\sigma_{m,c}^2}} = \max_{m}\left( e^{-\frac{(X_c - \mu_{m,c})^2}{2\sigma_{m,c}^2}} \right) \qquad (10)$$
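The channel-specific fuzzy attention gate of Eqs. (9)–(10) can be sketched as follows; the number of membership functions M and the use of the gate as a multiplicative mask on the feature map are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FuzzyAttentionGate3D(nn.Module):
    """M trainable Gaussian membership functions per channel, aggregated with
    a max over m as the differentiable realization of the fuzzy 'OR'."""
    def __init__(self, channels, num_members=4):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(num_members, channels))
        self.log_sigma = nn.Parameter(torch.zeros(num_members, channels))

    def forward(self, x):                            # x: (B, C, D, H, W)
        mu = self.mu.view(1, -1, x.shape[1], 1, 1, 1)
        sigma = self.log_sigma.exp().view(1, -1, x.shape[1], 1, 1, 1)
        member = torch.exp(-(x.unsqueeze(1) - mu) ** 2 / (2 * sigma ** 2))  # Eq. (9)
        attn = member.max(dim=1).values              # Eq. (10): max over the M fuzzy sets
        return x * attn                              # gate the feature map channel-wise
```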
The Jaccard continuity and accumulation mapping (JCAM)
loss was another contribution, proposed to pay more attention
to the continuity of the airway predictions. As seen in Figure 6.b),
the JCAM estimated two topological types of errors
between the prediction and the ground-truth. The first is the
projection error, computed through the coronal, sagittal,
and axial planes. The second error, termed $L_C$, measures the
difference between the centerlines extracted from the prediction and the
ground-truth, respectively. The projection error was split into
two parts: the linear accumulation maps (LAM) and their non-linear
transformation (nLAM) obtained by the tanh operation. The overall
loss function can be summarized as:
$$\mathcal{L}(x, y) = \alpha L_J(x, y) + \beta L_C(x, y) + \varphi L_{CE}(x, y) + \gamma L_{LAM}(x, y) + \delta L_{nLAM}(x, y), \qquad (11)$$
where $L_J$ denotes the Jaccard loss function and $L_{CE}$ denotes
the Cross-Entropy loss function. $\alpha$, $\beta$, and $\gamma$ were set to 1, and
$\varphi$ and $\delta$ were set to 0.3 in all experimental settings. In addition,
they adopted the region growing method to fine-tune the trachea
part. In conclusion, the main novelty of the T4 method lies in the
channel-specific fuzzy attention layer and the JCAM loss
designed to enhance the continuity of airways.
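A minimal sketch of the accumulation-mapping terms (LAM and nLAM) in Eq. (11) is shown below; the L1 comparison between the projected maps is our assumption, and the Jaccard, centerline, and cross-entropy terms are omitted.

```python
import torch

def lam_nlam_terms(pred, target):
    """Projection-error terms on one 3D volume (D, H, W): LAM accumulates the
    prediction along each axis (axial/coronal/sagittal projections); nLAM is
    the tanh-transformed accumulation map."""
    l_lam, l_nlam = 0.0, 0.0
    for axis in (0, 1, 2):
        lam_p, lam_t = pred.sum(dim=axis), target.sum(dim=axis)
        l_lam += (lam_p - lam_t).abs().mean()
        l_nlam += (torch.tanh(lam_p) - torch.tanh(lam_t)).abs().mean()
    return l_lam, l_nlam
```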
4.2.3. C.deeptree damo
The team of deeptree damo (T14) proposed a two-stage
framework for airway segmentation, as demonstrated in Fig-
ure 7. In the first stage, to tackle the intra-class imbalance
between the different levels of airway branches, they formu-
lated the binary segmentation task to the multi-class segmen-
tation task in accordance with the airway branch size. Specif-
ically, they preliminarily decomposed the ground-truth pulmonary
airway label into three classes: 1) the trachea and the two main
bronchi are classified as large-level airways, $Y_L$; 2) the branches
from the bronchi up to the segmental airways are considered
middle-level airways, $Y_M$; 3) the remaining peripheral airways,
whose average lumen diameter is <2 mm, are small-level airways, $Y_S$.
The anatomy-aware multi-class (AMC) airway segmentation was
formulated as follows:
$$\mathcal{L}_{first}(\hat{Y}, Y) = \mathcal{L}(\hat{Y}_L, Y_L) + \mathcal{L}(\hat{Y}_M, Y_M) + \mathcal{L}(\hat{Y}_S, Y_S), \qquad (12)$$
Fig. 7. Overall workflow of the proposed two-stage airway segmentation by
team deeptree (T14). The coarse airway is first extracted by the anatomy-
aware deep network. Secondly, the breakage map is calculated by the mor-
phological operations to connect the breaking branches.
where they applied the general union loss function (Zheng et al.,
2021b) in the AMC framework. The AMC framework assisted
in explicitly differentiating the anatomic context of different
branches in the model training procedure. Thus, each class
owned a distinguished airway branch size range and the class-
specific features could be naturally learned.
Secondly, to deal with the breakage that happened in the first
stage, they calculated the breakage attention maps and simu-
lated the domain-specific breakage training data. These prepa-
rations aimed to accomplish the deep breakage connection. The
breakage attention map, termed $H$, was designed to highlight
the breakage area via the second-shortest distance from each
background point to all separate connected components of a
prediction. $H$ was further normalized by the parameterized
Sigmoid function $H = \mathrm{Sigmoid}(5 - H)$, so that $H$ forms a 3D
ball-like intensity distribution around each breakage location.
Further, the domain-specific breakage simulation was performed
to acquire sufficient breakage-condition data $Y_B$ from the
ground-truth for training the 2nd-stage breakage-connection
network. This network was fed with the fusion of $X$ and $H$ and
predicts the breakage $\hat{Y}_B$:
$$\hat{Y}_B = F(X, H; W), \qquad (13)$$
$$\mathcal{L}_{second} = \mathcal{L}(\hat{Y}_B, Y_B), \qquad (14)$$
where $F(\cdot)$ and $W$ denote the 2nd-stage breakage-connection
network and the corresponding network parameters, respectively.
Finally, the outputs of the 1st and 2nd stages
are merged to generate the whole airway tree prediction. In
summary, the breakage-connection network based on breakage
attention maps is the main novelty of T14 method.
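A simplified sketch of the breakage attention map H is given below, assuming the second-shortest distance is computed with a per-component Euclidean distance transform (a plain stand-in for the linear-time transform of Maurer et al. (2003)).

```python
import numpy as np
from scipy import ndimage
from scipy.special import expit

def breakage_attention_map(pred):
    """pred: binary 3D prediction. For every background voxel, take the
    second-shortest distance to the separate connected components and squash
    it with the parameterized sigmoid H = Sigmoid(5 - H) of the text."""
    labeled, n = ndimage.label(pred)
    if n < 2:                                   # no breakage without >= 2 components
        return np.zeros(pred.shape, dtype=np.float32)
    dists = np.stack([ndimage.distance_transform_edt(labeled != i)
                      for i in range(1, n + 1)], axis=0)
    second_shortest = np.sort(dists, axis=0)[1]
    h = expit(5.0 - second_shortest)            # ball-like response around breakages
    return (h * (pred == 0)).astype(np.float32)
```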
4.2.4. D.neu204
The team of neu204 (T7) developed a two-stage network for
airway segmentation, as described in Figure 8. In stage 1, the
3D computed tomography (CT) scans and the full airway
annotation were fed into the proposed network; in stage 2, the
CT scans and the partial intra-pulmonary airway annotation
were fed instead. Then the results of the two
stages were merged as the final prediction.
Fig. 8. The proposed two-stage network for airway segmentation by the team neu204 (T7). The first stage trains on the whole airway while the second stage refines the airways inside the lungs. CoT refers to the contextual transformer block.
3D UNet (Çiçek
et al., 2016) was chosen as the basic neural network architec-
ture in both two stages. They replaced one of the 3 ×3×3
convolutional kernel layers with the emerging contextual trans-
former (CoT) (Li et al., 2022b) module in both encoder and
decoder parts. The design of the CoT capitalizes on the con-
textual information among input keys to guide the learning of
the dynamic attention matrix and thus strengthens the capac-
ity of visual representation. The CoT module aimed to exploit
the rich contexts among the neighbor keys, which is beneficial
to highlighting the topological connection in the airway tree
structure. In conclusion, the independent processing of airways
based on their locations and the introduction of the CoT module are the
main contributions.
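A heavily simplified 3D sketch of the CoT idea is shown below (static context from a 3x3x3 convolution over the keys, an attention map predicted from the concatenation of static context and input, and a dynamic context obtained by modulating the values); it is not the exact module of Li et al. (2022b).

```python
import torch
import torch.nn as nn

class SimplifiedCoT3D(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.key_embed = nn.Conv3d(channels, channels, 3, padding=1, bias=False)
        self.value_embed = nn.Conv3d(channels, channels, 1, bias=False)
        self.attn = nn.Sequential(
            nn.Conv3d(2 * channels, channels, 1, bias=False),
            nn.InstanceNorm3d(channels),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, 1, bias=False),
        )

    def forward(self, x):
        static_ctx = self.key_embed(x)                 # contextualized keys (static context)
        values = self.value_embed(x)
        gate = torch.sigmoid(self.attn(torch.cat([static_ctx, x], dim=1)))
        dynamic_ctx = gate * values                    # attention-modulated values
        return static_ctx + dynamic_ctx                # fuse static and dynamic context
```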
4.2.5. E.Sanmed AI
The team of Sanmed AI (T1) designed a modified attention
UNet for pulmonary airway tree modeling. First, a channel- and
spatial-wise attention module, the Project & Excite (PE) (Rick-
mann et al., 2019) module was embedded into each layer, fol-
lowing the common convolution operations. PE squeezes the
feature maps along different axes of slices separately to retain
more spatial information rather than perform global average
pooling. The extracted spatial information is further used in the
excitation step. It helps the network to learn the important fea-
ture information of the airways and improve the generalization
ability of the model.
Secondly, the coordinate attention mechanism was applied
to the last decoder layer. It records the local position of the
corresponding patch within the whole image. Due to the
GPU memory limit, the 3D CT images were cropped into sub-volumes
as model inputs, and such a patch-based training strategy caused
a loss of position and context information. The coordinate map
was introduced to make up for this information loss. It was
inserted into the high-dimensional feature maps of the last
decoder layer because they share the same spatial dimension. Simi-
lar to other airway segmentation works (Qin et al., 2020; Zhang
et al., 2021a, 2022b), the Dice with Focal loss was applied in
all experiments. To sum up, the fusion of attention map and the
coordinate is the key design of T1 method.
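The coordinate map can be sketched as below: each voxel of a cropped patch is tagged with its global position, normalized to [-1, 1] as in the T1 training strategy, and concatenated to the last-decoder feature maps; the exact fusion operation is an assumption.

```python
import torch

def coordinate_map(patch_start, patch_size, volume_size):
    """Return a (3, D, H, W) tensor of global (z, y, x) coordinates of the patch,
    normalized to [-1, 1] with respect to the whole CT volume."""
    axes = []
    for s, p, v in zip(patch_start, patch_size, volume_size):
        coords = torch.arange(s, s + p, dtype=torch.float32)
        axes.append(2.0 * coords / max(v - 1, 1) - 1.0)
    zz, yy, xx = torch.meshgrid(*axes, indexing="ij")
    return torch.stack([zz, yy, xx], dim=0)

# e.g., a 128^3 patch starting at voxel (64, 32, 32) of a 512^3 CT volume
coord = coordinate_map((64, 32, 32), (128, 128, 128), (512, 512, 512))
```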
4.2.6. F.dolphins
The team of dolphins (T5) proposed a 3DResNet (Tran et al.,
2018) model with deep supervision for the segmentation of pulmonary
airways. The convolutional block consisted of convo-
lutional layers with Batch Normalization (Ioffe and Szegedy,
2015) and ReLU activation function to extract the different fea-
ture maps from each block on the encoder side. The resid-
ual block was inserted at each encoder block with skip con-
nection. The feature concatenation was executed at each en-
coder and decoder block except the last 1 ×1 convolutional
layer. The three-level deep-supervision technique was applied
to generate the aggregated loss between ground-truth and pre-
diction. In addition, they used the nnUNet for one-fold cross-
validation of training volumes with ground-truth and validation
volumes with pseudo labels. In summary, the introduction of
deep supervision and the leverage of pseudo labels are the key
components of T5 method.
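A minimal sketch of the three-level deep supervision is given below, assuming the ground-truth is resampled to each side-output resolution and the per-level Dice losses are aggregated with decaying weights (the weights are illustrative).

```python
import torch
import torch.nn.functional as F

def dice_loss(pred, target, eps=1e-6):
    inter = (pred * target).sum()
    return 1.0 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def deep_supervision_loss(side_outputs, target, weights=(1.0, 0.5, 0.25)):
    """side_outputs: list of logits at decreasing resolutions; target: (B, 1, D, H, W)."""
    total = 0.0
    for logits, w in zip(side_outputs, weights):
        t = F.interpolate(target, size=logits.shape[2:], mode="nearest")
        total += w * dice_loss(torch.sigmoid(logits), t)
    return total
```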
4.2.7. G.suqi
The team of suqi (T17) proposed a Dense-UNet based on the
nnUNet for airway segmentation. As the airway is the fine-
grained structure, to prevent the network from losing too much
information during upsampling and downsampling, they used
transposed convolution to realize upsampling, and used convo-
lution with a step size of 2 to realize downsampling. Further, to
enhance the feature embedding and alleviate the feature forget-
ting, the dense block (Huang et al., 2017) was introduced into
the nnUNet. Specifically, the encoder mapped the features into
the hidden space with the size of (6,5,5) and all 1 ×1×1 con-
volutions in each dense block had 256 channels. They also used
the outputs of the encoder, except for the lowest two layers,
to predict probability maps and obtain additional supervision
signals, which is conducive to the convergence of the network. In
conclusion, the integration of the dense block into nnUNet is
the key solution proposed by T17 method.
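A sketch of a 3D dense block in the spirit of Huang et al. (2017) is given below; the 1x1x1 bottleneck width of 256 channels follows the T17 description, while the growth rate and depth are illustrative choices.

```python
import torch
import torch.nn as nn

class DenseBlock3D(nn.Module):
    def __init__(self, in_channels, growth_rate=32, num_layers=4, bottleneck=256):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv3d(ch, bottleneck, 1, bias=False),       # 1x1x1 bottleneck (256 ch.)
                nn.InstanceNorm3d(bottleneck), nn.ReLU(inplace=True),
                nn.Conv3d(bottleneck, growth_rate, 3, padding=1, bias=False),
            ))
            ch += growth_rate

    def forward(self, x):
        features = [x]
        for layer in self.layers:                                # dense connectivity
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)
```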
4.2.8. H.notbestme
The team of notbestme (T20) developed a multi-resolution
network for airway segmentation. It implemented a three-
axis fusion, computationally inexpensive self-attention mech-
anism. The multi-resolution network was designed to enhance
the multi-scale mining ability of the model and adapt to the seg-
mentation task of different objects due to the significant differ-
ence between the trachea and small airways. Specifically, they
used the interpolation algorithm to resize the original input to
different resolution sizes and fed them into subnetworks whose
weights are not shared.
As transformers have expanded into the field of computer vision
(Dosovitskiy et al., 2020; Liu et al., 2021), the shortcomings
of CNNs in capturing global dependencies have received
increasing attention from researchers. However, it is impractical
to directly transfer current transformer structures to
volumetric medical images due to the limitation of computational
resources. To deal with this problem, they designed
Table 6. Brief summary and comparison of the top 10 methods. It includes the concise description of the method and the training strategy. The order of
ROI size is (depth, height, width).
Team name Main novelty/contribution of the method Training strategy
timi, T6
•Use WingsNet (Zheng et al., 2021b) as the backbone, adopt the local-imbalance-based (Zheng et al., 2021a) and
the centerline-distance-based weight (Zheng et al., 2021b; Zhang et al., 2022a) to dynamically re-weight each voxel.
•Design the small airway oversampling and skeleton-based hard-mining strategies.
•ROI size: 128 ×128 ×128. Batchsize is set to 24.
AdamW optimizer with learning rate 0.0001 is used.
•Dice loss and random crop were used in the training stage 1 for
100 epochs. The variant of the GUL and the designed sampling
strategies were adopted in the training stage 2 for 50 epochs. The
training stage 3 continued for 30 epochs with the combination of
the variant of GUL and 0.5 * Dice loss.
YangLab,T4
•Take the Attention U-Net (Oktay et al., 2018) structure as the backbone, propose the
channel-specific fuzzy attention layer combined with fuzzy logic.
•The JCAM loss is proposed to enhance the continuity and completeness of airways.
Correspondingly, a CCF-score is designed for the measurement.
•Adopt the average size of the 3D minimum bounding box of
the ground-truth as the patch size. Total epoch is set to 200.
Initial learning rate is 0.001 and a decay of 0.5 at the 20th,
50th, 80th, 110th and 150th epoch.
•The online smart patch sampling strategy is used in the
training procedure. It ensures the cropped patches
own enough centerline or foreground voxels.
deeptree damo, T14
•Take the nnU-Net (Isensee et al., 2021) as the backbone for both two stages. Formulate the
anatomy-aware multi-class segmentation task for airways that share
large context variation of different branches.
•Introduce a breakage attention map that highlights the breaking regions.
Train a breakage-connection network with the simulated data.
•A modified nnUNet is adopted. Reduce the downsampling
operation to 3 times, and enlarge the width of the convolutional
layers at the deeper blocks to increase capacity.
•A linear time distance transform algorithm (Maurer et al., 2003) is adopted to
calculate the breakage attention map. The curve skeleton and
skeleton-to-volume propagation algorithm (Jin et al., 2016) is applied to
create simulated training samples for deep breakage connection.
neu204, T7
•Take the 3D-UNet (Çiçek et al., 2016) as the basic architecture for both stages.
•The 1st stage processes the full airway tree while the 2nd stage only handles
the airway inside the lungs. The contextual transformer (CoT) (Li et al., 2022b) module
is embedded in both encoders and decoders.
•ROI size: 64 ×192 ×192 and 64 ×128 ×128 for the 1st- and
2nd- stage training, respectively. The CoT replaces one of the
3×3×3 convolutions, followed by IN (Ulyanov et al., 2016) and ReLU (Agarap, 2018).
•Adam optimizer with an initial learning rate of 0.01 is adopted.
Exponential decay solution (rate:0.9) is used after each epoch.
Sanmed AI, T1
•Take the 3D-UNet (C¸ ic¸ek et al., 2016) as the backbone and add the attention mechanisms.
•Project & Excite (PE) (Rickmann et al., 2019) module was embedded into each layer to
recalibrate feature maps. The coordinate map is applied to compensate for
the information loss due to the patch-wise training procedure.
•ROI size: 128 ×128 ×128. The coordinate map is normalized
to [-1, 1] in three axes. The warm-up cosine annealing learning
strategy is used. Learning rate ranges from 1e-5 to 0.01 with
cycle period of 20 epochs and decay ratio is 0.5.
•Choose Dice with Focal loss function in all experiments.
dolphins, T5
•Take the 3DResNet (Tran et al., 2018) as the backbone. The deep supervision is introduced
to generate the aggregated loss. Residual block is inserted via the skip connection.
•Use nnUNet to perform one-fold cross-validation of training volumes
with ground-truth and validation volumes with pseudo labels.
•ROI size: 16 ×256 ×256. Batchsize is set to 2. The learning
rate of 0.0004 with Adam optimizer is used. Total epoch is
set to 200 with an early stop solution of 20 epochs.
suqi, T17
•Take the nnUNet (Isensee et al., 2021) as backbone with the introduction of the dense block (Huang et al., 2017).
•Adopt some intermediate results to conduct deep supervision, which
is conducive to the convergence of the network.
•ROI size: 96 ×160 ×160. Batchsize 2. Resample pixel spacing.
Total epoch is 1000, SGD optimizer with the initial learning rate
0.01 is used. The weighted Dice and binary cross entropy loss is
applied in all experiments.
notbestme, T20
•Adopt the transformer structure as backbone. To reduce
computational cost, a 2.5D Compute-cheap Gated Global Attention is designed.
•A multi-resolution network is designed to enhance multi-scale mining ability.
•ROI size: 32 ×160 ×160. The final 5362 image patches are
extracted for training. Batchsize is 2 and total epoch is 50.
•CE loss for the first 5 epochs and then use pixel-wise weighted
CE loss derived from categorical information distribution.
Adam optimizer with a learning rate of 0.0005 is used for training.
lya, T9
•Take the nnUNet (Isensee et al., 2021) as the backbone. Besides original data
augmentation, Elastic and brightness transformation are introduced.
•The small branches receive more attention in the sampling procedure,
and a combination of TopK and Dice loss is designed to conduct hard mining.
•The Adam optimizer with an initial learning rate of 3e-4 is used.
Total epoch is 1000 and Batchsize is set to 2.
•The TopK loss is intractable, thus the combined loss function
is only used to fine-tune the network.
dnai, T10 •Apply the 3D UNet (Çiçek et al., 2016) for coarse segmentation and the
Attention UNet (Oktay et al., 2018) for refinement.
•ROI size: 96 ×160 ×160 for the coarse stage,
48 ×80 ×80 for the refining stage.
•75% labeled patches and 25% random patches are sampled for the
coarse stage training while 25% random patches and 75%
patches contained peripheral airways are extracted for fine part.
the 2.5D Compute-cheap Gated Global Attention for 3D medi-
cal images. The self-attention calculation among three matrices
(Q,K,V) followed the standard criterion (Vaswani et al., 2017)
while they adopted the Pooling operation to reduce feature di-
mension. In addition, they used the attention map to enhance
the expression of the Value matrix (V) via the Gated Linear
Unit (Dauphin et al., 2017) mechanism. The main novelty of
the T20 method is the proposed 2.5D compute-cheap gated global
attention introduced into the transformer backbone.
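A speculative sketch of such a compute-cheap attention is given below: keys and values are spatially pooled before the scaled dot-product attention, and the attended context gates the input in a GLU-like fashion. The pooling factor and gating form are our assumptions and do not reproduce the exact T20 design.

```python
import torch
import torch.nn as nn

class PooledGatedAttention(nn.Module):
    def __init__(self, channels, pool=4):
        super().__init__()
        self.q = nn.Linear(channels, channels)
        self.kv = nn.Linear(channels, 2 * channels)
        self.pool = nn.AvgPool1d(pool)                 # reduce the key/value length
        self.scale = channels ** -0.5

    def forward(self, tokens):                         # tokens: (B, N, C) flattened voxels
        q = self.q(tokens)
        pooled = self.pool(tokens.transpose(1, 2)).transpose(1, 2)   # (B, N/pool, C)
        k, v = self.kv(pooled).chunk(2, dim=-1)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        context = attn @ v                             # (B, N, C) attended context
        return tokens * torch.sigmoid(context)         # GLU-style gating of the input
```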
4.2.9. I.lya
The team of lya (T9) applied an improved nnUNet for the air-
way segmentation. More data augmentation, a specified voxel
sampling strategy, and a modified loss function were incorpo-
rated into the nnUNet to improve the segmentation performance
of the small peripheral bronchi. Besides the transformation
by nnUNet config, they adopted the elastic transformation and
brightness transformation to conduct data augmentation. Fur-
ther, they replaced the percentage clipping with a fixed CT win-
dow. The window was set to [-1200,600], and the maximum
HU value was randomly selected from 400 to 600 for data aug-
mentation in the training stage.
To handle the intra-class imbalance problem of the airways,
they made efforts from two aspects. For one thing, they dis-
carded the random sample solution and located the sampling
central points more on the small branches. For another, deep
neural networks tend to fit the majority class, and thus the small
peripheral bronchi are easily missed. They therefore applied a
combination of the TopK loss function and the Dice loss function:
$$\mathcal{L}(y, \hat{y}) = \mathcal{L}(y, \hat{y})_{TopK} + \mathcal{L}(y, \hat{y})_{dice}, \qquad (15)$$
$$\mathcal{L}(y, \hat{y})_{TopK} = -\frac{1}{K}\sum_{i=1}^{K}\left[ y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right], \qquad (16)$$
where the TopK loss aims to force the network to focus on
the hard samples. In summary, the key design of the T9 method
lies in paying more attention to small branches in the sampling
procedure and in the compound loss function.
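A minimal sketch of the compound loss of Eqs. (15)–(16) is given below; the ratio of retained hardest voxels is an illustrative choice.

```python
import torch

def topk_bce_loss(pred, target, k_ratio=0.1, eps=1e-7):
    # Eq. (16): binary cross-entropy averaged over the K hardest voxels only.
    pred = pred.clamp(eps, 1.0 - eps)
    bce = -(target * pred.log() + (1.0 - target) * (1.0 - pred).log()).flatten()
    k = max(1, int(k_ratio * bce.numel()))
    return torch.topk(bce, k).values.mean()

def dice_loss(pred, target, eps=1e-6):
    inter = (pred * target).sum()
    return 1.0 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def combined_loss(pred, target):
    # Eq. (15): TopK term plus Dice term.
    return topk_bce_loss(pred, target) + dice_loss(pred, target)
```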
4.2.10. J.dnai
The team of dnai (T10) designed a two-stage coarse-to-fine
framework for airway segmentation. In the coarse stage, they
chose the 3D UNet (Çiçek et al., 2016) as the backbone and
used Instance Normalization (Ulyanov et al., 2016) and the
ConvTranspose operation instead of the original components. The
patch size was 96 ×160 ×160 for the coarse stage training, with
75% of patches labeled and 25% patches sampled randomly.
In the refining stage, they adopted a relatively shallow network
based on the Attention UNet (Oktay et al., 2018), with a smaller
patch size of 48 ×80 ×80. These patches were sampled with 25%
chosen randomly and 75% containing peripheral airway branches,
which indicates that the refining stage aimed to improve the
segmentation performance of the peripheral airways. The combination of
cross entropy and Dice loss was used in all experiments. The
two-stage coarse-to-fine framework is the main design of the T10
method.
4.3. Consensus on Effective Methods
After introducing the main contributions of the individual methods,
we summarize the consensus on effective methods to deal
with the challenges of pulmonary airway segmentation.
Solution 1: Multi-stage Solution (S1). The multi-stage
training pipeline has demonstrated its advantage for pulmonary
airway segmentation. First, the lung region extraction is a sim-
ple yet effective hard attention mechanism to focus on related
regions, which can deal with the leakage challenge (C1). Sec-
ondly, the initial training stage can obtain the preliminary pre-
dictions, which provide useful information for the following
training stage to acquire a more complete airway tree structure,
such as hard sample mining (T6) and breakage attention map
calculation (T14).
Solution 2: Improve Intra-class Discrimination (S2). Im-
proving the intra-class discrimination ability is a reasonable
choice to tackle the C2, breakage challenge. The extra infor-
mation can be extracted from the CT scans and binary airway
annotation, such as the centerline points (T4, T6, T14), radius
(T14), and spatial location (T7) of the branches. This additional
knowledge can be leveraged in several ways to
improve intra-class discrimination: 1) Over-sampling. T4 pro-
posed a smart patch sampling strategy to put more emphasis on
peripheral airways based on the centerline points ratio. 2) Dif-
ferentiate the training procedure between airway branches. T14
formulated a multi-class task between the different levels of air-
way branches, hence, the multi-level discriminative features are
extracted from different branches of airways. T7 designed a
two-stage framework, the first stage was for the whole airway
tree segmentation while the second stage was only trained with
the partial intra-pulmonary airways.
Solution 3: Novel Objective Functions (S3). Designing
novel loss functions that emphasize topology completeness and
topology correctness is beneficial to deal with the challenge of
Leakage (C1) and Breakage (C2), Robustness and Generaliza-
tion (C3). For example, T4 proposed a JCAM loss function that
focuses on topological errors. The JCAM measures the projec-
tion error and centerline detected ratio error. T14 proposed the
breakage attention map to construct the objective function used
in the breakage-connection network. T6 adopted the variant of
the general union loss to encourage the network to preserve
airway continuity. The objective functions that pay attention to
the topology could harness the high-level feature of an airway
tree structure, which may improve the robustness and general-
ization ability of the algorithms.
5. Results
In this section, we reported the obtained results in the val-
idation phase (Table 7) and the test phase (Table 8, Table 9),
respectively. The results of the validation phase were stated
from an overall statistical perspective while the analysis of the
test phase results focused on the top 10 algorithms, which is
in accordance with Section 4. We conducted a detailed
comparison among the top 10 algorithms, including quantitative
and qualitative analysis, model complexity analysis, deep
Table 7. Quantitative results on the validation set of the ATM'22 challenge achieved by participants. The results are reported in the format of mean ± standard
deviation. In the validation set, the number of sample CT scans is 50, and the total number of airway branches is 212.48±52.10. The results are first post-
processed by the largest component extraction and eventually reported by the Tree length detected rate (i.e., TD, %), Branch detected rate (i.e., BD, %),
Dice Similarity Coefficient (i.e., DSC, %), Precision (%), Sensitivity (i.e., Sen, %) and Specificity (i.e., Spe, %).
Team name TD (%) ↑BD (%) ↑DSC (%) ↑Precision (%) ↑Sen (%) ↑Spe (%) ↑
Sanmed AI (V1) 89.874±6.609 85.102±10.085 95.555±1.376 95.551±2.385 95.644±2.483 99.986±0.008
YangLab (V2) 94.406±3.798 91.302±6.439 95.926±1.249 97.180±1.941 94.766±2.270 99.991±0.006
notbestme (V3) 85.756±7.560 79.181±11.514 95.212±1.893 95.706±1.986 94.824±3.511 99.987±0.007
suqi (V4) 80.680±7.475 70.555±10.284 94.713±1.178 96.191±1.431 93.302±1.705 99.989±0.004
cvhthreedee (V5) 87.856±8.482 80.291±13.996 94.939±1.597 96.839±1.948 93.188±2.890 99.990±0.006
LinkStartHao (V6) 89.764±7.611 83.439±12.780 94.392±1.674 95.758±2.091 93.140±2.860 99.987±0.007
neu204 (V7) 94.441±4.008 92.279±5.987 95.800±1.142 93.451±1.929 98.302±1.221 99.979±0.007
miclab (V8) 82.865±5.861 74.223±40.124 95.501±1.007 96.557±1.575 94.507±1.785 99.990±0.005
blackbean (V9) 89.422±7.675 83.210±12.401 94.554±1.778 95.461±2.219 93.730±2.747 99.986±0.008
Median (V10) 88.765±7.669 82.441±12.040 94.667±1.699 95.642±2.088 93.769±2.579 99.987±0.007
lya (V11) 89.613±7.395 83.583±12.306 94.371±1.566 95.146±2.661 93.689±2.339 99.985±0.009
satsuma (V12) 89.783±7.734 83.571±12.822 94.649±1.588 95.574±2.233 93.817±2.623 99.986±0.008
ailab (V13) 90.989±6.912 86.102±11.011 94.624±1.737 95.211±2.334 94.111±2.572 99.985±0.008
timi (V14) 95.866±3.366 94.921±4.399 93.987±2.337 94.041±2.837 94.008±3.075 99.981±0.010
sen (V15) 89.295±7.587 83.269±12.300 93.614±2.306 95.000±2.473 92.386±3.879 99.985±0.009
MibotTeam (V16) 89.293±8.577 81.184±13.734 94.772±1.275 95.596±2.353 94.044±2.354 99.986±0.008
CITI-SJTU (V17) 91.840±6.000 87.239±9.724 92.943±1.595 91.132±2.526 94.891±1.981 99.971±0.011
SEU (V18) 84.704±10.573 76.672±16.599 93.776±1.885 95.240±2.561 92.468±3.325 99.985±0.009
deeptree damo (V19) 97.369±2.957 96.717±3.711 92.812±1.488 87.324±2.652 99.090±0.449 99.955±0.013
CBT IITDELHI (V20) 73.928±10.847 65.672±12.250 94.336±1.897 97.127±1.505 91.812±3.768 99.992±0.004
dolphins (V21) 83.478±12.616 77.496±15.540 93.228±2.160 95.961±1.652 90.814±4.478 99.988±0.006
bms410 (V22) 60.705±8.762 46.924±6.555 88.394±2.562 99.515±1.464 79.635±4.252 99.999±0.004
airwayseg (V23) 78.128±16.110 72.714±16.955 92.828±3.555 95.136±2.000 90.939±6.606 99.985±0.007
atmmodeling2022 (V24) 63.540±20.759 56.034±21.56 92.416±4.053 97.156±2.503 88.556±7.644 99.992±0.008
bwhacil (V25) 72.524±9.621 58.391±9.181 87.628±2.105 83.076±3.826 92.963±3.532 99.941±0.018
dnai (V26) 87.596±5.529 79.467±9.031 91.341±1.409 91.234±2.159 91.473±1.205 99.973±0.009
mlers (V27) 74.207±12.406 67.412±14.011 90.702±1.589 89.926±3.309 91.636±2.490 99.967±0.014
fme (V28) 74.785±6.849 55.643±7.580 87.053±1.408 87.978±1.965 86.213±2.401 99.963±0.010
biomedia‡(V29) 60.598±12.005 51.359±11.437 74.778±10.255 91.687±1.495 64.424±14.166 99.982±0.005
ntflow‡(V30) 28.372±6.008 21.930±5.304 86.452±3.299 95.834±1.595 78.914±5.320 99.990±0.004
‡Their submissions before the deadline could not be correctly evaluated by grand-challenge.org. We downloaded their results and evaluated them on local devices.
analysis of the relationships among metrics, and ranking stabil-
ity analysis. It can be noticed that the airway segmentation task
itself is challenging. Our analysis focused only on the results of
the top 10 methods to derive critical observations and effective
practices, and thereby provide insights for the research community.
Note that even the tenth-place method produced better results than
the other successfully participating teams (i.e., better than the average
of all valid results).
5.1. Validation Phase
Overall Outcome: 50 CT scans without the pulmonary air-
way ground-truth are provided for evaluation in the validation
phase. The participants were required to submit the binary pre-
diction results to the platform of grand-challenge.org, where
the evaluation was automatically executed. In the validation
phase11, we received 30 valid submissions from different teams,
23 of which were documented in sufficient detail to report their main
architectures and loss functions, as summarized in Figure 9.
All the results are derived from the best submission of each
team before the deadline of the validation phase.
Architectures and Loss Functions:
Fig. 9. Main network architectures (left) and loss functions (right) adopted by the participants in the validation phase (n = 23 teams). Architectures: 3D-UNet (11/23), nnUNet (7/23), Transformer (2/23), EfficientDet (1/23), WingsNet (1/23), ResUNet (1/23). Loss functions: Dice + Cross entropy (8/23), Compound loss (7/23), Dice loss (6/23), Cross entropy (1/23), Dice + Focal (1/23).
As seen in Figure 9,
the 3D UNet (Çiçek et al., 2016) and the nnUNet (Isensee
et al., 2021) were the most popular network architectures adopted
by the participants. Two teams adopted transformer architectures
(Liu et al., 2021; Tang et al., 2022) as the backbone.
The WingsNet (Zheng et al., 2021b), EfficientDet (Tan et al.,
2020), and ResUNet (Diakogiannis et al., 2020) were each used
by one team. The 3D UNet and the nnUNet are
demonstrated to be the most effective architectures for the med-
ical image segmentation task, hence, they are the most popu-
lar options for the participants. The WingsNet was adopted to
explicitly tackle the inter-class imbalance problem that existed
in airway segmentation. For easy and fast multi-scale feature
fusion, one team used the EfficientDet. The ResUNet was de-
signed to alleviate the problem of vanishing and exploding gra-
dients, thus achieving consistent training as the depth of the
network increases. As for the aspect of the loss functions, the
Dice loss (Milletari et al., 2016) with its variants (e.g., Dice loss
with Focal loss (Zhu et al., 2019), Dice loss with Cross Entropy
11Full ranking results of the validation phase (Time period: 1 Jun 2022
– 17 Aug 2022): https://atm22.grand-challenge.org/evaluation/
validation-phase/leaderboard/
loss (Taghanaki et al., 2019)) dominated. Some
other teams designed compound loss functions. For simplicity,
we use the team indices defined in Table 3. V9 de-
signed the centerline weighted loss function. In addition to the
centerline weighted loss function, V14 further used the local-
imbalanced-based loss function to dynamically re-weight each
voxel. V19 proposed a breakage-sensitive loss function in the
second stage to repair the breakage regions within the airways.
V25 explored the clDice loss (Shit et al., 2021) combined with
the multi-weighted loss to preserve the topology of the airway.
V2 designed a Jaccard Continuity and Accumulation Mapping
(JCAM) loss to tackle the discontinuity problem. V14, V2, V19
achieved the leading performance in the validation phase and
the later test phase, which demonstrated that the reasonable de-
sign of loss function is effective for the pulmonary airway seg-
mentation task. This experimental observation is in line with
the consensus of effective methods finding, the novel objective
functions that emphasize the topological completeness and the
topological correctness are beneficial to deal with challenge of
the Leakage (C1) and Breakage (C2), Robustness and General-
ization (C3). The details of all qualified papers in the validation
phase could be found in our official repository collection12.
Quantitative Results: Six metrics were reported in both the
validation phase and test phase. Four metrics, including the
tree length detected rate (TD, %), branch detected rate (BD,
%), Dice similarity coefficient (DSC, %) and Precision, are
used for the score calculation. In addition, the sensitivity (Sen,
%) and the specificity (Spe, %) were also covered to present
a more comprehensive report. The results are presented in
Table 7. It was observed that no single team could achieve
the best performance on all six metrics. As for the topolog-
ical completeness of airway segmentation results, teams V19
and V14 achieved remarkably higher performance than other
teams. Specifically, V19 achieved 97.369% TD and 96.717%
BD, and V14 attained 95.866% TD and 94.921% BD, which
substantially exceeded the average results (84.696% TD and
77.680% BD). As for the topological correctness, V2 achieved
the best performance on the DSC (95.926%) and V22 achieved
the highest Precision (99.515%). However, the other met-
rics of V22 were quite low, only 60.705% TD, 46.924% BD,
and 88.394% DSC. The underlying reason is that all metrics
were calculated on the largest component of the airway prediction;
thus, the Precision can be very high even when the breakage
(C2) is severe (e.g., only the trachea and the main bronchi
were preserved). On the contrary, V2 obtained the second-best
performance of the Precision (97.180%) while achieving the
best DSC. It is worthwhile noting that in the validation phase,
V2 achieved satisfactory DSC and Precision while maintain-
ing competitive TD (94.406%) and BD (91.302%). Similarly,
V14 preserved compelling results of DSC (93.987%) and Pre-
cision (94.041%) under the circumstance that they achieved the
highest performance on topological completeness. Another im-
portant observation is that the high performance of topologi-
12ATM’22 Validation Phase Papers: https://drive.google.com/
drive/folders/1FTrc1AGqEqNDHvfCtEpQG3O2agxHXu46?usp=share_
link
Table 8. Quantitative results on the full hidden test set of the ATM'22 challenge achieved by participants. The results are reported in the format of mean ±
standard deviation. In the hidden test set, the number of sample CT scans is 150, and the total number of airway branches is 178.91±48.81. The results
are first post-processed by the largest component extraction and eventually reported by the Tree length detected rate (i.e., TD, %), Branch detected rate
(i.e., BD, %), Dice Similarity Coefficient (i.e., DSC, %), Precision (%), Sensitivity (i.e., Sen, %) and Specificity (i.e., Spe, %).
Team name TD (%) ↑BD (%) ↑DSC (%) ↑Precision (%) ↑Sen (%) ↑Spe (%) ↑
Sanmed AI (T1) 88.843±7.250 83.350±10.900 94.969±1.800 95.055±3.210 95.047±3.349 99.984±0.011
fme (T2) 70.695±12.393 54.615±12.946 87.986±10.177 87.137±10.730 90.460±4.874 99.698±2.418
LinkStartHao (T3) 81.721±10.295 71.140±15.614 92.938±2.133 96.140±2.414 90.128±4.438 99.988±0.008
YangLab (T4) 94.512±8.598 91.920±9.435 94.800±7.925 94.707±8.302 95.015±8.240 99.985±0.010
dolphins (T5) 90.134±6.477 84.201±11.151 92.734±2.094 94.656±3.434 91.122±4.273 99.983±0.012
timi (T6) 95.919±5.234 94.729±6.385 93.910±3.682 93.553±3.420 94.500±5.168 99.979±0.012
neu204 (T7) 90.974±10.409 86.670±13.087 94.056±8.021 93.027±8.410 95.284±8.581 99.979±0.013
blackbean (T8) 82.103±10.719 71.418±16.435 93.153±2.284 96.146±2.380 90.545±4.748 99.988±0.008
lya (T9) 85.215±9.146 75.705±14.887 93.758±2.174 96.501±2.908 91.412±4.795 99.989±0.010
dnai (T10) 86.733±5.393 77.888±8.703 90.871±1.748 91.674±2.787 90.871±1.748 99.974±0.011
bms410 (T11)⋆3.898±6.481 2.812±5.499 16.965±19.429 77.583±36.470 10.920±14.278 99.997±0.005
miclab (T12) 75.408±14.094 65.994±17.667 93.493±2.678 96.440±2.565 91.035±5.907 99.989±0.009
CITI-SJTU (T13) 83.545±9.942 73.012±15.854 92.443±2.195 94.756±2.911 90.445±4.384 99.984±0.010
deeptree damo (T14) 97.853±2.275 97.129±3.411 92.819±2.191 87.928±4.181 98.448±1.402 99.957±0.018
CBT IITDELHI (T15) 66.588±26.624 59.044±24.793 81.280±30.103 94.865±2.810 79.892±29.790 99.984±0.010
bwhacil (T16) 75.556±24.091 68.478±25.843 81.380±13.376 80.076±8.127 87.180±18.607 99.927±0.040
suqi (T17) 89.209±7.338 82.164±12.264 93.646±2.102 95.777±3.318 91.839±4.378 99.987±0.012
satsuma (T18) 81.565±11.017 70.819±16.828 93.307±2.196 96.181±2.411 90.813±4.745 99.988±0.008
Median (T19) 78.653±10.365 68.314±14.529 93.119±2.095 96.159±2.305 90.443±4.361 99.988±0.008
notbestme (T20) 87.518±9.028 81.343±13.560 94.515±2.270 96.590±2.673 92.701±4.325 99.989±0.009
biomedia (T21) 64.254±11.578 53.988±12.679 80.370±11.816 93.533±2.953 71.986±15.532 99.984±0.007
⋆The results are abnormal and hence were excluded from the final ranking.
Table 9. Quantitative results on the noisy domain (i.e., COVID-19 CT scans) of the hidden test set achieved by participants. The results are reported in
the format of mean ± standard deviation. In the hidden test set, the number of COVID-19 CT scans is 58, and the total number of airway branches is
167.17±34.97. The results are first post-processed by the largest component extraction and eventually reported by the Tree length detected rate (i.e., TD,
%), Branch detected rate (i.e., BD, %), Dice Similarity Coefficient (i.e., DSC, %), Precision (%), Sensitivity (i.e., Sen, %) and Specificity (i.e., Spe, %).
Team name TD (%) ↑BD (%) ↑DSC (%) ↑Precision (%) ↑Sen (%) ↑Spe (%) ↑
Sanmed AI (T1) 83.517±6.686 74.562±9.299 94.615±1.202 97.533±0.913 91.898±2.204 99.991±0.004
fme (T2) 57.623±5.934 42.319±4.507 88.217±1.819 91.703±1.110 85.036±3.001 99.973±0.004
LinkStartHao (T3) 72.648±7.033 56.880±8.482 91.310±1.626 97.398±0.656 85.993±2.945 99.991±0.003
YangLab (T4) 92.358±4.174 88.218±5.802 94.982±0.998 97.052±1.309 93.036±1.919 99.990±0.006
dolphins (T5) 84.061±5.356 73.740±8.478 91.984±1.544 97.336±0.898 87.241±2.803 99.991±0.004
timi (T6) 94.251±3.541 92.049±4.898 95.063±1.227 96.026±1.307 94.147±1.978 99.986±0.005
neu204 (T7) 86.021±6.614 77.724±9.825 94.672±1.055 96.186±1.860 93.280±2.426 99.986±0.008
blackbean (T8) 72.231±7.783 56.094±9.758 91.176±1.785 97.390±0.656 85.768±3.198 99.992±0.003
lya (T9) 76.528±5.575 61.400±8.787 91.932±1.602 98.355±0.794 86.350±2.901 99.995±0.003
dnai (T10) 84.181±5.538 73.619±8.026 90.191±1.482 92.911±1.844 87.664±2.129 99.976±0.009
bms410 (T11)⋆2.508±2.631 1.663±2.247 8.869±8.785 71.356±36.422 4.895±5.198 99.997±0.004
miclab (T12) 59.650±6.603 46.486±5.433 90.814±1.906 98.427±0.389 84.350±3.242 99.995±0.002
CITI-SJTU (T13) 73.647±6.963 57.194±8.783 90.908±1.559 96.476±0.777 85.998±2.834 99.988±0.004
deeptree damo (T14) 96.242±2.634 94.947±4.104 93.990±1.240 91.049±2.239 97.165±0.866 99.965±0.012
CBT IITDELHI (T15) 53.197±33.505 45.729±29.034 66.297±40.944 96.003±2.967 64.358±39.798 99.986±0.011
bwhacil (T16) 53.486±23.100 42.926±20.643 75.861±17.975 84.930±8.197 73.230±21.150 99.950±0.031
suqi (T17) 83.259±5.518 71.429±8.825 92.493±1.389 98.040±1.081 92.493±1.389 99.993±0.005
satsuma (T18) 71.235±7.414 55.133±9.328 91.316±1.682 97.537±0.630 85.894±3.013 99.992±0.003
Median (T19) 70.054±7.891 55.809±8.596 91.367±1.671 97.388±0.637 86.099±2.946 99.992±0.003
notbestme (T20) 81.283±8.344 70.723±11.425 93.175±1.668 97.917±0.885 88.940±3.199 99.993±0.004
biomedia (T21) 59.444±8.351 47.130±7.753 82.915±11.107 94.014±1.158 82.915±11.107 99.983±0.006
⋆The results are abnormal and hence were excluded from the final ranking.
cal correctness cannot consistently guarantee the topological
completeness of the airways, and vice versa. On the one hand,
taking V1 as an example, they achieved competitive DSC
(95.555%) and Precision (95.551%), higher than the averages
(93.383% DSC and 94.275% Precision); however, their TD and
BD were under 90%. More strikingly, V20 obtained 94.336%
DSC and 97.127% Precision while their TD and BD were as
low as 73.928% and 65.672%, respectively. On the other hand, V19
over-emphasized the topological completeness, and consequently
the topological correctness was inevitably affected.
cally, they achieved the superior performance of the TD and BD
while the DSC and Precision suffered a decrease to 92.812%
and 87.324% respectively. The above findings demonstrated
that the pulmonary airway segmentation task differs from other
common medical segmentation tasks. DSC usually dominates
in the evaluation of many medical segmentation tasks due to
its superiority in measuring overlap-wise accuracy. However,
the overlap-wise accuracy is not sufficient to evaluate the air-
way extraction algorithms because the topology is intrinsically
embedded in the voxel-wise airway annotation data.
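For reference, the voxel-wise part of this evaluation (largest-component post-processing followed by DSC, Precision, Sensitivity and Specificity) can be sketched as below; TD and BD additionally require the skeleton and branch parsing of the ground-truth tree and are omitted, so this is an illustrative sketch rather than the official evaluation code.

```python
import numpy as np
from scipy import ndimage

def largest_component(pred):
    """Keep only the largest connected component, as in the official post-processing."""
    labeled, n = ndimage.label(pred)
    if n == 0:
        return pred
    sizes = ndimage.sum(pred, labeled, index=np.arange(1, n + 1))
    return (labeled == (np.argmax(sizes) + 1)).astype(np.uint8)

def overlap_metrics(pred, gt):
    """Voxel-wise DSC, Precision, Sensitivity and Specificity (in [0, 1])."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    return {
        "DSC": 2 * tp / (2 * tp + fp + fn),
        "Precision": tp / (tp + fp),
        "Sen": tp / (tp + fn),
        "Spe": tn / (tn + fp),
    }
```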
5.2. Test Phase
5.2.1. Overall Outcome
150 CT scans are kept entirely hidden by the organizers,
which means that the input images are inaccessible to the
participants. The participants were required to package their
algorithms into docker images and follow the official instructions
for executing these dockers. 21 docker submissions were received
and successfully executed on our server to generate the final
results. The standard format and instructions of the docker are
provided in our official repository13. A high number of
registrations but only a fraction of fully-completed participations
is a typical phenomenon in biomedical image analysis challenges
(e.g., the Medical Segmentation Decathlon (Antonelli et al., 2022),
a 2019 challenge covering a multitude of tasks and modalities,
with 19/180 successful submissions; the Skin Lesion Analysis
Detection (Codella et al., 2018b) 2017 challenge with 46/593
submissions; or the Multi-Center, Multi-Vendor, and Multi-Disease
Cardiac Segmentation (Campello et al., 2021) 2020 challenge with
16/80 submissions). Some challenge participants register only for
data access and cannot take part in the validation and test phases
before the deadline due to other commitments. Furthermore,
unsatisfactory training and validation results may discourage
them from making a final submission.
ified papers in the final test phase could be found in our official
repository collection14.
5.2.2. Quantitative and Qualitative Comparisons
Performance across Domains: Table 8 and Table 9 respec-
tively reported the quantitative results of the full set and the
13The Docker Tutorial for the ATM’22 Test Phase Submission:
https://github.com/Puzzled-Hui/ATM- 22-Related- Work/tree/
main/baseline-and- docker-example
14ATM’22 Test Phase Papers:https://drive.google.com/drive/
folders/1T9OQ552TZUK5bxKm4D9wPt83AzSgqLAr?usp=sharing
partial COVID-19 set of the hidden test set. The overall re-
sults were broadly similar to those in the validation phase, es-
pecially the top 5 algorithms. The ranking result (seen in Table
10) revealed that the same 5 teams occupied the top 5 posi-
tions in both the validation and test phase, which demonstrated
their generalization ability was superior to the rest of the teams.
T6 and T4 ranked in the top 2 of the whole test phase, which
proved their strong capacity under the comprehensive evalua-
tion system. Compared to the average results (TD: 83.350%,
BD: 75.596%, DSC: 91.277%, Precision: 93.669%), T6 and
T4 achieved better performance. Specifically, T6 achieved
95.919% TD, 94.729% BD, 93.910% DSC, and 93.553% Precision,
and T4 obtained 94.512% TD, 91.920% BD, 94.800%
DSC, and 94.707% Precision. The overall compelling performance
achieved by T6 can be ascribed to their elaborated optimiza-
tion procedure tailored for the pulmonary airway segmentation
task. Furthermore, the large batchsize (24 in their experiments)
may be helpful to explore the features of airway datasets. The
proposed fuzzy attention layer and continuity and accumula-
tion mapping loss by T4 could assist to preserve the topological
structure of airways. In addition, they used the region growing
in the post-process procedure, which is beneficial to improv-
ing the accuracy of the trachea while maintaining the fidelity of
the airway structure. T14 ranked third and achieved the
highest TD (97.853%) and BD (97.129%), while its Precision
declined to 87.928%, lower than the average. This obser-
vation implied that their proposed breakage attention map may
over-emphasize the topological completeness while leading to
the dilation problem. Figure 14 corroborated this finding as T14
generated thicker prediction surrounding the boundary of the
airway ground-truth. T7 and T1 ranked fourth and fifth, respectively.
They achieved leading performance in DSC
(94.056%, 94.969%) and Precision (93.027%, 95.055%). However,
their TD and BD were not outstanding due to the lack of
specialized modules for airway segmentation.
Another critical comparison was the generalization ability on
the noisy domain. Figure 10 depicted the box plots of the quan-
titative results achieved by the top-10 algorithms on the vali-
dation set, test set, and COVID-19 set separately. It was obvi-
ous that TD and BD suffered a decrease in the COVID-19 set
for all teams. The averages of TD and BD were 75.246% and
64.206%, which were far below those on the validation or test set.
This performance degradation was moderate among the top 3
methods, as they could still achieve more than 92% TD and 88% BD.
DSC was slightly less affected than TD/BD, while the variation
tendency among teams was not unified: DSC decreased on the
COVID-19 set for most teams, whereas it increased for T6 and T14,
which substantiates S3 (Sec. 4.3). Both T6 and T14 de-
signed the topology-sensitive loss functions that are beneficial
to improving the robustness and generalization ability. The Pre-
cision slightly increased due to the trade-off between sensitivity
and specificity. Considering the general performance degrada-
tion in the noisy domain, especially topological completeness,
further investigation of improving generalization is necessary.
Branch-wise True Positive Analysis: Figure 12 describes the
number of detected branches achieved by the top 10 teams. The
top 10 methods obtained 154.4 branches on average across
Fig. 10. Box plots of the quantitative results achieved by the top 10 algorithms on the validation set, test set, and COVID-19 set separately. The quantitative
results include the (a) tree length detected rate, (b) branch detected rate, (c) dice similarity coefficient and (d) precision. Team index is adopted to represent
different algorithms, and the order in the x-axis is dependent on the final rank (descend from left to right).
a) Normal Case 1 b) Normal Case 2
c) COVID-19 Case 1 d) COVID-19 Case 2
Fig. 11. The visualization of the color-encoded airway prediction by the Top 10 algorithms. Only the true positive part is presented, and the airway
branches are assigned with a unique color from dark red (detected by only one team) to green (detected by all ten teams). Two normal cases and two
COVID-19 cases from the test set are chosen for illustration. Individual line chart describes the TD and BD values of each team is also included beside the
branch-wise airway visualization. Best viewed in color.
Fig. 12. The bar plot of the branch detected number achieved by the top
10 teams, averaged across the whole test dataset. The x-axis order (from
left to right) follows the averaged ranking result. The averaged 154.4 of the
branches were detected by all these 10 teams, whereas only the top 3 teams
achieved more than 165 branches.
the whole test dataset. T6, T4, T14 achieved more than 165
branches, which surpassed other teams. To figure out the differ-
ence in detected branch numbers concretely, we conducted the
visualization of the color-encoded airway prediction in Figure
11. The airway branches are assigned a unique color from dark
red (detected by only one team) to green (detected by all ten
teams). Both the normal cases and the COVID-19 cases were
provided for visualization. It can be explicitly observed that the
disparity usually happened in the peripheral airways, especially
the fifth/sixth generation of airways. The TD and BD exceeded
90% by the top 3 ranked methods whereas the lowest BD was
only around 50% for the rest of the teams. Another interesting
fact is that the top-ranked methods did not perform well in every
case. For example, Figure 11.a) demonstrates that the airway
in the left upper lobe was difficult to detect, where red and
blue dominated, revealing that only a few teams detected it correctly.
The first-place team suffered a failure in this case. The underlying
explanation for this observation is that the first-place algorithm
carries some specific feature bias that contradicts the failure case.
This reminds us that pursuing consensus among prediction results
is also important to improve the reliability of deep learning models.
False Negative Analysis: As discussed before, the primary dif-
ference among the methods focused on the peripheral airways.
Figure 13 illustrated these areas separately for each method of
the top 10 algorithms. A canonical normal case and a COVID-19
case were chosen for representation, where the red part indicates
the true positives and the green part denotes the false
negatives. The local details highlight the right middle and
lower lobes of the normal case, and the left lower lobe of the
noisy COVID-19 case. In the normal case, the top 5 methods
segmented more accurately than the rest of the teams. For the
COVID-19 case, T6 and T14 demonstrated significant improve-
ment compared to others in detecting more bronchi and preserv-
ing better completeness under such noisy COVID-19 imaging
characteristics. These observations confirm that the effective
solutions (S1,S2,S3) are beneficial to reduce the false negatives.
False Positive Analysis: The increase in the false positive
accounted for the dilation problem of the airway segmenta-
tion task (Zheng et al., 2021b). Figure 14 rendered the false
positives (blue) and the boundary of the airway ground-truth
(red). Apart from the uncertainty of the distal airways, the false
positives concentrated on the trachea and main bronchi. Taking
T14, T7, and T10 as examples, their predictions appeared thicker
than the ground-truth and the corresponding Precision was also
lower than that of other teams. The underlying reasons can be summarized
as two aspects. 1) Over-emphasizing topological completeness
could be harmful to the topological correctness (T14). 2) The
airway segmentation task itself is challenging as the intensity
shares large variation among the trachea, main bronchi, lobar
bronchi, and distal segmental bronchi (T1, T10). To reduce the
false positives, a promising solution is to increase the
intra-class feature discrimination ability and to pursue
a satisfactory trade-off between topological completeness and
correctness.
5.2.3. Metric Correlation Analysis
The evaluation system adopted in this challenge contains two
critical components, topological completeness (TD and BD)
and topological correctness (DSC and Precision). Quantita-
tive results showed that no single team achieved the
highest performance on all of these metrics. More remarkably,
T14 achieved the highest TD (97.853%) and BD (97.129%),
while its lower Precision (87.928%) was the main drawback.
Moreover, a high DSC or Precision cannot guarantee the
topological completeness (T1, T7, T9). For example, T1 ob-
tained the outstanding DSC across the validation, test, and the
COVID-19 sets, however, their performance of TD and BD on
the test phase was lower than 90% and 85%, respectively. Fig-
ure 14 demonstrated that they did not produce many false pos-
itives while Figure 13 corroborated the missing of substantial
branches that led to the inferior TD and BD. To intuitively ob-
serve the inter-relation of these metrics, we conducted the met-
ric correlation analysis, as seen in Figure 15. Four subplots
were provided, the TD v.s. DSC (Figure 15.a), TD v.s. Pre-
cision (Figure 15.b), BD v.s. DSC (Figure 15.c), and BD v.s.
Precision (Figure 15.d). Each metric-correlation plot can be
divided into four quadrants. The ideal method should
appear in the first quadrant, where both the topological completeness
and topological correctness metrics are high. The
closest approaches so far (T6 and T4) still leave a lot of room
for improvement. The majority of the top 10 methods were distributed
in the second quadrant, which demonstrates that high
topological completeness is far harder to achieve than high topological
correctness. This observation was in line with
the main challenges encountered by the airway segmentation
task (seen in Sec. 1.2). The situations of T14 and T10 were quite
different: T14 achieved high TD and BD and a moderate DSC, but
a low Precision, whereas T10 showed the opposite behavior, with
low TD, BD, and DSC but a moderate Precision. The above findings
revealed that the principal challenge is improving topological
completeness. Secondly, the topological correctness should be
Fig. 13. Visualization of the false negatives of the Top 10 teams (in ranking order) for the normal case and the COVID-19 case. The red
part indicates the true positive and the green part denotes the false negative. Local details are highlighted in the boxes. Best viewed in color.
Fig. 14. Visualization of the false positives of the Top 10 teams (in ranking order) for the normal case and the COVID-19 case, same as
Figure 13. The red part indicates the boundary of the airway ground-truth and the blue part denotes the false positive. Best viewed in color.
5.2.4. Ranking Stability Analysis
As defined in Sec. 3.2, we adopted the mean score as the ranking criterion. The rankings included all successful participants (i.e., teams that took part in both the validation and test phases). Table 10 reports the rankings of the 20 teams in the validation and test phases. The top five teams of the validation phase also occupied the top five positions in the test phase; only neu204 (T7) and deeptree damo (T14) interchanged their order. However, from the sixth to the twentieth position, an obvious variation arose between the rankings of the validation and test phases.
Kendall's τ (Kendall, 1938) was adopted to quantify the variability of the rankings. For Table 10, Kendall's τ is 0.607 with a p-value of 0.000189, which also implies a fluctuation of the rankings. The generalization ability of the methods accounts for this phenomenon: the top five methods generalized better than the remaining teams. To measure the sensitivity of our score calculation, we modified it into a weighted formulation that slightly emphasizes the geometry of the airway prediction by adjusting the weighting coefficients:
Weighted Score = 0.35 · TD + 0.35 · BD + 0.15 · DSC + 0.15 · Precision.    (17)
Table 11 reports the rankings obtained by the mean score and by the weighted score. Only the team deeptree damo (T14) moved to first place, while the relative order of the other teams remained largely unchanged. As discussed before, T14 put much emphasis on topological completeness; hence, its rank was sensitive to the weighting coefficients. In general, the ranking order was close to identical, as the corresponding Kendall's τ is 0.979 with a p-value of 1.87e-09. These experimental results demonstrate that the ranking criterion is reasonable.
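For illustration, the two score formulations and the ranking-agreement check can be summarized in a few lines of Python. This is a minimal sketch assuming NumPy and SciPy are available, not the evaluation code used by the challenge platform.

```python
import numpy as np
from scipy.stats import kendalltau

def mean_score(td, bd, dsc, precision):
    # Equal-weight ranking criterion (all metrics given in percent).
    return np.mean([td, bd, dsc, precision])

def weighted_score(td, bd, dsc, precision):
    # Eq. (17): weighted formulation emphasizing topological completeness.
    return 0.35 * td + 0.35 * bd + 0.15 * dsc + 0.15 * precision

def ranking_agreement(ranks_a, ranks_b):
    # ranks_a[i] and ranks_b[i] are the ranks of team i under two criteria
    # (e.g., validation vs. test phase, or mean vs. weighted score).
    tau, p_value = kendalltau(ranks_a, ranks_b)
    return tau, p_value
```

A τ close to 1 with a small p-value indicates that the two criteria produce nearly identical orderings.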
5.2.5. Model Complexity Analysis
The efficiency of a method is also critical in medical applications. For example, high efficiency offers the potential to reconstruct the airway in real time from mobile CT, which is helpful for guidance in thoracic surgery. Recently, efficiency has also drawn attention in biomedical challenges. In previous challenge settings, only the prediction results had to be submitted; consequently, the efficiency of the underlying methods was not transparent. As challenge standards have risen, submitting a Docker container of the algorithm has become the preferred choice. The FLARE'21 challenge (Ma et al., 2022) considered the running time and the maximum GPU memory consumption as part of the ranking score. Although efficiency was not involved in our ranking calculation, we conducted a model complexity analysis as a complementary measurement. The test Docker containers submitted by the participants were executed on the same Linux workstation with an Intel Xeon Gold 5119T CPU @ 1.90 GHz, 128 GB RAM, and two NVIDIA GeForce RTX 3090 GPUs. The maximum GPU memory consumption and inference time of each team are recorded in Table 12. We also compared the metrics conditioned on efficiency in Figure 16. T6 and T14 achieved overall high performance while maintaining competitive efficiency. Compared with T6 and T14, T4 was time-consuming; the post-processing procedure used by T4 may have increased the inference time. In the future, model complexity is likely to be added to the evaluation to encourage effective methods with high efficiency.
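As a rough per-case analogue of the Docker-level measurements reported in Table 12, peak GPU memory and wall-clock inference time can be recorded as sketched below. This assumes a PyTorch model and a preprocessed CT volume tensor, and is illustrative only; it is not the harness used to produce Table 12.

```python
import time
import torch

def profile_inference(model, volume):
    """Measure wall-clock time and peak GPU memory for one forward pass.

    `model` is assumed to be a segmentation network already on the GPU and
    `volume` a preprocessed CT tensor of shape (1, 1, D, H, W).
    """
    model.eval()
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        prediction = model(volume.cuda())
    torch.cuda.synchronize()
    elapsed_s = time.perf_counter() - start
    peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
    return prediction.cpu(), elapsed_s, peak_gb
```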
6. Discussions
6.1. Clinical Applications
Our challenge emphasizes the topological correctness and completeness of the airway. Both are clinically significant because pulmonary disease assessment and endobronchial intervention require accurate airway segmentation for quantitative measurements of bronchial features.
Fig. 15. Metric correlation scatter plots of the top 10 methods: (a) TD vs. DSC, (b) TD vs. Precision, (c) BD vs. DSC, (d) BD vs. Precision. Team names are annotated beside the data points. Best viewed in color.
Fig. 16. Comparison of different models in terms of inference time and metrics: (a) tree length detected rate, (b) branch detected rate, (c) Dice similarity coefficient, (d) Precision. Larger markers indicate models with more parameters.
Table 10. Rankings for the validation phase and the test phase. Mean scores were used for ranking all 20 final teams. Scores are given as percentages.
The validation phase The test phase
Rank Team Name Mean Score Rank Team Name Mean Score
1 timi 94.7038 1 timi 94.5278
2 YangLab 94.7035 2 YangLab 93.9848
3 neu204 93.9927 3 deeptree damo 93.9323
4 deeptree damo 93.5555 4 neu204 91.1818
5 Sanmed AI 91.5205 5 Sanmed AI 91.1738
6 satsuma 90.8943 6 dolphins 90.4313
7 LinkStartHao 90.8383 7 suqi 90.1990
8 CITI-SJTU 90.7885 8 notbestme 89.9915
9 lya 90.6783 9 lya 87.7948
10 blackbean 90.6618 10 dnai 86.7915
11 Median 90.3788 11 CITI-SJTU 85.9390
12 notbestme 88.9638 12 blackbean 85.7050
13 dolphins 87.5408 13 LinkStartHao 85.4848
14 dnai 87.4095 14 satsuma 85.4680
15 miclab 87.2865 15 Median 84.0613
16 suqi 85.5348 16 miclab 82.8338
17 CBT IITDELHI 82.7658 17 bwhacil 76.3725
18 fme 76.3648 18 CBT IITDELHI 75.4443
19 bwhacil 75.4048 19 fme 75.1083
20 biomedia 69.6055 20 biomedia 73.0363
Table 11. Ranking stability analysis using different score calculation methods on the test phase for all 20 final teams. Scores are given as percentages.
The test phase by weighted score The test phase by mean score
Rank Team Name Weighted Score Rank Team Name Mean Score
1 deeptree damo 95.3558 1 timi 94.5278
2 timi 94.8463 2 YangLab 93.9848
3 YangLab 93.6773 3 deeptree damo 93.9323
4 neu204 90.2379 4 neu204 91.1818
5 Sanmed AI 89.1429 5 Sanmed AI 91.1738
6 dolphins 89.1258 6 dolphins 90.4313
7 suqi 88.3940 7 suqi 90.1990
8 notbestme 87.7671 8 notbestme 89.9915
9 dnai 84.9991 9 lya 87.7948
10 lya 84.8609 10 dnai 86.7915
11 CITI-SJTU 82.8748 11 CITI-SJTU 85.9390
12 blackbean 82.1272 12 blackbean 85.7050
13 LinkStartHao 81.721 13 LinkStartHao 85.4848
14 satsuma 81.7576 14 satsuma 85.4680
15 Median 79.8302 15 Median 84.0613
16 miclab 77.9807 16 miclab 82.8338
17 bwhacil 74.6303 17 bwhacil 76.3725
18 CBT IITDELHI 70.3930 18 CBT IITDELHI 75.4443
19 fme 70.1270 19 fme 75.1083
20 biomedia 67.4702 20 biomedia 73.0363
Table 12. The maximum GPU memory consumption and inference time of each participant in the test phase (150 cases in total). All test Docker containers were executed on the same device.
Team name GPU memory Inference time
Sanmed AI (T1) 3.74 GB 3h 11min
fme (T2) 17.89 GB 2h 18min
LinkStartHao (T3) 9.64 GB 4h 39min
YangLab (T4) 6.24 GB 7h 47min
dolphins (T5) 10.43 GB 3h 59min
timi (T6) 3.81 GB 2h 38min
neu204 (T7) 9.00 GB 5h 11min
blackbean (T8) 3.72 GB 12h 13min
lya (T9) 9.58 GB 2h 23min
dnai (T10) 11.59 GB 6h 25min
bms410 (T11) 23.39 GB 11h 01min
miclab (T12) 11.54 GB 2h 05min
CITI-SJTU (T13) 3.24 GB 5h 41min
deeptree damo (T14) 3.52 GB 5h 08min
CBT IITDELHI (T15) 7.13 GB 3h 35min
bwhacil (T16) 23.39 GB 1h 56min
suqi (T17) 11.47 GB 4h 35min
Median (T18) 5.53 GB 6h 06min
notbestme (T19) 10.58 GB 1h 48min
satsuma (T20) 3.71 GB 17h 42min
biomedia (T21) 11.52 GB 9h 12min
With regard to topological correctness, the performance of the voxel-wise segmentation determines the accuracy of the quantitative measurements. Measurements of bronchial morphometric parameters such as wall thickness, total airway count, and lumen diameter can be used in the diagnosis of cystic fibrosis (Wielpütz et al., 2013), chronic obstructive pulmonary disease (COPD) (Kirby et al., 2018), and asthma (Eddy et al., 2020). Accurate segmentation of the airways can relieve the burden on clinicians and reduce the large measurement variability in higher-order branches. High topological correctness can facilitate better quantification of airway pathologies and thereby improve the understanding of the mechanisms of disease progression. Topological completeness, in turn, is particularly important for the navigation of endobronchial interventions. Detailed pulmonary airway segmentation, which traditionally operates at the level of the trachea and bronchi and ideally reaches the granularity of the alveoli, is required for navigation in bronchoscope-assisted surgery, which shows great advantages in the treatment of lung cancer (Reynisson et al., 2014), chronic obstructive pulmonary disease (COPD) (Wan et al., 2006), and, more recently, COVID-19 (Luo et al., 2020). As reported in (Gu et al., 2022), the outer diameter of current flexible bronchoscopes is smaller than 5 mm, which allows direct exploration of the distal small bronchi. Hence, detailed modeling of the bronchial tree is required to build a virtual lung model for preoperative path planning and intraoperative navigation. In conclusion, both the topological correctness and completeness of the airway are critical indicators for the diagnosis and treatment of pulmonary diseases. Hence, our challenge encourages the development of airway segmentation algorithms that consider both correctness and completeness.
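As a simple illustration of the morphometric measurements mentioned above, a rough per-point lumen diameter can be estimated from a binary airway mask as twice the Euclidean distance from each centerline voxel to the airway wall. This is only a sketch assuming NumPy, SciPy, and scikit-image are available; clinical morphometry tools rely on more careful cross-sectional measurements.

```python
import numpy as np
from scipy import ndimage
from skimage.morphology import skeletonize_3d

def lumen_diameters_mm(airway_mask, spacing_mm):
    """Rough per-voxel lumen diameter along the airway centerline:
    twice the Euclidean distance from each centerline voxel to the wall.
    `spacing_mm` is the (z, y, x) voxel spacing of the CT scan."""
    dist = ndimage.distance_transform_edt(airway_mask, sampling=spacing_mm)
    centerline = skeletonize_3d(airway_mask.astype(np.uint8)) > 0
    return 2.0 * dist[centerline]
```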
6.2. Rethinking the Evaluation of Tubular Structures
Although overlap-based metrics (DSC, IoU, etc.) and distance-based metrics (Chamfer distance, average symmetric surface distance, etc.) are the standard metrics adopted by the computer vision community, biomedical tasks often have special domain-specific requirements. In the ATM'22 challenge, topological completeness (i.e., connectivity) is of particular interest; hence, specific metrics should be taken into consideration. The deep distance transform (Wang et al., 2020) was designed for tubular structure segmentation in CT scans, yet only the DSC and the mean surface distance were employed for evaluation. The tubular structure was specifically formulated as the envelope of spheres with continuously changing center points and radii, distinguishing it from general object segmentation; however, no topology-specific metrics were adopted for evaluation. The situation was the opposite in the EXACT'09 challenge (Lo et al., 2012), which provided specific metrics (tree length detected and branches detected) to measure the topological completeness of the airway extracted by the segmentation algorithms. However, the methods of that period were mainly intensity-based and could not achieve high performance on the topological metrics due to the lack of airway tree priors. Fortunately, the ATM'22 challenge provides a large dataset with full airway annotation. In addition, we emphasized the specific tree-like topology in the airway segmentation task. Equipped with deep-learning techniques and dedicated modules that exploit topological knowledge, it is promising to explore the intrinsic structure of tubular objects and narrow the gap between methodology and evaluation.
As described above, we adopted TD and BD to measure topological completeness, while DSC and Precision were used to evaluate topological correctness. The Betti numbers are crucial topological invariants, and the ground-truth airway has the constant topological invariant β0 = 1 (i.e., it consists of a single connected component). The Betti error is hard to minimize directly because it is non-differentiable. Moreover, the Betti error alone cannot always reflect the real topological completeness and can fall into the following pitfall: a region growing method, for example, always produces a single connected airway prediction by construction, yet the peripheral airways may remain undetected. Under such circumstances, the β0 error is zero while the TD/BD are quite low.
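The following sketch makes the pitfall concrete: β0 is obtained by counting connected components, the largest component is extracted before TD/BD are computed, and a TD-style score is approximated as the fraction of ground-truth centerline voxels covered by the prediction. It assumes NumPy, SciPy, and scikit-image and uses a hard skeleton as a stand-in for the reference centerline, so it approximates, rather than reproduces, the official evaluation.

```python
import numpy as np
from scipy import ndimage
from skimage.morphology import skeletonize_3d

CONNECTIVITY = np.ones((3, 3, 3))  # 26-connectivity for 3D volumes

def beta_0(mask):
    # Betti number beta_0: number of connected components of the binary volume.
    _, num = ndimage.label(mask, structure=CONNECTIVITY)
    return num

def largest_component(mask):
    # Keep only the largest connected component, as done before computing TD/BD.
    labels, num = ndimage.label(mask, structure=CONNECTIVITY)
    if num == 0:
        return mask.astype(bool)
    sizes = ndimage.sum(mask, labels, index=np.arange(1, num + 1))
    return labels == (np.argmax(sizes) + 1)

def tree_length_detected(gt_mask, pred_mask):
    # TD proxy: fraction of ground-truth centerline voxels covered by the
    # largest component of the prediction. A connected but shallow prediction
    # (beta_0 error of zero) can still score low here.
    centerline = skeletonize_3d(gt_mask.astype(np.uint8)) > 0
    pred = largest_component(pred_mask)
    return (centerline & pred).sum() / max(int(centerline.sum()), 1)
```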
Airway wall thickness (Orlandi et al., 2005; Achenbach et al., 2008) has demonstrated a strong correlation with reduced airflow, and the Airway Fractal Dimension (AFD) was proposed as an auxiliary predictor of respiratory morbidity and mortality in COPD (Bodduluri et al., 2018). These metrics are oriented toward the clinical analysis of morphological parameters and may be incorporated into our future work. The metric most relevant to our task is clDice (Shit et al., 2021). It emphasizes connectivity when evaluating tubular structure segmentation, based on soft skeletons extracted from the masks. clDice is defined as the harmonic mean of topology precision and topology sensitivity. It handles false positive and false negative samples simultaneously, encouraging the network to be connectivity-aware. The advantage of clDice is that it can be formulated as a loss function thanks to its differentiability. However, the clDice loss cannot guarantee the topology of the airway, because TD/BD are calculated after extracting the largest connected component of the prediction, whereas clDice is not; to the best of our knowledge, largest-component extraction is non-differentiable. clDice is also affected by the structure size: even if the predictions of two algorithms differ by only a single voxel, clDice can vary remarkably. In addition, our preliminary experiments showed that the soft skeleton extracted by clDice on airway data was of poor quality. In conclusion, the non-differentiable topological metrics (TD/BD) are unfriendly to the pipelines of deep learning models. Implicit modules, such as novel objective functions, are encouraged to be investigated since they have shown the ability to improve topological performance. The best supervision signals to characterize topological priors are still far from settled; hence, further research is necessary.
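For reference, an evaluation-time version of clDice can be computed from hard skeletons as sketched below (assuming NumPy and scikit-image are available); the trainable loss of Shit et al. (2021) replaces the hard skeletonization with a differentiable soft skeleton, which this sketch does not reproduce.

```python
import numpy as np
from skimage.morphology import skeletonize_3d

def cl_dice(pred, gt):
    """Hard-skeleton clDice: harmonic mean of topology precision and
    topology sensitivity (Shit et al., 2021)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    skel_pred = skeletonize_3d(pred.astype(np.uint8)) > 0
    skel_gt = skeletonize_3d(gt.astype(np.uint8)) > 0
    # Topology precision: fraction of the predicted skeleton inside the ground truth.
    tprec = (skel_pred & gt).sum() / max(int(skel_pred.sum()), 1)
    # Topology sensitivity: fraction of the ground-truth skeleton inside the prediction.
    tsens = (skel_gt & pred).sum() / max(int(skel_gt.sum()), 1)
    return 2 * tprec * tsens / max(tprec + tsens, 1e-8)
```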
6.3. Limitations and Future of ATM’22
ATM'22 primarily aims to establish a new standard for the airway segmentation field in the deep learning era. Our future work will concentrate on three aspects. Firstly, more cases with diverse diseases and low resolution will be introduced to strengthen the evaluation of generalization ability. ATM'22 only introduced COVID-19 cases because of the extreme difficulty of annotating these noisy CT scans. Benefiting from an ensemble of the top-ranked algorithms of ATM'22, it becomes feasible to introduce more diseased cases: these cases can first be pre-segmented by a strong ensemble model to relieve the burden on clinicians. Secondly, we will extend the scope of the challenge. ATM'22 currently focuses on binary airway segmentation, while several recent works (Tan et al., 2021; Xie et al., 2022; Yu et al., 2022a) address airway anatomical labeling (i.e., branch-wise airway classification or segmentation). A qualified binary segmentation is the foundation for assigning anatomical names to the corresponding branches of the airway tree. This labeling task is a promising direction to extend our challenge, as fine-grained labels provide a detailed map for bronchoscopic navigation and make morphological changes position-aware. Thirdly, we will expand the challenge to more tubular structures. Topological performance is also crucial for other tubular structures, such as fundus blood vessels, hepatic vessels, and coronary arteries. Generalization ability could then be tested not only on unseen domains of the same class, but also on universal tasks.
7. Conclusion
In this paper, we presented the Multi-site, Multi-domain Airway Tree Modeling (ATM'22) benchmark. The largest collection of chest CT scans (500 scans in total) with full pulmonary airway annotation, together with the most comprehensive evaluation system, was provided for this task. We summarized four typical challenges in airway segmentation, among which achieving high topological completeness and correctness simultaneously was the most crucial. Generally speaking, most teams performed better on topological correctness (91.277% DSC and 93.669% Precision on average) than on topological completeness (83.350% TD and 75.596% BD on average). Experimental results also demonstrated that high topological correctness cannot consistently guarantee the topological completeness of the airways, and vice versa. Several consensuses on effective methods were derived to deal with the challenges of airway segmentation: improving intra-class discrimination and designing novel objective functions were recognized as promising directions to achieve an outstanding trade-off between topological completeness and correctness. Moreover, the non-differentiable topological metrics (TD/BD) are unfriendly to deep learning models; hence, the best supervision signals to characterize topological priors still require further research.
Acknowledgments
This work is supported in part by the Open Funding of Zhejiang Laboratory under Grant 2021KH0AB03, in part by the Shanghai Sailing Program under Grant 20YF1420800, in part by NSFC under Grant 62003208, and in part by the Shanghai Municipal Science and Technology Project under Grants 20JC1419500 and 20DZ2220400. The authors would like to thank the MICCAI challenge society and the support of Amazon Web Services and Grand-Challenge.org.
Author contributions
M.Z: Conceptualization, Methodology, Software, Validation, Formal analysis, Data curation, Writing – original draft; Y.W: Data curation, Software, Writing – original draft; H.Z: Data curation, Writing – review and editing; Y.Q: Data curation, Software; H.Z: Data curation, Software; J.S: Conceptualization, Data curation; G.Z.Y: Methodology, Critical Evaluation, Conceptualization, Supervision, Writing – review and editing; Y.G: Project administration, Conceptualization, Methodology, Supervision, Writing – review and editing.
W.T, C.A, C.P, P.Y, Y.N, G.Y, S.W, D.C.M, M.K, P.W, D.G, D.J, Y.W, S.Z, R.C, B.Z, X.L, A.Q, M.M, Q.S, Y.W, Y.L, Y.Z, J.Y, A.P, B.R, R.S.J.E, C.C.E were participants of the ATM'22 challenge and provided their results for evaluation together with the descriptions of their algorithms. The final manuscript was approved by all authors.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
Achenbach, T., Weinheimer, O., Biedermann, A., Schmitt, S., Freudenstein, D.,
Goutham, E., Kunz, R.P., Buhl, R., Dueber, C., Heussel, C.P., 2008. Mdct
assessment of airway wall thickness in copd patients using a new method:
correlations with pulmonary function tests. European radiology 18, 2731–
2738.
Agarap, A.F., 2018. Deep learning using rectified linear units (relu). arXiv
preprint arXiv:1803.08375 .
Antonelli, M., Reinke, A., Bakas, S., Farahani, K., Kopp-Schneider, A., Land-
man, B.A., Litjens, G., Menze, B., Ronneberger, O., Summers, R.M., et al.,
2022. The medical segmentation decathlon. Nature communications 13,
1–13.
Aresta, G., Araújo, T., Kwok, S., Chennamsetty, S.S., Safwan, M., Alex, V.,
Marami, B., Prastawa, M., Chan, M., Donovan, M., et al., 2019. Bach:
Grand challenge on breast cancer histology images. Medical image analysis
56, 122–139.
Armato III, S.G., McLennan, G., Bidaut, L., McNitt-Gray, M.F., Meyer, C.R.,
Reeves, A.P., Zhao, B., Aberle, D.R., Henschke, C.I., Hoffman, E.A., et al.,
2011. The lung image database consortium (lidc) and image database re-
source initiative (idri): a completed reference database of lung nodules on ct
scans. Medical physics 38, 915–931.
Au, O.K.C., Tai, C.L., Chu, H.K., Cohen-Or, D., Lee, T.Y., 2008. Skeleton
extraction by mesh contraction. ACM transactions on graphics (TOG) 27,
1–10.
Aykac, D., Hoffman, E.A., McLennan, G., Reinhardt, J.M., 2003. Segmenta-
tion and analysis of the human airway tree from three-dimensional x-ray ct
images. IEEE transactions on medical imaging 22, 940–950.
Bodduluri, S., Puliyakote, A.S.K., Gerard, S.E., Reinhardt, J.M., Hoffman,
E.A., Newell, J.D., Nath, H.P., Han, M.K., Washko, G.R., Estépar, R.S.J.,
et al., 2018. Airway fractal dimension predicts respiratory morbidity and
mortality in copd. The Journal of clinical investigation 128, 5374–5382.
Buda, M., Maki, A., Mazurowski, M.A., 2018. A systematic study of the class
imbalance problem in convolutional neural networks. Neural networks 106,
249–259.
Campello, V.M., Gkontra, P., Izquierdo, C., Martin-Isla, C., Sojoudi, A., Full,
P.M., Maier-Hein, K., Zhang, Y., He, Z., Ma, J., et al., 2021. Multi-centre,
multi-vendor and multi-disease cardiac segmentation: the m&ms challenge.
IEEE Transactions on Medical Imaging 40, 3543–3554.
Charbonnier, J.P., Van Rikxoort, E.M., Setio, A.A., Schaefer-Prokop, C.M., van
Ginneken, B., Ciompi, F., 2017. Improving airway segmentation in com-
puted tomography using leak detection with convolutional networks. Medi-
cal image analysis 36, 52–60.
Cheng, M., Zhao, K., Guo, X., Xu, Y., Guo, J., 2021. Joint topology-preserving
and feature-refinement network for curvilinear structure segmentation, in:
Proceedings of the IEEE/CVF International Conference on Computer Vi-
sion, pp. 7147–7156.
Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O., 2016.
3d u-net: learning dense volumetric segmentation from sparse annotation,
in: International conference on medical image computing and computer-
assisted intervention, Springer. pp. 424–432.
Clough, J., Byrne, N., Oksuz, I., Zimmer, V.A., Schnabel, J.A., King, A., 2020.
A topological loss function for deep-learning based image segmentation us-
ing persistent homology. IEEE Transactions on Pattern Analysis and Ma-
chine Intelligence .
Codella, N.C., Gutman, D., Celebi, M.E., Helba, B., Marchetti, M.A., Dusza,
S.W., Kalloo, A., Liopyris, K., Mishra, N., Kittler, H., et al., 2018a. Skin
lesion analysis toward melanoma detection: A challenge at the 2017 interna-
tional symposium on biomedical imaging (isbi), hosted by the international
skin imaging collaboration (isic), in: 2018 IEEE 15th international sympo-
sium on biomedical imaging (ISBI 2018), IEEE. pp. 168–172.
Codella, N.C., Gutman, D., Celebi, M.E., Helba, B., Marchetti, M.A., Dusza,
S.W., Kalloo, A., Liopyris, K., Mishra, N., Kittler, H., et al., 2018b. Skin
lesion analysis toward melanoma detection: A challenge at the 2017 interna-
tional symposium on biomedical imaging (isbi), hosted by the international
skin imaging collaboration (isic), in: 2018 IEEE 15th international sympo-
sium on biomedical imaging (ISBI 2018), IEEE. pp. 168–172.
Cornea, N.D., Silver, D., Yuan, X., Balasubramanian, R., 2005. Computing
hierarchical curve-skeletons of 3d objects. The Visual Computer 21, 945–
955.
Couper, D., LaVange, L.M., Han, M., Barr, R.G., Bleecker, E., Hoffman, E.A.,
Kanner, R., Kleerup, E., Martinez, F.J., Woodruff, P.G., et al., 2014. Design
of the subpopulations and intermediate outcomes in copd study (spiromics).
Thorax 69, 492–495.
Dauphin, Y.N., Fan, A., Auli, M., Grangier, D., 2017. Language modeling
with gated convolutional networks, in: International conference on machine
learning, PMLR. pp. 933–941.
Deng, Y., Ren, Z., Kong, Y., Bao, F., Dai, Q., 2016. A hierarchical fused fuzzy
deep neural network for data classification. IEEE Transactions on Fuzzy
Systems 25, 1006–1012.
Dey, T.K., Sun, J., 2006. Defining and computing curve-skeletons with medial
geodesic function, in: Symposium on geometry processing, pp. 143–152.
Diakogiannis, F.I., Waldner, F., Caccetta, P., Wu, C., 2020. Resunet-a: A deep
learning framework for semantic segmentation of remotely sensed data. IS-
PRS Journal of Photogrammetry and Remote Sensing 162, 94–114.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Un-
terthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.,
2020. An image is worth 16x16 words: Transformers for image recognition
at scale. arXiv preprint arXiv:2010.11929 .
Eddy, R.L., Svenningsen, S., Kirby, M., Knipping, D., McCormack, D.G., Lic-
skai, C., Nair, P., Parraga, G., 2020. Is computed tomography airway count
related to asthma severity and airway structure and function? American
Journal of Respiratory and Critical Care Medicine 201, 923–933.
Edelsbrunner, H., Letscher, D., Zomorodian, A., 2000. Topological persistence
and simplification, in: Proceedings 41st annual symposium on foundations
of computer science, IEEE. pp. 454–463.
Fetita, C., Ortner, M., Brillet, P.Y., Prêteux, F., Grenier, P., et al., 2009. A
morphological-aggregative approach for 3d segmentation of pulmonary air-
ways from generic msct acquisitions, in: Proc. of Second International
Workshop on Pulmonary Image Analysis, pp. 215–226.
Feuerstein, M., Kitasaka, T., Mori, K., 2009. Adaptive branch tracing and
image sharpening for airway tree extraction in 3-d chest ct, in: Proc. of
Second International Workshop on Pulmonary Image Analysis, pp. 1–8.
Garcia-Uceda, A., Selvan, R., Saghir, Z., Tiddens, H.A., de Bruijne, M., 2021.
Automatic airway segmentation from computed tomography using robust
and efficient 3-d convolutional neural networks. Scientific Reports 11, 1–
15.
Garcia-Uceda Juarez, A., Selvan, R., Saghir, Z., Bruijne, M.d., 2019. A joint
3d unet-graph neural network-based method for airway segmentation from
chest cts, in: International workshop on machine learning in medical imag-
ing, Springer. pp. 583–591.
Garcia-Uceda Juarez, A., Tiddens, H.A., Bruijne, M.d., 2018. Automatic air-
way segmentation in chest ct using convolutional neural networks, in: Image
analysis for moving organ, breast, and thoracic images. Springer, pp. 238–
250.
Gu, Y., Gu, C., Yang, J., Sun, J., Yang, G.Z., 2022. Vision-kinematics-
interaction for robotic-assisted bronchoscopy navigation. IEEE Transactions
on Medical Imaging .
Hu, X., Li, F., Samaras, D., Chen, C., 2019. Topology-preserving deep image
segmentation. Advances in neural information processing systems 32.
Hu, X., Wang, Y., Fuxin, L., Samaras, D., Chen, C., 2021. Topology-aware
segmentation using discrete morse theory. arXiv preprint arXiv:2103.09992.
Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q., 2017. Densely
connected convolutional networks, in: Proceedings of the IEEE conference
on computer vision and pattern recognition, pp. 4700–4708.
Ioffe, S., Szegedy, C., 2015. Batch normalization: Accelerating deep network
training by reducing internal covariate shift, in: International conference on
machine learning, PMLR. pp. 448–456.
Irving, B., Taylor, P., Todd-Pokropek, A., 2009. 3d segmentation of the airway
tree using a morphology based method, in: Proceedings of 2nd international
workshop on pulmonary image analysis, pp. 297–07.
Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H., 2021. nnu-
net: a self-configuring method for deep learning-based biomedical image
segmentation. Nature methods 18, 203–211.
Jin, D., Iyer, K.S., Chen, C., Hoffman, E.A., Saha, P.K., 2016. A robust and
efficient curve skeletonization algorithm for tree-like objects using minimum
cost paths. Pattern recognition letters 76, 32–40.
Jin, D., Xu, Z., Harrison, A.P., George, K., Mollura, D.J., 2017. 3d convolu-
tional neural networks with graph refinement for airway segmentation using
incomplete data labels, in: International workshop on machine learning in
medical imaging, Springer. pp. 141–149.
Karwoski, R.A., Bartholmai, B., Zavaletta, V.A., Holmes, D., Robb, R.A.,
2008. Processing of ct images for analysis of diffuse lung disease in the lung
tissue research consortium, in: Medical imaging 2008: physiology, function,
and structure from medical images, SPIE. pp. 356–364.
Kendall, M.G., 1938. A new measure of rank correlation. Biometrika 30, 81–
93.
Kervadec, H., Bouchtiba, J., Desrosiers, C., Granger, E., Dolz, J., Ayed, I.B.,
2019. Boundary loss for highly unbalanced segmentation, in: International
conference on medical imaging with deep learning, PMLR. pp. 285–296.
Kirby, M., Tanabe, N., Tan, W.C., Zhou, G., Obeidat, M., Hague, C.J., Leipsic,
J., Bourbeau, J., Sin, D.D., Hogg, J.C., et al., 2018. Total airway count on
computed tomography and the risk of chronic obstructive pulmonary disease
progression. findings from a population-based study. American journal of
respiratory and critical care medicine 197, 56–65.
Kong, B., Wang, X., Bai, J., Lu, Y., Gao, F., Cao, K., Xia, J., Song, Q., Yin,
Y., 2020. Learning tree-structured representation for 3d coronary artery seg-
mentation. Computerized Medical Imaging and Graphics 80, 101688.
Kuo, W., de Bruijne, M., Petersen, J., Nasserinejad, K., Ozturk, H., Chen, Y.,
Perez-Rovira, A., Tiddens, H.A., 2017. Diagnosis of bronchiectasis and
airway wall thickening in children with cystic fibrosis: objective airway-
artery quantification. European radiology 27, 4680–4689.
Lee, T.C., Kashyap, R.L., Chu, C.N., 1994. Building skeleton models via 3-
d medial surface axis thinning algorithms. CVGIP: Graphical Models and
Image Processing 56, 462–478.
Li, H., Tang, Z., Nan, Y., Yang, G., 2022a. Human treelike tubular structure
segmentation: A comprehensive review and future perspectives. Computers
in Biology and Medicine , 106241.
Li, Q., Shen, L., 2019. 3d neuron reconstruction in tangled neuronal image with
deep networks. IEEE transactions on medical imaging 39, 425–435.
Li, Y., Yao, T., Pan, Y., Mei, T., 2022b. Contextual transformer networks for
visual recognition. IEEE Transactions on Pattern Analysis and Machine
Intelligence .
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B., 2021.
Swin transformer: Hierarchical vision transformer using shifted windows,
in: Proceedings of the IEEE/CVF International Conference on Computer
Vision, pp. 10012–10022.
Liu, Z., Miao, Z., Zhan, X., Wang, J., Gong, B., Yu, S.X., 2019. Large-scale
long-tailed recognition in an open world, in: Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, pp. 2537–2546.
Lo, P., Van Ginneken, B., Reinhardt, J.M., Yavarna, T., De Jong, P.A., Irving,
B., Fetita, C., Ortner, M., Pinho, R., Sijbers, J., et al., 2012. Extraction
of airways from ct (exact’09). IEEE Transactions on Medical Imaging 31,
2093–2107.
Luo, F., Darwiche, K., Singh, S., Torrego, A., Steinfort, D.P., Gasparini, S., Liu,
D., Zhang, W., Fernandez-Bussy, S., Herth, F.J., et al., 2020. Performing
bronchoscopy in times of the covid-19 pandemic: practice statement from
an international expert panel. Respiration 99, 417–422.
Lyu, X., Cheng, L., Zhang, S., 2022. The reta benchmark for retinal vascular
tree analysis. Scientific Data 9, 1–15.
Ma, J., Wei, Z., Zhang, Y., Wang, Y., Lv, R., Zhu, C., Gaoxiang, C., Liu, J.,
Peng, C., Wang, L., et al., 2020. How distance transform maps boost seg-
mentation cnns: an empirical study, in: Medical Imaging with Deep Learn-
ing, PMLR. pp. 479–492.
Ma, J., Zhang, Y., Gu, S., An, X., Wang, Z., Ge, C., Wang, C., Zhang, F.,
Wang, Y., Xu, Y., et al., 2022. Fast and low-gpu-memory abdomen ct organ
segmentation: The flare challenge. Medical Image Analysis 82, 102616.
Ma, J., Zhang, Y., Gu, S., Zhu, C., Ge, C., Zhang, Y., An, X., Wang, C., Wang,
Q., Liu, X., et al., 2021. Abdomenct-1k: Is abdominal organ segmenta-
tion a solved problem. IEEE Transactions on Pattern Analysis and Machine
Intelligence .
Maier-Hein, L., Reinke, A., Christodoulou, E., Glocker, B., Godau, P., Isensee,
F., Kleesiek, J., Kozubek, M., Reyes, M., Riegler, M.A., et al., 2022. Metrics
reloaded: Pitfalls and recommendations for image analysis validation. arXiv
preprint arXiv:2206.01653 .
Maurer, C.R., Qi, R., Raghavan, V., 2003. A linear time algorithm for comput-
ing exact euclidean distance transforms of binary images in arbitrary dimen-
sions. IEEE Transactions on Pattern Analysis and Machine Intelligence 25,
265–270.
Mendrik, A.M., Vincken, K.L., Kuijf, H.J., Breeuwer, M., Bouvy, W.H.,
De Bresser, J., Alansary, A., De Bruijne, M., Carass, A., El-Baz, A., et al.,
2015. Mrbrains challenge: online evaluation framework for brain image
segmentation in 3t mri scans. Computational intelligence and neuroscience
2015.
Meng, Q., Kitasaka, T., Nimura, Y., Oda, M., Ueno, J., Mori, K., 2017a. Auto-
matic segmentation of airway tree based on local intensity filter and machine
learning technique in 3d chest ct volume. International journal of computer
assisted radiology and surgery 12, 245–261.
Meng, Q., Roth, H.R., Kitasaka, T., Oda, M., Ueno, J., Mori, K., 2017b. Track-
ing and segmentation of the airways in chest ct using a fully convolutional
network, in: International Conference on Medical Image Computing and
Computer-Assisted Intervention, Springer. pp. 198–207.
Milletari, F., Navab, N., Ahmadi, S.A., 2016. V-net: Fully convolutional neu-
ral networks for volumetric medical image segmentation, in: 2016 fourth
international conference on 3D vision (3DV), IEEE. pp. 565–571.
Milnor, J., 2016. Morse Theory (AM-51), Volume 51. Princeton University Press.
Nadeem, S.A., Hoffman, E.A., Sieren, J.C., Comellas, A.P., Bhatt, S.P., Barjak-
tarevic, I.Z., Abtin, F., Saha, P.K., 2020. A ct-based automated algorithm for
airway segmentation using freeze-and-grow propagation and deep learning.
IEEE transactions on medical imaging 40, 405–418.
Nadeem, S.A., Hoffman, E.A., Sieren, J.P., Saha, P.K., 2018. Topological leak-
age detection and freeze-and-grow propagation for improved ct-based air-
way segmentation, in: Medical Imaging 2018: Image Processing, SPIE. pp.
323–333.
Nan, Y., Del Ser, J., Tang, Z., Tang, P., Xing, X., Fang, Y., Herrera, F., Pedrycz,
W., Walsh, S., Yang, G., 2022. Fuzzy attention neural network to tackle
discontinuity in airway segmentation. arXiv preprint arXiv:2209.02048 .
Oktay, O., Schlemper, J., Folgoc, L.L., Lee, M., Heinrich, M., Misawa, K.,
Mori, K., McDonagh, S., Hammerla, N.Y., Kainz, B., et al., 2018. At-
tention u-net: Learning where to look for the pancreas. arXiv preprint
arXiv:1804.03999 .
Orlandi, I., Moroni, C., Camiciottoli, G., Bartolucci, M., Pistolesi, M., Vil-
lari, N., Mascalchi, M., 2005. Chronic obstructive pulmonary disease: thin-
section ct measurement of airway wall thickness and lung attenuation. Ra-
diology 234, 604–610.
Pedersen, J.H., Ashraf, H., Dirksen, A., Bach, K., Hansen, H., Toennesen, P.,
Thorsen, H., Brodersen, J., Skov, B.G., Døssing, M., et al., 2009. The danish
randomized lung cancer ct screening trial—overall design and results of the
prevalence round. Journal of Thoracic Oncology 4, 608–614.
Pinho, R., Luyckx, S., Sijbers, J., 2009. Robust region growing based intratho-
racic airway tree segmentation, in: Proc. of Second International Workshop
on Pulmonary Image Analysis, pp. 261–271.
Pu, J., Gu, S., Liu, S., Zhu, S., Wilson, D., Siegfried, J.M., Gur, D., 2012. Ct
based computerized identification and analysis of human airways: a review.
Medical physics 39, 2603–2616.
Qin, Y., Chen, M., Zheng, H., Gu, Y., Shen, M., Yang, J., Huang, X., Zhu, Y.M.,
Yang, G.Z., 2019. Airwaynet: a voxel-connectivity aware approach for ac-
curate airway segmentation using convolutional neural networks, in: Inter-
national Conference on Medical Image Computing and Computer-Assisted
Intervention, Springer. pp. 212–220.
Qin, Y., Zheng, H., Gu, Y., Huang, X., Yang, J., Wang, L., Yao, F., Zhu, Y.M.,
Yang, G.Z., 2021. Learning tubule-sensitive cnns for pulmonary airway and
artery-vein segmentation in ct. IEEE Transactions on Medical Imaging 40,
1603–1617.
Qin, Y., Zheng, H., Gu, Y., Huang, X., Yang, J., Wang, L., Zhu, Y.M., 2020.
Learning bronchiole-sensitive airway segmentation cnns by feature recali-
bration and attention distillation, in: International Conference on Medical
Image Computing and Computer-Assisted Intervention, Springer. pp. 221–
231.
Regan, E.A., Hokanson, J.E., Murphy, J.R., Make, B., Lynch, D.A., Beaty,
T.H., Curran-Everett, D., Silverman, E.K., Crapo, J.D., 2011. Genetic epi-
demiology of copd (copdgene) study design. COPD: Journal of Chronic
Obstructive Pulmonary Disease 7, 32–43.
Reynisson, P.J., Leira, H.O., Hernes, T.N., Hofstad, E.F., Scali, M., Sorger, H.,
Amundsen, T., Lindseth, F., Langø, T., 2014. Navigated bronchoscopy: a
technical review. Journal of bronchology & interventional pulmonology 21,
242–264.
Rickmann, A.M., Roy, A.G., Sarasua, I., Navab, N., Wachinger, C., 2019.
‘project & excite’modules for segmentation of volumetric medical scans,
in: International Conference on Medical Image Computing and Computer-
Assisted Intervention, Springer. pp. 39–47.
Selvan, R., Kipf, T., Welling, M., Juarez, A.G.U., Pedersen, J.H., Petersen,
J., de Bruijne, M., 2020. Graph refinement based airway extraction using
mean-field networks and graph neural networks. Medical Image Analysis
64, 101751.
Selvan, R., Kipf, T., Welling, M., Pedersen, J.H., Petersen, J., de Bruijne, M.,
2018. Extraction of airways using graph neural networks. arXiv preprint
arXiv:1804.04436 .
Shi, T., Boutry, N., Xu, Y., Géraud, T., 2022. Local intensity order transforma-
tion for robust curvilinear object segmentation. IEEE Transactions on Image
Processing 31, 2557–2569.
Shit, S., Paetzold, J.C., Sekuboyina, A., Ezhov, I., Unger, A., Zhylka, A., Pluim,
J.P., Bauer, U., Menze, B.H., 2021. cldice-a novel topology-preserving
loss function for tubular structure segmentation, in: Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.
16560–16569.
Sonka, M., Park, W., Hoffman, E.A., 1996. Rule-based detection of intratho-
racic airway trees. IEEE transactions on medical imaging 15, 314–326.
Taghanaki, S.A., Zheng, Y., Zhou, S.K., Georgescu, B., Sharma, P., Xu, D.,
Comaniciu, D., Hamarneh, G., 2019. Combo loss: Handling input and out-
put imbalance in multi-organ segmentation. Computerized Medical Imaging
and Graphics 75, 24–33.
Tan, M., Pang, R., Le, Q.V., 2020. Efficientdet: Scalable and efficient object
detection, in: Proceedings of the IEEE/CVF conference on computer vision
and pattern recognition, pp. 10781–10790.
Tan, Z., Feng, J., Zhou, J., 2021. Sgnet: Structure-aware graph-based network
for airway semantic segmentation, in: International Conference on Medical
Image Computing and Computer-Assisted Intervention, Springer. pp. 153–
163.
Tang, Y., Yang, D., Li, W., Roth, H.R., Landman, B., Xu, D., Nath, V.,
Hatamizadeh, A., 2022. Self-supervised pre-training of swin transformers
for 3d medical image analysis, in: Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition, pp. 20730–20740.
Tran, D., Wang, H., Torresani, L., Ray, J., LeCun, Y., Paluri, M., 2018. A closer
look at spatiotemporal convolutions for action recognition, in: Proceedings
of the IEEE conference on Computer Vision and Pattern Recognition, pp.
6450–6459.
Ulyanov, D., Vedaldi, A., Lempitsky, V., 2016. Instance normalization: The
missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022 .
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N.,
Kaiser, Ł., Polosukhin, I., 2017. Attention is all you need. Advances in
neural information processing systems 30.
Wan, I.Y., Toma, T.P., Geddes, D.M., Snell, G., Williams, T., Venuta, F., Yim,
A.P., 2006. Bronchoscopic lung volume reduction for end-stage emphy-
sema: report on the first 98 patients. Chest 129, 518–526.
Wang, C., Hayashi, Y., Oda, M., Itoh, H., Kitasaka, T., Frangi, A.F., Mori,
K., 2019. Tubular structure segmentation using spatial fully connected net-
work with radial distance loss for 3d medical images, in: International Con-
ference on Medical Image Computing and Computer-Assisted Intervention,
Springer. pp. 348–356.
Wang, Y., Wei, X., Liu, F., Chen, J., Zhou, Y., Shen, W., Fishman, E.K., Yuille,
A.L., 2020. Deep distance transform for tubular structure segmentation in
ct scans, in: Proceedings of the IEEE/CVF Conference on Computer Vision
and Pattern Recognition, pp. 3833–3842.
Wielpütz, M.O., Eichinger, M., Weinheimer, O., Ley, S., Mall, M.A., Wiebel,
M., Bischoff, A., Kauczor, H.U., Heussel, C.P., Puderbach, M., 2013. Au-
tomatic airway analysis on multidetector computed tomography in cystic
fibrosis: correlation with pulmonary function testing. Journal of thoracic
imaging 28, 104–113.
Wiemker, R., Bülow, T., Lorenz, C., 2009. A simple centricity-based region
growing algorithm for the extraction of airways, in: Proc. Second Inter-
national Workshop on Pulmonary Image Analysis (MICCAI), Citeseer. pp.
309–314.
Wu, Y., Zhang, M., Yu, W., Zheng, H., Xu, J., Gu, Y., 2022. Ltsp: long-term
slice propagation for accurate airway segmentation. International Journal of
Computer Assisted Radiology and Surgery 17, 857–865.
Xie, W., Jacobs, C., Charbonnier, J.P., van Ginneken, B., 2022. Structure and
position-aware graph neural network for airway labeling. arXiv preprint
arXiv:2201.04532 .
Xu, Z., Bagci, U., Foster, B., Mansoor, A., Udupa, J.K., Mollura, D.J., 2015.
A hybrid method for airway segmentation and automated measurement of
bronchial wall thickness on ct. Medical image analysis 24, 1–17.
Xue, Y., Tang, H., Qiao, Z., Gong, G., Yin, Y., Qian, Z., Huang, C., Fan, W.,
Huang, X., 2020. Shape-aware organ segmentation by predicting signed
distance maps, in: Proceedings of the AAAI Conference on Artificial Intel-
ligence, pp. 12565–12572.
Yu, W., Zheng, H., Gu, Y., Xie, F., Yang, J., Sun, J., Yang, G.Z., 2022a. Tnn:
Tree neural network for airway anatomical labeling. IEEE Transactions on
Medical Imaging .
Yu, W., Zheng, H., Zhang, M., Zhang, H., Sun, J., Yang, J., 2022b. Break:
Bronchi reconstruction by geodesic transformation and skeleton embed-
ding, in: 2022 IEEE 19th International Symposium on Biomedical Imaging
(ISBI), IEEE. pp. 1–5.
Yun, J., Park, J., Yu, D., Yi, J., Lee, M., Park, H.J., Lee, J.G., Seo, J.B., Kim, N.,
2019. Improvement of fully automated airway segmentation on volumetric
computed tomographic images using a 2.5 dimensional convolutional neural
net. Medical image analysis 51, 13–20.
Zhang, M., Yang, G.Z., Gu, Y., 2022a. Differentiable topology-preserved
distance transform for pulmonary airway segmentation. arXiv preprint
arXiv:2209.08355 .
Zhang, M., Yu, X., Zhang, H., Zheng, H., Yu, W., Pan, H., Cai, X., Gu, Y.,
2021a. Fda: Feature decomposition and aggregation for robust airway seg-
mentation, in: Domain Adaptation and Representation Transfer, and Afford-
able Healthcare and AI for Resource Diverse Global Health. Springer, pp.
25–34.
Zhang, M., Zhang, H., Yang, G.Z., Gu, Y., 2022b. Cfda: Collaborative feature
disentanglement and augmentation for pulmonary airway tree modeling of
covid-19 cts, in: International Conference on Medical Image Computing and
Computer-Assisted Intervention, Springer. pp. 506–516.
Zhang, Z., Marin, D., Chesakov, E., Maza, M.M., Drangova, M., Boykov, Y.,
2019. Divergence prior and vessel-tree reconstruction, in: Proceedings of
the IEEE/CVF Conference on Computer Vision and Pattern Recognition,
pp. 10216–10224.
Zhang, Z., Marin, D., Drangova, M., Boykov, Y., 2021b. Confluent vessel trees
with accurate bifurcations, in: Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition, pp. 9573–9582.
Zhao, T., Yin, Z., Wang, J., Gao, D., Chen, Y., Mao, Y., 2019. Bronchus
segmentation and classification by neural networks and linear programming,
in: International conference on medical image computing and computer-
assisted intervention, Springer. pp. 230–239.
Zheng, H., Qin, Y., Gu, Y., Xie, F., Sun, J., Yang, J., Yang, G.Z., 2021a. Re-
fined local-imbalance-based weight for airway segmentation in ct, in: Inter-
national Conference on Medical Image Computing and Computer-Assisted
Intervention, Springer. pp. 410–419.
Zheng, H., Qin, Y., Gu, Y., Xie, F., Yang, J., Sun, J., Yang, G.Z., 2021b. Alle-
viating class-wise gradient imbalance for pulmonary airway segmentation.
IEEE Transactions on Medical Imaging 40, 2452–2462.
Zhu, W., Huang, Y., Zeng, L., Chen, X., Liu, Y., Qian, Z., Du, N., Fan, W.,
Xie, X., 2019. Anatomynet: deep learning for fast and fully automated
whole-volume segmentation of head and neck anatomy. Medical physics
46, 576–589.
Zhuang, X., Li, L., Payer, C., Štern, D., Urschler, M., Heinrich, M.P., Oster, J.,
Wang, C., Smedby, Ö., Bian, C., et al., 2019. Evaluation of algorithms for
multi-modality whole heart segmentation: an open-access grand challenge.
Medical image analysis 58, 101537.