Medical Imaging with Deep Learning 2019 (MIDL 2019), Extended Abstract Track
Efficient Prealignment of CT Scans for Registration
through a Bodypart Regressor
Hans Meine 1,2   meine@uni-bremen.de
Alessa Hering 2,3   alessa.hering@mevis.fraunhofer.de
1 University of Bremen, Medical Image Computing Group
2 Fraunhofer Institute for Digital Medicine MEVIS
3 Diagnostic Image Analysis Group, Radboud UMC, Nijmegen, Netherlands
Abstract
Convolutional neural networks have not only been applied for classification of voxels, ob-
jects, or images, for instance, but have also been proposed as a bodypart regressor. We pick
up this underexplored idea and evaluate its value for registration: A CNN is trained to out-
put the relative height within the human body in axial CT scans, and the resulting scores
are used for quick alignment between different timepoints. Preliminary results confirm that
this allows both fast and robust prealignment compared with iterative approaches.
Keywords: SSBR, self-supervised bodypart regressor, registration, prealignment
1. Introduction
Bodypart recognition is an interesting task that has many potential benefits for workflow
improvements. For instance, it may be used for data mining in large PACS systems, for
triggering automatic preprocessing or analysis steps of relevant body regions, or for offering
optimized viewer initializations for human readers during a particular kind of study.
The recent advent of convolutional neural networks in medical image computing has
also led to several applications for bodypart recognition: Yan et al. (2016) and Roth et al.
(2015) trained CNNs that classified axial CT slices into one of 12 and 5 manually labeled
body parts, respectively. While such an assignment is reasonable on a scan level, it has
the downside that slices in transition regions (e.g. thorax / abdomen) cannot be uniquely
assigned and that the information is rather coarse. Hence, later works (Zhang et al., 2017;
Yan et al., 2017) posed the problem as a regression of the relative body height, which allows
more fine-grained region identification and does not suffer from ambiguities. While Zhang
et al. (2017) used manually labeled anatomical landmarks for calibration, Yan et al. (2017)
suggested a novel training approach that no longer needs any manual annotation, but can
be trained on a large number of unlabeled transversal CT scans. The latter method was
introduced for mining RECIST measurements (Yan et al., 2018) and is used in this work.
Registration of CT volumes has also recently been approached with CNNs (Eppenhof
et al., 2019; Hering and Heldmann, 2019; de Vos et al., 2019). Most of these approaches
focus on deformable registration. However, a full registration pipeline typically consists
of several steps, starting with a coarse prealignment. This prealignment is particularly
challenging when the two scans cover different regions or have only a small overlap.
© 2019 H. Meine & A. Hering.
arXiv:1909.08898v1 [eess.IV] 19 Sep 2019
Figure 1: Slice scores of a random image pair, prealigned based on three samples (plot of slice score over z position [mm] relative to the start of the fixed image, showing the fixed and moving score curves, the sampled scores from both images, and the linear interpolation of the samples).
2. Materials and Methods
For the experiments described in the following, we used a dataset of 1475 thorax-abdomen
CT scans in transversal orientation from 489 patients acquired at Radboud UMC, NL
in 2015. After a patient-level split into training and test sets, we trained a bodypart
regressor on a subset of 1035 volumes (from 326 patients) that had at least 300 slices each,
comprising a total of 670,986 slices, leaving a set of 440 volumes with 284,258 slices for
testing. These test volumes result in 277 intra-patient registration pairs. To generate more
challenging cases, only a subvolume of the moving image is used for the registration. For
this purpose, a random start slice is uniformly sampled between the first slice and the
100th-to-last slice. Additionally, the number of slices is chosen uniformly, from at least
20 slices up to the rest of the volume.
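The subvolume sampling above can be sketched as follows (the helper name and the exact index conventions are our assumptions):

```python
import random

def sample_subvolume(n_slices, margin=100, min_len=20):
    """Pick a random (start, length) subvolume of the moving image:
    start between the first and the (end - margin)th slice, length
    between min_len slices and the rest of the volume (a sketch)."""
    start = random.randint(0, max(0, n_slices - margin))
    length = random.randint(min_len, n_slices - start)
    return start, length
```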
The self-supervised bodypart regressor (SSBR (Yan et al., 2018), a.k.a. UBR (Yan et al.,
2017)) is based on a modified VGG-16 network that uses global average pooling and a single
dense layer after the convolutional base to output a single score for each slice, resampled
to 128×128 voxels. The key ingredient is the loss function, which consists of two parts:
given a batch of 32 stacks of $m = 8$ equidistant slices, the loss
$L_{\text{SSBR}} = L_{\text{order}} + L_{\text{dist}}$ is the sum of a term
$L_{\text{order}} = -\sum_{i=0}^{m-2} \log h(s_{i+1} - s_i)$ that penalizes non-increasing scores
within the stack and a term $L_{\text{dist}} = \sum_{i=0}^{m-3} g(\Delta_{i+1} - \Delta_i)$ for achieving equal differences
$\Delta_i = s_{i+1} - s_i$ between scores of equidistant slices ($h$ is the sigmoid function and $g$ is the
smooth L1 loss). By sampling random stacks of equidistant slices with varying positions
and inter-slice spacings, the network learns to output linearly increasing scores via short-
and long-distance penalties. Absolutely no manual annotation is necessary, not even landmarks.
For registration prealignment based on the regression scores, we devised two methods.
The first, extremely fast approach is to compute the score of the first and last slices of
the fixed image and the score of the center slice of the moving image. This allows us to
estimate the relative position of the center of the moving image within the fixed image for
prealignment, based on scoring just three slices (cf. Figure 1). The second method computes
the scores of all slices, resampled to a common slice spacing, and then finds the best
match by shifting one score curve with respect to the other, identifying the position with
the lowest ℓ1 norm of the overlapping parts (Figure 2).
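Both prealignment strategies reduce to simple one-dimensional operations on the score curves. A sketch (function names, the integer-shift search, and the normalization of the ℓ1 cost by overlap length are our choices):

```python
import numpy as np

def fast_prealign_z(score_fixed_first, score_fixed_last, n_fixed_slices,
                    score_moving_center):
    """Method 1: place the moving image's center slice within the fixed
    image by linear interpolation between the fixed first/last scores."""
    rel = ((score_moving_center - score_fixed_first)
           / (score_fixed_last - score_fixed_first))
    return rel * (n_fixed_slices - 1)           # fixed-image slice index

def l1_shift(fixed_scores, moving_scores, min_overlap=5):
    """Method 2: integer z-shift (at a common slice spacing) minimizing
    the mean absolute difference of the overlapping score curves."""
    f = np.asarray(fixed_scores, dtype=float)
    m = np.asarray(moving_scores, dtype=float)
    best_shift, best_cost = None, np.inf
    for shift in range(-len(m) + min_overlap, len(f) - min_overlap + 1):
        lo, hi = max(0, shift), min(len(f), shift + len(m))
        if hi - lo < min_overlap:
            continue
        cost = np.mean(np.abs(f[lo:hi] - m[lo - shift:hi - shift]))
        if cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift
```

Normalizing the cost by the overlap length (mean rather than sum) avoids biasing the search toward small overlaps; whether the original implementation does this is an assumption on our part.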
We compare these SSBR prealignments with a brute-force grid search method named
FASTA (Fast Translation Alignment), which evaluates a difference measure (here SSD, the
squared ℓ2 norm of the difference image) on a grid of possible translations. Finer grids allow
for more precise translation estimation at the expense of increased computational cost. For
Table 1: Scoring results for all methods with the following categories: 1: very good align-
ment, 2: good alignment, 3: correct body region, and 4: failure.

Method      Mean Score     1     2    3    4
FASTA          1.6        188    41   17   31
SSBR fast      1.7        127   113   23   14
SSBR ℓ1        1.3        201    64   10    2
Figure 2: Prealignment according to ℓ1 minimization on a different image pair (plot of slice score over z position [mm] relative to the start of the fixed image, showing the aligned fixed and moving score curves).
faster processing, the moving image is resampled to a maximal image size of 128×128×128.
The fixed image is resampled to the same image resolution as the moving image. For the grid
generation, we choose a sampling rate of 3, 3, and 71 in x, y, and z-direction respectively.
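A one-dimensional sketch of this brute-force search along z, on volumes as NumPy arrays (the real FASTA also searches in x and y; the symmetric search range is our assumption, while the 71 z-samples match the sampling rate above):

```python
import numpy as np

def fasta_z(fixed, moving, z_steps=71):
    """Brute-force z-translation minimizing SSD (squared l2 norm of the
    difference image, averaged over the overlap). 1-D sketch of FASTA."""
    best_dz, best_ssd = 0, np.inf
    half = fixed.shape[0] // 2
    for dz in np.linspace(-half, half, z_steps).astype(int):
        lo, hi = max(0, dz), min(fixed.shape[0], dz + moving.shape[0])
        if hi - lo < 1:
            continue                             # no overlap at this shift
        diff = fixed[lo:hi] - moving[lo - dz:hi - dz]
        ssd = np.mean(diff ** 2)
        if ssd < best_ssd:
            best_dz, best_ssd = dz, ssd
    return int(best_dz)
```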
For evaluation, we visually score the registration results into 4 categories: 1: very good
alignment, 2: good alignment, 3: correct body region, and 4: failure.
3. Results and Conclusion
The scoring results are shown in Table 1. On an NVIDIA GTX 1080 Ti, the regressor only
requires 0.86 ms per slice, whereas our application needed an additional 2.9 ms to load and
preprocess the DICOM slices. The resulting scores of each volume had an average Pearson
correlation coefficient of 99.34% against the respective slice numbers. Figures 1 and 2 show
example alignments of the score curves on two randomly selected image pairs using the two
proposed methods. The ℓ1-based alignment gives much more stable results, at the expense
of having to run all slices through the regressor. Still, the runtime is around 1 s for typical
image pairs (compared to 10 ms for the fast method and about 5 s for FASTA).
Conclusion: Our SSBR-based prealignment methods are faster than FASTA, and the ℓ1-
based alignment also shows better scoring results. However, they only deliver an alignment
in z direction (still the most important component when registering two axial CT scans).
The ℓ1-based alignment is very robust, and while it has to score all slices, the subsequent
step only has to align two small one-dimensional score arrays. For extremely fast alignment,
we can score just three slices, but the resulting estimates are less precise in some cases.
Both proposed methods are much more robust than traditional methods when the overlap
of the volumes to be registered is small. Using the SSBR, it is even possible to align non-
overlapping volumes. We plan to further evaluate this new prealignment in practice and on
a larger dataset.
Acknowledgments
We thank Bram van Ginneken and the DIAG group (Radboud UMC, Nijmegen) for making
the CT data available within our common "Automation in Medical Imaging" Fraunhofer
ICON project.
References
Bob D. de Vos, Floris F. Berendsen, Max A. Viergever, Hessam Sokooti, Marius Staring, and
Ivana Išgum. A deep learning framework for unsupervised affine and deformable image
registration. Medical Image Analysis, 52:128–143, 2019.
Koen AJ Eppenhof, Maxime W Lafarge, and Josien PW Pluim. Progressively growing
convolutional networks for end-to-end deformable image registration. In Medical Imaging
2019: Image Processing, volume 10949, page 109491C. International Society for Optics
and Photonics, 2019.
Alessa Hering and Stefan Heldmann. Unsupervised learning for large motion thoracic CT
follow-up registration. In SPIE Medical Imaging: Image Processing, volume 10949, page
109491B, 2019.
Holger R. Roth, Christopher T. Lee, Hoo-Chang Shin, Ari Seff, Lauren Kim, Jianhua Yao,
Le Lu, and Ronald M. Summers. Anatomy-specific classification of medical images using
deep convolutional nets. In 2015 IEEE 12th International Symposium on Biomedical
Imaging (ISBI), pages 101–104, Brooklyn, NY, USA, April 2015. IEEE. ISBN
978-1-4799-2374-8. doi: 10.1109/ISBI.2015.7163826. URL
http://ieeexplore.ieee.org/document/7163826/.
Ke Yan, Le Lu, and Ronald M. Summers. Unsupervised Body Part Regression via Spatially
Self-ordering Convolutional Neural Networks. arXiv:1707.03891 [cs], July 2017. URL
http://arxiv.org/abs/1707.03891.
Ke Yan, Xiaosong Wang, Le Lu, Ling Zhang, Adam P. Harrison, Mohammadhadi Bagheri,
and Ronald M. Summers. Deep Lesion Graphs in the Wild: Relationship Learning and
Organization of Significant Radiology Image Findings in a Diverse Large-Scale Lesion
Database. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pages 9261–9270, 2018.
Z. Yan, Y. Zhan, Z. Peng, S. Liao, Y. Shinagawa, S. Zhang, D. N. Metaxas, and X. S. Zhou.
Multi-Instance Deep Learning: Discover Discriminative Local Anatomies for Bodypart
Recognition. IEEE Transactions on Medical Imaging, 35(5):1332–1343, May 2016. ISSN
0278-0062. doi: 10.1109/TMI.2016.2524985.
P. Zhang, F. Wang, and Y. Zheng. Self supervised deep representation learning for fine-
grained body part recognition. In 2017 IEEE 14th International Symposium on Biomed-
ical Imaging (ISBI 2017), pages 578–582, April 2017. doi: 10.1109/ISBI.2017.7950587.
ResearchGate has not been able to resolve any citations for this publication.
Conference Paper
Full-text available
Radiologists in their daily work routinely find and annotate significant abnormalities on a large number of radiology images. Such abnormalities, or lesions, have collected over years and stored in hospitals' picture archiving and communication systems. However, they are basically unsorted and lack semantic annotations like type and location. In this paper, we aim to organize and explore them by learning a deep feature representation for each lesion. A large-scale and comprehensive dataset, DeepLesion, is introduced for this task. DeepLesion contains bounding boxes and size measurements of over 32K lesions. To model their similarity relationship, we leverage multiple supervision information including types, self-supervised location coordinates and sizes. They require little manual annotation effort but describe useful attributes of the lesions. Then, a triplet network is utilized to learn lesion embeddings with a sequential sampling strategy to depict their hierarchical similarity structure. Experiments show promising qualitative and quantitative results on lesion retrieval, clustering, and classification. The learned embeddings can be further employed to build a lesion graph for various clinically useful applications. We propose algorithms for intra-patient lesion matching and missing annotation mining. Experimental results validate their effectiveness.
Article
Full-text available
In general image recognition problems, discriminative information often lies in local image patches. For example, most human identity information exists in the image patches containing human faces. The same situation stays in medical images as well. "Bodypart identity" of a transversal slice - which bodypart the slice comes from - is often indicated by local image information, e.g. a cardiac slice and an aorta arch slice are only differentiated by the mediastinum region. In this work, we design a multi-stage deep learning framework for image classification and apply it on bodypart recognition. Specifically, the proposed framework aims at: 1) discover the local regions that are discriminative and non-informative to the image classification problem, and 2) learn a image-level classifier based on these local regions. We achieve these two tasks by the two stages of learning scheme, respectively. In the pre-train stage, a convolutional neural network (CNN) is learned in a multiinstance learning fashion to extract the most discriminative and and non-informative local patches from the training slices. In the boosting stage, the pre-learned CNN is further boosted by these local patches for image classification. The CNN learned by exploiting the discriminative local appearances becomes more accurate than those learned from global image context. The key hallmark of our method is that it automatically discovers the discriminative and non-informative local patches through multiinstance deep learning. Thus, no manual annotation is required. Our method is validated on a synthetic dataset and a large scale CT dataset. It achieves better performances than state-of-the-art approaches, including the standard deep CNN.
Conference Paper
Full-text available
Automated classification of human anatomy is an important prerequisite for many computer-aided diagnosis systems. The spatial complexity and variability of anatomy throughout the human body makes classification difficult. “Deep learning” methods such as convolutional networks (ConvNets) outperform other state-of-the-art methods in image classification tasks. In this work, we present a method for organ- or body-part-specific anatomical classification of medical images acquired using computed tomography (CT) with ConvNets. We train a ConvNet, using 4,298 separate axial 2D key-images to learn 5 anatomical classes. Key-images were mined from a hospital PACS archive, using a set of 1,675 patients. We show that a data augmentation approach can help to enrich the data set and improve classification performance. Using ConvNets and data augmentation, we achieve anatomy-specific classification error of 5.9 % and area-under-the-curve (AUC) values of an average of 0.998 in testing. We demonstrate that deep learning can be used to train very reliable and accurate classifiers that could initialize further computer-aided diagnosis.
Article
Image registration, the process of aligning two or more images, is the core technique of many (semi-)automatic medical image analysis tasks. Recent studies have shown that deep learning methods, notably convolutional neural networks (ConvNets), can be used for image registration. Thus far training of ConvNets for registration was supervised using predefined example registrations. However, obtaining example registrations is not trivial. To circumvent the need for predefined examples, and thereby to increase convenience of training ConvNets for image registration, we propose the Deep Learning Image Registration (DLIR) framework for unsupervised affine and deformable image registration. In the DLIR framework ConvNets are trained for image registration by exploiting image similarity analogous to conventional intensity-based image registration. After a ConvNet has been trained with the DLIR framework, it can be used to register pairs of unseen images in one shot. We propose flexible ConvNets designs for affine image registration and for deformable image registration. By stacking multiple of these ConvNets into a larger architecture, we are able to perform coarse-to-fine image registration. We show for registration of cardiac cine MRI and registration of chest CT that performance of the DLIR framework is comparable to conventional image registration while being several orders of magnitude faster.
  • Ke Yan
  • Le Lu
  • Ronald M Summers
Ke Yan, Le Lu, and Ronald M. Summers. Unsupervised Body Part Regression via Spatially Self-ordering Convolutional Neural Networks. arXiv:1707.03891 [cs], July 2017. URL http://arxiv.org/abs/1707.03891.