Learning to Detect Anatomical Landmarks of the
Pelvis in X-rays From Arbitrary Views
Bastian Bier^{1,3} · Florian Goldmann^{1,3} · Jan-Nico Zaech^{1,3} · Javad Fotouhi^{1,2} · Rachel Hegeman^4 · Robert Grupp^2 · Mehran Armand^{4,5} · Greg Osgood^5 · Nassir Navab^{1,2} · Andreas Maier^3 · Mathias Unberath^{1,2}

B. Bier, E-mail: bastian.bier@fau.de
M. Unberath, E-mail: unberath@jhu.edu

^1 Computer Aided Medical Procedures, Johns Hopkins University, Baltimore, USA
^2 Department of Computer Science, Johns Hopkins University, Baltimore, USA
^3 Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
^4 Applied Physics Laboratory, Johns Hopkins University, Baltimore, USA
^5 Department of Orthopedic Surgery, Johns Hopkins Hospital, Baltimore, USA
Abstract
Purpose Minimally invasive alternatives are now available for many complex surg-
eries. These approaches are enabled by the increasing availability of intra-operative
image guidance. Yet, fluoroscopic X-rays suffer from projective transformation, and
thus cannot provide direct views onto anatomy. Surgeons could highly benefit from
additional information, such as the anatomical landmark locations in the projec-
tions, to support intra-operative decision making. However, detecting landmarks is
challenging since the viewing direction changes substantially between views lead-
ing to varying appearance of the same landmark. Therefore, and to the best of our
knowledge, view-independent anatomical landmark detection has not been inves-
tigated yet.
Methods In this work, we propose a novel approach to detect multiple anatom-
ical landmarks in X-ray images from arbitrary viewing directions. To this end,
a sequential prediction framework based on convolutional neural networks is em-
ployed to simultaneously regress all landmark locations. For training, synthetic
X-rays are generated with a physically accurate forward model that allows direct
application of the trained model to real X-ray images of the pelvis. View invari-
ance is achieved via data augmentation by sampling viewing angles on a spherical
segment of 120° × 90°.
Results On synthetic data, a mean prediction error of 5.6 ± 4.5 mm is achieved.
Further, we demonstrate that the trained model can be directly applied to real
X-rays, and show that these detections define correspondences to a respective CT
volume, which allows for analytic estimation of the 11 degree of freedom projective
mapping.
Conclusion We present the first tool to detect anatomical landmarks in X-ray
images independent of their viewing direction. Access to this information during
surgery may benefit decision making and constitutes a first step towards global
initialization of 2D/3D registration without the need of calibration. As such, the
proposed concept has a strong prospect to facilitate and enhance applications and
methods in the realm of image-guided surgery.
Keywords Anatomical Landmarks · Convolutional Neural Networks · 2D/3D Registration · Landmark Detection
1 Introduction
In recent years, the increasing availability of intra-operative image guidance has
enabled percutaneous alternatives to complex procedures. This is beneficial for
the patient since minimally invasive surgeries are associated with a reduced risk
of infection, less blood loss, and an overall decrease of discomfort. However, this
comes at the cost of increased task-load for the surgeon, who has no direct view
onto the patient’s anatomy but has to rely on indirect feedback through X-ray
images. These suffer from projective transformation; in particular the absence of
depth cues and, depending on the viewing direction, vanishing anatomical land-
marks. One of these procedures is percutaneous pelvis fracture fixation. Pelvis
fractures may be complex with a variety of fracture patterns. In order to fixate
pelvic fractures internally, K-wires must be guided through narrow bone corridors.
Numerous X-ray images from different views may be required to ensure a correct
tool trajectory [2,24]. One possibility to support the surgeon during these pro-
cedures is to supply additional contextual information extracted from the image.
Providing additional, "implicit 3D" information during these interventions can
drastically ease the mental mapping, where the surgeon has to register the tool in
his hand to the 3D patient anatomy using 2D X-ray images only [22,8]. In this
case, implicit 3D information refers to data that is not 3D as such but provides
meaningful contextual information related to prior knowledge of the surgeon.
A promising candidate for implicit 3D information are the positions of anatom-
ical landmarks in the X-ray images. Anatomical landmarks are biologically mean-
ingful locations in anatomy that can be readily detected and enable correspondence
between specimens and across domains. Inherently, the knowledge of landmark lo-
cations exhibits helpful properties: (1) context is provided, which supports intra-
operative decision making, (2) they supply semantic information, which defines
correspondences across multiple images, and (3) they might foster machine under-
standing. For these reasons, anatomical landmarks are widely used in medicine and
medical imaging, where they serve as orientation in diagnostic and interventional
radiology [7]. They deliver a better interpretation of the patients’ anatomy [28]
and are also of interest for image processing tasks as prerequisite to initialize or
constrain mathematical models [27]. A non-exhaustive review reveals that anatom-
ical landmarks have been used to guide and model segmentation tasks [31,10], to
perform image registration [12], to extract relevant clinical quantitative measure-
ments [16], to plan therapies [14], or to initialize further image processing [17].
Often, knowing the exact location of landmarks is mandatory for the desired
application suggesting that landmarks must be labeled manually [28]. Manual
labeling is time consuming, interrupts the clinical workflow, and is subjective,
which yields rater dependent results. Although important, anatomical landmark
detection is a challenging task due to patient specific variations and ambiguous
anatomical structures. At the same time, automatic algorithms should be fast,
robust, reliable, and accurate.
Landmark detection methods have been developed for various imaging modal-
ities and for 2D or 3D image data [6,7]. In the following overview we focus on 2D
X-ray images. Landmark or key point detection is well understood in computer
vision, where robust feature descriptors disambiguate correspondences between
multiple 2D images, finally enabling purely image-based pose retrieval. Unfortu-
nately, the above concept defined for reflection imaging does not translate di-
rectly to transmission imaging. For the latter, image and landmark appearance
can change fundamentally depending on the viewing direction since the whole 3D
object contributes to the resulting detector measurement.
Most of the current landmark detection approaches either predict the land-
mark positions on the input image directly, or combine these initial estimates
subsequently with a parametric or graphical model fitting step [27]. Constraining
detection results by models that encode prior knowledge can disambiguate false
positive responses. Alternatively, priors can be incorporated implicitly, if multiple
landmarks are detected simultaneously by reducing the search space to possible
configurations [15]. Wang et al. summarized several landmark detection methods
competing in a Grand Challenge [28], where 19 landmarks have to be detected
in 2D cephalometric X-ray images of the craniofacial area, a task necessary for
modern orthodontics. Mader et al. used a U-net to localize ribs in chest radio-
graphs [17]. They solved the problem of ambiguities in the local image informa-
tion (false responses) using a conditional random field. This second step assesses
spatial information between the landmarks and also refines the hypotheses gen-
erated by the U-net. Sa et al. detected intervertebral discs in X-ray images of
the spine to predict a bounding box of the respective vertebrae, by using a pre-
trained Faster R-CNN and refining its weights [21]. Payer et al. evaluated different
CNN architectures to detect multiple landmark locations in hand radiographs
by regressing a single heat map for each landmark [19]. In a similar task,
another approach used random forests to detect 37 anatomical landmarks in hand
radiographs [23]. The initial estimates were subsequently combined with prior knowl-
edge given by possible landmark configurations; for each landmark, a unique
random regression forest is trained. In Xie et al., anatomical landmarks were a
prerequisite for the segmentation of the pelvis in anterior-posterior radiographs in
order to create a 3D patient specific pelvis model for surgical planning. The shape
model utilized for this purpose is based on anatomical landmarks [30].
All the approaches presented above assume a single, predefined view onto the
anatomy. This assumption is valid for certain applications, where radiographic im-
ages in a diagnostic setup are often acquired in standardized views, but it is strongly
violated when the view changes continuously, e.g. in interventional applications or for
projection data acquired on trajectories. To the best of our knowledge, there exists no approach that is able to
detect anatomical landmarks in X-ray images independent of the viewing direction.
The view independence substantially complicates the landmark detection problem
in X-ray images, since object edges vanish and anatomical structures overlap due to
the effect of transmission imaging. X-ray transform invariant landmark detection,
therefore, bears great potential to aid fluoroscopic guidance.

Fig. 1 Schematic representation of the convolutional neural network used in this work. A single
input image (615 × 479 pixels) is processed by multiple stages of convolutional (C) and pooling (P)
layers (e.g. 9×9 filters with 128 channels), resulting in a stack of 23 belief maps of size 76 × 59,
where each map corresponds to one landmark location. During the stage-wise application, these
belief maps are refined.
In contrast to the landmark detection approaches that deliver implicit 3D in-
formation, several approaches exist that introduce explicit 3D information. These
solutions rely on external markers to track the tools or the patient in 3D [18], con-
sistency conditions to estimate relative pose between X-ray images [1], or 2D/3D
registration of pre-operative CT to intra-operative X-ray to render multiple views
simultaneously [18,25]. While these approaches have proven helpful, they are not
widely accepted in clinical practice. The primary reasons are disruptions to the
surgical workflow [8], as well as susceptibility to truncation and to poor initialization
due to the low capture range of the optimization target [11].
In this work, we propose an automatic, purely image-based method to detect
multiple anatomical landmarks in X-ray images independent of the viewing direc-
tion. Landmarks are detected using a sequential prediction framework [29] trained
on synthetically generated images. Based on landmark knowledge, we can a) iden-
tify corresponding landmarks between arbitrary views of the same anatomy and
b) estimate pose relative to a pre-procedurally acquired volume without the need
for any calibration. We evaluate our approach on synthetic data and demonstrate
that it generalizes to unseen clinical X-rays of the pelvis without the need for re-
training. Further, we argue that the accuracy of our detections in clinical X-rays
may benefit the initialization of 2D/3D registration. This paper is an extended
version of the work presented at the MICCAI 2018 conference [4] and provides
a broader background on existing landmark detection research, a comprehensive
quantitative analysis of the view invariance on synthetic data, and a quantitative
evaluation on real X-ray images of cadaveric specimens.
2 Materials and Methods
2.1 Network Architecture
The sequential prediction framework used in this work has been initially developed
for human pose estimation [29]. In the original application, the machine learning
task is to detect multiple human joint positions in RGB images. The architecture
is abstractly depicted in Figure 1. Given a single RGB input image, the network
predicts multiple belief maps $b_t^p$ for each joint position $p \in \{1, \dots, P\}$ at the end of
every stage $t \in \{1, \dots, T\}$ of the network. In the first stage, initial belief maps $b_1^p$
are predicted based only on local image information. Image features are extracted
using a stack of convolutional and pooling layers with Rectified Linear Units (ReLUs)
as activation functions, described by weights $w_1$. In the following stages $t \geq 2$,
the predicted belief maps $b_t^p$ are obtained by combining local image information
extracted by the layers with weights $w_p$ with the prediction results of the preceding
stage. Note that this combination is implemented using a concatenation operation.
The weights $w_p$ are shared across all stages $t \geq 2$. The cost function $C$ is the sum
of the $L_2$-losses between the predicted belief maps $b_t^p$ and the ground truth belief
maps $b_*^p$:

$$C = \sum_{t=1}^{T} \sum_{p=1}^{P} \left\lVert b_t^p - b_*^p \right\rVert_2^2 \qquad (1)$$

The ground truth belief maps $b_*^p$ contain a normal distribution, centered at
the ground truth joint position. By design, the network imposes several properties:
The key element of the architecture is that the belief maps are predicted based on
local image information as well as on the results of the preceding stage. This enables
the model to learn long-range contextual dependencies of landmark configurations.
The belief maps of the first stage, $b_1^p$, are predicted from local image informa-
tion only, which leads to false positive responses due to ambiguities in the local image
appearance. The stage-wise application resolves these by implicitly incorporating
the characteristic configuration of the landmark positions. Furthermore, the net-
work has a very large receptive field that also increases over stages, which enables
the learning of spatial dependencies over long distances. Lastly, the loss is computed over all
intermediate predictions $b_t^p$, which counteracts the vanishing gradient
effect and simultaneously guides the network to focus on the detection task early on.
A drawback of this architecture is the small size of the output belief maps, which
are downsampled by a factor of around eight compared to the input size.
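Since the original implementation is not published, the following is a minimal sketch of the two ingredients described above: the ground truth belief maps (a Gaussian centered at the landmark) and the intermediate supervision loss of Eq. (1). The tensor layout, the Gaussian width, and the TensorFlow 2 style are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf


def gaussian_belief_map(center_xy, shape=(59, 76), sigma=1.5):
    """Ground truth belief map b*: a 2D Gaussian centered at the
    (downsampled) landmark position. sigma is an assumed value."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    cx, cy = center_xy
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def intermediate_supervision_loss(stage_beliefs, gt_beliefs):
    """Eq. (1): sum of squared L2 distances over all stages and landmarks.

    stage_beliefs: list of T tensors, one per stage, each (batch, H, W, P).
    gt_beliefs:    tensor (batch, H, W, P), identical target for all stages.
    """
    loss = 0.0
    # Supervising every intermediate stage counteracts vanishing gradients.
    for b_t in stage_beliefs:
        loss += tf.reduce_sum(tf.square(b_t - gt_beliefs))
    return loss
```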
2.2 Landmark Detection
We exploit the aforementioned advantages of sequential prediction frameworks
for the detection of anatomical landmarks in X-ray images independent of their
viewing direction. Our assumption is that anatomical landmarks exhibit strong
constraints and thus characteristic patterns even in presence of arbitrary viewing
angles. In fact, this assumption may be even stronger compared to human pose
estimation if limited anatomy, such as the pelvis, is considered due to rigidity.
Within this paper and as a first proof-of-concept, we study anatomical landmarks
on the pelvis. We devise a network adapted from [29] with six stages to simultane-
ously predict 23 belief maps per X-ray image that are used for landmark location
extraction, as shown in Figure 1.
In order to obtain the predicted landmark positions, all predicted belief maps
$b_t^p$ are averaged over all stages prior to estimating the position of the landmarks,
yielding the averaged belief map $b^p$. We then define the landmark position $l^p$ as the
position with the highest response in $b^p$. Since the belief maps are downsampled,
the maximum location is computed with sub-pixel accuracy via a maximum likelihood
estimate of the underlying Gaussian. If the maximum response in a belief
map is below 0.4, the corresponding landmark is discarded, since it may be outside the field
of view or not reliably recognized. The implementation was done in Python
and TensorFlow. The hyperparameters for the network training were set to $10^{-6}$ for
the learning rate and a batch size of one. Optimization was performed using Adam
over 30 epochs, until convergence on the validation set had been reached.

Fig. 2 The pelvis bone of a CT from the data set used, rendered with the corresponding
manually labeled 3D landmarks. Orange dots with numbers indicate visible
landmarks. Landmarks hidden due to the rendering are marked with a grey box and number
(e.g. the tip of the right femoral head, landmark #13).
2.3 Data Generation
The network training requires a data set of X-ray images with corresponding land-
mark positions. Manual labeling is infeasible for various reasons: First of all, the
labeling process to obtain the required amount of training data is prohibitively time consuming.
However, and more importantly, an accurate and consistent labeling cannot be
guaranteed in the 2D projection images due to the discussed properties of trans-
mission imaging (vanishing edges, superimposed anatomy). Therefore, we synthet-
ically generated the training data from full body CTs of the NIH Cancer Imaging
Archive [20]. In total, 23 landmark positions were manually labeled in 20 CTs of
male and female patients using 3D volume renderings in 3D Slicer [5]. The land-
mark positions have been selected to be clinically meaningful, to have a good visi-
bility in the projection images, and to be consistently identifiable on the anatomy.
The selected landmarks are depicted in Figure 2.
Subsequently, data was obtained by forward projection of the volume and the
respective 3D labels with the same X-ray imaging geometry, resulting in a set
of X-ray images with corresponding landmark positions. The synthetic X-ray im-
ages had a size of 615 × 479 pixels with an isotropic pixel spacing of 0.616 mm.
The corresponding ground truth belief maps were downsampled by a factor of
about eight and had a size of 76 × 59. For data generation, two factors are im-
portant to emphasize: (1) The augmentation of training data in order to obtain
view-invariance is crucial. To this end, we applied random translation to the CT
volume, varied the source-to-isocenter distance, applied flipping on the detector,
and most importantly, varied the angular range of the X-ray source position on
a spherical segment of 120° in LAO/RAO and 90° in CRAN/CAUD, centered
around an AP view of the pelvis. This range approximates the range of variation
in X-ray images during surgical procedures on the pelvis [13]. (2) A realistic for-
ward projector that accounts for physically accurate image formation, while being
capable of fast data generation was used to obtain realistic synthetic training data.
This allows direct application of the network model to real clinical X-ray images.
The forward projector computes material-dependent attenuation images that are
converted into synthetic X-rays [26]. In total, 20,000 X-rays were generated and
split 18/1/1 by patient into training, validation, and testing, where we ensured that
images of one patient are not shared among these sets.
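As an illustration of the augmentation strategy, viewing directions can be sampled uniformly on the stated spherical segment before forward projection. The angle-to-direction convention and the sampled distance range below are assumptions; the actual image formation used the material-dependent projector of [26].

```python
import numpy as np


def sample_source_position(rng=np.random):
    """Sample an X-ray source position on a 120 deg (LAO/RAO) x
    90 deg (CRAN/CAUD) spherical segment centered on an AP view."""
    lao_rao = np.deg2rad(rng.uniform(-60.0, 60.0))
    cran_caud = np.deg2rad(rng.uniform(-45.0, 45.0))
    # Source-to-isocenter distance in mm (assumed range for illustration).
    sid = rng.uniform(700.0, 800.0)
    # Unit viewing direction in an assumed patient-aligned coordinate frame.
    direction = np.array([
        np.sin(lao_rao) * np.cos(cran_caud),
        np.cos(lao_rao) * np.cos(cran_caud),
        np.sin(cran_caud),
    ])
    return sid * direction
```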
2.4 2D/3D Registration
As motivated previously, the detected anatomical landmarks offer a range of pos-
sible applications. In this work, we focus on the example of initializing 2D/3D
registration. To this end, 2D landmark positions are automatically extracted from
X-ray images while 3D points are obtained from a manually labeled pre-operative
CT acquisition of the same patient. Since the landmark detections supply semantic
information, correspondences between the 2D and 3D points are defined, which
enables the computation of the projection matrix $P \in \mathbb{R}^{3 \times 4}$ in closed form across the
two domains [9]. The set of 2D detections is expressed as homogeneous vectors
$d_n \in \mathbb{R}^3$ with $n \in \{1, \dots, N\}$, where each point contains the entries $d_n = (x_n, y_n, w_n)$.
The set of corresponding 3D points is denoted as homogeneous vectors $r_n \in \mathbb{R}^4$.
Following the direct linear transform, each correspondence yields two linearly in-
dependent equations [9, p. 178]:

$$\begin{pmatrix} \mathbf{0}^T & -w_i r_i^T & y_i r_i^T \\ w_i r_i^T & \mathbf{0}^T & -x_i r_i^T \end{pmatrix} \begin{pmatrix} p^1 \\ p^2 \\ p^3 \end{pmatrix} = \mathbf{0} \,. \qquad (2)$$

With $N$ being the number of corresponding points, these rows are stacked into
a measurement matrix of size $2N \times 12$. The vectors $p^1$, $p^2$, and $p^3 \in \mathbb{R}^4$ contain
the entries of the projection matrix $P$. These are obtained subsequently by
computing the null space of the measurement matrix.
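Equation (2) is the standard direct linear transform, so the projection matrix can be recovered as the null space of the stacked measurement matrix via an SVD. Below is a minimal sketch; it omits the Hartley-style coordinate normalization that is usually advisable for numerical conditioning, and assumes $w_n = 1$ for the detections.

```python
import numpy as np


def dlt_projection_matrix(points_2d, points_3d):
    """Closed-form 3x4 projection matrix from N >= 6 correspondences.

    points_2d: (N, 2) detected landmark positions in pixels.
    points_3d: (N, 3) corresponding labeled positions in the CT volume.
    """
    rows = []
    for (x, y), X in zip(points_2d, points_3d):
        r = np.append(X, 1.0)  # homogeneous 3D point r_n
        zero = np.zeros(4)
        rows.append(np.concatenate([zero, -r, y * r]))  # first row of Eq. (2)
        rows.append(np.concatenate([r, zero, -x * r]))  # second row of Eq. (2)
    A = np.asarray(rows)  # measurement matrix of size 2N x 12
    # Null space: right-singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 4)  # rows are p^1, p^2, p^3 (up to scale)
```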
3 Experiments and Results
The results section is split into two parts: In the first part, the results of the
landmark detection on the synthetic data set are presented. In the second part,
the network trained on synthetic data is used to predict anatomical landmarks
in real X-ray images of cadaveric specimens, acquired with a clinical C-arm X-ray
system. Note that the network has not been re-trained for this purpose. In
both cases, the results are presented qualitatively and quantitatively.
Fig. 3 Predicted landmark positions for example projection images sampled across the
spherical segment of the synthetic test data set (panels range from RAO 60° to LAO 60° and
from CAU 45° to CRA 45°). Ground truth positions are marked with blue labels and automatic
detections with red labels. Note that each projection image is processed independently.
Fig. 4 Detection accuracy depending on the viewing direction of the X-ray source. On average,
the detection results from central views are superior to those at the border of the spherical
segment. The accuracy is defined as the ratio of landmarks that have an error below 15 pixels
in the respective view.
3.1 Synthetic Data
For the evaluation of the view-invariant landmark detection, we created X-ray
images of the testing CT data set that were uniformly sampled across the whole
spherical segment with an angular spacing of 5° in both dimensions. A standard
setting for the geometry with 750 mm source-to-isocenter distance and 1,200 mm
source-to-detector distance was used. These distances are not varied in the evalu-
ation, since the focus is the angle-dependent detectability of landmarks.
Table 1 Individual landmark belief and error. Average Belief is the average of the highest
responses in the belief maps for a certain landmark. Average Error is the average distance
between the landmark detection and its ground truth location. The columns Q1, Q2, Q3, and
Q4 also contain the average error, but evaluated only in a particular quadrant of the spherical
segment to indicate detectability changes of certain landmarks across the spherical segment.
Error values are given in pixels.
Landmark # Average Belief Average Error [pixel] Q1 Q2 Q3 Q4
1 0.79 7.60 9.42 5.35 7.13 7.89
2 0.84 6.68 5.66 7.67 6.63 6.05
3 0.83 6.86 9.13 7.81 5.07 5.26
4 0.87 7.69 8.79 10.2 7.11 4.70
5 0.85 7.53 8.11 8.47 6.63 6.62
6 0.82 5.63 4.72 5.21 5.97 6.35
7 0.78 7.90 7.96 7.48 8.29 7.99
8 0.77 10.1 5.87 12.1 7.70 15.3
9 0.90 5.26 5.15 5.08 5.55 5.07
10 0.88 7.19 7.60 6.90 5.80 8.41
11 0.89 6.43 5.77 5.99 6.86 6.83
12 0.91 7.78 8.96 7.23 5.71 8.55
13 0.92 4.47 5.64 4.10 4.67 3.24
14 0.90 5.64 3.70 7.00 5.24 6.18
15 0.85 9.04 8.77 9.54 7.75 10.3
16 0.82 7.23 6.55 6.95 7.26 8.18
17 0.81 19.9 20.0 24.2 15.2 21.1
18 0.80 15.3 11.2 16.6 14.5 19.3
19 0.74 9.56 10.4 10.4 9.80 7.09
20 0.77 8.59 5.78 12.9 6.83 8.91
21 0.51 9.40 14.3 6.86 13.9 8.51
22 0.44 13.7 9.73 25.0 10.1 16.2
23 0.51 26.0 24.2 17.6 39.3 29.8
Average 9.10 ± 7.38
In Figure 3, the detection results are presented qualitatively and compared
to the ground truth positions. Overall, the qualitative agreement between ground
truth locations and predictions is very good. Quantitatively, the average distance
between ground truth positions and detections across all projections and landmarks
is 9.1 ± 7.4 pixels (5.6 ± 4.5 mm). Note that, as motivated previously, belief map
responses lower than 0.4 are treated as undetected and the corresponding
landmarks are excluded from the statistics. Graphically, the detection
accuracy is plotted across all viewing directions in Figure 4. We define the detec-
tion accuracy as the percentage of landmarks that have an error smaller than a
distance threshold of 15 pixels in the respective view. The detection accuracy is
also plotted against this threshold in Figure 8.
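Written out, this accuracy metric amounts to a threshold on the per-landmark errors; a short sketch, where marking discarded detections as NaN is our convention:

```python
import numpy as np


def detection_accuracy(errors_px, threshold=15.0):
    """Percentage of landmarks whose prediction error is below the
    distance threshold; NaN entries mark discarded detections."""
    e = np.asarray(errors_px, dtype=float)
    e = e[~np.isnan(e)]  # keep only landmarks that were actually detected
    return 100.0 * np.mean(e < threshold)
```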
In Table 1, a more detailed analysis of the error across the different landmarks
and the view position is presented. For each landmark, the average maximum be-
lief, the average error across all projections, as well as the error across quadrants is
shown. For the latter, the spherical segment is subdivided into four areas, centered
at the AP position with two perpendicular divisions across the CRAN/CAUD and
RAO/LAO axis. This reveals three interesting observations: first, some landmarks
have an overall lower error (e.g. landmark #9 with an average error of 5.26 pixels),
while others are detected poorly (e.g. landmark #23 with an average error of
26.05 pixels). Second, there exists a correlation between the average maximum be-
lief response and the average error: the higher the response in a belief map, the
lower the error. This observation is also supported by the scatter plot presented
in Figure 5, where for each prediction the detection error is plotted against the corre-
sponding maximum belief map response. Third, some landmarks can be detected
equally well independently of the viewing direction (e.g. landmark #11), while for
others, the detectability varies strongly across the quadrants (e.g. landmark #19).
This observation is graphically well visible for these two landmarks, as shown in
Figure 6.

Fig. 5 The error of each landmark detection plotted against the belief of the corresponding
detection. A correlation can be observed: higher beliefs indicate lower detection errors.

Fig. 6 Maximum belief depending on the viewing direction for two landmarks (#11 and #19).
While landmark #11 (left) is equally well visible across views, the belief for landmark #19
(right) changes substantially across views.
We further investigated how the belief map response develops over the stages of
the network and how ambiguities in the early stages are resolved. In Figure 7, two
example projections are shown, overlain by their corresponding belief maps at the
respective stage. In the first row, the landmark of interest (tip of the right femoral
head) is outside the field of view. However, a false positive response appears after
the first stage due to the similar appearance of the anatomy. With further stages,
this ambiguity gets resolved. In the second row, a similar behavior is visible and a
refinement of the prediction accuracy is clearly observable. The development of a
landmark belief is also shown in Figure 8. Here, the detection accuracy is plotted
over the error distance tolerance for the belief maps at certain stages. Identical to
above, a landmark is considered detected if the error to its ground truth location
is smaller than the distance threshold. It can be well observed that the detection
results are refined with increasing stages.

Fig. 7 Initial, intermediate, and final belief maps predicted by the model (after stages 1, 3,
and 6) for two example projections. The detection task in both cases is to detect the tip of the
right femur. False positive responses due to ambiguities in the local image information are
resolved over the stages.

Fig. 8 Detection accuracy depending on the distance threshold, shown for the belief maps
after each of the six stages.
3.2 Clinical Data
For the evaluation of the view-invariant landmark detection on real X-ray image
data, five cadaveric data sets have been processed, each set consisting of a pre-
operative CT scan and intra-operative X-ray sequences, taken from arbitrary and
unknown viewing angles. In order to enable the retrieval of X-ray geometry, metal
beads (BBs) were injected into the pelvis before imaging. To retrieve the X-ray
projection geometry, first BB correspondences are established between individual
images of the intra-operative X-ray sequence. Then, the fundamental matrix is
computed for each image pair allowing for the 3D reconstruction of the BB po-
sitions [9]. This 3D reconstruction was then registered to the 3D BB locations
extracted from the CT volume, allowing for an exact registration of each BB in
2D space to its corresponding location in 3D space. With these correspondences
established, the projection matrix for each X-ray image was then calculated in
closed form as in Equation 2. To evaluate the reprojection error of these
projection matrices (which defines a lower bound on the accuracy achievable using
anatomical landmarks), the 3D BB coordinates as per the CT scan are forward
projected into 2D space. Table 2 shows the reprojection error between the forward
projection and the real X-ray images for the X-ray sequences. Note that one of
these sequences has a tool in the field of view (sequence #3), while another shows
a fracture of the pelvis (sequence #2). The low reprojection error of 2–5 px is
in line with our expectations, and suggests that the resulting projection matrices
are an appropriate reference when evaluating the performance of our proposed
view-invariant landmark detection.
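For reference, the reprojection errors reported in Table 2 compare forward-projected 3D bead positions against their 2D detections. A sketch of this metric, assuming a projection matrix P as in Eq. (2) and matched point arrays:

```python
import numpy as np


def reprojection_error(P, points_3d, points_2d):
    """Mean 2D distance in pixels between forward-projected 3D bead
    positions and their detected 2D locations."""
    X = np.hstack([points_3d, np.ones((len(points_3d), 1))])  # homogeneous
    proj = (P @ X.T).T
    proj = proj[:, :2] / proj[:, 2:3]  # dehomogenize
    return np.linalg.norm(proj - points_2d, axis=1).mean()
```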
In the top row of Figure 9, example landmark detection results in X-ray images
of these sequences are shown. Automatic detections and ground truth locations
are indicated with red and blue crosses, respectively. Overall, the good agreement
between the automatic detections and the ground truth positions can be appreci-
ated for various poses and truncations. In complicated situations when tools are
present in the image, the landmark detection approach fails to detect the sur-
rounding landmarks, as can be seen in example 4. However, this is not surprising
since such situations were not part of the training data set. Small unknown ob-
jects, such as the metallic beads on the bone, seem to only have a limited influence
on performance. Furthermore, example 5 depicts an image of the sequence where
a fracture is present in the X-ray image, indicated with the white arrow. Qualita-
tively, this did not influence the detection substantially. Quantitatively, the overall
deviation between true and predicted landmark locations, averaged over the total
of 106 real images, is shown in Table 2 in the column titled Landmark Error.
The bottom row in Figure 9 shows digitally reconstructed radiographs (DRRs)
of the CT volumes belonging to the real X-ray image of the same patient shown
above. The geometry for generating the DRRs has been obtained by closed-form
2D/3D registration of the detected landmarks to the 3D labels in the CT volume, as
described in Section 2.4. For these various poses, the landmark detection accuracy
proves sufficient to achieve views that are very similar to the target X-ray image,
suggesting successful initialization.
4 Discussion and Conclusion
We presented a novel approach to detect anatomical landmarks in X-ray images
independent of the viewing direction. The landmark locations supply additional
information for the surgeon and enable various applications, including global ini-
tialization of 2D/3D registration in closed form. Due to the characteristics of
transmission imaging, landmark appearances change substantially with the view-
ing direction making anatomical landmark detection a challenging task that has,
to the best of our knowledge, not previously been addressed.
Fig. 9 (Top) Detection results on clinical X-ray images. Landmark detections using the pro-
posed approach are marked with a red cross, ground truth positions with a blue cross. (Bottom)
Forward projections of the corresponding CT volume using the projection matrices computed
by 2D/3D registration between the 2D landmark detections and the 3D labels in the CT
volumes.
Table 2 Quantitative evaluation for the detection results on the X-ray images of the cadaver
specimens. Reprojection Errors (RPE) given in pixels. ref RPE is the error of the reference
pose estimated from the metallic beads in order to project the 3D labels into the 2D X-
ray images. Landmark RPE is the RPE of the metallic markers, using poses estimated with
automatic anatomical landmark detections. Landmark Error is the distance of the detections
to the ground truth positions. With a pixel size of 0.193 mm/px, the metric error on the
detector is also given in mm.
Sequence ref RPE Landmark RPE Landmark Error
#1: specimen 1 2.45 74.31 120.9 (23.33 mm)
#2: specimen 1, with fracture 5.46 173.6 97.82 (18.87 mm)
#3: specimen 1, with tool 2.88 177.4 63.67 (12.28 mm)
#4: specimen 2 2.86 119.4 127.9 (24.68 mm)
#5: specimen 2 2.99 115.3 79.89 (15.41 mm)
We employed a convolutional neural network that consists of multiple stages.
Given a single input image, multiple belief maps are inferred and refined based on
local image information and belief maps of the preceding stage, finally indicating
the respective landmark location. The network was trained on synthetic data gen-
erated using a physics-based framework and evaluated on both synthetic and real
test sets, revealing promising performance.
Despite encouraging results, some limitations remain that we discuss in the
following paragraph, pointing to possible future research directions to overcome
or alleviate these. First of all, the robustness towards unseen scenarios, such as
tools in the image or changes of the anatomy due to fractured anatomy must be
improved. This issue could be addressed with a larger data set that contains such
variation. Also, the accuracy from views of the border of the spherical segment is
slightly inferior compared to frontal views. This might be explained by the higher
amount of overlapping anatomy from these directions as well as a lower amount of
variation of the training data sampled in this area. A possible solution could be to
increase the angular range during training, while limiting validation to the current
range. Further, the network architecture in its current state yields belief maps that
are downsampled by a factor of around eight compared to the input image. This
downsampling inherently limits the accuracy of the detection results. While this
accuracy may have been sufficient for the initial purpose of human pose estimation,
in medical imaging, higher accuracy is desirable. Possible improvements that are
subject to future work could be achieved by competing network architectures based
on an encoder-decoder design with skip connections in order to preserve resolu-
tion of the output images. Alternatively, test-time augmentation could be applied
by processing slightly altered versions of the input image with the same network
during application. The results of these multiple outputs could subsequently be
averaged, which might yield higher accuracy. Furthermore, the robustness as well
as the overall accuracy could benefit from prior knowledge provided in the form of a
model-based post-processing step. A possible source of error might be introduced
by the labeling of the landmarks in the 3D volume, which, being manual, is inherently
prone to errors. Ideally, an unsupervised landmark or keypoint selection process
would be of great benefit for this approach. As a possible application, we showed
that an initialization of 2D/3D registration based on the automatic detections is
successful without the need for additional calibration. In this work, we relied on
a closed form solution to estimate the image pose which is compelling due to its
simplicity, yet, a more sophisticated approach based on maximum likelihood would
certainly yield superior results in presence of statistical outliers. In this task we
also showed that considering the maximum belief is powerful for selecting reliably
detected landmarks. This additional information can be used as a confidence mea-
sure for further processing tasks. Recently, the proposed concept of view-invariant
anatomical landmark detection has been transferred to projection images of knees
in an attempt to estimate involuntary motion during scans [3].
In conclusion, detecting anatomical landmarks has grown to be an essential tool
in automatic image parsing in diagnostic imaging, suggesting similar importance
for image-guided interventions. The implementation of anatomical landmarks as
a powerful concept for aiding image-guided interventions will be pushed continu-
ously as new approaches, such as this one, strive to achieve clinically acceptable
performance.
Acknowledgements We gratefully acknowledge the support of NIH/NIBIB R01 EB023939,
R21 EB020113, R01 EB016703, R01 EB0223939, and the NVIDIA Corporation with the do-
nation of the GPUs used for this research. Further, the authors acknowledge funding support
from NIH 5R01AR065248-03.
Conflict of Interest The authors declare that they have no conflict of interest.
References
1. Aichert, A., Berger, M., Wang, J., Maass, N., Doerfler, A., Hornegger, J., Maier, A.K.:
Epipolar consistency in transmission imaging. IEEE Trans. Med. Imag. 34(11), 2205–2219
(2015)
2. Baumgartner, R., Libuit, K., Ren, D., Bakr, O., Singh, N., Kandemir, U., Marmor, M.T.,
Morshed, S.: Reduction of radiation exposure from c-arm fluoroscopy during orthopaedic
trauma operations with introduction of real-time dosimetry. Journal of Orthopaedic
Trauma 30(2), e53–e58 (2016)
3. Bier, B., Aschoff, K., Syben, C., Unberath, M., Levenston, M., Gold, G., Fahrig, R., Maier,
A.: Detecting anatomical landmarks for motion estimation in weight-bearing imaging of
knees. In: International Workshop on Machine Learning for Medical Image Reconstruction,
pp. 83–90. Springer (2018)
4. Bier, B., Unberath, M., Zaech, J.N., Fotouhi, J., Armand, M., Osgood, G., Navab, N.,
Maier, A.: X-ray-transform invariant anatomical landmark detection for pelvic trauma
surgery. In: International Conference on Medical Image Computing and Computer-
Assisted Intervention, pp. 55–63. Springer (2018)
5. Fedorov, A., Beichel, R., Kalpathy-Cramer, J., Finet, J., Fillion-Robin, J.C., Pujol, S.,
Bauer, C., Jennings, D., Fennessy, F., Sonka, M., Buatti, J., Aylward, S., Miller, J., Pieper,
S., Kikinis, R.: 3d slicer as an image computing platform for the quantitative imaging
network. Magnetic resonance imaging 30(9), 1323–1341 (2012)
6. Ghesu, F.C., Georgescu, B., Mansi, T., Neumann, D., Hornegger, J., Comaniciu, D.: An
artificial agent for anatomical landmark detection in medical images. In: MICCAI, pp.
229–237. Springer (2016)
7. Ghesu, F.C., Georgescu, B., Zheng, Y., Grbic, S., Maier, A., Hornegger, J., Comaniciu, D.:
Multi-scale deep reinforcement learning for real-time 3d-landmark detection in ct scans.
IEEE Transactions on Pattern Analysis and Machine Intelligence (2017)
8. Härtl, R., Lam, K.S., Wang, J., Korge, A., Kandziora, F., Audigé, L.: Worldwide survey on the use
of navigation in spine surgery. World Neurosurg 79(1), 162–172 (2013)
9. Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge
University Press, ISBN: 0521540518 (2004)
10. Heimann, T., Meinzer, H.P.: Statistical shape models for 3d medical image segmentation:
a review. Medical image analysis 13(4), 543–563 (2009)
11. Hou, B., Alansary, A., McDonagh, S., Davidson, A., Rutherford, M., Hajnal, J.V., Rueck-
ert, D., Glocker, B., Kainz, B.: Predicting slice-to-volume transformation in presence of
arbitrary subject motion. In: MICCAI, pp. 296–304. Springer (2017)
12. Johnson, H.J., Christensen, G.E.: Consistent landmark and intensity-based image regis-
tration. IEEE transactions on medical imaging 21(5), 450–461 (2002)
13. Khurana, B., Sheehan, S.E., Sodickson, A.D., Weaver, M.J.: Pelvic ring fractures: what
the orthopedic surgeon wants to know. Radiographics 34(5), 1317–1333 (2014)
14. Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., van der
Laak, J.A., Van Ginneken, B., Sánchez, C.I.: A survey on deep learning in medical image
analysis. Medical image analysis 42, 60–88 (2017)
15. Liu, D., Zhou, K.S., Bernhardt, D., Comaniciu, D.: Search strategies for multiple landmark
detection by submodular maximization. In: Computer Vision and Pattern Recognition
(CVPR), 2010 IEEE Conference on, pp. 2831–2838. IEEE (2010)
16. Pouch, A.M., Yushkevich, P.A., Jackson, B.M., Jassar, A.S., Vergnat, M., Gorman, J.H.,
Gorman, R.C., Sehgal, C.M.: Development of a semi-automated method for mitral valve
modeling with medial axis representation using 3d ultrasound. Medical physics 39(2),
933–950 (2012)
17. Mader, A.O., von Berg, J., Fabritz, A., Lorenz, C., Meyer, C.: Localization and labeling
of posterior ribs in chest radiographs using a crf-regularized fcn with local refinement. In:
International Conference on Medical Image Computing and Computer-Assisted Interven-
tion, pp. 562–570. Springer (2018)
18. Markelj, P., Tomaževič, D., Likar, B., Pernuš, F.: A review of 3d/2d registration methods
for image-guided interventions. Med Image Anal 16(3), 642–661 (2012)
19. Payer, C., Štern, D., Bischof, H., Urschler, M.: Regressing heatmaps for multiple landmark
localization using cnns. In: International Conference on Medical Image Computing and
Computer-Assisted Intervention, pp. 230–238. Springer (2016)
20. Roth, H., Lu, L., Seff, A., Cherry, K.M., Hoffman, J., Wang, S., Summers, R.M.: A new
2.5d representation for lymph node detection in ct. The Cancer Imaging Archive (2015)
21. Sa, R., Owens, W., Wiegand, R., Studin, M., Capoferri, D., Barooha, K., Greaux, A.,
Rattray, R., Hutton, A., Cintineo, J., Chaudhary, V.: Intervertebral disc detection in x-
ray images using faster r-cnn. In: Engineering in Medicine and Biology Society (EMBC),
2017 39th Annual International Conference of the IEEE, pp. 564–567. IEEE (2017)
22. Starr, R., Jones, A., Reinert, C., Borer, D.: Preliminary results and complications follow-
ing limited open reduction and percutaneous screw fixation of displaced fractures of the
acetabulum. Injury 32, SA45–50 (2001)
23. Štern, D., Ebner, T., Urschler, M.: From local to global random regression forests: ex-
ploring anatomical landmark localization. In: International Conference on Medical Image
Computing and Computer-Assisted Intervention, pp. 221–229. Springer (2016)
24. Stöckle, U., Schaser, K., König, B.: Image guidance in pelvic and acetabular surgery –
expectations, success and limitations. Injury 38(4), 450–462 (2007)
25. Tucker, E., Fotouhi, J., Lee, S., Unberath, M., Fuerst, B., Johnson, A., Armand, M.,
Osgood, G., Navab, N.: Towards clinical translation of augmented orthopedic surgery:
from pre-op ct to intra-op x-ray via rgbd sensing. In: SPIE Medical Imaging (2018)
26. Unberath, M., Zaech, J.N., Lee, S.C., Bier, B., Fotouhi, J., Armand, M., Navab, N.:
DeepDRR – a catalyst for machine learning in fluoroscopy-guided procedures. In: Inter-
national Conference on Medical Image Computing and Computer-Assisted Intervention.
Springer (2018)
27. Urschler, M., Ebner, T., Štern, D.: Integrating geometric configuration and appearance
information into a unified framework for anatomical landmark localization. Medical image
analysis 43, 23–36 (2018)
28. Wang, C.W., Huang, C.T., Hsieh, M.C., Li, C.H., Chang, S.W., Li, W.C., Vandaele, R.,
Marée, R., Jodogne, S., Geurts, P., Chen, C., Zhen, G., Chu, C., Mirzaalian, H., Vrtovec,
T., Ibragimov, B.: Evaluation and comparison of anatomical landmark detection methods
for cephalometric x-ray images: A grand challenge. IEEE transactions on medical imaging
34(9), 1890–1900 (2015)
29. Wei, S.E., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. In:
CVPR, pp. 4724–4732 (2016)
30. Xie, W., Franke, J., Chen, C., Grützner, P.A., Schumann, S., Nolte, L.P., Zheng, G.:
A complete-pelvis segmentation framework for image-free total hip arthroplasty (tha):
methodology and clinical study. The International Journal of Medical Robotics and Com-
puter Assisted Surgery 11(2), 166–180 (2015)
31. Zheng, Y., Barbu, A., Georgescu, B., Scheuering, M., Comaniciu, D.: Four-chamber heart
modeling and automatic segmentation for 3-d cardiac ct volumes using marginal space
learning and steerable features. IEEE transactions on medical imaging 27(11), 1668–1681
(2008)
... Bier et al. [16] employed a two-stage pipeline that estimated heat maps to detect landmarks in hip X-rays. X-rays passed through a series of convolutional layers to yield a first set of heat maps. ...
... These heat maps worked as input for a second CNN that generated a second set of heat map estimations. However, contrary to Bier et al. [16], both heat maps were multiplied to produce the final landmark estimations instead of going through more layers. ...
Article
Full-text available
In medical imaging, automated landmark detection estimates the position of anatomical points in images to derive measurements. Previous approaches commonly employ coordinate regression. Landmark segmentation, a technique in which masks centered at the target point are segmented, has recently shown promising results. Here, we present segmentation-guided coordinate regression, a methodology that fuses both approaches and balances accuracy and robustness. Our approach identifies masks centered at landmarks using a segmentation network. Then, a coordinate regression network estimates the coordinates by employing the input image and the segmentation output.We assessed the methodology's performance by detecting eight landmarks in full lower limb X-rays and investigated the impact of weight initialization, network backbone, and optimization of the loss function. The approach was contrasted with landmark segmentation and coordinate regression and applied to the analysis of lower limb malalignment. Results showed that deeper pretrained models with a weight of 0.2 at the segmentation loss detected landmarks more accurately. Segmentation-guided regression outperformed coordinate regression. Landmark segmentation was hampered by undetected landmarks and false positives. Due to its architecture, the proposed method did not suffer from failed detections, allowing lower limb malalignment to be reliably calculated. With respect to comparable literature, our approach leads to similar or improved results for landmark detection, translating to highly accurate and reliable lower limb malalignment analysis. In conclusion, we proposed a novel method for detecting landmarks in X-rays, which leads to a balance in accuracy and robustness and allows the measurement of lower limb malalignment.
... Since minimally invasive surgeries are related to a lower risk of infection, less exsanguination, and a complete decrease in disorder, they are helpful for the patient. However, they increase the workload of a surgeon [6]. Also, they may be less preferred due to radiation exposure, time limitation, and lack of trained and informed personnel [7]- [12]. ...
... In the domain of computer vision, the problem of landmark annotation has been extensively studied for various types of medical images. [5][6][7] However, employing computer vision perception often faces challenges due to similar pixel intensity distributions in local structures, leading to difficulties in precisely locating landmark coordinates with localization errors. [8][9][10][11] To address this issue, machine learning (ML) approaches have been explored over the years for landmark localization. ...
Article
Objectives The objectives of this study are to explore and evaluate the automation of anatomical landmark localization in cephalometric images using machine learning techniques, with a focus on feature extraction and combinations, contextual analysis, and model interpretability through Shapley Additive exPlanations (SHAP) values. Methods We conducted extensive experimentation on a private dataset of 300 lateral cephalograms to thoroughly study the annotation results obtained using pixel feature descriptors including raw pixel, gradient magnitude, gradient direction, and histogram-oriented gradient (HOG) values. The study includes evaluation and comparison of these feature descriptions calculated at different contexts namely local, pyramid, and global. The feature descriptor obtained using individual combinations is used to discern between landmark and nonlandmark pixels using classification method. Additionally, this study addresses the opacity of LGBM ensemble tree models across landmarks, introducing SHAP values to enhance interpretability. Results The performance of feature combinations was assessed using metrics like mean radial error, standard deviation, success detection rate (SDR) (2 mm), and test time. Remarkably, among all the combinations explored, both the HOG and gradient direction operations demonstrated significant performance across all context combinations. At the contextual level, the global texture outperformed the others, although it came with the trade-off of increased test time. The HOG in the local context emerged as the top performer with an SDR of 75.84% compared to others. Conclusions The presented analysis enhances the understanding of the significance of different features and their combinations in the realm of landmark annotation but also paves the way for further exploration of landmark-specific feature combination methods, facilitated by explainability.
Article
The integration of artificial intelligence in image-guided interventions holds transformative potential, promising to extract 3D geometric and quantitative information from conventional 2D imaging modalities during complex procedures. Achieving this requires the rapid and precise alignment of 2D intraoperative images (e.g., X-ray) with 3D preoperative volumes (e.g., CT, MRI). However, current 2D/3D registration methods fail across the broad spectrum of procedures dependent on X-ray guidance: traditional optimization techniques require custom parameter tuning for each subject, whereas neural networks trained on small datasets do not generalize to new patients or require labor-intensive manual annotations, increasing clinical burden and precluding application to new anatomical targets. To address these challenges, we present xvr, a fully automated framework for training patient-specific neural networks for 2D/3D registration. xvr uses physics-based simulation to generate abundant high-quality training data from a patient’s own preoperative volumetric imaging, thereby overcoming the inherently limited ability of supervised models to generalize to new patients and procedures. Furthermore, xvr requires only 5 min of training per patient, making it suitable for emergency interventions as well as planned procedures. We perform the largest evaluation of a 2D/3D registration algorithm on real X-ray data to date and find that xvr robustly generalizes across a diverse dataset comprising multiple anatomical structures, imaging modalities, and hospitals. Across surgical tasks, xvr achieves submillimeter-accurate registration at intraoperative speeds, improving upon existing methods by an order of magnitude. xvr is released as open-source software freely available at https://github.com/eigenvivek/xvr.
Article
A large number of vision-based medical equipment are playing an important role in the clinical process. These equipment have greatly improved the automation and precision through advanced medical image processing technology. The segmentation technology and landmark localization technology for 3D medical image are the two most significat underlying technologies. However, most of the existing methods are single-task, which is not conducive to the integration of algorithms in medical equipment. In this paper, a Dual Prior Knowledge Injection Network (DPKI-Net) is proposed for multi-task 3D medical image segmentation and landmark localization. Task Gradient Decoupling Module (TGDM) and Spatial Prior Module (SPM) are the two core ideas of the proposed method. TGDM applies the historical training process prior knowledge to the task decoupling process. It achieves better task decoupling by changing the gradient ratio of the task separation points. SPM calculates the spatial prior distribution of segmentation object and landmark object, and injects it into the subsequent single-task path to strengthen the internal features of single-task. We constructed two 3D multi-task medical image datasets for validation, both the qualitative and quantitative results show that the proposed method has good performance.
Article
Full-text available
Machine learning-based approaches outperform competing methods in most disciplines relevant to diagnostic radiology. Interventional radiology, however, has not yet benefited substantially from the advent of deep learning, in particular because of two reasons: 1) Most images acquired during the procedure are never archived and are thus not available for learning, and 2) even if they were available, annotations would be a severe challenge due to the vast amounts of data. When considering fluoroscopy-guided procedures, an interesting alternative to true interventional fluoroscopy is in silico simulation of the procedure from 3D diagnostic CT. In this case, labeling is comparably easy and potentially readily available, yet, the appropriateness of resulting synthetic data is dependent on the forward model. In this work, we propose DeepDRR, a framework for fast and realistic simulation of fluoroscopy and digital radiography from CT scans, tightly integrated with the software platforms native to deep learning. We use machine learning for material decomposition and scatter estimation in 3D and 2D, respectively, combined with analytic forward projection and noise injection to achieve the required performance. On the example of anatomical landmark detection in X-ray images of the pelvis, we demonstrate that machine learning models trained on DeepDRRs generalize to unseen clinically acquired data without the need for re-training or domain adaptation. Our results are promising and promote the establishment of machine learning in fluoroscopy-guided procedures.
Article
In approaches for automatic localization of multiple anatomical landmarks, locally similar structures returned by locally accurate candidate generation are often disambiguated solely through high-level knowledge about the geometric landmark configuration. In our novel localization approach, we propose to combine both image appearance information and geometric landmark configuration into a unified random forest framework, integrated into an optimization procedure that iteratively refines joint landmark predictions using the coordinate descent algorithm. Depending on how strongly multiple landmarks are correlated in a specific localization task, this integration has the benefit that it remains flexible in deciding whether appearance information or the geometric configuration of multiple landmarks is the stronger cue for solving a localization problem both accurately and robustly. Furthermore, no preliminary choice has to be made on how to encode a graphical model describing the landmark configuration. In an extensive evaluation on five challenging datasets involving different 2D and 3D imaging modalities, we show that our proposed method is widely applicable and delivers state-of-the-art results when compared to various other related methods.
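A stripped-down version of the coordinate-descent idea might look as follows: each landmark is updated in turn by scoring its candidates with an appearance term plus agreement with the current geometric configuration. The scoring terms and the weighting factor are illustrative stand-ins for the paper's random-forest components.

```python
import numpy as np

def refine(candidates, appearance, offsets, iters=10, w=0.01):
    """candidates[i]: (K_i, 2) positions; appearance[i]: (K_i,) scores;
    offsets[i][j]: expected displacement from landmark j to landmark i."""
    n = len(candidates)
    current = [c[int(np.argmax(a))] for c, a in zip(candidates, appearance)]
    for _ in range(iters):
        for i in range(n):        # coordinate descent: one landmark at a time
            geom = np.zeros(len(candidates[i]))
            for j in range(n):
                if j != i:        # penalize disagreement with the configuration
                    expected = current[j] + offsets[i][j]
                    geom -= np.linalg.norm(candidates[i] - expected, axis=1)
            current[i] = candidates[i][int(np.argmax(appearance[i] + w * geom))]
    return current

rng = np.random.default_rng(0)
cands = [rng.uniform(0, 100, (5, 2)) for _ in range(3)]
scores = [rng.uniform(0, 1, 5) for _ in range(3)]
offsets = [[np.zeros(2)] * 3 for _ in range(3)]  # toy: expect coincident points
print(refine(cands, scores, offsets))
```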
Article
Deep learning algorithms, in particular convolutional networks, have rapidly become a methodology of choice for analyzing medical images. This paper reviews the major deep learning concepts pertinent to medical image analysis and summarizes over 300 contributions to the field, most of which appeared in the last year. We survey the use of deep learning for image classification, object detection, segmentation, registration, and other tasks and provide concise overviews of studies per application area. Open challenges and directions for future research are discussed.
Chapter
Localization and labeling of posterior ribs in radiographs is an important task and a prerequisite for, e.g., quality assessment, image registration, and automated diagnosis. In this paper, we propose an automatic, general approach for localizing spatially correlated landmarks using a fully convolutional network (FCN) regularized by a conditional random field (CRF) and apply it to rib localization. A reduced CRF state space in form of localization hypotheses (generated by the FCN) is used to make CRF inference feasible, potentially missing correct locations. Thus, we propose a second CRF inference step searching for additional locations. To this end, we introduce a novel “refine” label in the first inference step. For “refine”-labeled nodes, small subgraphs are extracted and a second inference is performed on all image pixels. The approach is thoroughly evaluated on 642 images of the public Indiana chest X-ray collection, achieving a landmark localization rate of 94.6%.
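Since posterior ribs form a roughly chain-like sequence, exact inference over per-rib candidate hypotheses can be illustrated with dynamic programming on a chain. The sketch below uses toy unary and pairwise terms and omits the paper's "refine" label and second inference step.

```python
import numpy as np

def chain_viterbi(unary, candidates, pairwise):
    """unary[t]: (K_t,) candidate scores; candidates[t]: (K_t, 2) positions;
    pairwise(p, q): compatibility of consecutive rib locations p -> q."""
    T = len(unary)
    score, back = [unary[0]], []
    for t in range(1, T):
        pw = np.array([[pairwise(p, q) for p in candidates[t - 1]]
                       for q in candidates[t]])          # (K_t, K_{t-1})
        total = score[-1][None, :] + pw                  # best predecessor
        back.append(total.argmax(axis=1))
        score.append(unary[t] + total.max(axis=1))
    idx = [int(score[-1].argmax())]                      # backtrack best path
    for t in range(T - 2, -1, -1):
        idx.append(int(back[t][idx[-1]]))
    return idx[::-1]

ribs = [np.array([[0, 10], [0, 50]]), np.array([[0, 22], [0, 80]])]
scores = [np.array([1.0, 0.2]), np.array([0.5, 0.6])]
smooth = lambda p, q: -abs((q[1] - p[1]) - 12)  # prefer ~12 px rib spacing
print(chain_viterbi(scores, ribs, smooth))      # chosen candidate per rib
```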
Chapter
Patient motion is one of the major challenges in cone-beam computed tomography (CBCT) scans acquired under weight-bearing conditions, since it leads to severe artifacts in reconstructions. In knee imaging, a state-of-the-art approach to compensate for patient motion uses fiducial markers attached to the skin. However, marker placement is a tedious and time-consuming procedure for both the physician and the patient. In this manuscript we investigate the use of anatomical landmarks in an attempt to replace externally attached fiducial markers. To this end, we devise a method to automatically detect anatomical landmarks in projection domain X-ray images irrespective of the viewing direction. To overcome the need for annotation of every X-ray image and to assure consistent annotation across images from the same subject, annotations and projection images are generated from 3D CT data. Twelve landmarks are annotated in supine CBCT reconstructions of the knee joint and then propagated to synthetically generated projection images. Then, a sequential convolutional neural network is trained to predict the desired landmarks in projection images. The network is evaluated on synthetic images and real clinical data. On synthetic data, promising results are achieved with a mean prediction error of 8.4 ± 8.2 pixels. The network generalizes to real clinical data without the need of re-training. However, practical issues, such as the second leg entering the field of view, limit the performance of the method at this stage. Nevertheless, our results are promising and encourage further investigations on the use of anatomical landmarks for motion management.
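Propagating 3D annotations to projection images reduces to applying the same projection matrix used for rendering to the homogeneous landmark coordinates. A minimal sketch, with a toy pinhole geometry rather than a real C-arm calibration:

```python
import numpy as np

def project_landmarks(P, points_3d):
    """P: (3, 4) projection matrix; points_3d: (N, 3) world coordinates."""
    homogeneous = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    uvw = homogeneous @ P.T            # (N, 3) homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]    # perspective division -> (N, 2) pixels

K = np.array([[1000.0, 0, 256], [0, 1000.0, 256], [0, 0, 1]])   # intrinsics
Rt = np.hstack([np.eye(3), np.array([[0.0], [0.0], [800.0]])])  # extrinsics
P = K @ Rt
print(project_landmarks(P, np.array([[10.0, -5.0, 0.0]])))
```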
Article
Robust and fast detection of anatomical structures is a prerequisite for medical image analysis. Current solutions for anatomy detection are typically based on machine learning and are subject to several limitations, including the use of suboptimal feature engineering techniques and, most importantly, computationally suboptimal search schemes. To address these issues, we propose a method that follows a new paradigm by reformulating the detection problem as a behavior learning task for an artificial agent. We couple the modeling of the anatomy appearance and the object search in a unified behavioral framework, using the capabilities of deep reinforcement learning and multi-scale image analysis. In other words, an artificial agent is trained not only to distinguish the target anatomical object from the rest of the body, but also to find the object by learning and following an optimal navigation path to the target in the imaged volumetric space. We evaluate our approach on 1487 3D-CT volumes from 532 patients and show that we significantly outperform state-of-the-art solutions on detecting several anatomical structures with no failed cases, while also improving the detection accuracy by 20-30%. Most importantly, we improve the detection speed of the reference methods by 2-3 orders of magnitude, achieving unmatched real-time performance on large 3D-CT scans.
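The state/action/reward design can be sketched in a few lines: the agent takes axis-aligned steps through the volume and is rewarded for reducing its distance to the target. The greedy rollout below only exercises these dynamics; the deep Q-network and multi-scale machinery of the paper are omitted.

```python
import numpy as np

ACTIONS = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                    [0, -1, 0], [0, 0, 1], [0, 0, -1]])

def step(position, action_idx, target):
    new_position = position + ACTIONS[action_idx]
    reward = (np.linalg.norm(position - target)
              - np.linalg.norm(new_position - target))  # distance decrease
    return new_position, reward

pos, target = np.array([0, 0, 0]), np.array([12, -3, 7])
for _ in range(30):  # greedy oracle rollout just to exercise the dynamics
    rewards = [step(pos, a, target)[1] for a in range(len(ACTIONS))]
    pos, _ = step(pos, int(np.argmax(rewards)), target)
print(pos)  # converges toward the target under the greedy policy
```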
Conference Paper
Automatic identification of specific osseous landmarks on spinal radiographs can be used to automate calculations for correcting ligament instability and injury, which affect 75% of patients injured in motor vehicle accidents. In this work, we propose to use a deep learning-based object detection method as the first step toward identifying landmark points in lateral lumbar X-ray images. The significant breakthrough of deep learning technology has made it the prevailing choice for perception-based applications; however, the lack of large annotated training datasets has brought challenges to applying the technology in the medical image processing field. In this work, we propose to fine-tune a deep detection network, Faster R-CNN, the state of the art in the natural image domain, using small annotated clinical datasets. In our experiments we show that, using only 81 lateral lumbar X-ray training images, one can achieve much better performance than traditional sliding-window detection on hand-crafted features. Furthermore, we fine-tuned the network using 974 training images and tested it on 108 images, achieving an average precision of 0.905 with an average computation time of 3 seconds per image, greatly outperforming traditional methods in terms of accuracy and efficiency.
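Using today's torchvision detection API, the fine-tuning recipe described above might be sketched as follows; the original work predates this API, and the class count, image size, and boxes here are placeholders.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Start from a COCO-pretrained detector and swap in a task-specific head.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
num_classes = 2  # background + landmark region (placeholder)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# One toy training step on a dummy X-ray-sized image.
images = [torch.rand(3, 512, 512)]
targets = [{"boxes": torch.tensor([[100.0, 100.0, 140.0, 140.0]]),
            "labels": torch.tensor([1])}]
model.train()
losses = model(images, targets)      # dict of RPN and ROI-head losses
sum(losses.values()).backward()
```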
Conference Paper
Pose Machines provide a sequential prediction framework for learning rich implicit spatial models. In this work we show a systematic design for how convolutional networks can be incorporated into the pose machine framework for learning image features and image-dependent spatial models for the task of pose estimation. The contribution of this paper is to implicitly model long-range dependencies between variables in structured prediction tasks such as articulated pose estimation. We achieve this by designing a sequential architecture composed of convolutional networks that directly operate on belief maps from previous stages, producing increasingly refined estimates for part locations, without the need for explicit graphical model-style inference. Our approach addresses the characteristic difficulty of vanishing gradients during training by providing a natural learning objective function that enforces intermediate supervision, thereby replenishing back-propagated gradients and conditioning the learning procedure. We demonstrate state-of-the-art performance and outperform competing methods on standard benchmarks including the MPII, LSP, and FLIC datasets.
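The sequential-refinement scheme with intermediate supervision can be sketched compactly: each stage consumes shared image features concatenated with the previous stage's belief maps, and every stage contributes a loss term. Layer sizes below are illustrative and far smaller than the published architecture, and the first stage is simplified to receive zero-initialized beliefs.

```python
import torch
import torch.nn as nn

class Stage(nn.Module):
    def __init__(self, feat_ch, n_parts):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_ch + n_parts, 64, 7, padding=3), nn.ReLU(),
            nn.Conv2d(64, n_parts, 1),  # refined belief maps
        )
    def forward(self, feats, beliefs):
        return self.net(torch.cat([feats, beliefs], dim=1))

n_parts = 14
feats = torch.rand(1, 32, 46, 46)             # shared image features
target = torch.rand(1, n_parts, 46, 46)       # ground-truth heatmaps
stages = nn.ModuleList(Stage(32, n_parts) for _ in range(3))
beliefs = torch.zeros(1, n_parts, 46, 46)
loss = 0.0
for stage in stages:
    beliefs = stage(feats, beliefs)
    loss = loss + nn.functional.mse_loss(beliefs, target)  # per-stage loss
loss.backward()  # intermediate supervision replenishes gradients
```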
Conference Paper
This paper aims to solve a fundamental problem in intensity-based 2D/3D registration, which concerns the limited capture range and need for very good initialization of state-of-the-art image registration methods. We propose a regression approach that learns to predict rotations and translations of arbitrary 2D image slices from 3D volumes, with respect to a learned canonical atlas coordinate system. To this end, we utilize Convolutional Neural Networks (CNNs) to learn the highly complex regression function that maps 2D image slices into their correct position and orientation in 3D space. Our approach is attractive in challenging imaging scenarios, where significant subject motion complicates reconstruction performance of 3D volumes from 2D slice data. We extensively evaluate the effectiveness of our approach quantitatively on simulated MRI brain data with extreme random motion. We further demonstrate qualitative results on fetal MRI where our method is integrated into a full reconstruction and motion compensation pipeline. With our CNN regression approach we obtain an average prediction error of 7 mm on simulated data, and convincing reconstruction quality of images of very young fetuses where previous methods fail. We further discuss applications to Computed Tomography (CT) and X-ray projections. Our approach is a general solution to the 2D/3D initialization problem. It is computationally efficient, with prediction times per slice of a few milliseconds, making it suitable for real-time scenarios.
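The regression formulation amounts to a CNN mapping a 2D slice to six rigid pose parameters in the learned atlas frame. A minimal sketch with an illustrative architecture and plain MSE loss (the paper's exact design may differ):

```python
import torch
import torch.nn as nn

class SliceToPose(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 6)  # (rx, ry, rz, tx, ty, tz)
    def forward(self, x):
        return self.head(self.backbone(x))

model = SliceToPose()
slices = torch.rand(8, 1, 128, 128)  # batch of 2D slices
gt_pose = torch.randn(8, 6)          # poses in the canonical atlas frame
loss = nn.functional.mse_loss(model(slices), gt_pose)
loss.backward()  # millisecond-scale inference makes this usable as an initializer
```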