From Local to Global Random
Regression Forests: Exploring Anatomical
Landmark Localization
Darko Štern1⋆, Thomas Ebner2, and Martin Urschler1,2,3
1Ludwig Boltzmann Institute for Clinical Forensic Imaging, Graz, Austria
2Institute for Computer Graphics and Vision, Graz University of Technology, Austria
3BioTechMed-Graz, Austria
Abstract. State of the art anatomical landmark localization algorithms
pair local Random Forest (RF) detection with disambiguation of locally
similar structures by including high level knowledge about relative land-
mark locations. In this work we pursue the question, how much high-level
knowledge is needed in addition to a single landmark localization RF to
implicitly model the global configuration of multiple, potentially ambigu-
ous landmarks. We further propose a novel RF localization algorithm
that distinguishes locally similar structures by automatically identifying
them, exploring the back-projection of the response from accurate local
RF predictions. In our experiments we show that this approach achieves
competitive results in single and multi-landmark localization when ap-
plied to 2D hand radiographic and 3D teeth MRI data sets. Additionally,
when combined with a simple Markov Random Field model, we are able
to outperform state of the art methods.
1 Introduction
Automatic localization of anatomical structures consisting of potentially ambigu-
ous (i.e. locally similar) landmarks is a crucial step in medical image analysis
applications like registration or segmentation. Lindner et al. [5] propose a state
of the art localization algorithm, which is composed of a sophisticated statistical
shape model (SSM) that locally detects landmark candidates by a three-step optimization over a random forest (RF) response function. Similarly, Donner et al. [2]
use locally restricted classification RFs to generate landmark candidates, fol-
lowed by a Markov Random Field (MRF) optimizing their configuration. Thus,
in both approaches good RF localization accuracy is paired with disambiguation
of landmarks by including high-level knowledge about their relative location. A
different concept for localizing anatomical structures is from Criminisi et al. [1],
suggesting that the RF framework itself is able to learn global structure configu-
ration. This was achieved with random regression forests (RRF) using arbitrary
⋆ This work was supported by the province of Styria (HTI:Tech for Med ABT08-22-T-7/2013-13) and the Austrian Science Fund (FWF): P 28078-N33.
Fig. 1. Overview of our RRF based localization strategy. (a) 37 anatomical landmarks
in 2D hand X-ray images and differently colored MRF configurations. (b) In phase
1, RRF is trained locally on an area surrounding a landmark (radius R) with short
range features, resulting in accurate but ambiguous landmark predictions (c). (d) Back-
projection is applied to select pixels for training the RRF in phase 2 with larger feature
range (e). (f) Landmarks are estimated by accumulating the predictions of pixels in a local neighbourhood. (g,h) One of two independently predicted wisdom teeth from 3D MRI.
long range features and allowing pixels from all over the training image to glob-
ally vote for anatomical structures. Although roughly capturing global structure
configuration, their long range voting is inaccurate when pose variations are
present, which led to extending this concept with a graphical model [4]. Ebner
et al. [3] adapted the work of [1] for multiple landmark localization without the
need for an additional model and improved it by introducing a weighting of vot-
ing range at testing time and by adding a second RRF stage restricted to the
local area estimated by the global RRF. Despite putting more trust into the surroundings of a landmark, their results crucially depend on empirically tuned parameters defining the restricted area according to the first stage estimate.
In this work we pursue the question of how much high-level knowledge is needed
in addition to a single landmark localization RRF to implicitly model the global
configuration of multiple, potentially ambiguous landmarks [6]. Investigating dif-
ferent RRF architectures, we propose a novel single landmark localization RRF
algorithm, robust to ambiguous, locally similar structures. When extended with
a simple MRF model, our RRF outperforms the current state of the art method
of Lindner et al. [5] on a challenging multi-landmark 2D hand radiograph data set, while at the same time performing best in localizing single wisdom teeth landmarks from 3D head MRI.
2 Method
Although constrained by all surrounding objects, the location of an anatomical landmark is most accurately defined by its neighboring structures. While
increasing the feature range leads to more surrounding objects being seen for
defining a landmark, enlarging the area from which training pixels are drawn
leads to the surrounding objects being able to participate in voting for a land-
mark location. We explore these observations and investigate the influence of different feature and voting ranges by proposing several RRF strategies for single landmark localization. Following the ideas of Lindner et al. [5] and Donner et
al. [2], in the first phase of the proposed RRF architectures, the local surround-
ings of a landmark are accurately defined. The second RRF phase establishes
different algorithm variants by exploring distinct feature and voting ranges to
discriminate ambiguous, locally similar structures. In order to maintain the ac-
curacy achieved during the first RRF phase, locations outside of a landmark’s
local vicinity are recognized and banned from estimating the landmark location.
2.1 Training the RRF
We independently train an RRF for each anatomical landmark. Similar to [1,3], at each node of the T trees of a forest, the set of pixels S_n reaching node n is pushed to the left (S_{n,L}) or right (S_{n,R}) child node according to the splitting decision made by thresholding a feature response for each pixel. Feature responses are calculated as differences between the mean image intensities of two rectangles with maximal size s and maximal offset o relative to a pixel position v_i, i ∈ S_n. Each node stores a feature and threshold selected from a pool of N_F randomly generated features and N_T thresholds, maximizing the objective function I:
I = \sum_{i \in S_n} \left\lVert d_i - \bar{d}(S_n) \right\rVert^2 - \sum_{c \in \{L,R\}} \sum_{i \in S_{n,c}} \left\lVert d_i - \bar{d}(S_{n,c}) \right\rVert^2 \qquad (1)
For a pixel set S, d_i is the i-th voting vector, defined as the vector between the landmark position l and the pixel position v_i, while d̄(S) is the mean voting vector of the pixels in S. For later testing, we store at each leaf node l the mean value d_l of the relative voting vectors of all pixels reaching l.
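The greedy node training step described above can be sketched as follows. This is a minimal illustration of the rectangle-difference feature response and of the variance-reduction objective of Eq. (1), not the authors' implementation; the function names, the summed-area-table layout, and the rectangle parameterization are our own assumptions.

```python
import numpy as np

def haar_like_response(integral, pos, r1, r2):
    """Difference of mean intensities of two rectangles offset from `pos`.
    `integral` is a summed-area table with a zero top row/left column;
    r1, r2 are (dy, dx, height, width) rectangle offsets."""
    def rect_mean(r):
        y, x, h, w = pos[0] + r[0], pos[1] + r[1], r[2], r[3]
        s = (integral[y + h, x + w] - integral[y, x + w]
             - integral[y + h, x] + integral[y, x])
        return s / (h * w)
    return rect_mean(r1) - rect_mean(r2)

def split_objective(d, left_mask):
    """Eq. (1): reduction in the sum of squared deviations of the voting
    vectors d (N x dim) when splitting them into left/right child sets."""
    def ssd(v):
        return 0.0 if len(v) == 0 else np.sum((v - v.mean(axis=0)) ** 2)
    return ssd(d) - ssd(d[left_mask]) - ssd(d[~left_mask])
```

A split that separates two clusters of voting vectors yields a large objective value, so the greedy search over N_F features and N_T thresholds keeps the candidate with the highest such reduction.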
First training phase: Based on a set of pixels S_I, selected from the training images at locations inside a circle of radius R centered at the landmark position, the RRF is first trained locally with features whose rectangles have maximal size in each direction s_I and maximal offset o_I, see Fig. 1b. Training of this phase is finished when a maximal depth D_I is reached.
Second training phase: Here, our novel algorithm variants are designed by implementing different strategies for dealing with feature ranges and the selection of the area from which pixels are drawn during training. By pursuing the same local strategy as in the first phase for continuing training of the trees up to a maximal depth D_II, we establish the localRRF, similar to the RF part in [5,2]. If we continue training to depth D_II with a restriction to the pixels S_I but additionally allow long range features with maximal offset o_II > o_I and maximal size s_II > s_I, we get the fAdaptRRF. Another way of introducing long range features, while still keeping the same set of pixels S_I, was proposed for segmentation by Peter et al. [7]. They optimize for each forest node the feature size and offset instead
of the traditional greedy RF node training strategy. For later comparison, we have adapted the strategy from [7] to our localization task by training trees from the root node to a maximal depth D_II using this optimization. We denote it as PeterRRF. Finally, we propose two strategies in which both the feature range and the area from which pixels are selected are increased in the second training phase. By continuing training to depth D_II, allowing large scale features (o_II, s_II) in the second phase and simultaneously extending the training pixels (set of pixels S_II) to the whole image, we get the fpAdaptRRF. Here, S_II is determined by randomly sampling from pixels uniformly distributed in the image. The second strategy uses a different set of pixels S_II, selected according to back-projection images computed from the first training phase. This concept is a main contribution of our work, therefore the next paragraph describes it in more detail.
2.2 Pixel Selection by Back-projection Images
In the second training phase, pixels S_II from locally similar structures are explicitly introduced, since they provide information that may help in disambiguation.
We automatically identify similar structures by applying the RRF from the first
phase on all training images in a testing step as described in Section 2.3. Thus,
pixels from the area surrounding the landmark as well as pixels with a locally similar appearance to the landmark end up in the first phase RRF's terminal nodes, since the newly introduced pixels are pushed through the first phase trees. The obtained accumulators show a high response on structures whose appearance is similar to the landmark's local appearance (see Fig. 1c).
To identify pixels voting for a high response, we calculate for each accumulator a back-projection image (see Fig. 1d), obtained by summing for each pixel v all accumulator values at the target voting positions v + d_l over all trees. We finalize our backProjRRF strategy by selecting for each tree the training pixels S_II as N_px pixels randomly sampled with probability proportional to the back-projection image (see Fig. 1e).
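The back-projection step can be sketched in a few lines. This is an illustrative 2D toy version under our own naming assumptions (the paper stores one mean vote d_l per leaf; here we pass a precomputed per-pixel, per-tree vote array), not the authors' code.

```python
import numpy as np

def back_projection(accumulator, votes):
    """accumulator: (H, W) response image of the phase-1 RRF.
    votes: (H, W, T, 2) integer voting vectors d_l per pixel and tree.
    Each pixel's score is the sum of accumulator values at its
    predicted target positions v + d_l over all T trees."""
    H, W, T, _ = votes.shape
    bp = np.zeros((H, W))
    ys, xs = np.mgrid[0:H, 0:W]
    for t in range(T):
        ty = np.clip(ys + votes[:, :, t, 0], 0, H - 1)
        tx = np.clip(xs + votes[:, :, t, 1], 0, W - 1)
        bp += accumulator[ty, tx]
    return bp

def sample_phase2_pixels(bp, n_px, rng):
    """Draw n_px pixel indices with probability proportional to the
    back-projection image, as for the backProjRRF training set S_II."""
    p = bp.ravel() / bp.sum()
    flat = rng.choice(bp.size, size=n_px, p=p)
    return np.unravel_index(flat, bp.shape)
```

Pixels that voted into strong accumulator maxima, i.e. the landmark itself and its locally similar look-alikes, receive high back-projection scores and are therefore preferentially sampled for the second training phase.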
2.3 Testing the RRF
During testing, all pixels of a previously unseen image are pushed through the
RRF. Starting at the root node, pixels are passed recursively to the left or right
child node according to the feature tests stored at the nodes until a leaf node
is reached. The estimated location of the landmark, L(v), is calculated based on the pixel's position v and the relative voting vector d_l stored in the leaf node l. However, if the length |d_l| of the voting vector is larger than the radius R, i.e. pixel v is not in the area closely surrounding the landmark, the estimated location is omitted from the accumulation of landmark location predictions. Separately for each landmark, the pixels' estimations are stored in an accumulator image.
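The testing step above can be sketched as follows; a 2D toy version under our own naming assumptions, with the per-pixel leaf votes d_l precomputed rather than obtained by tree traversal.

```python
import numpy as np

def accumulate_votes(leaf_votes, R, shape):
    """leaf_votes: (H, W, 2) voting vector d_l per pixel.
    Votes longer than R are rejected, so only pixels claiming to lie in
    the landmark's close vicinity contribute to the accumulator."""
    H, W = shape
    acc = np.zeros(shape)
    for y in range(H):
        for x in range(W):
            d = leaf_votes[y, x]
            if np.linalg.norm(d) > R:
                continue  # pixel is outside the landmark's local vicinity
            ty, tx = y + int(d[0]), x + int(d[1])
            if 0 <= ty < H and 0 <= tx < W:
                acc[ty, tx] += 1
    return acc

def localize(acc):
    """Estimated landmark position: the accumulator maximum."""
    return np.unravel_index(np.argmax(acc), acc.shape)
```

Rejecting long votes is what preserves the phase-1 accuracy: distant, merely similar structures recognize themselves via their long stored votes and abstain.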
2.4 MRF Model
For multi-landmark localization, high-level knowledge about landmark configu-
ration may be used to further improve disambiguation between locally similar
structures. An MRF selects the best candidate for each landmark according to
the RRF accumulator values and a geometric model of the relative distances between landmarks, see Fig. 1a. In the MRF model, each landmark L_i corresponds to one variable, while the candidate locations, selected as the N_c strongest maxima in the landmark's accumulator, determine the possible states of that variable. The landmark configuration is obtained by optimizing the energy function
E(L) = \sum_{i=1}^{N_L} U_i(L_i) + \sum_{\{i,j\} \in C} P_{i,j}(L_i, L_j) \qquad (2)
where the unary term U_i is set to the RRF accumulator value of candidate L_i, and the relative distances of two landmarks in the training annotations define the pairwise term P_{i,j}, modeled as normal distributions for the landmark pairs in set C.
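A toy version of Eq. (2) can be evaluated by brute force over all candidate combinations; the paper instead uses a message passing algorithm over N_c = 75 candidates per landmark, and the maximization convention and function names below are our own assumptions.

```python
import itertools
import numpy as np

def pairwise_log_prob(dist, mean, std):
    """Log of a normal density on an inter-landmark distance,
    playing the role of the pairwise term P_{i,j}."""
    return -0.5 * ((dist - mean) / std) ** 2 - np.log(std * np.sqrt(2 * np.pi))

def best_configuration(candidates, unaries, edges):
    """candidates: list of (Nc, 2) candidate position arrays per landmark.
    unaries: list of (Nc,) accumulator values (unary terms U_i).
    edges: dict {(i, j): (mean_dist, std_dist)} for landmark pairs in C.
    Returns the label tuple maximizing the energy of Eq. (2)."""
    best, best_E = None, -np.inf
    for labels in itertools.product(*[range(len(u)) for u in unaries]):
        E = sum(unaries[i][l] for i, l in enumerate(labels))
        for (i, j), (m, s) in edges.items():
            d = np.linalg.norm(candidates[i][labels[i]] - candidates[j][labels[j]])
            E += pairwise_log_prob(d, m, s)
        if E > best_E:
            best, best_E = labels, E
    return best
```

Even a candidate with a slightly weaker accumulator response wins if its distance to the neighboring landmarks matches the learned geometric model, which is exactly how the MRF disambiguates locally similar structures.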
3 Experimental Setup and Results
We evaluate the performance of our landmark localization RRF variants on data
sets of 2D hand X-ray images and 3D MR images of human teeth. As evaluation
measure, we use the Euclidean distance between ground truth and estimated
landmark position. To measure reliability, the number of outliers, defined as localization errors larger than 10 mm for hand landmarks and 7 mm for teeth, is calculated. For both data sets, which were normalized in intensity by histogram matching, we perform a three-fold cross-validation, splitting the data into 66% training and 33% testing data.
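The evaluation measure reduces to a few lines; a sketch under the stated protocol, with function names of our own choosing.

```python
import numpy as np

def localization_errors(pred, gt):
    """Euclidean distance between predicted and ground-truth landmark
    positions; pred, gt are (N, dim) arrays in mm."""
    return np.linalg.norm(pred - gt, axis=1)

def outlier_rate(errors, threshold_mm):
    """Fraction of localization errors above the dataset-specific
    threshold (10 mm for hand landmarks, 7 mm for teeth)."""
    return np.mean(errors > threshold_mm)
```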
Hand Dataset consists of 895 2D X-ray hand images publicly available from the Digital Hand Atlas Database1. Since the images lack a physical pixel resolution, we assume a wrist width of 50 mm, resample the images to a height of 1250 pixels, and normalize image distances according to the wrist width as defined by the ground-truth annotation of two landmarks (see Fig. 1a). For evaluation, N_L = 37 landmarks, many of them showing locally similar structures, e.g. finger tips or joints between the bones, were manually annotated by three experts.
Teeth Dataset consists of 280 3D proton density weighted MR images of the left or right side of the head. In the latter case, images were mirrored to create a consistent data set of images with 208 x 256 x 30 voxels and a physical resolution
of 0.59 x 0.59 x 1 mm per voxel. A dentist annotated each data set by specifying the center locations of the two wisdom teeth. Localization of wisdom teeth is
challenging due to the presence of other locally similar molars (see Fig. 1g).
Experimental setup: For each method described in Section 2, an RRF consisting of T = 7 trees is built separately for every landmark. The first RRF phase is trained using pixels from the training images within a range of R = 10 mm around each landmark position. The splitting criterion for each node is greedily optimized with N_F = 20 candidate features and N_T = 10 candidate thresholds, except for PeterRRF. The random feature rectangles are defined by a maximal
1 Available from http://www.ipilab.org/BAAweb/, as of Jan. 2016
[Fig. 2 plots the cumulative distribution (0.80-1.00) of the localization error in mm for the hand data set (0-15 mm) and the teeth data set (0-20 mm), comparing CriminisiRRF, EbnerRRF, localRRF, PeterRRF, fAdaptRRF, fpAdaptRRF, and backProjRRF.]
Fig. 2. Cumulative localization error distributions for hand and teeth data sets.
size in each direction s_I = 1 mm and a maximal offset o_I = R. In the second RRF phase, N_px = 10000 pixels are introduced and the feature range is increased to a maximal feature size s_II = 50 mm and a maximal offset in each direction o_II = 50 mm.
Treating each landmark independently on both the 2D hand and the 3D teeth data set, the single-landmark experiments show the performance of the methods in cases where it is not feasible (due to lack of annotations) or not semantically meaningful (e.g. third vs. other molars) to define all available locally similar structures. We compare our algorithms that start with local feature scale ranges and increase to more global scale ranges (localRRF, fAdaptRRF, PeterRRF, fpAdaptRRF, backProjRRF) with reimplementations of two related works that start from global feature scale ranges (CriminisiRRF [1], with maximal feature size s_II and offset o_II from pixels uniformly distributed over the image) and optionally decrease to more local ranges (EbnerRRF [3]). The first training phase stops for all methods at D_I = 13, while the second phase continues training within the same trees until D_II = 25. To ensure a fair comparison, we use the same RRF parameters for all methods, except for the number of candidate features in PeterRRF, which was set to N_F = 500 as suggested in [7]. Cumulative error distribution results of the single-landmark experiments can be found in Fig. 2.
Table 1 shows quantitative localization results regarding reliability for all hand
landmarks and for subset configurations (fingertips, carpals, radius/ulna).
The multi-landmark experiments allow us to investigate the benefits
of adding high level knowledge about landmark configuration via an MRF to
the prediction. In addition to our reimplementation of the related works [1,3], Lindner et al. [5] applied their code to our hand data set using D_I = 25 in their implementation of the local RF stage. To allow a fair comparison with Lindner et al. [5], we modify our two training phases by training two separate forests for the two stages until maximum depths D_I = D_II = 25, instead of continuing training the trees of a single forest. Thus, we investigate our presented backProjRRF, the combination of backProjRRF with an MRF, localRRF combined with an MRF, and the two state of the art methods of Ebner et al. [3] (EbnerRRF)
Table 1. Multi-landmark localization reliability results on hand radiographs for all
landmarks and subset configurations (compare Fig. 1 for configuration colors).
method              mean ± std [mm]    outliers
EbnerRRF            0.97 ± 2.45        228 (6.89‰)
Lindner et al. [5]  0.85 ± 1.01         20 (0.60‰)
localRRF+MRF        0.80 ± 0.91         14 (0.42‰)
backProjRRF         0.84 ± 1.58         57 (1.72‰)
backProjRRF+MRF     0.80 ± 0.91         15 (0.45‰)

landmark subset       localRRF+MRF   backProjRRF+MRF   backProjRRF
full (••••)           14 (0.4‰)      15 (0.5‰)         57 (1.7‰)
fingertips (•)        14 (3.1‰)       5 (1.1‰)         17 (3.8‰)
radius, ulna (•)     495 (92.2‰)      6 (1.1‰)         11 (2.0‰)
carpals (•)           17 (2.7‰)      13 (2.1‰)         14 (2.2‰)
and Lindner et al. [5]. The MRF, which is solved by a message passing algorithm, uses N_c = 75 candidate locations (i.e. local accumulator maxima) per landmark as the possible states of the MRF variables. Quantitative results on multi-landmark localization reliability for the 2D hand data set can be found in Table 1. Since all our methods including EbnerRRF are based on the same local RRFs, accuracy is the same, with a median error of µ_E^hand = 0.51 mm, which is slightly better than the accuracy of Lindner et al. [5] (µ_E^hand = 0.64 mm).
4 Discussion and Conclusion
Single landmark RRF localization performance is highly influenced by both the selection of the area from which training pixels are drawn and the range of the hand-crafted features used to construct its forest decision rules, yet the exact influence is currently not fully understood. As shown in Fig. 2, the global CriminisiRRF method does not give accurate localization results (median error µ_E^hand = 2.98 mm), although it shows the capability to discriminate ambiguous structures due to its use of long range features and training pixels from all over the image. As a reason for the low accuracy, we identified the greedy node optimization, which favors long range features even at deep tree levels, where no ambiguity among the training pixels is present anymore. Our implementation of PeterRRF [7], which overcomes greedy node optimization by selecting an optimal feature range in each node, shows a strong improvement in localization accuracy (µ_E^hand = 0.89 mm). Still, it is not as accurate as the method of Ebner et al. [3], which uses a local RRF with short range features in the second stage (µ_E^hand = 0.51 mm), while also requiring a significantly larger number (around 25 times) of feature candidates per node. The
drawback of EbnerRRF is essentially the same as for localRRF if the area from which the local RRF training pixels are drawn, despite being reduced by the global RRF of the first stage, still contains neighboring, locally similar structures. To investigate the RRF's capability to reliably discriminate ambiguous structures while preserving the high accuracy of locally trained RRFs, we switch the order of the EbnerRRF stages, thus inverting their logic in the spirit of [5,2]. Therefore, we extended localRRF by adding a second training phase that uses long range features and differently selects the areas from which training pixels are drawn. While increasing the feature range in fAdaptRRF gives the same accuracy as localRRF (µ_E^hand = 0.51 mm), reliability is improved, but not as strongly as when introducing novel pixels into the second training phase.
Training on novel pixels is required to make feature selection more effective in
discriminating locally similar structures, but it is important to note that they do not participate in voting at testing time, since the accuracy obtained in the first phase would otherwise be lost. With our proposed backProjRRF we force the algorithm to explicitly learn from examples which are hard to discriminate, i.e. pixels belonging to locally similar structures, as opposed to fpAdaptRRF, where pixels are randomly drawn from the image. Results in Fig. 2 reveal that the highest reliability (0.172% and 7.07% outliers on the 2D hand and 3D teeth data sets, respectively) is obtained by backProjRRF, while still achieving the same accuracy as localRRF.
In a multi-landmark setting, RRF based localization can be combined with high level knowledge from an MRF or SSM as in [5,2]. Method comparison results from Table 1 show that our backProjRRF combined with an MRF model outperforms the state-of-the-art method of [5] on the hand data set in terms of accuracy and reliability. However, compared to localRRF, our backProjRRF shows no benefit when both are combined with a strong graphical MRF model. In cases where such a strong graphical model is not affordable, e.g. if expert annotations are limited (see subset configurations in Table 1), combining backProjRRF with an MRF shows much better results in terms of reliability compared to localRRF+MRF. This is especially prominent in the results for the radius and ulna landmarks. Moreover, Table 1 shows that even without incorporating an MRF model, the results of our backProjRRF are competitive with the state of the art methods when only limited high level knowledge is available (fingertips, radius/ulna, carpals). Thus, in conclusion, we have shown the capability of RRFs to successfully model locally similar structures by implicitly encoding the global landmark configuration while still maintaining high localization accuracy.
References
1. Criminisi, A., Robertson, D., Konukoglu, E., Shotton, J., Pathak, S., White, S.,
Siddiqui, K.: Regression forests for efficient anatomy detection and localization in
computed tomography scans. Med. Image Anal. 17(8), 1293–1303 (2013)
2. Donner, R., Menze, B.H., Bischof, H., Langs, G.: Global localization of 3D anatom-
ical structures by pre-filtered Hough Forests and discrete optimization. Med. Image
Anal. 17(8), 1304–1314 (2013)
3. Ebner, T., Štern, D., Donner, R., Bischof, H., Urschler, M.: Towards Automatic
Bone Age Estimation from MRI: Localization of 3D Anatomical Landmarks. In:
MICCAI 2014, Part II. LNCS, vol. 8674, pp. 421–428 (2014)
4. Glocker, B., Zikic, D., Konukoglu, E., Haynor, D.R., Criminisi, A.: Vertebra Local-
ization in Pathological Spine CT via Dense Classification from Sparse Annotations.
In: MICCAI 2013, Part II. LNCS, vol. 8150, pp. 262–270 (2013)
5. Lindner, C., Bromiley, P.A., Ionita, M.C., Cootes, T.F.: Robust and Accurate Shape
Model Matching using Random Forest Regression-Voting. IEEE Trans. PAMI 37,
1862–1874 (2015)
6. Lindner, C., Thomson, J., The arcOGEN Consortium, Cootes, T.F.: Learning-Based
Shape Model Matching: Training Accurate Models with Minimal Manual Input. In:
MICCAI 2015, Part III. LNCS, vol. 9351, pp. 580–587 (2015)
7. Peter, L., Pauly, O., Chatelain, P., Mateus, D., Navab, N.: Scale-Adaptive Forest
Training via an Efficient Feature Sampling Scheme. In: MICCAI 2015, Part I. LNCS,
vol. 9349, pp. 637–644 (2015)