© Springer-Verlag Berlin Heidelberg 2011
Ultrasound and Fluoroscopic Images Fusion by
Autonomous Ultrasound Probe Detection
Peter Mountney1, Razvan Ionasec1, Markus Kaiser2, Sina Mamaghani1, Wen Wu1,
Terrence Chen1, Matthias John2, Jan Boese2 and Dorin Comaniciu1
1 Siemens Corporate Research & Technology, Princeton, USA
2 Siemens AG, Healthcare Sector, Forchheim, Germany
Abstract. New minimally invasive interventions such as transcatheter valve
procedures exploit multiple imaging modalities to guide tools (fluoroscopy) and
visualize soft tissue (transesophageal echocardiography (TEE)). Currently, these
complementary modalities are visualized in separate coordinate systems and
on separate monitors creating a challenging clinical workflow. This paper pro-
poses a novel framework for fusing TEE and fluoroscopy by detecting the pose
of the TEE probe in the fluoroscopic image. Probe pose detection is challenging
in fluoroscopy and conventional computer vision techniques are not well suited.
Current research requires manual initialization or the addition of fiducials. The
main contribution of this paper is autonomous six DoF pose detection by com-
bining discriminative learning techniques with a fast binary template library.
The pose estimation problem is reformulated to incrementally detect pose pa-
rameters by exploiting natural invariances in the image. The theoretical contri-
bution of this paper is validated on synthetic, phantom and in vivo data. The
practical application of this technique is supported by accurate results (< 5 mm
in-plane error) and computation time of 0.5s.
1 Introduction
Percutaneous and minimally-invasive cardiac procedures are progressively replac-
ing conventional open-heart surgery for the treatment of structural and rhythmological
heart disease. Catheters are used to access target anatomy through small vascular
access ports. This greatly reduces recovery time and the risk of complications associ-
ated with open surgery. Without direct access and visualization, the entire procedure
is performed under imaging guidance. There are two established modalities currently
used in operating rooms to provide real-time intra-operative images: X-ray fluorosco-
py (Fluoro) and transesophageal echocardiography (TEE). Fluoro provides high qual-
ity visualization of instruments and devices, which are typically radiopaque, while
TEE and more recently 3D TEE can image soft-tissue with great detail. Nevertheless,
the complementary nature of TEE and Fluoro is barely exploited in today’s practice
where the real-time acquisitions are not synchronized and images are visualized sepa-
rately in misaligned coordinate systems.
Recently, the fusion of Fluoro and TEE has been proposed using either hardware or
image based methods. Hardware based approaches attach additional devices to the
ultrasound probe, such as electromagnetic or mechanical trackers, and align the
device and Fluoro coordinate systems through calibration. Image based methods
attempt to use the appearance of the TEE probe in the Fluoro image to estimate the
pose of the probe in the Fluoro coordinate system. These methods are attractive
because they do not require the introduction of additional equipment into the
theatre, which may disrupt clinical workflow.
Image based pose estimation is well studied and the problem may be considered
solved when the correspondence between 2D image points and a 3D model are
known. Unfortunately, the appearance of the TEE probe in the Fluoro image makes
establishing the correspondence challenging. The probe’s appearance lacks texture or
clear feature points and can appear homogeneous under low dose or close to dense tissue.
To alleviate this problem, markers may be retrofitted to the TEE probe. The pose of
the probe is then estimated using well-established computer vision techniques;
however, the addition of markers increases the overall size of the probe.
Alternatively, the natural geometry of the probe may be used to estimate its pose.
The authors use a 2D/3D registration technique to refine the probe's pose estimate,
and optimal results are obtained using two biplane images. The method is robust for
small pose changes (10 mm / 10°); however, it requires manual initialization and
does not update the registration in real-time, both of which are important in the
clinical setting.
In this paper we propose a robust and fast learning-based method for the automated
detection of the TEE probe pose, with six degrees of freedom, from Fluoro images. A
probabilistic model-based approach is employed to estimate candidates for the in-
plane probe position, orientation and scale parameters. Digitally reconstructed radiog-
raphy (DRR) in combination with a binary template library is introduced for the esti-
mation of out-of-plane rotation parameters (pitch and roll). The approach does not
require manual initialization, is robust over the entire pose parameter space, and inde-
pendent of specific TEE probe design / manufacturer. The performance of the algo-
rithm is demonstrated on a comprehensive dataset of in vivo Fluoro sequences and
validated on simulated and phantom data.
2 Fusion Framework
Information from a TEE volume can be visualized in a Fluoro image by aligning the
TEE and C-arm Fluoro coordinate systems. A point X_TEE in the ultrasound volume
can be visualized in the Fluoro image at coordinate Q_Fluoro using the following
transformation:

   Q_Fluoro = P · T_d^W · R_xz · T_TEE^W · R_TEE · X_TEE

where P is the projection matrix, T_d^W is the transformation from the detector to
the world coordinate system, R_xz encodes the angulations of the C-arm, and R_TEE
and T_TEE^W are the rotation and position of the TEE probe in the world coordinate
system, recovered from the pose estimated in the detector frame by applying the
inverse C-arm transforms R_xz^-1 and (T_d^W)^-1. The TEE volume and Fluoro image
can therefore be aligned if the position T = (x, y, z) and orientation
R = (θx, θy, θz) of the TEE probe are known in the Fluoro detector coordinate
system.
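As a concrete illustration, the projection of a TEE point into the Fluoro image can be composed with homogeneous matrices. This is a minimal sketch under our own assumptions (4×4 homogeneous transforms, a 3×4 projection matrix, illustrative function name), not the authors' implementation:

```python
import numpy as np

def project_tee_point(x_tee, P, T_d_W, R_xz, T_tee_W, R_tee):
    """Map a 3D point from TEE coordinates into Fluoro image coordinates
    by chaining the transforms described in the text.

    P       : 3x4 projection matrix
    T_d_W   : 4x4 detector-to-world transform
    R_xz    : 4x4 C-arm angulation
    T_tee_W, R_tee : 4x4 probe position / rotation transforms
    """
    x_h = np.append(np.asarray(x_tee, dtype=float), 1.0)  # homogeneous 3D point
    q = P @ T_d_W @ R_xz @ T_tee_W @ R_tee @ x_h          # 3-vector (u*w, v*w, w)
    return q[:2] / q[2]                                   # perspective divide
```

With all transforms set to identity and a pinhole projection matrix, the point simply undergoes perspective projection onto the image plane.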
Fig. 1. Detecting the pose of a TEE probe from a single Fluoro image.
2.1 TEE Probe Pose Detection
At the heart of our approach is the separation of the pose parameters into in-plane
parameters (position, orientation and scale) and out-of-plane parameters (pitch and
roll), as shown in Fig. 1. By marginalizing the estimation problem, the in-plane
parameters can be efficiently estimated directly from the Fluoro image I, while
being invariant to the out-of-plane parameters that are more challenging to
determine.
The in-plane parameters can be computed from the probe's position (u, v) in the
Fluoro image, the projection transformation of the Fluoro device and the physical
dimensions of the TEE probe. To detect the in-plane parameters from a Fluoro image
I we use discriminative learning methods, as described in the next section.
The out-of-plane parameters are more challenging to estimate. The visual appearance
of the probe in Fluoro varies greatly, making it challenging to learn a compact
classifier, and the problem must be treated in a fundamentally different way. A
template library is created of the probe's appearance under out-of-plane
orientations. Each template has an associated (pitch, roll) pair, and by matching
the Fluoro image to the template library the out-of-plane parameters can be
estimated.
The detection pipeline is: DetectProbe → DetectOrientation → DetectScale →
DetectRollAndPitch (binary template library) → Visualize.
Detecting In-plane Parameters
The in-plane parameters are estimated using discriminative learning methods. A
classifier is trained to detect the position (u, v) of the TEE probe in the Fluoro
image. The classifiers are trained using manually annotated Fluoro data. They are
trained and applied sequentially such that first, position candidates are detected,
then the orientation is detected for each candidate, and finally the size of the
probe is detected. Each detector is trained using a Probabilistic Boosting Tree
(PBT) with Haar-like and steerable features.
The position detector is trained on manual annotations and negative examples taken
randomly from the Fluoro image. The Fluoro image is resized and a detection window
is centered at the annotation. A pool of 100,000 Haar features is used to train the
PBT. The appearance of the probe varies greatly, so to avoid overfitting a
classifier is created which is less discriminative but highly likely to detect the
tip of the probe.
The orientation detector is trained on manually annotated data and the false
positives from the position detector. Additional negative training data is created,
centered on the annotation but with incorrect orientation parameters. The PBT is
trained with five features, including the relative intensity and the difference
between two steerable filters. The orientation detector is trained at intervals of
6° with 360° coverage. This detector is more discriminative than the position
detector and therefore removes outliers as well as estimating the orientation.
The scale detector is trained to detect two points where the tip of the probe meets
the shaft. The PBT is trained using Haar features. During detection, the
orientation and position of the probe are used to constrain the search area for the
size detector.
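The sequential position → orientation → scale detection described above can be sketched generically. This is a hedged illustration of the cascade structure only; the stage functions are placeholders, not the trained PBT detectors:

```python
def run_cascade(image, stages):
    """Apply detection stages in order. Each stage is a callable that
    maps (image, list of hypotheses) to a refined list of hypotheses,
    where a hypothesis is a dict accumulating pose parameters."""
    hypotheses = [{}]  # start with a single empty hypothesis
    for stage in stages:
        hypotheses = stage(image, hypotheses)
        if not hypotheses:  # no surviving candidates; stop early
            break
    return hypotheses
```

In use, the stage list would be something like `[detect_position, detect_orientation, detect_scale]`, with each stage filtering out candidates it rejects, mirroring how the orientation detector removes outliers from the position candidates.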
Detecting Out-of-plane Parameters
The appearance of the probe under roll and pitch varies significantly in the Fluoro
image and cannot generally be accounted for in the image space using the same
techniques as the in-plane parameters. The out-of-plane parameters must be treated
in a fundamentally different way. The proposed solution is to build a template
library containing Fluoro images of the probe under different out-of-plane
orientations. The (pitch, roll) parameters are estimated by matching an image patch
in I (normalized for the in-plane parameters) with the template library.
A comprehensive template library should contain a wide variety of orientations. It
is not feasible to build this library from in vivo data, as it is challenging to
annotate manually and the data may not contain complete coverage of the parameter
space. The library is instead constructed using digitally reconstructed radiography
(DRR). DRRs simulate X-ray Fluoro by tracing light rays through a 3D volume. In
this work a DynaCT of the TEE probe is acquired (512 × 512 × 488 voxels at
0.2225 mm resolution). The orientation and position of the probe were manually
annotated and pitch and roll rotations are applied to the volume.
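Enumerating the out-of-plane orientations at which DRR templates are rendered can be sketched as follows. The pitch/roll ranges and 2° step are those used in this work; the function name is an illustrative assumption and the DRR renderer itself is not reproduced:

```python
import numpy as np

def library_orientations(pitch=(-45, 45), roll=(-90, 90), step=2):
    """Return every (pitch, roll) pair on the template sampling grid,
    in degrees. A DRR would be rendered for each pair."""
    pitches = np.arange(pitch[0], pitch[1] + step, step)  # -45..45 inclusive
    rolls = np.arange(roll[0], roll[1] + step, step)      # -90..90 inclusive
    return [(float(p), float(r)) for p in pitches for r in rolls]
```

At these settings the grid contains 46 × 91 = 4,186 orientations, which motivates the compact binary template representation described next.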
Searching a large template library can be computationally expensive. The size of
the library is limited to reduce the search space. The probe is not free to move in all
directions due to the physical constraints of the tissue. In addition the X-ray image,
formulated by integrating light, makes objects appear the same under symmetrical
poses. This is exploited to reduce the size of the template library. The library is
built with pitch from −45° to 45° and roll from −90° to 90° at 2° intervals. This
subsampled library is still large and expensive to store and search. To make the
problem computationally tractable a binary template representation is used [7, 8].
Binary templates are an efficient way of storing discriminative information for
fast matching.
The image patch is divided into sub-regions and features are extracted for each
region. The dominant orientation [7] of the gradient in the sub-region is used as a
feature. This has been shown to work well on homogeneous regions and objects which
lack texture, as is the case for the TEE probe in the Fluoro image. The orientations are
discretized into 8 orientation bins. Each sub-region can be represented as a single byte
which corresponds to the 8 orientation bins. The bit is set to 1 if the orientation exists
in the sub-region and 0 if it does not. The binary template for the image patch is com-
prised of a set of bytes corresponding to the sub-regions. The resulting template is a
compact and discriminative representation of the image patch.
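A simplified sketch of this encoding is given below: each sub-region becomes one byte, with the bit of its dominant gradient-orientation bin set. This is our own minimal illustration (grid size, bin quantization and handling of flat regions are assumptions), not the paper's implementation:

```python
import numpy as np

def binary_template(patch, grid=(4, 4), n_bins=8):
    """Encode a 2D patch as one byte per sub-region: the bit of the
    dominant gradient-orientation bin in that sub-region is set."""
    gy, gx = np.gradient(patch.astype(float))       # row- and column-gradients
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ori = np.mod(np.arctan2(gy, gx), np.pi)         # unsigned orientation in [0, pi)
    bins = np.minimum((ori / np.pi * n_bins).astype(int), n_bins - 1)
    h, w = patch.shape
    template = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            r = (slice(i * h // grid[0], (i + 1) * h // grid[0]),
                 slice(j * w // grid[1], (j + 1) * w // grid[1]))
            byte = 0
            if mag[r].max() > 0:                    # skip textureless regions
                dominant = bins[r].flat[np.argmax(mag[r])]
                byte |= 1 << int(dominant)          # set dominant-orientation bit
            template.append(byte)
    return template
```

A richer variant (closer to dominant orientation templates) would set a bit for every orientation present in the sub-region, as the text describes; the one-bit version keeps the sketch short.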
Input templates extracted from the Fluoro image I are matched to templates in the
library by counting, for each candidate position (u, v), how many sub-regions of
the input template match the corresponding sub-regions of a library template F_O:

   score(u, v, O) = Σ_r δ( F_I(u, v, r), F_O(r) )

where δ is a binary function which returns true if the features in two regions
match, F_I(u, v, r) is the input template centered on candidate (u, v), F_O is a
template in the library and r is the sub-region. The function counts how many
sub-regions in two templates are the same. The template in the library with the
highest count is taken to be the best match and its associated (pitch, roll) is
taken as the out-of-plane parameters. This function can be evaluated very quickly
using a bitwise AND operation followed by a bit count, enabling the library to be
searched efficiently.
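The AND-and-count matching step can be sketched directly; a minimal Python illustration (names are ours) operating on the byte-per-sub-region templates:

```python
def region_match(a, b):
    """True if two sub-region bytes share at least one orientation bit
    (bitwise AND, as in the matching function above)."""
    return (a & b) != 0

def best_match(query, library):
    """library: list of (template, (pitch, roll)) pairs. Returns the
    (pitch, roll) of the library template whose sub-regions agree with
    the query template in the most places."""
    def score(entry):
        template, _ = entry
        return sum(region_match(a, b) for a, b in zip(query, template))
    return max(library, key=score)[1]
```

An optimized implementation would pack each template into a single machine word per row and use a hardware population count rather than a Python loop; the logic is the same.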
3 Results

The proposed method for probe pose detection was validated on synthetic, phantom
and in vivo datasets. Throughout our experiments a GE TEE transducer was used. The
synthetic dataset includes 4050 simulated Fluoro images (DRR) from a 3D C-arm
volume (DynaCT, 512 × 512 × 488 voxels at 0.2225 mm voxel spacing) of the TEE
probe. The ground-truth was generated by annotating the 3D probe position in the
DynaCT volume. The phantom dataset includes a volumetric DynaCT of the TEE
probe inserted into a silicon phantom, and a total of 51 Fluoro images captured by
rotating the C-arm with the TEE probe remaining stationary.
Fig. 2. Fluoroscopic images illustrating probe detection and estimation of in-plane parameters.
The position of the C-arm is known from the robotic control, which enabled ground-
truth to be computed for each Fluoro image using the 3D probe annotation. The in
vivo dataset was acquired during several porcine studies and includes 50 Fluoro
sequences comprising around 7,000 frames (0.345 mm pixel spacing). The
data contains images with background clutter, catheter tools and variety in the pose of
the probe, C-arm angulations, dose and anatomy. The pose parameters were manually
annotated in all frames and assumed as ground-truth for training and testing.
In the first experiment, quantitative and qualitative performance evaluation of the
in-plane parameter detection was performed on all three datasets. The
detector was trained on 75% of the in vivo dataset (36 sequences of 5,363 frames) and
tested on the entire synthetic, phantom and remaining 25% of the in vivo dataset. The
results are summarized in Table 1.
For the in vivo data the average in-plane position error was 2.2 mm (u) and 3.7 mm
(v), and the in-plane orientation error was 6.69°. Errors in the position estimation are
caused by false detections along the shaft of the probe. False position detections con-
tribute to errors in the orientation estimation. The true positive rate is 0.88 and the
false positive rate is 0.22. Detection and accuracy are affected by dose level, prox-
imity to dense tissue and background clutter. The detection framework performs best
when the probe is clearly distinguishable from its background. Fig. 2 illustrates detec-
tion examples and the nature of in vivo images with cluttered background and low
texture.

Table 1. Quantitative validation of the in-plane parameter detection, mean (std).

Data       u (mm)      v (mm)      θ (°)
Synthetic  1.1 (1.1)   2.2 (3.9)   2.6 (3.2)
Phantom    1.6 (1.4)   2.0 (1.2)   3.0 (3.4)
In vivo    2.2 (5.1)   3.7 (8.0)   6.6 (16.7)

Fig. 3. Error analysis (degrees) of the out-of-plane (pitch, roll) detection.
Fig. 4. Top - Fluoro images showing the detected pose of probe. Bottom: Left- Fluoro image.
Center – mitral valve detected in 3D TEE. Right – Valve model visualized in Fluoro.
The results for the phantom and synthetic data are provided in Table 1 where detec-
tion was performed at a fixed scale. The Fluoro data from the phantom experiment
appears different from the in vivo data used to train the detectors making it challeng-
ing. The true positive rate was 0.95 and false positive rate 0.05. False detections were
caused by the density of the silicon phantom, which obscures the probe in three imag-
es. The true positive and false positive rates for synthetic data were 0.99 and 0.01
respectively. The visual appearance of the synthetic DRR is similar to the training
data and the probe is clearly distinguishable causing high true positive rate.
The out-of-plane (pitch and roll) detectors are analyzed on the synthetic data to
evaluate the accuracy of the binary template matching. Fig. 3 plots the
(pitch, roll) error over the search space (degrees) and illustrates stable
detection with a single outlier.
The framework is evaluated with respect to all parameters (Table 2). Quantitative
validation was performed on synthetic and phantom data (in vivo ground truth data
was not available). The largest error is in the Z axis, which corresponds to the optical
axis of the Fluoro device. It is expected that this is the largest error because estimating
distance along the optical axis is challenging from a monocular Fluoro image. Fortu-
nately, the goal of the framework is to visualize anatomy in the Fluoro image, there-
fore errors in Z have little effect on the final visualization. Initial clinical feedback
suggests errors of up to 15° and 10 mm (excluding Z) are acceptable for some visuali-
zations; however, accuracy requirements are application specific. Qualitative evalua-
tion (Fig. 4 top) is performed on in vivo Fluoro images.
Table 2. Quantitative validation of TEE probe detection, mean (std).

Data       X (mm)       Y (mm)      Z (mm)       Pitch (°)     Roll (°)    θ (°)
Synthetic  0.82 (0.79)  0.97 (2.1)  64.0 (13.9)  4.2 (10.5)    4.6 (9.0)   2.6 (3.2)
Phantom    1.1 (0.8)    0.7 (0.6)   19.04 (1.6)  11.5 (12.0)   11.8 (9.8)  3.0 (3.4)
The computational performance was evaluated (Intel 2.13GHz single core, 3.4GB
RAM). The average detection time is 0.53 seconds. The computational cost can be
reduced by incorporating temporal information to reduce the search space.
To illustrate the clinical relevance of this work, an anatomical model of the
mitral valve is detected [9] in 3D TEE and visualized in Fluoro (Fig. 4 bottom).
The data is not synchronized and is manually fused. A catheter is visible in both
modalities.
4 Conclusion
This paper presents a novel method for automated fusion of TEE and Fluoro images
to provide guidance for cardiac interventions. The proposed system detects the pose
of a TEE probe in a Fluoro image. Discriminative learning is combined with fast bina-
ry template matching to address the challenges of pose detection. Validation has been
performed on synthetic, phantom and in vivo data. The method is capable of detecting
in 0.5s with an in-plane accuracy of less than 5 mm. Future work will focus on incor-
porating temporal information, using the initial detected pose as a starting estimate for
pose refinement and visualization of anatomically meaningful information.
References
1. Jain, A., Gutierrez, L., Stanton, D.: 3D TEE Registration with X-Ray Fluoroscopy for Inter-
ventional Cardiac Applications. FIMH. pp. 321–329 (2009).
2. Ma, Y., Penney, G.P., Bos, D., Frissen, P., Rinaldi, C.A., Razavi, R., Rhode, K.S.: Hybrid
echo and x-ray image guidance for cardiac catheterization procedures by using a robotic arm:
a feasibility study. Physics in Medicine and Biology. 55, 371–382 (2010).
3. Gao, G., Penney, G., Ma, Y., Gogin, N., Cathier, P., Arujuna, A., Morton, G., Caulfield, D.,
Gill, J., Aldo Rinaldi, C., Hancock, J., Redwood, S., Thomas, M., Razavi, R., Gijsbers, G.,
Rhode, K.: Registration of 3D trans-esophageal echocardiography to X-ray fluoroscopy us-
ing image-based probe tracking. Medical Image Analysis. 16, 38–49 (2012).
4. Gao, G., Penney, G., Gogin, N., Cathier, P., Arujuna, A., Wright, M., Caulfield, D., Rinaldi,
A., Razavi, R., Rhode, K.: Rapid Image Registration of 3D Transesophageal Echocardiog-
raphy and X-Ray for Guidance of Cardiac Interventions. IPCAI. pp. 124–134 (2010).
5. Lang, P., Seslija, P., Chu, M.W.A., Bainbridge, D., Guiraudon, G.M., Jones, D.L., Peters,
T.M.: US-Fluoroscopy Registration for Transcatheter Aortic Valve Implantation. IEEE
Transactions on Biomedical Engineering. 59, 1444 –1453 (2012).
6. Wu, W., Chen, T., Wang, P., Zhou, S.K., Comaniciu, D., Barbu, A., Strobel, N.: Learning-
based hypothesis fusion for robust catheter tracking in 2D X-ray fluoroscopy. CVPR (2011).
7. Hinterstoisser, S., Lepetit, V., Ilic, S., Fua, P., Navab, N.: Dominant orientation templates for
real-time detection of texture-less objects. CVPR. pp. 2257–2264 (2010).
8. Taylor, S., Drummond, T.: Multiple Target Localisation at over 100 FPS. BMVC (2009).
9. Ionasec, R.I., Voigt, I., Georgescu, B., Wang, Y., Houle, H., Vega-Higuera, F., Navab, N.,
Comaniciu, D.: Patient-specific modeling and quantification of the aortic and mitral valves
from 4-D cardiac CT and TEE. IEEE Trans on Med Imag. 29, 1636–1651 (2010).