Fast 3D Structure Localization in Medical Volumes
using CUDA-enabled GPUs
Sharan Vaswani, Rahul Thota, Nagavijayalakshmi Vydyanathan and Amit Kale
Siemens Corporate Research and Technologies
Bangalore, India
Email: {sharan.vaswani, rahul.thota, nagavijayalakshmi.vydyanathan, kale.amit}@siemens.com
Abstract—Effective and fast localization of anatomical struc-
tures is a crucial first step towards automated analysis of medical
volumes. In this paper, we propose an iterative approach for
structure localization in medical volumes based on the adaptive
bandwidth mean-shift algorithm for object detection (ABMSOD).
We extend and tune the ABMSOD algorithm, originally used
to detect 2D objects in non-medical images, to localize 3D
anatomical structures in medical volumes. For fast localization,
we design and develop optimized parallel implementations of
the proposed algorithm on multi-cores using OpenMP, and on
GPUs using CUDA. We evaluate the quality, performance and
scalability of the proposed algorithm on Computed Tomography
(CT) volumes for various structures.
Index Terms—localization; medical image processing; GPGPU
computing; CUDA
I. INTRODUCTION
Automation of clinical procedures involving analysis of
imaging data, such as tissue volume quantification, screening,
diagnosis as well as surgical procedures, not only helps to im-
prove patient throughput but also enhances repeatability, safety
and quality of patient care. Typically, analysis of medical
imaging data includes operations such as image segmentation,
registration, feature extraction, recognition and classification.
As medical images suffer from inherent noise and low contrast
and spatial resolution [1], accurate segmentation, registration
or classification is difficult and computationally intensive. For
example, several 3D anatomy segmentation and recognition
algorithms take minutes to execute even with GPU
acceleration [2], [3], [4]. In addition, these algorithms are
highly tuned and specific to an anatomical structure, like the
lung or liver. To address the above issues, a generic pre-
processing step that localizes any structure can be very useful
in improving both speed and accuracy of the above procedures.
For example, high precision segmentation of tumors can be
accomplished faster by executing complex domain-specific
segmentation algorithms on a localized region around the
tumour, rather than the entire volume.
Localization can be used to improve the speed and quality of
diagnosis for difficult cases. A doctor can scan his past patient
data to retrieve a subset of imaging records that contain
the structure of interest. Domain-specific algorithms can
then be run on localized regions in the relevant records to
identify similar cases quickly, which the doctor can consult
before making critical diagnoses. Localization can also be
applied to track anatomical structures in image-guided surgical
procedures. Thus, localization can serve as a crucial first step in
automated analysis of medical imaging data.
In this paper, we propose an iterative approach for struc-
ture localization in medical volumes that is based on the
adaptive bandwidth mean-shift algorithm for object detection
(ABMSOD) [5]. We extend and tune the ABMSOD algorithm,
originally used to detect 2D objects in non-medical images,
to localize 3D anatomical structures in medical volumes. The
ABMSOD algorithm is an iterative meanshift based object
detection algorithm which estimates the position as well as
the size and orientation of the target object in the given
image. ABMSOD estimates the object position using conven-
tional meanshift and the object scale and orientation using
an adaptive bandwidth for the meanshift kernel. To enable
fast localization of structures, we develop optimized parallel
implementations of our localization technique on multi-cores
using OpenMP and graphics processors using CUDA.
We evaluate the quality, performance and scalability of our
algorithm on Computed Tomography (CT) volumes for the
following structures: brain stem, eye and the parotid gland.
Our evaluations show that in 40% of the runs, we are able
to encapsulate more than 90% of the structure, while in 65%
of the runs we are able to capture the structure partially with
at least 50% coverage. As our proposed technique is generic
enough to accommodate any target description, our future work
is to experiment with different target descriptors trading off
between localization accuracy, speed and flexibility. The GPU-
acceleration of the proposed algorithm yields a 97x speedup
over the sequential implementation and a 28x speedup over
the OpenMP parallel implementation, and scales well with
increasing dimensions of the search space.
II. RELATED WORK
Localization and segmentation of anatomies has been a
primary focus area of research in recent years. Lo-
calization techniques are typically based on either boundary
delineation methods using active shapes, appearances [6], [7]
and deformable models [8], [9] or through machine learning
techniques [10], [11], [12], [13], [14], [15]. Active shapes and
appearance models have been used to perform 2D segmen-
tation in medical images that have a fairly consistent shape
and gray level appearance [16], [17], [18]. However, these
techniques require extensive a priori knowledge of the struc-
tures and intensive training to build robust models, especially
for 3D segmentation where the gray levels and the boundary
appearances may vary largely [19]. Also, these works focus
on specific imaging modalities. Other model-based localization
techniques, including deformable models [8], [9] also require
extensive training and are structure specific.
Recently, marginal space learning has proved a successful
technique in automatic detection of 3D structures in med-
ical images [10], [11]. Here, localization is modeled as a
classification problem and structures are identified by per-
forming parameter estimations in a series of marginal spaces
with increasing dimensionality rather than scanning the clas-
sifier exhaustively over the entire search space. Constrained
marginal search exploits correlation amongst the parameters
to reduce the search space and quicken convergence. Cri-
minisi et al. [12] propose a random decision forest based
classifier to detect multiple organs in CT images. These works
which are based on classification, involve a computationally
intensive training phase that requires a large training data set.
Pauly, Criminisi and others, proposed an approach to detect
the position of multiple objects in MR images using a single
fern-based regressor [14], which facilitates faster training. This
work introduces features based on MR Dixon channels and
is tightly coupled to the MR modality. They use cuboids to
localize anatomies. Zhou et al. [13], [15] describe an ensemble
learning based approach for detecting solid objects in CT
images. This approach requires fewer data sets
for the training phase. It detects 3D objects by applying 2D
detection techniques across slices in all three dimensions of
the image volume, which makes the algorithm computationally
intensive (approximately 15 seconds per CT scan), and uses
cuboids to localize structures. Apart from the works described
above, several researchers have proposed algorithms to localize
and segment specific organs by applying domain specific
knowledge [10], [20], [21], [22].
All of the above works are either domain specific, compu-
tationally intensive or require extensive training. Some works
use cuboids for localization, which do not tightly capture
the orientation and shape of irregular structures. In this paper,
we extend the work of Chen et al. [5] to develop a generic
framework that can be used to detect any 3D structure in a
medical volume without the need for intensive training or a
priori domain-specific knowledge. Our algorithm is parallelized
on multi-core CPUs and GPUs to enable fast localization.
III. BACKGROUND
A. Adaptive Bandwidth Mean-shift Algorithm
The Adaptive Bandwidth Meanshift Object Detection
(ABMSOD) algorithm is an iterative meanshift based algo-
rithm for 2D object detection in computer vision. The mean
shift algorithm is an unsupervised method commonly used
in computer vision problems such as filtering, tracking and
segmentation. In the mean shift procedure the feature points
move toward some significant modes and cluster themselves
automatically making it ideal for multi-object detection and
localization. The ABMSOD algorithm uses kernel weighted
feature histograms (features could be color, texture or haar-
like features) to describe the target object and candidate object
models. Given a test image and the target kernel-weighted
feature histogram, ABMSOD tries to identify the optimum
position, scale and orientation of the target in the test image.
It follows a two-step approach - in the first step, it searches
through the whole image to identify rough positions of possi-
ble candidate objects. This is done by randomly scattering
ellipstical windows all over the image and computing the
similarity between the feature histogram of these windows
with that of the target using the Bhattacharyya coefficient,
which is defined as

\[ \rho(x) = \sum_{u=1}^{M} \sqrt{p_u(x)\, q_u} \tag{1} \]

where p and q are the candidate and target histograms
respectively, M is the total number of bins in the histogram
and x is the position of the candidate window. Only window
regions with similarity above a threshold are considered as
possible candidate objects. In the second step, for each pos-
sible candidate object identified, the optimum position, scale
and orientation maximizing the similarity between the target
histogram and the candidate histogram, is computed.
Conventional meanshift is used for optimum position esti-
mation. The meanshift procedure assigns weights to the pixels
in each elliptical candidate window. These weights depend on
the feature value as well as the distance of the pixel from
the ellipse centre. The distances are usually weighted by a
kernel function like the Epanechnikov or Gaussian kernel [23].
Meanshift finds the best next position of the window in every
iteration using the pixel weights. To estimate the optimum axes
lengths and orientation, the optimum bandwidth matrix, which
encodes the scale and orientation parameters of the candidate
ellipse, is computed at each iteration using the equations in
[23]. Thus ABMSOD iteratively finds the best position, scale
and orientation of the target object in a given image.
B. GPGPU Computing and CUDA
The GPU is a data-parallel computing device consisting of
a set of multiprocessing units (SM), each of which is a set of
SIMD (single instruction multiple data) processing cores. For
example, the Quadro FX 5600 GPU has 16 multiprocessing
units, each having 8 SIMD cores, resulting in a total of 128
cores. Each SM has a fixed number of registers and a fast
on-chip memory that is shared among its SIMD cores. The
different SMs share a slower off-chip device memory. Constant
memory and texture memory are read-only regions of the
device memory and accesses to these regions are cached. Local
and global memory refer to read-write regions of the device
memory and their accesses are not cached. The Compute Unified
Device Architecture (CUDA) [24] is a C-based programming
model from NVIDIA that exposes the parallel capabilities of
the NVIDIA GPU for general purpose computing.
In the CUDA context, the GPU is called the device, whereas
the CPU is called the host. A kernel refers to a function that is
executed on the GPU. A CUDA kernel is launched on the
Fig. 1. CUDA Programming and Memory Model. (Courtesy: NVIDIA)
GPU as a grid of thread blocks. A thread block contains
a fixed number of threads and can span in one, two or
three dimensions. A grid can span in one or two dimensions.
Figure 1 shows an example of a kernel launched as a two
dimensional 3×2 grid of two-dimensional 5×3 thread blocks.
Threads are uniquely identified based on their block index
and thread index within the block. A thread block is executed
on one of the multiprocessors and multiple thread blocks can
be run on the same multiprocessor. Consecutive threads of
increasing thread indices in a thread block are grouped into
what are known as warps, which are the smallest unit in which
the threads are scheduled and executed on a multiprocessor.
IV. 3D STRUCTURE LOCALIZATION ALGORITHM
To detect 3D structures in medical volumes, we extend the
ABMSOD algorithm (described in Section III-A), which was
originally proposed to detect 2D objects in the computer vision
domain. We extend the algorithm to perform 3D localizations
and adapt it for medical image processing. As a first step to
the extension, we derive the expression for the 3D bandwidth
matrix (H), which encodes the scale and orientation parameters
of the candidate ellipsoid. There are 9 independent parameters
of the candidate ellipsoid - the 3D coordinates of the center
of the ellipsoid, the lengths of the 3 axes - a, b, c - and the
orientation of the ellipsoid about the X, Y and Z axes - α,
β, and γ. The H matrix, therefore, is given by

\[ H = A A^{T}; \quad A = R(\alpha, \beta, \gamma) \times D(a, b, c) \tag{2} \]

Here, R denotes the rotation matrix and D denotes the
diagonal matrix. R is defined as

\[ R = R_{x}(\alpha) \times R_{y}(\beta) \times R_{z}(\gamma) \tag{3} \]

where R_x, R_y and R_z represent the rotation matrices about
the X, Y and Z axes respectively. D is given by

\[ D = \begin{bmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{bmatrix} \tag{4} \]
At every iteration, we scale the parameters of the ellipsoid
by a factor σ following the intuition of Ning et al. [25] to
search in a region slightly bigger than that obtained from the
previous iteration so as to capture more of the local context
around the search window. With this scaling, the equation of
the ellipsoid is given by

\[ S = \{ s \mid (x - s)^{T} H^{-1} (x - s) \le \sigma^{2} \} \tag{5} \]

Here, S is the set of all points that lie within the ellipsoid
centered at x, with a bandwidth matrix of H. The lengths of
the three axes of the ellipsoid are scaled by a factor σ.
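The membership test of Equation 5 can be sketched in C for the simplified axis-aligned case (zero rotation), where H = diag(a², b², c²) and hence H⁻¹ is diagonal; the function name is ours:

```c
/* Returns 1 iff voxel s lies inside the sigma-scaled ellipsoid centered
 * at x (Equation 5), for the axis-aligned case H = diag(a^2, b^2, c^2),
 * i.e. (x - s)^T H^{-1} (x - s) <= sigma^2. */
int inside_ellipsoid(const double x[3], const double s[3],
                     double a, double b, double c, double sigma) {
    double dx = x[0] - s[0], dy = x[1] - s[1], dz = x[2] - s[2];
    double q = dx * dx / (a * a) + dy * dy / (b * b) + dz * dz / (c * c);
    return q <= sigma * sigma;
}
```

For a general (rotated) H, the same quadratic form is evaluated with the full inverse of the bandwidth matrix; the σ scaling enlarges the window to capture local context, as described above.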
The feature values of the points in Sare used for the
histogram calculation. The candidate histogram for an ellipsoid
described by the bandwidth matrix H and centered at the point
x is given by

\[ p_{u}(x) = C_{H} \sum_{s \in S} |H|^{-1/2} K\!\left((x - s)^{T} H^{-1} (x - s)\right) \delta[b(s) - u] \tag{6} \]

where u = 1, 2, ..., M denotes the histogram bins, b(s) denotes
the bin to which the feature of pixel s belongs, C_H is the
normalization constant, and δ is the Kronecker delta function.
Any feature that can be represented by a histogram such
as the pixel intensity, histogram of gradients or local binary
patterns, can be used. The target histogram q is constructed
using the ground truth data of the structure to be localized. It
is represented in the same manner as the candidate histogram. We
also measure the center of the structure to be localized and the
scale and orientation parameters of the closest fitting ellipsoid
that encloses the structure in the training volumes and use this
to initialize the search space. In our experiments, we used one
volume for each structure for training.
The search space is defined by a bounded region around
the expected center of the structure (as computed through the
training procedure described above). The dimensions of the
search space should be large enough to account for the maxi-
mum variance in the structure positions. Our algorithm works
by scattering random points within the search space. Each of
these points marks the center of a candidate ellipsoid. The
axes lengths of the candidate ellipsoids are initialized using the
values obtained through training along with a certain variance.
All orientations are initialized to zero. This constitutes the
initial H matrix. To fix the value of σ, the algorithm is run
on the training volume using different values of σ and the value
which gives the tightest enclosing ellipsoid is chosen.
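The scattering of initial candidate centers can be sketched as uniform sampling inside the search box; the function name and box extents are ours, for illustration only:

```c
#include <stdlib.h>

/* Scatter n random candidate centers uniformly inside a search box
 * centered at the expected structure position (from training); half[]
 * holds the half-extents of the box in each dimension. */
void scatter_candidates(const double center[3], const double half[3],
                        int n, double out[][3]) {
    for (int i = 0; i < n; i++)
        for (int d = 0; d < 3; d++) {
            double u = (double)rand() / RAND_MAX;   /* uniform in [0, 1] */
            out[i][d] = center[d] - half[d] + 2.0 * half[d] * u;
        }
}
```

Each sampled point becomes the center of one candidate ellipsoid whose axes and orientation are initialized as described above.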
As mentioned in Section III-A, the ABMSOD algorithm
uses meanshift for position estimation. At each iteration of
meanshift, the next position for the search window depends
on the coordinates and weights assigned to the pixels in the
ellipse. It is given by:

\[ \hat{x} = \frac{\sum_{s \in S} G_{H}(x - s)\, w(s)\, s}{\sum_{s \in S} G_{H}(x - s)\, w(s)} \tag{7} \]

where K is the kernel function used to weigh the pixel
contributions to the histogram and w(s) is the weight
assigned to each pixel s in the search window.
We use the Gaussian function as our kernel function,
i.e. K(x) = c exp(−x/2). The function w(s) is defined as
w(s) = \sum_{u=1}^{M} \sqrt{q_u / p_u(x)}\, \delta[b(s) - u], and G_H(x) = −K'_H(x).
ABMSOD derives the expression for the optimum 2D H
matrix for estimating the scale and orientation at each iteration.
We derive the optimum H matrix for any N-dimensional super-
ellipsoid. It is shown by Chen et al. [23] that maximization of
the Bhattacharyya coefficient is equivalent to the maximization
of the following function

\[ f(x, H) = |H|^{-1/2} \sum_{s \in S} K\!\left((x - s)^{T} H^{-1} (x - s)\right) w(s) \tag{8} \]

To derive the optimum N-dimensional H matrix, we fix the
position x of the ellipsoid in Equation 8. Replacing K by the
Gaussian kernel equation, we get

\[ f(H) = c\, |H|^{-1/2} \sum_{s \in S} \exp\!\left(-\frac{(x - s)^{T} H^{-1} (x - s)}{2}\right) w(s) \tag{9} \]

By taking the logarithm on both sides and by using Jensen's
inequality, we have

\[ \ln(f(H)) \ge \sum_{s \in S} \left( \ln(c) + \frac{1}{2} \ln|H^{-1}| - \frac{(x - s)^{T} H^{-1} (x - s)}{2} \right) w(s) \tag{10} \]

Let L denote the RHS of the above equation. Differentiating
L with respect to H^{-1} and setting it to zero, we derive

\[ H = \frac{\sum_{s \in S} (x - s)(x - s)^{T} w(s)}{\sum_{s \in S} w(s)} \tag{11} \]

Equation 11 is used to update the H matrix in every iteration.
The algorithm is terminated after a fixed maximum number
of iterations or when the Bhattacharyya coefficient saturates.
This iterative procedure is executed for each of the initial
points of the search space. Thus, each of these points con-
verges to a local maximum, giving the optimum position and
ellipsoid parameters in the neighbourhood of the initial point.
The ellipsoid with the maximum Bhattacharyya coefficient
localizes the structure most accurately and is the output of the
algorithm. A sufficient number of points is necessary to ensure
convergence to the correct solution.
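The bandwidth update of Equation 11 is simply a weighted average of outer products, which can be sketched in C as follows; the function name is ours, and voxel positions and weights are illustrative:

```c
#include <math.h>

/* Bandwidth update (Equation 11): H is the weighted average of the
 * outer products (x - s)(x - s)^T over the n voxels in the window. */
void update_H(const double x[3], const double s[][3], const double w[],
              int n, double H[3][3]) {
    double wsum = 0.0;
    for (int r = 0; r < 3; r++)
        for (int c = 0; c < 3; c++) H[r][c] = 0.0;
    for (int i = 0; i < n; i++) {
        double d[3] = {x[0]-s[i][0], x[1]-s[i][1], x[2]-s[i][2]};
        for (int r = 0; r < 3; r++)
            for (int c = 0; c < 3; c++) H[r][c] += w[i] * d[r] * d[c];
        wsum += w[i];
    }
    for (int r = 0; r < 3; r++)
        for (int c = 0; c < 3; c++) H[r][c] /= wsum;
}
```

Because H is a weighted covariance of the voxel offsets, it automatically stretches along the directions in which high-weight voxels spread out, which is how the ellipsoid adapts its scale and orientation.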
A. Parallel 3D Localization on Multi-cores
The iterative search for the optimum position, shape and
orientation for each initial random point is independent of each
other. Hence, for an n-core processor, we spawn n threads,
where each thread simultaneously performs the iterative search
procedure for an initial random point. Since the computations
for each point are independent, there is no need for sharing
memory among the threads. Each thread has local memory
for the meanshift algorithm specific calculations. We thus use
the OpenMP ’parallel for’ construct to distribute the points
amongst the threads. As each search path can take different
times for convergence, the ’schedule(dynamic)’ clause is used
to dynamically load balance the distribution of points amongst
the threads. This results in a speed-up of 3.5x for the parallel
implementation on the CPU.
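A minimal sketch of this parallelization strategy is shown below; `refine_candidate` is a hypothetical stand-in for the iterative mean-shift search, with deliberately uneven work per point so that dynamic scheduling has something to balance:

```c
/* Stand-in for the per-candidate iterative search; work varies per point
 * just as real search paths converge after different iteration counts. */
static double refine_candidate(int i) {
    double acc = 0.0;
    for (int k = 0; k < 1000 * (i % 7 + 1); k++)
        acc += (double)k * 1e-6;
    return acc;   /* stands in for the final Bhattacharyya coefficient */
}

/* Distribute the n independent searches across CPU cores; with dynamic
 * scheduling, threads grab new points as they finish earlier ones. */
void localize_all(int n, double *score) {
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n; i++)
        score[i] = refine_candidate(i);
}

/* Final step of the algorithm: pick the candidate with the best score. */
int best_candidate(const double *score, int n) {
    int best = 0;
    for (int i = 1; i < n; i++)
        if (score[i] > score[best]) best = i;
    return best;
}
```

No data is shared between iterations, so no synchronization is needed inside the loop; compiled without OpenMP the pragma is ignored and the code simply runs serially.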
B. Parallel 3D Localization on GPUs
We design and implement the parallel variant of the lo-
calization technique on GPUs using the CUDA programming
model [24]. As explained in Section III-B, CUDA exposes two
levels of parallelism - a coarser level using thread blocks and
a finer level using threads within a thread block. The inde-
pendent exploration of different search paths originating from
each initial random point is distributed amongst thread blocks.
In addition, using the finer level of parallelism offered by
threads within a thread block, we further parallelize operations
within each search iteration. We make the threads in a thread
block handle computations for a subset of the voxels from a
cube which completely encloses the ellipsoid. Computations
such as the application of the kernel function, construction of
the candidate histogram, weight assignment to the voxels, H
matrix computation etc. are all done in parallel where each
thread is responsible for a set of voxels. Summation of values
across threads is performed through parallel reduction.
Algorithm 1 depicts the pseudo-code for the GPU acceler-
ated 3D localization algorithm. To reduce the synchronization
operations among threads during histogram computation, we
allow each thread to construct a local histogram of the voxels
handled by that thread. These histograms are stored in shared
memory for fast access. After all local histograms are con-
structed, the histograms are bin wise aggregated by the threads
in parallel to form the global candidate histogram. The CT
volume is stored in a 3D texture and the access to the volume
is ensured to be in a way that maximizes spatial locality and
efficiently utilizes the texture cache. Constant variables like
σand the target histogram are stored in constant memory
to utilize the constant cache. Data shared by threads within
a block like the H matrix, local histograms etc. are stored
in shared memory for fast retrieval through the broadcast
mechanism supported by CUDA. Enough threads
are launched to keep all the cores of each streaming multi-
processor (SM) busy. We try to maximize the occupancy for
each SM. The occupancy is however limited by the amount of
shared memory and the number of registers available per SM.
We also ensure that there is no register spilling. The kernels
are designed to minimize the warp divergence and maximize
the Instructions per cycle (IPC). In addition all global memory
accesses (loading the initial ellipsoid parameters and storing
back the final optimum parameters for each ellipsoid) are
coalesced. This gives us a speed-up of 97x for the GPU
implementation of the algorithm.
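The local-histogram scheme described above can be sketched serially in plain C; on the GPU, each of the two phases below runs across the threads of one block, and the second phase is the bin-wise parallel reduction. The function name and sizes are ours:

```c
#define BINS 8
#define THREADS 4

/* Histogram privatization: each "thread" t accumulates a local histogram
 * over its strided subset of the n voxels (phase 1), then the local
 * histograms are aggregated bin-wise into the global candidate
 * histogram (phase 2). */
void histogram_privatized(const int *bin_of_voxel, int n, int global[BINS]) {
    int local[THREADS][BINS] = {{0}};
    /* phase 1: per-thread local histograms, no synchronization needed */
    for (int t = 0; t < THREADS; t++)
        for (int v = t; v < n; v += THREADS)
            local[t][bin_of_voxel[v]]++;
    /* phase 2: bin-wise aggregation (a parallel reduction on the GPU) */
    for (int u = 0; u < BINS; u++) {
        global[u] = 0;
        for (int t = 0; t < THREADS; t++)
            global[u] += local[t][u];
    }
}
```

The point of the privatization is that phase 1 needs no atomic updates at all; the only cross-thread communication is the reduction in phase 2, which matches the shared-memory layout described above.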
V. PERFORMANCE EVALUATION
The proposed structure localization algorithm was evalu-
ated on a system with an Intel(R) Xeon(R) X5450 quad-core
processor clocked at 3 GHz and with 3.25 GB RAM. The
system included a NVIDIA Tesla C2050 GPU as a PCI-
express device. This GPU has 14 multi-processors each having
32 CUDA cores, resulting in a total of 448 CUDA cores. The cores
are clocked at 1.15 GHz. Each multi-processor has 48 KB of
shared memory and 32 K registers. The GPU device has 3
GB of device memory. The Tesla C2050 GPU has compute
capability 2.0, and CUDA Toolkit and SDK version 4.0
were used to develop and execute CUDA kernels on the device.
We implemented three variants of the structure localization
algorithm - a sequential version in C, an OpenMP-based parallel
version that utilizes the 4 cores of the Xeon X5450 processor,
Algorithm 1 GPU accelerated 3D localization algorithm
1: Initialize randomly the positions x of N ellipsoids within a fixed search space centered about the structure location in the target
2: Initialize randomly the scales with a certain variance about the scales of the target structure
3: Initialize all orientations to zero and construct the bandwidth matrix according to Equations 2, 3 and 4
4: for each candidate ellipsoid S_c centered at x with a bandwidth matrix H do ▷ Perform the iterative search for the optimum position, scale and orientation for each candidate ellipsoid
5:   iter_cnt ← 1
6:   bhat_cf ← 0 ▷ Initialize Bhattacharyya coefficient
7:   max_bhat_cf ← 0 ▷ Maximum Bhattacharyya coefficient across all iterations
8:   delta_bhat_cf ← THRESHOLD
9:   x_opt ← x
10:  H_opt ← H
11:  while (delta_bhat_cf ≥ THRESHOLD and iter_cnt < MAX_ITERATIONS) do ▷ Termination criteria for the meanshift search
12:    for all CUDA threads t ∈ CUDA thread block B_c do ▷ One CUDA thread block is launched for each candidate ellipsoid
13:      V(t) ← set of voxels handled by thread t
14:      for each voxel v ∈ V(t) that satisfies Equation 5 do ▷ for every voxel inside the candidate ellipsoid
15:        LCH_t(b_v) ← LCH_t(b_v) + c·exp(−d_v/2) ▷ LCH is the partial candidate histogram local to each thread, b_v is the bin index for voxel v and d_v is the distance of voxel v from the center of the ellipsoid
16:      GCH_c ← global candidate histogram ▷ all threads do a parallel reduction to aggregate the local histograms and form the global histogram
17:      for each voxel v ∈ V(t) that satisfies Equation 5 do ▷ for every voxel inside the candidate ellipsoid
18:        w_v ← sqrt(TH(b_v)/GCH_c(b_v)) ▷ A weight is computed for each voxel from the ratio of the bin heights of the target histogram and the candidate histogram. TH refers to the target histogram
19:        δx_v ← G_H(v)·w_v·s_v ▷ δx_v denotes the contribution of the voxel v to the change in the position of the ellipsoid and is computed according to Equation 7
20:      x_new ← (Σ_v δx_v) / (Σ_v G_H(v)·w_v) ▷ x_new is the new position of the candidate ellipsoid and all threads do a parallel reduction to compute the summations required in this step
21:      for each voxel v ∈ V(t) that satisfies Equation 5 do ▷ for every voxel inside the candidate ellipsoid
           Repeat Steps 15 to 18 for the candidate ellipsoid S_c centered at x_new
22:        δH_v ← (x_new − s_v)(x_new − s_v)^T·w_v ▷ δH_v denotes the contribution of a voxel to the H matrix
23:      H_new ← (Σ_v δH_v) / (Σ_v w_v) ▷ H_new is the optimum H matrix at the new position of the candidate ellipsoid and all threads do a parallel reduction to compute the summations required in this step
24:      bhat_cf ← Σ_u sqrt(GCH_c(u)·TH(u)) ▷ Bhattacharyya coefficient of Equation 1
25:      delta_bhat_cf ← |bhat_cf − max_bhat_cf|
26:      if bhat_cf > max_bhat_cf then
27:        max_bhat_cf ← bhat_cf
28:        x_opt ← x_new
29:        H_opt ← H_new
30:    iter_cnt ← iter_cnt + 1
31: Among the N final candidate ellipsoids, output x_opt and H_opt corresponding to the ellipsoid with the maximum Bhattacharyya coefficient.
and a CUDA-based GPU accelerated version. We evaluate the
quality and accuracy of the proposed algorithm as well as the
performance and scalability for the three variants described
above.
The algorithm was tested on 3 structures in CT volumes
- brain stem, eye and the parotid gland. We tested on 17
CT volumes for the brain stem localization, 19 volumes for
the right eye and 9 volumes for the left parotid gland. The
experiments were conducted with N = 400 initial candidate
ellipsoids, a maximum of 30 iterations and a termination
threshold value of 0.02. All values reported are averages of
three runs. Figure 2 shows the ground truth for these structures
and the corresponding localized region as detected by our
algorithm for one volume. Though the algorithm performs 3D
localization, for easy visualization, we depict the 2D central
slices of the volume. The ground truth is shown after contrast
enhancement to enable easy viewing of the structure. However,
as can be seen in the images showing the localization results,
the contrast resolution of medical images is low, which makes
the task of accurate localization very difficult.
Figure 3 shows the extent to which appropriate scales and
orientations are captured. We see that our approach, which
computes the orientation and scale parameters in tandem
in continuous space rather than in discrete space, is able to
adapt to the scale and orientations of the brainstem structure
quite smoothly. The localized region in the XY plane (see
Figure 2(b)) is almost a circle while in XZ and YZ planes,
(see Figure 3), the scale and orientations of the plotted ellipse
adapt to the orientation and shape of the structure.
We measure the quality of the localization achieved using
two metrics: coverage ratio (CR) and tightness ratio (TR).
Coverage ratio is a measure of the extent of coverage of the
structure of interest within the localized region and is defined
as the percentage volume of the structure that is encapsulated
within the localized region, i.e. CR = (S ∩ L)/S, where S denotes
the structure of interest and L denotes the localized region.
The tightness ratio (TR) is a measure of how tightly the
structure is localized and is defined as the percentage volume
of the localized region that is composed of the structure under
consideration, i.e. TR = (S ∩ L)/L.
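Both metrics can be sketched in C over binary voxel masks of the structure S and the localized region L; the function name and the 1-D stand-in masks used below are ours, for illustration:

```c
/* Coverage ratio CR = |S and L| / |S| and tightness ratio
 * TR = |S and L| / |L|, as percentages, computed from two binary voxel
 * masks of length n (structure S and localized region L). */
void localization_quality(const int *S, const int *L, int n,
                          double *cr, double *tr) {
    int s_cnt = 0, l_cnt = 0, both = 0;
    for (int i = 0; i < n; i++) {
        s_cnt += S[i];
        l_cnt += L[i];
        both  += S[i] & L[i];     /* voxel in the intersection */
    }
    *cr = 100.0 * both / s_cnt;   /* % of the structure that is covered */
    *tr = 100.0 * both / l_cnt;   /* % of the region that is structure  */
}
```

For example, a region that overlaps half of the structure but is three times its size gives CR = 50% and TR well below 50%, which is why the two metrics are reported together.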
(a) Brain Stem (b) Right Eye (c) Left Parotid
Fig. 2. Contrast-enhanced ground truth (left) and localized regions (right) (central slice along the XY plane) for brain stem, right eye, and the left parotid
gland
Fig. 3. Contrast-enhanced ground truth and localized regions (central slice
along the XZ plane (top) and the YZ plane (bottom)) for the brain stem
TABLE I
AVERAGE LOCALIZATION TIME IN SECONDS FOR THE SEQUENTIAL,
MULTI-CORE ACCELERATED AND THE GPU ACCELERATED VARIANTS

Structure      Seq     Multi-core accel   GPU accel
Brain Stem     545.8   151.9              5.5
Right Eye      269.3   73.9               2.9
Left Parotid   1070.9  302.7              9.39
Figure 4(a) shows the plot of the coverage ratio. In general,
the quality of localizations is better for the brain stem and
the right eye (brain stem and the right eye are localized with
at least 50% coverage in about two-thirds of the volumes). This
is because we use an intensity histogram based approach for
localization and the contrast ratios are higher for the brain stem
and the right eye as compared to the parotid gland. In 40%
of the runs, we are able to encapsulate more than 90% of the
structure, while in 65% of the runs we are able to capture the
structure partially with at least 50% coverage. A coverage
ratio of zero implies an incorrect localization, which accounts
for 16% of the runs. Figure 4(b) plots the tightness ratio for
the three structures. On average, about 20% of the localized
region is made up of the structure being localized. Since we
use a simplistic intensity histogram as our target description
function, shifts in intensity profiles across volumes as well
as the low contrast ratio inherent in medical images, impact
the quality of localization. One approach to address this is to
plug-in better descriptor functions (feature-based and domain
specific), while trading off between the computational intensity
of the training and detection and the required accuracy levels.
Table I shows the average localization times. We see that
the multi-core version yields an average speedup of about
3.5x over the sequential variant on the 4 cores, while the
GPU accelerated version yields an average speedup of about
97x over the sequential variant. Thus, leveraging the GPUs
as massively parallel computing devices yields two orders of
magnitude improvements in localization speeds.
To evaluate the scalability of the proposed technique, we
varied the search space and measured the quality as well as
performance. Figure 5 plots the localization times as well as
the coverage ratio for brain stem for one volume as we increase
the dimensions of the search space (we increase the x, y,
z dimensions of the search space as well as the number of
search points, keeping the density of search points constant).
As expected, we see that the quality improves as we
increase the search space. The GPU accelerated variant scales
extremely well as compared to the other two. Thus, leveraging
the graphics processors as massively parallel computing de-
vices yields large improvements in both localization
speed and scalability.
VI. CONCLUSION AND FUTURE WORK
This paper proposes an iterative approach for structure
localization in medical volumes that is based on the adaptive
bandwidth mean-shift algorithm for object detection (ABM-
SOD). The ABMSOD algorithm, originally used to detect
2D objects in non-medical images, is extended and tuned
to localize 3D anatomical structures in medical volumes. To
enable fast localization, optimized parallel implementations
are developed on multi-cores using OpenMP and GPUs using
CUDA. Our evaluations with three structures (brain stem, eye
and parotid) in CT volumes show that our algorithm is able
to localize structures with reasonable accuracy in many cases.
However, we noted that the algorithm is sensitive to intensity
changes across volumes and has a poor performance when
the intensity difference between the train and test volumes
is high. To address this issue, we plan to further investigate
the algorithm with different target descriptor functions like
local binary patterns (LBP), histogram of gradients or haar
features. Features like LBP are intensity and rotation invariant
and hence should be able to adapt to large differences in
intensity between train and test volumes.
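As a minimal illustration of why LBP-style descriptors are attractive here, the sketch below computes a basic 8-neighbour LBP code on a 2D slice and shows that it is unchanged by an additive intensity shift. The function name and the 2D simplification are ours; a 3D variant would threshold against a spherical neighbourhood instead:

```python
def lbp_code(img, y, x):
    """Basic 8-neighbour local binary pattern at pixel (y, x).

    Each neighbour contributes one bit: 1 if it is >= the centre pixel.
    Illustrative 2D sketch; a volumetric descriptor needs a 3D extension.
    """
    centre = img[y][x]
    # Clockwise neighbour offsets starting at the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if img[y + dy][x + dx] >= centre:
            code |= 1 << bit
    return code

slice_ = [[10, 20, 30],
          [40, 50, 60],
          [70, 80, 90]]
shifted = [[v + 100 for v in row] for row in slice_]

# The code depends only on relative intensities, so an additive
# shift of the whole slice leaves it unchanged:
print(lbp_code(slice_, 1, 1) == lbp_code(shifted, 1, 1))  # True
```

Because each bit encodes only a comparison against the centre voxel, any monotonic intensity transform of the volume leaves the histogram of such codes unchanged, which is exactly the robustness the conclusion calls for.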
REFERENCES
[1] J. T. Bushberg, J. A. Seibert, E. M. Leidholdt, and J. M. Boone, The Essential Physics of Medical Imaging, 2nd ed. Lippincott Williams & Wilkins, Dec. 2001.
Fig. 4. Scatter plot of (a) coverage ratio and (b) tightness ratio.
Fig. 5. Scalability of the localization against increasing dimensions of the search space.
[2] X. Zhou, T. Hayashi, T. Hara, H. Fujita, R. Yokoyama, T. Kiryu, and H. Hoshi, "Automatic segmentation and recognition of anatomical lung structures from high-resolution chest CT images," Computerized Medical Imaging and Graphics, vol. 30, no. 5, pp. 299–313, 2006.
[3] B. C. Lucas, M. Kazhdan, and R. H. Taylor, "Multi-object geodesic active contours (MOGAC): A parallel sparse-field algorithm for image segmentation," Johns Hopkins University Department of Computer Science, Tech. Rep., Feb. 2012.
[4] D. Jimenez-Carretero, L. Fernandez-de-Manuel, J. Pascau, J. M. Tellado, E. Ramon, M. Desco, A. Santos, and M. J. Ledesma-Carbayo, "Optimal multiresolution 3D level-set method for liver segmentation incorporating local curvature constraints," in Intl. Conf. of the IEEE Engineering in Medicine and Biology Society, 2011, pp. 19–22.
[5] X. Chen, H. Huang, H. Zheng, and C. Li, “Adaptive bandwidth mean
shift object detection,” in IEEE Conf. on Robotics, Automation and
Mechatronics (RAM), 2008, pp. 210–215.
[6] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Active shape models - their training and application," Comput. Vis. Image Underst., vol. 61, no. 1, pp. 38–59, Jan. 1995.
[7] T. F. Cootes, G. J. Edwards, and C. J. Taylor, “Active appearance mod-
els,” IEEE Transactions on Pattern Analysis and Machine Intelligence,
vol. 23, pp. 681–685, 2001.
[8] S. Frantz, K. Rohr, and H. S. Stiehl, "Localization of 3D anatomical point landmarks in 3D tomographic images using deformable models," in Intl. Conf. on Medical Image Computing and Computer-Assisted Intervention, ser. MICCAI '00, 2000, pp. 492–501.
[9] S. Wörz and K. Rohr, "Localization of anatomical point landmarks in 3D medical images by fitting 3D parametric intensity models," Medical Image Analysis, vol. 10, no. 1, pp. 41–58, 2006.
[10] Y. Zheng, A. Barbu, B. Georgescu, M. Scheuering, and D. Comaniciu, "Four-chamber heart modeling and automatic segmentation for 3-D cardiac CT volumes using marginal space learning and steerable features," IEEE Trans. Med. Imaging, vol. 27, no. 11, pp. 1668–1681, 2008.
[11] Y. Zheng, B. Georgescu, and D. Comaniciu, "Marginal space learning for efficient detection of 2D/3D anatomical structures in medical images," in Proceedings of the 21st International Conference on Information Processing in Medical Imaging, ser. IPMI '09, 2009, pp. 411–422.
[12] A. Criminisi, J. Shotton, and S. Bucciarelli, "Decision forests with long-range spatial context for organ localization in CT volumes," in Proceedings of the MICCAI Workshop on Probabilistic Models for Medical Image Analysis (MICCAI-PMMIA), 2009.
[13] "Automated localization of solid organs in 3D CT images: A majority voting algorithm based on ensemble learning," poster at the International Workshop on Machine Learning in Medical Imaging (in conjunction with MICCAI 2010), http://miccai-mlmi.uchicago.edu/past wkshp/2010/pdf/paper01.pdf.
[14] O. Pauly, B. Glocker, A. Criminisi, D. Mateus, A. M. Möller, S. Nekolla, and N. Navab, "Fast multiple organ detection and localization in whole-body MR Dixon sequences," in Intl. Conf. on Medical Image Computing and Computer-Assisted Intervention - Volume Part III, 2011, pp. 239–247.
[15] X. Zhou, A. Watanabe, X. Zhou, T. Hara, R. Yokoyama, M. Kanematsu, and H. Fujita, "Automatic organ segmentation on torso CT images by using content-based image retrieval," in Medical Imaging 2012: Image Processing: Proceedings of Society of Photo-Optical Instrumentation Engineers (SPIE), vol. 8314, Feb. 2012.
[16] M. de Bruijne, B. van Ginneken, W. Niessen, J. Maintz, and
M. Viergever, “Active shape model based segmentation of abdominal
aortic aneurysms in CTA images,” in Medical Imaging 2002: Image
Processing: (SPIE), vol. 4684, 2002, pp. 463–474.
[17] N. Boukala, E. Favier, B. Laget, and P. Radeva, “Active appearance
model-based segmentation of hip radiographs,” in In Proceedings of the
IEEE International Conference on Industrial Technology (ICIT), 2004.
[18] N. Boukala, E. Favier, and B. Laget, “Active appearance model-based
segmentation of hip radiographs,” in In Proceedings of Society of Photo-
Optical Instrumentation Engineers (SPIE), vol. 5747, 2005, p. 443.
[19] M. de Bruijne, B. van Ginneken, M. A. Viergever, and W. J. Niessen, "Adapting active shape models for 3D segmentation of tubular structures in medical images," in Information Processing in Medical Imaging, 2003, pp. 136–147.
[20] O. Ecabert, J. Peters, H. Schramm, C. Lorenz, J. von Berg, M. J. Walker, M. Vembar, M. E. Olszewski, K. Subramanyan, G. Lavi, and J. Weese, "Automatic model-based segmentation of the heart in CT images," IEEE Transactions on Medical Imaging, vol. 27, no. 9, pp. 1189–1201, 2008.
[21] H. Ling, S. K. Zhou, Y. Zheng, B. Georgescu, M. Sühling, and D. Comaniciu, "Hierarchical, learning-based automatic liver segmentation," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), June 2008.
[22] J. Yao, S. D. O'Connor, and R. M. Summers, "Automated spinal column extraction and partitioning," in IEEE Intl. Symp. on Biomedical Imaging: From Nano to Macro, Apr. 2006, pp. 390–393.
[23] X. Chen, H. Huang, H. Zheng, and C. Li, “Adaptive bandwidth mean
shift object tracking,” in IEEE Conf. on Robotics, Automation and
Mechatronics (RAM), 2008, pp. 1011–1017.
[24] "NVIDIA CUDA C Programming Guide," http://developer.download.nvidia.com/compute/cuda/32/toolkit/docs/CUDA.
[25] J. Ning, L. Zhang, D. Zhang, and C. Wu, “Scale and orientation adaptive
mean shift tracking,” Institution of Engineering and Technology (IET)
Computer Vision, vol. 6, no. 1, pp. 52–61, 2012.