# Ball-Scale Based Hierarchical Multi-Object Recognition in 3D Medical Images

**ABSTRACT** This paper investigates, using prior shape models and the concept of ball scale (b-scale), ways of automatically recognizing objects in 3D images without performing elaborate searches or optimization. That is, the goal is to place the model in a single shot close to the right pose (position, orientation, and scale) in a given image so that the model boundaries fall in the close vicinity of object boundaries in the image. This is achieved via the following set of key ideas: (a) A semi-automatic way of constructing a multi-object shape model assembly. (b) A novel strategy of encoding, via b-scale, the pose relationship between objects in the training images and their intensity patterns captured in b-scale images. (c) A hierarchical mechanism of positioning the model, in a one-shot way, in a given image from a knowledge of the learnt pose relationship and the b-scale image of the given image to be segmented. The evaluation results on a set of 20 routine clinical abdominal female and male CT data sets indicate the following: (1) Incorporating a large number of objects improves the recognition accuracy dramatically. (2) The recognition algorithm can be thought of as a hierarchical framework in which quick placement of the model assembly constitutes coarse recognition and delineation itself constitutes the finest level of recognition. (3) Scale yields useful information about the relationship between the model assembly and any given image, such that recognition places the model close to the actual pose without any elaborate searches or optimization. (4) Effective object recognition can make delineation more accurate. Comment: This paper was published and presented in SPIE Medical Imaging 2010.


arXiv:1002.1288v1 [cs.CV] 5 Feb 2010


ULAŞ BAĞCI, JAYARAM K. UDUPA, AND XINJIAN CHEN


Contents

1. Introduction

2. Methods

3. Model Building

3.1. Establishing Correspondence Across Shapes

3.2. Specifying Landmarks

3.3. Single and Multiple-Object 3D Statistical Shape Models

4. Relationship Between Shape and Intensity Structure Systems

4.1. Ball Scale Encoding

4.1.1. Computation of b-scale

4.1.2. Intensity Weighted Ball Scale - $WB_{scale}$

4.2. Relationship Vector

4.3. Estimation of Scale Parameter - $f_1 = s$

4.4. Estimation of Translation Parameters - $f_2 = t : (t_x, t_y, t_z)$

4.5. Estimation of Orientation Parameters - $f_3 = R : (R_x, R_y, R_z)$

5. Hierarchical Recognition

6. Evaluations and Results

7. Conclusion

8. Acknowledgement

References

1. Introduction

The segmentation process as a whole can be thought of as consisting of two tasks: recognition and delineation. Recognition is the task of determining roughly "where" the object is and distinguishing it from other object-like entities. Although delineation is the final step that defines the spatial extent of the object region/boundary in the image, an efficient recognition strategy is key to successful delineation. In this study, a novel, general method is introduced for object recognition to assist in segmentation (delineation) tasks. It exploits the pose relationship that can be encoded, via the concept of ball scale (b-scale) [2], between the binary training objects and their associated images.

As an alternative to the manual methods in the literature, which rely on initial placement of the models by an expert [19, 20], model-based methods can be employed for recognition. For example, in [21], the position of an organ model (such as the liver) is estimated from its histogram. In [22], the generalized Hough transform is successfully extended to incorporate shape variability for the 2D segmentation problem. Atlas-based methods are also used to define the initial position of a shape model. In [24], affine registration is performed to align the data to an atlas in order to determine the initial position of a shape model of the knee cartilage. Similarly, a popular particle filtering algorithm is used to detect the starting pose of models for both single- and multi-object cases [23]. However, because of the large search space and numerous local minima in most of these studies, conducting a global search over the entire image is not a feasible approach. In this paper, we investigate an approach for automatically recognizing objects in 3D images without performing elaborate searches or optimization.

2. Methods

The proposed method consists of the following key ideas and components:

1. Model Building: After aligning image data from all $N$ subjects in the training set into a common coordinate system via 7-parameter affine registration, the live-wire algorithm [7] is used to segment $M$ different objects from the $N$ subjects. The segmented objects are used for the automatic extraction of landmarks in a slice-by-slice manner [5]. From the landmark information for all objects, a model assembly $MA$ is constructed.

2. b-scale encoding: The b-scale value at every voxel helps to capture the "objectness" of a given image without explicit segmentation. For each voxel, the radius of the largest ball of homogeneous intensity is weighted by the intensity value of that voxel in order to incorporate appearance (texture) information into the object information (called intensity weighted b-scale, $WB_s$), so that a model of the correlations between shape and texture can be built. Simple thresholding of the b-scale image retains only a few of the largest balls in the image. These are used to construct the relationship between the segmented training objects and the corresponding images. The resulting images have a strong relationship with the actual delineated objects.

3. Relationship between $MA$ and $WB_s$: A principal component (PC) system is built via PCA for the segmented objects in each image, and their mean PC system, denoted $PC_o$, is found over all training images. $PC_o$ has an origin and three PC axes. Similarly, the mean PC system, denoted $PC_b$, of the intensity weighted b-scale images ($WB_s$) is found. Finally, the transformation $F$ that maps $PC_b$ to $PC_o$ is found. Given an image $I$ to be segmented, the main idea is to use $F$ for a quick placement of $MA$ in $I$ with a proper pose, as indicated in Step 4 below.

4. Hierarchical Recognition: For a given image $I$, $WB_s$ is obtained and its PC system, denoted $PC_b^I$, is computed. Assuming that the relationship of $PC_b^I$ to $PC_o$ is the same as that of $PC_b$ to $PC_o$, and that $PC_o$ offers the proper pose of $MA$ in the training images, we use the transformation $F$ and $PC_b^I$ to determine the pose of $MA$ in $I$. This level of recognition is called coarse recognition. The recognition can be refined further using the skin boundary object in the image, with the requirement that a major portion of $MA$ should lie inside the body region delimited by the skin boundary. A small search inside the skin boundary could also be done for fine tuning; however, since the proposed coarse recognition method already gives high recognition rates, no elaborate search is needed. We leave the fine tuning of coarse recognition for future study. The finest level of recognition requires the actual delineation algorithm itself, which in our case is a hybrid method called GC-ASM (synergistic integration of graph-cut and active shape model). This delineation algorithm is presented in a companion paper submitted to this symposium [8].

3. Model Building

A convenient way of automatically incorporating prior information into computing systems is to create and use a flexible model that encodes information such as the expected size, shape, appearance, and position of objects in an image [9]. Among such information, shape and appearance are two complementary but closely related attributes of biological structures in images, and hence they are often used to create statistical models. In particular, shape has been used extensively in both high- and low-level image analysis tasks, and it has been demonstrated that shape models (such as Active Shape Models (ASMs)) can be quite powerful in compensating for misleading information due to noise, poor resolution, clutter, and occlusion in the images [4, 10, 11]. Therefore, we use ASM [6] to estimate population statistics from a set of examples (the training set). In order to guarantee the 3D point correspondences required by ASM, we build our statistical shape models by combining two semi-automatic steps: (1) anatomically corresponding slices are selected manually by an expert, and (2) key points on the shapes are specified semi-automatically, starting from the same anatomical locations. Once Step (1) is accomplished, the remaining problem reduces to establishing point correspondence among 2D shapes, which is a much simpler problem.

3.1. Establishing Correspondence Across Shapes. Choosing correct correspondences is essential for obtaining a good representation of the modelled object. Although landmark correspondence is usually established manually by experts, manual annotation is time-consuming, prone to errors, and restricted to 2D objects [9, 12]. Because of these limitations, a semi-automatic landmark tagging method, equal space landmarking, is used in our experiments to establish correspondence between the landmarks of each sample shape. Although this method was proposed for 2D objects, and equally spacing a fixed number of points on 3D objects is much more difficult, we use the equal space landmarking technique in a pseudo-3D manner, where each 3D object is annotated slice by slice.


3.2. Specifying Landmarks. Let $S \subset \mathbb{R}^3$ be a single shape, and assume that its finite-dimensional representation after landmarking consists of $n$ landmark points with positions $LM^{(i)} \in S$, $i = 1, \ldots, n$, where $LM^{(i)} = (x^{(i)}, y^{(i)}, z^{(i)})$ are the Cartesian coordinates of the $i$th point on the shape $S$. Equal space landmark tagging of the points $LM^{(i)}$, $i = 1, \ldots, n$, on shape boundaries (contours) starts by selecting an initial point on each shape sample in the training set and then automatically placing a fixed number of equally spaced points on each boundary [9]. The starting point is selected manually by annotating the same anatomical point for each shape in the training set.
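The equal space tagging step can be sketched in Python/NumPy for one closed 2D contour. This is an illustrative sketch, not the authors' implementation: the function name `equal_space_landmarks`, the arc-length parameterization, and the linear interpolation between boundary points are assumptions, and `start_index` stands in for the manually chosen anatomical starting point.

```python
import numpy as np

def equal_space_landmarks(contour, n, start_index=0):
    """Place n landmarks equally spaced by arc length along a closed 2D contour.

    contour: (m, 2) array of ordered boundary points; the last point is
             assumed to connect back to the first.
    start_index: index of the (manually chosen) anatomical starting point.
    """
    # Rotate the point order so the anatomical starting point comes first.
    pts = np.roll(np.asarray(contour, dtype=float), -start_index, axis=0)
    closed = np.vstack([pts, pts[:1]])             # close the loop explicitly
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])  # cumulative arc length
    targets = np.arange(n) * cum[-1] / n           # n equally spaced arc positions
    x = np.interp(targets, cum, closed[:, 0])
    y = np.interp(targets, cum, closed[:, 1])
    return np.column_stack([x, y])
```

In the pseudo-3D setting described above, the routine would be applied to each annotated slice separately, with that slice's z coordinate appended to every landmark.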

Figure 1 shows annotated landmarks for five different objects (skin, liver, right kidney, left kidney, spleen) in a CT slice of the abdominal region. Note that different numbers of landmarks are used for different objects, depending on their size.

Figure 1. A CT slice of the abdominal region with selected objects (skin, liver, spleen, and left and right kidneys) is shown on the left. Annotated landmarks for the selected objects are shown on the right.

3.3. Single and Multiple-Object 3D Statistical Shape Models. Statistical models of shape variability, or active shape models (ASMs), represent objects by a finite number of landmarks and examine the statistics of their coordinates over a training set. The characteristic pattern of a shape class is described by the average shape vector (mean shape), formed by simply averaging over all shape samples, together with a linear combination of the eigenvectors of the variations around the mean shape. An instance of a shape from the same object class can then be generated by deforming the mean shape by a linear combination of the retained eigenvectors. For a summary of ASM, see [6, 1].
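A minimal Python/NumPy sketch of the ASM shape statistics just described, assuming the training shapes are already aligned and in landmark correspondence. The names `build_asm` and `asm_instance` and the variance-retention rule `var_keep` are illustrative choices, not taken from the paper; the covariance eigenvectors are obtained from an SVD of the centered data rather than by forming the covariance matrix.

```python
import numpy as np

def build_asm(shapes, var_keep=0.98):
    """Point-distribution model from aligned training shapes.

    shapes: (N, n, 3) array of N aligned shapes with n corresponding landmarks.
    Returns the mean shape vector (3n,), the retained modes P (3n x t), and
    their variances (t,), keeping enough modes to explain `var_keep` of the variance.
    """
    X = np.asarray(shapes, dtype=float).reshape(len(shapes), -1)  # N x 3n
    mean = X.mean(axis=0)
    # Right singular vectors of the centered data = covariance eigenvectors.
    _, sv, Vt = np.linalg.svd(X - mean, full_matrices=False)
    variances = sv ** 2 / (len(X) - 1)
    frac = np.cumsum(variances) / variances.sum()
    t = min(int(np.searchsorted(frac, var_keep)) + 1, len(variances))
    return mean, Vt[:t].T, variances[:t]

def asm_instance(mean, P, b):
    """Shape instance x = mean + P b, returned as an (n, 3) landmark array."""
    return (mean + P @ np.asarray(b, dtype=float)).reshape(-1, 3)
```

In the notation of the model assembly below, `mean` plays the role of one object's mean shape, while `P` and the returned variances summarize the eigen-decomposition of its covariance matrix.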

In multiple-object ASM, a model assembly ($MA$) is built, wherein each object class brings its own ASM into the framework. Therefore, $MA$ can be expressed as a set of models of the form

$$MA = \{M_1, \ldots, M_M\}, \tag{1}$$

where $M$ denotes the number of objects considered in the model assembly and each model $M_i = (x_i, \Lambda_i)$ consists of a mean shape $x_i$ and allowable variations given by the covariance matrix $\Lambda_i$ for object $O_i$, $1 \le i \le M$.

Figure 2 shows multi-object 3D ASMs for abdominal organs. Note that the skin boundary is also included in the MAs to restrict the search space. Note also that the mean shapes of the objects do not overlap with one another. This is because, in the training part, we select the objects $O_i$ such that $O_i \cap O_j = \emptyset$ for all $i \neq j$, $i, j \in \{1, \ldots, M\}$; that is, there is no overlap across the objects. This leads to $x_i \cap x_j = \emptyset$ for $i \neq j$, since each mean shape $x_i$ is created independently and the alignment of the shapes does not affect the relative arrangement of the objects, owing to the nature of the 7-parameter affine transformation $A$:

$$O_i \cap O_j = \emptyset \iff A(O_i) \cap A(O_j) = \emptyset \iff A(x_i) \cap A(x_j) = \emptyset, \qquad i \neq j,\ i, j \in \{1, \ldots, M\}.$$

The first equivalence holds because $A$ is invertible. Objects are not aligned individually; hence, their spatial relations are the same before and after alignment.

Figure 2. Mean shapes generated using 3D ASM for multiple objects of the abdominal region. Left: mean shapes of the liver, spleen, and right and left kidneys. Right: the same mean shapes together with the mean shape of the skin boundary of the abdominal region.

Once the $MA$ is created, the next step is to initialize the segmentation process by locating the $MA$ in any given subject image with proper scale, translation, and orientation parameters, as described in the following sections. After the objects in the given image are recognized, local constraints of the shape points in $MA$ are incorporated into the hybrid delineation algorithm called iterative graph-cut active shape model; see [8] for details.

4. Relationship Between Shape and Intensity Structure Systems

The goal in recognition of anatomical objects is to identify the extracted shapes of objects. Since the extracted geometric patterns are elements of a pattern family that can be thought of as images modulo the invariances represented by the similarity group, they can naturally be considered desirable image features for roughly identifying the relationship of patterns in terms of scale, position, and orientation. Thus, we conjecture that creating a pattern family that includes rough object information together with region information yields concise bases for recognizing objects. For this purpose, the b-scale approach provides a rough but definitive representation of objects without explicit segmentation. Hence, we integrate locally adaptive b-scale information of objects into the recognition process to produce geometric patterns extracted by a b-scale-based filtering method using region homogeneity.

There are several advantages to the scale-based approach. First, boundary- and region-based representations of objects are explicitly contained in scale-based methods. Based on the continuity of homogeneous regions inside images, we roughly identify geometric properties of objects, namely scale information, and represent the actual images with this new representation, called scale images (e.g., ball-scale, tensor-scale, and generalized-scale images). Second, since scale-based methods are able to roughly identify the objects embodied in the images, the resulting rough objects can be used as prior information to be integrated into the whole segmentation process. Third, and most importantly, scale images provide fast recognition of objects with high accuracy. Scale-based methods provide estimates of the position, orientation, and size of the actual objects that are accurate enough that no elaborate search within the images is needed to locate the objects. To the best of our knowledge, this is the only existing study on 3D images that locates objects of interest in a given image in one shot, namely without any search. In the following subsection, we describe how the shape and intensity structure systems are related through the b-scale-based method.

4.1. Ball Scale Encoding. The b-scale method has been reported to be very useful in image segmentation [2], filtering [16], inhomogeneity correction [17], and image registration [18]. The main idea in b-scale encoding is to determine the size of local structures at every spel under a prespecified, scene-dependent region-homogeneity criterion. For example, in Figure 3, the hyperball centered at pixel c is bigger than those centered at d or e; thus, the local structure to which pixel c belongs is larger than those to which d or e belong. By definition, and as also seen in Figure 3, locally adaptive scale is small in regions with fine details or in the vicinity of boundaries, while it is large in the interior of large homogeneous object regions.

Figure 3. (a) A 2D slice from a 3D-CT scene of an abdominal region. Using a suitable region-homogeneity criterion, a local determination is made as to the largest disc that can be centered at any point, such as c, within which the intensities are homogeneous. The b-scale at c is bigger than that at d or at e. (b) The b-scale scene of the CT abdominal slice in (a).

4.1.1. Computation of b-scale. Terminology & Notation: The pair $(\mathbb{Z}^n, \alpha)$ is called a digital space, where $\alpha$ is an adjacency relation on $\mathbb{Z}^n$. We represent an $n$D image over a fuzzy digital space $(\mathbb{Z}^n, \alpha)$, called a scene for short, by a pair $\mathcal{C} = (C, f)$, where $C$ is a finite $n$D array of spels (spatial elements), called the scene domain of $\mathcal{C}$, covering a body region of the particular patient for whom the image data $\mathcal{C}$ are acquired, and $f$ is an intensity function defined on $C$ that assigns an integer intensity value to each spel $c \in C$. We assume that $f(c) \ge 0$ for all $c \in C$, that $f(c) = 0$ if and only if there are no measured data for spel $c$, and that $\nu$ is an $n$-tuple indicating the physical dimensions of the spels in $C$.

A hyperball $B_{k,\nu}(c)$ of radius $k \ge 0$ with center at $c \in C$ in a scene $\mathcal{C} = (C, f)$ over $(\mathbb{Z}^n, \alpha, \nu)$ is defined by

$$B_{k,\nu}(c) = \left\{ e \in C \;\middle|\; \sqrt{\sum_{i=1}^{n} \frac{\nu_i^2}{\min_j \nu_j^2} (c_i - e_i)^2} \le k \right\}. \tag{2}$$

The fraction $FO_{k,\nu}(c)$ ("fraction of object"), which indicates the fraction of the ball boundary occupied by a region that is sufficiently homogeneous with $c$, is defined by

$$FO_{k,\nu}(c) = \frac{\sum_{e \in B_{k,\nu}(c) - B_{k-1,\nu}(c)} W_\psi\left(|f(c) - f(e)|\right)}{\left|B_{k,\nu}(c) - B_{k-1,\nu}(c)\right|}, \tag{3}$$

where $|B_{k,\nu}(c) - B_{k-1,\nu}(c)|$ is the number of spels in $B_{k,\nu}(c) - B_{k-1,\nu}(c)$ and $W_\psi$ is a homogeneity function [16, 13]. A detailed description of the characteristics of $W_\psi$ is presented in [16]. In all experiments, we use a zero-mean unnormalized Gaussian function for $W_\psi$.

The ball radius $k$ is increased iteratively starting from one, and at each step the b-scale computation algorithm checks $FO_{k,\nu}(c)$, the fraction of the ball boundary that is homogeneous with $c$. When this fraction falls below the threshold $t_s$, the ball is considered to contain an object region different from the one to which $c$ belongs [16]. Following the recommendation in [2], $t_s = 0.85$ is chosen.
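Written out with the zero-mean unnormalized Gaussian used in the experiments, the homogeneity function and the resulting b-scale value at $c$ take the form below. This is a sketch: the width $\sigma_\psi$ is not given in the text and is treated here as a free parameter, and the expression for $r(c)$ simply restates the iteration just described.

$$W_\psi(x) = \exp\left(-\frac{x^2}{2\sigma_\psi^2}\right), \qquad r(c) = \min\{\, k \ge 1 : FO_{k,\nu}(c) < t_s \,\} - 1.$$

Thus $r(c)$ is the largest radius for which every shell $B_{j,\nu}(c) - B_{j-1,\nu}(c)$, $1 \le j \le r(c)$, has an average homogeneity with $c$ of at least $t_s$.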

4.1.2. Intensity Weighted Ball Scale - $WB_{scale}$. Although the size of a local structure is estimated using the appearance information of the gray-scale images (i.e., the region-homogeneity criterion), b-scale images contain only rough geometric information. Incorporating appearance information into this rough knowledge characterizes the scale information of local structures and thus allows us to distinguish objects of the same size by their appearance. One way to extract b-scale images that carry the corresponding appearance information of the local structures is to weight the radius of the hyperball centered at a given spel by the intensity value of that spel. As a result, object scale information is enriched with local intensity values. The algorithm for intensity weighted object scale estimation (IWOSE) is presented below.

Algorithm IWOSE

Input: $c \in C$ in a scene $\mathcal{C} = (C, f)$ with spel dimensions $\nu$, $W_\psi$, a fixed threshold $t_s$

Output: $r'(c)$

1. begin

2. set $k = 1$;

3. while $FO_{k,\nu}(c) \ge t_s$ do

4. set $k$ to $k + 1$;

5. endwhile;

6. set $r(c)$ to $k - 1$;

7. output $r'(c) = f(c) \cdot r(c)$;

8. end
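A minimal Python/NumPy sketch of IWOSE for a single spel, implementing eqs. (2) and (3) directly. Assumptions not in the text: the Gaussian width `sigma` is a free parameter, the loop stops if a ball shell contains no spels of the scene domain, and `k_max` is a safety cap. The function names are hypothetical, and because it scans the whole array for every spel, it is illustrative rather than efficient.

```python
import numpy as np

def gaussian_homogeneity(diff, sigma):
    """Zero-mean unnormalized Gaussian W_psi (sigma is an assumed free parameter)."""
    return np.exp(-(diff ** 2) / (2.0 * sigma ** 2))

def iwose(scene, c, spacing, sigma, ts=0.85, k_max=50):
    """Intensity weighted b-scale r'(c) = f(c) * r(c) at spel c."""
    scene = np.asarray(scene, dtype=float)
    c = np.asarray(c)
    spacing = np.asarray(spacing, dtype=float)
    w = spacing / spacing.min()                      # nu_i / min_j nu_j, as in eq. (2)
    grids = np.indices(scene.shape)                  # coordinates of every spel e
    dist = np.sqrt(sum(((g - ci) * wi) ** 2 for g, ci, wi in zip(grids, c, w)))
    fc = scene[tuple(c)]
    k = 1
    while k <= k_max:
        shell = (dist <= k) & (dist > k - 1)         # B_k(c) - B_{k-1}(c)
        if not shell.any():                          # ball has left the scene domain
            break
        fo = gaussian_homogeneity(np.abs(fc - scene[shell]), sigma).mean()  # eq. (3)
        if fo < ts:                                  # shell no longer homogeneous with c
            break
        k += 1
    r = k - 1                                        # plain b-scale r(c)
    return fc * r                                    # intensity weighted value r'(c)
```

Applying `iwose` at every spel yields a $WB_{scale}$ scene; the unweighted b-scale scene is obtained by returning `r` instead of `fc * r`.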

The histogram of a $B_{scale}$ image contains only information about the radii of the hyperballs; hence, it is fairly easy to eliminate small balls and retain a few of the largest balls by applying a simple thresholding technique. In this case in particular, thresholding can be used effectively to retain reliable object information. The patterns pertaining to the largest balls retained after thresholding have strong correlations with the truly delineated objects shown in the last row of Figure 4. The truly delineated objects and the patterns obtained after thresholding share some global similarities; for instance, the scale, location, and orientation of the patterns are closely related to those of the truly delineated objects. The patterns show salient characteristics because they depend on object scale estimation and are mostly spatially localized. Therefore, a concise but reliable relationship can be built using scale, position, and orientation information as parameters.
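Thresholding a weighted b-scale scene with an interval $t$ reduces to a mask test, as in the short Python/NumPy sketch below. The interval bounds are free inputs here; `top_fraction_interval` is a hypothetical helper, not described in the paper, that picks a lower bound keeping roughly a given fraction of the largest nonzero values.

```python
import numpy as np

def threshold_wbscale(wb, t_lo, t_hi=np.inf):
    """Binary pattern WB^t_scale: spels whose weighted b-scale lies in [t_lo, t_hi]."""
    wb = np.asarray(wb, dtype=float)
    return (wb >= t_lo) & (wb <= t_hi)

def top_fraction_interval(wb, keep=0.05):
    """Hypothetical helper: lower bound keeping about the top `keep` fraction of nonzero values."""
    vals = np.asarray(wb, dtype=float)
    vals = vals[vals > 0]                 # f(c) = 0 means no measured data
    return float(np.quantile(vals, 1.0 - keep))
```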

Figure 4 demonstrates the thresholding process on the intensity weighted b-scale images, $WB_{scale}$. Different slices of $WB_{scale}$ scenes of abdominal CT images are shown in the first row of the figure. The remaining rows, except the last, show thresholded intensity weighted b-scale scenes, namely $WB^t_{scale}$, obtained using various $t$ intervals. Unlike b-scale scenes, $WB_{scale}$ scenes allow much more flexibility in selecting the thresholding interval $t$, since $t$ is not restricted to object scale values in this case. As can be seen in the fifth row of Figure 4, $WB^t_{scale}$ scenes have strong correlations with their truly delineated object counterparts shown in the last row of the same figure.

In recognition, the aim is to determine "roughly" the whereabouts of an object of interest in the scene, and the trade-off between locality and conciseness of shape variability is modulated in the delineation step; it is therefore sufficient to use the concise bases produced by PCA without considering the localized variability of the shapes. For delineation, on the other hand, analyzing variations for each subject separately, instead of over averaged ensembles, is preferable, because the specific information present in the particular image is then not lost.

4.2. Relationship Vector. In order to find the translation, scale, and orientation that best align the shape structure system (learnt from segmented objects in the training set) and the intensity structure system (derived from the rough objects obtained through the b-scale encoding scheme from the test set), it is essential to define the relationship between the shape and intensity structures from the training images and incorporate it into the model. We build this relationship via PCA to keep track of translation and orientation differences, and we use a "minimum enclosing box" approach to find the scale component. In the minimum enclosing box approach, the real physical units of the truly segmented objects and of the $WB^t_{scale}$ scenes are used. For orientation analysis, the parameters of variation are computed via PCA. The principal component systems of the shape and intensity structure systems, denoted by $PC_o$ and $PC_b$, respectively, each have an origin and three inertia axes. For the PC systems of the same subject, we have three normalized eigenvectors describing the distribution of variations in each direction, and origins marking the centroids of the PC systems. The function $F$ that describes the relationship between $PC_o$ and $PC_b$ can be decomposed into the form $F = (s, t, R)$, where $t = (t_x, t_y, t_z)$ is the translation component, $s$ is the scale component, and $R = (R_x, R_y, R_z)$ represents three rotations about the $x$, $y$, and $z$ axes. We observe that there is a fixed relationship between $PC_o$ and $PC_b$, and the relationship function $F$ can be split into three sub-functions $f_1, f_2, f_3$ such that $F = (f_1 = s, f_2 = t, f_3 = R)$.
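The PC system of a binary object, or of a thresholded $WB^t_{scale}$ pattern, can be computed as in the Python/NumPy sketch below: the origin is the centroid in physical units and the axes are the eigenvectors of the coordinate covariance, ordered by decreasing variance. The function name and the sign convention (forcing a right-handed frame) are assumptions; eigenvector signs are otherwise arbitrary, so a practical implementation would need to fix them consistently across subjects.

```python
import numpy as np

def pc_system(mask, spacing=(1.0, 1.0, 1.0)):
    """PC system of a binary 3D object: origin (centroid, in mm) and three unit axes.

    Returns (origin, axes) where the rows of `axes` are the major, middle,
    and minor inertia axes.
    """
    idx = np.argwhere(np.asarray(mask, dtype=bool))     # (m, 3) spel indices
    pts = idx * np.asarray(spacing, dtype=float)         # physical coordinates
    origin = pts.mean(axis=0)
    cov = np.cov(pts - origin, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)                   # ascending eigenvalues
    order = np.argsort(evals)[::-1]                      # largest variance first
    axes = evecs[:, order].T
    if np.linalg.det(axes) < 0:                          # enforce a right-handed frame
        axes[2] *= -1
    return origin, axes
```

Spel dimensions matter here: the same index mask can have a different major axis once anisotropic voxel spacing is taken into account.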

Figure 4. Different slices of intensity weighted b-scale scenes extracted from a CT image (female subject, abdominal region) are shown in the first row. Rows 2 to 5 show the corresponding thresholded intensity weighted b-scale scenes of the slices in the first row. The last row shows the truly segmented objects of the CT image for the selected slices.

4.3. Estimation of Scale Parameter - $f_1 = s$. The minimum enclosing box (MEB) of the objects of interest for each subject $i = 1, \ldots, N$ in the training set is used to estimate the real physical size of the objects in question [14, 15]. The length of the diagonal connecting the two farthest corners of the MEB is defined as the scale parameter.
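A Python/NumPy sketch of the scale parameter: the diagonal of the box enclosing a binary object, in millimetres. The paper does not specify how the MEB is computed; this sketch assumes an axis-aligned box over whole voxels, and `scale_ratio`, which relates an object's diagonal to that of its b-scale pattern, is a hypothetical way of turning two diagonals into a factor $s$.

```python
import numpy as np

def meb_diagonal(mask, spacing=(1.0, 1.0, 1.0)):
    """Diagonal length (mm) of the axis-aligned box enclosing a binary object.

    Assumption: the box is axis-aligned and covers whole voxels.
    """
    idx = np.argwhere(np.asarray(mask, dtype=bool))
    extent = (idx.max(axis=0) - idx.min(axis=0) + 1) * np.asarray(spacing, dtype=float)
    return float(np.linalg.norm(extent))

def scale_ratio(object_mask, pattern_mask, spacing=(1.0, 1.0, 1.0)):
    """Hypothetical s: object MEB diagonal divided by the WB^t_scale pattern MEB diagonal."""
    return meb_diagonal(object_mask, spacing) / meb_diagonal(pattern_mask, spacing)
```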

4.4. Estimation of Translation Parameters - $f_2 = t : (t_x, t_y, t_z)$. The estimation of the translation parameters is based solely on forming a linear relationship between the centroids of the binary objects and the centroids of the patterns obtained by thresholding the intensity weighted b-scale scene. Because the weighted b-scale incorporates appearance information, the poses of the shapes are computed more accurately.
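One simple reading of the linear centroid relationship is a constant offset, $c_o \approx c_b + t$, averaged over the training subjects. The Python/NumPy sketch below assumes that model; the function names are hypothetical, and the per-axis standard deviation is returned as one possible contribution to the variation $\Delta F$.

```python
import numpy as np

def fit_translation(obj_centroids, pattern_centroids):
    """Mean offset t with c_o ≈ c_b + t over N training subjects (assumed model).

    Inputs are (N, 3) arrays of object and WB^t_scale pattern centroids.
    Returns the mean offset and its per-axis standard deviation.
    """
    co = np.asarray(obj_centroids, dtype=float)
    cb = np.asarray(pattern_centroids, dtype=float)
    diffs = co - cb
    return diffs.mean(axis=0), diffs.std(axis=0)

def predict_object_centroid(pattern_centroid, t):
    """Predicted object centroid in a test image from its pattern centroid."""
    return np.asarray(pattern_centroid, dtype=float) + t
```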

4.5. Estimation of Orientation Parameters - $f_3 = R : (R_x, R_y, R_z)$. Since the principal component vectors of the shape and intensity structure systems each constitute an orthonormal basis, and since the translation between the PC systems is estimated before the orientation parameters, the two PC systems differ only by orientation. Therefore, the basis vectors of the shape structure system can be expressed in terms of the basis vectors of the intensity structure system by the relation

$$PC_b = R \, PC_o, \tag{4}$$

where $R$ is an orthonormal rotation matrix carrying information about the relative orientation of the shape and intensity structure systems in terms of their Euler angles. A set of $N$ segmented training data sets yields, for each $i = 1, \ldots, N$, an orthonormal rotation matrix $R_i$ that relates $PC_{o_i}$ and $PC_{b_i}$. We assume that, once the mean rotation and its standard deviation around the mean are found, this information will guide us as to how $PC_o$ and $PC_b$ of any test image are related. With $F$ modeling the mean relationship between the shape and intensity structures over the training set, the enhanced model assembly becomes $MA = (\mathcal{M}, \Delta F)$, where $\mathcal{M}$ denotes the set of object models and $\Delta F$ denotes the variation observed in $F$ over the training set.
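The rotation estimate of eq. (4) and its averaging over the training set can be sketched in Python/NumPy as follows, with the PC axes stored as matrix columns. The chordal mean (average the matrices, then project back onto the rotation group), the geodesic angle as the deviation measure, and the $R = R_z R_y R_x$ Euler convention are assumptions; the paper does not state which mean or angle convention it uses. The function names are hypothetical.

```python
import numpy as np

def relative_rotation(pc_o, pc_b):
    """R with PC_b = R PC_o (eq. 4); the columns of pc_o and pc_b are orthonormal axes."""
    return np.asarray(pc_b, dtype=float) @ np.asarray(pc_o, dtype=float).T

def mean_rotation(rotations):
    """Chordal mean: average the matrices, then project onto SO(3) via SVD."""
    M = np.mean(np.asarray(rotations, dtype=float), axis=0)
    U, _, Vt = np.linalg.svd(M)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # avoid reflections
    return U @ D @ Vt

def angle_deg(Ra, Rb):
    """Geodesic angle (degrees) between two rotations, usable for the spread in ΔF."""
    cos = (np.trace(Ra.T @ Rb) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def euler_xyz_deg(R):
    """(Rx, Ry, Rz) in degrees, assuming the convention R = Rz @ Ry @ Rx."""
    rx = np.arctan2(R[2, 1], R[2, 2])
    ry = np.arcsin(np.clip(-R[2, 0], -1.0, 1.0))
    rz = np.arctan2(R[1, 0], R[0, 0])
    return np.degrees([rx, ry, rz])
```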

5. Hierarchical Recognition

Given any patient test scene $C$, recognition at the coarsest level proceeds as follows. First, the weighted b-scale scene $C_1$ of $C$ is computed; note that this does not require any segmentation. From $C_1$, the PC system $\mathrm{PC}_b$ for the intensity structure is determined. Then, from $F$, the pose of the model assembly $MA$ in $C$ is determined. Once coarse recognition has been done, the pose of $MA$ can be refined further by several means, for example: (i) by searching around the pose found by coarse recognition, but confined to the limits indicated by $\Delta F$; (ii) by segmenting one of the easily segmented objects (such as the skin boundary) and aligning it with the corresponding object in $MA$. Recognition achieved after all such refinements is called fine recognition. Finally, exact refinement is achieved in the delineation step, which is considered the finest level of recognition. This hierarchy can also be understood from the perspective of delineation accuracy: recognition accuracy itself depends on delineation accuracy and, conversely, recognition influences delineation accuracy.
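Coarse recognition, as described above, needs only a PC system computed from the weighted b-scale scene and the learnt relationship $F$. The sketch below is illustrative and rests on assumptions: $\mathrm{PC}_b$ is taken to be the weight-averaged voxel position plus the eigenvectors of the weighted coordinate covariance (the paper does not give the formula), $F$ is summarized by a hypothetical mean offset `t_mean` (mm, scanner coordinates) and mean rotation `r_mean`, and the scale component is ignored. The model's shape axes follow from inverting Eq. (4), $\mathrm{PC}_o = R^{T}\mathrm{PC}_b$.

```python
import numpy as np


def weighted_pc_system(scene, spacing=(1.0, 1.0, 1.0)):
    """PC system (origin, axes) of a non-negative weighted 3D scene.

    Every voxel contributes its physical coordinates with weight equal to its
    value. Axes are the eigenvectors of the weighted coordinate covariance,
    stored as columns in order of decreasing eigenvalue."""
    w = np.asarray(scene, dtype=float).ravel()
    coords = np.indices(np.shape(scene)).reshape(3, -1).T * np.asarray(spacing, float)
    origin = (w[:, None] * coords).sum(axis=0) / w.sum()
    centered = coords - origin
    cov = (w[:, None] * centered).T @ centered / w.sum()
    evals, evecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    axes = evecs[:, np.argsort(evals)[::-1]]
    if np.linalg.det(axes) < 0:  # force a right-handed frame
        axes[:, 2] = -axes[:, 2]
    return origin, axes


def coarse_model_pose(origin_b, axes_b, r_mean, t_mean):
    """Hypothetical one-shot placement of the model assembly.

    Shape PC origin = intensity PC origin + mean training offset t_mean;
    shape PC axes from Eq. (4) inverted: PC_o = R^T PC_b."""
    origin_o = np.asarray(origin_b, float) + np.asarray(t_mean, float)
    axes_o = np.asarray(r_mean, float).T @ np.asarray(axes_b, float)
    return origin_o, axes_o
```

Because eigenvector signs are arbitrary, the sketch only forces a right-handed frame; a practical implementation would also need a consistent sign convention so that training and test PC systems are comparable.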

6. Evaluations and Results

We used whole-body PET/CT scans of 10 female and 10 male patients. The voxel size of the CT images is 1.17 mm × 1.17 mm × 1.17 mm (interpolated from 5 mm slices). We focus on the abdominal region and selected the following five objects from each subject: skin boundary, liver, left kidney, right kidney, and spleen.

We use leave-one-out cross-validation (LOOCV) to measure recognition performance for each subject type (female and male). The translation, scale, and orientation components of the relationship function $F$ are evaluated separately. The scale component in the LOOCV tests was found to range from 0.97 to 1.07. Figures 5 and 6 show recognition accuracy in terms of mean translation error over all objects for female and male subjects, respectively, for the different combinations of objects included in the model assembly (shown along the horizontal axis). All results displayed are for coarse recognition only. As can be seen, the minimum mean translation errors are obtained when all the objects are included in the recognition process. Different combinations of objects yield different results. The size and spatial position of the objects play an important role in recognition: large objects are easier to recognize than small ones. Similarly, Figures 7 and 8 show recognition accuracy in terms of mean orientation error (in degrees) for female and male data, respectively. Note that the minimum mean orientation error is also obtained when all the objects are included in the recognition process.
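The LOOCV protocol for the translation component can be illustrated with a simplified sketch. It assumes that each subject is summarized by the origin of its shape PC system and the origin of its intensity PC system (both in mm); for every held-out subject, the mean offset is estimated from the remaining subjects and the Euclidean error of the predicted shape origin is reported. The paper's mean translation error is averaged over all objects in the assembly; a per-object version would run the same loop on each object's position. The function and argument names are hypothetical.

```python
import numpy as np


def loocv_translation_errors(shape_origins, intensity_origins):
    """Leave-one-out errors (mm) for a mean-offset translation model.

    shape_origins[i] and intensity_origins[i] are the PC-system origins of
    subject i. For each held-out subject the offset is learnt from the other
    subjects only, then used to predict the held-out shape origin."""
    po = np.asarray(shape_origins, dtype=float)
    pb = np.asarray(intensity_origins, dtype=float)
    offsets = po - pb
    n = len(po)
    errors = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i          # every subject except i
        t_mean = offsets[train].mean(axis=0)
        predicted = pb[i] + t_mean
        errors[i] = np.linalg.norm(predicted - po[i])
    return errors
```

Averaging the returned array gives one mean translation error per object combination, which is the quantity plotted per combination in Figures 5 and 6.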

We observe that the effectiveness of object recognition depends on the number and distribution of the objects considered in $MA$. Recognition accuracy improves as the number of objects increases. The evaluation results indicate the following: (1) High recognition accuracy can be achieved by including a large number of objects that are spread out over the body region. (2) Incorporating local object scale information improves recognition to the point that a search over the scale, orientation, and translation parameters is usually unnecessary. (3) The appearance information incorporated via b-scale has a strong effect on the computation of the PC system and on the relationship function $F$.

[Plot: Translation Error (mm) for each object combination in the model assembly: LV, S, LK, RK, LV+S, LV+LK, LV+RK, LV+S+LK, LV+S+RK, S+LK, S+RK, S+LK+RK, LK+RK, LV+LK+RK, ALL]

Figure 5. Recognition accuracy in terms of mean translation error (mm) for CT abdominal female data with different number and combination of organs included in the model assembly.

[Plot: Translation Error (mm) for each object combination in the model assembly: LV, S, LK, RK, LV+S, LV+LK, LV+RK, LV+S+LK, LV+S+RK, S+LK, S+RK, S+LK+RK, LK+RK, LV+LK+RK, ALL]

Figure 6. Recognition accuracy in terms of mean translation error (mm) for CT abdominal male data with different number and combination of organs included in the model assembly.

7. Conclusion

(1) The b-scale image of a given image captures object morphometric information without requiring explicit segmentation. The b-scale at a voxel is the radius of the largest homogeneous ball centered at that voxel, so b-scales constitute fundamental units of an image. The b-scale concept has previously been used in object delineation, filtering, and registration. Our results suggest that the ability of b-scales to capture object geography, in conjunction with shape models, may be useful in quick and simple, yet accurate, object recognition strategies. (2) The presented method is general and does not depend on exploiting characteristics peculiar to a specific application. (3) The specificity of recognition increases dramatically as the number of objects in the model increases. (4) We emphasize that both the modeling and the testing procedures are carried out on CT data sets that are part of the clinical PET/CT data routinely acquired in our hospital. These CT data sets are thus of relatively poor (spatial and contrast) resolution compared with those of CT-alone studies, with or without contrast. We expect better performance if higher-resolution CT data are employed in modeling or testing.
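The b-scale idea summarized in (1) can be sketched as follows. This is a simplified, brute-force version in the spirit of the b-scale definition, not the weighted b-scale used in this paper: at each voxel the radius grows one voxel at a time, and growth stops when the mean homogeneity of the voxels on the ball's digital shell falls below a tolerance. The Gaussian homogeneity function with parameter `sigma`, the tolerance 0.85, the one-voxel-thick shell, and edge replication at the image borders are all assumptions made for illustration.

```python
import numpy as np


def shell_offsets(k):
    """Integer offsets d with k - 0.5 <= |d| < k + 0.5 (a digital sphere shell)."""
    r = np.arange(-k, k + 1)
    dz, dy, dx = np.meshgrid(r, r, r, indexing="ij")
    dist = np.sqrt(dz ** 2 + dy ** 2 + dx ** 2)
    mask = (dist >= k - 0.5) & (dist < k + 0.5)
    return np.stack([dz[mask], dy[mask], dx[mask]], axis=1)


def b_scale(image, sigma, tolerance=0.85, max_radius=10):
    """Radius of the largest homogeneous ball centered at every voxel.

    Homogeneity of a shell voxel d with respect to the center c is the
    Gaussian exp(-(f(d) - f(c))**2 / (2 * sigma**2)). The radius grows while
    the mean homogeneity over the shell of radius k stays >= tolerance."""
    img = np.asarray(image, dtype=float)
    shells = [shell_offsets(k) for k in range(1, max_radius + 1)]
    # Replicate border values so that shells near the image edge stay in bounds.
    padded = np.pad(img, max_radius, mode="edge")
    scale = np.zeros(img.shape, dtype=int)
    for idx in np.ndindex(img.shape):
        center = np.array(idx) + max_radius
        value = padded[tuple(center)]
        radius = 0
        for k, offsets in enumerate(shells, start=1):
            pts = center + offsets
            neighbors = padded[pts[:, 0], pts[:, 1], pts[:, 2]]
            homogeneity = np.exp(-((neighbors - value) ** 2) / (2.0 * sigma ** 2))
            if homogeneity.mean() < tolerance:
                break
            radius = k
        scale[idx] = radius
    return scale
```

The cost grows with the number of voxels times the number of shell voxels examined, so this loop is only practical for small volumes; it illustrates the definition rather than an efficient algorithm.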

8. Acknowledgement

This paper was published in SPIE Medical Imaging 2010.

This research is partly funded by the European Commission FP6 Marie Curie Action Programme (MEST-CT-2005-021170). The second author's research is funded by NIH grant EB004395.

[Plot: Orientation Error (in degrees) in the heading, attitude, and bank directions for each object combination in the model assembly: LV, S, LK, RK, LV+S, LV+LK, LV+RK, LV+S+LK, LV+S+RK, S+LK, S+RK, S+LK+RK, LK+RK, LV+LK+RK, ALL]

Figure 7. Recognition accuracy in terms of mean orientation error (degrees) in three directions (heading-x, attitude-y, bank-z) for CT abdominal female data with different number and combination of organs included in the model assembly.

[Plot: Orientation Error (in degrees) in the heading, attitude, and bank directions for each object combination in the model assembly: LV, S, LK, RK, LV+S, LV+LK, LV+RK, LV+S+LK, LV+S+RK, S+LK, S+RK, S+LK+RK, LK+RK, LV+LK+RK, ALL]

Figure 8. Recognition accuracy in terms of mean orientation error (degrees) in three directions (heading-x, attitude-y, bank-z) for CT abdominal male data with different number and combination of organs included in the model assembly.

References

[1] Chen, X., Udupa, J.K., Zheng, X., Torigian, D., Alavi, A., 2009. Automatic Anatomy Recognition via Multi Object Oriented Active Shape Models. Proc. of SPIE Medical Imaging.

[2] Saha, P.K., Udupa, J.K., Odhner, D., 2000. Scale-based fuzzy connected image segmentation: Theory, algorithms, and validation. Computer Vision and Image Understanding, Vol. 77, pp. 145–174.

[3] Saha, P.K., Udupa, J.K., 2001. Optimum Image Thresholding via Class Uncertainty and Region Homogeneity. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23 (7), pp. 689–706.

[4] Chen, X., Udupa, J.K., Alavi, A., Torigian, D., 2009. GC-ASM: Synergistic Integration of Active Shape Modeling and Graph-Cut Methods. Proc. of SPIE Medical Imaging.

[5] Dryden, I.L., Mardia, K.V., 1998. Statistical Shape Analysis. John Wiley & Sons.

[6] Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J., 1995. Active shape models - their training and application. Computer Vision and Image Understanding, Vol. 61, pp. 38–59.

[7] Falcao, A.X., Udupa, J.K., Samarasekera, S., Sharma, S., Hirsch, B.E., Lotufo, R.A., 1998. User-steered image segmentation paradigms: live wire and live lane. Graphical Models and Image Processing, Vol. 60 (4), pp. 233–260.

[8] Chen, X., Udupa, J.K., Bagci, U., Alavi, A., Torigian, D., 2010. 3D Automatic Anatomy Recognition Based on Iterative Graph-Cut-ASM. Submitted to Proc. of SPIE Medical Imaging.

[9] Davies, R., Twining, C., Taylor, C., 2008. Statistical Models of Shape: Optimisation and Evaluation. Springer.

[10] Cremers, D., Kohlberger, T., Schnörr, C., 2003. Shape statistics in kernel space for variational image segmentation. Pattern Recognition, Vol. 36, pp. 1929–1943.

[11] Kokkinos, I., Maragos, P., 2009. Synergy between object recognition and image segmentation using the expectation-maximization algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 31 (8), pp. 1486–1501.

[12] Kendall, D.G., 1989. A survey of statistical theory of shape. Statistical Science, Vol. 4, pp. 87–120.

[13] Madabhushi, A., Udupa, J., Souza, A., 2005. Generalized scale: Theory, algorithms, and application to image inhomogeneity correction. Computer Vision and Image Understanding, Vol. 101 (2), pp. 100–121.

[14] Castleman, K.R., 1996. Digital Image Processing. Englewood Cliffs, NJ: Prentice Hall.

[15] Schalkoff, R., 1989. Digital Image Processing and Computer Vision. Singapore: John Wiley and Sons.

[16] Saha, P.K., Udupa, J.K., 2001. Scale-based diffusive image filtering preserving boundary sharpness and fine structures. IEEE Transactions on Medical Imaging, Vol. 20 (11), pp. 1140–1155.

[17] Zhuge, Y., Udupa, J.K., Liu, L., Saha, P.K., 2002. A scale-based method for correcting background intensity variation in acquired images. SPIE Medical Imaging, Vol. 4684, pp. 1103–1111.

[18] Nyul, L., Udupa, J., Saha, P., 2003. Incorporating a measure of local scale in voxel-based 3-D image registration. IEEE Transactions on Medical Imaging, Vol. 22 (2), pp. 228–237.

[19] Kelemen, A., Szekely, G., Gerig, G., 1999. Elastic model-based segmentation of 3D neuroradiological data sets. IEEE Transactions on Medical Imaging, Vol. 18 (10), pp. 828–839.

[20] Pizer, S.M., et al., 2003. Deformable m-reps for 3D medical image segmentation. International Journal of Computer Vision, Vol. 55 (2/3), pp. 851–865.

[21] Soler, L., et al., 2001. Fully automatic anatomical, pathological, and functional segmentation from CT scans for hepatic surgery. Computer Aided Surgery, Vol. 6 (3), pp. 131–142.

[22] Brejl, M., Sonka, M., 2000. Object localization and border detection criteria design in edge-based image segmentation: automated learning from examples. IEEE Transactions on Medical Imaging, Vol. 19 (10), pp. 973–985.

[23] De Bruijne, M., Nielsen, M., 2005. Shape particle filtering for image segmentation. In Proceedings of MICCAI, Vol. 3216, pp. 168–175.

[24] Fripp, J., Crozier, S., Warfield, S.K., Ourselin, S., 2005. Automatic initialisation of 3D deformable models for cartilage segmentation. In Proceedings of Digital Image Computing: Techniques and Applications, pp. 513–518.


School of Computer Science, University of Nottingham

E-mail address: ulasbagc@gmail.com

Medical Image Processing Group, UPENN

E-mail address: jay@mipg.upenn.edu

Diagnostic Radiology, NIH

E-mail address: chenx6@mail.nih.gov
