
A Trained Spin-Glass Model for

Grouping of Image Primitives

Joes Staal, Stiliyan N. Kalitzin, and Max A. Viergever

Abstract—A method is presented that uses grouping to improve local classification of image primitives. The grouping process is based

upon a spin-glass system, where the image primitives are treated as possessing a spin. The system is subject to an energy functional

consisting of a local and a bilocal part, allowing interaction between the image primitives. Instead of defining the state of lowest energy

as the grouping result, the mean state of the system is taken. In this way, instabilities caused by multiple minima in the energy are avoided. The means of the spins are taken as the a posteriori probabilities for the grouping result. In the paper, it is shown how

the energy functional can be learned from example data. The energy functional is defined in such a way that, in case of no interactions

between the elements, the means of the spins equal the a priori local probabilities. The grouping process enables the fusion of the

a priori local and bilocal probabilities into the a posteriori probabilities. The method is illustrated both on grouping of line elements in

synthetic images and on vessel detection in retinal fundus images.

Index Terms—Statistical pattern recognition, spin-glass model, statistical learning, Bayesian grouping.

1 INTRODUCTION

As long as the field of digital image analysis exists, segmentation has been the bottleneck to achieve object extraction, object specific measurements, and fast object rendering from multidimensional image data. Simple segmentation techniques based on local pixel-neighborhood classification fail to apprehend the globality of objects and often require intensive operator assistance to produce acceptable results. The reason is that the notion of an object does not necessarily follow the characteristics of its local image representation; only in idealized cases do local operations directly yield a definition of an object. Local properties, such as textures, edgeness, ridgeness, etc., generally do not represent connected features of an object. Therefore, a method is needed that groups pieces of image primitives into objects. In such a method, the solution of the segmentation problem will involve the use of domain knowledge that derives from the recognition task. Similar arguments have motivated earlier work on model-driven grouping and segmentation applied to real-world images [4], [10], [11], [13], [15], [20], [23], [24], [25], [26], [27], [28], [30].

In our view, the segmentation problem can only be tackled successfully in conjunction with the recognition problem. The recognition task provides a notion of the objects to be defined using the segmentation method; this allows us to incorporate model knowledge of the objects in the grouping process, either by predefining properties that are characteristic of an object or by deriving such properties by statistical means from example data.
The grouping process, described in this study, relies on

local and bilocal prior object probabilities that are based on the predefined recognition task. In the grouping

process, image primitives interact with each other and,

through these interactions, posterior probabilities for being

part of the object are computed. In this sense, the method is

based on Bayesian statistics.

The proposed method can be regarded as finding the

mean state of a spin-glass system subject to the Gibbs-

Boltzmann-distribution. The energy functionals that are

needed for such a spin-glass system are based on the local

and interaction prior probability densities of the image

primitives. If these densities are not known beforehand,

they can be estimated from example data. In the paper, it is

shown how this can be accomplished using classifiers from

statistical pattern recognition [5].

The idea of grouping image primitives using a spin-glass

model was investigated in [11] for the segmentation of

edges. The main differences between their approach and ours are, firstly, that we use a local part in the energy

functional that does not rely on the interaction probabilities.

Secondly, we use example data to learn the energy

functionals and do not define them in analytical form.

Finally, instead of searching the configuration that max-

imizes the posterior probabilities, the mean values of all

possible configurations of the image primitives determine

the posterior probabilities. This is also an important

difference of the proposed method with respect to

Boltzmann-machine-like approaches [1]. And, unlike re-

laxation labeling methods, the method does not have to

reevaluate the probabilities of an objective function [17].

Other methods for grouping define affinity matrices

between primitives and try to define a splitting of the

primitives based on eigenanalysis [24], [29] or normalized

cuts [28]. The problem with such methods is that their

foundation is not statistical in nature, so that one has to

incorporate user-defined rules in constructing the affinity matrices, although, in [27], an attempt has been made to overcome this shortcoming.

1172 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 27, NO. 7, JULY 2005

. J. Staal and M.A. Viergever are with the Image Sciences Institute, University Medical Center Utrecht, Heidelberglaan 100, E01.335, 3584 CX Utrecht, The Netherlands. E-mail: {joes, max}@isi.uu.nl.
. S.N. Kalitzin is with the Dutch Epilepsy Clinics Foundation, Achterweg 5, 2103 SW Heemstede, The Netherlands. E-mail: skalitzin@sein.nl.

Manuscript received 15 Dec. 2003; revised 27 Sept. 2004; accepted 28 Sept. 2004; published online 12 May 2005.
Recommended for acceptance by Y. Amit.
For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number TPAMI-0428-1203.
0162-8828/05/$20.00 © 2005 IEEE. Published by the IEEE Computer Society.

The proposed method is illustrated in two examples:

grouping of line elements in synthetic and real-world

data. In [20], a Markovian approach to this problem is

given. Although the approach is interesting, the connec-

tion field and interaction matrices are manually con-

structed and it is not clear how they could be learned.

Also, the use of local (noninteracting) knowledge is not

incorporated in this scheme.

The purpose of the present study is to come up with a

general statistical method that improves the local classifica-

tion of image primitives by introducing (bilocal) interaction

between them.

The setup of the paper is as follows: In Section 2, the

grouping process is considered as a spin-glass system. The

solution of the grouping problem is formulated as finding

the means of the state variables. Section 3 gives an overview

of implementational issues, followed in Section 4 by

illustrations of the approach, both on synthetic and on

real-world images. Concluding remarks and a discussion of

the results are presented in Section 5.

2 FORMAL PROBLEM STATEMENT

In this section, the grouping process will be regarded as a

spin-glass system. The system is governed by an energy

functional, consisting of a local and a bilocal part. In the

next section, the spin-glass formulation will be derived.

Section 2.2 discusses the definitions for the local and bilocal

energies.

2.1 Probabilistic Formulation

The task is to group K elements $\phi_i$ out of a set $\Phi = \{\phi_1, \ldots, \phi_N\}$, which consists of N elements. The number K is unknown beforehand. The elements can be image pixels, line elements, image patches, etc. Every element is regarded as having a spin $s_i$ that can be in one of two states: down or 0 and up or 1. If $s_i$ is up, $\phi_i$ belongs to the group (or object); if it is down, $\phi_i$ belongs to the background. It is useful to introduce the probability of a certain grouping, i.e., a configuration $\{s_i\}$ of the spins, as

P(\{s_i\}) = \frac{1}{Z} e^{-\beta E(\{s_i\})}.    (1)

Equation (1) is known as the Gibbs-Boltzmann-distribution. The constant Z is the partition function, which is the sum of $e^{-\beta E(\{s_i\})}$ over all configurations of the spins. It ensures that the sum over all configurations of $P(\{s_i\})$ equals 1. The functional $E(\{s_i\})$ plays the role of the energy belonging to a state $\{s_i\}$ of the system, whereas $\beta$ is a control parameter, which is equivalent to the inverse temperature of a physical spin system.

We want to model the spins up to pairwise interaction in such a way that elements that both belong to the foreground lower the energy. The energy function that is used to accomplish this is

E(\{s_i\}) = \sum_i L_i s_i + \frac{1}{2} \sum_i \sum_j B_{ij} s_i s_j,    (2)

where $L_i$ is the value of a local potential function induced by $s_i$ and $B_{ij}$ is the value of a bilocal potential function induced by the pair $s_i$ and $s_j$. In [9], [11], similar energy functions are used.
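As a minimal illustration (not code from the paper; the potentials L and B below are hypothetical toy values), the energy functional can be evaluated directly:

```python
def energy(spins, L, B):
    """Energy (2): E({s}) = sum_i L_i*s_i + 1/2 * sum_i sum_j B_ij*s_i*s_j."""
    n = len(spins)
    local = sum(L[i] * spins[i] for i in range(n))
    bilocal = 0.5 * sum(B[i][j] * spins[i] * spins[j]
                        for i in range(n) for j in range(n))
    return local + bilocal

# Two spins with an attractive coupling (B12 < 0): setting both spins up
# lowers the energy despite the positive local potentials.
L = [0.5, 0.5]
B = [[0.0, -2.0], [-2.0, 0.0]]   # symmetric interaction, no self-coupling
print(energy([1, 1], L, B))      # -1.0
print(energy([1, 0], L, B))      # 0.5
```

This shows the intended behavior: a negative $B_{ij}$ rewards configurations in which both interacting elements are in the foreground.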

The bilocal part of the energy can be viewed as a discrete Hopfield network [12] with connections $B_{ij}$

between the neurons. For the grouping process, it is

important that the elements influence each other, which is

accomplished by Hopfield networks since they exhibit

strong feedback-coupling.

Many Gibbs-based methods try to minimize the energy functional in order to obtain a maximum a posteriori (MAP) estimate from (1). A fast deterministic solution for energy functionals with binary variables and constant $B_{ij}$ is given in [9]. A recently published paper [18] investigates which energy functions can be minimized using graph cuts. The constraints that the energy functionals must satisfy are not met in our case. For that reason, we estimate the mean state of the system governed by (1). Following the terminology of [19], another loss function is adopted from a Bayesian theoretic point of view.

The mean $\langle s_i \rangle$ of a spin $s_i$ is given by

\langle s_i \rangle = \sum_{\{s_j\}} s_i P(\{s_j\}),    (3)

where the sum runs over all configurations. Elements with mean spins close to one very probably belong to the group, whereas those with values close to zero belong to the background. Once the mean values of the spins are determined, the grouped elements can be extracted by setting a threshold. For each element $\phi_i$, its mean spin plays the role of the a posteriori probability of being part of the object.

The values of the mean spins can be computed efficiently using the Metropolis algorithm [21], which will be discussed in Section 3.1. But first, definitions for the potential functions will be given.

2.2 Definition of the Potentials

For computation of the mean spins, the potentials $L_i$ and $B_{ij}$ need to be known. We want to base the potentials on properties of the elements $\phi_i$. Therefore, it is assumed that a priori knowledge of every element $\phi_i$ is available in the form of a local probability $P_i = P(s_i = 1)$. Information on the interaction between a pair of elements $\phi_i$ and $\phi_j$ should be available in the form of a bilocal probability $P_{ij} = P(s_i = 1 \mid s_j = 1)$. A method for the determination of these a priori probabilities is given in Section 3.2. Since every spin can only be in one of two states, we have that $P(s_i = 0) = 1 - P_i$ and, likewise, $P(s_i = 0 \mid s_j = 1) = 1 - P_{ij}$. Note that local probabilities have a single index, whereas bilocal probabilities are doubly indexed.

If there is no interaction between the elements, (2) reduces to

E(\{s_i\}) = \sum_i L_i s_i.

By (1), the probability that the system is in state $\{s_i\}$ then equals

P(\{s_i\}) = \frac{1}{Z} \prod_i e^{-\beta L_i s_i},


showing that, without interactions, the elements are classified independently of each other. Because of the independence, we can consider the whole system as a set of N systems, each consisting of one spin. The Gibbs-Boltzmann-distribution for the system concerning $s_i$ is then given by

P(s_i) = \frac{1}{Z_i} e^{-\beta L_i s_i},

and we find, for the a posteriori probability,

\langle s_i \rangle = \frac{0 \cdot e^{-\beta L_i \cdot 0} + 1 \cdot e^{-\beta L_i \cdot 1}}{Z_i} = \frac{1}{Z_i} e^{-\beta L_i},    (4)

with $Z_i = 1 + e^{-\beta L_i}$, the partition function for the system corresponding to $s_i$.

Without interaction between the elements, the energy should be defined in such a way that the a posteriori probability equals the a priori probability, i.e.,

\langle s_i \rangle = P_i.    (5)

Substitution of (5) in (4) and solving for $L_i$ yields

L_i = -\frac{1}{\beta} \log_e \frac{P_i}{1 - P_i}.    (6)

With the above equation, we have expressed the local

potential function in terms of the a priori local probabilities.

In the absence of bilocal interaction, the system is calibrated

in such a way that it classifies the elements according to

their a priori probabilities.
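Equation (6) is a (negatively scaled) logit of the prior. A small sketch (illustrative, not the authors' code; the value $\beta = 1$ is an arbitrary choice here) verifies the calibration (5) for a single noninteracting spin:

```python
import math

def local_potential(p, beta=1.0):
    """Local potential (6): L_i = -(1/beta) * log(p / (1 - p))."""
    return -math.log(p / (1.0 - p)) / beta

def single_spin_mean(Li, beta=1.0):
    """Posterior mean (4) of one noninteracting spin: e^{-beta*L_i} / Z_i."""
    w = math.exp(-beta * Li)
    return w / (1.0 + w)

# Calibration check (5): the posterior mean reproduces the prior.
print(single_spin_mean(local_potential(0.8)))   # 0.8 (up to rounding)
```

The bilocal potential of (8) below is obtained from $P_{ij}$ with the same formula.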

In analogy with the demand of (5), we would like the system to be calibrated in such a way that, if only a single pair of sites $\phi_i$ and $\phi_j$ interact and there is no contribution of the local potential, the a posteriori conditional probability for $s_i = 1$ given $s_j = 1$ equals the a priori conditional probability $P_{ij} = P(s_i = 1 \mid s_j = 1)$:

\langle s_i \mid s_j = 1 \rangle = P_{ij},    (7)

where $\langle s_i \mid s_j = 1 \rangle$ is the a posteriori conditional probability. Under these conditions, we find that

\langle s_i \mid s_j = 1 \rangle = \frac{1}{Z_{ij}} e^{-\beta B_{ij}},

with $Z_{ij} = 1 + e^{-\beta B_{ij}}$. Note that there are only two states, viz. $s_i = 0 \wedge s_j = 1$ and $s_i = 1 \wedge s_j = 1$. Solving for $B_{ij}$ in the above equations gives

B_{ij} = -\frac{1}{\beta} \log_e \frac{P_{ij}}{1 - P_{ij}}.    (8)

In order to derive expressions for $L_i$ and $B_{ij}$ in terms of a priori knowledge, their contributions to the energy functional have been investigated in isolation. Equation (5) only holds true when there is no bilocal interaction in the spin system, and (7) is only valid in a two-spin system without a local potential field. When the potentials as defined in (6) and (8) are combined into (2), the expressions in (5) and (7) are perturbed (the perturbation may be very large). In particular, the a posteriori probability in (5) changes into

\langle s_i \rangle = P_i + \delta_i,    (9)

where the sign and magnitude of $\delta_i$ reflect the outcome of the competition between the local and bilocal contributions to the energy.

To get a feeling for how the interactions between the spins influence the system, we consider the following cases: If $P_{ij} > \frac{1}{2}$, then $B_{ij} < 0$ and the selection of both $s_i = 1$ and $s_j = 1$ is favored by the system since that will lower the energy. For $P_{ij} < \frac{1}{2}$, putting at least one of the spins to 0 will be preferred since, in that case, the energy is not increased. An element that has $P_i < \frac{1}{2}$ and, thus, $L_i > 0$, would like to have $s_i = 0$. However, if one of $\phi_i$'s neighbors, say $\phi_j$, is a foreground element and it has strong interaction with $\phi_i$, then, given that $s_j = 1$, $B_{ij} < 0$, which favors $s_i = 1$. Clearly, there will be competition between the local and bilocal contributions to the energy. If the interaction is strong enough, the neighbor will cause the locally weak element to become part of the foreground object, i.e., it increases the mean spin of $s_i$. If $\phi_i$ is a locally strong element, i.e., $P_i > \frac{1}{2}$, and it has weak interactions with its neighbors, then this will encourage the spins of the neighbors to be set to zero and the mean spin of $\phi_i$ will not change with respect to the local a priori probability $P_i$.

Note that, with the definitions for the potentials in (6) and (8), the parameter $\beta$ drops out in (1).

3 IMPLEMENTATION

In this section, we will start with discussing the Metropolis algorithm, a method for finding the expected values of the state variables of a system that is characterized by the Gibbs-Boltzmann-distribution. After this discussion, determination of the a priori probabilities $P_i$ and $P_{ij}$ will be dealt with.

3.1 The Metropolis Algorithm

The Metropolis algorithm [21] is a Monte-Carlo method for calculating the expected values of the state variables of a system that is subject to the Gibbs-Boltzmann-distribution.

At the start of the algorithm, the system is in a certain state, quite probably not the equilibrium state. The algorithm begins by choosing an element at random and reversing its spin. The reversal changes the energy of the system. If the change of energy $\Delta E < 0$, the reversal of the spin is accepted; if $\Delta E > 0$, the reversal is accepted with probability $\exp(-\beta \Delta E)$. The remaining $N - 1$ elements are checked in random order and the system changes its state with the same rules as before. This procedure is referred to as a Metropolis step. The Metropolis step is repeated M times, where M has to be large enough in order to represent the system's (thermal) equilibrium.

The mean of the spins is found by

\langle s_i \rangle = \frac{1}{M} \sum_{k=1}^{M} s_i(k),    (10)

where $s_i(k)$ is the value of the spin after the kth Metropolis step. In the limit $M \to \infty$, (10) converges to the true expected value.

With the choices for the potentials in this paper, knowledge of the energy itself is not necessary; only $\Delta E(s_i \mapsto \bar s_i) = (2 \bar s_i - 1)\left(L_i + \sum_j B_{ij} s_j\right)$ is needed, where $\Delta E(s_i \mapsto \bar s_i)$ is the change of the energy due to the spin reversal of element $\phi_i$ and $\bar s_i = 1 - s_i$.
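The procedure can be sketched as follows (an illustrative implementation, not the authors' code; the toy potentials are hypothetical):

```python
import math
import random

def metropolis_means(L, B, beta=1.0, steps=2000, seed=0):
    """Estimate <s_i> via (10): average the spins over M Metropolis steps."""
    rng = random.Random(seed)
    n = len(L)
    s = [rng.randint(0, 1) for _ in range(n)]
    totals = [0.0] * n
    for _ in range(steps):
        for i in rng.sample(range(n), n):   # visit elements in random order
            flipped = 1 - s[i]
            # Energy change of the proposed spin reversal (see the text above).
            dE = (2 * flipped - 1) * (L[i] + sum(B[i][j] * s[j]
                                                 for j in range(n)))
            if dE < 0 or rng.random() < math.exp(-beta * dE):
                s[i] = flipped
        totals = [t + si for t, si in zip(totals, s)]
    return [t / steps for t in totals]

# A strongly negative local potential drives the mean spin toward 1,
# a strongly positive one toward 0.
means = metropolis_means([-5.0, 5.0], [[0.0, 0.0], [0.0, 0.0]])
```

In practice, the initial (pre-equilibrium) steps are often discarded before averaging; the sketch above averages over all steps for brevity.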

3.2 Determination of the A Priori Probabilities

The main issue in the implementation of the proposed method is the determination of the a priori probabilities $P_i$ and $P_{ij}$. Once those are known, the potentials from (6) and (8) can be computed. For the determination of $P_i$ and $P_{ij}$, two approaches are possible:

1. Prior information suggests an analytical functional based on properties of the elements.

2. The probability densities are estimated from example data based on properties of the elements.

In this paper, the second option is taken and the probabilities $P_i$ and $P_{ij}$ have been learned from manually labeled example data. In the example data, every primitive $\phi_i$ is given a label "true" (1) or "false" (0), which serves as a reference. To estimate the local density, a feature vector $\lambda_i$ is introduced for every element $\phi_i$ in the example data so that $P_i$ can be estimated as a function of the features. The set of $\lambda_i$s combined with the reference labeling is the local training set.

For the bilocal probabilities, a training set is built following a similar rationale. Recall that $P_{ij}$ is the conditional probability that $s_i = 1$ given that $s_j = 1$. This means that we train with those $\phi_i$ for which the neighbor element $\phi_j$ in the local reference set is labeled as "true." The target in the bilocal reference set must be set to 1 if $\phi_i$ is labeled as "true" in the local reference set and to 0 if it is labeled as "false." For every pair $\phi_i$ and $\phi_j$ which appears in the bilocal reference set, a vector $\lambda_{ij}$ is computed which stores the interaction features. This enables the estimation of $P_{ij}$ as a function of the features. These vectors, together with the bilocal reference set, form the bilocal training set.
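The construction can be sketched as follows; the data layout (label dicts, neighbor lists, precomputed pair features) is hypothetical, but the selection rule is the one described above:

```python
def bilocal_training_set(labels, neighbors, pair_features):
    """Collect training pairs for P_ij = P(s_i = 1 | s_j = 1): keep only
    pairs whose neighbor is labeled 'true'; the target is the label of
    the first element of the pair."""
    X, y = [], []
    for i, js in neighbors.items():
        for j in js:
            if labels[j] == 1:                    # neighbor must be foreground
                X.append(pair_features[(i, j)])   # interaction feature vector
                y.append(labels[i])               # target for element i
    return X, y

# Toy data: three primitives, two of them labeled foreground.
labels = {0: 1, 1: 0, 2: 1}
neighbors = {1: [0], 2: [0], 0: [2]}
pf = {(1, 0): [0.3], (2, 0): [0.9], (0, 2): [0.8]}
X, y = bilocal_training_set(labels, neighbors, pf)   # y == [0, 1, 1]
```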

The training sets are used to train a local and a bilocal classifier to estimate the probability densities for $P_i$ and $P_{ij}$. An example of a classifier that is capable of accomplishing this task is a feed-forward neural network [3, chapter 6]. Because training feed-forward neural networks can be difficult and many parameters need to be adjusted, we have chosen to use the k-Nearest-Neighbor (kNN) classifier for approximating the probability densities. There exist optimized and fast implementations of kNN-classifiers, see [2]. The training of a kNN-classifier is extremely simple: all feature vectors with their corresponding labeling are stored. The probability P that a feature vector with unknown label (a query point) has a label equal to 1 is estimated by inspecting the k closest neighbors of this vector in feature space. Suppose that n of those neighbors have a label equal to 1; then [5]

P = \frac{n}{k}.    (11)

To determine which feature vectors are closest to the query point, the Euclidean distance is used in this work. Because kNN-classifiers are sensitive to scaling between the different features, each feature is normalized independently to zero mean and unit variance. The parameters for this linear transformation are obtained from the training data.
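A minimal sketch of this estimator (illustrative, not the optimized implementation of [2]): normalize every feature with training statistics, then apply (11):

```python
import math

def knn_probability(train_X, train_y, query, k=3):
    """Estimate P(label = 1) for a query point as n/k, cf. (11), after
    normalizing each feature to zero mean, unit variance (training stats)."""
    n, dim = len(train_X), len(query)
    mean = [sum(x[d] for x in train_X) / n for d in range(dim)]
    var = [sum((x[d] - mean[d]) ** 2 for x in train_X) / n
           for d in range(dim)]
    std = [math.sqrt(v) if v > 0 else 1.0 for v in var]
    norm = lambda x: [(x[d] - mean[d]) / std[d] for d in range(dim)]
    q = norm(query)
    # k nearest training vectors by Euclidean distance in normalized space.
    nearest = sorted((math.dist(norm(x), q), y)
                     for x, y in zip(train_X, train_y))[:k]
    return sum(y for _, y in nearest) / k
```

With well-separated one-dimensional clusters, e.g. `knn_probability([[0.], [1.], [2.], [10.], [11.], [12.]], [1, 1, 1, 0, 0, 0], [1.5])`, the estimate is 1.0.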

To summarize, by constructing feature sets for the local

and bilocal reference sets, a local and a bilocal kNN-classifier

can be trained. These classifiers enable the estimation of $P_i$ and $P_{ij}$ for unlabeled feature sets using (11). The classifiers

can be regarded as “look-up” tables for the local and bilocal

probabilities.

To reduce the number of bilocal probabilities that have to

be learned and to avoid long-range interaction, a neighbor-

hood or “clique” can be used, in which element $\phi_i$ only

interacts with a limited number of other elements. Such a

neighborhood also reduces the number of computations in

the Metropolis algorithm since fewer neighbors have to be

taken into account.

4 EXAMPLES

In this section, we will illustrate the proposed method in

two examples. The first example deals with synthetic data

and shows grouping of line elements into a cord. In the

second example, real-world data is used in the detection of

the vasculature in retinal fundus images.

4.1 Grouping Line Elements into a Cord

As a first example, we experiment with cords consisting of line elements. Ten training images of size 400 × 400 pixels are generated, which contain 2,020 line elements of which

20 form a cord (an example is shown in Fig. 1). All line elements have a length of 10.0 ± 1.0 pixels (all distributions to generate the training and test data in this section are uniform). The orientation of the line elements that form the background varies between 0 degrees and 360 degrees. The cords have a random orientation $\theta$ and the orientation of their constituting line elements varies between $\theta - 1.8$ degrees and $\theta + 1.8$ degrees.

Only one local feature is taken into account: the mean $\mu_i$ of the gray values of the line elements, which is 5.8 ± 4.2 for background elements and 10.0 ± 0.2 for foreground elements. These values cause some overlap between the distributions of the local features. With these settings, a fair number of foreground elements will be selected as background in local classification.

For the bilocal a priori probabilities, five bilocal features are computed, with three based on the geometry of the line elements and two on the local features. The latter two are the sum and the absolute difference of the $\mu$s of every considered pair. The geometrical features are a measure for

considered pair. The geometrical features are a measure for

distance (distance between the closest end-points), a

Grouping Line Elements into a Cord

STAAL ET AL.: A TRAINED SPIN-GLASS MODEL FOR GROUPING OF IMAGE PRIMITIVES 1175

Fig. 1. Example of a training image. The elements that constitute the

cord are in black, while the background elements are in gray.

Page 5

measure for mutual orientation (inner product between the

unit vectors aligned with the line elements), and an

alignment measure, see Fig. 2. Note that a parallel

displacement of one line element with respect to another

does not change their mutual orientation.
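For concreteness, the three geometric features can be sketched as follows (an illustrative implementation; line elements are assumed to be given as pairs of end-points):

```python
import math
from itertools import product

def _unit(v):
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)

def geometric_features(a, b):
    """Distance, mutual orientation, and alignment of two line elements
    a and b, each given as a pair of end-points (cf. Fig. 2)."""
    pairs = list(product(a, b))
    dists = [math.dist(p, q) for p, q in pairs]
    d = min(dists)                                 # closest end-points
    na = _unit((a[1][0] - a[0][0], a[1][1] - a[0][1]))
    nb = _unit((b[1][0] - b[0][0], b[1][1] - b[0][1]))
    orient = abs(na[0] * nb[0] + na[1] * nb[1])    # |n_a . n_b|
    # Unit vector r along the line between the two *other* end-points.
    pa, pb = pairs[dists.index(d)]
    oa = a[0] if pa == a[1] else a[1]
    ob = b[0] if pb == b[1] else b[1]
    r = _unit((ob[0] - oa[0], ob[1] - oa[1]))
    align = 0.5 * (abs(r[0] * na[0] + r[1] * na[1]) +
                   abs(r[0] * nb[0] + r[1] * nb[1]))
    return d, orient, align

# Collinear elements: perfectly oriented and aligned.
print(geometric_features(((0, 0), (1, 0)), ((2, 0), (3, 0))))  # (1.0, 1.0, 1.0)
# Parallel but displaced elements: same orientation, zero alignment.
print(geometric_features(((0, 0), (1, 0)), ((0, 1), (1, 1))))  # (1.0, 1.0, 0.0)
```

The second call illustrates the remark above: parallel displacement leaves the mutual orientation unchanged, while the alignment measure drops to zero.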

To decrease computational costs and to avoid long range

interaction, only the 10 closest neighbors are taken into

account.

The method is tested with 10 test images that are constructed in the same way as the training data. The kNN-classifiers needed to approximate $P_i$ and $P_{ij}$ are both used with $k = 11$. After classification, a line element is classified as foreground if the mean of its spin is larger than 0.5 and as background otherwise. A local classification for one of the images, shown in Fig. 3a, is presented in Fig. 3b. Notice that about 50 percent of the cord is classified as background, as is to be expected from the distributions for the means.

Fig. 3c shows the results after bilocal classification. The Metropolis algorithm is run 1,000 times. All but one of the elements of the cord are classified as foreground.

To evaluate the result of the grouping, several measures have been computed: the number of true positives TP (elements correctly classified as foreground), the number of true negatives TN (elements correctly classified as background), the number of false positives FP (elements incorrectly classified as foreground), and the number of false negatives FN (elements incorrectly classified as background). Their values are listed in Table 1. The table shows clearly that the classification result improves after grouping: Instead of an error of 55.5 percent in foreground classification, an error of 12.5 percent is obtained.

Finally, we investigated how much the means of the spins changed on average after the bilocal classification, cf. (9). The changes for the elements classified as foreground show an increase of 0.245 on average for the bilocal a posteriori probabilities with respect to the local a priori probabilities. For the background elements, no changes are found.

4.2 Segmenting Ridges in Retinal Fundus Images

In this section, the method is tested on two-dimensional medical images of the retina of the human eye; for an example, see Fig. 4. These images, also known as fundus

images, are acquired by making photographs of the back of

the eye. The image processing task is to delineate the vessel

structure.

Since image ridges are natural indicators of vessels, we

start our analysis with ridge detection. In the Appendix, a

short overview is given on ridge detection in two-dimen-

sional gray value images. For a more extensive discussion on

this subject, see [6]. The ridges of Fig. 4 are shown in Fig. 5.

The problem of detecting the vessels in Fig. 4 is thus

reduced to detecting which ridge pixels in Fig. 5 delineate

vessels. It is obvious from the abundance of ridges in Fig. 5

that this representation is still suboptimal.

To improve the representation, the ridge point sets are

fragmented into convex subsets. Each of these convex

subsets represents a line segment. The set of line segments obtained in this way is the basic “grouping set” of geometrical image primitives.

Fig. 2. (a) The shortest distance between the end-points of two line elements is taken as the distance d between the elements. Note that there are four distances between the end-points of two elements. (b) The angle between two line elements is characterized by the absolute value of the inner product of the unit vectors $\hat n_i$ and $\hat n_j$ that are aligned with the line elements. (c) A (symmetric) measure for alignment is found by looking for the end-points which are closest to each other and forming a vector $\hat r$ of unit length along the line between the two other end-points. Note that those end-points are not necessarily the end-points with the longest distance between the two line elements. The alignment measure is now defined as the mean of the absolute values of the inner product of $\hat r$ with $\hat n_i$ and $\hat n_j$: $\frac{1}{2}(|\hat r \cdot \hat n_i| + |\hat r \cdot \hat n_j|)$.

Fig. 3. (a) Input image. The gray values denote the value of $\mu_i$. Dark means higher value. (b) Locally classified image. The gray values of the elements measure the probability on spin up (darker denotes higher probability, lighter denotes lower probability). (c) As in (b), but now with bilocal interaction added. Note that the spins of the elements in the cord have become stronger.

A convex point set is a set of points such that, with every couple of points that belong to the set, all points that lie in-between these two points on the line connecting them also belong to the set. In more formal terms:

Definition 1. A point set R is convex if, for all points $x \in R$, $y \in R$, and for a scalar $\lambda \in [0, 1]$, the point $z(\lambda) = \lambda x + (1 - \lambda) y \in R$.
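A rough pixel-level check of Definition 1 can be sketched as follows (illustrative only; it samples rounded points along each connecting segment rather than tracing exact digital lines):

```python
from itertools import combinations

def is_convex_pixel_set(points):
    """Approximate Definition 1 on a pixel set: every rounded sample z(t)
    on the segment between two member points must also be a member."""
    pts = set(points)
    for (x1, y1), (x2, y2) in combinations(pts, 2):
        steps = 2 * max(abs(x2 - x1), abs(y2 - y1))
        for k in range(1, steps):
            t = k / steps
            z = (round(x1 + t * (x2 - x1)), round(y1 + t * (y2 - y1)))
            if z not in pts:
                return False
    return True

print(is_convex_pixel_set({(0, 0), (1, 0), (2, 0)}))   # True
print(is_convex_pixel_set({(0, 0), (2, 0)}))           # False: (1, 0) missing
```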

A slightly different form of this basic definition is used, in which the directional information of the ridges is exploited (see the Appendix). This definition replaces Euclidean convexity by geodesic convexity. The resulting sets are called affine convex sets.

The mechanism to obtain affine convex sets is a simple region growing algorithm which compares an already grouped ridge pixel with ungrouped pixels in a neighborhood of radius $\varepsilon_c$, where the subscript c stands for connectivity. The condition for grouping a grouped and a candidate pixel within the neighborhood is based on two comparisons:

1. Is the direction of the ridges on which the pixels are found similar?

2. If so, are the pixels on the same ridge or are they on parallel ridges?

The first question can be checked by taking the scalar

product of the principal eigenvectors of the Hessian matrix

at the location of the ridge pixels. The principal eigenvectors

are perpendicular to the ridges (Appendix). If the orienta-

tions are similar, the scalar product will be close to 1. The

second question can be checked by computing the unit-length normalized vector $\hat r$ between the locations of the two pixels under consideration and taking the vector

product between this vector and the principal direction of

the grouped ridge pixel. If the pixels are on the same

segment, the vector product will be close to 1. See also Fig. 6

for the construction of the sets.

The following inequalities are checked:

\| x_g - x_u \| \le \varepsilon_c,    (12)

| \hat v(x_g) \cdot \hat v(x_u) | \ge \varepsilon_o,    (13)

| \hat v(x_g) \wedge \hat r | \ge \varepsilon_p,    (14)


TABLE 1

Results from the Experiments of Section 4.1

The total number of background elements considered is 20,000 and the total number of foreground elements 200. The first row shows the true

positives. The second row shows the true negatives. The false positives

and false negatives are given in row three and four, respectively. The

fifth row shows how much the means of the spins of the foreground

elements increase on average because of the grouping (see (9)). The

last row shows the same for the background elements.

Fig. 4. An image of the fundus of the human retina. The field of view is

approximately 540 pixels.

Fig. 5. The ridges (black) of Fig. 4 obtained at scale $t = 1.0$ pixel². Note the large response of the ridge detector with respect to the noise in the background.

Fig. 6. The dark curved lines are two ridges. The diameter of the disk is $\varepsilon_c$. $v_g$ is the eigenvector belonging to a grouped pixel and $v_1$ and $v_2$ are the eigenvectors of still ungrouped pixels. The vectors $r_1$ and $r_2$ are unit vectors pointing from the grouped pixel to the ungrouped pixels. The pixel that belongs to the same ridge will be added to the group because it satisfies the conditions in (12), (13), and (14). The pixel on the parallel ridge does not satisfy condition (14) and will not be grouped.


where the subscript g stands for grouped, u for ungrouped,

o for orientation, and p for parallelism. The $\varepsilon$s determine the

measure for similarity. For the other symbols, see Fig. 6.
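The three tests can be sketched as follows (an illustrative check, not the authors' code; principal directions are unit vectors perpendicular to the ridge, the pixels are assumed distinct, and the default thresholds are the settings quoted in Section 4.2):

```python
import math

def may_group(xg, vg, xu, vu, eps_c=3.0, eps_o=0.98, eps_p=0.98):
    """Check the grouping conditions (12)-(14) for a grouped ridge pixel xg
    (principal direction vg) against an ungrouped pixel xu (direction vu)."""
    d = math.dist(xg, xu)
    if d > eps_c:                                    # (12): within neighborhood
        return False
    if abs(vg[0] * vu[0] + vg[1] * vu[1]) < eps_o:   # (13): similar orientation
        return False
    r = ((xu[0] - xg[0]) / d, (xu[1] - xg[1]) / d)   # unit vector between pixels
    # (14): r runs along the ridge, so its cross product with the
    # perpendicular principal direction is close to 1 on the same ridge.
    return abs(vg[0] * r[1] - vg[1] * r[0]) >= eps_p

# Ridge along the x-axis, principal direction (0, 1) perpendicular to it.
print(may_group((0, 0), (0, 1), (2, 0), (0, 1)))   # True: same ridge
print(may_group((0, 0), (0, 1), (0, 2), (0, 1)))   # False: parallel ridge
```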

Using these techniques, the convex sets of the ridges in

Fig. 5 are displayed in Fig. 8a. These convex sets have

been used for local and bilocal classification. For this

purpose, 30 fundus images have been taken for which the

convex sets were computed. The ridges are detected at a scale $t = 1.0$ pixel². For the convex sets, the following settings are used: $\varepsilon_c = 3.0$ pixels, $\varepsilon_o = 0.98$, and $\varepsilon_p = 0.98$. This resulted in 106,206 sets, of which, after manual labeling, 28,501 turned out to be marked as vessel.

The 30 images are divided into a training set of 15 images and a test set of the remaining 15 images. To approximate $P_i$ and $P_{ij}$, local and bilocal features are computed and kNN-classifiers, using the corresponding training sets, are built. To avoid long range interactions and to reduce computational costs, for $P_{ij}$, only the 10 closest neighbors are taken into account.

The following local features are computed for every

convex set i:

1. The mean $\mu_i$ of the image gray values at the $M_i$ pixel locations of the convex set,

\mu_i = \frac{1}{M_i} \sum_m L(x_{m,i}, y_{m,i}),

where L denotes the gray value image and $(x_{m,i}, y_{m,i})$ denotes the pixel locations of the ith convex set.

2. A measure for the width of vessels is computed in the following way: For every pixel $(x_{m,i}, y_{m,i})$ in the convex set, the principal direction $\hat v_{m,i}$ is known (see the Appendix and discussion above). The principal directions are perpendicular to the ridges, i.e., perpendicular to the vessels. One-dimensional gray value profiles centered at $(x_{m,i}, y_{m,i})$ and in the direction of $\hat v_{m,i}$ are extracted from the image. In every profile, the edges on the left and right-hand side of $(x_{m,i}, y_{m,i})$ are detected. The distance between the locations in the profile with strongest edge response on the left and right side is taken as the width $\delta_{m,i}$ for profile m. The measure $w_i$ for the width is the mean of the widths of all profiles,

w_i = \frac{1}{M_i} \sum_m \delta_{m,i}.

3.

A measure ?ifor the edge strength in the convex set

is computed as follows: The response of the

strongest edges on the left and right side of the

profiles (see previous item), ?m;i and ?m;i, respec-

tively, are averaged, yielding

?i¼

1

Mi

X

m

?m;iþ ?m;i:

4.

The curvature ?iof the convex set, defined as

?i¼

1

Mi? 1

X

Mi?1

m¼1

^ v vm;i? ^ v vm?1;i;

where ^ v vm;iis the principal direction corresponding

to pixel m of the convex set.
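As an illustration of the curvature feature, the mean dot product of successive unit principal directions can be computed in a few lines; the array layout below is an assumption for the sketch, not taken from the paper:

```python
import numpy as np

def curvature_feature(directions):
    """Mean dot product of successive unit principal directions,
    (1/(M-1)) * sum_m v_m . v_{m-1}: close to 1 for straight sets,
    smaller for curved ones. `directions` is an (M, 2) array."""
    v = np.asarray(directions, dtype=float)
    return float(np.sum(v[1:] * v[:-1]) / (len(v) - 1))

# A straight set of directions gives kappa = 1; a quarter turn
# spread over ten pixels gives a slightly smaller value.
straight = np.tile([1.0, 0.0], (10, 1))
angles = np.linspace(0.0, np.pi / 2, 10)
curved = np.stack([np.cos(angles), np.sin(angles)], axis=1)
kappa_straight = curvature_feature(straight)   # -> 1.0
kappa_curved = curvature_feature(curved)
```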

And, for the bilocal features between convex sets i and j, the following measures are taken:

1. The Euclidean distance between the closest endpoints of the sets (see also Fig. 2).

2. The sum of μ_i and μ_j (see Item 1 of the local features).

3. The absolute difference of μ_i and μ_j (see Item 1 of the local features).

4. The sum of ε_i and ε_j (see Item 3 of the local features).

5. The absolute difference of ε_i and ε_j (see Item 3 of the local features).

6. The sum of κ_i and κ_j (see Item 4 of the local features).

7. The absolute difference of κ_i and κ_j (see Item 4 of the local features).

8. The mutual orientation (see also Fig. 2).

9. The mutual alignment (see also Fig. 2).

The method is evaluated using the test set. All 15 data sets are classified, both locally and bilocally. The local and bilocal kNN-classifiers are used with k = 21 and 2,500 Metropolis steps are taken. The performance of the system is measured with ROC-curves [22]. An ROC-curve plots the fraction of convex sets that is falsely classified as vessel against the fraction that is correctly classified as vessel. The fractions are determined by setting a threshold on the mean of the spins and are defined as

\text{true positive fraction} = \text{sensitivity} = \frac{TP}{TP + FN},

\text{false positive fraction} = 1 - \text{specificity} = \frac{FP}{TN + FP}.

The closer an ROC-curve approaches the top left corner, the better the performance of the system. A single measure to quantify this behavior is the area under the curve, A_z, which is 1 for a perfect system. A system that makes random classifications has an ROC-curve that is a straight line through the origin with slope 1 and A_z = 0.5.
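The ROC construction just described can be sketched in a few lines: sweep a threshold over the mean spin values, record the two fractions, and integrate with the trapezoidal rule. The data below are synthetic:

```python
import numpy as np

def roc_points(scores, labels):
    """Sweep a threshold over the scores (here: mean spin values) and
    collect (false positive fraction, true positive fraction) pairs."""
    pts = []
    for t in np.concatenate(([-np.inf], np.unique(scores), [np.inf])):
        pred = scores > t
        tp = np.sum(pred & (labels == 1))
        fn = np.sum(~pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        tn = np.sum(~pred & (labels == 0))
        pts.append((fp / (fp + tn), tp / (tp + fn)))
    fpf, tpf = map(np.array, zip(*sorted(pts)))
    return fpf, tpf

def area_under_curve(fpf, tpf):
    """Trapezoidal integration of the ROC curve, i.e., A_z."""
    return float(np.sum(np.diff(fpf) * (tpf[1:] + tpf[:-1]) / 2.0))

# Perfectly separated scores give A_z = 1.
scores = np.array([0.1, 0.2, 0.8, 0.9])
labels = np.array([0, 0, 1, 1])
fpf, tpf = roc_points(scores, labels)
a_z = area_under_curve(fpf, tpf)   # -> 1.0
```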

For the test set, A_z = 0.851 is found for local classification and A_z = 0.881 for bilocal classification. The curves are plotted in Fig. 7a.

Another measure for evaluation is the accuracy of the system,

\text{accuracy} = \frac{TN + TP}{TN + TP + FP + FN},

which also depends on the threshold value.

The threshold for optimal accuracy can be estimated from the training set with leave-one-out experiments. Every image in the training set is classified using the 14 other images in the training set. For various threshold values, the accuracy is computed, and the threshold at which maximum accuracy is found is used for computing the accuracy of the test set. In Fig. 7b, the accuracy of the training set as a function of the threshold is given. Maximum accuracy for the local classification is found if ⟨s_i⟩ is thresholded at 0.5. For the bilocal classification, this value is 0.95. The accuracies of the test set

at these threshold values are 0.870 for local classification and 0.886 for bilocal classification. The points of maximum accuracy on the ROC-curves are plotted in Fig. 7a. In Table 2, an overview of all computed measures is given.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 27, NO. 7, JULY 2005
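The threshold selection by maximum training accuracy can be sketched as follows; the data are illustrative and the leave-one-out protocol itself is omitted:

```python
import numpy as np

def best_accuracy_threshold(scores, labels, thresholds=None):
    """Pick the threshold on the mean spin that maximizes
    accuracy = (TN + TP) / (TN + TP + FP + FN)."""
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)
    best_t, best_acc = 0.0, -1.0
    for t in thresholds:
        pred = scores >= t
        acc = np.mean(pred == (labels == 1))   # fraction of correct decisions
        if acc > best_acc:
            best_t, best_acc = t, acc
    return float(best_t), float(best_acc)

# Synthetic mean spins and labels; any threshold between the highest
# negative (0.3) and lowest positive (0.6) separates them perfectly.
scores = np.array([0.1, 0.3, 0.6, 0.9, 0.97, 0.2])
labels = np.array([0, 0, 1, 1, 1, 0])
t, acc = best_accuracy_threshold(scores, labels)
```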

An example of the classification results is shown in

Figs. 8c and 8d.

From Table 2, it can be concluded that the foreground

classification has benefited from the grouping method. The

total number of correctly classified vessel sets has increased.

It also causes an increase in the number of correctly

classified background elements, whereas the number of

wrongly classified elements is reduced.

Not only is the number of true positives increased, their a posteriori probabilities also increased on average by 0.138, which is the purpose of the method. The a posteriori probabilities of the background elements decreased on average by 0.016.

In Fig. 9a, the distribution of Δ_i, see (9), is shown in a histogram for the foreground elements (the width of the bins is 0.1). Fig. 9b shows the same for the background elements. The distributions are centered around zero, but skewed to the right for the foreground elements and to the left for the background elements. As Fig. 9a shows, 46.5 percent of the foreground elements have a change of the spins between −0.05 and 0.05. The area of the bins above Δ_i = 0 shows that 42.1 percent of the elements had an increase of their mean spin value, while the area of the bins below zero shows that 11.4 percent had a decrease of the mean spin value. For the background elements, 41.5 percent of the mean values changed only between −0.05 and 0.05. An increase of the mean values was found for 17.0 percent, versus a decrease for 41.5 percent of the elements.
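The bookkeeping behind these fractions is simple; a minimal sketch on synthetic spin changes (the band of 0.05 mirrors the bin convention above):

```python
import numpy as np

def spin_change_fractions(delta, band=0.05):
    """Split the mean spin changes delta_i into roughly unchanged
    (|delta| < band), increased (delta >= band), and decreased
    (delta <= -band) fractions."""
    delta = np.asarray(delta, dtype=float)
    unchanged = float(np.mean(np.abs(delta) < band))
    increased = float(np.mean(delta >= band))
    decreased = float(np.mean(delta <= -band))
    return unchanged, increased, decreased

# Four synthetic elements: one clear increase, one clear decrease,
# two essentially unchanged.
unchanged, increased, decreased = spin_change_fractions([0.2, 0.01, -0.1, 0.03])
```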

5 DISCUSSION

In this paper, a method is presented for grouping image

primitives based on local and bilocal features. The method

performs well on synthetic data. Compared to local

classification, the number of classification errors is reduced

and the confidence with which the elements are classified is

increased.

In the retinal fundus images, ROC-analysis shows that bilocal classification is better than local classification. The area under the curve increases from 0.851 to 0.881. For a threshold of 0.5 on ⟨s_i⟩ in the local case and of 0.95 in the bilocal case, the average increase of the posterior probabilities for correctly classified convex sets is 0.138. The number of true positives and true negatives increases, while the number of false positives and false negatives decreases.

It must be noted that the test on the fundus images is

meant to serve as an illustration. For a genuine evaluation

of the approach on real world images, the characteristic

features of the image at hand must be determined by

performing feature selection on a larger variety of features.

The method itself can be applied to a variety of grouping

and classification problems. In this study, we consider the

grouping of line elements and convex sets, but the grouping

of individual pixels, pixel sets, or other structures can be

studied as well. Of course, in those cases, other types of

features will be needed, but the basis of the algorithm

remains the same. Extension to higher dimensional images

is straightforward. The complexity will remain the same,

OðN2Þ for fully connected bilocal interactions, with N being

the number of elements. It is also possible to include higher

order interactions (trinary, n-nary), although for this

extension the complexity will increase as OðNnÞ.

As an alternative to the definition of B_ij based on the conditional probabilities, the joint probability can be used, which we will denote by Q_ij. In that case, by demanding that ⟨s_i s_j⟩ = Q_ij, the following formula, similar to (8), can be derived:

B_{ij} = -\frac{1}{\beta} \log_e \frac{3 Q_{ij}}{1 - Q_{ij}}.

The factor 3 in the numerator appears because the partition function now has to take four possible configurations into account: s_i = 0 ∧ s_j = 0, s_i = 1 ∧ s_j = 0, s_i = 0 ∧ s_j = 1, and s_i = 1 ∧ s_j = 1. For the estimation of Q_ij with a classifier, the training set has to be built by setting the label to "true" if both elements belong to the foreground and to "false" otherwise. Note that the training set for the joint probabilities will consist of more feature vectors than the training set for the conditional probabilities. This can increase the time for training and classification.

Fig. 7. (a) ROC curves for the local and bilocal classification of retinal fundus images. The area under the curve is 0.851 for the local curve and 0.881 for the bilocal curve. The dots on the ROC-curve are at the location of maximum accuracy. (b) Accuracy of the training set as a function of the threshold value. Maximum accuracy is found at ⟨s_i⟩ = 0.5 for local classification and at 0.95 for bilocal classification.
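The origin of the factor 3 can be made explicit. For an isolated pair whose only energy contribution is B_ij s_i s_j (a simplifying assumption; the local terms are dropped here), the Boltzmann average over the four configurations gives:

```latex
\langle s_i s_j \rangle
  = \frac{e^{-\beta B_{ij}}}{3 + e^{-\beta B_{ij}}} = Q_{ij}
\;\Longrightarrow\;
e^{-\beta B_{ij}}\,(1 - Q_{ij}) = 3\,Q_{ij}
\;\Longrightarrow\;
B_{ij} = -\frac{1}{\beta}\,\log_e \frac{3\,Q_{ij}}{1 - Q_{ij}} .
```

Only the configuration s_i = s_j = 1 carries the weight e^{-βB_ij}; the other three configurations each contribute 1 to the partition function, which is where the 3 comes from.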

Experiments we did with the joint probabilities show behavior similar to that with the conditional probabilities. The performance is better than with local classification only. However, in our examples, the conditional probabilities give better results than the joint probabilities. The time to classify one image increased from about 20 seconds to about 30 seconds (unoptimized code).

With the definitions we have given for B_ij, only pairs of spins which are both in the "on" state contribute to the energy in (2). This is in accordance with the goal that only grouped elements should contribute to the energy. Extra terms like C_ij s_i (1 − s_j) could be added to discriminate between foreground and background, or D_ij (1 − s_i)(1 − s_j) for grouping background elements. Here, C_ij and D_ij are the bilocal potentials for these cases, respectively. It depends on the application whether or not such terms make sense.

Finally, a MAP-estimation may be obtained by using simulated annealing methods [1], [8], [16]. In that case, the factor β⁻¹ should be removed from (6) and (8) so that the Metropolis algorithm becomes dependent on β. The Metropolis algorithm is started with a value of β close to zero. After the system has come to equilibrium, the value of β is increased and the Metropolis algorithm is executed again. This scheme is repeated until β is so large that the system is forced into a "frozen" configuration (β → ∞). However, the rate at which β is allowed to increase makes this algorithm very slow. Increasing β faster than allowed can yield unstable and nonunique results because the energy may have multiple minima.

Fig. 8. (a) The convex sets of the ridges of Fig. 5. Every grouped set has its own color. Note that sets which consist of one pixel have been removed. (b) Manually labeled convex sets for the vessels. (c) Locally classified convex sets. Sets with higher mean spin are shown darker. (d) Bilocally classified convex sets. Again, darker elements mean higher mean spin.
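A schematic of such an annealing schedule, with a toy energy rather than the paper's spin-glass functional:

```python
import numpy as np

def metropolis_sweep(spins, energy, beta, rng):
    """One Metropolis pass over binary spins: propose flipping each
    spin in turn and accept with probability min(1, exp(-beta * dE))."""
    for i in range(len(spins)):
        e_old = energy(spins)
        spins[i] = 1 - spins[i]
        d_e = energy(spins) - e_old
        if d_e > 0 and rng.random() >= np.exp(-beta * d_e):
            spins[i] = 1 - spins[i]   # reject: flip back
    return spins

def anneal(spins, energy, betas, sweeps_per_beta, rng):
    """Simulated annealing: run Metropolis at slowly increasing beta,
    freezing the system into a low-energy configuration."""
    for beta in betas:
        for _ in range(sweeps_per_beta):
            metropolis_sweep(spins, energy, beta, rng)
    return spins

# Toy energy favoring the all-ones state.
energy = lambda s: -float(np.sum(s))
rng = np.random.default_rng(1)
spins = rng.integers(0, 2, size=20)
result = anneal(spins, energy,
                betas=np.linspace(0.1, 10.0, 20),
                sweeps_per_beta=5, rng=rng)
```

At the final, large β, flips that raise the energy are almost never accepted, so the system settles near the all-ones minimum of this toy energy.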

APPENDIX

The ridge detection used in this paper is described in full

detail in [14]. Here, a short overview for two-dimensional

images is presented.

Ridges and valleys are defined as those points where the image has an extremum in the direction of the largest surface curvature. Mathematically, the points in the image L(x), with x = (x_1, x_2)^T, are searched where the first derivative of the luminance in the direction of the largest surface curvature changes sign.

The direction of largest surface curvature is the eigenvector v̂ of the matrix of second order derivatives of the image which has the largest absolute eigenvalue λ. This matrix is often referred to as the Hessian matrix. The first derivative of the image in the direction of v̂ is found by projecting the gradient of the image onto it. The sign of λ determines whether a valley (λ > 0) or a ridge (λ < 0) is found.

Because taking derivatives of discrete images is an ill-posed operation, they are taken at a scale t using the Gaussian scale-space technique (see, e.g., [7] and references therein). The main idea is that the image derivatives can be taken by convolving the image with derivatives of a Gaussian,

\frac{\partial^i L(x, t)}{\partial x_j^i} = \frac{1}{2\pi t} \int_{x' \in \mathbb{R}^2} \frac{\partial^i e^{-\|x - x'\|^2 / (2t)}}{\partial x_j^i} \, L(x') \, dx', \qquad (15)

where x_j is the image coordinate with respect to which the derivative is taken. Mixed derivatives are computed by taking mixed derivatives of the Gaussian kernel.

It is now possible to define a scalar field φ(x, t) over the image that takes the value −1 for valleys, 1 for ridges, and 0 elsewhere, as follows:

\phi(x, t) = -\tfrac{1}{2} \, \mathrm{sign}\big(\lambda(x, t)\big) \, \Big| \mathrm{sign}\big( g(x + \delta \hat{v}, t) \cdot \hat{v} \big) - \mathrm{sign}\big( g(x - \delta \hat{v}, t) \cdot \hat{v} \big) \Big|, \qquad (16)

where the gradient vector g(x, t) is defined as ∇L(x, t), λ(x, t) is the largest eigenvalue by absolute value of the Hessian matrix H(x, t) = ∇∇^T L(x, t), and v̂(x, t) is the unit-length normalized eigenvector belonging to that eigenvalue. In (16), v̂ is evaluated at (x, t). The parameter δ is the spatial accuracy with which the point-sets are detected. In the continuous case, the limit δ → 0 is taken but, in the discrete pixel case, δ = 1.0 pixel is a natural choice.

Fig. 5 shows an example of ridge detection on a fundus image (only valleys are shown).
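A compact illustration of this detector (not the authors' code): Gaussian derivatives are computed with scipy.ndimage, and the offset gradients g(x ± δv̂) are replaced by a first-order Taylor step, g·v̂ ± δ v̂ᵀHv̂ = g·v̂ ± δλ, instead of resampling, to keep the sketch short:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ridge_valley_field(image, t=1.0, delta=1.0):
    """Sketch of (16): +1 on ridges, -1 on valleys, 0 elsewhere.
    Gaussian derivatives at scale t (sigma = sqrt(t))."""
    s = np.sqrt(t)
    lx = gaussian_filter(image, s, order=(0, 1))    # dL/dx (axis 1)
    ly = gaussian_filter(image, s, order=(1, 0))    # dL/dy (axis 0)
    lxx = gaussian_filter(image, s, order=(0, 2))
    lxy = gaussian_filter(image, s, order=(1, 1))
    lyy = gaussian_filter(image, s, order=(2, 0))

    # Per-pixel Hessian and its eigenvector with largest |eigenvalue|.
    hess = np.stack([np.stack([lxx, lxy], axis=-1),
                     np.stack([lxy, lyy], axis=-1)], axis=-2)
    eigvals, eigvecs = np.linalg.eigh(hess)
    idx = np.argmax(np.abs(eigvals), axis=-1)
    lam = np.take_along_axis(eigvals, idx[..., None], axis=-1)[..., 0]
    v = np.take_along_axis(eigvecs, idx[..., None, None], axis=-1)[..., 0]

    # sign(g(x +/- delta v) . v), with the offset gradient approximated
    # by g.v +/- delta * v'Hv and v'Hv = lambda for a unit eigenvector.
    g_dot_v = lx * v[..., 0] + ly * v[..., 1]
    plus = np.sign(g_dot_v + delta * lam)
    minus = np.sign(g_dot_v - delta * lam)
    return -0.5 * np.sign(lam) * np.abs(plus - minus)

# A parabolic trough L = (x - 10)^2 is a valley along the line x = 10.
xx = np.arange(21, dtype=float)
img = np.tile((xx - 10.0) ** 2, (21, 1))
phi = ridge_valley_field(img)
```

On this synthetic image the field is −1 on the valley line and 0 away from it, matching the sign convention of (16).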

ACKNOWLEDGMENTS

This work was carried out in the framework of the NWO

research project STW-UGN/4496.

REFERENCES

[1] E. Aarts and J. Korst, Simulated Annealing and Boltzmann Machines. John Wiley and Sons, 1989.
[2] S. Arya, D.M. Mount, N.S. Netanyahu, R. Silverman, and A.Y. Wu, "An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions," J. ACM, vol. 45, no. 6, pp. 891-923, 1998.
[3] C.M. Bishop, Neural Networks for Pattern Recognition. Oxford Univ. Press, 1995.


Fig. 9. Distribution of Δ_i for (a) the foreground and (b) the background elements.

TABLE 2
Results from the Experiments of Section 4.2

The thresholds used are 0.5 in the local case and 0.95 in the bilocal case. The first row shows A_z. The second row shows the number of true positives, the third row the true negatives, the fourth row the false positives, and the fifth row the false negatives. The accuracy, sensitivity, and specificity are displayed in rows six to eight. The ninth row shows how much the means of the spins of the foreground elements increase on average because of the grouping (see (9)). The last row shows the same for the background elements.


[4] B. Caputo and H. Niemann, "From Markov Random Fields to Associative Memories and Back: Spin-Glass Markov Random Fields," Proc. IEEE Workshop Statistical and Computational Theories of Vision, 2001.
[5] R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, second ed. New York: Wiley-Interscience, 2001.
[6] D. Eberly, Ridges in Image and Data Analysis. Kluwer Academic Publishers, 1996.
[7] L.M.J. Florack, Image Structure. Kluwer Academic Publishers, 1997.
[8] S. Geman and D. Geman, "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 6, no. 6, pp. 721-741, 1984.
[9] D.M. Greig, B.T. Porteous, and A.H. Seheult, "Exact Maximum A Posteriori Estimation for Binary Images," J. Royal Statistical Soc., Series B, vol. 51, no. 2, pp. 271-279, 1989.

[10] G. Guy and G. Medioni, “Inferring Global Perceptual Contours

from Local Features,” Int’l J. Computer Vision, vol. 20, no. 1/2,

pp. 113-133, 1996.

[11] L. Hérault and R. Horaud, "Figure-Ground Discrimination: A Combinatorial Optimization Approach," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, no. 9, pp. 899-914, Sept. 1993.

[12] J.J. Hopfield, "Neural Networks and Physical Systems with Emergent Collective Computational Abilities," Proc. Nat'l Academy of Sciences USA, vol. 79, pp. 2554-2558, 1982.

[13] S.N. Kalitzin, J.J. Staal, B.M. ter Haar Romeny, and M.A. Viergever,

“Image Segmentation and Object Recognition by Bayesian Group-

ing,” Proc. IEEE Int’l Conf. Image Processing, 2000.

[14] S.N. Kalitzin, J.J. Staal, B.M. ter Haar Romeny, and M.A. Viergever,

“A Computational Method for Segmenting Topological Point Sets

and Application to Image Analysis,” IEEE Trans. Pattern Analysis

and Machine Intelligence, vol. 23, no. 5, pp. 447-459, May 2001.

[15] G.N. Khan and D.F. Gillies, “Extracting Contours by Perceptual

Grouping,” Image and Vision Computing, vol. 10, no. 2, pp. 77-88,

1992.

[16] S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi, "Optimization by Simulated Annealing," Science, vol. 220, no. 4598, pp. 671-680, 1983.

[17] J. Kittler and J. Illingworth, "Relaxation Labelling Algorithms—A Review," Image and Vision Computing, vol. 3, no. 4, pp. 206-216, 1985.

[18] V. Kolmogorov and R. Zabih, "What Energy Functions Can Be Minimized via Graph Cuts?" IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 2, pp. 147-160, Feb. 2004.

[19] J. Marroquin, S. Mitter, and T. Poggio, "Probabilistic Solution of Ill-Posed Problems in Computational Vision," J. Am. Statistical Assoc., vol. 82, no. 397, pp. 76-89, 1987.

[20] J.L. Marroquin, “A Markovian Random Field of Piecewise Straight

Lines,” Biological Cybernetics, vol. 61, pp. 457-465, 1989.

[21] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller,

and E. Teller, “Equation of State Calculations by Fast Computing

Machines,” J. Chemical Physics, vol. 21, no. 6, pp. 1087-1092, 1953.

[22] C.E. Metz, “Basic Principles of ROC Analysis,” Seminars in Nuclear

Medicine, vol. 8, no. 4, pp. 283-298, 1978.

[23] P. Parent and S.W. Zucker, “Trace Inference, Curvature Consis-

tency and Curve Detection,” IEEE Trans. Pattern Analysis and

Machine Intelligence, vol. 11, no. 8, pp. 823-839, Aug. 1989.

[24] P. Perona and W.T. Freeman, “A Factorization Approach to

Grouping,” Proc. European Conf. Computer Vision, vol. 1, pp. 655-

670, 1998.

[25] M. Pilu and R.B. Fisher, “Model-Driven Grouping and Recogni-

tion of Generic Object Parts from Single Images,” J. Robotics and

Autonomous Systems, vol. 21, pp. 107-122, 1997.

[26] T. Pun, "Electromagnetic Models for Perceptual Grouping," Advances in Machine Vision: Strategies and Applications, C. Archibald, ed., World Scientific Publishing Co., 1992.

[27] A. Robles-Kelly and E.R. Hancock, “Perceptual Grouping Using

Eigendecomposition and the EM Algorithm,” Proc. 12th Scandina-

vian Conf. Image Analysis, pp. 214-221, 2001.

[28] J. Shi and J. Malik, “Normalized Cuts and Image Segmentation,”

IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 8,

pp. 888-905, Aug. 2000.

[29] Y. Weiss, “Segmentation Using Eigenvectors: A Unifying View,”

Proc. IEEE Int’l Conf. Computer Vision, pp. 975-982, 1999.

[30] L.R. Williams and D.W. Jacobs, “Stochastic Completion Fields: A

Neural Model of Illusory Contour Shape and Salience,” Neural

Computation, vol. 9, no. 4, pp. 837-858, 1997.


Joes Staal received the MSc degree in applied

physics at the Technical University of Delft, the

Netherlands, in 1995 and the PhD degree in

medical image processing at the Image Sciences

Institute, University Medical Center Utrecht, the

Netherlands, in 2004. He is currently employed

by TNO-TPD, Delft, the Netherlands, as a

research associate at the department of Instru-

mentation and Information Systems.

Stiliyan N. Kalitzin graduated in nuclear and

high-energy physics at the University of Sofia,

Bulgaria, in 1981 and received the PhD degree

in theoretical physics in 1988. In 1990, he joined

the University of Utrecht, Institute of Theoretical

Physics, where he continued his work on

supersymmetry and supergravity and got in-

volved in research on cellular automata, neural

networks, and biological modelling. In 1992, he

was enrolled as a researcher in the Visual Systems Analysis group at the Academic Medical Center University Hospital in Amsterdam, where he contributed to the development and analysis of biological neural network models of human vision. From

1996 until 1999, he worked in the Image Sciences Institute at the

University Medical Center in Utrecht in the area of multiscale image

analysis, topological structure analysis of images, and perceptual

grouping. Since 1999, he has been with the Dutch Epilepsy Clinics

Foundation (SEIN) as head of the Medical Physics Department. His

current research interests are in the fields of nonlinear system dynamics,

signal and image processing, seizure prediction, closed-loop epileptic

seizure control, and large-scale neural network modeling of normal and

epileptic brain activity.

Max A. Viergever received the MSc degree in

applied mathematics in 1972 and the DSc

degree with a thesis on cochlear mechanics in

1980, both from the Delft University of Technol-

ogy. From 1972 to 1988, he was an assistant/

associate professor of applied mathematics at

this university. Since 1988, he has been a

professor and head of the Department of

Medical Imaging at Utrecht University, where

he became an adjunct professor of physics in

1989 and an adjunct professor of computer science in 1996. Since 1996,

he has been the scientific director of the Image Sciences Institute of the

University Medical Center Utrecht and, since 1998, the director of the

Graduate School for Biomedical Image Sciences (ImagO). He has been

a (co)author of more than 400 refereed scientific articles on biophysics

and medical imaging, guest editor of eight journal issues, (co)author/

editor of 15 books, and has served as supervisor of 60 PhD theses and

100 MSc theses. His research interests comprise all aspects of medical

imaging. He is a board member of IAPR, IPMI, and MICCAI, editor of the

book series Computational Imaging and Vision (Kluwer Academic

Publishers) editor-in-chief of the IEEE Transactions on Medical Imaging,

editor of the Journal of Mathematical Imaging and Vision, and has acted

as associate editor, guest editor, or editorial board member of nine more

international journals.

