
IEEE TRANSACTIONS ON MEDICAL IMAGING

Particle Filtering for Multiple Object Tracking in Dynamic Fluorescence Microscopy Images: Application to Microtubule Growth Analysis

Ihor Smal1, Katharina Draegestein2, Niels Galjart2, Wiro Niessen1, Erik Meijering1

1Biomedical Imaging Group Rotterdam and 2Department of Cell Biology, Erasmus MC – University Medical Center Rotterdam, P. O. Box 2040, 3000 CA Rotterdam, The Netherlands

Abstract

Quantitative analysis of dynamic processes in living cells by means of fluorescence microscopy imaging requires tracking of hundreds of bright spots in noisy image sequences. Deterministic approaches, which use object detection prior to tracking, perform poorly in the case of noisy image data. We propose an improved, completely automatic tracker, built within a Bayesian probabilistic framework. It better exploits spatiotemporal information and prior knowledge than common approaches, yielding more robust tracking also in cases of photobleaching and object interaction. The tracking method was evaluated using simulated but realistic image sequences, for which ground truth was available. The results of these experiments show that the method is more accurate and robust than popular tracking methods. In addition, validation experiments were conducted with real fluorescence microscopy image data acquired for microtubule growth analysis. These demonstrate that the method yields results that are in good agreement with manual tracking performed by expert cell biologists. Our findings suggest that the method may replace laborious manual procedures.

Index Terms

Bayesian estimation, particle filtering, sequential Monte Carlo, multiple object tracking, microtubule dynamics, fluorescence microscopy, molecular bioimaging.

I. INTRODUCTION

In the past decade, advances in molecular cell biology have triggered the development of highly sophisticated live cell fluorescence microscopy systems capable of in vivo multidimensional imaging of subcellular dynamic processes. Analysis of time-lapse image data has redefined the understanding of many biological processes, which in the past had been studied using fixed material. Motion analysis of nanoscale objects such as proteins or vesicles, or subcellular structures such as microtubules (Fig. 1), commonly tagged with green fluorescent protein (GFP), requires tracking of large and time-varying numbers of spots in noisy image sequences [1]–[7]. Nowadays, high-throughput experiments generate vast amounts of dynamic image data, which cannot be analyzed manually with sufficient speed, accuracy and reproducibility. Consequently, many biologically relevant questions are either left unaddressed, or answered with great uncertainty. Hence, the development of automated tracking methods, which replace tedious manual procedures and eliminate the bias and variability in human judgments, is of great importance.

Conventional approaches to tracking in molecular cell biology typically consist of two subsequent steps. In the first step, objects of interest are detected separately in each image frame and their positions are estimated based on, for instance, intensity thresholding [8], multiscale analysis using the wavelet transform [9], or model fitting [4]. The second step solves the correspondence problem between sets of estimated positions. This is usually done in a frame-by-frame fashion, based on nearest-neighbor or smooth-motion criteria [10], [11]. Such approaches are applicable to image data showing limited numbers of clearly distinguishable spots against relatively uniform backgrounds, but fail to yield reliable results in the case of poor imaging conditions [12], [13]. Tracking methods based on optic flow [14], [15] are not suitable because the underlying assumption of brightness preservation over time is not satisfied in fluorescence microscopy, due to photobleaching. Methods based on spatiotemporal segmentation by minimal cost path searching have also been proposed [16], [17]. Until now, however, these have been demonstrated to work well only for the tracking of a single object [16], or a very limited number of well-separated objects [17]. As has been observed [17], such methods fail when either the number of objects is larger than a few dozen, or when the object trajectories cross each other, which makes them unsuitable for our applications.

As a consequence of the limited performance of existing approaches, tracking is still performed manually in many laboratories worldwide. It has been argued [1] that in order to reach performance in temporal data association similar to that of expert human observers, while at the same time achieving a higher level of sensitivity and accuracy, it is necessary to make better use

Copyright (c) 2007 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.


Fig. 1. Examples of microtubules tagged with GFP-labeled plus end tracking proteins (bright spots), imaged using fluorescence confocal microscopy. The images are single frames from six 2D time-lapse studies (a)–(f), conducted with different experimental and imaging conditions (scale bars: 5 µm). The quality of such images typically ranges from SNR ≈ 5–6 (a-c) to the extremely low SNR ≈ 2–3 (d-f).

of temporal information and (application specific) prior knowledge about the morphodynamics of the objects being studied. The human visual system integrates to a high degree spatial, temporal and prior information [18] to resolve ambiguous situations in estimating motion flows in image sequences. Here we explore the power of a Bayesian generalization of the standard Kalman filtering approach in emulating this process. It addresses the problem of estimating the hidden state of a dynamic system by constructing the posterior probability density function (pdf) of the state based on all available information, including prior knowledge and the (noisy) measurements. Since this pdf embodies all available statistical information, it can be termed a complete solution to the estimation problem.

Bayesian filtering is a conceptual approach, which yields analytical solutions, in closed form, only in the case of linear systems and Gaussian statistics. In the case of non-linearity and non-Gaussian statistics, numerical solutions can be obtained by applying sequential Monte Carlo (SMC) methods [19], in particular particle filtering (PF) [20]. In the filtering process, tracking is performed by using a predefined model of the expected dynamics to predict the object states, and by using the (noisy) measurements (possibly from different types of sensors) to obtain the posterior probability of these states. In the case of multiple target tracking, the main task is to perform efficient measurement-to-target association, on the basis of thresholded measurements [21]. The classical data association methods in multiple target tracking can be divided into two main classes: unique-neighbor data association methods, as in the multiple hypothesis tracker (MHT), which associate each measurement with one of the previously established tracks, and all-neighbors data association methods, such as joint probabilistic data association (JPDA), which use all measurements for updating all track estimates [21]. The tracking performance of these methods is known to be limited by the linearity of the data models. By contrast, SMC methods that propagate the posterior pdf, or methods that propagate the first-order statistical moment (the probability hypothesis density) of the multitarget pdf [22], have been shown to be successful in solving the multiple target tracking and data association problems when the data models are nonlinear and non-Gaussian [23], [24].


Previous applications of PF-based motion estimation include radar- and sonar-based tracking [24], [25], mobile robot localization [19], [26], teleconferencing or video surveillance [27], and other human motion applications [28]–[30]. In most computer vision applications, tracking is limited to a few objects only [31], [32]. Most biological applications, on the other hand, require the tracking of large and time-varying numbers of objects. Recently, the use of PF in combination with level-sets [33] and active contours [34] has been reported for biological cell tracking. These methods outperform deterministic methods, but they are straightforward applications of the original algorithm [31] for single target tracking, and cannot be directly applied to the simultaneous tracking of many intracellular objects. A PF-like method for the tracking of proteins has also been suggested [35], but it still uses template matching for the linking stage, it requires manual initialization, and tracks only a single object. In this paper, we extend our earlier conference reports [36], [37], and develop a fully automated PF-based method for robust and accurate tracking of multiple nanoscale objects in two-dimensional (2D) and three-dimensional (3D) dynamic fluorescence microscopy images. Its performance is demonstrated for a particular biological application of interest: microtubule growth analysis.

The paper is organized as follows. In Section II we give more in-depth information on the biological application considered in this paper, providing further biological motivation for our work. In Section III we present the general tracking framework and its extension to allow tracking of multiple objects. Next, in Section IV, we describe the necessary improvements and adaptations to tailor the framework to the application. These include a new dynamic model which allows dealing with object interaction and photobleaching effects. In addition, we improve the robustness and reproducibility of the algorithm by introducing a new importance function for data-dependent sampling (the choice of the importance density is one of the most critical issues in the design of a PF method). We also propose a new, completely automatic track initiation procedure. In Section V, we present experimental results of applying our PF method to synthetic image sequences, for which ground truth was available, as well as to real fluorescence microscopy image data of microtubule growth. A concluding discussion of the main findings and their potential implications is given in Section VI.

II. MICROTUBULE GROWTH ANALYSIS

Microtubules (MTs) are polarized tubular filaments (diameter ≈ 25 nm) composed of α/β-tubulin heterodimers. In most cell types, one end of a MT (the minus-end) is embedded in the so-called MT organizing center (MTOC), while the other end (the plus-end) is exposed to the cytoplasm. MT polymerization involves the addition of α/β-tubulin subunits to the plus end. During MT disassembly, these subunits are lost. MTs frequently switch between growth and shrinkage, a feature called dynamic instability [38]. The conversion of growth to shrinkage is called catastrophe, while the switch from shrinkage to growth is called rescue. The dynamic behavior of MTs is described by MT growth and shrinkage rates, and catastrophe and rescue frequencies. MTs are fairly rigid structures having nearly constant velocity while growing or shrinking [39]. MT dynamics is highly regulated, as a properly organized MT network is essential for many cellular processes, including mitosis, cell polarity, transport of vesicles, and the migration and differentiation of cells. For example, when cells enter mitosis, the cdc2 kinase controls MT dynamics such that the steady-state length of MTs decreases considerably. This is important for spindle formation and positioning [40]. It has been shown that an increase in catastrophe frequency is largely responsible for this change in MT length [41].

Plus-end-tracking proteins, or +TIPs [42], specifically bind to MT plus-ends and have been linked to MT-target interactions and MT dynamics [43]–[45]. Plus-end-tracking was first described for overexpressed GFP-CLIP170 in cultured mammalian cells [46]. In time-lapse movies, typical fluorescent “comet-like” dashes were observed, which represented GFP-CLIP170 bound to the ends of growing MTs. As plus-end tracking is intimately associated with MT growth, fluorescently labeled +TIPs are now widely used to measure MT growth rates in living cells, and they are also the objects of interest considered in the present work. With fluorescent +TIPs, all growing MTs can be discerned. Alternatively, the advantage of using fluorescent tubulin is that all parameters of MT dynamics can be measured. However, in regions where the MT network is dense, the fluorescent MT network obscures MT ends, making it very difficult to examine MT dynamics. Hence, in many studies based on fluorescent tubulin [47]–[49], analysis is restricted to areas within the cells where the MT network is sparse. Ideally, one should use both methods to acquire all possible knowledge regarding MT dynamics, and this will be addressed in future work.

+TIPs are well positioned to perform their regulatory tasks. A network of interacting proteins, including +TIPs, may govern the changes in MT dynamics that occur during the cell cycle [50]. Since +TIPs are so important and display such a fascinating behavior, the mechanisms by which +TIPs recognize MT ends have attracted much attention. In one view, +TIPs bind to newly synthesized MT ends with high affinity and detach seconds later from the MT lattice, either in a regulated manner or stochastically [46]. However, other mechanisms have also been proposed [44], [45], [51]. Measuring the distribution and displacement of a fluorescent +TIP in time may shed light on the mechanism of MT end binding. However, this is a labor intensive procedure if fluorescent tracks have to be delineated by hand, and very likely leads to user bias and loss of important information. By developing a reliable tracking algorithm we obtain information on the behavior of all growing MTs within a cell, which reveals the spatiotemporal distribution and regulation of growing MTs. Importantly, this information can be linked to the spatiotemporal fluorescent distribution of +TIPs. This is extremely important, since the localization of +TIPs reports on the dynamic state of MTs and the cell.


III. TRACKING FRAMEWORK

Before describing the details of our tracking approach, we first recap the basic principles of nonlinear Bayesian tracking in general (III-A) and PF in particular (III-B), as well as the extension that has been proposed in the literature to allow tracking of multiple objects within this framework (III-C).

A. Nonlinear Bayesian Tracking

The Bayesian tracking approach deals with the problem of inferring knowledge about the unobserved state of a dynamic system, which changes over time, using a sequence of noisy measurements. In a state-space approach to dynamic state estimation, the state vector x_t of a system contains all relevant information required to describe the system under investigation. Bayesian estimation in this case is used to recursively estimate a time-evolving posterior distribution (or filtering distribution) p(x_t | z_{1:t}), which describes the object state x_t given all observations z_{1:t} up to time t.

The exact solution to this problem can be constructed by specifying the Markovian probabilistic model of the state evolution, D(x_t | x_{t−1}), and the likelihood L(z_t | x_t), which relates the noisy measurements to any state. The required probability density function p(x_t | z_{1:t}) may be obtained, recursively, in two stages: prediction and update. It is assumed that the initial pdf, p(x_0 | z_0) ≡ p(x_0), also known as the prior, is available (z_{1:0} = z_0 being the set of no measurements).

The prediction stage involves using the system model and pdf p(x_{t−1} | z_{1:t−1}) to obtain the prior pdf of the state at time t via the Chapman-Kolmogorov equation:

p(x_t | z_{1:t−1}) = ∫ D(x_t | x_{t−1}) p(x_{t−1} | z_{1:t−1}) dx_{t−1}.   (1)

In the update stage, when a measurement z_t becomes available, Bayes’ rule is used to modify the prior density and obtain the required posterior density of the current state:

p(x_t | z_{1:t}) ∝ L(z_t | x_t) p(x_t | z_{1:t−1}).   (2)

This recursive estimation of the filtering distribution can be processed sequentially rather than as a batch, so that it is not necessary to store the complete data set nor to reprocess existing data if a new measurement becomes available [20]. The filtering distribution embodies all available statistical information, and an optimal estimate of the state can theoretically be found with respect to any sensible criterion.
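As a toy illustration of the recursion (1)–(2), consider a discrete (grid-based) state space, where the Chapman-Kolmogorov integral becomes a matrix-vector product. The drift-plus-noise transition model and Gaussian likelihood below are illustrative assumptions, not the models used in this paper:

```python
import numpy as np

# Discrete-state (grid-based) illustration of the Bayesian recursion:
# prediction via the Chapman-Kolmogorov sum (1), update via Bayes' rule (2).
n = 50
states = np.arange(n)

# Transition model D(x_t | x_{t-1}): drift of +1 grid cell with Gaussian spread.
D = np.exp(-0.5 * (states[:, None] - (states[None, :] + 1)) ** 2)
D /= D.sum(axis=0, keepdims=True)       # each column is a valid distribution

def likelihood(z, sigma=2.0):
    """Gaussian measurement likelihood L(z | x), evaluated on the grid."""
    return np.exp(-0.5 * ((states - z) / sigma) ** 2)

posterior = np.full(n, 1.0 / n)         # flat prior p(x_0)
for z in [12.0, 13.2, 14.1, 15.3]:      # synthetic measurements
    prior = D @ posterior               # prediction stage, eq. (1)
    posterior = likelihood(z) * prior   # update stage, eq. (2), unnormalized
    posterior /= posterior.sum()        # normalize to a proper pdf

x_mmse = float(states @ posterior)      # MMSE estimate: posterior mean
```

On such a grid the recursion is exact; the SMC methods of the next subsection replace the grid with random samples when the state space is continuous and high-dimensional.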

B. Particle Filtering Methods

The optimal Bayesian solution, defined by the recurrence relations (1) and (2), is analytically tractable in a restrictive set of cases, including the Kalman filter, which provides an optimal solution in case of linear dynamic systems with Gaussian noise, and grid-based filters [20]. For most practical models of interest, SMC methods (also known as bootstrap filtering, particle filtering, and the condensation algorithm [31]) are used as an efficient numerical approximation. The basic idea here is to represent the required posterior density function p(x_t | z_{1:t}) with a set of N_s random samples, or particles, and associated weights {x_t^(i), w_t^(i)}_{i=1}^{N_s}. Thus, the filtering distribution can be approximated as

p(x_t | z_{1:t}) ≈ Σ_{i=1}^{N_s} w_t^(i) δ(x_t − x_t^(i)),

where δ(·) is the Dirac delta function and the weights are normalized such that Σ_{i=1}^{N_s} w_t^(i) = 1. These samples and weights are then propagated through time to give an approximation of the filtering distribution at subsequent time steps.

The weights in this representation are chosen using a sequential version of importance sampling (SIS) [52]. It applies when auxiliary knowledge is available in the form of an importance function q(x_t | x_{t−1}, z_t) describing which areas of the state-space contain most information about the posterior. The idea is then to sample the particles in those areas of the state-space where the importance function is large and to avoid as much as possible generating samples with low weights, since they provide a negligible contribution to the posterior. Thus, we would like to generate a set of new particles from an appropriately selected proposal function, i.e.,

x_t^(i) ∼ q(x_t | x_{t−1}^(i), z_t),   i = {1,…,N_s}.   (3)

A detailed formulation of q(·|·) is given in Section IV-F.

With the set of state particles obtained from (3), the importance weights w_t^(i) may be recursively updated as follows:

w_t^(i) ∝ [L(z_t | x_t^(i)) D(x_t^(i) | x_{t−1}^(i)) / q(x_t^(i) | x_{t−1}^(i), z_t)] w_{t−1}^(i).   (4)


Generally, any importance function can be chosen, subject to some weak constraints [53], [54]. The only requirements are the possibility to easily draw samples from it and to evaluate the likelihood and dynamic models. For very large numbers of samples, this MC characterization becomes equivalent to the usual functional description of the posterior pdf.

By using this representation, statistical inferences, such as expectation, maximum a posteriori (MAP), and minimum mean square error (MMSE) estimators (the latter is used for object position estimation in the approach proposed in this paper), can easily be approximated. For example,

x̂_t^MMSE = E_p[x_t] = ∫ x_t p(x_t | z_{1:t}) dx_t ≈ Σ_{i=1}^{N_s} x_t^(i) w_t^(i).   (5)

A common problem with the SIS particle filter is the degeneracy phenomenon, where after a few iterations all but a few particles have negligible weight. The variance of the importance weights can only increase (stochastically) over time [53]. The effect of the degeneracy can be reduced by a good choice of importance density and the use of resampling [20], [52], [53] to eliminate particles that have small weights and concentrate on particles with large weights (see [53] for more details on degeneracy and resampling procedures).
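The pieces above fit together in a few lines. The sketch below is a minimal bootstrap-type SIS particle filter for a scalar state, following (3)–(5) with resampling to counter degeneracy; the random-walk dynamics, Gaussian likelihood, and noise levels are illustrative assumptions, not the models of Section IV:

```python
import numpy as np

rng = np.random.default_rng(0)

Ns = 1000
q_std, r_std = 0.5, 1.0               # process / measurement noise std (assumed)

x = rng.normal(0.0, 1.0, Ns)          # initial particles x_0^(i)
w = np.full(Ns, 1.0 / Ns)             # normalized importance weights

def pf_step(x, w, z):
    # The importance function is chosen equal to the transition prior D,
    # so the weight update (4) reduces to w_t ∝ L(z_t | x_t) w_{t-1}
    # (the "bootstrap" filter).
    x = x + rng.normal(0.0, q_std, x.size)          # sample new particles, eq. (3)
    w = w * np.exp(-0.5 * ((z - x) / r_std) ** 2)   # likelihood weighting, eq. (4)
    w = w / w.sum()
    # Resample when the effective sample size signals degeneracy.
    if 1.0 / np.sum(w ** 2) < x.size / 2:
        idx = rng.choice(x.size, size=x.size, p=w)  # multinomial resampling
        x, w = x[idx], np.full(x.size, 1.0 / x.size)
    return x, w

for z in [0.2, 0.5, 0.9, 1.4]:        # synthetic measurements
    x, w = pf_step(x, w, z)

x_mmse = float(np.sum(w * x))         # MMSE state estimate, eq. (5)
```

Choosing q equal to the transition prior is the simplest option; Section IV-F replaces it with a data-dependent importance function, which is one of the paper's contributions.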

C. Multi-Modality and Mixture Tracking

It is straightforward to generalize the Bayesian formulation to the problem of multi-object tracking. However, due to the increase in dimensionality, this formulation gives an exponential explosion of computational demands. The primary goal in a multi-object tracking application is to determine the posterior distribution, which is multi-modal in this case, over the joint configuration of the objects at the current time step, given all observations up to that time step. Multiple modes are caused either by ambiguity about the object state due to insufficient measurements, which is supposed to be resolved during tracking, or by measurements coming from multiple objects being tracked. Generally, MC methods are poor at consistently maintaining the multi-modality in the filtering distribution. In practice it frequently occurs that all the particles quickly migrate to one of the modes, subsequently discarding other modes.

To capture and maintain the multi-modal nature, which is inherent to many applications in which tracking of multiple objects is required, the filtering distribution is explicitly represented by an M-component mixture model [55]:

p(x_t | z_{1:t}) = Σ_{m=1}^{M} π_{m,t} p_m(x_t | z_{1:t}),   (6)

with Σ_{m=1}^{M} π_{m,t} = 1, and a non-parametric model is assumed for the individual mixture components. In this case, the particle representation of the filtering distribution, {x_t^(i), w_t^(i)}_{i=1}^{N} with N = M N_s particles, is augmented with a set of component indicators, {c_t^(i)}_{i=1}^{N}, with c_t^(i) = m if particle i belongs to mixture component m. For the mixture component m we also use the equivalent notation {x_{m,t}^(l), w_{m,t}^(l)}_{l=1}^{N_s} = {x_t^(i), w_t^(i) : c_t^(i) = m}_{i=1}^{N}. The representation (6) can be updated in the same fashion as the two-step approach for standard Bayesian sequential estimation [55].
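The mixture representation (6) amounts to simple bookkeeping on top of the particle set: each particle carries a component indicator c^(i), and the mixture weights π_{m,t} and per-component estimates are read off by grouping. A sketch with synthetic particle positions and weights (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# N = M*Ns particles, each with a component indicator c^(i).
M, Ns = 3, 100
N = M * Ns
c = np.repeat(np.arange(M), Ns)          # component indicators c^(i)
x = rng.normal(loc=10.0 * c, scale=1.0)  # particles clustered near 0, 10, 20
w = np.full(N, 1.0 / N)                  # globally normalized weights

# Mixture weights pi_{m,t} and per-component (normalized) MMSE estimates.
pi = np.array([w[c == m].sum() for m in range(M)])            # sums to 1
est = np.array([np.sum(w[c == m] * x[c == m]) / pi[m] for m in range(M)])
```

Keeping the modes in separate components is what prevents the particle cloud from collapsing onto a single object.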

IV. TAILORING THE FRAMEWORK

Having presented the general framework for PF-based multiple object tracking, we now tailor it to our application: the study of MT dynamics. This requires making choices regarding the models involved as well as a number of computational and practical issues. Specifically, we propose a new dynamic model, which not only covers spatiotemporal behavior but also allows dealing with photobleaching effects (IV-A) and object interaction (IV-B). In addition, we propose a new observation model and corresponding likelihood function (IV-C), tailored to objects that are elongated in their direction of motion. The robustness and computational efficiency of the algorithm are improved by using two-step hierarchical searching (IV-D), measurement gating (IV-E) and a new importance function for data-dependent sampling (IV-F). Finally, we propose practical procedures for particle reclustering (IV-G) and automatic track initiation (IV-H).

A. State-Space and Dynamic Model

In order to model the dynamic behavior of the visible ends of MTs in our algorithm, we represent the object state with the state vector x_t = (x_t, ẋ_t, y_t, ẏ_t, z_t, ż_t, σ_max,t, σ_min,t, σ_z,t, I_t)^T, where (σ_max,t, σ_min,t, σ_z,t)^T ≡ s_t is the object shape feature vector (see IV-C), (x_t, y_t, z_t)^T ≡ r_t is the radius vector, ṙ_t ≡ v_t is the velocity, and I_t the object intensity. The state evolution model D(x_t | x_{t−1}) can be factorized as

D(x_t | x_{t−1}) = D_y(y_t | y_{t−1}) D_s(s_t | s_{t−1}) D_I(I_t | I_{t−1}),   (7)


where y_t = (x_t, ẋ_t, y_t, ẏ_t, z_t, ż_t)^T. Here, D_y(y_t | y_{t−1}) is modeled using a linear Gaussian model [53], which can easily be evaluated pointwise in (4), and is given by

D_y(y_t | y_{t−1}) ∝ exp(−½ (y_t − F y_{t−1})^T Q^{−1} (y_t − F y_{t−1})),   (8)

with the process transition matrix F = diag[F_1, F_1, F_1] and covariance matrix Q = diag[Q_1, Q_1, Q_1] given by

F_1 = ( 1  T ;  0  1 )   and   Q_1 = ( q_11  q_12 ;  q_12  q_22 ),

where T is the sampling interval. Depending on the parameters q_11, q_12, q_22, the model (8) describes a variety of motion patterns, ranging from random walk (‖v_t‖ = 0, q_11 ≠ 0, q_12 = 0, q_22 = 0) to nearly constant velocity (‖v_t‖ ≠ 0, q_11 ≠ 0, q_12 ≠ 0, q_22 ≠ 0) [56], [57]. In our application, the parameters are fixed to q_11 = q_1 T^3/3, q_12 = q_1 T^2/2, q_22 = q_1 T, where q_1 controls the noise level. In this case, model (8) corresponds to the continuous-time model r̈(t) = w(t) ≈ 0, where w(t) is white noise that corresponds to noisy accelerations [56]. We also make the realistic assumption that object velocities are bounded. This prior information is object dependent and will be used for state initialization (see IV-H). Small changes in frame-to-frame MT appearance (shape) are modeled using the Gaussian transition prior D_s(s_t | s_{t−1}) = N(s_t | s_{t−1}, T q_2 I), where N(·|µ,Σ) indicates the normal distribution with mean µ and covariance matrix Σ, I is the identity matrix, and q_2 represents the noise level in object appearance.

In practice, the analysis of time-lapse fluorescence microscopy images is complicated by photobleaching, a dynamic process by which the fluorescent proteins undergo photoinduced chemical destruction upon exposure to excitation light and thus lose their ability to fluoresce. Although the mechanisms of photobleaching are not yet well understood, two commonly used (and practically similar) approximations of fluorescence intensity over time are given by

I(t) = A e^{−a t} + B   (9)

and

I(t) = I_0 (1 + (t/L)^k)^{−1},   (10)

where A, B, a, I_0, L, and k are experimentally determined constants (see [58], [59] for more details on the validity and sensitivity of these models). The rate of photobleaching is a function of the excitation intensity. With a laser as an excitation source, photobleaching is observed on the time scale of microseconds to seconds. The high numerical aperture objectives currently in use, which maximize spatial resolution and improve the limits of detection, further accelerate the photobleaching process. Commonly, photobleaching is ignored by standard tracking methods, but in many practical cases it is necessary to model this process so as to be less sensitive to changing experimental conditions.
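The per-axis blocks of the motion model (8), with the fixed parametrization q_11 = q_1T³/3, q_12 = q_1T²/2, q_22 = q_1T, can be assembled as follows (a sketch; the numeric values of T and q_1 are illustrative assumptions):

```python
import numpy as np

# Per-axis blocks of the linear Gaussian motion model (8).
T, q1 = 0.5, 0.1
F1 = np.array([[1.0, T],
               [0.0, 1.0]])                       # constant-velocity transition
Q1 = q1 * np.array([[T**3 / 3.0, T**2 / 2.0],
                    [T**2 / 2.0, T]])             # white-noise-acceleration cov.

# Full kinematic state y = (x, xdot, y, ydot, z, zdot): block-diagonal F and Q.
F = np.kron(np.eye(3), F1)
Q = np.kron(np.eye(3), Q1)

# One draw from D_y(y_t | y_{t-1}): deterministic propagation plus process noise.
rng = np.random.default_rng(2)
y_prev = np.array([0.0, 1.0, 0.0, 0.5, 0.0, 0.0])  # position/velocity per axis
y_next = F @ y_prev + rng.multivariate_normal(np.zeros(6), Q)
```

Scaling q1 up recovers the more erratic, random-walk-like regimes mentioned above for objects such as vesicles.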

Following the common approximation (9), we model object intensity in our image data by the sum of a time-dependent, a time-independent, and a random component:

I_t + I_c + u_t = \frac{I_0 \hat{A}}{\hat{A} + \hat{B}} e^{-\hat{\alpha} t} + \frac{I_0 \hat{B}}{\hat{A} + \hat{B}} + u_t,   (11)

where u_t is zero-mean Gaussian process noise and I_0 is the initial object intensity, obtained by the initialization procedure (see IV-H). The parameters \hat{A}, \hat{B}, and \hat{\alpha} are estimated using the Levenberg-Marquardt algorithm for nonlinear fitting of (9) to the average background intensity over time, b_t (see IV-C). In order to conveniently incorporate the photobleaching effect contained in (11) into our framework, we approximate it as a first-order Gauss-Markov process, I_t = (1 - \hat{\alpha}) I_{t-1} + u_t, which models the exponential intensity decay in the discrete-time domain. In this case, the corresponding state prior is D_I(I_t|I_{t-1}) = N(I_t | (1 - \hat{\alpha}) I_{t-1}, q_3 T), where q_3 = T^{-1} \sigma_u^2 and \sigma_u^2 is the variance of u_t.

The photobleaching effect could alternatively be accommodated in our framework by assuming a constant intensity model (\hat{\alpha} = 0) for D_I(I_t|I_{t-1}), but with a very high variance for the process noise, \sigma_u^2. However, in practice, because of the limited number of MC samples, the variance of the estimation would rapidly grow, and many samples would be used inefficiently, causing problems especially in the case of a highly peaked likelihood L(z_t|x_t) (see IV-C). By using (11), we follow at least the trend of the intensity changes, and bring the estimation closer to the optimal solution. This way, we reduce the estimation variance and, consequently, the number of MC samples needed for the same accuracy as in the case of the constant intensity model.

In summary, the proposed model (7) correctly approximates small accelerations in object motion and fluctuations in object intensity, and therefore is very suitable for tracking growing MTs, as their dynamics can be well modeled by constant velocity plus small random diffusion [39]. The model (8) can also be successfully used for tracking other subcellular structures, for example vesicles, which are characterized by motion with higher nonlinearity. In that case, the process noise level, defined by Q, should be increased.
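As a plain-Python illustration (not the authors' code), the sketch below compares the continuous bleaching trend underlying (11) with the first-order Gauss-Markov recursion I_t = (1 - α̂)I_{t-1} + u_t used as the state prior. The constants stand in for fitted values of Â, B̂, and α̂ and are not from the paper.

```python
import numpy as np

# Illustrative constants standing in for the fitted parameters A-hat,
# B-hat, alpha-hat of (11); they are not values from the paper.
A, B, alpha, I0 = 0.7, 0.3, 0.05, 100.0
n_frames = 50

t = np.arange(n_frames)
# Continuous photobleaching trend of (11), without the noise term u_t.
trend = I0 * A / (A + B) * np.exp(-alpha * t) + I0 * B / (A + B)

# Noise-free first-order Gauss-Markov recursion I_t = (1 - alpha) I_{t-1}.
gm = np.empty(n_frames)
gm[0] = I0
for k in range(1, n_frames):
    gm[k] = (1.0 - alpha) * gm[k - 1]

# Over the first few frames the recursion follows the decaying trend,
# which is all the state prior needs to capture.
rel_err_frame5 = abs(gm[5] - trend[5]) / trend[5]
```

Because (1 - α)^t decays to zero while the true trend levels off at I_0 B̂/(Â + B̂), the recursion only tracks the early decay; in the filter, this mismatch is absorbed by the process noise u_t.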

Page 7

IEEE TRANSACTIONS ON MEDICAL IMAGING7

B. Object Interactions and Markov Random Field

In order to obtain a more realistic motion model and avoid track coalescence in the case of multiple object tracking, we

explicitly model the interaction between objects using a Markov random field (MRF) [60]. Here we use a pairwise MRF,

expressed by means of a Gibbs distribution

\psi_t(x_t^{(i)}, x_t^{(j)}) \propto \exp(-d_t^{i,j}), \quad i,j \in \{1,\dots,N\}, \; c_t^{(i)} \neq c_t^{(j)},   (12)

where d_t^{i,j} is a penalty function which penalizes the states of two objects c_t^{(i)} and c_t^{(j)} that are closely spaced at time t. That is, d_t^{i,j} is maximal when two objects coincide and gradually falls off as they move apart. This simple pairwise representation is easy to implement yet can be made quite sophisticated. Using this form, we can still retain the predictive motion model of each individual target. To this end, we sample N_s times the pairs (x_{m,t-1}^{(l)}, x_{m,t}^{(l)}) (M such pairs at a time, m = \{1,\dots,M\}), from p_m(x_{t-1}|z_{1:t-1}) and q(x_t|x_{m,t-1}^{(l)}, z_t), respectively, l = \{1,\dots,N_s\}. Taking into account (12), the weights (4) in this case are given by

w_{m,t}^{(l)} \propto \frac{L(z_t|x_{m,t}^{(l)}) \, D(x_{m,t}^{(l)}|x_{m,t-1}^{(l)})}{q(x_{m,t}^{(l)}|x_{m,t-1}^{(l)}, z_t)} \prod_{k=1, k \neq m}^{M} \psi_t(x_{m,t}^{(l)}, x_{k,t}^{(l)}).   (13)

The mixture representation \{\{x_{m,t}^{(l)}, w_{m,t}^{(l)}\}_{m=1}^{M}\}_{l=1}^{N_s} is then straightforwardly transformed to \{x_t^{(i)}, w_t^{(i)}, c_t^{(i)}\}_{i=1}^{N}. In our application we have found that an interaction potential based only on object positions is sufficient to avoid most tracking failures. The use of an MRF approach is especially relevant and efficient in the case of 3D+t data analysis, because object merging is not possible in our application.
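A minimal sketch of how the pairwise potential (12) enters the weights (13): each particle's weight is multiplied by exp(-d) for every other tracked object, with a penalty d that is maximal when positions coincide and falls off with distance. The Gaussian-shaped penalty and its scale are illustrative choices, since the exact form of d_t is left open in the text.

```python
import numpy as np

def interaction_factor(pos_m, other_positions, scale=2.0):
    """Product over other objects of psi = exp(-d), where the penalty d
    (a Gaussian of the inter-object distance, an illustrative choice)
    peaks when two objects coincide and vanishes when they are far apart."""
    factor = 1.0
    for pos_k in other_positions:
        d = np.exp(-np.sum((pos_m - pos_k) ** 2) / (2.0 * scale ** 2))
        factor *= np.exp(-d)
    return factor

p = np.array([0.0, 0.0])
w_near = interaction_factor(p, [np.array([0.1, 0.0])])   # nearly coincident
w_far = interaction_factor(p, [np.array([50.0, 0.0])])   # well separated
```

Down-weighting near-coincident particle configurations in this way is what discourages one object from "hijacking" the particles of a nearby track.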

C. Observation Model and Likelihood

The measurements in our application are represented by a sequence of 2D or 3D images showing the motion of fluorescent

proteins. The individual images (also called frames) are recorded at discrete instants t, with a sampling interval T, with each

image consisting of Nx×Ny×Nzpixels (Nz= 1 in 2D). At each pixel (i,j,k), which corresponds to a rectangular volume

of dimensions ∆x× ∆y× ∆z nm3, the measured intensity is denoted as zt(i,j,k). The complete measurement recorded at

time t is an Nx×Ny×Nzmatrix denoted as zt= {zt(i,j,k) : i = 0,...,Nx−1,j = 0,...,Ny−1,k = 0,...,Nz−1}. For

simplicity we assume that the origins and axis orientations of the (x,y,z) reference system and the (i,j,k) system coincide.

Let ˜ zt(r) denote a first-order interpolation of zt(∆xi,∆yj,∆zk).

The image formation process in a microscope can be modeled as a convolution of the true light distribution coming from

the specimen, with a point-spread function (PSF), which is the output of the optical system for an input point light source.

The theoretical diffraction-limited PSF in the case of paraxial and non-paraxial imaging can be expressed by the scalar Debye

diffraction integral [61]. In practice, however, a 3D Gaussian approximation of the PSF [4] is commonly favored over the

more complicated PSF models (such as the Gibson-Lanni model [62]). This choice is mainly motivated by computational

considerations, but a Gaussian approximation of the physical PSF is fairly accurate for reasonably large pinhole sizes (relative

squared error (RSE) < 9%) and nearly perfect for typical pinhole sizes (RSE < 1%) [61]. In most microscopes currently used,

the PSF limits the spatial resolution to ≈ 200 nm in-plane and ≈ 600 nm in the direction of the optical axis, as a consequence

of which subcellular structures (typically of size < 20 nm) are imaged as blurred spots. We adopt the common assumption

that all blurring processes are due to a linear and spatially invariant PSF.

The PF framework accommodates any PSF that can be calculated pointwise. To model the imaged intensity profile of the

object with some shape, one would have to use the convolution with the PSF for every state x_t^{(i)}. In order to overcome this computational overload, we propose to model the PSF and object shape at the same time using the 3D Gaussian approximation. To model the manifest elongation in the intensity profile of MTs, we utilize the velocity components from the state vector x_t as parameters in the PSF. In this case, for an object of intensity I_t at position r_t, the intensity contribution to pixel (i,j,k) is approximated as

h_t(i,j,k; x_t) = b_t + (I_t + I_c) \times \exp\Bigl(-\tfrac{1}{2} m^T R^T \Sigma^{-1} R\, m\Bigr) \times \exp\Bigl(-\frac{(k\Delta_z - z_t - \|m\| \tan\theta)^2}{2\sigma_z^2}\Bigr),   (14)

where b_t is the background intensity, \sigma_z (\approx 235 nm) models the axial blurring, and R = R(\phi) is a rotation matrix:

R(\phi) = \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_m^2(\theta) & 0 \\ 0 & \sigma_{\min}^2 \end{pmatrix},

m = \begin{pmatrix} i\Delta_x - x_t \\ j\Delta_y - y_t \end{pmatrix}, \qquad \sigma_m(\theta) = \sigma_{\min} - (\sigma_{\min} - \sigma_{\max}) \cos\theta,

\tan\theta = \frac{\dot{z}_t}{\sqrt{\dot{x}_t^2 + \dot{y}_t^2}}, \qquad \tan\phi = \frac{\dot{y}_t}{\dot{x}_t}, \qquad -\pi < \phi, \theta \leq \pi.

The parameters σmax and σmin represent the amount of blurring and, at the same time, model the elongation of the object

along the direction of motion. For subresolution structures such as vesicles, σmin= σmax≈ 80 nm, and for the elongated MTs

σmin≈ 100 nm and σmax≈ 300 nm.
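The appearance model (14) can be sketched in 2D as follows. The pixel size and the σ values follow the numbers quoted in the text, the grid size and object parameters are arbitrary, and in-plane motion (θ = 0, so σ_m = σ_max) is assumed; this is an illustration, not the authors' implementation.

```python
import numpy as np

def spot_image(xc, yc, vx, vy, intensity, bg, n=64, dx=50.0,
               sigma_min=100.0, sigma_max=300.0):
    """2D sketch of (14): an anisotropic Gaussian whose long axis
    (sigma_max, assuming in-plane motion so sigma_m = sigma_max)
    is aligned with the velocity direction (angle phi)."""
    phi = np.arctan2(vy, vx)
    R = np.array([[np.cos(phi), -np.sin(phi)],
                  [np.sin(phi),  np.cos(phi)]])
    Sigma_inv = np.diag([1.0 / sigma_max ** 2, 1.0 / sigma_min ** 2])
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    m = np.stack([i * dx - xc, j * dx - yc], axis=-1)  # pixel offsets in nm
    mr = m @ R.T                                       # rotate into the object frame
    quad = np.einsum("...a,ab,...b->...", mr, Sigma_inv, mr)
    return bg + intensity * np.exp(-0.5 * quad)

# An object at (1600, 1600) nm moving along the x axis.
img = spot_image(xc=1600.0, yc=1600.0, vx=1.0, vy=0.0, intensity=10.0, bg=2.0)
```

The resulting profile decays more slowly along the motion axis than across it, which is the elongation cue the tracker later exploits during measurement-to-track association.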

For background level estimation we use the fact that the contribution of object intensity values to the total image intensity

(mainly formed by background structures with lower intensity) is negligible, especially in the case of low SNRs. We have found

that in a typical 2D image of size 10^3 \times 10^3 pixels containing a thousand objects, the number of object pixels is only about 1%. Even if the object intensities were 10 times as large as the background level (very high SNR), their contribution to the total image intensity would be less than 10%. In that case, the normalized histogram of the image z_t can be approximated by a Gaussian distribution with mean \hat{b} and variance \sigma_b^2. The estimated background b_t = \hat{b} is then calculated according to

b_t = \frac{1}{N_x N_y N_z} \sum_{i=0}^{N_x-1} \sum_{j=0}^{N_y-1} \sum_{k=0}^{N_z-1} z_t(i,j,k).   (15)

In the case of a skewed histogram of image intensity, the median of the distribution can be taken as an estimate of the

background level. The latter is preferable because it treats object pixels as outliers for the background distribution.
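A toy sketch of the background estimate (15) and of the median alternative for skewed histograms; the image size, noise level, and object fraction below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
frame = rng.normal(10.0, 1.0, size=(100, 100))  # background ~ N(10, 1)
frame[:5, :5] += 100.0                          # 25 bright "object" pixels (0.25%)

b_mean = frame.mean()            # plain mean, as in (15)
b_median = np.median(frame)      # robust alternative for skewed histograms
```

With objects occupying only a fraction of a percent of the pixels, even the plain mean stays within a few percent of the true background, while the median treats the object pixels as outliers and is essentially unbiased.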

Since an object will affect only the pixels in the vicinity of its location, r_t, we define the likelihood function as

L_G(z_t|x_t) \triangleq \prod_{(i,j,k) \in C(x_t)} \frac{p_h(z_t(i,j,k)|x_t)}{p_b(z_t(i,j,k)|b_t)},   (16)

where C(x_t) = \{(i,j,k) \in \mathbb{Z}^3 : h_t(i,j,k;x_t) - b_t > 0.1 I_t\},

p_h(z_t(i,j,k)|x_t) \propto \frac{1}{\sigma_h(i,j,k)} \exp\Bigl(-\frac{(z_t(i,j,k) - h_t(i,j,k;x_t))^2}{2\sigma_h^2(i,j,k)}\Bigr),   (17)

and

p_b(z_t(i,j,k)|b_t) \propto \exp\Bigl(-\frac{(z_t(i,j,k) - b_t)^2}{2\sigma_b^2}\Bigr),   (18)

with \sigma_h^2(i,j,k) and \sigma_b^2 the variances of the measurement noise for the object+background and the background, respectively, which are assumed to be independent from pixel to pixel and from frame to frame. Poisson noise, which can be used to model the effect of the quantum nature of light on the measured data, is one of the main sources of noise in fluorescence microscopy imaging. The recursive Bayesian solution is applicable as long as the statistics of the measurement noise is known for each pixel. In this paper we use a valid approximation of Poisson noise, with \sigma_h^2(i,j,k) = h_t(i,j,k;x_t) and \sigma_b^2 = b_t, by scaling the image intensities in order to satisfy the condition \sigma_b^2 = b_t [13].
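A 1D toy version of the likelihood (16)-(18) in log form, under the stated Gaussian approximation of Poisson noise (σ_h² = h, σ_b² = b); the bump parameters and the displacement of the wrong hypothesis are arbitrary.

```python
import numpy as np

def log_LG(z, h, b, I_obj):
    """Log of (16): ratio of the object+background model p_h (17) to the
    background-only model p_b (18) over the support region C, with
    sigma_h^2 = h and sigma_b^2 = b (Gaussian approximation of Poisson noise)."""
    C = (h - b) > 0.1 * I_obj            # region where the modeled spot is appreciable
    zc, hc = z[C], h[C]
    log_ph = -0.5 * np.log(hc) - (zc - hc) ** 2 / (2.0 * hc)
    log_pb = -((zc - b) ** 2) / (2.0 * b)
    return np.sum(log_ph - log_pb)

x = np.arange(41, dtype=float)
b, I_obj = 4.0, 20.0
model = b + I_obj * np.exp(-0.5 * ((x - 20.0) / 3.0) ** 2)
z = model.copy()                          # noise-free observation at the true state
score_true = log_LG(z, model, b, I_obj)
score_shifted = log_LG(z, np.roll(model, 8), b, I_obj)  # hypothesis off by 8 pixels
```

A correctly placed hypothesis explains the bright pixels far better than the background-only model, so its score dominates that of a displaced hypothesis; the steepness of this score difference is exactly the "peakedness" discussed in the next section.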

D. Hierarchical Searching

Generally, the likelihood LG(zt|xt) is very peaked (even when the region C(xt) is small) and may lead to severe sample

impoverishment and divergence of the filter. Theoretically it is impossible to avoid the degeneracy phenomenon, where, after a

few iterations of the algorithm, all but one of the normalized importance weights are very close to zero [53]. Consequently, the

accuracy of the estimator also degrades enormously [52]. A commonly used measure of degeneracy is the estimated effective

sample size [53], given by

N_{\mathrm{eff}}(t) = \Bigl( \sum_{i=1}^{N_s} (w_t^{(i)})^2 \Bigr)^{-1},   (19)

which intuitively corresponds to the number of "useful" particles. Degeneracy is usually strong for image data with low SNR, but the filter also performs poorly when the noise level is too small [19]. This suggests that MC estimation with accurate sensors may perform worse than with inaccurate sensors. The problem can be partially fixed by using an observation model which overestimates the measurement noise. While the performance is better, this is not a principled way of fixing the problem; the observation model is artificially inaccurate and the resulting estimation is no longer a posterior, even if infinitely many samples were used. Other methods that try to improve the performance of PF include partitioned sampling [32], the auxiliary particle filter (APF) [20], [54] and the regularized particle filters (RPF) [19], [54]. Because of the highly nonlinear observation model and dynamic model with a high noise level, the mentioned methods are inefficient for our application. Partitioned sampling


requires the possibility to partition the state space and to decouple the observation model for each of the partitions, which

cannot be done for our application. Application of the APF is beneficial only when the dynamic model is correctly specified

with a small amount of process noise. The tracking of highly dynamic structures with linear models requires increasing the

process noise in order to capture the typical motion patterns.
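The degeneracy measure (19) is one line of code; the sketch below uses illustrative weight vectors to show its two extremes.

```python
import numpy as np

def effective_sample_size(w):
    """Neff of (19) for (unnormalized) importance weights."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                    # normalize, as in (4)
    return 1.0 / np.sum(w ** 2)

Ns = 500
neff_uniform = effective_sample_size(np.ones(Ns))                 # all particles useful
neff_peaked = effective_sample_size(np.r_[1e6, np.ones(Ns - 1)])  # one particle dominates
```

Uniform weights give N_eff = N_s, while a single dominant weight drives N_eff toward 1, which is exactly the sample impoverishment the LSG strategy of this section is designed to avoid.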

To overcome these problems, we use a different approach, based on RPF, and mainly on progressive correction [19]. First,

we propose a second observation model:

L_S(z_t|x_t) \triangleq \frac{\sigma_B}{\sigma_S(x_t)} \exp\Bigl(\frac{(S_t^z(x_t) - S_t^b)^2}{2\sigma_B^2} - \frac{(S_t^z(x_t) - S_t^h(x_t))^2}{2\sigma_S^2(x_t)}\Bigr),   (20)

where

S_t^z(x_t) = \sum_{(i,j,k) \in C(x_t)} z_t(i,j,k),

and

S_t^h(x_t) = \sum_{(i,j,k) \in C(x_t)} h_t(i,j,k;x_t),

S_t^b = b_t |C(x_t)|, where | \cdot | denotes the set size operator, and the variances \sigma_S^2 and \sigma_B^2 are taken to approximate the Poisson distribution: \sigma_S^2 = S_t^o (the total object intensity in C(x_t)) and \sigma_B^2 = S_t^b. The likelihood L_S(z_t|x_t) is less peaked but gives an error of the same order as L_G(z_t|x_t). Another advantage is that L_S(z_t|x_t) can be used for objects without a predefined shape; only the region C(x_t), which presumably contains the object, and the total object intensity in C(x_t) need to be specified.

Subsequently, we propose a modified hierarchical search strategy, which uses both models, L_S and L_G. To this end, we calculate an intermediate state at time t', between time points t-1 and t, by propagating and updating the samples using the likelihood L_S according to

\bar{p}(x_{t'}|z_{1:t'}) \propto L_S(z_{t'}|x_{t'}) D(x_{t'}|x_{t-1}) p(x_{t-1}|z_{1:t-1}),   (21)

where z_{t'} = z_t. After this step, N_eff is still rather high, because the likelihood L_S is less peaked than L_G. In a next step, particles with high weights at time t' are diversified and put into regions where the likelihood L_G is high, giving a much better approximation of the posterior:

p(x_t|z_{1:t}) \propto L_G(z_t|x_t) \, \mathcal{N}(x_t|\mu_{t'}, \Sigma_{t'}) \, \bar{p}(x_{t'}|z_{1:t'}),   (22)

where the expectation and the variance are given by

\mu_{t'} = E_{\bar{p}}[x_{t'}], \qquad \Sigma_{t'} = E_{\bar{p}}[(x_{t'} - \mu_{t'})(x_{t'} - \mu_{t'})^T].   (23)

The described hierarchical search strategy is further denoted as LSG. It keeps the number N_eff quite large and, in practice, provides filters that are more stable in time, with lower variance in the position estimation.
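A 1D toy run of the two-stage search (21)-(23), with Gaussian stand-ins for the broad likelihood L_S and the peaked L_G; the widths, particle count, and target position are arbitrary and the image-based likelihoods of the paper are not used.

```python
import numpy as np

rng = np.random.default_rng(1)
true_pos = 3.0
L_S = lambda x: np.exp(-0.5 * ((x - true_pos) / 2.0) ** 2)    # broad stand-in
L_G = lambda x: np.exp(-0.5 * ((x - true_pos) / 0.05) ** 2)   # peaked stand-in

x = rng.uniform(-10.0, 10.0, 2000)       # particles after the prediction step
w = L_S(x)
w /= w.sum()                             # stage 1: weight with L_S, as in (21)
x = rng.choice(x, size=x.size, p=w)      # resample the intermediate state t'
mu, sd = x.mean(), x.std()               # moments (23) of the intermediate cloud
x = rng.normal(mu, sd, x.size)           # diversify via N(mu', Sigma'), as in (22)
w = L_G(x)
w /= w.sum()                             # stage 2: weight with the peaked L_G
estimate = np.sum(w * x)
```

Weighting directly with L_G from the uniform cloud would leave almost all particles with negligible weight; the intermediate L_S stage first concentrates the cloud near the object, so the final peaked update wastes far fewer samples.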

E. Measurement Gating

Multiple object tracking requires gating, or measurement selection. The purpose of gating is to reduce computational expense

by eliminating measurements which are far from the predicted measurement location. Gating is performed for each track at

each time step t by defining a subvolume of the image space, called the gate. All measurements positioned within the gate

are selected and used for the track update step, (2), while measurements outside the gate are ignored in these computations.

In standard approaches to tracking, using the Kalman filter or extended Kalman filter, measurement gating is accomplished by

using the predicted measurement covariance for each object and then updating the predicted state using joint probabilistic data

association [63]. In the PF approach, which is able to cope with nonlinear and non-Gaussian models, the analog of the predicted

measurement covariance is not available and can be constructed only by taking, for example, a Gaussian approximation of

the current particle cloud and using it to perform gating. Generally, this approximation is unsatisfactory, since the advantages

gained from having a representation of a non-Gaussian pdf are lost. In the proposed framework, however, this approximation

is justified by using the highly peaked likelihood functions and the reclustering procedure (described in IV-G), which keep the

mixture components unimodal.

Having the measurements \tilde{z}_t(r_t), we define the gate for each of the tracks as follows:

C_{m,t} = \{ r_t \in \mathbb{R}^3 : (r_t - \bar{r}_{m,t})^T \Sigma_{m,t}^{-1} (r_t - \bar{r}_{m,t}) \le C_0 \},   (24)

where the parameter C0specifies the size of the gate, which is proportional to the probability that the object falls within the

gate. Generally, since the volume of the gate is dependent on the tracking accuracy, it varies from scan to scan and from track


to track. In our experiments, C_0 = 9 (a 3-standard-deviation level gate). The gate C_{m,t} is centered at the position predicted from the particle representation of p_m(x_t|z_{1:t-1}):

\bar{r}_{m,t} = E_{p_m}[r_t] = \int r_t \, p_m(x_t|z_{1:t-1}) \, dx_t \approx \sum_{i=1,\, c_{t-1}^{(i)}=m}^{N} \bar{r}_t^{(i)} w_{t-1}^{(i)},   (25)

where the \bar{r}_t^{(i)} are the position elements of the state vector \bar{x}_t^{(i)} \sim D(x_t|x_{t-1}^{(i)}), i = \{1,\dots,N\}. Similarly, the covariance matrix is calculated as

\Sigma_{m,t} = E_{p_m}[(r_t - \bar{r}_{m,t})(r_t - \bar{r}_{m,t})^T].   (26)
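The gate test (24) is a squared Mahalanobis distance check against the moments (25)-(26); a sketch, with a made-up predicted position and covariance:

```python
import numpy as np

def in_gate(r, r_pred, cov, C0=9.0):
    """True if measurement r lies inside the gate (24) of a track whose
    predicted position is r_pred with particle-cloud covariance cov."""
    d = r - r_pred
    return float(d @ np.linalg.inv(cov) @ d) <= C0

r_pred = np.array([100.0, 200.0])          # predicted position, as in (25)
cov = np.diag([25.0, 25.0])                # covariance, as in (26): 5-pixel std
inside = in_gate(np.array([110.0, 205.0]), r_pred, cov)   # squared distance = 5
outside = in_gate(np.array([130.0, 200.0]), r_pred, cov)  # squared distance = 36
```

With C_0 = 9 this accepts measurements within three standard deviations of the prediction along each principal axis of the particle cloud.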

F. Data-Dependent Sampling

Basic particle filters [20], [31], [36], which use the proposal distribution q(x_t|x_{t-1}, z_t) = D(x_t|x_{t-1}), usually perform poorly because too few samples are generated in regions where the desired posterior p(x_t|z_{1:t}) is large. In order to construct

a proposal distribution which alleviates this problem and takes into account the most recent measurements zt, we propose to

transform the image sequence into probability distributions. True spots are characterized by a combination of convex intensity

distributions and a relatively high intensity. Noise-induced local maxima typically exhibit a random distribution of intensity

changes in all directions, leading to a low local curvature [4]. These two discriminative features (intensity and curvature) are

used to construct an approximation of the likelihood L(zt|xt), using the image data available at time t. For each object we

use the transformation

\tilde{p}_m(r_t|z_t) = \frac{(G_\sigma * \tilde{z}_t(r_t) - b_t)^r \, \kappa_t^s(r_t)}{\int_{C_{m,t}} (G_\sigma * \tilde{z}_t(r_t) - b_t)^r \, \kappa_t^s(r_t) \, dx\, dy\, dz},   (27)

\forall r_t \in C_{m,t}, where G_\sigma is the Gaussian kernel with standard deviation (scale) \sigma, and the curvature \kappa_t(r_t) is given by the determinant of the Hessian matrix H of the intensity \tilde{z}_t(r_t):

\kappa_t(r_t) = \det(H(r_t)), \qquad H(r_t) = \nabla \cdot \nabla^T \tilde{z}_t(r_t),   (28)

and the exponents r > 0 and s > 0 weigh each of the features and determine the peakedness of the likelihood.

Using this transformation, we define the new data-dependent proposal distribution for object m as

\tilde{q}_m(x_t|x_{t-1}, z_t) = \tilde{p}_m(r_t|z_t) \, \mathcal{N}(I_t|\tilde{z}_t(r_t) - b_t, q_3 T) \times \mathcal{N}(s_t|s_{m,t-1}^{MMSE}, T q_2 I) \, \mathcal{N}(v_t|r_t - \hat{r}_{m,t-1}^{MMSE}, T q_1 I),   (29)

Contrary to the original proposal distribution, which fails if the likelihood is too peaked, the distribution (29) generates samples

that are highly consistent with the most recent measurements in the predicted (using the information from the previous time

step) gates. A combination of both proposal distributions gives excellent results:

q_m(x_t|x_{t-1}, z_t) = \gamma D(x_t|x_{t-1}) + (1 - \gamma) \, \tilde{q}_m(x_t|x_{t-1}, z_t),

where 0 < γ < 1. Comparison shows that the proposal distribution qm(xt|xt−1,zt) is uniformly superior to the regular one

(γ = 1) and scales much better to smaller sample sizes.
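A 2D sketch of the intensity-and-curvature map (27)-(28) on a synthetic frame. Two simplifications relative to the text: a 3x3 box filter stands in for the Gaussian smoothing G_σ, and negative curvature values are clipped to zero.

```python
import numpy as np

def proposal_map(z, b, r=1.0, s=1.0):
    """Sketch of (27): (smoothed intensity - background)^r * det(Hessian)^s,
    normalized to a probability map. Box smoothing stands in for G_sigma,
    and np.roll wraps at the borders (fine for objects away from the edge)."""
    zs = sum(np.roll(np.roll(z, di, 0), dj, 1)
             for di in (-1, 0, 1) for dj in (-1, 0, 1)) / 9.0
    gx, gy = np.gradient(zs)
    gxx, gxy = np.gradient(gx)
    _, gyy = np.gradient(gy)
    kappa = np.clip(gxx * gyy - gxy ** 2, 0.0, None)   # det(H) of (28), clipped
    p = np.clip(zs - b, 0.0, None) ** r * kappa ** s
    return p / p.sum()

n = 33
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
frame = 4.0 + 20.0 * np.exp(-((i - 16) ** 2 + (j - 16) ** 2) / (2.0 * 3.0 ** 2))
pmap = proposal_map(frame, b=4.0)
peak = np.unravel_index(np.argmax(pmap), pmap.shape)
```

Because both factors, intensity above background and positive curvature, peak on true spots and stay near zero on noise-induced maxima, sampling from this map concentrates new particles on plausible object locations.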

G. Clustering and Track Management

The representation of the filtering distribution p(x_t|z_{1:t}) as the mixture model (6) allows for a deterministic spatial reclustering procedure (\{c_t'^{(i)}\}, M') = F(\{x_t^{(i)}\}, \{c_t^{(i)}\}, M) [55]. The function F can be implemented in any convenient way. It calculates a new mixture representation (with possibly a different number of mixture components) taking as input the current mixture representation. This allows modeling and capturing merging and splitting events, which also have a direct analogy with biological phenomena. In our implementation, at each iteration the mixture representation is recalculated by applying the K-means clustering algorithm. The reclustering is based on spatial information (object positions) only and is initialized with the estimates (25).

Taking into account our application, two objects are not allowed to merge when their states become similar. Whenever objects pass close to one another, the object with the best likelihood score typically "hijacks" the particles of the nearby mixture components. As mentioned above, this problem is partly solved by using the MRF model for object interactions. The MRF model significantly improves the tracking performance in 3D+t. For 2D+t data sets, however, the observed motion is a projection of the real 3D motion onto the 2D plane. In this case, when one object passes above or beneath another (in 3D), we perceive the motion as penetration or merging. These situations are in principle ambiguous and frequently cannot be resolved uniquely, neither by an automatic tracking method nor by a human observer.

We detect possible object intersections during tracking by checking whether the gates C_{m,t} intersect each other. For example, for two trajectories, the intersection is captured if C_{i,t} \cap C_{j,t} \neq \emptyset, i,j \in \{1,\dots,M\}. In general, the measurement space C_t = \cup_{m=1}^{M} C_{m,t} is partitioned into a set of disjoint regions C_t = \{C_{1,t}^*,\dots,C_{K,t}^*\}, where C_{k,t}^* is either the union of connected gates or the gate itself. For each C_{k,t}^*, we define a set of indices J_{k,t}, which indicate which of the gates C_{i,t} belong to it:

J_{k,t} = \{i \in \{1,\dots,M\} : C_{i,t} \in C_{k,t}^*\}.   (30)

For the gates C_{k,t}^* with |J_{k,t}| = 1, the update of the MC weights w_{m,t}^{(i)} is done according to (4). For all other gates C_{k,t}^*, which correspond to object interaction, we follow a procedure similar to the one described in Section IV-B. For each C_{k,t}^* for which |J_{k,t}| \neq 1, the set of states \{x_{j,t}^{(l)}\}, j \in J_{k,t}, is sampled from the proposal distribution (for every l = \{1,\dots,N_s\}), and a set of hypotheses \Theta_{k,t}^{(l)} = \{\theta_1^{(l)},\dots,\theta_S^{(l)}\}, S = 2^{|J_{k,t}|}, is formed. Each \theta_i^{(l)} is a set of binary associations, \{a_{i,j}^{(l)}\}, j \in J_{k,t}, where a_{i,j}^{(l)} = 1 if object j exists during the interaction, and a_{i,j}^{(l)} = 0 if the object "dies" or leaves just before or during the interaction and gives no measurements at time t. The hypothesis that maximizes the likelihood is selected as

\hat{\theta}_k^{(l)} = \arg\max_{\theta_i^{(l)} \in \Theta_{k,t}^{(l)}} L(z_t|x_t),   (31)

where the likelihood L(z_t|x_t) can be either L_G(z_t|x_t) or L_S(z_t|x_t), but the region C(x_t) is defined as C(x_t) = \cup_{j \in J_{k,t}} C(x_{j,t}^{(l)}), and h_t(\cdot;x_t) is substituted in (16) and (20) for each \theta_i^{(l)} with \sum_{j \in J_{k,t}} a_{i,j}^{(l)} h_t(\cdot;x_{j,t}^{(l)}). For the update of the MC weights w_{j,t}^{(l)}, the region C(x_t) = C(x_{j,t}^{(l)}) and h_t(\cdot;x_t) = \sum_{j \in J_{k,t}} \hat{a}_j^{(l)} h_t(\cdot;x_{j,t}^{(l)}) are used in (16) and (20), with the \hat{a}_j^{(l)} denoting the a_{i,j}^{(l)} corresponding to \hat{\theta}_k^{(l)}. Additionally, in such cases, we do not perform reclustering, but keep the labels for the current iteration as they were before. If the component representation in the next few frames after the interaction event becomes too diffuse, and there is more than one significant mode, splitting is performed and a new track is initiated (see IV-H for more details).

Finally, for the termination of an existing track, the methods commonly used for small target tracking [23], [24] cannot be applied straightforwardly. These methods assume that, due to imperfect sensors, the probability of detecting an object is less than one, and they try to follow the object after disappearance for 4-5 frames, predicting its position in time and hoping to catch it again. In our case, when the density of objects in the images is high, such monitoring would definitely result in "confirming" measurements after 3-5 frames of prediction, but these measurements would very likely originate from another object. In our algorithm, in order to terminate a track, we define thresholds \bar{\sigma}_{\max}, \bar{\sigma}_{\min}, \bar{\sigma}_z that describe the "biggest" objects that we are going to track. Then we sample the particles in the predicted gates C_{m,t} using the data-dependent sampling (27) with s = 0. If the determinant of the covariance matrix computed for those MC samples is greater than \bar{\sigma}_{\max}^2 \bar{\sigma}_{\min}^2 \bar{\sigma}_z^2 r^{-3}, the track is terminated. If the gate C_{m,t} does not contain a real object, the determinant value will be much higher than the proposed threshold, which nicely separates the objects from the background structures.

H. Initialization and Track Initiation

The prior distribution p(x_0) is specified based on information available in the first frame. One way to initialize the state vector x_0 would be to point at the desired bright spots in the image or to select regions of interest. In the latter case, the state vector is initialized by a uniform distribution over the state space, in predefined intervals for velocity and intensity, and the expected number of objects should be specified. During filtering and reclustering, after a burn-in period of 2-3 frames, only the true objects will remain.

For completely automatic initiation of object tracks in the first frame, and also for the detection of potential objects for tracking in subsequent frames, we use the following procedure. First, the image space is divided into N_I = N_X \times N_Y \times N_Z rectangular 3D cells of dimensions \Delta_c \times \Delta_c \times \Delta_a, with \Delta_c = 6\sigma_{\max} and \Delta_a = 6\sigma_z. Next, for each time step t, the image is converted to a probability map according to (27), and N = M N_s particles \tilde{x}_t^{(i)} are sampled with equal weights. The number of particles in each cell represents the degree of belief in object birth. To discriminate potential objects from background structures or noise, we estimate for each cell the center of mass \hat{r}_k (k = \{1,\dots,N_I\}) by MC integration over that cell and calculate the number of MC samples n_{k,t} in the ellipsoidal regions S_{k,t}(r_t) centered at \hat{r}_k (with semi-axes of lengths \Delta_c/2, \Delta_c/2, \Delta_a/2). In order to initiate a new object, two conditions have to be satisfied. The first condition is that n_{k,t} should be greater than N |S_{k,t}| / |z_t| = N\pi (6 N_I)^{-1}. This threshold represents the expected number of particles if the sampling was done from the image region with uniform background intensity. The second condition is similar to the one for track termination (see IV-G): the determinant of the covariance matrix should be smaller than \bar{\sigma}_{\max}^2 \bar{\sigma}_{\min}^2 \bar{\sigma}_z^2 r^{-3}.

Each object d (out of M_d newly detected at time t) is initialized with mixture weight \pi_{d,t} = (M + M_d)^{-1} and object position r_{d,t} (the center of mass calculated by MC integration over the region S_{d,t}(r_t)). The velocity is uniformly distributed in a predefined range and the intensity is obtained from the image data for that frame and position. In cases where the samples from an undetected object are split between four cells (in the unlikely event that the object is positioned exactly on the intersection of the cell borders), the object will most probably be detected in the next time frame.


Fig. 2. Examples of synthetic images used in the experiments. The left image is a single frame from one of the sequences, at SNR=2, giving an impression of object appearance. The insets show zooms of objects at different SNRs. The right image is a frame from another sequence, at SNR=7, with the trajectories of the 20 moving objects superimposed (white dots), illustrating the motion patterns allowed by the linear state evolution model (8).

V. EXPERIMENTAL RESULTS

The performance of the described PF-based tracking method was evaluated using both computer generated image data

(Section V-A) and real fluorescence microscopy image data from MT dynamics studies (Section V-B). The former allowed us

to test the accuracy and robustness to noise and object interaction of our algorithm compared to two other commonly used

tracking tools. The experiments on real data enabled us to compare our algorithm to expert human observers.

A. Evaluation on Synthetic Data

1) Simulation Setup: The algorithm was evaluated using synthetic but realistic 2D image sequences (20 time frames of

512 × 512 pixels, ∆x= ∆y= 50 nm, T = 1 sec) of moving MT-like objects (a fixed number of 10, 20, or 40 objects per

sequence, yielding data sets of different object densities), generated according to (8) and (14), for different levels of Poisson

noise (see Fig. 2) in the range SNR=2–7, since SNR=4 has been identified by previous studies [12], [13] as a critical level at

which several popular tracking methods break down. In addition, the algorithm was tested using 3D synthetic image sequences

(20 time frames of 512×512 pixels ×20 optical slices, ∆x= ∆y= 50 nm, ∆z= 200 nm, T = 1 sec, with 10–40 objects per

sequence), also for different noise levels in the range of SNR=2–7. Here, SNR is defined as the difference in intensity between

the object and the background, divided by the standard deviation of the object noise [12]. The velocities of the objects ranged

from 200 to 700 nm/sec, representative of published data [64].

Having the ground truth for the synthetic data, we evaluated the accuracy of tracking by using a traditional quantitative

performance measure: the root mean square error (RMSE), in K independent runs (we used K = 3) [24]:

\mathrm{RMSE} = \sqrt{\frac{1}{K} \sum_{k=1}^{K} \mathrm{RMSE}_k^2},   (32)

with

\mathrm{RMSE}_k^2 = \frac{1}{M} \sum_{m=1}^{M} \frac{1}{|T_m|} \sum_{t \in T_m} \| r_{m,t} - \hat{r}_{m,t}^k \|^2,   (33)

where r_{m,t} defines the true position of object m at time t, \hat{r}_{m,t}^k is a posterior mean estimate of r_{m,t} for the kth run, and T_m is the set of time points at which object m exists.
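The error measure (32)-(33) in code form, evaluated on made-up toy trajectories:

```python
import numpy as np

def rmse(true_tracks, runs):
    """(32)-(33): true_tracks maps object m to a (|T_m|, d) position array;
    runs maps run k to a dict of matching estimated position arrays."""
    K = len(runs)
    total = 0.0
    for est_tracks in runs.values():
        M = len(true_tracks)
        rmse_k_sq = 0.0
        for m, truth in true_tracks.items():
            d2 = np.sum((truth - est_tracks[m]) ** 2, axis=1)
            rmse_k_sq += d2.mean() / M       # (1/M)(1/|T_m|) sum_t ||.||^2
        total += rmse_k_sq
    return np.sqrt(total / K)

truth = {0: np.zeros((4, 2))}
runs = {0: {0: np.full((4, 2), 3.0)}}   # constant (3, 3) offset at each time point
err = rmse(truth, runs)
```

Averaging the squared errors per run before the square root, as in (32), keeps the measure comparable across runs with different numbers of tracked objects.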

2) Experiments with Hierarchical Searching: In order to show the advantage of using the proposed hierarchical search

strategy (see IV-D), we calculated the localization error at different SNRs for objects moving along horizontal straight lines

at a constant speed of 400 nm/sec (similar to [6]). The tracking was done for two types of objects: round (σmax= σmin= 100

nm) and elongated (σmax = 300 nm, σmin = 100 nm) using the likelihoods LS, LG, and the combined two-step approach

LSG. The filtering was performed with 500 MC samples. The RMSE for all three models is shown in Fig. 3. The localization

error of the hierarchical search is lower and the effective sample size Neffis higher than in the case of using only LG. For

comparison, for the likelihoods LS, LG, and LSG, the ratios between the effective sample size Neffand Nsare less than 0.5,

0.005, and 0.05, respectively.

3) Comparison with Conventional Two-Stage Tracking Methods: The proposed PF-based tracking method was compared

to conventional two-stage (completely separated detection and linking) tracking approaches commonly found in the literature.

To maximize the credibility of these experiments, we chose to use two existing, state-of-the-art multitarget tracking software

tools based on this principle, rather than making our own (possibly biased) implementation of described methods. The first is

m,tis a posterior mean estimate of rm,tfor the kth run, and Tm


Fig. 3. The RMSE in object position estimation as a function of SNR for round (left) and elongated (right) objects using the three different observation models, L_G, L_S, and L_SG.

Fig. 4. Example (SNR=3) showing the ability of our PF method to deal with one-frame occlusion scenarios (top sequence), using the proposed reclustering procedure, while ParticleTracker (and similarly Volocity) fails (bottom sequence).

Fig. 5. Typical example (SNR=3) showing the ability of our PF method to resolve object crossing correctly (top sequence), by using the information about the object shape during the measurement-to-track association process, while ParticleTracker (and similarly Volocity) fails (bottom sequence).

Fig. 6. Example (SNR=3) where our PF method as well as ParticleTracker and Volocity failed (only the true tracks are shown in the sequence), because three objects interact at one location and the occlusion lasts for more than one frame.

Volocity (Improvision, Coventry, UK), which is a commercial software package, and the second is ParticleTracker [6], which

is freely available as a plugin to the public-domain image analysis tool ImageJ [65] (National Institutes of Health, Bethesda,

MD, USA).

With Volocity, the user has to specify thresholds for the object intensity and the approximate object size in order to discriminate


TABLE I
COMPARISON OF THE ABILITY OF THE THREE METHODS TO TRACK OBJECTS CORRECTLY IN CASES OF OBJECT APPEARANCE, DISAPPEARANCE, AND INTERACTIONS.

                     Volocity         ParticleTracker    Particle Filter
            SNR      r0      r1       r0      r1         r0      r1
Ntr = 10    2        1.8     0.1      1.1     0.9        1       1
            3        1       0.5      1       1          1       1
            4        1       0.7      1       1          1       1
            5        1       1        1       1          1       1
            7        1       1        1       1          1       1
Ntr = 20    2        2       0.1      1.15    0.5        1.05    0.8
            3        1.95    0.15     1.05    0.6        1       0.9
            4        1.35    0.45     1.05    0.6        1       0.95
            5        1.1     0.65     1       0.7        1       1
            7        1.05    0.9      1       0.85       1       1
Ntr = 40    2        1.7     0.1      1.9     0.05       1.05    0.5
            3        1.5     0.15     1.1     0.6        1.02    0.7
            4        1.42    0.2      1.05    0.7        1       0.8
            5        1.22    0.35     1.04    0.8        1       0.9
            7        1.17    0.33     1.02    0.8        1       0.9

objects from the background, in the detection stage. These thresholds are set globally, for the entire image sequence. Following

the extraction of all objects in each frame, linking is performed on the basis of finding nearest neighbors in subsequent

image frames. This association of nearest neighbors also takes into account whether the motion is smooth or erratic. With

ParticleTracker, the detection part also requires setting intensity and object size thresholds. The linking, however, is based on

finding the global optimal solution for the correspondence problem in a given number of successive frames. The solution is

obtained using graph theory and global energy minimization [6]. The linking also utilizes the zeroth- and second-order intensity

moments of the object intensities. This better resolves intersection problems and improves the linking result. For both tools,

the parameters were optimized manually during each stage, until all objects in the scene were detected. Our PF-based method

was initialized using the automatic initialization procedure described in Section IV-H. The user-definable algorithm parameters

were fixed to the following values: σmax = 250 nm, σmin = 120 nm, q1 = 7500 nm²/sec³, q2 = 25 nm/sec, q3 = 0.1, and

10³ MC samples were used per object. To enable comparisons with manual tracking, five independent, expert observers also

tracked the 2D synthetic image sequences, using the freely available software tool MTrackJ [66].
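The frame-to-frame nearest-neighbor linking described above can be sketched as follows. This is a minimal greedy illustration, not Volocity's actual implementation: the function name, the matching order, and the gating distance `max_dist` are our own simplifications of the idea of associating each open track with its closest detection in the next frame.

```python
import math

def link_nearest_neighbor(frames, max_dist):
    """Greedily link detections in consecutive frames by nearest-neighbor
    association; detections farther than max_dist from any open track
    start new tracks (object appearance), and tracks left without a
    match are terminated (object disappearance).

    frames: list of frames, each a list of (x, y) detections.
    Returns a list of tracks, each a list of (frame_index, (x, y)).
    """
    tracks = [[(0, p)] for p in frames[0]]   # one open track per detection in frame 0
    active = list(range(len(tracks)))        # indices of tracks still being extended
    for t, detections in enumerate(frames[1:], start=1):
        unmatched = list(detections)
        still_active = []
        for ti in active:
            if not unmatched:
                continue
            last = tracks[ti][-1][1]
            # nearest unmatched detection to the track's last position
            best = min(unmatched, key=lambda p: math.dist(last, p))
            if math.dist(last, best) <= max_dist:
                tracks[ti].append((t, best))
                unmatched.remove(best)
                still_active.append(ti)
        # unmatched detections initiate new tracks
        for p in unmatched:
            still_active.append(len(tracks))
            tracks.append([(t, p)])
        active = still_active
    return tracks

# Two well-separated objects over three frames yield two clean tracks.
frames = [[(0, 0), (10, 10)], [(1, 0), (10, 11)], [(2, 0), (10, 12)]]
tracks = link_nearest_neighbor(frames, max_dist=3)
```

A greedy scheme like this is exactly what breaks down at track intersections, which is why ParticleTracker instead solves the correspondence problem globally over several frames.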

4) Tracking Results: First, using the 2D synthetic image sequences, we compared the ability of our algorithm, Volocity, and

ParticleTracker to track objects correctly, despite possible object appearances, disappearances, and interactions or crossings.

The results of this comparison are presented in Table I. Two performance measures are listed: r0, which is the ratio between

the number of tracks produced by the algorithm and the true number of tracks present in the data (Ntr), and r1, which is the

ratio between the number of correctly detected tracks and the true number of tracks. Ideally, the values for both ratios should

be equal to 1. A value of r0 > 1 indicates that the method produced broken tracks. The main cause of this is the inability to

resolve track intersections in some cases (see Fig. 4 for an example). In such situations the method either initiates new tracks

after the object interaction event (because during the detection stage only one object was detected at that location, see Fig. 4),

increasing the ratio r0, or it incorrectly interchanges the tracks before and after the interaction (see Fig. 5 for an example),

lowering the ratio r1. From the results in Table I and the examples in Figs. 4 and 5, it clearly follows that our PF method is

much more robust in dealing with object interactions. The scenario in the latter example causes no problems for the PF, as,

contrary to the two other methods, it exploits information about object appearance. During the measurement-to-track association,

the PF favors measurements that are close to the predicted location and that have an elongation in the predicted direction of

motion. In some cases (see Fig. 6 for an example), all three methods fail, which generally occurs when the interaction is too

complicated to resolve even for expert biologists.
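The two performance measures can be stated compactly; the helper below and the example counts are illustrative restatements of the definitions, not values taken from Table I.

```python
def track_ratios(n_produced, n_correct, n_true):
    """Performance measures used in Table I: r0 compares the number of
    tracks produced against the true number of tracks, and r1 is the
    fraction of true tracks recovered correctly. Ideal value: 1 for both.
    r0 > 1 signals broken (or spurious) tracks; r1 < 1 signals missed or
    interchanged tracks."""
    r0 = n_produced / n_true
    r1 = n_correct / n_true
    return r0, r1

# Hypothetical example: 20 true tracks; a tracker emits 27 tracks
# (broken tracks inflate the count), of which 18 are correct.
r0, r1 = track_ratios(n_produced=27, n_correct=18, n_true=20)
# r0 = 1.35 (broken tracks present), r1 = 0.9
```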

Using the same data sets and tracking results, we calculated the RMSE in object position estimation, as a function of

SNR. To make a fair comparison, only the results of correctly detected tracks were included in these calculations. The results

are shown in Fig. 7. The localization error of our algorithm is in the range of 10–50 nm, depending on the SNR, which is

approximately 2–3 times smaller than for manual tracking. The error bars represent the interobserver variability for manual

tracking, which, together with the average errors, indicate that the performance of manual tracking degrades significantly for

low SNRs, as expected. The errors of the three automated methods show the same trend, with our method being consistently

more accurate than the other two. This may be explained by the fact that, in addition to object localization by center-of-mass

estimation, our hierarchical search performs further localization refinement during the second step (22). The RMSE in Fig.

7 is larger than in Fig. 3, because, even though only correct tracks were included, the accuracy of object localization during

multiple object tracking is unfavorably influenced at places where object interaction occurs.
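The localization error underlying Fig. 7 is a root-mean-square error over paired estimated and ground-truth positions. A minimal sketch, assuming per-frame position pairs in nanometers (the function name is ours):

```python
import math

def rmse(track_est, track_true):
    """Root-mean-square error between estimated and ground-truth object
    positions, paired frame by frame."""
    sq = [math.dist(p, q) ** 2 for p, q in zip(track_est, track_true)]
    return math.sqrt(sum(sq) / len(sq))

# A constant 30 nm offset in x gives an RMSE of 30 nm.
est  = [(30.0, 0.0), (130.0, 0.0), (230.0, 0.0)]
true = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0)]
# rmse(est, true) → 30.0
```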


Fig. 7. The RMSE in object position estimation as a function of SNR for our algorithm (Particle Filter) versus the two other automatic methods (Volocity and ParticleTracker) and manual tracking (five observers) based on synthetic image data. Error bars indicate the standard deviation.

Our algorithm was also tested on the 3D synthetic image sequences as described, using 20 MC simulations. The RMSEs

for the observation model LSG ranged from ≈ 30 nm (SNR = 7) to ≈ 70 nm (SNR = 2). These errors were comparable to

the errors produced by Volocity (in this test, ParticleTracker was excluded, as it is limited to tracking in 2D+t). Despite the

fact that the axial resolution of the imaging system is approximately three times lower, the localization error was not affected

dramatically relative to the 2D+t case. The reason for this is that in 3D+t data, we have a larger number of informative image

elements (voxels). As a result, the difference in the RMSEs produced by the estimators employed in our algorithm and in

Volocity is smaller than in Fig. 7.

B. Evaluation on Real Data

1) Image Acquisition: In addition to the computer generated image data, real 2D fluorescence microscopy image sequences

of MT dynamics were acquired. COS-1 cells were cultured and transfected with GFP-tagged proteins as described [64], [67].

Cells were analyzed at 37°C on a Zeiss 510 confocal laser scanning microscope (LSM-510). In most experiments the optical

slice separation (in the z-dimension) was set to 1 µm. Images of GFP+TIP movements in transfected cells were acquired every

1–3.5 seconds. For different imaging setups, the pixel size ranged from 70 × 70 nm² to 110 × 110 nm². Image sequences of

30–50 frames were recorded and movies assembled using LSM-510 software. Six representative data sets (30 frames of size

512 × 512 pixels), examples of which are shown in Fig. 1, were preselected from larger volumes by manually choosing the

regions of interest. GFP+TIP dashes were tracked in different cell areas. Instantaneous velocities of dashes were calculated

simply by dividing measured or tracked distances between frames by the temporal sampling interval.
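The velocity computation just described amounts to a displacement divided by the sampling interval. A minimal sketch (the helper name and the sample coordinates are illustrative):

```python
import math

def instantaneous_velocities(track, dt):
    """Instantaneous velocity between consecutive frames: Euclidean
    displacement divided by the temporal sampling interval dt."""
    return [math.dist(a, b) / dt for a, b in zip(track, track[1:])]

# A tip advancing 300 nm per frame, sampled every 2 s → 150 nm/s per step.
track = [(0.0, 0.0), (300.0, 0.0), (600.0, 0.0)]
# instantaneous_velocities(track, dt=2.0) → [150.0, 150.0]
```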

2) Comparison with Manual Tracking: Lacking ground truth for the real data, we evaluated the performance of our algorithm

by visual comparison with manual tracking results. In this case, the latter were obtained from two expert cell biologists, each

of whom tracked 10 moving MTs of interest using the aforementioned software tool MTrackJ. The selection of target MTs

to be tracked was made independently by the two observers. Also, the decision of which feature to track (the tip, the center,

or the brightest point) was left to the observers. When done consistently, this does not influence velocity estimations, which is

what we focused on in these experiments. The parameters of our algorithm (run with the model LSG) were fixed to the same

values as in the case of the evaluation on synthetic data.

3) Tracking Results: Distributions of instantaneous velocities estimated using our algorithm versus manual tracking are presented

in Fig. 8. The graphs show the results for the data sets of Fig. 1(a) and (f), for which SNR ≈ 5 and SNR ≈ 2, respectively. A

visual comparison of the estimated velocities per track, for each of the 10 tracks (the average track length was 13 time steps),

is presented in Fig. 9, with more details for two representative tracks shown in Fig. 10. Application of a paired Student t-test

per track revealed no statistically significant difference between the results of our algorithm and that of manual tracking, for

both expert human observers (p ≫ 0.05 in all cases). Often, biologists are interested in average velocities over sets of tracks.

In the described experiments, the difference in average velocity (per 10 tracks) between automatic and manual tracking was

less than 1%, for both observers. Our velocity estimates are also comparable to those reported previously based on manual

tracking in the same type of image data [64].
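The per-track paired Student t-test can be sketched with the standard library; in practice scipy.stats.ttest_rel returns the p-value directly. The function name and the sample velocities below are illustrative, and the sketch assumes the paired differences are not all identical (sample standard deviation > 0):

```python
import math
import statistics

def paired_t_statistic(a, b):
    """Paired Student t statistic for per-track velocity comparisons
    (automatic vs. manual): t = mean(d) / (stdev(d) / sqrt(n)) over the
    paired differences d. A small |t| (below the critical value at the
    chosen significance level) means no significant difference."""
    d = [x - y for x, y in zip(a, b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))

# Hypothetical per-step velocities (µm/s) for one track: automatic vs.
# manual estimates that agree on average → t near zero.
auto   = [0.21, 0.19, 0.20, 0.22]
manual = [0.20, 0.20, 0.21, 0.21]
t = paired_t_statistic(auto, manual)
```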

Finally, we present two different example visualizations of real data together with the results of tracking using our algorithm.

Fig. 11 shows the results of tracking in the presence of photobleaching, which clearly illustrates the capability of our algorithm
