# Multi-camera real-time three-dimensional tracking of multiple flying animals

**Abstract**

Automated tracking of animal movement allows analyses that would not otherwise be possible by providing great quantities of data. The additional capability of tracking in real time—with minimal latency—opens up the experimental possibility of manipulating sensory feedback, thus allowing detailed explorations of the neural basis for control of behaviour. Here, we describe a system capable of tracking the three-dimensional position and body orientation of animals such as flies and birds. The system operates with less than 40 ms latency and can track multiple animals simultaneously. To achieve these results, a multi-target tracking algorithm was developed based on the extended Kalman filter and the nearest neighbour standard filter data association algorithm. In one implementation, an 11-camera system is capable of tracking three flies simultaneously at 60 frames per second using a gigabit network of nine standard Intel Pentium 4 and Core 2 Duo computers. This manuscript presents the rationale and details of the algorithms employed and shows three implementations of the system. An experiment was performed using the tracking system to measure the effect of visual contrast on the flight speed of Drosophila melanogaster. At low contrasts, speed is more variable and faster on average than at high contrasts. Thus, the system is already a useful tool to study the neurobiology and behaviour of freely flying animals. If combined with other techniques, such as 'virtual reality'-type computer graphics or genetic manipulation, the tracking system would offer a powerful new way to investigate the biology of flying animals.


Andrew D. Straw*, Kristin Branson, Titus R. Neumann and Michael H. Dickinson

California Institute of Technology, Bioengineering, Mailcode 138-78, Pasadena, CA 91125, USA

*Author for correspondence (astraw@caltech.edu).

J. R. Soc. Interface (2011) 8, 395–409. doi:10.1098/rsif.2010.0230. Received 19 April 2010; accepted 21 June 2010; published online 14 July 2010.


Keywords: computer vision; animal behaviour; flight; manoeuvring; insects; birds

1. INTRODUCTION

Much of what we know about the visual guidance of flight [1–4], aerial pursuit [5–9], olfactory search algorithms [10,11] and control of aerodynamic force generation [12,13] is based on experiments in which an insect was tracked during flight. To facilitate these types of studies and to enable new ones, we created a new, automated animal tracking system. A significant motivation was to create a system capable of robustly gathering large quantities of accurate data in a highly automated and flexible fashion. The real-time nature of the system enables experiments in which an animal's own movement is used to control the physical environment, allowing virtual-reality or other dynamic stimulus regimes to investigate the feedback-based control performed by the nervous system. Furthermore, the ability to easily collect flight trajectories facilitates data analysis and behavioural modelling using machine-learning approaches that require large amounts of data.

Our primary innovation is the use of arbitrary numbers of inexpensive cameras for markerless, real-time tracking of multiple targets. Typically, cameras with relatively high temporal resolution, such as 100 frames per second, and which are suitable for real-time image analysis (those that do not buffer their images to on-camera memory), have relatively low spatial resolution. To have high spatial resolution over a large tracking volume, many cameras are required. Therefore, the use of multiple cameras enables tracking over large, behaviourally and ecologically relevant spatial scales with high spatial and temporal resolutions while minimizing the effects of occlusion. The framework naturally allows information gathered from each camera view to incrementally improve localization. Individual views of the target thus refine the tracking estimates, even if other cameras do not see it (for example, owing to occlusions or low contrast). The use of multiple cameras also gives the system its name, flydra, from 'fly', our primary experimental animal, and the mythical Greek multi-headed serpent 'hydra'.

Flydra is largely composed of standard algorithms, hardware and software. Our effort has been to integrate these disparate pieces of technology into one coherent, working system with the important property that the multi-target tracking algorithm operates with low latency during experiments.

1.1. System overview

A Bayesian framework provides a natural formalism to describe our multi-target tracking approach. In such a framework, previously held beliefs are called the a priori, or prior, probability distribution of the state of the system. Incoming observations are used to update the estimate of the state into the a posteriori, or posterior, probability distribution. This process is often likened to human reasoning, whereby a person's best guess at some value is arrived at through a process of combining previous expectations of that value with new observations that inform about the value.

The task of flydra is to find the maximum a posteriori (MAP) estimate of the state $S_t$ of all targets at time $t$ given observations $Z_{1:t}$ from all time steps (starting with the first time step to the current time step), $p(S_t \mid Z_{1:t})$. Here, $S_t$ represents the state (position and velocity) of all targets, $S_t = (\mathbf{s}_t^1, \ldots, \mathbf{s}_t^{l_t})$, where $l_t$ is the number of targets at time $t$. Under the first-order Markov assumption, we can factorize the posterior as

$$p(S_t \mid Z_{1:t}) \propto p(Z_t \mid S_t) \int p(S_t \mid S_{t-1})\, p(S_{t-1} \mid Z_{1:t-1})\, \mathrm{d}S_{t-1}. \tag{1.1}$$

Thus, the process of estimating the posterior probability of target state at time $t$ is a recursive process in which new observations are used in the model of observation likelihood $p(Z_t \mid S_t)$. Past observations become incorporated into the prior, which combines the motion model $p(S_t \mid S_{t-1})$ with the target probability from the previous time step, $p(S_{t-1} \mid Z_{1:t-1})$.

Flydra uses an extended Kalman filter (EKF) to approximate the solution to equation (1.1), as described in §3.1. The observation $Z_t$ for each time step is the set of all individual low-dimensional feature vectors containing image position information arising from the camera views of the targets (§2). In fact, equation (1.1) neglects the challenges of data association (linking individual observations with specific targets) and of targets entering and leaving the tracking volume. Therefore, the nearest neighbour standard filter (NNSF) data association step is used to link individual observations with target models in the model of observation likelihood (§3.2), and the state update model incorporates the ability for targets to enter and leave the tracking volume (§3.2.3). The heuristics employed in implementing the system were typically chosen to optimize real-time performance and low latency rather than compactness of description, and our system only approximates the full Bayesian solution rather than implementing it perfectly. Nevertheless, the remaining sections of this manuscript address their relation to the global Bayesian framework where possible, and aspects of the system which were found to be important for low-latency operation are mentioned.

The general form of the apparatus is illustrated in figure 1a and a flowchart of operations is given in figure 2a. Digital cameras are connected (with an IEEE 1394 FireWire bus or a dedicated gigabit ethernet cable) to image processing computers that perform a background subtraction-based algorithm to extract image features such as the two-dimensional target position and orientation in a given camera's image. From these computers, this two-dimensional information is transmitted over a gigabit ethernet LAN to a central computer, which performs two- to three-dimensional triangulation and tracking. Although the tracking results are generated and saved online, in real time as the experiment is performed, raw image sequences can also be saved for verification purposes as well as for other types of analyses. Finally, reconstructed flight trajectories, such as that of figure 1b, may then be subjected to further analysis (figure 3 and see figure 9).

1.2. Related work

Several systems have allowed manual or manually assisted digitization of the trajectories of freely flying animals. In the 1970s, Land & Collett [5] performed pioneering studies on the visual guidance of flight in blowflies and, later, in hoverflies [14,15]. By the end of the 1970s and into the 1980s, three-dimensional reconstructions using two views of flying insects were performed [6–8,16,17]. In one case, the shadow of a bee on a planar white surface was used as a second view to perform three-dimensional reconstruction [18]. Today, hand digitization is still used when complex kinematics, such as wing shape and position, are desired, such as in Drosophila [13], cockatoo [19] and bats [20].

Figure 1. (a) Schematic of the multi-camera tracking system. (b) A trajectory of a fly (Drosophila melanogaster) near a dark, vertical post. Arrow indicates direction of flight at onset of tracking.

Several authors have solved similar automated multi-target tracking problems using video. For example, Khan et al. [21] tracked multiple, interacting ants in two dimensions from a single view using particle filtering with a Markov Chain Monte Carlo sampling step to solve the multi-target tracking problem. Later work by the same authors [22] achieved real-time speeds through the use of sparse updating techniques. Branson et al. [23] addressed the same problem for walking flies. Their technique uses background subtraction and clustering to detect flies in the image, and casts the data association problem as an instance of minimum weight bipartite perfect matching. In implementing flydra, we found the simpler system described here to be sufficient for tracking the position of flying flies and hummingbirds (§5). In addition to tracking in three dimensions rather than two, a key difference between the work described above and that addressed in the present work is that the interactions between our animals are relatively weak (§3.2, especially equation (3.6)), and we did not find it necessary to implement a more advanced tracker. Nevertheless, the present work could be used as the basis for a more advanced tracker, such as one using a particle filter (e.g. [24]). In that case, the posterior from the EKF (§3.1) could be used as the proposal distribution for the particle filter. Others have decentralized the multiple object tracking problem to improve performance, especially when dealing with dynamic occlusions owing to targets occluding each other (e.g. [25,26]). Additionally, tracking of dense clouds of starlings [27–30] and fruit flies [31,32] has enabled detailed investigation of swarms, although these systems are currently incapable of operating in real time. By filming inside a corner-cube reflector, multiple (real and reflected) images allowed Bomphrey et al. [33] to track flies in three dimensions with only a single camera, and the tracking algorithm presented here could make use of this insight.

Completely automated three-dimensional animal tracking systems have more recently been created, such as systems with two cameras that track flies in real time [34–36]. The system of Grover et al. [37], similar in many respects to the one we describe here, tracks the visual hull of flies using three cameras to reconstruct a polygonal model of the three-dimensional shape of the flies.

Figure 2. (a) Flowchart of operations. (b) Schematic of a two-dimensional camera view showing the raw images (brown), feature extraction (blue), state estimation (black) and data association (red). See §§2 and 3.2.4 for an explanation of the symbols. (c) Three-dimensional reconstruction using the EKF uses prior state estimates (open circle) and observations (blue lines) to construct a posterior state estimate (filled circle) and covariance ellipsoid (dotted ellipse). See appendix A for details.

Our system, briefly described in a simpler, earlier form in Maimon et al. [38], differs in several ways. First, flydra has a design goal of tracking over large volumes, and, as a result of the associated limited spatial resolution (rather than owing to a lack of interest), flydra is concerned only with the location and orientation of the animal. Second, to facilitate tracking over large volumes, the flydra system uses a data association step as part of the tracking algorithm. The data association step allows flydra to deal with additional noise (false positive feature detections) in the low-contrast situations often present when attempting to track in large volumes. Third, our system does not attempt to maintain identity correspondence of multiple animals over extended durations, but rather stops tracking individuals when the tracking error is too high and starts tracking them as new individuals when they are detected again. Finally, although their system operates in real time, no measurements of latency were provided by Grover et al. [37] with which to compare our measurements.

1.3. Notation

In the equations to follow, letters in a bold, roman font signify a vector, which may be specified by components enclosed in parentheses and separated by commas. Matrices are written in roman font with uppercase letters. Scalars are in italics. Vectors always act like a single-column matrix, such that for vector $\mathbf{v} = (a, b, c)$, the multiplication with matrix $\mathrm{M}$ is

$$\mathrm{M}\mathbf{v} = \mathrm{M}[\mathbf{v}] = \mathrm{M}\begin{bmatrix} a \\ b \\ c \end{bmatrix}.$$

2. TWO-DIMENSIONAL FEATURE EXTRACTION

The first stage of processing converts digital images into a list of feature points using an elaboration of a background subtraction algorithm. Because the image of a target is usually only a few pixels in area, an individual feature point from a given camera characterizes that camera's view of the target. In other words, neglecting missed detections or false positives, there is usually a one-to-one correspondence between targets and extracted feature points from a given camera. Nevertheless, our system is capable of successful tracking despite missing observations owing to occlusion or low contrast (§3.1) and of rejecting false positive feature detections (as described in §3.2).

In the Bayesian framework, all feature points for time t are the observation Z_t. The ith of n cameras returns m feature points, with each point z_ij being a vector z_ij = (u, v, α, β, θ, ε), where u and v are the coordinates of the point in the image plane and the remaining components are local image statistics described below. Z_t thus consists of all such feature points for a given frame, Z_t = {z_11, ..., z_1m, ..., z_n1, ..., z_nm}. (In the interest of simplified notation, our indexing scheme is slightly misleading here—there may be varying numbers of features for each camera rather than always m as suggested.)

Figure 3. (a) Top view of the fly trajectory in figure 1b, showing several close approaches to and movements away from a dark post placed in the centre of the arena. The arrow indicates initial flight direction. Two sequences are highlighted in colour. The inset shows the coordinate system: κ is the horizontal distance to the post (mm) and ψ the approach angle (°). (b) The time-course of attraction and repulsion to the post is characterized by flight directly towards the post until only a small distance remains, at which point the fly turns rapidly and flies away. The approach angle ψ is quantified as the difference between the direction of flight and the bearing to the post, both measured in the horizontal plane. The arrow indicates the initial approach for the selected sequences. (c) Angular velocity (° s⁻¹, measured tangent to the three-dimensional flight trajectory) indicates that relatively straight flight is punctuated by saccades (brief, rapid turns).

The process to convert a new image to a series of feature points is based on background subtraction using the running Gaussian average method (reviewed in [39]). To achieve the fast image processing required for real-time operation, many of these operations are performed using the high-performance single instruction multiple data (SIMD) extensions available on recent x86 CPUs. Initially, an absolute difference image is made, where each pixel is the absolute value of the difference between the incoming frame and the background image. Feature points that exceed some threshold difference from the background image are noted and a small region around each pixel is subjected to further analysis. For the jth feature, the brightest point has value β_j in this absolute difference image. All pixels below a certain fraction (e.g. 0.3) of β_j are set to zero to reduce moment arms caused by spurious pixels. Feature area α_j is found from the 0th moment, the feature centre (ũ_j, ṽ_j) is calculated from the 1st moment, and the feature orientation θ_j and eccentricity ε_j are calculated from higher moments. After correcting for lens distortion (§4), the feature centre is (u_j, v_j). Thus, the jth point is characterized by the vector z_j = (u_j, v_j, α_j, β_j, θ_j, ε_j). Such features are extracted on every frame from every camera, although the number of points m found on each frame may vary. We set the initial thresholds for detection low to minimize the number of missed detections—false positives at this stage are rejected later by the data association algorithm (§3.2).
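As a concrete illustration, the following is a minimal Python sketch of this moment-based feature extraction; the function name, thresholds, window size and per-candidate windowing are illustrative assumptions rather than the exact flydra implementation (which, for instance, would merge neighbouring candidate pixels into a single feature):

```python
import numpy as np

def extract_features(diff, abs_thresh=10.0, frac=0.3, win=5):
    """Sketch: compute (u, v, alpha, beta, theta, ecc) per candidate feature
    from a float absolute-difference image `diff`. Parameter values are
    illustrative assumptions."""
    feats = []
    for y, x in zip(*np.nonzero(diff > abs_thresh)):
        r = diff[max(0, y - win):y + win + 1, max(0, x - win):x + win + 1].copy()
        beta = r.max()                        # brightest absolute difference
        r[r < frac * beta] = 0.0              # suppress spurious pixels
        v, u = np.mgrid[0:r.shape[0], 0:r.shape[1]]
        alpha = np.count_nonzero(r)           # 0th moment: feature area
        m00 = r.sum()
        uc, vc = (u * r).sum() / m00, (v * r).sum() / m00   # 1st moments: centre
        mu20 = ((u - uc) ** 2 * r).sum() / m00              # higher moments:
        mu02 = ((v - vc) ** 2 * r).sum() / m00              # orientation and
        mu11 = ((u - uc) * (v - vc) * r).sum() / m00        # eccentricity
        theta = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
        lo, hi = np.linalg.eigvalsh(np.array([[mu20, mu11], [mu11, mu02]]))
        ecc = np.sqrt(hi / max(lo, 1e-12))
        feats.append((uc + max(0, x - win), vc + max(0, y - win),
                      alpha, beta, theta, ecc))
    return feats
```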

Our system is capable of dealing with illumination conditions that vary slowly over time by using an ongoing estimate of the background luminance and its variance, which are maintained on a per-pixel basis by updating the current estimates with data from every 500th frame (or other arbitrary interval). A more sophisticated two-dimensional feature extraction algorithm could be used, but we have found this scheme to be sufficient for our purposes and sufficiently simple to operate with minimal latency.
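A minimal sketch of such a per-pixel running Gaussian background model follows; the class, the learning rate and the exact update rule are assumptions consistent with the method reviewed in [39], not flydra's code:

```python
import numpy as np

class RunningGaussianBackground:
    """Sketch: per-pixel running Gaussian mean/variance, updated only
    every `interval`-th frame (alpha and interval are assumed values)."""
    def __init__(self, first_frame, alpha=0.1, interval=500):
        self.mean = first_frame.astype(np.float64)
        self.var = np.ones_like(self.mean)
        self.alpha, self.interval, self.count = alpha, interval, 0

    def update(self, frame):
        self.count += 1
        if self.count % self.interval:        # skip all but every Nth frame
            return
        d = frame.astype(np.float64) - self.mean
        self.mean += self.alpha * d                           # running mean
        self.var = (1 - self.alpha) * self.var + self.alpha * d * d  # variance
```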

While the real-time operation of flydra is essential for experiments modifying sensory feedback, another advantage of an online tracking system is that the amount of data required to be saved for later analysis is greatly reduced. Because only two-dimensional feature extraction is performed in real time, only the vectors z_j need be saved to reconstruct three-dimensional trajectories later, resulting in orders of magnitude less data than the full-frame camera images. Thus, a system that seeks only the low data rates of real-time tracking need not implement the three-dimensional processing described in the following sections. Furthermore, raw images taken from the neighbourhood of the feature points could also be extracted and saved for later analysis, requiring slightly more storage, but still at rates substantially less than the full camera frames would produce. This fact is particularly useful for cameras with a higher data rate than hard drives can save, and such a feature is implemented in flydra.

Figure 4 shows the parameters (u, v, θ) from the two-dimensional feature extraction algorithm during a hummingbird flight. These two-dimensional features, in addition to three-dimensional reconstructions, are overlaid on raw images extracted and saved using the real-time image extraction technique described above.

Figure 4. Raw images, two-dimensional data extracted from the images, and overlaid computed three-dimensional position and body orientation of a hummingbird (Calypte anna). In these images, a blue circle is drawn centred on the two-dimensional image coordinates (ũ, ṽ). The blue line segment is drawn through the detected body axis (θ in §3.2) when the eccentricity (ε) of the detected object exceeds a threshold. The orange circle is drawn centred on the three-dimensional estimate of position (x, y, z) reprojected through the camera calibration matrix P, and the orange line segment is drawn in the direction of the three-dimensional body orientation vector.

3. MULTI-TARGET TRACKING

The goal of flydra, as described in §1.1, is to find the MAP estimate of the state of all targets. For simplicity, we model interaction between targets in a very limited way. Although in many cases the animals we are interested in tracking do interact (for example, hummingbirds engage in competition in which they threaten or even contact each other), mathematically limiting the interaction facilitates a reduction in computational complexity. First, the process update is independent for each kth animal:

$$p(S_t \mid S_{t-1}) = \prod_k p(S_{t,k} \mid S_{t-1,k}). \tag{3.1}$$

Second, we implemented only a slight coupling between targets in the data association algorithm. Thus, the observation likelihood model p(Z_t | S_t) is independent for each target, with the exception described in §3.2.1, and making this assumption allows use of the NNSF as described below.

Modelling individual target states as mostly independent allows the problem of estimating the MAP of the joint target state S to be treated nearly as l independent, smaller problems. One benefit of making the assumption of target independence is that the target tracking and data association parts of our system are parallelizable. Although not yet implemented in parallel, our system is theoretically capable of tracking very many (tens or hundreds of) targets simultaneously with low latency on a computer with sufficiently many processing units.

The cost of this near-independence assumption is reduced tracking accuracy during periods of near contact (§3.2.1). Data from these periods could be analysed later using more sophisticated multi-target tracking data association techniques, presumably in an offline setting, especially because such periods could be easily identified using a simple algorithm. All data presented in this paper used the procedure described here.

3.1. Kalman filtering

The standard EKF approximately estimates statistics of the posterior distribution (equation (1.1)) for nonlinear processes with additive Gaussian noise (details are given in appendix A). To use this framework, we make the assumption that noise in the relevant processes is Gaussian. Additionally, our target independence assumption allows a single Kalman filter implementation to be used for each tracked target. The EKF estimates state and its covariance based on a prior state estimate and incoming observations by using models of the state update process, the observation process, and estimates of the noise of each process.

We use a linear model for the dynamics of the system and a nonlinear model of the observation process. Specifically, the time evolution of the system is modelled with the linear discrete stochastic model

$$\mathbf{s}_t = \mathrm{A}\mathbf{s}_{t-1} + \mathbf{w}. \tag{3.2}$$

We treat the target as an orientation-free particle, with state vector $\mathbf{s} = (x, y, z, \dot{x}, \dot{y}, \dot{z})$ describing position and velocity in three-dimensional space. The process update model A represents, in our case, the laws of motion for a constant-velocity particle:

$$\mathrm{A} = \begin{bmatrix}
1 & 0 & 0 & dt & 0 & 0 \\
0 & 1 & 0 & 0 & dt & 0 \\
0 & 0 & 1 & 0 & 0 & dt \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix},$$

with dt being the time step. Manoeuvring of the target (deviation from the constant velocity) is modelled as noise in this formulation. The random variable w represents this process update noise with a normal probability distribution with zero mean and the process covariance matrix Q. Despite the use of a constant-velocity model, the more complex trajectory of a fly or other target is accurately estimated by updating the state estimate with frequent observations.
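As a sketch, the process model and the prediction step it implies can be written in a few lines of Python; the Q values follow the diagonal entries quoted later in this section (100 mm² for position, 0.25 m² s⁻² for velocity), with the conversion to metres being our assumption about units:

```python
import numpy as np

def make_process_model(dt):
    """Constant-velocity process model of equation (3.2)."""
    A = np.eye(6)
    A[0, 3] = A[1, 4] = A[2, 5] = dt          # position += velocity * dt
    Q = np.diag([1e-4, 1e-4, 1e-4,            # 100 mm^2 = 1e-4 m^2
                 0.25, 0.25, 0.25])           # velocity noise (m^2 s^-2)
    return A, Q

def predict(s, P, A, Q):
    """Kalman prediction: propagate the state estimate and its covariance."""
    return A @ s, A @ P @ A.T + Q
```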

For a given data association hypothesis (§3.2), a set of observations is available for target k. A nonlinear observation model, requiring use of an EKF, is used to describe the action of a projective camera (equations (3.3) and (3.4)). This allows observation error to be modelled as Gaussian noise on the image plane. Furthermore, during tracking, triangulation happens only implicitly, and error estimates of target position are larger along the direction of the ray between the target and the camera centre. (To be clear, explicit triangulation is performed during the initialization of a Kalman model target, as explained in §3.2.) Thus, observations from alternating single cameras on successive frames would be sufficient to produce a three-dimensional estimate of target position. For example, figure 5 shows a reconstructed fly trajectory in which two frames were lacking data from all but one camera. During these two frames, the estimated error increased, particularly along the camera–fly axis, and no triangulation was possible. Nevertheless, an estimate of three-dimensional position was made and appears reasonable.

Figure 5. Two seconds of a reconstructed Drosophila trajectory. (a) Top view. (b) Side view of same trajectory. Kalman filter-based estimates of fly position are plotted as dots at the centre of ellipsoids, which are the projections of the multi-variate normal specified by the covariance matrix P. Additionally, position estimated directly by triangulation of two-dimensional point locations (see appendix B) is plotted with crosses. The fly began on the right and flew in the direction denoted by the arrow. Note that for two frames near the beginning, only a single camera contributed to the tracking and the error estimate increased.

The observation vector $\mathbf{y} = (u_1, v_1, u_2, v_2, \ldots, u_n, v_n)$ is the vector of the distortion-corrected image points from n cameras. The nonlinear observation model relates $\mathbf{y}_t$ to the state $\mathbf{s}_t$ by

$$\mathbf{y}_t = h(\mathbf{s}_t) + \mathbf{v}, \tag{3.3}$$

where $\mathbf{s}_t$ is the state at time t, the function h models the action of the cameras and v is observation noise. The vector-valued h(s) is the concatenation of the image points found by applying the image point equations (equations (B 2) and (B 3) in appendix B) to each of the n cameras:

$$h(\mathbf{s}) = (h_1(\mathbf{s}), \ldots, h_n(\mathbf{s})) = (\bar{u}_1, \bar{v}_1, \ldots, \bar{u}_n, \bar{v}_n) = \left( \frac{r_1}{t_1}, \frac{s_1}{t_1}, \ldots, \frac{r_n}{t_n}, \frac{s_n}{t_n} \right) = (H(\mathrm{P}_1\mathbf{X}), \ldots, H(\mathrm{P}_n\mathbf{X})). \tag{3.4}$$

The overbar denotes a noise-free prediction to which the zero-mean noise vector v is added, and X is the homogeneous form of the first three components of s. The random variable v models the observation noise as normal in the image plane with zero mean and covariance matrix R.
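The observation function of equation (3.4) amounts to projecting the target position through each 3 × 4 camera matrix; a minimal sketch (argument names are ours) is:

```python
import numpy as np

def h(s, camera_matrices):
    """Sketch of equation (3.4): pinhole projection of the target position
    through each camera matrix P_i, concatenated over cameras."""
    X = np.array([s[0], s[1], s[2], 1.0])   # homogeneous position from state
    uv = []
    for P in camera_matrices:               # one (u, v) pair per camera
        r, s_, t = P @ X
        uv.extend([r / t, s_ / t])          # H(): dehomogenize
    return np.array(uv)
```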

At each time step t, the EKF formulation is then used to estimate the state ŝ in addition to the error P (see appendix A). Together, the data associated with each target is G = {ŝ, P}. With the possibility of multiple targets being tracked simultaneously, the kth target is assigned G_k.

One issue we faced when implementing the Kalman filter was parameter selection. Our choice of parameters was made through educated guesses followed by an iterated trial-and-error procedure using the observations of several different trajectories. The parameters that resulted in trajectories closest to those seen by eye and with the least-magnitude error estimate P were used. We obtained good results, for fruit flies measured under the conditions of our set-ups, with the process covariance matrix Q being diagonal, with the first three entries being 100 mm² and the next three being 0.25 m² s⁻². Therefore, our model treats manoeuvring as position and velocity noise. For the observation covariance matrix R, we found good results with a diagonal matrix with entries of 1, corresponding to a variance of the observed image positions of one pixel. Parameter selection could be automated by an expectation–maximization algorithm, but we found this was not necessary.

Another issue is missing data—in some time steps, all views of the fly may be occluded or low contrast, leaving a missing value of y for that time step. In those cases, we simply set the a posteriori estimate to the a priori prediction, as follows from equation (1.1). In these circumstances, the error estimate P grows by the process covariance Q, and does not get reduced by (non-existent) new observations. This follows directly from the Kalman filter equations (appendix A). If too many successive frames with no observations occur, the error estimate will exceed a threshold and tracking will be terminated for that target (described in §3.2.3).
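A sketch of the per-frame correction, including the missing-data case just described, is below; this is the textbook EKF update (appendix A is not reproduced in this excerpt), with names of our choosing:

```python
import numpy as np

def ekf_update(s_prior, P_prior, y=None, h=None, H_jac=None, R=None):
    """Sketch of the EKF correction step. With no gated observation
    (y is None), the a posteriori estimate equals the a priori prediction,
    so P (already grown by Q during prediction) is not reduced."""
    if y is None:
        return s_prior, P_prior                # missing data: keep the prior
    H = H_jac(s_prior)                         # linearize h() about the prior
    S = H @ P_prior @ H.T + R                  # innovation covariance
    K = P_prior @ H.T @ np.linalg.inv(S)       # Kalman gain
    s_post = s_prior + K @ (y - h(s_prior))    # correct with the innovation
    P_post = (np.eye(len(s_prior)) - K @ H) @ P_prior
    return s_post, P_post
```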

3.2. Data association

One simplification made in the system overview (§1.1) was to neglect the data association problem—the assignment of observations to targets. We address the problem by marginalizing the observation likelihood across hidden data association variables D, where each D corresponds to a different hypothesis about how the feature points correspond with the targets. Thus, the model of observation likelihood from equation (1.1) becomes

$$p(Z_t \mid S_t) = \sum_D p(Z_t, D \mid S_t). \tag{3.5}$$

In fact, computing probabilities across all possible data association hypotheses D across multiple time steps would result in a combinatorial explosion of possibilities. Among the various means of limiting the amount of computation required by limiting the number of hypotheses considered, we have chosen a simple method, the NNSF data association algorithm run on each target independently [40]. This algorithm is sufficiently efficient to operate in real time for typical conditions of our system. Thus, we approximate the sum over all data association hypotheses with the single best hypothesis $D_{\mathrm{NNSF}}$, defined to be the NNSF output for each of the k independent targets:

$$p(Z_t, D_{\mathrm{NNSF}} \mid S_t) \approx \sum_D p(Z_t, D \mid S_t). \tag{3.6}$$

This implies that we assume hypotheses other than $D_{\mathrm{NNSF}}$ have vanishingly small probability. Errors owing to this assumption being false could be corrected in a later, offline pass through the data keeping track of more data association hypotheses using other algorithms.

D is a matrix with each column being the data association vector for target k, such that $D = [\mathbf{d}^1 \ldots \mathbf{d}^k \ldots]$. This matrix has n rows (the number of cameras) and l columns (the number of active targets). The data association vector $\mathbf{d}^k$ for target k has n elements, each of value null or the index j of the feature $\mathbf{z}_j$ assigned to that target. As described below (§3.2.4), these values are computed from the predicted location of the target and the features returned from the cameras.

3.2.1. Preventing track merging. One well-known potential problem with multi-target tracking is the undesired merging of target trajectories if targets begin to share the same observations. Before implementing the following rule, flydra would sometimes suffer from this merging problem when tracking hummingbirds engaged in territorial competition. In such fights, male hummingbirds often fly directly at each other and come into physical contact. To prevent the two trajectories from merging in such cases, a single pass is made through the data association assignments after each frame. In the case that more than one target was assigned the exact same subset of feature points, a comparison is made between the observation and the predicted observation. In this case, only the target corresponding to the closest prediction is assigned the data, and the other target is updated without any observation. We found this procedure to require minimal additional computational cost, while still being effective in preventing trajectory merging.
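One way this post-association pass might look in Python is sketched below; the data structures (a mapping from target id to the tuple of claimed feature ids, plus predicted and observed image-point arrays) are entirely our illustrative assumptions:

```python
from collections import defaultdict

import numpy as np

def resolve_shared_observations(assignments, predicted_obs, observations):
    """Sketch: if several targets claim exactly the same feature set, keep
    only the target whose predicted observation is closest to the data."""
    claims = defaultdict(list)
    for k, feats in assignments.items():
        if feats is not None:
            claims[feats].append(k)
    for feats, ks in claims.items():
        if len(ks) < 2:
            continue                          # feature set claimed once: OK
        err = {k: np.linalg.norm(predicted_obs[k] - observations[feats])
               for k in ks}
        best = min(ks, key=err.get)           # closest prediction wins
        for k in ks:
            if k != best:
                assignments[k] = None         # updated without observation
    return assignments
```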

3.2.2. NNSF and generative model of image features. To implement the NNSF algorithm, we implement a generative model of feature appearance based on the prior estimate of target state. By predicting target position in an incoming frame based on prior information, the system selects two-dimensional image points as being likely to come from the target by gating unlikely observations, thus limiting the amount of computation performed.

Recall from §2 that for each time t and camera i, m feature points are found, with the jth point being z_j = (u_j, v_j, α_j, β_j, θ_j, ε_j). The distortion-corrected image coordinates are (u, v), while α is the area of the object on the image plane measured by thresholding the difference image between the current and background images, β is an estimate of the maximum difference within the difference image, and θ and ε are the slope and eccentricity of the image feature. Each camera may return multiple candidate points per time step, with all points from the ith camera represented as Z_i, a matrix whose columns are the individual vectors z_j, such that Z_i = [z_1 ... z_m]. The purpose of the data association algorithm is to assign each incoming point z to an existing Kalman model G, to initialize a new Kalman model, or to attribute it to a false positive (a null target). Furthermore, old Kalman models for which no recent observations exist, owing to the target leaving the tracking volume, must be deleted. The use of such a data association algorithm allows flydra to track multiple targets simultaneously, as in figure 6, and, by reducing computational costs, allows the two-dimensional feature extraction algorithm to return many points to minimize the number of missed detections.

3.2.3. Entry and exit of targets. How does our system deal with existing targets losing visibility owing to leaving the tracking volume, occlusion or lowered visual contrast? What happens when new targets become visible? We treat such occurrences as part of the update model in the Bayesian framework of §1.1. Thus, in the terminology from that section, our motion model for all targets p(S_t | S_{t-1}) includes the possibility of initializing a new target or removing an existing target. This section describes the procedure followed.

For all data points z that remain 'unclaimed' by the predicted locations of pre-existing targets (see §3.2.4 below), we use an unguided hypothesis testing algorithm (see the sketch below). This triangulates a hypothesized three-dimensional point for every possible combination of 2, 3, ..., n cameras, for a total of $\binom{n}{2} + \binom{n}{3} + \cdots + \binom{n}{n}$ combinations. Any three-dimensional point with reprojection error less than an arbitrary threshold using the greatest number of cameras is then used to initialize a new Kalman filter instance. The initial state estimate is set to that three-dimensional position with zero velocity and a relatively high error estimate. Tracking is stopped (a target is removed) once the estimated error P exceeds a threshold. This most commonly happens, for example, when the target leaves the tracking area and thus receives no observations for a given number of frames.
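A minimal sketch of this unguided hypothesis test follows; `triangulate` and `reproj_err` stand in for the appendix-B routines, and the threshold and initial covariance values are illustrative assumptions:

```python
from itertools import combinations

import numpy as np

def init_new_target(unclaimed, triangulate, reproj_err, thresh=10.0):
    """Sketch: `unclaimed` maps camera index -> unclaimed 2-D point. Try
    every combination of 2..n cameras, preferring the largest combination
    whose reprojection error is below threshold."""
    cams = sorted(unclaimed)
    for r in range(len(cams), 1, -1):              # most cameras first
        for combo in combinations(cams, r):
            pts = {c: unclaimed[c] for c in combo}
            X = triangulate(pts)                   # hypothesized 3-D point
            if reproj_err(pts, X) < thresh:
                s0 = np.array([*X, 0.0, 0.0, 0.0])     # zero initial velocity
                P0 = np.diag([0.1] * 3 + [1.0] * 3)    # high initial error
                return s0, P0                          # new Kalman target
    return None                                    # no acceptable hypothesis
```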

3.2.4. Using incoming data. Ultimately, the purpose of the data association step is to determine which image feature points will be used by which target. Given the target independence assumption, each target uses incoming data independently. This section outlines how the data association algorithm is used to determine the feature points treated as the observation for a given target.

To use the Kalman filter described in §3.1, observation vectors must be formed from the incoming data. False positives must be separated from correct detections, and, because multiple targets may be tracked simultaneously, correct detections must be associated with a particular Kalman model. At the beginning of processing for each time step t for the kth Kalman model, a prior estimate of target position and error $G_k^{t|t-1} = \{\hat{\mathbf{s}}_k^{t|t-1}, \mathrm{P}_k^{t|t-1}\}$ is available. It must be determined which, if any, of the m image points from the ith camera is associated with the kth target. Due to the real-time requirements of our system, flydra gates incoming detections on simple criteria before performing more computationally intensive tasks.

For target k at time t, the data association function g is

$$\mathbf{d}_t^k = g(Z_1, \ldots, Z_n, G_k^{t|t-1}). \tag{3.7}$$

This is a function of the image points Z_i from each of the n cameras and the prior information for target k.

Figure 6. Multiple flies tracked simultaneously. Each automatically segmented trajectory is plotted in its own colour. Note that the dark green and cyan trajectories probably came from the same fly which, for a period near the 10th second, was not tracked owing to a series of missed detections or leaving the tracking volume (§3.2). (a) First horizontal axis (x). (b) Second horizontal axis (y). (c) Vertical axis (z).

The assignment vector for the kth target, $\mathbf{d}^k$, defines which points from which cameras view a target. This vector has a component for each of the n cameras, which is either null (if that camera does not contribute) or the column index of Z_i corresponding to the associated point. Thus, $\mathbf{d}^k$ has length n, the number of cameras, and no camera may view the same target more than once. Note, the k and t superscript and subscript on d indicate the assignment vector is for target k at time step t, whereas below (equation (3.8)), the subscript i is used to indicate the ith component of the vector d.

The data association function g may be written in terms of the components of d. The ith component is the index of the column of Z_i that maximizes the likelihood of the observation given the predicted target state and error, and is defined to be

$$d_i = \operatorname*{argmax}_j \; p(\mathbf{z}_j \mid G), \quad \mathbf{z}_j \in Z_i. \tag{3.8}$$

Our likelihood function gates detections based on two conditions. First, the incoming detected location (u_j, v_j) must be within a threshold Euclidean distance from the estimated target location projected on the image. The Euclidean distance on the image plane is

$$\mathrm{dist2d} = d_{\mathrm{euclid}}\!\left( \begin{bmatrix} u_j \\ v_j \end{bmatrix}, H(\mathrm{P}_i \mathbf{X}) \right), \tag{3.9}$$

where $H(\mathrm{P}_i \mathbf{X})$ finds the projected image coordinates of X, where X is the homogeneous form of the first three components of s, the expected three-dimensional position of the target. The function H and camera matrix $\mathrm{P}_i$ are described in appendix B. The gating can be expressed as an indicator function

$$\mathbf{1}_{\mathrm{dist2d}}(u_j, v_j) = \begin{cases} 1 & \text{if } \mathrm{dist2d} < \mathrm{thresh}_{\mathrm{dist2d}}, \\ 0 & \text{otherwise.} \end{cases} \tag{3.10}$$

Second, the area of the detected object ($\alpha_j$) must be greater than a threshold value, expressed as

$$\mathbf{1}_{\mathrm{area}}(\alpha_j) = \begin{cases} 1 & \text{if } \alpha_j > \mathrm{thresh}_{\mathrm{area}}, \\ 0 & \text{otherwise.} \end{cases} \tag{3.11}$$

If these conditions are met, the distance of the ray connecting the camera centre and the two-dimensional point on the image plane (u_j, v_j) from the expected three-dimensional location $\bar{\mathbf{a}}$ is used to further determine likelihood. We use the Mahalanobis distance, which for a vector a with an expected value of $\bar{\mathbf{a}}$ with covariance matrix S is

$$d_{\mathrm{mahal}}(\mathbf{a}, \bar{\mathbf{a}}) = \sqrt{(\mathbf{a} - \bar{\mathbf{a}})^{\mathrm{T}} \mathrm{S}^{-1} (\mathbf{a} - \bar{\mathbf{a}})}. \tag{3.12}$$

Because the distance function is convex for a given $\bar{\mathbf{a}}$ and S, we can solve directly for the closest point on the ray, by setting $\bar{\mathbf{a}}$ equal to the first three terms of $\hat{\mathbf{s}}^{t|t-1}$ and S to the upper left 3 × 3 submatrix of $\mathrm{P}^{t|t-1}$. Then, if the ray is a parametrized line of the form L(s) = s · (a, b, c) + (x, y, z), where (a, b, c) is the direction of the ray formed by the image point (u, v) and the camera centre and (x, y, z) is a point on the ray, we can find the value of s for which the distance between L(s) and $\bar{\mathbf{a}}$ is minimized by finding the value of s where the derivative of $d_{\mathrm{mahal}}(L(s), \bar{\mathbf{a}})$ is zero. If we call this closest point a and combine equations (3.10)–(3.12), then our likelihood function is

$$p(\mathbf{z}_j \mid G) = \mathbf{1}_{\mathrm{dist2d}}(u_j, v_j) \cdot \mathbf{1}_{\mathrm{area}}(\alpha_j) \cdot e^{-d_{\mathrm{mahal}}(\mathbf{a}, \bar{\mathbf{a}})}. \tag{3.13}$$

Note that, owing to the multiplication, if either of the first two factors is zero, the third (and more computationally expensive) condition need not be evaluated.
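Putting equations (3.9)–(3.13) together for one feature gives roughly the following Python sketch; `closest_ray_point` stands in for the closed-form solution described above, and the threshold values are illustrative assumptions:

```python
import numpy as np

def feature_likelihood(z, s_hat, P, P_i, closest_ray_point,
                       thresh_dist2d=20.0, thresh_area=2.0):
    """Sketch of equations (3.9)-(3.13) for one feature z = (u, v, alpha, ...)
    from camera i, given the prior target estimate (s_hat, P)."""
    u, v, alpha = z[0], z[1], z[2]
    X = np.append(s_hat[:3], 1.0)              # homogeneous predicted position
    r, s_, t = P_i @ X
    # cheap gates first; the expensive third factor is skipped if either is 0
    if np.hypot(u - r / t, v - s_ / t) >= thresh_dist2d:    # eq (3.10)
        return 0.0
    if alpha <= thresh_area:                                # eq (3.11)
        return 0.0
    a_bar, S = s_hat[:3], P[:3, :3]            # prior mean and 3x3 covariance
    a = closest_ray_point(u, v, P_i, a_bar, S) # Mahalanobis-closest ray point
    d = np.sqrt((a - a_bar) @ np.linalg.inv(S) @ (a - a_bar))   # eq (3.12)
    return float(np.exp(-d))                   # eq (3.13)
```

The per-camera assignment of equation (3.8) is then simply the index of the feature in Z_i that maximizes this value.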

4. CAMERA AND LENS CALIBRATION

Camera calibrations may be obtained in any way that produces camera calibration matrices (described in appendix B) and, optionally, parameters for a model of the nonlinear distortions of cameras. Good calibrations are critical for flydra because, as target visibility changes from one subset of cameras to another subset, any misalignment of the calibrations will introduce artefactual movement in the reconstructed trajectories. Typically, we obtain camera calibrations in a two-step process. First, the direct linear transformation (DLT) algorithm [41] directly estimates camera calibration matrices P_i that could be used for triangulation as described in appendix B. However, because we use only about 10 manually digitized corresponding two-/three-dimensional pairs per camera, this calibration is of relatively low precision and, as performed, ignores optical distortions causing deviations from the linear simple pinhole camera model. Therefore, an automated Multi-Camera Self-Calibration Toolbox [42] is used as a second step. This toolbox uses the inherent numerical redundancy when multiple cameras are viewing a common set of three-dimensional points through use of a factorization algorithm [43] followed by bundle adjustment (reviewed in §18.1 of [44]). By moving a small LED point light source through the tracking volume (or, indeed, a freely flying fly), hundreds of corresponding two-/three-dimensional points are generated which lead to a single, overdetermined solution which, without knowing the three-dimensional positions of the LED, is accurate up to a scale, rotation and translation. The camera centres, either from the DLT algorithm or measured directly, are then used to find the best scale, rotation and translation. As part of the Multi-Camera Self-Calibration Toolbox, this process may be iterated with an additional step to estimate nonlinear camera parameters such as radial distortion [42] using the Camera Calibration Toolbox of Bouguet [45]. Alternatively, we have also implemented the method of Prescott & McLean [46] to estimate radial distortion parameters before use of Svoboda's toolbox, which we found necessary when using wide-angle lenses with significant radial distortion (e.g. 150 pixels in some cases).

5. IMPLEMENTATION AND EVALUATION

We built three different flydra systems: a five-camera, six-computer 100 fps system for tracking fruit flies in a 0.3 × 0.3 × 1.5 m arena (e.g. figures 1 and 9; [38], which used the same hardware but a simpler version of the tracking software); an 11-camera, nine-computer 60 fps system for tracking fruit flies in a large—2 m diameter × 0.8 m high—cylinder (e.g. figure 5); and a four-camera, five-computer 200 fps system for tracking hummingbirds in a 1.5 × 1.5 × 3 m arena (e.g. figure 4). Apart from the low-level camera drivers, the same software is running on each of these systems.

We used the Python computer language to implement flydra. In particular, the open source MOTMOT camera software serves as the basis for the image acquisition and real-time image processing [47]. The motmot.libcamiface documentation contains information about which cameras are compatible, and the motmot.realtime_image_analysis package contains the algorithms for the two-dimensional real-time image processing. Several other pieces of software, most of which are open source, are instrumental to this system: PyTables, NumPy, SciPy, Pyro, wxPython, TVTK, VTK, matplotlib, PyOpenGL, pyglet, Pyrex, Cython, ctypes, Intel IPP, OpenCV, ATLAS, libdc1394, PTPd, GCC and Ubuntu. We used Intel Pentium 4 and Core 2 Duo-based computers.

The quality of three-dimensional reconstructions was verified in two ways. First, the distance between the originally extracted two-dimensional points and the two-dimensional points reprojected from the three-dimensional estimate derived from them is a measure of calibration precision. For all figures shown, the mean reprojection error was less than one pixel, and for most cameras in most figures, was less than 0.5 pixels. Second, to determine accuracy, we verified the three-dimensional coordinates and distances between coordinates measured through triangulation against values measured physically. For two such measurements in the system shown in figure 1, these values were within 4 per cent. So, in general, the system appears to have high precision (low reprojection error), but slightly worse accuracy. Because we are using standard calibration and estimation algorithms, we did not perform a more detailed groundtruth analysis of position estimates.
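The precision check just described reduces to projecting the estimated point through each camera and averaging the image-plane errors; a minimal sketch (argument names are ours) is:

```python
import numpy as np

def mean_reprojection_error(X, points_2d, camera_matrices):
    """Sketch: project an estimated 3-D point X through each 3x4 camera
    matrix and compare with the originally extracted 2-D points (pixels)."""
    Xh = np.append(X, 1.0)                       # homogeneous coordinates
    errs = []
    for (u, v), P in zip(points_2d, camera_matrices):
        r, s, t = P @ Xh
        errs.append(np.hypot(u - r / t, v - s / t))
    return np.mean(errs)
```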

We measured the latency of the three-dimensional reconstruction by synchronizing several clocks involved in our experimental set-up and then measuring the duration between the onset of image acquisition and the completion of the computation of target position. When flydra is running, the clocks of the various computers are synchronized to within 1 ms by PTPd, the precise time protocol daemon, an implementation of the IEEE 1588 clock synchronization protocol [48]. Additionally, a microcontroller (AT90USBKEY, Atmel, USA) running custom firmware is connected over USB to the central reconstruction computer and is synchronized using an algorithm similar to PTPd, allowing the precise time of frame trigger events to be known by processes running within the computers. Measurements were made of the latency between the time of the hardware trigger pulse generated on the microcontroller to start acquisition of frame t and the moment the state vector ŝ_{t|t} was computed. These measurements were made with the central computer being a 3 GHz Intel Core 2 Duo CPU. As shown in figure 7, the median three-dimensional reconstruction latency is 39 ms. Further investigation showed the make-up of this delay. From the specifications of the cameras and bus used, 19.5 ms is a lower bound on the latency of transferring the image across the IEEE 1394 bus (and could presumably be reduced by using alternative technologies such as Gigabit Ethernet or Camera Link). Further measurements on this system showed that two-dimensional feature extraction takes 6–12 ms, and that image acquisition and two-dimensional feature extraction together take 26–32 ms. The remainder of the latency to three-dimensional estimation is owing to network transmission, triangulation and tracking, and, most likely, non-optimal queueing and ordering of data while passing it between these stages. Further investigation and optimization have not been performed. Although we have not calculated latency in a similar way in the 200 fps hummingbird-tracking GigE system, the computers are significantly faster and therefore two-dimensional feature extraction takes less than 5 ms on Core 2 Duo computers. Because the GigE bus is approximately twice as fast as the 1394 bus, the similarly sized images arrive with half the latency. We therefore expect the median latency in this system to be about 25 ms.

Figure 7. Latency of one tracking system. A histogram of the latency of three-dimensional reconstruction was generated after tracking 20 flies for 18 h. Median latency was 39 ms.

6. EXPERIMENTAL POSSIBILITIES

A few examples serve to illustrate some of the capabilities of flydra. We are actively engaged in understanding the sensory-motor control of flight in the fruit fly Drosophila melanogaster. Many basic features of the effects of visual stimulation on the flight of flies are known, and the present system allows us to characterize these phenomena in substantial detail. For example, the presence of a vertical landmark such as a black post on a white background greatly influences the structure of flight, causing flies to 'fixate', or turn towards, the post [49]. By studying such behaviour in free flight (e.g. figure 3), we have found that flies approach the post until some small distance is reached, and then often turn rapidly and fly away.

Because we have an online estimate of fly velocity and position in ŝ, we can predict the future location of the fly. Of course, the quality of the prediction declines with the duration of extrapolation, but it is sufficient for many tasks, even with the latency taken by the three-dimensional reconstruction itself. One example is the triggering of high resolution, high speed cameras (e.g. 1024 × 1024 pixels, 6000 frames per second, as shown in figure 8a). Such cameras typically buffer their images to RAM, which is downloaded offline. We can construct a trigger condition based on the position of an animal (or a recent history of position, allowing triggering only on specific manoeuvres). Figure 8a shows a contrast-enhancing false colour montage of a fly making a close approach to a post before leaving. By studying precise movements of the wings and body in addition to larger-scale movement through the environment, we are working to understand the neural and biomechanical mechanisms underlying control of flight.
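A position-based trigger of this kind might be sketched as follows; the linear extrapolation over the known system latency follows naturally from the constant-velocity state estimate, while the region-of-interest parameters and function name are our illustrative assumptions:

```python
import numpy as np

def should_trigger(s_hat, latency, region_centre, region_radius):
    """Sketch: extrapolate s_hat = (x, y, z, xdot, ydot, zdot) over the
    system latency and fire when the predicted position enters a region."""
    pos, vel = np.asarray(s_hat[:3]), np.asarray(s_hat[3:])
    predicted = pos + vel * latency        # constant-velocity extrapolation
    return np.linalg.norm(predicted - region_centre) < region_radius
```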

An additional possibility enabled by low-latency three-dimensional state estimates is that visual stimuli can be modulated to produce a 'virtual reality' environment in which the properties of the visual feedback loop can be artificially manipulated [35,36,52,53]. In these types of experiments, it is critical that moving visual stimuli do not affect the tracking. For this reason, we illuminate flies with near-IR light and use high pass filters in front of the camera lenses (e.g. R-72, Hoya Corporation). Visual cues provided to the flies are in the blue-green range.

By estimating the orientation of the animal, approximate reconstructions of the visual stimulus experienced by the animal may be made. For example, to make figure 8b, a simulated view through the compound eye of a fly, we assumed that the fly head, and thus the eyes, was oriented tangent to the direction of travel, and that the roll angle was fixed at zero. This information, together with a three-dimensional model of the environment, was used to generate the reconstruction [50,51]. Such reconstructions are informative for understanding the nature of the challenge of navigating visually with limited spatial resolution. Although it is known that flies do move their head relative to their longitudinal body axis, these movements are generally small [2], and thus the errors resulting from the assumptions listed above could be reduced by estimating body orientation using the method described in appendix B. Because a fly's eyes are fixed to its head, further reconstruction accuracy could be gained by fixing the head relative to the body (by gluing the head to the thorax), although the behavioural effects of such a manipulation would need to be investigated. Nevertheless, owing to the substantial uncertainties involved in estimating the pose of an insect head, investigations based on such reconstructions would need to be carefully validated.

Finally, because of the configurability of the system, it is feasible to consider large-scale tracking in naturalistic environments that would lead to greater understanding of the natural history of flies [54] or other animals.

Figure 8. Example applications for flydra. (a) By tracking flies in real time, we can trigger high speed video cameras (6000 fps) to record interesting events for later analysis. (b) A simulated reconstruction of the visual scene viewed through the left eye of Drosophila. Such visual reconstructions can be used to simulate neural activity in the visual system [50,51]. In this simulated view, a three-dimensional model of the experimental environment, with green backlighting and a black vertical post, was used in conjunction with an optical model of a fly eye and the fly's pose estimate to render an image approximating that seen by a fly during an experiment.

7. EFFECT OF CONTRAST ON SPEED

REGULATION IN DROSOPHILA

At low levels of luminance contrast, when differences in

the luminance of different parts of a scene are minimal,

it is difﬁcult or impossible to detect motion of visual

features. In ﬂies, which judge self-motion (in part)

using visual motion detection, the exact nature of con-

trast sensitivity has been used as a tool to investigate

the fundamental mechanism of motion detection using

electrophysiological recordings. At low contrast levels,

these studies have found a quadratic relationship

between membrane potential and luminance contrast

[55–57]. This result is consistent with Hassenstein–

Reichardt correlator model for elementary motion

detection (the HR-EMD, [58]), and these ﬁnd ings are

part of the evidence to support the hypothesis that

the ﬂy visual system implements something very similar

to this mathematical model.

Figure 8. Example applications for flydra. (a) By tracking flies in real time, we can trigger high-speed video cameras (6000 fps) to record interesting events for later analysis. (b) A simulated reconstruction of the visual scene viewed through the left eye of Drosophila. Such visual reconstructions can be used to simulate neural activity in the visual system [50,51]. In this simulated view, a three-dimensional model of the experimental environment, with green backlighting and a black vertical post, was used in conjunction with an optical model of a fly eye and the fly's pose estimate to render an image approximating that seen by a fly during an experiment.

Despite these and other electrophysiological findings suggesting the HR-EMD may underlie fly motion sensitivity, studies of flight speed regulation and other

visual behaviours in freely ﬂying ﬂies [59] and honey

bees [60–62] show that the free-ﬂight behaviour of

these insects is inconsistent with a ﬂight velocity regu-

lator based on a simple HR-EMD model. More

recently, Baird et al. [63] have shown that over a large

range of contrasts, ﬂight velocity in honey bees is

nearly unaffected by contrast. As noted by those

authors, however, their set-up was unable to achieve

true zero contrast owing to imperfections with their

apparatus. They suggest that contrast adaptation [64]

may have been responsible for boosting the responses

to low contrasts and attenuating responses to high con-

trast. This possibility was supported by the ﬁnding that

forward velocity was better regulated at the nominal

‘zero contrast’ condition than in the presence of an

axial stripe, which may have had the effect of prevent-

ing contrast adaptation while provide no contrast

perpendicular to the direction of ﬂight [63].

We tested the effect of contrast on the regulation of

ﬂight speed in Drosophila melanogaster, and the results

are shown in ﬁgure 9. In this set-up, we found that when

contrast was sufficiently high (Michelson contrast ≥ 0.16), flies regulated their speed to a mean of 0.15 m s⁻¹ with a standard deviation of 0.07 m s⁻¹. As con-

trast was lowered, the mean speed increased, as did

variability of speed, suggesting that speed regulation

suffered owing to a loss of visual feedback. To perform

these experiments, a computer projector (DepthQ, modified to remove the colour filter wheel; Lightspeed Design, USA) illuminated the long walls and floor of a 0.3 × 0.3 × 1.5 m arena with a regular checkerboard pattern (5 cm²) of varying contrast and fixed luminance (2 cd m⁻²). The test contrasts were cycled, with each contrast displayed for 5 min. Twenty female flies were released into the arena and tracked over 12 h.
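For a checkerboard of fixed mean luminance, the bright and dark check luminances follow directly from the Michelson contrast C = (Lmax − Lmin)/(Lmax + Lmin), i.e. Lmax = L̄(1 + C) and Lmin = L̄(1 − C). A small worked example (our illustration):

```python
def check_luminances(mean_luminance, michelson_contrast):
    """Bright and dark check luminances for a checkerboard of fixed mean
    luminance, from C = (Lmax - Lmin) / (Lmax + Lmin)."""
    l_max = mean_luminance * (1.0 + michelson_contrast)
    l_min = mean_luminance * (1.0 - michelson_contrast)
    return l_max, l_min

# the 0.16-contrast condition at the 2 cd/m^2 mean luminance used here:
print(check_luminances(2.0, 0.16))  # -> (2.32, 1.68) cd/m^2
```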

Any ﬂight segments more than 5 cm from the walls,

ﬂoor or ceiling were analysed, although for the majority

of the time flies were standing or walking on the floor

or walls of the arena. Horizontal ﬂight speed was

measured as the ﬁrst derivative of position in the XY

direction, and histograms were computed with each

frame constituting a single sample. Because the identity

of the ﬂies could not be tracked for the duration of the

experiment, the data contain pseudo-replication—some

ﬂies probably contributed more to the overall histogram

than others. Nevertheless, the results from three separ-

ate experimental days with 20 new ﬂies tested on each

day were each qualitatively similar to the pooled results

shown, which include 1760 total seconds of ﬂight

in which the tracked fly was 5 cm or more from the nearest arena surface, acquired during 30 cumulative hours.
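A minimal sketch of this analysis (our illustration, not the original analysis code; the 60 frames per second rate matches the tracking rate reported for the system, while the histogram bin width is an arbitrary choice):

```python
import numpy as np

def horizontal_speed(positions, fps=60.0):
    """Horizontal flight speed: first derivative of XY position per frame.

    positions: (N, 3) array of tracked positions (metres) at fps frames/s.
    """
    dxy = np.diff(positions[:, :2], axis=0)   # per-frame XY displacement
    return np.linalg.norm(dxy, axis=1) * fps  # metres per second

def speed_histogram(speeds, bin_width=0.02, max_speed=0.6):
    """Per-frame speed histogram, normalized to unit area as in figure 9."""
    bins = np.arange(0.0, max_speed + bin_width, bin_width)
    return np.histogram(speeds, bins=bins, density=True)
```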

The primary difference between our ﬁndings on the

effect of contrast on ﬂight speed in Drosophila melano-

gaster compared with that found in honey bees by [63] is

that at low contrasts (below 0.16 Michelson contrast),

ﬂight speed in Drosophila is faster and more variable.

This difference could be owing to several non-mutually

exclusive factors: (i) our arena may have fewer imperfec-

tions which create visual contrast; (ii) fruit ﬂies may

have a lower absolute contrast sensitivity than honey

bees; (iii) fruit ﬂies may have a lower contrast

sensitivity at the luminance level of the experiments;

(iv) fruit ﬂies may have less contrast adaptation

ability; or (v) fruit ﬂies may employ an alternate

motion detection mechanism.

Despite the difference between the present results in

fruit flies and those of honey bees at low contrast

levels, at high contrasts (above 0.16 Michelson con-

trast for Drosophila) ﬂight speed in both species was

regulated around a constant value. This suggests

that the visual system has little trouble estimating

self-motion at these contrast values and that insects

regulate ﬂight speed about a set point using visual

information.

8. CONCLUSIONS

The highly automated and real-time capabilities of our

system allow unprecedented experimental opportu-

nities. We are currently investigating the object

approach and avoidance phenomenon of fruit ﬂies illus-

trated in ﬁgure 3. We are also studying manoeuvring in

solitary and competing hummingbirds and the role of

manoeuvring in establishing dominance. Among the

opportunities made possible by the molecular biological

revolution are powerful new tools that can be used to

visualize and modify the activity of neurons and

neural circuits. By precisely quantifying high level beha-

viours, such as the object attraction/repulsion

described above, we hope to make use of these tools

to approach the question of how neurons contribute

to the process of behaviour.

The data for ﬁgure 4 were gathered in collaboration with

Douglas Altshuler. Sawyer Fuller helped with the EKF

formulation, provided helpful feedback on the manuscript

and, together with Gaby Maimon, Rosalyn Sayaman,

Martin Peek and Aza Raskin, helped with physical

construction of arenas and bug reports on the software.

Figure 9. Drosophila melanogaster maintain a lower flight speed with lower variability as visual contrast is increased. Histograms of horizontal speed (m s⁻¹) are shown for each contrast condition, and each histogram is normalized to have equal area. Mean ± standard deviation of flight speed at each Michelson contrast: 0.01, 0.21 ± 0.11 m s⁻¹; 0.03, 0.20 ± 0.11 m s⁻¹; 0.06, 0.17 ± 0.09 m s⁻¹; 0.16, 0.15 ± 0.07 m s⁻¹; 0.40, 0.15 ± 0.08 m s⁻¹; 1.00, 0.14 ± 0.07 m s⁻¹. Ambient illumination reflected from one surface after being scattered from other illuminated surfaces slightly reduced contrast from the nominal values shown here.


Pete Trautmann provided insight on data association, and

Pietro Perona provided helpful suggestions on the

manuscript. This work was supported by grants from the

Packard Foundation, AFOSR (FA9550-06-1-0079), ARO

(DAAD 19-03-D-0004), NIH (R01 DA022777) and NSF

(0923802) to M.H.D. and AFOSR (FA9550-10-1-0086) to

A.D.S.

APPENDIX A. EXTENDED KALMAN

FILTER

The EKF was used as described in §3.1. Here, we give

the equations using the notation of this paper. Note

that because only the observation process is nonlinear,

the process dynamics are speciﬁed with (linear)

matrix A.

The a priori predictions of these values based on the previous frame's posterior estimates are

$$\hat{s}_{t|t-1} = A\,\hat{s}_{t-1|t-1} \qquad (\mathrm{A}\,1)$$

and

$$P_{t|t-1} = A\,P_{t-1|t-1}A^{\mathsf{T}} + Q. \qquad (\mathrm{A}\,2)$$

To incorporate observations, a gain term K is calculated to weight the innovation arising from the difference between the a priori state estimate $\hat{s}_{t|t-1}$ and the observation y:

$$K_t = P_{t|t-1}H_t^{\mathsf{T}}\left(H_t P_{t|t-1} H_t^{\mathsf{T}} + R\right)^{-1}. \qquad (\mathrm{A}\,3)$$

The observation matrix, $H_t$, is defined to be the Jacobian of the observation function (equation (3.4)) evaluated at the expected state:

$$H_t = \left.\frac{\partial h}{\partial s}\right|_{\hat{s}_{t|t-1}}. \qquad (\mathrm{A}\,4)$$

The posterior estimates are then

$$\hat{s}_{t|t} = \hat{s}_{t|t-1} + K_t\left(y_t - H_t\,\hat{s}_{t|t-1}\right) \qquad (\mathrm{A}\,5)$$

and

$$P_{t|t} = \left(I - K_t H_t\right)P_{t|t-1}. \qquad (\mathrm{A}\,6)$$
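The following numpy sketch (our illustration; the names are ours and are not those used in the flydra source) implements one predict/update cycle of equations (A 1)–(A 6):

```python
import numpy as np

def ekf_step(s_prev, P_prev, y, A, Q, R, h, jacobian_h):
    """One EKF predict/update cycle in the notation of appendix A.

    s_prev, P_prev : posterior state and covariance from frame t-1
    y              : observation at frame t
    A, Q           : linear process dynamics and process noise covariance
    R              : observation noise covariance
    h, jacobian_h  : nonlinear observation function and its Jacobian
    """
    # a priori prediction, equations (A 1) and (A 2)
    s_pred = A @ s_prev
    P_pred = A @ P_prev @ A.T + Q
    # linearize the observation model at the predicted state, (A 4)
    H = jacobian_h(s_pred)
    # Kalman gain, (A 3)
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    # posterior estimates, (A 5) and (A 6)
    s_post = s_pred + K @ (y - H @ s_pred)
    P_post = (np.eye(len(s_pred)) - K @ H) @ P_pred
    return s_post, P_post
```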

APPENDIX B. TRIANGULATION

The basic two- to three-dimensional calculation

ﬁnds the best three-dimensional location for two or

more two-dimensional camera views of a point, and is

implemented using a linear least-squares ﬁt of the inter-

section of n rays deﬁned by the two-dimensional image

points and three-dimensional camera centres of each of

the n cameras [44]. After correction for radial distortion, the image of a three-dimensional point on the $i$th camera is $(u_i, v_i)$. For mathematical reasons, it is convenient to represent this two-dimensional image point in homogeneous coordinates

$$x_i = (r_i, s_i, t_i), \qquad (\mathrm{B}\,1)$$

such that $u_i = r_i/t_i$ and $v_i = s_i/t_i$. For convenience, we define the function $\mathcal{H}$ to convert from homogeneous to Cartesian coordinates, thus

$$\mathcal{H}(x) = (u, v) = \left(\frac{r}{t}, \frac{s}{t}\right). \qquad (\mathrm{B}\,2)$$

The $3 \times 4$ camera calibration matrix $P_i$ models the projection from a three-dimensional homogeneous point $X = (X_1, X_2, X_3, X_4)$ (representing the three-dimensional point with inhomogeneous coordinates $(x, y, z) = (X_1/X_4,\, X_2/X_4,\, X_3/X_4)$) into the image point:

$$x_i = P_i X. \qquad (\mathrm{B}\,3)$$

By combining the image point equation (B 3) from two or more cameras, we can solve for $X$ using the homogeneous linear triangulation method based on the singular value decomposition, as described in [44], §§12.2 and A5.3.
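A compact implementation of this method might look as follows (our sketch, assuming undistorted image points and numpy; `triangulate` is a hypothetical name):

```python
import numpy as np

def triangulate(camera_matrices, image_points):
    """Homogeneous linear triangulation ([44], section 12.2).

    camera_matrices: n 3x4 projection matrices P_i
    image_points:    n undistorted image points (u_i, v_i)
    Returns the inhomogeneous 3D point (x, y, z).
    """
    rows = []
    for P, (u, v) in zip(camera_matrices, image_points):
        # each view contributes two linear constraints on X from x_i = P_i X
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.array(rows)          # (2n, 4) design matrix
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                  # right singular vector of smallest singular value
    return X[:3] / X[3]         # homogeneous -> Cartesian
```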

A similar approach can be used for reconstructing the orientation of the longitudinal axis of an insect or bird. Briefly, a line is fit to this axis in each two-dimensional image (using $u$, $v$ and $\theta$ from §2) and, together with the camera centre, is used to represent a plane in three-dimensional space. The best-fit line of intersection of the $n$ planes is then found with a similar singular value decomposition algorithm ([44], §12.7).
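Since each camera's fitted two-dimensional line $l$ back-projects to the plane $\pi = P^{\mathsf{T}} l$ ([44]), the axis reconstruction reduces to intersecting planes. A sketch of the plane-intersection step (our illustration; it ignores the degenerate case in which a null vector is a point at infinity):

```python
import numpy as np

def line_from_planes(planes):
    """Best-fit line of intersection of n >= 2 planes ([44], section 12.7).

    planes: (n, 4) array; row i holds homogeneous plane coefficients pi_i,
            so points X on plane i satisfy pi_i . X = 0.
    Returns a point on the line and a unit direction vector.
    """
    _, _, Vt = np.linalg.svd(np.asarray(planes))
    # the two right singular vectors with the smallest singular values span
    # the homogeneous points lying (approximately) on all of the planes
    X1, X2 = Vt[-1], Vt[-2]
    p1 = X1[:3] / X1[3]         # assumes neither null vector is at infinity
    p2 = X2[:3] / X2[3]
    direction = p2 - p1
    return p1, direction / np.linalg.norm(direction)
```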

REFERENCES

1 Srinivasan, M. V., Zhang, S. W., Lehrer, M. & Collett,

T. S. 1996 Honeybee navigation en route to the goal:

visual ﬂight control and odometry. J. Exp. Biol. 199,

237–244.

2 Schilstra, C. & van Hateren, J. H. 1999 Blowfly flight and optic

ﬂow. I. Thorax kinematics and ﬂight dynamics. J. Exp.

Biol. 202, 1481–1490.

3 Tammero, L. F. & Dickinson, M. H. 2002 The inﬂuence of

visual landscape on the free ﬂight behavior of the fruit ﬂy

Drosophila melanogaster. J. Exp. Biol. 205, 327–343.

4 Kern, R., van Hateren, J. H., Michaelis, C., Lindemann, J.

P. & Egelhaaf, M. 2005 Function of a ﬂy motion-sensitive

neuron matches eye movements during free ﬂight.

PLoS Biol. 3, 1130–1138. (doi:10.1371/journal.pbio.

0030171)

5 Land, M. F. & Collett, T. S. 1974 Chasing behavior of

houseflies (Fannia canicularis): a description and analysis.

J. Comp. Physiol. 89, 331–357. (doi:10.1007/

BF00695351)

6 Buelthoff, H., Poggio, T. & Wehrhahn, C. 1980 3-D analy-

sis of the ﬂight trajectories of ﬂies (Drosophila

melanogaster). Z. Nat. 35c, 811–815.

7 Wehrhahn, C., Poggio, T. & Bülthoff, H. 1982 Tracking and chasing in houseflies (Musca). Biol. Cybernet. 45,

123–130. (doi:10.1007/BF00335239)

8 Wagner, H. 1986 Flight performance and visual control of

the ﬂight of the free-ﬂying houseﬂy (Musca domestica). II.

Pursuit of targets. Phil. Trans. R. Soc. Lond. B 312, 553–

579. (doi:10.1098/rstb.1986.0018)

9 Mizutani, A., Chahl, J. S. & Srinivasan, M. V. 2003 Insect

behaviour: motion camouﬂage in dragonﬂies. Nature 423,

604. (doi:10.1038/423604a)

10 Frye, M. A., Tarsitano, M. & Dickinson, M. H. 2003 Odor

localization requires visual feedback during free ﬂight in

Drosophila melanogaster. J. Exp. Biol. 206, 843 –855.

(doi:10.1242/jeb.00175)


11 Budick, S. A. & Dickinson, M. H. 2006 Free-ﬂight

responses of Drosophila melanogaster to attractive odors.

J. Exp. Biol. 209, 3001–3017. (doi:10.1242/jeb.02305)

12 David, C. T. 1978 Relationship between body angle and

ﬂight speed in free-ﬂying Drosophila. Physiol. Entomol.

3, 191– 195. (doi:10.1111/j.1365-3032.1978.tb00148.x)

13 Fry, S. N., Sayaman, R. & Dickinson, M. H. 2003 The

aerodynamics of free-ﬂight maneuvers in Drosophila.

Science 300, 495–498. (doi:10.1126/science.1081944)

14 Collett, T. S. & Land, M. F. 1975 Visual control of ﬂight

behaviour in the hoverﬂy Syritta pipiens L. J. Comp. Phy-

siol. A Neuroethol. Sens. Neural Behav. Physiol. 99, 1–66.

(doi:10.1007/BF01464710)

15 Collett, T. S. & Land, M. F. 1978 How hoverﬂies compute

interception courses. J. Comp. Physiol. A Neuroethol.

Sens. Neural Behav. Physiol. 125, 191–204. (doi:10.

1007/BF00656597)

16 Wehrhahn, C. 1979 Sex-speciﬁc differences in the chasing

behaviour of houseflies (Musca). Biol. Cybernet. 32, 239–

241. (doi:10.1007/BF00337647)

17 Dahmen, H.-J. & Zeil, J. 1984 Recording and reconstruct-

ing three-dimensional trajectories: a versatile method for

the ﬁeld biologist. Proc. R. Soc. Lond. B 222, 107–113.

(doi:10.1098/rspb.1984.0051)

18 Srinivasan, M. V., Zhang, S. W., Chahl, J. S., Barth, E. &

Venkatesh, S. 2000 How honeybees make grazing landings

on ﬂat surfaces. Biol. Cybernet. 83, 171– 183. (doi:10.

1007/s004220000162)

19 Hedrick, T. L. & Biewener, A. A. 2007 Low speed maneu-

vering ﬂight of the rose-breasted cockatoo (Eolophus

roseicapillus). I. Kinematic and neuromuscular control of

turning. J. Exp. Biol. 210, 1897–1911. (doi:10.1242/jeb.

002055)

20 Tian, X., Diaz, J. I., Middleton, K., Galvao, R., Israeli, E.,

Roemer, A., Sullivan, A., Song, A., Swartz, S. & Breuer,

K. 2006 Direct measurements of the kinematics and

dynamics of bat ﬂight. Bioinspiration & Biomimetics 1,

S10–S18. (doi:10.1088/1748-3182/1/4/S02)

21 Khan, Z., Balch, T. & Dellaert, F. 2005 MCMC-based

particle ﬁltering for tracking a variable number of interact-

ing targets. IEEE Trans. Pattern Anal. Mach. Intell. 27,

2005.

22 Khan, Z., Balch, T. & Dellaert, F. 2006 MCMC data

association and sparse factorization updating for real

time multitarget tracking with merged and multiple

measurements. IEEE Trans. Pattern Anal. Mach. Intell.

28, 1960–1972. (doi:10.1109/TPAMI.2006.247)

23 Branson, K., Robie, A., Bender, J., Perona, P. &

Dickinson, M. H. 2009 High-throughput ethomics in

large groups of Drosophila. Nat. Methods 6, 451–457.

(doi:10.1038/nmeth.1328)

24 Klein, D. J. 2008 Coordinated control and estimation for

multi-agent systems: theory and practice, ch. 6. Tracking

Multiple Fish Robots Using Underwater Cameras, PhD

thesis, University of Washington.

25 Qu, W., Schonfeld, D. & Mohamed, M. 2007 Distributed

Bayesian multiple-target tracking in crowded environ-

ments using multiple collaborative cameras. EURASIP

J. Adv. Signal Process. Article ID 38373. (doi:10.1155/

2007/38373)

26 Qu, W., Schonfeld, D. & Mohamed, M. 2007 Real-time

interactively distributed multi-object tracking using a

magnetic-inertia potential model. IEEE Trans. Multime-

dia (TMM) 9, 511– 519. (doi:10.1109/TMM.2006.886266)

27 Ballerini, M. et al. 2008 Empirical investigation of starling

ﬂocks: a benchmark study in collective animal behaviour.

Anim. Behav. 76, 201–215. (doi:10.1016/j.anbehav.2008.

02.004)

28 Cavagna, A., Cimarelli, A., Giardina, I., Orlandi, A.,

Parisi, G., Procaccini, A., Santagati, R. & Stefanini, F.

2008 New statistical tools for analyzing the structure of

animal groups. Math. Biosci. 214, 32–37. (doi:10.1016/j.

mbs.2008.05.006)

29 Cavagna, A., Giardina, I., Orlandi, A., Parisi, G. &

Procaccini, A. 2008 The starﬂag handbook on collective

animal behaviour: 2. Three-dimensional analysis. Anim.

Behav. 76, 237–248. (doi:10.1016/j.anbehav.2008.02.003)

30 Cavagna, A., Giardina, I., Orlandi, A., Parisi, G., Procac-

cini, A., Viale, M. & Zdravkovic, V. 2008 The starﬂag

handbook on collective animal behaviour: 1. Empirical

methods. Anim. Behav. 76, 217–236. (doi:10.1016/j.anbehav.2008.02.002)

31 Wu, H., Zhao, Q., Zou, D. & Chen, Y. 2009 Acquiring 3d

motion trajectories of large numbers of swarming animals.

IEEE Int. Conf. on Computer Vision (ICCV) Workshop

on Video Oriented Object and Event Classiﬁcation.

32 Zou, D., Zhao, Q., Wu, H. & Chen, Y. 2009 Reconstruct-

ing 3d motion trajectories of particle swarms by global

correspondence selection. In Int. Conf. on Computer

Vision (ICCV 09). Workshop on Video Oriented Object

and Event Classiﬁcation, pp. 1578 –1585.

33 Bomphrey, R. J., Walker, S. M. & Taylor, G. K. 2009 The

typical ﬂight performance of blowﬂies: measuring the

normal performance envelope of Calliphora vicina using

a novel corner-cube arena. PLoS ONE 4, e7852. (doi:10.

1371/journal.pone.0007852)

34 Marden, J. H., Wolf, M. R. & Weber, K. E. 1997 Aerial

performance of Drosophila melanogaster from populations

selected for upwind ﬂight ability. J. Exp. Biol. 200,

2747–2755.

35 Fry, S. N., Muller, P., Baumann, H. J., Straw, A. D.,

Bichsel, M. & Robert, D. 2004 Context-dependent stimu-

lus presentation to freely moving animals in 3d.

J. Neurosci. Methods 135, 149–157. ( doi:10.1016/j.jneu-

meth.2003.12.012)

36 Fry, S. N., Rohrseitz, N., Straw, A. D. & Dickinson, M. H.

2008 Trackﬂy-virtual reality for a behavioral system

analysis in free-ﬂying fruit ﬂies. J. Neurosci. Methods

171, 110–117. (doi:10.1016/j.jneumeth.2008.02.016)

37 Grover, D., Tower, J. & Tavare, S. 2008 O ﬂy, where art

thou? J. R. Soc. Interface 5, 1181–1191. (doi:10.1098/

rsif.2007.1333)

38 Maimon, G., Straw, A. D. & Dickinson, M. H. 2008 A

simple vision-based algorithm for decision making in

ﬂying Drosophila. Curr. Biol. 18, 464 –470. (doi:10.1016/

j.cub.2008.03.050)

39 Piccardi, M. 2004 Background subtraction techniques: a

review. In Proc. of IEEE SMC 2004 Int. Conf. on Systems,

Man and Cybernetics. The Hague, The Netherlands,

October 2004.

40 Bar-Shalom, Y. & Fortmann, T. E. 1988 Tracking and

data association. New York, NY: Academic Press.

41 Abdel-Aziz, Y. I. & Karara, H. M. 1971 Direct linear trans-

formation from comparator coordinates into object space

coordinates. Proc. American Society of Photogrammetry

Symp. on Close-Range Photogrammetry, Urbana, IL,

26–29 January 1971, pp. 1–18. Falls Church, VA:

American Society of Photogrammetry.

42 Svoboda, T., Martinec, D. & Pajdla, T. 2005 A convenient

multi-camera self-calibration for virtual environments.

PRESENCE: Teleoperators and Virtual Environments

14, 407–422. (doi:10.1162/105474605774785325)

43 Sturm, P. & Triggs, B. 1996 A factorization based algor-

ithm for multi-image projective structure and motion. In

Proc. 4th Eur. Conf. on Computer Vision, April 1996,

Cambridge, UK, pp. 709– 720. Springer.

408 Real-time 3D multi-camera animal tracking A. D. Straw et al.

J. R. Soc. Interface (2011)

44 Hartley, R. I. & Zisserman, A. 2003 Multiple view geome-

try in computer vision, 2nd edn. Cambridge, UK:

Cambridge University Press.

45 Bouguet, J. 2010 Camera calibration toolbox. See http://

www.vision.caltech.edu/bouguetj/calib_doc.

46 Prescott, B. & McLean, G. F. 1997 Line-based correction

of radial lens distortion. Graph. Models Image Process.

59, 39–47. (doi:10.1006/gmip.1996.0407)

47 Straw, A. D. & Dickinson, M. H. 2009 Motmot, an open-

source toolkit for realtime video acquisition and analysis.

Source Code Biol. Med. 4, 5, 1–20. (doi:10.1186/1751-

0473-4-5)

48 Correll, K., Barendt, N. & Branicky, M. 2005 Design

considerations for software-only implementations of the

IEEE 1588 precision time protocol. Proc. Conf. on

IEEE-1588 Standard for a Precision Clock Synchroniza-

tion Protocol for Networked Measurement and Control

Systems, NIST and IEEE, Winterthur, Switzerland, 10–

12 October 2005.

49 Kennedy, J. S. 1939 Visual responses of ﬂying mosquitoes.

Proc. Zool. Soc. Lond. 109, 221–242.

50 Neumann, T. R. 2002 Modeling insect compound eyes:

space-variant spherical vision. In Proc. of the 2nd Int.

Workshop on Biologically Motivated Computer

Vision (eds H. H. Bülthoff, S.-W. Lee, T. Poggio & C.

Wallraven), LNCS 2525, pp. 360–367. Berlin, Germany:

Springer.

51 Dickson, W. B., Straw, A. D. & Dickinson, M. H. 2008

Integrative model of drosophila ﬂight. AIAA J. 46,

2150–2165. (doi:10.2514/1.29862)

52 Strauss, R., Schuster, S. & Götz, K. G. 1997 Processing of

artiﬁcial visual feedback in the walking fruit ﬂy Drosophila

melanogaster. J. Exp. Biol. 200, 1281–1296.

53 Straw, A. D. 2008 Vision egg: an open-source library for

realtime visual stimulus generation. Front. Neuroinfor-

matics 2, 1–10. (doi:10.3389/neuro.11.004.2008)

54 Stamps, J., Buechner, M., Alexander, K., Davis, J. &

Zuniga, N. 2005 Genotypic differences in space use and

movement patterns in Drosophila melanogaster. Anim.

Behav. 70, 609–618. (doi:10.1016/j.anbehav.2004.11.018)

55 Dvorak, D., Srinivasan, M. V. & French, A. S. 1980 The

contrast sensitivity of ﬂy movement-detecting neurons.

Vision Res. 20, 397–407. (doi:10.1016/0042-6989(80)90030-9)

56 Srinivasan, M. V. & Dvorak, D. R. 1980 Spatial processing

of visual information in the movement-detecting pathway of

the ﬂy: characteristics and functional signiﬁcance. J. Comp.

Physiol. A 140, 1–23. (doi:10.1007/BF00613743)

57 Egelhaaf, M. & Borst, A. 1989 Transient and steady-state

response properties of movement detectors. J. Opt. Soc.

Am. A 6, 116– 127. (doi:10.1364/JOSAA.6.000116)

58 Hassenstein, B. & Reichardt, W. 1956 Systemtheoretische

analyse der zeit-, reihenfolgen- und vorzeichenauswertung

bei der bewegungsperzeption des rüsselkäfers Chlorophanus. Z. Nat. 11b, 513–524.

59 David, C. T. 1982 Compensation for height in the control

of groundspeed by Drosophila in a new, 'barber's pole' wind tunnel. J. Comp. Physiol. 147, 485–493. (doi:10.1007/

BF00612014)

60 Srinivasan, M. V., Lehrer, M., Kirchner, W. H. & Zhang,

S. W. 1991 Range perception through apparent image

speed in freely ﬂying honeybees. Vis. Neurosci. 6, 519–

535. (doi:10.1017/S095252380000136X)

61 Srinivasan, M. V., Zhang, S. W. & Chandrashekara, K.

1993 Evidence for two distinct movement detecting mech-

anisms in insect vision. Naturwissenschaften 80, 38–41.

(doi:10.1007/BF01139758)

62 Si, A., Srinivasan, M. V. & Zhang, S. 2003 Honeybee navi-

gation: properties of the visually driven ‘odometer’. J. Exp.

Biol. 206, 1265–1273. (doi:10.1242/jeb.00236)

63 Baird, E., Srinivasan, M. V., Zhang, S. & Cowling, A. 2005

Visual control of ﬂight speed in honeybees. J. Exp. Biol.

208, 3895– 3905. (doi:10.1242/jeb.01818)

64 Harris, R. A., O’Carroll, D. C. & Laughlin, S. B. 2000 Con-

trast gain reduction in ﬂy motion adaptation. Neuron 28,

595–606. (doi:10.1016/S0896-6273(00)00136-7)

