Multi-camera real-time three-dimensional tracking of multiple flying animals

Journal of The Royal Society Interface 8(56):395-409 · March 2011
DOI: 10.1098/rsif.2010.0230 · Source: PubMed
Automated tracking of animal movement allows analyses that would not otherwise be possible by providing great quantities of data. The additional capability of tracking in real time—with minimal latency—opens up the experimental possibility of manipulating sensory feedback, thus allowing detailed explorations of the neural basis for control of behaviour. Here, we describe a system capable of tracking the three-dimensional position and body orientation of animals such as flies and birds. The system operates with less than 40 ms latency and can track multiple animals simultaneously. To achieve these results, a multi-target tracking algorithm was developed based on the extended Kalman filter and the nearest neighbour standard filter data association algorithm. In one implementation, an 11-camera system is capable of tracking three flies simultaneously at 60 frames per second using a gigabit network of nine standard Intel Pentium 4 and Core 2 Duo computers. This manuscript presents the rationale and details of the algorithms employed and shows three implementations of the system. An experiment was performed using the tracking system to measure the effect of visual contrast on the flight speed of Drosophila melanogaster. At low contrasts, speed is more variable and faster on average than at high contrasts. Thus, the system is already a useful tool to study the neurobiology and behaviour of freely flying animals. If combined with other techniques, such as ‘virtual reality’-type computer graphics or genetic manipulation, the tracking system would offer a powerful new way to investigate the biology of flying animals.
Andrew D. Straw*, Kristin Branson, Titus R. Neumann
and Michael H. Dickinson
California Institute of Technology, Bioengineering, Mailcode 138-78, Pasadena,
CA 91125, USA
Keywords: computer vision; animal behaviour; flight; manoeuvring; insects; birds
1. Introduction
Much of what we know about the visual guidance of flight
[1–4], aerial pursuit [5–9], olfactory search algorithms
[10,11] and control of aerodynamic force generation
[12,13] is based on experiments in which an insect was
tracked during flight. To facilitate these types of studies
and to enable new ones, we created a new, automated
animal tracking system. A significant motivation was
to create a flexible system capable of robustly gathering
large quantities of accurate data in a highly automated
fashion. The real-time nature of the system
enables experiments in which an animal’s own movement
is used to control the physical environment, allowing
virtual-reality or other dynamic stimulus regimes to
investigate the feedback-based control performed by
the nervous system. Furthermore, the ability to easily
collect flight trajectories facilitates data analysis and
behavioural modelling using machine-learning approaches
that require large amounts of data.
Our primary innovation is the use of arbitrary num-
bers of inexpensive cameras for markerless, real-time
tracking of multiple targets. Typically, cameras with
relatively high temporal resolution, such as 100 frames
per second, and which are suitable for real-time image
analysis (those that do not buffer their images to on-
camera memory), have relatively low spatial resolution.
To have high spatial resolution over a large tracking
volume, many cameras are required. Therefore, the
use of multiple cameras enables tracking over large, be-
haviourally and ecologically relevant spatial scales with
high spatial and temporal resolutions while minimizing
the effects of occlusion. The framework naturally allows
information gathered from each camera view to incre-
mentally improve localization. Individual views of
the target thus refine the tracking estimates, even if
other cameras do not see it (for example, owing to
occlusions or low contrast). The use of multiple cameras
also gives the system its name, flydra, from ‘fly’, our pri-
mary experimental animal, and the mythical Greek
multi-headed serpent ‘hydra’.
Flydra is largely composed of standard algorithms,
hardware and software. Our effort has been to integrate
these disparate pieces of technology into one coherent,
working system with the important property that the
multi-target tracking algorithm operates with low
latency during experiments.
*Author for correspondence.
J. R. Soc. Interface (2011) 8, 395–409. Received 19 April 2010; accepted 21 June 2010; published online 14 July 2010. © 2010 The Royal Society.
1.1. System overview
A Bayesian framework provides a natural formalism to
describe our multi-target tracking approach. In such a
framework, previously held beliefs are called the a
priori, or prior, probability distribution of the state of
the system. Incoming observations are used to update
the estimate of the state into the a posteriori, or pos-
terior, probability distribution. This process is often
likened to human reasoning, whereby a person’s best
guess at some value is arrived at through a process of
combining previous expectations of that value with
new observations that inform about the value.
The task of flydra is to find the maximum a posteriori (MAP) estimate of the
state $S_t$ of all targets at time $t$ given observations $Z_{1:t}$ from all
time steps (starting with the first time step to the current time step),
$p(S_t \mid Z_{1:t})$. Here, $S_t$ represents the state (position and
velocity) of all targets, $S_t = (\mathbf{s}_t^1, \ldots, \mathbf{s}_t^{l_t})$,
where $l_t$ is the number of targets at time $t$. Under the first-order
Markov assumption, we can factorize the posterior as

$$p(S_t \mid Z_{1:t}) \propto p(Z_t \mid S_t) \int p(S_t \mid S_{t-1})\, p(S_{t-1} \mid Z_{1:t-1})\, \mathrm{d}S_{t-1}. \tag{1.1}$$

Thus, the process of estimating the posterior probability of target state at
time $t$ is a recursive process in which new observations are used in the
model of observation likelihood $p(Z_t \mid S_t)$. Past observations
become incorporated into the prior, which combines the motion model
$p(S_t \mid S_{t-1})$ with the target probability from the previous time
step $p(S_{t-1} \mid Z_{1:t-1})$.
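The recursion in equation (1.1) can be illustrated on a small discrete state space, where the integral reduces to a sum. The sketch below is only a toy illustration of the predict-then-update structure; flydra itself approximates the recursion with an EKF, and the three-cell world, the function name `bayes_step` and all probabilities here are hypothetical.

```python
import numpy as np

def bayes_step(prior, transition, likelihood):
    """One recursion of equation (1.1) on a discrete grid of states.

    prior[i]         ~ p(S_{t-1} = i | Z_{1:t-1})
    transition[j, i] ~ p(S_t = j | S_{t-1} = i)   (motion model)
    likelihood[j]    ~ p(Z_t | S_t = j)           (observation likelihood)
    """
    predicted = transition @ prior         # the integral becomes a sum
    unnormalized = likelihood * predicted  # weight by the new observation
    return unnormalized / unnormalized.sum()

# Hypothetical three-cell world in which the target tends to stay put.
prior = np.array([1 / 3, 1 / 3, 1 / 3])
transition = np.array([[0.8, 0.1, 0.1],
                       [0.1, 0.8, 0.1],
                       [0.1, 0.1, 0.8]])
likelihood = np.array([0.1, 0.7, 0.2])     # the observation favours cell 1

posterior = bayes_step(prior, transition, likelihood)
map_cell = int(np.argmax(posterior))       # MAP estimate of the state
```

Feeding `posterior` back in as the next `prior` gives the recursive behaviour described above: past observations enter only through the prior.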
Flydra uses an extended Kalman filter (EKF) to
approximate the solution to equation (1.1), as described
in §3.1. The observation $Z_t$ for each time step is the set
of all individual low-dimensional feature vectors containing
image position information arising from the
camera views of the targets (§2). In fact, equation
(1.1) neglects the challenges of data association (linking
individual observations with specific targets) and
targets entering and leaving the tracking volume.
Therefore, the nearest neighbour standard filter
(NNSF) data association step is used to link individual
observations with target models in the model of observation
likelihood (§3.2), and the state update model
incorporates the ability for targets to enter and leave
the tracking volume (§3.2.3). The heuristics employed
to implement the system were typically chosen to optimize
real-time performance and low latency rather than to
yield a compact formal description, and our system only
approximates the full Bayesian solution rather than
implementing it exactly. Nevertheless, the remaining
sections of this manuscript relate these heuristics to
the global Bayesian framework where possible, and note
the aspects of the system that were found to be important
for low-latency operation.
The general form of the apparatus is illustrated
in figure 1a and a flowchart of operations is given in
figure 2a. Digital cameras are connected (with an
IEEE 1394 FireWire bus or a dedicated gigabit ethernet
cable) to image processing computers that perform a
background subtraction-based algorithm to extract
image features such as the two-dimensional target pos-
ition and orientation in a given camera’s image. From
these computers, this two-dimensional information is
transmitted over a gigabit ethernet LAN to a central
computer, which performs two- to three-dimensional
triangulation and tracking. Although the tracking results
are generated and saved online, in real time as the
experiment is performed, raw image sequences can also be
saved, both for verification and for other
types of analyses. Finally, reconstructed flight
trajectories, such as that of figure 1b, may then be subjected
to further analysis (figure 3; see also figure 9).
1.2. Related work
Several systems have allowed manual or manually
assisted digitization of the trajectories of freely flying
animals. In the 1970s, Land & Collett [5] performed
pioneering studies on the visual guidance of flight in
blowflies and, later, in hoverflies [14,15]. By the end of
the 1970s and into the 1980s, three-dimensional
reconstructions using two views of flying insects were
performed [6–8,16,17]. In one case, the shadow of a
bee on a planar white surface was used as a second
view to perform three-dimensional reconstruction [18].
Today, hand digitization is still used when complex
kinematics, such as wing shape and position, are desired,
such as in Drosophila [13], cockatoo [19] and bats [20].

Figure 1. (a) Schematic of the multi-camera tracking system.
(b) A trajectory of a fly (Drosophila melanogaster) near a
dark, vertical post. Arrow indicates direction of flight at
onset of tracking.
Several authors have solved similar automated multi-target
tracking problems using video. For example,
Khan et al. [21] tracked multiple, interacting ants in
two dimensions from a single view using particle filtering
with a Markov chain Monte Carlo sampling step to
solve the multi-target tracking problem. Later work by
the same authors [22] achieved real-time speeds through
the use of sparse updating techniques. Branson et al.
[23] addressed the same problem for walking flies.
Their technique uses background subtraction and clustering
to detect flies in the image, and casts the data
association problem as an instance of minimum
weight bipartite perfect matching. In implementing
flydra, we found the simpler system described here to
be sufficient for tracking the position of flying flies
and hummingbirds (§5). In addition to tracking in
three dimensions rather than two, a key difference
between the work described above and the present work
is that the interactions between our
animals are relatively weak (§3.2, especially equation
(3.6)), and we did not find it necessary to implement
a more advanced tracker. Nevertheless, the present
work could be used as the basis for a more advanced
tracker, such as one using a particle filter (e.g.
[24]). In that case, the posterior from the EKF (§3.1)
could be used as the proposal distribution for the particle
filter. Others have decentralized the multiple
object tracking problem to improve performance,
especially when dealing with dynamic occlusions
owing to targets occluding each other (e.g. [25,26]).
Additionally, tracking of dense clouds of starlings
[27–30] and fruit flies [31,32] has enabled detailed
investigation of swarms, although these systems are
currently incapable of operating in real time. By filming
inside a corner-cube reflector, multiple (real and
reflected) images allowed Bomphrey et al. [33] to track
flies in three dimensions with only a single camera,
and the tracking algorithm presented here could make
use of this insight.
Completely automated three-dimensional animal
tracking systems have more recently been created,
such as systems with two cameras that track flies in
real time [34–36]. The system of Grover et al. [37], similar
in many respects to the one we describe here, tracks
the visual hull of flies using three cameras to reconstruct
a polygonal model of the three-dimensional shape of the
flies. Our system, briefly described in a simpler, earlier
form in Maimon et al. [38], differs in several ways.
First, flydra has a design goal of tracking over large
volumes, and, as a result of the associated limited
spatial resolution (rather than owing to a lack of interest),
flydra is concerned only with the location and
orientation of the animal. Second, to facilitate tracking
over large volumes, the flydra system uses a data association
step as part of the tracking algorithm. The data
association step allows flydra to cope with the additional
noise (false positive feature detections) that arises in the
low contrast situations often present when
tracking in large volumes. Third, our
system does not attempt to maintain identity correspondence
of multiple animals over extended
durations, but rather stops tracking individuals when
the tracking error is too high and starts tracking them as
new individuals when they are detected again. Finally, although
their system operates in real time, Grover et al. [37] provided
no measurements of latency with
which to compare our measurements.

Figure 2. (a) Flowchart of operations. (b) Schematic of a two-dimensional camera view showing the raw images (brown), feature
extraction (blue), state estimation (black), and data association (red). See §§2 and 3.2.4 for an explanation of the symbols. (c)
Three-dimensional reconstruction using the EKF uses prior state estimates (open circle) and observations (blue lines) to construct
a posterior state estimate (filled circle) and covariance ellipsoid (dotted ellipse). See appendix A for details.
1.3. Notation
In the equations to follow, letters in a bold, roman
font signify a vector, which may be specified by components
enclosed in parentheses and separated by
commas. Matrices are written in roman font with
uppercase letters. Scalars are in italics. Vectors always
act like a single column matrix, such that for vector
$\mathbf{v} = (a, b, c)$, the multiplication with matrix $\mathrm{M}$ is

$$\mathrm{M}\mathbf{v} = \mathrm{M}[\mathbf{v}] = \mathrm{M}\begin{bmatrix} a \\ b \\ c \end{bmatrix}.$$
The first stage of processing converts digital images into
a list of feature points using an elaboration of a back-
ground subtraction algorithm. Because the image of a
target is usually only a few pixels in area, an individual
feature point from a given camera characterizes that
camera’s view of the target. In other words, neglecting
missed detections or false positives, there is usually a
one-to-one correspondence between targets and
extracted feature points from a given camera. Neverthe-
less, our system is capable of successful tracking despite
missing observations owing to occlusion or low contrast
(§3.1) and rejecting false positive feature detections (as
described in §3.2).
In the Bayesian framework, all feature points for
time $t$ are the observation $Z_t$. The $i$th of $n$ cameras
returns $m$ feature points, with each point $\mathbf{z}_{tij}$ being a
vector $\mathbf{z}_{tij} = (u, v, \alpha, \beta, \theta, \epsilon)$ where $u$ and $v$ are the
coordinates of the point in the image plane and the
remaining components are local image statistics
described below. $Z_t$ thus consists of all such feature
points for a given frame, $Z_t = \{\mathbf{z}_{t11}, \ldots, \mathbf{z}_{t1m}, \ldots, \mathbf{z}_{tn1}, \ldots, \mathbf{z}_{tnm}\}$.
(In the interest of simplified notation, our indexing
scheme is slightly misleading here: there may be varying
numbers of features for each camera rather than always
$m$ as suggested.)

Figure 3. (a) Top view of the fly trajectory in figure 1b, showing several close approaches to and movements away from a dark post
placed in the centre of the arena. The arrow indicates initial flight direction. Two sequences are highlighted in colour. The inset
shows the coordinate system. (b) The time-course of attraction and repulsion to the post is characterized by flight directly
towards the post until only a small distance remains, at which point the fly turns rapidly and flies away. Approach angle
quantified as the difference between the direction of flight and bearing to the post, both measured in the horizontal plane.
The arrow indicates the initial approach for the selected sequences. (c) Angular velocity (measured tangent to the three-dimensional
flight trajectory) indicates that relatively straight flight is punctuated by saccades (brief, rapid turns).
New images are converted to a series of
feature points by a process based on background
subtraction using the running Gaussian average method
(reviewed in [39]). To achieve the fast image processing
required for real-time operation, many of these operations
are performed using the high-performance
single instruction multiple data extensions available on
recent x86 CPUs. Initially, an absolute difference
image is made, where each pixel is the absolute value
of the difference between the incoming frame and the
background image. Feature points that exceed some
threshold difference from the background image are
noted and a small region around each pixel is subjected
to further analysis. For the $j$th feature, the brightest
point has value $\beta_j$ in this absolute difference image.
All pixels below a certain fraction (e.g. 0.3) of $\beta_j$ are
set to zero to reduce moment arms caused by spurious
pixels. Feature area $\alpha_j$ is found from the 0th moment,
the feature centre $(\tilde{u}_j, \tilde{v}_j)$ is calculated from the 1st
moment and the feature orientation $\theta_j$ and eccentricity $\epsilon_j$
are calculated from higher moments. After correcting
for lens distortion (§4), the feature centre is $(u_j, v_j)$.
Thus, the $j$th point is characterized by the vector
$\mathbf{z}_j = (u_j, v_j, \alpha_j, \beta_j, \theta_j, \epsilon_j)$. Such features are extracted on
every frame from every camera, although the number
of points $m$ found on each frame may vary. We set
the initial thresholds for detection low to minimize
the number of missed detections; false positives at
this stage are rejected later by the data association
algorithm (§3.2).
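The extraction steps above (absolute difference, thresholding, suppression of dim pixels, image moments) can be sketched as follows. This is a simplified illustration rather than the flydra implementation: it treats the whole image as a single region of interest, omits lens distortion correction, and the function name `extract_feature`, the default thresholds and the particular eccentricity definition (from the eigenvalues of the second-moment matrix) are assumptions made for this example.

```python
import numpy as np

def extract_feature(frame, background, diff_threshold=20.0, clear_fraction=0.3):
    """Return (u, v, area, peak, orientation, eccentricity), or None.

    Assumes at most one target in the image (a simplification).
    """
    diff = np.abs(frame.astype(float) - background.astype(float))
    peak = diff.max()                          # brightest difference value
    if peak < diff_threshold:
        return None                            # nothing exceeds the threshold
    diff[diff < clear_fraction * peak] = 0.0   # suppress spurious dim pixels

    area = diff.sum()                          # 0th moment (intensity weighted)
    rows, cols = np.indices(diff.shape)
    u = (cols * diff).sum() / area             # 1st moments give the centre
    v = (rows * diff).sum() / area

    # Central 2nd moments give orientation and elongation.
    mu20 = ((cols - u) ** 2 * diff).sum() / area
    mu02 = ((rows - v) ** 2 * diff).sum() / area
    mu11 = ((cols - u) * (rows - v) * diff).sum() / area
    orientation = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)

    spread = np.sqrt(4.0 * mu11 ** 2 + (mu20 - mu02) ** 2)
    lam_major = 0.5 * (mu20 + mu02 + spread)
    lam_minor = 0.5 * (mu20 + mu02 - spread)
    if lam_major > 0:
        eccentricity = np.sqrt(max(0.0, 1.0 - lam_minor / lam_major))
    else:
        eccentricity = 0.0                     # single-pixel feature
    return u, v, area, peak, orientation, eccentricity
```

A production version would analyse a small window around each above-threshold pixel, so that several targets in one image yield several feature vectors.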
Our system is capable of dealing with illumination
conditions that vary slowly over time by using an
ongoing estimate of the background luminance and its
variance, which are maintained on a per-pixel basis
by updating the current estimates with data from
every 500th frame (or other arbitrary interval). A
more sophisticated two-dimensional feature extraction
algorithm could be used, but we have found this
scheme to be sufficient for our purposes and sufficiently
simple to operate with minimal latency.
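A per-pixel running estimate of background mean and variance, refreshed only on every Nth frame, might look like the sketch below. The class name `RunningBackground`, the learning rate `alpha` and the initial variance are hypothetical choices for illustration; the text specifies only that the estimates are updated from every 500th frame (or another interval).

```python
import numpy as np

class RunningBackground:
    """Running Gaussian average of background luminance, per pixel."""

    def __init__(self, first_frame, alpha=0.1, update_interval=500,
                 initial_variance=1.0):
        self.mean = first_frame.astype(float)
        self.var = np.full(first_frame.shape, initial_variance, dtype=float)
        self.alpha = alpha                    # assumed learning rate
        self.update_interval = update_interval
        self.frames_seen = 0

    def observe(self, frame):
        """Count a frame; update the model on every Nth one.

        Returns True if the background estimates were updated.
        """
        self.frames_seen += 1
        if self.frames_seen % self.update_interval != 0:
            return False
        deviation = frame.astype(float) - self.mean
        self.mean = self.mean + self.alpha * deviation
        self.var = (1.0 - self.alpha) * self.var + self.alpha * deviation ** 2
        return True
```

Skipping most frames keeps the per-frame cost low, which matters for latency, while still following slow illumination changes.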
While the real-time operation of flydra is essential for
experiments modifying sensory feedback, another
advantage of an online tracking system is that the
amount of data required to be saved for later analysis
is greatly reduced. By performing only two-dimensional
feature extraction in real time, to reconstruct
three-dimensional trajectories later, only the vectors $\mathbf{z}_{tij}$ need
be saved, resulting in orders of magnitude less data
than the full-frame camera images. This reduction in
data rate depends only on the two-dimensional stage:
the three-dimensional processing described in the
following sections need not be implemented to obtain
it. Furthermore, raw images taken from the
neighbourhood of the feature points could also be
extracted and saved for later analysis, saving slightly
more data, but still at rates substantially lower than those
of the full camera frames. This is particularly
useful for cameras with a higher data rate than hard
drives can save, and such a feature is implemented
in flydra.
Figure 4 shows the parameters $(u, v, \theta)$ from the
two-dimensional feature extraction algorithm during
a hummingbird flight. These two-dimensional features,
in addition to three-dimensional reconstructions,
are overlaid on raw images extracted and saved using
the real-time image extraction technique described above.
Figure 4. Raw images, two-dimensional data extracted from the images, and overlaid computed three-dimensional position and
body orientation of a hummingbird (Calypte anna). In these images, a blue circle is drawn centred on the two-dimensional image
coordinates $(\tilde{u}, \tilde{v})$. The blue line segment is drawn through the detected body axis ($\theta$ in §3.2) when eccentricity ($\epsilon$) of the detected
object exceeds a threshold. The orange circle is drawn centred on the three-dimensional estimate of position $(x, y, z)$ reprojected
through the camera calibration matrix $\mathrm{P}$, and the orange line segment is drawn in the direction of the three-dimensional body
orientation vector.

3. Multi-target tracking
The goal of flydra, as described in §1.1, is to find the
MAP estimate of the state of all targets. For simplicity,
we model interaction between targets in a very limited
way. Although in many cases the animals we are
interested in tracking do interact (for example,
hummingbirds engage in competition in which they
threaten or even contact each other), mathematically
limiting the interaction facilitates a reduction in
computational complexity. First, the process update is
independent for each $k$th animal

$$p(S_t \mid S_{t-1}) = \prod_k p(\mathbf{s}_t^k \mid \mathbf{s}_{t-1}^k). \tag{3.1}$$
Second, we implemented only a slight coupling
between targets in the data association algorithm.
Thus, the observation likelihood model $p(Z_t \mid S_t)$ is
independent for each target with the exception described in
§3.2.1, and making this assumption allows use of the
NNSF as described below.
Modelling individual target states as mostly independent
allows the problem of estimating the MAP of joint
target state $S$ to be treated nearly as $l$ independent,
smaller problems. One benefit of making the assumption
of target independence is that the target tracking
and data association parts of our system are parallelizable.
Although not yet implemented in parallel, our
system is theoretically capable of tracking very
many (tens or hundreds of) targets simultaneously with
low latency on a computer with sufficiently many
processing units.
The cost of this near-independence assumption is
reduced tracking accuracy during periods of near con-
tact (§3.2.1). Data from these periods could be
analysed later using more sophisticated multi-target
tracking data association techniques, presumably in
an offline setting, especially because such periods
could be easily identified using a simple algorithm. All
data presented in this paper used the procedure
described here.
3.1. Kalman filtering
The standard EKF approximately estimates statistics of
the posterior distribution (equation (1.1)) for nonlinear
processes with additive Gaussian noise (details are
given in appendix A). To use this framework, we
make the assumption that noise in the relevant
processes is Gaussian. Additionally, our target indepen-
dence assumption allows a single Kalman filter
implementation to be used for each tracked target.
The EKF estimates state and its covariance based on
a prior state estimate and incoming observations
by using models of the state update process, the
observation process, and estimates of the noise of each
process.
We use a linear model for the dynamics of the system
and a nonlinear model of the observation process.
Specifically, the time evolution of the system is
modelled with the linear discrete stochastic model

$$\mathbf{s}_t = \mathrm{A}\mathbf{s}_{t-1} + \mathbf{w}. \tag{3.2}$$

We treat the target as an orientation-free particle,
with state vector $\mathbf{s} = (x, y, z, \dot{x}, \dot{y}, \dot{z})$ describing position
and velocity in three-dimensional space. The process
update model $\mathrm{A}$ represents, in our case, the laws of
motion for a constant velocity particle

$$\mathrm{A} = \begin{bmatrix}
1 & 0 & 0 & dt & 0 & 0 \\
0 & 1 & 0 & 0 & dt & 0 \\
0 & 0 & 1 & 0 & 0 & dt \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}$$

with dt being the time step. Manoeuvring of the target
(deviation from the constant velocity) is modelled as
noise in this formulation. The random variable w rep-
resents this process update noise with a normal
probability distribution with zero mean and the process
covariance matrix Q. Despite the use of a constant vel-
ocity model, the more complex trajectory of a fly or
other target is accurately estimated by updating the
state estimate with frequent observations.
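The constant velocity model and its use in the prediction step can be sketched as below. The covariance propagation A P Aᵀ + Q is the standard Kalman prediction; the function names, the 60 frames per second time step and the placeholder values of Q and of the state are assumptions made purely for illustration.

```python
import numpy as np

def constant_velocity_matrix(dt):
    """Process update matrix A for state (x, y, z, vx, vy, vz)."""
    A = np.eye(6)
    A[0, 3] = A[1, 4] = A[2, 5] = dt   # position += velocity * dt
    return A

def predict(s, P, A, Q):
    """A priori state and covariance for the next time step."""
    return A @ s, A @ P @ A.T + Q

dt = 1.0 / 60.0                                     # assumed frame interval
A = constant_velocity_matrix(dt)
Q = np.diag([1.0, 1.0, 1.0, 0.5, 0.5, 0.5])         # placeholder process noise
s = np.array([0.0, 0.0, 100.0, 300.0, 0.0, 0.0])    # moving along +x
P = np.eye(6)

s_pred, P_pred = predict(s, P, A, Q)
```

Because manoeuvres are absorbed into Q, a larger Q lets the filter follow sharp turns more quickly at the price of noisier estimates.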
For a given data association hypothesis (§3.2), a set
of observations is available for target $k$. A nonlinear
observation model, requiring use of an EKF, is used
to describe the action of a projective camera (equations
(3.3) and (3.4)). This allows observation error to be
modelled as Gaussian noise on the image plane.
Furthermore, during tracking, triangulation happens only
implicitly, and error estimates of target position are
larger along the direction of the ray between the
target and the camera centre. (To be clear, explicit
triangulation is performed during the initialization
of a Kalman model target, as explained in §3.2.)
Thus, observations from alternating single cameras on
successive frames would be sufficient to produce a
three-dimensional estimate of target position. For
example, figure 5 shows a reconstructed fly trajectory
in which two frames were lacking data from all but
one camera. During these two frames, the estimated
error increased, particularly along the camera-fly axis,
and no triangulation was possible. Nevertheless, an
estimate of three-dimensional position was made and
appears reasonable. The observation vector
$\mathbf{y} = (u_1, v_1, u_2, v_2, \ldots, u_n, v_n)$ is the vector of the
distortion-corrected image points from $n$ cameras. The nonlinear
observation model relates $\mathbf{y}_t$ to the state $\mathbf{s}_t$ by

$$\mathbf{y}_t = h(\mathbf{s}_t) + \mathbf{v} \tag{3.3}$$

where $\mathbf{s}_t$ is the state at time $t$, the function $h$ models the
action of the cameras and $\mathbf{v}$ is observation noise. The
vector-valued $h(\mathbf{s})$ is the concatenation of the image
points found by applying the image point equations
(equations (B 2) and (B 3) in appendix B) to each of
the $n$ cameras

$$h(\mathbf{s}) = (h_1(\mathbf{s}), \ldots, h_n(\mathbf{s})) = (\bar{u}_1, \bar{v}_1, \ldots, \bar{u}_n, \bar{v}_n) = (H(\mathrm{P}_1 X), \ldots, H(\mathrm{P}_n X)). \tag{3.4}$$

The overbar denotes a noise-free prediction to
which the zero-mean noise vector $\mathbf{v}$ is added, and $X$ is
the homogeneous form of the first three components
of $\mathbf{s}$. The random variable $\mathbf{v}$ models the observation
noise as normal in the image plane with zero mean
and covariance matrix $\mathrm{R}$.

Figure 5. Two seconds of a reconstructed Drosophila trajectory.
(a) Top view. (b) Side view of same trajectory.
Kalman filter-based estimates of fly position $\mathbf{s}$ are plotted as
dots at the centre of ellipsoids, which are the projections of
the multi-variate normal specified by the covariance matrix
$\mathrm{P}$. Additionally, position estimated directly by triangulation
of two-dimensional point locations (see appendix B) is plotted
with crosses. The fly began on the right and flew in the
direction denoted by the arrow. Note that for two frames near the
beginning, only a single camera contributed to the tracking
and the error estimate increased.
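The observation model of equations (3.3) and (3.4) can be sketched as follows. Appendix B is not reproduced here, so the sketch assumes the standard pinhole form, in which H divides a homogeneous image point by its last component. The two camera matrices are hypothetical, and the finite-difference Jacobian is a stand-in for whatever linearization an EKF implementation would use.

```python
import numpy as np

def project(P_cam, s):
    """H(P X): image point of the position part of s in one camera."""
    X = np.append(s[:3], 1.0)          # homogeneous form of (x, y, z)
    x = P_cam @ X                      # 3x4 camera matrix times 4-vector
    return x[:2] / x[2]                # assumed form of H

def h(s, cameras):
    """Noise-free observation (u1, v1, ..., un, vn) for all cameras."""
    return np.concatenate([project(P_cam, s) for P_cam in cameras])

def observation_jacobian(s, cameras, eps=1e-6):
    """Numerical Jacobian of h, needed to linearize it in an EKF."""
    J = np.zeros((2 * len(cameras), len(s)))
    for c in range(len(s)):
        step = np.zeros(len(s))
        step[c] = eps
        J[:, c] = (h(s + step, cameras) - h(s - step, cameras)) / (2 * eps)
    return J

# Two hypothetical cameras looking along +z.
P1 = np.array([[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0]])
P2 = np.array([[100.0, 0.0, 0.0, 100.0],
               [0.0, 100.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0]])
s = np.array([1.0, 2.0, 4.0, 0.0, 0.0, 0.0])

y_bar = h(s, [P1, P2])
J = observation_jacobian(s, [P1, P2])
```

The velocity columns of the Jacobian are zero because the cameras observe position only; velocity is inferred through the motion model.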
At each time step $t$, the EKF formulation is then
used to estimate the state $\hat{\mathbf{s}}$ in addition to the error $\mathrm{P}$
(see appendix A). Together, the data associated with
each target is $\Gamma = \{\hat{\mathbf{s}}, \mathrm{P}\}$. With the possibility of
multiple targets being tracked simultaneously, the $k$th
target is assigned $\Gamma^k$.
One issue we faced when implementing the Kalman
filter was parameter selection. We chose parameters
through educated guesses followed by an iterated
trial-and-error procedure using the observations from
several different trajectories. The parameters that resulted
in trajectories closest to those seen by eye and with the
smallest error estimate $\mathrm{P}$ were used. We obtained
good results, for fruit flies measured under the
conditions of our set-ups, with the process covariance
matrix $\mathrm{Q}$ being diagonal, with the first three entries
being 100 mm$^2$ and the next three being 0.25 (m s$^{-1}$)$^2$.
Therefore, our model treats manoeuvring as
position and velocity noise. For the observation covariance
matrix $\mathrm{R}$, we found good results with a diagonal
matrix with entries of 1, corresponding to variance of
the observed image positions of one pixel. Parameter
selection could be automated by an expectation-maximization
algorithm, but we found this was not necessary.
Another issue is missing data—in some time steps,
all views of the fly may be occluded or low contrast,
leaving a missing value of y for that time step. In
those cases, we simply set the a posteriori estimate to
the a priori prediction, as follows from equation (1.1).
In these circumstances, the error estimate P grows by
the process covariance Q, and does not get reduced by
(non-existent) new observations. This follows directly
from the Kalman filter equations (appendix A). If too
many successive frames with no observations occur,
the error estimate will exceed a threshold and tracking
will be terminated for that target (described in §3.2.3).
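The effect of missing observations can be sketched as below: with no update step, the a posteriori estimate is the a priori prediction and the covariance grows by Q on every frame until a termination criterion is met. The criterion itself is described in §3.2.3, which is not reproduced here, so the threshold on the summed position variances, the function names and the numerical values are all assumptions for illustration.

```python
import numpy as np

def coast(s, P, A, Q):
    """Frame with no observation: the posterior is set to the prior."""
    return A @ s, A @ P @ A.T + Q

def frames_until_termination(P, A, Q, max_position_variance, max_frames=1000):
    """Count unobserved frames until the position error is too large.

    Uses the trace of the position block of P as the error measure
    (an assumed criterion). Returns None if never exceeded.
    """
    s = np.zeros(A.shape[0])
    for n in range(1, max_frames + 1):
        s, P = coast(s, P, A, Q)
        if np.trace(P[:3, :3]) > max_position_variance:
            return n
    return None

dt = 0.01
A = np.eye(6)
A[0, 3] = A[1, 4] = A[2, 5] = dt
Q = np.diag([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])   # placeholder process noise
n_frames = frames_until_termination(np.zeros((6, 6)), A, Q,
                                    max_position_variance=10.0)
```

With these placeholder values the summed position variance grows by 3 per frame, so the track is dropped on the fourth consecutive frame without data.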
3.2. Data association
One simplification made in the system overview (§1.1)
was to neglect the data association problem, that is, the
assignment of observations to targets. We address the
problem by marginalizing the observation likelihood
across hidden data association variables $\mathrm{D}$, where each
$\mathrm{D}$ corresponds to a different hypothesis about how the
feature points correspond with the targets. Thus, the
model of observation likelihood from equation (1.1) becomes

$$p(Z_t \mid S_t) = \sum_{\mathrm{D}} p(Z_t, \mathrm{D} \mid S_t). \tag{3.5}$$

In fact, computing probabilities across all possible
data association hypotheses $\mathrm{D}$ across multiple time
steps would result in a combinatorial explosion of
possibilities. Among the various means of limiting the
amount of computation required by limiting the
number of hypotheses considered, we have chosen a
simple method, the NNSF data association algorithm
run on each target independently [40]. This algorithm
is sufficiently efficient to operate in real time for typical
conditions of our system. Thus, we approximate the
sum of all data association hypotheses with the single
best hypothesis $\mathrm{D}_{\mathrm{NNSF}}$, defined to be the NNSF
output for each of the $k$ independent targets

$$p(Z_t, \mathrm{D}_{\mathrm{NNSF}} \mid S_t) \approx \sum_{\mathrm{D}} p(Z_t, \mathrm{D} \mid S_t). \tag{3.6}$$

This implies that we assume hypotheses other than
$\mathrm{D}_{\mathrm{NNSF}}$ have vanishingly small probability. Errors
owing to this assumption being false could be corrected
in a later, offline pass through the data keeping track of
more data association hypotheses using other techniques.
$\mathrm{D}$ is a matrix with each column being the data association
vector for target $k$ such that $\mathrm{D} = [\mathbf{d}^1 \ldots \mathbf{d}^k \ldots \mathbf{d}^l]$.
This matrix has $n$ rows (the number of cameras) and $l$
columns (the number of active targets). The data association
vector $\mathbf{d}^k$ for target $k$ has $n$ elements of value null
or index $j$ of the feature $\mathbf{z}_{tij}$ assigned to that target. As
described below (§3.2.4), these values are computed
from the predicted location of the target and the features
returned from the cameras.
3.2.1. Preventing track merging. One well-known poten-
tial problem with multi-target tracking is the undesired
merging of target trajectories if targets begin to share
the same observations. Before implementing the follow-
ing rule, flydra would sometimes suffer from this
merging problem when tracking hummingbirds engaged
in territorial competition. In such fights, male hum-
mingbirds often fly directly at each other and come in
physical contact. To prevent the two trajectories from
merging in such cases, a single pass is made through
the data association assignments after each frame. In
the case that more than one target was assigned the
exact same subset of feature points, a comparison is
made between the observation and the predicted obser-
vation. In this case, only the target corresponding to the
closest prediction is assigned the data, and the other
target is updated without any observation. We found
this procedure to require minimal additional compu-
tational cost, while still being effective in preventing
trajectory merging.
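The single pass described above can be sketched as follows. This is an illustration under stated assumptions rather than the flydra code: the function name, the dictionary-based data layout, and the use of one scalar distance per target (observation versus predicted observation) are all hypothetical choices made for the example.

```python
def resolve_shared_observations(assignments, distances):
    """Keep shared observations only for the target with the closest prediction.

    assignments: dict mapping target id -> tuple with one entry per camera,
                 each entry a feature index or None (camera not used).
    distances:   dict mapping target id -> distance between the observation
                 and that target's predicted observation.
    Returns a new dict; losing targets get an all-None assignment, meaning
    they are updated without any observation on this frame.
    """
    groups = {}
    for target_id, assignment in assignments.items():
        if any(index is not None for index in assignment):
            groups.setdefault(tuple(assignment), []).append(target_id)

    resolved = dict(assignments)
    for assignment, target_ids in groups.items():
        if len(target_ids) > 1:  # exact same subset of feature points
            winner = min(target_ids, key=lambda tid: distances[tid])
            for target_id in target_ids:
                if target_id != winner:
                    resolved[target_id] = tuple(None for _ in assignment)
    return resolved


if __name__ == "__main__":
    assignments = {1: (0, None, 2), 2: (0, None, 2), 3: (1, 0, None)}
    distances = {1: 0.5, 2: 0.2, 3: 0.1}
    print(resolve_shared_observations(assignments, distances))
```

Because the pass only groups identical assignment tuples, its cost is linear in the number of targets, which is in keeping with the minimal overhead reported above.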
3.2.2. NNSF and generative model of image features. To
implement the NNSF algorithm, we implement a gen-
erative model of feature appearance based on the prior
estimate of target state. By predicting target position
in an incoming frame based on prior information, the
system selects two-dimensional image points as being
likely to come from the target by gating unlikely observations,
thus limiting the amount of computation.
Recall from §2 that for each time t and camera i, m
feature points are found with the jth point being
$z_j = (u_j, v_j, \alpha_j, \beta_j, \theta_j, \epsilon_j)$. The distortion-corrected image
coordinates are (u, v), while $\alpha$ is the area of the object
on the image plane measured by thresholding of the
difference image between the current and background
images, $\beta$ is an estimate of the maximum difference
within the difference image, and $\theta$ and $\epsilon$ are the slope
and eccentricity of the image feature. Each camera
may return multiple candidate points per time step,
with all points from the ith camera represented as $Z_i$,
a matrix whose columns are the individual vectors $z_j$,
such that $Z_i = [z_1 \ldots z_m]$. The purpose of the data
association algorithm is to assign each incoming point
z to an existing Kalman model $\Gamma$, to initialize a new
Kalman model, or attribute it to a false positive (a
null target). Furthermore, old Kalman models for
which no recent observations exist owing to the
target leaving the tracking volume must be deleted.
The use of such a data association algorithm allows
flydra to track multiple targets simultaneously, as in
figure 6, and, by reducing computational costs, allows
the two-dimensional feature extraction algorithm
to return many points to minimize the number of
missed detections.
3.2.3. Entry and exit of targets. How does our system
deal with existing targets losing visibility owing to leav-
ing the tracking volume, occlusion or lowered visual
contrast? What happens when new targets become vis-
ible? We treat such occurrences as part of the update
model in the Bayesian framework of §1.1. Thus, in the
terminology from that section, our motion model for
all targets $p(S_t \mid S_{t-1})$ includes the possibility of initializing
a new target or removing an existing target. This
section describes the procedure followed.
For all data points z that remained ‘unclaimed’
by the predicted locations of pre-existing targets
(see §3.2.4 below), we use an unguided hypothesis
testing algorithm. This triangulates a hypothe-
sized three-dimensional point for every possible
combination of 2, 3, ..., n cameras, for a total of
$\binom{n}{2} + \binom{n}{3} + \cdots + \binom{n}{n}$ combinations. Any three-dimensional
point with reprojection error less than an
arbitrary threshold using the greatest number of cam-
eras is then used to initialize a new Kalman filter
instance. The initial state estimate is set to that
three-dimensional position with zero velocity and a rela-
tively high error estimate. Tracking is stopped (a target
is removed) once the estimated error P exceeds a
threshold. This most commonly happens, for example,
when the target leaves the tracking area and thus
receives no observations for a given number of frames.
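To give a sense of how quickly the unguided search grows with the number of cameras, the camera subsets can be enumerated directly. The sketch below uses only the Python standard library; the function names are hypothetical, and it counts camera subsets only, so several unclaimed points per camera would multiply the number of hypotheses further.

```python
from itertools import combinations
from math import comb


def camera_subsets(camera_ids):
    """Yield every subset of 2, 3, ..., n cameras."""
    ids = list(camera_ids)
    for size in range(2, len(ids) + 1):
        yield from combinations(ids, size)


def n_camera_subsets(n):
    """Number of subsets of size 2..n, which equals 2**n - n - 1."""
    return sum(comb(n, k) for k in range(2, n + 1))


if __name__ == "__main__":
    for n in (4, 5, 11):
        print(n, n_camera_subsets(n))
```

For an 11-camera system this gives 2036 subsets per candidate point set, which is one reason such a search is best restricted to points left unclaimed by existing targets.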
3.2.4. Using incoming data. Ultimately, the purpose
of the data association step is to determine which
image feature points will be used by which target.
Given the target independence assumption, each
target uses incoming data independently. This section
outlines how the data association algorithm is used to
determine the feature points treated as the observation
for a given target.
To use the Kalman filter described in §3.1, obser-
vation vectors must be formed from the incoming
data. False positives must be separated from correct
detections, and, because multiple targets may be
tracked simultaneously, correct detections must be
associated with a particular Kalman model. At
the beginning of processing for each time step t for
the kth Kalman model, a prior estimate of target position
and error $\Gamma^k_{t|t-1} = \{\hat{s}^k_{t|t-1}, P^k_{t|t-1}\}$ is available. It
must be determined which, if any, of the m image
points from the ith camera is associated with the kth
target. Due to the real-time requirements for our
system, flydra gates incoming detections on simple
criteria before performing more computationally inten-
sive tasks.
For target k at time t, the data association function
g is
$$d_t^k = g(Z_1, \ldots, Z_n, \Gamma^k_{t|t-1}). \qquad (3.7)$$
This is a function of the image points $Z_i$ from each of the
n cameras and the prior information for target k.
Figure 6. Multiple flies tracked simultaneously. Each automatically
segmented trajectory is plotted in its own colour.
Note that the dark green and cyan trajectories probably
came from the same fly which, for a period near the 10th
second, was not tracked owing to a series of missed detections
or leaving the tracking volume (§3.2). (a) First horizontal axis
(x, in m). (b) Second horizontal axis (y, in m). (c) Vertical axis (z, in m).
All panels are plotted against time (s), from 0 to 20 s.
The assignment vector for the kth target, $d_t^k$, defines
which points from which cameras view a target. This
vector has a component for each of the n cameras,
which is either null (if that camera does not contribute)
or is the column index of $Z_i$ corresponding to the associated
point. Thus, $d_t^k$ has length n, the number of
cameras, and no camera may view the same target
more than once. Note, the k and t superscript and subscript
on d indicate the assignment vector is for target k
at time step t, whereas below (equation (3.8)), the subscript
i is used to indicate the ith component of the
vector d.
The data association function g may be written in
terms of the components of d. The ith component is
the index of the column of $Z_i$ that maximizes the likelihood
of the observation given the predicted target state and
error and is defined to be
$$d_i = \operatorname{argmax}_j \left( p(z_j \mid \Gamma^k_{t|t-1}) \right), \quad z_j \in Z_i. \qquad (3.8)$$
Our likelihood function gates detections based on
two conditions. First, the incoming detected location
$(u_j, v_j)$ must be within a threshold Euclidean distance
from the estimated target location projected on the
image. The Euclidean distance on the image plane is
$$\mathrm{dist2d} = d_{\mathrm{euclid}}\left((u_j, v_j), H(P_i X)\right), \qquad (3.9)$$
where $H(P_i X)$ finds the projected image coordinates of
X, where X is the homogeneous form of the first three
components of s, the expected three-dimensional position
of the target. The function H and camera matrix
$P_i$ are described in appendix B. The gating can be
expressed as an indicator function
$$1_{\mathrm{dist}}(u_j, v_j) = \begin{cases} 1 & \text{if } \mathrm{dist2d} < \mathrm{thresh}_{\mathrm{dist2d}} \\ 0 & \text{otherwise.} \end{cases} \qquad (3.10)$$
Second, the area of the detected object ($\alpha_j$) must be
greater than a threshold value, expressed as
$$1_{\mathrm{area}}(\alpha_j) = \begin{cases} 1 & \text{if } \alpha_j > \mathrm{thresh}_{\mathrm{area}} \\ 0 & \text{otherwise.} \end{cases} \qquad (3.11)$$
If these conditions are met, the distance of the ray
connecting the camera centre and two-dimensional
point on the image plane $(u_j, v_j)$ from the expected
three-dimensional location $\bar{a}$ is used to further determine
likelihood. We use the Mahalanobis distance,
which for a vector a with an expected value of $\bar{a}$ and
covariance matrix $\Sigma$ is
$$d_{\mathrm{mahal}}(a, \bar{a}) = \sqrt{(a - \bar{a})^{T} \Sigma^{-1} (a - \bar{a})}. \qquad (3.12)$$
Because the distance function is convex for a given $\bar{a}$
and $\Sigma$, we can solve directly for the closest point on the
ray, by setting $\bar{a}$ equal to the first three terms of $\hat{s}_{t|t-1}$
and $\Sigma$ to the upper left 3 × 3 submatrix of $P_{t|t-1}$.
Then, if the ray is a parametrized line of the form
$L(s) = s \cdot (a, b, c) + (x, y, z)$, where (a, b, c) is the direction
of the ray formed by the image point (u, v) and the
camera centre and (x, y, z) is a point on the ray, we can
find the value of s for which the distance between L(s)
and $\bar{a}$ is minimized by finding the value of s where the
derivative of $d_{\mathrm{mahal}}(L(s), \bar{a})$ is zero. If we call this
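closest point a and combine equations (3.10)–(3.12),
then our likelihood function is
$$p(z_j \mid \Gamma^k_{t|t-1}) = 1_{\mathrm{dist}}(u_j, v_j)\, 1_{\mathrm{area}}(\alpha_j)\, e^{-d_{\mathrm{mahal}}(a, \bar{a})}. \qquad (3.13)$$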
Note that, owing to the multiplication, if either of
the first two factors is zero, the third (and more compu-
tationally expensive) condition need not be evaluated.
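The gating and the closest-point calculation can be illustrated with a short NumPy sketch. It is not the flydra implementation: the function and parameter names are hypothetical, the closed-form expression for the ray parameter is our own working of the zero-derivative condition, and the form of the third factor (an exponential decay with Mahalanobis distance) is an assumption. For simplicity the ray is treated as an infinite line, so the parameter is not constrained to be positive.

```python
import numpy as np


def closest_point_on_ray(origin, direction, mean, cov):
    """Point on L(s) = origin + s * direction nearest to mean in the
    Mahalanobis sense. Setting the derivative of the squared distance to
    zero gives s = d^T C^-1 (mean - origin) / (d^T C^-1 d)."""
    o = np.asarray(origin, float)
    d = np.asarray(direction, float)
    m = np.asarray(mean, float)
    cov_inv = np.linalg.inv(np.asarray(cov, float))
    s = (d @ cov_inv @ (m - o)) / (d @ cov_inv @ d)
    return o + s * d


def mahalanobis(a, mean, cov):
    """Mahalanobis distance of a from mean under covariance cov."""
    diff = np.asarray(a, float) - np.asarray(mean, float)
    return float(np.sqrt(diff @ np.linalg.inv(np.asarray(cov, float)) @ diff))


def gated_likelihood(dist2d, area, origin, direction, mean, cov,
                     thresh_dist2d, thresh_area):
    """Product of two cheap indicator gates and a distance-based term.

    The expensive ray calculation is skipped when either gate fails.
    """
    if dist2d >= thresh_dist2d or area <= thresh_area:
        return 0.0
    a = closest_point_on_ray(origin, direction, mean, cov)
    return float(np.exp(-mahalanobis(a, mean, cov)))


if __name__ == "__main__":
    print(gated_likelihood(3.0, 20.0, (0, 0, 0), (1, 0, 0), (2, 1, 0),
                           np.eye(3), thresh_dist2d=10.0, thresh_area=5.0))
```

Checking the two indicator conditions first mirrors the short-circuiting noted above: most spurious detections are rejected before any matrix inversion is performed.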
Camera calibrations may be obtained in any way that
produces camera calibration matrices (described in
appendix B) and, optionally, parameters for a model of
the nonlinear distortions of cameras. Good calibrations
are critical for flydra because, as target visibility changes
from one subset of cameras to another subset, any misa-
lignment of the calibrations will introduce artefactual
movement in the reconstructed trajectories. Typically,
we obtain camera calibrations in a two-step process.
First, the direct linear transformation (DLT) algorithm
[41] directly estimates camera calibration matrices $P_i$
that could be used for triangulation as described in
appendix B. However, because we use only about 10
manually digitized corresponding two-/three-dimen-
sional pairs per camera, this calibration is of relatively
low precision and, as performed, ignores optical distor-
tions causing deviations from the linear simple pinhole
camera model. Therefore, an automated Multi-Camera
Self Calibration Toolbox [42] is used as a second step.
This toolbox uses inherent numerical redundancy when
multiple cameras are viewing a common set of three-
dimensional points through use of a factorization algor-
ithm [43] followed by bundle adjustment (reviewed in
§18.1 of [44]). By moving a small LED point light
source through the tracking volume (or, indeed, a freely
flying fly), hundreds of corresponding two-/three-dimen-
sional points are generated which lead to a single,
overdetermined solution which, without knowing the
three-dimensional positions of the LED, is accurate up
to a scale, rotation and translation. The camera centres,
either from the DLT algorithm or measured directly, are
then used to find the best scale, rotation and translation.
As part of the Multi-Camera Self Calibration Toolbox,
this process may be iterated with an additional step to
estimate nonlinear camera parameters such as radial dis-
tortion [42] using the Camera Calibration Toolbox of
Bouguet [45]. Alternatively, we have also implemented
the method of Prescott & McLean [46] to estimate
radial distortion parameters before use of Svoboda’s tool-
box, which we found necessary when using wide angle
lenses with significant radial distortion (e.g. 150 pixels
in some cases).
We built three different flydra systems: a five-camera,
six-computer 100 fps system for tracking fruit flies in
a 0.3 × 0.3 × 1.5 m arena (e.g. figures 1 and 9; [38],
which used the same hardware but a simpler version
of the tracking software), an 11-camera, nine-computer
60 fps system for tracking fruit flies in a large cylinder,
2 m in diameter and 0.8 m high (e.g. figure 5), and a
four-camera, five-computer 200 fps system for tracking
hummingbirds in a 1.5 × 1.5 × 3 m arena (e.g.
figure 4). Apart from the low-level camera drivers, the
same software is running on each of these systems.
We used the Python computer language to
implement flydra. In particular, the open source
MOTMOT camera software serves as the basis for the
image acquisition and real-time image processing [47].
The motmot.libcamiface documentation contains infor-
mation about what cameras are compatible, and the
motmot.realtime_image_analysis package contains
the algorithms for the two-dimensional real-time
image processing. Several other pieces of software,
most of which are open source, are instrumental to
this system: P
LIBDC1394, PTPD, GCC and UBUNTU. We used Intel
Pentium 4 and Core 2 Duo-based computers.
The quality of three-dimensional reconstructions was
verified in two ways. First, the distance between two-
dimensional points projected from a three-dimensional
estimate derived from the originally extracted two-
dimensional points is a measure of calibration precision.
For all figures shown, the mean reprojection error was
less than one pixel, and for most cameras in most
figures, was less than 0.5 pixels. Second, to determine
accuracy, we verified the three-dimensional coordinates
and distances between coordinates measured through
triangulation against values measured physically. For
two such measurements in the system shown in
figure 1, these values were within 4 per cent. So, in gen-
eral, the system appears to have high precision (low
reprojection error), but slightly worse accuracy. Because
we are using standard calibration and estimation algor-
ithms, we did not perform a more detailed groundtruth
analysis of position estimates.
We measured the latency of the three-dimensional
reconstruction by synchronizing several clocks involved
in our experimental setup and then measuring the
duration between onset of image acquisition and com-
pletion of the computation of target position. When
flydra is running, the clocks of the various computers
are synchronized to within 1 ms by PTPd, the precise
time protocol daemon, an implementation of the
IEEE 1588 clock synchronization protocol [48].
Additionally, a microcontroller (AT90USBKEY,
Atmel, USA) running custom firmware is connected
over USB to the central reconstruction computer and
is synchronized using an algorithm similar to PTPd,
allowing the precise time of frame trigger events to be
known by processes running within the computers.
Measurements were made of the latency between the
time of the hardware trigger pulse generated on the
microcontroller to start acquisition of frame t and
the moment the state vector sˆ
was computed. These
measurements were made with a central computer
being a 3 GHz Intel Core 2 Duo CPU. As shown in
figure 7, the median three-dimensional reconstruction
latency is 39 ms. Further investigation showed the
make-up of this delay. From the specifications of the
cameras and bus used, 19.5 ms is a lower bound on
the latency of transferring the image across the
IEEE 1394 bus (and could presumably be reduced
by using alternative technologies such as Gigabit
Ethernet or Camera Link). Further measurements on
this system showed that two-dimensional feature
extraction takes 6–12 ms, and that image acquisition
and two-dimensional feature extraction together
take 26–32 ms. The remainder of the latency to
three-dimensional estimation is owing to network trans-
mission, triangulation and tracking, and, most likely,
non-optimal queueing and ordering of data while pas-
sing it between these stages. Further investigation and
optimization have not been performed. Although we
have not calculated latency in a similar way in the
200 fps hummingbird tracking GigE system, the
computers are significantly faster and therefore two-
dimensional feature extraction takes less than 5 ms on
Core 2 Duo computers. Because the GigE bus is
approximately twice as fast as the 1394, the similarly
sized images arrive with half the latency. We therefore
expect median latency in this system to be about 25 ms.
A few examples serve to illustrate some of the capabili-
ties of flydra. We are actively engaged in understanding
the sensory-motor control of flight in the fruit fly
Drosophila melanogaster. Many basic features of the
effects of visual stimulation on the flight of flies
are known, and the present system allows us to charac-
terize these phenomena in substantial detail. For
example, the presence of a vertical landmark such as a
black post on a white background greatly influences
the structure of flight, causing flies to ‘fixate’, or turn
towards, the post [49]. By studying such behaviour in
free flight (e.g. figure 3), we have found that flies
approach the post until some small distance is reached,
and then often turn rapidly and fly away.
Because we have an online estimate of fly velocity
and position in sˆ, we can predict the future location of
the fly. Of course, the quality of the prediction declines
with the duration of extrapolation, but it is sufficient
Figure 7. Latency of one tracking system. A histogram of the
latency of three-dimensional reconstruction (vertical axis: number
of occurrences) was generated
after tracking 20 flies for 18 h. Median latency was 39 ms.
for many tasks, even with the latency taken by the
three-dimensional reconstruction itself. One example is
the triggering of high resolution, high speed cameras
(e.g. 1024 × 1024 pixels, 6000 frames per second as
shown in figure 8a). Such cameras typically buffer
their images to RAM and are downloaded offline. We
can construct a trigger condition based on the position
of an animal (or a recent history of position, allowing
triggering only on specific manoeuvres). Figure 8a
shows a contrast-enhancing false colour montage of a
fly making a close approach to a post before leaving.
By studying precise movements of the wings and body
in addition to larger-scale movement through the
environment, we are working to understand the neural
and biomechanical mechanisms underlying control
of flight.
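A position-based trigger condition of this kind might look like the following sketch. It is illustrative only: the state layout (x, y, z, vx, vy, vz), the constant-velocity extrapolation, the function names and the numerical values in the example are assumptions, not a description of the triggering code used with the system.

```python
import numpy as np


def predict_position(state, lead_time):
    """Extrapolate position lead_time seconds ahead at constant velocity.

    Assumed state layout: (x, y, z, vx, vy, vz) in metres and metres per second.
    """
    state = np.asarray(state, float)
    return state[:3] + lead_time * state[3:6]


def should_trigger(state, lead_time, target_xyz, radius):
    """True if the extrapolated position lies within radius of target_xyz."""
    predicted = predict_position(state, lead_time)
    offset = predicted - np.asarray(target_xyz, float)
    return bool(np.linalg.norm(offset) < radius)


if __name__ == "__main__":
    # Hypothetical fly moving at 1 m/s towards a point 5 cm ahead,
    # with a 40 ms lead time chosen to cover the reconstruction latency.
    state = (0.0, 0.0, 0.0, 1.0, 0.0, 0.0)
    print(should_trigger(state, 0.04, (0.05, 0.0, 0.0), radius=0.02))
```

Choosing the lead time close to the measured reconstruction latency lets the trigger act on where the animal is expected to be now rather than where it was when the frame was acquired; a condition on recent history could be added by testing several past states in the same way.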
An additional possibility enabled by low latency
three-dimensional state estimates is that visual stimuli
can be modulated to produce a ‘virtual reality’ environ-
ment in which the properties of the visual feedback loop
can be artificially manipulated [35,36,52,53]. In these
types of experiments, it is critical that moving visual
stimuli do not affect the tracking. For this reason, we
illuminate flies with near-IR light and use high pass fil-
ters in front of the camera lenses (e.g. R-72, Hoya
Corporation). Visual cues provided to the flies are in
the blue-green range.
By estimating the orientation of the animal, approxi-
mate reconstructions of the visual stimulus experienced
by the animal may be made. For example, to make
figure 8b, a simulated view through the compound eye
of a fly, we assumed that the fly head, and thus eyes,
was oriented tangent to the direction of travel, and
that the roll angle was fixed at zero. This information,
together with a three-dimensional model of the environment,
was used to generate the reconstruction [50,51].
Such reconstructions are informative for understanding
the nature of the challenge of navigating visually with
limited spatial resolution. Although it is known that
flies do move their head relative to their longitudinal
body axis, these movements are generally small [2],
and thus the errors resulting from the assumptions
listed above could be reduced by estimating body orien-
tation using the method described in appendix
B. Because a fly’s eyes are fixed to its head, further
reconstruction accuracy could be gained by fixing the
head relative to the body (by gluing the head to the
thorax), although the behavioural effects of such a
manipulation would need to be investigated. Neverthe-
less, owing to the substantial uncertainties involved in
estimating the pose of an insect head, investigations
based on such reconstructions would need to be
carefully validated.
Finally, because of the configurability of the
system, it is feasible to consider large-scale tracking in
naturalistic environments that would lead to greater
understanding of the natural history of flies [54] or
other animals.
At low levels of luminance contrast, when differences in
the luminance of different parts of a scene are minimal,
it is difficult or impossible to detect motion of visual
features. In flies, which judge self-motion (in part)
using visual motion detection, the exact nature of con-
trast sensitivity has been used as a tool to investigate
the fundamental mechanism of motion detection using
electrophysiological recordings. At low contrast levels,
these studies have found a quadratic relationship
between membrane potential and luminance contrast
[55–57]. This result is consistent with the
Hassenstein–Reichardt correlator model for elementary motion
detection (the HR-EMD, [58]), and these findings are
part of the evidence to support the hypothesis that
the fly visual system implements something very similar
to this mathematical model.
Despite these and other electrophysiological findings
suggesting the HR-EMD may underlie fly motion
Figure 8. Example applications for flydra. (a) By tracking flies in real time, we can trigger high speed video cameras (6000 fps) to
record interesting events for later analysis. (b) A simulated reconstruction of the visual scene viewed through the left eye of Dro-
sophila. Such visual reconstructions can be used to simulate neural activity in the visual system [50,51]. In this simulated view, a
three-dimensional model of the experimental environment, with green backlighting and a black vertical post, was used in con-
junction with an optical model of a fly eye and the fly’s pose estimate to render an image approximating that seen by a fly
during an experiment.
sensitivity, studies of flight speed regulation and other
visual behaviours in freely flying flies [59] and honey
bees [60–62] show that the free-flight behaviour of
these insects is inconsistent with a flight velocity regu-
lator based on a simple HR-EMD model. More
recently, Baird et al. [63] have shown that over a large
range of contrasts, flight velocity in honey bees is
nearly unaffected by contrast. As noted by those
authors, however, their set-up was unable to achieve
true zero contrast owing to imperfections with their
apparatus. They suggest that contrast adaptation [64]
may have been responsible for boosting the responses
to low contrasts and attenuating responses to high con-
trast. This possibility was supported by the finding that
forward velocity was better regulated at the nominal
‘zero contrast’ condition than in the presence of an
axial stripe, which may have had the effect of preventing
contrast adaptation while providing no contrast
perpendicular to the direction of flight [63].
We tested the effect of contrast on the regulation of
flight speed in Drosophila melanogaster, and the results
are shown in figure 9. In this set-up, we found that when
contrast was sufficiently high (Michelson contrast
≥ 0.16), flies regulated their speed to a mean speed of
0.15 m s$^{-1}$ with a standard deviation of 0.07 m s$^{-1}$. As contrast
was lowered, the mean speed increased, as did
variability of speed, suggesting that speed regulation
suffered owing to a loss of visual feedback. To perform
these experiments, a computer projector (DepthQ,
modified to remove colour filter wheel, Lightspeed
Design, USA) illuminated the long walls and floor of a
0.3 × 0.3 × 1.5 m arena with a regular checkerboard
pattern (5 cm) of varying contrast and fixed luminance
(2 cd m$^{-2}$). The test contrasts were cycled, with each
contrast displayed for 5 min. Twenty female flies
were released into the arena and tracked over 12 h.
Any flight segments more than 5 cm from the walls,
floor or ceiling were analysed, although for the majority
of the time flies were standing or walking on the floors
or walls of the arena. Horizontal flight speed was
measured as the first derivative of position in the XY
direction, and histograms were computed with each
frame constituting a single sample. Because the identity
of the flies could not be tracked for the duration of the
experiment, the data contain pseudo-replication—some
flies probably contributed more to the overall histogram
than others. Nevertheless, the results from three separ-
ate experimental days with 20 new flies tested on each
day were each qualitatively similar to the pooled results
shown, which include 1760 total seconds of flight
in which the tracked fly was 5 cm or greater from the
nearest arena surface and were acquired during 30
cumulative hours.
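The speed measurement described above can be sketched with NumPy. This is a hedged illustration, not the analysis code used for figure 9: the function names, the forward-difference derivative, the axis-aligned arena bounds and the choice of which arena axis is the long one are assumptions.

```python
import numpy as np


def horizontal_speed(xyz, fps):
    """Horizontal (XY) speed in m/s from positions sampled at fps.

    xyz: array of shape (N, 3) in metres. A forward difference is used,
    so N positions give N - 1 speed samples; z is ignored.
    """
    xy = np.asarray(xyz, float)[:, :2]
    step = np.diff(xy, axis=0)
    return np.linalg.norm(step, axis=1) * fps


def away_from_surfaces(xyz, arena_min, arena_max, margin=0.05):
    """Boolean mask of samples at least margin metres from every wall,
    floor and ceiling of an axis-aligned box arena."""
    xyz = np.asarray(xyz, float)
    lower = np.asarray(arena_min, float) + margin
    upper = np.asarray(arena_max, float) - margin
    return np.all((xyz >= lower) & (xyz <= upper), axis=1)


if __name__ == "__main__":
    track = np.array([[0.1000, 0.15, 0.10],
                      [0.1015, 0.15, 0.10],
                      [0.1030, 0.15, 0.12]])
    print(horizontal_speed(track, fps=100))          # speeds in m/s
    print(away_from_surfaces(track, (0, 0, 0), (1.5, 0.3, 0.3)))
```

Each frame contributes one sample to the histogram, so long flights by a single fly weigh more heavily than short ones, which is the pseudo-replication noted above.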
The primary difference between our findings on the
effect of contrast on flight speed in Drosophila melanogaster
compared with that found in honey bees by [63] is
that at low contrasts (below 0.16 Michelson contrast),
flight speed in Drosophila is faster and more variable.
This difference could be owing to several non-mutually
exclusive factors: (i) our arena may have fewer imperfec-
tions which create visual contrast; (ii) fruit flies may
have a lower absolute contrast sensitivity than honey
bees; (iii) fruit flies may have a lower contrast
sensitivity at the luminance level of the experiments;
(iv) fruit flies may have less contrast adaptation
ability; or (v) fruit flies may employ an alternate
motion detection mechanism.
Despite the difference between the present results in
fruit flies from those of honey bees at low contrast
levels, at high contrasts (above 0.16 Michelson con-
trast for Drosophila) flight speed in both species was
regulated around a constant value. This suggests
that the visual system has little trouble estimating
self-motion at these contrast values and that insects
regulate flight speed about a set point using visual
The highly automated and real-time capabilities of our
system allow unprecedented experimental opportu-
nities. We are currently investigating the object
approach and avoidance phenomenon of fruit flies illus-
trated in figure 3. We are also studying manoeuvring in
solitary and competing hummingbirds and the role of
manoeuvring in establishing dominance. One of the
opportunities made possible by the molecular biological
revolution is the availability of powerful new tools that can be used to
visualize and modify the activity of neurons and
neural circuits. By precisely quantifying high level beha-
viours, such as the object attraction/repulsion
described above, we hope to make use of these tools
to approach the question of how neurons contribute
to the process of behaviour.
The data for figure 4 were gathered in collaboration with
Douglas Altshuler. Sawyer Fuller helped with the EKF
formulation, provided helpful feedback on the manuscript
and, together with Gaby Maimon, Rosalyn Sayaman,
Martin Peek and Aza Raskin, helped with physical
construction of arenas and bug reports on the software.
Figure 9. Drosophila melanogaster maintain a lower flight
speed with lower variability as visual contrast is increased.
Mean and standard deviation of flight speeds are shown in
text, and each histogram is normalized to have equal area.
Ambient illumination reflected from one surface after being
scattered from other illuminated surfaces slightly reduced contrast
from the nominal values shown here.
Panels (nominal contrast: mean ± s.d. of horizontal speed, m s$^{-1}$):
0.01: 0.21 ± 0.11; 0.03: 0.20 ± 0.11; 0.06: 0.17 ± 0.09;
0.16: 0.15 ± 0.07; 0.40: 0.15 ± 0.08; 1.00: 0.14 ± 0.07.
Horizontal axis: horizontal speed (m s$^{-1}$), 0.1 to 0.6.
Pete Trautmann provided insight on data association, and
Pietro Perona provided helpful suggestions on the
manuscript. This work was supported by grants from the
Packard Foundation, AFOSR (FA9550-06-1-0079), ARO
(DAAD 19-03-D-0004), NIH (R01 DA022777) and NSF
(0923802) to M.H.D. and AFOSR (FA9550-10-1-0086) to
The EKF was used as described in §3.1. Here, we give
the equations using the notation of this paper. Note
that because only the observation process is nonlinear,
the process dynamics are specified with (linear)
matrix A.
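The a priori predictions of these values based on the
previous frame’s posterior estimates are
$$\hat{s}_{t|t-1} = A \hat{s}_{t-1|t-1} \qquad (A1)$$
and
$$P_{t|t-1} = A P_{t-1|t-1} A^{T} + Q. \qquad (A2)$$
To incorporate observations, a gain term K is calculated
to weight the innovation arising from the
difference between the a priori state estimate $\hat{s}_{t|t-1}$
and the observation $y_t$
$$K_t = P_{t|t-1} H_t^{T} \left(H_t P_{t|t-1} H_t^{T} + R\right)^{-1}. \qquad (A3)$$
The observation matrix, $H_t$, is defined to be the Jacobian
of the observation function h (equation (3.4))
evaluated at the expected state
$$H_t = \left.\frac{\partial h}{\partial s}\right|_{\hat{s}_{t|t-1}}. \qquad (A4)$$
The posterior estimates are then
$$\hat{s}_{t|t} = \hat{s}_{t|t-1} + K_t \left(y_t - h(\hat{s}_{t|t-1})\right) \qquad (A5)$$
and
$$P_{t|t} = (I - K_t H_t) P_{t|t-1}. \qquad (A6)$$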
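One predict and update cycle of an EKF with linear dynamics and a nonlinear observation function can be sketched as below. This is a generic textbook-style illustration, not the flydra code: the function names are hypothetical, and the Jacobian is approximated by finite differences purely to keep the example self-contained, whereas an analytic Jacobian would normally be preferred.

```python
import numpy as np


def numerical_jacobian(h, s, eps=1e-6):
    """Forward-difference Jacobian of the observation function h at s."""
    s = np.asarray(s, float)
    y0 = np.asarray(h(s), float)
    jac = np.zeros((y0.size, s.size))
    for i in range(s.size):
        step = np.zeros_like(s)
        step[i] = eps
        jac[:, i] = (np.asarray(h(s + step), float) - y0) / eps
    return jac


def ekf_step(s_post, P_post, y, h, A, Q, R):
    """One EKF cycle: linear prediction with A, nonlinear observation h."""
    # A priori prediction from the previous posterior estimates.
    s_prior = A @ s_post
    P_prior = A @ P_post @ A.T + Q
    # Observation matrix: Jacobian of h evaluated at the predicted state.
    H = numerical_jacobian(h, s_prior)
    # Gain weighting the innovation y - h(s_prior).
    K = P_prior @ H.T @ np.linalg.inv(H @ P_prior @ H.T + R)
    s_new = s_prior + K @ (np.asarray(y, float) - np.asarray(h(s_prior), float))
    P_new = (np.eye(s_prior.size) - K @ H) @ P_prior
    return s_new, P_new


if __name__ == "__main__":
    observe_first = lambda s: s[:1]          # toy observation function
    s, P = ekf_step(np.zeros(2), np.eye(2), np.array([2.0]), observe_first,
                    A=np.eye(2), Q=np.zeros((2, 2)), R=np.eye(1))
    print(s, np.diag(P))
```

In the toy example the observed component moves halfway towards the measurement and its variance halves, while the unobserved component is left unchanged.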
The basic two- to three-dimensional calculation
finds the best three-dimensional location for two or
more two-dimensional camera views of a point, and is
implemented using a linear least-squares fit of the inter-
section of n rays defined by the two-dimensional image
points and three-dimensional camera centres of each of
the n cameras [44]. After correction for radial distortion,
the image of a three-dimensional point on the ith
camera is $(u_i, v_i)$. For mathematical reasons, it is convenient
to represent this two-dimensional image point
in homogeneous coordinates
$$x_i = (r_i, s_i, t_i), \qquad (B1)$$
such that $u_i = r_i / t_i$ and $v_i = s_i / t_i$. For convenience, we
define the function H to convert from homogeneous to
Cartesian coordinates, thus
$$H(x) = (u, v) = \left(\frac{r}{t}, \frac{s}{t}\right). \qquad (B2)$$
The 3 × 4 camera calibration matrix $P_i$ models the
projection from a three-dimensional homogeneous
point $X = (X_1, X_2, X_3, X_4)$ (representing the three-dimensional
point with inhomogeneous coordinates
$(x, y, z) = (X_1/X_4, X_2/X_4, X_3/X_4)$) into the image
$$x_i = P_i X. \qquad (B3)$$
By combining the image point equation (B3) from
two or more cameras, we can solve for X using the
homogeneous linear triangulation method based on
the singular value decomposition as described in ([44],
§§12.2 and A5.3).
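The homogeneous linear triangulation can be sketched with NumPy as follows. This is a generic illustration of the standard method rather than the flydra implementation: the function names and the two synthetic camera matrices in the example are assumptions, and distortion correction is taken to have been applied already.

```python
import numpy as np


def project(P, xyz):
    """Project a 3D point with a 3 x 4 camera matrix and return (u, v)."""
    x = np.asarray(P, float) @ np.append(np.asarray(xyz, float), 1.0)
    return x[:2] / x[2]


def triangulate(points_2d, camera_matrices):
    """Homogeneous linear triangulation from two or more views.

    Each view contributes two rows, u * P[2] - P[0] and v * P[2] - P[1],
    to a matrix whose null vector (the right singular vector with the
    smallest singular value) is the homogeneous 3D point X.
    """
    rows = []
    for (u, v), P in zip(points_2d, camera_matrices):
        P = np.asarray(P, float)
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]
    return X[:3] / X[3]


if __name__ == "__main__":
    # Two synthetic cameras: one at the origin, one shifted along x.
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
    point = np.array([0.2, 0.1, 2.0])
    views = [project(P1, point), project(P2, point)]
    print(triangulate(views, [P1, P2]))
```

With noise-free synthetic projections the original point is recovered to numerical precision; with real detections the same call returns the least-squares solution, and the reprojection error of that solution can serve as a quality measure.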
A similar approach can be used for reconstructing
the orientation of the longitudinal axis of an insect
or bird. Briefly, a line is fit to this axis in each
two-dimensional image (using u, v and the slope $\theta$
from §2)
and, together with the camera centre, is used to rep-
resent a plane in three-dimensional space. The best-fit
line of intersection of the n planes is then found with
a similar singular value decomposition algorithm
([44], §12.7).
1 Srinivasan, M. V., Zhang, S. W., Lehrer, M. & Collett,
T. S. 1996 Honeybee navigation en route to the goal:
visual flight control and odometry. J. Exp. Biol. 199,
2 Schilstra, C. & Hateren, J. H. 1999 Blowfly flight and optic
flow. I. Thorax kinematics and flight dynamics. J. Exp.
Biol. 202, 1481–1490.
3 Tammero, L. F. & Dickinson, M. H. 2002 The influence of
visual landscape on the free flight behavior of the fruit fly
Drosophila melanogaster. J. Exp. Biol. 205, 327–343.
4 Kern, R., van Hateren, J. H., Michaelis, C., Lindemann, J.
P. & Egelhaaf, M. 2005 Function of a fly motion-sensitive
neuron matches eye movements during free flight.
PLoS Biol. 3, 1130–1138. (doi:10.1371/journal.pbio.
5 Land, M. F. & Collett, T. S. 1974 Chasing behavior of
houseflies (Fannia canicularis): a description and analysis.
J. Comp. Physiol. 89, 331–357. (doi:10.1007/
6 Buelthoff, H., Poggio, T. & Wehrhahn, C. 1980 3-D analysis
of the flight trajectories of flies (Drosophila
melanogaster). Z. Nat. 35c, 811–815.
7 Wehrhahn, C., Poggio, T. & Bülthoff, H. 1982 Tracking
and chasing in houseflies (musca). Biol. Cybernet. 45,
123–130. (doi:10.1007/BF00335239)
8 Wagner, H. 1986 Flight performance and visual control of
the flight of the free-flying housefly (Musca domestica). II.
Pursuit of targets. Phil. Trans. R. Soc. Lond. B 312,
553–579. (doi:10.1098/rstb.1986.0018)
9 Mizutani, A., Chahl, J. S. & Srinivasan, M. V. 2003 Insect
behaviour: motion camouflage in dragonflies. Nature 423,
604. (doi:10.1038/423604a)
10 Frye, M. A., Tarsitano, M. & Dickinson, M. H. 2003 Odor
localization requires visual feedback during free flight in
Drosophila melanogaster. J. Exp. Biol. 206, 843–855.
11 Budick, S. A. & Dickinson, M. H. 2006 Free-flight responses of Drosophila melanogaster to attractive odors. J. Exp. Biol. 209, 3001-3017. (doi:10.1242/jeb.02305)
12 David, C. T. 1978 Relationship between body angle and flight speed in free-flying Drosophila. Physiol. Entomol. 3, 191-195. (doi:10.1111/j.1365-3032.1978.tb00148.x)
13 Fry, S. N., Sayaman, R. & Dickinson, M. H. 2003 The aerodynamics of free-flight maneuvers in Drosophila. Science 300, 495-498. (doi:10.1126/science.1081944)
14 Collett, T. S. & Land, M. F. 1975 Visual control of flight behaviour in the hoverfly Syritta pipiens L. J. Comp. Physiol. A Neuroethol. Sens. Neural Behav. Physiol. 99, 1-66.
15 Collett, T. S. & Land, M. F. 1978 How hoverflies compute interception courses. J. Comp. Physiol. A Neuroethol. Sens. Neural Behav. Physiol. 125, 191-204. (doi:10.
16 Wehrhahn, C. 1979 Sex-specific differences in the chasing behaviour of houseflies (Musca). Biol. Cybernet. 32, 239-241. (doi:10.1007/BF00337647)
17 Dahmen, H.-J. & Zeil, J. 1984 Recording and reconstructing three-dimensional trajectories: a versatile method for the field biologist. Proc. R. Soc. Lond. B 222, 107-113.
18 Srinivasan, M. V., Zhang, S. W., Chahl, J. S., Barth, E. & Venkatesh, S. 2000 How honeybees make grazing landings on flat surfaces. Biol. Cybernet. 83, 171-183. (doi:10.
19 Hedrick, T. L. & Biewener, A. A. 2007 Low speed maneuvering flight of the rose-breasted cockatoo (Eolophus roseicapillus). I. Kinematic and neuromuscular control of turning. J. Exp. Biol. 210, 1897-1911. (doi:10.1242/jeb.
20 Tian, X., Diaz, J. I., Middleton, K., Galvao, R., Israeli, E., Roemer, A., Sullivan, A., Song, A., Swartz, S. & Breuer, K. 2006 Direct measurements of the kinematics and dynamics of bat flight. Bioinspiration & Biomimetics 1, S10-S18. (doi:10.1088/1748-3182/1/4/S02)
21 Khan, Z., Balch, T. & Dellaert, F. 2005 MCMC-based particle filtering for tracking a variable number of interacting targets. IEEE Trans. Pattern Anal. Mach. Intell. 27,
22 Khan, Z., Balch, T. & Dellaert, F. 2006 MCMC data association and sparse factorization updating for real time multitarget tracking with merged and multiple measurements. IEEE Trans. Pattern Anal. Mach. Intell. 28, 1960-1972. (doi:10.1109/TPAMI.2006.247)
23 Branson, K., Robie, A., Bender, J., Perona, P. & Dickinson, M. H. 2009 High-throughput ethomics in large groups of Drosophila. Nat. Methods 6, 451-457.
24 Klein, D. J. 2008 Coordinated control and estimation for multi-agent systems: theory and practice, ch. 6. Tracking Multiple Fish Robots Using Underwater Cameras, PhD thesis, University of Washington.
25 Qu, W., Schonfeld, D. & Mohamed, M. 2007 Distributed Bayesian multiple-target tracking in crowded environments using multiple collaborative cameras. EURASIP J. Adv. Signal Process. Article ID 38373. (doi:10.1155/
26 Qu, W., Schonfeld, D. & Mohamed, M. 2007 Real-time interactively distributed multi-object tracking using a magnetic-inertia potential model. IEEE Trans. Multimedia (TMM) 9, 511-519. (doi:10.1109/TMM.2006.886266)
27 Ballerini, M. et al. 2008 Empirical investigation of starling flocks: a benchmark study in collective animal behaviour. Anim. Behav. 76, 201-215. (doi:10.1016/j.anbehav.2008.
28 Cavagna, A., Cimarelli, A., Giardina, I., Orlandi, A., Parisi, G., Procaccini, A., Santagati, R. & Stefanini, F. 2008 New statistical tools for analyzing the structure of animal groups. Math. Biosci. 214, 32-37. (doi:10.1016/j.
29 Cavagna, A., Giardina, I., Orlandi, A., Parisi, G. & Procaccini, A. 2008 The starflag handbook on collective animal behaviour: 2. Three-dimensional analysis. Anim. Behav. 76, 237-248. (
30 Cavagna, A., Giardina, I., Orlandi, A., Parisi, G., Procaccini, A., Viale, M. & Zdravkovic, V. 2008 The starflag handbook on collective animal behaviour: 1. Empirical methods. Anim. Behav. 76, 217-236. (doi:10.1016/j.anbe-
31 Wu, H., Zhao, Q., Zou, D. & Chen, Y. 2009 Acquiring 3d motion trajectories of large numbers of swarming animals. IEEE Int. Conf. on Computer Vision (ICCV) Workshop on Video Oriented Object and Event Classification.
32 Zou, D., Zhao, Q., Wu, H. & Chen, Y. 2009 Reconstructing 3d motion trajectories of particle swarms by global correspondence selection. In Int. Conf. on Computer Vision (ICCV 09). Workshop on Video Oriented Object and Event Classification, pp. 1578-1585.
33 Bomphrey, R. J., Walker, S. M. & Taylor, G. K. 2009 The typical flight performance of blowflies: measuring the normal performance envelope of Calliphora vicina using a novel corner-cube arena. PLoS ONE 4, e7852. (doi:10.
34 Marden, J. H., Wolf, M. R. & Weber, K. E. 1997 Aerial performance of Drosophila melanogaster from populations selected for upwind flight ability. J. Exp. Biol. 200,
35 Fry, S. N., Muller, P., Baumann, H. J., Straw, A. D., Bichsel, M. & Robert, D. 2004 Context-dependent stimulus presentation to freely moving animals in 3d. J. Neurosci. Methods 135, 149-157. ( doi:10.1016/j.jneu-
36 Fry, S. N., Rohrseitz, N., Straw, A. D. & Dickinson, M. H. 2008 Trackfly-virtual reality for a behavioral system analysis in free-flying fruit flies. J. Neurosci. Methods 171, 110-117. (doi:10.1016/j.jneumeth.2008.02.016)
37 Grover, D., Tower, J. & Tavare, S. 2008 O fly, where art thou? J. R. Soc. Interface 5, 1181-1191. (doi:10.1098/
38 Maimon, G., Straw, A. D. & Dickinson, M. H. 2008 A simple vision-based algorithm for decision making in flying Drosophila. Curr. Biol. 18, 464-470. (doi:10.1016/
39 Piccardi, M. 2004 Background subtraction techniques: a review. In Proc. of IEEE SMC 2004 Int. Conf. on Systems, Man and Cybernetics. The Hague, The Netherlands, October 2004.
40 Bar-Shalom, Y. & Fortmann, T. E. 1988 Tracking and data association. New York, NY: Academic Press.
41 Abdel-Aziz, Y. I. & Karara, H. M. 1971 Direct linear transformation from comparator coordinates into object space coordinates. Proc. American Society of Photogrammetry Symp. on Close-Range Photogrammetry, Urbana, IL, 26-29 January 1971, pp. 1-18. Falls Church, VA: American Society of Photogrammetry.
42 Svoboda, T., Martinec, D. & Pajdla, T. 2005 A convenient multi-camera self-calibration for virtual environments. PRESENCE: Teleoperators and Virtual Environments 14, 407-422. (doi:10.1162/105474605774785325)
43 Sturm, P. & Triggs, B. 1996 A factorization based algorithm for multi-image projective structure and motion. In Proc. 4th Eur. Conf. on Computer Vision, April 1996, Cambridge, UK, pp. 709-720. Springer.
44 Hartley, R. I. & Zisserman, A. 2003 Multiple view geometry in computer vision, 2nd edn. Cambridge, UK: Cambridge University Press.
45 Bouguet, J. 2010 Camera calibration toolbox. See http://
46 Prescott, B. & McLean, G. F. 1997 Line-based correction of radial lens distortion. Graph. Models Image Process. 59, 39-47. (doi:10.1006/gmip.1996.0407)
47 Straw, A. D. & Dickinson, M. H. 2009 Motmot, an open-source toolkit for realtime video acquisition and analysis. Source Code Biol. Med. 4, 5, 1-20. (doi:10.1186/1751-
48 Correll, K., Barendt, N. & Branicky, M. 2005 Design considerations for software-only implementations of the IEEE 1588 precision time protocol. Proc. Conf. on IEEE-1588 Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems, NIST and IEEE, Winterthur, Switzerland, 10-12 October 2005.
49 Kennedy, J. S. 1939 Visual responses of flying mosquitoes. Proc. Zool. Soc. Lond. 109, 221-242.
50 Neumann, T. R. 2002 Modeling insect compound eyes: space-variant spherical vision. In Proc. of the 2nd Int. Workshop on Biologically Motivated Computer Vision (eds H. H. Bülthoff, S.-W. Lee, T. Poggio & C. Wallraven), LNCS 2525, pp. 360-367. Berlin, Germany:
51 Dickson, W. B., Straw, A. D. & Dickinson, M. H. 2008 Integrative model of Drosophila flight. AIAA J. 46, 2150-2165. (doi:10.2514/1.29862)
52 Strauss, R., Schuster, S. & Götz, K. G. 1997 Processing of artificial visual feedback in the walking fruit fly Drosophila melanogaster. J. Exp. Biol. 200, 1281-1296.
53 Straw, A. D. 2008 Vision egg: an open-source library for realtime visual stimulus generation. Front. Neuroinformatics 2, 1-10. (doi:10.3389/neuro.11.004.2008)
54 Stamps, J., Buechner, M., Alexander, K., Davis, J. & Zuniga, N. 2005 Genotypic differences in space use and movement patterns in Drosophila melanogaster. Anim. Behav. 70, 609-618. (doi:10.1016/j.anbehav.2004.11.018)
55 Dvorak, D., Srinivasan, M. V. & French, A. S. 1980 The contrast sensitivity of fly movement-detecting neurons. Vision Res. 20, 397-407. (doi:10.1016/
56 Srinivasan, M. V. & Dvorak, D. R. 1980 Spatial processing of visual information in the movement-detecting pathway of the fly: characteristics and functional significance. J. Comp. Physiol. A 140, 1-23. (doi:10.1007/BF00613743)
57 Egelhaaf, M. & Borst, A. 1989 Transient and steady-state response properties of movement detectors. J. Opt. Soc. Am. A 6, 116-127. (doi:10.1364/JOSAA.6.000116)
58 Hassenstein, B. & Reichardt, W. 1956 Systemtheoretische analyse der zeit-, reihenfolgen- und vorzeichenauswertung bei der bewegungsperzeption des rüsselkäfers Chlorophanus. Z. Nat. 11b, 513-524.
59 David, C. T. 1982 Compensation for height in the control of groundspeed by Drosophila in a new, barber's pole wind tunnel. J. Comp. Physiol. 147, 485-493. (doi:10.1007/
60 Srinivasan, M. V., Lehrer, M., Kirchner, W. H. & Zhang, S. W. 1991 Range perception through apparent image speed in freely flying honeybees. Vis. Neurosci. 6, 519-535. (doi:10.1017/S095252380000136X)
61 Srinivasan, M. V., Zhang, S. W. & Chandrashekara, K. 1993 Evidence for two distinct movement detecting mechanisms in insect vision. Naturwissenschaften 80, 38-41.
62 Si, A., Srinivasan, M. V. & Zhang, S. 2003 Honeybee navigation: properties of the visually driven ‘odometer’. J. Exp. Biol. 206, 1265-1273. (doi:10.1242/jeb.00236)
63 Baird, E., Srinivasan, M. V., Zhang, S. & Cowling, A. 2005 Visual control of flight speed in honeybees. J. Exp. Biol. 208, 3895-3905. (doi:10.1242/jeb.01818)
64 Harris, R. A., O’Carroll, D. C. & Laughlin, S. B. 2000 Contrast gain reduction in fly motion adaptation. Neuron 28, 595-606. (doi:10.1016/S0896-6273(00)00136-7)