Multicamera sport player tracking with Bayesian
estimation of measurements
Jesús Martínez-del-Rincón
Elías Herrero-Jaraba
J. Raúl Gómez
Carlos Orrite-Uruñuela
Carlos Medrano
Miguel A. Montañés-Laborda
Aragón Institute for Engineering Research
Computer Vision Laboratory
Maria de Luna 1
50018, Zaragoza, Spain
Abstract. We propose a complete application capable of tracking
multiple objects in an environment monitored by multiple cameras.
The system has been specially developed to be applied to sport games,
and it has been evaluated in a real association-football stadium. Each
target is tracked using a local importance-sampling particle filter in each
camera, but the final estimation is made by combining information
from the other cameras using a modified unscented Kalman filter algo-
rithm. Multicamera integration enables us to compensate for bad
measurements or occlusions in some cameras thanks to the other
views it offers. The final algorithm results in a more accurate system
with a lower failure rate.
© 2009 Society of Photo-Optical Instrumentation Engineers. [DOI: 10.1117/1.3114605]
Subject terms: computer vision; tracking; image analysis; image processing; machine vision; pattern recognition.
Paper 080816R received Oct. 15, 2008; revised manuscript received Feb. 5,
2009; accepted for publication Feb. 6, 2009; published online Apr. 10, 2009.
1 Introduction
Professional sport is an extremely competitive world. Mass
media coverage has contributed to the popularity of sport,
increasing its importance in current society due to the
money and fame that it generates. In this environment, in
which any assistance is welcome, video-based applications
have proliferated.
Video-based approaches have shown themselves to be
an important tool in analysis of athletic performance, espe-
cially in sport teams, where many hours of manual work
are required to analyze tactics and collaborative strategies.
Computer-vision-based methods can provide help in auto-
mating many of those tasks.
Sport analysis can be considered as a classic human ac-
tivity recognition problem with several distinctive con-
straints and requirements, for instance, a fixed number of
targets. The huge amount of interaction between players during a complete match makes it difficult to track all the players with a single camera. In this
work, we have concentrated our efforts on developing a
tracking application capable of managing multiple sensors
in order to track multiple objects simultaneously.
The reliability of a tracking system depends on the quality of the observations that we are able to provide to it, and it is very sensitive to incorrect and inaccurate measurements.
Thus, our proposal is based on a double-tracking strategy.
First, a tracking filter in the image is responsible for ex-
tracting a robust and temporally coherent observation. The
particle filter (PF) [1-3] has been chosen due to its capacity to model non-Gaussian, nonlinear distributions, which leads to a competitive advantage in the image plane, where the perspective effect and complex interactions produce a nonlinear environment. Therefore, PFs are applied to each camera, which permits us to send an indication of reliability to the second tracker, as well as permitting more accurate and more robust measurements.
Second, multiview tracking on the ground plane, simpli-
fied thanks to the previous image stage, manages the track-
ing of complex interactions between simple objects and
achieves multisensor conjugation. A variant of the Kalman filter, called the unscented Kalman tracker [4,5], will be able to
model the player position and velocity, since a unimodal
distribution is a good approximation of a football player in
a zenithal view. The algorithm receives multiple measure-
ments from each camera, applies a data association method
that establishes the correspondences between measure-
ments and trackers, and estimates the new positions of all
players.
This double-tracking strategy also implies a feedback
between the two trackers but prevents a categorical deci-
sion from being taken over the whole image. If a hypoth-
esis is rejected on the plane of the field using multicamera
information, the decision will be corrected. This feedback
procedure assures that the final decision will be made using
all the available information coming from all the cameras.
Thus, each camera can correct its estimation with the infor-
mation of the other cameras. Furthermore, the feedback
monitors the entry and exit of players in the scene.
The rest of this paper is organized as follows: in Sec. 2
we briefly explain the system architecture. In Sec. 3 we
introduce the processing software and the general scheme
of the system. In Sec. 4 we present the single-view stage
and the monocular tracking. Section 5 shows the tracking algorithm based on the unscented Kalman filter (UKF) and multicamera integration. The feedback procedure is explained in Sec. 6. Results are presented in Sec. 7, and conclusions in Sec. 8.
1.1 Previous Works
Numerous publications about multi-target tracking applied
to sports analysis exist, with many different applications,
such as highlight extraction, tactical analysis, or improve-
ment of athletic performance.
Highlight extraction is a useful application for TV broadcasting and automatic labeling of recorded video databases. Working with a dynamic and uncalibrated camera is an essential requirement. References 6-8 stand out in this field of application.
Nevertheless, most papers are focused on tracking players, working either with mobile cameras or with multiple cameras. In the first option, the PF has been shown to be the most adequate tracking algorithm, due to its advantage in managing multiple targets. However, maintaining multimodality is not an easy task, and tracking failures appear in player occlusions, because two players of the same team are virtually identical. To deal with this problem, Choi and Seo [9] create a synthesized image of the occlusion out of template images saved previously. In Ref. 10, Vermaak et al. apply a mixture of PFs, which maintains the multimodality. By generating human models with different techniques, like a neural network [11] or a point distribution model (PDM) [12,13], results are improved over those of the traditional color likelihood.
Tracking players with a single mobile camera can work for short sequences, but it requires complex procedures (for the entry and exit of players, and for maintaining consistent labels) during a complete match. Furthermore, the motion dynamics in the image are affected by the perspective effect. For this reason, a shared reference, such as a plan, is commonly used to simplify the problem. Examples are shown in Refs. 14 and 15, where a dynamic calibration is recalculated at each frame.
A more widespread option consists in the use of several static cameras with a shared reference. An exhaustive approach to the whole problem is tackled in Refs. 16 and 17, which were written in the framework of the INMOVE project. A double-tracking strategy (image and plane) is employed to track all the players, although the simple algorithm applied in the image cannot handle all the complex situations that appear during a football match.
Football is a collaborative sport, and the ball is an essential element of the game. However, its tracking is complex because its movements are three-dimensional, in contrast with the movements of the players, who are always in the same plane. Several papers tackle this problem [18,19], but it remains unsolved as yet. Ball tracking is outside the scope of this paper and will be considered in future works.
2 System Architecture
The system input is composed of video data from static
analog cameras with overlapping fields of view at an
association-football stadium. The cameras are positioned
around the stadium at 25 m in height (on the roof). A compromise between cost and good performance (with regard to resolution, overlapping, etc.) has been sought. A detailed scheme of the locations can be viewed in Fig. 1. All cameras have been calibrated to a common ground-plane coordinate system using a homographic transformation [20].

Fig. 1 Camera locations and fields of view on the football pitch.
The video flow is captured by a video capture card con-
nected to a PC. Each card has four independent channels
with four digital signal processors, which allows us to
record video at a rate of 14 frames per second, using a
hardware MPEG-4 video compression codec. In order to
record the information provided by eight cameras, two
computers have been installed. The system has been de-
signed to exploit synchronization between cameras. The
four cameras connected to each computer are synchronized
by the video capture card; the synchronization between the
two recording servers is obtained using an ad hoc wireless network (Wi-Fi), which synchronizes the system clocks of the two computers.
Recorded videos are sent to the processing server. This
server comprises eight single-camera processing comput-
ers, a multicamera integration server, which receives data
from each camera processor and gives the final estimation,
and a GigaLAN switch, which links all computers and en-
ables message transfer. The multicamera integration server
directs and controls the process. It is the device in charge of
maintaining the synchronization between cameras as well
as obtaining the positions of all players at each time step.
The infrastructure and its connection can be seen in Fig. 2.
Each computer associated with each camera processes
its corresponding frame, obtaining a set of features and a
first hypothetical position of targets in the image. When it
has finished, the data are transmitted to the server. A mul-
ticamera server awaits the responses of all cameras and
updates the state estimation of the player on the pitch. Fi-
nally, the server sends a message to permit the camera pro-
cessor to continue with the next frame. However, this mes-
sage is not a simple acknowledgement, since it has
feedback information in order to correct failures in the im-
age. A more detailed explanation of this process is given in
Sec. 5.
The algorithms were compiled in Visual C++ and programmed with a multithread philosophy.
3 General Scheme of Processing
The processing algorithm (Fig. 3) can be divided into two
main parts: the single-view processing stage, which is ap-
plied to each camera independently, and the multiview pro-
cessing stage, which integrates the previous results and
gives us the final estimation state.
Each target is tracked using a local importance-sampling
PF in each camera, but the final estimation is made by
combining information from the other cameras using a
modified UKF algorithm. Multicamera integration enables us to compensate for bad measurements or occlusions in some cameras thanks to the other views. The final algorithm results in a more accurate system with a lower failure rate.
The purpose of the single-view processing stage consists
in extracting a set of hypotheses to be considered in the
multiview tracking process. By applying a tracking algo-
rithm, we obtain a robust-to-occlusion method, which ex-
tracts plausible hypotheses. The PF is a good choice due to
its advantages in multitarget tracking. Color, movement, and three-dimensional information are used to determine the likelihood of the particles, and to weight them.
Results of this stage are modeled as Gaussians, whose
mean is the position of each player and whose covariance
represents the reliability of this position. Both are sent to
the multiview tracking algorithm as measurements. In this way, we obtain more robust and more accurate measurements with an extra feature: their reliability.

Fig. 2 System architecture.

Fig. 3 General scheme of the framework based on a double-tracking strategy.
For the multiview tracking process, unscented Kalman
trackers are used. First, a data association algorithm estab-
lishes the correspondences between measurements and
trackers. Then, the UKF algorithm combines all the mea-
surements corresponding to each tracker, taking into ac-
count their reliability.
The output from this process is the set of 20 player positions per time step (excluding both goalkeepers). The system also indicates the category (team) of each player, and maintains the correct number of players in each category. Furthermore, although the identification of individual players is not possible given the resolution of the input data (only the team is recognized), a label with the name of each player is assigned in the first frame, and it will be tracked during the match.
Finally, the output is sent to each camera to correct failures in the image tracking (the feedback procedure previously mentioned).
It may seem strange to use two different tracking algorithms, a PF on the image and a UKF on the ground, especially since the multimodal probability distribution obtained with the PF is approximated with a Gaussian distribution for use by the UKF, with an evident loss of information. However, this assumption is logical if we think of a player observed from a zenithal view, where no occlusions should exist, and a Gaussian is an acceptable representation of a football player.
The approximation performed helps to solve the data
association problems. Measurement assignment can be very
complex when dealing with multiple cameras and multiple
trackers, as in this case. When two trackers approach each
other, it is necessary to use a measurement assignment al-
gorithm that prevents the two trackers from ending up, after
the occlusion, following the measures generated by only
one of them.
There are several methods to deal with data association (such as the nearest neighbor [21] and the auction algorithm [22]), but none of them is able to cope with such a large number of measurements (as many as there are particles in the PF) for each camera and for each tracker within an acceptable processing time. That is the main reason for having two different tracking algorithms: the PF yields very good tracking in the images, while representing the measurements by their Gaussian approximations allows us to perform quick and reliable data association of the measurements with the trackers by applying a UKF.
4 Single-View Processing Stage
As mentioned in previous sections, targets in the image are
tracked using PFs. A PF enables us to manage all players in
the same camera using a single filter, but this method re-
quires a lot of particles to be effective, and consequently a
large computational cost. On the other hand, we can apply
a PF to track each player. Although a multiple-target tracker
based on multiple inefficient single-target trackers can re-
duce the overall performance, we maintain successful
tracking of each player by introducing multicamera infor-
mation by means of the feedback procedure. Without this
information, even a unique PF is not able to ensure correct
results, as is shown in Ref. 10. The strength of the proposed
tracker is demonstrated in Sec. 7.
First, we introduce some notions connected with the PF.
After that, we explain our modification in each stage of the
PF: calculation of the prior probability and the posterior probability, and estimation of the final state. The detailed algorithm is presented in Algorithm 1 (Sec. 5).
4.1 Importance-Sampling Particle Filter
In the late 1960s, the idea of sequential importance-sampling techniques, like the Monte Carlo filter, was conceived. Although interesting, these approaches suffered from degeneration. Gordon et al. [1] came up with the idea of resampling to cope with this intrinsic degeneration. This improvement led to the current form of the particle filter [2,3,10,23-25], which is one of the most extensively applied methodologies for tracking multimodal, nonlinear, and non-Gaussian models.
The PF is a hypothesis tracker that approximates the filtered posterior distribution $p(x_t \mid z_{1:t})$ over the target state $x_t$, given all previous observations $z_{1:t}$ up to time $t$, by a set of weighted hypotheses called particles. Following this numerical approach, particles $\{x_{t-1}^i, \Pi_{t-1}^i\}_{i=1}^{N}$ are distributed according to the target density. The filtering distribution is given by

$$p(x_t \mid z_{1:t}) \propto p(z_t \mid x_t) \sum_{i=1}^{N} \Pi_{t-1}^{i} \, p(x_t \mid x_{t-1}^{i}), \qquad (1)$$

where $\Pi_{t-1}^{i}$ is the weight for the particle $x_{t-1}^{i}$. Here $N$ samples $x_t^i$ are drawn from a proposal distribution $q(x_t^i \mid x_{t-1}^i, z_t)$, also called the importance density. The weights of the new particles can be computed as [2]

$$\Pi_t^i \propto \Pi_{t-1}^{i} \, \frac{p(z_t \mid x_t^i)\, p(x_t^i \mid x_{t-1}^i)}{q(x_t^i \mid x_{t-1}^i, z_t)}. \qquad (2)$$
The choice of an adequate proposal distribution is one of the most critical design issues. Bootstrap particle filters use the state transition prior $q(x_t^i \mid x_{t-1}^i, z_t) = p(x_t^i \mid x_{t-1}^i)$ as the proposal distribution to place the particles, since it is intuitive and can be easily implemented [1,3,26]. Since this probability does not take into account the most recent observation $z_t$, all the particles may have low likelihood, contributing to an erroneous posterior estimation. Therefore, the exclusive use of the transition probability as the proposal distribution makes the algorithm prone to be distracted by background clutter.
An alternative approach is sampling the observation to improve the efficiency of the PF. The idea consists in applying for the proposal distribution a function $g(x_t^i)$ that introduces information about the current observation. Nevertheless, since particles are distributed using this proposal instead of the transition probability, they are not generated from previous particles, and therefore the particles cannot be paired with previous ones to compute the probability $p(x_t^i \mid x_{t-1}^i)$. So an additional function $f_t(x_t^i)$ is applied to maintain the temporal coherence. This term represents the probability of appearance, which is obtained using the weighted mean over all possible transitions:

$$\Pi_t^i \propto \frac{p(z_t \mid x_t^i)\, f_t(x_t^i)}{g_t(x_t^i)}, \qquad (3)$$

where

$$f_t(x_t^i) = \sum_{j=1}^{N} \Pi_{t-1}^{j} \, p(x_t^i \mid x_{t-1}^{j}). \qquad (4)$$

Note that this modification implies that the dynamical model is not only used, but also evaluated. Although the sum in Eq. (4) increases the complexity of the algorithm from $O(N)$ to $O(N^2)$, the real effect is negligible in practice because the computational cost of this stage (for practical values of $N$) is dwarfed by the time expended on the observation process, for instance. Moreover, a more efficient particle set implies a smaller number of particles and therefore a smaller complexity growth in this stage.
This approach compensates for the particle distribution and ensures that the importance function does not distort the calculation of the posterior probability $p(x_t \mid z_t)$. In this way, any proposal distribution can be chosen if the number of particles, $N$, is large enough.
In practice, the proposal distribution derives from a rough observation process and might produce errors and imperfect estimations. In this regard, it is recommended to add a percentage of particles generated by conventional sampling. Our approach is similar to Ref. 27, where an auxiliary tracker generates a more accurate proposal distribution using secondary observations. On the other hand, our proposal uses the same features as those of the likelihood function to improve the sampling process, for two important reasons: finding auxiliary observations is not always an easy task, and these observations themselves need a good proposal distribution if they are to help the main tracker. Otherwise, they will produce worse estimations.
By making proposals that have high conditional likelihood, we reduce the cost of sampling many times for particles that have very low likelihood, improving the statistical efficiency of the sampling procedure, so that we can reduce the number of particles substantially.
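As a concrete illustration of Eqs. (3) and (4), the following minimal sketch (in Python with NumPy; the variable names and the one-dimensional Gaussian transition model are illustrative assumptions, not the system's actual implementation) computes the appearance term f_t and the corrected importance weights:

import numpy as np

def gaussian_pdf(x, means, var):
    # 1-D Gaussian density; stands in for the transition model p(x_t | x_{t-1}).
    return np.exp(-0.5 * (x - means) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def importance_weights(x_t, x_prev, w_prev, likelihood, g_vals, trans_var=1.0):
    # x_t       : (N,) particles drawn from the importance function g
    # x_prev    : (M,) particles at time t-1, with normalized weights w_prev
    # likelihood: (N,) observation likelihood p(z_t | x_t^i)
    # g_vals    : (N,) value of g_t at each new particle
    # Eq. (4): f_t(x_t^i) = sum_j w_{t-1}^j p(x_t^i | x_{t-1}^j), an O(N*M) sum.
    f = np.array([np.sum(w_prev * gaussian_pdf(xi, x_prev, trans_var))
                  for xi in x_t])
    # Eq. (3): Pi_t^i proportional to p(z|x) f_t(x) / g_t(x), then normalize.
    w = likelihood * f / np.maximum(g_vals, 1e-12)
    return w / np.sum(w)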
4.2 Prior Probability
In this paper, we present a novel approach to the particle
filter algorithm based on a priori measurement in the color
space, which reduces the costs of evaluating hypotheses
with a very low likelihood. As is shown in Ref. 28, these
hypotheses (or particles) appear due to poor prior density
estimation, particularly when the object motion between
frames is badly modeled by the dynamic model. The im-
provement of the statistical efficiency of the sampling al-
lows substantial reduction of the number of particles.
Color-based image features are used for the proposal
distribution. To model color density functions we have two
possibilities: parametric and nonparametric. The major ad-
vantage of nonparametric approaches is their flexibility to
represent complicated densities effectively. However, they
suffer from large memory requirements and computational
complexity. On the contrary, parametric approaches sim-
plify the target feature modeling and reduce drastically the
computational cost needed to process them, but the as-
sumptions introduced limit their application to simple dis-
tributions. Parametric models reduce the color distribution
of the target to easily parameterizable functions like a
Gaussian function or a mixture of several Gaussians.
There are many parametric density representations; for instance, Ref. 29 uses Gaussian mixture models (GMMs) in hue-saturation space to model the target's color distributions. Its authors propose an adaptive learning algorithm to update these color models over time. In Refs. 29 and 30, the authors suggest GMMs, but their method requires knowledge of the number of components. In Ref. 31, the authors propose a density approximation methodology where the density is represented by a weighted sum of Gaussians, whose number, weights, means, and covariances are determined automatically.
Unlike parametric models, a nonparametric density esti-
mator is a more general approach that does not assume any
specific shape for the density function, so it is able to rep-
resent very complicated densities effectively. It estimates
the density function directly from the data without any as-
sumptions about the underlying distribution. As mentioned
in Ref. 31, this avoids having to choose a model and estimate its distribution parameters. The histogram is the simplest nonparametric density estimator.
In video-surveillance applications, a color histogram is
an adequate solution to characterize each target, due to its
simple initialization. However, sport provides an environment with plain colors that are known in advance. So, we use a GMM to generate each object's distribution, which we explain in the next section.
The importance function $g(x_t^i)$ is introduced in the algorithm as a mask. The mask (see Fig. 4) is obtained by extracting the main colors of the object (each color is a Gaussian) and detecting them in the full image or in the zone surrounding the estimated position (extracted with the last mean state), whose dimensions depend on the position and speed variances. Only hypotheses located in this mask are evaluated and used in the estimation. Moreover, we also apply this information to obtain a fast estimation of the posterior density.
4.3 Posterior Probability
Once prediction has been calculated, the multiple hypoth-
eses generated must be evaluated using an adequate likeli-
hood function. In order to obtain a likelihood function that
will weight each particle, we combine multiple visual cues: color, movement, and height difference.

Fig. 4 Color probability image and prior mask generation algorithm.

Color is the most
discriminative cue, which differentiates between object
and background, but also between different kinds of ob-
jects. Movement cannot distinguish between objects, but is
very useful for eliminating background areas with the same
colors as the objects, for instance, lines that define the field.
Height measurement helps us to compensate the perspec-
tive effect and thus to obtain a better estimation of the real
object size.
4.3.1 Color probability
A new input frame is projected onto the target probability
space to generate the color probability image 共CPI兲.We
generate a CPI for each kind of object to be tracked. In our
particular case, we need two CPIs, one for each team, but
more CPIs could be generated: two CPIs for each team if
clothes have complex colors, or even one for each person in
a video surveillance application. The values of the CPI pix-
els are taken from the target’s GMMs and used as classifi-
ers for each pixel in the input picture. The probability as-
signed to each pixel is given by the distance 共using as
metric the Mahalanobis distance兲 to the nearest Gaussian.
In order to generate the GMM, we extract pixels corresponding to both teams and the background (field). Each pixel is projected to the HSV space to reduce the influence of changing illumination and shadows. Using the expectation-maximization (EM) algorithm, we obtain several Gaussians, which represent the whole color space of the environment. The number of Gaussians is chosen according to the necessities. In our case we need two Gaussian models for the two teams, two for the field, and two for the halos surrounding the players due to compression, which can be considered as a mixture of colors. Once we have decided the total number of Gaussians and we have applied the EM algorithm, we have to choose the Gaussian corresponding to each team for generating both CPIs. Many validation indices can be applied [32,33] for this purpose. However, the color space is complex and contains arbitrary shapes, so we can conclude that no index really guarantees the correct choice.
We have developed a simple index, called the motion validation index (MVI), based on the use of motion, to validate the color segmentation. This index is calculated for each Gaussian as the number of pixels that are classified into that cluster ($I_{\mathrm{Gauss}}$) and have been detected as movement ($I_{\mathrm{mov}}$), divided by the total number of pixels that are classified into that cluster. Those Gaussians that obtain the highest values are selected for the corresponding football teams:

$$\mathrm{MVI}_{\mathrm{Gauss}}(i) = \frac{\sum_x \sum_y I_{\mathrm{Gauss}}(x,y,i)\, I_{\mathrm{mov}}(x,y)}{\sum_x \sum_y I_{\mathrm{Gauss}}(x,y,i)} \quad \forall i \in G, \qquad (5)$$

where $G$ is the whole GMM, and $x$ and $y$ are the columns and rows of the images to be processed.
The MVI index also checks that an adequate number of
Gaussians have been selected to model the whole color
space. If the sum of the MVI indices corresponding to the
Gaussians associated with each football team decreases
when we split or merge the Gaussians, we will stop and
select the optimum number of Gaussians.
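As an illustration of Eq. (5), the sketch below (Python with NumPy; the array names are illustrative assumptions) computes the MVI of every Gaussian from a pixel-wise cluster-label image and the binary motion mask I_mov:

import numpy as np

def motion_validation_index(labels, motion_mask, num_gaussians):
    # MVI(i) = (# pixels of cluster i detected as motion) / (# pixels of cluster i)
    # labels      : (H, W) integer image, each pixel -> index of its nearest Gaussian
    # motion_mask : (H, W) binary motion detection image I_mov
    mvi = np.zeros(num_gaussians)
    for i in range(num_gaussians):
        in_cluster = (labels == i)
        total = in_cluster.sum()
        if total > 0:
            mvi[i] = (in_cluster & (motion_mask > 0)).sum() / total
    return mvi

# The Gaussians with the highest MVI values are selected as the two team models.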
In order to compensate illumination changes, the means and covariances of the Gaussians classified into background and halo are updated, using an online updating algorithm [34]. Gaussians associated with each team are not updated, due to the risk of introducing noise.
Once the CPIs have been calculated (Fig. 4), the probability associated with each particle can be extracted. The state of every particle defines the width $W$ and height $H$ of the target as well as the position of the center of the target, $(x_0, y_0)$. Using these parameters, a rectangular kernel, with the same dimensions as the predicted target, enables us to obtain the posterior probability.

This rectangular kernel can be computed with a large advantage in terms of computational cost: rectangular filters can be computed very efficiently using the integral image method proposed by Viola and Jones [35]. The integral image at location $(x_0, y_0)$ contains the sum of the pixels above and to the left of $(x_0, y_0)$, inclusive. Using the integral image, any rectangular sum can be computed as a sum of four points. For instance, the sum within the inner rectangle of Fig. 5 can be computed as $A + D - (B + C)$. So, the new particle set is updated following the expressions

$$\tilde{\Pi}_t^c(n) = D + A - C - B, \qquad \Pi_t^c(n) = \frac{\tilde{\Pi}_t^c(n)}{\sum_n \tilde{\Pi}_t^c(n)}. \qquad (6)$$

This computation is fast, thanks to the integral image previously calculated.
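A minimal sketch of this computation (Python with NumPy; the function names are ours, and the rectangle is assumed to lie inside the image): the integral image is built once per frame and per CPI, after which the color weight of each particle reduces to the four-point lookup of Eq. (6):

import numpy as np

def integral_image(img):
    # Cumulative sum over rows and columns, as in Viola and Jones [35].
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x0, y0, w, h):
    # Sum of pixels inside the rectangle centered at (x0, y0): A + D - B - C.
    x1, y1 = int(x0 - w // 2), int(y0 - h // 2)   # top-left corner
    x2, y2 = int(x0 + w // 2), int(y0 + h // 2)   # bottom-right corner
    A = ii[y1 - 1, x1 - 1] if (x1 > 0 and y1 > 0) else 0.0
    B = ii[y1 - 1, x2] if y1 > 0 else 0.0
    C = ii[y2, x1 - 1] if x1 > 0 else 0.0
    D = ii[y2, x2]
    return A + D - B - C

# Unnormalized color weight of particle n over the CPI, per Eq. (6):
#   w_c[n] = rect_sum(integral_image(cpi), x0, y0, W, H)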
4.3.2 Motion probability
The motion probability is a weight assigned to each particle, based on the number of pixels of motion that it contains. Its values are computed using the same kernel used for calculating the color probability, but in this case over the motion detection image $I_{\mathrm{mov}}$:

$$\tilde{\Pi}_t^m(n) = \frac{1}{HW} \sum_{x = x_0 - W/2}^{x_0 + W/2} \ \sum_{y = y_0 - H/2}^{y_0 + H/2} I_{\mathrm{mov}}(x, y), \qquad \Pi_t^m(n) = \frac{\tilde{\Pi}_t^m(n)}{\sum_n \tilde{\Pi}_t^m(n)}. \qquad (7)$$

Fig. 5 Convolution kernel to enhance target candidates on the integral image. Here W and H stand for the predicted target width and height.
4.3.3 Height difference probability
The last weight is defined as the difference between the height $H$ stored in the state vector of the particle and the height that the particle must have because of its new position (given by the propagation stage), $\mathrm{Height}(x_0, y_0)$. This height is given by an algorithm, called the height estimator, which is explained in Sec. 5 (Algorithm 2). We then have

$$\tilde{\Pi}_t^h(n) = [H - \mathrm{Height}(x_0, y_0)]^{\alpha}, \qquad \Pi_t^h(n) = \frac{\tilde{\Pi}_t^h(n)}{\sum_n \tilde{\Pi}_t^h(n)}, \qquad (8)$$

where $\alpha$ is a constant that fixes the discriminative power of the weight and depends on the field of view of the camera and the lens parameters.
4.3.4 Estimation of the new state
Finally, once the last a posteriori feature has been obtained, all these features must be combined in order to estimate the new state of the object to be tracked. The score for each candidate is calculated using the multiplication rule, that is, assuming independence between features:

$$\Pi_t(n) = \Pi_t^c(n) \cdot \Pi_t^m(n) \cdot \Pi_t^h(n). \qquad (9)$$

The particle set corresponding to this candidate is used to estimate the new state by means of a weighted average:

$$E[S_t] = \sum_{n=1}^{N} \Pi_t(n) \, \frac{f_t(x_t)}{g_t(x_t)} \, x_t(n), \qquad (10)$$

where a correction factor due to the importance sampling must be included in the particle weights.

However, this estimation is independent for each camera, and it can fail if there are occlusions in the image. Using multicamera information, this effect is reduced, as we explain in Sec. 6. A global view of the algorithm can be seen in Fig. 6 and in Algorithm 1 (Sec. 5).
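The combination of Eqs. (9) and (10) can be sketched as follows (Python with NumPy; it assumes the normalized per-cue weights of Eqs. (6)-(8) and the importance-sampling terms of Sec. 4.1 have already been evaluated for each particle):

import numpy as np

def estimate_state(particles, w_color, w_motion, w_height, f_vals, g_vals):
    # particles : (N, d) particle state vectors
    # w_*       : (N,) normalized per-cue weights of Eqs. (6)-(8)
    # f_vals, g_vals : importance-sampling correction terms f_t, g_t of Sec. 4.1
    pi = w_color * w_motion * w_height            # Eq. (9): independence assumption
    w = pi * f_vals / np.maximum(g_vals, 1e-12)   # importance-sampling correction
    w /= w.sum()
    return w @ particles                          # Eq. (10): weighted mean state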
5 Multisensor Data Fusion
Once all players have been detected in each camera, we
project all the measurements onto the plan in order to have
a shared reference space. Each hypothesis is projected, con-
verting it to a single point, which is on the floor. When
these transformations have been made, multicamera track-
ing is applied. For this purpose, we use the UKF as the tracking algorithm.
The UKF [4,5] is a very popular tracking algorithm, which provides a way of processing nonlinear but Gaussian models. We propose a modified UKF to extend its application to multisensor scenes, thus improving the global result. The combination of several independent sensors increases the precision and robustness of our tracking system, since it makes it possible to solve difficult situations, such as occlusions or noise. Multicamera tracking systems have been exhaustively proposed in previous literature, for instance, in Refs. 36-38.
First of all, in order to integrate measurements from dif-
ferent cameras, a shared reference is needed. Our reference
is a plan of the football field, and a homographic matrix for
each camera has been calculated. Thus, we can transform
points from the image to the coordinate system of the plan.
Besides, we need a height estimator, which also implies a
previous calibration. Before explaining the tracking algo-
rithm, both procedures are shown in depth in the following
subsections.
Algorithm 1 (Single-view stage algorithm). Given a particle set $\{x_{t-1}^n, \Pi_{t-1}^n\}_{n=1}^{N}$, which represents the posterior probability $p(x_{t-1} \mid z_{t-1})$ at time $t-1$:

1. Generate $N_1$ new samples from the importance function, $x_t^n \sim g(x_t)$. Particles are distributed in the intersection between an ellipse surrounding $S_t$, with radius proportional to the position variance, and the color model mask.
2. Propagate $N_2$ samples from the samples generated in the previous time step (resampling), $x_t^n \sim p(x_t \mid x_{t-1}^n)$. We obtain a number $N_t = N_1 + N_2$ of particles, with $N_t \neq N_{t-1}$ in general.
3. Weight the particles with $\Pi_t^n \sim p(z_t \mid x_t^n) f_t(x_t^n) / g_t(x_t^n)$ using Eq. (9), and normalize them: $\sum_{n=1}^{N_t} \Pi_t^n = 1$.
4. Reweight the particles to include the effect of the feedback procedure: $w_t^n = \Pi_t^n \cdot \rho(x_t^n, x_t^{\mathrm{plane}})$, where $\rho$ is a distance function between the target in the image and on the plan.
5. Estimate the new position of the state, $E[S_t] = \sum_{n=1}^{N_t} w_t^n x_t^n$.
5.1 Homographic Transformation
Taking a minimum of four points, we can establish the correspondence between the floor plane in the image and the plan of the field [20]. With this transformation, we can locate the position of players on the plan, assuming that the player is in contact with the floor. One homographic matrix must be calculated for each camera.
Fig. 6 Single-view tracking algorithm combining importance sampling, multi-cue conjugation, and multicamera feedback estimation.
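A minimal sketch of this projection (Python with NumPy; the 3x3 matrix H is assumed to have been estimated offline from the four or more point correspondences): a player's foot point in image coordinates is mapped to plan coordinates through homogeneous coordinates:

import numpy as np

def image_to_plan(H, x, y):
    # Apply the 3x3 homography H to the image point (x, y) and
    # dehomogenize to obtain the player position on the plan.
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]

# Example (hypothetical values): with H_cam1 calibrated for one camera,
# the foot point of a tracked player maps to plan coordinates:
#   x_plan, y_plan = image_to_plan(H_cam1, 412.0, 253.0)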
5.2 Height Estimator
As we mentioned in Sec. 4.3.3, we require a tool for obtaining the number of pixels that represents the average height of a person. Due to the perspective effect, this number depends on the location of the person in the image. We are able to ascertain this through scene calibration [39].
First, we have to obtain a plane perpendicular to the floor, which has been defined with the four points used to calculate the homographic matrices. For this purpose, we have to extract four points of the walls, goal posts, or any other vertical structure. Knowing both planes, we can calculate three vanishing points (two horizontal and one vertical). These vanishing points permit us to project any point onto the coordinate axes, and elevate it vertically by a number of pixels corresponding to the height of the target at this point of the image. This number of pixels has been determined by a reference height in the image, that is, by marking two points in the image, which are projected onto the coordinate axes, and giving the real height in meters. We use the goal posts as reference, but in the cameras in whose field of view there are no vertical structures, we can use a person of standard reference height, assuming everybody has the same height.
This methodology is able to return the head point (given the foot point), or to return the height (given both points). We use the first application to determine the number of pixels corresponding to a player's height at a given location.
The height estimator algorithm is shown in Algorithm 2 and Fig. 7. Homogeneous coordinates are utilized to simplify the mathematical operations.
Algorithm 2 (Height estimation algorithm).
• Calculate the slope of the line H2-H1: $v = (y_{H2} - y_{H1})/(x_{H2} - x_{H1})$.
• Estimate point H2, supposing the reference height to be 1.8 m, and using the proportion between the number of pixels and the reference height in meters.
• Calculate the line $L_{H2-PFII} = H2 \times PFII$ in homogeneous coordinates, that is, $H2 = [x_{H2} \ y_{H2} \ 1]$.
• Calculate the line $L_{A-PFI} = A \times PFI$.
• Calculate the axis $Y$: $L_{H1-PFII} = H1 \times PFII$, where H1 is the coordinate origin.
• Calculate the point $A' = L_{A-PFI} \times L_{H1-PFII}$.
• Calculate the line $L_{A'-PFIII} = A' \times PFIII$.
• Calculate the point $B' = L_{A'-PFIII} \times L_{H2-PFII}$.
• Calculate the line $L_{A-PFIII} = A \times PFIII$.
• Calculate the line $L_{B'-PFI} = B' \times PFI$.
• Calculate the point $B = L_{B'-PFI} \times L_{A-PFIII}$.
• Calculate the height: $\mathrm{Height}(x_A, y_A) = [(x_A - x_B)^2 + (y_A - y_B)^2]^{1/2}$.
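In homogeneous coordinates, every step of Algorithm 2 is a cross product: the line through two points is their cross product, and the intersection of two lines is likewise their cross product. The following compact sketch (Python with NumPy) assumes the vanishing points PFI, PFII (horizontal), and PFIII (vertical) and the reference segment H1-H2 are given as homogeneous 3-vectors:

import numpy as np

def line(p, q):
    # Line through two homogeneous points = cross product.
    return np.cross(p, q)

def intersect(l, m):
    # Intersection of two homogeneous lines = cross product.
    return np.cross(l, m)

def dehom(p):
    return p[:2] / p[2]

def height_at(A, H1, H2, PFI, PFII, PFIII):
    # Number of pixels spanned by the reference height at foot point A.
    A_p = intersect(line(A, PFI), line(H1, PFII))      # project A onto axis Y
    B_p = intersect(line(A_p, PFIII), line(H2, PFII))  # elevate along the vertical
    B = intersect(line(B_p, PFI), line(A, PFIII))      # transfer back over A
    return np.linalg.norm(dehom(A) - dehom(B))         # Euclidean pixel height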
5.3 Multicamera UKF
In this subsection we present a modification of the UKF that combines several measurements, provided by different cameras, for each tracked object. Due to the use of several sensors as measurement sources, we call the algorithm the multicamera unscented Kalman filter (MCUKF) [40]. This algorithm can be extended to different types of sensors [41].
The filter can be divided into three stages: state predic-
tion, measurement prediction, and estimation. The scheme
of this process is shown in Fig. 8. An external matching
process must be used in order to make correspondences
between trackers and measurements.
Fig. 7 Left and middle: estimated height for several players in the image and their projection onto the height reference. Right: height estimation procedure.

Fig. 8 Multicamera unscented Kalman filter algorithm for multitarget tracking.

5.3.1 State prediction
In the prediction stage, the tracker is initialized with the last estimation made in the previous time step. Hence, knowing the previous state $\hat{x}_{k-1}$, with $e \times 1$ components, and its covariance $\hat{P}_{k-1}$, with $e \times e$ components, both the extended covariance $\hat{P}_{k-1}^a$ and state $\hat{x}_{k-1}^a$ can be obtained by concatenating the previous parameters and the state noise $v_k$. This is a simplification of the UKF, in which state and measurement noises are used. The measurement noise will be used in the measurement prediction stage:

$$\hat{x}_{k-1}^a = [\hat{x}_{k-1}^T \ \ E\{v_k\}]^T \quad \text{with} \quad E\{v_k\} = [0 \ 0 \ \cdots \ 0]^T, \qquad (11)$$

$$\hat{P}_{k-1}^a = \begin{bmatrix} \hat{P}_{k-1} & 0 \\ 0 & R^v \end{bmatrix}, \qquad (12)$$

where $R^v$ is the state noise matrix. The number of sigma points is $2n + 1$, where $n$ is the length of the extended state. Following the classical equations of the unscented transform, the first sigma point corresponds to the previous frame estimation, the next $n$ sigma points are the previous estimation plus each column of the square root of the scaled covariance matrix, and the last $n$ points are the previous estimation minus the same columns:

$$X_{k-1}^a = \big[\hat{x}_{k-1}^a \ \ \hat{x}_{k-1}^a + [(n + \lambda)\hat{P}_{k-1}^a]^{1/2} \ \ \hat{x}_{k-1}^a - [(n + \lambda)\hat{P}_{k-1}^a]^{1/2}\big]. \qquad (13)$$

The components of these sigma points can be divided into two groups: those derived from the state, $X_{k-1}^x$, and those derived from the state noise, $X_{k-1}^v$.

The weights assigned to each sigma point are calculated in the same way as in the unscented transformation. Therefore, the 0th weight differs between the mean weight $W_i^{(m)}$ and the covariance weight $W_i^{(c)}$:

$$W_0^{(m)} = \frac{\lambda}{n + \lambda}, \qquad
W_0^{(c)} = \frac{\lambda}{n + \lambda} + (1 - \alpha^2 + \beta), \qquad
W_i^{(m)} = W_i^{(c)} = \frac{1}{2(n + \lambda)}, \quad i = 1, 2, \ldots, 2n, \qquad (14)$$

where $\lambda = \alpha^2(n + k) - n$ is a scale parameter. The constant $\alpha$ determines the spread of the sigma points around the mean $\bar{x}$; it has a small positive value (usually $1 > \alpha > 10^{-4}$). The constant $k$ is a secondary scaling parameter, usually with values between 0 and $3 - n$. Finally, $\beta$ is used to incorporate previous knowledge of the distribution of $x$.

In order to predict the sigma points at the $k$'th instant, knowing the previous points, the transition matrix $F$ is first required. Using a constant-velocity model, the state prediction is

$$\hat{x}_{k|k-1} = F \cdot \hat{x}_{k-1}. \qquad (15)$$

The sigma points at the next instant of time are

$$X_{k|k-1}^x = F \cdot X_{k-1}^x + X_{k-1}^v. \qquad (16)$$

With these points and their weights, the predicted mean and covariance are given by

$$\hat{x}_{k|k-1} = \sum_{i=0}^{2n} W_i^{(m)} X_{i,k|k-1}^x, \qquad
\hat{P}_{k|k-1} = \sum_{i=0}^{2n} W_i^{(c)} \, [X_{i,k|k-1}^x - \hat{x}_{k|k-1}][X_{i,k|k-1}^x - \hat{x}_{k|k-1}]^T. \qquad (17)$$

A graphic representation of this process is depicted in Fig. 9.
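A minimal sketch of this prediction step (Python with NumPy): a textbook unscented transform under a constant-velocity model. For clarity it uses the non-augmented form, folding the additive state noise into the covariance rather than building the augmented state of Eqs. (11) and (12); the parameter values are illustrative assumptions:

import numpy as np

def sigma_points(x, P, alpha=1e-3, kappa=0.0, beta=2.0):
    # Eqs. (13)-(14): 2n+1 sigma points around mean x, plus their weights.
    n = x.size
    lam = alpha ** 2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * P)      # columns of the matrix square root
    pts = np.column_stack([x, x[:, None] + S, x[:, None] - S])
    wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1.0 - alpha ** 2 + beta)
    return pts, wm, wc

def predict(x, P, F):
    # Eqs. (15)-(17): propagate the sigma points through the constant-velocity
    # transition F and recompose the predicted mean and covariance.
    pts, wm, wc = sigma_points(x, P)
    Xp = F @ pts
    x_pred = Xp @ wm
    d = Xp - x_pred[:, None]
    P_pred = (wc * d) @ d.T
    return x_pred, P_pred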
Fig. 9 State prediction of mean and sigma points.

5.3.2 Measurement prediction stage
The second contribution to the original UKF consists in obtaining the measurement prediction taking into account the measurements and the measurement noises of each camera. In the measurement prediction stage, the first step consists in calculating the state predictions and the measurement-tracker matching. Next, using the predictions and the measurement noise, both the extended state $\hat{x}_k'^a$ and the covariance $\hat{P}_k'^a$ can be developed. The concatenated measurement noise matrix $R^n$ is built from the measurement noise matrices of each camera, $R_i^n$, with $r \times r$ components, where $r$ is the dimensionality of the measurement and $i = 1, 2, \ldots, S$, with $S$ the number of cameras:

$$\hat{x}_k'^a = [\hat{x}_{k|k-1} \ 0 \ 0 \ \cdots]^T, \qquad
\hat{P}_k'^a = \begin{bmatrix} \hat{P}_{k|k-1} & 0 \\ 0 & R^n \end{bmatrix}, \qquad
R^n = \begin{bmatrix} R_1^n & & \\ & R_2^n & \\ & & \ddots \end{bmatrix}. \qquad (18)$$

In such a case, a tracker with $S$ measurements, from $S$ different cameras, will have a state vector with $n' = rS + e$ components, and $2(rS + e) + 1$ sigma points:

$$X_{k-1}'^a = \big[\hat{x}_k'^a \ \ \hat{x}_k'^a + [(n' + \lambda')\hat{P}_k'^a]^{1/2} \ \ \hat{x}_k'^a - [(n' + \lambda')\hat{P}_k'^a]^{1/2}\big], \qquad (19)$$

and we have

$$W_0'^{(m)} = \frac{\lambda'}{n' + \lambda'}, \qquad
W_0'^{(c)} = \frac{\lambda'}{n' + \lambda'} + (1 - \alpha'^2 + \beta'), \qquad
W_i'^{(m)} = W_i'^{(c)} = \frac{1}{2(n' + \lambda')}, \quad i = 1, 2, \ldots, 2n'. \qquad (20)$$

The sigma point components can be divided into components derived from the state, $X_k'^x$, and components derived from the measurement noise, $X_k'^n$, which can be separated according to the measurement $1, 2, \ldots, S$: $X_k'^{n(1)}, X_k'^{n(2)}, \ldots, X_k'^{n(S)}$.

The measurement matrix $H$, which makes the transformation from state coordinates to measurement coordinates, is applied to obtain the measurement prediction sigma points from the state sigma points, adding the noise components of each camera:

$$Y_{k|k-1}^{(s)} = H \cdot X_k'^x + X_k'^{n(s)}, \quad s = 1, 2, \ldots, S. \qquad (21)$$

Using these $S$ sets of sigma points, we can obtain, for each measurement, the measurement prediction, the covariance prediction, and the measurement-state cross covariance:

$$\hat{y}_{k|k-1}^{(s)} = \sum_{i=0}^{2n'} W_i'^{(m)} Y_{i,k|k-1}^{(s)}, \qquad (22)$$

$$P_{\hat{y}_k \hat{y}_k}^{(s)} = \sum_{i=0}^{2n'} W_i'^{(c)} \, [Y_{i,k|k-1}^{(s)} - \hat{y}_{k|k-1}^{(s)}][Y_{i,k|k-1}^{(s)} - \hat{y}_{k|k-1}^{(s)}]^T, \qquad (23)$$

$$P_{\hat{x}_k \hat{y}_k}^{(s)} = \sum_{i=0}^{2n'} W_i'^{(c)} \, [X_{i,k|k-1}'^{x} - \hat{x}_{k|k-1}][Y_{i,k|k-1}^{(s)} - \hat{y}_{k|k-1}^{(s)}]^T. \qquad (24)$$

These equations are depicted in Fig. 10, with a two-camera example.
5.3.3 Estimation stage
First, a Kalman gain for each measurement associated with the tracker is calculated:

$$K_k^{(s)} = P_{\hat{x}_k \hat{y}_k}^{(s)} \, \big(P_{\hat{y}_k \hat{y}_k}^{(s)}\big)^{-1}. \qquad (25)$$

After that, measurements from different cameras must be combined to obtain a shared estimation (Fig. 11). The weights $\omega^{(s)}$ play the role of combining the different measurements according to their reliability. The weights are considered to be composed of two factors: the distance to the prediction, and the covariance of each measurement. The two are combined according to the importance given to each one.

The set of weights is normalized, since the sum of the weights must be 1. The mean and covariance estimates are

$$\hat{x}_k = \hat{x}_{k|k-1} + \sum_{s=1}^{S} \omega^{(s)} K_k^{(s)} \big(y_k^{(s)} - \hat{y}_{k|k-1}^{(s)}\big), \qquad (26)$$

$$\hat{P}_k = \hat{P}_{k|k-1} - \sum_{s=1}^{S} \omega^{(s)} K_k^{(s)} P_{\hat{y}_k \hat{y}_k}^{(s)} \big(K_k^{(s)}\big)^T. \qquad (27)$$
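A sketch of this estimation stage (Python with NumPy; the per-camera covariances are assumed to come from Eqs. (22)-(24), and the normalized reliability weights, denoted omega as in the equations above, are given):

import numpy as np

def mcukf_update(x_pred, P_pred, meas, y_pred, Pyy, Pxy, omega):
    # Combine S camera measurements into one estimate.
    # meas, y_pred : lists of S measurement / predicted-measurement vectors
    # Pyy, Pxy     : lists of S innovation and cross covariances (Eqs. 23-24)
    # omega        : S normalized reliability weights (they sum to 1)
    x_est = x_pred.copy()
    P_est = P_pred.copy()
    for s in range(len(meas)):
        K = Pxy[s] @ np.linalg.inv(Pyy[s])                # Eq. (25)
        x_est += omega[s] * K @ (meas[s] - y_pred[s])     # Eq. (26)
        P_est -= omega[s] * K @ Pyy[s] @ K.T              # Eq. (27)
    return x_est, P_est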
5.3.4 Comparison of MCUKF versus UKF
We can observe the benefits of the MCUKF in Fig. 12, where a comparison between our multicamera tracking algorithm and two independent UKFs is established. The comparison is made between the MCUKF estimation and the mean of two independent estimations obtained with two single-camera UKFs. For low state noise $R^v$ or measurement noise $R^n$, the two algorithms give similar results. However, the MCUKF obtains a lower mean squared error when the noise level rises.
5.4 Data Association
In the previous subsections, we have described a tracking
algorithm for a single object. In a real application, however,
there are several targets moving in the same region and
interacting with each other. Thus, we need to assign to each tracker the corresponding measurement, chosen from among all the measurements and the possible existing distracters. This assignment is made in the matching stage.
Given the difficulties that multiple-target tracking involves, several techniques have been proposed in the literature. Data association between observations and trackers is the problem to be solved, and coalescence (meaning that the tracker associates more than one trajectory with some targets while losing track of others) is the most challenging difficulty, especially when similar targets move close together or present occlusions. Moreover, cluttered scenarios produce false alarms, which introduce confusion into the association algorithm. A popular approach that tries to cope with this problem is the Markov chain Monte Carlo (MCMC) [42] PF. This method explicitly models the interaction of targets by removing measurements that fall within a certain radius of other target predictions.

Fig. 10 Hypothetical sigma point distribution for measurements from two different cameras. These points adjust their positions to represent the measurement covariance placed on the prediction.

Fig. 11 Graphic scheme of the estimation.
Another emerging technique to deal with multitarget tracking is based on a random-set perspective, which was detailed by Goodman et al. [43]. Using this mathematical tool, Mahler [44] derived the probability hypothesis density (PHD) filter. This filter avoids the necessity of data association and allows one to tackle a number of targets and observations that can be created and destroyed during the tracking. Sequential Monte Carlo techniques for random-set-based filters, including the PHD filter proposed in Ref. 45 and the Gaussian solution to the PHD filter proposed in Ref. 46, have led to many multitarget tracking applications. A higher-order random-set multitarget filter called the cardinalized PHD was proposed in Ref. 47, and closed-form solutions were published in Ref. 48. Lately, Pham et al. have extended their application to multisensor scenarios [49,50].
Fortunately, although in general multitarget tracking
deals with state estimation of a variable number of targets,
assumptions about a constant or known number of targets
can be used to constrain the problem. This is exactly our
case, since the number of football players is constant, and
this fact allows us to simplify the problem and choose a
much simpler and more efficient approach. We base our
multitarget proposal on a set of independent filters, utilizing
the multisensor redundancy and the feedback procedure for
handling the coalescence among targets. Although less general, independent filters have some advantages: they lead to a linear growth in the computation; they do not require a careful design of the proposal density, which plays an important role in the success of previous methods (such as MCMC); and the inclusion of new trackers in the image (when a player enters the visual field of a camera) does not affect the approximation accuracy of the existing trackers.
For those reasons, the goal of this stage consists in se-
lecting the measurements for each independent tracker.
This is the only stage that cannot be computed sequentially
or independently for each tracker. The proposed algorithm
is based on the theory of data association presented in Ref. 51. In this context, we have chosen a modified version of the nearest-neighbor algorithm [21]. The main difference from the original algorithm is a simple approach that limits the number of combinations in order to reduce the computing time, although it yields a suboptimal result.
In our algorithm, two conditions must be fulfilled for
assigning the measurements: Not more than one measure-
ment of each camera can be assigned to each tracker, and a
measurement cannot be assigned to two different trackers.
In other words, an algorithm to avoid conflicts must be
applied. A set of possible measurements is assigned to each
tracker, using as criterion the Mahalanobis distance be-
tween trackers and measurements. The Mahalanobis dis-
tance is a metric function based on correlations between
variables, which allows us to express the similarity between
those variables, taking into account the distribution of
samples. The equation for the Mahalanobis distance is as follows:

$$D = [(x - \bar{x})^T C^{-1} (x - \bar{x})]^{1/2}, \qquad (27')$$

where $\bar{x}$ is the mean and $C$ the covariance matrix.
With an appropriate threshold, it is possible to know whether there is any measurement suitable for a tracker at each prediction. We consider all the measurements that are in a zone corresponding to 95% around the mean, that is, all the measurements whose squared distances are lower than 5.99 (chi-squared test with two degrees of freedom).
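For illustration, this gating can be written as follows (Python with NumPy; a sketch, where 5.99 is the 95% point of a chi-squared distribution with two degrees of freedom, matching 2-D plan positions):

import numpy as np

def gate_measurements(pred, cov, measurements, thresh=5.99):
    # Return the indices of measurements whose squared Mahalanobis
    # distance to the predicted position falls inside the 95% gate.
    cov_inv = np.linalg.inv(cov)
    selected = []
    for j, z in enumerate(measurements):
        d = z - pred
        if d @ cov_inv @ d < thresh:   # squared Mahalanobis distance, Eq. (27')
            selected.append(j)
    return selected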
Then, a matrix of possibilities, $\Pi$, is composed. If each tracker $i$ has a number $m_i$ of possible measurements $M_1^i, M_2^i, \ldots, M_{m_i}^i$, the first row of the matrix is initialized with the measurements of the first tracker:

$$\Pi_1 = [M_1^1 \ M_2^1 \ \cdots \ M_{m_1}^1]. \qquad (28)$$

In the second iteration, the matrix $\Pi_1$ is replicated $m_2$ times, and a new row with the measurements assigned to the second tracker is appended to obtain all possible combinations:

$$\Pi_2 = \left[\begin{array}{ccc|ccc|c|ccc}
M_1^1 & \cdots & M_{m_1}^1 & M_1^1 & \cdots & M_{m_1}^1 & \cdots & M_1^1 & \cdots & M_{m_1}^1 \\
M_1^2 & \cdots & M_1^2 & M_2^2 & \cdots & M_2^2 & \cdots & M_{m_2}^2 & \cdots & M_{m_2}^2
\end{array}\right]
= \left[\begin{array}{cccc}
\Pi_1 & \Pi_1 & \cdots & \Pi_1 \\
M_1^2 & M_2^2 & \cdots & M_{m_2}^2
\end{array}\right]. \qquad (29)$$

Fig. 12 Mean squared state error of MCUKF and two independent camera UKFs for different levels of state noise. A similar graph is obtained for different levels of measurement noise.

Before starting each iteration, incompatible combinations must be deleted, because two trackers cannot catch the same measurement. Thus, the matrix $\Pi_2'$ is obtained. The process continues in the same way until all the trackers have been processed:

$$\Pi_i = \left[\begin{array}{cccc}
\Pi_{i-1}' & \Pi_{i-1}' & \cdots & \Pi_{i-1}' \\
M_1^i & M_2^i & \cdots & M_{m_i}^i
\end{array}\right]. \qquad (30)$$

After applying the algorithm, the resulting matrix $\Pi$ has a number of rows equal to the number of trackers, and a number of columns equal to the number of valid measurement-tracker combinations. The combination whose sum of distances is minimum is then selected. A graphic example is depicted in Fig. 13.
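A compact sketch of this enumeration (in Python; the measurement identifiers are illustrative, and None encodes the "no measurement" option discussed below): partial combinations are extended tracker by tracker, and incompatible extensions, where two trackers would share a measurement, are discarded on the fly, following Eqs. (28)-(30):

def build_combinations(candidates):
    # candidates[i] is the list of gated measurements of tracker i.
    # Returns all conflict-free assignments, one tuple per valid
    # measurement-tracker combination (None = tracker takes no measurement).
    combos = [()]                       # Pi_0: one empty partial combination
    for meas_list in candidates:        # append a row per tracker, Eq. (30)
        extended = []
        for combo in combos:
            for m in meas_list + [None]:
                # delete incompatible columns: a measurement cannot be
                # assigned to two different trackers
                if m is None or m not in combo:
                    extended.append(combo + (m,))
        combos = extended
    return combos

# The combination minimizing the sum of Mahalanobis distances is selected.
# Example: build_combinations([['a', 'b'], ['b']]) ->
#   [('a','b'), ('a',None), ('b',None), (None,'b'), (None,None)]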
However, there are some important points that we must
take into account:
• We can choose between two strategies for making the
combination filter: wait for the last iteration, or apply
the filter on each iteration. The first option is easier,
but the second is more efficient.
• If possible, the matching should be calculated inde-
pendently for each camera, in order to reduce the
number of options.
• The possibility of not having a measurement must al-
ways be considered for each tracker. If an incorrect
measurement is assigned, it will damage the other
trackers.
5.4.1 Problems of multisensor multitarget matching
There are several problems when using multisensor matching. Most of these problems only happen if there has been a previous detection or segmentation error, but solutions must be designed to handle these errors, which cause bad tracking. We have considered the following errors:
• one tracker associated to measurements from different
players
• several trackers associated to measurements from the
same player
• large covariances
• unknown number of possible combinations.
The objective of these corrections is to ensure that each
tracker is tracking a player. It is not possible to ensure that
at a given moment each tracker will be tracking one and
only one player. But the system must be capable of noticing
any important error and be able to restore the desired situ-
ation in a sufficiently brief time.
Problem with one tracker associated to measurements from different players. This problem only happens when each sensor covers just a part of the tracking field, so that two or more players may be nearby but not all of them are seen from a certain camera. In this case, when a tracker has a large covariance (for example, when it has lost the player it was tracking), it is possible that it takes both measurements as if they had come from the same player.
If the covariance is large, then it is also possible that
these measurements from different cameras are distant, in-
dicating that they probably come from different players.
The detection of the problem consists then in establishing a
threshold that represents the maximum distance between
two measurements if they are provided by the same player.
The solution applied differs according to whether the tracker captures two measurements or more. In case the tracker has two measurements assigned, the solution is to deassign the measurement furthest from the tracker. After that, it is advisable to check whether the measurement may correspond to another tracked player.
In the case of three or more measurements, an iterative process is performed until the distance between each pair of measurements is below the threshold. First, the measurement with the longest global distance to the other measurements is deassigned. Then, this measurement is assigned to another tracker if possible. Finally, all distances between each pair of measurements are evaluated. This process is repeated until the maximum distance is below the threshold, as seen in Fig. 14.
Problem with several trackers associated to measurements from the same player. It is also possible that different trackers are associated to different measurements from the same player (Fig. 15, left), so that these trackers effectively follow the same player while other players remain available. Normally, this problem only happens after a segmentation error or under unexpected circumstances. Since mistakes in previous stages are inevitable, it is necessary to develop a system that allows us to correct them.
To be sure that these two or more measurements come
from the same player, three conditions must be satisfied.
Fig. 13 Diagram with a simple example of how different combinations are generated and impossible combinations are filtered out.
Fig. 14 Diagram used for discarding measurements when they are too far away. Each deassigned measurement is reassigned if possible.
The first condition is that no pair of measurements of the
supposed bad trackers can come from the same camera.
The second condition relates to the distance between each
pair of measurements, which must always be below a cer-
tain threshold. The third and final condition is that the
trackers must also have a distance below another threshold.
To be coherent, this threshold must have the same value as
the one used in the solution of the previous problem.
The solution for this problem is to merge all measure-
ments into only one tracker when the three conditions are
satisfied, leaving the other trackers without measurement,
and establishing what we have called exclusion zones.
Each exclusion zone consists of an area that is estab-
lished surrounding a measurement, valid for only one
tracker and only during a determined number of frames.
When a measurement is inside an active exclusion zone, it
is not visible to the tracker that has created this exclusion
zone (Fig. 15, middle).
Exclusion zones are indispensable, since when all mea-
surements are assigned to the same tracker, the other track-
ers make their covariances grow to try to catch other mea-
surements, but they will recapture the same measurements
if the exclusion zone is not established. It is desirable that
the number of frames through which the exclusion zone is
active should be enough to allow the tracker to find another
measurement, and that its size should be enough to ensure
that the tracked player does not cross through its bound-
aries while the exclusion zone is active (Fig. 15, right).
Problem with large covariances. Another problematic situation is generated by trackers with large covariances. When a tracker loses all its measurements, its covariance starts to grow in order to be able to catch an available measurement. But, since the Mahalanobis distance is used to measure the distance between trackers and measurements, it is possible that, if the covariance becomes very large, the tracker captures a measurement that corresponds to a nearer tracker with small covariance (Fig. 15, right).
It is possible to limit the maximum covariance area, but this affects the way the trackers recapture lost measurements. Another option is to assign the measurements by priority. One or more thresholds can be set to divide the trackers into several categories: trackers with the smallest covariance choose their measurements first, and the other trackers can only choose among the measurements that the first group has not caught. Any tracker that has not lost its measurements must always be included in the group with the smallest covariance, to avoid assignment problems.
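A greedy sketch of this priority scheme follows: instead of explicit threshold categories, trackers are simply sorted by covariance size, which yields the same first-pick behavior; the scalar 'cov' and the scaled Euclidean distance standing in for the Mahalanobis distance are simplifications:

    import math

    def prioritized_assignment(trackers, measurements):
        """trackers: list of dicts with 'pos' (x, y) and 'cov' (scalar size of
        the covariance). Small-covariance trackers choose first, so a lost
        tracker with a huge covariance cannot steal a measurement that
        belongs to a nearby tracker."""
        free = set(range(len(measurements)))
        assignment = {}
        for ti in sorted(range(len(trackers)), key=lambda i: trackers[i]['cov']):
            if not free:
                break
            t = trackers[ti]
            # covariance-scaled distance as a 1D stand-in for Mahalanobis
            mi = min(free, key=lambda m: math.dist(t['pos'], measurements[m])
                                         / max(t['cov'], 1e-6))
            assignment[ti] = mi
            free.discard(mi)
        return assignment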
Problem with the number of possible combinations. The algorithm proposed above has the disadvantage that the number of combinations between measurements and trackers is unknown, and it can be very high if each tracker has many available measurements. It is therefore necessary to bound the number of combinations that the algorithm must take into account.
The easiest method to limit the total number of combinations is to limit individually the number of candidate measurements that a tracker can have, discarding the farthest ones. For example, a threshold of m = 3 is a good compromise between speed and efficiency in our case: it is high enough to consider all important options, but not so high that the processing time becomes too long.
However, there is a better way to limit the number of combinations, which consists in deleting the worst partial combinations at each iteration if they exceed the desired limit.
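A beam-search sketch of this pruning, under the assumption that a lower score is better (for instance, the sum of tracker-measurement distances); 'beam_width' is an illustrative parameter, not a value from the paper:

    def enumerate_combinations(candidates_per_tracker, score, beam_width=50):
        """Grow tracker-measurement combinations one tracker at a time,
        deleting the worst partial combinations whenever the beam limit is
        exceeded. candidates_per_tracker[i] is the (already truncated, e.g.
        m = 3) list of measurement indices available to tracker i."""
        partials = [[]]
        for cands in candidates_per_tracker:
            # extend each surviving partial combination with one candidate,
            # never assigning the same measurement twice
            extended = [p + [m] for p in partials for m in cands if m not in p]
            extended.sort(key=score)
            partials = extended[:beam_width]  # prune the worst partials
        return partials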
6 Feedback Procedure
Image tracking is conditioned by camera location and
player occlusions, which can produce tracking failures.
This fact is especially noticeable when targets have virtu-
ally identical appearance, so that only dynamics can be
used to distinguish the targets. These situations are corrected using multicamera tracking in the plan; when they occur, player locations in the image differ from the locations in the plan. In order to alleviate this incoherence, we send feedback state vectors from the plan to each view. This information is integrated in the particle filter as an extra likelihood: the particles of each player are reweighted, modeling the position of the player in the plan as a Gaussian or a super-Gaussian and assigning each particle a weight according to its distance to the center. In this manner, if the locations in the two trackers fit each other, the result is not affected; if they do not fit, the inaccurate location in the image is corrected. Thus, the tracking in each camera is helped by the other cameras thanks to the feedback procedure, which corrects errors that could not be corrected with a single view and improves the results of all cameras.
The metric used to feed back the information from the plan to the camera is a distance function that we have defined as

d(x_t^n, x_t^{plane}) \propto \exp\{-[(x_t^n - x_t^{plane})^T \Sigma^{-1} (x_t^n - x_t^{plane})]^{\alpha}\},   (31)

where x_t^{plane} is the MCUKF estimate of the mean \hat{x}_t, \Sigma is the covariance \hat{P}_t, and \alpha is an exponent that allows controlling and tuning the influence of the feedback process in the final estimation. In this system, it has been set at 70% of the final weight.
Fig. 15 Left: Tracker T1 is tracking measurement M1, from camera 1, while tracker T2 is tracking measurement M2, from camera 2, but both measurements come from the same player. Middle: Both measurements are assigned to tracker T1, while two exclusion zones, one around each measurement, are created for tracker T2, preventing it from catching these measurements while its covariance grows. Right: Although far from measurement M1, tracker T2 captures it, because its Mahalanobis distance is lower than that of T1 owing to T2's large covariance.
The procedure was shown in Fig. 6, which depicts how this feedback information is introduced after calculating the measurement estimation for every camera and before obtaining the final state estimation. Therefore, the feedback process can be understood as an iterative refinement of the posterior probability. The methodology that enables this is based on dividing the evaluation step into a set of layers. This multilayer particle filter introduces a refinement in which the first estimations help to discard some hypotheses before costly evaluations are done. In this way, independent observations can be combined sequentially to give the final estimation. This layered particle filter has a purpose similar to that of the annealed particle filter described in Ref. 52, but the methodology is different: whereas the annealed particle filter uses the same measurement through the different layers, our proposal introduces a new factor that was not present in the first estimation and whose value changes with the posterior estimation.
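A minimal sketch of this extra-likelihood layer, following Eq. (31), is given below; how the 70% share is mixed into the final weight is not spelled out in the text, so the convex blend used here is one plausible reading, and alpha = 1 reduces the super-Gaussian to a plain Gaussian:

    import numpy as np

    def feedback_reweight(particles, weights, x_plane, P_plane, alpha=1.0, share=0.7):
        """particles: (N, 2) player hypotheses projected onto the plan;
        x_plane, P_plane: MCUKF mean and covariance fed back from the plan;
        share: fraction of the final weight given to the feedback term."""
        diff = particles - x_plane
        # per-particle Mahalanobis-squared distance to the plan estimate
        q = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(P_plane), diff)
        feedback = np.exp(-q ** alpha)          # super-Gaussian likelihood, Eq. (31)
        new_w = weights * ((1.0 - share) + share * feedback)
        return new_w / new_w.sum()

Particles whose image-plane location projects close to the plan estimate keep their weight, while inconsistent particles are penalized, which is the correction behavior described above.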
6.1 Reset Mechanism
Despite all the previous mechanisms to model interaction and avoid tracking failures, the measurement is frequently lost completely, owing to the small size of the players in the image. For this reason, a reset mechanism has been implemented. This reset can act in two different cases, sketched in code after the list:
• If the location of the player in the plan is too different from the location in the image, a failure in the image is assumed, and the image tracker is reinitialized with the location in the plan.
• If two plan trackers catch the same measurement and superimpose on each other, a failure in the plan is assumed. To remedy it, extra deassigned measurements are looked for in the image, and one of the trackers is reinitialized to the nearest such measurement.
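The following sketch condenses the two reset cases; the thresholds and the (x, y) point representation are illustrative assumptions:

    import itertools
    import math

    def check_resets(image_pos, plan_pos, plan_trackers, free_measurements,
                     dist_threshold, overlap_threshold):
        """Return the reset action for one player, if any."""
        # Case 1: image and plan locations disagree too much ->
        # reinitialize the image tracker with the plan location.
        if math.dist(image_pos, plan_pos) > dist_threshold:
            return ('reset_image_to', plan_pos)
        # Case 2: two plan trackers superimposed on the same measurement ->
        # move one of them to the nearest deassigned image measurement.
        for a, b in itertools.combinations(plan_trackers, 2):
            if math.dist(a, b) < overlap_threshold and free_measurements:
                nearest = min(free_measurements, key=lambda m: math.dist(b, m))
                return ('reset_plan_tracker_to', nearest)
        return None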
Finally, we need an algorithm to administer the transi-
tion of players between cameras. Although the number of
targets is fixed and therefore the number of multicamera
estimations is also fixed, the players go in and out of the
camera fields of view. In these cases, we have to create or
delete a local tracker in the image. A new tracker is initial-
ized with the label and the particles corresponding to the
player.
7 Results
Our system has been tested using several sequences of a
football database. This database was taken in collaboration
with the Government of Aragón, the University of Zara-
goza, and Real Zaragoza S.A.D. football team. For this
purpose, eight analog cameras were installed in the football
stadium and connected to two MPEG4 video recorders. In
the first frame, the initialization is made by hand, choosing
the players in each camera field of view. We use a constant-acceleration model and a motion dynamic that permits objects with variable trajectories. By introducing the x and y velocities in the state vector, we can solve occlusions between tracked objects:
\hat{x}_k = [x \;\; v_x \;\; y \;\; v_y], \qquad \hat{x}_{k|k-1} = F \cdot \hat{x}_{k-1},   (32)
and the dynamic matrix is given by a first-order model.
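As a sketch, the state vector of Eq. (32) carries positions and velocities, so the first-order model corresponds to the following F; the frame period dt is an assumption, since the frame rate is not stated at this point:

    import numpy as np

    def predict(x_prev, dt):
        """One prediction step of Eq. (32) with a first-order dynamic
        matrix; the state is [x, vx, y, vy]."""
        F = np.array([[1.0,  dt, 0.0, 0.0],
                      [0.0, 1.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0,  dt],
                      [0.0, 0.0, 0.0, 1.0]])
        return F @ x_prev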
Table 1 Sequence 1 statistics. The error measurement was obtained using manually labeled ground truth. For each magnitude, the paired columns give the values for team 1 and team 2.

         Distance (m)    Max. vel. (km/h)   Mean vel. (km/h)   Sprints    In play (min)   Max. error (m)   Mean error (m)
Player   T1     T2       T1     T2          T1     T2          T1   T2    T1     T2       T1     T2        T1     T2
 1       54.95  54.09    24.54  23.19       17.05  16.78       2    2     0.19   0.19     1.04   1.07      0.41   0.36
 2       33.55  30.11    18.56  12.80       10.41   9.34       1    2     0.19   0.19     1.14   1.35      0.36   0.43
 3       39.54  24.53    19.72  10.52       12.27   7.61       1    1     0.19   0.19     1.32   0.72      0.54   0.25
 4       27.27  27.89    14.91  12.98        8.46   8.66       1    1     0.19   0.19     1.78   0.94      0.64   0.30
 5       40.04  42.88    17.17  15.93       12.43  13.31       1    0     0.19   0.19     0.92   0.85      0.33   0.37
 6       25.14  28.50    10.26  16.26        7.80   8.85       1    1     0.19   0.19     2.05   2.45      0.88   0.36
 7       18.09  45.63     7.03  17.46        5.61  14.16       2    2     0.19   0.19     0.92   1.35      0.30   0.40
 8       37.14  30.98    14.04  14.04       11.53   9.62       1    2     0.19   0.19     1.09   1.42      0.38   0.39
 9       20.40  22.70     8.16   8.71        6.33   7.02       1    1     0.19   0.19     1.22   0.93      0.52   0.37
10       19.61  19.02     6.25   7.76        6.06   5.88       0    1     0.19   0.19     1.02   0.95      0.41   0.42
Total                                                                                     2.45             0.401
Once tracking has finished, we can extract statistics from
the trajectories of the targets, such as maximum velocity,
mean velocity, covered distance, or number of sprints, to
name a few.
In order to evaluate the system and obtain numerical
results, we have manually labeled 2600 frames correspond-
ing to all cameras for two sequences of 12 s. Sequences
have been chosen with many complex interactions between
players of both teams, but without any scrums or other
special circumstances. In this manner, the accuracy of the
system can be checked using the ground truth of the labeled
sequences. Results and statistics are shown in Tables 1–3
and Figs. 16–19. Reported occlusions are those among players belonging to the same team, that is, with the same color model. Occlusions between players of different teams, although common due to defensive coverage, have been solved automatically by the tracking algorithm in all the observed situations.
A good accuracy level has been obtained for both se-
quences, for two reasons: the multicamera tracking, which
reduces the measurement noise of each camera using the
others, and the high precision of the single-camera stage.
To estimate the importance of this single-camera stage, we
have tested the system without it, that is, sending the blob
position from each camera directly to the plan. This con-
figuration gives us a value of the location mean error equal
to 1.182 m, which means an increase of 125% over the
multicamera approach.
Sequence 2, more complex than the first one, presents lower performance, since most of the interactions happen in an area that is not properly covered by any camera due to the poor image resolution (see Figs. 18 and 19). As a consequence, the accuracy decreases and one of the targets is finally lost. This lost player (player 7, team 1) has been highlighted in Table 2 to show the incoherent data reported, which can provide useful information to easily identify lost trackers.
Table 2 Sequence 2 statistics. The incorrectly tracked target (player 7, team 1), which had to be reinitialized manually, is marked with an asterisk; the values in parentheses show the resulting errors without this tracker.

         Distance (m)    Max. vel. (km/h)   Mean vel. (km/h)   Sprints    In play (min)   Max. error (m)   Mean error (m)
Player   T1     T2       T1     T2          T1     T2          T1   T2    T1     T2       T1     T2        T1     T2
 1       50.52  48.16    19.48  17.78       15.16  14.45       1    3     0.2    0.2      2.71   2.08      0.90   0.83
 2       44.01  53.55    19.70  23.64       13.20  16.06       1    1     0.2    0.2      1.94   3.27      0.59   0.63
 3       39.20  59.67    15.09  29.54       11.76  17.9        3    1     0.2    0.2      1.41   4.48      0.50   0.50
 4       27.48  35.77    14.65  17.55        8.24  10.73       3    1     0.2    0.2      3.18   1.81      0.42   0.51
 5       20.82  87.35     7.84  37.59        6.24  26.20       2    3     0.2    0.2      1.12   5.34      0.29   1.71
 6       26.00  46.95     9.99  24.15        7.80  14.09       1    1     0.2    0.2      2.35   2.68      0.94   0.53
 7*      68.28  28.06    41.59  10.64       20.48   8.42       2    2     0.2    0.2      21.8   3.84      13.0   0.38
 8       43.88  29.09    16.33  10.55       13.16   8.73       2    1     0.2    0.2      1.11   1.05      0.55   0.37
 9       35.78  42.20    14.47  18.73       10.73  12.66       2    1     0.2    0.2      2.07   1.74      0.46   0.51
10       36.78  30.17    16.03  14.35       11.03   9.05       3    1     0.2    0.2      5.70   1.62      1.21   0.45
Total                                                                                     21.8 (5.7)       1.26 (0.646)
Table 3 Sequence complexity and results obtained.*

       Duration   Occlusions*      Length of    Transitions entering   Transitions leaving   Lost      Mean
Seq.   (s)        (total/solved)   occlusions   (total/solved)         (total/solved)        players   error (m)
1      11.625     2/2              24           9/9                    9/9                   0         0.40146777
2      12.0       5/5              84           11/10                  12/11                 1         0.64622249
Total  23.625     7/7              108          20/19                  21/20                 1         0.52384513

* Only occlusions between players of the same team have been scored, since the color models solve the others.
Fig. 16 Trackers and trajectories during a period of the football match 共sequence 1兲.
Fig. 17 Trajectories at each view during a period of the football match 共sequence 1兲.
Fig. 18 Trackers and trajectories during a period of the football match 共sequence 2兲.
Fig. 19 Trajectories at each view during a period of the football match 共sequence 2兲.
By checking performance statistics, like velocity or covered distance, trackers with physically impossible results can be taken as signs of wrong tracking. This is important because error data require an annotated sequence, which is not available during a real performance; these statistics could instead be used to generate alarms automatically and act accordingly.
Although excellent results have been obtained, it is not easy to extract conclusions about the performance during a whole match from such short sequences. For this reason, the complete system has been tested uninterruptedly on a long sequence of 8 min to extract conclusions and qualitative statistics. This sequence includes all sorts of situations inherent in a football match (Fig. 20), such as fouls, scrums, goals, and other complex interactions. Thanks to this test, the results (see Table 4) can be analyzed, and useful conclusions can be extracted.
Table 4 encapsulates the advantages of our proposal. The error rate is the number of manual corrections divided by the length of the sequence, and the saving rate is the fraction of the total number of locations needed to track the players that did not require a manual correction. This means that a human supervisor must act during only 5.84% of the match time in order to correct the system errors; thus, our system saves 99.4% of the work that a human user would have to do to annotate the whole match by hand. Finally, the conflict rate is the number of solved potentially dangerous situations divided by the total number of such situations. The system therefore yields a substantial improvement over current commercial systems, which solve most of the conflicts manually.
To analyze the unsolved errors in depth, we can examine their distribution (Fig. 21). Most of them are situated in zones 1, 2, and 3 and are due to the poor resolution of the players in these areas for all the cameras. As a future improvement, it would be advisable to place three extra cameras in order to increase the system redundancy, thus reducing the error rate. The importance of this redundancy is also corroborated by the conflict distribution: 57.32% of the conflicts appear when only one camera could be used to obtain useful observations (due to occlusions, extremely low resolution, or compression losses), and the density of conflicts per square meter is reduced from 2.44 to 1.98 when the conflicts are observed by more than one camera. A more detailed study that examines the importance of the camera overlapping is given in the appendix (Sec. 9).
We have also evaluated the modified PF on a Performance Evaluation of Tracking and Surveillance (PETS) sequence (ftp://ftp.pets.rdg.ac.uk/pub/VS-PETS/) (Fig. 22) with a single camera, to check the robustness of the color segmentation and tracking algorithm. The sequence is completed successfully in spite of the fact that the cameras and perspective are quite different.
Using this PETS sequence, we have compared our importance-sampling PF with the traditional PF. Table 5 shows the results for the mean squared error (MSE) and the computational cost. As can be seen, the integral-image method yields a useful reduction factor for each particle evaluation: although an initial cost is added in creating the integral matrix, it is compensated by the huge number of particle evaluations performed. The comparison has been made using MATLAB running on a 2.4-GHz Pentium; the time costs obtained are useful only for comparison with one another.
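A sketch of the integral-image idea behind this speedup follows: after one cumulative-sum pass over a per-pixel likelihood map, the score of any rectangular particle region costs four lookups instead of a full pixel scan. The likelihood map itself is assumed given; the paper's exact per-particle score is not reproduced here:

    import numpy as np

    def integral_image(likelihood_map):
        """Summed-area table of a 2D array."""
        return likelihood_map.cumsum(axis=0).cumsum(axis=1)

    def rect_sum(ii, x0, y0, x1, y1):
        """Sum of likelihood_map[y0:y1, x0:x1] in O(1) from the table ii."""
        total = ii[y1 - 1, x1 - 1]
        if x0 > 0:
            total -= ii[y1 - 1, x0 - 1]
        if y0 > 0:
            total -= ii[y0 - 1, x1 - 1]
        if x0 > 0 and y0 > 0:
            total += ii[y0 - 1, x0 - 1]
        return total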
8 Conclusions and Future Work
We have presented a computationally efficient multicamera algorithm for tracking multiple targets in a sporting match. The robustness of multiview methods lies in the multiview integration, which makes them insensitive to occlusions in some of the cameras. However, a sporting match is so complex an environment that this is not enough in many situations. For this reason, we have developed a double-tracking methodology combined with a feedback procedure. This technique increases the strength of the system at a feasible computational cost, thanks to several efficient modifications of the particle filter.
The most valuable novelties presented in this paper are the modifications to reduce computational cost (using the integral image and the CPI mask), the extension of the UKF to multicamera applications, and the feedback procedure, which increases the robustness and accuracy of both tracking systems.
The results obtained are satisfactory and show a substantial improvement over systems based on manual labeling. Our system allows processing a complete match in a few hours using a conventional PC. In this way, a football trainer will have the match at his disposal in order to analyze it carefully and extract conclusions.
Table 4 Long-sequence qualitative results.

Duration                  Error rate   Saving rate   Conflicts
Time          Frames                                 Total   Solved   Rate
7 min 46 s    11661       5.84%        99.37%        759     608      80.1%
Fig. 20 Complex situations that appear during a football match.
Obtaining the results so soon means an important competitive advantage in an increasingly competitive environment.
As future work, it will be important to develop a method to automate the initialization of the players on the map and the generation of the color models (Ref. 53). Furthermore, we must improve the data association algorithm in very complex situations (fouls, corner kicks). The inclusion of three extra cameras, as well as the use of gigabit digital cameras, will help to reduce the number of unsolved errors. Finally, we are working on a better relationship between the particle filter and the plan in order to obtain a better estimation of the measurement reliability, thereby improving the multicamera integration.
Acknowledgments
The authors thank the Government of Aragón and Real Zaragoza S.A.D. We would also like to thank I3A for its confidence.
Fig. 21 Left: Conflict distribution. Right: Main zones of unsolved errors.
Fig. 22 Results for the PETS sequence.
This work is partially supported by grant TIN2006-11044 and FEDER funds from the Spanish Ministry of Education. J. Martínez-del-Rincón is supported by FPI grant BES-2004-3741 from the Spanish Ministry of Education.
Appendix: Error Analysis
A quality study of system errors has been carried out over a
set of test data chosen from one camera installed in the
stadium. These data have been classified into three zones in
the image, each one with a different mean distance from the
camera position. In this way, the perspective effects can be
analyzed through errors in the overall process. This can be
seen in Fig. 23.
Two procedures must be taken into account: measurement extraction (involving the PF) and the homography. The resulting errors condition the correct operation of the UKF and its correspondence algorithm.
The time evolution of the UKF is governed by the following simplified equations:

x_{k+1} = F x_k + \nu_E, \qquad y_k = H x_k + \nu_M,   (33)

where \nu_E \sim N(0, \Sigma_E) and \nu_M \sim N(0, \Sigma_M) are the state and measurement errors, respectively. It is not possible to know the first one a priori, because it depends on the target trajectory. On the other hand, the measurement error can be obtained by analyzing the measurement process.
Both errors work together to produce the final error in the UKF algorithm, expressed as the predicted and estimated covariance P. Mathematically, this can be expressed as P \approx \Sigma_E + \Sigma_M. The first term can be reduced by adding more complex dynamic models (Ref. 54) or by incorporating different models that take into account different player behaviors (Ref. 55).
The last term can be covered by studying the measurement process and providing better algorithms. For this purpose, an alternative has been proposed in this paper: an image-tracking system has been incorporated (see Sec. 4.1) to improve the measurement process, thus reducing the measurement noise. Nevertheless, the quality of the measurement depends on further parameters, like perspective and distortion errors. In our case, the perspective error plays an important role because the players are located at very long distances from the camera. This fact is one of the most important reasons for the necessity of using several cameras (one camera will always have the best perspective of a given player).
Assuming that the errors are Gaussian with zero mean, combining two such measurements can produce a better measurement; this is another reason to use several cameras. However, the hypothesis of zero mean is not true: our measurements possess a positive mean whose value depends on the target position. This characteristic can be observed in Fig. 24. The error on each axis is shown in the left and middle graphs, and the error modulus in the rightmost one. Note that the error is lower in the middle of the image and rises in the outer areas.
Table 5 Comparison of our algorithm with the conventional particle filter (PETS sequence, Fig. 22).

                           MSE         Mean     Time (s)
Algorithm                  (pixel^2)   N        PDI     Particles   Total
Conventional, N = 150      190.2       150      —       4.4         4.4
Conventional, N = 500      165.3       500      —       23.9        23.9
Conventional, N = 1000     171.9       1000     —       44.3        44.3
Ours                       112.1       314      0.15    0.46        0.61
Fig. 23 Perspective zones in camera 1.
Fig. 24 Perspective error in the image.
Once the theory of the measurement error has been treated, the homography error can be analyzed. This error appears when measurements in the image are converted to plane coordinates. Obviously, it depends on the player position with respect to the camera, but it originates in the construction of the homography matrix, more exactly in the points selected to build this matrix.
We have followed the methodology cited in Ref. 56 in order to compute the covariance of the estimated homography. All points used to build this matrix are considered, and the implicit error is modeled as an isotropic homogeneous Gaussian. These errors are denoted as \sigma_x = \sigma_y = \sigma for image points and \Sigma_x = \Sigma_y = \Sigma for plane points. Likewise, the homography covariance is defined as \Lambda_h = J \cdot S \cdot J^T, where J = -\sum_{k=2}^{9} u_k u_k^T / \lambda_k, with u_k the k'th eigenvector of the matrix A^T A and \lambda_k the corresponding eigenvalue. Finally, S is defined as follows:
S = \sum_{i=1}^{n} \left( a_{2i-1}^T a_{2i-1} f_i^o + a_{2i}^T a_{2i} f_i^e + a_{2i-1}^T a_{2i} f_i^{oe} + a_{2i}^T a_{2i-1} f_i^{eo} \right),   (34)
with a_i equal to row i of the matrix A. The remaining parameters can be defined, on writing H in its vectorial form H = (h_1, h_2, h_3, h_4, h_5, h_6, h_7, h_8, h_9), as

f_i^o = \sigma^2 [h_1^2 + h_2^2 - 2X_i (h_1 h_7 + h_2 h_8)] + 2\Sigma^2 (x_i h_7 h_9 + x_i y_i h_7 h_8 + y_i h_8 h_9) + (\Sigma^2 X_i^2 + x_i^2 \Sigma^2) h_7^2 + (\sigma^2 X_i^2 + y_i^2 \Sigma^2) h_8^2 + \Sigma^2 h_9^2,   (35)

f_i^e = \sigma^2 [h_4^2 + h_5^2 - 2Y_i (h_4 h_7 + h_5 h_8)] + 2\Sigma^2 (x_i h_7 h_9 + x_i y_i h_7 h_8 + y_i h_8 h_9) + (\Sigma^2 Y_i^2 + x_i^2 \Sigma^2) h_7^2 + (\sigma^2 Y_i^2 + y_i^2 \Sigma^2) h_8^2 + \Sigma^2 h_9^2,   (36)

f_i^{oe} = f_i^{eo} = \sigma^2 [(h_1 - X_i h_7)(h_4 - Y_i h_7) + (h_2 - X_i h_8)(h_5 - Y_i h_8)],   (37)
with (X_i, Y_i) the coordinates on the ground and (x_i, y_i) the coordinates in the image.
The typical formula for homographic computation is x' = Hx, which is converted to X = Bh, where B is the following 3 \times 9 matrix:

B = \begin{pmatrix} x^T & 0^T & 0^T \\ 0^T & x^T & 0^T \\ 0^T & 0^T & x^T \end{pmatrix}.   (38)
The 3 \times 3 matrix \Lambda_X is the covariance of the point X in homogeneous world coordinates. The conversion to a 2 \times 2 matrix \Lambda_X^{2\times2} in nonhomogeneous coordinates is carried out as follows:

\Lambda_X^{2\times2} = \nabla f \, \Lambda_X \, \nabla f^T,   (39)

where X = (X, Y, W)^T (world coordinates) and \nabla f is defined as
\nabla f = \frac{1}{W^2} \begin{pmatrix} W & 0 & -X \\ 0 & W & -Y \end{pmatrix}.   (40)
Assuming only noise in the homography computation and accurate data on the point x, the covariance of the corresponding world point will be

\Lambda_X = B \Lambda_h B^T.   (41)
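A minimal sketch chaining Eqs. (38)-(41) is given below, assuming the homography H and its 9x9 covariance Lambda_h are already available:

    import numpy as np

    def world_point_covariance(x_img, H, Lambda_h):
        """Propagate the homography covariance to the mapped world point.
        x_img: homogeneous image point (3,); H: 3x3 homography;
        Lambda_h: 9x9 covariance of the stacked homography vector h."""
        x = np.asarray(x_img, dtype=float)
        X, Y, W = H @ x                      # world point, homogeneous coords
        B = np.zeros((3, 9))                 # Eq. (38): X = B h
        B[0, 0:3] = x
        B[1, 3:6] = x
        B[2, 6:9] = x
        Lambda_X = B @ Lambda_h @ B.T        # Eq. (41), homogeneous 3x3
        grad_f = np.array([[W, 0.0, -X],
                           [0.0, W, -Y]]) / W ** 2   # Eq. (40)
        return grad_f @ Lambda_X @ grad_f.T  # Eq. (39), nonhomogeneous 2x2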
These equations, applied to our test data, give the results shown in Fig. 25. In conclusion, although the errors in the image show a valley at the center of the image, the same errors converted to world coordinates increase with the distance from the camera position. The units in world coordinates are pixels, which can be directly converted to meters. This conversion gives values close to those shown in Table 3.
Fig. 25 Perspective error on the plan.
References
1. N. J. Gordon, D. J. Salmond, and A. F. M. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation," IEE Proc. F, Radar Signal Process. 140, 107–113 (1993).
2. A. Doucet, N. de Freitas, and N. Gordon, Sequential Monte Carlo Methods in Practice, Springer-Verlag (2001).
3. M. Isard and A. Blake, "Condensation: conditional density propagation for visual tracking," Int. J. Comput. Vis. 29(1), 5–28 (1998).
4. S. J. Julier and J. K. Uhlmann, "A new extension of the Kalman filter to nonlinear systems," in Proc. AeroSense: The 11th Int. Symp. on Aerospace/Defence Sensing, Simulation and Controls (1997).
5. E. A. Wan and R. van der Merwe, The Unscented Kalman Filter, Chap. 7 (2001).
6. A. Ekin, A. M. Tekalp, and R. Mehrotra, "Automatic soccer video analysis and summarization," IEEE Trans. Image Process. 12(7), 796–807 (2003).
7. J. Assfalg, M. Bertini, C. Colombo, A. Del Bimbo, and W. Nunziati, "Semantic annotation of soccer videos: automatic highlights identification," Comput. Vis. Image Underst. 92(2–3), 285–305 (2003).
8. K. Matsumoto, S. Sudo, H. Saito, and S. Ozawa, "Optimized camera viewpoint determination system for soccer game broadcasting," in Proc. MVA2000, IAPR Workshop on Machine Vision Applications, pp. 115–118 (2000).
9. K. Choi and Y. Seo, "Probabilistic tracking of soccer players and ball," in Statistical Methods in Video Processing, D. Comaniciu, K. Kanatani, R. Mester, and D. Suter, Eds., pp. 50–60, Springer (2004).
10. J. Vermaak, A. Doucet, and P. Perez, "Maintaining multi-modality through mixture tracking," in Int. Conf. on Computer Vision, Vol. 2, p. 1110 (2003).
11. K. Okuma, A. Taleghani, N. de Freitas, J. Little, and D. Lowe, "A boosted particle filter: multitarget detection and tracking," in Proc. 8th Eur. Conf. on Computer Vision, pp. 28–39 (2004).
12. J. B. Hayet, T. Mathes, J. Czyz, J. Piater, J. Verly, and B. Macq, "A modular multi-camera framework for team sports tracking," in Proc. IEEE Int. Conf. on Advanced Video and Signal-Based Surveillance (2005).
13. C. J. Needham and R. D. Boyle, "Tracking multiple sports players through occlusion, congestion and scale," in Br. Machine Vision Conf. BMVC'01, pp. 93–102 (2001).
14. A. Yamada, Y. Shirai, and J. Miura, "Tracking players and a ball in video image sequence and estimating camera parameters for 3D interpretation of soccer games," in Proc. 16th Int. Conf. on Pattern Recognition (ICPR'02), Vol. 1, p. 10303 (2002).
15. K. Okuma, J. J. Little, and D. Lowe, "Automatic acquisition of motion trajectories: tracking hockey players," Proc. SPIE 5304, 202–213 (2003).
16. M. Xu, J. Orwell, and G. A. Jones, "Tracking football players with multiple cameras," in Int. Conf. on Image Processing, Vol. 5, pp. 2909–2912 (2004).
17. M. Xu, L. Lowey, J. Orwell, and D. Thirde, "Architecture and algorithms for tracking football players with multiple cameras," IEE Proc. Vision Image Signal Process. 152(2), 232–241 (2005).
18. J. Ren, J. Orwell, G. A. Jones, and M. Xu, "Real-time 3D soccer ball tracking from multiple cameras," in Br. Machine Vision Conf., pp. 829–838 (2004).
19. K. Choi and Y. Seo, "Probabilistic tracking of the soccer ball," in Proc. ECCV Workshop on Statistical Methods in Video Processing, pp. 50–60 (2004).
20. R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed., Chap. 7.8, Cambridge Univ. Press (2004).
21. Y. Zhang, H. Leung, T. Lo, and J. Litva, "Distributed sequential nearest neighbour multitarget tracking algorithm," IEE Proc., Radar Sonar Navig. 143, 255–260 (Aug. 1996).
22. K. Althoff, J. Degerman, and T. Gustavsson, "Combined segmentation and tracking of neural stem-cells," Lect. Notes Comput. Sci. 3540, 282–291 (2005).
23. H. Nait-Charif and S. J. McKenna, "Head tracking and action recognition in a smart meeting room," in 4th IEEE Int. Workshop on Performance Evaluation of Tracking and Surveillance (2003).
24. K. Nummiaro, E. Koller-Meier, and L. Van Gool, "A color-based particle filter," in Symp. for Pattern Recognition of the DAGM, Vol. 2449, pp. 353–360 (2002).
25. K. Nummiaro, E. B. Koller-Meier, and L. Van Gool, "An adaptive color-based particle filter," Image Vis. Comput. 21(1), 99–110 (2003).
26. E. Polat, M. Yeasin, and R. Sharma, "Robust tracking of human body parts for collaborative human computer interaction," Comput. Vis. Image Underst. 89, 44–69 (2003).
27. M. Isard and A. Blake, "Icondensation: unifying low-level and high-level tracking in a stochastic framework," in 5th Eur. Conf. on Computer Vision, pp. 893–908 (1998).
28. D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking," IEEE Trans. Pattern Anal. Mach. Intell. 25, 564–575 (2003).
29. S. McKenna, Y. Raja, and S. Gong, "Tracking colour objects using adaptive mixture models," Image Vis. Comput. 17, 225–231 (1999).
30. C. Stauffer and W. Grimson, "Learning patterns of activity using real-time tracking," IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 747–757 (2000).
31. B. Han, D. Comaniciu, Y. Zhu, and L. Davis, "Incremental density approximation and kernel-based Bayesian filtering for object tracking," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR'04) (2004).
32. M. S. Yang and K. L. Wu, "Unsupervised possibilistic clustering," Pattern Recogn. 39(1), 5–21 (2006).
33. M. Kim and R. S. Ramakrishna, "New indices for clustering validity assessment," Pattern Recogn. Lett. 26(15), 2353–2363 (2005).
34. A. Elgammal, R. Duraiswami, D. Harwood, and L. S. Davis, "Background and foreground modeling using nonparametric kernel density estimation for visual surveillance," Proc. IEEE 90(7), 1151–1163 (2002).
35. P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in IEEE Computer Soc. Conf. on Computer Vision and Pattern Recognition (2001).
36. D. Thirde, M. Borg, J. Ferryman, J. Aguilera, M. Kampel, and G. Fernandez, "Multi-camera tracking for visual surveillance applications," in 11th Computer Vision Winter Workshop (2006).
37. J. Black and T. Ellis, "Multicamera image measurement and correspondence," Meas. J. Int. Meas. Confed. 35(1), 61–71 (2002).
38. M. Meyer, T. Ohmacht, R. Bosch, and M. Hotter, "Video surveillance applications using multiple views of a scene," in 32nd Annual 1998 Int. Carnahan Conf. on Security Technology Proc., pp. 216–219 (1998).
39. A. Criminisi, I. Reid, and A. Zisserman, "Single view metrology," Int. J. Comput. Vis. 40(2), 123–148 (2000).
40. J. Martínez, J. E. Herrero, J. R. Gómez, and C. Orrite, "Automatic left luggage detection and tracking using multi-camera UKF," in IEEE Int. Workshop on Performance Evaluation of Tracking and Surveillance (PETS 06), pp. 59–66 (2006).
41. J. R. Gómez, J. E. Herrero, C. Medrano, and C. Orrite, "Multi-sensor system based on unscented Kalman filter," in IASTED Int. Conf. on Visualization, Imaging, and Image Processing, pp. 59–66 (2006).
42. N. Bergman and A. Doucet, "Markov chain Monte Carlo data association for target tracking," in ICASSP '00: Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. II705–II708, IEEE Computer Soc. (2000).
43. I. Goodman, R. Mahler, and H. Nguyen, Mathematics of Data Fusion, Kluwer Academic Publishers (1997).
44. R. P. S. Mahler, "Multi-target Bayes filtering via first-order multi-target moments," IEEE Trans. Aerosp. Electron. Syst. 39(4), 1152–1178 (2003).
45. B. N. Vo, S. Singh, and A. Doucet, "Sequential Monte Carlo methods for multi-target filtering with random finite sets," IEEE Trans. Aerosp. Electron. Syst. 41(4), 1224–1245 (2005).
46. B. N. Vo and W. K. Ma, "The Gaussian mixture probability hypothesis density filter," IEEE Trans. Signal Process. 54(11), 4091–4104 (2006).
47. B. N. Vo, S. Singh, and A. Doucet, "PHD filters of higher order in target number," IEEE Trans. Aerosp. Electron. Syst. 43(4), 1523–1543 (2007).
48. B. T. Vo, B. N. Vo, and A. Cantoni, "Analytic implementations of the cardinalized probability hypothesis density filter," IEEE Trans. Signal Process. 55(7), 3553–3567 (2007).
49. N. Pham, W. Huang, and S. Ong, "Multiple sensor multiple object tracking with GMPHD filter," in Proc. 10th Int. Conf. on Information Fusion, pp. 1–7 (2007).
50. N. Pham, W. Huang, and S. Ong, "Probability hypothesis density approach for multi-camera multi-object tracking," in 8th Asian Conf. on Computer Vision, pp. 875–884 (2007).
51. Y. Bar-Shalom and X. R. Li, Multitarget-Multisensor Tracking: Principles and Techniques, Chap. 8 (1995).
52. J. Deutscher, A. Blake, and I. Reid, "Articulated body motion capture by annealed particle filtering," Comput. Vision Pattern Recognition 2, 126–133 (2000).
53. J. R. Gómez, J. E. Herrero, M. Montanés, J. Martínez, and C. Orrite, "Automatic detection and classification of football players," in IASTED Int. Conf. on Signal and Image Processing (SIP 2007), pp. 483–488 (2007).
54. A. Senior, "Tracking people with probabilistic appearance models," in Proc. ECCV Workshop on Performance Evaluation of Tracking and Surveillance, pp. 48–55 (2002).
55. M. E. Farmer, R. L. Hsu, and A. K. Jain, "Interacting multiple model (IMM) Kalman filters for robust high speed human motion tracking," in Proc. 16th Int. Conf. on Pattern Recognition, pp. II:20–23 (2002).
56. A. Criminisi, I. D. Reid, and A. Zisserman, "A plane measuring device," Image Vis. Comput. 17(8), 625–634 (1999).
Jesús Martínez-del-Rincón received the
PhD degree from the University of Zara-
goza, specializing in biomedical engineer-
ing, in 2008. He previously graduated from
the University of Zaragoza in telecommuni-
cation in 2003. He is currently pursuing doc-
toral studies in computer vision, motion
analysis, and human tracking.
Elías Herrero-Jaraba received his PhD de-
gree in 2005 from the University of Zara-
goza, Spain. He joined the Centro Politéc-
nico Superior of the University of Zaragoza
as a researcher in March 2001. In February
2003 he became an assistant professor,
and since May 2007 he has been an asso-
ciate professor at the same university. His
current research interests include image
processing, multicamera and multitarget
tracking, three-dimensional vision, and
measurement processes. Dr. Herrero is an associate member of the
IEEE.
J. Raúl Gómez received the MSc degree from the University of Zaragoza, specializing in electronic technologies, in 2006. He previously graduated from the University of Zaragoza in telecommunication in 2004. After that, he joined the Computer Vision Laboratory of the Aragon Institute of Engineering Research (I3A). He is currently pursuing doctoral studies, specializing in computer vision, tracking, and figure detection.
Carlos Orrite-Uruñuela received the mas-
ter’s degree in Industrial Engineering at the
Zaragoza University in 1989. In 1994, he
completed the master’s degree in biomedi-
cal engineering, working in the field of medi-
cal instrumentation for several industrial
partners. In 1997 he did his PhD on com-
puter vision at the University of Zaragoza.
He is currently an associate professor in the
Department of Electronics and Communica-
tions Engineering at the University of Zara-
goza and carries out his research activities in the Aragon Institute of
Engineering Research (I3A). His research interests are in computer
vision and human-machine interfaces. He has participated in sev-
eral national and international projects. He supervises several MSc
students in computer vision, biometrics, and human motion analysis.
Carlos Medrano received the MS degree in
physics from the University of Zaragoza,
Zaragoza, Spain, in 1994, and the PhD de-
gree in 1997, jointly from the University of
Zaragoza and from Joseph Fourier Univer-
sity, Grenoble, France. His PhD was devel-
oped at the European Synchrotron Radia-
tion Facility (ESRF), Grenoble, France,
studying x-ray imaging techniques for mag-
netic materials. He is a lecturer in the Elec-
tronics Department at the Polytechnic Uni-
versity School of Teruel, Spain, where he has been employed since
1998. After some years dedicated to computer-based control sys-
tems, his current research interests include aspects of computer
vision such as real-time tracking and activity recognition.
Miguel A. Montañés-Laborda graduated in
electronic engineering from the University of
Zaragoza in 2001. He is currently complet-
ing his MSc degree in systems engineering
and computing and the second cycle of
electronic and automatic engineering, both
from the University of Zaragoza. In Septem-
ber 2004, he joined the Computer Vision
Laboratory of the Aragon Institute of Engi-
neering Research (I3A) as a scientific de-
veloper.