Multicamera sport player tracking with Bayesian
estimation of measurements
Jesús Martínez-del-Rincón
Elías Herrero-Jaraba
J. Raúl Gómez
Carlos Orrite-Uruñuela
Carlos Medrano
Miguel A. Montañés-Laborda
Aragón Institute for Engineering Research
Computer Vision Laboratory
Maria de Luna 1
50018, Zaragoza, Spain
Abstract. We propose a complete application capable of tracking
multiple objects in an environment monitored by multiple cameras.
The system has been specially developed to be applied to sport games,
and it has been evaluated in a real association-football stadium. Each
target is tracked using a local importance-sampling particle filter in each
camera, but the final estimation is made by combining information
from the other cameras using a modified unscented Kalman filter algo-
rithm. Multicamera integration enables us to compensate for bad
measurements or occlusions in some cameras thanks to the other
views it offers. The final algorithm results in a more accurate system
with a lower failure rate.
© 2009 Society of Photo-Optical Instrumentation
Engineers. DOI: 10.1117/1.3114605
Subject terms: computer vision; tracking; image analysis; image processing; machine vision; pattern recognition.
Paper 080816R received Oct. 15, 2008; revised manuscript received Feb. 5,
2009; accepted for publication Feb. 6, 2009; published online Apr. 10, 2009.
1 Introduction
Professional sport is an extremely competitive world. Mass
media coverage has contributed to the popularity of sport,
increasing its importance in current society due to the
money and fame that it generates. In this environment, in
which any assistance is welcome, video-based applications
have proliferated.
Video-based approaches have shown themselves to be
an important tool in analysis of athletic performance, espe-
cially in sport teams, where many hours of manual work
are required to analyze tactics and collaborative strategies.
Computer-vision-based methods can provide help in auto-
mating many of those tasks.
Sport analysis can be considered as a classic human ac-
tivity recognition problem with several distinctive con-
straints and requirements, for instance, a fixed number of
targets. The large amount of interaction between players during a complete match makes it difficult to track all players with a single camera. In this
work, we have concentrated our efforts on developing a
tracking application capable of managing multiple sensors
in order to track multiple objects simultaneously.
The reliability of a tracking system depends on the quality of the observations we are able to provide it; it is very sensitive to incorrect and inaccurate measurements.
Thus, our proposal is based on a double-tracking strategy.
First, a tracking filter in the image is responsible for ex-
tracting a robust and temporally coherent observation. The
particle filter (PF)1-3 has been chosen for its capacity to model non-Gaussian, nonlinear distributions, which gives it a competitive advantage in the image plane, where the perspective effect and complex interactions produce a nonlinear environment. Therefore, PFs are applied to each camera, which permits us to send an indication of reliability to the second tracker, as well as providing more accurate and more robust measurements.
Second, multiview tracking on the ground plane, simpli-
fied thanks to the previous image stage, manages the track-
ing of complex interactions between simple objects and
achieves multisensor conjugation. A variant of the Kalman
filter, called the unscented Kalman tracker,4,5 will be able to
model the player position and velocity, since a unimodal
distribution is a good approximation of a football player in
a zenithal view. The algorithm receives multiple measure-
ments from each camera, applies a data association method
that establishes the correspondences between measure-
ments and trackers, and estimates the new positions of all
players.
This double-tracking strategy also implies a feedback
between the two trackers but prevents a categorical deci-
sion from being taken over the whole image. If a hypoth-
esis is rejected on the plane of the field using multicamera
information, the decision will be corrected. This feedback
procedure assures that the final decision will be made using
all the available information coming from all the cameras.
Thus, each camera can correct its estimation with the infor-
mation of the other cameras. Furthermore, the feedback
monitors the entry and exit of players in the scene.
The rest of this paper is organized as follows: in Sec. 2
we briefly explain the system architecture. In Sec. 3 we
introduce the processing software and the general scheme
of the system. In Sec. 4 we present the single-view stage
and the monocular tracking. Section 5 presents the tracking algorithm based on the unscented Kalman filter (UKF) and multicamera integration. The feedback procedure is explained in Sec. 6. Results are presented in Sec. 7, and
conclusions in Sec. 8.
0091-3286/2009/$25.00 © 2009 SPIE
Optical Engineering 48(4), 047201 (April 2009)
1.1 Previous Works
Numerous publications about multi-target tracking applied
to sports analysis exist, with many different applications,
such as highlight extraction, tactical analysis, or improve-
ment of athletic performance.
Highlight extraction is a useful application for TV broadcasting and for automatic labeling of recorded video databases. Working with a dynamic and uncalibrated camera is an essential requirement. References 6-8 stand out in this field of application.
Nevertheless, most papers are focused on tracking play-
ers, either working with mobile cameras or with multiple
cameras. In the first option, the PF is shown to be the most
adequate tracking algorithm, due to its advantage in man-
aging multiple targets. However, maintaining multimodal-
ity is not an easy task, and tracking failures appear in player
self-occlusions. This effect is due to the fact that two play-
ers of the same team are virtually identical. To deal with
this problem, Choi and Seo9 create a synthesized image of the occlusion out of template images saved previously. In Ref. 10, Vermaak et al. apply a mixture of PFs, which maintains the multimodality. By generating human models with different techniques, such as a neural network11 or a point distribution model (PDM),12,13 results are improved over those of the traditional color likelihood.
Tracking players with a single mobile camera can work
during short sequences, but requires complex procedures
for entry and exit of players, and for maintaining consistent labels during a complete match. Furthermore, the motion dynamics in the image are affected by the perspective effect. For this reason, a shared reference, such as a plan, is
commonly used to simplify the problem. Examples are shown in Refs. 14 and 15, where a dynamic calibration is recalculated at each frame.
A more widespread option consists in the use of several
static cameras with a shared reference. An exhaustive ap-
proach to the whole problem is tackled in Refs. 16 and 17,
which were written in the framework of the INMOVE
project. A double-tracking strategy (image and plane) is employed to track all the players, although the simple algorithm applied in the image cannot handle all the complex situations that appear during a football match.
Football is a collaborative sport, and the ball is an es-
sential element of the game. However, its tracking is com-
plex because its movements are three-dimensional, in con-
trast with the movements of the players, who are always in
the same plane. Several papers tackle this problem,18,19 but it remains unsolved. Ball tracking is outside the scope of this paper and will be addressed in future work.
2 System Architecture
The system input is composed of video data from static
analog cameras with overlapping fields of view at an
association-football stadium. The cameras are positioned
around the stadium on the roof, at a height of 25 m. A compromise between cost and good performance (with regard to resolution, overlap, etc.) has been sought. A detailed scheme of the locations can be viewed in Fig. 1. All cameras have been calibrated to a common ground-plane coordinate system using a homographic transformation.20

Fig. 1 Camera locations and fields of view on the football pitch.
The video flow is captured by a video capture card con-
nected to a PC. Each card has four independent channels
with four digital signal processors, which allows us to
record video at a rate of 14 frames per second, using a
hardware MPEG-4 video compression codec. In order to
record the information provided by eight cameras, two
computers have been installed. The system has been de-
signed to exploit synchronization between cameras. The
four cameras connected to each computer are synchronized
by the video capture card; the synchronization between the
two recording servers is obtained using an ad hoc wireless network (WiFi), which synchronizes the system clocks of
the two computers.
Recorded videos are sent to the processing server. This
server comprises eight single-camera processing comput-
ers, a multicamera integration server, which receives data
from each camera processor and gives the final estimation,
and a GigaLAN switch, which links all computers and en-
ables message transfer. The multicamera integration server
directs and controls the process. It is the device in charge of
maintaining the synchronization between cameras as well
as obtaining the positions of all players at each time step.
The infrastructure and its connection can be seen in Fig. 2.
Each computer associated with each camera processes
its corresponding frame, obtaining a set of features and a
first hypothetical position of targets in the image. When it
has finished, the data are transmitted to the server. A mul-
ticamera server awaits the responses of all cameras and
updates the state estimation of the player on the pitch. Fi-
nally, the server sends a message to permit the camera pro-
cessor to continue with the next frame. However, this message is not a simple acknowledgment: it carries feedback information used to correct failures in the image. A more detailed explanation of this process is given in
Sec. 5.
The algorithms were implemented in Visual C++ and programmed with a multithreaded design.
3 General Scheme of Processing
The processing algorithm (Fig. 3) can be divided into two main parts: the single-view processing stage, which is ap-
plied to each camera independently, and the multiview pro-
cessing stage, which integrates the previous results and
gives us the final estimation state.
Each target is tracked using a local importance-sampling
PF in each camera, but the final estimation is made by
combining information from the other cameras using a
modified UKF algorithm. Multicamera integration makes it possible to compensate for bad measurements or occlusions in some cameras thanks to the other views. The final algorithm results in a more accurate system with a lower failure rate.
The purpose of the single-view processing stage consists
in extracting a set of hypotheses to be considered in the
multiview tracking process. By applying a tracking algo-
rithm, we obtain a robust-to-occlusion method, which ex-
tracts plausible hypotheses. The PF is a good choice due to
its advantages in multitarget tracking. Color, movement,
and tridimensional information are used to determine the
likelihood of the particles, and to weigh them.
Results of this stage are modeled as Gaussians, whose
mean is the position of each player and whose covariance
represents the reliability of this position. Both are sent to
the multiview tracking algorithm as measurements. In this way, we obtain more robust and more accurate measurements with an extra feature: their reliability.

Fig. 2 System architecture.

Fig. 3 General scheme of the framework based on a double-tracking strategy.
For the multiview tracking process, unscented Kalman
trackers are used. First, a data association algorithm estab-
lishes the correspondences between measurements and
trackers. Then, the UKF algorithm combines all the mea-
surements corresponding to each tracker, taking into ac-
count their reliability.
The output from this process is the 20 player positions per time step (excluding both goalkeepers). The system also indicates the category (team) of each player, and maintains the correct number of players in each category. Furthermore, although the identification of individual players is not possible given the resolution of the input data (only the team is recognized), a label with the name of each player is assigned in the first frame, and it will be tracked during the match.

Finally, the output is sent to each camera to correct failures in the image tracking (the feedback procedure previously mentioned).
It may seem strange to use two different tracking algorithms, a PF on the image and a UKF on the ground, especially since the multimodal probability distribution obtained with the PF is approximated by a Gaussian distribution for use by the UKF, with an evident loss of information. However, this assumption is logical if we think of a
player observed from a zenithal view, where no occlusions
should exist, and a Gaussian is an acceptable representation
of a football player.
The approximation performed helps to solve the data
association problems. Measurement assignment can be very
complex when dealing with multiple cameras and multiple
trackers, as in this case. When two trackers approach each
other, it is necessary to use a measurement assignment al-
gorithm that prevents the two trackers from ending up, after
the occlusion, following the measures generated by only
one of them.
There are several methods to deal with data association
such as the nearest neighbor21 and the auction algorithm,22 but none of them can cope with so many measurements (as many as there are particles in the PF, for each camera and for each tracker) with an acceptable processing time. That is the main reason for having two different tracking algorithms: the PF yields very good tracking in the images, while representing the measurements by their Gaussian approximations allows us to perform a quick and reliable data association of the measurements with the trackers, applying a UKF.
4 Single-View Processing Stage
As mentioned in previous sections, targets in the image are
tracked using PFs. A PF enables us to manage all players in
the same camera using a single filter, but this method re-
quires a lot of particles to be effective, and consequently a
large computational cost. On the other hand, we can apply
a PF to track each player. Although a multiple-target tracker
based on multiple inefficient single-target trackers can re-
duce the overall performance, we maintain successful
tracking of each player by introducing multicamera infor-
mation by means of the feedback procedure. Without this
information, even a unique PF is not able to ensure correct
results, as is shown in Ref. 10. The strength of the proposed
tracker is demonstrated in Sec. 7.
First, we introduce some notions connected with the PF.
After that, we explain our modification in each stage of the
PF: calculation of the prior probability and posterior prob-
ability, and estimation of the final state. The detailed algo-
rithm is presented in Algorithm 1 (Sec. 5).
4.1 Importance-Sampling Particle Filter
In the late 1960s, the idea of sequential importance-
sampling techniques, like the Monte Carlo filter, was con-
ceived. Although interesting, these approaches suffered
from degeneracy. Gordon et al.1 came up with the idea of resampling to cope with this intrinsic degeneracy. This improvement led to the current form of the particle filter,2,3,10,23-25 which is one of the most extensively applied methodologies for tracking multimodal, nonlinear, and non-Gaussian models.
The PF is a hypothesis tracker that approximates the filtered posterior distribution $p(x_t \mid z_{1:t})$ over the target state $x_t$, given all previous observations $z_{1:t}$ up to time $t$, by a set of weighted hypotheses called particles. Following this numerical approach, particles $\{x_{t-1}^{i}, \pi_{t-1}^{i}\}_{i=1}^{N}$ are distributed according to the target density. The filtering distribution is given by

$$p(x_t \mid z_{1:t}) \approx p(z_t \mid x_t) \sum_{i=1}^{N} \pi_{t-1}^{i}\, p(x_t \mid x_{t-1}^{i}), \qquad (1)$$
where $\pi_{t-1}^{i}$ is the weight of particle $x_{t-1}^{i}$. Here $N$ samples $x_t^{i}$ are drawn from a proposal distribution $q(x_t^{i} \mid x_{t-1}^{i}, z_t)$ (also called the importance density). The weights of the new particles can be computed as2

$$\pi_t^{i} \propto \pi_{t-1}^{i}\, \frac{p(z_t \mid x_t^{i})\, p(x_t^{i} \mid x_{t-1}^{i})}{q(x_t^{i} \mid x_{t-1}^{i}, z_t)}. \qquad (2)$$
The choice of an adequate proposal distribution is one of the most critical design issues. Bootstrap particle filters use the state transition prior, $q(x_t^{i} \mid x_{t-1}^{i}, z_t) = p(x_t^{i} \mid x_{t-1}^{i})$, as the proposal distribution to place the particles, since it is intuitive and can be easily implemented.1,3,26 Since this probability does not take into account the most recent observation $z_t$, all the particles may have low likelihood, contributing to an erroneous posterior estimation. Therefore, the exclusive use of the transition probability as the proposal distribution makes the algorithm prone to be distracted by background clutter.
An alternative approach is sampling the observation to improve the efficiency of the PF. The idea consists in using as proposal distribution a function $g(x_t^{i})$ that introduces information about the current observation. Nevertheless, since particles are distributed using this proposal instead of the transition probability, they are not generated from previous particles, and therefore the particles cannot be paired with previous ones to compute the probability $p(x_t^{i} \mid x_{t-1}^{i})$. So an additional function $f_t(x_t^{i})$ is applied to maintain the temporal coherence. This term represents the probability of appearance, which is obtained using the weighted mean over all possible transitions:
$$\pi_t^{i} \propto \frac{p(z_t \mid x_t^{i})\, f_t(x_t^{i})}{g_t(x_t^{i})}, \qquad (3)$$
where

$$f_t(x_t^{i}) = \sum_{j=1}^{N} \pi_{t-1}^{j}\, p(x_t^{i} \mid x_{t-1}^{j}). \qquad (4)$$
Note that this modification implies that the dynamical model is not only used but also evaluated. Although the sum in Eq. (4) increases the complexity of the algorithm from $O(N)$ to $O(N^2)$, the real effect is negligible in practice, because for practical values of $N$ the computational cost of this stage is dwarfed by the time spent on the observation process, for instance. Moreover, a more efficient particle set implies a smaller number of particles and therefore a smaller complexity growth in this stage.
This approach compensates for the particle distribution and ensures that the importance function does not distort the calculation of the posterior probability $p(x_t \mid z_t)$. In this way, any proposal distribution can be chosen if the number of particles, $N$, is large enough.
In practice, the proposal distribution derives from a
rough observation process and might produce errors and
imperfect estimations. In this regard, it is recommended to
add a percentage of particles by conventional sampling.
Our approach is similar to Ref. 27, where an auxiliary
tracker generates a more accurate proposal distribution us-
ing secondary observations. On the other hand, our pro-
posal uses the same features as those of the likelihood func-
tion to improve the sampling process, for two important
reasons: Finding auxiliary observations is not always an
easy task, and these observations themselves need a good
proposal distribution if we want it to help the main tracker.
Otherwise, it will produce worse estimations.
By making proposals that have high conditional likelihood, we reduce the cost of sampling many particles that have very low likelihood, improving the statistical efficiency of the sampling procedure, so that the number of particles can be reduced substantially.
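To make the importance-sampling scheme of Eqs. (2)-(4) concrete, the appearance term $f_t$ and the resulting weights can be sketched as below. This is a minimal illustration under assumed conventions (a Gaussian random-walk transition density, unnormalized likelihood values), not the authors' implementation.

```python
import numpy as np

def appearance_term(particles, prev_particles, prev_weights, sigma):
    # f_t(x_t^i): weighted mean over all transitions from the previous
    # particle set (Eq. 4); a Gaussian random-walk transition is assumed.
    diff = particles[:, None, :] - prev_particles[None, :, :]    # (N, M, d)
    trans = np.exp(-0.5 * np.sum(diff ** 2, axis=2) / sigma**2)  # p(x_t^i|x_{t-1}^j)
    return trans @ prev_weights                                  # (N,)

def importance_weights(likelihood, f, g):
    # pi_t^i proportional to p(z_t|x_t^i) f_t(x_t^i) / g_t(x_t^i) (Eq. 3),
    # normalized to sum to one.
    w = likelihood * f / np.maximum(g, 1e-12)
    return w / w.sum()
```

Particles drawn from the observation-driven proposal $g$ are thus re-anchored to the previous set through $f_t$, which preserves temporal coherence.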
4.2 Prior Probability
In this paper, we present a novel approach to the particle
filter algorithm based on a priori measurement in the color
space, which reduces the costs of evaluating hypotheses
with a very low likelihood. As is shown in Ref. 28, these
hypotheses or particles appear due to poor prior density
estimation, particularly when the object motion between
frames is badly modeled by the dynamic model. The im-
provement of the statistical efficiency of the sampling al-
lows substantial reduction of the number of particles.
Color-based image features are used for the proposal
distribution. To model color density functions we have two
possibilities: parametric and nonparametric. The major ad-
vantage of nonparametric approaches is their flexibility to
represent complicated densities effectively. However, they
suffer from large memory requirements and computational
complexity. On the contrary, parametric approaches sim-
plify the target feature modeling and reduce drastically the
computational cost needed to process them, but the as-
sumptions introduced limit their application to simple dis-
tributions. Parametric models reduce the color distribution
of the target to easily parameterizable functions like a
Gaussian function or a mixture of several Gaussians.
There are many parametric density representations; for instance, Ref. 29 uses Gaussian mixture models (GMMs) in hue-saturation space to model the target's color distributions. Its authors propose an adaptive learning algorithm to update these color models over time. In Refs. 29 and 30, the authors suggest GMMs, but their method requires knowledge of the number of components. In Ref. 31, the authors propose a density approximation methodology in which the density is represented by a weighted sum of Gaussians whose number, weights, means, and covariances are determined automatically.
Unlike parametric models, a nonparametric density esti-
mator is a more general approach that does not assume any
specific shape for the density function, so it is able to rep-
resent very complicated densities effectively. It estimates
the density function directly from the data without any as-
sumptions about the underlying distribution. As mentioned
in Ref. 31, this avoids having to choose a model and estimate its distribution parameters. The histogram is the simplest nonparametric density estimator.
In video-surveillance applications, a color histogram is
an adequate solution to characterize each target, due to its
simple initialization. However, sport provides an environ-
ment with plain colors known previously. So, we use a
GMM to generate each object’s distribution, which we ex-
plain in the next section.
The importance function $g(x_t^{i})$ is introduced in the algorithm as a mask. The mask (see Fig. 4) is obtained by extracting the main colors of the object (each color is a Gaussian) and detecting them in the full image or in the zone surrounding the estimated position (extracted with the last mean state), whose dimensions depend on the position and speed variances. Only hypotheses located in this mask are evaluated and used in the estimation. Moreover, we also apply this information to obtain a fast estimation of the posterior density.
4.3 Posterior Probability
Once prediction has been calculated, the multiple hypoth-
eses generated must be evaluated using an adequate likeli-
hood function. In order to obtain a likelihood function that
will weigh each particle, we combine multiple visual clues:
Fig. 4 Color probability image and prior mask generation algorithm.
Martínez-del-Rincón et al.: Multicamera sport player tracking with Bayesian estimation of measurements
Optical Engineering April 2009/Vol. 484047201-5
color, movement and height difference. Color is the most
discriminative clue, which differentiates between object
and background, but also between different kinds of ob-
jects. Movement cannot distinguish between objects, but is
very useful for eliminating background areas with the same
colors as the objects, for instance, lines that define the field.
Height measurement helps us to compensate the perspec-
tive effect and thus to obtain a better estimation of the real
object size.
4.3.1 Color probability
A new input frame is projected onto the target probability space to generate the color probability image (CPI). We generate a CPI for each kind of object to be tracked. In our particular case, we need two CPIs, one for each team, but more CPIs could be generated: two CPIs per team if the clothes have complex colors, or even one for each person in a video-surveillance application. The values of the CPI pixels are taken from the target's GMM, which is used as a classifier for each pixel in the input picture. The probability assigned to each pixel is given by the distance (using the Mahalanobis distance as metric) to the nearest Gaussian.
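The CPI generation can be sketched as follows. The exponential mapping from squared Mahalanobis distance to probability is an assumption (the text only states that the pixel probability is a function of the distance to the nearest Gaussian), and the HSV array layout is hypothetical.

```python
import numpy as np

def color_probability_image(hsv, means, inv_covs):
    # Project every pixel onto one team's color model: the CPI value is
    # derived from the Mahalanobis distance to the nearest Gaussian.
    h, w, _ = hsv.shape
    px = hsv.reshape(-1, 3).astype(float)
    d2_min = np.full(px.shape[0], np.inf)
    for mu, icov in zip(means, inv_covs):
        diff = px - mu
        d2 = np.einsum('nd,de,ne->n', diff, icov, diff)  # squared Mahalanobis
        d2_min = np.minimum(d2_min, d2)
    return np.exp(-0.5 * d2_min).reshape(h, w)           # assumed mapping
```

One such image is computed per team model; pixels close to a team Gaussian approach 1, background pixels approach 0.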
In order to generate the GMM, we extract pixels corresponding to both teams and to the background (field). Each pixel is projected into the HSV space to reduce the influence of changing illumination and shadows. Using the expectation-maximization (EM) algorithm, we obtain several Gaussians, which represent the whole color space of the environment. The number of Gaussians is chosen according to need. In our case we need two Gaussian models for the two teams, two for the field, and two for the halos surrounding the players due to compression, which can be considered as a mixture of colors. Once we have decided the total number of Gaussians and have applied the EM algorithm, we have to choose the Gaussian corresponding to each team for generating both CPIs. Many validation indices can be applied32,33 for this purpose. However, the color space is complex and contains arbitrary shapes, so we can conclude that no index really guarantees the correct choice.
We have developed a simple index, called the motion validation index (MVI), based on the use of motion, to validate the color segmentation. This index is calculated, for each Gaussian, as the number of pixels that are classified into that cluster ($I_{\mathrm{Gauss}}$) and have been detected as movement ($I_{\mathrm{mov}}$), divided by the total number of pixels classified into that cluster. The Gaussians that obtain the highest values are selected for the corresponding football teams:
$$\mathrm{MVI}_{\mathrm{Gauss}}(i) = \frac{\sum_{x}\sum_{y} I_{\mathrm{Gauss}}(x,y,i)\, I_{\mathrm{mov}}(x,y)}{\sum_{x}\sum_{y} I_{\mathrm{Gauss}}(x,y,i)}, \quad \forall i \in G, \qquad (5)$$

where $G$ is the whole GMM, and $x$ and $y$ are the columns and rows of the images to be processed.
The MVI index also checks that an adequate number of
Gaussians have been selected to model the whole color
space. If the sum of the MVI indices corresponding to the
Gaussians associated with each football team decreases
when we split or merge the Gaussians, we will stop and
select the optimum number of Gaussians.
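Given a per-pixel cluster labeling and the binary motion image $I_{\mathrm{mov}}$, Eq. (5) amounts to a per-cluster ratio. The sketch below assumes label and mask arrays of equal shape; it is an illustration, not the authors' code.

```python
import numpy as np

def motion_validation_index(labels, motion_mask, k):
    # MVI (Eq. 5): for each Gaussian i, the fraction of pixels classified
    # into cluster i that were also detected as movement.
    mvi = np.zeros(k)
    for i in range(k):
        in_cluster = labels == i
        n = in_cluster.sum()
        mvi[i] = (in_cluster & motion_mask).sum() / n if n else 0.0
    return mvi
```

The clusters with the highest MVI would then be assigned to the two teams, e.g. `team_ids = np.argsort(mvi)[-2:]`.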
In order to compensate for illumination changes, the means and covariances of the Gaussians classified as background and halo are updated using an online updating algorithm.34 The Gaussians associated with each team are not updated, due to the risk of introducing noise.
Once the CPIs have been calculated (Fig. 4), the probability associated with each particle can be extracted. The state of every particle defines the width $W$ and height $H$ of the target, as well as the position of its center, $(x_0, y_0)$. Using these parameters, a rectangular kernel with the same dimensions as the predicted target enables us to obtain the posterior probability.
This rectangular kernel can be computed with a large advantage in computational cost: rectangular filters can be computed very efficiently using the integral-image method proposed by Viola and Jones.35 The integral image at location $(x_0, y_0)$ contains the sum of the pixels above and to the left of $(x_0, y_0)$, inclusive. Using the integral image, any rectangular sum can be computed from four points. For instance, the sum within the inner rectangle of Fig. 5 can be computed as $A + D - B - C$. So, the new particle set is updated following the expressions

$$\tilde{\pi}_t^{c}(n) = D + A - C - B, \qquad \pi_t^{c}(n) = \frac{\tilde{\pi}_t^{c}(n)}{\sum_{n} \tilde{\pi}_t^{c}(n)}. \qquad (6)$$

This computation is fast, thanks to the integral image previously calculated.
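The four-point rectangular sum can be sketched as follows (a generic illustration of the Viola-Jones integral image, not the authors' code; the corner naming follows the $A + D - B - C$ convention above).

```python
import numpy as np

def integral_image(img):
    # ii(x, y) holds the sum of all pixels above and to the left, inclusive.
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    # Sum over img[top:bottom+1, left:right+1] from four corner look-ups.
    D = ii[bottom, right]
    A = ii[top - 1, left - 1] if top > 0 and left > 0 else 0
    B = ii[top - 1, right] if top > 0 else 0
    C = ii[bottom, left - 1] if left > 0 else 0
    return D + A - B - C
```

After the single cumulative-sum pass, every candidate rectangle costs four array look-ups regardless of its size, which is what makes evaluating many particle kernels cheap.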
4.3.2 Motion probability
The motion probability is a weight assigned to each particle, based on the number of pixels of motion that it contains. Its values are computed using the same kernel used for calculating the color probability, but in this case over the motion-detection image $I_{\mathrm{mov}}$:

$$\tilde{\pi}_t^{m}(n) = \frac{\sum_{x=x_0-W/2}^{x_0+W/2}\; \sum_{y=y_0-H/2}^{y_0+H/2} I_{\mathrm{mov}}(x,y)}{HW}, \qquad \pi_t^{m}(n) = \frac{\tilde{\pi}_t^{m}(n)}{\sum_{n} \tilde{\pi}_t^{m}(n)}. \qquad (7)$$

Fig. 5 Convolution kernel to enhance target candidates on the integral image. Here $W$ and $H$ stand for the predicted target width and height.
4.3.3 Height difference probability
The last weight is defined as the difference between the height $H$ stored in the state vector of the particle and the height that the particle must have because of its new position given by the propagation stage, $\mathrm{Height}(x_0, y_0)$. This height is given by an algorithm, called the height estimator, which is explained in Sec. 5 (Algorithm 2). We then have

$$\tilde{\pi}_t^{h}(n) = \left[H - \mathrm{Height}(x_0, y_0)\right]^{\kappa}, \qquad \pi_t^{h}(n) = \frac{\tilde{\pi}_t^{h}(n)}{\sum_{n} \tilde{\pi}_t^{h}(n)}, \qquad (8)$$

where $\kappa$ is a constant that fixes the discriminative power of the weight and depends on the field of view of the camera and the lens parameters.
4.3.4 Estimation of the new state
Finally, once the last a posteriori feature has been obtained, all these features must be combined in order to estimate the new state of the object to be tracked. The score for each candidate is calculated using the multiplication rule, that is, assuming independence between features:

$$\pi_t(n) = \pi_t^{c}(n) \cdot \pi_t^{m}(n) \cdot \pi_t^{h}(n). \qquad (9)$$
The particle set corresponding to this candidate is used to estimate the new state by means of a weighted average,

$$\mathrm{ES}_t = \sum_{n=1}^{N} \pi_t(n)\, \frac{f_t(x_t^{n})}{g_t(x_t^{n})} \cdot x_t^{n}, \qquad (10)$$

where a correction factor due to the importance sampling must be included in the particle weights.
However, this estimation is independent for each cam-
era, and it can fail if there are occlusions in the image.
Using multicamera information, this effect is reduced, as
we explain in Sec. 6. A global view of the algorithm can be seen in Fig. 6 and in Algorithm 1 (Sec. 5).
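The combination of the three feature weights (Eq. 9) with the importance-sampling correction and weighted average (Eq. 10) can be sketched as follows, assuming the per-feature weight vectors have already been normalized.

```python
import numpy as np

def estimate_state(particles, w_color, w_motion, w_height, f, g):
    # Multiplication rule over independent features (Eq. 9) ...
    w = w_color * w_motion * w_height
    # ... corrected by the importance-sampling factor f/g (Eq. 10).
    w = w * f / np.maximum(g, 1e-12)
    w = w / w.sum()
    return w @ particles  # ES_t: weighted mean of the particle states
```

The returned estimate is what each single-camera processor sends, together with its covariance, to the multiview stage.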
5 Multisensor Data Fusion
Once all players have been detected in each camera, we
project all the measurements onto the plan in order to have
a shared reference space. Each hypothesis is projected, con-
verting it to a single point, which is on the floor. When
these transformations have been made, multicamera track-
ing is applied. For this purpose, we use the UKF as tracking
algorithm.
The UKF4,5 is a very popular tracking algorithm, which
is a very popular tracking algorithm, which
provides a way of processing nonlinear but Gaussian mod-
els. We propose a modified UKF to extend its application to
multisensor scenes, thus improving the global result. The
combination of several independent sensors increases the
precision and robustness of our tracking system, since it
makes it possible to solve difficult situations, such as oc-
clusions or noise. Multicamera tracking systems have been
exhaustively proposed in previous literature, for instance,
in Refs. 3638.
First of all, in order to integrate measurements from dif-
ferent cameras, a shared reference is needed. Our reference
is a plan of the football field, and a homographic matrix for
each camera has been calculated. Thus, we can transform
points from the image to the coordinate system of the plan.
In addition, we need a height estimator, which also implies a previous calibration. Before explaining the tracking algorithm, both procedures are described in depth in the following subsections.
Algorithm 1 (Single-view stage algorithm). Given a particle set \{x_{t-1}^n, \pi_{t-1}^n\}_{n=1}^{N}, which represents the posterior probability p(x_{t-1}|z_{t-1}) at time t-1:

1. Generate N_1 new samples from the importance function, x_t^n \sim g(x_t). Particles are distributed in the intersection between an ellipse surrounding S_t, with radius proportional to the position variance, and the color model mask.
2. Propagate N_2 samples from the samples generated in the previous time step by resampling x_t^n \sim p(x_t|x_{t-1}^n). We obtain a number N_t = N_1 + N_2 of particles (in general N_t \neq N_{t-1}).
3. Weight the particles with \pi_t^n \propto p(z_t|x_t^n)\, f_t(x_t^n)/g_t(x_t^n), using Eq. (9), and normalize them: \sum_{n=1}^{N_t} \pi_t^n = 1.
4. Reweight the particles to include the effect of the feedback procedure: w_t^n = \pi_t^n \cdot \rho(x_t^n, x_t^{plane}), where \rho is a distance function between the target in the image and on the plan.
5. Estimate the new position of the state: \mathrm{ES}_t = \sum_{n=1}^{N_t} w_t^n x_t^n.
5.1 Homographic Transformation
Taking a minimum of four points, we can establish the correspondence between the floor plane in the image and the plan of the field (Ref. 20). With this transformation, we can locate the position of players on the plan, assuming that the player is in contact with the floor. One homographic matrix must be calculated for each camera.
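A minimal sketch of how such a homography could be estimated from four point correspondences, using the standard DLT linear system with NumPy (the pitch coordinates below are made-up values, not our calibration data):

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 homography H mapping src points to dst points
    from four (or more) correspondences via the DLT linear system."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # the homography is the null vector of A (last right-singular vector)
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def to_plan(H, x, y):
    """Project an image point (e.g. a player's foot) onto the plan."""
    p = H @ np.array([x, y, 1.0])
    return p[:2] / p[2]

# hypothetical image corners of the pitch and their plan coordinates (m)
img = [(100.0, 50.0), (700.0, 60.0), (650.0, 400.0), (120.0, 420.0)]
plan = [(0.0, 0.0), (105.0, 0.0), (105.0, 68.0), (0.0, 68.0)]
H = homography_from_points(img, plan)
```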
Fig. 6 Single-view tracking algorithm combining importance sam-
pling, multi-cue conjugation, and multicamera feedback estimation.
Martínez-del-Rincón et al.: Multicamera sport player tracking with Bayesian estimation of measurements
Optical Engineering, April 2009, Vol. 48(4), 047201
5.2 Height Estimator
As we mentioned in Sec. 4.3.3, we require a tool for obtaining the number of pixels that represents the average height of a person. Due to the perspective effect, this number depends on the location of the person in the image. We are able to ascertain this through scene calibration (Ref. 39).
First, we have to obtain a plane perpendicular to the floor, which itself has been defined with the four points used to calculate the homographic matrices. For this purpose, we have to extract four points from the walls, goal posts, or any other vertical structure. Knowing both planes, we can calculate three vanishing points (two horizontal and one vertical). These vanishing points permit us to project any point onto the coordinate axes, and elevate it vertically by a number of pixels corresponding to the height of the target at this point of the image. This number of pixels has been determined by a reference height in the image, that is, by marking two points in the image, which are projected onto the coordinate axes, and giving the real height in meters. We use the goal posts as reference; in the cameras in whose field of view there are no vertical structures, we can use a person of standard reference height, assuming everybody has the same height.
This methodology is able to return the head point given the foot point, or to return the height given both points. We use the first application to determine the number of pixels corresponding to a player's height at a given location.
The height estimator algorithm is shown in Algorithm 2
and Fig. 7. Homogeneous coordinates are utilized to sim-
plify the mathematical operations.
Algorithm 2 (Height estimation algorithm).

• Calculate the slope of the line H2–H1: v = (y_{H2} - y_{H1}) / (x_{H2} - x_{H1}).
• Estimate point H2, supposing the reference height to be 1.8 m, and using the proportion between the number of pixels and the reference height in meters.
• Calculate the line L_{H2,PFII} = H2 \times PFII in homogeneous coordinates, that is, H2 = (x_{H2}, y_{H2}, 1).
• Calculate the line L_{A,PFI} = A \times PFI.
• Calculate the axis Y: L_{H1,PFII} = H1 \times PFII, where H1 is the coordinate origin.
• Calculate the point A' = L_{A,PFI} \times L_{H1,PFII}.
• Calculate the line L_{A',PFIII} = A' \times PFIII.
• Calculate the point B' = L_{A',PFIII} \times L_{H2,PFII}.
• Calculate the line L_{A,PFIII} = A \times PFIII.
• Calculate the line L_{B',PFI} = B' \times PFI.
• Calculate the point B = L_{B',PFI} \times L_{A,PFIII}.
• Calculate the height: \mathrm{Height}(x_A, y_A) = [(x_A - x_B)^2 + (y_A - y_B)^2]^{1/2}.
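Since the algorithm works in homogeneous coordinates, every "calculate the line/point" step above reduces to a cross product. A small self-contained sketch of these two primitives (the toy points are illustrative, not the actual vanishing points PFI–PFIII):

```python
import numpy as np

def line_through(p, q):
    """Line through two homogeneous points: their cross product."""
    return np.cross(p, q)

def intersection(l, m):
    """Intersection of two homogeneous lines: their cross product,
    de-homogenized so the last coordinate is 1."""
    p = np.cross(l, m)
    return p / p[2]

# toy check: the diagonal through (0,0) and (1,1) meets the
# anti-diagonal through (0,1) and (1,0) at (0.5, 0.5)
P1, P2 = np.array([0.0, 0.0, 1.0]), np.array([1.0, 1.0, 1.0])
Q1, Q2 = np.array([0.0, 1.0, 1.0]), np.array([1.0, 0.0, 1.0])
X = intersection(line_through(P1, P2), line_through(Q1, Q2))
```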
5.3 Multicamera UKF
In this subsection we present a modification of the UKF that combines, for each tracked object, several measurements provided by different cameras. Due to the use of several sensors as measurement sources, we call the algorithm the multicamera unscented Kalman filter (MCUKF) (Ref. 40). This algorithm can be extended to different types of sensors (Ref. 41).
The filter can be divided into three stages: state predic-
tion, measurement prediction, and estimation. The scheme
of this process is shown in Fig. 8. An external matching
process must be used in order to make correspondences
between trackers and measurements.
5.3.1 State prediction
In the prediction stage, the tracker is initialized with the last estimation done in the previous time step. Hence, knowing the previous state \hat{x}_{k-1}, with e \times 1 components, and its covariance \hat{P}_{k-1}, with e \times e components, both the extended covariance \hat{P}^a_{k-1} and state \hat{x}^a_{k-1} can be obtained by concatenating the previous parameters and the state noise v_k. This
Fig. 7 Left and middle: estimated height for several players in the image and their projection onto the
height reference. Right: height estimation procedure.
Fig. 8 Multicamera unscented Kalman filter algorithm for multitarget
tracking.
is a simplification of the UKF, in which both state and measurement noises are used. The measurement noise will be used in the measurement prediction stage:

\hat{x}^a_{k-1} = [\hat{x}_{k-1}^T \;\; E\{v_k\}]^T \quad \text{with} \quad E\{v_k\} = [0\; 0\; \cdots\; 0]^T,    (11)

\hat{P}^a_{k-1} = \begin{bmatrix} \hat{P}_{k-1} & 0 \\ 0 & R^v \end{bmatrix},    (12)
where R^v is the state noise matrix. The number of sigma points is 2n + 1, where n is the length of the extended state. Following the classical equation for the unscented transform, the first sigma point corresponds to the previous frame estimation, the next n sigma points are the previous estimation plus each column of the matrix square root of the scaled extended covariance, and the last n points are the previous estimation minus the same columns:

X^a_{k-1} = \left[\hat{x}^a_{k-1} \;\;\; \hat{x}^a_{k-1} + \left((n+\lambda)\hat{P}^a_{k-1}\right)^{1/2} \;\;\; \hat{x}^a_{k-1} - \left((n+\lambda)\hat{P}^a_{k-1}\right)^{1/2}\right].    (13)
The components of these sigma points can be divided into two groups: those derived from the state, X^x_{k-1}, and those derived from the state noise, X^v_{k-1}.
The weights assigned to each sigma point are calculated in the same way as in the unscented transformation. Therefore, the 0th weight is different for the mean weight W_0^m and the covariance weight W_0^c:

W_0^m = \frac{\lambda}{n + \lambda},
\qquad
W_0^c = \frac{\lambda}{n + \lambda} + (1 - \alpha^2 + \beta),
\qquad
W_i^m = W_i^c = \frac{1}{2(n + \lambda)}, \quad i = 1, 2, \ldots, 2n,    (14)

where \lambda = \alpha^2 (n + k) - n is a scale parameter. The constant \alpha controls the spread of the sigma points around the mean \bar{x}; it has a small positive value (usually 10^{-4} \le \alpha \le 1). The constant k is a secondary scaling parameter, usually with values between 0 and 3 - n. Finally, \beta is used to incorporate previous knowledge of the distribution of x.
In order to predict the sigma points at the k'th instant, knowing the previous points, the transition matrix F is first required; we use a constant-velocity model. The state prediction is

\hat{x}_{k|k-1} = F \cdot \hat{x}_{k-1}.    (15)

The sigma points at the next instant of time are

X^x_{k|k-1} = F \cdot X^x_{k-1} + X^v_{k-1}.    (16)
With these points and their weights, the predicted mean and covariance are given by

\hat{x}_{k|k-1} = \sum_{i=0}^{2n} W_i^m X^x_{i,k|k-1},
\qquad
\hat{P}_{k|k-1} = \sum_{i=0}^{2n} W_i^c \left[X^x_{i,k|k-1} - \hat{x}_{k|k-1}\right]\left[X^x_{i,k|k-1} - \hat{x}_{k|k-1}\right]^T.    (17)

A graphic representation of this process is depicted in Fig. 9.
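A compact sketch of Eqs. (13) and (14), using a Cholesky factor as the matrix square root (the parameter defaults are common textbook choices, not necessarily the values used in our system):

```python
import numpy as np

def sigma_points(x, P, alpha=1e-3, beta=2.0, kappa=0.0):
    """Sigma points and weights of the unscented transform
    (Eqs. (13) and (14)); a Cholesky factor serves as matrix square root."""
    n = x.size
    lam = alpha ** 2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * P)
    pts = np.empty((2 * n + 1, n))
    pts[0] = x                            # 0th point: previous estimation
    for i in range(n):
        pts[1 + i] = x + S[:, i]          # plus each column of the root
        pts[1 + n + i] = x - S[:, i]      # minus the same columns
    Wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    Wc = Wm.copy()
    Wm[0] = lam / (n + lam)
    Wc[0] = lam / (n + lam) + (1.0 - alpha ** 2 + beta)
    return pts, Wm, Wc

x = np.array([1.0, 2.0])
P = np.eye(2)
pts, Wm, Wc = sigma_points(x, P)
mean = Wm @ pts                           # recovers the mean exactly
```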
5.3.2 Measurement prediction stage
The second contribution to the original UKF consists in obtaining the measurement prediction taking into account the measurements and measurement noises of each camera. In the measurement prediction stage, the first step consists in calculating the state predictions and the measurement-tracker matching. Next, using the predictions and the measurement noise, both the extended state \hat{x}^a_k and the covariance \hat{P}^a_k can be built. The concatenated measurement noise matrix R^n is built from the measurement noise matrices of each camera, R^n_i, with r \times r components, where r is the dimensionality of the measurement and i = 1, 2, \ldots, S, with S the number of cameras:

\hat{x}^a_k = [\hat{x}_{k|k-1}^T \;\; 0 \;\; 0 \;\; \cdots]^T,
\quad
\hat{P}^a_k = \begin{bmatrix} \hat{P}_{k|k-1} & 0 \\ 0 & R^n \end{bmatrix},
\quad
R^n = \begin{bmatrix} R^n_1 & & 0 \\ & \ddots & \\ 0 & & R^n_S \end{bmatrix}.    (18)
In such a case, a tracker with S measurements, from S different cameras, will have an extended state vector with n = rS + e components, and 2(rS + e) + 1 sigma points:

X^a_k = \left[\hat{x}^a_k \;\;\; \hat{x}^a_k + \left((n+\lambda)\hat{P}^a_k\right)^{1/2} \;\;\; \hat{x}^a_k - \left((n+\lambda)\hat{P}^a_k\right)^{1/2}\right],    (19)
and we have
Fig. 9 State prediction of mean and sigma points.
W_0^m = \frac{\lambda}{n + \lambda},
\qquad
W_0^c = \frac{\lambda}{n + \lambda} + (1 - \alpha^2 + \beta),
\qquad
W_i^m = W_i^c = \frac{1}{2(n + \lambda)}, \quad i = 1, 2, \ldots, 2n.    (20)

The sigma point components can be divided into components derived from the state, X^x_k, and components derived from the measurement noise, X^n_k, which can be separated according to the measurement 1, 2, \ldots, S: X^{n1}_k, X^{n2}_k, \ldots, X^{nS}_k.
The measurement matrix H, which makes the transformation from state coordinates to measurement coordinates, is applied to the sigma points to obtain the measurement prediction sigma points for each camera:

Y^s_{k|k-1} = H \cdot X^x_k + X^{ns}_k, \quad s = 1, 2, \ldots, S.    (21)
Using these S sets of sigma points, we can obtain, for each measurement, the measurement prediction, the covariance prediction, and the measurement-state cross-covariance:

\hat{y}^s_{k|k-1} = \sum_{i=0}^{2n} W_i^m Y^s_{i,k|k-1},    (22)

P^s_{\hat{y}_k \hat{y}_k} = \sum_{i=0}^{2n} W_i^c \left[Y^s_{i,k|k-1} - \hat{y}^s_{k|k-1}\right]\left[Y^s_{i,k|k-1} - \hat{y}^s_{k|k-1}\right]^T,    (23)

P^s_{\hat{x}_k \hat{y}_k} = \sum_{i=0}^{2n} W_i^c \left[X^x_{i,k|k-1} - \hat{x}_{k|k-1}\right]\left[Y^s_{i,k|k-1} - \hat{y}^s_{k|k-1}\right]^T.    (24)

These equations are depicted in Fig. 10, with a two-camera example.
5.3.3 Estimation stage
First, a Kalman gain for each measurement associated to the tracker is calculated:

K^s_k = P^s_{\hat{x}_k \hat{y}_k} \left(P^s_{\hat{y}_k \hat{y}_k}\right)^{-1}.    (25)
After that, measurements from different cameras must be combined to obtain a shared estimation (Fig. 11). The weights \omega_s play the role of combining the different measurements according to their reliability. Each weight is composed of two factors: the distance to the prediction, and the covariance of the corresponding measurement. The two are combined according to the importance given to each one.
The set of weights is normalized, since the sum of the weights must be 1. The mean and covariance estimates are then

\hat{x}_k = \hat{x}_{k|k-1} + \sum_{s=1}^{S} \omega_s K^s_k \left(y^s_k - \hat{y}^s_{k|k-1}\right),    (26)

\hat{P}_k = \hat{P}_{k|k-1} - \sum_{s=1}^{S} \omega_s K^s_k P^s_{\hat{y}_k \hat{y}_k} \left(K^s_k\right)^T.    (27)
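The estimation stage can be condensed as follows (a sketch assuming precomputed per-camera covariances and reliability weights; all variable names and numbers are illustrative):

```python
import numpy as np

def mcukf_estimate(x_pred, P_pred, y_meas, y_pred, P_yy, P_xy, omega):
    """Combine S camera measurements into a single estimate
    (Eqs. (25)-(27)): one Kalman gain per camera, weighted by omega."""
    omega = np.asarray(omega, dtype=float)
    omega = omega / omega.sum()          # weights must sum to 1
    x = np.array(x_pred, dtype=float)
    P = np.array(P_pred, dtype=float)
    for w, y, yp, Pyy, Pxy in zip(omega, y_meas, y_pred, P_yy, P_xy):
        K = Pxy @ np.linalg.inv(Pyy)     # Eq. (25): per-camera gain
        x = x + w * (K @ (y - yp))       # Eq. (26): weighted innovation
        P = P - w * (K @ Pyy @ K.T)      # Eq. (27): covariance reduction
    return x, P

# two cameras, each observing the x position of a state [x, vx]
xk, Pk = mcukf_estimate(
    np.array([0.0, 1.0]), np.eye(2),
    [np.array([0.4]), np.array([0.6])],      # measurements y_k^s
    [np.array([0.0]), np.array([0.0])],      # predictions for each camera
    [np.array([[2.0]]), np.array([[2.0]])],  # innovation covariances
    [np.array([[1.0], [0.0]])] * 2,          # cross covariances
    [0.5, 0.5])                              # reliability weights
```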
5.3.4 Comparison of MCUKF versus UKF
We can observe the benefits of the MCUKF in Fig. 12, where a comparison between our multicamera tracking algorithm and two independent UKFs is established. The comparison is made between the MCUKF estimation and the mean of two independent estimations obtained with two single-camera UKFs. For low state noise R^v or measurement noise R^n, the two algorithms give similar results. However, the MCUKF obtains a lower mean squared error when the noise level rises.
5.4 Data Association
In the previous subsections, we have described a tracking algorithm for a single object. In a real application, however, there are several targets moving in the same region and interacting with each other. Thus, we need to assign to each tracker the corresponding measurement, chosen among all the measurements and the possible existing distracters. This assignment is made in the matching stage.
Given the difficulties that multiple-target tracking involves, several techniques have been proposed in the literature. Data association between observations and trackers is the problem to be solved, and coalescence (meaning that the tracker associates more than one trajectory with some targets while losing track of others) is the most challenging
Fig. 10 Hypothetic sigma point distribution for measurements from
two different cameras. These points adjust their positions to repre-
sent the measurement covariance placed on the prediction.
Fig. 11 Graphic scheme of the estimation.
difficulty, especially when similar targets move close together or present occlusions. Moreover, cluttered scenarios produce false alarms, which introduce confusion into the association algorithm. A popular approach that tries to cope with this problem is the Markov chain Monte Carlo (MCMC) PF (Ref. 42). This method models the interaction of targets explicitly by removing measurements that fall within a certain radius of other target predictions.
Another emerging technique to deal with multitarget tracking is based on a random-set perspective, which was detailed by Goodman et al. (Ref. 43). Using this mathematical tool, Mahler (Ref. 44) derived the probability hypothesis density (PHD) filter. This filter avoids the necessity of data association and can handle targets and observations that are created and destroyed during the tracking. Sequential Monte Carlo techniques for random-set-based filters, including the PHD filter proposed in Ref. 45 and the Gaussian solution to the PHD filter proposed in Ref. 46, have led to many multitarget tracking applications. A higher-order random-set multitarget filter called the cardinalized PHD was proposed in Ref. 47, and closed-form solutions were published in Ref. 48. Lately, Pham et al. have extended its application to multisensor scenarios (Refs. 49 and 50).
Fortunately, although in general multitarget tracking deals with state estimation of a variable number of targets, assumptions about a constant or known number of targets can be used to constrain the problem. This is exactly our case: since the number of football players is constant, we can simplify the problem and choose a much simpler and more efficient approach. We base our multitarget proposal on a set of independent filters, utilizing the multisensor redundancy and the feedback procedure for handling the coalescence among targets. Although less general, independent filters have some advantages: they lead to a linear growth in the computation; they do not require a careful design of the proposal density, which played an important role in the success of previous methods such as MCMC; and the inclusion of new trackers in the image (when a player goes into the visual field of a camera) does not affect the approximation accuracy of the existing trackers.
For those reasons, the goal of this stage consists in selecting the measurements for each independent tracker. This is the only stage that cannot be computed sequentially or independently for each tracker. The proposed algorithm is based on the theory of data association presented in Ref. 51. In this context, we have chosen a modified version of the nearest-neighbor algorithm (Ref. 21). The main difference from the original algorithm is a simple approach that limits the number of combinations in order to reduce the computing time, although it obtains a suboptimal result.
In our algorithm, two conditions must be fulfilled when assigning the measurements: not more than one measurement of each camera can be assigned to each tracker, and a measurement cannot be assigned to two different trackers. In other words, an algorithm to avoid conflicts must be applied. A set of possible measurements is assigned to each tracker, using as criterion the Mahalanobis distance between trackers and measurements. The Mahalanobis distance is a metric based on correlations between variables, which allows us to express the similarity between those variables taking into account the distribution of samples:

D = \left[(x - \bar{x})^T C^{-1} (x - \bar{x})\right]^{1/2},    (27)

where \bar{x} is the mean and C the covariance matrix.
With an appropriate threshold, it is possible to know, for each prediction, whether any measurement is suitable for a tracker. We consider all the measurements that lie in a zone covering 95% of the probability around the mean, that is, all the measurements whose squared distances are lower than 5.99 (chi-squared test with two degrees of freedom).
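This gating step can be sketched as follows (assuming two-dimensional plan positions, hence the 5.99 threshold; the numeric values are illustrative):

```python
import numpy as np

CHI2_95_2DOF = 5.99  # 95% quantile of chi-squared with 2 degrees of freedom

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of a measurement to a tracker
    prediction with the given covariance."""
    d = x - mean
    return float(d @ np.linalg.inv(cov) @ d)

def gate(measurements, mean, cov, threshold=CHI2_95_2DOF):
    """Indices of the measurements falling inside the 95% validation
    gate of a tracker."""
    return [i for i, m in enumerate(measurements)
            if mahalanobis_sq(m, mean, cov) < threshold]

pred = np.array([10.0, 10.0])
cov = np.eye(2)
inside = gate([np.array([10.5, 10.2]), np.array([14.0, 14.0])], pred, cov)
```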
Then, a matrix of possibilities, \Phi, is composed. If each tracker i has a number m_i of possible measurements M_1^i, M_2^i, \ldots, M_{m_i}^i, the first row of the matrix is initialized with the measurements of the first tracker:

\Phi_1 = \left[M_1^1 \;\; M_2^1 \;\; \cdots \;\; M_{m_1}^1\right].    (28)

In the second iteration, the matrix \Phi_1 is replicated m_2 times, and a new row with the measurements assigned to the second tracker is appended to obtain all possible combinations:

\Phi_2 = \begin{bmatrix}
M_1^1 \, M_2^1 \cdots M_{m_1}^1 & M_1^1 \, M_2^1 \cdots M_{m_1}^1 & \cdots & M_1^1 \, M_2^1 \cdots M_{m_1}^1 \\
M_1^2 \, M_1^2 \cdots M_1^2 & M_2^2 \, M_2^2 \cdots M_2^2 & \cdots & M_{m_2}^2 \, M_{m_2}^2 \cdots M_{m_2}^2
\end{bmatrix}
= \begin{bmatrix}
\Phi_1 & \Phi_1 & \cdots & \Phi_1 \\
M_1^2 \cdots M_1^2 & M_2^2 \cdots M_2^2 & \cdots & M_{m_2}^2 \cdots M_{m_2}^2
\end{bmatrix}.    (29)
Fig. 12 Mean squared state error of MCUKF and two independent
camera UKFs for different levels of state noise. A similar graph is
obtained for different levels of measurement noise.
Before starting each iteration, incompatible combinations must be deleted, because two trackers cannot catch the same measurement. Thus, the filtered matrix \Phi_2 is obtained. The process continues in the same way until all the trackers have been processed:

\Phi_i = \begin{bmatrix}
\Phi_{i-1} & \Phi_{i-1} & \cdots & \Phi_{i-1} \\
M_1^i & M_2^i & \cdots & M_{m_i}^i
\end{bmatrix}.    (30)
After applying the algorithm, the resulting matrix \Phi has a number of rows equal to the number of trackers, and a number of columns equal to the number of valid measurement-tracker combinations. The combination whose sum of distances is minimum is then selected. A graphic example is depicted in Fig. 13.
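The iterative construction and filtering of \Phi amounts to enumerating conflict-free assignment columns tracker by tracker; a small sketch with hypothetical measurement identifiers:

```python
def build_combinations(candidates):
    """candidates[i]: gated measurement ids for tracker i. Builds the
    columns of Phi tracker by tracker (Eqs. (28)-(30)), filtering out
    columns where two trackers share a measurement; None stands for
    'no measurement', which must always remain an option."""
    combos = [[]]
    for cand in candidates:
        new = []
        for col in combos:
            for m in cand + [None]:
                if m is None or m not in col:   # delete incompatible columns
                    new.append(col + [m])
        combos = new
    return combos

def best_combination(combos, cost):
    """Select the column whose summed distance is minimum."""
    return min(combos,
               key=lambda col: sum(cost(i, m) for i, m in enumerate(col)))

combos = build_combinations([["a", "b"], ["b", "c"]])
# hypothetical tracker-measurement distances; None pays a fixed penalty
best = best_combination(
    combos, lambda i, m: {"a": 1.0, "b": 2.0, "c": 1.5}.get(m, 5.0))
```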
However, there are some important points that we must take into account:
• We can choose between two strategies for filtering the combinations: wait until the last iteration, or apply the filter at each iteration. The first option is easier, but the second is more efficient.
• If possible, the matching should be calculated independently for each camera, in order to reduce the number of options.
• The possibility of not having a measurement must always be considered for each tracker. If an incorrect measurement is assigned, it will damage the other trackers.
5.4.1 Problems of multisensor multitarget matching
There are several problems in multisensor matching. Most of them only happen if a detection or segmentation error has occurred previously, but solutions must be designed to avoid these errors, which cause bad tracking. We have considered the following errors:
• one tracker associated to measurements from different players;
• several trackers associated to measurements from the same player;
• large covariances;
• an unknown number of possible combinations.
The objective of these corrections is to ensure that each tracker is tracking a player. It is not possible to ensure that at a given moment each tracker will be tracking one and only one player, but the system must be capable of noticing any important error and restoring the desired situation in a sufficiently brief time.
Problem with one tracker associated to measurements from different players. This problem only happens when each sensor covers just a part of the tracking field, so that two or more players may be nearby but not all of them are seen from a certain camera. In this case, when a tracker has a large covariance (for example, when it has lost the player it was tracking), it may take both measurements as if they had come from the same player.
If the covariance is large, it is also possible that these measurements from different cameras are distant, indicating that they probably come from different players. Detecting the problem then consists in establishing a threshold that represents the maximum distance between two measurements if they are provided by the same player.
The solution applied differs depending on whether the tracker captures two measurements or more. If the tracker has two measurements assigned, the solution is to deassign the measurement furthest from the tracker. After that, it is advisable to check whether the measurement may correspond to another tracked player.
In the case of three or more measurements, an iterative process is performed until the distance between each pair of measurements is below the threshold. First, the measurement with the longest global distance to the other measurements is deassigned. Then, this measurement is assigned to another tracker if possible. Finally, all distances between each pair of measurements are evaluated. This process is repeated until the maximum distance is below the threshold, as seen in Fig. 14.
Problem with several trackers associated to measurements from the same player. It is also possible that different trackers are associated to different measurements from the same player (Fig. 15, left), so that these trackers effectively follow the same player while other players remain available. Normally, this problem only happens after a segmentation error or under unexpected circumstances. Since mistakes in previous stages are inevitable, it is necessary to develop a system that allows us to correct them.
To be sure that these two or more measurements come
from the same player, three conditions must be satisfied.
Fig. 13 Diagram with a simple example of how different combina-
tions are generated and impossible combinations are filtered out.
Fig. 14 Diagram used for discarding measurements when they are too far away. Each deassigned measurement is reassigned if possible.
The first condition is that no pair of measurements of the
supposed bad trackers can come from the same camera.
The second condition relates to the distance between each
pair of measurements, which must always be below a cer-
tain threshold. The third and final condition is that the
trackers must also have a distance below another threshold.
To be coherent, this threshold must have the same value as
the one used in the solution of the previous problem.
The solution for this problem is to merge all measure-
ments into only one tracker when the three conditions are
satisfied, leaving the other trackers without measurement,
and establishing what we have called exclusion zones.
Each exclusion zone consists of an area that is estab-
lished surrounding a measurement, valid for only one
tracker and only during a determined number of frames.
When a measurement is inside an active exclusion zone, it is not visible to the tracker that created this exclusion zone (Fig. 15, middle).
Exclusion zones are indispensable, since when all mea-
surements are assigned to the same tracker, the other track-
ers make their covariances grow to try to catch other mea-
surements, but they will recapture the same measurements
if the exclusion zone is not established. It is desirable that the number of frames during which the exclusion zone is active be enough to allow the tracker to find another measurement, and that its size be enough to ensure that the tracked player does not cross its boundaries while the exclusion zone is active (Fig. 15, right).
Problem with large covariances. Another possible problematic situation is generated by trackers with large covariances. When a tracker loses all measurements, its covariance starts to grow, in order to be able to catch an available measurement. But, since the Mahalanobis distance is used to measure the distance between trackers and measurements, it is possible that, if the covariance becomes very large, the tracker captures a measurement that corresponds to a nearer tracker with small covariance (Fig. 15, right).
It is possible to limit the maximum covariance area, but
this affects the way the trackers recapture lost measure-
ments. Another option is to give priorities when assigning
the measurements. One or more thresholds can be set, to
divide the trackers into several categories. Trackers with
the smallest covariance will choose their measurements
first, and the other trackers will only be able to choose
among those measurements that the first group of trackers
has not caught. It is necessary that any tracker that has not
lost its measurements be always included in the group of
trackers with the smallest covariance, to avoid assignment
problems.
Problem with the number of possible combinations. The algorithm proposed above has the disadvantage that the number of combinations between measurements and trackers is unknown, and it can be very high if each tracker has many available measurements. It is therefore necessary to limit the number of combinations that the algorithm can take into account.
The easiest method is to limit individually the number of possible measurements that a tracker can have, discarding the furthest ones. For example, a threshold of m = 3 is a good compromise between speed and efficiency in our case: it is high enough to consider all important options, but not so high that the processing time becomes too long.
However, there is a better way to limit the number of combinations, consisting in deleting the worst partial combinations at each iteration if they exceed the desired limit.
6 Feedback Procedure
Image tracking is conditioned by camera location and
player occlusions, which can produce tracking failures.
This fact is especially noticeable when targets have virtu-
ally identical appearance, so that only dynamics can be
used to distinguish the targets. These situations are cor-
rected using multicamera tracking in the plan. In those situ-
ations, player locations in the image will be different from
locations in the plan. In order to alleviate this incoherence,
we send feedback state vectors from the plan to each view.
This information is integrated in the particle filter as an
extra likelihood. Particles of each player are reweighted,
assuming the position in the plan of the player as a Gauss-
ian or a super-Gaussian and assigning the weight according
to the distance to the center. In this manner, if the locations
in the two trackers fit each other, the result will not be
affected. On the other hand, if they do not fit, the inaccurate
location in the image is corrected. Thus, each camera track-
ing is helped by other cameras thanks to the feedback pro-
cedure, correcting errors that could not be corrected with a
single view, and improving the results of all cameras.
The metric used to feed back the information from the plan to the camera is a distance function that we have defined as

\rho(x_t^n, x_t^{plane}) = \exp\left\{-\left[(x_t^n - x_t^{plane})^T \Sigma^{-1} (x_t^n - x_t^{plane})\right]^{\beta}\right\},    (31)

where x_t^{plane} is the MCUKF estimate of the mean \hat{x}_t, \Sigma is the covariance \hat{P}_t, and \beta is an exponent that allows controlling and tuning the influence of the feedback process on the final estimation. In this system, the feedback term has been set at 70% of the final weight.

Fig. 15 Left: Tracker T1 is tracking measurement M1, from camera 1, while tracker T2 is tracking measurement M2, from camera 2. But both measurements come from the same player. Middle: Both measurements are assigned to tracker T1, while two exclusion zones, one around each measurement, are created for tracker T2, preventing it from catching these measurements while its covariance grows. Right: Though far from measurement M1, tracker T2 gets it because the Mahalanobis distance is lower than for T1, since T2 has a large covariance.
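A sketch of the reweighting factor of Eq. (31) follows (the numeric values are illustrative; in the system, \Sigma and the plan position come from the MCUKF):

```python
import numpy as np

def feedback_weight(x_img, x_plane, cov, beta=1.0):
    """Reweighting factor of Eq. (31): close agreement between the
    image particle and the plan estimate gives a weight near 1,
    while distant particles are strongly down-weighted."""
    d = x_img - x_plane
    return float(np.exp(-(d @ np.linalg.inv(cov) @ d) ** beta))

x_plane = np.array([50.0, 30.0])   # plan estimate of the player
cov = 4.0 * np.eye(2)              # plan covariance
w_near = feedback_weight(np.array([50.2, 30.1]), x_plane, cov)
w_far = feedback_weight(np.array([58.0, 38.0]), x_plane, cov)
```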
The procedure was shown in Fig. 6, where it is depicted
how this feedback information is introduced after calculat-
ing the measurement estimation for every camera and be-
fore obtaining the final state estimation. Therefore, the
feedback process can be understood as an iterative refine-
ment of the posterior probability. The methodology that en-
ables that is based on dividing the evaluation step into a set
of layers. This multilayer particle filter allows introducing a
refinement, where the first estimations help to discard some
hypotheses before costly evaluations are done. In this way,
independent observations can be combined sequentially to
give the final estimation. This layered particle filter has a
similar purpose to the annealed particle filter described in
Ref. 52. However, the methodology is different. Whereas
the annealed particle filter uses the same measurement
through the different layers, our proposal introduces a new
factor that was not present in the first estimation and whose
value changes with the posterior estimation.
6.1 Reset Mechanism
Despite all the previous mechanisms to model interaction and avoid tracking failures, the measurement is frequently lost completely, due to the small size of players in the image. For this reason, a reset mechanism has been implemented. This reset can act in two different cases:
• If the location of the player in the plan is too different from the location in the image, a failure in the image is assumed, and the tracker is reinitialized with the location in the plan.
• If two plan trackers catch the same measurement and superimpose on each other, a failure in the plan is assumed. To remedy it, extra deassigned measurements are looked for in the image, and one of the trackers is reinitialized to the nearest such measurement.
Finally, we need an algorithm to administer the transi-
tion of players between cameras. Although the number of
targets is fixed and therefore the number of multicamera
estimations is also fixed, the players go in and out of the
camera fields of view. In these cases, we have to create or
delete a local tracker in the image. A new tracker is initial-
ized with the label and the particles corresponding to the
player.
7 Results
Our system has been tested using several sequences of a football database. This database was recorded in collaboration with the Government of Aragón, the University of Zaragoza, and the Real Zaragoza S.A.D. football team. For this purpose, eight analog cameras were installed in the football stadium and connected to two MPEG4 video recorders. In the first frame, the initialization is made by hand, choosing the players in each camera field of view. We use a constant-velocity model and a motion dynamic that permits objects with variable trajectories. By introducing the x and y velocities in the state vector we can solve occlusions between tracked objects:

\hat{x}_k = [x \;\; v_x \;\; y \;\; v_y]^T, \qquad \hat{x}_{k|k-1} = F \cdot \hat{x}_{k-1},    (32)

and the dynamic matrix F is given by a first-order model.
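For reference, a first-order (constant-velocity) transition matrix for this state vector might look as follows (the 0.04-s time step assumes 25-fps video, which is an assumption, not a stated parameter):

```python
import numpy as np

def cv_transition(dt):
    """First-order (constant-velocity) transition matrix F for the
    state [x, vx, y, vy] of Eq. (32)."""
    return np.array([[1.0, dt, 0.0, 0.0],
                     [0.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, dt],
                     [0.0, 0.0, 0.0, 1.0]])

x = np.array([2.0, 1.0, 3.0, -0.5])   # position (2, 3), velocity (1, -0.5)
x_pred = cv_transition(0.04) @ x      # prediction one frame ahead
```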
Table 1 Sequence 1 statistics. The error measurement was obtained using manually labeled ground truth. Each pair of values gives team 1 / team 2.

Player | Distance (m) | Max. velocity (km/h) | Mean velocity (km/h) | Sprints | In play (min) | Max. error (m) | Mean error (m)
1 | 54.95 / 54.09 | 24.54 / 23.19 | 17.05 / 16.78 | 2 / 2 | 0.19 / 0.19 | 1.04 / 1.07 | 0.41 / 0.36
2 | 33.55 / 30.11 | 18.56 / 12.80 | 10.41 / 9.34 | 1 / 2 | 0.19 / 0.19 | 1.14 / 1.35 | 0.36 / 0.43
3 | 39.54 / 24.53 | 19.72 / 10.52 | 12.27 / 7.61 | 1 / 1 | 0.19 / 0.19 | 1.32 / 0.72 | 0.54 / 0.25
4 | 27.27 / 27.89 | 14.91 / 12.98 | 8.46 / 8.66 | 1 / 1 | 0.19 / 0.19 | 1.78 / 0.94 | 0.64 / 0.30
5 | 40.04 / 42.88 | 17.17 / 15.93 | 12.43 / 13.31 | 1 / 0 | 0.19 / 0.19 | 0.92 / 0.85 | 0.33 / 0.37
6 | 25.14 / 28.50 | 10.26 / 16.26 | 7.80 / 8.85 | 1 / 1 | 0.19 / 0.19 | 2.05 / 2.45 | 0.88 / 0.36
7 | 18.09 / 45.63 | 7.03 / 17.46 | 5.61 / 14.16 | 2 / 2 | 0.19 / 0.19 | 0.92 / 1.35 | 0.30 / 0.40
8 | 37.14 / 30.98 | 14.04 / 14.04 | 11.53 / 9.62 | 1 / 2 | 0.19 / 0.19 | 1.09 / 1.42 | 0.38 / 0.39
9 | 20.40 / 22.70 | 8.16 / 8.71 | 6.33 / 7.02 | 1 / 1 | 0.19 / 0.19 | 1.22 / 0.93 | 0.52 / 0.37
10 | 19.61 / 19.02 | 6.25 / 7.76 | 6.06 / 5.88 | 0 / 1 | 0.19 / 0.19 | 1.02 / 0.95 | 0.41 / 0.42
Total | | | | | | 2.45 | 0.401
Martínez-del-Rincón et al.: Multicamera sport player tracking with Bayesian estimation of measurements
Optical Engineering, April 2009, Vol. 48(4), 047201-14
Once tracking has finished, we can extract statistics from
the trajectories of the targets, such as maximum velocity,
mean velocity, covered distance, or number of sprints, to
name a few.
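With plan trajectories sampled at the frame rate, these statistics follow from finite differences. A minimal sketch is shown below; the sprint definition (a speed threshold) is an illustrative assumption, since the paper does not specify how sprints are counted.

```python
import math

def trajectory_stats(points, dt, sprint_kmh=18.0):
    """Distance (m), max/mean speed (km/h), and sprint count from a sampled
    plan trajectory. The sprint threshold is illustrative only."""
    speeds, distance = [], 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        distance += step
        speeds.append(step / dt * 3.6)  # m/s -> km/h
    # Count maximal runs of samples above the sprint threshold.
    sprints, in_sprint = 0, False
    for s in speeds:
        if s >= sprint_kmh and not in_sprint:
            sprints, in_sprint = sprints + 1, True
        elif s < sprint_kmh:
            in_sprint = False
    return {"distance_m": distance,
            "max_kmh": max(speeds),
            "mean_kmh": sum(speeds) / len(speeds),
            "sprints": sprints}
```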
In order to evaluate the system and obtain numerical
results, we have manually labeled 2600 frames correspond-
ing to all cameras for two sequences of 12 s. Sequences
have been chosen with many complex interactions between
players of both teams, but without any scrums or other
special circumstances. In this manner, the accuracy of the
system can be checked using the ground truth of the labeled
sequences. Results and statistics are shown in Tables 1–3 and Figs. 16–19. Reported occlusions are those among players belonging to the same team, that is, with the same color model. Occlusions between different teams, although they are common due to defensive coverage, have been solved automatically by the tracking algorithm in all the observed situations.
A good accuracy level has been obtained for both se-
quences, for two reasons: the multicamera tracking, which
reduces the measurement noise of each camera using the
others, and the high precision of the single-camera stage.
To estimate the importance of this single-camera stage, we
have tested the system without it, that is, sending the blob
position from each camera directly to the plan. This con-
figuration gives us a value of the location mean error equal
to 1.182 m, which means an increase of 125% over the
multicamera approach.
Sequence 2, more complex than the first one, presents lower performance, since most of the interactions happen in an area that is not properly covered by any camera due to the poor image resolution (see Figs. 18 and 19). As a consequence, the accuracy decreases and one of the targets is finally lost. This lost player (player 7, team 1) has been highlighted in Table 2 to show the incoherent data reported, which can provide useful information to easily identify lost
Table 2 Sequence 2 statistics. The incorrectly tracked target (player 7, team 1), which had to be reinitialized manually, is marked with an asterisk; values in parentheses in the Total row show the resulting errors without this tracker. For each magnitude, the two subcolumns correspond to teams 1 and 2.

Player | Distance (m)  | Max. velocity (km/h) | Mean velocity (km/h) | Sprints | In play (min) | Max. error (m) | Mean error (m)
Team   |   1      2    |    1      2          |    1      2          |  1   2  |   1      2    |   1      2     |   1      2
1      | 50.52  48.16  | 19.48  17.78         | 15.16  14.45         |  1   3  | 0.2    0.2    | 2.71   2.08    | 0.90   0.83
2      | 44.01  53.55  | 19.70  23.64         | 13.20  16.06         |  1   1  | 0.2    0.2    | 1.94   3.27    | 0.59   0.63
3      | 39.20  59.67  | 15.09  29.54         | 11.76  17.90         |  3   1  | 0.2    0.2    | 1.41   4.48    | 0.50   0.50
4      | 27.48  35.77  | 14.65  17.55         |  8.24  10.73         |  3   1  | 0.2    0.2    | 3.18   1.81    | 0.42   0.51
5      | 20.82  87.35  |  7.84  37.59         |  6.24  26.20         |  2   3  | 0.2    0.2    | 1.12   5.34    | 0.29   1.71
6      | 26.00  46.95  |  9.99  24.15         |  7.80  14.09         |  1   1  | 0.2    0.2    | 2.35   2.68    | 0.94   0.53
7*     | 68.28  28.06  | 41.59  10.64         | 20.48   8.42         |  2   2  | 0.2    0.2    | 21.8   3.84    | 13.0   0.38
8      | 43.88  29.09  | 16.33  10.55         | 13.16   8.73         |  2   1  | 0.2    0.2    | 1.11   1.05    | 0.55   0.37
9      | 35.78  42.20  | 14.47  18.73         | 10.73  12.66         |  2   1  | 0.2    0.2    | 2.07   1.74    | 0.46   0.51
10     | 36.78  30.17  | 16.03  14.35         | 11.03   9.05         |  3   1  | 0.2    0.2    | 5.70   1.62    | 1.21   0.45
Total  |               |                      |                      |         |               | 21.8 (5.7)     | 1.26 (0.646)
Table 3 Sequence complexity and results obtained.

      |          | Occlusions*    |            | Transitions between cameras     |         |
Seq.  | Duration | Total  Solved  | Length of  | Entering        Leaving         | Lost    | Mean error
      | (s)      |                | occlusions | Total  Solved   Total  Solved   | players | (m)
1     | 11.625   | 2      2       | 24         | 9      9        9      9        | 0       | 0.40146777
2     | 12.0     | 5      5       | 84         | 11     10       12     11       | 1       | 0.64622249
Total | 23.625   | 7      7       | 108        | 20     19       21     20       | 1       | 0.52384513

*Only occlusions between players of the same team have been scored, since color models solve the others.
Fig. 16 Trackers and trajectories during a period of the football match (sequence 1).
Fig. 17 Trajectories at each view during a period of the football match (sequence 1).
Fig. 18 Trackers and trajectories during a period of the football match (sequence 2).
Fig. 19 Trajectories at each view during a period of the football match (sequence 2).
trackers. By checking performance statistics, like velocity
or distance, trackers with physically impossible results can
be taken as signs of wrong tracking. This fact is important,
because error data require an annotated sequence, which is
not available during a real performance. This information
could be used to generate alarms automatically and act ac-
cordingly.
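Such an automatic alarm can be sketched as follows. The speed ceiling is an illustrative assumption (roughly the top sprint speed of an elite player); the paper does not specify a threshold.

```python
def flag_lost_trackers(stats_by_player, max_plausible_kmh=36.0):
    """Flag trackers whose reported statistics are physically implausible.
    The speed ceiling is an illustrative assumption."""
    return [player for player, s in stats_by_player.items()
            if s["max_kmh"] > max_plausible_kmh]
```

Applied to the Table 2 statistics, this rule would single out the lost tracker (player 7, team 1, with a reported maximum velocity of 41.59 km/h) without any ground truth.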
Although excellent results have been obtained, it is not easy to draw conclusions about the performance during a whole match from such short sequences. For this reason, the complete system has been tested uninterruptedly on a long sequence of 8 min to extract conclusions and qualitative statistics. This sequence includes all sorts of situations inherent in a football match (Fig. 20), such as fouls, scrums, goals, and other complex interactions. Thanks to this test, the results (see Table 4) can be analyzed, and useful conclusions can be extracted.
Table 4 encapsulates the advantages of our proposal. The error rate is the number of manual corrections divided by the length of the sequence, and the saving rate is the fraction of player locations that did not require manual correction (one minus the number of manual corrections divided by the total number of locations needed to track the players). This means that a human supervisor must act only during 5.84% of the match time in order to correct the system errors. Thus, our system saves 99.4% of the work that a human user would have to do to annotate the whole match. Finally, the conflict rate is the number of solved potentially dangerous situations divided by the total number of such situations. Therefore, the system yields a substantial improvement over current commercial systems, which solve most of the conflicts manually.
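The conflict-rate arithmetic from Table 4 can be reproduced directly:

```python
def conflict_rate(solved, total):
    """Share (in %) of potentially dangerous situations solved automatically."""
    return 100.0 * solved / total

# Table 4: 608 of 759 conflicts solved automatically.
rate = conflict_rate(608, 759)
```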
To analyze the unsolved errors in depth, we can examine their distribution (Fig. 21). Most of them are situated in zones 1, 2, and 3 and are due to the poor resolution of the players in these areas for all the cameras. As a future improvement, it would be advisable to place three extra cameras in order to increase the system redundancy, thus reducing the error rate. The importance of this redundancy is also corroborated by the conflict distribution: 57.32% of the conflicts appear when only one camera could be used to obtain useful observations, due to occlusions, extremely low resolution, or compression losses, and the density of conflicts per square meter is reduced from 2.44 to 1.98 when these conflicts are observed by more than one camera. A more detailed study that examines the importance of the camera overlapping is given in the appendix (Sec. 9).
We have also evaluated the modified PF on a PETS (performance evaluation of tracking and surveillance) sequence (ftp://ftp.pets.rdg.ac.uk/pub/VS-PETS/) with a single camera (Fig. 22), to check the robustness of the color segmentation and tracking algorithm. The sequence is completed successfully in spite of the fact that the cameras and perspective are quite different.
Using this PETS sequence, we have compared our importance-sampling PF with the traditional PF. Table 5 shows the results for the mean squared error (MSE) and the computational cost. As can be seen, the integral-image method allows a useful reduction factor for each particle estimation. Although an initial cost is added by creating the integral matrix, it is compensated by the huge number of particle evaluations performed. The comparison has been made using MATLAB running on a 2.4-GHz Pentium; the time costs obtained are useful only for comparison with one another.
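The speedup rests on the integral-image primitive, which makes any rectangular region sum a constant-time lookup regardless of the region size. A generic sketch of that primitive is shown below; the paper's actual per-particle likelihood evaluation is more elaborate, so this only illustrates the mechanism.

```python
import numpy as np

def integral_image(img):
    """Cumulative 2-D sum with a zero row and column prepended."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def region_sum(ii, top, left, height, width):
    """Sum of img[top:top+height, left:left+width] from four lookups, O(1)."""
    return (ii[top + height, left + width] - ii[top, left + width]
            - ii[top + height, left] + ii[top, left])

img = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(img)   # built once, then reused for every particle
```

Building the integral matrix once per frame amortizes its cost over the hundreds of particle evaluations, which is exactly the trade-off reported in Table 5.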
8 Conclusions and Future Work
We have presented a computationally efficient multicamera
algorithm for tracking multiple targets in a sporting match.
The robustness of multiview methods lies in the multiview integration, which makes the method insensitive to occlusions in some of the cameras. However, a sporting match is such a complex environment that this alone is not enough in many situations. For this reason, we have developed a double-
tracking methodology combined with a feedback proce-
dure. This technique increases the strength of the system at
a feasible computational cost, thanks to several efficient
modifications of the particle filter.
The most valuable novelties presented in this paper are
the modification to reduce computational costs using the
integral image and the CPI mask, the extension of the
UKF to multicamera applications, and the feedback proce-
dure, which increases the robustness and accuracy of both
tracking systems.
The results obtained are satisfactory, and show a sub-
stantial improvement over systems based on manual label-
ing. Our system allows processing a complete match in a
few hours using a conventional PC. In this way, a football trainer will have the match at his disposal in order to analyze it carefully and extract conclusions. Obtaining the results so soon is an important competitive advantage in an increasingly competitive environment.

Table 4 Long-sequence qualitative results.

Duration           | Error rate | Saving rate | Conflicts
Time       Frames  |            |             | Total  Solved  Rate
7 min 46 s  11661  | 5.84%      | 99.37%      | 759    608     80.1%

Fig. 20 Complex situations that appear during a football match.
As future work, it will be important to develop a method to automate the initialization of the players on the map and the generation of the color models (Ref. 53). Furthermore, we must improve the data-association algorithm in very complex situations (fouls, corner kicks). The inclusion of three extra cameras, as well as the use of gigabit digital cameras, will help to reduce the number of unsolved errors. Finally, we are working on a better relationship between the particle filter and the plan in order to obtain a better estimation of the measurement reliability, thereby improving the multicamera integration.
Acknowledgments
The authors thank the Government of Aragón and Real Zaragoza S.A.D. We would also like to thank I3A for its confidence. This work is partially supported by grant TIN2006-11044 and FEDER funds from the Spanish Ministry of Education. J. Martínez-del-Rincón is supported by FPI grant BES-2004-3741 from the Spanish Ministry of Education.

Fig. 21 Left: conflict distribution. Right: main zones of unsolved errors.

Fig. 22 Results for the PETS sequence.
Appendix: Error Analysis
A quality study of system errors has been carried out over a set of test data chosen from one camera installed in the stadium. These data have been classified into three zones in the image, each one with a different mean distance from the camera position. In this way, the perspective effects can be analyzed through the errors in the overall process. This can be seen in Fig. 23.
There are two procedures that must be taken into account: measurement extraction (involving the PF) and the homography. The resulting errors affect the correct working of the UKF and its correspondence algorithm.
The time evolution of the UKF is governed by the following simplified equations:

x_{k+1} = F x_k + \epsilon_E, \qquad y_k = H x_k + \epsilon_M, \qquad (33)

where \epsilon_E \sim N(0, \Sigma_E) and \epsilon_M \sim N(0, \Sigma_M) are the state and measurement errors, respectively. It is not possible to know the first one a priori, because it depends on the target trajectory. On the other hand, the measurement error can be obtained by analyzing the measurement process.
Both errors work together to produce the final error in the UKF algorithm, expressed as the predicted and estimated covariance P. Mathematically, this can be expressed as P \propto \Sigma_E + \Sigma_M. The first term can be reduced by adopting more complex dynamic models (Ref. 54) or by incorporating different models to take into account different player behaviors (Ref. 55). The second term can be covered by studying the measurement process and providing better algorithms. For this purpose, an alternative approach has been proposed in this paper: an image-tracking system has been incorporated (see Sec. 4.1) to improve the measurement process, thus reducing the measurement noise. Nevertheless, the quality of the measurement depends on further parameters, like perspective and distortion errors. In our case, the perspective error plays an important role because the players are located at very long distances from the camera. This fact is one of the most important reasons for the necessity of using several cameras (among several cameras, one will always have a good perspective of the player).
Assuming that the errors are Gaussian with zero mean, combining two such measurements can produce a better measurement. This is another reason to use several cameras. However, the zero-mean hypothesis does not hold: our measurements possess a positive mean whose value depends on the target position. This characteristic can be observed in Fig. 24. The error on each axis is shown in the left and middle graphs, and the error modulus in the rightmost one. Note that the error is lower in the middle of the image and rises in the outer areas.

Table 5 Comparison of our algorithm with a conventional particle filter (PETS sequence, Fig. 22).

Algorithm            | MSE (pixel^2) | Mean N | Time (s)
                     |               |        | PDI    Particles   Total
Conventional, N=150  | 190.2         | 150    | --     4.4         4.4
Conventional, N=500  | 165.3         | 500    | --     23.9        23.9
Conventional, N=1000 | 171.9         | 1000   | --     44.3        44.3
Ours                 | 112.1         | 314    | 0.15   0.46        0.61

Fig. 23 Perspective zones in camera 1.

Fig. 24 Perspective error in the image.
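Under the zero-mean Gaussian assumption, fusing two independent measurements by inverse-variance weighting always yields a variance below either input, which is the statistical reason the multicamera combination helps (and why a biased, nonzero-mean error degrades it). A minimal sketch with illustrative values:

```python
def fuse(z1, var1, z2, var2):
    """Inverse-variance fusion of two independent scalar measurements.
    Returns the fused estimate and its (reduced) variance."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    return (w1 * z1 + w2 * z2) / (w1 + w2), 1.0 / (w1 + w2)

# Two cameras measure the same coordinate with different noise levels.
z, var = fuse(10.2, 0.4, 9.8, 0.9)  # fused variance < min(0.4, 0.9)
```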
Once the theory of the measurement error has been treated, the homography error can be analyzed. This error appears when measurements in the image are converted to plane coordinates. Obviously, this error depends on the player position with respect to the camera, but it arises from the homography matrix construction, to be more exact, from the points selected to build this matrix.
We have followed the methodology cited in Ref. 56 in order to compute the covariance of the estimated homography. All the points used to build this matrix are considered, and the implicit error is modeled as an isotropic homogeneous Gaussian. Therefore, these errors are denoted as \sigma_x = \sigma_y = \sigma for image points and \bar{\sigma}_x = \bar{\sigma}_y = \bar{\sigma} for plane points.
Likewise, the homography covariance is defined as \Sigma_h = J \cdot S \cdot J^T, where J = -\sum_{k=2}^{9} u_k u_k^T / \lambda_k, with u_k the k'th eigenvector of the matrix A^T A and \lambda_k the corresponding eigenvalue. Finally, S is defined as follows:
S = \sum_{i=1}^{n} \left( a_{2i-1}^T a_{2i-1} f_i^o + a_{2i}^T a_{2i} f_i^e + a_{2i-1}^T a_{2i} f_i^{oe} + a_{2i}^T a_{2i-1} f_i^{eo} \right), \qquad (34)
with a_i equal to row i of the matrix A. Writing H in its vectorial form h = (h_1, h_2, h_3, h_4, h_5, h_6, h_7, h_8, h_9)^T, the remaining parameters can be defined as
f_i^o = \sigma^2 \left[ h_1^2 + h_2^2 - 2 X_i (h_1 h_7 + h_2 h_8) \right] + 2 \bar{\sigma}^2 \left( x_i h_7 h_9 + x_i y_i h_7 h_8 + y_i h_8 h_9 \right) + \left( \sigma^2 X_i^2 + \bar{\sigma}^2 x_i^2 \right) h_7^2 + \left( \sigma^2 X_i^2 + \bar{\sigma}^2 y_i^2 \right) h_8^2 + \bar{\sigma}^2 h_9^2, \qquad (35)
f_i^e = \sigma^2 \left[ h_4^2 + h_5^2 - 2 Y_i (h_4 h_7 + h_5 h_8) \right] + 2 \bar{\sigma}^2 \left( x_i h_7 h_9 + x_i y_i h_7 h_8 + y_i h_8 h_9 \right) + \left( \sigma^2 Y_i^2 + \bar{\sigma}^2 x_i^2 \right) h_7^2 + \left( \sigma^2 Y_i^2 + \bar{\sigma}^2 y_i^2 \right) h_8^2 + \bar{\sigma}^2 h_9^2, \qquad (36)
f_i^{oe} = f_i^{eo} = \sigma^2 \left[ (h_1 - X_i h_7)(h_4 - Y_i h_7) + (h_2 - X_i h_8)(h_5 - Y_i h_8) \right], \qquad (37)
with X_i, Y_i the coordinates on the ground and x_i, y_i the coordinates in the image.
The typical formula for homographic computation is X = Hx, which is converted to X = Bh, where B is the following 3 \times 9 matrix:

B = \begin{pmatrix} x^T & 0^T & 0^T \\ 0^T & x^T & 0^T \\ 0^T & 0^T & x^T \end{pmatrix}. \qquad (38)
The 3 \times 3 matrix \Sigma_X is the covariance of the point X in homogeneous world coordinates. The conversion to a 2 \times 2 matrix \Sigma_{X, 2 \times 2} in nonhomogeneous coordinates is carried out as follows:

\Sigma_{X, 2 \times 2} = f \, \Sigma_X \, f^T, \qquad (39)
where X = (X, Y, W)^T in world coordinates and f is defined as

f = \frac{1}{W^2} \begin{pmatrix} W & 0 & -X \\ 0 & W & -Y \end{pmatrix}. \qquad (40)
Assuming noise only in the homography computation and accurate data on the point x, the covariance of the corresponding world point will be

\Sigma_X = B \, \Sigma_h \, B^T. \qquad (41)
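The propagation in Eqs. (38)-(41) can be sketched numerically as follows; the variable names and test values are ours, not the paper's.

```python
import numpy as np

def B_matrix(x):
    """3x9 matrix B built from the homogeneous image point x, Eq. (38)."""
    B = np.zeros((3, 9))
    for i in range(3):
        B[i, 3 * i:3 * i + 3] = x
    return B

def world_covariance(x, sigma_h):
    """Sigma_X = B Sigma_h B^T, Eq. (41); sigma_h is the 9x9 homography covariance."""
    B = B_matrix(x)
    return B @ sigma_h @ B.T

def to_nonhomogeneous(sigma_X, X):
    """2x2 covariance in nonhomogeneous coordinates, Eqs. (39)-(40)."""
    Xc, Yc, W = X
    f = (1.0 / W ** 2) * np.array([[W, 0.0, -Xc],
                                   [0.0, W, -Yc]])
    return f @ sigma_X @ f.T
```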
These equations applied to our test data give us the results
shown in Fig. 25.
In conclusion, although the errors in the image show a val-
ley at the center of the image, these errors converted to
world coordinates increase with the distance from the cam-
era position. The units in world coordinates are pixels,
which can be directly converted to meters. This conversion
gives values close to those shown in Table 3.
References
1. N. J. Gordon, D. J. Salmond, and A. F. M. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation," IEE Proc. F, Radar Signal Process. 140, 107-113 (1993).
2. A. Doucet, N. de Freitas, and N. Gordon, Sequential Monte Carlo Methods in Practice, Springer-Verlag (2001).
3. M. Isard and A. Blake, "Condensation: conditional density propagation for visual tracking," Int. J. Comput. Vis. 29(1), 5-28 (1998).
4. S. J. Julier and J. K. Uhlmann, "A new extension of the Kalman filter to nonlinear systems," in Proc. AeroSense: The 11th Int. Symp. on Aerospace/Defence Sensing, Simulation and Controls (1997).
5. E. A. Wan and R. van der Merwe, The Unscented Kalman Filter, Chap. 7 (2001).
6. A. Ekin, A. M. Tekalp, and R. Mehrotra, "Automatic soccer video analysis and summarization," IEEE Trans. Image Process. 12(7), 796-807 (2003).
7. J. Assfalg, M. Bertini, C. Colombo, A. Del Bimbo, and W. Nunziati, "Semantic annotation of soccer videos: automatic highlights identification," Comput. Vis. Image Underst. 92(2-3), 285-305 (2003).
8. K. Matsumoto, S. Sudo, H. Saito, and S. Ozawa, "Optimized camera viewpoint determination system for game soccer broadcasting," in Proc. MVA2000, IAPR Workshop on Machine Vision Applications, pp. 115-118 (2000).
9. K. Choi and Y. Seo, "Probabilistic tracking of soccer players and ball," in Statistical Methods in Video Processing, D. Comaniciu, E. Kanatani, R. Mester, and D. Suter, Eds., pp. 50-60, Springer (2004).
10. J. Vermaak, A. Doucet, and P. Perez, "Maintaining multi-modality through mixture tracking," in Int. Conf. on Computer Vision, Vol. 2, p. 1110 (2003).
11. K. Okuma, A. Taleghani, N. de Freitas, J. Little, and D. Lowe, "A boosted particle filter: multitarget detection and tracking," in Proc. 8th Eur. Conf. on Computer Vision, pp. 28-39 (2004).
Fig. 25 Perspective error on the plan.
12. J. B. Hayet, T. Mathes, J. Czyz, J. Piater, J. Verly, and B. Macq, "A modular multi-camera framework for team sports tracking," in Proc. IEEE Int. Conf. on Advanced Video and Signal-Based Surveillance (2005).
13. C. J. Needham and R. D. Boyle, "Tracking multiple sports players through occlusion, congestion and scale," in Br. Machine Vision Conf. (BMVC'01), pp. 93-102 (2001).
14. A. Yamada, Y. Shirai, and J. Miura, "Tracking players and a ball in video image sequence and estimating camera parameters for 3D interpretation of soccer games," in Proc. 16th Int. Conf. on Pattern Recognition (ICPR'02), Vol. 1, p. 10303 (2002).
15. K. Okuma, J. J. Little, and D. Lowe, "Automatic acquisition of motion trajectories: tracking hockey players," Proc. SPIE 5304, 202-213 (2003).
16. M. Xu, J. Orwell, and G. A. Jones, "Tracking football players with multiple cameras," in Int. Conf. on Image Processing, Vol. 5, pp. 2909-2912 (2004).
17. M. Xu, L. Lowey, J. Orwell, and D. Thirde, "Architecture and algorithms for tracking football players with multiple cameras," IEE Proc. Vision Image Signal Process. 152(2), 232-241 (2005).
18. J. Ren, J. Orwell, G. A. Jones, and M. Xu, "Real-time 3D soccer ball tracking from multiple cameras," in Br. Machine Vision Conf., pp. 829-838 (2004).
19. K. Choi and Y. Seo, "Probabilistic tracking of the soccer ball," in Proc. ECCV Workshop on Statistical Methods in Video Processing, pp. 50-60 (2004).
20. R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed., Chap. 7.8, Cambridge Univ. Press (2004).
21. Y. Zhang, H. Leung, T. Lo, and J. Litva, "Distributed sequential nearest neighbour multitarget tracking algorithm," IEE Proc., Radar Sonar Navig. 143, 255-260 (1996).
22. K. Althoff, J. Degerman, and T. Gustavsson, "Combined segmentation and tracking of neural stem-cells," Lect. Notes Comput. Sci. 3540, 282-291 (2005).
23. H. Nait-Charif and S. J. McKenna, "Head tracking and action recognition in a smart meeting room," in 4th IEEE Int. Workshop on Performance Evaluation of Tracking and Surveillance (2003).
24. K. Nummiaro, E. Koller-Meier, and L. Van Gool, "A color-based particle filter," in Symp. for Pattern Recognition of the DAGM, Vol. 2449, pp. 353-360 (2002).
25. K. Nummiaro, E. B. Koller-Meier, and L. Van Gool, "An adaptive color-based particle filter," Image Vis. Comput. 21(1), 99-110 (2003).
26. E. Polat, M. Yeasin, and R. Sharma, "Robust tracking of human body parts for collaborative human computer interaction," Comput. Vis. Image Underst. 89(1-2), 44-69 (2003).
27. M. Isard and A. Blake, "Icondensation: unifying low-level and high-level tracking in a stochastic framework," in 5th Eur. Conf. on Computer Vision, pp. 893-908 (1998).
28. D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking," IEEE Trans. Pattern Anal. Mach. Intell. 25, 564-575 (2003).
29. S. McKenna, Y. Raja, and S. Gong, "Tracking colour objects using adaptive mixture models," Image Vis. Comput. 17, 225-231 (1999).
30. C. Stauffer and W. Grimson, "Learning patterns of activity using real-time tracking," IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 747-757 (2000).
31. B. Han, D. Comaniciu, Y. Zhu, and L. Davis, "Incremental density approximation and kernel-based Bayesian filtering for object tracking," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR'04) (2004).
32. M. S. Yang and K. L. Wu, "Unsupervised possibilistic clustering," Pattern Recogn. 39(1), 5-21 (2006).
33. M. Kim and R. S. Ramakrishna, "New indices for clustering validity assessment," Pattern Recogn. Lett. 26(15), 2353-2363 (2005).
34. A. Elgammal, R. Duraiswami, D. Harwood, and L. S. Davis, "Background and foreground modeling using nonparametric kernel density estimation for visual surveillance," Proc. IEEE 90(7), 1151-1163 (2002).
35. P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in IEEE Computer Soc. Conf. on Computer Vision and Pattern Recognition (2001).
36. D. Thirde, M. Borg, J. Ferryman, J. Aguilera, M. Kampel, and G. Fernandez, "Multi-camera tracking for visual surveillance applications," in 11th Computer Vision Winter Workshop (2006).
37. J. Black and T. Ellis, "Multicamera image measurement and correspondence," Meas. J. Int. Meas. Confed. 35(1), 61-71 (2002).
38. M. Meyer, T. Ohmacht, R. Bosch, and M. Hotter, "Video surveillance applications using multiple views of a scene," in 32nd Annual 1998 Int. Carnahan Conf. on Security Technology Proc., pp. 216-219 (1998).
39. A. Criminisi, I. Reid, and A. Zisserman, "Single view metrology," Int. J. Comput. Vis. 40(2), 123-148 (2000).
40. J. Martínez, J. E. Herrero, J. R. Gómez, and C. Orrite, "Automatic left luggage detection and tracking using multi-camera UKF," in IEEE Int. Workshop on Performance Evaluation of Tracking and Surveillance (PETS 06), pp. 59-66 (2006).
41. J. R. Gómez, J. E. Herrero, C. Medrano, and C. Orrite, "Multi-sensor system based on unscented Kalman filter," in IASTED Int. Conf. on Visualization, Imaging, and Image Processing, pp. 59-66 (2006).
42. N. Bergman and A. Doucet, "Markov chain Monte Carlo data association for target tracking," in ICASSP '00: Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. II705-II708, IEEE Computer Soc. (2000).
43. I. Goodman, R. Mahler, and H. Nguyen, Mathematics of Data Fusion, Kluwer Academic Publishers (1997).
44. R. P. S. Mahler, "Multi-target Bayes filtering via first-order multi-target moments," IEEE Trans. Aerosp. Electron. Syst. 39(4), 1152-1178 (2003).
45. B. N. Vo, S. Singh, and A. Doucet, "Sequential Monte Carlo methods for multi-target filtering with random finite sets," IEEE Trans. Aerosp. Electron. Syst. 41(4), 1224-1245 (2005).
46. B. N. Vo and W. K. Ma, "The Gaussian mixture probability hypothesis density filter," IEEE Trans. Signal Process. 54(11), 4091-4104 (2006).
47. B. N. Vo, S. Singh, and A. Doucet, "PHD filters of higher order in target number," IEEE Trans. Aerosp. Electron. Syst. 43(4), 1523-1543 (2007).
48. B. T. Vo, B. N. Vo, and A. Cantoni, "Analytic implementations of the cardinalized probability hypothesis density filter," IEEE Trans. Signal Process. 55(7), 3553-3567 (2007).
49. N. Pham, W. Huang, and S. Ong, "Multiple sensor multiple object tracking with GMPHD filter," in Proc. 10th Int. Conf. on Information Fusion, pp. 1-7 (2007).
50. N. Pham, W. Huang, and S. Ong, "Probability hypothesis density approach for multi-camera multi-object tracking," in 8th Asian Conf. on Computer Vision, pp. 875-884 (2007).
51. Y. Bar-Shalom and X. R. Li, Multitarget-Multisensor Tracking: Principles and Techniques, Chap. 8 (1995).
52. J. Deutscher, A. Blake, and I. Reid, "Articulated body motion capture by annealed particle filtering," Comput. Vision Pattern Recognition 2, 126-133 (2000).
53. J. R. Gómez, J. E. Herrero, M. Montañés, J. Martínez, and C. Orrite, "Automatic detection and classification of football players," in IASTED Int. Conf. on Signal and Image Processing (SIP 2007), pp. 483-488 (2007).
54. A. Senior, "Tracking people with probabilistic appearance models," in Proc. ECCV Workshop on Performance Evaluation of Tracking and Surveillance, pp. 48-55 (2002).
55. M. E. Farmer, R. L. Hsu, and A. K. Jain, "Interacting multiple model (IMM) Kalman filters for robust high speed human motion tracking," in Proc. 16th Int. Conf. on Pattern Recognition, pp. II:20-23 (2002).
56. A. Criminisi, I. D. Reid, and A. Zisserman, "A plane measuring device," Image Vis. Comput. 17(8), 625-634 (1999).
Jesús Martínez-del-Rincón received the
PhD degree from the University of Zara-
goza, specializing in biomedical engineer-
ing, in 2008. He previously graduated from
the University of Zaragoza in telecommuni-
cation in 2003. He is currently pursuing doc-
toral studies in computer vision, motion
analysis, and human tracking.
Elías Herrero-Jaraba received his PhD de-
gree in 2005 from the University of Zara-
goza, Spain. He joined the Centro Politéc-
nico Superior of the University of Zaragoza
as a researcher in March 2001. In February
2003 he became an assistant professor,
and since May 2007 he has been an asso-
ciate professor at the same university. His
current research interests include image
processing, multicamera and multitarget
tracking, three-dimensional vision, and
measurement processes. Dr. Herrero is an associate member of the
IEEE.
J. Raúl Gómez received the MSc degree
from the University of Zaragoza, specializ-
ing in electronic technologies, in 2006. He
previously graduated from the University of
Zaragoza in telecommunication in 2004. Af-
ter that, he joined the Computer Vision
Laboratory of the Aragon Institute of Engi-
neering Research I3A.Heiscurrentlypur-
suing doctoral studies, specializing in com-
puter vision, tracking, and figure detection.
Carlos Orrite-Uruñuela received the mas-
ters degree in Industrial Engineering at the
Zaragoza University in 1989. In 1994, he
completed the masters degree in biomedi-
cal engineering, working in the field of medi-
cal instrumentation for several industrial
partners. In 1997 he did his PhD on com-
puter vision at the University of Zaragoza.
He is currently an associate professor in the
Department of Electronics and Communica-
tions Engineering at the University of Zara-
goza and carries out his research activities in the Aragon Institute of
Engineering Research (I3A). His research interests are in computer
vision and human-machine interfaces. He has participated in sev-
eral national and international projects. He supervises several MSc
students in computer vision, biometrics, and human motion analysis.
Carlos Medrano received the MS degree in
physics from the University of Zaragoza,
Zaragoza, Spain, in 1994, and the PhD de-
gree in 1997, jointly from the University of
Zaragoza and from Joseph Fourier Univer-
sity, Grenoble, France. His PhD was devel-
oped at the European Synchrotron Radiation Facility (ESRF), Grenoble, France,
studying x-ray imaging techniques for mag-
netic materials. He is a lecturer in the Elec-
tronics Department at the Polytechnic Uni-
versity School of Teruel, Spain, where he has been employed since
1998. After some years dedicated to computer-based control sys-
tems, his current research interests include aspects of computer
vision such as real-time tracking and activity recognition.
Miguel A. Montañés-Laborda graduated in
electronic engineering from the University of
Zaragoza in 2001. He is currently complet-
ing his MSc degree in systems engineering
and computing and the second cycle of
electronic and automatic engineering, both
from the University of Zaragoza. In Septem-
ber 2004, he joined the Computer Vision
Laboratory of the Aragon Institute of Engineering Research (I3A) as a scientific developer.
... Over the last decades there has been an exponential increase in soccer video analysis research (Leo et al., 2013). Using Bayesian methods, some investigators (Martinez-del-Rincon et al., 2009;Motoi et al., 2012) tried to develop some game tracking systems that have revealed some functionality in detecting in-game events. These models seem to be important not only for coaches and practitioners but also for an increasing number of fans at home or in the stadium who may be interested in consuming more 'live' detailed, performance information. ...
... Over the last decades there has been an exponential increase in soccer video analysis research (Leo et al., 2013). Using Bayesian methods, some investigators (Martinez-del-Rincon et al., 2009;Motoi et al., 2012) tried to develop some game tracking systems that have revealed some functionality in detecting in-game events. These models seem to be important not only for coaches and practitioners but also for an increasing number of fans at home or in the stadium who may be interested in consuming more 'live' detailed, performance information. ...
... Another important application for multicamera systems is sport game monitoring, where the information from different cameras is fused in order to track all the objects in the scene and collect information and statistics of players and teams. Example of these kind of applications can be found in Ref. 7, where particle filters are used to track objects in each single view and an unscented Kalman filter is used to combine the information provided by the particle filters and to calculate the trajectory of the final players. Another example is Ref. 8, where the images provided by several cameras (from 8 to 15 cameras) are processed by a hybrid system that combines graphic processing units (GPUs) and classical CPUs: Mixture of Gaussian algorithms have been used to segment the football players that are subsequently classified and tracked. ...
Article
Full-text available
We present an adaptive and efficient background modeling strategy for real-time object detection in multicamera systems. The proposed approach is an innovative multiparameter adaptation strategy for the mixture of Gaussians (MoG) background modeling algorithm, able to efficiently adjust the computational requirements of the tasks to the available processing power and to the activity of the scene. The approach adapts the MoG without a significant loss in detection accuracy while simultaneously adhering to the real-time constraints. The adaptation strategy works at the local level by independently modifying the MoG parameters of each task; whenever the results of the local strategy are not satisfactory, a global adaptation strategy starts that aims at balancing the workload among the tasks. Our approach has been tested on three different data sets, including several image sizes, heterogeneous environments (indoor and outdoor scenarios), and different real-time constraints. The results show that the proposed adaptive system is well suited for multicamera applications thanks to its efficiency and adaptability; it guarantees highly accurate real-time detections.
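As a rough illustration of the mixture-of-Gaussians modeling this abstract builds on, here is a minimal single-pixel grayscale MoG sketch; the class name and parameter values (learning rate, match threshold, initial variances) are illustrative choices, not those of the paper:

```python
import numpy as np

class PixelMoG:
    """Minimal per-pixel mixture-of-Gaussians background model
    (grayscale); parameters are illustrative, not the paper's."""

    def __init__(self, k=3, alpha=0.05, match_sigmas=2.5):
        self.means = np.array([0.0, 128.0, 255.0][:k])
        self.vars = np.full(k, 100.0)
        self.weights = np.full(k, 1.0 / k)
        self.alpha = alpha
        self.match_sigmas = match_sigmas

    def update(self, x):
        """Update the model with pixel value x; return True if x is
        classified as foreground."""
        d = np.abs(x - self.means)
        matched = d < self.match_sigmas * np.sqrt(self.vars)
        if matched.any():
            i = np.argmin(np.where(matched, d, np.inf))
            self.means[i] += self.alpha * (x - self.means[i])
            self.vars[i] += self.alpha * ((x - self.means[i]) ** 2
                                          - self.vars[i])
            self.weights = (1 - self.alpha) * self.weights
            self.weights[i] += self.alpha
        else:
            # No match: replace the least-probable component.
            i = np.argmin(self.weights)
            self.means[i], self.vars[i] = x, 400.0
        self.weights /= self.weights.sum()
        # Foreground if unmatched, or matched to a low-weight component.
        return bool(not matched.any() or self.weights[i] < 0.2)

model = PixelMoG()
for _ in range(50):              # the background settles around value 100
    model.update(100.0)
is_fg = model.update(250.0)      # a sudden bright value is flagged as foreground
```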
... Even though soccer player trajectory estimation methods have been developed [6]-[8], problems remain for our purpose, including high input-image resolution requirements, poor robustness to player shadows, and a large processing cost. Moreover, sporting events are held in large outdoor spaces, and the number of capturing cameras is limited. ...
Conference Paper
Full-text available
Our research aims to generate a player's-view video stream by using a 3D free-viewpoint video technique. Since player trajectories are necessary to generate the video, we propose a real-time player trajectory estimation method that exploits the shadow regions in soccer scenes. This paper describes our approach to achieving real-time processing. We divide the processing between capture and server computers and further reduce the processing cost with pipeline parallelization and optimization. We apply the proposed method to an actual soccer match held in a stadium and show its effectiveness.
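The capture/server split with pipeline parallelization mentioned above can be sketched as two stages connected by a bounded queue, so capture and processing run concurrently; this is a generic producer/consumer pattern, not the paper's actual pipeline:

```python
import queue
import threading

def capture(frames, q):
    """Capture stage: pushes raw frames (here, just integers) downstream."""
    for f in frames:
        q.put(f)
    q.put(None)  # sentinel: end of stream

def process(q, results):
    """Processing stage: runs concurrently with capture."""
    while True:
        f = q.get()
        if f is None:
            break
        results.append(f * 2)  # placeholder for per-frame analysis work

frames = list(range(5))
q = queue.Queue(maxsize=2)   # bounded queue applies backpressure to capture
results = []
t1 = threading.Thread(target=capture, args=(frames, q))
t2 = threading.Thread(target=process, args=(q, results))
t1.start(); t2.start()
t1.join(); t2.join()
# results == [0, 2, 4, 6, 8]
```

With more than two stages, each pair of adjacent stages gets its own queue, which is what makes the per-frame latency overlap across stages.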
Book
Full-text available
To understand the dynamic patterns of behaviours and interactions between athletes that characterize successful performance in different sports is an important challenge for all sport practitioners. This book guides the reader in understanding how an ecological dynamics framework for use of artificial intelligence (AI) can be implemented to interpret sport performance and the design of practice contexts. By examining how AI methodologies are utilized in team games, such as football, as well as in individual sports, such as golf and climbing, this book provides a better understanding of the kinematic and physiological indicators that might better capture athletic performance by looking at the current state-of-the-art AI approaches. Artificial Intelligence in Sport Performance Analysis provides an all-encompassing perspective in an innovative approach that signals practical applications for both academics and practitioners in the fields of coaching, sports analysis, and sport science, as well as related subjects such as engineering, computer and data science, and statistics.
Article
Full-text available
In this paper we present an adaptive multi-camera system for real-time object detection, able to efficiently adjust the computational requirements of the video processing blocks to the available processing power and the activity of the scene. The system is based on a two-level adaptation strategy that works at the local and at the global level. Object detection is based on a Gaussian mixture model background subtraction algorithm. Results show that the system can efficiently adapt the algorithm parameters without a significant loss in detection accuracy.

I. INTRODUCTION

Object detection algorithms are fundamental modules in all video-based tracking systems. The analysis of temporal and spatial information between different frames allows the detection of moving objects in a video sequence. Background subtraction is a technique that estimates a background model of the scene; any deviation from the model is then considered a moving object. Different background modeling techniques have been presented in the literature (1). Stauffer and Grimson (2) presented a popular algorithm that estimates a background model of the scene in which each pixel is independently modeled as a mixture of Gaussian distributions. The algorithm is composed of two fundamental steps: the first step estimates whether or not a pixel belongs to the background model; in the second step the Gaussian parameters are updated. This "pixel-wise" approach is computationally demanding, since for each pixel all the parameters of the distributions have to be updated. Several variants of this algorithm have been presented in the literature with the aim of reducing the computational time in real-time applications (3): improving the efficiency of parameter estimation (4); implementing variable-block-size approaches, subsampling techniques, and hierarchical strategies (3); or using a variable number of Gaussians per pixel as a function of the amount of foreground pixels (5).

Nevertheless, these approaches are strictly tied to a specific application and show a low degree of adaptability, since they are based on the modification of only one characteristic of the algorithm. Their direct applicability to multi-camera systems is very limited, and therefore more efficient and adaptive strategies are required. In this work we present an adaptive and efficient multi-camera system for real-time object detection, able to adapt the computational requirements to the scene activity and the available processing resources.
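The two-level (local, then global) adaptation strategy described above can be sketched as follows. The concrete rules below (dropping one Gaussian when a task overruns its time budget, redistributing slack budget to overloaded tasks) are illustrative assumptions, not the authors' exact policy:

```python
def adapt_parameters(proc_times, budgets, n_gaussians, k_min=2, k_max=5):
    """Two-level adaptation sketch (illustrative rules).
    proc_times, budgets: per-task processing times and time budgets (ms);
    n_gaussians: per-task MoG model size."""
    # Local level: each task adjusts its own model size independently.
    for i, (t, b) in enumerate(zip(proc_times, budgets)):
        if t > b and n_gaussians[i] > k_min:
            n_gaussians[i] -= 1      # overrunning: simplify the model
        elif t < 0.8 * b and n_gaussians[i] < k_max:
            n_gaussians[i] += 1      # headroom: refine the model
    # Global level: shift spare budget toward still-overloaded tasks.
    slack = sum(max(b - t, 0.0) for t, b in zip(proc_times, budgets))
    overloaded = [i for i, (t, b) in enumerate(zip(proc_times, budgets))
                  if t > b]
    if overloaded:
        share = slack / len(overloaded)
        for i in overloaded:
            budgets[i] += share
    return n_gaussians, budgets

k, budgets = adapt_parameters([30.0, 10.0], [20.0, 20.0], [4, 3])
# k → [3, 4]: the overloaded task drops a Gaussian, the idle task adds one
```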
Article
Full-text available
In this paper, we propose a multicamera application capable of processing high-resolution images and extracting features based on color patterns on graphic processing units (GPUs). The goal is to work in real time in the uncontrolled environment of a sport event such as a football match. Since football players exhibit diverse and complex color patterns, a Gaussian mixture model (GMM) is applied as the segmentation paradigm in order to analyze live sport images and video. Optimization techniques have also been applied to the C++ implementation using profiling tools focused on high performance. Time-consuming tasks were implemented on NVIDIA's CUDA platform and later restructured and enhanced, speeding up the whole process significantly. The resulting code is around 4-11 times faster on a low-cost GPU than a highly optimized C++ version on a central processing unit (CPU) over the same data. Real-time performance has been achieved, processing up to 64 frames per second. An important conclusion derived from our study is the scalability of the application with the number of cores on the GPU. Keywords: Image processing; Color segmentation; Real time; GPU; CUDA
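A minimal CPU-side sketch of GMM-style color segmentation, assigning each pixel to its most likely color component (e.g. a team shirt or the pitch), is shown below. The component means, variances, and priors are made-up values, diagonal covariances are assumed for simplicity, and the paper's GPU implementation is not reproduced:

```python
import numpy as np

def classify_pixels(pixels, means, variances, priors):
    """Assign each RGB pixel to the most likely Gaussian component
    under a diagonal-covariance mixture model."""
    pixels = np.asarray(pixels, dtype=float)[:, None, :]       # (N, 1, 3)
    means = np.asarray(means, dtype=float)[None, :, :]         # (1, K, 3)
    variances = np.asarray(variances, dtype=float)[None, :, :]
    # Log-density of a diagonal Gaussian, up to a shared constant.
    log_lik = -0.5 * np.sum((pixels - means) ** 2 / variances
                            + np.log(variances), axis=2)       # (N, K)
    log_post = log_lik + np.log(np.asarray(priors))[None, :]
    return np.argmax(log_post, axis=1)

# Illustrative components: red shirts, blue shirts, green pitch.
means = [[200, 30, 30], [30, 30, 200], [40, 160, 40]]
variances = [[400.0, 400.0, 400.0]] * 3
priors = [0.2, 0.2, 0.6]
labels = classify_pixels([[210, 40, 35], [35, 25, 190], [50, 150, 45]],
                         means, variances, priors)
# labels → [0, 1, 2]
```

Because every pixel is classified independently, the per-pixel work maps naturally onto GPU threads, which is what motivates the CUDA implementation discussed in the abstract.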
Article
Full-text available
Computer systems that have the capability of analyzing complex and dynamic scenes play an essential role in video annotation. Scenes can be complex in that there are many cluttered objects with different colors, shapes, and sizes, and can be dynamic with multiple interacting moving objects and a constantly changing background. In reality, there are many scenes that are complex, dynamic, and challenging enough for computers to describe: games of sports, air traffic, car traffic, street intersections, and cloud transformations. Our research addresses the challenge of building a descriptive computer system that analyzes scenes of hockey games, where multiple moving players interact with each other on a background that moves constantly due to camera motion. Ultimately, such a computer system should be able to acquire reliable data by extracting the players' motion as trajectories, query the data by analyzing its descriptive information, and predict the motions of some hockey players based on the result of the query. Among these three major aspects of the system, we primarily focus on the visual information of the scenes, that is, how to automatically acquire motion trajectories of hockey players from video. More precisely, we automatically analyze the hockey scenes by estimating the parameters (pan, tilt, and zoom) of the broadcast cameras, tracking hockey players in those scenes, and constructing a visual description of the data by displaying the trajectories of those players. Many technical problems in vision, such as fast and unpredictable player motions and rapid camera motions, make this challenge worth tackling. To the best of our knowledge, no automatic video annotation systems for hockey have been developed in the past.
Although there are many obstacles to overcome, our efforts and accomplishments will hopefully establish the infrastructure of an automatic hockey annotation system and become a milestone for research in automatic video annotation in this domain.
Article
An effective system that simultaneously tracks multiple players and the ball in broadcast soccer matches is proposed in this paper. The system uses a particle filter with images synthesized from templates to track players of the same team during occlusions. This synthesized image, from which an adaptive color histogram is computed, represents the expected appearance for each particle and gives a more precise likelihood evaluation of the particles. For ball tracking, when the ball is in ballistic motion without any interruption by players, an ordinary particle filter estimates the state of the ball. When the ball is considered to be possessed by one or more players, the tracker stops and waits for the ball to reappear in the area around the corresponding players. The tracker gives good performance on commonly broadcast soccer match videos.
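A single predict/update/resample cycle of a particle filter with a ballistic ball motion model, in the spirit of the tracker described, might look like the sketch below; the function name and noise parameters are illustrative assumptions, not the paper's system:

```python
import numpy as np

rng = np.random.default_rng(0)

def ballistic_pf_step(particles, z, dt=1 / 25, g=9.8, meas_sigma=0.1):
    """One predict/update/resample cycle of a particle filter with a
    ballistic motion model. particles: (N, 4) array of [x, y, vx, vy]
    states; z: observed (x, y) ball position."""
    # Predict: constant velocity plus gravity on the vertical axis.
    particles[:, 0] += particles[:, 2] * dt
    particles[:, 1] += particles[:, 3] * dt
    particles[:, 3] -= g * dt
    particles[:, :2] += rng.normal(0, 0.02, (len(particles), 2))  # process noise
    # Update: Gaussian likelihood of the measurement for each particle.
    d2 = np.sum((particles[:, :2] - z) ** 2, axis=1)
    w = np.exp(-0.5 * d2 / meas_sigma ** 2)
    w /= w.sum()
    # Weighted state estimate, then multinomial resampling.
    est = np.average(particles[:, :2], axis=0, weights=w)
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx], est

particles = np.column_stack([rng.normal(0, 0.5, (100, 2)),   # positions
                             rng.normal(0, 1.0, (100, 2))])  # velocities
particles, est = ballistic_pf_step(particles, z=np.array([0.1, 0.0]))
```

The "wait for the ball to reappear" behavior in the abstract would correspond to suspending this cycle and reinitializing the particles around the possessing player once a ball detection returns.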