DECENTRALIZED CAMERA NETWORK CONTROL USING GAME THEORY
Bi Song, Cristian Soto, Amit K. Roy-Chowdhury, Jay A. Farrell
Department of Electrical Engineering
University of California, Riverside
ABSTRACT
This paper deals with the problem of decentralized, cooper-
ative control of a camera network. We focus on applications
where events unfold over a large geographic area and need
to be analyzed by multiple cameras or other kinds of imag-
ing sensors. There is no central unit accumulating and ana-
lyzing all the data. The overall goal is to keep track of all
objects (i.e., targets) in the region of deployment of the cam-
eras, while selectively focusing at a high resolution on some
particular target features based on application requirements.
Efficient usage of resources in such a scenario requires that
the cameras be active. However, this control cannot be based
on separate analysis of the sensed video in each camera. They
must act collaboratively to be able to acquire multiple targets
at different resolutions. Our research focuses on developing
accurate and efficient target acquisition and camera control
algorithms in such scenarios using game theory. We show
simulated experimental results of the approach.
Index Terms: camera network, cooperative control, game
theory, decentralized processing
1. INTRODUCTION
Small and large networks of video cameras are being installed
in many applications, e.g., video surveillance, national and
homeland security, recording events in a conference room,
etc. It is natural to expect that these camera networks would
be used to acquire targets at multiple resolutions, e.g., multi-
ple people, a single person, a face. For efficiency and max-
imum resource utilization, it is desirable to be able to con-
trol the cameras based on the requirements of the scene being
analyzed. It is also desirable that the control mechanism be
decentralized for a number of reasons. In some of these appli-
cations, there may be constraints of bandwidth, secure trans-
mission facilities, and difficulty in analyzing a huge amount
of data centrally. In such situations, the cameras would have
to act as autonomous agents and decisions would have to be
taken in a decentralized manner. Also, to achieve an opti-
mal result, the cameras should be working cooperatively with
each other trying to achieve the same overall goal. This is the
broad problem that we address in this paper.
The cooperative and decentralized nature of this problem
leads us to explore a game theoretic solution. Specifically,
we employ a framework in which the optimization of local
sensor utility functions leads to an optimal value for a global
utility. This achieves the goal of keeping track of all targets
at an acceptable resolution and some at high resolution. The
optimal camera parameters in the sensor network are deter-
mined dynamically according to these utility functions and
negotiation mechanisms based on the estimation of the states,
i.e., position and velocity, of the targets. Suitable negotiation
mechanisms between the different sensors are of great importance,
since the cameras have to make strategic decisions according to
the perceived or anticipated actions of the other cameras. This
entire framework leads to a completely decentralized approach.
The remainder of this paper is organized as follows: Sec-
tion 2 presents a rationale for the need of a decentralized col-
laborative camera network. Section 3 states the problem and
its solution in game theoretic terms. Our experimental results
are presented in Section 4. We summarize our work in Sec-
tion 5 with some possible directions for future research.
2. TECHNICAL RATIONALE
2.1. Necessity of collaboration in a camera network
We start by motivating the necessity of an intelligent camera
network and a cooperative strategy. For purposes of expla-
nation, we will refer to this problem as Cooperative Target
Acquisition (CTA). Two questions we need to address are the
following - (i) why do we need active cameras (as opposed to
having a network of cameras with a fixed set of parameters)
and (ii) why does the control strategy need to be cooperative?
The main reason for having a dynamically configurable
network is that it would be prohibitively expensive to have a
static setup that would cater to all possible situations. For ex-
ample, suppose we needed to focus on one person (possibly
non-cooperative) or specific features (e.g., face) of the person
as he walks around an airport terminal and obtain a high res-
olution image of him while keeping track of other activities
also going on in the terminal. To achieve this, we will either
need to dynamically change the parameters of the cameras
where this person is visible or have a setup whereby it would
be possible to capture high resolution imagery irrespective of
978-1-4244-2665-2/08/$25.00 ©2008 IEEE
where the person is in the terminal. The second option would
be very expensive and a huge waste of resources, both tech-
nical and economical. Therefore we need a way to control
the cameras based on the sensed data. Currently, similar ap-
plications try to cover the entire area or the most important
parts of it with a set of passive cameras, and have difficulty in
acquiring high resolution shots selectively.
The control strategy must necessarily be cooperative because
each camera's parameter settings entail certain constraints
on other cameras. For example, if a camera zooms in to fo-
cus on the face of one particular person, thus narrowing its
field of view (FOV), it risks losing much of the person and
the surroundings. Another camera can compensate for this by
adjusting its parameters to track the person with a lower zoom
setting. This requires analysis of the video data in a network-
centric manner, leading to a cost-effective method to obtain
high resolution images for features at dynamically changing
locations.
Figure 1 shows a simple example to illustrate these points.
While Camera 1 keeps the entire scene in focus, it cannot get
high resolution features on any of the people. This is done by
Cameras 2, 3 and 4, but they have narrow FOVs. In fact, we
can see that Camera 2 can no longer keep track of the person
in the white shirt with its current settings, but can keep track
of the other person. Camera 3, therefore, takes responsibil-
ity for tracking the person in the white shirt. With the infor-
mation already available, Camera 4 is able to obtain a high
resolution facial image. Since events will unfold dynamically
over a large area, these assignments must be made dynami-
cally. For example, if the person in the green shirt does not
need to be tracked, Camera 2 can be released for some other
task. However, this requires controlling the cameras such that
they work in a collaborative manner. It would be very diffi-
cult, if not impossible, for a human operator to analyze mul-
tiple video streams and change the camera parameters simul-
taneously. Hence, the necessity for automated solutions.
2.2. Necessity of a decentralized strategy
As the problem complexity increases, it may be difficult to
analyze all the data in a centralized manner and come up with
an effective CTA strategy. There may not be enough band-
width and transmission power available to send all the data
to a central station. It is also not information rate efficient to
transmit an entirely unprocessed video stream. Furthermore,
security of the transmission and interception by a hostile op-
ponent may be a factor in some applications. Finally, even
if it is possible to securely transmit all the data to a central
unit, it may not be the best strategy as it is often intractable to
come up with a centralized optimal policy given the complex
nature of these systems and the environments where they are
deployed. In this framework, each camera must take its own
decisions based on analysis of its own sensed data and
negotiation mechanisms with other sensors. We propose a game
theoretic solution to this problem, the motivation for which is
given in Section 3. The decentralized strategy is explained
diagrammatically in Figure 2. The utility functions of the
cameras and their negotiation strategies are discussed in
Sections 3.4 and 3.5.
Fig. 1. An example of a camera network where different cameras
work collaboratively to fulfill different tasks. While Camera 1 keeps
the entire scene in focus, it cannot get high resolution features on
any of the people. This is done by Cameras 2, 3 and 4, but they have
narrow FOVs. In fact, we can see that Camera 2 can no longer keep
track of the person in the white shirt with its current settings, but can
keep track of the other person. Camera 3, therefore, takes
responsibility for tracking the person in the white shirt. With the
information already available, Camera 4 is able to obtain a high
resolution facial image.
2.3. Relation to previous work
On the one hand, the research presented in this paper is related
to two classical problems in robotics and computer vision -
sensor planning and active vision [1]. On the other hand, the
work is related to camera networks, which have recently
generated a lot of interest.
Some recent work has dealt with networks of vision sen-
sors, namely computing the statistical dependence between
cameras, computing the camera network topology, tracking
over “blind” areas of the network, and camera handoff [2, 3,
4, 5]. However, these methods do not deal with the issue of
actively reconfiguring the network based on feedback from
the sensed data. The work that is perhaps most closely related
to this paper is [6]. Here, a virtual camera network environ-
ment was used to demonstrate a camera control scheme that
is a mixture between a distributed and a centralized scheme
using both static and PTZ cameras. Their work focused on
how to group cameras which are relevant to the same task in
a centralized manner while maintaining the individual groups
decentralized. In [7], a solution to the problem of optimal
camera placement given some coverage constraints was
presented and can be used to come up with an initial camera
configuration.
Fig. 2. A diagrammatic representation of a decentralized CTA
strategy.
A broad framework for the vehicle-target assignment problem
using game theory was presented in [8]; however, it does
not include a risk vs. reward trade-off (as the dynamics of
the targets is not considered) and does not consider the con-
straints imposed by video cameras. Unlike the above meth-
ods, we develop target acquisition strategies in a camera net-
work that rely on cooperative decision making in dynamic
environments.
3. COOPERATIVE TARGET ACQUISITION USING
GAME THEORY
Our goal is to develop a decentralized strategy for coordi-
nated control that relies on local decision-making at the cam-
era nodes, while being aligned with the suitable global crite-
rion of tracking multiple targets at multiple resolutions. For
this purpose, we propose to use novel game theoretic ideas
that rely on multi-player learning and negotiation mechanisms
[9]. The result is a decision making process that aims to op-
timize a certain global criterion based on individual decisions
by each component (sensor) and the decisions of other inter-
connected components. In the next section, we provide an
intuitive reasoning to justify the presented solution strategy.
Then we describe the criteria that each camera should use
to evaluate its own sensed data and the design of the nego-
tiation mechanisms. Finally, we describe the game-theoretic
approach for distributed control of the sensor network.
3.1. Motivation for game theoretic formulation
We aim to design a decentralized CTA strategy for a network
of imaging sensors whereby the sensors will optimally assign
themselves to a set of targets. This is done using a game the-
oretic formulation whereby each sensor is a rational decision
maker, optimizing its own utility function which indirectly
translates to the optimization of a global utility function.
In the context of the motivating example we presented in
Section 2.1, we provide an intuitive rationale why game the-
ory is a feasible solution strategy. The targets {T } are the op-
ponents. A target loses and is eliminated from the game if it
is captured in an image of the desired resolution (specified by
application requirements). Our team of cameras scores each
time it obtains a desired resolution image of a target; how-
ever, there is risk associated with each high resolution image
attempt, as explained in detail in Section 2.1 (acquisition of
each target at high resolution risks losing the global picture
of the relative positions of all the targets and subsequent loss
of track of a target if it leaves the field of view (FOV) of its
imaging device.)
When the number of targets is larger than the number of
cameras, the necessity of cooperation between the cameras is
obvious. Even when the number of cameras is larger than the
number of targets, the benefit of cooperation and information
sharing is clear. A camera tracking $T_i$ at relatively low zoom
allows other cameras to track other targets at higher zoom
levels, provided the state of the camera tracking $T_i$ could be
communicated to them. In addition, cameras at intermediate
zoom levels can allow tracking of multiple targets at a suf-
ficient level of accuracy to enable other cameras to attempt
higher risk, but higher reward, images at high zoom levels.
This shows that a decentralized strategy for camera control is
feasible through local decision making and negotiations with
neighboring cameras.
The objective of the team is to optimize the global utility
function by having each camera optimize its own local utility
function, again subject to risk, but now also subject to pre-
dictions of the actions of the other cameras. The first step is
to find suitable local utility functions such that the objectives
of each camera are localized to that camera, yet aligned with
a global utility function. The second step is to propose an
appropriate negotiation mechanism between cameras to en-
sure convergence of the distributed solution toward the global
solution. The actual computation of these functions depends
upon the analysis of the sensed video.
3.2. Precise problem statement and notation
The overall goal of the imaging network is to track all the
objects of interest in the area where they are deployed and ac-
quire high resolution imagery of some of these objects (speci-
fied by a user or another application). The global utility func-
tion should provide a measure of this objective.
Let us consider $N_t$ targets in the entire area of deployment
and $N_c$ sensors that need to be assigned to these targets. In
the camera network setup, each target will be represented by
a location vector and a resolution parameter, e.g., in Figure 1,
one target could be the face of the person in the white shirt
with a high resolution setting, while other targets could be all
moving objects in the entire observed area with a low
resolution parameter setting. Let the cameras be denoted as
$C = \{C_1, \ldots, C_{N_c}\}$ and the targets as
$T = \{T_1, \ldots, T_{N_t}\}$. Camera $C_i \in C$ will select its
own set of targets $a_i (= \{T_i\}) \in A_i$, where $A_i$ is the
set of targets that can be assigned to $C_i$, by optimizing its
own utility function $U_{C_i}(a_i)$. This is known as
target assignment in the game theory literature. Our problem
is to design these utility functions and appropriate negotiation
procedures that lead to a mutually agreeable assignment of
targets resulting in meeting the global criterion.
3.3. Game theory fundamentals
A well-known concept in game theory is the notion of Nash
equilibrium. In the context of our image network problem,
it will be defined as a choice of targets
$a^* = (a^*_1, \ldots, a^*_{N_c})$
such that no sensor could improve its utility further by
deviating from $a^*$. Obviously, this is a function of time since the
targets are dynamic and the sensors could also be mobile or
capable of panning, tilting and zooming. For our problem,
a Nash equilibrium will be reached at a particular instant of
time when all the cameras are tracking all the targets in the
deployment region at an acceptable resolution and there is no
advantage for a particular camera to choose some other target
to track.
Mathematically, if $a_{-i}$ denotes the collection of targets for
all cameras except camera $C_i$, then $a^*$ is a pure Nash
equilibrium if
$$U_{C_i}(a^*_i, a^*_{-i}) = \max_{a_i \in A_i} U_{C_i}(a_i, a^*_{-i}), \quad \forall C_i \in C. \qquad (1)$$
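As an illustration of condition (1), the following minimal Python sketch checks whether a given assignment is a pure Nash equilibrium by testing every unilateral deviation. The function names, the two-camera example and the toy utility (a camera earns 1 only if no other camera picks the same target) are assumptions made for this sketch and are not the utilities defined in this paper.

```python
def is_pure_nash(profile, action_sets, utility):
    """Return True if no single player can strictly gain by deviating alone.

    profile     -- tuple with one action per player (a* in Eq. (1))
    action_sets -- action_sets[i] lists the actions available to player i (A_i)
    utility     -- callable utility(i, profile) -> number (U_{C_i})
    """
    for i, actions in enumerate(action_sets):
        current = utility(i, profile)
        for alt in actions:
            # Replace only player i's action, keep a_{-i} fixed.
            deviated = profile[:i] + (alt,) + profile[i + 1:]
            if utility(i, deviated) > current:
                return False
    return True


def toy_utility(i, profile):
    """Hypothetical utility: 1 if camera i is the only one on its target."""
    others = profile[:i] + profile[i + 1:]
    return 0 if profile[i] in others else 1


# Two cameras, each choosing one of two targets (illustrative only).
ACTION_SETS = [("T1", "T2"), ("T1", "T2")]

print(is_pure_nash(("T1", "T2"), ACTION_SETS, toy_utility))
print(is_pure_nash(("T1", "T1"), ACTION_SETS, toy_utility))
```

In this toy game the split assignment is an equilibrium, while the assignment in which both cameras pick the same target is not, since either camera gains by switching.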
3.4. Choice of utility functions
3.4.1. Target Utility.
If the number of cameras viewing $T_i$ at an acceptable
resolution $r_0$ is $n_i^{r_0}$, then the utility of covering $T_i$
using a particular assignment profile $a$ is
$$U_{T_i}(a) = \begin{cases} 1 & \text{if } n_i^{r_0} > 0 \\ 0 & \text{if } n_i^{r_0} = 0 \end{cases} \qquad (2)$$
3.4.2. Global Utility.
From the target utility function, we can now define the global
utility function as the sum of the utilities generated by track-
ing all the targets, i.e.,
$$U_g(a) = \sum_{T_i} U_{T_i}(a). \qquad (3)$$
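Equations (2) and (3) can be sketched in a few lines of Python. The sketch assumes that an assignment profile is a dictionary mapping each camera to the set of targets it views at the acceptable resolution; this data layout and the function names are illustrative assumptions, not part of the paper.

```python
def target_utility(target, profile):
    """Eq. (2): 1 if at least one camera covers the target, else 0."""
    n_covering = sum(1 for targets in profile.values() if target in targets)
    return 1 if n_covering > 0 else 0


def global_utility(all_targets, profile):
    """Eq. (3): sum of the target utilities over all targets."""
    return sum(target_utility(t, profile) for t in all_targets)


# Hypothetical profile: C1 and C2 both see T2, nobody sees T3.
targets = ["T1", "T2", "T3"]
profile = {"C1": {"T1", "T2"}, "C2": {"T2"}}

print(global_utility(targets, profile))  # 2
```

Note that covering T2 twice adds nothing to the global utility, which is what motivates a camera utility based on marginal contribution.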
3.4.3. Camera Utility.
We now come to the all important question of defining camera
utility in a suitable manner so that it is aligned to the global
utility function. A target utility, $U_{T_i}$, represents the overall
value of tracking a target $T_i$ at an acceptable resolution. The
camera utility, $U_{C_i}$, represents a particular sensor's share of
this value. We will say that the camera utility is aligned with
the global utility when the sensor can take an action improv-
ing its own utility if and only if it also improves the global
utility [10]. This will require prediction of the actions of
team-mates, which will be achieved through the negotiation
mechanisms described in the next section. Under this defini-
tion, we can use a number of utility functions that have been
proposed in the game theory literature [11, 10]. In our appli-
cation, we propose to use what is known as Wonderful Life
Utility (WLU). In WLU, the utility of a sensor tracking a par-
ticular target is the marginal contribution to the global utility
as a result of this action, i.e., the sensor utility is the change in
the global utility as a result of that sensor tracking that partic-
ular target as opposed to not tracking it. The exact expression
is
$$U_{C_i}(a_i, a_{-i}) = U_g(a_i, a_{-i}) - U_g(a_{-i}). \qquad (4)$$
This definition of camera utility works well if the team-
mate predictions are accurate and stable. As shown in [11],
this sensor utility leads to a potential game with the global
utility function as the potential function, and hence they are
aligned with the global utility. This ensures that the resulting
set of targets that are chosen will be included within the set of
pure Nash equilibria. In particular, the Wonderful Life Utility
has been widely used in economics [12].
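A minimal Python sketch of Eq. (4) follows. It interprets $U_g(a_{-i})$ as the global utility obtained when camera $C_i$ is removed from the profile, and it represents a profile as a dictionary from camera names to sets of covered targets; both choices, and all names below, are assumptions made for illustration.

```python
def global_utility(all_targets, profile):
    """Number of targets covered by at least one camera (Eqs. (2)-(3))."""
    covered = set()
    for targets in profile.values():
        covered |= targets
    return sum(1 for t in all_targets if t in covered)


def wonderful_life_utility(camera, profile, all_targets):
    """Eq. (4): global utility with the camera minus global utility without it."""
    without_camera = {c: ts for c, ts in profile.items() if c != camera}
    return (global_utility(all_targets, profile)
            - global_utility(all_targets, without_camera))


targets = ["T1", "T2", "T3"]
profile = {"C1": {"T1", "T2"}, "C2": {"T2"}}

print(wonderful_life_utility("C1", profile, targets))  # 1: only T1 depends on C1
print(wonderful_life_utility("C2", profile, targets))  # 0: T2 is already covered

# If C2 switches to the uncovered target, its own utility and the
# global utility increase together, illustrating the alignment property.
better = {"C1": {"T1", "T2"}, "C2": {"T3"}}
print(wonderful_life_utility("C2", better, targets))   # 1
print(global_utility(targets, better))                 # 3
```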
3.5. Negotiation mechanisms
Tracking objects of interest in a dynamic setting requires ne-
gotiation mechanisms between the different sensors, allow-
ing them to come up with the strategic decisions described
above. Each sensor negotiates with other sensors to (i)
accurately estimate the state of the targets, (ii) accurately
predict its team-mates' parameters, and (iii) decide its own
action. Note that this requires each camera to consider risk, along
with the rewards measured by the utility functions described
above. This makes each sensor truly autonomous thus pro-
viding robustness in uncertain and adversarial environments,
where there could be a lack of adequate communication re-
sources, and incomplete knowledge of the sensed data and
environmental conditions of the other sensors. Hence the co-
operation between the sensors is limited to exchanging infor-
mation about the states of other sensors and targets, not the
actual video data.
The overall idea is to use learning algorithms for multi-
player games [11]. A particularly appealing strategy for this
problem is Spatial Adaptive Play (SAP) [13]. This is be-
cause it can be implemented with a low computational burden
on each camera and leads to an optimal assignment of tar-
gets with arbitrarily high probabilities for the WLU described
above. In a particular step of the SAP negotiation strategy, a
camera is chosen randomly according to a uniform distribu-
tion. This camera can update the targets it is tracking so as to
maximize its own utility by taking into account the proposed
targets of all the other cameras in the previous step.
Let us consider a concrete example to illustrate this.
Suppose that the chosen camera, say $C_i$ with zoom $z_i$, detects a
new object within its FOV that it may need to track at a high
resolution (specified by a user, for example). Now, $C_i$ can
choose to track this object or, if the object is being tracked
by some other camera, it can leave it to that camera. This is
where the negotiation mechanism comes in. If $C_i$ has access
to the assignment of the other cameras at a previous step, it
can determine whether that camera will be able to maintain
tracking at the desired resolution or the object will be out of
that camera's FOV. Thus, $C_i$ can reach an informed decision
based on its own utility function and the negotiation mech-
anism that allows it to access the previous decisions of the
other cameras. It also requires predicting the position of the
target at a future time instant. By proposing this action, it
can monitor the responses of the other cameras and assess the
risk/reward tradeoff.
3.5.1. Application of SAP negotiation mechanism
In our method for camera control in a sensor network, we
adopt the Spatial Adaptive Play (SAP) strategy. Let us
consider a camera $C_i$ which is viewing the area under
surveillance at an acceptable resolution. At any step of SAP
negotiations, $C_i$ is randomly chosen from the pool of cameras in the
network according to a uniform distribution (see footnote 1), and
only this camera is given the chance to update its proposed
parameter settings. At any negotiation step, $C_i$ proposes a parameter
setting to maximize its own utility based on other cameras'
parameters at the previous step. After $C_i$ updates its settings,
it broadcasts its parameters (i.e., pan, tilt and zoom) to the
entire network. Based on calibration data, the other cameras
can then calculate the area covered by $C_i$ and use that
information to update their own parameters when they are chosen
at a later negotiation step. After the negotiation converges, the entire
area of interest is viewed at an acceptable resolution. An
implicit assumption is that the amount of time it takes for the
assignments to be finalized is less than the time for the targets
to move from one camera to another. This will be true if each
camera has sufficient processing power and if cameras are
assigned to targets only when the targets will be in a camera's
FOV for a reasonable amount of time.
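The negotiation step described above can be sketched as follows. This is a simplified illustration, not the algorithm evaluated in this paper: each camera is assumed to choose among a small, hypothetical list of candidate fields of view (sets of grid cells), the selected camera always takes a greedy best response using its marginal coverage as utility, and any randomized action choice that SAP may use is not modeled. All names and the example layout are assumptions.

```python
import random


def coverage(profile):
    """Global utility: number of grid cells seen by at least one camera."""
    covered = set()
    for cells in profile.values():
        covered |= cells
    return len(covered)


def best_response(camera, profile, options):
    """FOV option that maximizes the camera's marginal coverage,
    given the settings last broadcast by all other cameras."""
    others = set()
    for cam, cells in profile.items():
        if cam != camera:
            others |= cells
    # max() keeps the first option among ties, so the choice is deterministic.
    return max(options[camera], key=lambda fov: len(fov - others))


def negotiate(options, steps=50, seed=0):
    """Run the negotiation; return the final profile and the coverage history."""
    rng = random.Random(seed)
    cameras = sorted(options)
    profile = {cam: options[cam][0] for cam in cameras}  # initial settings
    history = [coverage(profile)]
    for _ in range(steps):
        cam = rng.choice(cameras)  # uniform random camera selection
        profile[cam] = best_response(cam, profile, options)
        history.append(coverage(profile))
    return profile, history


# Hypothetical 1-D strip of six grid cells and three cameras.
options = {
    "C1": [frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})],
    "C2": [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})],
    "C3": [frozenset({0, 1}), frozenset({3, 4}), frozenset({4, 5})],
}
profile, history = negotiate(options)
print(history[0], history[-1])
```

Because the moving camera's marginal coverage changes by exactly the same amount as the global coverage, each greedy update can only keep or increase the number of covered cells in this sketch. Whether the final coverage is complete depends on the candidate fields of view and on the order in which cameras are selected.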
Footnote 1: The issue of choosing which camera to update in a
decentralized manner will be dealt with in future work. Some
suggestions for implementing this can be found in [8].

Table 1. Variables used in algorithms
Symbol                     Description
$t$                        Discrete time step
$\{C_l\}_{l=1}^{N_c}$      Set of all cameras
$\{T_k\}_{k=1}^{N_t}$      Set of all targets
$(T_k, r_0)$               Target being tracked at acceptable resolution $r_0$
$(T_k, r_h)$               Target being tracked at specified high resolution $r_h$
$C_i$                      Camera tracking $(T_k, r_0)$
$C_h$                      Camera tracking $(T_k, r_h)$
$\Delta$                   Prediction time step
$\hat{p}_k(t + \Delta)$    Predicted position of $T_k$ at time $t + \Delta$
$\hat{t}_{k_{out}}$        Predicted time at which $T_k$ is expected to leave $C_l$'s FOV
$\hat{p}_{k_{out}}$        Predicted position of $T_k$ at $\hat{t}_{k_{out}}$
$T_{k_{out}}$              Target about to leave $C_l$'s FOV

Let us now consider the case where a specific application
requires the tracking of a target at a high resolution. Initially,
if a camera $C_j$ determines that it can view the target at the
highest resolution based on the parameters of the other
cameras in the network, it starts tracking the target. Once $C_j$ takes
on the task of tracking the target at a high resolution, it will
predict the state (i.e., position and velocity) of the target at
each time step $t + \Delta$, where $\Delta$ is the prediction time. If
$C_j$ predicts that it will not be able to view the target after
a time $t_{out}$, it will asynchronously broadcast $t_{out}$ and the
target's predicted position at $t_{out}$ to the entire network, indicating
that it needs to hand off the tracking to another camera. When
a camera $C_i$ ($i \neq j$) is now chosen to update its parameters
at any negotiation step, it will first determine if it can adjust
its parameters to view the target at the desired high resolution
based on the predicted target parameters broadcast by $C_j$. If it
can do so, it will set its parameters accordingly and take over
the tracking of the target. However, if $C_i$ determines it is not
able to view the target at a high resolution, it will simply
continue with the negotiation cycle maximizing its own utility as
presented above. Once $C_j$ is no longer able to track the
target, it returns to the negotiation cycle to maximize its own
utility.
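The prediction used for handoff can be illustrated with a short Python sketch. The paper does not specify the motion model at this point, so the sketch assumes a constant-velocity target on the ground plane and an axis-aligned rectangular FOV footprint; the function names and numbers are hypothetical.

```python
import math


def predict_position(pos, vel, delta):
    """Constant-velocity prediction of the target position after delta seconds."""
    return (pos[0] + vel[0] * delta, pos[1] + vel[1] * delta)


def time_to_exit(pos, vel, fov):
    """Time until the target leaves a rectangular FOV footprint.

    fov is (xmin, ymin, xmax, ymax). Returns 0.0 if the target is already
    outside and math.inf if it is not moving toward any border.
    """
    xmin, ymin, xmax, ymax = fov
    if not (xmin <= pos[0] <= xmax and ymin <= pos[1] <= ymax):
        return 0.0
    t_out = math.inf
    for p, v, lo, hi in ((pos[0], vel[0], xmin, xmax),
                         (pos[1], vel[1], ymin, ymax)):
        if v > 0:
            t_out = min(t_out, (hi - p) / v)  # heading toward the upper border
        elif v < 0:
            t_out = min(t_out, (lo - p) / v)  # heading toward the lower border
    return t_out


# Hypothetical target state and FOV footprint (meters, meters per second).
position, velocity = (2.0, 3.0), (1.0, 0.5)
footprint = (0.0, 0.0, 10.0, 5.0)

t_out = time_to_exit(position, velocity, footprint)
p_out = predict_position(position, velocity, t_out)
print(t_out, p_out)  # 4.0 (6.0, 5.0)
```

In this example the tracking camera would broadcast the pair (4.0, (6.0, 5.0)), which is the information another camera needs in order to decide whether it can take over the high resolution task.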
This entire negotiation approach based on game theory
can be seen in more detail in Algorithms 1-4 (variables are
described in Table 1). In these algorithms, $t = \{t_0, t_1, t_2, \ldots\}$
need not be contiguous time instants and will depend upon the
time it takes for each target assignment to be complete. We
will assume that the computation time to come up with a set
of camera parameters, given a target configuration, is much
smaller than the time it takes for the target configuration to
change. Since all possible targets will always be tracked at a
low resolution, this implies that we are concerned only with
the time interval for which a high resolution target is chosen.
Hence, it is a very reasonable assumption.
4. EXPERIMENTAL RESULTS
To evaluate the proposed game theoretic method for camera
control, we tested our approach on a simulated camera net-
work. The area under surveillance is set up as a rectangu-
lar region of 20 by 30 meters. Since we cannot know the
time and location in which the targets will enter the area,
we treat the entire area under surveillance as targets. In our
specific implementation presented here, we divide the area
into grids in a way similar to [7] to make the problem more