
Int J Comput Vis

DOI 10.1007/s11263-012-0587-7

Modeling Coverage in Camera Networks: A Survey

Aaron Mavrinac · Xiang Chen

Received: 16 June 2011 / Accepted: 9 October 2012

© Springer Science+Business Media New York 2012

Abstract Modeling the coverage of a sensor network is an important step in a number of design and optimization techniques. The nature of vision sensors presents unique challenges in deriving such models for camera networks. A comprehensive survey of geometric and topological coverage models for camera networks from the literature is presented. The models are analyzed and compared in the context of their intended applications, and from this treatment the properties of a hypothetical inclusively general model of each type are derived.

Keywords Camera networks · Coverage geometry · Coverage topology · Sensor planning · Calibration

1 Introduction

Visual coverage is an important quantifiable property of camera networks, describing from a pragmatic standpoint what the system can see—that is, what visual data it is physically capable of collecting—and thus informing the most fundamental requirement of any computer vision task. Virtually all camera network applications depend on or can benefit from knowledge about the coverage of individual cameras, the coverage of the network as a whole, or the relationships between cameras in terms of their coverage.

A. Mavrinac · X. Chen (B)
Department of Electrical and Computer Engineering, University of Windsor, Windsor, ON, Canada
e-mail: xchen@uwindsor.ca

A. Mavrinac
e-mail: mavrin1@uwindsor.ca

It is therefore not surprising that there exists a significant body of literature on modeling camera network coverage, spanning back to the earliest days of camera network research. With such diverse applications as sensor planning, optimal camera placement, camera reconfiguration, camera selection, calibration, tracking correspondence, and optimal load distribution, and often broad variation of specific objectives and constraints within each, numerous structures have emerged for capturing coverage information. Given the mixed lineage of camera networks, these models are influenced by earlier work in computer vision and in sensor networks; they also exhibit their own innovations to meet the unique challenges of the field. In this correspondence, we examine the state of the art in modeling camera network coverage, with a view to how the properties of various models relate to their intended applications.

Coverage models can be classified into two different major types. Geometric coverage models, the focus of Sect. 2, are concerned with the physical area or volume of the scene covered by a camera network. Given some information about the camera viewpoints, the physical structure of the scene, and the task to be performed, such a model seeks to quantify whether or not a particular stimulus (minimally described as a point in R^n) is covered, and, in some cases, how well. A set structure with geometric definitions lends itself naturally to this purpose. Topological models are combinatorial structures describing the relationships between cameras in a network with respect to their coverage. This survey considers two types: coverage overlap models in Sect. 3, which describe pairs or groups of cameras with mutual coverage of the scene, and transition models in Sect. 4, which describe the more abstract relationships arising from the possibility (or probability) of a moving agent transiting from one camera's region of coverage to that of another. A topological coverage model is typically formalized as a graph—or as some more general graph-like structure (e.g., simplicial complex, hypergraph)—in which vertices represent the individual camera nodes, and edges indicate coverage relationships.

Fig. 1 Hierarchy of information in coverage models

The ideal geometric coverage model is derived from four primary sources of information: the viewpoint parameters (including position, orientation, and intrinsic parameters) of all cameras in the network, a model of static objects in the scene, a probabilistic model of agent dynamics, and a set of independent task-specific requirements. The coverage overlap topology may be thought of as a distillation of some of this information into a combinatorial structure; ideally, it is derived from the quantified geometric overlap between individual cameras or coverage cells. Transition topology is ideally derived from both the geometric coverage (at some level of granularity) and from the agent dynamics model, capturing information not present in either the geometric coverage model or the overlap topology. Figure 1 illustrates the ideal hierarchy of information. All three types of model can be estimated directly from various forms of captured visual information in the absence of some or all of the primary sources.

We take a common approach in examining each type of coverage model. First, we describe the abstract, fundamental structure of the model class, a comprehensive generalization of all reviewed models which provides the reader with an overall understanding and consistent terminology for the ensuing discussion. Next, we introduce the various surveyed works presenting models of this type, in the context of their target applications. The models being thus introduced, we compare and contrast the realizations (or lack thereof) of each individual aspect of the general model. Finally, with respect to the ultimate goal of producing a general, accurate model of the given class, we discuss the state of the art and expose the open research questions by highlighting the most promising contributions to date.

1.1 Applications

Geometric coverage models are used in a variety of sensor planning applications. In single-camera vision systems, the objective is to find a viewpoint which satisfies coverage requirements, often on a well-defined target in a controlled scene. Analytic solutions are common. By contrast, the optimal camera placement problem in multi-camera systems is usually far less structured, with an objective of maximizing coverage under a cost constraint or minimizing cost under a coverage constraint; typical approaches include optimization by linear programming or search heuristics such as genetic algorithms. In this context, the coverage of multiple cameras is modeled simultaneously within the scene, and the overall coverage performance of the network can be quantified. A special case of this problem is camera reconfiguration, wherein only certain viewpoint parameters of a fixed set of cameras are variable, and must be optimized for maximum coverage performance in an online context (which carries its own unique constraints). Camera coverage models can also be used for online camera selection; typical problem instances involve determining one or more optimal views of a given parametric target, subject to energy and other constraints.

Coverage overlap topology models have several important applications in computer vision and camera networks. The offline problem of multi-view registration has been approached by applying combinatorial optimizations to an equivalent structure, derived from view overlap, to control the registration process. Similarly, a number of multi-camera calibration algorithms, which estimate the relative poses of the cameras in the network, proceed according to optimized paths in a combinatorial representation of coverage overlap. Knowledge of overlap topology may also greatly improve the performance of direct tracking correspondence, in which tracked agents are matched between cameras with simultaneous coverage. This is a subproblem of the more general predictive tracking problem and is the basis for camera handoff across overlapping cameras. Coverage overlap models have also been applied to scheduling problems in camera networks with energy constraints, such as duty cycling, triggered wake-up, and load distribution.

Transition topology models are tailor-made for online predictive tracking applications in camera networks. The objective here is to estimate the probability and duration of an agent transition from one region of coverage (e.g., a camera) to another. In a sense, this is a generalization of the direct tracking correspondence problem to cameras which do not necessarily have coverage overlap, and is likewise used for camera handoff and other tracking tasks.

1.2 Scope of this Survey

Our interest in this survey is in geometrical and topological models of (generally) multi-camera coverage, and, where applicable, in methods of estimating their parameters. We cover posterior applications only insofar as they elucidate the motivations behind decisions in the design of the models; the inclusion of a full exposition of every application for its own sake would produce a prohibitively long work, tantamount to a survey of a considerable cross-section of the entire camera network field, and, more to the point, would not be particularly germane to an understanding of the theory and state of the art of coverage models.

In order to remain focused on the stated purpose, the works surveyed here are primarily drawn from literature specifically on camera networks or other multi-camera systems. Where appropriate, the origins of certain concepts, often from the broader computer vision and sensor network fields, are mentioned to provide historical context.

The survey of geometric coverage models in Sect. 2 includes some discussion of single-camera sensor planning and next best view work from the computer vision literature, as the level of detail in some of these models provides a comparative baseline for later multi-camera work. It also includes mention of some general sensor network models where appropriate, although the focus is primarily on directional sensor networks (usually a label for camera networks).

Section 3 considers some coverage overlap models developed for multi-view registration, which is not necessarily a camera network application, as the views are typically obtained from a single camera in a video sequence or multi-image scan. However, since multiple views are theoretically equivalent to multiple cameras, the model structures and techniques are of interest here.

The transition models presented in Sect. 4 address an issue largely endemic to camera networks, and accordingly, all of the works surveyed are specific to the field.

2 Coverage Geometry

2.1 Anatomy of a Geometric Coverage Model

A geometric coverage model describes the coverage of a geometric space by a sensor or sensor system. Losing some generality, an intuitive example is a region of a three-dimensional Euclidean space which is imaged by a camera sensor satisfactorily for a given image processing task. A precise definition of a generalized model derived from the works surveyed follows.

2.1.1 Coverage Model

A sensor system is an entity which detects stimuli for the purpose of executing a task. In general, this system may physically comprise a single sensor, multiple sensors, or part of one or more sensors' ranges, with one or more sensing modalities. Stimuli are uniquely defined in a stimulus space S; in most cases, stimuli are 2D or 3D points in the stimulus space R^2 or R^3, respectively, but exceptions exist when geometrical characteristics of the stimulus other than spatial position (e.g., direction) affect coverage as well.

A stimulus p ∈ S is considered covered by a sensor system if it yields a response sufficient to achieve the given task. An ideal coverage function, therefore, is a mapping C : S → {0, 1}, where C(p) = 1 indicates that a point is covered. Equivalently, one can speak of the covered volume C ⊂ S. A more general definition C : S → R encompasses models which handle uncertainty and/or consider coverage quality with a coverage grade; this allows for (at least) relative assessment of coverage. Defining the function as C : S → [0, 1], or equivalently, over any closed range which can be mapped linearly thereto, also allows absolute assessment; as well, extension of the subset notion is possible if one considers C a fuzzy subset of S.
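The bivalent and graded definitions above can be made concrete with a toy sensor model (a sketch only: the Euclidean stimulus space, the sensor origin, and the linear falloff are illustrative assumptions, not part of the general model):

```python
import math

def covered_bivalent(p, origin, radius):
    """Ideal coverage function C : S -> {0, 1} for a toy sensor
    covering every point within `radius` of `origin` (both hypothetical)."""
    return 1 if math.dist(p, origin) <= radius else 0

def covered_graded(p, origin, radius):
    """Graded coverage function C : S -> [0, 1]: full coverage at the
    sensor origin, falling off linearly to 0 at `radius`."""
    return max(0.0, 1.0 - math.dist(p, origin) / radius)

# The covered volume C ⊂ S is then {p : C(p) = 1} in the bivalent case,
# or a fuzzy subset of S with membership grade C(p) in the graded case.
print(covered_bivalent((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), 2.0))  # 1
print(covered_graded((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), 2.0))    # 0.5
```

Any graded model over a closed range can be mapped linearly to [0, 1] in this way, which is what permits absolute rather than merely relative assessment.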

2.1.2 Coverage Criteria for Vision

All visual stimuli considered in the work reviewed herein can be reduced to point features, which have a single point of origin in Euclidean space and possibly other characteristics. Based on well-studied geometric imaging models (Faugeras 1993; Ma et al. 2004), various researchers have identified one or more of the criteria described here and incorporated them into their coverage models. We assume that the reader is familiar with the terminology and parameters of the standard camera model, and with basic camera optics. The collective set of intrinsic and extrinsic parameter values of a camera is termed a viewpoint, and the associated parameter space is the viewpoint space V (Figs. 2 and 3).

We identify three basic criteria (Tarabanis et al. 1994) which depend only on the viewpoint and a feature point in R^3; two-dimensional coverage models can be thought of as projecting these criteria onto the R^2 plane.

– Field of View: The infinite subspace of R^3 which can theoretically be imaged by the camera, determined by the horizontal and vertical apex angles (in turn, by the optics and physical image sensor size) and the pose (extrinsics) of the camera.
– Resolution: A constraint on the minimum¹ required resolution; translates directly into an upper limit on depth.
– Focus: A constraint on the acceptable sharpness of the image; given a maximum blur circle diameter, imposes upper and lower depth limits around the focus distance (this range is termed the depth-of-field).

¹ A maximum resolution constraint is conceivable, e.g., for privacy purposes, but we have not encountered this in the literature.


Fig. 2 Imaging criteria (Erdem et al. 2003; Tarabanis et al. 1994)—α and β are the field of view angles, z_r is the depth limit for resolution, and z_n and z_f are the near and far depth of field limits for focus

Fig. 3 Visual stimulus space—the spatial and view angle coverage criteria induce a stimulus space comprising three-dimensional position and direction

The field of view in combination with the depth constraints of resolution and/or focus are also sometimes termed the viewing frustum.
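Under a canonical-pose assumption (camera at the origin looking along +z), the three viewpoint-only criteria can be checked as follows (a minimal sketch; the function name and parameter conventions are ours, following the α, β, z_r, z_n, z_f notation of Fig. 2):

```python
import math

def in_frustum(p, alpha, beta, z_r, z_n, z_f):
    """Check the three basic criteria for a camera at the origin
    looking along +z (a hypothetical canonical pose; in general the
    point would first be transformed by the camera extrinsics).
    alpha, beta: horizontal/vertical apex angles in radians;
    z_r: resolution depth limit; (z_n, z_f): depth of field."""
    x, y, z = p
    if z <= 0:
        return False  # behind the optical center
    # Field of view: angular offset within both apex half-angles.
    in_fov = (abs(math.atan2(x, z)) <= alpha / 2 and
              abs(math.atan2(y, z)) <= beta / 2)
    in_resolution = z <= z_r          # upper depth limit from resolution
    in_focus = z_n <= z <= z_f        # depth of field from focus
    return in_fov and in_resolution and in_focus

# A point 4 m ahead of the camera, slightly off-axis:
print(in_frustum((0.5, 0.2, 4.0), math.radians(60), math.radians(45),
                 z_r=8.0, z_n=1.0, z_f=6.0))  # True
```

The intersection of the three constraints is exactly the viewing frustum; occlusion and view angle, discussed next, depend on additional scene and feature information and cannot be tested from the viewpoint alone.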

Considering the view angle to the feature—the direction of the surface normal at the feature point with respect to the camera's optical axis or the ray joining its principal point with the feature point—adds a fourth coverage criterion, and up to two additional (angular) dimensions to S.

A feature may also be occluded, thus not covered, if the ray from the feature to the optical center of the camera is interrupted by an opaque physical object. We consider two criteria for occlusion, differing primarily in the type of information about the scene used to evaluate them: static occlusion, caused by static and/or deterministically dynamic objects, such as walls, and dynamic occlusion, caused by probabilistically dynamic objects, such as humans. Note that self-occlusion is typically handled by the view angle criterion, by imposing a maximum angle of 90°.

2.1.3 Task Deﬁnition

Some of the imaging criteria in the previous section require task-specific parameters in addition to the camera model parameters. These typically include the minimum resolution and maximum acceptable blur circle diameter for the resolution and focus criteria, respectively, as well as a maximum acceptable view angle. The scene information required by the occlusion criteria—a deterministic scene model and/or a probabilistic model of scene agent dynamics—is also part of the task definition. It should be noted that the information conveyed by the static scene model and the agent dynamics model may not be mutually independent; for example, agent motion may be constrained by the presence of walls, which also factor in static occlusion.

Besides this coverage model information, it is worth mentioning two other aspects of a task definition often encountered, both of which are known under various names.

The first is the relevance function, which indicates the subset of S which is of interest, and may be prioritized (graded). In general, this takes a form similar to that of a coverage function, R : S → R. Depending on the task, it may indicate a large volume of the stimulus space, representing, e.g., a set of rooms and hallways to observe, or a small, localized region of interest, representing, e.g., a part to be inspected or a person to be tracked.

The second is the allowable viewpoint set, a subset of V encompassing all allowable viewpoints. Some dimensions may be constrained to single or discrete values dictated by hardware properties or fixed settings of the cameras, independent of the task. In general, however, such constraints are task-specific.
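How a relevance function might interact with a coverage function when evaluating a system can be sketched over a discretized stimulus space (the scoring rule here, a normalized fuzzy intersection of C and R, is one plausible choice for illustration, not one prescribed by the surveyed works):

```python
def coverage_performance(points, C, R):
    """Relevance-weighted coverage score over a sampled stimulus space.
    C, R : S -> [0, 1] are a (graded) coverage function and a relevance
    function; both are hypothetical stand-ins for a concrete model.
    Returns the fraction of total relevance that is also covered."""
    total_relevance = sum(R(p) for p in points)
    if total_relevance == 0:
        return 0.0
    # min(C, R) is the standard fuzzy-intersection membership grade.
    return sum(min(C(p), R(p)) for p in points) / total_relevance

# Toy 1D example: relevance concentrated on x >= 2, coverage on x <= 3.
pts = [(x,) for x in range(6)]
C = lambda p: 1.0 if p[0] <= 3 else 0.0
R = lambda p: 1.0 if p[0] >= 2 else 0.0
print(coverage_performance(pts, C, R))  # 0.5: two of four relevant points covered
```

With bivalent C and R this reduces to the fraction of relevant points covered; graded functions weight the same sum by priority and coverage quality.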

2.2 Geometric Coverage Models by Application

The basic function of a geometric coverage model is to evaluate the coverage of some region of the stimulus space (viz. a relevance function) by a sensor system. While not an end in itself, some form of this evaluation is a clear prerequisite for a family of camera network coverage problems.

One motivating application which predates camera networks is the offline sensor planning problem. The objective is to find a viewpoint which adequately covers the relevance function for a given task. This may be found via a generate-and-test search, or else the entire set of viewpoints may be solved analytically. Although the output of such methods is a subset of viewpoints in V which cover some R ⊂ S, it is generally straightforward to invert the criteria to obtain the coverage C ⊂ S for some specific viewpoint in V. Tarabanis et al. (1995) give an excellent survey of the topic. Typically, the target systems employ a single camera observing a relatively structured scene, and thus require (and can afford) highly accurate coverage models. Cowan and Kovesi (1988) and Tarabanis et al. (1995) present good examples.

In the multi-camera context, one encounters a similar problem most commonly known as optimal camera placement. The exact approach used in single-camera sensor planning does not scale well to multiple cameras, and there are typically additional design variables such as the number of cameras (with cost constraints), so nonlinear optimization techniques and search heuristics are the typical tools of choice, encouraging much simpler coverage models. Typically, the objective is to search for either the solution with maximum coverage given a fixed cost (or number of cameras), or the solution with minimum cost yielding some minimum coverage. The problem appears similar to the classic art gallery problem (O'Rourke 1987); González-Banos and Latombe (2001) frame it so, with their model assuming omnidirectional visibility and infinite range. Limiting visibility and range yields a more accurate model of coverage, but fundamentally changes the problem. Following the sensor network approach, Ma and Liu (2005b, 2007) propose a so-called boolean sector coverage model (derived from the common 2D disc model; Wang 2010), enabling them to treat optimal camera placement similarly to a set covering problem (Tao et al. 2006; Liu et al. 2008). Qian and Qi (2008), Wang et al. (2009), and Jiang et al. (2010) further develop this direction. Erdem et al. (2003) and Erdem and Sclaroff (2006) approach the problem with a more realistic two-dimensional model; subsequent results using different coverage models and optimization techniques but similar basic method have been reported by Hörster and Lienhart (2006, 2009), Angella et al. (2007), and Zhao et al. (2008, 2009). Malik and Bajcsy (2008) similarly address optimal placement of stereo camera nodes. Yao et al. (2008) adapt this type of approach to surveillance networks with tracking and handoff tasks, adding a "safety margin" to their coverage model to enforce the necessary coverage overlap. The work of Mittal and Davis (2004, 2008) and Mittal (2006) extends the set of constraints to include dynamic occlusion, important in a significant subset of applications involving relatively high densities of moving agents.
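The set-covering treatment can be sketched with the standard greedy approximation (candidate viewpoints and their precomputed coverage sets are hypothetical inputs; real formulations add per-camera costs, coverage grades, and richer constraints):

```python
def greedy_camera_placement(candidate_coverage, targets, budget):
    """Greedy approximation to the set-covering formulation of optimal
    camera placement: repeatedly pick, up to `budget` cameras, the
    candidate viewpoint covering the most still-uncovered target points.
    candidate_coverage: dict mapping viewpoint id -> set of covered targets."""
    uncovered = set(targets)
    chosen = []
    while uncovered and len(chosen) < budget:
        best = max(candidate_coverage,
                   key=lambda v: len(candidate_coverage[v] & uncovered))
        gain = candidate_coverage[best] & uncovered
        if not gain:
            break  # no remaining candidate covers anything new
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered

# Hypothetical precomputed coverage sets for three candidate viewpoints:
cov = {"v1": {1, 2, 3}, "v2": {3, 4}, "v3": {4, 5, 6}}
chosen, missed = greedy_camera_placement(cov, targets={1, 2, 3, 4, 5, 6}, budget=2)
print(chosen, missed)  # ['v1', 'v3'] set()
```

This solves the fixed-budget (maximum coverage) variant; the dual minimum-cost variant instead loops until a required coverage level is met, accumulating cost.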

The problem of online camera reconfiguration is fundamentally similar to optimal camera placement, but restricts the allowable viewpoints to those which can be arrived at by varying some set of parameters which can be controlled online (e.g., mobile platforms, pan-tilt mounts, motorized lenses), and allows for a dynamic relevance objective based on feedback from camera data. The coverage models and optimization techniques used may reflect the need for real-time online performance. Bodor et al. (2005, 2007) and Fiore et al. (2008) seek to optimize the configuration of cameras mounted on mobile robots for global scene coverage. Piciarelli et al. (2009, 2010) address reconfiguration of pan-tilt-zoom (PTZ) cameras, common in surveillance applications. Ram et al. (2006) and Erdem and Sclaroff (2006) both also touch on PTZ reconfiguration; the latter do so by introducing a time constraint to the optimal camera placement problem. Chen et al. (2010) focus on the view angle criterion in optimizing the configuration of rotating (panning) cameras.

Coverage evaluation is also useful in an online context for camera selection, which chooses an optimal subset of viewpoints for a localized relevance function, often subject to constraints such as energy costs. In the single-camera realm, this can be related to the next best view problem, approached by Reed and Allen (2000) and Chen and Li (2004) using coverage models similar to those used in sensor planning. Park et al. (2006) use a fairly realistic three-dimensional coverage model for camera selection, and acknowledge that a yet more sophisticated model could be substituted. The approach of Shen et al. (2007) is notable for assigning a scalar coverage metric to the stimulus space and for allowing task-specific weighting of the individual factors; they also touch on a version of the optimal camera placement problem. Soro and Heinzelman (2007) approach a slightly different problem: given a desired viewpoint directly, rather than a relevance function, their algorithm attempts to find the closest actual viewpoint (subject to energy costs).

For completeness, it is worth mentioning the geometric component of the topological coverage overlap model of Kulkarni et al. (2007), which differs from other geometric models surveyed here in that it is not analytically derived from a camera model. Instead, it is purely empirical: through a Monte Carlo process whereby a structured target is placed at an arbitrary number of random points in the scene, each camera with a view to the target at a given position estimates its pose, and each Voronoi cell around a target position forms a part of the geometric coverage of each camera that observed that position. In combination with the topological model, it is applied to scheduling problems. This model is discussed further in Sect. 3.
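The empirical sampling idea can be illustrated with a simplified Monte Carlo sketch (estimating per-camera coverage fractions over a unit square rather than building Voronoi cells, and assuming a visibility predicate in place of actual target detection and pose estimation):

```python
import random

def empirical_coverage(cameras, observes, bounds, n_samples=10_000, seed=0):
    """Monte Carlo estimate of each camera's covered fraction of a
    rectangular 2D scene: a target is dropped at random positions and
    each camera reports whether it saw it. `observes(cam, p)` is a
    hypothetical visibility predicate standing in for real observations.
    bounds: ((x_min, x_max), (y_min, y_max))."""
    rng = random.Random(seed)
    (x0, x1), (y0, y1) = bounds
    samples = [(rng.uniform(x0, x1), rng.uniform(y0, y1))
               for _ in range(n_samples)]
    return {cam: sum(observes(cam, p) for p in samples) / n_samples
            for cam in cameras}

# Toy scene on the unit square: cam_a sees the left half, cam_b the bottom half.
observes = lambda cam, p: (p[0] < 0.5) if cam == "cam_a" else (p[1] < 0.5)
fractions = empirical_coverage(["cam_a", "cam_b"], observes,
                               bounds=((0, 1), (0, 1)))
print(fractions)  # both estimates close to 0.5
```

The appeal of the purely empirical route is that no camera model or calibration is required; its cost is that accuracy depends entirely on the density of sample positions.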

2.3 Analysis and Comparison of Geometric Models

Table 1 compares the nature and properties of a number of camera network coverage models from the literature, grouped by application. Since most of these models have been developed with specific applications in mind (indicated in the first column), it should be interpreted as a comment on the generality—not necessarily the validity or quality—of the models. The second column indicates the dimensionality of the model; a dimensionality of 2.5 indicates that the final representation is two-dimensional, but is derived from three-dimensional characteristics of the cameras and scene. The third column indicates whether the model is graded, i.e., whether it assigns to a point a scalar measure of coverage in some form (weighted, probabilistic, fuzzy, etc.); non-graded models are bivalent. The following four columns indicate which of the imaging coverage criteria (field of view, resolution, focus, view angle) are included. The final two columns indicate which occlusion criteria (static, dynamic) are included. It should be noted that, in some cases, the authors do not provide quantitative descriptions of some criteria or means of obtaining the information required to derive them.

Table 1 Comparison of selected geometric camera network coverage models. Models compared, by application: Cowan and Kovesi (1988) and Tarabanis et al. (1995) (SP); González-Banos and Latombe (2001), Wang et al. (2009), Jiang et al. (2010), Erdem and Sclaroff (2006), Hörster and Lienhart (2009), Angella et al. (2007), Zhao et al. (2009), Malik and Bajcsy (2008), and Mittal and Davis (2008) (OCP); Bodor et al. (2007) and Piciarelli et al. (2010) (CR); Park et al. (2006) and Shen et al. (2007) (CS). Columns: dimensionality, graded, imaging criteria (FOV, resolution, focus, angle), and occlusion criteria (static, dynamic); individual entries omitted. SP sensor planning, OCP optimal camera placement, CR camera reconfiguration, CS camera selection

2.3.1 Dimensionality

Although vision is an inherently three-dimensional phenomenon, many coverage models in various applications are two-dimensional. In such cases, to simplify the problem at hand, it is assumed (either implicitly or explicitly) that

– all cameras are positioned in a common plane,
– all targets are constrained to a common plane, and
– the scene consists of occluding vertical "high walls."

In models derived from the art gallery problem formulation, e.g., González-Banos and Latombe (2001), the choice reflects the fact that three-dimensional AGP is NP-hard (Marengoni et al. 2000). The vast majority of work on sensor network coverage problems (Meguerdichian et al. 2001) has employed two-dimensional disc models (Wang 2010), assuming a roughly planar environment (although the three-dimensional case has been studied; Huang et al. 2007). Some camera network models, including those of Ma and Liu (2005b, 2007), Liu et al. (2008), Wang et al. (2009), and Jiang et al. (2010), follow directly from this tradition, simply restricting the disc to a sector (Wang 2010) for directionality. Erdem and Sclaroff (2006) and Hörster and Lienhart (2009) do not appear to share this lineage, and explicitly cite the complexity of their respective optimization methods as motivating their restriction to two dimensions. The model of Yao et al. (2008) appears to be heavily influenced by that of Erdem and Sclaroff. In all of the preceding cases, the domain of camera coverage is explicitly planar.

In contrast, some two-dimensional models are not developed from the ground up as such. Bodor et al. (2005, 2007) and Mittal and Davis (2008) begin with three-dimensional analytic treatments of their respective constraints, but subsequent assumptions about the scene and viewpoint restrictions effectively reduce their models to the plane without loss of information. Shen et al. (2007) present a similar treatment of view angle—in particular, including the inclination angle between the sensor and a human subject's head with respect to the ground plane—in an otherwise two-dimensional model. Piciarelli et al. (2010) account for a three-dimensional field of view criterion by projecting the elliptical cross-section of their conical visible region onto the plane.

Early coverage models used in sensor planning, such as those of Cowan and Kovesi (1988) and Tarabanis et al. (1995), are fully three-dimensional: the gains in generality and accuracy clearly outweigh the added complexity in the single-camera case. These advantages have induced a number of multi-camera coverage models across the application spectrum to follow suit. Cerfontaine et al. (2006) describe a multi-camera method employing a three-dimensional coverage model presumably derived from the pinhole camera model, but give no details on the criteria. Park et al. (2006) fully describe their model with a three-dimensional viewing frustum; the multi-camera complexity is handled by dividing the covered volume into discrete parts and generating look-up tables for coverage grade. Angella et al. (2007) employ a three-dimensional model drawing heavily on the sensor planning literature. The models of Malik and Bajcsy (2008) and Zhao et al. (2009) are also fully three-dimensional.

Fig. 4 Coverage valuation schemes—coverage may be graded either using a bivalent indicator function, or a real-valued function (without loss of generality, bounded to [0, 1])

2.3.2 Valuation

Real-world sensor planning applications typically have well-defined requirements, and the goal is simply to find any viewpoint which meets these requirements. Accordingly, models such as those of Cowan and Kovesi (1988) are bivalent: either the viewpoint is acceptable or it is not, or equivalently, either a relevance function is covered or it is not. Tarabanis et al. (1995) discuss not only this admissibility of a viewpoint, but also its optimality, proposing an overall coverage quality metric based on the robustness in individual criteria (Fig. 4).

In solving the camera selection problem, one is interested in finding the best view of a relevance function, to which a real-valued coverage metric clearly lends itself. In Park et al. (2006), the quality of coverage of a point p from a camera C_i is considered to vary inversely with the distance from p to the center of the viewing frustum of C_i. The authors point out that developing an accurate coverage quality metric is not their focus, and allow that a more sophisticated definition could be substituted. Shen et al. (2007) explicitly set out to define such a metric for the restricted problem case of human surveillance; theirs takes the form of a real-valued function.

Soro and Heinzelman (2007) study several coverage-based valuations of viewpoints for camera selection, but as previously mentioned, their formulation is notably different than others discussed here. Roughly speaking, each valuation can be thought of as a distance metric in V. If one were to assign an ideal viewpoint to every p ∈ S, these metrics would effectively constitute a coverage grade of the form C : S → R+.

By contrast, in solving the optimal camera placement and reconfiguration problems, bivalent coverage valuations are used almost exclusively, to enable the use of various optimization techniques (e.g., binary integer programming) that would otherwise not be applicable. Wang et al. (2009) provide one counterexample, applying a multi-agent genetic algorithm over a graded coverage model simple enough to make the optimization computationally feasible. Continuous grading functions defined by Yao et al. (2008) assign reduced coverage values to the edges (i.e., regions near the limits of field of view and resolution) of a camera's model, in order to encourage their optimization process to yield solutions with a substantial margin of overlap between cameras for improved tracking and handoff. Shen et al. (2007) notably use their coverage grade as a constraint in solving a restricted case of the optimal camera placement problem using a greedy algorithm.

2.3.3 Field of View

The coverage model employed by González-Banos and

Latombe (2001) is unique among those surveyed in assuming

omnidirectional viewing capabilities, and thus not including

a ﬁeld of view criterion. The directional nature of camera

coverage is a recurring key point in the literature, and ﬁeld

of view is the most commonly modeled constraint.

The simple sector-based models of Ma and Liu (2005b),

Ma and Liu (2007), Liu et al. (2008), Qian and Qi (2008),

Wang et al. (2009), and Jiang et al. (2010) describe ﬁeld

of view with a single angle parameter, which corresponds

roughly to the horizontal apex angle. The boundary rays are

symmetric about the optical axis, implying an assumption

of non-oblique projection. This turns out to be a satisfactory

deﬁnition in two dimensions; Erdem and Sclaroff (2006) and

Hörster and Lienhart (2009) arrive at the same by way of

the pinhole camera model, perhaps elucidating how its value

should be determined from a given camera system.
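A minimal sketch of such a sector test, assuming a 2D camera pose given by position and optical-axis bearing (parameter names are illustrative, not drawn from any particular model):

```python
import math

def in_sector(p, cam_pos, axis_angle, apex_angle, max_range):
    """2D sector field-of-view test: p is covered if it lies within
    apex_angle/2 of the optical axis and within sensing range."""
    dx, dy = p[0] - cam_pos[0], p[1] - cam_pos[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0 or dist > max_range:
        return dist == 0.0  # camera position itself counts as covered
    bearing = math.atan2(dy, dx)
    # Smallest signed difference between bearing and the optical axis.
    diff = (bearing - axis_angle + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= apex_angle / 2

print(in_sector((1.0, 0.0), (0.0, 0.0), 0.0, math.radians(60), 5.0))  # True
```

The symmetry of the test about `axis_angle` reflects the non-oblique projection assumption noted above.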

Erdem et al. (2003) also describe the three-dimensional

ﬁeld of view using two apex angles. Malik and Bajcsy (2008)

and Mittal and Davis (2008) handle ﬁeld of view similarly.

Cowan and Kovesi (1988) and Tarabanis et al. (1995) both

effectively limit ﬁeld of view to the smaller of the two apex

angles and assume non-oblique projection. Piciarelli et al.

(2010) model the ﬁeld of view as a cone, presumably with

aperture angle equal to the smaller apex angle. While this

representation facilitates their algorithm by projecting to a

circle of constant radius on a transformation of the scene

plane, it lacks accuracy and no justiﬁcation is given in the

context of their application.

The apex angles are derived from a more elementary char-

acterization of the field of view. In general, a point p ∈ R³ is within the field of view of a camera if its projection lies

somewhere on the physical sensor surface. The ﬁeld of view

induced by a rectangular sensor is a pyramid bounded by the

rays from each of its four corners through the optical center

of the camera. Zhao et al. (2009) use this constraint directly,

and can theoretically handle oblique projection. The visible

pyramid volume can also be thought of as divided into an

inﬁnite set of visible “subplanes” orthogonal to the optical

axis; Park et al. (2006) simply assume that the dimensions of

the visible subplanes at the near and far depth of ﬁeld lim-

its are known, and that these subplanes are centered at the

optical axis (implying non-oblique projection).
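The sensor-surface membership test can be sketched under the pinhole model with a centered (non-oblique) rectangular sensor; the units and parameter values below are illustrative assumptions.

```python
def in_view_pyramid(p, focal_len, sensor_w, sensor_h):
    """Point p (in camera coordinates, optical axis = +z) is in the
    field of view if its pinhole projection lands on the sensor.
    Assumes a sensor centered on the axis (non-oblique projection)."""
    x, y, z = p
    if z <= 0.0:
        return False  # behind the camera
    u = focal_len * x / z  # projected sensor-plane coordinates
    v = focal_len * y / z
    return abs(u) <= sensor_w / 2 and abs(v) <= sensor_h / 2

# 35 mm lens over a 36 x 24 mm sensor (all lengths in meters).
print(in_view_pyramid((0.1, 0.0, 1.0), 0.035, 0.036, 0.024))  # True
```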


2.3.4 Resolution

The sector-based models proposed by Ma and Liu (2005b,

2007), Liu et al. (2008), Wang et al. (2009), and Jiang et al.

(2010) have a radial sensing range limit; although there is

no explicit relationship to a resolution constraint, it seems

its most likely justiﬁcation. Cowan and Kovesi (1988) model

their resolution constraint as an arc in two dimensions and

as a spherical cap in three dimensions.

In fact, this circular/spherical representation unnecessar-

ily complicates the matter: since the projected image is planar

and orthogonal to the optical axis, resolution is a function

of depth along the optical axis rather than distance along

the ray from the optical center (Tarabanis et al. 1994). The

triangle-shaped model of Hörster and Lienhart (2009) is a

more accurate two-dimensional representation of the resolu-

tion constraint, although it is not explicitly parameterized as

such. Erdem and Sclaroff (2006), Bodor et al. (2007), Malik

and Bajcsy (2008), Yao et al. (2008), and Mittal and Davis

(2008) all use distance along the optical axis as the single

parameter for the resolution constraint. The last of these also suggest that such a resolution criterion could be used as a “soft”

constraint informing a quality measure.
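Under the pinhole model, this depth-based constraint reduces to a simple bound; the parameterization below is an illustrative assumption rather than any particular author's formulation.

```python
def max_resolution_depth(focal_px, min_px_per_unit):
    """Depth along the optical axis beyond which resolution drops below
    a task requirement.  At depth z a fronto-parallel world unit spans
    roughly focal_px / z pixels under the pinhole model, so requiring
    focal_px / z >= min_px_per_unit gives the depth bound."""
    return focal_px / min_px_per_unit

# 1000 px focal length, task needs 50 px per meter -> 20 m depth limit.
print(max_resolution_depth(1000.0, 50.0))  # 20.0
```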

2.3.5 Focus

While focus is a staple constraint in sensor planning coverage

models (Tarabanis et al. 1995), it has not been included in

most coverage models developed for other purposes. Angella

et al. (2007) mention it, but as with their other imaging cri-

teria, they provide no details. Park et al. (2006) are the other

exception; their model is bounded in depth along the optical

axis by the near and far depth of ﬁeld limits.

Park et al. also use focus as part of their coverage grade

computation (discussed in Sect. 2.3.2), to some extent: if the

center of the viewing frustum is taken as an approximation

of the focus distance, the distance of a point along the opti-

cal axis from the center varies approximately proportionally

to the blur circle diameter. A similar interpretation can be

applied to the valuation function of Wang et al. (2009).

2.3.6 View Angle

A constraint on view angle (Fig. 5) is present in some sensor

planning coverage models (Tarabanis et al. 1995), such as that

of Cowan and Kovesi (1988). In the multi-camera context,

it has been included where the target task depends on view

angle. For example, the task of the camera network in Zhao

et al. (2009) is the identiﬁcation of planar tags, the perfor-

mance of which degrades with increasing view angle. Sim-

ilarly, Shen et al. (2007) are interested in surveillance tasks

such as face tracking, so view angle features prominently in

their model. Mittal and Davis (2008), drawing on the earlier

Fig. 5 View angle—the view angle to a point feature on a surface, shown as p with a corresponding surface normal, is measured as α in some sources and as β in others

sensor planning models, include the criterion, anticipating

that some tasks will have such requirements.

Special cases of task view angle requirements give rise to

a few alternate—but equivalent—forms of the view angle cri-

terion. Bodor et al. (2007) are interested in observing paths,

where foreshortening effects due to the view angle to a path

degrade performance; their view angle criterion is based on

both the angle between the path normal and the camera posi-

tion, and the angle between the path center and the optical

axis. Some applications, such as those of Malik and Bajcsy

(2008) and Chow et al. (2007), require 360° coverage of a target, and define a maximum view angle for mutual coverage

of a point by two cameras. If the view angle to a feature on an

opaque surface exceeds 90°, the surface occludes the feature

from view; this phenomenon is known as self-occlusion and

is sometimes treated as a separate criterion, such as by Chen

and Li (2004) and Zhao et al. (2009).

An interesting question that arises in deﬁning this criterion

is whether to measure the view angle between the feature

surface normal and the optical axis, or between the feature

surface normal and the line-of-sight ray from the camera’s

optical center. Both approaches have merit in terms of validity

with respect to task requirements. The former is taken by

Chen and Li (2004) and Bodor et al. (2007); the latter, by

Cowan and Kovesi (1988), Shen et al. (2007), Malik and

Bajcsy (2008), and Zhao et al. (2009).

Soro and Heinzelman (2007), in one of their models, grade

views primarily based on view angle.
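The two definitions can be computed side by side. The sketch below assumes unit vectors and measures against the reversed surface normal so that a head-on view gives a zero angle; that convention, and the names used, are illustrative choices rather than anything prescribed by the sources.

```python
import math

def view_angles(p, normal, cam_pos, optical_axis):
    """Returns the two view-angle variants discussed above: the angle
    between the surface normal at p and (a) the optical axis, versus
    (b) the line of sight from the optical center to p.  Input
    direction vectors are assumed normalized; angles are in radians."""
    def angle(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return math.acos(max(-1.0, min(1.0, dot)))
    sight = [a - b for a, b in zip(p, cam_pos)]
    norm = math.sqrt(sum(a * a for a in sight))
    sight = [a / norm for a in sight]
    # Reversed normal: a surface facing the camera head-on yields zero.
    rev_normal = [-a for a in normal]
    return angle(rev_normal, optical_axis), angle(rev_normal, sight)

a, b = view_angles((0.0, 0.0, 5.0), (0.0, 0.0, -1.0),
                   (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
print(a, b)  # both 0.0 for a head-on view
```

For points off the optical axis the two measures diverge, which is precisely why the choice matters for task validity.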

2.3.7 Static Occlusion

Occlusion by static scene objects factors heavily in most

multi-camera coverage work (Fig. 6). Malik and Bajcsy

(2008), whose model does not include a static occlusion

criterion, assume a simple rectangular room with nonzero


Fig. 6 Static occlusion—the

two white boxes represent the

static scene model. Coverage

without the constraint is

outlined, with actual coverage in

gray

relevance somewhere near its center, which suits their tar-

get task, but in most multi-camera applications the scene

is assumed to be more complex. The “high wall” occlusion

model common in two-dimensional approaches has its origin

in the art gallery problem, exempliﬁed by González-Banos

and Latombe (2001). This constraint is enforced as follows:

given a scene model consisting of line segments in the plane,

a point p ∈ R² is occluded (not covered) if the line of sight from the camera’s optical center to p intersects any such line

segment. Erdem and Sclaroff (2006) propose an algorithm to

construct a continuous “visibility polygon” set which con-

tains all non-occluded scene points. Hörster and Lienhart

(2009), Mittal and Davis (2008), and Shen et al. (2007) simply check for line-of-sight on each discrete relevance point.

Jiang et al. (2010) approximate static occlusion by simply

excluding obstacle regions from the ﬁeld of view of a cam-

era; in conﬁned spaces and using cameras with realistic ﬁeld

of view, this would likely result in poor performance.
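The line-of-sight test underlying the "high wall" model can be sketched with a standard 2D segment intersection predicate (proper intersections only; collinear boundary cases are ignored for brevity):

```python
def _ccw(a, b, c):
    # Signed area orientation test for the triangle (a, b, c).
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_intersect(p1, p2, q1, q2):
    """Proper intersection test for segments p1-p2 and q1-q2."""
    d1, d2 = _ccw(q1, q2, p1), _ccw(q1, q2, p2)
    d3, d4 = _ccw(p1, p2, q1), _ccw(p1, p2, q2)
    return ((d1 > 0) != (d2 > 0)) and ((d3 > 0) != (d4 > 0))

def occluded(cam, p, walls):
    """'High wall' occlusion: p is occluded from cam if the line of
    sight crosses any scene line segment."""
    return any(segments_intersect(cam, p, a, b) for a, b in walls)

walls = [((1.0, -1.0), (1.0, 1.0))]  # one wall between camera and p
print(occluded((0.0, 0.0), (2.0, 0.0), walls))  # True
```

Checking every discrete relevance point this way matches the approach of the discrete methods above; the visibility-polygon construction avoids the per-point loop.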

The three-dimensional analog to the line segment scene

model is composed of opaque surfaces. A continuous,

analytic solution has been employed in sensor planning

(Tarabanis et al. 1996) and next best view (Maver and Bajcsy

1993) applications. In the multi-camera context, discrete

line-of-sight checking is more common, as is done by Angella

et al. (2007) and Zhao et al. (2009).

Piciarelli et al. (2010) handle static occlusion directly in

the relevance function. Each camera node has its own copy of

the global relevance function, with all points occluded (via

two-dimensional line of sight) from that camera removed

from the model.

2.3.8 Dynamic Occlusion

Mittal and Davis (2008) have pioneered handling dynamic

occlusion in a geometric coverage model. They use a prob-

abilistic model of agent occupancy and some assumptions

about agent height and allowable camera viewpoints to for-

mulate a probabilistic visibility criterion, which is then inte-

grated with their other (static) constraints. Angella et al.

(2007) use this model. Chen and Davis (2008) independently

propose their own probabilistic metric for dynamic occlu-

sion, under similar assumptions about the agents and cam-

eras. Qian and Qi (2008) also propose a probabilistic model,

with targets modeled as 2D discs (analogous to Mittal and

Davis’ representation) and using a simple sector-type cover-

age model.

Zhao et al. (2009) include a “mutual occlusion” criterion in

their model, which approximates worst-case dynamic occlu-

sion by specifying a range of view angles within which a

point is assumed to be occluded by another agent.

2.3.9 Combining Criteria and Multi-Camera Coverage

Cowan and Kovesi (1988) treat coverage criteria as con-

straints on the viewpoint, so in order to ﬁnd the solution

set which satisﬁes all constraints (i.e., the set of viewpoints

which adequately cover the relevance function), it sufﬁces to

intersect the solution set for each individual criterion. Biva-

lent coverage models have taken much the same approach,

intersecting the sets of covered points generated by each cri-

terion, exempliﬁed by the “feasible region” result of Erdem

and Sclaroff (2006). In the multi-camera context, the over-

all coverage of the scene is of interest; this is usually found

by taking the union of the coverage sets for each individual

camera, as Erdem and Sclaroff also show.

Mittal and Davis (2008) integrate their probabilistic

dynamic occlusion metric with their other “static” constraints

to obtain an overall (graded) quality metric for each point and

orientation.

Several models in the literature also provide mechanisms

to compute overall k-coverage of a scene. Erdem and Sclaroff

(2006) show a similar approach in their experimental ﬁg-

ures, but none of their experimental problem statements

require multi-camera coverage. Liu et al. (2008) also use

an intersection-union approach in their work, which focuses

speciﬁcally on k-coverage.

Mittal and Davis (2008) discuss more complex “algorith-

mic constraints” involving the interplay of various constraints

between multiple cameras, for such tasks as stereo matching.

To some extent, particularly on the view angle criterion, this

is realized in the k-coverage model of Shen et al. (2007).
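The intersection-union scheme, including k-coverage, can be phrased directly in set terms over a discretized scene; the point sets below are purely illustrative.

```python
from collections import Counter

def camera_coverage(criteria):
    """A camera's covered set is the intersection of the point sets
    satisfying each of its criteria (field of view, resolution, ...)."""
    covered = set(criteria[0])
    for c in criteria[1:]:
        covered &= set(c)
    return covered

def network_coverage(cameras):
    """Overall scene coverage is the union over all cameras."""
    return set().union(*(camera_coverage(c) for c in cameras))

def k_covered(cameras, k):
    """Points covered by at least k cameras (k-coverage)."""
    counts = Counter(p for cam in cameras for p in camera_coverage(cam))
    return {p for p, n in counts.items() if n >= k}

cam_a = [{1, 2, 3}, {2, 3, 4}]        # two criteria; covers {2, 3}
cam_b = [{2, 3, 4, 5}, {3, 4, 5, 6}]  # covers {3, 4, 5}
print(sorted(network_coverage([cam_a, cam_b])))  # [2, 3, 4, 5]
print(k_covered([cam_a, cam_b], 2))              # {3}
```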

2.3.10 Task Parameters

A recurring motif in the literature is that the quantiﬁcation

of visual coverage depends as much on the task as it does

on the parameters of the imaging system. Generally, given a

computer vision algorithm used in a task, it is at least the-

oretically possible to quantify soft or hard requirements on

imaging properties such as resolution, focus, and view angle.

The actual values of the imaging constraints in sensor plan-

ning models, such as those of Cowan and Kovesi (1988) and

Tarabanis et al. (1995), are assumed to be direct task

requirements. Erdem and Sclaroff (2006) emphasize the task-

speciﬁc nature of the constraints in the optimal camera place-

ment context.


One form of the optimal camera placement problem con-

strains the minimum required proportion of the relevance

function covered by the solution (while maximizing or min-

imizing some other variable, such as cost), a task-speciﬁc

requirement. This is one of the four variations studied by

Hörster and Lienhart (2009). The weighted form of this pro-

portion, sometimes called the coverage rate (Jiang et al.

2010), may ﬁll a similar role, as in the optimal placement

problem studied by Shen et al. (2007).
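The weighted coverage rate can be sketched as follows, with the relevance function taken as a mapping from discrete points to weights (an assumed discretization):

```python
def coverage_rate(relevance, covered):
    """Weighted coverage rate: the fraction of total relevance mass
    that the covered set captures.  `relevance` maps points to
    nonnegative weights; `covered` is the set of covered points."""
    total = sum(relevance.values())
    hit = sum(w for p, w in relevance.items() if p in covered)
    return hit / total if total else 0.0

rel = {(0, 0): 2.0, (1, 0): 1.0, (2, 0): 1.0}
print(coverage_rate(rel, {(0, 0), (1, 0)}))  # 0.75
```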

2.3.11 Relevance Function

A relevance function is most commonly used in optimal

camera placement and camera reconﬁguration applications,

where it comprises the coverage objective. Often, the nonzero

relevance function is implicitly the working volume (as in the

art gallery problem); in order to support a general problem

deﬁnition, however, a model should separate the coverage

target from other considerations such as the scene model for

static occlusion and the allowable viewpoint set. Jiang et al.

(2010), Hörster and Lienhart (2009), Angella et al. (2007),

Zhao et al. (2009), and Malik and Bajcsy (2008) all allow

speciﬁcation of a relevance function, in some form, that is

distinct from the scene model and/or allowable viewpoint

positions.

It is also useful to allow prioritization of coverage in the

relevance function. One of the experiments of Erdem and

Sclaroff (2006) speciﬁes a higher resolution requirement on

certain parts of the ﬂoor plan. Hörster and Lienhart (2009)

use a continuous weighted relevance function in their prob-

lem instance deﬁnition; in the actual discrete domain of their

algorithm, this informs the sampling frequency of control

points. Jiang et al. (2010) retain a similar continuous deﬁni-

tion, using distinct regions with integer weights to simplify

the weighted coverage computation. Piciarelli et al. (2010)

deﬁne a relevance function as a mapping of discrete points

to real values.

2.3.12 Allowable Viewpoint Set

Generally, explicit restrictions on viewpoints in sensor plan-

ning and optimal camera placement applications are on the

position component of the viewpoint only. There are usu-

ally no restrictions on orientation; a notable exception is the

work of Chen and Li (2004), where both position and orien-

tation are constrained by kinematic reachability by the robot

on which the camera is mounted. Restrictions or speciﬁca-

tions on other aspects of the viewpoint, such as the intrinsic

parameters of the camera, are usually implicit in the problem

instance.

Sensor planning coverage models translate coverage cri-

teria into constraints on the solution set of viewpoints, so the

allowable viewpoint set is just a directly-deﬁned constraint

to be intersected with the rest. The “prohibited regions” con-

straint employed by Cowan and Kovesi (1988) is an example.

In the art gallery problem, the volume of relevance to be

covered and the allowable Rⁿ positions of the guards (cam-

eras) are implicitly the same, as exempliﬁed by González-

Banos and Latombe (2001). Erdem and Sclaroff (2006) and

Wang et al. (2009) also use the relevance volume as the

allowable viewpoint position set. Jiang et al. (2010) spec-

ify a relevance function, but place no restriction on camera

position. Malik and Bajcsy (2008) constrain camera positions

to a rectangular volume which is implied to be a superset of

the relevance volume. Shen et al. (2007) restrict viewpoints

to the outer boundary of a rectangular relevance volume.

Hörster and Lienhart (2009) specify the relevance function

and the allowable viewpoint set separately, as subsets of a

larger working volume.

Camera reconﬁguration applications typically place tighter

restrictions on allowable viewpoints. Bodor et al. (2007)

allow full online control of position and orientation, as their

cameras are mounted on mobile robots. Piciarelli et al. (2010)

allow online control of orientation (pan and tilt) as well as

some intrinsic parameters (zoom), but constrain the cameras

to ﬁxed positions. Chen et al. (2010) allow horizontal rotation

(pan) only.

2.4 State of the Art and Open Problems

To date, no geometric model has fully captured the phe-

nomenon of visual coverage in a representation suitable for

the general multi-camera context. While some of the single-

camera sensor planning models we have discussed are quite

accurate and general enough to apply to a wide set of tasks,

they are ill-suited to modeling typical systems and environ-

ments involving multiple cameras, and in their present form

would likely put prohibitive computational requirements on

optimizations involving even relatively small networks. Con-

versely, in expressly designing multi-camera models in forms

suitable for speciﬁc optimization techniques, the remainder

of the authors mentioned have restricted applicability to rel-

atively speciﬁc problem classes. Mittal and Davis (2008)

appear to have designed the most accurate and general model

to date which is still suitable for multi-camera optimization,

but it is still somewhat restricted by certain assumptions,

notably its two-dimensional ﬁnal representation, and its lack

of a focus criterion.

The ideal geometric coverage model would not only

accurately model visual coverage in a form convenient for

multi-camera systems and their environments, with as few

assumptions as possible and allowing for generalized task

requirements, but also provide this information in a form

accommodating powerful optimization techniques. It is clear

from the preceding discussion that the factors involved

in a model achieving the former goal would be highly


complex, complicating success in the latter goal. The prevail-

ing approach to this problem has been to design the model to

be as accurate and general as possible for one speciﬁc opti-

mization technique from the outset, but this has failed to pro-

duce the ideal model. We suggest that attempting to achieve

the ﬁrst goal in isolation could, at the very least, produce a

tool for evaluation, but may also yield new insights into the

nature of multi-camera coverage that may lend the model,

or some derivative thereof, to an appropriate optimization

scheme.

Most sources surveyed have assumed that the coverage

model employed reﬂects a posteriori task performance, with

little or no validation of the model itself. In order to eval-

uate accuracy and generality, a generic scheme for relating

the coverage metric to a task performance metric should be

developed and adopted. A simple statistical measure, such as

the Pearson product-moment correlation coefﬁcient, might

sufﬁce; depending on the nature of the coverage and perfor-

mance metrics, other measures might be more illuminating.
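Such a validation measure is straightforward to compute; below is a minimal sketch of the Pearson coefficient between paired coverage and task performance samples.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation between a coverage metric
    and a task performance metric, over paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Perfectly linearly related metrics correlate at 1.0.
print(round(pearson([0.1, 0.4, 0.9], [1.0, 4.0, 9.0]), 6))  # 1.0
```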

3 Coverage Overlap Topology

3.1 Anatomy of a Coverage Overlap Model

A coverage overlap model describes the topology of a cam-

era network in terms of coverage overlap (mutual coverage

of some part of the scene). Typically, the camera node is

the atomic entity, and of interest are the node-level coverage

overlap relationships. It is often desirable to capture not only

the fact but the degree of overlap.

In the most general form, such a model is a weighted undirected hypergraph H = (C, E, w), where C is a set of coverage cells, E ⊆ P(C) (where P denotes the power set) is a set of hyperedges, and w : E → R+ is a weight function over E. A coverage cell may represent an individual camera node’s coverage model or some portion thereof. The existence of a hyperedge e ∈ E indicates that the nodes in e share mutual coverage of the scene, with a k-hyperedge corresponding to k-coverage. In a weighted model, w(e) quantifies the degree of shared coverage. In an unweighted model, implicitly, w(e) = 1 if e ∈ E and w(e) = 0 otherwise; the existence of an edge indicates sufficient mutual coverage for the given task, by some task-specific quantitative definition.

The most common form is the vision graph (Fig. 7), which

is an ordinary graph (a 2-uniform hypergraph) and thus con-

siders only pairwise coverage overlap.
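The general structure can be sketched directly; the class below is a minimal illustration of H = (C, E, w), with the vision graph recovered as its pairwise restriction (all names are illustrative).

```python
class OverlapHypergraph:
    """Weighted coverage overlap hypergraph H = (C, E, w): vertices are
    coverage cells, a hyperedge is a set of cells sharing mutual
    coverage, and w(e) grades the degree of overlap."""

    def __init__(self, cells):
        self.cells = set(cells)
        self.weights = {}  # frozenset of cells -> positive weight

    def add_overlap(self, cells, weight=1.0):
        self.weights[frozenset(cells)] = weight

    def w(self, cells):
        # Unweighted interpretation: existing edges weigh 1, others 0.
        return self.weights.get(frozenset(cells), 0.0)

    def vision_graph_edges(self):
        """Pairwise (2-uniform) restriction: the ordinary vision graph."""
        return [e for e in self.weights if len(e) == 2]

h = OverlapHypergraph("ABC")
h.add_overlap("AB", 0.4)   # 2-coverage shared by A and B
h.add_overlap("ABC", 0.1)  # 3-coverage shared by all three
print(len(h.vision_graph_edges()))  # 1
```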

3.2 Coverage Overlap Models by Application

The earliest examples of coverage overlap models are

found in multi-view registration applications, including video

sequence registration and 3D range image registration. Since

Fig. 7 Vision graph—from 2D coverage geometry (left) to overlap

topology (right). Note that the pairwise overlaps of A, B, and E are represented, but their 3-overlap is not

the objective is to align visual data from multiple views, it is

clearly useful to know which views overlap and thus might

have some corresponding features for registration. Although

this is not necessarily a camera network application, multi-

ple views are theoretically equivalent to multiple cameras.

Sawhney et al. (1998) propose a graph formalism of the cov-

erage overlap relationships between multiple views for video

sequence registration, with each frame (view) represented by

a vertex, addressing the fact that frames (views) which are not

temporally adjacent may still be adjacent in terms of overlap

topology. Kang et al. (2000) construct a similar graph repre-

sentation of the overlap topology of frames, in which edges

indicate either temporal or spatial (overlap) adjacency. Their

algorithm searches for an optimal path in this graph to min-

imize error in global registration. Huber (2001) constructs

a graph for registration of partial 3D views using an over-

lap criterion on the range images, analyzes the registration

problem through its connectivity properties, and performs

reconstruction over a spanning tree. Sharp et al. (2004) also

study 3D range image registration using a similar graph for-

malism, which they assume exists a priori. They approach the

global registration problem by ﬁrst considering registration

over basis cycles within the graph, then merging the results

using an averaging technique.

Knowledge of camera network topology in terms of cov-

erage overlap is a useful precursor to full metric multi-

camera calibration. Antone et al. (2002) require, as input

to their calibration algorithm, a graph of node adjacency;

although the criterion for edge presence is based on posi-

tion (from GPS), since the algorithm targets omni-directional

cameras, this is supposed to approximate coverage over-

lap and is thus a vision graph. Brand et al. (2004) further develop this work, using directionally-constrained graph

embeddings. Devarajan and Radke (2004) name and explic-

itly describe the vision graph, pointing out its distinctive-

ness from the communication graph (a departure from sensor

networks), and demonstrating its usefulness in informing a

full calibration algorithm as to which camera pairs should

attempt to ﬁnd a homography. However, they offer no

means of obtaining the vision graph automatically, instead


Table 2 Comparison of selected topological coverage overlap models

Model / Appl. / Properties (Struct., Weight, k-View, Part.) / Construction data (Geom., Reg., Feat., Occup., Motion)

Sawhney et al. (1998)  R  G  •
Huber (2001)  R  G  •
Sharp et al. (2004)  R  G  •
Cheng et al. (2007)  C  G  •
Kurillo et al. (2008)  C  G  •
Bajramovic et al. (2009)  C  G  •
Mavrinac et al. (2010)  C  G  •
Stauffer and Tieu (2003)  DT  G  •
Mandel et al. (2007)  DT  G  •
Van Den Hengel et al. (2006)  DT  G  •
Lobaton et al. (2010)  DT  SC  •
Kulkarni et al. (2007)  S  HG  •
Mavrinac and Chen (2011)  S  HG  •

R multi-view registration, C calibration, DT direct tracking correspondence, S scheduling; G graph, HG hypergraph, SC simplicial complex

making the temporary assumption that it is available a pri-

ori. Cheng et al. (2007) address this issue by approximat-

ing the vision graph via pairwise feature matching, and

describe a full calibration algorithm also employing the fea-

ture data following the procedure of Devarajan and Radke.

Kurillo et al. (2008) construct a weighted vision graph based

on the number of shared calibration points, then optimize the

set of calibration pairs by ﬁnding a shortest path spanning

tree. Bajramovic et al. (2009) perform multi-camera cali-

bration over connected components of their vision graph,

which they construct independently using the normalized

joint entropy of point correspondence probability, one of sev-

eral methods described by Brückner et al. (2009). Mavrinac

et al. (2010) describe the vision graph as a theoretical upper

bound for the connectivity of their grouping and calibration

graphs.

Overlap topology can be used to help establish direct

tracking correspondence, a subproblem of tracking cor-

respondence involving agents simultaneously visible in

multiple cameras. This is useful for camera handoff among

overlapping cameras (Javed et al. 2000; Khan and Shah

2003). In this context, overlap topology is usually consid-

ered to be a subset of a more general transition topology,

models for which are covered in Sect. 4. Stauffer and Tieu

(2003) describe a “camera graph” which identiﬁes with the

vision graph, estimating camera overlap from sets of likely

correspondences between tracks. This graph is then used as

feedback to improve tracking correspondence. Mandel et al.

(2007) use a probabilistic approach on motion correspon-

dence to establish overlap topology for tracking purposes.

In a series of papers on the topic, Van Den Hengel et al.

(2006, 2007), Detmold et al. (2007, 2008), and Hill et al. (2008)

describe the exclusion approach, whereby the vision graph

begins complete and edges are removed based on contra-

dictory occupancy observations, with a target application of

tracking correspondence in surveillance networks. Lobaton

et al. (2009a,b,2010) propose a simplicial complex repre-

sentation of overlap topology dubbed the CN-complex, primarily targeted at tracking applications. Overlap topology is

employed by Song et al. (2010) as part of their consensus

approach to tracking and activity recognition.

Camera networks are often composed of devices with lim-

ited computational and energy resources. Knowledge of over-

lap topology can help inform efﬁcient scheduling of node

activity. Ma and Liu (2005a) estimate the correlation between

views using their previously described geometric coverage

model, to improve the efﬁciency of video processing in

camera networks with partially redundant views. However,

the information used is not strictly topological, and the

method applies speciﬁcally to two-camera systems. Dai and

Akyildiz (2009) address the latter issue by extending the cor-

relation problem to multiple cameras, but their model is also

not strictly topological. Kulkarni et al. (2007) construct a

vision graph using a Monte Carlo feature matching technique

with a geometric model component, and demonstrate its use

in duty cycling and triggered wake-up. Mavrinac and Chen

(2011) propose a coverage hypergraph derived directly from

their geometric coverage model, and apply it to the optimiza-

tion of load distribution using a parallel machine scheduling

algorithm.

3.3 Analysis and Comparison of Overlap Models

Table 2 compares the nature and properties of a selection

of topological coverage overlap models from the literature,

grouped by application (indicated in the ﬁrst column). The


second column identiﬁes the combinatorial structure used

(whether explicit or interpreted), and the following three

columns indicate which additional properties are exhibited:

edge weighting, k-view modeling, and modeling of par-

tial views, respectively. The remaining ﬁve columns specify

which type of data is used in constructing the model: geomet-

ric coverage information, registration results, local feature

matching, occupancy correlation, or motion correlation.

3.3.1 Combinatorial Structure

Although not all of the coverage overlap models surveyed are

explicitly formalized as graphs (or hypergraphs), they can be

cast as cases of the general model described in Sect. 3.1 with-

out loss of information. The original descriptions given by

the authors are summarized here, and instances where ancil-

lary information not captured by the graph representation is

present are highlighted.

The vision graph as described by Devarajan and Radke

(2004)—an undirected, unweighted graph with vertices rep-

resenting camera nodes and edges indicating sufﬁcient cover-

age overlap for the purposes of the task—is the simplest and

most common combinatorial structure for models of cover-

age overlap topology seen in the literature. This is the explicit

form of the models of Cheng et al. (2007), Bajramovic et al.

(2009), Mavrinac et al. (2010), and Stauffer and Tieu (2003).

The graphs of Sawhney et al. (1998) and Kang et al. (2000)

describe temporal and spatial adjacency, but since in their

application temporally adjacent frames are assumed to be

spatially adjacent also, they are effectively describing the

vision graph structure. The graphs described by Huber (2001)

and Sharp et al. (2004) are also essentially vision graphs;

though edges are annotated with pairwise relative pose and

other relations, this information is not part of the overlap

model proper. Mandel et al. (2007) and Van Den Hengel et al.

(2006,2007) do not explicitly present graph formalisms, but

maintain sets of hypotheses about coverage overlap which

correspond to edges in the vision graph.

Some recent models extend the captured topology from

pairwise overlap to k-overlap, requiring a hypergraph or

hypergraph-like structure to accommodate the relationships.

Lobaton et al. (2009a,b,2010) partially achieve this with a

simplicial complex representation. This choice of represen-

tation, over a more abstract structure such as a hypergraph,

seems to stem from the focus being more on geometrical

properties and operations and less on combinatorial opti-

mization. They are interested in overlap topology only up

to 2-simplices (or 3-simplices in a hypothetical extension to

three dimensions), so their model does not capture general

k-overlap. Kulkarni et al. (2007) model the full k-overlap

topology of the camera network, although they do not explic-

itly formalize this model in a hypergraph representation or

use any combinatorial techniques. Mavrinac and Chen (2011)

Fig. 8 Vertex granularity in the CN-complex (Lobaton et al. 2010)—the simplicial complex on the right more accurately describes the coverage overlap topology between cameras A and B

present an explicit hypergraph representation of k-overlap

topology, with an initial scheduling application using a com-

binatorial algorithm.

The assignment of one vertex to each camera node is sen-

sible for most combinatorial optimization purposes, but a few

models eschew this paradigm and subdivide vertex assign-

ment into coverage cells. Motivations for doing so vary. Van

Den Hengel et al. (2006,2007) subdivide views into an arbi-

trary number of windows to handle partial coverage overlap

of cameras (due to the speciﬁcs of their construction method,

discussed in Sect. 3.3.2). Mandel et al. (2007) divide views

into regions for a similar reason. In both cases, it appears

that the model of interest to the eventual application recom-

bines the coverage cells (which are each associated with a

speciﬁc camera) to the more usual granularity of one vertex

per camera. Lobaton et al. (2009a,b,2010) divide the two-

dimensional geometric coverage of each camera at rays to

occlusion events (which they call “bisecting lines”), allowing

their model to accurately capture some geometric properties

of overlap, such as static occlusions within the ﬁeld of view,

as shown in Fig. 8. This increased granularity is preserved,

and shown to be beneﬁcial in tracking applications.

Calibration and scheduling applications of overlap models often make use of graph optimizations related to path length. In such cases, weighting vision graph edges proportionally to the degree of coverage overlap can yield better results. Given an edge e_AB = {A, B} linking cameras A and B, Kurillo et al. (2008) assign the weight w(e_AB) = 1/N_AB, where N_AB is the number of common reference points detected by cameras A and B (see Sect. 3.3.2 for more on their graph construction method). Kulkarni et al. (2007) similarly compute the degree of k-overlap from the number of common reference points of k cameras. Both describe methods of handling non-uniform spatial distributions of reference points. Mavrinac and Chen (2011) theoretically use the volume of intersection between k cameras’ geometric coverage models to weight hyperedges, but in practice, the required polytope intersection procedure is NP-hard, so they use a uniform distribution of points to compute a discrete approximation.
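The discrete approximation can be illustrated with a short sketch. This is not the authors' implementation; it simply estimates a k-overlap weight as the fraction of uniformly sampled points lying inside every camera's coverage, assuming each coverage model is available as a hypothetical membership predicate over points.

```python
import random

def overlap_weight(coverage_fns, bounds, n_samples=10000, seed=0):
    """Approximate a k-overlap weight as the fraction of uniformly
    sampled points covered by all cameras simultaneously.

    coverage_fns: predicates mapping an (x, y, z) point to True if the
                  corresponding camera covers it (assumed interface).
    bounds: ((xmin, xmax), (ymin, ymax), (zmin, zmax)) sampling region.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        p = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        if all(covers(p) for covers in coverage_fns):
            hits += 1
    return hits / n_samples

# Illustrative coverage volumes: two unit cubes offset by 0.5 along x,
# so the true intersection occupies one third of the sampling region.
cam_a = lambda p: all(0.0 <= c <= 1.0 for c in p)
cam_b = lambda p: 0.5 <= p[0] <= 1.5 and 0.0 <= p[1] <= 1.0 and 0.0 <= p[2] <= 1.0
w = overlap_weight([cam_a, cam_b], ((0.0, 1.5), (0.0, 1.0), (0.0, 1.0)))
```

The same routine generalizes to any k by passing more predicates, sidestepping the NP-hard polytope intersection at the cost of sampling error.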

3.3.2 Construction from Visual Data

In theory, overlap topology is a derived property of the geometric coverage of the camera network. However, since it is often employed in applications where geometric coverage information is unavailable, especially in calibration initialization, it is often necessary to estimate it using visual data. Finding correspondences between visual data in some form among views is the obvious means—if camera A matches a piece of its own information to one from camera B, then a hypothesis of mutual coverage between A and B can be made or strengthened.

For tasks which already make use of correspondences between local image features, the same information, or some subset thereof, can be used to recover overlap topology. This is the approach originally suggested by Devarajan and Radke (2004). Because their algorithm works in an offline centralized context, Kang et al. (2000) are able to directly correlate image features to infer topology. Cheng et al. (2007), addressing the camera network calibration problem, attempt to make such an approach scalable in an online distributed context by instead sharing feature digests of SIFT (Lowe 2004) descriptors among camera nodes. Bajramovic et al. (2009) and Brückner et al. (2009) use the pairwise joint entropy of point correspondence probability distributions, based on SIFT feature descriptors, as a measure of overlap. Kurillo et al. (2008) use direct matching of a more sparse but more accurate set of features, obtained from a structured calibration target. A similar approach is taken by Kulkarni et al. (2007), although the structured target is only used for topology inference and is unrelated to their application. In their case, both the degree and geometry of coverage overlap are estimated using a Monte Carlo technique, whereby the target is imaged at random reference points, and the k-covered Voronoi cells around each point contribute to the estimate for each of the k cameras covering it.

Registration-based applications are typically iterative, and some overlap models are updated using new correspondence data available in each iteration. Sawhney et al. (1998) infer global overlap topology iteratively, using feedback from a local coarse registration stage to recover graph edges, and subsequently performing local fine registration on adjacent views. An analogous three-dimensional process is employed by Mavrinac et al. (2010) in a distributed calibration algorithm, with coarse registration results iteratively building a grouping graph which then informs pairwise fine registration. Huber (2001) also uses candidate registration matches to iteratively infer overlap topology.

Camera networks often have wide baselines and large rotational motion between cameras, over which local feature detectors generally have poor repeatability and matching performance (Mikolajczyk et al. 2006; Moreels and Perona 2007). Fortunately, they offer the possibility of matching online motion data instead of static features, which can be more robust under some circumstances. Stauffer and Tieu (2003) argue that the descriptiveness, spatial sparsity, temporal continuity, and linear increase in volume over time of tracking correspondences make them more reliable in matching than static features. They correlate local tracks between cameras over time, and infer a vision graph edge where the expectation of a match exceeds a threshold. Mandel et al. (2007) take a slightly different approach, detecting local motion and attempting to correlate it with motion observed in other cameras, via a distributed algorithm. Lobaton et al. (2010) automatically decompose cameras into coverage cells by locally finding “bisecting lines” at which occlusion events occur (e.g., walls), then, with a distributed algorithm, globally estimate cell overlap by matching concurrent occlusion events over time.

Van Den Hengel et al. (2006, 2007) take the reverse approach to those described thus far. Their exclusion algorithm begins by assuming all camera nodes have overlapping coverage, thus a complete vision graph, and eliminates edges over time using occupancy data to rule out coverage overlap. This method does not require any correspondence between observations; if camera A is occupied (currently observing an object) and camera B is unoccupied, this is evidence that A and B do not have mutual coverage, which through observation ratio calculations and thresholding contributes to the final model. Partial overlaps are handled by dividing camera coverage into an arbitrary number of coverage cells. Hill et al. (2008) describe a number of potential shortcomings in real-world operation, along with ways of mitigating the adverse effects on performance. Detmold et al. (2007, 2008) extend the approach into an online distributed context for scalability and dynamic updating of the model.
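The exclusion idea can be sketched in a few lines. The following is a toy illustration, not the authors' algorithm: the occupancy format, the per-edge counters, and the fixed contradiction-ratio threshold are all illustrative assumptions.

```python
from itertools import combinations

def exclude_overlaps(occupancy, cameras, threshold=0.5):
    """Toy exclusion: start from a complete vision graph and drop edges
    whose occupancy observations contradict coverage overlap too often.

    occupancy: list of per-timestep dicts mapping camera id -> bool
               (True if the camera currently observes an object).
    Returns the surviving set of undirected edges (frozensets).
    """
    edges = {frozenset(e) for e in combinations(cameras, 2)}
    contradictions = {e: 0 for e in edges}
    observations = {e: 0 for e in edges}
    for frame in occupancy:
        for e in edges:
            a, b = tuple(e)
            occ_a, occ_b = frame.get(a, False), frame.get(b, False)
            if occ_a or occ_b:
                observations[e] += 1
                # One camera occupied, the other not: evidence against overlap.
                if occ_a != occ_b:
                    contradictions[e] += 1
    return {e for e in edges
            if (not observations[e])
            or contradictions[e] / observations[e] < threshold}

# A and B always co-occur; C is occupied independently of both.
frames = [
    {"A": True, "B": True, "C": False},
    {"A": True, "B": True, "C": True},
    {"A": False, "B": False, "C": True},
    {"A": True, "B": True, "C": False},
]
kept = exclude_overlaps(frames, ["A", "B", "C"])
```

Edges with no occupancy evidence are retained, matching the conservative starting assumption of a complete graph.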

One direct route to an overlap model well-suited to the task at hand is to use the very visual data used by the task itself to estimate the model, if this data (or similar data) is available. This can clearly be seen in most cases of registration, feature-based calibration, and tracking applications in Table 2. In a distributed camera network, depending on the nature of the data and the amount of it required to establish accurate overlap estimates, there is a potential scalability issue since, initially, the data must effectively be broadcast to all other nodes. As mentioned in the preceding section, Cheng et al. (2007) address this using digests of the SIFT features to establish overlap topology, then share the substantially larger full feature data pairwise only among cameras with sufficient overlap for calibration. Kurillo et al. (2008) also use calibration feature points to estimate overlap; scalability is less of an issue because they use a structured calibration target, which yields a set of features both sparse enough to distribute among many cameras and robust enough to achieve accurate metric calibration. In the algorithm of Stauffer and Tieu (2003), overlap topology estimation is part of the closed-loop tracking correspondence task itself. The scheduling applications of Kulkarni et al. (2007) and Mavrinac and Chen (2011) use occupancy correlation and geometric coverage, respectively, in an attempt to obtain the same fundamental information, viz. the degree of content pertinent to the task in each k-view.

3.4 State of the Art and Open Problems

The vision graph is a well-established concept and theoretical tool in the multi-camera network literature. In the application classes of multi-view registration and calibration, which (in the surveyed cases) involve pairwise coverage relationships exclusively, it has proven useful in its basic form. Additional optimizations are possible with appropriate use of edge weights and related combinatorial techniques, as demonstrated by Sharp et al. (2004) and Kurillo et al. (2008) for the respective applications.

When used in direct tracking correspondence, the limitations become apparent. Arbitrary subdivision of camera nodes into partial coverage cells appears to improve performance, but this is unsatisfying from a theoretical standpoint. Lobaton et al. (2009a, 2010) present an explicit departure from the graph model, allowing them to represent 2-coverage and 3-coverage in a simplicial complex; however, presumably since their application does not require it, general k-coverage modeling is absent. Kulkarni et al. (2007) and Mavrinac and Chen (2011) use more general hypergraph (or equivalent) models explicitly designed for general k-coverage, suitable for scheduling in distributed camera networks, but ignore the coverage subdivisions needed by tracking applications.

We believe that the generalized hypergraph model presented in Sect. 3.1 includes all of the information necessary to fit the needs of each of the applications covered here, and being a relatively straightforward combination of existing concepts from the literature, should be backwards-compatible with all of the reviewed sources. In the absence of task-specific geometric coverage information, it is sensible to use the task data itself to approximate the model. It remains an open question whether the nature of the information contained in edge weights, and the additional combinatorial optimizations they make possible, can be incorporated into such a unified framework.

4 Transition Topology

4.1 Anatomy of a Transition Model

A transition model describes the topology of a camera network in terms of the probability and/or timing of moving agents transitioning from one region of coverage to another. While an overlap model of the sort covered in Sect. 3 captures a physical topology, a transition model captures a more abstract functional topology of agent activity. Relationships may exist among camera nodes with no mutual scene coverage (non-overlapping cameras). Since the target application class is agent tracking, the granularity of the topology may extend down to subsets of camera nodes’ coverage: entry and exit points and regions of overlap are often considered individually (Fig. 9).

In the most general form, such a model is a weighted directed graph G = (C, A, w), where C is a set of coverage cells, A is a set of arcs, and w : A → R+ is a weight function over A. A coverage cell may represent an individual camera node’s coverage model or some portion thereof, such as an entry or exit zone (note that a coverage cell may be both an entry zone and an exit zone). C may also include a special source/sink node to collectively represent the uncovered portion(s) of the scene. The existence of an arc a ∈ A indicates that agents may transition from the tail region to the head region. In a weighted model, w(a) is a quantitative metric encapsulating the probability and/or duration of the transition.
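The general model G = (C, A, w) might be realized in code along the following lines. The cell naming scheme and the `predict` ranking are illustrative assumptions for a predictive-tracking use case, not part of any surveyed model.

```python
from collections import defaultdict

class TransitionModel:
    """Weighted directed graph G = (C, A, w) over coverage cells, with a
    distinguished source/sink cell for the uncovered parts of the scene."""

    SOURCE_SINK = "external"  # collective source/sink for uncovered space

    def __init__(self):
        self.weights = {}                   # (tail, head) arc -> w(a)
        self.successors = defaultdict(set)  # tail cell -> reachable cells

    def add_transition(self, tail, head, weight):
        self.weights[(tail, head)] = weight
        self.successors[tail].add(head)

    def predict(self, cell):
        """Candidate destination cells for an agent leaving `cell`,
        ranked by arc weight, most likely first."""
        return sorted(self.successors[cell],
                      key=lambda head: self.weights[(cell, head)],
                      reverse=True)

# Hypothetical cells: one exit zone of camera A, one entry zone of camera B.
m = TransitionModel()
m.add_transition("camA/exit1", "camB/entry1", 0.7)
m.add_transition("camA/exit1", TransitionModel.SOURCE_SINK, 0.3)
```

Here the weights are read as transition probabilities; a duration-based model would store mean transit times in `weights` instead and rank ascending.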

4.2 Transition Models by Application

Transition models are largely aimed at one particular application class: predictive tracking in (generally) non-overlapping camera networks. For a locally tracked agent leaving one coverage cell, the objective is to predict in which other coverage cell(s) the agent will reappear, possibly to inform camera handoff. A special case occurs when the cameras have coverage overlap, which is addressed by several models of overlap topology as the direct tracking correspondence problem (see Sect. 3.2). Javed et al. (2003) show that, in the context of non-overlapping tracking correspondence, transition probabilities and durations are dependent on individual correlations of entry and exit zones, of which each camera may have a number. Their geometric counterparts are coverage cells, and in a combinatorial transition model, they comprise the vertex set. Various techniques have been applied to this type of model to aid in tracking agents across non-overlapping views (i.e., through unobserved regions).

The model presented by Ellis et al. (2003) exemplifies this approach. Their method automatically identifies entry and exit zones in each camera (a problem previously addressed by Stauffer 2003), then finds the transition topology by temporally correlating a large number of local trajectories between cameras, requiring no actual tracking correspondence. Makris et al. (2004) extend this method and further develop its theoretical basis. Stauffer (2005) operates on a closely related model, but presuming the availability of a coverage overlap model—Stauffer cites his own previous work with Stauffer and Tieu (2003)—treats cliques of overlapping cameras (connected components in the vision graph) as the larger coverage structure containing entry and exit zones, on the premise that the overlapping case is better handled by robust direct correspondences. The aforementioned methods ascribe to observations an implicit correspondence, and assume a unimodal statistical distribution of transitions. Tieu et al. (2005) address this with a method capable of handling multimodal distributions.

Fig. 9 Transition graph—from 2D coverage geometry (left) to transition topology (right). Dark ellipses denote entry and exit zones. Intra-camera transition arcs are shown in gray. The S/S vertices represent external agent source/sink

Marinakis et al. (2005) and Marinakis and Dudek (2005) consider cameras with full coverage of widely-separated sections of hallways in a building, so that transitions are constrained to the hallway topology. Due to these constraints, the entry and exit zone coverage cells (transition graph vertices) are the cameras themselves, and the cameras need only be capable of detecting an agent’s presence with reasonable fidelity for their method to successfully estimate the topology. Niu and Grimson (2006) target a vehicle tracking application, using appearance to match observations between, and infer the topology of, non-overlapping cameras.

Dick and Brooks (2004) approach the predictive tracking problem with a Markov model which captures transition topology after a training phase, albeit not in an explicitly combinatorial form, dividing the view into blocks over which the topology is found. The method of Gilbert and Bowden (2006) incrementally learns the topology between recursively subdivided blocks of the views; their method does not require a training phase and can adapt to changes in the camera network. Both yield a probabilistic topological model which can be used in conjunction with appearance-based matching to track across disjoint views. Zou et al. (2007) are interested in tracking humans, and integrate appearance-based agent correspondence based on face recognition into the inference method of Ellis, Makris, and Black, for improved robustness in their target instance. Nam et al. (2007) also specifically track humans, and an appropriate appearance model is integral to their estimation method. The method of Farrell and Davis (2008) falls within this category as well, and is notable for its expressly distributed approach, which affords scalability to large, distributed surveillance networks.

Finally, it should be noted that the coverage overlap model developed by Van Den Hengel et al. (2006, 2007), Detmold et al. (2007, 2008), and Hill et al. (2008) can be extended, as the authors explain, to capture non-overlapping transition topology by adding a temporal padding window to the exclusion method.

4.3 Analysis and Comparison of Transition Models

Table 3 compares the properties of a selection of topological transition models from the literature. Interpreting each model as a graph, the first and second columns indicate whether the graph is directed and weighted, respectively. The third column indicates whether vertices of the graph represent individual entry/exit points, of which each camera may have several; the implication otherwise is that the granularity is at the level of cameras only. The fourth column indicates whether the model includes an explicit source/sink vertex, for agents entering or leaving the scene. The fifth column indicates whether the graph models transitions between overlapping cameras, thus implicitly modeling coverage overlap to the extent described with direct tracking correspondence applications in Sect. 3. The final two columns specify which type of data is used in constructing the model: statistical correlation between temporal events, or correlation via an appearance model.

Table 3 Comparison of selected topological transition models

Model / Properties (Directed, Weighted, Entry/exit, Source/sink, Overlap) / Construction data (Temporal, Appearance)

Ellis et al. (2003) •
Makris et al. (2004) •
Dick and Brooks (2004) •
Marinakis et al. (2005) •
Stauffer (2005) •
Tieu et al. (2005) •
Niu and Grimson (2006) •
Nam et al. (2007) •
Zou et al. (2007) • •
Farrell and Davis (2008) •

4.3.1 Combinatorial Structure

Relatively few of the transition models surveyed are explicitly presented as graphs resembling the generalized model described in Sect. 4.1. Marinakis et al. (2005) and Marinakis and Dudek (2005) model the topology in a directed, unweighted graph, in which vertices represent camera nodes and arcs represent possible transitions. Transition probabilities and durations are captured separately in an agent model. The graph of Nam et al. (2007) also has a vertex for each camera node, but also has intermediate vertices representing either an overlapping or non-overlapping transition point and a source/sink vertex; since individual entry and exit zones are not represented, the graph is undirected. Zou et al. (2007) use essentially the same model as Ellis et al. (2003) and Makris et al. (2004), but treat it explicitly as a weighted, directed graph, with vertices representing entry and exit zones and arcs indicating possible transitions. Trivially, related models, such as those of Stauffer (2005) and Tieu et al. (2005), could be treated similarly. The transition matrix of Dick and Brooks (2004) can be interpreted as an incidence matrix for the transition graph. In general, it is not difficult to apply a graph interpretation to any of the models surveyed here.

As discussed in Sect. 3.3.1, coverage overlap models typically represent each camera node as a vertex, a structure which offers useful combinatorial properties in most applications. Some transition models employ this structure as well. Marinakis et al. (2005) and Marinakis and Dudek (2005) assume widely separated cameras and wish to avoid dealing with complex local tracking, so this is the sensible representation for their case. Niu and Grimson (2006) and Farrell and Davis (2008) also consider transitions only between strictly non-overlapping cameras. In scenes of even moderate complexity, however, a transition topology among individual entry and exit points is more germane to predictive tracking. This structure is described by Ellis et al. (2003) and Makris et al. (2004) and used by a plurality of the models surveyed (Stauffer 2005; Tieu et al. 2005; Zou et al. 2007; Nam et al. 2007). Dick and Brooks (2004) do not automatically determine entry and exit points, but do divide the cameras into coverage cells, which would induce the vertices in a graph interpretation of their model.

Makris et al. (2004) include a source/sink vertex (which they call a “virtual node”), in addition to the entry and exit zone vertices, to handle the probabilistic paths of agents entering or leaving the overall coverage of the camera network. Marinakis and Dudek (2005) and Nam et al. (2007) also include such a vertex in their models.

Among explicit graph models with arc weights, the definition of the weighting function varies. Makris et al. (2004) annotate arcs in the graphical representation of their model according to the probability of transition, computed from the cross-correlation of the temporal sequences of departure and arrival events at each entry and exit zone (vertex), but do not operate on it as a weighted graph. Zou et al. (2007) explicitly apply this weighting to the graph representation, with a modified correlation function based on both identity and appearance (as opposed to identity only). In contrast, Nam et al. (2007) weight arcs based on the mean duration of transitions between cameras.

4.3.2 Construction from Visual Data

It is normally assumed that the camera network is uncalibrated and that information about the scene and agent dynamics is unavailable a priori. For the purposes of this discussion, we will approach the construction methods assuming that entry and exit zones are known, either estimated separately (Stauffer 2003; Ellis et al. 2003; Gilbert and Bowden 2006) or specified a priori, as in the case where each camera is a single entry/exit zone. If agents can be uniquely identified and reliably matched between all generally non-overlapping views, and a sequence of their arrival and departure events is obtained over a period of time, distributions of the probabilities and durations of transitions can be established. From this information, all of the parameters of the general transition model can be obtained.

Unfortunately, correspondence of agents of arbitrary appearance between generally disjoint views is notoriously difficult. Ellis et al. (2003) and Makris et al. (2004) sidestep this challenge with a method of construction based on pure temporal correlation of otherwise unmatched observations. Essentially, they assume implicit correspondence between all pairs of arrival and departure events, and seek a single mode of temporal correlation between each pair of entry and exit zones within a time window (positive and negative); every peak above a certain threshold induces an arc in the transition graph between the associated vertices. Stauffer (2005) employs a similar method, but considers transitions between overlapping cameras separately (see Sect. 4.3.3), so the transition time window is positive only. Tieu et al. (2005) handle more general statistical dependences, capturing richer multi-modal transition distributions rather than simply a mean transition duration, and thus permitting topology estimation from more complex agent behavior. Marinakis et al. (2005) and Marinakis and Dudek (2005) also avoid direct correspondence. They assume that the dynamics of an agent is a Markov process, and estimate the parameters of this process—the probabilities and durations of transitions—using a Monte Carlo Expectation Maximization method.
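The temporal-correlation construction can be illustrated with a simplified sketch: assume implicit correspondence between all departure/arrival pairs, histogram the delays for each zone pair, and induce an arc wherever a peak exceeds a threshold. The event data, bin width, and count threshold below are illustrative assumptions; the surveyed methods use proper cross-correlation statistics rather than this toy histogram.

```python
def transition_arcs(departures, arrivals, window, bin_size, threshold):
    """Infer transition arcs by temporally correlating unmatched events.

    departures/arrivals: dicts mapping zone id -> list of event times.
    window: search all delays in [-window, +window] (negative delays
            admit overlapping-camera transitions).
    Returns {(exit_zone, entry_zone): peak_delay} for correlation peaks
    whose event count reaches the threshold.
    """
    arcs = {}
    for ez, dep_times in departures.items():
        for az, arr_times in arrivals.items():
            bins = {}
            # Implicit correspondence: every departure/arrival pairing
            # in the window contributes to the delay histogram.
            for td in dep_times:
                for ta in arr_times:
                    delay = ta - td
                    if -window <= delay <= window:
                        b = round(delay / bin_size)
                        bins[b] = bins.get(b, 0) + 1
            if bins:
                peak = max(bins, key=bins.get)
                if bins[peak] >= threshold:
                    arcs[(ez, az)] = peak * bin_size
    return arcs

# Hypothetical events: agents leaving A/exit consistently appear at
# B/entry about 2 time units later; C/entry arrivals are uncorrelated.
deps = {"A/exit": [0.0, 10.0, 20.0]}
arrs = {"B/entry": [2.0, 12.0, 22.0], "C/entry": [5.0, 40.0, 77.0]}
arcs = transition_arcs(deps, arrs, window=5.0, bin_size=1.0, threshold=3)
```

Restricting the window to positive delays only would recover Stauffer's (2005) variant, where overlapping transitions are handled separately.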

Methods which do rely on appearance-based agent correspondence normally have a narrower application focus. Dick and Brooks (2004) require a training phase for their Markov model which relies on colour-based correspondence. Niu and Grimson (2006) rely on correspondence of tracked vehicles using an appearance model based on colour and size. The estimation method of Nam et al. (2007) centers around correspondence based on background subtraction and a human appearance model. Farrell and Davis (2008) employ an information-theoretic appearance matching process, and infer the expected transition model from the accumulated evidence using a modified multinomial distribution. Their method is also notable for its distributed design: its “semi-localized” processing yields a scalable algorithm for which the authors demonstrate successful results in networks up to 100 nodes.

Zou et al. (2007) integrate correspondence based on face recognition into the previously described statistical method of Ellis, Makris, and Black, resulting in a hybrid approach which they claim outperforms methods based purely on either identity or appearance.

Fig. 10 Possible cases of transition—dark ellipses denote entry and exit zones, and the dotted line indicates the agent path

4.3.3 Transitions Between Overlapping Cameras

There is a question as to how transitions between cameras with overlapping coverage should be handled in transition models. Referring to the example agent paths in Fig. 10, it is clear how to handle the transition between non-overlapping cameras shown in Fig. 10a, as the surveyed methods unanimously agree: an arc from A or its exit zone to B or its entry zone, with a positive transit duration. However, in the transition between overlapping cameras shown in Fig. 10b, the agent passes through the entry zone of B before passing through the exit zone of A, and the agent is observed by one or both cameras during the entire transition. Transitions from one entry or exit zone to another within a single camera’s coverage can be thought of as a special case of this scenario.

Ellis et al. (2003) and Makris et al. (2004) deal with the overlapping case as with the non-overlapping case. For a given departure event at time t1, they check for arrival events at time t2 ∈ [t1 − T, t1 + T], where T is a temporal search window. Thus, in Fig. 10a, t2 > t1, whereas in Fig. 10b, t2 < t1. The advantage of this approach is that it does not require prior estimation of overlap topology, and uses a single process to estimate transition topology for a general-case camera network with overlapping and/or non-overlapping cameras.

Stauffer (2005) argues that the overlapping case is best handled by more robust direct tracking correspondence, and proposes first estimating overlap topology (Stauffer and Tieu 2003), then treating connected components in the vision graph as single “cameras”—in general, with multiple entry and exit zone vertices—in the transition model. The advantage of this approach is improved robustness in estimating the overlapping portions of the transition topology, assuming a reliable means of finding inter-camera correspondences of agents and/or their tracks is available.

4.4 State of the Art and Open Problems

Numerous researchers have converged on the structure described in Sect. 4.1, to varying degrees. As with coverage overlap models, it is safe to say that this generalized model subsumes all existing cases; individual models have left out certain properties (arc directivity and weights, node subdivision, source/sink node) either because they are unnecessary for the particular application case or else to facilitate optimization. Given the clear focus on a single application class, future optimization efforts should adopt such a unified model, if possible, for the sake of general applicability.

Approximation of the graph from visual data is split between statistical temporal correlation and appearance-based correlation. Given the complementary strengths of both methods, the way forward seems to be a hybrid approach in the vein of Zou et al. (2007). If agent dynamics are being modeled for the purposes of probabilistic occlusion, as by Mittal and Davis (2008), this may also be informative for transition model approximation.

One point of contention, to which the answer is not yet clear, is whether the graph should model transitions strictly between non-overlapping coverage cells, with overlapping transitions handled separately as proposed by Stauffer (2005), or all transitions. If the relative reliability of the approximations for overlapping transitions is the issue, implementation of the aforementioned hybrid approximation approach may favor the latter unified model.

5 Conclusions and Future Directions

We have endeavoured to present a comprehensive and lucid exposition of the theory, state of the art, and challenges of modeling the visual coverage of camera networks. The models and estimation methods discussed in this survey represent the efforts of researchers to develop theory to describe the particular and unique properties of this relatively new type of system, drawing on concepts from computer vision, sensor networks, and other related fields.

Through an analysis of their properties in the context of specific applications, a generalized prototype model of each type has been derived, of which the major structure of the models in the surveyed works can be cast as particular instances. In addition to providing a clear overview of the three types of models individually, it is relevant at this point to explicitly expose the relationships between them in terms of the information they encapsulate.

To date, camera network coverage has not been described from the vantage of an inclusive understanding of its various applications. Researchers have developed tools to achieve particular objectives, adapting work from similar but not quite identical problems elsewhere in the landscape. Over time, this has begun to converge and evolve into a theoretical framework particular to camera networks. It is our belief that the unique challenges involved warrant a next phase in modeling camera network coverage, viz. the development of a comprehensive, rigorous, and mature theory encompassing geometric coverage as well as both notions of topology. Our generalized models and exposition of their information hierarchy are a first step in this direction, but a truly useful theory will be forged in the fire of application, and many cues can be taken from the design decisions made in the various works surveyed here. A general, analytic understanding of coverage will reduce duplicated effort and open up new possibilities in solving a large cross-section of important problems in camera networks.

Acknowledgments This research was supported in part by the Natural Sciences and Engineering Research Council of Canada.

References

Angella, F., Reithler, L., & Gallesio, F. (2007). Optimal deployment of cameras for video surveillance systems. In Proceedings of IEEE conference on advanced video and signal based surveillance (pp. 388–392).

Antone, M., & Teller, S. (2002). Scalable extrinsic calibration of omni-directional image networks. International Journal of Computer Vision, 49(2/3), 143–174.

Bajramovic, F., Brückner, M., & Denzler, J. (2009). Using common field of view detection for multi camera calibration. In Proceedings of vision, modeling, and visualization workshop.

Bodor, R., Drenner, A., Janssen, M., Schrater, P., & Papanikolopoulos, N. (2005). Mobile camera positioning to optimize the observability of human activity recognition tasks. In Proceedings of IEEE/RSJ international conference on intelligent robots (pp. 4037–4042).

Bodor, R., Drenner, A., Schrater, P., & Papanikolopoulos, N. (2007). Optimal camera placement for automated surveillance tasks. Journal of Intelligent and Robotic Systems, 50(3), 257–295.

Brand, M., Antone, M., & Teller, S. (2004). Spectral solution of large-scale extrinsic camera calibration as a graph embedding problem. In Proceedings of 8th European conference on computer vision (pp. 262–273).

Brückner, M., Bajramovic, F., & Denzler, J. (2009). Geometric and probabilistic image dissimilarity measures for common field of view detection. In Proceedings of IEEE computer society conference on computer vision and pattern recognition (pp. 2052–2057).

Cerfontaine, P. A., Schirski, M., Bundgens, D., & Kuhlen, T. (2006). Automatic multi-camera setup optimization for optical tracking. In Proceedings of virtual reality conference (pp. 295–296).

Chen, S., & Li, Y. (2004). Automatic sensor placement for model-based robot vision. IEEE Transactions on Systems, Man, and Cybernetics, 34(1), 393–408.

Chen, T. S., Tsai, H. W., Chen, C. P., & Peng, J. J. (2010). Object coverage with camera rotation in visual sensor networks. In Proceedings of 6th international wireless communications and mobile computing conference (pp. 79–83).

Chen, X., & Davis, J. (2008). An occlusion metric for selecting robust camera configurations. Machine Vision and Applications, 19(4), 217–222.

Cheng, Z., Devarajan, D., & Radke, R. J. (2007). Determining vision graphs for distributed camera networks using feature digests. EURASIP Journal on Advances in Signal Processing, 2007, 1–11.

Chow, K. Y., Lui, K. S., & Lam, E. Y. (2007). Achieving 360 angle coverage with minimum transmission cost in visual sensor networks. In Proceedings of IEEE wireless communications and networking conference (pp. 4112–4116).

Cowan, C. K., & Kovesi, P. D. (1988). Automatic sensor placement from vision task requirements. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(3), 407–416.


Dai, R., & Akyildiz, I. F. (2009). A spatial correlation model for visual information in wireless multimedia sensor networks. IEEE Transactions on Multimedia, 11(6), 1148–1159.

Detmold, H., Dick, A. R., Van Den Hengel, A., Cichowski, A., Hill, R., Kocadag, E., Falkner, K., & Munro, D. S. (2007). Topology estimation for thousand-camera surveillance networks. In Proceedings of 1st ACM/IEEE international conference on distributed smart cameras (pp. 195–202).

Detmold, H., Dick, A. R., Van Den Hengel, A., Cichowski, A., Hill, R., Kocadag, E., Yarom, Y., Falkner, K., & Munro, D. S. (2008). Estimating camera overlap in large and growing networks. In Proceedings of 2nd ACM/IEEE international conference on distributed smart cameras.

Devarajan, D., & Radke, R. J. (2004). Distributed metric calibration of large camera networks. In Proceedings of 1st workshop on broadband advanced sensor networks.

Dick, A. R., & Brooks, M. J. (2004). A stochastic approach to tracking objects across multiple cameras. In Proceedings of Australian joint conference on artificial intelligence (pp. 160–170).

Ellis, T. J., Makris, D., & Black, J. K. (2003). Learning a multi-camera topology. In Proceedings of joint IEEE workshop on visual surveillance and performance evaluation of tracking and surveillance (pp. 165–171).

Erdem, U. M., & Sclaroff, S. (2003). Automated placement of cameras in a floorplan to satisfy task-specific constraints. Tech. Report, Boston University.

Erdem, U. M., & Sclaroff, S. (2006). Automated camera layout to satisfy task-specific and floor plan-specific coverage requirements. Computer Vision and Image Understanding, 103(3), 156–169.

Farrell, R., & Davis, L. S. (2008). Decentralized discovery of camera network topology. In Proceedings of 2nd ACM/IEEE international conference on distributed smart cameras.

Faugeras, O. (1993). Three-dimensional computer vision: A geometric viewpoint. London: MIT Press.

Fiore, L., Somasundaram, G., Drenner, A., & Papanikolopoulos, N. (2008). Optimal camera placement with adaptation to dynamic scenes. In Proceedings of IEEE international conference on robotics and automation (pp. 956–961).

Gilbert, A., & Bowden, R. (2006). Tracking objects across cameras by incrementally learning inter-camera colour calibration and patterns of activity. In Proceedings of 9th European conference on computer vision (pp. 125–136).

González-Banos, H., & Latombe, J. C. (2001). A randomized art-gallery algorithm for sensor placement. In Proceedings of 17th annual symposium on computational geometry (pp. 232–240).

Hill, R., Dick, A. R., Van Den Hengel, A., Cichowski, A., & Detmold, H. (2008). Empirical evaluation of the exclusion approach to estimating camera overlap. In Proceedings of 2nd ACM/IEEE international conference on distributed smart cameras.

Hörster, E., & Lienhart, R. (2006). On the optimal placement of multiple visual sensors. In Proceedings of 4th ACM international workshop on video surveillance and sensor networks (pp. 111–120).

Hörster, E., & Lienhart, R. (2009). Optimal placement of multiple visual sensors. In H. Aghajan & A. Cavallaro (Eds.), Multi-camera networks: Principles and applications (Chap. 5, pp. 117–138). Burlington: Academic Press.

Huang, C. F., Tseng, Y. C., & Lo, L. C. (2007). The coverage problem in three-dimensional wireless sensor networks. Journal of Interconnection Networks, 8(3), 209–227.

Huber, D. F. (2001). Automatic 3D modeling using range images obtained from unknown viewpoints. In Proceedings of 3rd international conference on 3D digital imaging and modeling (pp. 153–160).

Javed, O., Khan, S., Rasheed, Z., & Shah, M. (2000). Camera handoff: Tracking in multiple uncalibrated stationary cameras. In Proceedings of workshop on human motion (pp. 113–118).

Javed, O., Rasheed, Z., Shafique, K., & Shah, M. (2003). Tracking across multiple cameras with disjoint views. In Proceedings of 9th IEEE international conference on computer vision (pp. 952–957).

Jiang, Y., Yang, J., Chen, W., & Wang, W. (2010). A coverage enhancement method of directional sensor network based on genetic algorithm for occlusion-free surveillance. In Proceedings of international conference on computational aspects of social networks (pp. 311–314).

Kang, E. Y., Cohen, I., & Medioni, G. G. (2000). A graph-based global registration for 2D mosaics. In Proceedings of international conference on pattern recognition (pp. 257–260).

Khan, S., & Shah, M. (2003). Consistent labeling of tracked objects in multiple cameras with overlapping fields of view. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(10), 1355–1360.

Kulkarni, P., Shenoy, P., & Ganesan, D. (2007). Approximate initialization of camera sensor networks. In