Feature-based calibration of distributed smart stereo camera networks
Aaron Mavrinac, Xiang Chen, and Kemal Tepe
University of Windsor
Department of Electrical and Computer Engineering
401 Sunset Ave., Windsor, Ontario, Canada N9B 3P4
A distributed smart camera network is a collective of vision-capable devices with enough processing power to execute algorithms for collaborative vision tasks. A true 3D sensing network applies to a broad range of applications, and local stereo vision capabilities at each node offer the potential for a particularly robust implementation. A novel spatial calibration method for such a network is presented, which obtains pose estimates suitable for collaborative 3D vision in a distributed fashion using two stages of registration on robust 3D features. The method is initially described in a geometrical sense, then presented in a practical implementation using existing vision and registration algorithms. The method is designed independently of networking details, making only a few basic assumptions about the underlying network's capabilities. Experiments using both software simulations and physical devices are designed and executed to demonstrate performance.

Index Terms: camera network, calibration, collaborative, distributed, registration, 3D vision
1. INTRODUCTION

The relatively new concept of 3D visual sensor networks [2] is emerging within the area of distributed smart cameras. By collecting and processing true 3D information, such networks offer improvements in existing applications and promise entirely new possibilities.
The 3D sensing paradigm includes the use of passive 3D (stereo) vision, fusion of information from multiple views, and distributed collaborative processing. Many of the basic computer vision operations, including shape recognition, object tracking, motion analysis, and scene reconstruction, have been improved through one or two of these properties; we contend that all three in unison yield yet greater benefits. Thus, our work applies to distributed smart stereo camera networks, wherein each node consists of a device capable of passive 3D vision and the distributed algorithms operate primarily or exclusively on 3D data.

(Funding for this research was provided in part by the Natural Sciences and Engineering Research Council of Canada.)
In order to perform any useful collaborative processing
of this information, there must exist some way to bring data
from multiple nodes into a common reference frame. This
is achieved through calibration, which includes spatial local-
ization and orientation as well as temporal synchronization.
While some distributed time synchronization methods from
the sensor network literature are applicable, existing localiza-
tion methods are insufficient.
This paper presents a novel scalable spatial calibration
method for distributed smart stereo camera networks. It is
designed independent of node architecture or network details,
and makes few assumptions about node deployment or scene
contents. The problem is reduced to a geometrical form in Section 3, from which the implementation in Section 4 follows. A more thorough treatment can be found in [1].
The majority of research in distributed smart cameras to date has focused on monocular vision at each node. A num-
ber of methods for distributed self-calibration have been pro-
posed for this paradigm, and though the vision components
are not readily applicable to 3D sensing nodes, the general
localization and distribution concepts developed apply to any
vision-based system.
From the perspective of traditional sensor networks, the primary challenges are the directionality of vision sensors, the higher degree of accuracy required by vision applications, and the large volume of raw sensor data. Conversely, from the perspective of traditional computer vision, the challenge is in the scalable distribution of processing among nodes and the related limitations of network bandwidth.
While traditional sensor network methods generally em-
ploy omnidirectional sensors and thus require only localiza-
tion, vision-based networks also require orientation. To apply
similar methods to directional vision sensors, the concept of
the vision graph is introduced in [4], where an edge on the
graph represents shared field of view rather than a communi-
cation link.
Functional calibration methods are presented for monocular distributed smart cameras in [4, 5]. These are based on wide-baseline stereo methods, which are generally not robust due to the matching problem [13], and require unwieldy ini-
tialization schemes or dictate deployment constraints. Some
methods, such as [6], use motion of objects in the scene to
calibrate, but these still suffer from the matching problem
to a degree and require certain kinds of scene. Potentially
more robust methods are presented in [7, 8]; however, these require the use of markers or beacons placed in the environment, which is infeasible in many cases and may constrain
deployment or extension to dynamic calibration.
With the true 3D sensing network paradigm introduced in
[2], advocating distributed smart stereo cameras, a calibration
method called Lighthouse is presented in [3] which uses 3D features and geographic hash tables (GHTs) to localize and orient nodes. Our method employs the same basic concept, but is more complete and addresses some impracticalities in the former.
2.1. Definitions

2.1.1. Nodes and Groups

A node is the abstract or physical smart stereo camera device itself; nodes shall be denoted by sequential capital letters (A, B, and so forth). The set of all nodes in the network shall be denoted N (where |N| represents the total number of nodes). A group is a set of nodes which agree on a single leader node; a group led by node A shall be denoted G_A (where |G_A| represents the number of nodes in the group). Every group is a subset of the full set of nodes (G_A ⊆ N), and every node is a member of exactly one group (so, if G_A and G_B are two separate groups, |G_A ∩ G_B| = 0).
2.1.2. Point Sets and Features

A point set is the full set of interest points detected locally at a node; the point set of node A shall be denoted S_A. The overlap between point sets S_A and S_B refers to the size of the intersection of the two sets |S_A ∩ S_B|, said intersection occurring where a point in S_A corresponds to the same physical point as a point in S_B. The percent overlap is defined as

p(S_A, S_B) = |S_A ∩ S_B| / min(|S_A|, |S_B|) × 100%    (1)
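As a concrete illustration, the percent overlap of Eq. (1) can be sketched in a few lines of Python (a hypothetical helper, treating point sets as sets of hashable point labels; in practice the correspondence between physical points is not directly observable):

```python
def percent_overlap(s_a, s_b):
    """Percent overlap p(S_A, S_B) of Eq. (1): the intersection size
    relative to the smaller of the two point sets, as a percentage."""
    return len(s_a & s_b) / min(len(s_a), len(s_b)) * 100.0
```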
A feature is any subset of the point set of a certain size (determined by a parameter of the algorithm); when discussing a single arbitrary feature from node A, it shall be denoted F_A, where F_A ⊆ S_A. Two features F_A and F_B, from nodes A and B respectively, are considered to match (denoted F_A ≈ F_B) if each point in F_A corresponds to the same physical point as a point in F_B. In the context of the algorithm, it is impossible to ascertain this correspondence, so the term match implies rather a presumed match based on a criterion of geometrical similarity.
2.2. Pose
Pose is a concept used here to describe the relative motion between two nodes in a distributed smart camera network, which is the basis of calibration. Each node is considered to have its own local coordinate system. The relative pose of node A with respect to node B is denoted P_AB, and is the rigid transformation in 3D Euclidean space from the coordinate system of A to that of B.

The transformation P_AB : R^3 → R^3 consists of a rotation matrix (3 × 3 real orthogonal matrix) R_AB and a 3-element translation vector T_AB. P_AB maps a point x ∈ R^3 as follows:

P_AB(x) = R_AB x + T_AB    (2)

The identity pose is denoted P_I, and consists of the identity matrix R_I and the zero vector T_I.

The inverse of pose P_AB, denoted P_AB^-1, reverses the pose transformation (so that P_AB^-1 = P_BA). It can be determined as follows:

P_AB^-1(x) = R_AB^-1 x - R_AB^-1 T_AB    (3)

A succession of pose transformations P_BC(P_AB(x)) can be composed into a single pose, denoted (P_BC ∘ P_AB)(x), as follows:

(P_BC ∘ P_AB)(x) = R_BC R_AB x + (R_BC T_AB + T_BC)    (4)

This transformation maps from the coordinate system of A to that of B, then from that of B to that of C; therefore, the transformation from A to C can be computed via composition as P_AC = P_BC ∘ P_AB. This operation is transitive, so one node's pose relative to another can be computed indirectly over an arbitrary number of intermediate poses if they exist.
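For concreteness, Eqs. (2)-(4) can be sketched as a small Python class (a minimal illustration using NumPy; the class and method names are our own):

```python
import numpy as np

class Pose:
    """Rigid transformation P(x) = R x + T in 3D Euclidean space (Eq. 2)."""

    def __init__(self, R=None, T=None):
        # Defaults give the identity pose P_I (R_I = I, T_I = 0).
        self.R = np.eye(3) if R is None else np.asarray(R, dtype=float)
        self.T = np.zeros(3) if T is None else np.asarray(T, dtype=float)

    def map(self, x):
        # Eq. (2): P(x) = R x + T
        return self.R @ x + self.T

    def inverse(self):
        # Eq. (3): P^-1(x) = R^-1 x - R^-1 T, with R^-1 = R^T for orthogonal R
        return Pose(self.R.T, -self.R.T @ self.T)

    def compose(self, other):
        # Eq. (4): (self ∘ other)(x) = self(other(x))
        return Pose(self.R @ other.R, self.R @ other.T + self.T)
```

With this sketch, the transitive estimate from A to C would be written P_BC.compose(P_AB), matching the composition order in the text.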
2.3. Graphs
Three types of undirected graphs are helpful in describing distributed smart camera calibration: the communication graph, the vision graph, and the calibration graph [4]. Graphs are described as connected if there exists a path connecting every pair of nodes, and complete if there exists an edge between each pair of nodes.

The communication graph describes the effective communication links between nodes in the network from the perspective of the layer presented to the application. A complete communication graph indicates that any node may communicate directly with any other node.

The vision graph describes which nodes share significant portions of their field of view. A pair of nodes have a connecting edge in this graph if the volume of space in the intersection of their fields of view is considered large enough that it might contain sufficient data for the operations required by the algorithm.

The calibration graph describes which nodes have a direct estimate of their pairwise pose. Obviously, it is desirable that this graph be connected, so that any two nodes X and Y may estimate their relative pose P_XY by composition of known pose estimates. Edges can only be established where there exist edges in the vision graph, and the most complete calibration graph possible is identical to the vision graph.
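The connectivity property required of the calibration graph can be checked with a standard breadth-first search; a brief sketch (the graph representation and function name are our own):

```python
from collections import deque

def is_connected(nodes, edges):
    """True if the undirected graph is connected, i.e. a path exists
    between every pair of nodes (the condition required of the
    calibration graph)."""
    if not nodes:
        return True
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        for neighbor in adjacency[queue.popleft()]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(nodes)
```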
3.1. Problem Statement

The overall objective is to spatially calibrate a series of homogeneous smart stereo camera nodes, with no a priori knowledge and using only the nodes' 3D visual data, in a distributed fashion. Assuming the visual data consists of a set of 3D points triangulated from stereo images of the environment, the problem may be reduced to geometrical terms:

Given a set of nodes N, each node X ∈ N having a point set S_X, estimate the pose P_XY for enough node pairs (X, Y) such that the calibration graph for N is connected.

The shared view assumption (3.2.2) and the repeatability criterion of interest point detection (4.2) imply a sufficient degree of overlap between a sufficient number of node pairs for this to be achievable.
3.2. Assumptions
3.2.1. Pre-Deployment Offline Access
It is assumed that, prior to deployment of the network, there is
a period during which each node may be accessed without restriction in a controlled environment, in order to perform cer-
tain essential modifications to software (such as assignment of
a unique identifier, network configuration, and intrinsic/stereo
calibration of the cameras).
3.2.2. Shared View
For full convergence, it is assumed that the vision graph is connected. This imposes a minimum qualitative constraint on node deployment: the shared field of view of the entire network must be continuous and have substantial internal pairwise overlap.
3.2.3. Fixed Nodes
It is assumed that each node is fixed in its location and orientation relative to all other nodes. It is also assumed that, once internally calibrated for stereo vision, no node changes the relative motion between its cameras or the internal parameters (e.g. focal length) of either of its cameras.
3.2.4. Static Scene
It is assumed that the contents of the scene are fully static for the purposes of acquiring calibration point sets. This is solely for simplicity, and could easily be relaxed by employing background estimation techniques or accurate temporal synchronization.
3.2.5. Abstract Network
It is assumed that the nodes are capable of autonomously forming an ad-hoc network of some kind, wherein each node
can be addressed by a unique identifier. From the algorithm’s
point of view, the network is assumed to be fully connected
[17], or in other words, the communication graph is assumed
to be complete. Additionally, it is assumed that arbitrary
amounts of data can be sent with assured delivery.
3.3. Problem Analysis
3.3.1. Two-Stage Registration
Bringing the point sets, and thereby the node coordinate systems, into alignment with one another can be accomplished by registration. Registration algorithms may be divided into two types: coarse registration, which can align points without an initial estimate but is generally not very accurate; and fine registration, which requires an initial estimate to align points but is very accurate [9].
For our purposes, no alignment estimate is initially available, yet high accuracy is desirable. The typical solution when presented with such a problem is a two-stage approach, using coarse registration to initialize fine registration. However, there is more to the problem in our case: it is not even known which point sets overlap or to what degree. We use a process of feature matching to determine how to proceed with registration between nodes.
3.3.2. Feature Matching
In order to find coarse pose estimates between nodes with no knowledge of their point set overlap in a distributed fashion, a pairwise feature matching process similar to that described in [3] can be employed.

The goal is to find pairwise matches between nodes' features, and then use those matches to calculate coarse relative pose estimates for the node pairs. Both are accomplished through the coarse registration algorithm; if the registration error falls below a certain threshold t_cr, the features are considered to match, and the registration result yields a coarse pose estimate between the source nodes.
Consider point sets from two nodes, S_A and S_B, from which, according to the coarse matching algorithm, each node randomly selects a feature of size f ≥ 3, resulting in F_A ⊆ S_A and F_B ⊆ S_B, where |F_A| = |F_B| = f ≤ |S_A ∩ S_B|. The performance of the matching scheme depends on the probability of a match between F_A and F_B, P(F_A ≈ F_B), which can be calculated as follows:

P(F_A ≈ F_B) = ( |S_A ∩ S_B|! f! (|S_A| - f)! (|S_B| - f)! ) / ( (|S_A ∩ S_B| - f)! |S_A|! |S_B|! )    (5)

It is therefore desirable to increase |S_A ∩ S_B| relative to |S_A| and |S_B| (i.e., increase the percent overlap), which translates into repeatability in interest point detection (4.2). There is a trade-off in the value of f between matching performance and false matches; generally, a low value such as f = 4 is used.
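The match probability reflects a simple counting argument: both nodes must select an f-subset of the overlap, and the subsets must cover the same physical points. Expressed with binomial coefficients (a hypothetical helper; this form is equivalent to the factorial form after expanding the binomials):

```python
from math import comb

def match_probability(n_a, n_b, n_overlap, f):
    """P(F_A ≈ F_B): both nodes independently pick an f-point feature
    uniformly at random, and the two features cover the same physical
    points.  n_a = |S_A|, n_b = |S_B|, n_overlap = |S_A ∩ S_B|."""
    if n_overlap < f:
        return 0.0
    # C(o, f) matching pairs out of C(n_a, f) * C(n_b, f) possible pairs.
    return comb(n_overlap, f) / (comb(n_a, f) * comb(n_b, f))
```

Raising the overlap (and hence the percent overlap of Eq. (1)) raises this probability sharply, which motivates the emphasis on repeatable interest point detection.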
3.3.3. Feature Categorization
No details have yet been given about how to bring features together for matching in a distributed fashion. The idea of feature categorization is borrowed from the data-centric storage literature, used with reference to distributed smart camera networks in [2] and more specifically to their calibration in [3]. The goal is to evenly distribute the processing and storage of the data in a distributed system based on some quantitative or qualitative metric of the data itself. For this, a smooth, deterministic geometric descriptor function, denoted g, is used.

The solution space of this descriptor is then divided as evenly as possible among the nodes in the network, with some overlap (see below), and features detected locally at each node are sent to the appropriate node for matching to other geometrically similar features.
Ideally, the difference between the descriptors of two features F_A and F_B describes the degree of difference d between those features:

d(F_A, F_B) = |g(F_A) - g(F_B)|    (6)

Based on the measurement accuracy of a node and the specific coarse registration algorithm used, there is a similarity threshold t_s, such that it is necessary to compare two features F_A and F_B if d(F_A, F_B) < t_s, and unnecessary otherwise; this will be termed the similarity condition. The desirable overlap for categorization, then, is t_s/2 in all directions.

Note that categorization, and thus the nodes where features are matched, has no relation to the nodes where those features originated. When a match is found, the result is returned to one of the two source nodes, based on some deterministic selection function such that for a given pair of nodes the same node is always selected.
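A minimal sketch of the categorization rule, assuming a scalar descriptor range divided evenly into node bins with t_s/2 overlap on each side (the function name and the uniform-binning policy are illustrative assumptions):

```python
def responsible_nodes(g_value, node_ids, g_min, g_max, t_s):
    """Return the node(s) whose descriptor bin covers g_value.
    Bins partition [g_min, g_max] evenly, widened by t_s/2 on each
    side, so any two features meeting the similarity condition
    d(F_A, F_B) < t_s share at least one responsible node."""
    width = (g_max - g_min) / len(node_ids)
    hits = []
    for i, node in enumerate(node_ids):
        low = g_min + i * width - t_s / 2.0
        high = g_min + (i + 1) * width + t_s / 2.0
        if low <= g_value < high:
            hits.append(node)
    return hits
```

A feature whose descriptor falls inside an overlap region is sent to both adjacent nodes, which is how the similarity condition is preserved across bin boundaries.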
3.3.4. Coarse Grouping
In order to guarantee that all nodes with edges on the vision graph attempt pairwise pose refinement without the need for exhaustive feature matching, we introduce a grouping scheme wherein nodes are merged into ever-larger groups within the same coordinate system, albeit with only coarse estimates. Through pose composition, any node in a group can determine its coarse pose with respect to any other node. This is conceptually similar in some ways to the GHT scheme proposed in [3].

A node always knows its current group leader and the set of nodes comprising its group. Within a group (2.1.1), each node has a coarse pose estimate relative to the group leader, called its group coarse pose estimate, and denoted C_A for a node A. Relative coarse pose estimates (e.g. C_AB for node A relative to node B) can be computed from these, either directly or through one or more compositions. Initially, each node begins in a singleton group, of which it is the leader, with its group coarse pose estimate initialized to P_I.

A merge is initiated when two nodes have detected a certain minimum number t_m of consistent matches with each other. Consistency is enforced via a threshold t_c limiting the Euclidean distance between the pose estimates' mappings of a given point (such as the centroid µ_A of the computing node's point set). Once a node has stored at least t_m matches with a particular other node, each time a new match is detected for that node, an average coarse pose estimate is computed for every combination M_i of matches containing the new match, and checked for consistency (where C_m is the estimate from match m and C_avg is the average over M_i):

||C_m(µ_A) - C_avg(µ_A)|| ≤ t_c, ∀ m ∈ M_i    (7)

If a consistent average is found, it is considered a reliable relative coarse pose estimate, and is forwarded to the source nodes' group leaders and composed as necessary to merge the nodes' respective groups.
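The consistency test of Eq. (7) can be sketched as follows (a simplified illustration: each estimate is an (R, T) pair, and the average pose's mapping is approximated by the mean of the mapped reference points, which is one of several reasonable choices):

```python
import numpy as np

def consistent(estimates, mu, t_c):
    """Check that every coarse pose estimate in a combination maps the
    reference point mu (e.g. the point set centroid) to within t_c of
    the average mapping, in the spirit of Eq. (7)."""
    mapped = np.array([R @ mu + T for R, T in estimates])
    mean = mapped.mean(axis=0)
    return bool(all(np.linalg.norm(p - mean) <= t_c for p in mapped))
```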
Fig. 1. Group Merging
Figure 1 illustrates a typical group merge. Node D, of group G_A, and node G, of group G_E, find a relative coarse pose estimate through feature matching, and initiate a merge. The nodes in group G_A do not modify their group coarse pose information. Node G's new group coarse pose estimate (C_G) is the composition of its estimated pose relative to node D with node D's group coarse pose estimate:

C_G = C_D ∘ P_GD    (8)
The new group coarse pose estimates for the merging group's leader (C_E) and any other nodes in the merging group (in this case, C_F) can similarly be calculated as compositions of known pose estimates, where a prime denotes a pre-merge estimate:

C_E = C_G ∘ (C'_G)^-1    (9)

C_F = C_G ∘ ((C'_G)^-1 ∘ C'_F)    (10)
Since merging consists of composition operations, it is a transitive operation which can occur based on matches (and the resultant relative coarse pose estimates) between any pair of nodes in different groups. Figure 1 illustrates this by showing the actual history of merges leading to the groups as arrows between the node pairs; in reality, of course, every node in the group has a direct pose estimate to the leader (group coarse pose estimate).
3.3.5. Pairwise Pose Refinement
Once a given pair of nodes belong to a group via the feature matching process, those nodes can use their coarse relative pose estimate as a starting point for pose refinement. This is achieved by applying a fine registration algorithm to a large number of points initialized into coarse alignment.
Fig. 2. Field of View Cone Approximation
As shown in Figure 2, the actual point sets used for fine registration are selected, at each node, as those falling within the intersection of the two nodes' fields of view, as approximated by a cone of a certain angle and length extending along the positive z-axis of each node's coordinate system (via the coarse pose estimate). If there are fewer than a specified minimum number of points, which includes the case where there is no intersection at all, the nodes do not attempt pose refinement.
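The field-of-view cone test of Figure 2 reduces to a simple geometric predicate; a sketch (the function name and half-angle parameterization are assumptions):

```python
import numpy as np

def points_in_cone(points, half_angle, length):
    """Select the points inside a cone of the given half-angle and
    length extending along the positive z-axis of the camera's
    coordinate system (the approximated field of view).
    `points` is an (N, 3) array in that coordinate system."""
    pts = np.asarray(points, dtype=float)
    z = pts[:, 2]
    radial = np.linalg.norm(pts[:, :2], axis=1)
    inside = (z > 0) & (z <= length) & (radial <= z * np.tan(half_angle))
    return pts[inside]
```

Each node would apply this twice: once to its own points, and once to points mapped through the coarse pose estimate into the other node's coordinate system.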
3.3.6. Indirect Pose Estimation
A pair of nodes attempting to determine their relative pose can now communicate directly to find the shortest path along the existing pairwise fine pose estimates (calibration graph) and thus obtain a composition with a minimum of error. A node A may find such an estimate P_AB relative to a node B according to the following algorithm (suppose FP_A represents the set of fine pose estimates at node A):

1. If P_AB ∈ FP_A, select P_AB and end.

2. For each P_AX ∈ FP_A, request FP_X from node X. If P_XB ∈ FP_X, select P_AB = P_XB ∘ P_AX and end.

3. For each P_XY ∈ FP_X, request FP_Y from node Y. If P_YB ∈ FP_Y, select P_AB = P_YB ∘ P_XY ∘ P_AX and end.

4. Continue until P_AB has been found.

As indirect fine pose estimates are found (even intermediate ones that were not requested), they should be added to FP_A to avoid unnecessary repetition of network requests and computation.
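The search above can be sketched as a breadth-first traversal over the pairwise estimate tables (an idealized illustration: the pose tables of other nodes are read from a local dict rather than fetched over the network, and poses are abstracted behind a compose function):

```python
from collections import deque

def indirect_pose(a, b, fp, compose, identity):
    """Find P_ab by breadth-first search over fine pose estimates.
    fp[x] maps a neighbor y to P_xy (x's coordinates into y's); the
    shallowest path found yields a composition with few hops."""
    seen = {a}
    queue = deque([(a, identity)])  # entries: (node x, accumulated P_ax)
    while queue:
        x, p_ax = queue.popleft()
        if b in fp.get(x, {}):
            return compose(fp[x][b], p_ax)  # P_ab = P_xb ∘ P_ax
        for y, p_xy in fp.get(x, {}).items():
            if y not in seen:
                seen.add(y)
                queue.append((y, compose(p_xy, p_ax)))
    return None  # no path exists in the calibration graph
```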
4.1. Distributed Calibration Algorithm
The algorithm is split into ten distinct processes at each node;
six for coarse grouping, and four for pairwise pose refine-
ment. Each process acts upon receipt of a message, with the
exception of the feature selection process, which executes pe-
riodically, and the pose refinement initiator process, which
executes whenever the group composition is updated.
There are four parameters intrinsic to the algorithm itself, following from Section 3.3: the feature size f, the similarity threshold t_s, the match threshold t_m, and the consistency threshold t_c. Certain other implementation-specific parameters are also required, notably those for the coarse and fine registration algorithms; in particular, t_cr and t_fr are referenced here as generic error thresholds for the coarse and fine registration algorithms, respectively.
Fig. 3. Feature Selection Process
Fig. 4. Feature Matching Process
Fig. 5. Match Processing Process
Fig. 6. Group Merge Initiator Process
Fig. 7. Group Merge Responder Process
Fig. 8. Group Update Process
Fig. 9. Pose Refinement Initiator Process
4.2. Interest Point Detection
Three distinct parts of calibration are impacted directly by the interest point detection algorithm used: the correspondence algorithm, the coarse matching scheme, and the fine registration algorithm. In all three cases, it is the repeatability performance metric which is of interest; higher repeatability yields higher overlap in the point sets.
For practical purposes (including the availability of source
Fig. 10. Pose Refinement Responder Process
Fig. 11. Fine Registration Process
Fig. 12. Pose Update Process
code from the authors), the FAST interest point detector [14, 15] is selected for this implementation. Convergence can be encouraged by constraining nodes to share large portions of their fields of view or by calibrating on a scene with strong interest points.
4.3. Registration
Since matching features overlap fully, an excellent solution to
the coarse registration problem is the fully-contained version
of the DARCES algorithm [10], using three control points.
DARCES without the RANSAC component is a relatively
simple algorithm, allowing it to perform rapid matching on
a large number of features.
The concept of the Iterative Closest Point (ICP) algorithm [11] lends itself well to the fine registration problem encoun-
tered in pairwise pose refinement. However, the difficulty
of stable interest point detection, occlusion effects, and un-
certainty about the overlap in field of view all contribute to
poor overlap in the point sets used for pose refinement. The
Trimmed Iterative Closest Point (TrICP) algorithm [12], used
in this implementation, can be automatically tuned to any de-
gree of overlap, and is applicable to overlaps under 50%.
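The inner alignment step shared by ICP and TrICP is the closed-form least-squares rigid transform between matched point pairs; a sketch of that step via the SVD (Kabsch) solution, shown for already-matched pairs (TrICP additionally re-matches closest points and trims the worst pairs on each iteration, which is omitted here):

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares (R, T) such that dst[i] ≈ R @ src[i] + T for
    matched rows of src and dst, computed in closed form via the SVD."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    T = c_dst - R @ c_src
    return R, T
```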
5.1. Performance Metrics
5.1.1. Convergence
Convergence is the measure of the algorithm’s ability to bring
nodes into a common reference frame and its time perfor-
mance in doing so. For our purposes, there are actually two
distinct considerations:
1. The ability of coarse grouping to merge into a minimum
number of groups.
2. The ability of pairwise pose refinement to establish a
maximum number of pairwise estimates.
Calibration is considered successful in terms of convergence
when coarse grouping merges the entire network into a single
group and pairwise fine pose estimates are established such
that the calibration graph is connected.
5.1.2. Accuracy
Accuracy is the measure of the error in the algorithm's resulting pose estimates. The mean error in a pose estimate can be determined by averaging the Euclidean distance between a number of points with ground-truth correspondence, detected and triangulated at the nodes separately from those used for calibration. Although error accumulates with the path length (number of pose compositions) in the calibration graph, it is more relevant to consider the path length in the vision graph, since the 3D reconstruction consistency among nodes observing the same part of the scene is the likely criterion.
5.1.3. Scalability
Scalability is the measure of the effect on the algorithm's performance of the number of nodes in the network. The three
primary resources to consider are node-local computing re-
sources (i.e. CPU and memory), node-local data storage, and
network bandwidth.
In order to properly evaluate scalability, it is necessary to examine individual factors arising from the algorithm itself.
The most significant of these can be summarized in terms of
the number of nodes in the network |N| as follows:
- Feature dissemination requires bandwidth resources in |N| per node.

- Feature matching requires computing and storage resources in |N|.
This assumes that each node maintains a more or less constant
number of pairwise edges in the vision graph regardless of
|N|, as would be the case with most applications. In cases
where this assumption does not hold, it is necessary to add a
third factor:
- Pairwise pose refinement computation requires computing resources in |N|.
Scalability in all three resources can be quantified experimentally in terms of the above factors.
5.2. Manual Point Set
In order to test the capabilities of the calibration algorithm and tune its parameters under controlled conditions, the first experiment series is designed to operate on manually selected points with full correspondences across all four nodes. The primary purpose of this experiment type, once suitable parameters are found, is to test the effects of different point set sizes and overlap characteristics on convergence and accuracy.
5.2.1. Procedure
A total of 22 point subsets are extracted from the data, and each is tested using the distributed calibration software, with all four nodes running locally on the same workstation. This
procedure is repeated twice for each subset, and the average
results for convergence time and mean error are calculated
and recorded.
5.2.2. Results
Fig. 13. Manual Experiment Results: (a) Convergence Time Trends in n and p; (b) Accuracy Trends in n and p
5.3. Automatic Point Set
Having established some criteria for reasonably timely convergence in the manual point set experiments, the next step is to test real automatic calibration of the network. The purpose
of these experiments is to test the convergence and accuracy
performance of the algorithm in real conditions.
5.3.1. Procedure
Four instances of the local point detection software, config-
ured to execute the distributed calibration software on com-
pletion, are run in automatic mode on the vision platform
workstation. Convergence time and the final calibration graph
are recorded. A ground truth point set is manually selected
for each camera rig, and the mean error is calculated and
5.3.2. Results
The mean error and convergence time of a typical experiment from this series is shown below. Figure 14 shows the final
calibration graph, Figure 15 shows the physical deployment
of the nodes for the experiment, and Figure 16 shows the vi-
sualization of the resultant pose estimates (which can be seen
to match the configuration in Figure 15).
Mean Error: 2.7666 mm
Convergence Time: 159 s
Fig. 14. Calibration Graph for Automatic Experiment
5.4. Virtual Point Set
Since only four physical camera rigs are available, testing
scalability to larger networks is impossible in an automatic
experiment and difficult to control using the manual meth-
ods. Instead, controlled virtual point sets are supplied to the same calibration algorithm implementation to test the scalability metric.
5.4.1. Procedure
Point sets are generated for 5, 10, 15, 20, and 25 nodes. The
total outgoing bandwidth in kilobytes, final size of the matching database in features, and total number of coarse and fine registration executions are recorded.
Fig. 15. Camera Deployment for Automatic Experiment
Fig. 16. Pose Visualization for Automatic Experiment
5.4.2. Results
As expected, total bandwidth usage per node increases ap-
proximately linearly in relation to the number of nodes in the
network (Figure 17). This affects different networks in differ-
ent ways. In a network where the physical medium is shared by all nodes (the worst-case scenario), the total network bandwidth usage is the relevant factor. In that case, the bandwidth usage increases non-linearly, potentially at up to |N|^2. However, many routing methods used in sensor networks are much more efficient and therefore mitigate this effect.
The number of features stored at each node increases ap-
proximately linearly in relation to the number of nodes (Fig-
ure 18). Features are very small data (a series of f 3-tuples, an
identifier, and a geometric descriptor value), but when scaling
to extremely large networks it must be ensured that adequate
storage is provided at each node for these features.
The number of coarse registration operations performed at each node increases approximately linearly in relation to the number of nodes (Figure 19); as expected, this is proportional
Fig. 17. Bandwidth Usage in |N| (Average and Maximum)
Fig. 18. Node-Local Storage in |N| (Average and Maximum)
to the number of features stored. If processing throughput is
the limiting factor, this increase will cause the convergence
time to increase linearly with the number of nodes.
Fig. 19. Coarse Registration Processing in |N| (Average and Maximum)
Since, in this network, the number of vision graph edges
per node does not generally increase as the total number of
nodes increases, the number of fine registrations per node is
approximately constant.
A calibration method for distributed smart stereo camera net-
works has been developed which converges well, provides ac-
curate pairwise orientation, and scales well to large networks.
This provides a bas e upon which to build a full 3D visual
sensor network providing primitive data-centric queries, upon
which in turn a variety of high-level applications can be de-
Currently, the algorithm makes it possible for smart stereo camera devices to self-localize and self-orient relative to one another in a distributed fashion, allowing for various subsequent stages of realization for a variety of applications. The immediate opportunity is to provide a generalized framework for building these solutions, which would rest on the underlying assumption that the network is accurately calibrated and can perform 3D reconstruction across multiple views.
The major implementation drawback is the instability of interest point detection in the general case; at present, it is necessary to control the scene somewhat by adding one or more calibration targets for convergence to occur reliably. Improving this situation is an important avenue for future work.