FEATURE-BASED CALIBRATION OF DISTRIBUTED
SMART STEREO CAMERA NETWORKS
Aaron Mavrinac, Xiang Chen, and Kemal Tepe
University of Windsor
Department of Electrical and Computer Engineering
401 Sunset Ave., Windsor, Ontario, Canada N9B 3P4
ABSTRACT
A distributed smart camera network is a collective of vision-
capable devices with enough processing power to execute al-
gorithms for collaborative vision tasks. A true 3D sensing
network applies to a broad range of applications, and local
stereo vision capabilities at each node offer the potential for
a particularly robust implementation. A novel spatial cali-
bration method for such a network is presented, which ob-
tains pose estimates suitable for collaborative 3D vision in a
distributed fashion using two stages of registration on robust
3D features. The method is initially described in a geometrical sense, then presented in a practical implementation using existing vision and registration algorithms. The method is designed independently of networking details, making only a few basic assumptions about the underlying network’s ca-
pabilities. Experiments using both software simulations and
physical devices are designed and executed to demonstrate
performance.
Index Terms— camera network, calibration, collabora-
tive, distributed, registration, 3D vision
1. INTRODUCTION
The relatively new concept of 3D visual sensor networks [2] is emerging within the area of distributed smart cameras. By
collecting and processing true 3D information, such networks
offer improvements in existing applications and promise en-
tirely new possibilities.
The 3D sensing paradigm includes the use of passive 3D
(stereo) vision, fusion of information from multiple views,
and distributed collaborative processing. Many of the ba-
sic computer vision operations, including shape recognition,
object tracking, motion analysis, and scene reconstruction,
have been improved through one or two of these properties;
we contend that all three in unison yield yet greater bene-
fits. Thus, our work applies to distributed smart stereo camera networks, wherein each node consists of a device capable of passive 3D vision and the distributed algorithms operate primarily or exclusively on 3D data.
(Funding for this research was provided in part by the Natural Sciences and Engineering Research Council of Canada.)
In order to perform any useful collaborative processing
of this information, there must exist some way to bring data
from multiple nodes into a common reference frame. This
is achieved through calibration, which includes spatial local-
ization and orientation as well as temporal synchronization.
While some distributed time synchronization methods from
the sensor network literature are applicable, existing localiza-
tion methods are insufficient.
This paper presents a novel scalable spatial calibration
method for distributed smart stereo camera networks. It is
designed independently of node architecture and network details, and makes few assumptions about node deployment or scene contents. The problem is reduced to a geometrical form in Section 3, from which the implementation in Section 4 follows. A more thorough treatment can be found in [1].
The majority of research in distributed smart cameras to date has focused on monocular vision at each node. A num-
ber of methods for distributed self-calibration have been pro-
posed for this paradigm, and though the vision components
are not readily applicable to 3D sensing nodes, the general
localization and distribution concepts developed apply to any
vision-based system.
From the perspective of traditional sensor networks, the primary challenges are the directionality of vision sensors, the higher degree of accuracy required by vision applications, and the large volume of raw sensor data. Conversely, from the perspective of traditional computer vision, the challenge is in the scalable distribution of processing among nodes and the related limitations of network bandwidth.
While traditional sensor network methods generally em-
ploy omnidirectional sensors and thus require only localiza-
tion, vision-based networks also require orientation. To apply
similar methods to directional vision sensors, the concept of
the vision graph is introduced in [4], where an edge on the
graph represents shared field of view rather than a communi-
cation link.
Functional calibration methods are presented for monoc-
ular distributed smart cameras in [4, 5]. These are based on
wide-baseline stereo methods, which are generally not robust
due to the matching problem [13], and require unwieldy ini-
tialization schemes or dictate deployment constraints. Some
methods, such as [6], use motion of objects in the scene to
calibrate, but these still suffer from the matching problem
to a degree and require certain kinds of scene. Potentially
more robust methods are presented in [7, 8]; however, these
require the use of markers or beacons placed in the environment, which is infeasible in many cases and may constrain
deployment or extension to dynamic calibration.
With the true 3D sensing network paradigm introduced in
[2], advocating distributed smart stereo cameras, a calibration
method called Lighthouse is presented in [3] which uses 3D
features and geographic hash tables (GHTs) to localize and
orient nodes. Our method employs the same basic concept,
but is more complete and addresses some impracticalities in
the former.
2. PRELIMINARIES
2.1. Definitions
2.1.1. Nodes and Groups
A node is the abstract or physical smart stereo camera device
itself; nodes shall be denoted by sequential capital letters (A, B, and so forth). The set of all nodes in the network shall be denoted N (where |N| represents the total number of nodes). A group is a set of nodes which agree on a single leader node; a group led by node A shall be denoted G_A (where |G_A| represents the number of nodes in the group). Every group is a subset of the full set of nodes (G_A ⊆ N), and every node is a member of exactly one group (so, if G_A and G_B are two separate groups, |G_A ∩ G_B| = 0).
2.1.2. Point Sets and Features
A point set is the full set of interest points detected locally at a node; the point set of node A shall be denoted S_A. The overlap between point sets S_A and S_B refers to the size of the intersection of the two sets, |S_A ∩ S_B|, said intersection occurring where a point in S_A corresponds to the same physical point as a point in S_B. The percent overlap is defined as follows:

%O(S_A, S_B) = (|S_A ∩ S_B| / max(|S_A|, |S_B|)) × 100%   (1)
A feature is any subset of the point set of a certain size (determined by a parameter of the algorithm); when discussing a single arbitrary feature from node A, it shall be denoted F_A, where F_A ⊆ S_A. Two features F_A and F_B, from nodes A and B respectively, are considered to match (denoted F_A ≈ F_B) if each point in F_A corresponds to the same physical point as a point in F_B. In the context of the algorithm, it is impossible to ascertain this correspondence, so the term match implies rather a presumed match based on a criterion of geometrical similarity.
2.2. Pose
Pose is a concept used here to describe the relative motion
between two nodes in a distributed smart camera network,
which is the basis of calibration. Each node is considered
to have its own local coordinate system. The relative pose of
node A with respect to node B is denoted P_AB, and is the rigid transformation in 3D Euclidean space from the coordinate system of A to that of B.
The transformation P_AB : R³ → R³ consists of a rotation matrix R_AB (a 3 × 3 real orthogonal matrix) and a 3-element translation vector T_AB. P_AB maps a point x ∈ R³ as follows:

P_AB(x) = R_AB x + T_AB   (2)
The identity pose is denoted P_I, and consists of the identity matrix R_I and the zero vector T_I.
The inverse of pose P_AB, denoted P_AB⁻¹, reverses the pose transformation (so that P_AB⁻¹ = P_BA). It can be determined as follows:

P_AB⁻¹(x) = R_AB⁻¹ x − R_AB⁻¹ T_AB   (3)
A succession of pose transformations P_BC(P_AB(x)) can be composed into a single pose, denoted (P_AB ◦ P_BC)(x), as follows:

(P_AB ◦ P_BC)(x) = R_BC R_AB x + (R_BC T_AB + T_BC)   (4)

This transformation maps from the coordinate system of A to that of B, then from that of B to that of C; therefore, the transformation from A to C can be computed via composition as P_AC = P_AB ◦ P_BC. This operation is transitive, so one node’s pose relative to another can be computed indirectly over an arbitrary number of intermediate poses if they exist.
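To make the pose algebra concrete, the following is a minimal Python/NumPy sketch (illustrative only; the class and method names are ours, not part of the original implementation) encoding the mapping (2), inverse (3), and composition (4):

```python
import numpy as np

class Pose:
    """Rigid transformation P(x) = R x + T in 3D Euclidean space."""

    def __init__(self, R=None, T=None):
        self.R = np.eye(3) if R is None else np.asarray(R, dtype=float)
        self.T = np.zeros(3) if T is None else np.asarray(T, dtype=float)

    def map(self, x):
        """Map a point x, per equation (2)."""
        return self.R @ np.asarray(x, dtype=float) + self.T

    def inverse(self):
        """Reversed transformation, per equation (3); for a rotation
        matrix, R^-1 = R^T."""
        return Pose(self.R.T, -self.R.T @ self.T)

    def then(self, Q):
        """Composition (self ◦ Q), per equation (4): apply self, then Q."""
        return Pose(Q.R @ self.R, Q.R @ self.T + Q.T)

# Pose() is the identity pose P_I; P_AC = P_AB.then(P_BC) maps
# coordinates of A to those of C.
```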
2.3. Graphs
Three types of undirected graphs are helpful in describing dis-
tributed smart camera calibration: the communication graph,
the vision graph, and the calibration graph [4]. Graphs are described as connected if there exists a path connecting every
pair of nodes, and complete if there exists an edge between
each pair of nodes.
The communication graph describes the effective commu-
nication links between nodes in the network from the perspec-
tive of the layer presented to the application. A complete
communication graph indicates that any node may commu-
nicate directly with any other node.
The vision graph describes which nodes share significant
portions of their field of view. A pair of nodes have a connecting edge in this graph if the volume of space in the intersection of their fields of view is considered large enough that
it might contain sufficient data for the operations required by
the algorithm.
The calibration graph describes which nodes have a di-
rect estimate of their pairwise pose. Obviously, it is desirable that this graph be connected, so that any two nodes X and Y may estimate their relative pose P_XY by composition of known pose estimates. Edges can only be established where there exist edges in the vision graph, and the most complete calibration graph possible is identical to the vision graph.
3. MAIN PROBLEM
3.1. Problem Statement
The overall objective is to spatially calibrate a series of homogeneous smart stereo camera nodes, with no a priori knowledge and using only the nodes’ 3D visual data, in a distributed fashion. Assuming the visual data consists of a set of 3D points triangulated from stereo images of the environment, the problem may be reduced to geometrical terms:

Given a set of nodes N, each node X ∈ N having a point set S_X, estimate the pose P_XY for enough node pairs (X, Y) such that the calibration graph for N is connected.

The shared view assumption (3.2.2) and the repeatability criterion of interest point detection (4.2) imply a sufficient degree of overlap between a sufficient number of node pairs for convergence.
3.2. Assumptions
3.2.1. Pre-Deployment Offline Access
It is assumed that, prior to deployment of the network, there is
a period during which each node may be accessed without re-
striction in a controlled environment, in order to perform cer-
tain essential modifications to software (such as assignment of
a unique identifier, network configuration, and intrinsic/stereo
calibration of the cameras).
3.2.2. Shared View
For full convergence, it is assumed that the vision graph is
connected. This imposes a minimum qualitative constraint on
node deployment that the shared field of view of the entire
network be continuous and have substantial internal pairwise
overlap.
3.2.3. Fixed Nodes
It is assumed that each node is fixed in its location and ori-
entation relative to all other nodes. It is also assumed that,
once internally calibrated for stereo vision, no node changes
the relative motion between its cameras or the internal param-
eters (e.g. focal length) of either of its cameras.
3.2.4. Static Scene
It is assumed that the contents of the scene are fully static for the purposes of acquiring calibration point sets. This is
solely for simplicity, and could easily be relaxed by employ-
ing background estimation techniques or accurate temporal
synchronization.
3.2.5. Abstract Network
It is assumed that the nodes are capable of autonomously forming an ad-hoc network of some kind, wherein each node
can be addressed by a unique identifier. From the algorithm’s
point of view, the network is assumed to be fully connected
[17], or in other words, the communication graph is assumed
to be complete. Additionally, it is assumed that arbitrary
amounts of data can be sent with assured delivery.
3.3. Problem Analysis
3.3.1. Two-Stage Registration
Bringing the point sets, and thereby the node coordinate systems, into alignment with one another can be accomplished by registration. Registration algorithms may be divided into two types: coarse registration algorithms, which can align points without an initial estimate but are generally not very accurate; and fine registration algorithms, which require an initial estimate to align points but are very accurate [9].
For our purposes, no alignment estimate is initially available, yet high accuracy is desirable. The typical solution when presented with such a problem is a two-stage approach, using coarse registration to initialize fine registration. However, there is more to the problem in our case: it is not even known which point sets overlap or to what degree. We use a process of feature matching to determine how to proceed with registration between nodes.
3.3.2. Feature Matching
In order to find coarse pose estimates between nodes with no knowledge of their point set overlap in a distributed fashion, a pairwise feature matching process similar to that described in [3] can be employed.
The goal is to find pairwise matches between nodes’ features, and then use those matches to calculate coarse relative pose estimates for the node pairs. Both are accomplished through the coarse registration algorithm; if the registration error falls below a certain threshold t_ec, the features are considered to match, and the registration result yields a coarse pose estimate between the source nodes.
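In the implementation, this test is performed by the fully-contained DARCES algorithm (Section 4.3), which also resolves the point correspondences. Purely as an illustration, assuming the correspondence between the two features’ points is already known (which DARCES does not assume), the register-and-threshold test might be sketched with a least-squares (SVD/Kabsch) alignment; all names here are ours:

```python
import numpy as np

def align_rigid(F_a, F_b):
    """Least-squares rigid alignment of two ordered point sets
    (Kabsch/SVD): returns (R, T) minimizing sum ||R a_i + T - b_i||^2."""
    A, B = np.asarray(F_a, float), np.asarray(F_b, float)
    ca, cb = A.mean(axis=0), B.mean(axis=0)          # centroids
    H = (A - ca).T @ (B - cb)                        # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                               # proper rotation
    return R, cb - R @ ca

def features_match(F_a, F_b, t_ec):
    """Presume a match if the post-alignment RMS error is below t_ec."""
    R, T = align_rigid(F_a, F_b)
    resid = np.asarray(F_b, float) - (np.asarray(F_a, float) @ R.T + T)
    rms = np.sqrt((resid ** 2).sum(axis=1).mean())
    return rms < t_ec, (R, T)
```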
Consider point sets from two nodes, S_A and S_B, from which, according to the coarse matching algorithm, each node randomly selects a feature of size f ≥ 3, resulting in F_A ⊆ S_A and F_B ⊆ S_B where |F_A| = |F_B| = f ≤ |S_A ∩ S_B|. The performance of the matching scheme depends on the probability of a match between F_A and F_B, P(F_A ≈ F_B), which can be calculated as follows:

P(F_A ≈ F_B) = (|S_A ∩ S_B|! · f! · (|S_A| − f)! · (|S_B| − f)!) / (|S_A|! · |S_B|! · (|S_A ∩ S_B| − f)!)   (5)
It is therefore desirable to increase |S_A ∩ S_B| relative to |S_A| and |S_B| (i.e., increase the percent overlap), which translates into repeatability in interest point detection (4.2). There is a trade-off in the value of f between matching performance and false matches; generally, a low value such as f = 4 is adequate.
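For illustration, equation (5) can be evaluated in its equivalent binomial form (a short sketch under our own naming; both nodes must independently draw the same f-subset of the shared points):

```python
from math import comb

def p_feature_match(n_a, n_b, n_ab, f):
    """Evaluate equation (5): the probability that two nodes, drawing
    random size-f features from point sets of sizes n_a and n_b with
    n_ab points in common, both draw the same f shared points."""
    if f > n_ab:
        return 0.0
    # Binomial form equivalent to the factorial form of equation (5).
    return comb(n_ab, f) / (comb(n_a, f) * comb(n_b, f))

# e.g. p_feature_match(50, 50, 40, 4) ≈ 1.7e-6; raising the percent
# overlap raises this probability sharply, hence the emphasis on
# repeatable interest point detection.
```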
3.3.3. Feature Categorization
No details have yet been given about how to bring features together for matching in a distributed fashion. The idea of feature categorization is borrowed from the data-centric storage literature, used with reference to distributed smart camera networks in [2] and more specifically to their calibration in [3]. The goal is to evenly distribute the processing and storage of the data in a distributed system based on some quantitative or qualitative metric of the data itself. For this, a smooth, deterministic geometric descriptor function, denoted g, is used.
The solution space of this descriptor is then divided as evenly as possible among the nodes in the network, with some overlap (see below), and features detected locally at each node are sent to the appropriate node for matching to other geometrically similar features.
Ideally, the difference between the descriptors of two features F_A and F_B describes the degree of difference d between those features:

d(F_A, F_B) = |g(F_A) − g(F_B)|   (6)

Based on the measurement accuracy of a node and the specific coarse registration algorithm used, there is a similarity threshold t_d, such that it is necessary to compare two features F_A and F_B if d(F_A, F_B) < t_d, and unnecessary otherwise; this will be termed the similarity condition. The desirable overlap for categorization, then, is t_d/2 in all directions.
Note that categorization, and thus the nodes where features are matched, has no relation to the nodes where those features originated. When a match is found, the result is returned to one of the two source nodes, based on some deterministic selection function such that for a given pair of nodes the same node is always selected.
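A minimal sketch of this categorization, assuming a scalar descriptor with a known range split evenly among node identifiers (the names and the even-split policy are illustrative, not prescribed by the paper):

```python
def responsible_nodes(g_value, g_min, g_max, node_ids, t_d):
    """Map a feature's descriptor value to the node(s) responsible for
    matching it.  The range [g_min, g_max] is divided evenly among the
    nodes and each bin is widened by t_d/2 on both sides, so that two
    features satisfying the similarity condition d(F_A, F_B) < t_d
    reach at least one common node (assuming bins at least t_d wide)."""
    width = (g_max - g_min) / len(node_ids)
    return [n for i, n in enumerate(node_ids)
            if g_min + i * width - t_d / 2 <= g_value
            < g_min + (i + 1) * width + t_d / 2]
```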
3.3.4. Coarse Grouping
In order to guarantee that all nodes with edges on the vision
graph attempt pairwise pose refinement without the need for
exhaustive feature matching, we introduce a grouping scheme
wherein nodes are merged into ever-larger groups within the
same coordinate system, albeit with only coarse estimates.
Through pose composition, any node in a group can deter-
mine its coarse pose with respect to any other node. This is conceptually similar in some ways to the GHT scheme proposed in [3].
A node always knows its current group leader and the set
of nodes comprising its group. Within a group (2.1.1), each
node has a coarse pose estimate relative to the group leader,
called its group coarse pose estimate, and denoted C_A for a node A. Relative coarse pose estimates (e.g. C_AB for node A relative to node B) can be computed from these, either directly or through one or more compositions. Initially, each node begins in a singleton group, of which it is the leader, with its group coarse pose estimate initialized to P_I.
A merge is initiated when two nodes have detected a certain minimum number t_m of consistent matches with each other. Consistency is enforced via a threshold t_c specifying the maximum Euclidean distance between the pose estimates’ mappings of a given point (such as the centroid µ_S of the computing node’s point set). Once a node has stored at least t_m matches with a particular other node, each time a new match is detected for that node, an average coarse pose estimate is computed for every combination M_i of matches containing the new match, and checked for consistency:

||C_m(µ_S) − C_avg(µ_S)|| ≤ t_c,  ∀m ∈ M_i   (7)

If a consistent average is found, it is considered a reliable relative coarse pose estimate, and is forwarded to the source nodes’ group leaders and composed as necessary to merge the nodes’ respective groups.
Fig. 1. Group Merging
Figure 1 illustrates a typical group merge. Node D, of group G_A, and node G, of group G_F, find a relative coarse pose estimate through feature matching, and initiate a merge. The nodes in group G_A do not modify their group coarse pose information. Node G’s new group coarse pose estimate (C′_G) is the composition of its estimated pose relative to node D with node D’s group coarse pose estimate:

C′_G = C_GD ◦ C_D   (8)
The new group coarse pose estimates for the merging group’s leader (C′_F) and any other nodes in the merging group (in this case, C′_H) can similarly be calculated as compositions of known pose estimates:

C′_F = C_G⁻¹ ◦ (C_GD ◦ C_D)   (9)

C′_H = C_H ◦ (C_G⁻¹ ◦ (C_GD ◦ C_D))   (10)
Since merging consists of composition operations, it is a transitive operation which can occur based on matches (and the resultant relative coarse pose estimates) between any pair of nodes in different groups. Figure 1 illustrates this by showing the actual history of merges leading to the groups as arrows between the node pairs; in reality, of course, every node in the group has a direct pose estimate to the leader (its group coarse pose estimate).
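Reusing the Pose sketch from Section 2.2, the merge updates of equations (8)-(10) might be written as follows (a sketch with illustrative names; merging_group maps each node of G_F, including G itself under key g_id, to its group coarse pose relative to leader F):

```python
def merge_groups(C_GD, C_D, merging_group, g_id):
    """Update the merging group's coarse poses so that they are
    expressed relative to the other group's leader A, per eqs. (8)-(10)."""
    C_G_new = C_GD.then(C_D)                               # equation (8)
    C_F_new = merging_group[g_id].inverse().then(C_G_new)  # equation (9)
    # Equation (10), generalized to every node X in G_F: compose X's old
    # group coarse pose with the old leader's new pose.  For X = G this
    # reproduces C_G_new; for X = F (old pose P_I) it yields C_F_new.
    return {x: C_X.then(C_F_new) for x, C_X in merging_group.items()}
```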
3.3.5. Pairwise Pose Refinement
Once a given pair of nodes belong to the same group via the feature matching process, those nodes can use their coarse relative pose estimate as a starting point for pose refinement. This is achieved by applying a fine registration algorithm to a large number of points initialized into coarse alignment.
Fig. 2. Field of View Cone Approximation
As shown in Figure 2, the actual point sets used for fine registration are selected, at each node, as those falling within the intersection of the two nodes’ fields of view, as approximated by a cone of a certain angle and length extending along the positive z-axis of each node’s coordinate system (via the coarse pose estimate). If there are fewer than a specified minimum number of points, which includes the case where there is no intersection at all, the nodes do not attempt pose refinement.
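The selection in Figure 2 might be sketched as follows (our own naming; half_angle and length parameterize the cone approximation, and pose_to_other is the coarse relative pose estimate from the Pose sketch in Section 2.2):

```python
import numpy as np

def points_in_view_cone(points, pose_to_other, half_angle, length):
    """Keep local points falling inside the other node's approximated
    field-of-view cone: map them into the other node's frame via the
    coarse pose estimate, then test angle from its +z axis and range."""
    local = np.asarray(points, dtype=float)
    mapped = local @ pose_to_other.R.T + pose_to_other.T
    r = np.linalg.norm(mapped, axis=1)
    cos_a = np.divide(mapped[:, 2], r, out=np.zeros_like(r), where=r > 0)
    keep = (cos_a >= np.cos(half_angle)) & (r > 0) & (r <= length)
    return local[keep]
```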
3.3.6. Indirect Pose Estimation
A pair of nodes attempting to determine their relative pose can
now communicate directly to find the shortest path along the
existing pairwise fine pose estimates (calibration graph) and
thus obtain a composition with a minimum of error. A node A
may find such an estimate P_AB relative to a node B according to the following algorithm (suppose FP_A represents the set of fine pose estimates at node A):

1. If P_AB ∈ FP_A, select P_AB and end.

2. For each P_AX ∈ FP_A, request FP_X from node X. If P_XB ∈ FP_X, select P_AB = P_AX ◦ P_XB and end.

3. For each P_XY ∈ FP_X, request FP_Y from node Y. If P_YB ∈ FP_Y, select P_AB = P_AX ◦ P_XY ◦ P_YB and end.

4. Continue until P_AB has been found.

As indirect fine pose estimates are found (even intermediate ones that were not requested), they should be added to FP to avoid unnecessary repetition of network requests and computations.
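The enumeration above is a breadth-first search over the calibration graph. Run locally, with a lookup table standing in for the per-node network requests, it might look like the following sketch (reusing the Pose class from Section 2.2; names are ours):

```python
from collections import deque

def indirect_pose(a, b, fine_poses):
    """Shortest-path composition from node a to node b.
    fine_poses: {node: {neighbour: Pose}}, standing in for the FP sets
    (fetched over the network in the distributed setting).
    Returns the composed Pose, or None if no path exists."""
    queue = deque([(a, Pose())])   # (node reached, composed pose from a)
    visited = {a}
    while queue:
        x, P_ax = queue.popleft()
        if x == b:
            return P_ax
        for y, P_xy in fine_poses.get(x, {}).items():
            if y not in visited:
                visited.add(y)
                queue.append((y, P_ax.then(P_xy)))
    return None
```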
4. ALGORITHM DESIGN
4.1. Distributed Calibration Algorithm
The algorithm is split into ten distinct processes at each node;
six for coarse grouping, and four for pairwise pose refine-
ment. Each process acts upon receipt of a message, with the
exception of the feature selection process, which executes pe-
riodically, and the pose refinement initiator process, which
executes whenever the group composition is updated.
There are four parameters intrinsic to the algorithm itself, following from Section 3.3: the feature size f, the similarity threshold t_d, the match threshold t_m, and the consistency threshold t_c. Certain other implementation-specific parameters are also required, notably those for the coarse and fine registration algorithms; in particular, t_ec and t_ef are referenced here as generic error thresholds for the coarse and fine registration algorithms, respectively.
Fig. 3. Feature Selection Process
Fig. 4. Feature Matching Process
Fig. 5. Match Processing Process
Fig. 6. Group Merge Initiator Process
Fig. 7. Group Merge Responder Process
Fig. 8. Group Update Process
Fig. 9. Pose Refinement Initiator Process
4.2. Interest Point Detection
Three distinct parts of calibration are impacted directly by the interest point detection algorithm used: the correspondence algorithm, the coarse matching scheme, and the fine registration algorithm. In all three cases, it is the repeatability performance metric which is of interest; higher repeatability yields higher overlap in the point sets.
Fig. 10. Pose Refinement Responder Process
Fig. 11. Fine Registration Process
Fig. 12. Pose Update Process
For practical purposes (including the availability of source code from the authors), the FAST interest point detector [14, 15] is selected for this implementation. Convergence can be encouraged by constraining nodes to share large portions of their fields of view or by calibrating on a scene with strong interest points.
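The paper uses the authors’ reference implementation of FAST [14, 15]; an equivalent detector also ships with OpenCV, so the detection step might be sketched as follows (the threshold value is illustrative; stereo correspondence and triangulation of the detected points into 3D are not shown):

```python
import cv2

def detect_fast_points(image, threshold=40):
    """Detect FAST corners in one rectified image of the stereo pair.
    Higher thresholds keep only strong interest points, which aids
    repeatability (and thus point set overlap) across nodes."""
    detector = cv2.FastFeatureDetector_create(threshold=threshold,
                                              nonmaxSuppression=True)
    keypoints = detector.detect(image, None)
    return [kp.pt for kp in keypoints]   # (x, y) pixel coordinates
```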
4.3. Registration
Since matching features overlap fully, an excellent solution to the coarse registration problem is the fully-contained version of the DARCES algorithm [10], using three control points. DARCES without the RANSAC component is a relatively simple algorithm, allowing it to perform rapid matching on a large number of features.
The concept of the Iterative Closest Point (ICP) algorithm [11] lends itself well to the fine registration problem encountered in pairwise pose refinement. However, the difficulty of stable interest point detection, occlusion effects, and uncertainty about the overlap in field of view all contribute to poor overlap in the point sets used for pose refinement. The Trimmed Iterative Closest Point (TrICP) algorithm [12], used in this implementation, can be automatically tuned to any degree of overlap, and is applicable to overlaps under 50%.
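In the spirit of TrICP [12], one iteration loop is sketched below, reusing align_rigid from the sketch in Section 3.3.2. Unlike TrICP proper, which selects the overlap automatically, a fixed overlap fraction is assumed here, and the nearest-neighbour search is brute force:

```python
import numpy as np

def trimmed_icp(src, dst, R, T, overlap=0.5, iters=30):
    """Refine (R, T) so that R*src + T aligns to dst: match each
    transformed source point to its nearest destination point, keep
    only the best `overlap` fraction of pairs, and re-solve the
    least-squares alignment."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    n_keep = max(3, int(overlap * len(src)))
    for _ in range(iters):
        moved = src @ R.T + T
        d2 = ((moved[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)           # brute-force closest points
        resid = d2[np.arange(len(src)), nearest]
        keep = np.argsort(resid)[:n_keep]     # trim the worst pairs
        R, T = align_rigid(src[keep], dst[nearest[keep]])
    return R, T
```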
5. EXPERIMENTS
5.1. Performance Metrics
5.1.1. Convergence
Convergence is the measure of the algorithm’s ability to bring
nodes into a common reference frame and its time perfor-
mance in doing so. For our purposes, there are actually two
distinct considerations:
1. The ability of coarse grouping to merge into a minimum
number of groups.
2. The ability of pairwise pose refinement to establish a
maximum number of pairwise estimates.
Calibration is considered successful in terms of convergence
when coarse grouping merges the entire network into a single
group and pairwise fine pose estimates are established such
that the calibration graph is connected.
5.1.2. Accuracy
Accuracy is the measure of the error in the algorithm’s resulting pose estimates. The mean error in a pose estimate can be determined by averaging the Euclidean distance between a number of points with ground-truth correspondence, detected and triangulated at the nodes separately from those used for calibration. Although error accumulates with the path length (number of pose compositions) in the calibration graph, it is more relevant to consider the path length in the vision graph, since the 3D reconstruction consistency among nodes observing the same part of the scene is the likely criterion.
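A sketch of this metric, under our own naming and reusing the Pose class from Section 2.2:

```python
import numpy as np

def mean_pose_error(points_a, points_b, P_ab):
    """Mean Euclidean distance between node B's ground-truth points and
    node A's corresponding points mapped through the estimated pose."""
    mapped = np.array([P_ab.map(p) for p in points_a])
    return np.linalg.norm(mapped - np.asarray(points_b, float),
                          axis=1).mean()
```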
5.1.3. Scalability
Scalability is the measure of the effect of the number of nodes in the network on the algorithm’s performance. The three primary resources to consider are node-local computing resources (i.e. CPU and memory), node-local data storage, and network bandwidth.
In order to properly evaluate scalability, it is necessary to examine individual factors arising from the algorithm itself. The most significant of these can be summarized in terms of the number of nodes in the network |N| as follows:
• Feature dissemination requires bandwidth resources in
|N| per node.
• Feature matching requires computing and storage re-
sources in |N|.
This assumes that each node maintains a more or less constant
number of pairwise edges in the vision graph regardless of
|N|, as would be the case with most applications. In cases
where this assumption does not hold, it is necessary to add a
third factor:
• Pairwise pose refinement computation requires com-
puting resources in |N|.
Scalability in all three resources can be quantified experimentally in terms of the above factors.
5.2. Manual Point Set
In order to test the capabilities of the calibration algorithm and tune its parameters under controlled conditions, the first experiment series is designed to operate on manually selected points with full correspondences across all four nodes. The primary purpose of this experiment type, once suitable parameters are found, is to test the effects of different point set sizes and overlap characteristics on convergence and accuracy.
5.2.1. Procedure
A total of 22 point subsets are extracted from the data, and each is tested using the distributed calibration software, with all four nodes running locally on the same workstation. This procedure is repeated twice for each subset, and the average results for convergence time and mean error are calculated and recorded.
5.2.2. Results
Fig. 13. Manual Experiment Results: (a) Convergence Time Trends in n and p; (b) Accuracy Trends in n and p
5.3. Automatic Point Set
Having established some criteria for reasonably timely convergence in the manual point set experiments, the next step is to test real automatic calibration of the network. The purpose of these experiments is to test the convergence and accuracy performance of the algorithm in real conditions.
5.3.1. Procedure
Four instances of the local point detection software, config-
ured to execute the distributed calibration software on com-
pletion, are run in automatic mode on the vision platform
workstation. Convergence time and the final calibration graph
are recorded. A ground truth point set is manually selected
for each camera rig, and the mean error is calculated and
recorded.
5.3.2. Results
The mean error and convergence time of a typical experiment from this series are shown below. Figure 14 shows the final
calibration graph, Figure 15 shows the physical deployment
of the nodes for the experiment, and Figure 16 shows the vi-
sualization of the resultant pose estimates (which can be seen
to match the configuration in Figure 15).
• Mean Error: 2.7666 mm
• Convergence Time: 159 s
Fig. 14. Calibration Graph for Automatic Experiment
5.4. Virtual Point Set
Since only four physical camera rigs are available, testing
scalability to larger networks is impossible in an automatic
experiment and difficult to control using the manual meth-
ods. Instead, controlled virtual point sets are supplied to the
same calibration algorithm implementation to test the scala-
bility metric.
5.4.1. Procedure
Point sets are generated for 5, 10, 15, 20, and 25 nodes. The total outgoing bandwidth in kilobytes, final size of the matching database in features, and total number of coarse and fine registration executions are recorded.
Fig. 15. Camera Deployment for Automatic Experiment
Fig. 16. Pose Visualization for Automatic Experiment
5.4.2. Results
As expected, total bandwidth usage per node increases approximately linearly in relation to the number of nodes in the network (Figure 17). This affects different networks in different ways. In a network where the physical medium is shared by all nodes – the worst-case scenario – the total network bandwidth usage is the relevant factor. In that case, the bandwidth usage increases non-linearly, potentially at up to |N|³. However, many routing methods used in sensor networks are much more efficient and therefore mitigate this effect.
The number of features stored at each node increases ap-
proximately linearly in relation to the number of nodes (Fig-
ure 18). Features are very small data (a series of f 3-tuples, an
identifier, and a geometric descriptor value), but when scaling
to extremely large networks it must be ensured that adequate
storage is provided at each node for these features.
The number of coarse registration operations performed at each node increases approximately linearly in relation to the number of nodes (Figure 19); as expected, this is proportional
Fig. 17. Bandwidth Usage in |N| (Average and Maximum)
Fig. 18. Node-Local Storage in |N| (Average and Maximum)
to the number of features stored. If processing throughput is
the limiting factor, this increase will cause the convergence
time to increase linearly with the number of nodes.
Fig. 19. Coarse Registration Processing in |N| (Average and
Maximum)
Since, in this network, the number of vision graph edges
per node does not generally increase as the network’s total number of
nodes increases, the number of fine registrations per node is
approximately constant.
6. CONCLUSIONS
A calibration method for distributed smart stereo camera net-
works has been developed which converges well, provides ac-
curate pairwise orientation, and scales well to large networks.
This provides a base upon which to build a full 3D visual sensor network providing primitive data-centric queries, upon which in turn a variety of high-level applications can be developed.
Currently, the algorithm makes it possible for smart stereo camera devices to self-localize and self-orient relative to one another in a distributed fashion, allowing for various subsequent stages of realization for a variety of applications. The immediate opportunity is to provide a generalized framework for building these solutions, which would rest on the underlying assumption that the network is accurately calibrated and can perform 3D reconstruction across multiple views.
The major implementation drawback is the instability of
interest point detection in the general case; at present, it is
necessary to control the scene somewhat by adding one or more calibration targets for convergence to occur reliably. Im-
proving this situation is an important avenue for future work.
7. REFERENCES
[1] A. Mavrinac, “Feature-Based Calibration of Distributed
Smart Stereo Camera Networks,” M.A.Sc. Thesis, Uni-
versity of Windsor, 2008.
[2] M. Akdere, U. Cetintemel, D. Crispell, J. Jannotti, J. Mao, and G. Taubin, “Data-Centric Visual Sensor Networks for 3D Sensing,” in Proc. 2nd Intl. Conf. on Geosensor Networks, 2006.
[3] J. Jannotti and J. Mao, “Distributed Calibration of Smart
Cameras,” in Proc. Intl. Workshop on Distributed Smart
Cameras, pp. 55–61, 2006.
[4] D. Devarajan and R. J. Radke, “Distributed Metric Cal-
ibration of Large Camera Networks,” in Proc. 1st Work-
shop on Broadband Advanced Sensor Networks, 2004.
[5] W. E. Mantzel, H. Choi, and R. G. Baraniuk, “Dis-
tributed Camera Network Localization,” in Proc. 38th
Asilomar Conf. on Signals, Systems and Computers,
2004.
[6] S. Funiak, C. Guestrin, M. Paskin, and R. Sukthankar,
“Distributed Localization of Networked Cameras,” in
Proc. 5th Intl. Conf. on Information Processing in Sen-
sor Networks, pp. 34–42, 2006.
[7] C. Beall and H. Qi, “Distributed Self-Deployment in Vi-
sual Sensor Networks,” in Proc. Intl. Conf. on Control,
Automation, Robotics and Vision, pp. 1–6, 2006.
[8] C. J. Taylor and B. Shirmohammadi, “Self Localiz-
ing Smart Camera Networks and their Applications to
3D Modeling,” in Proc. Intl. Workshop on Distributed
Smart Cameras, pp. 46–50, 2006.
[9] J. Salvi, C. Matabosch, D. Fofi, and J. Forest, “A Review
of Recent Range Image Registration Methods with Ac-
curacy Evaluation,” Image and Vision Computing, vol.
25, no. 5, pp. 578–596, 2007.
[10] C.-S. Chen, Y.-P. Hung, and J.-B. Cheng, “RANSAC-
Based DARCES: A New Approach to Fast Automatic
Registration of Partially Overlapping Range Images,”
IEEE Trans. on Pattern Analysis and Machine Intelli-
gence, vol. 21, no. 11, pp. 1229–1234, 1999.
[11] P. J. Besl and N. D. McKay, “A Method for Registration
of 3-D Shapes,” IEEE Trans. on Pattern Analysis and
Machine Intelligence, vol. 14, no. 2, pp. 239–256, 1992.
[12] D. Chetverikov, D. Svirko, D. Stepanov, and P. Krsek,
“The Trimmed Iterative Closest Point Algorithm,” in
Proc. Intl. Conf. on Pattern Recognition, pp. 545–548,
2002.
[13] A. Baumberg, “Reliable Feature Matching Across
Widely Separated Views,” in Proc. IEEE Conf. on Com-
puter Vision and Pattern Recognition, pp. 1774–1781,
2000.
[14] E. Rosten and T. Drummond, “Fusing Points and Lines
for High Performance Tracking,” in Proc. 10th IEEE
Intl. Conf. on Computer Vision, pp. 1508–1511, 2005.
[15] E. Rosten and T. Drummond, “Machine Learning for
High-Speed Corner Detection,” in Proc. 9th European
Conf. on Computer Vision, pp. 430–443, 2006.
[16] H. C. Longuet-Higgins, “A Computer Algorithm for Re-
constructing a Scene from Two Projections,” Nature,
vol. 293, pp. 133–135, 1981.
[17] M. Raynal, Distributed Algorithms and Protocols, John
Wiley & Sons, 1988.
[18] O. Faugeras, Three-Dimensional Computer Vision: A
Geometric Viewpoint, The MIT Press, 1993.
[19] Y. Ma, S. Soatto, J. Košecká, and S. S. Sastry, An Invitation to 3-D Vision: From Images to Geometric Models, Springer-Verlag, 2004.