All content in this area was uploaded by Xiang Chen on Oct 10, 2014
FEATURE-BASED CALIBRATION OF DISTRIBUTED
SMART STEREO CAMERA NETWORKS
Aaron Mavrinac, Xiang Chen, and Kemal Tepe
University of Windsor
Department of Electrical and Computer Engineering
401 Sunset Ave., Windsor, Ontario, Canada N9B 3P4
ABSTRACT
A distributed smart camera network is a collective of vision-capable
devices with enough processing power to execute algorithms for
collaborative vision tasks. A true 3D sensing network applies to a
broad range of applications, and local stereo vision capabilities at
each node offer the potential for a particularly robust
implementation. A novel spatial calibration method for such a network
is presented, which obtains pose estimates suitable for collaborative
3D vision in a distributed fashion using two stages of registration on
robust 3D features. The method is initially described in a geometrical
sense, then presented in a practical implementation using existing
vision and registration algorithms. The method is designed
independently of networking details, making only a few basic
assumptions about the underlying network's capabilities. Experiments
using both software simulations and physical devices are designed and
executed to demonstrate performance.
Index Terms: camera network, calibration, collaborative, distributed,
registration, 3D vision
1. INTRODUCTION
The relatively new concept of 3D visual sensor networks [2] is
emerging within the area of distributed smart cameras. By collecting
and processing true 3D information, such networks offer improvements
in existing applications and promise entirely new possibilities.
The 3D sensing paradigm includes the use of passive 3D (stereo)
vision, fusion of information from multiple views, and distributed
collaborative processing. Many of the basic computer vision
operations, including shape recognition, object tracking, motion
analysis, and scene reconstruction, have been improved through one or
two of these properties; we contend that all three in unison yield yet
greater benefits. Thus, our work applies to distributed smart stereo
camera networks, wherein each node consists of a device capable of
passive 3D vision and the distributed algorithms operate primarily or
exclusively on 3D data.
Funding for this research was provided in part by the Natural Sciences
and Engineering Research Council of Canada.
In order to perform any useful collaborative processing of this
information, there must exist some way to bring data from multiple
nodes into a common reference frame. This is achieved through
calibration, which includes spatial localization and orientation as
well as temporal synchronization. While some distributed time
synchronization methods from the sensor network literature are
applicable, existing localization methods are insufficient.
This paper presents a novel scalable spatial calibration method for
distributed smart stereo camera networks. It is designed independent
of node architecture or network details, and makes few assumptions
about node deployment or scene contents. The problem is reduced to a
geometrical form in Section 3, from which the implementation in
Section 4 follows. A more thorough treatment can be found in [1].
The majority of research in distributed smart cameras to date has
focused on monocular vision at each node. A number of methods for
distributed self-calibration have been proposed for this paradigm, and
though the vision components are not readily applicable to 3D sensing
nodes, the general localization and distribution concepts developed
apply to any vision-based system.
From the perspective of traditional sensor networks, the primary
challenges are the directionality of vision sensors, the higher degree
of accuracy required by vision applications, and the large volume of
raw sensor data. Conversely, from the perspective of traditional
computer vision, the challenge is in the scalable distribution of
processing among nodes and the related limitations of network
bandwidth.
While traditional sensor network methods generally employ
omnidirectional sensors and thus require only localization,
vision-based networks also require orientation. To apply similar
methods to directional vision sensors, the concept of the vision graph
is introduced in [4], where an edge on the graph represents shared
field of view rather than a communication link.
Functional calibration methods are presented for monocular distributed
smart cameras in [4, 5]. These are based on wide-baseline stereo
methods, which are generally not robust due to the matching problem
[13], and require unwieldy initialization schemes or dictate
deployment constraints. Some methods, such as [6], use motion of
objects in the scene to calibrate, but these still suffer from the
matching problem to a degree and require certain kinds of scene.
Potentially more robust methods are presented in [7, 8]; however,
these require the use of markers or beacons placed in the environment,
which is infeasible in many cases and may constrain deployment or
extension to dynamic calibration.
978-1-4244-2665-2/08/$25.00 © 2008 IEEE
With the true 3D sensing network paradigm introduced in [2],
advocating distributed smart stereo cameras, a calibration method
called Lighthouse is presented in [3] which uses 3D features and
geographic hash tables (GHTs) to localize and orient nodes. Our method
employs the same basic concept, but is more complete and addresses
some impracticalities in the former.
2. PRELIMINARIES
2.1. Definitions
2.1.1. Nodes and Groups
A node is the abstract or physical smart stereo camera device itself;
nodes shall be denoted by sequential capital letters ($A$, $B$, and so
forth). The set of all nodes in the network shall be denoted $N$
(where $|N|$ represents the total number of nodes). A group is a set
of nodes which agree on a single leader node; a group led by node $A$
shall be denoted $G_A$ (where $|G_A|$ represents the number of nodes
in the group). Every group is a subset of the full set of nodes
($G_A \subseteq N$), and every node is a member of exactly one group
(so, if $G_A$ and $G_B$ are two separate groups,
$|G_A \cap G_B| = 0$).
2.1.2. Point Sets and Features
A point set is the full set of interest points detected locally at a
node; the point set of node $A$ shall be denoted $S_A$. The overlap
between point sets $S_A$ and $S_B$ refers to the size of the
intersection of the two sets, $|S_A \cap S_B|$, said intersection
occurring where a point in $S_A$ corresponds to the same physical
point as a point in $S_B$. The percent overlap is defined as follows:
$$\%O(S_A, S_B) = \frac{|S_A \cap S_B|}{\max(|S_A|, |S_B|)} \times 100\% \tag{1}$$
A feature is any subset of the point set of a certain size (determined
by a parameter of the algorithm); when discussing a single arbitrary
feature from node $A$, it shall be denoted $F_A$, where
$F_A \subseteq S_A$. Two features $F_A$ and $F_B$, from nodes $A$ and
$B$ respectively, are considered to match (denoted
$F_A \approx F_B$) if each point in $F_A$ corresponds to the same
physical point as a point in $F_B$. In the context of the algorithm,
it is impossible to ascertain this correspondence, so the term match
implies rather a presumed match based on a criterion of geometrical
similarity.
2.2. Pose
Pose is a concept used here to describe the relative motion between
two nodes in a distributed smart camera network, which is the basis of
calibration. Each node is considered to have its own local coordinate
system. The relative pose of node $A$ with respect to node $B$ is
denoted $P_{AB}$, and is the rigid transformation in 3D Euclidean
space from the coordinate system of $A$ to that of $B$.
The transformation $P_{AB}: \mathbb{R}^3 \to \mathbb{R}^3$ consists of
a rotation matrix ($3 \times 3$ real orthogonal matrix) $R_{AB}$ and a
3-element translation vector $T_{AB}$. $P_{AB}$ maps a point
$x \in \mathbb{R}^3$ as follows:
$$P_{AB}(x) = R_{AB}\,x + T_{AB} \tag{2}$$
The identity pose is denoted $P_I$, and consists of the identity
matrix $R_I$ and the zero vector $T_I$.
The inverse of pose $P_{AB}$, denoted $P_{AB}^{-1}$, reverses the pose
transformation (so that $P_{AB}^{-1} = P_{BA}$). It can be determined
as follows:
$$P_{AB}^{-1}(x) = R_{AB}^{-1}\,x - R_{AB}^{-1}\,T_{AB} \tag{3}$$
A succession of pose transformations $P_{BC}(P_{AB}(x))$ can be
composed into a single pose, denoted $(P_{AB} \circ P_{BC})(x)$, as
follows:
$$(P_{AB} \circ P_{BC})(x) = R_{BC} R_{AB}\,x + (R_{BC} T_{AB} + T_{BC}) \tag{4}$$
This transformation maps from the coordinate system of $A$ to that of
$B$, then from that of $B$ to that of $C$; therefore, the
transformation from $A$ to $C$ can be computed via composition as
$P_{AC} = P_{AB} \circ P_{BC}$. This operation is transitive, so one
node's pose relative to another can be computed indirectly over an
arbitrary number of intermediate poses if they exist.
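As an illustration of Equations (2) to (4), the following minimal Python sketch represents a pose as a rotation matrix and a translation vector and implements application, inversion, and composition. The class name `Pose`, the method name `then`, and the helper `rot_z` are hypothetical and are not taken from the original implementation. The inverse relies on the fact that the inverse of a rotation matrix is its transpose.

```python
import math


class Pose:
    """Rigid transform x -> R x + T, with R a 3x3 rotation (row-major lists)."""

    def __init__(self, R, T):
        self.R = [list(row) for row in R]
        self.T = list(T)

    @classmethod
    def identity(cls):
        return cls([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [0, 0, 0])

    def apply(self, x):
        # Eq. (2): P(x) = R x + T
        return [sum(r * xi for r, xi in zip(row, x)) + t
                for row, t in zip(self.R, self.T)]

    def inverse(self):
        # Eq. (3): for a rotation matrix, R^-1 is the transpose of R.
        Rt = [list(col) for col in zip(*self.R)]
        T_inv = [-sum(r * t for r, t in zip(row, self.T)) for row in Rt]
        return Pose(Rt, T_inv)

    def then(self, other):
        # Eq. (4): (self o other)(x) = other(self(x)), so
        # R = R_other R_self and T = R_other T_self + T_other.
        R = [[sum(other.R[i][k] * self.R[k][j] for k in range(3))
              for j in range(3)] for i in range(3)]
        T = other.apply(self.T)
        return Pose(R, T)


def rot_z(theta):
    """Rotation by theta radians about the z-axis (helper for the demo)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


# Hypothetical poses: A -> B and B -> C, composed into A -> C.
P_AB = Pose(rot_z(math.pi / 2), [1.0, 0.0, 0.0])
P_BC = Pose(rot_z(math.pi / 2), [0.0, 2.0, 0.0])
P_AC = P_AB.then(P_BC)
x = [1.0, 2.0, 3.0]
```

Mapping `x` through `P_AC` should agree, up to floating-point rounding, with mapping it through `P_AB` and then `P_BC`, and `P_AB.inverse()` should undo `P_AB`.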
2.3. Graphs
Three types of undirected graphs are helpful in describing distributed
smart camera calibration: the communication graph, the vision graph,
and the calibration graph [4]. Graphs are described as connected if
there exists a path connecting every pair of nodes, and complete if
there exists an edge between each pair of nodes.
The communication graph describes the effective communication links
between nodes in the network from the perspective of the layer
presented to the application. A complete communication graph indicates
that any node may communicate directly with any other node.
The vision graph describes which nodes share significant portions of
their field of view. A pair of nodes have a connecting edge in this
graph if the volume of space in the intersection of their fields of
view is considered large enough that it might contain sufficient data
for the operations required by the algorithm.
The calibration graph describes which nodes have a direct estimate of
their pairwise pose. Obviously, it is desirable that this graph be
connected, so that any two nodes $X$ and $Y$ may estimate their
relative pose $P_{XY}$ by composition of known pose estimates. Edges
can only be established where there exist edges in the vision graph,
and the most complete calibration graph possible is identical to the
vision graph.
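The graph properties defined above can be checked with a few lines of Python. This is only a sketch: the function names and the edge-list representation (pairs of node labels) are assumptions made for illustration.

```python
from collections import deque


def is_connected(nodes, edges):
    """True if the undirected graph has a path between every pair of nodes."""
    nodes = list(nodes)
    if not nodes:
        return True
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:  # breadth-first traversal from an arbitrary node
        current = queue.popleft()
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) == len(nodes)


def is_complete(nodes, edges):
    """True if every pair of distinct nodes shares an edge."""
    nodes = list(nodes)
    present = {frozenset(e) for e in edges}
    return all(frozenset((a, b)) in present
               for i, a in enumerate(nodes) for b in nodes[i + 1:])


def valid_calibration_graph(calibration_edges, vision_edges):
    """Calibration edges may only exist where vision edges exist."""
    vision = {frozenset(e) for e in vision_edges}
    return all(frozenset(e) in vision for e in calibration_edges)


# Hypothetical four-node network.
nodes = "ABCD"
vision = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]
calibration = [("A", "B"), ("B", "C"), ("C", "D")]
```

In this example the calibration graph is connected and uses only vision-graph edges, so every pairwise pose can be obtained by composition even though the graph is not complete.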
3. MAIN PROBLEM
3.1. Problem Statement
The overall objective is to spatially calibrate a series of
homogeneous smart stereo camera nodes, with no a priori knowledge and
using only the nodes' 3D visual data, in a distributed fashion.
Assuming the visual data consists of a set of 3D points triangulated
from stereo images of the environment, the problem may be reduced to
geometrical terms:
    Given a set of nodes $N$, each node $X \in N$ having a point set
    $S_X$, estimate the pose $P_{XY}$ for enough node pairs $(X, Y)$
    such that the calibration graph for $N$ is connected.
The shared view assumption (Section 3.2.2) and the repeatability
criterion of interest point detection (Section 4.2) imply a sufficient
degree of overlap between a sufficient number of node pairs for
convergence.
3.2. Assumptions
3.2.1. Pre-Deployment Offline Access
It is assumed that, prior to deployment of the network, there is a
period during which each node may be accessed without restriction in a
controlled environment, in order to perform certain essential
modifications to software (such as assignment of a unique identifier,
network configuration, and intrinsic/stereo calibration of the
cameras).
3.2.2. Shared View
For full convergence, it is assumed that the vision graph is
connected. This imposes a minimum qualitative constraint on
node deployment that the shared ﬁeld of view of the entire
network be continuous and have substantial internal pairwise
overlap.
3.2.3. Fixed Nodes
It is assumed that each node is fixed in its location and orientation
relative to all other nodes. It is also assumed that, once internally
calibrated for stereo vision, no node changes the relative motion
between its cameras or the internal parameters (e.g. focal length) of
either of its cameras.
3.2.4. Static Scene
It is assumed that the contents of the scene are fully static for the
purposes of acquiring calibration point sets. This is solely for
simplicity, and could easily be relaxed by employing background
estimation techniques or accurate temporal synchronization.
3.2.5. Abstract Network
It is assumed that the nodes are capable of autonomously forming an
ad hoc network of some kind, wherein each node can be addressed by a
unique identifier. From the algorithm's point of view, the network is
assumed to be fully connected [17], or in other words, the
communication graph is assumed to be complete. Additionally, it is
assumed that arbitrary amounts of data can be sent with assured
delivery.
3.3. Problem Analysis
3.3.1. Two-Stage Registration
Bringing the point sets, and thereby the node coordinate systems, into
alignment with one another can be accomplished by registration.
Registration algorithms may be divided into two types: coarse
registration algorithms, which can align points without an initial
estimate but are generally not very accurate; and fine registration
algorithms, which require an initial estimate to align points but are
very accurate [9].
For our purposes, no alignment estimate is initially available, yet
high accuracy is desirable. The typical solution when presented with
such a problem is a two-stage approach, using coarse registration to
initialize fine registration. However, there is more to the problem in
our case: it is not even known which point sets overlap or to what
degree. We use a process of feature matching to determine how to
proceed with registration between nodes.
3.3.2. Feature Matching
In order to find coarse pose estimates between nodes with no knowledge
of their point set overlap in a distributed fashion, a pairwise
feature matching process similar to that described in [3] can be
employed.
The goal is to find pairwise matches between nodes' features, and then
use those matches to calculate coarse relative pose estimates for the
node pairs. Both are accomplished through the coarse registration
algorithm; if the registration error falls below a certain threshold
$t_{ec}$, the features are considered to match, and the registration
result yields a coarse pose estimate between the source nodes.
Consider point sets from two nodes, $S_A$ and $S_B$, from which,
according to the coarse matching algorithm, each node randomly selects
a feature of size $f \ge 3$, resulting in $F_A \subseteq S_A$ and
$F_B \subseteq S_B$ where $|F_A| = |F_B| = f \le |S_A \cap S_B|$. The
performance of the matching scheme depends on the probability of a
match between $F_A$ and $F_B$, $P(F_A \approx F_B)$, which can be
calculated as follows:
$$P(F_A \approx F_B) = \frac{|S_A \cap S_B|!\; f!\; (|S_A| - f)!\; (|S_B| - f)!}{|S_A|!\; |S_B|!\; (|S_A \cap S_B| - f)!} \tag{5}$$
It is therefore desirable to increase $|S_A \cap S_B|$ relative to
$|S_A|$ and $|S_B|$ (i.e., increase the percent overlap), which
translates into repeatability in interest point detection
(Section 4.2). There is a tradeoff in the value of $f$ between
matching performance and false matches; generally, a low value such as
$f = 4$ is adequate.
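Equation (5) is algebraically equal to $\binom{|S_A \cap S_B|}{f} / \left(\binom{|S_A|}{f}\binom{|S_B|}{f}\right)$: the chance that node $A$ draws all $f$ points from the overlap, times the chance that node $B$ draws exactly the same $f$ points. The short Python sketch below evaluates it in that form. The function names and the sample set sizes are illustrative assumptions, not values from the experiments.

```python
from math import comb


def match_probability(size_a, size_b, overlap, f):
    """Probability that two randomly drawn features of size f match, Eq. (5).

    Uses the equivalent form C(overlap, f) / (C(size_a, f) * C(size_b, f)).
    """
    if not 0 < f <= overlap <= min(size_a, size_b):
        raise ValueError("require 0 < f <= overlap <= min(size_a, size_b)")
    return comb(overlap, f) / (comb(size_a, f) * comb(size_b, f))


def percent_overlap(size_a, size_b, overlap):
    """Percent overlap from Eq. (1), computed from set sizes only."""
    return 100.0 * overlap / max(size_a, size_b)


# Hypothetical point sets of 20 points each, with f = 4.
for shared in (5, 10, 20):
    print(percent_overlap(20, 20, shared), match_probability(20, 20, shared, 4))
```

The probability of a single random pair matching is small even at full overlap, which is why a large number of features must be compared and why higher percent overlap improves convergence.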
3.3.3. Feature Categorization
No details have yet been given about how to bring features together
for matching in a distributed fashion. The idea of feature
categorization is borrowed from the data-centric storage literature,
used with reference to distributed smart camera networks in [2] and
more specifically to their calibration in [3]. The goal is to evenly
distribute the processing and storage of the data in a distributed
system based on some quantitative or qualitative metric of the data
itself. For this, a smooth, deterministic geometric descriptor
function, denoted $g$, is used.
The solution space of this descriptor is then divided as evenly as
possible among the nodes in the network, with some overlap (see
below), and features detected locally at each node are sent to the
appropriate node for matching to other geometrically similar features.
Ideally, the difference between the descriptors of two features $F_A$
and $F_B$ describes the degree of difference $d$ between those
features:
$$d(F_A, F_B) = \lVert g(F_A) - g(F_B) \rVert \tag{6}$$
Based on the measurement accuracy of a node and the specific coarse
registration algorithm used, there is a similarity threshold $t_d$,
such that it is necessary to compare two features $F_A$ and $F_B$ if
$d(F_A, F_B) < t_d$, and unnecessary otherwise; this will be termed
the similarity condition. The desirable overlap for categorization,
then, is $t_d/2$ in all directions.
Note that categorization, and thus the nodes where features are
matched, has no relation to the nodes where those features originated.
When a match is found, the result is returned to one of the two source
nodes, based on some deterministic selection function such that for a
given pair of nodes the same node is always selected.
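To make the categorization overlap concrete, the following Python sketch assumes a scalar descriptor whose values fall in a known range `[g_min, g_max)`; both the scalar form and the fixed range are assumptions for illustration, since the descriptor function itself is not specified here. Each node owns an equal share of the range, widened by $t_d/2$ on both sides.

```python
def responsible_nodes(descriptor, n_nodes, g_min, g_max, t_d):
    """Indices of nodes whose widened share of [g_min, g_max) holds descriptor."""
    width = (g_max - g_min) / n_nodes
    margin = t_d / 2.0  # categorization overlap on each side
    nodes = []
    for i in range(n_nodes):
        lo = g_min + i * width - margin
        hi = g_min + (i + 1) * width + margin
        if lo <= descriptor < hi:
            nodes.append(i)
    return nodes


# Hypothetical setup: 4 nodes, descriptor range [0, 100), t_d = 2.
print(responsible_nodes(10.0, 4, 0.0, 100.0, 2.0))
print(responsible_nodes(24.5, 4, 0.0, 100.0, 2.0))
```

With this margin, any two features satisfying the similarity condition are both delivered to at least one common node: their midpoint lies in some node's share, and each descriptor is within $t_d/2$ of that midpoint.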
3.3.4. Coarse Grouping
In order to guarantee that all nodes with edges on the vision graph
attempt pairwise pose refinement without the need for exhaustive
feature matching, we introduce a grouping scheme wherein nodes are
merged into ever-larger groups within the same coordinate system,
albeit with only coarse estimates. Through pose composition, any node
in a group can determine its coarse pose with respect to any other
node. This is conceptually similar in some ways to the GHT scheme
proposed in [3].
A node always knows its current group leader and the set of nodes
comprising its group. Within a group (Section 2.1.1), each node has a
coarse pose estimate relative to the group leader, called its group
coarse pose estimate, and denoted $C_A$ for a node $A$. Relative
coarse pose estimates (e.g. $C_{AB}$ for node $A$ relative to node
$B$) can be computed from these, either directly or through one or
more compositions. Initially, each node begins in a singleton group,
of which it is the leader, with its group coarse pose estimate
initialized to $P_I$.
A merge is initiated when two nodes have detected a certain minimum
number $t_m$ of consistent matches with each other. Consistency is
enforced via a threshold $t_c$ specifying the maximum Euclidean
distance allowed between the pose estimates' mappings of a given point
(such as the centroid $\mu_S$ of the computing node's point set). Once
a node has stored at least $t_m$ matches with a particular other node,
each time a new match is detected for that node, an average coarse
pose estimate is computed for every combination $M_i$ of matches
containing the new match, and checked for consistency:
$$\lVert C_m(\mu_S) - C_{avg}(\mu_S) \rVert \le t_c, \quad \forall m \in M_i \tag{7}$$
If a consistent average is found, it is considered a reliable
relative coarse pose estimate, and is forwarded to the source
nodes’ group leaders and composed as necessary to merge the
nodes’ respective groups.
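The consistency condition of Equation (7) can be sketched in Python as follows. Poses are modelled as plain callables that map a 3D point to a 3D point, and the translation-only stand-ins and numeric values are assumptions used purely to keep the demonstration short; how the average pose is computed is left outside the sketch.

```python
import math


def is_consistent(candidate_poses, average_pose, centroid, t_c):
    """Eq. (7): every candidate maps the centroid within t_c of the average's mapping."""
    reference = average_pose(centroid)
    return all(math.dist(pose(centroid), reference) <= t_c
               for pose in candidate_poses)


def translation(dx, dy, dz):
    """Translation-only stand-in for a coarse pose estimate (demo only)."""
    return lambda p: (p[0] + dx, p[1] + dy, p[2] + dz)


centroid = (0.0, 0.0, 0.0)
matches = [translation(1.00, 0.0, 0.0),
           translation(1.02, 0.0, 0.0),
           translation(0.98, 0.0, 0.0)]
average = translation(1.00, 0.0, 0.0)  # mean of the three translations
print(is_consistent(matches, average, centroid, t_c=0.05))
```

A single outlying match is enough to make a combination fail the check, which is what keeps false matches from triggering a merge.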
Fig. 1. Group Merging
Figure 1 illustrates a typical group merge. Node $D$, of group $G_A$,
and node $G$, of group $G_F$, find a relative coarse pose estimate
through feature matching, and initiate a merge. The nodes in group
$G_A$ do not modify their group coarse pose information. Node $G$'s
new group coarse pose estimate ($C'_G$) is the composition of its
estimated pose relative to node $D$ with node $D$'s group coarse pose
estimate:
$$C'_G = C_{GD} \circ C_D \tag{8}$$
The new group coarse pose estimates for the merging group's leader
($C'_F$) and any other nodes in the merging group (in this case,
$C'_H$) can similarly be calculated as compositions of known pose
estimates:
$$C'_F = C_G^{-1} \circ (C_{GD} \circ C_D) \tag{9}$$
$$C'_H = C_H \circ (C_G^{-1} \circ (C_{GD} \circ C_D)) \tag{10}$$
Since merging consists of composition operations, it is a transitive
operation which can occur based on matches (and the resultant relative
coarse pose estimates) between any pair of nodes in different groups.
Figure 1 illustrates this by showing the actual history of merges
leading to the groups as arrows between the node pairs; in reality, of
course, every node in the group has a direct pose estimate to the
leader (group coarse pose estimate).
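The merge of Equations (8) to (10) can be sketched with poses stored as 4x4 homogeneous matrices, so that composition becomes a matrix product. The representation, the function names, and the numeric poses below are assumptions for illustration only. Each group is a dictionary from node label to group coarse pose estimate.

```python
def compose(first, second):
    """(first o second): apply first, then second; as matrices, second @ first."""
    return [[sum(second[i][k] * first[k][j] for k in range(4))
             for j in range(4)] for i in range(4)]


def invert(pose):
    """Inverse of a rigid transform [R T; 0 1] is [R^T  -R^T T; 0 1]."""
    Rt = [[pose[j][i] for j in range(3)] for i in range(3)]
    T = [pose[i][3] for i in range(3)]
    T_inv = [-sum(Rt[i][k] * T[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [T_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def merge_groups(absorbing, merging, d, g, C_GD):
    """Merge group `merging` into `absorbing` via the match between d and g.

    C_GD maps node g's frame to node d's frame.
    """
    # Eq. (9): old leader frame -> g -> d -> new leader frame.
    old_leader_to_new = compose(invert(merging[g]),
                                compose(C_GD, absorbing[d]))
    merged = dict(absorbing)  # absorbing nodes keep their estimates
    for node, pose in merging.items():
        # Eq. (10); for node g this reduces to Eq. (8).
        merged[node] = compose(pose, old_leader_to_new)
    return merged


def translation(x, y, z):
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


# Hypothetical groups mirroring Figure 1: G_A contains D, G_F contains G and H.
G_A = {"A": translation(0, 0, 0), "D": translation(1, 0, 0)}
G_F = {"F": translation(0, 0, 0), "G": translation(0, 2, 0),
       "H": translation(0, 0, 3)}
# Coarse estimate from G to D: 90 degree rotation about z plus a shift.
C_GD = [[0.0, -1.0, 0.0, 5.0], [1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
merged = merge_groups(G_A, G_F, "D", "G", C_GD)
```

After the merge every node in `merged` holds a pose relative to leader A, and the entries for A and D are unchanged.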
3.3.5. Pairwise Pose Reﬁnement
Once a given pair of nodes belong to a group via the feature matching
process, those nodes can use their coarse relative pose estimate as a
starting point for pose refinement. This is achieved by applying a
fine registration algorithm to a large number of points initialized
into coarse alignment.
Fig. 2. Field of View Cone Approximation
As shown in Figure 2, the actual point sets used for fine registration
are selected, at each node, as those falling within the intersection
of the two nodes' fields of view, as approximated by a cone of a
certain angle and length extending along the positive z-axis of each
node's coordinate system (via the coarse pose estimate). If there are
fewer than a specified minimum number of points, which includes the
case where there is no intersection at all, the nodes do not attempt
pose refinement.
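A minimal Python sketch of this point selection follows. It assumes the cone is described by a half-angle and a length measured along the z-axis, and it models the coarse pose as a callable `to_other`; these parameter conventions and the sample values are assumptions, not details from the implementation.

```python
import math


def in_view_cone(point, half_angle, length):
    """True if point lies in a cone with apex at the origin along the +z axis."""
    x, y, z = point
    if z <= 0.0 or z > length:
        return False
    return math.hypot(x, y) <= z * math.tan(half_angle)


def shared_view_points(points, to_other, half_angle, length):
    """Local points inside both this node's cone and the other node's cone.

    to_other maps a local point into the other node's frame
    (the coarse relative pose estimate).
    """
    return [p for p in points
            if in_view_cone(p, half_angle, length)
            and in_view_cone(to_other(p), half_angle, length)]


# Hypothetical setup: the other node sits 1 unit along +x, same orientation.
to_other = lambda p: (p[0] - 1.0, p[1], p[2])
points = [(0.0, 0.0, 5.0), (4.0, 0.0, 5.0), (-4.5, 0.0, 5.0),
          (0.0, 0.0, -1.0), (0.0, 0.0, 11.0)]
selected = shared_view_points(points, to_other, math.pi / 4, 10.0)
```

A caller would compare `len(selected)` against the minimum point count and skip pose refinement when too few points remain.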
3.3.6. Indirect Pose Estimation
A pair of nodes attempting to determine their relative pose can now
communicate directly to find the shortest path along the existing
pairwise fine pose estimates (calibration graph) and thus obtain a
composition with a minimum of error. A node $A$ may find such an
estimate $P_{AB}$ relative to a node $B$ according to the following
algorithm (suppose $FP_A$ represents the set of fine pose estimates at
node $A$):
1. If $P_{AB} \in FP_A$, select $P_{AB}$ and end.
2. For each $P_{AX} \in FP_A$, request $FP_X$ from node $X$. If
$P_{XB} \in FP_X$, select $P_{AB} = P_{AX} \circ P_{XB}$ and end.
3. For each $P_{XY} \in FP_X$, request $FP_Y$ from node $Y$. If
$P_{YB} \in FP_Y$, select
$P_{AB} = P_{AX} \circ P_{XY} \circ P_{YB}$ and end.
4. Continue until $P_{AB}$ has been found.
As indirect fine pose estimates are found (even intermediate ones that
were not requested), they should be added to $FP$ to avoid unnecessary
repetition of network requests and computations.
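The level-by-level search described above is a breadth-first search over the calibration graph. The Python sketch below replaces the network requests with a local dictionary lookup and takes the composition operator as a parameter; both choices, as well as the translation-only demo poses, are simplifying assumptions. Unlike step 4 of the listing, the sketch returns `None` when no path exists instead of continuing indefinitely.

```python
from collections import deque


def indirect_pose(start, goal, fine_poses, compose):
    """Breadth-first search for the shortest chain of fine pose estimates.

    fine_poses[X] is a dict {Y: P_XY} standing in for the set FP_X that
    node X would return over the network. Returns (path, pose), or None
    if goal is unreachable. If start == goal the pose is None.
    """
    queue = deque([(start, [start], None)])
    visited = {start}
    while queue:
        node, path, pose = queue.popleft()
        if node == goal:
            return path, pose
        for neighbour, step in fine_poses.get(node, {}).items():
            if neighbour in visited:
                continue
            visited.add(neighbour)
            chained = step if pose is None else compose(pose, step)
            queue.append((neighbour, path + [neighbour], chained))
    return None


# Demo with translation-only poses, so composition is vector addition.
add = lambda p, q: tuple(a + b for a, b in zip(p, q))
fine = {
    "A": {"X": (1, 0, 0)},
    "X": {"A": (-1, 0, 0), "Y": (0, 1, 0)},
    "Y": {"X": (0, -1, 0), "B": (0, 0, 1)},
    "B": {"Y": (0, 0, -1)},
}
result = indirect_pose("A", "B", fine, add)
```

Because the search expands one hop at a time, the first chain found uses the fewest compositions, which is the property the text relies on to limit accumulated error.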
4. ALGORITHM DESIGN
4.1. Distributed Calibration Algorithm
The algorithm is split into ten distinct processes at each node; six
for coarse grouping, and four for pairwise pose refinement. Each
process acts upon receipt of a message, with the exception of the
feature selection process, which executes periodically, and the pose
refinement initiator process, which executes whenever the group
composition is updated.
There are four parameters intrinsic to the algorithm itself, following
from Section 3.3: the feature size $f$, the similarity threshold
$t_d$, the match threshold $t_m$, and the consistency threshold $t_c$.
Certain other implementation-specific parameters are also required,
notably those for the coarse and fine registration algorithms; in
particular, $t_{ec}$ and $t_{ef}$ are referenced here as generic error
thresholds for the coarse and fine registration algorithms,
respectively.
Fig. 3. Feature Selection Process
Fig. 4. Feature Matching Process
Fig. 5. Match Processing Process
Fig. 6. Group Merge Initiator Process
Fig. 7. Group Merge Responder Process
Fig. 8. Group Update Process
Fig. 9. Pose Reﬁnement Initiator Process
4.2. Interest Point Detection
Three distinct parts of calibration are impacted directly by the
interest point detection algorithm used: the correspondence algorithm,
the coarse matching scheme, and the fine registration algorithm. In
all three cases, it is the repeatability performance metric which is
of interest; higher repeatability yields higher overlap in the point
sets.
For practical purposes (including the availability of source code from
the authors), the FAST interest point detector [14, 15] is selected
for this implementation. Convergence can be encouraged by constraining
nodes to share large portions of their fields of view or by
calibrating on a scene with strong interest points.
Fig. 10. Pose Refinement Responder Process
Fig. 11. Fine Registration Process
Fig. 12. Pose Update Process
4.3. Registration
Since matching features overlap fully, an excellent solution to the
coarse registration problem is the fully-contained version of the
DARCES algorithm [10], using three control points. DARCES without the
RANSAC component is a relatively simple algorithm, allowing it to
perform rapid matching on a large number of features.
The concept of the Iterative Closest Point (ICP) algorithm [11] lends
itself well to the fine registration problem encountered in pairwise
pose refinement. However, the difficulty of stable interest point
detection, occlusion effects, and uncertainty about the overlap in
field of view all contribute to poor overlap in the point sets used
for pose refinement. The Trimmed Iterative Closest Point (TrICP)
algorithm [12], used in this implementation, can be automatically
tuned to any degree of overlap, and is applicable to overlaps under
50%.
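The property that lets a trimmed ICP variant tolerate low overlap is the trimming step: after pairing each point with its closest counterpart, only the best-fitting fraction of pairs is kept for the motion estimate. The Python sketch below shows that single step under simplifying assumptions (brute-force nearest-neighbour search, a fixed `overlap` fraction supplied by the caller); it omits the rigid motion estimation and the automatic overlap search, and the function name is hypothetical.

```python
import math


def trimmed_pairs(source, target, overlap):
    """Pair each source point with its nearest target point and keep the
    closest `overlap` fraction of the pairs, ranked by distance."""
    pairs = []
    for p in source:
        q = min(target, key=lambda t: math.dist(p, t))
        pairs.append((math.dist(p, q), p, q))
    pairs.sort(key=lambda item: item[0])
    keep = int(overlap * len(pairs))
    return pairs[:keep]


# Hypothetical data: three source points have counterparts, one does not.
source = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 10.0, 10.0), (2.0, 0.0, 0.0)]
target = [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0), (2.1, 0.0, 0.0)]
kept = trimmed_pairs(source, target, overlap=0.75)
```

Here the point with no physical counterpart produces the largest residual and is discarded, so it would not distort a subsequent least-squares pose update.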
5. EXPERIMENTS
5.1. Performance Metrics
5.1.1. Convergence
Convergence is the measure of the algorithm's ability to bring nodes
into a common reference frame and its time performance in doing so.
For our purposes, there are actually two distinct considerations:
1. The ability of coarse grouping to merge into a minimum
number of groups.
2. The ability of pairwise pose reﬁnement to establish a
maximum number of pairwise estimates.
Calibration is considered successful in terms of convergence
when coarse grouping merges the entire network into a single
group and pairwise ﬁne pose estimates are established such
that the calibration graph is connected.
5.1.2. Accuracy
Accuracy is the measure of the error in the algorithm's resulting pose
estimates. The mean error in a pose estimate can be determined by
averaging the Euclidean distance between a number of points with
ground-truth correspondence, detected and triangulated at the nodes
separately from those used for calibration. Although error accumulates
with the path length (number of pose compositions) in the calibration
graph, it is more relevant to consider the path length in the vision
graph, since the 3D reconstruction consistency among nodes observing
the same part of the scene is the likely criterion.
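The mean error described above can be computed as in the following Python sketch. The function name, the callable pose representation, and the sample points are assumptions for illustration; `points_a[i]` and `points_b[i]` are taken to be the same physical point as seen from nodes A and B.

```python
import math


def mean_pose_error(points_a, points_b, pose_ab):
    """Mean Euclidean distance between pose_ab(points_a[i]) and points_b[i]."""
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("need equally sized, non-empty point lists")
    total = sum(math.dist(pose_ab(p), q)
                for p, q in zip(points_a, points_b))
    return total / len(points_a)


# Hypothetical estimate: node B's frame is node A's frame shifted by 1 in x.
pose_ab = lambda p: (p[0] + 1.0, p[1], p[2])
points_a = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
points_b = [(1.0, 0.0, 0.0), (2.0, 1.0, 4.0)]
error = mean_pose_error(points_a, points_b, pose_ab)
```

The result is expressed in the same length unit as the triangulated points.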
5.1.3. Scalability
Scalability is the measure of the effect on the algorithm's
performance of the number of nodes in the network. The three primary
resources to consider are node-local computing resources (i.e. CPU and
memory), node-local data storage, and network bandwidth.
In order to properly evaluate scalability, it is necessary to examine
individual factors arising from the algorithm itself. The most
significant of these can be summarized in terms of the number of nodes
in the network $N$ as follows:
• Feature dissemination requires bandwidth resources in $N$ per node.
• Feature matching requires computing and storage resources in $N$.
This assumes that each node maintains a more or less constant number
of pairwise edges in the vision graph regardless of $N$, as would be
the case with most applications. In cases where this assumption does
not hold, it is necessary to add a third factor:
• Pairwise pose refinement computation requires computing resources
in $N$.
Scalability in all three resources can be quantified experimentally in
terms of the above factors.
5.2. Manual Point Set
In order to test the capabilities of the calibration algorithm and
tune its parameters under controlled conditions, the first experiment
series is designed to operate on manually selected points with full
correspondences across all four nodes. The primary purpose of this
experiment type, once suitable parameters are found, is to test the
effects of different point set sizes and overlap characteristics on
convergence and accuracy.
5.2.1. Procedure
A total of 22 point subsets are extracted from the data, and each is
tested using the distributed calibration software, with all four nodes
running locally on the same workstation. This procedure is repeated
twice for each subset, and the average results for convergence time
and mean error are calculated and recorded.
5.2.2. Results
Fig. 13. Manual Experiment Results: (a) Convergence Time Trends in n
and p; (b) Accuracy Trends in n and p
5.3. Automatic Point Set
Having established some criteria for reasonably timely convergence in
the manual point set experiments, the next step is to test real
automatic calibration of the network. The purpose of these experiments
is to test the convergence and accuracy performance of the algorithm
in real conditions.
5.3.1. Procedure
Four instances of the local point detection software, configured to
execute the distributed calibration software on completion, are run in
automatic mode on the vision platform workstation. Convergence time
and the final calibration graph are recorded. A ground truth point set
is manually selected for each camera rig, and the mean error is
calculated and recorded.
5.3.2. Results
The mean error and convergence time of a typical experiment from this
series are shown below. Figure 14 shows the final calibration graph,
Figure 15 shows the physical deployment of the nodes for the
experiment, and Figure 16 shows the visualization of the resultant
pose estimates (which can be seen to match the configuration in
Figure 15).
• Mean Error: 2.7666 mm
• Convergence Time: 159 s
Fig. 14. Calibration Graph for Automatic Experiment
5.4. Virtual Point Set
Since only four physical camera rigs are available, testing
scalability to larger networks is impossible in an automatic
experiment and difficult to control using the manual methods. Instead,
controlled virtual point sets are supplied to the same calibration
algorithm implementation to test the scalability metric.
5.4.1. Procedure
Point sets are generated for 5, 10, 15, 20, and 25 nodes. The total
outgoing bandwidth in kilobytes, final size of the matching database
in features, and total number of coarse and fine registration
executions are recorded.
Fig. 15. Camera Deployment for Automatic Experiment
Fig. 16. Pose Visualization for Automatic Experiment
5.4.2. Results
As expected, total bandwidth usage per node increases approximately
linearly in relation to the number of nodes in the network
(Figure 17). This affects different networks in different ways. In a
network where the physical medium is shared by all nodes (the
worst-case scenario), the total network bandwidth usage is the
relevant factor. In that case, the bandwidth usage increases
nonlinearly, potentially at up to $N^3$. However, many routing methods
used in sensor networks are much more efficient and therefore mitigate
this effect.
The number of features stored at each node increases approximately
linearly in relation to the number of nodes (Figure 18). Features are
very small data (a series of $f$ 3-tuples, an identifier, and a
geometric descriptor value), but when scaling to extremely large
networks it must be ensured that adequate storage is provided at each
node for these features.
The number of coarse registration operations performed at each node
increases approximately linearly in relation to the number of nodes
(Figure 19); as expected, this is proportional to the number of
features stored. If processing throughput is the limiting factor, this
increase will cause the convergence time to increase linearly with the
number of nodes.
Fig. 17. Bandwidth Usage in N (Average and Maximum)
Fig. 18. Node-Local Storage in N (Average and Maximum)
Fig. 19. Coarse Registration Processing in N (Average and Maximum)
Since, in this network, the number of vision graph edges
per node does not generally increase as its total number of
nodes increases, the number of ﬁne registrations per node is
approximately constant.
6. CONCLUSIONS
A calibration method for distributed smart stereo camera networks has
been developed which converges well, provides accurate pairwise
orientation, and scales well to large networks. This provides a base
upon which to build a full 3D visual sensor network providing
primitive data-centric queries, upon which in turn a variety of
high-level applications can be developed.
Currently, the algorithm makes it possible for smart stereo camera
devices to self-localize and self-orient relative to one another in a
distributed fashion, allowing for various subsequent stages of
realization for a variety of applications. The immediate opportunity
is to provide a generalized framework for building these solutions,
which would rest on the underlying assumption that the network is
accurately calibrated and can perform 3D reconstruction across
multiple views.
The major implementation drawback is the instability of interest point
detection in the general case; at present, it is necessary to control
the scene somewhat by adding one or more calibration targets for
convergence to occur reliably. Improving this situation is an
important avenue for future work.
7. REFERENCES
[1] A. Mavrinac, “Feature-Based Calibration of Distributed Smart
Stereo Camera Networks,” M.A.Sc. Thesis, University of Windsor, 2008.
[2] M. Akdere, U. Cetintemel, D. Crispell, J. Jannotti, J. Mao, and
G. Taubin, “Data-Centric Visual Sensor Networks for 3D Sensing,” in
Proc. 2nd Intl. Conf. on Geosensor Networks, 2006.
[3] J. Jannotti and J. Mao, “Distributed Calibration of Smart
Cameras,” in Proc. Intl. Workshop on Distributed Smart Cameras,
pp. 55–61, 2006.
[4] D. Devarajan and R. J. Radke, “Distributed Metric Calibration of
Large Camera Networks,” in Proc. 1st Workshop on Broadband Advanced
Sensor Networks, 2004.
[5] W. E. Mantzel, H. Choi, and R. G. Baraniuk, “Distributed Camera
Network Localization,” in Proc. 38th Asilomar Conf. on Signals,
Systems and Computers, 2004.
[6] S. Funiak, C. Guestrin, M. Paskin, and R. Sukthankar, “Distributed
Localization of Networked Cameras,” in Proc. 5th Intl. Conf. on
Information Processing in Sensor Networks, pp. 34–42, 2006.
[7] C. Beall and H. Qi, “Distributed Self-Deployment in Visual Sensor
Networks,” in Proc. Intl. Conf. on Control, Automation, Robotics and
Vision, pp. 1–6, 2006.
[8] C. J. Taylor and B. Shirmohammadi, “Self Localizing Smart Camera
Networks and their Applications to 3D Modeling,” in Proc. Intl.
Workshop on Distributed Smart Cameras, pp. 46–50, 2006.
[9] J. Salvi, C. Matabosch, D. Fofi, and J. Forest, “A Review of
Recent Range Image Registration Methods with Accuracy Evaluation,”
Image and Vision Computing, vol. 25, no. 5, pp. 578–596, 2007.
[10] C.-S. Chen, Y.-P. Hung, and J.-B. Cheng, “RANSAC-Based DARCES: A
New Approach to Fast Automatic Registration of Partially Overlapping
Range Images,” IEEE Trans. on Pattern Analysis and Machine
Intelligence, vol. 21, no. 11, pp. 1229–1234, 1999.
[11] P. J. Besl and N. D. McKay, “A Method for Registration of 3-D
Shapes,” IEEE Trans. on Pattern Analysis and Machine Intelligence,
vol. 14, no. 2, pp. 239–256, 1992.
[12] D. Chetverikov, D. Svirko, D. Stepanov, and P. Krsek, “The
Trimmed Iterative Closest Point Algorithm,” in Proc. Intl. Conf. on
Pattern Recognition, pp. 545–548, 2002.
[13] A. Baumberg, “Reliable Feature Matching Across Widely Separated
Views,” in Proc. IEEE Conf. on Computer Vision and Pattern
Recognition, pp. 1774–1781, 2000.
[14] E. Rosten and T. Drummond, “Fusing Points and Lines for High
Performance Tracking,” in Proc. 10th IEEE Intl. Conf. on Computer
Vision, pp. 1508–1511, 2005.
[15] E. Rosten and T. Drummond, “Machine Learning for High-Speed
Corner Detection,” in Proc. 9th European Conf. on Computer Vision,
pp. 430–443, 2006.
[16] H. C. Longuet-Higgins, “A Computer Algorithm for Reconstructing a
Scene from Two Projections,” Nature, vol. 293, pp. 133–135, 1981.
[17] M. Raynal, Distributed Algorithms and Protocols, John Wiley &
Sons, 1988.
[18] O. Faugeras, Three-Dimensional Computer Vision: A Geometric
Viewpoint, The MIT Press, 1993.
[19] Y. Ma, S. Soatto, J. Košecká, and S. S. Sastry, An Invitation to
3-D Vision: From Images to Geometric Models, Springer-Verlag, 2004.