
Camera Calibration Correction in Shape from Inconsistent Silhouette

Amy Tabb 1,2 and Johnny Park 2

amy.tabb@ars.usda.gov    jpark@purdue.edu

2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, Washington, May 26-30, 2015

Abstract— The use of shape from silhouette for reconstruction tasks is plagued by two types of real-world errors: camera calibration error and silhouette segmentation error. When either error is present, we call the problem the Shape from Inconsistent Silhouette (SfIS) problem. In this paper, we show how small camera calibration error can be corrected when using a previously-published SfIS technique to generate a reconstruction, by means of an Iterative Closest Point (ICP) approach. We give formulations under two scenarios: in the first, only the external camera calibration parameters, rotation and translation, need to be corrected for each camera; in the second, both internal and external parameters need to be corrected. We formulate the problem as a 2D-3D ICP problem and find approximate solutions using a nonlinear minimization algorithm, the Levenberg-Marquardt method. We demonstrate the ability of our algorithm to create more representative reconstructions of both synthetic and real datasets of thin objects as compared to uncorrected datasets.

I. INTRODUCTION

Camera calibration error greatly affects the accuracy of intersection-based Shape from Silhouette reconstructions such as the visual hull (VH) [1], particularly when the object to be reconstructed is thin. When silhouette segmentation error and camera calibration error are assumed to be present, we call the task of reconstructing an object's shape given these types of error the Shape from Inconsistent Silhouette (SfIS) problem.

This paper deals with the problem of correcting small camera calibration error in a Shape from Inconsistent Silhouette context, meaning that not only is calibration error present, but silhouette segmentation error is present as well. We use the results of this work for our current application of reconstructing leafless trees in laboratory and field conditions, necessary for extracting tree features for phenotyping or robotic pruning. Consequently, we do not make any a priori assumptions about the object's shape, as there is significant biological diversity. However, the trees we use are trained such that there are one to four main branches with thin branches emanating from those larger branches. In addition, because of space constraints in the orchard rows, the silhouettes of the tree are partial or truncated silhouettes, as in Fig. 1. While these details of the application greatly influenced our approach to camera calibration correction for SfIS, the method described in this work could be used for other, smoother and smaller objects as well.

Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture. USDA is an equal opportunity provider and employer. The work in this document was partially supported by Pennsylvania Department of Agriculture grant "Improving the Fruit Quality of Pennsylvania Apples with Precision Pruning" (404-57, 815U 4510).

1 USDA-ARS-AFRS, Kearneysville, West Virginia, USA
2 Purdue University, West Lafayette, Indiana, USA

Fig. 1. A partial silhouette of a tree with segmentation error

The recent work on correcting camera calibration parameters from silhouettes can be divided into three rough groups. The first concerns the calibration of cameras from silhouettes assuming circular motion, such as that of an object on a turntable ([2], [3], [4], [5], and [6]). The second group considers general camera motion ([7], [8], [9], [10]), while the third calibrates camera networks using sequences of silhouettes, usually of humans moving in the environment ([11] and [12]).

Within these groups are differing techniques for refining camera calibration given initial calibration parameters. Most utilize epipolar constraints, and differ in the manner that frontier points are used. Åström et al. [7] use generalized epipolar constraints, requiring accurate silhouettes and frontier point localization. Boyer [8] uses a pairwise cone tangency constraint. In Huang and Lai [3], circular motion is estimated by exploiting the homography that relates silhouettes and epipoles. Mendonça et al. [4] estimate circular motion using a sequential approach in which epipoles are estimated last. Furukawa et al. [9] use a RANSAC strategy to estimate camera calibration parameters and frontier points for orthographic cameras. Sinha et al. [12] get around the problem of few epipolar tangents by using video sequences of humans moving in the environment,


and is somewhat tolerant of segmentation error. Using a different approach but a similar application, Zhang et al. [11] use the centroids of humans moving in the environment to create correspondences, and calibration is done using a structure-from-motion approach. Using a calibration generated by circular motion as an initializer, Wong and Cipolla [6] use a manually-aligned initialization for more general motion. Yamazoe et al. [10] minimize the distance between frontier points projected onto images and the silhouettes using bundle adjustment. Finally, Zhang and Wong [5] estimate the internal and external parameters of a circular turntable sequence using epipolar tangents.

The use of epipolar constraints makes several assumptions about the characteristics of the datasets: 1) that the silhouettes are generally accurate, and 2) that the silhouettes capture the whole object, meaning that the image silhouettes are not truncated or partial silhouettes of the full object. Hernández et al. [2] developed a circular motion calibration system for silhouettes under the assumption that silhouettes may be truncated or partial, using a silhouette consistency measure. Furukawa and Ponce [13] create a more accurate and efficient reconstruction pipeline by using a hierarchical process to generate camera and reconstruction parameters; scaled-down images are used first, and as the algorithm progresses larger and larger images are used to refine parameters using a structure-from-motion approach.

In this work, we propose a camera calibration correction procedure that is not dependent on epipolar constraints. As mentioned previously, the use of epipolar constraints assumes that silhouettes are relatively accurate and reflect the complete object (i.e. not truncated or partial). Our approach is to minimize the projection error between the reconstruction and the silhouettes, using a three-step procedure. In the first stage, an initial reconstruction is estimated using a reconstruction method for SfIS. Then, the SfIS reconstruction is aligned to the input silhouettes using an Iterative Closest Point (ICP) approach. The resulting 3D-2D ICP optimization problem is non-linear, so we use a Levenberg-Marquardt method for finding an approximate solution. Then a new SfIS reconstruction is found using the updated camera calibration parameters. We use the SfIS reconstruction method described in [14]. The camera calibration correction does not depend on the choice of the reconstruction method, though, so other SfIS reconstruction methods may be substituted, such as [15], [16], [17], [18], and [19].

Our work is most similar to Hernández et al. [2] as they use a silhouette consistency measure and allow partial and truncated silhouettes. However, in [2] the silhouettes are assumed to be relatively accurate and the motion is assumed to be circular, while in our work we deal with general camera motion and silhouettes with segmentation error. Our method also has some similarities to Wong and Cipolla [6] in that the reconstruction image and silhouette image are aligned; however, they use a manual method to generate an initial calibration, and their cost function is dependent on the presence of epipolar tangencies. Finally, the work of Furukawa and Ponce [13] is similar to ours in that the estimated reconstruction is projected to each image and matches are found, though their work estimates a new reconstruction after every round of matches, which we avoided because of the computational expense involved in the SfIS problem.

In summary, our contributions to the state of the art are:

1) A method for correcting camera calibration error in a SfIS context, under two different scenarios, with partial silhouettes and general camera motion.
2) A description of a 2D-3D ICP algorithm for alignment.
3) More representative reconstructions of complicated, thin objects such as trees.

II. Preliminaries

Our notation for camera calibration parameters closely follows that of Hartley and Zisserman in [20].

We assume that the camera calibration parameters for n_c cameras are represented by the matrices K ∈ R^{3×3}, R ∈ R^{3×3}, and t ∈ R^{3×1}, where the projection equation relating a three-dimensional point in homogeneous coordinates X ∈ R^{4×1} to a two-dimensional image point in homogeneous coordinates x ∈ R^{3×1} is:

x = K [R t] X    (1)

With x = (x0, x1, x2)^T, the image point in the x and y directions is the pair (x0/x2, x1/x2).

R can be decomposed into three Euler angles, θ_x, θ_y, and θ_z, so we represent R as a function of three angles:

R = R(θ_x, θ_y, θ_z)    (2)

A parameterization of R is necessary to preserve the orthonormality of R during the Levenberg-Marquardt minimization in Section V-D.
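As a minimal sketch of this parameterization, the following composes R from the three angles so that orthonormality holds by construction; the X-Y-Z composition order is our assumption, since the paper does not fix a convention:

```python
import numpy as np

def rotation_from_euler(tx, ty, tz):
    """Compose R(theta_x, theta_y, theta_z) from three Euler angles.
    The X-Y-Z composition order is an assumption, not from the paper."""
    cx, sx = np.cos(tx), np.sin(tx)
    cy, sy = np.cos(ty), np.sin(ty)
    cz, sz = np.cos(tz), np.sin(tz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx  # product of rotations: orthonormal for any angles

R = rotation_from_euler(0.1, -0.2, 0.3)
```

Because each factor is a rotation, R satisfies R Rᵀ = I and det R = 1 for any angle values, which is exactly why this parameterization survives unconstrained minimization.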

Furthermore, we assume that K is an upper triangular matrix of the form

K = | k0  0   k2 |
    | 0   k3  k4 |    (3)
    | 0   0   1  |

However, other forms of K such as those described in [20] can be used as well.

Finally, we assume that the initial reconstruction can be represented by closed polyhedral meshes. Our implementation is a voxel-based technique where the voxels are cubes, so the polyhedral mesh is made up of square faces that represent the boundary between the reconstructed shape and empty space. However, other reconstruction methods whose output is or could be converted to a polyhedral mesh, such as EPVH [21], can be used with the camera calibration correction procedure we describe here.

III. Camera configuration scenarios

We present camera calibration correction procedures in two scenarios:

1) Adjust the R and t matrices for each camera, while keeping K constant for each camera.
2) Adjust the R, t, and K matrices for each camera.

Which scenario to use depends on the application and one's assumptions about the fidelity of K. For instance, we have multi-camera datasets with poor camera calibration; in such a case, the second scenario would be chosen to find an appropriate alignment of the reconstruction and the silhouettes. We also have a dataset where one camera is mounted on the end-effector of a robot and K is assumed to be accurate; in this case, we choose the first scenario. Various other scenarios are possible depending on the application, and can be derived from these two basic scenarios and the framework we present in Section V-C.

When we describe the general method, we let the parameters be represented by a vector p. The size of p for the first scenario is 6 (three angles for R and three elements of t) and for the second is 10 (the 6 from the first scenario and 4 internal camera calibration parameters). Whatever the scenario used, we denote the matrix representation of p by P(p).
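The two parameter vectors can be sketched as follows; the helper builds P(p) = K[R | t] for either scenario. The Euler convention and the ordering of the entries inside p are our assumptions, not the paper's:

```python
import numpy as np

def _euler(tx, ty, tz):
    # Minimal X-Y-Z Euler composition; the convention is assumed.
    cx, sx = np.cos(tx), np.sin(tx)
    cy, sy = np.cos(ty), np.sin(ty)
    cz, sz = np.cos(tz), np.sin(tz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def P_of(p, K_fixed=None):
    """Matrix representation P(p) = K [R | t] for the two scenarios:
    len(p) == 6  -> scenario 1: three angles and three translations, K fixed;
    len(p) == 10 -> scenario 2: the 6 external parameters plus
                    (k0, k2, k3, k4) of Eq. (3)."""
    R = _euler(*p[:3])
    t = np.asarray(p[3:6], dtype=float).reshape(3, 1)
    if len(p) == 6:
        K = K_fixed
    else:
        k0, k2, k3, k4 = p[6:10]
        K = np.array([[k0, 0.0, k2], [0.0, k3, k4], [0.0, 0.0, 1.0]])
    return K @ np.hstack([R, t])  # the 3x4 projection matrix

# Project a homogeneous 3D point with scenario 1 (K held fixed).
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P = P_of([0.0, 0.0, 0.0, 0.0, 0.0, 5.0], K_fixed=K)
X = np.array([0.0, 0.0, 0.0, 1.0])
x = P @ X
u, v = x[0] / x[2], x[1] / x[2]  # image coordinates, as in Eq. (1)
```

A minimizer only ever sees the flat vector p; P_of is the bridge back to the projection of Eq. (1).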

IV. Method overview

Here, we summarize the method for correcting camera calibration and relate it to the stopping criterion we use for our ICP algorithm. First, we denote a reconstruction's shape as S. S consists of a set of faces, as mentioned in Section II.

In our previous work [14], we gave a metric for describing the degree of mismatch between the reconstruction and a set of input silhouettes I, where an individual image is denoted by I. For one image in the sequence of input silhouette images, the image of S is computed using the camera calibration parameters of that input image; this image is I_S. Then the silhouette inconsistency error (SIE) of the reconstruction and the input image is SIE = Σ_{∀q} |I(q) − I_S(q)|, where q is a pixel index.
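The SIE count translates directly into code; this minimal sketch assumes both silhouettes are binary (0/1) arrays of equal size:

```python
import numpy as np

def sie(I, I_S):
    """Silhouette inconsistency error: the number of pixels at which the
    input silhouette I and the reconstruction's silhouette I_S disagree.
    Both images are assumed binary (0 or 1) and the same size."""
    return int(np.sum(np.abs(I.astype(int) - I_S.astype(int))))

I = np.array([[1, 1, 0], [0, 1, 0]])
I_S = np.array([[1, 0, 0], [0, 1, 1]])
mismatches = sie(I, I_S)  # two pixels differ between the two silhouettes
```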

Given these preliminaries about SIE, the general algorithm is outlined in Algorithm 1. The SIE error is computed using the current camera calibration parameters; if some better parameters can be found using the 2D-3D ICP algorithm, then these parameters are accepted as the current parameters. We define "better" as parameters that result in a lower SIE value. This process is repeated for each image in the input silhouette sequence.

An illustration of the algorithm's progress on alignment is shown in Fig. 2.

Algorithm 1 Cali-Correction(I, S, p^(0))
 1: n is the maximum number of iterations
 2: I_S^(0) is the image of S using P(p^(0))
 3: SIE^(0) = Σ_{∀q} |I(q) − I_S^(0)(q)|
 4: p* = p^(0)
 5: for all i = 0 to n do
 6:   Align I_S^(i) to I using the 2D-3D ICP; result is p^(i+1)
 7:   Compute I_S^(i+1) using p^(i+1)
 8:   SIE^(i+1) = Σ_{∀q} |I(q) − I_S^(i+1)(q)|
 9:   if SIE^(i+1) ≤ SIE^(i) then
10:     p* = p^(i+1)
11:   else
12:     break
13:   end if
14: end for
15: return p*
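The control flow of Algorithm 1 can be sketched as follows; `project`, `icp_align`, and `sie` are hypothetical stand-ins for the rendering, 2D-3D ICP, and mismatch-counting components described in the text:

```python
def cali_correction(I, S, p0, n, project, icp_align, sie):
    """Sketch of Algorithm 1. `project(S, p)` renders the image I_S of
    shape S under parameters p, `icp_align` performs one 2D-3D ICP pass,
    and `sie` counts mismatching pixels; all three are stand-ins."""
    p_star = p = p0
    err = sie(I, project(S, p))
    for _ in range(n):
        p_new = icp_align(I, project(S, p), p)  # line 6: 2D-3D ICP
        err_new = sie(I, project(S, p_new))     # lines 7-8: recompute SIE
        if err_new <= err:                      # line 9: accept improvement
            p_star = p = p_new
            err = err_new
        else:
            break                               # line 12: stop when SIE worsens
    return p_star
```

The loop only ever keeps parameters whose SIE does not increase, which is the stopping criterion that later prevents the calibration from drifting away from the truth (Section IX).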

V. 2D-3D ICP

This section details how we adapt the ICP algorithm, which is usually used for 2D-2D alignments or 3D-3D alignments, to the case of a 2D-3D alignment. While one option was to perform a standard 2D-2D alignment assuming a planar projective transform, and then to interpret those results as a camera calibration correction, this approach ignores the dimensionality of the original problem. There are various efficient variants of ICP available for aligning 3D meshes [22]. We adapt a basic form given in [22], which consists of a sequence of select, match, and minimize steps.

A. Selection of 3D points

We now give more details about the projection of S to I_S, and how I_S is used in the 2D-3D ICP algorithm. Every face in S is made up of a sequence of three-dimensional points X. We project each point to the camera specified by P(p), and repeat this process for all faces in S; by filling in the convex polygon created with the sequence of projected points for each face, we can generate I_S. From there, we determine which points X, when projected, fall inside the silhouette of I_S and which fall outside. In Fig. 3, we show the silhouette boundary of I_S for some large voxels as medium gray lines; the projected points that are on the silhouette boundary are represented by white circles, and those inside the silhouette boundary are represented as green circles.

The 3D points that we use for ICP are those points that lie on the silhouette boundary of I_S, in other words, the points whose projections produced the white circles in Fig. 3; we denote this set of points for camera c as X_Sc. We use all of the points in X_Sc to generate matches.

B. Matching 3D points with 2D image coordinates

(a) The first six iterations of the ICP algorithm where R and t are allowed to change and K is kept fixed. (b) The first six iterations of the ICP algorithm where R, t, and K are all allowed to change; this run of ICP is done after the completion of a round when only R and t are allowed to change.

Fig. 2. [Best viewed in color.] Illustration of the progression of the camera calibration correction algorithm. Original silhouette image pixels I are medium gray; the silhouette boundary of the reconstruction image I_S is in green. The top row represents the alignment resulting from the first six iterations of the 2D-3D algorithm where R and t are adjusted. Once that process terminates as a result of the stopping criterion related to SIE, the 2D-3D algorithm is run again, the difference being that R, t, and K are adjusted. The second row represents the first six iterations of the second process, once the R, t-only adjustment has been completed. More details of particular experimental choices can be found in Section VI.

Fig. 3. [Best viewed in color.] This figure is an illustration of how I_S is generated from S. The points composing each face in S are projected using a current estimate of the camera calibration parameters; the projected face is filled to generate a black and white image. The silhouette boundary is shown in this figure as medium gray lines, while projected points inside the boundary are shown as green circles. Points on the silhouette boundary are represented as white circles; the 3D points generating the white circles form the set X_Sc for a camera c.

From the original image silhouettes (I) and the silhouette of the reconstruction (I_S) for camera c, we compute the surface normals for each pixel of the silhouette. Since we use square faces for S, depending on the voxel size the projection of the reconstruction can have right angles and other severe changes in normal vector direction, particularly for large voxels, as shown in Fig. 3. To reduce this effect, we smooth the normals of the projected reconstruction silhouettes. Given the k-th silhouette pixel in a contour, the smoothed normal n'_k at position k is given by simple averaging: n'_k = (n_{k-1} + n_k + n_{k+1})/3. This smoothing process is performed twice.

Given that the projection of a 3D point X ∈ X_Sc is x = P_i X, we search for the closest original image silhouette point to x where the angle between normals is less than 2π/3. We represent this image silhouette point as φ(X).
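The smoothing step above can be sketched as follows for a contour of per-pixel normals; treating the contour as closed (indices wrap around) and renormalizing to unit length after each pass are our assumptions:

```python
import numpy as np

def smooth_contour_normals(normals, passes=2):
    """Smooth contour normals by neighbour averaging,
    n'_k = (n_{k-1} + n_k + n_{k+1}) / 3, applied twice as in the text.
    Wrap-around indexing and per-pass renormalization are assumptions."""
    n = np.asarray(normals, dtype=float)   # shape (num_pixels, 2)
    for _ in range(passes):
        n = (np.roll(n, 1, axis=0) + n + np.roll(n, -1, axis=0)) / 3.0
        n /= np.linalg.norm(n, axis=1, keepdims=True)  # keep unit length
    return n
```

Averaging with the two contour neighbours damps the right-angle jumps in normal direction that square voxel faces produce, without moving the contour pixels themselves.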

Many ICP algorithms reject a percentage of the worst matches. Our approach to SfIS has been to assume that error exists, but not to specify the quantity or source of that error. As a result, we are reluctant to use a pre-set threshold for rejecting matches. For instance, sometimes the matches are quite accurate, and discarding some of them according to a set percentage would result in discarding good information. On the other hand, some reconstructions S are quite noisy, so the rejection percentage should be large. To avoid committing to a rejection threshold ahead of time, we implemented the following scheme.

First, we perform the Levenberg-Marquardt minimization for the given matches without rejecting any matches. If the resulting camera parameters p give a lower value of SIE, we accept those parameters as p* = p and stop. If not, we reject the worst 1% of matches and run the minimization again. This process continues until either parameters resulting in a smaller value of SIE are found, or the number of iterations is exceeded (typically set at 10 in our experiments).
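This progressive rejection scheme can be sketched as follows, with `minimize` and `sie_of` as hypothetical stand-ins for the Levenberg-Marquardt step and the SIE evaluation:

```python
def reject_and_retry(matches, minimize, sie_of, sie_current, max_rounds=10):
    """Sketch of the progressive rejection scheme. `matches` is a list of
    (point, residual) pairs; `minimize` runs the minimization on a match
    set and returns parameters; `sie_of` evaluates the SIE for those
    parameters. Both callables are stand-ins for the paper's components."""
    matches = sorted(matches, key=lambda m: m[1])  # best matches first
    for _ in range(max_rounds):
        p = minimize(matches)
        if sie_of(p) < sie_current:
            return p                               # improving parameters found
        cut = max(1, len(matches) // 100)          # drop the worst 1%
        matches = matches[:-cut]
    return None                                    # keep the current parameters
```

Because rejection only happens after a failed attempt, accurate match sets are used in full, while noisy ones are trimmed just as far as needed.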

C. Cost function formulation

Once the matches φ(X) are found, we seek camera calibration parameters that minimize the distance between the projections of the points X and the matched silhouette pixels φ(X):

min_p Σ_i ||P(p) X_i − φ_i(X_i)||²    (4)

We can represent Eq. 4 as follows, where P(p)_1^T represents the first row of P(p), P(p)_2^T the second row, and so on, as in Hartley and Zisserman [20], and where φ_i(X_i)_1 is the x component of the matching pixel to X_i and φ_i(X_i)_2 is the y component of the matching pixel to X_i:

p̂ = arg min_p Σ_i [ ( P(p)_1^T X_i / P(p)_3^T X_i − φ_i(X_i)_1 )² + ( P(p)_2^T X_i / P(p)_3^T X_i − φ_i(X_i)_2 )² ]    (5)

This is a nonlinear least squares problem; we can rearrange it into the standard form with a residual vector as follows:

p̂ = arg min_p Σ_{j=0}^{2|X_S|−1} r_j(P(p))²    (6)

where

r_{2i}(P(p)) = P(p)_1^T X_i / P(p)_3^T X_i − φ_i(X_i)_1    (7)

r_{2i+1}(P(p)) = P(p)_2^T X_i / P(p)_3^T X_i − φ_i(X_i)_2    (8)

for all X_i ∈ X_S.

While many other ICP algorithms use a per-match weighting, we instead use a constant weight for each match.
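The residual vector of Eqs. (6)-(8) translates directly into code; this sketch assumes P is a 3×4 array, each X_i a homogeneous 4-vector, and each match a pixel pair (φ₁, φ₂):

```python
import numpy as np

def residuals(P, Xs, matches):
    """Residual vector of Eqs. (6)-(8): for each boundary point X_i, the
    x and y differences between its projection under P and the matched
    silhouette pixel phi_i(X_i)."""
    r = np.empty(2 * len(Xs))
    for i, (X, phi) in enumerate(zip(Xs, matches)):
        x = P @ X                             # homogeneous projection
        r[2 * i] = x[0] / x[2] - phi[0]       # Eq. (7)
        r[2 * i + 1] = x[1] / x[2] - phi[1]   # Eq. (8)
    return r
```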

D. Levenberg-Marquardt modification for Newton's method of nonlinear least squares

To find an approximate solution to Eq. 6, we use the Levenberg-Marquardt modification for nonlinear least squares. We quickly summarize the method here; more in-depth treatments can be found in optimization texts such as [23].

We compute the Jacobian J(P(p)), which is a 2|X_S| × |p| matrix. Each element of the matrix is J_ij = ∂r_i/∂p_j.

Given the Jacobian and the residual functions, the update formula for a new p^(k+1) is:

p^(k+1) = p^(k) − ( J(P(p))^T J(P(p)) + μ_k I )^{−1} J(P(p))^T r(P(p))    (9)

where I is an identity matrix of size |p| × |p| and μ_k ≥ 0 is the damping parameter, chosen according to the standard practice of starting with a small μ_k and increasing it until the direction is a descent direction. We let p^(0) be the initial camera calibration parameters.
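A single update of Eq. (9) can be sketched as follows; the schedule for choosing μ_k is omitted:

```python
import numpy as np

def lm_step(p, residual_fn, jacobian_fn, mu):
    """One Levenberg-Marquardt update, Eq. (9):
    p_new = p - (J^T J + mu * I)^(-1) J^T r."""
    J = jacobian_fn(p)
    r = residual_fn(p)
    H = J.T @ J + mu * np.eye(len(p))  # damped normal equations
    return p - np.linalg.solve(H, J.T @ r)
```

On a toy linear problem with residual r(p) = p − p_target (so J = I), a step with μ = 0 jumps straight to the minimizer, while larger μ shortens the step toward a gradient-descent direction.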

VI. Implementation details

As mentioned previously, we use the method in [14] for generating SfIS reconstructions. In that method, the SfIS reconstruction problem is cast as a pseudo-Boolean minimization problem with a non-submodular function. Given that the problem is NP-hard, a local minimum is found with a local search method.

In our implementation of the method, the projection information for each voxel is computed before local search, to avoid recomputing projection information over multiple iterations. However, this consumes a great deal of memory. We therefore altered the original algorithm to use a hierarchical approach so that memory requirements are satisfied even on very large datasets with over 100 million voxels.

First, we choose a large voxel size and perform the local minimum search. Then, starting with a reconstruction at the large voxel size, each voxel is divided into 8 child voxels as in an octree; the child voxels' label is the same as their parent voxel's label. For the child voxels, projection information is computed for voxels within a specified distance of the border between empty and occupied voxels, and another reconstruction is generated with the child voxels. This splitting continues for as many iterations as the user desires.
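The octree-style split can be sketched as follows; the voxel representation (minimum corner, edge length, label) is our assumption:

```python
def split_voxel(voxel):
    """Split one voxel into its 8 children, octree-style; each child
    inherits the parent's occupied/empty label. A voxel is modelled as
    ((x, y, z), size, label) with (x, y, z) its minimum corner."""
    (x, y, z), size, label = voxel
    h = size / 2.0
    return [((x + dx * h, y + dy * h, z + dz * h), h, label)
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]

# A 2 mm occupied voxel splits into eight 1 mm occupied children.
children = split_voxel(((0.0, 0.0, 0.0), 2.0, 1))
```

Since children inherit labels, the finer local search only needs to revisit voxels near the occupied/empty border, which is what keeps memory bounded.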

In the results shown here, we let the number of divisions of voxel size be k. To generate uncorrected results, we proceed through all of the divisions, to the k-th division and local minimum search. For corrected results, we use the reconstruction generated at the (k−1)-th division, correct the camera calibration parameters, and then proceed to the k-th division to do local minimum search with the corrected parameters.

We handle the two correction scenarios, in which only external parameters are corrected or both internal and external parameters are corrected, in the following way. We correct the external parameters first. Then, if internal parameters are to be corrected as well, we correct external and internal parameters using the solution generated by correcting only the external parameters as an initial solution.

Finally, for the implementation of the Levenberg-Marquardt method, we used the software package levmar [24]. All of the results shown here were generated on a workstation with one 8-core processor and 72 GB of RAM.

VII. Datasets

To validate our camera calibration algorithm, we demonstrate results on one simulated and six real datasets, a total of seven datasets. Descriptions of the two types of datasets follow. All 3D results are visualized with Meshlab [25].

A. Simulated data

We generated a simulated dataset, called the Model Tree dataset, composed of 32 cameras and using the tree model shown in Figure 5a. Camera calibration error was added to the dataset by altering one element of the translation component of each camera by e ∈ {−20, −19, ..., 19, 20}. The element altered (x, y, or z) and the value of e were determined randomly, with all possible values of e, and each of the three axes of t, having a uniform probability of being selected. An example of a silhouette image from this dataset is shown in Figure 4, as well as the placement of the cameras with respect to the model. All of the silhouette images for this set consist of partial silhouettes. The reconstruction voxel size is 2.5 mm.

Fig. 4. [Best viewed in color.] Illustration of the synthetic dataset Model Tree used to validate the algorithm. In 4a, one of the 32 silhouette images of the synthetic tree. In 4b, the synthetic tree is shown with the 32 cameras, which are placed on two planes.

To assess the error of our reconstruction, we also generated a voxel version of the ground truth model to compare to the reconstructions with and without the correction. The voxel size of the tree matches that of the reconstruction, so false positive and false negative rates can easily be computed.

B. Real data

We generated real datasets in our laboratory using two different camera configurations. In the first configuration, there are 13 inexpensive webcameras (image size 1280 × 960 pixels) mounted to the side of and above an object. The external camera calibration was estimated using the camera calibration procedure of Zhang [26], with a custom 2-plane calibration object. The datasets using this configuration are called Branch, Coil, and Coils and Cable. The voxel size is 1.5 mm.

The second configuration consists of one camera (image size 2456 × 2058 pixels) mounted on the end effector of a robot; the robot moves to 38 different positions and acquires images of a larger object than considered in the first configuration. The external camera parameters are estimated using a hand-eye, robot-world calibration. The datasets using this configuration are Tree 1, Tree 2, and Pole and Coil. The voxel size is 3 mm.

For both of these dataset groups, silhouette images were generated using a background subtraction method.

VIII. Results and Discussion

We first show the effect of camera calibration correction on the SIE values, as shown in Table I. Recall that the value of SIE roughly indicates the number of pixels that do not match between the input silhouette and the image of the reconstructed shape. Also, note that since we use a voxel-based method, the SIE is never zero except in the case of perfect camera calibration, perfect segmentation, and infinitely small voxels.

Given all of these preliminaries, we can see from the table that correcting only the external parameters results in a large decrease in the value of SIE when compared to the uncorrected results. From examining the table, for these datasets a reduction of 33% or more in SIE values can be gained by the camera calibration correction scenarios.

We display a selection of the reconstructions of the seven datasets using no correction, external parameter correction, and external and internal parameter correction in Figures 5-8. We were not able to display the reconstructions for all datasets because of space limitations.

We discuss the synthetic dataset first, in Figure 5. We can see in this figure that the uncorrected SfIS reconstruction in Figure 5b is quite noisy, and the detail of small branches is largely lost. However, in the reconstructions using corrected parameters (Figures 5c and 5d), the reconstruction more faithfully represents the ground truth model, though there are some noisy regions remaining for small branches. There is very little difference between the reconstructions using the two types of corrections.

In Table II, we show classification rates of the reconstructions as compared to a voxel version of the ground truth model. In this table, 'TP' is true positive (in our context, a positive is an occupied voxel), 'FP' is false positive, 'TN' is true negative, and 'FN' is false negative. This table shows that the true positive rate increases by 22% when using either one of the corrections, while the true negative rate remains largely the same. The opposite behavior can be observed with the false negative and false positive rates.

The Branch, Coil, and Coils and Cable dataset reconstructions behave similarly to the Model Tree dataset (the Coils and Cable reconstruction can be seen in Figure 6): there is more noise in the uncorrected reconstruction than in the corrected reconstruction. Since the objects in these datasets are very thin, the uncorrected reconstructions contain breaks in the surface where it is continuous in the original object. The corrected reconstructions tend to repair these breaks and reduce noisy regions as well. This is especially true for the Coils and Cable dataset, where the small-diameter wire, broken into many pieces in the uncorrected reconstruction, is connected in the corrected reconstruction (Figure 6). Figure 7 shows the looped thin cable (on the left side of the coil in Figure 6).

The use of external parameter correction produces reconstructions that are more representative than reconstructions generated without the correction. For this dataset, the best reconstructions were generated by using the correction of both internal and external parameters.

(a) Ground truth. (b) Without any camera calibration correction. (c) With correction of external parameters. (d) With correction of internal and external parameters.

Fig. 5. The reconstruction of the Model Tree dataset

The second group of real datasets, Tree 1, Tree 2, and Pole and Coil, were generated with one camera mounted on the end-effector of a robot. The first tree, in the Tree 1 dataset, is considered to have a 'weeping' form, and it has many small branches, whereas Tree 2 has a more upright form (detail in Figure 8). Finally, Pole and Coil consists of a metal pole with a coil attached using a zip tie; there are four thumb screws along the pole's length. For all three of these datasets, there is a great deal of improvement in the reconstruction's ability to represent the original object when the camera calibration parameters are corrected. There is little difference between the two types of correction, but using a correction of the external and internal parameters seems to produce the best results, with more small details reconstructed. For instance, in the Pole and Coil dataset, the narrow plastic tie which holds the coil to the pole is reconstructed when all parameters are corrected and is not when only the external parameters are corrected (not shown).

TABLE I
The silhouette inconsistency error SIE of the seven datasets

Dataset          | SIE, no correction | SIE with R, t correction | SIE with R, t, K correction
Model Tree       | 2,044,618          | 946,104                  | 937,490
Branch           | 171,346            | 124,351                  | 109,984
Coil             | 182,176            | 104,806                  | 99,195
Coils and Cable  | 331,743            | 187,255                  | 161,649
Tree 1           | 3,001,539          | 2,036,888                | 1,739,336
Tree 2           | 1,969,238          | 1,149,083                | 1,029,106
Pole and Coil    | 711,988            | 363,249                  | 302,940

(a) Without any camera calibration correction. (b) With correction of external parameters. (c) With correction of external and internal parameters.

Fig. 6. The reconstruction of the Coils and Cable dataset

(a) Without any camera calibration correction. (b) With correction of external parameters. (c) With correction of external and internal parameters.

Fig. 7. Detail of the Coils and Cable dataset's reconstruction

(a) Without any camera calibration correction. (b) With correction of external parameters. (c) With correction of external and internal parameters.

Fig. 8. Detail of the Tree 2 dataset's reconstruction


TABLE II
Reconstruction accuracy as compared to a voxelated ground truth of the Model Tree dataset

Reconstruction     | TP       | TN       | FP          | FN
No correction      | 0.599462 | 0.999496 | 0.000504248 | 0.400538
R, t correction    | 0.821213 | 0.999587 | 0.000412875 | 0.178787
R, t, K correction | 0.824469 | 0.999476 | 0.000524409 | 0.175531

IX. Conclusion

We have presented a method for camera calibration correction, under two different scenarios, in a Shape from Inconsistent Silhouette context. We have shown through the use of different objects and scenarios that camera calibration correction can improve the reconstructions of thin objects given an initial reconstruction.

A conclusion we came to after examining the results is that there is little to lose by performing the correction with both external and internal camera parameters. When the internal camera calibration parameters are somewhat poor, performing the full correction results in more representative reconstructions (datasets Branch, Coil, and Coils and Cable). On the other hand, the similarity of the reconstructions using the two types of corrections for the synthetic datasets has shown that performing the full correction will not degrade the reconstruction, even when it is known that the internal parameters were not perturbed by error. The reason for this is that we accept updated camera calibration parameters only when they yield a lower SIE score than the current parameters. This requirement prevents the calibration from deviating even further from the true calibration.
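The safeguard described above can be sketched in a few lines. This is an illustration of the accept-only-if-SIE-decreases rule, not the paper's implementation; the function and parameter names are assumptions.

```python
def accept_if_lower_sie(current_params, proposed_params, sie):
    """Accept a proposed calibration update only if it strictly lowers
    the silhouette inconsistency error (SIE).

    sie: a function mapping a set of calibration parameters to its SIE score.
    """
    if sie(proposed_params) < sie(current_params):
        return proposed_params  # the update improves silhouette consistency
    return current_params       # otherwise keep the current calibration
```

Because ties and increases both keep the current parameters, the SIE score is monotonically non-increasing over iterations, which is why the correction cannot drive the calibration further from the truth as measured by this score.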

References

[1] A. Laurentini, "The visual hull concept for silhouette-based image understanding," IEEE Trans. Pattern Anal. Mach. Intell., vol. 16, no. 2, pp. 150–162, 1994.
[2] C. Hernández, F. Schmitt, and R. Cipolla, "Silhouette coherence for camera calibration under circular motion," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 2, pp. 343–349, 2007.
[3] P.-H. Huang and S.-H. Lai, "Silhouette-based camera calibration from sparse views under circular motion," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), June 2008, pp. 1–8.
[4] P. Mendonca, K. Y. K. Wong, and R. Cipolla, "Epipolar geometry from profiles under circular motion," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 604–616, June 2001.
[5] H. Zhang and K. Y. K. Wong, "Self-calibration of turntable sequences from silhouettes," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 1, pp. 5–14, Jan. 2009.
[6] K.-Y. Wong and R. Cipolla, "Reconstruction of sculpture from its profiles with unknown camera positions," IEEE Trans. Image Processing, vol. 13, no. 3, pp. 381–389, March 2004.
[7] K. Åström, R. Cipolla, and P. Giblin, "Generalised epipolar constraints," International Journal of Computer Vision, vol. 33, no. 1, pp. 51–72, 1999.
[8] E. Boyer, "On using silhouettes for camera calibration," in Proc. 7th Asian Conference on Computer Vision (ACCV '06), ser. Lecture Notes in Computer Science, vol. 3851, P. J. Narayanan, S. K. Nayar, and H.-Y. Shum, Eds. Hyderabad, India: Springer-Verlag, 2006, pp. 1–10.
[9] Y. Furukawa, A. Sethi, J. Ponce, and D. Kriegman, "Robust structure and motion from outlines of smooth curved surfaces," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 2, pp. 302–315, Feb. 2006.
[10] H. Yamazoe, A. Utsumi, and S. Abe, "Multiple camera calibration with bundled optimization using silhouette geometry constraints," in Proc. 18th International Conference on Pattern Recognition (ICPR), vol. 3, 2006, pp. 960–963.
[11] X. Zhang, Y. Zhang, X. Zhang, T. Yang, X. Tong, and H. Zhang, "A convenient multi-camera self-calibration method based on human body motion analysis," in Proc. Fifth International Conference on Image and Graphics (ICIG '09), Sept. 2009, pp. 3–8.
[12] S. Sinha, M. Pollefeys, and L. McMillan, "Camera network calibration from dynamic silhouettes," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), vol. 1, June 2004, pp. 195–202.
[13] Y. Furukawa and J. Ponce, "Accurate camera calibration from multi-view stereo and bundle adjustment," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), June 2008, pp. 1–8.
[14] A. Tabb, "Shape from silhouette probability maps: reconstruction of thin objects in the presence of silhouette extraction and calibration error," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), June 2013.
[15] G. K. Cheung, T. Kanade, J.-Y. Bouguet, and M. Holler, "A real time system for robust 3D voxel reconstruction of human motions," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), vol. 2, p. 2714, 2000.
[16] J.-L. Landabaso, M. Pardàs, and J. R. Casas, "Shape from inconsistent silhouette," Computer Vision and Image Understanding, vol. 112, no. 2, pp. 210–224, 2008.
[17] J.-S. Franco and E. Boyer, "Fusion of multi-view silhouette cues using a space occupancy grid," in Proc. Tenth IEEE International Conference on Computer Vision (ICCV), Beijing, China, 2005, pp. 1747–1753.
[18] L. Guan, J.-S. Franco, and M. Pollefeys, "3D occlusion inference from silhouette cues," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), June 2007, pp. 1–8.
[19] G. Haro and M. Pardàs, "Shape from incomplete silhouettes based on the reprojection error," Image and Vision Computing, vol. 28, no. 9, pp. 1354–1368, 2010.
[20] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press, 2004.
[21] J.-S. Franco and E. Boyer, "Exact polyhedral visual hulls," in Proc. British Machine Vision Conference (BMVC), 2003, pp. 329–338.
[22] S. Rusinkiewicz and M. Levoy, "Efficient variants of the ICP algorithm," in Proc. Third International Conference on 3-D Digital Imaging and Modeling, 2001, pp. 145–152.
[23] E. K. Chong and S. H. Zak, An Introduction to Optimization, 3rd ed. John Wiley & Sons, 2011.
[24] M. Lourakis, "levmar: Levenberg-Marquardt nonlinear least squares algorithms in C/C++," http://www.ics.forth.gr/~lourakis/levmar/, July 2004 [accessed 31 Jan. 2005].
[25] MeshLab, developed with the support of the 3D-CoForm project, http://meshlab.sourceforge.net/.
[26] Z. Zhang, "A flexible new technique for camera calibration," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 11, pp. 1330–1334, Nov. 2000.
