
Camera Calibration Correction in Shape from Inconsistent Silhouette

Amy Tabb 1,2 and Johnny Park 2

amy.tabb@ars.usda.gov    jpark@purdue.edu

2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, Washington, May 26-30, 2015

Abstract— The use of shape from silhouette for reconstruction tasks is plagued by two types of real-world errors: camera calibration error and silhouette segmentation error. When either error is present, we call the problem the Shape from Inconsistent Silhouette (SfIS) problem. In this paper, we show how small camera calibration error can be corrected when using a previously-published SfIS technique to generate a reconstruction, by means of an Iterative Closest Point (ICP) approach. We give formulations under two scenarios: in the first, only the external camera calibration parameters, rotation and translation, need to be corrected for each camera; in the second, both internal and external parameters need to be corrected. We formulate the problem as a 2D-3D ICP problem and find approximate solutions using a nonlinear minimization algorithm, the Levenberg-Marquardt method. We demonstrate the ability of our algorithm to create more representative reconstructions of both synthetic and real datasets of thin objects as compared to uncorrected datasets.

I. INTRODUCTION

Camera calibration error greatly affects the accuracy of intersection-based Shape from Silhouette reconstructions such as the visual hull (VH) [1], particularly when the object to be reconstructed is thin. When silhouette segmentation error and camera calibration error are assumed to be present, we call the task of reconstructing an object's shape given these types of error the Shape from Inconsistent Silhouette (SfIS) problem.

This paper deals with the problem of correcting small camera calibration error in a Shape from Inconsistent Silhouette context, meaning that not only is calibration error present, but silhouette segmentation error is present as well. We use the results of this work for our current application of reconstructing leafless trees in laboratory and field conditions, necessary for extracting tree features for phenotyping or robotic pruning. Consequently, we do not make any a priori assumptions about the object's shape, as there is significant biological diversity. However, the trees we use are trained such that there are one to four main branches with thin branches emanating from those larger branches. In addition, because of space constraints in the orchard rows, the silhouettes of the tree are partial or truncated silhouettes, as in Fig. 1. While these details of the application greatly influenced our approach to camera calibration correction for SfIS, the method described in this work could be used for other, smoother and smaller objects as well.

Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture. USDA is an equal opportunity provider and employer. The work in this document was partially supported by Pennsylvania Department of Agriculture grant "Improving the Fruit Quality of Pennsylvania Apples with Precision Pruning" (404-57, 815U 4510).

1 USDA-ARS-AFRS, Kearneysville, West Virginia, USA
2 Purdue University, West Lafayette, Indiana, USA

Fig. 1. A partial silhouette of a tree with segmentation error

The recent work on correcting camera calibration parameters from silhouettes can be divided into three rough groups. The first concerns the calibration of cameras from silhouettes assuming circular motion, such as that of an object on a turntable ([2], [3], [4], [5], and [6]). The second group considers general camera motion ([7], [8], [9], [10]), while the third calibrates camera networks using sequences of silhouettes, usually of humans moving in the environment ([11] and [12]).

Within these groups are differing techniques for refining camera calibration given initial calibration parameters. Most utilize epipolar constraints, and differ in the manner that frontier points are used. Åström et al. [7] use generalized epipolar constraints, requiring accurate silhouettes and frontier point localization. Boyer [8] uses a pairwise cone tangency constraint. In Huang and Lai [3], circular motion is estimated by exploiting the homography that relates silhouettes and epipoles. Mendonça et al. [4] estimate circular motion using a sequential approach in which epipoles are estimated last. Furukawa et al. [9] use a RANSAC strategy to estimate camera calibration parameters and frontier points for orthographic cameras. Sinha et al. [12] get around the problem of few epipolar tangents by using video sequences of humans moving in the environment,


and is somewhat tolerant of segmentation error. Using a different approach but a similar application, Zhang et al. [11] use the centroids of humans moving in the environment to create correspondences, and calibration is done using a structure-from-motion approach. Using a calibration generated by circular motion as an initializer, Wong and Cipolla [6] use a manually-aligned initialization for more general motion. Yamazoe et al. [10] minimize the distance between frontier points projected onto images and the silhouettes using bundle adjustment. Finally, Zhang and Wong [5] estimate the internal and external parameters of a circular turntable sequence using epipolar tangents.

The use of epipolar constraints makes several assumptions about the characteristics of the datasets: 1) that the silhouettes are generally accurate, and 2) that the silhouettes capture the whole object, meaning that the image silhouettes are not truncated or partial silhouettes of the full object. Hernández et al. [2] developed a circular motion calibration system for silhouettes under the assumption that silhouettes may be truncated or partial, using a silhouette consistency measure. Furukawa and Ponce [13] create a more accurate and efficient reconstruction pipeline by using a hierarchical process to generate camera and reconstruction parameters; scaled-down images are used first, and as the algorithm progresses larger and larger images are used to refine parameters using a structure-from-motion approach.

In this work, we propose a camera calibration correction procedure that is not dependent on epipolar constraints. As mentioned previously, the use of epipolar constraints assumes that silhouettes are relatively accurate and reflect the complete object (i.e. not truncated or partial). Our approach is to minimize the projection error between the reconstruction and the silhouettes, using a three-step procedure. In the first stage, an initial reconstruction is estimated using a reconstruction method for SfIS. Then, the SfIS reconstruction is aligned to the input silhouettes using an Iterative Closest Point (ICP) approach. The resulting 3D-2D ICP optimization problem is non-linear, so we use a Levenberg-Marquardt method for finding an approximate solution. Then a new SfIS reconstruction is found using the updated camera calibration parameters. We use the SfIS reconstruction method described in [14]. The camera calibration correction does not depend on the choice of the reconstruction method, though, so other SfIS reconstruction methods may be substituted, such as [15], [16], [17], [18], and [19].

Our work is most similar to Hernández et al. [2] as they use a silhouette consistency measure and allow partial and truncated silhouettes. However, in [2] the silhouettes are assumed to be relatively accurate and the motion is assumed to be circular, while in our work we deal with general camera motion and silhouettes with segmentation error. Our method also has some similarities to Wong and Cipolla [6] in that the reconstruction image and silhouette image are aligned; however, they use a manual method to generate an initial calibration, and their cost function is dependent on the presence of epipolar tangencies. Finally, the work of Furukawa and Ponce [13] is similar to ours in that the estimated reconstruction is projected to each image and matches are found, though their work estimates a new reconstruction after every round of matches, which we avoided because of the computational expense involved in the SfIS problem.

In summary, our contributions to the state of the art are:

1) A method for correcting camera calibration error in a SfIS context, under two different scenarios, with partial silhouettes and general camera motion.
2) A description of a 2D-3D ICP algorithm for alignment.
3) More representative reconstructions of complicated, thin objects such as trees.

II. Preliminaries

Our notation for camera calibration parameters closely follows that of Hartley and Zisserman in [20].

We assume that the camera calibration parameters for n_c cameras are represented by the matrices K ∈ R^{3×3}, R ∈ R^{3×3}, and t ∈ R^{3×1}, where the projection equation relating a three-dimensional point in homogeneous coordinates X ∈ R^{4×1} to a two-dimensional image point in homogeneous coordinates x ∈ R^{3×1} is:

x = K [R t] X    (1)

With x = (x0, x1, x2)^T, the image point in the x and y directions is the pair (x0/x2, x1/x2).

R can be decomposed into three Euler angles, θ_x, θ_y, and θ_z, so we represent R as a function of three angles:

R = R(θ_x, θ_y, θ_z)    (2)

A parameterization of R is necessary to preserve the orthonormality of R during the Levenberg-Marquardt minimization in Section V-D.
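As a minimal sketch of this parameterization, the following composes R from the three angles so that orthonormality holds by construction; the X-Y-Z composition order is our assumption, since the paper does not fix a convention:

```python
import numpy as np

def rotation_from_euler(tx, ty, tz):
    """Compose R(theta_x, theta_y, theta_z) from three Euler angles.
    The X-Y-Z composition order is an assumption, not from the paper."""
    cx, sx = np.cos(tx), np.sin(tx)
    cy, sy = np.cos(ty), np.sin(ty)
    cz, sz = np.cos(tz), np.sin(tz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx  # product of rotations: orthonormal for any angles

R = rotation_from_euler(0.1, -0.2, 0.3)
```

Because each factor is a rotation, R satisfies R Rᵀ = I and det R = 1 for any angle values, which is exactly why this parameterization survives unconstrained minimization.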

Furthermore, we assume that K is an upper triangular matrix of the form

K = | k0  0   k2 |
    | 0   k3  k4 |    (3)
    | 0   0   1  |

However, other forms of K such as those described in [20] can be used as well.

Finally, we assume that the initial reconstruction can be represented by closed polyhedral meshes. Our implementation is a voxel-based technique where the voxels are cubes, so the polyhedral mesh is made up of square faces that represent the boundary between the reconstructed shape and empty space. However, other reconstruction methods whose output is or could be converted to a polyhedral mesh, such as EPVH [21], can be used with the camera calibration correction procedure we describe here.

III. Camera configuration scenarios

We present camera calibration correction procedures in two scenarios:

1) Adjust the R and t matrices for each camera, while keeping K constant for each camera.
2) Adjust the R, t, and K matrices for each camera.

Which scenario to use depends on the application and one's assumptions about the fidelity of K. For instance, we have multi-camera datasets with poor camera calibration; in such a case, the second scenario would be chosen to find an appropriate alignment of the reconstruction and the silhouettes. We also have a dataset where one camera is mounted on the end-effector of a robot and K is assumed to be accurate; in this case, we choose the first scenario. Various other scenarios are possible depending on the application, and can be derived from these two basic scenarios and the framework we present in Section V-C.

When we describe the general method, we let the parameters be represented by a vector p. The size of p for the first scenario is 6 (three angles for R and three elements of t) and for the second is 10 (the 6 from the first scenario and 4 internal camera calibration parameters). Whatever the scenario used, we denote the matrix representation of p by P(p).
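The two parameter vectors can be sketched as follows; the helper builds P(p) = K[R | t] for either scenario. The Euler convention and the ordering of the entries inside p are our assumptions, not the paper's:

```python
import numpy as np

def _euler(tx, ty, tz):
    # Minimal X-Y-Z Euler composition; the convention is assumed.
    cx, sx = np.cos(tx), np.sin(tx)
    cy, sy = np.cos(ty), np.sin(ty)
    cz, sz = np.cos(tz), np.sin(tz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def P_of(p, K_fixed=None):
    """Matrix representation P(p) = K [R | t] for the two scenarios:
    len(p) == 6  -> scenario 1: three angles and three translations, K fixed;
    len(p) == 10 -> scenario 2: the 6 external parameters plus
                    (k0, k2, k3, k4) of Eq. (3)."""
    R = _euler(*p[:3])
    t = np.asarray(p[3:6], dtype=float).reshape(3, 1)
    if len(p) == 6:
        K = K_fixed
    else:
        k0, k2, k3, k4 = p[6:10]
        K = np.array([[k0, 0.0, k2], [0.0, k3, k4], [0.0, 0.0, 1.0]])
    return K @ np.hstack([R, t])  # the 3x4 projection matrix

# Project a homogeneous 3D point with scenario 1 (K held fixed).
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P = P_of([0.0, 0.0, 0.0, 0.0, 0.0, 5.0], K_fixed=K)
X = np.array([0.0, 0.0, 0.0, 1.0])
x = P @ X
u, v = x[0] / x[2], x[1] / x[2]  # image coordinates, as in Eq. (1)
```

A minimizer only ever sees the flat vector p; P_of is the bridge back to the projection of Eq. (1).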

IV. Method overview

Here, we summarize the method for correcting camera calibration and relate it to the stopping criterion we use for our ICP algorithm. First, we denote a reconstruction's shape as S. S consists of a set of faces, as mentioned in Section II.

In our previous work [14], we gave a metric for describing the degree of mismatch between the reconstruction and a set of input silhouettes I, where an individual image is denoted by I. For one image in the sequence of input silhouette images, the image of S is computed using the camera calibration parameters of that input image; this image is I_S. Then the silhouette inconsistency error (SIE) of the reconstruction and the input image is SIE = Σ_{∀q} |I(q) − I_S(q)|, where q is a pixel index.
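The SIE count translates directly into code; this minimal sketch assumes both silhouettes are binary (0/1) arrays of equal size:

```python
import numpy as np

def sie(I, I_S):
    """Silhouette inconsistency error: the number of pixels at which the
    input silhouette I and the reconstruction's silhouette I_S disagree.
    Both images are assumed binary (0 or 1) and the same size."""
    return int(np.sum(np.abs(I.astype(int) - I_S.astype(int))))

I = np.array([[1, 1, 0], [0, 1, 0]])
I_S = np.array([[1, 0, 0], [0, 1, 1]])
mismatches = sie(I, I_S)  # two pixels differ between the two silhouettes
```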

Given these preliminaries about SIE, the general algorithm is outlined in Algorithm 1. The SIE error is computed using the current camera calibration parameters; if some better parameters can be found using the 2D-3D ICP algorithm, then these parameters are accepted as the current parameters. We define "better" as parameters that result in a lower SIE value. This process is repeated for each image in the input silhouette sequence.

An illustration of the algorithm's progress on alignment is shown in Fig. 2.

Algorithm 1 Cali-Correction(I, S, p^(0))
 1: n is the maximum number of iterations
 2: I_S^(0) is the image of S using P(p^(0))
 3: SIE^(0) = Σ_{∀q} |I(q) − I_S^(0)(q)|
 4: p* = p^(0)
 5: for all i = 0 to n do
 6:   Align I_S^(i) to I using the 2D-3D ICP; result is p^(i+1)
 7:   Compute I_S^(i+1) using p^(i+1)
 8:   SIE^(i+1) = Σ_{∀q} |I(q) − I_S^(i+1)(q)|
 9:   if SIE^(i+1) ≤ SIE^(i) then
10:     p* = p^(i+1)
11:   else
12:     break
13:   end if
14: end for
15: return p*
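The control flow of Algorithm 1 can be sketched as follows; `project`, `icp_align`, and `sie` are hypothetical stand-ins for the rendering, 2D-3D ICP, and mismatch-counting components described in the text:

```python
def cali_correction(I, S, p0, n, project, icp_align, sie):
    """Sketch of Algorithm 1. `project(S, p)` renders the image I_S of
    shape S under parameters p, `icp_align` performs one 2D-3D ICP pass,
    and `sie` counts mismatching pixels; all three are stand-ins."""
    p_star = p = p0
    err = sie(I, project(S, p))
    for _ in range(n):
        p_new = icp_align(I, project(S, p), p)  # line 6: 2D-3D ICP
        err_new = sie(I, project(S, p_new))     # lines 7-8: recompute SIE
        if err_new <= err:                      # line 9: accept improvement
            p_star = p = p_new
            err = err_new
        else:
            break                               # line 12: stop when SIE worsens
    return p_star
```

The loop only ever keeps parameters whose SIE does not increase, which is the stopping criterion that later prevents the calibration from drifting away from the truth (Section IX).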

V. 2D-3D ICP

This section details how we adapt the ICP algorithm, which is usually used for 2D-2D alignments or 3D-3D alignments, to the case of a 2D-3D alignment. While one option was to perform a standard 2D-2D alignment assuming a planar projective transform, and then to interpret those results as a camera calibration correction, this approach ignores the dimensionality of the original problem. There are various efficient variants of ICP available for aligning 3D meshes [22]. We adapt a basic form given in [22], which consists of a sequence of select, match, and minimize steps.

A. Selection of 3D points

We now give more details about the projection of S to I_S, and how I_S is used in the 2D-3D ICP algorithm. Every face in S is made up of a sequence of three-dimensional points X. We project each point to the camera specified by P(p), and repeat this process for all faces in S; by filling in the convex polygon created with the sequence of projected points for each face, we can generate I_S. From there, we determine which points X, when projected, fall inside the silhouette of I_S and which fall outside. In Fig. 3, we show the silhouette boundary of I_S for some large voxels as medium gray lines; the projected points that are on the silhouette boundary are represented by white circles, and those inside the silhouette boundary are represented as green circles.

The 3D points that we use for ICP are those points that lie on the silhouette boundary of I_S, in other words, the points whose projections produced the white circles in Fig. 3; we denote this set of points for camera c as X_Sc. We use all of the points in X_Sc to generate matches.

B. Matching 3D points with 2D image coordinates

(a) The first six iterations of the ICP algorithm where R and t are allowed to change and K is kept fixed. (b) The first six iterations of the ICP algorithm where R, t, and K are all allowed to change; this run of ICP is done after the completion of a round when only R and t are allowed to change.

Fig. 2. [Best viewed in color.] Illustration of the progression of the camera calibration correction algorithm. Original silhouette image pixels I are medium gray; the silhouette boundary of the reconstruction image I_S is in green. The top row represents the alignment resulting from the first six iterations of the 2D-3D algorithm where R and t are adjusted. Once that process terminates as a result of the stopping criterion related to SIE, the 2D-3D algorithm is run again, the difference being that R, t, and K are adjusted. The second row represents the first six iterations of the second process, once the R, t-only adjustment has been completed. More details of particular experimental choices can be found in Section VI.

Fig. 3. [Best viewed in color.] This figure is an illustration of how I_S is generated from S. The points composing each face in S are projected using a current estimate of the camera calibration parameters; the projected face is filled to generate a black and white image. The silhouette boundary is shown in this figure as medium gray lines, while projected points inside the boundary are shown as green circles. Points on the silhouette boundary are represented as white circles; the 3D points generating the white circles form the set X_Sc for a camera c.

From the original image silhouettes (I) and the silhouette of the reconstruction (I_S) for camera c, we compute the surface normals for each pixel of the silhouette. Since we use square faces for S, depending on the voxel size the projection of the reconstruction can have right angles and other severe changes in normal vector direction, particularly for large voxels, as shown in Fig. 3. To reduce this effect, we smooth the normals of the projected reconstruction silhouettes. Given the k-th silhouette pixel in a contour, the smoothed normal n'_k at position k is given by simple averaging: n'_k = (n_{k-1} + n_k + n_{k+1})/3. This smoothing process is performed twice.

Given that the projection of a 3D point X ∈ X_Sc is x = P_i X, we search for the closest original image silhouette point to x where the angle between normals is less than 2π/3. We represent this image silhouette point as φ(X).
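The smoothing step above can be sketched as follows for a contour of per-pixel normals; treating the contour as closed (indices wrap around) and renormalizing to unit length after each pass are our assumptions:

```python
import numpy as np

def smooth_contour_normals(normals, passes=2):
    """Smooth contour normals by neighbour averaging,
    n'_k = (n_{k-1} + n_k + n_{k+1}) / 3, applied twice as in the text.
    Wrap-around indexing and per-pass renormalization are assumptions."""
    n = np.asarray(normals, dtype=float)   # shape (num_pixels, 2)
    for _ in range(passes):
        n = (np.roll(n, 1, axis=0) + n + np.roll(n, -1, axis=0)) / 3.0
        n /= np.linalg.norm(n, axis=1, keepdims=True)  # keep unit length
    return n
```

Averaging with the two contour neighbours damps the right-angle jumps in normal direction that square voxel faces produce, without moving the contour pixels themselves.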

Many ICP algorithms reject a percentage of the worst matches. Our approach to SfIS has been to assume that error exists, but not to specify the quantity or source of that error. As a result, we are reluctant to use a pre-set threshold for rejecting matches. For instance, sometimes the matches are quite accurate, and discarding some of them according to a set percentage would result in discarding good information. On the other hand, some reconstructions S are quite noisy, so the rejection percentage should be large. To avoid committing to a rejection threshold ahead of time, we implemented the following scheme.

First, we perform the Levenberg-Marquardt minimization for the given matches without rejecting any matches. If the resulting camera parameters p give a lower value of SIE, we accept those parameters as p* = p and stop. If not, we reject the worst 1% of matches and run the minimization again. This process continues until either parameters resulting in a smaller value of SIE are found, or the number of iterations is exceeded (typically set at 10 in our experiments).
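This progressive rejection scheme can be sketched as follows, with `minimize` and `sie_of` as hypothetical stand-ins for the Levenberg-Marquardt step and the SIE evaluation:

```python
def reject_and_retry(matches, minimize, sie_of, sie_current, max_rounds=10):
    """Sketch of the progressive rejection scheme. `matches` is a list of
    (point, residual) pairs; `minimize` runs the minimization on a match
    set and returns parameters; `sie_of` evaluates the SIE for those
    parameters. Both callables are stand-ins for the paper's components."""
    matches = sorted(matches, key=lambda m: m[1])  # best matches first
    for _ in range(max_rounds):
        p = minimize(matches)
        if sie_of(p) < sie_current:
            return p                               # improving parameters found
        cut = max(1, len(matches) // 100)          # drop the worst 1%
        matches = matches[:-cut]
    return None                                    # keep the current parameters
```

Because rejection only happens after a failed attempt, accurate match sets are used in full, while noisy ones are trimmed just as far as needed.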

C. Cost function formulation

Once the matches φ(X) are found, we seek camera calibration parameters that minimize the distance between the projections of the points X and the matched silhouette pixels φ(X):

min_p Σ_i ||P(p) X_i − φ_i(X_i)||²    (4)

We can represent Eq. 4 as follows, where P(p)_1^T represents the first row of P(p), P(p)_2^T the second row, and so on, as in Hartley and Zisserman [20], and where φ_i(X_i)_1 is the x component of the matching pixel to X_i and φ_i(X_i)_2 is the y component of the matching pixel to X_i:

p̂ = arg min_p Σ_i [ ( P(p)_1^T X_i / P(p)_3^T X_i − φ_i(X_i)_1 )² + ( P(p)_2^T X_i / P(p)_3^T X_i − φ_i(X_i)_2 )² ]    (5)

This is a nonlinear least squares problem; we can rearrange it into the standard form with a residual vector as follows:

p̂ = arg min_p Σ_{j=0}^{2|X_S|−1} r_j(P(p))²    (6)

where

r_{2i}(P(p)) = P(p)_1^T X_i / P(p)_3^T X_i − φ_i(X_i)_1    (7)

r_{2i+1}(P(p)) = P(p)_2^T X_i / P(p)_3^T X_i − φ_i(X_i)_2    (8)

for all X_i ∈ X_S.

While many other ICP algorithms use a per-match weighting, we instead use a constant weight for each match.
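The residual vector of Eqs. (6)-(8) translates directly into code; this sketch assumes P is a 3×4 array, each X_i a homogeneous 4-vector, and each match a pixel pair (φ₁, φ₂):

```python
import numpy as np

def residuals(P, Xs, matches):
    """Residual vector of Eqs. (6)-(8): for each boundary point X_i, the
    x and y differences between its projection under P and the matched
    silhouette pixel phi_i(X_i)."""
    r = np.empty(2 * len(Xs))
    for i, (X, phi) in enumerate(zip(Xs, matches)):
        x = P @ X                             # homogeneous projection
        r[2 * i] = x[0] / x[2] - phi[0]       # Eq. (7)
        r[2 * i + 1] = x[1] / x[2] - phi[1]   # Eq. (8)
    return r
```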

D. Levenberg-Marquardt modification for Newton's method of nonlinear least squares

To find an approximate solution to Eq. 6, we use the Levenberg-Marquardt modification for nonlinear least squares. We quickly summarize the method here; more in-depth treatments can be found in optimization texts such as [23].

We compute the Jacobian J(P(p)), which is a 2|X_S| × |p| matrix. Each element of the matrix is J_ij = ∂r_i/∂p_j.

Given the Jacobian and the residual functions, the update formula for a new p^(k+1) is:

p^(k+1) = p^(k) − ( J(P(p))^T J(P(p)) + μ_k I )^{−1} J(P(p))^T r(P(p))    (9)

where I is an identity matrix of size |p| × |p| and μ_k ≥ 0 is the damping parameter, chosen according to the standard practice of starting with a small μ_k and increasing it until the direction is a descent direction. We let p^(0) be the initial camera calibration parameters.
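A single update of Eq. (9) can be sketched as follows; the schedule for choosing μ_k is omitted:

```python
import numpy as np

def lm_step(p, residual_fn, jacobian_fn, mu):
    """One Levenberg-Marquardt update, Eq. (9):
    p_new = p - (J^T J + mu * I)^(-1) J^T r."""
    J = jacobian_fn(p)
    r = residual_fn(p)
    H = J.T @ J + mu * np.eye(len(p))  # damped normal equations
    return p - np.linalg.solve(H, J.T @ r)
```

On a toy linear problem with residual r(p) = p − p_target (so J = I), a step with μ = 0 jumps straight to the minimizer, while larger μ shortens the step toward a gradient-descent direction.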

VI. Implementation details

As mentioned previously, we use the method in [14] for generating SfIS reconstructions. In that method, the SfIS reconstruction problem is cast as a pseudo-Boolean minimization problem with a non-submodular function. Given that the problem is NP-hard, a local minimum is found with a local search method.

In our implementation of the method, the projection information for each voxel is computed before local search, to avoid recomputing projection information over multiple iterations. However, this consumes a great deal of memory. We therefore altered the original algorithm to use a hierarchical approach so that memory requirements are satisfied even on very large datasets with over 100 million voxels.

First, we choose a large voxel size and perform the local minimum search. Then, starting with a reconstruction at the large voxel size, each voxel is divided into 8 child voxels as in an octree; the child voxels' label is the same as their parent voxel's label. For the child voxels, projection information is computed for voxels within a specified distance of the border between empty and occupied voxels, and another reconstruction is generated with the child voxels. This splitting continues for as many iterations as the user desires.
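The octree-style split can be sketched as follows; the voxel representation (minimum corner, edge length, label) is our assumption:

```python
def split_voxel(voxel):
    """Split one voxel into its 8 children, octree-style; each child
    inherits the parent's occupied/empty label. A voxel is modelled as
    ((x, y, z), size, label) with (x, y, z) its minimum corner."""
    (x, y, z), size, label = voxel
    h = size / 2.0
    return [((x + dx * h, y + dy * h, z + dz * h), h, label)
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]

# A 2 mm occupied voxel splits into eight 1 mm occupied children.
children = split_voxel(((0.0, 0.0, 0.0), 2.0, 1))
```

Since children inherit labels, the finer local search only needs to revisit voxels near the occupied/empty border, which is what keeps memory bounded.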

In the results shown here, we let the number of divisions of voxel size be k. To generate uncorrected results, we proceed through all of the divisions, to the k-th division and local minimum search. For corrected results, we use the reconstruction generated at the (k−1)-th division, correct the camera calibration parameters, and then proceed to the k-th division to do local minimum search with the corrected parameters.

We handle the two correction scenarios, in which only external parameters are corrected or both internal and external parameters are corrected, in the following way. We correct the external parameters first. Then, if internal parameters are to be corrected as well, we correct external and internal parameters using the solution generated by correcting only the external parameters as an initial solution.

Finally, for the implementation of the Levenberg-Marquardt method, we used the software package levmar [24]. All of the results shown here were generated on a workstation with one 8-core processor and 72 GB of RAM.

VII. Datasets

To validate our camera calibration algorithm, we demonstrate results on one simulated and six real datasets, a total of seven datasets. Descriptions of the two types of datasets follow. All 3D results are visualized with Meshlab [25].

A. Simulated data

We generated a simulated dataset, called the Model Tree dataset, composed of 32 cameras and using the tree model shown in Figure 5a. Camera calibration error was added to the dataset by altering one element of the translation component of each camera by e ∈ {−20, −19, ..., 19, 20}. The element altered (x, y, or z) and the value of e were determined randomly, with all possible values of e, and each of the three axes of t, having a uniform probability of being selected. An example of a silhouette image from this dataset is shown in Figure 4, as well as the placement of the cameras with respect to the model. All of the silhouette images for this set consist of partial silhouettes. The reconstruction voxel size is 2.5 mm.

Fig. 4. [Best viewed in color.] Illustration of the synthetic dataset Model Tree used to validate the algorithm. In 4a, one of the 32 silhouette images of the synthetic tree. In 4b, the synthetic tree is shown with the 32 cameras, which are placed on two planes.

To assess the error of our reconstruction, we also generated a voxel version of the ground truth model to compare to the reconstructions with and without the correction. The voxel size of the tree matches that of the reconstruction, so false positive and false negative rates can easily be computed.

B. Real data

We generated real datasets in our laboratory using two different camera configurations. In the first configuration, there are 13 inexpensive webcameras (image size 1280 × 960 pixels) mounted to the side of and above an object. The external camera calibration was estimated using the camera calibration procedure of Zhang [26], with a custom 2-plane calibration object. The datasets using this configuration are called Branch, Coil, and Coils and Cable. The voxel size is 1.5 mm.

The second configuration consists of one camera (image size 2456 × 2058 pixels) mounted on the end effector of a robot; the robot moves to 38 different positions and acquires images of a larger object than considered in the first configuration. The external camera parameters are estimated using a hand-eye, robot-world calibration. The datasets using this configuration are Tree 1, Tree 2, and Pole and Coil. The voxel size is 3 mm.

For both of these dataset groups, silhouette images were generated using a background subtraction method.

VIII. Results and Discussion

We first show the effect of camera calibration correction on the SIE values, as shown in Table I. Recall that the value of SIE roughly indicates the number of pixels that do not match between the input silhouette and the image of the reconstructed shape. Also, note that since we use a voxel-based method, the SIE is never zero except in the case of perfect camera calibration, perfect segmentation, and infinitely small voxels.

Given all of these preliminaries, we can see from the table that correcting only the external parameters results in a large decrease in the value of SIE when compared to the uncorrected results. From examining the table, for these datasets a reduction of 33% or more in SIE values can be gained by the camera calibration correction scenarios.

We display a selection of the reconstructions of the seven datasets using no correction, external parameter correction, and external and internal parameter correction in Figures 5-8. We were not able to display the reconstructions for all datasets because of space limitations.

We discuss the synthetic dataset first, in Figure 5. We can see in this figure that the uncorrected SfIS reconstruction in Figure 5b is quite noisy, and the detail of small branches is largely lost. However, in the reconstructions using corrected parameters (Figures 5c and 5d), the reconstruction more faithfully represents the ground truth model, though there are some noisy regions remaining for small branches. There is very little difference between the reconstructions using the two types of corrections.

In Table II, we show classification rates of the reconstructions as compared to a voxel version of the ground truth model. In this table, 'TP' is true positive (in our context, a positive is an occupied voxel), 'FP' is false positive, 'TN' is true negative, and 'FN' is false negative. This table shows that the true positive rate increases by 22% when using either one of the corrections, while the true negative rate remains largely the same. The opposite behavior can be observed with the false negative and false positive rates.

The Branch, Coil, and Coils and Cable dataset reconstructions behave similarly to the Model Tree dataset (the Coils and Cable reconstruction can be seen in Figure 6): there is more noise in the uncorrected reconstruction than in the corrected reconstruction. Since the objects in these datasets are very thin, the uncorrected reconstructions contain breaks in the surface where it is continuous in the original object. The corrected reconstructions tend to repair these breaks and reduce noisy regions as well. This is especially true for the Coils and Cable dataset, where the small-diameter wire, broken into many pieces in the uncorrected reconstruction, is connected in the corrected reconstruction (Figure 6). Figure 7 shows the looped thin cable (on the left side of the coil in Figure 6).

The use of external parameter correction produces reconstructions that are more representative than reconstructions generated without the correction. For this dataset, the best reconstructions were generated by using the correction of both internal and external parameters.

(a) Ground truth. (b) Without any camera calibration correction. (c) With correction of external parameters. (d) With correction of internal and external parameters.

Fig. 5. The reconstruction of the Model Tree dataset

The second group of real datasets, Tree 1, Tree 2, and Pole and Coil, were generated with one camera mounted on the end-effector of a robot. The first tree, in the Tree 1 dataset, is considered to have a 'weeping' form, and it has many small branches, whereas Tree 2 has a more upright form (detail in Figure 8). Finally, Pole and Coil consists of a metal pole with a coil attached using a zip tie; there are four thumb screws along the pole's length. For all three of these datasets, there is a great deal of improvement in the reconstruction's ability to represent the original object when the camera calibration parameters are corrected. There is little difference between the two types of correction, but using a correction of the external and internal parameters seems to produce the best results, with more small details reconstructed. For instance, in the Pole and Coil dataset, the narrow plastic tie which holds the coil to the pole is reconstructed when all parameters are corrected and is not when only the external parameters are corrected (not shown).

TABLE I
The silhouette inconsistency error SIE of the seven datasets

Dataset          | SIE, no correction | SIE with R, t correction | SIE with R, t, K correction
Model Tree       | 2,044,618          | 946,104                  | 937,490
Branch           | 171,346            | 124,351                  | 109,984
Coil             | 182,176            | 104,806                  | 99,195
Coils and Cable  | 331,743            | 187,255                  | 161,649
Tree 1           | 3,001,539          | 2,036,888                | 1,739,336
Tree 2           | 1,969,238          | 1,149,083                | 1,029,106
Pole and Coil    | 711,988            | 363,249                  | 302,940

(a) Without any camera calibration correction. (b) With correction of external parameters. (c) With correction of external and internal parameters.

Fig. 6. The reconstruction of the Coils and Cable dataset

(a) Without any camera calibration correction. (b) With correction of external parameters. (c) With correction of external and internal parameters.

Fig. 7. Detail of the Coils and Cable dataset's reconstruction

(a) Without any camera calibration correction. (b) With correction of external parameters. (c) With correction of external and internal parameters.

Fig. 8. Detail of the Tree 2 dataset's reconstruction


TABLE II
Reconstruction accuracy as compared to a voxelated ground truth of the Model Tree dataset

Reconstruction     | TP       | TN       | FP          | FN
No correction      | 0.599462 | 0.999496 | 0.000504248 | 0.400538
R, t correction    | 0.821213 | 0.999587 | 0.000412875 | 0.178787
R, t, K correction | 0.824469 | 0.999476 | 0.000524409 | 0.175531

IX. Conclusion

We have presented a method for camera calibration correction, under two different scenarios, in a Shape from Inconsistent Silhouette context. We have shown through the use of different objects and scenarios that camera calibration correction can improve the reconstructions of thin objects given an initial reconstruction.

A conclusion we came to after examining the results is that there is little to lose by performing the correction with both external and internal camera parameters. When the internal camera calibration parameters are somewhat poor, performing the full correction results in more representative reconstructions (datasets Branch, Coil, and Coils and Cable). On the other hand, the similarity of the reconstructions using the two types of corrections for the synthetic datasets has shown that performing the full correction will not degrade the reconstruction, even when it is known that the internal parameters were not perturbed by error. The reason for this is that we accept updated camera calibration parameters only when they yield a lower SIE score than the current parameters. This requirement prevents the calibration from deviating even further from the true calibration.
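The safeguard described above can be sketched in a few lines. This is an illustration of the accept-only-if-SIE-decreases rule, not the paper's implementation; the function and parameter names are assumptions.

```python
def accept_if_lower_sie(current_params, proposed_params, sie):
    """Accept a proposed calibration update only if it strictly lowers
    the silhouette inconsistency error (SIE).

    sie: a function mapping a set of calibration parameters to its SIE score.
    """
    if sie(proposed_params) < sie(current_params):
        return proposed_params  # the update improves silhouette consistency
    return current_params       # otherwise keep the current calibration
```

Because ties and increases both keep the current parameters, the SIE score is monotonically non-increasing over iterations, which is why the correction cannot drive the calibration further from the truth as measured by this score.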

References

[1] A. Laurentini, "The visual hull concept for silhouette-based image understanding," IEEE Trans. Pattern Anal. Mach. Intell., vol. 16, no. 2, pp. 150–162, 1994.
[2] C. Hernández, F. Schmitt, and R. Cipolla, "Silhouette coherence for camera calibration under circular motion," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 2, pp. 343–349, 2007.
[3] P.-H. Huang and S.-H. Lai, "Silhouette-based camera calibration from sparse views under circular motion," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), June 2008, pp. 1–8.
[4] P. Mendonca, K. Y. K. Wong, and R. Cipolla, "Epipolar geometry from profiles under circular motion," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 604–616, June 2001.
[5] H. Zhang and K. Y. K. Wong, "Self-calibration of turntable sequences from silhouettes," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 1, pp. 5–14, Jan. 2009.
[6] K.-Y. Wong and R. Cipolla, "Reconstruction of sculpture from its profiles with unknown camera positions," IEEE Trans. Image Processing, vol. 13, no. 3, pp. 381–389, March 2004.
[7] K. Åström, R. Cipolla, and P. Giblin, "Generalised epipolar constraints," International Journal of Computer Vision, vol. 33, no. 1, pp. 51–72, 1999.
[8] E. Boyer, "On using silhouettes for camera calibration," in Proc. 7th Asian Conference on Computer Vision (ACCV '06), ser. Lecture Notes in Computer Science, vol. 3851, P. J. Narayanan, S. K. Nayar, and H.-Y. Shum, Eds. Hyderabad, India: Springer-Verlag, 2006, pp. 1–10.
[9] Y. Furukawa, A. Sethi, J. Ponce, and D. Kriegman, "Robust structure and motion from outlines of smooth curved surfaces," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 2, pp. 302–315, Feb. 2006.
[10] H. Yamazoe, A. Utsumi, and S. Abe, "Multiple camera calibration with bundled optimization using silhouette geometry constraints," in Proc. 18th International Conference on Pattern Recognition (ICPR), vol. 3, 2006, pp. 960–963.
[11] X. Zhang, Y. Zhang, X. Zhang, T. Yang, X. Tong, and H. Zhang, "A convenient multi-camera self-calibration method based on human body motion analysis," in Proc. Fifth International Conference on Image and Graphics (ICIG '09), Sept. 2009, pp. 3–8.
[12] S. Sinha, M. Pollefeys, and L. McMillan, "Camera network calibration from dynamic silhouettes," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), vol. 1, June 2004, pp. 195–202.
[13] Y. Furukawa and J. Ponce, "Accurate camera calibration from multi-view stereo and bundle adjustment," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), June 2008, pp. 1–8.
[14] A. Tabb, "Shape from silhouette probability maps: reconstruction of thin objects in the presence of silhouette extraction and calibration error," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), June 2013.
[15] G. K. Cheung, T. Kanade, J.-Y. Bouguet, and M. Holler, "A real time system for robust 3D voxel reconstruction of human motions," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), vol. 2, p. 2714, 2000.
[16] J.-L. Landabaso, M. Pardàs, and J. R. Casas, "Shape from inconsistent silhouette," Computer Vision and Image Understanding, vol. 112, no. 2, pp. 210–224, 2008.
[17] J.-S. Franco and E. Boyer, "Fusion of multi-view silhouette cues using a space occupancy grid," in Proc. Tenth IEEE International Conference on Computer Vision (ICCV), Beijing, China, 2005, pp. 1747–1753.
[18] L. Guan, J.-S. Franco, and M. Pollefeys, "3D occlusion inference from silhouette cues," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), June 2007, pp. 1–8.
[19] G. Haro and M. Pardàs, "Shape from incomplete silhouettes based on the reprojection error," Image and Vision Computing, vol. 28, no. 9, pp. 1354–1368, 2010.
[20] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press, 2004.
[21] J.-S. Franco and E. Boyer, "Exact polyhedral visual hulls," in Proc. British Machine Vision Conference (BMVC), 2003, pp. 329–338.
[22] S. Rusinkiewicz and M. Levoy, "Efficient variants of the ICP algorithm," in Proc. Third International Conference on 3-D Digital Imaging and Modeling, 2001, pp. 145–152.
[23] E. K. Chong and S. H. Zak, An Introduction to Optimization, 3rd ed. John Wiley & Sons, 2011.
[24] M. Lourakis, "levmar: Levenberg-Marquardt nonlinear least squares algorithms in C/C++," http://www.ics.forth.gr/~lourakis/levmar/, July 2004 [accessed 31 Jan. 2005].
[25] MeshLab, developed with the support of the 3D-CoForm project, http://meshlab.sourceforge.net/.
[26] Z. Zhang, "A flexible new technique for camera calibration," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 11, pp. 1330–1334, Nov. 2000.
