Self calibration of stereo cameras with lens distortion
André Redert and Emile Hendriks
Information and Communication Theory Group, Department of Electrical Engineering
Delft University of Technology
Mekelweg 4, 2628 CD Delft, The Netherlands
email: {andre,emile}@it.et.tudelft.nl
phone: +31 15 278 6269, fax: +31 15 278 1843
Abstract
In this paper we present a new method for the
self calibration of a stereo camera pair
including lens distortion. A single stereo image
pair is used without calibration object or prior
knowledge of the scene.
The method is based on the estimation of a
stereo image correspondence field followed by
extraction of the calibration parameters. The
correspondence field is obtained by a Markov
Random Field motion estimator with a
calibration invariant epipolar constraint. A
new probabilistic model for local lens
distortion serves as basis for the constraint. A
probabilistic stereo camera model is fitted to
the correspondence field using a MAP
criterion. The optimum is found by the
stochastic minimisation procedure simulated
annealing.
We are currently investigating the new
technique. We expect to improve on the results
reported in [8], since in our approach the
number of corresponding points used for
parameter extraction is several orders of magnitude higher.
1. Introduction
In the area of 3D measurements with stereo
imaging, accurate calibration of the camera pair
is crucial. The reconstruction of scene points is
done by estimation of corresponding pixels in
left and right image and subsequent
triangulation. Accurate triangulation can only
be performed if the external, internal and
lens/ccd distortion parameters of both cameras
are known. As an important byproduct,
calibration reduces the correspondence problem
from 2D motion estimation to the more efficient
and reliable 1D disparity estimation.
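The reconstruction pipeline described above can be sketched as follows. This is a generic midpoint triangulation for two calibrated rays, not the paper's own implementation; the baseline is taken as the unit of length (as in section 3) and the scene point is chosen arbitrarily for illustration.

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + t*d1 and c2 + s*d2."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    b = c2 - c1
    # Normal equations of min_{t,s} |t*d1 - s*d2 - b|^2 (unit directions)
    A = np.array([[1.0, -(d1 @ d2)],
                  [-(d1 @ d2), 1.0]])
    rhs = np.array([d1 @ b, -(d2 @ b)])
    t, s = np.linalg.solve(A, rhs)
    return 0.5 * ((c1 + t * d1) + (c2 + s * d2))

cL = np.array([-0.5, 0.0, 0.0])   # left optical centre (baseline = 1)
cR = np.array([+0.5, 0.0, 0.0])   # right optical centre
P  = np.array([0.2, 0.1, 4.0])    # hypothetical scene point
# Exact correspondences: rays from the centres through P
X = triangulate_midpoint(cL, P - cL, cR, P - cR)
print(X)  # recovers P (up to floating point)
```

With noisy correspondences the two rays no longer intersect, and the midpoint is a simple compromise; this is why accurate calibration matters for the triangulation step.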
There are two different techniques for
calibration: fixed and self calibration. In fixed
calibration, all parameters are extracted off line
by placing a special object with known
geometry in front of the cameras and processing
of the camera images. In the past 20 years, little
change has been observed in fixed calibration
techniques [2,7].
Although fixed calibration procedures give
the most accurate results, they suffer from a
number of disadvantages. The procedure
requires a special calibration object and user
interaction. Most of all, the parameters become
useless after a change in the camera parameters
due to e.g. zooming.
These disadvantages can be circumvented by
self calibration techniques. In [4] distortion
parameters are found on the basis of a stereo
pair or triplet of images. In [8] all calibration
parameters are found using a stereo pair of
images. The images contain some calibration
pattern with unknown geometry. Lens
distortion is assumed to be equal for both
cameras and have a radial component only. A
sparse correspondence field is obtained by
feature extraction, from which then the
calibration parameters are determined.
In this paper, we consider self calibration of
all camera parameters on the basis of a single
image pair. The image pair contains the actual
3D scene of interest. A dense correspondence
field is estimated, without need for feature
extraction. Subsequently the calibration
parameters are extracted from the field, see
Figure 1.
Figure 1: The self calibration system
The correspondence field is obtained by motion
estimation with an epipolar constraint that is
invariant to the internal and external camera
parameters. It utilizes a probabilistic Markov
Random Field model for lens distortion. The
distortion MRF model is then easily combined
with an MRF model for motion estimation; such
models are known for their highly accurate results [5].
Finally a probabilistic stereo camera model is
fitted to the correspondence field obtained. Due
to the a priori parameter probabilities the
solution is regularised. This ensures the
existence of a single global optimum and makes
its localisation easier. We use the
stochastic minimisation procedure simulated
annealing to find the solution.
The paper is organised as follows. In the next
section we describe our correspondence
estimator based on motion estimation and a new
probabilistic distortion model. Section 3 gives
the probabilistic stereo camera model and
section 4 describes the extraction of camera
parameters from the correspondence field.
Finally section 5 gives our preliminary results.
2. Correspondence estimation
In this section the goal is to find the
corresponding pixels in the left and right images, in
the absence of calibration knowledge.
In a stereo image pair, the corresponding pixel
pairs follow the so called epipolar constraint.
Without lens distortion, all correspondences in
the stereo pair lie on straight lines in the left and
right images, the epipolar lines, regardless of all
other parameters. With lens distortion, the
epipolar lines become curved.
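This curving can be checked numerically: sample points on a straight (undistorted) epipolar line, apply a simple one-parameter radial distortion (an illustrative model, not the paper's full parametrisation; the coefficient 0.1 is arbitrary), and measure how far the distorted samples stray from a straight line.

```python
import numpy as np

def radial_distort(pts, k):
    """Apply x_d = x * (1 + k*r^2), a simple radial distortion about the image centre."""
    r2 = np.sum(pts**2, axis=1, keepdims=True)
    return pts * (1.0 + k * r2)

def collinearity_residual(pts):
    """Max distance of the points to the line through the first and last point."""
    p0, p1 = pts[0], pts[-1]
    d = (p1 - p0) / np.linalg.norm(p1 - p0)
    rel = pts - p0
    perp = rel - np.outer(rel @ d, d)   # component perpendicular to the line
    return np.max(np.linalg.norm(perp, axis=1))

# A straight epipolar line y = 0.5, sampled at 11 points
line = np.stack([np.linspace(-1, 1, 11), 0.5 * np.ones(11)], axis=1)
print(collinearity_residual(line))                       # 0: straight
print(collinearity_residual(radial_distort(line, 0.1)))  # > 0: curved by distortion
```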
We use a Markov Random Field motion
estimator with a constraint on the curvature of
the epipolar lines. This is invariant to all
camera parameters except lens distortion.
We estimate the correspondence field $m$ and
simultaneously a field $\alpha_L$ that describes the
orientation of the epipolar curve for each point
in the left image. The field $m$ defines the
corresponding left and right pixels:

$$(x_R, y_R) = m(x_L, y_L) \qquad (1)$$
The scalar field $\alpha_L$, with $0 \le \alpha_L < \pi$, models the direction
of the epipolar lines in the left image, defined
as the angle between the $x_L$ axis and the line
tangent to the epipolar curve. Figure 3 shows
the situation in which there is no lens distortion
and the epipolar curves are straight lines.
Figure 3: The epipolar direction field $\alpha_L$
At each point $P$ in the left image, we construct
a line $l$ tangent to the epipolar line and
parametrise it by $\lambda_L$, with $\lambda_L = 0$ at $P$.
For a distortionless left camera the following
holds:

$$\frac{\partial \alpha_L}{\partial \lambda_L} = \cos\alpha_L \, \frac{\partial \alpha_L}{\partial x_L} + \sin\alpha_L \, \frac{\partial \alpha_L}{\partial y_L} = 0 \qquad (2)$$
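Equation (2) states that along a straight epipolar line the direction field is constant. A quick finite-difference sanity check, using a hand-built field of straight lines through an assumed epipole at (-2, 0) (illustrative values only, not from the paper):

```python
import numpy as np

def alpha(x, y, ex=-2.0, ey=0.0):
    """Direction of the straight epipolar line through the epipole (ex, ey)."""
    return np.arctan2(y - ey, x - ex)

def eq2_residual(x, y, h=1e-5):
    """Finite-difference version of (2): cos(a)*da/dx + sin(a)*da/dy."""
    a = alpha(x, y)
    dadx = (alpha(x + h, y) - alpha(x - h, y)) / (2 * h)
    dady = (alpha(x, y + h) - alpha(x, y - h)) / (2 * h)
    return np.cos(a) * dadx + np.sin(a) * dady

print(eq2_residual(0.3, 0.7))  # ~0: the direction field is constant along each line
```

For curved epipolar lines (i.e. with lens distortion) this residual would be nonzero, which is exactly the deviation the energy terms below penalise.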
Now we project line $l$ onto the right image
plane using the correspondence field $m$. As
depicted in Figure 3, the uniform
parametrisation by $\lambda_L$ is not preserved by the
projection. It is affected both by the calibration
parameters and by the disparity, which is a function of
the particular 3D scene. When travelling along line $l$
with constant velocity, the corresponding walk
along line $l'$ in the right image will thus exhibit
unknown accelerations.
If the right camera is also distortionless, the
line $l'$ in the right image is a straight line. To
remain on a straight line, the acceleration
should always be parallel to the velocity.
Formulated mathematically, we have:

$$\frac{\partial^2}{\partial \lambda_L^2} \begin{pmatrix} x_R \\ y_R \end{pmatrix} = K \cdot \frac{\partial}{\partial \lambda_L} \begin{pmatrix} x_R \\ y_R \end{pmatrix} \qquad (3)$$
The value of $K$ depends on the calibration
parameters and the particular 3D scene. It
changes with image position.
For cameras with distortion, (2) and (3) do not
hold. We now model any deviation from (2)
and (3) by Gaussian probability densities,
which result in quadratic energy functions in
the MRF model. After rewriting (2) and (3)
as:

$$\alpha_L' = 0, \qquad \vec{a} = K \vec{v} \qquad (4)$$
we construct the energy terms by:

$$E_1 = \left(\alpha_L'\right)^2, \qquad E_2 = \frac{\left|\vec{a} \times \vec{v}\right|^2}{\left|\vec{a}\right|^2 \left|\vec{v}\right|^2} \qquad (5)$$
The normalisation in $E_2$ makes sure the energy
is invariant to $K$ and to the absolute magnitudes of
both $\vec{a}$ and $\vec{v}$, which all depend on the particular
3D scene. Appropriate weight factors for the
two terms in a MRF motion estimator will be
determined by experiment.
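The normalised term $E_2$ of (5) is straightforward to compute; a minimal sketch for 2D velocity and acceleration vectors (function names and test vectors are illustrative):

```python
import numpy as np

def e2(a, v, eps=1e-12):
    """Normalised parallelism energy |a x v|^2 / (|a|^2 |v|^2) for 2D vectors."""
    cross = a[0] * v[1] - a[1] * v[0]   # scalar 2D cross product
    return cross**2 / (max(a @ a, eps) * max(v @ v, eps))

v = np.array([2.0, 0.0])
print(e2(3.0 * v, v))               # 0.0: acceleration parallel to velocity
print(e2(np.array([0.0, 1.0]), v))  # 1.0: perpendicular, maximum energy
```

Note that scaling either vector leaves `e2` unchanged, which is the invariance to $K$ and to the scene-dependent magnitudes mentioned above.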
We have now defined a correspondence
estimator with three unknown fields (the motion
components $m_X$, $m_Y$ and the epipolar angle $\alpha_L$)
and two field equations, (2) and (3). Together with a luminance
difference term [5], they form the total MRF
model.
We use a hierarchical gradient descent
algorithm to compute $m$ and $\alpha_L$ given the left
and right image luminance.
3. Stereo camera model
In this section we describe our probabilistic
stereo camera model. Figure 2 shows the
complete model. Five reference frames are
defined, the stereo frame, the left/right lens
frames and the left/right projection frames.
Gaussian probability densities are assigned to
all parameters. We will now describe the
parameters and their mean and variance.
Figure 2: The stereo camera model
Without a calibration object of known length
in meters, it is impossible in a stereo system to
obtain measurements in meters. We therefore
select the camera baseline as our unit of length.
The frame SF is defined to be a right handed
frame in which the two optical centres lie on the
x-axis symmetrically around the origin.
The optical centres therefore have fixed
coordinates in SF, (-½,0,0) for the left camera
and (+½,0,0) for the right camera. The
orientations of the left and right lens frames
are defined by two sets of Euler angles
$(\varphi_x, \varphi_y, \varphi_z)$. The lenses are located at the origins of
the lens frames, oriented in the $xy$ planes. We
assume radial symmetry in the lenses and thus
we can assume $\varphi_z = 0$. The other two angles are
modelled by $\mu = 0$ and $\sigma = 2$ rad. This introduces
a small bias towards cameras that are aimed at
the same object. The reference frame $SF$ is
defined up to a rotation around the x-axis. We
can therefore introduce an arbitrary equation
that eliminates either $\varphi_{x,L}$ or $\varphi_{x,R}$,
such as $\varphi_{x,L} + \varphi_{x,R} = 0$.
We assume the ccd to be perfectly flat and
to have perfectly perpendicular image axes. The
image formation is invariant to a scaling of the
triplet (focal length, horizontal pixel size, vertical pixel
size). Therefore we choose, without loss of
generality, the horizontal size of the pixels equal
to 1 (for both cameras) and the vertical size equal
to $R_{L/R}$, the pixel ratio. The ratio is modelled by
$\mu = 1$, $\sigma = 0.3$.
The positions of the projection frames $PF_{L/R}$
(ccd chip) relative to the lens frames $LF_{L/R}$ are
defined by two vectors $(O^{PF}_{x,LF}, O^{PF}_{y,LF}, O^{PF}_{z,LF})$.
The first two components define the intersection of
the lens optical axis with the ccd (mispositioning)
and are modelled by $\mu = 0$, $\sigma = 10$
(pixels). The third component is the focal length.
The orientation of the projection frames $PF_{L/R}$
(ccd chip) relative to the lens frames $LF_{L/R}$ is
defined by two sets of Euler angles $(\varphi_x, \varphi_y, \varphi_z)$.
$\varphi_z$ defines the rotation of the projection frame.
The other two angles
define non-perpendicular ccd placement and are
modelled by $\mu = 0$, $\sigma = 5$.
Since ccd misplacement is incorporated in
several of the previous parameters, lens
distortion can be modelled more simply than in [6].
We use only the radial distortion parameter $K_3$,
with $\mu = 0$, $\sigma = 0.3$.
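The priors above amount to independent Gaussian densities, one per parameter. A minimal sketch of drawing a parameter sample from them, restricted to the parameters whose $\mu$ and $\sigma$ survive in the text (units as stated there; the dictionary keys are illustrative names):

```python
import numpy as np

rng = np.random.default_rng(0)

# Gaussian priors (mu, sigma) as listed in section 3
priors = {
    "pixel_ratio":   (1.0, 0.3),    # vertical / horizontal pixel size
    "axis_offset_x": (0.0, 10.0),   # optical axis vs. ccd centre, pixels
    "axis_offset_y": (0.0, 10.0),
    "K3":            (0.0, 0.3),    # radial distortion coefficient
}

# One sample of the full parameter vector from its prior
sample = {name: rng.normal(mu, sigma) for name, (mu, sigma) in priors.items()}
for name, value in sample.items():
    print(f"{name}: {value:+.3f}")
```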
4. Extraction of calibration
parameters
Once we have the correspondence field $m$, we
can extract the camera parameters. We use the
following procedure.
In the stereo frame $SF$ we introduce a new
coordinate, the epipolar coordinate $e$:

$$e_{SF} = \arctan\left(\frac{y_{SF}}{z_{SF}}\right) \qquad (6)$$
Given a point $P^{PL}$ in the left image, we use the
stereo camera model to construct the line in
space on which all possible corresponding
scene points $P^{SL}$ lie. All these points have the
same epipolar coordinate $e_{SF}(P^{SL})$. Using the field
$m$, we do the same with the right image point.
The difference in epipolar coordinate between the
left and right points is due to errors in the
correspondence field $m$ and to mismatch between the
assumed camera model and the actual cameras.
The difference is modelled by a Gaussian
probability density function with $\mu = 0$ and $\sigma_{epi}$
to be determined by experiment.
The total probability for the parameters, given the
prior knowledge defined in section 3 and the
deviations from (6), is defined as $P_{TOT} = e^{-E_{TOT}}$,
with $E_{TOT}$ equal to:

$$E_{TOT} = \sum_{\substack{\text{all} \\ \text{parameters}}} \left(\frac{Q_i - \mu_i}{\sigma_i}\right)^2 + \sum_{\substack{\text{all pixel} \\ \text{pairs}}} \left(\frac{e_{SF}(P^{SL}) - e_{SF}(P^{SR})}{\sigma_{epi}}\right)^2 \qquad (7)$$
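Equations (6) and (7) translate directly into code; the priors, point coordinates and $\sigma_{epi}$ below are toy values chosen for illustration, not those of the paper:

```python
import numpy as np

def epipolar_coord(y, z):
    """Epipolar coordinate (6): e_SF = arctan(y_SF / z_SF), for points in front of the rig."""
    return np.arctan2(y, z)

def e_tot(params, priors, e_left, e_right, sigma_epi):
    """E_TOT of (7): Gaussian parameter priors plus epipolar-coordinate mismatch."""
    prior_term = sum(((q - mu) / sigma) ** 2
                     for q, (mu, sigma) in zip(params, priors))
    epi_term = np.sum(((e_left - e_right) / sigma_epi) ** 2)
    return prior_term + epi_term

# Toy example: two parameters with Gaussian priors, two pixel pairs
priors = [(0.0, 0.3), (1.0, 0.3)]   # e.g. K3 and the pixel ratio
eL = epipolar_coord(np.array([0.10, 0.20]), np.array([4.0, 4.0]))
eR = epipolar_coord(np.array([0.10, 0.21]), np.array([4.0, 4.0]))
print(e_tot([0.1, 1.2], priors, eL, eR, sigma_epi=0.01))
```

The energy is zero exactly when every parameter sits at its prior mean and the left and right epipolar coordinates agree for every pixel pair.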
We use a MAP criterion to define the best set of
parameters. A simulated annealing procedure is
used to maximise $P_{TOT}$. This procedure does not
require the computation of complex derivatives
of $P_{TOT}$ with respect to the parameters. The
search space is small enough to allow a slow
cooling procedure that finds the global
optimum.
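A generic simulated-annealing loop of the kind described, shown on a toy two-parameter quadratic energy standing in for $E_{TOT}$ (step size, cooling schedule and iteration count are illustrative choices, not the paper's):

```python
import math
import random

def simulated_annealing(energy, x0, step=0.1, t0=1.0, cooling=0.999,
                        iters=20000, seed=0):
    """Minimise `energy` by random perturbations with Metropolis acceptance
    and a slow geometric cooling schedule; no derivatives needed."""
    rng = random.Random(seed)
    x, e = list(x0), energy(x0)
    best_x, best_e = list(x), e
    t = t0
    for _ in range(iters):
        # Perturb one randomly chosen parameter
        cand = list(x)
        cand[rng.randrange(len(x))] += rng.gauss(0.0, step)
        e_cand = energy(cand)
        # Accept downhill moves always, uphill moves with probability exp(-dE/T)
        if e_cand < e or rng.random() < math.exp(-(e_cand - e) / t):
            x, e = cand, e_cand
            if e < best_e:
                best_x, best_e = list(x), e
        t *= cooling
    return best_x, best_e

# Toy quadratic energy with its minimum at (0.5, -0.25)
energy = lambda p: (p[0] - 0.5) ** 2 + (p[1] + 0.25) ** 2
x, e = simulated_annealing(energy, [3.0, 3.0])
print(x, e)  # near (0.5, -0.25) with small residual energy
```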
5. Preliminary results
We are currently working on the motion
estimator with energy terms (5). We are
considering an extra smoothness term that is
necessary even when complete
calibration data is available [3].
We expect to improve on the results reported
in [8], since in our approach a priori calibration
information is used and the number of
corresponding points used for parameter
extraction is several orders of magnitude higher.
Acknowledgement
This work is done in the framework of the
European ACTS project PANORAMA. One of
the major goals of this project is a hardware
realisation of a real time multi viewpoint
autostereoscopic video system.
References
[1] D.V. Papadimitriou and T.J. Dennis, "Epipolar line estimation and rectification for stereo image pairs", IEEE Transactions on Image Processing, Vol. 5, No. 4, 1996, pp. 672-676.
[2] F. Pedersini, D. Pele, A. Sarti and S. Tubaro, "Calibration and self-calibration of multi-ocular camera systems", in Proceedings of the International Workshop on Synthetic-Natural Hybrid Coding and Three Dimensional Imaging (IWSNHC3DI'97), Rhodes, Greece, pp. 81-84, 1997.
[3] P.A. Redert, C.J. Tsai, E.A. Hendriks and A.K. Katsaggelos, "Disparity estimation with modeling of occlusion and object orientation", in Proceedings of SPIE VCIP'98, Vol. 3309, pp. 798-808, 1998.
[4] G.P. Stein, "Lens distortion calibration using point correspondences", in IEEE Conference on CVPR, pp. 602-609, 1997.
[5] C. Stiller, "Object-based estimation of dense motion fields", IEEE Transactions on Image Processing, Vol. 6, No. 2, pp. 234-250, 1997.
[6] J. Weng, P. Cohen and M. Herniou, "Camera calibration with distortion models and accuracy evaluation", IEEE Transactions on PAMI, Vol. 14, No. 10, pp. 965-980, 1992.
[7] H.J. Woltring, "Simultaneous multiframe analytical calibration (SMAC) by recourse to oblique observations of planar control distributions", SPIE Vol. 166, Applications of Human Biostereometrics, pp. 124-135, 1978.
[8] Z. Zhang, "On the epipolar geometry between two images with lens distortion", in Proceedings of the Int. Conf. on Pattern Recognition (ICPR), Vol. 1, pp. 407-411, 1996.