Zhang’s Camera Calibration Algorithm:
In-Depth Tutorial and Implementation
by
Wilhelm Burger
wilhelm.burger@fh-hagenberg.at
Technical Report HGB16-05
16th May, 2016 (revised April 2020)
Department of Digital Media
University of Applied Sciences Upper Austria, School of Informatics,
Communications and Media, Softwarepark 11, 4232 Hagenberg, Austria
www.fh-hagenberg.at
Copyright © 2016 by the author(s). This report may be freely used, copied, printed and
distributed in its entirety on paper or electronically for academic purposes. Any copies must
include this cover page and notice. Any non-academic use is subject to explicit permission by
the authors.
Abstract
This report details the algorithmic steps involved in the well-known camera
calibration method by Zhang and describes an associated open-source Java
implementation that depends only upon the Apache Commons Math library.
Index Terms: Computer vision, camera calibration, Zhang's method, camera
projection, imaging geometry, image rectification, Java implementation.
Software Repository: https://github.com/imagingbook/imagingbook-calibrate
Citation
Wilhelm Burger: Zhang's Camera Calibration Algorithm: In-Depth Tutorial and
Implementation, Technical Report HGB16-05, University of Applied Sciences Upper
Austria, School of Informatics, Communications and Media, Dept. of Digital Media,
Hagenberg, Austria, May 2016.
https://www.researchgate.net/publication/303233579_Zhang's_Camera_Calibration_Algorithm_In-Depth_Tutorial_and_Implementation
@techreport{BurgerCalibration2016,
  author      = {Burger, Wilhelm},
  title       = {Zhang's Camera Calibration Algorithm: In-Depth Tutorial
                 and Implementation},
  language    = {english},
  institution = {University of Applied Sciences Upper Austria, School of
                 Informatics, Communications and Media, Dept. of Digital
                 Media},
  address     = {Hagenberg, Austria},
  number      = {HGB16-05},
  year        = {2016},
  month       = {05},
  url         = {https://www.researchgate.net/publication/303233579_Zhang's_Camera_Calibration_Algorithm_In-Depth_Tutorial_and_Implementation}
}
1 Introduction
Accurate knowledge of the image projection parameters is an essential prerequisite
for any kind of quantitative geometric measurement in computer vision.
The real projection parameters depend on numerous technical elements and are
usually not provided by the manufacturers of imaging systems. Moreover, the
projection parameters may vary over time, e.g., in the case of cameras equipped
with zoom lenses.
Many approaches to camera calibration exist (see, e.g., [9, p. 226]), differing
in what is assumed to be known about the 3D scene. Some approaches make use
of a special, calibrated 3D setup (calibration rig), where the positions of all
3D points and the camera center are known. Other approaches, such as the one
by Zhang described here, use multiple views of a 3D pattern of known structure
but unknown position and orientation in space. Finally, calibration methods
exist that make no assumptions about the 3D structure of the scene, using
multiple views of arbitrary, rigid structures; this is commonly called "self
calibration". In this case the intrinsic camera parameters and the extrinsic
viewing parameters (3D structure) are recovered together. Based on the imaging
model described in Sec. 2, the following parameters are recovered:
• The intrinsic camera parameters, i.e., the inner transformations of the
  camera, including focal length, position of the principal point, sensor
  scale and skew.
• The parameters of the non-linear lens distortion.
• The extrinsic transformation parameters (3D rotation and translation) for
  each of the given views of the reference pattern.
2 The perspective projection model
This section describes the underlying projection process from 3D world points
to 2D sensor coordinates and outlines the associated notation.
2.1 The pinhole camera model
The simple and well-known pinhole camera model (see, e.g., [2, Ch. 1]) is
used to describe the projection of 3D world points onto the camera's sensor
plane. We assume that the image plane is positioned in front of the optical
center, so the image is not upside-down. The image plane is positioned at
distance $f$ from the optical center $C = (0, 0, 0)^T$, perpendicular to the
optical axis. The optical center $C$ is the origin of the 3D camera coordinate
system. The optical axis aligns with the $Z$-axis of this coordinate system and
intersects the image plane at $(0, 0, f)^T$. Throughout this text, we use the
definitions listed in Table 1.

We assume that initially the camera coordinate system is identical to the
world coordinate system (we later remove this constraint and use two separate
coordinate systems). From similar triangles, every 3D point $X = (X, Y, Z)^T$
Table 1: List of symbols and notation used in this document.

  $X = (X, Y, Z)^T$ ................ a 3D world point
  $X = (X_0, \ldots, X_{N-1})$ ..... the 3D target points, with $X_j = (X_j, Y_j, 0)^T$
  $X = \mathrm{hom}(X)$ ............ homogeneous coordinate for a Cartesian coordinate $X$
  $X = \mathrm{hom}^{-1}(X)$ ....... Cartesian coordinate for a homogeneous coordinate $X$
  $x = (x, y)^T$ ................... a projected 2D point in the normalized image plane
  $\tilde{x} = (\tilde{x}, \tilde{y})^T$ ... a 2D point after lens distortion (in the normalized image plane)
  $u = (u, v)^T$ ................... a projected 2D sensor point
  $\dot{u} = (\dot{u}, \dot{v})^T$ . an observed 2D sensor point
  $\dot{U}_i = (\dot{u}_{i,0}, \ldots, \dot{u}_{i,N-1})$ ... the observed sensor points for view $i$
  $\dot{U} = (\dot{U}_0, \ldots, \dot{U}_{M-1})$ ... the observed sensor points for all $M$ views
  $A$ .............................. intrinsic camera matrix (Eqn. (15))
  $a$ .............................. vector of intrinsic camera parameters, including the distortion coefficients (Eqn. (120))
  $f$ .............................. focal length
  $R$ .............................. a 3×3 rotation matrix
  $\rho = (\rho_x, \rho_y, \rho_z)$  a 3D rotation (Rodrigues) vector (Eqn. (126))
  $t = (t_x, t_y, t_z)^T$ .......... a 3D translation vector
  $W = (R \,|\, t)$ ................ an extrinsic camera (view) matrix (Eqn. (16))
  $w$ .............................. vector of 3D view parameters (Eqn. (121))
  $W = (W_i)$ ...................... a sequence of camera views
  $\check{P}(W, X)$ ................ the 3D to 2D projection mapping the 3D point $X$ to normalized image coordinates, with view parameters $W$ (Eqn. (20))
  $P(A, W, X)$ ..................... the 3D to 2D projection mapping the 3D point $X$ to the associated 2D sensor point $u$, with camera intrinsics $A$ and view parameters $W$ (Eqn. (24))
  $P(A, k, W, X)$ .................. the 3D to 2D projection including camera lens distortion with parameters $k$ (Eqn. (28))
yields the relations

$$\frac{X}{Z} = \frac{x}{f}, \qquad \frac{Y}{Z} = \frac{y}{f} \tag{1}$$

and thus the projection point in the image plane is

$$x = f \cdot \frac{X}{Z}, \qquad y = f \cdot \frac{Y}{Z} \tag{2}$$

or, in vector notation,

$$x = \begin{pmatrix} x \\ y \end{pmatrix} = \frac{f}{Z} \cdot \begin{pmatrix} X \\ Y \end{pmatrix}. \tag{3}$$

The world point $X$ lies on a ray which passes through the optical center
$C = (0, 0, 0)^T$ and the corresponding image point $x_i$, i.e.,

$$X_i = \lambda \cdot (x_i - C) = \lambda \cdot x_i, \tag{4}$$

for some $\lambda > 1$.
2.2 The projection matrix
Equations (2) and (3) describe nonlinear transformations in the domain of
Cartesian coordinates. Using homogeneous coordinates,¹ the perspective
transformation can be written as a (linear) matrix equation

$$\begin{pmatrix} x \\ y \end{pmatrix} \equiv \frac{f}{Z} \cdot \begin{pmatrix} X \\ Y \end{pmatrix} \equiv \begin{pmatrix} fX/Z \\ fY/Z \\ 1 \end{pmatrix} \equiv \begin{pmatrix} fX \\ fY \\ Z \end{pmatrix} = \underbrace{\begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}}_{M_P} \cdot \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \tag{5}$$

or, written more compactly,²

$$x = \mathrm{hom}^{-1}\bigl(M_P \cdot \mathrm{hom}(X)\bigr). \tag{6}$$

The projection matrix $M_P$ can be decomposed into two matrices $M_f$ and $M_0$
in the form

$$M_P = \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} = \underbrace{\begin{pmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{pmatrix}}_{M_f} \cdot \underbrace{\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}}_{M_0} = M_f \cdot M_0, \tag{7}$$

where $M_f$ models the internals of the (ideal) pinhole camera with focal length
$f$ and $M_0$ describes the transformation between the camera coordinates and the
world coordinates. In particular, $M_0$ is often referred to as the standard (or
canonical) projection matrix [9], which corresponds to the simple viewing
geometry (the optical axis being aligned with the $Z$-axis) assumed so far.
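As a small illustration, the decomposition in Eqn. (7) can be reproduced with the Apache Commons Math library used throughout this report; the class and variable names below are our own and not part of the accompanying implementation:

import org.apache.commons.math3.linear.MatrixUtils;
import org.apache.commons.math3.linear.RealMatrix;

public class ProjectionMatrixDemo {
    public static void main(String[] args) {
        double f = 1.5;  // some focal length
        // M_f: internals of the ideal pinhole camera (Eqn. (7))
        RealMatrix Mf = MatrixUtils.createRealMatrix(new double[][] {
            {f, 0, 0},
            {0, f, 0},
            {0, 0, 1}});
        // M_0: standard (canonical) projection matrix (Eqn. (7))
        RealMatrix M0 = MatrixUtils.createRealMatrix(new double[][] {
            {1, 0, 0, 0},
            {0, 1, 0, 0},
            {0, 0, 1, 0}});
        RealMatrix MP = Mf.multiply(M0);  // M_P = M_f * M_0 (Eqn. (7))
        // apply M_P to a homogeneous world point X = (X, Y, Z, 1)^T (Eqn. (5))
        double[] xh = MP.operate(new double[] {2, 1, 4, 1});
        double x = xh[0] / xh[2], y = xh[1] / xh[2];  // hom^{-1}
        System.out.println("x = " + x + ", y = " + y); // equals (f*X/Z, f*Y/Z)
    }
}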
2.3 Viewing under rigid motion
If the camera has its own (non-canonical) coordinate system, it observes 3D
points that were subjected to rigid body motion, as described in Sec. A.2. Thus
the projective transformation MP(Eqn. (5)) is now applied to the modified (ro-
tated and translated) points X0instead of the original 3D points X= hom(X),
that is,
x= hom−1[MP·X0] = hom−1[MP·Mrb ·hom(X)],(8)
where the matrix Mrb specifies some rigid body motion in 3D.3The com-
plete perspective imaging transformation for the ideal pinhole camera with focal
length funder rigid motion can thus be written as
x= hom−1[Mf·M0·Mrb ·hom(X)] (9)
1See also Sec. A.1 in the Appendix.
2The operator x= hom(x) converts Cartesian coordinates to homogeneous coordinates.
Inversely, x= hom−1(x) denotes the conversion from homogeneous to Cartesian coordinates
(see Sec. A.1 of the Appendix).
3See also Sec. A.2.3, Eqn. (170) in the Appendix.
or, by combining $M_0$ and $M_{rb}$ into a single matrix,

$$\begin{pmatrix} x \\ y \end{pmatrix} = \mathrm{hom}^{-1}\Bigl[\begin{pmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \cdot \begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}\Bigr]$$

$$= \mathrm{hom}^{-1}\Bigl[\underbrace{\begin{pmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{pmatrix}}_{M_f} \cdot \underbrace{\begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{pmatrix}}_{(R \,|\, t)} \cdot \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}\Bigr] \tag{10}$$

$$= \mathrm{hom}^{-1}\bigl[M_f \cdot (R \,|\, t) \cdot \mathrm{hom}(X)\bigr]. \tag{11}$$

In the special case of $f = 1$, $M_f$ becomes the identity matrix and can thus be
omitted, that is,

$$x = \mathrm{hom}^{-1}\bigl[(R \,|\, t) \cdot \mathrm{hom}(X)\bigr]. \tag{12}$$

In the following, this is referred to as the "normalized projection".
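For illustration, the normalized projection of Eqn. (12) can be sketched in a few lines of Java; the method names (hom, projectNormalized) and the use of plain double arrays are our own choices, not the API of the accompanying library:

// Sketch of the normalized projection x = hom^{-1}[(R|t) * hom(X)] (Eqn. (12)).
public class NormalizedProjection {

    // hom(): converts a Cartesian 3D point to homogeneous coordinates
    static double[] hom(double[] X) {
        return new double[] {X[0], X[1], X[2], 1.0};
    }

    // Applies the 3x4 matrix (R|t) to hom(X) and converts back to Cartesian 2D.
    static double[] projectNormalized(double[][] Rt, double[] X) {
        double[] Xh = hom(X);
        double[] xh = new double[3];
        for (int r = 0; r < 3; r++)
            for (int c = 0; c < 4; c++)
                xh[r] += Rt[r][c] * Xh[c];
        // hom^{-1}(): divide by the last (homogeneous) component
        return new double[] {xh[0] / xh[2], xh[1] / xh[2]};
    }
}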
2.4 Intrinsic camera parameters

A final small step is required to make the perspective imaging transformation
in Eqn. (11) useful as a model for real cameras. In particular, we need to define
how the continuous $x/y$ coordinates on the image plane map to actual pixel
coordinates, taking into account

• the (possibly different) sensor scales $s_x$, $s_y$ in the x- and y-direction,
  respectively,
• the location of the image center $u_c = (u_c, v_c)$ with respect to the image
  coordinate system (i.e., the optical axis), and
• the skew (diagonal distortion) $s_\theta$ of the image plane (which is usually
  negligible or zero).

The final sensor coordinates $u = (u, v)^T$ are obtained from the normalized
image coordinates $x = (x, y)^T$ (see Eqn. (12)) as

$$\begin{pmatrix} u \\ v \end{pmatrix} = \mathrm{hom}^{-1}\Bigl[\underbrace{\begin{pmatrix} s_x & s_\theta & u_c \\ 0 & s_y & v_c \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{pmatrix}}_{A} \cdot \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}\Bigr] \tag{13}$$

$$= \mathrm{hom}^{-1}[A \cdot \mathrm{hom}(x)], \tag{14}$$

where

$$A = \begin{pmatrix} f s_x & f s_\theta & u_c \\ 0 & f s_y & v_c \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \alpha & \gamma & u_c \\ 0 & \beta & v_c \\ 0 & 0 & 1 \end{pmatrix} \tag{15}$$

is the so-called intrinsic camera matrix. Taking these additional inner camera
parameters into account, the complete perspective imaging transformation can
now be written as
$$\begin{pmatrix} u \\ v \end{pmatrix} = \mathrm{hom}^{-1}\Bigl[\underbrace{\begin{pmatrix} \alpha & \gamma & u_c \\ 0 & \beta & v_c \\ 0 & 0 & 1 \end{pmatrix}}_{A} \cdot \underbrace{\begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{pmatrix}}_{W = (R \,|\, t)} \cdot \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}\Bigr] \tag{16}$$

$$= \mathrm{hom}^{-1}[A \cdot W \cdot \mathrm{hom}(X)], \tag{17}$$

where $A$ captures the intrinsic properties of the camera ("intrinsics") and
$W = (R \,|\, t)$ holds the extrinsic parameters of the projection transformation
("extrinsics"). Note that we can calculate the projection in Eqn. (17) in two
steps:
Step 1: Calculate the normalized projection $x = (x, y)^T$ (see Eqn. (12)):

$$\begin{pmatrix} x \\ y \end{pmatrix} = \mathrm{hom}^{-1}[W \cdot \mathrm{hom}(X)] \tag{18}$$
$$= \mathrm{hom}^{-1}\Bigl[\begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{pmatrix} \cdot \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}\Bigr] \tag{19}$$
$$= \check{P}(W, X). \tag{20}$$

Step 2: Map from normalized coordinates $x$ to sensor coordinates $u = (u, v)^T$
by the affine transformation $A$ (see Eqn. (14)):

$$\begin{pmatrix} u \\ v \end{pmatrix} = \mathrm{hom}^{-1}[A \cdot \mathrm{hom}(x)] = A' \cdot \mathrm{hom}(x) \tag{21}$$
$$= \mathrm{hom}^{-1}\Bigl[\begin{pmatrix} \alpha & \gamma & u_c \\ 0 & \beta & v_c \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}\Bigr] = \underbrace{\begin{pmatrix} \alpha & \gamma & u_c \\ 0 & \beta & v_c \end{pmatrix}}_{A'} \cdot \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \tag{22}$$

where $A'$ is the upper 2×3 submatrix of $A$. Note that, by using $A'$, no explicit
conversion to Cartesian coordinates ($\mathrm{hom}^{-1}$) is required in Eqn. (22). $A$ and $A'$
are affine mappings in 2D. Combining the two steps above, we can summarize
the whole 3D-to-2D projection process (from world coordinates $X$ to sensor
coordinates $u$) in a single expression,

$$u = A' \cdot \mathrm{hom}[x] = A' \cdot \mathrm{hom}\bigl[\mathrm{hom}^{-1}[W \cdot \mathrm{hom}(X)]\bigr] = P(A, W, X). \tag{23}$$

We will refer to $P(A, W, X)$, as defined in Eqn. (23), as the projection function,
which maps the 3D point $X = (X, Y, Z)^T$ (defined in world coordinates) to the
2D sensor point $u = (u, v)^T$, using the intrinsic parameters $A$ and the extrinsic
(view) parameters $W$. This function can be separated into two component
functions in the form

$$P(A, W, X) = \begin{pmatrix} P_x(A, W, X) \\ P_y(A, W, X) \end{pmatrix} = \begin{pmatrix} u \\ v \end{pmatrix} = u. \tag{24}$$

We will build on this notation in the following steps.
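Building on the projectNormalized() method from the sketch in Sec. 2.3, the two-step projection P(A, W, X) of Eqn. (23) may be written as follows; again the names are illustrative, and A is passed as a plain 3×3 array filled according to Eqn. (15):

// Sketch of P(A, W, X) (Eqn. (23)): normalized projection followed by
// the affine mapping with A' (the upper 2x3 submatrix of A, Eqn. (22)).
static double[] project(double[][] A, double[][] Rt, double[] X) {
    double[] x = projectNormalized(Rt, X);             // step 1 (Eqn. (20))
    double u = A[0][0]*x[0] + A[0][1]*x[1] + A[0][2];  // alpha*x + gamma*y + uc
    double v =               A[1][1]*x[1] + A[1][2];   //          beta*y + vc
    return new double[] {u, v};                        // step 2 (Eqn. (22))
}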
The primary goal of camera calibration is to determine the unknown
intrinsic properties of the camera, i.e., the elements of the matrix A
(as detailed in Sec. 3).
2.5 Lens distortion

So far we have relied on the naive pinhole camera model, which exhibits no
distortions beyond the projective transformation described above. Real cameras
are built with lenses instead of pinholes, and these introduce additional
geometric distortions that include two main components [8, p. 342]:

• Decentering errors, caused by a displacement of the lens center from
  the optical axis (this is mostly taken care of by the variable offset $(u_c, v_c)$
  in Eqn. (13));
• Radial distortion, caused by variations in light refraction, which is
  typically apparent in wide-angle lenses ("barrel distortion").

While lens distortion is a complex physical phenomenon in general, it is usually
modeled with sufficient accuracy as a single-variable polynomial function $D(r)$
of the radial distance $r$ from the lens center [8, p. 343] (see Sec. 2.5.2, Eqn. (32)).
2.5.1 Where does the lens distortion come in?

Lens distortion affects the normalized projection coordinates $x$, i.e., before the
image-to-sensor transformation (defined by the intrinsic camera parameters) is
applied. Before we investigate the actual distortion model, we define a general
distortion function $\mathrm{warp}: \mathbb{R}^2 \to \mathbb{R}^2$, which maps an undistorted 2D coordinate
$x$ to a distorted 2D coordinate $\tilde{x}$ (again in the normalized projection plane) by

$$\tilde{x} = \mathrm{warp}(x, k), \tag{25}$$

where $k$ is a vector of distortion parameters. With this definition, we can
reformulate the projection process in Eqn. (23) to include the lens distortion as

$$u = A' \cdot \mathrm{hom}[\tilde{x}] = A' \cdot \mathrm{hom}[\mathrm{warp}(x, k)] \tag{26}$$
$$= A' \cdot \mathrm{hom}\bigl[\mathrm{warp}\bigl(\underbrace{\mathrm{hom}^{-1}[W \cdot \mathrm{hom}(X)]}_{x}, k\bigr)\bigr] \tag{27}$$
$$= P(A, k, W, X). \tag{28}$$

In the following, we describe how the warp() function (referenced in Eqns. (26)–(27))
is specified and calculated.
2.5.2 Radial distortion model

A radial model is most commonly used for correcting geometric lens distortions.
By radial distortion we understand that the displacement is restricted to
radial lines emanating from the image center; the amount of radial displacement
(inwards or outwards) is a function of the radius only. In the normalized
projection plane, the optical axis intersects the image plane at $x_c = (0, 0)$, which
is assumed to be the center of the lens distortion. The radial distance $r_i$ of a
projected point $x_i = (x_i, y_i)^T$ from the center can thus be simply calculated as

$$r_i = \|x_i - x_c\| = \|x_i\| = \sqrt{x_i^2 + y_i^2}. \tag{29}$$

The distortion is assumed to be radially symmetric, i.e., it only depends on the
original radius $r_i$ of a given point $x_i$, as defined in Eqn. (29). The distortion
model can thus be specified by a single-variable function $D(r, k)$, such that the
distorted radius is

$$\tilde{r} = f_{rad}(r) = r \cdot [1 + D(r, k)]. \tag{30}$$

Consequently (see Eqn. (25)), the warped projection point is $\tilde{x}_i = \mathrm{warp}(x_i, k)$,
with

$$\mathrm{warp}(x_i, k) = x_i \cdot [1 + D(\|x_i\|, k)]. \tag{31}$$

The function $D(r, k)$ specifies the (positive or negative) radial deviation for a
given radius $r$. A simple but effective radial distortion model is based on the
polynomial function

$$D(r, k) = k_0 \cdot r^2 + k_1 \cdot r^4 = k \cdot \begin{pmatrix} r^2 \\ r^4 \end{pmatrix}, \tag{32}$$

with the (unknown) coefficients $k = (k_0, k_1)$, as illustrated in Fig. 1.⁴ Note that,
if $k_0 = k_1 = 0$, then also $D(r, k) = 0$ and there is no distortion.
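A direct transcription of Eqns. (30)–(32) into Java might look as follows; this is a sketch with illustrative method names, assuming k = (k0, k1):

// Radial deviation D(r, k) = k0*r^2 + k1*r^4 (Eqn. (32)).
static double D(double r, double k0, double k1) {
    double r2 = r * r;
    return k0 * r2 + k1 * r2 * r2;
}

// warp(x, k) = x * [1 + D(||x||, k)] (Eqn. (31)), center assumed at (0,0).
static double[] warp(double[] x, double k0, double k1) {
    double r = Math.hypot(x[0], x[1]);  // radius in the normalized plane
    double d = 1.0 + D(r, k0, k1);
    return new double[] {x[0] * d, x[1] * d};
}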
[Figure 1: Plot of the radial deviation function D(r, k) in Eqn. (32) for
coefficients k = (k0, k1) = (−0.2286, 0.190335).]
⁴Other formulations of the radial distortion function can be found in the literature. For
example, $D(r) = k_0 r^2 + k_1 r^4 + k_2 r^6$ is used in [8, p. 343], and $D(r) = k_0 r + k_1 r^2 + k_2 r^3 + k_3 r^4$
by Devernay and Faugeras (mentioned in [9, p. 58]). For a detailed analysis of radial
distortion in fisheye lenses see [3].

[Figure 2: Plot of the radial distortion $\tilde{r} = f_{rad}(r) = r \cdot [1 + D(r, k)]$ in
Eqn. (30) for the same parameters k as in Fig. 1.]

[Figure 3: Summary of the projection chain (from right to left). In (c) the
3D point X (in camera coordinates) is projected (with f = 1) onto the "ideal"
image plane to the normalized coordinates $x = (x, y)^T$. Radial lens distortion
in (b) maps point $x$ to $\tilde{x} = (\tilde{x}, \tilde{y})^T$. The affine mapping specified by the
intrinsic camera transformation (matrix A) finally yields the observed sensor
image coordinates $u = (u, v)^T$ in (a).]

2.6 Summary of the projection process

In summary, the following steps model the complete projection process (see
Fig. 3; a code sketch combining all four steps follows the list):
1. World-to-camera transformation: Given a point $(X, Y, Z)^T$, expressed
   in 3D world coordinates, its position in the 3D camera coordinate system
   is specified by the viewing transformation $W$ (see Eqn. (168)),

$$\begin{pmatrix} X' \\ Y' \\ Z' \\ 1 \end{pmatrix} = \underbrace{\begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}}_{W} \cdot \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}, \tag{33}$$

   in homogeneous coordinates, or simply

$$X' = W \cdot X. \tag{34}$$
2. Projection onto the "normalized" (ideal) image plane: The perspective
   projection from the 3D point $X' = (X', Y', Z')^T$ (in 3D camera
   coordinates) onto continuous, normalized 2D coordinates $x = (x, y)^T$ on
   the image plane is defined (see Eqn. (5)) as

$$\begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{Z'} \cdot \begin{pmatrix} X' \\ Y' \end{pmatrix} = \mathrm{hom}^{-1}\Bigl[\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \cdot \begin{pmatrix} X' \\ Y' \\ Z' \\ 1 \end{pmatrix}\Bigr] = \check{P}(W, X), \tag{35}$$

   which is equivalent to an ideal pinhole projection with focal length $f = 1$.
3. Radial lens distortion: The normalized 2D projection coordinates
   $x = (x, y)^T$ are subjected to a non-linear radial distortion with respect to
   the optical center $x_c = (0, 0)^T$, expressed by the mapping

$$\tilde{x} = \mathrm{warp}(x, k) \quad \text{or} \quad \begin{pmatrix} \tilde{x} \\ \tilde{y} \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix} \cdot [1 + D(r, k)], \tag{36}$$

   with $r = \sqrt{x^2 + y^2}$ and $D(r, k)$ as defined in Eqn. (32). $\tilde{x} = (\tilde{x}, \tilde{y})^T$ are
   the lens-distorted 2D coordinates, still in the normalized image plane.
4. Affine 2D transformation to sensor coordinates: The normalized
   projection points are finally mapped to the scaled and skewed sensor
   coordinates (see Eqn. (13)) by the affine transformation

$$u = A' \cdot \mathrm{hom}[\tilde{x}] \quad \text{or} \quad \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} \alpha & \gamma & u_c \\ 0 & \beta & v_c \end{pmatrix} \cdot \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ 1 \end{pmatrix}, \tag{37}$$

   where $\alpha, \beta, \gamma, u_c, v_c$ are the intrinsic camera parameters (see Eqns. (15)
   and (22)).
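Putting the fragments sketched earlier together (projectNormalized() from Sec. 2.3 and warp() from Sec. 2.5.2), the complete chain reads as follows, under the same illustrative naming:

// Complete projection chain P(A, k, W, X) (Eqn. (28)): steps 1-4 of Sec. 2.6.
static double[] projectFull(double[][] A, double[][] Rt,
                            double k0, double k1, double[] X) {
    double[] x  = projectNormalized(Rt, X);  // steps 1+2 (Eqns. (33)-(35))
    double[] xd = warp(x, k0, k1);           // step 3 (Eqn. (36))
    double u = A[0][0]*xd[0] + A[0][1]*xd[1] + A[0][2];  // step 4
    double v =                 A[1][1]*xd[1] + A[1][2];  // (Eqn. (37))
    return new double[] {u, v};
}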
3 Plane-based self calibration

The popular camera calibration method by Zhang [14, 15] uses a few (at least
two) views of a planar calibration pattern, called "model" or "target", whose
layout and metric dimensions are precisely known.⁵ The calibration procedure
works roughly as follows:

1. Images $I_0, \ldots, I_{M-1}$ of the model are taken under different views by
   moving the model or the camera (or both).
2. From each image $I_i$ ($i = 0, \ldots, M-1$), $N$ sensor points
   $\dot{u}_{i,0}, \ldots, \dot{u}_{i,j}, \ldots, \dot{u}_{i,N-1}$ are extracted (observed), assumed to be in
   1:1 correspondence with the points on the model plane.
3. From the observed points, the associated homographies $H_0, \ldots, H_{M-1}$
   (linear mappings between the model points and the observed 2D image points)
   are estimated for each view $i$ (see Sec. 3.2).
4. From the view homographies $H_i$, the five intrinsic parameters ($\alpha, \beta, \gamma,
   u_c, v_c$) of the camera are estimated using a closed-form (linear) solution,
   ignoring any lens distortion at this point. $M \geq 3$ views yield a unique
   solution (up to an undetermined scale factor). If the sensor plane is assumed
   to be without skew (i.e., $\gamma = 0$, which is a reasonable assumption), then
   $M = 2$ images are sufficient. More views generally lead to more accurate
   results (see Sec. 3.3).
5. Once the camera intrinsics are known, the extrinsic 3D parameters $R_i, t_i$
   are calculated for each camera view $i$ (see Sec. 3.4).
6. The radial distortion parameters $k_0, k_1$ are estimated by linear
   least-squares minimization (see Sec. 3.5).
7. Finally, using the estimated parameter values as an initial guess, all
   parameters are refined by non-linear optimization over all $M$ views (see
   Sec. 3.6).

These steps are explained in greater detail below (see [14] for a complete
description and the list of symbols in Table 1).

⁵A variant of this calibration technique is also implemented in OpenCV [7].
3.1 Calibration model and observed views

The calibration model contains $N$ reference points $X_0, \ldots, X_{N-1}$ whose 3D
coordinates are known. The points are assumed to lie in the $XY$-plane, i.e.,
their $Z$-component is zero.

We assume that $M$ different views (i.e., pictures) are taken of the model
plane and use $i = 0, \ldots, M-1$ to denote the $i$th view of the model. From
each camera picture $I_i$, $N$ feature points are extracted, so we get the observed
sensor points

$$\dot{u}_{i,j} \in \mathbb{R}^2, \tag{38}$$

with view numbers $i = 0, \ldots, M-1$ and point numbers $j = 0, \ldots, N-1$. Note
that every observed point $\dot{u}_{i,j}$ must correspond to the associated model point
$X_j$; thus the model points $X_j$ and the image points $\dot{u}_{i,j}$ must be supplied in
the same order. It is essential that this condition is met, since otherwise the
calibration will deliver invalid results.
3.2 Step 1: Estimating the homography for each view

Using Eqn. (17), the mapping (homography) between the observed image
coordinates $\dot{u}_{i,j} = (\dot{u}_{i,j}, \dot{v}_{i,j})^T$ and the corresponding 3D point coordinates $X_j$
can be expressed as

$$s \cdot \begin{pmatrix} \dot{u}_{i,j} \\ \dot{v}_{i,j} \\ 1 \end{pmatrix} = A \cdot (R_i \,|\, t_i) \cdot \begin{pmatrix} X_j \\ Y_j \\ Z_j \\ 1 \end{pmatrix} \tag{39}$$

or

$$s \cdot \dot{u}_{i,j} = A \cdot (R_i \,|\, t_i) \cdot X_j \tag{40}$$

(with homogeneous coordinates $\dot{u}$, $X$), where

$$A = \begin{pmatrix} \alpha & \gamma & u_c \\ 0 & \beta & v_c \\ 0 & 0 & 1 \end{pmatrix} \tag{41}$$

represents the intrinsic camera parameters (common to every view) and $s$ is
an arbitrary, non-zero scale factor. $R_i$, $t_i$ are the (extrinsic) 3D rotation matrix
and the translation vector for the specific view $i$.
Since the model points $X_j$ are assumed to lie in the $XY$-plane of the world
coordinate system (i.e., $Z_j = 0$ for all $j$),⁶ we can rewrite Eqn. (39) as

$$s \cdot \begin{pmatrix} \dot{u}_{i,j} \\ \dot{v}_{i,j} \\ 1 \end{pmatrix} = A \cdot \begin{pmatrix} r_{i,0} & r_{i,1} & r_{i,2} & t_i \end{pmatrix} \cdot \begin{pmatrix} X_j \\ Y_j \\ 0 \\ 1 \end{pmatrix} = A \cdot \begin{pmatrix} r_{i,0} & r_{i,1} & t_i \end{pmatrix} \cdot \begin{pmatrix} X_j \\ Y_j \\ 1 \end{pmatrix}, \tag{42}$$

where $r_{i,0}, r_{i,1}, r_{i,2}$ denote the three column vectors of the matrix $R_i$. Note
that $Z_j = 0$ makes the third vector ($r_{i,2}$) redundant, and it is therefore omitted
in the right part of Eqn. (42). This is equivalent to a 2D homography mapping

$$s \cdot \begin{pmatrix} \dot{u}_{i,j} \\ \dot{v}_{i,j} \\ 1 \end{pmatrix} = H_i \cdot \begin{pmatrix} X_j \\ Y_j \\ 1 \end{pmatrix}, \tag{43}$$

where $s$ is an (undetermined) non-zero scale factor and
$H_i = \lambda \cdot A \cdot (r_{i,0} \; r_{i,1} \; t_i)$ is a 3×3 homography matrix ($\lambda$ being an
arbitrary scalar value which can be ignored, since we work with homogeneous
coordinates). The matrix $H_i$ is composed of the three column vectors
$h_{i,0}, h_{i,1}, h_{i,2}$, that is,

$$H_i = \begin{pmatrix} h_{i,0} & h_{i,1} & h_{i,2} \end{pmatrix} = \lambda \cdot A \cdot \begin{pmatrix} r_{i,0} & r_{i,1} & t_i \end{pmatrix}. \tag{44}$$

Thus the task is to determine the homography matrix $H_i$ from a set of
corresponding 2D points $(\dot{u}_{i,j}, \dot{v}_{i,j})^T$ and $(X_j, Y_j)^T$.

⁶This only means that we assume a moving camera instead of a moving object plane,
which makes no difference because only the coordinates relative to the camera are required
for calibration.
3.2.1 Homography estimation with the Direct Linear Transformation (DLT)

Among the several methods for estimating homography mappings, the DLT
method is the simplest (see [5, Ch. 4]); it is also used in Zhang's original
implementation. We assume that the two corresponding 2D point sequences, the
model points $X = (X_0, \ldots, X_{N-1})$, with $X_j = (X_j, Y_j)$, and the associated
(observed) sensor points $\dot{U} = (\dot{u}_0, \ldots, \dot{u}_{N-1})$, with $\dot{u}_j = (\dot{u}_j, \dot{v}_j)^T$, are related
by a homography transformation, that is (written in homogeneous coordinates),

$$\dot{u}_j = H \cdot X_j \tag{45}$$

or

$$\begin{pmatrix} \dot{u}_j \\ \dot{v}_j \\ \dot{w}_j \end{pmatrix} = \begin{pmatrix} H_{0,0} & H_{0,1} & H_{0,2} \\ H_{1,0} & H_{1,1} & H_{1,2} \\ H_{2,0} & H_{2,1} & H_{2,2} \end{pmatrix} \cdot \begin{pmatrix} X_j \\ Y_j \\ Z_j \end{pmatrix}, \tag{46}$$

for $j = 0, \ldots, N-1$. Without loss of generality, we can set $Z_j = 1$ (so that the
homogeneous components coincide with the Cartesian coordinates $X_j$, $Y_j$) and
can thus rewrite Eqn. (46) as

$$\begin{pmatrix} \dot{u}_j \\ \dot{v}_j \\ \dot{w}_j \end{pmatrix} = \begin{pmatrix} H_{0,0} & H_{0,1} & H_{0,2} \\ H_{1,0} & H_{1,1} & H_{1,2} \\ H_{2,0} & H_{2,1} & H_{2,2} \end{pmatrix} \cdot \begin{pmatrix} X_j \\ Y_j \\ 1 \end{pmatrix}. \tag{47}$$
In Cartesian coordinates, this corresponds to a pair of non-linear equations,

$$u_j = \frac{\dot{u}_j}{\dot{w}_j} = \frac{H_{0,0} X_j + H_{0,1} Y_j + H_{0,2}}{H_{2,0} X_j + H_{2,1} Y_j + H_{2,2}}, \tag{48}$$
$$v_j = \frac{\dot{v}_j}{\dot{w}_j} = \frac{H_{1,0} X_j + H_{1,1} Y_j + H_{1,2}}{H_{2,0} X_j + H_{2,1} Y_j + H_{2,2}}, \tag{49}$$

which can be rearranged to

$$\dot{u}_j \cdot (H_{2,0} X_j + H_{2,1} Y_j + H_{2,2}) = H_{0,0} X_j + H_{0,1} Y_j + H_{0,2}, \tag{50}$$
$$\dot{v}_j \cdot (H_{2,0} X_j + H_{2,1} Y_j + H_{2,2}) = H_{1,0} X_j + H_{1,1} Y_j + H_{1,2}, \tag{51}$$

and finally

$$\dot{u}_j X_j H_{2,0} + \dot{u}_j Y_j H_{2,1} + \dot{u}_j H_{2,2} - H_{0,0} X_j - H_{0,1} Y_j - H_{0,2} = 0, \tag{52}$$
$$\dot{v}_j X_j H_{2,0} + \dot{v}_j Y_j H_{2,1} + \dot{v}_j H_{2,2} - H_{1,0} X_j - H_{1,1} Y_j - H_{1,2} = 0. \tag{53}$$

This is a pair of homogeneous equations (since the right-hand side is zero)
that are linear in the unknown coefficients $H_{r,c}$ (although, due to the mixed
terms, still non-linear w.r.t. the coordinates $\dot{u}_j, \dot{v}_j, X_j, Y_j$). By collecting the
nine elements of the unknown homography matrix $H$ into the vector

$$h = (H_{0,0}, H_{0,1}, H_{0,2}, H_{1,0}, H_{1,1}, H_{1,2}, H_{2,0}, H_{2,1}, H_{2,2})^T, \tag{54}$$

Eqns. (52) and (53) can be written in the form

$$\begin{pmatrix} -X_j & -Y_j & -1 & 0 & 0 & 0 & \dot{u}_j X_j & \dot{u}_j Y_j & \dot{u}_j \\ 0 & 0 & 0 & -X_j & -Y_j & -1 & \dot{v}_j X_j & \dot{v}_j Y_j & \dot{v}_j \end{pmatrix} \cdot h = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \tag{55}$$

for every corresponding point pair $(\dot{u}_j, \dot{v}_j) \leftrightarrow (X_j, Y_j)$. Thus, $N$ point pairs,
assumed to be related by the same homography $H$, yield a system of $2N$
homogeneous linear equations of the form

$$\begin{pmatrix}
-X_0 & -Y_0 & -1 & 0 & 0 & 0 & \dot{u}_0 X_0 & \dot{u}_0 Y_0 & \dot{u}_0 \\
0 & 0 & 0 & -X_0 & -Y_0 & -1 & \dot{v}_0 X_0 & \dot{v}_0 Y_0 & \dot{v}_0 \\
-X_1 & -Y_1 & -1 & 0 & 0 & 0 & \dot{u}_1 X_1 & \dot{u}_1 Y_1 & \dot{u}_1 \\
0 & 0 & 0 & -X_1 & -Y_1 & -1 & \dot{v}_1 X_1 & \dot{v}_1 Y_1 & \dot{v}_1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
-X_{N-1} & -Y_{N-1} & -1 & 0 & 0 & 0 & \dot{u}_{N-1} X_{N-1} & \dot{u}_{N-1} Y_{N-1} & \dot{u}_{N-1} \\
0 & 0 & 0 & -X_{N-1} & -Y_{N-1} & -1 & \dot{v}_{N-1} X_{N-1} & \dot{v}_{N-1} Y_{N-1} & \dot{v}_{N-1}
\end{pmatrix} \cdot \begin{pmatrix} H_{0,0} \\ H_{0,1} \\ H_{0,2} \\ H_{1,0} \\ H_{1,1} \\ H_{1,2} \\ H_{2,0} \\ H_{2,1} \\ H_{2,2} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \tag{56}$$

or, in the usual matrix-vector notation,

$$M \cdot h = 0, \tag{57}$$

where $M$ is a $2N \times 9$ matrix (with all elements being known constants), $h$ is
the vector of the nine unknowns, and $0$ is the zero vector of length $2N$.
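To make the construction of Eqn. (56) concrete, here is a sketch that assembles M from the N point correspondences; buildDesignMatrix is a hypothetical helper, not taken from the accompanying library:

// Builds the 2N x 9 design matrix M of Eqn. (56) from the model points
// (X_j, Y_j) and the observed sensor points (u_j, v_j).
static double[][] buildDesignMatrix(double[][] XY, double[][] uv) {
    int N = XY.length;
    double[][] M = new double[2 * N][9];
    for (int j = 0; j < N; j++) {
        double X = XY[j][0], Y = XY[j][1];
        double u = uv[j][0], v = uv[j][1];
        M[2*j]     = new double[] {-X, -Y, -1,  0,  0,  0, u*X, u*Y, u}; // Eqn. (52)
        M[2*j + 1] = new double[] { 0,  0,  0, -X, -Y, -1, v*X, v*Y, v}; // Eqn. (53)
    }
    return M;
}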
Solving homogeneous systems of linear equations: While Eqn. (57)
looks quite similar to an ordinary system of linear equations of the form
$M \cdot x = b$, it cannot be solved in the usual way (without additional
constraints), since it always has $h = 0$ as a trivial (and thus useless) solution.
However, Eqn. (57) can be solved by singular-value decomposition (SVD, [4,
Sec. 6.11][1, Sec. 4.5.3][10, Sec. 2.6]) of the matrix $M$, which separates $M$ (of
size $2N \times 9$) into the product of three matrices $U, S, V$ in the form

$$M = U \cdot S \cdot V^T. \tag{58}$$

Here⁷ $U$ is a $2N \times 2N$ (in this particular case) unitary⁸ matrix, $S$ is a $2N \times 9$
rectangular diagonal matrix with non-negative real values, and $V$ is a $9 \times 9$
unitary matrix. The concrete decomposition of $M$ is thus

$$M = \underbrace{\begin{pmatrix} U_{0,0} & \cdots & U_{0,2N-1} \\ \vdots & \ddots & \vdots \\ U_{2N-1,0} & \cdots & U_{2N-1,2N-1} \end{pmatrix}}_{U} \cdot \underbrace{\begin{pmatrix} s_0 & & \\ & \ddots & \\ & & s_8 \\ & \mathbf{0} & \end{pmatrix}}_{S} \cdot \underbrace{\begin{pmatrix} V_{0,0} & \cdots & V_{8,0} \\ \vdots & \ddots & \vdots \\ V_{0,8} & \cdots & V_{8,8} \end{pmatrix}}_{V^T}, \tag{59}$$

where the upper $9 \times 9$ block of $S$ is the diagonal matrix
$\mathrm{diag}(s_0, \ldots, s_8)$ and the remaining $2N - 9$ rows of $S$ are zero.
The diagonal elements of $S$, $s_0, \ldots, s_8$, are called the singular values of the
decomposed matrix $M$. Each singular value $s_i$ has an associated column vector
$u_i$ in $U$ (called a left singular vector of $M$) and a dedicated row vector $v_i^T$ in
$V^T$ (i.e., a column vector $v_i$ in $V$), called a right singular vector. Thus Eqn.
(59) can equally be written as

$$M = \begin{pmatrix} u_0 & \cdots & u_i & \cdots & u_{2N-1} \end{pmatrix} \cdot S \cdot \begin{pmatrix} v_0^T \\ \vdots \\ v_i^T \\ \vdots \\ v_8^T \end{pmatrix}, \tag{60}$$

where $U$ consists of the column vectors $u_0, \ldots, u_{2N-1}$⁹ and $V^T$ is composed of
the row vectors $v_0^T, \ldots, v_8^T$.

The sought solution to Eqn. (57), i.e., the unknown parameter vector $h$, is
finally found as the right singular vector $v_k$ associated with the smallest
singular value $s_k = \min(s_0, \ldots, s_8)$, that is,

$$h = v_k, \quad \text{with} \quad k = \operatorname*{argmin}_{0 \le i < 9} s_i. \tag{61}$$

⁷There are several different notations in use regarding the sizes of the matrices involved in
the SVD. We use the version adopted by Matlab and Mathematica (http://mathworld.wolfram.com/SingularValueDecomposition.html), where $S$ has the same size as the original matrix $M$
and $U$, $V$ are square.
⁸A square matrix $U$ is called unitary if its column vectors are orthonormal, i.e., if
$U \cdot U^T = U^T \cdot U = I$.
⁹Note that the vectors $u_i$ for $i \ge 9$ do not contribute to the product, since the
corresponding rows of $S$ are zero.
If the resulting homography transformation for the corresponding point sets
is exact, the value of $s_k$ is zero. This is generally the case when the homography
is calculated from 4 corresponding point pairs, the minimum number required
to solve for the eight degrees of freedom.

If more than 4 point pairs are involved (the usual case), the system in Eqn.
(57) is overdetermined. Here the value of $s_k$ indicates the residual or
"goodness of fit" of the resulting homography. Of course, if the fit is exact for
all point pairs, $s_k$ will again be zero. In the case of an overdetermined system,
the obtained solution minimizes Eqn. (57) in the least-squares sense, that is,

$$\|M \cdot h\|^2 \to \min. \tag{62}$$

In many common SVD implementations,¹⁰ the singular values in $S$ are arranged
in non-increasing order (i.e., $s_i \ge s_{i+1}$), such that (in our case) $s_8$ comes out as
the smallest value and $v_8$ (the last column vector of $V$) is the corresponding
solution. For example, the following Java code segment shows an implementation
with the Apache Commons Math (ACM) library:¹¹
// classes from org.apache.commons.math3.linear
RealMatrix M;   // the 2N x 9 matrix of Eqn. (56), assumed to be set up
SingularValueDecomposition svd = new SingularValueDecomposition(M);
RealMatrix V = svd.getV();
// singular values come sorted in decreasing order, so the solution h
// (Eqn. (61)) is the last column vector of V:
RealVector h = V.getColumnVector(V.getColumnDimension() - 1);
Note that the formulation in Eqn. (62) minimizes an algebraic residual that
does not directly relate to the geometric projection error. This does not cause a
problem in general, since the remaining errors are eliminated in the final, overall
optimization step (see Sec. 3.6). However, minimizing the projection errors of
the homographies at this stage (which is not done in Zhang’s implementation)
may help to improve the convergence of the final optimization. It requires non-
linear optimization, for which the above solution can serve as a good initial
guess (see Sec. 3.2.3).
3.2.2 Normalization of input data

To improve the numerical stability of the calculations, it is recommended [6]
to normalize both 2D point sets $X$ and $\dot{U}$ before performing the homography
estimation described in the previous section.

Normalization is accomplished by transforming each point set by a dedicated
$3 \times 3$ normalization matrix $N$ (in homogeneous coordinates), such that the
transformed point set becomes centered at the origin and scaled to a standard
diameter, i.e.,

$$X' = \mathrm{normalize}(X) = (N_X \cdot X_0, \ldots, N_X \cdot X_{N-1}), \tag{63}$$
$$\dot{U}' = \mathrm{normalize}(\dot{U}) = (N_U \cdot \dot{u}_0, \ldots, N_U \cdot \dot{u}_{N-1}). \tag{64}$$

See Sec. B of the Appendix for methods to calculate the normalization matrices
$N_X$ and $N_U$. Homography estimation is then (analogous to Eqn. (45))
performed on the normalized point sets $X'$, $\dot{U}'$ by calculating a matrix $H'$
satisfying (in the least-squares sense)

$$N_U \cdot \dot{u}_j = H' \cdot N_X \cdot X_j, \tag{65}$$

for $j = 0, \ldots, N-1$. By substituting $\dot{u}_j$ (from Eqn. (45)) we get

$$N_U \cdot H \cdot X_j = H' \cdot N_X \cdot X_j \tag{66}$$

and then, by dropping $X_j$ on both sides,

$$N_U \cdot H = H' \cdot N_X. \tag{67}$$

Thus the de-normalized homography can be calculated (by multiplying both
sides with $N_U^{-1}$) as

$$H = N_U^{-1} \cdot H' \cdot N_X. \tag{68}$$

¹⁰E.g., Matlab, Mathematica, Apache Commons Math.
¹¹http://commons.apache.org/math/ (version 3.6)
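With Apache Commons Math, the de-normalization in Eqn. (68) is essentially a one-liner; in this sketch, Hn (the normalized estimate H'), NU and NX are assumed to be given as RealMatrix objects:

import org.apache.commons.math3.linear.MatrixUtils;
import org.apache.commons.math3.linear.RealMatrix;

// De-normalization of the homography (Eqn. (68)): H = NU^{-1} * H' * NX.
static RealMatrix denormalizeHomography(RealMatrix Hn, RealMatrix NU, RealMatrix NX) {
    return MatrixUtils.inverse(NU).multiply(Hn).multiply(NX);
}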
3.2.3 Non-linear refinement of estimated homographies

As mentioned above, the homography estimates obtained with the DLT method
do not generally minimize the projection errors in the sensor image. In this
step, the estimated homography $H$ for a single view is numerically refined by
minimizing the projection error. This requires non-linear optimization, which
is usually implemented with iterative schemes such as the Levenberg-Marquardt
(LM) method, described in Sec. E of the Appendix.

Given are the ordered sequence of model (target) points $X = (X_0, \ldots, X_{N-1})$,
the set of corresponding (observed) sensor points $\dot{U} = (\dot{u}_0, \ldots, \dot{u}_{N-1})$ for a
particular view, and the estimated homography $H$ (calculated as described in
Sec. 3.2.1). Following Eqn. (45), the goal is to minimize the projection error

$$E_{proj} = \sum_{j=0}^{N-1} \bigl\| \dot{u}_j - H \cdot X_j \bigr\|^2 \tag{69}$$

in the sensor plane, assuming that the positions of the model points $X_j$ are
accurate. The Levenberg-Marquardt optimization, performed by procedure
Optimize() in Alg. 4.3, requires two functions for specifying the optimization
problem:

The value function: The function $\mathrm{Val}(X, h)$ returns the vector (of length
$2N$) of projected $(u, v)$ coordinates, obtained by applying the homography $H$
(represented by the vector $h = (h_0, \ldots, h_8)$) to all model points $X_j = (X_j, Y_j)^T$
in $X$, that is,

$$u_j = \begin{pmatrix} u_j \\ v_j \end{pmatrix} = \mathrm{hom}^{-1}(H \cdot \mathrm{hom}(X_j)). \tag{70}$$

The returned vector stacks the $u, v$ values for all $N$ model points in the form

$$\mathrm{Val}(X, h) = Y = (u_0, v_0, u_1, v_1, \ldots, u_{N-1}, v_{N-1})^T. \tag{71}$$
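A sketch of this value function in Java, stacking the projected coordinates as in Eqn. (71); the method name and array layout are illustrative:

// Val(X, h): applies the homography h = (h0,...,h8) to all model points
// and stacks the resulting (u, v) pairs into one vector (Eqn. (71)).
static double[] val(double[][] XY, double[] h) {
    int N = XY.length;
    double[] out = new double[2 * N];
    for (int j = 0; j < N; j++) {
        double X = XY[j][0], Y = XY[j][1];
        double w = h[6]*X + h[7]*Y + h[8];
        out[2*j]     = (h[0]*X + h[1]*Y + h[2]) / w;  // u_j (Eqn. (70))
        out[2*j + 1] = (h[3]*X + h[4]*Y + h[5]) / w;  // v_j
    }
    return out;
}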
The Jacobian function: The function $\mathrm{Jac}(X, h)$ returns a stacked Jacobian
matrix (of size $2N \times 9$), consisting of the partial derivatives of the projected
coordinates w.r.t. the nine homography parameters. The structure of the
returned matrix is

$$\mathrm{Jac}(X, h) = J = \begin{pmatrix}
\frac{\partial u_0}{\partial h_0} & \frac{\partial u_0}{\partial h_1} & \cdots & \frac{\partial u_0}{\partial h_8} \\
\frac{\partial v_0}{\partial h_0} & \frac{\partial v_0}{\partial h_1} & \cdots & \frac{\partial v_0}{\partial h_8} \\
\vdots & \vdots & & \vdots \\
\frac{\partial u_{N-1}}{\partial h_0} & \frac{\partial u_{N-1}}{\partial h_1} & \cdots & \frac{\partial u_{N-1}}{\partial h_8} \\
\frac{\partial v_{N-1}}{\partial h_0} & \frac{\partial v_{N-1}}{\partial h_1} & \cdots & \frac{\partial v_{N-1}}{\partial h_8}
\end{pmatrix}, \tag{72}$$

where each pair of rows is associated with a particular model point $X_j = (X_j, Y_j)$
and contains the partial derivatives

$$\begin{pmatrix} J_{2j,*} \\ J_{2j+1,*} \end{pmatrix} = \begin{pmatrix}
\frac{X_j}{w} & \frac{Y_j}{w} & \frac{1}{w} & 0 & 0 & 0 & \frac{-s_x X_j}{w^2} & \frac{-s_x Y_j}{w^2} & \frac{-s_x}{w^2} \\
0 & 0 & 0 & \frac{X_j}{w} & \frac{Y_j}{w} & \frac{1}{w} & \frac{-s_y X_j}{w^2} & \frac{-s_y Y_j}{w^2} & \frac{-s_y}{w^2}
\end{pmatrix}, \tag{73}$$

with

$$s_x = h_0 \cdot X_j + h_1 \cdot Y_j + h_2, \tag{74}$$
$$s_y = h_3 \cdot X_j + h_4 \cdot Y_j + h_5, \tag{75}$$
$$w = h_6 \cdot X_j + h_7 \cdot Y_j + h_8. \tag{76}$$
See Alg. 4.3 for a compact summary of this step.
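The corresponding Jacobian of Eqns. (72)–(76) can be sketched analogously (again with illustrative names):

// Jac(X, h): builds the stacked 2N x 9 Jacobian of Eqns. (72)-(76).
static double[][] jac(double[][] XY, double[] h) {
    int N = XY.length;
    double[][] J = new double[2 * N][9];
    for (int j = 0; j < N; j++) {
        double X = XY[j][0], Y = XY[j][1];
        double sx = h[0]*X + h[1]*Y + h[2];  // Eqn. (74)
        double sy = h[3]*X + h[4]*Y + h[5];  // Eqn. (75)
        double w  = h[6]*X + h[7]*Y + h[8];  // Eqn. (76)
        double w2 = w * w;
        J[2*j]     = new double[] {X/w, Y/w, 1/w, 0, 0, 0,
                                   -sx*X/w2, -sx*Y/w2, -sx/w2};  // row 2j
        J[2*j + 1] = new double[] {0, 0, 0, X/w, Y/w, 1/w,
                                   -sy*X/w2, -sy*Y/w2, -sy/w2};  // row 2j+1
    }
    return J;
}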
3.3 Step 2: Determining the intrinsic camera parameters

In the previous step, the homographies $H_0, \ldots, H_{M-1}$ were calculated
independently for each of the $M$ views. The homographies encode both the common
camera intrinsics and the extrinsic transformation parameters, which are
generally different for each view. This section describes the extraction of the
intrinsic camera parameters from the given set of homographies.

As defined in Eqn. (42), a homography $H = H_i$ combines the inner camera
transformation $A$ and the view-specific external transformation $R, t$, such that

$$H = \begin{pmatrix} h_0 & h_1 & h_2 \end{pmatrix} = \lambda \cdot A \cdot \begin{pmatrix} r_0 & r_1 & t \end{pmatrix}, \tag{77}$$

where $\lambda$ is an arbitrary nonzero scale factor. For $R$ to be a valid rotation
matrix, the column vectors $r_0, r_1$ must be orthonormal, i.e.,

$$r_0^T \cdot r_1 = r_1^T \cdot r_0 = 0 \quad \text{and} \tag{78}$$
$$r_0^T \cdot r_0 = r_1^T \cdot r_1 = 1. \tag{79}$$
We can see from Eqn. (77) that

$$h_0 = \lambda \cdot A \cdot r_0, \tag{80}$$
$$h_1 = \lambda \cdot A \cdot r_1 \tag{81}$$

and thus

$$A^{-1} \cdot h_0 = \lambda \cdot r_0, \tag{82}$$
$$A^{-1} \cdot h_1 = \lambda \cdot r_1 \tag{83}$$

and furthermore¹²

$$h_0^T \cdot (A^{-1})^T = \lambda \cdot r_0^T, \tag{84}$$
$$h_1^T \cdot (A^{-1})^T = \lambda \cdot r_1^T. \tag{85}$$

Based on Eqns. (78)–(79), this yields two fundamental constraints on the
intrinsic parameters for a given homography $H$:

$$h_0^T \cdot (A^{-1})^T \cdot A^{-1} \cdot h_1 = 0, \tag{86}$$
$$h_0^T \cdot (A^{-1})^T \cdot A^{-1} \cdot h_0 = h_1^T \cdot (A^{-1})^T \cdot A^{-1} \cdot h_1 \tag{87}$$

(the factor $\lambda$ is irrelevant here). For estimating the camera intrinsics, Zhang
substitutes the expression $(A^{-1})^T \cdot A^{-1}$ by a new matrix¹³

$$B = (A^{-1})^T \cdot A^{-1} = \begin{pmatrix} B_0 & B_1 & B_3 \\ B_1 & B_2 & B_4 \\ B_3 & B_4 & B_5 \end{pmatrix}, \tag{88}$$

which is symmetric and composed of only 6 distinct quantities (obtained by
inserting from Eqn. (15)):

$$B_0 = \frac{1}{\alpha^2}, \qquad B_1 = -\frac{\gamma}{\alpha^2 \beta}, \tag{89}$$
$$B_2 = \frac{\gamma^2}{\alpha^2 \beta^2} + \frac{1}{\beta^2}, \qquad B_3 = \frac{v_c \gamma - u_c \beta}{\alpha^2 \beta}, \tag{90}$$
$$B_4 = -\frac{\gamma (v_c \gamma - u_c \beta)}{\alpha^2 \beta^2} - \frac{v_c}{\beta^2}, \qquad B_5 = \frac{(v_c \gamma - u_c \beta)^2}{\alpha^2 \beta^2} + \frac{v_c^2}{\beta^2} + 1. \tag{91}$$
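For checking, the entries of B in Eqns. (89)–(91) can be computed directly from the intrinsic parameters; this small sketch (our own, with illustrative names) returns the six quantities (B0, ..., B5):

// Entries B0..B5 of the symmetric matrix B = (A^{-1})^T * A^{-1}
// (Eqns. (88)-(91)), computed from the intrinsics alpha, beta, gamma, uc, vc.
static double[] makeB(double alpha, double beta, double gamma, double uc, double vc) {
    double a2 = alpha * alpha, b2 = beta * beta;
    double d = vc * gamma - uc * beta;         // common term (vc*gamma - uc*beta)
    double B0 = 1 / a2;
    double B1 = -gamma / (a2 * beta);
    double B2 = gamma * gamma / (a2 * b2) + 1 / b2;
    double B3 = d / (a2 * beta);
    double B4 = -gamma * d / (a2 * b2) - vc / b2;
    double B5 = d * d / (a2 * b2) + vc * vc / b2 + 1;
    return new double[] {B0, B1, B2, B3, B4, B5};
}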
We can now write the constraints in Eqns. (86)–(87) in the form

$$h_0^T \cdot B \cdot h_1 = 0, \tag{92}$$
$$h_0^T \cdot B \cdot h_0 - h_1^T \cdot B \cdot h_1 = 0, \tag{93}$$

with $h_p = (H_{p,0}, H_{p,1}, H_{p,2})^T$ being the $p$th column vector (for $p \in \{0, 1, 2\}$) of