Zhang’s Camera Calibration Algorithm:
In-Depth Tutorial and Implementation
by
Wilhelm Burger
wilhelm.burger@fh-hagenberg.at
Technical Report HGB-16-05
16th May, 2016 (revised April 2020)
Department of Digital Media
University of Applied Sciences Upper Austria, School of Informatics,
Communications and Media, Softwarepark 11, 4232 Hagenberg, Austria
www.fh-hagenberg.at
Copyright 2016 by the author(s). This report may be freely used, copied, printed and
distributed in its entirety on paper or electronically for academic purposes. Any copies must
include this cover page and notice. Any non-academic use is subject to explicit permission by
the authors.
Abstract
This report details the algorithmic steps involved in the well-known camera
calibration method by Zhang and describes an associated open-source Java
implementation that depends only upon the Apache Commons Math library.
Index Terms: Computer vision, camera calibration, Zhang’s method, camera
projection, imaging geometry, image rectification, Java implementation.
Software Repository: https://github.com/imagingbook/imagingbook-calibrate
Citation
Wilhelm Burger: Zhang’s Camera Calibration Algorithm: In-Depth Tutorial and Implementation,
Technical Report HGB-16-05, University of Applied Sciences Upper Austria,
School of Informatics, Communications and Media, Dept. of Digital Media, Hagenberg,
Austria, May 2016. https://www.researchgate.net/publication/303233579_Zhang’s_Camera_Calibration_Algorithm_In-Depth_Tutorial_and_Implementation
@techreport{BurgerCalibration2016,
  author      = {Burger, Wilhelm},
  title       = {Zhang's Camera Calibration Algorithm: In-Depth Tutorial
                 and Implementation},
  language    = {english},
  institution = {University of Applied Sciences Upper Austria, School of
                 Informatics, Communications and Media, Dept. of Digital
                 Media},
  address     = {Hagenberg, Austria},
  number      = {HGB-16-05},
  year        = {2016},
  month       = {05},
  url         = {https://www.researchgate.net/publication/303233579_Zhang’s_Camera_Calibration_Algorithm_In-Depth_Tutorial_and_Implementation}
}
Burger – Zhang’s Camera Calibration Algorithm 1
1 Introduction
Accurate knowledge of the image projection parameters is an essential prerequisite
for any kind of quantitative geometric measurement in computer vision.
The real projection parameters depend on numerous technical elements and are
usually not provided by manufacturers of imaging systems. Moreover, in the
case of cameras equipped with zoom lenses, the projection parameters may be
variable.
Many approaches to camera calibration exist (see, e.g., [9, p. 226]), using
different strategies with regard to what is known about the 3D scene. Some
approaches make use of a special, calibrated 3D setup (calibration rig), where the
positions of all 3D points and the camera center are known. Other approaches,
such as the one by Zhang described here, use multiple views of a 3D pattern
of known structure but unknown position and orientation in space. Finally,
calibration methods exist that make no assumptions about the 3D structure of
the scene, using multiple views of arbitrary, rigid structures. This is commonly
called “self-calibration”. In this case the intrinsic camera parameters and the
extrinsic viewing parameters (3D structure) are recovered together. Based on
the imaging model described in Sec. 2, the following parameters are recovered:

• The intrinsic camera parameters, i.e., the inner transformations of the
  camera, including focal length, position of the principal point, sensor scale,
  and skew.
• The parameters of the nonlinear lens distortion.
• The external transformation parameters (3D rotation and translation) for
  each of the given views of the reference pattern.
2 The perspective projection model
This section describes the underlying projection process from 3D world points
to 2D sensor coordinates and outlines the associated notation.
2.1 The pinhole camera model
The simple and well-known pinhole camera model (see, e.g., [2, Chap. 1]) is
used to describe the projection of 3D world points onto the camera’s sensor
plane. We assume that the image plane is positioned in front of the optical
center, thus the image is not upside-down. The image plane is positioned at the
distance f from the optical center C = (0, 0, 0), perpendicular to the optical
axis. The optical center C is the origin of the 3D camera coordinate system.
The optical axis aligns with the Z axis of the coordinate system and intersects
the image plane at (0, 0, f). Throughout this text, we use the definitions listed
in Table 1.

We assume that initially the camera coordinate system is identical to the
world coordinate system (we later remove this constraint and use two separate
coordinate systems). From similar triangles, every 3D point X = (X, Y, Z)
Table 1: List of symbols and notation used in this document.
X = (X, Y, Z) .............. a 3D world point
X = (X_0, ..., X_{N−1}) .... the 3D target points, with X_j = (X_j, Y_j, 0)
X = hom(X) ................. homogeneous coordinate for a Cartesian coordinate X
X = hom^{−1}(X) ............ Cartesian coordinate for a homogeneous coordinate X
x = (x, y) ................. a projected 2D point in the normalized image plane
x̃ = (x̃, ỹ) ................. a 2D point after lens distortion (in the normalized image plane)
u = (u, v) ................. a projected 2D sensor point
u̇ = (u̇, v̇) ................. an observed 2D sensor point
U̇_i = (u̇_{i,0}, ..., u̇_{i,N−1}) ... the observed sensor points for view i
U̇ = (U̇_0, ..., U̇_{M−1}) ... the observed sensor points for all M views
A .......................... intrinsic camera matrix (Eqn. (15))
a .......................... vector of intrinsic camera parameters, including the distortion coefficients (Eqn. (120))
f .......................... focal length
R .......................... a 3×3 rotation matrix
ρ = (ρ_x, ρ_y, ρ_z) ........ a 3D rotation (Rodrigues) vector (Eqn. (126))
t = (t_x, t_y, t_z) ........ a 3D translation vector
W = (R | t) ................ an extrinsic camera (view) matrix (Eqn. (16))
w .......................... vector of 3D view parameters (Eqn. (121))
W = (W_i) .................. a sequence of camera views
P̌(W, X) .................... 3D ↦ 2D projection, which maps the 3D point X to normalized image coordinates with view parameters W (Eqn. (20))
P(A, W, X) ................. 3D ↦ 2D projection, which maps the 3D point X to the associated 2D sensor point u with camera intrinsics A and view parameters W (Eqn. (24))
P(A, k, W, X) .............. 3D ↦ 2D projection, which includes camera lens distortion with parameters k (Eqn. (28))
yields the relations

\[
\frac{X}{Z} = \frac{x}{f}, \qquad \frac{Y}{Z} = \frac{y}{f} \tag{1}
\]

and thus the projection point in the image plane is

\[
x = f \cdot \frac{X}{Z}, \qquad y = f \cdot \frac{Y}{Z} \tag{2}
\]

or, in vector notation,

\[
\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}
= \frac{f}{Z} \cdot \begin{pmatrix} X \\ Y \end{pmatrix}. \tag{3}
\]

The world point X lies on a ray which passes through the optical center C = (0, 0, 0) and the corresponding image point x_i, i.e.,

\[
\mathbf{X}_i = \lambda \cdot (\mathbf{x}_i - \mathbf{C}) = \lambda \cdot \mathbf{x}_i, \tag{4}
\]
for some λ > 1.
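The pinhole relations of Eqns. (2)–(3) are easily expressed in code. The following minimal Java sketch is only an illustration of the formula; the class and method names are chosen here and are not part of the report’s implementation:

```java
public class PinholeProjection {
    // Projects a 3D point (X, Y, Z), given in camera coordinates, onto the
    // image plane at distance f, per Eqn. (2): x = f*X/Z, y = f*Y/Z.
    public static double[] project(double f, double X, double Y, double Z) {
        return new double[] { f * X / Z, f * Y / Z };
    }
}
```

With f = 1 this directly yields the normalized image coordinates used later in the text.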
2.2 The projection matrix
Equations (2) and (3) describe nonlinear transformations in the domain of Cartesian
coordinates. Using homogeneous coordinates,¹ the perspective transformation
can be written as a (linear) matrix equation

\[
\begin{pmatrix} x \\ y \end{pmatrix}
\equiv \frac{f}{Z} \cdot \begin{pmatrix} X \\ Y \end{pmatrix}
\equiv \begin{pmatrix} fX/Z \\ fY/Z \\ 1 \end{pmatrix}
\equiv \begin{pmatrix} fX \\ fY \\ Z \end{pmatrix}
= \underbrace{\begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}}_{M_P}
\cdot \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \tag{5}
\]
or, written more compactly,²

\[
\mathbf{x} = \mathrm{hom}^{-1}\big( M_P \cdot \mathrm{hom}(\mathbf{X}) \big). \tag{6}
\]
The projection matrix M_P can be decomposed into two matrices M_f and M_0
in the form

\[
M_P =
\begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
= \underbrace{\begin{pmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{pmatrix}}_{M_f}
\cdot
\underbrace{\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}}_{M_0}
= M_f \cdot M_0, \tag{7}
\]

where M_f models the internals of the (ideal) pinhole camera with focal length f,
and M_0 describes the transformation between the camera coordinates and the
world coordinates. In particular, M_0 is often referred to as the standard (or
canonical) projection matrix [9], which corresponds to the simple viewing geometry
(the optical axis being aligned with the Z axis) that we have assumed
so far.
2.3 Viewing under rigid motion
If the camera has its own (non-canonical) coordinate system, it observes 3D
points that were subjected to rigid body motion, as described in Sec. A.2. Thus
the projective transformation M_P (Eqn. (5)) is now applied to the modified
(rotated and translated) points X′ instead of the original 3D points X = hom(X),
that is,

\[
\mathbf{x} = \mathrm{hom}^{-1}[M_P \cdot \mathbf{X}']
= \mathrm{hom}^{-1}[M_P \cdot M_{rb} \cdot \mathrm{hom}(\mathbf{X})], \tag{8}
\]

where the matrix M_rb specifies some rigid body motion in 3D.³ The complete
perspective imaging transformation for the ideal pinhole camera with focal
length f under rigid motion can thus be written as

\[
\mathbf{x} = \mathrm{hom}^{-1}[M_f \cdot M_0 \cdot M_{rb} \cdot \mathrm{hom}(\mathbf{X})] \tag{9}
\]

¹See also Sec. A.1 in the Appendix.
²The operator x = hom(x) converts Cartesian coordinates to homogeneous coordinates.
Inversely, x = hom⁻¹(x) denotes the conversion from homogeneous to Cartesian coordinates
(see Sec. A.1 of the Appendix).
³See also Sec. A.2.3, Eqn. (170) in the Appendix.
or, by combining M_0 and M_rb into a single matrix,

\[
\begin{pmatrix} x \\ y \end{pmatrix}
= \mathrm{hom}^{-1}\!\Bigg[
\begin{pmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{pmatrix} \cdot
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \cdot
\begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\
                r_{21} & r_{22} & r_{23} & t_y \\
                r_{31} & r_{32} & r_{33} & t_z \\
                0 & 0 & 0 & 1 \end{pmatrix} \cdot
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
\Bigg]
= \mathrm{hom}^{-1}\!\Bigg[
\underbrace{\begin{pmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{pmatrix}}_{M_f} \cdot
\underbrace{\begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\
                            r_{21} & r_{22} & r_{23} & t_y \\
                            r_{31} & r_{32} & r_{33} & t_z \end{pmatrix}}_{(R\,|\,t)} \cdot
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
\Bigg] \tag{10}
\]

\[
= \mathrm{hom}^{-1}\big[ M_f \cdot (R\,|\,t) \cdot \mathrm{hom}(\mathbf{X}) \big]. \tag{11}
\]

In the special case of f = 1, M_f becomes the identity matrix and can thus be
omitted, that is,

\[
\mathbf{x} = \mathrm{hom}^{-1}\big[ (R\,|\,t) \cdot \mathrm{hom}(\mathbf{X}) \big]. \tag{12}
\]

In the following, this is referred to as the “normalized projection”.
2.4 Intrinsic camera parameters
A final small step is required to make the perspective imaging transformation
in Eqn. (11) useful as a model for real cameras. In particular, we need to define
how the continuous x/y coordinates on the image plane map to actual pixel
coordinates by taking into account

• the (possibly different) sensor scales s_x, s_y in the x and y directions,
  respectively,
• the location of the image center u_c = (u_c, v_c) with respect to the image
  coordinate system (i.e., the optical axis), and
• the skewedness (diagonal distortion) s_θ of the image plane (which is
  usually negligible or zero).
The final sensor coordinates u = (u, v) are obtained from the normalized
image coordinates x = (x, y) (see Eqn. (12)) as

\[
\begin{pmatrix} u \\ v \end{pmatrix}
= \mathrm{hom}^{-1}\!\Bigg[
\underbrace{\begin{pmatrix} s_x & s_\theta & u_c \\ 0 & s_y & v_c \\ 0 & 0 & 1 \end{pmatrix} \cdot
\begin{pmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{pmatrix}}_{A} \cdot
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
\Bigg] \tag{13}
\]
\[
= \mathrm{hom}^{-1}\big[ A \cdot \mathrm{hom}(\mathbf{x}) \big], \tag{14}
\]

where

\[
A = \begin{pmatrix} f s_x & f s_\theta & u_c \\ 0 & f s_y & v_c \\ 0 & 0 & 1 \end{pmatrix}
  = \begin{pmatrix} \alpha & \gamma & u_c \\ 0 & \beta & v_c \\ 0 & 0 & 1 \end{pmatrix} \tag{15}
\]

is the so-called intrinsic camera matrix. Taking into account these additional
inner camera parameters, the complete perspective imaging transformation can
now be written as

\[
\begin{pmatrix} u \\ v \end{pmatrix}
= \mathrm{hom}^{-1}\!\Bigg[
\underbrace{\begin{pmatrix} \alpha & \gamma & u_c \\ 0 & \beta & v_c \\ 0 & 0 & 1 \end{pmatrix}}_{A} \cdot
\underbrace{\begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\
                            r_{21} & r_{22} & r_{23} & t_y \\
                            r_{31} & r_{32} & r_{33} & t_z \end{pmatrix}}_{W = (R\,|\,t)} \cdot
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
\Bigg] \tag{16}
\]
\[
= \mathrm{hom}^{-1}\big[ A \cdot W \cdot \mathrm{hom}(\mathbf{X}) \big], \tag{17}
\]

where A captures the intrinsic properties of the camera (“intrinsics”), and
W = (R | t) are the extrinsic parameters of the projection transformation
(“extrinsics”). Note that we can calculate the projection in Eqn. (17) in two
steps:
Step 1: Calculate the normalized projection x = (x, y) (see Eqn. (12)):

\[
\begin{pmatrix} x \\ y \end{pmatrix}
= \mathrm{hom}^{-1}\big[ W \cdot \mathrm{hom}(\mathbf{X}) \big] \tag{18}
\]
\[
= \mathrm{hom}^{-1}\!\Bigg[
\begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\
                r_{21} & r_{22} & r_{23} & t_y \\
                r_{31} & r_{32} & r_{33} & t_z \end{pmatrix} \cdot
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
\Bigg] \tag{19}
\]
\[
= \check{P}(W, \mathbf{X}). \tag{20}
\]

Step 2: Map from normalized coordinates x to sensor coordinates u = (u, v)
by the affine transformation A (see Eqn. (14)):

\[
\begin{pmatrix} u \\ v \end{pmatrix}
= \mathrm{hom}^{-1}\big[ A \cdot \mathrm{hom}(\mathbf{x}) \big]
= A' \cdot \mathrm{hom}(\mathbf{x}) \tag{21}
\]
\[
= \mathrm{hom}^{-1}\!\Bigg[
\begin{pmatrix} \alpha & \gamma & u_c \\ 0 & \beta & v_c \\ 0 & 0 & 1 \end{pmatrix} \cdot
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
\Bigg]
= \underbrace{\begin{pmatrix} \alpha & \gamma & u_c \\ 0 & \beta & v_c \end{pmatrix}}_{A'} \cdot
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \tag{22}
\]

where A′ is the upper 2×3 submatrix of A. Note that, by using A′, no explicit
conversion to Cartesian coordinates (hom⁻¹) is required in Eqn. (22). A and A′
are affine mappings in 2D. Combining the two steps above, we can summarize
the whole 3D-to-2D projection process (from world coordinates X to sensor
coordinates u) in a single expression,

\[
\mathbf{u} = A' \cdot \mathrm{hom}[\mathbf{x}]
= A' \cdot \mathrm{hom}\!\big[ \mathrm{hom}^{-1}[W \cdot \mathrm{hom}(\mathbf{X})] \big]
= P(A, W, \mathbf{X}). \tag{23}
\]

We will refer to P(A, W, X), as defined in Eqn. (23), as the projection function,
which maps the 3D point X = (X, Y, Z) (defined in world coordinates) to the
2D sensor point u = (u, v), using the intrinsic parameters A and the extrinsic
(view) parameters W. This function can be separated into two component
functions in the form

\[
P(A, W, \mathbf{X})
= \begin{pmatrix} P_x(A, W, \mathbf{X}) \\ P_y(A, W, \mathbf{X}) \end{pmatrix}
= \begin{pmatrix} u \\ v \end{pmatrix} = \mathbf{u}. \tag{24}
\]

We will build on this notation in the following steps.
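The two-step projection can be sketched in plain Java. This is a minimal illustration of Eqns. (18)–(22) using simple arrays; the class and method names are chosen here and are not taken from the report’s library:

```java
public class TwoStepProjection {
    // Step 1 (Eqns. (18)-(20)): normalized projection x = hom^-1(W * hom(X)),
    // with W a 3x4 view matrix (R|t) and X = (X, Y, Z) a 3D world point.
    public static double[] projectNormalized(double[][] W, double[] X) {
        double[] Xh = { X[0], X[1], X[2], 1.0 };   // hom(X)
        double[] y = new double[3];
        for (int r = 0; r < 3; r++)
            for (int c = 0; c < 4; c++)
                y[r] += W[r][c] * Xh[c];
        return new double[] { y[0] / y[2], y[1] / y[2] };  // hom^-1
    }

    // Step 2 (Eqns. (21)-(22)): affine mapping u = A' * hom(x), where A' is
    // the upper 2x3 submatrix of the intrinsic matrix A of Eqn. (15).
    public static double[] mapToSensor(double alpha, double beta, double gamma,
                                       double uc, double vc, double[] x) {
        return new double[] { alpha * x[0] + gamma * x[1] + uc,
                              beta  * x[1] + vc };
    }
}
```

Composing the two methods gives exactly the projection function P(A, W, X) of Eqn. (23).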
The primary goal of camera calibration is to determine the unknown
intrinsic properties of the camera, i.e., the elements of the matrix A
(as detailed in Sec. 3).
2.5 Lens distortion
So far we have relied on the naive pinhole camera model, which exhibits no
distortions beyond the projective transformation described above. Real cameras
are built with lenses instead of pinholes, and these introduce additional geometric
distortions that include two main components [8, p. 342]:

• Decentering errors, caused by a displacement of the lens center from
  the optical axis (this is mostly taken care of by the variable offset (u_c, v_c)
  in Eqn. (13));
• Radial distortion, caused by variations in light refraction, which is typically
  apparent in wide-angle lenses (“barrel distortion”).

While lens distortion is a complex physical phenomenon in general, it is usually
modeled with sufficient accuracy as a single-variable polynomial function D(r)
of the radial distance r from the lens center [8, p. 343] (see Sec. 2.5.2, Eqn. (32)).
2.5.1 Where does the lens distortion come in?
Lens distortion affects the normalized projection coordinates x, i.e., before the
image-to-sensor transformation (defined by the intrinsic camera parameters) is
applied. Before we investigate the actual distortion model, we define a general
distortion function warp: ℝ² ↦ ℝ², which maps an undistorted 2D coordinate
x to a distorted 2D coordinate x̃ (again in the normalized projection plane) by

\[
\tilde{\mathbf{x}} = \mathrm{warp}(\mathbf{x}, \mathbf{k}), \tag{25}
\]

where k is a vector of distortion parameters. With this definition, we can
reformulate the projection process in Eqn. (23) to include the lens distortion as

\[
\mathbf{u} = A' \cdot \mathrm{hom}\big[\tilde{\mathbf{x}}\big]
= A' \cdot \mathrm{hom}\big[\mathrm{warp}(\mathbf{x}, \mathbf{k})\big] \tag{26}
\]
\[
= A' \cdot \mathrm{hom}\Big[\mathrm{warp}\big(
\underbrace{\mathrm{hom}^{-1}[W \cdot \mathrm{hom}(\mathbf{X})]}_{\mathbf{x}},
\mathbf{k}\big)\Big] \tag{27}
\]
\[
= P(A, \mathbf{k}, W, \mathbf{X}). \tag{28}
\]

In the following, we describe how the warp() function (referenced in Eqns. (26)–
(27)) is specified and calculated.
2.5.2 Radial distortion model
A radial model is most commonly used for correcting geometric lens distortions.
By radial distortion we understand that the displacement is restricted to
radial lines emanating from the image center; the amount of radial displacement
(inwards or outwards) is a function of the radius only. In the normalized
projection plane, the optical axis intersects the image plane at x_c = (0, 0), which
is assumed to be the center of the lens distortion. The radial distance r_i of a
projected point x_i = (x_i, y_i) from the center can thus be simply calculated as

\[
r_i = \|\mathbf{x}_i - \mathbf{x}_c\| = \|\mathbf{x}_i\| = \sqrt{x_i^2 + y_i^2}. \tag{29}
\]

The distortion is assumed to be radially symmetric, i.e., it only depends on the
original radius r_i of a given point x_i, as defined in Eqn. (29). The distortion
model can thus be specified by a single-variable function D(r, k), such that the
distorted radius is

\[
\tilde{r} = f_{\mathrm{rad}}(r) = r \cdot [1 + D(r, \mathbf{k})]. \tag{30}
\]

Consequently (see Eqn. (25)), the warped projection point is x̃_i = warp(x_i, k),
with

\[
\mathrm{warp}(\mathbf{x}_i, \mathbf{k})
= \mathbf{x}_i \cdot [1 + D(\|\mathbf{x}_i\|, \mathbf{k})]. \tag{31}
\]

The function D(r, k) specifies the (positive or negative) radial deviation for a
given radius r. A simple but effective radial distortion model is based on the
polynomial function

\[
D(r, \mathbf{k}) = k_0 \cdot r^2 + k_1 \cdot r^4
= \mathbf{k} \cdot \begin{pmatrix} r^2 \\ r^4 \end{pmatrix}, \tag{32}
\]

with the (unknown) coefficients k = (k_0, k_1), as illustrated in Fig. 1.⁴ Note that,
if k_0 = k_1 = 0, then also D(r, k) = 0 and there is no distortion.
Figure 1: Plot of the radial deviation function D(r, k) in Eqn. (32) for coefficients
k = (k_0, k_1) = (−0.2286, 0.190335).
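The warp of Eqns. (31)–(32) reduces to a few lines of code. The following Java sketch (class and method names chosen here, not part of the report’s implementation) applies the two-parameter polynomial model to a normalized point:

```java
public class RadialWarp {
    // Applies the radial warp of Eqn. (31) with the polynomial model of
    // Eqn. (32): D(r, k) = k0*r^2 + k1*r^4, evaluated via r^2 only.
    public static double[] warp(double x, double y, double k0, double k1) {
        double r2 = x * x + y * y;                 // r^2
        double d = 1.0 + k0 * r2 + k1 * r2 * r2;   // 1 + D(r, k)
        return new double[] { x * d, y * d };
    }
}
```

Note that only r² is needed, so no square root has to be computed; with k0 = k1 = 0 the warp is the identity, as stated above.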
2.6 Summary of the projection process
In summary, the following steps model the complete projection process (see
Fig. 3):
⁴Other formulations of the radial distortion function can be found in the literature. For
example, D(r) = k_0·r² + k_1·r⁴ + k_2·r⁶ is used in [8, p. 343], or D(r) = k_0·r + k_1·r² +
k_2·r³ + k_3·r⁴ by Devernay and Faugeras (mentioned in [9, p. 58]). For a detailed analysis
of radial distortion in fisheye lenses see [3].
Figure 2: Plot of the radial distortion ˜r = f_rad(r) = r·[1 + D(r, k)] in Eqn.
(30) for the same parameters k as in Fig. 1.
Figure 3: Summary of the projection chain (from right to left). In (c) the
3D point X (in camera coordinates) is projected (with f = 1) onto the “ideal”
image plane to the normalized coordinates x = (x, y). Radial lens distortion
in (b) maps point x to x̃ = (x̃, ỹ), i.e., x̃ ← warp(x, k). The affine mapping
specified by the intrinsic camera transformation (matrix A), u ← A·x̃, finally
yields the observed sensor image coordinates u = (u, v) in (a).
1. World-to-camera transformation: Given a point (X, Y, Z), expressed
   in 3D world coordinates, its position in the 3D camera coordinate system
   is specified by the viewing transformation W (see Eqn. (168)),

\[
\begin{pmatrix} X' \\ Y' \\ Z' \\ 1 \end{pmatrix}
= \underbrace{\begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\
                              r_{21} & r_{22} & r_{23} & t_y \\
                              r_{31} & r_{32} & r_{33} & t_z \\
                              0 & 0 & 0 & 1 \end{pmatrix}}_{W}
\cdot \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}, \tag{33}
\]

   in homogeneous coordinates, or simply

\[
\mathbf{X}' = W \cdot \mathbf{X}. \tag{34}
\]
2. Projection onto the “normalized” (ideal) image plane: The perspective
   projection from the 3D point X′ = (X′, Y′, Z′) (in 3D camera
   coordinates) onto continuous, normalized 2D coordinates x = (x, y) on
   the image plane is defined (see Eqn. (5)) as

\[
\begin{pmatrix} x \\ y \end{pmatrix}
= \frac{1}{Z'} \cdot \begin{pmatrix} X' \\ Y' \end{pmatrix}
= \mathrm{hom}^{-1}\!\Bigg[
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \cdot
\begin{pmatrix} X' \\ Y' \\ Z' \\ 1 \end{pmatrix}
\Bigg]
= \check{P}(W, \mathbf{X}), \tag{35}
\]

   which is equivalent to an ideal pinhole projection with focal length f = 1.
3. Radial lens distortion: The normalized 2D projection coordinates x =
   (x, y) are subjected to a nonlinear radial distortion with respect to the
   optical center x_c = (0, 0), expressed by the mapping

\[
\tilde{\mathbf{x}} = \mathrm{warp}(\mathbf{x}, \mathbf{k})
\quad \text{or} \quad
\begin{pmatrix} \tilde{x} \\ \tilde{y} \end{pmatrix}
= \begin{pmatrix} x \\ y \end{pmatrix} \cdot [1 + D(r, \mathbf{k})], \tag{36}
\]

   with r = √(x² + y²) and D(r, k) as defined in Eqn. (32). x̃ = (x̃, ỹ) are
   the lens-distorted 2D coordinates, still in the normalized image plane.
4. Affine 2D transformation to sensor coordinates: The normalized
   projection points are finally mapped to the scaled and skewed sensor
   coordinates (see Eqn. (13)) by the affine transformation

\[
\mathbf{u} = A' \cdot \mathrm{hom}[\tilde{\mathbf{x}}]
\quad \text{or} \quad
\begin{pmatrix} u \\ v \end{pmatrix}
= \begin{pmatrix} \alpha & \gamma & u_c \\ 0 & \beta & v_c \end{pmatrix} \cdot
\begin{pmatrix} \tilde{x} \\ \tilde{y} \\ 1 \end{pmatrix}, \tag{37}
\]

   where α, β, γ, u_c, v_c are the intrinsic camera parameters (see Eqns. (15)
   and (22)).
3 Plane-based self-calibration

The popular camera calibration method by Zhang [14, 15] uses a few (at least
two) views of a planar calibration pattern, called “model” or “target”, whose
layout and metric dimensions are precisely known.⁵ The calibration procedure
works roughly as follows:

1. Images I_0, ..., I_{M−1} of the model are taken under different views by either
   moving the model or the camera (or both).
2. From each image I_i (i = 0, ..., M−1), N sensor points u̇_{i,0}, ..., u̇_{i,j},
   ..., u̇_{i,N−1} are extracted (observed), assumed to be in 1:1 correspondence
   with the points on the model plane.
3. From the observed points, the associated homographies H_0, ..., H_{M−1}
   (linear mappings between the model points and the observed 2D image points)
   are estimated for each view i (see Sec. 3.2).
4. From the view homographies H_i, the five intrinsic parameters (α, β, γ,
   u_c, v_c) of the camera are estimated using a closed-form (linear) solution,
   ignoring any lens distortion at this point. M ≥ 3 views give a unique solution
   (up to an undetermined scale factor). If the sensor plane is assumed
   to be without skew (i.e., γ = 0, which is a reasonable assumption), then
   M = 2 views are sufficient. More views generally lead to more accurate
   results (see Sec. 3.3).
5. Once the camera intrinsics are known, the extrinsic 3D parameters R_i, t_i
   are calculated for each camera view i (see Sec. 3.4).
6. The radial distortion parameters k_0, k_1 are estimated by linear least-squares
   minimization (see Sec. 3.5).
7. Finally, using the estimated parameter values as an initial guess, all
   parameters are refined by nonlinear optimization over all M views (see Sec.
   3.6).

These steps are explained in greater detail below (see [14] for a complete
description, and the list of symbols in Table 1).

⁵A variant of this calibration technique is also implemented in OpenCV [7].
3.1 Calibration model and observed views
The calibration model contains N reference points X_0, ..., X_{N−1} whose 3D
coordinates are known. The points are assumed to lie in the XY plane, i.e.,
their Z component is zero.

We assume that M different views (i.e., pictures) are taken of the model
plane, and we use i = 0, ..., M−1 to denote the i-th view of the model. From
each camera picture I_i, N feature points are extracted, so we get the observed
sensor points

\[
\dot{\mathbf{u}}_{i,j} \in \mathbb{R}^2, \tag{38}
\]

with view numbers i = 0, ..., M−1 and point numbers j = 0, ..., N−1. Note
that every observed point u̇_{i,j} must correspond to the associated model point
X_j. Thus the model points X_j and the image points u̇_{i,j} must be supplied in
the same order. It is essential that this condition is met, since otherwise the
calibration will deliver invalid results.
3.2 Step 1: Estimating the homography for each view
Using Eqn. (17), the mapping (homography) between the observed image
coordinates u̇_{i,j} = (u̇_{i,j}, v̇_{i,j}) and the corresponding 3D point coordinates X_j can
be expressed as

\[
s \cdot \begin{pmatrix} \dot{u}_{i,j} \\ \dot{v}_{i,j} \\ 1 \end{pmatrix}
= A \cdot \big( R_i \,|\, \mathbf{t}_i \big) \cdot
\begin{pmatrix} X_j \\ Y_j \\ Z_j \\ 1 \end{pmatrix} \tag{39}
\]

or

\[
s \cdot \dot{\mathbf{u}}_{i,j}
= A \cdot \big( R_i \,|\, \mathbf{t}_i \big) \cdot \mathbf{X}_j \tag{40}
\]

(with homogeneous coordinates u̇, X), where

\[
A = \begin{pmatrix} \alpha & \gamma & u_c \\ 0 & \beta & v_c \\ 0 & 0 & 1 \end{pmatrix} \tag{41}
\]
represents the intrinsic camera parameters (common to every view) and s is
an arbitrary, nonzero scale factor. R_i, t_i are the (extrinsic) 3D rotation matrix
and the translation vector for the specific view i.
Since the model points X_j are assumed to lie in the XY plane of the world
coordinate system (i.e., Z_j = 0 for all j),⁶ we can rewrite Eqn. (39) as

\[
s \cdot \begin{pmatrix} \dot{u}_{i,j} \\ \dot{v}_{i,j} \\ 1 \end{pmatrix}
= A \cdot \big( \mathbf{r}_{i,0} \;\; \mathbf{r}_{i,1} \;\; \mathbf{r}_{i,2} \;\; \mathbf{t}_i \big) \cdot
\begin{pmatrix} X_j \\ Y_j \\ 0 \\ 1 \end{pmatrix}
= A \cdot \big( \mathbf{r}_{i,0} \;\; \mathbf{r}_{i,1} \;\; \mathbf{t}_i \big) \cdot
\begin{pmatrix} X_j \\ Y_j \\ 1 \end{pmatrix}, \tag{42}
\]

where r_{i,0}, r_{i,1}, r_{i,2} denote the three column vectors of the matrix R_i. Note
that Z_j = 0 makes the third vector (r_{i,2}) redundant, and it is therefore omitted
in the right part of Eqn. (42). This is equivalent to a 2D homography mapping

\[
s \cdot \begin{pmatrix} \dot{u}_{i,j} \\ \dot{v}_{i,j} \\ 1 \end{pmatrix}
= H_i \cdot \begin{pmatrix} X_j \\ Y_j \\ 1 \end{pmatrix}, \tag{43}
\]

where s is an (undetermined) nonzero scale factor and H_i = λ·A·(r_{i,0} r_{i,1} t_i) is a
3×3 homography matrix (λ being an arbitrary scalar value which can be ignored,
since we work with homogeneous coordinates). The matrix H_i is composed of
the three column vectors h_{i,0}, h_{i,1}, h_{i,2}, that is,

\[
H_i = \big( \mathbf{h}_{i,0} \;\; \mathbf{h}_{i,1} \;\; \mathbf{h}_{i,2} \big)
= \lambda \cdot A \cdot \big( \mathbf{r}_{i,0} \;\; \mathbf{r}_{i,1} \;\; \mathbf{t}_i \big). \tag{44}
\]

Thus the task is to determine the homography matrix H_i for a set of
corresponding 2D points (u̇_{i,j}, v̇_{i,j}) and (X_j, Y_j).
3.2.1 Homography estimation with the Direct Linear Transformation (DLT)

Among the several methods for estimating homography mappings, the DLT
method is the simplest (see [5, Ch. 4]). It is also used in Zhang’s original
implementation. We assume that the two corresponding 2D point sequences,
the model points X = (X_0, ..., X_{N−1}), with X_j = (X_j, Y_j), and the associated
(observed) sensor points U̇ = (u̇_0, ..., u̇_{N−1}), with u̇_j = (u̇_j, v̇_j), are related
by a homography transformation, that is (written in homogeneous coordinates),

\[
\dot{\mathbf{u}}_j = H \cdot \mathbf{X}_j \tag{45}
\]

or

\[
\begin{pmatrix} \dot{u}_j \\ \dot{v}_j \\ \dot{w}_j \end{pmatrix}
= \begin{pmatrix} H_{0,0} & H_{0,1} & H_{0,2} \\
                  H_{1,0} & H_{1,1} & H_{1,2} \\
                  H_{2,0} & H_{2,1} & H_{2,2} \end{pmatrix} \cdot
\begin{pmatrix} X_j \\ Y_j \\ Z_j \end{pmatrix}, \tag{46}
\]
⁶This only means that we assume a moving camera instead of a moving object plane
and makes no difference, because only the relative coordinates with respect to the camera are
required for calibration.
for j = 0, ..., N−1. Without loss of generality (to show!), we can set Z_j = 1
(such that X_j = X_j, Y_j = Y_j) and can thus rewrite Eqn. (46) as

\[
\begin{pmatrix} \dot{u}_j \\ \dot{v}_j \\ \dot{w}_j \end{pmatrix}
= \begin{pmatrix} H_{0,0} & H_{0,1} & H_{0,2} \\
                  H_{1,0} & H_{1,1} & H_{1,2} \\
                  H_{2,0} & H_{2,1} & H_{2,2} \end{pmatrix} \cdot
\begin{pmatrix} X_j \\ Y_j \\ 1 \end{pmatrix}. \tag{47}
\]

In Cartesian coordinates, this corresponds to a pair of nonlinear equations,

\[
u_j = \frac{\dot{u}_j}{\dot{w}_j}
= \frac{H_{0,0} \cdot X_j + H_{0,1} \cdot Y_j + H_{0,2}}
       {H_{2,0} \cdot X_j + H_{2,1} \cdot Y_j + H_{2,2}}, \tag{48}
\]
\[
v_j = \frac{\dot{v}_j}{\dot{w}_j}
= \frac{H_{1,0} \cdot X_j + H_{1,1} \cdot Y_j + H_{1,2}}
       {H_{2,0} \cdot X_j + H_{2,1} \cdot Y_j + H_{2,2}}, \tag{49}
\]

which can be rearranged to

\[
\dot{u}_j \cdot (H_{2,0} \cdot X_j + H_{2,1} \cdot Y_j + H_{2,2})
= H_{0,0} \cdot X_j + H_{0,1} \cdot Y_j + H_{0,2}, \tag{50}
\]
\[
\dot{v}_j \cdot (H_{2,0} \cdot X_j + H_{2,1} \cdot Y_j + H_{2,2})
= H_{1,0} \cdot X_j + H_{1,1} \cdot Y_j + H_{1,2}, \tag{51}
\]

and finally

\[
\dot{u}_j X_j H_{2,0} + \dot{u}_j Y_j H_{2,1} + \dot{u}_j H_{2,2}
- H_{0,0} X_j - H_{0,1} Y_j - H_{0,2} = 0, \tag{52}
\]
\[
\dot{v}_j X_j H_{2,0} + \dot{v}_j Y_j H_{2,1} + \dot{v}_j H_{2,2}
- H_{1,0} X_j - H_{1,1} Y_j - H_{1,2} = 0. \tag{53}
\]
This is a pair of homogeneous equations (since the right-hand side is zero)
that are linear in the unknown coefficients H_{r,c} (although, due to the mixed
terms, still nonlinear w.r.t. the coordinates u̇_j, v̇_j, X_j, Y_j). By collecting the
nine elements of the unknown homography matrix H into the vector

\[
\mathbf{h} = (H_{0,0}, H_{0,1}, H_{0,2}, H_{1,0}, H_{1,1}, H_{1,2}, H_{2,0}, H_{2,1}, H_{2,2}), \tag{54}
\]

Eqns. (52) and (53) can be written in the form

\[
\begin{pmatrix}
-X_j & -Y_j & -1 & 0 & 0 & 0 & \dot{u}_j X_j & \dot{u}_j Y_j & \dot{u}_j \\
0 & 0 & 0 & -X_j & -Y_j & -1 & \dot{v}_j X_j & \dot{v}_j Y_j & \dot{v}_j
\end{pmatrix} \cdot \mathbf{h}
= \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \tag{55}
\]
for every corresponding point pair (u̇_j, v̇_j) ↔ (X_j, Y_j). Thus N point pairs,
assumed to be related by the same homography H, yield a system of 2N
homogeneous linear equations of the form

\[
\begin{pmatrix}
-X_0 & -Y_0 & -1 & 0 & 0 & 0 & \dot{u}_0 X_0 & \dot{u}_0 Y_0 & \dot{u}_0 \\
0 & 0 & 0 & -X_0 & -Y_0 & -1 & \dot{v}_0 X_0 & \dot{v}_0 Y_0 & \dot{v}_0 \\
-X_1 & -Y_1 & -1 & 0 & 0 & 0 & \dot{u}_1 X_1 & \dot{u}_1 Y_1 & \dot{u}_1 \\
0 & 0 & 0 & -X_1 & -Y_1 & -1 & \dot{v}_1 X_1 & \dot{v}_1 Y_1 & \dot{v}_1 \\
-X_2 & -Y_2 & -1 & 0 & 0 & 0 & \dot{u}_2 X_2 & \dot{u}_2 Y_2 & \dot{u}_2 \\
0 & 0 & 0 & -X_2 & -Y_2 & -1 & \dot{v}_2 X_2 & \dot{v}_2 Y_2 & \dot{v}_2 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
-X_{N-1} & -Y_{N-1} & -1 & 0 & 0 & 0 & \dot{u}_{N-1} X_{N-1} & \dot{u}_{N-1} Y_{N-1} & \dot{u}_{N-1} \\
0 & 0 & 0 & -X_{N-1} & -Y_{N-1} & -1 & \dot{v}_{N-1} X_{N-1} & \dot{v}_{N-1} Y_{N-1} & \dot{v}_{N-1}
\end{pmatrix}
\cdot
\begin{pmatrix}
H_{0,0} \\ H_{0,1} \\ H_{0,2} \\ H_{1,0} \\ H_{1,1} \\ H_{1,2} \\ H_{2,0} \\ H_{2,1} \\ H_{2,2}
\end{pmatrix}
=
\begin{pmatrix}
0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0
\end{pmatrix} \tag{56}
\]

or, in the usual matrix-vector notation,

\[
M \cdot \mathbf{h} = \mathbf{0}, \tag{57}
\]

where M is a 2N×9 matrix (with all elements being known constants), h is
the vector of the nine unknowns, and 0 is the zero vector of length 2N.
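Assembling the matrix M of Eqn. (56) is purely mechanical. The following Java sketch (an illustration with names chosen here, not the report’s implementation) builds the two rows of Eqn. (55) for every point pair:

```java
public class DltDesignMatrix {
    // Assembles the 2N x 9 design matrix M of Eqn. (56) from N model points
    // (X[j], Y[j]) and observed sensor points (u[j], v[j]).
    public static double[][] build(double[] X, double[] Y,
                                   double[] u, double[] v) {
        int N = X.length;
        double[][] M = new double[2 * N][];
        for (int j = 0; j < N; j++) {
            // Row for Eqn. (52): the u-equation of point pair j.
            M[2 * j] = new double[]
                { -X[j], -Y[j], -1, 0, 0, 0, u[j] * X[j], u[j] * Y[j], u[j] };
            // Row for Eqn. (53): the v-equation of point pair j.
            M[2 * j + 1] = new double[]
                { 0, 0, 0, -X[j], -Y[j], -1, v[j] * X[j], v[j] * Y[j], v[j] };
        }
        return M;
    }
}
```

The resulting array can be wrapped, e.g., by `MatrixUtils.createRealMatrix()` to feed the SVD-based solver discussed next.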
Solving homogeneous systems of linear equations: While Eqn. (57)
looks quite similar to an ordinary system of linear equations of the form M·x =
b, it cannot be solved in the usual way (without additional constraints), since it
always has h = 0 as a trivial (and thus useless) solution. However, Eqn. (57) can
be solved by singular-value decomposition (SVD, [4, Sec. 6.11], [1, Sec. 4.5.3], [10,
Sec. 2.6]) of the matrix M, which separates M (of size 2N×9) into the product
of three matrices U, S, V in the form

\[
M = U \cdot S \cdot V^{\mathsf T}. \tag{58}
\]

Here⁷ U is a 2N×2N (in this particular case) unitary⁸ matrix, S is a 2N×9
rectangular diagonal matrix with nonnegative real values, and V is a 9×9
unitary matrix. The concrete decomposition of M is thus
\[
M =
\underbrace{\begin{pmatrix}
U_{0,0} & \cdots & U_{0,2N-1} \\
U_{1,0} & \cdots & U_{1,2N-1} \\
\vdots & \ddots & \vdots \\
U_{2N-1,0} & \cdots & U_{2N-1,2N-1}
\end{pmatrix}}_{U}
\cdot
\underbrace{\begin{pmatrix}
s_0 & 0 & \cdots & 0 \\
0 & s_1 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & s_8 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{pmatrix}}_{S}
\cdot
\underbrace{\begin{pmatrix}
V_{0,0} & \cdots & V_{8,0} \\
V_{0,1} & \cdots & V_{8,1} \\
\vdots & \ddots & \vdots \\
V_{0,8} & \cdots & V_{8,8}
\end{pmatrix}}_{V^{\mathsf T}}. \tag{59}
\]
The diagonal elements of S, s_0, ..., s_8, are called the singular values of the
decomposed matrix M. Each singular value s_i has an associated column vector
u_i in U (called a left singular vector of M) and a dedicated row vector v_i^⊺ in
V^⊺ (i.e., a column vector v_i in V), called a right singular vector. Thus Eqn.
(59) can equally be written as

\[
M = \big( \mathbf{u}_0 \;\cdots\; \mathbf{u}_i \;\cdots\; \mathbf{u}_{2N-1} \big)
\cdot S \cdot
\begin{pmatrix}
\text{---}\; \mathbf{v}_0^{\mathsf T} \;\text{---} \\
\vdots \\
\text{---}\; \mathbf{v}_i^{\mathsf T} \;\text{---} \\
\vdots \\
\text{---}\; \mathbf{v}_8^{\mathsf T} \;\text{---}
\end{pmatrix}, \tag{60}
\]

where U consists of the column vectors (u_0, ..., u_{2N−1})⁹ and V^⊺ is composed of
the row vectors (v_0^⊺, ..., v_8^⊺).

The sought-after solution to Eqn. (57), i.e., the unknown parameter vector
h, is finally found as the right singular vector v_k associated with the smallest
singular value s_k = min(s_0, ..., s_8), that is,

\[
\mathbf{h} = \mathbf{v}_k, \quad \text{with} \quad
k = \operatorname*{argmin}_{0 \le i < 9} s_i. \tag{61}
\]

⁷There are several different notations in use regarding the size of the matrices involved in
the SVD. We use the version adopted by Matlab and Mathematica (http://mathworld.wolfram.
com/SingularValueDecomposition.html), where S has the same size as the original matrix M
and U, V are square.
⁸A square matrix U is called unitary if its column vectors are orthonormal, i.e., if U^⊺·U =
U·U^⊺ = I.
⁹Note that the vectors u_i for i ≥ 9 do not contribute to the product, since the corresponding
singular values are zero.
If the resulting homography transformation for the corresponding point sets
is exact, the value of s_k is zero. This is generally the case when the homography
is calculated from 4 corresponding point pairs, which is the required minimum
number to solve for the eight degrees of freedom.

If more than 4 point pairs are involved, the system in Eqn. (57) is
overdetermined (which is the usual case). Here the value of s_k indicates the residual or
“goodness of fit” of the resulting homography. Of course, if the fit is exact for
all point pairs, s_k will again be zero. In the case of an overdetermined system,
the obtained solution minimizes Eqn. (57) in the least-squares sense, that is,

\[
\| M \cdot \mathbf{h} \|^2 \to \min. \tag{62}
\]

In many common SVD implementations,¹⁰ the singular values in S are arranged
in non-increasing order (i.e., s_i ≥ s_{i+1}), such that (in our case) s_8 comes out as
the smallest value and v_8 (the last column vector of V) is the corresponding
solution. For example, the following Java code segment shows an implementation
with the Apache Commons Math (ACM) library:¹¹

RealMatrix M;  // the 2N x 9 design matrix, assumed to be initialized
SingularValueDecomposition svd = new SingularValueDecomposition(M);
RealMatrix V = svd.getV();
RealVector h = V.getColumnVector(V.getColumnDimension() - 1);
Note that the formulation in Eqn. (62) minimizes an algebraic residual that
does not directly relate to the geometric projection error. This does not cause a
problem in general, since the remaining errors are eliminated in the final, overall
optimization step (see Sec. 3.6). However, minimizing the projection errors of
the homographies at this stage (which is not done in Zhang’s implementation)
may help to improve the convergence of the final optimization. It requires
nonlinear optimization, for which the above solution can serve as a good initial
guess (see Sec. 3.2.3).
3.2.2 Normalization of input data
To improve numerical stability of the calculations, it is recommended [6] to
normalize both 2D point sets Xand ˙
U, before performing the homography
estimation described in the previous section.
Normalization is accomplished by transforming each point set by a dedicated
3×3 normalization matrix N(in homogeneous coordinates), such that the
transformed point set becomes centered at the origin and scaled to a standard
diameter, i. e.,
X0= normalize(X) = (NX·X0,...,NX·XN−1),(63)
˙
U0= normalize( ˙
U)=(NU·˙
u0,...,NU·˙
uN−1).(64)
See Sec. B of the Appendix for methods to calculate the normalization ma
trices NXand NU. Homography estimation is then (analogous to Eqn. (45))
performed on the normalized point sets X0,˙
U0, by calculating a matrix H0sat
isfying (in the least squares sense)
NU·˙
uj=H0·NX·Xj,(65)
10E. g., Matlab,Mathematica,Apache Commons Math.
11http://commons.apache.org/math/ (version 3.6)
Burger – Zhang’s Camera Calibration Algorithm 15
for j= 0, . . . , N −1. By substituting ˙
uj(from Eqn. (45)) we get
NU·H·Xj=H0·NX·Xj(66)
and then, by dropping xjon both sides,
NU·H=H0·NX.(67)
Thus the denormalized homography can be calculated (by multiplying both
sides with N−1
U) as
H=N−1
U·H0·NX.(68)
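The construction of the normalization matrices is deferred to Sec. B, but one common choice (Hartley's scheme, shown here only as a sketch; the report's Sec. B may use a different variant) translates the centroid of the point set to the origin and scales so that the mean distance from the origin becomes √2:

```java
public class PointNormalization {
    // Builds a 3x3 normalization matrix N for Eqns. (63)-(64): translate
    // the centroid to the origin, then scale so that the mean distance of
    // the transformed points from the origin is sqrt(2) (Hartley's scheme;
    // assumed here, not taken from the report's Sec. B).
    public static double[][] normalizationMatrix(double[][] pts) {
        int n = pts.length;
        double cx = 0, cy = 0;
        for (double[] p : pts) { cx += p[0]; cy += p[1]; }
        cx /= n;  cy /= n;
        double meanDist = 0;
        for (double[] p : pts)
            meanDist += Math.hypot(p[0] - cx, p[1] - cy);
        meanDist /= n;
        double s = Math.sqrt(2) / meanDist;
        return new double[][] { { s, 0, -s * cx },
                                { 0, s, -s * cy },
                                { 0, 0, 1 } };
    }
}
```

Applying N_X and N_U built this way, estimating H′, and then denormalizing via Eqn. (68) leaves the final homography unchanged in exact arithmetic but markedly improves the conditioning of the DLT system.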
3.2.3 Nonlinear reﬁnement of estimated homographies
As mentioned above, the homography estimates obtained with the DLT method
do not generally minimize the projection errors in the sensor image. In this
step, the estimated homography Hfor a single view is numerically reﬁned by
minimizing the projection error. Minimizing the projection error requires non
linear optimization, which is usually implemented with iterative schemes, such
as the LevenbergMarquart (LM) method, described in Sec. E of the Appendix.
Given is the ordered sequence of model (target) points X= (X0,...,XN−1),
the set of corresponding (observed) sensor points ˙
U= ( ˙
u0,..., ˙
uN−1) for a
particular view, and the estimated homography H(calculated as described in
Sec. 3.2.1). Following Eqn. (45), the goal is to minimize the projection error
Eproj =
N−1
X
j=0
˙
uj−H·Xj
2(69)
in the sensor plane, assuming that the positions of the model points X_j are accurate. The Levenberg-Marquardt optimization, performed by procedure Optimize() in Alg. 4.3, requires two functions for specifying the optimization problem:

The value function: The function Val(X, h) returns the vector (of length 2N) of projected (u, v) coordinates, obtained by applying the homography H (represented by the vector h = (h_0, …, h_8)) to all model points X_j = (X_j, Y_j) in X, that is,

    (u_j, v_j)ᵀ = hom⁻¹(H · hom(X_j)).    (70)

The returned vector stacks the u, v values for all N model points in the form

    Val(X, h) = Y = (u_0, v_0, u_1, v_1, …, u_{N−1}, v_{N−1})ᵀ.    (71)
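A minimal Java sketch of this value function (plain arrays; the method name val is an assumption) forms the linear terms s_x, s_y, w of the homography mapping once per point and divides by the homogeneous coordinate:

```java
public class HomographyVal {

    // Eqns. (70)-(71): apply the homography h = (h0,...,h8) to all model
    // points (X_j, Y_j) and return the stacked vector
    // (u_0, v_0, ..., u_{N-1}, v_{N-1}).
    static double[] val(double[][] X, double[] h) {
        double[] Y = new double[2 * X.length];
        for (int j = 0; j < X.length; j++) {
            double Xj = X[j][0], Yj = X[j][1];
            double sx = h[0] * Xj + h[1] * Yj + h[2];
            double sy = h[3] * Xj + h[4] * Yj + h[5];
            double w  = h[6] * Xj + h[7] * Yj + h[8];
            Y[2 * j]     = sx / w;   // u_j (first component of hom^{-1})
            Y[2 * j + 1] = sy / w;   // v_j (second component)
        }
        return Y;
    }
}
```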
The Jacobian function: The function Jac(X, h) returns a stacked Jacobian matrix (of size 2N × 9), consisting of the partial derivatives w.r.t. the nine homography parameters. The structure of the returned matrix is

                    [ ∂u_0/∂h_0      ∂u_0/∂h_1      …  ∂u_0/∂h_8     ]
                    [ ∂v_0/∂h_0      ∂v_0/∂h_1      …  ∂v_0/∂h_8     ]
    Jac(X, h) = J = [      ⋮              ⋮               ⋮          ] ,   (72)
                    [ ∂u_{N−1}/∂h_0  ∂u_{N−1}/∂h_1  …  ∂u_{N−1}/∂h_8 ]
                    [ ∂v_{N−1}/∂h_0  ∂v_{N−1}/∂h_1  …  ∂v_{N−1}/∂h_8 ]

where each pair of rows is associated with a particular model point X_j = (X_j, Y_j) and contains the partial derivatives

    [ J_{2j,∗}   ]   [ X_j/w  Y_j/w  1/w    0      0      0      −s_x·X_j/w²  −s_x·Y_j/w²  −s_x/w² ]
    [ J_{2j+1,∗} ] = [ 0      0      0      X_j/w  Y_j/w  1/w    −s_y·X_j/w²  −s_y·Y_j/w²  −s_y/w² ] ,   (73)

with

    s_x = h_0·X_j + h_1·Y_j + h_2 ,    (74)
    s_y = h_3·X_j + h_4·Y_j + h_5 ,    (75)
    w   = h_6·X_j + h_7·Y_j + h_8 .    (76)
See Alg. 4.3 for a compact summary of this step.
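The analytic derivatives in Eqn. (73) translate directly into code. A sketch (plain arrays; the method name jac is an assumption):

```java
public class HomographyJac {

    // Eqns. (72)-(76): stacked 2N x 9 Jacobian of the projected (u, v)
    // coordinates w.r.t. the homography parameters h0..h8.
    static double[][] jac(double[][] X, double[] h) {
        double[][] J = new double[2 * X.length][9];
        for (int j = 0; j < X.length; j++) {
            double Xj = X[j][0], Yj = X[j][1];
            double sx = h[0] * Xj + h[1] * Yj + h[2];   // Eqn. (74)
            double sy = h[3] * Xj + h[4] * Yj + h[5];   // Eqn. (75)
            double w  = h[6] * Xj + h[7] * Yj + h[8];   // Eqn. (76)
            double w2 = w * w;
            // row for u_j, cf. Eqn. (73):
            J[2 * j] = new double[] {Xj / w, Yj / w, 1 / w, 0, 0, 0,
                    -sx * Xj / w2, -sx * Yj / w2, -sx / w2};
            // row for v_j:
            J[2 * j + 1] = new double[] {0, 0, 0, Xj / w, Yj / w, 1 / w,
                    -sy * Xj / w2, -sy * Yj / w2, -sy / w2};
        }
        return J;
    }
}
```

In practice, such analytic rows are most easily validated by comparing them against finite differences of the value function.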
3.3 Step 2: Determining the intrinsic camera parameters

In the previous step, the homographies H_0, …, H_{M−1} were calculated independently for each of the M views. The homographies encode both the common camera intrinsics as well as the extrinsic transformation parameters, which are generally different for each view. This section describes the extraction of the intrinsic camera parameters from the given set of homographies.

As defined in Eqn. (42), a homography H = H_i combines the inner camera transformation A and the view-specific external transformation R, t, such that

    H = ( h_0 | h_1 | h_2 ) = λ · A · ( r_0 | r_1 | t ) ,    (77)

where λ is an arbitrary nonzero scale factor. For R to be a valid rotation matrix, the column vectors r_0, r_1 must be orthonormal, i.e.,

    r_0ᵀ·r_1 = r_1ᵀ·r_0 = 0    and    (78)
    r_0ᵀ·r_0 = r_1ᵀ·r_1 = 1 .    (79)
We can see from Eqn. (77) that

    h_0 = λ·A·r_0 ,    (80)
    h_1 = λ·A·r_1    (81)

and thus

    A⁻¹·h_0 = λ·r_0 ,    (82)
    A⁻¹·h_1 = λ·r_1    (83)

and furthermore¹²

    h_0ᵀ·(A⁻¹)ᵀ = λ·r_0ᵀ ,    (84)
    h_1ᵀ·(A⁻¹)ᵀ = λ·r_1ᵀ .    (85)

Based on Eqns. (78)–(79), this yields two fundamental constraints on the intrinsic parameters for a given homography H:

    h_0ᵀ·(A⁻¹)ᵀ·A⁻¹·h_1 = 0 ,    (86)
    h_0ᵀ·(A⁻¹)ᵀ·A⁻¹·h_0 = h_1ᵀ·(A⁻¹)ᵀ·A⁻¹·h_1    (87)
(the factor λ is irrelevant here). For estimating the camera intrinsics, Zhang substitutes the above expression (A⁻¹)ᵀ·A⁻¹ by a new matrix¹³

                     [ B_0  B_1  B_3 ]
    B = (A⁻¹)ᵀ·A⁻¹ = [ B_1  B_2  B_4 ] ,    (88)
                     [ B_3  B_4  B_5 ]

which is symmetric and composed of only 6 distinct quantities (obtained by inserting from Eqn. (15)):

    B_0 = 1/α² ,                                  B_1 = −γ/(α²β) ,                  (89)
    B_2 = γ²/(α²β²) + 1/β² ,                      B_3 = (v_c·γ − u_c·β)/(α²β) ,     (90)
    B_4 = −γ·(v_c·γ − u_c·β)/(α²β²) − v_c/β² ,    B_5 = (v_c·γ − u_c·β)²/(α²β²) + v_c²/β² + 1 .    (91)
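As a sanity check, the six entries in Eqns. (89)–(91) can be computed directly from given intrinsic parameters (for λ = 1). A sketch (the method name bFromIntrinsics is an assumption):

```java
public class BMatrixCheck {

    // Eqns. (89)-(91): the six distinct entries of B = (A^{-1})^T . A^{-1}
    // for A = [[alpha, gamma, uc], [0, beta, vc], [0, 0, 1]], assuming
    // the scale factor lambda = 1.
    static double[] bFromIntrinsics(double alpha, double beta,
                                    double gamma, double uc, double vc) {
        double a2 = alpha * alpha, b2 = beta * beta;
        double q = vc * gamma - uc * beta;      // recurring term in B3, B4, B5
        double B0 = 1 / a2;
        double B1 = -gamma / (a2 * beta);
        double B2 = gamma * gamma / (a2 * b2) + 1 / b2;
        double B3 = q / (a2 * beta);
        double B4 = -gamma * q / (a2 * b2) - vc / b2;
        double B5 = q * q / (a2 * b2) + vc * vc / b2 + 1;
        return new double[] {B0, B1, B2, B3, B4, B5};
    }
}
```

For γ = 0 the entries reduce to the easily verified values B_0 = 1/α², B_3 = −u_c/α², B_4 = −v_c/β², B_5 = u_c²/α² + v_c²/β² + 1.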
We can now write the constraints in Eqns. (86)–(87) in the form

    h_0ᵀ·B·h_1 = 0 ,    (92)
    h_0ᵀ·B·h_0 − h_1ᵀ·B·h_1 = 0 ,    (93)

with h_p = (H_{0,p}, H_{1,p}, H_{2,p})ᵀ being the p-th column vector of the homography H (for p ∈ {0, 1, 2}, see Eqn. (77)). Using the 6-dimensional vector

    b = (B_0, B_1, B_2, B_3, B_4, B_5)ᵀ    (94)
¹² Since (A·B)ᵀ = Bᵀ·Aᵀ.
¹³ We use the same (not necessarily intuitive) sequencing of the matrix elements as in [15] for compatibility.
to represent the matrix B (Eqn. (88)), we get the identity

    h_pᵀ·B·h_q = v_{p,q}(H)·b ,    (95)

where v_{p,q}(H) is a 6-dimensional row vector obtained from the (estimated) homography H as

    v_{p,q}(H) = ( H_{0,p}·H_{0,q} ,
                   H_{0,p}·H_{1,q} + H_{1,p}·H_{0,q} ,
                   H_{1,p}·H_{1,q} ,
                   H_{2,p}·H_{0,q} + H_{0,p}·H_{2,q} ,
                   H_{2,p}·H_{1,q} + H_{1,p}·H_{2,q} ,
                   H_{2,p}·H_{2,q} ) .    (96)
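The identity in Eqn. (95) is easy to verify numerically. A sketch (method names are assumptions; H is stored as H[row][column], so the column vector h_p corresponds to the entries H[·][p]):

```java
public class VpqCheck {

    // Eqn. (96): the 6-vector v_{p,q}(H) such that h_p^T . B . h_q = v_{p,q} . b.
    static double[] vpq(double[][] H, int p, int q) {
        return new double[] {
            H[0][p] * H[0][q],
            H[0][p] * H[1][q] + H[1][p] * H[0][q],
            H[1][p] * H[1][q],
            H[2][p] * H[0][q] + H[0][p] * H[2][q],
            H[2][p] * H[1][q] + H[1][p] * H[2][q],
            H[2][p] * H[2][q]};
    }

    // Left side of Eqn. (95): h_p^T . B . h_q, with the symmetric B
    // assembled from b = (B0,...,B5) as in Eqn. (88).
    static double hpBhq(double[][] H, int p, int q, double[] b) {
        double[][] B = {{b[0], b[1], b[3]}, {b[1], b[2], b[4]}, {b[3], b[4], b[5]}};
        double s = 0;
        for (int r = 0; r < 3; r++)
            for (int c = 0; c < 3; c++)
                s += H[r][p] * B[r][c] * H[c][q];
        return s;
    }

    static double dot(double[] x, double[] y) {
        double s = 0;
        for (int i = 0; i < x.length; i++)
            s += x[i] * y[i];
        return s;
    }
}
```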
For a particular homography H, the two constraints in Eqns. (86)–(87) can now be reformulated as a pair of linear equations,

    [ v_{0,1}(H)              ]         [ 0 ]
    [ v_{0,0}(H) − v_{1,1}(H) ] · b  =  [ 0 ] ,    (97)

for the unknown vector b (defined in Eqn. (94)). To consider the estimated homographies H_i from all M views, the associated 2M equations are simply stacked in the usual way, i.e.,

    [ v_{0,1}(H_0)                        ]         [ 0 ]
    [ v_{0,0}(H_0) − v_{1,1}(H_0)         ]         [ 0 ]
    [              ⋮                      ] · b  =  [ ⋮ ]     or     V·b = 0 ,    (98)
    [ v_{0,1}(H_{M−1})                    ]         [ 0 ]
    [ v_{0,0}(H_{M−1}) − v_{1,1}(H_{M−1}) ]         [ 0 ]

for short, with the matrix V of size 2M × 6. Again we have an overdetermined system of homogeneous linear equations, which can readily be solved by singular-value decomposition, as described earlier (see Sec. 3.2.1).
Once the vector b = (B_0, B_1, B_2, B_3, B_4, B_5)ᵀ and hence B is known, the camera intrinsics (i.e., the matrix A) can be calculated. Note that the matrix A relates to B only up to the (unknown) scale factor λ, i.e., B = λ·(A⁻¹)ᵀ·A⁻¹. An elegant (though not trivial) closed-form calculation of A, proposed in [15], is

    α = √( w / (d·B_0) ) ,    (99)
    β = √( w/d² · B_0 ) ,    (100)
    γ = −√( w / (d²·B_0) ) · B_1 ,    (101)
    u_c = (B_1·B_4 − B_2·B_3) / d ,    (102)
    v_c = (B_1·B_3 − B_0·B_4) / d ,    (103)
with

    w = B_0·B_2·B_5 − B_1²·B_5 − B_0·B_4² + 2·B_1·B_3·B_4 − B_2·B_3² ,    (104)
    d = B_0·B_2 − B_1² .    (105)
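A direct transcription of Eqns. (99)–(105) recovers the five intrinsic parameters from b; since the ratios involving w, d and B_0 cancel the scale λ, the result is independent of the overall scaling of b. A sketch (the method name intrinsics is an assumption), which can be validated by a round trip through the entries of Eqns. (89)–(91):

```java
public class IntrinsicsFromB {

    // Closed-form recovery of (alpha, beta, gamma, uc, vc) from
    // b = (B0, B1, B2, B3, B4, B5), following Eqns. (99)-(105).
    static double[] intrinsics(double[] b) {
        double B0 = b[0], B1 = b[1], B2 = b[2], B3 = b[3], B4 = b[4], B5 = b[5];
        double w = B0 * B2 * B5 - B1 * B1 * B5 - B0 * B4 * B4
                 + 2 * B1 * B3 * B4 - B2 * B3 * B3;        // Eqn. (104)
        double d = B0 * B2 - B1 * B1;                      // Eqn. (105)
        double alpha = Math.sqrt(w / (d * B0));            // Eqn. (99)
        double beta  = Math.sqrt(w / (d * d) * B0);        // Eqn. (100)
        double gamma = -B1 * Math.sqrt(w / (d * d * B0));  // Eqn. (101)
        double uc = (B1 * B4 - B2 * B3) / d;               // Eqn. (102)
        double vc = (B1 * B3 - B0 * B4) / d;               // Eqn. (103)
        return new double[] {alpha, beta, gamma, uc, vc};
    }
}
```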
An alternative formulation for calculating A – based on numerical decomposition of B – is given in Sec. C.
3.4 Step 3: Extrinsic view parameters

Once the camera intrinsics are known, the extrinsic parameters R, t for each view i can be calculated from the corresponding homography H = H_i. From Eqn. (77) we get

    r_0 = λ·A⁻¹·h_0 ,    r_1 = λ·A⁻¹·h_1 ,    t = λ·A⁻¹·h_2 ,    (106)

with the scale factor

    λ = 1/‖A⁻¹·h_0‖ = 1/‖A⁻¹·h_1‖ ,    (107)

and finally (since R must be orthonormal)

    r_2 = r_0 × r_1 .    (108)

Note that h, r, t, and λ are different for each view i. The resulting 3×3 matrix R = (r_0 | r_1 | r_2) is most likely not a proper rotation matrix. However, there are proven techniques for calculating the most similar "true" rotation matrix for a given 3×3 matrix (again based on singular-value decomposition, as described e.g. in [10, Sec. 2.6.5] and [14, Appendix C]).¹⁴
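This extraction step can be sketched as follows (plain arrays; all helper names are assumptions). The subsequent orthogonalization of R, as discussed above, is omitted here:

```java
public class ExtrinsicsFromH {

    // Eqns. (106)-(108): extrinsic parameters r0, r1, r2, t for one view,
    // given the inverse intrinsic matrix Ainv and the homography H
    // (both as 3x3 arrays, H[row][col]). Returns {r0, r1, r2, t}.
    static double[][] extrinsics(double[][] Ainv, double[][] H) {
        double[] h0 = column(H, 0), h1 = column(H, 1), h2 = column(H, 2);
        double lambda = 1.0 / norm(apply(Ainv, h0));    // Eqn. (107)
        double[] r0 = scale(apply(Ainv, h0), lambda);   // Eqn. (106)
        double[] r1 = scale(apply(Ainv, h1), lambda);
        double[] t  = scale(apply(Ainv, h2), lambda);
        double[] r2 = cross(r0, r1);                    // Eqn. (108)
        return new double[][] {r0, r1, r2, t};
    }

    static double[] column(double[][] M, int c) {
        return new double[] {M[0][c], M[1][c], M[2][c]};
    }

    static double[] apply(double[][] M, double[] x) {
        double[] y = new double[3];
        for (int r = 0; r < 3; r++)
            y[r] = M[r][0] * x[0] + M[r][1] * x[1] + M[r][2] * x[2];
        return y;
    }

    static double norm(double[] x) {
        return Math.sqrt(x[0] * x[0] + x[1] * x[1] + x[2] * x[2]);
    }

    static double[] scale(double[] x, double s) {
        return new double[] {s * x[0], s * x[1], s * x[2]};
    }

    static double[] cross(double[] a, double[] b) {
        return new double[] {
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
    }
}
```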
3.5 Step 4: Estimating radial lens distortion

All calculations so far assumed that the inner camera transformation follows the simple pinhole projection model. In particular, the distortions introduced by real lens systems have been ignored. In this step, a simple nonlinear lens distortion model is added to the projection pipeline and its parameters are calculated from the observed images. This is accomplished in two steps: First, the distortion parameters are estimated by linear least-squares fitting, minimizing the projection error. The lens distortion parameters are then refined (simultaneously with all other parameters) in a final, overall optimization step, described in Sec. 3.6.

At this point the (linear) camera intrinsics (A) are only approximately known, and the assumption is that all remaining projection errors can be attributed to lens distortion. Thus any inaccuracies in the previous steps will also affect the distortion estimates, and the results obtained in this step may be far off the real values. Alternatively, one could omit this step altogether and rely on the overall refinement step (in Sec. 3.6) to calculate accurate distortion parameters (assuming zero distortion at the start).
¹⁴ The Rotation class of the Apache Commons Math library provides a constructor (among others) that builds a proper rotation from any 3×3 matrix that is sufficiently conditioned.
As mentioned, all remaining errors, i.e., the deviations between the projected sensor points u_{i,j} and the actually observed sensor points u̇_{i,j},

    ḋ_{i,j} = u̇_{i,j} − u_{i,j} ,    (109)

are now attributed to lens distortion. Consequently, ḋ_{i,j} is referred to as the observed distortion vector.

As described in Sec. 2.5.2, lens distortion is modeled as a radial displacement, that is, the original (undistorted) projection u_{i,j} is warped to the distorted point ũ_{i,j} by

    ũ_{i,j} = u_c + (u_{i,j} − u_c)·[1 + D(r_{i,j}, k)]    (110)
            = u_c + u_{i,j} − u_c + (u_{i,j} − u_c)·D(r_{i,j}, k)    (111)
            = u_{i,j} + (u_{i,j} − u_c)·D(r_{i,j}, k) = u_{i,j} + d_{i,j} .    (112)
The resulting model distortion vector,

    d_{i,j} = (u_{i,j} − u_c)·D(r_{i,j}, k) ,    (113)

is based on the estimated projection center u_c (in sensor coordinates) and the nonlinear radial distortion function D(r, k), as defined in Eqn. (32). The parameters k = (k_0, k_1) are to be estimated (see Fig. 1). Note that, in Eqn. (113), the radius r_{i,j} passed to the function D() is not calculated from the projected sensor points u_{i,j} but as the distance of the associated points x_{i,j} from the projection center (principal point) x_c = (0, 0)ᵀ in the "normalized" projection plane, that is,

    r_{i,j} = ‖x_{i,j} − x_c‖ = ‖x_{i,j}‖ .    (114)

For a positive function value D(r_{i,j}, k), the sensor point u_{i,j} is shifted outwards (i.e., away from the projection center) to the distorted position ũ_{i,j}, and inwards if the function value is negative.
The unknown distortion parameters k can be estimated by minimizing the difference between the model distortions d_{i,j} (Eqn. (113)) and the associated observed distortions ḋ_{i,j} (Eqn. (109)), that is,

    Σ_{i,j} ‖ d_{i,j} − ḋ_{i,j} ‖ → min .    (115)

In other words, we are looking for a least-squares solution to the overdetermined system of equations d_{i,j} = ḋ_{i,j} (for all point index pairs i, j), that is,

    (u_{i,j} − u_c)·D(r_{i,j}, k) = u̇_{i,j} − u_{i,j} ,    (116)

to find the distortion parameters k. By inserting from Eqns. (109)–(113) and expanding the function D() from Eqn. (32), every observed point i, j contributes a pair of equations

    (u̇_{i,j} − u_c)·r²_{i,j}·k_0 + (u̇_{i,j} − u_c)·r⁴_{i,j}·k_1 = u̇_{i,j} − u_{i,j} ,
    (v̇_{i,j} − v_c)·r²_{i,j}·k_0 + (v̇_{i,j} − v_c)·r⁴_{i,j}·k_1 = v̇_{i,j} − v_{i,j} ,    (117)
to the system. Note that (fortunately) these equations are linear in the unknowns k_0, k_1, i.e., they can be solved with standard linear algebra methods.¹⁵ For this purpose, we rewrite Eqn. (117) in the familiar matrix notation as

    [ (u̇_{i,j} − u_c)·r²_{i,j}   (u̇_{i,j} − u_c)·r⁴_{i,j} ]   [ k_0 ]   [ u̇_{i,j} − u_{i,j} ]
    [ (v̇_{i,j} − v_c)·r²_{i,j}   (v̇_{i,j} − v_c)·r⁴_{i,j} ] · [ k_1 ] = [ v̇_{i,j} − v_{i,j} ] .    (118)
By stacking the equations for all M·N points on top of each other, we end up with a system of 2MN linear equations,

    [ (u̇_{0,0} − u_c)·r²_{0,0}          (u̇_{0,0} − u_c)·r⁴_{0,0}         ]           [ u̇_{0,0} − u_{0,0}         ]
    [ (v̇_{0,0} − v_c)·r²_{0,0}          (v̇_{0,0} − v_c)·r⁴_{0,0}         ]           [ v̇_{0,0} − v_{0,0}         ]
    [ (u̇_{0,1} − u_c)·r²_{0,1}          (u̇_{0,1} − u_c)·r⁴_{0,1}         ]           [ u̇_{0,1} − u_{0,1}         ]
    [ (v̇_{0,1} − v_c)·r²_{0,1}          (v̇_{0,1} − v_c)·r⁴_{0,1}         ]   [k_0]   [ v̇_{0,1} − v_{0,1}         ]
    [             ⋮                                  ⋮                   ] · [k_1] = [            ⋮              ] ,   (119)
    [ (u̇_{M−1,N−1} − u_c)·r²_{M−1,N−1}  (u̇_{M−1,N−1} − u_c)·r⁴_{M−1,N−1} ]           [ u̇_{M−1,N−1} − u_{M−1,N−1} ]
    [ (v̇_{M−1,N−1} − v_c)·r²_{M−1,N−1}  (v̇_{M−1,N−1} − v_c)·r⁴_{M−1,N−1} ]           [ v̇_{M−1,N−1} − v_{M−1,N−1} ]

or D·k = ḋ for short, with k = (k_0, k_1)ᵀ as the vector of unknowns. The least-squares solution that minimizes ‖D·k − ḋ‖² is found with the usual numerical methods (e.g., singular-value or QR decomposition).¹⁶
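Since only the two parameters k_0, k_1 are unknown, the least-squares problem can alternatively be solved via the 2×2 normal equations (DᵀD)·k = Dᵀ·ḋ, which for this small and usually well-conditioned system agrees with the SVD/QR solution. A sketch (the method name estimateK is an assumption; the radii r_{i,j} are assumed to be precomputed per Eqn. (114)):

```java
public class RadialDistortionFit {

    // Least-squares estimate of k = (k0, k1) from the stacked system in
    // Eqn. (119), solved via the 2x2 normal equations (D^T D) k = D^T dDot.
    // uDot: observed sensor points (u, v), u: projected (undistorted)
    // points, r: normalized-plane radii, (uc, vc): projection center.
    static double[] estimateK(double[][] uDot, double[][] u, double[] r,
                              double uc, double vc) {
        double s00 = 0, s01 = 0, s11 = 0, t0 = 0, t1 = 0;
        for (int n = 0; n < r.length; n++) {
            double r2 = r[n] * r[n], r4 = r2 * r2;
            // the two stacked rows contributed by point n, cf. Eqn. (118):
            double[][] rows = {{(uDot[n][0] - uc) * r2, (uDot[n][0] - uc) * r4},
                               {(uDot[n][1] - vc) * r2, (uDot[n][1] - vc) * r4}};
            double[] rhs = {uDot[n][0] - u[n][0], uDot[n][1] - u[n][1]};
            for (int i = 0; i < 2; i++) {   // accumulate D^T D and D^T dDot
                s00 += rows[i][0] * rows[i][0];
                s01 += rows[i][0] * rows[i][1];
                s11 += rows[i][1] * rows[i][1];
                t0  += rows[i][0] * rhs[i];
                t1  += rows[i][1] * rhs[i];
            }
        }
        double det = s00 * s11 - s01 * s01;   // solve the 2x2 system directly
        return new double[] {(s11 * t0 - s01 * t1) / det,
                             (s00 * t1 - s01 * t0) / det};
    }
}
```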
3.6 Step 5: Refining all parameters

The last step of the calibration procedure is to optimize all calibration parameters, i.e., the camera intrinsics and the extrinsic parameters for all M views (with N observed points each), in a single (nonlinear) system of equations. We define the vectors

    a = (α, β, γ, u_c, v_c, k_0, k_1) ,    (120)
         └── from A ──────┘  └ k ┘

to collect the intrinsic camera parameters (a, with 5 elements taken from the estimated matrix A and 2 elements from k) and the extrinsic parameters

    w_i = (ρ_{i,x}, ρ_{i,y}, ρ_{i,z}, t_{i,x}, t_{i,y}, t_{i,z})
           └── from R_i ───────┘  └── from t_i ──────┘