Point cloud densiﬁcation
February 11, 2011
Master’s Thesis in Engineering Physics, 30 ECTS-credits
Supervisor at CS-UmU: Niclas Börlin
Examiner: Christina Igasto
DEPARTMENT OF PHYSICS
SE-901 87 UMEÅ
Several automatic methods exist for creating 3D point clouds extracted from 2D photos. In many
cases, the result is a sparse point cloud, unevenly distributed over the scene.
After determining the coordinates of the same point in two images of an object, the 3D position
of that point can be calculated using knowledge of camera data and relative orientation.
A model created from an unevenly distributed point cloud may lose detail and precision in the
sparse areas. The aim of this thesis is to study methods for densification of point clouds.
This thesis contains a literature study over different methods for extracting matched point pairs,
and an implementation of Least Square Template Matching (LSTM) with a set of improvement
techniques. The implementation is evaluated on a set of different scenes of various difﬁculty.
LSTM is implemented by working on a dense grid of points in an image and Wallis ﬁltering is
used to enhance contrast. The matched point correspondences are evaluated with parameters from
the optimization in order to keep good matches and discard bad ones. The purpose is to ﬁnd details
close to a plane in the images, or on plane-like surfaces.
A set of extensions to LSTM is implemented with the aim of improving the quality of the matched
points. The seed points are improved by Transformed Normalized Cross Correlation (TNCC) and
Multiple Seed Points (MSP) for the same template, and then tested to see if they converge to the
same result. Wallis filtering is used to increase the contrast in the image. The quality of the extracted
points is evaluated with respect to correlation with other optimization parameters and comparison
of the standard deviations in the x- and y-directions. If a point is rejected, there is an option to try
again with a larger template size, called Adaptive Template Size (ATS).
Contents
1 Introduction
1.1 Background
1.2 Aims
1.3 Related Work
1.4 Organization of Thesis
2 Theory
2.1 The 3D modeling process
2.2 Projective geometry
2.2.1 Homogeneous coordinates
2.2.2 Transformations of P2
2.3 The pinhole camera model
2.4 Stereo view geometry
2.4.1 Epipolar geometry
2.4.2 The Fundamental Matrix, F
2.4.3 Triangulation
2.4.4 Image rectification
2.5 Estimation
2.5.1 Statistics
2.5.2 Optimization
2.5.3 Rank N-1 approximation
2.6 Least Squares Template Matching
3 Overview of methods for densification
3.1 Introduction
3.2 Various kinds of input
3.2.1 Video data
3.2.2 Laser scanner data
3.2.3 Still images
3.3 Matching
3.3.1 SIFT, Scale-Invariant Feature Transform
3.3.2 Maximally Stable Extremal Regions
3.3.3 Distinctive Similarity Measure
3.3.4 Multi-View Stereo reconstruction algorithms
3.4 Quality of matches
4 Implementation
4.1 Method
4.2 Implementation details
4.2.1 Algorithm overview
4.2.2 Adaptive Template Size (ATS)
4.2.3 Wallis filtering
4.2.4 Transformed Normalized Cross Correlation (TNCC)
4.2.5 Multiple Seed Points (MSP)
4.2.6 Acceptance criteria
4.2.7 Error codes
4.3 Choice of template size
4.3.1 Calculation of z-coordinate from perturbed input data
5 Experiments
5.1 Image sets
5.1.1 Image pair A, the loading dock
5.1.2 Image pair B, “Sliperiet”
5.1.3 Image pair B, “Elgiganten”
5.2 Experiments
5.2.1 Experiment 1, Asphalt
5.2.2 Experiment 2, Brick walls
5.2.3 Experiment 3, Door
5.2.4 Experiment 4, Lawn
5.2.5 Experiment 5, Corrugated plate
6 Results
6.1 Experiments
6.1.1 Asphalt
6.1.2 Brick walls
6.1.3 Door
6.1.4 Lawn
6.1.5 Corrugated plate
7 Discussion
7.1 Evaluation of aims
7.2 Additional analysis
7.2.1 Point cloud density
7.2.2 Runtime
7.2.3 Error codes
7.2.4 Homographies
8 Conclusions
9 Future work
10 Acknowledgements
A Homographies
A.1 Loading dock
A.2 Building Sliperiet
A.3 Building Elgiganten
B Abbreviations
List of Figures
2.1 Similarity, affine and projective transform of the same pattern.
2.2 Schematic view of a pinhole camera.
2.3 The epipolar line connects the cameras’ focal points.
2.4 Lens distortion
2.5 Normal distribution
2.6 Correlation
4.1 Grid points
4.2 Seed points
4.3 Template and search patch
4.4 Grass without Wallis filtering
4.5 Grass with Wallis filtering
4.6 Normalized Cross Correlation
4.7 Search part for multiple seed points
5.1 Left image of image pair A
5.2 Right image of image pair A
5.3 Left image of image pair B
5.4 Right image of image pair B
5.5 Left image of image pair C
5.6 Right image of image pair C
6.1 Detected points of example 1 c+w
6.2 Asphalt in A.5
6.3 Wallis filtered asphalt in A.5
6.4 Point cloud of areas A.1, A.2 and A.4
6.5 Result of MSP in area A.1
6.6 Results Experiment 3
6.7 Results Experiment 3
6.8 Results Experiment 3
6.9 Histogram over used template sizes in Experiment 3
6.10 Used seed points in exp. 4
6.11 Used seed points in exp. 4
6.14 Histogram over template sizes in experiment 5
7.1 Histogram over error codes
1 Introduction
Several automatic methods exist for creating 3D point clouds extracted from sets of images. In many
cases, they create sparse point clouds that are unevenly distributed over the objects. The task of
this thesis is to evaluate, compare and develop routines and theory for densification of 3D point
clouds obtained from images.
1.1 Background
Point clouds are used in 3D modeling for generation of accurate models of real world items or
scenes. If the point cloud is sparse, the detail of the model will suffer, as will the precision of
approximated geometric primitives. Densification methods are therefore of interest to
1.2 Aims
The aims of this thesis are to evaluate some methods for generation of point clouds and to find
possible refinements that should result in more detailed 3D reconstructions of, for example, buildings
and ground. Some important aspects are speed, robustness, and quality of the output.
The following reconstruction cases are of special interest:
–Some 3D points on a surface, e.g. a wall of a building, have been reconstructed. The goal is to
extract more points on the wall to determine intrusions/extrusions from e.g. window frames.
–A sparse 3D point cloud has been automatically reconstructed on the ground. The ground
topography is represented as a 2.5D mesh. The goal is to extract more points to obtain a
topography of higher resolution.
1.3 Related Work
Several techniques for constructing detailed 3D point clouds exist. With the aim of documenting and
reconstructing detailed heritage objects, the papers by El-Hakim et al., Grün et al.,
Remondino et al. and Remondino et al. describe reconstruction of detailed models
from image data.
Some papers on methods based on video input are found in Gallup et al. and Frahm et al.
The papers by D’Apuzzo and Blostein and Huang deal with quality evaluation of
the generated point clouds.
A prototype of a computer application for photogrammetric reconstruction of textured 3D models
of buildings is presented in Fors Nilsson and Grundberg, where the necessity of point
cloud densification is noted.
An overview of literature for 3D reconstruction algorithms is given by Börlin and Igasto
and a deeper evaluation of algorithms can be found in Seitz et al.
1.4 Organization of Thesis
The focus of this thesis is creating dense point clouds of reliable points extracted from digital images.
In Chapter 2 theories of photogrammetry, 3D reconstruction and statistics are introduced. Chapter 3
presents an overview of other methods used for point matching, densiﬁcation of point clouds and
related subjects. The implemented method and the implementation with details are described in
Chapter 4. A set of experiments designed to evaluate the implemented methods is presented in
Chapter 5. The results of the experiments are presented in Chapter 6 followed by discussion and
evaluation of the aims in Chapter 7. Finally, Chapters 8 and 9 contain conclusions and future work,
followed by acknowledgements in Chapter 10.
2 Theory
Photogrammetry deals with finding the geometric properties of objects, starting with a set of images
of the object. As mentioned in McGlone et al., the subject of photogrammetry was born
in the 1850s, when the ability to take aerial photographs from hot-air balloons inspired
techniques for making measurements in aerial photographs with the aim of making maps of
forests and ground.
The technique is today used in different applications like computer and robotic vision, see for
example Hartley and Zisserman, in creating models of objects and landscapes, and in creating
models of buildings for simulators and virtual reality.
2.1 The 3D modeling process
The 3D modeling process can be described in various ways depending on methods and aims. The
following structure is based on Börlin and Igasto:
1. Image acquisition is the task of planning the camera network, taking photos, calibrating the
cameras and rectifying the images. Different kinds of input images require different
handling. Some examples are images from single cameras, images from a stereo rig, different
angles between the camera positions, video data, and combination with laser scanner data of
2. Feature points detection in images. Feature points are points that are likely to be detected in
3. Matching of feature points is required to know which points are corresponding to each other
in the pair of images.
4. Relative orientation between images calculates the relative positions of the cameras where
the images were taken.
5. Triangulation is used to calculate the 3D point corresponding to each pair of matched points.
6. Co-registration is done to organize point clouds from different sets in the same coordinate
7. Point cloud densiﬁcation is used to ﬁnd more details and retrieve more points for better
estimation of planes and geometry.
8. Segmentation and structuring in order to separate different objects in the images.
9. Texturing the model with extracted textures from images makes the model photo realistic and
This thesis focuses on step 7, in close connection to steps 2 and 3.
2.2 Projective geometry
Projective geometry is an extension to Euclidean geometry to include e.g. ideal points that cor-
respond to the intersection of parallel lines. The following introduction of the subject covers the
concepts necessary to understand the pinhole camera model and geometrical transformations. The
notation of this section follows Hartley and Zisserman .
2.2.1 Homogeneous coordinates
A line in the 2D plane determined by the equation
\[ ax + by + c = 0 \]
can be represented as
\[ \mathbf{l} = [a, b, c]^T, \]
which means that the line consists of all points $\mathbf{x} = [x, y]^T$ that satisfy the equation
$ax + by + c = 0$. In homogeneous coordinates the point is written $[x, y, 1]^T$. Two lines
$\mathbf{l}$ and $\mathbf{l}'$ intersect in the point $\mathbf{x}$ given by the cross product of the lines,
\[ \mathbf{x} = \mathbf{l} \times \mathbf{l}'. \]
In 3D a space point is similarly given by
\[ \mathbf{p} = [x, y, z, 1]^T \]
and a plane by
\[ \mathbf{l} = [a, b, c, d]^T. \]
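The cross product relation for lines can be checked numerically. The following is a minimal sketch in Python with NumPy; the function names `line_through` and `intersection` are chosen here for illustration and do not come from the thesis implementation.

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two homogeneous 2D points."""
    return np.cross(p, q)

def intersection(l1, l2):
    """Intersection of two homogeneous lines as an inhomogeneous 2D point.

    Returns None when the lines are parallel, i.e. when they meet in an
    ideal point whose third coordinate is zero.
    """
    x = np.cross(l1, l2)
    if abs(x[2]) < 1e-12:
        return None
    return x[:2] / x[2]

l1 = np.array([1.0, -1.0, 0.0])    # x - y = 0
l2 = np.array([1.0, 1.0, -2.0])    # x + y - 2 = 0
point = intersection(l1, l2)       # the lines meet in (1, 1)
parallel = intersection(l1, np.array([1.0, -1.0, 3.0]))  # parallel to l1
```

The same cross product gives the line through two points, which is the dual statement of the intersection rule.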
2.2.2 Transformations of P2
Transformations of the projective plane $P^2$ are classified in four classes: isometries, similarity
transformations, affine transformations and projective transformations. A transformation is performed
by multiplying a transformation matrix $H$ with the points $\mathbf{x}$ to transform,
\[ \mathbf{x}' = H\mathbf{x}. \]
Figure 2.1 shows the effects of some different transformations.
Isometries are the simplest kind of transformations. They consist of a translation and a rotation of
the plane, which means that distances and angles are preserved. The transformation is represented by
\[ \mathbf{x}' = \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \mathbf{x}, \]
where $R$ is a 2D rotation matrix including optional mirroring, and $\mathbf{t}$ is a $2 \times 1$ vector determining
the translation. This transformation has three degrees of freedom, corresponding to the rotation angle
and the two components of the translation.
Figure 2.1: Similarity, afﬁne and projective transform of the same pattern.
Combining the rotation of an isometry with a scaling factor $s$ gives a similarity transform
\[ \mathbf{x}' = \begin{bmatrix} sR & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \mathbf{x}. \]
A similarity transform preserves angles between lines, the shape of an object and the ratios between
distances and areas. This transform has four degrees of freedom.
An affine transformation combines the similarity transform with a deformation of the plane, which
in block matrix form is
\[ \mathbf{x}' = \begin{bmatrix} A & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \mathbf{x}, \]
where $A$ is a composition of a rotation matrix and a deformation matrix, which is diagonal and
contains scaling factors for $x$ and $y$,
\[ D = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}. \]
$A$ is then composed as $A = R(\theta)R(-\phi)DR(\phi)$. An affine transformation preserves parallel lines,
ratios of lengths of parallel line segments and ratios of areas, as well as directions in the rotated
plane. The affine transformation has six degrees of freedom.
Projective transformations give perspective views where objects far away are smaller than close ones.
The transformation is represented by
\[ \mathbf{x}' = \begin{bmatrix} A & \mathbf{t} \\ \mathbf{v}^T & v \end{bmatrix} \mathbf{x}, \]
where $v$ is a scalar. The vector $\mathbf{v}^T$ determines the transformation of the ideal points where
parallel lines intersect. The projective transform has 8 degrees of freedom, since only the ratios between
the elements of the matrix are significant. This makes it possible to determine the transform between two
planes from four pairs of corresponding points.
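The invariants of the four transformation classes can be illustrated on a unit square. The sketch below assembles the block matrices for one example of each class and checks side lengths and parallelism; the helper names and the numerical parameter values are assumptions made for this illustration only.

```python
import numpy as np

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def block_transform(A, t, v=(0.0, 0.0), scale=1.0):
    """Assemble the 3x3 matrix [[A, t], [v^T, scale]]."""
    H = np.eye(3)
    H[:2, :2] = A
    H[:2, 2] = t
    H[2, :2] = v
    H[2, 2] = scale
    return H

def apply(H, pts):
    """Apply H to an (n, 2) array of inhomogeneous points."""
    ph = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

def side_lengths(p):
    """Lengths of the sides of a closed polygon given by its corners."""
    return np.linalg.norm(np.roll(p, -1, axis=0) - p, axis=1)

def opposite_sides_parallel(p):
    """True if both pairs of opposite sides of a quadrilateral are parallel."""
    e = np.roll(p, -1, axis=0) - p
    cross = lambda a, b: a[0] * b[1] - a[1] * b[0]
    return bool(abs(cross(e[0], e[2])) < 1e-9 and abs(cross(e[1], e[3])) < 1e-9)

square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
t = np.array([2.0, 1.0])
H_iso = block_transform(rotation(0.3), t)
H_sim = block_transform(1.5 * rotation(0.3), t)
A = rotation(0.3) @ rotation(-0.5) @ np.diag([2.0, 0.5]) @ rotation(0.5)
H_aff = block_transform(A, t)
H_proj = block_transform(rotation(0.3), t, v=(0.2, 0.1))
```

The isometry keeps all side lengths at 1, the similarity scales them to 1.5, the affine map keeps opposite sides parallel, and the projective map does not.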
2.3 The pinhole camera model
Figure 2.2: Schematic view of a pinhole camera. The image plane is shown in front of the camera
centre to simplify the image, in real cameras the image plane, image sensor, is behind the centre of
A simple camera model is the pinhole camera. A 3D point $\mathbf{X}$ in world coordinates maps onto
the 2D point $\mathbf{x}$ on the image plane $Z$ of the camera where the ray between $\mathbf{X}$ and the camera centre
$\mathbf{C}$ intersects the plane. The focal distance $f$ is the distance between the image plane and the camera
centre, which is the focal point of the lens. Orthogonal to the image plane, the principal ray passes
through the camera centre along the principal axis, originating in the principal point of the image
plane. The principal plane is the plane parallel with the image plane through the camera centre.
Figure 2.2 shows a schematic view of the pinhole camera model.
The projection $\mathbf{x}$ of a 3D point $\mathbf{X}$ on the image plane of the camera is given by
\[ \mathbf{x} = P\mathbf{X}, \]
where the camera matrix $P$ is the $3 \times 4$ matrix
\[ P = KR[I \mid -\mathbf{C}]. \]
The camera matrix describes a camera setup composed of internal and external camera parameters.
The internal parameters are the focal length $f$ of the camera, the principal point $(P_x, P_y)$, the
resolution $m_x, m_y$ and optional skew $s$. The focal length and the principal point are converted to
pixels using the resolution parameters: $\alpha_x = f m_x$, $\alpha_y = f m_y$ is the focal length in pixels and
$x_0 = m_x P_x$, $y_0 = m_y P_y$ is the principal point. The internal parameters are stored in the camera
calibration matrix
\[ K = \begin{bmatrix} \alpha_x & s & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}. \]
The external parameters determine the camera position relative to the world. These are the position
of the camera centre $\mathbf{C}$ and the orientation of the camera, given by a rotation matrix $R$.
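A small numerical example may help to make the composition $P = KR[I \mid -\mathbf{C}]$ concrete. The sketch below builds a camera matrix and projects two world points; the focal length of 800 pixels, the principal point (320, 240) and the camera position are arbitrary values assumed for this illustration.

```python
import numpy as np

def camera_matrix(K, R, C):
    """P = K R [I | -C] for calibration K, rotation R and camera centre C."""
    return K @ R @ np.hstack([np.eye(3), -C.reshape(3, 1)])

def project(P, X):
    """Project (n, 3) world points to (n, 2) pixel coordinates."""
    Xh = np.column_stack([X, np.ones(len(X))])
    xh = Xh @ P.T
    return xh[:, :2] / xh[:, 2:3]

# Assumed internal parameters: alpha_x = alpha_y = 800, no skew.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                      # camera axes aligned with the world axes
C = np.array([0.0, 0.0, -5.0])     # camera centre 5 units behind the origin
P = camera_matrix(K, R, C)

X = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.5, 5.0]])
x = project(P, X)
```

A point on the principal axis, such as the world origin here, projects to the principal point.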
2.4 Stereo view geometry
2.4.1 Epipolar geometry
The relationship between two images of the same object taken from different points of view is
described by the epipolar geometry of the images. The two centres of the cameras, $\mathbf{C}$ and $\mathbf{C}'$, span
the baseline, see figure 2.3 (left). Each image has an epipole, $\mathbf{e}$ and $\mathbf{e}'$ respectively, figure 2.3
(right), which is the projection of the focal point of the other camera onto the image plane.
Every plane determined by an arbitrary point $\mathbf{X}$ and the baseline between $\mathbf{C}$ and $\mathbf{C}'$ is an epipolar plane.
The line of intersection of the image plane and the epipolar plane is called the epipolar line. When
the projection point $\mathbf{x}$ of a point $\mathbf{X}$ is known in one image, the projection point $\mathbf{x}'$ is restricted to lie
on the epipolar line in the second image, which is the projection in the second camera of the ray
through the camera centre $\mathbf{C}$ and the point $\mathbf{x}$. This line passes through the epipole $\mathbf{e}'$.
Figure 2.3: The epipolar line connects the cameras’ focal points.
2.4.2 The Fundamental Matrix, F
The fundamental matrix is an algebraic representation of the epipolar geometry. The fundamental
matrix $F$ is defined by
\[ \mathbf{x}'^T F \mathbf{x} = 0 \]
for all corresponding points $\mathbf{x} \leftrightarrow \mathbf{x}'$. The fundamental matrix $F$ is a $3 \times 3$ matrix of rank 2.
Given at least seven or eight pairs of points, the fundamental matrix can be calculated using the seven
point algorithm or the eight point algorithm, respectively; see Hartley and Zisserman for details.
The relationship between the fundamental matrix and the camera matrices is given by
\[ F = [\mathbf{e}']_\times P' P^+, \]
where $[\mathbf{e}']_\times$ is the skew-symmetric matrix that expresses the cross product with $\mathbf{e}'$ as a
matrix-vector multiplication and $P^+$ is the pseudoinverse of the matrix $P$.
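The relation $F = [\mathbf{e}']_\times P' P^+$ can be verified on synthetic data. In the sketch below the epipole $\mathbf{e}'$ is computed as the projection of the first camera centre in the second camera, and the epipolar constraint is evaluated for random points. The camera parameters and function names are assumptions for this illustration, not values from the thesis.

```python
import numpy as np

def skew(v):
    """Matrix [v]_x such that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def fundamental_from_cameras(P1, P2):
    """F = [e']_x P' P^+ for two 3x4 camera matrices, scaled to unit norm."""
    _, _, Vt = np.linalg.svd(P1)
    C1 = Vt[-1]                    # homogeneous camera centre, P1 @ C1 = 0
    e2 = P2 @ C1                   # epipole in the second image
    F = skew(e2) @ P2 @ np.linalg.pinv(P1)
    return F / np.linalg.norm(F)

# Two assumed cameras with the same calibration; the second is moved
# one unit along x and rotated 0.1 rad about the y axis.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.1), np.sin(0.1)
R2 = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
C2 = np.array([1.0, 0.0, 0.0])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ R2 @ np.hstack([np.eye(3), -C2.reshape(3, 1)])
F = fundamental_from_cameras(P1, P2)

# Random 3D points in front of the cameras and their projections.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(-1, 1, (20, 2)),
                     rng.uniform(4, 8, 20),
                     np.ones(20)])
x1 = X @ P1.T
x2 = X @ P2.T
x1 /= x1[:, 2:3]
x2 /= x2[:, 2:3]

# Epipolar constraint x'^T F x for every correspondence (should be ~0).
residuals = np.einsum("ij,jk,ik->i", x2, F, x1)
singular_values = np.linalg.svd(F, compute_uv=False)
```

The third singular value of `F` is zero up to rounding, in agreement with the rank 2 property.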
2.4.3 Triangulation
When the camera matrices are calculated, and the coordinates for a point correspondence are known,
the 3D point can be calculated by solving the equation system
\[ \mathbf{x} = P\mathbf{X}, \quad \mathbf{x}' = P'\mathbf{X}. \]
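One common way to solve this system is the linear (homogeneous) triangulation method described by Hartley and Zisserman, where each image point contributes two linear equations in $\mathbf{X}$ and the solution is taken from the SVD. The sketch below is an illustration under that assumption; it is not necessarily the method used in the thesis implementation, and the camera values are made up.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear triangulation of one point from two views.

    Each view gives two rows of the form x * P[2] - P[0] and y * P[2] - P[1];
    the 3D point is the null vector of the stacked 4x4 system.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    Xh = Vt[-1]
    return Xh[:3] / Xh[3]

def project(P, X):
    """Project one 3D point to pixel coordinates."""
    xh = P @ np.append(X, 1.0)
    return xh[:2] / xh[2]

# Assumed stereo pair: identical calibration, baseline of one unit along x.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, -0.2, 4.0])
X_est = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
```

With noise-free image points the true 3D point is recovered; with noisy points the SVD gives the algebraic least squares solution.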
2.4.4 Image rectiﬁcation
The lens in a camera causes some distortion in the images, so that straight lines in the outskirts
of the image are projected as curves, because of the mapping of a 3D world onto a 2D sensor through
a spherical lens. This error in the images can be reduced by rectification of the image. In this
work, only pre-rectified images are used. Figure 2.4 illustrates the effect of lens distortion and
rectification. The topics of image rectification and lens distortion are thoroughly explained in Hartley
and Zisserman and Remondino.
Figure 2.4: The grid to the left is curved as a lens distorted image, the right image is the rectiﬁed
2.5 Estimation
2.5.1 Statistics
This section mostly follows the notation from Montgomery et al.
Origins of errors
In tasks where measurements are made, some errors usually occur. The quality of measurements
is affected by systematic errors (bias) and unstructured errors (variance).
In photogrammetry, usual origins of errors are the camera calibration, the quality of point extraction,
the quality of the model function and numeric errors in triangulation and optimization.
Normal and $\chi^2$ distribution
The Normal distribution describes the way many random errors affect the results. The most probable
values are close to the expected value $\mu$. Few values are far away. The standard deviation $\sigma$ (and
variance) describes the dispersion of values. The distribution of a normal random variable is defined
by the probability density function $N(\mu, \sigma^2)$, giving
\[ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x - \mu)^2}{2\sigma^2}} \quad \text{for } -\infty < x < \infty, \]
where $\mu$ is the expectation value of the distribution and $\sigma^2$ is the variance.
Figure 2.5: Normal distribution with expectation value µ= 5 and variance σ2= 10.
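The $\chi^2$ distribution is defined by
\[ p_y(y, n) = \frac{y^{(n/2 - 1)} e^{-y/2}}{2^{n/2}\,\Gamma(n/2)}, \quad n \in \mathbb{N},\ y > 0, \]
where $\Gamma(\cdot)$ is the Gamma function. A particular case is the sum of squared independent random
variables $z_i \sim N(0, 1)$,
\[ y = \sum_{i=1}^{n} z_i^2, \]
which is $\chi^2$ distributed with $n$ degrees of freedom [Förstner and Wrobel, 2004].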
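As a sanity check of the two density functions, they can be integrated numerically. The sketch below uses a simple trapezoid rule; the parameter values ($\mu = 5$, $\sigma^2 = 10$ as in figure 2.5, and $n = 4$) and the integration limits are choices made for this illustration.

```python
import math
import numpy as np

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2), where sigma2 is the variance."""
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

def chi2_pdf(y, n):
    """Density of the chi-square distribution with n degrees of freedom."""
    return y ** (n / 2.0 - 1.0) * np.exp(-y / 2.0) / (2.0 ** (n / 2.0) * math.gamma(n / 2.0))

def trapezoid(values, step):
    """Trapezoid rule on an equidistant grid."""
    return step * (np.sum(values) - 0.5 * (values[0] + values[-1]))

step = 1e-3
x = np.arange(-40.0, 50.0 + step, step)
normal_area = trapezoid(normal_pdf(x, 5.0, 10.0), step)

y = np.arange(0.0, 80.0 + step, step)
chi2_area = trapezoid(chi2_pdf(y, 4), step)
chi2_mean = trapezoid(y * chi2_pdf(y, 4), step)   # expected value is n
```

Both densities integrate to one, and the mean of the $\chi^2$ distribution equals its number of degrees of freedom.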
Variance and standard deviation
The variance is a measure of the width of a distribution. It is defined as
\[ \sigma^2 = V(x) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx, \]
where $f(x)$ is the probability density function of the distribution and $\mu$ is the expected value. The
standard deviation is $\sigma$, the square root of the variance.
Covariance is a measure of how two variables vary together. A covariance of zero implies
that the variables are uncorrelated. For two variables $x$ and $y$ with expected values $E(x) = \mu_x$
and $E(y) = \mu_y$ the covariance is
\[ \mathrm{Cov}(x, y) = E(xy) - \mu_x \mu_y. \]
The correlation coefficient is the normalized covariance and determines the strength of the linear
relationship between the variables. The correlation coefficient is determined by
\[ \rho_{xy} = \frac{\mathrm{Cov}(x, y)}{\sqrt{V(x)V(y)}}, \]
which is estimated from $n$ observations as $S_{xy}/\sqrt{S_{xx}S_{yy}}$,
where $S_{xy}$ is called the corrected sum of cross products, defined by
\[ S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \]
and $S_{xx}$ and $S_{yy}$ are defined analogously.
Correlated and non-correlated errors
A positive correlation coefficient between $x$ and $y$ implies that given a small value of $x$, a small
value of $y$ is likely. If the coefficient is zero, there is no linear relationship. If the observations are
plotted, $\rho$ is close to 0 if the plotted points are evenly scattered without any linear trend. If they
are close to a line with positive slope, $\rho$ is close to 1.
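The sample correlation coefficient can be computed directly from corrected sums, $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$ with $S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$. The sketch below uses synthetic data chosen for this illustration and compares the result with NumPy's `corrcoef`.

```python
import numpy as np

def correlation_coefficient(x, y):
    """Sample correlation coefficient from corrected sums of products."""
    dx = x - x.mean()
    dy = y - y.mean()
    s_xy = np.sum(dx * dy)
    s_xx = np.sum(dx * dx)
    s_yy = np.sum(dy * dy)
    return s_xy / np.sqrt(s_xx * s_yy)

x = np.arange(10.0)
r_line = correlation_coefficient(x, 2.0 * x + 1.0)    # points on a rising line
r_falling = correlation_coefficient(x, -x)            # points on a falling line

# Noisy observations: compare with NumPy's built-in estimate.
rng = np.random.default_rng(1)
y_noisy = x + rng.normal(0.0, 3.0, size=x.size)
r_noisy = correlation_coefficient(x, y_noisy)
r_reference = np.corrcoef(x, y_noisy)[0, 1]
```

Points exactly on a line with positive slope give a coefficient of 1, and points on a line with negative slope give -1.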
Figure 2.6: Values in the left image are correlated with a correlation coefﬁcient close to 1. Values in
the right image are not correlated, and hence the correlation coefﬁcient is close to 0.
The covariance matrix
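The covariance matrix is composed of the variances of the variables and their covariances. For $n$
variables $x_1, \ldots, x_n$ it is
\[ C_{xx} = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2 \end{bmatrix}, \]
where $\sigma_i^2 = V(x_i)$ and $\sigma_{ij} = \mathrm{Cov}(x_i, x_j)$.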
Error propagation of linear combinations of random variables
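As presented in Förstner and Wrobel [2004], a set of $n$ normally distributed random
variables with covariance matrix $C_{xx}$ can compose a vector
\[ \mathbf{x} = [x_1, x_2, \ldots, x_n]^T, \quad \mathbf{x} \sim N(\mu_x, C_{xx}). \]
A linear transformation of the vector $\mathbf{x}$ is defined by
\[ \mathbf{y} = M\mathbf{x} + \mathbf{p}. \]
The expected value and the variance transform as follows:
\[ E(\mathbf{y}) = E(M\mathbf{x} + \mathbf{p}) = M E(\mathbf{x}) + \mathbf{p} = M\mu_x + \mathbf{p}, \]
\[ C_{yy} = V(\mathbf{y}) = V(M\mathbf{x} + \mathbf{p}) = M V(\mathbf{x}) M^T + V(\mathbf{p}) = M C_{xx} M^T, \]
since the constant vector $\mathbf{p}$ has zero variance.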
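The propagation rule $E(\mathbf{y}) = M\mu_x + \mathbf{p}$, $V(\mathbf{y}) = M C_{xx} M^T$ can be compared with a Monte Carlo simulation. The matrices and the sample size below are arbitrary values assumed for this sketch.

```python
import numpy as np

def propagate(M, p, mu_x, C_xx):
    """Mean and covariance of y = M x + p for x with mean mu_x and covariance C_xx."""
    return M @ mu_x + p, M @ C_xx @ M.T

M = np.array([[1.0, 1.0],
              [1.0, -1.0]])        # y1 = x1 + x2, y2 = x1 - x2 + 1
p = np.array([0.0, 1.0])
mu_x = np.array([1.0, 2.0])
C_xx = np.array([[4.0, 1.0],
                 [1.0, 9.0]])

mu_y, C_yy = propagate(M, p, mu_x, C_xx)

# Monte Carlo check with a fixed seed: transform samples of x and
# estimate the mean and covariance of y empirically.
rng = np.random.default_rng(42)
samples = rng.multivariate_normal(mu_x, C_xx, size=200_000)
y = samples @ M.T + p
mu_y_mc = y.mean(axis=0)
C_yy_mc = np.cov(y, rowvar=False)
```

Here the analytic result is $\mu_y = (3, 0)^T$ and $C_{yy}$ has variances 15 and 11 with covariance $-5$; the simulated values agree up to sampling noise.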
2.5.2 Optimization
Optimization has a set of different applications in photogrammetry. One is template matching,
which is the area of interest for this work, and another is bundle adjustment, where a model and a
point cloud are adjusted to each other. The task of least squares optimization is to minimize the
norm $\|r(x)\|$ of the residual $r(x)$ between a model function $f(x)$ and the observations $b$. If the
model is linear, the residual becomes
\[ r(x) = Ax - b, \]
where $A$ is a constant matrix.
Many problems belong to the class of least squares problems, which can be solved using different
algorithms. The choice of optimization algorithm is a question of efficiency in time and memory,
precision, probability of convergence and implementation difficulty. Some methods, such as
Levenberg-Marquardt and Gauss-Newton [Nocedal and Wright, 1999, ch.10.3], linearize the problem and solve
it using linear optimization techniques.
If the covariances of the variables are known and not zero, the problem is considered weighted and
the optimization problem can be formulated as
\[ \min_x \|r(x)\|_W^2. \]
The index $W$ indicates that the norm $\|\cdot\|_W^2$ is weighted and defined as
\[ \|r\|_W^2 = r^T W r. \]
The weight matrix $W$ is defined by
\[ W = C_{bb}^{-1}, \]
where $C_{bb}$ is the covariance matrix of the observations. The structure of the covariances may be
enough, and then the covariance matrix can be decomposed to
\[ C_{bb} = \sigma_0^2 Q_{bb}, \]
where $\sigma_0^2$ is a scaling parameter and $Q_{bb}$ is a structure parameter.
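For a linear model $r(x) = Ax - b$, the weighted problem has the closed-form solution given by the normal equations $A^T W A\,x = A^T W b$. The sketch below solves them directly, which is adequate for small, well-conditioned problems; the function name and the toy data are assumptions for this illustration.

```python
import numpy as np

def weighted_least_squares(A, b, C_bb):
    """Minimise (Ax - b)^T W (Ax - b) with W = inv(C_bb).

    Returns the estimate x and its covariance inv(A^T W A).
    """
    W = np.linalg.inv(C_bb)
    N = A.T @ W @ A                     # normal matrix
    x = np.linalg.solve(N, A.T @ W @ b)
    return x, np.linalg.inv(N)

# Two direct observations of the same quantity, the second one with
# four times the variance of the first.
A = np.array([[1.0], [1.0]])
b = np.array([1.0, 3.0])
C_bb = np.diag([1.0, 4.0])
x_w, C_xx = weighted_least_squares(A, b, C_bb)

# With equal weights the estimate reduces to the ordinary mean.
x_u, _ = weighted_least_squares(A, b, np.eye(2))
```

The weighted estimate is 1.4, closer to the more precise first observation, whereas equal weights give the mean 2.0.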
2.5.3 Rank N-1 approximation
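The rank N-1 transform is often, for historical reasons, called the Direct Linear Transform. In this
work, the aim of the rank N-1 transform is to find the homography $H: P^2 \to P^2$ from a set of point
correspondences $\mathbf{x}_i \leftrightarrow \mathbf{x}'_i$ in a pair of images. The homography can be exactly determined from
four point correspondences or estimated from a larger number of points using the Singular Value
Decomposition (SVD), described in e.g. Strang.
The transformation is given by
\[ \mathbf{x}'_i = H\mathbf{x}_i, \]
where $H$ is a $3 \times 3$ matrix. $\mathbf{x}'_i$ and $H\mathbf{x}_i$ are parallel in $R^3$, giving $\mathbf{x}'_i \times H\mathbf{x}_i = \mathbf{0}$.
Rewriting the system, using $\mathbf{h}_1 = [h_{11}\ h_{12}\ h_{13}]^T$ (and similarly $\mathbf{h}_2$ and $\mathbf{h}_3$ for the other
rows of $H$) and $\mathbf{x}'_i = (x'_i, y'_i, w'_i)^T$, gives in matrix form
\[ \begin{bmatrix} \mathbf{0}^T & -w'_i \mathbf{x}_i^T & y'_i \mathbf{x}_i^T \\ w'_i \mathbf{x}_i^T & \mathbf{0}^T & -x'_i \mathbf{x}_i^T \\ -y'_i \mathbf{x}_i^T & x'_i \mathbf{x}_i^T & \mathbf{0}^T \end{bmatrix} \begin{bmatrix} \mathbf{h}_1 \\ \mathbf{h}_2 \\ \mathbf{h}_3 \end{bmatrix} = \mathbf{0}, \]
also written as $A'_i \mathbf{h} = \mathbf{0}$. This system has only two linearly independent rows, so one row can
be removed. By assembling the remaining $2 \times 9$ equations for all $n$ known points, a $2n \times 9$ system
$A\mathbf{h} = \mathbf{0}$ is built. This gives an over-determined equation system, and due to noise it will
probably not have an exact solution. Instead, the optimization problem
\[ \min_{\mathbf{h}} \|A\mathbf{h}\| \quad \text{s.t.} \quad \|\mathbf{h}\| = 1 \]
is considered.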
This can be solved by using the SVD of $A$,
\[ A = U \Sigma V^T. \]
The solution is found in the right singular vector $\mathbf{v}_n$ corresponding to the smallest singular value.
The matrix $A$ is usually ill-conditioned, because the centres of gravity of the
sets of points are far from the origin. Normalizing the points, as described in Hartley and
Zisserman [2003, ch.4.1], gives a less perturbation sensitive system, which implies smaller variance
in the transformed points. By transforming the points to have their centre of gravity at the origin
and mean distance $\sqrt{2}$ to the origin before applying the rank N-1 transform, the system becomes
better conditioned.
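The estimation of a homography with normalization can be sketched as follows. Two rows are assembled per correspondence, the solution is taken from the last right singular vector, and the normalizing similarity transforms are undone afterwards. The function names, the test homography and the point coordinates are assumptions for this illustration and do not reproduce the thesis code.

```python
import numpy as np

def to_homogeneous(pts):
    return np.column_stack([pts, np.ones(len(pts))])

def normalising_transform(pts):
    """Similarity moving the centroid to the origin with mean distance sqrt(2)."""
    centroid = pts.mean(axis=0)
    mean_dist = np.mean(np.linalg.norm(pts - centroid, axis=1))
    s = np.sqrt(2.0) / mean_dist
    return np.array([[s, 0.0, -s * centroid[0]],
                     [0.0, s, -s * centroid[1]],
                     [0.0, 0.0, 1.0]])

def dlt_homography(x, xp):
    """Estimate H with xp ~ H x from (n, 2) point arrays, n >= 4."""
    T, Tp = normalising_transform(x), normalising_transform(xp)
    xn = to_homogeneous(x) @ T.T
    xpn = to_homogeneous(xp) @ Tp.T
    rows = []
    for X, (u, v, w) in zip(xn, xpn):
        # The two linearly independent rows for one correspondence.
        rows.append(np.concatenate([np.zeros(3), -w * X, v * X]))
        rows.append(np.concatenate([w * X, np.zeros(3), -u * X]))
    _, _, Vt = np.linalg.svd(np.array(rows))
    Hn = Vt[-1].reshape(3, 3)           # singular vector of the smallest singular value
    H = np.linalg.inv(Tp) @ Hn @ T      # undo the normalisation
    return H / H[2, 2]

def apply_homography(H, pts):
    ph = to_homogeneous(pts) @ H.T
    return ph[:, :2] / ph[:, 2:3]

H_true = np.array([[1.2, 0.1, 5.0],
                   [-0.2, 0.9, 3.0],
                   [0.001, 0.002, 1.0]])
x = np.array([[0.0, 0.0], [100.0, 0.0], [100.0, 80.0],
              [0.0, 80.0], [50.0, 40.0], [30.0, 70.0]])
xp = apply_homography(H_true, x)
H_est = dlt_homography(x, xp)
```

Dividing by the element `H[2, 2]` fixes the free scale of the homography; this assumes that element is not zero, which holds for the example above.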
2.6 Least Squares Template Matching
Least Squares Template Matching (LSTM), presented by Gruen, searches for the best position
for a template in a search patch. The method is closely related to Adaptive Least Squares
Matching (ALSM) described in Gruen. The template and the search patch are described by
two discrete two-dimensional functions, $f(x, y)$ and $g(x, y)$. A noise function $e(x, y)$ contains the
difference between the image functions,
\[ f(x, y) - e(x, y) = g(x, y). \]
The aim of the optimization is to reduce the noise function $e(x, y)$. The template function $f(x, y)$
is transformed by an affine transformation approximating the projective difference between the
images to match the search patch function $g(x, y)$. Optionally, the transformation is combined with
radiometric parameters to compensate for lighting differences. The optimization parameters are
combined in a vector $\mathbf{x}_0$, based on the homography matrix $H$. The Gauss-Newton optimization is
then applied to the elements of $\mathbf{x}$ to minimize the resulting noise,
\[ \mathbf{x} \quad \text{s.t.} \quad e(\mathbf{x}) = \min. \]
The output from LSTM is primarily the position of the template in both images. Optionally, the
implementation also returns the results of the other optimization parameters, the step lengths used
by Gauss-Newton and statistics.
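The principle can be illustrated with a strongly simplified variant that estimates only a two-parameter shift with Gauss-Newton, leaving out the affine and radiometric parameters of full LSTM. In the sketch below the images are a hypothetical analytic function, so that sub-pixel values are available without interpolation, and the gradients are approximated with central differences. All names and values are assumptions for this illustration.

```python
import numpy as np

def image(x, y):
    """Hypothetical smooth test image consisting of two Gaussian blobs."""
    return (np.exp(-((x - 10.0) ** 2 + (y - 12.0) ** 2) / 30.0)
            + 0.5 * np.exp(-((x - 18.0) ** 2 + (y - 6.0) ** 2) / 20.0))

TRUE_SHIFT = np.array([1.3, -0.7])

def search_patch(x, y):
    """g(x, y): the same scene displaced by TRUE_SHIFT."""
    return image(x - TRUE_SHIFT[0], y - TRUE_SHIFT[1])

def match_shift(f, g, xs, ys, p0=(0.0, 0.0), iters=50, tol=1e-10, h=1e-4):
    """Gauss-Newton estimate of the shift p minimising sum (g(X + p) - f(X))^2."""
    p = np.array(p0, dtype=float)
    for _ in range(iters):
        r = (g(xs + p[0], ys + p[1]) - f).ravel()          # residual vector
        # Jacobian: central-difference gradients of g at the shifted positions.
        dgdx = (g(xs + p[0] + h, ys + p[1]) - g(xs + p[0] - h, ys + p[1])) / (2 * h)
        dgdy = (g(xs + p[0], ys + p[1] + h) - g(xs + p[0], ys + p[1] - h)) / (2 * h)
        J = np.column_stack([dgdx.ravel(), dgdy.ravel()])
        step = np.linalg.lstsq(J, -r, rcond=None)[0]       # Gauss-Newton step
        p += step
        if np.linalg.norm(step) < tol:
            break
    return p

# Template f sampled on a 25 x 25 pixel grid.
ys, xs = np.mgrid[0:25, 0:25].astype(float)
f = image(xs, ys)
p_hat = match_shift(f, search_patch, xs, ys)
```

Starting from a zero shift, the iteration converges to the displacement between template and search patch. Convergence relies on the start value being close enough to the solution, which is why good seed points matter in practice.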
3 Overview of methods for densification
3.1 Introduction
Many different methods are used with the aim of improving the density of point clouds, with
different pros and cons. For image acquisition, video data or still images can be used, sometimes
in combination with laser scanner data of the object. Both feature based methods and area based
methods are used in various implementations. Much of the work on densification is done on small
objects with a symmetrical camera network all around the object.
Finding general methods for densification is still an open problem.
3.2 Various kinds of input
3.2.1 Video data
Working with video data has the advantage of very short base lines between sequential images, and
with knowledge of the camera motion and calculated vanishing lines, which are vertical, an up-vector
can be determined. A disadvantage of this kind of input data is the lower resolution of most
video cameras, which probably gives less detail, and the large amount of data created by the
short base line images. In Gallup et al. and Frahm et al. a system
for 3D-reconstruction of architectural scenes is presented. The input data used are uncalibrated
video data. For faster handling of the data the GPU is used for calculations. The introduction of
the Viewpoint Invariant Patch (VIP) gives a descriptor of features looking similar from a variety of
3.2.2 Laser scanner data
A laser scanner device can make accurate measurements of the distance to points in an object.
Combining laser scanner data from different points of view gives the opportunity to create a detailed
model of the hull of an object. In Wendt laser scanner data are combined with still images
by describing features in both data sets and then matching these together. That makes it possible to
create accurate textured models. However, a laser scanning device is expensive, and not commonly
3.2.3 Still images
For still images, many different methods are used. The choice of method depends on whether the
input consists of single images or stereo images, whether the base lines are short or long, and whether
the camera positions are known or not. As mentioned in Remondino, the camera network is important
to improve the precision of the reconstructed model. Knowledge of the requirements of the
reconstruction method is important when choosing the cameras used, their positions and their
3.3 Matching
There are two main classes of methods for matching of images, feature based methods and area
based methods, in some cases combined with each other to improve results. Some different
approaches are briefly presented here.
3.3.1 SIFT, Scale-Invariant Feature Transform
SIFT is a widely used feature based method for object recognition and was first presented by Lowe.
It uses a class of local image features which are invariant to image scaling, translation and
rotation, and partially invariant to changes in lighting and projection.
3.3.2 Maximally Stable Extremal Regions
Maximally Stable Extremal Regions (MSER) is a method for feature detection proposed by Matas
et al., where regions which are little affected by changes in the point of view are chosen as feature
points. Wide baseline image pairs can be used for point matching with this method.
3.3.3 Distinctive Similarity Measure
In Yoon and Kweon [2008], a method to measure the distinctiveness of a feature point is presented.
A point which is unique in the image is treated as more valuable than a point that is similar to many
3.3.4 Multi-View Stereo reconstruction algorithms
Multi-view stereo algorithms are a family of algorithms useful for high precision reconstruction of
small objects. The requirement of dense camera networks in exact positions makes them unsuitable
for outdoor work, but they are used for detailed reconstructions of smaller objects. An evaluation
of a set of these algorithms is given in Seitz et al. [2006].
3.4 Quality of matches
A denser point cloud is not useful if the matched points are of low quality due to wrong matches
or low precision. D’Apuzzo [2003] suggests a couple of measures for assessing the quality of a
point match. Some of the suggested values are the σ0 from template matching, the differences in
shifts in the x- and y-directions, the scale factors in x and y, and the step lengths used by the optimization.
The objective of this thesis is template matching for the purpose of point cloud densification. In this
section, an implementation of least squares template matching is described, combined with a set of
different methods for refinement: preprocessing of the images, analysis of the resulting parameters
and successive improvement.
Four corners in one image and a homography for the particular plane to match in the pair
of images are given to the method. The images have known epipolar geometry and camera positions.
The implemented methods shall be evaluated with respect to
–Point cloud quality:
•Completeness — what is the density of the generated point cloud?
•Robustness — how many matches are correct?
•Precision — what is the reconstruction error for the correct matches?
–Method sensitivity to:
•Object geometry — how large deviations from the basic shape can be reconstructed?
•Camera network geometry — how well do the methods work with a long baseline, i.e.
the images were taken far apart?
The chosen method is Least Squares Template Matching (LSTM) as described in section 2.6. A dense
grid of seed points is constructed for the area of interest in the left image. The grid is transformed to
the right image using the homography to generate initial guesses for the points. This gives a more
evenly distributed set of seed points than feature-based methods usually generate. The density of
points can be adjusted by changing the grid.
A number of extensions to LSTM have been implemented. These are: Wallis filtering for contrast
enhancement, Transformed Normalized Cross Correlation (TNCC) to improve the initial pairs
of seed points, Multiple Seed Points (MSP) to ensure stability of the match, and Adaptive Template
Size (ATS) to detect matches where the scale of gradients in the image is too large for the first
template size used. The matches found are evaluated with respect to matching parameters to detect
and reject possible false matches.
18 Chapter 4. Implementation
4.2 Implementation details
4.2.1 Algorithm overview
1. Create a homography H for a plane P representing a wall, ground or equivalent surface in the
set of images.
2. Place a rectangular grid over the area in the left image for point generation, as in figure 4.1.
3. Create initial seed points by transforming the grid points with the homography to the right
image, as in figure 4.2.
4. Optional: Improve the contrast by Wallis filtering (section 4.2.3).
5. For each grid point:
(a) Optional: Improve the seed point by Transformed Normalized Cross Correlation, see
section 4.2.4.
– If the maximum value of the cross correlation is too low, set an error code.
(b) Cut out a template centered on the grid point in the left image and a search patch, three
times larger than the template, centered on the improved seed point in the right image,
see figure 4.3.
(c) Optional: If Multiple Seed Points is used, put eight extra seed points around the grid
point in the left image.
– Perform Least Squares Template Matching for all nine points.
– Choose the coordinates that most start points converge to as the found point. If fewer than
three converge to the same point, restart from the normalized cross correlation with
a larger template.
– If fewer than four seed points converge to the same point, set an error code.
(d) If Multiple Seed Points is not used, perform Least Squares Template Matching on the point.
(e) Calculate the covariance matrix for the found point. If it does not pass the analysis, set an
error code.
6. Calculate 3D points for the accepted point pairs using calibration data for the cameras.
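Steps 2 and 3 of the overview can be illustrated with a minimal sketch. It is not the thesis implementation (which was written in MATLAB); the function names `make_grid` and `apply_homography`, the rectangle limits and the homography `H_example` are assumptions chosen only for illustration.

```python
def make_grid(x_min, y_min, x_max, y_max, n):
    """Return n*n grid points (x, y) spread evenly over a rectangle."""
    step_x = (x_max - x_min) / (n - 1)
    step_y = (y_max - y_min) / (n - 1)
    return [(x_min + i * step_x, y_min + j * step_y)
            for j in range(n) for i in range(n)]


def apply_homography(H, point):
    """Map a 2D point with a 3x3 homography given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)  # divide by the homogeneous coordinate


# Hypothetical homography (a pure translation), not one from the thesis.
H_example = [[1.0, 0.0, 10.0],
             [0.0, 1.0, -5.0],
             [0.0, 0.0, 1.0]]

grid = make_grid(0.0, 0.0, 90.0, 90.0, 10)                 # left image grid
seeds = [apply_homography(H_example, p) for p in grid]     # right image seeds
```

Each entry of `seeds` is the initial guess in the right image for the grid point with the same index in the left image.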
4.2.2 Adaptive Template Size (ATS)
A small template size of seven pixels is used initially, and the size grows in steps of four up to a
maximum of 31 pixels if no point is found using smaller templates. For the least squares template
matching to converge to the right point, the point must be inside the area covered by the
template. If adaptive template size is used, a check for error codes is made when step 5 in the
algorithm is finished for each point. If an error code is found, the step is re-executed with a larger
template size until an error-free run is done or the maximum template size is reached.
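The retry logic described above can be sketched as follows. This is only an outline under stated assumptions: `match_point` is a hypothetical callback standing in for step 5 of the algorithm, returning an error code where 0 means accepted, and code 3 is used for "maximum template size reached" as in the error code list of section 4.2.7.

```python
def template_sizes(start=7, stop=31, step=4):
    """Template sizes tried by ATS: 7, 11, ..., 31 pixels."""
    return list(range(start, stop + 1, step))


def match_with_ats(match_point, sizes=None):
    """Re-run the matching with growing templates until it is accepted.

    match_point(size) is a hypothetical callback that runs the matching
    for one grid point and returns an error code (0 = accepted).
    Returns (template size used, final error code).
    """
    max_size_code = 3  # maximum template size reached without acceptance
    sizes = sizes or template_sizes()
    for size in sizes:
        if match_point(size) == 0:
            return size, 0
    return sizes[-1], max_size_code
```

For example, a point that is only matched once the template reaches 15 pixels would be reported as `(15, 0)`, while a point that never passes ends with `(31, 3)`.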
4.2.3 Wallis ﬁltering
Wallis filtering is a method for contrast enhancement. The filter function is represented by
\[ J_{i,j} = \alpha\mu + (1 - \alpha)\,\frac{I_{i,j} - \mu}{\sigma} \]
where J_{i,j} is the new value of the processed pixel in the image, I_{i,j} is the old value, α is a blending
parameter, µ is the mean intensity of the pixels in the filter window and σ is the standard deviation.
If α is close to one, the Wallis filter is an averaging filter, and if α is close to zero it normalizes the
intensity of the pixels.
This process is time consuming, so it is wise to filter only the interesting parts of an image.
Figures 4.4 and 4.5 show a grass area in gray scale and its Wallis filtered version.
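A minimal sketch of such a filter is given below. It assumes that the deviation from the local mean is divided by the local standard deviation σ, which is what the description of α close to zero (normalization) implies. The function name `wallis_filter`, the window parameter `half_window` and the small constant `eps` guarding against division by zero are assumptions for illustration, not part of the thesis code.

```python
import math


def wallis_filter(image, half_window=1, alpha=0.5, eps=1e-6):
    """Apply J = alpha*mu + (1 - alpha)*(I - mu)/sigma in a sliding window.

    image is a list of rows of gray values. mu and sigma are the mean and
    standard deviation of the window around each pixel (clipped at the
    image border). eps is an assumed guard for flat windows.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            window = [image[r][c]
                      for r in range(max(0, i - half_window),
                                     min(rows, i + half_window + 1))
                      for c in range(max(0, j - half_window),
                                     min(cols, j + half_window + 1))]
            mu = sum(window) / len(window)
            var = sum((v - mu) ** 2 for v in window) / len(window)
            sigma = math.sqrt(var)
            out[i][j] = alpha * mu + (1 - alpha) * (image[i][j] - mu) / (sigma + eps)
    return out
```

With `alpha=1.0` the output is the local mean (an averaging filter), and with `alpha=0.0` every pixel is expressed in local standard deviations from the local mean.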
4.2.4 Transformed Normalized Cross Correlation (TNCC)
The normalized cross correlation is used to improve the initial seed point created by transforming
the grid point with the homography. Compared to classical Normalized Cross Correlation, described
by Gonzalez and Woods, this process pretransforms the image with the homography.
The procedure is as follows:
–Take a template from the left image centered in the grid point.
–Find the corners of an area three times as large as the template centered in the same point in
the left image.
–Transform the corner points using the homography to the right image.
–Pick the rectangular area including the corner points and transform the sub image with imtransform
to reshape it as the left image.
–Find the best starting point with normxcorr2 for the template in the transformed sub image.
–Transform the coordinates for the best point to absolute coordinates in the right image. This
is the new seed point.
–If the maximum cross correlation value is too low, try with a larger template size (and larger
Figure 4.6 shows the normalized cross correlation. This improvement is expensive in terms of
execution time, especially if the template sizes are large, and is not recommended for use in combination
with Adaptive Template Size.
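The correlation step itself can be sketched in plain Python. The sketch covers only the search for the best position of a template inside an already transformed search patch. It is a simplified stand-in for normxcorr2 (it evaluates only positions where the template fits completely inside the patch), and the names `ncc`, `best_match` and `NCC_THRESHOLD` are assumptions. The threshold value 0.8 is taken from error code 1 in section 4.2.7.

```python
import math

NCC_THRESHOLD = 0.8  # below this value error code 1 is set (section 4.2.7)


def ncc(template, window):
    """Normalized cross correlation of two equally sized gray value patches."""
    t = [v for row in template for v in row]
    p = [v for row in window for v in row]
    mean_t, mean_p = sum(t) / len(t), sum(p) / len(p)
    num = sum((a - mean_t) * (b - mean_p) for a, b in zip(t, p))
    den = math.sqrt(sum((a - mean_t) ** 2 for a in t) *
                    sum((b - mean_p) ** 2 for b in p))
    return num / den if den > 0 else 0.0  # flat patches carry no information


def best_match(template, search):
    """Return (score, row, col) of the best template position in search."""
    th, tw = len(template), len(template[0])
    best = (-1.0, 0, 0)
    for r in range(len(search) - th + 1):
        for c in range(len(search[0]) - tw + 1):
            window = [row[c:c + tw] for row in search[r:r + th]]
            score = ncc(template, window)
            if score > best[0]:
                best = (score, r, c)
    return best
```

The returned row and column refer to the upper left corner of the template in the search patch. In the procedure above they would still have to be transformed back to absolute coordinates in the right image.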
4.2.5 Multiple Seed Points (MSP)
Eight extra seed points are generated around the seed point improved by normalized cross
correlation. Least squares template matching is applied to all seed points. If at least three of them
converge to within two pixels, this point is accepted as the result. Figure 4.7 shows a template, the
multiple seed points and the different points of convergence.
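The agreement test can be sketched as below. The two pixel tolerance and the minimum of three agreeing points follow the text above. How the agreeing points are combined is not stated, so averaging them is an assumption, as are the function name `consensus_point` and the use of `None` for seed points that did not converge.

```python
import math


def consensus_point(points, tolerance=2.0, min_agree=3):
    """Return the point most seed points agree on, or None if too few agree.

    points holds one (x, y) result per seed point, or None if the
    optimization did not converge for that seed point.
    """
    valid = [p for p in points if p is not None]
    best_cluster = []
    for center in valid:
        # All results lying within the tolerance of this candidate.
        cluster = [p for p in valid if math.dist(center, p) <= tolerance]
        if len(cluster) > len(best_cluster):
            best_cluster = cluster
    if len(best_cluster) < min_agree:
        return None
    n = len(best_cluster)
    # Assumption: report the mean of the agreeing results.
    return (sum(p[0] for p in best_cluster) / n,
            sum(p[1] for p in best_cluster) / n)
```

A stricter acceptance, such as the four agreeing points named in the error code list, would only require a different `min_agree`.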
4.2.6 Acceptance criteria
The question of which points should be accepted has many different answers. Stricter acceptance
requirements give less dense point clouds but higher precision in the matched points. The chosen
criteria are the correlation between optimization parameters, the scale adjustments sx and sy in the
x- and y-directions, and the result from the optimization. In this implementation, a correlation value
larger than 0.80 implies rejection, as does a quotient of the x- and y-scale factors larger than two.
4.2.7 Error codes
A set of error codes is implemented to prepare for evaluation of why points are rejected. The codes
1. The max value from Normalized cross correlation was lower than 0.8.
2. Less than 4 of multiple seed points converged to the same result.
3. Maximum adaptive template size reached without any accepted point found.
4. The optimization did not converge.
5. The scale adjustment quotient sx/sy is too high (over 2) or too low (less than 0.5).
6. Too high correlation between position and any other optimization parameter (over 0.80).
7. Too high standard deviation σ0 from the optimization.
An accepted point has the code 0. Codes 1-3 appear only when their respective method is applied.
Code 3 for maximum adaptive template size overrides the other codes.
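The codes that are set from the optimization result (4 to 7) could be assigned as in the following sketch. The thresholds 2, 0.5 and 0.80 are taken from the list above. The limit for σ0 is not stated there, so `sigma0_limit` is a hypothetical parameter, and the order in which the checks are made (first failing check wins) is an assumption as well.

```python
def classify_match(converged, scale_x, scale_y,
                   max_position_correlation, sigma0, sigma0_limit):
    """Return an error code for one matched point (0 = accepted).

    max_position_correlation is the largest absolute correlation between
    the position and any other optimization parameter. sigma0_limit is a
    hypothetical threshold; the text does not give its value.
    """
    if not converged:
        return 4                      # optimization did not converge
    ratio = scale_x / scale_y
    if ratio > 2.0 or ratio < 0.5:
        return 5                      # scale quotient out of range
    if max_position_correlation > 0.80:
        return 6                      # position too correlated with other parameters
    if sigma0 > sigma0_limit:
        return 7                      # too high standard deviation
    return 0
```

Counting the returned codes over all grid points gives the kind of rejection statistics the error codes are intended for.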
Figure 4.1: A 10 ×10 grid of points.
Figure 4.2: Seed points generated by transforming a 10 ×10 grid of points with the homography H.
Figure 4.3: Left image is an example of a template 15 ×15 pixels, right image is the corresponding
Figure 4.4: Grass without Wallis ﬁltering
Figure 4.5: Grass with Wallis ﬁltering for contrast enhancement
Figure 4.6: Normalized cross correlation of a template and the corresponding search patch in area
A.5. The left image is the template of 21×21 pixels, the middle image is the search patch, and the
right image is the resulting cross correlation where light pixel implies high correlation.
Figure 4.7: Example of a search patch for multiple seed points in area A.5. Green circles are seed
points, red stars are the found points.
4.3 Choice of template size
To estimate appropriate template size, calculations are done to determine the need for precision in
point coordinates and required template size.
4.3.1 Calculation of z-coordinate from perturbed input data
The depth of the signs on the wall in area 2 in figure 5.3 on page 30 is estimated to 0.2 meters.
Taking a point in the 3D cloud of this image set as an example,
is projected in camera 1 (left image) on the point
using the camera equation (eq. 2.1). A shift of the z-coordinate of 0.2 meters gives the 2D point
The difference in projection on the image plane is
This tells us that a 0.2 meter detail is projected less than two pixels from its corresponding point
in the plane. Least Squares Template Matching requires that the true point is within half a template
size from the seed point, which is fulfilled by a template size of seven pixels.
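The reasoning can be illustrated with a simplified pinhole projection. All numbers in this sketch are hypothetical (focal length in pixels, point position and depth) and are not the camera data or the point used in the thesis; the function names are also assumptions. The sketch only shows how the pixel shift caused by a 0.2 meter depth change is compared to half the template size.

```python
def project_u(f_pixels, x, z, cx=0.0):
    """Horizontal image coordinate of a point (x, z) for an ideal pinhole camera."""
    return f_pixels * x / z + cx


def depth_shift_in_pixels(f_pixels, x, z, dz):
    """Pixel displacement caused by moving the point dz along the z-axis."""
    return abs(project_u(f_pixels, x, z + dz) - project_u(f_pixels, x, z))


def seed_within_template(shift, template_size):
    """LSTM needs the true point within half a template size of the seed point."""
    return shift <= template_size / 2


# Hypothetical values: 2000 pixel focal length, point 0.5 m off the optical
# axis at 20 m distance, moved 0.2 m in depth.
shift = depth_shift_in_pixels(2000.0, 0.5, 20.0, 0.2)
ok_for_smallest_template = seed_within_template(shift, 7)
```

With these assumed values the shift is about half a pixel, well inside half of the smallest template size of seven pixels.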
A number of experiments were designed to investigate the properties of LSTM and the extensions
described in chapter 4. In this chapter the examples are described, followed by their results in chapter
The tests were run under MATLAB 7.9.0 (R2009b) on Intel Core2 Quad CPU Q9300 2.500GHz
with 4096 MB memory and Intel Core2 Quad Q9400 2.66GHz with 4096 MB memory. The second
type has about 5% higher performance.
5.1 Image sets
Three image sets were used: image set A, the loading dock of the MIT building (figures 5.1 and
5.2); image set B, the building “Sliperiet” at Strömpilen (figures 5.3 and 5.4); and image set C, the
building “Elgiganten” at Strömpilen (figures 5.5 and 5.6). The different areas have been selected
for evaluation based on their different properties and structures.
The experiments are set up for the purpose of evaluating how the different extensions applied to
LSTM work on image parts with different properties. For every area, a 100 × 100 grid of seed
points was distributed to create a dense set of possible point correspondences.
The results are presented in tables with comments and figures in the following results chapter,
where the number of accepted point matches and the execution times are presented for all
experiments, and other properties where they are of interest.
5.1.1 Image pair A, the loading dock
This set of images contains planes in different viewing angles and different types of texture.
The used areas are the following (numbered as in ﬁgure 5.1 on page 29 and ﬁgure 5.2 on page
A.1 A brick wall at an oblique angle in shadow.
A.2 A brick wall from almost orthogonal angle in direct sunlight.
A.3 A painted door with a window and some signs, very dark area of the image.
A.4 Brick wall from almost orthogonal angle in shadow.
A.5 Asphalt in skew angle with a lot of random structure.
28 Chapter 5. Experiments
5.1.2 Image pair B, “Sliperiet”
This set of images contains three planes at different distances, with signs and windows protruding
from the main wall. There is also an asphalt plane at a skew angle.
The used areas are the following (numbered as in images 5.3 on page 30 and 5.4 on page 30):
B.1 Plastered wall with doors, windows, including cracks in the plaster (not used in ﬁnal version).
B.2 Plastered wall with windows, signs and obstacles at the bottom.
B.3 Plastered wall with some ﬁne cracks in the plaster and windows (not used in ﬁnal version).
B.4 Asphalt in skew angle with a lot of random structure.
5.1.3 Image pair C, “Elgiganten”
This set of images contains a grass area, and two wall areas of different structure.
The used areas are the following (numbered as in images 5.5 and 5.6):
C.1 Grass in skew angle with a lot of random structure.
C.2 Wall of corrugated plate with mostly horizontal gradients.
C.3 Sign with a lot of strong gradients combined with gradient free areas.
Figure 5.1: Left image of image pair A, the loading dock.
Figure 5.2: Right image of image pair A, the loading dock.
Figure 5.3: Left image of image pair B, “Sliperiet”.
Figure 5.4: Right image of image pair B, “Sliperiet”.
Figure 5.5: Left image of image pair C, “Elgiganten”.
Figure 5.6: Right image of image pair C, “Elgiganten”.
5.2.1 Experiment 1, Asphalt
Two asphalt areas were chosen to study the effects of different lighting in the same kind of area,
the effect of a gradient from a shadow, and how multiple seed points work in an area with a lot
of small, irregular structure. The chosen areas are area A.5 and area B.4. For both areas, TNCC
and Wallis filtering are used, and in a second run they are also combined with MSP. For reference,
a run without extensions is done.
5.2.2 Experiment 2, Brick walls
Three brick wall areas are studied to evaluate effects of skew in similar structure and different light
settings. Two of them were facing the camera almost directly, one in direct sunlight and the other
one in shadow. The third one was orthogonal to the other two and was in shadow. The areas used are
in image A, area A.1, area A.2 and area A.4. The extension MSP was compared to baseline LSTM.
5.2.3 Experiment 3, Door
The door in image area A.3 is very dark and contains little information detectable to the human eye.
This area is studied to see if LSTM is capable of matching under these circumstances. In this case,
the effects of adaptive template size are analyzed and the template sizes used are presented. The area
used is area A.3. The first run, for reference, uses TNCC and Wallis filtering; to the second run ATS is
5.2.4 Experiment 4, Lawn
The grass area, C.1, is chosen to study the effects of Wallis filtering on an image with irregular
gradients and little contrast. Two runs are performed, one baseline LSTM, and one with Wallis
5.2.5 Experiment 5, Corrugated plate
Area C.2 consists of a facade of corrugated plate with horizontal gradients but very few vertical
gradients. This area is studied to determine if it is possible to detect points in this kind of area with
the help of Wallis filtering and/or ATS. For comparison, area C.3 is used, which contains signs with
irregular, man-made gradients. The methods tested are baseline LSTM, Wallis only, and Wallis
combined with ATS.
In this section the results from the experiments are presented in detail.
The asphalt area contains lots of small irregular gradients, as seen in figure 6.2. In the Wallis filtered
image the gradient from the shadow is distinct, as are parts of the others, see figure 6.3, but
some smaller gradients are suppressed. In this experiment, an outlier is defined as a point whose 3D
coordinate is more than 5 cm away from a manually determined plane in the point cloud. As seen in
figure 6.1, the combination of TNCC and Wallis only detects points along the gradient caused by
the shadow. The effect of Wallis filtering of asphalt is shown in figure 4.6.
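The outlier definition used in this experiment can be written out as a short sketch. The plane is assumed to be given as a normal vector and an offset, which is one possible representation of a manually determined plane. The function names and the example values are illustrative only.

```python
import math


def plane_distance(point, normal, offset):
    """Distance from a 3D point to the plane normal . x + offset = 0."""
    nx, ny, nz = normal
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return abs(nx * point[0] + ny * point[1] + nz * point[2] + offset) / length


def count_outliers(points, normal, offset, limit=0.05):
    """Count points further than limit (in meters) from the plane."""
    return sum(1 for p in points if plane_distance(p, normal, offset) > limit)
```

For the plane z = 0, for example, `count_outliers([(0, 0, 0.01), (1, 2, 0.06)], (0, 0, 1), 0.0)` counts only the second point as an outlier.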
Table 6.1: Example 1, area A.5 and area B.4. No improvements (0) vs. TNCC + Wallis (cw) vs. TNCC +
Wallis + MSP (cwm) vs. Wallis (w) vs. Wallis + MSP (wm).
Area + method              A.5 + 0   A.5 + cw   A.5 + cwm   A.5 + w   A.5 + wm
Number of accepted points  5601      79         79          6363      6423
Runtime (s)                184.3     7646       9412        335.7     474.4
Median residuals           0.01      0.009      0.008       0.006     0.006
Std. deviation             0.015     0.01       0.012       0.012     0.012
No. outliers               30        0          0           0         0

Area + method              B.4 + 0   B.4 + cw   B.4 + cwm   B.4 + w   B.4 + wm
Number of accepted points  5481      79         79          4922      4948
Runtime (s)                166.78    7850       9623        484.2     566.3
Residuals                  0.15      0.21       0.20        0.12      0.15
Std. deviation             0.20      0.24       0.20        0.19      0.19
No. outliers               4690      6          15          4183      4205
34 Chapter 6. Results
Figure 6.1: Red points are points detected in area A.5 with TNCC and Wallis ﬁltering, green points
are seed points.
Figure 6.2: A patch of asphalt with a strong gradient from a shadow.
6.1. Experiments 35
Figure 6.3: Wallis ﬁltered image of asphalt.
6.1.2 Brick walls
In the brighter area A.2, fewer points are matched than in the shadow areas. A possible cause for this
is the less prominent gradients in the first case. The number of accepted points using LSTM
extended with MSP is close to the number matched by standard LSTM.
Figure 6.4: 3d point cloud of matched points using LSTM with MSP in areas A.1, A.2 and A.4.
Table 6.2: Example 2, areas A.1, A.2 and A.4. No improvements (0) versus MSP (m).
Area + method              A.1 + 0   A.1 + m   A.2 + 0   A.2 + m   A.4 + 0   A.4 + m
Number of accepted points  3674      3683      1517      1484      2926      2902
Runtime (s)                105.2     845.4     138.9     1327      175.3     1428
Figure 6.5: Result of MSP in area A.1 in left image, template in right image. Blue circles are the
multiple seed points, red dots are found points.
On the dark door, Wallis filtering alone and Wallis filtering with ATS each give over 100
times as many accepted points as Wallis filtering with TNCC, see table 6.3 and figures 6.6, 6.7
and 6.8 for a comparison. As seen in the histogram in figure 6.9, larger template sizes only give a
minor increase in the number of accepted points.
Table 6.3: Example 3, area A.3. TNCC + Wallis (cw) vs. Wallis + ATS (wa) vs. Wallis (w).
Area + method              A.3 + cw   A.3 + wa   A.3 + w
Number of accepted points  41         9545       6182
Runtime (s)                7375       374.68     117.4
Figure 6.6: Seed points giving accepted results in part of Area A.3 using Wallis and TNCC.
Figure 6.7: Seed points giving accepted results in part of Area A.3 using Wallis.
Figure 6.8: Seed points giving accepted results in part of Area A.3 using Wallis and ATS.
Figure 6.9: Histogram over used template sizes in Experiment 3, A.3, with Wallis ﬁltering and ATS.
On the irregular gradients of a lawn, the extension of Wallis filtering found about 68% more points
than standard LSTM (3844 versus 2283), fulfilling the purpose of densification, as seen in table 6.4.
Comparing figure 6.11 to figure 6.10 shows that the Wallis filtering helped in accepting points in
areas where the baseline
Figure 6.10: Seed points giving accepted results in part of area C.1 using baseline LSTM.
Table 6.4: Results from example 4, area C.1. Baseline LSTM (0) vs. Wallis filtering (w).
Area + method              C.1 + 0   C.1 + w
Number of accepted points  2283      3844
Runtime (s)                389       459
Figure 6.11: Seed points giving accepted results in part of area C.1 with Wallis ﬁltering.
6.1.5 Corrugated plate
The combination of ATS and Wallis filtering gives more than six times as many accepted points as
plain LSTM, see table 6.5. Wallis filtering alone gives fewer points this time, and ATS accounts for
the significant improvement. As seen in figure 6.14, the number of accepted points with template size 7
is the same as the number of accepted points with the Wallis extension alone. More points are accepted
when larger templates are used. In area C.3, with the signs containing gradients in different directions,
many more points are accepted in all cases.
Figure 6.12: Seed points giving accepted results in part of area C.2 with baseline LSTM.
Table 6.5: Results from example 5, areas C.2 and C.3. No improvements (0) vs. Wallis (w) vs. Wallis and ATS (wa).
Area + method              C.2 + 0   C.2 + w   C.2 + wa   C.3 + 0   C.3 + w   C.3 + wa
Number of accepted points  172       148       1116       1139      2771      6037
Runtime (s)                278.5     150.6     1392       408.9     129.9     642.4
Figure 6.13: Seed points giving accepted results in part of area C.2 with Wallis and ATS.
Figure 6.14: Histogram over required template sizes for acceptance of points in area C.2.
7.1 Evaluation of aims
In chapter 4, a set of properties for evaluating the methods was stated. Not all of them fit this
implementation, but they are all commented on here.
–Point cloud quality:
•Completeness - The densities of the point clouds vary; some extensions give denser
point clouds, some give sparser but more exact clouds.
•Robustness - Some of the extensions reject many points due to possible large errors,
and hence their robustness is considered good.
•Precision - Is not generally computed due to no knowledge of ground truth.
–Method sensitivity to:
•Object geometry - Calculation gives that points with 0.2 meter deviation from the plane
are easy to match.
•Camera network geometry - Since only images obtained by a stereo rig of two cameras
with ﬁxed base line are used, the evaluation of method sensitivity with respect to camera
network geometry is not applicable.
48 Chapter 7. Discussion
7.2 Additional analysis
7.2.1 Point cloud density
As seen in the tables and figures of the results chapter, the density of the point clouds varies with
the choice of extensions to LSTM and the type of area it is used on.
The main tendencies are:
–TNCC reduces the density of the point cloud, sometimes by more than a factor of ten.
–Wallis filtering usually improves the density of the point cloud significantly; however, there
are some exceptions, areas C.2, B.2 and B.4, where the density decreased somewhat.
–ATS increases the density; how much depends on the structure of the surface.
–MSP neither increases nor decreases the density by more than a few percent.
The runtime varies a lot between the different methods, and also depends on the type of area.
Some things to notice:
–The time for Wallis filtering depends directly on the size of the filtered area. When matching
large numbers of points, the overhead for Wallis filtering is not a problem.
–TNCC works fast on small areas, but in combination with ATS it may generate too long
runtimes. A couple of runs in this configuration are not presented because the runs exceeded
a week and were therefore aborted. With a smaller maximum template size, for example 21 × 21
pixels, this would not have been a problem.
–MSP gives the predicted nine times longer runtimes than baseline LSTM.
–ATS runtime varies depending on how often large template sizes are used. It has acceptable
speed if many points are accepted on small templates, but if the templates grow it slows down
7.2.3 Error codes
The error codes are analyzed for all cases except ATS, because the ATS error code, 3, overrules
every other error code.
The error code 0 is set for the accepted points. This code applies to 30 % of the points total
As displayed in figure 7.1, code 5, the code for too high correlation between position and any
other optimization parameter, is the most common one. This one implies that the optimization may
have gone wrong, and the returned point is not reliable. This code is possible to get with all the
The second most common error code is 4, indicating that the optimization did not converge,
followed by error code 1 from the TNCC. As noted in the results chapter, TNCC gave far fewer
points than any of the other methods, which is reflected by this code being common although it can
only be set in the TNCC cases. The points accepted with TNCC are intended to be more exact than the others.
Error code 7 is also set from the optimization, indicating too high standard deviation of the optimization.
Figure 7.1: Histogram over error codes
The MSP rejects very few points by its error code 2. It may be possible to set a higher threshold
on the number of seed points that must converge to the same point here.
The error code 6 can be overwritten by 7 in the error code logic, and may in reality fit on more
ATS is required to set the error code 3, which then overwrites all other codes, so that one is not used in
The quality of the homography is very relevant for the result, since a badly fitting homography gives
bad seed points for the optimization. The code implemented for assisted creation of homographies
gives a warning if it is plausible that the quality is low, but it is recommended to always generate a
couple of homography matrices and compare them as well as test them on the image.
It is possible to densify a point cloud using extensions to LSTM. Aspects of point quality and time
consumption need to be weighed against the need for densification of the point cloud.
The task of densification of point clouds suffers from the same difficulty as most image analysis tasks:
no generally working methods exist, and different kinds of objects take advantage of different
52 Chapter 8. Conclusions
During the work on this thesis, many questions that could be evaluated have arisen. A selection of them
– Template sizes. It would be possible to write an algorithm for finding the optimal patch size
in different parts of an image. This would probably include frequency analysis of the image
using Fourier transform or wavelets, or using the radius information of the SIFT detector.
– TNCC. The threshold for accepting a seed point from TNCC needs a deeper evaluation.
– Precision of point clouds. The point clouds generated from LSTM with the different
extensions could be compared to ground truth of the object, determining the precision of the
– Benchmark. Constructing a set of images including camera calibration data, ground truth,
homographies and sets of points to match and detect would give the opportunity to compare
different methods regarding precision, robustness and time consumed in a re-usable way.
– MSP. A deeper study of the distribution of the multiple seed points, the threshold for the definition
of the same convergence point, and the number of agreeing points can be interesting.
54 Chapter 9. Future work
I want to thank my supervisor, Niclas Börlin, for taking the time to have me as a student. Your talent
for inspiring and making the subject fun to work with has been of great value to me in accomplishing
this work. I also want to thank David Grundberg and Håkan Fors Nilsson, for helping me out a couple
56 Chapter 10. Acknowledgements
Steven D. Blostein and Thomas S. Huang. Error analysis in stereo determination of 3-d point posi-
tions. IEEE T Pattern Anal, 9(6):752–765, November 1987. doi: 10.1109/TPAMI.1987.4767982.
Niclas Börlin and Christina Igasto. 3d measurements of buildings and environment for harbor
simulators. Technical Report UMINF 09.19, Department of Computing Science, Umeå University,
SE-901 87 Umeå, Sweden, October 2009.
Nicolas D’Apuzzo. Surface measurement and tracking of human body parts from multi station
video sequences. PhD thesis, Institute of Geodesy and Photogrammetry, ETH Zürich,
Switzerland, October 2003.
S.F. El-Hakim, J.-A. Beraldin, M. Picard, and G. Godin. Detailed 3d reconstruction of large-scale
heritage sites with integrated techniques. IEEE Comput Graphics Appl, 24(3):21–29, May 2004.
ISSN 0272-1716. doi: 10.1109/MCG.2004.1318815.
Håkan Fors Nilsson and David Grundberg. Plane-based close range photogrammetric reconstruction
of buildings. Master’s thesis, Department of Computing Science, Umeå University, Technical
report UMNAD 784/09, UMINF 09.18, 2009.
Wolfgang Förstner and Bernhard Wrobel. Mathematical Concepts in Photogrammetry, chapter 2,
pages 15–180. IAPRS, 5 edition, 2004.
Jan-Michael Frahm, Marc Pollefeys, Brian Clipp, David Gallup, Rahul Raguram, ChangChang Wu,
and Christopher Zach. 3d reconstruction of architectural scenes from uncalibrated video se-
quences. International Archives of Photogrammetry, Remote Sensing, and Spatial Information
Sciences, XXXVIII(5/W1):7 pp, October 2009.
D. Gallup, J.-M. Frahm, P. Mordohai, Q. Yang, and M. Pollefeys. Real-time plane-sweeping stereo
with multiple sweeping directions. In Proc. CVPR, pages 1–8, Minneapolis, Minnesota, USA,
June 2007. IEEE. doi: 10.1109/CVPR.2007.383245.
Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing. Addison-Wesley, 3rd edition,
A. Gruen. Least squares matching: a fundamental measurement algorithm. In K. B. Atkinson,
editor, Close Range Photogrammetry and Machine Vision, chapter 8, pages 217–255. Whittles,
Caithness, Scotland, 1996.
A. W. Gruen. Adaptive least squares correlation: A powerful image matching technique. S Afr J of
Photogrammetry, 14(3):175–187, 1985.
Armin Grün, Fabio Remondino, and Li Zhang. Photogrammetric reconstruction of the Great Buddha
of Bamiyan, Afghanistan. Photogramm Rec, 19(107):177–199, 2004.
R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University
Press, ISBN: 0521623049, 2000.
R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University
Press, ISBN: 0521540518, 2nd edition, 2003.
David G. Lowe. Object recognition from local scale-invariant features. In Proc Intl Conf on Com-
puter Vision, pages 1150–1157, Corfu, Greece, September 1999.
J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide baseline stereo from maximally stable
extremal regions. In A. David Marshall and Paul L. Rosin, editors, Proc British Machine Vision
Conference, pages 384–393, Cardiff, UK, September 2002. British Machine Vision Association.
Chris McGlone, Edward Mikhail, and Jim Bethel, editors. Manual of Photogrammetry. ASPRS, 5th
edition, July 2004. ISBN 1-57083-071-1.
Douglas C. Montgomery, George C. Runger, and Norma Faris Hubele. In Engineering Statistics,
2004. ISBN 0-471-45240-8.
Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer-Verlag, 1999. ISBN
F. Remondino, S. El-Hakim, S. Girardi, A. Rizzi, S. Benedetti, and L. Gonzo. 3d virtual recon-
struction and visualization of complex architectures - the 3d-arch project. International Archives
of Photogrammetry, Remote Sensing, and Spatial Information Sciences, XXXVIII(5/W1):9 pp,
Fabio Remondino. Image-based modeling for object and human reconstruction. PhD thesis, Institute
of Geodesy and Photogrammetry, ETH Zürich, ETH Hoenggerberg, Zürich, Switzerland, 2006.
Fabio Remondino, Sabry F. El-Hakim, Armin Gruen, and Li Zhang. Development and performance
analysis of image matching for detailed surface reconstruction of heritage objects. IEEE Signal
Proc Mag, 25(4):55–64, July 2008.
Steven M. Seitz, Brian Curless, James Diebel, Daniel Scharstein, and Richard Szeliski. A compari-
son and evaluation of multi-view stereo reconstruction algorithms. In CVPR’06, volume 1, pages
519–528, June 2006. doi: 10.1109/CVPR.2006.19.
Gilbert Strang. Introduction to Linear Algebra. Wellesley-Cambridge, 3rd edition, 2003.
Axel Wendt. A concept for feature based data registration by simultaneous consideration of laser
scanner data and photogrammetric images. ISPRS J Photogramm, 62(2):122–134, 2007. ISSN
0924-2716. doi: 10.1016/j.isprsjprs.2006.12.001.
Kuk-Jin Yoon and In So Kweon. Distinctive similarity measure for stereo matching under point
ambiguity. Comp Vis Imag Under, 112(2):173 – 183, 2008. ISSN 1077-3142. doi: DOI:
Here are the homography matrices used for the different areas.
A.1 Loading dock
0.9591 −0.1217 −68.80
0.0052 0.7070 158.1
0.912 −0.055 12.70
−0.0158 0.9234 75.42
1.024 0.021 −95.45
0.008 1.083 −31.09
0 0 1
0.8794 −0.251 78.94
0.0011 0.772 96.15
0 0 1
60 Chapter A. Homographies
A.2 Building Sliperiet
0.9430 0.0028 58.96
−0.0175 0.971 71.08
0 0 1
0.9617 0.0133 12.37
−0.0131 0.989 57.58
0 0 1
0.9075 −0.0182 120.61
−0.0154 0.941 69.30
0 0 1
0 0 1
A.3 Building Elgiganten
1.077 0.602 −872.95
0.052 0.935 1.80
0 0 1
1.021 −0.003 −46.81
0.012 1.013 −49.32
0 0 1
1.046 −0.011 −67.00
0.014 1.025 −56.98
0 0 1
LSTM Least Squares Template Matching
ALSM Adaptive Least Squares Matching
GPU Graphics Processing Unit
VIP Viewpoint Invariant Patch
SVD Singular Value Decomposition
MSP Multiple Seed Points
TNCC Transformed Normalized Cross Correlation
ATS Adaptive Template Size
SIFT Scale Invariant Feature Transform