A Direct Least-Squares Solution to Multi-View Absolute and Relative Pose from
2D-3D Perspective Line Pairs
Hichem Abdellali¹, Robert Frohlich¹, Zoltan Kato¹,²
¹University of Szeged, Szeged, Hungary
²J. Selye University, Komarno, Slovakia
Abstract: We propose a new algorithm for estimating the absolute and relative pose of a multi-view camera system. We derive a direct least-squares solver using a Gröbner basis which works both for the minimal case (a set of 3 line pairs for each camera) and the general case using all inlier 2D-3D line pairs of a multi-view camera system. The algorithm has been validated on a large synthetic dataset as well as real data. Experimental results confirm stable and real-time performance under realistic outlier ratios and noise on the line parameters. Comparative tests show that our method compares favorably to the latest state-of-the-art algorithms.
1. Introduction
Absolute pose estimation consists in determining the position and orientation of a camera with respect to a 3D world coordinate frame, while relative pose estimation aims to compute the same parameters with respect to a reference camera. Relative pose estimation is needed when we have a system of two or more cameras. Absolute and relative poses are fundamental in various computer vision applications, such as visual odometry, simultaneous localization and mapping (SLAM), image-based localization and navigation, and augmented reality. The problem has been extensively studied, yielding various formulations and solutions.
Most of the approaches focus on a single camera pose esti-
This work was partially supported by the NKFI-6 fund through project
K120366; ”Integrated program for training new generation of scientists in
the fields of computer science”, EFOP-3.6.3-VEKOP-16-2017-0002; the
Research & Development Operational Programme for the project ”Mod-
ernization and Improvement of Technical Infrastructure for Research and
Development of J. Selye University in the Fields of Nanotechnology
and Intelligent Space”, ITMS 26210120042, co-funded by the European
Regional Development Fund; Research & Innovation Operational Pro-
gramme for the Project: ”Support of research and development activities
of J. Selye University in the field of Digital Slovakia and creative indus-
try”, ITMS code: NFP313010T504, co-funded by the European Regional
Development Fund.
mation using point correspondences. However, modern ap-
plications, especially in vision-based localization and nav-
igation for robotics and autonomous vehicles, it is often
desirable to use multi-camera systems which covers large
field of views and provides direct 3D measurements. The
absolute pose estimation of a perspective camera from n
2D–3D point correspondences is known in the literature as the Perspective-n-Point (PnP) problem, which has been widely studied in the last few decades [7,19,20,12,10]. Various solutions have been developed both for large n as well as for the n = 3 minimal case (see [12] for a recent overview). The use of line correspondences, known as the Perspective-n-Line (PnL) problem, has also been investigated in the last two decades, yielding robust and efficient solutions (see [36] for a detailed overview). The minimal case of n = 3 line correspondences is particularly important, as its solution is the basis for dealing with the general PnL problem. It has been shown in [5] that P3L leads to an 8th-order polynomial, which is higher than the 4th-order polynomial of a P3P problem. While the use of point and line correspondences is widespread, there are pose estimation methods relying on other types of correspondences, e.g. sets of regions [28,27] or silhouettes. However, such approaches are typically computationally more expensive, hence they cannot be used as real-time solvers.
Recently, due to the increasing popularity of multi-camera systems in e.g. autonomous driving [17] and UAVs, the problem of multi-view pose estimation has been addressed. Solutions to the PnP or PnL problem cover only single-view perspective cameras, hence new methods are needed to efficiently deal with the generalized PnP (gPnP) or non-perspective PnP (NPnP) [4,12,17,18]. As for relative pose, lines in two images do not provide any constraints on the camera pose [8]. However, if information is available about some special 3D geometric configuration of the lines, then the relative pose can also be estimated [6]. In [33], relative pose estimation is extended with a 3D line direction estimation step applied to monocular visual odometry, while [21] used a line-based space resection approach for UAV navigation. More recently, in [38], lines of building structures were used for trajectory estimation in a SLAM approach, while in [26], a hybrid point- and line-based SLAM technique was proposed.
Several point-based [24,10,12] as well as mixed point- and line-based methods [4,22] exist, but little work has been done on using line correspondences only. For the absolute pose of a single camera, Mirzaei and Roumeliotis proposed the first globally optimal non-iterative solution (AlgLS) [23], which formulates the problem as a multivariate polynomial system with rotation parametrized by the Cayley-Gibbs-Rodriguez (CGR) representation. Zhang et al. proposed RPnL [37], which was further modified into the Accurate Subset-based PnL (ASPnL) method [36], one of the most accurate non-iterative methods. Other recent work by Wang et al. deals with the P3L [34] as well as with the PnL [35] problem. In [35], a fast and robust solution is proposed (called SRPnL) and its superior performance is confirmed by a comprehensive experimental comparison with many state-of-the-art PnL solvers, like AlgLS [23] and ASPnL [36]. Therefore, in this paper we validate our method through various comparative experiments with AlgLS and SRPnL, as they performed the best in [35]. For multi-view camera systems, one notable work is the minimal NP3L solver of Lee [16], which deals with the estimation of the 6 pose parameters for a fully calibrated multi-view camera system. In [9], the same problem is addressed with a known vertical direction, which leads to two fast and robust solvers.
While robust minimal solutions based on line correspondences for absolute pose [23,37,36,16,9,35] or absolute and relative pose with known vertical direction [1] exist, none of these methods estimate full absolute and relative poses simultaneously in a multi-view system. In this paper, we propose a direct least-squares solution to this problem. First, a direct solver using Gröbner bases is proposed which works for any number n ≥ 3 of lines, suitable for hypothesis testing as in RANSAC [7]. The poses can be further refined through a few iterations of an iterative least-squares solver, which runs efficiently due to the optimal initialization provided by our direct solver. The performance and robustness of the proposed method have been evaluated for n ≥ 3 lines as well as for M ≥ 1 camera systems on large synthetic datasets as well as on real data.
2. Perspective Lines and Camera Pose
Given a calibrated perspective camera, its camera matrix P = K[R|t] consists of the internal calibration matrix K and the camera pose [R|t] w.r.t. the world coordinate frame W. A homogeneous 3D point X is mapped by P into a homogeneous 2D image point x as [8]

x ≅ PX = K[R|t]X,   (1)

where '≅' denotes the equivalence of homogeneous coordinates, i.e. equality up to a non-zero scale factor. Since we assume a calibrated camera, we can multiply both sides of (1) by K⁻¹ and work with the equivalent normalized image point

x̃ ≅ K⁻¹PX = [R|t]X.   (2)

Figure 1. Projection of a 3D line in a 3-camera system.
The above equation is the starting point of perspective pose estimation [12,20,19,14]: given a set of 3D–2D point correspondences (xi ↔ Xi), one can recover the 3D rigid body transformation (R, t): W → C acting between the world coordinate frame W and the camera coordinate frame C.
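The projection model in (1)–(2) is easy to verify numerically. Below is a minimal sketch (all values are illustrative, not from the paper) that projects a homogeneous 3D point with P = K[R|t] and checks that multiplying by K⁻¹ recovers the normalized image point of (2):

```python
import numpy as np

# Illustrative intrinsics and pose (not from the paper).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                      # identity rotation for simplicity
t = np.array([0.1, -0.2, 2.0])

X = np.array([0.5, 0.3, 4.0, 1.0])           # homogeneous 3D point
P = K @ np.hstack([R, t[:, None]])           # camera matrix P = K[R|t]

x = P @ X                                    # homogeneous image point, eq. (1)
x_img = x / x[2]                             # pixel coordinates

x_norm = np.linalg.inv(K) @ x                # normalized image point, eq. (2)
# x_norm equals [R|t]X up to floating-point error:
assert np.allclose(x_norm, np.hstack([R, t[:, None]]) @ X)
```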
3D lines may have various representations in projective space [25,3]. In this paper, 3D lines will be represented as L = (V, X), where V is the unit direction vector of the line and X is a point on the line [29,9,1].
The projection of L in a perspective camera is a 2D line l, which can also be represented as l = (v, x). Note that the point x is not necessarily the image of the 3D point X! Both the 3D line L and its perspective image l lie on the projection plane π passing through the camera projection center C (see Fig. 1). The unit normal to the plane π in the camera coordinate system C is denoted by n, which can be computed from the image line l as

n = (x̂ × v̂) / ‖x̂ × v̂‖,   (3)

with x̂ = (x⊤, 1)⊤ and v̂ = (v⊤, 0)⊤. Since L also lies on π, its direction vector V is perpendicular to n. Hence we get the following equation, which involves only the absolute pose (R, t) [9,1]:

n⊤RV = n⊤V^C = 0,   (4)

where R is the rotation matrix from the world frame W to the camera frame C, and V^C denotes the unit direction vector of L in the camera coordinate frame. Furthermore, the vector from the camera center C to the point X on line L also lies on π, thus it is also perpendicular to n:

n⊤(RX + t) = n⊤X^C = 0,   (5)

where t is the translation from the world frame W to the reference camera frame C, and X^C denotes the point X on L in the camera coordinate frame.
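The two constraints (4) and (5) can be checked on synthetic data. In the sketch below (illustrative values; the cross-product construction of the plane normal from the image line is our assumption, consistent with the text), both expressions vanish for a consistent pose:

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# Illustrative absolute pose (R, t), not from the paper.
R = rot_x(0.2) @ rot_z(0.4)
t = np.array([0.4, -0.1, 2.5])

# A 3D line L = (V, X): unit direction V and a point X on the line.
V = np.array([1.0, 2.0, -0.5]); V /= np.linalg.norm(V)
X = np.array([0.3, -0.2, 3.0])

# Project two line points into the normalized image to get l = (v, x).
P1, P2 = R @ X + t, R @ (X + V) + t
x1, x2 = P1[:2] / P1[2], P2[:2] / P2[2]
v = np.append(x2 - x1, 0.0); v /= np.linalg.norm(v)   # 2D line direction (z = 0)
xh = np.append(x1, 1.0)                               # 2D line point, homogeneous

# Normal of the projection plane pi, computed from the image line.
n = np.cross(xh, v); n /= np.linalg.norm(n)

# Both constraints vanish for a consistent pose:
assert abs(n @ (R @ V)) < 1e-9          # (4): n' R V = 0
assert abs(n @ (R @ X + t)) < 1e-9      # (5): n' (R X + t) = 0
```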
The absolute pose (R, t) has 6 degrees of freedom: 3 rotation angles (α, β, γ) and 3 translation components (t_x, t_y, t_z) along the 3 coordinate axes. Thus, to solve for the pose using (4) and (5), we need a minimum of 3 line correspondences {(L_i, l_i)}, i = 1, 2, 3, which is called the P3L problem. The solution is obtained in two steps: first the rotation R is solved using (4), which in general involves solving a system of 8th-order polynomials. Then the translation is obtained from (5) by back-substituting R, which yields a linear system of equations in terms of t [36,16,37,34]. Clearly, the main challenge is the solution for R due to the nonlinearity of the equations as well as the additional constraints on R to be a valid rotation (i.e. orthonormal) matrix. Although for special line configurations (e.g. orthogonal, parallel or intersecting lines) [36], or with additional knowledge of e.g. the vertical direction [9,1], a lower-order polynomial may be achieved, usually the P3L polynomial will not be of order lower than 8 for general line configurations [5].
Let us have a closer look at the parametrization of rotations, as the final form of (4) depends on it. It is well known that the rotation group SO(3) has 3 degrees of freedom. The most popular parametrization is Euler angles, which represents R via 3 rotation angles around the coordinate axes. Unfortunately, this representation involves trigonometric functions, which would yield trigonometric equations in (4). One approach is to let the trigonometric functions of one angle α be two separate unknowns [20,36,37,9], which, together with the trigonometric constraints, leads to polynomial equations. Alternatively, one can solve directly for the 9 elements of R in (4) as a linear system and then enforce orthonormality on the solution, yielding again (different) polynomial equations [35,36]. Herein, in order to eliminate the trigonometric functions, let us substitute s = tan(α/2) [14,2,9,1], for which cos(α) = (1 − s²)/(1 + s²) and sin(α) = 2s/(1 + s²), yielding a second-order polynomial in s for one rotation around a coordinate axis.
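The half-angle substitution can be sanity-checked numerically; the following sketch (with an arbitrary illustrative angle) verifies the rational identities and that the resulting matrix is still a valid rotation:

```python
import numpy as np

# Tangent half-angle substitution s = tan(alpha/2):
# cos(alpha) = (1 - s^2)/(1 + s^2), sin(alpha) = 2s/(1 + s^2),
# turning a rotation about one axis into a rational (polynomial) expression.
alpha = 0.7                      # arbitrary test angle (radians)
s = np.tan(alpha / 2)

cos_a = (1 - s**2) / (1 + s**2)
sin_a = 2 * s / (1 + s**2)
assert np.isclose(cos_a, np.cos(alpha))
assert np.isclose(sin_a, np.sin(alpha))

# Rotation about the X axis; (1 + s^2) * Rx has polynomial entries in s.
Rx = np.array([[1.0, 0.0, 0.0],
               [0.0, cos_a, -sin_a],
               [0.0, sin_a, cos_a]])
assert np.allclose(Rx @ Rx.T, np.eye(3))   # still orthonormal
```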
3. Direct Least Squares Solver
Algorithm 1 The proposed MRPnL algorithm.
Input: M, the number of cameras (≥ 1), with reference camera C0; for each camera Ci, N_{Ci} 2D-3D line pairs (n^{Ci}_j, Xj, Vj).
Output: The absolute pose (R, t): W → C0 and the relative poses (Ri, ti): C0 → Ci.
1: Normalize the 3D line points Xj by N.
2: Rotation: determine the intermediate rotation R_M and apply it to the input line pairs. Then R^M_x is calculated, and the remaining rotation R^M is obtained by solving the polynomial system of equations (13); together these provide R.
3: Translation: by back-substituting the rotation, solve the linear system of equations (5) for C0, or (17) for the other cameras, via SVD.
4: Optional refinement of the absolute and relative poses for all cameras and lines simultaneously by solving the system (19) in the least-squares sense (see Section 3.3).
5: Denormalize to get the final pose estimates.

In order to reduce the number of unknowns in (4) to 2, we eliminate one rotation in R by defining an intermediate coordinate system M [37,36,35], in which the rotation angle around the X axis can be easily obtained. First, let us select the line pair (L0, l0) with the longest projection length, as longer edges are less affected by noise on their endpoints [37,36,35]. The origin of M is located at the origin of W, and its axes (X_M, Y_M, Z_M) are chosen such that the Y axis of M aligns with n^C_0. The rotation R_M = [X_M, Y_M, Z_M] rotates the normals and direction vectors into the intermediate frame M. The rotation R^M_x around the X axis within M is then easily calculated, because it is the angle between the Z axis and V^M_0; hence the rotation matrix acting within the intermediate coordinate frame M is composed of the rotations around the remaining two axes as
(1 + s²)(1 + r²) R^M = R^M_y(s) R^M_z(r) =
  [ (1 − s²)(1 − r²)    −2r(1 − s²)        2s(r² + 1)       ]
  [ 2r(s² + 1)          (s² + 1)(1 − r²)   0                ]   (9)
  [ −2s(1 − r²)         4sr                (1 − s²)(r² + 1) ]

and we obtain the new form of (4) as

n^{M⊤} R^M_x R^M V^M = ñ^{M⊤} R^M V^M = 0,   (10)

with ñ^M = R^{M⊤}_x n^M. Expanding the above equation gives a 4th-order polynomial in (s, r) with nine coefficients a_1, . . . , a_9, one for each monomial s^i r^j (0 ≤ i, j ≤ 2); each a_k is a bilinear expression of V^M and n^M (containing terms such as 2V_1n_2 + 2V_2n_1 and 2V_3n_1 + 2V_1n_3):

a_1 s²r² + a_2 s²r + · · · + a_9 = 0.   (11)
Each line pair generates one such equation, yielding a system of N equations, which is solved in the least-squares sense. For this purpose, let us take the sum of squares of the above system,

e(s, r) = Σ_{i=1}^{N} (a^{(i)}_1 s²r² + a^{(i)}_2 s²r + · · · + a^{(i)}_9)²,   (12)

and then find arg min_{(s,r)} e(s, r). The first-order optimality condition for (12) is

∇e(s, r) = [ ∂e(s, r)/∂s ; ∂e(s, r)/∂r ] = [ Σ_{i=1}^{N} b^s_i ; Σ_{i=1}^{N} b^r_i ] = 0,   (13)

where, for each line pair, b^s and b^r are 7th-order polynomials in (s, r) whose coefficients are quadratic expressions of a_1, . . . , a_9; for b^r these coefficients include

2a_1a_7 + 2a_3a_4 + 2a_5a_9,   4a_1a_2 + 2a_4²,   2a_1a_9 + 2a_4a_5,
6a_2a_7 + 6a_4a_6 + 6a_8a_9,   4a_1a_6 + 4a_2a_3 + 4a_4a_7 + 4a_5a_8 + 2a_9²,
6a_2a_9 + 6a_4a_8,   2a_3a_9 + 2a_5a_7,   4a_1a_8 + 4a_2a_5 + 4a_4a_9,
8a_2a_6 + 4a_8²,   6a_6a_9 + 6a_7a_8,   4a_3a_6 + 2a_7²,   4a_3a_8 + 4a_5a_6 + 4a_7a_9,
and similarly b^s can also be expressed in terms of the coefficients a of each line pair. Thus the solution of the system of 2 polynomial equations (each of 7th order) in (13) provides the rotation parameters (s, r). Herein, we use the automatic generator of Kukelova et al. [13] to generate a solver using a Gröbner basis [13,15] for the system in (13). Once the solution(s) are obtained, the complete R, acting between the world frame W and the camera frame C, is obtained as R = R_M⊤ R^M_x R^M R_M.
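As a cross-check of the least-squares formulation (not the Gröbner-basis solver itself, which is generated with [13]), the minimization of (12) can also be carried out numerically. The sketch below assumes R^M(s, r) = R^M_y(s) R^M_z(r) with the half-angle parametrization and recovers known ground-truth parameters from synthetic, noise-free line pairs:

```python
import numpy as np
from scipy.optimize import least_squares

def rot_y(s):
    c, w = (1 - s**2) / (1 + s**2), 2 * s / (1 + s**2)
    return np.array([[c, 0, w], [0, 1, 0], [-w, 0, c]])

def rot_z(r):
    c, w = (1 - r**2) / (1 + r**2), 2 * r / (1 + r**2)
    return np.array([[c, -w, 0], [w, c, 0], [0, 0, 1]])

rng = np.random.default_rng(1)
s_true, r_true = 0.3, -0.2
R_true = rot_y(s_true) @ rot_z(r_true)

# Synthetic line pairs: directions V_i and normals n_i with n_i' R V_i = 0.
V = rng.normal(size=(10, 3)); V /= np.linalg.norm(V, axis=1, keepdims=True)
N = np.empty_like(V)
for i, v in enumerate(V):
    n = np.cross(R_true @ v, rng.normal(size=3))   # perpendicular to R V_i
    N[i] = n / np.linalg.norm(n)

# Residuals of (4) for all line pairs; their sum of squares is e(s, r) in (12).
def residuals(p):
    R = rot_y(p[0]) @ rot_z(p[1])
    return np.einsum('ij,ij->i', N, (R @ V.T).T)

sol = least_squares(residuals, x0=[0.2, -0.1])
assert sol.cost < 1e-12
assert np.allclose(sol.x, [s_true, r_true], atol=1e-6)
```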
The translation t is then obtained by back-substituting R into (5), yielding a system of linear equations, which can be solved via SVD. While in (4) all quantities are already normalized (n and V are of unit length), (5) contains the 3D point X given in an arbitrary world coordinate system W, which needs to be normalized for numerical stability [8]. Therefore, we transform our 3D line segments into a unit cube around the origin of W by a normalizing transformation N composed of a translation by [u, v, z]⊤, with [u, v, z]⊤ being the centroid of the 3D scene points, followed by a uniform scaling s = 1/max(|h|, |w|, |d|), where h is the height, w is the width, and d is the depth of the 3D data. The solution is then obtained in this normalized space, hence the result (R̃, t̃) needs to be denormalized. Since the equations used to solve for the rotation are unaffected by this normalization (thanks to the uniform scaling!), R̃ is the final rotation, while the translation t̃ needs to be corrected by applying the inverse of N.
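The normalization step can be sketched as follows; the exact convention of N (centroid subtraction, then uniform scaling) and the resulting denormalization formula t = t̃/s_N − R c are our derivation from the description above, not a formula quoted from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

# Illustrative 3D scene points and ground-truth pose.
X = rng.uniform(-3, 3, size=(20, 3))
R = rot_x(0.3)
t = np.array([0.5, -1.0, 4.0])

# Normalizing transform N: translate by the centroid c, then scale uniformly
# into a unit cube (assumed convention for this sketch).
c = X.mean(axis=0)
ext = (X - c).max(axis=0) - (X - c).min(axis=0)    # height, width, depth
sN = 1.0 / ext.max()
Xn = sN * (X - c)

# A pose (R, t_n) estimated on normalized data relates to the final pose by
# t = t_n / sN - R c (the rotation is unaffected by the uniform scaling).
t_n = sN * (t + R @ c)        # what a solver on normalized data would return
t_back = t_n / sN - R @ c
assert np.allclose(t_back, t)
```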
Although (13) might have several solutions, the solver will only return the real ones, and then one has to select the geometrically valid (R, t) based on the visibility of the lines and the backprojection error (18) [36,16,35,9,1].
3.1. Multi-view Case and Relative Pose
When the 3D lines are viewed by a system of M calibrated perspective cameras, each 3D line L has up to M images l_i, i = 1 . . . M. One of the cameras is the reference camera C0, therefore the absolute pose of the camera system (R, t) is defined as the rigid transformation acting between W and C0, while the individual camera frames Ci are related to the reference camera frame via the relative poses (Ri, ti): C0 → Ci, i = 1, . . . , M − 1. Of course, the equations (4), (16) and (5) remain valid for C0, while for the other cameras Ci, the projection of L yields similar equations, but the unknown relative pose (Ri, ti) will also be involved. Hence (4) becomes

n_i⊤ Ri R V = 0,   (15)

from which we get

n^{Mi⊤} R^{Mi}_x R^{Mi} V^{Mi} = ñ^{Mi⊤} R^{Mi} V^{Mi} = 0,   (16)

which, after a similar derivation as in the single-camera case, also yields a system of polynomial equations of the same form as in (13); hence the same solver can be used to solve for each camera Ci, i = 1, . . . , M − 1. Once the solutions are obtained, each Ri is back-substituted into the corresponding linear system, similar to (5):

n_i⊤(Ri X^{C0} + ti) = n_i⊤(Ri(RX + t) + ti) = 0,   (17)

which is solved for ti by SVD.
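Once R and Ri are known, (17) is linear in ti: each line pair contributes one row n⊤ ti = −n⊤ Ri(R X + t). A minimal sketch (illustrative data; numpy's SVD-based `lstsq` stands in for the SVD solve):

```python
import numpy as np

rng = np.random.default_rng(3)

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# Known absolute pose (R, t) and ground-truth relative pose (Ri, ti).
R, t = rot_z(0.2), np.array([0.1, 0.2, 3.0])
Ri, ti = rot_z(-0.4), np.array([0.5, -0.3, 0.2])

# Points X_j on the 3D lines and projection-plane normals n_j in camera Ci,
# built to be consistent with (17) for this sketch.
Xs = rng.uniform(-2, 2, size=(6, 3))
A, b = [], []
for Xj in Xs:
    Pc = Ri @ (R @ Xj + t) + ti            # point in camera Ci's frame
    n = np.cross(Pc, rng.normal(size=3))   # any normal perpendicular to Pc
    n /= np.linalg.norm(n)
    A.append(n)                            # row: n' ti = -n' Ri (R Xj + t)
    b.append(-n @ (Ri @ (R @ Xj + t)))
A, b = np.array(A), np.array(b)

ti_est, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(ti_est, ti, atol=1e-8)
```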
3.2. Robust Outlier Filtering
In real applications, putative line correspondences are extracted that are corrupted by outliers as well as noise. While noise is handled well by our least-squares formulation of the equations, outliers must be removed via RANSAC [7] or the M-estimator sample consensus (MSAC) algorithm [30], which relies on a minimal solver and a backprojection error. The minimal set consists of 3 line pairs, providing 2 equations for the rotation in (4) or (16), and 3 equations for the translation in (5) or (17). The equations are then solved as outlined before, and outliers are filtered using the backprojection error of the 3D lines. Herein we used the error measure proposed in [16], which practically calculates the mean of the shortest distances d_{xs} and d_{xe} from the 2D line segment endpoints xs and xe to the corresponding infinite line determined by the backprojected 3D line on the normalized plane:

E = (d_{xs} + d_{xe}) / (2 ‖xs − xe‖).   (18)

Note that the error is normalized with the length of the 2D line segment, hence making the measure independent of the length of the detected 2D line segment.
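A sketch of this error measure, with the formula reconstructed from the description above (mean endpoint-to-line distance divided by the segment length); the function signature and names are illustrative:

```python
import numpy as np

def backprojection_error(xs, xe, p, d):
    """Mean distance of the 2D segment endpoints (xs, xe) to the infinite 2D
    line through point p with unit direction d (the backprojected 3D line),
    normalized by the segment length, as described for (18)."""
    def point_line_dist(x):
        r = x - p
        return np.abs(r[0] * d[1] - r[1] * d[0])   # |r x d| for unit d
    return 0.5 * (point_line_dist(xs) + point_line_dist(xe)) / np.linalg.norm(xe - xs)

# Endpoints lying exactly on the line give zero error.
p, d = np.array([0.0, 0.0]), np.array([1.0, 0.0])
assert backprojection_error(np.array([1.0, 0.0]), np.array([3.0, 0.0]), p, d) == 0.0
# Both endpoints offset by 0.1 from a segment of length 2 -> error 0.05.
assert np.isclose(
    backprojection_error(np.array([1.0, 0.1]), np.array([3.0, 0.1]), p, d), 0.05)
```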
3.3. Pose Refinement
We will now formulate a least-squares refinement for the multi-view case, based on the equations (4) and (15), by simply stacking, for each line pair in C0, (4) and (5), and for each camera i = 1, . . . , M − 1 and each line pair in Ci, (15) and (17), containing the absolute pose (R, t) and the relative poses (Ri, ti):

j = 1, . . . , N_{C0}:
  n_j⊤ R V_j = 0
  n_j⊤ (R X_j + t) = 0
i = 1, . . . , M − 1; j = 1, . . . , N_{Ci}:
  n^{Ci}_j⊤ Ri R V_j = 0
  n^{Ci}_j⊤ (Ri (R X_j + t) + ti) = 0   (19)

A least-squares solution is then obtained by minimizing the squared error of the system, which can be solved via standard algorithms like Levenberg-Marquardt with the initialization obtained from the direct solver. Note that this step is optional, and only executed for the overdetermined n > 3 case if the line parameters are noisy. Experiments show that the initialization given by the direct solver is sufficiently stable, hence only a few iterations are needed to reach the optimum. The proposed algorithm, which we call Multi-view RPnL (MRPnL), is summarized in Algorithm 1.

Figure 2. Illustration of 10% 2D noise on 10 random lines placed on 1 plane. Red is the original, blue the noisy version.
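The stacked system (19) and its Levenberg-Marquardt refinement can be sketched as follows on synthetic, noise-free data (an axis-angle parametrization and all numeric values are our illustrative choices, not the paper's):

```python
import numpy as np
from scipy.optimize import least_squares

def rodrigues(w):
    """Rotation matrix from an axis-angle vector (minimal helper)."""
    th = np.linalg.norm(w)
    if th < 1e-12:
        return np.eye(3)
    k = w / th
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(th) * K + (1 - np.cos(th)) * (K @ K)

def poses(p):
    """Unpack (R, t) and one relative pose (R1, t1) from a 12-vector."""
    return rodrigues(p[0:3]), p[3:6], rodrigues(p[6:9]), p[9:12]

rng = np.random.default_rng(4)
p_gt = np.array([0.1, -0.2, 0.3, 0.4, 0.1, 2.0,     # ground-truth (R, t)
                 -0.1, 0.2, 0.1, 0.3, -0.2, 0.1])   # ground-truth (R1, t1)
R, t, R1, t1 = poses(p_gt)

# Synthetic 3D lines and projection-plane normals consistent with (19).
V = rng.normal(size=(8, 3)); V /= np.linalg.norm(V, axis=1, keepdims=True)
X = rng.uniform(-2, 2, size=(8, 3))
def plane_normal(d, p):
    n = np.cross(d, p)                    # perpendicular to direction and ray
    return n / np.linalg.norm(n)
N0 = np.array([plane_normal(R @ v, R @ x + t) for v, x in zip(V, X)])
N1 = np.array([plane_normal(R1 @ R @ v, R1 @ (R @ x + t) + t1) for v, x in zip(V, X)])

def residuals(p):                         # the stacked system (19)
    R, t, R1, t1 = poses(p)
    r = []
    for v, x, n0, n1 in zip(V, X, N0, N1):
        r += [n0 @ (R @ v), n0 @ (R @ x + t),
              n1 @ (R1 @ R @ v), n1 @ (R1 @ (R @ x + t) + t1)]
    return np.array(r)

# Levenberg-Marquardt from a perturbed initialization, as in Section 3.3.
sol = least_squares(residuals, p_gt + 0.01 * rng.normal(size=12), method='lm')
assert np.max(np.abs(residuals(sol.x))) < 1e-10
```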
4. Experimental Results
For the quantitative evaluation of the proposed method, synthetic datasets were generated using the calibration parameters of a real perspective camera with available physical parameters (sensor size), enabling us to represent our 3D scene in an equivalent metric space. Multiple sets of 1000 samples were generated containing 3D-2D line pairs. The 3D scene was created with a typical urban or indoor environment in mind, where usually only a few planar surfaces are visible in a camera at once; thus we created 3 planes, placing them randomly (with a rotation of ±30° around all 3 axes, ±[1, 2] m horizontal and vertical translation, and ±[0.5, 1.5] m in depth) in the 3D space, each containing 20 random line segments with a minimum length of 0.5 m. The 2D data was then generated by capturing images of the scene, practically projecting the 3D lines with a virtual camera of 2378×1580 pixel resolution. In each scene we placed 5 cameras with a random rotation of ±50° around all 3 axes and a random translation of ±1 m in the horizontal and vertical direction, while in the optical axis' direction at [4, 6] m from the scene.
The estimated pose was evaluated in terms of the angular distance ε, which represents the overall rotation error, and also in terms of the translation error, computed as the norm of the difference between the ground truth and the estimated translation. To evaluate the robustness of the methods against noisy line detections, random noise was added to the line parameters. Practically, we simulated noise by corrupting one endpoint of the line (similarly in 2D and 3D), adding a random number to each coordinate of the point up to the specified percentage of the actual coordinate value. The unit direction vector was also modified in the same manner. We show results for 3%, 10% and 15% 2D and 3D noise levels, which translate to an average shift on the image from 22 px with 3% noise up to 110 px with 15% noise (an example of 10% noise can be seen in Fig. 2).
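The endpoint-corruption scheme described above can be sketched as follows (function name and exact sampling convention are our illustrative assumptions):

```python
import numpy as np

def corrupt_endpoint(point, percent, rng):
    """Shift each coordinate by a random amount of up to `percent`% of its
    own value, mirroring the noise model described above (assumed convention)."""
    shift = rng.uniform(-1, 1, size=point.shape) * (percent / 100.0) * np.abs(point)
    return point + shift

rng = np.random.default_rng(5)
p = np.array([1200.0, 800.0])          # one 2D line endpoint (pixels)
noisy = corrupt_endpoint(p, 10, rng)   # 10% noise level
assert np.all(np.abs(noisy - p) <= 0.10 * np.abs(p) + 1e-12)
```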
Figure 3. Rotation and translation error, and CPU time for MRPnL with and without refinement, SRPnL, and the globally optimal AlgLS with 3% noise and n = 3 lines, except for AlgLS with n = 4. First column with 2D noise, second column with 3D noise. Median values (2D / 3D noise): rotation angular distance (deg): MRPnL 2.11 / 1.79, MRPnL LM 2.18 / 1.8, AlgLS 1.52 / 1.19, SRPnL 1.93 / 1.62; translation error (m): MRPnL 0.311 / 0.165, MRPnL LM 0.316 / 0.176, AlgLS 0.24 / 0.14, SRPnL 0.318 / 0.148; time (s): MRPnL 0.00227 / 0.00245, MRPnL LM 0.00833 / 0.0074, AlgLS 0.0536 / 0.0534, SRPnL 0.00356 / 0.00274.
4.1. Comparison with State-of-the-Art
First, the performance of the proposed MRPnL direct solver and of MRPnL LM, which contains the refinement step of Section 3.3, is evaluated for the single-camera case. Wang et al. published a recent comparison [35] of 6 state-of-the-art methods with their SRPnL method, which proved to be dominating both in terms of CPU time as well as efficiency and robustness. Therefore, herein we only focus on comparing the proposed MRPnL algorithm with the most competitive methods from [35]: SRPnL and AlgLS [23].
The Matlab implementations of the competing methods are available from [35]. For a fair comparison, we used the automatic generator of Kukelova et al. [13], which provides a Matlab-based solver, but we remark that we also successfully used Kneip's generator [11], which produces a much faster solver in C++. All experiments were done on an i7 computer with 16 GB of RAM.
Comparisons were performed in two different setups: first using the minimum number of line matches that each algorithm requires, then using all 60 line pairs of the scene, with only a single camera, since the formulations of SRPnL and AlgLS do not cover multi-view setups.
For the first setup, using n = 4 lines for AlgLS and n = 3 for the other methods, the obtained results are shown in Fig. 3. Results on the plots are always sorted by error, from best to worst. In this setup all the methods perform very similarly in terms of the median errors of the pose parameters; only AlgLS produces lower errors, due to the higher number of line pairs it is using (n = 4). The methods are robust for up to 3% 2D noise, where the median ε already reaches above 2°, the only exception being AlgLS with 1.52°. In terms of runtime, MRPnL is the fastest with 2.4 ms, followed by SRPnL with 3.5 ms as the highest median runtime among multiple data sets. MRPnL LM takes more time (8.3 ms), but is still much faster than AlgLS (53 ms).
Using n = 60 line matches, AlgLS and MRPnL LM have the best results with the lowest median rotation and translation errors. With 15% noise, MRPnL LM handles the 3D noise better, while AlgLS favors noise in the 2D domain, both of them having a median ε below 1.5° with 2D noise and 1.05° with 3D noise (see Fig. 4 for results). The only other method that handles 15% noise well both in the 2D and 3D domain is MRPnL (median ε of 2.5° and 2.17° respectively, and translation errors of 0.21 m and 0.1 m), while SRPnL can only obtain similar errors with up to 5% noise. Since MRPnL robustly provides a good initialization, the LM refinement usually performs only 5 iterations, keeping the runtime comparable with SRPnL and MRPnL.
Based on the data presented in [35], we can confirm that the CPU time of the methods does not change significantly for n < 200 lines. Since in a realistic dataset of an urban environment we should not expect that many inlier pairs (e.g. the large-scale dataset presented in [22] also uses an average of 130 lines per image), we did not find an evaluation with hundreds of lines relevant.
4.2. Multi-view Case
The multi-view configuration presented in Section 3.1, with LM refinement (Section 3.3), was tested on data with 10% 2D noise using 5 cameras. Results are shown in Fig. 5, where, as a baseline, we also show the results achieved with a single camera. Clearly, the accuracy of the pose estimates is consistent over all cameras. As for the CPU time, one can see that it scales with the number of cameras but still remains under 56 ms for 5 cameras, which is slightly faster than AlgLS for a single camera. It is thus clear that the proposed MRPnL LM algorithm performs well in a multi-view setup: median rotation errors remain below 1° and the translation errors below 22 cm.
4.3. Robustness to Outliers
Since in a multi-view system filtering the outliers has to be performed for each view independently from the others, we evaluate the robustness to outliers on a single camera only. The proposed MRPnL direct solver proved to be the fastest and more robust to noise than SRPnL, thus it is well suited for outlier detection in a RANSAC algorithm (see Section 3.2). In our experiments, we used the built-in M-estimator sample consensus (MSAC) function of Matlab [30] together with the backprojection error presented in (18).

Figure 4. Rotation and translation error, and CPU time for MRPnL with and without refinement, SRPnL and the globally optimal AlgLS with 15% noise and all 60 lines used; first row with 2D noise, second row with 3D noise. Median values (2D / 3D noise): rotation angular distance (deg): MRPnL 2.51 / 2.17, MRPnL LM 1.49 / 0.952, AlgLS 1.28 / 1.04, SRPnL 10.1 / 2.57; translation error (m): MRPnL 0.212 / 0.102, MRPnL LM 0.43 / 0.0667, AlgLS 0.176 / 0.118, SRPnL 0.68 / 0.0965; time (s): MRPnL 0.00255 / 0.00295, MRPnL LM 0.00901 / 0.00917, AlgLS 0.0597 / 0.0596, SRPnL 0.00303 / 0.00322.

Figure 5. Overall rotation and translation error, and CPU time for MRPnL LM in a multi-view setup with 5 cameras, using 10% 2D noise. Median rotation angular distance (deg): Camera 1: 0.931, Camera 2: 0.916, Camera 3: 0.883, Camera 4: 0.942, Camera 5: 0.886, single camera: 0.931. Median translation error (m): 0.212, 0.192, 0.201, 0.207, 0.193, single camera: 0.212. Median time (s): 0.0558 for 5 cameras vs. 0.00786 for a single camera.
The synthetic dataset previously defined was extended by adding a specific number of outlier 2D-3D line pairs with randomly generated coordinates, to obtain outlier ratios of 30% and 60% (26 and 90 outliers, respectively). The threshold for RANSAC was experimentally determined as the average between the maximum of the inliers' and the minimum of the outliers' backprojection error, calculated with the reference pose. In our tests, RANSAC with MRPnL was able to robustly filter out all outliers in most test cases, since there was a clear separation between the inliers and outliers; however, we found that only a smaller inlier set can be obtained if the outlier lines are taken randomly from the same planes as the inliers, as they are then not different enough from the correct lines. Pose estimation errors of MRPnL on the obtained inliers with 10% noise and 60% outlier ratio are very similar to the baseline results obtained using only the inliers; we can observe an increase only in the median translation errors, as shown in Fig. 6. The cost of such filtering is visible in the runtime plot of Fig. 6, where 30% outliers can be filtered relatively fast, but 60% outliers take almost two seconds.
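The hypothesize-and-verify scheme described above can be sketched generically. Below, a minimal RANSAC/MSAC loop (our own skeleton, not the Matlab function used in the experiments) is exercised on a toy 1-D problem where the "pose" is a scalar and the "minimal solver" averages a 3-element sample:

```python
import numpy as np

def msac_score(errors, thr):
    """MSAC score: inliers contribute their squared error, outliers the
    constant penalty thr^2 (M-estimator sample consensus, cf. [30])."""
    return np.minimum(errors**2, thr**2).sum()

def ransac(pairs, minimal_solver, error_fn, thr, iters, rng):
    """Generic hypothesize-and-verify loop around a minimal 3-sample solver."""
    best, best_score = None, np.inf
    for _ in range(iters):
        sample = rng.choice(len(pairs), size=3, replace=False)
        for pose in minimal_solver([pairs[i] for i in sample]):
            errors = np.array([error_fn(pose, p) for p in pairs])
            score = msac_score(errors, thr)
            if score < best_score:
                best, best_score = pose, score
    inliers = [p for p in pairs if error_fn(best, p) < thr]
    return best, inliers

# Toy instantiation: the "pose" is a scalar offset, the minimal solver
# averages the sample, and the error is the absolute residual.
rng = np.random.default_rng(6)
data = np.concatenate([0.5 + rng.normal(0, 0.01, size=40),   # inliers
                       rng.uniform(5, 10, size=20)])         # gross outliers
pose, inl = ransac(list(data), lambda s: [np.mean(s)],
                   lambda q, x: abs(x - q), thr=0.1, iters=50, rng=rng)
assert abs(pose - 0.5) < 0.05 and len(inl) == 40
```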
4.4. Real Data
To evaluate the proposed algorithm on real data, we used a 2D-3D dataset captured in an outdoor urban environment, containing a dense 3D point cloud of the scene, captured with a Riegl VZ400 Lidar (with an attached Nikon DSLR providing reference RGB images), and 4K-resolution 2D video sequences captured by a flying UAV along different trajectories. We extracted intermittent frames from the video sequence (16 frames from a sequence of 1800) to better evaluate the robustness to changing scene and lighting conditions. The ground truth pose of each camera was estimated with UPnP [12] using reflective markers placed in the scene, automatically detected by the scanner in 3D,
and manually selected and matched in 2D. For many multi-view localization and visual odometry applications, the most important criterion of a good result is the correct projection between the 2D and 3D domains. This can be evaluated easily by the forward projection error measured in the marker points (for the ground truth poses the maximum error was 12 cm, and the median 4 cm).

Figure 6. MRPnL LM pose estimation results on the inlier line pairs obtained by RANSAC with 10% noise and r = 30% and r = 60% outlier ratio, compared to the baseline results without RANSAC on the inlier set (r = 0%). Medians for r = 0% / 30% / 60%: rotation angular distance (deg): 1.72 / 1.77 / 1.82; translation error (m): 0.131 / 0.155 / 0.173; time (s): 0.00206 / 0.164 / 1.94.

Figure 7. MRPnL LM trajectory estimation results on 16 frames of a longer drone sequence. Ground truth camera poses and the trajectory are shown in green, the estimated ones in red, while the used 3D lines (81 in total) are also visible (better viewed in color).

2D lines on the frames were extracted using the OpenCV LSD detector [32]. In order to have a known
inlier set, 2D-3D matching is done by manually matching
2D lines to lines detected on the LIDAR reference RGB
images, that directly provides the corresponding 3D lines.
Other methods (e.g. [22]) rely on VisualSFM, and use the
estimated camera poses to project 2D lines into 3D, while
with our approach, learnable line segment descriptors [31]
could also be used for automatic 2D-2D matching.
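The forward projection error used above can be sketched as the 3D distance between a marker point and the viewing ray back-projected through its 2D detection under the estimated pose. This is one plausible formulation consistent with errors reported in centimeters; the text does not spell out the exact formula, so the function below is an assumption.

```python
import math

def forward_projection_error(R, t, f, uv, X):
    """Distance (in world units) between a 3D marker X and the viewing ray
    back-projected through its 2D detection uv, under pose (R, t) with
    focal length f and the principal point at the image origin.
    A hedged sketch, not the paper's verbatim metric."""
    # Ray direction through uv in the camera frame, rotated to world: d = R^T d_c
    dc = (uv[0] / f, uv[1] / f, 1.0)
    d = [sum(R[k][i] * dc[k] for k in range(3)) for i in range(3)]
    norm = math.sqrt(sum(c * c for c in d))
    d = [c / norm for c in d]
    # Camera centre in world coordinates: C = -R^T t
    C = [-sum(R[k][i] * t[k] for k in range(3)) for i in range(3)]
    # Point-to-ray distance: component of (X - C) orthogonal to d
    v = [X[i] - C[i] for i in range(3)]
    s = sum(v[i] * d[i] for i in range(3))
    r = [v[i] - s * d[i] for i in range(3)]
    return math.sqrt(sum(c * c for c in r))
```

A marker lying exactly on the back-projected ray yields zero error, so the reported 4 cm median directly measures how well the estimated pose aligns 2D detections with their 3D markers.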
Despite the fact that only a relatively small number of lines were used (an average of 13 and a maximum of 21 lines per image, compared to e.g. [22], where 130 lines and 50 points per image were used), the proposed MRPnL solver can estimate the absolute and relative poses quite robustly: 15 out of the 16 frames have a maximum forward projection error below 30 cm, with a median forward projection error of 8 cm. Applying the LM refinement to the whole system increased the algorithm runtime from 80 ms to 480 ms, but all error measures improved; the median forward projection error is reduced to 7.4 cm. Results of the camera path estimation can be seen in Fig. 7, where green marks the ground truth trajectory and camera positions and red the estimated ones. All 81 used 3D lines are also shown in random colors.
Up to 30% outlier ratio is well tolerated in the real case too, showing results similar to those above, even though the outliers closely resemble the inliers, since they are randomly selected from the same visible scene planes. Results show performance similar to the synthetic experiments; errors increase only when the number of inliers gets drastically reduced by the outlier filtering due to the lack of a clear separation between inliers and outliers. We have shown that the proposed MRPnL method is able to handle the challenging path estimation of 4DoF quadrotor UAVs, which can move unsystematically, making sudden turns and floating in any direction.
5. Conclusion
A novel algebraic approach has been proposed for computing the absolute and relative poses of a multi-view perspective camera system using line correspondences, which works without reformulation both for minimal problems (3 line pairs per camera) and for the general n > 3 and multiple-camera cases. Unlike previous approaches [37, 36, 35], the rotation is solved first through a two-variate seventh-order system of polynomial equations using a direct Grobner-basis LSE solver, which reduces numerical error propagation and works both for minimal and general line sets. The translation is then solved via a linear system. Experimental tests on large synthetic as well as real datasets confirm the state-of-the-art performance of the proposed algorithm. Comparative results show that our method outperforms recent alternative methods (AlgLS [23], ASPnL [36], SRPnL [35]) in terms of speed, accuracy, and robustness. Furthermore, unlike these methods, our algorithm works for multi-view scenarios and is robust up to a 60% outlier ratio when combined with a RANSAC-like method.
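The final linear step, recovering the translation once the rotation is fixed, can be sketched from the standard 2D-3D line constraint: if n is the normal of the plane back-projected from an image line and P a point on the corresponding 3D line, then n^T(RP + t) = 0, which is linear in t. This constraint and the 3x3 normal-equation solve below are our assumed formulation; the paper only states that translation follows from a linear system.

```python
def _solve3(A, b):
    """Solve a 3x3 linear system by Cramer's rule."""
    def det3(M):
        return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
              - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
              + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))
    d = det3(A)
    return [det3([[b[i] if j == col else A[i][j] for j in range(3)]
                  for i in range(3)]) / d for col in range(3)]

def solve_translation(R, normals, points):
    """Least-squares t from the stacked constraints n_i^T (R P_i + t) = 0,
    i.e. n_i^T t = -n_i^T R P_i, via 3x3 normal equations (assumed
    formulation of the paper's linear translation step)."""
    A = [[0.0] * 3 for _ in range(3)]  # A = sum n n^T
    b = [0.0] * 3                      # b = sum n * rhs
    for n, P in zip(normals, points):
        RP = [sum(R[i][j] * P[j] for j in range(3)) for i in range(3)]
        rhs = -sum(n[i] * RP[i] for i in range(3))
        for i in range(3):
            b[i] += n[i] * rhs
            for j in range(3):
                A[i][j] += n[i] * n[j]
    return _solve3(A, b)
```

Three lines with linearly independent plane normals already determine t uniquely; additional correspondences simply over-determine the system and average out noise.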
References
[1] H. Abdellali and Z. Kato. Absolute and relative pose estimation of a multi-view camera system using 2d-3d line pairs and vertical direction. In Proceedings of International Conference on Digital Image Computing: Techniques and Applications, pages 1–8, Canberra, Australia, Dec. 2018. IEEE. 2
[2] C. Albl, Z. Kukelova, and T. Pajdla. Rolling shutter absolute
pose problem with known vertical direction. In Proceedings
of Conference on Computer Vision and Pattern Recognition,
pages 3355–3363, Las Vegas, NV, USA, June 2016. 3
[3] A. Bartoli and P. Sturm. The 3D line motion matrix and
alignment of line reconstructions. International Journal of
Computer Vision, 57(3):159–178, 2004. 2
[4] F. Camposeco, T. Sattler, and M. Pollefeys. Minimal solvers
for generalized pose and scale estimation from two rays and
one point. In B. Leibe, J. Matas, N. Sebe, and M. Welling,
editors, Proceedings of European Conference Computer Vi-
sion, volume 9909 of Lecture Notes in Computer Science,
pages 202–218, Amsterdam, The Netherlands, Oct. 2016.
Springer. 1,2
[5] H. Chen. Pose determination from line-to-plane corre-
spondences: Existence condition and closed-form solutions.
IEEE Transactions on Pattern Analysis and Machine Intelli-
gence, 13(6):530–541, 1991. 1,3
[6] A. Elqursh and A. Elgammal. Line-based relative pose es-
timation. In Proceedings of Conference on Computer Vi-
sion and Pattern Recognition, pages 3049–3056, Colorado
Springs, CO, USA, June 2011. IEEE. 1
[7] M. A. Fischler and R. C. Bolles. Random sample consen-
sus: A paradigm for model fitting with applications to im-
age analysis and automated cartography. Commun. ACM,
24(6):381–395, 1981. 1,2,5
[8] R. Hartley and A. Zisserman. Multiple View Geometry in
Computer Vision. Cambridge University Press, Cambridge,
UK, 2004. 1,2,4
[9] N. Horanyi and Z. Kato. Multiview absolute pose using 3D
- 2D perspective line correspondences and vertical direction.
In Proceedings of ICCV Workshop on Multiview Relation-
ships in 3D Data, pages 1–9, Venice, Italy, Oct. 2017. IEEE.
[10] T. Ke and S. I. Roumeliotis. An efficient algebraic solution to
the perspective-three-point problem. In Proceedings of Con-
ference on Computer Vision and Pattern Recognition, pages
1–9, Honolulu, HI, USA, July 2017. IEEE. 1,2
[11] L. Kneip. Polyjam, 2015 [online].
url: 6
[12] L. Kneip, H. Li, and Y. Seo. UPnP: an optimal O(n) solution
to the absolute pose problem with universal applicability. In
D. J. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, editors,
Proceedings of European Conference Computer Vision, Part
I, volume 8689 of Lecture Notes in Computer Science, pages
127–142, Zurich, Switzerland, Sept. 2014. Springer. 1,2,7
[13] Z. Kukelova, M. Bujnak, and T. Pajdla. Automatic generator
of minimal problem solvers. In Proceedings of European
Conference on Computer Vision, pages 302–315. Springer
Berlin Heidelberg, 2008. 4,6
[14] Z. Kukelova, M. Bujnak, and T. Pajdla. Closed-form solu-
tions to minimal absolute pose problems with known vertical
direction. In R. Kimmel, R. Klette, and A. Sugimoto, editors,
Proceedings of Asian Conference on Computer Vision, Part
II, volume 6493 of LNCS, pages 216–229, Queenstown, New
Zealand, Nov. 2010. Springer. 2,3
[15] V. Larsson, M. Oskarsson, K. Åström, A. Wallis, Z. Kukelova, and T. Pajdla. Beyond grobner bases: Basis
selection for minimal solvers. In Proceedings of Conference
on Computer Vision and Pattern Recognition, pages 3945–
3954, Salt Lake City, UT, USA, June 2018. IEEE Computer
Society. 4
[16] G. H. Lee. A minimal solution for non-perspective pose esti-
mation from line correspondences. In Proceedings of Euro-
pean Conference on Computer Vision, pages 170–185, Am-
sterdam, The Netherlands, Oct. 2016. Springer. 2,3,4,5
[17] G. H. Lee, F. Fraundorfer, and M. Pollefeys. Motion esti-
mation for self-driving cars with a generalized camera. In
Proceedings of Conference on Computer Vision and Pattern
Recognition, pages 2746–2753, Portland, OR, USA, June
2013. 1
[18] G. H. Lee, M. Pollefeys, and F. Fraundorfer. Relative pose
estimation for a multi-camera system with known vertical di-
rection. In Proceedings of Conference on Computer Vision
and Pattern Recognition, pages 540–547, Columbus, OH,
USA, June 2014. 1
[19] V. Lepetit, F. Moreno-Noguer, and P. Fua. EPnP: an accurate
O(n) solution to the PnP problem. International Journal of
Computer Vision, 81(2), 2009. 1,2
[20] S. Li, C. Xu, and M. Xie. A robust O(n) solution to the
perspective-n-point problem. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 34(7):1444–1450, 2012.
[21] J. Li-Chee-Ming and C. Armenakis. UAV navigation system
using line-based sensor pose estimation. Geo-spatial Infor-
mation Science, 21(1):2–11, jan 2018. 1
[22] P. Miraldo, T. Dias, and S. Ramalingam. A minimal closed-
form solution for multi-perspective pose estimation using
points and lines. In The European Conference on Computer
Vision, pages 1–17, Munich, Germany, Sept. 2018. Springer.
[23] F. M. Mirzaei and S. I. Roumeliotis. Globally optimal
pose estimation from line correspondences. In International
Conference on Robotics and Automation, pages 5581–5588,
Shanghai, China, May 2011. IEEE. 2,6,8
[24] M. Persson and K. Nordberg. Lambda twist: An accurate
fast robust perspective three point (p3p) solver. In The Euro-
pean Conference on Computer Vision, pages 1–15, Munich,
Germany, Sept. 2018. Springer. 2
[25] H. Pottmann and J. Wallner. Computational Line Geometry.
Mathematics and Visualization. Springer, 2009. 2
[26] A. Pumarola, A. Vakhitov, A. Agudo, F. Moreno-Noguer,
and A. Sanfeliu. Relative localization for aerial manipulation
with PL-SLAM. In Springer Tracts in Advanced Robotics,
pages 239–248. Springer International Publishing, 2019. 2
[27] L. Tamas, R. Frohlich, and Z. Kato. Relative pose estima-
tion and fusion of omnidirectional and lidar cameras. In
L. de Agapito, M. M. Bronstein, and C. Rother, editors, Pro-
ceedings of the ECCV Workshop on Computer Vision for
Road Scene Understanding and Autonomous Driving, vol-
ume 8926 of Lecture Notes in Computer Science, pages 640–
651, Zurich, Switzerland, Sept. 2014. Springer. 1
[28] L. Tamas and Z. Kato. Targetless calibration of a lidar - per-
spective camera pair. In Proceedings of ICCV Workshop on
Big Data in 3D Computer Vision, pages 668–675, Sydney,
Australia, Dec. 2013. IEEE. 1
[29] C. J. Taylor and D. J. Kriegman. Structure and motion from
line segments in multiple images. IEEE Transactions on Pat-
tern Analysis and Machine Intelligence, 17(11):1021–1032,
Nov. 1995. 2
[30] P. Torr and A. Zisserman. MLESAC: A new robust estimator
with application to estimating image geometry. Computer
Vision and Image Understanding, 78(1):138–156, apr 2000.
[31] A. Vakhitov, V. Lempitsky, and Y. Zheng. Stereo relative
pose from line and point feature triplets. In Computer Vi-
sion – ECCV 2018, pages 662–677. Springer International
Publishing, 2018. 8
[32] R. von Gioi, J. Jakubowicz, J.-M. Morel, and G. Randall.
LSD: A fast line segment detector with a false detection con-
trol. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 32(4):722–732, apr 2010. 8
[33] N. von Schmude, P. Lothe, and B. Jähne. Relative pose
estimation from straight lines using parallel line clustering
and its application to monocular visual odometry. In Pro-
ceedings of the 11th Joint Conference on Computer Vision,
Imaging and Computer Graphics Theory and Applications.
SCITEPRESS - Science and Technology Publications,
2016. 1
[34] P. Wang, G. Xu, and Y. Cheng. A novel algebraic solution
to the perspective-three-line problem. Computer Vision and
Image Understanding, 2018. in press. 2,3
[35] P. Wang, G. Xu, Y. Cheng, and Q. Yu. Camera pose estima-
tion from lines: a fast, robust and general method. Machine
Vision and Applications, feb 2019. 2,3,4,6,8
[36] C. Xu, L. Zhang, L. Cheng, and R. Koch. Pose estimation
from line correspondences: A complete analysis and a se-
ries of solutions. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 39(6):1209–1222, 2017. 1,2,3,4,8
[37] L. Zhang, C. Xu, K. Lee, and R. Koch. Robust and efficient
pose estimation from line correspondences. In K. M. Lee,
Y. Matsushita, J. M. Rehg, and Z. Hu, editors, Proceedings of
Asian Conference on Computer Vision, volume 7726 of Lec-
ture Notes in Computer Science, pages 217–230, Daejeon,
Korea, Nov. 2012. Springer. 2,3,8
[38] H. Zhou, D. Zou, L. Pei, R. Ying, P. Liu, and W. Yu.
StructSLAM: Visual SLAM with building structure lines.
IEEE Transactions on Vehicular Technology, 64(4):1364–
1375, apr 2015. 1
In this paper, we present a new algebraic method to solve the perspective-three-point (P3P) problem, which directly computes the rotation and position of a camera in the world frame without the intermediate derivation of the points in the camera frame. Unlike other online methods, the proposed method uses an “object“ coordinate frame, in which the known 3D coordinates are sparse, facilitating formulations of the P3P problem. Additionally, two auxiliary variables are introduced to parameterize the rotation and position matrix, and then a closed-form solution for the camera pose is obtained from subsequent substitutions. This algebraic approach makes the processes more easily followed and significantly improves the performance. Experimental results demonstrated that our method offers accuracy and precision comparable to the existing state-of-the art methods but it has better computational efficiency.