
A Direct Least-Squares Solution to Multi-View Absolute and Relative Pose from 2D-3D Perspective Line Pairs*

Hichem Abdellali^a, Robert Frohlich^a, Zoltan Kato^{a,b}
^a University of Szeged, Szeged, Hungary
^b J. Selye University, Komarno, Slovakia
{hichem,frohlich}@inf.u-szeged.hu, kato@inf.u-szeged.hu

Abstract

We propose a new algorithm for estimating the absolute and relative pose of a multi-view camera system. We derive a direct least-squares solver using Gröbner bases which works both for the minimal case (a set of 3 line pairs for each camera) and for the general case using all inlier 2D-3D line pairs of a multi-view camera system. The algorithm has been validated on a large synthetic dataset as well as on real data. Experimental results confirm stable, real-time performance under realistic outlier ratios and noise on the line parameters. Comparative tests show that our method compares favorably to the latest state-of-the-art algorithms.

1. Introduction

Absolute pose estimation consists in determining the position and orientation of a camera with respect to a 3D world coordinate frame, while relative pose estimation aims to compute the same parameters with respect to a reference camera. Relative pose estimation is needed when we have a system of two or more cameras. Absolute and relative poses are fundamental in various computer vision applications, such as visual odometry, simultaneous localization and mapping (SLAM), image-based localization and navigation, and augmented reality. The problem has been extensively studied, yielding various formulations and solutions. Most approaches focus on single-camera pose estimation using point correspondences. However, in modern applications, especially vision-based localization and navigation for robotics and autonomous vehicles, it is often desirable to use multi-camera systems, which cover a large field of view and provide direct 3D measurements. The absolute pose estimation of a perspective camera from n 2D-3D point correspondences is known in the literature as the Perspective-n-Point (PnP) problem, which has been widely studied in the last few decades [7,19,20,12,10]. Various solutions have been developed both for large n and for the n = 3 minimal case (see [12] for a recent overview). The use of line correspondences, known as the Perspective-n-Line (PnL) problem, has also been investigated in the last two decades, yielding robust and efficient solutions (see [36] for a detailed overview). The minimal case of n = 3 line correspondences is particularly important, as its solution is the basis for dealing with the general PnL problem. It has been shown in [5] that P3L leads to an 8th-order polynomial, which is higher than the 4th-order polynomial of a P3P problem. While the use of point and line correspondences is widespread, there are pose estimation methods relying on other types of correspondences, e.g. sets of regions [28,27] or silhouettes. However, such approaches are typically computationally more expensive, hence they cannot be used as real-time solvers.

* This work was partially supported by the NKFI-6 fund through project K120366; "Integrated program for training new generation of scientists in the fields of computer science", EFOP-3.6.3-VEKOP-16-2017-0002; the Research & Development Operational Programme for the project "Modernization and Improvement of Technical Infrastructure for Research and Development of J. Selye University in the Fields of Nanotechnology and Intelligent Space", ITMS 26210120042, co-funded by the European Regional Development Fund; and the Research & Innovation Operational Programme for the project "Support of research and development activities of J. Selye University in the field of Digital Slovakia and creative industry", ITMS code: NFP313010T504, co-funded by the European Regional Development Fund.

Recently, due to the increasing popularity of multi-camera systems in e.g. autonomous driving [17] and UAVs, the problem of multi-view pose estimation has been addressed. Solutions to the PnP or PnL problem cover only single-view perspective cameras, hence new methods are needed to efficiently deal with the generalized PnP (gPnP) or non-perspective PnP (NPnP) problem [4,12,17,18]. As for relative pose, lines in two images do not provide any constraints on the camera pose [8]. However, if information is available about some special 3D geometric configuration of the lines, then relative pose can also be estimated [6]. In [33], relative pose estimation is extended with a 3D line direction estimation step applied to monocular visual odometry, while [21] used a line-based space resection approach for UAV navigation. More recently, in [38], lines of building structures were used for trajectory estimation in a SLAM approach, while in [26], a hybrid point- and line-based SLAM technique was proposed.

Several point-based [24,10,12] as well as mixed point- and line-based methods [4,22] exist, but little work has been done on using line correspondences only. For the absolute pose of a single camera, Mirzaei and Roumeliotis proposed the first globally optimal non-iterative solution (AlgLS) [23], which formulates the problem as a multivariate polynomial system with the rotation parametrized by the Cayley-Gibbs-Rodriguez (CGR) representation. Zhang et al. proposed RPnL [37], which was further modified into the Accurate Subset-based PnL (ASPnL) method [36], one of the most accurate non-iterative methods. Other recent work from Wang et al. deals with the P3L [34] as well as the PnL [35] problem. In [35], a fast and robust solution is proposed (called SRPnL) and its superior performance is confirmed by a comprehensive experimental comparison with many state-of-the-art PnL solvers, such as AlgLS [23] and ASPnL [36]. Therefore, in this paper, we validate our method through various comparative experiments with AlgLS and SRPnL, as they performed best in [35]. For multi-view camera systems, one notable work is the minimal NP3L solver of Lee [16], which deals with the estimation of the 6 pose parameters of a fully calibrated multi-view camera system. In [9], the same problem is addressed with known vertical direction, which leads to two fast and robust solvers.

While robust minimal solutions based on line correspondences exist for absolute pose [23,37,36,16,9,35] or for absolute and relative pose with known vertical direction [1], none of these methods estimate full absolute and relative poses simultaneously in a multi-view system. In this paper, we propose a direct least-squares solution to this problem. First, a direct solver using Gröbner bases is proposed which works for any number n >= 3 of lines, making it suitable for hypothesis testing as in RANSAC [7]. The poses can be further refined through a few iterations of an iterative least-squares solver, which runs efficiently due to the optimal initialization provided by our direct solver. The performance and robustness of the proposed method have been evaluated for n >= 3 lines as well as for M >= 1 camera systems on large synthetic datasets as well as on real data.

2. Perspective Lines and Camera Pose

Given a calibrated perspective camera, its camera matrix P = K[R|t] consists of the internal calibration matrix K and the camera pose [R|t] w.r.t. the world coordinate frame W. A homogeneous 3D point X is mapped by P into a homogeneous 2D image point x' as [8]

    x' ~= PX = K[R|t]X,    (1)

[Figure 1 shows a 3D line L = (V, X) with projection plane normals n0, n1, n2 and image lines l0, l1, l2 (each with direction v_i and point x_i) on the image planes of cameras C0, C1, C2; the absolute pose (R, t) relates W to C0, and the relative poses (R1, t1), (R2, t2) relate C0 to C1 and C2.]

Figure 1. Projection of a 3D line in a 3-camera system.

where '~=' denotes the equivalence of homogeneous coordinates, i.e. equality up to a non-zero scale factor. Since we assume a calibrated camera, we can multiply both sides of (1) by K^-1 and work with the equivalent normalized image

    x = K^-1 x' ~= K^-1 PX = [R|t]X.    (2)

The above equation is the starting point of perspective pose estimation [12,20,19,14]: given a set of 3D-2D point correspondences (x_i <-> X_i), one can recover the 3D rigid body transformation (R, t) : W -> C acting between the world coordinate frame W and the camera coordinate frame C.

3D lines may have various representations in the projective space [25,3]. In this paper, 3D lines will be represented as L = (V, X), where V is the unit direction vector of the line and X is a point on the line [29,9,1].

The projection of L in a perspective camera is a 2D line l, which can also be represented as l = (v, x). Note that the point x is not necessarily the image of the 3D point X! Both the 3D line L and its perspective image l lie on the projection plane pi passing through the camera projection center C (see Fig. 1). The unit normal to the plane pi in the camera coordinate system C is denoted by n, which can be computed from the image line l as

    n = (v x x) / ||v x x||.    (3)
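The computation in (3) can be sketched numerically. This is a minimal sketch: the lifting of the 2D direction v and point x of the normalized image line to homogeneous 3-vectors is an assumed convention, and `line_normal` is a hypothetical helper name.

```python
import numpy as np

def line_normal(v, x):
    """Unit normal n of the projection plane, per (3), from a normalized
    image line l = (v, x).

    v : 2D direction of the image line, lifted to 3D as (vx, vy, 0)
    x : point on the line in normalized homogeneous coordinates (x, y, 1)
    """
    v3 = np.array([v[0], v[1], 0.0])
    x3 = np.array([x[0], x[1], 1.0])
    n = np.cross(v3, x3)
    return n / np.linalg.norm(n)

# Example: a horizontal image line through the point (0, 1)
n = line_normal((1.0, 0.0), (0.0, 1.0))
```

By construction, n is a unit vector perpendicular to the lifted line direction.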

Since L also lies on pi, its direction vector V is perpendicular to n. Hence we get the following equation, which involves only the absolute pose (R, t) [9,1]:

    n^T R V = n^T V_C = 0,    (4)

where R is the rotation matrix from the world frame W to the camera frame C, and V_C denotes the unit direction vector of L in the camera coordinate frame. Furthermore, the vector from the camera center C to the point X on line L also lies on pi, thus it is also perpendicular to n:

    n^T (RX + t) = n^T X_C = 0,    (5)

where t is the translation from the world frame W to the reference camera frame C, and X_C denotes the point X of L in the camera coordinate frame.
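Constraints (4) and (5) can be evaluated directly for a given pose. A minimal sketch with a hypothetical helper, using an identity pose and a line pair chosen so that both constraints hold:

```python
import numpy as np

def pose_residuals(R, t, n, V, X):
    """Residuals of constraints (4) and (5) for one 2D-3D line pair;
    both vanish when (R, t) is the true pose."""
    return n @ (R @ V), n @ (R @ X + t)

# Identity pose; V and X lie in the plane with normal n, so (4)-(5) hold.
n = np.array([0.0, 0.0, 1.0])
V = np.array([1.0, 0.0, 0.0])
X = np.array([2.0, 3.0, 0.0])
r1, r2 = pose_residuals(np.eye(3), np.zeros(3), n, V, X)
```

A solver searches for the (R, t) driving these residuals to zero over all line pairs.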

The absolute pose (R, t) has 6 degrees of freedom (3 rotation angles (alpha, beta, gamma) and 3 translation components (t_x, t_y, t_z) along the 3 coordinate axes). Thus to solve for the pose using (4) and (5), we need a minimum of 3 line correspondences {(L_i, l_i)}_{i=1}^3, which is called the P3L problem. The solution is obtained in two steps: first the rotation R is solved using (4), which in general involves solving a system of 8th-order polynomials. Then the translation is obtained from (5) by back-substituting R, which yields a linear system of equations in terms of t [36,16,37,34]. Clearly, the main challenge is the solution for R, due to the nonlinearity of the equations as well as the additional constraints on R to be a valid rotation (i.e. orthonormal) matrix. Although for special line configurations (e.g. orthogonal, parallel or intersecting lines) [36], or with additional knowledge of e.g. the vertical direction [9,1], a lower-order polynomial may be achieved, the P3L polynomial will usually not be of order lower than 8 for general line configurations [5].

Let us have a closer look at the parametrization of rotations, as the final form of (4) depends on it. It is well known that the rotation group SO(3) has 3 degrees of freedom. The most popular parametrization is Euler angles, which represents R via 3 rotation angles around the coordinate axes. Unfortunately, this representation involves trigonometric functions, which would yield trigonometric equations in (4). One approach is to let the trigonometric functions of one angle alpha be two separate unknowns [20,36,37,9], which, together with the trigonometric constraints, leads to polynomial equations. Alternatively, one can solve directly for the 9 elements of R in (4) as a linear system, and then enforce orthonormality on the solution, again yielding (different) polynomial equations [35,36]. Herein, in order to eliminate trigonometric functions, we substitute s = tan(alpha/2) [14,2,9,1], for which cos(alpha) = (1 - s^2)/(1 + s^2) and sin(alpha) = 2s/(1 + s^2), yielding a second-order polynomial in s for one rotation around a coordinate axis.
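As a quick illustration of the substitution s = tan(alpha/2), the sketch below builds a rotation about the X axis from s alone, without trigonometric functions; `rot_x_halfangle` is a hypothetical helper name.

```python
import numpy as np

def rot_x_halfangle(s):
    """Rotation about the X axis parametrized by s = tan(alpha/2):
    cos(alpha) = (1 - s^2)/(1 + s^2), sin(alpha) = 2s/(1 + s^2)."""
    c = (1 - s**2) / (1 + s**2)
    sn = 2 * s / (1 + s**2)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c,  -sn],
                     [0.0, sn,  c]])

alpha = 0.7
R = rot_x_halfangle(np.tan(alpha / 2))
```

The result is an exact orthonormal rotation matrix, and every entry is a rational function of s, which is what makes the pose equations polynomial.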

3. Direct Least Squares Solver

In order to reduce the number of unknowns in (4) to 2, we eliminate one rotation in R by defining an intermediate coordinate system M [37,36,35] in which the rotation angle around the X axis can be easily obtained. First, let us select the line pair (L_0, l_0) with the longest projection length, as longer edges are less affected by noise on their endpoints [37,36,35]. The origin of M is located at the

Algorithm 1 The proposed MRPnL algorithm.

Input: the number of cameras M >= 1 with reference camera C0; for each camera Ci, N_Ci 2D-3D line pairs (n_j^Ci, X_j, V_j).
Output: the absolute pose (R, t) : W -> C0 and the relative poses (Ri, ti) : C0 -> Ci.
1: Normalize the 3D line points X_j by N.
2: Rotation: Determine the intermediate rotation R_M and apply it to the input line pairs. Then R_x^M is calculated, and the rotation R^M around the two remaining axes is obtained by solving the polynomial system of equations (13); together these provide R.
3: Translation: By back-substituting the rotation, solve the linear system of equations (5) for C0, or (17) for the other cameras, via SVD.
4: Optional refinement of the absolute and relative poses for all cameras and lines simultaneously by solving the system (19) in the least-squares sense (see Section 3.3).
5: Denormalize to get the final pose estimates.

origin of W, and its axes (X_M, Y_M, Z_M) are

    Y_M = n_0^C / ||n_0^C||    (6)
    X_M = (n_0^C x V_0^W) / ||n_0^C x V_0^W||    (7)
    Z_M = (X_M x Y_M) / ||X_M x Y_M||    (8)

where the Y axis of M aligns with n_0^C. The rotation R_M = [X_M, Y_M, Z_M]^T rotates the normals and direction vectors into the intermediate frame M. The rotation R_x^M around the X axis within M is then easily calculated, because it is the angle between the Z axis and V_0^M. Hence the rotation matrix acting within the intermediate coordinate frame M is composed of the rotations around the remaining two axes as

    (1 + s^2)(1 + r^2) R^M = R_y^M(s) R_z^M(r) =
        [ (1 - s^2)(1 - r^2)    -2r(1 - s^2)          2s(r^2 + 1)        ]
        [ 2r(s^2 + 1)           (s^2 + 1)(1 - r^2)    0                  ]    (9)
        [ -2s(1 - r^2)          4sr                   (1 - s^2)(r^2 + 1) ]

and we obtain the new form of (4) as

    (R_M n)^T R^M (R_x^M R_M V) = n^M^T R^M V^M = 0.    (10)
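The intermediate frame of (6)-(8) can be sketched as follows; this is a minimal numpy sketch where `intermediate_frame` is a hypothetical name and the C, W superscripts on n0 and V0 are dropped.

```python
import numpy as np

def intermediate_frame(n0, V0):
    """Axes of the intermediate frame M per (6)-(8): Y along the reference
    normal n0, X along n0 x V0, Z completing the right-handed frame.
    Returns R_M = [X_M, Y_M, Z_M]^T, which rotates vectors into M."""
    Y = n0 / np.linalg.norm(n0)
    X = np.cross(n0, V0)
    X = X / np.linalg.norm(X)
    Z = np.cross(X, Y)
    Z = Z / np.linalg.norm(Z)
    return np.vstack([X, Y, Z])

# Example: the reference normal is mapped onto the Y axis of M
R_M = intermediate_frame(np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0]))
```

By construction R_M is orthonormal and R_M n0 = (0, 1, 0)^T, i.e. the Y axis of M aligns with the reference normal as stated above.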

Expanding (10) gives a 4th-order polynomial in (s, r) with coefficients in terms of V^M and n^M:

    a^T u = 0    (11)

with

    a = (V1 n1 + V2 n2 + V3 n3,  2 V1 n2 - 2 V2 n1,  2 V3 n1 - 2 V1 n3,
         -V1 n1 - V2 n2 + V3 n3,  -V1 n1 + V2 n2 - V3 n3,  V1 n1 - V2 n2 - V3 n3,
         2 V1 n2 + 2 V2 n1,  2 V3 n1 + 2 V1 n3,  4 V2 n3)^T
    u = (1, r, s, r^2, s^2, s^2 r^2, s^2 r, s r^2, s r)^T
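The coefficient vector in (11) can be checked for consistency with the matrix in (9): for any (s, r), a^T u should equal n^T [(1 + s^2)(1 + r^2) R^M] V. A sketch, assuming V and n are already expressed in the intermediate frame M (helper names are hypothetical):

```python
import numpy as np

def coeff_a(V, n):
    """Coefficient vector a of (11) for one line pair."""
    V1, V2, V3 = V
    n1, n2, n3 = n
    return np.array([
        V1*n1 + V2*n2 + V3*n3,
        2*V1*n2 - 2*V2*n1,
        2*V3*n1 - 2*V1*n3,
        -V1*n1 - V2*n2 + V3*n3,
        -V1*n1 + V2*n2 - V3*n3,
        V1*n1 - V2*n2 - V3*n3,
        2*V1*n2 + 2*V2*n1,
        2*V3*n1 + 2*V1*n3,
        4*V2*n3,
    ])

def monomials_u(s, r):
    """Monomial vector u = (1, r, s, r^2, s^2, s^2 r^2, s^2 r, s r^2, s r)."""
    return np.array([1, r, s, r**2, s**2, s**2*r**2, s**2*r, s*r**2, s*r])

def M_mat(s, r):
    """(1 + s^2)(1 + r^2) R_y(s) R_z(r), the matrix of (9)."""
    return np.array([
        [(1 - s**2)*(1 - r**2), -2*r*(1 - s**2), 2*s*(r**2 + 1)],
        [2*r*(s**2 + 1), (s**2 + 1)*(1 - r**2), 0.0],
        [-2*s*(1 - r**2), 4*s*r, (1 - s**2)*(r**2 + 1)],
    ])

# Consistency check of (11) against (9) for arbitrary values
V = np.array([0.3, -0.5, 0.8]); V = V / np.linalg.norm(V)
n = np.array([0.6, 0.1, -0.4]); n = n / np.linalg.norm(n)
s, r = 0.3, -0.5
lhs = coeff_a(V, n) @ monomials_u(s, r)
rhs = n @ M_mat(s, r) @ V
```

The two sides agree to machine precision, confirming that (11) is the full expansion of the constraint (10).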

Each line pair generates one such equation, yielding a system of N equations, which is solved in the least-squares sense. For this purpose, let us take the sum of squares of the above system

    e(s, r) = sum_{i=1}^{N} (a_i^T u)^2    (12)

and then find argmin_{(s,r)} e(s, r). The first-order optimality condition for (12) is

    grad e(s, r) = [ de(s,r)/ds ; de(s,r)/dr ] = [ sum_{i=1}^{N} b_i^{sT} u^s ; sum_{i=1}^{N} b_i^{rT} u^r ] = 0    (13)

where for each line pair

    b^r = (2 a1 a4,  6 a2 a4,  2 a1 a7 + 2 a3 a4 + 2 a5 a9,  4 a1 a2 + 2 a4^2,
           2 a1 a9 + 2 a4 a5,  6 a2 a7 + 6 a4 a6 + 6 a8 a9,
           4 a1 a6 + 4 a2 a3 + 4 a4 a7 + 4 a5 a8 + 2 a9^2,  6 a2 a9 + 6 a4 a8,
           4 a2^2,  2 a3 a7,  2 a3 a9 + 2 a5 a7,  4 a1 a8 + 4 a2 a5 + 4 a4 a9,
           4 a6^2,  8 a6 a8,  8 a2 a6 + 4 a8^2,  8 a2 a8,  6 a6 a7,
           6 a6 a9 + 6 a7 a8,  4 a3 a6 + 2 a7^2,  4 a3 a8 + 4 a5 a6 + 4 a7 a9)^T
    u^r = (1, r^2, s^2, r, s, s^2 r^2, s^2 r, s r^2, r^3, s^4, s^3, s r,
           r^3 s^4, r^3 s^3, r^3 s^2, r^3 s, r^2 s^4, r^2 s^3, r s^4, r s^3)^T    (14)

and similarly b^{sT} u^s can also be expressed in terms of the coefficients a of each line pair. Thus the solution of the system of 2 polynomial equations (each of them of 7th order) in (13) provides the rotation parameters (s, r). Herein, we use the automatic generator of Kukelova et al. [13] to generate a solver using Gröbner bases [13,15] for the system in (13). Once the solution(s) are obtained, the complete R, acting between the world frame W and camera frame C, is obtained as R = R_M^T (R^M R_x^M) R_M.
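A numerical stand-in for this step: the objective (12) can be minimized by a few Gauss-Newton iterations on the residuals n_i^T R^M(s, r) V_i when a good starting point is available (in the paper, the Gröbner-basis solver produces the solutions directly; this sketch only illustrates the least-squares structure on synthetic data, with hypothetical names throughout).

```python
import numpy as np

def M_mat(s, r):
    """(1 + s^2)(1 + r^2) R_y(s) R_z(r), the matrix of (9)."""
    return np.array([
        [(1 - s**2)*(1 - r**2), -2*r*(1 - s**2), 2*s*(r**2 + 1)],
        [2*r*(s**2 + 1), (s**2 + 1)*(1 - r**2), 0.0],
        [-2*s*(1 - r**2), 4*s*r, (1 - s**2)*(r**2 + 1)],
    ])

# Synthetic line pairs consistent with a known (s, r): each normal is made
# perpendicular to the rotated direction, so all residuals vanish at the truth.
rng = np.random.default_rng(1)
s_true, r_true = 0.4, -0.2
Vs = rng.normal(size=(10, 3))
Vs /= np.linalg.norm(Vs, axis=1, keepdims=True)
ns = np.cross(Vs @ M_mat(s_true, r_true).T, rng.normal(size=(10, 3)))
ns /= np.linalg.norm(ns, axis=1, keepdims=True)

def residuals(p):
    """Stacked left-hand sides of (10), one per line pair."""
    return np.einsum('ij,jk,ik->i', ns, M_mat(p[0], p[1]), Vs)

p = np.array([s_true + 0.05, r_true + 0.05])   # a nearby initial guess
for _ in range(15):                            # Gauss-Newton, numeric Jacobian
    f = residuals(p)
    eps = 1e-7
    J = np.column_stack([(residuals(p + eps * d) - f) / eps for d in np.eye(2)])
    p -= np.linalg.lstsq(J, f, rcond=None)[0]
```

Since the data is noise-free, the iteration converges to the exact (s, r) that zeroes the sum of squares (12).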

The translation t is then obtained by back-substituting R into (5), yielding a system of linear equations, which can be solved by SVD decomposition. While in (4) all quantities are already normalized (n and V are of unit length), (5) contains the 3D point X given in an arbitrary world coordinate system W, which needs to be normalized for numerical stability [8]. Therefore, we transform our 3D line segments into a unit cube around the origin of W by a normalizing transformation N composed of a translation -[u, v, z]^T, with [u, v, z]^T being the centroid of the 3D scene points, followed by a uniform scaling s = 1 / max(|h|, |w|, |d|), where h is the height, w the width, and d the depth of the 3D data. The solution is then obtained in this normalized space, hence the result (R~, t~) needs to be denormalized. Since the equations used to solve for the rotation are unaffected by this normalization (thanks to the uniform scaling!), R~ is the final rotation, while the translation t~ needs to be corrected by applying [I t~; 0 1] N.
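The normalization and the translation correction can be sketched as follows; the closed form inside `denormalize_t` is an assumed-consistent reading of the correction above, and the helper names are hypothetical.

```python
import numpy as np

def normalize_points(X):
    """Map 3D points into a unit cube around the origin: X' = s (X - c),
    with c the centroid and s = 1/max(h, w, d) (cf. the text)."""
    c = X.mean(axis=0)
    s = 1.0 / (X.max(axis=0) - X.min(axis=0)).max()
    return s * (X - c), s, c

def denormalize_t(t_n, R, s, c):
    """Undo the normalization on the estimated translation. Since
    n'(R(s(X - c)) + t_n) = s * n'(R X + (t_n/s - R c)), the corrected
    translation is t = t_n/s - R c, while the rotation is unaffected."""
    return t_n / s - R @ c

# Round-trip check on a small synthetic configuration
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t = np.array([1.0, 2.0, 3.0])
X = np.array([[0.0, 0.0, 0.0], [2.0, 1.0, 0.0], [1.0, 3.0, 2.0]])
Xn, s, c = normalize_points(X)
t_n = s * (t + R @ c)   # translation the solver would find in normalized space
```

Denormalizing t_n recovers the original t exactly, and the normalized points stay inside the unit cube.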

Although (13) might have several solutions, the solver will only return the real ones, and then one has to select the geometrically valid (R, t) based on the visibility of the lines and the backprojection error (18) [36,16,35,9,1].

3.1. Multi-view Case and Relative Pose

When the 3D lines are viewed by a system of M calibrated perspective cameras, each 3D line L has up to M images l_i, i = 1 ... M. One of the cameras is the reference camera C0, therefore the absolute pose of the camera system (R, t) is defined as the rigid transformation acting between W and C0, while the individual camera frames Ci are related to the reference camera frame via the relative poses (Ri, ti) : C0 -> Ci, i = 1, ..., M - 1. Of course, the equations (4), (16) and (5) remain valid for C0, while for the other cameras Ci, the projection of L yields similar equations, but the unknown relative pose (Ri, ti) will also be involved. Hence (4) becomes

    n_i^T Ri V_C0 = n_i^T Ri R V = 0    (15)

from which we get

    (R_Mi n_i)^T R^Mi (R_x^Mi R_Mi R V) = n^Mi^T R^Mi V^Mi = 0    (16)

which, after a similar derivation as in the single-camera case, also yields a system of polynomial equations of the same form as in (13); hence the same solver can be used to solve for each camera Ci, i = 1, ..., M - 1. Once the solutions are obtained, each Ri is back-substituted into the corresponding linear system similar to (5):

    n_i^T (Ri X_C0 + ti) = n_i^T (Ri (RX + t) + ti) = 0    (17)

which is solved for ti by SVD.
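The pose chain used in (15) and (17) composes as ordinary rigid transforms; a minimal sketch (the helper names are hypothetical):

```python
import numpy as np

def to_hom(R, t):
    """4x4 homogeneous transform from (R, t)."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def world_to_cam_i(R, t, Ri, ti):
    """W -> Ci: the absolute pose (R, t): W -> C0 followed by the relative
    pose (Ri, ti): C0 -> Ci."""
    return to_hom(Ri, ti) @ to_hom(R, t)

# Example composition with a 90-degree rotation about Z
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
T = world_to_cam_i(Rz, np.array([1.0, 0.0, 0.0]), Rz, np.array([0.0, 1.0, 0.0]))
```

The rotation block of the composite is Ri R and the translation block is Ri t + ti, which is exactly the expression appearing inside (17).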

3.2. Robust Outlier Filtering

In real applications, putative line correspondences are extracted that are corrupted by outliers as well as noise. While noise is handled well by our least-squares formulation of the equations, outliers must be removed via RANSAC [7] or the M-estimator sample consensus (MSAC) algorithm [30], which relies on a minimal solver and a backprojection error. The minimal set consists of 3 line pairs, providing 2 equations for the rotation in (4) or (16), and 3 equations for the translation in (5) or (17). The equations are then solved as outlined before, and outliers are filtered using the backprojection error of the 3D lines. Herein we used the error measure proposed in [16], which practically calculates the mean of the shortest distances d_xs and d_xe from the 2D line segment endpoints x_s and x_e to the corresponding infinite line determined by the 3D line backprojected onto the normalized plane:

    (d_xs + d_xe) / (2 ||x_e - x_s||)    (18)

Note that the error is normalized with the length of the 2D line segment, hence making the measure independent of the length of the detected 2D line segment.
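The error measure (18) can be sketched as follows, with the backprojected infinite line given in the standard implicit form a x + b y + c = 0 (the helper is hypothetical, and computing the line coefficients from the 3D line and the pose is omitted here):

```python
import numpy as np

def backprojection_error(xs, xe, line):
    """Error measure (18): mean point-to-line distance of the segment
    endpoints xs, xe to the infinite backprojected line (a, b, c),
    normalized by the segment length."""
    a, b, c = line
    def dist(p):
        return abs(a * p[0] + b * p[1] + c) / np.hypot(a, b)
    seg_len = np.linalg.norm(np.asarray(xe) - np.asarray(xs))
    return (dist(xs) + dist(xe)) / (2.0 * seg_len)

# Example: segment of length 2 at distance 1 from the line y = 0
err = backprojection_error((0.0, 1.0), (2.0, 1.0), (0.0, 1.0, 0.0))
```

Here both endpoints are at distance 1 and the segment has length 2, so the measure evaluates to (1 + 1) / (2 * 2) = 0.5, illustrating the length normalization.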

3.3. Pose Refinement

We now formulate a least-squares refinement for the multi-view case, based on equations (4) and (15), by simply stacking (4) and (5) for each line pair in C0, and (15) and (17) for each camera i = 1, ..., M - 1 and each line pair in Ci, containing the absolute pose (R, t) and the relative poses (Ri, ti):

    for all j = 1, ..., N_C0:
        n_j^{C0 T} R V_j = 0
        n_j^{C0 T} (R X_j + t) = 0
    for all i = 1, ..., M - 1 and all j = 1, ..., N_Ci:
        n_j^{Ci T} Ri R V_j = 0
        n_j^{Ci T} (Ri (R X_j + t) + ti) = 0    (19)

A least-squares solution is then obtained by minimizing the squared error of the system, which can be solved via standard algorithms like Levenberg-Marquardt with the initialization obtained from the direct solver. Note that this step is optional, and only executed for the overdetermined n > 3 case if the line parameters are noisy. Experiments show that the initialization given by the direct solver is sufficiently stable, hence only a few iterations are needed to reach the optimum. The proposed algorithm, which we call Multi-view RPnL (MRPnL), is summarized in Algorithm 1.

Figure 2. Illustration of 10% 2D noise on 10 random lines placed on 1 plane. Red is the original, blue the noisy version.
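The stacked system (19) can be sketched as a residual function suitable for a Levenberg-Marquardt routine; this is a minimal sketch where the function name and the observation layout are assumptions.

```python
import numpy as np

def stacked_residuals(R, t, rel_poses, obs):
    """Residual vector of system (19). obs[i] holds (n_j, V_j, X_j) triplets
    seen by camera i; camera 0 is the reference (identity relative pose),
    and rel_poses lists (Ri, ti) for cameras 1..M-1. Feeding this to a
    Levenberg-Marquardt solver refines all poses simultaneously."""
    res = []
    for i, lines in enumerate(obs):
        Ri, ti = rel_poses[i - 1] if i > 0 else (np.eye(3), np.zeros(3))
        for n, V, X in lines:
            res.append(n @ (Ri @ (R @ V)))            # direction constraint
            res.append(n @ (Ri @ (R @ X + t) + ti))   # point constraint
    return np.array(res)

# Ground-truth-consistent toy data: all residuals vanish
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
line = (np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0]),
        np.array([1.0, 1.0, 0.0]))
res = stacked_residuals(np.eye(3), np.zeros(3), [(Rz, np.zeros(3))],
                        [[line], [line]])
```

At the true poses every stacked residual is zero, which is the condition the refinement drives towards in the least-squares sense.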

4. Experimental Results

For the quantitative evaluation of the proposed method, synthetic datasets were generated using the calibration parameters of a real perspective camera with available physical parameters (sensor size), enabling us to represent our 3D scene in an equivalent metric space. Multiple sets of 1000 samples were generated containing 3D-2D line pairs. The 3D scene was created with a typical urban or indoor environment in mind, where usually only a few planar surfaces are visible in a camera at once; thus we created 3 planes, placing them randomly (with a rotation of +-30 deg around all 3 axes, +-[1-2] m horizontal and vertical translation, and +-[0.5-1.5] m in depth) in the 3D space, each containing 20 random line segments with a minimum length of 0.5 m. The 2D data was then generated by capturing images of the scene, practically projecting the 3D lines with a virtual camera of 2378x1580 pixel resolution. In each scene we placed 5 cameras with a random rotation of +-50 deg around all 3 axes and a random translation of +-1 m in the horizontal and vertical directions, while in the optical axis' direction at [4-6] m from the scene.

The estimated pose was evaluated in terms of the angular distance epsilon, which represents the overall rotation error, and also in terms of the translation error, computed as the norm of the difference between the ground truth and the estimated translation. To evaluate the robustness of the methods against noisy line detections, random noise was added to the line parameters. Practically, we simulated noise by corrupting one endpoint of the line (similarly in 2D and 3D), adding to each coordinate of the point a random number up to the specified percentage of the actual coordinate value. The unit direction vector was also modified in the same manner. We show results for 3%, 10% and 15% 2D and 3D noise levels, which translate to an average shift on the image of 22 px with 3% noise, up to 110 px with 15% noise (an example of 10% noise can be seen in Fig. 2).
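The angular distance epsilon used above can be computed as the rotation angle of R_gt^T R_est; a brief sketch with a hypothetical helper name:

```python
import numpy as np

def angular_distance(R_gt, R_est):
    """Overall rotation error in degrees: the angle of R_gt^T R_est."""
    cos_e = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos_e, -1.0, 1.0)))

# Example: a 30-degree rotation about Z against the identity
a = np.radians(30.0)
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a),  np.cos(a), 0.0],
               [0.0,        0.0,       1.0]])
eps = angular_distance(np.eye(3), Rz)
```

The clip guards against round-off pushing the cosine marginally outside [-1, 1] for near-identical rotations.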

[Figure 3 plots the rotation angular distance (deg), translation error (m), and CPU time (s) over 1000 sorted test cases for MRPnL, MRPnL LM, AlgLS and SRPnL.]

Figure 3. Rotation and translation error, and CPU time for MRPnL with and without refinement, SRPnL, and the globally optimal AlgLS with 3% noise and n = 3 lines, except n = 4 for AlgLS. First column is with 2D noise, second column with 3D noise.

4.1. Comparison with State-of-the-Art

First, the performance of the proposed MRPnL direct solver and of MRPnL LM, which includes the refinement step of Section 3.3, is evaluated for the single-camera case. Wang et al. published a recent comparison [35] of 6 state-of-the-art methods with their SRPnL method, which proved to be dominant both in terms of CPU time and in terms of efficiency and robustness. Therefore, herein we only focus on comparing the proposed MRPnL algorithm with the most competitive methods from [35]: SRPnL and AlgLS [23].

The Matlab implementations of the competing methods are available from [35]. For a fair comparison, we used the automatic generator of Kukelova et al. [13], which provides a Matlab-based solver, but we remark that we also successfully used Kneip's generator [11], which produces a much faster solver in C++. All experiments were done on an i7 computer with 16 GB of RAM.

Comparisons were performed in two different setups: first using the minimum number of line matches that each algorithm requires, then using all 60 line pairs of the scene, with only a single camera, since the formulations of SRPnL and AlgLS do not cover multi-view setups.

For the first setup, using n = 4 lines for AlgLS and n = 3 for the other methods, the obtained results are shown in Fig. 3. Results on the plots are always sorted by error, from best to worst. In this setup, all the methods perform very similarly in terms of median errors of the pose parameters; only AlgLS produces lower errors, due to the higher number of line pairs it uses (n = 4). The methods are robust for up to 3% 2D noise, where the median epsilon already reaches above 2 deg; the only exception is AlgLS with 1.52 deg. In terms of runtime, MRPnL is the fastest with 2.4 ms, followed by SRPnL with 3.5 ms as the highest median runtime among multiple data sets. MRPnL LM takes more time (8.3 ms), but is still much faster than AlgLS (53 ms).

Using n = 60 line matches, AlgLS and MRPnL LM have the best results with the lowest median rotation and translation errors. With 15% noise, MRPnL LM handles the 3D noise better, while AlgLS favors noise in the 2D domain, both of them having a median epsilon below 1.5 deg with 2D noise and 1.05 deg with 3D noise (see Fig. 4 for results). The only other method that handles 15% noise well in both the 2D and 3D domains is MRPnL (median epsilon of 2.5 deg and 2.17 deg respectively, and translation errors of 0.21 m and 0.1 m), while SRPnL can only obtain similar errors with up to 5% noise. Since MRPnL robustly provides a good initialization, the LM refinement usually performs only 5 iterations, keeping the runtime comparable with SRPnL and MRPnL.

Based on the data presented in [35], we can confirm that the CPU time of the methods does not change significantly for n < 200 lines. Since in a realistic dataset of an urban environment we should not expect that many inlier pairs (e.g. the large-scale dataset presented in [22] also uses an average of 130 lines per image), we did not find an evaluation with hundreds of lines relevant.

4.2. Multi-view Case

The multi-view configuration presented in Section 3.1, with LM refinement (Section 3.3), was tested on data with 10% 2D noise using 5 cameras. Results are shown in Fig. 5, where, as a baseline, we also show the results achieved with a single camera. Clearly, the accuracy of the pose estimates is consistent over all cameras. As for the CPU time, one can see that it scales with the number of cameras, but still remains under 56 ms for 5 cameras, which is slightly faster than AlgLS for a single camera. It is thus clear that the proposed MRPnL LM algorithm performs well in a multi-view setup; median rotation errors remain below 1 deg and the translation error below 22 cm.

4.3. Robustness to Outliers

Since in a multi-view system, filtering the outliers has to be performed for each view independently of the others, we evaluate the robustness to outliers on a single camera only. The proposed MRPnL direct solver proved to be the fastest and more robust to noise than SRPnL; thus it is well suited for outlier detection in a RANSAC algorithm (see Section 3.2). In our experiments, we used the built-in M-estimator sample consensus (MSAC) algorithm function

[Figure 4 plots the rotation angular distance (deg), translation error (m), and CPU time (s) over 1000 sorted test cases for MRPnL, MRPnL LM, AlgLS and SRPnL.]

Figure 4. Rotation and translation error, and CPU time for MRPnL with and without refinement, SRPnL and the globally optimal AlgLS with 15% noise and all 60 lines used; first row with 2D noise, second row with 3D noise.

[Figure 5 plots the rotation angular distance (deg), translation error (m), and CPU time (s) over 1000 sorted test cases for each of the 5 cameras and the single-camera baseline.]

Figure 5. Overall rotation and translation error, and CPU time for MRPnL LM in a multi-view setup with 5 cameras, using 10% 2D noise.

of Matlab [30], together with the backprojection error presented in (18).

The synthetic dataset previously defined was extended by adding a specific number of outlier 2D-3D line pairs with randomly generated coordinates, to obtain outlier ratios of 30% and 60% (26 and 90 outliers, respectively). The threshold for RANSAC was experimentally determined as the average between the maximum of the inliers' and the minimum of the outliers' backprojection error, calculated with the reference pose. In our tests, RANSAC with MRPnL was able to robustly filter out all outliers in most test cases, since there was a clear separation between the inliers and the outliers; but we found that only a smaller inlier set can be obtained if the outlier lines are taken randomly from the same planes as the inliers, so that they are not different enough from the correct lines. Pose estimation errors of MRPnL on the obtained inliers with 10% noise and a 60% outlier ratio are very similar to the baseline results obtained using only the inliers; we can observe an increase only in the median translation errors, as shown in Fig. 6. The expense of such filtering is visible in the runtime plot in Fig. 6, where 30% outliers can be filtered relatively fast, but 60% outliers take almost two seconds.

4.4. Real Data

To evaluate the proposed algorithm on real data, we used a 2D-3D dataset captured in an outdoor urban environment, containing a dense 3D point cloud of the scene, captured with a Riegl VZ400 Lidar (with an attached Nikon DSLR providing reference RGB images), and 4K-resolution 2D video sequences captured by a flying UAV along different trajectories. We extracted intermittent frames from the video sequence (16 frames from a sequence of 1800) to better evaluate the robustness to changing scene and lighting conditions. The ground truth pose of each camera was estimated with UPnP [12] using reflective markers placed in the scene, automatically detected by the scanner in 3D, and manually selected and matched in 2D. For many multi-view localization and visual odometry applications, the most important criterion of a good result is the correct projection between the 2D and 3D domains. This can be evaluated easily by

[Figure 6 plots, over 1000 test cases: rotation angular distance (deg), translation error (m), and time (s); medians for r = 0%/30%/60%: rotation 1.72/1.77/1.82 deg, translation 0.131/0.155/0.173 m, time 0.00206/0.164/1.94 s.]

Figure 6. MRPnL LM pose estimation results on the inlier line pairs obtained by RANSAC with 10% noise and r = 30%, r = 60% outlier ratios, compared to the baseline results without RANSAC on the inlier set (r = 0%).

Figure 7. MRPnL LM trajectory estimation results on 16 frames of a longer drone sequence. Ground truth camera poses and the trajectory are shown in green, the estimated ones in red, while the used 3D lines (81 in total) are also visible (better viewed in color).

the forward projection error measured in the marker points (for the ground truth poses the maximum error was 12 cm and the median 4 cm). 2D lines on the frames were extracted using the OpenCV LSD detector [32]. In order to have a known inlier set, 2D-3D matching was done by manually matching 2D lines to lines detected on the Lidar reference RGB images, which directly provides the corresponding 3D lines. Other methods (e.g. [22]) rely on VisualSFM and use the estimated camera poses to project 2D lines into 3D, while with our approach, learnable line segment descriptors [31] could also be used for automatic 2D-2D matching.
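The forward projection error itself is not defined in this section; one plausible reading, consistent with its centimeter units, is the 3D distance between a marker point and the viewing ray backprojected through its 2D detection. A minimal sketch under that assumption (the function name and camera model x ~ K(RX + t) are ours):

```python
import numpy as np

def forward_projection_error(P_world, p_img, K, R, t):
    """3D distance from marker P_world to the ray backprojected
    through its 2D detection p_img, for a camera x ~ K (R X + t)."""
    C = -R.T @ t                                          # camera center, world frame
    d = R.T @ np.linalg.inv(K) @ np.array([p_img[0], p_img[1], 1.0])
    d /= np.linalg.norm(d)                                # ray direction, world frame
    v = P_world - C
    return np.linalg.norm(v - (v @ d) * d)                # point-to-line distance
```

For a marker projected exactly by the ground truth pose this distance is zero; pose errors move the backprojected ray away from the 3D marker, giving the cm-scale residuals reported above.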

Despite the fact that only a relatively small number of lines were used (an average of 13 and a maximum of 21 lines per image, compared to e.g. [22], where 130 lines and 50 points per image were used), the proposed MRPnL solver can estimate the absolute and relative poses quite robustly: 15 out of the 16 frames have a maximum forward projection error of less than 30 cm, the median forward projection error being 8 cm. Applying the LM refinement to the whole system increased the algorithm runtime from 80 ms to 480 ms, but all error measures improve; the median forward projection error is reduced to 7.4 cm. Results of the camera path estimation can be seen in Fig. 7, where green marks the ground truth trajectory and camera positions and red the estimated ones. All 81 used 3D lines are also shown in random colors.

Up to 30% outlier ratio is well tolerated in the real case too, showing results similar to those above, even though the outliers are quite similar to the inliers, since they are randomly selected from the same visible scene planes. Results show similar performance to the synthetic experiments; errors increase only when the number of inliers gets drastically reduced by the outlier filtering due to the lack of a clear separation between inliers and outliers. We have shown that the proposed MRPnL method is able to handle the challenging path estimation of 4-DoF quadrotor UAVs, which can move unsystematically, making sudden turns and floating in any direction.

5. Conclusion

A novel algebraic approach has been proposed for computing the absolute and relative poses of a multi-view perspective camera system using line correspondences, which works without reformulation both for minimal problems (3 line pairs per camera) and for the general n > 3 and multiple-camera cases. Unlike previous approaches [37, 36, 35], rotation is solved first through a two-variate 7th-order system of polynomial equations using a Gröbner basis direct LSE solver, which reduces numerical error propagation and works for both minimal and general line sets. The translation is then solved via a linear system. Experimental tests on large synthetic as well as real datasets confirm the state-of-the-art performance of the proposed algorithm. Comparative results show that our method outperforms recent alternative methods (AlgLS [23], ASPnL [36], SRPnL [35]) in terms of speed, accuracy, and robustness. Furthermore, unlike these methods, our algorithm works for multi-view scenarios and is robust up to 60% outlier ratio when combined with a RANSAC-like method.

References

[1] H. Abdellali and Z. Kato. Absolute and relative pose estimation of a multi-view camera system using 2D-3D line pairs and vertical direction. In Proceedings of International Conference on Digital Image Computing: Techniques and Applications, pages 1–8, Canberra, Australia, Dec. 2018. IEEE.

[2] C. Albl, Z. Kukelova, and T. Pajdla. Rolling shutter absolute pose problem with known vertical direction. In Proceedings of Conference on Computer Vision and Pattern Recognition, pages 3355–3363, Las Vegas, NV, USA, June 2016.

[3] A. Bartoli and P. Sturm. The 3D line motion matrix and alignment of line reconstructions. International Journal of Computer Vision, 57(3):159–178, 2004.

[4] F. Camposeco, T. Sattler, and M. Pollefeys. Minimal solvers for generalized pose and scale estimation from two rays and one point. In B. Leibe, J. Matas, N. Sebe, and M. Welling, editors, Proceedings of European Conference on Computer Vision, volume 9909 of Lecture Notes in Computer Science, pages 202–218, Amsterdam, The Netherlands, Oct. 2016. Springer.

[5] H. Chen. Pose determination from line-to-plane correspondences: Existence condition and closed-form solutions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(6):530–541, 1991.

[6] A. Elqursh and A. Elgammal. Line-based relative pose estimation. In Proceedings of Conference on Computer Vision and Pattern Recognition, pages 3049–3056, Colorado Springs, CO, USA, June 2011. IEEE.

[7] M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 24(6):381–395, 1981.

[8] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge, UK, 2004.

[9] N. Horanyi and Z. Kato. Multiview absolute pose using 3D-2D perspective line correspondences and vertical direction. In Proceedings of ICCV Workshop on Multiview Relationships in 3D Data, pages 1–9, Venice, Italy, Oct. 2017. IEEE.

[10] T. Ke and S. I. Roumeliotis. An efficient algebraic solution to the perspective-three-point problem. In Proceedings of Conference on Computer Vision and Pattern Recognition, pages 1–9, Honolulu, HI, USA, July 2017. IEEE.

[11] L. Kneip. Polyjam, 2015 [online]. URL: https://github.com/laurentkneip/polyjam.

[12] L. Kneip, H. Li, and Y. Seo. UPnP: An optimal O(n) solution to the absolute pose problem with universal applicability. In D. J. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, editors, Proceedings of European Conference on Computer Vision, Part I, volume 8689 of Lecture Notes in Computer Science, pages 127–142, Zurich, Switzerland, Sept. 2014. Springer.

[13] Z. Kukelova, M. Bujnak, and T. Pajdla. Automatic generator of minimal problem solvers. In Proceedings of European Conference on Computer Vision, pages 302–315. Springer Berlin Heidelberg, 2008.

[14] Z. Kukelova, M. Bujnak, and T. Pajdla. Closed-form solutions to minimal absolute pose problems with known vertical direction. In R. Kimmel, R. Klette, and A. Sugimoto, editors, Proceedings of Asian Conference on Computer Vision, Part II, volume 6493 of LNCS, pages 216–229, Queenstown, New Zealand, Nov. 2010. Springer.

[15] V. Larsson, M. Oskarsson, K. Åström, A. Wallis, Z. Kukelova, and T. Pajdla. Beyond Gröbner bases: Basis selection for minimal solvers. In Proceedings of Conference on Computer Vision and Pattern Recognition, pages 3945–3954, Salt Lake City, UT, USA, June 2018. IEEE Computer Society.

[16] G. H. Lee. A minimal solution for non-perspective pose estimation from line correspondences. In Proceedings of European Conference on Computer Vision, pages 170–185, Amsterdam, The Netherlands, Oct. 2016. Springer.

[17] G. H. Lee, F. Fraundorfer, and M. Pollefeys. Motion estimation for self-driving cars with a generalized camera. In Proceedings of Conference on Computer Vision and Pattern Recognition, pages 2746–2753, Portland, OR, USA, June 2013.

[18] G. H. Lee, M. Pollefeys, and F. Fraundorfer. Relative pose estimation for a multi-camera system with known vertical direction. In Proceedings of Conference on Computer Vision and Pattern Recognition, pages 540–547, Columbus, OH, USA, June 2014.

[19] V. Lepetit, F. Moreno-Noguer, and P. Fua. EPnP: An accurate O(n) solution to the PnP problem. International Journal of Computer Vision, 81(2), 2009.

[20] S. Li, C. Xu, and M. Xie. A robust O(n) solution to the perspective-n-point problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(7):1444–1450, 2012.

[21] J. Li-Chee-Ming and C. Armenakis. UAV navigation system using line-based sensor pose estimation. Geo-spatial Information Science, 21(1):2–11, Jan. 2018.

[22] P. Miraldo, T. Dias, and S. Ramalingam. A minimal closed-form solution for multi-perspective pose estimation using points and lines. In The European Conference on Computer Vision, pages 1–17, Munich, Germany, Sept. 2018. Springer.

[23] F. M. Mirzaei and S. I. Roumeliotis. Globally optimal pose estimation from line correspondences. In International Conference on Robotics and Automation, pages 5581–5588, Shanghai, China, May 2011. IEEE.

[24] M. Persson and K. Nordberg. Lambda twist: An accurate fast robust perspective three point (P3P) solver. In The European Conference on Computer Vision, pages 1–15, Munich, Germany, Sept. 2018. Springer.

[25] H. Pottmann and J. Wallner. Computational Line Geometry. Mathematics and Visualization. Springer, 2009.

[26] A. Pumarola, A. Vakhitov, A. Agudo, F. Moreno-Noguer, and A. Sanfeliu. Relative localization for aerial manipulation with PL-SLAM. In Springer Tracts in Advanced Robotics, pages 239–248. Springer International Publishing, 2019.

[27] L. Tamas, R. Frohlich, and Z. Kato. Relative pose estimation and fusion of omnidirectional and lidar cameras. In L. de Agapito, M. M. Bronstein, and C. Rother, editors, Proceedings of the ECCV Workshop on Computer Vision for Road Scene Understanding and Autonomous Driving, volume 8926 of Lecture Notes in Computer Science, pages 640–651, Zurich, Switzerland, Sept. 2014. Springer.

[28] L. Tamas and Z. Kato. Targetless calibration of a lidar - perspective camera pair. In Proceedings of ICCV Workshop on Big Data in 3D Computer Vision, pages 668–675, Sydney, Australia, Dec. 2013. IEEE.

[29] C. J. Taylor and D. J. Kriegman. Structure and motion from line segments in multiple images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(11):1021–1032, Nov. 1995.

[30] P. Torr and A. Zisserman. MLESAC: A new robust estimator with application to estimating image geometry. Computer Vision and Image Understanding, 78(1):138–156, Apr. 2000.

[31] A. Vakhitov, V. Lempitsky, and Y. Zheng. Stereo relative pose from line and point feature triplets. In Computer Vision – ECCV 2018, pages 662–677. Springer International Publishing, 2018.

[32] R. von Gioi, J. Jakubowicz, J.-M. Morel, and G. Randall. LSD: A fast line segment detector with a false detection control. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(4):722–732, Apr. 2010.

[33] N. von Schmude, P. Lothe, and B. Jähne. Relative pose estimation from straight lines using parallel line clustering and its application to monocular visual odometry. In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. SCITEPRESS – Science and Technology Publications, 2016.

[34] P. Wang, G. Xu, and Y. Cheng. A novel algebraic solution to the perspective-three-line problem. Computer Vision and Image Understanding, 2018. In press.

[35] P. Wang, G. Xu, Y. Cheng, and Q. Yu. Camera pose estimation from lines: A fast, robust and general method. Machine Vision and Applications, Feb. 2019.

[36] C. Xu, L. Zhang, L. Cheng, and R. Koch. Pose estimation from line correspondences: A complete analysis and a series of solutions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6):1209–1222, 2017.

[37] L. Zhang, C. Xu, K. Lee, and R. Koch. Robust and efficient pose estimation from line correspondences. In K. M. Lee, Y. Matsushita, J. M. Rehg, and Z. Hu, editors, Proceedings of Asian Conference on Computer Vision, volume 7726 of Lecture Notes in Computer Science, pages 217–230, Daejeon, Korea, Nov. 2012. Springer.