# The L1-norm best-fit hyperplane problem
J.P. Brooks¹ and J.H. Dulá²

¹Corresponding author, Department of Statistical Sciences and Operations Research, Virginia Commonwealth University, Richmond, VA 23284, jpbrooks@vcu.edu

²Department of Management, Virginia Commonwealth University, Richmond, VA 23284

August 12, 2009

## 1 Abstract

We present a simple and efficient algorithm for solving the L1-norm best-fit hyperplane problem, derived using first principles and intuitive geometric insights about L1 projections. The problem is easy to solve because the procedure relies on the solution of a small number of linear programs. We provide a simple proof that global optimality is achieved. The procedure is implemented for validation and testing. The result can be the basis for an L1 principal component analysis method.

## 2 Introduction

Given points x_i ∈ R^m, i = 1, ..., n, consider the L_p-norm best-fit hyperplane problem for the case when the hyperplane is an (m − 1)-dimensional subspace:

$$\min_{\alpha_i,\, i=1,\dots,n;\; V} \;\sum_{i=1}^{n} \| x_i - V\alpha_i \|_p, \qquad (1)$$

where ‖·‖_p is the L_p-norm of the argument, V ∈ R^{m×(m−1)}, α_i ∈ R^{m−1}, and p ≥ 1. A solution to this nonconvex mathematical program, (V*, A* = [α*_1, α*_2, ..., α*_n]), defines a subspace in R^m, {x ∈ R^m | x = V*α for some α ∈ R^{m−1}}, that minimizes the sum of the p-norm distances of the points to the subspace. Our results extend directly to the general hyperplane case. This representation of an affine set in terms of linear combinations of vectors in V has several specialized applications, such as providing information about the directions of dispersion in a point set with regard to the L_p-norm.

The case when p = 2 is a well-studied problem dating back to Pearson [21]. The optimal solution V* consists of the m − 1 eigenvectors of X^T X, where X^T = [x_1, x_2, ..., x_n], corresponding to the m − 1 largest eigenvalues [13]. The solution minimizes the sum of Euclidean distances of points to their orthogonal projections in the fitted hyperplane. The problem when p = 2 is a basis for traditional principal component analysis (PCA). The columns of V* define the first m − 1 principal components; the last principal component is the normal vector to the optimal hyperplane [13].

This paper deals with the case when p = 1. The solution to this problem minimizes the sum of L1 distances of points to their L1 projections in the fitted hyperplane. The problem under consideration is not the orthogonal linear L1 approximation problem (see, e.g., [23]). As we will see, the optimal solution is based on the residuals in only a single unit direction rather than the distances between points and their orthogonal projections in the fitted subspace. Our result consolidates and synthesizes more general results about projections [17] and the problem of fitting hyperplanes to data [18].
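For contrast with the p = 1 method developed below, the p = 2 solution described above is a standard eigenvector computation. The following is a minimal sketch (ours, not from the paper) using NumPy's SVD; the helper name `l2_best_fit_subspace` is hypothetical. No centering is applied because the fitted hyperplane here is a subspace through the origin.

```python
import numpy as np

def l2_best_fit_subspace(X):
    """L2 best-fit (m-1)-dimensional subspace through the origin.

    The right singular vectors of X are the eigenvectors of X^T X;
    the top m-1 of them span the subspace, and the last one is the
    normal vector to the optimal hyperplane.
    """
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:-1].T          # m x (m-1): columns span the subspace
    normal = Vt[-1]        # direction of least L2 dispersion
    return V, normal
```

For points lying exactly on a plane through the origin, the returned normal is (up to sign) the plane's true normal and the points have zero residual along it.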

The L1-norm best-fit hyperplane problem has several applications. Ke and Kanade [14, 15] apply a more general form of (1) with p = 1 where the subspace defined by V can have fewer than m − 1 dimensions. They use the formulation in the context of subspace estimation for image analysis using the affine camera model [15]. Agarwal et al. [1] solve the more general perspective camera model. Kwak [16] treats a problem closely related to the L1-norm best-fit hyperplane problem by finding successive directions of maximum variation, maximizing the L1 lengths of projected points along a fitted vector. The approach is the basis of an L1 PCA method that he applies to face recognition data. In each of these three works, there is no guarantee of global optimality in polynomial time for the respective L1 best-fit optimization problem formulations. Ke and Kanade [14] provide an exact solution when m = 2 and a heuristic approach for generating locally optimal solutions when m > 2. Agarwal et al. [1] reformulate the problem as a fractional program and give a branch-and-bound algorithm for finding globally optimal solutions; the algorithm has an exponential worst-case running time. Kwak [16] provides an algorithm with only local optimality of solutions guaranteed.

Various schemes for PCA have been proposed that involve the L1-norm to impart robustness. Previous approaches include using the L1-norm for robust covariance matrix estimation [5, 8], specifying a fixed-effects model based on a multivariate Laplace distribution and applying heuristics for parameter estimation [3, 9], and employing heuristics for finding successive directions of maximum variation based on the L1-norm [7, 16].

Access to the L1-norm best-fit hyperplane obtained by solving (1) when p = 1 is the first step towards a future pure L1-based PCA procedure. The procedure will project points down one dimension at each iteration until the projected points lie in a one-dimensional subspace. Each iteration will provide the direction of least dispersion in the respective subspace. This procedure will benefit from well-known L1 properties such as robustness to outliers.

## 3 Notation, Assumptions, Definitions

The span of the points is assumed to have dimension at least m − 1, so that there exists a matrix V* of full column rank that is optimal for (1). Boldface uppercase letters represent matrices, and boldface lowercase letters represent vectors. A unit direction is a direction along one of the 2m unit vectors ±u_j, j = 1, ..., m. The external representation [22] of a subspace S ⊆ R^m is the set {x ∈ R^m | Ax = 0} for an appropriately-defined matrix A ∈ R^{q×m}. The internal representation [22] of the same subspace is {x ∈ R^m | x = V^T α for some α} for a matrix V ∈ R^{q×m} where the rows of V span the subspace. The projection of a point x onto a set S is the set of points P such that the distance between x and points in P is minimum among all points in S; we will call elements of P projections.

## 4 L1 Projection

Suppose we are given a point x̂ ∈ R^m and a matrix V ∈ R^{m×(m−1)} of full column rank. The projection of x̂ onto S = {x ∈ R^m | x = Vα for some α} can be found by solving the following optimization problem:

$$\min_{\alpha} \| \hat{x} - V\alpha \|_1 = \min_{\alpha} \sum_{j=1}^{m} \left| \hat{x}_j - (V\alpha)_j \right| \qquad (2)$$

The mathematical program in (2) can be reformulated as a linear program (LP) that leads to important geometric insights. For non-negative variables λ⁺ = [λ⁺_1, ..., λ⁺_m]^T and λ⁻ = [λ⁻_1, ..., λ⁻_m]^T, let λ⁺ − λ⁻ = x̂ − Vα. Then

$$\min_{\alpha,\lambda^+,\lambda^-} \sum_{j=1}^{m} |\lambda^+_j - \lambda^-_j| \;=\; \min_{\alpha,\lambda^+,\lambda^-} \left\{ \sum_{j=1}^{m} \lambda^+_j + \lambda^-_j \;:\; V\alpha + \lambda^+ - \lambda^- = \hat{x},\;\; \lambda^+, \lambda^- \geq 0 \right\} \;\equiv\; \mathrm{LP}(V, \hat{x}) \qquad (3)$$

An optimal solution to LP(V, x̂) provides the magnitudes λ⁺_j and λ⁻_j for the unit directions for an L1 projection of x̂ onto S. Optimal values for α are scaling factors that locate the projection in terms of a linear combination of the columns of V. The following result states that there exists a projection of x̂ ∈ R^m onto an (m − 1)-dimensional subspace that is located along a single unit direction from x̂.

**Result 1.** Given a subspace S = {x ∈ R^m | x = Vα for some α} of dimension m − 1 and a point x̂ ∈ R^m, x̂ ∉ S, there exists a solution to (3) with exactly one component from (λ⁺, λ⁻) positive.

Proof. Because the variables in α are unbounded, they will never leave the basis in a simplex pivot (see [19], p. 170). Therefore, there exists an optimal basic feasible solution with all of the variables in α and one component of (λ⁺, λ⁻) basic.

Result 1 can also be proved using Corollary 2.2 in [17], after applying a correction (∑_{i=1}^{n} ν_i = 1 is replaced with ∑_{i:|w_i| = ‖w‖_∞} ν_i = 1; see [17]).

Figure 1 illustrates Result 1 in two dimensions. In Figure 1(a), the unique projection of x̂ onto S is along the negative vertical unit direction. The subspace S is defined by the solitary vector in V; therefore, the value for α* in LP(V, x̂) is positive and less than 1. Figure 1(b) illustrates the situation when the projection is along the horizontal direction. Because of the orientation of the vector in V, the value for α* will be negative. Figure 1(c) illustrates the special case when S is a line that makes a 45-degree angle with the coordinate axes. In this case, the projection is a segment of S. There exist optimal solutions to LP(V, x̂) corresponding to projections along both the horizontal and vertical unit directions. As we will see, the projection direction depends on the orientation of S and not on the location of x̂.

Next consider the projection of a set of points x_i ∈ R^m, i = 1, ..., n onto an (m − 1)-dimensional subspace. The following result establishes that each point projects onto the subspace along the same unit direction.

**Result 2.** Given a set of points x_i ∈ R^m, i = 1, ..., n and a subspace S = {x ∈ R^m | x = Vα for some α} of dimension m − 1, there exists an optimal solution to LP(V, x_i) with either λ⁺_{j⋆} ≥ 0 or λ⁻_{j⋆} ≥ 0 for some j⋆, and λ⁺_j = λ⁻_j = 0 for j ≠ j⋆.
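LP(V, x̂) in (3) can be checked numerically. Below is a minimal sketch (ours, not from the paper, and using `scipy.optimize.linprog` rather than a commercial solver); the helper name `l1_project` is hypothetical. In the 2D example, the L1 projection of (0, 1) onto the line spanned by (1, 0.5) lands at the origin, one L1 unit away along the single vertical direction, as Result 1 predicts.

```python
import numpy as np
from scipy.optimize import linprog

def l1_project(V, xhat):
    """Solve LP(V, xhat): min sum(lp + lm)
    s.t. V @ alpha + lp - lm = xhat, lp, lm >= 0, alpha free."""
    m, k = V.shape
    c = np.concatenate([np.zeros(k), np.ones(2 * m)])  # cost only on lp, lm
    A_eq = np.hstack([V, np.eye(m), -np.eye(m)])
    bounds = [(None, None)] * k + [(0, None)] * (2 * m)
    res = linprog(c, A_eq=A_eq, b_eq=xhat, bounds=bounds, method="highs")
    alpha = res.x[:k]
    return V @ alpha, res.fun  # L1 projection point and its L1 distance

# Example: S is the line spanned by (1, 0.5); project the point (0, 1).
V = np.array([[1.0], [0.5]])
point, dist = l1_project(V, np.array([0.0, 1.0]))
```

The optimal basic solution has a single positive component among (λ⁺, λ⁻), carried by the vertical unit direction.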

3

Page 4

[Figure 1: three panels, (a), (b), and (c), each showing a point x̂, a subspace S spanned by a vector V at angle θ, and the L1 projection P.]

Figure 1: The L1 projection, P, of a point x̂ onto a subspace S depends on the orientation of the subspace. In 2D, when the angle θ is different from 45°, the projection is unique but directly along either (a) the y-axis or (b) the x-axis. When (c) θ = 45°, the projection P is a segment and it includes the points along both unit directions.

Proof. When S has an external representation, the result follows from Theorem 2.1 in [17] and Result 1.

Results 1 and 2 apply to general hyperplanes in R^m. These properties of L1 projection are the basis for a new procedure for finding the (m − 1)-dimensional subspace of best fit. We can find an L1-norm best-fit hyperplane by considering the residuals along each of the m unit directions.
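Result 2 can be illustrated numerically: points scattered anywhere in the plane all project onto a fixed subspace along the same unit direction. The sketch below is ours (not from the paper); `projection_direction` is a hypothetical helper that solves LP(V, x) and reports which coordinate carries the residual.

```python
import numpy as np
from scipy.optimize import linprog

def projection_direction(V, x):
    """Solve LP(V, x) and return the index of the unit direction
    carrying the L1 projection (the nonzero lambda component)."""
    m, k = V.shape
    c = np.concatenate([np.zeros(k), np.ones(2 * m)])
    A_eq = np.hstack([V, np.eye(m), -np.eye(m)])
    res = linprog(c, A_eq=A_eq, b_eq=x,
                  bounds=[(None, None)] * k + [(0, None)] * (2 * m),
                  method="highs")
    lam = res.x[k:k + m] + res.x[k + m:]   # |residual| per coordinate
    return int(np.argmax(lam))

# Several points, one line S spanned by (1, 0.5): every point projects
# along the same (vertical) unit direction, independent of its location.
V = np.array([[1.0], [0.5]])
dirs = {projection_direction(V, np.array(p))
        for p in [(0.0, 1.0), (3.0, -2.0), (-1.0, 4.0)]}
```

All three points report direction index 1 (the vertical axis), depending only on the orientation of S.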

L1 regression is a well-understood procedure for analyzing the dependence of one variable on other variables in a point set [20]. The L1 regression problem is to find a hyperplane that minimizes the sum of L1-norm distances from the points in a point set along the unit direction corresponding to the "independent" variable. The designation of the independent variable in a general point set is effectively arbitrary. Since an L1-norm best-fit hyperplane for a point set will have the property that all projections occur along the same unit direction, the problem reduces to finding the best L1 regression in each of the m unit directions. Charnes et al. [6] and Wagner [24] show that L1 linear regression can be solved by finding an optimal solution to a linear program. This realization about the relationship between L1 projection and L1 regression is the basis for a new procedure for solving the L1-norm best-fit hyperplane problem. Theorem 1 formalizes this result.

**Theorem 1.** Given a set of points x_i ∈ R^m, i = 1, ..., n, an optimal solution to the L1-norm best-fit hyperplane problem is the hyperplane {x ∈ R^m | β*_0 + β*^T x = 0}, where (β*_0, β*) is an optimal solution to the linear program attaining R_{j*} and

$$j^* = \operatorname*{argmin}_{j=1,\dots,m} R_j(x_1,\dots,x_n), \qquad R_j(x_1,\dots,x_n) = \min_{\beta_0,\beta,e^+,e^-} \sum_{i=1}^{n} e^+_i + e^-_i \qquad (4)$$

subject to

$$\beta_0 + \beta^T x_i + e^+_i - e^-_i = 0, \quad i = 1,\dots,n$$
$$\beta_j = -1$$
$$e^+_i,\, e^-_i \geq 0, \quad i = 1,\dots,n$$


Proof. Suppose that for a point set a different hyperplane attains a better L1 fit. By Result 2, we know that all points will project onto this hyperplane along a single unit direction corresponding to j⋆. The contradiction if j⋆ = j* is immediate by the optimality of (β*_0, β*). Similarly, j⋆ ≠ j* leads to a contradiction because R_{j*} would not have been minimal.

Theorem 1 is an instance of a more general result about hyperplane fitting using general norms in Minkowski spaces [18]. The idea for the L1 case is suggested by Zemel [25], but no formal proof is provided. Neither of these works implements and tests a procedure based on this result.

The proof of Theorem 1 implies that there exists a projection into S that has all of the properties of an optimal L1 regression hyperplane. Some of these properties are summarized in the following corollary.

**Corollary 1.** Given a set of points x_i ∈ R^m, i = 1, ..., n, there exists a projection into an (m − 1)-dimensional subspace S such that

1. the sum of L1 distances of points to S is minimized among all (m − 1)-dimensional subspaces,
2. at least m − 1 of the points lie in S [2],
3. the difference between the number of points on each side of S is at most m [2], and
4. the projection of points into S is maximum likelihood for errors following a joint distribution of m independent, identically distributed Laplace random variables [1].

Problem (1) is stated in terms of an internal representation of an affine set. Theorem 1 provides an externally-defined best-fit hyperplane. In order to satisfy the original requirements of the problem, we must calculate m − 1 linearly independent vectors that span the optimal hyperplane. We can find an optimal matrix V by applying an orthogonalization procedure to (β*_0, β*) and m − 1 additional linearly independent vectors in R^m.
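One way to carry out this orthogonalization step, for the subspace case β*_0 = 0, is a QR factorization of the normal vector stacked with the standard basis. The sketch below is ours, not from the paper; `hyperplane_basis` is a hypothetical helper returning an m × (m − 1) matrix V whose columns span {x : β*ᵀx = 0}.

```python
import numpy as np

def hyperplane_basis(beta):
    """Given a nonzero normal vector beta in R^m, return an m x (m-1)
    matrix V with orthonormal columns spanning {x : beta^T x = 0}."""
    m = beta.shape[0]
    # Householder QR of [beta | I]: the first column of Q is beta
    # normalized, so the remaining m-1 columns are orthogonal to beta.
    Q, _ = np.linalg.qr(np.column_stack([beta, np.eye(m)]))
    return Q[:, 1:m]
```

QR keeps the result numerically orthonormal even when some standard basis vectors are nearly parallel to beta.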

## 5 Optimal L1 Projection Procedure

Theorem 1 motivates Algorithm 1 for calculating the L1 subspace of best fit.

The input to Algorithm 1 is n observations of m dimensions. The main loop in Steps 2-7 solves m LPs, each of which has 2n + m + 1 variables. Step 8 involves n(m − 1) multiplications. In Step 9, the matrix V can be found by performing a singular value decomposition on the matrix whose rows are comprised of the z_i, i = 1, ..., n, which has complexity O(m²n + m³) [10]. Therefore, since LPs can be solved in polynomial time, the overall complexity of Algorithm 1 is polynomial.

## 6 Numerical Validation

Algorithm 1 is validated by comparing the results obtained for four instances to the results obtained with an industry-standard nonlinear programming solver. The formulation in (1) for p = 1 is recast as a mathematical program with a linear objective and 2mn nonlinear constraints and is submitted to KNITRO [4] via an algebraic modeling language (AMPL). The four point sets have dimensions m = 3, 5, 10, 20 with n = 10, 25, 50, and 25000 observations, respectively. The point sets are available online at http://www.people.vcu.edu/~jpbrooks/ProjEl/index.html.


**Algorithm 1** Calculating the L1-norm best-fit hyperplane

Given points x_i ∈ R^m, i = 1, ..., n.

1: Let R* = ∞.
2: for j in 1, ..., m do
3: Solve the LP in (4) to find the L1 regression hyperplane with the jth column representing the dependent variable and the remaining columns representing the independent variables. The optimal hyperplane has coefficients (β_0, β) and error R_j.
4: if R_j < R* then
5: R* = R_j, j* = j, β*_0 = β_0, β* = β
6: end if
7: end for
8: For each x_i, the optimal projection onto S is given by z_i, i = 1, ..., n, where z_ij = x_ij for j ≠ j* and z_{ij*} = β*_0 + β*_{(j*)}^T x_{i(j*)}, where for a vector y, y_{(j)} is the vector created by removing the jth element.
9: S is defined by {x | Vα = x for some α}, where the columns of V are vectors that span the z_i's.
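A minimal end-to-end sketch of Algorithm 1 follows. It is our illustration, not the authors' implementation (they used the CPLEX Callable Library); it solves the m regression LPs from (4) with `scipy.optimize.linprog` and projects the points along the winning unit direction. The helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def l1_regression(X, j):
    """Solve LP (4) with column j as the dependent variable.
    Returns (beta0, beta_rest, total L1 error R_j)."""
    n, m = X.shape
    others = [k for k in range(m) if k != j]
    Xo = X[:, others]
    # Variables: [beta0, beta_rest (m-1 free), e+ (n), e- (n)]
    c = np.concatenate([np.zeros(m), np.ones(2 * n)])
    # beta0 + Xo @ beta_rest + e+_i - e-_i = x_ij  (beta_j = -1 eliminated)
    A_eq = np.hstack([np.ones((n, 1)), Xo, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * m + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=X[:, j], bounds=bounds, method="highs")
    return res.x[0], res.x[1:m], res.fun

def l1_best_fit_hyperplane(X):
    """Algorithm 1: keep the coordinate whose L1 regression has the
    smallest error R_j, then project each point along that direction."""
    n, m = X.shape
    best = None
    for j in range(m):                     # Steps 2-7: solve m LPs
        beta0, beta_rest, err = l1_regression(X, j)
        if best is None or err < best[3]:
            best = (j, beta0, beta_rest, err)
    j_star, beta0, beta_rest, err = best
    Z = X.copy()                           # Step 8: project along u_{j*}
    others = [k for k in range(m) if k != j_star]
    Z[:, j_star] = beta0 + X[:, others] @ beta_rest
    return j_star, Z, err
```

For points lying exactly on a hyperplane through the origin, the total L1 error is zero and the projected points coincide with the inputs.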

Algorithm 1 is implemented using the ILOG CPLEX 11.1 Callable Library [11] for the solution of LPs. The instances are solved on a machine with 3.2 GHz Intel Pentium D processors and 4 GB RAM. The first two instances are solved using the student version of KNITRO on the same architecture. The remaining instances for KNITRO must be solved using the NEOS Server [12] because of the limitations of the student version.

Table 1 summarizes the results of applying Algorithm 1 and KNITRO to the point sets. The objective of this exercise is to verify that the procedure in Algorithm 1 produces a solution with the same objective function value as an optimal solution to the original nonlinear best-fit problem formulated directly from expression (1). The procedures obtain solutions with identical objective function values for the first three point sets. KNITRO was unable to solve the problem for the fourth point set due to insufficient memory available at the host site.

Table 1: Performance of Algorithm 1 and KNITRO on Synthetic Point Sets

| m | n | Algorithm 1 Optimal Objective | Algorithm 1 Solution Time (s) | KNITRO Best Objective | KNITRO Solution Time (s) |
|---|---|---|---|---|---|
| 3 | 10 | 79.0 | <1 | 79.0 | <1 |
| 5 | 25 | 27.1 | <1 | 27.1 | 3.0 |
| 10 | 50 | 51.4 | <1 | 51.4 | 57.0 |
| 20 | 25000 | 1987.3 | 2939.5 | * | * |

*No solution due to insufficient memory.

Figure 2 depicts different plots of planes for the first point set of Table 1. Panels 2(a) and 2(b) display the regression planes when the independent variables are x and y, respectively. The sums of the L1 deviations for the planes in 2(a) and 2(b) are 517.8 and 332.1, respectively. Panel 2(c) is the regression plane when the third variable is the independent variable. The sum of the L1 deviations along this axis is 79.0 for this plane; therefore, by Theorem 1, it is an L1 best-fit plane for the points. Panel 2(d) shows the plane obtained when a nonlinear formulation based on (1) is solved using KNITRO. The planes in 2(c) and 2(d) are essentially the same; the differences can be attributed to numerical error. For Panels 2(a), 2(b), and 2(c), Properties 2 and 3 of Corollary 1 are verified.

## 7 Conclusion

In spite of all that is known about general projection theory and the identification of best-fit hyperplanes, the L1-norm best-fit hyperplane problem is still being treated as a difficult nonlinear optimization problem in application areas such as computer vision and statistics. With insights about L1 projection and L1 regression, the problem is surprisingly simple to solve. Two key insights into the geometry of L1 projections onto a hyperplane suggest an algorithm immediately: (1) L1 projection occurs along a single unit direction, and (2) the direction of projection is independent of the location of the point. The algorithm calculates the L1 regression hyperplanes for each of the m dimensions in which the points reside and selects the one that minimizes the sum of the L1 distances. The algorithm is implemented and numerically validated. With this new algorithm, large-scale instances arising from multivariate statistics and computer vision are now easily solvable, and the elements are in place for a pure L1-based PCA procedure.


[Figure 2: four panels. (a) L1 regression plane w.r.t. first coordinate; (b) L1 regression plane w.r.t. second coordinate; (c) L1 regression plane w.r.t. third coordinate, the optimal L1 fit; (d) optimal KNITRO plane.]

Figure 2: Point set, fitted planes, and projection directions for the m = 3, n = 10 point set in Table 1.


## References

[1] S. Agarwal, M.K. Chandraker, F. Kahl, D. Kriegman, and S. Belongie. Practical global optimization for multiview geometry. Lecture Notes in Computer Science, 3951:592–605, 2006.

[2] G. Appa and C. Smith. On L1 and Chebyshev estimation. Mathematical Programming, 5:73–87, 1973.

[3] A. Baccini, P. Besse, and A. de Faguerolles. A L1-norm PCA and heuristic approach. In Proceedings of the International Conference on Ordinal and Symbolic Data Analysis, pages 359–368, 1996.

[4] R.H. Byrd, J. Nocedal, and R.A. Waltz. KNITRO: An integrated package for nonlinear optimization. In G. di Pillo and M. Roma, editors, Large-Scale Nonlinear Optimization, pages 35–59. Springer Verlag, 2006.

[5] N.A. Campbell. Robust procedures in multivariate analysis I: Robust covariance estimation. Applied Statistics, 29:231–237, 1980.

[6] A. Charnes, W.W. Cooper, and R.O. Ferguson. Optimal estimation of executive compensation by linear programming. Management Science, 1:138–150, 1955.

[7] V. Choulakian. L1-norm projection pursuit principal component analysis. Computational Statistics and Data Analysis, 50:1441–1451, 2006.

[8] J.S. Galpin and D.M. Hawkins. Methods of L1 estimation of a covariance matrix. Computational Statistics and Data Analysis, 5:305–319, 1987.

[9] J. Gao. Robust L1 principal component analysis and its Bayesian variational inference. Neural Computation, 20:555–572, 2008.

[10] G.H. Golub and C.F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, 1983.

[11] ILOG. ILOG CPLEX Division. 889 Alder Avenue, Incline Village, Nevada, 2009.

[12] J. Czyzyk, M. Mesnier, and J. Moré. The NEOS Server. IEEE Journal on Computational Science and Engineering, 5:68–75, 1998.

[13] I.T. Jolliffe. Principal Component Analysis. Springer, 2nd edition, 2002.

[14] Q. Ke and T. Kanade. Robust subspace computation using L1 norm. Technical Report CMU-CS-03-172, Carnegie Mellon University, Pittsburgh, PA, 2003.

[15] Q. Ke and T. Kanade. Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming. In IEEE Conference on Computer Vision and Pattern Recognition, 2005.

[16] N. Kwak. Principal component analysis based on L1-norm maximization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30:1672–1680, 2008.

[17] O.L. Mangasarian. Arbitrary-norm separating plane. Operations Research Letters, 24:15–23, 1999.

[18] H. Martini and A. Schöbel. Median hyperplanes in normed spaces: a survey. Discrete Applied Mathematics, 89:181–195, 1998.

[19] K.G. Murty. Linear Programming. Wiley, 1983.

[20] S.C. Narula and J.F. Wellington. The minimum sum of absolute errors regression: A state of the art survey. International Statistical Review, 50:317–326, 1982.

[21] K. Pearson. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2:559–572, 1901.

[22] R.T. Rockafellar. Convex Analysis. Princeton University Press, 1970.

[23] H. Späth and G.A. Watson. On orthogonal linear L1 approximation. Numerische Mathematik, 51:531–543, 1987.

[24] H.M. Wagner. Linear programming techniques for regression analysis. Journal of the American Statistical Association, 54:206–212, 1959.

[25] E. Zemel. An O(n) algorithm for the linear multiple choice knapsack problem and related problems. Information Processing Letters, 18:123–128, 1984.
