The L1-norm best-fit hyperplane problem

Virginia Commonwealth University, 1015 Floyd Avenue, P.O. Box 843083, Richmond, VA 23284.
Applied Mathematics Letters (Impact Factor: 1.34). 10/2012; 26(1):51-56. DOI: 10.1016/j.aml.2012.03.031
Source: PubMed


We formalize an algorithm for solving the L1-norm best-fit hyperplane problem, derived from first principles and geometric insights about L1 projection and L1 regression. The procedure follows from a new proof of global optimality and relies on the solution of a small number of linear programs. It is implemented for validation and testing. This analysis of the L1-norm best-fit hyperplane problem makes the procedure accessible to applications in areas such as location theory, computer vision, and multivariate statistics.
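The abstract does not spell out the procedure, so what follows is only a minimal sketch for points in the plane. It rests on two assumptions that are not taken from the paper: the L1 distance from a point $x$ to the hyperplane $\{y : a \cdot y = b\}$ is $|a \cdot x - b| / \|a\|_\infty$, and some optimal L1 best-fit line passes through two of the data points. Instead of the paper's linear programs, it enumerates every line through two distinct data points, which costs $O(n^3)$ and suits tiny inputs only. The function names are hypothetical.

```python
from itertools import combinations


def l1_distance_to_hyperplane(x, a, b):
    """L1 distance from point x to {y : a.y = b}, i.e. |a.x - b| / max_k |a_k|."""
    residual = sum(ai * xi for ai, xi in zip(a, x)) - b
    return abs(residual) / max(abs(ai) for ai in a)


def l1_best_fit_line(points):
    """Brute-force L1 best-fit line in the plane (hypothetical helper).

    Assumption (not from the paper): some optimal line passes through two data
    points, so trying every such line is enough. Returns (cost, a, b) for the
    line a.y = b, or None if there are fewer than two distinct points.
    """
    best = None
    for p, q in combinations(points, 2):
        if p == q:
            continue
        # Normal vector of the line through p and q, and its offset b = a.p.
        a = (q[1] - p[1], p[0] - q[0])
        b = a[0] * p[0] + a[1] * p[1]
        cost = sum(l1_distance_to_hyperplane(x, a, b) for x in points)
        if best is None or cost < best[0]:
            best = (cost, a, b)
    return best
```

For example, `l1_best_fit_line([(0, 0), (1, 1), (2, 2), (3, 3), (1, 3)])` returns `(2.0, (1, -1), 0)`: the line $x - y = 0$, with only $(1, 3)$ off the line, at L1 distance 2.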

    • "For example, by taking advantage of the fast version of subspace-preserving sampling from [10], we can construct and apply a (1 ± ε)-distortion sparse embedding matrix for ℓ1 in O(nnz(A) · log n + poly(d/ε)) time. In addition, we can use it to compute a (1 + ε)-approximation to the ℓ1 regression problem in O(nnz(A) · log n + poly(d/ε)) time, which in turn leads to immediate improvements in ℓ1-based matrix approximation objectives, e.g., for the ℓ1 subspace approximation problem [6] [29] [10]. • For ℓp, for all p ∈ (1, 2), we obtain a low-distortion sparse embedding matrix Π such that ΠA can be computed in input-sparsity time. "
    ABSTRACT: Low-distortion embeddings are critical building blocks for developing random sampling and random projection algorithms for linear algebra problems. We show that, given a matrix $A \in \mathbb{R}^{n \times d}$ with $n \gg d$ and a $p \in [1, 2)$, with a constant probability, we can construct a low-distortion embedding matrix $\Pi \in \mathbb{R}^{O(\operatorname{poly}(d)) \times n}$ that embeds $\mathcal{A}_p$, the $\ell_p$ subspace spanned by $A$'s columns, into $(\mathbb{R}^{O(\operatorname{poly}(d))}, \| \cdot \|_p)$; the distortion of our embeddings is only $O(\operatorname{poly}(d))$, and we can compute $\Pi A$ in $O(\operatorname{nnz}(A))$ time, i.e., input-sparsity time. Our result generalizes the input-sparsity time $\ell_2$ subspace embedding by Clarkson and Woodruff [STOC'13]; and for completeness, we present a simpler and improved analysis of their construction for $\ell_2$. These input-sparsity time $\ell_p$ embeddings are optimal, up to constants, in terms of their running time; and the improved running time propagates to applications such as $(1\pm \epsilon)$-distortion $\ell_p$ subspace embedding and relative-error $\ell_p$ regression. For $\ell_2$, we show that a $(1+\epsilon)$-approximate solution to the $\ell_2$ regression problem specified by the matrix $A$ and a vector $b \in \mathbb{R}^n$ can be computed in $O(\operatorname{nnz}(A) + d^3 \log(d/\epsilon) /\epsilon^2)$ time; and for $\ell_p$, via a subspace-preserving sampling procedure, we show that a $(1\pm \epsilon)$-distortion embedding of $\mathcal{A}_p$ into $\mathbb{R}^{O(\operatorname{poly}(d))}$ can be computed in $O(\operatorname{nnz}(A) \cdot \log n)$ time, and we also show that a $(1+\epsilon)$-approximate solution to the $\ell_p$ regression problem $\min_{x \in \mathbb{R}^d} \|A x - b\|_p$ can be computed in $O(\operatorname{nnz}(A) \cdot \log n + \operatorname{poly}(d) \log(1/\epsilon)/\epsilon^2)$ time. Moreover, we can improve the embedding dimension or equivalently the sample size to $O(d^{3+p/2} \log(1/\epsilon) / \epsilon^2)$ without increasing the complexity.
    Article · Oct 2012 · Proceedings of the Annual ACM Symposium on Theory of Computing
    • "Thus, using our approximation algorithm for constrained multiple ℓ1 regression that we described in Section 4.3.1, we can build an approximation algorithm for the ℓ1-norm subspace approximation problem that improves upon the previous best algorithm from [24] and [5]. (The running time of the algorithm of [24] is Ω(nd^(ω+β) + poly(dε⁻¹ log n)), where ω ≈ 2.376 and β > 0 is any constant.) "
    ABSTRACT: We provide fast algorithms for overconstrained $\ell_p$ regression and related problems: for an $n\times d$ input matrix $A$ and vector $b\in\mathbb{R}^n$, in $O(nd\log n)$ time we reduce the problem $\min_{x\in\mathbb{R}^d} \|Ax-b\|_p$ to the same problem with input matrix $\tilde A$ of dimension $s \times d$ and corresponding $\tilde b$ of dimension $s\times 1$. Here, $\tilde A$ and $\tilde b$ are a coreset for the problem, consisting of sampled and rescaled rows of $A$ and $b$; and $s$ is independent of $n$ and polynomial in $d$. Our results improve on the best previous algorithms when $n\gg d$, for all $p\in [1,\infty)$ except $p=2$. We also provide a suite of improved results for finding well-conditioned bases via ellipsoidal rounding, illustrating tradeoffs between running time and conditioning quality, including a one-pass conditioning algorithm for general $\ell_p$ problems. We also provide an empirical evaluation of implementations of our algorithms for $p=1$, comparing them with related algorithms. Our empirical results clearly show that, in the asymptotic regime, the theory is a very good guide to the practical performance of these algorithms. Our algorithms use our faster constructions of well-conditioned bases for $\ell_p$ spaces and, for $p=1$, a fast subspace embedding of independent interest that we call the Fast Cauchy Transform: a distribution over matrices $\Pi: \mathbb{R}^n\mapsto \mathbb{R}^{O(d\log d)}$, found obliviously to $A$, that approximately preserves the $\ell_1$ norms: that is, with large probability, simultaneously for all $x$, $\|Ax\|_1 \approx \|\Pi Ax\|_1$, with distortion $O(d^{2+\eta})$, for an arbitrarily small constant $\eta>0$; and, moreover, $\Pi A$ can be computed in $O(nd\log d)$ time. The techniques underlying our Fast Cauchy Transform include fast Johnson-Lindenstrauss transforms, low-coherence matrices, and rescaling by Cauchy random variables.
    Article · Jul 2012
    ABSTRACT: Given a set of data $W=\{w_1,\dots,w_N\}\subset\mathbb{R}^D$ drawn from a union of subspaces, we focus on determining a nonlinear model of the form $U=\bigcup_{i\in I} S_i$, where $\{S_i\subset\mathbb{R}^D\}_{i\in I}$ is a set of subspaces, that is nearest to $W$. The model is then used to classify $W$ into clusters. Our approach is based on the binary reduced row echelon form of the data matrix, combined with an iterative scheme based on a non-linear approximation method. We prove that, in the absence of noise, our approach can find the number of subspaces, their dimensions, and an orthonormal basis for each subspace $S_i$. We provide a comprehensive analysis of our theory and determine its limitations and strengths in the presence of outliers and noise.
    Article · Jan 2013 · Applied and Computational Harmonic Analysis
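The abstract on low-distortion embeddings in input-sparsity time says its embeddings generalize the input-sparsity time $\ell_2$ subspace embedding of Clarkson and Woodruff. A minimal sketch of that sparse $\ell_2$ construction (often called CountSketch) follows, under these assumptions: $A$ is supplied as (row, column, value) triples of its nonzeros, and each column of $\Pi$ holds a single random sign in a randomly chosen row. It shows why computing $\Pi A$ costs $O(\operatorname{nnz}(A))$; it is not the $\ell_p$ embedding from that abstract, and it does not choose the number of rows $s$ needed for a given distortion. The function names are hypothetical.

```python
import random


def draw_countsketch(n, s, seed=0):
    """Hypothetical helper: draw a bucket h[i] in range(s) and a sign g[i] in {+1, -1} per row."""
    rng = random.Random(seed)
    h = [rng.randrange(s) for _ in range(n)]
    g = [rng.choice((1, -1)) for _ in range(n)]
    return h, g


def apply_countsketch(entries, s, d, h, g):
    """Return Pi A as an s x d list of lists (hypothetical helper).

    A is given by its nonzeros as (row, col, value) triples. Column i of Pi has
    a single nonzero g[i] in row h[i], so each nonzero of A is touched exactly
    once: O(nnz(A)) work plus O(s * d) to allocate the output.
    """
    SA = [[0.0] * d for _ in range(s)]
    for i, j, value in entries:
        SA[h[i]][j] += g[i] * value
    return SA
```

When the hash sends every row to a different bucket, $\Pi$ has orthonormal columns and preserves $\|Ax\|_2$ exactly; for example, `apply_countsketch([(0, 0, 1), (0, 1, 2), (1, 1, 3), (2, 0, 4)], 3, 2, [2, 0, 1], [1, -1, 1])` returns `[[0.0, -3.0], [4.0, 0.0], [1.0, 2.0]]`. With $s \ll n$, rows collide in buckets and norms are preserved only approximately.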
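The Fast Cauchy Transform abstract relies on rescaling by Cauchy random variables to preserve $\ell_1$ norms. The property behind this is 1-stability: if $c_1, \dots, c_n$ are independent standard Cauchy variables, then $\sum_i c_i v_i$ has the same distribution as $\|v\|_1$ times a single standard Cauchy variable. The sketch below illustrates that property through the classical median-of-projections estimator, which is a simpler use of the same idea and not the Fast Cauchy Transform itself; the function names and the default of 2001 projections are illustrative assumptions.

```python
import math
import random
import statistics


def cauchy_sample(rng):
    """Standard Cauchy variate via the inverse CDF: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))


def estimate_l1_norm(v, k=2001, seed=0):
    """Estimate ||v||_1 from k random Cauchy projections (hypothetical helper).

    By 1-stability, each projection sum_i c_i * v_i is distributed as ||v||_1
    times a standard Cauchy variable, and the median of |standard Cauchy| is 1,
    so the median of the absolute projections estimates ||v||_1.
    """
    rng = random.Random(seed)
    projections = [abs(sum(cauchy_sample(rng) * vi for vi in v)) for _ in range(k)]
    return statistics.median(projections)
```

The median is used because Cauchy variables have no finite mean, so averaging the projections would not concentrate. With the same seed the same Cauchy draws are reused, so scaling `v` by 2 doubles the estimate exactly.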
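The subspace-segmentation abstract builds on the binary reduced row echelon form of the data matrix. A minimal noise-free sketch of that idea follows, under assumptions the abstract does not state: the data points are the columns of $W$, the subspaces are independent (the dimension of their sum equals the sum of their dimensions), each subspace holds more data points than its dimension, and the points are generic. Under those assumptions, columns from different subspaces never share a nonzero row of the RREF, while columns from the same subspace are linked through shared rows, so connected components give the clusters. The sketch uses exact rational arithmetic and omits the iterative non-linear approximation step the authors use for noise; the function names are hypothetical.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row echelon form computed with exact rational arithmetic."""
    M = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        sel = next((r for r in range(pivot_row, n_rows) if M[r][col] != 0), None)
        if sel is None:
            continue
        M[pivot_row], M[sel] = M[sel], M[pivot_row]
        p = M[pivot_row][col]
        M[pivot_row] = [x / p for x in M[pivot_row]]
        # Clear this column in every other row.
        for r in range(n_rows):
            if r != pivot_row and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[pivot_row])]
        pivot_row += 1
    return M


def segment_columns(rows):
    """Cluster the columns of W (given as a list of rows) by shared RREF support."""
    R = rref(rows)
    n_cols = len(rows[0])
    # Binary RREF: for each column, the set of rows holding a nonzero entry.
    supports = [{r for r in range(len(R)) if R[r][j] != 0} for j in range(n_cols)]
    parent = list(range(n_cols))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Union-find: merge columns whose supports overlap.
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            if supports[i] & supports[j]:
                parent[find(i)] = find(j)
    groups = {}
    for j in range(n_cols):
        groups.setdefault(find(j), []).append(j)
    return sorted(groups.values())
```

For instance, with columns $(1,1,0)$, $(0,0,2)$, $(1,2,0)$, $(0,0,-1)$, $(2,1,0)$ drawn from the plane $z = 0$ and the $z$-axis, `segment_columns([[1, 0, 1, 0, 2], [1, 0, 2, 0, 1], [0, 2, 0, -1, 0]])` returns `[[0, 2, 4], [1, 3]]`.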