# The L1-norm best-fit hyperplane problem

### Full-text

J. H. Dula, Jul 07, 2015

**ABSTRACT:** Low-distortion embeddings are critical building blocks for developing random sampling and random projection algorithms for linear algebra problems. We show that, given a matrix $A \in \mathbb{R}^{n \times d}$ with $n \gg d$ and a $p \in [1, 2)$, with a constant probability, we can construct a low-distortion embedding matrix $\Pi \in \mathbb{R}^{O(\mathrm{poly}(d)) \times n}$ that embeds $\mathcal{A}_p$, the $\ell_p$ subspace spanned by $A$'s columns, into $(\mathbb{R}^{O(\mathrm{poly}(d))}, \| \cdot \|_p)$; the distortion of our embeddings is only $O(\mathrm{poly}(d))$, and we can compute $\Pi A$ in $O(\mathrm{nnz}(A))$ time, i.e., input-sparsity time. Our result generalizes the input-sparsity time $\ell_2$ subspace embedding by Clarkson and Woodruff [STOC'13]; and for completeness, we present a simpler and improved analysis of their construction for $\ell_2$. These input-sparsity time $\ell_p$ embeddings are optimal, up to constants, in terms of their running time; and the improved running time propagates to applications such as $(1\pm \epsilon)$-distortion $\ell_p$ subspace embedding and relative-error $\ell_p$ regression. For $\ell_2$, we show that a $(1+\epsilon)$-approximate solution to the $\ell_2$ regression problem specified by the matrix $A$ and a vector $b \in \mathbb{R}^n$ can be computed in $O(\mathrm{nnz}(A) + d^3 \log(d/\epsilon) /\epsilon^2)$ time; and for $\ell_p$, via a subspace-preserving sampling procedure, we show that a $(1\pm \epsilon)$-distortion embedding of $\mathcal{A}_p$ into $\mathbb{R}^{O(\mathrm{poly}(d))}$ can be computed in $O(\mathrm{nnz}(A) \cdot \log n)$ time, and we also show that a $(1+\epsilon)$-approximate solution to the $\ell_p$ regression problem $\min_{x \in \mathbb{R}^d} \|A x - b\|_p$ can be computed in $O(\mathrm{nnz}(A) \cdot \log n + \mathrm{poly}(d) \log(1/\epsilon)/\epsilon^2)$ time. Moreover, we can improve the embedding dimension or equivalently the sample size to $O(d^{3+p/2} \log(1/\epsilon) / \epsilon^2)$ without increasing the complexity.
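To make the phrase "input-sparsity time" concrete, here is a minimal sketch of a sparse embedding for the $\ell_2$ case only. It assumes the CountSketch form of the embedding matrix: each row of $A$ is sent to one random row of the sketch with a random sign, so forming $\Pi A$ costs one update per nonzero of $A$. The function name `countsketch`, the triple-based input format, and the parameter names are illustrative assumptions, not taken from the abstract, and the sketch does not cover the $\ell_p$ construction for $p \in [1, 2)$.

```python
import random


def countsketch(triples, n_rows, n_cols, sketch_rows, seed=0):
    """Compute Pi * A for a sparse A given as (row, col, value) triples.

    Assumption: Pi has exactly one nonzero (+1 or -1) per column, placed in a
    uniformly random row. Pi is never formed explicitly.
    """
    rng = random.Random(seed)
    # bucket[i] is the sketch row that input row i is mapped to.
    bucket = [rng.randrange(sketch_rows) for _ in range(n_rows)]
    # sign[i] is the random sign applied to input row i.
    sign = [rng.choice((-1.0, 1.0)) for _ in range(n_rows)]

    sketch = [[0.0] * n_cols for _ in range(sketch_rows)]
    # One constant-time update per nonzero entry of A.
    for i, j, value in triples:
        sketch[bucket[i]][j] += sign[i] * value
    return sketch


if __name__ == "__main__":
    # A 6 x 2 matrix with four nonzeros, sketched down to 3 rows.
    A_triples = [(0, 0, 1.0), (2, 1, -4.0), (3, 0, 2.0), (5, 1, 0.5)]
    for row in countsketch(A_triples, n_rows=6, n_cols=2, sketch_rows=3):
        print(row)
```

The loop touches each stored nonzero exactly once, which is where the $O(\mathrm{nnz}(A))$ cost comes from; the hash and sign arrays add an $O(n)$ setup cost in this simple version.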

**ABSTRACT:** We provide fast algorithms for overconstrained $\ell_p$ regression and related problems: for an $n\times d$ input matrix $A$ and vector $b\in\mathbb{R}^n$, in $O(nd\log n)$ time we reduce the problem $\min_{x\in\mathbb{R}^d} \|Ax-b\|_p$ to the same problem with input matrix $\tilde A$ of dimension $s \times d$ and corresponding $\tilde b$ of dimension $s\times 1$. Here, $\tilde A$ and $\tilde b$ are a coreset for the problem, consisting of sampled and rescaled rows of $A$ and $b$; and $s$ is independent of $n$ and polynomial in $d$. Our results improve on the best previous algorithms when $n\gg d$, for all $p\in [1,\infty)$ except $p=2$. We also provide a suite of improved results for finding well-conditioned bases via ellipsoidal rounding, illustrating tradeoffs between running time and conditioning quality, including a one-pass conditioning algorithm for general $\ell_p$ problems. We also provide an empirical evaluation of implementations of our algorithms for $p=1$, comparing them with related algorithms. Our empirical results clearly show that, in the asymptotic regime, the theory is a very good guide to the practical performance of these algorithms. Our algorithms use our faster constructions of well-conditioned bases for $\ell_p$ spaces and, for $p=1$, a fast subspace embedding of independent interest that we call the Fast Cauchy Transform: a distribution over matrices $\Pi: \mathbb{R}^n\mapsto \mathbb{R}^{O(d\log d)}$, found obliviously to $A$, that approximately preserves the $\ell_1$ norms: that is, with large probability, simultaneously for all $x$, $\|Ax\|_1 \approx \|\Pi Ax\|_1$, with distortion $O(d^{2+\eta})$, for an arbitrarily small constant $\eta>0$; and, moreover, $\Pi A$ can be computed in $O(nd\log d)$ time. The techniques underlying our Fast Cauchy Transform include fast Johnson-Lindenstrauss transforms, low-coherence matrices, and rescaling by Cauchy random variables.
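The coreset described above, "sampled and rescaled rows of $A$ and $b$", can be illustrated with a short sketch. It assumes the sampling probabilities are already given (the abstract derives them from well-conditioned bases, which this sketch does not attempt), and it assumes the standard rescaling by $q_i^{-1/p}$ so that the sampled $\ell_p$ objective is unbiased. The function name `sample_coreset` and its parameters are hypothetical.

```python
import random


def sample_coreset(A, b, probs, p=1.0, seed=0):
    """Keep row i of (A, b) with probability probs[i], rescaled by probs[i]**(-1/p).

    Assumption: probs is supplied by the caller. With this rescaling,
    E[ sum_i |w_i (a_i . x - b_i)|^p ] equals sum_i |a_i . x - b_i|^p for every x.
    """
    rng = random.Random(seed)
    A_small, b_small = [], []
    for row, target, q in zip(A, b, probs):
        # rng.random() lies in [0, 1), so q = 1 always keeps the row
        # and q = 0 never does (and never divides by zero).
        if rng.random() < q:
            weight = (1.0 / q) ** (1.0 / p)
            A_small.append([weight * a for a in row])
            b_small.append(weight * target)
    return A_small, b_small


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
    b = [1.0, 2.0, 3.0, 0.0]
    A_small, b_small = sample_coreset(A, b, probs=[0.5, 0.5, 1.0, 1.0], p=1.0)
    print(A_small, b_small)
```

The reduced problem $\min_x \|\tilde A x - \tilde b\|_p$ is then solved with any $\ell_p$ regression solver; the quality of the approximation depends entirely on how the probabilities are chosen.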

**ABSTRACT:** Given a set of data $W=\{w_1,\dots,w_N\}\in\mathbb{R}^D$ drawn from a union of subspaces, we focus on determining a nonlinear model of the form $U=\bigcup_{i\in I} S_i$, where $\{S_i\subset\mathbb{R}^D\}_{i\in I}$ is a set of subspaces, that is nearest to $W$. The model is then used to classify $W$ into clusters. Our approach is based on the binary reduced row echelon form of the data matrix, combined with an iterative scheme based on a non-linear approximation method. We prove that, in the absence of noise, our approach can find the number of subspaces, their dimensions, and an orthonormal basis for each subspace $S_i$. We provide a comprehensive analysis of our theory and determine its limitations and strengths in the presence of outliers and noise.

Applied and Computational Harmonic Analysis 01/2013; 37(2). DOI:10.1016/j.acha.2013.12.001 · 3.00 Impact Factor
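As a rough illustration of why a reduced row echelon form can reveal subspace membership in the noise-free case, the sketch below computes an exact RREF of the $D \times N$ data matrix and groups columns whose RREF supports share a pivot row. This grouping rule is an assumption made for illustration (it relies on the subspaces being independent and the data being generic enough to link the pivots of each subspace); it is not the authors' full algorithm and omits their iterative non-linear approximation step. The names `rref` and `cluster_columns` are hypothetical.

```python
from fractions import Fraction


def rref(matrix):
    """Return (reduced row echelon form, pivot column indices) in exact arithmetic."""
    M = [[Fraction(x) for x in row] for row in matrix]
    rows, cols = len(M), len(M[0])
    pivots = []
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot_row = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot_row is None:
            continue  # no pivot in this column
        M[r], M[pivot_row] = M[pivot_row], M[r]
        lead = M[r][c]
        M[r] = [x / lead for x in M[r]]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M, pivots


def cluster_columns(W):
    """Group columns of W that are linked through nonzero entries of rref(W)."""
    R, pivots = rref(W)
    n = len(W[0])
    parent = list(range(n))  # union-find forest over column indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for j in range(n):
        for row_index, pivot_col in enumerate(pivots):
            # A nonzero entry means column j uses this pivot column in its expansion.
            if R[row_index][j] != 0:
                parent[find(j)] = find(pivot_col)

    groups = {}
    for j in range(n):
        groups.setdefault(find(j), []).append(j)
    return sorted(groups.values())


if __name__ == "__main__":
    # Columns 0-2 lie in the xy-plane, columns 3-4 on the z-axis.
    W = [[1, 0, 1, 0, 0],
         [0, 1, 1, 0, 0],
         [0, 0, 0, 2, 3]]
    print(cluster_columns(W))
```

Exact `Fraction` arithmetic is used because the zero pattern of the RREF is what carries the information; with floating-point data or noise, a thresholding step would be needed, which is where the robustness questions raised in the abstract come in.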