Article

Sharp thresholds for high-dimensional and noisy recovery of sparsity


Abstract

The problem of consistently estimating the sparsity pattern of a vector $\beta^* \in \mathbb{R}^p$ based on observations contaminated by noise arises in various contexts, including subset selection in regression, structure estimation in graphical models, sparse approximation, and signal denoising. We analyze the behavior of $\ell_1$-constrained quadratic programming (QP), also referred to as the Lasso, for recovering the sparsity pattern. Our main result establishes a sharp relation between the problem dimension $p$, the number $k$ of non-zero elements in $\beta^*$, and the number of observations $n$ required for reliable recovery. For a broad class of Gaussian ensembles satisfying mutual incoherence conditions, we establish the existence of, and compute explicit values for, thresholds $\theta_\ell$ and $\theta_u$ with the following properties: for any $\epsilon > 0$, if $n > 2(\theta_u + \epsilon) \log(p - k) + k + 1$, then the Lasso succeeds in recovering the sparsity pattern with probability converging to one for large problems, whereas for $n < 2(\theta_\ell - \epsilon) \log(p - k) + k + 1$, the probability of successful recovery converges to zero. For the special case of the uniform Gaussian ensemble, we show that $\theta_\ell = \theta_u = 1$, so that the threshold is sharp and exactly determined.
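The threshold statement can be explored numerically. Below is a minimal sketch (not code from the paper) that evaluates the predicted sample size $2\theta \log(p-k) + k + 1$ for the uniform Gaussian ensemble ($\theta_\ell = \theta_u = 1$) and runs a single Lasso support-recovery trial with scikit-learn; the problem sizes, noise level, and regularization choice are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def threshold_sample_size(p, k, theta=1.0):
    """Predicted sample size n = 2*theta*log(p - k) + k + 1 from the sharp-threshold result."""
    return 2 * theta * np.log(p - k) + k + 1

rng = np.random.default_rng(0)
p, k = 512, 8
n = int(np.ceil(3 * threshold_sample_size(p, k)))     # sample well above the asymptotic threshold

beta = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
beta[support] = rng.choice([-1.0, 1.0], size=k)       # +/-1 signal on the support

X = rng.standard_normal((n, p))                       # uniform Gaussian ensemble
sigma = 0.1
y = X @ beta + sigma * rng.standard_normal(n)         # noisy observations

lam = sigma * np.sqrt(2 * np.log(p) / n)              # a standard heuristic penalty level
fit = Lasso(alpha=lam, fit_intercept=False, max_iter=50_000).fit(X, y)
recovered = set(np.flatnonzero(np.abs(fit.coef_) > 1e-3))
print("predicted threshold n:", threshold_sample_size(p, k))
print("support recovered:", recovered == set(support))
```

Repeating such trials while sweeping n across the predicted threshold is a simple way to visualize the success/failure transition described in the abstract.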


... Meinshausen (2007) presents similar results for the LASSO for exponentially growing dimensionality and a bounded number of non-zero coefficients, but its persistency rate is slower than that of the relaxed LASSO. For consistency and selection consistency results see Donoho, Elad and Temlyakov (2006), Meinshausen and Bühlmann (2006), Wainwright (2006), Zhao and Yu (2006), Bunea, Tsybakov and Wegkamp (2007), Bickel, Ritov and Tsybakov (2008), van de Geer (2008), and Zhang and Huang (2008), among others. ...
... This shows that the capacity of the LASSO for selecting a consistent model is very limited, noting also that the L1-norm of the regression coefficients typically increases with s. See, e.g., Wainwright (2006). As discussed above, condition (42) is a stringent condition in high dimensions for the LASSO estimator to enjoy the weak oracle property. ...
Article
High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. What limits of the dimensionality such methods can handle, what the role of penalty functions is, and what the statistical properties are rapidly drive the advances of the field. The properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods.
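As a concrete, hedged illustration of how the penalty shapes selection, the orthonormal-design case reduces two extremes of penalized likelihood to simple thresholding rules: best-subset (ℓ0) selection hard-thresholds the least-squares coefficients, while the lasso (ℓ1) soft-thresholds them; non-concave penalties such as SCAD sit between the two. The coefficient values below are made up for illustration.

```python
import numpy as np

def hard_threshold(z, t):
    """l0-type rule: keep a coefficient as-is if its magnitude exceeds t, else set it to zero."""
    return np.where(np.abs(z) > t, z, 0.0)

def soft_threshold(z, t):
    """l1 (lasso) rule: zero out small coefficients and shrink the survivors toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

z = np.array([-3.0, -0.8, 0.2, 1.5, 4.0])   # hypothetical least-squares coefficients (orthonormal design)
print(hard_threshold(z, 1.0))               # large coefficients survive unshrunk
print(soft_threshold(z, 1.0))               # large coefficients survive but are biased toward zero
```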
... As expected, this assumes that y was in fact generated by that x and given to us. The case when the available y's are noisy versions of the real y's is also of interest [7,8,33,54]. Although that case is not of primary interest in the present paper, it is worth mentioning that the recent popularity of ℓ1-optimization in compressed sensing is significantly due to its robustness with respect to noisy y's. ...
... which is what we established as a goal in (54). We summarize the results in the following theorem. ...
Article
In our recent work \cite{StojnicCSetam09} we considered solving under-determined systems of linear equations with sparse solutions. In a large-dimensional and statistical context, we proved that if the number of equations in the system is proportional to the length of the unknown vector, then there is a sparsity (number of non-zero elements of the unknown vector), also proportional to the length of the unknown vector, such that a polynomial-time $\ell_1$-optimization technique succeeds in solving the system. We provided lower bounds on the proportionality constants that are in solid numerical agreement with what one can observe through numerical experiments. Here we create a mechanism that can be used to derive upper bounds on the proportionality constants. Moreover, the upper bounds obtained through such a mechanism match the lower bounds from \cite{StojnicCSetam09} and ultimately make the latter ones optimal.
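For readers who want to experiment with the $\ell_1$-optimization step itself, here is a minimal basis-pursuit sketch (not the authors' analysis machinery): the problem min ||x||_1 subject to Ax = y is rewritten as a linear program via the standard splitting x = u - v with u, v >= 0. Sizes and sparsity below are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, m, k = 40, 100, 5                         # equations, unknowns, number of non-zero entries
A = rng.standard_normal((n, m))
x_true = np.zeros(m)
x_true[rng.choice(m, size=k, replace=False)] = rng.standard_normal(k)
y = A @ x_true                               # noiseless under-determined system

c = np.ones(2 * m)                           # objective sum(u) + sum(v) equals ||x||_1
A_eq = np.hstack([A, -A])                    # equality constraint A(u - v) = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * m), method="highs")
x_hat = res.x[:m] - res.x[m:]
print("max reconstruction error:", np.max(np.abs(x_hat - x_true)))
```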
... Observe that in Figure 7 for m ∈ {15, 20} all results are very close to the random reference, whereas the values of the mutual coherence of the initial, random frames H0 in Tables I, II, III and IV are very high compared to the mutual coherence achieved by SIDCO. To see why this might be the case, we remind the reader that compressed sensing is able to recover the correct support when the amplitudes of the non-zero entries of a are above a constant times M/m log M [50], which takes high values for relatively small m, and that this has been shown to hold for Gaussian random sensing matrices with on the order of s log M measurements [50]. Apart from the mutual coherence, recent developments [51] show that unit norm tight frames perform well in compressive sensing applications when measuring the reconstruction average mean squared error. ...
Article
Full-text available
The construction of highly incoherent frames, sequences of vectors placed on the unit hypersphere of a finite-dimensional Hilbert space with low correlation between them, has proven very difficult. Algorithms proposed in the past have focused on minimizing the absolute values of the off-diagonal entries of the Gram matrix of these structures. Recently, a method based on convex optimization that operates directly on the vectors of the frame has been shown to produce promising results. This paper gives a detailed analysis of the optimization problem at the heart of this approach and, based on these insights, proposes a new method that substantially outperforms the initial approach and all current methods in the literature for all types of frames, with low and high redundancy. We give extensive experimental results that show the effectiveness of the proposed method and its application to optimized compressed sensing.
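The quantity that such designs target can be computed in a few lines; the sketch below (illustrative, not the method of the paper) evaluates the mutual coherence of a random Gaussian frame and compares it with the Welch lower bound.

```python
import numpy as np

def mutual_coherence(H):
    """Largest absolute inner product between distinct unit-norm columns of H (shape m x M)."""
    U = H / np.linalg.norm(H, axis=0, keepdims=True)   # normalize columns
    G = np.abs(U.T @ U)                                # absolute Gram matrix
    np.fill_diagonal(G, 0.0)                           # ignore the unit diagonal
    return G.max()

rng = np.random.default_rng(0)
m, M = 15, 60                                          # ambient dimension and frame size
H = rng.standard_normal((m, M))
print("coherence of a random Gaussian frame:", mutual_coherence(H))
print("Welch lower bound:", np.sqrt((M - m) / (m * (M - 1))))
```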
... Employing such an output as the set K in algorithm (20) would allow canceling the last term on the right-hand side of (28), thus obtaining better estimates. The quest for such an Oracle functional can also be interpreted as the estimation of the support, or the sparsity pattern, of the solution, a problem that has been extensively studied in the context of compressed sensing (see, e.g., [22,36,46]). Our approach, discussed in Section 7, relies instead on statistical learning techniques, in particular Graph Neural Networks, to provide a data-driven approximation of the optimal Oracle functional. ...
Preprint
Full-text available
Sparse recovery principles play an important role in solving many nonlinear ill-posed inverse problems. We investigate a variational framework with support Oracle for compressed sensing sparse reconstructions, where the available measurements are nonlinear and possibly corrupted by noise. A graph neural network, named Oracle-Net, is proposed to predict the support from the nonlinear measurements and is integrated into a regularized recovery model to enforce sparsity. The derived nonsmooth optimization problem is then efficiently solved through a constrained proximal gradient method. Error bounds on the approximate solution of the proposed Oracle-based optimization are provided in the context of the ill-posed Electrical Impedance Tomography problem. Numerical solutions of the EIT nonlinear inverse reconstruction problem confirm the potential of the proposed method which improves the reconstruction quality from undersampled measurements, under sparsity assumptions.
... Indeed, there is research that studies the importance of this scaling for support recovery in certain regression settings, including in the case of the lasso (e.g. [27,28,14,10]). In Bertsimas and Van Parys' work on CIO, they empirically demonstrate the advantage of subset selection over the lasso specifically with reference to these "phase transitions" in variable selection [2]. Still, there is room in this area for empirical comparisons of estimators with scaling in mind, for example to put performance across problem configurations on a somewhat standardized basis and to provide clarity for practitioners as to which scalings favor which estimators. ...
Preprint
Sparse linear regression is a vast field and there are many different algorithms available to build models. Two new papers published in Statistical Science study the comparative performance of several sparse regression methodologies, including the lasso and subset selection. Comprehensive empirical analyses allow the researchers to demonstrate the relative merits of each estimator and provide guidance to practitioners. In this discussion, we summarize and compare the two studies and examine points of agreement and divergence, aiming to provide clarity and value to users. The authors have started a highly constructive dialogue; our goal is to continue it.
... Though the lasso enables a sparse model, it is unstable with high-dimensional data and, when p > n, it saturates after selecting at most n variables [3,20,22,28,32,33]. ...
Thesis
Full-text available
For linear regression, the traditional techniques deal with the case where the number of observations n exceeds the number of predictor variables p (n > p). In the case n < p, the classical methods fail to estimate the coefficients. This thesis provides a solution to this problem for the case of correlated predictors. A new regularization and variable selection method is proposed under the name Sparse Ridge Fusion (SRF). In the case of highly correlated predictors, simulated examples and a real data set show that the SRF always outperforms the lasso, the elastic net, and the S-Lasso, and that the SRF selects more predictor variables than the sample size n, whereas the lasso selects at most n variables.
... In recent years compressed sensing (CS) has gained importance, mostly inspired by the positive theoretical and experimental results shown in [1], [2], [3]. The sparsity in Magnetic Resonance Imaging (MRI) is exploited to significantly undersample k-space. ...
Article
Full-text available
A Magnetic Resonance Imaging (MRI) reconstruction algorithm using semi-PROPELLER compressed sensing is presented in this paper. It is shown that the introduced algorithm for estimating data shifts is feasible when super-resolution is applied. The proposed approach utilizes compressively sensed MRI PROPELLER sequences and improves MR image spatial resolution in circumstances when highly undersampled k-space trajectories are applied. Compressed sensing (CS) aims at reconstructing signals and images from significantly fewer measurements than were traditionally thought necessary. It is shown that the presented approach improves MR spatial resolution in cases when CS sequences are used. The application of CS in medical modalities has the potential for significant scan time reductions, with visible benefits for patients and health care economics. These methods emphasize maximizing image sparsity in a known sparse transform domain while minimizing a data-fidelity term. MRI as a diagnostic modality struggles with an inherently slow data acquisition process, and the use of CS in MRI leads to substantial scan time reductions [7] and visible benefits for patients and economic factors. In this report the objective is to combine a super-resolution image enhancement algorithm with both the PROPELLER sequence and the CS framework. The motion estimation algorithm, being part of the super-resolution reconstruction (SRR), estimates shifts for all blades jointly, utilizing blade-pair correlations that are both strong and robust to noise.
Chapter
The beginning of this book illustrates that linear regression models can describe the relationships between the genes’ copy numbers and a biomarker. However, those models do not provide information about the relationships among the copy numbers themselves. To describe such relationships, we use a different type of models, called graphical models. We neglect the biomarker and summarize the measured copy numbers in vector-valued observations, where each vector corresponds to a specific subject and each coordinate of these vectors to a specific gene (in the linear regression model, these vectors are the rows of the design matrix). Graphical models then formulate the relationships among the copy numbers as conditional dependence networks among the coordinates of the vector-valued observations. If the observations follow a multivariate Gauss distribution, we speak of Gaussian graphical models. This is the most common class of graphical models and, therefore, the focus of this chapter.
Chapter
In the preceding sections, we have discussed the estimation of target parameters, such as the elements of the regression vector β in linear regression. In this section, our task is to complement these estimates with measures of uncertainty. We call this task inference. Our first step is to introduce an algorithm for computing estimators that are defined through systems of equations.
Chapter
Regularized estimators consist of two terms, one for comparing model parameters to data and one for including prior information. The tuning parameters define the weighting: small tuning parameters emphasize the data, while large tuning parameters emphasize the prior information. An optimal tuning parameter balances the data and the prior information such that an estimator’s error for a given task is minimized. Data-driven calibration schemes try to find such optimal tuning parameters in practice.
Chapter
Linear regression relates predictor variables and outcome variables, such as gene copy numbers and the level of a biomarker. The assumed linearity of the relationships makes the models convenient both mathematically and computationally. And since the data can be arbitrarily transformed beforehand, such as by including polynomials of the copy numbers as predictor variables or by replacing the level of the biomarker in the outcome variable by its logarithm, linear regression can also effectively model non-linear relationships. This simplicity and flexibility have made linear regression the most popular statistical framework across the sciences and standard textbook material. But the standard methods for linear regression, such as the least-squares estimator, premise that the number of parameters is small as compared to the number of samples, which limits their usefulness in modern, data-intensive research, where the increasing granularity of data has prompted interest in increasingly complex models. More recent high-dimensional methods, in contrast, allow for models with many more parameters. These methods are the topic of this chapter.
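A toy comparison makes the chapter's point concrete: with more parameters than samples, plain least squares interpolates the data (the minimum-norm solution below is generically dense), whereas an ℓ1-regularized fit remains sparse. All settings in this sketch are illustrative assumptions, not examples from the book.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = [2.0, -1.5, 1.0, -1.0, 0.5]                # only 5 relevant predictors
y = X @ beta + 0.5 * rng.standard_normal(n)

beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)       # minimum-norm interpolating solution
beta_lasso = Lasso(alpha=0.1, fit_intercept=False).fit(X, y).coef_

print("nonzeros, least squares:", np.sum(np.abs(beta_ls) > 1e-8))    # typically all 200
print("nonzeros, lasso:", np.sum(np.abs(beta_lasso) > 1e-8))         # typically a small subset
```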
Chapter
The regression model (1.1) in Sect. 1.1 relates blood levels of a biomarker with characteristics of the subjects’ genomes. Corresponding data allow us to analyze this relationship from a variety of different perspectives: For example, we can study how the biomarker levels depend on the ensemble of genes, we can study the role of each individual gene, or we can study which genes influence the biomarker levels in the first place.
Chapter
In the remainder of this book, we establish mathematical guarantees for high-dimensional estimators. The concepts underpinning our derivations are the basis for high-dimensional theories in general, but for the sake of clarity we focus here on linear regression with regularized least-squares estimators.
Thesis
The reality of big data poses both opportunities and challenges to modern researchers. Its key features -- large sample sizes, high-dimensional feature spaces, and structural complexity -- enforce new paradigms upon the creation of effective yet algorithmically efficient data analysis methods. In this dissertation, we illustrate a few such paradigms through the analysis of three new algorithms. The first two algorithms consider the problem of phase retrieval, in which we seek to recover a signal from random rank-one quadratic measurements. We first show that an adaptation of the randomized Kaczmarz method provably exhibits linear convergence so long as our sample size is linear in the signal dimension. Next, we show that the standard SDP relaxation of sparse PCA yields an algorithm that does signal recovery for sparse, model-misspecified phase retrieval with a sample complexity that scales according to the square of the sparsity parameter. Finally, our third algorithm addresses the problem of Non-Gaussian Component Analysis, in which we are trying to identify the non-Gaussian marginals of a high-dimensional distribution. We prove that our algorithm exhibits polynomial-time convergence with polynomial sample complexity.
Chapter
Orthogonal frequency-division multiple access has been widely adopted by modern wireless networking standards. These use an initial uplink synchronization (IUS) process to detect and uplink-synchronize with new user equipments (UEs) (3rd Generation Partnership Project; technical specification group radio access network; evolved universal terrestrial radio access (E-UTRA); physical channels and modulation (release 10), (2011) [1]). IUS is a random access process in which a UE intending to start communication transmits a code during an "IUS opportunity". The code is chosen uniformly at random from a predefined codebook. The eNodeB uses the received signal to detect the codes and estimate the uplink channel parameters associated with each detected code. This detection and estimation problem is known to be quite challenging, particularly when the number of UEs transmitting during an IUS opportunity is not small. We discuss some recent sparse signal processing methods to address this problem in the context of long-term evolution (LTE) standards. This research not only gives some new directions for solving the detection and estimation problem but also provides guidelines for designing the codebook. In addition, the key ideas are applicable to other OFDMA systems.
Chapter
Regularization is a popular variable selection technique for high dimensional regression models. However, under the ultra-high dimensional setting, a direct application of the regularization methods tends to fail in terms of model selection consistency due to the possible spurious correlations among predictors. Motivated by the ideas of screening (Fan and Lv, J R Stat Soc Ser B Stat Methodol 70:849–911, 2008) and retention (Weng et al, Manuscript, 2013), we propose a new two-step framework for variable selection, where in the first step, marginal learning techniques are utilized to partition variables into different categories, and the regularization methods can be applied afterwards. The technical conditions of model selection consistency for this broad framework relax those for the one-step regularization methods. Extensive simulations show the competitive performance of the new method.
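A rough sketch of the two-step idea (marginal screening followed by a regularized fit) is given below; the screening statistic, retained-set size, and penalty level are illustrative stand-ins rather than the chapter's exact procedure.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, k = 100, 2000, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = 3.0
y = X @ beta + rng.standard_normal(n)

# Step 1: marginal learning -- rank predictors by absolute marginal correlation with y.
scores = np.abs(X.T @ y) / n
d = int(n / np.log(n))                       # a commonly used screening-set size
kept = np.argsort(scores)[::-1][:d]

# Step 2: apply the regularization method on the reduced predictor set only.
fit = Lasso(alpha=0.2, fit_intercept=False).fit(X[:, kept], y)
selected = kept[np.abs(fit.coef_) > 1e-8]
print("selected variables:", np.sort(selected))
```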
Article
One of the main measurements performed in a nuclear spectroscopy experiment is the activity of an unknown radioactive source. The use of digital apparatus and physical perturbations, known as the pile-up effect, make this measurement difficult when the activity of the source is high. In recent contributions, the use of compressive sensing methods yielded good estimates of this activity. This paper presents an improvement of a previously described method. It takes into account the fact that the signal used for the activity estimation is sampled, and introduces another plug-in estimator to counterbalance the bias introduced by the sampling. Results on simulations and real data validate the proposed approach, but illustrate that a good fit between the dictionary used and the signal at hand is required.
Article
Full-text available
We consider the problem of offline change point detection from noisy piecewise constant signals. We propose the normalized fused LASSO, an extension of the fused LASSO (FL) obtained by normalizing the columns of the sensing matrix of its LASSO equivalent. We analyze the performance of the proposed method and, in particular, show that it is consistent in detecting change points as the noise variance tends to zero. Numerical experiments support our theoretical findings.
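The "LASSO equivalent" viewpoint can be made concrete with a small sketch: a piecewise-constant signal is written as a cumulative sum of sparse jumps, so change-point detection becomes a lasso problem on a lower-triangular design whose columns may then be normalized. The signal, noise level, and penalty below are illustrative; this is not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
T = 200
signal = np.concatenate([np.zeros(80), 2.0 * np.ones(70), -1.0 * np.ones(50)])
y = signal + 0.3 * rng.standard_normal(T)

L = np.tril(np.ones((T, T)))                  # cumulative-sum design: signal = L @ jumps
L_normalized = L / np.linalg.norm(L, axis=0)  # normalized variant (unit-norm columns)

fit = Lasso(alpha=0.05, fit_intercept=False).fit(L_normalized, y)
jumps = np.flatnonzero(np.abs(fit.coef_) > 1e-6)
print("indices with non-zero jump estimates (true change points at 80 and 150):", jumps)
```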
Conference Paper
We investigate the task of model selection for high-dimensional data. For this purpose, we propose an extension of the Bayesian information criterion. Our information criterion is asymptotically consistent either as the number of measurements tends to infinity or as the variance of the noise decreases to zero. The numerical results provided support our claim. Additionally, we highlight the link between model selection for high-dimensional data and the choice of the hyper-parameter in ℓ1-constrained estimators, specifically the LASSO.
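A short sketch connecting the two themes above, choosing the ℓ1 hyper-parameter by an information criterion: this uses scikit-learn's standard BIC over the lasso path, not the extended criterion of the paper, so it serves only as a baseline illustration on synthetic data.

```python
import numpy as np
from sklearn.linear_model import LassoLarsIC

rng = np.random.default_rng(0)
n, p = 100, 40
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [1.5, -2.0, 1.0]
y = X @ beta + 0.5 * rng.standard_normal(n)

model = LassoLarsIC(criterion="bic").fit(X, y)   # BIC evaluated along the LARS/lasso path
print("BIC-selected penalty:", model.alpha_)
print("selected variables:", np.flatnonzero(np.abs(model.coef_) > 1e-8))
```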
Article
This paper investigates the problem of recovering the support of structured signals via adaptive compressive sensing. We examine several classes of structured support sets, and characterize the fundamental limits of accurately recovering such sets through compressive measurements, while simultaneously providing adaptive support recovery protocols that perform near optimally for these classes. We show that by adaptively designing the sensing matrix we can attain significant performance gains over non-adaptive protocols. These gains arise from the fact that adaptive sensing can: (i) better mitigate the effects of noise, and (ii) better capitalize on the structure of the support sets.
Chapter
In the analysis of data acquired from label-free experiments by liquid chromatography coupled with mass spectrometry (LC-MS), accounting for potential sources of variability can improve the detection of true differences in ion abundance. Mixed effects models are commonly used to estimate variabilities due to heterogeneity of the biological specimen, differences in sample preparation, and instrument variation. In this chapter, we investigate the mixed effects models and evaluate their performance in difference detection, in comparison to other methods such as marginal t-test, which uses the average over analytical and technical replicates within each biological sample for statistical analysis. Experimental design including replication assignment and sample size calculation is discussed. These are highly dependent on the variation contributed by the different sources, which can be estimated from LC-MS pilot studies prior to running large-scale label-free experiments.
Article
Full-text available
We derive fundamental sample complexity bounds for recovering sparse and structured signals for linear and nonlinear observation models including sparse regression, group testing, multivariate regression and problems with missing features. In general, sparse signal processing problems can be characterized in terms of the following Markovian property. We are given a set of $N$ variables $X_1, X_2, \ldots, X_N$, and there is an unknown subset of variables $S \subset \{1, \ldots, N\}$ that are relevant for predicting outcomes $Y$. More specifically, when $Y$ is conditioned on $\{X_n\}_{n \in S}$ it is conditionally independent of the other variables, $\{X_n\}_{n \notin S}$. Our goal is to identify the set $S$ from samples of the variables $X$ and the associated outcomes $Y$. We characterize this problem as a version of the noisy channel coding problem. Using asymptotic information theoretic analyses, we establish mutual information formulas that provide sufficient and necessary conditions on the number of samples required to successfully recover the salient variables. These mutual information expressions unify conditions for both linear and nonlinear observations. We then compute sample complexity bounds for the aforementioned models, based on the mutual information expressions, in order to demonstrate the applicability and flexibility of our results in general sparse signal processing models.
Article
In regression analysis, a parametric model is often assumed from prior information or a pilot study. If the model assumption is valid, the parametric method is useful. However, the efficiency of the estimator is not guaranteed when a poor model is selected. This article aims to check whether the model assumption is correct or not and to estimate the regression function. To achieve this, we propose a hybrid technique of parametrically guided method and group lasso. First, the parametric model is prepared. The parametrically guided estimator is constructed by summing the parametric estimator and nonparametric estimator. For the estimation of the nonparametric component, we use B-splines and the group lasso method. If the nonparametric component is estimated to be a zero function, the parametrically guided estimator is reduced to the parametric estimator. Then, we can decide that the parametric model assumption is correct. If the nonparametric estimator is not zero, the semiparametric estimator is obtained. Thus, the proposed method discovers the model structure and estimates the regression function simultaneously. We investigate the asymptotic properties of the proposed estimator. A simulation study and real data example are presented.
Conference Paper
The LASSO regression problem has been studied extensively in the statistics and signal processing communities, especially in the realm of sparse parameter estimation from linear measurements. We analyze the convergence rate of a first-order method applied to a smooth, strictly convex, and parametric upper bound on the LASSO objective function. The upper bound approaches the true non-smooth objective as the parameter tends to infinity. We show that a gradient-based algorithm, applied to minimize the smooth upper bound, yields a convergence rate of O(1/K), where K denotes the number of iterations performed. The analysis also reveals the optimum value of the parameter that achieves a desired prediction accuracy, provided that the total number of iterations is decided a priori. The convergence rate of the proposed algorithm and the amount of computation required in each iteration are the same as those of the iterative soft thresholding technique. However, the proposed algorithm does not involve any thresholding operation. The performance of the proposed technique, referred to as smoothed LASSO, is validated on synthesized signals. We also deploy smoothed LASSO for estimating an image from its blurred and noisy measurement, and compare the performance with the fast iterative shrinkage thresholding algorithm for a fixed run-time budget, in terms of the reconstruction peak signal-to-noise ratio and structural similarity index.
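To make the construction tangible, here is a hedged sketch in which each |x_i| is replaced by the smooth, strictly convex upper bound sqrt(x_i^2 + 1/mu^2) (one common choice, not necessarily the exact bound used in the paper) and the surrogate LASSO objective is minimized by plain gradient descent with no thresholding step; data sizes and parameters are illustrative.

```python
import numpy as np

def smoothed_lasso(A, y, lam, mu=1e3, iters=5000):
    """Gradient descent on 0.5*||Ax - y||^2 + lam * sum(sqrt(x_i^2 + 1/mu^2))."""
    n, p = A.shape
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + lam * mu)       # 1/L for a conservative Lipschitz bound
    x = np.zeros(p)
    for _ in range(iters):
        grad_fit = A.T @ (A @ x - y)                          # gradient of the quadratic data term
        grad_pen = lam * x / np.sqrt(x ** 2 + 1.0 / mu ** 2)  # gradient of the smooth |.| surrogate
        x -= step * (grad_fit + grad_pen)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 120))
x_true = np.zeros(120)
x_true[[5, 40, 90]] = [2.0, -1.5, 1.0]
y = A @ x_true + 0.05 * rng.standard_normal(60)

x_hat = smoothed_lasso(A, y, lam=0.5)
print("indices of the three largest recovered entries:", np.sort(np.argsort(-np.abs(x_hat))[:3]))
```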
Article
The statistics literature of the past 15 years has established many favorable properties for sparse diminishing-bias regularization: techniques that can roughly be understood as providing estimation under penalty functions spanning the range of concavity between ℓ0 and ℓ1 norms. However, lasso ℓ1-regularized estimation remains the standard tool for industrial Big Data applications because of its minimal computational cost and the presence of easy-to-apply rules for penalty selection. In response, this article proposes a simple new algorithm framework that requires no more computation than a lasso path: the path of one-step estimators (POSE) does ℓ1 penalized regression estimation on a grid of decreasing penalties, but adapts coefficient-specific weights to decrease as a function of the coefficient estimated in the previous path step. This provides sparse diminishing-bias regularization at no extra cost over the fastest lasso algorithms. Moreover, our gamma lasso implementation of POSE is accompanied by a reliable heuristic for the fit degrees of freedom, so that standard information criteria can be applied in penalty selection. We also provide novel results on the distance between weighted-ℓ1 and ℓ0 penalized predictors; this allows us to build intuition about POSE and other diminishing-bias regularization schemes. The methods and results are illustrated in extensive simulations and in application of logistic regression to evaluating the performance of hockey players. Supplementary materials for this article are available online.
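A rough sketch of path-wise one-step weighting in the spirit described above (illustrative only, not the gamma-lasso implementation): an ℓ1 problem is solved on a decreasing grid of penalties, and at each step coefficient-specific weights shrink for variables that were large at the previous step. Weighted ℓ1 is handled by the standard column-rescaling trick; the weight-update rule and the parameter gamma are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:4] = [3.0, -2.0, 1.5, 1.0]
y = X @ beta + 0.5 * rng.standard_normal(n)

gamma = 2.0                                        # assumed weight-decay parameter
weights = np.ones(p)
coef = np.zeros(p)
for lam in np.geomspace(1.0, 0.01, 20):            # decreasing penalty grid
    Xw = X / weights                               # weighted l1: penalize w_j * |b_j|
    fit = Lasso(alpha=lam, fit_intercept=False).fit(Xw, y)
    coef = fit.coef_ / weights                     # undo the column rescaling
    weights = 1.0 / (1.0 + gamma * np.abs(coef))   # smaller penalty for previously large coefficients
print("nonzero coefficients at the end of the path:", np.flatnonzero(np.abs(coef) > 1e-8))
```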
Conference Paper
Fluorescent calcium imaging provides a potentially powerful tool for inferring connectivity in large neural circuits. However, a key challenge in using calcium imaging for connectivity detection is that current systems often have a temporal response and frame rates that can be orders of magnitude slower than the underlying neural spiking process. Bayesian inference methods based on expectation-maximization (EM) have been proposed to overcome these limitations, but are often computationally demanding since the E-step in the EM procedure typically involves state estimation for a high-dimensional nonlinear dynamical system. In this work, we propose a computationally scalable method based on a hybrid of loopy belief propagation and approximate message passing (AMP). The key insight is that a neural system as viewed through calcium imaging can be factorized into simple scalar dynamical systems for each neuron with linear interconnections between the neurons. Using the structure, the updates in the proposed hybrid AMP methodology can be computed by a set of one-dimensional state estimation procedures and linear transforms with the connectivity matrix. The method extends earlier works by incorporating more general nonlinear dynamics and responses to stimuli.
Article
$\ell_1$ mean filtering is a conventional, optimization-based method to estimate the positions of jumps in a piecewise constant signal perturbed by additive noise. In this method, the $\ell_1$ norm penalizes sparsity of the first-order derivative of the signal. Theoretical results, however, show that in some situations, which can occur frequently in practice, even when the jump amplitudes tend to ∞, the conventional method identifies false change points. This issue, which is referred to as the stair-casing problem herein, restricts the practical importance of $\ell_1$ mean filtering. In this paper, sparsity is penalized more tightly than by the $\ell_1$ norm by exploiting a certain class of nonconvex functions, while the strict convexity of the consequent optimization problem is preserved. This results in a higher performance in detecting change points. To theoretically justify the performance improvements over $\ell_1$ mean filtering, deterministic and stochastic sufficient conditions for exact change point recovery are derived. In particular, theoretical results show that in the stair-casing problem, our approach might be able to exclude the false change points, while $\ell_1$ mean filtering may fail. A number of numerical simulations help to show the superiority of our method over $\ell_1$ mean filtering and another state-of-the-art algorithm that promotes sparsity more tightly than the $\ell_1$ norm. Specifically, it is shown that our approach can consistently detect change points when the jump amplitudes become sufficiently large, while the two other competitors cannot.
Chapter
Concentration of measure plays a central role in the content of this book. This chapter gives the first account of this subject. Bernstein-type concentration inequalities are often used to investigate the sums of random variables (scalars, vectors and matrices). In particular, we survey the recent status of sums of random matrices in Chap. 2, which gives us the straightforward impression of the classical view of the subject.
Article
Full-text available
The problem of the distributed recovery of jointly sparse signals has attracted much attention recently. Let us assume that the nodes of a network observe different sparse signals with common support; starting from linear, compressed measurements, and exploiting network communication, each node aims at reconstructing the support and the non-zero values of its observed signal. In the literature, distributed greedy algorithms have been proposed to tackle this problem, among which the most reliable ones require a large amount of transmitted data, which barely adapts to realistic network communication constraints. In this paper, we address the problem through a reweighted ℓ1 soft thresholding technique, in which the threshold is iteratively tuned based on the current estimate of the support. The proposed method adapts to constrained networks, as it requires only local communication among neighbors, and the transmitted messages are indices from a finite set. We analytically prove the convergence of the proposed algorithm and we show that it outperforms the state-of-the-art greedy methods in terms of balance between recovery accuracy and communication load.
Article
The construction of highly incoherent frames, sequences of vectors placed on the unit hypersphere of a finite-dimensional Hilbert space with low correlation between them, has proven very difficult. Algorithms proposed in the past have focused on minimizing the absolute values of the off-diagonal entries of the Gram matrix of these structures. Recently, a method based on convex optimization that operates directly on the vectors of the frame has been shown to produce promising results. This paper gives a detailed analysis of the optimization problem at the heart of this approach and, based on these insights, proposes a new method that substantially outperforms the initial approach and all current methods in the literature for all types of frames, with low and high redundancy. We give extensive experimental results that show the effectiveness of the proposed method and its application to optimized compressed sensing.
Conference Paper
A Magnetic Resonance Imaging (MRI) reconstruction algorithm using semi-PROPELLER compressed sensing is presented in this paper. It is shown that the introduced algorithm for estimating data shifts is feasible when super-resolution is applied. The proposed approach utilizes MRI PROPELLER sequences and improves MR image spatial resolution in circumstances when highly undersampled k-space trajectories are applied. Compressed Sensing (CS) aims at reconstructing signals and images from significantly fewer measurements than were conventionally assumed necessary. This diagnostic modality struggles with an inherently slow data acquisition process. The use of CS in MRI leads to substantial scan time reductions and visible benefits for patients and economic factors. In this report the objective is to combine a super-resolution image enhancement algorithm with both the PROPELLER sequence and the CS framework. All the techniques emphasize maximizing image sparsity in a known sparse transform domain while minimizing a data-fidelity term. The motion estimation algorithm, being part of the super-resolution reconstruction (SRR), estimates shifts for all blades jointly, emphasizing blade-pair correlations that are both strong and robust to noise.
Article
Full-text available
Sparse modeling has been widely and successfully used in many applications such as computer vision, machine learning, and pattern recognition, and, accompanying those applications, significant research has studied the theoretical limits and algorithm design for convex relaxations in sparse modeling. However, little has been done on the theoretical limits of non-negative versions of sparse modeling. The behavior is expected to be similar to that of general sparse modeling, but a precise analysis has not been explored. This paper studies the performance of non-negative sparse modeling, especially for non-negativity constrained and $\ell_1$-penalized least squares, and gives an exact bound for which this problem can recover the correct signal elements. We pose two conditions to guarantee correct signal recovery: the minimum coefficient condition (MCC) and the non-linearity vs. subset coherence condition (NSCC). The former defines the minimum weight for each of the correct atoms present in the signal and the latter defines the tolerable deviation from the linear model relative to the positive subset coherence (PSC), a novel type of "coherence" metric. We provide rigorous performance guarantees based on these conditions and experimentally verify their precise predictive power in a hyperspectral data unmixing application.
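A tiny sketch of the estimator analyzed above, non-negativity constrained and $\ell_1$-penalized least squares, using scikit-learn's Lasso with positive=True as a stand-in solver; the random dictionary and coefficients are illustrative rather than hyperspectral data.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 80, 200
X = rng.standard_normal((n, p))                      # illustrative random dictionary
coef_true = np.zeros(p)
coef_true[[3, 17, 42]] = [1.0, 0.5, 2.0]             # non-negative "abundances"
y = X @ coef_true + 0.05 * rng.standard_normal(n)

fit = Lasso(alpha=0.05, fit_intercept=False, positive=True).fit(X, y)
print("atoms with non-zero recovered weight:", np.flatnonzero(fit.coef_ > 1e-6))
```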
Article
This brief presents a systematic approach for the design of sparse dynamic output feedback control structures. A supplementary complexity cost function term is used to promote sparsity in the structure while optimizing an H₂ performance cost simultaneously. Optimization problems in which a combinatorial sparsity measure is combined with a nonlinear performance cost function are NP-hard. NP-hard problems do not have tractable solutions, requiring either a numerical solution or a relaxation into a solvable form. Relaxations will introduce conservatism, but at the same time retain stability and performance guarantees. In this brief, a new relaxation methodology is proposed, which allows the problem to be formulated as a convex semidefinite program. This is made possible via use of a new state-space form that establishes a direct relationship between the state-space and the resulting transfer function matrix parameters.
Article
Relay selection is a simple technique that achieves spatial diversity in cooperative relay networks. Generally, relay selection algorithms require channel state information (CSI) feedback from all cooperating relays in order to make a selection decision. This requirement poses two important challenges which are often neglected in the literature. Firstly, the fed back channel information is usually corrupted by additive noise. Secondly, CSI feedback generates a great deal of feedback overhead (air-time) that could result in significant performance hits. In this paper, we propose a compressive sensing (CS) based relay selection algorithm that reduces the feedback overhead of relay networks under the assumption of noisy feedback channels. The proposed algorithm exploits CS to first obtain the identity of a set of relays with favorable channel conditions. Following that, the CSI of the identified relays is estimated using least squares estimation without any additional feedback. Both single and multiple relay selection cases are considered. After deriving closed-form expressions for the asymptotic end-to-end SNR at the destination and the feedback load for different relaying protocols, we show that CS-based selection drastically reduces the feedback load and achieves a rate close to that obtained by selection algorithms with dedicated error-free feedback.
Article
In this paper we characterize sharp time-data tradeoffs for optimization problems used for solving linear inverse problems. We focus on the minimization of a least-squares objective subject to a constraint defined as the sub-level set of a penalty function. We present a unified convergence analysis of the gradient projection algorithm applied to such problems. We sharply characterize the convergence rate associated with a wide variety of random measurement ensembles in terms of the number of measurements and structural complexity of the signal with respect to the chosen penalty function. The results apply to both convex and nonconvex constraints, demonstrating that a linear convergence rate is attainable even though the least squares objective is not strongly convex in these settings. When specialized to Gaussian measurements our results show that such linear convergence occurs when the number of measurements is merely $4$ times the minimal number required to recover the desired signal at all (a.k.a. the phase transition). We also achieve a slower but geometric rate of convergence precisely above the phase transition point. Extensive numerical results suggest that the derived rates exactly match the empirical performance.
Article
Full-text available
The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm. (3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a Cp estimate of prediction error; this allows a principled choice among the range of possible LARS estimates. LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.
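scikit-learn exposes the LARS/lasso path described in point (1); the sketch below (synthetic data, illustrative settings) computes the full piecewise-linear coefficient path with lars_path.

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
n, p = 60, 15
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[1, 4, 9]] = [2.0, -1.0, 1.5]
y = X @ beta + 0.5 * rng.standard_normal(n)

# method="lasso" gives the LARS modification that computes the full lasso path.
alphas, active, coefs = lars_path(X, y, method="lasso")
print("active set at the end of the path:", active)
print("coefficient path shape (features x breakpoints):", coefs.shape)
```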
Conference Paper
Full-text available
We consider the problem of enforcing a sparsity prior in underdetermined linear problems, which is also known as sparse signal representation in overcomplete bases. The problem is combinatorial in nature, and a direct approach is computationally intractable, even for moderate data sizes. A number of approximations have been considered in the literature, including stepwise regression, matching pursuit and its variants, and, recently, basis pursuit ($\ell_1$) and also $\ell_p$-norm relaxations with $p < 1$. Although the exact notion of sparsity (expressed by an $\ell_0$-norm) is replaced by $\ell_1$ and $\ell_p$ norms in the latter two, it can be shown that under some conditions these relaxations solve the original problem exactly. The seminal paper of D.L. Donoho and X. Huo (see Stanford Univ. Tech. report: http://www-sccm.stanford.edu/pub/sccm/sccm02-17.pdf) establishes this fact for $\ell_1$ (basis pursuit) for a special case where the linear operator is composed of an orthogonal pair. We extend their results to a general underdetermined linear operator. Furthermore, we derive conditions for the equivalence of $\ell_0$ and $\ell_p$ problems, and extend the results to the problem of enforcing sparsity with respect to a transformation (which includes total variation priors as a special case). Finally, we describe an interesting result relating the sign patterns of solutions to the question of $\ell_1$-$\ell_0$ equivalence.
Article
Full-text available
Overcomplete representations are attracting interest in signal processing theory, particularly due to their potential to generate sparse representations of signals. However, in general, the problem of finding sparse representations must be unstable in the presence of noise. This paper establishes the possibility of stable recovery under a combination of sufficient sparsity and favorable structure of the overcomplete system. Considering an ideal underlying signal that has a sufficiently sparse representation, it is assumed that only a noisy version of it can be observed. Assuming further that the overcomplete system is incoherent, it is shown that the optimally sparse approximation to the noisy data differs from the optimally sparse decomposition of the ideal noiseless signal by at most a constant multiple of the noise level. As this optimal-sparsity method requires heavy (combinatorial) computational effort, approximation algorithms are considered. It is shown that similar stability is also available using the basis and the matching pursuit algorithms. Furthermore, it is shown that these methods result in sparse approximation of the noisy data that contains only terms also appearing in the unique sparsest representation of the ideal noiseless sparse signal.
Article
Full-text available
We consider the asymptotic behavior of regression estimators that minimize the residual sum of squares plus a penalty proportional to $\sum|\beta_j|^{\gamma}$ for some $\gamma > 0$. These estimators include the Lasso as a special case when $\gamma = 1$. Under appropriate conditions, we show that the limiting distributions can have positive probability mass at 0 when the true value of the parameter is 0. We also consider asymptotics for “nearly singular” designs.
Article
Full-text available
If a signal x is known to have a sparse representation with respect to a frame, it can be estimated from a noise-corrupted observation y by finding the best sparse approximation to y. Removing noise in this manner depends on the frame efficiently representing the signal while it inefficiently represents the noise. The mean-squared error (MSE) of this denoising scheme and the probability that the estimate has the same sparsity pattern as the original signal are analyzed. First an MSE bound that depends on a new bound on approximating a Gaussian signal as a linear combination of elements of an overcomplete dictionary is given. Further analyses are for dictionaries generated randomly according to a spherically-symmetric distribution and signals expressible with single dictionary elements. Easily-computed approximations for the probability of selecting the correct dictionary element and the MSE are given. Asymptotic expressions reveal a critical input signal-to-noise ratio for signal recovery.
Article
Full-text available
Suppose $x$ is an unknown vector in $\mathbb{R}^m$ (a digital image or signal); we plan to measure $n$ general linear functionals of $x$ and then reconstruct. If $x$ is known to be compressible by transform coding with a known transform, and we reconstruct via the nonlinear procedure defined here, the number of measurements $n$ can be dramatically smaller than the size $m$. Thus, certain natural classes of images with $m$ pixels need only $n = O(m^{1/4}\log^{5/2}(m))$ nonadaptive nonpixel samples for faithful recovery, as opposed to the usual $m$ pixel samples. More specifically, suppose $x$ has a sparse representation in some orthonormal basis (e.g., wavelet, Fourier) or tight frame (e.g., curvelet, Gabor), so the coefficients belong to an $\ell_p$ ball for $0 < p \le 1$. The $N$ most important coefficients in that expansion allow reconstruction with $\ell_2$ error $O(N^{1/2 - 1/p})$. It is possible to design $n = O(N\log(m))$ nonadaptive measurements allowing reconstruction with accuracy comparable to that attainable with direct knowledge of the $N$ most important coefficients. Moreover, a good approximation to those $N$ important coefficients is extracted from the $n$ measurements by solving a linear program: Basis Pursuit in signal processing. The nonadaptive measurements have the character of "random" linear combinations of basis/frame elements. Our results use the notions of optimal recovery, of $n$-widths, and information-based complexity. We estimate the Gel'fand $n$-widths of $\ell_p$ balls in high-dimensional Euclidean space in the case $0 < p \le 1$, and give a criterion identifying near-optimal subspaces for Gel'fand $n$-widths. We show that "most" subspaces are near-optimal, and show that convex optimization (Basis Pursuit) is a near-optimal way to extract information derived from these near-optimal subspaces.
Article
Full-text available
In previous work, Elad and Bruckstein (EB) have provided a sufficient condition for replacing an l<sub>0</sub> optimization by linear programming minimization when searching for the unique sparse representation. We establish here that the EB condition is both sufficient and necessary.
Article
Full-text available
An elementary proof of a basic uncertainty principle concerning pairs of representations of $\mathbb{R}^N$ vectors in different orthonormal bases is provided. The result, slightly stronger than stated before, has a direct impact on the uniqueness property of the sparse representation of such vectors using pairs of orthonormal bases as overcomplete dictionaries. The main contribution in this paper is the improvement of an important result due to Donoho and Huo (2001) concerning the replacement of the $\ell_0$ optimization problem by a linear programming (LP) minimization when searching for the unique sparse representation.
Article
Full-text available
The fundamental theorems on the asymptotic behavior of eigenvalues, inverses, and products of "finite section" Toeplitz matrices and Toeplitz matrices with absolutely summable elements are derived in a tutorial manner. Mathematical elegance and generality are sacrificed for conceptual simplicity and insight in the hopes of making these results available to engineers lacking either the background or endurance to attack the mathematical literature on the subject. By limiting the generality of the matrices considered, the essential ideas and results can be conveyed in a more intuitive manner without the mathematical machinery required for the most general cases. As an application, the results are applied to the study of the covariance matrices and their factors of linear models of discrete-time random processes.
Article
Full-text available
The pattern of zero entries in the inverse covariance matrix of a multivariate normal distribution corresponds to conditional independence restrictions between variables. Covariance selection aims at estimating those structural zeros from data. We show that neighborhood selection with the Lasso is a computationally attractive alternative to standard covariance selection for sparse high-dimensional graphs. Neighborhood selection estimates the conditional independence restrictions separately for each node in the graph and is hence equivalent to variable selection for Gaussian linear models. We show that the proposed neighborhood selection scheme is consistent for sparse high-dimensional graphs. Consistency hinges on the choice of the penalty parameter. The oracle value for optimal prediction does not lead to a consistent neighborhood estimate. Controlling instead the probability of falsely joining some distinct connectivity components of the graph, consistent estimation for sparse graphs is achieved (with exponential rates), even when the number of variables grows as the number of observations raised to an arbitrary power.
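A compact sketch of the neighborhood-selection recipe (lasso-regress each node on all others and place an edge wherever a coefficient is non-zero, here symmetrized with an OR rule, one of the two usual conventions); the chain-graph precision matrix and penalty level are illustrative choices, not those studied in the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
p, n = 10, 500
# A sparse precision matrix: a chain graph 0-1-2-...-9 (positive definite by construction).
Omega = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
Sigma = np.linalg.inv(Omega)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

adjacency = np.zeros((p, p), dtype=bool)
for j in range(p):
    others = [i for i in range(p) if i != j]
    fit = Lasso(alpha=0.05, fit_intercept=False).fit(X[:, others], X[:, j])
    adjacency[j, others] = np.abs(fit.coef_) > 1e-8
adjacency = adjacency | adjacency.T            # OR rule for symmetrization
print("estimated edges:", [(i, j) for i in range(p) for j in range(i + 1, p) if adjacency[i, j]])
```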
Article
This paper studies a difficult and fundamental problem that arises throughout electrical engineering, applied mathematics, and statistics. Suppose that one forms a short linear combination of elementary signals drawn from a large, fixed collection. Given an observation of the linear combination that has been contaminated with additive noise, the goal is to identify which elementary signals participated and to approximate their coefficients. Although many algorithms have been proposed, there is little theory which guarantees that these algorithms can accurately and efficiently solve the problem. This paper studies a method called convex relaxation, which attempts to recover the ideal sparse signal by solving a convex program. This approach is powerful because the optimization can be completed in polynomial time with standard scientific software. The paper provides general conditions which ensure that convex relaxation succeeds. As evidence of the broad impact of these results, the paper describes how convex relaxation can be used for several concrete signal recovery problems. It also describes applications to channel coding, linear regression, and numerical analysis.
Article
The time-frequency and time-scale communities have recently developed a large number of overcomplete waveform dictionaries-stationary wavelets, wavelet packets, cosine packets, chirplets, and warplets, to name a few. Decomposition into overcomplete systems is not unique, and several methods for decomposition have been proposed, including the method of frames (MOF), Matching pursuit (MP), and, for special dictionaries, the best orthogonal basis (BOB). Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest l(1) norm of coefficients among all such decompositions. We give examples exhibiting several advantages over MOF, MP, and BOB, including better sparsity and superresolution. BP has interesting relations to ideas in areas as diverse as ill-posed problems, in abstract harmonic analysis, total variation denoising, and multiscale edge denoising. BP in highly overcomplete dictionaries leads to large-scale optimization problems. With signals of length 8192 and a wavelet packet dictionary, one gets an equivalent linear program of size 8192 by 212,992. Such problems can be attacked successfully only because of recent advances in linear programming by interior-point methods. We obtain reasonable success with a primal-dual logarithmic barrier method and conjugate-gradient solver.
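The ℓ1 objective in Basis Pursuit can be recast as a linear program, which is how such problems are commonly solved; the following is a minimal sketch using SciPy's LP solver on a synthetic Gaussian dictionary. The dictionary, signal, and problem sizes are assumptions chosen for illustration, not data from the paper.

```python
# Minimal sketch: Basis Pursuit, min ||x||_1 subject to Ax = b, recast as an LP via
# the split x = u - v with u, v >= 0 and solved with SciPy's HiGHS backend.
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    n, m = A.shape
    c = np.ones(2 * m)                       # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([A, -A])                # A u - A v = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None), method="highs")
    u, v = res.x[:m], res.x[m:]
    return u - v

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 100))               # synthetic Gaussian dictionary
x_true = np.zeros(100)
x_true[[3, 40, 77]] = [2.0, -1.5, 1.0]
b = A @ x_true
x_hat = basis_pursuit(A, b)
print(np.flatnonzero(np.abs(x_hat) > 1e-6))  # typically recovers the support {3, 40, 77}
```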
Article
This paper considers the model problem of reconstructing an object from incomplete frequency samples. Consider a discrete-time signal f∈C<sup>N</sup> and a randomly chosen set of frequencies Ω. Is it possible to reconstruct f from the partial knowledge of its Fourier coefficients on the set Ω? A typical result of this paper is as follows. Suppose that f is a superposition of |T| spikes f(t)=Σ<sub>τ∈T</sub>f(τ)δ(t-τ) obeying |T|≤C<sub>M</sub>·(log N)<sup>-1</sup> · |Ω| for some constant C<sub>M</sub>>0. We do not know the locations of the spikes nor their amplitudes. Then with probability at least 1-O(N<sup>-M</sup>), f can be reconstructed exactly as the solution to the ℓ<sub>1</sub> minimization problem. In short, exact recovery may be obtained by solving a convex optimization problem. We give numerical values for C<sub>M</sub> which depend on the desired probability of success. Our result may be interpreted as a novel kind of nonlinear sampling theorem. In effect, it says that any signal made out of |T| spikes may be recovered by convex programming from almost every set of frequencies of size O(|T|·log N). Moreover, this is nearly optimal in the sense that any method succeeding with probability 1-O(N<sup>-M</sup>) would in general require a number of frequency samples at least proportional to |T|·log N. The methodology extends to a variety of other situations and higher dimensions. For example, we show how one can reconstruct a piecewise constant (one- or two-dimensional) object from incomplete frequency samples - provided that the number of jumps (discontinuities) obeys the condition above - by minimizing other convex functionals such as the total variation of f.
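A small-scale analogue of this reconstruction problem can be set up with a real orthonormal transform in place of the complex Fourier basis; the sketch below samples random rows of a DCT matrix and recovers a spike train by ℓ1 minimization with CVXPY. The choice of transform, the problem sizes, and the sparsity level are illustrative assumptions, not the regime analyzed in the paper.

```python
# Minimal sketch: l1 recovery of a spike train from randomly sampled DCT coefficients.
import numpy as np
import cvxpy as cp
from scipy.fft import dct

rng = np.random.default_rng(6)
N, n_samples, k = 128, 40, 3
F = dct(np.eye(N), norm="ortho", axis=0)             # N x N orthonormal DCT matrix
rows = rng.choice(N, n_samples, replace=False)        # randomly chosen "frequencies"

f = np.zeros(N)
f[rng.choice(N, k, replace=False)] = rng.normal(size=k) + 2.0   # k spikes
y = F[rows] @ f                                       # partial transform-domain samples

x = cp.Variable(N)
cp.Problem(cp.Minimize(cp.norm1(x)), [F[rows] @ x == y]).solve()
print(np.max(np.abs(x.value - f)))                    # small error indicates (near-)exact recovery
```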
Article
Exponential and Information Inequalities.- Gaussian Processes.- Gaussian Model Selection.- Concentration Inequalities.- Maximal Inequalities.- Density Estimation via Model Selection.- Statistical Learning.
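As one representative example of the kind of Gaussian concentration results treated in this monograph, the classical concentration inequality for Lipschitz functions of a standard Gaussian vector can be stated as follows (a standard formulation, quoted here for orientation rather than taken from the book's text).

```latex
% Concentration of an L-Lipschitz function F of a standard Gaussian vector X in R^n
% (standard statement, included for orientation only).
\[
  \Pr\bigl( \lvert F(X) - \mathbb{E}\,F(X) \rvert \ge t \bigr)
  \;\le\; 2\exp\!\Bigl(-\frac{t^2}{2L^2}\Bigr), \qquad t > 0 .
\]
```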
Article
OBJECTIVES: Prediction, Explanation, Elimination or What? How Many Variables in the Prediction Formula? Alternatives to Using Subsets. 'Black Box' Use of Best-Subsets Techniques.
LEAST-SQUARES COMPUTATIONS: Using Sums of Squares and Products Matrices. Orthogonal Reduction Methods. Gauss-Jordan v. Orthogonal Reduction Methods. Interpretation of Projections. Appendix A: Operation Counts for All-Subsets Regression.
FINDING SUBSETS WHICH FIT WELL: Objectives and Limitations of this Chapter. Forward Selection. Efroymson's Algorithm. Backward Elimination. Sequential Replacement Algorithm. Replacing Two Variables at a Time. Generating All Subsets. Using Branch-and-Bound Techniques. Grouping Variables. Ridge Regression and Other Alternatives. The Non-Negative Garrote and the Lasso. Some Examples. Conclusions and Recommendations.
HYPOTHESIS TESTING: Is There any Information in the Remaining Variables? Is One Subset Better than Another? Appendix A: Spjøtvoll's Method - Detailed Description.
WHEN TO STOP? What Criterion Should We Use? Prediction Criteria. Cross-Validation and the PRESS Statistic. Bootstrapping. Likelihood and Information-Based Stopping Rules. Appendix A: Approximate Equivalence of Stopping Rules.
ESTIMATION OF REGRESSION COEFFICIENTS: Selection Bias. Choice Between Two Variables. Selection Bias in the General Case, and its Reduction. Conditional Likelihood Estimation. Estimation of Population Means. Estimating Least-Squares Projections. Appendix A: Changing Projections to Equate Sums of Squares.
BAYESIAN METHODS: Bayesian Introduction. 'Spike and Slab' Prior. Normal Prior for Regression Coefficients. Model Averaging. Picking the Best Model.
CONCLUSIONS AND SOME RECOMMENDATIONS. REFERENCES. INDEX.
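Several of the stepwise strategies catalogued above are simple enough to sketch; the following is a minimal forward-selection illustration, not the book's implementation, and the fixed subset size k is an assumed stopping rule (the book discusses many alternatives).

```python
# Minimal sketch: greedy forward selection by residual sum of squares.
import numpy as np

def forward_selection(X, y, k):
    """Greedily add, k times, the column that most reduces the residual sum of squares."""
    n, p = X.shape
    selected = []
    for _ in range(k):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            beta, rss, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss_val = rss[0] if rss.size else np.sum((y - X[:, cols] @ beta) ** 2)
            if rss_val < best_rss:
                best_j, best_rss = j, rss_val
        selected.append(best_j)
    return selected

# Toy usage: two informative covariates out of eight.
rng = np.random.default_rng(7)
X = rng.normal(size=(50, 8))
y = X[:, 1] - 2 * X[:, 4] + 0.1 * rng.normal(size=50)
print(forward_selection(X, y, k=2))   # typically [4, 1]
```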
Article
In the paper I give a brief review of the basic idea and some history and then discuss some developments since the original paper on regression shrinkage and selection via the lasso.
Article
The following problem is considered: given a matrix A in R<sup>m×n</sup> (m rows and n columns), a vector b in R<sup>m</sup>, and ε > 0, compute a vector x satisfying ∥Ax − b∥<sub>2</sub> ≤ ε if one exists, such that x has the fewest non-zero entries over all such vectors. It is shown that the problem is NP-hard, but that the well-known greedy heuristic is good in that it computes a solution with at most ⌈18 Opt(ε/2) ∥Ã<sup>+</sup>∥<sub>2</sub><sup>2</sup> ln(∥b∥<sub>2</sub>/ε)⌉ non-zero entries, where Opt(ε/2) is the optimum number of nonzero entries at error ε/2, Ã is the matrix obtained by normalizing each column of A with respect to the L<sub>2</sub> norm, and Ã<sup>+</sup> is its pseudo-inverse.
Article
A formula for the general second-order moment of inverse elements of a Wishart matrix is derived from a result due to S. Das Gupta.
Conference Paper
The paper extends some recent results on sparse representations of signals in redundant bases developed in the noise-free case to the case of noisy observations. The type of question addressed so far is: given an (n,m)-matrix A with m>n and a vector b=Ax, find a sufficient condition for b to have a unique sparsest representation as a linear combination of the columns of A. The answer is a bound on the number of nonzero entries of, say, x<sub>o</sub>, that guarantees that x<sub>o</sub> is the unique and sparsest solution of Ax=b with b=Ax<sub>o</sub>. We consider the case b=Ax<sub>o</sub>+e where x<sub>o</sub> satisfies the sparsity conditions requested in the noise-free case and seek conditions on e, a vector of additive noise or modeling errors, under which x<sub>o</sub> can be recovered from b in a sense to be defined.
Article
This paper considers a natural error correcting problem with real valued input/output. We wish to recover an input vector f∈R<sup>n</sup> from corrupted measurements y=Af+e. Here, A is an m by n (coding) matrix and e is an arbitrary and unknown vector of errors. Is it possible to recover f exactly from the data y? We prove that under suitable conditions on the coding matrix A, the input f is the unique solution to the ℓ<sub>1</sub>-minimization problem (||x||<sub>ℓ1</sub>:=Σ<sub>i</sub>|x<sub>i</sub>|) min(g∈R<sup>n</sup>) ||y - Ag||<sub>ℓ1</sub> provided that the support of the vector of errors is not too large, ||e||<sub>ℓ0</sub>:=|{i:e<sub>i</sub> ≠ 0}|≤ρ·m for some ρ>0. In short, f can be recovered exactly by solving a simple convex optimization problem (which one can recast as a linear program). In addition, numerical experiments suggest that this recovery procedure works unreasonably well; f is recovered exactly even in situations where a significant fraction of the output is corrupted. This work is related to the problem of finding sparse solutions to vastly underdetermined systems of linear equations. There are also significant connections with the problem of recovering signals from highly incomplete measurements. In fact, the results introduced in this paper improve on our earlier work. Finally, underlying the success of ℓ<sub>1</sub> is a crucial property we call the uniform uncertainty principle that we shall describe in detail.
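The decoding step described above is itself an ℓ1 minimization that can be recast as a linear program; the sketch below does this with SciPy on synthetic data. The coding matrix, corruption level, and sizes are assumptions chosen for illustration.

```python
# Minimal sketch: l1 decoding, min_g ||y - A g||_1, recast as an LP with slacks t
# satisfying -t <= y - A g <= t.
import numpy as np
from scipy.optimize import linprog

def l1_decode(A, y):
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(m)])        # minimize the sum of slacks t
    A_ub = np.block([[ A, -np.eye(m)],                    #  A g - t <= y
                     [-A, -np.eye(m)]])                   # -A g - t <= -y
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * n + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n]

rng = np.random.default_rng(2)
A = rng.normal(size=(120, 40))            # coding matrix, m > n
f = rng.normal(size=40)                   # input vector
e = np.zeros(120)
e[rng.choice(120, 10, replace=False)] = 5.0 * rng.normal(size=10)
y = A @ f + e                             # 10 of 120 measurements grossly corrupted
print(np.max(np.abs(l1_decode(A, y) - f)))  # typically near solver tolerance: exact recovery
```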
Article
The purpose of this contribution is to extend some recent results on sparse representations of signals in redundant bases developed in the noise-free case to the case of noisy observations. The type of question addressed so far is as follows: given an (n,m)-matrix A with m>n and a vector b=Ax<sub>o</sub>, i.e., admitting a sparse representation x<sub>o</sub>, find a sufficient condition for b to have a unique sparsest representation. The answer is a bound on the number of nonzero entries in x<sub>o</sub>. We consider the case b=Ax<sub>o</sub>+e where x<sub>o</sub> satisfies the sparsity conditions requested in the noise-free case and e is a vector of additive noise or modeling errors, and seek conditions under which x<sub>o</sub> can be recovered from b in a sense to be defined. The conditions we obtain relate the noise energy to the signal level as well as to a parameter of the quadratic program we use to recover the unknown sparsest representation. When the signal-to-noise ratio is large enough, all the components of the signal are still present when the noise is deleted; otherwise, the smallest components of the signal are themselves erased in a quite rational and predictable way
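In the same spirit, a sparse representation can be recovered from noisy data in practice with an ℓ1-penalized quadratic program; the sketch below uses scikit-learn's Lasso, with the noise level and penalty chosen purely for illustration (they are not the tuning analyzed in the paper).

```python
# Minimal sketch: support recovery from b = A x0 + e via the Lasso (illustrative tuning).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, m, k = 100, 256, 5
A = rng.normal(size=(n, m)) / np.sqrt(n)            # roughly unit-norm columns
x0 = np.zeros(m)
x0[rng.choice(m, k, replace=False)] = 2.0 * rng.choice([-1.0, 1.0], k)
b = A @ x0 + 0.02 * rng.normal(size=n)              # noisy observations

fit = Lasso(alpha=0.002, fit_intercept=False, max_iter=50000).fit(A, b)
print(sorted(np.flatnonzero(np.abs(fit.coef_) > 1e-3)))   # estimated support
print(sorted(np.flatnonzero(x0)))                          # true support (typically identical)
```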
Article
This article presents new results on using a greedy algorithm, orthogonal matching pursuit (OMP), to solve the sparse approximation problem over redundant dictionaries. It provides a sufficient condition under which both OMP and Donoho's basis pursuit (BP) paradigm can recover the optimal representation of an exactly sparse signal. It leverages this theory to show that both OMP and BP succeed for every sparse input signal from a wide class of dictionaries. These quasi-incoherent dictionaries offer a natural generalization of incoherent dictionaries, and the cumulative coherence function is introduced to quantify the level of incoherence. This analysis unifies all the recent results on BP and extends them to OMP. Furthermore, the paper develops a sufficient condition under which OMP can identify atoms from an optimal approximation of a nonsparse signal. From there, it argues that OMP is an approximation algorithm for the sparse problem over a quasi-incoherent dictionary. That is, for every input signal, OMP calculates a sparse approximant whose error is only a small factor worse than the minimal error that can be attained with the same number of terms.
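For concreteness, here is a minimal OMP run on a synthetic exactly sparse signal, using scikit-learn's implementation; the dictionary, sparsity level, and sizes are illustrative assumptions.

```python
# Minimal sketch: orthogonal matching pursuit on an exactly k-sparse synthetic signal.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(4)
n, m, k = 80, 200, 4
D = rng.normal(size=(n, m))
D /= np.linalg.norm(D, axis=0)                      # unit-norm dictionary atoms
x0 = np.zeros(m)
x0[rng.choice(m, k, replace=False)] = rng.normal(size=k) + 2.0
s = D @ x0                                          # exactly k-sparse signal

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False).fit(D, s)
print(sorted(np.flatnonzero(omp.coef_)))            # selected atoms
print(sorted(np.flatnonzero(x0)))                   # true atoms (typically identical)
```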
Article
Suppose a discrete-time signal S(t), 0≤t<N, is a superposition of atoms taken from a combined time-frequency dictionary made of spike sequences 1<sub>{t=τ}</sub> and sinusoids exp{2πiwt/N}/√N. Can one recover, from knowledge of S alone, the precise collection of atoms going to make up S? Because every discrete-time signal can be represented as a superposition of spikes alone, or as a superposition of sinusoids alone, there is no unique way of writing S as a sum of spikes and sinusoids in general. We prove that if S is representable as a highly sparse superposition of atoms from this time-frequency dictionary, then there is only one such highly sparse representation of S, and it can be obtained by solving the convex optimization problem of minimizing the l<sup>1</sup> norm of the coefficients among all decompositions. Here “highly sparse” means that N<sub>t</sub>+N<sub>w</sub><√N/2 where N<sub>t</sub> is the number of time atoms, N<sub>w</sub> is the number of frequency atoms, and N is the length of the discrete-time signal. Underlying this result is a general l<sup>1</sup> uncertainty principle which says that if two bases are mutually incoherent, no nonzero signal can have a sparse representation in both bases simultaneously. For the above setting, the bases are sinusoids and spikes, and mutual incoherence is measured in terms of the largest inner product between different basis elements. The uncertainty principle holds for a variety of interesting basis pairs, not just sinusoids and spikes. The results have idealized applications to band-limited approximation with gross errors, to error-correcting encryption, and to separation of uncoordinated sources. Related phenomena hold for functions of a real variable, with basis pairs such as sinusoids and wavelets, and for functions of two variables, with basis pairs such as wavelets and ridgelets. In these settings, if a function f is representable by a sufficiently sparse superposition of terms taken from both bases, then there is only one such sparse representation; it may be obtained by minimum l<sup>1</sup> norm atomic decomposition. The condition “sufficiently sparse” becomes a multiscale condition; for example, that the number of wavelets at level j plus the number of sinusoids in the jth dyadic frequency band are together less than a constant times 2<sup>j/2</sup>.
Article
The purpose of model selection algorithms such as All Subsets, Forward Selection, and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression ("LARS"), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods.
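A small illustration of tracing the LARS path with scikit-learn follows; the toy data (two informative covariates out of ten) are an assumption for demonstration only.

```python
# Minimal sketch: tracing the LARS path on a toy regression problem.
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(5)
X = rng.normal(size=(60, 10))
y = 3 * X[:, 2] - 2 * X[:, 7] + 0.1 * rng.normal(size=60)

alphas, active, coefs = lars_path(X, y, method="lar")
print(active)            # order in which covariates enter the model (typically 2 and 7 first)
print(coefs.shape)       # (n_features, n_steps): coefficient values along the path
```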
Article
This paper deals with sparse approximations by means of convex combinations of elements from a predetermined "basis" subset S of a function space. Specifically, the focus is on the rate at which the lowest achievable error can be reduced as larger subsets of S are allowed when constructing an approximant. The new results extend those given for Hilbert spaces by Jones and Barron, including in particular a computationally attractive incremental approximation scheme. Bounds are derived for broad classes of Banach spaces; in particular, for L<sub>p</sub> spaces with 1 < p < ∞, the O(n<sup>-1/2</sup>) bounds of Barron and Jones are recovered when p = 2. One motivation for the questions studied here arises from the area of "artificial neural networks," where the problem can be stated in terms of the growth in the number of "neurons" (the elements of S) needed in order to achieve a desired error rate. The focus on non-Hilbert spaces is due to the desire to understand approximation in the more "robust" ...
Article
We obtain an asymptotically sharp error bound in the classical Sudakov-Fernique comparison inequality for finite collections of Gaussian random variables. Our proof is short and self-contained, and gives an easy alternative argument for the classical inequality, extended to the case of non-centered processes.
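For reference, the classical comparison inequality whose error term is sharpened in this work is usually stated as follows (standard formulation for centered Gaussian vectors, not the paper's new bound).

```latex
% Classical Sudakov--Fernique comparison for centered Gaussian vectors (X_i), (Y_i).
\[
  \mathbb{E}(X_i - X_j)^2 \le \mathbb{E}(Y_i - Y_j)^2 \ \ \text{for all } i, j
  \quad\Longrightarrow\quad
  \mathbb{E}\,\max_i X_i \le \mathbb{E}\,\max_i Y_i .
\]
```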
H. A. David and H. N. Nagaraja. Order Statistics. Wiley Series in Probability and Statistics. Wiley, New York, 2003.
R. A. DeVore and G. G. Lorentz. Constructive Approximation. Springer-Verlag, New York, NY, 1993.