4672 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 7, JULY 2011
Nearly Sharp Sufficient Conditions on
Exact Sparsity Pattern Recovery
Kamiar Rahnama Rad
Abstract—Consider the n-dimensional vector y = Xβ + ε, where β ∈ ℝ^p has only k nonzero entries and ε ∈ ℝⁿ is a Gaussian noise vector. This can be viewed as a linear system with sparsity constraints corrupted by noise, where the objective is to estimate the sparsity pattern of β given the observation vector y and the measurement matrix X. First, we derive a nonasymptotic upper bound on the probability that a specific wrong sparsity pattern is identified by the maximum-likelihood estimator. We find that this probability depends (inversely) exponentially on the difference of ‖Xβ‖₂ and the ℓ₂-norm of Xβ projected onto the range of columns of X indexed by the wrong sparsity pattern. Second, when X is randomly drawn from a Gaussian ensemble, we calculate a nonasymptotic upper bound on the probability of the maximum-likelihood decoder not declaring (partially) the true sparsity pattern. Consequently, we obtain sufficient conditions on the sample size n that guarantee almost surely the recovery of the true sparsity pattern. We find that the required growth rate of the sample size n matches the growth rate of previously established necessary conditions.

Index Terms—Hypothesis testing, random projections, sparsity pattern recovery, subset selection, underdetermined systems of equations.
I. INTRODUCTION

FINDING solutions to underdetermined systems of equations arises in a wide array of problems in science and technology; examples include array signal processing [1], and neural [2] and genomic [3] data analysis, to name a few. In many of these applications, it is natural to seek sparse solutions of such systems, i.e., solutions with few nonzero elements. A common setting is when we believe, or know a priori, that only a small subset of the candidate sources, neurons, or genes influence the observations, but their location is unknown.

More concretely, the problem we consider is that of estimating the support of β ∈ ℝ^p, given the a priori knowledge that only k of its entries are nonzero, based on the observational model

y = Xβ + ε    (1)

where X ∈ ℝ^{n×p} is a collection of input measurement vectors, y ∈ ℝⁿ is the output measurement, and ε ∈ ℝⁿ is the additive measurement noise, assumed to be zero mean and with known covariance.
Manuscript received October 02, 2009; revised August 19, 2010; accepted January 28, 2011. Date of current version June 22, 2011. This work was presented at the 43rd Annual Conference on Information Sciences and Systems, March 2009.
The author is with the Department of Statistics, Columbia University, New
York, NY 10027 USA (e-mail: kamiar@stat.columbia.edu).
Communicated by J. Romberg, Associate Editor for Signal Processing.
Digital Object Identifier 10.1109/TIT.2011.2145670
Without loss of generality, the noise covariance is taken to be the identity. Each row of X and the corresponding entry of y are viewed as an input and an output measurement, respectively.

The output of the optimal (sparsity) decoder is defined as the support set of the sparse solution β̂ with support size k that minimizes the residual sum of squares

β̂ = argmin_{b : |supp(b)| = k} ‖y − Xb‖₂²    (2)

where β̂ is the optimal estimate of β given the a priori information of sparseness. The support set of β̂ is optimal in the sense of minimizing the probability of identifying a wrong sparsity pattern.

First, we are concerned with the likelihood of the sparsity pattern of β̂ as a function of X and β. We obtain an upper bound on the probability that β̂ has any specific sparsity pattern and find that this bound depends (inversely) exponentially on the difference of ‖Xβ‖₂ and the ℓ₂-norm of Xβ projected onto the range of columns of X indexed by the wrong sparsity pattern.

Second, when the entries of X are independent and identically distributed (i.i.d.) random variables, we are concerned with establishing sufficient conditions that guarantee the reliability of sparsity pattern recovery. Ideally, we would like to characterize such conditions based on a minimal number of parameters, including the sparsity level k, the signal dimension p, the number of measurements n, and the signal-to-noise ratio (SNR), which is equal to

SNR = E‖Xβ‖₂² / E‖ε‖₂² = ‖β‖₂².    (3)

Assume that the absolute value of the nonzero entries of β is lower bounded by β_min > 0.² Further, suppose that the variance of the entries of ε is equal to one.¹ Hence SNR ≥ k·β_min², and therefore it is natural to ask: how does the ability to reliably estimate the sparsity pattern depend on n, p, k, and β_min?

We find that a nonasymptotic upper bound on the probability of the maximum-likelihood decoder not declaring the true sparsity pattern can be found when the entries of the measurement matrix are i.i.d. normal random variables. This allows us to obtain sufficient conditions on the number of measurements n as a function of p, k, and β_min. We show that our results strengthen earlier sufficient conditions for reliable sparsity recovery
¹This entails no loss of generality, by standard rescaling of y.
²To the best of our knowledge, Wainwright [4] was the first to formulate the information theoretic limitations of sparsity pattern recovery using β_min as one of the key parameters.
0018-9448/$26.00 © 2011 IEEE
[4]–[7], and we show that the sufficient conditions on n match the growth rate of the necessary conditions in both the linear, i.e., k proportional to p, and the sublinear, i.e., k = o(p), regimes, as long as mild conditions on the scaling of k and β_min hold.
A. Previous Work

A large body of recent work, including [4]–[10], analyzed reliable sparsity pattern recovery exploiting optimal and suboptimal decoders for large random Gaussian measurement matrices. The average error probability and necessary and sufficient conditions for sparsity pattern recovery for Gaussian measurement matrices were analyzed in [4] in terms of n, p, k, and β_min. As a generalization of that work, using the Fano inequality, necessary conditions for general random and sparse measurement matrices were presented in [8]. The sufficient conditions in [6] were obtained based on a simple maximum correlation algorithm and a closely related thresholding estimator discussed in [11]. In addition to the well-known formulation of the necessary and sufficient conditions based on n, p, k, and SNR, Fletcher et al. [6] included the maximum-to-average ratio³ of β in their analysis. Necessary and sufficient conditions for fractional sparsity pattern recovery were analyzed in [5], [9].

We will discuss the relationship to this work below in more depth, after describing our analysis and results in more detail.
B. Notation

The following conventions will remain in effect throughout this paper. Calligraphic letters are used to indicate sparsity patterns, defined as sets of integers between 1 and p, with cardinality k. We say β has sparsity pattern S if the entries of β with indices in S are nonzero. S ∖ T stands for the set of indices that are in S but not in T. We denote by X_S the matrix obtained from X by extracting the columns with indices in S. Let supp(β) stand for the sparsity pattern or support set of β, and |S| for the cardinality of S. Let the norm ‖A‖ of a matrix A be defined as the operator norm; note that if A is a positive semi-definite matrix then ‖A‖ is equal to the top eigenvalue of A. The matrix P_S is defined as the orthonormal operator projecting onto the subspace spanned by the columns of X_S. Except for the matrix norm, all vector norms are ℓ₂ norms.
II. RESULTS

For the observational model in (1), assume that the true sparsity pattern is S*; as a result

y = X_{S*} β_{S*} + ε.    (4)

We first state a result on the probability of the event {Ŝ = S}, i.e., that the decoder declares the sparsity pattern S instead of the true pattern S*, for any S ≠ S* and any measurement matrix X.

³The maximum-to-average ratio of β was defined as the largest squared entry of β divided by the average squared entry, k·max_j β_j² / ‖β‖₂².
Theorem 1: For the observational model of (4) and the estimate Ŝ in (2), the bound P(Ŝ = S) ≤ [displayed expression not recovered in this extraction] holds, where the exponent is controlled by the ℓ₂-norm of X_{S*}β_{S*} projected onto the orthogonal complement of the column space of X_S.
The proof of Theorem 1, given in Section III, employs the Chernoff technique and the properties of the eigenvalues of the difference of projection matrices to bound the probability of declaring a wrong sparsity pattern S instead of the true one S*, as a function of the measurement matrix X and the true parameter β. The error rate decreases exponentially in the norm of the projection of Xβ onto the orthogonal complement of the subspace spanned by the columns of X_S. This is in agreement with the intuition that the closer different subspaces corresponding to different sets of columns of X are, the harder it is to differentiate them, and hence the higher the error probability will be.

The theorem below gives a nonasymptotic bound on the probability of the event that the declared sparsity pattern Ŝ differs from the true sparsity pattern S* in no more than a given number of indices, when the entries of the measurement matrix X are drawn i.i.d. from a standard normal distribution. It is clear that by letting the number of allowed mismatched indices go to zero we obtain an upper bound on the error probability of exact sparsity pattern recovery.
Theorem 2: Suppose that, for the observational model of (4) and the estimate Ŝ in (2), the entries of X are i.i.d. N(0, 1). If the condition shown at the bottom of the page [not recovered in this extraction] holds, then the error probability obeys the stated bound, with the constants as defined there.

The key elements in the proof include Theorem 1, the application of union bounds (a fairly standard technique which has been
used before for this problem [4], [5], [7]), the asymptotic behavior of binomial coefficients, and properties of convex functions.
Note that in the linear regime, i.e., when k is proportional to p, the probability of misidentifying more than any fraction (less than one) of the support goes to zero exponentially fast as n → ∞. In words, if the SNR is fixed while the dimension of the signal increases unboundedly, it is still possible to recover reliably some fraction of the support. This is in agreement with previous results on partial sparsity pattern recovery [5], [9].
If we let n, k, and β_min scale as functions of p, then the upper bound of Theorem 2 scales accordingly. For n large enough, or equivalently when the conditions below hold, the probability of error is bounded above by a summable sequence in p. Therefore the series

∑_p P(Ŝ ≠ S*)    (5)

is finite and, as a consequence of the Borel–Cantelli lemma, for large enough p, the decoder declares the true sparsity pattern almost surely. In other words, the estimate Ŝ based on (2) achieves the same loss as an oracle which is supplied with perfect information about which coefficients of β are nonzero. The following corollary summarizes the aforementioned statements.
Corollary 3: For the observational model of (4) and the estimate Ŝ in (2), let n, k, and β_min scale as functions of p. Then there exists a constant c such that if the growth conditions [not recovered in this extraction] hold, then a.s. for large enough p, Ŝ achieves the same loss as an oracle which is supplied with perfect information about which coefficients of β are nonzero.
Remarks:
• The first condition is required to ensure that, for a sufficiently large p, the first of the two inequalities holds, where the quantities involved are defined in Theorem 1.
• The second condition is required to ensure that, for a sufficiently large p, the second inequality holds, where the relevant quantity is defined in Theorem 1.
The sufficient conditions in Corollary 3 can be compared against similar conditions for exact sparsity pattern recovery in [4]–[7]; for example, in the sublinear regime, [4], [7] proved that one scaling of n is sufficient under suitable conditions, and [5], [6] proved that another scaling is sufficient. In that vein, according to Corollary 3, a weaker requirement on n suffices to ensure exact sparsity pattern recovery while achieving the same performance; therefore, it strengthens these earlier results.
What remains is to see whether the sufficient conditions in Corollary 3 match the necessary conditions proved in [8]:

Theorem 4 [8]: Suppose that the entries of the measurement matrix X are drawn i.i.d. from any distribution with zero mean and variance one. Then a necessary condition for asymptotically reliable recovery is that n satisfies the stated lower bound [not recovered in this extraction].
The necessary condition in Theorem 4 asymptotically resembles the sufficient condition in Corollary 3; recall that SNR ≥ k·β_min². The sufficient conditions of Corollary 3 can be compared against the necessary conditions in [8] for exact sparsity pattern recovery, as shown in Table I. The first paper to establish the sufficient conditions in row 1 and row 4 of Table I is [10]. The sufficient conditions presented in the first four rows of Table I are a consequence of past work [4], also recovered by Corollary 3. The new, stronger result in this paper provides the sufficient conditions in rows 5 and 6, which did not appear in previous studies [4]–[7], and match the necessary conditions presented in [8]. (It is worth reminding that these results are restricted to the regimes shown in the table.)
III. PROOF OF THEOREM 1
We first state three basic lemmas.

Lemma 5: If any 2k columns of the matrix X are linearly independent, then for any sparsity pattern S such that S ≠ S* and |S| = k, the difference of projection matrices P_S − P_{S*} has pairs of nonzero positive and negative eigenvalues, bounded above by one and bounded below by negative one, respectively, and equal in magnitude.

Lemma 6: For the stated parameter range, we have the Gaussian moment-generating-function bound [not recovered in this extraction].

Lemma 7: For the stated quantities, we have the eigenvalue bounds [not recovered in this extraction].

We defer the proofs of Lemmas 5 and 7 to after the proof of Theorem 1. Lemma 6 follows from standard Gaussian integrals [12].
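The eigenvalue structure asserted in Lemma 5 can be checked numerically. The sketch below (sizes and seed are illustrative assumptions) builds projection matrices for two size-k supports of a random Gaussian X with n ≥ 2k, so that any 2k columns are almost surely linearly independent, and verifies that the spectrum of the difference is symmetric about zero and contained in [−1, 1].

```python
# Numerical sanity check of Lemma 5: the spectrum of P_S - P_S* is
# symmetric about zero (+/- pairs, plus zeros) and bounded by one.
import numpy as np

def proj(A):
    """Orthogonal projection onto the column space of A."""
    return A @ np.linalg.pinv(A)

rng = np.random.default_rng(1)
n, p, k = 10, 6, 3                          # n >= 2k as Lemma 5 requires
X = rng.standard_normal((n, p))
S_true, S_wrong = [0, 1, 2], [0, 3, 4]      # two size-k sparsity patterns
D = proj(X[:, S_wrong]) - proj(X[:, S_true])

ev = np.sort(np.linalg.eigvalsh(D))         # D is symmetric
print(np.max(np.abs(ev)) <= 1 + 1e-9)       # eigenvalues bounded by one
print(np.allclose(ev, -ev[::-1]))           # spectrum symmetric about zero
```

Both checks print True for a generic draw; the shared index 0 contributes a zero eigenvalue, which is consistent with the symmetric pairing.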
A. Proof of Theorem 1

For a given sparsity pattern S, the minimum residual sum of squares is achieved by the least-squares fit over that support; that is,

min_{b : supp(b) ⊆ S} ‖y − Xb‖₂² = ‖(I − P_S) y‖₂²,

where P_S denotes the orthogonal projection operator onto the column space of X_S. That is, among all sparsity patterns with size k, the optimum decoder declares
TABLE I
NECESSARY AND SUFFICIENT CONDITIONS ON THE NUMBER OF MEASUREMENTS n REQUIRED FOR RELIABLE SUPPORT RECOVERY IN THE LINEAR AND THE SUBLINEAR REGIMES. THE SUFFICIENT CONDITIONS PRESENTED IN THE FIRST FOUR ROWS ARE A CONSEQUENCE OF PAST WORK [4], ALSO RECOVERED BY COROLLARY 3. THE NEW STRONGER RESULT IN THIS PAPER PROVIDES THE SUFFICIENT CONDITIONS IN ROWS 5 AND 6, WHICH DID NOT APPEAR IN PREVIOUS STUDIES [4]–[7], AND MATCH THE NECESSARY CONDITIONS PRESENTED IN [8]
as the optimum estimate of the true sparsity pattern in terms of minimum error probability. Recall the definition of Ŝ in (2) and note that Ŝ minimizes ‖(I − P_S) y‖₂² over all S with |S| = k. If the decoder incorrectly declares S instead of the true sparsity pattern S*, then ‖(I − P_S) y‖₂² ≤ ‖(I − P_{S*}) y‖₂², or equivalently yᵀ(P_S − P_{S*}) y ≥ 0. The probability that the optimal decoder wrongly declares the sparsity pattern S instead of the true sparsity pattern S* is less than the probability that this inequality holds. With the aid of the Chernoff technique, an upper bound on the probability of this event is obtained.
Note that yᵀ(P_S − P_{S*}) y is a random variable that has a quadratic form in Gaussian random vectors. This allows us to use standard Gaussian integrals to calculate its moment-generating function. In order to bound the expectation, the Chernoff parameter is required to be bounded, which is a necessary condition in Lemma 6. From Lemma 6, we learn the bound labeled (6) [equation not recovered in this extraction], where we made the abbreviations listed there. For Lemma 6, we need the parameter to lie in an admissible range, and we prove in Lemma 5 that the eigenvalues of P_S − P_{S*} are bounded in absolute value by one; consequently, (6) holds on the stated range.

With the aid of the definition of the operator norm of matrices, applied to P_S − P_{S*}, the first term on the r.h.s. of (6) can be bounded as in (7). Since X_{S*}β_{S*} lies in the subspace spanned by the columns of X_{S*}, we have P_{S*}X_{S*}β_{S*} = X_{S*}β_{S*}, which yields the corresponding simplifications, and similarly for the other term. The aforementioned equations and the inequality (7) yield the upper bound shown in (8), as found at the bottom of the page.
Lemma 7 introduces an upper bound and a lower bound that can be used to further simplify the upper bound in (8). The main ingredient in the proof of Lemma 7 is the eigenvalue properties of P_S − P_{S*}
that were established in Lemma 5. Substituting the bounds obtained in Lemma 7 in (8), we obtain (9). Finally, to prove Theorem 1, we take the infimum of (9) over the Chernoff parameter, which is attained at the stated value, and obtain the desired bound as shown in the equation at the bottom of the page.
Now we prove the remaining lemmas.

Proof of Lemma 5: Before we prove the result, let us introduce some notation.
• For any sparsity pattern S, let V_S be the linear subspace spanned by the columns of X_S;
• V_S^⊥ stands for the subspace orthogonal to V_S;
• the remaining shorthands stand for the relevant intersections of these subspaces, respectively;
• and finally, for any subspace V, P_V designates the orthogonal projection onto V. (With a slight abuse of notation, for any sparsity pattern S, we use P_S instead of P_{V_S}.)

It is worthwhile noting that the relevant intersection is empty. From [13, Lemma 4.1], the spectral identities (10) and (11) hold for any such pair of subspaces, which yields the corresponding relation between the projections; consequently, (12) follows. Since any set of columns of X with size less than or equal to 2k is linearly independent, for any S with |S| = k we have the dimension counts (13) and (14); therefore the claimed dimensions follow. The dimension of the null space of P_S − P_{S*} equals the stated value, so P_S − P_{S*} has eigenvalue zero with that multiplicity. The range of P_S − P_{S*} has the complementary dimension; therefore, P_S − P_{S*} has nonzero eigenvalues with absolute value less than or equal to one. (The eigenvalues are equal to one only in the degenerate case noted above.) If v is an eigenvector of P_S − P_{S*} with eigenvalue λ, then the defining relations hold. Next, we prove that a vector constructed from v is an eigenvector of P_S − P_{S*} with eigenvalue −λ; the computation exploits the definition of the eigenvector v. This means that for every eigenvector with eigenvalue λ there exists another eigenvector with eigenvalue −λ.
Proof of Lemma 7: From Lemma 5, we know that P_S − P_{S*} has pairs of nonzero positive and negative eigenvalues whose magnitudes are equal. Let the positive eigenvalues be denoted by λ₁, λ₂, …; then the quantity of interest can be expressed in terms of them. Since the eigenvalues are bounded by one, again by Lemma 5, the quantity is lower bounded accordingly, and the first bound of the lemma follows.
To prove the second bound, note that the relevant matrix has:
• eigenvalues equal to the first stated value, with the first stated multiplicity;
• eigenvalues equal to the second stated value, with the second stated multiplicity;
• and the remaining eigenvalues equal to one.
It is not hard to see that, because of these multiplicities, the top eigenvalue of the matrix is bounded above by the stated value, and hence the second bound follows.
IV. PROOF OF THEOREM 2

We state two simple lemmas used to prove Theorem 2.

Lemma 8: For Gaussian measurement matrices, with the stated parameters, the average error probability that the optimum decoder declares S instead of S* is bounded by the stated chi-square bound, where the exponent is as defined there.

Lemma 9: For the function defined on the positive integers in the proof below, if condition (15) holds, then the claimed endpoint bound follows.

Before we prove the two lemmas, let us see how they imply Theorem 2.
A. Proof of Theorem 2

In order to find conditions under which the error probability asymptotically goes to zero, we exploit the union bound in conjunction with counting arguments and the previously stated two lemmas.

First, note that the error event can be written as the union of the events {Ŝ = S} over all sparsity patterns S that differ from S* in more than the allowed number of indices. The union bound allows us to bound the probability of this event by the sum of the probabilities of the events {Ŝ = S}.

Lemma 8, which is based on generating functions of chi-square distributions, introduces an upper bound on the probability of the event {Ŝ = S}. If we replace the signal-dependent quantity with the lower bound implied by the definition of β_min, we obtain an upper bound that does not depend on the particular S as long as |S ∖ S*| is fixed. The number of sparsity patterns S that differ from S* in exactly i elements is (k choose i)·(p − k choose i). Therefore, we can bound the binomial coefficients by a standard inequality. To summarize, exploiting these bounds, we arrive at (16).
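The counting step can be sanity-checked directly: dropping i indices of S* and choosing i replacements outside S* gives C(k, i)·C(p − k, i) patterns, and summing over i recovers all C(p, k) size-k patterns by Vandermonde's identity. The dimensions below are illustrative.

```python
# Counting size-k supports that differ from a fixed size-k support S*
# in exactly i indices: C(k, i) * C(p - k, i).
from math import comb

def num_patterns_differing(p, k, i):
    """Number of size-k supports differing from a fixed one in exactly i indices."""
    return comb(k, i) * comb(p - k, i)

p, k = 12, 4
counts = [num_patterns_differing(p, k, i) for i in range(k + 1)]
print(counts)                       # counts[0] == 1 is the true pattern itself
print(sum(counts) == comb(p, k))    # True: every size-k pattern counted once
```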
Let the exponent in (16), viewed as a function of the number i of misidentified indices, be the function to which Lemma 9 is applied, with the auxiliary quantities defined accordingly. From Lemma 9, we know that if condition (17) holds, then the exponent is maximized at the endpoints of its range, and therefore the bound (18) follows. For the bound to go to 0, it suffices that both endpoint terms go to 0 fast enough. In the statement of Theorem 2, we have the condition (19), which results in the upper bound (20).
Hence, if condition (21) holds, then we get the equation shown at the bottom of the page. Therefore, inequalities (17) and (21), which are the main conditions in Theorem 2, imply that the error probability vanishes as claimed.
Now we prove the remaining lemmas.
Proof of Lemma 8: The columns of X_{S∖S*} and X_{S*} are, by definition, disjoint and therefore independent Gaussian random matrices, with column spaces spanning random independent |S ∖ S*|- and k-dimensional subspaces, respectively. The Gaussian random vector X_{S*∖S} β_{S*∖S} has i.i.d. Gaussian entries with variance ‖β_{S*∖S}‖₂². Therefore, we conclude that, since this random Gaussian vector is projected onto the subspace orthogonal to the random column space of X_S, the resulting normalized quantity is a chi-square random variable with the appropriate number of degrees of freedom. The first inequality in the resulting chain follows from Theorem 1, and the second equality comes from the well-known formula (see, for example, [12]) for the moment-generating function of a chi-square random variable: if Z is chi-square with d degrees of freedom, then E[e^{tZ}] = (1 − 2t)^{−d/2} for t < 1/2.
Proof of Lemma 9: Let us first explain the idea behind this lemma. We aim to prove that, under certain conditions, the function is decreasing up to some point and increasing thereafter, so that its maximum over the range of interest is attained at one of the endpoints; this yields the desired upper bound (22). We begin by taking derivatives of the function to prove the aforementioned claim. Note that in the following steps, we use inequality (15) to prove inequality (22).

1. Due to the positivity of the denominator and the quadratic and concave nature of the numerator of the derivative, setting the derivative to zero has two solutions i₁ and i₂ with i₁ ≤ i₂, and we have:
(a) the derivative is negative for i < i₁;
(b) the derivative is positive for i₁ < i < i₂;
(c) the derivative is negative for i > i₂.

2. From inequality (15), we obtain a bound which ensures convexity of the function on part of the range and negativity of the derivative on the rest. We have two situations, depending on whether the critical point falls inside the range or not:
1) In the first case, inequality (15), in conjunction with the sign pattern above, implies that the function is convex and decreasing on the relevant range.
2) In the second case, the function is convex on the relevant range.

3. In either case, i.e., convex or decreasing throughout, the maximum over the range is attained at an endpoint, which proves the desired inequality (22).
V. CONCLUSION
In this paper, we examined the probability that the optimal decoder declares an incorrect sparsity pattern. We obtained an upper bound for any generic measurement matrix, and this allowed us to calculate the error probability in the case of random measurement matrices. In the special case when the entries of the measurement matrix are i.i.d. normal random variables, we computed an upper bound on the expected error probability. Sufficient conditions on exact sparsity pattern recovery were obtained, and they were shown to improve the previous results [4]–[7]. Moreover, these results asymptotically match (in terms of growth rate) the corresponding necessary conditions presented in [8]. An interesting open problem is to extend the sufficient conditions derived in this work to non-Gaussian and sparse measurement matrices.
ACKNOWLEDGMENT
The author would like to express his gratitude to V. Roy-
chowdhury for introducing him to this subject. The author
is grateful to I. Kontoyiannis, L. Paninski, X. Pitkov, and Y.
Mishchenko for careful reading of the manuscript and fruitful
discussions, and to the referees for their critical comments that
improved the presentation of the manuscript.
REFERENCES
[1] M. Zibulevsky and B. Pearlmutter, “Blind source separation by sparse
decomposition in a signal dictionary,” Neur. Comput., vol. 13, pp.
863–882, 2001.
[2] W. Vinje and J. Gallant, “Sparse coding and decorrelation in primary
visual cortex during natural vision,” Science, vol. 287, no. 5456, pp.
1273–1276, 2000.
[3] D. di Bernardo, M. J. Thompson, T. Gardner, S. E. Chobot, E. L.
Eastwood, A. P. Wojtovich, S. J. Elliott, S. Schaus, and J. J. Collins,
“Chemogenomic profiling on a genome-wide scale using reverse-engi-
neered gene networks,” Nat. Biotech, vol. 23, pp. 377–383, Mar. 2005.
[4] M. Wainwright, “Information-theoretic limitations on sparsity recovery in the high-dimensional and noisy setting,” IEEE Trans. Inf. Theory, vol. 55, no. 12, pp. 5728–5741, Dec. 2009.
[5] M. Akcakaya and V. Tarokh, “Shannon-theoretic limits on noisy
compressive sampling,” IEEE Trans. Inf. Theory, vol. 56, no. 1, pp.
492–504, Jan. 2010.
[6] A. Fletcher, S. Rangan, and V. Goyal, “Necessary and sufficient conditions for sparsity pattern recovery,” IEEE Trans. Inf. Theory, vol. 55, no. 12, pp. 5758–5772, Dec. 2009.
[7] A. Karbasi, A. Hormati, S. Mohajer, and M. Vetterli, “Support recovery in compressed sensing: An estimation theoretic approach,” in Proc. 2009 IEEE Int. Symp. Information Theory, 2009.
[8] W. Wang, M. Wainwright, and K. Ramchandran, “Information-theoretic limits on sparse signal recovery: Dense versus sparse measurement matrices,” IEEE Trans. Inf. Theory, vol. 56, no. 6, pp. 2967–2979, Jun. 2010.
[9] G. Reeves and M. Gastpar, “Sampling bounds for sparse support re-
covery in the presence of noise,” in Proc. IEEE Int. Symp. Information
Theory, 2008, pp. 2187–2191.
[10] M. J. Wainwright, “Sharp thresholds for noisy and high-dimensional recovery of sparsity using ℓ₁-constrained quadratic programming (Lasso),” IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 2183–2202, May 2009.
[11] H. Rauhut, K. Schnass, and P. Vandergheynst, “Compressed sensing
and redundant dictionaries,” IEEE Trans. Inf. Theory, vol. 54, no. 5,
pp. 2210–2219, May 2008.
[12] T. A. Severini, Elements of Distribution Theory. Cambridge, U.K.: Cambridge University Press, 2005.
[13] P. Bjorstad and J. Mandel, “On the spectra of sums of orthogonal pro-
jections with applications to parallel computing,” BIT Numer. Math.,
vol. 31, pp. 76–88, 1991.
Kamiar Rahnama Rad was born in Darmstadt, Germany. He received the
B.Sc. degree in electrical engineering from Sharif University of Technology,
Tehran, in 2004 and the M.Sc. degree in electrical engineering from University
of California, Los Angeles, in 2006. He is currently pursuing the Ph.D. degree
in statistics at Columbia University, New York.
His research interests include information theory, computational neuro-
science, and social learning theory.