4672IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 7, JULY 2011

Nearly Sharp Sufficient Conditions on

Exact Sparsity Pattern Recovery

Kamiar Rahnama Rad

Abstract—Consider the n-dimensional vector y = Aβ + w, where β ∈ R^m has only k nonzero entries and w ∈ R^n is Gaussian noise. This can be viewed as a linear system with sparsity constraints corrupted by noise, where the objective is to estimate the sparsity pattern of β given the observation vector y and the measurement matrix A. First, we derive a nonasymptotic upper bound on the probability that a specific wrong sparsity pattern is identified by the maximum-likelihood estimator. We find that this probability depends (inversely) exponentially on the difference of ‖Aβ‖₂ and the ℓ₂-norm of Aβ projected onto the range of the columns of A indexed by the wrong sparsity pattern. Second, when A is randomly drawn from a Gaussian ensemble, we calculate a nonasymptotic upper bound on the probability of the maximum-likelihood decoder not declaring (partially) the true sparsity pattern. Consequently, we obtain sufficient conditions on the sample size n that guarantee almost surely the recovery of the true sparsity pattern. We find that the required growth rate of the sample size n matches the growth rate of previously established necessary conditions.

Index Terms—Hypothesis testing, random projections, sparsity pattern recovery, subset selection, underdetermined systems of equations.

I. INTRODUCTION

Finding solutions to underdetermined systems of equations arises in a wide array of problems in science and technology; examples include array signal processing [1], and neural [2] and genomic data analysis [3], to name a few. In many of these applications, it is natural to seek sparse solutions of such systems, i.e., solutions with few nonzero elements. A common setting is when we believe, or know a priori, that only a small subset of the candidate sources, neurons, or genes influence the observations, but their location is unknown.

Manuscript received October 02, 2009; revised August 19, 2010; accepted January 28, 2011. Date of current version June 22, 2011. This work was presented at the 43rd Annual Conference on Information Sciences and Systems, March 2009.
The author is with the Department of Statistics, Columbia University, New York, NY 10027 USA (e-mail: kamiar@stat.columbia.edu).
Communicated by J. Romberg, Associate Editor for Signal Processing.
Digital Object Identifier 10.1109/TIT.2011.2145670

More concretely, the problem we consider is that of estimating the support of β ∈ R^m, given the a priori knowledge that only k of its entries are nonzero, based on the observational model

y = Aβ + w    (1)

where A ∈ R^{n×m} is a collection of input measurement vectors, y ∈ R^n is the output measurement, and w is the additive measurement noise, assumed to be zero mean and with known covariance. Each row of A and the corresponding entry of y are viewed as an input and an output measurement, respectively.
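The observational model (1) can be sketched numerically. This is a minimal illustration of the setup; the dimensions m, k, and n below are chosen for the example and are not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: m-dimensional signal with k nonzero
# entries, observed through n linear measurements.
m, k, n = 20, 3, 10

A = rng.standard_normal((n, m))            # measurement matrix
beta = np.zeros(m)
support = rng.choice(m, size=k, replace=False)
beta[support] = 1.0                        # nonzero entries (magnitude 1 here)
w = rng.standard_normal(n)                 # zero-mean Gaussian noise
y = A @ beta + w                           # observational model (1)
```

Each row of A paired with the corresponding entry of y is one input/output measurement, matching the description above.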

The output of the optimal (sparsity) decoder is defined as the support set of the sparse solution β̂ with support size k that minimizes the residual sum of squares

β̂ = argmin_{β′ : ‖β′‖₀ = k} ‖y − Aβ′‖₂²    (2)

where β̂ is the optimal estimate of β given the a priori information of sparseness. The support set of β̂ is optimal in the sense of minimizing the probability of identifying a wrong sparsity pattern.

First, we are concerned with the likelihood of the sparsity pattern of β̂ as a function of A and β. We obtain an upper bound on the probability that β̂ has any specific sparsity pattern, and find that this bound depends (inversely) exponentially on the difference of ‖Aβ‖₂ and the ℓ₂-norm of Aβ projected onto the range of the columns of A indexed by the wrong sparsity pattern.

Second, when the entries of A are independent and identically distributed (i.i.d.) random variables, we are concerned with establishing sufficient conditions that guarantee the reliability of sparsity pattern recovery. Ideally, we would like to characterize such conditions based on a minimal number of parameters, including the sparsity level k, the signal dimension m, the number of measurements n, and the signal-to-noise ratio (SNR)

SNR = E[‖Aβ‖₂²] / E[‖w‖₂²].    (3)

Assume that the absolute value of the nonzero entries of β is lower bounded by β_min.² Further, suppose that the variance of the entries of w is equal to one.¹ Hence, it is natural to ask: how does the ability to reliably estimate the sparsity pattern depend on k, m, n, and β_min?

We find that a nonasymptotic upper bound on the probability of the maximum-likelihood decoder not declaring the true sparsity pattern can be obtained when the entries of the measurement matrix are i.i.d. normal random variables. This allows us to obtain sufficient conditions on the number of measurements n as a function of k, m, and β_min. We show that our results strengthen earlier sufficient conditions for reliable sparsity recovery.

¹This entails no loss of generality, by standard rescaling of y.
²To the best of our knowledge, Wainwright [4] was the first to formulate the information-theoretic limitations of sparsity pattern recovery using β_min as one of the key parameters.

0018-9448/$26.00 © 2011 IEEE
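For small problem sizes, the decoder in (2) can be realized by exhaustive search over all size-k supports. The sketch below is illustrative only; `ml_support` is a name chosen here, and the dimensions and the noiseless setting are for the example, not from the paper:

```python
from itertools import combinations
import numpy as np

def ml_support(y, A, k):
    """Return the size-k support minimizing the residual sum of squares,
    as in (2): project y onto each candidate column span and compare."""
    m = A.shape[1]
    best, best_rss = None, np.inf
    for S in combinations(range(m), k):
        A_S = A[:, S]
        # least-squares fit restricted to the columns indexed by S
        coef, *_ = np.linalg.lstsq(A_S, y, rcond=None)
        rss = np.sum((y - A_S @ coef) ** 2)
        if rss < best_rss:
            best, best_rss = set(S), rss
    return best

rng = np.random.default_rng(1)
n, m, k = 8, 10, 2
A = rng.standard_normal((n, m))
beta = np.zeros(m)
beta[[2, 7]] = 1.0
y = A @ beta            # noiseless case for illustration
print(ml_support(y, A, k))   # recovers the true support {2, 7}
```

The search visits all C(m, k) supports, so this is only feasible for toy sizes; the paper's analysis concerns this decoder's error probability, not its computational cost.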


In particular, our sufficient conditions improve on those of [4]–[7], and their growth rate in n matches that of the necessary conditions in both the linear, i.e., k = Θ(m), and the sublinear, i.e., k = o(m), regimes.

A. Previous Work

A large body of recent work, including [4]–[10], has analyzed reliable sparsity pattern recovery, exploiting optimal and suboptimal decoders for large random Gaussian measurement matrices. The average error probability and necessary and sufficient conditions for sparsity pattern recovery with Gaussian measurement matrices were analyzed in [4] in terms of k, m, and n. As a generalization of that work, necessary conditions for general random and sparse measurement matrices were derived in [8] using the Fano inequality. The sufficient conditions in [6] were obtained based on a simple maximum correlation algorithm and a closely related thresholding estimator discussed in [11]. In addition to the well-known formulation of the necessary and sufficient conditions based on the problem dimensions and the SNR, Fletcher et al. [6] included the maximum-to-average ratio³ of β in their analysis. Necessary and sufficient conditions for fractional sparsity pattern recovery were analyzed in [5], [9].

We will discuss the relationship to this work below in more depth, after describing our analysis and results in more detail.

B. Notation

The following conventions will remain in effect throughout this paper. Calligraphic letters are used to indicate sparsity patterns, defined as sets of integers between 1 and m with cardinality k. We say β has sparsity pattern 𝒮 if the entries of β with indices in 𝒮 are nonzero. 𝒮 \ 𝒯 stands for the set of entries that are in 𝒮 but not in 𝒯, and |𝒮| stands for the cardinality of 𝒮. We denote by A_𝒮 the matrix obtained from A by extracting the columns with indices in 𝒮, and we let supp(β) stand for the sparsity pattern, or support set, of β. The matrix P_𝒮 is the orthogonal operator projecting onto the subspace spanned by the columns of A_𝒮. Finally, let the norm ‖M‖ of a matrix M be defined as its largest singular value. Note that if M is a positive semi-definite matrix, then ‖M‖ is equal to the top eigenvalue of M. Except for the matrix norm, all vector norms are ℓ₂ norms.
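The projector P_𝒮 used throughout has the familiar closed form A_𝒮 (A_𝒮ᵀ A_𝒮)⁻¹ A_𝒮ᵀ; a minimal numerical sketch, with dimensions chosen here for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 10, 6
A = rng.standard_normal((n, m))

def proj(A, S):
    """Orthogonal projector P_S onto the span of the columns of A indexed by S."""
    A_S = A[:, sorted(S)]
    return A_S @ np.linalg.inv(A_S.T @ A_S) @ A_S.T

P = proj(A, {0, 1, 2})
assert np.allclose(P, P.T)                     # symmetric
assert np.allclose(P @ P, P)                   # idempotent
assert np.isclose(np.linalg.norm(P, 2), 1.0)   # top eigenvalue equals one
```

The last assertion illustrates the remark above: P is positive semi-definite, so its matrix norm equals its top eigenvalue, which is one for any nontrivial orthogonal projector.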

II. RESULTS

For the observational model in (1), assume that the true sparsity pattern is 𝒮*; as a result

y = A_{𝒮*} β_{𝒮*} + w.    (4)

We first state a result on the probability of the event supp(β̂) = 𝒯, i.e., that the decoder declares the sparsity pattern 𝒯, for any 𝒯 ≠ 𝒮* and any measurement matrix A.

Theorem 1: For the observational model of (4) and the estimate β̂ in (2), the probability P(supp(β̂) = 𝒯) is bounded above by a quantity that decreases exponentially in the squared norm of A_{𝒮*}β_{𝒮*} projected onto the orthogonal complement of the range of A_𝒯.

³The maximum-to-average ratio of β is the ratio of its largest squared entry to the average squared entry over its support.

The proof of Theorem 1, given in Section III, employs the Chernoff technique and the properties of the eigenvalues of the difference of projection matrices to bound the probability of declaring a wrong sparsity pattern 𝒯 instead of the true one 𝒮*, as a function of the measurement matrix A and the true parameter β. The error rate decreases exponentially in the norm of the projection of A_{𝒮*}β_{𝒮*} onto the orthogonal complement of the subspace spanned by the columns of A_𝒯. This is in agreement with the intuition that the closer the subspaces corresponding to different sets of columns of A are, the harder it is to differentiate them, and hence the higher the error probability will be.

The theorem below gives a nonasymptotic bound on the probability of the event that the declared sparsity pattern supp(β̂) differs from the true sparsity pattern 𝒮* in more than ℓ indices, when the entries of the measurement matrix A are drawn i.i.d. from a standard normal distribution. It is clear that by letting ℓ = 0 we obtain an upper bound on the error probability of exact sparsity pattern recovery.

Theorem 2: Suppose that for the observational model of (4) and the estimate β̂ in (2) the entries of A are i.i.d. standard normal. If n satisfies the condition shown at the bottom of the page, then the probability that supp(β̂) differs from 𝒮* in more than ℓ indices is bounded by the expression given there.

The key elements in the proof include Theorem 1, application of union bounds (a fairly standard technique which has been


used before for this problem [4], [5], [7]), the asymptotic behavior of binomial coefficients, and properties of convex functions.

Note that in the linear regime, i.e., k = Θ(m), the probability of misidentifying more than any fraction (less than one) of the support goes to zero exponentially fast as m → ∞. In words, if the SNR is fixed while the dimension of the signal increases unboundedly, it is still possible to reliably recover some fraction of the support. This is in agreement with previous results on partial sparsity pattern recovery [5], [9].

If we let k, ℓ, and n scale as functions of m, then the upper bound of Theorem 2 scales like the expression in (5). For n growing fast enough, the sum of these error probabilities over m is finite and, as a consequence of the Borel-Cantelli lemma, for large enough m the decoder declares the true sparsity pattern almost surely. In other words, the estimate based on (2) achieves the same loss as an oracle which is supplied with perfect information about which coefficients of β are nonzero. The following corollary summarizes the aforementioned statements.

Corollary 3: For the observational model of (4) and the estimate β̂ in (2), let k, ℓ, and n scale as functions of m. Then there exists a constant c such that if n exceeds the stated threshold, then almost surely, for large enough m, β̂ achieves the same loss as an oracle which is supplied with perfect information about which coefficients of β are nonzero.

Remarks:
• The first scaling condition is required to ensure that, for a sufficiently large m, the corresponding inequality between the quantities defined in Theorem 1 holds.
• The second scaling condition is required to ensure that, for a sufficiently large m, the analogous inequality involving the remaining quantity defined in Theorem 1 holds.

The sufficient conditions in Corollary 3 can be compared against similar conditions for exact sparsity pattern recovery in [4]–[7]; for example, in the sublinear regime k = o(m), [4], [7] and [5], [6] each proved that a certain growth rate of n is sufficient. In that vein, according to Corollary 3, a slower growth rate of n suffices to ensure exact sparsity pattern recovery; therefore, it strengthens these earlier results.
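The qualitative content of these sufficient conditions, namely that exact recovery becomes reliable once n is large enough relative to k and m, can be observed empirically with the exhaustive decoder. A small Monte Carlo sketch, with all parameters (dimensions, β_min, trial counts) chosen here for illustration rather than taken from the paper:

```python
from itertools import combinations
import numpy as np

def ml_support(y, A, k):
    # exhaustive minimization of the residual sum of squares over size-k supports
    best, best_rss = None, np.inf
    for S in combinations(range(A.shape[1]), k):
        coef, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
        rss = np.sum((y - A[:, S] @ coef) ** 2)
        if rss < best_rss:
            best, best_rss = S, rss
    return set(best)

def recovery_rate(n, m=10, k=2, beta_min=3.0, trials=50, seed=0):
    """Fraction of trials in which the decoder returns the exact support."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        A = rng.standard_normal((n, m))
        support = set(rng.choice(m, size=k, replace=False).tolist())
        beta = np.zeros(m)
        beta[list(support)] = beta_min
        y = A @ beta + rng.standard_normal(n)   # unit-variance noise, as assumed
        hits += ml_support(y, A, k) == support
    return hits / trials

# recovery improves as the number of measurements n grows
print(recovery_rate(4), recovery_rate(40))
```

With few measurements the empirical recovery rate is noticeably below one, while for large n it approaches one, consistent with the sufficient-condition picture.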

What remains is to see whether the sufficient conditions in Corollary 3 match the necessary conditions proved in [8]:

Theorem 4 [8]: Suppose that the entries of the measurement matrix A are drawn i.i.d. from any distribution with zero mean and variance one. Then a necessary condition for asymptotically reliable recovery is that n exceeds the stated lower bound.

The necessary condition in Theorem 4 asymptotically resembles the sufficient condition in Corollary 3. The sufficient conditions of Corollary 3 can be compared against the necessary conditions in [8] for exact sparsity pattern recovery, as shown in Table I. The first paper to establish the sufficient conditions in rows 1 and 4 of Table I is [10]. The sufficient conditions presented in the first four rows of Table I are a consequence of past work [4], also recovered by Corollary 3. The new, stronger result in this paper provides the sufficient conditions in rows 5 and 6, which did not appear in previous studies [4]–[7] and match the previously established necessary conditions presented in [8].

III. PROOF OF THEOREM 1

We first state three basic lemmas.

Lemma 5: If any 2k columns of the matrix A are linearly independent, then for any sparsity patterns 𝒮 and 𝒯 of size k, the difference of projection matrices P_𝒮 − P_𝒯 has pairs of nonzero positive and negative eigenvalues, bounded above by one and bounded below by negative one, respectively, and equal in magnitude.

Lemma 6: A standard Gaussian integral yields a closed-form expression for the moment-generating function of a quadratic form in a Gaussian random vector, valid whenever the argument of the moment-generating function is suitably bounded.

Lemma 7: The eigenvalue structure established in Lemma 5 yields the upper and lower bounds used to simplify the bound in the proof of Theorem 1.

We defer the proofs of Lemmas 5 and 7 to after the proof of Theorem 1. Lemma 6 follows from standard Gaussian integrals [12].
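The symmetric-spectrum claim of Lemma 5 is easy to check numerically for a generic Gaussian matrix; a small sketch, with dimensions and supports chosen here for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 12, 8
A = rng.standard_normal((n, m))

def proj(A, S):
    A_S = A[:, list(S)]
    return A_S @ np.linalg.inv(A_S.T @ A_S) @ A_S.T

# two size-3 sparsity patterns sharing one index
P_S = proj(A, [0, 1, 2])
P_T = proj(A, [0, 3, 4])
eigs = np.sort(np.linalg.eigvalsh(P_S - P_T))

# nonzero eigenvalues come in +/- pairs of equal magnitude, bounded by one
assert np.allclose(eigs, -eigs[::-1])
assert np.max(np.abs(eigs)) <= 1 + 1e-10
```

The shared column contributes to the null space (it lies in both column spans), and the remaining eigenvalues pair up symmetrically around zero, as the lemma asserts.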

A. Proof of Theorem 1

For a given sparsity pattern 𝒯, the minimum residual sum of squares is achieved by P_𝒯 y, where P_𝒯 denotes the orthogonal projection operator onto the column space of A_𝒯; that is, among all sparsity patterns of size k, the optimum decoder declares the minimizer of ‖(I − P_𝒯)y‖₂² as the optimum estimate of the true sparsity pattern in terms of minimum error probability. Recall the definition of β̂ in (2), and note that supp(β̂) is exactly this minimizer. If the decoder incorrectly declares 𝒯 instead of the true sparsity pattern (namely, supp(β̂) = 𝒯), then

‖(I − P_𝒯)y‖₂² ≤ ‖(I − P_{𝒮*})y‖₂².

The probability that the optimal decoder wrongly declares the sparsity pattern 𝒯 instead of the true sparsity pattern 𝒮* is therefore less than the probability that this inequality holds, and with the aid of the Chernoff technique an upper bound on this probability is obtained.

TABLE I: Necessary and sufficient conditions on the number of measurements n required for reliable support recovery in the linear and the sublinear regime. The sufficient conditions presented in the first four rows are a consequence of past work [4], also recovered by Corollary 3. The new stronger result in this paper provides the sufficient conditions in rows 5 and 6, which did not appear in previous studies [4]–[7], and match the necessary conditions presented in [8].

Note that ‖(I − P_𝒯)y‖₂² − ‖(I − P_{𝒮*})y‖₂² is a random variable given by a quadratic form in Gaussian random vectors. This allows us to use standard Gaussian integrals to calculate its moment-generating function. In order to bound the expectation, the argument of the moment-generating function is required to be bounded, which is a necessary condition in Lemma 6. From Lemma 6 we obtain the bound (6), with the abbreviations defined there.

For Lemma 6, we need the eigenvalues of the relevant quadratic form to be suitably bounded, and we prove in Lemma 5 that the eigenvalues of P_{𝒮*} − P_𝒯 are bounded in absolute value by one; consequently, (6) holds for the relevant range of the Chernoff parameter. With the aid of the definition of the matrix norm, the first term on the r.h.s. of (6) can be bounded as in (7). Since A_{𝒮*}β_{𝒮*} lies in the subspace spanned by the columns of A_{𝒮*}, we have

P_{𝒮*} A_{𝒮*} β_{𝒮*} = A_{𝒮*} β_{𝒮*}

and similarly (I − P_{𝒮*}) A_{𝒮*} β_{𝒮*} = 0. These identities and the inequality (7) yield the upper bound shown in (8), at the bottom of the page. Lemma 7 introduces an upper bound and a lower bound that can be used to further simplify the upper bound of (8). The main ingredient in the proof of Lemma 7 is the eigenvalue properties of P_{𝒮*} − P_𝒯


that were established in Lemma 5. Substituting the bounds obtained in Lemma 7 in (8), we have (9). Finally, to prove Theorem 1, we take the infimum of (9) over the Chernoff parameter and obtain the desired bound, as shown in the equation at the bottom of the page.

Now we prove the remaining lemmas.

Proof of Lemma 5: Before we prove the result, let us introduce some notation.
• For any sparsity pattern 𝒮, span(A_𝒮) is defined as the linear subspace spanned by the columns of A_𝒮;
• V⊥ stands for the subspace orthogonal to a subspace V;
• and finally, for any subspace V, P_V designates the orthogonal projection onto V. (With a slight abuse of notation, for any sparsity pattern 𝒮, we use P_𝒮 instead of P_{span(A_𝒮)}.)

From [13, Lemma 4.1], the identities (10) and (11) hold for any pair of subspaces, which yields (12). Since any set of at most 2k columns of A is linearly independent, for any 𝒮 and 𝒯 of size k we have (13) and (14). Counting dimensions, the null space of P_𝒮 − P_𝒯 consists of the intersection span(A_𝒮) ∩ span(A_𝒯) together with the subspace orthogonal to both column spans; we just proved that P_𝒮 − P_𝒯 has the corresponding number of eigenvalues equal to zero. The range of P_𝒮 − P_𝒯 is the remaining subspace, and therefore the nonzero eigenvalues of P_𝒮 − P_𝒯 have absolute value less than or equal to one. (The eigenvalues of P_𝒮 − P_𝒯 are equal in magnitude to one only if a vector of one column span is orthogonal to the other column span.)

Next, suppose v is an eigenvector of P_𝒮 − P_𝒯 with nonzero eigenvalue λ. We prove that a suitably transformed vector is an eigenvector of P_𝒮 − P_𝒯 with eigenvalue −λ; the computation exploits the defining relation of the eigenvector v. This means that for every eigenvector with eigenvalue λ there exists another eigenvector with eigenvalue −λ.

Proof of Lemma 7: From Lemma 5, we know that P_{𝒮*} − P_𝒯 has pairs of nonzero positive and negative eigenvalues whose magnitudes are equal. Denote the positive eigenvalues by λ₁, λ₂, .... Since the eigenvalues are bounded by one, again by Lemma 5, the relevant product term is suitably lower bounded; this proves the first claimed bound. To prove the second claimed bound, note that the matrix in question has three kinds of eigenvalues: pairs determined by ±λ_i, and eigenvalues equal to one. It is not hard to see that, because the eigenvalues are bounded in magnitude by one, the top eigenvalue of this matrix is bounded above, and the second claim follows.

IV. PROOF OF THEOREM 2

We state two simple lemmas used to prove Theorem 2.

Lemma 8: For Gaussian measurement matrices, the average error probability that the optimum decoder declares a given wrong sparsity pattern 𝒯 is bounded by the stated expression.

Lemma 9: For the function g defined on the positive integers, if (15) holds, then the stated upper bound on g follows.

Before we prove the two lemmas, let us see how they imply Theorem 2.

A. Proof of Theorem 2

In order to find conditions under which the error probability asymptotically goes to zero, we exploit the union bound in conjunction with counting arguments and the two lemmas stated previously.

First, note that the event that supp(β̂) differs from 𝒮* in more than ℓ indices can be written as the union of the events supp(β̂) = 𝒯 over all sparsity patterns 𝒯 that differ accordingly from 𝒮*. The union bound allows us to bound the probability of this event by the sum of the probabilities of the events supp(β̂) = 𝒯.

Lemma 8, which is based on generating functions of chi-square distributions, introduces an upper bound on the probability of each event supp(β̂) = 𝒯. If we replace the signal-dependent term with its lower bound, which follows from the definition of β_min, we obtain an upper bound that does not depend on 𝒯 as long as |𝒮* \ 𝒯| is fixed. The number of sparsity patterns 𝒯 that differ from 𝒮* in exactly j elements is (k choose j)(m − k choose j), and these binomial coefficients can be bounded by the standard inequality (k choose j) ≤ (ek/j)^j. To summarize, exploiting these bounds, we have (16).
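The counting step above can be verified by brute force for small parameters; the sizes below are chosen purely for illustration:

```python
from itertools import combinations
from math import comb

m, k = 7, 3
S_true = set(range(k))  # fixed true support {0, 1, 2}

for j in range(k + 1):
    # brute-force count of size-k supports differing from S_true in exactly j indices
    count = sum(1 for T in combinations(range(m), k)
                if len(S_true - set(T)) == j)
    # closed form: choose j indices of S_true to drop and j new indices to add
    assert count == comb(k, j) * comb(m - k, j)
```

Summing the closed form over j recovers C(m, k), the total number of size-k supports, which is the sanity check behind the union-bound enumeration.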

Let g stand for the exponent in the previous equation, with the abbreviations defined there. From Lemma 9, we know that if (17) holds, then g is bounded as claimed, and therefore (18) follows. For this bound to go to zero, it suffices that its terms go to zero fast enough. In the statement of Theorem 2, we have the condition (19), which results in the upper bound (20).


Hence, if (21) holds, then we get the equation shown at the bottom of the page. Therefore, inequalities (17) and (21), which are the main conditions in Theorem 2, imply the claimed bound.

Now we prove the remaining lemmas.

Proof of Lemma 8: The columns of A_{𝒮*\𝒯} and A_𝒯 are, by definition, disjoint and therefore independent Gaussian random matrices, with column spaces spanning random independent |𝒮* \ 𝒯|- and k-dimensional subspaces, respectively. The Gaussian random vector A_{𝒮*\𝒯} β_{𝒮*\𝒯} has i.i.d. Gaussian entries with variance ‖β_{𝒮*\𝒯}‖₂². Therefore, since this random Gaussian vector is projected onto the subspace orthogonal to the random column space of A_𝒯, we conclude that the quantity ‖(I − P_𝒯) A_{𝒮*\𝒯} β_{𝒮*\𝒯}‖₂² / ‖β_{𝒮*\𝒯}‖₂² is a chi-square random variable with n − k degrees of freedom, and the stated bound follows. The first inequality in the resulting chain follows from Theorem 1, and the second equality comes from the well-known formula (see, for example, [12]) for the moment-generating function of a chi-square random variable.
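The chi-square moment-generating function invoked in the last step is the standard one; for X distributed chi-square with d degrees of freedom:

```latex
% Moment-generating function of a chi-square random variable with d
% degrees of freedom (standard fact; see, e.g., [12]):
\mathbb{E}\left[e^{tX}\right] = (1 - 2t)^{-d/2}, \qquad t < \tfrac{1}{2}.
```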

Proof of Lemma 9: Let us first explain the idea behind this lemma. We aim to prove that, under the stated conditions, there is a point below which g is a decreasing function and above which g is an increasing function; this implies that g attains its maximum at an endpoint of its domain, which yields the desired upper bound (22). We begin by taking derivatives of g to prove this claim. Note that in the following steps, we use inequality (15) to prove inequality (22).

1. Due to the positivity of the denominator and the quadratic and concave nature of the numerator of the derivative of g, the numerator is negative below its smaller root, positive between the two roots, and negative above the larger root.
2. From inequality (15), we obtain a bound which ensures the convexity of g on part of its domain and the negativity of its derivative on the rest. We have two situations, depending on whether the relevant critical point falls inside the domain or not: in the first case, inequality (15) implies that g is decreasing up to the critical point and convex beyond it; in the second case, g is decreasing over the whole domain.
3. In either case, g is decreasing, or decreasing and then increasing, over its domain, so g is maximized at an endpoint, which proves the desired inequality (22).

V. CONCLUSION

In this paper, we examined the probability that the optimal decoder declares an incorrect sparsity pattern. We obtained an upper bound for any generic measurement matrix, and this allowed us to bound the error probability in the case of random measurement matrices. In the special case when the entries of the measurement matrix are i.i.d. normal random variables, we computed an upper bound on the expected error probability. Sufficient conditions for exact sparsity pattern recovery were obtained, and they were shown to improve on the previous results [4]–[7]. Moreover, these results asymptotically match (in terms of growth rate) the corresponding necessary conditions presented in [8]. An interesting open problem is to extend the sufficient conditions derived in this work to non-Gaussian and sparse measurement matrices.


ACKNOWLEDGMENT

The author would like to express his gratitude to V. Roy-

chowdhury for introducing him to this subject. The author

is grateful to I. Kontoyiannis, L. Paninski, X. Pitkov, and Y.

Mishchenko for careful reading of the manuscript and fruitful

discussions, and to the referees for their critical comments that

improved the presentation of the manuscript.

REFERENCES

[1] M. Zibulevsky and B. Pearlmutter, “Blind source separation by sparse

decomposition in a signal dictionary,” Neur. Comput., vol. 13, pp.

863–882, 2001.

[2] W. Vinje and J. Gallant, “Sparse coding and decorrelation in primary

visual cortex during natural vision,” Science, vol. 287, no. 5456, pp.

1273–1276, 2000.

[3] D. di Bernardo, M. J. Thompson, T. Gardner, S. E. Chobot, E. L.

Eastwood, A. P. Wojtovich, S. J. Elliott, S. Schaus, and J. J. Collins,

“Chemogenomic profiling on a genome-wide scale using reverse-engi-

neered gene networks,” Nat. Biotech, vol. 23, pp. 377–383, Mar. 2005.

[4] M. Wainwright, “Information-theoretic limitations on sparsity recovery in the high-dimensional and noisy setting,” IEEE Trans. Inf. Theory, vol. 55, no. 12, pp. 5728–5741, Dec. 2009.

[5] M. Akcakaya and V. Tarokh, “Shannon-theoretic limits on noisy

compressive sampling,” IEEE Trans. Inf. Theory, vol. 56, no. 1, pp.

492–504, Jan. 2010.

[6] A. Fletcher, S. Rangan, and V. Goyal, “Necessary and sufficient con-

ditions on sparsity pattern recovery,” IEEE Trans. Inf. Theory, vol. 55,

no. 12, pp. 5758–5772, Dec. 2009.

[7] A. Karbasi, A. Hormati, S. Mohajer, and M. Vetterli, “Support recovery in compressed sensing: An estimation theoretic approach,” in Proc. 2009 IEEE Int. Symp. Information Theory, 2009.

[8] W. Wang, M. Wainwright, and K. Ramchandran, “Information-theoretic limits on sparse signal recovery: Dense versus sparse measurement matrices,” IEEE Trans. Inf. Theory, vol. 56, no. 6, pp. 2967–2979, Jun. 2010.

[9] G. Reeves and M. Gastpar, “Sampling bounds for sparse support re-

covery in the presence of noise,” in Proc. IEEE Int. Symp. Information

Theory, 2008, pp. 2187–2191.

[10] M. J. Wainwright, “Sharp thresholds for noisy and high-dimensional recovery of sparsity using ℓ₁-constrained quadratic programming (lasso),” IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 2183–2202, May 2009.

[11] H. Rauhut, K. Schnass, and P. Vandergheynst, “Compressed sensing

and redundant dictionaries,” IEEE Trans. Inf. Theory, vol. 54, no. 5,

pp. 2210–2219, May 2008.

[12] T. A. Severini, Elements of Distribution Theory. Cambridge, U.K.: Cambridge University Press, 2005.

[13] P. Bjorstad and J. Mandel, “On the spectra of sums of orthogonal pro-

jections with applications to parallel computing,” BIT Numer. Math.,

vol. 31, pp. 76–88, 1991.


Kamiar Rahnama Rad was born in Darmstadt, Germany. He received the

B.Sc. degree in electrical engineering from Sharif University of Technology,

Tehran, in 2004 and the M.Sc. degree in electrical engineering from University

of California, Los Angeles, in 2006. He is currently pursuing the Ph.D. degree

in statistics at Columbia University, New York.

His research interests include information theory, computational neuro-

science, and social learning theory.