On the Relation Between Sparse Reconstruction and Parameter Estimation With Model Order Selection
ABSTRACT We examine the relationship between sparse linear reconstruction and the classic problem of continuous parametric modeling. In sparse reconstruction, one wishes to recover a sparse amplitude vector from a measurement that is described as a linear combination of a small number of discrete additive components. Recent results in the compressive sensing literature have provided fast sparse reconstruction algorithms with guaranteed performance bounds for problems with certain structure. In this paper, we show an explicit connection between sparse reconstruction and parameter/order estimation and demonstrate how sparse reconstruction may be used to solve model order selection and parameter estimation problems. The structural assumption used in compressive sensing to guarantee reconstruction performance-the Restricted Isometry Property-is not satisfied in the general parameter estimation context. Nonetheless, we develop a method for selecting sparsity parameters such that sparse reconstruction mimics classic order selection criteria such as Akaike information criterion (AIC) and Bayesian information criterion (BIC). We compare the performance of the sparse reconstruction approach with traditional model order selection/parameter estimation techniques for a sinusoids-in-noise example. We find that the two methods have comparable performance in most cases, and that sparse linear modeling performs better than traditional model-based parameter/order estimation for closely spaced sinusoids with low signal-to-noise ratio.
Christian D. Austin, Student Member, IEEE, Randolph L. Moses, Senior Member, IEEE,
Joshua N. Ash, Member, IEEE, Emre Ertin, Member, IEEE
Index Terms
Model Order Selection, Parameter Estimation, Sparse Reconstruction, Compressed Sensing, Information Criteria
I. INTRODUCTION
We examine the relationship between the classic problem of parametric modeling and sparse reconstruction. In
many parametric modeling problems, one is given noisy measurements of a signal that is a weighted sum of M
terms, each term parameterized by a set of parameters $\theta_m$:
$$y_n = \sum_{m=1}^{M} \alpha_m f(t_n, \theta_m) + \epsilon_n, \quad n = 1,\dots,N. \tag{1}$$
The authors are with the Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210 USA.
February 6, 2010 DRAFT
In general, the signal components $f(t,\theta)$ depend nonlinearly on the $\theta_m$ vectors. The goal is to estimate the model order $M$, the parameter vectors $\Theta = \{\theta_m\}_{m=1}^{M}$, and $\alpha = \{\alpha_m\}_{m=1}^{M}$ from the noisy measurement vector $y = [y_1,\dots,y_N]^T$, where $\epsilon_n$ is noise. In sparse reconstruction, a measurement vector $y$ is modeled as a linear system $y = Ax + \epsilon$, where $A$ is a known matrix and $\epsilon = [\epsilon_1,\dots,\epsilon_N]^T$ is a noise vector; the goal is to reconstruct $x$. For sparse reconstruction, the linear system $A$ is typically highly underdetermined; so, $A$ has many more columns than rows, and $x$ is assumed to be sparse, that is, to have a small number of nonzero elements.
Sparse reconstruction, or sparse linear modeling, is closely related to compressed sensing [1], [2], and there
has been a wealth of recent results on both algorithms and reconstruction guarantees for sparse solutions of linear
inverse problems (see, e.g., [3], [4]). Compressed sensing (CS) has recently been successfully applied to a number
of problems in signal and image modeling and reconstruction. These techniques apply to applications in which a
measurement can be described as a linear combination of terms in which the number of terms is known to be small
but the indices of the nonzero terms are unknown a priori.
Sparse reconstruction can be connected to parametric modeling by sampling the parameter space and forming
the columns of A from evaluations of f(t,θ) over a sampled grid of θ-values. Then, parametric estimation is
approximated by selecting a small number of these columns that correspond to the nonzero entries of the sparse
vector x. This method of sparse estimation has been previously considered for estimating the direction-of-arrival parameter in source localization [5]–[7], a scattering center location parameter [8] or range and Doppler parameters [9] in radar signal processing, and the location parameter of spin density in EPR medical imaging [10]. This work differs from previous work using sparse parameter estimation in that it addresses the
issues of dictionary matrix construction and sparse parameter selection through consideration of both the underlying
parametric model and model order selection.
In this paper, we formally pose the joint parameter estimation and model order selection problem as a sparse
reconstruction problem. We discuss how sampling of the parameter space relates to the Restricted Isometry Property
(RIP) and impacts estimation accuracy. We show that information-based parametric model order selection methods
(e.g., AIC, BIC, GIC) may be incorporated into sparse reconstruction problem statements. Based on this formulation,
we present a sparse reconstruction algorithm that performs both parameter estimation and model order selection.
We also investigate how the sparse reconstruction algorithm settings, along with parameter vector sampling, impact
the corresponding parametric modeling solution. Finally, we illustrate the performance of the proposed approach
on the classical example of sinusoids-in-noise model order selection and parameter estimation.
The remainder of this paper is organized as follows. In Section II we review both parametric modeling and
sparse reconstruction and discuss their connection in the context of model order selection and parameter estimation.
In Section III, we present the main result of the paper, which connects the choice of the sparsity parameter
in sparse reconstruction to information criteria in classic model order selection problems; and we develop a
corresponding sparse reconstruction algorithm to implement both parameter estimation and model order selection.
Section IV compares results of direct parametric modeling to modeling using the sparse reconstruction approach,
and conclusions are given in Section V.
II. MODEL ORDER SELECTION AND PARAMETER ESTIMATION
We briefly review classic parametric model order selection/continuous-parameter estimation and sparse reconstruction, and discuss how these two methods relate.
A. Classical model order selection and parametric estimation
The classical parameter estimation setting considers α,Θ in (1) as continuous parameters to be estimated. When
the model order M in (1) is known, it is straightforward to compute the maximum likelihood estimate (MLE) of
the model parameters as
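$$\{\hat{\alpha}_{ml}, \hat{\Theta}_{ml}\} = \arg\max_{[\alpha,\Theta]} \ln p(y;\alpha,\Theta), \tag{2}$$
where $\ln p(y;\alpha,\Theta)$ is the log-likelihood function of α and Θ for a given measurement vector y. Construction of $p(y;\alpha,\Theta)$ requires knowledge of the distribution of the additive noise $\{\epsilon_n\}$ in (1). When noise samples are i.i.d. complex circular Gaussian, $\epsilon_n \sim \mathcal{CN}(0,\sigma^2)$, $n = 1,\dots,N$, the MLE takes the form
$$\{\hat{\alpha}_{ml}, \hat{\Theta}_{ml}\} = \arg\min_{[\alpha,\Theta]} \sum_{n=1}^{N} \frac{1}{\sigma^2} \left| y_n - \sum_{m=1}^{M} \alpha_m f(t_n,\theta_m) \right|^2, \tag{3}$$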
which, for general f(t,θ), is a nonlinear least-squares optimization problem.
For many parametric modeling problems, the model order M is unknown and must be estimated using, for example, classic information criteria methods such as the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), or the Generalized Information Criteria (GIC). For the parametric model shown in (1) with i.i.d. $\mathcal{CN}(0,\sigma^2)$ noise, these criteria take the common form [11]
$$J(M; \hat{\alpha}_{ml}, \hat{\Theta}_{ml}) = \sum_{n=1}^{N} \frac{1}{\sigma^2} \left| y_n - \sum_{m=1}^{M} \hat{\alpha}_m f(t_n,\hat{\Theta}_m) \right|^2 + \eta, \tag{4}$$
where
$$\eta = \begin{cases} n_e, & \text{(AIC)} \\ \ln(N)\,(n_e/2), & \text{(BIC)} \\ \nu\, n_e, & \text{(GIC)} \end{cases} \tag{5}$$
The variable ν is a user-defined penalty within the GIC framework, and $n_e$ is the effective number of unknown parameters in the model. Typically, the effective number of parameters is equal to the actual number of real-valued unknown parameters in the model. In this case, for the model (1), $n_e = M(1+\mathrm{length}(\theta_m))$ if the $\{\alpha_m\}$ are real, and $n_e = M(2+\mathrm{length}(\theta_m))$ if the $\{\alpha_m\}$ are complex-valued. However, there are exceptions to the parameter counting rule of thumb, such as sinusoids-in-noise, where $n_e = 5M$, even though there are $3M$ unknowns (amplitude, phase, and frequency unknowns for each sinusoid); see, e.g., [12].
The information-criteria model order estimate minimizes the cost (4)
$$\hat{M} = \arg\min_{M} J(M; \hat{\alpha}_{ml}, \hat{\Theta}_{ml}), \tag{6}$$
with η chosen from (5) according to the desired selection rule.
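The selection rule in (4)-(6) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `penalty` and `select_order`, and the example residual values, are hypothetical, and the data-fit terms are assumed to have been computed elsewhere by a maximum likelihood fit for each candidate order.

```python
import math

def penalty(n_e, N, rule="BIC", nu=None):
    """Penalty eta from (5) for n_e effective parameters and N samples."""
    if rule == "AIC":
        return float(n_e)
    if rule == "BIC":
        return math.log(N) * n_e / 2.0
    if rule == "GIC":
        if nu is None:
            raise ValueError("GIC needs a user-defined nu")
        return nu * n_e
    raise ValueError(f"unknown rule: {rule}")

def select_order(scaled_residuals, N, params_per_component, rule="BIC", nu=None):
    """Return (M_hat, costs) where M_hat minimizes (4), as in (6).

    scaled_residuals[M] is the data-fit term of (4) for the ML fit of
    order M (M = 0, 1, 2, ...); computing those fits is outside this sketch.
    """
    costs = [r + penalty(params_per_component * M, N, rule, nu)
             for M, r in enumerate(scaled_residuals)]
    m_hat = min(range(len(costs)), key=costs.__getitem__)
    return m_hat, costs

# Hypothetical data-fit terms for M = 0..3 (illustrative numbers only).
# params_per_component=5 matches n_e = 5M for sinusoids-in-noise.
M_hat, costs = select_order([120.0, 40.0, 14.0, 13.0], N=16, params_per_component=5)
```

With these illustrative numbers, the small drop in residual from order 2 to order 3 does not pay for the extra BIC penalty, so the lower order is retained.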
B. Sparse reconstruction
Sparse reconstruction seeks to solve an underdetermined linear system with sparsifying constraints of the form
$$y = Ax \quad \text{s.t.} \quad \|x\|_0 = M, \tag{7}$$
where $x \in \mathbb{C}^K$ is an $M$-sparse vector (i.e., it only contains $M$ nonzero elements) to be determined from a measurement $y \in \mathbb{C}^N$ and known matrix $A \in \mathbb{C}^{N \times K}$, $N < K$. We use $\|x\|_0$ to denote the 0-(quasi)norm, which counts the number of nonzero elements of $x$. The measurement equation $y = Ax$ is ill-posed and lacks a unique solution without the sparsifying constraint. However, the constraint $\|x\|_0 = M$ is discontinuous in $x$ and imposes combinatoric complexity in solving (7); a direct solution is to try all $\binom{K}{M}$ possible choices of the $M$ nonzero element indices of the $K \times 1$ vector $x$. To overcome this combinatoric complexity, recent results have shown that when A satisfies RIP [1], then the solution to the convex problem
$$\min \|x\|_1 \quad \text{s.t.} \quad y = Ax \tag{8}$$
is unique and identical to the solution of (7). The optimization (8) is known as Basis Pursuit (BP) [13].
When the measurements contain noise (y = Ax + ǫ), BP may be represented as the dual problem
$$\min_{x} \|y - Ax\|_2^2 + \lambda \|x\|_1^1, \tag{9}$$
where the first term is a data fit measure, and the second is a sparsifying cost; the variable λ is referred to as
the sparsity parameter and trades off data fidelity with sparsity. The optimization (9) is referred to as basis pursuit
denoising (BPDN) [13].
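To make the trade-off in (9) concrete, the following sketch minimizes the BPDN objective with the iterative soft-thresholding algorithm (ISTA). ISTA is a standard proximal-gradient method chosen here only for illustration; it is not the solver used in this paper, and the function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, thr):
    """Complex soft-thresholding: shrink magnitudes by thr, keep phases."""
    mag = np.abs(z)
    scale = np.maximum(1.0 - thr / np.maximum(mag, 1e-300), 0.0)
    return z * scale

def ista_bpdn(A, y, lam, n_iter=500):
    """Minimize ||y - A x||_2^2 + lam * ||x||_1 by proximal gradient steps."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2       # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1], dtype=complex)
    for _ in range(n_iter):
        grad = 2.0 * A.conj().T @ (A @ x - y)  # gradient of the data-fit term
        x = soft_threshold(x - grad / L, lam / L)
    return x
```

For an identity dictionary the iteration reduces to a single soft-threshold of y at level λ/2, which shows directly how a larger λ drives more entries of x to zero.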
The use of $\ell_1$ norms to induce sparsifying solutions to linear systems has been employed for many years; however, recent developments and interest in CS have generated a renewed interest in sparse reconstruction. Compressive
sensing typically uses A-matrices populated with random elements [1], [2]. This construction has been shown to
satisfy the RIP with high probability and therefore guarantees reconstruction performance using BPDN.
However, unlike CS, the use of sparse reconstruction to solve general model order and parameter estimation
problems does not typically satisfy the RIP condition. Applying sparse reconstruction methods to the additive
component model (1) requires forming a dictionary matrix A with elements evaluated from f(t,θ). Define the
N-vector
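$$a(\theta) = [f(t_1,\theta),\dots,f(t_N,\theta)]^T, \tag{10}$$
where we assume, without loss of generality, that $f(t,\theta)$ is scaled such that $\|a(\theta)\|_2 = 1$. The noiseless component model may then be written
\begin{align}
y &= \sum_{m=1}^{M} \alpha_m a(\theta_m) \tag{11} \\
  &\approx Ax, \quad \|x\|_0 = M \tag{12}
\end{align}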
where
$$A = [a(\bar{\theta}_1),\dots,a(\bar{\theta}_K)] \quad (N \times K) \tag{13}$$
is a dictionary matrix whose columns are formed by evaluating $a(\theta)$ on a set of parameter sample points $\bar{\theta}_1,\dots,\bar{\theta}_K$.
When the parameter samples are sufficiently dense, the approximation (12) will hold with the nonzero elements
of x given by the elements of α. The nonzero elements of x effectively select the columns of A with parameters
close to the true parameters {θm}. In the case of additive noise, this sampled variant of the additive model may be
solved via an ℓp-norm extension of BPDN
$$\tilde{x} = \arg\min_{x} \|y - Ax\|_2^2 + \lambda \|x\|_p^p, \tag{14}$$
where $\|x\|_p^p = \sum_{k=1}^{K} |x_k|^p$, $0 < p \le 1$. Note that $\lim_{p \to 0} \|x\|_p^p = \|x\|_0$. A solution $\tilde{x}$ to (14) then enables us to compute the order estimate $\hat{M}$, the amplitude estimates $\{\hat{\alpha}_m\}$, and the unobservable parameter estimates $\{\hat{\theta}_m\}$:
$$\hat{M} = \|\tilde{x}\|_0 \tag{15}$$
$$\hat{\alpha}_m = \tilde{x}_{I_m} \tag{16}$$
$$\hat{\theta}_m = \bar{\theta}_{I_m}, \tag{17}$$
where $\{I_m\}$ is an ordered set such that $|\tilde{x}_{I_m}|$ is the $m$th largest-magnitude element of $\tilde{x}$.
Clearly the order and parameter estimates (15)–(17) depend on the solution $\tilde{x}$ which, in turn, depends on i) how the parameter space is sampled in order to form the columns of A, ii) the value of p defining the $\ell_p$ norm, and iii) the value of the sparsity parameter, λ. The following section describes the proposed reconstruction algorithm and
how each of these three elements may be appropriately selected in order to effectively perform joint model order
selection and parameter estimation using sparse reconstruction.
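The dictionary construction in (10) and (13) and the read-out of estimates in (15)-(17) can be sketched as follows. This is an illustrative sketch under stated assumptions: the helper names are hypothetical, the component function `f` is assumed to be vectorized over the sample times `t`, and the sparse vector `x` is assumed to come from some solver of (14) that is not shown here.

```python
import numpy as np

def build_dictionary(f, t, theta_grid):
    """Columns a(theta_k) = [f(t_1,theta_k),...,f(t_N,theta_k)]^T with unit 2-norm."""
    A = np.column_stack([f(t, th) for th in theta_grid]).astype(complex)
    return A / np.linalg.norm(A, axis=0, keepdims=True)

def estimates_from_solution(x, theta_grid):
    """Order, amplitude, and parameter estimates as in (15)-(17)."""
    support = np.flatnonzero(x)                          # indices of nonzero entries
    order = np.argsort(-np.abs(x[support]), kind="stable")
    idx = support[order]                                 # I_1, I_2, ...: largest magnitude first
    return len(idx), x[idx], np.asarray(theta_grid)[idx]

# Hypothetical usage with a complex-exponential component and a coarse grid.
f = lambda t, th: np.exp(2j * np.pi * th * t)
t = 0.1 * np.arange(16)
grid = 0.625 * np.arange(16)
A = build_dictionary(f, t, grid)
x = np.zeros(16, dtype=complex)
x[3], x[7] = 0.5j, 2.0
M_hat, alpha_hat, theta_hat = estimates_from_solution(x, grid)
```

The parameter estimates are always grid points, which is why the grid spacing bounds the achievable accuracy.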
III. MAIN RESULTS
In this section we develop a procedure for implementing both model order selection and parameter estimation
using sparse reconstruction, and we discuss algorithmic and performance considerations in selecting p, λ, and the
θ sampling density.
A. Reconstruction Algorithm and λ-selection
In practice, a solution $\tilde{x}$ to the optimization (14) may only be “approximately sparse,” meaning that only a small number of components have significant magnitudes while many other components have negligible, but nonzero, magnitudes. These small-magnitude components contribute very little to the predicted model and effectively only serve to artificially increase the perceived model order. Therefore, we set the small-magnitude components of $\tilde{x}$ to zero using the thresholding operation
$$\hat{x} = H(\tilde{x}): \quad \hat{x}_k = \begin{cases} 0, & \text{if } 20\log_{10}\left(\dfrac{|\tilde{x}_k|}{\max_j |\tilde{x}_j|}\right) < \tau \\ \tilde{x}_k, & \text{otherwise} \end{cases} \tag{18}$$
to generate a new amplitude vector $\hat{x}$. Thus, all components of $\tilde{x}$ whose magnitude lies more than $|\tau|$ dB below that of the largest-magnitude component are set to zero.
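A minimal sketch of the thresholding operator in (18) is given below. The function name is hypothetical; the default of −40 dB is the value reported for the simulations later in the paper.

```python
import numpy as np

def hard_threshold_db(x_tilde, tau_db=-40.0):
    """Zero entries whose magnitude, relative to the peak, is below tau_db (in dB)."""
    x_tilde = np.asarray(x_tilde, dtype=complex)
    peak = np.max(np.abs(x_tilde))
    if peak == 0.0:
        return x_tilde.copy()                 # nothing to threshold
    # 20*log10(|x_k| / peak) < tau  is equivalent to  |x_k| < peak * 10**(tau/20)
    keep = np.abs(x_tilde) >= peak * 10.0 ** (tau_db / 20.0)
    return np.where(keep, x_tilde, 0.0)
```

With τ = −40 dB, any entry smaller than one hundredth of the peak magnitude is removed, so it no longer counts toward the model order in (15).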
We select λ based on the information criteria (4), which, for the linear model (12) in noise, takes the form
$$J(x) = \frac{1}{\sigma^2}\|y - Ax\|_2^2 + \mu\|x\|_0, \tag{19}$$
where the parameter µ is chosen as appropriate to implement AIC, BIC, or GIC. For example, $n_e = 5M$ for the sinusoids-in-noise problem, and from (5) we have for BIC $\eta = \frac{5}{2}\ln(N)\,\|x\|_0$, implying $\mu = \frac{5}{2}\ln(N)$. Writing the output of (14) and (18) as $\hat{x}(\lambda) = H(\tilde{x}(\lambda))$ to explicitly indicate the dependence on λ, we see that λ indexes a set of potential sparse reconstruction solutions. For large λ, $\|\hat{x}(\lambda)\|_0 = 0$, and as λ decreases the number of nonzero elements of $\hat{x}(\lambda)$ increases. We propose to select an optimal λ using the model order selection criterion in (4) as
$$\lambda_0 = \arg\min_{\lambda} J[H(\tilde{x}(\lambda))]. \tag{20}$$
The final sparse reconstruction is $\hat{x}(\lambda_0)$, which may be substituted for $\tilde{x}$ in (15)–(17) to produce the model order and parameter estimates.
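The cost in (19) is simple to evaluate once a candidate reconstruction is available. The sketch below uses hypothetical function names and assumes the noise variance is known, as in the text.

```python
import numpy as np

def info_cost(A, y, x, sigma2, mu):
    """J(x) from (19): scaled residual energy plus mu times the number of nonzeros."""
    resid = y - A @ x
    return np.vdot(resid, resid).real / sigma2 + mu * np.count_nonzero(x)

def bic_mu_sinusoids(N):
    """mu = (5/2) ln N, the BIC value for the sinusoids-in-noise model."""
    return 2.5 * np.log(N)
```

Evaluating `info_cost` on the thresholded output for each candidate λ gives the objective that is minimized over λ in (20).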
Previous approaches to λ-selection have largely been ad hoc or heuristic. Although cross-validation provides one principled method of choosing λ [14], [15], this approach can be computationally expensive and may necessitate the collection of extra training data. The approach presented here transfers the problem of λ-selection in (14) to
one of µ-selection in (19). The strength of this approach is that µ may be selected in a principled manner using a
particular information criterion, such as AIC or BIC.
In practice, the optimization of (20) is complicated by the discontinuity of the $\|\cdot\|_0$ norm in (19), and conventional gradient-based methods cannot be used. Any optimization algorithm capable of solving (20) can be used to find $\lambda_0$. One simple algorithm to approximate the minimum of $J(\hat{x}(\lambda))$ is a tree-like grid search. At the current stage of the search, the cost function is evaluated on a grid of points within a bounded interval of λ values. The two points with the smallest cost are retained and denoted as $\lambda_1$ and $\lambda_2$. The interval of the next search stage is centered at $(\lambda_1+\lambda_2)/2$ and has length $|\lambda_2-\lambda_1|$, and the minimum over a grid of points on this interval is sought. The search can be stopped after a fixed number of stages or when $|J(\hat{x}_{n+1}(\lambda)) - J(\hat{x}_n(\lambda))| \le \xi$, where $J(\hat{x}_n(\lambda))$ is the minimum cost at stage $n$, and ξ is a user-determined tolerance. The point with minimum cost at the final stage is chosen as $\lambda_0$. We use this tree search algorithm for the simulations in Section IV.
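The tree-like grid search described above can be sketched as follows. This is one possible reading of the description, not the authors' code: the function name and argument names are hypothetical, the default grid sizes and range follow the settings reported in Section IV, and `cost` stands for any callable that returns $J(\hat{x}(\lambda))$ for a given λ.

```python
import numpy as np

def tree_grid_search(cost, lo=1e-4, hi=1e2, n_first=27, n_refine=6,
                     n_stages=2, xi=0.0):
    """Approximate the minimizer of cost(lam) on [lo, hi] by grid refinement."""
    grid = np.logspace(np.log10(lo), np.log10(hi), n_first)  # first stage: log-spaced
    prev_min = np.inf
    for _ in range(n_stages):
        costs = np.array([cost(lam) for lam in grid])
        i1, i2 = np.argsort(costs, kind="stable")[:2]        # two smallest costs
        lam_best, cost_best = grid[i1], costs[i1]
        if abs(prev_min - cost_best) <= xi:                  # tolerance-based stop
            break
        prev_min = cost_best
        # Next interval: centered at (lam1 + lam2)/2 with length |lam2 - lam1|,
        # i.e. the interval between the two retained points.
        lo_next, hi_next = sorted((grid[i1], grid[i2]))
        grid = np.linspace(lo_next, hi_next, n_refine)       # later stages: equispaced
    return lam_best, cost_best
```

Because each refined grid contains both retained points, the best cost cannot increase from one stage to the next when the cost function is deterministic.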
The algorithm is summarized in Table I.
TABLE I
SPARSE RECONSTRUCTION FOR JOINT MODEL ORDER AND PARAMETER ESTIMATION: ALGORITHM SUMMARY
• Form a dictionary matrix A by evaluating the component function $f(t,\theta)$ at parameter samples $\bar{\theta}_1,\dots,\bar{\theta}_K$.
• Select a value of µ in (19) based on the desired information criterion (e.g., $\mu = \frac{5}{2}\ln(N)$ for BIC and a sinusoids-in-noise model).
• Minimize (20) to find the optimal sparsity parameter $\lambda_0$ and corresponding $\hat{x}(\lambda_0)$.
• Substitute $\tilde{x} = \hat{x}(\lambda_0)$ into (15)–(17) to obtain order and parameter estimates.
B. Algorithm Settings
When using sparse reconstruction to solve continuous estimation problems, performance depends on the choice
of algorithm settings, namely the dictionary sampling, p, and λ. In this section, we discuss the selection of these
settings, and demonstrate the effect that different settings have on order and parameter estimation performance.
1) Dictionary Sampling Considerations: The first step in sparse parameter estimation is to form a dictionary,
A, from the model component a(θ), by sampling from the space of potential parameter values θ. The choice of
parameter samples $\{\bar{\theta}_k\}$, and in particular the spacing between adjacent parameters, affects parameter estimation performance. Assuming the unobservable parameter θ is constrained to a region in $\mathbb{R}^N$, this region could be divided into an equi-spaced grid, and dictionary columns $a(\theta)$ may be evaluated on this grid. Alternatively, non-equi-spaced sampling mechanisms can be devised in order to minimize resultant parameter estimation error. An example of this sampling strategy is presented in [16], where sample spacing is based on the Cramér-Rao lower bound (CRB).
Regardless of sampling method, it is desirable to sample columns as densely as possible in θ-space because the estimation accuracy of θ is limited by the sample spacing of $\bar{\theta}_k$ used to form A; coarse sampling results
in quantization error. However, from a computational perspective, it is also desirable to constrain the number of
columns in A. Furthermore, as θ-sampling becomes finer, intercolumn correlation increases, A does not satisfy the
RIP, and the solution to (14) may degrade.
In Fig. 1 we illustrate these properties as a function of θ-space sampling for a model order 1 sinusoids-in-noise
estimation problem. The details of the model and simulation parameters are presented in Section IV, where the
sinusoids-in-noise problem is considered in detail. Fig. 1 shows both the average estimated model order and the
average residual error versus ∆θ for an equi-spaced grid. The residual error is defined as $\|y_t - A\hat{x}\|_2^2$, where $y_t$ is a noiseless version of the signal known to the simulation, and θ is the frequency $f$ of the sinusoid. The solid line demonstrates the performance of the proposed algorithm using BIC-based selection of λ. The dotted line demonstrates performance for an ad hoc selection of λ; this fixed value of λ was chosen as the BIC-based value for the specific sample spacing of $\Delta\theta = 2.4 \times 10^{-3}$ Hz.
For the fixed value of λ, model order estimates and residual error are a strong function of θ-sampling. For large
∆θ, more model components are selected to fit the measured data. Better model order and residual performance can
be achieved by decreasing the sample spacing in the dictionary; however, as sampling becomes fine, performance
begins to degrade. This degradation is caused by the optimization routine used to solve (14) converging to local
minima. The convergence issues arise due to the non-convexity of (14) that results for p < 1. The value p = 0.8
was used in this example; motivation for this value of p is described in the following section. For small ∆θ, we observe that the majority of the model order estimates are correct ($\hat{M} = 1$); however, as ∆θ decreases, local minima yield $\hat{M} = 0$ increasingly often. This is reflected in Fig. 1(a) by average model orders less than 1 for small ∆θ.
Fig. 1 demonstrates that BIC-based λ selection is superior to the ad hoc fixed-λ selection. Whereas fixed-λ performance is sensitive to ∆θ, BIC λ selection is not. For BIC selection, the estimated model order is approximately
1 (the correct value) over the range of ∆θ considered, and the residual error is always lower than the fixed-λ residual.
Fig. 1. Average estimated model order and residual error versus ∆θ: (a) estimated model order; (b) residual error. True model order is M = 1. The solid line corresponds to BIC-based λ selection; the dotted line corresponds to fixed λ. Error bars in each plot indicate standard error.
Furthermore, local convergence to $\hat{M} = 0$ was not observed in the BIC selection case.
2) Selection of p: The dictionaries used for sparse parameter estimation typically have high intercolumn correlation. In this case, the RIP of compressive sensing [1] is not satisfied, and compressive sensing bounds on estimation error cannot be applied. For example, for a sinusoidal model, the correlation between columns is described by
a sinc function, where the independent variable is the distance between frequencies. As the frequency separation
decreases, more columns will have high correlation on the main lobe of the sinc function.
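The growth of intercolumn correlation with finer frequency sampling can be checked numerically. The sketch below assumes unit-norm complex-exponential columns on equi-spaced samples with N = 16 and T = 0.1 s (the simulation settings of Section IV); the function names are hypothetical. For equi-spaced samples the correlation magnitude is the Dirichlet (periodic sinc) kernel, which behaves like a sinc when the frequency separation is small relative to 1/T.

```python
import numpy as np

def column(f, N=16, T=0.1):
    """Unit-norm dictionary column for frequency f (Hz)."""
    t = T * np.arange(N)
    return np.exp(2j * np.pi * f * t) / np.sqrt(N)

def correlation(df, N=16, T=0.1):
    """|a(f)^H a(f + df)|, which depends only on the separation df."""
    return abs(np.vdot(column(0.0, N, T), column(df, N, T)))

def dirichlet(df, N=16, T=0.1):
    """Closed-form magnitude of the same correlation (df not a multiple of 1/T)."""
    return abs(np.sin(np.pi * df * N * T) / (N * np.sin(np.pi * df * T)))

rayleigh = 1.0 / (16 * 0.1)               # one Rayleigh cell, 0.625 Hz
coarse = correlation(rayleigh)            # columns one cell apart
fine = correlation(rayleigh / 256)        # adjacent columns, 256x oversampled
```

Columns one Rayleigh cell apart are essentially orthogonal, whereas adjacent columns of a heavily oversampled dictionary are almost collinear, which is the regime where the RIP fails.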
Using p < 1 has been shown to be beneficial in reducing $\ell_2$ estimation error in a similar optimization problem [17] under RIP conditions. However, when p < 1, (14) is a non-convex optimization problem, and the optimization routine used to solve (14) is not guaranteed to converge to the global minimum. Although global minimization is not guaranteed when p < 1, empirical evidence in our experiments demonstrates that using p < 1 is beneficial for parameter estimation, even when RIP is violated. Fig. 2 shows the residual and estimated model order versus p for the same sinusoid-in-noise (M = 1) estimation problem. Both residual error and model order performance improve for decreasing p until approximately 0.8, after which there is very little change. Selection of any p < 0.8 results in similar algorithm performance. We observe that reconstruction performance for p > 0.8 is sensitive to tolerance settings in the optimization routine used in minimizing (14).
Fig. 2. Average estimated model order and residual error versus p: (a) estimated model order; (b) residual error. True model order is M = 1. The information criterion BIC is used to select λ. Error bars in each plot indicate standard error.
IV. NUMERICAL EXAMPLES: SINUSOID ESTIMATION
In this section we examine model order and parameter estimation performance for the sinusoids-in-noise model
$$y_n = \sum_{m=1}^{M} \alpha_m e^{j2\pi f_m t_n} + \epsilon_n, \quad n = 1,\dots,N, \tag{21}$$
where M is the model order and N is the number of time samples; $f_m$ and $\alpha_m$ are the frequency in Hz and complex amplitude of the mth sinusoid; $t_n$ is the time parameter in seconds, which we assume is equi-spaced and given by $t_n = (n-1)T$. In general, time samples do not have to be equi-spaced. The sampling period is T, and $\epsilon_n$ is i.i.d. $\mathcal{CN}(0,\sigma^2)$ noise. This model is commonly used in a wide range of applications, including radar imaging and direction of arrival processing [18].
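For readers who wish to experiment with the model (21), the following sketch generates one noisy realization. The function name and the example frequency are hypothetical; the defaults N = 16 and T = 0.1 s match the simulation settings given later, and the circular complex Gaussian noise is drawn with variance σ²/2 in each of the real and imaginary parts.

```python
import numpy as np

def sinusoids_in_noise(freqs, amps, N=16, T=0.1, sigma2=1.0, rng=None):
    """Return (y, clean, t) for the model (21) with equi-spaced samples."""
    rng = np.random.default_rng() if rng is None else rng
    t = T * np.arange(N)                                  # t_n = (n-1)T, n = 1..N
    clean = np.exp(2j * np.pi * np.outer(t, freqs)) @ np.asarray(amps, dtype=complex)
    # CN(0, sigma2): real and imaginary parts each have variance sigma2/2
    noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    return clean + noise, clean, t

# One unit-amplitude sinusoid at a hypothetical 1 Hz with SNR = |alpha|^2/sigma2 = 10
y, clean, t = sinusoids_in_noise([1.0], [np.exp(0.6129j)], sigma2=0.1,
                                 rng=np.random.default_rng(0))
```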
The sinusoids-in-noise model is an additive component model and can be represented as a linear system of
the form (12) for sparse estimation. We compare sparse estimation performance under different noise levels to
the traditional ESPRIT spectral estimation method. We first review spectral estimation methods and fundamental
resolution and parameter variance limits for the sinusoids-in-noise problem.
A. Spectral Estimation Methods
Traditional spectral estimation methods estimate the nonlinear frequency parameters, fm, in the sinusoids-in-noise
model (21). Once the frequency parameters are estimated, linear least-squares estimation can be used to estimate
the remaining linear amplitude parameters. Nonparametric (e.g., periodogram and correlogram) and parametric
(maximum-likelihood, AR, ARMA, MUSIC, ESPRIT, etc.) estimation methods have been proposed for frequency
estimation [11].
When using any of the previously mentioned methods to estimate the frequency parameters, it is assumed that
the model order M is known. In practice the model order is often unknown a priori and must be estimated before
frequency estimation is performed; for non-parametric methods, the model order dictates the number of peaks in the
power spectral density (PSD) to pick, and in the parametric methods, the model order is needed to set the number
of components in (21).
Many model order selection methods have been proposed in the literature. We utilize information criterion model
order selection methods here as discussed in previous sections. These methods can be applied to any signal where
a parametric model is known, as is the case here.
In this paper, as a benchmark for the sparse reconstruction approach, we use the parametric method ESPRIT [11]
with the model order selection method BIC for estimating the frequencies and model order in (21). We chose ESPRIT
for its ability to superresolve frequencies, its computational efficiency, and its near-optimal statistical estimation
performance [19]. For equi-spaced time samples in (21), we define superresolution as the ability to discriminate between two frequencies spaced closer than a Rayleigh resolution cell, defined as $1/(NT)$ Hz.
We note that sparse parameter estimation is more general than ESPRIT. Whereas ESPRIT applies only to sinusoidal estimation, sparse parameter estimation can be applied to any problem approximated by a linear additive model.
B. Parameter Error and Resolution Limits
We characterize performance by two types of error: model order error and parameter estimation error. Given the
correct model order, parameter estimation error of an unbiased estimate can be quantified with respect to the CRB.
For the sinusoids in i.i.d. $\mathcal{CN}(0,\sigma^2)$ noise model (21), with known variance, the CRB is [11]
$$E\left[(\hat{\Phi}-\Phi)(\hat{\Phi}-\Phi)^T\right] \ge \left[\frac{2}{\sigma^2}\,\mathrm{Re}\{\mu_i^H \mu_j\}\right]^{-1}_{i,j} \tag{22}$$
where
$$\Phi = [f_1,\dots,f_M,\ \mathrm{Re}\{\alpha_1\},\dots,\mathrm{Re}\{\alpha_M\},\ \mathrm{Im}\{\alpha_1\},\dots,\mathrm{Im}\{\alpha_M\}]^T,$$
and where Φ and $\hat{\Phi}$ are the vectors of true parameters and parameter estimates, respectively. The real and imaginary operators are Re{·} and Im{·}, respectively. The derivative of the vector $\left[\sum_{m=1}^{M} \alpha_m e^{j2\pi f_m t_n}\right]_n$ with respect to the ith parameter of Φ is denoted $\mu_i$; superscript ‘H’ indicates Hermitian transpose, and A ≥ B means A − B is a positive semi-definite matrix. Using this relation, we can lower bound the variance on unbiased parameter estimates. For a fixed model order of 1 and data model (21), the CRB for frequency estimates using equi-spaced time samples is given by
$$\mathrm{var}(\hat{f}) \ge \frac{1}{2T^2\,\mathrm{SNR}} \cdot \frac{N}{N\sum_{n=0}^{N-1} n^2 - \left(\sum_{n=0}^{N-1} n\right)^2}, \tag{23}$$
where SNR is defined as |α|2/σ2, and α is the signal amplitude. In the following section, we compare frequency
parameter estimation to the lower bound on variance given by the CRB.
A CRB analysis can also be used to bound the achievable resolution of two closely spaced sinusoids. One way to
define resolution is to declare the sinusoids resolved if frequencies are separated by more than some multiple of the
minimum standard deviation as determined by the CRB [20], [21]. Here, we define two sinusoids to be resolvable
if they are separated in frequency by at least the minimum standard deviation of an unbiased frequency separation
estimate. Using this resolvability metric, it can be shown that for equal-amplitude, closely spaced sinusoids, the
minimum frequency separation, ∆f, can be approximated by [22]
$$\Delta_f = \frac{1}{2\pi T}\left[\frac{2880\,(N^2+131)}{4N\,\mathrm{SNR}\,(N^2-1)(N^2-4)(N^2-9)}\right]^{1/4}. \tag{24}$$
The SNR is defined as above, where α is the amplitude of each sinusoid.
C. Simulations
In this section, we compare the performance of the proposed sparse estimation algorithm to the ESPRIT spectral
estimation algorithm for the sinusoids-in-noise model estimation problem. The sparse estimation algorithm jointly
estimates model order and parameters as discussed in Section III. We use BIC to select model order in both algorithms; so, we set $\mu = \frac{5}{2}\ln(N)$, corresponding to BIC, in (19). In ESPRIT, frequencies are estimated for a
fixed model order, and amplitudes are estimated using least-squares estimation with ESPRIT frequency estimates
substituted for the true frequencies.
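The amplitude step can be illustrated with a linear least-squares fit. This sketch covers only the amplitude estimate given a set of frequency estimates; the ESPRIT frequency estimator itself is not shown, and the function name and example frequencies are hypothetical.

```python
import numpy as np

def ls_amplitudes(y, freqs, T=0.1):
    """Least-squares complex amplitudes for fixed frequency estimates (Hz)."""
    t = T * np.arange(len(y))
    V = np.exp(2j * np.pi * np.outer(t, freqs))      # N x M matrix of sinusoids
    amps, *_ = np.linalg.lstsq(V, y, rcond=None)
    return amps

# Noiseless check with two unit-magnitude sinusoids at hypothetical frequencies
t = 0.1 * np.arange(16)
true_amps = np.array([np.exp(0.6129j), np.exp(3.9732j)])
y = np.exp(2j * np.pi * np.outer(t, [1.0, 1.5])) @ true_amps
amps_hat = ls_amplitudes(y, [1.0, 1.5])
```

On noiseless data with the exact frequencies, the fit returns the true amplitudes up to numerical precision.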
The following results use Monte-Carlo simulation for 200 noisy realizations. True sinusoid amplitudes are held
at a constant SNR, defined as $|\alpha|^2/\sigma^2$, for each realization, where α is the amplitude of the sinusoids, and $\sigma^2$ is the variance of i.i.d. complex circular Gaussian noise. We define dB in this section as $10\log_{10}(\cdot)$. If there is more than
one sinusoid, both are given the same magnitude. Phases are held constant at 0.6129 radians for one sinusoid and
0.6129 and 3.9732 radians for two sinusoids; these phases were drawn uniformly at random from [0, 2π] and then
fixed. Frequency parameters are held constant with different separations depending on the simulation scenario. All
simulations use N = 16 time samples, uniformly sampled at t_n = (n−1)T, n = 1,...,N, with sampling
period T = 0.1 s. In sparse estimation simulations, we use a dictionary that is 256 times frequency oversampled,
meaning that there are 16×256 columns in the dictionary, and the frequency separation between adjacent columns
is (1/256) × 1/(NT) = 2.4·10^−3 Hz. For ESPRIT estimation, it is assumed that the maximum model order is N/2 = 8, and
the correlation window length, k, is chosen to be the one that gives the best model order selection performance for
k = N/2+1,...,N. Although this is not possible in practice, we use this window selection to provide a best-case
ESPRIT estimate for comparison with sparse parameter estimation.
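The dictionary size and one noisy realization can be sketched as below. This is an illustration under stated assumptions, not the code used for the reported results: the atoms are taken to be unnormalized complex exponentials on a uniform grid covering [0, 1/T), the sinusoids have unit magnitude, and the 5 Hz test frequency and the helper name `noisy_realization` are hypothetical.

```python
import numpy as np

N, T, oversample = 16, 0.1, 256
t = np.arange(N) * T                                  # t_n = (n - 1) T, n = 1..N
grid_spacing_hz = 1.0 / (oversample * N * T)          # about 2.4e-3 Hz
dict_freqs = np.arange(N * oversample) * grid_spacing_hz  # assumed grid on [0, 1/T)
A = np.exp(2j * np.pi * np.outer(t, dict_freqs))      # N x (16*256) dictionary


def noisy_realization(freqs_hz, phases, snr_db, rng):
    """Unit-magnitude sinusoids plus i.i.d. complex circular Gaussian noise."""
    amps = np.exp(1j * np.asarray(phases))
    clean = np.exp(2j * np.pi * np.outer(t, freqs_hz)) @ amps
    sigma2 = 1.0 / 10 ** (snr_db / 10)                # SNR = |alpha|^2 / sigma^2, |alpha| = 1
    # Real and imaginary parts each carry half of the noise variance.
    noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    return clean + noise


rng = np.random.default_rng(0)
y = noisy_realization([5.0], [0.6129], snr_db=10, rng=rng)
print(A.shape, grid_spacing_hz)
```

A Monte-Carlo study then repeats `noisy_realization` with fresh noise, 200 times in the experiments reported here.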
We use p = 0.8 in the simulations, as motivated by the discussion in Section III. The optimization transfer
algorithm presented in [23] is used to solve (14), and the initial value given to the optimization routine is x_0 = A^H y.
A value of τ = −40 dB was used in the threshold operator (18). The tree search for finding λ uses a search
depth of two levels. The first level consists of 27 logarithmically spaced samples in the range [10^−4, 10^2], and the
second level consists of 6 equispaced samples. Deeper searches with finer refinements require longer run times
and provide similar results. The simulation and optimization settings described here were also used in Figs. 1 and
2 in Section III.
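One possible reading of the two-level search for λ is sketched below. Two things are assumptions rather than details stated here: the second-level interval is taken to span the neighbors of the best first-level sample, and `score` is a hypothetical callable standing in for the selection criterion of Section III (lower is better).

```python
import numpy as np


def two_level_search(score, lo_exp=-4.0, hi_exp=2.0, n_coarse=27, n_fine=6):
    """Coarse log-spaced search followed by one equispaced refinement."""
    coarse = np.logspace(lo_exp, hi_exp, n_coarse)
    coarse_scores = np.array([score(lam) for lam in coarse])
    best = int(np.argmin(coarse_scores))
    # Assumed refinement interval: between the neighbors of the best coarse sample.
    left = coarse[max(best - 1, 0)]
    right = coarse[min(best + 1, n_coarse - 1)]
    fine = np.linspace(left, right, n_fine)
    fine_scores = np.array([score(lam) for lam in fine])
    # Return the best candidate seen at either level.
    candidates = np.concatenate([coarse, fine])
    scores = np.concatenate([coarse_scores, fine_scores])
    return float(candidates[int(np.argmin(scores))])


def toy_score(lam):
    """Toy stand-in criterion, minimized at lambda = 0.3."""
    return (np.log10(lam) - np.log10(0.3)) ** 2


lam_hat = two_level_search(toy_score)
print(lam_hat)
```

Each call to `score` would require solving (14) once, so this search costs 27 + 6 solves per realization, which is consistent with the remark that deeper searches lengthen run times.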
1) One sinusoid: Figs. 3 and 4 show model order selection probability histograms and amplitude-frequency
parameter plots for a single sinusoid in noise at SNRs of 10 dB and 0 dB, respectively. Each parameter plot draws a line at
the estimated frequency, with amplitude encoded by line height, for each of the 200 Monte-Carlo realizations.
Parameter plots are shown both when the estimated model order is the correct order of 1 and when the
estimated model order is 2.
ESPRIT outperforms the sparse estimator in model order estimation performance for both noise levels, but is
only marginally better for low SNR. Parameter plots of incorrectly selected model order 2 show the frequency
closest to the true frequency in black and the second estimated frequency in green. For both estimators, the second
spurious frequency estimates tend to be small in magnitude compared to estimates close to the true frequency;
as SNR decreases, these spurious estimates increase in magnitude with respect to the estimate closer to the true
frequency. So, the model order performance discrepancy in the high SNR case may not be significant, if one low
magnitude spurious frequency estimate can be tolerated.
For the cases where the estimated model order equals the true order (M̂ = M = 1), dashed vertical blue lines
in the parameter plots are shown at the true frequency plus and minus twice the square-root of the CRB in (22).
Most of the realizations are contained inside the CRB lines. Table II shows the square-root of the CRB and root-
mean-squared error (RMSE) for 100% and 95% of sparse model and ESPRIT parameter estimates, given the correct
model order estimate. When only 95% of the estimates are used to calculate RMSE, 5% outliers are discarded. We
show RMSE with 95% of the estimates, since a small number of outliers can skew RMSE results to make it appear
that the estimator is performing poorly. The RMSE of the sparse algorithm is slightly better than that of ESPRIT,
and sparse estimation comes close to CRB performance at 95% RMSE.

[Figure 3: six panels. (a) Sparse model order and (b) ESPRIT model order: probability versus model order. (c) Sparse parameter estimates (M̂ = 1), (d) ESPRIT parameter estimates (M̂ = 1), (e) sparse parameter estimates (M̂ = 2), and (f) ESPRIT parameter estimates (M̂ = 2): magnitude versus frequency (Hz).]
Fig. 3. Model order probability and parameter estimates for true model order 1. Simulations were run over 200 realizations with 10 dB SNR.
The red ‘x’ and vertical red dotted line indicate the position of the true sinusoid which has magnitude 1, shown by a horizontal red dotted line.
The dashed blue lines are located at twice the square-root of the CRB from the true frequency.

TABLE II
SQUARE-ROOT OF CRB AND RMSE OF FREQUENCY ESTIMATES GIVEN THAT THE TRUE MODEL ORDER 1 IS SELECTED.
SNR (dB)   √CRB f1   RMSE f1 (100%)        RMSE f1 (95%)
                     Sparse    ESPRIT      Sparse    ESPRIT
0          0.0610    0.0730    0.0808      0.0626    0.0690
10         0.0193    0.0227    0.0313      0.0198    0.0279

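The 100% and 95% RMSE figures can be computed as in the following sketch. It assumes that the discarded 5% are the estimates with the largest absolute error; the exact outlier rule is not spelled out here. The function name and the synthetic data are illustrative only.

```python
import numpy as np


def rmse_keep_fraction(estimates, true_value, keep=0.95):
    """RMSE over the `keep` fraction of estimates with the smallest absolute error."""
    errors = np.sort(np.abs(np.asarray(estimates, dtype=float) - true_value))
    n_keep = int(np.ceil(keep * errors.size))
    return float(np.sqrt(np.mean(errors[:n_keep] ** 2)))


# Synthetic frequency estimates: mostly accurate, with a few gross outliers.
rng = np.random.default_rng(1)
f_hat = 5.0 + 0.02 * rng.standard_normal(200)
f_hat[:4] += 1.0

rmse_all = rmse_keep_fraction(f_hat, 5.0, keep=1.0)
rmse_95 = rmse_keep_fraction(f_hat, 5.0, keep=0.95)
print(rmse_all, rmse_95)
```

In this synthetic example, four outliers out of 200 dominate the full RMSE, whereas the 95% figure reflects the spread of the remaining estimates, which is the motivation given above for reporting both.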
2) Two well-separated sinusoids: In this example, we consider the model order and parameter estimation
performance when two sinusoids are well separated—4 Rayleigh resolution bins apart. Model order and parameter
estimation performance is shown in Figs. 5 and 6 for SNRs of 10 dB and 0 dB, respectively. Parameter plots are
shown only when the estimated model order is equal to the true model order of 2; the trends for model order
overestimation are similar to those in the one-sinusoid case. Estimates associated with the lower true frequency
are colored black and estimates associated with the higher true frequency are colored green. After associating two
estimated frequencies to the two true frequencies, the remaining spurious estimates are typically small but become
larger as SNR decreases. When model order is underestimated in the low SNR case, the single sinusoid estimate is
close in frequency to one of the true sinusoids. To associate estimated sinusoids with the true sinusoids, we perform
a combinatorial search over all pairings of estimated and true frequencies. The pairing that achieves the
minimum least-squares frequency distance is selected as the data association. This association method is also used
for RMSE calculation. We see that a majority of the frequency estimates fall within the ±2σ CRB lines shown in
the parameter plots. The sums of the RMSEs and of the square-roots of the CRBs for the frequency estimates are shown in Table III.
Performance is similar for both estimation methods, but is slightly better for sparse estimation, especially for high
SNR.

[Figure 4: six panels. (a) Sparse model order and (b) ESPRIT model order: probability versus model order. (c) Sparse parameter estimates (M̂ = 1), (d) ESPRIT parameter estimates (M̂ = 1), (e) sparse parameter estimates (M̂ = 2), and (f) ESPRIT parameter estimates (M̂ = 2): magnitude versus frequency (Hz).]
Fig. 4. Model order probability and parameter estimates for true model order 1. Simulations were run over 200 realizations with 0 dB SNR.
The red ‘x’ and vertical red dotted line indicate the position of the true sinusoid which has magnitude 1, shown by a horizontal red dotted
line. The dashed blue lines are located at twice the square-root of the CRB from the true frequency.

TABLE III
SUM OF SQUARE-ROOT OF CRBS AND SUM OF RMSES OF FREQUENCY ESTIMATES GIVEN THAT THE TRUE MODEL ORDER 2 IS SELECTED
FOR WELL SEPARATED SINUSOIDS.
SNR (dB)   √CRB f1 + √CRB f2   RMSE f1 + RMSE f2 (100%)   RMSE f1 + RMSE f2 (95%)
                               Sparse    ESPRIT           Sparse    ESPRIT
0          0.1236              0.1642    0.1642           0.1508    0.1524
10         0.0390              0.0466    0.0577           0.0435    0.0541

[Figure 5: four panels. (a) Sparse model order and (b) ESPRIT model order: probability versus model order. (c) Sparse parameter estimates and (d) ESPRIT parameter estimates: magnitude versus frequency (Hz).]
Fig. 5. Model order probability and parameter estimates for true model order 2 with sinusoids spaced 4 Rayleigh bins apart. Simulations
were run over 200 realizations with 10 dB SNR. The red ‘x’s and vertical red dotted lines indicate the position of the true sinusoids which
have magnitude 1, shown by a horizontal red dotted line. The dashed blue lines are located at twice the square-root of the CRB from the true
frequency.
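The data association used for the well-separated case, a search over all pairings of estimated and true frequencies that minimizes the least-squares frequency distance, can be sketched as follows. The function name `associate` and the example values are illustrative. The sketch assumes at least as many estimates as true frequencies; otherwise it returns no pairing.

```python
from itertools import permutations

import numpy as np


def associate(est_freqs, true_freqs):
    """Return (indices, cost): for each true frequency, the index of its assigned estimate."""
    est = np.asarray(est_freqs, dtype=float)
    true = np.asarray(true_freqs, dtype=float)
    best_cost, best_idx = np.inf, None
    # Exhaustive search over ordered selections of estimates, one per true frequency.
    for idx in permutations(range(est.size), true.size):
        cost = float(np.sum((est[list(idx)] - true) ** 2))
        if cost < best_cost:
            best_cost, best_idx = cost, idx
    return best_idx, best_cost


# Three estimates (one spurious) matched against two true frequencies.
idx, cost = associate([7.48, 2.10, 5.03], [5.0, 7.5])
print(idx, cost)
```

Here the estimate at 5.03 Hz is paired with the true frequency 5.0 Hz and the estimate at 7.48 Hz with 7.5 Hz, leaving 2.10 Hz as the spurious estimate. The exhaustive search is affordable because the maximum model order considered is small.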
3) Two closely-spaced sinusoids: Estimation performance for two closely spaced sinusoids is examined next.
We use (24) to select the frequency spacing; it suggests that the sinusoids should be spaced no closer than 0.26 and
0.46 Rayleigh bins for SNRs of 10 dB and 0 dB, respectively. We thus select slightly larger spacings of 0.3
and 0.5 Rayleigh bins for 10 dB and 0 dB, respectively.
Figs. 7 and 8 show model order and parameter estimation performance. We note that for both SNRs, many
outliers in the ESPRIT plots for model order 2 appear outside the frequency axis range shown. The middle row of
these figures shows the estimated parameters when the estimated order equals the true order (M̂ = M = 2), and the
bottom row shows the parameter estimates when the model order is incorrectly estimated as 1. In the 10 dB SNR
case, there is no parameter plot for M̂ = 1 because model order 1 is never estimated in any of the realizations.
From these figures we see that model order estimation performance is significantly better for the sparse estimation
algorithm than for the ESPRIT algorithm, although performance of both algorithms is poor for low SNR. ESPRIT
underestimates model order with high probability for both SNRs. When model order is underestimated, both