Sparse representations and compressive sampling
approaches in engineering mechanics: A review of
theoretical concepts and diverse applications
Ioannis A. Kougioumtzoglou, Ioannis Petromichelakis, Apostolos F. Psaros
Department of Civil Engineering and Engineering Mechanics,
Columbia University, 500 W 120th St, New York, NY 10027, United States
A review of theoretical concepts and diverse applications of sparse represen-
tations and compressive sampling (CS) approaches in engineering mechanics
problems is provided from a broad perspective. First, following a presenta-
tion of well-established CS concepts and optimization algorithms, attention is
directed to currently emerging tools and techniques for enhancing solution sparsity and for exploiting additional information in the data. These include formulations alternative to ℓ1-norm minimization, iterative re-weighting solution schemes, Bayesian approaches, as well as structured sparsity and dictionary
learning strategies. Next, CS-based research work of relevance to engineering
mechanics problems is categorized and discussed under three distinct applica-
tion areas: a) inverse problems in structural health monitoring, b) uncertainty
modeling and simulation, and c) computationally efficient uncertainty propa-
gation. Notably, the vast majority of problems in all three areas share the
challenge of “incomplete data”, addressed by the versatile CS framework. In
this regard, incomplete data may manifest in various forms and can correspond to missing or compressed data, or even refer generally to
insufficiently few function evaluations. The primary objective of this review
paper relates to identifying and presenting significant contributions in each of
the above three application areas in engineering mechanics, with the goal of
expediting additional research and development efforts. To this aim, an extensive list of 248 references is provided, composed almost exclusively of books and archival papers that are readily available to a potential reader.
Keywords: sparse representations, compressive sampling, engineering
mechanics, uncertainty quantification, incomplete data
Corresponding Author
Email address: (Ioannis A. Kougioumtzoglou)
Preprint submitted to Probabilistic Engineering Mechanics June 23, 2020
1. Introduction
The problem of determining the current and predicting the future states of a
system based on knowledge of a limited number of data points has been a persis-
tent challenge in a wide range of scientific fields. Advancements in this direction
have led to various significant theoretical results, which have unequivocally rev-
olutionized modern science. One of the most characteristic examples relates to
the development of representations based on Fourier series [1]. This trigono-
metric series expansion of periodic functions has served as the starting point for
various efficient expansion and representation schemes (e.g., [2]). During the
past fifteen years, research efforts have focused on identifying and exploiting
low-dimensional representations of high-dimensional data, as well as on estab-
lishing conditions guaranteeing unique representation in the low-dimensional
space. This has triggered the birth of the currently expanding field of compres-
sive sampling (CS) (e.g., [3,4]), as well as the rejuvenation of the more general
field of sparse representations (e.g., [5,6]).
From a historical perspective, there have been several examples and early ob-
servations suggesting that signal reconstruction is possible by utilizing a smaller
number of samples than the minimum dictated by the Shannon-Nyquist (SN)
theorem (e.g., [2,7]). Indicatively, Carathéodory showed in [8,9] that a signal expressed as a sum of any k sinusoids can be recovered based on knowledge of its values at zero time and at any other 2k time points. Further, Beurling
[10] discussed the possibility of extrapolating in a nonlinear manner and deter-
mining the complete Fourier transform of a signal assuming that only part of
the Fourier transform is known. Dorfman [11] studied the combinatorial group
testing problem and provided one of the first sparse signal recovery problem
formulations. Also, Logan [12] showed that it is possible to reconstruct a band-limited corrupted signal by an ℓ1-norm minimization approach. These early,
seemingly paradoxical, results were further supported by relevant studies in the
field of geophysics [13–15] (see also [16]) pertaining to the analysis of seismic
signals of spike train form due to the layered structure of geological formations.
It was shown that these sparse spike trains can be recovered accurately based
on incomplete and noisy measurements.
Nevertheless, it can hardly be disputed that sparse representations theory
and tools have been revitalized in recent years due to the pioneering work
in [17–19], which provided bounds on the number of measurements required
for the recovery of high-dimensional data under the condition that the lat-
ter possess a low-dimensional representation in a transformed domain. The
aforementioned theoretical results, coupled with potent numerical algorithms
from the well-established field of convex optimization, have led to numerous
impactful contributions in a wide range of application areas. In this regard,
CS-related theoretical advancements and diverse applications associated with
the fields of signal and image processing, biomedicine, communication systems
and sensor networks, information security and pattern recognition have been
well-documented in dedicated books (e.g., [3,4,20–26]), special issues (e.g., [27]) and review papers (e.g., [28–43]).
More recently, the field of engineering mechanics has also benefited from the
advent of sparse representations and CS approaches in conjunction with uncer-
tainty quantification and health monitoring of diverse systems and structures.
However, to the best of the authors’ knowledge, there are currently no review
papers providing a comprehensive discussion and a broad perspective on the
aforementioned developments in engineering mechanics. In fact, although there
have been a couple of relevant efforts reported previously, these either focus on specific and relatively narrow application domains, or are cross-disciplinary in nature and lack focus on a specific research field. Indicatively, the authors
in [44] focus exclusively on reviewing polynomial chaos expansions coupled with
CS approaches as applied in stochastic mechanics problems, whereas reference
[45] discusses the problem of CS-based governing dynamics modeling of complex
systems with applications in interdisciplinary science and engineering.
In this regard, in an effort to address this gap in the literature and to com-
plement some of the previous works by incorporating more recent developments,
this review paper focuses on sparse representations and CS approaches in the
field of engineering mechanics. Specifically, in Section 2, following a presen-
tation of well-established CS concepts and optimization algorithms, attention
is directed to currently emerging tools and techniques for enhancing solution
sparsity and for exploiting additional information in the data. These include
formulations alternative to ℓ1-norm minimization, iterative re-weighting solution schemes, Bayesian approaches, as well as structured sparsity and dictionary learning strategies. Next, in Section 3, a rather broad perspective is
provided on CS-related contributions to engineering mechanics, and relevant
research work is categorized under three distinct application areas: a) inverse
problems in structural health monitoring, b) uncertainty modeling and simu-
lation, and c) computationally efficient uncertainty propagation. Notably, the
vast majority of problems in all three areas share the challenge of “incomplete
data”, addressed by the versatile CS framework. In this regard, incomplete data may manifest in various forms and can correspond to missing or compressed data, or even refer generally to insufficiently few function evaluations. Further, concluding remarks are presented in Section 4. It is
noted that the primary objective of this review paper relates to identifying and
discussing significant contributions in each of the above three application areas
in engineering mechanics, with the goal of expediting additional research and
development efforts. To this aim, an extensive list of 248 references is provided,
composed almost exclusively of books and archival papers that are readily available to a potential reader.
2. Theoretical concepts and algorithmic aspects
In this section, the basic theoretical concepts and algorithmic aspects related
to sparse representations and CS tools are reviewed. To enhance the pedagogical
merit of the paper and motivate the reader, a simple example is provided first
where the necessity for CS tools arises naturally. Next, the problem of approxi-
mating a sparse signal is formulated as an optimization problem and solved via
a brute-force approach. The need for more computationally efficient tools is dis-
cussed and relevant methodologies are presented. Further, the critical question
regarding performance guarantees and the number of required measurements is
addressed. Lastly, methodologies for exploiting additional information present
in the data and for enhancing solution sparsity are also discussed. The interested reader is also directed to the books in references [3–6] for a more detailed presentation from signal processing and mathematics perspectives.
2.1. Motivation
The SN theorem states that a bandlimited continuous-time signal can be
exactly reconstructed by a set of uniformly spaced measurements sampled at
the Nyquist rate; i.e., at a frequency double the maximum frequency present in
the signal (e.g., [2,7]). Although the SN theorem has impacted significantly the
signal processing field and related applications, in many cases the dictated min-
imum number of measurements can be prohibitive from a computational cost
perspective [3]. In this regard, the acquired signals are often compressed by
utilizing an appropriate change of basis. In this new basis, the expansion coeffi-
cient vector has only few nonzero elements; thus, yielding significant savings in
terms of required storage capacity (see, for instance, related “lossy” compression
techniques [46]).
In this context, a vector x of length n is referred to as k-sparse if at most k out of its n components are nonzero; that is,
$$\|x\|_0 \le k \qquad (1)$$
where ‖·‖p denotes the ℓp-norm defined as [6]
$$\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p} \qquad (2)$$
for 0 < p < ∞. Note that, according to the definition of Eq. (2), ‖·‖0 (also called the cardinality of x) is not a proper norm. However, it can be defined as the limit of ‖x‖p^p for p → 0; that is,
$$\|x\|_0 = \lim_{p \to 0} \|x\|_p^p = \sum_{i=1}^{n} I(x_i) \qquad (3)$$
In Eq. (3), I(xi) is the indicator function, which takes the value 0 if xi = 0 and
1 otherwise [6]. Moreover, if the coefficient vector is k-sparse, then the original
signal is characterized as sparse, or, in other words, it exhibits a sparse repre-
sentation in this particular expansion basis. Accordingly, a vector is referred to
as compressible (or approximately sparse), if it can be approximated satisfac-
torily by a sparse vector. Typical examples of sparse, compressible, and dense
(i.e., not sparse) vectors are shown in Fig. 1 to illustrate the differences between
them. In passing, it is worth noting that there also exist generalizations of the
concept of a vector norm to that of a matrix norm. Indicatively, the Frobenius norm (given as the square root of the sum of squares of the elements of a matrix) and the nuclear norm (given as the sum of the singular values of a matrix) can be construed as generalizations of the ℓ2- and the ℓ1-norms, respectively, to account for matrices (e.g., [47,48]).
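For illustration, the norm definitions of Eqs. (2) and (3) can be evaluated numerically; the minimal sketch below uses NumPy, and the example vectors are arbitrary choices rather than ones from the text:

```python
import numpy as np

def lp_norm(x, p):
    # l_p norm of Eq. (2): (sum_i |x_i|^p)^(1/p), valid for 0 < p < infinity
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

def l0_norm(x):
    # l_0 "norm" of Eq. (3): the number of nonzero components (cardinality)
    return np.count_nonzero(x)

x_sparse = np.zeros(100)
x_sparse[[3, 40, 77]] = [1.5, -2.0, 0.7]   # a 3-sparse vector (k = 3)
x_dense = np.ones(100)                     # a dense (i.e., not sparse) vector

print(l0_norm(x_sparse))      # 3
print(l0_norm(x_dense))       # 100
print(lp_norm(x_sparse, 1))   # l_1 norm: 1.5 + 2.0 + 0.7 = 4.2
```

Note that for p < 1 the quantity in Eq. (2) fails the triangle inequality, consistent with the remark above that ‖·‖0 is not a proper norm.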
Fig. 1. Typical examples of sparse, compressible and dense (i.e., not sparse) vectors.
Next, motivated by the aforementioned sparsity, inherent in a wide range
of signals in various applications, it is natural to pose the question whether
it is possible to bypass the potentially cumbersome task of recording the sig-
nal at a rate dictated by the SN theorem. In this manner, acquisition of the
signal directly in compressed form by employing a sub-Nyquist rate would cir-
cumvent the computationally costly two-step process of capturing and storing
the complete signal first, and then compressing it by discarding the redundant
information. For tutorial effectiveness, consider the continuous-time signal
$$y(t) = 10\cos(30t + \pi) + 12\cos\!\left(20t + \frac{\pi}{8}\right) + 4\cos\!\left(6t + \frac{2\pi}{\cdot}\right) \qquad (4)$$
where 0 ≤ t < 2π. According to the SN theorem [2,7], exact reconstruction requires the signal to be sampled at a minimum of 60 points in the time domain.
However, the signal of Eq. (4) can be represented exactly in the frequency do-
main by 6 coefficients only, i.e., 2 for each dimension of the real and imaginary
components of the harmonics at 6, 20 and 30 rad/s. Clearly, in this case,
the frequency domain representation of the signal can be construed as signif-
icantly more “compact” (or, in other words, sparser) than the corresponding
representation in the time domain. Considering this seemingly inefficient signal
representation in one domain (where typically measurements are acquired) and
the considerably more compact representation in a different one, it would be
advantageous to develop a methodology for reconstructing (exactly or approx-
imately) signals that are known to have a sparse representation in some basis
by collecting the smallest possible number of measurements.
This problem takes the form of an underdetermined linear system of equations, i.e.,
$$y = A x_0 \qquad (5)$$
where y is a vector containing m < n measurements of the original signal, x0 is the coefficient vector, and A can be either fixed, or written as A = ΦD, with D being the basis matrix and Φ an m × n matrix (also known as the CS matrix [49], as it randomly deletes rows of D). In passing, the original complete signal is denoted by y0 and is given as y0 = D x0. For the specific aforementioned example of Eq. (4), the y vector represents the measurements in the time domain, while D represents the Fourier basis matrix. Obviously, the underdetermined system
of Eq. (5) has either no solution, or an infinite number of solutions. Further,
since most real-life signals are corrupted with noise and are characterized by
measurement errors, the reconstruction tools should exhibit robustness in their
implementation. In fact, the reconstruction tools should also exhibit stability
in their performance and be capable of addressing even cases of almost-sparse (i.e., compressible) coefficient vectors.
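The measurement model of Eq. (5) can be sketched in a few lines of Python; the cosine-type basis, the problem sizes, and the random sampling pattern below are illustrative assumptions rather than the paper's specific setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 128, 24, 3                  # signal length, measurements, sparsity

# Illustrative n x n "basis" matrix D (columns play the role of basis vectors)
t = np.linspace(0, 2 * np.pi, n, endpoint=False)
D = np.cos(np.outer(t, np.arange(n)))

# k-sparse coefficient vector x0 and complete signal y0 = D x0
x0 = np.zeros(n)
x0[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
y0 = D @ x0

# Phi randomly selects m of the n rows, so A = Phi D and y = A x0
rows = rng.choice(n, size=m, replace=False)
A = D[rows, :]
y = y0[rows]

print(A.shape)   # (24, 128): an underdetermined system y = A x0
```

The remainder of this section concerns how, and under what conditions, x0 can be recovered from such a system.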
2.2. Regularization of the underdetermined system of equations
It has been shown that although the system of Eq. (5) admits, in general, an infinite number of solutions, it can be regularized (i.e., constrained) so that
only one solution is relevant (e.g., [4]). Specifically, it can take the form of a
constrained optimization problem for which the objective function relates to the sparsity of the signal (expressed via the ℓ0-norm), and the feasible solutions are only the ones satisfying Eq. (5), which acts as the constraint. This can be written as
$$\min_{x} \|x\|_0 \quad \text{subject to} \quad y = Ax \qquad (6)$$
Further, it has been proved [6] that, utilizing m ≥ 2k measurements, Eq. (6) yields a unique solution, equal to the k-sparse coefficient vector x0. This result
defines a measurement bound (see also Sections 2.4-2.7 for more general results
and discussion) dictating the required number of measurements for a certain
sparsity degree of x0, and is based on the necessary and sufficient condition
(e.g., [6])
$$2k \le \text{krank}(A) \qquad (7)$$
In Eq. (7), krank(A) is the largest number s for which every subset of s column vectors of A is linearly independent. The bound m ≥ 2k can be easily obtained by utilizing the condition of Eq. (7) in conjunction with the inequality
$$\text{krank}(A) \le m \qquad (8)$$
that holds for any arbitrary matrix [6]. Note that krank(A) is NP-hard (NP
standing for nondeterministic polynomial time) to compute; i.e., even for relatively small matrices A there is no available algorithm for computing it efficiently [6]. Therefore, it is often impossible to verify numerically the condition
of Eq. (7). Alternatively, the mutual coherence of A is easier to evaluate from an algorithmic point of view. This is defined as
$$\mu(A) = \max_{i \neq j} \frac{|\langle a_i, a_j \rangle|}{\|a_i\|_2 \, \|a_j\|_2} \qquad (9)$$
and can be used to determine whether the ℓ0-norm minimization problem of Eq. (6) has a unique solution coinciding with x0. In Eq. (9), ai, i ∈ {1, . . . , n}, represents the i-th column vector of A, and ⟨·, ·⟩ denotes an inner product.
Thus, µ(A) can be construed as a metric of the degree of independence of the basis vectors of A. In this regard, considering also that krank(A) ≥ 1/µ(A) [6], satisfying the condition
$$k < \frac{1}{2}\left(1 + \frac{1}{\mu(A)}\right) \qquad (10)$$
is sufficient for obtaining x0 as the unique solution of the ℓ0-norm minimization problem of Eq. (6) with m ≥ 2k [50]. It becomes clear that matrices A
with lower mutual coherence values µ(A) are better choices within the context of CS and sparse approximations. More importantly, it has been shown
that matrices A, satisfying conditions such as those of Eqs. (7) and (10), can
be constructed by random submatrices of bounded orthonormal systems (e.g.,
Fourier, wavelet and Legendre bases with randomly deleted rows [4]). Indicatively, regarding the ℓ0-norm minimization problem of Eq. (6), any signal with a k-sparse Fourier coefficient vector can be exactly reconstructed by 2k randomly selected measurements in the time domain. Thus, the signal of Eq. (4) can be reconstructed by utilizing 12 measurements only, as compared to the 60 measurements required by the SN theorem [4].
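The mutual coherence of Eq. (9) is straightforward to compute in practice; a minimal sketch (with an arbitrary random test matrix as an assumption) is:

```python
import numpy as np

def mutual_coherence(A):
    # Eq. (9): largest absolute inner product between distinct columns of A,
    # after normalizing each column to unit l_2 norm
    An = A / np.linalg.norm(A, axis=0)
    G = np.abs(An.T @ An)          # Gram matrix of the normalized columns
    np.fill_diagonal(G, 0.0)       # ignore the trivial <a_i, a_i> terms
    return G.max()

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 50))  # illustrative random matrix
mu = mutual_coherence(A)
print(mu)   # a value in (0, 1); lower mu(A) is preferable for sparse recovery
```

An orthonormal matrix attains µ = 0, whereas a matrix with two collinear columns attains µ = 1, the two extremes of the coherence scale.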
Further, as mentioned in Section 2.1, measurement errors and noise are often
unavoidable in practice, while the exact sparsity degree of the coefficient vector
may not be known a priori. Clearly, this motivates the reformulation of the
ℓ0-norm minimization problem of Eq. (6) to account for measurement errors
and for approximately sparse coefficient vectors. In this regard, Eq. (6) is cast
in the form
$$\min_{x} \|x\|_0 \quad \text{subject to} \quad \|y - Ax\|_2 \le \epsilon \qquad (11)$$
where ε is a pre-specified discrepancy from the measurement vector y. It can be
readily seen that the equality constraint of Eq. (6) is replaced in Eq. (11) by an
inequality constraint; thus, enlarging the set of feasible solutions and enhancing
the flexibility and robustness of the technique from a practical implementation
perspective. Notably, theoretical results referring to necessary and sufficient
conditions for sparse signal reconstruction, similar to the ones described by
Eqs. (7) and (10), are also available for the formulation of Eq. (11) [5]. In
Sections 2.4-2.5, approximate solution techniques are presented for the ℓ0-norm minimization problem of Eq. (11), which is also referred to in the literature as the “noisy” ℓ0-norm minimization problem.
2.3. A brute-force solution approach for the ℓ0-norm minimization problem
A rather brute-force approach for solving the ℓ0-norm minimization problem
of Eq. (6) relates to employing an exhaustive search by considering all possible
supports (i.e., all possible combinations of the nonzero components locations)
for the estimated vector xand by checking if a solution to y=Ax exists for
each and every support. In this regard, Fig. 2 shows the reconstructed Fourier coefficient vector for the signal of Eq. (4) by solving the ℓ0-norm minimization problem of Eq. (6) via exhaustive search. Further, in Fig. 3 the original and
the reconstructed signals in the time domain are plotted. It is seen that the
reconstructed signal by utilizing only 12 measurements (i.e., 20% of the number
required by the SN theorem) matches perfectly that of Eq. (4).
Fig. 2. Frequency domain representation of the original (blue circles) and the reconstructed
(green dots) signals of Eq. (4).
Fig. 3. Time domain representation of the original (blue line) and the reconstructed (green circles) signals of Eq. (4); the set of randomly sampled measurements (red squares) is also shown.
Although it becomes clear, based on the simple example of Eq. (4), that a
CS-based signal reconstruction is advantageous due to the considerably smaller
number of required measurements, a brute-force solution approach by exhaustive
search becomes computationally prohibitive for an increasing size of A. This
is readily understood by considering the number of supports to be examined
in an exhaustive search solution framework, which is equal to the binomial coefficient C(n, k). In fact, it
has been shown that solving the ℓ0-norm minimization problem of Eq. (6) (or, alternatively, Eq. (11)) is NP-hard [6,51].
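The exhaustive search described above can be sketched as follows; the problem sizes are deliberately tiny assumptions, since the number of candidate supports grows combinatorially as C(n, k):

```python
import numpy as np
from itertools import combinations

def l0_exhaustive(A, y, k, tol=1e-8):
    # Try every support of size <= k; accept the first one whose restricted
    # least-squares fit reproduces y (up to a numerical tolerance).
    n = A.shape[1]
    for s in range(1, k + 1):
        for support in combinations(range(n), s):
            As = A[:, list(support)]
            c, *_ = np.linalg.lstsq(As, y, rcond=None)
            if np.linalg.norm(y - As @ c) < tol:
                x = np.zeros(n)
                x[list(support)] = c
                return x
    return None   # no sufficiently sparse solution found

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 20))   # m = 8 measurements, n = 20 unknowns
x0 = np.zeros(20)
x0[[4, 11]] = [1.0, -2.5]          # 2-sparse ground truth (so m >= 2k holds)
x_hat = l0_exhaustive(A, A @ x0, k=2)
print(np.allclose(x_hat, x0))
```

For this well-posed example the search recovers the 2-sparse vector exactly from 8 measurements; the C(n, k) cost, however, is exactly what motivates the methods of Sections 2.4 and 2.5.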
To address this challenge, two main categories of solution approaches are presented and discussed in the following sections. Specifically, in Section 2.4, methods for solving approximately the ℓ0-norm minimization problems of Eqs. (6) and (11) are described, and in Section 2.5 methods for solving a relaxed form of the ℓ0-norm minimization problem are presented. In passing, it is worth noting
that, as mentioned in Section 1, a wide range of application areas have bene-
fited from the advent of CS-based theoretical concepts and related numerical
solution schemes. This is anticipated considering the fact that various seem-
ingly unrelated problems originating from different disciplines can be cast as
underdetermined systems of equations of the form of Eq. (5). In this regard,
the solution approaches discussed in the ensuing sections are quite general and
have been adopted in a rather straightforward manner by diverse research fields,
including the field of engineering mechanics, which is the focus of this paper.
2.4. Approximate solutions to the ℓ0-norm minimization problem: Greedy methods
An approximate approach for solving the ℓ0-norm problem of Eq. (6) (or
Eq. (11)) relates to, first, determining the support of the coefficient vector (i.e.,
the optimal locations of the nonzero coefficients) and, second, estimating the
values of the nonzero coefficients. In this regard, aiming to obtain a globally
optimal solution, greedy methods make a sequence of decisions based on certain
local optimality conditions. More specifically, greedy methods can be broadly
divided into two categories [3]. The first relates to greedy pursuits, which construct the support of an initially empty coefficient vector x in an iterative manner. This is done by adding in each iteration a column vector of matrix A that best fits the measurements y according to a pre-specified criterion, and subsequently, by estimating the coefficients of the selected support. The second relates to thresholding-type methods, which iteratively obtain an estimate x and employ “hard” thresholding to retain only the largest k coefficients. Concisely, the main difference between these two categories is that greedy pursuits construct the solution x gradually by adding nonzero coefficients to it, whereas thresholding methods provide an estimate for the solution in each iteration cycle, and subsequently, remove the relatively small coefficients as dictated by the prescribed threshold.
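A minimal sketch of the thresholding idea, in the spirit of iterative hard thresholding [54], is given below; the step size, problem sizes, and test signal are illustrative assumptions rather than a prescribed configuration:

```python
import numpy as np

def hard_threshold(x, k):
    # Keep the k largest-magnitude entries of x and zero out the rest
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def iht(A, y, k, step, n_iter=1000):
    # Gradient step on ||y - Ax||_2^2 followed by hard thresholding
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = hard_threshold(x + step * (A.T @ (y - A @ x)), k)
    return x

rng = np.random.default_rng(3)
m, n, k = 60, 100, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)   # entries ~ N(0, 1/m)
x0 = np.zeros(n)
x0[[2, 30, 55]] = [1.5, -2.0, 1.0]             # 3-sparse ground truth
step = 1.0 / np.linalg.norm(A, 2) ** 2         # conservative step size
x_hat = iht(A, A @ x0, k, step)
print(np.linalg.norm(x_hat - x0))              # recovery error
```

For this noiseless, well-conditioned example the iterates typically converge to the true 3-sparse vector; in general, convergence guarantees require conditions on A of the kind discussed in Section 2.5.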
Greedy methods have been developed in conjunction with diverse applica-
tions in various fields such as signal processing and statistics. In this regard, a
vast number of algorithms with similar characteristics is available in the literature, while multiple extensions and modifications have been proposed over the
years [3,6]. Typical examples of greedy pursuits include the matching pursuit
(MP) [52] and the orthogonal matching pursuit (OMP) [53] algorithms, whereas
indicative popular thresholding methods include the iterative hard threshold-
ing [54] and the compressive sampling matching pursuit [55] algorithms. In
general, it can be argued that greedy methods are attractive due to the fact
that their numerical implementation is rather straightforward, and that the as-
sociated computational cost is kept at a reasonable level under the condition
that the sparsity degree kis relatively low. Further, in many cases theoretical
performance guarantees are available (e.g., [3]), while empirical studies can be
performed to analyze and compare the performance of different algorithms (see
also Section 2.7). The interested reader is also directed to [3,5] and references
therein for more details.
In the following, the OMP algorithm, which is one of the most widely utilized
greedy methods, is presented in more detail [4]. The input to OMP is the m-length measurement vector y, the m × n matrix A, and the sparsity degree k that the coefficient vector x0 is anticipated to exhibit. At the j = 0 iteration, the algorithm starts with an empty support S(0), a zero vector x(0), and thus, with a residual vector equal to
$$r^{(0)} = y - A x^{(0)} = y \qquad (12)$$
Next, the column vector of matrix A with the maximum correlation with the residual r(0) is selected. In this manner, the corresponding location in the support of x(1) becomes active. Subsequently, for the selected support S(1), the value of the nonzero component is obtained by employing ordinary least squares and minimizing ‖y − Ax‖2. The above scheme is repeated in an iterative
manner until convergence. The convergence criterion may relate, for instance, to the quantity ‖y − Ax‖2 becoming smaller than a prescribed threshold, or to the number of nonzero elements in x exceeding a given value. It is noted that, at every iteration of the algorithm, the vector x is updated as an orthogonal projection of y onto the subspace of the selected vectors from A. The fact that all the components of x may change at a given iteration constitutes the most pronounced difference with the related MP algorithm. Specifically, at the j-th iteration, the MP algorithm adds a single component in x according to the criterion of maximum reduction of the residual r(j+1), without affecting the rest of the vector elements.
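The OMP iteration described above can be sketched as follows; the problem sizes and the test signal are illustrative assumptions:

```python
import numpy as np

def omp(A, y, k):
    # Orthogonal matching pursuit: grow the support one column at a time and
    # re-estimate all active coefficients by least squares at each iteration
    n = A.shape[1]
    support = []
    x = np.zeros(n)
    r = y.copy()                                   # r^(0) = y, cf. Eq. (12)
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ r)))        # max correlation with residual
        support.append(j)
        As = A[:, support]
        c, *_ = np.linalg.lstsq(As, y, rcond=None) # orthogonal projection step
        x = np.zeros(n)
        x[support] = c
        r = y - A @ x                              # updated residual
    return x

rng = np.random.default_rng(4)
A = rng.standard_normal((60, 100)) / np.sqrt(60)
x0 = np.zeros(100)
x0[[5, 22, 60]] = [3.0, -2.5, 2.8]                 # 3-sparse ground truth
x_hat = omp(A, A @ x0, k=3)
print(np.linalg.norm(x_hat - x0))                  # recovery error
```

Because all active coefficients are re-estimated at every iteration, the residual stays orthogonal to the selected columns; this is precisely the orthogonal projection property distinguishing OMP from MP.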
Further, indicative OMP enhancements include efficient implementation schemes for performing the matrix-vector multiplications Aᵀr(j) to obtain correlation degree estimates at each iteration j (e.g., based on the fast Fourier transform [5]),
reduction of the least squares related computational cost (e.g., based on QR
factorization [56]), approaches for adding not only one, but multiple elements
to the support at each iteration (e.g., Stagewise OMP [57]), and various alter-
native strategies for selecting the next location to be added to the support S
(e.g., least squares OMP [5]).
Also, it is worth noting that the aforementioned algorithms (and OMP in
particular), which aim at solving the ℓ0-norm minimization problem directly, perform satisfactorily when the sparsity k of the coefficient vector is small compared to its size n. However, they can become computationally prohibitive in
cases, for example, of high-dimensional problems exhibiting low sparsity degrees.
2.5. Relaxation of the ℓ0-norm minimization problem: ℓ1-norm minimization
As highlighted in Section 2.3, the ℓ0-norm minimization problem in Eq. (6) is NP-hard to solve. To address this challenge, a popular solution approach relates to utilizing a surrogate for approximating the non-convex objective function. To
this aim, convex surrogates appear particularly attractive, primarily due to the
plethora of well-established theoretical tools and efficient numerical algorithms
for analyzing and solving convex optimization problems.
Fig. 4. Two-dimensional representation of various norms restricted in the domain [−1, 1] × [−1, 1]: (a) ℓ0-norm; (b) ℓ1/2-norm; (c) ℓ1-norm; (d) ℓ2-norm.
In this regard, there are several properties rendering the ℓ1-norm an efficacious convex surrogate for the non-convex ℓ0-norm. Specifically, focusing on the hypercube B defined as B = {x | ‖x‖∞ ≤ 1} (i.e., each dimension of x is restricted within the interval [−1, 1]), the ℓ1-norm is the largest convex function not exceeding ‖·‖0; that is, ‖x‖1 = sup{h(x) | h is convex and h(x) ≤ ‖x‖0 ∀x ∈ B}. This property can be proved formally (e.g., [4]) and such a function is typically referred to as the convex envelope. In Fig. 4, the two-dimensional case is depicted for tutorial effectiveness, where it can be observed
that the ℓ1-norm (Fig. 4c) is indeed the convex envelope of the ℓ0-norm (Fig. 4a). Further, it is seen that the ℓ1/2-norm (Fig. 4b) is non-convex (this holds for all ℓp-norms with 0 < p < 1), whereas the ℓ2-norm (Fig. 4d) is not the largest convex underestimator of the ℓ0-norm; the latter holds true for all ℓp-norms with p > 1 (e.g., [5]). Next, utilizing the ℓ1-norm surrogate, the optimization problem in Eq. (6) is replaced by
$$\min_{x} \|x\|_1 \quad \text{subject to} \quad y = Ax \qquad (13)$$
which is a convex optimization problem that can be solved much more efficiently
than the one in Eq. (6) [3,6]. Eq. (13) is also referred to in the literature as the convex relaxation of the ℓ0-norm minimization problem.
Fig. 5. Graphical demonstration of the ℓ1-norm property of recovering sparse vectors: sparse vector x0, feasible set {x0} + null(A) and ℓ1-ball of radius ‖x0‖1.
A graphical demonstration of the property of the ℓ1-norm minimization to recover sparse vectors is provided in Fig. 5. Specifically, the set of feasible vectors x related to the optimization problem in Eq. (13) forms the affine subspace
$$S = \{x \mid Ax = y\} = \{x_0\} + \text{null}(A) \qquad (14)$$
where null(A) denotes the nullspace of matrix A. The ℓ1-norm minimization determines the point in S with the smallest ℓ1-norm. This procedure can be illustrated by considering the ℓ1-ball
$$B_1 = \{x \in \mathbb{R}^n \mid \|x\|_1 \le 1\} \qquad (15)$$
of radius one in Rⁿ, which contains all vectors x with objective function in Eq. (13) at most equal to one. Scaling B1 by s yields the set of vectors x with objective function at most equal to s. In this regard, initializing the process by setting the value of s equal to zero, B1 is expanded gradually by increasing s. The ℓ1-norm minimizer is obtained when sB1 first reaches the affine subspace S, and this intersection point x0 is the solution to the optimization problem in Eq. (13). Considering the geometry of the ball B1, it is readily seen that the possible solution points belong either to the vertices or to the edges of B1, which correspond, indeed, to sparse vectors.
Motivated by early observations in the 1960s [12] related to its efficiency in recovering sparse vectors, ℓ1-norm minimization has been employed over the past few decades in various applications based on rather heuristic arguments. In fact, it was not until the first decade of the twenty-first century that conditions guaranteeing sparse recovery were formally stated and proved. Specifically, it was shown in [50,58] that the condition of Eq. (10) guarantees that the ℓ1-norm minimization problem in Eq. (13) has a unique solution, provided that the columns of A have unit ℓ2-norms. Moreover, this solution coincides with the unique solution of the ℓ0-norm minimization problem in Eq. (6). Notably, it was shown in [59] that the mutual coherence of any matrix A ∈ Rᵐˣⁿ with unit ℓ2-norm columns is bounded by
$$\mu(A) \ge \sqrt{\frac{n - m}{m(n - 1)}} \qquad (16)$$
Eq. (16) is referred to in the literature as Welch's bound (e.g., [5]). Taking into account Eqs. (10) and (16) yields a bound estimate of the form m ≥ Ck² on the number m of measurements required for the recovery of a k-sparse vector via ℓ1-norm minimization; C denotes a constant factor. Note, however, that this bound on m appears quite conservative from a practical perspective, and that a significantly smaller number of measurements (than of the order of k²) can be adequate for successful sparse recovery based on ℓ1-norm minimization.
In this regard, a condition that leads to considerably “tighter” bounds on the number m of measurements guaranteeing exact ℓ1-norm recovery of any k-sparse vector relates to the restricted isometry property (RIP). According to [60], a matrix A satisfies the RIP of order k, with constant δ, if
$$(1 - \delta)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta)\|x\|_2^2, \quad \forall x \ k\text{-sparse} \qquad (17)$$
Also, the order-k RIP constant δk(A) is the smallest number δ satisfying the inequality in Eq. (17). Further, it was shown in [17] that if y = Ax0 with ‖x0‖0 = k and δ2k(A) < √2 − 1, then x0 is the unique optimal solution of the
optimization problem in Eq. (13). Next, focusing on the RIP of Gaussian random matrices, i.e., matrices A with independent N(0, 1/m) random variables as entries, it was shown (e.g., [17,61]) that a k-sparse vector can be reconstructed via ℓ1-norm minimization by employing only m ≥ C0 k log(n/k) random measurements, where C0 is a constant factor. This is a substantially tighter bound on m as compared to m ≥ Ck² derived by employing Eqs. (10) and (16). Note also that the bound m ≥ C0 k log(n/k) allows for (k, m, n) to scale proportionally [18], a result that contributed to the rapid development of the CS field. One of the tightest known bounds of this kind related to Gaussian matrices was derived in [19]. This requires only m ≥ 8k log(n/k) + 12k measurements and does not
involve any unknown constants. However, notwithstanding their considerable
theoretical value, in many cases the practical merit of such measurement bounds
is limited by the fact that the sparsity degree kof the target vector is unknown,
in general, a priori. Thus, alternative approaches are required for assessing the
performance of sparse recovery algorithms (see Section 2.7). In passing, it is
mentioned for completeness that two other relevant properties, which are ac-
tually utilized in the proof of exact `1-norm recovery under the RIP [17], are
the nullspace and the restricted strong convexity properties. Unfortunately, the
problem of verifying any of these conditions for a given matrix Ais NP-hard
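To illustrate the recovery guarantee, the equality-constrained `1-norm minimization problem can be recast as a linear program and solved with off-the-shelf software. The following sketch, with arbitrary problem sizes and with scipy's `linprog` standing in for a dedicated CS solver, recovers a k-sparse vector from m measurements of the order of k log(n/k), well below n:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min_x ||x||_1 subject to Ax = y, recast as a linear program
    through the standard split x = u - v with u, v >= 0."""
    m, n = A.shape
    c = np.ones(2 * n)   # objective: sum(u) + sum(v) = ||x||_1
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y, bounds=(0, None))
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(0)
n, k, m = 100, 5, 50                           # m of the order of k*log(n/k)
A = rng.standard_normal((m, n)) / np.sqrt(m)   # i.i.d. N(0, 1/m) entries
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x0

x_hat = basis_pursuit(A, y)
print(np.linalg.norm(x_hat - x0))   # near zero: exact recovery up to solver tolerance
```

The choice m = 50 is comfortably inside the regime where `1-norm recovery succeeds with high probability for this (n, k); reducing m pushes the problem toward the failure region discussed in Section 2.7.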
2.6. Approximate solutions to the `1-norm minimization problem: Convex optimization methods
The `1-norm minimization problem of Eq. (13), also known as basis pursuit,
can be cast in the form

min_x ‖x‖₁ subject to ‖y − Ax‖₂ ≤ ε    (18)

to account for measurement error, as also explained in Section 2.1. To facilitate
its numerical solution, Eq. (18) can be equivalently written as an unconstrained
problem in the form

min_x (1/2) ‖y − Ax‖₂² + λ ‖x‖₁    (19)

The problem of Eq. (19) is also referred to in the literature as basis pursuit
denoising (BPDN) [63], or least absolute shrinkage and selection operator
(LASSO) [64].
Two main challenges to be addressed for solving, in a computationally effi-
cient manner, the convex optimization problem of Eq. (19) relate to scalability
and non-differentiability. Second-order convex optimization algorithms, such as
interior-point methods or quasi-Newton schemes [65], are advantageous in the
sense that they require typically relatively few iterations to converge. Note,
however, that for an n-variate problem each iteration involves the solution of
an n × n linear system, incurring a computational cost of O(n³) per iteration.
In various signal processing and statistical learning applications (as well as in
many engineering mechanics applications discussed in Section 3), the number
n of variables can reach the order of millions, thus rendering the computational
cost of even a single iteration prohibitively large. Therefore, attention
has been directed to alternative algorithms, which utilize only first-order infor-
mation about the objective function.
A standard first-order method in convex optimization is the gradient descent
method, which in its basic form and under certain smoothness conditions exhibits
a convergence rate of O(1/j), where j is the iteration number. Nevertheless,
the objective function in Eq. (19) involves the non-differentiable `1-norm.
To address this challenge related to the evaluation of the gradient, subgradient
methodologies can be employed; however, these are typically characterized by a
relatively poor convergence rate of the order O(1/√j) [6].
Further, the proximal gradient (PG) method (e.g., [66]) exhibits considerable
efficiency in solving optimization problems where the objective function consists
of the sum of a smooth convex function f(x) (with ∇f being Lipschitz continuous;
see [67] for a definition of Lipschitz continuity) and a non-differentiable
convex function g(x), such as in Eq. (19). In this regard, the proximal operator
is defined as

prox_g(z) = arg min_x { g(x) + (1/2) ‖x − z‖₂² }    (20)

and the update formulae at iteration j take the form

z(j) = x(j) − (1/L) ∇f(x(j)),  x(j+1) = prox_{g/L}(z(j))    (21)
where L is typically equal to the Lipschitz constant of ∇f. The j-th PG iteration
in Eq. (21) can be construed as a two-step update formula, where, first, an
ordinary gradient descent step z(j) decreasing the smooth function f is determined
and, second, the step x(j+1) is chosen in a manner that it both reduces the value
of the non-differentiable function g and remains close to z(j) via the introduction
of the term (1/2) ‖x − z‖₂² in the proximal operator of Eq. (20). The strong
convexity of ‖·‖₂² guarantees that the PG step has a unique solution [68]. Also,
compared to the subgradient method, the PG algorithm yields a convergence
rate of O(1/j), which corresponds to the standard case of no non-differentiable
terms. Clearly, although the PG method exhibits a relatively high convergence
rate, the minimization of a non-differentiable function at each iteration can be
computationally demanding. Nevertheless, the non-differentiable `1-norm function
in Eq. (19) yields a proximal operator (also known as the soft thresholding
operator) in closed form, i.e.,

[prox_{λ‖·‖₁}(z)]_i = soft(zi, λ) = sign(zi) max(|zi| − λ, 0)    (22)
This leads to the iterative soft-thresholding algorithm (ISTA) [69], which is
widely utilized in `1-norm minimization approaches. Moreover, it has been
shown that the theoretically optimal convergence rate for first-order optimiza-
tion methods is O(1/j2) and this can be achieved by Nesterov’s accelerated
gradient method (AGM) [70]. Note that the fundamental concept in AGM,
which relates to introducing a momentum term in the gradient descent up-
date formula, can be readily used in conjunction with the ISTA. This has led
to the computationally efficient fast iterative shrinkage-thresholding algorithm
(FISTA) [71] with a convergence rate of O(1/j2).
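The ISTA iteration, with the FISTA momentum step as an option, admits a compact sketch along the following lines; the penalty value, iteration count and problem sizes below are illustrative choices, not taken from the references:

```python
import numpy as np

def soft(z, lam):
    """Soft-thresholding operator of Eq. (22)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def fista(A, y, lam, n_iter=1000, accelerate=True):
    """Proximal gradient for Eq. (19): plain ISTA when accelerate=False,
    FISTA (Nesterov momentum on the iterates) when accelerate=True."""
    L = np.linalg.norm(A, 2) ** 2   # Lipschitz constant of x -> A^T (Ax - y)
    x = np.zeros(A.shape[1])
    z, t = x.copy(), 1.0
    for _ in range(n_iter):
        # Eq. (21) with g = lam * ||.||_1, whose prox is the soft threshold
        x_new = soft(z - A.T @ (A @ z - y) / L, lam / L)
        if accelerate:
            t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
            z = x_new + ((t - 1.0) / t_new) * (x_new - x)
            t = t_new
        else:
            z = x_new
        x = x_new
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100)) / np.sqrt(40)
x0 = np.zeros(100)
x0[[5, 20, 60]] = [1.5, -1.0, 2.0]
y = A @ x0
x_hat = fista(A, y, lam=0.01)
print(np.linalg.norm(x_hat - x0))   # small: the LASSO solution is close to x0
```

Setting `accelerate=False` recovers plain ISTA; the only difference is the momentum extrapolation, which raises the rate from O(1/j) to O(1/j²).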
A potential shortcoming of ISTA and FISTA relates to parameter λ, which
needs to be tuned for solving the problem in Eq. (19). The least angle regres-
sion (LARS) algorithm [21,72] constitutes an alternative approach with the
advantageous feature of computing the entire solution path directly; that is, the
solution of Eq. (19) corresponding to a given range of λ values is provided as the
output of a single run of the algorithm. LARS follows a procedure somewhat
similar to the OMP, and thus, it can be argued that its performance deterio-
rates for problems with increasing dimensionality. It is worth mentioning that
there exist various other algorithms for solving the BPDN problem in Eq. (19).
Indicatively, these include coordinate descent algorithms, which update a single
coordinate at each iteration, and primal-dual algorithms (e.g., [73]).
Alternative solution schemes, which deviate considerably from the standard
`1-norm formulation of Eq. (19) and aim at further enhancing the sparsity of
the solution, are presented and discussed in Section 2.8.
2.7. Performance analysis
Successful CS-based reconstruction of a sparse coefficient vector x0 relies
on a priori knowledge of the minimum possible number m, where m is the size
of the measurement vector y. Of course, the selected solution algorithm, the
basis matrix D and the CS matrix Φ affect the reconstruction accuracy as well.
In this regard, this section focuses on presenting both theoretical results and
practical approaches for addressing the above points related to CS performance.

The problem is typically posed in the literature as determining the minimum
number m for exact reconstruction of an arbitrary coefficient vector x0 with
sparsity degree k, given a fixed matrix A (strong case). Alternatively, the weak
case relates to determining the minimum number m for exact reconstruction of a
specific coefficient vector x0 with sparsity degree k by appropriately constructing
a matrix A; see also [74].
In this regard, the potential of combinatorial geometry has been explored for
providing precise measurement bounds (e.g., [74]; see also [75] for alternative
approaches). More specifically, consider the `1-ball of Fig. 5, which represents
a convex polytope C in R^n. A polytope generalizes the definition of a three-
dimensional polyhedron to n dimensions and is characterized by a number f0(C)
of 0-dimensional faces (i.e., vertices), a number f1(C) of 1-dimensional faces
(i.e., edges), and, in general, a number fk(C) of k-dimensional faces (e.g., [74]).
Obviously, multiplying a matrix A of size m × n by a vector of size n projects the
vector onto a lower-dimensional space, and thus, the number of k-dimensional
faces of AC can only be less than or equal to fk(C), i.e.,

fk(AC) ≤ fk(C), for k ≥ 0    (23)
Notably, it has been proved (e.g., [76]) that the ratio of face counts of the
projected polytope AC over the original polytope C is equal to the probability
of exact reconstruction of x0 by solving the `1-norm minimization problem of
Eq. (13). Also, it has been shown that, for the weak case [76,77] and for
matrices A with independent identically distributed N(0, 1) Gaussian random
entries and provided a sufficiently large number m, the fraction fk(AC)/fk(C)
approaches 1 as the problem dimension n approaches infinity. Various other
similar results have been obtained referring to, indicatively, the reconstruction
of a non-negative coefficient vector by solving a special form of Eq. (13) (e.g.,
[74]), cases of employing other than `1-norm regularization methods (e.g., [78]),
cases of finite n values (e.g., [79]), as well as the strong reconstruction case for
which the condition fk(AC) = fk(C) must hold.
However, notwithstanding empirical evidence indicating that exact
reconstruction with a similarly small number m of measurements is possible (e.g.,
[77]), theoretical results on precise measurement bounds for cases of non-Gaussian
matrices A have been scarce; see also [75]. In this regard, an alternative
approach relates to associating the reconstruction performance with certain
properties of matrix A, such as krank(A), µ(A), the nullspace property, and the
RIP; see also Sections 2.2 and 2.5. Concisely, exact reconstruction of the coefficient
vector x0 is guaranteed (at least with high probability; see, for instance, [60]) if a
condition, such as Eq. (17), pertaining to the sparsity degree k and to a property
of A is satisfied. In this context, it is noted that there has been extensive
research during the past decade (e.g., [4]) on identifying properties of A with
direct relation to the performance of the reconstruction problem. Of course,
verifying such conditions for a given matrix A, or constructing a matrix A
adhering to prescribed properties, are nontrivial challenges. For example, Eq. (7)
represents a necessary and sufficient condition guaranteeing exact reconstruction
of x0 by solving either the `0-norm or the `1-norm minimization problems
of Eqs. (6) and (13), respectively; however, it is NP-hard to verify [62]. On the
other hand, it is rather straightforward to construct matrices A with low mutual
coherence µ(A) dictated by the sufficient condition of Eq. (10) (e.g., [4,58]);
however, the measurement bound obtained is rather conservative (referred to as
“pessimistic” in the CS literature [74]). Further, the nullspace property
guaranteeing exact reconstruction of x0 by solving the `1-norm minimization problem
of Eq. (13) is also NP-hard to verify [4]. Regarding matrices A satisfying the
RIP of Eq. (17), these can be constructed by employing, for instance, random
submatrices of bounded orthonormal systems [4], and thus, the RIP has been used
in various practical problems. Nevertheless, the RIP-based measurement vector
size m is also a pessimistic bound; see also [60,74] for a discussion and [80] for
related improvements.
Although the aforementioned results and conditions are characterized by
theoretical rigor and have been catalytic for the advancement of CS, alternative
rather empirical approaches are necessary for addressing more general cases. In-
dicatively, these include the tasks of tuning a certain algorithm (i.e., selecting an
optimal set of parameters; e.g., [81]) and comparing performances of different
reconstruction algorithms [82], as well as cases of coefficient vectors exhibit-
ing structured sparsity (see Section 2.8.3); and thus, alternative algorithms are
required for exploiting this additional information [83]. In this regard, empir-
ical measurement bounds are often constructed in practice in the form of a
phase diagram, i.e., a diagram depicting the transition from accurate recovery
to recovery with significant error. In fact, such a diagram not only provides a
required number of measurements as a function of problem size n and sparsity
degree k, but also illustrates the behavior of the reconstruction error with varying
values of m and k. Of particular importance to applications is the width of
the transition zone from accurate to inaccurate reconstruction, which has been
shown to be sharper for increasing values of n (e.g., [78]). It is worth noting
that phase diagrams, as a tool for assessing the performance of CS methodolo-
gies, appear versatile in addressing a wide range of diverse problems in a rather
straightforward manner.
Phase diagrams are typically constructed with the aid of synthetic data.
Specifically, for a fixed coefficient vector length n, synthetic vectors x0 are
constructed randomly with varying sparsity degree values k, and reconstruction
is attempted based on measurement vectors y of varying sizes m (see [74] for
more details). The above procedure is applied for every possible combination of
(m, k), with successful reconstruction indicated when the error associated with
the estimate x, i.e.,

err = ‖x − x0‖₂    (24)

is below a certain threshold (e.g., err < 10⁻⁵). Finally, the mean reconstruction
success rate is plotted for each and every pair of m/n (undersampling ratio
or degree of underdeterminacy) and k/m (sparsity ratio). In Fig. 6, indicative
results are plotted for various values of m/n and k/m, with n = 100 and A being
the Fourier basis matrix with randomly deleted rows. The mean success rate
has been evaluated based on 200 reconstruction runs by employing the standard
basis pursuit SPGL1 algorithm [84]. The region corresponding to reconstruction
error less than 10⁻⁵ with high probability is shown with blue color, whereas
yellow indicates the region corresponding to inaccurate reconstruction with high
probability. The transition zone lies in between. To provide an illustrative
example, it is seen that for a k-sparse coefficient vector x0 with n = 100 and
k = 20, m = 30 measurements are adequate to yield successful reconstruction
with high probability.
Fig. 6. Phase diagram corresponding to random sub-matrices of a Fourier basis matrix and
to reconstruction using the SPGL1 algorithm. The brightness of each point represents the
observed success rate, ranging from certain failure (yellow) to certain success (blue). The
z-axis corresponds to the average success rate over 200 runs; the x-axis corresponds to the
ratio showing the degree of the problem underdeterminacy (m/n), whereas the y-axis
corresponds to the ratio showing the sparsity degree of the coefficient vector (k/m).
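A phase diagram of the above kind can be sketched along the following lines. For brevity, a small problem size, a coarse (m, k) grid and few repetitions are used, and a simple linear-programming basis pursuit solver replaces SPGL1; all numerical values are illustrative:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    # min ||x||_1 subject to Ax = y, via the LP split x = u - v with u, v >= 0
    n = A.shape[1]
    res = linprog(np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=y, bounds=(0, None))
    return res.x[:n] - res.x[n:]

def success_rate(n, m, k, runs=5, tol=1e-5, seed=0):
    """Fraction of random k-sparse vectors exactly recovered from m Gaussian measurements."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(runs):
        A = rng.standard_normal((m, n)) / np.sqrt(m)
        x0 = np.zeros(n)
        x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
        hits += np.linalg.norm(basis_pursuit(A, A @ x0) - x0) < tol
    return hits / runs

# coarse grid over the undersampling ratio m/n and the sparsity ratio k/m
n = 40
for m_over_n in (0.25, 0.5, 0.75):
    m = int(m_over_n * n)
    for k_over_m in (0.1, 0.3, 0.5):
        k = max(1, int(k_over_m * m))
        print(f"m/n={m/n:.2f}  k/m={k/m:.2f}  success={success_rate(n, m, k):.2f}")
```

Refining the grid and increasing the number of runs reproduces the familiar two-region picture of Fig. 6, with success rates near 1 for small k/m and large m/n, near 0 in the opposite corner, and a transition zone in between.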
2.8. Enhancing sparsity and exploiting additional information in the data
In this section, attention is directed to currently emerging tools and tech-
niques for enhancing solution sparsity and for exploiting additional information
in the data. These include alternative to `1-norm minimization formulations
and iterative re-weighting solution schemes, Bayesian approaches, as well as
structured sparsity and dictionary learning strategies.
2.8.1. Alternative to `1-norm minimization formulations and iterative re-weighting
solution schemes
As discussed in Sections 2.2-2.3, although the `0-norm formulation of Eq. (6)
leads to sparse coefficient vectors based on a minimal number of measurements,
there is no known algorithm for solving it efficiently. In this regard, although
convex `1-norm relaxations of the `0-norm problem have been proposed to ad-
dress this challenge (see Sections 2.5-2.6), it has been shown that alternative,
mostly non-convex, proxies of `0-norm exhibit enhanced sparsity-promoting be-
havior in comparison to `1-norm. Indicatively, the difference of the convex `1-
and `2-norms has been considered in [85,86], leading to an overall non-convex
Lipschitz-continuous metric denoted as the `1−2-norm. The related minimization
problem can be solved, for instance, by the difference of convex functions algo-
rithm [87].
An alternative, more general, formulation of the sparse vector recovery problem
relates to replacing the `0-norm minimization criterion by an `p-norm criterion
as

min_x ‖x‖p subject to y = Ax    (25)

where 0 < p ≤ 1, and to employing an iterative re-weighting solution scheme.
In fact, although the formulation in Eq. (25) is non-convex (for p ≠ 1), it can
be solved efficiently by iteratively minimizing a convex function, such as the
re-weighted `2-norm. In this context, the focal underdetermined system solver
(FOCUSS) [88] has been one of the first such research efforts, followed by a
number of relevant contributions [89–91] addressing the problem of Eq. (25) in
conjunction with iterative re-weighting solution schemes. Further, important
theoretical results, similar to the RIP (see Section 2.5), have been established
[92–95] providing conditions guaranteeing equivalence between Eqs. (25) and (6).
It is worth mentioning that the iteratively-reweighted-least-squares (IRLS)
method, initially introduced for robust statistical estimation applications [96,
97], has also received significant attention with the advent of CS [4,98]. The
IRLS solves a least squares problem iteratively, considering ‖x‖₁ = xᵀX⁻¹x
with X = diag(|x|). In other words, IRLS re-weights the `2-norm iteratively to
approximate an `1-norm minimization function. In a similar manner, FOCUSS
re-weights the `2-norm iteratively to approximate the `p-norm minimization
function of Eq. (25) with 0 < p < 1. Note that FOCUSS has been widely utilized
in early studies of the dictionary learning problem as well (see Section 2.8.4).
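The IRLS iteration described above admits a compact implementation: with X = diag(|x|), the minimizer of xᵀX⁻¹x subject to Ax = y is available in closed form as x = X Aᵀ(A X Aᵀ)⁻¹ y, and X is re-evaluated at each step. A minimal sketch follows (`irls_l1` is an illustrative name, and the regularization value and iteration count are ad hoc choices):

```python
import numpy as np

def irls_l1(A, y, n_iter=30, eps=1e-6):
    """IRLS sketch for min ||x||_1 s.t. Ax = y: a sequence of weighted
    least-norm problems with X = diag(|x|), cf. ||x||_1 = x^T X^{-1} x."""
    x = A.T @ np.linalg.solve(A @ A.T, y)   # start from the min l2-norm solution
    for _ in range(n_iter):
        X = np.diag(np.abs(x) + eps)        # eps guards against division by zero
        # closed-form minimizer of x^T X^{-1} x subject to Ax = y
        x = X @ A.T @ np.linalg.solve(A @ X @ A.T, y)
    return x

rng = np.random.default_rng(0)
n, m, k = 60, 30, 4
A = rng.standard_normal((m, n)) / np.sqrt(m)
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x0

x_hat = irls_l1(A, y)
print(np.linalg.norm(x_hat - x0))   # small: iterates concentrate on the true support
```

Replacing the fixed eps by a schedule that decreases it across iterations yields the ε-regularized variant discussed below.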
In the following, attention is directed to a solution scheme initially proposed
in [99], which re-weights the `1-norm iteratively to approximate an `0-norm
minimization function. This solution scheme, referred to as IR`1 in the ensuing
analysis, aims at enhancing the sparsity exhibited by the `1-norm formulation
while preserving convexity. The rationale relates to minimizing the influence
of the nonzero coefficients magnitude, similarly to the `0-norm. In this regard,
a number of positive weights w1, . . . , wn are introduced, and the “weighted”
`1-norm minimization problem is formulated as

min_x ‖Wx‖₁ subject to y = Ax    (26)

where W = diag(w1, . . . , wn). It is noted that the solution of the problem in
Eq. (26) does not coincide, in general, with the solution of the original problem
in Eq. (13). In fact, the weights wi can be construed as parameters to be
appropriately selected for improving the reconstruction performance. Since the
weights wi are introduced to counteract the influence of the coefficients magnitude,
it is evident that their optimal values should be inversely proportional to
the magnitudes, i.e.,

wi = 1/|x0,i|, if x0,i ≠ 0;  wi = ∞, if x0,i = 0    (27)

Taking into account Eq. (27), the problem in Eq. (26) is guaranteed to yield the
correct solution x0 under the assumption that m ≥ k [99].
Fig. 7. Weighted `1-norm minimization for improved sparse signal recovery. (a) Sparse
vector x0, feasible set {x0} + null(A), and `1-ball of radius ‖x0‖₁. (b) There exists a vector
x̄ ≠ x0 with ‖x̄‖₁ < ‖x0‖₁. (c) Weighted `1-ball; there is no x̄ ≠ x0 with ‖Wx̄‖₁ ≤ ‖Wx0‖₁.
Clearly, since x0 is the unknown vector to be determined, it is not possible
to select the weights according to Eq. (27). Nevertheless, large weights are used
in practice to discourage nonzero coefficients, whereas small weights are used to
encourage nonzero coefficients. To provide an illustrative example, consider the
3-dimensional problem shown in Fig. 7, where the target vector is x0 = [0, 0, 1]ᵀ,
A = [6, 2, 3], and the plane in gray color represents the affine subspace {x0} +
null(A), i.e., the set of points x ∈ R³ satisfying Ax = Ax0 = y. It is observed in
Fig. 7a that the plane y = Ax intersects with the interior of the `1-ball of radius
1 centered at the origin. Hence, the `1-norm minimization approach discussed
in Section 2.5 (see Fig. 5 and the subsequent discussion) recovers the incorrect
vector x̄ = [0.5, 0, 0]ᵀ shown in Fig. 7b. Next, considering a weighting matrix
W = diag(5, 5, 1), the weighted `1-norm minimization of Eq. (26) correctly
recovers x0 as shown in Fig. 7c, which depicts the weighted `1-ball B₁ᵂ of
radius 1, defined as B₁ᵂ = {x : ‖Wx‖₁ ≤ 1}. It is worth noting that any choice
of the weights with w1 > 2w3 and 3w2 > 2w3 in the above example yields a
sufficiently modified (weighted) `1-ball for recovering x0. The fact that there
may exist a wide range of possible candidate values for the weights w1, . . . , wn
has motivated the development of an iterative algorithm in [99]. Initially, the
weights are set equal to 1 and the formulation degenerates to Eq. (13), which
yields a first approximation x(0) of the target vector. Next, the weights are
updated according to

wi(j+1) = 1 / ( |xi(j)| + ε ),  i = 1, . . . , n    (28)

where a small value ε is introduced to avoid division by zero. This IR`1 scheme
has been shown to correctly converge to the sparse solution vector in a relatively
small number of iterations. Further, in comparison to the original `1-norm
minimization, a smaller number of measurements m is required in general [99].
Moreover, any `1-norm minimization method (see Section 2.6) can be employed
for the solution of Eq. (26).
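The weighted problem of Eq. (26) and the IR`1 scheme can be sketched as follows; the weighted basis pursuit step is solved as a linear program, and the Fig. 7 example is reproduced numerically (`weighted_bp` and `irl1` are illustrative names, and the Gaussian instance at the end uses arbitrary sizes):

```python
import numpy as np
from scipy.optimize import linprog

def weighted_bp(A, y, w):
    """Solve min ||Wx||_1 s.t. Ax = y (Eq. (26)) as a linear program
    via the split x = u - v with u, v >= 0."""
    n = A.shape[1]
    res = linprog(np.concatenate([w, w]), A_eq=np.hstack([A, -A]),
                  b_eq=y, bounds=(0, None))
    return res.x[:n] - res.x[n:]

def irl1(A, y, n_iter=5, eps=1e-3):
    """IR`1 scheme of [99]: unit weights first, then w_i = 1/(|x_i| + eps), Eq. (28)."""
    w = np.ones(A.shape[1])
    for _ in range(n_iter):
        x = weighted_bp(A, y, w)
        w = 1.0 / (np.abs(x) + eps)
    return x

# Fig. 7 example: plain l1 recovers the wrong vector; the weights W = diag(5, 5, 1) fix it
A = np.array([[6.0, 2.0, 3.0]])
y = np.array([3.0])                                   # y = A x0 with x0 = [0, 0, 1]^T
print(weighted_bp(A, y, np.ones(3)))                  # -> [0.5, 0, 0], incorrect
print(weighted_bp(A, y, np.array([5.0, 5.0, 1.0])))   # -> [0, 0, 1], correct

# IR`1 on a random underdetermined instance
rng = np.random.default_rng(0)
A2 = rng.standard_normal((25, 60)) / np.sqrt(25)
x0 = np.zeros(60)
x0[[3, 17, 40, 55]] = [1.0, -2.0, 0.5, 1.5]
x_hat = irl1(A2, A2 @ x0)
print(np.linalg.norm(x_hat - x0))   # near zero
```

Note that in the 3-dimensional example the re-weighting of Eq. (28) cannot by itself escape the incorrect first iterate; it is the availability of suitable weights, as in Fig. 7c, that the iterative scheme exploits on larger random problems.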
In passing, note that the weights used in FOCUSS take the form wi(j+1) =
|xi(j)|^(p−2), whereas the coefficients converging to zero are removed and constrained
to be identically zero. Moreover, an IRLS scheme was proposed in [94], where
the value of ε in the update formula of the weights (see Eq. (28)) is gradually
reduced with increasing iteration number. This scheme, referred to as
ε-regularized IRLS, exhibits similar performance to the IR`1 in sparse vector
recovery, and appears to outperform the standard IRLS in the sense that it
requires significantly fewer measurements, especially as the value of p decreases
and approaches 0 [94].
2.8.2. Bayesian CS approaches
This section presents the fundamental concepts of an alternative class of
methodologies addressing the CS problem from a Bayesian perspective [100]. In
this regard, the prior belief that x0 is sparse is expressed via an appropriately
chosen probability density function (PDF), whereas the objective is to provide
a posterior PDF for the values of the estimate x by utilizing a small number
m of measurements y, where m < n. The Bayesian CS approach exhibits two
significant advantages over the standard CS techniques. First, in contrast to the
deterministic estimates obtained for the sparse vector x in the traditional CS
framework, Bayesian CS yields a posterior PDF. Clearly, this provides a tool for
uncertainty quantification associated with the reconstructed vector x. Second,
instead of a priori selecting a fixed random matrix Φ following standard CS (see
Section 2.1), the posterior PDF can be employed for determining the CS matrix
Φ adaptively. This is achieved in an iterative manner by selecting at each cycle
the next row of Φ that minimizes the reconstruction uncertainty (see [100]).

In Bayesian CS, the residual vector r (see Eq. (12)) is modeled as a zero-mean
Gaussian vector with covariance matrix σ²I, where I is the identity matrix of
size m. This choice yields a Gaussian likelihood function of the form

p(y | x, σ²) = (2πσ²)^(−m/2) exp( −(1/(2σ²)) ‖y − Ax‖₂² )    (29)

Comparing with the standard CS, Eq. (29) corresponds to the first term of
Eq. (19) and can be construed as a measure of the reconstruction accuracy for
given x and σ². Next, a sparsity-promoting prior PDF is required for x. A
popular choice is the Laplace PDF [101] of the form

p(x | λ) = ∏_{i=1}^{n} (λ/2) exp(−λ|xi|) = (λ/2)^n exp(−λ‖x‖₁)    (30)
where λ is the coefficient of the penalty factor in Eq. (19). It is noted that
the Bayesian formulation corresponding to the standard CS problem of Eq. (19)
aims at determining the maximum a posteriori (MAP) value of x by using the
likelihood function of Eq. (29) in conjunction with the Laplace prior of Eq. (30).
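Indeed, taking the logarithm of the posterior p(x|y) ∝ p(y|x, σ²) p(x|λ) and dropping terms independent of x shows that the MAP estimate solves a problem of the form of Eq. (19):

```latex
\hat{x}_{\mathrm{MAP}}
= \arg\max_{x}\,\big[\log p(y \mid x, \sigma^2) + \log p(x \mid \lambda)\big]
= \arg\min_{x}\,\frac{1}{2\sigma^2}\lVert y - Ax\rVert_2^2 + \lambda\lVert x\rVert_1
```

which, upon rescaling by σ², coincides with Eq. (19) with penalty coefficient σ²λ.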
This naturally raises the question of whether the Bayesian approach can be
adapted for determining the complete posterior PDF p(x|y). Unfortunately,
the Laplace prior in Eq. (30) is not conjugate to the Gaussian likelihood in
Eq. (29), and thus, the Bayesian inference problem cannot be solved to yield
the posterior PDF in closed-form. The interested reader is directed to [102] for
more details about conjugacy in Bayesian inference.
In this regard, there have been efforts for addressing this issue within the
context of sparse Bayesian learning [103] by introducing a technique typically
referred to as relevance vector machine (RVM). Specifically, two distinct PDFs
are utilized, i.e., a zero-mean Gaussian prior on each element of x of the form

p(x | α) = ∏_{i=1}^{n} N(xi | 0, αi⁻¹)    (31)

and a Gamma prior for each element αi of α, given by

p(α | β, γ) = ∏_{i=1}^{n} Γ(αi | β, γ)    (32)

In Eqs. (31) and (32), α represents hyperparameters, whereas β and γ are
parameters that need to be tuned. Hence, a marginalization over the
hyperparameters α yields the overall prior on x as

p(x | β, γ) = ∏_{i=1}^{n} ∫₀^∞ N(xi | 0, αi⁻¹) Γ(αi | β, γ) dαi    (33)

Since the Gamma PDF Γ(αi | β, γ) is the conjugate prior of the Gaussian PDF
N(xi | 0, αi⁻¹) with respect to αi, the integrals appearing in the product of
Eq. (33) can be evaluated in closed form, yielding the Student-t distribution
[103]. The PDF of Eq. (33) is plotted in Fig. 8a, where it is seen that nonzero
probability values are concentrated primarily around the origin and along the
axes; thus, indicating that sparse vectors are more probable than dense vec-
tors. The Laplace prior of Eq. (30), plotted in Fig. 8b, exhibits similar features.
In contrast, a product of independent Gaussian random variables, plotted in
Fig. 8c, does not exhibit probability concentration along the axes.
Fig. 8. Various probability density functions that may be employed as priors: (a) Student-t,
p(x1, x2) from Eq. (33); (b) Laplace, p(x1, x2) from Eq. (30); (c) Gaussian, p(x1, x2) =
∏_{i=1}^{2} N(xi | 0, 1); (d) contour lines of the three density functions at PDF value p = 0.01.
In (a),(b), nonzero probability values are primarily concentrated around the origin and along
the axes, encouraging sparse solutions; in (c), there is no probability density concentration
along the axes.
Several other sparsity inducing priors have been proposed in the sparse
Bayesian learning literature. Indicatively, the spike-and-slab approach intro-
duced in [104], where spike refers to the probability of a particular coefficient
being zero and slab relates to the prior distribution of the coefficients, has been
utilized for Bayesian variable selection [105] and for penalized likelihood estima-
tion [106]. More recently, the horseshoe distribution was proposed in [107] (see
also [108] for a review survey), which exhibits advanced performance in terms
of robustness and adaptivity to different sparsity patterns, and is amenable to
analytical mathematical treatment.
The hierarchical structure discussed so far leads eventually to a convenient
representation of the complete posterior PDF p(x|y) as multivariate Gaussian
with mean vector and covariance matrix given by

µ = σ⁻² Σ Aᵀ y    (34)

and

Σ = ( σ⁻² AᵀA + diag(α1, . . . , αn) )⁻¹    (35)
respectively. Therefore, the Bayesian CS formulation leads to the problem of
estimating the hyperparameters σ and α = [α1, . . . , αn]ᵀ. This can be achieved
with the aid of standard Bayesian tools such as Markov chain Monte Carlo [109]
and variational inference [110]. Nevertheless, it can be argued that the standard
solution approach in Bayesian CS is the RVM [103], which is a type-II
maximum-likelihood approach exhibiting both satisfactory accuracy and
computational efficiency [100]. Specifically, the objective relates to estimating the
values of α and σ that maximize the logarithm of the marginal likelihood, where
marginalization is performed over x. This can be accomplished by implementing
an expectation maximization (EM) algorithm, which leads to closed-form
recursive formulae for the iterative solution of the unknown hyperparameters α
and σ (see [103] and [100] for more details). From a computational cost
perspective, the evaluation of Eq. (35) involves the inversion of an n × n matrix, an
operation of complexity O(n³). This limitation has been addressed in [111,112]
by developing a fast RVM algorithm with complexity O(nk²).
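Given hyperparameter values α and σ², the posterior mean and covariance of Eqs. (34)-(35) are directly computable. The sketch below uses hand-picked hyperparameters that mimic what an EM/RVM run would produce, i.e., small αi on the true support and large αi elsewhere; `rvm_posterior` is an illustrative name and all numerical values are arbitrary:

```python
import numpy as np

def rvm_posterior(A, y, alpha, sigma2):
    """Posterior mean and covariance of x for fixed hyperparameters:
    Sigma = (sigma^-2 A^T A + diag(alpha))^-1 (Eq. (35)),
    mu    = sigma^-2 Sigma A^T y             (Eq. (34))."""
    Sigma = np.linalg.inv(A.T @ A / sigma2 + np.diag(alpha))
    mu = Sigma @ A.T @ y / sigma2
    return mu, Sigma

rng = np.random.default_rng(0)
m, n = 15, 30
A = rng.standard_normal((m, n))
x0 = np.zeros(n)
x0[[2, 11]] = [1.0, -2.0]
y = A @ x0 + 0.01 * rng.standard_normal(m)   # noisy measurements, m < n

alpha = np.full(n, 1e8)      # a large alpha_i pins the corresponding x_i near zero
alpha[[2, 11]] = 1e-2        # a small alpha_i leaves x_i essentially unconstrained
mu, Sigma = rvm_posterior(A, y, alpha, sigma2=1e-4)
print(mu[[2, 11]])           # close to the true coefficients [1.0, -2.0]
```

The diagonal of Sigma quantifies the reconstruction uncertainty per coefficient, which is precisely the quantity exploited for the adaptive selection of the CS matrix rows mentioned above.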
2.8.3. Structured sparsity
When solving the optimization problems of Eqs. (6) and (13) by employing the
non-convex and convex approaches described in Sections 2.4 and 2.5,
respectively, the position of each coefficient in the coefficient vector x0 is not taken
into account. However, due to the physics of the specific problem, x0 may
exhibit not only sparsity, but also additional patterns. This situation is referred to
in the literature as structured sparsity [113]. In this regard, typical examples of
structured sparsity include group sparsity (e.g., [114]), according to which the
coefficients of x0 are clustered in disjoint or overlapping groups; hierarchical
sparsity, according to which the coefficients are divided into parents and chil-
dren that are jointly zero or nonzero (see, for instance, the wavelet tree sparsity
in [115]); and, more generally, graph sparsity, according to which underlying re-
lationships between coefficients are described by a graph structure with the aid
of nodes, representing the coefficients, and edges, representing the relationships
between them (e.g., [116]).
Notably, within the context of sparse reconstruction, structured sparsity
serves as additional information to be exploited for further reducing the re-
quired number of measurements. In fact, it has been shown that modifying the
regularization method leads to improvement in reconstruction accuracy for a
given number of measurements (e.g., [113]). Approaches for exploiting struc-
tured sparsity include both convex (e.g., [114]) and non-convex (e.g., [116,117])
formulations with a varying degree of success depending on the type of infor-
mation available.
An indicative example of a greedy, non-convex, approach proposed in [116]
is StructOMP, which can be construed as a generalization of the OMP algo-
rithm described in Section 2.4. In StructOMP, the input consists not only of
the m-length measurement vector y and the m × n matrix A, but also of the
group structure that the coefficient vector is anticipated to exhibit. Specifically,
a block set is defined that contains possible disjoint or overlapping groups, while
each block is assigned a value that describes its complexity; i.e., generalizing the
notion of sparsity to account for group structures (see [116] for more details).
For example, in standard sparse vectors, each component of the coefficient vec-
tor is considered to have complexity 1, and thus, if a given coefficient is active,
the coefficient vector will be less sparse by 1. In other words, every coefficient
is equally penalized and encouraged to be zero. In group sparse vectors treated
within the StructOMP framework, if a given block is active, the complexity
of the overall coefficient vector increases by the complexity of that group, and
thus, blocks are not evenly penalized. Next, similarly to the standard OMP,
the algorithm, first, selects the block that reduces ‖r‖₂ (see Eq. (12)) per unit
increase of complexity the most (this block is considered to provide the maximum
progress to the algorithm), and, second, assigns values to the coefficients
of the selected block via least squares regression. Subsequently, the algorithm
locates the next block corresponding to the maximum progress and terminates
either when ‖r‖₂ becomes smaller than a prescribed threshold, or when the
complexity of x becomes larger than a prescribed value. In general, StructOMP
is straightforward to implement and can address cases of overlapping groups as
well. Theoretical results regarding its performance (e.g., Structured-RIP; see
also Section 2.7) can be found in [116].
Further, a notable convex approach is the elastic net, proposed in [118] in
the context of grouped variable selection in regression analysis. Specifically, as
argued in [118], the two most widely used regularization techniques, namely,
ℓ1-norm penalization (referred to as LASSO [64] in the statistical and as BPDN
[63] in the signal processing communities) and ℓ2-norm penalization (used for
coefficient vector shrinkage and also known as ridge regression or Tikhonov
regularization) are unable to perform variable selection, shrinkage and variable
grouping, simultaneously. Thus, a rather straightforward approach relates to
combining the ℓ1-norm (which promotes sparsity and leads to shrinkage) with
the ℓ2-norm (which promotes grouping and also leads to shrinkage), yielding
the minimization problem

min_x ‖y − Ax‖_2^2 + λ_1 ‖x‖_1 + λ_2 ‖x‖_2^2.  (36)
The formulation of Eq. (36), referred to as the naive elastic net in the original paper
[118], can be recast as a LASSO problem (see Eq. (19)); thus, a solution estimate
x̂ can be obtained by using the LARS-EN algorithm, which is a modified version
of LARS (see also Section 2.6). However, Eq. (36) leads to an undesirably high
degree of shrinkage due to the combined shrinkage effects of the ℓ1- and ℓ2-
norms. Therefore, scaling of the solution estimate is typically applied in the form

x = (1 + λ_2) x̂,  (37)

where x is known as the elastic net solution estimate. Overall, the elastic net
exhibits the significant advantage of successfully promoting group sparsity, even
in cases where no information about group structures in x_0 is available; see also
Fig. 9 for a visual representation of the elastic net unit-norm ball in R³.
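A minimal sketch of the naive elastic net of Eq. (36), solved here with a proximal-gradient (ISTA-type) iteration and followed by the rescaling of Eq. (37); this is an illustrative substitute for the LARS-EN algorithm used in [118], and a 1/2 scaling of the quadratic terms is adopted for convenience:

```python
import numpy as np

def elastic_net(y, A, lam1, lam2, n_iter=5000):
    """Naive elastic net via proximal gradient (ISTA-type), followed
    by the (1 + lam2) rescaling; a sketch, not LARS-EN. The smooth
    part (1/2)||y - Ax||^2 + (lam2/2)||x||^2 is handled by gradient
    steps, the l1 term by soft-thresholding."""
    x = np.zeros(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2 + lam2   # Lipschitz constant of smooth part
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y) + lam2 * x
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam1 / L, 0.0)  # soft threshold
    return (1 + lam2) * x                  # elastic net estimate

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 10))
x0 = np.zeros(10)
x0[[1, 4]] = [2.0, -3.0]
x_en = elastic_net(A @ x0, A, lam1=0.1, lam2=0.1)
print(np.nonzero(np.abs(x_en) > 0.5)[0])   # indices of dominant coefficients
```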
Next, another widely used regularization technique that promotes group
sparsity is the ℓ1/ℓp penalty, which encourages sparse solutions at the group
level, but not within the groups (e.g., [114]). In this regard, the minimization
problem becomes

min_x ‖y − Ax‖_2^2 + λ Σ_{g∈G} d_g ‖x_g‖_p,  (38)

where x_g represents the coefficients of x that belong to group g ∈ G, with G being
the set of all groups, and d_g represent positive scalar weights. The approach
described by Eq. (38) is referred to in the literature as group-LASSO, with typical
p values being 2 and ∞ [113]. Indicatively, in Fig. 9 the three-dimensional
ℓ1/ℓ2-norm ball is shown and compared with the elastic net unit-norm ball.
Group-LASSO has been shown to improve reconstruction performance as compared
to standard LASSO [119], provided that there is a priori knowledge of the
coefficients x_0 forming disjoint groups, with either simultaneously active
or simultaneously inactive coefficients. Further, to account for
the impact of uneven group sizes, rather sophisticated approaches exist for
appropriately selecting the weights d_g (see, for instance, [120]). Clearly, if groups
in G are allowed to overlap, more complex coefficient structures can be formed,
such as hierarchical and graph structures [113]. In fact, a direct extension
of group-LASSO relates to considering groups in G defined as intersections of
complements of overlapping groups [121]. An alternative approach, commonly
referred to as latent group-LASSO [122], considers groups in G defined as unions
of overlapping groups. Interestingly, the latter approach can be construed as a
convex relaxation of StructOMP; see [123] for more details and comparisons.
(a) Elastic net unit-norm ball (b) ℓ1/ℓ2-norm ball
Fig. 9. Comparison between the elastic net unit-norm ball (a) and the ℓ1/ℓ2-norm ball (b)
in R³. In the elastic net ball, curved contours encourage grouping of coefficients, whereas
sharp edges and vertices promote sparsity. In the ℓ1/ℓ2-norm ball, sparsity is promoted
between the groups g_1 = {x_1, x_2} and g_2 = {x_3}; however, no particular direction is
encouraged within the groups.
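The computational core of many solvers for the ℓ1/ℓ2 penalty of Eq. (38) is the group-wise soft-thresholding (proximal) operator, which shrinks or zeroes each group as a whole. A minimal sketch, assuming disjoint groups and unit weights d_g (the function name is illustrative):

```python
import numpy as np

def group_soft_threshold(x, groups, tau):
    """Proximal operator of tau * sum_g ||x_g||_2 for disjoint groups
    with unit weights: shrinks each group's l2-norm by tau, setting
    entire small groups to zero."""
    out = np.zeros_like(x)
    for g in groups:
        norm_g = np.linalg.norm(x[g])
        if norm_g > tau:
            out[g] = (1.0 - tau / norm_g) * x[g]  # shrink the whole group
        # else: the entire group is set to zero
    return out

x = np.array([3.0, 4.0, 0.1, -0.2])
groups = [np.array([0, 1]), np.array([2, 3])]
res = group_soft_threshold(x, groups, tau=1.0)
print(res)  # first group (norm 5) shrunk; second group (norm ~0.22) zeroed
```

Note that, in contrast to the component-wise soft threshold of the ℓ1-norm, no particular direction within a surviving group is favored: the whole group is scaled uniformly.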
2.8.4. Dictionary learning strategies
This section focuses on approaches addressing the problem of determining
an optimal matrix A ∈ R^{m×n} in Eq. (5) based on a training set of available
signals {y_i}_{i=1}^N. These approaches are collectively referred to in the literature
as dictionary learning and seek a proper basis A ∈ R^{m×n} (also termed an
overcomplete dictionary) promoting the sparse representation of signals with
characteristics similar to the training set.
In the following, Y ∈ R^{m×N} denotes the matrix with the training vectors
{y_i}_{i=1}^N as its columns, and X ∈ R^{n×N} represents the matrix with the corresponding
representation vectors {x_i}_{i=1}^N as its columns, where y_i = A x_i for all
i = 1, . . . , N. Also, the columns of matrix A are referred to herein as dictionary
atoms, following the established terminology in the literature. In this regard,
the associated optimization problem can be formulated quite generally in the form

min_{A,X} ‖Y − AX‖_F^2  subject to  ‖x_i‖_0 ≤ k_0, i = 1, . . . , N,  (39)

where ‖·‖_F denotes the Frobenius norm, and k_0 is an integer denoting a pre-specified
target sparsity degree. Eq. (39) can be equivalently cast in the form

min_{A,X} Σ_{i=1}^N ‖x_i‖_0  subject to  ‖Y − AX‖_F^2 ≤ ε,  (40)
for a fixed value of ε. Next, to address the non-convexity of the ℓ0-norm, an ℓ1-norm
relaxation can be introduced, yielding an objective function of the form

‖Y − AX‖_F^2 + λ Σ_{i=1}^N ‖x_i‖_1.  (41)
Although the objective function in Eq. (41) is not jointly convex with respect
to the variables A and X, it becomes convex with respect to one variable when the
other is kept fixed [124]. This motivates a two-step iterative solution approach,
adopted by the vast majority of researchers in dictionary learning. The approach
entails a sparse coding step, i.e., the determination of the representation vectors
in X, followed by a dictionary update step for A. A closer examination of
Eq. (41) shows that the ℓ1-norm constraint on x_i tends to reduce the values of
the nonzero elements of x_i, which forces the elements of A to increase arbitrarily
in the dictionary update step. This undesirable effect can be ameliorated by
constraining the columns of A to have ℓ2-norms less than or equal to one.
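The two-step scheme just described can be sketched as follows; this is a minimal illustration, assuming an ISTA-type sparse coding step and a plain least-squares dictionary update with column renormalization, and is not any specific published algorithm:

```python
import numpy as np

def dictionary_learning(Y, n_atoms, n_iter=30, lam=0.1, inner=100):
    """Two-step alternating sketch of Eq. (41): an ISTA-type sparse
    coding step for X, then a least-squares dictionary update for A,
    with columns of A renormalized to unit l2-norm."""
    m, N = Y.shape
    rng = np.random.default_rng(0)
    A = rng.standard_normal((m, n_atoms))
    A /= np.linalg.norm(A, axis=0)             # unit-norm atoms
    X = np.zeros((n_atoms, N))
    for _ in range(n_iter):
        L = np.linalg.norm(A, 2) ** 2          # step size for sparse coding
        for _ in range(inner):                 # sparse coding step (ISTA)
            Z = X - A.T @ (A @ X - Y) / L
            X = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)
        A = Y @ np.linalg.pinv(X)              # dictionary update step
        norms = np.linalg.norm(A, axis=0)
        norms = np.where(norms > 1e-10, norms, 1.0)
        A /= norms
        X *= norms[:, None]                    # keep the product AX unchanged
    return A, X

# toy data: signals built from a few atoms of a hidden dictionary
rng = np.random.default_rng(2)
D = rng.standard_normal((8, 12))
D /= np.linalg.norm(D, axis=0)
X_true = np.where(rng.random((12, 50)) < 0.15, rng.standard_normal((12, 50)), 0.0)
Y = D @ X_true
A, X = dictionary_learning(Y, n_atoms=12)
print(np.linalg.norm(Y - A @ X) / np.linalg.norm(Y))   # relative fit error
```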
Dictionary learning approaches have also been developed within a Bayesian
framework (see Section 2.8.2). Indicatively, according to [125] the dictionary
A is determined based on maximization of the likelihood PDF p(Y|A). Two
fundamental assumptions are introduced in [125]. First, independence is as-
sumed between the training samples, and second, the prior PDF p(X) is chosen
in a manner that the elements of each representation vector x_i are zero-mean
independent identically distributed random variables following the Laplace dis-
tribution. These assumptions lead to a formulation similar to Eq. (41), which
can be solved efficiently by employing a steepest descent approach for the sparse
coding step and a closed-form formula for the dictionary update step (see also
[126–128]). Furthermore, the method of optimal directions [129–131] can be
construed as a modification of the aforementioned approach, which provides a
simpler dictionary update formula and allows for the adoption of more sophisticated
techniques (e.g., OMP or FOCUSS) for the sparse coding step; see also
[132–134] for an alternative related approach, which relies on a MAP setting. In
general, the dictionary learning techniques available in the literature utilize var-
ious different solution approaches for the dictionary update and/or the sparse
coding steps. For instance, the OMP [135] (see Section 2.4) can be utilized for
the problems in Eq. (39) or Eq. (40), or the ISTA [136,137] and the LARS [124]
(see Section 2.6) can be employed for the problem in Eq. (41).
Early work in dictionary learning has also been inspired by vector quanti-
zation (VQ) clustering [135,138]. In VQ clustering, a set of descriptive vectors
is learned, and each training sample is represented by one of those
vectors, typically the closest one in an ℓ2-norm sense. This approach can be
construed as an extreme case of sparse representation, where only one atom
of dictionary A is selected for representing y; thus, yielding a 1-sparse
representation vector x. Note that in the general sparse representation framework
discussed so far, each signal is represented as a linear combination of more than
one atom of A. Further, the K-means algorithm, also referred to as the generalized
Lloyd algorithm [139], is routinely utilized in the VQ training procedure.
This motivates its use for addressing the dictionary learning problem. Interest-
ingly, the K-means algorithm is a two-step procedure, which, first, determines
the 1-sparse representation vectors, and, second, updates the dictionary (or
codebook in VQ terminology), in a similar manner as the two-step dictionary
update framework discussed previously. In this regard, a dictionary learning
technique referred to as K-SVD was presented in [135], where the sparse cod-
ing step is performed by employing OMP (this choice is not restrictive), and
the dictionary update is performed by sequentially updating each column of A
based on singular value decomposition (SVD) to minimize the approximation
error. Although it is not guaranteed to converge, and its convergence perfor-
mance depends on the robustness of the adopted sparse coding algorithm [135],
K-SVD has exhibited highly satisfactory accuracy in various applications such
as image denoising [138].
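The K-SVD column update just described can be sketched as follows; this is a simplified illustration of the rank-1 SVD update for a single atom, with the sparse coding step assumed to have been performed separately:

```python
import numpy as np

def ksvd_atom_update(Y, A, X, k):
    """One K-SVD dictionary-update step for atom k: restrict to the
    signals that use the atom, remove the atom's contribution, and
    replace atom and coefficients by the leading singular pair."""
    omega = np.nonzero(X[k, :])[0]        # signals using atom k
    if omega.size == 0:
        return A, X                       # atom unused, nothing to update
    # residual of the selected signals with atom k's contribution removed
    E = Y[:, omega] - A @ X[:, omega] + np.outer(A[:, k], X[k, omega])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    A[:, k] = U[:, 0]                     # new unit-norm atom
    X[k, omega] = s[0] * Vt[0, :]         # updated coefficients
    return A, X

# toy check: with a single atom explaining Y exactly, the update
# yields an exact rank-1 factorization
Y = np.outer([1.0, 2.0, 2.0], [1.0, -1.0, 0.5])
A = np.array([[1.0], [0.0], [0.0]])
X = np.array([[1.0, -1.0, 0.5]])
A, X = ksvd_atom_update(Y, A, X, 0)
print(np.allclose(A @ X, Y))
```

Since the update only touches the signals in omega, the sparsity pattern of X is preserved, which is a distinctive feature of K-SVD.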
The dictionary learning approaches discussed so far can be categorized as
“batch” algorithms in the sense that the complete training set is provided as an
input and the ensemble of the training samples is processed at each iteration.
Clearly, this affects the computational efficiency of these algorithms, especially
in applications involving large training sets. To address this challenge, an on-
line dictionary learning algorithm was presented in [124,140], which processes
the training signals one at a time, or in mini-batches. This algorithm utilizes a
LARS solution approach for the problem in Eq. (41) regarding the sparse coding
step, and a block-coordinate descent method with warm restarts for the dictio-
nary update step; see also [141]. Compared with the standard batch algorithms,
this online dictionary learning technique has been shown to exhibit enhanced
performance for both large and small training sets [124]. Moreover, under cer-
tain rather strong conditions, convergence to a stationary point is guaranteed
[124]. In passing, it is noted that all dictionary learning algorithms address a
non-convex problem, and thus, are susceptible to being trapped in local minima.
3. Diverse applications in engineering mechanics
Sparse representations and CS approaches have significantly impacted the
field of engineering mechanics over the past few years. In this section, relevant
research work is categorized under three distinct application areas, while a
concerted effort is made to highlight the links and interconnections between the
theoretical concepts presented in Section 2 and the specific engineering mechanics
applications discussed below.
The first application area relates to inverse problems in the field of structural
health monitoring, and specifically to the development of techniques for struc-
tural system identification and damage detection subject to incomplete data.
In fact, applications related to efficient data compression and storage at the
sensors level, and to fast data transmission, have proved quite advantageous
for real-time structural health monitoring. Also, exploitation of the inherently
sparse data structure of vibration response measurements has benefited the de-
velopment of efficacious system identification and damage detection schemes.
The second application area relates to uncertainty modeling and simulation
under incomplete data. In particular, CS-based techniques have been developed
within the context of stochastic processes to address problems in engineering
mechanics related to spectral analysis, statistics estimation and Monte Carlo
simulation under sparse measurements.
The third application area relates to developing computationally efficient
uncertainty propagation techniques for determining the response statistics of
diverse systems in engineering mechanics. The rationale relates to employing CS
tools for evaluating the system response, which is represented by appropriately
chosen sparse expansions. In this manner, the associated computational cost is
reduced, and thus, the solution technique can be applied to higher-dimensional
problems.
It is interesting to note that although the aforementioned application areas
appear relatively unrelated to each other, the theoretical concepts and math-
ematical tools utilized (and described in Section 2) are surprisingly similar in
their implementation. This is due to the fact that problems in all three areas
share the challenge of incomplete data. Of course, incomplete data may mani-
fest themselves in various different forms and can correspond to missing or com-
pressed data, or even refer generally to insufficiently few function evaluations.
Ultimately, however, in all herein considered applications, the mathematical for-
mulation yields an underdetermined linear system of the form of Eq. (5), which can
be addressed by the versatile CS machinery discussed in Section 2.
3.1. Inverse problems in structural health monitoring: Structural system iden-
tification and damage detection under incomplete data
One of the first applications of sparse representations and CS theory in the
field of structural health monitoring has been the analysis of sparse and/or in-
complete data acquired by diverse sensor technologies. In this regard, many
applications have focused on developing efficient data compression schemes for
real-time structural health monitoring. The rationale relates to acquiring the
signal directly in a compressed format. Clearly, this circumvents the compu-
tational burden of compressing it locally at the sensor and bypasses the need
for sensors with high storage capacity. This entails the utilization of CS in
conjunction with an appropriate compression basis (in which the signal has a
sparse representation) for reconstructing data series with far higher resolution
than those originally captured. Notably, in the problem of data compression the
CS efficiency can be optimized by appropriately designing the sampling matrix
Φ, or, equivalently, matrix A in Eq. (5). However, this is not the case when
the problem of limited and/or missing data is considered. Indicatively, prac-
tical reasons for the occurrence of limited data include data loss due to both
equipment failure (e.g., damaged sensors) and sensor thresholding limitations.
Numerous other issues including sensor maintenance, bandwidth limitations,
usage and data acquisition restrictions, as well as data corruption may also lead
to missing data. It becomes clear that applying CS theory to the problem of
missing data for signal reconstruction differs primarily in one respect as com-
pared to data compression; that is, missing data are not necessarily intentional.
Obviously, this removes control over one important step of compressive sam-
pling, i.e., the arrangement of the sampling matrix Φ. Indeed, as mentioned
in Section 2.2, a number of bases with randomly deleted rows, such as Fourier,
satisfy the requirements of Eqs. (7) and (10) for sparse reconstruction with high
probability. Unfortunately, the missing data may not be uniformly distributed
over the record; thus, regular or large gaps of missing data can lead to matrices
A with less incoherent basis vectors. Clearly, this additional challenge highlights
the need for assessing the performance of the various CS tools in conjunction
with matrices A that do not (strictly) conform to theoretical conditions such as
the RIP (see also Section 2.7).
One of the first CS applications for data compression can be found in [142],
where the authors utilized bridge vibration data and employed orthogonal ex-
pansion bases (e.g., Fourier and wavelets) in conjunction with an `1-norm mini-
mization formulation. The approach was subsequently applied for signal recon-
struction related to the problem of data loss in a wireless sensor network during
transmission of data between the wireless sensor nodes and the base station
[143]. In a similar context, CS was employed in [144] for data loss recovery
associated with a fast-moving wireless sensing technique for structural health
monitoring of bridges without interrupting traffic.
Further, following pioneering contributions in signal processing (see Sec-
tion 2.8.2), a Bayesian CS methodology was proposed in [145], which, in con-
trast to the standard approaches delineated in Sections 2.3–2.6, also provides
an estimate of the signal reconstruction uncertainty. Specifically, this
Bayesian treatment yields posterior distributions p(x|y) for the basis coeffi-
cients of Eq. (5), which can be used eventually for suppressing the basis terms
whose contribution to the reconstructed signal is minimal. The methodology
was further enhanced and its reconstruction robustness was improved in [146],
where its performance was assessed with regard to recovery of lost data during
wireless transmission.
Next, by proposing a matrix reshape scheme, a low-rank representation of
large-scale structural seismic and typhoon responses was identified in [147],
which proved to be beneficial for efficient data compression. The scheme was
coupled in [148] with a nuclear norm minimization algorithm for recovering
multi-channel structural response time-histories with randomly missing data.
The same authors exploited CS tools in [149] for efficient transmission and re-
covery of large-scale image data related to structural system and civil infrastruc-
ture health diagnosis. Furthermore, the relatively recently proposed concept of
group sparsity (see also Section 2.8.3) was employed in [150] for reconstructing
incomplete vibration data measured by sensors placed at various different lo-
cations of the structure, while in [151] a dictionary learning strategy (see also
Section 2.8.4) was proposed for under-sampled acoustic emission signal reconstruction.
Finally, it is worth mentioning that references [152,153] focus on practi-
cal implementation of CS algorithms in wireless sensor networks, and provide
relevant discussions about optimal configurations and energy efficiency aspects.
Moreover, a hybrid sensor network configuration was proposed in [154] (see also
[155]), based on fusion of a minimal number of tethered sensors with wireless
nodes, for improving the information content of the transmitted data.
In the remainder of the section, attention is directed to system identification
and damage detection methodologies, which exploit the capabilities of the CS
machinery. In this regard, the work in [156] constitutes one of the first research
efforts to employ CS-based data analysis for estimating the damage condition
of a structure. In [157], a CS-based scheme was proposed and applied for de-
termining the degradation of a pipe-soil interaction model, where the damage
identification task was treated as a pattern classification problem. In a relatively
different context, a standard ℓ1-norm optimization approach was proposed in
[158] for identifying the distribution of moving vehicle loads on cable-stayed
bridges. Further, a scheme was devised in [159] based on a combination of blind
feature extraction and sparse representation classification, in conjunction with
a modal analysis treatment, for locating the structural damage and assessing its
severity. The same authors proposed an output-only identification approach in
[160] by coupling CS with blind source separation schemes for determining the
mode shape matrix of the structural model. In [161] the approach was modified
to account for video camera based vibration measurements. Along similar lines,
in [162] the mode shapes of a multi-degree-of-freedom (MDOF) structural sys-
tem were identified based on under-sampled vibration data collected by wireless
sensors; see also [163] for a formulation of the mode shape identification problem
based on atomic norm minimization.
In [164,165] a sensitivity-based model updating scheme in conjunction with
ℓ1-norm minimization was proposed for identifying localized damage in struc-
tures based on incomplete modal information; see also [166,167] for some related
work. In this context, several authors highlighted the limitations of employing
a Tikhonov regularization strategy (see also Section 2.8.3), typically used in
sensitivity-based model updating, for addressing the resulting underdetermined
problem. In particular, to address issues related to over-smoothing resulting
from Tikhonov regularization and to promote the sparseness of the damage
identification problem, various ℓ1-norm regularization schemes were proposed
in [168–173]. In [174] the authors combined CS for signal reconstruction with
auto-regressive and Wiener filter based methods for structural damage detection
and localization, while in [175] the ill-posedness of the inverse damage identifi-
cation problem was addressed by adding an ℓ1-norm regularization term in the
objective function.
Furthermore, in [176,177] a spectral identification technique was developed
for determining the parameters of nonlinear and time-variant structural systems
based on available input-output (excitation-response) realizations. A significant
advantage of the technique relates to the fact that it can readily account for the
presence of fractional derivative terms in the system governing equations, as
well as for the cases of non-stationary, incomplete and/or noise-corrupted data.
Specifically, the technique relies on recasting the governing equations as a set
of multiple-input-multiple-output systems in the wavelet domain. Next, an
ℓ1-norm minimization procedure based on CS theory is employed for determining
the wavelet coefficients of the available incomplete non-stationary input-output
data. Finally, these wavelet coefficients are utilized to reconstruct the non-
stationary incomplete signals, and consequently, to determine system related
time- and frequency-dependent wavelet-based frequency response functions and
associated parameters. The technique can be construed as a generalization of the
multiple-input-single-output methodology pioneered by Bendat and co-workers
(e.g., [178]) to account for non-stationary and incomplete data, as well as for
fractional derivative modeling.
Moreover, in [179,180] a power spectrum blind multi-coset sampling ap-
proach was proposed for operational modal analysis applications involving wire-
less sensor networks. In comparison with a CS treatment, the performance of
the multi-coset sampling approach appeared rather insensitive to the signal
sparsity degree. In [181] a dictionary learning approach (see also Section 2.8.4)
was applied for nonlinear structural system identification and for determining
the underlying governing equations based on available input-output data; see
also [182,183] for indicative applications of sparsity-based algorithms utilizing
dictionaries in damage detection problems. It is worth mentioning that CS
concepts have also been used for structural system impact force identification.
Indicatively, in [184] a hybrid ℓ1/ℓ2-norm minimization approach for promoting
group sparsity was proposed (see Section 2.8.3); see also [185,186] for some
relevant references.
3.2. Uncertainty modeling and simulation under incomplete data
CS-based techniques have also been developed within the context of stochas-
tic processes to address problems in stochastic engineering mechanics related to
spectral analysis, statistics estimation and Monte Carlo simulation under in-
complete available data. Specifically, Kougioumtzoglou and co-workers relied
on CS theory for stationary and non-stationary stochastic process power spec-
trum estimation subject to missing data [187]. This was done in conjunction
with an ℓ1-norm optimization algorithm for obtaining a sparse representation of
the signal in the selected basis (i.e., Fourier or wavelets). Notably, the underly-
ing stochastic process power spectrum can be estimated in a direct manner by
utilizing the determined expansion coefficients; thus, circumventing the compu-
tational cost related to reconstructing the signal in the time domain.
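In the spirit of the approach described above, a minimal sketch of sparse Fourier-basis reconstruction from a record with missing samples, with the power spectrum read directly from the recovered expansion coefficients, may look as follows (an ISTA-type ℓ1 solver and a real cosine/sine basis are assumed for illustration; this is not the authors' implementation):

```python
import numpy as np

def cs_spectrum(t_obs, y_obs, n, lam=0.05, n_iter=3000):
    """Sketch: estimate a sparse Fourier-coefficient vector from the
    samples observed at time indices t_obs (data elsewhere missing)
    via ISTA; the power spectrum follows directly from the squared
    coefficients, without reconstructing the time-domain signal."""
    k = np.arange(n // 2)
    tt = t_obs[:, None]
    # real DFT-type basis restricted to the observed rows
    B = np.hstack([np.cos(2 * np.pi * k * tt / n),
                   np.sin(2 * np.pi * k * tt / n)])
    L = np.linalg.norm(B, 2) ** 2
    x = np.zeros(B.shape[1])
    for _ in range(n_iter):
        z = x - B.T @ (B @ x - y_obs) / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    # power at frequency k: cosine and sine contributions combined
    return x[:n // 2] ** 2 + x[n // 2:] ** 2

n = 128
rng = np.random.default_rng(3)
t = np.arange(n)
signal = 2.0 * np.sin(2 * np.pi * 10 * t / n) + np.cos(2 * np.pi * 25 * t / n)
keep = np.sort(rng.choice(n, size=64, replace=False))   # 50% missing data
spec = cs_spectrum(t[keep].astype(float), signal[keep], n)
print(np.sort(np.argsort(spec)[-2:]))   # dominant frequency indices
```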
The technique was enhanced in [188] by utilizing an adaptive basis re-
weighting scheme for further increasing the sparsity of the solution (see also
Section 2.8.1), and was applied in [189] for structural system response and re-
liability analysis under missing data. The rationale relates to applying CS to
multiple process records iteratively, and to utilizing the cumulative information
from all records for the purpose of seeking a sparse representation in an aver-
age sense over an ensemble. By introducing this iterative process to alter basis
coefficients, a significant gain in spectral estimation accuracy was observed as
compared to standard CS. In a similar context, an ℓp-norm (0 < p < 1) opti-
mization algorithm was proposed in [190] for promoting solution sparsity (see
also Section 2.8.1). Regarding the effect of the chosen norm on the power spec-
trum estimation error, it was shown that the ℓ1/2-norm almost always provides
a sparser solution than the ℓ1-norm. This was corroborated by various examples
considering stationary, non-stationary and two-dimensional processes related to
sea wave, wind, and material properties spectra, respectively. It was also ob-
served that the reconstruction accuracy of the technique is further enhanced
when coupled with the aforementioned adaptive basis re-weighting scheme.
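The re-weighting idea can be illustrated with a generic iteratively re-weighted ℓ1 scheme (a hedged sketch, not the specific scheme of [188]; weights are updated as the inverse of the previous coefficient magnitudes, so that small coefficients are penalized more heavily in the next pass):

```python
import numpy as np

def reweighted_l1(y, A, lam=0.05, n_outer=5, n_inner=1000, eps=0.1):
    """Sketch of iteratively re-weighted l1 minimization: each outer
    pass solves a weighted lasso via ISTA, with weights inversely
    proportional to the previous coefficient magnitudes."""
    x = np.zeros(A.shape[1])
    w = np.ones(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2
    for _ in range(n_outer):
        for _ in range(n_inner):
            z = x - A.T @ (A @ x - y) / L
            x = np.sign(z) * np.maximum(np.abs(z) - lam * w / L, 0.0)
        w = 1.0 / (np.abs(x) + eps)   # small coefficients penalized more
    return x

rng = np.random.default_rng(5)
A = rng.standard_normal((25, 40))
x0 = np.zeros(40)
x0[[3, 17, 30]] = [3.0, -4.0, 2.5]
x_hat = reweighted_l1(A @ x0, A)
print(np.sort(np.argsort(np.abs(x_hat))[-3:]))   # estimated support
```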
The above developments have found recently diverse applications in ma-
rine engineering. Indicatively, a methodology based on ℓ1/2-norm minimization
(see Section 2.8.1) was proposed for efficient processing and joint time-frequency
analysis of relatively long water wave records by enabling reconstruction of data
recorded at a very low (sub-Nyquist) sampling rate [191]. Further, a CS tech-
nique relying on adaptive basis re-weighting (see Section 2.8.1 and [188]) was
developed in [192] for extrapolating in the spatial domain and estimating the
space-time characteristics of a sea state based on data collected at very few
spatially sparse points (e.g., wave buoys). This is of considerable importance to
a number of marine engineering applications involving three-dimensional waves
interacting with marine structures, such as optimizing arrays of wave energy
converters. Furthermore, a novel approach for measuring the sea surface ele-
vation on vertical breakwaters was developed in [193]. Note that this is not a
trivial problem since alternative typically used approaches, such as ultrasonic
probes and image processing, exhibit limitations related to signal distortion and
high computational cost, respectively. In this regard, the authors in [193] relied
on pressure measurements and on a CS-based reconstruction algorithm in con-
junction with a generalized harmonic wavelet basis. Specifically, the proposed
approach leads to an ℓ1-norm based constrained optimization scheme, which
utilizes the known values of the free surface data to reconstruct all other miss-
ing data while adhering at the same time to prescribed upper and lower bounds
at all time instants. The approach was also used in [194] as a validation tool
for supporting the veracity of the analytically derived probability distribution
of the nonlinear wave crest height on a vertical breakwater.
In [195] a Bayesian CS approach (see also Section 2.8.2) was proposed for
estimating profiles of soil properties based on sparse measurement data. The
approach is capable of quantifying the uncertainty of the statistical estimates as
well, while its performance was assessed in [196] against alternative widely used
techniques for interpolation of spatially varying and sparsely measured geo-data.
From a random field simulation perspective, the approach was coupled in [197]
with a Karhunen-Lo`eve expansion for generating random field samples within
a Monte Carlo simulation context. It was further generalized in [198] for sim-
ulation of cross-correlated random fields in the spatial domain, and in [199] to
account for non-stationary and non-Gaussian random fields. Also, it was shown
in [200] that the approach can be employed for random field simulation without
the need for “detrending” first, while a bootstrap approach for statistical infer-
ence of random field auto-correlation structure was proposed in [201] based on
a combination of Bayesian CS and Karhunen-Lo`eve expansion.
3.3. Computationally efficient uncertainty propagation in engineering mechanics
Addressing the challenge of uncertainty propagation in engineering mechan-
ics relates to the development of analytical and numerical methodologies for
stochastic response analysis of engineering systems. Specifically, ever-increasing
computational capabilities, novel signal processing techniques, and advanced
experimental setups have contributed to a highly sophisticated mathematical
modeling of the system governing equations. In general, these take the form
of high-dimensional stochastic (partial / fractional) differential equations to be
solved for evaluating the system response statistics; see also [202,203] for a
broad perspective. In this regard, a wide range of solution techniques rely on
appropriate (stochastic) representations and expansions of the system response
quantities of interest (e.g., displacements, stresses, etc.), where the objective is
to determine the expansion coefficients accurately and in a computationally ef-
ficient manner. Recently, the potential sparsity of such expansions has been
exploited and CS-based strategies have been proposed for reducing the asso-
ciated computational cost and for extending the range of applicability of the
techniques to problems of higher dimensions.
In this context, a rather popular solution technique in stochastic mechanics
relates to the use of polynomial chaos expansions (e.g., [204206]). This en-
tails the expansion of the system response quantity on a basis of (multivariate)
polynomials that are orthogonal with respect to the joint PDF of the input.
Recently, polynomial chaos expansions have been coupled with CS concepts
and tools for efficient representation and determination of the system response
(e.g., [44]). This has been motivated not only by theoretical results showing
that multivariate functions possess sparse expansions in orthogonal polynomial
bases (e.g., [207]), but also by the typically observed structured sparsity (see
also Section 2.8.3) in the polynomial chaos expansions of various problems; that
is, coefficients corresponding to low polynomial orders tend to be larger than
coefficients corresponding to higher orders.
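A minimal one-dimensional illustration of such a sparse polynomial chaos recovery, assuming a probabilists' Hermite basis and an OMP-type solver (the function and setup are illustrative, not any specific referenced method):

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermevander

def sparse_pce_omp(xi, u, order, n_nonzero):
    """Sketch: recover sparse 1-D polynomial chaos coefficients via
    OMP, using probabilists' Hermite polynomials He_k normalized to
    be orthonormal under the standard Gaussian measure."""
    P = hermevander(xi, order)                       # columns He_0 .. He_order
    P = P / np.sqrt([factorial(k) for k in range(order + 1)])
    r = u.copy()
    support = []
    c = np.zeros(order + 1)
    for _ in range(n_nonzero):
        j = int(np.argmax(np.abs(P.T @ r)))          # best-matching column
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(P[:, support], u, rcond=None)
        c[:] = 0.0
        c[support] = coef
        r = u - P @ c
    return c

# toy model: u(xi) = He_1(xi) + 0.5 He_3(xi), a 2-sparse expansion
rng = np.random.default_rng(4)
xi = rng.standard_normal(100)
u = xi + 0.5 * (xi ** 3 - 3 * xi)                    # He_1 + 0.5 He_3
c = sparse_pce_omp(xi, u, order=10, n_nonzero=2)
print(np.nonzero(np.abs(c) > 1e-8)[0])               # active basis indices
```

Since the toy response lies exactly in the span of two basis polynomials, the least-squares step on the identified support reproduces the (normalized) coefficients exactly.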
One of the first research efforts to explore the sparsity-promoting properties
of the ℓ1-norm, in conjunction with a LARS algorithm (see Section 2.6)
for automatically detecting the significant coefficients of the polynomial chaos
expansion, can be found in [208]. Further, in [209] the polynomial chaos ex-
pansion was combined with standard CS for efficiently constructing a solution
representation of elliptic stochastic partial differential equations. Applications
of the technique to address diverse problems in the fields of molecular biology,
astrodynamics, and computational fluid dynamics can be found in [210], [211],
and [212], respectively.
Following the aforementioned relatively standard implementation of the CS
approach, a weighting scheme was proposed in [213] for further promoting spar-
sity in the recovery of the expansion coefficients (see also Section 2.8.1), while
an adaptive re-weighting ℓ1-norm minimization scheme was applied in [214] for
the solution of stochastic partial differential equations. Note that in several
cases, such as in [215], the construction of the weighting matrix W in Eq. (26)
can be based on a priori information and on theoretical results about the de-
cay of the polynomial chaos coefficients. Moreover, additional information in
the form of response derivative estimates may be available. In this context,
gradient-enhanced ℓ1-norm minimization schemes were proposed in [216,217]
for accelerating the determination of the polynomial coefficients. Further, it is
worth mentioning that alternative optimization algorithms based on the `p-norm,
p < 1, [218,219] and on the `1-`2 norm [220] (see Section 2.8.1) were also employed
for increasing the sparsity of the obtained polynomial chaos expansion coefficient
vector.
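The effect of such re-weighting schemes can be sketched by wrapping a weighted `1 solver in an outer loop that penalizes each coefficient inversely to its current magnitude. The code below is a generic illustration of this idea only; it does not reproduce the specific algorithms of the cited references, and all parameter values are illustrative.

```python
import numpy as np

def weighted_ista(A, y, lam, w, n_iter=5000):
    """ISTA for the weighted problem min_c 0.5*||A c - y||^2 + lam*sum_j w_j*|c_j|."""
    c = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(n_iter):
        g = c - step * A.T @ (A @ c - y)
        c = np.sign(g) * np.maximum(np.abs(g) - step * lam * w, 0.0)
    return c

def reweighted_l1(A, y, lam=1e-3, eps=1e-2, n_outer=4):
    """Iteratively re-weighted l1: small coefficients receive large weights,
    pushing them harder toward zero on the next pass (closer to l0 behavior)."""
    w = np.ones(A.shape[1])
    for _ in range(n_outer):
        c = weighted_ista(A, y, lam, w)
        w = 1.0 / (np.abs(c) + eps)     # update weights from the current estimate
    return c

rng = np.random.default_rng(0)
N, P = 30, 50
A = rng.standard_normal((N, P)) / np.sqrt(N)
c_true = np.zeros(P)
c_true[[2, 17, 40]] = [0.8, -0.6, 0.4]
y = A @ c_true
c_hat = reweighted_l1(A, y)
```

The parameter eps prevents division by zero and sets the magnitude below which a coefficient is treated as effectively zero on the next pass.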
More recently, dictionary learning approaches (see Section 2.8.4) and iter-
ative basis updating schemes were proposed for increasing the approximation
accuracy and for decreasing the required number of expansion coefficients. In
this regard, anisotropic basis sets with more terms in important dimensions were
constructed in [221] in an adaptive manner, while an incremental algorithm was
employed in [222] for promoting sparsity by exploring sub-dimensional expan-
sions. Also, by resorting to CS concepts, a basis adaptation technique was
developed in [223] yielding a sparse polynomial chaos expansion. Specifically, a
two-step optimization algorithm was devised, which calculates the coefficients
and the input projection matrix of a low dimensional polynomial chaos expan-
sion with respect to an optimally rotated basis. Further, to reduce the number
of samples necessary for recovering the expansion coefficients, importance sam-
pling and coherence-optimal sampling strategies were developed in [224], and
applied in [225] in conjunction with adaptive global bases; see also [226,227] for
relevant work. It is worth mentioning that Bayesian CS (see also Section 2.8.2)
has also been used in conjunction with polynomial chaos expansions for basis
selection and uncertainty quantification regarding the basis significance (e.g.,
[228, 229]).
Of course, polynomial chaos expansions are not the only response representations
that have been employed in conjunction with CS strategies for efficient
uncertainty propagation. Indicatively, sparse wavelet-based expansions
were employed in [230], and were coupled with importance sampling schemes
for determining the expansion coefficients in an efficient manner. In [231], a
problem-dependent basis in conjunction with a Karhunen-Loève representation
was proposed for enhancing the sparsity of the coefficient vector. In a relatively
different context, CS was applied in [232] for the computationally efficient calcu-
lation of high-dimensional integrals arising in quantum mechanics. In particu-
lar, by interpreting the integrand as a tensor in a suitable tensor product space,
its entries were determined by utilizing an `1-norm minimization in conjunction
with only a few function evaluations. Next, by employing a rank reduction
strategy, the high-dimensional integrand was cast in the form of a sum of
low-dimensional functions to be integrated by a standard Gauss-Hermite quadrature
rule.
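To illustrate why such a decomposition is computationally attractive, consider an integrand that has already been reduced to a sum of one-dimensional functions under a standard Gaussian weight; each summand then requires only a one-dimensional Gauss-Hermite rule, i.e., d·n function evaluations instead of the n^d evaluations of a full tensor grid. The snippet below is a generic sketch of this final quadrature step only, not of the tensor-based algorithm of the cited work.

```python
import numpy as np

d, n = 10, 10                                  # dimensions, 1-D quadrature nodes
t, w = np.polynomial.hermite.hermgauss(n)      # physicists' Gauss-Hermite rule
x = np.sqrt(2.0) * t                           # change of variables for a N(0, 1) weight
w = w / np.sqrt(np.pi)

# Rank-reduced integrand: f(x) = sum_i (i + 1) * x_i**2, x ~ N(0, I_d).
# Each 1-D term is integrated exactly by the n-node rule, so the total cost is
# d * n = 100 evaluations instead of n**d = 10**10 on a full tensor grid.
integral = sum((i + 1) * np.dot(w, x ** 2) for i in range(d))
# Exact value: sum_{i=1}^{d} i * E[x^2] = d*(d+1)/2 = 55
```

The n-node rule integrates polynomials up to degree 2n − 1 exactly, so the quadratic summands above incur no quadrature error at all; for general smooth summands the error decays rapidly with n.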
Further, Kougioumtzoglou and coworkers have recently adapted, extended,
and applied the Wiener path integral methodology, which originates from the-
oretical physics (e.g., [233235]), for the stochastic response analysis and opti-
mization of diverse engineering dynamical systems (e.g., [236242]). Specifically,
it has been shown that the joint response transition PDF of stochastically ex-
cited dynamical systems can be expressed exactly as a functional integral over
all possible paths that the response process may follow [236,237]. Notably, a di-
verse class of problems, such as systems endowed with fractional derivative terms
[238] or characterized by singular diffusion matrices [243], structures exhibiting
various nonlinear behaviors [240], as well as systems subject to non-white,
non-Gaussian and non-stationary excitation processes [244], can be readily addressed
by the versatile Wiener path integral formalism.
Nevertheless, the analytical evaluation of the path integral is, in general,
a highly challenging task, and thus, approximate solution techniques are typ-
ically required. In this regard, the standard approach, which is referred to in
the theoretical physics literature as the semi-classical approximation, relates to
accounting in the path integral only for the path associated with the maximum
probability of occurrence (also known as the most probable path). Therefore,
evaluating the path integral degenerates to obtaining the most probable path
and to determining its probability. However, obtaining analytically in explicit
form the (dependent on boundary conditions) most probable path is generally
impossible. Thus, a variational problem is solved numerically for determining a
specific point of the joint response PDF. Accordingly, for an M-DOF system
corresponding to 2M stochastic dimensions (M displacements and M velocities),
and discretizing the effective PDF domain using N points in each dimension,
the number of required “measurements” (i.e., the number of boundary value
problems to be solved numerically) becomes N^{2M}. Clearly, this demonstrates the
high computational cost related to a brute-force implementation. However, it
has been shown recently that this “data acquisition” process can be coupled
with versatile expansion schemes, compressive sampling techniques and group
sparsity concepts. Specifically, by utilizing (time-variant) sparse representations
of the response PDF (e.g., monomial or wavelet bases) and by exploiting the
group structure of the expansion coefficients (see Section 2.8.3), the response
PDF of relatively high-dimensional nonlinear systems can be determined in a
computationally efficient manner [239,245,246].
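The group-sparsity ingredient of such schemes can be illustrated by a proximal gradient (block soft-thresholding) solver for a group-lasso problem (see Section 2.8.3). In the sketch below, a random matrix merely stands in for the "measurement" matrix relating boundary value problem solutions to expansion coefficients, and the group structure and parameter values are purely illustrative.

```python
import numpy as np

def group_ista(A, y, groups, lam=1e-3, n_iter=10000):
    """Proximal gradient for the group-lasso:
       min_c 0.5*||A c - y||^2 + lam * sum_g ||c_g||_2."""
    c = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(n_iter):
        z = c - step * A.T @ (A @ c - y)       # gradient step
        for g in groups:                       # block soft-thresholding per group
            nz = np.linalg.norm(z[g])
            z[g] *= max(0.0, 1.0 - step * lam / nz) if nz > 0.0 else 0.0
        c = z
    return c

rng = np.random.default_rng(1)
N, P, gsize = 20, 50, 5
groups = [np.arange(i, i + gsize) for i in range(0, P, gsize)]
A = rng.standard_normal((N, P)) / np.sqrt(N)
c_true = np.zeros(P)
c_true[groups[1]] = 0.5 * rng.standard_normal(gsize)   # only two groups active,
c_true[groups[7]] = 0.5 * rng.standard_normal(gsize)   # mimicking grouped expansion terms
y = A @ c_true
c_hat = group_ista(A, y, groups)
```

Compared with a plain `1 penalty, the block threshold zeroes out or retains whole groups of coefficients at once, which matches the setting where expansion terms are known a priori to be active or inactive together.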
4. Concluding remarks
A review of CS theoretical concepts and numerical tools in conjunction with
diverse applications in engineering mechanics has been attempted from a broad
perspective. In this regard, a concerted effort has been made to highlight the
links and interconnections between the CS theory and algorithms presented in
Section 2 and the plethora of applications in engineering mechanics discussed in
Section 3. Hopefully, the extensive list of readily available references can serve
as a compass for navigating the interested researcher through the multitude of
CS concepts and applications, even beyond the scope of this paper.
It is anticipated that the currently rapid progress in data science and ma-
chine learning will facilitate further the development, enhancement and applica-
tion of CS-based techniques in engineering mechanics. Indicatively, the work in
[247] constitutes an interesting effort towards this direction, where deep learn-
ing is employed in conjunction with sparse regression and `1-norm minimization
for simultaneous reduced-order modeling and data-driven identification of the
governing equations of the dynamical system. Such approaches may prove in
the near future indispensable for the materialization and practical implementa-
tion of emerging concepts in data-driven uncertainty quantification and health
monitoring of diverse engineering systems and structures; see, for instance, the
concept of a digital twin mirroring the physical system and tracking its temporal
evolution (e.g., [248]).
I. A. Kougioumtzoglou gratefully acknowledges the support by the CMMI
Division of the National Science Foundation, USA (Award number: 1724930).
[1] J. Fourier, Théorie Analytique de la Chaleur, Par M. Fourier, Chez
Firmin Didot, père et fils, 1822.
[2] J. G. Proakis, D. G. Manolakis, Introduction to Digital Signal Processing,
Prentice Hall Professional Technical Reference, 1988.
[3] Y. C. Eldar, G. Kutyniok, Compressed Sensing: Theory and Applications,
Cambridge University Press, 2012.
[4] S. Foucart, H. Rauhut, A Mathematical Introduction to Compressive
Sensing, Birkhäuser Basel, 2013.
[5] M. Elad, Sparse and Redundant Representations: From Theory to Ap-
plications in Signal and Image Processing, Springer Science & Business
Media, 2010.
[6] I. Rish, G. Grabarnik, Sparse Modeling: Theory, Algorithms, and Appli-
cations, CRC Press, 2014.
[7] C. E. Shannon, Communication in the Presence of Noise, Proceedings of
the IRE 37 (1949) 10–21.
[8] C. Carathéodory, Über Den Variabilitätsbereich Der Koeffizienten von
Potenzreihen, Die Gegebene Werte Nicht Annehmen, Mathematische Annalen
64 (1907) 95–115.
[9] C. Carathéodory, Über Den Variabilitätsbereich Der Fourier’schen Konstanten
von Positiven Harmonischen Funktionen, Rendiconti Del Circolo
Matematico di Palermo (1884-1940) 32 (1911) 193–217.
[10] A. Beurling, Sur Les Intégrales de Fourier Absolument Convergentes et
Leur Application à Une Transformation Fonctionelle, in: Ninth Scandinavian
Mathematical Congress, 345–366, 1938.
[11] R. Dorfman, The Detection of Defective Members of Large Populations,
The Annals of Mathematical Statistics 14 (1943) 436–440.
[12] B. Logan, Properties of High-Pass Functions, Ph.D. thesis, Electrical
Engineering Department, Columbia University, New York, 1965.
[13] H. L. Taylor, S. C. Banks, J. F. McCoy, Deconvolution with the `1 Norm,
Geophysics 44 (1979) 39–52.
[14] S. Levy, P. K. Fullagar, Reconstruction of a Sparse Spike Train from a Por-
tion of Its Spectrum and Application to High-Resolution Deconvolution,
Geophysics 46 (1981) 1235–1243.
[15] C. Walker, T. J. Ulrych, Autoregressive Recovery of the Acoustic
Impedance, Geophysics 48 (1983) 1338–1350.
[16] F. J. Herrmann, M. P. Friedlander, O. Yilmaz, Fighting the Curse of
Dimensionality: Compressive Sensing in Exploration Seismology, IEEE
Signal Processing Magazine 29 (2012) 88–100.
[17] E. J. Candès, J. K. Romberg, T. Tao, Stable Signal Recovery from Incomplete
and Inaccurate Measurements, Communications on Pure and Applied
Mathematics 59 (2006) 1207–1223.
[18] D. L. Donoho, Compressed Sensing, IEEE Transactions on Information
Theory 52 (2006) 1289–1306.
[19] M. Rudelson, R. Vershynin, On Sparse Reconstruction from Fourier and
Gaussian Measurements, Communications on Pure and Applied Mathe-
matics 61 (2008) 1025–1045.
[20] P. Bühlmann, S. Van De Geer, Statistics for High-Dimensional Data:
Methods, Theory and Applications, Springer Science & Business Media, 2011.
[21] R. Tibshirani, M. Wainwright, T. Hastie, Statistical Learning with Spar-
sity: The Lasso and Generalizations, Chapman and Hall/CRC, 2015.
[22] J.-L. Starck, F. Murtagh, J. Fadili, Sparse Image and Signal Processing:
Wavelets and Related Geometric Multiscale Analysis, Cambridge univer-
sity press, 2015.
[23] Q. Zhang, B. Li, Dictionary Learning in Visual Computing, Synthesis
Lectures on Image, Video, & Multimedia Processing 8 (2015) 1–151.
[24] H. Boche, Compressed Sensing and Its Applications: MATHEON Workshop
2013, Birkhäuser, 2015.
[25] H. Boche, Compressed Sensing and Its Applications: Second International
MATHEON Conference 2015, Birkhäuser, 2017.
[26] H. Boche, Compressed Sensing and Its Applications: Third International
MATHEON Conference 2017, Birkhäuser, 2019.
[27] R. G. Baraniuk, E. Candes, M. Elad, Y. Ma, Applications of Sparse Rep-
resentation and Compressive Sensing, Proceedings of the IEEE 98 (2010)
[28] M. F. Duarte, Y. C. Eldar, Structured Compressed Sensing: From Theory
to Applications, IEEE Transactions on Signal Processing 59 (2011) 4053–4085.
[29] D. Craven, B. McGinley, L. Kilmartin, M. Glavin, E. Jones, Compressed
Sensing for Bioelectric Signals: A Review, IEEE Journal of Biomedical
and Health Informatics 19 (2014) 529–540.
[30] L. P. Yaroslavsky, Compression, Restoration, Resampling, ‘Compressive
Sensing’: Fast Transforms in Digital Imaging, Journal of Optics 17 (2015)
[31] D. Thapa, K. Raahemifar, V. Lakshminarayanan, Less Is More: Compres-
sive Sensing in Optics and Image Science, Journal of Modern Optics 62
(2015) 415–429.
[32] Z. Zhang, Y. Xu, J. Yang, X. Li, D. Zhang, A Survey of Sparse Represen-
tation: Algorithms and Applications, IEEE Access 3 (2015) 490–530.
[33] Y. Zhang, L. Y. Zhang, J. Zhou, L. Liu, F. Chen, X. He, A Review of
Compressive Sensing in Information Security Field, IEEE Access 4 (2016)
[34] N. Vaswani, J. Zhan, Recursive Recovery of Sparse Signal Sequences from
Compressive Measurements: A Review, IEEE Transactions on Signal Pro-
cessing 64 (2016) 3523–3549.
[35] G. Kumar, K. Baskaran, R. E. Blessing, M. Lydia, A Comprehensive
Review on the Impact of Compressed Sensing in Wireless Sensor Networks,
International Journal on Smart Sensing & Intelligent Systems 9.
[36] R. E. Carrillo, A. B. Ramirez, G. R. Arce, K. E. Barner, B. M. Sadler, Ro-
bust Compressive Sensing of Sparse Signals: A Review, EURASIP Journal
on Advances in Signal Processing 2016 (2016) 108.
[37] Y. V. Parkale, S. L. Nalbalwar, Application of Compressed Sensing (CS)
for ECG Signal Compression: A Review, in: Proceedings of the Interna-
tional Conference on Data Engineering and Communication Technology,
Springer, 53–65, 2017.
[38] M. Sandilya, S. Nirmala, Compressed Sensing Trends in Magnetic Resonance
Imaging, Engineering Science and Technology, an International Journal
20 (2017) 1342–1352.
[39] S. Cheng, Z. Cai, J. Li, Approximate Sensory Data Collection: A Survey,
Sensors 17 (2017) 564.
[40] M. Rani, S. Dhok, R. Deshmukh, A Systematic Review of Compressive
Sensing: Concepts, Implementations and Applications, IEEE Access 6
(2018) 4875–4894.
[41] H. Djelouat, A. Amira, F. Bensaali, Compressive Sensing-Based IoT Ap-
plications: A Review, Journal of Sensor and Actuator Networks 7 (2018)
[42] Y. Wang, D. Meng, M. Yuan, Sparse Recovery: From Vectors to Tensors,
National Science Review 5 (2017) 756–767.
[43] E. Sejdić, I. Orović, S. Stanković, Compressive Sensing Meets
Time–Frequency: An Overview of Recent Advances in Time–Frequency
Processing of Sparse Signals, Digital Signal Processing 77 (2018) 22–35.
[44] J. Hampton, A. Doostan, Compressive Sampling Methods for Sparse Poly-
nomial Chaos Expansions, in: Handbook of Uncertainty Quantification,
Springer International Publishing, 1–29, 2017.
[45] W.-X. Wang, Y.-C. Lai, C. Grebogi, Data Based Identification and Pre-
diction of Nonlinear and Complex Dynamical Systems, Physics Reports
644 (2016) 1–76.
[46] K. Sayood, Introduction to Data Compression, Newnes, 2012.
[47] C. D. Meyer, Matrix Analysis and Applied Linear Algebra, vol. 71, SIAM, 2000.
[48] S. Friedland, L.-H. Lim, Nuclear Norm of Higher-Order Tensors, Mathe-
matics of Computation 87 (2018) 1255–1281.
[49] E. J. Candès, Compressive Sampling, in: Proceedings of the International
Congress of Mathematicians, vol. 3, Madrid, Spain, 1433–1452, 2006.
[50] D. L. Donoho, M. Elad, Optimally Sparse Representation in General
(Nonorthogonal) Dictionaries via L1 Minimization, Proceedings of the Na-
tional Academy of Sciences 100 (2003) 2197–2202.
[51] B. K. Natarajan, Sparse Approximate Solutions to Linear Systems, SIAM
Journal on Computing 24 (1995) 227–234.
[52] S. G. Mallat, Z. Zhang, Matching Pursuits with Time-Frequency Dictio-
naries, IEEE Transactions on Signal Processing 41 (1993) 3397–3415.
[53] Y. C. Pati, R. Rezaiifar, P. S. Krishnaprasad, Orthogonal Matching Pur-
suit: Recursive Function Approximation with Applications to Wavelet
Decomposition, in: Proceedings of 27th Asilomar Conference on Signals,
Systems and Computers, IEEE, 40–44, 1993.
[54] T. Blumensath, M. E. Davies, Iterative Thresholding for Sparse Approxi-
mations, Journal of Fourier Analysis and Applications 14 (2008) 629–654.
[55] D. Needell, J. A. Tropp, CoSaMP: Iterative Signal Recovery from In-
complete and Inaccurate Samples, Applied and Computational Harmonic
Analysis 26 (2009) 301–321.
[56] G. M. Davis, S. G. Mallat, Z. Zhang, Adaptive Time-Frequency Decom-
positions, Optical Engineering 33 (1994) 2183–2192.
[57] D. L. Donoho, I. Drori, Y. Tsaig, J.-L. Starck, Sparse Solution of Under-
determined Linear Equations by Stagewise Orthogonal Matching Pursuit,
Department of Statistics, Stanford University, 2006.
[58] R. Gribonval, M. Nielsen, Sparse Representations in Unions of Bases,
IEEE Transactions on Information Theory 49 (2003) 3320–3325.
[59] L. Welch, Lower Bounds on the Maximum Cross Correlation of Signals
(Corresp.), IEEE Transactions on Information Theory 20 (1974) 397–399.
[60] E. Candès, T. Tao, Decoding by Linear Programming, IEEE Transactions
on Information Theory 51 (2005) 4203–4215.
[61] R. Baraniuk, M. Davenport, R. DeVore, M. Wakin, A Simple Proof of
the Restricted Isometry Property for Random Matrices, Constructive Ap-
proximation 28 (2008) 253–263.
[62] A. M. Tillmann, M. E. Pfetsch, The Computational Complexity of the
Restricted Isometry Property, the Nullspace Property, and Related Con-
cepts in Compressed Sensing, IEEE Transactions on Information Theory
60 (2014) 1248–1259.
[63] S. S. Chen, D. L. Donoho, M. A. Saunders, Atomic Decomposition by
Basis Pursuit, SIAM Review 43 (2001) 129–159.
[64] R. Tibshirani, Regression Shrinkage and Selection via the Lasso, Journal
of the Royal Statistical Society: Series B (Methodological) 58 (1996) 267–288.
[65] J. Nocedal, S. Wright, Numerical Optimization, Springer Science & Busi-
ness Media, 2006.
[66] P. L. Combettes, J.-C. Pesquet, Proximal Splitting Methods in Signal
Processing, in: Fixed-Point Algorithms for Inverse Problems in Science
and Engineering, Springer, 185–212, 2011.
[67] H. H. Sohrab, Basic Real Analysis, Springer, 2003.
[68] N. Parikh, S. Boyd, Proximal Algorithms, Foundations and Trends
in Optimization 1 (2014) 127–239.
[69] I. Daubechies, M. Defrise, C. De Mol, An Iterative Thresholding Algo-
rithm for Linear Inverse Problems with a Sparsity Constraint, Communi-
cations on Pure and Applied Mathematics 57 (2004) 1413–1457.
[70] Y. E. Nesterov, A Method for Solving the Convex Programming Problem
with Convergence Rate O(1/k²), in: Soviet Mathematics Doklady, vol. 269,
543–547, 1983.
[71] A. Beck, M. Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm
for Linear Inverse Problems, SIAM Journal on Imaging Sciences 2 (2009) 183–202.
[72] B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, Least Angle Regression,
The Annals of Statistics 32 (2004) 407–499.
[73] A. Chambolle, T. Pock, A First-Order Primal-Dual Algorithm for Convex
Problems with Applications to Imaging, Journal of Mathematical Imaging
and Vision 40 (2011) 120–145.
[74] D. L. Donoho, J. Tanner, Precise Undersampling Theorems, Proceedings
of the IEEE 98 (2010) 913–924.
[75] D. Amelunxen, M. Lotz, M. B. McCoy, J. A. Tropp, Living on the Edge:
Phase Transitions in Convex Programs with Random Data, Information
and Inference: A Journal of the IMA 3 (2014) 224–294.
[76] D. L. Donoho, High-Dimensional Centrally Symmetric Polytopes with
Neighborliness Proportional to Dimension, Discrete & Computational Ge-
ometry 35 (2006) 617–652.
[77] D. Donoho, J. Tanner, Observed Universality of Phase Transitions in
High-Dimensional Geometry, with Implications for Modern Data Analysis
and Signal Processing, Philosophical Transactions of the Royal Society A:
Mathematical, Physical and Engineering Sciences 367 (2009) 4273–4293.
[78] D. L. Donoho, J. Tanner, Counting the Faces of Randomly-Projected
Hypercubes and Orthants, with Applications, Discrete & Computational
Geometry 43 (2010) 522–541.
[79] D. L. Donoho, J. Tanner, Exponential Bounds Implying Construction of
Compressed Sensing Matrices, Error-Correcting Codes, and Neighborly
Polytopes by Random Sampling, IEEE Transactions on Information The-
ory 56 (2010) 2002–2016.
[80] T. T. Cai, A. Zhang, Sharp RIP Bound for Sparse Signal and Low-
Rank Matrix Recovery, Applied and Computational Harmonic Analysis
35 (2013) 74–93.
[81] A. Maleki, D. L. Donoho, Optimally Tuned Iterative Reconstruction Algo-
rithms for Compressed Sensing, IEEE Journal of Selected Topics in Signal
Processing 4 (2010) 330–341.
[82] J. D. Blanchard, J. Tanner, Performance Comparisons of Greedy Algo-
rithms in Compressed Sensing, Numerical Linear Algebra with Applica-
tions 22 (2015) 254–282.
[83] C. Hegde, P. Indyk, L. Schmidt, A Fast Approximation Algorithm for
Tree-Sparse Recovery, in: 2014 IEEE International Symposium on Infor-
mation Theory, IEEE, 1842–1846, 2014.
[84] E. Van Den Berg, M. P. Friedlander, Probing the Pareto Frontier for
Basis Pursuit Solutions, SIAM Journal on Scientific Computing 31 (2008)
[85] E. Esser, Y. Lou, J. Xin, A Method for Finding Structured Sparse So-
lutions to Nonnegative Least Squares Problems with Applications, SIAM
Journal on Imaging Sciences 6 (2013) 2010–2046.
[86] P. Yin, Y. Lou, Q. He, J. Xin, Minimization of `1-2 for Compressed Sensing,
SIAM Journal on Scientific Computing 37 (2015) A536–A563.
[87] P. D. Tao, The DC (Difference of Convex Functions) Programming and
DCA Revisited with DC Models of Real World Nonconvex Optimization
Problems, Annals of Operations Research 133 (2005) 23–46.
[88] I. F. Gorodnitsky, B. D. Rao, Sparse Signal Reconstruction from Limited
Data Using FOCUSS: A Re-Weighted Minimum Norm Algorithm, IEEE
Transactions on Signal Processing 45 (1997) 600–616.
[89] M. A. Figueiredo, R. D. Nowak, A Bound Optimization Approach to
Wavelet-Based Image Deconvolution, in: IEEE International Conference
on Image Processing 2005, vol. 2, IEEE, II–782, 2005.
[90] M. A. Figueiredo, J. M. Bioucas-Dias, R. D. Nowak,
Majorization–Minimization Algorithms for Wavelet-Based Image Restoration,
IEEE Transactions on Image Processing 16 (2007) 2980–2991.
[91] Z. Xu, H. Zhang, Y. Wang, X. Chang, Y. Liang, L1/2 Regularization,
Science China Information Sciences 53 (2010) 1159–1169.
[92] R. Chartrand, Exact Reconstruction of Sparse Signals via Nonconvex Min-
imization, IEEE Signal Processing Letters 14 (2007) 707–710.
[93] R. Chartrand, V. Staneva, Restricted Isometry Properties and Nonconvex
Compressive Sensing, Inverse Problems 24 (2008) 035020.
[94] R. Chartrand, W. Yin, Iteratively Reweighted Algorithms for Compressive
Sensing, in: 2008 IEEE International Conference on Acoustics, Speech and
Signal Processing, IEEE, 3869–3872, 2008.
[95] R. Saab, R. Chartrand, O. Yilmaz, Stable Sparse Approximations via
Nonconvex Optimization, in: 2008 IEEE International Conference on
Acoustics, Speech and Signal Processing, IEEE, 3885–3888, 2008.
[96] E. Schlossmacher, An Iterative Technique for Absolute Deviations Curve
Fitting, Journal of the American Statistical Association 68 (1973) 857–
[97] P. W. Holland, R. E. Welsch, Robust Regression Using Iteratively
Reweighted Least-Squares, Communications in Statistics-theory and
Methods 6 (1977) 813–827.
[98] I. Daubechies, R. DeVore, M. Fornasier, C. S. Güntürk, Iteratively
Reweighted Least Squares Minimization for Sparse Recovery, Communications
on Pure and Applied Mathematics 63 (2010) 1–38.
[99] E. J. Candès, M. B. Wakin, S. P. Boyd, Enhancing Sparsity by Reweighted
`1 Minimization, Journal of Fourier Analysis and Applications 14 (2008) 877–905.
[100] S. Ji, Y. Xue, L. Carin, Bayesian Compressive Sensing, IEEE Transactions
on Signal Processing 56 (2008) 2346–2356.
[101] S. D. Babacan, R. Molina, A. K. Katsaggelos, Bayesian Compressive
Sensing Using Laplace Priors, IEEE Transactions on Image Processing
19 (2009) 53–63.
[102] W. M. Bolstad, J. M. Curran, Introduction to Bayesian Statistics, John
Wiley & Sons, 2016.
[103] M. E. Tipping, Sparse Bayesian Learning and the Relevance Vector Machine,
Journal of Machine Learning Research 1 (2001) 211–244.
[104] T. J. Mitchell, J. J. Beauchamp, Bayesian Variable Selection in Linear
Regression, Journal of the American Statistical Association 83 (1988)
[105] H. Ishwaran, J. S. Rao, Spike and Slab Variable Selection: Frequentist
and Bayesian Strategies, The Annals of Statistics 33 (2005) 730–773.
[106] V. Ročková, E. I. George, The Spike-and-Slab Lasso, Journal of the American
Statistical Association 113 (2018) 431–444.
[107] C. M. Carvalho, N. G. Polson, J. G. Scott, The Horseshoe Estimator for
Sparse Signals, Biometrika 97 (2010) 465–480.
[108] A. Bhadra, J. Datta, N. G. Polson, B. Willard, Lasso Meets Horseshoe:
A Survey, Statistical Science 34 (2019) 405–427.
[109] W. R. Gilks, S. Richardson, D. Spiegelhalter, Markov Chain Monte Carlo
in Practice, Chapman and Hall/CRC, 1995.
[110] D. M. Blei, A. Kucukelbir, J. D. McAuliffe, Variational Inference: A Re-
view for Statisticians, Journal of the American Statistical Association 112
(2017) 859–877.
[111] A. C. Faul, M. E. Tipping, Analysis of Sparse Bayesian Learning, in:
Advances in Neural Information Processing Systems, 383–389, 2002.
[112] M. E. Tipping, A. C. Faul, Fast Marginal Likelihood Maximisation for
Sparse Bayesian Models., in: AISTATS, 2003.
[113] F. Bach, R. Jenatton, J. Mairal, G. Obozinski, Structured Sparsity
through Convex Optimization, Statistical Science 27 (2012) 450–468.
[114] M. Yuan, Y. Lin, Model Selection and Estimation in Regression with
Grouped Variables, Journal of the Royal Statistical Society: Series B (Sta-
tistical Methodology) 68 (2006) 49–67.
[115] C. Chen, J. Huang, Compressive Sensing MRI with Wavelet Tree Sparsity,
in: Advances in Neural Information Processing Systems, 1115–1123, 2012.
[116] J. Huang, T. Zhang, D. Metaxas, Learning with Structured Sparsity, Jour-
nal of Machine Learning Research 12 (2011) 3371–3412.
[117] R. G. Baraniuk, V. Cevher, M. F. Duarte, C. Hegde, Model-Based Compressive
Sensing, IEEE Transactions on Information Theory 56 (2010) 1982–2001.
[118] H. Zou, T. Hastie, Regularization and Variable Selection via the Elastic
Net, Journal of the Royal Statistical Society: Series B (Statistical Method-
ology) 67 (2005) 301–320.
[119] J. Huang, T. Zhang, The Benefit of Group Sparsity, The Annals of Statis-
tics 38 (2010) 1978–2004.
[120] F. R. Bach, Consistency of the Group Lasso and Multiple Kernel Learning,
Journal of Machine Learning Research 9 (2008) 1179–1225.
[121] R. Jenatton, J.-Y. Audibert, F. Bach, Structured Variable Selection with
Sparsity-Inducing Norms, Journal of Machine Learning Research 12 (2011)
[122] G. Obozinski, L. Jacob, J.-P. Vert, Group Lasso with Overlaps: The
Latent Group Lasso Approach, arXiv preprint arXiv:1110.0413.
[123] J. Mairal, R. Jenatton, G. Obozinski, F. Bach, Convex and Network Flow
Optimization for Structured Sparsity, Journal of Machine Learning Re-
search 12 (2011) 2681–2720.
[124] J. Mairal, F. Bach, J. Ponce, G. Sapiro, Online Learning for Matrix Fac-
torization and Sparse Coding, Journal of Machine Learning Research 11
(2010) 19–60.
[125] B. A. Olshausen, D. J. Field, Sparse Coding with an Overcomplete Basis
Set: A Strategy Employed by V1?, Vision Research 37 (1997) 3311–3325.
[126] B. A. Olshausen, D. J. Field, Natural Image Statistics and Efficient Cod-
ing, Network: Computation in Neural Systems 7 (1996) 333–339.
[127] M. S. Lewicki, B. A. Olshausen, Probabilistic Framework for the Adap-
tation and Comparison of Image Codes, Journal of the Optical Society of
America A 16 (1999) 1587–1601.
[128] M. S. Lewicki, T. J. Sejnowski, Learning Overcomplete Representations,
Neural Computation 12 (2000) 337–365.
[129] K. Engan, S. O. Aase, J. H. Husoy, Method of Optimal Directions for
Frame Design, in: 1999 IEEE International Conference on Acoustics,
Speech, and Signal Processing. Proceedings. ICASSP99, vol. 5, IEEE,
2443–2446, 1999.
[130] K. Engan, B. D. Rao, K. Kreutz-Delgado, Frame Design Using FOCUSS
with Method of Optimal Directions (MOD), in: Proceedings of the Nor-
wegian Signal Processing Symposium, vol. 99, 65–69, 1999.
[131] K. Engan, S. O. Aase, J. H. Husøy, Multi-Frame Compression: Theory
and Design, Signal Processing 80 (2000) 2121–2140.
[132] K. Kreutz-Delgado, J. F. Murray, B. D. Rao, K. Engan, T.-W. Lee, T. J.
Sejnowski, Dictionary Learning Algorithms for Sparse Representation,
Neural Computation 15 (2003) 349–396.
[133] J. F. Murray, K. Kreutz-Delgado, An Improved FOCUSS-Based Learn-
ing Algorithm for Solving Sparse Linear Inverse Problems, in: Confer-
ence Record of Thirty-Fifth Asilomar Conference on Signals, Systems and
Computers, vol. 1, IEEE, 347–351, 2001.
[134] K. Kreutz-Delgado, B. D. Rao, FOCUSS-Based Dictionary Learning Al-
gorithms, in: Wavelet Applications in Signal and Image Processing VIII,
vol. 4119, International Society for Optics and Photonics, 459–474, 2000.
[135] M. Aharon, M. Elad, A. Bruckstein, K-SVD: An Algorithm for Designing
Overcomplete Dictionaries for Sparse Representation, IEEE Transactions
on Signal Processing 54 (2006) 4311–4322.
[136] K. Gregor, Y. LeCun, Learning Fast Approximations of Sparse Coding,
in: Proceedings of the 27th International Conference on International
Conference on Machine Learning, Omnipress, 399–406, 2010.
[137] A. Fawzi, M. Davies, P. Frossard, Dictionary Learning for Fast Classi-
fication Based on Soft-Thresholding, International Journal of Computer
Vision 114 (2015) 306–321.
[138] I. Tosic, P. Frossard, Dictionary Learning: What Is the Right Represen-
tation for My Signal?, IEEE Signal Processing Magazine 28 (2011) 27–38.
[139] A. Gersho, R. M. Gray, Vector Quantization and Signal Compression, vol.
159, Springer Science & Business Media, 2012.
[140] J. Mairal, F. Bach, J. Ponce, G. Sapiro, Online Dictionary Learning for
Sparse Coding, in: Proceedings of the 26th Annual International Confer-
ence on Machine Learning, ACM, 689–696, 2009.
[141] D. Bertsekas, Nonlinear Programming, Athena Scientific, 1999.
[142] Y. Bao, J. L. Beck, H. Li, Compressive Sampling for Accelerometer Signals
in Structural Health Monitoring, Structural Health Monitoring 10 (2011)
[143] Y. Bao, H. Li, X. Sun, Y. Yu, J. Ou, Compressive Sampling-Based Data
Loss Recovery for Wireless Sensor Networks Used in Civil Structural
Health Monitoring, Structural Health Monitoring 12 (2013) 78–95.
[144] Y. Bao, Y. Yu, H. Li, X. Mao, W. Jiao, Z. Zou, J. Ou, Compressive
Sensing-based Lost Data Recovery of Fast-moving Wireless Sensing for
Structural Health Monitoring, Structural Control and Health Monitoring
22 (2015) 433–448.
[145] Y. Huang, J. L. Beck, S. Wu, H. Li, Robust Bayesian Compressive Sensing
for Signals in Structural Health Monitoring, Computer-Aided Civil and
Infrastructure Engineering 29 (2014) 160–179.
[146] Y. Huang, J. L. Beck, S. Wu, H. Li, Bayesian Compressive Sensing for Ap-
proximately Sparse Signals and Application to Structural Health Monitor-
ing Signals for Data Loss Recovery, Probabilistic Engineering Mechanics
46 (2016) 62–79.
[147] Y. Yang, S. Nagarajaiah, Y.-Q. Ni, Data Compression of Very Large-scale
Structural Seismic and Typhoon Responses by Low-rank Representation
with Matrix Reshape, Structural Control and Health Monitoring 22 (2015)
[148] Y. Yang, S. Nagarajaiah, Harnessing Data Structure for Recovery of Ran-
domly Missing Structural Vibration Responses Time History: Sparse Rep-
resentation versus Low-Rank Structure, Mechanical Systems and Signal
Processing 74 (2016) 165–182.
[149] Y. Yang, S. Nagarajaiah, Robust Data Transmission and Recovery of Im-
ages by Compressed Sensing for Structural Health Diagnosis, Structural
Control and Health Monitoring 24 (2017) e1856.
[150] Y. Bao, Z. Shi, X. Wang, H. Li, Compressive Sensing of Wireless Sensors
Based on Group Sparse Optimization for Structural Health Monitoring,
Structural Health Monitoring 17 (2018) 823–836.
[151] Z. Pang, M. Yuan, M. B. Wakin, A Random Demodulation Architecture
for Sub-Sampling Acoustic Emission Signals in Structural Health Moni-
toring, Journal of Sound and Vibration 431 (2018) 390–404.
[152] S. O’Connor, J. Lynch, A. Gilbert, Compressed Sensing Embedded in
an Operational Wireless Sensor Network to Achieve Energy Efficiency in
Long-Term Monitoring Applications, Smart Materials and Structures 23
(2014) 085014.
[153] Z. Zou, Y. Bao, H. Li, B. F. Spencer, J. Ou, Embedding Compressive
Sensing-Based Data Loss Recovery Algorithm into Wireless Smart Sensors
for Structural Health Monitoring, IEEE Sensors Journal 15 (2014) 797–
[154] R. Klis, E. N. Chatzi, Data Recovery via Hybrid Sensor Networks for
Vibration Monitoring of Civil Structures, International Journal of Sus-
tainable Materials and Structural Systems 2 (2015) 161–184.
[155] R. Klis, E. N. Chatzi, Vibration Monitoring via Spectro-Temporal Com-
pressive Sensing for Wireless Sensor Networks, Structure and Infrastruc-
ture Engineering 13 (2017) 195–209.
[156] D. Mascareñas, A. Cattaneo, J. Theiler, C. Farrar, Compressed Sensing
Techniques for Detecting Damage in Structures, Structural Health Moni-
toring 12 (2013) 325–338.
[157] Y. Wang, H. Hao, Damage Identification Scheme Based on Compressive
Sensing, Journal of Computing in Civil Engineering 29 (2013) 04014037.
[158] Y. Bao, H. Li, Z. Chen, F. Zhang, A. Guo, Sparse ℓ1 Optimization-based
Identification Approach for the Distribution of Moving Heavy Vehicle
Loads on Cable-stayed Bridges, Structural Control and Health Monitoring
23 (2016) 144–155.
[159] Y. Yang, S. Nagarajaiah, Structural Damage Identification via a Combina-
tion of Blind Feature Extraction and Sparse Representation Classification,
Mechanical Systems and Signal Processing 45 (2014) 1–23.
[160] Y. Yang, S. Nagarajaiah, Output-Only Modal Identification by Com-
pressed Sensing: Non-Uniform Low-Rate Random Sampling, Mechanical
Systems and Signal Processing 56 (2015) 15–34.
[161] Y. Yang, C. Dorn, T. Mancini, Z. Talken, S. Nagarajaiah, G. Kenyon,
C. Farrar, D. Mascareñas, Blind Identification of Full-Field Vibra-
tion Modes of Output-Only Structures from Uniformly-Sampled, Possi-
bly Temporally-Aliased (Sub-Nyquist), Video Measurements, Journal of
Sound and Vibration 390 (2017) 232–256.
[162] J. Y. Park, M. B. Wakin, A. C. Gilbert, Modal Analysis with Compressive
Measurements, IEEE Transactions on Signal Processing 62 (2014) 1655–
[163] S. Li, D. Yang, G. Tang, M. B. Wakin, Atomic Norm Minimization for
Modal Analysis from Random and Compressed Samples, IEEE Transac-
tions on Signal Processing 66 (2018) 1817–1831.
[164] E. M. Hernandez, Identification of Isolated Structural Damage from In-
complete Spectrum Information Using L1-Norm Minimization, Mechani-
cal Systems and Signal Processing 46 (2014) 59–69.
[165] E. M. Hernandez, Identification of Localized Structural Damage from
Highly Incomplete Modal Information: Theory and Experiments, Jour-
nal of Engineering Mechanics 142 (2015) 04015075.
[166] X.-Q. Zhou, Y. Xia, S. Weng, L1 Regularization Approach to Structural
Damage Detection Using Frequency Data, Structural Health Monitoring
14 (2015) 571–582.
[167] C. B. Smith, E. M. Hernandez, Detection of Spatially Sparse Damage
Using Impulse Response Sensitivity and LASSO Regularization, Inverse
Problems in Science and Engineering 27 (2019) 1–16.
[168] C. Zhang, Y. Xu, Comparative Studies on Damage Identification with
Tikhonov Regularization and Sparse Regularization, Structural Control
and Health Monitoring 23 (2016) 560–579.
[169] C. Zhang, J.-Z. Huang, G.-Q. Song, L. Chen, Structural Damage Identifi-
cation by Extended Kalman Filter with ℓ1-norm Regularization Scheme,
Structural Control and Health Monitoring 24 (2017) e1999.
[170] R. Hou, Y. Xia, X. Zhou, Structural Damage Detection Based on L1
Regularization Using Natural Frequencies and Mode Shapes, Structural
Control and Health Monitoring 25 (2018) e2107.
[171] R. Hou, Y. Xia, Y. Bao, X. Zhou, Selection of Regularization Parameter
for L1-Regularized Damage Detection, Journal of Sound and Vibration
423 (2018) 141–160.
[172] X. Zhou, R. Hou, Y. Wu, Structural Damage Detection Based on Iter-
atively Reweighted ℓ1 Regularization Algorithm, Advances in Structural
Engineering 22 (2019) 1479–1487.
[173] L. Wang, Z.-R. Lu, Sensitivity-Free Damage Identification Based on In-
complete Modal Data, Sparse Regularization and Alternating Minimiza-
tion Approach, Mechanical Systems and Signal Processing 120 (2019) 43–
[174] M. Jayawardhana, X. Zhu, R. Liyanapathirana, U. Gunawardana, Com-
pressive Sensing for Efficient Health Monitoring and Effective Damage
Detection of Structures, Mechanical Systems and Signal Processing 84
(2017) 414–430.
[175] J. Guo, L. Wang, I. Takewaki, Modal-based Structural Damage Identifica-
tion by Minimum Constitutive Relation Error and Sparse Regularization,
Structural Control and Health Monitoring 25 (2018) e2255.
[176] I. A. Kougioumtzoglou, K. R. M. dos Santos, L. Comerford, Incomplete
Data Based Parameter Identification of Nonlinear and Time-Variant Os-
cillators with Fractional Derivative Elements, Mechanical Systems and
Signal Processing 94 (2017) 279–296.
[177] K. R. M. dos Santos, O. Brudastova, I. A. Kougioumtzoglou, Spectral
Identification of Nonlinear Multi-Degree-of-Freedom Structural Systems
with Fractional Derivative Terms Based on Incomplete Non-Stationary
Data, Structural Safety 86 (2020) 101975.
[178] J. S. Bendat, Nonlinear Systems Techniques and Applications, John Wiley
& Sons, 1998.
[179] K. Gkoktsi, A. Giaralis, Assessment of Sub-Nyquist Deterministic and
Random Data Sampling Techniques for Operational Modal Analysis,
Structural Health Monitoring 16 (2017) 630–646.
[180] K. Gkoktsi, A. Giaralis, A Multi-Sensor Sub-Nyquist Power Spectrum
Blind Sampling Approach for Low-Power Wireless Sensors in Operational
Modal Analysis Applications, Mechanical Systems and Signal Processing
116 (2019) 879–899.
[181] Z. Lai, S. Nagarajaiah, Sparse Structural System Identification Method
for Nonlinear Dynamic Systems with Hysteresis/Inelastic Behavior, Me-
chanical Systems and Signal Processing 117 (2019) 813–842.
[182] Z. Lai, S. Nagarajaiah, Semi-supervised Structural Linear/Nonlinear
Damage Detection and Characterization Using Sparse Identification,
Structural Control and Health Monitoring 26 (2019) e2306.
[183] D. Sen, A. Aghazadeh, A. Mousavi, S. Nagarajaiah, R. Baraniuk, Sparsity-
Based Approaches for Damage Detection in Plates, Mechanical Systems
and Signal Processing 117 (2019) 333–346.
[184] A. Rezayat, V. Nassiri, B. De Pauw, J. Ertveldt, S. Vanlanduit, P. Guil-
laume, Identification of Dynamic Forces Using Group-Sparsity in Fre-
quency Domain, Mechanical Systems and Signal Processing 70 (2016)
[185] Q. Li, Q. Lu, A Hierarchical Bayesian Method for Vibration-Based Time
Domain Force Reconstruction Problems, Journal of Sound and Vibration
421 (2018) 190–204.
[186] B. Qiao, Z. Mao, J. Liu, Z. Zhao, X. Chen, Group Sparse Regularization
for Impact Force Identification in Time Domain, Journal of Sound and
Vibration 445 (2019) 44–63.
[187] L. Comerford, I. A. Kougioumtzoglou, M. Beer, Compressive Sensing
Based Stochastic Process Power Spectrum Estimation Subject to Miss-
ing Data, Probabilistic Engineering Mechanics 44 (2016) 66–76.
[188] L. A. Comerford, M. Beer, I. A. Kougioumtzoglou, Compressive Sensing
Based Power Spectrum Estimation from Incomplete Records by Utilizing
an Adaptive Basis, in: 2014 IEEE Symposium on Computational Intelli-
gence for Engineering Solutions (CIES), IEEE, 117–124, 2014.
[189] L. Comerford, H. Jensen, F. Mayorga, M. Beer, I. Kougioumtzoglou,