Sparsity Based Methods for Overparametrized
Variational Problems
Raja Giryes and Michael Elad and Alfred Bruckstein
Department of Computer Science, The Technion - Israel Institute of Technology
Haifa, 32000, Israel
{raja,elad}@cs.technion.ac.il
Abstract—Two complementary approaches have been extensively used in signal and image processing, leading to novel results: the sparse representation methodology and the variational strategy. Recently, a new sparsity based model has been proposed, the cosparse analysis framework, which establishes a very interesting connection between the two, showing how the traditional total variation minimization problem can be viewed as a sparse approximation problem. Based on this work we introduce a sparsity based framework for solving overparametrized variational problems. The latter have been used to improve the estimation of optical flow and also for general denoising of signals and images. However, the recovery of the space varying parameters involved was not adequately addressed by the traditional variational methods. We first demonstrate the efficiency of the new framework for one dimensional signals in recovering piecewise linear and polynomial functions. We also present performance guarantees for the recovery of piecewise polynomial functions from a small number of measurements. Then, for images, we illustrate how the new technique can be used for denoising, geometrical inpainting and segmentation.
I. INTRODUCTION
Any successful signal and image processing technique relies
on the fact that the given signals or images of interest belong
to a class described by a certain a priori known model.
Given the model, the signal is processed by estimating the
“correct” parameters of the model. For example, in the sparsity
framework the signals are assumed to belong to a union of
low dimensional subspaces [1], [2], [3], [4]. In the variational
strategy the signal is assumed to have certain smoothness
properties or some model on how it is allowed to vary [5].
Though both sparsity based and variational based approaches are widely used for signal processing and computer vision, they are often viewed as two different methods with little in common between them. One of the well known variational techniques is the use of total variation regularization for denoising and inverse problems. It can be formulated as [5]
$$\min_{\tilde f} \left\| g - M\tilde f \right\|_2^2 + \lambda \left\| \nabla \tilde f \right\|_1, \qquad (1)$$
where $g = Mf + e \in \mathbb{R}^m$ is a noisy signal, $M \in \mathbb{R}^{m\times d}$ is a measurement matrix, $e \in \mathbb{R}^m$ is an additive noise, $f \in \mathbb{R}^d$ is the original unknown signal, $\lambda$ is a regularization parameter and $\nabla f$ is the vector of gradients of $f$.
The anisotropic version of (1) is
$$\min_{\tilde f} \left\| g - M\tilde f \right\|_2^2 + \lambda \left\| \Omega_{\mathrm{DIF}}\tilde f \right\|_1, \qquad (2)$$
where ΩDIF is the finite difference operator that returns the
derivatives of the signal. For images it returns the horizontal
and vertical derivatives. Note that for 1D signals there is
no difference between (1) and (2) as the gradient equals the
derivative. However, in 2D (1) considers the sum of gradients
(square root of the squared sum of directional derivatives) and
(2) considers the absolute sum of directional derivatives.
Recently, a very interesting connection has been drawn between the total variation minimization problem and the sparsity model. It has been shown that (2) can be viewed as an $\ell_1$-relaxation technique for approximating signals that are sparse in their derivatives domain, i.e., after applying the operator $\Omega_{\mathrm{DIF}}$ on them [6], [7], [8]. Such signals are said to be cosparse under the operator $\Omega_{\mathrm{DIF}}$ in the analysis sparsity model [7].
Notice that TV regularization is only one example of the variational framework. One recent variational approach is the overparametrized one, which represents the signal as a combination of known functions of the space variables weighted by space-variant coefficients, the parameters of the model.
For example, a linear overparametrization for one dimensional signals represents the $i$-th element of $f$ as $f[i] = a[i] + b[i]\cdot i$, where $a$ and $b$ are the coefficients of the vectors $[1 \ldots 1]^*$ and $[1 \ldots d]^*$. For images this parametrization is $f[i,j] = a[i,j] + b_1[i,j]\cdot i + b_2[i,j]\cdot j$. Such parametrizations have been shown to improve the denoising performance of (1) [9] and to provide the best recovery results for estimating optical flow [10], [11], [12].
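To make the linear overparametrization concrete, the following short NumPy sketch (our own illustration, not code from the paper) builds $f = [I, X][a^*, b^*]^*$ for a signal with a single change point and checks that the coefficient vectors are jointly sparse under the finite difference operator:

```python
import numpy as np

# A minimal sketch of the 1D linear overparametrization f = [I, X][a; b],
# with X = diag(1, ..., d); the signal length and jump location are arbitrary.
d = 300
I = np.eye(d)
X = np.diag(np.arange(1, d + 1, dtype=float))

# Piecewise constant coefficient vectors with a shared change point at i = 150,
# so that f = a + b * (1, ..., d) is piecewise linear.
a = np.concatenate([np.full(150, 2.0), np.full(150, -1.0)])
b = np.concatenate([np.full(150, 0.01), np.full(150, 0.03)])
f = np.hstack([I, X]) @ np.concatenate([a, b])

# The finite difference operator reveals the joint sparsity: a and b jump together.
Omega_dif = np.diff(np.eye(d), axis=0)        # (d-1) x d first-difference matrix
print(np.nonzero(Omega_dif @ a)[0], np.nonzero(Omega_dif @ b)[0])   # both: [149]
```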
A. Our Contribution
The true force behind overparametrization is that while it uses more variables than needed for representing the signal, these variables are often naturally suited to describe the structure of the signal. For example, if a signal is piecewise linear then we may impose a constraint on the overparametrization coefficients $a$ and $b$ to be piecewise constant.
Note that piecewise constant signals are sparse under the $\Omega_{\mathrm{DIF}}$ operator. We have seen that for a single coefficient parameter we can use the tools developed in the analysis sparsity model. However, in our case $a$ and $b$ are jointly sparse, i.e., their change points are collocated in time.
Constraints about structure in the sparsity pattern in a
representation have already been analyzed in the literature and
are commonly referred to as joint sparsity models if we look
at separate vectors or as block sparsity if we look at a single
vector.
In this work we introduce a general sparsity based frame-
work for solving overparametrized variational problems. We
show the efficiency of the new framework for one dimensional
signals in recovering a piecewise linear signal. Then we move
to images and demonstrate how the new technique can be used
for denoising, geometrical inpainting and segmentation.
B. Organization
The organization of this work is as follows. In Section II we briefly describe the synthesis and analysis sparsity models and the GAPN algorithm. In Section III we present recovery guarantees for piecewise polynomial functions. In Section IV we present our new framework for solving overparametrized variational problems using sparsity. In Section V we present some experiments for linear overparametrization of images and one dimensional signals, and show how the proposed method can be used for image denoising, segmentation and inpainting. Section VI concludes our work and proposes future directions of research.
II. THE SYNTHESIS AND ANALYSIS SPARSITY MODELS
Considering again the linear measurements
$$g = Mf + e, \qquad (3)$$
note that without prior knowledge on $f$ we cannot recover it from $g$ if $m < d$ or $e \neq 0$. However, if we know a priori that $f$ resides in a union of low dimensional subspaces, which do not intersect with the null space of $M$, then we can estimate $f$ stably by selecting the signal that belongs to this union of subspaces and is closest to $g$.
This is exactly the idea behind the sparsity approach [1], [2], [3], [4]. In the classical model, the signal $f$ is assumed to have a sparse representation $\alpha$ under a given dictionary $D$, i.e., $f = D\alpha$, $\|\alpha\|_0 \le k$, where $\|\cdot\|_0$ is the $\ell_0$ pseudo-norm that counts the number of non-zero entries in a vector and $k$ is the sparsity of the signal. With this model we can recover $f$ by solving
$$\min_{\alpha} \|g - MD\alpha\|_2^2 \quad \text{s.t.} \quad \|\alpha\|_0 \le k, \qquad (4)$$
if $k$ is foreknown, or
$$\min_{\alpha} \|\alpha\|_0 \quad \text{s.t.} \quad \|g - MD\alpha\|_2^2 \le \|e\|_2^2, \qquad (5)$$
if we have information about the energy of the noise $e$.
As these minimization problems are NP-hard [13], many techniques have been proposed to approximate their solution, with recovery guarantees that depend on the properties of the matrices $M$ and $D$. These include $\ell_1$-relaxation [14], [15], [16], known also as LASSO [17], orthogonal matching pursuit (OMP) [18], [19], compressive sampling matching pursuit (CoSaMP) [20], subspace pursuit (SP) [21], iterative hard thresholding (IHT) [22] and hard thresholding pursuit (HTP) [23].
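As an illustration of how such greedy pursuit techniques operate, here is a minimal textbook-style orthogonal matching pursuit sketch in NumPy (our own generic implementation, not the exact variants cited above):

```python
import numpy as np

def omp(A, g, k):
    """Greedy OMP sketch: select k atoms of A (e.g., A = M @ D) that explain g."""
    residual = g.copy()
    support = []
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares fit on the selected atoms and residual update.
        coef, *_ = np.linalg.lstsq(A[:, support], g, rcond=None)
        residual = g - A[:, support] @ coef
    alpha = np.zeros(A.shape[1])
    alpha[support] = coef
    return alpha
```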
Note that each low dimensional subspace in the standard sparsity model, known also as the synthesis model, is spanned by a collection of $k$ columns from $D$. Another framework for modeling a union of low dimensional subspaces is the analysis one [7], [14]. This model considers the behavior of $\Omega f$, the signal after applying a given operator $\Omega$ on it, and assumes that it is sparse. Note that here, the zeros are those that characterize the subspace, as each zero in $\Omega f$ corresponds to a row in $\Omega$ to which $f$ is orthogonal. We say that $f$ is cosparse under $\Omega$ with a cosupport $\Lambda$ if $\Omega f$ is sparse and $\Omega_\Lambda f = 0$, where $\Omega_\Lambda$ is a sub-matrix of $\Omega$ with the rows corresponding to the set $\Lambda$.
The analysis variants of (4) and (5) for estimating $f$ are
$$\min_{\tilde f} \|g - M\tilde f\|_2^2 \quad \text{s.t.} \quad \|\Omega \tilde f\|_0 \le k, \qquad (6)$$
where $k$ is the number of non-zeros in $\Omega f$, and
$$\min_{\tilde f} \|\Omega \tilde f\|_0 \quad \text{s.t.} \quad \|g - M\tilde f\|_2^2 \le \|e\|_2^2. \qquad (7)$$
As in the synthesis case, the minimization problems in (6) and (7) are NP-hard. Therefore, approximation techniques have been proposed, including Greedy Analysis Pursuit (GAP) [7], GAP noise (GAPN) [24], analysis CoSaMP (ACoSaMP), analysis SP (ASP), analysis IHT (AIHT) and analysis HTP (AHTP) [8].
III. RECOVERY GUARANTEES
Having the sparsity models defined, we revisit the overparametrized variational problem. If we know that our signal $f$ is piecewise linear then it is clear that the coefficient parameters should be piecewise constant with the same change locations when linear overparametrization is used. Let $I$ be the identity matrix, $X \triangleq \mathrm{diag}(1,\ldots,d)$, where $\mathrm{diag}(1,\ldots,d)$ is a diagonal matrix with $1,\ldots,d$ on its diagonal, and let $k$ be the number of change points in the signal. Then we can write $f = [I, X][a^*, b^*]^*$, noticing that $a$ and $b$ are jointly sparse under $\Omega_{\mathrm{DIF}}$, i.e., $\Omega_{\mathrm{DIF}} a$ and $\Omega_{\mathrm{DIF}} b$ have the same non-zero locations. With this observation we can extend the analysis minimization problem (6) to support the structured sparsity in the vector $[a^*, b^*]^*$. This leads us to the following minimization problem:
$$\min_{a, b} \left\| g - M[I, X]\begin{bmatrix} a\\ b\end{bmatrix} \right\|_2^2 \quad \text{s.t.} \quad \bigl\| |\Omega_{\mathrm{DIF}} a| + |\Omega_{\mathrm{DIF}} b| \bigr\|_0 \le k, \qquad (8)$$
where $|\Omega_{\mathrm{DIF}} a|$ denotes applying an element-wise absolute value to the entries of $\Omega_{\mathrm{DIF}} a$.
Note that we can have a similar formulation for this problem also in the synthesis framework, using the Heaviside dictionary
$$D_{HS} = \begin{bmatrix} 1 & 0 & \cdots & \cdots & 0\\ 1 & 1 & 0 & \cdots & 0\\ \vdots & \vdots & \ddots & \ddots & \vdots\\ 1 & 1 & \cdots & 1 & 0\\ 1 & 1 & \cdots & \cdots & 1 \end{bmatrix}, \qquad (9)$$
which contains all the Heaviside step functions. We use the known observation that every one dimensional signal with $k$ change points can be sparsely represented using $k+1$ atoms from $D_{HS}$ ($k$ columns for representing the change points plus one for the DC).
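As a small numerical illustration of this observation (our own example, under the notation above), the Heaviside dictionary and the sparse synthesis representation of a piecewise constant signal can be generated as follows:

```python
import numpy as np

d = 6
# Heaviside dictionary (9): lower-triangular ones, each column a step function.
D_HS = np.tril(np.ones((d, d)))

# A piecewise constant signal with one change point is 2-sparse in D_HS:
# one atom for the DC level and one for the jump.
alpha = np.zeros(d)
alpha[0], alpha[3] = 1.0, 2.0       # DC level of 1 and a jump of +2 at index 3
a = D_HS @ alpha                    # -> [1, 1, 1, 3, 3, 3]
print(a)
```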
Algorithm 1 Signal Space CoSaMP (SSCoSaMP) for Piecewise Polynomial Functions
Input: $k$, $M$, $g$, $\gamma$, where $g = Mf + e$, $f = [I, X_1, \ldots, X_n][a^*, b_1^*, \ldots, b_n^*]^*$ is a piecewise polynomial function of order $n$, $k = \bigl\| |\Omega_{\mathrm{DIF}} a| + \sum_{i=1}^n |\Omega_{\mathrm{DIF}} b_i| \bigr\|_0$ is the number of jumps in the representation coefficients of $f$, $e$ is an additive noise and $\gamma$ is a parameter of the algorithm. $S_n(\cdot, k)$ is a procedure that approximates a given signal by a piecewise polynomial function of order $n$ with $k$ jumps.
Output: $\hat f$: a piecewise polynomial with $k+1$ segments that approximates $f$.
• Initialize the jumps' locations $T^0 = \emptyset$, the residual $g_r^0 = g$ and set $t = 0$.
while halting criterion is not satisfied do
• $t = t + 1$.
• Find the parametrization $a_r, b_{r,1}, \ldots, b_{r,n}$ of the residual's polynomial approximation by calculating $S_n(M^* g_r^{t-1}, \gamma k)$.
• Find new temporal jump locations: $T_\Delta =$ the support of $|\Omega_{\mathrm{DIF}} a_r| + \sum_{i=1}^n |\Omega_{\mathrm{DIF}} b_{r,i}|$.
• Update the jump locations' indices: $\tilde T^t = T^{t-1} \cup T_\Delta$.
• Compute temporal parameters: $[a_p, b_{p,1}, \ldots, b_{p,n}] = \operatorname*{argmin}_{\tilde a, \tilde b_1, \ldots, \tilde b_n} \bigl\| g - M[I, X_1, \ldots, X_n][\tilde a^*, \tilde b_1^*, \ldots, \tilde b_n^*]^* \bigr\|_2^2$ s.t. $(\Omega_{\mathrm{DIF}} \tilde a)_{(\tilde T^t)^C} = 0$, $(\Omega_{\mathrm{DIF}} \tilde b_1)_{(\tilde T^t)^C} = 0$, \ldots, $(\Omega_{\mathrm{DIF}} \tilde b_n)_{(\tilde T^t)^C} = 0$.
• Calculate a polynomial approximation of order $n$: $f^t = S_n([I, X_1, \ldots, X_n][a_p^*, b_{p,1}^*, \ldots, b_{p,n}^*]^*, k)$.
• Find new jump locations: $T^t =$ the locations of the jumps in the parametrization of $f^t$.
• Update the residual: $g_r^t = g - M f^t$.
end while
• Form final solution $\hat f = f^t$.
Therefore, one may recover the coefficient parameters $a$ and $b$ through their sparse representations by solving
$$\min_{\alpha, \beta} \left\| g - M[I, X]\begin{bmatrix} D_{HS} & 0\\ 0 & D_{HS}\end{bmatrix}\begin{bmatrix}\alpha\\ \beta\end{bmatrix} \right\|_2^2 \quad \text{s.t.} \quad \bigl\| |\alpha| + |\beta| \bigr\|_0 \le k, \qquad (10)$$
and then setting $a = D_{HS}\alpha$ and $b = D_{HS}\beta$. This minimization problem can be approximated using the group-LASSO estimator [25], the mixed-$\ell_2/\ell_1$ relaxation (an extension of the $\ell_1$ relaxation) [26], [27], the Block OMP (BOMP) algorithm [28] or the extensions of CoSaMP and IHT to structured sparsity [29]. To the best of our knowledge, there are no such extensions for the analysis framework.
The problem with the existing synthesis techniques is
twofold: (i) No recovery guarantees exist for them with the
dictionary DHS ; (ii) It is hard to generalize the model in (4)
to other types of signals. In the next section we present an
algorithm that deals with the second point. Now, we turn to
present an algorithm that approximates both (4) and (6) and
has theoretical reconstruction performance guarantees for RIP
matrices M.
The reason that no theoretical guarantees are provided for the $D_{HS}$ dictionary is the high correlation between its columns. This creates high ambiguity between the different columns in $D_{HS}$, causing the classical synthesis techniques to fail in recovering the representations $\alpha$ and $\beta$. This problem has been addressed in several contributions that treat the signal directly and not its representation [30], [31], [32], [33], [34], [35].
In this work we recruit the signal space CoSaMP (SSCoSaMP) strategy [30], [34].¹ This technique assumes the existence of a projection that, given a signal, finds the closest signal (in the $\ell_2$-norm sense) that belongs to the model,² where in our case the model is piecewise linear functions with $k$ jump points. In the algorithm we treat the more general case of piecewise polynomial functions of order $n$ and denote the corresponding projection by $S_n(\cdot, k)$. A version of SSCoSaMP adapted to our model is presented in Algorithm 1. Due to the equivalence between $D_{HS}$ and $\Omega_{\mathrm{DIF}}$ we use the latter in the algorithm.
We suggest performing this projection using dynamic programming. Our strategy is a generalization of the one that appears in [8], [36]; it may be used also with piecewise polynomial models and is not limited to piecewise linear functions. Therefore, SSCoSaMP with our projection method is suitable for approximating
$$\min_{a, b_1, \ldots, b_n} \left\| g - M[I, X_1, \ldots, X_n]\begin{bmatrix} a\\ b_1\\ \vdots\\ b_n\end{bmatrix} \right\|_2^2 \quad \text{s.t.} \quad \left\| |\Omega_{\mathrm{DIF}} a|^2 + \sum_{i=1}^n |\Omega_{\mathrm{DIF}} b_i|^2 \right\|_0 \le k, \qquad (11)$$
for any value of $n$.
A. Optimal Approximation Using Piecewise Polynomial Functions
Our technique utilizes the fact that once the jump points are set, the optimal parameters of the polynomial in a segment $[i,j]$ can be calculated optimally by solving the least squares minimization problem
$$\min_{a[i,j], b_1[i,j], \ldots, b_n[i,j]} \left\| [I[i,j], X_1[i,j], \ldots, X_n[i,j]]\begin{bmatrix} a[i,j]\\ b_1[i,j]\\ \vdots\\ b_n[i,j]\end{bmatrix} - g[i,j] \right\|_2^2, \qquad (12)$$
where $g[i,j]$ is the sub-vector of $g$ supported by the indices $i$ to $j$ ($i \le j$) and $X_1[i,j]$ is the (square) sub-matrix of $X_1$ corresponding to the indices $i$ to $j$. We denote by $P_n(g[i,j])$ the resulting polynomial function. Indeed, in the case that the size of the segment is smaller than the number of parameters, e.g., a segment of size one for a linear function, the above minimization problem has infinitely many options for setting the parameters. However, all of them lead to the same result, which is keeping the values of the points in the segment, i.e., having $P_n(g[i,j]) = g[i,j]$.
¹In a very similar way we could have used the analysis CoSaMP (ACoSaMP) [8].
²Note that in [30], [34] the projection might be allowed to be near-optimal, in the sense that the projection error is close to the optimal error up to a multiplicative constant.
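One possible realization of $P_n$ (a sketch of our own, not the authors' implementation) is an ordinary least-squares polynomial fit on the segment; note that when the segment has fewer samples than parameters the fitted values simply reproduce the samples, consistently with the remark above:

```python
import numpy as np

def P_n(g_seg, idx, n):
    """Least-squares polynomial fit of order n on one segment.

    g_seg: the sub-vector g[i..j]; idx: the corresponding indices i..j.
    Returns the fitted values on the segment, i.e., P_n(g[i, j]).
    """
    # Design matrix [1, x, x^2, ..., x^n] built from the space variable.
    V = np.vander(np.asarray(idx, dtype=float), N=n + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(V, g_seg, rcond=None)
    return V @ coef
```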
Denote by $S_n(g[1,\tilde d], k)$ the optimal approximation of the signal $g$ by a piecewise polynomial function with $k$ jumps. It can be calculated by solving the following recursive minimization problem
$$\hat i = \operatorname*{argmin}_{1 \le i < \tilde d} \left\| S_n(g[1,i], k-1) - g[1,i] \right\|_2^2 + \left\| P_n(g[i+1,\tilde d]) - g[i+1,\tilde d] \right\|_2^2, \qquad (13)$$
and setting
$$S_n(g[1,\tilde d], k) = \begin{bmatrix} S_n(g[1,\hat i], k-1)\\ P_n(g[\hat i+1,\tilde d]) \end{bmatrix}. \qquad (14)$$
The vectors $S_n(g[1,i], k-1)$ can be calculated recursively using (13). The recursion ends with $S_n(g[1,i], 0) = P_n(g[1,i])$.
This leads us to the following algorithm for calculating an optimal approximation of a signal $g$. Notice that this algorithm also provides us with the parametrization of the piecewise polynomial.
1) Calculate $S_n(g[1,i], 0) = P_n(g[1,i])$ for $1 \le i \le d$.
2) For $\tilde k = 1 : k-1$ do
   • Calculate $S_n(g[1,\tilde d], \tilde k)$ for $1 \le \tilde d \le d$ using (13) and (14).
3) Calculate $S_n(g[1,d], k)$ using (13) and (14).
Denoting by $T$ the worst case complexity of calculating $P_n(g[i,j])$ for any $i, j$, we have that the complexity of step 1) is $O(dT)$; of step 2) is $O(kd^2(T+d))$, as the computation of the projection error is of complexity $O(d)$; and of step 3) is $O(d(T+d))$. Summing all together we get a total complexity of $O(kd^2(T+d))$ for the algorithm, which is polynomial since $T$ is polynomial.
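For concreteness, here is a NumPy sketch of this dynamic program (our own illustration of the recursion (13)-(14); the brute-force computation of all segment errors is unoptimized and the function names are ours):

```python
import numpy as np

def piecewise_poly_project(g, k, n):
    """Dynamic-programming sketch of S_n(g, k): best approximation of g by a
    piecewise polynomial of order n with k jumps (k + 1 segments)."""
    d = len(g)
    x = np.arange(1, d + 1, dtype=float)

    def fit(i, j):
        # Least-squares polynomial fit of order n on the segment g[i..j] (inclusive).
        V = np.vander(x[i:j + 1], N=n + 1, increasing=True)
        coef, *_ = np.linalg.lstsq(V, g[i:j + 1], rcond=None)
        approx = V @ coef
        return approx, float(np.sum((approx - g[i:j + 1]) ** 2))

    # seg_err[i, j]: error of a single polynomial fitted on g[i..j].
    seg_err = np.full((d, d), np.inf)
    for i in range(d):
        for j in range(i, d):
            seg_err[i, j] = fit(i, j)[1]

    # E[m, j]: best error for the prefix g[0..j] using m jumps; keep the argmins.
    E = np.full((k + 1, d), np.inf)
    last_jump = np.zeros((k + 1, d), dtype=int)
    E[0, :] = seg_err[0, :]
    for m in range(1, k + 1):
        for j in range(d):
            for i in range(j):                      # last jump placed after index i
                cost = E[m - 1, i] + seg_err[i + 1, j]
                if cost < E[m, j]:
                    E[m, j], last_jump[m, j] = cost, i

    # Backtrack the jump locations and assemble the approximation.
    approx, j, m = np.zeros(d), d - 1, k
    while m > 0:
        i = last_jump[m, j]
        approx[i + 1:j + 1] = fit(i + 1, j)[0]
        j, m = i, m - 1
    approx[:j + 1] = fit(0, j)[0]
    return approx
```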
B. Recovery Guarantees for Piecewise Polynomial Functions
Having SSCoSaMP at hand, we may ask ourselves whether we can guarantee a stable recovery with it, or even a denoising effect. Utilizing two theorems from [34] and [35] we get reconstruction bounds for SSCoSaMP that guarantee a stable recovery if the noise is additive adversarial, and a denoising effect if the noise is i.i.d. white Gaussian. Both theorems rely on the following property of the measurement matrix $M$:
Definition 3.1: A matrix $M$ has the polynomial restricted isometry property of order $n$ ($P_n$-RIP) with a constant $\delta_k$ if for any piecewise polynomial function $v$ of order $n$ we have
$$(1 - \delta_k)\|v\|_2^2 \le \|Mv\|_2^2 \le (1 + \delta_k)\|v\|_2^2. \qquad (15)$$
Having the $P_n$-RIP definition, we turn to present the first theorem, which treats the adversarial noise case.
Theorem 3.2 (Based on Corollary 3.2 in [34]): Let $M$ satisfy the $P_n$-RIP (15) with a constant $\delta_{4(k+1)} < 0.046$. Then after a constant number of iterations, it holds for the SSCoSaMP estimate that
$$\|\hat f - f\|_2 \le C\|e\|_2, \qquad (16)$$
where $C > 2$ is a constant depending on $\delta_{4(k+1)}$.
Remark that the above theorem implies that we may compressively sense piecewise polynomial functions and that we can have a perfect recovery in the noiseless case $e = 0$. Note also that if $M$ is a subgaussian random matrix then it is sufficient to use only $m = O(k(n + \log(d)))$ measurements [3], [8].
Though the above theorem is interesting for compressed sensing, it does not guarantee noise reduction, even for the case $M = I$, as $C > 2$. The reason for this is that no prior is imposed on the noise distribution. By introducing such an assumption, one may get better reconstruction guarantees. The following theorem assumes that the noise is random and Gaussian distributed, and provides denoising guarantees.
Theorem 3.3: Assume the conditions of Theorem 3.2. Then after a constant number of iterations $t^*$ it holds with probability exceeding $1 - \frac{2}{(3k)!}n^{-\beta}$ for the SSCoSaMP estimate that
$$\|\hat f - f\|_2 \le C\sqrt{(1 + \delta_{3k})\,3k}\left(1 + \sqrt{2(1+\beta)\log(nd)}\right)\sigma. \qquad (17)$$
The bound in the theorem can be given on the expected error, instead of only with high probability, using the proof technique in [37]. Remark that if we were given an oracle that foreknows the locations of the jumps in the parametrization, the error we would get would be $O(\sqrt{k}\sigma)$. As the $\log(nd)$ factor in our bound is inevitable [38], we may conclude that our guarantee is optimal up to a constant factor.
IV. SPARSITY BASED OVERPARAMETRIZED VARIATIONAL ALGORITHM
In this section we generalize the model in (11) to support other overparametrization forms, including higher dimensional signals such as images, for the case that an upper bound on the noise energy is given rather than the sparsity $k$, as in many applications we do not know $k$ beforehand but rather have information about the noise statistics. Notice that for the synthesis model such a generalization is not trivial because, while extending the $\Omega_{\mathrm{DIF}}$ operator to high dimensions is easy, it is not clear how to do it for the Heaviside dictionary.
We turn to present the extension of (7) to the case where the noise power is known. Let $X_1, \ldots, X_n$ be functions of the space variables and $a_1, \ldots, a_n$ their coefficient parameters. We generalize the formulation further, re-insert the matrix $M$, and assume that all coefficient parameters are jointly sparse under a general operator $\Omega$. We may recover these coefficients by solving
$$\min_{\tilde a_1, \ldots, \tilde a_n} \left\| \sum_{i=1}^n |\Omega \tilde a_i|^2 \right\|_0 \quad \text{s.t.} \quad \left\| g - M[X_1, \ldots, X_n]\begin{bmatrix} \tilde a_1\\ \vdots\\ \tilde a_n\end{bmatrix} \right\|_2 \le \|e\|_2. \qquad (18)$$
Having an estimate for all the coefficients, our approximation of the original signal $f$ is $\hat f = [X_1, \ldots, X_n][\hat a_1^*, \ldots, \hat a_n^*]^*$.
Algorithm 2 The Block GAPN (BGAPN) Algorithm
Input: $k$, $[X_1, \ldots, X_n]$, $M$, $\Omega$, $g$, where $g = Mf + e$, $f = [X_1, \ldots, X_n][a_1^*, \ldots, a_n^*]^*$, $k$ is the sparsity of $\sum_{i=1}^n |\Omega a_i|^2$ and $e$ is the additive noise.
Output: $\hat a_1, \ldots, \hat a_n$: cosparse approximations of $a_1, \ldots, a_n$. $\hat f$: an estimate for $f$.
Initialize the cosupport $\Lambda = \{1, \ldots, p\}$ and set $t = 0$.
while halting criterion is not satisfied do
$t = t + 1$.
Calculate a new estimate:
$$[\hat a_1^*, \ldots, \hat a_n^*]^* = \operatorname*{argmin}_{\tilde a_1, \ldots, \tilde a_n} \sum_{i=1}^n \|\Omega_\Lambda \tilde a_i\|_2^2 \quad \text{s.t.} \quad \left\| g - M[X_1, \ldots, X_n]\begin{bmatrix} \tilde a_1\\ \vdots\\ \tilde a_n\end{bmatrix} \right\|_2 \le \|e\|_2. \qquad (19)$$
Update the cosupport: $\Lambda = \Lambda \setminus \left\{ \operatorname*{argmax}_j \sum_{i=1}^n \|\Omega_j \hat a_i\|_2^2 \right\}$.
end while
Form an estimate for the original signal: $\hat f = [X_1, \ldots, X_n][\hat a_1^*, \ldots, \hat a_n^*]^*$.
As this minimization problem is NP-hard, we suggest solving it by a generalization of the GAPN algorithm [24], the block GAPN (BGAPN), which is presented in Algorithm 2. Remark that if one wishes to use this algorithm for high dimensional signals, one may remove several elements from the cosupport at the cosupport update stage instead of one.
Ideally, we would expect that after several iterations of updating the cosupport in BGAPN we would have $\Omega_\Lambda \hat f = 0$. However, many signals are only nearly cosparse, i.e., they have $k$ significantly large values in $\Omega f$ while the rest are smaller than a small constant $\epsilon$. Therefore, a natural stopping criterion in this case would be to stop when the maximal value in $|\Omega \hat a_i|$ is smaller than $\epsilon$. Of course, this is not the only option for a stopping criterion. For example, one may look at the relative solution change in each iteration, or use a constant number of iterations if $k$ is foreknown.
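To give a feel for the procedure, here is a simplified NumPy sketch of Algorithm 2 (our own illustration; it replaces the constrained subproblem (19) with a penalized least-squares surrogate, so the penalty weight lam and the exact halting rule below are our assumptions rather than part of the paper):

```python
import numpy as np

def bgapn(g, M, X_list, Omega, noise_norm, lam=10.0, eps=1e-3, max_iter=50):
    """Simplified Block GAPN sketch: jointly cosparse coefficient recovery."""
    A = M @ np.hstack(X_list)                 # measurement of [X_1, ..., X_n]
    n, d = len(X_list), X_list[0].shape[1]
    p = Omega.shape[0]
    cosupport = np.ones(p, dtype=bool)        # Lambda: rows believed to annihilate the a_i

    for _ in range(max_iter):
        # Penalized LS surrogate of (19): data fit + cosparsity on the rows in Lambda,
        # applied block-wise to every coefficient vector via a Kronecker product.
        Om_L = Omega[cosupport]
        reg = np.kron(np.eye(n), Om_L)
        K = np.vstack([A, np.sqrt(lam) * reg])
        y = np.concatenate([g, np.zeros(reg.shape[0])])
        a_hat, *_ = np.linalg.lstsq(K, y, rcond=None)
        blocks = a_hat.reshape(n, d)          # row i is the estimate of a_i

        # Halting: data fit within the noise level, or nearly cosparse already.
        scores = np.sum((Omega @ blocks.T) ** 2, axis=1)   # per-row, summed over i
        scores[~cosupport] = -np.inf
        if np.linalg.norm(g - A @ a_hat) <= noise_norm or scores.max() < eps:
            break
        # Remove from the cosupport the row that violates cosparsity the most.
        cosupport[int(np.argmax(scores))] = False

    f_hat = np.hstack(X_list) @ a_hat
    return blocks, f_hat
```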
The reason we do not use the SSCoSaMP technique here is that it assumes $k$ to be foreknown. Note that there are no known recovery guarantees for GAPN as we had for SSCoSaMP before. Therefore, we demonstrate its efficiency in several experiments in the next section.
Before we move to the next section we note that one of the
advantages of the above formulation and the BGAPN algo-
rithm is that it is relatively easy to add to it new constraints.
For example, for the one dimensional function case we would expect the piecewise linear segments to connect to each other smoothly. However, we do not have such a continuity constraint in the current formulation. As we shall see in the next section, the absence of such a constraint allows jumps at the connection points between the segments.
One possibility for solving this problem is to add a continuity constraint at the jump points of the signal. In Algorithm 3 we present a modified version of the BGAPN algorithm that imposes the continuity constraint on the change points.
Algorithm 3 The Block GAPN Algorithm with Continuity Constraint
Input: $k$, $[X_1, \ldots, X_n]$, $M$, $\Omega \in \mathbb{R}^{p\times d}$, $g$, $\gamma$, where $g = Mf + e$, $f = [X_1, \ldots, X_n][a_1^*, \ldots, a_n^*]^*$, $k$ is the sparsity of $\sum_{i=1}^n |\Omega a_i|^2$, $e$ is the additive noise and $\gamma$ is a weight for the continuity constraint.
Output: $\hat a_1, \ldots, \hat a_n$: cosparse approximations of $a_1, \ldots, a_n$. $\hat f$: an estimate for $f$.
Initialize the cosupport $\Lambda = \{1, \ldots, p\}$ and set $t = 0$.
while halting criterion is not satisfied do
$t = t + 1$.
Calculate a new estimate:
$$[\hat a_1^*, \ldots, \hat a_n^*]^* = \operatorname*{argmin}_{\tilde a_1, \ldots, \tilde a_n} \sum_{i=1}^n \|\Omega_\Lambda \tilde a_i\|_2^2 \quad \text{s.t.} \quad \left\| g - M[X_1, \ldots, X_n]\begin{bmatrix} \tilde a_1\\ \vdots\\ \tilde a_n\end{bmatrix} \right\|_2 \le \|e\|_2. \qquad (20)$$
Update the cosupport: $\Lambda = \Lambda \setminus \left\{ \operatorname*{argmax}_j \sum_{i=1}^n \|\Omega_j \hat a_i\|_2^2 \right\}$.
Create a weight matrix: $W = \mathrm{diag}(w_1, \ldots, w_p)$, where $w_i = 0$ if $i \in \Lambda$ and $w_i = 1$ otherwise.
end while
Recalculate the estimate:
$$[\hat a_1^*, \ldots, \hat a_n^*]^* = \operatorname*{argmin}_{\tilde a_1, \ldots, \tilde a_n} \sum_{i=1}^n \|\Omega_\Lambda \tilde a_i\|_2^2 + \gamma \left\| W\,\Omega\,[X_1, \ldots, X_n]\begin{bmatrix} \tilde a_1\\ \vdots\\ \tilde a_n\end{bmatrix} \right\|_2^2 \quad \text{s.t.} \quad \left\| g - M[X_1, \ldots, X_n]\begin{bmatrix} \tilde a_1\\ \vdots\\ \tilde a_n\end{bmatrix} \right\|_2 \le \|e\|_2. \qquad (21)$$
Form an estimate for the original signal: $\hat f = [X_1, \ldots, X_n][\hat a_1^*, \ldots, \hat a_n^*]^*$.
In the next section we shall see how this solves the problem.
Note that this is only one example of a constraint that one may wish to add to the BGAPN technique. For example, in images the jumps are desired as they form the edges. However, there are other behaviors that we would expect in them, such as smoothness along the edge directions.
V. EXPERIMENTS
For demonstrating the efficiency of the proposed method we
perform several simulations. We start with the one dimensional
case testing our line fitting approach with the continuity
constraint and without it. Then we perform some tests on
images using our scheme. We start by denoising cartoon images that follow the piecewise linear model. We compare our outcome with that of TV denoising [5] and show that we do not suffer from the staircasing effect [39]. Then we show how our framework may be used for image segmentation, drawing a connection to the Mumford-Shah functional [40], [41]. We end by demonstrating how our proposed mechanism can be used for inpainting large missing portions of an image.
A. Piecewise Linear Lines
In order to check the performance of the line fitting, we generate random piecewise linear functions with 300 samples, 6 jumps and dynamic range $[-1, 1]$. Then we contaminate the signal with white Gaussian noise with a standard deviation $\sigma = 0.1$ or $\sigma = 0.25$. We compare the recovery results of BGAPN with and without the continuity constraint. Figs. 1 and 2 present two examples of our reconstruction results. It can be observed that the addition of the continuity constraint is essential for the correctness of the recovery. Indeed, without it we have jumps between the segments of the linear function. In addition, it should be mentioned that by using the BOMP algorithm, i.e., using the synthesis framework, with and without the continuity constraint, we get very similar behavior to the one we observe with BGAPN. We focus on the analysis approach because of the ability to extend it to images and higher order functions more straightforwardly.
As a last remark, note that the main achievement in our line fitting results is the ability to segment the line and to provide a parametric representation for it. Indeed, one may achieve better denoising results in terms of mean squared error (MSE) without using the linear model. However, the approximated function is then not guaranteed to be piecewise linear, and therefore learning the change points from it is sub-optimal. See [12] and the references therein for more details.
B. Image Denoising
We turn to evaluate the performance of our approach on images. We use a linear overparametrization of the two dimensional plane together with the two dimensional difference operator $\Omega_{\mathrm{DIF}}$ that calculates the horizontal and vertical discrete derivatives of an image by applying the filters $[1, -1]$ and $[1, -1]^*$ to it. We apply our scheme for denoising the swoosh and house images, where the first is a truly piecewise linear image and the second is not, because of the texture in it. We compare our results with those of TV denoising [5].
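A sparse-matrix construction of such a two dimensional difference operator (a sketch we provide for illustration; the paper does not specify an implementation) could look as follows:

```python
import numpy as np
from scipy.sparse import eye, kron, diags, vstack

def omega_dif_2d(h, w):
    """2D finite-difference operator: horizontal and vertical first differences
    of an h-by-w image vectorized in row-major (C) order."""
    def diff_matrix(m):
        # (m-1) x m first-difference matrix implementing the filter [1, -1].
        return diags([-1.0, 1.0], [0, 1], shape=(m - 1, m))

    Dh = kron(eye(h), diff_matrix(w))    # differences along each image row
    Dv = kron(diff_matrix(h), eye(w))    # differences along each image column
    return vstack([Dh, Dv]).tocsr()

# The operator is applied to the (piecewise constant) coefficient maps a, b1, b2;
# on such a map its nonzeros mark the region boundaries.
coef_map = np.zeros((8, 8))
coef_map[:, 4:] = 1.0
Om = omega_dif_2d(8, 8)
print(np.count_nonzero(Om @ coef_map.ravel()), "nonzeros out of", Om.shape[0])  # 8 of 112
```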
Fig. 3 presents the recovery of swoosh from its noisy version, contaminated with additive white Gaussian noise with $\sigma = 20$. Notice that since we are using only horizontal and vertical derivatives, the algorithm favors edges in these directions. Therefore, we also apply our scheme using an operator that calculates in addition the diagonal derivatives, using the filters $\begin{bmatrix}1 & 0\\ 0 & -1\end{bmatrix}$ and $\begin{bmatrix}0 & 1\\ -1 & 0\end{bmatrix}$. It can be seen that with these derivatives we get a better recovery for the whole image in general and for its edges in particular. Note also that we do not suffer from the staircasing effect that appears in the TV recovery.
Fig. 4 demonstrates the denoising result we get for an image with texture. Here too, we do not suffer from the staircasing effect that appears in the TV recovery. The lower PSNR we get with our method is due to the fact that our model is linear and therefore less capable of adapting itself to the texture.
[Fig. 3 (five panels): (a) original image; (b) noisy image, σ = 20; (c) BGAPN with Ω_DIF, PSNR = 39.09 dB; (d) BGAPN with Ω_DIF and diagonal derivatives, PSNR = 39.57 dB; (e) TV recovery, PSNR = 33.32 dB. Caption: Denoising of swoosh using the BGAPN algorithm with and without diagonal derivatives. The result of TV is presented as a reference. Notice that we do not have the staircasing effect that appears in the TV reconstruction.]
By using a cubic overparametrization we get a PSNR equal to that of TV. Note also that for larger noise powers the recovery performance of our algorithm in terms of PSNR becomes better than that of TV, also with the linear model.
C. Segmentation
Note that since our recovered images are piecewise linear
it is possible to recover the edges of the original image from
them. In Fig. 5 we present the gradient map of our recovered
image and the one of the original image. It can be seen that
while the gradients of the original image capture also the
texture changes, with our method only the main edges are
being preserved. We remark that if we add a blur operator and
recover the image using our approach, the edges are preserved
as well. This motivates us to use our scheme for segmentation.
Notice that since our scheme divides the image into piecewise linear regions and then segments the image by the boundaries of each of them, our strategy can be viewed as an approach that minimizes the Mumford-Shah functional [40], [41].
[Fig. 1 (six panels): (a) noisy signal, σ = 0.1; (b) function recovery for σ = 0.1 (original, analysis recovered, analysis recovered continuous); (c) coefficient parameters recovery for σ = 0.1 (estimated vs. true); (d) noisy signal, σ = 0.25; (e) function recovery for σ = 0.25; (f) coefficient parameters recovery for σ = 0.25. Caption: Recovery of a piecewise linear function using the BGAPN algorithm with and without a constraint on the continuity.]
On the other hand, if the image has only two regions, our segmentation result can be viewed as a solution of the Chan-Vese functional, with the difference that we model each region by a polynomial function instead of approximating it by a constant [42].
We present our segmentation results for three images; for each we display the piecewise linear version of the image together with its boundary map. Our segmentation results appear in Figs. 6, 7 and 8. Looking at them, it is clear that there is still much room for improvement. One direction for improvement is to use more filters within $\Omega$. Another is to calculate the gradients of the coefficient parameters rather than of the recovered image, as they are supposed to be truly piecewise constant. We leave these ideas to future work. Remark that even without these suggested improvements, our simple segmentation scheme presents reasonable results and therefore demonstrates the great potential of our whole framework.
D. Geometrical Inpainting
The last application for which we demonstrate our new formulation is filling large missing portions of an image. Let $M$ be the inpainting matrix that removes the missing elements from an image.
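As a small illustration (our own, hypothetical construction), such a matrix can be taken as the identity restricted to the rows of the observed pixels:

```python
import numpy as np

def inpainting_matrix(observed_mask):
    """Rows of the identity corresponding to observed pixels of the vectorized image."""
    observed_mask = np.asarray(observed_mask, dtype=bool)
    return np.eye(observed_mask.size)[observed_mask]
```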
We present four inpainting examples. We remove circles with a radius of twenty pixels from the house image at different locations. Fig. 9 presents the four inpainting examples obtained using our technique. Notice that in all the examples we remove the texture from the image due to the prior we use. In the first example the line is perfectly recovered; this behavior is expected as the line is included in our model. In the second example, the algorithm decides to remove the shadow and replace it with the color of the house. The reason is that the circle covers most of the shadow. In the third example, we fill the missing information with vertical and horizontal edges. As observed before, this is due to the fact that these are the derivatives that we have in $\Omega_{\mathrm{DIF}}$. Fig. 10 shows the recovery result that we get if we add the diagonal derivatives. In the last example, our scheme cuts through the white part of the roof and gives the shading and the roof the same color. The reason this happens is that such a solution creates fewer edges compared to one that would connect the two white parts of the roof. In order to solve this problem one may add a constraint to the minimization that imposes smoothness along the edge directions.
VI. CONCLUSION AND FUTURE WORK
This work has presented a novel framework for solving the
overparametrized variational problem using sparse representa-
tions. We have demonstrated how this framework can be used
both for one dimensional and two dimensional functions, while
a generalization to higher dimensions is a straightforward thing
to do. We have solved the problem of line fitting for piecewise
linear one dimensional functions and then showed how the
new technique can be used for denoising, segmentation and
inpainting.
[Fig. 2 (six panels, a second example): (a) noisy signal, σ = 0.1; (b) function recovery for σ = 0.1 (original, analysis recovered, analysis recovered continuous); (c) coefficient parameters recovery for σ = 0.1; (d) noisy signal, σ = 0.25; (e) function recovery for σ = 0.25; (f) coefficient parameters recovery for σ = 0.25. Caption: Recovery of a piecewise linear function using the BGAPN algorithm with and without a constraint on the continuity.]
[Fig. 6: (a) original image; (b) piecewise linear version of the image; (c) image segmentation. Caption: Piecewise linear version of the coins image together with the segmentation result.]
Though this work has focused mainly on linear overparametrizations, the extension to other forms is straightforward. In our experiments we have also used cubic overparametrizations; however, to keep the discussion as simple as possible we have chosen not to present these results in the experiments section. As future research, we believe that a learning process should be added to our scheme. It should adapt the functions of the space variables $X_1, \ldots, X_n$ and the filters in $\Omega$ to the signal at hand. Another route is to integrate our scheme into the state-of-the-art overparametrization based algorithm for optical flow in [10].
REFERENCES
[1] A. M. Bruckstein, D. L. Donoho, and M. Elad, “From sparse solutions of
systems of equations to sparse modeling of signals and images,” SIAM
Review, vol. 51, no. 1, pp. 34–81, 2009.
[2] R. Gribonval and M. Nielsen, “Sparse representations in unions of
bases,” IEEE Trans. Inf. Theory., vol. 49, no. 12, pp. 3320–3325, Dec.
2003.
[3] T. Blumensath and M. Davies, “Sampling theorems for signals from the
union of finite-dimensional linear subspaces,” IEEE Trans. Inf. Theory.,
vol. 55, no. 4, pp. 1872 –1882, april 2009.
[4] Y. Lu and M. Do, “A theory for sampling signals from a union of
subspaces,” IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2334 –2345,
Jun. 2008.
[5] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based
noise removal algorithms,” Physica D: Nonlinear Phenomena, vol. 60,
no. 1-4, pp. 259–268, 1992.
[Fig. 7: (a) original image; (b) piecewise linear version of the image; (c) image segmentation. Caption: Piecewise linear version of an airplane image together with the segmentation result.]
[Fig. 8: (a) original image; (b) piecewise linear version of the image; (c) image segmentation. Caption: Piecewise linear version of the man image together with the segmentation result.]
[6] D. Needell and R. Ward, “Stable image reconstruction using total
variation minimization,” SIAM Journal on Imaging Sciences, vol. 6,
no. 2, pp. 1035–1058, 2013.
[7] S. Nam, M. Davies, M. Elad, and R. Gribonval, “The cosparse analysis
model and algorithms,” Appl. Comput. Harmon. Anal., vol. 34, no. 1,
pp. 30 – 56, 2013.
[8] R. Giryes, S. Nam, M. Elad, R. Gribonval, and M. Davies, “Greedy-
like algorithms for the cosparse analysis model,” Linear Algebra and its
Applications, vol. 441, no. 0, pp. 22 – 60, Jan. 2014, special Issue on
Sparse Approximate Solution of Linear Systems.
[9] T. Nir and A. Bruckstein, “On over-parameterized model based tv-
denoising,” in Signals, Circuits and Systems, 2007. ISSCS 2007. In-
ternational Symposium on, vol. 1, July 2007, pp. 1–4.
[10] T. Nir, A. Bruckstein, and R. Kimmel, “Over-parameterized variational
optical flow,” International Journal of Computer Vision, vol. 76, no. 2,
pp. 205–216, 2008.
[11] G. Rosman, S. Shem-Tov, D. Bitton, T. Nir, G. Adiv, R. Kimmel,
A. Feuer, and A. Bruckstein, “Over-parameterized optical flow using
a stereoscopic constraint,” in Scale Space and Variational Methods in
Computer Vision, ser. Lecture Notes in Computer Science, A. Bruck-
stein, B. Haar Romeny, A. M. Bronstein, and M. Bronstein, Eds.
Springer Berlin Heidelberg, 2012, vol. 6667, pp. 761–772.
[12] S. Shem-Tov, G. Rosman, G. Adiv, R. Kimmel, and A. M. Bruckstein,
“On globally optimal local modeling: From moving least squares to over-
parametrization,” in Innovations for Shape Analysis. Springer, 2013,
pp. 379–405.
[13] G. Davis, S. Mallat, and M. Avellaneda, “Adaptive greedy approxima-
tions,” Journal of Constructive Approximation, vol. 50, pp. 57–98, 1997.
[14] M. Elad, P. Milanfar, and R. Rubinstein, “Analysis versus synthesis in
signal priors,” Inverse Problems, vol. 23, no. 3, pp. 947–968, June 2007.
[15] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition
by basis pursuit,” SIAM Journal on Scientific Computing, vol. 20, no. 1,
pp. 33–61, 1998.
[16] D. L. Donoho and M. Elad, “On the stability of the basis pursuit in the
presence of noise,” Signal Process., vol. 86, no. 3, pp. 511–532, 2006.
[17] R. Tibshirani, “Regression shrinkage and selection via the lasso,” J. Roy.
Statist. Soc. B, vol. 58, no. 1, pp. 267–288, 1996.
[18] S. Chen, S. A. Billings, and W. Luo, “Orthogonal least squares methods
and their application to non-linear system identification,” International
Journal of Control, vol. 50, no. 5, pp. 1873–1896, 1989.
[19] S. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictio-
naries,” IEEE Trans. Signal Process., vol. 41, pp. 3397–3415, 1993.
[20] D. Needell and J. Tropp, “CoSaMP: Iterative signal recovery from
incomplete and inaccurate samples,” Applied and Computational Har-
monic Analysis, vol. 26, no. 3, pp. 301 – 321, May 2009.
[21] W. Dai and O. Milenkovic, “Subspace pursuit for compressive sensing
signal reconstruction,” IEEE Trans. Inf. Theory., vol. 55, no. 5, pp. 2230
–2249, May 2009.
[22] T. Blumensath and M. Davies, “Iterative hard thresholding for com-
pressed sensing,” Applied and Computational Harmonic Analysis,
vol. 27, no. 3, pp. 265 – 274, 2009.
[23] S. Foucart, “Hard thresholding pursuit: an algorithm for compressive
sensing,” SIAM J. Numer. Anal., vol. 49, no. 6, pp. 2543–2563, 2011.
[24] S. Nam, M. Davies, M. Elad, and R. Gribonval, “Recovery of cosparse
signals with greedy analysis pursuit in the presence of noise,” in 4th
IEEE International Workshop on Computational Advances in Multi-
Sensor Adaptive Processing (CAMSAP), Dec 2011, pp. 361–364.
[25] M. Yuan and Y. Lin, “Model selection and estimation in regression with
grouped variables,” Journal of the Royal Statistical Society: Series B
(Statistical Methodology), vol. 68, no. 1, pp. 49–67, 2006.
[26] Y. Eldar and M. Mishali, “Robust recovery of signals from a structured
union of subspaces,” Information Theory, IEEE Transactions on, vol. 55,
no. 11, pp. 5302–5316, Nov 2009.
[27] M. Stojnic, F. Parvaresh, and B. Hassibi, “On the reconstruction of
block-sparse signals with an optimal number of measurements,” Signal
Processing, IEEE Transactions on, vol. 57, no. 8, pp. 3075–3085, Aug
2009.
[28] Y. Eldar, P. Kuppinger, and H. Bolcskei, “Block-sparse signals: Un-
certainty relations and efficient recovery,” Signal Processing, IEEE
Transactions on, vol. 58, no. 6, pp. 3042–3054, June 2010.
[29] R. Baraniuk, V. Cevher, M. Duarte, and C. Hegde, “Model-based com-
pressive sensing,” Information Theory, IEEE Transactions on, vol. 56,
no. 4, pp. 1982–2001, April 2010.
[30] M. Davenport, D. Needell, and M. Wakin, “Signal space cosamp for
sparse recovery with redundant dictionaries,” IEEE Trans. Inf. Theory,
vol. 59, no. 10, pp. 6820–6829, Oct 2013.
[31] R. Giryes and M. Elad, “Can we allow linear dependencies in the dic-
tionary in the synthesis framework?” in IEEE International Conference
on Acoustics, Speech and Signal Processing (ICASSP), 2013.
[32] ——, "Iterative hard thresholding for signal recovery using near optimal projections," in 10th Int. Conf. on Sampling Theory Appl. (SAMPTA), 2013.
[Fig. 4 (four panels): (a) original image; (b) noisy image, σ = 20; (c) BGAPN with Ω_DIF, PSNR = 29.6 dB; (d) TV recovery, PSNR = 30.28 dB. Caption: Denoising of house using the BGAPN algorithm. The result of TV is presented as a reference. Notice that we do not have the staircasing effect that appears in the TV reconstruction. Because our model is linear we do not recover the texture and thus we get slightly inferior results compared to TV with respect to PSNR. Note that if we use a cubic overparametrization with BGAPN instead of linear we get a PSNR equal to that of TV.]
[Fig. 5: (a) original image gradients; (b) recovered image gradients. Caption: Gradient map of the clean house image and of our recovered image from Fig. 4.]
[33] ——, “OMP with highly coherent dictionaries,” in 10th Int. Conf. on
Sampling Theory Appl. (SAMPTA), 2013.
[34] R. Giryes and D. Needell, “Greedy signal space methods for incoherence
and beyond,” CoRR, vol. abs/1309.2676, 2014.
[35] ——, “Near oracle performance of signal space greedy methods,” CoRR,
vol. abs/1402.2601, 2014.
[36] T. Han, S. Kay, and T. Huang, “Optimal segmentation of signals and
its application to image denoising and boundary feature extraction,” in
IEEE International Conference on Image Processing (ICIP)., vol. 4, oct.
2004, pp. 2693 – 2696 Vol. 4.
[37] R. Giryes and M. Elad, “RIP-based near-oracle performance guarantees
for SP, CoSaMP, and IHT,” IEEE Trans. Signal Process., vol. 60, no. 3,
pp. 1465–1468, March 2012.
[Fig. 9 (four examples): (a,c,e,g) image with missing portion; (b,d,f,h) recovered image. Caption: house inpainting examples.]
[38] E. Candès, "The restricted isometry property and its implications for compressed sensing," Comptes Rendus Mathematique, vol. 346, no. 9-10, pp. 589–592, 2008.
[39] J. Savage and K. Chen, “On multigrids for solving a class of improved
total variation based staircasing reduction models,” in Image Processing
Based on Partial Differential Equations, ser. Mathematics and Visual-
ization, X.-C. Tai, K.-A. Lie, T. Chan, and S. Osher, Eds. Springer
Berlin Heidelberg, 2007, pp. 69–94.
[40] D. Mumford and J. Shah, "Optimal approximations by piecewise smooth functions and associated variational problems," Communications on Pure and Applied Mathematics, vol. 42, no. 5, pp. 577–685, 1989.
[Fig. 10: (a) image with missing portion; (b) recovered image. Caption: house inpainting example with diagonal derivatives.]
[41] L. Ambrosio and V. M. Tortorelli, "Approximation of functionals depending on jumps by elliptic functionals via Γ-convergence," Communications on Pure and Applied Mathematics, vol. 43, no. 8, pp. 999–1036, 1990.
[42] T. Chan and L. Vese, “Active contours without edges,” Image Processing,
IEEE Transactions on, vol. 10, no. 2, pp. 266–277, Feb 2001.