Exact Biobjective k-Sparse Nonnegative Least Squares
Nicolas Nadisic, Arnaud Vandaele, Nicolas Gillis
Université de Mons
Mons, Belgium
Jeremy E. Cohen
Université de Rennes, INRIA, CNRS, IRISA
Rennes, France
Abstract—The k-sparse nonnegative least squares (NNLS)
problem is a variant of the standard least squares problem, where
the solution is constrained to be nonnegative and to have at most
k nonzero entries. Several methods exist to tackle this NP-hard
problem, including fast but approximate heuristics, and exact
methods based on brute-force or branch-and-bound algorithms.
Although intuitive, the k-sparse constraint is sometimes limited;
the parameter k can be hard to tune, especially in the case
of NNLS with multiple right-hand sides (MNNLS) where the
relevant k could differ between columns. In this work, we propose
a novel biobjective formulation of the k-sparse nonnegative least
squares problem. We present an extension of Arborescent, a
branch-and-bound algorithm for exact k-sparse NNLS, that computes
the whole Pareto front (that is, the set of optimal solutions
for all values of k) instead of only the k-sparse solution, for
virtually the same computing cost. We also present a method for
MNNLS that enforces a matrix-wise sparsity constraint, by first
computing the Pareto front for each column and then selecting
one solution per column to build a globally optimal solution
matrix. We show the advantages of the proposed approach for
the unmixing of hyperspectral images.
Index Terms—sparse approximation, $\ell_0$ constraint, biobjective
optimization, nonnegative least squares
Nonnegative least squares (NNLS) problems occur in many
signal processing and data mining tasks, when data points
can be expressed as additive linear combinations of basis
components [1]. For example, in hyperspectral images, the
spectral signature of a pixel is the additive linear combination
of the spectral signatures of the materials it contains [2]. NNLS
problems are also at the heart of many alternating algorithms
to compute nonnegative matrix factorization (NMF) [3].
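Given $A \in \mathbb{R}^{m \times r}$ and $b \in \mathbb{R}^m$, the
standard NNLS problem can be written as follows,
$$\min_{x} \|Ax - b\|_2^2 \quad \text{such that } x \geq 0, \tag{1}$$
where $x \in \mathbb{R}^r$, and $x \geq 0$ means $x$ is entry-wise
nonnegative.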
Nonnegativity in least squares problems is known to naturally
induce sparsity (see Theorem 6.1 in [4]), that is, solutions
with few nonzero entries. Sparsity is an appreciated feature,
as it often improves the interpretability of the solution even
for invertible linear systems. For example, in hyperspectral
unmixing, that is, the task of identifying materials in a
hyperspectral image, sparsity means a pixel is expressed as
a combination of only a few materials. However, there is
no guarantee on the sparsity of the solution of a general
NNLS problem, while some applications may benefit from
explicit sparsity constraints. Leveraging prior knowledge of
the sparsity of the solution can help regularize the problem,
reduce noise, and improve the results.
NN and NG acknowledge the support by the European Research Council
(ERC starting grant No 679515), and by the Fonds de la Recherche
Scientifique - FNRS and the Fonds Wetenschappelijk Onderzoek -
Vlaanderen (FWO) under EOS project O005318F-RG47.
The most natural sparsity measure is the $\ell_0$-“norm”, as it is
equal to the number of nonzero entries of a vector,
$\|x\|_0 = \mathrm{Card}\{i : x_i \neq 0\}$. A vector is said to be
k-sparse if it has at most k nonzero entries. A common way to
enforce sparsity is with a k-sparsity constraint. Combined with
nonnegativity, this leads to the following problem, called k-sparse
NNLS,
$$\min_{x} \|Ax - b\|_2^2 \quad \text{such that } x \geq 0 \text{ and } \|x\|_0 \leq k. \tag{2}$$
This problem is sometimes called nonnegative sparse coding
or cardinality-constrained NNLS.
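To make Problem (2) concrete, the following is a minimal brute-force sketch, not the method studied in this paper. It assumes NumPy is available; the function name `ksparse_nnls_bruteforce` is ours. It enumerates every support of size at most k, solves an unconstrained least squares problem on each, and keeps the best nonnegative solution, which is only practical for very small r.

```python
from itertools import combinations

import numpy as np


def ksparse_nnls_bruteforce(A, b, k):
    """Illustrative exact solver for problem (2) by support enumeration.

    For each candidate support, an unconstrained least squares problem is
    solved; supports whose solution has a negative entry are skipped. An
    optimal solution is always found this way, because the optimum is a
    least squares solution on its own support.
    """
    r = A.shape[1]
    best_x = np.zeros(r)
    best_err = float(b @ b)  # squared error of the zero vector
    for size in range(1, k + 1):
        for support in combinations(range(r), size):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
            if np.any(coef < 0):
                continue  # infeasible for the nonnegativity constraint
            x = np.zeros(r)
            x[cols] = coef
            err = float(np.sum((A @ x - b) ** 2))
            if err < best_err:
                best_x, best_err = x, err
    return best_x, best_err
```

The number of supports visited grows combinatorially with r, which is the cost that heuristics and branch-and-bound methods try to avoid.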
In hyperspectral unmixing, the k-sparsity constraint means
a pixel can be composed of at most k materials. Albeit
quite intuitive, this formulation still suffers from the need
to choose an appropriate parameter k. Most importantly, in
NNLS problems with multiple right-hand sides (MNNLS), the
suitable k often varies from one column to another (pixels can
contain different numbers of materials), and imposing a single
sparsity parameter can produce inadequate results.
To overcome this issue, instead of optimizing the error while
constraining the sparsity, one can consider a biobjective
formulation. Here, the objectives are minimizing the reconstruction
error on the one hand, and maximizing the sparsity (that is,
minimizing the $\ell_0$-“norm”) on the other hand,
$$\min_{x \geq 0} \left\{ \|Ax - b\|_2^2, \ \|x\|_0 \right\}. \tag{3}$$
These objectives are conflicting, hence there is no unique
optimal solution to Problem (3), and solutions representing
different tradeoffs between error and sparsity are all equally
good. Therefore, we consider the notion of Pareto-optimality.
Given a set of objectives to optimize, a solution $x$ is said to be
Pareto-optimal if and only if it is not dominated, that is, there
does not exist a solution that is at least as good as $x$ on all
objectives and strictly better than $x$ on at least one objective.
The set of all Pareto-optimal solutions to a problem is called
the Pareto front, see Figure 1.
[Figure 1: error $\|Ax - b\|_2^2$ plotted against $\|x\|_0$ from 0 to
$r = 5$, with the endpoints $x = 0$ and
$x \in \arg\min_{x \geq 0} \|Ax - b\|_2^2$ marked.]
Fig. 1. Example of the Pareto front for a biobjective k-sparse NNLS problem
with $r = 5$ variables. The first solution, for $\|x\|_0 = 0$, corresponds
to the zero vector. The last solution, for $\|x\|_0 = 5$, corresponds to the
NNLS problem with no sparsity constraint. Here the penultimate solution is
identical to the last one, meaning that the solution with no sparsity
constraint naturally has 1 zero entry.
In Problem (3), the discreteness of the $\ell_0$-“norm” actually makes
the computation of the Pareto front easier, as it suffices to solve
Problem (2) for all possible values of $\|x\|_0$ in $\{1, 2, \dots, r\}$.
By computing the Pareto front instead of just one solution, we provide
the user with a set of solutions to choose from, representing different
tradeoffs between error and sparsity, and remove the need to define a
parameter k a priori. Note that, when $\mathrm{rank}(A) < r$, which
includes the underdetermined case $m < r$, the optimal solution
of Problem (2) is not necessarily unique. This does not change
the principles of the methods presented in this work; they
return one optimal solution among the possible ones.
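The dominance relation defined above can be illustrated with a small sketch in plain Python. The function name `pareto_front` and the toy (sparsity, error) values are ours and only mimic the shape of Figure 1; they are not results from this paper.

```python
def pareto_front(candidates):
    """Return the non-dominated (sparsity, error) pairs, sorted by sparsity.

    A pair is dominated if another pair is at least as good on both
    objectives and strictly better on at least one of them.
    """
    front = []
    best_error = float("inf")
    # After sorting by sparsity then error, a pair is non-dominated exactly
    # when its error is strictly lower than every error seen before it.
    for nnz, err in sorted(candidates):
        if err < best_error:
            front.append((nnz, err))
            best_error = err
    return front


# Toy values: the 4-sparse and 5-sparse solutions have the same error,
# so the 5-sparse one is dominated and dropped from the front.
toy = [(0, 10.0), (1, 6.0), (2, 3.5), (3, 2.0), (4, 1.5), (5, 1.5)]
front = pareto_front(toy)
```

With exact k-sparse errors as input, this filter only removes points such as the last one in Figure 1, where allowing one more nonzero entry brings no improvement.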
In this paper, we tackle Problem (3) exactly. In section II,
we briefly review existing approaches for sparse NNLS. In
section III, we describe the existing branch-and-bound algorithm
Arborescent (abbreviated Arbo), upon which our approach is
built. In section IV, we introduce our novel extension of Arbo
to tackle Problem (3) exactly. In section V, we present our
approach to leverage this extension in the context of sparse
MNNLS. In section VI, we illustrate the proposed method
with the unmixing of hyperspectral images.
The discreteness of the $\ell_0$-“norm” makes problems like (2)
combinatorial, and thus hard to solve. For this reason, many
approximate methods have been used.
The most common one is to use the $\ell_1$-norm as a convex
relaxation of the $\ell_0$-“norm” in order to leverage the efficient
algorithms and strong theoretical results from convex optimization.
The $\ell_1$-penalized problem
$\min_x \|Ax - b\|_2^2 + \lambda \|x\|_1$ is
called the LASSO, and several nonnegative variants have been
studied, see for example [5]. However, these methods suffer
from several drawbacks. Tuning the parameter λ to reach a
target sparsity can be tricky, especially in MNNLS where the
adequate λ can vary between columns. Although there exist
conditions under which $\ell_1$ methods are guaranteed to produce
a solution with the same support as the $\ell_0$ method, they are
quite restrictive in practice [6].
Greedy heuristics are also widely used. These methods start
with an empty support, and select entries one by one to add to
the support, until the target sparsity k is reached. The selection
is done greedily, by choosing at each iteration the entry that
maximizes the decrease of the error. Orthogonal variants make
sure that entries are not selected more than once; orthogonal
matching pursuit (OMP) and orthogonal least squares (OLS)
are the most popular algorithms. Recently, nonnegative variants
have been studied, see [7] and the references therein. They
solve (2) approximately, and recovery guarantees depend on
conditions that can be restrictive in practice.
A few methods were proposed to solve exactly $\ell_0$-constrained
problems, similar to (2) but with different constraints.
Reference [8] introduced a branch-and-cut algorithm
using continuous relaxations of the $\ell_0$-“norm”. It was later
extended and improved, see [9] and the references therein.
Reference [9] introduced mixed-integer programming (MIP)
formulations for several variants involving the $\ell_0$-“norm” (to
be able to solve them with a generic MIP solver), and [10]
proposed dedicated branch-and-bound algorithms to solve
them. Finally, [11] introduced a branch-and-bound algorithm
specifically designed for k-sparse NNLS; this is the foundation
our work is based upon.
In this section, we briefly describe the algorithm Arbo
introduced in [11]. This algorithm solves k-sparse NNLS
(2) exactly using a branch-and-bound strategy. Instead of
enumerating all possible supports, it uses the structure of the
problem to prune large parts of the search space. This search
space is mapped on a tree, see Figure 2. Every node of the tree
represents an over-support $\mathcal{K}$ of $x$, that is, the set of
entries of $x$ not constrained to be zero, with
$\mathcal{K} \subseteq \{1, 2, \dots, r\}$. Exploring
a node means solving the NNLS subproblem
$$f(\mathcal{K}) = \min_{x(\mathcal{K})} \|A(:, \mathcal{K})\, x(\mathcal{K}) - b\|_2^2 \quad \text{such that } x(\mathcal{K}) \geq 0,$$
where $x(\mathcal{K})$ is the subvector composed of the entries of $x$
indexed by $\mathcal{K}$. The value $f(\mathcal{K})$ is the error
associated with the node corresponding to $\mathcal{K}$. We solve the
NNLS subproblems using an active-set method [12]. On top of solving the
subproblems exactly, this method supports a warm start, that is,
it can be initialized at a given node with the solution from a
previous node. This significantly speeds up the computation as
the initial guess at each node is close to the optimal solution.
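[Figure 2: search tree. The root node is unconstrained ($k' \leq r = 5$).
Each level below constrains one more entry to zero, for example
$x = [0\ x_2\ x_3\ x_4\ x_5]$ and $x = [x_1\ 0\ x_3\ x_4\ x_5]$ at
$k' \leq 4$, $x = [0\ x_2\ 0\ x_4\ x_5]$ and $x = [0\ x_2\ x_3\ 0\ x_5]$ at
$k' \leq 3$, and $x = [0\ 0\ 0\ x_4\ x_5]$, $x = [0\ 0\ x_3\ 0\ x_5]$ and
$x = [0\ 0\ x_3\ x_4\ 0]$ at $k' \leq 2 = k$, where the exploration stops.]
Fig. 2. Example of the Arbo search tree, for $r = 5$ and $k = 2$.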
The root node represents the NNLS problem with no
sparsity constraint, and every descending node represents this
problem with one entry constrained to be zero. This is done
recursively, until reaching the nodes with k unconstrained
entries. The nodes at this depth are leaves of the search tree, and
represent feasible solutions to problem (2). To prune this tree,
we use the fact that in any optimization problem, when adding
constraints, the solution cannot improve. By construction, a
given node will always have an error greater than (or equal
to) the error of its parent node. When we reach a leaf, we
obtain a feasible solution whose error is an upper bound for
problem (2). Therefore, if a given node N has an error greater
than this bound, then all children nodes descending from N
will also have an error greater than the bound, and thus cannot
be optimal solutions; N can be pruned safely. Moreover, by
ordering the entries in the root node by ascending order and
then exploring depth-first and “left-first”, we first constrain to
zero the entries that are already close to zero in the standard
NNLS problem, and therefore that are more likely to be zero
in the constrained problem. This strategy quickly leads to good
feasible solutions and allows efficient pruning of large parts of
the search space. Other technical choices that are key in the
performance of the algorithm are detailed in [11].
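The pruning rule described above rests on a single monotonicity property, which can be written compactly as follows. Here $\mathcal{K}$ denotes the over-support of a node, $f$ is the node error defined earlier, and $f^{\star}$ is our notation (not the paper's) for the error of the best feasible k-sparse solution found so far.

```latex
\mathcal{K}' \subseteq \mathcal{K}
\;\Longrightarrow\;
f(\mathcal{K}') \geq f(\mathcal{K}),
\qquad \text{hence} \qquad
f(\mathcal{K}) > f^{\star}
\;\Longrightarrow\;
f(\mathcal{K}') > f^{\star}
\quad \text{for every } \mathcal{K}' \subseteq \mathcal{K}.
```

The first implication holds because restricting the over-support only adds constraints; the second is what allows a whole subtree to be discarded after exploring its root.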
Although Arbo was designed to solve the k-sparse NNLS
problem, it can be easily extended to compute the whole Pareto
front. Indeed, while exploring the search tree and computing
intermediary nodes, it also automatically computes the optimal
$k'$-sparse solutions for all $k' \in \{k, \dots, r\}$. Our extension
(that we call Arbo-Pareto) therefore consists in maintaining a list of
the optimal $k'$-sparse solutions, making a comparison for every
node explored, and updating the list when a better solution
is found. This barely affects the computational cost
of the algorithm as the cost of comparison and update is
negligible compared to the cost of the exploration of a node.
To show that Arbo indeed computes these solutions, we
prove by contradiction that it cannot prune an optimal
$k'$-sparse solution. Suppose the node γ is the optimal $k'$-sparse
solution for a given $k' > k$, and that it is pruned by Arbo.
Because it is pruned, its error must be larger than the error of
some feasible k-sparse solution α, $f(\gamma) > f(\alpha)$. However,
there necessarily exists an ancestor node of α that is $k'$-sparse;
we call it β. By construction, $f(\alpha) \geq f(\beta)$, so we have
$f(\gamma) > f(\beta)$, meaning that γ is not the optimal $k'$-sparse
solution. This contradicts the hypothesis.
Arbo-Pareto is detailed in Algorithm 1. NNLS refers to
the active-set method described above. The set $P$ is the
pool of nodes, initialized with the root node (with no entry
constrained) on line 4. A node is selected from $P$ on line 8,
and removed from $P$ on line 9. On line 10, the NNLS
subproblem restricted to the over-support $\mathcal{K}$ is solved using
the parent solution as initialization. If the error at the current
node is worse than that of the current best feasible k-sparse solution,
then no descending node can be optimal, and we prune the current
node (line 13). Otherwise, we continue the exploration. If the
sparsity target k is not reached, we generate one node for every
entry of the over-support (lines 16 and 17). We then compare
the error of the current node with the error of the current best
$k'$-sparse solution, on line 18. If it is lower, we update this
error (line 19) and the Pareto front (line 20).
Algorithm 1: Arbo-Pareto
    Input: $A \in \mathbb{R}^{m \times r}_+$, $b \in \mathbb{R}^m$, $k \in \{1, 2, \dots, r\}$
    Output: Pareto front $S$
     1  Init $\mathcal{K}_0 \leftarrow \{1, \dots, r\}$
     2  Init $x_0 \leftarrow \mathrm{NNLS}(A, b)$
     3  Sort the entries in $x_0$ in ascending order
     4  Init $P \leftarrow \{(\mathcal{K}_0, x_0)\}$
     5  Init $E_i \leftarrow +\infty$ for all $i \in \{k, \dots, r\}$
     6  Init $S_i \leftarrow \vec{0}$ for all $i \in \{k, \dots, r\}$
     7  while $P \neq \emptyset$ do
     8      $(\mathcal{K}, x_{parent}) \leftarrow$ a node selected from $P$
     9      $P \leftarrow P \setminus \{(\mathcal{K}, x_{parent})\}$
    10      $x, error \leftarrow \mathrm{NNLS}(A(:, \mathcal{K}), b, x_{parent}(\mathcal{K}))$
    11      $k' = \mathrm{size}(\mathcal{K})$
    12      if $error > E_k$ then
    13          prune (do nothing)
    14      else
    15          if $k' > k$ then
    16              foreach $i \in \mathcal{K}$ do
    17                  $P \leftarrow P \cup \{(\mathcal{K} \setminus \{i\}, x)\}$
    18          if $error < E_{k'}$ then
    19              $E_{k'} \leftarrow error$
    20              $S_{k'} \leftarrow x$
Note that, if $k = 1$, Arbo-Pareto computes the whole Pareto front.
Otherwise, it only computes the part with $k' \in \{k, \dots, r\}$. The
extended algorithm Arbo-Pareto can be used to solve sparse
NNLS problems in a biobjective way, but it can also be used
as a subroutine in an MNNLS algorithm with a matrix-wise
sparsity constraint, as described in the next section.
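The bookkeeping of Algorithm 1 can be sketched in a few lines of Python. This is an illustrative toy under several assumptions, not the Julia implementation used in our experiments: it assumes NumPy, it replaces the warm-started active-set solver by a hypothetical helper `nnls_small` that solves each subproblem by support enumeration (viable only for tiny r), it omits the initial sorting of entries, and it adds a `seen` set to avoid exploring the same over-support twice.

```python
from itertools import combinations

import numpy as np


def nnls_small(A, b):
    """Stand-in NNLS solver for tiny problems (support enumeration)."""
    r = A.shape[1]
    best_x, best_err = np.zeros(r), float(b @ b)
    for size in range(1, r + 1):
        for cols in combinations(range(r), size):
            idx = list(cols)
            coef, *_ = np.linalg.lstsq(A[:, idx], b, rcond=None)
            if np.any(coef < 0):
                continue
            x = np.zeros(r)
            x[idx] = coef
            err = float(np.sum((A @ x - b) ** 2))
            if err < best_err:
                best_x, best_err = x, err
    return best_x, best_err


def arbo_pareto(A, b, k=1):
    """Return {k': (error, x)} for k' in k..r, mirroring Algorithm 1."""
    r = A.shape[1]
    E = {i: float("inf") for i in range(k, r + 1)}  # best error per sparsity
    S = {i: np.zeros(r) for i in range(k, r + 1)}   # best solution per sparsity
    pool = [tuple(range(r))]                        # root: nothing constrained
    seen = set()                                    # our addition: skip repeats
    while pool:
        K = pool.pop()                              # depth-first exploration
        if K in seen:
            continue
        seen.add(K)
        x_K, err = nnls_small(A[:, list(K)], b)
        kp = len(K)
        if err > E[k]:
            continue                                # prune this node
        if kp > k:
            for i in K:                             # one child per entry of K
                pool.append(tuple(j for j in K if j != i))
        if err < E[kp]:
            E[kp] = err
            x = np.zeros(r)
            x[list(K)] = x_K
            S[kp] = x
    return {i: (E[i], S[i]) for i in E}
```

Calling `arbo_pareto(A, b, k=1)` returns one entry per sparsity level from 1 to r. The single dictionary update at the end of the loop is the only work added on top of the plain k-sparse search, which is why the extension comes at almost no extra cost.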
In MNNLS problems, which occur for example in alternating
algorithms for NMF, sparsity is usually enforced column-wise,
as follows,
$$\min_{X \geq 0} \|AX - B\|_F^2 \quad \text{such that } \forall j,\ \|X(:, j)\|_0 \leq k, \tag{4}$$
where $B \in \mathbb{R}^{m \times n}$ is a given data matrix,
$A \in \mathbb{R}^{m \times r}$ is a given dictionary,
$X \in \mathbb{R}^{r \times n}_+$ is the solution matrix we compute and
$X(:, j)$ denotes the $j$th column of $X$. Problem (4) can be
decomposed into n independent k-sparse NNLS problems of
the form (2).
Although this formulation is intuitive (a data point is
composed of at most k basis components), it can be limiting
in some contexts. Notably, when sparsity varies between
columns, setting the right k can be tricky. This is often the
case in hyperspectral images where pixels contain different
numbers of materials. To overcome this issue, we consider a
matrix-wise sparsity constraint,
$$\min_{X \geq 0} \|AX - B\|_F^2 \quad \text{such that } \|X\|_0 \leq q, \tag{5}$$
where q is a matrix-wise sparsity parameter, thus enforcing an
average sparsity $q/n$ on the columns of $X$.
Theoretically, we could solve problem (5) with any
algorithm for k-sparse NNLS (such as greedy algorithms, or
even Arbo), by vectorizing the problem. However, this would
not be tractable computationally for large instances, as the
resulting NNLS problem would have dimensions mn by rn.
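The dimensions quoted above follow from the standard vectorization identity, written here with the Kronecker product $\otimes$ and the $n \times n$ identity matrix $I_n$ (notation not used elsewhere in this paper):

```latex
\|AX - B\|_F^2
  = \big\| (I_n \otimes A)\, \mathrm{vec}(X) - \mathrm{vec}(B) \big\|_2^2,
\qquad
I_n \otimes A \in \mathbb{R}^{mn \times rn},
\qquad
\|\mathrm{vec}(X)\|_0 = \|X\|_0 \leq q .
```

Problem (5) is therefore a single q-sparse NNLS problem with $rn$ variables, which is what makes a direct combinatorial search impractical as n grows.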
To the best of our knowledge, only one previous work [13]
considered problem (5) in its matrix nature. It proposed an
algorithm to solve it approximately in two steps. First, it
applies a homotopy method to generate a regularization path
for every column, that is, a set of solutions representing
different tradeoffs between error and sparsity. This first step
is only approximate because the homotopy method relies on
an $\ell_1$-penalized formulation, so there is no guarantee that
the solutions computed correspond to the real Pareto front
of the biobjective k-sparse NNLS problem. Second, it selects
one solution per column to build a solution matrix $X$ that
minimizes the error while respecting a matrix-wise sparsity
constraint; this is done with a greedy-like algorithm, which is
very cheap but was shown to optimally solve the selection
subproblem.
Here, we use a similar approach, but we replace the homotopy
method of the first step by our algorithm Arbo-Pareto. We
call this new approach Arbo+sel. After computing the Pareto
front for every column, we build a cost matrix $C$ where every
entry $C(k', j)$ is the error of the $k'$-sparse solution (with $k'$
between k and r) of the $j$th column of $X$. Then, we select
one solution per column to build an optimal solution matrix
$X$, following the greedy-like method from [13]. In a nutshell,
we consider one cursor per column of $C$, and begin with all
cursors at zero, meaning that the zero vector is selected for
each column. Then, at each iteration, we choose one cursor to
increment such that the error decrease is maximized. We stop
when the sparsity target q is reached. This greedy selection
is globally optimal because the squared Frobenius norm is
separable by columns.
If Arbo-Pareto is run for each column with $k = 1$, then the
proposed algorithm Arbo+sel provides the globally optimal
solution for (5), because the selection subproblem is also
solved exactly.
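The cursor mechanism described above can be sketched as follows in plain Python. This is a simplified illustration under our own conventions, and whether it matches the method of [13] in every detail is an assumption: `costs[j][s]` holds the error of the best s-sparse solution of column j, with `costs[j][0]` the error of the zero vector, and the function name and toy values are hypothetical. The sketch also stops early when no increment reduces the error.

```python
def select_per_column(costs, q):
    """Greedy cursor selection of one sparsity level per column.

    costs[j][s] is the error of the best s-sparse solution for column j
    (s = 0 is the zero vector). Returns the chosen sparsity per column,
    with a total of at most q nonzero entries.
    """
    cursors = [0] * len(costs)  # every column starts at the zero vector
    for _ in range(q):
        best_gain, best_col = 0.0, None
        for j, col in enumerate(costs):
            s = cursors[j]
            if s + 1 < len(col):
                gain = col[s] - col[s + 1]  # error decrease of one more nonzero
                if gain > best_gain:
                    best_gain, best_col = gain, j
        if best_col is None:
            break  # no increment reduces the error any further
        cursors[best_col] += 1
    return cursors


# Toy cost table for 3 columns and r = 3 (values are made up, and the
# error decrease of each column shrinks as its sparsity grows).
toy_costs = [
    [10.0, 4.0, 1.0, 0.5],
    [8.0, 3.0, 2.0, 1.9],
    [5.0, 1.0, 0.9, 0.9],
]
chosen = select_per_column(toy_costs, q=4)
```

Each iteration scans one value per column, so the selection costs on the order of q times n operations, which is negligible next to the computation of the Pareto fronts.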
In this section, we study the performance of the proposed
approach Arbo+sel on the unmixing of 4 hyperspectral images.
A hyperspectral image can be represented as a matrix $B$
where each column corresponds to a pixel and each row to
a different wavelength. The r columns of the dictionary $A$
represent the spectral signature of the pure materials (also
called endmembers) present in the image [2]. Given $B$ and
$A$, we compute $X$, whose columns represent the abundance
of materials in each pixel. Most pixels contain only a few
endmembers [14], therefore it makes sense to enforce sparsity
on $X$. We consider the 4 widely used hyperspectral images^1
Samson, Jasper, Urban, and Cuprite, and we use as dictionaries
(that is, for the matrix $A$) the ground truths from [15]. The
characteristics of the data are summarized in Table I. The
number m corresponds to the number of wavelengths, n to
the number of pixels, and r to the number of endmembers in
the ground truth; $B \in \mathbb{R}^{m \times n}$ and
$A \in \mathbb{R}^{m \times r}$.
^1 Downloaded from set.html
Table I
    Dataset    m      n                      r
    Samson     156    95 × 95 = 9025         3
    Jasper     198    100 × 100 = 10000      4
    Urban      162    307 × 307 = 94249      6
    Cuprite    188    250 × 191 = 47750      12
Our method, described in section V, is denoted Arbo+sel. We
run Arbo-Pareto with $k = 1$ to compute the whole Pareto
front, and then apply the selection strategy to build $X$ with a
matrix-wise sparsity constraint. We compare the performance
of Arbo+sel with 3 other methods:
- An active-set algorithm that solves the NNLS problem
  with no sparsity constraint, denoted AS. This is equivalent
  to exploring only the root node in Arbo.
- The original Arbo algorithm with a column-wise
  k-sparsity constraint, denoted Arbo k-s.
- The algorithm from [13], which solves problem (5)
  approximately with a homotopy method followed by a
  matrix-wise selection. It is denoted Ht+sel.
All algorithms are implemented in Julia. They are
single-threaded, and executed on a computer with an Intel
Core i5-8350U processor @1.70GHz. Source code and scripts are
provided in an online repository.
For every dataset, we run the 4 algorithms and measure the
average column sparsity of the solutions (number of entries
larger than $10^{-3}$ divided by the number of columns, after
a normalization of the columns so that the maximum per
column is 1), the relative error $\|AX - B\|_F / \|B\|_F$, and the
running time (median over 10 runs). Jasper and Urban are processed
once with all algorithms for $k = q/n = 2$, and once with
Ht+sel and Arbo+sel for $q/n = 1.8$, which is not possible
with the other algorithms.
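For clarity, the two quality measures defined above can be written as a short sketch in plain Python, with matrices stored as lists of rows. The function names are ours, and the choice to let all-zero columns contribute no entry is an assumption. The relative error is returned as a ratio, to be multiplied by 100 for a value in percent.

```python
import math


def average_column_sparsity(X, threshold=1e-3):
    """Average number of entries above `threshold` per column of X.

    Each column is first scaled so that its maximum entry equals 1.
    X is an r-by-n matrix stored as a list of rows.
    """
    r, n = len(X), len(X[0])
    count = 0
    for j in range(n):
        col = [X[i][j] for i in range(r)]
        peak = max(col)
        if peak <= 0:
            continue  # an all-zero column contributes no nonzero entry
        count += sum(1 for v in col if v / peak > threshold)
    return count / n


def relative_error(A, X, B):
    """Frobenius-norm ratio ||AX - B||_F / ||B||_F."""
    m, r, n = len(A), len(X), len(B[0])
    num = den = 0.0
    for i in range(m):
        for j in range(n):
            ax = sum(A[i][p] * X[p][j] for p in range(r))
            num += (ax - B[i][j]) ** 2
            den += B[i][j] ** 2
    return math.sqrt(num / den)
```

The per-column normalization makes the threshold relative, so a pixel with uniformly small abundances is not counted as empty.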
The results of the experiments for the unmixing of hyperspectral
images are shown in Table II. Time is in seconds and
the relative error is in percent. Without sparsity constraint,
we observe that the results are already quite sparse. The
column-wise k-sparse method Arbo k-s produces solutions with
an average sparsity below the target k, meaning that many
columns are actually sparser than the target. Logically, all
methods enforcing sparsity increase the reconstruction error.
However, this loss is limited for Ht+sel and even smaller for
Arbo+sel. In the reported figures, Arbo+sel is always at least as
good as the other sparsity-enforcing methods, which is expected as
it is the only one that solves problem (5) exactly. Arbo-based
methods show an increase in computing time, but it is reasonable
for the datasets with small r. The overcost of Arbo+sel compared to
Arbo k-s is due to the fact that, for the former, we set $k = 1$ to
generate the whole Pareto front. When r grows, the computing time
grows exponentially; this is the main limitation of our method.
However, in applications such as hyperspectral unmixing, r
is typically small. Also, these applications are generally not
in real-time, thus the computing time is not critical, and the
overcost of our method can be accepted when an exact result
is needed.
Table II
    Dataset                      Metric       AS      Arbo k-s   Ht+sel   Arbo+sel
    Samson (r = 3, k = 2)        Time         0.10    0.19       0.25     0.43
                                 Rel error    3.30    3.40       3.30     3.30
                                 Sparsity     2.19    1.83       2.0      2.0
    Jasper (r = 4, k = 2)        Time         0.13    0.42       0.40     0.80
                                 Rel error    5.71    6.18       5.72     5.71
                                 Sparsity     2.23    1.78       2.0      2.0
    Jasper (r = 4, q/n = 1.8)    Time         -       -          0.39     0.78
                                 Rel error    -       -          5.95     5.74
                                 Sparsity     -       -          1.8      1.8
    Urban (r = 6, k = 2)         Time         2.19    13.26      6.66     29.63
                                 Rel error    7.67    8.27       7.83     7.71
                                 Sparsity     2.62    1.83       2.0      2.0
    Urban (r = 6, q/n = 1.8)     Time         -       -          6.52     29.22
                                 Rel error    -       -          8.22     7.80
                                 Sparsity     -       -          1.8      1.8
    Cuprite (r = 12, k = 4)      Time         1.53    224.23     6.82     1408.5
                                 Rel error    1.74    1.94       2.01     1.83
                                 Sparsity     6.60    3.81       4.0      4.0
[Figure 3: four abundance maps. (a) Active-set (no sparsity constraint);
(b) Arbo with $k = 2$; (c) Ht+sel with $q/n = 1.8$; (d) Arbo+sel with
$q/n = 1.8$.]
Fig. 3. Abundance maps of the sixth endmember from the unmixing of
the Urban hyperspectral image (that is, sixth row of $X$ reshaped) by several
algorithms.
Some abundance maps of one endmember of the Urban
image are shown in Figure 3. This endmember corresponds to pixels
from rooftops. Visually, with no sparsity constraint, the image
is rather noisy and it includes pixels from other materials.
The column-wise Arbo removes some of the noise from other
endmembers, but it also adds noise to the rooftop pixels (some
zones are blurry and pixelated). Ht+sel removes most of the
noise, but it also loses a lot of information from the rooftop
pixels (some relevant zones are blacked out). Arbo+sel significantly
reduces the noise while preserving most of the rooftop
pixels, with a more distinct separation.
We proposed Arbo-Pareto, an extension to the algorithm
Arborescent to compute the Pareto front of the biobjective
k-sparse NNLS problem, that is, the set of optimal $k'$-sparse
solutions for different values of $k'$. We also proposed Arbo+sel, a
way to leverage this extension to solve exactly multiple
right-hand sides NNLS with a matrix-wise sparsity constraint, by
computing a Pareto front for every column and then applying
an optimal selection strategy. We showed that, for a modest
increase in computing cost, Arbo+sel brings improvement over
existing methods in the unmixing of hyperspectral images. It
scales well and is applicable to large datasets, as long as the
rank r is small.
[1] D. D. Lee and H. S. Seung, “Unsupervised learning by convex and conic
coding,” in Advances in Neural Information Processing Systems, 1997,
pp. 515–521.
[2] J. M. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader,
and J. Chanussot, “Hyperspectral unmixing overview: Geometrical,
statistical, and sparse regression-based approaches,” IEEE Journal of
Selected Topics in Applied Earth Observations and Remote Sensing,
vol. 5, no. 2, pp. 354–379, 2012.
[3] N. Gillis, Nonnegative Matrix Factorization. SIAM, 2020.
[4] C. L. Byrne, Applied Iterative Methods. AK Peters, 2008.
[5] P. O. Hoyer, “Non-negative sparse coding,” in IEEE Workshop on Neural
Networks for Signal Processing, 2002, pp. 557–565.
[6] J. E. Cohen and N. Gillis, “Nonnegative Low-rank Sparse Component
Analysis,” in ICASSP, 2019, pp. 8226–8230.
[7] T. T. Nguyen, J. Idier, C. Soussen, and E.-H. Djermoune,
“Non-Negative Orthogonal Greedy Algorithms,” IEEE Transactions on
Signal Processing, pp. 1–16, 2019.
[8] D. Bienstock, “Computational study of a family of mixed-integer
quadratic programming problems,” Mathematical Programming, vol. 74,
no. 2, pp. 121–140, 1996.
[9] S. Bourguignon, J. Ninin, H. Carfantan, and M. Mongeau, “Exact Sparse
Approximation Problems via Mixed-Integer Programming: Formulations
and Computational Performance,” IEEE Transactions on Signal
Processing, vol. 64, no. 6, pp. 1405–1419, 2016.
[10] R. B. Mhenni, S. Bourguignon, and J. Ninin, “Global Optimization for
Sparse Solution of Least Squares Problems,” Preprint hal-02066368,
[11] N. Nadisic, A. Vandaele, N. Gillis, and J. E. Cohen, “Exact Sparse
Nonnegative Least Squares,” in ICASSP, 2020, pp. 5395–5399.
[12] L. F. Portugal, J. J. Judice, and L. N. Vicente, “A comparison of block
pivoting and interior-point algorithms for linear least squares problems
with nonnegative variables,” Mathematics of Computation, vol. 63,
no. 208, pp. 625–643, 1994.
[13] N. Nadisic, A. Vandaele, and N. Gillis, “A homotopy-based algorithm
for sparse multiple right-hand sides nonnegative least squares,” Preprint
arXiv:2011.11066, 2020.
[14] W.-K. Ma, J. M. Bioucas-Dias, T.-H. Chan, N. Gillis, P. Gader, A. J.
Plaza, A. Ambikapathi, and C.-Y. Chi, “A signal processing perspective
on hyperspectral unmixing: Insights from remote sensing,” IEEE Signal
Processing Magazine, vol. 31, no. 1, pp. 67–81, 2013.
[15] F. Zhu, “Hyperspectral unmixing: ground truth labeling, datasets,
benchmark performances and survey,” Preprint arXiv:1708.05125, 2017.
ResearchGate has not been able to resolve any citations for this publication.
Full-text available
Orthogonal greedy algorithms are popular sparse signal reconstruction algorithms. Their principle is to select atoms one by one. A series of unconstrained least-square subproblems of gradually increasing size is solved to compute the approximation coefficients, which is efficiently performed using a fast recursive update scheme. When dealing with nonnegative sparse signal reconstruction, a series of $non-negative$ least-squares subproblems have to be solved. Fast implementation becomes tricky since each subproblem does not have a closedform solution anymore. Recently, non-negative extensions of the classical orthogonal matching pursuit and orthogonal least squares algorithms were proposed, using slow (i.e., non-recursive) or recursive but inexact implementations. In this paper, we revisit these algorithms in a unified way. We define a class of non-negative orthogonal algorithms and exhibit their structural properties. We propose a fast and exact implementation based on the active-set resolution of non-negative least-squares and exploiting warm start initializations. The algorithms are assessed in terms of accuracy and computational complexity for a sparse spike deconvolution problem. We also present an application to near-infrared spectra decomposition.
Full-text available
Blind hyperspectral unmixing (HU), also known as unsupervised HU, is one of the most prominent research topics in signal processing (SP) for hyperspectral remote sensing [1], [2]. Blind HU aims at identifying materials present in a captured scene, as well as their compositions, by using high spectral resolution of hyperspectral images. It is a blind source separation (BSS) problem from a SP viewpoint. Research on this topic started in the 1990s in geoscience and remote sensing [3]-[7], enabled by technological advances in hyperspectral sensing at the time. In recent years, blind HU has attracted much interest from other fields such as SP, machine learning, and optimization, and the subsequent cross-disciplinary research activities have made blind HU a vibrant topic. The resulting impact is not just on remote sensing - blind HU has provided a unique problem scenario that inspired researchers from different fields to devise novel blind SP methods. In fact, one may say that blind HU has established a new branch of BSS approaches not seen in classical BSS studies. In particular, the convex geometry concepts - discovered by early remote sensing researchers through empirical observations [3]-[7] and refined by later research - are elegant and very different from statistical independence-based BSS approaches established in the SP field. Moreover, the latest research on blind HU is rapidly adopting advanced techniques, such as those in sparse SP and optimization. The present development of blind HU seems to be converging to a point where the lines between remote sensing-originated ideas and advanced SP and optimization concepts are no longer clear, and insights from both sides would be used to establish better methods.
Full-text available
Imaging spectrometers measure electromagnetic energy scattered in their instantaneous field view in hundreds or thousands of spectral channels with higher spectral resolution than multispectral cameras. Imaging spectrometers are therefore often referred to as hyperspectral cameras (HSCs). Higher spectral resolution enables material identification via spectroscopic analysis, which facilitates countless applications that require identifying materials in scenarios unsuitable for classical spectroscopic analysis. Due to low spatial resolution of HSCs, microscopic material mixing, and multiple scattering, spectra measured by HSCs are mixtures of spectra of materials in a scene. Thus, accurate estimation requires unmixing. Pixels are assumed to be mixtures of a few materials, called endmembers. Unmixing involves estimating all or some of: the number of endmembers, their spectral signatures, and their abundances at each pixel. Unmixing is a challenging, ill-posed inverse problem because of model inaccuracies, observation noise, environmental conditions, endmember variability, and data set size. Researchers have devised and investigated many models searching for robust, stable, tractable, and accurate unmixing algorithms. This paper presents an overview of unmixing methods from the time of Keshava and Mustard's unmixing tutorial [1] to the present. Mixing models are first discussed. Signal-subspace, geometrical, statistical, sparsity-based, and spatial-contextual unmixing algorithms are described. Mathematical problems and potential solutions are described. Algorithm characteristics are illustrated experimentally.
We present computational experience with a branch-and-cut algorithm to solve quadratic programming problems where there is an upper bound on the number of positive variables. Such problems arise in financial applications. The algorithm solves the largest real-life problems in a few minutes of run-time.
1 Introduction. We are interested in optimization problems QMIP of the form: $\min\; x^T Q x + c^T x$ s.t. $Ax \ge b$ (1), $|\mathrm{supp}(x)| \le K$ (2), $0 \le x_j \le u_j$ for all $j$ (3), where $x$ is an $n$-vector, $Q$ is a symmetric positive-semidefinite matrix, $\mathrm{supp}(x) = \{j : x_j > 0\}$, and $K$ is a positive integer. Problems of this type are of interest in portfolio optimization. Briefly, variables in the problem correspond to commodities to be bought, the objective is a measure of "risk", the constraints (1) prescribe levels of "performance", and constraint (2) specifies that not too many different types of commodities can be chosen. All data is derived from statistical information. A good deal of previous work ha...
Unsupervised learning algorithms based on convex and conic encoders are proposed. The encoders find the closest convex or conic combination of basis vectors to the input. The learning algorithms produce basis vectors that minimize the reconstruction error of the encoders. The convex algorithm develops locally linear models of the input, while the conic algorithm discovers features. Both algorithms are used to model handwritten digits and compared with vector quantization and principal component analysis. The neural network implementations involve feedback connections that project a reconstruction back to the input layer.
1 Introduction. Vector quantization (VQ) and principal component analysis (PCA) are two widely used unsupervised learning algorithms, based on two fundamentally different ways of encoding data. In VQ, the input is encoded as the index of the closest prototype stored in memory. In PCA, the input is encoded as the coefficients of a linear superposition of a set of basis ...
Finding solutions to least-squares problems with low cardinality has found many applications, including portfolio optimization, subset selection in statistics, and inverse problems in signal processing. Although most works consider local approaches that scale with high-dimensional problems, some others address its global optimization via mixed integer programming (MIP) reformulations. We propose dedicated branch-and-bound methods for the exact resolution of moderate-size, yet difficult, sparse optimization problems, through three possible formulations: cardinality-constrained and cardinality-penalized least-squares, and cardinality minimization under quadratic constraints. A specific tree exploration strategy is built. Continuous relaxation problems involved at each node are reformulated as ℓ1-norm-based optimization problems, for which a dedicated algorithm is designed. The obtained certified solutions are shown to better estimate sparsity patterns than standard methods on simulated variable selection problems involving highly correlated variables. Problem instances selecting up to 24 components among 100 variables, and up to 15 components among 1000 variables, can be solved in less than 1000 s. Unguaranteed solutions obtained by limiting the computing time to 1s are also shown to provide competitive estimates. Our algorithms strongly outperform the CPLEX MIP solver as the dimension increases, especially for quadratically-constrained problems. The source codes are made freely available online.
Sparse approximation addresses the problem of approximately fitting a linear model with a solution having as few non-zero components as possible. While most sparse estimation algorithms rely on suboptimal formulations, this work studies the performance of exact optimization of ℓ0-norm-based problems through Mixed-Integer Programs (MIPs). Nine different sparse optimization problems are formulated based on ℓ1, ℓ2 or ℓ∞ data misfit measures, and involving either constrained or penalized formulations. For each problem, MIP reformulations allow exact optimization, with optimality proof, for moderate-size yet difficult sparse estimation problems. Algorithmic efficiency of all formulations is evaluated on sparse deconvolution problems. This study promotes error-constrained minimization of the ℓ0 norm as the most efficient choice when associated with ℓ1 and ℓ∞ misfits, while the ℓ2 misfit is more efficiently optimized with sparsity-constrained and sparsity-penalized problems. Exact ℓ0-norm optimization is shown to outperform classical methods in terms of solution quality, both for over- and underdetermined problems. Numerical simulations emphasize the relevance of the different ℓp fitting possibilities as a function of the noise statistical distribution. Such exact approaches are shown to be an efficient alternative, in moderate dimension, to classical (suboptimal) sparse approximation algorithms with ℓ2 data misfit. They also provide an algorithmic solution to less common sparse optimization problems based on ℓ1 and ℓ∞ misfits. For each formulation, simulated test problems are proposed where optima have been successfully computed. Data and optimal solutions are made available as potential benchmarks for evaluating other sparse approximation methods.
We discuss the use of block principal pivoting and predictor-corrector methods for the solution of large-scale linear least squares problems with nonnegative variables (NVLSQ). We also describe two implementations of these algorithms that are based on the normal equations and corrected seminormal equations (CSNE) approaches. We show that the method of normal equations should be employed in the implementation of the predictor-corrector algorithm. This type of approach should also be used in the implementation of the block principal pivoting method, but a switch to the CSNE method may be useful in the last iterations of the algorithm. Computational experience is also included in this paper and shows that both the predictor-corrector and the block principal pivoting algorithms are quite efficient to deal with large-scale NVLSQ problems.
Non-negative sparse coding is a method for decomposing multivariate data into non-negative sparse components. We briefly describe the motivation behind this type of data representation and its relation to standard sparse coding and non-negative matrix factorization. We then give a simple yet efficient multiplicative algorithm for finding the optimal values of the hidden components. In addition, we show how the basis vectors can be learned from the observed data. Simulations demonstrate the effectiveness of the proposed method.
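The multiplicative algorithm mentioned in this abstract can be illustrated with a minimal sketch. It assumes the penalized objective 0.5·||x − As||² + λ·Σ s_j with a fixed nonnegative basis A, a nonnegative input x, and λ > 0; the update shown is the standard multiplicative rule for this objective, not necessarily the exact variant used in the cited paper. The names `nn_sparse_code`, `objective`, and `lam` are hypothetical and chosen for illustration only.

```python
def nn_sparse_code(A, x, lam=0.1, iters=200):
    """Sketch: multiplicative updates for
    min_{s >= 0} 0.5 * ||x - A s||^2 + lam * sum(s).

    A is a list of m rows of length r with nonnegative entries,
    x is a nonnegative list of length m, and lam must be > 0
    (assumption: this keeps every denominator strictly positive).
    """
    if lam <= 0:
        raise ValueError("lam must be positive in this sketch")
    m, r = len(A), len(A[0])
    # Precompute A^T x and the Gram matrix A^T A once.
    Atx = [sum(A[i][j] * x[i] for i in range(m)) for j in range(r)]
    AtA = [[sum(A[i][j] * A[i][k] for i in range(m)) for k in range(r)]
           for j in range(r)]
    s = [1.0] * r  # strictly positive start; zero entries would stay zero
    for _ in range(iters):
        AtAs = [sum(AtA[j][k] * s[k] for k in range(r)) for j in range(r)]
        # All coefficients are rescaled simultaneously and remain nonnegative.
        s = [s[j] * Atx[j] / (AtAs[j] + lam) for j in range(r)]
    return s


def objective(A, x, s, lam):
    """Penalized reconstruction error that the updates try to decrease."""
    residual = [x[i] - sum(A[i][j] * s[j] for j in range(len(s)))
                for i in range(len(x))]
    return 0.5 * sum(v * v for v in residual) + lam * sum(s)


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    x = [1.0, 0.0, 1.0]
    s = nn_sparse_code(A, x, lam=0.1, iters=500)
    print(s, objective(A, x, s, 0.1))
```

Because each step only rescales the coefficients by nonnegative factors, no projection is needed to keep the code nonnegative. The penalty weight `lam` trades reconstruction error against sparsity, which is a different mechanism from the hard k-sparse constraint discussed in the main paper.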
N. Gillis, Nonnegative Matrix Factorization. SIAM, 2020.