Randomized Successive Projection Algorithm
Olivier VU THANH, Nicolas NADISIC, Nicolas GILLIS
Département de Mathématique et Recherche Opérationnelle, Université de Mons
Rue de Houdain 9, 7000 Mons, Belgique.
Résumé L’algorithme de projections successives (SPA) est un algorithme standard pour la factorisation non-négative de matrices (NMF). Il
est basé sur l’hypothèse de séparabilité. En démélange hyperspectral, c’est-à-dire l’extraction des matériaux dans une image hyperspectrale, la
séparabilité est équivalente à l’hypothèse du pixel pur et stipule que, pour chaque matériau présent dans l’image, il existe au moins un pixel
composé uniquement de ce matériau. SPA est rapide et a été prouvé robuste au bruit, mais il est sensible aux valeurs aberrantes (outliers). Aussi,
il est déterministe, et donc pour un problème donné il produit toujours la même solution. Or, il a été démontré empiriquement que l’algorithme
non-déterministe vertex component analysis (VCA), lorsqu’il est exécuté un nombre assez grand de fois, produit souvent au moins une solution
qui est meilleure que la solution de SPA. Dans cet article, nous cherchons à combiner ces qualités et introduisons une version aléatoire de SPA,
appelée RandSPA, qui produit des résultats potentiellement différents à chaque exécution. Il peut être exécuté plusieurs fois pour conserver la
meilleure solution, et il est encore garanti robuste au bruit. Des expériences de démélange d’images hyperspectrales montrent que la meilleure
solution sur plusieurs exécutions de RandSPA est généralement meilleure que la solution du SPA original.
Abstract The successive projection algorithm (SPA) is a widely used algorithm for nonnegative matrix factorization (NMF). It is based on
the separability assumption. In hyperspectral unmixing, that is, the extraction of materials in a hyperspectral image, separability is equivalent to
the pure-pixel assumption and states that for each material present in the image there exists at least one pixel composed of only this material.
SPA is fast and provably robust to noise, but it is not robust to outliers. Also, it is deterministic, so for a given setting it always produces the same
solution. Yet, it has been shown empirically that the non-deterministic algorithm vertex component analysis (VCA), when run sufficiently many
times, often produces at least one solution that is better than the solution of SPA. In this paper, we combine the best of both worlds and introduce
a randomized version of SPA dubbed RandSPA, that produces potentially different results at each run. It can be run several times to keep the best
solution, and it is still provably robust to noise. Experiments on the unmixing of hyperspectral images show that the best solution among several runs of RandSPA is generally better than the solution of vanilla SPA.
1 Introduction
Nonnegative matrix factorization (NMF) is a linear dimension-
ality reduction technique that became a standard tool to extract
latent structures in nonnegative data. Given an input matrix $X \in \mathbb{R}^{m \times n}_+$ and a factorization rank $r < \min(m, n)$, NMF consists in finding two factors $W \in \mathbb{R}^{m \times r}_+$ and $H \in \mathbb{R}^{r \times n}_+$ such that $X \approx WH$. The columns of $X$ are called data points, and if $H$ is column-stochastic then the columns of $W$ can be seen as the vertices of a convex hull containing the data points. Applications of NMF include feature extraction in images, topic modeling, audio source separation, chemometrics, and blind hyperspectral unmixing (HU); see for example [4] and the references therein. Blind HU consists in identifying the materials present in a hyperspectral image as well as their distribution over the pixels of the image.
In general, NMF is NP-hard [12]. However, under the sepa-
rability assumption, it is solvable in polynomial time [2]. This
assumption states that for every vertex (column of W), there
exists at least one data point (column of X) equal to this ver-
tex. In blind HU, this is known as the pure-pixel assumption
and means that for each material, there is at least one pixel
composed almost purely of this material. Many algorithms
have been introduced that leverage this assumption, see for in-
stance [5, Chapter 7] and the references therein. Recently, al-
gorithms for separable NMF that are provably robust to noise
have been introduced [2]. One of the most widely used is the
successive projection algorithm (SPA) [1].
SPA is robust to noise and generally works well in practice.
However, it suffers from several drawbacks, notably a sensitivity to outliers. SPA is deterministic, that is, for a given problem it gives the same result at every run. It is also greedy, in the sense that it extracts vertices sequentially, so an error at a given iteration cannot be compensated in the following iterations. In this paper, we aim at addressing the sensitivity to outliers by designing a non-deterministic variant of SPA that can be run several times, in the hope that at least one run will not extract the outliers.
Let us discuss an observation from [10]. The separable NMF
algorithm called vertex component analysis (VCA) [11] in-
cludes a random projection, therefore it is non-deterministic
and at each run it produces potentially a different result. VCA
is simpler and its guarantees are weaker than those of SPA, and
the experiments in [10] show that VCA performs worse than
SPA on average, but they also show that the best result of VCA
over many runs is in most cases better than the result of SPA
in terms of reconstruction error. This observation is our main
motivation to design a non-deterministic variant of SPA, that
we coin as randomized SPA (RandSPA).
This paper is organized as follows. In section 2 we introduce the general recursive algorithm for separable NMF analyzed in [7], which generalizes SPA. In section 3 we present the main contribution of this paper, namely a randomized variant of SPA called RandSPA. We show that the theoretical robustness-to-noise results of SPA still hold for RandSPA, while the randomization handles outliers better by allowing diversity in the produced solutions. In section 4 we illustrate the advantages of our method with experiments on both synthetic data sets and the unmixing of hyperspectral images.
2 Successive Projection Algorithm
In this section, we discuss the successive projection algorithm (SPA). It is based on the separability assumption, detailed below.

Assumption 1 (Separability) The $m$-by-$n$ matrix $X \in \mathbb{R}^{m \times n}_+$ is $r$-separable if there exists a nonnegative matrix $H$ such that $X = X(:,\mathcal{J})H$, where $X(:,\mathcal{J})$ denotes the subset of columns of $X$ indexed by $\mathcal{J}$ and $|\mathcal{J}| = r$.
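As a small illustration of Assumption 1 (our own toy construction, not from the paper), one can build a 3-separable matrix explicitly: the first $r$ data points are the vertices themselves, and every other column is a convex combination of them.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r, n = 5, 3, 8

W = rng.random((m, r))      # r vertices (in blind HU: pure spectral signatures)
H = rng.random((r, n))
H[:, :r] = np.eye(r)        # the first r data points are the vertices themselves
H /= H.sum(axis=0)          # make H column-stochastic (nonnegative, columns sum to 1)

X = W @ H                   # X is r-separable with J = {0, 1, 2}: X = X[:, J] @ H
```

Here $X(:,\mathcal{J}) = W$, so the separable factorization $X = X(:,\mathcal{J})H$ holds exactly.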
The pseudocode for a general recursive algorithm for separa-
ble NMF is given in Algorithm 1. Historically, the first variant
of Algorithm 1 was introduced by Araújo et al. [1] for spectroscopic component analysis with $f(x) = \|x\|_2^2$, which is the so-called SPA. In the noiseless case, that is, under Assumption 1, SPA is guaranteed to retrieve $\mathcal{J}$ and, more generally, the vertices of the convex hull of the columns of $X$ [9]. This particular choice of $f$ is proved to be the most robust to noise given the bounds in [7]; see Theorem 1 with $Q = I$ for the error bounds. The algorithm is iterative and is
composed of the following two main steps:
- Selection step: the column that maximizes a given function $f$ is selected (line 3).
- Projection step: all the columns are projected onto the orthogonal complement of the currently selected columns (line 5).
These two steps are repeated $r$ times, $r$ being the target number of extracted columns. The drawback of the 2-norm is its sensitivity to outliers, together with the fact that it makes SPA deterministic: if some outliers are selected, running SPA again will retrieve the exact same outliers.
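For concreteness, the two steps above can be written in a few lines. This is a Python sketch of Algorithm 1 with $f(x) = \|x\|_2^2$ (the paper's own implementation is in Julia); the deflation-style residual update is one standard way to realize the projection step.

```python
import numpy as np

def spa(X, r):
    """Vanilla SPA: return r column indices of X (Algorithm 1 with f = squared 2-norm)."""
    R = X.astype(float).copy()  # residual: columns projected onto the current
                                # orthogonal complement of the selected columns
    J = []
    for _ in range(r):
        # Selection step: pick the column with the largest squared 2-norm.
        j = int(np.argmax((R ** 2).sum(axis=0)))
        J.append(j)
        # Projection step: project all columns onto the orthogonal
        # complement of the selected column.
        v = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(v, v @ R)
    return J
```

On a noiseless separable matrix, `spa` recovers the vertex indices, and running it twice gives the exact same output, which is the determinism discussed above.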
3 Randomized SPA
In this section, we introduce the main contribution of this work,
that is a randomized variant of SPA called RandSPA. Its key
Algorithm 1: Recursive algorithm for separable NMF [7]. It coincides with SPA when $f(x) = \|x\|_2^2$.
Input: An $r$-separable matrix $X \in \mathbb{R}^{m \times n}$, a function $f$ to maximize.
Output: Index set $\mathcal{J}$ of cardinality $r$ such that $X \approx X(:,\mathcal{J})H$ for some $H \geq 0$.
1. Let $\mathcal{J} = \{\}$, $P = I_m$, $V = [\,]$.
2. for $k = 1 : r$ do
3.   Let $j_k = \arg\max_{1 \leq j \leq n} f(PX(:, j))$. (Break ties arbitrarily, if necessary.)
4.   Let $\mathcal{J} = \mathcal{J} \cup \{j_k\}$.
5.   Update the projector $P$ onto the orthogonal complement of $X(:,\mathcal{J})$:
     $v_k = \dfrac{PX(:, j_k)}{\|PX(:, j_k)\|_2}$, $\quad V = [V \ v_k]$, $\quad P = I_m - VV^\top$.
features are that it computes potentially different solutions at
each run, thus allowing a multi-start strategy, and that the the-
oretical robustness results of SPA still hold.
RandSPA follows Algorithm 1 with $f(x) = x^\top QQ^\top x$, where $Q \in \mathbb{R}^{m \times \nu}$ is a randomly generated matrix with $\nu \geq r$. To control the conditioning of $Q$, we generate the columns of $Q$ so that they are mutually orthogonal and such that
$$\|Q(:,1)\|_2 = 1, \quad \ldots, \quad \|Q(:,\nu)\|_2 = 1/\kappa,$$
where $\kappa$ is the desired condition number of $Q$. For the columns between the first and the last one, we make the arbitrary choice to also fix their norms to $1/\kappa$. If $Q^\top W$ has full column rank, which happens with probability one if $\nu \geq r$, RandSPA is robust to noise with the following bounds:
Theorem 1 [6, Corollary 1] Let $\tilde{X} = X + N$, where $X$ satisfies Assumption 1, $W$ has full column rank, and $N$ is noise with $\max_j \|N(:, j)\|_2 \leq \epsilon$; and let $Q \in \mathbb{R}^{m \times \nu}$ with $\nu \geq r$. If $Q^\top W$ has full column rank and
$$\epsilon \leq \mathcal{O}\!\left(\frac{\sigma_{\min}(W)}{\sqrt{r}\,\kappa(Q^\top W)^{2}}\right),$$
then SPA applied on the matrix $Q^\top \tilde{X}$ identifies a set of indices $\mathcal{J}$ corresponding to the columns of $W$ up to the error
$$\|W(:, j) - \tilde{X}(:, k)\|_2 \leq \mathcal{O}\!\left(\epsilon\,\kappa(W)\,\kappa(Q^\top W)^{3}\right).$$
Theorem 1 is directly applicable to RandSPA, since choosing $f(x) = x^\top QQ^\top x$ is equivalent to performing SPA on $Q^\top \tilde{X}$. The only subtlety is that, with RandSPA, a random $Q$ is drawn at each column extraction; the error bound for RandSPA is then the one corresponding to the largest drawn $\kappa(Q^\top W)$. Let us note that choosing $\nu = 1$, or $\|Q(:, j)\|_2 = 1/\kappa$ with $\kappa \to \infty$ for all $j > 1$, retrieves VCA, while choosing $\nu = m$ and $\kappa(Q) = 1$ retrieves SPA. Hence, RandSPA creates a continuum between SPA, with stronger provable robustness, and VCA, with more solution diversity.
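The construction of $Q$ and the resulting selection step can be sketched as follows (a hedged Python sketch of our reading of the method; the actual implementation is in Julia). A fresh $Q$ with mutually orthogonal columns of prescribed norms is drawn at each extraction, and the selected column maximizes $f(x) = x^\top QQ^\top x = \|Q^\top x\|_2^2$.

```python
import numpy as np

def random_Q(m, nu, kappa, rng):
    """Random Q in R^{m x nu} with mutually orthogonal columns whose norms are
    1, 1/kappa, ..., 1/kappa, so that the condition number of Q equals kappa."""
    U, _ = np.linalg.qr(rng.standard_normal((m, nu)))  # orthonormal columns
    norms = np.full(nu, 1.0 / kappa)
    norms[0] = 1.0
    return U * norms                                   # rescale the columns

def randspa(X, r, nu, kappa, rng):
    """RandSPA sketch: Algorithm 1 with f(x) = ||Q^T x||_2^2 and a fresh Q per step."""
    R = X.astype(float).copy()
    J = []
    for _ in range(r):
        Q = random_Q(X.shape[0], nu, kappa, rng)
        j = int(np.argmax(((Q.T @ R) ** 2).sum(axis=0)))  # f(x) = x^T Q Q^T x
        J.append(j)
        v = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(v, v @ R)
    return J
```

With $\nu = m$ and $\kappa = 1$, $Q$ is orthogonal and $\|Q^\top x\|_2 = \|x\|_2$, so the selection coincides with SPA; with $\nu = 1$, the selection is a VCA-style random projection.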
4 Numerical experiments
In this section, we study empirically the performance of the
proposed algorithm RandSPA on the unmixing of hyperspectral
images. The algorithms have been implemented in Julia [3]. Our code is available in an online repository, along with the data and test scripts used in our experiments. Our tests are performed on 5 real hyperspectral datasets described in Table 1.
Dataset m n r
Jasper 198 100 ×100 = 10000 4
Samson 156 95 ×95 = 9025 3
Urban 162 307 ×307 = 94249 5
Cuprite 188 250 ×191 = 47750 12
San Diego 188 400 ×400 = 160000 8
Table 1: Summary of the datasets, for which $X \in \mathbb{R}^{m \times n}$.
For all the tests, we choose $\nu = r + 1$ and a relatively well-conditioned $Q$ with $\kappa(Q) = 1.5$. We then compute $W = X(:,\mathcal{J})$ once with SPA and 30 times with RandSPA. Next, we compute $H$ by solving the nonnegative least squares (NNLS) subproblem $\min_{H \geq 0} \|X - WH\|_F^2$ exactly with an active-set algorithm [8], and we compute the relative reconstruction error $\|X - WH\|_F / \|X\|_F$. For RandSPA, we report the best and the median error among the 30 runs. Note that in our setting we choose the best solution as the one with the lowest reconstruction error, but other criteria could be used to choose the best solution among all the computed ones.
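The evaluation pipeline above can be sketched in Python (our sketch; we use SciPy's `nnls` per column as a stand-in for the active-set algorithm of [8]):

```python
import numpy as np
from scipy.optimize import nnls

def relative_error(X, W):
    """Solve min_{H >= 0} ||X - W H||_F^2 column by column with NNLS,
    and return the relative reconstruction error ||X - W H||_F / ||X||_F."""
    H = np.column_stack([nnls(W, X[:, j])[0] for j in range(X.shape[1])])
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

The best of several RandSPA runs is then simply the candidate $W$ with the smallest `relative_error(X, W)`.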
The results of the experiments for SPA and RandSPA are presented in Table 2. The median error of RandSPA is of the same order as that of SPA, except for Cuprite where it is higher; it is even slightly smaller for Samson and Urban. On the other hand, the error from the best run of RandSPA is always smaller than that of SPA. In particular, the error decreases by 37%, 32% and 27% for Samson, Urban and San Diego, respectively. This improvement is quite noticeable.
The resulting false-color images for Jasper, Samson, Urban and Cuprite are shown in Figure 2. They represent the distribution of the materials identified by SPA and RandSPA in the image. As we can see for Urban, SPA does not manage to separate well the grass and the trees (both appear in green), while with RandSPA, some random $Q$ amplified directions that better separate the grass (in blue) from the trees (in green). Similarly, in the abundance maps from the unmixing of Samson in Figure 2, RandSPA separates
Dataset     SPA       Med. RandSPA   Best RandSPA
Jasper      8.6869    8.7577         8.0206
Samson      6.4914    6.3114         3.9706
Urban       10.9367   9.6354         6.5402
Cuprite     2.6975    3.526          2.2824
San Diego   12.6845   12.8714        9.2032

Table 2: Relative reconstruction error $\|X - WH\|_F / \|X\|_F$, in percent.
the soil (in red), the water (in blue) and the trees (in green) better than SPA, for which the soil (in blue) is extracted but the water is not clearly identified.
Let us discuss another experiment on the dataset Samson. We add Gaussian noise such that SNR = 20 dB, fix $\kappa = 1$, vary $\nu$, and show in Figure 1 the average best error over 1, 5, 10 and 20 runs. As we can see, with a sufficient number of runs (10 in this experiment), the relative error improves significantly for $\nu$ near 10 in comparison to other choices of $\nu$. In particular, it is better than both $\nu = 1$ (VCA) and a large $\nu$ like 50, which should behave like SPA. Without added noise, VCA would perform better than every $\nu$ higher than 1 starting from 10 runs. However, when the data is noisy, this experiment highlights that VCA is not robust enough to noise, and that the best run of a method in between SPA and VCA is better than both SPA and VCA.
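The noise in this experiment can be generated by scaling white Gaussian noise to a target SNR (a sketch; the Frobenius-norm SNR convention is our assumption):

```python
import numpy as np

def add_noise(X, snr_db, rng):
    """Return X + N with white Gaussian noise N scaled so that
    10 * log10(||X||_F^2 / ||N||_F^2) = snr_db."""
    N = rng.standard_normal(X.shape)
    N *= np.linalg.norm(X) / (np.linalg.norm(N) * 10 ** (snr_db / 20))
    return X + N
```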
Figure 1: Average best reconstruction error (relative error, %) over several runs, depending on the rank $\nu$ of $Q$, with $\kappa = 1$, on the hyperspectral image Samson with added noise such that SNR = 20 dB. The four panels show the average best error over 1, 5, 10 and 20 runs.
5 Conclusion
In this paper, we introduced RandSPA, a variant of the separa-
ble NMF algorithm SPA that introduces randomness to allow
a multi-start strategy. The robustness results of SPA still hold
(a) SPA (b) RandSPA
Figure 2: Abundance maps in false color from the unmixing of
hyperspectral images.
for RandSPA, provided a bound on the noise that depends on
the parameters used. We showed empirically on the unmixing
of hyperspectral images that, with sufficiently many runs, the
best solution from RandSPA is generally better than the solution
from SPA. We also showed that RandSPA creates a continuum
between the two algorithms SPA and VCA, as we can recover
these algorithms by running RandSPA with some given param-
eter values.
Acknowledgements The authors acknowledge the support by
the F.R.S.-FNRS and the FWO under EOS project O005318F-
RG47. NG also acknowledges the Francqui foundation.
References

[1] Mário César Ugulino Araújo et al. “The successive projections algorithm for variable selection in spectroscopic multicomponent analysis”. In: Chemometrics and Intelligent Laboratory Systems 57.2 (2001), pp. 65–73.
[2] Sanjeev Arora et al. “Computing a nonnegative matrix
factorization-Provably”. In: 44th Annual ACM Sympo-
sium on Theory of Computing, STOC’12. 2012, pp. 145–
[3] Jeff Bezanson et al. “Julia: A fresh approach to numer-
ical computing”. In: SIAM Review 59.1 (2017), pp. 65–
[4] Xiao Fu et al. “Nonnegative Matrix Factorization for
Signal and Data Analytics: Identifiability, Algorithms,
and Applications”. In: IEEE Signal Processing Magazine 36.2 (2019), pp. 59–80.
[5] Nicolas Gillis. Nonnegative matrix factorization. SIAM,
[6] Nicolas Gillis and Wing-Kin Ma. “Enhancing pure-pixel
identification performance via preconditioning”. In: SIAM
Journal on Imaging Sciences 8.2 (2015), pp. 1161–1186.
[7] Nicolas Gillis and Stephen A Vavasis. “Fast and robust
recursive algorithms for separable nonnegative matrix
factorization”. In: IEEE Transactions on Pattern Analy-
sis and Machine Intelligence 36.4 (2013), pp. 698–714.
[8] Jingu Kim and Haesun Park. “Toward faster nonnega-
tive matrix factorization: A new algorithm and compar-
isons”. In: 2008 Eighth IEEE International Conference
on Data Mining. IEEE. 2008, pp. 353–362.
[9] Wing-Kin Ma et al. “A signal processing perspective
on hyperspectral unmixing: Insights from remote sens-
ing”. In: IEEE Signal Processing Magazine 31.1 (2013),
pp. 67–81.
[10] Nicolas Nadisic, Nicolas Gillis, and Christophe Kervazo.
“Smoothed separable nonnegative matrix factorization”.
In: preprint arXiv:2110.05528 (2021).
[11] José MP Nascimento and José MB Dias. “Vertex com-
ponent analysis: A fast algorithm to unmix hyperspectral
data”. In: IEEE Transactions on Geoscience and Remote
Sensing 43.4 (2005), pp. 898–910.
[12] Stephen A Vavasis. “On the complexity of nonnegative
matrix factorization”. In: SIAM Journal on Optimization
20.3 (2010), pp. 1364–1377.