
Randomized Successive Projection Algorithm

Olivier VUTHA NH , Nicolas NADISIC, Nicolas GILLIS

Département de Mathématique et Recherche Opérationnelle, Université de Mons

Rue de Houdain 9, 7000 Mons, Belgique.

{olivier.vuthanh,nicolas.nadisic,nicolas.gillis}@umons.ac.be

Résumé – The successive projection algorithm (SPA) is a standard algorithm for nonnegative matrix factorization (NMF). It is based on the separability assumption. In hyperspectral unmixing, that is, the extraction of the materials in a hyperspectral image, separability is equivalent to the pure-pixel assumption and states that, for each material present in the image, there exists at least one pixel composed only of this material. SPA is fast and provably robust to noise, but it is sensitive to outliers. It is also deterministic, so for a given problem it always produces the same solution. Yet, it has been shown empirically that the non-deterministic algorithm vertex component analysis (VCA), when run a sufficiently large number of times, often produces at least one solution that is better than the solution of SPA. In this paper, we seek to combine these qualities and introduce a randomized version of SPA, called RandSPA, which produces potentially different results at each run. It can be run several times to keep the best solution, and it is still provably robust to noise. Hyperspectral unmixing experiments show that the best solution over several runs of RandSPA is generally better than the solution of the original SPA.

Abstract – The successive projection algorithm (SPA) is a widely used algorithm for nonnegative matrix factorization (NMF). It is based on

the separability assumption. In hyperspectral unmixing, that is, the extraction of materials in a hyperspectral image, separability is equivalent to

the pure-pixel assumption and states that for each material present in the image there exists at least one pixel composed of only this material.

SPA is fast and provably robust to noise, but it is not robust to outliers. Also, it is deterministic, so for a given setting it always produces the same

solution. Yet, it has been shown empirically that the non-deterministic algorithm vertex component analysis (VCA), when run sufﬁciently many

times, often produces at least one solution that is better than the solution of SPA. In this paper, we combine the best of both worlds and introduce

a randomized version of SPA dubbed RandSPA, that produces potentially different results at each run. It can be run several times to keep the best

solution, and it is still provably robust to noise. Experiments on the unmixing of hyperspectral images show that the best solution among several

runs of RandSPA is generally better than the solution of vanilla SPA.

1 Introduction

Nonnegative matrix factorization (NMF) is a linear dimensionality reduction technique that has become a standard tool to extract latent structures in nonnegative data. Given an input matrix $X \in \mathbb{R}^{m \times n}_+$ and a factorization rank $r < \min(m, n)$, NMF consists in finding two factors $W \in \mathbb{R}^{m \times r}_+$ and $H \in \mathbb{R}^{r \times n}_+$ such that $X \approx WH$. The columns of $X$ are called data points, and if $H$ is column-stochastic then the columns of $W$ can be seen as the vertices of the convex hull containing the data points. Applications of NMF include feature extraction in images, topic modeling, audio source separation, chemometrics, and blind hyperspectral unmixing (HU); see for example [4] and the references therein. Blind HU consists in identifying the materials present in a hyperspectral image as well as their distribution over the pixels of the image.

In general, NMF is NP-hard [12]. However, under the separability assumption, it is solvable in polynomial time [2]. This assumption states that, for every vertex (column of $W$), there exists at least one data point (column of $X$) equal to this vertex. In blind HU, this is known as the pure-pixel assumption and means that, for each material, there is at least one pixel composed almost purely of this material. Many algorithms have been introduced that leverage this assumption; see for instance [5, Chapter 7] and the references therein. Recently, algorithms for separable NMF that are provably robust to noise have been introduced [2]. One of the most widely used is the successive projection algorithm (SPA) [1].

SPA is robust to noise and generally works well in practice. However, it suffers from several drawbacks, notably a sensitivity to outliers. SPA is deterministic, that is, for a given problem it gives the same result at every run. It is also greedy, in the sense that it extracts vertices sequentially, so an error at a given iteration cannot be compensated for in the following iterations. In this paper, we aim to address the sensitivity to outliers by designing a non-deterministic variant of SPA that can be run several times, in the hope that at least one run will not extract outliers.

Let us discuss an observation from [10]. The separable NMF algorithm called vertex component analysis (VCA) [11] includes a random projection; it is therefore non-deterministic and potentially produces a different result at each run. VCA is simpler and its guarantees are weaker than those of SPA, and the experiments in [10] show that VCA performs worse than SPA on average, but they also show that the best result of VCA over many runs is in most cases better than the result of SPA in terms of reconstruction error. This observation is our main motivation to design a non-deterministic variant of SPA, which we call randomized SPA (RandSPA).

This paper is organized as follows. In section 2 we introduce the general recursive algorithm for separable NMF analyzed in [7], which generalizes SPA. In section 3 we present the main contribution of this paper, namely a randomized variant of SPA called RandSPA. We show that the theoretical results on the robustness of SPA to noise still hold for RandSPA, while the randomization makes it possible to better handle outliers by allowing diversity in the solutions produced. In section 4 we illustrate the advantages of our method with experiments on both synthetic data sets and the unmixing of hyperspectral images.

2 Successive Projection Algorithm

In this section, we discuss the successive projection algorithm

(SPA). It is based on the separability assumption, detailed be-

low.

Assumption 1 (Separability). The $m$-by-$n$ matrix $X \in \mathbb{R}^{m \times n}$ is $r$-separable if there exists a nonnegative matrix $H$ such that $X = X(:,\mathcal{J})H$, where $X(:,\mathcal{J})$ denotes the subset of columns of $X$ indexed by $\mathcal{J}$ and $|\mathcal{J}| = r$.

The pseudocode for a general recursive algorithm for separable NMF is given in Algorithm 1. Historically, the first variant of Algorithm 1 was introduced by Araújo et al. [1] for spectroscopic component analysis with $f(x) = \|x\|_2^2 = x^\top x$, which is the so-called SPA. In the noiseless case, that is, under Assumption 1, SPA is guaranteed to retrieve $\mathcal{J}$ and, more generally, the vertices of the set of points formed by the columns of $X$ [9]. This particular choice of $f$ is proved to be the most robust to noise given the bounds in [7]; see Theorem 1 with $Q = I$ for the error bounds. The algorithm is iterative and is composed of the following two main steps:

• Selection step: the column that maximizes a given function $f$ is selected (line 3).

• Projection step: all the columns are projected onto the orthogonal complement of the currently selected columns (line 5).

These two steps are repeated $r$ times, $r$ being the target number of extracted columns. The drawback of the $\ell_2$-norm is its sensitivity to outliers and the fact that it makes SPA deterministic: if some outliers are selected, running SPA again would still retrieve the exact same outliers.

3 Randomized SPA

In this section, we introduce the main contribution of this work, namely a randomized variant of SPA called RandSPA. Its key

Algorithm 1: Recursive algorithm for separable NMF [7]. It coincides with SPA when $f(x) = \|x\|_2^2$.

Input: An $r$-separable matrix $X \in \mathbb{R}^{m \times n}$, a function $f$ to maximize.
Output: Index set $\mathcal{J}$ of cardinality $r$ such that $X \approx X(:,\mathcal{J})H$ for some $H \geq 0$.

1: Let $\mathcal{J} = \emptyset$, $P^\perp = I_m$, $V = [\,]$.
2: for $k = 1 : r$ do
3:   Let $j_k = \operatorname{argmax}_{1 \leq j \leq n} f(P^\perp X(:,j))$. (Break ties arbitrarily, if necessary.)
4:   Let $\mathcal{J} = \mathcal{J} \cup \{j_k\}$.
5:   Update the projector $P^\perp$ onto the orthogonal complement of $X(:,\mathcal{J})$:
     $v_k = \dfrac{P^\perp X(:,j_k)}{\|P^\perp X(:,j_k)\|_2}$,  $V = [V \; v_k]$,  $P^\perp \leftarrow I_m - VV^\top$.
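The experiments later in this paper are implemented in Julia; purely as an illustration, Algorithm 1 with $f(x) = \|x\|_2^2$ (i.e., plain SPA) can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation; it keeps the projected columns $R = P^\perp X$ and uses the equivalent recursive update $R \leftarrow R - v_k (v_k^\top R)$ instead of forming $P^\perp$ explicitly.

```python
import numpy as np

def spa(X, r):
    """Minimal sketch of Algorithm 1 with f(x) = ||x||_2^2 (plain SPA).

    Instead of forming P = I - V V^T explicitly, the projected columns
    R = P X are kept and updated recursively at each extraction.
    """
    R = np.array(X, dtype=float)  # residual: columns projected so far
    J = []
    for _ in range(r):
        # Selection step: column with the largest squared l2-norm
        j = int(np.argmax(np.sum(R ** 2, axis=0)))
        J.append(j)
        # Projection step: project all columns onto the orthogonal
        # complement of the selected column's residual
        v = R[:, j] / np.linalg.norm(R[:, j])
        R = R - np.outer(v, v @ R)
    return J
```

On an exactly separable matrix whose pure columns dominate in norm, this sketch recovers the pure-pixel indices, as guaranteed in the noiseless case.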

features are that it computes potentially different solutions at each run, thus allowing a multi-start strategy, and that the theoretical robustness results of SPA still hold.

RandSPA follows Algorithm 1 with $f(x) = x^\top Q Q^\top x$, where $Q \in \mathbb{R}^{m \times \nu}$ is a randomly generated matrix with $\nu \geq r$. To control the conditioning of $Q$, we generate the columns of $Q$ such that they are mutually orthogonal and such that
$$\|Q(:,1)\|_2 = 1 \geq \cdots \geq \|Q(:,\nu)\|_2 = 1/\sqrt{\kappa},$$
where $\kappa$ is the desired conditioning of $QQ^\top$. For the columns between the first and the last one, we make the arbitrary choice to also fix their norms to $1/\sqrt{\kappa}$. If $Q^\top W$ has full column rank, which happens with probability one if $\nu \geq r$, RandSPA is robust to noise with the following bounds:

Theorem 1 ([6, Corollary 1]). Let $\tilde{X} = X + N$, where $X$ satisfies Assumption 1, $W$ has full column rank, and $N$ is noise with $\max_j \|N(:,j)\|_2 \leq \epsilon$; and let $Q \in \mathbb{R}^{m \times \nu}$ with $\nu \geq r$. If $Q^\top W$ has full column rank and
$$\epsilon \leq \mathcal{O}\!\left(\frac{\sigma_{\min}(W)}{\sqrt{r}\,\kappa^3(Q^\top W)}\right),$$
then SPA applied on the matrix $Q^\top \tilde{X}$ identifies a set of indices $\mathcal{J}$ corresponding to the columns of $W$ up to the error
$$\max_{1 \leq j \leq r} \min_{k \in \mathcal{J}} \left\|W(:,j) - \tilde{X}(:,k)\right\|_2 \leq \mathcal{O}\!\left(\epsilon\,\kappa(W)\,\kappa(Q^\top W)^3\right).$$

Theorem 1 is directly applicable to RandSPA since choosing $f(x) = x^\top Q Q^\top x$ is equivalent to performing SPA on $Q^\top \tilde{X}$. The only subtlety is that, with RandSPA, a random $Q$ is drawn at each column extraction. The error bound for RandSPA is then the one with the largest drawn $\kappa(Q^\top W)$.

Let us note that choosing $\nu = 1$, or $\|Q(:,j)\|_2 = 1/\sqrt{\kappa}$ with $\kappa \to \infty$ for all $j > 1$, retrieves VCA. Choosing $\nu = m$ and $\kappa(Q) = 1$ retrieves SPA. Hence, RandSPA creates a continuum between SPA, with stronger provable robustness, and VCA, with more solution diversity.
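To make the construction of $Q$ concrete, here is a hedged NumPy sketch of RandSPA (illustrative only, not the authors' Julia code): at each extraction, a fresh $Q$ with mutually orthogonal columns is drawn, with the first column of unit norm and the remaining ones of norm $1/\sqrt{\kappa}$, and the selection step maximizes $f(x) = x^\top Q Q^\top x = \|Q^\top x\|_2^2$. The function and parameter names are illustrative; the sketch assumes $r \leq \nu \leq m$.

```python
import numpy as np

def rand_spa(X, r, nu=None, kappa=1.5, seed=None):
    """Sketch of RandSPA: Algorithm 1 with f(x) = x^T Q Q^T x,
    drawing a fresh random Q at each column extraction.
    Assumes r <= nu <= m."""
    rng = np.random.default_rng(seed)
    R = np.array(X, dtype=float)  # projected columns, as in SPA
    m = R.shape[0]
    nu = r + 1 if nu is None else nu
    J = []
    for _ in range(r):
        # Random Q with mutually orthogonal columns: ||Q(:,1)||_2 = 1
        # and ||Q(:,j)||_2 = 1/sqrt(kappa) for j > 1, so that Q Q^T
        # has conditioning kappa on its range.
        O, _ = np.linalg.qr(rng.standard_normal((m, nu)))
        scales = np.full(nu, 1.0 / np.sqrt(kappa))
        scales[0] = 1.0
        Q = O * scales
        # Selection step: maximize f(x) = ||Q^T x||_2^2
        j = int(np.argmax(np.sum((Q.T @ R) ** 2, axis=0)))
        J.append(j)
        # Projection step onto the orthogonal complement
        v = R[:, j] / np.linalg.norm(R[:, j])
        R = R - np.outer(v, v @ R)
    return J
```

With $\nu = m$ and $\kappa = 1$, $Q$ is orthogonal and the selection reduces to plain SPA, while small $\nu$ or large $\kappa$ moves the behavior toward VCA, matching the continuum described above.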

4 Numerical experiments

In this section, we study empirically the performance of the proposed algorithm RandSPA on the unmixing of hyperspectral images. The algorithms have been implemented in Julia [3]. Our codes are available in an online repository¹ along with the data and test scripts used in our experiments. Our tests are performed on 5 real hyperspectral datasets² described in Table 1.

Dataset      m     n                     r
Jasper       198   100 × 100 = 10000     4
Samson       156   95 × 95 = 9025        3
Urban        162   307 × 307 = 94249     5
Cuprite      188   250 × 191 = 47750     12
San Diego    188   400 × 400 = 160000    8

Table 1: Summary of the datasets, for which $X \in \mathbb{R}^{m \times n}$.

For all the tests, we choose $\nu = r + 1$ and a relatively well-conditioned $Q$ with $\kappa(Q) = 1.5$. We then compute $W = X(:,\mathcal{J})$ once with SPA and 30 times with RandSPA. Next, we compute $H$ by solving the nonnegative least squares (NNLS) subproblem $\min_{H \geq 0} \|X - WH\|_F^2$ exactly with an active-set algorithm [8], and we compute the relative reconstruction error $\|X - WH\|_F / \|X\|_F$. For RandSPA, we report the best error and the median error among the 30 runs. Note that in our setting we choose the best solution as the one with the lowest reconstruction error, but other criteria could be used to choose the best solution among all the computed ones.
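The multi-start protocol described above can be sketched as follows. This is illustrative NumPy/SciPy code, not the paper's Julia implementation: the paper solves the NNLS subproblem exactly with the active-set method of [8], whereas this sketch uses SciPy's `nnls` column by column, and `run_algo` is a hypothetical stand-in for any extraction routine (e.g. a RandSPA implementation).

```python
import numpy as np
from scipy.optimize import nnls

def relative_error(X, W):
    """Solve min_{H >= 0} ||X - W H||_F column by column and return
    the relative reconstruction error ||X - W H||_F / ||X||_F."""
    H = np.column_stack([nnls(W, X[:, j])[0] for j in range(X.shape[1])])
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)

def best_of_runs(X, r, run_algo, n_runs=30):
    """Multi-start strategy: run the extraction several times and keep
    the index set with the smallest relative reconstruction error."""
    best_J, best_err = None, np.inf
    for _ in range(n_runs):
        J = run_algo(X, r)
        err = relative_error(X, X[:, J])
        if err < best_err:
            best_J, best_err = J, err
    return best_J, best_err
```

Any other criterion for comparing the computed solutions could be plugged in at the same place as `relative_error`.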

The results of the experiments for SPA and RandSPA are presented in Table 2. The median error of RandSPA is of the same order as that of SPA, except for Cuprite where it is higher; it is even slightly smaller for Samson and Urban. On the other hand, the error of the best run of RandSPA is always smaller than that of SPA. In particular, the error decreases by 37%, 32% and 27% for Samson, Urban and San Diego, respectively. This improvement is quite noticeable.

The resulting false-color images for Jasper, Samson, Urban and Cuprite are shown in Figure 2. They represent the distribution of the materials identified by SPA and RandSPA in the image. As we can see for Urban, SPA does not manage to separate the grass from the trees well (both are in green), while with RandSPA some random $Q$ happened to amplify directions that better separate the grass (in blue) from the trees (in green). Similarly, in the abundance maps from the unmixing of Samson in Figure 2, RandSPA separates

¹ https://gitlab.com/nnadisic/randspa

² Downloaded from http://lesun.weebly.com

Dataset      SPA       Med. RandSPA   Best RandSPA
Jasper       8.6869    8.7577         8.0206
Samson       6.4914    6.3114         3.9706
Urban        10.9367   9.6354         6.5402
Cuprite      2.6975    3.526          2.2824
San Diego    12.6845   12.8714        9.2032

Table 2: Relative reconstruction error $\|X - WH\|_F / \|X\|_F$ in percent.

the soil (in red), the water (in blue) and the trees (in green) better than SPA, for which the soil (in blue) is extracted but the water is not clearly identified.

Let us discuss another experiment on the dataset Samson. We add Gaussian noise such that SNR = 20 dB, fix $\kappa = 1$ and vary $\nu$, and then show in Figure 1 the average best error over 1, 5, 10 and 20 runs. As we can see, with a sufficient number of runs (10 in this experiment), the relative error improves significantly for $\nu$ near 10 in comparison with other choices of $\nu$. In particular, it is also better than both $\nu = 1$ (VCA) and a high $\nu$ like 50, which should behave like SPA. Without added noise, VCA would perform better than every $\nu$ higher than 1 starting from 10 runs. However, when the data is noisy, this experiment highlights that VCA is not robust enough to noise and that the best run of a method between SPA and VCA is better than both SPA and VCA.

Figure 1: Average best reconstruction error (%) over several runs (1, 5, 10 and 20), as a function of the rank $\nu$ of $Q$, with $\kappa = 1$, on the hyperspectral image Samson with added noise such that SNR = 20 dB.

5 Conclusion

In this paper, we introduced RandSPA, a variant of the separable NMF algorithm SPA that introduces randomness to allow a multi-start strategy. The robustness results of SPA still hold

Figure 2: Abundance maps in false color from the unmixing of the hyperspectral images Jasper, Samson, Urban and Cuprite: (a) SPA, (b) RandSPA.

for RandSPA, under a bound on the noise that depends on the parameters used. We showed empirically on the unmixing of hyperspectral images that, with sufficiently many runs, the best solution from RandSPA is generally better than the solution from SPA. We also showed that RandSPA creates a continuum between the two algorithms SPA and VCA, as both can be recovered by running RandSPA with suitable parameter values.

Acknowledgements The authors acknowledge the support by

the F.R.S.-FNRS and the FWO under EOS project O005318F-

RG47. NG also acknowledges the Francqui foundation.

References

[1] Mário César Ugulino Araújo et al. “The successive pro-

jections algorithm for variable selection in spectroscopic

multicomponent analysis”. In: Chemometrics and intel-

ligent laboratory systems 57.2 (2001), pp. 65–73.

[2] Sanjeev Arora et al. "Computing a nonnegative matrix factorization -- provably". In: 44th Annual ACM Symposium on Theory of Computing, STOC'12. 2012, pp. 145–161.

[3] Jeff Bezanson et al. “Julia: A fresh approach to numer-

ical computing”. In: SIAM Review 59.1 (2017), pp. 65–

98.

[4] Xiao Fu et al. “Nonnegative Matrix Factorization for

Signal and Data Analytics: Identiﬁability, Algorithms,

and Applications”. In: IEEE Signal Processing Magazine

36.2 (2019), pp. 59–80.

[5] Nicolas Gillis. Nonnegative matrix factorization. SIAM,

2020.

[6] Nicolas Gillis and Wing-Kin Ma. “Enhancing pure-pixel

identiﬁcation performance via preconditioning”. In: SIAM

Journal on Imaging Sciences 8.2 (2015), pp. 1161–1186.

[7] Nicolas Gillis and Stephen A Vavasis. “Fast and robust

recursive algorithms for separable nonnegative matrix

factorization”. In: IEEE Transactions on Pattern Analy-

sis and Machine Intelligence 36.4 (2013), pp. 698–714.

[8] Jingu Kim and Haesun Park. “Toward faster nonnega-

tive matrix factorization: A new algorithm and compar-

isons”. In: 2008 Eighth IEEE International Conference

on Data Mining. IEEE. 2008, pp. 353–362.

[9] Wing-Kin Ma et al. “A signal processing perspective

on hyperspectral unmixing: Insights from remote sens-

ing”. In: IEEE Signal Processing Magazine 31.1 (2013),

pp. 67–81.

[10] Nicolas Nadisic, Nicolas Gillis, and Christophe Kervazo.

“Smoothed separable nonnegative matrix factorization”.

In: preprint arXiv:2110.05528 (2021).

[11] José MP Nascimento and José MB Dias. “Vertex com-

ponent analysis: A fast algorithm to unmix hyperspectral

data”. In: IEEE Transactions on Geoscience and Remote

Sensing 43.4 (2005), pp. 898–910.

[12] Stephen A Vavasis. “On the complexity of nonnegative

matrix factorization”. In: SIAM Journal on Optimization

20.3 (2010), pp. 1364–1377.