MINIMUM-VOLUME RANK-DEFICIENT NONNEGATIVE MATRIX FACTORIZATIONS
Valentin Leplat, Andersen M.S. Ang, Nicolas Gillis
University of Mons, Rue de Houdain 9, 7000 Mons, Belgium
ABSTRACT
In recent years, nonnegative matrix factorization (NMF) with volume regularization has been shown to be a powerful identifiable model; for example for hyperspectral unmixing, document classification, community detection and hidden Markov models. In this paper, we show that minimum-volume NMF (min-vol NMF) can also be used when the basis matrix is rank deficient, which is a reasonable scenario for some real-world NMF problems (e.g., for unmixing multispectral images). We propose an alternating fast projected gradient method for min-vol NMF and illustrate its use on rank-deficient NMF problems; namely a synthetic data set and a multispectral image.
Index Terms: nonnegative matrix factorization, minimum volume, identifiability, rank deficiency
1. INTRODUCTION
Given a nonnegative matrix $X \in \mathbb{R}^{m \times n}_+$ and a factorization rank $r$, nonnegative matrix factorization (NMF) consists in finding two nonnegative matrices $W \in \mathbb{R}^{m \times r}_+$ and $H \in \mathbb{R}^{r \times n}_+$ such that $X \approx WH$. For simplicity, we will use the Frobenius norm, which is arguably the most widely used, to assess the error of an NMF solution and consider the following optimization problem
$$\min_{W \in \mathbb{R}^{m \times r},\; H \in \mathbb{R}^{r \times n}} \|X - WH\|_F^2 \quad \text{s.t. } W \geq 0 \text{ and } H \geq 0.$$
NMF is in most cases ill-posed because the optimal solution is not unique. In order to make the solution of the above problem unique (up to permutation and scaling of the columns of $W$ and rows of $H$), hence making the problem well-posed and the parameters $(W, H)$ of the problem identifiable, a key idea is to look for a solution $W$ with minimum volume; see [1] and the references therein. A possible formulation for minimum-volume NMF (min-vol NMF) is as follows
$$\min_{W \geq 0,\; H(:,j) \in \Delta^r \; \forall j} \|X - WH\|_F^2 + \lambda \, \mathrm{vol}(W), \qquad (1)$$
where $\Delta^r = \{x \in \mathbb{R}^r_+ \mid \sum_i x_i \leq 1\}$, $\lambda$ is a penalty parameter, and $\mathrm{vol}(W)$ is a function that measures the volume of the columns of $W$. Note that $H$ needs to be normalized, otherwise $W$ would go to zero since $WH = (cW)(H/c)$ for any $c > 0$.
Acknowledgment: The authors acknowledge the support of the European Research Council (ERC starting grant no. 679515) and of the Fonds de la Recherche Scientifique - FNRS and the Fonds Wetenschappelijk Onderzoek - Vlaanderen (FWO) under EOS Project no. O005318F-RG47.
In this paper, we will use $\mathrm{vol}(W) = \mathrm{logdet}(W^T W + \delta I)$, where $I$ is the identity matrix of appropriate dimensions. The reason for using such a measure is that $\sqrt{\det(W^T W)}/r!$ is the volume of the convex hull of the columns of $W$ and the origin. Under some appropriate conditions on $X = WH$, this model will provably recover the true underlying $(W, H)$ that generated $X$. These recovery conditions require that the columns of $X$ are sufficiently well spread in the convex hull generated by the columns of $W$ [2, 3, 4]; this is the so-called sufficiently scattered condition. In particular, data points need to be located on the facets of this convex hull, hence $H$ needs to be sufficiently sparse. A few remarks are in order:
• The ideas behind min-vol NMF were introduced in the hyperspectral imaging community and date back to [5]; see also the discussions in [6, 1].
• As far as we know, these theoretical results only apply in noiseless conditions, hence the robustness to noise of model (1) still needs to be rigorously analyzed (this is a very promising but difficult direction of further research).
• The sufficiently scattered condition is a generalization of the separability condition, which requires $W = X(:, \mathcal{K})$ for some index set $\mathcal{K}$ of size $r$. Separability makes the NMF problem easily solvable, and efficient and robust algorithms exist; see, e.g., [7, 6, 8] and the references therein. Note that although min-vol NMF guarantees identifiability, the corresponding optimization problem (1) is still hard to solve in general, as is the original NMF problem [9].
Another key assumption that is used in min-vol NMF is that the basis matrix $W$ is full rank, that is, $\mathrm{rank}(W) = r$; otherwise $\det(W^T W) = 0$. However, there are situations when the matrix $W$ is not full rank: this happens in particular when $\mathrm{rank}(X) \neq \mathrm{rank}_+(X)$, where $\mathrm{rank}_+(X)$ is the nonnegative rank of $X$, which is the smallest $r$ such that $X$ has an exact NMF decomposition (that is, $X = WH$). Here is a simple example:
$$X = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix} \qquad (2)$$
for which $\mathrm{rank}(X) = 3 < \mathrm{rank}_+(X) = 4$. The columns of the matrix $X$ are the vertices of a square in a 2-dimensional affine subspace; see Fig. 2 for an illustration. A practical situation where this could happen is in multispectral imaging. Let us construct the matrix $X$ such that each column $X(:, j) \geq 0$ is the spectral signature of a pixel. Then, under the linear mixing model, each column of $X$ is the nonnegative linear combination of the spectral signatures of the constitutive materials present in the image, referred to as endmembers: we have $X(:, j) = \sum_{k=1}^{r} W(:, k) H(k, j)$, where $W(:, k)$ is the spectral signature of the $k$th endmember, and $H(k, j)$ is the abundance of the $k$th endmember in the $j$th pixel; see [6] for more details. For multispectral images, the number of materials within the scene being imaged can be larger than the number of spectral bands, meaning that $r > m$, hence $\mathrm{rank}(W) \leq m < r$.
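The rank claim for the matrix in (2) can be checked by hand or with a few lines of code. The sketch below is an illustration added for the reader, not part of the authors' Matlab code; the helper `matrix_rank` is a hypothetical name and uses exact Gaussian elimination on fractions. It only confirms that rank(X) = 3 and that every column sums to 2 (so the four columns lie in a common affine hyperplane); it does not verify the nonnegative rank.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via exact Gaussian elimination on fractions."""
    a = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(a[0])):
        # Look for a nonzero pivot at or below the current pivot row.
        pivot = next((i for i in range(rank, len(a)) if a[i][col] != 0), None)
        if pivot is None:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        for i in range(rank + 1, len(a)):
            factor = a[i][col] / a[rank][col]
            a[i] = [x - factor * y for x, y in zip(a[i], a[rank])]
        rank += 1
        if rank == len(a):
            break
    return rank


# Matrix X from (2), stored as a list of rows.
X = [[1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 1, 1, 0],
     [1, 0, 0, 1]]

rank_X = matrix_rank(X)
column_sums = [sum(row[j] for row in X) for j in range(4)]
print("rank(X) =", rank_X, "column sums =", column_sums)
```

The deficiency comes from the relation row 1 + row 2 = row 3 + row 4, which the elimination detects as a zero row.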
In this paper, we focus on the min-vol NMF formulation in the rank-deficient scenario, that is, when $\mathrm{rank}(W) < r$. The contributions of this paper are three-fold: (i) We explain why min-vol NMF (1) can be used meaningfully when the basis matrix $W$ is not full rank. This is, as far as we know, the first time this observation is made in the literature. (ii) We propose an algorithm based on an alternating projected fast gradient method to tackle this problem. (iii) We illustrate our results on a synthetic data set and a multispectral image.
2. MIN-VOL NMF IN THE RANK-DEFICIENT CASE
Let us discuss the min-vol NMF model we consider in this paper, namely,
$$\min_{W \geq 0,\; H(:,j) \in \Delta^r \; \forall j} \|X - WH\|_F^2 + \lambda \, \mathrm{logdet}(W^T W + \delta I), \qquad (3)$$
which has three key ingredients: the choice of the volume regularizer, that is, $\mathrm{logdet}(W^T W + \delta I)$, and the parameters $\delta$ and $\lambda$. They are discussed in the next three paragraphs.
Choice of the volume regularizer. Most functions used to minimize the volume of the columns of $W$ are based on the Gram matrix $W^T W$; in particular, $\det(W^T W)$ and $\mathrm{logdet}(W^T W + \delta I)$ for some $\delta > 0$ are the most widely used measures; see, e.g., [10, 11]. Note that $\det(W^T W) = \prod_{i=1}^{r} \sigma_i^2(W)$, hence the log term down-weights large singular values and has been observed to work better in practice; see, e.g., [12]. When $W$ is rank deficient (that is, $\mathrm{rank}(W) < r$), some singular values of $W$ are equal to zero, hence $\det(W^T W) = 0$. Therefore, the function $\det(W^T W)$ cannot distinguish between different rank-deficient solutions¹. However, we have $\mathrm{logdet}(W^T W + \delta I) = \sum_{i=1}^{r} \log(\sigma_i^2(W) + \delta)$. Hence if $W$ has one (or more) singular value equal to zero, this measure still makes sense: among two rank-deficient solutions belonging to the same low-dimensional subspace, minimizing $\mathrm{logdet}(W^T W + \delta I)$ will favor a solution whose convex hull has a smaller volume within that subspace, since decreasing the non-zero singular values of $W$ decreases $\mathrm{logdet}(W^T W + \delta I)$.
In mathematical terms, let $W \in \mathbb{R}^{m \times r}$ belong to a $k$-dimensional subspace with $k < r$ so that $W = US$, where $U \in \mathbb{R}^{m \times k}$ is an orthonormal basis of that subspace and $S \in \mathbb{R}^{k \times r}$ contains the coordinates of the columns of $W$ in that subspace. Then $W^T W = S^T S$ and
$$\mathrm{logdet}(W^T W + \delta I) = \sum_{i=1}^{k} \log(\sigma_i^2(S) + \delta) + (r - k) \log(\delta).$$
The min-vol criterion $\mathrm{logdet}(W^T W + \delta I)$ with $\delta > 0$ is therefore meaningful even when $W$ does not have rank $r$.
¹ Of course, one could also use the measure $\det(W^T W + \delta I)$ meaningfully in the rank-deficient case. However, it would be numerically more challenging since for each singular value of $W$ equal to zero, the objective is multiplied by $\delta$, which should be chosen relatively small.
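As a numerical illustration of the decomposition above, the following sketch evaluates the regularizer on the rank-3 matrix from (2), whose Gram matrix has eigenvalues 4, 2, 2 and 0 (so k = 3 and r = 4). The helper names `gram_plus_delta` and `logdet_spd` are hypothetical, and the Cholesky-based log-determinant is simply one convenient way to compute it in plain Python; it is not claimed to be what the authors' code does.

```python
import math


def gram_plus_delta(W, delta):
    """Return W^T W + delta * I for W given as a list of rows."""
    r = len(W[0])
    return [[sum(row[i] * row[j] for row in W) + (delta if i == j else 0.0)
             for j in range(r)] for i in range(r)]


def logdet_spd(G):
    """log(det(G)) of a symmetric positive definite matrix via Cholesky."""
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = G[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))


W = [[1, 1, 0, 0], [0, 0, 1, 1], [0, 1, 1, 0], [1, 0, 0, 1]]
delta = 0.1
nonzero_sq_singular_values = [4.0, 2.0, 2.0]  # of W; the fourth one is zero

# Left-hand side: direct evaluation of logdet(W^T W + delta I).
lhs = logdet_spd(gram_plus_delta(W, delta))
# Right-hand side: sum over the k = 3 nonzero terms plus (r - k) log(delta).
rhs = (sum(math.log(s + delta) for s in nonzero_sq_singular_values)
       + (4 - 3) * math.log(delta))

# Shrinking W inside the same subspace lowers the regularizer,
# whereas det(W^T W) is zero for both matrices.
W_small = [[0.5 * v for v in row] for row in W]
lhs_small = logdet_spd(gram_plus_delta(W_small, delta))
print(lhs, rhs, lhs_small)
```

The last comparison is the point of the paragraph: the two rank-deficient matrices are indistinguishable for the determinant but ordered by the logdet measure.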
Choice of δ. The function $\mathrm{logdet}(W^T W + \delta I)$, which is equal to $\sum_{i=1}^{r} \log(\sigma_i^2(W) + \delta)$, is a non-convex surrogate for the $\ell_0$ norm of the vector of singular values of $W$ (up to constant factors), that is, of $\mathrm{rank}(W)$ [13, 14]. It is sharper than the $\ell_1$ norm of the vector of singular values (that is, the nuclear norm) for $\delta$ sufficiently small; see Fig. 1. Therefore, if one wants to promote rank-deficient solutions, $\delta$ should not be chosen too large, say $\delta \leq 0.1$. Moreover, $\delta$ should not be chosen too small, otherwise $W^T W + \delta I$ might be badly conditioned, which makes the optimization problem harder to solve (see Section 3); also, this could give too much importance to zero singular values, which might not be desirable. Therefore, in practice, we recommend using a value of $\delta$ between $10^{-3}$ and 0.1. We will use $\delta = 0.1$ in this paper. Note that in previous works, $\delta$ was chosen very small (e.g., $10^{-8}$ in [11]) which, as explained above, is not a desirable choice, at least in the rank-deficient case. Even in the full-rank case, we argue that choosing $\delta$ too small is also not desirable since it promotes rank-deficient solutions.
Fig. 1. Function $\frac{\log(x^2 + \delta) - \log(\delta)}{\log(1 + \delta) - \log(\delta)}$ for different values of $\delta$, $\ell_1$ norm ($= |x|$) and $\ell_0$ norm ($= 0$ for $x = 0$, $= 1$ otherwise).
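The normalized function plotted in Fig. 1 is easy to tabulate. The short sketch below (the function name `normalized_logdet_term` and the sample points are choices made here for illustration) evaluates it for a few values of δ. By construction it equals 0 at x = 0 and 1 at x = 1, and at a fixed interior point such as x = 0.5 it moves toward the ℓ0 value 1 as δ decreases.

```python
import math


def normalized_logdet_term(x, delta):
    """(log(x^2 + delta) - log(delta)) / (log(1 + delta) - log(delta))."""
    return ((math.log(x * x + delta) - math.log(delta))
            / (math.log(1.0 + delta) - math.log(delta)))


# Tabulate the surrogate at a few points for three values of delta.
table = {
    delta: [normalized_logdet_term(x, delta) for x in (0.0, 0.25, 0.5, 1.0)]
    for delta in (1.0, 0.1, 1e-3)
}
for delta, values in table.items():
    print(delta, [round(v, 3) for v in values])
```

At x = 0.5 the value lies below |x| for δ = 1 and above it for δ = 0.1 and δ = 10^{-3}, which matches the statement that the surrogate is sharper than the ℓ1 norm only when δ is sufficiently small.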
Choice of λ. The choice of $\delta$ will influence the choice of $\lambda$. In fact, the smaller $\delta$, the larger $|\mathrm{logdet}(\delta)|$, hence to balance the two terms in the objective (3), $\lambda$ should be smaller. For the practical implementation, we will initialize $W^{(0)} = X(:, \mathcal{K})$ where $\mathcal{K}$ is computed with the successive nonnegative projection algorithm (SNPA) that can handle the rank-deficient separable NMF problem [15]. Note that SNPA also provides the matrix $H^{(0)}$ so as to minimize $\|X - W^{(0)} H^{(0)}\|_F^2$ while $H^{(0)}(:, j) \in \Delta^r$ for all $j$. Finally, we will choose
$$\lambda = \tilde{\lambda} \, \frac{\|X - W^{(0)} H^{(0)}\|_F^2}{|\mathrm{logdet}(W^{(0)T} W^{(0)} + \delta I)|},$$
where we recommend choosing $\tilde{\lambda}$ between $10^{-3}$ and 1 depending on the noise level (the noisier the input matrix, the larger $\lambda$ should be).
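The scaling rule above amounts to one line of arithmetic. The sketch below uses made-up numbers for the initial squared error and the initial logdet value (both are assumptions for illustration, not values from the paper), and the function name `scaled_lambda` is hypothetical. It shows that, with this choice, the ratio between the regularization term and the data-fitting term at initialization equals λ̃.

```python
import math


def scaled_lambda(lambda_tilde, init_sq_error, init_logdet):
    """lambda = lambda_tilde * ||X - W0 H0||_F^2 / |logdet(W0^T W0 + delta I)|."""
    if init_logdet == 0.0:
        raise ValueError("logdet term is zero; the ratio is undefined")
    return lambda_tilde * init_sq_error / abs(init_logdet)


# Hypothetical values for the SNPA initialization.
lambda_tilde = 0.01
init_sq_error = 2.0   # assumed value of ||X - W0 H0||_F^2
init_logdet = -4.0    # assumed value of logdet(W0^T W0 + delta I)

lam = scaled_lambda(lambda_tilde, init_sq_error, init_logdet)
ratio_at_init = lam * abs(init_logdet) / init_sq_error
print(lam, ratio_at_init)
```

The absolute value matters because the logdet term can be negative when δ < 1 and W has small or zero singular values.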
3. ALGORITHM FOR MIN-VOL NMF
Most algorithms for NMF optimize alternately over $W$ and $H$, and we adopt this strategy in this paper. For the update of $H$, we will use the projected fast gradient method (PFGM) from [15]. Note that, as opposed to previously proposed methods for min-vol NMF, we assume that the sum of the entries of each column of $H$ is smaller than or equal to one, not equal to one, which is more general. For the update of $W$, we use a PFGM applied to a strongly convex upper approximation of the objective function, similarly to what is done in [11], although in that paper the authors did not consider explicitly the case $W \geq 0$ ($W$ is unconstrained in their model) and did not write down explicitly a PFGM taking advantage of strong convexity. For the sake of completeness, we briefly recall this approach. The following upper bound for the logdet term holds: for any $Q \succ 0$ and $S \succ 0$, we have
$$\mathrm{logdet}(Q) \leq g(Q, S) = \mathrm{logdet}(S) + \mathrm{trace}\left(S^{-1}(Q - S)\right) = \mathrm{trace}\left(S^{-1} Q\right) + \mathrm{logdet}(S) - r.$$
This follows from the concavity of $\mathrm{logdet}(\cdot)$, as $g(Q, S)$ is the first-order Taylor approximation of $\mathrm{logdet}(Q)$ around $S$; it has also been used for example in [16]. Taking $Q = W^T W + \delta I$ and $S = Y^{-1}$, this gives
$$\mathrm{logdet}(W^T W + \delta I) \leq \mathrm{trace}(Y W^T W) + \delta \, \mathrm{trace}(Y) + \mathrm{logdet}(Y^{-1}) - r$$
for any $W$ and any $Y = (Z^T Z + \delta I)^{-1}$ with $\delta > 0$; only the first term on the right-hand side depends on $W$. Plugging this in the original objective function, and denoting by $w_i^T$ the $i$th row of matrix $W$ and by $\langle \cdot, \cdot \rangle$ the Frobenius inner product of two matrices, we obtain
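$$\begin{aligned}
\ell(W) &= \|X - WH\|_F^2 + \lambda \, \mathrm{logdet}(W^T W + \delta I) \\
&= \|X\|_F^2 - 2 \langle X H^T, W \rangle + \langle W^T W, H H^T \rangle + \lambda \, \mathrm{logdet}(W^T W + \delta I) \\
&\leq \langle W^T W, H H^T + \lambda Y \rangle - 2 \langle C, W \rangle + b \\
&= 2 \sum_{i=1}^{m} \left( \frac{1}{2} w_i^T A w_i - c_i^T w_i \right) + b = \bar{\ell}(W),
\end{aligned}$$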
where $Y = (Z^T Z + \delta I)^{-1}$ and $A = H H^T + \lambda Y$ are positive definite for $\delta, \lambda > 0$, $C = X H^T$ with $i$th row $c_i^T$, and $b$ is a constant independent of $W$. Note that $\bar{\ell}(W) = \ell(W)$ for $Z = W$. Minimizing the upper bound $\bar{\ell}(W)$ of $\ell(W)$ requires solving $m$ independent strongly convex optimization problems with Hessian matrix $A$. Using PFGM on this problem, we obtain a linearly convergent method with rate $\frac{1 - \sqrt{\kappa^{-1}}}{1 + \sqrt{\kappa^{-1}}}$ where $\kappa$ is the condition number of $A$ [17]. Note that the subproblem in variable $H$ is not strongly convex when $W$ is rank deficient, in which case PFGM converges sublinearly, in $O(1/k^2)$ where $k$ is the iteration number. PFGM is an optimal first-order method in both cases [17], that is, no first-order method can have a faster convergence rate. When $W$ is rank deficient, we have $\frac{\lambda}{\delta} \leq L = \lambda_{\max}(A) \leq \|H\|_2^2 + \frac{\lambda}{\delta}$, where $L$ is the largest eigenvalue of $A$. This shows the importance of not choosing $\delta$ too small, since the smaller $\delta$, the larger the condition number of $A$, hence the slower the PFGM. Note that $L$ is the Lipschitz constant of the gradient of the objective function and controls the stepsize, which is equal to $1/L$. Our proposed algorithm is summarized in Alg. 1. We will use 10 inner iterations for the PFGM on $W$ and $H$.
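The majorization step used for the update of W can be sanity-checked numerically. The sketch below is an illustration under simplifying assumptions: it takes r = 2 so that determinants and inverses have closed forms, and the matrices W and Z are arbitrary examples chosen here. It evaluates g(Q, S) = trace(S^{-1} Q) + logdet(S) - r with Q = W^T W + δI and S = Z^T Z + δI, and checks that it upper-bounds logdet(Q), with equality when Z = W.

```python
import math


def gram_plus_delta_2(W, delta):
    """W^T W + delta * I for a matrix W with two columns (list of rows)."""
    a = sum(w[0] * w[0] for w in W) + delta
    b = sum(w[0] * w[1] for w in W)
    d = sum(w[1] * w[1] for w in W) + delta
    return a, b, d  # entries of the symmetric matrix [[a, b], [b, d]]


def logdet_2(M):
    """log-determinant of the symmetric 2x2 matrix [[a, b], [b, d]]."""
    a, b, d = M
    return math.log(a * d - b * b)


def majorizer(W, Z, delta):
    """g(Q, S) = trace(S^{-1} Q) + logdet(S) - r, with r = 2."""
    qa, qb, qd = gram_plus_delta_2(W, delta)
    sa, sb, sd = gram_plus_delta_2(Z, delta)
    det_s = sa * sd - sb * sb
    # S^{-1} = [[sd, -sb], [-sb, sa]] / det_s, so the trace has a closed form.
    trace_term = (sd * qa - 2.0 * sb * qb + sa * qd) / det_s
    return trace_term + math.log(det_s) - 2.0


delta = 0.1
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Z = [[1.0, 1.0], [1.0, 1.0], [0.0, 0.5]]

exact = logdet_2(gram_plus_delta_2(W, delta))
bound_at_Z = majorizer(W, Z, delta)   # upper bound built at a different point
bound_at_W = majorizer(W, W, delta)   # tight when the bound is built at W
print(exact, bound_at_Z, bound_at_W)
```

Tightness at Z = W is what makes the scheme a majorization-minimization method: decreasing the surrogate built at the current iterate cannot increase the original objective.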
Algorithm 1 Min-vol NMF using alternating PFGM
Require: Input matrix $X \in \mathbb{R}^{m \times n}_+$, the factorization rank $r$, $\delta > 0$, $\tilde{\lambda} > 0$, number of iterations maxiter.
Ensure: $(W, H)$ is an approximate solution of (3).
1: Initialize $(W, H)$ using SNPA [15].
2: Let $\lambda = \tilde{\lambda} \, \frac{\|X - WH\|_F^2}{|\mathrm{logdet}(W^T W + \delta I)|}$.
3: for $k = 1, 2, \dots,$ maxiter do
4:   % Update W
5:   Let $A = H H^T + \lambda (W^T W + \delta I)^{-1}$ and $C = X H^T$.
6:   Perform a few steps of PFGM on the problem $\min_{U \geq 0} \frac{1}{2} \langle U^T U, A \rangle - \langle U, C \rangle$, with initialization $U = W$. Set $W$ as the last iterate.
7:   % Update H
8:   Perform a few steps of PFGM on the problem $\min_{H(:,j) \in \Delta^r \; \forall j} \|X - WH\|_F^2$ as in [15].
9: end for
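Step 6 of Alg. 1 decouples over the rows of U, each row solving a small nonnegative quadratic program with the same Hessian A. The sketch below shows one way such a row update could look. It is a minimal illustration under stated assumptions, not the authors' implementation: it uses the constant-momentum variant of Nesterov's scheme for strongly convex problems, it is restricted to r = 2 so that the eigenvalues of A have a closed form, and the names `eig_sym_2x2` and `pfgm_row` as well as the test problem are hypothetical.

```python
import math


def eig_sym_2x2(A):
    """Smallest and largest eigenvalue of a symmetric 2x2 matrix."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    gap = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return (tr - gap) / 2.0, (tr + gap) / 2.0


def pfgm_row(A, c, u0, iters=100):
    """Approximately minimize 0.5 * u^T A u - c^T u subject to u >= 0 (r = 2)."""
    mu, L = eig_sym_2x2(A)
    # Momentum (sqrt(L) - sqrt(mu)) / (sqrt(L) + sqrt(mu)), i.e. the rate
    # (1 - sqrt(1/kappa)) / (1 + sqrt(1/kappa)) with kappa = L / mu.
    beta = (math.sqrt(L) - math.sqrt(mu)) / (math.sqrt(L) + math.sqrt(mu))
    u = list(u0)
    y = list(u0)
    for _ in range(iters):
        grad = [A[i][0] * y[0] + A[i][1] * y[1] - c[i] for i in range(2)]
        # Gradient step of size 1/L followed by projection onto u >= 0.
        u_next = [max(0.0, y[i] - grad[i] / L) for i in range(2)]
        # Extrapolation step using the previous iterate.
        y = [u_next[i] + beta * (u_next[i] - u[i]) for i in range(2)]
        u = u_next
    return u


# Small test problem: the unconstrained minimizer is (1, -1), so the
# nonnegativity constraint is active and the solution is (0.5, 0).
A = [[2.0, 1.0], [1.0, 2.0]]
c = [1.0, -1.0]
u_star = pfgm_row(A, c, [0.0, 0.0])
print(u_star)
```

In the setting of Alg. 1, A would be $HH^T + \lambda (W^T W + \delta I)^{-1}$, c a row of $C = XH^T$, and the routine would be applied to each of the m rows of W with only a few iterations per outer step.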
4. NUMERICAL EXPERIMENTS
We now apply our method on a synthetic and a real-world data set. All tests are performed using Matlab R2015a on a laptop with an Intel Core i7-7500U CPU @ 2.9GHz and 24GB of RAM. The code is available from http://bit.ly/minvolNMF.
Synthetic data set. Let us construct the matrix $X \in \mathbb{R}^{4 \times 500}$ as follows: $W$ is taken as the matrix from (2) so that $\mathrm{rank}(W) = 3 < r = 4$, and each column of $H$ is generated using the Dirichlet distribution of parameter $(0.1, \dots, 0.1)$. Each column of $H$ with an entry larger than 0.8 is resampled until no entry exceeds 0.8. This guarantees that no data point is close to a column of $W$ (this is sometimes referred to as the purity index). Fig. 2 illustrates this geometric problem. As observed on Fig. 2, Alg. 1 is able to perfectly recover the true columns of $W$. For this experiment, we use $\tilde{\lambda} = 0.01$. Fig. 3 illustrates the same experiment where noise is added: $X = \max(0, WH + N)$ where $N = \epsilon \,$randn(m,n) in Matlab notation (i.i.d. Gaussian distribution of mean zero and standard deviation $\epsilon$). Note that the average of the entries of $X$ is 0.5 (each column is a linear combination of the columns of $W$, with weights summing to one). Fig. 3 displays the average over 20 randomly generated matrices $X$ of the relative error $d(W, \tilde{W}) = \frac{\|W - \tilde{W}\|_F}{\|W\|_F}$, where $\tilde{W}$ is the solution computed by Alg. 1, depending on the noise level $\epsilon$. This illustrates that min-vol NMF is robust against noise since $d(W, \tilde{W})$ is smaller than 1% for $\epsilon \leq 1\%$.
Fig. 2. Synthetic data set and recovery. (Only the first three entries of each four-dimensional vector are displayed.)
Fig. 3. Evolution of the recovery of the true $W$ depending on the noise $N = \epsilon \,$randn(m,n) using Alg. 1 ($\tilde{\lambda} = 0.01$, $\delta = 0.1$, maxiter = 100).
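The construction of the synthetic data set can be sketched as follows. This is a plain-Python illustration, not the authors' Matlab script: the Dirichlet draws are obtained by normalizing gamma variates, the seed and the column-wise storage are choices made here, and the function names are hypothetical. The noise level is left as a parameter `eps`; with `eps = 0` the sketch produces the noiseless matrix X = WH.

```python
import random

# Matrix W from (2), stored as a list of rows.
W = [[1, 1, 0, 0], [0, 0, 1, 1], [0, 1, 1, 0], [1, 0, 0, 1]]


def sample_dirichlet(rng, alpha, r):
    """Draw one Dirichlet(alpha, ..., alpha) vector by normalizing gammas."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(r)]
    total = sum(g)
    return [v / total for v in g]


def sample_column(rng, alpha=0.1, r=4, max_entry=0.8):
    """Resample until no entry exceeds max_entry (rejection sampling)."""
    while True:
        h = sample_dirichlet(rng, alpha, r)
        if max(h) <= max_entry:
            return h


def noisy_column(rng, h, eps):
    """One column of max(0, W h + N), with N i.i.d. Gaussian of std eps."""
    clean = [sum(W[i][k] * h[k] for k in range(4)) for i in range(4)]
    return [max(0.0, v + eps * rng.gauss(0.0, 1.0)) for v in clean]


rng = random.Random(0)
H_cols = [sample_column(rng) for _ in range(500)]              # columns of H
X_cols = [noisy_column(rng, h, eps=0.0) for h in H_cols]       # columns of X
print(len(X_cols), max(max(h) for h in H_cols))
```

Because every column of W sums to 2 and every column of H sums to 1, each noiseless column of X sums to 2, which is consistent with the average entry of 0.5 mentioned in the text.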
Multispectral image. The San Diego airport is a HYDICE hyperspectral image (HSI) containing 158 clean bands, and 400 × 400 pixels for each spectral image; see, e.g., [18]. There are mainly three types of materials: road surfaces, roofs and vegetation (trees and grass). The image can be well approximated using $r = 8$. Since we are interested in the case $\mathrm{rank}(W) < r$, we select $m = 5$ spectral bands using the successive projection algorithm [19] (this is essentially Gram-Schmidt with column pivoting) applied on $X^T$. This provides bands that are representative: the selected bands are 4, 32, 116, 128, 150. Hence, we are factoring a 5-by-160000 matrix using $r = 8$. Note that we have removed outlying pixels (some spectra contain large negative entries while others have a norm an order of magnitude larger than most pixels). Fig. 4 displays the abundance maps extracted (that is, the rows of matrix $H$): they correspond to meaningful locations of materials. Here we have used $\tilde{\lambda} = 0.1$ and 1000 iterations. From the initial solution provided by SNPA, min-vol NMF is able to reduce the error $\|X - WH\|_F$ by a factor of 11.7 while the term $\mathrm{logdet}(W^T W + \delta I)$ only increases by a factor of 1.06. The final relative error is $\frac{\|X - WH\|_F}{\|X\|_F} = 0.2\%$.
5. CONCLUSION
In this paper, we have shown that min-vol NMF can be used meaningfully for rank-deficient NMF problems. We have provided a simple algorithm to tackle this problem and have illustrated the behaviour of the method on synthetic and real-world data sets. This work is only preliminary and many important questions remain open; in particular:
• Under which conditions can we prove the identifiability of min-vol NMF in the rank-deficient case (as done in [2, 3] for the full-rank case)? Intuitively, it seems that a condition similar to the sufficiently scattered condition would be sufficient but this has to be analysed thoroughly.
• Can we prove robustness to noise of such techniques? (The question is also open for the full-rank case.)
• Can we design faster and more robust algorithms, and algorithms taking advantage of the fact that the solution is rank-deficient?
Fig. 4. Abundance maps extracted by min-vol NMF using only five bands of the San Diego airport HSI. From left to right, top to bottom: vegetation (grass and trees), three different types of roof tops, four different types of road surfaces.
6. REFERENCES
[1] Xiao Fu, Kejun Huang, Nicholas D Sidiropoulos, and Wing-Kin Ma, “Nonnegative matrix factorization for signal and data analytics: Identifiability, algorithms, and applications,” IEEE Signal Processing Magazine, 2018, to appear.
[2] Chia-Hsiang Lin, Wing-Kin Ma, Wei-Chiang Li, Chong-Yung Chi, and ArulMurugan Ambikapathi, “Identifiability of the simplex volume minimization criterion for blind hyperspectral unmixing: The no-pure-pixel case,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 10, pp. 5530–5546, 2015.
[3] Xiao Fu, Wing-Kin Ma, Kejun Huang, and Nicholas D Sidiropoulos, “Blind separation of quasi-stationary sources: Exploiting convex geometry in covariance domain,” IEEE Transactions on Signal Processing, vol. 63, no. 9, pp. 2306–2320, 2015.
[4] Xiao Fu, Kejun Huang, and Nicholas D Sidiropoulos, “On identifiability of nonnegative matrix factorization,” IEEE Signal Processing Letters, vol. 25, no. 3, pp. 328–332, 2018.
[5] Maurice D Craig, “Minimum-volume transforms for remotely sensed data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 32, no. 3, pp. 542–552, 1994.
[6] Wing-Kin Ma, José M Bioucas-Dias, Tsung-Han Chan, Nicolas Gillis, Paul Gader, Antonio J Plaza, ArulMurugan Ambikapathi, and Chong-Yung Chi, “A signal processing perspective on hyperspectral unmixing: Insights from remote sensing,” IEEE Signal Processing Magazine, vol. 31, no. 1, pp. 67–81, 2014.
[7] Sanjeev Arora, Rong Ge, Ravindran Kannan, and Ankur Moitra, “Computing a nonnegative matrix factorization–provably,” in Proceedings of the forty-fourth annual ACM symposium on Theory of computing. ACM, 2012, pp. 145–162.
[8] Nicolas Gillis, “Introduction to nonnegative matrix factorization,” SIAG/OPT Views and News, vol. 25, no. 1, pp. 7–16, 2017.
[9] Stephen A Vavasis, “On the complexity of nonnegative matrix factorization,” SIAM Journal on Optimization, vol. 20, no. 3, pp. 1364–1377, 2010.
[10] Lidan Miao and Hairong Qi, “Endmember extraction from highly mixed data using minimum volume constrained nonnegative matrix factorization,” IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 3, pp. 765–777, 2007.
[11] Xiao Fu, Kejun Huang, Bo Yang, Wing-Kin Ma, and Nicholas D. Sidiropoulos, “Robust volume minimization-based matrix factorization for remote sensing and document clustering,” IEEE Transactions on Signal Processing, vol. 64, no. 23, pp. 6254–6268, 2016.
[12] Andersen M.S. Ang and Nicolas Gillis, “Volume regularized non-negative matrix factorizations,” in 2018 Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), 2018.
[13] Maryam Fazel, Matrix rank minimization with applications, Ph.D. thesis, Stanford University, 2002.
[14] Maryam Fazel, Haitham Hindi, and Stephen P Boyd, “Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices,” in Proceedings of the 2003 American Control Conference. IEEE, 2003, vol. 3, pp. 2156–2162.
[15] Nicolas Gillis, “Successive nonnegative projection algorithm for robust nonnegative blind source separation,” SIAM Journal on Imaging Sciences, vol. 7, no. 2, pp. 1420–1450, 2014.
[16] Kazuyoshi Yoshii, Ryota Tomioka, Daichi Mochihashi, and Masataka Goto, “Beyond NMF: Time-domain audio source separation without phase reconstruction,” in ISMIR, 2013, pp. 369–374.
[17] Yurii Nesterov, Introductory lectures on convex optimization: A basic course, vol. 87, Springer Science & Business Media, 2013.
[18] Nicolas Gillis, Da Kuang, and Haesun Park, “Hierarchical clustering of hyperspectral images using rank-two nonnegative matrix factorization,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 4, pp. 2066–2078, 2015.
[19] Nicolas Gillis and Stephen A Vavasis, “Fast and robust recursive algorithms for separable nonnegative matrix factorization,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 4, pp. 698–714, 2014.
... where . F is the Frobenius norm, λ > 0 is a parameter balancing the two terms in the objective function, I r is the r × r identity matrix, and δ > 0 is a small parameter that prevents log det(W W) from going to −∞ if W is rank deficient [11]. The use of the logarithm of the determinant is less sensible to very disparate singular values of W, leading to better practical performances [8], [12]. ...
... As far as we know, all algorithms for min-vol NMF rely on two-block coordinate descent methods that update each block (W or H) by using some outer optimization algorithm to solve the subproblems formed by restricting the min-vol NMF problem to each block. For example, the state-of-theart method from [11] uses Nesterov fast gradient method to update each factor matrix, one at a time. ...
... This is a crucial aspect that will make our proposed algorithm faster: when we start the update of a block of variables (here, W or H), we can use the inertial force (using the previous iterate) although the other blocks have been updated in the mean time. 2) TITAN allows to update the surrogate after each update of W and H, which was not possible with the algorithm from [11] because it applied fast gradient from convex optimization on a fixed surrogate. ...
... Thus, many researchers have introduced novel NMF algorithms by adding different auxiliary regularizes to the conventional NMF framework in order to improve the uniqueness of its solution with respect to the HU setting. l 1/2 -sparsity constrained NMF (l 1/2 -NMF) [8], spatial group sparsity regularized NMF (SGSNMF) [42], minimum volume rank deficient NMF (Min-vol NMF) [43], manifold regularized sparse NMF [7], Double Constrained NMF [44], total variation regularized reweighted sparse NMF (TV-RSNMF) [45], subspace clustering constrained sparse NMF (SC-NMF) [46], nonsmooth NMF (nsNMF) [47], robust collaborative NMF (R-CoNMF) [48], Subspace Structure Regularized NMF (SSRNMF) [49], graph regularized NMF (GNMF) [50] and Projection-Based NMF (PNMF) [51] are some customary NMF-based baselines utilized for HU. Furthermore, A new architecture has recently emerged for blind unmixing under the premise Nonnegative Tensor factorization (NTF). ...
... In order to solve above problems while improving the uniqueness, many previous works have incorporated additional auxiliary regularizes on A and S [8], [10], [42], [43], [48], [50]. ...
... The proposed algorithm is tested on simulated as well as real hyperspectral datasets (See Fig. 4). Also, we compare the performance of our proposed algorithm with the popular stateof-the-art NMF-based HU baselines: l 1/2 -NMF [8], SGSNMF [42], Min-vol NMF [43], R-CoNMF [48], SSRNMF [49] and MVNTF [52]. To ensure that the evaluations are done on common grounds, we utilize the same initializing procedure and stopping criteria as mentioned in Sections V-A and V-D respectively, for all the competing algorithms except MVNTF algorithm which is initialized with random values. ...
Article
Full-text available
Hyperspectral unmixing (HU) has become an important technique in exploiting hyperspectral data since it decomposes a mixed pixel into a collection of endmembers weighted by fractional abundances. The endmembers of a hyperspectral image (HSI) are more likely to be generated by independent sources and be mixed in a macroscopic degree before arriving at the sensor element of the imaging spectrometer as mixed spectra. Over the past few decades, many attempts have focused on imposing auxiliary regularizes on the conventional nonnegative matrix factorization (NMF) framework in order to effectively unmix these mixed spectra. As a promising step toward finding an optimum regularizer to extract endmembers, this paper presents a novel blind HU algorithm, referred to as Kurtosis-based Smooth Nonnegative Matrix Factorization (KbSNMF) which incorporates a novel regularizer based on the statistical independence of the probability density functions of endmember spectra. Imposing this regularizer on the conventional NMF framework promotes the extraction of independent endmembers while further enhancing the parts-based representation of data. Experiments conducted on diverse synthetic HSI datasets (with numerous numbers of endmembers, spectral bands, pixels, and noise levels) and three standard real HSI datasets demonstrate the validity of the proposed KbSNMF algorithm compared to several stateof- the-art NMF-based HU baselines. The proposed algorithm exhibits superior performance especially in terms of extracting endmember spectra from hyperspectral data; therefore, it could uplift the performance of recent deep learning HU methods which utilize the endmember spectra as supervisory input data for abundance extraction.
... • Simplex volume minimization: We use the model (6.1) min W,H X − W H 2 F +λ logdet(W W + δI r ) such that H(:, j) ∈ ∆ r for all j, which has been shown to provide the best practical performances [16,1], and use the efficient algorithm proposed in [29]. We will use different parameters forλ = ...
... , where (W (0) , H (0) ) is computed by SNPA, while δ = 0.1; see [29] for more details. We refer to this algorithm as min vol. ...
... Other separable NMF algorithms also work in the rank-deficient case (see, for example, [4,37,24]) but are computationally much more demanding than SNPA, as they rely on solving n linear programs in n variables. • The min vol model (6.1) can be used in the rank-deficient case [29]. However, it does not come with identifiability guarantees (this is actually an open problem). ...
... where . F is the Frobenius norm, λ > 0 is a parameter balancing the two terms in the objective function, I r is the r × r identity matrix, and δ > 0 is a small parameter that prevents log det(W W) from going to −∞ if W is rank deficient [11]. The use of the logarithm of the determinant is less sensible to very disparate singular values of W, leading to better practical performances [8], [12]. ...
... As far as we know, all algorithms for min-vol NMF rely on two-block coordinate descent methods that update each block (W or H) by using some outer optimization algorithm to solve the subproblems formed by restricting the min-vol NMF problem to each block. For example, the state-of-theart method from [11] uses Nesterov fast gradient method to update each factor matrix, one at a time. ...
... This is a crucial aspect that will make our proposed algorithm faster: when we start the update of a block of variables (here, W or H), we can use the inertial force (using the previous iterate) although the other blocks have been updated in the mean time. 2) TITAN allows to update the surrogate after each update of W and H, which was not possible with the algorithm from [11] because it applied fast gradient from convex optimization on a fixed surrogate. ...
Preprint
Full-text available
Nonnegative matrix factorization with the minimum-volume criterion (min-vol NMF) guarantees that, under some mild and realistic conditions, the factorization has an essentially unique solution. This result has been successfully leveraged in many applications, including topic modeling, hyperspectral image unmixing, and audio source separation. In this paper, we propose a fast algorithm to solve min-vol NMF which is based on a recently introduced block majorization-minimization framework with extrapolation steps. We illustrate the effectiveness of our new algorithm compared to the state of the art on several real hyperspectral images and document data sets.
... To solve the minVolNMF problem, a majorization-minimization (MM) framework is usually considered. This consists in minimizing a surrogate function, namely a strongly convex upper approximation of the loss function, see [27] and [28] for the details. The FPGM of Algorithm 4 can then be applied on this surrogate. ...
Preprint
Full-text available
Deep matrix factorizations (deep MFs) are recent unsupervised data mining techniques inspired by constrained low-rank approximations. They aim to extract complex hierarchies of features within high-dimensional datasets. Most of the loss functions proposed in the literature to evaluate the quality of deep MF models and the underlying optimization frameworks are not consistent because different losses are used at different layers. In this paper, we introduce two meaningful loss functions for deep MF and present a generic framework to solve the corresponding optimization problems. We illustrate the effectiveness of this approach through the integration of various constraints and regularizations, such as sparsity, nonnegativity and minimum-volume. The models are successfully applied on both synthetic and real data, namely for hyperspectral unmixing and extraction of facial features.
... Therefore, in general, unsupervised CNMF is not regularized enough to perform transcription. While some works focus on further regularization of NMF [14], we instead turn towards semi-supervision. Figure 3. Three trained templates from the AkPnCGdD synthetic piano in MAPS, using τ = 10 convolution size. ...
Preprint
Full-text available
Automatic Music Transcription, which consists in transforming an audio recording of a musical performance into symbolic format, remains a difficult Music Information Retrieval task. In this work, we propose a semi-supervised approach using low-rank matrix factorization techniques, in particular Convolutive Nonnegative Matrix Factorization. In the semi-supervised setting, only a single recording of each individual notes is required. We show on the MAPS dataset that the proposed semi-supervised CNMF method performs better than state-of-the-art low-rank factorization techniques and a little worse than supervised deep learning state-of-the-art methods, while however suffering from generalization issues.
... 6. MV-NMF is a state-of-the-art minimum-volume NMF algorithm [11] which uses a fast gradient method to solve the sub problems in W and H from [24]. ...
Preprint
Full-text available
Nonnegative matrix factorization (NMF) is a popular model in the field of pattern recognition. It aims to find a low rank approximation for nonnegative data M by a product of two nonnegative matrices W and H. In general, NMF is NP-hard to solve while it can be solved efficiently under separability assumption, which requires the columns of factor matrix are equal to columns of the input matrix. In this paper, we generalize separability assumption based on 3-factor NMF M=P_1SP_2, and require that S is a sub-matrix of the input matrix. We refer to this NMF as a Co-Separable NMF (CoS-NMF). We discuss some mathematics properties of CoS-NMF, and present the relationships with other related matrix factorizations such as CUR decomposition, generalized separable NMF(GS-NMF), and bi-orthogonal tri-factorization (BiOR-NM3F). An optimization model for CoS-NMF is proposed and alternated fast gradient method is employed to solve the model. Numerical experiments on synthetic datasets, document datasets and facial databases are conducted to verify the effectiveness of our CoS-NMF model. Compared to state-of-the-art methods, CoS-NMF model performs very well in co-clustering task, and preserves a good approximation to the input data matrix as well.
... Some basic approaches to these nonconvex problems include projected gradient methods (Lin, 2007), multiplicative update rules (Gonzalez and Zhang, 2005; Lee and Seung, 1999) and alternating optimization (Chu et al., 2004; Paatero and Tapper, 1994). More sophisticated algorithms for NMF have been proposed in recent years, for example see Gillis and Vavasis (2014); Leplat et al. (2019); Mizutani (2014). In this paper we present algorithms to obtain good solutions for the regularized AA problem with sparsity constraints. ...
Preprint
We consider the problem of sparse nonnegative matrix factorization (NMF) with archetypal regularization. The goal is to represent a collection of data points as nonnegative linear combinations of a few nonnegative sparse factors with appealing geometric properties, arising from the use of archetypal regularization. We generalize the notion of robustness studied in Javadi and Montanari (2019) (without sparsity) to the notions of (a) strong robustness, which implies that each estimated archetype is close to the underlying archetypes, and (b) weak robustness, which implies that there exists at least one recovered archetype that is close to the underlying archetypes. Our theoretical results on robustness guarantees hold under minimal assumptions on the underlying data, and apply to settings where the underlying archetypes need not be sparse. We propose new algorithms for our optimization problem, and present numerical experiments on synthetic and real datasets that shed further light on our proposed framework and theoretical developments.
Article
Radio Frequency Identification (RFID) is one of the key technologies of the Internet of Things (IoT). With the rapid development of the IoT, RFID systems are required to be more efficient and to offer higher throughput. In widespread IoT application scenarios, tag collision has become an increasingly prominent problem in RFID systems. Traditionally, anti-collision algorithms for RFID systems have been based on time division multiple access (TDMA). Although TDMA-based anti-collision algorithms are simple and easy to implement, they often miss tags and are time-consuming. Anti-collision algorithms based on blind source separation (BSS) were introduced later. These BSS-based algorithms are more efficient and stable, but they are mostly suited to the determined or overdetermined case, i.e., when the number of tags is less than that of the readers in RFID systems. Only a few anti-collision algorithms address the underdetermined collision model, which is harder to solve but highly relevant to practical IoT applications. Developing high-quality underdetermined anti-collision algorithms is therefore important for improving the efficiency of RFID systems and for enabling wider use of RFID in future IoT systems. Motivated by this, this paper proposes a new anti-collision algorithm for underdetermined RFID mixed systems. Specifically, a nonnegative matrix factorization (NMF) algorithm with minimum-correlation and minimum-volume constraints, called MCV_NMF, is proposed for anti-collision in underdetermined RFID systems. The algorithm combines the independence principle of the tag signals with the NMF mechanism to improve performance. The experimental results and analysis corroborate that the new algorithm handles the underdetermined collision problem well and increases the throughput of RFID systems.
Conference Paper
This work considers two volume regularized non-negative matrix factorization (NMF) problems that decompose a non-negative matrix X into the product of two nonnegative matrices W and H with a regularization on the volume of the convex hull spanned by the columns of W. This regularizer takes two forms: the determinant (det) and logarithm of the determinant (logdet) of the Gramian of W. In this paper, we explore the structure of these problems and present several algorithms, including a new algorithm based on an eigenvalue upper bound of the logdet function. Experimental results on synthetic data show that (i) the new algorithm is competitive with the standard Taylor bound, and (ii) the logdet regularizer works better than the det regularizer. We also illustrate the applicability of the new algorithm on the San Diego airport hyperspectral image.
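As a rough illustration of the two regularizers named in this abstract, the sketch below evaluates det and logdet of the Gramian W^T W for a small matrix W stored as a list of rows. It is not the algorithm of the cited paper. The function names and the optional shift `delta` (added to the diagonal so that the logdet stays finite when W is rank deficient) are assumptions made for this example, and plain Python is used only to keep the block self-contained.

```python
import math


def gramian(W):
    """Return G = W^T W for W given as a list of m rows with r entries each."""
    r = len(W[0])
    return [[sum(row[i] * row[j] for row in W) for j in range(r)]
            for i in range(r)]


def cholesky(G):
    """Return lower-triangular L with L L^T = G.

    Raises ValueError if G is not (numerically) positive definite.
    """
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = G[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def logdet_volume(W, delta=0.0):
    """logdet(W^T W + delta * I); `delta` is a hypothetical shift parameter."""
    G = gramian(W)
    for i in range(len(G)):
        G[i][i] += delta
    L = cholesky(G)
    # det(G) = prod(L_ii)^2, hence logdet(G) = 2 * sum(log L_ii).
    return 2.0 * sum(math.log(L[i][i]) for i in range(len(L)))


def det_volume(W):
    """det(W^T W); this sketch only supports W with full column rank."""
    return math.exp(logdet_volume(W))
```

For W with rows (1, 0), (0, 2), (0, 0), the Gramian is diag(1, 4), so `det_volume` is 4 up to rounding and `logdet_volume` is log 4. For a rank-deficient W, a positive `delta` is needed for the logdet to be finite.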
Article
Nonnegative matrix factorization (NMF) has become a workhorse for signal and data analytics, triggered by its model parsimony and interpretability. Perhaps a bit surprisingly, the understanding of its model identifiability, the major reason behind the interpretability in many applications such as topic mining and hyperspectral imaging, had been rather limited until recent years. Beginning from the 2010s, the identifiability research of NMF has progressed considerably: many interesting and important results have been discovered by the signal processing (SP) and machine learning (ML) communities. NMF identifiability has a great impact on many aspects in practice, such as ill-posed formulation avoidance and performance-guaranteed algorithm design. On the other hand, there is no tutorial paper that introduces NMF from an identifiability viewpoint. In this paper, we aim at filling this gap by offering a comprehensive and deep tutorial on model identifiability of NMF as well as the connections to algorithms and applications. This tutorial will help researchers and graduate students grasp the essence and insights of NMF, thereby avoiding typical 'pitfalls' that are oftentimes due to unidentifiable NMF formulations. This paper will also help practitioners pick/design suitable factorization tools for their own problems.
Article
In this letter, we propose a new identification criterion that guarantees the recovery of the low-rank latent factors in the nonnegative matrix factorization (NMF) model, under mild conditions. Specifically, using the proposed criterion, it suffices to identify the latent factors if the rows of one factor are sufficiently scattered over the nonnegative orthant, while no structural assumption is imposed on the other factor except being full-rank. This is by far the mildest condition under which the latent factors are provably identifiable from the NMF model.
Article
In this paper, we introduce and provide a short overview of nonnegative matrix factorization (NMF). Several aspects of NMF are discussed, namely, the application in hyperspectral imaging, geometry and uniqueness of NMF solutions, complexity, algorithms, and its link with extended formulations of polyhedra. In order to put NMF into perspective, the more general problem class of constrained low-rank matrix approximation problems is first briefly introduced.
Article
This paper revisits blind source separation of instantaneously mixed quasi-stationary sources (BSS-QSS), motivated by the observation that in certain applications (e.g., speech) there exist time frames during which only one source is active, or locally dominant. Combined with nonnegativity of source powers, this endows the problem with a nice convex geometry that enables elegant and efficient BSS solutions. Local dominance is tantamount to the so-called pure pixel/separability assumption in hyperspectral unmixing/nonnegative matrix factorization, respectively. Building on this link, a very simple algorithm called successive projection algorithm (SPA) is considered for estimating the mixing system in closed form. To complement SPA in the specific BSS-QSS context, an algebraic preprocessing procedure is proposed to suppress short-term source cross-correlation interference. The proposed procedure is simple, effective, and supported by theoretical analysis. Solutions based on volume minimization (VolMin) are also considered. By theoretical analysis, it is shown that VolMin guarantees perfect mixing system identifiability under an assumption more relaxed than (exact) local dominance—which means wider applicability in practice. Exploiting the specific structure of BSS-QSS, a fast VolMin algorithm is proposed for the overdetermined case. Careful simulations using real speech sources showcase the simplicity, efficiency, and accuracy of the proposed algorithms.
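The successive projection algorithm (SPA) mentioned in this abstract can be sketched in a few lines. The version below is a minimal illustration of the general idea, not the implementation or the preprocessing of the cited paper. It repeatedly picks the column with the largest residual norm, then projects all columns onto the orthogonal complement of the picked one. The function name `spa` and the list-of-rows input format are assumptions for this example, which also assumes noiseless data satisfying the separability (local dominance) condition.

```python
def spa(X, r):
    """Select up to r column indices of X with a basic SPA.

    X is a list of m rows, each with n entries. The input is not modified.
    """
    m, n = len(X), len(X[0])
    # Residual matrix stored column-wise: R[j] is (a copy of) column j of X.
    R = [[X[i][j] for i in range(m)] for j in range(n)]
    selected = []
    for _ in range(r):
        sq_norms = [sum(v * v for v in col) for col in R]
        j = max(range(n), key=sq_norms.__getitem__)
        if sq_norms[j] == 0.0:
            break  # nothing left to extract
        selected.append(j)
        u = R[j][:]  # copy, because R[j] itself is updated below
        for col in R:
            # Remove the component of each residual column along u.
            c = sum(a * b for a, b in zip(u, col)) / sq_norms[j]
            for i in range(m):
                col[i] -= c * u[i]
    return selected


# Columns 3, 1 and 4 are the "pure" columns; columns 0 and 2 are
# convex combinations of them.
X = [
    [1.5, 0.0, 1.0, 3.0, 0.0],
    [1.0, 2.0, 2.0 / 3.0, 0.0, 0.0],
    [0.0, 0.0, 1.0 / 3.0, 0.0, 1.0],
]
pure_columns = spa(X, 3)
```

On this toy input the pure columns are picked in decreasing order of norm, and the mixed columns are never selected because their residuals are always dominated by a remaining pure column.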
Article
In blind hyperspectral unmixing (HU), the pure-pixel assumption is well-known to be powerful in enabling simple and effective blind HU solutions. However, the pure-pixel assumption is not always satisfied in an exact sense, especially for scenarios where pixels are all intimately mixed. In the no pure-pixel case, a good blind HU approach to consider is the minimum volume enclosing simplex (MVES). Empirical experience has suggested that MVES algorithms can perform well without pure pixels, although it was not totally clear why this is true from a theoretical viewpoint. This paper aims to address the latter issue. We develop an analysis framework wherein the perfect identifiability of MVES is studied under the noiseless case. We prove that MVES is indeed robust against lack of pure pixels, as long as the pixels do not get too heavily mixed and too asymmetrically spread. Also, our analysis reveals a surprising and counter-intuitive result, namely, that MVES becomes more robust against lack of pure pixels as the number of endmembers increases. The theoretical results are verified by numerical simulations.
Article
Blind hyperspectral unmixing (HU), also known as unsupervised HU, is one of the most prominent research topics in signal processing (SP) for hyperspectral remote sensing [1], [2]. Blind HU aims at identifying materials present in a captured scene, as well as their compositions, by using high spectral resolution of hyperspectral images. It is a blind source separation (BSS) problem from a SP viewpoint. Research on this topic started in the 1990s in geoscience and remote sensing [3]-[7], enabled by technological advances in hyperspectral sensing at the time. In recent years, blind HU has attracted much interest from other fields such as SP, machine learning, and optimization, and the subsequent cross-disciplinary research activities have made blind HU a vibrant topic. The resulting impact is not just on remote sensing - blind HU has provided a unique problem scenario that inspired researchers from different fields to devise novel blind SP methods. In fact, one may say that blind HU has established a new branch of BSS approaches not seen in classical BSS studies. In particular, the convex geometry concepts - discovered by early remote sensing researchers through empirical observations [3]-[7] and refined by later research - are elegant and very different from statistical independence-based BSS approaches established in the SP field. Moreover, the latest research on blind HU is rapidly adopting advanced techniques, such as those in sparse SP and optimization. The present development of blind HU seems to be converging to a point where the lines between remote sensing-originated ideas and advanced SP and optimization concepts are no longer clear, and insights from both sides would be used to establish better methods.
Article
In this paper, we design a hierarchical clustering algorithm for high-resolution hyperspectral images. At the core of the algorithm, a new rank-two nonnegative matrix factorization (NMF) algorithm is used to split the clusters, which is motivated by convex geometry concepts. The method starts with a single cluster containing all pixels, and, at each step, (i) selects a cluster in such a way that the error at the next step is minimized, and (ii) splits the selected cluster into two disjoint clusters using rank-two NMF in such a way that the clusters are well balanced and stable. The proposed method can also be used as an endmember extraction algorithm in the presence of pure pixels. The effectiveness of this approach is illustrated on several synthetic and real-world hyperspectral images, and shown to outperform standard clustering techniques such as k-means, spherical k-means and standard NMF.
Article
In the nonnegative matrix factorization (NMF) problem we are given an $n \times m$ nonnegative matrix $M$ and an integer $r > 0$. Our goal is to express $M$ as $A W$, where $A$ and $W$ are nonnegative matrices of size $n \times r$ and $r \times m$, respectively. In some applications, it makes sense to ask instead for the product $AW$ to approximate $M$, i.e. (approximately) minimize $\left\lVert M - AW \right\rVert_F$, where $\left\lVert \cdot \right\rVert_F$ denotes the Frobenius norm; we refer to this as approximate NMF. This problem has a rich history spanning quantum mechanics, probability theory, data analysis, polyhedral combinatorics, communication complexity, demography, chemometrics, etc. In the past decade NMF has become enormously popular in machine learning, where $A$ and $W$ are computed using a variety of local search heuristics. Vavasis recently proved that this problem is NP-complete. (Without the restriction that $A$ and $W$ be nonnegative, both the exact and approximate problems can be solved optimally via the singular value decomposition.) We initiate a study of when this problem is solvable in polynomial time. Our results are the following: 1. We give a polynomial-time algorithm for exact and approximate NMF for every constant $r$. Indeed NMF is most interesting in applications precisely when $r$ is small. 2. We complement this with a hardness result, that if exact NMF can be solved in time $(nm)^{o(r)}$, 3-SAT has a subexponential-time algorithm. This rules out substantial improvements to the above algorithm. 3. We give an algorithm that runs in time polynomial in $n$, $m$, and $r$ under the separability condition identified by Donoho and Stodden in 2003. The algorithm may be practical since it is simple and noise tolerant (under benign assumptions). Separability is believed to hold in many practical settings. To the best of our knowledge, this last result is the first example of a polynomial-time algorithm that provably works under a non-trivial condition on the input and we believe that this will be an interesting and important direction for future work.
Article
This paper considers volume minimization (VolMin)-based structured matrix factorization (SMF). VolMin is a factorization criterion that decomposes a given data matrix into a basis matrix times a structured coefficient matrix via finding the minimum-volume simplex that encloses all the columns of the data matrix. Recent work showed that VolMin guarantees the identifiability of the factor matrices under mild conditions that are realistic in a wide variety of applications. This paper focuses on both theoretical and practical aspects of VolMin. On the theory side, exact equivalence of two independently developed sufficient conditions for VolMin identifiability is proven here, thereby providing a more comprehensive understanding of this aspect of VolMin. On the algorithm side, computational complexity and sensitivity to outliers are two key challenges associated with real-world applications of VolMin. These are addressed here via a new VolMin algorithm that handles volume regularization in a computationally simple way, and automatically detects and iteratively downweights outliers, simultaneously. Simulations and real-data experiments using a remotely sensed hyperspectral image and the Reuters document corpus are employed to showcase the effectiveness of the proposed algorithm.