MINIMUM-VOLUME RANK-DEFICIENT NONNEGATIVE MATRIX FACTORIZATIONS
Valentin Leplat, Andersen M.S. Ang, Nicolas Gillis
University of Mons, Rue de Houdain 9, 7000 Mons, Belgium
ABSTRACT
In recent years, nonnegative matrix factorization (NMF) with volume regularization has been shown to be a powerful identifiable model; for example for hyperspectral unmixing, document classification, community detection and hidden Markov models. In this paper, we show that minimum-volume NMF (min-vol NMF) can also be used when the basis matrix is rank deficient, which is a reasonable scenario for some real-world NMF problems (e.g., for unmixing multispectral images). We propose an alternating fast projected gradient method for min-vol NMF and illustrate its use on rank-deficient NMF problems, namely a synthetic data set and a multispectral image.
Index Terms— nonnegative matrix factorization, minimum volume, identifiability, rank deficiency
1. INTRODUCTION
Given a nonnegative matrix $X \in \mathbb{R}^{m \times n}_+$ and a factorization rank $r$, nonnegative matrix factorization (NMF) requires to find two nonnegative matrices $W \in \mathbb{R}^{m \times r}_+$ and $H \in \mathbb{R}^{r \times n}_+$ such that $X \approx WH$. For simplicity, we will use the Frobenius norm, which is arguably the most widely used, to assess the error of an NMF solution, and consider the following optimization problem:
$$\min_{W \in \mathbb{R}^{m \times r},\, H \in \mathbb{R}^{r \times n}} \|X - WH\|_F^2 \quad \text{s.t.} \quad W \geq 0 \text{ and } H \geq 0.$$
NMF is in most cases ill-posed because the optimal solution is not unique. In order to make the solution of the above problem unique (up to permutation and scaling of the columns of $W$ and rows of $H$), hence making the problem well-posed and the parameters $(W,H)$ of the problem identifiable, a key idea is to look for a solution $W$ with minimum volume; see [1] and the references therein. A possible formulation for minimum-volume NMF (min-vol NMF) is as follows:
$$\min_{W \geq 0,\; H(:,j) \in \Delta^r \,\forall j} \; \|X - WH\|_F^2 + \lambda \,\mathrm{vol}(W), \qquad (1)$$
where $\Delta^r = \{x \in \mathbb{R}^r_+ \mid \sum_i x_i \leq 1\}$, $\lambda$ is a penalty parameter, and $\mathrm{vol}(W)$ is a function that measures the volume of the columns of $W$. Note that $H$ needs to be normalized, otherwise $W$ would go to zero since $WH = (cW)(H/c)$ for any $c > 0$.
[Footnote: The authors acknowledge the support of the European Research Council (ERC starting grant no 679515), and of the Fonds de la Recherche Scientifique - FNRS and the Fonds Wetenschappelijk Onderzoek - Vlaanderen (FWO) under EOS Project no O005318F-RG47.]
In this paper, we will use $\mathrm{vol}(W) = \log\det(W^TW + \delta I)$, where $I$ is the identity matrix of appropriate dimensions. The reason for using such a measure is that $\sqrt{\det(W^TW)}/r!$ is the volume of the convex hull of the columns of $W$ and the origin. Under some appropriate conditions on $X = WH$, this model will provably recover the true underlying $(W,H)$ that generated $X$. These recovery conditions require that the columns of $X$ are sufficiently well spread in the convex hull generated by the columns of $W$ [2, 3, 4]; this is the so-called sufficiently scattered condition. In particular, data points need to be located on the facets of this convex hull, hence $H$ needs to be sufficiently sparse. A few remarks are in order:
• The ideas behind min-vol NMF were introduced in the hyperspectral image community and date back to the paper [5]; see also the discussions in [6, 1].
• As far as we know, these theoretical results only apply in noiseless conditions, hence the robustness to noise of model (1) still needs to be rigorously analyzed (this is a very promising but difficult direction for further research).
• The sufficiently scattered condition is a generalization of the separability condition, which requires $W = X(:,\mathcal{K})$ for some index set $\mathcal{K}$ of size $r$. Separability makes the NMF problem easily solvable, and efficient and robust algorithms exist; see, e.g., [7, 6, 8] and the references therein. Note that although min-vol NMF guarantees identifiability, the corresponding optimization problem (1) is still hard to solve in general, as is the original NMF problem [9].
• Another key assumption used in min-vol NMF is that the basis matrix $W$ is full rank, that is, $\mathrm{rank}(W) = r$; otherwise $\det(W^TW) = 0$. However, there are situations where the matrix $W$ is not full rank: this happens in particular when $\mathrm{rank}(X) \neq \mathrm{rank}_+(X)$, where $\mathrm{rank}_+(X)$ is the nonnegative rank of $X$, that is, the smallest $r$ such that $X$ admits an exact NMF decomposition ($X = WH$). Here is a simple example:
$$X = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}, \qquad (2)$$
for which $\mathrm{rank}(X) = 3 < \mathrm{rank}_+(X) = 4$. The columns of the matrix $X$ are the vertices of a square in a 2-dimensional subspace; see Fig. 2 for an illustration. A practical situation where this could happen is multispectral imaging. Let us construct the matrix $X$ such that each column $X(:,j) \geq 0$ is the spectral signature of a pixel. Then, under the linear mixing model, each column of $X$ is a nonnegative linear combination of the spectral signatures of the constitutive materials present in the image, referred to as endmembers: we have $X(:,j) = \sum_{k=1}^{r} W(:,k)\,H(k,j)$, where $W(:,k)$ is the spectral signature of the $k$th endmember, and $H(k,j)$ is the abundance of the $k$th endmember in the $j$th pixel; see [6] for more details. For multispectral images, the number of materials within the scene being imaged can be larger than the number of spectral bands, meaning that $r > m$, hence $\mathrm{rank}(W) \leq m < r$.
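The gap between the rank and the nonnegative rank of the matrix in (2) is easy to check numerically; the following is a quick sketch in Python/NumPy (the paper's own experiments use Matlab, so the language here is an assumption):

```python
import numpy as np

# The matrix X from equation (2): its columns are the vertices of a
# square living in a lower-dimensional subspace.
X = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 1, 1, 0],
              [1, 0, 0, 1]], dtype=float)

# rank(X) = 3, even though no exact NMF X = WH with r = 3 exists:
# the nonnegative rank of X is 4.
print(np.linalg.matrix_rank(X))  # 3

# Pairwise column distances are consistent with a square: adjacent
# vertices at distance sqrt(2), opposite vertices at distance 2.
print(np.linalg.norm(X[:, 0] - X[:, 1]),
      np.linalg.norm(X[:, 0] - X[:, 2]))
```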
In this paper, we focus on the min-vol NMF formulation in the rank-deficient scenario, that is, when $\mathrm{rank}(W) < r$. The main contribution of this paper is threefold: (i) we explain why min-vol NMF (1) can be used meaningfully when the basis matrix $W$ is not full rank; this is, as far as we know, the first time this observation is made in the literature. (ii) We propose an algorithm based on an alternating projected fast gradient method to tackle this problem. (iii) We illustrate our results on a synthetic data set and a multispectral image.
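As noted above, $\sqrt{\det(W^TW)}/r!$ is the volume of the convex hull of the columns of $W$ and the origin; a quick numerical check of this formula (a Python/NumPy sketch, not the paper's code):

```python
import math
import numpy as np

def simplex_volume(W):
    """Volume of conv{0, W(:,1), ..., W(:,r)}: sqrt(det(W^T W)) / r!."""
    r = W.shape[1]
    return math.sqrt(np.linalg.det(W.T @ W)) / math.factorial(r)

# For W = I_2, the hull is the triangle with vertices (0,0), (1,0), (0,1),
# whose area is 1/2.
print(simplex_volume(np.eye(2)))

# Scaling one column scales the volume linearly: diag(2, 1) gives area 1.
print(simplex_volume(np.diag([2.0, 1.0])))
```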
2. MIN-VOL NMF IN THE RANK-DEFICIENT CASE
Let us discuss the min-vol NMF model we consider in this paper, namely
$$\min_{W \geq 0,\; H(:,j) \in \Delta^r \,\forall j} \; \|X - WH\|_F^2 + \lambda \log\det(W^TW + \delta I), \qquad (3)$$
which has three key ingredients: the choice of the volume regularizer, that is, $\log\det(W^TW + \delta I)$, and of the parameters $\delta$ and $\lambda$. They are discussed in the next three paragraphs.
Choice of the volume regularizer. Most functions used to minimize the volume of the columns of $W$ are based on the Gram matrix $W^TW$; in particular, $\det(W^TW)$ and $\log\det(W^TW + \delta I)$ for some $\delta > 0$ are the most widely used measures; see, e.g., [10, 11]. Note that $\det(W^TW) = \prod_{i=1}^{r} \sigma_i^2(W)$, hence the log term weights down large singular values and has been observed to work better in practice; see, e.g., [12]. When $W$ is rank deficient (that is, $\mathrm{rank}(W) < r$), some singular values of $W$ are equal to zero, hence $\det(W^TW) = 0$. Therefore, the function $\det(W^TW)$ cannot distinguish between different rank-deficient solutions.¹ However, we have $\log\det(W^TW + \delta I) = \sum_{i=1}^{r} \log(\sigma_i^2(W) + \delta)$. Hence, if $W$ has one (or more) singular values equal to zero, this measure still makes sense: among two rank-deficient solutions belonging to the same low-dimensional subspace, minimizing $\log\det(W^TW + \delta I)$ will favor the solution whose convex hull has a smaller volume within that subspace, since decreasing the nonzero singular values of $W$ decreases $\log\det(W^TW + \delta I)$. In mathematical terms, let $W \in \mathbb{R}^{m \times r}$ belong to a $k$-dimensional subspace with $k < r$, so that $W = US$, where $U \in \mathbb{R}^{m \times k}$ is an orthogonal basis of that subspace and $S \in \mathbb{R}^{k \times r}$ contains the coordinates of the columns of $W$ in that subspace. Then
$$\log\det(W^TW + \delta I) = \sum_{i=1}^{k} \log(\sigma_i^2(S) + \delta) + (r-k)\log(\delta).$$
The min-vol criterion $\log\det(W^TW + \delta I)$ with $\delta > 0$ is therefore meaningful even when $W$ does not have rank $r$.
¹Of course, one could also use the measure $\det(W^TW + \delta I)$ meaningfully in the rank-deficient case. However, it would be numerically more challenging since, for each singular value of $W$ equal to zero, the objective is multiplied by $\delta$, which should be chosen relatively small.
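The behaviour described above can be checked numerically; a Python/NumPy sketch (the matrices below are illustrative choices, not from the paper): for two rank-deficient $W$ in the same 2-dimensional subspace, $\det(W^TW)$ vanishes for both, while $\log\det(W^TW + \delta I)$ is smaller for the one with the smaller volume within that subspace.

```python
import numpy as np

def logdet_vol(W, delta):
    """The min-vol criterion logdet(W^T W + delta * I)."""
    r = W.shape[1]
    return np.log(np.linalg.det(W.T @ W + delta * np.eye(r)))

delta = 0.1
U = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # orthonormal basis, k = 2
S1 = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])   # coordinates, r = 3
S2 = 0.5 * S1                                       # same subspace, smaller hull
W1, W2 = U @ S1, U @ S2

# det(W^T W) cannot distinguish the two rank-deficient solutions...
print(np.linalg.det(W1.T @ W1), np.linalg.det(W2.T @ W2))

# ...but logdet(W^T W + delta*I) can: smaller volume gives a smaller value.
print(logdet_vol(W1, delta) > logdet_vol(W2, delta))

# Sanity check of the subspace identity with k = 2, r = 3.
k, r = 2, 3
sv = np.linalg.svd(S1, compute_uv=False)
identity = np.sum(np.log(sv**2 + delta)) + (r - k) * np.log(delta)
print(np.isclose(identity, logdet_vol(W1, delta)))
```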
Choice of δ. The function $\log\det(W^TW + \delta I)$, which is equal to $\sum_{i=1}^{r} \log(\sigma_i^2(W) + \delta)$, is a nonconvex surrogate for the $\ell_0$ norm of the vector of singular values of $W$ (up to constant factors), that is, of $\mathrm{rank}(W)$ [13, 14]. It is sharper than the $\ell_1$ norm of the vector of singular values (that is, the nuclear norm) for $\delta$ sufficiently small; see Fig. 1. Therefore, if one wants to promote rank-deficient solutions, $\delta$ should not be chosen too large, say $\delta \leq 0.1$.

Fig. 1. The function $\frac{\log(x^2+\delta) - \log(\delta)}{\log(1+\delta) - \log(\delta)}$ for different values of $\delta$, the $\ell_1$ norm ($= |x|$) and the $\ell_0$ norm ($= 0$ for $x = 0$, $= 1$ otherwise).

Moreover, $\delta$ should not be chosen too small, otherwise $W^TW + \delta I$ might be badly conditioned, which makes the optimization problem harder to solve (see Section 3); this could also give too much importance to zero singular values, which might not be desirable. Therefore, in practice, we recommend using a value of $\delta$ between 0.1 and $10^{-3}$. We will use $\delta = 0.1$ in this paper. Note that in previous works, $\delta$ was chosen very small (e.g., $10^{-8}$ in [11]), which, as explained above, is not a desirable choice, at least in the rank-deficient case. Even in the full-rank case, we argue that choosing $\delta$ too small is not desirable since it promotes rank-deficient solutions.
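The normalized surrogate of Fig. 1 can be reproduced numerically (a Python sketch): as $\delta$ decreases, $\frac{\log(x^2+\delta)-\log(\delta)}{\log(1+\delta)-\log(\delta)}$ approaches the $\ell_0$ indicator on $[0,1]$, while it equals 0 at $x = 0$ and 1 at $x = 1$ for any $\delta$.

```python
import numpy as np

def surrogate(x, delta):
    """Normalized logdet surrogate from Fig. 1, mapping [0,1] to [0,1]."""
    return ((np.log(x**2 + delta) - np.log(delta))
            / (np.log(1 + delta) - np.log(delta)))

for delta in (0.1, 1e-3, 1e-12):
    # Exactly 0 at x = 0 and 1 at x = 1; for small delta, the value at
    # moderate x is already close to 1 (ell_0-like behaviour).
    print(delta, surrogate(0.0, delta), surrogate(1.0, delta),
          surrogate(0.5, delta))
```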
Choice of λ. The choice of $\delta$ influences the choice of $\lambda$: the smaller $\delta$, the larger $|\log(\delta)|$, hence, to balance the two terms in the objective (3), $\lambda$ should be smaller. For the practical implementation, we initialize $W^{(0)} = X(:,\mathcal{K})$, where $\mathcal{K}$ is computed with the successive nonnegative projection algorithm (SNPA), which can handle the rank-deficient separable NMF problem [15]. Note that SNPA also provides the matrix $H^{(0)}$ that minimizes $\|X - W^{(0)}H^{(0)}\|_F^2$ while $H^{(0)}(:,j) \in \Delta^r$ for all $j$. Finally, we choose
$$\lambda = \tilde{\lambda}\, \frac{\|X - W^{(0)}H^{(0)}\|_F^2}{\left|\log\det\left(W^{(0)T}W^{(0)} + \delta I\right)\right|},$$
where we recommend choosing $\tilde{\lambda}$ between 1 and $10^{-3}$, depending on the noise level (the noisier the input matrix, the larger $\tilde{\lambda}$ should be).
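In code, this scaling of $\lambda$ can be sketched as follows (Python; here `W0` and `H0` are random stand-ins for the SNPA initialization, which is an assumption of this sketch):

```python
import numpy as np

def choose_lambda(X, W0, H0, delta=0.1, lam_tilde=0.01):
    """lambda = lam_tilde * ||X - W0 H0||_F^2 / |logdet(W0^T W0 + delta*I)|."""
    r = W0.shape[1]
    fit = np.linalg.norm(X - W0 @ H0, 'fro')**2
    logdet = np.log(np.linalg.det(W0.T @ W0 + delta * np.eye(r)))
    return lam_tilde * fit / abs(logdet)

rng = np.random.default_rng(0)
X = rng.random((5, 50))
W0, H0 = rng.random((5, 3)), rng.random((3, 50)) / 3  # stand-in for SNPA output
lam = choose_lambda(X, W0, H0)
print(lam)
```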
3. ALGORITHM FOR MIN-VOL NMF
Most algorithms for NMF optimize alternately over $W$ and $H$, and we adopt this strategy in this paper. For the update of $H$, we use the projected fast gradient method (PFGM) from [15]. Note that, as opposed to previously proposed methods for min-vol NMF, we assume that the sum of the entries of each column of $H$ is smaller than or equal to one, not equal to one, which is more general. For the update of $W$, we use a PFGM applied to a strongly convex upper approximation of the objective function, similarly as done in [11], although in that paper the authors did not consider explicitly the case $W \geq 0$ ($W$ is unconstrained in their model) and did not write down explicitly a PFGM taking advantage of the strong convexity. For the sake of completeness, we briefly recall this approach. The following upper bound for the logdet term holds: for any $Q \succ 0$ and $S \succ 0$, we have
$$\log\det(Q) \leq g(Q,S) = \log\det(S) + \mathrm{trace}\big(S^{-1}(Q-S)\big) = \mathrm{trace}(S^{-1}Q) + \log\det(S) - r.$$
This follows from the concavity of $\log\det(\cdot)$, as $g(Q,S)$ is the first-order Taylor approximation of $\log\det(Q)$ around $S$; it has also been used, for example, in [16]. This gives, up to an additive constant independent of $W$,
$$\log\det(W^TW + \delta I) \leq \mathrm{trace}(Y W^TW) + \log\det(Y^{-1}) - r$$
for any $W$ and any $Y = (Z^TZ + \delta I)^{-1}$ with $\delta > 0$. Plugging this into the original objective function, and denoting $w_i^T$ the $i$th row of the matrix $W$ and $\langle \cdot, \cdot \rangle$ the Frobenius inner product of two matrices, we obtain
$$\ell(W) = \|X - WH\|_F^2 + \lambda \log\det(W^TW + \delta I) = \|X\|_F^2 - 2\langle XH^T, W\rangle + \langle W^TW, HH^T\rangle + \lambda \log\det(W^TW + \delta I) \leq \langle W^TW, HH^T + \lambda Y\rangle - 2\langle C, W\rangle + b = 2\sum_{i=1}^{m}\Big(\tfrac{1}{2}\, w_i^T A w_i - c_i^T w_i\Big) + b = \bar{\ell}(W),$$
where $Y = (Z^TZ + \delta I)^{-1}$ and $A = HH^T + \lambda Y$ are positive definite for $\delta, \lambda > 0$, $C = XH^T$ with $c_i^T$ the $i$th row of $C$, and $b$ is a constant independent of $W$. Note that $\bar{\ell}(W) = \ell(W)$ for $Z = W$. Minimizing the upper bound $\bar{\ell}(W)$ of $\ell(W)$ requires solving $m$ independent strongly convex optimization problems with Hessian matrix $A$. Using PFGM on this problem, we obtain a linearly convergent method with rate $\big(1 - \sqrt{\kappa^{-1}}\big)/\big(1 + \sqrt{\kappa^{-1}}\big)$, where $\kappa$ is the condition number of $A$ [17]. Note that the subproblem in the variable $H$ is not strongly convex when $W$ is rank deficient, in which case PFGM converges sublinearly, in $O(1/k^2)$, where $k$ is the iteration number. In both cases, PFGM is an optimal first-order method [17], that is, no first-order method can have a faster convergence rate. When $W$ is rank deficient, we have $\frac{\lambda}{\delta} \leq L = \lambda_{\max}(A) \leq \|H\|_2^2 + \frac{\lambda}{\delta}$, where $L$ is the largest eigenvalue of $A$. This shows the importance of not choosing $\delta$ too small: the smaller $\delta$, the larger the condition number of $A$, hence the slower the PFGM. Note that $L$ is the Lipschitz constant of the gradient of the objective function and controls the step size, which is equal to $1/L$. Our proposed algorithm is summarized in Alg. 1. We use 10 inner iterations for the PFGM on $W$ and on $H$.
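The majorization above can be verified numerically; a Python/NumPy sketch (with all terms independent of $W$, including the $\delta\,\mathrm{trace}(Y)$ term, collected into the constant $b$), checking that $\bar{\ell}(W) \geq \ell(W)$ with equality at $Z = W$:

```python
import numpy as np

def ell(X, W, H, delta, lam):
    """Objective (3): fit term plus lam * logdet(W^T W + delta*I)."""
    r = W.shape[1]
    return (np.linalg.norm(X - W @ H, 'fro')**2
            + lam * np.log(np.linalg.det(W.T @ W + delta * np.eye(r))))

def ell_bar(X, W, H, Z, delta, lam):
    """Quadratic upper bound of ell(W), tight at Z = W."""
    r = W.shape[1]
    Y = np.linalg.inv(Z.T @ Z + delta * np.eye(r))
    A = H @ H.T + lam * Y                       # positive definite Hessian
    C = X @ H.T
    # b collects all constants: ||X||_F^2 + lam*(logdet(Y^-1) - r + delta*tr(Y)).
    b = (np.linalg.norm(X, 'fro')**2
         + lam * (-np.log(np.linalg.det(Y)) - r + delta * np.trace(Y)))
    return np.trace(W.T @ W @ A) - 2 * np.sum(C * W) + b

rng = np.random.default_rng(1)
X, W, H = rng.random((6, 40)), rng.random((6, 4)), rng.random((4, 40)) / 4
Z = rng.random((6, 4))
delta, lam = 0.1, 0.5

# Majorization: ell_bar >= ell for any Z, with equality at Z = W.
print(ell_bar(X, W, H, Z, delta, lam) >= ell(X, W, H, delta, lam))
print(np.isclose(ell_bar(X, W, H, W, delta, lam), ell(X, W, H, delta, lam)))
```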
Algorithm 1 Min-vol NMF using alternating PFGM
Require: Input matrix $X \in \mathbb{R}^{m \times n}_+$, the factorization rank $r$, $\delta > 0$, $\tilde{\lambda} > 0$, number of iterations maxiter.
Ensure: $(W,H)$ is an approximate solution of (3).
1: Initialize $(W,H)$ using SNPA [15].
2: Let $\lambda = \tilde{\lambda}\, \frac{\|X - WH\|_F^2}{|\log\det(W^TW + \delta I)|}$.
3: for $k = 1, 2, \ldots,$ maxiter do
4:   % Update W
5:   Let $A = HH^T + \lambda (W^TW + \delta I)^{-1}$ and $C = XH^T$.
6:   Perform a few steps of PFGM on the problem $\min_{U \geq 0} \frac{1}{2}\langle U^TU, A\rangle - \langle U, C\rangle$, with initialization $U = W$. Set $W$ as the last iterate.
7:   % Update H
8:   Perform a few steps of PFGM on the problem $\min_{H(:,j) \in \Delta^r \,\forall j} \|X - WH\|_F^2$ as in [15].
9: end for
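A compact Python/NumPy sketch of Alg. 1, with simplifications that are assumptions of this sketch: plain projected gradient steps instead of the accelerated PFGM, random nonnegative initialization instead of SNPA, and a standard Euclidean projection onto $\Delta^r = \{x \geq 0, \sum_i x_i \leq 1\}$ for the columns of $H$. It also records the objective values for monitoring.

```python
import numpy as np

def proj_simplex_leq(v):
    """Euclidean projection of v onto {x >= 0, sum(x) <= 1}."""
    x = np.maximum(v, 0)
    if x.sum() <= 1:
        return x
    # Otherwise, project onto the unit simplex {x >= 0, sum(x) = 1}.
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0)

def minvol_nmf(X, r, delta=0.1, lam_tilde=0.01, maxiter=100, inner=10, seed=0):
    """Simplified sketch of Alg. 1 (projected gradient, random init)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W, H = rng.random((m, r)), rng.random((r, n)) / r  # stand-in for SNPA init
    lam = lam_tilde * np.linalg.norm(X - W @ H, 'fro')**2 / abs(
        np.log(np.linalg.det(W.T @ W + delta * np.eye(r))))
    objs = []
    for _ in range(maxiter):
        # Update W: projected gradient steps on the strongly convex upper
        # bound with Hessian A = HH^T + lam * (W^T W + delta*I)^(-1).
        A = H @ H.T + lam * np.linalg.inv(W.T @ W + delta * np.eye(r))
        C = X @ H.T
        L = np.linalg.eigvalsh(A)[-1]  # Lipschitz constant, step size 1/L
        for _ in range(inner):
            W = np.maximum(W - (W @ A - C) / L, 0)
        # Update H: projected gradient steps on ||X - WH||_F^2 with
        # column-wise projection onto Delta^r.
        WtW, WtX = W.T @ W, W.T @ X
        Lh = max(np.linalg.eigvalsh(WtW)[-1], 1e-12)
        for _ in range(inner):
            G = H - (WtW @ H - WtX) / Lh
            H = np.apply_along_axis(proj_simplex_leq, 0, G)
        objs.append(np.linalg.norm(X - W @ H, 'fro')**2
                    + lam * np.log(np.linalg.det(W.T @ W + delta * np.eye(r))))
    return W, H, objs
```

Since each $W$-step decreases the majorizer (which is tight at the current iterate) and each $H$-step decreases the fit term without affecting the logdet term, the recorded objective values are non-increasing.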
4. NUMERICAL EXPERIMENTS
We now apply our method to a synthetic and a real-world data set. All tests are performed using Matlab R2015a on a laptop with an Intel Core i7-7500U CPU @ 2.9GHz and 24GB of RAM. The code is available from http://bit.ly/minvolNMF.
Synthetic data set. Let us construct the matrix $X \in \mathbb{R}^{4 \times 500}$ as follows: $W$ is taken as the matrix from (2), so that $\mathrm{rank}(W) = 3 < r = 4$, and each column of $H$ is sampled from the Dirichlet distribution with parameters $(0.1, \ldots, 0.1)$. Each column of $H$ with an entry larger than 0.8 is resampled as long as this condition does not hold. This guarantees that no data point is close to a column of $W$ (this is sometimes referred to as the purity index). Fig. 2 illustrates this geometric problem. As observed in Fig. 2, Alg. 1 is able to perfectly recover the true columns of $W$. For this experiment, we use $\tilde{\lambda} = 0.01$.

Fig. 2. Synthetic data set and recovery. (Only the first three entries of each four-dimensional vector are displayed.)

Fig. 3 illustrates the same experiment where noise is added to $X = \max(0, WH + N)$, where $N = \epsilon\,$randn(m,n) in Matlab notation (i.i.d. Gaussian entries with mean zero and standard deviation $\epsilon$). Note that the average of the entries of $X$ is 0.5 (each column is a linear combination of the columns of $W$, with weights summing to one). Fig. 3 displays, as a function of the noise level $\epsilon$, the average over 20 randomly generated matrices $X$ of the relative error $d(W, \tilde{W}) = \frac{\|W - \tilde{W}\|_F}{\|W\|_F}$, where $\tilde{W}$ is the solution computed by Alg. 1. This illustrates that min-vol NMF is robust against noise, since $d(W, \tilde{W})$ is smaller than 1% for $\epsilon \leq 1\%$.

Fig. 3. Evolution of the recovery of the true $W$ depending on the noise level $\epsilon$, using Alg. 1 ($\tilde{\lambda} = 0.01$, $\delta = 0.1$, maxiter = 100).
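The synthetic data generation described above can be sketched as follows (a Python/NumPy stand-in for the paper's Matlab script; the resampling loop enforces the purity condition):

```python
import numpy as np

def generate_synthetic(n=500, purity=0.8, seed=0):
    """Columns of H ~ Dirichlet(0.1,...,0.1); resample any column whose
    largest entry exceeds `purity`, so no data point is close to a column of W."""
    rng = np.random.default_rng(seed)
    W = np.array([[1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [0, 1, 1, 0],
                  [1, 0, 0, 1]], dtype=float)  # the matrix from (2)
    H = rng.dirichlet([0.1] * 4, size=n).T     # shape (4, n), columns sum to 1
    while True:
        bad = H.max(axis=0) > purity
        if not bad.any():
            break
        H[:, bad] = rng.dirichlet([0.1] * 4, size=int(bad.sum())).T
    return W @ H, W, H
```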
Multispectral image. The San Diego airport is a HYDICE hyperspectral image (HSI) containing 158 clean bands and 400 × 400 pixels for each spectral image; see, e.g., [18]. There are mainly three types of materials: road surfaces, roofs, and vegetation (trees and grass). The image can be well approximated using $r = 8$. Since we are interested in the case $\mathrm{rank}(W) < r$, we select $m = 5$ spectral bands using the successive projection algorithm [19] (this is essentially Gram-Schmidt with column pivoting) applied to $X^T$. This provides bands that are representative: the selected bands are 4, 32, 116, 128, 150. Hence, we are factorizing a 5-by-160000 matrix with $r = 8$. Note that we have removed outlying pixels (some spectra contain large negative entries, while others have a norm an order of magnitude larger than most pixels). Fig. 4 displays the abundance maps extracted (that is, the rows of the matrix $H$): they correspond to meaningful locations of materials. Here we have used $\tilde{\lambda} = 0.1$ and 1000 iterations. From the initial solution provided by SNPA, min-vol NMF is able to reduce the error $\|X - WH\|_F$ by a factor of 11.7, while the term $\log\det(W^TW + \delta I)$ only increases by a factor of 1.06. The final relative error is $\frac{\|X - WH\|_F}{\|X\|_F} = 0.2\%$.
5. CONCLUSION
In this paper, we have shown that min-vol NMF can be used meaningfully for rank-deficient NMFs. We have provided a simple algorithm to tackle this problem and have illustrated the behaviour of the method on synthetic and real-world data sets.

Fig. 4. Abundance maps extracted by min-vol NMF using only five bands of the San Diego airport HSI. From left to right, top to bottom: vegetation (grass and trees), three different types of roof tops, four different types of road surfaces.

This work is only preliminary and many important questions remain open; in particular:
• Under which conditions can we prove the identifiability of min-vol NMF in the rank-deficient case (as done in [2, 3] for the full-rank case)? Intuitively, it seems that a condition similar to the sufficiently scattered condition would be sufficient, but this has to be analysed thoroughly.
• Can we prove the robustness to noise of such techniques? (The question is also open for the full-rank case.)
• Can we design faster and more robust algorithms? And algorithms taking advantage of the fact that the solution is rank-deficient?
6. REFERENCES
[1] Xiao Fu, Kejun Huang, Nicholas D Sidiropoulos, and
Wing-Kin Ma, “Nonnegative matrix factorization for
signal and data analytics: Identifiability, algorithms, and
applications,” IEEE Signal Processing Magazine, 2018,
to appear.
[2] Chia-Hsiang Lin, Wing-Kin Ma, Wei-Chiang Li,
Chong-Yung Chi, and ArulMurugan Ambikapathi,
“Identifiability of the simplex volume minimization criterion for blind hyperspectral unmixing: The no-pure-pixel case,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 10, pp. 5530–5546, 2015.
[3] Xiao Fu, Wing-Kin Ma, Kejun Huang, and Nicholas D
Sidiropoulos, “Blind separation of quasi-stationary sources: Exploiting convex geometry in covariance domain,” IEEE Transactions on Signal Processing, vol. 63, no. 9, pp. 2306–2320, 2015.
[4] Xiao Fu, Kejun Huang, and Nicholas D Sidiropoulos,
“On identifiability of nonnegative matrix factorization,” IEEE Signal Processing Letters, vol. 25, no. 3, pp. 328–332, 2018.
[5] Maurice D Craig, “Minimum-volume transforms for re-
motely sensed data,” IEEE Transactions on Geoscience
and Remote Sensing, vol. 32, no. 3, pp. 542–552, 1994.
[6] Wing-Kin Ma, José M Bioucas-Dias, Tsung-Han Chan, Nicolas Gillis, Paul Gader, Antonio J Plaza, ArulMurugan Ambikapathi, and Chong-Yung Chi, “A signal processing perspective on hyperspectral unmixing: Insights from remote sensing,” IEEE Signal Processing Magazine, vol. 31, no. 1, pp. 67–81, 2014.
[7] Sanjeev Arora, Rong Ge, Ravindran Kannan, and
Ankur Moitra, “Computing a nonnegative matrix
factorization–provably,” in Proceedings of the forty-
fourth annual ACM symposium on Theory of computing.
ACM, 2012, pp. 145–162.
[8] Nicolas Gillis, “Introduction to nonnegative matrix fac-
torization,” SIAG/OPT Views and News, vol. 25, no. 1,
pp. 7–16, 2017.
[9] Stephen A Vavasis, “On the complexity of nonnegative matrix factorization,” SIAM Journal on Optimization, vol. 20, no. 3, pp. 1364–1377, 2010.
[10] Lidan Miao and Hairong Qi, “Endmember extraction from highly mixed data using minimum volume constrained nonnegative matrix factorization,” IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 3, pp. 765–777, 2007.
[11] Xiao Fu, Kejun Huang, Bo Yang, Wing-Kin Ma,
and Nicholas D. Sidiropoulos, “Robust volume
minimization-based matrix factorization for remote
sensing and document clustering,” IEEE Transactions
on Signal Processing, vol. 64, no. 23, pp. 6254–6268,
2016.
[12] Andersen M.S. Ang and Nicolas Gillis, “Volume regularized non-negative matrix factorizations,” in 2018 Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), 2018.
[13] Maryam Fazel, Matrix rank minimization with applica-
tions, Ph.D. thesis, Stanford University, 2002.
[14] Maryam Fazel, Haitham Hindi, and Stephen P Boyd,
“Log-det heuristic for matrix rank minimization with
applications to Hankel and Euclidean distance matri-
ces,” in Proceedings of the 2003 American Control Con-
ference. IEEE, 2003, vol. 3, pp. 2156–2162.
[15] Nicolas Gillis, “Successive nonnegative projection algo-
rithm for robust nonnegative blind source separation,”
SIAM Journal on Imaging Sciences, vol. 7, no. 2, pp.
1420–1450, 2014.
[16] Kazuyoshi Yoshii, Ryota Tomioka, Daichi Mochihashi,
and Masataka Goto, “Beyond NMF: Time-domain au-
dio source separation without phase reconstruction,” in
ISMIR, 2013, pp. 369–374.
[17] Yurii Nesterov, Introductory lectures on convex opti-
mization: A basic course, vol. 87, Springer Science &
Business Media, 2013.
[18] Nicolas Gillis, Da Kuang, and Haesun Park, “Hierarchical clustering of hyperspectral images using rank-two nonnegative matrix factorization,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 4, pp. 2066–2078, 2015.
[19] Nicolas Gillis and Stephen A Vavasis, “Fast and robust recursive algorithms for separable nonnegative matrix factorization,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 4, pp. 698–714, 2014.
... We compare MV-Dual to six state-of-the-art algorithms: \bullet SNPA [16] is based on the separability assumption and presents a robust extension to the successive projection algorithm (SPA) [2,16] by taking advantage of the nonnegativity constraint in the decomposition. \bullet Simplex volume minimization (Min-Vol) fits a simplex with minimum volume to the data points using the following optimization problem [24]: ...
... This problem is optimized based on a block coordinate descent approach using the fast gradient method. The parameter \lambda is chosen as in [24]: \\lambda \| X - W0H0\| 2 F \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ (W \top 0 W0+\delta Ir) where (W 0 , H 0 ) is obtained by SNPA and \\lambda \in \{ 0.1, 1, 5\} where 0.1 is the default value in [24]. \bullet MVES [10] searches for an enclosing simplex with minimum volume and converts the problem into a determinant maximization problem by focusing on the inverse of \W defined in (2.2). \bullet Maximum-volume inscribed ellipsoid (MVIE) [27] inscribes a maximum-volume ellipsoid in the convex hull of the data points to identify the facets of conv(W ). ...
... This problem is optimized based on a block coordinate descent approach using the fast gradient method. The parameter \lambda is chosen as in [24]: \\lambda \| X - W0H0\| 2 F \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ (W \top 0 W0+\delta Ir) where (W 0 , H 0 ) is obtained by SNPA and \\lambda \in \{ 0.1, 1, 5\} where 0.1 is the default value in [24]. \bullet MVES [10] searches for an enclosing simplex with minimum volume and converts the problem into a determinant maximization problem by focusing on the inverse of \W defined in (2.2). \bullet Maximum-volume inscribed ellipsoid (MVIE) [27] inscribes a maximum-volume ellipsoid in the convex hull of the data points to identify the facets of conv(W ). ...
... 4]. This leads to identifiability/uniqueness of NMF, as stated in Theorem 2. In practice, we use logdet(W ⊤ W + δI) (with the addition of a small parameter δ) for numerical stability; see the discussion by Leplat et al. [2019]. ...
... • In scenarios where the factorization rank has been overestimated, min-vol NMF can perform automatic rank detection by setting some of the rank-one factors to zero [Leplat et al., 2019]. ...
... However, applying Theorem 2 to each layer individually is not possible because it would require the W ℓ matrices to have full rank, which is precluded by construction due to the hierarchical structure where W ℓ−1 = W ℓ H ℓ and the assumption r ℓ < r ℓ−1 . Fortunately, empirical observations suggest that min-vol NMF can recover W even when it is rank-deficient, provided that H is sufficiently sparse, as demonstrated by Leplat et al. [2019]. Additionally, the literature includes sparse NMF models, such as those discussed by Abdolali and Gillis [2021], which offer identifiability even in the rank-deficient case. ...
Article
Full-text available
Deep nonnegative matrix factorization (deep NMF) has recently emerged as a valuable technique for extracting multiple layers of features across different scales. However, all existing deep NMF models and algorithms have primarily centered their evaluation on the least squares error, which may not be the most appropriate metric for assessing the quality of approximations on diverse data sets. For instance, when dealing with data types such as audio signals and documents, it is widely acknowledged that ß-divergences offer a more suitable alternative. In this article, we develop new models and algorithms for deep NMF using some ß-divergences, with a focus on the Kullback-Leibler divergence. Subsequently, we apply these techniques to the extraction of facial features, the identification of topics within document collections, and the identification of materials within hyperspectral images.
... First, it means the model prevents over-fitting. Second, compared with existing NMF models such as the minimum-volume NMF [19,5] (see below) which was shown to exhibit [3] rank-finding ability, SON-NMF is applicable to rank-deficient matrix. ...
... Recently it has been observed in [3] that when using volume regularization in the form of log det(W ⊤ W + δI r ), minvol NMF on rank deficient matrix M (i.e., overestimating the r parameter) has the ability to zeroing out extra components in W , H. This has also been observed in audio blind source separation [5], where a rank-7 factorization is used on a dataset with 3 sources, the minvol NMF is able to set the redundant components to zero. ...
... First we use a synthetic data [3] that the data matrix Z = Dataset generation We follows [3]. In the experiment, we use Z as the ground truth W , denoted as W true , we generate the ground truth H, denoted as H true , by sampling from a Dirichlet distribution with distribution parameter α = 0.05 for each element in a column vector. ...
Preprint
Full-text available
When applying nonnegative matrix factorization (NMF), generally the rank parameter is unknown. Such rank in NMF, called the nonnegative rank, is usually estimated heuristically since computing the exact value of it is NP-hard. In this work, we propose an approximation method to estimate such rank while solving NMF on-the-fly. We use sum-of-norm (SON), a group-lasso structure that encourages pairwise similarity, to reduce the rank of a factor matrix where the rank is overestimated at the beginning. On various datasets, SON-NMF is able to reveal the correct nonnegative rank of the data without any prior knowledge nor tuning. SON-NMF is a nonconvx nonsmmoth non-separable non-proximable problem, solving it is nontrivial. First, as rank estimation in NMF is NP-hard, the proposed approach does not enjoy a lower computational complexity. Using a graph-theoretic argument, we prove that the complexity of the SON-NMF is almost irreducible. Second, the per-iteration cost of any algorithm solving SON-NMF is possibly high, which motivated us to propose a first-order BCD algorithm to approximately solve SON-NMF with a low per-iteration cost, in which we do so by the proximal average operator. Lastly, we propose a simple greedy method for post-processing. SON-NMF exhibits favourable features for applications. Beside the ability to automatically estimate the rank from data, SON-NMF can deal with rank-deficient data matrix, can detect weak component with small energy. Furthermore, on the application of hyperspectral imaging, SON-NMF handle the issue of spectral variability naturally.
... Volume minimization based methods find a minimum volume simplex that encloses all pixels in the data; the endmembers are the vertices of the obtained simplex. Methods such as MVES [17] and MinVolNMF [18] enforce the enclosure of pixels as a hard constraint, while other methods such as MVSA [19] and SISAL [6] attempt to account for noise by allowing negative abundance estimates with some penalty term. Geometric approaches based on volume minimization do not require the pure pixel assumption to be satisfied, but other data conditions may be necessary. ...
... For a single patch under our proposed model, this would be equivalent to the patch containing a pixel with no foreground material and a pixel entirely covered with foreground material. A more relaxed condition known as sufficient scattering has also been proposed [39], [40]. 1 It has been shown that under the aforementioned data conditions for the linear mixing model, algorithms such as MinVolNMF [18] can recover the true material signatures. To our knowledge, no works have developed equivalent identifiability conditions for the bilinear mixing model. ...
... . . , K, and f version of MinVolNMF [18], [45], BMMF-LS-NM [32], and SNPALQ [33]. Given a non-negative input matrix Y ∈ R M ×N + , MinVolNMF solves the following problem: ...
Article
Hyperspectral imaging considers the measurement of spectral signatures in near and far field settings. In the far field setting, the interactions of material spectral signatures are typically modeled using linear mixing. In the near field setting, material signatures frequently interact in a nonlinear manner (e.g., intimate mixing). An important task in hyperspectral imaging is to estimate the distribution and spectral signatures of materials present in hyperspectral data, i.e., unmixing. Motivated by forensics, this work considers a specific unmixing task, namely, the problem of foreground material signature extraction in an intimate mixing setting where thin layers of foreground material are deposited on other (background) materials. The unmixing task presents a fundamental challenge of unique (identifiable) recovery of material signatures in this and other settings. We propose a novel model for this intimate mixing setting and explore a framework for the task of foreground material signature extraction with identifiability guarantees under this model. We identify solution criteria and data conditions under which a foreground material signature can be extracted up to scaling and elementwise-inverse variations with theoretical guarantees in a noiseless setting. We present algorithms based on two solution criteria (volume minimization and endpoint member identification) to recover foreground material signatures under these conditions. Numerical experiments on real and synthetic data illustrate the efficacy of the proposed algorithms.
... To enhance the analysis of these mixed spectra, researchers have made various attempts over the past few decades to incorporate additional regularizations into the traditional NMF framework. These include methods such as spatial group sparsity regularized NMF (SGSNMF) proposed by Wang et al. [33], minimum volume rank deficient NMF (Min-vol NMF) introduced by Leplat et al. [34,35], projection-based NMF (PNMF) by Yuan et al. [36], among others. ...
... In this section, we evaluate the accuracy of the endmember extraction and unmixing results obtained by the KMBNMF algorithm. To assess its performance, we compare it with six popular blind hyperspectral unmixing algorithms: manifold regularized sparse NMF (GLNMF) [35], spatial group sparsity-regularized NMF (SGSNMF) [33], Min-vol NMF [34], robust collaborative NMF for HU (R-CoNMF) [49], matrix-vector nonnegative tensor factorization for HU (MVNTF) [50] and the kurtosis constrained NMF (KbsNMF). ...
Article
Full-text available
The Nonnegative Matrix Factorization (NMF) algorithm and its variants have gained widespread popularity across various domains, including neural networks, text clustering, image processing, and signal analysis. In the context of hyperspectral unmixing (HU), an important task involving the accurate extraction of endmembers from mixed spectra, researchers have been actively exploring different regularization techniques within the traditional NMF framework. These techniques aim to improve the precision and reliability of the endmember extraction process in HU. In this study, we propose a novel HU algorithm called KMBNMF, which introduces an average kurtosis regularization term based on endmember spectra to enhance endmember extraction; additionally, it integrates a manifold regularization term into the average kurtosis-constrained NMF by constructing a symmetric weight matrix. The combination of these two regularization techniques not only optimizes the extraction of independent endmembers but also improves the part-based representation capability of hyperspectral data. Experimental results obtained from simulated and real-world hyperspectral datasets demonstrate the competitive performance of the proposed KMBNMF algorithm when compared to state-of-the-art algorithms.
... The NMF decomposition, denoted as M ≈ UV where both U and V are nonnegative, enables the extraction of sparse facial features represented by the columns of U. We perform a rank-100 NMF [23] on the ORL dataset, resulting in a nonnegative sparse matrix U ∈ R^{4096×100}. Similarly, for the YaleB dataset, a rank-81 NMF generates a nonnegative sparse matrix U ∈ R^{1024×81}. ...
Article
Full-text available
Recently, there has been a growing interest in the exploration of Nonlinear Matrix Decomposition (NMD) due to its close ties with neural networks. NMD aims to find a low-rank matrix from a sparse nonnegative matrix with a per-element nonlinear function. A typical choice is the Rectified Linear Unit (ReLU) activation function. To address over-fitting in the existing ReLU-based NMD model (ReLU-NMD), we propose a Tikhonov regularized ReLU-NMD model, referred to as ReLU-NMD-T. Subsequently, we introduce a momentum accelerated algorithm for handling the ReLU-NMD-T model. A distinctive feature, setting our work apart from most existing studies, is the incorporation of both positive and negative momentum parameters in our algorithm. Our numerical experiments on real-world datasets show the effectiveness of the proposed model and algorithm. Moreover, the code is available at https://github.com/nothing2wang/NMD-TM .
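The ReLU-NMD model described in this abstract seeks a low-rank Θ with max(0, Θ) ≈ X. The paper's momentum-accelerated ReLU-NMD-T algorithm is more involved; the sketch below implements only a simple baseline alternating scheme (match X on its support, truncate to rank r by SVD). The function name and synthetic instance are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def naive_relu_nmd(X, r, iters=500, seed=0):
    """Baseline alternating scheme for ReLU-NMD: find a rank-r Theta
    such that max(0, Theta) approximates the sparse nonnegative X."""
    rng = np.random.default_rng(seed)
    Theta = rng.standard_normal(X.shape)
    support = X > 0
    for _ in range(iters):
        # Latent target: match X on its support, stay nonpositive elsewhere.
        Z = np.where(support, X, np.minimum(0.0, Theta))
        # Best rank-r approximation of Z via truncated SVD.
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        Theta = (U[:, :r] * s[:r]) @ Vt[:r]
    return Theta

# Synthetic instance: X = max(0, low-rank) is sparse and nonnegative.
rng = np.random.default_rng(3)
X = np.maximum(0.0, rng.standard_normal((30, 3)) @ rng.standard_normal((3, 40)))
Theta = naive_relu_nmd(X, 3)
rel_err = np.linalg.norm(np.maximum(0.0, Theta) - X) / np.linalg.norm(X)
print(rel_err)
```

On such an exact synthetic instance the relative error of max(0, Θ) against X should become small, though this plain scheme converges more slowly than the accelerated variants the abstract refers to.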
... In order to further illustrate the correctness of the overall source separation of the algorithm, this paper uses Equations (21) and (22) to calculate the correlation coefficient between the source signal and the separated signal to evaluate the method's performance. We compare the proposed method with other NMF algorithms [15], [20]-[22] when used in RFID systems. As can be seen from Fig. 1, compared with other NMF algorithms, the BCA_PM algorithm performs better when used for RFID anti-collision. ...
Article
Radio Frequency Identification (RFID) is one of the key technologies of the Internet of Things. However, during its application, it faces a huge challenge of co-frequency interference cancellation, that is, the tag collision problem. The multi-tag anti-collision problem is modeled as a Blind Source Separation (BSS) problem from the perspective of signal processing at the system's communication transmission layer. In order to reduce the cost of the reader antenna, this paper exploits the boundedness of the tag communication signal and proposes an underdetermined RFID tag anti-collision method based on Bounded Component Analysis (BCA). The algorithm combines the underdetermined tag signal collision model with the BCA mechanism. Verification analysis was conducted using simulation data. The experimental results show that, compared with nonnegative matrix factorization (NMF) algorithms based on minimum-correlation and minimum-volume constraints, the proposed bounded component analysis method performs better at solving the underdetermined collision problem: it greatly improves the elimination of co-channel interference between tag signals, improves the system's bit error rate performance, and reduces the complexity of the underdetermined model system.
... In [14], they showed that the slightly different regularizer ‖W‖_* + (1/2)‖H‖_F² yields better results than (1/2)‖W‖_F² + ‖H‖_F², both with uniform and nonuniform samplings. Going back to our point of interest, it is interesting to observe that the MinVol regularizer provides more adaptability as a (non-convex) relaxation of the rank [15], since logdet(WᵀW + δI) = Σᵢ log(σᵢ²(W) + δ). As can be seen in Fig. 3, logdet(WᵀW + δI) approximates a range of behaviors between the ℓ0 and the ℓ1 norms. ...
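The logdet identity quoted in this excerpt is easy to check numerically. The NumPy sketch below (variable names and the random test matrix are illustrative, not from the cited work) compares logdet(WᵀW + δI) against the sum of log(σᵢ²(W) + δ) over the singular values of W:

```python
import numpy as np

def minvol_logdet(W, delta=1.0):
    """MinVol regularizer: logdet(W^T W + delta * I)."""
    r = W.shape[1]
    _, ld = np.linalg.slogdet(W.T @ W + delta * np.eye(r))
    return ld

rng = np.random.default_rng(0)
W = rng.random((50, 4))

# Identity quoted above: logdet(W^T W + delta*I) = sum_i log(sigma_i(W)^2 + delta)
sigmas = np.linalg.svd(W, compute_uv=False)
via_svd = float(np.sum(np.log(sigmas ** 2 + 1.0)))
print(minvol_logdet(W), via_svd)
```

The two quantities agree because the eigenvalues of WᵀW are exactly the squared singular values of W; this is also why the regularizer interpolates between rank-like and norm-like behavior as δ varies.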
Preprint
Full-text available
Low-rank matrix approximation is a standard, yet powerful, embedding technique that can be used to tackle a broad range of problems, including the recovery of missing data. In this paper, we focus on the performance of nonnegative matrix factorization (NMF) with minimum-volume (MinVol) regularization on the task of nonnegative data imputation. The particular choice of the MinVol regularization is justified by its interesting identifiability property and by its link with the nuclear norm. We show experimentally that MinVol NMF is a relevant model for nonnegative data recovery, especially when the recovery of a unique embedding is desired. Additionally, we introduce a new version of MinVol NMF that exhibits some promising results.
Conference Paper
Full-text available
This work considers two volume regularized non-negative matrix factorization (NMF) problems that decompose a non-negative matrix X into the product of two nonnegative matrices W and H with a regularization on the volume of the convex hull spanned by the columns of W. This regularizer takes two forms: the determinant (det) and logarithm of the determinant (logdet) of the Gramian of W. In this paper, we explore the structure of these problems and present several algorithms, including a new algorithm based on an eigenvalue upper bound of the logdet function. Experimental results on synthetic data show that (i) the new algorithm is competitive with the standard Taylor bound, and (ii) the logdet regularizer works better than the det regularizer. We also illustrate the applicability of the new algorithm on the San Diego airport hyperspectral image.
Article
Full-text available
Nonnegative matrix factorization (NMF) has become a workhorse for signal and data analytics, triggered by its model parsimony and interpretability. Perhaps a bit surprisingly, the understanding of its model identifiability, the major reason behind the interpretability in many applications such as topic mining and hyperspectral imaging, had been rather limited until recent years. Beginning from the 2010s, the identifiability research of NMF has progressed considerably: many interesting and important results have been discovered by the signal processing (SP) and machine learning (ML) communities. NMF identifiability has a great impact on many aspects of practice, such as avoiding ill-posed formulations and designing performance-guaranteed algorithms. On the other hand, there is no tutorial paper that introduces NMF from an identifiability viewpoint. In this paper, we aim at filling this gap by offering a comprehensive and deep tutorial on model identifiability of NMF as well as the connections to algorithms and applications. This tutorial will help researchers and graduate students grasp the essence and insights of NMF, thereby avoiding typical 'pitfalls' that are oftentimes due to unidentifiable NMF formulations. This paper will also help practitioners pick/design suitable factorization tools for their own problems.
Article
Full-text available
In this letter, we propose a new identification criterion that guarantees the recovery of the low-rank latent factors in the nonnegative matrix factorization (NMF) model, under mild conditions. Specifically, using the proposed criterion, it suffices to identify the latent factors if the rows of one factor are sufficiently scattered over the nonnegative orthant, while no structural assumption is imposed on the other factor except being full-rank. This is by far the mildest condition under which the latent factors are provably identifiable from the NMF model.
Article
Full-text available
In this paper, we introduce and provide a short overview of nonnegative matrix factorization (NMF). Several aspects of NMF are discussed, namely, the application in hyperspectral imaging, geometry and uniqueness of NMF solutions, complexity, algorithms, and its link with extended formulations of polyhedra. In order to put NMF into perspective, the more general problem class of constrained low-rank matrix approximation problems is first briefly introduced.
Article
Full-text available
This paper revisits blind source separation of instantaneously mixed quasi-stationary sources (BSS-QSS), motivated by the observation that in certain applications (e.g., speech) there exist time frames during which only one source is active, or locally dominant. Combined with nonnegativity of source powers, this endows the problem with a nice convex geometry that enables elegant and efficient BSS solutions. Local dominance is tantamount to the so-called pure pixel/separability assumption in hyperspectral unmixing/nonnegative matrix factorization, respectively. Building on this link, a very simple algorithm called successive projection algorithm (SPA) is considered for estimating the mixing system in closed form. To complement SPA in the specific BSS-QSS context, an algebraic preprocessing procedure is proposed to suppress short-term source cross-correlation interference. The proposed procedure is simple, effective, and supported by theoretical analysis. Solutions based on volume minimization (VolMin) are also considered. By theoretical analysis, it is shown that VolMin guarantees perfect mixing system identifiability under an assumption more relaxed than (exact) local dominance—which means wider applicability in practice. Exploiting the specific structure of BSS-QSS, a fast VolMin algorithm is proposed for the overdetermined case. Careful simulations using real speech sources showcase the simplicity, efficiency, and accuracy of the proposed algorithms.
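The successive projection algorithm (SPA) considered in this abstract is short enough to sketch in full. The following minimal NumPy implementation (the function name and the synthetic separable instance are illustrative assumptions, not taken from the paper) repeatedly picks the column of largest residual norm and projects it out:

```python
import numpy as np

def spa(X, r):
    """Successive Projection Algorithm: greedily select r columns of X,
    each time projecting the data onto the orthogonal complement
    of the selected column."""
    R = X.astype(float).copy()
    indices = []
    for _ in range(r):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))
        indices.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(u, u @ R)  # zero out the selected direction
    return indices

# Separable (pure-pixel) synthetic data: the first 3 columns of X are pure.
rng = np.random.default_rng(1)
W = rng.random((30, 3))
H = rng.random((3, 20))
H[:, :3] = np.eye(3)
H /= H.sum(axis=0)   # columns of X are convex combinations of W's columns
X = W @ H
print(sorted(spa(X, 3)))
```

Under the separability (local dominance) assumption, the column of maximum norm is a vertex of the convex hull of the data columns, which is why this greedy scheme recovers the pure columns exactly in the noiseless case.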
Article
Full-text available
In blind hyperspectral unmixing (HU), the pure-pixel assumption is well-known to be powerful in enabling simple and effective blind HU solutions. However, the pure-pixel assumption is not always satisfied in an exact sense, especially for scenarios where pixels are all intimately mixed. In the no pure-pixel case, a good blind HU approach to consider is the minimum volume enclosing simplex (MVES). Empirical experience has suggested that MVES algorithms can perform well without pure pixels, although it was not totally clear why this is true from a theoretical viewpoint. This paper aims to address the latter issue. We develop an analysis framework wherein the perfect identifiability of MVES is studied under the noiseless case. We prove that MVES is indeed robust against lack of pure pixels, as long as the pixels do not get too heavily mixed and too asymmetrically spread. Also, our analysis reveals a surprising and counter-intuitive result, namely, that MVES becomes more robust against lack of pure pixels as the number of endmembers increases. The theoretical results are verified by numerical simulations.
Article
Full-text available
Blind hyperspectral unmixing (HU), also known as unsupervised HU, is one of the most prominent research topics in signal processing (SP) for hyperspectral remote sensing [1], [2]. Blind HU aims at identifying materials present in a captured scene, as well as their compositions, by using high spectral resolution of hyperspectral images. It is a blind source separation (BSS) problem from a SP viewpoint. Research on this topic started in the 1990s in geoscience and remote sensing [3]-[7], enabled by technological advances in hyperspectral sensing at the time. In recent years, blind HU has attracted much interest from other fields such as SP, machine learning, and optimization, and the subsequent cross-disciplinary research activities have made blind HU a vibrant topic. The resulting impact is not just on remote sensing - blind HU has provided a unique problem scenario that inspired researchers from different fields to devise novel blind SP methods. In fact, one may say that blind HU has established a new branch of BSS approaches not seen in classical BSS studies. In particular, the convex geometry concepts - discovered by early remote sensing researchers through empirical observations [3]-[7] and refined by later research - are elegant and very different from statistical independence-based BSS approaches established in the SP field. Moreover, the latest research on blind HU is rapidly adopting advanced techniques, such as those in sparse SP and optimization. The present development of blind HU seems to be converging to a point where the lines between remote sensing-originated ideas and advanced SP and optimization concepts are no longer clear, and insights from both sides would be used to establish better methods.
Article
Full-text available
In this paper, we design a hierarchical clustering algorithm for high-resolution hyperspectral images. At the core of the algorithm, a new rank-two nonnegative matrix factorization (NMF) algorithm is used to split the clusters, which is motivated by convex geometry concepts. The method starts with a single cluster containing all pixels, and, at each step, (i) selects a cluster in such a way that the error at the next step is minimized, and (ii) splits the selected cluster into two disjoint clusters using rank-two NMF in such a way that the clusters are well balanced and stable. The proposed method can also be used as an endmember extraction algorithm in the presence of pure pixels. The effectiveness of this approach is illustrated on several synthetic and real-world hyperspectral images, and shown to outperform standard clustering techniques such as k-means, spherical k-means and standard NMF.
Article
In the nonnegative matrix factorization (NMF) problem we are given an n × m nonnegative matrix M and an integer r > 0. Our goal is to express M as AW, where A and W are nonnegative matrices of size n × r and r × m, respectively. In some applications, it makes sense to ask instead for the product AW to approximate M, i.e. (approximately) minimize ‖M − AW‖_F, where ‖·‖_F denotes the Frobenius norm; we refer to this as approximate NMF. This problem has a rich history spanning quantum mechanics, probability theory, data analysis, polyhedral combinatorics, communication complexity, demography, chemometrics, etc. In the past decade NMF has become enormously popular in machine learning, where A and W are computed using a variety of local search heuristics. Vavasis recently proved that this problem is NP-complete. (Without the restriction that A and W be nonnegative, both the exact and approximate problems can be solved optimally via the singular value decomposition.) We initiate a study of when this problem is solvable in polynomial time. Our results are the following: 1. We give a polynomial-time algorithm for exact and approximate NMF for every constant r. Indeed NMF is most interesting in applications precisely when r is small. 2. We complement this with a hardness result: if exact NMF can be solved in time (nm)^{o(r)}, then 3-SAT has a subexponential-time algorithm. This rules out substantial improvements to the above algorithm. 3. We give an algorithm that runs in time polynomial in n, m, and r under the separability condition identified by Donoho and Stodden in 2003. The algorithm may be practical since it is simple and noise tolerant (under benign assumptions). Separability is believed to hold in many practical settings.
To the best of our knowledge, this last result is the first example of a polynomial-time algorithm that provably works under a non-trivial condition on the input and we believe that this will be an interesting and important direction for future work.
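The approximate NMF objective defined in this abstract, minimizing ‖M − AW‖_F over nonnegative A and W, is in practice often attacked with the classical Lee-Seung multiplicative updates, one of the "local search heuristics" the abstract alludes to (the paper's own provable algorithms are different). A minimal sketch, with all names and the synthetic instance being illustrative:

```python
import numpy as np

def nmf_mu(M, r, iters=500, seed=0, eps=1e-9):
    """Approximate NMF via Lee-Seung multiplicative updates:
    locally minimize ||M - A W||_F with A >= 0, W >= 0."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    A = rng.random((m, r))
    W = rng.random((r, n))
    for _ in range(iters):
        # Each update keeps the factors elementwise nonnegative
        # and does not increase the Frobenius objective.
        W *= (A.T @ M) / (A.T @ A @ W + eps)
        A *= (M @ W.T) / (A @ W @ W.T + eps)
    return A, W

# Exact rank-3 nonnegative instance: the residual should become small.
rng = np.random.default_rng(2)
M = rng.random((20, 3)) @ rng.random((3, 30))
A, W = nmf_mu(M, 3)
rel_err = np.linalg.norm(M - A @ W) / np.linalg.norm(M)
print(rel_err)
```

As the abstract notes, such heuristics carry no global guarantee; on this easy exact-rank instance they nonetheless drive the relative residual down reliably.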
Article
This paper considers volume minimization (VolMin)-based structured matrix factorization (SMF). VolMin is a factorization criterion that decomposes a given data matrix into a basis matrix times a structured coefficient matrix via finding the minimum-volume simplex that encloses all the columns of the data matrix. Recent work showed that VolMin guarantees the identifiability of the factor matrices under mild conditions that are realistic in a wide variety of applications. This paper focuses on both theoretical and practical aspects of VolMin. On the theory side, exact equivalence of two independently developed sufficient conditions for VolMin identifiability is proven here, thereby providing a more comprehensive understanding of this aspect of VolMin. On the algorithm side, computational complexity and sensitivity to outliers are two key challenges associated with real-world applications of VolMin. These are addressed here via a new VolMin algorithm that handles volume regularization in a computationally simple way, and automatically detects and iteratively downweights outliers, simultaneously. Simulations and real-data experiments using a remotely sensed hyperspectral image and the Reuters document corpus are employed to showcase the effectiveness of the proposed algorithm.