MINIMUM-VOLUME RANK-DEFICIENT NONNEGATIVE MATRIX FACTORIZATIONS
Valentin Leplat, Andersen M.S. Ang, Nicolas Gillis
University of Mons, Rue de Houdain 9, 7000 Mons, Belgium
ABSTRACT
In recent years, nonnegative matrix factorization (NMF) with volume regularization has been shown to be a powerful identifiable model; for example for hyperspectral unmixing, document classification, community detection and hidden Markov models. In this paper, we show that minimum-volume NMF (min-vol NMF) can also be used when the basis matrix is rank deficient, which is a reasonable scenario for some real-world NMF problems (e.g., for unmixing multispectral images). We propose an alternating fast projected gradient method for min-vol NMF and illustrate its use on rank-deficient NMF problems, namely a synthetic data set and a multispectral image.

Index Terms— nonnegative matrix factorization, minimum volume, identifiability, rank deficiency
1. INTRODUCTION
Given a nonnegative matrix $X \in \mathbb{R}^{m \times n}_+$ and a factorization rank $r$, nonnegative matrix factorization (NMF) requires to find two nonnegative matrices $W \in \mathbb{R}^{m \times r}_+$ and $H \in \mathbb{R}^{r \times n}_+$ such that $X \approx WH$. For simplicity, we will use the Frobenius norm, which is arguably the most widely used, to assess the error of an NMF solution, and consider the following optimization problem:

$\min_{W \in \mathbb{R}^{m \times r},\, H \in \mathbb{R}^{r \times n}} \|X - WH\|_F^2 \quad \text{s.t.} \quad W \geq 0 \text{ and } H \geq 0.$
NMF is in most cases ill-posed because the optimal solution is not unique. In order to make the solution of the above problem unique (up to permutation and scaling of the columns of $W$ and rows of $H$), hence making the problem well-posed and the parameters $(W, H)$ of the problem identifiable, a key idea is to look for a solution $W$ with minimum volume; see [1] and the references therein. A possible formulation for minimum-volume NMF (min-vol NMF) is as follows:

$\min_{W \geq 0,\; H(:,j) \in \Delta^r \,\forall j} \; \|X - WH\|_F^2 + \lambda\, \mathrm{vol}(W), \quad (1)$

where $\Delta^r = \{x \in \mathbb{R}^r_+ \mid \sum_i x_i \leq 1\}$, $\lambda$ is a penalty parameter, and $\mathrm{vol}(W)$ is a function that measures the volume of the columns of $W$. Note that $H$ needs to be normalized, otherwise $W$ would go to zero since $WH = (cW)(H/c)$ for any $c > 0$.

(Footnote: The authors acknowledge the support of the European Research Council (ERC starting grant no 679515) and of the Fonds de la Recherche Scientifique - FNRS and the Fonds Wetenschappelijk Onderzoek - Vlaanderen (FWO) under EOS Project no O005318F-RG47.)
In this paper, we will use $\mathrm{vol}(W) = \log\det(W^T W + \delta I)$, where $I$ is the identity matrix of appropriate dimensions. The reason for using such a measure is that $\sqrt{\det(W^T W)}/r!$ is the volume of the convex hull of the columns of $W$ and the origin. Under some appropriate conditions on $X = WH$, this model will provably recover the true underlying $(W, H)$ that generated $X$. These recovery conditions require that the columns of $X$ are sufficiently well spread in the convex hull generated by the columns of $W$ [2, 3, 4]; this is the so-called sufficiently scattered condition. In particular, data points need to be located on the facets of this convex hull, hence $H$ needs to be sufficiently sparse. A few remarks are in order:
• The ideas behind min-vol NMF have been introduced in the hyperspectral image community and date back to the paper [5]; see also the discussions in [6, 1].
• As far as we know, these theoretical results only apply in noiseless conditions, hence robustness to noise of model (1) still needs to be rigorously analyzed (this is a very promising but difficult direction of further research).
• The sufficiently scattered condition is a generalization of the separability condition, which requires $W = X(:,\mathcal{K})$ for some index set $\mathcal{K}$ of size $r$. Separability makes the NMF problem easily solvable, and efficient and robust algorithms exist; see, e.g., [7, 6, 8] and the references therein. Note that although min-vol NMF guarantees identifiability, the corresponding optimization problem (1) is still hard to solve in general, as is the original NMF problem [9].
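As a concrete numerical illustration of the regularized objective in (1) with the logdet volume measure used in this paper, here is a minimal sketch (Python/NumPy standing in for the authors' Matlab code; the matrices are random placeholders):

```python
import numpy as np

def minvol_objective(X, W, H, lam=1.0, delta=0.1):
    """Objective of min-vol NMF: ||X - WH||_F^2 + lam * logdet(W'W + delta*I)."""
    r = W.shape[1]
    fit = np.linalg.norm(X - W @ H, 'fro') ** 2
    # slogdet is numerically safer than log(det(.))
    sign, logabsdet = np.linalg.slogdet(W.T @ W + delta * np.eye(r))
    return fit + lam * logabsdet

rng = np.random.default_rng(0)
W = rng.random((4, 3))
H = rng.random((3, 10))
X = W @ H  # exact factorization, so the fit term vanishes
print(minvol_objective(X, W, H))
```

At an exact factorization only the volume term remains, which is what the penalty parameter $\lambda$ trades off against the fit.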
Another key assumption that is used in min-vol NMF is that the basis matrix $W$ is full rank, that is, $\mathrm{rank}(W) = r$; otherwise $\det(W^T W) = 0$. However, there are situations when the matrix $W$ is not full rank: this happens in particular when $\mathrm{rank}(X) \neq \mathrm{rank}_+(X)$, where $\mathrm{rank}_+(X)$ is the nonnegative rank of $X$, that is, the smallest $r$ such that $X$ has an exact NMF decomposition ($X = WH$). Here is a simple example:

$X = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix} \quad (2)$

for which $\mathrm{rank}(X) = 3 < \mathrm{rank}_+(X) = 4$. The columns of the matrix $X$ are the vertices of a square in a 2-dimensional subspace; see Fig. 2 for an illustration. A practical situation where this could happen is in multispectral imaging. Let us construct the matrix $X$ such that each column $X(:,j) \geq 0$ is the spectral signature of a pixel. Then, under the linear mixing model, each column of $X$ is the nonnegative linear combination of the spectral signatures of the constitutive materials present in the image, referred to as endmembers: we have $X(:,j) = \sum_{k=1}^r W(:,k)\, H(k,j)$, where $W(:,k)$ is the spectral signature of the $k$th endmember, and $H(k,j)$ is the abundance of the $k$th endmember in the $j$th pixel; see [6] for more details. For multispectral images, the number of materials within the scene being imaged can be larger than the number of spectral bands, meaning that $r > m$, hence $\mathrm{rank}(W) \leq m < r$.
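The rank deficiency of the example matrix in (2) is easy to check numerically (a quick sketch; NumPy used in place of Matlab):

```python
import numpy as np

# The 4x4 matrix from Eq. (2): its columns are the vertices of a square
# lying in a 2-dimensional subspace, so the (usual) rank is 3,
# while the nonnegative rank is 4.
X = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 1, 1, 0],
              [1, 0, 0, 1]], dtype=float)

print(np.linalg.matrix_rank(X))  # prints 3
```

Any exact NMF of this matrix therefore needs $r = 4$ columns in $W$, which forces $\mathrm{rank}(W) = 3 < r$.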
In this paper, we focus on the min-vol NMF formulation in the rank-deficient scenario, that is, when $\mathrm{rank}(W) < r$. The main contribution of this paper is threefold: (i) we explain why min-vol NMF (1) can be used meaningfully when the basis matrix $W$ is not full rank; this is, as far as we know, the first time this observation is made in the literature. (ii) We propose an algorithm based on an alternating projected fast gradient method to tackle this problem. (iii) We illustrate our results on a synthetic data set and a multispectral image.
2. MIN-VOL NMF IN THE RANK-DEFICIENT CASE
Let us discuss the min-vol NMF model we consider in this paper, namely,

$\min_{W \geq 0,\; H(:,j) \in \Delta^r \,\forall j} \; \|X - WH\|_F^2 + \lambda \log\det(W^T W + \delta I), \quad (3)$

which has three key ingredients: the choice of the volume regularizer, that is, $\log\det(W^T W + \delta I)$, and the parameters $\delta$ and $\lambda$. They are discussed in the next three paragraphs.
Choice of the volume regularizer. Most functions used to minimize the volume of the columns of $W$ are based on the Gram matrix $W^T W$; in particular, $\det(W^T W)$ and $\log\det(W^T W + \delta I)$ for some $\delta > 0$ are the most widely used measures; see, e.g., [10, 11]. Note that $\det(W^T W) = \prod_{i=1}^r \sigma_i^2(W)$, hence the log term allows to weight down large singular values and has been observed to work better in practice; see, e.g., [12]. When $W$ is rank deficient (that is, $\mathrm{rank}(W) < r$), some singular values of $W$ are equal to zero, hence $\det(W^T W) = 0$. Therefore, the function $\det(W^T W)$ cannot distinguish between different rank-deficient solutions¹. However, we have $\log\det(W^T W + \delta I) = \sum_{i=1}^r \log(\sigma_i^2(W) + \delta)$. Hence, if $W$ has one (or more) singular values equal to zero, this measure still makes sense: among two rank-deficient solutions belonging to the same low-dimensional subspace, minimizing $\log\det(W^T W + \delta I)$ will favor a solution whose convex hull has a smaller volume within that subspace, since decreasing the nonzero singular values of $W^T W + \delta I$ will decrease $\log\det(W^T W + \delta I)$. In mathematical terms, let $W \in \mathbb{R}^{m \times r}$ belong to a $k$-dimensional subspace with $k < r$, so that $W = US$ where $U \in \mathbb{R}^{m \times k}$ is an orthogonal basis of that subspace and $S \in \mathbb{R}^{k \times r}$ contains the coordinates of the columns of $W$ in that subspace. Then $\log\det(W^T W + \delta I) = \sum_{i=1}^k \log(\sigma_i^2(S) + \delta) + (r-k)\log(\delta)$. The min-vol criterion $\log\det(W^T W + \delta I)$ with $\delta > 0$ is therefore meaningful even when $W$ does not have rank $r$.

¹Of course, one could also use the measure $\det(W^T W + \delta I)$ meaningfully in the rank-deficient case. However, it would be numerically more challenging since, for each singular value of $W$ equal to zero, the objective is multiplied by $\delta$, which should be chosen relatively small.
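The two identities above can be verified numerically. The following sketch builds a rank-deficient $W = US$ and checks that $\log\det(W^T W + \delta I)$ equals both $\sum_i \log(\sigma_i^2(W) + \delta)$ and $\sum_{i=1}^k \log(\sigma_i^2(S) + \delta) + (r-k)\log\delta$ (dimensions are arbitrary placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
m, r, k, delta = 6, 4, 2, 0.1

# Rank-deficient W = U S with orthonormal U (m x k) and S (k x r)
U, _ = np.linalg.qr(rng.standard_normal((m, k)))
S = rng.standard_normal((k, r))
W = U @ S

# logdet(W'W + delta*I) ...
lhs = np.linalg.slogdet(W.T @ W + delta * np.eye(r))[1]

# ... equals sum_i log(sigma_i(W)^2 + delta): the r-k zero singular
# values each contribute log(delta)
sigmas = np.linalg.svd(W, compute_uv=False)
rhs = np.sum(np.log(sigmas ** 2 + delta))

# ... and equals sum_{i<=k} log(sigma_i(S)^2 + delta) + (r-k)*log(delta)
sig_S = np.linalg.svd(S, compute_uv=False)
rhs2 = np.sum(np.log(sig_S ** 2 + delta)) + (r - k) * np.log(delta)

print(lhs, rhs, rhs2)  # the three values agree
```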
Choice of δ. The function $\log\det(W^T W + \delta I)$, which is equal to $\sum_{i=1}^r \log(\sigma_i^2(W) + \delta)$, is a non-convex surrogate for the $\ell_0$ norm of the vector of singular values of $W$ (up to constant factors), that is, of $\mathrm{rank}(W)$ [13, 14]. It is sharper than the $\ell_1$ norm of the vector of singular values (that is, the nuclear norm) for $\delta$ sufficiently small; see Fig. 1. Therefore, if one wants to promote rank-deficient solutions, $\delta$ should not be chosen too large, say $\delta \leq 0.1$. Moreover, $\delta$ should not be chosen too small, otherwise $W^T W + \delta I$ might be badly conditioned, which makes the optimization problem harder to solve (see Section 3); also, this could give too much importance to zero singular values, which might not be desirable. Therefore, in practice, we recommend to use a value of $\delta$ between 0.1 and $10^{-3}$. We will use $\delta = 0.1$ in this paper. Note that in previous works, $\delta$ was chosen very small (e.g., $10^{-8}$ in [11]) which, as explained above, is not a desirable choice, at least in the rank-deficient case. Even in the full-rank case, we argue that choosing $\delta$ too small is also not desirable since it promotes rank-deficient solutions.

Fig. 1. Function $\frac{\log(x^2+\delta)-\log(\delta)}{\log(1+\delta)-\log(\delta)}$ for different values of $\delta$, the $\ell_1$ norm ($=|x|$) and the $\ell_0$ norm ($=0$ for $x=0$, $=1$ otherwise).
Choice of λ. The choice of $\delta$ will influence the choice of $\lambda$. In fact, the smaller $\delta$, the larger $|\log\det(\delta I)|$, hence, to balance the two terms in the objective (3), $\lambda$ should be smaller. For the practical implementation, we will initialize $W^{(0)} = X(:,\mathcal{K})$, where $\mathcal{K}$ is computed with the successive nonnegative projection algorithm (SNPA), which can handle the rank-deficient separable NMF problem [15]. Note that SNPA also provides the matrix $H^{(0)}$ so as to minimize $\|X - W^{(0)}H^{(0)}\|_F^2$ while $H^{(0)}(:,j) \in \Delta^r$ for all $j$. Finally, we will choose

$\lambda = \tilde{\lambda}\, \frac{\|X - W^{(0)}H^{(0)}\|_F^2}{|\log\det(W^{(0)T}W^{(0)} + \delta I)|},$

where we recommend to choose $\tilde{\lambda}$ between 1 and $10^{-3}$ depending on the noise level (the noisier the input matrix, the larger $\tilde{\lambda}$ should be).
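The rescaling of the user parameter $\tilde{\lambda}$ described above can be sketched as follows. Note that SNPA is not reimplemented here; a random nonnegative initialization is used as a stand-in for $(W^{(0)}, H^{(0)})$:

```python
import numpy as np

def scale_lambda(X, W0, H0, lam_tilde=0.01, delta=0.1):
    """Rescale lam_tilde to balance the fit and volume terms of (3)."""
    r = W0.shape[1]
    fit = np.linalg.norm(X - W0 @ H0, 'fro') ** 2
    logdet = np.linalg.slogdet(W0.T @ W0 + delta * np.eye(r))[1]
    return lam_tilde * fit / abs(logdet)

rng = np.random.default_rng(0)
X = rng.random((5, 50))
W0 = rng.random((5, 4))          # stand-in for the SNPA columns X(:,K)
H0 = rng.random((4, 50)) / 4.0   # columns roughly inside the simplex
print(scale_lambda(X, W0, H0))
```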
3. ALGORITHM FOR MIN-VOL NMF
Most algorithms for NMF optimize alternatively over $W$ and $H$, and we adopt this strategy in this paper. For the update of $H$, we will use the projected fast gradient method (PFGM) from [15]. Note that, as opposed to previously proposed methods for min-vol NMF, we assume that the sum of the entries of each column of $H$ is smaller than or equal to one, not equal to one, which is more general. For the update of $W$, we use a PFGM applied on a strongly convex upper approximation of the objective function, similarly as done in [11], although in that paper the authors did not consider explicitly the case $W \geq 0$ ($W$ is unconstrained in their model) and did not write down explicitly a PFGM taking advantage of strong convexity. For the sake of completeness, we briefly recall this approach. The following upper bound for the logdet term holds: for any $Q \succ 0$ and $S \succ 0$, we have

$\log\det(Q) \leq g(Q,S) = \log\det(S) + \mathrm{trace}\big(S^{-1}(Q-S)\big) = \mathrm{trace}(S^{-1}Q) + \log\det(S) - r.$

This follows from the concavity of $\log\det(\cdot)$, as $g(Q,S)$ is the first-order Taylor approximation of $\log\det(Q)$ around $S$; it has also been used for example in [16]. This gives $\log\det(W^T W + \delta I) \leq \mathrm{trace}(Y W^T W) + \log\det(Y^{-1}) - r$ for any $W$ and any $Y = (Z^T Z + \delta I)^{-1}$ with $\delta > 0$. Plugging this in the original objective function, and denoting $w_i^T$ the $i$th row of matrix $W$ and $\langle \cdot, \cdot \rangle$ the Frobenius inner product of two matrices, we obtain

$\ell(W) = \|X - WH\|_F^2 + \lambda \log\det(W^T W + \delta I)$
$\quad = \|X\|_F^2 - 2\langle XH^T, W\rangle + \langle W^T W, HH^T\rangle + \lambda \log\det(W^T W + \delta I)$
$\quad \leq \langle W^T W, HH^T + \lambda Y\rangle - 2\langle C, W\rangle + b$
$\quad = 2\sum_{i=1}^m \left(\tfrac{1}{2} w_i^T A w_i - c_i^T w_i\right) + b = \bar{\ell}(W),$

where $Y = (Z^T Z + \delta I)^{-1}$ and $A = HH^T + \lambda Y$ are positive definite for $\delta, \lambda > 0$, $C = XH^T$ with rows $c_i^T$, and $b$ is a constant independent of $W$. Note that $\bar{\ell}(W) = \ell(W)$ for $Z = W$. Minimizing the upper bound $\bar{\ell}(W)$ of $\ell(W)$ requires to solve $m$ independent strongly convex optimization problems with Hessian matrix $A$. Using PFGM on this problem, we obtain a linearly convergent method with rate $\frac{1-\kappa^{-1}}{1+\kappa^{-1}}$, where $\kappa$ is the condition number of $A$ [17]. Note that the subproblem in variable $H$ is not strongly convex when $W$ is rank deficient, in which case PFGM converges sublinearly, in $O(1/k^2)$ where $k$ is the iteration number. In any case, PFGM is an optimal first-order method in both cases [17], that is, no first-order method can have a faster convergence rate. When $W$ is rank deficient, we have $\frac{\lambda}{\delta} \leq L = \lambda_{\max}(A) \leq \|H\|_2^2 + \frac{\lambda}{\delta}$, where $L$ is the largest eigenvalue of $A$. This shows the importance of not choosing $\delta$ too small, since the smaller $\delta$, the larger the condition number of $A$, hence the slower the PFGM. Note that $L$ is the Lipschitz constant of the gradient of the objective function and controls the stepsize, which is equal to $1/L$. Our proposed algorithm is summarized in Alg. 1. We will use 10 inner iterations for the PFGM on $W$ and $H$.
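The logdet majorization underlying the update of $W$ can be checked numerically: $g(Q,S)$ upper-bounds $\log\det(Q)$ for any positive definite $S$, with equality at $S = Q$. A small sketch with random symmetric positive definite placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)
r = 4

def rand_spd(rng, r):
    """Random symmetric positive definite matrix."""
    M = rng.standard_normal((r, r))
    return M @ M.T + 0.1 * np.eye(r)

def g(Q, S):
    # First-order Taylor expansion of logdet around S; an upper bound
    # on logdet(Q) by concavity of logdet
    Sinv = np.linalg.inv(S)
    return np.trace(Sinv @ Q) + np.linalg.slogdet(S)[1] - Q.shape[0]

Q, S = rand_spd(rng, r), rand_spd(rng, r)
logdetQ = np.linalg.slogdet(Q)[1]
assert logdetQ <= g(Q, S) + 1e-9   # majorization
assert abs(logdetQ - g(Q, Q)) < 1e-9  # tight at S = Q
print("logdet(Q) =", logdetQ, "<= g(Q,S) =", g(Q, S))
```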
Algorithm 1 Min-vol NMF using alternating PFGM
Require: Input matrix $X \in \mathbb{R}^{m \times n}_+$, the factorization rank $r$, $\delta > 0$, $\tilde{\lambda} > 0$, number of iterations maxiter.
Ensure: $(W, H)$ is an approximate solution of (3).
1: Initialize $(W, H)$ using SNPA [15].
2: Let $\lambda = \tilde{\lambda}\, \frac{\|X - WH\|_F^2}{|\log\det(W^T W + \delta I)|}$.
3: for $k = 1, 2, \ldots,$ maxiter do
4:   % Update W
5:   Let $A = HH^T + \lambda (W^T W + \delta I)^{-1}$ and $C = XH^T$.
6:   Perform a few steps of PFGM on the problem $\min_{U \geq 0} \frac{1}{2}\langle U^T U, A\rangle - \langle U, C\rangle$, with initialization $U = W$. Set $W$ as the last iterate.
7:   % Update H
8:   Perform a few steps of PFGM on the problem $\min_{H(:,j) \in \Delta^r \,\forall j} \|X - WH\|_F^2$ as in [15].
9: end for
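The structure of Alg. 1 can be sketched compactly as follows. This is a simplified stand-in, not the authors' implementation: plain projected gradient steps replace the fast (accelerated) variant, random nonnegative matrices replace the SNPA initialization, and the simplex-type projection for the columns of $H$ is a standard sort-based routine:

```python
import numpy as np

def project_simplex_le(h):
    """Euclidean projection of a vector onto {h >= 0, sum(h) <= 1}."""
    h = np.maximum(h, 0.0)
    if h.sum() <= 1.0:
        return h
    # sort-based projection onto the unit simplex
    u = np.sort(h)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, h.size + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(h - theta, 0.0)

def minvol_nmf(X, r, lam_tilde=0.01, delta=0.1, maxiter=100, inner=10, seed=0):
    """Simplified sketch of Alg. 1 (plain projected gradient, random init)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r))
    H = rng.random((r, n)) / r  # columns start inside the simplex
    fit0 = np.linalg.norm(X - W @ H, 'fro') ** 2
    lam = lam_tilde * fit0 / abs(
        np.linalg.slogdet(W.T @ W + delta * np.eye(r))[1])
    for _ in range(maxiter):
        # --- update W on the majorizer 0.5*<U'U, A> - <U, C> ---
        Y = np.linalg.inv(W.T @ W + delta * np.eye(r))
        A = H @ H.T + lam * Y
        C = X @ H.T
        L = np.linalg.eigvalsh(A)[-1]  # Lipschitz constant of the gradient
        for _ in range(inner):
            W = np.maximum(W - (W @ A - C) / L, 0.0)
        # --- update H (projected gradient on ||X - WH||_F^2) ---
        G = W.T @ W
        LH = np.linalg.eigvalsh(G)[-1] + 1e-12
        for _ in range(inner):
            H = H - (G @ H - W.T @ X) / LH
            H = np.apply_along_axis(project_simplex_le, 0, H)
    return W, H
```

Each outer iteration recomputes the majorizer at $Z = W$ (line 5 of Alg. 1) and then performs a fixed number of inner projected gradient steps on each factor.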
4. NUMERICAL EXPERIMENTS
We now apply our method on a synthetic and a real-world data set. All tests are performed using Matlab R2015a on a laptop with an Intel Core i7-7500U CPU @ 2.9GHz and 24GB RAM. The code is available from http://bit.ly/minvolNMF.
Synthetic data set. Let us construct the matrix $X \in \mathbb{R}^{4 \times 500}$ as follows: $W$ is taken as the matrix from (2), so that $\mathrm{rank}(W) = 3 < r = 4$, and each column of $H$ is distributed using the Dirichlet distribution of parameter $(0.1, \ldots, 0.1)$. Each column of $H$ with an entry larger than 0.8 is resampled as long as this condition does not hold. This guarantees that no data point is close to a column of $W$ (this is sometimes referred to as the purity index). Fig. 2 illustrates this geometric problem. As observed on Fig. 2, Alg. 1 is able to perfectly recover the true columns of $W$. For this experiment, we use $\tilde{\lambda} = 0.01$.

Fig. 2. Synthetic data set and recovery. (Only the first three entries of each four-dimensional vector are displayed.)

Fig. 3 illustrates the same experiment where noise is added: $X = \max(0, WH + N)$, where $N = \epsilon\,$randn(m,n) in Matlab notation (i.i.d. Gaussian distribution of mean zero and standard deviation $\epsilon$). Note that the average of the entries of $X$ is 0.5 (each column is a linear combination of the columns of $W$, with weights summing to one). Fig. 3 displays, as a function of the noise level $\epsilon$, the average over 20 randomly generated matrices $X$ of the relative error $d(W, \tilde{W}) = \frac{\|W - \tilde{W}\|_F}{\|W\|_F}$, where $\tilde{W}$ is the solution computed by Alg. 1. This illustrates that min-vol NMF is robust against noise since $d(W, \tilde{W})$ is smaller than 1% for $\epsilon \leq 1\%$.
Fig. 3. Evolution of the recovery of the true $W$ depending on the noise level $\epsilon$, with $N = \epsilon\,$randn(m,n), using Alg. 1 ($\tilde{\lambda} = 0.01$, $\delta = 0.1$, maxiter = 100).
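The synthetic-data generation described above can be sketched as follows (Python/NumPy standing in for the authors' Matlab; the resampling loop enforces the purity condition that no entry of $H$ exceeds 0.8):

```python
import numpy as np

rng = np.random.default_rng(0)
W = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 1, 1, 0],
              [1, 0, 0, 1]], dtype=float)  # the rank-3 matrix from Eq. (2)
r, n = 4, 500

# Columns of H ~ Dirichlet(0.1,...,0.1); resample any column whose largest
# entry exceeds 0.8, so that no data point is close to a column of W.
H = rng.dirichlet(0.1 * np.ones(r), size=n).T
while True:
    bad = H.max(axis=0) > 0.8
    if not bad.any():
        break
    H[:, bad] = rng.dirichlet(0.1 * np.ones(r), size=int(bad.sum())).T

X = W @ H  # noiseless data; the noisy case uses max(0, W H + eps * randn)
print(X.shape)  # (4, 500)
```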
Multispectral image. The San Diego airport is a HYDICE hyperspectral image (HSI) containing 158 clean bands, and 400 × 400 pixels for each spectral image; see, e.g., [18]. There are mainly three types of materials: road surfaces, roofs and vegetation (trees and grass). The image can be well approximated using $r = 8$. Since we are interested in the case $\mathrm{rank}(W) < r$, we select $m = 5$ spectral bands using the successive projection algorithm [19] (this is essentially Gram-Schmidt with column pivoting) applied on $X^T$. This provides bands that are representative: the selected bands are 4, 32, 116, 128, 150. Hence, we are factoring a 5-by-160000 matrix using $r = 8$. Note that we have removed outlying pixels (some spectra contain large negative entries while others have a norm an order of magnitude larger than most pixels). Fig. 4 displays the abundance maps extracted (that is, the rows of matrix $H$): they correspond to meaningful locations of materials. Here we have used $\tilde{\lambda} = 0.1$ and 1000 iterations. From the initial solution provided by SNPA, min-vol NMF is able to reduce the error $\|X - WH\|_F$ by a factor of 11.7, while the term $\log\det(W^T W + \delta I)$ only increases by a factor of 1.06. The final relative error is $\frac{\|X - WH\|_F}{\|X\|_F} = 0.2\%$.
5. CONCLUSION
In this paper, we have shown that min-vol NMF can be used meaningfully for rank-deficient NMFs. We have provided a simple algorithm to tackle this problem and have illustrated the behaviour of the method on synthetic and real-world data sets.

Fig. 4. Abundance maps extracted by min-vol NMF using only five bands of the San Diego airport HSI. From left to right, top to bottom: vegetation (grass and trees), three different types of roof tops, four different types of road surfaces.

This work is only preliminary and many important questions remain open; in particular:
• Under which conditions can we prove the identifiability of min-vol NMF in the rank-deficient case (as done in [2, 3] for the full-rank case)? Intuitively, it seems that a condition similar to the sufficiently scattered condition would be sufficient, but this has to be analysed thoroughly.
• Can we prove robustness to noise of such techniques? (The question is also open for the full-rank case.)
• Can we design faster and more robust algorithms? And algorithms taking advantage of the fact that the solution is rank-deficient?
6. REFERENCES
[1] Xiao Fu, Kejun Huang, Nicholas D. Sidiropoulos, and Wing-Kin Ma, "Nonnegative matrix factorization for signal and data analytics: Identifiability, algorithms, and applications," IEEE Signal Processing Magazine, 2018, to appear.
[2] Chia-Hsiang Lin, Wing-Kin Ma, Wei-Chiang Li, Chong-Yung Chi, and ArulMurugan Ambikapathi, "Identifiability of the simplex volume minimization criterion for blind hyperspectral unmixing: The no-pure-pixel case," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 10, pp. 5530–5546, 2015.
[3] Xiao Fu, Wing-Kin Ma, Kejun Huang, and Nicholas D. Sidiropoulos, "Blind separation of quasi-stationary sources: Exploiting convex geometry in covariance domain," IEEE Transactions on Signal Processing, vol. 63, no. 9, pp. 2306–2320, 2015.
[4] Xiao Fu, Kejun Huang, and Nicholas D. Sidiropoulos, "On identifiability of nonnegative matrix factorization," IEEE Signal Processing Letters, vol. 25, no. 3, pp. 328–332, 2018.
[5] Maurice D. Craig, "Minimum-volume transforms for remotely sensed data," IEEE Transactions on Geoscience and Remote Sensing, vol. 32, no. 3, pp. 542–552, 1994.
[6] Wing-Kin Ma, José M. Bioucas-Dias, Tsung-Han Chan, Nicolas Gillis, Paul Gader, Antonio J. Plaza, ArulMurugan Ambikapathi, and Chong-Yung Chi, "A signal processing perspective on hyperspectral unmixing: Insights from remote sensing," IEEE Signal Processing Magazine, vol. 31, no. 1, pp. 67–81, 2014.
[7] Sanjeev Arora, Rong Ge, Ravindran Kannan, and Ankur Moitra, "Computing a nonnegative matrix factorization – provably," in Proceedings of the forty-fourth annual ACM Symposium on Theory of Computing. ACM, 2012, pp. 145–162.
[8] Nicolas Gillis, "Introduction to nonnegative matrix factorization," SIAG/OPT Views and News, vol. 25, no. 1, pp. 7–16, 2017.
[9] Stephen A. Vavasis, "On the complexity of nonnegative matrix factorization," SIAM Journal on Optimization, vol. 20, no. 3, pp. 1364–1377, 2010.
[10] Lidan Miao and Hairong Qi, "Endmember extraction from highly mixed data using minimum volume constrained nonnegative matrix factorization," IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 3, pp. 765–777, 2007.
[11] Xiao Fu, Kejun Huang, Bo Yang, Wing-Kin Ma, and Nicholas D. Sidiropoulos, "Robust volume minimization-based matrix factorization for remote sensing and document clustering," IEEE Transactions on Signal Processing, vol. 64, no. 23, pp. 6254–6268, 2016.
[12] Andersen M.S. Ang and Nicolas Gillis, "Volume regularized non-negative matrix factorizations," in 2018 Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), 2018.
[13] Maryam Fazel, Matrix rank minimization with applications, Ph.D. thesis, Stanford University, 2002.
[14] Maryam Fazel, Haitham Hindi, and Stephen P. Boyd, "Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices," in Proceedings of the 2003 American Control Conference. IEEE, 2003, vol. 3, pp. 2156–2162.
[15] Nicolas Gillis, "Successive nonnegative projection algorithm for robust nonnegative blind source separation," SIAM Journal on Imaging Sciences, vol. 7, no. 2, pp. 1420–1450, 2014.
[16] Kazuyoshi Yoshii, Ryota Tomioka, Daichi Mochihashi, and Masataka Goto, "Beyond NMF: Time-domain audio source separation without phase reconstruction," in ISMIR, 2013, pp. 369–374.
[17] Yurii Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, vol. 87, Springer Science & Business Media, 2013.
[18] Nicolas Gillis, Da Kuang, and Haesun Park, "Hierarchical clustering of hyperspectral images using rank-two nonnegative matrix factorization," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 4, pp. 2066–2078, 2015.
[19] Nicolas Gillis and Stephen A. Vavasis, "Fast and robust recursive algorithms for separable nonnegative matrix factorization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 4, pp. 698–714, 2014.
... 4]. This leads to identifiability/uniqueness of NMF, as stated in Theorem 2. In practice, we use logdet(W ⊤ W + δI) (with the addition of a small parameter δ) for numerical stability; see the discussion in [23]. Theorem 2 ([15,13,24]). ...
... • In scenarios where the factorization rank has been overestimated, min-vol NMF can perform automatic rank detection by effectively setting some of the rank-one factors to zero [23]. ...
... However, applying Theorem 2 to each layer individually is not possible because it would require the W ℓ matrices to have full rank, which is precluded by construction due to the hierarchical structure where W ℓ−1 = W ℓ H ℓ and the assumption r ℓ < r ℓ−1 . Fortunately, empirical observations suggest that min-vol NMF can recover W even when it is rank-deficient, provided that H is sufficiently sparse, as demonstrated in [23]. Additionally, the literature includes sparse NMF models, such as those discussed in [1], which offer identifiability even in the rank-deficient case. ...
... Volume minimization based methods find a minimum volume simplex that encloses all pixels in the data; the endmembers are the vertices of the obtained simplex. Methods such as MVES [16] and MinVolNMF [17] enforce the enclosure of pixels as a hard constraint, while other methods such as MVSA [18] and SISAL [6] attempt to account for noise by allowing negative abundance estimates with some penalty term. Geometric approaches based on volume minimization do not require the pure pixel assumption to be satisfied, but other data conditions may be necessary. ...
... For a single patch under our proposed model, this would be equivalent to the patch containing a pixel with no foreground material and a pixel entirely covered with foreground material. A more relaxed condition known as sufficient scattering has also been proposed [26], [27]. 1 It has been shown that under the aforementioned data conditions for the linear mixing model, algorithms such as MinVolNMF [17] can recover the true material signatures. To our knowledge, no works have developed equivalent identifiability conditions for the bilinear mixing model. ...
... where the δI term for small but sufficiently large δ > 0 ensures that the determinant is positive. To solve this problem, we use the MinVolNMF algorithm provided in [17]. Given a patch following the bag-of-patches model, the Input: ...
Preprint
The problem of foreground material signature extraction in an intimate (nonlinear) mixing setting is considered. It is possible for a foreground material signature to appear in combination with multiple background material signatures. We explore a framework for foreground material signature extraction based on a patch model that accounts for such background variation. We identify data conditions under which a foreground material signature can be extracted up to scaling and elementwise-inverse variations. We present algorithms based on volume minimization and endpoint member identification to recover foreground material signatures under these conditions. Numerical experiments on real and synthetic data illustrate the efficacy of the proposed algorithms.
... Therefore, the blind source separation algorithm cannot be directly used to solve the RFID system collision problem in the under-determined state. The nonnegative matrix factorization (NMF) algorithm in the blind source separation algorithm is essentially a matrix decomposition under specific constraints [39,40]. By setting the constraints, it can complete the source signal estimation, and applying it to the RFID system in the under-determined state can also complete the separation of the tag collision signal in the under-determined state. ...
Article
Full-text available
With radio frequency identification (RFID) becoming a popular wireless technology, more and more relevant applications are emerging. Therefore, anti-collision algorithms, which determine the time to tag identification and the accuracy of identification, have become very important in RFID systems. This paper presents the algorithms of ALOHA for randomness, the binary tree algorithm for determinism, and a hybrid anti-collision algorithm that combines these two algorithms. To compensate for the low throughput of traditional algorithms, RFID anti-collision algorithms based on blind source separation (BSS) are described, as the tag signals of RFID systems conform to the basic assumptions of the independent component analysis (ICA) algorithm. In the determined case, the ICA algorithm-based RFID anti-collision method is described. In the under-determined case, a combination of tag grouping with a blind separation algorithm and constrained non-negative matrix factorization (NMF) is used to separate the multi-tag mixing problem. Since the estimation of tag or frame length is the main step to solve the RFID anti-collision problem, this paper introduces an anti-collision algorithm based on machine learning to estimate the number of tags.
... Therefore, in general, unsupervised CNMF is not regularized enough to perform transcription. While some works focus on further regularization of NMF [18], we instead turn towards semi-supervision. ...
Conference Paper
Full-text available
Automatic Music Transcription, which consists in transforming an audio recording of a musical performance into symbolic format, remains a difficult Music Information Retrieval task. In this work, which focuses on piano transcription, we propose a semi-supervised approach using low-rank matrix factorization techniques, in particular Convolutive Nonnegative Matrix Factorization. In the semi-supervised setting, only a single recording of each individual notes is required. We show on the MAPS dataset that the proposed semi-supervised CNMF method performs better than state-of-the-art low-rank factorization techniques and a little worse than supervised deep learning state-of-the-art methods, while however suffering from generalization issues.
... To solve the minVolNMF problem, a majorization-minimization (MM) framework is usually considered. This consists in minimizing a surrogate function, namely a strongly convex upper approximation of the loss function, see [27] and [28] for the details. The FPGM of Algorithm 4 can then be applied on this surrogate. ...
Article
Full-text available
Deep matrix factorizations (deep MFs) are recent unsupervised data mining techniques inspired by constrained low-rank approximations. They aim to extract complex hierarchies of features within high-dimensional datasets. Most of the loss functions proposed in the literature to evaluate the quality of deep MF models and the underlying optimization frameworks are not consistent because different losses are used at different layers. In this paper, we introduce two meaningful loss functions for deep MF and present a generic framework to solve the corresponding optimization problems. We illustrate the effectiveness of this approach through the integration of various constraints and regularizations, such as sparsity, nonnegativity and minimum-volume. The models are successfully applied on both synthetic and real data, namely for hyperspectral unmixing and extraction of facial features.
... To solve the minVolNMF problem, a majorization-minimization (MM) framework is usually considered. This consists in minimizing a surrogate function, namely a strongly convex upper approximation of the loss function, see [27] and [28] for the details. The FPGM of Algorithm 4 can then be applied on this surrogate. ...
Preprint
Full-text available
Deep matrix factorizations (deep MFs) are recent unsupervised data mining techniques inspired by constrained low-rank approximations. They aim to extract complex hierarchies of features within high-dimensional datasets. Most of the loss functions proposed in the literature to evaluate the quality of deep MF models and the underlying optimization frameworks are not consistent because different losses are used at different layers. In this paper, we introduce two meaningful loss functions for deep MF and present a generic framework to solve the corresponding optimization problems. We illustrate the effectiveness of this approach through the integration of various constraints and regularizations, such as sparsity, nonnegativity and minimum-volume. The models are successfully applied on both synthetic and real data, namely for hyperspectral unmixing and extraction of facial features.
Article
Full-text available
Sensor selection is one of the key factors that dictate the performance of estimating vertical wheel forces in vehicle durability design. To select K most relevant sensors among S candidate ones that best fit the response of one vertical wheel force, it has S!/(K!(S-K)!) possible choices to evaluate, which is not practical unless K or S is small. In order to tackle this issue, this paper proposes a data-driven method based on maximizing the marginal likelihood of the data of the vertical wheel force without knowing the dynamics of vehicle systems. Although the resulting optimization problem is a mixed-integer programming problem, it is relaxed to a convex problem with continuous variables and linear constraints. The proposed sensor selection method is flexible and easy to implement, and no additional hyper-parameters needed to be tuned using cross-validation. The feasibility and effectiveness of the proposed method are verified using experimental data in vehicle durability design. The results show that the proposed method has good performance with different data sizes and model orders, in providing sub-optimal sensor configurations for estimating vertical wheel forces in vehicles.
Article
Full-text available
Scene attribute recognition is to identify attribute labels of one scene image based on scene representation for deeper semantic understanding of scenes. In the past decades, numerous algorithms for scene representation have been proposed by feature engineering or deep convolutional neural network. For models based on only one kind of image feature, it is still difficult to learn the representation of multiple attributes from local image region. For models based on deep learning, despite multi-label can be directly used for learning attributes representation, huge training data are usually necessary to build the multi-label model. In this paper, we investigate the problem by the way of scene representation modeling with multi-feature and non-deep learning. Firstly, we introduce linear mixing model (LMM) for scene image modeling, then present a novel approach, referred to as the mini-batch minimum simplex estimation (MMSE), for attribute-based scene representation learning from highly complex image data. Finally, a two-stage multi-feature fusion method is proposed to further improve the feature representation for scene attribute recognition. The proposed method takes advantage of the fast convergence of nonnegative matrix factorization (NMF) schemes, and at the same time using mini-batch to speed up the computation for large-scale scene dataset. The experimental results based on real image scene demonstrate that the proposed method outperforms several other advanced scene attribute recognition approaches.
Conference Paper
Full-text available
This work considers two volume-regularized nonnegative matrix factorization (NMF) problems that decompose a nonnegative matrix X into the product of two nonnegative matrices W and H, with a regularization on the volume of the convex hull spanned by the columns of W. This regularizer takes two forms: the determinant (det) and the logarithm of the determinant (logdet) of the Gramian of W. In this paper, we explore the structure of these problems and present several algorithms, including a new algorithm based on an eigenvalue upper bound of the logdet function. Experimental results on synthetic data show that (i) the new algorithm is competitive with the standard Taylor bound, and (ii) the logdet regularizer works better than the det regularizer. We also illustrate the applicability of the new algorithm on the San Diego airport hyperspectral image.
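The two volume regularizers compared above can be written in a few lines of NumPy. The `delta` shift in the logdet variant is an assumed regularization choice that keeps the argument positive definite even for rank-deficient W; it is not necessarily the exact form used in the paper:

```python
import numpy as np

def vol_det(W):
    # det of the Gramian W^T W: proportional to the squared volume of the
    # parallelotope spanned by the columns of W
    return np.linalg.det(W.T @ W)

def vol_logdet(W, delta=1.0):
    # logdet variant: log det(W^T W + delta * I); the shift delta * I keeps
    # the matrix positive definite even when W is rank deficient
    r = W.shape[1]
    sign, val = np.linalg.slogdet(W.T @ W + delta * np.eye(r))
    return val
```

Shrinking the columns of W shrinks both measures, which is why either one can serve as a volume penalty on the basis matrix.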
Article
Full-text available
Nonnegative matrix factorization (NMF) has become a workhorse for signal and data analytics, triggered by its model parsimony and interpretability. Perhaps a bit surprisingly, the understanding of its model identifiability (the major reason behind the interpretability in many applications such as topic mining and hyperspectral imaging) had been rather limited until recent years. Beginning in the 2010s, the identifiability research of NMF has progressed considerably: many interesting and important results have been discovered by the signal processing (SP) and machine learning (ML) communities. NMF identifiability has a great impact on many aspects of practice, such as avoiding ill-posed formulations and designing performance-guaranteed algorithms. On the other hand, there has been no tutorial paper that introduces NMF from an identifiability viewpoint. In this paper, we aim at filling this gap by offering a comprehensive and deep tutorial on the model identifiability of NMF as well as its connections to algorithms and applications. This tutorial will help researchers and graduate students grasp the essence and insights of NMF, thereby avoiding typical "pitfalls" that are often due to unidentifiable NMF formulations. This paper will also help practitioners pick or design suitable factorization tools for their own problems.
Article
Full-text available
In this letter, we propose a new identification criterion that guarantees the recovery of the low-rank latent factors in the nonnegative matrix factorization (NMF) model under mild conditions. Specifically, under the proposed criterion, the latent factors are identifiable if the rows of one factor are "sufficiently scattered" over the nonnegative orthant, while no structural assumption is imposed on the other factor except being full rank. This is by far the mildest condition under which the latent factors are provably identifiable from the NMF model.
Article
Full-text available
In this paper, we introduce and provide a short overview of nonnegative matrix factorization (NMF). Several aspects of NMF are discussed, namely, the application in hyperspectral imaging, geometry and uniqueness of NMF solutions, complexity, algorithms, and its link with extended formulations of polyhedra. In order to put NMF into perspective, the more general problem class of constrained low-rank matrix approximation problems is first briefly introduced.
Article
Full-text available
This paper revisits blind source separation of instantaneously mixed quasi-stationary sources (BSS-QSS), motivated by the observation that in certain applications (e.g., speech) there exist time frames during which only one source is active, or locally dominant. Combined with nonnegativity of source powers, this endows the problem with a nice convex geometry that enables elegant and efficient BSS solutions. Local dominance is tantamount to the so-called pure pixel/separability assumption in hyperspectral unmixing/nonnegative matrix factorization, respectively. Building on this link, a very simple algorithm called successive projection algorithm (SPA) is considered for estimating the mixing system in closed form. To complement SPA in the specific BSS-QSS context, an algebraic preprocessing procedure is proposed to suppress short-term source cross-correlation interference. The proposed procedure is simple, effective, and supported by theoretical analysis. Solutions based on volume minimization (VolMin) are also considered. By theoretical analysis, it is shown that VolMin guarantees perfect mixing system identifiability under an assumption more relaxed than (exact) local dominance—which means wider applicability in practice. Exploiting the specific structure of BSS-QSS, a fast VolMin algorithm is proposed for the overdetermined case. Careful simulations using real speech sources showcase the simplicity, efficiency, and accuracy of the proposed algorithms.
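A minimal sketch of SPA in its basic form (greedy column selection with orthogonal projections), as used for pure-pixel/separable NMF; the algebraic preprocessing for BSS-QSS proposed in the paper is not included:

```python
import numpy as np

def spa(X, r):
    # Successive Projection Algorithm sketch: repeatedly pick the column
    # with the largest residual norm, then project all columns onto the
    # orthogonal complement of the selected one.
    R = X.astype(float).copy()
    idx = []
    for _ in range(r):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))
        idx.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R = R - np.outer(u, u @ R)  # deflate the selected direction
    return idx
```

Under the separability (local dominance) assumption, the columns of largest residual norm are vertices of the data's convex hull, so SPA recovers the mixing system's columns in closed form, as the abstract describes.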
Article
Full-text available
In blind hyperspectral unmixing (HU), the pure-pixel assumption is well-known to be powerful in enabling simple and effective blind HU solutions. However, the pure-pixel assumption is not always satisfied in an exact sense, especially for scenarios where pixels are all intimately mixed. In the no pure-pixel case, a good blind HU approach to consider is the minimum volume enclosing simplex (MVES). Empirical experience has suggested that MVES algorithms can perform well without pure pixels, although it was not totally clear why this is true from a theoretical viewpoint. This paper aims to address the latter issue. We develop an analysis framework wherein the perfect identifiability of MVES is studied under the noiseless case. We prove that MVES is indeed robust against lack of pure pixels, as long as the pixels do not get too heavily mixed and too asymmetrically spread. Also, our analysis reveals a surprising and counter-intuitive result, namely, that MVES becomes more robust against lack of pure pixels as the number of endmembers increases. The theoretical results are verified by numerical simulations.
Article
Full-text available
Blind hyperspectral unmixing (HU), also known as unsupervised HU, is one of the most prominent research topics in signal processing (SP) for hyperspectral remote sensing [1], [2]. Blind HU aims at identifying materials present in a captured scene, as well as their compositions, by using high spectral resolution of hyperspectral images. It is a blind source separation (BSS) problem from a SP viewpoint. Research on this topic started in the 1990s in geoscience and remote sensing [3]-[7], enabled by technological advances in hyperspectral sensing at the time. In recent years, blind HU has attracted much interest from other fields such as SP, machine learning, and optimization, and the subsequent cross-disciplinary research activities have made blind HU a vibrant topic. The resulting impact is not just on remote sensing - blind HU has provided a unique problem scenario that inspired researchers from different fields to devise novel blind SP methods. In fact, one may say that blind HU has established a new branch of BSS approaches not seen in classical BSS studies. In particular, the convex geometry concepts - discovered by early remote sensing researchers through empirical observations [3]-[7] and refined by later research - are elegant and very different from statistical independence-based BSS approaches established in the SP field. Moreover, the latest research on blind HU is rapidly adopting advanced techniques, such as those in sparse SP and optimization. The present development of blind HU seems to be converging to a point where the lines between remote sensing-originated ideas and advanced SP and optimization concepts are no longer clear, and insights from both sides would be used to establish better methods.
Article
Full-text available
In this paper, we design a hierarchical clustering algorithm for high-resolution hyperspectral images. At the core of the algorithm, a new rank-two nonnegative matrix factorization (NMF) algorithm is used to split the clusters, motivated by convex geometry concepts. The method starts with a single cluster containing all pixels and, at each step, (i) selects a cluster in such a way that the error at the next step is minimized, and (ii) splits the selected cluster into two disjoint clusters using rank-two NMF in such a way that the clusters are well balanced and stable. The proposed method can also be used as an endmember extraction algorithm in the presence of pure pixels. The effectiveness of this approach is illustrated on several synthetic and real-world hyperspectral images, and shown to outperform standard clustering techniques such as k-means, spherical k-means, and standard NMF.
Article
In the nonnegative matrix factorization (NMF) problem we are given an $n \times m$ nonnegative matrix $M$ and an integer $r > 0$. Our goal is to express $M$ as $AW$, where $A$ and $W$ are nonnegative matrices of size $n \times r$ and $r \times m$, respectively. In some applications, it makes sense to ask instead for the product $AW$ to approximate $M$, i.e., to (approximately) minimize $\lVert M - AW \rVert_F$, where $\lVert \cdot \rVert_F$ denotes the Frobenius norm; we refer to this as approximate NMF. This problem has a rich history spanning quantum mechanics, probability theory, data analysis, polyhedral combinatorics, communication complexity, demography, chemometrics, etc. In the past decade NMF has become enormously popular in machine learning, where $A$ and $W$ are computed using a variety of local search heuristics. Vavasis recently proved that this problem is NP-complete. (Without the restriction that $A$ and $W$ be nonnegative, both the exact and approximate problems can be solved optimally via the singular value decomposition.) We initiate a study of when this problem is solvable in polynomial time. Our results are the following: 1. We give a polynomial-time algorithm for exact and approximate NMF for every constant $r$. Indeed, NMF is most interesting in applications precisely when $r$ is small. 2. We complement this with a hardness result: if exact NMF can be solved in time $(nm)^{o(r)}$, then 3-SAT has a subexponential-time algorithm. This rules out substantial improvements to the above algorithm. 3. We give an algorithm that runs in time polynomial in $n$, $m$, and $r$ under the separability condition identified by Donoho and Stodden in 2003. The algorithm may be practical since it is simple and noise tolerant (under benign assumptions). Separability is believed to hold in many practical settings.
To the best of our knowledge, this last result is the first example of a polynomial-time algorithm that provably works under a non-trivial condition on the input, and we believe that this will be an interesting and important direction for future work.
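The "local search heuristics" mentioned above can be illustrated with the classical Lee-Seung multiplicative updates for approximate NMF, which monotonically decrease ||M - AW||_F; this is a generic sketch, not the polynomial-time algorithms of the paper:

```python
import numpy as np

def nmf_mu(M, r, iters=1000, seed=0, eps=1e-9):
    # Lee-Seung multiplicative updates: minimize ||M - A W||_F over
    # nonnegative A (n x r) and W (r x m), starting from random positives.
    rng = np.random.default_rng(seed)
    n, m = M.shape
    A = rng.random((n, r)) + eps
    W = rng.random((r, m)) + eps
    for _ in range(iters):
        # elementwise multiplicative updates preserve nonnegativity
        W *= (A.T @ M) / (A.T @ A @ W + eps)
        A *= (M @ W.T) / (A @ (W @ W.T) + eps)
    return A, W
```

Like all such heuristics, this only finds a local minimum in general, which is consistent with the hardness results described in the abstract.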
Article
This paper considers volume minimization (VolMin)-based structured matrix factorization (SMF). VolMin is a factorization criterion that decomposes a given data matrix into a basis matrix times a structured coefficient matrix by finding the minimum-volume simplex that encloses all the columns of the data matrix. Recent work showed that VolMin guarantees the identifiability of the factor matrices under mild conditions that are realistic in a wide variety of applications. This paper focuses on both theoretical and practical aspects of VolMin. On the theory side, the exact equivalence of two independently developed sufficient conditions for VolMin identifiability is proven here, thereby providing a more comprehensive understanding of this aspect of VolMin. On the algorithm side, computational complexity and sensitivity to outliers are two key challenges associated with real-world applications of VolMin. These are addressed here via a new VolMin algorithm that handles volume regularization in a computationally simple way and automatically detects and iteratively downweights outliers, simultaneously. Simulations and real-data experiments using a remotely sensed hyperspectral image and the Reuters document corpus are employed to showcase the effectiveness of the proposed algorithm.
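A VolMin-style objective, a data-fitting term plus a volume penalty on the basis, can be evaluated as follows; the logdet form and the `lam`/`delta` values are illustrative choices, not the paper's exact formulation:

```python
import numpy as np

def volmin_objective(X, W, H, lam=0.1, delta=1.0):
    # Fit term plus a logdet volume penalty on the basis W.
    # lam (penalty weight) and delta (positive-definite shift) are
    # assumed hyper-parameters for illustration.
    r = W.shape[1]
    fit = 0.5 * np.linalg.norm(X - W @ H, 'fro') ** 2
    _, logdet = np.linalg.slogdet(W.T @ W + delta * np.eye(r))
    return fit + lam * logdet
```

Among factorizations that fit the data equally well, the penalty favors a basis whose columns span a small-volume simplex, which is the geometric intuition behind VolMin identifiability.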