On Identifiability of Nonnegative Matrix Factorization
Xiao Fu∗, Kejun Huang∗, and Nicholas D. Sidiropoulos
Department of Electrical and Computer Engineering, University of Minnesota,
Minneapolis, MN 55455, United States
Email: (xfu,huang663,nikos)@umn.edu
September 5, 2017
Abstract
In this letter, we propose a new identification criterion that guarantees the recovery of the low-rank latent factors in the nonnegative matrix factorization (NMF) model under mild conditions. Specifically, under the proposed criterion, the latent factors are provably identifiable if the rows of one factor are sufficiently scattered over the nonnegative orthant, while no structural assumption is imposed on the other factor except being full-rank. This is by far the mildest condition under which the latent factors are provably identifiable from the NMF model.
1 Introduction
Nonnegative matrix factorization (NMF) [1, 2] aims to decompose a data matrix into low-rank latent factor matrices with nonnegativity constraints on (one or both of) the latent matrices. In other words, given a data matrix $X \in \mathbb{R}^{M \times N}$ and a target rank $r$, NMF tries to find a factorization model $X = WH^\top$, where $W \in \mathbb{R}^{M \times r}$ and/or $H \in \mathbb{R}^{N \times r}$ take only nonnegative values and $r \le \min\{M, N\}$.
One notable trait of NMF is model identifiability – the latent factors are uniquely identifiable
under some conditions (up to some trivial ambiguities). Identifiability is critical in parameter
estimation and model recovery. In signal processing, many NMF-based approaches have therefore been proposed to handle problems such as blind source separation [3], spectrum sensing [4],
and hyperspectral unmixing [5, 6], where model identifiability plays an essential role. In machine
learning, identifiability of NMF is also considered essential for applications such as latent mixture
model recovery [7], topic mining [8], and social network clustering [9], where model identifiability
is entangled with interpretability of the results.
Despite the importance of identifiability in NMF, the analytical understanding of this aspect is still quite limited, and many existing identifiability conditions for NMF are unsatisfactory in one way or another. Donoho et al. [10], Laurberg et al. [11], and Huang et al. [12] proved different sufficient conditions for identifiability of NMF, but these conditions all require that both of the generative factors $W$ and $H$ exhibit certain sparsity patterns or properties. The machine learning
∗The two authors contributed equally.
and remote sensing communities have proposed several factorization criteria and algorithms that have identifiability guarantees, but these methods rely heavily on the so-called separability condition [8, 13–18]. The separability condition essentially assumes that one of the two latent factors contains a (scaled) permutation matrix as a submatrix, which is clearly restrictive in practice. Recently, Fu et al. [3] and Lin et al. [19] proved that the so-called volume minimization (VolMin) criterion can identify $W$ and $H$ without any assumption on one factor (say, $W$) except being full-rank, when the other ($H$) satisfies a condition that is much milder than separability. However, the caveat is that VolMin also requires that each row of the nonnegative factor sums up to one. This assumption implies a loss of generality, and is not satisfied in many applications.
In this letter, we reveal a new identifiability result for NMF, which is obtained from a delicate tweak of the VolMin identification criterion. Specifically, we 'shift' the sum-to-one constraint on $H$ from its rows to its columns. As a result, we show that this 'constraint-altered VolMin criterion' identifies $W$ and $H$ with provable guarantees under conditions that are much more easily satisfied relative to VolMin. This seemingly slight tweak has a significant consequence: putting sum-to-one constraints on the columns (instead of rows) of $H$ is without loss of generality, since the bilinear model $X = WH^\top$ can always be rewritten as $X = (WD^{-1})(HD)^\top$, where $D$ is a full-rank diagonal matrix satisfying $D_{r,r} = 1/\|H_{:,r}\|_1$. Our new result is the only identifiability condition that does not assume any structure beyond the target rank on $W$ (e.g., zero pattern or nonnegativity) and has natural assumptions on $H$ (relative to the restrictive row sum-to-one assumption in VolMin).
2 Background
To facilitate our discussion, let us formally define identifiability of constrained matrix factorization.
Definition 1 (Identifiability). Consider a data matrix generated from the model $X = W_\natural H_\natural^\top$, where $W_\natural$ and $H_\natural$ are the ground-truth factors. Let $(W_\star, H_\star)$ be an optimal solution of an identification criterion,
$$(W_\star, H_\star) = \arg\min_{X = WH^\top} g(W, H).$$
If $W_\natural$ and/or $H_\natural$ satisfy some condition such that for all $(W_\star, H_\star)$ we have $W_\star = W_\natural \Pi D$ and $H_\star = H_\natural \Pi D^{-1}$, where $\Pi$ is a permutation matrix and $D$ is a full-rank diagonal matrix, then we say that the matrix factorization model is identifiable under that condition. (Whereas identifiability is usually understood as a property of a given model that is independent of the identification criterion, NMF can be identifiable under a suitable identification criterion but not under another, as we will soon see.)
For the 'plain NMF' model [1, 10, 12, 20], the identification criterion $g(W, H)$ is $1$ (or $\infty$) if $W$ or $H$ has a negative element, and $0$ otherwise. Assuming that $X$ can be perfectly factored under the postulated model, the above is equivalent to the popular least-squares NMF formulation:
$$\min_{W \ge 0,\ H \ge 0}\ \left\| X - WH^\top \right\|_F^2. \tag{1}$$
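As a point of reference, formulation (1) is commonly tackled with the classic multiplicative updates of Lee and Seung [1]. The following is a minimal sketch of that standard solver (the function name and defaults are ours, not from this letter):

```python
import numpy as np

def nmf_multiplicative(X, r, n_iter=500, eps=1e-12, seed=0):
    """Multiplicative updates for min ||X - W H^T||_F^2, W >= 0, H >= 0."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    W = rng.random((M, r))
    H = rng.random((N, r))
    for _ in range(n_iter):
        # each update keeps the factors nonnegative and does not
        # increase the least-squares fitting objective
        W *= (X @ H) / (W @ (H.T @ H) + eps)
        H *= (X.T @ W) / (H @ (W.T @ W) + eps)
    return W, H
```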
Several sufficient conditions for identifiability of (1) have been proposed. Early results in [10, 11] require that one factor (say, $H$) satisfies the so-called separability condition:

Definition 2 (Separability). A matrix $H \in \mathbb{R}_+^{N \times r}$ is separable if for every $k = 1, \ldots, r$, there exists a row index $n_k$ such that $H_{n_k,:} = \alpha_k e_k^\top$, where $\alpha_k > 0$ is a scalar and $e_k$ is the $k$th coordinate vector in $\mathbb{R}^r$.
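Definition 2 can be checked directly by scanning the rows of $H$ for scaled coordinate vectors. Below is a minimal sketch of such a check (ours; the tolerance parameter is an assumption for numerical data):

```python
import numpy as np

def is_separable(H, tol=1e-9):
    """Definition 2: for every k there must be a row equal to
    alpha_k * e_k with alpha_k > 0 (up to the tolerance tol)."""
    N, r = H.shape
    found = np.zeros(r, dtype=bool)
    for n in range(N):
        row = H[n]
        k = int(np.argmax(row))
        # a row qualifies if its k-th entry is positive and the rest are ~0
        if row[k] > tol and np.all(np.abs(np.delete(row, k)) <= tol):
            found[k] = True
    return bool(found.all())
```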
With the separability assumption, the works in [10, 11] first revealed the reason behind the success of NMF in many applications: NMF is unique under some conditions. The downside is that separability is easily violated in practice; see the discussion in [5]. In addition, the conditions in [10, 11] also require $W$ to exhibit a certain zero pattern on top of $H$ satisfying separability. This is also considered restrictive in practice; e.g., in hyperspectral unmixing, the $W_{:,r}$'s are spectral signatures, which are always dense. The remote sensing and machine learning communities have come up with many different separability-based identification methods that do not assume zero patterns on $W$, e.g., the volume maximization (VolMax) criterion [8, 13] and self-dictionary sparse regression [8, 15, 16, 21, 22], respectively. However, the separability condition was not relaxed in those works.
The stringent separability condition was considerably relaxed by Huang et al. [12] based on the so-called sufficiently scattered condition, which arises from a geometric interpretation of NMF.
Definition 3 (Sufficiently Scattered). A matrix $H \in \mathbb{R}_+^{N \times r}$ is sufficiently scattered if 1) $\operatorname{cone}\{H^\top\} \supseteq \mathcal{C}$, and 2) $\operatorname{cone}\{H^\top\}^* \cap \operatorname{bd}\,\mathcal{C}^* = \{\lambda e_k \mid \lambda \ge 0,\ k = 1, \ldots, r\}$, where $\mathcal{C} = \{x \mid x^\top \mathbf{1} \ge \sqrt{r-1}\,\|x\|_2\}$, $\mathcal{C}^* = \{x \mid x^\top \mathbf{1} \ge \|x\|_2\}$, $\operatorname{cone}\{H^\top\} = \{x \mid x = H^\top \theta,\ \theta \ge 0\}$ and $\operatorname{cone}\{H^\top\}^* = \{y \mid x^\top y \ge 0,\ \forall x \in \operatorname{cone}\{H^\top\}\}$ are the conic hull of $H^\top$ and its dual cone, respectively, and $\operatorname{bd}$ denotes the boundary of a set.
The main result in [12] is that if both $W$ and $H$ satisfy the sufficiently scattered condition, then the criterion in (1) is identifiable. This is a notable result, since it was the first provable result in which separability was relaxed for both $W$ and $H$. The sufficiently scattered condition essentially means that $\operatorname{cone}\{H^\top\}$ contains $\mathcal{C}$ as a subset, which is much more relaxed than separability, which needs $\operatorname{cone}\{H^\top\}$ to contain the entire nonnegative orthant; see Fig. 1.
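The first requirement of Definition 3 can be probed numerically: sample points on the boundary of $\mathcal{C}$ and test membership in $\operatorname{cone}\{H^\top\}$ via nonnegative least squares. The sketch below is ours (a Monte-Carlo necessary check, not a certificate) and assumes SciPy is available:

```python
import numpy as np
from scipy.optimize import nnls

def sample_bd_C(r, rng):
    """A point on bd C = {x : x^T 1 = sqrt(r-1) ||x||_2}: take x = 1 + u
    with u orthogonal to 1 and ||u||_2 = sqrt(r / (r - 1))."""
    g = rng.standard_normal(r)
    u = g - g.mean()                     # project onto the complement of 1
    u *= np.sqrt(r / (r - 1)) / np.linalg.norm(u)
    return np.ones(r) + u

def contains_C(H, n_samples=500, tol=1e-8, seed=0):
    """Necessary check for cone{H^T} containing C: every sampled boundary
    point of C must be expressible as H^T theta with theta >= 0."""
    rng = np.random.default_rng(seed)
    r = H.shape[1]
    for _ in range(n_samples):
        x = sample_bd_C(r, rng)
        _, resid = nnls(H.T, x)          # min ||H^T theta - x||, theta >= 0
        if resid > tol:
            return False
    return True
```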
On the other hand, the zero-pattern assumptions on $W$ and $H$ are still needed in [12]. Another line of work removed the zero-pattern assumption on one factor (say, $W$) by using a different identification criterion [3, 19]:
$$\begin{aligned} \min_{W \in \mathbb{R}^{M \times r},\, H \in \mathbb{R}^{N \times r}}\ & \det\big(W^\top W\big) && (2\mathrm{a})\\ \text{s.t.}\ & X = WH^\top, && (2\mathrm{b})\\ & H\mathbf{1} = \mathbf{1},\quad H \ge 0, && (2\mathrm{c}) \end{aligned}$$
where $\mathbf{1}$ is an all-one vector of proper length. Criterion (2) aims at finding the minimum-volume (measured by determinant) data-enclosing convex hull (or simplex). The main result in [3] is that if the ground-truth $H \in \{Y \in \mathbb{R}^{N \times r} \mid Y\mathbf{1} = \mathbf{1},\ Y \ge 0\}$ and $H$ is sufficiently scattered, then the volume minimization (VolMin) criterion identifies the ground-truth $W$ and $H$. This very intuitive result is illustrated in Fig. 2: if $H$ is sufficiently scattered in the nonnegative orthant, the $X_{:,n}$'s are sufficiently spread in the convex hull spanned by the columns of $W$, i.e., $\operatorname{conv}\{W_{:,1}, \ldots, W_{:,r}\} = \{x \mid x = W\theta,\ \theta \ge 0,\ \mathbf{1}^\top \theta = 1\}$. Then, finding the minimum-volume data-enclosing convex hull recovers the ground-truth $W$. This result resolved the long-standing Craig conjecture in remote sensing [23], proposed in the 1990s.

The VolMin identifiability condition is intriguing since it completely sets $W$ free: there is no assumption on the ground-truth $W$ except being full column rank, and it imposes a very mild assumption on $H$. There is a caveat, however: the VolMin criterion needs an extra condition on the ground-truth $H$, namely $H\mathbf{1} = \mathbf{1}$, so that the columns of $X$ all live in the convex hull (not the conic hull, as in the general NMF case) spanned by the columns of $W$; otherwise, the geometric intuition of VolMin in Fig. 2 does not make sense. Many NMF problem instances stemming from applications
do not naturally satisfy this assumption. The common trick is to normalize the columns of $X$ by their $\ell_1$-norms [14], so that an equivalent model in which the sum-to-one assumption holds is enforced; but normalization only works when the ground-truth $W$ is also nonnegative. This raises a natural question: can we keep the advantages of VolMin identifiability (namely, no structural assumption on $W$ other than low rank, and no separability requirement on $H$) without assuming sum-to-one on the rows of the ground-truth $H$?
3 Main Result
Our main result in this letter fixes the issues with VolMin identifiability. Specifically, we show that, with a careful and delicate tweak to the VolMin criterion, one can identify the model $X = WH^\top$ without assuming the sum-to-one condition on the rows of $H$:
Theorem 1. Assume that $X = W_\natural H_\natural^\top$, where $W_\natural \in \mathbb{R}^{M \times r}$ and $H_\natural \in \mathbb{R}^{N \times r}$, and that $\operatorname{rank}(X) = \operatorname{rank}(W_\natural) = r$. Also assume that $H_\natural$ is sufficiently scattered. Let $(W_\star, H_\star)$ be an optimal solution of the following identification criterion:
$$\begin{aligned} \min_{W \in \mathbb{R}^{M \times r},\, H \in \mathbb{R}^{N \times r}}\ & \det\big(W^\top W\big) && (3\mathrm{a})\\ \text{s.t.}\ & X = WH^\top, && (3\mathrm{b})\\ & H^\top \mathbf{1} = \mathbf{1},\quad H \ge 0. && (3\mathrm{c}) \end{aligned}$$
Then, $W_\star = W_\natural \Pi D$ and $H_\star = H_\natural \Pi D^{-1}$ must hold, where $\Pi$ and $D$ denote a permutation matrix and a full-rank diagonal matrix, respectively.
At first glance, the identification criterion in (3) looks similar to VolMin in (2). The difference lies between (2c) and (3c): in (3c), we 'shift' the sum-to-one condition to the columns of $H$, rather than enforcing it on the rows of $H$. This simple modification makes a big difference in terms of generality. Enforcing the columns of $H$ to sum to one entails no loss of generality, since bilinear factorization models like $X = WH^\top$ always carry an intrinsic scaling ambiguity in the columns. In other words, one can always assume that the columns of $H$ are scaled by a diagonal matrix and counter-scale the corresponding columns of $W$, which does not affect the factorization model; i.e., $X = (WD^{-1})(HD)^\top$ still holds. Therefore, there is no need for data normalization to enforce this constraint, as opposed to the VolMin case. In fact, the identifiability of (3) holds for $H^\top \mathbf{1} = \rho \mathbf{1}$ for any $\rho > 0$; we use $\rho = 1$ only for notational simplicity.
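The scaling argument is easy to verify numerically. Here is a minimal sketch (variable names are ours) showing that column-normalizing $H$ and counter-scaling $W$ leaves $X$ unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, r = 8, 10, 3
W = rng.standard_normal((M, r))   # W need not be nonnegative
H = rng.random((N, r))            # H >= 0
X = W @ H.T

d = H.sum(axis=0)                 # l1 norms of the columns of H
H_tilde = H / d                   # H D with D_rr = 1 / ||H_{:,r}||_1
W_tilde = W * d                   # W D^{-1}: counter-scaled columns

assert np.allclose(H_tilde.sum(axis=0), 1.0)   # H~^T 1 = 1, as in (3c)
assert np.allclose(W_tilde @ H_tilde.T, X)     # the model is unchanged
```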
We should mention that avoiding normalization is a significant advantage in practice even when $W \ge 0$ holds, especially when there is noise, since normalization may amplify noise. It was also reported in the literature that normalization degrades text mining performance significantly, since it usually worsens the conditioning of the data matrix [24]. In addition, as mentioned, in applications where $W$ naturally contains negative elements (e.g., channel identification in MIMO communications), even normalization cannot enforce the VolMin model.
It is worth noting that the criterion in Theorem 1 has by far the most relaxed identifiability conditions for nonnegative matrix factorization. A detailed comparison of different NMF conditions is given in Table 1, where one can see that Criterion (3) works under the mildest conditions on both $H$ and $W$. Specifically, compared to plain NMF, the new criterion does not assume any structure on $W$; compared to VolMin, it does not need the sum-to-one assumption on the rows of $H$ or nonnegativity of $W$; and it does not need separability, an advantage inherited from VolMin.
Figure 1: Illustration of the separability (left) and sufficiently scattered (right) conditions, assuming that the viewer stands in the nonnegative orthant and faces the origin. The dots are rows of $H$; the triangle is the nonnegative orthant; the circle is $\mathcal{C}$; the shaded region is $\operatorname{cone}\{H^\top\}$. Clearly, separability is a special case of the sufficiently scattered condition.
Table 1: Different assumptions on $W$ and $H$ for identifiability of NMF.

      | Plain [12] | Self-dict [15, 16, 22]       | VolMax [8, 13]               | VolMin [3, 19]                 | Proposed
$W$   | NN, Suff.  | NN, Full-rank (Full-rank)    | NN, Full-rank (Full-rank)    | NN, Full-rank (Full-rank)      | Full-rank
$H$   | NN, Suff.  | NN, Sep. (NN, Sep., row sto) | NN, Sep. (NN, Sep., row sto) | NN, Suff. (NN, Suff., row sto) | NN, Suff.

Note: 'NN' means nonnegativity, 'Sep.' means separability, 'Suff.' denotes the sufficiently scattered condition, and 'sto' denotes sum-to-one. The conditions in parentheses give an alternative set of conditions for the corresponding approach.
In the next section, we present the proof of Theorem 1. We should remark that although shifting the sum-to-one constraint to the columns of $H$ seems like a 'small' modification to VolMin, the result in Theorem 1 was not obvious at all before we proved it: with this modification, the clear geometric intuition of VolMin no longer holds; the objective in (3) no longer corresponds to the volume of a data-enclosing convex hull and has no geometric interpretation anymore. Indeed, our proof for the new criterion is purely algebraic rather than geometric.
4 Proof of Theorem 1
The main insights of the proof evolved from the authors' work on VolMin and its variants [3, 25, 26], with proper modifications to establish Theorem 1. To proceed, let us first introduce the following classic lemma from convex analysis:
Lemma 1 ([27]). If $\mathcal{K}_1$ and $\mathcal{K}_2$ are convex cones and $\mathcal{K}_1 \subseteq \mathcal{K}_2$, then $\mathcal{K}_2^* \subseteq \mathcal{K}_1^*$, where $\mathcal{K}_1^*$ and $\mathcal{K}_2^*$ denote the dual cones of $\mathcal{K}_1$ and $\mathcal{K}_2$, respectively.
Our purpose is to show that the identification criterion in (3) outputs $W_\star$ and $H_\star$ that are column-scaled and permuted versions of the ground-truth $W_\natural$ and $H_\natural$. To this end, let $(\widehat{W} \in \mathbb{R}^{M \times r}, \widehat{H} \in \mathbb{R}^{N \times r})$ denote a feasible solution of Problem (3), i.e.,
$$X = \widehat{W}\widehat{H}^\top, \quad \widehat{H}^\top \mathbf{1} = \mathbf{1}, \quad \widehat{H} \ge 0. \tag{4}$$
Figure 2: The intuition of VolMin. The shaded region is $\operatorname{conv}\{W_{:,1}, \ldots, W_{:,r}\}$; the dots are the $X_{:,n}$'s; the dashed lines are enclosing convex hulls; the bold dashed lines comprise the minimum-volume data-enclosing convex hull.
Note that $X = W_\natural H_\natural^\top$ and that $W_\natural$ has full column rank. In addition, since $H_\natural$ is sufficiently scattered, $\operatorname{rank}(H_\natural) = r$ also holds [26, Lemma 1]. Consequently, there exists an invertible $A \in \mathbb{R}^{r \times r}$ such that
$$\widehat{H} = H_\natural A, \qquad \widehat{W} = W_\natural A^{-\top}. \tag{5}$$
This is because $\widehat{W}$ and $\widehat{H}$ have to have full column rank, so $H_\natural$ and $\widehat{H}$ span the same subspace; otherwise, $\operatorname{rank}(X) = r$ cannot hold. Since (4) holds (and, without loss of generality by the scaling ambiguity, $H_\natural^\top \mathbf{1} = \mathbf{1}$), one can see that
$$\widehat{H}^\top \mathbf{1} = A^\top H_\natural^\top \mathbf{1} = A^\top \mathbf{1} = \mathbf{1}. \tag{6}$$
By (4), we also have $H_\natural A \ge 0$. By the definition of a dual cone, $H_\natural A \ge 0$ means that $a_i \in \operatorname{cone}\{H_\natural^\top\}^*$, where $a_i$ is the $i$th column of $A$, for all $i = 1, \ldots, r$. Because $H_\natural$ is sufficiently scattered, we have $\mathcal{C} \subseteq \operatorname{cone}\{H_\natural^\top\}$, which, together with Lemma 1, leads to $\operatorname{cone}\{H_\natural^\top\}^* \subseteq \mathcal{C}^*$. This further implies that $a_i \in \mathcal{C}^*$, which means $\|a_i\|_2 \le \mathbf{1}^\top a_i$ by the definition of $\mathcal{C}^*$. Then we have the following chain:
$$|\det(A)| \le \prod_{i=1}^{r} \|a_i\|_2 \tag{7a}$$
$$\le \prod_{i=1}^{r} \mathbf{1}^\top a_i \tag{7b}$$
$$= 1, \tag{7c}$$
where (7a) is Hadamard's inequality, (7b) is due to $a_i \in \mathcal{C}^*$, and (7c) follows from (6), which gives $\mathbf{1}^\top a_i = 1$ for all $i$.
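As a sanity check of the chain (7), one can verify numerically that any matrix with nonnegative columns (hence columns in $\mathcal{C}^*$) satisfying $A^\top \mathbf{1} = \mathbf{1}$ has $|\det(A)| \le 1$. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
r = 5
A = np.abs(rng.standard_normal((r, r)))  # nonnegative columns lie in C*
A /= A.sum(axis=0)                       # enforce 1^T a_i = 1, i.e. A^T 1 = 1

# Hadamard: |det A| <= prod ||a_i||_2 <= prod 1^T a_i = 1, as in (7)
col_norms = np.linalg.norm(A, axis=0)
assert abs(np.linalg.det(A)) <= np.prod(col_norms) + 1e-12
assert np.prod(col_norms) <= 1.0 + 1e-12
```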
Now, suppose that equality is attained, i.e., $|\det(A)| = 1$. Then all the inequalities in (7) hold with equality, and specifically equality in (7b) means that the columns of $A$ lie on the boundary of $\mathcal{C}^*$. Recall that $a_i \in \operatorname{cone}\{H_\natural^\top\}^*$; since $H_\natural$ is sufficiently scattered, the second requirement in Definition 3 gives $\operatorname{cone}\{H_\natural^\top\}^* \cap \operatorname{bd}\,\mathcal{C}^* = \{\lambda e_k \mid \lambda \ge 0,\ k = 1, \ldots, r\}$, so the $a_i$'s can only be the $e_k$'s. In other words, $A$ can only be a permutation matrix.
Suppose that an optimal solution $H_\star$ of (3) is not a column permutation of $H_\natural$. Since $W_\natural$ and $H_\natural$ are clearly feasible for (3), we must have $\det(W_\star^\top W_\star) \le \det(W_\natural^\top W_\natural)$. We also know that every feasible solution, including $(W_\star, H_\star)$, satisfies (5); that is, $H_\star = H_\natural A$ and $W_\star = W_\natural A^{-\top}$ hold for a certain invertible $A \in \mathbb{R}^{r \times r}$. Since $H_\natural$ is sufficiently scattered, by (7) and our assumption that $A$ is not a permutation matrix, we have $|\det(A)| < 1$. However, the optimal objective of (3) is then
$$\det(W_\star^\top W_\star) = \det(A^{-1} W_\natural^\top W_\natural A^{-\top}) = \det(A^{-1}) \det(W_\natural^\top W_\natural) \det(A^{-\top}) = |\det(A)|^{-2} \det(W_\natural^\top W_\natural) > \det(W_\natural^\top W_\natural),$$
which contradicts our first assumption that $(W_\star, H_\star)$ is an optimal solution of (3). Therefore, $H_\star$ must be a column permutation of $H_\natural$. Q.E.D.
As a remark, the proof of Theorem 1 follows the same rationale as that of the VolMin identifiability in [3]. The critical change is that we have made use of the relationship between a sufficiently scattered $H$ and the inequality chain (7). This inequality appeared in [25, 26] but was not related to the bilinear matrix factorization criterion in (3), which might be by far the most important application of this inequality. The interesting and surprising point is that, by this simple yet delicate tweak, the identifiability criterion covers a substantially wider range of applications that naturally involve $W$'s that are not nonnegative.
5 Validation and Discussion
The identification criterion in (3) is a nonconvex optimization problem. In particular, the bilinear constraint $X = WH^\top$ is not easy to handle. However, the existing work-arounds for handling VolMin can all be employed to deal with Problem (3). One popular method for VolMin is to first take the (truncated) singular value decomposition (SVD) of the data, $X = U\Sigma V^\top$, where $U \in \mathbb{R}^{M \times r}$, $\Sigma \in \mathbb{R}^{r \times r}$, and $V \in \mathbb{R}^{N \times r}$. Then $V^\top = \widetilde{W}H^\top$ holds, where $\widetilde{W} \in \mathbb{R}^{r \times r}$ is invertible, because $V$ and $H$ span the same range space. One can use (3) to identify $H$ from the data model $\widetilde{X} = V^\top = \widetilde{W}H^\top$. Since $\widetilde{W}$ is square and nonsingular, it has an inverse $Q = \widetilde{W}^{-1}$, and the identification criterion in (3) can be recast as
$$\max_{Q \in \mathbb{R}^{r \times r}} |\det(Q)| \quad \text{s.t.} \quad Q\widetilde{X}\mathbf{1} = \mathbf{1}, \quad Q\widetilde{X} \ge 0.$$
This reformulated problem is much handier from an optimization point of view. To be specific, one can fix all the columns of $Q$ except one, say $q_i$. Then the objective is a linear function of $q_i$, since the cofactor expansion along the $i$th column gives
$$\det(Q) = \sum_{k=1}^{r} (-1)^{i+k} Q_{k,i} \det(\bar{Q}_{k,i}) = p^\top q_i,$$
where $p = [p_1, \ldots, p_r]^\top$, $p_k = (-1)^{i+k} \det(\bar{Q}_{k,i})$ for $k = 1, \ldots, r$, and $\bar{Q}_{k,i}$ is the submatrix of $Q$ with the $k$th row and $i$th column removed. Maximizing $|p^\top q_i|$ subject to linear constraints can be handled by maximizing both $p^\top q_i$ and $-p^\top q_i$ and picking the solution that gives the larger absolute objective. Cyclically updating the columns of $Q$ then results in an alternating optimization (AO) algorithm. Similar SVD- and AO-based solvers were proposed to handle VolMin and its variants in [25, 26, 28], and empirically good results have been observed.
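The key computational fact used above, that $\det(Q)$ is linear in any single column of $Q$, is easy to check numerically. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
r, i = 4, 2
Q = rng.standard_normal((r, r))

# Cofactor expansion along column i: det(Q) = p^T q_i, linear in q_i.
p = np.array([(-1) ** (i + k)
              * np.linalg.det(np.delete(np.delete(Q, k, axis=0), i, axis=1))
              for k in range(r)])
assert np.isclose(p @ Q[:, i], np.linalg.det(Q))
```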
Note that the AO procedure is not the only possible solver here. When the data are very noisy, one can reformulate the problem in (3) as
$$\min_{W,\ H^\top \mathbf{1} = \mathbf{1},\ H \ge 0} \ \left\| X - WH^\top \right\|_F^2 + \lambda \det\big(W^\top W\big),$$
where $\lambda > 0$ balances the determinant term and the data fidelity. Many algorithms for regularized NMF can be employed and modified to handle the above.
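For illustration, the regularized objective above is straightforward to evaluate; a minimal sketch (the function name and signature are ours):

```python
import numpy as np

def regularized_objective(X, W, H, lam):
    """Data fidelity plus lambda-weighted determinant (volume) term."""
    fit = np.linalg.norm(X - W @ H.T, 'fro') ** 2
    vol = np.linalg.det(W.T @ W)
    return fit + lam * vol
```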
An illustrative simulation is shown in Table 2 to showcase the soundness of the theorem. In this simulation, we generate $X = W_\natural H_\natural^\top$ with $r = 5, 10$ and $M = N = 200$. We tested several cases: 1) $W_\natural \ge 0$, $H_\natural \ge 0$, and both $W_\natural$ and $H_\natural$ are sufficiently scattered; 2) $W_\natural \ge 0$, $H_\natural \ge 0$, and $H_\natural$ is sufficiently scattered but $W_\natural$ is completely dense; 3) $W_\natural$ follows the i.i.d. normal distribution, and $H_\natural \ge 0$ is sufficiently scattered. We generate sufficiently scattered factors following [29]; i.e., we draw the elements of a factor from the uniform distribution between zero and one and zero out 35% of its elements at random. This way, the obtained factor is empirically sufficiently scattered with overwhelming probability. We employ the algorithm for fitting-based NMF in [20], the VolMin algorithm in [30], and the algorithm described above for the new criterion, respectively. We measure the performance of the different approaches by the mean-squared error (MSE) of the estimated $\widehat{H}$, defined as
$$\mathrm{MSE} = \min_{\pi \in \Pi} \frac{1}{r} \sum_{k=1}^{r} \left\| \frac{H_{\natural\,:,k}}{\|H_{\natural\,:,k}\|_2} - \frac{\widehat{H}_{:,\pi_k}}{\|\widehat{H}_{:,\pi_k}\|_2} \right\|_2^2,$$
where $\Pi$ is the set of all permutations of $\{1, 2, \ldots, r\}$. The results are obtained by averaging 50 random trials.

Table 2: MSEs of the estimated $\widehat{H}$.

Method              | case 1 (sp. W) | case 2 (den. W) | case 3 (Gauss. W)
Plain ($r = 5$)     | 5.49E-05       | 0.0147          | 0.7468
VolMin ($r = 5$)    | 1.36E-08       | 7.31E-10        | 1.0406
Proposed ($r = 5$)  | 7.32E-18       | 7.78E-18        | 8.44E-18
Plain ($r = 10$)    | 4.82E-04       | 0.0403          | 0.8003
VolMin ($r = 10$)   | 8.64E-09       | 8.66E-09        | 1.2017
Proposed ($r = 10$) | 6.54E-18       | 5.02E-18        | 6.38E-18
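The data generation recipe and the permutation-minimized MSE can be sketched as follows (ours; the best permutation is found with the Hungarian method via SciPy, which is equivalent to the minimum over $\Pi$ in the definition above):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

def suff_scattered_factor(n, r):
    """Uniform(0,1) entries with 35% of them zeroed out at random,
    following the generation recipe described above."""
    F = rng.random((n, r))
    F[rng.random((n, r)) < 0.35] = 0.0
    return F

def mse(H_true, H_est):
    """Permutation-minimized MSE between column-normalized factors."""
    A = H_true / np.linalg.norm(H_true, axis=0)
    B = H_est / np.linalg.norm(H_est, axis=0)
    cost = ((A[:, :, None] - B[:, None, :]) ** 2).sum(axis=0)  # r x r
    rows, cols = linear_sum_assignment(cost)   # optimal column matching
    return cost[rows, cols].mean()

M = N = 200
r = 5
H = suff_scattered_factor(N, r)
W = rng.standard_normal((M, r))   # case 3: i.i.d. Gaussian W
X = W @ H.T                        # noiseless data, as in the simulation
```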
Table 2 matches our theoretical analysis. All the algorithms work very well in case 1, where both $W_\natural$ and $H_\natural$ are sparse (sp.) and sufficiently scattered. In case 2, since $W_\natural$ is nonnegative yet dense (den.), plain NMF fails as expected, but VolMin still works, since normalization can help enforce its model when $W \ge 0$. In case 3, where $W_\natural$ follows the i.i.d. normal distribution, VolMin fails since normalization does not help, while the proposed method still works perfectly.
To conclude, in this letter we discussed the identifiability issues with current NMF approaches. We proposed a new NMF identification criterion that is a simple yet careful tweak of the existing volume minimization criterion. We showed that, by slightly modifying the constraints of VolMin, the identifiability of the proposed criterion holds under the same sufficiently scattered condition as in VolMin, but the modified criterion covers a much wider range of applications, including cases where one factor is not nonnegative. This new criterion offers identifiability for the largest variety of cases among the known results.
References
[1] D. Lee and H. Seung, “Learning the parts of objects by non-negative matrix factorization,”
Nature, vol. 401, no. 6755, pp. 788–791, 1999.
[2] N. Gillis, "The why and how of nonnegative matrix factorization," Regularization, Optimization, Kernels, and Support Vector Machines, vol. 12, p. 257, 2014.
[3] X. Fu, W.-K. Ma, K. Huang, and N. D. Sidiropoulos, “Blind separation of quasi-stationary
sources: Exploiting convex geometry in covariance domain,” IEEE Trans. Signal Process.,
vol. 63, no. 9, pp. 2306–2320, May 2015.
[4] X. Fu, W.-K. Ma, and N. Sidiropoulos, "Power spectra separation via structured matrix factorization," IEEE Trans. Signal Process., vol. 64, no. 17, pp. 4592–4605, 2016.
[5] W.-K. Ma, J. Bioucas-Dias, T.-H. Chan, N. Gillis, P. Gader, A. Plaza, A. Ambikapathi, and
C.-Y. Chi, “A signal processing perspective on hyperspectral unmixing,” IEEE Signal Process.
Mag., vol. 31, no. 1, pp. 67–81, Jan 2014.
[6] X. Fu, K. Huang, B. Yang, W.-K. Ma, and N. Sidiropoulos, “Robust volume-minimization
based matrix factorization for remote sensing and document clustering,” IEEE Trans. Signal
Process., vol. 64, no. 23, pp. 6254–6268, 2016.
[7] A. Anandkumar, Y.-K. Liu, D. J. Hsu, D. P. Foster, and S. M. Kakade, “A spectral algorithm
for latent Dirichlet allocation,” in Advances in Neural Information Processing Systems, 2012,
pp. 917–925.
[8] S. Arora, R. Ge, Y. Halpern, D. Mimno, A. Moitra, D. Sontag, Y. Wu, and M. Zhu, “A
practical algorithm for topic modeling with provable guarantees,” in International Conference
on Machine Learning (ICML), 2013.
[9] X. Mao, P. Sarkar, and D. Chakrabarti, “On mixed memberships and symmetric nonnegative
matrix factorizations,” in International Conference on Machine Learning, 2017, pp. 2324–2333.
[10] D. Donoho and V. Stodden, “When does non-negative matrix factorization give a correct
decomposition into parts?” in NIPS, vol. 16, 2003.
[11] H. Laurberg, M. G. Christensen, M. D. Plumbley, L. K. Hansen, and S. Jensen, “Theorems
on positive data: On the uniqueness of NMF,” Computational Intelligence and Neuroscience,
vol. 2008, 2008.
[12] K. Huang, N. Sidiropoulos, and A. Swami, “Non-negative matrix factorization revisited:
Uniqueness and algorithm for symmetric decomposition,” IEEE Trans. Signal Process., vol. 62,
no. 1, pp. 211–224, 2014.
[13] T.-H. Chan, W.-K. Ma, A. Ambikapathi, and C.-Y. Chi, "A simplex volume maximization framework for hyperspectral endmember extraction," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 11, pp. 4177–4193, Nov. 2011.
[14] N. Gillis and S. Vavasis, “Fast and robust recursive algorithms for separable nonnegative
matrix factorization,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 4, pp. 698–714,
April 2014.
[15] X. Fu, W.-K. Ma, T.-H. Chan, and J. M. Bioucas-Dias, “Self-dictionary sparse regression
for hyperspectral unmixing: Greedy pursuit and pure pixel search are related,” IEEE J. Sel.
Topics Signal Process., vol. 9, no. 6, pp. 1128–1141, Sep. 2015.
[16] B. Recht, C. Re, J. Tropp, and V. Bittorf, "Factoring nonnegative matrices with linear programs," in Advances in Neural Information Processing Systems, 2012, pp. 1214–1222.
[17] E. Elhamifar and R. Vidal, "Sparse subspace clustering: Algorithm, theory, and applications," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 11, pp. 2765–2781, 2013.
[18] E. Esser, M. Moller, S. Osher, G. Sapiro, and J. Xin, "A convex model for nonnegative matrix factorization and dimensionality reduction on physical space," IEEE Trans. Image Process., vol. 21, no. 7, pp. 3239–3252, July 2012.
[19] C.-H. Lin, W.-K. Ma, W.-C. Li, C.-Y. Chi, and A. Ambikapathi, “Identifiability of the simplex
volume minimization criterion for blind hyperspectral unmixing: The no-pure-pixel case,”
IEEE Trans. Geosci. Remote Sens., vol. 53, no. 10, pp. 5530–5546, Oct 2015.
[20] K. Huang and N. Sidiropoulos, “Putting nonnegative matrix factorization to the test: a tutorial
derivation of pertinent Cramer-Rao bounds and performance benchmarking,” IEEE Signal
Process. Mag., vol. 31, no. 3, pp. 76–86, 2014.
[21] X. Fu and W.-K. Ma, "Robustness analysis of structured matrix factorization via self-dictionary mixed-norm optimization," IEEE Signal Process. Lett., vol. 23, no. 1, pp. 60–64, 2016.
[22] N. Gillis, "Robustness analysis of Hottopixx, a linear programming model for factoring nonnegative matrices," SIAM Journal on Matrix Analysis and Applications, vol. 34, no. 3, pp. 1189–1212, 2013.
[23] M. D. Craig, “Minimum-volume transforms for remotely sensed data,” IEEE Trans. Geosci.
Remote Sens., vol. 32, no. 3, pp. 542–552, 1994.
[24] A. Kumar, V. Sindhwani, and P. Kambadur, "Fast conical hull algorithms for near-separable non-negative matrix factorization," in International Conference on Machine Learning (ICML), 2013, pp. 231–239.
[25] K. Huang, N. Sidiropoulos, E. Papalexakis, C. Faloutsos, P. Talukdar, and T. Mitchell, "Principled neuro-functional connectivity discovery," in Proc. SIAM SDM 2015, 2015.
[26] K. Huang, X. Fu, and N. D. Sidiropoulos, "Anchor-free correlated topic modeling: Identifiability and algorithm," in Advances in Neural Information Processing Systems, 2016.
[27] R. Rockafellar, Convex Analysis. Princeton University Press, 1997, vol. 28.
[28] T.-H. Chan, C.-Y. Chi, Y.-M. Huang, and W.-K. Ma, “A convex analysis-based minimum-
volume enclosing simplex algorithm for hyperspectral unmixing,” IEEE Trans. Signal Process.,
vol. 57, no. 11, pp. 4418 –4432, Nov. 2009.
[29] H. Kim and H. Park, "Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method," SIAM Journal on Matrix Analysis and Applications, vol. 30, no. 2, pp. 713–730, 2008.
[30] J. M. Bioucas-Dias, "A variable splitting augmented Lagrangian approach to linear spectral unmixing," in Proc. IEEE WHISPERS'09, 2009, pp. 1–4.