VOLUME REGULARIZED NON-NEGATIVE MATRIX FACTORIZATIONS
Andersen M.S. Ang, Nicolas Gillis
University of Mons
Department of Mathematics and Operational Research
Rue de Houdain 9, 7000 Mons, Belgium
ABSTRACT
This work considers two volume regularized non-negative matrix factorization (NMF) problems that decompose a non-negative matrix X into the product of two nonnegative matrices W and H with a regularization on the volume of the convex hull spanned by the columns of W. This regularizer takes two forms: the determinant (det) and the logarithm of the determinant (logdet) of the Gramian of W. In this paper, we explore the structure of these problems and present several algorithms, including a new algorithm based on an eigenvalue upper bound of the logdet function. Experimental results on synthetic data show that (i) the new algorithm is competitive with the standard Taylor bound, and (ii) the logdet regularizer works better than the det regularizer. We also illustrate the applicability of the new algorithm on the San Diego airport hyperspectral image.
Index Terms— Non-negative matrix factorization, volume regularizer, determinant, log-determinant, coordinate descent
1. INTRODUCTION
Given a nonnegative matrix X ∈ ℝ^{m×n}_+ and a positive integer r ≤ min{m, n}, non-negative matrix factorization (NMF) aims to approximate X as the product of two nonnegative matrices W ∈ ℝ^{m×r}_+ and H ∈ ℝ^{n×r}_+ so that X ≈ WH^T. In this paper, we focus on volume regularized NMF (VR-NMF) and would like to solve the following regularized optimization problem:

    min_{W ≥ 0, H ≥ 0, H1_r = 1_r}  F(W, H) = f(W, H) + λ g(W),   (1)

where f(W, H) = (1/2) ‖X − WH^T‖²_F is the data fitting term, λ ≥ 0 is the regularization parameter, and g is the volume regularizer. The constraints W ≥ 0 and H ≥ 0 mean that W and H are component-wise nonnegative, where 0 is the matrix of zeros. The constraint H1_r = 1_r, where 1_r is the vector of 1's of length r, means that H is a row-stochastic matrix, that is, the entries in each row of H sum to one. This implies that the columns of X are encapsulated inside the convex hull spanned by the columns of W. Let w_j be the jth column of W and let C(W) be the convex hull spanned by the set {w_j}_{j=1}^r. Figure 1 illustrates the geometry of the VR-NMF problem: the m-dimensional data points (columns of X) are located inside C(W), whose vertices are the w_j's. In this work we focus on two measures for the volume of C(W):

    det:    g₁(W) = (1/2) det(W^T W),              (2)
    logdet: g₂(W) = (1/2) log det(W^T W + δI_r).   (3)

Thanks to the ERC starting grant no 679515.
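For concreteness, the two regularizers (2) and (3) are easy to evaluate numerically. The following numpy sketch (function names are ours, not the paper's) computes g₁ and g₂, using slogdet for numerical stability of the log-determinant:

```python
import numpy as np

def vol_det(W):
    """det regularizer (2): g1(W) = 1/2 * det(W^T W)."""
    return 0.5 * np.linalg.det(W.T @ W)

def vol_logdet(W, delta=1.0):
    """logdet regularizer (3): g2(W) = 1/2 * log det(W^T W + delta * I_r)."""
    r = W.shape[1]
    _, logdet = np.linalg.slogdet(W.T @ W + delta * np.eye(r))
    return 0.5 * logdet  # W^T W + delta*I is positive definite, so sign = +1

rng = np.random.default_rng(0)
W = rng.random((10, 4))          # a random nonnegative 10x4 basis matrix
g1, g2 = vol_det(W), vol_logdet(W)
```

Note that g₂ is bounded below by 0 for δ = 1, since det(W^T W + I_r) ≥ det(I_r) = 1 for any W.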
VR-NMF is asymmetrical with respect to (w.r.t.) W and H, as the constraints on W and H are different and there is no regularization on H. VR-NMF has several applications, in particular to unmix hyperspectral images, where the columns of X are the spectral signatures of the pixels present in the image, the columns of W are the spectral signatures of the endmembers, and the rows of H are the abundances of the endmembers in each pixel; see, e.g., [1] and the references therein.
Contributions and organization. The contributions of this work are: (1) we develop a vector-wise update algorithm for VR-NMF with logdet regularization using a simple upper bound on the logdet function, and (2) we analyze and compare the det and logdet regularizers. The paper is organized as follows: §2 gives the algorithmic framework we consider in this paper, §3 and §4 discuss the two regularizers g₁ and g₂ and the corresponding algorithms, §5 contains the numerical experiments, and §6 concludes and provides directions for future research.
2. BLOCK COORDINATE DESCENT FOR VR-NMF
The optimization problem (1) has two blocks of variables, W and H, and this work adopts the block coordinate descent (BCD) framework with these two blocks; see Algorithm 1. To update H, which is a convex optimization problem, we use the fast gradient method (FGM) described in [2]. The update of W is more difficult as this subproblem is non-convex. This work focuses on the update of W, for which we will use cyclic BCD with a projected gradient update.
Algorithm 1 Algorithmic framework for VR-NMF
Input: X ∈ ℝ^{m×n}_+, r ≥ 3 and λ > 0
Output: W ∈ ℝ^{m×r}_+ and H ∈ ℝ^{n×r}_+
Initialization: W ∈ ℝ^{m×r}_+, H ∈ ℝ^{n×r}_+
1: for k = 1, 2, ... do
2:   Update W
3:   H ← FGM(W, H, X)
4: end for
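The framework of Algorithm 1 can be sketched in Python as follows. The update functions below are placeholders (plain projected-gradient steps on the fitting term only, not the paper's FGM or the volume-regularized updates of §3–4), just to make the skeleton runnable:

```python
import numpy as np

def bcd_vrnmf(X, W, H, update_W, update_H, n_iter=200):
    """Two-block coordinate descent for problem (1): alternately update W
    (the harder, non-convex subproblem) and H (a convex subproblem)."""
    for _ in range(n_iter):
        W = update_W(X, W, H)   # e.g. a Det / Taylor / Eigen step (Sections 3-4)
        H = update_H(X, W, H)   # the paper uses the FGM of [2]; any descent step fits
    return W, H

# Toy placeholder updates: one projected-gradient step each on f(W, H) only.
def pg_W(X, W, H, step=1e-2):
    return np.maximum(W - step * (W @ H.T - X) @ H, 0)    # grad_W f = (WH^T - X)H

def pg_H(X, W, H, step=1e-2):
    return np.maximum(H - step * (W @ H.T - X).T @ W, 0)  # grad_H f = (WH^T - X)^T W

rng = np.random.default_rng(1)
X = rng.random((8, 20))
W0, H0 = rng.random((8, 3)), rng.random((20, 3))
W, H = bcd_vrnmf(X, W0, H0, pg_W, pg_H, n_iter=50)
```

With a small enough step, each block update decreases the fitting term, so the residual shrinks over the outer iterations.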
3. VR-NMF WITH DETERMINANT REGULARIZER
Let us first consider g₁(W) = (1/2) det(W^T W). Focusing on the ith column of W, det(W^T W) can be written as a quadratic function of w_i [3]. In fact, letting W_i ∈ ℝ^{m×(r−1)}_+ be W with the column w_i removed, we have

    det(W^T W) = η_i w_i^T B_i w_i = η_i ‖w_i‖²_{B_i},   (4)

where η_i = det(W_i^T W_i) and B_i = I_m − W_i (W_i^T W_i)^{−1} W_i^T is a projection matrix that projects onto the orthogonal complement of the column space of W_i. As B_i and W_i^T W_i are symmetric positive semidefinite, the right-hand side of (4) is a weighted norm. This means that the determinant regularizer can be interpreted as a re-weighted ℓ₂ norm regularization. This also shows that the problem w.r.t. each column of W is convex. The objective function (1) w.r.t. w_i is

    F(w_i) = (1/2) w_i^T Q_i w_i − ⟨X_i h_i, w_i⟩ + c,  where Q_i = ‖h_i‖²₂ I_m + λ η_i B_i,   (5)

with gradient ∇F(w_i) = Q_i w_i − X_i h_i and Lipschitz constant L_i = ‖Q_i‖_F, so that we can use the update

    w_i ← [ (I_m − Q_i/‖Q_i‖_F) w_i + X_i h_i/‖Q_i‖_F ]_+.   (6)

The update (6) can be used to update W in Algorithm 1, and we will refer to this algorithm as Det; see Algorithm 2.

Algorithm 2 – Det [3]
1: for i = 1 : r
2:   compute Q_i, update w_i as (6)
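One pass of Algorithm 2 can be sketched in numpy as follows. Here we read X_i as the residual X − Σ_{j≠i} w_j h_j^T, which is our assumption about the paper's (not explicitly defined) notation:

```python
import numpy as np

def det_update_W(X, W, H, lam):
    """One cyclic pass of Algorithm 2 (Det): a projected gradient step (6)
    on the convex quadratic (5) for each column w_i.
    Assumption: X_i is the residual X - sum_{j != i} w_j h_j^T."""
    m, r = W.shape
    for i in range(r):
        Wi = np.delete(W, i, axis=1)               # W with column i removed
        Hi = np.delete(H, i, axis=1)
        hi = H[:, i]
        eta = np.linalg.det(Wi.T @ Wi)             # eta_i = det(W_i^T W_i)
        B = np.eye(m) - Wi @ np.linalg.pinv(Wi.T @ Wi) @ Wi.T  # projector B_i
        Q = (hi @ hi) * np.eye(m) + lam * eta * B  # Q_i from (5)
        Xi = X - Wi @ Hi.T                         # residual X_i (our reading)
        L = np.linalg.norm(Q, 'fro')               # Lipschitz constant L_i
        wi = W[:, i] - (Q @ W[:, i] - Xi @ hi) / L # gradient step
        W[:, i] = np.maximum(wi, 0)                # nonnegative projection, eq. (6)
    return W
```

Since each column subproblem is convex and ‖Q_i‖_F upper-bounds the spectral norm of Q_i, every step is a descent step on f + λg₁.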
4. VR-NMF WITH LOG DETERMINANT REGULARIZER
We now consider g₂(W) = (1/2) log det(W^T W + δI_r), where I_r is the identity matrix of order r. The term δI_r (δ is set to a small positive value; here we set δ = 1 for simplicity) acts as a lower bound that prevents log det(W^T W) from going to −∞ as W tends to a rank-deficient matrix. The non-convex function g₂ has a tight convex upper bound (see §3.3 of [1] and the references therein), which comes from the first-order Taylor approximation of the logdet function:

    log det(W^T W + δI_r) ≤ tr(F W^T W) + c,   (7)

where F = (Y^T Y + δI_r)^{−1}, Y ∈ ℝ^{m×r} is a constant matrix, and c = −log det(F) + δ tr(F) − r. Equality holds when Y = W. Hence

    F_T(W) = (1/2) ‖X − WH^T‖²_F + (λ/2) ( tr(F W^T W) + c )

is an upper bound for the objective function of (1) using g₂. The gradient and its Lipschitz constant are given by

    ∇_W F_T(W) = W (H^T H + λF) − XH,  and  L = ‖H^T H + λF‖_F.

Minimizing the upper bound instead of the original objective function using Y = W, we obtain an inexact BCD method that we refer to as Taylor; see Algorithm 3.

Algorithm 3 – Taylor [1]
1: W ← [ W − (1/L) ∇_W F_T(W) ]_+
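The Taylor step can be sketched in numpy as follows, with Y taken as the current iterate W (an illustrative rendering, not the authors' code):

```python
import numpy as np

def taylor_update_W(X, W, H, lam, delta=1.0):
    """One Taylor step (Algorithm 3): a projected gradient step on the convex
    upper bound F_T built at Y = W (the current iterate)."""
    r = W.shape[1]
    F = np.linalg.inv(W.T @ W + delta * np.eye(r))  # F = (Y^T Y + delta I)^(-1)
    grad = W @ (H.T @ H + lam * F) - X @ H          # gradient of F_T w.r.t. W
    L = np.linalg.norm(H.T @ H + lam * F, 'fro')    # Lipschitz constant
    return np.maximum(W - grad / L, 0)              # projected gradient step
```

By the majorization argument (the bound (7) is tight at Y = W), each such step does not increase the true objective f + λg₂.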
As F is a dense matrix, Taylor updates W matrix-wise, which is different from algorithm Det that updates W column-wise. In the following, we obtain a column-wise update based on the logdet regularizer by using a simple upper bound of g₂. Given a matrix A with rank r, let us denote by µ_i, i = 1, 2, ..., r, the eigenvalues of A and by µ(A) the vector containing these eigenvalues. We will assume that they are arranged in descending order |µ₁| ≥ |µ₂| ≥ ... ≥ |µ_r|.

Theorem (logdet-trace inequality). Let A ∈ ℝ^{r×r} be a positive definite (pd) matrix. Then

    log det(A) ≤ ν tr(A) + c,   (8)

where ν = µ_r(Y^T Y + δI_r)^{−1} and c = Σ_i log µ_i(Y^T Y + δI_r) − r.

Proof. Recall the log inequality log x ≤ x − 1 for x > 0. Equality holds when x = 1. Generalizing the inequality to an arbitrary point x₀ > 0, we get log x ≤ x₀^{−1} x + log x₀ − 1. To prove the logdet-trace inequality (8), let x = µ_i(A) and use the facts that det(A) = Π µ_i, that tr(A) = Σ µ_i, and that µ_r is the smallest eigenvalue of A.

Letting A = W^T W + δI_r, Equation (8) becomes

    log det(W^T W + δI_r) ≤ ν tr(W^T W) + c.   (9)

With (9), the function (1) using g = g₂ w.r.t. w_i can be upper bounded using

    F_E(w_i) = (1/2) w_i^T Q_i w_i − ⟨X_i h_i, w_i⟩ + c,  where Q_i = (‖h_i‖²₂ + λν) I_m.   (10)
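Since (8) is an inequality between two scalars, it is easy to check numerically. The sketch below (illustrative names, δ = 1) evaluates both sides at and around a reference point Y:

```python
import numpy as np

# Numerical sanity check of the logdet-trace inequality (8).
rng = np.random.default_rng(4)
r, delta = 4, 1.0
Y = rng.random((10, r))
mu = np.linalg.eigvalsh(Y.T @ Y + delta * np.eye(r))  # eigenvalues, ascending order
nu = 1.0 / mu[0]              # nu = mu_r(Y^T Y + delta*I)^(-1): inverse smallest eigenvalue
c = np.log(mu).sum() - r      # c = sum_i log mu_i(Y^T Y + delta*I) - r

def lhs(W):                   # log det(W^T W + delta*I)
    return np.linalg.slogdet(W.T @ W + delta * np.eye(r))[1]

def rhs(W):                   # nu * tr(W^T W + delta*I) + c, the bound (8)
    return nu * (np.trace(W.T @ W) + delta * r) + c

gap = rhs(Y) - lhs(Y)         # nonnegative: the bound holds (it is not tight in general)
```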
Fig. 1. The geometry of VR-NMF as convex hull fitting. Here (m, n, r, θ) = (10, 1500, 4, 0.8). For visualization, the data points were projected onto a 2-dimensional space using PCA.
We update W by taking a gradient step, that is, we use (6) with Q_i defined in (10), and we refer to this algorithm as Eigen; see Algorithm 4.

Algorithm 4 – Eigen
1: for i = 1 : r
2:   compute Q_i as in (10), update w_i as (6)
Comparing (9) with (7) reveals that ν tr(W^T W) is an approximation of tr(F W^T W), so that (9) is an approximation of (7). The advantage of (9) over (7) is its separable structure, which allows the logdet regularizer to have a column-wise update, as for Det.
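Because Q_i in (10) is a multiple of the identity, the Eigen step needs no m×m matrices at all. The following numpy sketch assumes, as before, that X_i denotes the residual X − Σ_{j≠i} w_j h_j^T (our reading of the notation):

```python
import numpy as np

def eigen_update_W(X, W, H, lam, delta=1.0):
    """One cyclic pass of Algorithm 4 (Eigen): Q_i = (||h_i||^2 + lam*nu) I_m
    from (10), so the step (6) reduces to scalar operations per column.
    Assumption: X_i is the residual X - sum_{j != i} w_j h_j^T."""
    m, r = W.shape
    mu = np.linalg.eigvalsh(W.T @ W + delta * np.eye(r))
    nu = 1.0 / mu[0]                       # nu from the logdet-trace bound, Y = W
    for i in range(r):
        Wi = np.delete(W, i, axis=1)
        Hi = np.delete(H, i, axis=1)
        hi = H[:, i]
        Xi = X - Wi @ Hi.T                 # residual X_i
        q = hi @ hi + lam * nu             # Q_i = q * I_m
        L = q * np.sqrt(m)                 # ||Q_i||_F
        wi = W[:, i] - (q * W[:, i] - Xi @ hi) / L  # gradient step, eq. (6)
        W[:, i] = np.maximum(wi, 0)
    return W
```

Each column step decreases the separable surrogate F_E (with ν fixed at the value computed from the current W), since the step size 1/‖Q_i‖_F is no larger than the inverse spectral norm of Q_i.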
5. NUMERICAL EXPERIMENTS
We first conduct experiments on synthetic data to compare the performance of the three algorithms, Det, Taylor and Eigen, and thereby the regularizers g₁(W) and g₂(W). Then, we apply Eigen on the real San Diego airport hyperspectral image.
5.1. Synthetic data sets
Given integers (m, n, r), each entry of the ground truth matrix W₀ ∈ ℝ^{m×r}_+ is generated using the uniform distribution on [0, 1]. The matrix H₀ ∈ ℝ^{n×r}_+ is generated in the form H₀ = Π [I_r; H₀′], where H₀′ ∈ ℝ^{(n−r)×r}_+ is a row-stochastic matrix randomly generated using the Dirichlet distribution (with parameters equal to 1), and the permutation matrix Π shuffles the order of the rows of H₀. The clean nonnegative matrix X₀ = W₀H₀^T is then corrupted by additive white Gaussian noise N ∈ ℝ^{m×n} (scaled to fit a prescribed signal-to-noise ratio) to form X as X = X₀ + N. Note that, in the generation process, the rows of H₀′ with an element larger than a threshold θ ≥ 0 are removed and resampled, so that all the points in X are away from the generating vertices W₀; see Figure 1 for an illustration.
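The generation procedure above can be sketched as follows. Applying the threshold-resampling only to the Dirichlet block H₀′ (and not to the I_r block) is our reading of the text:

```python
import numpy as np

def gen_synthetic(m, n, r, theta, snr_db, seed=0):
    """Synthetic VR-NMF data (Sec. 5.1, our reading): W0 is uniform in [0,1],
    H0 stacks I_r on an (n-r) x r Dirichlet block whose rows are resampled
    until no entry exceeds theta, the rows are shuffled by a permutation Pi,
    and X = W0 H0^T + N with N scaled to the target SNR."""
    rng = np.random.default_rng(seed)
    W0 = rng.random((m, r))
    rows = []
    while len(rows) < n - r:
        h = rng.dirichlet(np.ones(r))
        if h.max() <= theta:             # keep points away from the vertices W0
            rows.append(h)
    H0 = np.vstack([np.eye(r), np.array(rows)])
    H0 = H0[rng.permutation(n)]          # the permutation Pi
    X0 = W0 @ H0.T
    N = rng.standard_normal(X0.shape)
    N *= np.linalg.norm(X0) / (np.linalg.norm(N) * 10 ** (snr_db / 20))
    return W0, H0, X0 + N

W0, H0, X = gen_synthetic(10, 200, 4, 0.8, 100)   # SNR = 100 dB: essentially noiseless
```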
We set λ as the constant 5 f(W⁽⁰⁾, H⁽⁰⁾)/g(W⁽⁰⁾), where W⁽⁰⁾ is the initialization. To initialize the variables, W is generated using the successive nonnegative projection algorithm (SNPA) from [2], and H is generated using the FGM from [2].

Fig. 2. Error curves of Taylor and Eigen on fitting X and W₀. The maximum number of iterations is 200 (recall that Eigen has r inner iterations to update each column of W). Both curves start with the same initialization and move towards the bottom left corner. Eigen makes larger progress in every outer iteration than Taylor. The unit of the axes is percentage. The final values of Taylor and Eigen are (0.84, 4.82) and (0.01, 2.03), respectively.
5.2. An illustrative example
We first compare the use of inequalities (9) and (7) in minimizing (1) with g₂. Here (m, n, r, θ) = (20, 1000, 8, 0.8) and SNR = 100dB (essentially noiseless). Figure 2 shows the relative errors of data fitting (‖X − WH^T‖_F/‖X‖_F) and vertex fitting (‖W₀ − Ŵ‖_F/‖W₀‖_F, where Ŵ is the matrix W scaled and matched to W₀ using the Hungarian algorithm). Note that Taylor uses a matrix-wise update and Eigen uses a column-wise update. Hence, to make a fair comparison, we plot the errors per outer iteration (every r inner iterations for Eigen).

Figures 2 and 3 show that, in this example, algorithm Eigen performs better, as it achieves lower fitting errors on both X and W₀.
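The two error measures can be computed as below. The per-column least-squares rescaling is our interpretation of "scaled and matched"; the matching itself uses scipy's Hungarian-algorithm implementation, linear_sum_assignment:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def vertex_error(W0, W):
    """Relative vertex-fitting error ||W0 - W_hat||_F / ||W0||_F, where the
    columns of W are matched to those of W0 by the Hungarian algorithm and
    rescaled by a per-column least-squares factor (our reading)."""
    r = W0.shape[1]
    scale = np.zeros((r, r))
    cost = np.zeros((r, r))
    for i in range(r):
        for j in range(r):
            a = (W0[:, i] @ W[:, j]) / max(W[:, j] @ W[:, j], 1e-12)
            scale[i, j] = a                                   # best scaling of w_j onto w0_i
            cost[i, j] = np.linalg.norm(W0[:, i] - a * W[:, j])
    rows, cols = linear_sum_assignment(cost)                  # optimal column matching
    What = np.zeros_like(W0)
    for i, j in zip(rows, cols):
        What[:, i] = scale[i, j] * W[:, j]
    return np.linalg.norm(W0 - What) / np.linalg.norm(W0)
```

The data-fitting error is simply `np.linalg.norm(X - W @ H.T) / np.linalg.norm(X)`.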
5.3. Comparing Det, Taylor and Eigen
Table 1 gives the statistics of the relative errors in percent over 100 trials, in the format 'average ± standard deviation', for the three algorithms. In all the experiments, we use (m, n, r) = (20, 1000, 8) and a maximum number of iterations of 200. The results show that Eigen performs significantly better than Taylor and Det in the noiseless cases. In noisy settings, Taylor is better than Eigen at identifying the ground truth W₀. In all cases, Det performs poorly.

Fig. 3. The geometry of the fittings (PCA projected).

Table 1. Comparison of Det, Taylor and Eigen on synthetic data sets. The table reports the average and standard deviation of the relative errors in percent over 100 trials. The first (resp. second) column is the error on fitting X (resp. W).

θ = 0.9, no noise
Det     2.49 ± 0.51    9.79 ± 1.49
Taylor  0.46 ± 0.12    3.29 ± 0.64
Eigen   0.01 ± 0.00    1.19 ± 0.40

θ = 0.9, 10% noise
Det     27.18 ± 0.45   36.64 ± 3.45
Taylor  27.76 ± 0.33   25.43 ± 2.37
Eigen   23.64 ± 0.14   33.21 ± 5.25

θ = 0.7, no noise
Det     3.36 ± 0.62    11.74 ± 2.05
Taylor  1.76 ± 0.34    8.63 ± 1.13
Eigen   0.02 ± 0.01    2.80 ± 1.50

θ = 0.7, 10% noise
Det     27.17 ± 0.42   39.03 ± 3.51
Taylor  28.00 ± 0.34   27.97 ± 2.10
Eigen   23.58 ± 0.14   37.43 ± 4.10
This experiment shows that the log determinant model produces more accurate solutions. This can be explained as follows: considering the singular value expressions of the regularizers, we have log det(W^T W + δI_r) = Σ_i log(σ_i(W)² + δ), while det(W^T W) = Π_i σ_i²(W). Hence the det regularizer is more sensitive to the large singular values. For the logdet regularizer, the log operator reduces the effect of the large singular values, thus yielding a better fit. For example, the eigenvalues of W^T W + δI_r in one trial are: 37.98, 4.26, 3.47, 3.29, 2.60, 2.36, 1.73 and 1.50.
In terms of computational time, the three methods have the same computational complexity, running in O(mnr) operations per iteration. For instance, on average over the synthetic data sets, the matrix-wise method Taylor takes 2.70 seconds, while the column-wise methods take about 2.63 seconds.
Fig. 4. A portion of the San Diego data (band no. 35, data points no. 59000 to 60000) before and after preprocessing.
Fig. 5. The spectra of the San Diego airport image obtained
by Eigen. The x-axis is the wavelength band number and the
y-axis the reflectance. The relative percentage error on fitting
the data is 1.76%.
5.4. San Diego airport hyperspectral image
For illustration, we apply the method Eigen on the San Diego airport image (see §3.4.3 of [4] for details) with parameter r = 8 and a maximum of 100 iterations (the initialization and λ are chosen as for the synthetic data sets). The raw data, of size (m, n) = (158, 4002), is preprocessed by replacing all negative values (caused by camera shaking) with zero; spikes are corrected by a median filter with window length 20. Figure 4 shows the data before and after preprocessing. Figures 5 and 6 show the spectra and the corresponding abundance maps extracted by Eigen, respectively.

Eigen successfully decomposes the data into meaningful components: components 1 and 2 correspond to roof tops, components 3 and 8 correspond to trees and grass, and the remaining ones correspond to different road surfaces [4].
6. CONCLUSION
In this paper, we have studied two VR-NMF problems: one with the det and one with the logdet regularizer. For the logdet case, we have proposed a new column-wise update of W called Eigen, and showed that it has better numerical performance than the matrix-wise update algorithm from [1] (Taylor) and the vector-wise update with the det regularizer from [3] (Det). We have also illustrated the ability of the method Eigen to decompose data into meaningful components on the San Diego airport image.
Future directions include: (1) comparing Det, Taylor and Eigen on real data; (2) designing faster algorithms — e.g., the update of w_i in Eigen contains many implicit steps and repeated computations, which can be improved [5], and, as both Det and Eigen are BCD algorithms with projected gradient updates, it will be interesting to apply randomized acceleration of BCD [6]; (3) connecting to rank minimization and nuclear norm minimization: equations (4), (7) and (8) show that there are strong connections between the different regularizations in terms of the singular values of the matrix W, so it will be interesting to study the connection between the volume regularizer and the (convex) nuclear norm regularizer (which is the sum of the singular values of W).

Fig. 6. The abundance map (matrix H computed with Eigen) for the San Diego airport image.
7. REFERENCES
[1] X. Fu, K. Huang, B. Yang, W.-K. Ma, and N.D. Sidiropoulos, “Robust volume minimization-based matrix factorization for remote sensing and document clustering,” IEEE Trans. on Signal Processing, vol. 64, no. 23, pp. 6254–6268, 2016.
[2] N. Gillis, “Successive nonnegative projection algorithm
for robust nonnegative blind source separation,” SIAM J.
on Imaging Sciences, vol. 7, no. 2, pp. 1420–1450, 2014.
[3] G. Zhou, S. Xie, Z. Yang, J.-M. Yang, and Z. He,
“Minimum-volume-constrained nonnegative matrix fac-
torization: Enhanced ability of learning parts,” IEEE
Trans. on Neural Networks, vol. 22, no. 10, pp. 1626–
1637, 2011.
[4] N. Gillis, D. Kuang, and H. Park, “Hierarchical clustering of hyperspectral images using rank-two nonnegative matrix factorization,” IEEE Trans. on Geoscience and Remote Sensing, vol. 53, no. 4, pp. 2066–2078, 2015.
[5] N. Gillis and F. Glineur, “Accelerated multiplicative updates and hierarchical ALS algorithms for nonnegative matrix factorization,” Neural Computation, vol. 24, no. 4, pp. 1085–1105, 2012.
[6] Yu. Nesterov, “Efficiency of coordinate descent methods
on huge-scale optimization problems,” SIAM J. on Opti-
mization, vol. 22, no. 2, pp. 341–362, 2012.
... As we will see, this regularizer also performs well in practice, although not as well as V det and V logdet . The approach of using a volume regularization with NMF has a long history and has been considered for example in [15], [21], [24], [7], [1], [6], [13]. The key differences among these works are in the choice of V . ...
... Finally, as a proof of concept, we showcase the ability of VRNMF to produces a meaningful unmixing on hyperspectral images using real-world data. This work is the continuation of the conference paper [1]. The additional contributions of this extended version are the following: ...
... • We base our numerical experiments only on real endmembers, as opposed to randomly generated ones in [1]. • We use a fine grid search by bisection to tune the regularization parameter λ. • We implement VRNMF with the nuclear norm regularizer. ...
Article
In this paper, we consider nonnegative matrix factorization (NMF) with a regularization that promotes small volume of the convex hull spanned by the basis matrix. We present highly efficient algorithms for three different volume regularizers, and compare them on endmember recovery in hyperspectral unmixing. The NMF algorithms developed in this paper are shown to outperform the state-of-the-art volume-regularized NMF methods, and produce meaningful decompositions on real-world hyperspectral images in situations where endmembers are highly mixed (no pure pixels). Furthermore, our extensive numerical experiments show that when the data is highly separable, meaning that there are data points close to the true endmembers, and there are a few endmembers, the regularizer based on the determinant of the Gramian produces the best results in most cases. For data that is less separable and/or contains more endmembers, the regularizer based on the logarithm of the determinant of the Gramian performs best in general.
... As we will see, this regularizer also performs well in practice, although not as well as V det and V logdet . The approach of using a volume regularization with NMF has a long history and has been considered for example in [15], [21], [24], [7], [1], [6], [13]. The key differences among these works are in the choice of V . ...
... This work is the continuation of the conference paper [1]. The additional contributions of this extended version are the following: ...
... • We base our numerical experiments only on real endmembers, as opposed to randomly generated ones in [1]. • We use a fine grid search by bisection to tune the regularization parameter λ. • We implement VRNMF with the nuclear norm regularizer. ...
Preprint
Full-text available
In this work, we consider nonnegative matrix factorization (NMF) with a regularization that promotes small volume of the convex hull spanned by the basis matrix. We present highly efficient algorithms for three different volume regularizers, and compare them on endmember recovery in hyperspectral unmixing. The NMF algorithms developed in this work are shown to outperform the state-of-the-art volume-regularized NMF methods, and produce meaningful decompo-sitions on real-world hyperspectral images in situations where endmembers are highly mixed (no pure pixels). Furthermore, our extensive numerical experiments show that when the data is highly separable, meaning that there are data points close to the true endmembers, and there are a few endmembers, the regularizer based on the determinant of the Gramian produces the best results in most cases. For data that is less separable and/or contains more endmembers, the regularizer based on the logarithm of the determinant of the Gramian performs best in general.
... In this paper, we also consider, for the first time in the context of VRNMF, the regularizer W * , which is also a non-increasing function in the singular values of W. As we will see, this regularizer also performs well in practice. The approach of using a volume regularization with NMF has a long history and has been considered for example in [15], [20], [23], [7], [1], [6], [13]. The key differences among these works are in the choice of V . ...
... This work is the continuation of the conference paper [1]. The additional contributions of this extended version are the following: ...
... • We base our numerical experiments only on real endmembers, as opposed to randomly generated ones in [1]. • We use a fine grid search by bisection to tune the regularization parameter λ. • We implement VRNMF with the nuclear norm regularizer. ...
Preprint
In this work, we consider nonnegative matrix factorization (NMF) with a regularization that promotes small volume of the convex hull spanned by the basis matrix. We present highly efficient algorithms for three different volume regularizers, and compare them on endmember recovery in hyperspectral unmixing. The NMF algorithms developed in this work are shown to outperform the state-of-the-art volume-regularized NMF methods, and produce meaningful decompositions on real-world hyperspectral images in situations where endmembers are highly mixed (no pure pixels). Furthermore, our extensive numerical experiments show that when the data is highly separable, meaning that there are data points close to the true endmembers, and there are a few endmembers, the regularizer based on the determinant of the Gramian produces the best results in most cases. For data that is less separable and/or contains more endmembers, the regularizer based on the logarithm of the determinant of the Gramian performs best in general.
... First, it means the model prevents over-fitting. Second, compared with existing NMF models such as the minimum-volume NMF [19,5] (see below) which was shown to exhibit [3] rank-finding ability, SON-NMF is applicable to rank-deficient matrix. ...
... Review of NMF: minimum-volume and rank-deficiency SON-NMF has linkage to minvol NMF [19,27]. ...
Preprint
Full-text available
When applying nonnegative matrix factorization (NMF), generally the rank parameter is unknown. Such rank in NMF, called the nonnegative rank, is usually estimated heuristically since computing the exact value of it is NP-hard. In this work, we propose an approximation method to estimate such rank while solving NMF on-the-fly. We use sum-of-norm (SON), a group-lasso structure that encourages pairwise similarity, to reduce the rank of a factor matrix where the rank is overestimated at the beginning. On various datasets, SON-NMF is able to reveal the correct nonnegative rank of the data without any prior knowledge nor tuning. SON-NMF is a nonconvx nonsmmoth non-separable non-proximable problem, solving it is nontrivial. First, as rank estimation in NMF is NP-hard, the proposed approach does not enjoy a lower computational complexity. Using a graph-theoretic argument, we prove that the complexity of the SON-NMF is almost irreducible. Second, the per-iteration cost of any algorithm solving SON-NMF is possibly high, which motivated us to propose a first-order BCD algorithm to approximately solve SON-NMF with a low per-iteration cost, in which we do so by the proximal average operator. Lastly, we propose a simple greedy method for post-processing. SON-NMF exhibits favourable features for applications. Beside the ability to automatically estimate the rank from data, SON-NMF can deal with rank-deficient data matrix, can detect weak component with small energy. Furthermore, on the application of hyperspectral imaging, SON-NMF handle the issue of spectral variability naturally.
... Several models and algorithms have been designed, exploiting geometric or algebraic properties [19]. Among the most widely used techniques, minimum-volume NMF (MinVolNMF) [22][23][24], sparse NMF [25], and variants of archetypal analysis (AA) [26][27][28] have led to the best performances. For example, minimum-volume NMF aims at minimizing the volume delimited by the basis vectors, while sparse NMF imposes that the factors only contain a reduced number of non-zero entries. ...
Article
Constrained low-rank matrix approximations have been known for decades as powerful linear dimensionality reduction techniques able to extract the information contained in large data sets in a relevant way. However, such low-rank approaches are unable to mine complex, interleaved features that underlie hierarchical semantics. Recently, deep matrix factorization (deep MF) was introduced to deal with the extraction of several layers of features and has been shown to reach outstanding performances on unsupervised tasks. Deep MF was motivated by the success of deep learning, as it is conceptually close to some neural networks paradigms. In this survey paper, we present the main models, algorithms, and applications of deep MF through a comprehensive literature review. We also discuss theoretical questions and perspectives of research as deep MF is likely to become an important paradigm in unsupervised learning in the next few years.
... Several models and algorithms have been designed, exploiting geometric or algebraic properties [17]. Among the most widely used techniques, minimum-volume NMF (MinVolNMF) [20][21][22], sparse NMF [23], and variants of archetypal analysis (AA) [24][25][26] have led to the best performances. For example, minimum-volume NMF aims at minimizing the volume delimited by the basis vectors, while sparse NMF imposes that the factors only contain a reduced number of non-zero entries. ...
Preprint
Full-text available
Constrained low-rank matrix approximations have been known for decades as powerful linear dimensionality reduction techniques to be able to extract the information contained in large data sets in a relevant way. However, such low-rank approaches are unable to mine complex, interleaved features that underlie hierarchical semantics. Recently, deep matrix factorization (deep MF) was introduced to deal with the extraction of several layers of features and has been shown to reach outstanding performances on unsupervised tasks. Deep MF was motivated by the success of deep learning, as it is conceptually close to some neural networks paradigms. In this paper, we present the main models, algorithms, and applications of deep MF through a comprehensive literature review. We also discuss theoretical questions and perspectives of research.
... The minimum eigenvalue ofd = + E XX I det T K () is denoted by m min . In practice, we use X in the previous iteration to compute  X(Ang & Gillis 2018). ...
Article
Full-text available
Photometric variation of a directly imaged planet contains information on both the geography and spectra of the planetary surface. We propose a novel technique that disentangles the spatial and spectral information from the multiband reflected light curve. This will enable us to compose a two-dimensional map of the surface composition of a planet with no prior assumption on the individual spectra, except for the number of independent surface components. We solve the unified inverse problem of the spin–orbit tomography and spectral unmixing by generalizing the nonnegative matrix factorization using a simplex volume minimization method. We evaluated our method on a toy cloudless Earth and observed that the new method could accurately retrieve the geography and unmix spectral components. Furthermore, our method is also applied to the real-color variability of the Earth as observed by Deep Space Climate Observatory. The retrieved map explicitly depicts the actual geography of the Earth, and unmixed spectra capture features of the ocean, continents, and clouds. It should be noted that the two unmixed spectra consisting of the reproduced continents resemble those of soil and vegetation.
... The minimum eigenvalue ofd = + E XX I det T K () is denoted by m min . In practice, we use X in the previous iteration to compute  X(Ang & Gillis 2018). ...
Preprint
Full-text available
Photometric variation of a directly imaged planet contains information on both the geography and spectra of the planetary surface. We propose a novel technique that disentangles the spatial and spectral information from the multi-band reflected light curve. This will enable us to compose a two-dimensional map of the surface composition of a planet with no prior assumption on the individual spectra, except for the number of independent surface components. We solve the unified inverse problem of the spin-orbit tomography and spectral unmixing by generalizing the non-negative matrix factorization (NMF) using a simplex volume minimization method. We evaluated our method on a toy cloudless Earth and observed that the new method could accurately retrieve the geography and unmix spectral components. Furthermore, our method is also applied to the real-color variability of the Earth as observed by Deep Space Climate Observatory (DSCOVR). The retrieved map explicitly depicts the actual geography of the Earth and unmixed spectra capture features of the ocean, continents, and clouds. It should be noted that, the two unmixed spectra consisting of the reproduced continents resemble those of soil and vegetation.
Article
Full-text available
Sensor selection is one of the key factors that dictate the performance of estimating vertical wheel forces in vehicle durability design. To select K most relevant sensors among S candidate ones that best fit the response of one vertical wheel force, it has S!/(K!(S-K)!) possible choices to evaluate, which is not practical unless K or S is small. In order to tackle this issue, this paper proposes a data-driven method based on maximizing the marginal likelihood of the data of the vertical wheel force without knowing the dynamics of vehicle systems. Although the resulting optimization problem is a mixed-integer programming problem, it is relaxed to a convex problem with continuous variables and linear constraints. The proposed sensor selection method is flexible and easy to implement, and no additional hyper-parameters needed to be tuned using cross-validation. The feasibility and effectiveness of the proposed method are verified using experimental data in vehicle durability design. The results show that the proposed method has good performance with different data sizes and model orders, in providing sub-optimal sensor configurations for estimating vertical wheel forces in vehicles.
Thesis
Full-text available
Cette thèse aborde le démélange en-ligne d’images hyperspectrales acquises par un imageur pushbroom, pour la caractérisation en temps réel du matériau bois. La première partie de cette thèse propose un modèle de mélange en-ligne fondé sur la factorisation en matrices non-négatives. À partir de ce modèle, trois algorithmes pour le démélange séquentiel en-ligne, fondés respectivement sur les règles de mise à jour multiplicatives, le gradient optimal de Nesterov et l’optimisation ADMM (Alternating Direction Method of Multipliers) sont développés. Ces algorithmes sont spécialement conçus pour réaliser le démélange en temps réel, au rythme d'acquisition de l'imageur pushbroom. Afin de régulariser le problème d’estimation (généralement mal posé), deux sortes de contraintes sur les endmembers sont utilisées : une contrainte de dispersion minimale ainsi qu’une contrainte de volume minimal. Une méthode pour l’estimation automatique du paramètre de régularisation est également proposée, en reformulant le problème de démélange hyperspectral en-ligne comme un problème d’optimisation bi-objectif. Dans la seconde partie de cette thèse, nous proposons une approche permettant de gérer la variation du nombre de sources, i.e. le rang de la décomposition, au cours du traitement. Les algorithmes en-ligne préalablement développés sont ainsi modifiés, en introduisant une étape d’apprentissage d’une bibliothèque hyperspectrale, ainsi que des pénalités de parcimonie permettant de sélectionner uniquement les sources actives. Enfin, la troisième partie de ces travaux consiste en l’application de nos approches à la détection et à la classification des singularités du matériau bois.
Article
Full-text available
In this paper, we design a hierarchical clustering algorithm for high-resolution hyperspectral images. At the core of the algorithm, a new rank-two nonnegative matrix factorization (NMF) algorithm is used to split the clusters, which is motivated by convex geometry concepts. The method starts with a single cluster containing all pixels, and, at each step, (i) selects a cluster in such a way that the error at the next step is minimized, and (ii) splits the selected cluster into two disjoint clusters using rank-two NMF in such a way that the clusters are well balanced and stable. The proposed method can also be used as an endmember extraction algorithm in the presence of pure pixels. The effectiveness of this approach is illustrated on several synthetic and real-world hyperspectral images, and shown to outperform standard clustering techniques such as k-means, spherical k-means and standard NMF.
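The splitting loop described in steps (i)-(ii) can be sketched as follows; this is a simplified illustration (plain multiplicative-update rank-2 NMF, and a spread heuristic in place of the paper's error-based cluster selection and balancing criteria):

```python
import numpy as np

def rank2_nmf(X, iters=500, eps=1e-9, seed=0):
    """Plain multiplicative-update NMF with rank r = 2."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, 2)); H = rng.random((2, n))
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

def hierarchical_split(X, n_clusters):
    """Repeatedly split one cluster in two via rank-2 NMF;
    a pixel joins the side whose H row dominates."""
    clusters = [np.arange(X.shape[1])]
    while len(clusters) < n_clusters:
        # heuristic: split the cluster with the largest spread
        spread = [np.linalg.norm(X[:, c] - X[:, c].mean(1, keepdims=True))
                  for c in clusters]
        c = clusters.pop(int(np.argmax(spread)))
        _, H = rank2_nmf(X[:, c])
        mask = H[0] >= H[1]
        a, b = c[mask], c[~mask]
        if len(a) == 0 or len(b) == 0:   # degenerate split: stop early
            clusters.append(c); break
        clusters += [a, b]
    return clusters
```

On well-separated nonnegative data, the two rows of H act as soft membership scores, so thresholding them yields the two child clusters.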
Article
Full-text available
In this paper, we propose a new fast and robust recursive algorithm for near-separable nonnegative matrix factorization, a particular nonnegative blind source separation problem. This algorithm, which we refer to as the successive nonnegative projection algorithm (SNPA), is closely related to the popular successive projection algorithm (SPA), but takes advantage of the nonnegativity constraint in the decomposition. We prove that SNPA is more robust than SPA and can be applied to a broader class of nonnegative matrices. This is illustrated on some synthetic data sets, and on a real-world hyperspectral image.
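For context, the classical SPA that SNPA builds on is only a few lines: greedily pick the column with the largest residual norm, then project it out. A minimal sketch (SPA itself, not the paper's SNPA, which replaces the orthogonal projection with a nonnegativity-constrained one):

```python
import numpy as np

def spa(X, r):
    """Successive projection algorithm (SPA): greedily select the
    column with the largest residual norm, then project it out."""
    R = np.array(X, dtype=float)
    idx = []
    for _ in range(r):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))
        idx.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(u, u @ R)   # remove the component along the pick
    return idx
```

In the near-separable model X = W[I, H'] (up to permutation and noise), the returned indices point at the columns of X that play the role of W.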
Article
Full-text available
Nonnegative matrix factorization (NMF) is a data analysis technique used in a great variety of applications such as text mining, image processing, hyperspectral data analysis, computational biology, and clustering. In this letter, we consider two well-known algorithms designed to solve NMF problems: the multiplicative updates of Lee and Seung and the hierarchical alternating least squares of Cichocki et al. We propose a simple way to significantly accelerate these schemes, based on a careful analysis of the computational cost needed at each iteration, while preserving their convergence properties. This acceleration technique can also be applied to other algorithms, which we illustrate on the projected gradient method of Lin. The efficiency of the accelerated algorithms is empirically demonstrated on image and text data sets and compares favorably with a state-of-the-art alternating nonnegative least squares algorithm.
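The acceleration idea rests on a cost observation: in an MU sweep for H, the expensive products W^T X and W^T W do not depend on H, so they can be computed once and reused across several cheap inner updates. A hedged sketch of this pattern (illustrative iteration counts, not the authors' exact scheme or stopping rule):

```python
import numpy as np

def accelerated_mu(X, r, outer=200, inner=4, eps=1e-9, seed=0):
    """Multiplicative updates with several inner repetitions per
    outer iteration, reusing the expensive matrix products."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)); H = rng.random((r, n))
    for _ in range(outer):
        WtX, WtW = W.T @ X, W.T @ W      # computed once per outer step
        for _ in range(inner):
            H *= WtX / (WtW @ H + eps)   # cheap: only r-by-n work
        XHt, HHt = X @ H.T, H @ H.T
        for _ in range(inner):
            W *= XHt / (W @ HHt + eps)
    return W, H
```

Each inner repetition costs O(r^2 n) or O(r^2 m) instead of the O(mnr) of a full sweep, which is where the speed-up comes from.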
Article
This paper considers volume minimization (VolMin)-based structured matrix factorization (SMF). VolMin is a factorization criterion that decomposes a given data matrix into a basis matrix times a structured coefficient matrix via finding the minimum-volume simplex that encloses all the columns of the data matrix. Recent work showed that VolMin guarantees the identifiability of the factor matrices under mild conditions that are realistic in a wide variety of applications. This paper focuses on both theoretical and practical aspects of VolMin. On the theory side, exact equivalence of two independently developed sufficient conditions for VolMin identifiability is proven here, thereby providing a more comprehensive understanding of this aspect of VolMin. On the algorithm side, computational complexity and sensitivity to outliers are two key challenges associated with real-world applications of VolMin. These are addressed here via a new VolMin algorithm that handles volume regularization in a computationally simple way, and automatically detects and iteratively downweights outliers, simultaneously. Simulations and real-data experiments using a remotely sensed hyperspectral image and the Reuters document corpus are employed to showcase the effectiveness of the proposed algorithm.
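In volume-regularized formulations like this one and the paper above, the simplex volume is typically surrogated by the determinant or log-determinant of the Gramian W^T W. A sketch of the logdet-regularized objective (the function name and the shift delta*I for numerical stability are illustrative choices):

```python
import numpy as np

def logdet_volume_objective(X, W, H, lam=0.1, delta=1e-8):
    """Fitting error plus a log-volume surrogate:
    ||X - W H||_F^2 + lam * logdet(W^T W + delta * I)."""
    fit = np.linalg.norm(X - W @ H, 'fro') ** 2
    sign, logdet = np.linalg.slogdet(W.T @ W + delta * np.eye(W.shape[1]))
    return fit + lam * logdet

```

Shrinking the columns of W toward each other (or toward zero) shrinks the determinant of the Gramian, so minimizing this objective pulls the convex hull of the columns of W tight around the data.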
Article
Nonnegative matrix factorization (NMF) with minimum-volume-constraint (MVC) is exploited in this paper. Our results show that MVC can actually improve the sparseness of the results of NMF. This sparseness is L0-norm oriented and can give desirable results even in very weak sparseness situations, thereby leading to the significantly enhanced ability of learning parts of NMF. The close relation between NMF, sparse NMF, and the MVC_NMF is discussed first. Then two algorithms are proposed to solve the MVC_NMF model. One is called quadratic programming_MVC_NMF (QP_MVC_NMF), which is based on quadratic programming, and the other is called natural gradient_MVC_NMF (NG_MVC_NMF) because it uses multiplicative updates incorporating the natural gradient ingeniously. The QP_MVC_NMF algorithm is quite efficient for small-scale problems and the NG_MVC_NMF algorithm is more suitable for large-scale problems. Simulations show the efficiency and validity of the proposed methods in applications of blind source separation and human face image analysis.
Article
In this paper we propose new methods for solving huge-scale optimization problems. For problems of this size, even the simplest full-dimensional vector operations are very expensive. Hence, we propose to apply an optimization technique based on random partial updates of the decision variables. For these methods, we prove global estimates for the rate of convergence. Surprisingly enough, for certain classes of objective functions, our results are better than the standard worst-case bounds for deterministic algorithms. We present constrained and unconstrained versions of the method, and its accelerated variant. Our numerical tests confirm the high efficiency of this technique on problems of very large size.
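The "random partial update" idea is random coordinate descent: at each step, pick one coordinate at random and take a gradient step with that coordinate's own Lipschitz constant. A minimal sketch on a strongly convex quadratic, where the step is in fact an exact coordinate-wise minimization (the quadratic test problem is an illustrative choice, not from the paper):

```python
import numpy as np

def random_cd_quadratic(A, b, iters=4000, seed=0):
    """Random coordinate descent for f(x) = 0.5 x^T A x - b^T x:
    each step updates one random coordinate with step 1/A[i, i],
    the coordinate-wise Lipschitz constant of the gradient."""
    rng = np.random.default_rng(seed)
    x = np.zeros(b.size)
    for _ in range(iters):
        i = rng.integers(b.size)
        x[i] -= (A[i] @ x - b[i]) / A[i, i]   # partial gradient, 1D step
    return x
```

Only one row of A is touched per iteration, which is exactly why such methods scale to problems where a full gradient is too expensive.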