NONNEGATIVE UNIMODAL MATRIX FACTORIZATION
Andersen Man Shun Ang1,2, Nicolas Gillis2, Arnaud Vandaele2, Hans De Sterck3
1Department of Combinatorics and Optimization, University of Waterloo, Canada
2Department of Mathematics and Operational Research, Université de Mons, Belgium
3Department of Applied Mathematics, University of Waterloo, Canada
ABSTRACT
We introduce a new Nonnegative Matrix Factorization (NMF) model
called Nonnegative Unimodal Matrix Factorization (NuMF), which
adds on top of NMF the unimodal condition on the columns of the
basis matrix. NuMF finds applications for example in analytical
chemistry. We propose a simple but naive brute-force heuristic based on accelerated projected gradient. It is then improved using a multi-grid method, for which we prove that the restriction operator preserves the unimodality. We also present two preliminary results
regarding the uniqueness of the solution, that is, the identifiability, of
NuMF. Empirical results on synthetic and real datasets confirm the
effectiveness of the algorithm and illustrate the theoretical results on
NuMF.
Index Terms— Nonnegative Matrix Factorization, Unimodality, Multi-grid method, fast gradient method
1. INTRODUCTION
Nonnegative Matrix Factorization (NMF) [1] is the following problem: given a matrix $M \in \mathbb{R}^{m \times n}_+$ and a factorization rank $r \in \mathbb{N}$, find $W \in \mathbb{R}^{m \times r}_+$ and $H \in \mathbb{R}^{r \times n}_+$ such that $WH \approx M$. In this work, we introduce a new NMF model, namely Nonnegative Unimodal Matrix Factorization (NuMF), which adds on top of NMF the condition that the columns of $W$ are nonnegative unimodal (Nu).
Definition 1 (Nonnegative unimodality) A vector $x \in \mathbb{R}^m$ is Nu if there exists an integer $p \in [m] := \{1, 2, \ldots, m\}$ such that
$$0 \le x_1 \le x_2 \le \cdots \le x_p \quad \text{and} \quad x_p \ge x_{p+1} \ge \cdots \ge x_m \ge 0. \tag{1}$$
We let $\mathcal{U}^{m,p}_+$ be the set of vectors fulfilling (1), and let $\mathcal{U}^m_+$ be the union of all $\mathcal{U}^{m,p}_+$ for $p \in [m]$. A matrix $X$ is Nu if all its columns are Nu.
Remark 1 Note that the value of $p$ is not necessarily unique; $p$ refers to the location of the change of monotonicity, from increasing to decreasing. Nu generalizes the notion of log-concavity [2].
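To make Definition 1 concrete, the following is a minimal NumPy sketch (not part of the paper's implementation; the function name is ours) that checks Nu membership by locating a peak index and verifying monotonicity on both sides.

```python
import numpy as np

def is_nu(x, tol=0.0):
    """Check whether x is nonnegative unimodal in the sense of Definition 1."""
    x = np.asarray(x, dtype=float)
    if np.any(x < -tol):
        return False                      # violates nonnegativity
    p = int(np.argmax(x))                 # a candidate peak position (p is not unique in general)
    increasing = np.all(np.diff(x[:p + 1]) >= -tol)   # non-decreasing up to the peak
    decreasing = np.all(np.diff(x[p:]) <= tol)        # non-increasing after the peak
    return bool(increasing and decreasing)

print(is_nu([0, 1, 3, 2, 0]))  # True
print(is_nu([1, 0, 2, 0]))     # False: two separate peaks
```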
Nu finds applications in pure mathematics [2], but in this work we
focus on the applications in analytical chemistry, for example the curve resolution problem, flow injection analysis, and gas chromatography–mass spectrometry (GCMS) [3]. See Section 4 for an
example of a GCMS dataset.
NG acknowledges the support by the European Research Council (ERC
starting grant No 679515), the Fonds de la Recherche Scientifique - FNRS
and the Fonds Wetenschappelijk Onderzoek - Vlaanderen (FWO) under
EOS project O005318F-RG47. HDS acknowledges support by NSERC of
Canada.
Definition 2 (NuMF) Given $M \in \mathbb{R}^{m \times n}_+$ and $r \in \mathbb{N}$, solve
$$\min_{W,H}\ \frac{1}{2}\|M - WH\|_F^2 \quad \text{subject to} \quad H \ge 0,\quad w_j \in \mathcal{U}^m_+ \ \text{for all } j \in [r],\quad w_j^\top \mathbf{1}_m = 1 \ \text{for all } j \in [r], \tag{2}$$
where $H \ge 0$ means $H$ is element-wise nonnegative, $w_j$ is the $j$th column of $W$ and $\mathbf{1}_m$ is the vector of ones in $\mathbb{R}^m$. The normalization constraint $w_j^\top \mathbf{1}_m = 1$ is used to handle the scaling ambiguity of the solution $(W, H)$; see [1].
Contributions. We propose an algorithm to solve NuMF. The
algorithm is a combination of brute-force heuristics, accelerated pro-
jected gradient (APG), and a multi-grid method (MG). Theoretically,
we justify the use of MG as a dimension reduction step in the algo-
rithm by proving that a restriction operator preserves Nu (Theorem
1). We also present two preliminary results regarding the unique-
ness of the solution, that is, the identifiability, of NuMF. Finally we
present numerical experiments to support the effectiveness of the al-
gorithm, and illustrate the theoretical results regarding identifiability.
2. ALGORITHM
We use block-coordinate descent to solve (2): starting with an initial pair $(W_0, H_0)$, we solve the optimization subproblem on $H$ while fixing $W$, then we solve the subproblem on $W$ while fixing $H$. Such alternating minimization is repeated until the sequence $\{W_k, H_k\}_{k \in \mathbb{N}}$ converges. In particular, we employ the HALS algorithm ([4], see also [1]) that updates the columns of $W$ and the rows of $H$ one by one, which runs in $O(mnr)$ operations. The subproblem on the $i$th row of $H$, denoted $h_i$, has the following closed-form solution
$$h_i = \frac{[M_i^\top w_i]_+}{\|w_i\|_2^2}, \tag{3}$$
where $M_i = M - WH + w_i h_i$ and $[\cdot]_+ = \max\{\cdot, 0\}$. The main difficulty in solving NuMF comes from the subproblem on $W$, which is a nonconvex problem.
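For illustration, a minimal NumPy sketch of one HALS sweep over the rows of $H$ using the update (3) is given below; it is a sketch only, not the implementation used in the experiments, and the function and variable names are ours.

```python
import numpy as np

def hals_update_H(M, W, H, eps=1e-12):
    """One HALS sweep over the rows of H via the closed-form update (3)."""
    r = W.shape[1]
    for i in range(r):
        wi = W[:, i]
        # M_i = M - WH + w_i h_i: residual with the i-th rank-one term added back
        Mi = M - W @ H + np.outer(wi, H[i, :])
        # h_i = [M_i^T w_i]_+ / ||w_i||_2^2
        H[i, :] = np.maximum(Mi.T @ wi, 0.0) / max(wi @ wi, eps)
    return H
```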
2.1. Subproblem on $W$ is nonconvex
The subproblem on $W$ is nonconvex because of the Nu set. The set $\mathcal{U}^m_+ = \bigcup_i \mathcal{U}^{m,i}_+$, the union of $m$ convex sets, is nonconvex for $m \ge 3$. For example, let $e_i$ be the $i$th standard basis vector; both $e_i$ and $e_j$ are Nu, but the vector $(1-\lambda)e_i + \lambda e_j$ with $\lambda$ in the interval $(0,1)$ is not Nu if $|j - i| \ge 2$. Note that the union $\mathcal{U}^{m,p}_+ \cup \mathcal{U}^{m,q}_+$ is convex if $|p - q| \le 1$. This means that $x \in \mathcal{U}^m_+$ if there exists an integer $p \in [m]$ such that $x \in \mathcal{U}^{m,p}_+ \cup \mathcal{U}^{m,p+1}_+$, and this membership can be characterized by the inequality $U_p x \ge 0$, where $U_p$ is an $m$-by-$m$ matrix built from two first-order difference operators $D$, as shown in the following equations:
$$\underbrace{\begin{array}{c} 0 \le x_1,\quad x_1 \le x_2,\quad \ldots,\quad x_{p-1} \le x_p,\\ x_{p+1} \ge x_{p+2},\quad \ldots,\quad x_{m-1} \ge x_m,\quad x_m \ge 0 \end{array}}_{x \,\in\, \mathcal{U}^{m,p}_+ \cup\, \mathcal{U}^{m,p+1}_+} \quad\Longleftrightarrow\quad U_p x \ge 0, \tag{4a}$$
where
$$U_p = \begin{bmatrix} D_{p \times p} & 0_{p \times (m-p)} \\ 0_{(m-p) \times p} & D^\top_{(m-p) \times (m-p)} \end{bmatrix}, \qquad D_{k \times k} = \begin{bmatrix} 1 & & & \\ -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{bmatrix}. \tag{4b}$$
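As an illustration of (4b), here is a short NumPy sketch (with our own helper names) that assembles $U_p$ from the two first-order difference blocks and verifies $U_p x \ge 0$ on a small Nu vector.

```python
import numpy as np

def build_Up(m, p):
    """Assemble the m-by-m matrix U_p of (4b)."""
    def D(size):
        # first-order difference block: 1 on the diagonal, -1 on the subdiagonal
        return np.eye(size) - np.eye(size, k=-1)
    Up = np.zeros((m, m))
    Up[:p, :p] = D(p)           # encodes 0 <= x_1 <= ... <= x_p
    Up[p:, p:] = D(m - p).T     # encodes x_{p+1} >= ... >= x_m >= 0
    return Up

x = np.array([0.2, 0.5, 1.0, 0.4, 0.1])   # Nu vector with peak at position 3
print(np.all(build_Up(5, 3) @ x >= 0))    # True
```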
Based on this characterization, if the value of $p$ for the vector $w_i$ is known, the subproblem on $w_i$ under the HALS framework is a linearly constrained quadratic programming problem:
$$\min_{w_i}\ \frac{\|h_i\|_2^2}{2}\,\|w_i\|_2^2 - \langle M_i h_i^\top,\, w_i\rangle + c \quad \text{subject to} \quad U_{p_i} w_i \ge 0 \ \text{ and } \ w_i^\top \mathbf{1}_m = 1. \tag{5}$$
2.2. Accelerated Projected Gradient (APG) for (5)
We solve (5) by accelerated projected gradient (APG) [5]. The feasible set $\{w_i \mid U_{p_i} w_i \ge 0,\ w_i^\top \mathbf{1} = 1\}$ in (5) is difficult to project onto, so let us reformulate (5). Note that the square matrix $U_{p_i}$ in (4) is non-singular. The change of variable $y = U_{p_i} w_i$ gives
$$\operatorname*{argmin}_{y}\ \frac{1}{2}\langle Qy, y\rangle - \langle p, y\rangle \quad \text{s.t.} \quad y \ge 0,\ y^\top b = 1, \tag{6}$$
where $Q = \|h_i\|_2^2\, U_{p_i}^{-\top} U_{p_i}^{-1}$, $p = U_{p_i}^{-1} M_i h_i^\top$ and $b = U_{p_i}^{-1} \mathbf{1}$. We solve (6) by APG; see Algorithm 1. Then we convert the solution $y^*$ of (6) to the $w_i^*$ that solves (5) via $w_i^* = U_{p_i}^{-1} y^*$.
Algorithm 1: Accelerated Projected Gradient (APG)
Input: $Q \in \mathbb{R}^{m \times m}$, $p$, $b$
Output: Vector $y$ that approximately solves (6).
Initialize $\hat{y}_0 = y_0 \in \mathbb{R}^m$;
For $k = 1, \ldots$ until some criterion is satisfied do
    $y_k = \mathcal{P}\!\left(\hat{y}_{k-1} - \frac{Q\hat{y}_{k-1} - p}{\|Q\|_2}\right)$ % Projected gradient step;
    $\hat{y}_k = y_k + \frac{k-1}{k+2}\,(y_k - y_{k-1})$ % Extrapolation step;
End for
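A compact NumPy rendering of Algorithm 1 could look as follows; this is a sketch under our own naming, and `project` stands for a routine computing $\mathcal{P}$ in (7), such as the one sketched after (8) below.

```python
import numpy as np

def apg(Q, p, b, project, max_iter=500, tol=1e-9):
    """Accelerated projected gradient for (6), following Algorithm 1."""
    m = Q.shape[0]
    y = np.zeros(m)
    y_hat = y.copy()
    L = np.linalg.norm(Q, 2)                 # step size uses the spectral norm of Q
    for k in range(1, max_iter + 1):
        y_new = project(y_hat - (Q @ y_hat - p) / L, b)          # projected gradient step
        y_hat = y_new + (k - 1.0) / (k + 2.0) * (y_new - y)      # extrapolation step
        if np.linalg.norm(y_new - y) <= tol * max(1.0, np.linalg.norm(y)):
            return y_new
        y = y_new
    return y
```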
The key in APG is the projection $\mathcal{P}$. Given a vector $z$, $\mathcal{P}(z)$ is defined as
$$\mathcal{P}(z) = \operatorname*{argmin}_{y}\ \frac{1}{2}\|y - z\|_2^2 \quad \text{s.t.} \quad y \ge 0,\ y^\top b = 1. \tag{7}$$
This problem is the projection onto an irregular simplex described by the vector $b = U_{p_i}^{-1} \mathbf{1}$. As $U_{p_i}$ in (4) is built from two first-order difference operators, its inverse is entry-wise nonnegative with unit diagonal, and thus $b > 0$. This implies that (7) satisfies Slater's condition, that is, the feasible set has a non-empty relative interior, which guarantees strong duality. The solution to (7) can be derived from the partial Lagrangian associated with the equality constraint,
$$y^* = \operatorname*{argmin}_{y \ge 0}\ \max_{\nu}\ \mathcal{L}(y, \nu), \qquad \mathcal{L}(y, \nu) = \frac{1}{2}\|y - z\|_2^2 + \nu\,(y^\top b - 1),$$
which has a closed-form solution given by the soft-thresholding $y^* = [z - \nu b]_+$, where the Lagrange multiplier $\nu$ is the root of the following piecewise-linear equation
$$\sum_{i=1}^{m} \max\{0,\ z_i - \nu b_i\}\, b_i - 1 = 0, \tag{8}$$
where the $z_i / b_i$ ($i \in [m]$) are the break points. Assuming there are $K \le m$ nonzero break points, Problem (8) can be solved in $O(m + K\log m)$ operations by sorting the break points. The complexity of the projection step is thus between $O(m)$ and $O(m\log m)$, depending on $K$; see also the discussion in [6]. We implemented an efficient and robust MATLAB code inspired by [7] to solve this problem.
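For completeness, a NumPy sketch of this projection is given below; it uses a full sort of the break points (an $O(m\log m)$ variant rather than the $O(m + K\log m)$ one), and it is inspired by [6, 7] but is not the MATLAB code mentioned above.

```python
import numpy as np

def project_weighted_simplex(z, b):
    """Solve (7): min_y 0.5*||y - z||^2 s.t. y >= 0, y^T b = 1, with b > 0,
    via the root of the piecewise-linear equation (8)."""
    z, b = np.asarray(z, float), np.asarray(b, float)
    order = np.argsort(z / b)[::-1]            # break points z_i / b_i in decreasing order
    zs, bs = z[order], b[order]
    cum_zb = np.cumsum(zs * bs)                # running sums defining each linear piece
    cum_bb = np.cumsum(bs * bs)
    nu_cand = (cum_zb - 1.0) / cum_bb          # candidate multiplier for each active-set size
    k = np.nonzero(zs / bs > nu_cand)[0][-1]   # largest valid active set
    nu = nu_cand[k]                            # root of (8)
    return np.maximum(z - nu * b, 0.0)         # soft-thresholding y* = [z - nu*b]_+
```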
2.3. Brute-force algorithm for $p_i$
If all $p_i$ are known, solving (5) for all $i \in [r]$ gives the solution of $W$ for NuMF. In general the $p_i$'s are unknown and should be optimized. In this sense, NuMF is a nonconvex problem with $r$ integer variables $p_1, \ldots, p_r$. A first naive strategy is to solve it by brute force: try all the even integers in $[m]$ for $p_i$ when solving (5), and pick the one with the smallest objective function value as the solution. This requires $O(m^2 n r)$ operations and hence does not scale linearly with the size of the data; this brute-force strategy is ineffective for large $m$.
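In sketch form, the brute-force strategy over one column reads as follows, where `solve_w_subproblem(M_i, h_i, p)` is a placeholder for the APG solve of (5)-(6) described in Section 2.2 and is assumed to return the candidate column together with its objective value.

```python
import numpy as np

def brute_force_p(M_i, h_i, m, solve_w_subproblem):
    """Try every even p in [m] and keep the candidate with the smallest objective."""
    best = (None, None, np.inf)                     # (p, w, objective)
    for p in range(2, m + 1, 2):
        w, obj = solve_w_subproblem(M_i, h_i, p)    # solve (5) with this peak position
        if obj < best[2]:
            best = (p, w, obj)
    return best[0], best[1]
```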
2.4. Multi-grid as the dimension reduction step
We now discuss how MG can be used to speed up the computation. MG is used because it preserves Nu (Theorem 1), which is not the case for other dimension reduction techniques such as PCA or sampling. Algorithm 2 shows the algorithm for solving NuMF with MG. First we apply a restriction operator $R$ to the data to form a smaller problem on a coarse grid: in the general $N$-level MG, the vector $R_N R_{N-1} \cdots R_1 w$ has row dimension $m_N \ll m$. For $m_N$ sufficiently small, we can run the brute-force search to estimate $p$. The cost of the brute-force search is thus reduced from searching the even integers in $[m]$ to those in $[m_N]$. After we solve the problem on the coarse grid, we interpolate the solution back to the original fine grid, which can be computed as a left-multiplication with the matrix $R^\top$ together with a scaling factor. Lastly, we solve the problem on the fine grid with the information $p_0$; no further brute-force search is needed.
We now discuss the details of the restriction.
Definition 3 The restriction operator $R$ is defined as $x \mapsto Rx$, where $R \in \mathbb{R}^{m_1 \times m}_+$ with $m_1 < m$ has the form (9). The operator is defined column-wise on a matrix, i.e., $RX := [Rx_1 \ \ldots \ Rx_n]$. There are many choices to build $R$; for simplicity, we use
$$R(a, b) = \begin{bmatrix} a & b & & & \\ b & a & b & & \\ & \ddots & \ddots & \ddots & \\ & & b & a & b \\ & & & b & a \end{bmatrix}, \qquad a > 0,\ b > 0,\ a + 2b = 1. \tag{9}$$
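For illustration, the following sketch builds one possible restriction matrix of the form (9) with $(a, b) = (1/2, 1/4)$; placing $a$ on every other fine-grid index is our reading of (9) and of the proof of Theorem 1, so treat this construction as an assumption rather than the authors' exact operator.

```python
import numpy as np

def restriction_matrix(m, a=0.5, b=0.25):
    """Restriction R of the form (9): each coarse point is centred on every
    other fine index with weight a and couples to its neighbours with weight b."""
    assert abs(a + 2 * b - 1.0) < 1e-12
    m1 = (m + 1) // 2                     # coarse dimension, roughly m / 2
    R = np.zeros((m1, m))
    for i in range(m1):
        c = 2 * i                         # centre on the fine grid (0-based)
        R[i, c] = a
        if c - 1 >= 0:
            R[i, c - 1] = b
        if c + 1 < m:
            R[i, c + 1] = b
    return R

# a Nu vector remains Nu after restriction (cf. Theorem 1 below)
x = np.array([0.0, 0.1, 0.3, 0.6, 1.0, 0.7, 0.2, 0.1, 0.0])
print(np.round(restriction_matrix(9) @ x, 3))   # [0.025 0.325 0.825 0.3 0.025]
```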
We now show that MG preserves Nu. Let us define $\mathcal{N}^{m,p}_+ = \mathcal{U}^{m,p}_+ \cup \mathcal{U}^{m,p+1}_+$, which is a subset of the Nu vectors. We have the following result.
Theorem 1 Let $x \in \mathcal{U}^{m,p}_+$ and $R \in \mathbb{R}^{m_1 \times m}$ be defined as in (9). Then $y = Rx \in \mathcal{N}^{m_1, p_y}_+$ with $p_y \in \{\lfloor p/2 \rfloor, \lfloor p/2 \rfloor + 1\}$.
Proof First, we assume without loss of generality that $p$ is even, by considering the vector $[0, x]$ when $p$ is odd, which does not change the unimodality and increases $p$ by one.
Then let us decompose $R$ as $R = A + B + C$, where $A$ only contains the elements $a$ of $R$, $B$ only contains the elements $b$ on the right of $a$ in $R$, and $C$ only contains the elements $b$ on the left of $a$ in $R$. Note that $A$, $B$, $C$ are sampling operators multiplied by a constant. These sampling operators pick either the odd or the even indices of $x$, which gives Nu vectors since any sub-vector of a Nu vector is Nu. It remains to show that the sum $Ax + Bx + Cx$ belongs to $\mathcal{N}^{m_1, p_y}_+$ for $p_y \in \{\lfloor p/2 \rfloor, \lfloor p/2 \rfloor + 1\}$. Since $p$ is even and the matrices $B$ and $C$ sample the even entries of $x$, the vectors $Bx$ and $Cx$ belong to $\mathcal{U}^{m_1, p/2}_+$. It is easy to see that $Ax$ belongs to $\mathcal{U}^{m_1, p_y}_+$ with $p_y$ either $p/2$ or $p/2 + 1$, which concludes the proof.
Algorithm 2: MG-BFS for solving NuMF
Input: $M \in \mathbb{R}^{m \times n}_+$, $r \in \mathbb{N}$, $W_0$, $H_0$
Output: $W$, $H$ that approximately solve (2)
1. Perform restriction: $M^{[N]} = R_N \cdots R_1 M$, $W^{[N]}_0 = R_N \cdots R_1 W_0$.
2. Solve NuMF on the coarse grid: $[W^{[N]}, H_0, p^{[N]}] = \text{HALS}(M^{[N]}, W^{[N]}_0, H_0)$, with $h_i$ updated using (3) and $w_i$ updated using the APG discussed in Section 2.2.
3. Interpolation: $[W_0, p_0] = \text{Interpolate}(W^{[N]}, p^{[N]})$.
4. Solve the problem on the original grid: $[W, H] = \text{HALS}(M, W_0, H_0, p_0)$.
* Steps 2-4 can be repeated several times.
Lastly, we give a few remarks concerning Algorithm 2. In practice, due to errors introduced by the restriction and interpolation processes, the vector $p_0$ obtained from $p^{[N]}$ may not be precisely accurate for solving NuMF in the original dimension. As a safeguard, we still perform a brute-force search for the $p$ values of NuMF in the original dimension, but we only search in a small neighborhood of $p_0$, whose size depends on the grid size. For instance, for $R$ defined in (9), we search within $\pm 5$ of $p_0$.
The restriction can be computed at once by directly forming the operator $R = R_N \cdots R_1 \in \mathbb{R}^{m_N \times m}$, for a total computational cost of $O(m_N^2 n r) + O(mnr)$ operations for Algorithm 2. A natural choice for $m_N$ is therefore $m_N = O(\sqrt{m})$, so that the computational cost of Algorithm 2 is asymptotically equivalent to that of HALS.
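Putting the pieces together, the overall flow of Algorithm 2 can be sketched as below; `hals_numf` and `interpolate` are placeholder names for the HALS loop (with the updates of Sections 2.1-2.3) and the interpolation step, respectively, and the $\pm 5$ window follows the safeguard described above.

```python
def mg_bfs_numf(M, W0, H0, restrictions, hals_numf, interpolate):
    """A sketch of Algorithm 2 (MG-BFS); hals_numf and interpolate are assumed callbacks.
    `restrictions` is the list [R_1, ..., R_N] of restriction matrices."""
    # 1. restriction: stack all levels into a single operator R = R_N ... R_1
    R = restrictions[0]
    for Rk in restrictions[1:]:
        R = Rk @ R
    M_c, W_c = R @ M, R @ W0
    # 2. solve NuMF on the coarse grid with a full brute-force search over p
    W_c, H, p_c = hals_numf(M_c, W_c, H0, p_range=None)
    # 3. interpolate the coarse solution and peak positions back to the fine grid
    W0_fine, p0 = interpolate(W_c, p_c, R)
    # 4. solve on the fine grid, searching p only within +/- 5 of p0
    p_windows = [range(max(p - 5, 1), p + 6) for p in p0]
    W, H = hals_numf(M, W0_fine, H, p_range=p_windows)
    return W, H
```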
Remark 2 (Peak finding) Another possible heuristic is to use peak
detection, for example using findpeaks in MATLAB, to preselect
a small number of candidate values for the pi’s. This strategy is
explored in [8] and will be investigated in future work.
3. IDENTIFIABILITY
Now we present some results on the identifiability of NuMF, i.e., on when solving NuMF gives a unique solution. We first define the support of a vector $x \in \mathcal{U}^m_+$ as $\operatorname{supp}(x) := \{i \in [m] \mid x_i \ne 0\} = [a, b]$, where the second equality is due to the fact that the support of any Nu vector is a single closed interval. Then, we define the notion of strictly disjoint supports as follows.
Definition 4 (Strictly disjoint) Given two vectors $x, y \in \mathcal{U}^m_+$ with $\operatorname{supp}(x) = [a_x, b_x]$ and $\operatorname{supp}(y) = [a_y, b_y]$, the two vectors are called strictly disjoint if $a_x > b_y + 1$.
Remark 3 Other concepts related to strictly disjoint supports are adjacent and overlapping supports [8, Section 5.3], which we do not discuss in this paper.
3.1. Preliminary results on identifiability of NuMF
Here we give identifiability results of NuMF for two special cases.
Theorem 2 Assume $M = \bar{W}\bar{H}$. Solving (2) recovers $(\bar{W}, \bar{H})$ if
1. $\bar{W}$ is Nu and all its columns have strictly disjoint supports;
2. $\bar{H} \in \mathbb{R}^{r \times n}_+$ has $n \ge 1$ and $\|\bar{h}_i\|_\infty > 0$ for all $i \in [r]$.
Proof Assume there is another solution $(W^*, H^*)$ that solves the NuMF problem. The columns $\bar{w}_j$ contribute to $M$ a series of disjoint unimodal components. For the solution $W^* H^*$ to fit $M$, each $w^*_i$ has to fit one of these disjoint components in $M$, and hence $W^*$ recovers $\bar{W}$ up to permutation. There is no scaling ambiguity here because of the normalization constraints $w_i^\top \mathbf{1} = 1$. Moreover, $W^*$ and $\bar{W}$ have rank $r$, since their columns have disjoint supports, and hence $H^*$ and $\bar{H}$ are uniquely determined (namely, using the left inverses of $W^*$ and $\bar{W}$), up to permutation.
The assumptions in Theorem 2 are strong, but they are satisfied by many GCMS datasets. Furthermore, the theorem holds for $r \ge n$, which is uncommon for most NMF models [1].
We now present another result on the general identifiability of NuMF, with $r$ limited to 2. This result is general in the sense that it includes vectors with overlapping supports, which is not addressed by the previous theorem. First, we have the following lemma on demixing two Nu vectors with non-fully-overlapping supports.
Lemma 1 (On demixing two non-fully-overlapping Nu vectors) Given two non-zero vectors $x, y \in \mathcal{U}^m_+$ with $\operatorname{supp}(x) \not\subseteq \operatorname{supp}(y)$ and $\operatorname{supp}(x) \not\supseteq \operatorname{supp}(y)$. If $x, y$ are generated by two non-zero Nu vectors $u, v$ as $x = au + bv$ and $y = cu + dv$ with nonnegative coefficients $a, b, c, d$, then we have either $u = x$, $v = y$ or $u = y$, $v = x$.
Proof Since $x$ and $y$ have non-fully-overlapping supports, we cannot have $u = \alpha v$ for some $\alpha > 0$; hence $u$ and $v$ are linearly independent. Let $X = UQ$, where $X := [x, y]$, $U := [u, v]$ and $Q := \begin{bmatrix} a & c \\ b & d \end{bmatrix} \ge 0$. The conditions that $x, y$ are Nu with $\operatorname{supp}(x) \not\subseteq \operatorname{supp}(y)$ and $\operatorname{supp}(x) \not\supseteq \operatorname{supp}(y)$ imply $x \ne 0$, $y \ne 0$, $x \ne y$ and
$$\begin{aligned} \operatorname{supp}(x) \not\subseteq \operatorname{supp}(y) &\implies \exists\, i^* \in [m] \ \text{s.t.}\ x_{i^*} > 0,\ y_{i^*} = 0,\\ \operatorname{supp}(y) \not\subseteq \operatorname{supp}(x) &\implies \exists\, j^* \in [m] \ \text{s.t.}\ y_{j^*} > 0,\ x_{j^*} = 0. \end{aligned} \tag{10}$$
Then $x \ne y$ and $u \ne \alpha v$ imply that $X$, $U$, $Q$ are all rank-2, hence
$$U = XQ^{-1} = X \begin{bmatrix} d & -c \\ -b & a \end{bmatrix} \frac{1}{ad - bc}, \qquad ad - bc \ne 0. \tag{11}$$
Plugging $i^*, j^*$ from (10) into (11), together with the fact that $x, y, u, v$ are nonnegative, gives $Q^{-1} \ge 0$. Lastly, $Q \ge 0$ and $Q^{-1} \ge 0$ imply that $Q$ is a permutation of a diagonal matrix with positive diagonal [9], and here the diagonal matrix is the identity.
Now we can present the general identifiability of NuMF for $r = 2$.
Theorem 3 Assume $M = \bar{W}\bar{H}$. If $r = 2$, solving (2) recovers $(\bar{W}, \bar{H})$ if the columns of $\bar{W}$ satisfy the conditions of Lemma 1 and $\bar{H} \in \mathbb{R}^{r \times n}_+$ is full rank.
Proof It follows directly from Lemma 1.
Theorems 2 and 3 address the identifiability of NuMF from two angles: the number of columns in $W$ and how the supports of the $w_i$ interact. Neither of the theorems is complete. Generalizing these theorems to all possible interactions between the supports of the $w_i$ for $r \ge 3$ is a topic of further research.
4. EXPERIMENTS
We now present experiments on NuMF. The code is available from
https://angms.science/.
Toy example on MG performance. A Nu matrix $W$ and a nonnegative matrix $H$ are constructed with $(m, n, r) = (100, 6, 3)$; see Fig. 1. The NuMF problem is solved with 0, 1 and 2 layers of MG. Fig. 1 shows that MG significantly speeds up the convergence: more than 50% run-time reduction with 1 layer of MG, and more than 75% with 2 layers. Our method is thus far more efficient than existing approaches such as those in [3, 10], which have a complexity similar to that of Algorithm 2 without MG.
Fig. 1. Experiment on a toy example. Top: the ground-truth $W$ matrix and the data $M$. Bottom: the curves plotted against time. All algorithms run 100 iterations and are initialized with SNPA [11]. For the algorithms with MG, the computational time taken on the coarse grid is also taken into account, as reflected by the gap between time 0 and the first dot on the curves.
On GCMS data of Belgian beers. We now demonstrate the regularizing power of the unimodality constraint in the factorization, using a beer dataset [12]. Here $M \in \mathbb{R}^{518 \times 947}_+$, where each column is a GCMS spectrum. With $r = 7$, three methods, NuMF, NMF [1] and separable NMF (SNMF) [13], are used to decompose the data, and Fig. 2 shows the results. As expected, only NuMF can decompose the data into individual Nu components, while for the other two models some components are highly mixed, with multiple peaks. Note however that the relative error $\|M - WH\|_F / \|M\|_F$ is similar for the three methods, around 10%.
Fig. 2. Experiment on beer data. The bottom plots show the W’s
obtained by the three methods.
On data with $r > n$. We now consider NuMF in the case $r > n$ (more sources than samples), which is not possible for most other NMF models. Here a GCMS data vector in $\mathbb{R}^{947}_+$ is used. With $r = 8 > n = 1$, we decompose this vector into $r$ unimodal components. NuMF provides a meaningful decomposition; see Fig. 3. Note that the first two peaks in the data satisfy Theorem 2 and hence NuMF identifies them perfectly. For the other peaks, their supports overlap, and hence the decomposition is not unique. Investigating the identifiability of NuMF on data with overlapping supports is a direction of future research.
Fig. 3. On data $M$ (dotted black curve) with $r = 8 > 1 = n$. The cyan curves are the components $w_i h_i$. Relative error $\|M - WH\|_F / \|M\|_F = 10^{-8}$.
5. CONCLUSION
We introduced NuMF and proposed to solve it by combining APG and MG. We showed that the restriction operator in MG preserves Nu, and we presented two preliminary identifiability results. Numerical experiments support the effectiveness of the proposed algorithm. Future work will be to study the general identifiability of NuMF, and to further improve the algorithm using, for example, peak-finding algorithms; see Remark 2.
6. REFERENCES
[1] Nicolas Gillis, “The why and how of nonnegative matrix fac-
torization,” Regularization, optimization, kernels, and support
vector machines, pp. 257–291, 2014.
[2] Richard Stanley, “Log-concave and unimodal sequences in al-
gebra, combinatorics, and geometry,” Annals of the New York
Academy of Sciences, vol. 576, no. 1, pp. 500–535, 1989.
[3] Rasmus Bro and Nicholaos Sidiropoulos, “Least squares al-
gorithms under unimodality and non-negativity constraints,”
Journal of Chemometrics: A Journal of the Chemometrics So-
ciety, vol. 12, no. 4, pp. 223–247, 1998.
[4] Andrzej Cichocki, Rafal Zdunek, and Shun-ichi Amari, “Hi-
erarchical ALS algorithms for nonnegative matrix and 3d ten-
sor factorization,” in International Conference on Independent
Component Analysis and Signal Separation. Springer, 2007,
pp. 169–176.
[5] Yurii E Nesterov, "A method for solving the convex programming problem with convergence rate $O(1/k^2)$," Dokl. Akad. Nauk SSSR, 1983, vol. 269, pp. 543–547.
[6] Laurent Condat, “Fast projection onto the simplex and the l1
ball,” Mathematical Programming, vol. 158, no. 1-2, pp. 575–
585, 2016.
[7] Laurent Condat, “Matlab code to project onto the sim-
plex or the l1 ball,” https://lcondat.github.io/
software.html, 2015.
[8] Man Shun Ang, Nonnegative Matrix and Tensor Factoriza-
tions: Models, Algorithms and Applications, Ph.D. thesis, Uni-
versity of Mons, 2020.
[9] Abraham Berman and Robert J Plemmons, Nonnegative ma-
trices in the mathematical sciences, SIAM, 1994.
[10] Junting Chen and Urbashi Mitra, “Unimodality-constrained
matrix factorization for non-parametric source localization,”
IEEE Transactions on Signal Processing, vol. 67, no. 9, pp.
2371–2386, 2019.
[11] Nicolas Gillis, “Successive nonnegative projection algorithm
for robust nonnegative blind source separation,” SIAM Journal
on Imaging Sciences, vol. 7, no. 2, pp. 1420–1450, 2014.
[12] Christophe Vanderaa, “Development of a state-of-the-art
pipeline for high throughput analysis of gas chromatography
- mass spectrometry data,” Master thesis, 2018.
[13] Nicolas Gillis and Stephen Vavasis, “Fast and robust recur-
sive algorithms for separable nonnegative matrix factoriza-
tion,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 36, no. 4, pp. 698–714, 2013.