All content in this area was uploaded by Andersen M. S. Ang on Jan 30, 2021

NONNEGATIVE UNIMODAL MATRIX FACTORIZATION

Andersen Man Shun Ang1,2, Nicolas Gillis2, Arnaud Vandaele2, Hans De Sterck3

1Department of Combinatorics and Optimization, University of Waterloo, Canada

2Department of Mathematics and Operational Research, Université de Mons, Belgium

3Department of Applied Mathematics, University of Waterloo, Canada

ABSTRACT

We introduce a new Nonnegative Matrix Factorization (NMF) model called Nonnegative Unimodal Matrix Factorization (NuMF), which adds to NMF a unimodality condition on the columns of the basis matrix. NuMF finds applications, for example, in analytical chemistry. We propose a simple but naive brute-force heuristic based on accelerated projected gradient. It is then improved by using a multi-grid method, for which we prove that the restriction operator preserves unimodality. We also present two preliminary results regarding the uniqueness of the solution, that is, the identifiability, of NuMF. Empirical results on synthetic and real datasets confirm the effectiveness of the algorithm and illustrate the theoretical results on NuMF.

Index Terms—Nonnegative Matrix Factorization, Unimodality, Multi-grid method, fast gradient method

1. INTRODUCTION

Nonnegative Matrix Factorization (NMF) [1] is the following problem: given a matrix $M \in \mathbb{R}^{m \times n}_+$ and a factorization rank $r \in \mathbb{N}$, find $W \in \mathbb{R}^{m \times r}_+$ and $H \in \mathbb{R}^{r \times n}_+$ such that $WH \approx M$. In this work, we introduce a new NMF model, namely Nonnegative Unimodal Matrix Factorization (NuMF), which adds to NMF the condition that the columns of $W$ are nonnegative unimodal (Nu).

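Definition 1 (Nonnegative unimodality) A vector $x \in \mathbb{R}^m$ is Nu if there exists an integer $p \in [m] := [1, 2, \dots, m]$ such that

$$0 \le x_1 \le x_2 \le \cdots \le x_p \quad \text{and} \quad x_p \ge x_{p+1} \ge \cdots \ge x_m \ge 0. \tag{1}$$

We let $\mathcal{U}^{m,p}_+$ be the set of vectors fulfilling (1), and let $\mathcal{U}^m_+$ be the union of all $\mathcal{U}^{m,p}_+$ for $p \in [m]$. A matrix $X$ is Nu if all its columns are Nu.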

Remark 1 Note that the value of $p$ is not necessarily unique, and $p$ refers to the location of the change of monotonicity, from increasing to decreasing. Nu generalizes the notion of log-concavity [2].
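
Definition 1 and the non-uniqueness of $p$ noted in Remark 1 can be illustrated with a short sketch. The function names `nu_modes` and `is_nu` are hypothetical and not taken from the authors' code; the check simply enumerates every $p$ that satisfies (1).

```python
def nu_modes(x):
    """Return every 1-based index p for which x satisfies (1):
    0 <= x_1 <= ... <= x_p and x_p >= ... >= x_m >= 0."""
    m = len(x)
    if m == 0 or min(x) < 0:
        return []
    modes = []
    for p in range(1, m + 1):
        # Nondecreasing on entries 1..p (0-based indices 0..p-1).
        rising = all(x[k] <= x[k + 1] for k in range(p - 1))
        # Nonincreasing on entries p..m (0-based indices p-1..m-1).
        falling = all(x[k] >= x[k + 1] for k in range(p - 1, m - 1))
        if rising and falling:
            modes.append(p)
    return modes


def is_nu(x):
    """A vector is Nu if at least one valid p exists."""
    return len(nu_modes(x)) > 0


print(nu_modes([0, 1, 3, 3, 2]))  # [3, 4]: the plateau makes p non-unique
print(nu_modes([1, 0, 1]))        # []: two peaks, not Nu
```

The brute-force enumeration costs $O(m^2)$ comparisons and is meant only to mirror the definition, not to be efficient.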

Nu finds applications in pure mathematics [2], but in this work we focus on applications in analytical chemistry, for example the curve resolution problem, flow injection analysis, and gas chromatography–mass spectrometry (GCMS) [3]. See Section 4 for an example of a GCMS dataset.

NG acknowledges the support by the European Research Council (ERC starting grant No 679515), the Fonds de la Recherche Scientifique - FNRS and the Fonds Wetenschappelijk Onderzoek - Vlaanderen (FWO) under EOS project O005318F-RG47. HDS acknowledges support by NSERC of Canada.

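Definition 2 (NuMF) Given $M \in \mathbb{R}^{m \times n}_+$ and $r \in \mathbb{N}$, solve

$$\min_{W,H} \ \frac{1}{2}\|M - WH\|_F^2 \quad \text{subject to} \quad H \ge 0, \quad w_j \in \mathcal{U}^m_+ \ \text{for all } j \in [r], \quad w_j^\top 1_m = 1 \ \text{for all } j \in [r], \tag{2}$$

where $H \ge 0$ means $H$ is element-wise nonnegative, $w_j$ is the $j$th column of $W$ and $1_m$ is the vector of ones in $\mathbb{R}^m$. The normalization constraint $w_j^\top 1_m = 1$ is used to handle the scaling ambiguity of the solution $(W, H)$; see [1].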

Contributions. We propose an algorithm to solve NuMF. The algorithm is a combination of brute-force heuristics, accelerated projected gradient (APG), and a multi-grid method (MG). Theoretically, we justify the use of MG as a dimension reduction step in the algorithm by proving that a restriction operator preserves Nu (Theorem 1). We also present two preliminary results regarding the uniqueness of the solution, that is, the identifiability, of NuMF. Finally, we present numerical experiments to support the effectiveness of the algorithm, and illustrate the theoretical results regarding identifiability.

2. ALGORITHM

We use block-coordinate descent to solve (2): starting with an initial pair $(W_0, H_0)$, we solve the optimization subproblem on $H$ while fixing $W$, then we solve the subproblem on $W$ while fixing $H$. Such alternating minimization is repeated until the sequence $\{W_k, H_k\}_{k \in \mathbb{N}}$ converges. In particular, we employ the HALS algorithm ([4], see also [1]) that updates the columns of $W$ and rows of $H$ one by one, which runs in $O(mnr)$ operations. The subproblem on the $i$th row of $H$, denoted as $h_i$, has the following closed-form solution

$$h_i = \frac{\left[ M_i^\top w_i \right]_+}{\|w_i\|_2^2}, \tag{3}$$

where $M_i = M - WH + w_i h_i$ and $[\cdot]_+ = \max\{\cdot, 0\}$. The main difficulty in solving NuMF comes from the subproblem on $W$, which is a nonconvex problem.
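
The closed-form row update (3) can be sketched in a few lines. This is a minimal illustration on nested Python lists, not the authors' implementation; the function name `hals_update_row` is hypothetical, and the sketch assumes $w_i \ne 0$, which the normalization constraint in (2) guarantees.

```python
def hals_update_row(M, W, H, i):
    """Return the updated i-th row of H following (3).

    M is m-by-n, W is m-by-r, H is r-by-n, all given as lists of rows.
    Assumes column i of W is nonzero.
    """
    m, n, r = len(M), len(M[0]), len(H)
    norm_sq = sum(W[k][i] ** 2 for k in range(m))
    new_row = []
    for j in range(n):
        corr = 0.0
        for k in range(m):
            # Entry (k, j) of M_i = M - WH + w_i h_i, i.e. the residual
            # with the contribution of component i left out.
            resid = M[k][j] - sum(W[k][l] * H[l][j] for l in range(r) if l != i)
            corr += resid * W[k][i]
        # [.]_+ followed by the division by ||w_i||^2.
        new_row.append(max(corr, 0.0) / norm_sq)
    return new_row


W = [[0.5, 0.0], [0.5, 0.5], [0.0, 0.5]]
M = [[1.0, 0.0], [1.5, 1.5], [0.5, 1.5]]   # equals W times [[2, 0], [1, 3]]
H = [[0.0, 0.0], [1.0, 3.0]]               # row 0 is deliberately wrong
print(hals_update_row(M, W, H, 0))         # [2.0, 0.0]
```

With the second row of $H$ already exact, one update recovers the first row exactly, since the residual $M_i$ is then rank one.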

2.1. Subproblem on $W$ is nonconvex

The subproblem on $W$ is nonconvex because of the Nu set. The set $\mathcal{U}^m_+ = \bigcup_i \mathcal{U}^{m,i}_+$, which is the union of $m$ convex sets, is nonconvex for $m \ge 3$. For example, let $e_i$ be the $i$th standard basis vector: both $e_i$ and $e_j$ are Nu, but the vector $(1-\lambda) e_i + \lambda e_j$ with $\lambda$ in the open interval $(0,1)$ is not Nu if $|j - i| \ge 2$. Note that the union $\mathcal{U}^{m,p}_+ \cup \mathcal{U}^{m,q}_+$ is convex if $|p - q| \le 1$, which means that $x \in \mathcal{U}^m_+$ if there exists an integer $p \in [m]$ such that $x \in \mathcal{U}^{m,p}_+ \cup \mathcal{U}^{m,p+1}_+$, and the Nu membership of $x$ can be characterized by an inequality $U_p x \ge 0$, where $U_p$ is an $m$-by-$m$ matrix built from two first-order difference operators $D$, as shown in the following equation

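$$\left. \begin{array}{l} 0 \le x_1 \\ x_1 \le x_2 \\ \quad \vdots \\ x_{p-1} \le x_p \\ x_{p+1} \ge x_{p+2} \\ \quad \vdots \\ x_{m-1} \ge x_m \\ x_m \ge 0 \end{array} \right\} \iff x \in \mathcal{U}^{m,p}_+ \cup \mathcal{U}^{m,p+1}_+ \iff U_p x \ge 0, \tag{4a}$$

where

$$U_p = \begin{bmatrix} D_{p \times p} & 0_{p \times (m-p)} \\ 0_{(m-p) \times p} & D^\top_{(m-p) \times (m-p)} \end{bmatrix}, \qquad D_{p \times p} = \begin{bmatrix} 1 & & & \\ -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{bmatrix}_{p \times p}. \tag{4b}$$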
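
The characterization (4) can be checked numerically with a small sketch. The helper names `build_U` and `satisfies` are hypothetical; the matrix is built row by row from the inequalities listed in (4a).

```python
def build_U(m, p):
    """m-by-m matrix U_p of (4b), with a 1-based p in 1..m."""
    U = [[0] * m for _ in range(m)]
    for k in range(p):
        # Rows 1..p encode x_k - x_{k-1} >= 0, with x_0 := 0.
        U[k][k] = 1
        if k > 0:
            U[k][k - 1] = -1
    for k in range(p, m):
        # Rows p+1..m encode x_k - x_{k+1} >= 0, with x_{m+1} := 0.
        U[k][k] = 1
        if k < m - 1:
            U[k][k + 1] = -1
    return U


def satisfies(U, x):
    """Check U x >= 0 entrywise."""
    return all(sum(u * v for u, v in zip(row, x)) >= 0 for row in U)


print(build_U(3, 2))                              # [[1, 0, 0], [-1, 1, 0], [0, 0, 1]]
print(satisfies(build_U(5, 3), [0, 1, 2, 3, 1]))  # True: the mode is p + 1 = 4
print(satisfies(build_U(5, 1), [0, 1, 2, 3, 1]))  # False
```

The second call shows why a single $U_p$ covers two neighbouring modes: no row of $U_p$ compares $x_p$ with $x_{p+1}$.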

Based on this characterization, if the value of $p$ for the vector $w_i$ is known, the subproblem on $w_i$, under the HALS framework, is a linearly constrained quadratic programming problem:

$$\min_{w_i} \ \frac{\|h_i\|_2^2}{2} \|w_i\|_2^2 - \langle M_i (h_i)^\top, w_i \rangle + c \quad \text{subject to} \quad U_{p_i} w_i \ge 0 \ \text{and} \ w_i^\top 1_m = 1. \tag{5}$$

2.2. Accelerated Projected Gradient (APG) for (5)

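We solve (5) by accelerated projected gradient (APG) [5]. The feasible set $\{ w_i \mid U_{p_i} w_i \ge 0,\ w_i^\top 1 = 1 \}$ in (5) is difficult to project onto, so let us reformulate (5). Note that the square matrix $U_{p_i}$ in (4) is non-singular. The change of variable $y = U_{p_i} w_i$ gives

$$\operatorname*{argmin}_{y} \ \frac{1}{2} \langle Qy, y \rangle - \langle p, y \rangle \quad \text{s.t.} \quad y \ge 0,\ y^\top b = 1, \tag{6}$$

where $Q = \|h_i\|_2^2 \, U_{p_i}^{-\top} U_{p_i}^{-1}$, $p = U_{p_i}^{-\top} M_i (h_i)^\top$ and $b = U_{p_i}^{-\top} 1$. The transposed inverse in $p$ and $b$ comes from substituting $w_i = U_{p_i}^{-1} y$ into $\langle M_i (h_i)^\top, w_i \rangle$ and $w_i^\top 1$. We solve (6) by APG; see Algorithm 1. Then we convert the solution $y^*$ of (6) to $w_i^*$ that solves (5) via $w_i^* = U_{p_i}^{-1} y^*$.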
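
The change of variable can be made concrete without forming any inverse. Because $U_p$ stacks a forward difference on the first $p$ entries and a backward difference on the rest, inverting $y = U_p w$ amounts to running sums. The sketch below (hypothetical helper names `y_to_w` and `weights_b`) also computes the weight vector $b$ for which $w^\top 1 = y^\top b$, the quantity constrained in (6).

```python
def y_to_w(y, p):
    """Invert y = U_p w: running sums on entries 1..p,
    reverse running sums on entries p+1..m (p is 1-based)."""
    m = len(y)
    w = [0.0] * m
    acc = 0.0
    for k in range(p):
        acc += y[k]
        w[k] = acc
    acc = 0.0
    for k in range(m - 1, p - 1, -1):
        acc += y[k]
        w[k] = acc
    return w


def weights_b(m, p):
    """Vector b with sum(w) == dot(y, b): entry j counts how many
    entries of w the value y_j contributes to."""
    return [float(p - k) for k in range(p)] + [float(k + 1) for k in range(m - p)]


y = [1.0, 2.0, 0.0, 1.0, 3.0]
w = y_to_w(y, 3)
b = weights_b(5, 3)
print(w)                                          # [1.0, 3.0, 3.0, 4.0, 3.0]
print(sum(w), sum(u * v for u, v in zip(y, b)))   # 14.0 14.0
```

Any $y \ge 0$ maps to a Nu vector $w$ (here with mode $p + 1 = 4$), and every entry of $b$ is a positive integer, which is what makes the feasible set of (6) a weighted simplex.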

Algorithm 1: Accelerated Projected Gradient (APG)

Input: $Q \in \mathbb{R}^{m \times m}$, $p$, $b$

Output: Vector $y$ that approximately solves (6).

Initialize $\hat{y}_0 = y_0 \in \mathbb{R}^m$;

For $k = 1, \dots$ until some criterion is satisfied do

    $y_k = P\left( \hat{y}_{k-1} - \dfrac{Q \hat{y}_{k-1} - p}{\|Q\|_2} \right)$   % Projected gradient step;

    $\hat{y}_k = y_k + \dfrac{k-1}{k+2} (y_k - y_{k-1})$   % Extrapolation step;

End for

The key in APG is the projection $P$. Given a vector $z$, $P(z)$ is defined as

$$P(z) = \operatorname*{argmin}_{y} \ \frac{1}{2} \|y - z\|_2^2 \quad \text{s.t.} \quad y \ge 0,\ y^\top b = 1. \tag{7}$$

This problem is the projection onto an irregular simplex described by the vector $b = U_{p_i}^{-\top} 1$. As $U_{p_i}$ in (4) is block diagonal with two first-order difference operators as blocks, its inverse is block diagonal with triangular blocks of ones, and thus $b > 0$. This implies that (7) satisfies Slater's condition, that is, the feasible set has a non-empty relative interior, which guarantees strong duality. The solution to (7) can be derived from the partial Lagrangian associated with the equality constraint:

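$$y^* = \operatorname*{argmin}_{y \ge 0} \, \max_{\nu} \ L(y, \nu), \qquad L(y, \nu) = \frac{1}{2} \|y - z\|_2^2 + \nu (y^\top b - 1),$$

which has a closed-form solution given by the soft-thresholding $y^* = [z - \nu b]_+$, where the Lagrangian multiplier $\nu$ is the root of the following piece-wise linear equation

$$\sum_{i=1}^{m} \max\{ 0,\ z_i - \nu b_i \} \, b_i - 1 = 0, \tag{8}$$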

where $z_i / b_i$ ($i \in [m]$) are the break points. Assuming there are $K \le m$ nonzero break points, Problem (8) can be solved in $O(m + K \log m)$ operations by sorting the break points. The complexity of the projection step is between $O(m)$ and $O(m \log m)$, depending on $K$; see also the discussion in [6]. We implemented an efficient and robust MATLAB code inspired by [7] to solve this problem.
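
The projection (7)-(8) and the iteration of Algorithm 1 can be sketched together in plain Python. This is an illustration under stated assumptions, not the MATLAB code mentioned above: the projection uses a full sort of the break points ($O(m \log m)$) rather than the faster scheme of [6], and the step size uses the Frobenius norm of $Q$ as an upper bound on $\|Q\|_2$ to avoid an eigenvalue computation. The function names are hypothetical, and $Q$ is assumed to be nonzero.

```python
import math


def project_weighted_simplex(z, b):
    """Solve (7): argmin 0.5*||y - z||^2 s.t. y >= 0, <y, b> = 1, with all b_i > 0."""
    # Visit the break points z_i / b_i from largest to smallest.
    order = sorted(range(len(z)), key=lambda i: z[i] / b[i], reverse=True)
    s_bz = s_bb = 0.0
    nu = 0.0
    for rank, i in enumerate(order):
        s_bz += b[i] * z[i]
        s_bb += b[i] * b[i]
        # Root of (8) if exactly the first rank+1 sorted entries are active.
        nu = (s_bz - 1.0) / s_bb
        if rank + 1 == len(order):
            break
        nxt = order[rank + 1]
        if nu >= z[nxt] / b[nxt]:
            break  # the next entry would be thresholded to zero
    return [max(zi - nu * bi, 0.0) for zi, bi in zip(z, b)]


def apg(Q, p, b, y0, iters=200):
    """Algorithm 1 with step 1/L, where L = ||Q||_F >= ||Q||_2 (an assumption
    of this sketch; Algorithm 1 itself uses ||Q||_2)."""
    L = math.sqrt(sum(q * q for row in Q for q in row))
    y_prev = list(y0)
    y_hat = list(y0)
    for k in range(1, iters + 1):
        grad = [sum(qij * yj for qij, yj in zip(row, y_hat)) - pi
                for row, pi in zip(Q, p)]
        # Projected gradient step.
        y = project_weighted_simplex(
            [yh - g / L for yh, g in zip(y_hat, grad)], b)
        # Extrapolation step.
        beta = (k - 1) / (k + 2)
        y_hat = [yk + beta * (yk - yp) for yk, yp in zip(y, y_prev)]
        y_prev = y
    return y_prev


y = project_weighted_simplex([1.0, 1.0], [2.0, 1.0])
print(sum(yi * bi for yi, bi in zip(y, [2.0, 1.0])))  # approximately 1
print(apg([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5]))
```

In the last call $Q$ is the identity and $b$ is the all-ones vector, so (6) reduces to projecting $p = (1, 0)$ onto the standard simplex, and the iterates approach $(1, 0)$.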

2.3. Brute-force algorithm for $p_i$

If all $p_i$ are known, solving (5) for all $i \in [r]$ gives the solution to $W$ for NuMF. In general the $p_i$'s are unknown and should be optimized. In this sense, NuMF is a nonconvex problem with $r$ integer variables $p_1, \dots, p_r$. A first naive strategy is to solve it using brute force: try all the even integers in $[m]$ as $p_i$ when solving (5), and pick the one with the smallest objective function value as the solution. Only every other integer needs to be tried since, by (4), the constraint $U_p x \ge 0$ covers both modes $p$ and $p+1$. This requires $O(m^2 n r)$ operations and hence does not scale linearly with the size of the data. This brute-force strategy is ineffective for large $m$.

2.4. Multi-grid as the dimension reduction step

We now discuss an idea of using MG to speed up the computation. MG is used because it preserves Nu (Theorem 1), which is not the case for other dimension reduction techniques such as PCA or sampling. Algorithm 2 shows the algorithm for solving NuMF with MG. First we use a restriction operator $R$ on the data to form a smaller problem on a coarse grid: in the general $N$-level MG, the vector $R_N R_{N-1} \cdots R_1 w$ has row dimension $m_N \ll m$. For $m_N$ sufficiently small, we can run the brute-force search to estimate $p$. The cost of the brute-force search is now reduced from searching the even integers in $[m]$ to those in $[m_N]$. After we solve the problem on the coarse grid, we interpolate the solution back to the original fine grid, which can be computed as the left-multiplication by the matrix $R^\top$ with a scaling factor. Lastly we solve the problem on the fine grid with the information $p_0$, and no further brute-force search is needed.

We now discuss the details of the restriction.

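Definition 3 The restriction operator $R$ is defined as $x \mapsto Rx$, where $R \in \mathbb{R}^{m_1 \times m}_+$ with $m_1 < m$ has the form of (9). The operator is defined column-wise on a matrix, i.e., $RX := [Rx_1 \ \dots \ Rx_n]$.

There are many choices to build $R$; for simplicity, we use

$$R(a, b) = \begin{bmatrix} a & b & & & & & & & \\ & b & a & b & & & & & \\ & & & \ddots & \ddots & \ddots & & & \\ & & & & & b & a & b & \\ & & & & & & & b & a \end{bmatrix}, \qquad a > 0,\ b > 0,\ a + 2b = 1. \tag{9}$$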

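We now show that MG preserves Nu. Let us define $\mathcal{N}^{m,p}_+ = \mathcal{U}^{m,p}_+ \cup \mathcal{U}^{m,p+1}_+$, which is a subset of the Nu vectors. We have the following result.

Theorem 1 Let $x \in \mathcal{U}^{m,p}_+$ and let $R \in \mathbb{R}^{m_1 \times m}$ be defined as in (9). Then $y = Rx \in \mathcal{N}^{m_1, p_y}_+$ with $p_y \in \{ \lfloor \frac{p}{2} + 1 \rfloor, \lfloor \frac{p}{2} \rfloor \}$.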

Proof First, we assume $p$ is even, without loss of generality, by considering the vector $[0, x]$ when $p$ is odd, which does not change the unimodality and increases $p$ by one.

Then let us decompose $R$ as $R = A + B + C$, where $A$ only contains the elements $a$ in $R$, $B$ only contains the elements $b$ on the right of $a$ in $R$, and $C$ only contains the elements $b$ on the left of $a$ in $R$. Note that $A$, $B$, $C$ are sampling operators multiplied by a constant. These sampling operators pick either the odd or the even indices of $x$, which gives Nu vectors since any sub-vector of a Nu vector is Nu. It remains to show that the sum $Ax + Bx + Cx$ belongs to $\mathcal{N}^{m_1, p_y}_+$ for $p_y \in \{ \lfloor \frac{p}{2} + 1 \rfloor, \lfloor \frac{p}{2} \rfloor \}$. Since $p$ is even and the matrices $B$ and $C$ sample the even entries of $x$, the vectors $Bx$ and $Cx$ belong to $\mathcal{U}^{m_1, p/2}_+$. It is easy to see that $Ax$ belongs to $\mathcal{U}^{m_1, p_y}_+$ with $p_y$ either $\frac{p}{2}$ or $\frac{p}{2} + 1$, which concludes the proof.
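
Theorem 1 can be observed on a small example. The sketch below applies $R(a, b)$ without forming the matrix, assuming the layout suggested by the proof: $m = 2 m_1 - 1$ is odd, and row $k$ of $R$ carries $a$ at column $2k - 1$ and $b$ at columns $2k - 2$ and $2k$. The function names and the default values $a = 1/2$, $b = 1/4$ are illustrative choices.

```python
def restrict(x, a=0.5, b=0.25):
    """Apply R(a, b) of (9) to x, assuming len(x) is odd and a + 2*b = 1."""
    m = len(x)
    if m % 2 == 0:
        raise ValueError("this sketch assumes an odd length m = 2*m1 - 1")
    y = []
    for c in range(0, m, 2):  # 0-based centre columns 0, 2, 4, ...
        left = x[c - 1] if c > 0 else 0.0
        right = x[c + 1] if c + 1 < m else 0.0
        y.append(b * left + a * x[c] + b * right)
    return y


def is_nu(x):
    """Nu check: nonnegative, rising up to the first maximum, falling after."""
    if min(x) < 0:
        return False
    p = x.index(max(x))
    return (all(x[k] <= x[k + 1] for k in range(p))
            and all(x[k] >= x[k + 1] for k in range(p, len(x) - 1)))


x = [0, 1, 2, 5, 3, 1, 0]   # Nu with p = 4
y = restrict(x)
print(y, is_nu(y))          # [0.25, 2.5, 3.0, 0.25] True
```

Here $p = 4$ and the restricted vector peaks at index 3, which equals $\frac{p}{2} + 1$, in line with the theorem.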

Algorithm 2: MG-BFS for solving NuMF

Input: $M \in \mathbb{R}^{m \times n}_+$, $r \in \mathbb{N}$, $W_0$, $H_0$

Output: $W$, $H$ that approximately solve (2)

1. Perform restriction: $M^{[N]} = R_N \cdots R_1 M$, $W^{[N]}_0 = R_N \cdots R_1 W_0$.

2. Solve NuMF on the coarse grid: $[W^{[N]}, H_0, p^{[N]}] = \mathrm{HALS}(M^{[N]}, W^{[N]}_0, H_0)$, with $h_i$ updated using (3), and $w_i$ updated using APG discussed in Section 2.2.

3. Interpolation: $[W_0, p_0] = \mathrm{Interpolate}(W^{[N]}, p^{[N]})$.

4. Solve the problem on the original grid: $[W, H] = \mathrm{HALS}(M, W_0, H_0, p_0)$.

* Steps 2-4 can be repeated several times.

Lastly, we give a few remarks concerning Algorithm 2. In practice, due to errors introduced by the restriction and interpolation processes, the vector $p_0$ obtained from $p^{[N]}$ may not be precisely accurate for solving NuMF in the original dimension. As a safeguard, we still perform a brute-force search for the $p$ values of NuMF in the original dimension, but we only search in a small neighborhood of $p_0$, depending on the grid size. For example, for $R$ defined in (9), we search $\pm 5$ around $p_0$.

The restriction can be computed at once by forming directly the operator $R = R_N \cdots R_1 \in \mathbb{R}^{m_N \times m}$, for a total computational cost of $O(m_N^2 n r) + O(mnr)$ operations for Algorithm 2. A natural choice for $m_N$ is therefore $m_N = O(\sqrt{m})$, so that the computational cost of Algorithm 2 would be asymptotically equivalent to HALS.

Remark 2 (Peak finding) Another possible heuristic is to use peak detection, for example using findpeaks in MATLAB, to preselect a small number of candidate values for the $p_i$'s. This strategy is explored in [8] and will be investigated in future work.

3. IDENTIFIABILITY

Now we present some results on the identifiability of NuMF, i.e., on when solving NuMF gives a unique solution. We first define the support of a vector $x \in \mathcal{U}^m_+$ as $\mathrm{supp}(x) := \{ i \in [m] \mid x_i \ne 0 \} = [a, b]$, where the second equality is due to the fact that the support of every Nu vector is a single closed interval. Then, we define the notion of strictly disjoint supports as follows.

Definition 4 (Strictly disjoint) Let $x, y \in \mathcal{U}^m_+$ be two vectors with $\mathrm{supp}(x) = [a_x, b_x]$ and $\mathrm{supp}(y) = [a_y, b_y]$. The two vectors are called strictly disjoint if $a_x > b_y + 1$.
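
Definition 4 translates directly into an index check. The sketch below uses hypothetical helper names and assumes non-zero Nu inputs; it also tests the condition in both orders, so that the result does not depend on which vector is called $x$, which is an assumption of this sketch rather than part of the definition as stated.

```python
def support(x):
    """1-based interval (a, b) spanned by the nonzero entries of a non-zero x."""
    idx = [i + 1 for i, v in enumerate(x) if v != 0]
    return idx[0], idx[-1]


def strictly_disjoint(x, y):
    """True if the supports are separated by at least one index on either side."""
    ax, bx = support(x)
    ay, by = support(y)
    return ax > by + 1 or ay > bx + 1


x = [0, 0, 0, 1, 2, 0]
print(strictly_disjoint(x, [1, 2, 0, 0, 0, 0]))  # True: supports [4,5] and [1,2]
print(strictly_disjoint(x, [1, 2, 1, 0, 0, 0]))  # False: supports [4,5] and [1,3] are adjacent
```

The second pair has disjoint supports in the usual sense, but no zero entry separates them, so it fails the strict version.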

Remark 3 Other related concepts to strictly disjoint are adjacent

and overlap [8, Section 5.3], which we do not discuss in this paper.

3.1. Preliminary results on identiﬁability of NuMF

Here we give the identiﬁability result of NuMF for two special cases.

Theorem 2 Assume $M = \bar{W} \bar{H}$. Solving (2) recovers $(\bar{W}, \bar{H})$ if

1. $\bar{W}$ is Nu and all its columns have strictly disjoint supports.

2. $\bar{H} \in \mathbb{R}^{r \times n}_+$ has $n \ge 1$ and $\|\bar{h}_i\|_\infty > 0$ for $i \in [r]$.

Proof Assume there is another solution $(W^*, H^*)$ that solves NuMF. The columns $\bar{w}_j$ contribute to $M$ a series of disjoint unimodal components. For the solution $W^* H^*$ to fit $M$, each $w^*_i$ has to fit one of these disjoint components in $M$, and hence $W^*$ recovers $\bar{W}$ up to permutation. There is no scaling ambiguity here because of the normalization constraints $w_i^\top 1 = 1$. Moreover, $W^*$ and $\bar{W}$ have rank $r$, since their columns have disjoint supports, and hence $H^*$ and $\bar{H}$ are uniquely determined (namely, using the left inverses of $W^*$ and $\bar{W}$), up to permutation.

The assumptions in Theorem 2 are strong, but are satisfied by many GCMS datasets. Furthermore, the theorem holds for $r \ge n$, which is uncommon for most NMF models [1].

We now present another result on general identifiability for NuMF with $r$ limited to 2. This result is general in the sense that it includes vectors with overlapping supports, which are not addressed in the previous theorem. First we have the following lemma on demixing two Nu vectors with non-fully overlapping supports:

Lemma 1 (On demixing two non-fully overlapping Nu vectors) Given two non-zero vectors $x, y$ in $\mathcal{U}^m_+$ with $\mathrm{supp}(x) \nsubseteq \mathrm{supp}(y)$ and $\mathrm{supp}(x) \nsupseteq \mathrm{supp}(y)$. If $x, y$ are generated by two non-zero Nu vectors $u, v$ as $x = au + bv$ and $y = cu + dv$ with nonnegative coefficients $a, b, c, d$, then we have either $u = x$, $v = y$ or $u = y$, $v = x$.

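Proof Since $x$ and $y$ have non-fully overlapping supports, we cannot have $u = \alpha v$ for some $\alpha > 0$, hence $u$ and $v$ are linearly independent. Let $X = UQ$, where $X := [x, y]$, $U := [u, v]$ and $Q := \begin{bmatrix} a & c \\ b & d \end{bmatrix} \ge 0$. The conditions that $x, y$ are Nu with $\mathrm{supp}(x) \nsubseteq \mathrm{supp}(y)$ and $\mathrm{supp}(x) \nsupseteq \mathrm{supp}(y)$ imply $x \ne 0$, $y \ne 0$, $x \ne y$ and

$$\begin{aligned} \mathrm{supp}(x) \nsubseteq \mathrm{supp}(y) &\implies \exists\, i^* \in [m] \ \text{s.t.} \ x_{i^*} > 0,\ y_{i^*} = 0, \\ \mathrm{supp}(y) \nsubseteq \mathrm{supp}(x) &\implies \exists\, j^* \in [m] \ \text{s.t.} \ y_{j^*} > 0,\ x_{j^*} = 0. \end{aligned} \tag{10}$$

Then $x \ne y$ and $u \ne \alpha v$ imply that $X$, $U$, $Q$ are all rank-2, hence

$$U = X Q^{-1} = X \begin{bmatrix} d & -c \\ -b & a \end{bmatrix} \frac{1}{ad - bc}, \qquad ad - bc \ne 0. \tag{11}$$

Putting $i^*, j^*$ from (10) into (11), together with the fact that $x, y, u, v$ are nonnegative, gives $Q^{-1} \ge 0$. Lastly, $Q \ge 0$ and $Q^{-1} \ge 0$ imply that $Q$ is the permutation of a diagonal matrix with positive diagonal [9], where here the diagonal matrix is the identity.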
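
The step from (11) to $Q^{-1} \ge 0$ can be spelled out entrywise. The following is a worked expansion of the argument in the proof, with the shorthand $\delta := ad - bc$ introduced here for readability. From (11), $u = (dx - by)/\delta$ and $v = (-cx + ay)/\delta$, and evaluating these at the indices $i^*$ and $j^*$ of (10) gives:

```latex
\begin{aligned}
u_{i^*} &= \tfrac{d}{\delta}\, x_{i^*} \ge 0, &
v_{i^*} &= -\tfrac{c}{\delta}\, x_{i^*} \ge 0 &&
\text{(since } y_{i^*} = 0,\ x_{i^*} > 0\text{)}, \\
u_{j^*} &= -\tfrac{b}{\delta}\, y_{j^*} \ge 0, &
v_{j^*} &= \tfrac{a}{\delta}\, y_{j^*} \ge 0 &&
\text{(since } x_{j^*} = 0,\ y_{j^*} > 0\text{)}.
\end{aligned}
```

Dividing by the positive values $x_{i^*}$ and $y_{j^*}$ shows that $d/\delta$, $-c/\delta$, $-b/\delta$ and $a/\delta$ are all nonnegative, and these are exactly the four entries of $Q^{-1}$.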

Now we can present the general identifiability of NuMF for $r = 2$.

Theorem 3 Assume $M = \bar{W} \bar{H}$. If $r = 2$, solving (2) recovers $(\bar{W}, \bar{H})$ if the columns of $\bar{W}$ satisfy the conditions of Lemma 1 and $\bar{H} \in \mathbb{R}^{r \times n}_+$ is full rank.

Proof It follows directly from Lemma 1.

Theorems 2 and 3 address the identifiability of NuMF from two angles: the number of columns in $W$ and how the supports of the $w_i$ interact. Neither of the theorems is complete. Generalizing these theorems to all possible interactions between the supports of the $w_i$ for $r \ge 3$ is a topic of further research.

4. EXPERIMENTS

We now present experiments on NuMF. The code is available from

https://angms.science/.

Toy example on MG performance. A Nu matrix $W$ and a nonnegative matrix $H$ are constructed with $(m, n, r) = (100, 6, 3)$; see Fig. 1. The NuMF problem is solved with 0, 1 and 2 layers of MG. Fig. 1 shows that MG significantly speeds up the convergence: more than 50% run time reduction for 1 layer of MG, and more than 75% run time reduction for 2 layers. This result suggests that our method is more efficient than existing approaches such as those in [3, 10], which have a complexity similar to that of Algorithm 2 without MG.

Fig. 1. Experiment on a toy example. Top: The ground truth $W$ matrix and the data $M$. Bottom: The curve plotted against time. All algorithms run 100 iterations and are initialized with SNPA [11]. For algorithms with MG, the computational time taken on the coarse grid is also taken into account, as reflected by the time gap between time 0 and the first dot in the curves.

On GCMS data of Belgian beers. We now demonstrate the regularizing power of the unimodality constraint in the factorization, using a beer dataset [12]. Here $M \in \mathbb{R}^{518 \times 947}_+$, where each column is a GCMS spectrum. With $r = 7$, three methods, NuMF, NMF [1] and separable NMF (SNMF) [13], are used to decompose the data, and Fig. 2 shows the results. As expected, only NuMF can decompose the data into individual Nu components, while for the other two models, some components are highly mixed with multiple peaks. Note however that the relative error $\|M - WH\|_F / \|M\|_F$ is similar for the three methods, at around 10%.

Fig. 2. Experiment on beer data. The bottom plots show the W’s

obtained by the three methods.

On data with $r > n$. We now consider NuMF in the case $r > n$ (more sources than samples), which is not possible for most other NMF models. Here a GCMS data vector in $\mathbb{R}^{947}_+$ is used. With $r = 8 > n = 1$, we decompose this vector into $r$ unimodal components. NuMF provides a meaningful decomposition; see Fig. 3. Note that the first two peaks in the data satisfy Theorem 2 and hence NuMF identifies them perfectly. For the other peaks, their supports overlap, and hence the decomposition is not unique. Investigating the identifiability of NuMF on data with overlapping supports is a direction of future research.

Fig. 3. On data $M$ (dotted black curve) with $r = 8 > 1 = n$. The cyan curves are the components $w_i h_i$. Relative error $\|M - WH\|_F / \|M\|_F = 10^{-8}$.

5. CONCLUSION

We introduced NuMF and proposed to solve it by combining APG and MG. We showed that the restriction operator in MG preserves Nu, and we presented two preliminary identifiability results. Numerical experiments support the effectiveness of the proposed algorithm. Future work will study the general identifiability of NuMF and further improve the algorithm using, for example, peak finding algorithms; see Remark 2.

6. REFERENCES

[1] Nicolas Gillis, “The why and how of nonnegative matrix factorization,” Regularization, optimization, kernels, and support vector machines, pp. 257–291, 2014.

[2] Richard Stanley, “Log-concave and unimodal sequences in algebra, combinatorics, and geometry,” Annals of the New York Academy of Sciences, vol. 576, no. 1, pp. 500–535, 1989.

[3] Rasmus Bro and Nicholaos Sidiropoulos, “Least squares algorithms under unimodality and non-negativity constraints,” Journal of Chemometrics: A Journal of the Chemometrics Society, vol. 12, no. 4, pp. 223–247, 1998.

[4] Andrzej Cichocki, Rafal Zdunek, and Shun-ichi Amari, “Hierarchical ALS algorithms for nonnegative matrix and 3d tensor factorization,” in International Conference on Independent Component Analysis and Signal Separation. Springer, 2007, pp. 169–176.

[5] Yurii E Nesterov, “A method for solving the convex programming problem with convergence rate O(1/k^2),” in Dokl. Akad. Nauk SSSR, 1983, vol. 269, pp. 543–547.

[6] Laurent Condat, “Fast projection onto the simplex and the l1 ball,” Mathematical Programming, vol. 158, no. 1-2, pp. 575–585, 2016.

[7] Laurent Condat, “Matlab code to project onto the simplex or the l1 ball,” https://lcondat.github.io/software.html, 2015.

[8] Man Shun Ang, Nonnegative Matrix and Tensor Factorizations: Models, Algorithms and Applications, Ph.D. thesis, University of Mons, 2020.

[9] Abraham Berman and Robert J Plemmons, Nonnegative matrices in the mathematical sciences, SIAM, 1994.

[10] Junting Chen and Urbashi Mitra, “Unimodality-constrained matrix factorization for non-parametric source localization,” IEEE Transactions on Signal Processing, vol. 67, no. 9, pp. 2371–2386, 2019.

[11] Nicolas Gillis, “Successive nonnegative projection algorithm for robust nonnegative blind source separation,” SIAM Journal on Imaging Sciences, vol. 7, no. 2, pp. 1420–1450, 2014.

[12] Christophe Vanderaa, “Development of a state-of-the-art pipeline for high throughput analysis of gas chromatography - mass spectrometry data,” Master thesis, 2018.

[13] Nicolas Gillis and Stephen Vavasis, “Fast and robust recursive algorithms for separable nonnegative matrix factorization,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 4, pp. 698–714, 2013.