
Nonnegative Unimodal Matrix Factorization



We introduce a new Nonnegative Matrix Factorization (NMF) model called Nonnegative Unimodal Matrix Factorization (NuMF), which adds on top of NMF the unimodal condition on the columns of the basis matrix. NuMF finds applications, for example, in analytical chemistry. We propose a simple but naive brute-force heuristic strategy based on accelerated projected gradient. It is then improved by using a multi-grid method, for which we prove that the restriction operator preserves unimodality. We also present two preliminary results regarding the uniqueness of the solution, that is, the identifiability, of NuMF. Empirical results on synthetic and real datasets confirm the effectiveness of the algorithm and illustrate the theoretical results on NuMF.
Andersen Man Shun Ang1,2, Nicolas Gillis2, Arnaud Vandaele2, Hans De Sterck3
1Department of Combinatorics and Optimization, University of Waterloo, Canada
2Department of Mathematics and Operational Research, Université de Mons, Belgium
3Department of Applied Mathematics, University of Waterloo, Canada
Index Terms: Nonnegative Matrix Factorization, Unimodality, Multi-grid method, fast gradient method
Nonnegative Matrix Factorization (NMF) [1] is the following problem: given a matrix $M \in \mathbb{R}^{m \times n}_+$ and a factorization rank $r \in \mathbb{N}$, find $W \in \mathbb{R}^{m \times r}_+$ and $H \in \mathbb{R}^{r \times n}_+$ such that $WH \approx M$. In this work, we introduce a new NMF model, namely Nonnegative Unimodal Matrix Factorization (NuMF), which adds on top of NMF a condition that the columns of $W$ are Nonnegative unimodal (Nu).
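Definition 1 (Nonnegative unimodality) A vector $x \in \mathbb{R}^m$ is Nu if there exists an integer $p \in [m] := [1, 2, \dots, m]$ such that
$$0 \le x_1 \le x_2 \le \dots \le x_p \quad\text{and}\quad x_p \ge x_{p+1} \ge \dots \ge x_m \ge 0. \tag{1}$$
We let $\mathcal{U}^{m,p}_+$ be the set of vectors fulfilling (1), and let $\mathcal{U}^m_+$ be the union of all $\mathcal{U}^{m,p}_+$ for $p \in [m]$. A matrix $X$ is Nu if all its columns are Nu.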
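As an illustration of Definition 1, the following minimal Python sketch lists every admissible mode location $p$ (1-based) of a vector by testing condition (1) directly. The function names `nu_modes` and `is_nu` are hypothetical helpers introduced here for illustration; they are not part of the authors' code.

```python
from typing import List, Sequence


def nu_modes(x: Sequence[float]) -> List[int]:
    """Return every 1-based index p for which x satisfies condition (1)."""
    m = len(x)
    if any(v < 0 for v in x):
        return []  # a negative entry rules out nonnegative unimodality
    modes = []
    for p in range(1, m + 1):
        # x_1 <= ... <= x_p (0-based pairs (k, k+1) for k = 0..p-2)
        rising = all(x[k] <= x[k + 1] for k in range(p - 1))
        # x_p >= ... >= x_m (0-based pairs (k, k+1) for k = p-1..m-2)
        falling = all(x[k] >= x[k + 1] for k in range(p - 1, m - 1))
        if rising and falling:
            modes.append(p)
    return modes


def is_nu(x: Sequence[float]) -> bool:
    """True if x is Nu, i.e. at least one p satisfies condition (1)."""
    return len(nu_modes(x)) > 0
```

For a vector with a flat peak such as `[0, 1, 3, 3, 2]`, more than one value of $p$ is admissible, which is the non-uniqueness of $p$ mentioned in the text.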
Remark 1 Note that the value of $p$ is not necessarily unique, and $p$ refers to the location of the change of monotonicity, from increasing to decreasing. Nu generalizes the notion of log-concavity [2].
Nu finds applications in pure mathematics [2], but in this work we focus on the applications in analytical chemistry, for example the curve resolution problem, flow injection analysis, and gas chromatography–mass spectrometry (GCMS) [3]. See Section 4 for an example of a GCMS dataset.
NG acknowledges the support by the European Research Council (ERC
starting grant No 679515), the Fonds de la Recherche Scientifique - FNRS
and the Fonds Wetenschappelijk Onderzoek - Vlaanderen (FWO) under
EOS project O005318F-RG47. HDS acknowledges support by NSERC of
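Definition 2 (NuMF) Given $M \in \mathbb{R}^{m \times n}_+$ and $r \in \mathbb{N}$, solve
$$\min_{W,H}\ \tfrac{1}{2}\|M - WH\|_F^2 \quad\text{subject to}\quad H \ge 0,\quad w_j \in \mathcal{U}^m_+ \text{ for all } j \in [r],\quad w_j^\top 1_m = 1 \text{ for all } j \in [r], \tag{2}$$
where $H \ge 0$ means $H$ is element-wise nonnegative, $w_j$ is the $j$th column of $W$ and $1_m$ is the vector of ones in $\mathbb{R}^m$. The normalization constraint $w_j^\top 1_m = 1$ is used to handle the scaling ambiguity of the solution $(W, H)$; see [1].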
Contributions. We propose an algorithm to solve NuMF. The
algorithm is a combination of brute-force heuristics, accelerated pro-
jected gradient (APG), and a multi-grid method (MG). Theoretically,
we justify the use of MG as a dimension reduction step in the algo-
rithm by proving that a restriction operator preserves Nu (Theorem
1). We also present two preliminary results regarding the unique-
ness of the solution, that is, the identifiability, of NuMF. Finally we
present numerical experiments to support the effectiveness of the al-
gorithm, and illustrate the theoretical results regarding identifiability.
We use block-coordinate descent to solve (2): starting with an initial pair $(W_0, H_0)$, we solve the optimization subproblem on $H$ while fixing $W$, then we solve the subproblem on $W$ while fixing $H$. Such alternating minimization is repeated until the sequence $\{W_k, H_k\}_{k \in \mathbb{N}}$ converges. In particular, we employ the HALS algorithm ([4], see also [1]) that updates the columns of $W$ and rows of $H$ one-by-one, which runs in $O(mnr)$ operations. The subproblem on the $i$th row of $H$, denoted as $h^i$, has the following closed-form solution:
$$h^i = \frac{[\,w_i^\top M_i\,]_+}{\|w_i\|_2^2}, \tag{3}$$
where $M_i = M - WH + w_i h^i$ and $[\cdot]_+ = \max\{\cdot, 0\}$. The main difficulty in solving NuMF comes from the subproblem on $W$, which is a nonconvex problem.
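The row update can be sketched in a few lines of Python. This is a minimal illustration that assumes the usual HALS closed form $h^i = [w_i^\top M_i]_+ / \|w_i\|_2^2$ with $M_i = M - \sum_{j \ne i} w_j h^j$; the function name `hals_update_row` and the list-of-lists matrix layout are choices made here, not the authors' implementation.

```python
from typing import List

Matrix = List[List[float]]


def hals_update_row(M: Matrix, W: Matrix, H: Matrix, i: int) -> List[float]:
    """Return the updated i-th row of H (0-based index), leaving H unchanged."""
    m, n, r = len(M), len(M[0]), len(W[0])
    w_norm_sq = sum(W[row][i] ** 2 for row in range(m))
    if w_norm_sq == 0.0:
        raise ValueError("column w_i is zero; the update is undefined")
    new_row = []
    for col in range(n):
        # Entry `col` of w_i^T M_i, where M_i = M - sum_{j != i} w_j h^j.
        acc = 0.0
        for row in range(m):
            residual = M[row][col] - sum(
                W[row][j] * H[j][col] for j in range(r) if j != i
            )
            acc += W[row][i] * residual
        # Clip at zero ([.]_+) and normalize by ||w_i||^2.
        new_row.append(max(acc, 0.0) / w_norm_sq)
    return new_row
```

This naive version recomputes the residual for every entry and is meant only to make the formula concrete; an efficient implementation would reuse the products $W^\top M$ and $W^\top W$ to reach the $O(mnr)$ cost quoted in the text.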
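2.1. Subproblem on $W$ is nonconvex
The subproblem on $W$ is nonconvex because of the Nu set. The set $\mathcal{U}^m_+$, which is the union of the $m$ convex sets $\mathcal{U}^{m,p}_+$, is nonconvex for $m \ge 3$. For example, let $e_i$ be the $i$th standard basis vector: both $e_i$ and $e_j$ are Nu, but the vector $(1 - \lambda) e_i + \lambda e_j$ with $\lambda$ in the open interval $(0, 1)$ is not Nu if $|j - i| \ge 2$. Note that the union $\mathcal{U}^{m,p}_+ \cup \mathcal{U}^{m,q}_+$ is convex if $|p - q| \le 1$. This means that $x \in \mathcal{U}^m_+$ if there exists an integer $p \in [m]$ such that $x \in \mathcal{U}^{m,p}_+ \cup \mathcal{U}^{m,p+1}_+$, and the Nu membership of $x$ can be characterized by an inequality $U_p x \ge 0$, where $U_p$ is an $m$-by-$m$ matrix built from two first-order difference operators $D$, as shown in the following equation:
$$x \in \mathcal{U}^{m,p}_+ \cup \mathcal{U}^{m,p+1}_+ \iff \begin{cases} 0 \le x_1 \le \dots \le x_p, \\ x_{p+1} \ge x_{p+2} \ge \dots \ge x_m \ge 0 \end{cases} \iff U_p x \ge 0, \quad U_p = \begin{bmatrix} D_{p \times p} & 0 \\ 0 & D^\top_{(m-p) \times (m-p)} \end{bmatrix}, \quad D = \begin{bmatrix} 1 & & & \\ -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{bmatrix}. \tag{4}$$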
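To make the characterization $U_p x \ge 0$ concrete, here is a small Python sketch that builds $U_p$ and tests membership in $\mathcal{U}^{m,p}_+ \cup \mathcal{U}^{m,p+1}_+$. It assumes the block structure described in the text: a lower-bidiagonal difference block on the first $p$ coordinates and its transpose on the remaining $m - p$. The names `build_U` and `in_union` are hypothetical.

```python
from typing import List, Sequence


def build_U(m: int, p: int) -> List[List[int]]:
    """m-by-m matrix U_p (p is 1-based) such that U_p x >= 0 encodes
    0 <= x_1 <= ... <= x_p and x_{p+1} >= ... >= x_m >= 0."""
    U = [[0] * m for _ in range(m)]
    for k in range(m):
        U[k][k] = 1
    for k in range(1, p):
        U[k][k - 1] = -1  # row k encodes x_k - x_{k-1} >= 0 (increasing part)
    for k in range(p, m - 1):
        U[k][k + 1] = -1  # row k encodes x_k - x_{k+1} >= 0 (decreasing part)
    return U


def in_union(x: Sequence[float], p: int) -> bool:
    """True if U_p x >= 0 holds entrywise."""
    U = build_U(len(x), p)
    return all(sum(u * v for u, v in zip(row, x)) >= 0 for row in U)
```

Note that $x_p$ and $x_{p+1}$ are not compared with each other, which is why a single matrix $U_p$ covers both mode locations $p$ and $p + 1$.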
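Based on this characterization, if the value of $p$ of the vector $w_i$ is known, the subproblem on $w_i$, under the HALS framework, is a linearly constrained quadratic programming problem:
$$\min_{w_i}\ \frac{\|h^i\|_2^2}{2}\|w_i\|_2^2 - \langle M_i (h^i)^\top, w_i \rangle + c \quad\text{subject to}\quad U_{p_i} w_i \ge 0 \ \text{and}\ w_i^\top 1_m = 1. \tag{5}$$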
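2.2. Accelerated Projected Gradient (APG) for (5)
We solve (5) by accelerated projected gradient (APG) [5]. The feasible set $\{w_i \mid U_{p_i} w_i \ge 0,\ w_i^\top 1 = 1\}$ in (5) is difficult to project onto, so let us reformulate (5). Note that the square matrix $U_{p_i}$ in (4) is non-singular. The change of variable $y = U_{p_i} w_i$ gives
$$\min_{y}\ \tfrac{1}{2}\langle Qy, y \rangle - \langle p, y \rangle \quad\text{s.t.}\quad y \ge 0,\ y^\top b = 1, \tag{6}$$
where $Q = \|h^i\|_2^2\, U_{p_i}^{-\top} U_{p_i}^{-1}$, $p = U_{p_i}^{-\top} M_i (h^i)^\top$ and $b = U_{p_i}^{-\top} 1$, since $w_i^\top 1 = y^\top U_{p_i}^{-\top} 1$. We solve (6) by APG; see Algorithm 1. Then we convert the solution $y^*$ of (6) to $w_i^*$ that solves (5) via $w_i^* = U_{p_i}^{-1} y^*$.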
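Algorithm 1: Accelerated Projected Gradient (APG)
Input: $Q \in \mathbb{R}^{m \times m}$, $p$, $b$
Output: Vector $y$ that approximately solves (6).
Initialize $\hat{y}_0 = y_0$
For $k = 1, \dots$ until some criteria is satisfied do
  $y_k = P\big(\hat{y}_{k-1} - (Q\hat{y}_{k-1} - p)/\|Q\|_2\big)$ % Projected gradient step;
  $\hat{y}_k = y_k + \frac{k-1}{k+2}(y_k - y_{k-1})$ % Extrapolation step;
End for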
The key in APG is the projection $P$. Given a vector $z$, $P(z)$ is defined as
$$P(z) = \operatorname*{argmin}_{y}\ \tfrac{1}{2}\|y - z\|_2^2 \quad\text{s.t.}\quad y \ge 0,\ y^\top b = 1. \tag{7}$$
This problem is the projection onto an irregular simplex described by the vector $b = U_{p_i}^{-\top} 1$. As $U_{p_i}$ in (4) is built from two first-order difference operators, it is block diagonal with bidiagonal blocks; its inverse is block diagonal with triangular blocks of ones, and thus $b > 0$. This implies that (7) satisfies Slater's condition, that is, the feasible set has a non-empty relative interior, which guarantees strong duality. The solution to (7) can be derived from the partial Lagrangian associated with the equality constraint:
$$y^* = \operatorname*{argmin}_{y \ge 0}\ \max_{\nu}\ L(y, \nu), \qquad L(y, \nu) = \tfrac{1}{2}\|y - z\|_2^2 + \nu\,(y^\top b - 1),$$
which has a closed-form solution given by the soft-thresholding $y^* = [z - \nu b]_+$, where the Lagrangian multiplier $\nu$ is the root of the following piece-wise linear equation
$$\sum_{i=1}^{m} \max\{0,\ z_i - \nu b_i\}\, b_i - 1 = 0, \tag{8}$$
where $z_i / b_i$ ($i \in [m]$) are the break points. Assuming there are $K \le m$ nonzero break points, Problem (8) can be solved in $O(m + K \log m)$ operations by sorting the break points. The complexity of the projection step is in between $O(m)$ and $O(m \log m)$, depending on $K$; see also the discussion in [6]. We implemented an efficient and robust MATLAB code inspired by [7] to solve this problem.
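The projection and the APG loop of this section can be sketched together in plain Python. This is a hedged illustration, not the authors' MATLAB code: the projection uses a simple full sort of the break points $z_i / b_i$, giving $O(m \log m)$ rather than the $O(m + K \log m)$ variant mentioned above. The step size uses the Frobenius norm of $Q$ as an easy upper bound on $\|Q\|_2$, which is a substitution made here for simplicity. The extrapolation weight $(k-1)/(k+2)$ is the standard choice and is assumed here, as are the function names.

```python
import math
from typing import List, Sequence


def project_weighted_simplex(z: Sequence[float], b: Sequence[float]) -> List[float]:
    """Euclidean projection of z onto {y >= 0, b^T y = 1}, assuming b > 0."""
    # Visit the break points z_i / b_i from largest to smallest.
    order = sorted(range(len(z)), key=lambda i: z[i] / b[i], reverse=True)
    sum_bz = sum_bb = nu = 0.0
    for i in order:
        cand_bz = sum_bz + b[i] * z[i]
        cand_bb = sum_bb + b[i] * b[i]
        # Multiplier solving sum_j b_j (z_j - nu b_j) = 1 over the entries seen so far.
        cand_nu = (cand_bz - 1.0) / cand_bb
        if z[i] / b[i] <= cand_nu:
            break  # entry i and all later ones are thresholded to zero
        sum_bz, sum_bb, nu = cand_bz, cand_bb, cand_nu
    return [max(0.0, z[i] - nu * b[i]) for i in range(len(z))]


def apg(Q: Sequence[Sequence[float]], p: Sequence[float], b: Sequence[float],
        iters: int = 200) -> List[float]:
    """Approximately minimize 0.5 <Qy, y> - <p, y> over {y >= 0, b^T y = 1}."""
    m = len(p)
    # Frobenius norm >= spectral norm, so 1/L is a safe (conservative) step size.
    L = math.sqrt(sum(q * q for row in Q for q in row))
    y = project_weighted_simplex([0.0] * m, b)  # feasible starting point
    y_hat = list(y)
    for k in range(1, iters + 1):
        grad = [sum(Q[i][j] * y_hat[j] for j in range(m)) - p[i] for i in range(m)]
        y_new = project_weighted_simplex(
            [y_hat[i] - grad[i] / L for i in range(m)], b
        )  # projected gradient step
        beta = (k - 1) / (k + 2)
        y_hat = [y_new[i] + beta * (y_new[i] - y[i]) for i in range(m)]  # extrapolation
        y = y_new
    return y
```

With $b = 1_m$ the projection reduces to the ordinary simplex projection discussed in [6]; the only change for a general $b > 0$ is that the break points are $z_i / b_i$ instead of $z_i$.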
2.3. Brute-force algorithm for $p_i$
If all $p_i$ are known, solving (5) for all $i \in [r]$ gives the solution to $W$ for NuMF. In general the $p_i$'s are unknown and should be optimized. In this sense, NuMF is a nonconvex problem with $r$ integer variables $p_1, \dots, p_r$. A first naive strategy is to solve it using brute force: try all the even integers in $[m]$ on $p_i$ for solving (5), and pick the one with the smallest objective function value as the solution. This requires $O(m^2 nr)$ operations and hence does not scale linearly with the size of the data. This brute-force strategy is ineffective for large $m$.
2.4. Multi-grid as the dimension reduction step
We now discuss an idea of using MG to speed up the computation. MG is used because it preserves Nu (Theorem 1), which is not the case for other dimension reduction techniques such as PCA or sampling. Algorithm 2 shows the algorithm for solving NuMF with MG. First we use a restriction operator $R$ on the data to form a smaller problem on a coarse grid: in the general $N$-level MG, the vector $R_N R_{N-1} \cdots R_1 w$ has the row-dimension of $m_N \ll m$. For $m_N$ that is sufficiently small, we can run the brute-force search to estimate $p$. The cost of brute-force search is now reduced from searching the even integers in $[m]$ to those in $[m_N]$. After we solve the problem on the coarse grid, we interpolate the solution back to the original fine grid, which can be computed as the left-multiplication with the matrix $R^\top$ with a scaling factor. Lastly we solve the problem on the fine grid with the information $p_0$, and no further brute-force search is needed.
We now discuss the details of the restriction.
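Definition 3 The restriction operator $R$ is defined as $x \mapsto Rx$, where $R \in \mathbb{R}^{m_1 \times m}_+$ with $m_1 < m$ has the form of (9). The operator is defined column-wise on a matrix, i.e., $RX := [Rx_1 \dots Rx_n]$.
There are many choices to build $R$; for simplicity, we use
$$R(a, b) = \begin{bmatrix} a & b & & & & & \\ & b & a & b & & & \\ & & & b & a & b & \\ & & & & & \ddots & \\ & & & & & b & a \end{bmatrix}, \quad a > 0,\ b > 0,\ a + 2b = 1. \tag{9}$$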
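We now show that MG preserves Nu. Let us define $\mathcal{N}^{m,p}_+$, which is a subset of the Nu vectors. We have the following theorem.
Theorem 1 Let $x \in \mathcal{U}^{m,p}_+$ and $R \in \mathbb{R}^{m_1 \times m}$ defined in (9). Then $y = Rx \in \mathcal{N}^{m_1,p_y}_+$ with $p_y \in \{\lfloor \tfrac{p}{2} + 1 \rfloor, \lfloor \tfrac{p}{2} \rfloor\}$.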
Proof First, we assume $p$ is even, without loss of generality, by considering the vector $[0, x]$ when $p$ is odd, which does not change the unimodality and increases $p$ by one.
Then let us decompose $R$ as $R = A + B + C$, where $A$ only contains the elements $a$ in $R$, $B$ only contains the elements $b$ on the right of $a$ in $R$, and $C$ only contains the elements $b$ on the left of $a$ in $R$. Note that $A$, $B$, $C$ are sampling operators multiplied by a constant. These sampling operators pick either the odd or the even indices of $x$, which gives Nu vectors since any sub-vector of a Nu vector is Nu. It remains to show that the sum $Ax + Bx + Cx$ belongs to $\mathcal{N}^{m_1,p_y}_+$ for $p_y \in \{\lfloor \tfrac{p}{2} + 1 \rfloor, \lfloor \tfrac{p}{2} \rfloor\}$. Since $p$ is even and the matrices $B$ and $C$ sample the even entries of $x$, the vectors $Bx$ and $Cx$ belong to $\mathcal{U}^{m_1,p}_+$. It is easy to see that $Ax$ belongs to $\mathcal{U}^{m_1,p_y}_+$ with $p_y$ either $\tfrac{p}{2}$ or $\tfrac{p}{2} + 1$, which concludes the proof.
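The statement of Theorem 1 can be checked numerically on a small example. The Python sketch below applies $R(a, b)$ to a vector under two assumptions that are not spelled out in the text: the fine-grid size $m$ is odd, and each row of $R$ is shifted by two columns relative to the previous one, so that $a$ sits on the odd (1-based) columns. The names `restrict` and `is_nu` are hypothetical.

```python
from typing import List, Sequence


def restrict(x: Sequence[float], a: float = 0.5, b: float = 0.25) -> List[float]:
    """Apply R(a, b) to x (odd length m); the output has length (m + 1) // 2.
    Assumed layout: row k has a in 0-based column 2k and b in columns 2k-1, 2k+1."""
    m = len(x)
    if m % 2 == 0:
        raise ValueError("this sketch assumes an odd fine-grid size m")
    y = []
    for k in range((m + 1) // 2):
        c = 2 * k
        left = x[c - 1] if c >= 1 else 0.0    # first row has no left neighbor
        right = x[c + 1] if c + 1 < m else 0.0  # last row has no right neighbor
        y.append(b * left + a * x[c] + b * right)
    return y


def is_nu(x: Sequence[float]) -> bool:
    """Nonnegative unimodality test: rise up to the first maximum, then fall."""
    if any(v < 0 for v in x):
        return False
    peak = max(range(len(x)), key=lambda k: x[k])  # first index of the maximum
    rising = all(x[k] <= x[k + 1] for k in range(peak))
    falling = all(x[k] >= x[k + 1] for k in range(peak, len(x) - 1))
    return rising and falling
```

For example, `x = [0, 1, 2, 5, 3, 1, 0]` has its mode at $p = 4$, and `restrict(x)` gives a coarse vector whose mode is at position 3, that is, $p/2 + 1$.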
Algorithm 2: MG-BFS for solving NuMF
Input: $M \in \mathbb{R}^{m \times n}_+$
Output: $W$, $H$ that approximately solve (2)
1. Perform restriction: $M^{[N]} = R_N \cdots R_1 M$, $W^{[N]}_0 = R_N \cdots R_1 W_0$.
2. Solve NuMF on the coarse grid: $[W^{[N]}, H_0, p^{[N]}] = \mathrm{HALS}(M^{[N]}, W^{[N]}_0, H_0)$, with $h^i$ updated using (3), and $w_i$ updated using APG discussed in Section 2.2.
3. Interpolation: $[W_0, p_0] = \mathrm{Interpolate}(W^{[N]}, p^{[N]})$.
4. Solve the problem in the original grid: $[W, H] = \mathrm{HALS}(M, W_0, H_0, p_0)$.
* Steps 2-4 can be repeated several times.
Lastly, we give a few remarks concerning Algorithm 2. In practice, due to errors introduced by the restriction and interpolation processes, the vector $p_0$ obtained from $p^{[N]}$ may not be precisely accurate for solving NuMF in the original dimension. As a safeguard, we still perform a brute-force search for the $p$ values of NuMF in the original dimension, but we only search in a small neighborhood of $p_0$, depending on the grid size. Say, for $R$ defined in (9), we search $\pm 5$ around $p_0$.
The restriction can be computed at once by forming directly the operator $R = R_N \cdots R_1 \in \mathbb{R}^{m_N \times m}$, for a total computational cost of $O(m_N^2 nr) + O(mnr)$ operations for Algorithm 2. A natural choice for $m_N$ is therefore $m_N = O(\sqrt{m})$, so that $O(m_N^2 nr) = O(mnr)$ and the computational cost of Algorithm 2 would be asymptotically equivalent to $O(mnr)$, the cost of HALS.
Remark 2 (Peak finding) Another possible heuristic is to use peak
detection, for example using findpeaks in MATLAB, to preselect
a small number of candidate values for the pi’s. This strategy is
explored in [8] and will be investigated in future work.
Now we present some results on the identifiability of NuMF, i.e., when solving NuMF gives a unique solution. We first define the support of a vector $x \in \mathcal{U}^m_+$ as $\mathrm{supp}(x) := \{i \in [m] \mid x_i \neq 0\} = [a, b]$, where the second equality is due to the fact that the support of every Nu vector is a single closed interval. Then, we define the notion of strictly disjoint supports as follows.
Definition 4 (Strictly disjoint) Let $x, y \in \mathcal{U}^m_+$ be two vectors with $\mathrm{supp}(x) = [a_x, b_x]$ and $\mathrm{supp}(y) = [a_y, b_y]$. The two vectors are called strictly disjoint if $a_x > b_y + 1$.
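Definition 4 can be illustrated with a short Python sketch that computes the support interval of a vector and tests strict disjointness. The definition is stated for $x$ lying to the right of $y$; the sketch also accepts the mirrored case $a_y > b_x + 1$, which is an assumption made here so that the test does not depend on the order of the arguments. The function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def support_interval(x: Sequence[float]) -> Optional[Tuple[int, int]]:
    """Return (a, b), the 1-based first and last nonzero positions of x,
    or None if x is the zero vector."""
    idx = [i + 1 for i, v in enumerate(x) if v != 0]
    return (idx[0], idx[-1]) if idx else None


def strictly_disjoint(x: Sequence[float], y: Sequence[float]) -> bool:
    """True if at least one zero entry separates the two supports."""
    sx, sy = support_interval(x), support_interval(y)
    if sx is None or sy is None:
        raise ValueError("both vectors must be non-zero")
    (ax, bx), (ay, by) = sx, sy
    # Definition 4 (a_x > b_y + 1), plus the mirrored case (assumption).
    return ax > by + 1 or ay > bx + 1
```

Two vectors whose supports touch, such as `[0, 0, 1]` and `[1, 1, 0]`, are disjoint but not strictly disjoint, since no zero entry separates them.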
Remark 3 Other related concepts to strictly disjoint are adjacent
and overlap [8, Section 5.3], which we do not discuss in this paper.
3.1. Preliminary results on identifiability of NuMF
Here we give the identifiability result of NuMF for two special cases.
Theorem 2 Assume $M = \bar{W}\bar{H}$. Solving (2) recovers $(\bar{W}, \bar{H})$ if
1. $\bar{W}$ is Nu and all the columns have strictly disjoint support.
2. $\bar{H} \in \mathbb{R}^{r \times n}_+$ has $n \ge 1$, $\|\bar{h}^i\| > 0$ for $i \in [r]$.
Proof Assume there is another solution $(W^*, H^*)$ that solves the NuMF. The columns $\bar{w}_j$ contribute in $M$ a series of disjoint unimodal components. For the solution $W^* H^*$ to fit $M$, each $w^*_j$ has to fit one of these disjoint components in $M$, and hence $W^*$ recovers $\bar{W}$ up to permutation. There is no scaling ambiguity here because of the normalization constraints $w_i^\top 1 = 1$. Moreover, $W^*$ and $\bar{W}$ have rank $r$, since their columns have disjoint support, and hence $H^*$ and $\bar{H}$ are uniquely determined (namely, using the left inverses of $W^*$ and $\bar{W}$), up to permutation.
The assumptions in Theorem 2 are strong, but are satisfied by many GCMS datasets. Furthermore, the theorem holds for $r \ge n$, which is uncommon for most NMF models [1].
We now present another result on general identifiability for
NuMF with rlimited to 2. This result is general in the sense that it
includes vectors with overlapping supports, which is not addressed
in the previous theorem. First we have the following lemma on
demixing two Nu vectors with non-fully overlapping supports:
Lemma 1 (On demixing two non-fully overlapping Nu vectors) Let $x, y$ be two non-zero vectors in $\mathcal{U}^m_+$ with $\mathrm{supp}(x) \nsubseteq \mathrm{supp}(y)$ and $\mathrm{supp}(x) \nsupseteq \mathrm{supp}(y)$. If $x, y$ are generated by two non-zero Nu vectors $u, v$ as $x = au + bv$ and $y = cu + dv$ with nonnegative coefficients $a, b, c, d$, then we have either $u = x$, $v = y$ or $u = y$, $v = x$.
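Proof Since $x$ and $y$ have non-fully overlapping supports, we cannot have $u = \alpha v$ for some $\alpha > 0$, hence $u$ and $v$ are linearly independent. Let $X = UQ$, where $X := [x, y]$, $U := [u, v]$ and $Q := \begin{bmatrix} a & c \\ b & d \end{bmatrix} \ge 0$. The conditions that $x, y$ are Nu with $\mathrm{supp}(x) \nsubseteq \mathrm{supp}(y)$ and $\mathrm{supp}(x) \nsupseteq \mathrm{supp}(y)$ imply $x \neq 0$, $y \neq 0$,
$$\begin{aligned} \mathrm{supp}(x) \nsubseteq \mathrm{supp}(y) &\implies \exists\, i \in [m] \text{ s.t. } x_i > 0,\ y_i = 0, \\ \mathrm{supp}(y) \nsubseteq \mathrm{supp}(x) &\implies \exists\, j \in [m] \text{ s.t. } y_j > 0,\ x_j = 0. \end{aligned} \tag{10}$$
Then $x \neq y$ and $u \neq \alpha v$ imply $X$, $U$, $Q$ are all rank-2, hence
$$Q^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -c \\ -b & a \end{bmatrix}, \quad ad - bc \neq 0. \tag{11}$$
Putting $i, j$ from (10) into (11), together with the fact that $x, y, u, v$ are nonnegative, gives $Q^{-1} \ge 0$. Lastly, $Q \ge 0$ and $Q^{-1} \ge 0$ imply that $Q$ is the permutation of a diagonal matrix with positive diagonal [9], where here the diagonal matrix is the identity.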
Now we can present the general identifiability of NuMF for $r = 2$.
Theorem 3 Assume $M = \bar{W}\bar{H}$. If $r = 2$, solving (2) recovers $(\bar{W}, \bar{H})$ if the columns of $\bar{W}$ satisfy the conditions of Lemma 1 and $\bar{H} \in \mathbb{R}^{2 \times n}_+$ is full rank.
Proof It follows directly from Lemma 1.
Theorems 2 and 3 address the identifiability of NuMF from two angles: the number of columns in $W$ and how the supports of $w_i$ interact. Neither of the theorems is complete. Generalizing these theorems to all possible interactions between supports of $w_i$ for $r \ge 3$ is a topic of further research.
We now present experiments on NuMF. The code is available from
Toy example on MG performance. A Nu matrix $W$ and a nonnegative matrix $H$ are constructed with $(m, n, r) = (100, 6, 3)$; see Fig. 1. The NuMF problem is solved with 0, 1 and 2 layers of MG. Fig. 1 shows that MG significantly speeds up the convergence: more than 50% run time reduction for 1 layer of MG, and more than 75% for 2 layers. This suggests a substantial efficiency gain over existing approaches such as those in [3, 10], which have a complexity similar to that of Algorithm 2 without MG.
Fig. 1. Experiment on a toy example. Top: The ground truth $W$ matrix and the data $M$. Bottom: The curve plotted against time. All algorithms run 100 iterations and are initialized with SNPA [11]. For algorithms with MG, the computational time taken on the coarse grid is also taken into account, as reflected by the time gap between time 0 and the first dot in the curves.
On GCMS data of Belgian beers. We now demonstrate the regularizing power of the unimodality constraint in the factorization, using a beer dataset [12]. Here $M \in \mathbb{R}^{518 \times 947}_+$, where each column is a GCMS spectrum. With $r = 7$, three methods, NuMF, NMF [1] and separable NMF (SNMF) [13], are used to decompose the data, and Fig. 2 shows the results. As expected, only NuMF can decompose the data into individual Nu components, while for the other two models, some components are highly mixed with multiple peaks. Note however that the relative error $\|M - WH\|_F / \|M\|_F$ is similar for the three methods, and around 10%.
Fig. 2. Experiment on beer data. The bottom plots show the W’s
obtained by the three methods.
On data with $r > n$. We now consider NuMF in the case $r > n$ (more sources than samples), which is not possible for most other NMF models. Here a GCMS data vector in $\mathbb{R}^{947}_+$ is used. With $r = 8 > n = 1$, we decompose this vector into $r$ unimodal components. NuMF provides a meaningful decomposition; see Fig. 3. Note that the first two peaks in the data satisfy Theorem 2 and hence NuMF identifies them perfectly. For the other peaks, their supports overlap, and hence the decomposition is not unique. Investigating the identifiability of NuMF on data with overlapping supports is a direction of future research.
Fig. 3. On data $M$ (dotted black curve) with $r = 8 > 1 = n$. The cyan curves are the components $w_i h^i$. Relative error $\|M - WH\|_F / \|M\|_F = 10^{-8}$.
We introduced NuMF and proposed to solve it by combining APG and MG. We showed that the restriction operator in MG preserves Nu, and we presented two preliminary identifiability results. Numerical experiments support the effectiveness of the proposed algorithm. Future work will study the general identifiability of NuMF and further improve the algorithm using, for example, peak finding algorithms; see Remark 2.
[1] Nicolas Gillis, “The why and how of nonnegative matrix factorization,” Regularization, optimization, kernels, and support vector machines, pp. 257–291, 2014.
[2] Richard Stanley, “Log-concave and unimodal sequences in algebra, combinatorics, and geometry,” Annals of the New York Academy of Sciences, vol. 576, no. 1, pp. 500–535, 1989.
[3] Rasmus Bro and Nicholaos Sidiropoulos, “Least squares algorithms under unimodality and non-negativity constraints,” Journal of Chemometrics: A Journal of the Chemometrics Society, vol. 12, no. 4, pp. 223–247, 1998.
[4] Andrzej Cichocki, Rafal Zdunek, and Shun-ichi Amari, “Hierarchical ALS algorithms for nonnegative matrix and 3D tensor factorization,” in International Conference on Independent Component Analysis and Signal Separation. Springer, 2007, pp. 169–176.
[5] Yurii E Nesterov, “A method for solving the convex programming problem with convergence rate O(1/k^2),” in Dokl. Akad. Nauk SSSR, 1983, vol. 269, pp. 543–547.
[6] Laurent Condat, “Fast projection onto the simplex and the l1 ball,” Mathematical Programming, vol. 158, no. 1-2, pp. 575–585, 2016.
[7] Laurent Condat, “Matlab code to project onto the simplex or the l1 ball,” software.html, 2015.
[8] Man Shun Ang, Nonnegative Matrix and Tensor Factorizations: Models, Algorithms and Applications, Ph.D. thesis, University of Mons, 2020.
[9] Abraham Berman and Robert J Plemmons, Nonnegative matrices in the mathematical sciences, SIAM, 1994.
[10] Junting Chen and Urbashi Mitra, “Unimodality-constrained matrix factorization for non-parametric source localization,” IEEE Transactions on Signal Processing, vol. 67, no. 9, pp. 2371–2386, 2019.
[11] Nicolas Gillis, “Successive nonnegative projection algorithm for robust nonnegative blind source separation,” SIAM Journal on Imaging Sciences, vol. 7, no. 2, pp. 1420–1450, 2014.
[12] Christophe Vanderaa, “Development of a state-of-the-art pipeline for high throughput analysis of gas chromatography - mass spectrometry data,” Master thesis, 2018.
[13] Nicolas Gillis and Stephen Vavasis, “Fast and robust recursive algorithms for separable nonnegative matrix factorization,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 4, pp. 698–714, 2013.