
Fast algorithm for complex-NMF with application to source separation

Andersen Ang^{1,3}, Valentin Leplat^{2}, Nicolas Gillis^{3}

^{1}Department of Combinatorics and Optimization, University of Waterloo, Waterloo, Canada
^{2}ICTEAM, Université catholique de Louvain, Louvain-la-Neuve, Belgium
^{3}Mathématique et recherche opérationnelle, Université de Mons, Mons, Belgium

ms3ang@uwaterloo.ca, valentin.leplat@uclouvain.be, nicolas.gillis@umons.ac.be

Abstract—In this paper we consider a Nonnegative Matrix Factorization (NMF) model on complex numbers; in particular, we propose a group complex-NMF (cNMF) model that subsumes the phase-consistency complex NMF for audio Blind Source Separation (aBSS). Using Wirtinger calculus, we propose a gradient-based algorithm to solve cNMF. The algorithm is then further accelerated using a heuristic extrapolation scheme. Numerical results show that the accelerated algorithm has significantly faster convergence.

Index Terms—Nonnegative Matrix Factorization, Blind source separation, Phase, Wirtinger calculus, Algorithm, Extrapolation

I. INTRODUCTION

Nonnegative Matrix Factorization (NMF) [1], [2] is the problem of finding two nonnegative matrices, $W$ and $H$, from a matrix $X \in \mathbb{R}^{m \times n}_+$ such that $X \approx WH$. The factors $W, H$ are usually computed by tackling a non-convex optimization problem, which is notoriously hard to solve. By extending from $\mathbb{R}^{m \times n}$ to $\mathbb{C}^{m \times n}$, complex NMF (cNMF) introduces more variables, which makes the problem even more difficult to solve. Given $X \in \mathbb{C}^{m \times n}$ and $r$, it is defined as follows:

$$\min_{W \in \mathcal{W},\, H \in \mathcal{H},\, \Theta \in \Omega} D(X|F) + R(F), \quad F := \ell(W, H, e^{i\Theta}), \qquad \text{(cNMF)}$$

where the sets $\mathcal{W} = \mathbb{R}^{m \times r}_+$, $\mathcal{H} = \mathbb{R}^{r \times n}_+$, $\Omega = [-\pi, \pi]^{m \times n \times G}$ are all convex, and the function $D$ measures the distance between $X$ and $F$. For simplicity, in this paper we focus on the Euclidean norm, using $D(X|F) = \frac{1}{2}\|X - F\|_F^2$. We refer the reader to [3] for the general case of $\beta$-divergences. The term $R$ is a regularizer known as the consistency, which will be discussed in Section II-B.

The estimator $F$ is computed via the function $\ell : \mathcal{W} \times \mathcal{H} \times \Omega \to \mathbb{C}^{m \times n}$ that produces a matrix to fit $X$. Three examples of $\ell$ are:

$$\text{rank-1 1-phase:} \quad F = WH \odot e^{i\Theta}, \qquad \text{(1a)}$$
$$\text{rank-1 multi-phase:} \quad F = \sum_{j=1}^{r} w_j h_j \odot e^{i\Theta_j}, \qquad \text{(1b)}$$
$$\text{grouped multi-phase:} \quad F = \sum_{j=1}^{G} W_{r_j} H_{r_j} \odot e^{i\Theta_j}, \qquad \text{(1c)}$$

AA thanks C. Févotte for introducing the works of Le Roux. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement No. 788368). NG acknowledges the support by ERC starting grant No. 679515, the Fonds de la Recherche Scientifique and the Fonds Wetenschappelijk Onderzoek under EOS project O005318F-RG47.

where $W, H$ are real and $e^{i\Theta}$ is complex, $\odot$ is the Hadamard product, $i = \sqrt{-1}$ and $e^{i(\cdot)}$ is the component-wise complex exponential. For fitting the phase angle of $X$, Model (1a) introduces the angle $\Theta$. Let $w_j, h_j$ denote the $j$th column of $W$ and the $j$th row of $H$, respectively. We have

$$WH \odot e^{i\Theta} = \sum_{j=1}^{r} w_j h_j \odot e^{i\Theta},$$

so (1b) considers the more general case where each rank-1 factor $w_j h_j$ has its own phase $e^{i\Theta_j}$; this model was introduced in earlier papers on complex NMF [4], [5].

In this paper we introduce (1c), which we refer to as the grouped multi-phase model, a natural generalization of (1b); see Section II.
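To make the three estimators concrete, here is a small NumPy sketch that builds $F$ for each variant of $\ell$ in (1). The matrix sizes, the number of groups $G$, and the grouping of the columns are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 8, 10, 4                      # sizes and rank (arbitrary)
W = rng.random((m, r))                  # nonnegative factors
H = rng.random((r, n))

# (1a) rank-1 1-phase: one shared phase matrix Theta
Theta = rng.uniform(-np.pi, np.pi, (m, n))
F_a = (W @ H) * np.exp(1j * Theta)      # Hadamard product with e^{i Theta}

# (1b) rank-1 multi-phase: one phase slice per rank-1 term w_j h_j
Theta3 = rng.uniform(-np.pi, np.pi, (m, n, r))
F_b = sum(np.outer(W[:, j], H[j, :]) * np.exp(1j * Theta3[:, :, j])
          for j in range(r))

# (1c) grouped multi-phase: one phase slice per group of rank-1 terms
groups = [[0, 1], [2, 3]]               # G = 2 groups, r_1 = r_2 = 2
ThetaG = rng.uniform(-np.pi, np.pi, (m, n, len(groups)))
F_c = sum((W[:, g] @ H[g, :]) * np.exp(1j * ThetaG[:, :, j])
          for j, g in enumerate(groups))
```

With all phase slices equal, (1b) collapses to (1a); with singleton groups, (1c) collapses to (1b).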

Application. cNMF finds applications where the data belongs to the complex domain; a typical example is audio Blind Source Separation (aBSS) [6]–[8]. It is also related to phase retrieval [9] and finds applications in optics. In this paper we use aBSS as an illustration, which we discuss further in Section II.

Contribution and organization. This paper has two contributions. First, we introduce cNMF with (1c), which generalizes cNMF (1b) studied in [4], [5]; see also [10]. The model (1c) arises naturally from (1b), but to the best of our knowledge it has never been addressed. The main contribution of this paper is the design of a single algorithmic framework for solving cNMF that covers all the variants in (1). Although $W, H$ in cNMF are real, in general we can treat $W, H, \Theta$ as complex variables and use a systematic approach based on Wirtinger calculus [11], [12] to design gradient-type algorithms to solve cNMF; see Section III. Furthermore, we adapt the heuristic extrapolation strategy from [13]–[15] to our proposed Wirtinger gradient scheme, which significantly accelerates its convergence; see Section IV for numerical experiments.

Scope of the paper. This paper focuses on algorithmic aspects. We do not benchmark the proposed algorithms against other frameworks on aBSS performance, as this would require dedicated data preprocessing, post-processing and parameter tuning.

II. CNMF MODELS AND ABSS

This section serves as the literature review and background of the paper. We first present how real NMF as in [7], [8] is used to solve single-channel aBSS, as shown in Fig. 1; then we move to the cNMF previously studied in [4], [5]. Finally, we describe how to generalize these models to the grouped multi-phase model.

[Fig. 1 is a schematic: the time-domain mixture x is mapped by the STFT Ψ to the complex (time-frequency domain) spectrogram X; its amplitude spectrogram V = |X| is factorized by NMF into W (frequency profiles) and H (time activations); the rank-1 terms combined with the phase spectrogram e^{iΘ} give the component spectrograms Y_j; the iSTFT Ψ† maps these back to the time-domain source estimates y_j.]

Fig. 1. The processing pipeline as in [7], [8] on using NMF to solve aBSS.

A. Real NMF on single-channel aBSS

Let $[r] := \{1, 2, \ldots, r\}$. Assume $r$ audio sources $\{s_j \in \mathbb{R}^L\}_{j \in [r]}$ are linearly mixed as $x = \sum_j s_j$; aBSS aims to recover the $s_j$ from the observation of their mixture $x \in \mathbb{R}^L$. We follow the data processing pipeline (Fig. 1) to estimate these sources. First we compute $X = \Psi_\omega(x)$, where $\Psi_\omega$ denotes the Short-Time Fourier Transform (STFT) with an analysis window $\omega$, and $X$ is the spectrogram containing the distribution of "energy" of the signal across time-frequency coordinates. Such information can be extracted by NMF methods through the computation of $W, H$ that respectively capture the frequency and temporal patterns [8], [16]. NMF is used here since both the frequency spectra and time activations are nonnegative. For aBSS with NMF methods, the input matrix $V$ typically corresponds to the amplitude spectrogram derived from $X$ as $V(k, m) = |X(k, m)|$ for all $k, m$. Then we find $W, H$ by solving the (real) NMF problem:

$$[W, H] = \arg\min_{W, H} \frac{1}{2}\|V - WH\|_F^2 \ \text{ s.t. } \ W \in \mathcal{W},\ H \in \mathcal{H}. \qquad (2)$$

From $WH$, each rank-1 matrix $w_j h_j \in \mathbb{R}^{K \times M}_+$ is converted to a component spectrogram $Y_j \in \mathbb{C}^{K \times M}$ by multiplying $w_j h_j$ element-wise with the phase of the mixture spectrogram $X$ as follows: $Y_j = w_j h_j \odot e^{i\Theta}$, where the phase spectrogram $e^{i\Theta} \in \mathbb{C}^{K \times M}$ is defined by $\Theta(k, m) = \angle X(k, m)$, with the argument function $\angle$ returning the angle of the input complex number. Finally, $\Psi^\dagger_{\tilde{\omega}} Y_j$ is computed to generate the vector $y_j \in \mathbb{R}^L$, where $\Psi^\dagger_{\tilde{\omega}}$ is the pseudo-inverse of $\Psi$ associated with the synthesis window $\tilde{\omega}$.
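The whole pipeline can be sketched end to end in a few lines, using SciPy's STFT/iSTFT and plain multiplicative updates as one possible solver for (2). This is a minimal sketch: the signal is synthetic noise standing in for a real mixture, and the window length, rank $r$, and iteration count are arbitrary choices.

```python
import numpy as np
from scipy.signal import stft, istft

rng = np.random.default_rng(1)
fs, L = 8000, 8000                    # sample rate and length (arbitrary)
x = rng.standard_normal(L)            # stand-in for the mixture x = sum_j s_j

_, _, X = stft(x, fs=fs, nperseg=256) # X = Psi_omega(x)
V = np.abs(X)                         # amplitude spectrogram, K x M
K, M = V.shape

# Solve (2) approximately with multiplicative updates (Frobenius norm)
r, eps = 3, 1e-12
W = rng.random((K, r)) + eps
H = rng.random((r, M)) + eps
for _ in range(100):
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

# Reuse the mixture phase: Y_j = w_j h_j (Hadamard) e^{i Theta}, Theta = angle(X)
phase = np.exp(1j * np.angle(X))
y = []
for j in range(r):
    Yj = np.outer(W[:, j], H[j, :]) * phase   # component spectrogram
    _, yj = istft(Yj, fs=fs, nperseg=256)     # y_j = Psi^dagger applied to Y_j
    y.append(yj)
```

Any NMF solver can replace the multiplicative updates here; only the $V = |X|$ and phase-reuse steps are specific to the pipeline.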

Usual assumptions when using NMF for aBSS. Under some conditions we expect the vectors $y_j$ to accurately estimate the sources $s_j$. Two of these assumptions are:
• Consistency: each $Y_j$ is consistent w.r.t. $\Psi$.
• Rank-1 source: each source is well-approximated by a rank-1 matrix.
Relaxing these assumptions leads to models (1b) and (1c).

B. Complex NMF and consistency

By directly consider factorizing Xinstead of V, we arrive

at cNMF (1a). We now explain the regularizer term. As

Fourier transform is not surjective, this implies that a matrix

P∈CK×Mdoes not necessarily corresponds to the STFT of

a vector. In that case, we say that Pis not consistent w.r.t.

Ψω[5], [10], [17]–[19]. Since the phase is not considered

for the computation of W,Hin (2), it is possible for the

aforementioned pipeline to produce inconsistent Yj, although

Xis always consistent. To generate consistent solutions, we

can consider the phase-consistent [5], [10], [17]–[19] cNMF:

min

W∈W,H∈H,Θ∈Ω

D(X|F) + R(F),R(F) = λ

2kE(F)k2

F(3)

where λis a penalty weight (nonnegative scalar), and the term

Eis the phase-inconsistency error deﬁned as

E(F):=F−ΨωΨ†

˜

ωF=BF,B:=I−ΨωΨ†

˜

ω,(4)

we call a matrix Pconsistent if E(P) = 0.
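The inconsistency $E$ can be evaluated numerically by an STFT/iSTFT round trip. The sketch below uses SciPy's transform pair (whose default Hann window with 50% overlap satisfies the COLA condition, so the inverse is exact on consistent matrices): an actual STFT has $E \approx 0$, while a random complex matrix generally does not.

```python
import numpy as np
from scipy.signal import stft, istft

def inconsistency(P, nperseg=256):
    """E(P) = P - Psi(Psi^dagger(P)): residual of an STFT/iSTFT round trip."""
    _, p = istft(P, nperseg=nperseg)       # Psi^dagger
    _, _, P2 = stft(p, nperseg=nperseg)    # Psi
    k = min(P.shape[1], P2.shape[1])       # align frame counts if they differ
    return P[:, :k] - P2[:, :k]

rng = np.random.default_rng(2)
x = rng.standard_normal(4096)
_, _, X = stft(x, nperseg=256)             # consistent by construction

P = rng.standard_normal(X.shape) + 1j * rng.standard_normal(X.shape)

print(np.linalg.norm(inconsistency(X)))    # numerically ~ 0
print(np.linalg.norm(inconsistency(P)))    # clearly nonzero
```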

C. Multi-phase and group factorization

We now discuss how (3) can be generalized by cNMF (1b), which is then naturally extended to cNMF (1c).

Multi-phase. Note that there is only one $\Theta$ in (3), meaning all the $r$ rank-1 component amplitude spectrograms $w_j h_j$ share the same phase. The multi-phase model considers the general case where different sources can have different phases. In this case, $\Theta \in \mathbb{R}^{m \times n \times r}$ is a 3rd-order tensor whose $j$th frontal slice, denoted $\Theta_j$, corresponds to the phase angle of the $j$th source estimate. This leads to the model cNMF (1b).

Group factorization. NMF [7], [8] and cNMF (3) assume each source is well-approximated by a rank-1 matrix in the factorization, and the factorization rank $r$ corresponds to the number of sources. Such an assumption only holds for relatively simple data; in practice, complex real-world audio sources cannot be well-approximated by a rank-1 term. Here we consider using a group of rank-1 components to approximate the amplitude part of a source; in this way we arrive at cNMF (1c), where $W_{r_j}, H_{r_j}$ are rank-$r_j$ nonnegative matrices with $r_j$ columns or rows, respectively. In this case, we are considering a rank-$r$ NMF with $r = \sum_{j=1}^{G} r_j$, where $G$ is the total number of groups (number of sources). We immediately see a drawback of such a model: it has more parameters, in particular $r_1, \ldots, r_G$, which correspond to the model orders. As model order selection is a research topic on its own, in this paper we do not deal with strategies to tune these parameters; instead, we fix their values for the numerical experiments based on a simple prior inspection of the data. For example, a simple heuristic is to use the SVD to numerically check the rank of each portion of the data.
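That rank-inspection heuristic can be sketched as follows: count the singular values of a data segment above a tolerance relative to the largest one. The tolerance and the synthetic rank-3 "segment" below are assumptions for illustration only.

```python
import numpy as np

def numerical_rank(V, tol=1e-2):
    """Count singular values above tol times the largest one."""
    s = np.linalg.svd(V, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

rng = np.random.default_rng(5)
segment = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
r_j = numerical_rank(segment)     # 3 for this synthetic rank-3 segment
```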

Now, we see that real NMF (2) ⊆ cNMF (1a) = (3) ⊆ cNMF (1b) ⊆ cNMF (1c). Thus it is meaningful to study how to effectively solve cNMF, which is the main subject of the next section. Note that fitting a complex matrix is in general harder than fitting a real matrix; this is easily understood since equality between two numbers is a stronger condition on $\mathbb{C}$ than on $\mathbb{R}$, as it requires equality in both modulus and phase. Furthermore, we recall that in [4], [5], sophisticated algorithms based on the MM framework with multiplicative-type updates were derived to solve cNMF (1a); they are not suitable here since they cannot directly be used to solve cNMF (1c). Moreover, these algorithms can potentially have slow convergence rates. In fact, multiplicative-type updates are well known to converge more slowly than gradient-based algorithms for NMF when using the Frobenius norm [20]; see also [2], [21] and the references therein. In the next section, we derive a fast gradient-based algorithm to solve cNMF.

III. ALGORITHMS FOR CNMF MODELS

Here we first review the background on Wirtinger calculus (W-calculus) [11], following a similar approach to that presented in [22, Section 6], and then we present our method to solve cNMF. For details on W-calculus, see [12].

A. W-derivatives on real-valued function of complex variable

In general, functions defined on $\mathbb{C}^n$ are not holomorphic, i.e., they are not complex-differentiable on their domain. A first naive approach would be to consider a related function defined on $\mathbb{R}^{2n}$ so that calculus rules on $\mathbb{R}$ can be used. However, this approach can be rather tedious. W-calculus provides an equivalent alternative formulation, with compact and elegant notation. The core ideas are the W-differential operator and the use of the conjugate coordinates $[z, z^*]^\top$ when computing the full gradient. Denote $\partial_z f = \frac{\partial f}{\partial z}$ for a differentiable $f$ and let $z^* = \bar{z}$. The W-derivatives w.r.t. $z = x + iy \in \mathbb{C}$, $x, y \in \mathbb{R}$, and its conjugate $z^* := x - iy$, respectively, are

$$\partial_z f := \tfrac{1}{2}(\partial_x f - i\,\partial_y f), \quad \text{and} \quad \partial_{z^*} f := \tfrac{1}{2}(\partial_x f + i\,\partial_y f).$$

If $f$ is real-valued,

$$(\partial_z f)^* = \partial_{z^*} f. \qquad (5)$$

For a real-valued multi-variable function $f : \mathbb{C}^n \ni z = [z_1, z_2, \ldots, z_n]^\top \mapsto w = f(z) \in \mathbb{R}$, the partial gradients w.r.t. $z$ and $z^*$, and the full gradient, are:

$$\partial_z f := \begin{bmatrix} \partial_{z_1} f \\ \vdots \\ \partial_{z_n} f \end{bmatrix}, \quad \partial_{z^*} f := \begin{bmatrix} \partial_{z_1^*} f \\ \vdots \\ \partial_{z_n^*} f \end{bmatrix}, \quad \text{and} \quad \nabla f := \begin{bmatrix} \partial_z f \\ \partial_{z^*} f \end{bmatrix}^*, \qquad (6)$$

where the full gradient $\nabla f$ consists of the partial gradients $\partial_z f$ and $\partial_{z^*} f$. As we work with real-valued $f$, applying (5) in (6) means that we only need to compute one partial gradient, as the other one is its conjugate. We are now ready to discuss how to solve cNMF.
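These definitions are easy to verify numerically. The sketch below approximates the W-derivatives of the real-valued (non-holomorphic) function $f(z) = |z|^2$ by central finite differences on the $(x, y)$ coordinates; for this $f$ one has $\partial_z f = \bar{z}$ and $\partial_{z^*} f = z$, and the conjugate relation (5) holds:

```python
import numpy as np

def f(z):
    return np.abs(z) ** 2                 # real-valued, not holomorphic

def wirtinger(fun, z, h=1e-6):
    """W-derivatives via central differences on the (x, y) coordinates:
    d_z f = (d_x f - i d_y f)/2,  d_{z*} f = (d_x f + i d_y f)/2."""
    dx = (fun(z + h) - fun(z - h)) / (2 * h)          # partial w.r.t. x
    dy = (fun(z + 1j * h) - fun(z - 1j * h)) / (2 * h)  # partial w.r.t. y
    return (dx - 1j * dy) / 2, (dx + 1j * dy) / 2

z = 1.3 - 0.7j
dz, dzs = wirtinger(f, z)
# For f(z) = |z|^2: d_z f = conj(z), d_{z*} f = z, and (d_z f)* = d_{z*} f
```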

B. The gradient-update steps

We now discuss how to solve cNMF. For simplicity, here we focus on cNMF with $F = WH \odot e^{i\Theta}$, that is, (1a); the same approach applies to the other cNMF models. In general, we solve (3) by (inexact) Block Coordinate Descent (BCD): we alternate between the sub-problems, solving for one block of variables while fixing the others at their most recent values. We now discuss how to solve each subproblem.

Subproblem in W. This subproblem has the following form:

$$\min_{W \ge 0} f := \frac{1}{2}\|(WH) \odot D - X\|_F^2 + \frac{\lambda}{2}\|B((WH) \odot D)\|_F^2, \qquad (7)$$

where $D = e^{i\Theta}$. We solve (7) by iterating the projected gradient step (proximal gradient step)

$$W^{k+1} = \arg\min_W \Big\{ \big\langle \nabla_W f(W^k, H^k, \Theta^k),\, W \big\rangle + \frac{L_W^k}{2}\|W - W^k\|^2 + i_+(\mathrm{Re}\{W\}) \Big\} = \Big[\mathrm{Re}\Big\{W^k - \frac{1}{L_W^k}\nabla_W f(W^k)\Big\}\Big]_+, \qquad (8)$$

where $\mathrm{Re}(\cdot)$ takes the real part, $L_W^k$ is the Lipschitz constant of the gradient at iteration $k$, $i_+$ is the indicator function of the nonnegative orthant that encodes the nonnegativity constraint, and $[\cdot]_+ = \max\{\cdot, \epsilon\}$ with $\epsilon$ a small positive value. Using the W-derivative, the gradient w.r.t. the Hermitian $W^{\mathsf{H}}$ is

$$\partial_{W^{\mathsf{H}}} f = H\big(D^{\mathsf{H}} \odot (QF - X)^\top\big) =: g(W^\top), \qquad (9)$$

where $Q := I + \lambda B^{\mathsf{H}} B$ and $F$ is a linear function of $W$:

$$F = (WH) \odot D. \qquad (10)$$

As $W$ is real, by (5), the gradient of $f$ w.r.t. $W$ is the transpose of $g(W^\top)$. For the Lipschitz constant $L_W$ of the gradient (9), let $\Delta W = W_1 - W_2$ and $\Delta g = g(W_1^\top) - g(W_2^\top)$; we have

$$\|\Delta g\|_2 \overset{(9)}{=} \Big\|H\Big(D^{\mathsf{H}} \odot \big[Q\big(F(W_1) - F(W_2)\big)\big]^{\top}\Big)\Big\|_2 \overset{(10)}{=} \Big\|H\Big(D^{\mathsf{H}} \odot \big[Q\big((\Delta W\, H) \odot D\big)\big]^{\top}\Big)\Big\|_2 \le \underbrace{\|H\|_2^2\, \|D^{\mathsf{H}} \odot D^{\top}\|_2\, \|Q\|_2}_{L_W}\, \|\Delta W\|_2. \qquad (11)$$
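A self-contained sketch of the $W$-update (8) on a synthetic instance follows. Here $B$ is a small random matrix standing in for $I - \Psi_\omega\Psi^\dagger_{\tilde{\omega}}$, the gradient follows the Wirtinger pattern of (9) written directly for real $W$, and the step size uses the simple spectral bound $\|H\|_2^2\|Q\|_2$, which upper-bounds the Lipschitz constant of this gradient (all constants and sizes are arbitrary illustrations, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, r, lam = 6, 9, 3, 0.1
X = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
H = rng.random((r, n))
D = np.exp(1j * rng.uniform(-np.pi, np.pi, (m, n)))   # D = e^{i Theta}, fixed
B = 0.1 * rng.standard_normal((m, m))                 # stand-in for I - Psi Psi^dagger

def fval(W):
    F = (W @ H) * D
    return 0.5 * np.linalg.norm(F - X) ** 2 + 0.5 * lam * np.linalg.norm(B @ F) ** 2

def grad_W(W):
    F = (W @ H) * D
    QF = F + lam * (B.T @ (B @ F))             # QF with Q = I + lam B^H B (B real)
    return np.real((QF - X) * D.conj()) @ H.T  # real gradient w.r.t. W

L_W = np.linalg.norm(H, 2) ** 2 * np.linalg.norm(np.eye(m) + lam * B.T @ B, 2)
W = rng.random((m, r))
f0 = fval(W)
for _ in range(50):
    W = np.maximum(W - grad_W(W) / L_W, 1e-12)  # projected step, [.]_+
```

Because the step size is the inverse of a valid Lipschitz bound, each projected step cannot increase the objective (7).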

Subproblem in H. By taking transposes, the subproblem in $H$ is symmetric to (7); we use the same approach to solve it.

Subproblem in Θ. Based on (7), the subproblem in $\Theta$ is

$$\min_{D \in \mathcal{D}} f(D) := \frac{1}{2}\|A \odot D - X\|_F^2 + \frac{\lambda}{2}\|B(A \odot D)\|_F^2, \qquad (12)$$

where $A = WH$ and $D = e^{i\Theta}$. Note that we use the change of variable $D$ to replace $\Theta$; since the complex exponential has unit modulus, this requires the additional constraint $|[D]_{ij}| = 1$, so that $\mathcal{D}$ in (12) is the set of complex matrices whose elements have magnitude equal to one. Note that the set $\mathcal{D}$ is nonconvex, which affects the convergence analysis; see the discussion in Section III-C.

To solve (12) and update $D$, we follow the same approach as for (8) and use

$$D^{k+1} = P_{\mathcal{D}}\Big\{ D^k - \frac{1}{L_D^k}\nabla_D f(D^k) \Big\}, \qquad (13)$$

where $P_{\mathcal{D}}$ is the projection onto the set $\mathcal{D}$. Taking $D$ as $W$ in (7) with $H$ set to $I$, the same update (9) can be used, with $L_D = \|A \odot A\|_2 \|Q\|_2$. Lastly, we obtain $\Theta^k$ from $D^k$ via $\Theta = \angle D$.
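The projection $P_{\mathcal{D}}$ simply normalizes every entry of $D$ to unit modulus. Below is a self-contained sketch of the update (13) on a synthetic instance; $B$ is again a random stand-in, and the step size uses the crude bound $\max_{ij} A_{ij}^2\, \|Q\|_2$ in the role of $L_D$ (an assumption for illustration, valid as a Lipschitz upper bound for this elementwise model):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, lam = 6, 9, 0.1
X = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
A = rng.random((m, n))                    # A = WH, fixed in this subproblem
B = 0.1 * rng.standard_normal((m, m))     # stand-in for I - Psi Psi^dagger
D = np.exp(1j * rng.uniform(-np.pi, np.pi, (m, n)))

def proj_D(D, eps=1e-12):
    """P_D: normalize every entry to unit modulus."""
    return D / np.maximum(np.abs(D), eps)

def fval(D):
    F = A * D
    return 0.5 * np.linalg.norm(F - X) ** 2 + 0.5 * lam * np.linalg.norm(B @ F) ** 2

def grad_D(D):
    F = A * D
    QF = F + lam * (B.T @ (B @ F))        # QF with Q = I + lam B^T B (B real)
    return A * (QF - X)                   # Wirtinger-gradient direction

L_D = np.max(A) ** 2 * np.linalg.norm(np.eye(m) + lam * B.T @ B, 2)
f0 = fval(D)
for _ in range(50):
    D = proj_D(D - grad_D(D) / L_D)       # projected step (13)
Theta = np.angle(D)                       # recover the phase
```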

C. The overall algorithm and convergence

Algorithm 1 shows simplified pseudo-code of the overall algorithm. At first glance it looks tempting to apply well-established convergence analyses, say for the proximal gradient method and block coordinate descent, to derive the theoretical convergence of the algorithm. However, we emphasize that the algorithm is a heuristic and we currently do not have a rigorous proof of convergence. It is important to note that cNMF involves complex variables, while the aforementioned convergence analyses are all built upon the assumption that the objective function sits in $\mathbb{R}$; hence it is unclear how to transfer convergence analysis from the real case to the complex case. Furthermore, even if the "real analysis" could be applied to such a complex problem, the convergence analysis would still be nontrivial, as the update of $D$ involves a projection onto a nonconvex set, leading to theoretical complications from the projection $P_{\mathcal{D}}$ in the update (13).

Although we do not have a theoretical convergence guarantee, we have observed empirically that the algorithm decreases the objective function; see Section IV and in particular Fig. 3.

Algorithm 1: Alternating Projected Wirtinger gradient
Result: W, H, Θ that solve cNMF (1a)
Initialize W^0, H^0, Θ^0; D^0 = e^{iΘ^0}; Q = I + λB^H B;
for k = 1, 2, . . . do
    Get W^{k+1} using (8) and (11);
    Get H^{k+1}, D^{k+1} similarly to W;
    Get Θ^{k+1} from D^{k+1};
end

D. Acceleration by heuristic extrapolation

Algorithm 1 can be accelerated using extrapolation, and we adopt the framework of Heuristic Extrapolation and Restart (HER) [13]–[15], which has been shown to be very effective at accelerating the convergence of NMF-type algorithms with the Frobenius norm. In a nutshell, HER is a numerical extrapolation scheme on the sequence $\{W^k, H^k\}_{k=1,2,\ldots}$ using an auxiliary sequence $\{\hat{W}^k, \hat{H}^k\}_{k=1,2,\ldots}$ as follows:

$$\begin{aligned} W^{k+1} &= \mathrm{Update}(\hat{W}^k, \hat{H}^k), \\ \hat{W}^{k+1} &= \big[W^{k+1} + \beta_k (W^{k+1} - W^k)\big]_+, \\ H^{k+1} &= \mathrm{Update}(\hat{W}^{k+1}, \hat{H}^k), \\ \hat{H}^{k+1} &= \big[H^{k+1} + \beta_k (H^{k+1} - H^k)\big]_+, \end{aligned}$$

where Update can be the gradient update as in (8), and $\beta_k$ is the extrapolation parameter automatically tuned by the HER scheme. The acceleration effect of HER comes from the combination of extrapolation, cheap restarts (a safeguard mechanism), and a numerical scheme for updating the extrapolation weight $\beta_k$; see [14] for more details. Lastly, we emphasize that there is currently no theoretical convergence guarantee for the HER framework, but HER empirically works well in practice; see Section IV.
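A self-contained sketch of the HER scheme on a plain real NMF with projected gradient updates follows. The tuning constants β, γ, η and the restart rule are in the spirit of [13], [14], but the exact values and the simple "error went up" restart test here are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, r = 30, 40, 4
V = rng.random((m, r)) @ rng.random((r, n))      # synthetic nonnegative data
W, H = rng.random((m, r)), rng.random((r, n))
Wh, Hh = W.copy(), H.copy()                      # extrapolated (hat) sequence

def update_W(W, H):
    """One projected gradient step on W for 0.5*||V - WH||_F^2."""
    L = np.linalg.norm(H @ H.T, 2)
    return np.maximum(W - (W @ H - V) @ H.T / L, 1e-12)

def update_H(W, H):
    """One projected gradient step on H."""
    L = np.linalg.norm(W.T @ W, 2)
    return np.maximum(H - W.T @ (W @ H - V) / L, 1e-12)

beta, gamma, eta = 0.5, 1.05, 1.5                # extrapolation tuning (assumed)
err0 = err_prev = np.linalg.norm(V - W @ H)
for k in range(200):
    W_new = update_W(Wh, Hh)                     # update from the hat sequence
    Wh = np.maximum(W_new + beta * (W_new - W), 0)
    H_new = update_H(Wh, H)
    Hh = np.maximum(H_new + beta * (H_new - H), 0)
    err = np.linalg.norm(V - W_new @ H_new)
    if err < err_prev:
        beta = min(1.0, gamma * beta)            # success: grow the weight
    else:
        Wh, Hh = W_new.copy(), H_new.copy()      # restart: drop extrapolation
        beta /= eta                              # and shrink the weight
    W, H, err_prev = W_new, H_new, min(err, err_prev)
```

The restart is cheap because it only resets the hat sequence and shrinks β; the main sequence is never discarded.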

Remark on HER. In principle we could also extrapolate the variable $D$ in the same way as $W$ and $H$. However, we do not extrapolate $D$: we empirically observed that not extrapolating $D$ performs better, although extrapolating $D$ is still much faster than the unextrapolated original gradient-descent algorithm.

Note that the theoretical understanding of the HER mechanism is currently still very limited. Let us give a partial explanation of the ineffectiveness of extrapolating $D$ in the HER framework on cNMF. Since $D$ is projected element-wise so that $|[D]_{ij}| = 1$, that is, all the elements of $D$ sit on the unit circle in the complex plane, the ultimate effect of extrapolating $D$ is just a rotation of the elements of $D$. We hypothesize that the HER framework is not well suited to this kind of rotational extrapolation, since it was originally proposed to speed up NMF computation in the Frobenius norm, in the Euclidean geometry of rectangular coordinates. Understanding how to extrapolate $D$ effectively is an interesting topic for future research.

IV. EXPERIMENT

We present numerical results to confirm the effectiveness of the proposed algorithm and to showcase the effectiveness of the acceleration. The code is available at angms.science. Before going into the details of the experiments, we remark that:

• As mentioned at the beginning of the paper, the task of aBSS itself involves a highly involved data processing pipeline (parameter tuning, data pre-processing, filtering and post-processing) which is not the focus of this paper, so we do not focus on these issues and instead demonstrate that the proposed algorithm converges fast in practice. We set $\lambda = \frac{D(X|F_0)}{R(F_0)}$ in all the experiments, where $F_0 = \ell(W_0, H_0, e^{i\Theta_0})$ and $W_0, H_0, \Theta_0$ are the initializations of the variables $W, H, \Theta$. This makes the two terms in the objective function well balanced.

• We do not benchmark against the algorithms proposed in [4], [5] because their codes are unavailable. See also the discussion in the last paragraph of Section II-C.

We run Algorithm 1 (with and without HER, with the same random initialization) on two data sets (see Fig. 2), both real-world recordings with some ambient noise: "Mary", $X \in \mathbb{C}^{256 \times 714}$, a piano music phrase "Mary had a little lamb"; and "Voice", $X \in \mathbb{C}^{256 \times 162}$, a speech of the letters "NMF". We run cNMF (1a) on Mary with $r \in \{3, 6\}$ and on Voice with $r = 75$. We run group cNMF (1c) on Voice with $G = 3$ and two sets of $r_j$: $r_1 = r_2 = r_3 = 15$ and $r_1 = r_2 = r_3 = 25$ (recall that $r = \sum_{j=1}^{G} r_j$). All the experiments are repeated 20 times with different random initializations of $(W_0, H_0, \Theta_0)$.

Fig. 2. The spectrograms of the two datasets Mary and Voice (top panels: amplitude $\|X\|$; bottom panels: phase $\angle X$).

Fig. 3 and Fig. 4 show the convergence results. First, we see that the HER framework significantly accelerates the convergence. Second, comparing the result of cNMF (1a) on Voice with $r = 75$ to group cNMF (1c) on Voice with $G = 3$, $r_1 = r_2 = r_3 = 25$ (so $r = \sum_{j=1}^{G} r_j = 75$), the group model has a lower objective function value. This is expected, as the group model uses more phase variables to fit the data.

Fig. 3. Convergence of the objective function values $D + R$ on the cNMF tests (panels: Mary with $r = 3$, Mary with $r = 6$, Voice with $r = 75$; curves: Algorithm 1 with and without HER). The x-axes are iterations. We do not show time plots, as the cost per iteration of both algorithms is almost identical (HER adds a negligible cost [13]).

Fig. 4. Convergence curves on the group cNMF tests ($r_1 = r_2 = r_3 = 15$ and $r_1 = r_2 = r_3 = 25$). The curves are the mean over 20 experiments.

V. CONCLUSION

In this paper, we introduced the group cNMF model, which subsumes the existing cNMF and real NMF models. Using Wirtinger calculus, we derived a general gradient-based algorithm to solve cNMF. By using heuristic extrapolation with restarts, we showed, on a few preliminary numerical tests, that the accelerated algorithm converges much faster.

Future work includes studying the convergence of the algorithm, performing tests with respect to the aBSS task, replacing the Frobenius norm in the data-fitting term by $\beta$-divergences, which are more appropriate for audio data sets, and studying identifiability issues as in [8].

REFERENCES

[1] S. A. Vavasis, "On the complexity of nonnegative matrix factorization," SIAM Journal on Optimization, vol. 20, no. 3, pp. 1364–1377, 2010.
[2] N. Gillis, Nonnegative Matrix Factorization. SIAM, Philadelphia, 2020.
[3] P. Magron and T. Virtanen, "Towards complex nonnegative matrix factorization with the beta-divergence," in 2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC). IEEE, 2018, pp. 156–160.
[4] H. Kameoka, N. Ono, K. Kashino, and S. Sagayama, "Complex NMF: A new sparse representation for acoustic signals," in 2009 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2009, pp. 3437–3440.
[5] J. Le Roux, H. Kameoka, E. Vincent, N. Ono, K. Kashino, and S. Sagayama, "Complex NMF under spectrogram consistency constraints," in Acoustical Society of Japan Autumn Meeting, 2009.
[6] P. Magron, R. Badeau, and B. David, "Phase recovery in NMF for audio source separation: An insightful benchmark," in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 81–85.
[7] V. Leplat, N. Gillis, X. Siebert, and A. M. S. Ang, "Séparation aveugle de sources sonores par factorisation en matrices positives avec pénalité sur le volume du dictionnaire," in XXVIIème Colloque francophone de traitement du signal et des images. GRETSI 2019, 2019.
[8] V. Leplat, N. Gillis, and A. M. S. Ang, "Blind audio source separation with minimum-volume beta-divergence NMF," IEEE Transactions on Signal Processing, 2020.
[9] D. Lahat and C. Févotte, "Positive semidefinite matrix factorization: A link to phase retrieval and a block gradient algorithm," in ICASSP 2020 – 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020, pp. 5705–5709.
[10] P. Magron, R. Badeau, and B. David, "Complex NMF under phase constraints based on signal modeling: Application to audio source separation," in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016, pp. 46–50.
[11] W. Wirtinger, "Zur formalen Theorie der Funktionen von mehr komplexen Veränderlichen," Mathematische Annalen, vol. 97, no. 1, pp. 357–375, 1927.
[12] P. Bouboulis, "Wirtinger's calculus in general Hilbert spaces," arXiv preprint arXiv:1005.5170, 2010.
[13] A. M. S. Ang and N. Gillis, "Accelerating nonnegative matrix factorization algorithms using extrapolation," Neural Computation, vol. 31, no. 2, pp. 417–439, 2019.
[14] A. M. S. Ang, J. E. Cohen, N. Gillis, and L. T. K. Hien, "Extrapolated alternating algorithms for approximate canonical polyadic decomposition," in ICASSP 2020 – 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020, pp. 3147–3151.
[15] ——, "Accelerating block coordinate descent for nonnegative tensor factorization," Numerical Linear Algebra with Applications, 2021.
[16] P. Smaragdis and J. C. Brown, "Non-negative matrix factorization for polyphonic music transcription," in 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No. 03TH8684). IEEE, 2003, pp. 177–180.
[17] J. Le Roux, H. Kameoka, N. Ono, and S. Sagayama, "Fast signal reconstruction from magnitude STFT spectrogram based on spectrogram consistency," in International Conference on Digital Audio Effects, 2010.
[18] J. Bronson and P. Depalle, "Phase constrained complex NMF: Separating overlapping partials in mixtures of harmonic musical sources," in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014, pp. 7475–7479.
[19] D. Kitamura and K. Yatabe, "Consistent independent low-rank matrix analysis for determined blind source separation," EURASIP Journal on Advances in Signal Processing, vol. 2020, no. 1, pp. 1–35, 2020.
[20] C.-J. Lin, "Projected gradient methods for nonnegative matrix factorization," Neural Computation, vol. 19, no. 10, pp. 2756–2779, 2007.
[21] N. Gillis and F. Glineur, "Accelerated multiplicative updates and hierarchical ALS algorithms for nonnegative matrix factorization," Neural Computation, vol. 24, no. 4, pp. 1085–1105, 2012.
[22] E. J. Candès, X. Li, and M. Soltanolkotabi, "Phase retrieval via Wirtinger flow: Theory and algorithms," IEEE Transactions on Information Theory, vol. 61, no. 4, pp. 1985–2007, 2015.