
Time and Frequency Based Sparse Bounded Component Analysis Algorithms for Convolutive Mixtures

Eren Babatas, Alper T. Erdogan

Electrical-Electronics Engineering Dept., Koc University, Istanbul, 34450, Turkey

Abstract

In this paper, we introduce time-domain and frequency-domain versions of a new Blind Source Separation (BSS) approach to extract bounded-magnitude sparse sources from convolutive mixtures. We derive algorithms by maximizing the proposed objective functions, which are defined in a completely deterministic framework, and prove that the global maxima of the objective functions yield perfect separation under suitable conditions. The derived algorithms can be applied to temporally or spatially dependent sources as well as independent sources. We provide experimental results to demonstrate some benefits of the approach, including an application to blind speech separation.

Keywords: Convolutive Blind Source Separation, Bounded Component Analysis, Sparse Component Analysis, Sparse Bounded Component Analysis, Blind Speech Separation.

1. Introduction

Convolutive Blind Source Separation (BSS) is a generic inverse problem with broad impact, as the extraction of multiple sources from their space-time mixtures is a common problem in various engineering areas. Acoustic source separation in reverberant environments [1, 2] can be considered the signature case; however, the applications span a much wider range, including digital communications [3], electrocardiogram measurements [4], and radar signal processing [5].

Email addresses: ebabatas@ku.edu.tr (Eren Babatas), alperdogan@ku.edu.tr (Alper T. Erdogan)

Preprint submitted to Journal of Signal Processing, March 11, 2020

While there are many different approaches to the convolutive mixing problem in BSS, the related methods can be arranged into two main categories according to their processing domain. The first group is referred to as time-domain methods, where the separator implementation and the algorithm are based on convolution with a Multiple Input Multiple Output (MIMO) FIR filter [2, 6]. The main difficulty with these methods is that compensating for the mixing may require long FIR filters in order to achieve acceptable performance. This situation is especially likely for audio signals under reverberation [7] and increases the processing load of the time-domain convolution operation. Longer filter lengths also imply more coefficients to be learned during the training period. The methods in the second group, frequency-domain methods, map the temporal mixtures to frequency-spectrum data so that temporal convolution is converted to multiplication in the frequency domain [8]. Hereby, instantaneous BSS algorithms can be applied to each frequency bin separately. The frequency transformation is executed using different methods such as the sliding-window Discrete Fourier Transform (DFT) [9] and the Short Time Fourier Transform (STFT) [10]. Frequency-domain methods have two important issues: scaling and permutation inconsistencies, which cause unequal scaling of the spectral components in different bins and spectral mixing of the source components. Also, the mixing process is complex valued because of the frequency transformation, which increases the complexity and the processing load of the frequency-domain methods. On the other hand, improved efficiency and better convergence are considered advantages of the frequency-domain methods.

BSS algorithms exploit different assumptions on the sources and mixing systems to solve the BSS problem. Among them, the mutual independence of sources is a strong assumption used by the popular Independent Component Analysis (ICA) method, which was initially proposed for the instantaneous mixing case [11]. Later, the ICA approach was also extended to convolutive BSS settings. As an example, Douglas [2] proposed a spatio-temporal extension of the well-known FastICA algorithm of Hyvarinen and Oja [12]. Similarly, Koldovský [6] applied the powerful ICA algorithm EFICA [13] to convolutive mixtures for time-domain separation. There are many other convolutive BSS algorithm categories based on different properties. Several BSS algorithms [1] exploit the non-stationarity of sources, which is a very common characteristic of speech signals. In another category, Sparse Component Analysis (SCA) methods have also been a useful tool in convolutive BSS applications. Although many source signals are not sparse in the time domain, they turn out to be sparse when transformed to the frequency domain or time-frequency domain. Therefore, frequency-domain SCA methods have recently received more attention, with the advantage that they can also be applied in the under-determined mixing case for convolutive BSS [14].

Bounded Component Analysis (BCA) is a recently introduced BSS framework that is based on a domain separability assumption, which is less strict than the independence assumption in ICA. Cruces showed in [15] that source boundedness side information can be used to separate temporally or spatially dependent sources as well as independent sources. In [16], Erdogan presented a deterministic framework based on geometric optimization settings. In [17], we proposed an extension of the instantaneous BCA approach in [16] to sparsely natured bounded signals. This extension is referred to as Sparse BCA (SBCA). SBCA modifies the geometric framework in [16] such that the sources are assumed to be bounded in an $\ell_1$-norm ball instead of the $\ell_\infty$-norm ball used in [16].

In this article, we propose time-based and frequency-based convolutive versions of the SBCA approach in [17] for the (over)determined case. The time-based approach was first introduced in the conference paper [18]; here, the extension of this time-based approach to complex source signals is explained. Beyond that, there are two main contributions of this article with respect to the conference paper:

• A new algorithm that uses a frequency-based objective function in order to sparsify source signals by mapping them to frequency-spectrum data.

• A practical application of the algorithm for blind speech separation in a reverberant scene.

The article is organized as follows: In Section 2.1, we introduce the time-domain convolutive BSS setup and propose a time-domain-sparsity based convolutive SBCA approach. In Section 3, we map the convolutive mixtures to frequency-spectrum data via the STFT and describe a new signal setup in the frequency domain. Then, we propose a frequency-domain-sparsity based SBCA optimization setting. Finally, numerical examples, including a blind speech separation application, are given in Section 4.

This paper uses the following notation: Let $\tilde{X}$ denote a convolutive channel response defined as $\tilde{X} = [X(0), X(1), \ldots, X(K-1)]$, where $K$ is the channel order. $\Gamma_N(\tilde{X})$ represents the block-Toeplitz matrix of $\tilde{X}$. Let $\|x\|_a$ denote the $\ell_a$-norm of the vector $x$. Let $\|X\|_{a,b}$ denote the induced matrix norm formulated as $\sup_{\|s\|_b \le 1} \|Xs\|_a$. $\Upsilon : \mathbb{C}^p \to \mathbb{R}^{2p}$ denotes the operator transforming a complex signal vector of size $p$ to its real isomorphic counterpart of size $2p$; $\acute{x}$ represents $\Upsilon(x)$ shortly. Let $\tilde{x}_K(n) = [x^T(n), x^T(n-1), \ldots, x^T(n-K+1)]^T$ denote the convolutive signal vector of $x(n)$. Then $\acute{\tilde{x}}_K(n)$ is the real isomorphic vector for $\tilde{x}_K(n)$. Let $\Omega(\tilde{X})$ denote the real isomorphic matrix for the convolutive channel response $\tilde{X}$, where $\Omega : \mathbb{C}^{q \times Kp} \to \mathbb{R}^{2q \times 2Kp}$. $x(m,f)$ is the Short Time Fourier Transform of $x(n)$. Let $P_x(f)$ denote the Power Spectral Density Matrix of $x(f)$ and $P_x(m,f)$ denote the instant PSDM of $x(m,f)$.

2. Time-Domain-Sparsity Based Convolutive SBCA Framework

Before defining the time-domain convolutive SBCA framework, we give a short introduction to the instantaneous SBCA framework proposed in [17]. In the instantaneous SBCA framework,

• There are $p$ sources represented by the sample set $S = \{s(n) \in \mathbb{R}^p, n = 1, \ldots, L\}$, and the source samples are bounded by an $\ell_1$-norm ball, i.e., $s(n) \in B_s$, where $B_s = \{q \in \mathbb{R}^p \mid \|q\|_1 < 1\}$. In the more general case, $B_s$ is replaced with a weighted $\ell_1$-norm ball.

• The source signals are mixed in a linear, memoryless, and lossless channel whose time response is denoted by the full-rank matrix $A \in \mathbb{R}^{q \times p}$, where $q \ge p$. The mixed signals are formulated as $y(n) = A s(n)$.

• The mixed signals are filtered by an instantaneous separator denoted by the matrix $B \in \mathbb{R}^{p \times q}$. The separator outputs are formulated as $z(n) = B y(n)$.

• The cascade of the mixing and separator systems is given by $F = BA$. The aim of BSS algorithms is to obtain a separator matrix that provides the equality $F = PD$, where $P$ is a permutation matrix and $D$ is a full-rank diagonal scaling matrix.

The SBCA algorithm solves the BSS problem described above by using the geometrical optimization setting illustrated in Figure 1, i.e., the volume ratio of two geometrical objects called the principal hyper-ellipsoid (red ball in Figure 1) and the bounding $\ell_1$-norm ball (green diamond-shaped box in Figure 1). The SBCA objective function derived from these volumes is defined as

$$J(B) = \frac{\det(\hat{R}_z)}{\left(\max_{n \in \{1,\ldots,L\}} \|z(n)\|_1\right)^p}. \tag{1}$$

In this formula, the numerator represents the volume of the principal hyper-ellipsoid, i.e., $E_z = \{q \mid (q - \hat{\mu}_z)^T \hat{R}_z^{-1} (q - \hat{\mu}_z) \le 1\}$, where $\hat{R}_z$ is the sample covariance matrix of the separator outputs, formulated as $\hat{R}_z = \frac{1}{L}\sum_{n=1}^{L} z(n)z(n)^T - \hat{\mu}_z \hat{\mu}_z^T$. The denominator is the volume of the bounding $\ell_1$-norm ball, where $\|z(n)\|_1$ is the $\ell_1$-norm of the separator output. Note that the number of sources is known as a priori information. The detailed derivations of the objective function and the resulting iterative update equation are provided in [17].

Figure 1: Geometric objects used in the SBCA framework. Diamond boxes: the bounding $\ell_1$-norm balls; red balls: the principal hyper-ellipsoids; green polytope: the image of the input $\ell_1$-norm ball under the mapping.

2.1. Convolutive Blind Source Separation Setup in Time Domain

For the convolutive BSS setup,

• The sources are represented by the set $S = \{s(n) \in \mathbb{R}^p\}$ and are assumed to be bounded and to lie in the $\ell_1$-norm ball $B_s$ described by $B_s = \{s \in \mathbb{R}^p \mid \|s\|_1 < 1\}$. It is again assumed that $p$ is known as a priori information. Although $B_s$ can be replaced with a weighted $\ell_1$-norm ball for sources having different ranges, we use the unit $\ell_1$-norm ball to simplify expressions without any loss of generality.

• The convolutive MIMO mixing channel output is formulated as

$$y(n) = \sum_{m=0}^{M-1} A(m)\, s(n-m), \tag{2}$$

where $\{A(m), m \in \{0,\ldots,M-1\}\}$ are the channel impulse response coefficients of size $q \times p$, and $q$ is the number of mixtures. The mixing system is equalizable [19] with order $M-1$ and $q \ge p$. Equalizable channels allow transmission zeros at $z = 0$ in the $z$-plane, which makes it possible for source signals to reach the receiver with different delays relative to each other. Moreover, a necessary and sufficient condition for a convolutive channel $\{A(m), m \in \{0,\ldots,M-1\}\}$, where $A(m) \in \mathbb{R}^{q \times p}$, to be equalizable is $\operatorname{rank}(A(z)) = p$ for $q \ge p$.

• The FIR separator filter output is formulated as

$$z(n) = \sum_{k=0}^{K-1} B(k)\, y(n-k), \tag{3}$$

where $\{B(k), k \in \{0,\ldots,K-1\}\}$ are the separator filter coefficients of dimension $p \times q$ and $K-1$ is the separator filter order.

• Inserting (2) into (3) yields the total system response

$$F(m) = \sum_{k=0}^{K-1} B(k) A(m-k), \quad m = 0, \ldots, P-1, \tag{4}$$

where $P-1 = K+M-2$ is the total system response order.

• Using the convolutive mixing channel response matrix $\tilde{A} = [A(0), A(1), \ldots, A(M-1)]$, the separator filter response matrix $\tilde{B} = [B(0), B(1), \ldots, B(K-1)]$, and the system response matrix $\tilde{F} = [F(0), F(1), \ldots, F(P-1)]$, (2) and (3) can be reformulated as

$$y(n) = \tilde{A}\, \tilde{s}_M(n), \quad n = 1, \ldots, L, \tag{5}$$
$$z(n) = \tilde{B}\, \tilde{y}_K(n), \quad n = 1, \ldots, L+K-1, \tag{6}$$
$$z(n) = \tilde{F}\, \tilde{s}_P(n), \quad n = 1, \ldots, L+K-1, \tag{7}$$

where $\tilde{s}_M(n) = [s^T(n), s^T(n-1), \ldots, s^T(n-M+1)]^T$, $\tilde{y}_K(n) = [y^T(n), y^T(n-1), \ldots, y^T(n-K+1)]^T$, and $\tilde{s}_P(n) = [s^T(n), s^T(n-1), \ldots, s^T(n-P+1)]^T$.

• We will use the extended separator output vector $\tilde{z}_N(n) = [z^T(n), z^T(n-1), \ldots, z^T(n-N+1)]^T$ and the block-Toeplitz matrix $\Gamma_N(\tilde{F})$ in the objective function formulation. $\Gamma_N(\tilde{F})$ represents a block-Toeplitz matrix whose first block row is $[F(0), F(1), \ldots, F(P-1), 0, \ldots, 0]$ and whose first block column is $[F(0), 0, \ldots, 0]^T$. Note that the zero matrices are $p \times p$. More explicitly, $\Gamma_N(\tilde{F})$ is given by

$$\Gamma_N(\tilde{F}) = \begin{bmatrix} F(0) & F(1) & \cdots & F(P-1) & \cdots & 0 \\ \vdots & \ddots & \ddots & & \ddots & \vdots \\ 0 & \cdots & F(0) & F(1) & \cdots & F(P-1) \end{bmatrix}, \tag{8}$$

where $N \ge P$. This yields the equation $\tilde{z}_N(n) = \Gamma_N(\tilde{F})\, \tilde{s}_{N+P-1}(n)$ for the extended separator output vector, where $\tilde{s}_{N+P-1}(n) = [s^T(n), s^T(n-1), \ldots, s^T(n-N-P+2)]^T$. Defining the set $S_{N+P-1} = \{\tilde{s}_{N+P-1}(N+K-1), \tilde{s}_{N+P-1}(N+K), \ldots, \tilde{s}_{N+P-1}(L)\}$, we introduce the following local dominance assumption for our convolutive BSS setup:

Assumption (A1): The source sample set $S_{N+P-1}$ contains the vertices (the corners of a volume) of its bounding $\ell_1$-norm ball $B_s$.
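As a concrete illustration of the mixing model (2) and the FIR separator (3), the following pure-Python sketch convolves toy matrix taps with a short source sequence. The dimensions, tap values, and helper names (`mat_vec`, `convolve_taps`) are illustrative choices for this sketch, not part of the paper's method.

```python
# Sketch of the convolutive mixing model (2) and FIR separator (3),
# using plain Python lists of matrix taps and toy dimensions.

def mat_vec(A, x):
    """Multiply a matrix (list of rows) with a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def vec_add(u, v):
    return [a + b for a, b in zip(u, v)]

def convolve_taps(taps, signal, n):
    """out(n) = sum_m taps[m] @ signal[n-m], with zero past samples."""
    out = [0.0] * len(taps[0])
    for m, A_m in enumerate(taps):
        if 0 <= n - m < len(signal):
            out = vec_add(out, mat_vec(A_m, signal[n - m]))
    return out

# Mixing: p = 2 sources, q = 2 mixtures, channel order M - 1 = 1.
A_taps = [[[1.0, 0.5], [0.2, 1.0]],   # A(0)
          [[0.1, 0.0], [0.0, 0.1]]]   # A(1)
s = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]          # source samples s(n)
y = [convolve_taps(A_taps, s, n) for n in range(len(s))]

# Separation with a (here arbitrary) FIR separator of order K - 1 = 1.
B_taps = [[[1.0, 0.0], [0.0, 1.0]],   # B(0)
          [[0.0, 0.0], [0.0, 0.0]]]   # B(1)
z = [convolve_taps(B_taps, y, n) for n in range(len(y))]
```

With the identity separator tap chosen here, $z(n) = y(n)$; a learned separator would instead drive the cascade toward $F = PD$ as in Section 2.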

2.2. Objective Function

Similar to [17], the objective function is defined as the volume ratio of two geometrical objects:

• The bounding $\ell_1$-norm ball: defined with respect to the maximum $\ell_1$-norms of the extended separator outputs, i.e., $B_z = \{q \mid \|q\|_1 \le \max_{n \in \{N,\ldots,L_1\}} \|\tilde{z}_N(n)\|_1\}$, where $L_1 = L+K-1$.

• The principal hyper-ellipsoid: defined with respect to the covariance of the extended separator outputs, i.e., $E_z = \{q \mid (q - \hat{\mu}_{\tilde{z}_N})^T \hat{R}_{\tilde{z}_N}^{-1} (q - \hat{\mu}_{\tilde{z}_N}) \le 1\}$, where $\hat{\mu}_{\tilde{z}_N} = \frac{1}{L_2}\sum_{n=N}^{L_1} \tilde{z}_N(n)$, $\hat{R}_{\tilde{z}_N} = \frac{1}{L_2}\sum_{n=N}^{L_1} (\tilde{z}_N(n) - \hat{\mu}_{\tilde{z}_N})(\tilde{z}_N(n) - \hat{\mu}_{\tilde{z}_N})^T$, and $L_2 = L_1 - N + 1$.

Based on these definitions, we formulate the CSBCA objective as the volume ratio

$$J(\tilde{B}) = \frac{\det(\hat{R}_{\tilde{z}_N})}{\left(\max_{n \in \{N,\ldots,L_1\}} \|\tilde{z}_N(n)\|_1\right)^{Np}}, \tag{9}$$

which is to be maximized. The main difference between (9) and (1) is that the separator output vector $z$ is replaced with the extended separator output vector $\tilde{z}_N$. The following theorem ensures that maximization of the objective function in (9) achieves blind source separation of convolutive mixtures.

Theorem 1: Assume that an FIR separator matrix of order $K-1$ can equalize the mixing channel $\tilde{A}$. Then all global maxima of (9) give perfect separation if assumption (A1) holds.

The proof is provided in Appendix A.
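The volume ratio in (9) can be evaluated directly on sample data. The sketch below computes it for the toy case $N = 1$, $p = 2$, where the extended output reduces to $z(n)$ itself; the hand-rolled determinant is valid for $2 \times 2$ matrices only, and the data and function name are illustrative.

```python
# Toy evaluation of the CSBCA objective (9) for N = 1, p = 2:
# determinant of the output sample covariance divided by the p-th
# power of the maximum l1 norm of the outputs.

def objective(z_samples):
    p = len(z_samples[0])
    L = len(z_samples)
    mu = [sum(z[i] for z in z_samples) / L for i in range(p)]
    # Sample covariance matrix of the outputs.
    R = [[sum((z[i] - mu[i]) * (z[j] - mu[j]) for z in z_samples) / L
          for j in range(p)] for i in range(p)]
    det = R[0][0] * R[1][1] - R[0][1] * R[1][0]      # valid for p = 2 only
    max_l1 = max(sum(abs(zi) for zi in z) for z in z_samples)
    return det / max_l1 ** p

z = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
J = objective(z)   # covariance is 0.5*I, max l1 norm is 1, so J = 0.25
```

The four samples here are exactly the vertices of the unit $\ell_1$ ball, the best case under assumption (A1); spreading mass toward the vertices while keeping the $\ell_1$ bound fixed increases the ratio.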

2.3. Iterative Algorithm

In order to transform the objective in (9) into a more convenient form for the iterative algorithm derivation, we take its logarithm:

$$\mathcal{J}(\tilde{B}) = \underbrace{\tfrac{1}{2}\log\det\!\left(\Gamma_N(\tilde{B})\, \hat{R}_{\tilde{y}_{N+K-1}}\, \Gamma_N(\tilde{B})^T\right)}_{J_1(\tilde{B})} - \underbrace{Np \log\!\left(\max_{n \in \{N,\ldots,L_1\}} \|\tilde{z}_N(n)\|_1\right)}_{J_2(\tilde{B})}.$$

The first term $J_1(\tilde{B})$ is convex and differentiable, and the second term $J_2(\tilde{B})$ is a convex non-smooth function. We can utilize the Clarke sub-differential [20] in order to take the derivative of $J_2(\tilde{B})$. If we denote the $\ell_1$-norm of the separator output as $f_n(\Gamma_N(\tilde{B})) = \|\tilde{z}_N(n)\|_1$, the sub-differential set of $f_n(\Gamma_N(\tilde{B}))$ with respect to the argument $\Gamma_N(\tilde{B})$ can be written as

$$\partial \max\big(f_n(\Gamma_N(\tilde{B}))\big) = \Big\{\, o = r\, \tilde{y}_{N+K-1}(l)^T : r_i = \operatorname{sign}\{(\tilde{z}_N)_i(l)\} + \mathbb{1}_{\{(\tilde{z}_N)_i(l) = 0\}}\, \alpha_i \,\Big\}, \tag{10}$$

where $\tilde{y}_{N+K-1}(l) = [y^T(l), y^T(l-1), \ldots, y^T(l-N-K+2)]^T$, $l$ is the index of the maximum $\ell_1$-norm, and $\alpha_i \in [-1, 1]$. Since we aim to derive an iterative update for the argument $\tilde{B}$ instead of $\Gamma_N(\tilde{B})$, it is necessary to convert (10) into a new derivative expression with respect to $\tilde{B}$ as follows:

$$\acute{o} = \sum_{m=0}^{N-1} o_{mp+1:(m+1)p,\; mq+1:(m+K)q}. \tag{11}$$

By applying the chain rule, we obtain the update term corresponding to $J_2(\tilde{B})$ as

$$\partial J_2(\tilde{B}) = Np \sum_{l \in I_{\tilde{B}}} \frac{\lambda_l \sum_{m=0}^{N-1} o_{mp+1:(m+1)p,\; mq+1:(m+K)q}}{\max_{n \in \{N,\ldots,L_1\}} \|\tilde{z}_N(n)\|_1}, \tag{12}$$

where $\lambda_l \ge 0$ and $\sum_{l \in I_{\tilde{B}}} \lambda_l = 1$. $I_{\tilde{B}}$ is a subset of $\{N,\ldots,L_1\}$ and consists of the indices for which the maximum $\ell_1$-norm at the separator output is achieved. As in (11), the gradient of $J_1(\tilde{B})$ with respect to $\tilde{B}$ is written as a summation term

$$\partial J_1(\tilde{B}) = \sum_{m=0}^{N-1} X_{mp+1:(m+1)p,\; mq+1:(m+K)q}, \tag{13}$$

where $X = \left(\Gamma_N(\tilde{B})\, \hat{R}_{\tilde{y}_{N+K-1}}\, \Gamma_N(\tilde{B})^T\right)^{-1} \Gamma_N(\tilde{B})\, \hat{R}_{\tilde{y}_{N+K-1}}$, $\hat{R}_{\tilde{y}_{N+K-1}} = \frac{1}{L_2}\sum_{n=N}^{L_1} (\tilde{y}_{N+K-1}(n) - \hat{\mu}_{\tilde{y}_{N+K-1}})(\tilde{y}_{N+K-1}(n) - \hat{\mu}_{\tilde{y}_{N+K-1}})^T$, and $\hat{\mu}_{\tilde{y}_{N+K-1}} = \frac{1}{L_2}\sum_{n=N}^{L_1} \tilde{y}_{N+K-1}(n)$.

Consequently, the iterative update equation is formed by combining the gradient and a chosen sub-gradient for $J_1(\tilde{B})$ and $J_2(\tilde{B})$, respectively. In addition, we can generate a simpler iterative update where only one $\lambda_l$ term is non-zero. This can be accomplished by selecting a random index location $l$ from $I_{\tilde{B}}$ at every iteration:

$$\tilde{B}^{(t+1)} = \tilde{B}^{(t)} + \sigma^{(t)} \left( \sum_{m=0}^{N-1} X^{(t)}_{mp+1:(m+1)p,\; mq+1:(m+K)q} - \frac{Np \sum_{m=0}^{N-1} o^{(t)}_{mp+1:(m+1)p,\; mq+1:(m+K)q}}{\max_{n \in \{N,\ldots,L_1\}} \|\tilde{z}_N(n)\|_1} \right). \tag{14}$$
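The sub-gradient selection behind (10) and (14), namely picking the sample that attains the maximum $\ell_1$ norm and forming a sign-based rank-one term, can be sketched as follows. For zero entries we pick $\alpha_i = 0$, one valid choice in $[-1, 1]$; the variable names are illustrative.

```python
# Sketch of the sub-gradient term used for J2 in (10): find the
# sample index l where the output l1 norm is maximal, then form the
# rank-one matrix sign(z(l)) y(l)^T (alpha_i = 0 for zero entries).

def subgrad_max_l1(z_samples, y_samples):
    norms = [sum(abs(zi) for zi in z) for z in z_samples]
    l = norms.index(max(norms))                       # argmax of l1 norm
    sign = [(zi > 0) - (zi < 0) for zi in z_samples[l]]
    return [[s * yj for yj in y_samples[l]] for s in sign], l

z = [[0.5, -0.5], [2.0, -1.0], [0.1, 0.0]]
y = [[1.0, 1.0], [3.0, -2.0], [0.0, 0.0]]
o, l = subgrad_max_l1(z, y)
# l == 1 since ||z(1)||_1 = 3 is maximal; o == [[3.0, -2.0], [-3.0, 2.0]]
```

When several samples tie for the maximum, the paper's simpler update picks one of them at random, which amounts to calling this routine on a randomly chosen tied index.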

As the proposed approach relies on the maximization of a non-concave objective function, characterizing the convergence behaviour of the corresponding sub-gradient based algorithm is relatively hard. Inan, Erdogan, and Cruces provided an analysis in [21] of the stationary point characterization of the BCA algorithm introduced in [16] and showed that its stationary points correspond either to global maxima of the objective function or to unstable saddle points. This means that the stationary points of the BCA algorithm do not correspond to local maxima of the objective function. Although this result cannot be directly generalized to the objective functions presented here, it is promising for the convergence behaviour of the instantaneous and convolutive SBCA algorithms. A similar stationary point characterization is on our future research agenda, and we expect it to lead to the same conclusion as [21]. In addition, the empirical results we obtain from the numerical experiments support the conjecture that the algorithm always converges to the vicinity of a desired separation point with an appropriate step-size selection. The question of whether the algorithm always converges to the stationary points remains open; however, recent research outcomes on non-convex global convergence analysis with appropriate initialization methods are encouraging [22].
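The block-Toeplitz operator $\Gamma_N(\cdot)$ of (8), which appears throughout the updates above, can be built in a few lines. The function below works on matrices stored as nested lists; it is a minimal illustration, not the authors' implementation.

```python
# Minimal sketch of the block-Toeplitz operator Gamma_N in (8):
# the first block row holds F(0), ..., F(P-1) followed by zero
# blocks, and each later block row is shifted one block right.

def block_toeplitz(F_taps, N):
    """F_taps: list of P equally sized matrices (lists of rows).
    Returns the N-block-row, (N + P - 1)-block-column Toeplitz
    matrix, flattened to a plain list of rows."""
    P = len(F_taps)
    p = len(F_taps[0])          # block row count
    pc = len(F_taps[0][0])      # block column count
    ncols = (N + P - 1) * pc
    rows = []
    for blk in range(N):
        for r in range(p):
            row = [0.0] * ncols
            for m, F_m in enumerate(F_taps):
                for c in range(pc):
                    row[(blk + m) * pc + c] = F_m[r][c]
            rows.append(row)
    return rows

F = [[[1.0]], [[2.0]]]          # scalar taps F(0) = 1, F(1) = 2
G = block_toeplitz(F, 3)        # 3 block rows, 4 block columns
# G == [[1, 2, 0, 0], [0, 1, 2, 0], [0, 0, 1, 2]]
```

Multiplying this matrix with the stacked source vector $\tilde{s}_{N+P-1}(n)$ reproduces the extended output relation $\tilde{z}_N(n) = \Gamma_N(\tilde{F})\,\tilde{s}_{N+P-1}(n)$ stated after (8).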

2.4. Algorithm Extension to Complex Sources

For the complex extension of the algorithm, we follow the isomorphism-based approach used in [16]. Let us introduce the following terms used in the complex derivation. We define the operator $\Upsilon : \mathbb{C}^p \to \mathbb{R}^{2p}$,

$$\Upsilon(a) = \begin{bmatrix} \operatorname{Re}(a^T) & \operatorname{Im}(a^T) \end{bmatrix}^T, \tag{15}$$

as an isomorphism between complex and real vectors. For a given complex vector $a$, we use the notation $\acute{a}$ to refer to $\Upsilon(a)$. In the same way, we can define the real isomorphic vector for the convolutive signal vector $\tilde{a}_K(n)$ as $\acute{\tilde{a}}_K(n) = [\operatorname{Re}(a^T(n)) \; \operatorname{Im}(a^T(n)) \; \ldots \; \operatorname{Re}(a^T(n-K+1)) \; \operatorname{Im}(a^T(n-K+1))]^T$.

We also define the operator for the convolutive channel, $\Omega : \mathbb{C}^{q \times Kp} \to \mathbb{R}^{2q \times 2Kp}$:

$$\Omega(\tilde{X}) = \begin{bmatrix} \operatorname{Re}(X(0)) & -\operatorname{Im}(X(0)) & \cdots & \operatorname{Re}(X(K-1)) & -\operatorname{Im}(X(K-1)) \\ \operatorname{Im}(X(0)) & \operatorname{Re}(X(0)) & \cdots & \operatorname{Im}(X(K-1)) & \operatorname{Re}(X(K-1)) \end{bmatrix}. \tag{16}$$

Finally, the real isomorphic counterparts of the convolutive mixtures and separator outputs are written as $\acute{y}(n) = \Omega(\tilde{A})\, \acute{\tilde{s}}_M(n)$ and $\acute{\tilde{z}}_N(n) = \Omega(\tilde{B})\, \acute{\tilde{y}}_{N+K-1}(n)$, respectively.
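A minimal sketch of the isomorphism $\Upsilon$ in (15) and a single-tap version of $\Omega$ in (16), assuming complex entries stored as Python `complex` values; the function names are our own.

```python
# Upsilon stacks the real parts over the imaginary parts of a
# complex vector; omega_tap maps one complex matrix tap X to its
# [[Re X, -Im X], [Im X, Re X]] real counterpart, as in (16).

def upsilon(a):
    return [z.real for z in a] + [z.imag for z in a]

def omega_tap(X):
    re = [[z.real for z in row] for row in X]
    im = [[z.imag for z in row] for row in X]
    top = [r + [-v for v in i] for r, i in zip(re, im)]
    bot = [i + r for r, i in zip(re, im)]
    return top + bot

a = [1 + 2j, 3 - 1j]
X = [[1 + 1j]]
# upsilon(a) == [1.0, 3.0, 2.0, -1.0]
# omega_tap(X) == [[1.0, -1.0], [1.0, 1.0]]
```

The point of the construction is that complex matrix-vector products carry over: applying `omega_tap(X)` to `upsilon(s)` gives the same result as `upsilon` applied to the complex product $Xs$.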

Using the definitions above, the objective function to be maximized in (9) can be modified for the complex case as

$$J_c(\tilde{B}) = \frac{\det(\hat{R}_{\acute{\tilde{z}}_N})}{\left(\max_{n \in \{N,\ldots,L_1\}} \|\acute{\tilde{z}}_N(n)\|_1\right)^{2Np}}. \tag{17}$$

To derive an iterative algorithm for complex sources, the ratio form in (17) is transformed into a difference form by taking the logarithm:

$$\mathcal{J}_c(\tilde{B}) = \log(J_c(\tilde{B})) = \tfrac{1}{2}\log\det\!\left(\Gamma_N(\Omega(\tilde{B}))\, \hat{R}_{\acute{\tilde{y}}_{N+K-1}}\, \Gamma_N(\Omega(\tilde{B}))^T\right) - 2Np \log\!\left(\max_{n \in \{N,\ldots,L_1\}} \|\acute{\tilde{z}}_N(n)\|_1\right). \tag{18}$$

The corresponding iterative update equation for the separator matrix $B(k)$ (at time lag $k$) can be written as

$$B^{(t+1)}(k) = B^{(t)}(k) + \sigma^{(t)} \Bigg( T_{1:p,\,2kq+1:(2k+1)q} + T_{p+1:2p,\,(2k+1)q+1:2(k+1)q} + j\left(T_{p+1:2p,\,2kq+1:(2k+1)q} - T_{1:p,\,(2k+1)q+1:2(k+1)q}\right) - \frac{2Np \sum_{m=0}^{N-1} \check{o}_{mp+1:(m+1)p,\,(m+k)q+1:(m+k+1)q}}{\max_{n \in \{N,\ldots,L_1\}} \|\acute{\tilde{z}}_N(n)\|_1} \Bigg), \tag{19}$$

where $\check{o} = \operatorname{sign}_c\{\tilde{z}_N(l^{(t)})\}\, \tilde{y}_{N+K-1}(l^{(t)})^H$,

$$T = \sum_{r=0}^{N-1} \Omega(X)^{(t)}_{2rp+1:2(r+1)p,\; 2rq+1:2(r+K)q},$$

$\Omega(X) = \left(\Gamma_N(\Omega(\tilde{B}))\, \hat{R}_{\acute{\tilde{y}}_{N+K-1}}\, \Gamma_N(\Omega(\tilde{B}))^T\right)^{-1} \Gamma_N(\Omega(\tilde{B}))\, \hat{R}_{\acute{\tilde{y}}_{N+K-1}}$, and $l^{(t)}$ represents the time index for which the maximum $\ell_1$-norm at the separator output is achieved.

3. Frequency-Domain-Sparsity Based Convolutive SBCA Framework

Frequency-domain methods map the convolutive mixtures to instantaneous mixtures in the frequency domain so that instantaneous BSS algorithms can be applied to the spectral data in each frequency bin separately. The frequency-based convolutive SBCA proposed in this section also uses an instantaneous mixing model, as the other frequency-domain methods do. Furthermore, it does not suffer from the well-known permutation and scaling ambiguities of frequency-domain methods, because it updates only one temporal separator matrix instead of a separate spectral separator matrix for each frequency bin.

3.1. Blind Source Separation Setup in Frequency Domain

• We assume that there are $p$ bounded sources. We obtain time-frequency spectrum representations of the sources by using the STFT

$$s_i(m, f) = \sum_{n} s_i(n)\, v(n - mT)\, e^{-j 2\pi f n}, \quad i = 1, \ldots, p, \tag{20}$$

where $f \in [-1/2, 1/2]$ is the frequency index, $v(n)$ is a time-window function (Hamming, Kaiser, etc.) of length $R_1$, $R_1 \in \mathbb{Z}$ is the STFT frame length, $T \in \mathbb{Z}$ is the hop size, in samples, between successive DTFTs, $m \in L_v$ is the time index of the STFT, and $L_v$ is the index set of STFT frames taken over the signal samples. Thus, $p \times L$ temporal data are mapped to $p \times \dim\{L_v\}$ spectral data for each frequency bin $f$.

• Since the transformed source vectors are complex, we follow the isomorphism-based approach described in Section 2.4. We define $2p$-dimensional real isomorphic vectors as

$$\acute{s}(m, f) = \begin{bmatrix} \operatorname{Re}(s(m, f))^T & \operatorname{Im}(s(m, f))^T \end{bmatrix}^T. \tag{21}$$

• For each frequency bin, the isomorphic source sample vectors are assumed to lie in the unit $\ell_1$-norm ball, i.e., $B_{\acute{s}} = \{\acute{s}(m, f) \in \mathbb{R}^{2p} \mid \|\acute{s}(m, f)\|_1 \le 1, \forall m\}$.

• The mixing process occurs in a convolutive MIMO channel that is assumed to be linear, time-invariant (LTI), and equalizable [19].

• Let $y(m, f)$ be the filtered version of the source STFTs, i.e., the multiplication of the source STFTs and the DTFT of the convolutive mixing channel response:

$$y(m, f) = A(f)\, s(m, f), \tag{22}$$

where $A(f)$ is the matrix containing the frequency transform of the mixing filter elements at frequency bin $f$. When taking the inverse STFT of $y(m, f)$ and adding the results up with the corresponding overlaps, we obtain the same output as (2).

• Let $\tilde{y}(m, f) = [\tilde{y}_1(m, f), \tilde{y}_2(m, f), \ldots, \tilde{y}_q(m, f)]^T$ be the STFT of the mixtures, computed by $\tilde{y}_k(m, f) = \sum_n y_k(n)\, v(n - mT)\, e^{-j 2\pi f n}$. There is a small difference between $y(m, f)$ in (22) and $\tilde{y}(m, f)$ due to boundary effects [23]; the number of affected samples is equal to the mixing filter order. When using a suitable STFT window type of sufficient length and assuming a fast-decaying mixing channel response, we obtain a very good approximation for the mixing output, i.e.,

$$y(m, f) \approx \tilde{y}(m, f). \tag{23}$$

Under these circumstances, the frequency elements observed at the mixing channel outputs can be assumed to be instantaneous mixtures of the frequency elements of the sources, according to (22).

• Using (21) and the real isomorphic mapping operator $\Omega$ defined in Section 2.4, (22) is transformed to $\acute{y}(m, f) = \Omega(A(f))\, \acute{s}(m, f)$.

• Let $z(m, f)$ be the separator output of $y(m, f)$ in (22), obtained by multiplying $y(m, f)$ with the DTFT of the separator filter, i.e., $z(m, f) = B(f)\, y(m, f)$, where $B(f)$, $f \in [-1/2, 1/2]$, is the separator frequency response of dimension $p \times q$.

• The STFT of the separator output vector, $\tilde{z}(m, f) = [\tilde{z}_1(m, f), \tilde{z}_2(m, f), \ldots, \tilde{z}_p(m, f)]^T$, is computed by $\tilde{z}_k(m, f) = \sum_n z_k(n)\, v(n - mT)\, e^{-j 2\pi f n}$. On the basis of (23), $z(m, f)$ and $\tilde{z}(m, f)$ are approximately equal. Therefore, the temporal separator matrix can be estimated using $\tilde{z}(m, f)$ instead of $z(m, f)$ in the STFT domain.

• The separator outputs' real isomorphic vector is represented as $\acute{z}(m, f) = \Omega(B(f))\, \acute{y}(m, f)$. Writing the total frequency response as $F(f) = B(f) A(f)$, we can rewrite $\acute{z}(m, f)$ in terms of the sources as $\acute{z}(m, f) = \Omega(F(f))\, \acute{s}(m, f)$.

• For perfect separation in the frequency domain, the frequency response of the total system must satisfy the equality $\Omega(F(f)) = P D(f)$, where $P$ is a permutation matrix and $D(f)$ is a full-rank diagonal matrix.

Defining the isomorphic source sample set $\acute{S}(f) = \{\acute{s}(m, f) \in \mathbb{R}^{2p}, m \in L_v\}$, the following assumption is introduced for the frequency-domain BSS setup defined above:

Assumption (A2): For each frequency bin, $\acute{S}(f)$ contains the vertices (the corners of a volume) of its bounding $\ell_1$-norm ball $B_{\acute{s}}$.
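The STFT mapping in (20) can be sketched with a rectangular window and DFT frequencies $f = l/R_1$. This bare-bones version uses frame-local phase and no FFT, so it is an illustration of the transform, not an efficient implementation; the window choice and hop size are illustrative values.

```python
# A bare-bones STFT along the lines of (20): rectangular window of
# length R1, hop size T, and DFT frequencies l / R1 per frame.

import cmath

def stft(x, R1, T):
    """Return frames[m][l] = sum_n x[n + m*T] * exp(-2j*pi*l*n / R1)."""
    frames = []
    m = 0
    while m * T + R1 <= len(x):
        seg = x[m * T:m * T + R1]
        frames.append([sum(s * cmath.exp(-2j * cmath.pi * l * n / R1)
                           for n, s in enumerate(seg))
                       for l in range(R1)])
        m += 1
    return frames

x = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
S = stft(x, R1=4, T=4)
# Each frame sees a single impulse, so every bin has magnitude 1.
```

In the paper's setting this transform is applied per source and per mixture channel, and the resulting per-bin spectra are then stacked into the complex vectors $s(m,f)$ and $\tilde{y}(m,f)$ used above.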

3.2. Objective Function

To obtain a frequency-based objective function, we extend the volume definition of the principal hyper-ellipsoid in [17] to the frequency domain. In the frequency domain, the Power Spectral Density Matrix (PSDM) of the source processes can be defined as

$$P_s(f) = \lim_{L \to \infty} E\{s(f)\, s(f)^H\}, \tag{24}$$

where $s(f)$ is the source DTFT and $L$ is the sample size of $s(n)$. Using the relation between the source DTFTs and STFTs, (24) can be rewritten as

$$P_s(f) = \lim_{\dim\{L_v\} \to \infty} \sum_{m \in L_v} P_s(m, f), \tag{25}$$

where $P_s(m, f)$ is the instant PSDM of $s(m, f)$ [23]. The separator output PSDM can be related to the source PSDM by $P_z(f) = F(f)\, P_s(f)\, F(f)^H$, where $P_z(f) \in \mathbb{C}^{p \times p}$ and $F(f)$ is the frequency response of the total system. For a given frequency bin, this is equal to the covariance matrix of the separator outputs' spectrum. Thus, the determinant of the separator outputs' temporal covariance matrix used for instantaneous mixtures in [17] can be replaced by the determinant of the separator output PSDM in the frequency domain for convolutive mixtures. In this way, an extension of the principal hyper-ellipsoid's volume definition in [17] is obtained for convolutive mixtures in the frequency domain. Consequently, we propose the objective function

$$J(\tilde{B}) = \int_{-1/2}^{1/2} \tfrac{1}{2} \log\left(\det(\Omega(P_z(f)))\right) df - \int_{-1/2}^{1/2} 2p \log\left(\max_{m \in L_v} \|\acute{z}(m, f)\|_1\right) df, \tag{26}$$

where $\Omega(P_z(f))$ is the corresponding real isomorphic matrix. The integral terms in (26) make it possible to define the objective function with respect to the time-domain separator matrix $\tilde{B}$. Without the integrals, the objective function would have to be maximized for each frequency bin separately instead of being maximized with respect to $\tilde{B}$.

The following theorem ensures that maximization of the objective function in (26) achieves blind source separation of temporally convolutive mixtures.

Theorem 2: Assume that an FIR separator matrix can equalize the mixing channel $A(f)$. Given the BCA setup in Section 3.1, all global maxima of (26) give perfect separation if assumption (A2) holds.

The proof is provided in Appendix B.
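Assumption (A2), and likewise (A1), asks that the sample set contain the vertices $\pm e_i$ of its bounding $\ell_1$-norm ball. A toy membership check might look as follows; exact matching is used here for clarity, whereas real data would need a tolerance, and the function name is our own.

```python
# Illustrative check of the local dominance assumptions (A1)/(A2):
# does the sample set contain all 2*dim vertices +/- e_i of the
# unit l1 ball?

def contains_l1_vertices(samples):
    dim = len(samples[0])
    verts = []
    for i in range(dim):
        for sgn in (1.0, -1.0):
            v = [0.0] * dim
            v[i] = sgn
            verts.append(v)
    return all(v in samples for v in verts)

S_good = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [0.2, 0.3]]
S_bad = [[0.5, 0.5], [0.2, -0.3]]
# contains_l1_vertices(S_good) is True; for S_bad it is False
```

For sparse sources the vertices correspond to instants where a single source is active at its peak magnitude, which is why sparsity in time or frequency makes the assumption plausible.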

3.3. Iterative Algorithm

For the finite set of observations $\{y(1), \ldots, y(L)\}$, we modify the objective as

$$J(\tilde{B}) = \frac{1}{\beta} \sum_{l=-R_1+1}^{R_1-1} \Bigg( \underbrace{\tfrac{1}{2}\log\det(\Omega(\hat{P}_z(l)))}_{J_1(\tilde{B})} - \underbrace{2p \log \max_{m \in L_v} \|\acute{\tilde{z}}(m, l)\|_1}_{J_2(\tilde{B})} \Bigg), \tag{27}$$

where $\beta = 2R_1 - 1$ is the DFT size and $\hat{P}_z(l)$ is the PSDM estimate of the separator output. Note that $\acute{z}(m, l)$ is replaced with $\acute{\tilde{z}}(m, l)$ on the basis of (23). $\hat{P}_z(l)$ is defined in terms of the PSDM estimate of the mixture vector as $\hat{P}_z(l) = B(l)\, \hat{P}_y(l)\, B(l)^H$. $\hat{P}_y(l)$ is the PSDM estimate of the mixtures, given by

$$\hat{P}_y(l) = \sum_{m \in L_v} \hat{P}_y(m, l), \tag{28}$$

where $\hat{P}_y(m, l)$ is the instant PSDM estimate of the mixture signals. For the calculation of the estimated PSDM $\hat{P}_y(m, l)$, we use the spectrogram method [24], expressed as

$$\hat{P}_y(m, l) = \sum_{t} u(t)\, \tilde{y}_t(m, l)\, \tilde{y}_t(m, l)^H, \tag{29}$$

where $u(t)$ is a time window of length $R_2$ and $\tilde{y}_t(m, l) = \sum_n y(n + t)\, v(n - mT)\, e^{-j 2\pi (n + t) l / \beta}$. The window $u(t)$ provides a weighted averaging of $R_2$ spectra in order to reduce the variance of the estimate.
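The spectrogram-style estimate in (29) is, for each time-frequency point, a weighted average of outer products $\tilde{y}\,\tilde{y}^H$. The sketch below uses a uniform weight $u(t) = 1/R_2$ and synthetic spectra; the input format and names are illustrative.

```python
# Sketch of the spectrogram PSDM estimate (29) for one
# time-frequency point: average outer products v v^H over R2
# neighbouring complex spectrum vectors.

def psdm_estimate(spectra):
    """Uniformly weighted average of outer products over spectra."""
    q = len(spectra[0])
    R2 = len(spectra)
    P = [[0j] * q for _ in range(q)]
    for v in spectra:
        for i in range(q):
            for j in range(q):
                P[i][j] += v[i] * v[j].conjugate() / R2
    return P

spectra = [[1 + 0j, 1j], [1 + 0j, -1j]]
P = psdm_estimate(spectra)
# Diagonal entries are real powers; the cross terms cancel here,
# so P is approximately the identity matrix.
```

A tapered window $u(t)$, as in the paper, would simply replace the uniform weight $1/R_2$ with per-spectrum weights that sum to a constant.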

In order to prevent scaling and permutation ambiguities, we derive a time-domain separation approach based on the frequency-domain objective function in (27), instead of deriving separation matrices for each frequency bin separately.

The derivative of the first part of $J(\tilde{B})$ with respect to $B(k)$, for $k = 0, \ldots, K-1$, is

$$\frac{1}{\beta} \sum_{l=-R_1+1}^{R_1-1} \frac{\partial J_1(\tilde{B})}{\partial B(k)} = \partial_B^{\mathrm{logdet}} = \operatorname{Re}\{X_{11} + X_{22} + j(X_{21} - X_{12})\}. \tag{30}$$

The components $X_{ij} \in \mathbb{R}^{p \times q}$ are defined as follows:

$$\frac{1}{\beta} \sum_{l=-R_1+1}^{R_1-1} \Omega(\hat{P}_z(l))^{-1}\, \Omega(B(l))\, \Omega(\hat{P}_y(l))\, O^T = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix}, \tag{31}$$

where $O = \begin{bmatrix} \cos 2\pi k l/\beta & \sin 2\pi k l/\beta \\ -\sin 2\pi k l/\beta & \cos 2\pi k l/\beta \end{bmatrix}$. The spectral transform $B(l)$ in (31) is given explicitly as $B(l) = \sum_{k=-R_1+1}^{R_1-1} B(k)\, e^{-j 2\pi k l/\beta}$.

The derivative of the second part is

$$\sum_{l=-R_1+1}^{R_1-1} \frac{1}{\beta} \frac{\partial J_2(\tilde{B})}{\partial B(k)} = \partial_B^{\mathrm{subg}} = \frac{2p}{\beta} \sum_{l=-R_1+1}^{R_1-1} \frac{\operatorname{Re}\{\operatorname{sign}_c\{\tilde{z}(u_l, l)\}\, \tilde{y}(u_l, l)^H\, e^{j 2\pi k l/\beta}\}}{\max_{m \in L_v} \|\acute{\tilde{z}}(m, l)\|_1}, \tag{32}$$

where $u_l \in L_v$ denotes the index of the STFT frame for which the maximum $\ell_1$-norm at the separator output in frequency bin $l$ is achieved, and $\operatorname{sign}_c$ denotes the sign operator for a complex signal, computed as $\operatorname{Re}\{\operatorname{sign}\{a\}\} + i \operatorname{Im}\{\operatorname{sign}\{a\}\}$.

Finally, the update equation for the separator matrix in the time domain is given as

$$B^{(t+1)}(k) = B^{(t)}(k) + \sigma^{(t)} \left( \partial_{B^{(t)}}^{\mathrm{logdet}} - \partial_{B^{(t)}}^{\mathrm{subg}} \right). \tag{33}$$

4. Numerical Examples

In the experiments, as benchmark algorithms, we use the Kurtosis Maximization algorithm (MaxKurtosis), which optimizes a kurtosis-based contrast function [25]; the convolutive BSS method [1] exploiting the non-stationarity of source signals, labelled "convBSS"; the Alternating-Least-Squares based method [26], which again exploits non-stationarity, labelled "ALS"; the Spatio-Temporal FastICA algorithm (STFICA) [2], which is an extension of the FastICA method; time-based convolutive Bounded Component Analysis (CBCA) [27]; and Cho's underdetermined convolutive BSS [28]. We illustrate CBCA's performance only in the first and second experiments, to demonstrate CSBCA's superiority for sparse signals. Besides, we include Cho's method only in the third experiment, because it is an under-determined BSS method and its publicly available code is optimized for audio source separation. We use the MATLAB toolbox called BSS Evaluation, proposed in [29], which is designed specifically to measure the performance of algorithms on the Blind Audio Signal Separation problem.

4.1. Synthetic Sparse Signals

Figure 2: a) Synthetic signals (RIKEN). b) Output SDR vs. input SNR.

In the ﬁrst numerical example, we illustrate the performance of the time

based CSBCA. For the performance evaluation, we use synthetic signals in the

385

RIKEN Brain Science Institute benchmark dataset [30]. In the ﬁrst part of the

test, we consider a scenario with 1000 samples from 6 sources, and 12 mixtures.

The orders of i.i.d. Gaussian convolutive channel and separator ﬁlter are 3

and 4 respectively. In this part, we analyse Signal to Distortion Ratio (SDR)

performance versus input Signal to Noise Ratio (SNR). In the second part, we

390

consider a scenario with 1000 samples from 5 sources and analyse the perfor-

mance with respect to number of mixing channels and mixing order for input

SNR=20 dB. Note that noise is considered to be additive and white Gaussian

(AWGN) in all analyses. The source signals are illustrated in Fig.2-(a). The

18

result of the input SNR analysis is depicted in Fig.2-(b) and we can comment395

that time based CSBCA yields better or equal performance almost at all SNRs.

In the second part, the results are harder to interpret. First, the CBCA algorithm cannot satisfactorily extract the source signals even in the high-SNR case, as seen in Fig. 2-(b); consequently, changing the number of mixtures and the mixing channel order does not affect its performance much, as shown in Fig. 3. The convBSS algorithm takes advantage of the non-stationarity of the source signals and performs a joint diagonalization of the cross power spectral density matrices of the signals using a gradient descent algorithm. The ALS algorithm also exploits the non-stationarity, but differs from convBSS in two essential respects. First, it uses a procedure to solve the scaling problem inherent in frequency-domain approaches. Second, it uses a more efficient and faster-converging optimization method, Alternating Least Squares. The absence of a method for solving the scaling problem in particular may explain why convBSS cannot separate the sources satisfactorily even as the number of mixtures increases. As a result, these two differences give ALS its superiority.
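The synthetic mixing setup used above (an i.i.d. Gaussian FIR MIMO channel followed by AWGN at a prescribed input SNR) can be sketched as follows. The function and parameter names are illustrative, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(1)

def convolutive_mixture(sources, n_mixtures, channel_order, snr_db):
    """Mix p sources into q observations through an i.i.d. Gaussian
    FIR MIMO channel of the given order, then add AWGN at snr_db."""
    p, n_samples = sources.shape
    # H[k] is the q x p mixing matrix for lag k
    H = rng.standard_normal((channel_order + 1, n_mixtures, p))
    y = np.zeros((n_mixtures, n_samples))
    for k in range(channel_order + 1):
        shifted = np.zeros_like(sources)
        shifted[:, k:] = sources[:, :n_samples - k]
        y += H[k] @ shifted
    # scale the AWGN to the requested input SNR
    sig_power = np.mean(y**2)
    noise_power = sig_power / (10.0 ** (snr_db / 10.0))
    y += np.sqrt(noise_power) * rng.standard_normal(y.shape)
    return y

s = rng.standard_normal((6, 1000))        # 6 sources, 1000 samples
x = convolutive_mixture(s, 12, 3, 20.0)   # 12 mixtures, order-3 channel
print(x.shape)                            # (12, 1000)
```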

Figure 3: a)- Output SDR vs. mixing order. b)- Output SDR vs. number of mixing channels (compared algorithms: MaxKurtosis, STFICA, CBCA, ALS, convBSS, CSBCA).

To evaluate the performance of CSBCA's complex extension, we again use the synthetic sparse signals. In the complex CSBCA experiment, 3 complex sources are generated from the 6 synthetic sources.

Figure 4: Complex CSBCA: Output SDR vs. input SNR.

Fig. 4 illustrates the SNR performance of the complex CSBCA. There is no comparison with the benchmark algorithms listed above because their published codes work only for real signals.

4.2. Statistically Dependent Sources

Figure 5: a)- Selective Copula-T distribution random sparse sequences. b)- Output SDR vs. correlation coefficient (compared algorithms: MaxKurtosis, STFICA, CBCA, ALS, convBSS, CSBCA).

In the second example, we generate sparse and statistically dependent sources by producing a Copula-T distributed random vector $u \in [-1,1]^p$ and transforming this random vector for sparsification through the mapping

$$s = \begin{cases} u, & u \in L_r \\ 0, & \text{otherwise} \end{cases} \qquad (34)$$

where $L_r = \{x : \|x\|_r \le 1\}$ with $0 \le r \le 1$. We consider a scenario with 2000 samples from 3 sources and 6 mixtures. The performance of the time-domain CSBCA for different correlation degrees is examined and illustrated in Fig. 5. CSBCA outperforms the other algorithms for all correlation values. ALS appears to be the algorithm most affected by the source dependency. ALS and convBSS employ joint diagonalization of cross power spectral density matrices as the first stage of blind source separation; the source dependency can deteriorate this stage, which may explain why ALS performs worse as the correlation increases. The performance degradation of MaxKurtosis and STFICA with increasing correlation is expected, because these methods assume statistical mutual independence of the sources.
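A minimal sketch of the dependent-source generation in this example, assuming a t-copula with a common pairwise correlation and the $\ell_r$-ball mask of (34). The degrees of freedom and other details are assumptions, since the paper does not list them.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def copula_t_sparse_sources(p, n, rho, dof=4, r=1.0):
    """Generate p dependent sources on [-1, 1] through a t-copula with
    common pairwise correlation rho, then sparsify with the l_r-ball
    mask of Eq. (34): samples outside the ball are set to zero."""
    # multivariate t: correlated Gaussian divided by a chi-square factor
    C = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    L = np.linalg.cholesky(C)
    g = L @ rng.standard_normal((p, n))
    chi = rng.chisquare(dof, size=n)
    t_samples = g / np.sqrt(chi / dof)
    # t-copula: map marginals to uniforms, then to [-1, 1]
    u = 2.0 * stats.t.cdf(t_samples, dof) - 1.0
    # Eq. (34): keep a column of u only when ||u||_r <= 1
    inside = np.sum(np.abs(u) ** r, axis=0) ** (1.0 / r) <= 1.0
    return u * inside  # broadcast the mask over rows

s = copula_t_sparse_sources(p=3, n=2000, rho=0.5, r=1.0)
print(np.mean(np.all(s == 0, axis=0)))  # fraction of zeroed samples
```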

4.3. Speech Signal Separation

In the third example, we analyse the performance of the frequency-based CSBCA for speech signal separation, using the measured channel impulse responses in the MARDY (Multichannel Acoustic Reverberation Database at York) database [31] and three speech records (2 male and 1 female) in which different persons read different texts [32]. The MARDY database is an open resource giving dereverberation researchers access to real-life multichannel impulse responses. The recording setup of MARDY is described in detail in [33]. In that recording work, the reverberant impulse responses of the microphones were measured for three loudspeakers at different locations. In this experiment, the speech signals are assumed to be transmitted by these loudspeakers and received by three microphones. The recorded speech signals and the convolutive mixture signals are illustrated in Fig. 6. The selected input parameters for the algorithms are: FCSBCA: separation filter order = 80, STFT window length = 150, overlap ratio of STFT windows = 0.8; ALS: epoch length = 1000, FFT length = 512, overlap ratio of FFT windows = 0.7; STFICA: separator filter order = 100; convBSS: FFT length = 512, separator filter order = 100, number of matrices to diagonalize = 5; Cho: FFT length = 4096, shift length = 512; MaxKurtosis: separator filter order = 100. Note that the term "epoch" used in the ALS method means an FFT frame. The performances of the algorithms for different mixing channel orders and input SNRs are examined as shown in Fig. 7 and

Figure 6: Speech signals and their mixtures.

Figure 7: Separation of 3 speech signals in a reverberant environment. (a)- Output SDR vs. mixing channel order. (b)- Output SIR vs. mixing channel order (compared algorithms: STFICA, MaxKurtosis, ALS, convBSS, Cho, FCSBCA).

Fig. 8. In the channel order analysis, microphones with noiseless channel responses (SNR = Inf) are used. In the input SNR analysis, microphones with channel order = 500 are used. FCSBCA clearly yields better performance than all the other methods in terms of SDR. These results demonstrate the practical merit of the proposed algorithm.
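The STFT front end implied by the FCSBCA parameters above (window length 150, 80% overlap ratio) can be sketched as follows; the choice of a Hann window is an assumption, as the paper does not specify one.

```python
import numpy as np

def stft_frames(x, win_len=150, overlap_ratio=0.8):
    """Split a signal into overlapping windowed frames and take their
    DFTs, mirroring the FCSBCA front end described above."""
    hop = max(1, int(round(win_len * (1.0 - overlap_ratio))))
    window = np.hanning(win_len)           # window choice is an assumption
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i*hop : i*hop + win_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)     # one spectrum per frame

x = np.random.randn(16000)                 # stand-in for a speech record
Z = stft_frames(x)
print(Z.shape)                             # (n_frames, win_len // 2 + 1)
```

With an 80% overlap the hop size is 30 samples, so consecutive frames share most of their support, which is what lets a frequency-domain separator track the short-term non-stationarity of speech.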

5. Conclusion

In this article, we propose time- and frequency-based deterministic BSS approaches for convolutive mixtures of bounded sparse sources. This framework is a natural extension of the instantaneous SBCA framework in [17]. The

Figure 8: Separation of 3 speech signals in a reverberant environment. (a)- Output SDR vs. input SNR. (b)- Output SIR vs. input SNR (compared algorithms: STFICA, MaxKurtosis, ALS, convBSS, Cho, FCSBCA).

proposed approaches do not assume statistical independence in the space, time, or frequency domain. Moreover, the frequency-based CSBCA has the advantages associated with frequency-domain approaches but does not suffer from the well-known permutation and scaling ambiguities, because it updates a temporal separator matrix instead of a separate separator matrix for each frequency bin. We demonstrate the performance improvement over some well-known convolutive BSS methods, and study a practical application of blind speech separation.

Appendix A:

As the first step of the proof of Theorem 1, we rewrite the objective function in terms of the arguments $\tilde{F}$ and $\Gamma_N(\tilde{F})$. The sample covariance matrix can be written as $\hat{R}_{\tilde{z}_N} = \Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T$, where $L_3 = N + P - 1$. Using this in (9), the objective function becomes

$$J(\tilde{F}) = \frac{\det\big(\Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T\big)}{\big(\max_{n \in \{N,\ldots,L_1\}} \|\Gamma_N(\tilde{F})\,\tilde{s}_{L_3}(n)\|_1\big)^{Np}}. \quad (A.1)$$

Under assumption (A1), the denominator of (A.1) satisfies

$$\Big(\max_{n \in \{N,\ldots,L_1\}} \|\Gamma_N(\tilde{F})\,\tilde{s}_{L_3}(n)\|_1\Big)^{Np} \le \|\Gamma_N(\tilde{F})\|_{1,1}^{Np}, \quad (A.2)$$

where $\|\Gamma_N(\tilde{F})\|_{1,1} = \big\|\big[\|\Gamma_N(\tilde{F})_{:,1}\|_1, \ldots, \|\Gamma_N(\tilde{F})_{:,L_3 p}\|_1\big]\big\|_\infty$. Under assumption (A1), (A.2) must hold with equality, so that (A.1) becomes

$$J(\tilde{F}) = \frac{\det\big(\Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T\big)}{\big\|\big[\|\Gamma_N(\tilde{F})_{:,1}\|_1, \ldots, \|\Gamma_N(\tilde{F})_{:,L_3 p}\|_1\big]\big\|_\infty^{Np}}. \quad (A.3)$$

Since we choose $N \ge P$, there is at least one block column inside $\Gamma_N(\tilde{F})$ that contains $\tilde{F}$ in upside-down form. Therefore, we have the equality $\big\|\big[\|\Gamma_N(\tilde{F})_{:,1}\|_1, \ldots, \|\Gamma_N(\tilde{F})_{:,L_3 p}\|_1\big]\big\|_\infty = \big\|\big[\|\Gamma_N(\tilde{F})_{1,:}\|_1, \ldots, \|\Gamma_N(\tilde{F})_{Np,:}\|_1\big]\big\|_\infty$. This equality enables the objective to be bounded as

$$J(\tilde{F}) \le \frac{\det\big(\Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T\big)}{\Big(\frac{1}{Np}\sum_{m=1}^{Np} \|\Gamma_N(\tilde{F})_{m,:}\|_1\Big)^{Np}} \quad (A.4)$$

$$\le \frac{\det\big(\Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T\big)}{\|\Gamma_N(\tilde{F})_{1,:}\|_1\,\|\Gamma_N(\tilde{F})_{2,:}\|_1 \cdots \|\Gamma_N(\tilde{F})_{Np,:}\|_1} \quad (A.5)$$

$$\le \frac{\det\big(\Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T\big)}{\|\Gamma_N(\tilde{F})_{1,:}\|_2\,\|\Gamma_N(\tilde{F})_{2,:}\|_2 \cdots \|\Gamma_N(\tilde{F})_{Np,:}\|_2}. \quad (A.6)$$

To simplify the upper bound expression further, we can rewrite the numerator of the objective by completing $\Gamma_N(\tilde{F})$ to a full-rank square matrix and applying the Schur complement to the resulting matrix. Let us define a $(P-1)p \times L_3 p$ matrix $X = DP$, where $D = \mathrm{diag}(a_1; a_2; \ldots; a_{(P-1)p})$ is a full-rank diagonal matrix and $P$ is a permutation matrix. Let us assume that we select an appropriate $X$ matrix satisfying $\det(XWX^T) = 1$, where $W = \hat{R}_{\tilde{s}_{L_3}} - \hat{R}_{\tilde{s}_{L_3}}\Gamma_N(\tilde{F})^T\big(\Gamma_N(\tilde{F})\hat{R}_{\tilde{s}_{L_3}}\Gamma_N(\tilde{F})^T\big)^{-1}\Gamma_N(\tilde{F})\hat{R}_{\tilde{s}_{L_3}}$. Using the matrix $X$, we can express the determinant in a different way:

$$\begin{aligned}
\det\!\left(\begin{bmatrix}\Gamma_N(\tilde{F})\\ X\end{bmatrix}\hat{R}_{\tilde{s}_{L_3}}\begin{bmatrix}\Gamma_N(\tilde{F})^T & X^T\end{bmatrix}\right)
&= \det\!\big(\Gamma_N(\tilde{F})\hat{R}_{\tilde{s}_{L_3}}\Gamma_N(\tilde{F})^T\big)\,\det\!\Big(X\big(\hat{R}_{\tilde{s}_{L_3}} - \hat{R}_{\tilde{s}_{L_3}}\Gamma_N(\tilde{F})^T(\Gamma_N(\tilde{F})\hat{R}_{\tilde{s}_{L_3}}\Gamma_N(\tilde{F})^T)^{-1}\Gamma_N(\tilde{F})\hat{R}_{\tilde{s}_{L_3}}\big)X^T\Big)\\
&= \det\!\big(\Gamma_N(\tilde{F})\hat{R}_{\tilde{s}_{L_3}}\Gamma_N(\tilde{F})^T\big)\,\det\!\big(XWX^T\big)\\
&= \det\!\big(\Gamma_N(\tilde{F})\hat{R}_{\tilde{s}_{L_3}}\Gamma_N(\tilde{F})^T\big). \quad (A.7)
\end{aligned}$$

$XWX^T$ is the Schur complement of $\Gamma_N(\tilde{F})\hat{R}_{\tilde{s}_{L_3}}\Gamma_N(\tilde{F})^T$ in the block matrix above. If we select an appropriate $X$ matrix that satisfies this condition and use Hadamard's inequality, we obtain

$$\det\!\big(\Gamma_N(\tilde{F})\hat{R}_{\tilde{s}_{L_3}}\Gamma_N(\tilde{F})^T\big) \le \prod_{m=1}^{Np}\|\Gamma_N(\tilde{F})_{m,:}\|_2^2 \prod_{n=1}^{(P-1)p}\|X_{n,:}\|_2^2\;\det\!\big(\hat{R}_{\tilde{s}_{L_3}}\big). \quad (A.8)$$

Inserting the inequality (A.8) into (A.5), a new upper bound is obtained as

$$J(\tilde{F}) \le \prod_{n=1}^{(P-1)p}\|X_{n,:}\|_2\;\det\!\big(\hat{R}_{\tilde{s}_{L_3}}\big)^{1/2}. \quad (A.9)$$

Since $\Gamma_N(\tilde{F})$ is a block-Toeplitz matrix, each of its rows corresponds to a row of $\tilde{F}$. Moreover, since we choose $N \ge P$, it is guaranteed that there is a block column of $\Gamma_N(\tilde{F})$ that contains $[F(0), F(1), \ldots, F(P-1)]$. Thus, we will reason about $\tilde{F}$ instead of $\Gamma_N(\tilde{F})$ in the above inequalities.

The inequalities in (A.4) and (A.5) hold with equality only if all rows of $\tilde{F}$ have the same $\ell_1$ norm. The inequality between the $\ell_1$ and $\ell_2$ norms in (A.6) holds with equality only if there is a single non-zero entry in each row of $\tilde{F}$. Hadamard's inequality in (A.8) turns into an equality only if the non-zero entries in the rows of $\tilde{F}$ are in different positions with respect to mod $p$, so that the rows of $\Gamma_N(\tilde{F})$ are orthogonal to each other and to the rows of $X$. To satisfy all these conditions and achieve the upper bound in (A.9), the overall system transfer matrix must have the form $F(z) = \mathrm{diag}\big(\alpha_1 z^{-d_1}, \alpha_2 z^{-d_2}, \ldots, \alpha_p z^{-d_p}\big)P$, where $P$ is a permutation matrix.
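The Hadamard-type bound used in (A.8) and its equality condition (rows with a scaled-permutation structure are orthogonal) can be checked numerically. This sketch uses the identity-covariance special case $\hat{R}_{\tilde{s}_{L_3}} = I$ for simplicity, so it illustrates only the Hadamard step, not the full bound.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hadamard's inequality for the Gram matrix G G^T:
# det(G G^T) <= prod_m ||G_{m,:}||_2^2, with equality when the
# rows of G are orthogonal (e.g. a scaled permutation structure).
def hadamard_gap(G):
    lhs = np.linalg.det(G @ G.T)
    rhs = np.prod(np.sum(G**2, axis=1))
    return lhs, rhs

G_random = rng.standard_normal((4, 6))
lhs, rhs = hadamard_gap(G_random)
assert lhs <= rhs + 1e-9              # strict inequality in general

# one non-zero per row, all in different columns -> orthogonal rows
G_perm = np.zeros((4, 6))
for m, (col, a) in enumerate(zip([2, 0, 5, 3], [1.5, -2.0, 0.5, 3.0])):
    G_perm[m, col] = a
lhs, rhs = hadamard_gap(G_perm)
assert abs(lhs - rhs) < 1e-9          # equality at the extremal structure
```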

Appendix B:

As the first step of the proof of Theorem 2, the PSD term in (26) is written explicitly as $\Omega(P_z(f)) = \Omega(F(f))\,\Omega(P_s(f))\,\Omega(F(f))^T$. Then the first term in (26) becomes

$$J_1(\tilde{B}) = \int_{f=-1/2}^{1/2} \Big[ \log|\det(\Omega(F(f)))|^2 + \log\big(\det(\Omega(P_s(f)))\big) \Big]\, df. \quad (B.1)$$

The determinant term for $\Omega(F(f))$ can be bounded column-wise using Hadamard's inequality:

$$\log\big(|\det(\Omega(F(f)))|^2\big) \le \log\Big(\prod_{k=1}^{2p} \|\Omega(F(f))_{:,k}\|_2^2\Big) = \sum_{k=1}^{2p} \log\big(\|\Omega(F(f))_{:,k}\|_2^2\big). \quad (B.2)$$

Consequently, replacing the inequality (B.2) into (26), we reach the following expression for the objective function:

$$J(F(f)) \le \int_{f=-1/2}^{1/2} \Big[ \sum_{k=1}^{2p} \log\big(\|\Omega(F(f))_{:,k}\|_2\big) + \frac{1}{2}\log\big(\det(\Omega(P_s(f)))\big) - 2p \log\Big( \max_{m \in L_v} \|\acute{z}(m,f)\|_1 \Big) \Big]\, df. \quad (B.3)$$

The volume term for the $\ell_1$-norm ball in (B.3) can be written explicitly as $\big(\max_{m \in L_v} \|\acute{z}(m,f)\|_1\big)^{2p} = \big(\max_{m \in L_v} \|\Omega(F(f))\,\acute{s}(m,f)\|_1\big)^{2p}$. Recall that the samples of $\acute{s}(m,f)$ lie inside the unit $\ell_1$-norm ball. Then, for the maximum term, we can write the inequality $\big(\max_{m \in L_v} \|\Omega(F(f))\,\acute{s}(m,f)\|_1\big)^{2p} \le \|\Omega(F(f))\|_{1,1}^{2p}$. If assumption (A2) holds, this inequality turns into an equality.

Using the relations between the $\ell_\infty$, $\ell_1$ and $\ell_2$ norms, we can put forward the following inequalities:

$$\|\Omega(F(f))\|_{1,1}^{2p} = \big\|\big[\|\Omega(F(f))_{:,1}\|_1, \ldots, \|\Omega(F(f))_{:,2p}\|_1\big]\big\|_\infty^{2p} \quad (B.4)$$
$$\ge \Big(\frac{1}{2p}\sum_{k=1}^{2p}\|\Omega(F(f))_{:,k}\|_1\Big)^{2p} \quad (B.5)$$
$$\ge \|\Omega(F(f))_{:,1}\|_1\,\|\Omega(F(f))_{:,2}\|_1 \cdots \|\Omega(F(f))_{:,2p}\|_1 \quad (B.6)$$
$$\ge \|\Omega(F(f))_{:,1}\|_2\,\|\Omega(F(f))_{:,2}\|_2 \cdots \|\Omega(F(f))_{:,2p}\|_2. \quad (B.7)$$

Replacing the inequality (B.7) into (B.3), the upper bound on the objective becomes

$$J(F(f)) \le \frac{1}{2}\int_{f=-1/2}^{1/2} \log\big(\det(\Omega(P_s(f)))\big)\, df. \quad (B.8)$$

The norm inequality in (B.5) and the arithmetic-geometric mean inequality in (B.6) hold with equality only if all columns of $\Omega(F(f))$ have the same $\ell_1$ norm. The norm inequality in (B.7) holds with equality only if there is a single non-zero entry in each column of $\Omega(F(f))$. Hadamard's inequality in (B.2) holds with equality only if all columns of $\Omega(F(f))$ are orthogonal to each other. As a result, to achieve the upper bound in (B.8), the system frequency response must have the form $\Omega(F(f)) = \alpha P$, where $P$ is a permutation matrix.
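The norm chain (B.4)-(B.7), together with its equality case at a scaled permutation, can likewise be verified numerically in a small sketch:

```python
import numpy as np

rng = np.random.default_rng(4)

# For a matrix M with K columns, the chain (B.4)-(B.7) asserts
# max_k(||M_{:,k}||_1)^K >= prod_k ||M_{:,k}||_1 >= prod_k ||M_{:,k}||_2,
# with equality throughout when each column has a single non-zero entry
# of common magnitude (a scaled permutation).
def chain(M):
    l1 = np.sum(np.abs(M), axis=0)       # per-column l1 norms
    l2 = np.sqrt(np.sum(M**2, axis=0))   # per-column l2 norms
    K = M.shape[1]
    return np.max(l1)**K, np.prod(l1), np.prod(l2)

a, b, c = chain(rng.standard_normal((5, 5)))
assert a >= b - 1e-9 and b >= c - 1e-9   # inequalities in general

alpha = 0.7
P = np.eye(5)[:, [3, 0, 4, 1, 2]]        # permutation matrix
a, b, c = chain(alpha * P)               # scaled permutation
assert abs(a - b) < 1e-9 and abs(b - c) < 1e-9   # equality case
```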

References

[1] L. Parra, C. Spence, Convolutive blind source separation of non-stationary sources, IEEE Trans. on Speech and Audio Processing 8 (3) (May 2000) 320–327. (Code available at http://bsp.teithe.gr/members/downloads.html)

[2] S. C. Douglas, M. Gupta, H. Sawada, S. Makino, Spatio-temporal FastICA algorithms for the blind separation of convolutive mixtures, IEEE Trans. on Audio, Speech, and Language Processing 15 (5) (July 2007) 1511–1520.

[3] T. Oktem, A. T. Erdogan, A. Demir, Adaptive receiver structures for fiber communication systems employing polarization-division multiplexing, Journal of Lightwave Technology 28 (10) (15 May 2010) 1536–1546.

[4] C. Vayá, J. J. Rieta, C. Sánchez, D. Moratal, Convolutive blind source separation algorithms applied to the electrocardiogram of atrial fibrillation: Study of performance, IEEE Transactions on Biomedical Engineering 54 (8) (August 2007) 1530–1533.

[5] D. Nion, N. D. Sidiropoulos, Adaptive algorithms to track the PARAFAC decomposition of a third-order tensor, IEEE Transactions on Signal Processing 57 (6) (June 2009) 2299–2310.

[6] Z. Koldovský, P. Tichavský, Time-domain blind audio source separation using advanced ICA methods, in: 8th Annual Conference of the International Speech Communication Association, August 2007, pp. 846–849.

[7] S. Makino, H. Sawada, T.-W. Lee, Blind Speech Separation, Vol. 615, Springer, 2007.

[8] P. Smaragdis, Blind separation of convolved mixtures in the frequency domain, Neurocomputing 22 (1-3) (20 November 1998) 21–34.

[9] H. Attias, New EM algorithms for source separation and deconvolution, in: Acoustics, Speech, and Signal Processing, 2003 IEEE International Conference on, Vol. 5, IEEE, 6-10 April 2003, pp. 297–300.

[10] O. Yilmaz, S. Rickard, Blind separation of speech mixtures via time-frequency masking, IEEE Transactions on Signal Processing 52 (7) (July 2004) 1830–1847.

[11] P. Comon, Independent component analysis, a new concept?, Signal Processing 36 (3) (April 1994) 287–314.

[12] A. Hyvarinen, E. Oja, A fast fixed-point algorithm for independent component analysis, Neural Computation 9 (7) (10 July 1997) 1483–1492.

[13] Z. Koldovský, P. Tichavský, E. Oja, Efficient variant of algorithm FastICA for independent component analysis attaining the Cramér-Rao lower bound, IEEE Transactions on Neural Networks 17 (5) (October 2006) 1265–1277.

[14] Z. He, S. Xie, S. Ding, A. Cichocki, Convolutive blind source separation in the frequency domain based on sparse representation, IEEE Transactions on Audio, Speech, and Language Processing 15 (5) (July 2007) 1551–1563.

[15] S. Cruces, Bounded component analysis of linear mixtures: A criterion of minimum convex perimeter, IEEE Transactions on Signal Processing 58 (4) (15 January 2010) 2141–2154.

[16] A. T. Erdogan, A class of bounded component analysis algorithms for the separation of both independent and dependent sources, IEEE Transactions on Signal Processing 61 (22) (15 November 2013) 5730–5743.

[17] E. Babatas, A. T. Erdogan, An algorithmic framework for sparse bounded component analysis, IEEE Transactions on Signal Processing 66 (19) (01 October 2018) 5194–5205.

[18] E. Babatas, A. T. Erdogan, Sparse bounded component analysis for convolutive mixtures, in: Acoustics, Speech, and Signal Processing, 2018 IEEE International Conference on, IEEE, 15-20 April 2018, pp. 1–5.

[19] Y. Inouye, R.-W. Liu, A system-theoretic foundation for blind equalization of an FIR MIMO channel system, IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications 49 (4) (April 2002) 425–436.

[20] A. Bagirov, N. Karmitsa, M. M. Makela, Introduction to Nonsmooth Optimization: Theory, Practice and Software, Springer, August 2014.

[21] H. A. Inan, A. T. Erdogan, S. Cruces, Stationary point characterization for a class of BCA algorithms, IEEE Transactions on Signal Processing 65 (20) (2017) 5437–5452.

[22] S. Arora, R. Ge, T. Ma, A. Moitra, Simple, efficient, and neural algorithms for sparse coding, in: Conference on Learning Theory (COLT), Proceedings of Machine Learning Research, 3–6 July 2015.

[23] T. Mei, A. Mertins, F. Yin, J. Xi, J. Chicharo, Blind source separation for convolutive mixtures based on the joint diagonalization of power spectral density matrices, Signal Processing 88 (8) (August 2008) 1990–2007.

[24] W. Martin, P. Flandrin, Wigner-Ville spectral analysis of nonstationary processes, IEEE Transactions on Acoustics, Speech, and Signal Processing 33 (6) (1985) 1461–1470.

[25] M. Castella, E. Moreau, A new method for kurtosis maximization and source separation, in: Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, IEEE, 14-19 March 2010, pp. 2670–2673. (Code available at http://bass-db.gforge.inria.fr/bss_locate/)

[26] K. Rahbar, J. P. Reilly, A frequency domain method for blind source separation of convolutive audio mixtures, IEEE Trans. on Speech and Audio Processing 13 (5) (September 2005) 832–844.

[27] H. A. Inan, A. T. Erdogan, A convolutive bounded component analysis framework for potentially nonstationary independent and/or dependent sources, IEEE Transactions on Signal Processing 63 (1) (01 January 2015) 18–30.

[28] J. Cho, C. D. Yoo, Underdetermined convolutive BSS: Bayes risk minimization based on a mixture of super-Gaussian posterior approximation, IEEE Transactions on Audio, Speech, and Language Processing 23 (5) (06 March 2015) 828–839.

[29] E. Vincent, R. Gribonval, C. Févotte, Performance measurement in blind audio source separation, IEEE Trans. on Audio, Speech and Language Processing 14 (4) (19 June 2006) 1462–1469. (Code available at http://bass-db.gforge.inria.fr/bss_eval/)

[30] A. Cichocki, S.-I. Amari, K. Siwek, T. Tanaka, A. H. Phan, R. Zdunek, S. Cruces, P. Georgiev, Y. Washizawa, Z. Leonowicz, et al., ICALAB toolboxes, URL: http://www.bsp.brain.riken.jp/ICALAB (2007).

[31] Speech and Audio Processing Group, MARDY database, http://www.commsp.ee.ic.ac.uk/sap/resources/mardy-multichannel-acoustic-reverberation-database-at-york-database/.

[32] V. G. Reju, S. N. Koh, I. Y. Soon, Underdetermined convolutive blind source separation via time-frequency masking, IEEE Transactions on Audio, Speech and Language Processing 18 (1) (January 2010) 101–116. (Code available at https://www.mathworks.com/matlabcentral/fileexchange/47069-convolutive-bss)

[33] J. Y. Wen, N. D. Gaubitch, E. A. Habets, T. Myatt, P. A. Naylor, Evaluation of speech dereverberation algorithms using the MARDY database, in: Proc. Intl. Workshop Acoust. Echo Noise Control (IWAENC), 12-14 September 2006.