Time and Frequency Based Sparse Bounded
Component Analysis Algorithms for Convolutive
Mixtures
Eren Babatas, Alper T. Erdogan
Electrical-Electronics Engineering Dept., Koc University, Istanbul, 34450, Turkey
Email addresses: ebabatas@ku.edu.tr (Eren Babatas), alperdogan@ku.edu.tr (Alper T. Erdogan)
Abstract
In this paper, we introduce time-domain and frequency-domain versions of a new Blind Source Separation (BSS) approach to extract bounded-magnitude sparse sources from convolutive mixtures. We derive algorithms by maximization of the proposed objective functions, which are defined in a completely deterministic framework, and prove that the global maxima of the objective functions yield perfect separation under suitable conditions. The derived algorithms can be applied to temporally or spatially dependent sources as well as independent sources. We provide experimental results to demonstrate some benefits of the approach, including an application to blind speech separation.
Keywords: Convolutive Blind Source Separation, Bounded Component
Analysis, Sparse Component Analysis, Sparse Bounded Component Analysis,
Blind Speech Separation.
1. Introduction
Convolutive Blind Source Separation (BSS) is a generic inverse problem with
broad impact as the extraction of multiple sources from their space-time mix-
tures is a common problem in various engineering areas. The acoustic source
separation in reverberant environments [1, 2] can be considered as the signature
5
Email addresses: ebabatas@ku.edu.tr (Eren Babatas), alperdogan@ku.edu.tr (Alper
T. Erdogan)
Preprint submitted to Journal of Signal Processing March 11, 2020
case, however, the applications span a much wider range including digital com-
munications [3], electrocardiogram measurements [4], radar signal processing
[5].
While there are many different approaches to solving the convolutive mixing problem in BSS, the related methods can be arranged into two main categories according to their processing domain. The first group is referred to as time-domain methods, where the separator implementation and the algorithm are based on convolution with a Multiple Input Multiple Output (MIMO) FIR filter [2, 6]. The main difficulty with these methods is that compensating for the mixing may require long FIR filters in order to achieve acceptable performance. This situation is very likely especially for audio signals under reverberation [7] and increases the processing load of the convolution operation in time. Longer filter lengths also imply more coefficients to be learned during the training period. The methods in the second group, frequency-domain methods, map the temporal mixtures to frequency spectrum data so that temporal convolution is converted to multiplication in the frequency domain [8]. Hereby, instantaneous BSS algorithms can be applied to each frequency bin separately. The frequency transformation is executed using different methods such as the sliding-window Discrete Fourier Transform (DFT) [9] and the Short Time Fourier Transform (STFT) [10]. Frequency-domain methods have two important issues: scaling and permutation inconsistencies, which cause unequal scaling of the spectral components in different bins and spectral mixing of the source components. Also, the mixing process is complex valued because of the frequency transformation, which increases the complexity and the processing load of the frequency-domain methods. On the other hand, improved efficiency and better convergence features are considered advantages of the frequency-domain methods.
BSS algorithms exploit different assumptions on the sources and the mixing systems to solve the BSS problem. Among them, the mutual independence of sources is a strong assumption that is used by the popular Independent Component Analysis (ICA) method, which was initially proposed for the instantaneous mixing case [11]. The ICA approach was later extended to convolutive BSS settings as well. As an example, Douglas [2] proposed a spatio-temporal extension of the well-known FastICA algorithm of Hyvarinen and Oja [12]. Similarly, Koldovský [6] applied the powerful ICA algorithm EFICA [13] to convolutive mixtures for time-domain separation. There are many other convolutive BSS algorithm categories based on different properties. Several BSS algorithms [1] exploit the non-stationarity of sources, which is a very common characteristic of speech signals. In another category, Sparse Component Analysis (SCA) methods have also been a useful tool in convolutive BSS applications. Although many source signals are not sparse in the time domain, they turn out to be sparse when transformed to the frequency domain or the time-frequency domain. Therefore, frequency-domain SCA methods have recently received more attention, with the advantage that they can also be applied in the under-determined mixing case for convolutive BSS [14].
Bounded Component Analysis (BCA) is a recently introduced BSS framework based on a domain separability assumption, which is less strict than the independence assumption of ICA. Cruces showed in [15] that the source boundedness side information can be used to separate temporally or spatially dependent sources as well as independent sources. In [16], Erdogan presented a deterministic framework based on geometric optimization settings. In [17], we proposed an extension of the instantaneous BCA approach in [16] to sparse bounded signals. This extension is referred to as Sparse BCA (SBCA). SBCA modifies the geometric framework in [16] such that the sources are assumed to be bounded by an $\ell_1$-norm ball instead of the $\ell_\infty$-norm ball used in [16].
In this article, we propose time-based and frequency-based convolutive versions of the SBCA approach in [17] for the (over)determined case. The time-based approach was first introduced in the conference paper [18]; here, the extension of this time-based approach to complex source signals is also explained. Beyond that, there are two main contributions of this article with respect to the conference paper:

• A new algorithm that uses a frequency-based objective function in order to sparsify the source signals by mapping them to frequency spectrum data.

• A practical application of the algorithm to blind speech separation in a reverberant scene.
The article is organized as follows: In Section 2, we introduce the time-domain convolutive BSS setup and propose a time-domain-sparsity based convolutive SBCA approach. In Section 3, we map the convolutive mixtures to frequency spectrum data via the STFT and describe a new signal setup in the frequency domain. Then, we propose a frequency-domain-sparsity based SBCA optimization setting. Finally, numerical examples are given in Section 4, including a blind speech separation application.
This paper uses the following notation: Let $\tilde{X}$ denote a convolutive channel response defined as $\tilde{X} = [X(0), X(1), \ldots, X(K-1)]$, where $K$ is the channel order. $\Gamma_N(\tilde{X})$ represents the block-Toeplitz matrix of $\tilde{X}$. Let $\|x\|_a$ denote the $\ell_a$-norm of the vector $x$, and let $\|X\|_{a,b}$ denote the induced matrix norm formulated as $\sup_{\|s\|_b \le 1} \|Xs\|_a$. $\Upsilon : \mathbb{C}^p \rightarrow \mathbb{R}^{2p}$ denotes the operator transforming a complex signal vector of size $p$ to its real isomorphic counterpart of size $2p$; $\acute{x}$ is shorthand for $\Upsilon(x)$. Let $\tilde{x}_K(n) = [x^T(n), x^T(n-1), \ldots, x^T(n-K+1)]^T$ denote the convolutive signal vector of $x(n)$; then $\acute{\tilde{x}}_K(n)$ is the real isomorphic vector for $\tilde{x}_K(n)$. Let $\Omega(\tilde{X})$ denote the real isomorphic matrix for the convolutive channel response $\tilde{X}$, where $\Omega : \mathbb{C}^{q \times Kp} \rightarrow \mathbb{R}^{2q \times 2Kp}$. $x(m,f)$ is the Short Time Fourier Transform of $x(n)$. Let $P_x(f)$ denote the Power Spectral Density Matrix of $x(f)$ and $P_x(m,f)$ denote the instantaneous PSDM of $x(m,f)$.
2. Time-Domain-Sparsity Based Convolutive SBCA Framework
Before defining the time-domain convolutive SBCA framework, we give a short introduction to the instantaneous SBCA framework proposed in [17]. In the instantaneous SBCA framework,

• There are $p$ sources represented by the sample set $S = \{s(n) \in \mathbb{R}^p, n = 1, \ldots, L\}$, and the source samples are bounded by an $\ell_1$-norm ball, i.e., $s(n) \in B_s$ where $B_s = \{q \in \mathbb{R}^p \mid \|q\|_1 < 1\}$. In the more general case, $B_s$ is replaced with a weighted $\ell_1$-norm ball.

• The source signals are mixed through a linear, memoryless and lossless channel whose time response is denoted by the full-rank matrix $A \in \mathbb{R}^{q \times p}$, where $q \ge p$. The mixed signals are formulated as $y(n) = A\, s(n)$.

• The mixed signals are filtered by an instantaneous separator, denoted by the matrix $B \in \mathbb{R}^{p \times q}$. The separator outputs are formulated as $z(n) = B\, y(n)$.

• The cascade of the mixing and separator systems is given by $F = BA$. The aim of BSS algorithms is to obtain a separator matrix that provides the equality $F = PD$, where $P$ is a permutation matrix and $D$ is a full-rank diagonal scaling matrix.
The SBCA algorithm solves the BSS problem described above by using the geometric optimization setting illustrated in Figure 1, i.e., the volume ratio of two geometric objects called the principal hyper-ellipsoid (red ball in Figure 1) and the bounding $\ell_1$-norm ball (green diamond-shaped box in Figure 1). The SBCA objective function derived from these volumes is defined as

J(B) = \frac{\sqrt{\det(\hat{R}_z)}}{\big(\max_{n\in\{1,\ldots,L\}}\|z(n)\|_1\big)^{p}}.   (1)

[Figure 1: Geometric objects used in the SBCA framework. Diamond boxes: the bounding $\ell_1$-norm balls; red balls: the principal hyper-ellipsoids; green polytope: the image of the input $\ell_1$-norm ball under the mapping.]

In this formula, the numerator represents the volume of the principal hyper-ellipsoid, i.e., $E_z = \{q \mid (q-\hat{\mu}_z)^T \hat{R}_z^{-1} (q-\hat{\mu}_z) \le 1\}$, where $\hat{R}_z$ is the sample covariance matrix of the separator outputs, formulated as $\hat{R}_z = \frac{1}{L}\sum_{n=1}^{L} z(n)z(n)^T - \hat{\mu}_z\hat{\mu}_z^T$. The denominator is the volume of the bounding $\ell_1$-norm ball, where $\|z(n)\|_1$ is the $\ell_1$-norm of the separator output. Note that the number of sources is known as a priori information. The detailed derivations of the objective function and the resulting iterative update equation are provided in [17].
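To make the volume-ratio objective concrete, the following minimal Python/NumPy sketch evaluates (1) for a given separator matrix. It is only an illustration under the definitions above (the variable names and the random test data are ours), not the reference implementation of [17].

import numpy as np

def sbca_objective(B, Y):
    """Evaluate the instantaneous SBCA objective (1) for separator B.
    B: (p, q) separator matrix, Y: (q, L) mixture samples."""
    Z = B @ Y                                    # separator outputs z(n) = B y(n)
    mu = Z.mean(axis=1, keepdims=True)
    Rz = (Z @ Z.T) / Z.shape[1] - mu @ mu.T      # sample covariance of z(n)
    ellipsoid = np.sqrt(np.linalg.det(Rz))       # principal hyper-ellipsoid volume term
    ball = np.max(np.abs(Z).sum(axis=0)) ** B.shape[0]   # (max_n ||z(n)||_1)^p
    return ellipsoid / ball

# toy usage with placeholder data: p = 3 sources, q = 4 mixtures, L = 1000 samples
rng = np.random.default_rng(0)
Y = rng.laplace(size=(4, 1000))
B = rng.standard_normal((3, 4))
print(sbca_objective(B, Y))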
2.1. Convolutive Blind Source Separation Setup in Time Domain
For the convolutive BSS setup,

• The sources are represented by the set $S = \{s(n) \in \mathbb{R}^p\}$ and are assumed to be bounded and to lie in the $\ell_1$-norm ball $B_s$ described by $B_s = \{s \in \mathbb{R}^p \mid \|s\|_1 < 1\}$. It is again assumed that $p$ is known as a priori information. Although $B_s$ can be replaced with a weighted $\ell_1$-norm ball for sources having different ranges, we use the unit $\ell_1$-norm ball to simplify expressions without any loss of generality.

• The convolutive MIMO mixing channel output is formulated as

y(n) = \sum_{m=0}^{M-1} A(m)\, s(n-m),   (2)

where $\{A(m),\, m \in \{0,\ldots,M-1\}\}$ are the channel impulse response coefficients of size $q \times p$, and $q$ is the number of mixtures. The mixing system is equalizable [19] with order $M-1$ and $q \ge p$. Equalizable channels allow transmission zeros at $z = 0$ in the $z$-plane, which makes it possible for the source signals to reach the receiver with different delays relative to each other. Moreover, a necessary and sufficient condition for a convolutive channel $\{A(m),\, m \in \{0,\ldots,M-1\}\}$, where $A(m) \in \mathbb{R}^{q \times p}$, to be equalizable is $\mathrm{rank}(A(z)) = p$ for $q \ge p$.

• The FIR separator filter output is formulated as

z(n) = \sum_{k=0}^{K-1} B(k)\, y(n-k),   (3)

where $\{B(k),\, k \in \{0,\ldots,K-1\}\}$ are the separator filter coefficients of dimension $p \times q$ and $K-1$ is the separator filter order.

• Inserting (2) into (3) yields the total system response

F(m) = \sum_{k=0}^{K-1} B(k)\, A(m-k), \quad m = 0, \ldots, P-1,   (4)

where $P$ is the total system response order and $P-1 = K+M-2$.

• Using the convolutive mixing channel response matrix $\tilde{A} = [A(0), A(1), \ldots, A(M-1)]$, the separator filter response matrix $\tilde{B} = [B(0), B(1), \ldots, B(K-1)]$ and the system response matrix $\tilde{F} = [F(0), F(1), \ldots, F(P-1)]$, (2) and (3) can be reformulated as

y(n) = \tilde{A}\, \tilde{s}_M(n), \quad n = 1, \ldots, L,   (5)
z(n) = \tilde{B}\, \tilde{y}_K(n), \quad n = 1, \ldots, L+K-1,   (6)
z(n) = \tilde{F}\, \tilde{s}_P(n), \quad n = 1, \ldots, L+K-1,   (7)

where $\tilde{s}_M(n) = [s^T(n), s^T(n-1), \ldots, s^T(n-M+1)]^T$, $\tilde{y}_K(n) = [y^T(n), y^T(n-1), \ldots, y^T(n-K+1)]^T$ and $\tilde{s}_P(n) = [s^T(n), s^T(n-1), \ldots, s^T(n-P+1)]^T$.
• We will use the extended separator output vector $\tilde{z}_N(n) = [z^T(n), z^T(n-1), \ldots, z^T(n-N+1)]^T$ and the block-Toeplitz matrix $\Gamma_N(\tilde{F})$ in the objective function formulation. $\Gamma_N(\tilde{F})$ is a block-Toeplitz matrix whose first block row is $[F(0), F(1), \ldots, F(P-1), 0, \ldots, 0]$ and whose first block column is $[F(0), 0, \ldots, 0]^T$; the zero matrices are $p \times p$. More explicitly, $\Gamma_N(\tilde{F})$ is given by

\Gamma_N(\tilde{F}) =
\begin{bmatrix}
F(0) & F(1) & \cdots & F(P-1) & \cdots & 0 \\
\vdots & \ddots & \ddots & & \ddots & \vdots \\
0 & \cdots & F(0) & F(1) & \cdots & F(P-1)
\end{bmatrix},   (8)

where $N \ge P$. This yields the equation $\tilde{z}_N(n) = \Gamma_N(\tilde{F})\, \tilde{s}_{N+P-1}(n)$ for the extended separator output vector, where $\tilde{s}_{N+P-1}(n) = [s^T(n), s^T(n-1), \ldots, s^T(n-N-P+2)]^T$ (a small numerical illustration of this block-Toeplitz construction is given after the assumption below). Defining the set $S_{N+P-1} = \{\tilde{s}_{N+P-1}(N+K-1), \tilde{s}_{N+P-1}(N+K), \ldots, \tilde{s}_{N+P-1}(L)\}$, we introduce the following local dominance assumption for our convolutive BSS setup:

Assumption (A1): The source sample set $S_{N+P-1}$ contains the vertices (the corners of a volume) of its bounding $\ell_1$-norm ball $B_s$.
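As referenced above, the following Python/NumPy sketch (hypothetical helper name block_toeplitz, square $p \times p$ blocks) builds $\Gamma_N(\tilde{F})$ from the blocks $F(0), \ldots, F(P-1)$ as in (8) and applies it to a stacked source vector.

import numpy as np

def block_toeplitz(F_blocks, N):
    """Build Gamma_N from the list of p x p blocks [F(0), ..., F(P-1)], cf. (8)."""
    P = len(F_blocks)
    p = F_blocks[0].shape[0]
    G = np.zeros((N * p, (N + P - 1) * p))
    for i in range(N):                           # i-th block row starts at block column i
        for m, Fm in enumerate(F_blocks):
            G[i*p:(i+1)*p, (i+m)*p:(i+m+1)*p] = Fm
    return G

# toy example: p = 2, P = 3, N = 4
rng = np.random.default_rng(1)
F_blocks = [rng.standard_normal((2, 2)) for _ in range(3)]
Gamma = block_toeplitz(F_blocks, N=4)
s_ext = rng.standard_normal(Gamma.shape[1])      # stacked source vector s~_{N+P-1}(n)
z_ext = Gamma @ s_ext                            # extended separator output z~_N(n)
print(Gamma.shape, z_ext.shape)                  # (8, 12) (8,)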
2.2. Objective Function
Similar to [17], the objective function is defined as the volume ratio of two geometric objects:

• The bounding $\ell_1$-norm ball: defined with respect to the maximum $\ell_1$-norm of the extended separator outputs, i.e., $B_z = \{q \mid \|q\|_1 \le \max_{n\in\{N,\ldots,L_1\}}\|\tilde{z}_N(n)\|_1\}$, where $L_1 = L+K-1$.

• The principal hyper-ellipsoid: defined with respect to the covariance of the extended separator outputs, i.e., $E_z = \{q \mid (q-\hat{\mu}_{\tilde{z}_N})^T \hat{R}_{\tilde{z}_N}^{-1} (q-\hat{\mu}_{\tilde{z}_N}) \le 1\}$, where $\hat{\mu}_{\tilde{z}_N} = \frac{1}{L_2}\sum_{n=N}^{L_1}\tilde{z}_N(n)$, $\hat{R}_{\tilde{z}_N} = \frac{1}{L_2}\sum_{n=N}^{L_1}(\tilde{z}_N(n)-\hat{\mu}_{\tilde{z}_N})(\tilde{z}_N(n)-\hat{\mu}_{\tilde{z}_N})^T$ and $L_2 = L_1 - N + 1$.

Based on these definitions, we formulate the CSBCA objective as the volume ratio

J(\tilde{B}) = \frac{\sqrt{\det(\hat{R}_{\tilde{z}_N})}}{\big(\max_{n\in\{N,\ldots,L_1\}}\|\tilde{z}_N(n)\|_1\big)^{Np}},   (9)

which is to be maximized. The main difference between (9) and (1) is that the separator output vector $z$ is replaced with the extended separator output vector $\tilde{z}_N$. The following theorem ensures that maximization of the objective function in (9) achieves blind source separation of convolutive mixtures.

Theorem 1: Assume that an FIR separator of order $K-1$ can equalize the mixing channel $\tilde{A}$. Then all global maxima of (9) give perfect separation if assumption (A1) holds.

The proof is provided in Appendix A.
2.3. Iterative Algorithm

In order to transform the objective in (9) into a more convenient form for the iterative algorithm derivation, we take its logarithm:

J(\tilde{B}) = \underbrace{\tfrac{1}{2}\log\det\big(\Gamma_N(\tilde{B})\,\hat{R}_{\tilde{y}_{N+K-1}}\,\Gamma_N(\tilde{B})^T\big)}_{J_1(\tilde{B})} \;-\; \underbrace{Np\,\log\big(\max_{n\in\{N,\ldots,L_1\}}\|\tilde{z}_N(n)\|_1\big)}_{J_2(\tilde{B})}.

The first term $J_1(\tilde{B})$ is convex and differentiable, and the second term $J_2(\tilde{B})$ is a convex non-smooth function. We can utilize the Clarke sub-differential [20] in order to take the derivative of $J_2(\tilde{B})$. If we denote the $\ell_1$-norm of the extended separator output as $f_n(\Gamma_N(\tilde{B})) = \|\tilde{z}_N(n)\|_1$, the sub-differential set of $f_n(\Gamma_N(\tilde{B}))$ with respect to the argument $\Gamma_N(\tilde{B})$ can be written as

\partial\max\big(f_n(\Gamma_N(\tilde{B}))\big) = \Big\{\, o = r\,\tilde{y}_{N+K-1}(l)^T \;:\; r_i = \mathrm{sign}\{(\tilde{z}_N)_i(l)\} + 1_{\{(\tilde{z}_N)_i(l)=0\}}\,\alpha_i \,\Big\},   (10)

where $\tilde{y}_{N+K-1}(l) = [y^T(l), y^T(l-1), \ldots, y^T(l-N-K+2)]^T$, $l$ is the index at which the maximum $\ell_1$-norm is attained, and $\alpha_i \in [-1,1]$. Since we aim to derive an iterative update for the argument $\tilde{B}$ instead of $\Gamma_N(\tilde{B})$, it is necessary to convert (10) into a derivative expression with respect to $\tilde{B}$ as follows:

\acute{o} = \sum_{m=0}^{N-1} o_{mp+1:(m+1)p,\; mq+1:(m+K)q}.   (11)

By applying the chain rule, we obtain the update term corresponding to $J_2(\tilde{B})$ as

\partial J_2(\tilde{B}) = Np \sum_{l\in I_{\tilde{B}}} \lambda_l\, \frac{\sum_{m=0}^{N-1} o_{mp+1:(m+1)p,\; mq+1:(m+K)q}}{\max_{n\in\{N,\ldots,L_1\}} \|\tilde{z}_N(n)\|_1},   (12)

where $\lambda_l \ge 0$ and $\sum_{l\in I_{\tilde{B}}} \lambda_l = 1$. $I_{\tilde{B}}$ is a subset of $\{N, \ldots, L_1\}$ and consists of the indices for which the maximum $\ell_1$-norm at the separator output is achieved. As in (11), the gradient of $J_1(\tilde{B})$ with respect to $\tilde{B}$ is written as the summation

\partial J_1(\tilde{B}) = \sum_{m=0}^{N-1} X_{mp+1:(m+1)p,\; mq+1:(m+K)q},   (13)

where $X = \big(\Gamma_N(\tilde{B})\,\hat{R}_{\tilde{y}_{N+K-1}}\,\Gamma_N(\tilde{B})^T\big)^{-1}\,\Gamma_N(\tilde{B})\,\hat{R}_{\tilde{y}_{N+K-1}}$, $\hat{R}_{\tilde{y}_{N+K-1}} = \frac{1}{L_2}\sum_{n=N}^{L_1}\big(\tilde{y}_{N+K-1}(n)-\hat{\mu}_{\tilde{y}_{N+K-1}}\big)\big(\tilde{y}_{N+K-1}(n)-\hat{\mu}_{\tilde{y}_{N+K-1}}\big)^T$, and $\hat{\mu}_{\tilde{y}_{N+K-1}} = \frac{1}{L_2}\sum_{n=N}^{L_1}\tilde{y}_{N+K-1}(n)$.
Consequently, the iterative update equation is formed by combining the gradient of $J_1(\tilde{B})$ and the chosen sub-gradient of $J_2(\tilde{B})$. In addition, we can obtain a simpler iterative update in which only one $\lambda_l$ term is non-zero; this is accomplished by selecting a random index location $l$ from $I_{\tilde{B}}$ at every iteration:

\tilde{B}^{(t+1)} = \tilde{B}^{(t)} + \sigma^{(t)}\left(\sum_{m=0}^{N-1} X^{(t)}_{mp+1:(m+1)p,\; mq+1:(m+K)q} \;-\; Np\,\frac{\sum_{m=0}^{N-1} o^{(t)}_{mp+1:(m+1)p,\; mq+1:(m+K)q}}{\max_{n\in\{N,\ldots,L_1\}}\|\tilde{z}_N(n)\|_1}\right).   (14)
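The following Python/NumPy sketch illustrates one sub-gradient iteration of (14) under the definitions above. It is a simplified illustration (our own variable names; a single random maximizing index is used, as described in the text; initialization, step-size scheduling and stopping rules are omitted), not the authors' reference code.

import numpy as np

def block_toeplitz(B_blocks, N, q):
    """Gamma_N of the separator: an (N*p) x ((N+K-1)*q) block-Toeplitz matrix."""
    K = len(B_blocks); p = B_blocks[0].shape[0]
    G = np.zeros((N * p, (N + K - 1) * q))
    for i in range(N):
        for k, Bk in enumerate(B_blocks):
            G[i*p:(i+1)*p, (i+k)*q:(i+k+1)*q] = Bk
    return G

def csbca_step(B_tilde, Y, N, p, q, K, sigma):
    """One update of (14). B_tilde: (p, K*q) separator taps, Y: (q, L) mixtures."""
    L = Y.shape[1]
    W = N + K - 1
    # extended mixture vectors y~_{N+K-1}(n), stacked as columns
    Yext = np.stack([Y[:, n::-1][:, :W].T.reshape(-1) for n in range(W - 1, L)], axis=1)
    Gamma = block_toeplitz([B_tilde[:, k*q:(k+1)*q] for k in range(K)], N, q)
    Zext = Gamma @ Yext                               # extended separator outputs z~_N(n)
    mu = Yext.mean(axis=1, keepdims=True)
    Ry = (Yext - mu) @ (Yext - mu).T / Yext.shape[1]  # covariance of y~_{N+K-1}
    X = np.linalg.solve(Gamma @ Ry @ Gamma.T, Gamma @ Ry)
    norms = np.abs(Zext).sum(axis=0)
    l = np.random.choice(np.flatnonzero(norms == norms.max()))   # random maximizing index
    o = np.sign(Zext[:, [l]]) @ Yext[:, [l]].T                   # sub-gradient of the l1 term
    dJ1 = sum(X[m*p:(m+1)*p, m*q:(m+K)*q] for m in range(N))
    dJ2 = sum(o[m*p:(m+1)*p, m*q:(m+K)*q] for m in range(N))
    return B_tilde + sigma * (dJ1 - N * p * dJ2 / norms.max())

# toy usage: p = 2 sources, q = 3 mixtures, K = 3 separator taps, N = 4
rng = np.random.default_rng(0)
Y = rng.laplace(size=(3, 500))
B = 0.1 * rng.standard_normal((2, 3 * 3))
B = csbca_step(B, Y, N=4, p=2, q=3, K=3, sigma=1e-3)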
As the proposed approach relies on the maximization of a non-concave objective function, characterizing the convergence behaviour of the corresponding sub-gradient based algorithm is relatively hard. Inan, Erdogan and Cruces provided an analysis in [21] for the stationary point characterization of the BCA algorithm introduced in [16] and showed that its stationary points correspond either to global maxima of the objective function or to unstable saddle points. This means that the stationary points of the BCA algorithm do not correspond to local maxima of the objective function. Although this result cannot be generalized to the objective functions presented here, it is promising for the convergence behaviour of the instantaneous and convolutive SBCA algorithms. A similar stationary point characterization is on our future research agenda, and we expect it to lead to the same conclusion as [21]. In addition, the empirical results obtained from the numerical experiments support the conjecture that the algorithm always converges to the vicinity of a desired separation point with an appropriate step size selection. The question of whether the algorithm always converges to the stationary points remains open; however, recent research outcomes on non-convex global convergence analysis with appropriate initialization methods are encouraging [22].
2.4. Algorithm Extension to Complex Sources
For the complex extension of the algorithm, we follow the isomorphism-based approach used in [16]. Let us introduce the following terms used in the complex derivation. We define the operator $\Upsilon : \mathbb{C}^p \rightarrow \mathbb{R}^{2p}$,

\Upsilon(a) = [\mathrm{Re}(a^T)\;\; \mathrm{Im}(a^T)]^T,   (15)

as an isomorphism between complex and real vectors. For a given complex vector $a$, we use the notation $\acute{a}$ to refer to $\Upsilon(a)$. In the same way, we define the real isomorphic vector for the convolutive signal vector $\tilde{a}_K(n)$ as $\acute{\tilde{a}}_K(n) = [\mathrm{Re}(a^T(n))\; \mathrm{Im}(a^T(n))\; \ldots\; \mathrm{Re}(a^T(n-K+1))\; \mathrm{Im}(a^T(n-K+1))]^T$.

We also define the corresponding operator for the convolutive channel, $\Omega : \mathbb{C}^{q \times Kp} \rightarrow \mathbb{R}^{2q \times 2Kp}$:

\Omega(\tilde{X}) =
\begin{bmatrix}
\mathrm{Re}(X(0)) & -\mathrm{Im}(X(0)) & \cdots & \mathrm{Re}(X(K-1)) & -\mathrm{Im}(X(K-1)) \\
\mathrm{Im}(X(0)) & \mathrm{Re}(X(0)) & \cdots & \mathrm{Im}(X(K-1)) & \mathrm{Re}(X(K-1))
\end{bmatrix}.   (16)

Finally, the real isomorphic counterparts of the convolutive mixtures and separator outputs are written as $\acute{y}(n) = \Omega(\tilde{A})\, \acute{\tilde{s}}_M(n)$ and $\acute{\tilde{z}}_N(n) = \Omega(\tilde{B})\, \acute{\tilde{y}}_{N+K-1}(n)$, respectively.

Using the definitions above, the objective function to be maximized in (9) can be modified for the complex case as

J_c(\tilde{B}) = \frac{\sqrt{\det(\hat{R}_{\acute{\tilde{z}}_N})}}{\big(\max_{n\in\{N,\ldots,L_1\}}\|\acute{\tilde{z}}_N(n)\|_1\big)^{2Np}}.   (17)

To derive an iterative algorithm for complex sources, the ratio form in (17) is transformed into a difference form by taking the logarithm:

\bar{J}_c(\tilde{B}) = \log(J_c(\tilde{B})) = \tfrac{1}{2}\log\det\big(\Gamma_N(\Omega(\tilde{B}))\,\hat{R}_{\acute{\tilde{y}}_{N+K-1}}\,\Gamma_N(\Omega(\tilde{B}))^T\big) - 2Np\,\log\big(\max_{n\in\{N,\ldots,L_1\}}\|\acute{\tilde{z}}_N(n)\|_1\big).   (18)
The corresponding iterative update equation for the separator matrix $B(k)$ (at time lag $k$) can be written as

B^{(t+1)}(k) = B^{(t)}(k) + \sigma^{(t)}\Big( T_{1:p,\;2kq+1:(2k+1)q} + T_{p+1:2p,\;(2k+1)q+1:2(k+1)q} + j\big(T_{p+1:2p,\;2kq+1:(2k+1)q} - T_{1:p,\;(2k+1)q+1:2(k+1)q}\big) - 2Np\,\frac{\sum_{m=0}^{N-1}\check{o}_{mp+1:(m+1)p,\;(m+k)q+1:(m+k+1)q}}{\max_{n\in\{N,\ldots,L_1\}}\|\acute{\tilde{z}}_N(n)\|_1}\Big),   (19)

where $\check{o} = \mathrm{sign}_c\{\tilde{z}_N(l^{(t)})\}\,\tilde{y}_{N+K-1}(l^{(t)})^H$,

T = \sum_{r=0}^{N-1}\Omega(X)^{(t)}_{2rp+1:2(r+1)p,\;2rq+1:2(r+K)q},

$\Omega(X) = \big(\Gamma_N(\Omega(\tilde{B}))\,\hat{R}_{\acute{\tilde{y}}_{N+K-1}}\,\Gamma_N(\Omega(\tilde{B}))^T\big)^{-1}\,\Gamma_N(\Omega(\tilde{B}))\,\hat{R}_{\acute{\tilde{y}}_{N+K-1}}$, and $l^{(t)}$ represents the time index for which the maximum $\ell_1$-norm at the separator output is achieved.
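The real-isomorphism operators translate directly into code. The sketch below (Python/NumPy, illustrative names) implements $\Upsilon$ of (15) and $\Omega$ of (16) and verifies numerically that complex filtering and its real isomorphic counterpart agree for a single tap.

import numpy as np

def upsilon(a):
    """Upsilon: C^p -> R^{2p}, stack real and imaginary parts, cf. (15)."""
    return np.concatenate([a.real, a.imag])

def omega(X_blocks):
    """Omega: map blocks [X(0), ..., X(K-1)] (each q x p complex) to a
    2q x 2Kp real matrix with [[Re, -Im], [Im, Re]] sub-blocks, cf. (16)."""
    cols = [np.block([[Xk.real, -Xk.imag], [Xk.imag, Xk.real]]) for Xk in X_blocks]
    return np.concatenate(cols, axis=1)

# consistency check: Omega([X0]) Upsilon(s0) equals Upsilon(X0 s0) for one tap
rng = np.random.default_rng(2)
X0 = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
s0 = rng.standard_normal(2) + 1j * rng.standard_normal(2)
lhs = omega([X0]) @ upsilon(s0)
rhs = upsilon(X0 @ s0)
print(np.allclose(lhs, rhs))   # True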
3. Frequency-Domain-Sparsity Based Convolutive SBCA Framework
The frequency-domain methods map the convolutive mixtures to instantaneous mixtures in the frequency domain so that instantaneous BSS algorithms can be applied to the spectral data in each frequency bin separately. The frequency-based convolutive SBCA proposed in this section also uses an instantaneous mixing model, as done by the other frequency-domain methods. However, it does not suffer from the permutation and scaling ambiguities inherent to frequency-domain methods, because it updates only one temporal separator matrix instead of a spectral separator matrix for each frequency bin separately.
3.1. Blind Source Separation Setup in Frequency Domain
• We assume that there are $p$ bounded sources. We obtain time-frequency spectrum representations of the sources by using the STFT,

s_i(m, f) = \sum_{n} s_i(n)\, v(n - mT)\, e^{-j2\pi f n}, \quad i = 1, \ldots, p,   (20)

where $f \in [-1/2, 1/2]$ is the frequency index, $v(n)$ is a time-window function (Hamming, Kaiser, etc.) of length $R_1$, $R_1 \in \mathbb{Z}$ is the STFT frame length, $T \in \mathbb{Z}$ is the hop size, in samples, between successive DTFTs, $m \in L_v$ is the time index of the STFT, and $L_v$ is the index set of STFT frames taken over the signal samples. Then, $p \times L$ temporal data are mapped to $p \times \dim\{L_v\}$ spectral data for each frequency bin $f$. (A numerical sketch of this transformation is given at the end of this subsection.)

• Since the transformed source vectors are complex, we follow the isomorphism-based approach described in Section 2.4. We define the $2p$-dimensional real isomorphic vectors as

\acute{s}(m, f) = [\mathrm{Re}(s(m, f))^T\;\; \mathrm{Im}(s(m, f))^T]^T.   (21)

• For each frequency bin, the isomorphic source sample vectors are assumed to lie in the unit $\ell_1$-norm ball, i.e., $B_{\acute{s}} = \{\acute{s}(m, f) \in \mathbb{R}^{2p} \mid \|\acute{s}(m, f)\|_1 \le 1, \forall m\}$.

• The mixing process occurs in a convolutive MIMO channel which is assumed to be linear, time-invariant (LTI), and equalizable [19].

• Let $y(m, f)$ be the filtered version of the source STFTs, i.e., the multiplication of the source STFTs by the DTFT of the convolutive mixing channel response:

y(m, f) = A(f)\, s(m, f),   (22)

where $A(f)$ is a matrix containing the frequency transform of the mixing filter elements at frequency bin $f$. When taking the inverse STFT of $y(m, f)$ and adding the frames up with the corresponding overlaps, we obtain the same output as (2).

• Let $\tilde{y}(m, f) = [\tilde{y}_1(m, f), \tilde{y}_2(m, f), \ldots, \tilde{y}_q(m, f)]^T$ be the STFT of the mixtures, computed by $\tilde{y}_k(m, f) = \sum_n y_k(n)\, v(n - mT)\, e^{-j2\pi f n}$. There is a small difference between $y(m, f)$ in (22) and $\tilde{y}(m, f)$ due to boundary effects [23]. The number of affected samples is equal to the mixing filter order. When using a suitable STFT window type with a sufficient length and assuming a fast-decaying mixing channel response, we obtain a very good approximation for the mixing output, i.e.,

y(m, f) \approx \tilde{y}(m, f).   (23)

Under these circumstances, the frequency elements observed from the mixing channel outputs can be assumed to be instantaneous mixtures of the frequency elements of the sources according to (22).
• By using (21) and the real isomorphic mapping operator $\Omega$ defined in Section 2.4, (22) is transformed to $\acute{y}(m, f) = \Omega(A(f))\, \acute{s}(m, f)$.

• Let $z(m, f)$ be the separator output of $y(m, f)$ in (22), obtained by the multiplication of $y(m, f)$ by the DTFT of the separator filter, i.e., $z(m, f) = B(f)\, y(m, f)$, where $B(f)$, $f \in [-1/2, 1/2]$, is the separator frequency response of dimension $p \times q$.

• The STFT of the separator output vector, $\tilde{z}(m, f) = [\tilde{z}_1(m, f), \tilde{z}_2(m, f), \ldots, \tilde{z}_p(m, f)]^T$, is computed by $\tilde{z}_k(m, f) = \sum_n z_k(n)\, v(n - mT)\, e^{-j2\pi f n}$. On the basis of (23), $z(m, f)$ and $\tilde{z}(m, f)$ are approximately equal. Therefore, the temporal separator matrix can be estimated using $\tilde{z}(m, f)$ instead of $z(m, f)$ in the STFT domain.

• The separator outputs' real isomorphic vector is represented as $\acute{z}(m, f) = \Omega(B(f))\, \acute{y}(m, f)$. Writing the total frequency response as $F(f) = B(f) A(f)$, we can rewrite $\acute{z}(m, f)$ in terms of the sources as $\acute{z}(m, f) = \Omega(F(f))\, \acute{s}(m, f)$.

• For perfect separation in the frequency domain, the frequency response of the total system must satisfy the equality $\Omega(F(f)) = PD(f)$, where $P$ is a permutation matrix and $D(f)$ is a full-rank diagonal matrix.

Defining the isomorphic source sample set $\acute{S}(f) = \{\acute{s}(m, f) \in \mathbb{R}^{2p}, m \in L_v\}$, the following assumption is introduced for the frequency-domain BSS setup defined above:

Assumption (A2): For each frequency bin, $\acute{S}(f)$ contains the vertices (the corners of a volume) of its bounding $\ell_1$-norm ball $B_{\acute{s}}$.
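As referenced in the first item of this subsection, the following Python sketch (using scipy.signal.stft; the window length, hop size and signals are illustrative placeholders) computes the time-frequency representation of (20) and forms the real isomorphic vectors of (21) for one frequency bin.

import numpy as np
from scipy.signal import stft

rng = np.random.default_rng(3)
p, L = 3, 8000
S = rng.laplace(size=(p, L))                    # p sources, L samples (placeholder signals)

R1, hop = 256, 64                               # STFT frame length R1 and hop size T
f, m, S_tf = stft(S, nperseg=R1, noverlap=R1 - hop, window='hamming', axis=-1)
print(S_tf.shape)                               # (p, number of bins, number of STFT frames)

# real isomorphic source vectors for one frequency bin f0, cf. (21)
f0 = 10
S_iso = np.concatenate([S_tf[:, f0, :].real, S_tf[:, f0, :].imag], axis=0)
print(S_iso.shape)                              # (2p, number of STFT frames)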
3.2. Objective Function
To obtain a frequency-based objective function, we extend the volume definition of the principal hyper-ellipsoid in [17] to the frequency domain. In the frequency domain, the Power Spectral Density Matrix (PSDM) of the source processes can be defined as

P_s(f) = \lim_{L \to \infty} E\{s(f)\, s(f)^H\},   (24)

where $s(f)$ is the source DTFT and $L$ is the sample size of $s(n)$. Using the relation between the source DTFTs and STFTs, (24) can be rewritten as

P_s(f) = \lim_{\dim\{L_v\} \to \infty} \sum_{m \in L_v} P_s(m, f),   (25)

where $P_s(m, f)$ is the instantaneous PSDM of $s(m, f)$ [23]. The separator output PSDM can be related to the source PSDM by $P_z(f) = F(f)\, P_s(f)\, F(f)^H$, where $P_z(f) \in \mathbb{C}^{p \times p}$ and $F(f)$ is the frequency response of the total system. For a given frequency bin, this is equal to the covariance matrix of the separator outputs' spectrum. Thus, the determinant of the separator outputs' temporal covariance matrix used for instantaneous mixtures in [17] can be replaced by the determinant of the separator output PSDM in the frequency domain for convolutive mixtures. In this way, an extension of the principal hyper-ellipsoid's volume definition in [17] is obtained for convolutive mixtures in the frequency domain. Consequently, we propose the objective function

J(\tilde{B}) = \int_{-1/2}^{1/2} \tfrac{1}{2}\log\big(\det(\Omega(P_z(f)))\big)\, df \;-\; \int_{-1/2}^{1/2} 2p\,\log\big(\max_{m \in L_v}\|\acute{z}(m, f)\|_1\big)\, df,   (26)

where $\Omega(P_z(f))$ is the corresponding real isomorphic matrix. The integral terms in (26) make it possible for the objective function to be defined with respect to the time-domain separator matrix $\tilde{B}$. If the integral terms were not used, the objective function would have to be maximized for each frequency bin separately instead of being maximized with respect to $\tilde{B}$.

The following theorem ensures that maximization of the objective function in (26) achieves blind source separation of temporally convolutive mixtures.

Theorem 2: Assume that an FIR separator matrix can equalize the mixing channel $A(f)$. Given the BCA setup in Section 3.1, all global maxima of (26) give perfect separation if assumption (A2) holds.

The proof is provided in Appendix B.
3.3. Iterative Algorithm
For the finite set of observations $\{y(1), \ldots, y(L)\}$, we modify the objective as

J(\tilde{B}) = \frac{1}{\beta}\sum_{l=-R_1+1}^{R_1-1}\Big(\underbrace{\tfrac{1}{2}\log\det\big(\Omega(\hat{P}_z(l))\big)}_{J_1(\tilde{B})} \;-\; \underbrace{2p\,\log\max_{m\in L_v}\|\acute{\tilde{z}}(m, l)\|_1}_{J_2(\tilde{B})}\Big),   (27)

where $\beta = 2R_1 - 1$ is the DFT size and $\hat{P}_z(l)$ is the PSDM estimate of the separator output. Note that $\acute{z}(m, l)$ is replaced with $\acute{\tilde{z}}(m, l)$ on the basis of (23). $\hat{P}_z(l)$ is defined in terms of the PSDM estimate of the mixture vector as $\hat{P}_z(l) = B(l)\,\hat{P}_y(l)\,B(l)^H$. $\hat{P}_y(l)$ is the PSDM estimate of the mixtures and is given by

\hat{P}_y(l) = \sum_{m \in L_v} \hat{P}_y(m, l),   (28)
where $\hat{P}_y(m, l)$ is the instantaneous PSDM estimate of the mixture signals. For the calculation of the estimated PSDM $\hat{P}_y(m, l)$, we use the spectrogram method [24], expressed as

\hat{P}_y(m, l) = \sum_{t} u(t)\, \tilde{y}_t(m, l)\, \tilde{y}_t(m, l)^H,   (29)

where $u(t)$ is a time-window of length $R_2$ and $\tilde{y}_t(m, l) = \sum_n y(n+t)\, v(n - mT)\, e^{-j2\pi (n+t) l/\beta}$. The window $u(t)$ provides a weighted averaging of $R_2$ spectra in order to reduce the variance of the estimate.
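The sketch below illustrates the PSDM estimation of (28)-(29) in Python/NumPy. For simplicity it smooths the rank-one outer products of neighbouring STFT frames with a short Hamming window, which plays the same variance-reduction role as the window $u(t)$ in (29); all names and parameter values are illustrative assumptions.

import numpy as np
from scipy.signal import stft, get_window

rng = np.random.default_rng(4)
q, L = 3, 8000
Y = rng.standard_normal((q, L))                 # mixture signals (placeholder)

R1, hop = 256, 64
_, _, Y_tf = stft(Y, nperseg=R1, noverlap=R1 - hop, window='hamming', axis=-1)
# Y_tf: (q, n_bins, n_frames)

R2 = 5                                          # smoothing window length
u = get_window('hamming', R2)

def psdm_estimate(Y_tf, u):
    """Smoothed instantaneous PSDM estimates (a simplified analogue of (29))
    and their sum over frames, as in (28)."""
    q, n_bins, n_frames = Y_tf.shape
    P_inst = np.einsum('qlm,klm->lmqk', Y_tf, Y_tf.conj())   # (n_bins, n_frames, q, q)
    P_smooth = np.zeros_like(P_inst)
    half = len(u) // 2
    for m in range(n_frames):                   # weighted moving average across frames
        lo, hi = max(0, m - half), min(n_frames, m + half + 1)
        w = u[half - (m - lo): half + (hi - m)]
        P_smooth[:, m] = np.tensordot(w / w.sum(), P_inst[:, lo:hi], axes=(0, 1))
    return P_smooth, P_smooth.sum(axis=1)       # instant PSDMs and P_y(l) of (28)

P_inst, P_y = psdm_estimate(Y_tf, u)
print(P_y.shape)                                # (n_bins, q, q)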
In order to prevent scaling and permutation ambiguities, we derive a time-domain separation approach based on the frequency-domain objective function in (27), instead of estimating a separation matrix for each frequency bin separately.

The derivative of the first part of $J(\tilde{B})$ with respect to $B(k)$, for $k = 0, \ldots, K-1$, is

\frac{1}{\beta}\sum_{l=-R_1+1}^{R_1-1}\frac{\partial J_1(\tilde{B})}{\partial B(k)} = \partial B_{\mathrm{logdet}} = \mathrm{Re}\{X_{11} + X_{22} + j(X_{21} - X_{12})\}.   (30)

The components $X_{ij} \in \mathbb{R}^{p\times q}$ are defined as follows:

\frac{1}{\beta}\sum_{l=-R_1+1}^{R_1-1}\Omega(\hat{P}_z(l))^{-1}\,\Omega(B(l))\,\Omega(\hat{P}_y(l))\,O^T = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix},
\qquad O = \begin{bmatrix} \cos 2\pi k l/\beta & \sin 2\pi k l/\beta \\ -\sin 2\pi k l/\beta & \cos 2\pi k l/\beta \end{bmatrix}.   (31)

The spectral transform $B(l)$ in (31) is given explicitly as $B(l) = \sum_{k=-R_1+1}^{R_1-1} B(k)\, e^{-j2\pi k l/\beta}$.
The derivative of the second part is

\frac{1}{\beta}\sum_{l=-R_1+1}^{R_1-1}\frac{\partial J_2(\tilde{B})}{\partial B(k)} = \partial B_{\mathrm{subg}} = \frac{2p}{\beta}\sum_{l=-R_1+1}^{R_1-1}\frac{\mathrm{Re}\big\{\mathrm{sign}_c\{\tilde{z}(u_l,l)\}\,\tilde{y}(u_l,l)^H\, e^{j2\pi k l/\beta}\big\}}{\max_{m\in L_v}\|\acute{\tilde{z}}(m,l)\|_1},   (32)

where $u_l \in L_v$ denotes the index of the STFT frame for which the maximum $\ell_1$-norm at the separator output in frequency bin $l$ is achieved, and $\mathrm{sign}_c$ denotes the sign operator for a complex signal, computed as $\mathrm{sign}\{\mathrm{Re}\{a\}\} + j\,\mathrm{sign}\{\mathrm{Im}\{a\}\}$.

Finally, the update equation for the separator matrix in the time domain is given as

B^{(t+1)}(k) = B^{(t)}(k) + \sigma^{(t)}\big(\partial B^{(t)}_{\mathrm{logdet}} - \partial B^{(t)}_{\mathrm{subg}}\big).   (33)
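As a small illustration of how the single set of temporal taps $B(k)$ enters the frequency-domain quantities above, the sketch below (illustrative names; bins indexed $0, \ldots, \beta-1$ rather than $-R_1+1, \ldots, R_1-1$) forms the spectral transform $B(l)$ by a DFT of the zero-padded taps and applies it to a mixture STFT to obtain the separator outputs used in (27) and (32).

import numpy as np

rng = np.random.default_rng(5)
p, q, K = 2, 3, 16
B_taps = 0.1 * rng.standard_normal((K, p, q))      # temporal separator taps B(0..K-1)

beta = 128                                          # DFT size
# B(l) = sum_k B(k) exp(-j 2 pi k l / beta), taps zero-padded to length beta
B_spec = np.fft.fft(np.concatenate([B_taps, np.zeros((beta - K, p, q))], axis=0), axis=0)

# apply B(l) bin by bin to a mixture STFT Y_tf of shape (q, beta, n_frames)
n_frames = 50
Y_tf = rng.standard_normal((q, beta, n_frames)) + 1j * rng.standard_normal((q, beta, n_frames))
Z_tf = np.einsum('lpq,qlm->plm', B_spec, Y_tf)      # separator outputs z~(m, l)
print(Z_tf.shape)                                   # (p, beta, n_frames)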
4. Numerical Examples
In the experiments, as benchmark algorithms, we use the Kurtosis Maximization algorithm (MaxKurtosis), which optimizes a kurtosis-based contrast function [25], the convolutive BSS method [1] exploiting the non-stationarity of the source signals, labelled "convBSS", the Alternating-Least-Squares based method [26], which again exploits non-stationarity, labelled "ALS", the Spatio-Temporal FastICA algorithm (STFICA) [2], which is an extension of the FastICA method, the time-based convolutive Bounded Component Analysis algorithm (CBCA) [27], and Cho's underdetermined convolutive BSS method [28]. We illustrate CBCA's performance only in the first and second experiments to demonstrate CSBCA's superiority for sparse signals. Besides, we include Cho's method only in the third experiment because it is an under-determined BSS method and its publicly available code is optimized for audio source separation. We use the MATLAB toolbox called BSS Evaluation, proposed in [29], which is designed specifically to measure the performance of algorithms on the Blind Audio Source Separation problem.
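The SDR/SIR figures reported below follow the BSS Evaluation criteria of [29]. For readers working in Python, the mir_eval package provides an implementation of the same bss_eval metrics; a minimal usage sketch (with random placeholder data) is given below.

import numpy as np
from mir_eval.separation import bss_eval_sources

rng = np.random.default_rng(6)
refs = rng.standard_normal((3, 16000))                 # reference sources (placeholder)
ests = refs + 0.1 * rng.standard_normal((3, 16000))    # separator outputs (placeholder)

# SDR, SIR, SAR and the best output-to-source permutation, as defined in [29]
sdr, sir, sar, perm = bss_eval_sources(refs, ests)
print(sdr, sir, perm)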
4.1. Synthetic Sparse Signals
[Figure 2: (a) Synthetic signals (RIKEN). (b) Output SDR vs. input SNR for MaxKurtosis, STFICA, CBCA, ALS, convBSS and CSBCA.]
In the first numerical example, we illustrate the performance of the time-based CSBCA. For the performance evaluation, we use synthetic signals from the RIKEN Brain Science Institute benchmark dataset [30]. In the first part of the test, we consider a scenario with 1000 samples from 6 sources and 12 mixtures. The orders of the i.i.d. Gaussian convolutive channel and the separator filter are 3 and 4, respectively. In this part, we analyse the Signal to Distortion Ratio (SDR) performance versus the input Signal to Noise Ratio (SNR). In the second part, we consider a scenario with 1000 samples from 5 sources and analyse the performance with respect to the number of mixing channels and the mixing order for an input SNR of 20 dB. Note that the noise is considered to be additive white Gaussian (AWGN) in all analyses. The source signals are illustrated in Fig. 2-(a).
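For reference, the mixing scenario of this experiment can be reproduced along the following lines: a Python/NumPy sketch (illustrative parameter values and names) that passes the sources through a random i.i.d. Gaussian MIMO FIR channel and adds white Gaussian noise at a target input SNR.

import numpy as np

def convolutive_mix(S, q, channel_order, snr_db, rng):
    """Mix p sources S (p, L) through a random (channel_order+1)-tap MIMO FIR
    channel into q noisy mixtures at the requested input SNR."""
    p, L = S.shape
    A = rng.standard_normal((channel_order + 1, q, p))     # i.i.d. Gaussian taps A(m)
    Y = np.zeros((q, L))
    for m in range(channel_order + 1):
        Y[:, m:] += A[m] @ S[:, :L - m]                     # y(n) += A(m) s(n - m)
    noise = rng.standard_normal((q, L))
    noise *= np.sqrt(Y.var() / (noise.var() * 10 ** (snr_db / 10)))
    return Y + noise, A

rng = np.random.default_rng(7)
S = rng.laplace(size=(6, 1000))                             # 6 sparse-like placeholder sources
Y, A = convolutive_mix(S, q=12, channel_order=3, snr_db=20, rng=rng)
print(Y.shape)                                              # (12, 1000)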
The result of the input SNR analysis is depicted in Fig. 2-(b), and we can observe that the time-based CSBCA yields better or equal performance at almost all SNRs. In the second part, the results are more complex to interpret. First, the CBCA algorithm cannot satisfactorily extract the source signals even in the high-SNR case, as seen in Fig. 2-(b); that is why changing the number of mixtures and the mixing channel order does not affect its performance much, as shown in Fig. 3. As for the convBSS algorithm, it takes advantage of the non-stationarity of the source signals and performs a joint diagonalization of the cross power spectral density matrices of the signals by using a gradient descent algorithm. The ALS algorithm also exploits non-stationarity, but it essentially differs from convBSS in two respects. First, it uses a procedure to solve the scaling problem inherent to frequency-domain approaches. Second, it uses a more efficient and quickly converging optimization method called Alternating Least Squares. In particular, the absence of a method for solving the scaling problem may be the reason why convBSS cannot separate the sources satisfactorily even as the number of mixtures increases. As a result, the two differences mentioned above give ALS its advantage.
[Figure 3: (a) Output SDR vs. mixing order. (b) Output SDR vs. number of mixing channels, for MaxKurtosis, STFICA, CBCA, ALS, convBSS and CSBCA.]
To evaluate the performance of CSBCA's complex extension, we again use the synthetic sparse signals. In the complex CSBCA experiment, 3 complex sources are generated from the 6 synthetic sources. Fig. 4 illustrates the SNR performance of the complex CSBCA.

[Figure 4: Complex CSBCA: Output SDR vs. input SNR.]

There is no comparison with the benchmark algorithms listed above because the published codes for these algorithms work only for real signals.
4.2. Statistically Dependent Sources
[Figure 5: (a) Selected Copula-T distributed random sparse sequences. (b) Output SDR vs. correlation coefficient for MaxKurtosis, STFICA, CBCA, ALS, convBSS and CSBCA.]
In the second example, we generate sparse and statistically dependent sources by producing a Copula-T distributed random vector $u \in [-1, 1]^p$ and transforming it for sparsification through the mapping

s = \begin{cases} u, & u \in L_r, \\ 0, & \text{otherwise}, \end{cases}   (34)

where $L_r = \{x : \|x\|_r \le 1\}$ with $0 \le r \le 1$. We consider a scenario with 2000 samples from 3 sources and 6 mixtures. The performance of the time-domain CSBCA for different correlation degrees is examined and illustrated in Fig. 5. The results show that CSBCA outperforms the other algorithms for all correlation values. ALS appears to be the most affected by the source dependency. ALS and convBSS employ joint diagonalization of cross power spectral density matrices as the first stage of blind source separation. The source dependency can deteriorate this stage, which is why ALS may show worse performance as the correlation increases. The performance degradation of MaxKurtosis and STFICA with increasing correlation is expected, because these methods assume statistical mutual independence of the sources.
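The dependent sparse sources of this experiment can be generated along the following lines. The Python/SciPy sketch below draws a Copula-T (t-copula) sample with a common pairwise correlation and applies the sparsifying map (34); the degrees of freedom, correlation value and other names are illustrative assumptions.

import numpy as np
from scipy import stats

def sparse_dependent_sources(p, L, rho, df, r, rng):
    """Draw u in [-1, 1]^p from a t-copula with common correlation rho and df
    degrees of freedom, then sparsify via the mapping (34) with parameter r."""
    C = np.full((p, p), rho); np.fill_diagonal(C, 1.0)      # correlation matrix
    G = rng.multivariate_normal(np.zeros(p), C, size=L)
    chi = rng.chisquare(df, size=(L, 1))
    T = G / np.sqrt(chi / df)                               # multivariate t samples
    U = 2.0 * stats.t.cdf(T, df) - 1.0                      # copula: map to [-1, 1]^p
    S = U.copy()
    outside = np.sum(np.abs(U) ** r, axis=1) > 1.0          # outside L_r = {x: ||x||_r <= 1}
    S[outside] = 0.0                                        # zero non-sparse samples, cf. (34)
    return S.T                                              # (p, L)

rng = np.random.default_rng(8)
S = sparse_dependent_sources(p=3, L=2000, rho=0.4, df=5, r=0.5, rng=rng)
print(S.shape, np.mean(np.all(S == 0, axis=0)))             # size and fraction of zeroed samples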
4.3. Speech Signal Separation
In the third example, we analyse the performance of the frequency-based CSBCA for speech signal separation by using the measured channel impulse responses in the MARDY (Multichannel Acoustic Reverberation Database at York) database [31] and three speech recordings (2 male and 1 female) in which different persons read different texts [32]. The MARDY database is an open resource for dereverberation researchers to access real-life multichannel impulse responses. The recording setup of MARDY is described in detail in [33]. In this recording work, the reverberant impulse responses of the microphones were measured for three loudspeakers at different locations. In this experiment, the speech signals are assumed to be transmitted by these loudspeakers and received by three microphones. The recorded speech signals and the convolutive mixture signals are illustrated in Fig. 6. The selected input parameters for the algorithms are: FCSBCA: Separation Filter Order = 80, STFT Window Length = 150, Overlap Ratio of STFT Windows = 0.8; ALS: Epoch Length = 1000, FFT Length = 512, Overlap Ratio of FFT Windows = 0.7; STFICA: Separator Filter Order = 100; convBSS: FFT Length = 512, Separator Filter Order = 100, Number of matrices to diagonalize = 5; Cho: FFT Length = 4096, Shift Length = 512; MaxKurtosis: Separator Filter Order = 100. Note that the term "Epoch" used in the ALS method means an FFT frame. The performances of the algorithms for different mixing channel orders and input SNRs are examined as shown in Fig. 7 and Fig. 8.
[Figure 6: Speech signals and their mixtures.]
[Figure 7: Separation of 3 speech signals in a reverberant environment. (a) Output SDR vs. mixing channel order. (b) Output SIR vs. mixing channel order.]
In the channel order analysis, microphones with noiseless channel responses (SNR = Inf) are used. In the input SNR analysis, microphones with channel order 500 are used. It is clear that FCSBCA yields better performance than all the other methods with respect to the SDR values. As a result, it can be said that the results demonstrate the practical merit of the proposed algorithm.
[Figure 8: Separation of 3 speech signals in a reverberant environment. (a) Output SDR vs. input SNR. (b) Output SIR vs. input SNR.]

5. Conclusion

In this article, we propose time-based and frequency-based deterministic BSS approaches for convolutive mixtures of bounded sparse sources. This framework is a natural extension of the instantaneous SBCA framework in [17]. The
proposed approaches do not assume statistical independence in the space, time or frequency domain. Moreover, the frequency-based CSBCA has the advantages associated with frequency-domain approaches but does not suffer from the inherent permutation and scaling ambiguities, because it updates a single temporal separator matrix instead of a separate separator matrix for each frequency bin. We demonstrate the performance improvement over some well-known convolutive BSS methods and study a practical application to blind speech separation.
Appendix A

As the first step of the proof of Theorem 1, we rewrite the objective function in terms of the arguments $\tilde{F}$ and $\Gamma_N(\tilde{F})$. The sample covariance matrix can then be written as $\hat{R}_{\tilde{z}_N} = \Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T$, where $L_3 = N+P-1$. Using this in (9), the objective function becomes

J(\tilde{F}) = \frac{\sqrt{\det\big(\Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T\big)}}{\big(\max_{n\in\{N,\ldots,L_1\}}\|\Gamma_N(\tilde{F})\,\tilde{s}_{L_3}(n)\|_1\big)^{Np}}.   (A.1)

Under assumption (A1), the denominator of (A.1) satisfies

\big(\max_{n\in\{N,\ldots,L_1\}}\|\Gamma_N(\tilde{F})\,\tilde{s}_{L_3}(n)\|_1\big)^{Np} \le \|\Gamma_N(\tilde{F})\|_{1,1}^{Np},   (A.2)

where $\|\Gamma_N(\tilde{F})\|_{1,1} = \big\|[\|\Gamma_N(\tilde{F})_{:,1}\|_1, \ldots, \|\Gamma_N(\tilde{F})_{:,L_3 p}\|_1]\big\|_\infty$. Under assumption (A1), (A.2) must hold with equality, so that (A.1) becomes

J(\tilde{F}) = \frac{\sqrt{\det\big(\Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T\big)}}{\big\|[\|\Gamma_N(\tilde{F})_{:,1}\|_1 \cdots \|\Gamma_N(\tilde{F})_{:,L_3 p}\|_1]\big\|_\infty^{Np}}.   (A.3)

Since we choose $N \ge P$, there is at least one block column inside $\Gamma_N(\tilde{F})$ that contains $\tilde{F}$ in upside-down form. Therefore, we have the equality $\big\|[\|\Gamma_N(\tilde{F})_{:,1}\|_1 \cdots \|\Gamma_N(\tilde{F})_{:,L_3 p}\|_1]\big\|_\infty = \big\|[\|\Gamma_N(\tilde{F})_{1,:}\|_1 \cdots \|\Gamma_N(\tilde{F})_{Np,:}\|_1]\big\|_\infty$. This equality enables the objective to be bounded as

J(\tilde{F}) \le \frac{\sqrt{\det\big(\Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T\big)}}{\Big(\tfrac{1}{Np}\big\|[\|\Gamma_N(\tilde{F})_{1,:}\|_1 \cdots \|\Gamma_N(\tilde{F})_{Np,:}\|_1]\big\|_1\Big)^{Np}}   (A.4)

\le \frac{\sqrt{\det\big(\Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T\big)}}{\|\Gamma_N(\tilde{F})_{1,:}\|_1\,\|\Gamma_N(\tilde{F})_{2,:}\|_1 \cdots \|\Gamma_N(\tilde{F})_{Np,:}\|_1}   (A.5)

\le \frac{\sqrt{\det\big(\Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T\big)}}{\|\Gamma_N(\tilde{F})_{1,:}\|_2\,\|\Gamma_N(\tilde{F})_{2,:}\|_2 \cdots \|\Gamma_N(\tilde{F})_{Np,:}\|_2}.   (A.6)
To simplify the upper bound expression further, it is possible to rewrite the numerator of the objective by completing $\Gamma_N(\tilde{F})$ to a full-rank square matrix and using the Schur complement for the remaining block. Let us define a $(P-1)p \times L_3 p$ matrix $X = DP$, where $D = \mathrm{diag}(a_1, a_2, \ldots, a_{(P-1)p})$ is a full-rank diagonal matrix and $P$ is a permutation matrix. Let us assume that we select an appropriate $X$ matrix that satisfies the equality $\det(XWX^T) = 1$, where $W = \hat{R}_{\tilde{s}_{L_3}} - \hat{R}_{\tilde{s}_{L_3}}\Gamma_N(\tilde{F})^T\big(\Gamma_N(\tilde{F})\hat{R}_{\tilde{s}_{L_3}}\Gamma_N(\tilde{F})^T\big)^{-1}\Gamma_N(\tilde{F})\hat{R}_{\tilde{s}_{L_3}}$. Using the matrix $X$, we can express the determinant in a different way:

\det\!\left(\begin{bmatrix}\Gamma_N(\tilde{F}) \\ X\end{bmatrix}\hat{R}_{\tilde{s}_{L_3}}\,\big[\Gamma_N(\tilde{F})^T \;\; X^T\big]\right) = \det\big(\Gamma_N(\tilde{F})\hat{R}_{\tilde{s}_{L_3}}\Gamma_N(\tilde{F})^T\big)\,\det(XWX^T) = \det\big(\Gamma_N(\tilde{F})\hat{R}_{\tilde{s}_{L_3}}\Gamma_N(\tilde{F})^T\big).   (A.7)

Here $XWX^T$ is the Schur complement of $\Gamma_N(\tilde{F})\hat{R}_{\tilde{s}_{L_3}}\Gamma_N(\tilde{F})^T$. If we select an appropriate $X$ matrix that satisfies this Schur complement condition and use Hadamard's inequality, we obtain

\det\big(\Gamma_N(\tilde{F})\hat{R}_{\tilde{s}_{L_3}}\Gamma_N(\tilde{F})^T\big) \le \prod_{m=1}^{Np}\|\Gamma_N(\tilde{F})_{m,:}\|_2^2\,\prod_{n=1}^{(P-1)p}\|X_{n,:}\|_2^2\,\det(\hat{R}_{\tilde{s}_{L_3}}).   (A.8)

Inserting the inequality (A.8) into (A.5), a new upper bound is obtained:

J(\tilde{F}) \le \prod_{n=1}^{(P-1)p}\|X_{n,:}\|_2\,\det(\hat{R}_{\tilde{s}_{L_3}})^{1/2}.   (A.9)
Since $\Gamma_N(\tilde{F})$ is a block-Toeplitz matrix, each of its rows corresponds to a row of $\tilde{F}$. Moreover, since we choose $N \ge P$, it is guaranteed that there is a block column of $\Gamma_N(\tilde{F})$ that contains $[F(0), F(1), \ldots, F(P-1)]$. Thus, we consider $\tilde{F}$ instead of $\Gamma_N(\tilde{F})$ when interpreting the above inequalities. The inequalities in (A.4) and (A.5) hold with equality only if all the rows of $\tilde{F}$ have the same $\ell_1$-norm. The inequality between the $\ell_1$ and $\ell_2$ norms in (A.6) holds with equality only if there is exactly one non-zero entry in each row of $\tilde{F}$. Hadamard's inequality in (A.8) turns into an equality only if the non-zero entries in the rows of $\tilde{F}$ are in different positions with respect to mod $p$, in order to satisfy the orthogonality of the rows of $\Gamma_N(\tilde{F})$ to each other and to the rows of $X$. To satisfy all these conditions and achieve the upper bound in (A.9), we conclude that the overall system transfer matrix is of the form $F(z) = \mathrm{diag}(\alpha_1 z^{-d_1}, \alpha_2 z^{-d_2}, \ldots, \alpha_p z^{-d_p})\,P$, where $P$ is a permutation matrix.
Appendix B

As the first step of the proof of Theorem 2, the PSDM term in (26) is written explicitly as $\Omega(P_z(f)) = \Omega(F(f))\,\Omega(P_s(f))\,\Omega(F(f))^T$. Then, the first term in (26) becomes

J_1(\tilde{B}) = \int_{-1/2}^{1/2}\tfrac{1}{2}\Big(\log|\det(\Omega(F(f)))|^2 + \log\big(\det(\Omega(P_s(f)))\big)\Big)\, df.   (B.1)

The determinant term for $\Omega(F(f))$ can be bounded column-wise using Hadamard's inequality:

\log\big(|\det(\Omega(F(f)))|^2\big) \le \log\Big(\prod_{k=1}^{2p}\|\Omega(F(f))_{:,k}\|_2^2\Big) = \sum_{k=1}^{2p}\log\big(\|\Omega(F(f))_{:,k}\|_2^2\big).   (B.2)

Consequently, substituting the inequality (B.2) into (26), we reach the following upper bound for the objective function:

J(F(f)) \le \int_{-1/2}^{1/2}\Big(\sum_{k=1}^{2p}\log\big(\|\Omega(F(f))_{:,k}\|_2\big) + \tfrac{1}{2}\log\big(\det(\Omega(P_s(f)))\big) - 2p\log\big(\max_{m\in L_v}\|\acute{z}(m, f)\|_1\big)\Big)\, df.   (B.3)
The volume term for the $\ell_1$-norm ball in (B.3) can be written explicitly as $(\max_{m\in L_v}\|\acute{z}(m, f)\|_1)^{2p} = (\max_{m\in L_v}\|\Omega(F(f))\,\acute{s}(m, f)\|_1)^{2p}$. Recall that the samples of $\acute{s}(m, f)$ lie inside the unit $\ell_1$-norm ball. Then, for the maximum term, we can write the inequality $(\max_{m\in L_v}\|\Omega(F(f))\,\acute{s}(m, f)\|_1)^{2p} \le \|\Omega(F(f))\|_{1,1}^{2p}$. If assumption (A2) holds, this inequality turns into an equality.

By using the relations between the $\ell_\infty$, $\ell_1$ and $\ell_2$ norms, we can put forward the following inequalities:

\|\Omega(F(f))\|_{1,1}^{2p} = \big\|[\|\Omega(F(f))_{:,1}\|_1, \cdots, \|\Omega(F(f))_{:,2p}\|_1]\big\|_\infty^{2p}   (B.4)
\ge \Big(\tfrac{1}{2p}\big\|[\|\Omega(F(f))_{:,1}\|_1, \ldots, \|\Omega(F(f))_{:,2p}\|_1]\big\|_1\Big)^{2p}   (B.5)
\ge \|\Omega(F(f))_{:,1}\|_1\,\|\Omega(F(f))_{:,2}\|_1 \cdots \|\Omega(F(f))_{:,2p}\|_1   (B.6)
\ge \|\Omega(F(f))_{:,1}\|_2\,\|\Omega(F(f))_{:,2}\|_2 \cdots \|\Omega(F(f))_{:,2p}\|_2.   (B.7)

Substituting the inequality (B.7) into (B.3), the upper bound on the objective becomes

J(F(f)) \le \frac{1}{2}\int_{-1/2}^{1/2}\log\big(\det(\Omega(P_s(f)))\big)\, df.   (B.8)

The norm inequality in (B.5) and the arithmetic-geometric mean inequality in (B.6) hold with equality only if all the columns of $\Omega(F(f))$ have the same $\ell_1$-norm. The norm inequality in (B.7) holds with equality only if there is exactly one non-zero entry in each column of $\Omega(F(f))$. Hadamard's inequality in (B.2) is achieved with equality only if all the columns of $\Omega(F(f))$ are orthogonal to each other. As a result, to achieve the upper bound in (B.8), we conclude that the system frequency response must be of the form $\Omega(F(f)) = \alpha P$, where $P$ is a permutation matrix.
References
[1] L. Parra, C. Spence, Convolutive blind source separation of non-stationary sources, IEEE Trans. on Speech and Audio Processing 8 (3) (May 2000) 320-327. (Code available at http://bsp.teithe.gr/members/downloads.html)
[2] S. C. Douglas, M. Gupta, H. Sawada, S. Makino, Spatio-temporal FastICA algorithms for the blind separation of convolutive mixtures, IEEE Trans. on Audio, Speech, and Language Processing 15 (5) (July 2007) 1511-1520.
[3] T. Oktem, A. T. Erdogan, A. Demir, Adaptive receiver structures for fiber communication systems employing polarization-division multiplexing, Journal of Lightwave Technology 28 (10) (15 May 2010) 1536-1546.
[4] C. Vayá, J. J. Rieta, C. Sánchez, D. Moratal, Convolutive blind source separation algorithms applied to the electrocardiogram of atrial fibrillation: Study of performance, IEEE Transactions on Biomedical Engineering 54 (8) (August 2007) 1530-1533.
[5] D. Nion, N. D. Sidiropoulos, Adaptive algorithms to track the PARAFAC decomposition of a third-order tensor, IEEE Transactions on Signal Processing 57 (6) (June 2009) 2299-2310.
[6] Z. Koldovský, P. Tichavský, Time-domain blind audio source separation using advanced ICA methods, in: 8th Annual Conference of the International Speech Communication Association, August 2007, pp. 846-849.
[7] S. Makino, H. Sawada, T.-W. Lee, Blind Speech Separation, Vol. 615, Springer, 2007.
[8] P. Smaragdis, Blind separation of convolved mixtures in the frequency domain, Neurocomputing 22 (1-3) (20 November 1998) 21-34.
[9] H. Attias, New EM algorithms for source separation and deconvolution, in: Acoustics, Speech, and Signal Processing, 2003 IEEE International Conference on, Vol. 5, IEEE, 6-10 April 2003, pp. 297-300.
[10] O. Yilmaz, S. Rickard, Blind separation of speech mixtures via time-frequency masking, IEEE Transactions on Signal Processing 52 (7) (July 2004) 1830-1847.
[11] P. Comon, Independent component analysis, a new concept?, Signal Processing 36 (3) (April 1994) 287-314.
[12] A. Hyvarinen, E. Oja, A fast fixed-point algorithm for independent component analysis, Neural Computation 9 (7) (10 July 1997) 1483-1492.
[13] Z. Koldovský, P. Tichavský, E. Oja, Efficient variant of algorithm FastICA for independent component analysis attaining the Cramér-Rao lower bound, IEEE Transactions on Neural Networks 17 (5) (October 2006) 1265-1277.
[14] Z. He, S. Xie, S. Ding, A. Cichocki, Convolutive blind source separation in the frequency domain based on sparse representation, IEEE Transactions on Audio, Speech, and Language Processing 15 (5) (July 2007) 1551-1563.
[15] S. Cruces, Bounded component analysis of linear mixtures: A criterion of minimum convex perimeter, IEEE Transactions on Signal Processing 58 (4) (15 January 2010) 2141-2154.
[16] A. T. Erdogan, A class of bounded component analysis algorithms for the separation of both independent and dependent sources, IEEE Transactions on Signal Processing 61 (22) (15 November 2013) 5730-5743.
[17] E. Babatas, A. T. Erdogan, An algorithmic framework for sparse bounded component analysis, IEEE Transactions on Signal Processing 66 (19) (01 October 2018) 5194-5205.
[18] E. Babatas, A. T. Erdogan, Sparse bounded component analysis for convolutive mixtures, in: Acoustics, Speech, and Signal Processing, 2018 IEEE International Conference on, IEEE, 15-20 April 2018, pp. 1-5.
[19] Y. Inouye, R.-W. Liu, A system-theoretic foundation for blind equalization of an FIR MIMO channel system, IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications 49 (4) (April 2002) 425-436.
[20] A. Bagirov, N. Karmitsa, M. M. Makela, Introduction to Nonsmooth Optimization: Theory, Practice and Software, Springer, August 2014.
[21] H. A. Inan, A. T. Erdogan, S. Cruces, Stationary point characterization for a class of BCA algorithms, IEEE Transactions on Signal Processing 65 (20) (2017) 5437-5452.
[22] S. Arora, R. Ge, T. Ma, A. Moitra, Simple, efficient, and neural algorithms for sparse coding, in: Conference on Learning Theory (COLT), Proceedings of Machine Learning Research, 3-6 July 2015.
[23] T. Mei, A. Mertins, F. Yin, J. Xi, J. Chicharo, Blind source separation for convolutive mixtures based on the joint diagonalization of power spectral density matrices, Signal Processing 88 (8) (August 2008) 1990-2007.
[24] W. Martin, P. Flandrin, Wigner-Ville spectral analysis of nonstationary processes, IEEE Transactions on Acoustics, Speech, and Signal Processing 33 (6) (1985) 1461-1470.
[25] M. Castella, E. Moreau, A new method for kurtosis maximization and source separation, in: Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, IEEE, 14-19 March 2010, pp. 2670-2673. (Code available at http://bass-db.gforge.inria.fr/bss_locate/)
[26] K. Rahbar, J. P. Reilly, A frequency domain method for blind source separation of convolutive audio mixtures, IEEE Trans. on Speech and Audio Processing 13 (5) (September 2005) 832-844.
[27] H. A. Inan, A. T. Erdogan, A convolutive bounded component analysis framework for potentially nonstationary independent and/or dependent sources, IEEE Transactions on Signal Processing 63 (1) (01 January 2015) 18-30.
[28] J. Cho, C. D. Yoo, Underdetermined convolutive BSS: Bayes risk minimization based on a mixture of super-Gaussian posterior approximation, IEEE Transactions on Audio, Speech, and Language Processing 23 (5) (06 March 2015) 828-839.
[29] E. Vincent, R. Gribonval, C. Févotte, Performance measurement in blind audio source separation, IEEE Trans. on Audio, Speech and Language Processing 14 (4) (19 June 2006) 1462-1469. (Code available at http://bass-db.gforge.inria.fr/bss_eval/)
[30] A. Cichocki, S.-I. Amari, K. Siwek, T. Tanaka, A. H. Phan, R. Zdunek, S. Cruces, P. Georgiev, Y. Washizawa, Z. Leonowicz, et al., ICALAB toolboxes, URL: http://www.bsp.brain.riken.jp/ICALAB (2007).
[31] Speech and Audio Processing Group, MARDY database, http://www.commsp.ee.ic.ac.uk/sap/resources/mardy-multichannel-acoustic-reverberation-database-at-york-database/.
[32] V. G. Reju, S. N. Koh, I. Y. Soon, Underdetermined convolutive blind source separation via time-frequency masking, IEEE Transactions on Audio, Speech and Language Processing 18 (1) (January 2010) 101-116. (Code available at https://www.mathworks.com/matlabcentral/fileexchange/47069-convolutive-bss)
[33] J. Y. Wen, N. D. Gaubitch, E. A. Habets, T. Myatt, P. A. Naylor, Evaluation of speech dereverberation algorithms using the MARDY database, in: Proc. Intl. Workshop Acoust. Echo Noise Control (IWAENC), 12-14 September 2006.