Time and Frequency Based Sparse Bounded Component Analysis Algorithms for Convolutive Mixtures

Eren Babatas, Alper T. Erdogan
Electrical-Electronics Engineering Dept., Koc University, Istanbul, 34450, Turkey
Abstract
In this paper, we introduce time-domain and frequency-domain versions of a new Blind Source Separation (BSS) approach to extract bounded-magnitude sparse sources from convolutive mixtures. We derive algorithms by maximizing the proposed objective functions, which are defined in a completely deterministic framework, and prove that the global maxima of the objective functions yield perfect separation under suitable conditions. The derived algorithms can be applied to temporally or spatially dependent sources as well as independent sources. We provide experimental results that demonstrate some benefits of the approach, including an application to blind speech separation.
Keywords: Convolutive Blind Source Separation, Bounded Component
Analysis, Sparse Component Analysis, Sparse Bounded Component Analysis,
Blind Speech Separation.
1. Introduction
Convolutive Blind Source Separation (BSS) is a generic inverse problem with broad impact, since extracting multiple sources from their space-time mixtures is a common problem in various engineering areas. Acoustic source separation in reverberant environments [1, 2] can be considered the signature case; however, the applications span a much wider range, including digital communications [3], electrocardiogram measurements [4], and radar signal processing [5].

Email addresses: ebabatas@ku.edu.tr (Eren Babatas), alperdogan@ku.edu.tr (Alper T. Erdogan)
Preprint submitted to Journal of Signal Processing, March 11, 2020
While there are many different approaches to the convolutive mixing problem in BSS, the related methods can be arranged into two main categories according to their processing domain. The first group consists of the time-domain methods, where the separator implementation and the algorithm are based on convolution with a Multiple Input Multiple Output (MIMO) FIR filter [2, 6]. The main difficulty with these methods is that compensating for the mixing may require long FIR filters in order to achieve acceptable performance. This situation is especially likely for audio signals under reverberation [7], and it increases the processing load of the time-domain convolution. Longer filters also imply more coefficients to be learned during the training period. The methods in the second group, the frequency-domain methods, map the temporal mixtures to frequency spectrum data so that temporal convolution is converted to multiplication in the frequency domain [8]. Instantaneous BSS algorithms can then be applied to each frequency bin separately. The frequency transformation is carried out using methods such as the sliding-window Discrete Fourier Transform (DFT) [9] and the Short Time Fourier Transform (STFT) [10]. Frequency-domain methods have two important issues, namely scaling and permutation inconsistencies, which cause unequal scaling of the spectral components in different bins and spectral mixing of the source components. Also, the mixing process is complex valued because of the frequency transformation, which increases the complexity and the processing load of the frequency-domain methods. On the other hand, improved efficiency and better convergence properties are considered advantages of the frequency-domain methods.
BSS algorithms exploit different assumptions on the sources and mixing systems to solve the BSS problem. Among them, the mutual independence of the sources is a strong assumption used by the popular Independent Component Analysis (ICA) method, which was initially proposed for the instantaneous mixing case [11]. The ICA approach was later extended to convolutive BSS settings. As an example, Douglas [2] proposed a spatio-temporal extension of the well-known FastICA algorithm of Hyvärinen and Oja [12]. Similarly, Koldovský [6] applied the powerful ICA algorithm EFICA [13] to convolutive mixtures for time-domain separation. There are many other categories of convolutive BSS algorithms based on different properties. Several BSS algorithms [1] exploit the non-stationarity of the sources, which is a very common characteristic of speech signals. In another category, Sparse Component Analysis (SCA) methods have also been a useful tool in convolutive BSS applications. Although many source signals are not sparse in the time domain, they turn out to be sparse when transformed to the frequency domain or the time-frequency domain. Therefore, frequency-domain SCA methods have recently received more attention, with the advantage that they can also be applied to the under-determined mixing case in convolutive BSS [14].
Bounded Component Analysis (BCA) is a recently introduced BSS framework based on the domain separability assumption, which is less strict than the independence assumption in ICA. Cruces showed in [15] that source boundedness, used as side information, makes it possible to separate temporally or spatially dependent sources as well as independent sources. In [16], Erdogan presented a deterministic framework based on geometric optimization settings. In [17], we proposed an extension of the instantaneous BCA approach in [16] to bounded signals that are sparse in nature. This extension is referred to as Sparse BCA (SBCA). SBCA modifies the geometric framework in [16] such that the sources are assumed to be bounded in an $\ell_1$-norm ball instead of the $\ell_\infty$-norm ball used in [16].
In this article, we propose time-based and frequency-based convolutive versions of the SBCA approach in [17] for the (over)determined case. The time-based approach was first introduced in the conference paper [18]; here, we also explain its extension to complex source signals. In addition, this article makes two main contributions with respect to the conference paper:
- A new algorithm that uses a frequency-based objective function in order to sparsify source signals by mapping them to frequency spectrum data.
- A practical application of the algorithm to blind speech separation in a reverberant scene.
The article is organized as follows. In Section 2, we introduce the time-domain convolutive BSS setup and propose a time-domain-sparsity based convolutive SBCA approach. In Section 3, we map convolutive mixtures to frequency spectrum data via the STFT and describe a new signal setup in the frequency domain. Then, we propose a frequency-domain-sparsity based SBCA optimization setting. Finally, numerical examples, including a blind speech separation application, are provided in Section 4.
This paper uses the following notation. Let $\tilde{X}$ denote a convolutive channel response defined as $\tilde{X} = [X(0), X(1), \ldots, X(K-1)]$, where $K$ is the channel order. $\Gamma_N(\tilde{X})$ represents the block Toeplitz matrix of $\tilde{X}$. Let $\|x\|_a$ denote the $a$-norm of the vector $x$. Let $\|X\|_{a,b}$ denote the induced matrix norm $\sup_{\|s\|_b \le 1} \|Xs\|_a$. $\Upsilon: \mathbb{C}^p \to \mathbb{R}^{2p}$ denotes the operator transforming a complex signal vector of size $p$ to its real isomorphic counterpart of size $2p$, and $\acute{x}$ is shorthand for $\Upsilon(x)$. Let $\tilde{x}_K(n) = [x^T(n), x^T(n-1), \ldots, x^T(n-K+1)]^T$ denote the convolutive signal vector of $x(n)$. Then $\acute{\tilde{x}}_K(n)$ is the real isomorphic vector for $\tilde{x}_K(n)$. Let $\Omega(\tilde{X})$ denote the real isomorphic matrix for the convolutive channel response $\tilde{X}$, where $\Omega: \mathbb{C}^{q \times Kp} \to \mathbb{R}^{2q \times 2Kp}$. $x(m, f)$ is the Short Time Fourier Transform of $x(n)$. Let $P_x(f)$ denote the Power Spectral Density Matrix (PSDM) of $x(f)$ and $P_x(m, f)$ denote the instant PSDM of $x(m, f)$.
2. Time-Domain-Sparsity Based Convolutive SBCA Framework
Before defining the time-domain convolutive SBCA framework, we give a short introduction to the instantaneous SBCA framework proposed in [17]. In the instantaneous SBCA framework:
- There are $p$ sources represented by the sample set $S = \{s(n) \in \mathbb{R}^p, n = 1, \ldots, L\}$, and the source samples are bounded by an $\ell_1$-norm ball, i.e., $s(n) \in B_s$ where $B_s = \{q \in \mathbb{R}^p \mid \|q\|_1 \le 1\}$. In the more general case, $B_s$ is replaced with a weighted $\ell_1$-norm ball.
- The source signals are mixed in linear, memoryless and lossless channels whose responses are denoted by the full-rank matrix $A \in \mathbb{R}^{q \times p}$, where $q \ge p$. The mixed signals are formulated as $y(n) = A s(n)$.
- The mixed signals are filtered by an instantaneous separator denoted by the matrix $B \in \mathbb{R}^{p \times q}$. The separator outputs are formulated as $z(n) = B y(n)$.
- The cascade of the mixing and separator systems is given by $F = BA$. The aim of BSS algorithms is to obtain a separator matrix that provides the equality $F = PD$, where $P$ is a permutation matrix and $D$ is a full-rank diagonal scaling matrix.

The SBCA algorithm solves the BSS problem described above by using the geometrical optimization setting illustrated in Figure 1, i.e., the volume ratio of two geometrical objects called the principal hyper-ellipsoid (red ball in Figure 1) and the bounding $\ell_1$-norm ball (green diamond-shaped box in Figure 1). The SBCA objective function derived from these volumes is defined as

$$J(B) = \frac{\sqrt{\det(\hat{R}_z)}}{\left(\max_{n \in \{1,\ldots,L\}} \|z(n)\|_1\right)^p}. \tag{1}$$

In this formula, the numerator is proportional to the volume of the principal hyper-ellipsoid, i.e., $E_z = \{q \mid (q - \hat{\mu}_z)^T \hat{R}_z^{-1} (q - \hat{\mu}_z) \le 1\}$, where $\hat{R}_z$ is the sample covariance matrix of the separator outputs, $\hat{R}_z = \frac{1}{L}\sum_{n=1}^{L} z(n) z(n)^T - \hat{\mu}_z \hat{\mu}_z^T$. The denominator is proportional to the volume of the bounding $\ell_1$-norm ball, where $\|z(n)\|_1$ is the $\ell_1$ norm of the separator output. Note that the number of sources is assumed to be known a priori. The detailed derivations of the objective function and the resulting iterative update equation are provided in [17].

Figure 1: Geometric objects used in the SBCA framework. Diamond boxes: the bounding $\ell_1$-norm balls; red balls: the principal hyper-ellipsoids; green polytope: the image of the input $\ell_1$-norm ball under the mapping.
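As an illustration of how the objective in (1) can be evaluated, the following is a minimal NumPy sketch that works with the logarithm of (1). The function name `sbca_log_objective`, the toy Laplacian sources and the random mixing matrix are assumptions made for this example and are not taken from [17]. The vertices of the unit $\ell_1$-norm ball are appended to the samples so that the toy data actually contain the corners of their bounding ball.

```python
import numpy as np

def sbca_log_objective(B, Y):
    """Log of objective (1): 0.5*log det(R_z) - p*log(max_n ||z(n)||_1)."""
    Z = B @ Y                                  # separator outputs z(n), shape (p, L)
    p = Z.shape[0]
    R_z = np.cov(Z, bias=True)                 # sample covariance with 1/L normalization
    sign, logdet = np.linalg.slogdet(R_z)
    if sign <= 0:                              # degenerate ellipsoid: zero volume
        return -np.inf
    max_l1 = np.abs(Z).sum(axis=0).max()       # largest l1 norm over all samples
    return 0.5 * logdet - p * np.log(max_l1)

# Toy data (hypothetical): sparse sources inside the unit l1 ball.
rng = np.random.default_rng(0)
p, q, L = 3, 5, 2000
S = rng.laplace(size=(p, L)) * (rng.random((p, L)) < 0.3)
S /= np.abs(S).sum(axis=0).max()               # scale into the unit l1 ball
S = np.hstack([S, np.eye(p), -np.eye(p)])      # make sure the ball's vertices are present
A = rng.standard_normal((q, p))                # hypothetical full-rank mixing matrix
Y = A @ S

J_perfect = sbca_log_objective(np.linalg.pinv(A), Y)           # F = BA = I
J_random = sbca_log_objective(rng.standard_normal((p, q)), Y)  # arbitrary separator
print(J_perfect, J_random)
```

In this noiseless toy setting, the perfect separator should score at least as high as the arbitrary one, and the objective is unchanged when the separator is multiplied by a nonzero constant.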
2.1. Convolutive Blind Source Separation Setup in Time Domain
For the convolutive BSS setup:
- The sources are represented by the set $S = \{s(n) \in \mathbb{R}^p\}$ and are assumed to be bounded and to lie in the $\ell_1$-norm ball $B_s = \{s \in \mathbb{R}^p \mid \|s\|_1 \le 1\}$. It is again assumed that $p$ is known a priori. Although $B_s$ can be replaced with a weighted $\ell_1$-norm ball for sources having different ranges, we use the unit $\ell_1$-norm ball to simplify expressions without any loss of generality.
- The convolutive MIMO mixing channel output is formulated as

$$y(n) = \sum_{m=0}^{M-1} A(m)\, s(n-m), \tag{2}$$

where $\{A(m), m \in \{0, \ldots, M-1\}\}$ are the channel impulse response coefficients of size $q \times p$ and $q$ is the number of mixtures. The mixing system is equalizable [19] with order $M-1$ and $q \ge p$. Equalizable channels may have transmission zeros at $z = 0$ in the $z$-plane, which makes it possible for the source signals to reach the receiver with different delays relative to each other. Moreover, a necessary and sufficient condition for a convolutive channel $\{A(m), m \in \{0, \ldots, M-1\}\}$, where $A(m) \in \mathbb{R}^{q \times p}$, to be equalizable is $\operatorname{rank}(A(z)) = p$ for $q \ge p$.
- The FIR separator filter output is formulated as

$$z(n) = \sum_{k=0}^{K-1} B(k)\, y(n-k), \tag{3}$$

where $\{B(k), k \in \{0, \ldots, K-1\}\}$ are the separator filter coefficients of dimension $p \times q$ and $K-1$ is the separator filter order.
- Inserting (2) into (3) yields the total system response

$$F(m) = \sum_{k=0}^{K-1} B(k)\, A(m-k), \quad m = 0, \ldots, P-1, \tag{4}$$

where $P$ is the total system response order and $P - 1 = K + M - 2$.
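- Using the convolutive mixing channel response matrix $\tilde{A} = [A(0), A(1), \ldots, A(M-1)]$, the separator filter response matrix $\tilde{B} = [B(0), B(1), \ldots, B(K-1)]$ and the system response matrix $\tilde{F} = [F(0), F(1), \ldots, F(P-1)]$, (2) and (3) can be reformulated as

$$y(n) = \tilde{A}\, \tilde{s}_M(n), \quad n = 1, \ldots, L, \tag{5}$$

$$z(n) = \tilde{B}\, \tilde{y}_K(n), \quad n = 1, \ldots, L+K-1, \tag{6}$$

$$z(n) = \tilde{F}\, \tilde{s}_P(n), \quad n = 1, \ldots, L+K-1, \tag{7}$$

where $\tilde{s}_M(n) = [s^T(n), s^T(n-1), \ldots, s^T(n-M+1)]^T$, $\tilde{y}_K(n) = [y^T(n), y^T(n-1), \ldots, y^T(n-K+1)]^T$ and $\tilde{s}_P(n) = [s^T(n), s^T(n-1), \ldots, s^T(n-P+1)]^T$.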
- We will use the extended separator output vector $\tilde{z}_N(n) = [z^T(n), z^T(n-1), \ldots, z^T(n-N+1)]^T$ and the block Toeplitz matrix $\Gamma_N(\tilde{F})$ in the objective function formulation. $\Gamma_N(\tilde{F})$ is the block Toeplitz matrix whose first block row is $[F(0), F(1), \ldots, F(P-1), 0, \ldots, 0]$ and whose first block column is $[F(0), 0, \ldots, 0]^T$. Note that the zero matrices are $p \times p$. More explicitly, $\Gamma_N(\tilde{F})$ is given by

$$\Gamma_N(\tilde{F}) = \begin{bmatrix} F(0) & F(1) & \cdots & F(P-1) & \cdots & 0 \\ \vdots & \ddots & \ddots & & \ddots & \vdots \\ 0 & \cdots & F(0) & F(1) & \cdots & F(P-1) \end{bmatrix}, \tag{8}$$

where $N \ge P$. This yields $\tilde{z}_N(n) = \Gamma_N(\tilde{F})\, \tilde{s}_{N+P-1}(n)$ for the extended separator output vector, where $\tilde{s}_{N+P-1}(n) = [s^T(n), s^T(n-1), \ldots, s^T(n-N-P+2)]^T$. Defining the set $S_{N+P-1} = \{\tilde{s}_{N+P-1}(N+K-1), \tilde{s}_{N+P-1}(N+K), \ldots, \tilde{s}_{N+P-1}(L)\}$, we introduce the following local dominance assumption for our convolutive BSS setup:

Assumption (A1): The source sample set $S_{N+P-1}$ contains the vertices (the corners of a volume) of its bounding $\ell_1$-norm ball $B_s$.
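The relations in (2)-(4) and (8) can be checked numerically. The sketch below is an illustrative NumPy example, not the authors' implementation: the helper names `convolve_mimo`, `total_response`, `block_toeplitz` and `stacked` are hypothetical, the filters are random, and time indices are 0-based. It verifies that filtering the mixtures with $\tilde{B}$ equals filtering the sources with $\tilde{F}$, and that $\tilde{z}_N(n) = \Gamma_N(\tilde{F})\, \tilde{s}_{N+P-1}(n)$.

```python
import numpy as np

def convolve_mimo(H, x):
    """y(n) = sum_m H[m] @ x(n-m). H: (taps, out, in), x: (in, L) -> (out, L+taps-1)."""
    taps, n_out, _ = H.shape
    L = x.shape[1]
    y = np.zeros((n_out, L + taps - 1))
    for m in range(taps):
        y[:, m:m + L] += H[m] @ x              # tap m delays the input by m samples
    return y

def total_response(B, A):
    """F(m) = sum_k B(k) A(m-k), as in (4); result has K+M-1 taps."""
    K, M = B.shape[0], A.shape[0]
    F = np.zeros((K + M - 1, B.shape[1], A.shape[2]))
    for k in range(K):
        for m in range(M):
            F[k + m] += B[k] @ A[m]
    return F

def block_toeplitz(F, N):
    """Gamma_N(F~): block row i holds [F(0), ..., F(P-1)] shifted i blocks to the right."""
    P, r, c = F.shape
    G = np.zeros((N * r, (N + P - 1) * c))
    for i in range(N):
        for j in range(P):
            G[i*r:(i+1)*r, (i+j)*c:(i+j+1)*c] = F[j]
    return G

def stacked(x, n, K):
    """x~_K(n) = [x(n); x(n-1); ...; x(n-K+1)] with a 0-based time index n."""
    return np.concatenate([x[:, n - k] for k in range(K)])

rng = np.random.default_rng(1)
p, q, M, K, L = 2, 3, 3, 4, 50
A = rng.standard_normal((M, q, p))             # hypothetical mixing taps A(0..M-1)
B = rng.standard_normal((K, p, q))             # hypothetical separator taps B(0..K-1)
s = rng.standard_normal((p, L))

y = convolve_mimo(A, s)                        # equation (2)
z = convolve_mimo(B, y)                        # equation (3)
F = total_response(B, A)                       # equation (4)
P = F.shape[0]                                 # number of taps, K + M - 1
N, n = P, 20                                   # N >= P, and n late enough for a full window
lhs = stacked(z, n, N)
rhs = block_toeplitz(F, N) @ stacked(s, n, N + P - 1)
print(np.allclose(z, convolve_mimo(F, s)), np.allclose(lhs, rhs))
```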
2.2. Objective Function
Similar to [17], the objective function is defined as the volume ratio of two geometrical objects:
- The bounding $\ell_1$-norm ball is defined with respect to the maximum $\ell_1$ norm of the extended separator outputs, i.e., $B_z = \{q \mid \|q\|_1 \le \max_{n \in \{N, \ldots, L_1\}} \|\tilde{z}_N(n)\|_1\}$, where $L_1 = L + K - 1$.
- The principal hyper-ellipsoid is defined with respect to the covariance of the extended separator outputs, i.e., $E_z = \{q \mid (q - \hat{\mu}_{\tilde{z}_N})^T \hat{R}_{\tilde{z}_N}^{-1} (q - \hat{\mu}_{\tilde{z}_N}) \le 1\}$, where $\hat{\mu}_{\tilde{z}_N} = \frac{1}{L_2}\sum_{n=N}^{L_1} \tilde{z}_N(n)$, $\hat{R}_{\tilde{z}_N} = \frac{1}{L_2}\sum_{n=N}^{L_1} (\tilde{z}_N(n) - \hat{\mu}_{\tilde{z}_N})(\tilde{z}_N(n) - \hat{\mu}_{\tilde{z}_N})^T$ and $L_2 = L_1 - N + 1$.

Based on these definitions, we formulate the convolutive SBCA (CSBCA) objective as the volume ratio

$$J(\tilde{B}) = \frac{\sqrt{\det(\hat{R}_{\tilde{z}_N})}}{\left(\max_{n \in \{N, \ldots, L_1\}} \|\tilde{z}_N(n)\|_1\right)^{Np}}, \tag{9}$$

which is to be maximized. The main difference between (9) and (1) is that the separator output vector $z$ is replaced with the extended separator output vector $\tilde{z}_N$. The following theorem ensures that maximization of the objective function in (9) achieves blind source separation of convolutive mixtures.

Theorem-1: Assume that an FIR separator matrix of order $K-1$ can equalize the mixing channel $\tilde{A}$. Then all global maxima of (9) give perfect separation if assumption (A1) holds.

The proof is provided in Appendix A.
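The objective in (9) can be evaluated directly from a block of separator outputs. The following NumPy sketch computes the logarithm of (9); the function name `csbca_log_objective` and the toy signals are assumptions for illustration, and the extended vectors are built for every time index that has a full window of $N$ samples rather than for the exact index range used above.

```python
import numpy as np

def csbca_log_objective(Z, N):
    """Log of (9) for separator outputs Z of shape (p, T)."""
    p, T = Z.shape
    # Column c is z~_N(n) = [z(n); z(n-1); ...; z(n-N+1)] for n = N-1+c (0-based).
    Z_ext = np.vstack([Z[:, N - 1 - i:T - i] for i in range(N)])
    R = np.cov(Z_ext, bias=True)               # covariance of the extended outputs
    sign, logdet = np.linalg.slogdet(R)
    if sign <= 0:
        return -np.inf
    max_l1 = np.abs(Z_ext).sum(axis=0).max()   # largest l1 norm of an extended output
    return 0.5 * logdet - N * p * np.log(max_l1)

rng = np.random.default_rng(2)
p, T, N = 2, 400, 3
S = rng.laplace(size=(p, T)) * (rng.random((p, T)) < 0.2)   # sparse toy signals
mixed = S + 0.8 * np.roll(S[::-1], 1, axis=1)               # crude delayed cross-talk

J_sources = csbca_log_objective(S, N)
J_mixed = csbca_log_objective(mixed, N)
print(J_sources, J_mixed)
```

Because both terms scale identically, the value is invariant to a common nonzero scaling of the outputs, which is one quick sanity check for an implementation.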
2.3. Iterative Algorithm
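In order to transform the objective in (9) to a more convenient form for the iterative algorithm derivation, we take its logarithm:

$$\mathcal{J}(\tilde{B}) = \underbrace{\tfrac{1}{2}\log\det\!\left(\Gamma_N(\tilde{B})\, \hat{R}_{\tilde{y}_{N+K-1}}\, \Gamma_N(\tilde{B})^T\right)}_{J_1(\tilde{B})} - \underbrace{Np \log\!\left(\max_{n \in \{N, \ldots, L_1\}} \|\tilde{z}_N(n)\|_1\right)}_{J_2(\tilde{B})}.$$

The first term $J_1(\tilde{B})$ is convex differentiable, and the second term $J_2(\tilde{B})$ is a convex non-smooth function. We can utilize the Clarke sub-differential [20] in order to take the derivative of $J_2(\tilde{B})$. If we denote the $\ell_1$ norm of the extended separator output as $f_n(\Gamma_N(\tilde{B})) = \|\tilde{z}_N(n)\|_1$, the sub-differential set of $\max_n f_n(\Gamma_N(\tilde{B}))$ with respect to the argument $\Gamma_N(\tilde{B})$ can be written as

$$\partial \max_n\left(f_n(\Gamma_N(\tilde{B}))\right) = \left\{ o = r\, \tilde{y}_{N+K-1}(l)^T : r_i = \operatorname{sign}\{(\tilde{z}_N)_i(l)\} + \mathbb{1}_{(\tilde{z}_N)_i(l) = 0}\, \alpha_i \right\}, \tag{10}$$

where $\tilde{y}_{N+K-1}(l) = [y^T(l), y^T(l-1), \ldots, y^T(l-N-K+2)]^T$, $l$ is the index of the maximum $\ell_1$ norm and $\alpha_i \in [-1, 1]$. Since we aim to derive an iterative update for the argument $\tilde{B}$ instead of $\Gamma_N(\tilde{B})$, it is necessary to convert (10) into a new derivative equation with respect to $\tilde{B}$ as follows:

$$\acute{o} = \sum_{m=0}^{N-1} o_{mp+1:(m+1)p,\; mq+1:(m+K)q}. \tag{11}$$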
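By applying the chain rule, we obtain the update term corresponding to $J_2(\tilde{B})$ as

$$\partial J_2(\tilde{B}) = \left\{ Np \sum_{l \in \mathcal{I}_{\tilde{B}}} \lambda_l\, \frac{\sum_{m=0}^{N-1} o_{mp+1:(m+1)p,\; mq+1:(m+K)q}}{\max_{n \in \{N, \ldots, L_1\}} \|\tilde{z}_N(n)\|_1} \right\}, \tag{12}$$

where $\lambda_l \ge 0$, $\sum_{l \in \mathcal{I}_{\tilde{B}}} \lambda_l = 1$, and $o$ is evaluated at the index $l$. $\mathcal{I}_{\tilde{B}}$ is a subset of $\{N, \ldots, L_1\}$ and consists of the indices for which the maximum $\ell_1$ norm at the separator output is achieved. As in (11), the gradient of $J_1(\tilde{B})$ with respect to $\tilde{B}$ is written as a summation term

$$\nabla J_1(\tilde{B}) = \sum_{m=0}^{N-1} X_{mp+1:(m+1)p,\; mq+1:(m+K)q}, \tag{13}$$

where $X = \left(\Gamma_N(\tilde{B})\, \hat{R}_{\tilde{y}_{N+K-1}}\, \Gamma_N(\tilde{B})^T\right)^{-1} \Gamma_N(\tilde{B})\, \hat{R}_{\tilde{y}_{N+K-1}}$, $\hat{R}_{\tilde{y}_{N+K-1}} = \frac{1}{L_2}\sum_{n=N}^{L_1} (\tilde{y}_{N+K-1}(n) - \hat{\mu}_{\tilde{y}_{N+K-1}})(\tilde{y}_{N+K-1}(n) - \hat{\mu}_{\tilde{y}_{N+K-1}})^T$, and $\hat{\mu}_{\tilde{y}_{N+K-1}} = \frac{1}{L_2}\sum_{n=N}^{L_1} \tilde{y}_{N+K-1}(n)$.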
Consequently, the iterative update equation is formed by combining the gradient of $J_1(\tilde{B})$ and the chosen sub-gradient of $J_2(\tilde{B})$. In addition, we can generate a simpler iterative update where only one $\lambda_l$ term is non-zero. This can be accomplished by selecting a random index $l$ from $\mathcal{I}_{\tilde{B}}$ at every iteration:

$$\tilde{B}^{(t+1)} = \tilde{B}^{(t)} + \sigma^{(t)} \left( \sum_{m=0}^{N-1} X^{(t)}_{mp+1:(m+1)p,\; mq+1:(m+K)q} - Np\, \frac{\sum_{m=0}^{N-1} o^{(t)}_{mp+1:(m+1)p,\; mq+1:(m+K)q}}{\max_{n \in \{N, \ldots, L_1\}} \|\tilde{z}_N(n)\|_1} \right). \tag{14}$$
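A single iteration of (14) can be sketched as follows for real-valued signals. This is an illustrative NumPy sketch under simplifying assumptions, not the authors' code: all function names are hypothetical, the extended vectors are built for every index with a full window, the step size is fixed, and the index $l$ is taken as the first maximizer instead of a random element of $\mathcal{I}_{\tilde{B}}$.

```python
import numpy as np

def block_toeplitz(Bt, N):
    """Gamma_N(B~) for taps Bt of shape (K, p, q)."""
    K, p, q = Bt.shape
    G = np.zeros((N * p, (N + K - 1) * q))
    for m in range(N):
        for k in range(K):
            G[m*p:(m+1)*p, (m+k)*q:(m+k+1)*q] = Bt[k]
    return G

def extend(Y, W):
    """Columns are [y(n); y(n-1); ...; y(n-W+1)] for every n with a full window."""
    T = Y.shape[1]
    return np.vstack([Y[:, W - 1 - i:T - i] for i in range(W)])

def fold(Mat, N, K, p, q):
    """Sum of the N shifted p x Kq blocks, as in (11) and (13)."""
    return sum(Mat[m*p:(m+1)*p, m*q:(m+K)*q] for m in range(N))

def log_objective(Bt, Y, N):
    """Log objective J1 - J2 evaluated from the mixtures and the separator taps."""
    K, p, q = Bt.shape
    Y_ext = extend(Y, N + K - 1)
    G = block_toeplitz(Bt, N)
    R = np.cov(Y_ext, bias=True)
    max_l1 = np.abs(G @ Y_ext).sum(axis=0).max()
    return 0.5 * np.linalg.slogdet(G @ R @ G.T)[1] - N * p * np.log(max_l1)

def csbca_step(Bt, Y, N, step):
    """One sub-gradient ascent step in the spirit of (14)."""
    K, p, q = Bt.shape
    Y_ext = extend(Y, N + K - 1)
    G = block_toeplitz(Bt, N)
    R = np.cov(Y_ext, bias=True)
    X = np.linalg.solve(G @ R @ G.T, G @ R)          # (G R G^T)^{-1} G R
    Z_ext = G @ Y_ext
    norms = np.abs(Z_ext).sum(axis=0)
    l = int(np.argmax(norms))                        # index of the maximum l1 norm
    o = np.outer(np.sign(Z_ext[:, l]), Y_ext[:, l])  # sub-gradient element from (10)
    grad = fold(X, N, K, p, q) - N * p * fold(o, N, K, p, q) / norms[l]
    B_flat = np.hstack(list(Bt)) + step * grad       # B~ = [B(0), ..., B(K-1)]
    return B_flat.reshape(p, K, q).transpose(1, 0, 2)

rng = np.random.default_rng(5)
K, p, q, N = 2, 2, 3, 3
Y = rng.standard_normal((q, 400))                    # stand-in mixtures
Bt = rng.standard_normal((K, p, q))                  # random initial separator
J_before = log_objective(Bt, Y, N)
Bt_new = csbca_step(Bt, Y, N, step=1e-6)
J_after = log_objective(Bt_new, Y, N)
print(J_before, J_after)
```

With a sufficiently small step and a unique maximizing index, the log objective should not decrease after the step; in practice the step size schedule $\sigma^{(t)}$ would need tuning.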
As the proposed approach relies on the maximization of a non-concave objective function, characterizing the convergence behaviour of the corresponding sub-gradient based algorithm is relatively hard. Inan, Erdogan and Cruces provided an analysis in [21] of the stationary points of the BCA algorithm introduced in [16] and showed that they correspond to either the global maxima of the objective function or unstable saddle points. This means that the stationary points of the BCA algorithm do not correspond to non-global local maxima of the objective function. Although this result cannot be directly generalized to the objective functions presented here, it is promising for the convergence behaviour of the instantaneous and convolutive SBCA algorithms. A similar stationary point characterization is on our future research agenda, and we expect it to lead to the same conclusion as [21]. In addition, the empirical results we obtain from the numerical experiments support the conjecture that the algorithm converges to the vicinity of a desired separation point when an appropriate step size is selected. Whether the algorithm always converges to a stationary point remains an open question; however, recent research outcomes on non-convex global convergence analysis with appropriate initialization methods are encouraging [22].
2.4. Algorithm Extension to Complex Sources
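For the complex extension of the algorithm, we follow the isomorphism based approach used in [16]. Let us introduce the following terms, which are used in the complex derivation. We define the operator $\Upsilon: \mathbb{C}^p \to \mathbb{R}^{2p}$,

$$\Upsilon(a) = \begin{bmatrix} \mathrm{Re}(a^T) & \mathrm{Im}(a^T) \end{bmatrix}^T, \tag{15}$$

as an isomorphism between complex and real vectors. For a given complex vector $a$, we use the notation $\acute{a}$ to refer to $\Upsilon(a)$. In the same way, we define the real isomorphic vector for the convolutive signal vector $\tilde{a}_K(n)$ as $\acute{\tilde{a}}_K(n) = [\mathrm{Re}(a^T(n))\;\; \mathrm{Im}(a^T(n))\;\; \ldots\;\; \mathrm{Re}(a^T(n-K+1))\;\; \mathrm{Im}(a^T(n-K+1))]^T$. We also define the operator for the convolutive channel, i.e., $\Omega: \mathbb{C}^{q \times Kp} \to \mathbb{R}^{2q \times 2Kp}$:

$$\Omega(\tilde{X}) = \begin{bmatrix} \mathrm{Re}(X(0)) & -\mathrm{Im}(X(0)) & \cdots & -\mathrm{Im}(X(K-1)) \\ \mathrm{Im}(X(0)) & \mathrm{Re}(X(0)) & \cdots & \mathrm{Re}(X(K-1)) \end{bmatrix}. \tag{16}$$

Finally, the real isomorphic counterparts of the convolutive mixtures and the separator outputs are written as $\acute{y}(n) = \Omega(\tilde{A})\, \acute{\tilde{s}}_M(n)$ and $\acute{\tilde{z}}_N(n) = \Gamma_N(\Omega(\tilde{B}))\, \acute{\tilde{y}}_{N+K-1}(n)$, respectively.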
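The operators $\Upsilon$ and $\Omega$ can be illustrated with a short NumPy sketch. The function names `upsilon`, `omega`, `omega_conv` and `upsilon_conv` are hypothetical, and the random data are for illustration only. The example checks that applying $\Omega(\tilde{X})$ to the interleaved real vector reproduces the real isomorphic image of the complex convolution output.

```python
import numpy as np

def upsilon(a):
    """Real isomorphic vector [Re(a); Im(a)] of a complex vector, as in (15)."""
    return np.concatenate([a.real, a.imag])

def omega(X):
    """Real isomorphic matrix of one complex block: [[Re, -Im], [Im, Re]]."""
    return np.block([[X.real, -X.imag], [X.imag, X.real]])

def omega_conv(Xt):
    """Omega(X~) for X~ = [X(0), ..., X(K-1)]: per-tap blocks side by side, as in (16)."""
    return np.hstack([omega(X) for X in Xt])

def upsilon_conv(a_taps):
    """[Re a(n); Im a(n); ...; Re a(n-K+1); Im a(n-K+1)] for rows a(n), ..., a(n-K+1)."""
    return np.concatenate([upsilon(a) for a in a_taps])

rng = np.random.default_rng(3)
q, p, K = 3, 2, 4
Xt = rng.standard_normal((K, q, p)) + 1j * rng.standard_normal((K, q, p))
a_taps = rng.standard_normal((K, p)) + 1j * rng.standard_normal((K, p))

y = sum(Xt[k] @ a_taps[k] for k in range(K))   # complex output sum_k X(k) a(n-k)
y_real = omega_conv(Xt) @ upsilon_conv(a_taps) # same output through the real isomorphism
print(np.allclose(upsilon(y), y_real))
```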
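Using the definitions above, the objective function to be maximized in (9) can be modified for the complex case as

$$J_c(\tilde{B}) = \frac{\sqrt{\det(\hat{R}_{\acute{\tilde{z}}_N})}}{\left(\max_{n \in \{N, \ldots, L_1\}} \|\acute{\tilde{z}}_N(n)\|_1\right)^{2Np}}. \tag{17}$$

To derive an iterative algorithm for complex sources, the ratio form in (17) is transformed to a difference form by taking the logarithm:

$$\mathcal{J}_c(\tilde{B}) = \log(J_c(\tilde{B})) = \tfrac{1}{2}\log\det\!\left(\Gamma_N(\Omega(\tilde{B}))\, \hat{R}_{\acute{\tilde{y}}_{N+K-1}}\, \Gamma_N(\Omega(\tilde{B}))^T\right) - 2Np \log\!\left(\max_{n \in \{N, \ldots, L_1\}} \|\acute{\tilde{z}}_N(n)\|_1\right). \tag{18}$$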
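The corresponding iterative update equation for the separator matrix $B(k)$ (at time lag $k$) can be written as

$$\begin{aligned} B^{(t+1)}(k) = B^{(t)}(k) + \sigma^{(t)} \Bigg( & T_{1:p,\; 2kq+1:(2k+1)q} + T_{p+1:2p,\; (2k+1)q+1:2(k+1)q} \\ & + j\left(T_{p+1:2p,\; 2kq+1:(2k+1)q} - T_{1:p,\; (2k+1)q+1:2(k+1)q}\right) \\ & - 2Np\, \frac{\sum_{m=0}^{N-1} \check{o}_{mp+1:(m+1)p,\; (m+k)q+1:(m+k+1)q}}{\max_{n \in \{N, \ldots, L_1\}} \|\acute{\tilde{z}}_N(n)\|_1} \Bigg), \end{aligned} \tag{19}$$

where $\check{o} = \operatorname{sign}_c\{\tilde{z}_N(l^{(t)})\}\, \tilde{y}_{N+K-1}(l^{(t)})^H$,

$$T = \sum_{r=0}^{N-1} \Omega(X)^{(t)}_{2rp+1:2(r+1)p,\; 2rq+1:2(r+K)q},$$

and $\Omega(X) = \left(\Gamma_N(\Omega(\tilde{B}))\, \hat{R}_{\acute{\tilde{y}}_{N+K-1}}\, \Gamma_N(\Omega(\tilde{B}))^T\right)^{-1} \Gamma_N(\Omega(\tilde{B}))\, \hat{R}_{\acute{\tilde{y}}_{N+K-1}}$. $l^{(t)}$ represents the time index for which the maximum $\ell_1$ norm at the separator output is achieved.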
3. Frequency-Domain-Sparsity Based Convolutive SBCA Framework
The frequency-domain methods map the convolutive mixtures to instantaneous mixtures in the frequency domain so that instantaneous BSS algorithms can be applied to the spectral data in each frequency bin separately. The frequency-based convolutive SBCA proposed in this section also uses an instantaneous mixing model, as the other frequency-domain methods do. However, it does not have the well-known permutation and scaling ambiguities suffered by the frequency-domain methods, because it updates a single temporal separator matrix instead of a separate spectral separator matrix for each frequency bin.
3.1. Blind Source Separation Setup in Frequency Domain
- We assume that there are $p$ bounded sources. We obtain time-frequency representations of the sources by using the STFT

$$s_i(m, f) = \sum_{n} s_i(n)\, v(n - mT)\, e^{-j2\pi f n}, \quad i = 1, \ldots, p, \tag{20}$$

where $f \in [-1/2, 1/2]$ is the frequency index, $v(n)$ is a time-window function (Hamming, Kaiser, etc.) of length $R_1$, $R_1 \in \mathbb{Z}$ is the STFT frame length, $T \in \mathbb{Z}$ is the hop size, in samples, between successive DTFTs, $m \in L_v$ is the time index of the STFT, and $L_v$ is the index set of the STFT frames taken over the signal samples. Then, temporal data of size $p \times L$ are mapped to spectral data of size $p \times \dim\{L_v\}$ for each frequency bin $f$.
- Since the transformed source vectors are complex, we follow the isomorphism based approach described in Section 2.4. We define the $2p$-dimensional real isomorphic vectors as

$$\acute{s}(m, f) = \begin{bmatrix} \mathrm{Re}(s(m, f))^T & \mathrm{Im}(s(m, f))^T \end{bmatrix}^T. \tag{21}$$

For each frequency bin, the isomorphic source sample vectors are assumed to lie in the unit $\ell_1$-norm ball, i.e., $B_{\acute{s}} = \{\acute{s}(m, f) \in \mathbb{R}^{2p} \mid \|\acute{s}(m, f)\|_1 \le 1, \forall m\}$.
- The mixing process occurs in a convolutive MIMO channel which is assumed to be linear, time-invariant (LTI), and equalizable [19].
- Let $y(m, f)$ be the filtered version of the source STFTs, i.e., the product of the source STFTs and the DTFT of the convolutive mixing channel response:

$$y(m, f) = A(f)\, s(m, f), \tag{22}$$

where $A(f)$ is a matrix containing the frequency transform of the mixing filter elements at frequency bin $f$. Taking the inverse STFT of $y(m, f)$ and adding the frames up with the corresponding overlaps gives the same output as (2).
- Let $\tilde{y}(m, f) = [\tilde{y}_1(m, f), \tilde{y}_2(m, f), \ldots, \tilde{y}_q(m, f)]^T$ be the STFT of the mixtures, computed by $\tilde{y}_k(m, f) = \sum_{n} y_k(n)\, v(n - mT)\, e^{-j2\pi f n}$.
- There is a small difference between $y(m, f)$ in (22) and $\tilde{y}(m, f)$ due to boundary effects [23]. The number of samples affected by boundary effects is equal to the mixing filter order. When a suitable STFT window type with a sufficient length is used and the mixing channel response decays fast, we obtain a very good approximation of the mixing output, i.e.,

$$y(m, f) \approx \tilde{y}(m, f). \tag{23}$$

Under these circumstances, the frequency elements observed at the mixing channel outputs can be assumed to be instantaneous mixtures of the frequency elements of the sources according to (22).
- By using (21) and the real isomorphic mapping operator $\Omega$ defined in Section 2.4, (22) is transformed to $\acute{y}(m, f) = \Omega(A(f))\, \acute{s}(m, f)$.
- Let $z(m, f)$ be the separator output for $y(m, f)$ in (22), obtained by multiplying the DTFT of the separator filter with $y(m, f)$, i.e., $z(m, f) = B(f)\, y(m, f)$, where $B(f)$, $f \in [-1/2, 1/2]$, is the separator frequency response of dimension $p \times q$.
- The STFT of the separator output vector, $\tilde{z}(m, f) = [\tilde{z}_1(m, f), \tilde{z}_2(m, f), \ldots, \tilde{z}_p(m, f)]^T$, is computed by $\tilde{z}_k(m, f) = \sum_{n} z_k(n)\, v(n - mT)\, e^{-j2\pi f n}$. On the basis of (23), $z(m, f)$ and $\tilde{z}(m, f)$ are approximately equal. Therefore, the temporal separator matrix can be estimated using $\tilde{z}(m, f)$ instead of $z(m, f)$ in the STFT domain.
- The real isomorphic vector of the separator outputs is represented as $\acute{z}(m, f) = \Omega(B(f))\, \acute{y}(m, f)$. Writing the total frequency response as $F(f) = B(f) A(f)$, we can rewrite $\acute{z}(m, f)$ in terms of the sources as $\acute{z}(m, f) = \Omega(F(f))\, \acute{s}(m, f)$. For perfect separation in the frequency domain, the frequency response of the total system must satisfy the equality $\Omega(F(f)) = P D(f)$, where $P$ is a permutation matrix and $D(f)$ is a full-rank diagonal matrix.

Defining the isomorphic source sample set $\acute{S}(f) = \{\acute{s}(m, f) \in \mathbb{R}^{2p}, m \in L_v\}$, the following assumption is introduced for the frequency-domain BSS setup defined above:

Assumption (A2): For each frequency bin, $\acute{S}(f)$ contains the vertices (the corners of a volume) of its bounding $\ell_1$-norm ball $B_{\acute{s}}$.
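The approximation in (23) can be examined numerically. The sketch below is an illustrative NumPy example with assumed parameters (a Hann window of 256 samples, a hop of 64 samples, and a random 3-tap channel); the helper `stft` is a hypothetical frame-wise FFT that differs from (20) only by a phase factor common to sources and mixtures. It compares the STFT of the convolutive mixture with the bin-wise product $A(f)\, s(m, f)$ from (22).

```python
import numpy as np

def stft(x, win, hop):
    """Windowed FFT of each frame. x: (channels, L) -> (channels, frames, bins)."""
    R = len(win)
    starts = range(0, x.shape[1] - R + 1, hop)
    frames = np.stack([x[:, s:s + R] * win for s in starts], axis=1)
    return np.fft.fft(frames, axis=-1)

rng = np.random.default_rng(4)
p, q, M, L, R, hop = 2, 3, 3, 4096, 256, 64
s = rng.laplace(size=(p, L)) * (rng.random((p, L)) < 0.3)   # sparse toy sources
A = rng.standard_normal((M, q, p))                          # short hypothetical channel

# Time-domain convolutive mixture, equation (2), truncated to L samples.
y = np.zeros((q, L))
for m in range(M):
    y[:, m:] += A[m] @ s[:, :L - m]

win = np.hanning(R)
S_tf = stft(s, win, hop)                       # source STFTs s(m, f)
Y_tf = stft(y, win, hop)                       # STFT of the mixtures, y~(m, f)
A_f = np.fft.fft(A, n=R, axis=0)               # channel frequency response per bin
Y_pred = np.einsum('lqp,pml->qml', A_f, S_tf)  # A(f) s(m, f), equation (22)

rel_err = np.linalg.norm(Y_tf - Y_pred) / np.linalg.norm(Y_tf)
print(rel_err)
```

Because the channel is much shorter than the window, the relative mismatch is expected to be small; lengthening the channel relative to the window should make the approximation visibly worse.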
3.2. Objective Function
To obtain a frequency-based objective function, we extend the volume definition of the principal hyper-ellipsoid in [17] to the frequency domain. In the frequency domain, the Power Spectral Density Matrix (PSDM) of the source processes can be defined as

$$P_s(f) = \lim_{L \to \infty} E\left\{ s(f)\, s(f)^H \right\}, \tag{24}$$

where $s(f)$ is the source DTFT and $L$ is the sample size of $s(n)$. Using the relation between the source DTFTs and STFTs, (24) can be rewritten as

$$P_s(f) = \lim_{\dim\{L_v\} \to \infty} \sum_{m \in L_v} P_s(m, f), \tag{25}$$

where $P_s(m, f)$ is the instant PSDM of $s(m, f)$ [23]. The separator output PSDM can be related to the source PSDM by $P_z(f) = F(f)\, P_s(f)\, F(f)^H$, where $P_z(f) \in \mathbb{C}^{p \times p}$ and $F(f)$ is the frequency response of the total system. For a given frequency bin, this is equal to the covariance matrix of the spectrum of the separator outputs. Thus, the determinant of the temporal covariance matrix of the separator outputs, used for instantaneous mixtures in [17], can be replaced by the determinant of the separator output PSDM in the frequency domain for convolutive mixtures. In this way, the volume definition of the principal hyper-ellipsoid in [17] is extended to convolutive mixtures in the frequency domain.
Consequently, we propose the objective function

$$J(\tilde{B}) = \int_{-1/2}^{1/2} \tfrac{1}{2}\log\left(\det\left(\Omega(P_z(f))\right)\right) df - \int_{-1/2}^{1/2} 2p \log\left(\max_{m \in L_v} \|\acute{z}(m, f)\|_1\right) df, \tag{26}$$

where $\Omega(P_z(f))$ is the corresponding real isomorphic matrix. The integration over frequency in (26) makes it possible to define the objective function in terms of the time-domain separator matrix $\tilde{B}$. Without the integration, the objective function would have to be maximized for each frequency bin separately instead of being maximized with respect to $\tilde{B}$.

The following theorem ensures that maximization of the objective function in (26) achieves blind source separation of temporally convolutive mixtures.

Theorem-2: Assume that an FIR separator matrix can equalize the mixing channel $A(f)$. Given the BCA setup in Section 3.1, all global maxima of (26) give perfect separation if assumption (A2) holds.

The proof is provided in Appendix B.
3.3. Iterative Algorithm
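For the finite set of observations $\{y(1), \ldots, y(L)\}$, we modify the objective as

$$J(\tilde{B}) = \frac{1}{\beta} \sum_{l=-R_1+1}^{R_1-1} \Bigg[ \underbrace{\tfrac{1}{2}\log\det\left(\Omega(\hat{P}_z(l))\right)}_{J_1(\tilde{B})} - \underbrace{2p \log\left(\max_{m \in L_v} \|\acute{\tilde{z}}(m, l)\|_1\right)}_{J_2(\tilde{B})} \Bigg], \tag{27}$$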
where $\beta = 2R_1 - 1$ is the DFT size and $\hat{P}_z(l)$ is the PSDM estimate of the separator output. Note that $\acute{z}(m, l)$ is replaced with $\acute{\tilde{z}}(m, l)$ on the basis of (23). $\hat{P}_z(l)$ is defined in terms of the PSDM estimate of the mixture vector as $\hat{P}_z(l) = B(l)\, \hat{P}_y(l)\, B(l)^H$. $\hat{P}_y(l)$ is the PSDM estimate of the mixtures and is given by

$$\hat{P}_y(l) = \sum_{m \in L_v} \hat{P}_y(m, l), \tag{28}$$

where $\hat{P}_y(m, l)$ is the instant PSDM estimate of the mixture signals. To calculate the estimated PSDM $\hat{P}_y(m, l)$, we use the spectrogram method [24], expressed as

$$\hat{P}_y(m, l) = \sum_{t} u(t)\, \tilde{y}_t(m, l)\, \tilde{y}_t(m, l)^H, \tag{29}$$

where $u(t)$ is a time window of length $R_2$ and $\tilde{y}_t(m, l) = \sum_{n} y(n + t)\, v(n - mT)\, e^{-j2\pi (n+t) l / \beta}$. The window $u(t)$ provides a weighted averaging of $R_2$ spectra in order to reduce the variance of the estimate.
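To make the structure of (27) concrete, the following NumPy sketch evaluates a simplified version of it. It is a rough illustration under stated assumptions rather than the procedure above: the function names are hypothetical, the bins are those of a plain FFT of the frame length instead of the $\beta = 2R_1 - 1$ grid, and the PSDM is estimated by summing outer products of the STFT frames without the additional averaging window $u(t)$ of (29).

```python
import numpy as np

def omega(X):
    """Real isomorphic matrix [[Re, -Im], [Im, Re]] of a complex matrix."""
    return np.block([[X.real, -X.imag], [X.imag, X.real]])

def freq_log_objective(Bt, Y_tf):
    """Simplified (27). Bt: real taps (K, p, q); Y_tf: mixture STFT (q, frames, bins)."""
    K, p, q = Bt.shape
    n_bins = Y_tf.shape[2]
    B_f = np.fft.fft(Bt, n=n_bins, axis=0)           # B(l) = sum_k B(k) e^{-j 2 pi k l / n_bins}
    total = 0.0
    for l in range(n_bins):
        Y_l = Y_tf[:, :, l]                          # q x frames, one frequency bin
        Z_l = B_f[l] @ Y_l                           # separator output spectra in this bin
        P_y = Y_l @ Y_l.conj().T                     # simplified PSDM estimate of the mixtures
        P_z = B_f[l] @ P_y @ B_f[l].conj().T         # PSDM of the separator outputs
        logdet = np.linalg.slogdet(omega(P_z))[1]
        # l1 norm of the real isomorphic vector: sum of |Re| and |Im| parts.
        max_l1 = (np.abs(Z_l.real) + np.abs(Z_l.imag)).sum(axis=0).max()
        total += 0.5 * logdet - 2 * p * np.log(max_l1)
    return total / n_bins

rng = np.random.default_rng(6)
K, p, q, frames, bins = 2, 2, 3, 40, 16
Y_tf = rng.standard_normal((q, frames, bins)) + 1j * rng.standard_normal((q, frames, bins))
Bt = rng.standard_normal((K, p, q))                  # random time-domain separator taps
J = freq_log_objective(Bt, Y_tf)
print(J)
```

Note that the objective depends on the time-domain taps only, through their DFT, which is the property that lets a single separator be optimized across all bins.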
In order to prevent scaling and permutation ambiguities, we derive a time-domain separation approach based on the frequency-domain objective function in (27), instead of deriving separation matrices for each frequency bin separately.
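The derivative of the first part of $J(\tilde{B})$ with respect to $B(k)$ for $k = 0, \ldots, K-1$ is

$$\frac{1}{\beta} \sum_{l=-R_1+1}^{R_1-1} \frac{\partial J_1(\tilde{B})}{\partial B(k)} = \nabla B_{\mathrm{logdet}} = \mathrm{Re}\{X_{11} + X_{22} + j(X_{21} - X_{12})\}. \tag{30}$$

The components $X_{ij} \in \mathbb{R}^{p \times q}$ are defined as follows:

$$\frac{1}{\beta} \sum_{l=-R_1+1}^{R_1-1} \Omega(\hat{P}_z(l))^{-1}\, \Omega(B(l))\, \Omega(\hat{P}_y(l))\, O^T = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix}, \tag{31}$$

where $O = \begin{bmatrix} \cos(2\pi kl/\beta) & -\sin(2\pi kl/\beta) \\ \sin(2\pi kl/\beta) & \cos(2\pi kl/\beta) \end{bmatrix}$. The spectral transform $B(l)$ in (31) is given explicitly as $B(l) = \sum_{k=-R_1+1}^{R_1-1} B(k)\, e^{-j2\pi kl/\beta}$.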
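The derivative of the second part is

$$\sum_{l=-R_1+1}^{R_1-1} \frac{1}{\beta} \frac{\partial J_2(\tilde{B})}{\partial B(k)} = \nabla B_{\mathrm{subg}} = \frac{2p}{\beta} \sum_{l=-R_1+1}^{R_1-1} \frac{\mathrm{Re}\left\{ \operatorname{sign}_c\{\tilde{z}(u_l, l)\}\, \tilde{y}(u_l, l)^H\, e^{j2\pi kl/\beta} \right\}}{\max_{m \in L_v} \|\acute{\tilde{z}}(m, l)\|_1}, \tag{32}$$

where $u_l \in L_v$ denotes the index of the STFT frame for which the maximum $\ell_1$ norm at the separator output in frequency bin $l$ is achieved. $\operatorname{sign}_c$ denotes the sign operator for a complex signal, computed as $\operatorname{sign}\{\mathrm{Re}\{a\}\} + j \operatorname{sign}\{\mathrm{Im}\{a\}\}$.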
Finally, the update equation for the separator matrix in the time domain is given as

$$B^{(t+1)}(k) = B^{(t)}(k) + \sigma^{(t)} \left( \nabla B^{(t)}_{\mathrm{logdet}} - \nabla B^{(t)}_{\mathrm{subg}} \right). \tag{33}$$
4. Numerical Examples
In the experiments, we use the following benchmark algorithms: the Kurtosis Maximization algorithm (MaxKurtosis), which optimizes a kurtosis based contrast function [25]; the convolutive BSS method [1], which exploits the non-stationarity of the source signals, labelled as “convBSS”; the Alternating-Least-Squares (ALS) based method [26], which also exploits the non-stationarity, labelled as “ALS”; the Spatio-Temporal algorithm (STFICA) [2], which is an extension of the FastICA method; the time-based convolutive Bounded Component Analysis (CBCA) [27]; and Cho’s underdetermined convolutive BSS [28]. We illustrate CBCA’s performance only in the first and second experiments, to demonstrate CSBCA’s advantage for sparse signals. In addition, we include Cho’s method only in the third experiment, because it is an under-determined BSS method and its publicly available code is optimized for audio source separation. We use the MATLAB toolbox called BSS Evaluation proposed in [29], which is designed specifically to measure the performance of algorithms in the blind audio signal separation problem.
4.1. Synthetic Sparse Signals
Figure 2: (a) Synthetic signals (RIKEN), signal magnitude vs. time index. (b) Output Signal to Distortion Ratio (dB) vs. input Signal to Noise Ratio (dB) for MaxKurtosis, STFICA, CBCA, ALS, convBSS and CSBCA.
In the first numerical example, we illustrate the performance of the time-based CSBCA. For the performance evaluation, we use synthetic signals from the RIKEN Brain Science Institute benchmark dataset [30]. In the first part of the test, we consider a scenario with 1000 samples from 6 sources and 12 mixtures. The orders of the i.i.d. Gaussian convolutive channel and the separator filter are 3 and 4, respectively. In this part, we analyse the Signal to Distortion Ratio (SDR) performance versus the input Signal to Noise Ratio (SNR). In the second part, we consider a scenario with 1000 samples from 5 sources and analyse the performance with respect to the number of mixing channels and the mixing order for an input SNR of 20 dB. Note that the noise is additive white Gaussian noise (AWGN) in all analyses. The source signals are illustrated in Fig. 2-(a). The result of the input SNR analysis is depicted in Fig. 2-(b): the time-based CSBCA yields better or equal performance at almost all SNRs.

In the second part, the results are more complex to interpret. First, the CBCA algorithm cannot satisfactorily extract the source signals even in the high SNR case, as seen in Fig. 2-(b). This is why changing the number of mixtures and the mixing channel order does not affect its performance much, as shown in Fig. 3. The convBSS algorithm takes advantage of the non-stationarity of the source signals and performs a joint diagonalization of the cross power spectral density matrices of the signals by using a gradient descent algorithm. The ALS algorithm also exploits the non-stationarity, but differs from convBSS in two essential respects. Firstly, it uses a procedure to solve the scaling problem inherent in frequency-domain approaches. Secondly, it uses a more efficient and more quickly converging optimization method called Alternating-Least-Squares. In particular, the absence of a method for solving the scaling problem may be the reason why convBSS cannot separate the sources satisfactorily even when the number of mixtures increases. As a result, these two differences give ALS an advantage over convBSS.
Figure 3: (a) Output SDR (dB) vs. order of the mixing system. (b) Output SDR (dB) vs. number of mixture channels. Both panels compare MaxKurtosis, STFICA, CBCA, ALS, convBSS, and CSBCA.
To evaluate the performance of the complex extension of CSBCA, we again use the synthetic sparse signals. In the complex CSBCA experiment, 3 complex sources are generated from 6 synthetic sources. Fig. 4 illustrates the output SDR of complex CSBCA as a function of the input SNR. There is no comparison with the benchmark algorithms listed above because the published codes for these algorithms work only for real signals.

Figure 4: Complex CSBCA: Output SDR (dB) vs. input SNR (dB).
4.2. Statistically Dependent Sources
Figure 5: (a) Selective Copula-T distribution random sparse sequences (amplitude vs. time index). (b) Output SDR (dB) vs. correlation coefficient for MaxKurtosis, STFICA, CBCA, ALS, convBSS, and CSBCA.
In the second example, we generate sparse and statistically dependent sources by producing a Copula-T distributed random vector $u \in [-1,1]^p$ and transforming this random vector for sparsification through the mapping

\[
s = \begin{cases} u, & u \in L_r \\ 0, & \text{otherwise} \end{cases} \tag{34}
\]

where $L_r = \{x : \|x\|_r \le 1\}$ with $r$ between 0 and 1. We consider a scenario with 2000 samples from 3 sources and 6 mixtures. The performance of the time domain CSBCA for different degrees of correlation is examined and illustrated in Fig. 5. CSBCA outperforms the other algorithms for all correlation values. ALS appears to be the most affected by the source dependency. ALS and convBSS employ joint diagonalization of cross power spectral density matrices as the first stage in blind source separation. The source dependency can degrade this stage, which may explain why ALS performs worse as the correlation increases. The performance degradation of MaxKurtosis and STFICA with increasing correlation is expected because these methods assume statistical mutual independence of the sources.
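The source generation step and the mapping (34) can be sketched as below. This is a minimal illustration under assumptions, not the authors' generator: the t-copula is built from a multivariate Student-t vector with an equicorrelation matrix, and the degrees of freedom (`dof=4`), the correlation value (`rho=0.4`), the exponent (`r=0.5`), and the function names `copula_t_samples` and `sparsify` are all chosen here for illustration only.

```python
import numpy as np
from scipy import stats

def copula_t_samples(p, n, rho, dof, rng):
    """Draw n vectors in [-1, 1]^p whose dependence follows a t-copula."""
    corr = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)   # unit diagonal
    gauss = rng.multivariate_normal(np.zeros(p), corr, size=n).T   # p x n
    chi2 = rng.chisquare(dof, size=n) / dof
    t_samples = gauss / np.sqrt(chi2)             # multivariate Student-t
    uniform01 = stats.t.cdf(t_samples, df=dof)    # marginals become U(0, 1)
    return 2.0 * uniform01 - 1.0                  # rescale to [-1, 1]

def sparsify(u, r):
    """Mapping (34): keep a sample only if it lies in L_r = {x : ||x||_r <= 1}."""
    norm_r = np.sum(np.abs(u) ** r, axis=0) ** (1.0 / r)
    return np.where(norm_r <= 1.0, u, 0.0)

rng = np.random.default_rng(1)
u = copula_t_samples(p=3, n=2000, rho=0.4, dof=4, rng=rng)
s = sparsify(u, r=0.5)
kept = np.any(s != 0, axis=0)   # samples that survived the mapping
```

Samples outside $L_r$ are replaced by the zero vector, so the share of non-zero samples, and hence the sparsity level, depends on the chosen `r` and on the dependence structure.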
4.3. Speech Signal Separation
In the third example, we analyse the performance of the frequency based CSBCA for speech signal separation by using the measured channel impulse responses in the MARDY (Multichannel Acoustic Reverberation Database at York) database [31] and three speech records (2 males and 1 female) in which different persons read different texts [32]. The MARDY database is an open resource that gives dereverberation researchers access to real-life multichannel impulse responses. The recording setup of MARDY is described in detail in [33]. In this recording work, the reverberant impulse responses of the microphones were measured for three loudspeakers at different locations. In this experiment, the speech signals are assumed to be transmitted by these loudspeakers and received by three microphones. The recorded speech signals and the convolutive mixture signals are illustrated in Fig. 6. The selected input parameters for the algorithms are as follows. FCSBCA: Separation Filter Order = 80, STFT Window Length = 150, Overlap Ratio of STFT Windows = 0.8; ALS: Epoch Length = 1000, FFT Length = 512, Overlap Ratio of FFT Windows = 0.7; STFICA: Separator Filter Order = 100; convBSS: FFT Length = 512, Separator Filter Order = 100, Number of matrices to diagonalize = 5; Cho: FFT Length = 4096, Shift Length = 512; MaxKurtosis: Separator Filter Order = 100. Note that the term "Epoch" used in the ALS method means an FFT frame. The performances of the algorithms for different mixing channel orders and input SNRs are examined, as shown in Fig. 7 and Fig. 8. In the channel order analysis, microphones with noiseless channel responses (SNR = Inf) are used. In the input SNR analysis, microphones with channel order = 500 are used. FCSBCA yields better performance than all the other methods with respect to SDR values. These results indicate the practical merit of the proposed algorithm.

Figure 6: Speech signals and their mixtures (amplitude vs. time index).

Figure 7: Separation of 3 speech signals in a reverberant environment. (a) Output SDR (dB) vs. mixing channel order. (b) Output SIR (dB) vs. mixing channel order. Both panels compare STFICA, MaxKurtosis, ALS, convBSS, Cho, and FCSBCA.
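To make the FCSBCA framing parameters quoted above concrete, the following sketch turns a window length of 150 and an overlap ratio of 0.8 into STFT settings using `scipy.signal.stft`. It is an illustration only: the Hann window and the random placeholder signal are assumptions (the text does not name the window), and the STFT implementation used in the experiments may differ.

```python
import numpy as np
from scipy import signal

window_length = 150    # STFT window length quoted for FCSBCA
overlap_ratio = 0.8    # overlap ratio of STFT windows quoted for FCSBCA
noverlap = int(round(window_length * overlap_ratio))   # 120 shared samples
hop = window_length - noverlap                          # 30-sample frame shift

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)   # placeholder signal (assumption)

# The Hann window is an assumption; the paper does not state the window type.
freqs, frame_times, X = signal.stft(x, window="hann", nperseg=window_length,
                                    noverlap=noverlap)
_, x_rec = signal.istft(X, window="hann", nperseg=window_length,
                        noverlap=noverlap)
# Inverting the STFT should return the input up to rounding error.
reconstruction_error = np.max(np.abs(x_rec[:x.size] - x))
```

With these settings each frame advances by 30 samples, and a real-valued input yields `window_length // 2 + 1` frequency bins per frame.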
5. Conclusion
In this article, we propose time and frequency based deterministic BSS approaches for the convolutive mixtures of bounded sparse sources. This framework is a natural extension of the instantaneous SBCA framework in [17]. The proposed approaches do not assume statistical independence in the space, time or frequency domain. Moreover, the frequency based CSBCA has the advantages associated with frequency domain approaches but does not suffer from their well-known permutation and scaling ambiguities, because it updates a temporal separator matrix instead of a separate separator matrix for each frequency bin. We demonstrate the performance improvement over some well-known convolutive BSS methods, and study a practical application to blind speech separation.

Figure 8: Separation of 3 speech signals in a reverberant environment. (a) Output SDR (dB) vs. input SNR (dB). (b) Output SIR (dB) vs. input SNR (dB). Both panels compare STFICA, MaxKurtosis, ALS, convBSS, Cho, and FCSBCA.
Appendix A
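As the first step of the proof of Theorem-1, we rewrite the objective function in terms of the arguments $\tilde{F}$ and $\Gamma_N(\tilde{F})$. The sample covariance matrix can be written as $\hat{R}_{\tilde{z}_N} = \Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T$, where $L_3 = N + P - 1$. Using this in (9), the objective function becomes

\[
J(\tilde{F}) = \frac{\sqrt{\det\left(\Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T\right)}}{\left(\max_{n \in \{N,\ldots,L-1\}} \left\|\Gamma_N(\tilde{F})\,\tilde{s}_{L_3}(n)\right\|_1\right)^{Np}}. \tag{A.1}
\]

Under the assumption (A1), the denominator of (A.1) satisfies

\[
\left(\max_{n \in \{N,\ldots,L-1\}} \left\|\Gamma_N(\tilde{F})\,\tilde{s}_{L_3}(n)\right\|_1\right)^{Np} \le \left\|\Gamma_N(\tilde{F})\right\|_{1,1}^{Np}, \tag{A.2}
\]

where $\|\Gamma_N(\tilde{F})\|_{1,1} = \left\| \left[ \|\Gamma_N(\tilde{F})_{:,1}\|_1, \ldots, \|\Gamma_N(\tilde{F})_{:,L_3 p}\|_1 \right] \right\|_\infty$. (A.2) must hold with equality under the assumption (A1), so that (A.1) becomes

\[
J(\tilde{F}) = \frac{\sqrt{\det\left(\Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T\right)}}{\left\| \left[ \|\Gamma_N(\tilde{F})_{:,1}\|_1 \;\cdots\; \|\Gamma_N(\tilde{F})_{:,L_3 p}\|_1 \right] \right\|_\infty^{Np}}. \tag{A.3}
\]

Since we choose $N \ge P$, there is at least one block column inside $\Gamma_N(\tilde{F})$ that contains $\tilde{F}$ in upside down form. Therefore, we have the equality

\[
\left\| \left[ \|\Gamma_N(\tilde{F})_{:,1}\|_1 \;\cdots\; \|\Gamma_N(\tilde{F})_{:,L_3 p}\|_1 \right] \right\|_\infty = \left\| \left[ \|\Gamma_N(\tilde{F})_{1,:}\|_1 \;\cdots\; \|\Gamma_N(\tilde{F})_{Np,:}\|_1 \right] \right\|_\infty .
\]

This equality enables the objective to be bounded as

\begin{align}
J(\tilde{F}) &\le \frac{\sqrt{\det\left(\Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T\right)}}{\left( \left\| \left[ \|\Gamma_N(\tilde{F})_{1,:}\|_1 \;\cdots\; \|\Gamma_N(\tilde{F})_{Np,:}\|_1 \right] \right\|_1 / (Np) \right)^{Np}} \tag{A.4} \\
&\le \frac{\sqrt{\det\left(\Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T\right)}}{\|\Gamma_N(\tilde{F})_{1,:}\|_1 \, \|\Gamma_N(\tilde{F})_{2,:}\|_1 \cdots \|\Gamma_N(\tilde{F})_{Np,:}\|_1} \tag{A.5} \\
&\le \frac{\sqrt{\det\left(\Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T\right)}}{\|\Gamma_N(\tilde{F})_{1,:}\|_2 \, \|\Gamma_N(\tilde{F})_{2,:}\|_2 \cdots \|\Gamma_N(\tilde{F})_{Np,:}\|_2}. \tag{A.6}
\end{align}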
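The ordering of the denominators in (A.4) to (A.6) rests on three standard facts: the largest entry of a vector is at least its mean, the arithmetic mean is at least the geometric mean, and the $\ell_1$ norm of a row is at least its $\ell_2$ norm. The short numerical check below illustrates this chain and its equality case on small toy matrices; the matrix `G` is an arbitrary stand-in for $\Gamma_N(\tilde{F})$ and the function name `denominators` is hypothetical.

```python
import numpy as np

def denominators(G):
    """Return the four row-norm quantities compared in (A.4)-(A.6)."""
    n_rows = G.shape[0]
    row_l1 = np.sum(np.abs(G), axis=1)     # l1 norm of each row
    row_l2 = np.linalg.norm(G, axis=1)     # l2 norm of each row
    d_inf = np.max(row_l1) ** n_rows                # infinity-norm form
    d_mean = (np.sum(row_l1) / n_rows) ** n_rows    # denominator of (A.4)
    d_l1 = np.prod(row_l1)                          # denominator of (A.5)
    d_l2 = np.prod(row_l2)                          # denominator of (A.6)
    return d_inf, d_mean, d_l1, d_l2

rng = np.random.default_rng(0)
G = rng.normal(size=(4, 7))    # generic matrix: the chain is strict
generic = denominators(G)

# Equality case: one non-zero entry per row, all rows with the same l1 norm.
G_eq = 2.0 * np.eye(4, 7)
equal = denominators(G_eq)
```

For the generic matrix the four values decrease from left to right, so each step enlarges the bound on the objective. For `G_eq` all four values coincide, which matches the equality conditions discussed later in this appendix.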
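To simplify the upper bound further, we rewrite the numerator of the objective by completing $\Gamma_N(\tilde{F})$ to a full rank square matrix. The determinant of the resulting matrix can then be expressed through the Schur complement. Let us define a $(P-1)p \times L_3 p$ matrix $X = DP$, where $D = \mathrm{diag}(a_1; a_2; \ldots; a_{(P-1)p})$ is a full rank diagonal matrix and $P$ is a permutation matrix. Let us assume that we select an appropriate matrix $X$ that satisfies the equality $\det(XWX^T) = 1$, where

\[
W = \hat{R}_{\tilde{s}_{L_3}} - \hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T \left(\Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T\right)^{-1} \Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}} .
\]

Using the matrix $X$, we can express the determinant in a different way:

\begin{align}
\det\left( \begin{bmatrix} \Gamma_N(\tilde{F}) \\ X \end{bmatrix} \hat{R}_{\tilde{s}_{L_3}} \begin{bmatrix} \Gamma_N(\tilde{F})^T & X^T \end{bmatrix} \right)
&= \det\left(\Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T\right) \det\left( X \left( \hat{R}_{\tilde{s}_{L_3}} - \hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T \left(\Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T\right)^{-1} \Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}} \right) X^T \right) \nonumber \\
&= \det\left(\Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T\right) \det\left(XWX^T\right) \nonumber \\
&= \det\left(\Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T\right). \tag{A.7}
\end{align}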
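The determinant factorization behind (A.7) can be checked numerically on toy matrices. The sketch below is only an illustration under assumptions: `Gamma`, `X` and `R` are random stand-ins with small, arbitrary sizes, and `X` is a generic matrix rather than the scaled permutation $X = DP$ used in the proof. The last lines show one way to rescale `X` so that $\det(XWX^T) = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_top, n_extra = 4, 2     # toy row counts for Gamma and X
n = n_top + n_extra       # the stacked matrix [Gamma; X] is n x n

Gamma = rng.normal(size=(n_top, n))
X = rng.normal(size=(n_extra, n))
A = rng.normal(size=(n, n))
R = A @ A.T + n * np.eye(n)    # symmetric positive definite stand-in covariance

S = Gamma @ R @ Gamma.T
W = R - R @ Gamma.T @ np.linalg.solve(S, Gamma @ R)

stacked = np.vstack([Gamma, X])
lhs = np.linalg.det(stacked @ R @ stacked.T)
rhs = np.linalg.det(S) * np.linalg.det(X @ W @ X.T)

# Rescale X so that det(X W X^T) = 1, as assumed in the proof.
scale = np.linalg.det(X @ W @ X.T) ** (1.0 / (2 * n_extra))
X_unit = X / scale
unit_det = np.linalg.det(X_unit @ W @ X_unit.T)
```

Here `lhs` and `rhs` agree up to rounding error, and `unit_det` equals one, so that the determinant of the stacked form reduces to $\det(\Gamma \hat{R} \Gamma^T)$ as in the last line of (A.7).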
$XWX^T$ is the Schur complement of $\Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T$. If we select an appropriate matrix $X$ that satisfies this condition and apply Hadamard's inequality to the rows of the stacked matrix, we obtain the following inequality:

\[
\det\left(\Gamma_N(\tilde{F})\,\hat{R}_{\tilde{s}_{L_3}}\,\Gamma_N(\tilde{F})^T\right) \le \prod_{m=1}^{Np} \left\|\Gamma_N(\tilde{F})_{m,:}\right\|_2^2 \; \prod_{n=1}^{(P-1)p} \left\|X_{n,:}\right\|_2^2 \; \det\left(\hat{R}_{\tilde{s}_{L_3}}\right). \tag{A.8}
\]

Inserting the inequality (A.8) into (A.6), a new upper bound is obtained as

\[
J(\tilde{F}) \le \prod_{n=1}^{(P-1)p} \left\|X_{n,:}\right\|_2 \; \det\left(\hat{R}_{\tilde{s}_{L_3}}\right)^{1/2}. \tag{A.9}
\]

Since $\Gamma_N(\tilde{F})$ is a block-Toeplitz matrix, each row of it corresponds to a row of $\tilde{F}$. Moreover, since we choose $N \ge P$, it is guaranteed that there is a block column of $\Gamma_N(\tilde{F})$ that contains $[F(0), F(1), \ldots, F(P-1)]$. Thus, we consider $\tilde{F}$ instead of $\Gamma_N(\tilde{F})$ when discussing the inequalities above.

The inequalities in (A.4) and (A.5) hold with equality only if all the rows of $\tilde{F}$ have the same $\ell_1$ norm. The inequality between the $\ell_1$ and $\ell_2$ norms in (A.6) holds with equality only if there is only one non-zero entry in each row of $\tilde{F}$. Hadamard's inequality in (A.8) turns into an equality only if the non-zero entries in the rows of $\tilde{F}$ are in different positions with respect to mod $p$, so that the rows of $\Gamma_N(\tilde{F})$ are orthogonal to each other and to the rows of $X$. To satisfy all these restrictions and achieve the upper bound in (A.9), we conclude that the overall system transfer matrix is $F(z) = \mathrm{diag}\left(\alpha_1 z^{-d_1}, \alpha_2 z^{-d_2}, \ldots, \alpha_p z^{-d_p}\right) P$, where $P$ is a permutation matrix.
Appendix B
As the first step of the proof of Theorem-2, the PSD term in (26) is written explicitly as $\Omega(P_z(f)) = \Omega(F(f))\,\Omega(P_s(f))\,\Omega(F(f))^T$. Then, the first term in (26) becomes

\[
J_1(\tilde{B}) = \int_{f=-1/2}^{1/2} \left( \log\left|\det(\Omega(F(f)))\right|^2 + \log\left(\det(\Omega(P_s(f)))\right) \right) df. \tag{B.1}
\]

By Hadamard's inequality, the determinant term for $\Omega(F(f))$ can be bounded column-wise:

\[
\log\left(\left|\det(\Omega(F(f)))\right|^2\right) \le \log\left( \prod_{k=1}^{2p} \left\|\Omega(F(f))_{:,k}\right\|_2^2 \right) = \sum_{k=1}^{2p} \log\left( \left\|\Omega(F(f))_{:,k}\right\|_2^2 \right). \tag{B.2}
\]
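The column-wise Hadamard bound in (B.2), together with its equality case of mutually orthogonal columns, can be illustrated numerically. The sketch below is only an illustration: the matrices are random stand-ins for $\Omega(F(f))$ at a single frequency with an assumed size $2p = 6$, and the function name `hadamard_gap` is hypothetical.

```python
import numpy as np

def hadamard_gap(M):
    """Right-hand side of (B.2) minus its left-hand side; never negative."""
    _, logabsdet = np.linalg.slogdet(M)
    column_term = np.sum(np.log(np.sum(M ** 2, axis=0)))   # sum_k log ||M[:, k]||_2^2
    return column_term - 2.0 * logabsdet                    # minus log |det M|^2

rng = np.random.default_rng(0)
M_generic = rng.normal(size=(6, 6))            # generic columns: strict inequality
Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))   # orthonormal columns
M_orthogonal = Q * rng.uniform(0.5, 2.0, 6)    # orthogonal columns, unequal norms

gap_generic = hadamard_gap(M_generic)
gap_orthogonal = hadamard_gap(M_orthogonal)
```

The gap is strictly positive for the generic matrix and vanishes, up to rounding error, when the columns are orthogonal, which is the condition used at the end of this appendix.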
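Consequently, substituting the inequality in (B.2) into (26), we reach the following expression for the objective function:

\[
J(F(f)) \le \int_{f=-1/2}^{1/2} \left( \sum_{k=1}^{2p} \log\left( \left\|\Omega(F(f))_{:,k}\right\|_2 \right) + \frac{1}{2} \log\left(\det(\Omega(P_s(f)))\right) - 2p \log\left( \max_{m \in L_v} \left\|\acute{z}(m,f)\right\|_1 \right) \right) df. \tag{B.3}
\]

The volume term for the $\ell_1$-norm-ball in (B.3) can be explicitly written as $\left(\max_{m \in L_v} \|\acute{z}(m,f)\|_1\right)^{2p} = \left(\max_{m \in L_v} \|\Omega(F(f))\,\acute{s}(m,f)\|_1\right)^{2p}$. Recall that the samples of $\acute{s}(m,f)$ are inside the unit $\ell_1$-norm-ball. Then, for the maximum term, we can write the inequality $\left(\max_{m \in L_v} \|\Omega(F(f))\,\acute{s}(m,f)\|_1\right)^{2p} \le \|\Omega(F(f))\|_{1,1}^{2p}$. If the assumption (A2) holds, this inequality turns into an equality.

By using the relations between the $\ell_\infty$, $\ell_1$ and $\ell_2$ norms, we can put forward the following inequalities:

\begin{align}
\left\|\Omega(F(f))\right\|_{1,1}^{2p} &= \left\| \left[ \|\Omega(F(f))_{:,1}\|_1, \cdots, \|\Omega(F(f))_{:,2p}\|_1 \right] \right\|_\infty^{2p} \tag{B.4} \\
&\ge \left( \left\| \left[ \|\Omega(F(f))_{:,1}\|_1, \ldots, \|\Omega(F(f))_{:,2p}\|_1 \right] \right\|_1 / (2p) \right)^{2p} \tag{B.5} \\
&\ge \|\Omega(F(f))_{:,1}\|_1 \, \|\Omega(F(f))_{:,2}\|_1 \cdots \|\Omega(F(f))_{:,2p}\|_1 \tag{B.6} \\
&\ge \|\Omega(F(f))_{:,1}\|_2 \, \|\Omega(F(f))_{:,2}\|_2 \cdots \|\Omega(F(f))_{:,2p}\|_2 \tag{B.7}
\end{align}

If we substitute the inequality (B.7) into (B.3), the upper bound on the objective becomes

\[
J(F(f)) \le \frac{1}{2} \int_{f=-1/2}^{1/2} \log\left(\det(\Omega(P_s(f)))\right) df. \tag{B.8}
\]

The norm inequality in (B.5) and the arithmetic-geometric mean inequality in (B.6) hold with equality only if all the columns of $\Omega(F(f))$ have the same $\ell_1$ norm. The norm inequality in (B.7) holds with equality only if there is only one non-zero entry in each column of $\Omega(F(f))$. Hadamard's inequality in (B.2) holds with equality only if all the columns of $\Omega(F(f))$ are orthogonal to each other. As a result, to achieve the upper bound in (B.8), we conclude that the system frequency response is $\Omega(F(f)) = \alpha P$, where $P$ is a permutation matrix.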
References
[1] L. Parra, C. Spence, Convolutive blind source separation of non-stationary sources, IEEE Trans. on Speech and Audio Processing (Code available at http://bsp.teithe.gr/members/downloads.html) 8 (3) (May 2000) 320–327.
[2] S. C. Douglas, M. Gupta, H. Sawada, S. Makino, Spatio-temporal FastICA algorithms for the blind separation of convolutive mixtures, IEEE Trans. on Audio, Speech, and Language Processing 15 (5) (July 2007) 1511–1520.
[3] T. Oktem, A. T. Erdogan, A. Demir, Adaptive receiver structures for fiber communication systems employing polarization-division multiplexing, Journal of Lightwave Technology 28 (10) (15 May 2010) 1536–1546.
[4] C. Vayá, J. J. Rieta, C. Sánchez, D. Moratal, Convolutive blind source separation algorithms applied to the electrocardiogram of atrial fibrillation: Study of performance, IEEE Transactions on Biomedical Engineering 54 (8) (August 2007) 1530–1533.
[5] D. Nion, N. D. Sidiropoulos, Adaptive algorithms to track the PARAFAC decomposition of a third-order tensor, IEEE Transactions on Signal Processing 57 (6) (June 2009) 2299–2310.
[6] Z. Koldovský, P. Tichavský, Time-domain blind audio source separation using advanced ICA methods, in: 8th Annual Conference of the International Speech Communication Association, August 2007, pp. 846–849.
[7] S. Makino, H. Sawada, T.-W. Lee, Blind Speech Separation, Vol. 615, Springer, 2007.
[8] P. Smaragdis, Blind separation of convolved mixtures in the frequency domain, Neurocomputing 22 (1-3) (20 November 1998) 21–34.
[9] H. Attias, New EM algorithms for source separation and deconvolution, in: Acoustics, Speech, and Signal Processing, 2003 IEEE International Conference on, Vol. 5, IEEE, 6-10 April 2003, pp. 297–300.
[10] O. Yilmaz, S. Rickard, Blind separation of speech mixtures via time-frequency masking, IEEE Transactions on Signal Processing 52 (7) (July 2004) 1830–1847.
[11] P. Comon, Independent component analysis, a new concept?, Signal Processing 36 (3) (April 1994) 287–314.
[12] A. Hyvarinen, E. Oja, A fast fixed-point algorithm for independent component analysis, Neural Computation 9 (7) (10 July 1997) 1483–1492.
[13] Z. Koldovský, P. Tichavský, E. Oja, Efficient variant of algorithm FastICA for independent component analysis attaining the Cramér-Rao lower bound, IEEE Transactions on Neural Networks 17 (5) (October 2006) 1265–1277.
[14] Z. He, S. Xie, S. Ding, A. Cichocki, Convolutive blind source separation in the frequency domain based on sparse representation, IEEE Transactions on Audio, Speech, and Language Processing 15 (5) (July 2007) 1551–1563.
[15] S. Cruces, Bounded component analysis of linear mixtures: A criterion of minimum convex perimeter, IEEE Transactions on Signal Processing 58 (4) (15 January 2010) 2141–2154.
[16] A. T. Erdogan, A class of bounded component analysis algorithms for the separation of both independent and dependent sources, IEEE Transactions on Signal Processing 61 (22) (15 November 2013) 5730–5743.
[17] E. Babatas, A. T. Erdogan, An algorithmic framework for sparse bounded component analysis, IEEE Transactions on Signal Processing 66 (19) (01 October 2018) 5194–5205.
[18] E. Babatas, A. T. Erdogan, Sparse bounded component analysis for convolutive mixtures, in: Acoustics, Speech, and Signal Processing, 2018 IEEE International Conference on, IEEE, 15-20 April 2018, pp. 1–5.
[19] Y. Inouye, R.-W. Liu, A system-theoretic foundation for blind equalization of an FIR MIMO channel system, IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications 49 (4) (April 2002) 425–436.
[20] A. Bagirov, N. Karmitsa, M. M. Makela, Introduction to Nonsmooth Optimization: Theory, Practice and Software, Springer, August 2014.
[21] H. A. Inan, A. T. Erdogan, S. Cruces, Stationary point characterization for a class of BCA algorithms, IEEE Transactions on Signal Processing 65 (20) (2017) 5437–5452.
[22] S. Arora, R. Ge, T. Ma, A. Moitra, Simple, efficient, and neural algorithms for sparse coding, in: Conference on Learning Theory (COLT), Proceedings of Machine Learning Research, 3–6 July 2015.
[23] T. Mei, A. Mertins, F. Yin, J. Xi, J. Chicharo, Blind source separation for convolutive mixtures based on the joint diagonalization of power spectral density matrices, Signal Processing 88 (8) (August 2008) 1990–2007.
[24] W. Martin, P. Flandrin, Wigner-Ville spectral analysis of nonstationary processes, IEEE Transactions on Acoustics, Speech, and Signal Processing 33 (6) (1985) 1461–1470.
[25] M. Castella, E. Moreau, A new method for kurtosis maximization and source separation, in: Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on (Code available at http://bass-db.gforge.inria.fr/bss_locate/), IEEE, 14-19 March 2010, pp. 2670–2673.
[26] K. Rahbar, J. P. Reilly, A frequency domain method for blind source separation of convolutive audio mixtures, IEEE Trans. on Speech and Audio Processing 13 (5) (September 2005) 832–844.
[27] H. A. Inan, A. T. Erdogan, A convolutive bounded component analysis framework for potentially nonstationary independent and/or dependent sources, IEEE Transactions on Signal Processing 63 (1) (01 January 2015) 18–30.
[28] J. Cho, C. D. Yoo, Underdetermined convolutive BSS: Bayes risk minimization based on a mixture of super-Gaussian posterior approximation, IEEE Transactions on Audio, Speech, and Language Processing 23 (5) (06 March 2015) 828–839.
[29] E. Vincent, R. Gribonval, C. Févotte, Performance measurement in blind audio source separation, IEEE Trans. on Audio, Speech and Language Processing (Code available at http://bass-db.gforge.inria.fr/bss_eval/) 14 (4) (19 June 2006) 1462–1469.
[30] A. Cichocki, S. Amari, K. Siwek, T. Tanaka, A. H. Phan, R. Zdunek, S. Cruces, P. Georgiev, Y. Washizawa, Z. Leonowicz, et al., ICALAB toolboxes, URL: http://www.bsp.brain.riken.jp/ICALAB (2007).
[31] Speech and Audio Processing Group, MARDY database, http://www.commsp.ee.ic.ac.uk/~sap/resources/mardy-multichannel-acoustic-reverberation-database-at-york-database/.
[32] V. G. Reju, S. N. Koh, I. Y. Soon, Underdetermined convolutive blind source separation via time-frequency masking, IEEE Transactions on Audio, Speech and Language Processing (available at https://www.mathworks.com/matlabcentral/fileexchange/47069-convolutive-bss) 18 (1) (January 2010) 101–116.
[33] J. Y. Wen, N. D. Gaubitch, E. A. Habets, T. Myatt, P. A. Naylor, Evaluation of speech dereverberation algorithms using the MARDY database, in: Proc. Intl. Workshop Acoust. Echo Noise Control (IWAENC), 12-14 September 2006.
... Bounded Component Analysis is an unsupervised technique that assumes the bounded properties of components and the additive decomposition of convex sets that support observations. Later, based on those who came before, a series of BCA algorithms such as [17][18][19] were proposed. This makes the algorithm suitable for many real signals, such as digital communication constellations, natural images, harmonic oscillations with sub-Gaussian properties, sparse signals with a super-Gaussian distribution, etc. † The author is with School of Automation and Information Engineering, Sichuan University of Science and Engineering, Yibin 644000, China. ...
Article
Radio Frequency Identification (RFID) is one of the key technologies of the Internet of Things. However, during its application, it faces a huge challenge of co-frequency interference cancellation, that is, the tag collision problem. The multi-tag anti-collision problem is modeled as a Blind Source Separation (BSS) problem from the perspective of system communication transmission layer signal processing. In order to reduce the cost of the reader antenna, this paper uses the boundedness of the tag communication signal to propose an underdetermined RFID tag anti-collision method based on Bounded Component Analysis (BCA). This algorithm converts the underdetermined tag into the signal collision model is combined with the BCA mechanism. Verification analysis was conducted using simulation data. The experimental results show that compared with the nonnegative matrix factorization (NMF) algorithm based on minimum correlation and minimum volume constraints, the bounded component analysis method proposed in this article can perform better. Solving the underdetermined collision problem greatly improves the effect of eliminating co-channel interference of tag signals, improves the system bit error rate performance, and reduces the complexity of the underdetermined model system.
... Typical techniques, which have achieved considerable success in addressing the BSS problem, include but are not limited to independent component analysis (ICA) and its variations [9,10], sparse component analysis (SCA) [7,11,12], sparse bounded component analysis (SBCA) [13], and non-negative matrix factorization (NMF) [14,15]. Recently, there has been an increasing interest in deep learning-based data-driven approaches [16,17]. ...
Article
Full-text available
This paper explores the important role of blind source separation (BSS) techniques in separating M mixtures including N sources using a dual-sensor array, i.e., M=2, and proposes an efficient two-stage underdetermined BSS (UBSS) algorithm to estimate the mixing matrix and achieve source recovery by exploiting time–frequency (TF) sparsity. First, we design a mixing matrix estimation method by precisely identifying high clustering property single-source TF points (HCP-SSPs) with a spatial vector dictionary based on the principle of matching pursuit (MP). Second, the problem of source recovery in the TF domain is reformulated as an equivalent sparse recovery model with a relaxed sparse condition, i.e., enabling the number of active sources at each auto-source TF point (ASP) to be larger than M. This sparse recovery model relies on the sparsity of an ASP matrix formed by stacking a set of predefined spatial TF vectors; current sparse recovery tools could be utilized to reconstruct N>2 sources. Experimental results are provided to demonstrate the effectiveness of the proposed UBSS algorithm with an easily configured two-sensor array.
... To solve this problem, Gong et al. [6] proposed a normalized boundary objective function that simplified the objective function of BCA and verified the effectiveness of this method via simulation and experiment. Babatas et al. [7] proposed that the PCA algorithms can be applied to temporal or spatially dependent sources, as well as independent sources. Today, the PCA algorithm has been successfully applied in many engineering fields. ...
Article
Full-text available
Currently, the widely used blind source separation algorithm is typically associated with issues such as a sluggish rate of convergence and unstable accuracy, and it is mostly suitable for the separation of independent source signals. Nevertheless, source signals are not always independent of one another in practical applications. This paper suggests a blind source separation algorithm based on the bounded component analysis of the enhanced Beetle Antennae Search algorithm (BAS). Firstly, the restrictive assumptions of the bounded component analysis method are more relaxed and do not require the signal sources to be independent of each other, broadening the applicability of this blind source separation algorithm. Second, the objective function of bounded component analysis is optimized using the improved Beetle Antennae Search optimization algorithm. A step decay factor is introduced to ensure that the beetle does not miss the optimal point when approaching the target, improving the optimization accuracy. At the same time, since only one beetle is required, the optimization speed is also improved. Finally, simulation experiments show that the algorithm can effectively separate independent and dependent source signals and can be applied to blind source separation of images. Compared to traditional blind source separation algorithms, it has stronger universality and has faster convergence speed and higher accuracy compared to the original independent component analysis algorithm.
... The objective of the source separation problem is to isolate the original signals from a combination of several source signals. Blind source separation (BSS) is a typical method used for source separation in the signal processing research community and it has gained considerable attention [1][2][3]. Recently, BSS theory has been refined and enhanced. In selecting the proper BSS approach, the link between the number of observations (sensors) and the number of underlying sources is of paramount importance. ...
Article
Full-text available
Blind source separation (BSS) recovers source signals from observations without knowing the mixing process or source signals. Underdetermined blind source separation (UBSS) occurs when there are fewer mixes than source signals. Sparse component analysis (SCA) is a general UBSS solution that benefits from sparse source signals which consists of (1) mixing matrix estimation and (2) source recovery estimation. The first stage of SCA is crucial, as it will have an impact on the recovery of the source. Single-source points (SSPs) were detected and clustered during the process of mixing matrix estimation. Adaptive time–frequency thresholding (ATFT) was introduced to increase the accuracy of the mixing matrix estimations. ATFT only used significant TF coefficients to detect the SSPs. After identifying the SSPs, hierarchical clustering approximates the mixing matrix. The second stage of SCA estimated the source recovery using least squares methods. The mixing matrix and source recovery estimations were evaluated using the error rate and mean squared error (MSE) metrics. The experimental results on four bioacoustics signals using ATFT demonstrated that the proposed technique outperformed the baseline method, Zhen’s method, and three state-of-the-art methods over a wide range of signal-to-noise ratio (SNR) ranges while consuming less time.
... Although enhanced speech can be generated with the use of only the phase of the noisy signal and the enhanced amplitude through inverse transformation, the quality of the enhanced speech will be deteriorated with the use of the noise phase. Time-frequency domain information is found in later studies to play an important role in speech signal reconstruction [32] and signal separation [33][34][35][36]. Shinnosuke et al. proposed a method for phase reconstruction by constructing neural networks [37]. ...
Article
Full-text available
Although current attention‐based speech enhancement methods have been proven to be capable of significantly improving the noise reduction performance, a bottleneck has arisen in juggling both detailed features and high‐level features: The more attention paid to such performance indicators as speech intelligibility index and articulation index will lead to the loss of subtle features such as syllable continuity and timbre distortion. To tackle such a problem, we divide the speech enhancement model into multiple stages, with a special attention mechanism introduced in each stage so that the detailed speech information is retained and the features of high levels are captured. In this research, a Multi‐Stage Attention Network (MSANet) is implemented in cascade by using three different attention modules: Self Attention, Channel Attention and Spatial Attention. The attention module in each stage needs only to focus on the features of its corresponding stage, thus giving full play to their respective advantages without affecting or damaging the features of other stages, helping decouple the feature extraction process and obtaining the features capable of complementing one another. The comparisons and ablation experiments show that the performance of MSANet is superior to those of current mainstream time domain or time‐frequency domain state‐of‐the‐art methods. Compared to the baseline, PESQ score of their model (3.11) has increased by 5%, CSIG (4.44) has increased by 2.5%, CBAK (3.63) has increased by 2.8% and COVL (3.81) has increased by 3.8%, which demonstrates the potential of MSANet as speech feature extraction backbones. An implementation of the PyTorch version is available here: http://www.msp‐lab.cn:1436/msp/MSANet.
... However, its applicability is limited by the need to perform the computationally expensive singular value decomposition (SVD) multiple times [19]. In addition, there are many other mixing models, such as blind source separation, independent component analysis, sparse component analysis, and non-negative matrix factorisation (NMF) [31][32][33]. In recent years, NMF has been applied to many research areas. ...
Article
Full-text available
Low‐rank matrices play a central role in modelling and computational methods for signal processing and large‐scale data analysis. Real‐world observed data are often sampled from low‐dimensional subspaces, but with sample‐specific corruptions (i.e. outliers) or random noises. In many applications where low‐rank matrices arise, these matrices cannot be fully sampled or directly observed, and one encounters the problem of recovering the matrix given only incomplete and indirect observations. The authors aim to recover a low‐rank component from incomplete and indirect observations and correct the possible errors. A new low‐rank matrix recovery formula based on generalised Tikhonov regularisation and its solution algorithm are proposed. The proposed method determines the low‐rank component for performing matrix recovery from highly corrupted observations. The authors’ recommended algorithm reduces not only the outliers but also random corruptions in the recovering task. The experimental results obtained using both synthetic and real application data demonstrate the efficacy of the proposed method.
... Sparse component analysis, as a common UBSS method, can separate the source signals by exploiting the sparsity characteristics of sources in the transform domain [21,22]. Generally, the SCA algorithm consists of two steps: mixing matrix estimation and source recovery [23]. ...
Article
Full-text available
In practical engineering applications, the vibration signals collected by sensors often contain outliers, resulting in the separation accuracy of source signals from the observed signals being seriously affected. The mixing matrix estimation is crucial to the underdetermined blind source separation (UBSS), determining the accuracy level of the source signals recovery. Therefore, a two-stage clustering method is proposed by combining hierarchical clustering and K-means to improve the reliability of the estimated mixing matrix in this paper. The proposed method is used to solve the two major problems in the K-means algorithm: the random selection of initial cluster centers and the sensitivity of the algorithm to outliers. Firstly, the observed signals are clustered by hierarchical clustering to get the cluster centers. Secondly, the cosine distance is used to eliminate the outliers deviating from cluster centers. Then, the initial cluster centers are obtained by calculating the mean value of each remaining cluster. Finally, the mixing matrix is estimated with the improved K-means, and the sources are recovered using the least square method. Simulation and the reciprocating compressor fault experiments demonstrate the effectiveness of the proposed method.
Article
Significant progress has been made in single source recognition for fiber-optical distributed acoustic sensor (DAS). However, it is still challenging to detect and identify more than one unpredictable vibration sources when they are superimposed at the same fiber receiving point. Thus, in this paper it is proposed a blind multi-source separation method based on fast independent component analysis (FastICA), which utilizes the independency and non-Gaussianity of different sources. Firstly, two multi-source mixing mechanisms and separability of different sources received by DAS based on -OTDR are discussed; to solve the two blind problems that the source number and the mixing mode are both unknown, a linear simultaneous mixing mode is assumed, and the source number is estimated by singular value decomposition to the observation matrix; then preprocessing of denoising and anti-mixing, and separation with FastICA by maximizing negative entropy are carried out to make the non-Gaussianity of the estimated signal achieve its maximum; finally, feasibility of the separation method is evaluated through several mixing cases including simulations with two to four field collected signals and a real field test with two sources superimposed on the buried fiber. Signal waves and the spectra, and three separation indicators, such as the Performance Index (PI), the signal correlation coefficients, and the signal mean square error (SMSE), are used to evaluate the performance of the method. As far as we know, it is the first time to realize the separation of an unknown number of the superimposed sources detected by DAS.
Article
Bounded Component Analysis (BCA) is a recent approach which enables the separation of both dependent and independent signals from their mixtures. This article introduces a novel deterministic instantaneous BCA approach for the separation of sparse bounded sources. The separation problem is posed as a geometric maximization problem whose objective is the volume ratio of two geometric objects defined over the separator output samples, namely the principal hyperellipsoid and the bounding l1-norm ball. It is shown that the global maxima of this objective are perfect separators. The article also provides the corresponding iterative algorithms for both real and complex sparse sources. The numerical experiments illustrate the potential benefits of the proposed approach.
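To make the volume-ratio idea concrete, here is a minimal sketch for real-valued outputs. It assumes one plausible form of the objective: the principal hyperellipsoid volume is taken as proportional to the square root of the determinant of the output sample covariance, and the bounding l1-norm ball volume as proportional to the p-th power of the largest per-sample l1 norm. The exact objective in the article may differ, and the name `sparse_bca_log_objective` is hypothetical.

```python
import numpy as np

def sparse_bca_log_objective(O):
    """Log of an assumed volume ratio for separator outputs O of shape (p, N):
    0.5 * log det(R) - p * log(max_k ||o_k||_1), with R the sample covariance.
    Constant volume factors are dropped since they do not affect the maximizer."""
    p, N = O.shape
    Oc = O - O.mean(axis=1, keepdims=True)
    R = Oc @ Oc.T / N
    sign, logdet = np.linalg.slogdet(R)
    if sign <= 0:
        return -np.inf  # degenerate ellipsoid
    l1_radius = np.max(np.sum(np.abs(O), axis=0))  # radius of bounding l1 ball
    return 0.5 * logdet - p * np.log(l1_radius)

# Toy sparse sources: only one component is active at each sample.
S = np.array([[1.0, -1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])
c = np.sqrt(0.5)
Q = np.array([[c, -c], [c, c]])  # 45-degree rotation acting as a residual mixture

j_separated = sparse_bca_log_objective(S)
j_mixed = sparse_bca_log_objective(Q @ S)
```

In this toy case the rotation leaves the covariance determinant unchanged but inflates the bounding l1 ball, so the separated outputs score higher than the mixed ones. The objective is also invariant to a common scaling of the outputs, which is consistent with the usual scaling ambiguity of blind separation.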
Conference Paper
In this article, we propose a Bounded Component Analysis (BCA) approach for the separation of the convolutive mixtures of sparse sources. The corresponding algorithm is derived from a geometric objective function defined over a completely deterministic setting. Therefore, it is applicable to sources which can be independent or dependent in both space and time dimensions. We show that all global optima of the proposed objective are perfect separators. We also provide numerical examples to illustrate the performance of the algorithm.
Article
Bounded Component Analysis (BCA) is a recently introduced approach including Independent Component Analysis (ICA) as a special case under the assumption of source boundedness. In this article, we provide a stationary point analysis for the recently proposed instantaneous BCA algorithms that are capable of separating dependent, even correlated as well as independent sources from their mixtures. The stationary points are identified and characterized as either perfect separators which are the global maxima of the proposed optimization scheme or saddle points. The important result emerging from the analysis is that there are no local optima that can prevent the proposed BCA algorithms from converging to perfect separators.
Article
Bounded Component Analysis (BCA) is a recent framework which enables the development of methods for the separation of dependent as well as independent sources from their mixtures. This article extends a recent geometric BCA approach introduced for the instantaneous mixing problem to the convolutive mixing problem. The article proposes novel deterministic convolutive BCA frameworks for the blind source extraction and blind source separation of convolutive mixtures of sources, which allow the sources to be potentially non-stationary. The global maximizers of the proposed deterministic BCA optimization settings are proved to be perfect separators. Through simulations based on a Copula distributed source system, the article also illustrates that the iterative algorithms corresponding to these frameworks are capable of extracting/separating convolutive mixtures of not only independent sources but also dependent (even correlated) sources in both component (space) and sample (time) dimensions. In addition, even when the sources are independent, it is shown, based on setups involving convolutive mixtures of digital communication sources, that the proposed BCA approach has the potential to improve separation performance, especially for short data records.
Article
Bounded Component Analysis (BCA) is a recent approach which enables the separation of both dependent and independent signals from their mixtures. In this approach, under the practical source boundedness assumption, the widely used statistical independence assumption is replaced by a more generic domain separability assumption. This article introduces a geometric framework for the development of Bounded Component Analysis algorithms. Two main geometric objects related to the separator output samples, namely Principal Hyper-Ellipsoid and Bounding Hyper-Rectangle, are introduced. The maximization of the volume ratio of these objects, and its extensions, are introduced as relevant optimization problems for Bounded Component Analysis. The article also provides corresponding iterative algorithms for both real and complex sources. The numerical examples illustrate the potential advantage of the proposed BCA framework in terms of correlated source separation capability as well as performance improvement for short data records.
Conference Paper
Dereverberation is a growing area of research with many new algorithms appearing in the literature. However, there are still no unanimously accepted tools for evaluation of these algorithms. In this paper, we introduce the Multichannel Acoustic Reverberation Database at York (MARDY) containing real measured multichannel room impulse responses. We demonstrate its use for the evaluation of dereverberation algorithms using three recent multichannel methods. Furthermore, psychoacoustic issues regarding the performance evaluation of dereverberation algorithms are discussed.
Article
We consider the blind deconvolution problem of FIR multiple-input multiple-output channel systems. The systems-theoretic foundation is established on the broadest class of equalizable channel systems. This class of equalizable channel systems is then characterized. Based on this characterization, a fundamental factorization property, called irreducible-allpass factorization, is shown. Based on this fact, it is shown that any equalizable FIR system can and only can be converted to a paraunitary system, through the whitening of the observations, using the second-order statistics. The current second order statistics techniques considered only those channel systems that are irreducible, only half of the irreducible-allpass factorization of equalizable channels
Article
This paper considers the underdetermined blind source separation (BSS) of convolutively mixed super-Gaussian signals that include speech, audio, and various other sparse signals. Here, the separation is performed in three steps. In the first and second steps, the mixing matrix and the sources at each time–frequency location are estimated by minimizing the Bayes risk (or the posterior risk) with squared loss. In the final third step, the permutation alignment is conducted by considering the correlation between adjacent spectral bins as in many conventional algorithms. To overcome any computationally intractable integrations involving a complex-valued super-Gaussian source prior, the posterior distribution of the sources is approximated as a mixture of super-Gaussians. The posterior means of the mixing matrix and the sources are obtained with Metropolis–Hastings within Gibbs sampling and the weighted sum of individual super-Gaussians, respectively. Overall, this approximation leads to a separation that is computationally lighter than and as accurate as the algorithm without the approximation. The simulation results of the synthetically generated data in a virtual room with reverberation show that the estimates of the mixing matrix in the first step and the sources in the second step are more accurate than the estimates from the state-of-the-art algorithms in terms of the mixing error ratio (MER) and the signal-to-distortion ratio (SDR). The experiment was also conducted with recorded data in a real room environment using a public benchmark dataset. Results show that the proposed algorithm gives a better performance compared to the state-of-the-art algorithms in terms of the SDR.
Article
In this paper, we consider the problem of separating an unknown number of sources from their underdetermined convolutive mixtures via time-frequency (TF) masking. We propose two algorithms: one for estimating the masks that are applied to the mixture in the TF domain to separate the signals in the frequency domain, and the other for solving the permutation problem. The algorithm for mask estimation is based on the concept of angles in complex vector space. Unlike previously reported methods, the algorithm does not require any estimation of the mixing matrix or the source positions for mask estimation. The algorithm clusters the mixture samples in the TF domain based on the Hermitian angle between the sample vector and a reference vector, using the well-known k-means or fuzzy c-means clustering algorithms. The membership functions obtained from the clustering algorithms are used directly as the masks. The algorithm for solving the permutation problem clusters the estimated masks by applying k-means clustering to small, overlapping groups of nearby masks. The effectiveness of the algorithm in separating the sources, including collinear sources, from their underdetermined convolutive mixtures obtained in a real room environment is demonstrated.
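The Hermitian angle used as the clustering feature can be computed as in the sketch below. It uses the standard definition, cos(theta_H) = |u^H v| / (||u|| ||v||); the function name `hermitian_angle` and the toy vectors are illustrative, and how the reference vector is chosen in the paper is not specified here.

```python
import numpy as np

def hermitian_angle(u, v):
    """Hermitian angle in [0, pi/2] between complex vectors u and v,
    using cos(theta_H) = |u^H v| / (||u|| * ||v||)."""
    u = np.asarray(u, dtype=complex)
    v = np.asarray(v, dtype=complex)
    c = np.abs(np.vdot(u, v)) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(c, 0.0, 1.0)))  # clip guards rounding error

u = np.array([1.0, 1.0j])
angle_same_direction = hermitian_angle(u, np.exp(0.7j) * u)  # phase-rotated copy
angle_orthogonal = hermitian_angle([1.0, 0.0], [0.0, 1.0])
```

Because the magnitude of the inner product is taken, the angle is unchanged when either vector is multiplied by a complex scalar. That property is what makes it a reasonable feature for TF samples dominated by a single source, since such samples differ from the source's mixing vector only by a complex factor.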