Proportionate Algorithms for Blind Source Separation
Michele Scarpiniti, Danilo Comminiello, Simone Scardapane,
Raffaele Parisi, and Aurelio Uncini
Department of Information Engineering, Electronics and Telecommunications (DIET),
“Sapienza” University of Rome, via Eudossiana 18, 00184, Rome
{michele.scarpiniti,danilo.comminiello,simone.scardapane,
raffaele.parisi,aurelio.uncini}@uniroma1.it
Abstract. In this paper we propose an extension of time-domain Blind Source Separation algorithms obtained by applying the well-known proportionate and improved proportionate adaptive algorithms. These algorithms, known in the context of adaptive filtering, are able to exploit the sparseness of the acoustic impulse responses of mixing environments and achieve better performance than standard algorithms. Some preliminary experimental results show the effectiveness of the proposed approach in terms of convergence speed.
Keywords: Blind Source Separation, Independent Component Analysis,
Proportionate algorithms, Improved proportionate.
1 Introduction
Blind Source Separation (BSS) applied to speech and audio signals is an attractive research topic in the field of adaptive signal processing [1,2]. The problem is to recover the original sources from a set of mixtures recorded in an unknown environment. While several well-performing approaches exist when the mixing environment is instantaneous, some problems arise in convolutive environments.
Several solutions have been proposed to solve BSS in a convolutive environment [3,2]. Some of these solutions work in the time domain, others in the frequency domain. Each of them has some advantages and disadvantages, but there is no single winning approach [4].
In addition, when working with speech and audio signals, convergence speed is an important requirement. Since the impulse responses of typical environments, e.g. office rooms, are quite sparse, some authors have proposed to incorporate sparseness into the learning algorithm [5,6]. The idea is to introduce a weighting matrix in the update equation that gives more emphasis to the most important part of the impulse response. In particular, a proportionate [5] and an improved proportionate [7] algorithm were proposed for supervised signal processing applications, such as acoustic echo cancellation.
In this paper we aim to extend these proportionate algorithms to the BSS problem, in the hope that they can be effective also in the unsupervised case. Hence a proportionate and an improved proportionate version of the well-known time-domain Torkkola's algorithm [8,9] will be proposed. Some preliminary results, which demonstrate the effectiveness of the proposed idea in terms of convergence speed, are also presented.
The rest of the paper is organized as follows: Section 2 introduces the BSS problem
in convolutive environments. Then Section 3 describes the proposed algorithm, while
Section 4 shows some experimental results. Finally Section 5 draws our conclusions.
2 Blind Source Separation for Convolutive Mixtures
Let us consider a set of $N$ unknown and independent sources denoted as $\mathbf{s}[n]=\left[s_1[n],\ldots,s_N[n]\right]^T$, such that the components $s_i[n]$ are zero-mean and mutually independent. Signals received by an array of $M$ sensors are denoted by $\mathbf{x}[n]=\left[x_1[n],\ldots,x_M[n]\right]^T$ and are called mixtures. For simplicity we consider the case of $N=M$.
The convolutive model introduces the following relation between the $i$-th mixed signal and the original source signals:

$$x_i[n]=\sum_{j=1}^{N}\sum_{k=0}^{K-1} a_{ij}[k]\, s_j[n-k], \qquad i=1,\ldots,M \qquad (1)$$
The mixed signal is a linear mixture of filtered versions of the source signals, $a_{ij}[k]$ represents the $k$-th mixing filter coefficient and $K$ is the number of filter taps. The task is to estimate the independent components from the observations, without resorting to a priori knowledge about the mixing system, obtaining an estimate $\mathbf{u}[n]$ of the original source vector $\mathbf{s}[n]$:
$$u_i[n]=\sum_{j=1}^{M}\sum_{l=0}^{L-1} w_{ij}[l]\, x_j[n-l], \qquad i=1,\ldots,N \qquad (2)$$
where $w_{ij}[l]$ denotes the $l$-th de-mixing filter coefficient and $L$ is the number of filter taps.
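To make the notation concrete, the following is a minimal NumPy sketch of the mixing model (1) and the feedforward de-mixing (2). The array layouts and the function names (`convolutive_mix`, `demix_feedforward`) are illustrative choices of this sketch, not part of the original formulation.

```python
import numpy as np

def convolutive_mix(s, A):
    """Eq. (1): x_i[n] = sum_j sum_k a_ij[k] s_j[n-k].

    s : (N, T) array of source signals.
    A : (M, N, K) array of mixing filters a_ij[k].
    Returns x : (M, T) array of mixtures.
    """
    M, N, K = A.shape
    T = s.shape[1]
    x = np.zeros((M, T))
    for i in range(M):
        for j in range(N):
            # full convolution truncated to the first T samples
            x[i] += np.convolve(s[j], A[i, j], mode="full")[:T]
    return x

def demix_feedforward(x, W):
    """Eq. (2): u_i[n] = sum_j sum_l w_ij[l] x_j[n-l].

    x : (M, T) mixtures, W : (N, M, L) de-mixing filters w_ij[l].
    """
    N, M, L = W.shape
    T = x.shape[1]
    u = np.zeros((N, T))
    for i in range(N):
        for j in range(M):
            u[i] += np.convolve(x[j], W[i, j], mode="full")[:T]
    return u
```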
The weights $w_{ij}[l]$ can be adapted by minimizing some suitable cost function. A particularly good choice is to maximize the joint entropy or, equivalently, to minimize the mutual information [1,3]. Different approaches can be used, for example implementing the de-mixing algorithm in the time domain or in the frequency domain. In this paper we adopt the time-domain approach, using the algorithm proposed by Torkkola in [8], based on the feedback network shown in Figure 1 (for the particular case of $M=N=2$) and described mathematically by:
$$u_i[n]=\sum_{l=0}^{L-1} w_{ii}[l]\, x_i[n-l]+\sum_{\substack{j=1 \\ j\neq i}}^{N}\sum_{l=1}^{L-1} w_{ij}[l]\, u_j[n-l], \qquad i=1,\ldots,N \qquad (3)$$
This network will be used in the paper to achieve source separation in the time domain by maximizing the joint entropy of a nonlinear transformation of the network output: $y_i[n]=f\left(u_i[n]\right)$, with $f(\cdot)$ a suitable nonlinear function, close to the source cumulative distribution function [1,8]. In this work we use $f(\cdot)=\tanh(\cdot)$. The $k$-th weight of the de-mixing filter $w_{ij}[k]$ is adapted by using the general rule:
$$w^{p+1}_{ij}[k]=w^{p}_{ij}[k]+\mu\,\Delta w^{p}_{ij}[k], \qquad (4)$$

where in particular the stochastic gradient method can be used, $\mu$ is the learning rate and $p$ is the iteration index.
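As a reference, here is a minimal sketch of the feedback network (3) for the square case $N=M$: the direct filters act on the mixtures, while the cross filters (with a one-sample minimum delay, $l\geq 1$) act on the past outputs. The general rule (4) then simply adds $\mu\,\Delta w^{p}_{ij}[k]$ to each tap. Names and shapes are again illustrative assumptions of this sketch.

```python
import numpy as np

def demix_feedback(x, W):
    """Eq. (3): feedback de-mixing network (square case, N = M).

    x : (N, T) mixtures, W : (N, N, L) de-mixing filters w_ij[l].
    Diagonal filters act on x_i[n-l] (l >= 0); off-diagonal filters
    act on the past outputs u_j[n-l] (l >= 1).
    """
    N, _, L = W.shape
    T = x.shape[1]
    u = np.zeros((N, T))
    for n in range(T):
        for i in range(N):
            acc = 0.0
            for l in range(min(L, n + 1)):
                acc += W[i, i, l] * x[i, n - l]
            for j in range(N):
                if j == i:
                    continue
                for l in range(1, min(L, n + 1)):
                    acc += W[i, j, l] * u[j, n - l]
            u[i, n] = acc
    return u
```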
Fig. 1. The architecture for BSS in the time domain in the particular case of $M=N=2$: a feedback network with direct filters $w_{11}$, $w_{22}$ acting on the mixtures $x_1[n]$, $x_2[n]$, cross filters $w_{12}$, $w_{21}$ acting on the outputs $u_1[n]$, $u_2[n]$, and a maximum-entropy block that adjusts $w_{ij}$ from the nonlinear outputs $y_1[n]$, $y_2[n]$.
3 The Proposed Algorithm
Standard BSS algorithms have been proposed in several works [3,9,10,8]. More recently, some authors have underlined the importance of the sparseness of the impulse response $a_{ij}[n]$ [6]. In order to improve the performance of standard adaptive algorithms (LMS and NLMS [11]), which do not take sparseness into account, these authors proposed some modifications introducing the so-called Proportionate NLMS (PNLMS) [5] and Improved Proportionate NLMS (IPNLMS) [7]. The resulting algorithms derive from a more general class of regularized gradient adaptive algorithms [12,13].
In this kind of algorithm, the update term is simply multiplied by a matrix $\mathbf{G}[k]$, whose entries $g_{ij}[k]$ are chosen according to different criteria (NLMS, PNLMS and IPNLMS) and take into account the sparseness of the impulse response: the parameter variation is proportional to the impulse response itself,
$$\mathbf{w}^{p+1}=\mathbf{w}^{p}+\mu\,\Delta\mathbf{w}^{p} \quad\longrightarrow\quad \mathbf{w}^{p+1}=\mathbf{w}^{p}+\mu\,\mathbf{G}\{\mathbf{w}^{p}\}\,\Delta\mathbf{w}^{p}.$$
The aim of this section is to extend the previous ideas on proportionate adaptive algorithms to the time-domain BSS algorithm proposed by Torkkola [8], deriving a new updating rule for the de-mixing filter matrix $\mathbf{W}[k]$.
Based on [8] and [6], the proposed modification to Torkkola's algorithm results in the following updates:

$$\begin{aligned}
\Delta w^{p}_{ii}[0] &\propto g_{ii}[0]\left(\frac{1}{w_{ii}[0]}-2\,y_i[n]\,x_i[n]\right), \\
\Delta w^{p}_{ii}[k] &\propto -2\,g_{ii}[k]\,y_i[n]\,x_i[n-k], \qquad \text{for } k\geq 1, \qquad (5)\\
\Delta w^{p}_{ij}[k] &\propto -2\,g_{ij}[k]\,y_i[n]\,u_j[n-k], \qquad \text{for } k\geq 1 \text{ and } i\neq j,
\end{aligned}$$

where $y_i[n]=f\left(u_i[n]\right)=\tanh\left(u_i[n]\right)$.
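The per-sample contribution of the proposed rule (5) can be sketched as follows. The gain entries $g_{ij}[k]$ are assumed to be supplied by one of the two choices described next (PBSS or IPBSS), and the way contributions are accumulated into $\Delta\mathbf{W}$ is a plausible reading of the rule, not necessarily the authors' exact implementation.

```python
import numpy as np

def delta_w_sample(W, G, x, u, n):
    """One-sample contribution to the update of Eq. (5).

    W, G : (N, N, L) de-mixing filters and proportionate gains.
    x, u : (N, T) mixtures and current outputs, n : sample index.
    Assumes the leading diagonal taps w_ii[0] are initialized nonzero.
    """
    N, _, L = W.shape
    y = np.tanh(u[:, n])                 # y_i[n] = f(u_i[n]) = tanh(u_i[n])
    dW = np.zeros_like(W)
    for i in range(N):
        # Leading diagonal tap: g_ii[0] (1/w_ii[0] - 2 y_i[n] x_i[n])
        dW[i, i, 0] = G[i, i, 0] * (1.0 / W[i, i, 0] - 2.0 * y[i] * x[i, n])
        for k in range(1, min(L, n + 1)):
            # Remaining diagonal taps: -2 g_ii[k] y_i[n] x_i[n-k]
            dW[i, i, k] = -2.0 * G[i, i, k] * y[i] * x[i, n - k]
            for j in range(N):
                if j != i:
                    # Cross taps: -2 g_ij[k] y_i[n] u_j[n-k]
                    dW[i, j, k] = -2.0 * G[i, j, k] * y[i] * u[j, n - k]
    return dW
```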
The $k$-th parameter $g_{ij}[k]$ in the case of Proportionate BSS (PBSS) is chosen as follows:

$$g_{ij}[k]=\frac{\gamma_{ij}[k]}{\left\|\boldsymbol{\gamma}_{ij}\right\|_1}, \qquad (6)$$

$$\gamma_{ij}[k]=\max\left\{\rho\,\max\left(\delta_k,\left|w_{ij}[0]\right|,\ldots,\left|w_{ij}[L-1]\right|\right),\left|w_{ij}[k]\right|\right\}, \qquad \boldsymbol{\gamma}_{ij}=\left[\gamma_{ij}[0],\gamma_{ij}[1],\ldots,\gamma_{ij}[L-1]\right]^T, \qquad (7)$$

with $\rho$ and $\delta_k$ suitable constants.
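A direct transcription of (6)-(7) for a single de-mixing filter $\mathbf{w}_{ij}$ is sketched below; `rho` and `delta_k` correspond to $\rho$ and $\delta_k$, and the function name is an illustrative choice.

```python
import numpy as np

def pbss_gains(w, rho=0.01, delta_k=0.01):
    """PBSS gains, Eqs. (6)-(7), for one de-mixing filter w = w_ij of length L."""
    w_abs = np.abs(w)
    # gamma_ij[k] = max( rho * max(delta_k, |w[0]|, ..., |w[L-1]|), |w[k]| )
    gamma = np.maximum(rho * max(delta_k, w_abs.max()), w_abs)
    # g_ij[k] = gamma_ij[k] / ||gamma_ij||_1
    return gamma / np.sum(gamma)
```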
A second proposal is the following Improved PBSS (IPBSS) choice for the $k$-th parameter $g_{ij}[k]$:

$$g_{ij}[k]=\frac{1-\beta}{2L}+\left(1+\beta\right)\frac{\left|w_{ij}[k]\right|}{2\left\|\mathbf{w}_{ij}\right\|_1}, \qquad (8)$$

$$\mathbf{w}_{ij}=\left[w_{ij}[0],w_{ij}[1],\ldots,w_{ij}[L-1]\right]^T, \qquad (9)$$

where $-1\leq\beta\leq 1$ is a constant.
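Similarly, a sketch of the IPBSS choice (8)-(9) follows. The small regularization term `eps` in the denominator is a common safeguard against an all-zero filter and is added here as an assumption of the sketch, not part of Eq. (8).

```python
import numpy as np

def ipbss_gains(w, beta=0.1, eps=1e-12):
    """IPBSS gains, Eq. (8), for one de-mixing filter w = w_ij of length L."""
    L = len(w)
    w_abs = np.abs(w)
    # g_ij[k] = (1 - beta) / (2L) + (1 + beta) |w[k]| / (2 ||w||_1)
    return (1.0 - beta) / (2.0 * L) + (1.0 + beta) * w_abs / (2.0 * np.sum(w_abs) + eps)
```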
4 Experimental Results
Some experimental results are presented using two speech signals sampled at 8 kHz. Three different synthetic sets of mixing weights were tested, using a total of 5000 samples.
The first set of weights is a very simple and sparse set given by

$$\begin{aligned}
a_{11}[0]&=1, & a_{22}[0]&=\tfrac{5}{6}, & a_{12}[10]&=\tfrac{4}{6}, \\
a_{21}[10]&=\tfrac{3}{6}, & a_{12}[40]&=\tfrac{2}{6}, & a_{21}[40]&=\tfrac{1}{6}. \qquad (10)
\end{aligned}$$
A denser set of weights is used in the second case, which is perhaps more realistic for a room that produces notable reverberation; here it is imagined that one microphone is close to the first source and the other microphone close to the second source, which results in:

$$a_{11}[0]=a_{22}[0]=1, \qquad a_{ij}[5n-1]=\exp(-n), \quad \text{for } n=1,\ldots,120 \text{ and } i,j=1,2 \qquad (11)$$

that is, each sensor receives the strongest input from its closest source, with subsequent echoes every five taps of the filter that decay exponentially.
The third set has non-zero elements over the whole length of the filter, but is otherwise similar to the previous weights and is meant to simulate the same situation:

$$a_{11}[0]=a_{22}[0]=1, \qquad a_{ij}[n]=\exp\left(-(1+0.4\,n)\right), \quad \text{for } n=1,\ldots,200 \text{ and } i,j=1,2 \qquad (12)$$
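For reproducibility, the three synthetic mixing filter sets (10)-(12) can be built as follows. The mixing filter length $K$ chosen for each set is just large enough to hold the specified taps, which is an assumption of this sketch.

```python
import numpy as np

def weight_set_1(K=41):
    """Sparse set of Eq. (10)."""
    A = np.zeros((2, 2, K))
    A[0, 0, 0], A[1, 1, 0] = 1.0, 5.0 / 6.0
    A[0, 1, 10], A[1, 0, 10] = 4.0 / 6.0, 3.0 / 6.0
    A[0, 1, 40], A[1, 0, 40] = 2.0 / 6.0, 1.0 / 6.0
    return A

def weight_set_2(n_echoes=120):
    """Denser set of Eq. (11): echoes every five taps, decaying as exp(-n)."""
    K = 5 * n_echoes
    A = np.zeros((2, 2, K))
    A[0, 0, 0] = A[1, 1, 0] = 1.0
    for n in range(1, n_echoes + 1):
        A[:, :, 5 * n - 1] = np.exp(-n)
    return A

def weight_set_3(K=201):
    """Fully non-zero set of Eq. (12): a_ij[n] = exp(-(1 + 0.4 n))."""
    A = np.zeros((2, 2, K))
    A[0, 0, 0] = A[1, 1, 0] = 1.0
    n = np.arange(1, K)
    A[:, :, 1:] = np.exp(-(1.0 + 0.4 * n))
    return A
```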
Furthermore, the de-mixing weights are not updated at each iteration of the learning rule; rather, we sum the contributions of the learning rule over 150 iterations between each update of the filter, for a robust estimation. When using the $\mathbf{G}$ matrix from the PBSS algorithm, we always use $\rho=0.01$ and $\delta_k=0.01$, as suggested as good values in [6]. When using IPBSS, we always use the parameter $\beta=0.1$. In both cases we use a de-mixing filter length of $L=200$ samples and the learning rate is set to $\mu=1.5\times 10^{-5}$. We will face the more complicated problem of separating real-world mixtures in a future work.
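The block-wise adaptation just described can be sketched by reusing the earlier functions (`demix_feedback`, `delta_w_sample`, `pbss_gains`, `ipbss_gains`): the per-sample terms of (5) are accumulated over a block of 150 samples before the filters are updated, with the stated values $\rho=\delta_k=0.01$, $\beta=0.1$, $L=200$ and $\mu=1.5\times 10^{-5}$. The exact loop structure and the filter initialization are assumptions of this sketch, not the authors' implementation.

```python
import numpy as np

def run_epoch(x, W, gain_fn, mu=1.5e-5, block=150):
    """One epoch of blocked adaptation: accumulate the Eq. (5) terms over
    `block` samples, then update W (Eq. (4)) with the proportionate gains
    returned by `gain_fn` (Eqs. (6)-(7) or (8))."""
    N, T = x.shape
    for start in range(0, T, block):
        u = demix_feedback(x, W)              # current outputs, Eq. (3)
        G = np.zeros_like(W)
        for i in range(N):
            for j in range(N):
                G[i, j] = gain_fn(W[i, j])    # per-filter gain vector
        dW = np.zeros_like(W)
        for n in range(start, min(start + block, T)):
            dW += delta_w_sample(W, G, x, u, n)
        W += mu * dW                          # Eq. (4)
    return W

# Usage sketch: initialize W with unit leading diagonal taps (w_ii[0] = 1).
# W = np.zeros((2, 2, 200)); W[0, 0, 0] = W[1, 1, 0] = 1.0
# W = run_epoch(x, W, pbss_gains)            # or gain_fn=ipbss_gains
```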
Now, we ask ourselves the natural question: does the proposed method, using a $\mathbf{G}$ matrix from either the PBSS or the IPBSS method, improve the results of the standard algorithm? The answer is that it depends on the filter length or, more generally, on how sparse the filter is.
In order to evaluate the convergence of the algorithm, an estimate $I$ of the mutual information is used as convergence and performance index [14,15].
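The text does not specify which estimator of $I$ is used; purely as an illustrative placeholder (not necessarily the estimator of [14,15]), a simple histogram-based estimate of the mutual information between two separated outputs could look like the following.

```python
import numpy as np

def mutual_information(u1, u2, bins=64):
    """Histogram-based estimate of I(u1; u2) in nats (illustrative only)."""
    p_xy, _, _ = np.histogram2d(u1, u2, bins=bins)
    p_xy = p_xy / p_xy.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))
```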
4.1 Results of Proportionate Algorithm
As we will see in this section, there are improvements in the convergence speed over the standard algorithm. Figure 2 shows the convergence history in terms of the estimated mutual information, and we can clearly see a considerable speed-up in convergence: already at epoch 10 the PBSS algorithm has almost reached convergence, while the standard algorithm does not reach this level until around epoch 30. At epoch 20 the PBSS algorithm has practically reached convergence, while the standard algorithm does not converge before around epoch 80.
Fig. 2. Comparison of the standard algorithm with the algorithm using the $\mathbf{G}$ matrix from the PBSS method: convergence history of the estimated mutual information $I$ over 80 epochs, for PBSS and standard BSS on weight sets 1, 2 and 3.
This suggests that the algorithm converges between 3 and 4 times faster when using the $\mathbf{G}$ matrix from the PBSS method, which shows that by adaptively changing the learning rate for the different taps of the filter we get significantly better performance.
As can be seen in Figure 2, at convergence the solutions of both algorithms reach the same level of mutual information. It would be interesting to see if there are any differences when looking at the recovered de-mixing filters directly. Figure 3 shows the solution at convergence for both methods using the first weight set, and we can easily see that the main features of both solutions are present in both filters and involve approximately the same filter taps, albeit with slightly different scalings. Apart from that, there are other small differences, but they seem to be mostly noise in the filter. Qualitatively, from listening to the de-mixed sound files, we could discern no differences between the two solutions at convergence. Thus it seems reasonable to conclude that both methods eventually converge to the same solution, but that using the $\mathbf{G}$ matrix from the PBSS method we obtain convergence and a good solution significantly faster.
Fig. 3. Solution at convergence using the first set of weights, for (a) the standard algorithm and (b) the PBSS algorithm: the four de-mixing filters $w_{11}[n]$, $w_{12}[n]$, $w_{21}[n]$ and $w_{22}[n]$.
4.2 Results of Improved Proportionate Algorithm
Similar results can be obtained with the IPBSS method. Figure 4 shows the convergence history of the standard algorithm versus the proposed algorithm using the $\mathbf{G}$ matrix from the IPBSS method. As we can see, convergence is significantly faster than what is possible with the standard algorithm. It is also interesting to note that there is more noise in the mutual information at convergence with respect to the standard algorithm. However, looking at the converged solution, there is once again no difference from the standard algorithm. Qualitatively, from listening to the solutions, the authors could not hear any difference. We have therefore concluded that the difference is negligible and might have been caused by the noise in the mutual information at convergence.
The IPBSS algorithm turned out to be slightly faster for these examples than the PBSS one. From Figure 4 we can see that the value of the mutual information reached by the proposed algorithm using IPBSS after the first epoch is reached by the standard algorithm only between epochs 4 and 5, while the value reached by PBSS after its first iteration is reached by the standard algorithm between epochs 2 and 3. A similar reasoning can be made for other epochs.

Fig. 4. Comparison of the standard algorithm with the algorithm using the $\mathbf{G}$ matrix from the IPBSS method: convergence history of the estimated mutual information $I$ over 80 epochs, for IPBSS and standard BSS on weight sets 1, 2 and 3.
As concluded in the previous section, this indicates that the proposed algorithm using the PBSS method increases the convergence speed in the first iterations by a factor of 3-4. From this we can also infer that the IPBSS method further increases the convergence speed, suggesting an improvement of 20-25% over the PBSS algorithm.
5 Conclusions
In this paper some preliminary results on a proportionate and an improved proportionate version of the well-known time-domain Torkkola BSS algorithm have been presented. The proposed idea is to exploit the sparseness of the acoustic impulse responses of the mixing environment by applying proportionate algorithms known in the context of adaptive filtering for speech. Some experimental results have shown the effectiveness of the proposed approach in terms of convergence speed and encourage a more thorough theoretical treatment of these novel classes of separation algorithms.
Acknowledgment. The authors would like to thank Mr. Gaute Halvorsen for his valuable help in performing some of the presented experiments.
References
1. Cichocki, A., Amari, S.: Adaptive Blind Signal and Image Processing. John Wiley (2002)
2. Makino, S., Lee, T.W., Sawada, H.: Blind Speech Separation. Springer (2007)
3. Choi, S., Cichocki, A., Park, H.M., Lee, S.Y.: Blind source separation and independent component analysis: a review. Neural Information Processing - Letters and Reviews 6(1), 1–57 (2005)
4. Araki, S., Mukai, R., Makino, S., Nishikawa, T., Saruwatari, H.: The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech. IEEE Transactions on Speech and Audio Processing 11(2), 109–116 (2003)
5. Duttweiler, D.L.: Proportionate normalized least-mean-square adaptation in echo cancelers. IEEE Transactions on Speech and Audio Processing 8, 508–518 (2000)
6. Huang, Y., Benesty, J., Chen, J.: Acoustic MIMO Signal Processing. Springer (2006)
7. Benesty, J., Gay, S.L.: An improved PNLMS algorithm. In: Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2002), pp. 1881–1884 (2002)
8. Torkkola, K.: Blind separation of convolved sources based on information maximization. In: Proc. of the 1996 IEEE Signal Processing Society Workshop on Neural Networks for Signal Processing, September 4-6, pp. 423–432 (1996)
9. Torkkola, K.: Blind deconvolution, information maximization and recursive filters. In: Proc. of 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 1997), April 21-24, pp. 3301–3304 (1997)
10. Torkkola, K.: Blind separation of delayed sources based on information maximization. In: Proc. of 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 1996), May 7-10, pp. 3509–3512 (1996)
11. Haykin, S.: Adaptive Filter Theory, 4th edn. Prentice-Hall (2001)
12. Ouedraogo, W.S.B., Jaidane, M., Souloumiac, A., Jutten, C.: Regularized gradient algorithm for non-negative independent component analysis. In: Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2011), Prague, Czech Republic, May 22-27, pp. 2524–2527 (2011)
13. Boulmezaoud, T.Z., El Rhabi, M., Fenniri, H., Moreau, E.: On convolutive blind source separation in a noisy context and a total variation regularization. In: Proc. of IEEE Eleventh International Workshop on Signal Processing Advances in Wireless Communications (SPAWC 2010), Marrakech, June 20-23, pp. 1–5 (2010)
14. Masulli, F., Valentini, G.: Mutual information methods for evaluating dependence among outputs in learning machines. Technical Report TR-01-02, Dipartimento di Informatica e Scienze dell'Informazione, Università di Genova (2001)
15. Torkkola, K.: Learning feature transforms is an easier problem than feature selection. In: Proc. of 16th International Conference on Pattern Recognition, August 11-15, pp. 104–107 (2002)