
SEPARATION OF POST-NONLINEAR MIXTURES USING ACE AND TEMPORAL DECORRELATION

Andreas Ziehe, Motoaki Kawanabe, Stefan Harmeling and Klaus-Robert Müller

GMD FIRST.IDA, Kekuléstr. 7, 12489 Berlin, Germany
University of Potsdam, Am Neuen Palais 10, 14469 Potsdam, Germany
{ziehe,nabe,harmeli,klaus}@first.gmd.de

ABSTRACT

We propose an efficient method based on the concept of maximal correlation that reduces the post-nonlinear blind source separation problem (PNL BSS) to a linear BSS problem. For this we apply the Alternating Conditional Expectation (ACE) algorithm, a powerful technique from non-parametric statistics, to approximately invert the (post-)nonlinear functions. Interestingly, in the framework of the ACE method convergence can be proven, and in the PNL BSS scenario the optimal transformations found by ACE coincide with the desired inverse functions. After the nonlinearities have been removed by ACE, temporal decorrelation (TD) allows us to recover the source signals. Excellent performance on realistic examples underlines the validity of the proposed ACE-TD method.

1. INTRODUCTION

Blind source separation (BSS) research has mainly been focused on variants of linear ICA and temporal decorrelation methods (see e.g. [14, 6, 5, 7, 1, 2, 13, 29, 22, 12]). Linear BSS assumes that at time t each component x_i(t) of the observed n-dimensional data vector x(t) is a linear combination of statistically independent signals s_1(t), ..., s_n(t):

  x(t) = A s(t)

(e.g. [12]). The source signals s_i(t) are unknown, as are the coefficients a_ij of the mixing matrix A. The goal is therefore to estimate both unknowns from the observed signals x(t), i.e. a separating matrix W and signals y(t) = W x(t) that estimate s(t).

However, non-linearities that distort the mixed signals pose a challenging problem for "conventional" BSS methods, where the mixing model is linear instantaneous or convolutive. The general nonlinear mixing model is (cf. [12])

  x(t) = f(s(t)),   (1)

To whom correspondence should be addressed.

This work was partly supported by the EU under contract IST-1999-14190 – BLISS.

where f is an arbitrary nonlinear transformation (at least approximately invertible). An important special case is post-nonlinear (PNL) mixtures

  x(t) = f(A s(t)),   (2)

where f is an invertible nonlinear function that operates componentwise and A is a linear mixing matrix. Because this PNL model, which was introduced by Taleb and Jutten [25], is an important subclass with interesting properties, it has attracted the interest of several researchers [25, 15, 27]. Furthermore, it is often an adequate model of real-world physical systems where nonlinear transfer functions appear; e.g. in the fields of telecommunications or biomedical data recording, sensors can have a nonlinear characteristic.

Fig. 1: Building blocks of the PNL mixing model and the separation system.

Algorithmic solutions of eq. (2) have used e.g. self-organizing maps [20, 18], extensions of GTM [21], neural networks [27, 19], parametric sigmoidal functions [16] or ensemble learning [26] to approximate the nonlinearity f (or its inverse g). Kernel-based methods have also been tried on very simple toy signals [8] and more recently on real-world data using temporal decorrelation in feature space [10]. Note that most existing methods (except [10]) have high computational cost and, depending on the algorithm, are prone to run into local minima.

In our approach to the PNL BSS problem we first approximately invert the post-nonlinearity using the ACE algorithm (estimating g) and then apply a standard BSS technique [3, 29] that relies on temporal decorrelation (estimating the unmixing matrix W) (cf. Fig. 1). By virtue of the ACE framework, which is briefly introduced in subsection 2.2, we prove that the algorithm converges to the correct inverse nonlinearities, provided that they exist. Some implementation issues are discussed and numerical simulations illustrating the method are described in section 3. Finally a conclusion is given in section 4.

2. METHODS

For the sake of simplicity we introduce our method for the n = 2 case. The extension to the general n > 2 case is easily possible, but omitted for better readability.

2.1. Problem statement

Let us consider the two-dimensional post-nonlinear mixing model

  x1(t) = f1(a11 s1(t) + a12 s2(t)),
  x2(t) = f2(a21 s1(t) + a22 s2(t)),

where s1(t) and s2(t) are independent source signals that are temporally correlated, x1(t) and x2(t) are the observed signals, A = (a_ij) is the mixing matrix, and f1 and f2 are the componentwise nonlinear transformations, which are invertible.

Obviously, any attempt to separate such a mixture by a linear BSS algorithm will fail, unless one can invert the functions f1 and f2 at least approximately. In this work we propose that this can be achieved by maximizing the correlation

  corr(g1(x1), g2(x2))   (3)

with respect to nonlinear functions g1 and g2. This means we want to find transformations g1 and g2 of the observed signals such that the relationship between the transformed variables becomes linear. Intuitively speaking, the relationship is linear if the signals are aligned in a scatter plot, i.e. if they are maximally correlated. Under certain conditions that we will state in detail later, this problem is solved by the ACE method, which finds so-called optimal transformations g1 and g2 that maximize eq. (3). One can prove existence and uniqueness of those optimal transformations, and it can be shown that the ACE algorithm, which is described in the following, converges to these solutions (cf. [4]).

2.2. ACE algorithm

The ACE algorithm is an iterative procedure for finding the optimal nonlinear functions g1 and g2. The starting point is the observation that for fixed g2 the optimal g1 is given by

  g1(x1) = E[g2(x2) | x1],

and conversely, for fixed g1 the optimal g2 is

  g2(x2) = E[g1(x1) | x2] / ||E[g1(x1) | x2]||.

The key idea of the ACE algorithm is therefore to compute alternately the respective conditional expectations. To avoid trivial solutions one normalizes in each step by using the function norm ||g|| = sqrt(E[g^2]). The algorithm for two variables is summarized below. It is also possible to extend the procedure to the multivariate case; for further details we refer to [11, 4].

Algorithm 1 The ACE algorithm for two variables
  initialize g1(x1) := x1 / ||x1||
  repeat
    g2(x2) := E[g1(x1) | x2]
    g1(x1) := E[g2(x2) | x1] / ||E[g2(x2) | x1]||
  until E[(g1(x1) - g2(x2))^2] fails to decrease

An important point in the implementation of this algorithm is the estimation of the conditional expectations from the data. Usually, the conditional expectations are computed by data smoothing, for which numerous techniques exist (cf. [4, 9]). Care has to be taken to balance the trade-off between fidelity to the data and smoothness of the estimated curve. Our implementation utilizes a nearest-neighbor smoothing that applies a simple moving-average filter to appropriately sorted data.

By applying g1 and g2 to the mixed signals x1 and x2 we remove the effect of the nonlinear functions f1 and f2. In the following we will substantiate this claim more formally. We show for u1 = a11 s1 + a12 s2 and u2 = a21 s1 + a22 s2 that the transformations g1 and g2 obtained from the ACE procedure are the desired inverse functions in the case that u1 and u2 are jointly normally distributed; in other words, we prove the following relationship:

  g1(x1) ∝ f1^{-1}(x1),   g2(x2) ∝ f2^{-1}(x2).   (4)

Almost all work for the proof has already been done in Proposition 5.4 and Theorem 5.3 of [4], which (by noticing that the correlation of two signals does not change if we scale one or both signals) implies:

  g1(x1) ∝ E[g2(x2) | x1],   g2(x2) ∝ E[g1(x1) | x2].

Note that the conditional expectation E[g2(x2) | x1] is a function of x1 and the expectation is taken with respect to x2, and analogously for the second expression. Since x1 = f1(u1) and x2 = f2(u2), and furthermore u1 = f1^{-1}(x1) and u2 = f2^{-1}(x2), we get:

  g1(f1(u1)) ∝ E[g2(f2(u2)) | f1(u1)],   g2(f2(u2)) ∝ E[g1(f1(u1)) | f2(u2)].

Because f1 and f2 are invertible functions they can be omitted in the condition of the conditional expectation, leading us to:

  g1(f1(u1)) ∝ E[g2(f2(u2)) | u1],   g2(f2(u2)) ∝ E[g1(f1(u1)) | u2].   (5)

Assuming that the vector (u1, u2) is normally distributed and the correlation corr(u1, u2) does not vanish, a straightforward calculation shows

  E[u2 | u1] ∝ u1,   E[u1 | u2] ∝ u2.

This means that g1 = f1^{-1} and g2 = f2^{-1} satisfy eq. (5), which then immediately implies our claim eq. (4). Fortunately, in our application the above assumptions are usually fulfilled, because mixed signals are more Gaussian and more correlated than unmixed signals. On the other hand, even if the assumptions are not perfectly met, experiments show that the ACE algorithm still equalizes the nonlinearities well.

Summarizing the key idea: by searching for nonlinear transformations that maximize the linear correlations between the non-linearly transformed observed variables, we can approximate the inverses of the post-nonlinearities.
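For completeness, the "straightforward calculation" invoked above is the standard conditional-mean formula for a bivariate normal vector; with zero means and correlation ρ it reads

```latex
\mathbb{E}[u_2 \mid u_1] = \rho\,\frac{\sigma_2}{\sigma_1}\,u_1 ,
\qquad
\mathbb{E}[u_1 \mid u_2] = \rho\,\frac{\sigma_1}{\sigma_2}\,u_2 ,
```

so both conditional expectations are linear in the conditioning variable and nonvanishing whenever ρ = corr(u1, u2) ≠ 0, which is exactly what the argument requires.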

2.3. Source separation

For a separation of the signals one could in principle apply any BSS technique capable of solving the now approximately linear problem. However, experiments show that only second-order methods which use temporal information are sufficiently robust to reliably recover the sources. Therefore we use TDSEP, an implementation based on the simultaneous diagonalization of several time-delayed correlation matrices, for the blind identification of the unmixing matrix W (cf. [3, 29, 28]).
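TDSEP itself diagonalizes several time-lagged correlation matrices simultaneously. As a minimal illustration of the underlying second-order principle, the following sketch whitens the signals and diagonalizes a single symmetrized time-lagged correlation matrix (an AMUSE-style special case; the names and the lag are our own illustrative assumptions, not the TDSEP code):

```python
import numpy as np

def temporal_decorrelation(x, tau=1):
    """Second-order separation from one time-lagged correlation matrix.
    x: (n_channels, n_samples) array of approximately linearly mixed signals."""
    x = x - x.mean(axis=1, keepdims=True)
    c0 = x @ x.T / x.shape[1]                  # zero-lag covariance
    d, e = np.linalg.eigh(c0)
    v = e @ np.diag(1.0 / np.sqrt(d)) @ e.T    # whitening matrix
    z = v @ x                                  # whitened signals
    c_tau = z[:, tau:] @ z[:, :-tau].T / (z.shape[1] - tau)
    c_tau = (c_tau + c_tau.T) / 2              # symmetrize
    _, u = np.linalg.eigh(c_tau)               # rotation diagonalizing C_tau
    w = u.T @ v                                # unmixing matrix estimate
    return w @ x, w
```

Sources are identifiable this way whenever their lag-tau autocorrelations differ; TDSEP's joint diagonalization over many lags makes the estimate robust to coinciding eigenvalues at a single lag.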

3. NUMERICAL SIMULATIONS

To demonstrate the performance of the proposed method we apply our algorithm to several post-nonlinear mixtures, both instantaneous and convolutive.

The first data set consists of Gaussian AR processes of the form

  (6)

where ν(t) is white Gaussian noise with mean zero and variance σ². For the experiment we generate 2000 data points.

We use a mixing matrix A to get linearly mixed signals u(t) = A s(t) and apply strong nonlinear distortions

  (7)

which were also used by Taleb and Jutten in [24]. The distribution of these mixed signals has a highly nonlinear structure, as visible in the scatter plot in Fig. 2.
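Since the exact coefficients of eq. (6) and the distortions of eq. (7) are not reproduced here, the following sketch builds an analogous two-channel PNL mixture; the AR(1) coefficients, the mixing matrix and the tanh/cubic distortions are our own illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
# Two Gaussian AR(1) sources with different temporal structure
# (illustrative coefficients).
s = np.zeros((2, n))
for t in range(1, n):
    s[:, t] = np.array([0.95, 0.5]) * s[:, t - 1] + rng.standard_normal(2)
a = np.array([[0.8, 0.6], [0.6, -0.8]])      # assumed mixing matrix
u = a @ s                                    # linear mixture
x = np.vstack([np.tanh(u[0]), u[1] ** 3])    # assumed componentwise PNL distortions
```

A scatter plot of x[0] against x[1] shows the same kind of curved, highly nonlinear structure as Fig. 2(a).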


Fig. 2: (a) Scatter plot of the mixed AR processes (x2 vs x1) and (b) waveforms of the original sources (top), the linearly unmixed signals (middle) and recovered sources (bottom).


Fig. 3: (a) Nonlinear functions f1 and f2. (b) True (thin line) and estimated (bold line) inverse functions g1 and g2.

The application of the ACE algorithm, using a local nearest-neighbor smoother (window length 31) for the conditional expectation, yields the estimated nonlinear functions g1 and g2 shown in Fig. 3. We see that the true inverses of the nonlinearities f1 and f2 are approximated well. Although the match is not perfect (it could be optimized by better smoothers), it is now possible to separate the signals using the TDSEP algorithm, where 20 time-delayed correlation matrices are simultaneously diagonalized. Figure 2 (b) shows that the waveforms of the recovered sources closely resemble the original ones, while the result of the linear unmixing of the PNL mixture clearly cannot recover the sources. This is also confirmed by comparing the output distributions that are shown in Fig. 4 as a scatter plot.

One favorable property of our method is its nice scaling behavior. To show this, we will now test the algorithm with


Fig. 4: Scatter plot of the output distributions of a linear ('+') and the proposed nonlinear ACE-TD algorithm ('.').

natural audio sources, where the input data set consists of 4 sound signals. For this case we apply the multivariate version of the ACE algorithm, which computes the optimal functions by maximizing a generalized correlation criterion. For details of the implementation we refer to [4, 11, 9]. As in the first experiment, these source signals were mixed by a linear model u(t) = A s(t), with a random (4 x 4) matrix A. After the linear mixing the following nonlinearities were applied:

  (8)

Figure 5 shows the results of the separation using ACE-TD (smoothing window length 51) and TDSEP. We again observe a very good separation performance, which is quantified by calculating the correlation coefficients (shown in Table 1) between the source signals and the extracted components. This is also confirmed by listening to the separated audio signals, where we perceive almost no crosstalk, although the noise level is slightly increased (cf. the silent parts of signal 2 in Fig. 5).

The third experiment gives an example for the application of our method to convolutive mixtures with a PNL distortion. We deliberately distorted real-room recordings¹ of speech and background music made by Lee [17] with nonlinear transfer functions as in our first example (cf. eq. (7)). For the separation we apply a convolutive BSS algorithm of Parra et al. that requires only second-order statistics by

¹ Available on the internet via http://sloan.salk.edu/~tewon/Blind/blind audio.html


Fig. 5: Four-channel audio data set: (a) waveforms of the original sources, (b) linearly unmixed signals with TDSEP and (c) recovered sources using ACE-TD.

exploiting the non-stationarity of the signals [23]. While an unmixing of the distorted recordings obviously fails, we could achieve a good separation after the unsupervised linearization by the ACE procedure (cf. Fig. 6).

4. DISCUSSION AND CONCLUSION

In this work we proposed a simple technique for the blind separation of linear mixtures with a post-nonlinear distortion. The main ingredients of our algorithm, which we call ACE-TD, are: first, a search for nonlinear transformations that maximize the linear correlations between transformed variables and thereby approximate the inverses of the PNLs. This search can be done highly efficiently by the ACE technique [4] from non-parametric statistics, which performs an alternating estimation of conditional expectations by smoothing of scatter plots. Effectively, this nonlinear modeling procedure solves the PNL mixture problem by transforming it back into a linear one. Therefore, second, a temporal decorrelation BSS algorithm (e.g. [3, 29]) can be applied.

TDSEP:
   0.10   0.56   0.31  -0.13
  -0.01   0.26   0.02   0.47
   0.06   0.12   0.76  -0.05
  -0.07   0.66  -0.21   0.11

ACE-TD:
   0.97  -0.01  -0.005  0.03
   0.03   0.94  -0.02  -0.005
   0.01   0.07   0.95  -0.007
   0.04   0.002  0.001  0.96

Table 1: Correlation coefficients for the signals shown in Fig. 5

Clearly, ACE is not limited to the n = 2 case but scales naturally to the n > 2 case, for which an algorithmic description can be found in [4, 9]. Moreover, the algorithm can make beneficial use of additional sensors in the overdetermined BSS case, as then the joint distribution of the mixtures becomes more and more Gaussian, which is beneficial for ACE. Furthermore, our method also works for convolutive mixtures, which is attractive for real-room BSS, where nonlinear transfer functions of the sensors (microphones) or amplifiers would impede a proper separation. Concluding, the proposed framework gives a simple, highly efficient algorithm with a solid theoretical background for signal separation in applications with a PNL distortion, which are of importance e.g. in real-world sensor technology.

Future research will be concerned with better tuning of the smoothers, which are essential in the ACE algorithm, to the PNL blind source separation scenario.

5. REFERENCES

[1] S.-I. Amari, A. Cichocki, and H. H. Yang. A new learning algorithm for blind source separation. In Advances in Neural Information Processing Systems 8, pages 757–763. MIT Press, 1996.

[2] A. J. Bell and T. J. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7:1129–1159, 1995.

[3] A. Belouchrani, K. Abed Meraim, J.-F. Cardoso, and E. Moulines. A blind source separation technique based on second order statistics. IEEE Trans. on Signal Processing, 45(2):434–444, 1997.

[4] L. Breiman and J. H. Friedman. Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association, 80(391):580–598, September 1985.


Fig. 6: Two-channel audio data set: (a) waveforms of the recorded (undistorted) microphone signals, (b) observed PNL distorted signals, (c) result of ACE, (d) recovered sources using ACE and a convolutive BSS algorithm and (e) for comparison, the convolutive BSS separation result for the undistorted signals from (a).

[5] J.-F. Cardoso and A. Souloumiac. Blind beamforming for non Gaussian signals. IEE Proceedings-F, 140(6):362–370, 1993.

[6] P. Comon. Independent component analysis—a new concept? Signal Processing, 36:287–314, 1994.

[7] G. Deco and D. Obradovic. Linear redundancy reduction learning. Neural Networks, 8(5):751–755, 1995.

[8] C. Fyfe and P. L. Lai. ICA using kernel canonical correlation analysis. In Proc. Int. Workshop on Independent Component Analysis and Blind Signal Separation (ICA2000), pages 279–284, Helsinki, Finland, 2000.

[9] W. Härdle. Applied Nonparametric Regression. Cambridge University Press, Cambridge, 1990.

[10] S. Harmeling, A. Ziehe, M. Kawanabe, B. Blankertz, and K.-R. Müller. Nonlinear blind source separation using kernel feature spaces. Submitted to ICA 2001.

[11] T. J. Hastie and R. J. Tibshirani. Generalized Additive Models, volume 43 of Monographs on Statistics and Applied Probability. Chapman & Hall, London, 1990.

[12] A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. Wiley, 2001.

[13] A. Hyvärinen and E. Oja. A fast fixed-point algorithm for independent component analysis. Neural Computation, 9(7):1483–1492, 1997.

[14] C. Jutten and J. Hérault. Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture. Signal Processing, 24:1–10, 1991.

[15] T.-W. Lee, B. U. Koehler, and R. Orglmeister. Blind source separation of nonlinear mixing models. In Neural Networks for Signal Processing VII, pages 406–415. IEEE Press, 1997.

[16] T.-W. Lee, B. U. Koehler, and R. Orglmeister. Blind source separation of nonlinear mixing models. IEEE International Workshop on Neural Networks for Signal Processing, pages 406–415, September 1997.

[17] T.-W. Lee, A. Ziehe, R. Orglmeister, and T. J. Sejnowski. Combining time-delayed decorrelation and ICA: Towards solving the cocktail party problem. In Proc. ICASSP98, volume 2, pages 1249–1252, Seattle, 1998.

[18] J. K. Lin, D. G. Grier, and J. D. Cowan. Faithful representation of separable distributions. Neural Computation, 9(6):1305–1320, 1997.

[19] G. Marques and L. Almeida. Separation of nonlinear mixtures using pattern repulsion. In Proc. Int. Workshop on Independent Component Analysis and Signal Separation (ICA'99), pages 277–282, Aussois, France, 1999.

[20] P. Pajunen, A. Hyvärinen, and J. Karhunen. Nonlinear blind source separation by self-organizing maps. In Proc. Int. Conf. on Neural Information Processing, pages 1207–1210, Hong Kong, 1996.

[21] P. Pajunen and J. Karhunen. A maximum likelihood approach to nonlinear blind source separation. In Proceedings of the 1997 Int. Conf. on Artificial Neural Networks (ICANN'97), pages 541–546, Lausanne, Switzerland, 1997.

[22] P. Pajunen and J. Karhunen, editors. Proc. of the 2nd Int. Workshop on Independent Component Analysis and Blind Signal Separation, Helsinki, Finland, June 19-22, 2000. Otamedia, 2000.

[23] L. Parra and C. Spence. Convolutive blind source separation of non-stationary sources. IEEE Trans. on Speech and Audio Processing, 8(3):320–327, May 2000. US Patent US6167417.

[24] A. Taleb and C. Jutten. Batch algorithm for source separation in post-nonlinear mixtures. In Proc. First Int. Workshop on Independent Component Analysis and Signal Separation (ICA'99), pages 155–160, Aussois, France, 1999.

[25] A. Taleb and C. Jutten. Source separation in post-nonlinear mixtures. IEEE Trans. on Signal Processing, 47(10):2807–2820, 1999.

[26] H. Valpola, X. Giannakopoulos, A. Honkela, and J. Karhunen. Nonlinear independent component analysis using ensemble learning: Experiments and discussion. In Proc. Int. Workshop on Independent Component Analysis and Blind Signal Separation (ICA2000), pages 351–356, Helsinki, Finland, 2000.

[27] H. H. Yang, S. Amari, and A. Cichocki. Information-theoretic approach to blind separation of sources in non-linear mixture. Signal Processing, 64(3):291–300, 1998.

[28] A. Yeredor. Blind separation of Gaussian sources via second-order statistics with asymptotically optimal weighting. IEEE Signal Processing Letters, 7(7):197–200, 2000.

[29] A. Ziehe and K.-R. Müller. TDSEP – an efficient algorithm for blind separation using time structure. In Proc. Int. Conf. on Artificial Neural Networks (ICANN'98), pages 675–680, Skövde, Sweden, 1998.
