SEPARATION OF POST-NONLINEAR MIXTURES USING ACE AND TEMPORAL DECORRELATION
Andreas Ziehe, Motoaki Kawanabe, Stefan Harmeling and Klaus-Robert Müller
GMD FIRST.IDA, Kekuléstr. 7, 12489 Berlin, Germany
University of Potsdam, Am Neuen Palais 10, 14469 Potsdam, Germany
To whom correspondence should be addressed. This work was partly supported by the EU under contract IST-1999-14190 – BLISS.
ABSTRACT

We propose an efficient method based on the concept of maximal correlation that reduces the post-nonlinear blind source separation problem (PNL BSS) to a linear BSS problem. For this we apply the Alternating Conditional Expectation (ACE) algorithm – a powerful technique from non-parametric statistics – to approximately invert the componentwise nonlinear functions. Interestingly, in the framework of the ACE method convergence can be proven, and in the PNL BSS scenario the optimal transformations found by ACE coincide with the desired inverse functions. After the nonlinearities have been removed by ACE, temporal decorrelation (TD) allows us to recover the source signals. An excellent performance underlines the validity of our approach and demonstrates the ACE-TD method on realistic examples.

1. INTRODUCTION
Blind source separation (BSS) research has mainly been
focused on variants of linear ICA and temporal decorrela-
tion methods (see e.g. [14, 6, 5, 7, 1, 2, 13, 29, 22, 12]).
Linear BSS assumes that at time $t$ each component $x_i(t)$ of the observed $n$-dimensional data vector $x(t)$ is a linear combination of $n$ statistically independent signals $s_j(t)$:
$$x_i(t) = \sum_{j=1}^{n} a_{ij}\, s_j(t), \qquad \text{i.e.}\ x(t) = A\, s(t).$$
The source signals $s(t)$ are unknown, as are the coefficients of the mixing matrix $A$. The goal is therefore to estimate both unknowns from the observed signals $x(t)$, i.e. a separating matrix $W$ and signals $y(t) = W x(t)$ that estimate $s(t)$.
However, nonlinearities that distort the mixed signals pose a challenging problem for "conventional" BSS methods, where the mixing model is linear instantaneous or convolutive. The general nonlinear mixing model is
$$x(t) = f\big(s(t)\big), \qquad (1)$$
where $f$ is an arbitrary nonlinear transformation (at least approximately invertible). An important special case is the class of post-nonlinear (PNL) mixtures
$$x_i(t) = f_i\Big(\sum_{j=1}^{n} a_{ij}\, s_j(t)\Big), \qquad (2)$$
where each $f_i$ is an invertible nonlinear function that operates componentwise and $A = (a_{ij})$ is a linear mixing matrix. Because this PNL model, which has been introduced by Taleb and Jutten [24, 25], is an important subclass with interesting properties, it has attracted the interest of several researchers [25, 15, 27]. Furthermore, it is often an adequate model of real-world physical systems in which nonlinear transfer functions appear; e.g. in the fields of telecommunications or biomedical data recording, sensors can have a nonlinear characteristic.
Fig. 1: Building blocks of the PNL mixing model and the separation scheme.
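To make the mixing model concrete, the following minimal Python sketch generates a $2 \times 2$ PNL mixture according to eq. (2). The AR(1) sources, the matrix $A$ and the distortions $\tanh(u)$ and $u^3 + u$ are illustrative assumptions, not the settings used in the experiments below.

import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Two independent, temporally correlated sources: simple AR(1) processes.
s = np.zeros((2, n))
eps = rng.standard_normal((2, n))
for t in range(1, n):
    s[:, t] = 0.9 * s[:, t - 1] + eps[:, t]

# Linear mixing stage u(t) = A s(t); A is an arbitrary example matrix.
A = np.array([[1.0, 0.6],
              [0.5, 1.0]])
u = A @ s

# Post-nonlinear stage x_i(t) = f_i(u_i(t)): invertible, componentwise
# distortions (illustrative choices only).
x = np.vstack([np.tanh(u[0]),
               u[1] ** 3 + u[1]])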
Algorithmic solutions of eq. (2) have used e.g. self-organizing maps [20, 18], extensions of GTM [21], neural networks [27, 19], parametric sigmoidal functions [15, 16] or ensemble learning [26] to approximate the nonlinearity $f$ (or its inverse). Also, kernel-based methods were tried on very simple toy signals [8] and more recently also on real-world data using temporal decorrelation in feature space [10]. Note that most existing methods are of high computational cost and, depending on the algorithm, are prone to run into local minima.
In our approach to the PNL BSS problem we first approximately invert the post-nonlinearity using the ACE algorithm (estimating the inverse functions $g_i$) and then apply a standard BSS technique [3, 29] that relies on temporal decorrelation (estimating the unmixing matrix $W$) (cf. Fig. 1). By virtue of the ACE framework, which is briefly introduced in subsection 2.2, we prove that the algorithm converges to the correct inverse nonlinearities – provided that they exist. Some implementation issues are discussed, and numerical simulations illustrating the method are described in section 3. Finally, a conclusion is given in section 4.
2. THE ACE-TD METHOD

For the sake of simplicity we introduce our method for the $2 \times 2$ case. The extension to the general $n \times n$ case is easily possible, but omitted for better readability.
2.1. Problem statement
Let us consider the two-dimensional post-nonlinear mixing model
$$x_1(t) = f_1\big(a_{11} s_1(t) + a_{12} s_2(t)\big), \qquad x_2(t) = f_2\big(a_{21} s_1(t) + a_{22} s_2(t)\big),$$
where $s_1$ and $s_2$ are independent source signals that are temporally correlated, $x_1$ and $x_2$ are the observed signals, $A = (a_{ij})$ is the mixing matrix, and $f_1$ and $f_2$ are the componentwise nonlinear transformations, which are invertible.
Obviously, any attempt to separate such a mixture by a linear BSS algorithm will fail, unless one can invert the functions $f_1$ and $f_2$ at least approximately. In this work we propose to achieve this by maximizing the correlation
$$\rho(g_1, g_2) = \mathrm{corr}\big(g_1(x_1),\, g_2(x_2)\big) \qquad (3)$$
with respect to nonlinear functions $g_1$ and $g_2$. This means we want to find transformations $g_1$ and $g_2$ of the observed signals such that the relationship between the transformed variables becomes linear. Intuitively speaking, the relationship is linear if the signals are aligned in a scatter plot, i.e. if they are maximally correlated. Under certain conditions, which we will state in detail later, this problem is solved by the ACE method, which finds so-called optimal transformations $g_1^*$ and $g_2^*$ that maximize eq. (3). One can prove existence and uniqueness of these optimal transformations, and it can be shown that the ACE algorithm, which is described in the following, converges to these solutions (cf. [4]).
2.2. ACE algorithm
The ACE algorithm is an iterative procedure for finding the optimal nonlinear functions $g_1$ and $g_2$. The starting point is the observation that for fixed $g_2$ the optimal $g_1$ is given by
$$g_1(x_1) = E\big[g_2(x_2) \mid x_1\big],$$
and conversely, for fixed $g_1$ the optimal $g_2$ is
$$g_2(x_2) = \frac{E\big[g_1(x_1) \mid x_2\big]}{\big\| E\big[g_1(x_1) \mid x_2\big] \big\|}.$$
The key idea of the ACE algorithm is therefore to compute alternately the respective conditional expectations. To avoid trivial solutions, one normalizes $g_2$ in each step using the function norm $\|g\| = \sqrt{E[g^2]}$. The algorithm for two variables is summarized below. It is also possible to extend the procedure to the multivariate case; for further details we refer to [11, 4].

Algorithm 1 The ACE algorithm for two variables
  initialize $g_2(x_2) := x_2 / \|x_2\|$
  repeat
    $g_1(x_1) := E[g_2(x_2) \mid x_1]$
    $g_2(x_2) := E[g_1(x_1) \mid x_2] \,/\, \|E[g_1(x_1) \mid x_2]\|$
  until $E\big[(g_1(x_1) - g_2(x_2))^2\big]$ fails to decrease
An important point in the implementation of this algorithm is the estimation of the conditional expectations from the data. Usually, the conditional expectations are computed by data smoothing, for which numerous techniques exist (cf. [4, 9]). Care has to be taken to balance the trade-off between fidelity to the data and smoothness of the estimated curve. Our implementation uses a nearest-neighbor smoother that applies a simple moving-average filter to appropriately sorted data.
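As an illustration, a minimal Python sketch of this procedure might look as follows; it is an assumption-laden rendering of Algorithm 1 with the sorted-data moving-average smoother, using a fixed iteration count instead of monitoring $E[(g_1 - g_2)^2]$.

import numpy as np

def smooth_conditional(x_cond, y, window=31):
    """Estimate E[y | x_cond] by nearest-neighbor smoothing: sort the
    samples by x_cond and run a moving-average filter over y."""
    order = np.argsort(x_cond)
    kernel = np.ones(window) / window
    smoothed = np.convolve(y[order], kernel, mode="same")
    estimate = np.empty_like(smoothed)
    estimate[order] = smoothed          # restore the original sample order
    return estimate

def ace(x1, x2, n_iter=50, window=31):
    """Two-variable ACE: alternate the conditional expectations and
    standardize g2 in each step to avoid the trivial zero solution."""
    g2 = (x2 - x2.mean()) / x2.std()    # initialization
    for _ in range(n_iter):
        g1 = smooth_conditional(x1, g2, window)   # g1(x1) := E[g2(x2)|x1]
        g2 = smooth_conditional(x2, g1, window)   # E[g1(x1)|x2] ...
        g2 = (g2 - g2.mean()) / g2.std()          # ... then normalize
    return g1, g2

Applied to the PNL mixture generated above, e.g. g1, g2 = ace(x[0], x[1]), the returned functions should approximate $f_1^{-1}$ and $f_2^{-1}$ up to scaling.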
By applying $g_1$ and $g_2$ to the mixed signals $x_1$ and $x_2$, we remove the effect of the nonlinear functions $f_1$ and $f_2$. In the following we substantiate this claim more formally. We show for
$$u_1 = a_{11} s_1 + a_{12} s_2 \quad\text{and}\quad u_2 = a_{21} s_1 + a_{22} s_2$$
that the transformations $g_1^*$ and $g_2^*$ obtained from the ACE procedure are the desired inverse functions for the case that $u_1$ and $u_2$ are jointly normally distributed; in other words, we prove the following:
$$g_1^*(x_1) \propto f_1^{-1}(x_1) \quad\text{and}\quad g_2^*(x_2) \propto f_2^{-1}(x_2). \qquad (4)$$
Almost all work for the proof has already been done in Proposition 5.4 and Theorem 5.3 of [4], which – by noticing that the correlation of two signals does not change if we scale one or both signals – implies that $g_1^*$ and $g_2^*$ are optimal transformations if
$$E\big[g_2^*(x_2) \mid x_1\big] \propto g_1^*(x_1) \quad\text{and}\quad E\big[g_1^*(x_1) \mid x_2\big] \propto g_2^*(x_2). \qquad (5)$$
Note that the conditional expectation $E[g_2^*(x_2) \mid x_1]$ is a function of $x_1$ and the expectation is taken with respect to $x_2$; analogously for the second expression.

Since $x_1 = f_1(u_1)$ and $x_2 = f_2(u_2)$, and furthermore $f_1^{-1}(x_1) = u_1$ and $f_2^{-1}(x_2) = u_2$, we get for the candidates $g_1 = f_1^{-1}$ and $g_2 = f_2^{-1}$:
$$E\big[f_2^{-1}(x_2) \mid x_1\big] = E\big[u_2 \mid f_1(u_1)\big].$$
Because $f_1$ and $f_2$ are invertible functions, they can be omitted in the condition of the conditional expectation, leading to
$$E\big[u_2 \mid f_1(u_1)\big] = E\big[u_2 \mid u_1\big].$$
Assuming that the vector $(u_1, u_2)$ is normally distributed and the correlation $\mathrm{corr}(u_1, u_2)$ does not vanish, a straightforward calculation shows
$$E\big[u_2 \mid u_1\big] = \frac{\mathrm{cov}(u_1, u_2)}{\mathrm{var}(u_1)}\, u_1 \;\propto\; f_1^{-1}(x_1).$$
This means that $f_1^{-1}$ and $f_2^{-1}$ satisfy eq. (5), which immediately implies our claim eq. (4). Fortunately, in our application the above assumptions are usually fulfilled, because mixed signals are more Gaussian and more correlated than unmixed signals. On the other hand, even if the assumptions are not perfectly met, experiments show that the ACE algorithm still equalizes the nonlinearities well.
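For reference, the "straightforward calculation" is the standard conditional mean of a bivariate Gaussian; with zero-mean $u_1$ and $u_2$ it reads

$$\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \sim \mathcal{N}\!\left(\mathbf{0},\; \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}\right) \;\Longrightarrow\; E\big[u_2 \mid u_1\big] = \frac{\sigma_{12}}{\sigma_1^2}\, u_1,$$

which is linear in $u_1$ and non-degenerate exactly when $\sigma_{12} = \mathrm{cov}(u_1, u_2) \neq 0$.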
Summarizing the key idea: by searching for nonlinear transformations that maximize the linear correlation between the nonlinearly transformed observed variables, we can approximate the inverses of the post-nonlinearities.
2.3. Source separation
For a separation of the signals one could in principle apply any BSS technique capable of solving the now approximately linear problem. However, experiments show that only second-order methods which use temporal information are sufficiently robust to reliably recover the sources. Therefore we use TDSEP, an implementation based on the simultaneous diagonalization of several time-delayed correlation matrices, for the blind identification of the unmixing matrix $W$ (cf. [3, 29, 28]).
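To illustrate the temporal-decorrelation step, the following Python sketch implements a simplified two-matrix variant (in the spirit of AMUSE, not the full TDSEP joint diagonalization over many lags): whiten the data, then rotate with the eigenvectors of one symmetrized time-lagged correlation matrix.

import numpy as np

def second_order_separation(x, lag=1):
    """AMUSE-style separation: whiten x, then diagonalize one
    symmetrized time-lagged correlation matrix. x has shape (n, T)."""
    x = x - x.mean(axis=1, keepdims=True)

    # Whitening transform from the zero-lag covariance.
    c0 = x @ x.T / x.shape[1]
    d, e = np.linalg.eigh(c0)
    whitener = e @ np.diag(1.0 / np.sqrt(d)) @ e.T
    z = whitener @ x

    # Symmetrized lagged correlation matrix of the whitened data.
    c_lag = z[:, lag:] @ z[:, :-lag].T / (z.shape[1] - lag)
    c_lag = 0.5 * (c_lag + c_lag.T)

    # Its eigenvectors give the rotation that completes the separation.
    _, rotation = np.linalg.eigh(c_lag)
    w = rotation.T @ whitener
    return w @ x, w

TDSEP gains robustness over this sketch by jointly diagonalizing correlation matrices for many time lags.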
3. NUMERICAL SIMULATIONS
To demonstrate the performance of the proposed method we
apply our algorithm to several post-nonlinear mixtures, both
instantaneous and convolutive.
The first data set consists of Gaussian AR processes
$$s_i(t) = \sum_{\tau=1}^{p} \alpha_\tau\, s_i(t - \tau) + \epsilon_i(t),$$
where $\epsilon_i(t)$ is white Gaussian noise with mean zero and variance $\sigma^2$. For the experiment we fix the model order and coefficients and generate 2000 data points.
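Such sources can be generated, for instance, as follows; the AR order and coefficient below are placeholders, since the paper's exact choices are not reproduced here.

import numpy as np

def gaussian_ar_source(n_samples, coeffs=(0.9,), sigma=1.0, seed=0):
    """Generate one Gaussian AR(p) process s(t) = sum_k a_k s(t-k) + eps(t)."""
    rng = np.random.default_rng(seed)
    p = len(coeffs)
    s = np.zeros(n_samples + p)
    for t in range(p, n_samples + p):
        # s[t-p:t][::-1] is (s[t-1], ..., s[t-p]), matching a_1, ..., a_p.
        s[t] = np.dot(coeffs, s[t - p:t][::-1]) + sigma * rng.standard_normal()
    return s[p:]

sources = np.vstack([gaussian_ar_source(2000, seed=k) for k in range(2)])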
We use a mixing matrix $A$ to get linearly mixed signals $u(t) = A\, s(t)$ and apply strong nonlinear distortions $f_1$ and $f_2$, which were also used by Taleb and Jutten in [25]. The distribution of these mixed signals has a highly nonlinear structure, as visible in the scatter plot in Fig. 2.
Fig. 2: (a) Scatter plot of the mixed AR processes ($x_1$ vs. $x_2$) and (b) waveforms of the original sources (top), the linearly unmixed signals (middle) and the recovered sources (bottom).
Fig. 3: (a) Nonlinear functions $f_1$ and $f_2$. (b) True (thin line) and estimated (bold line) inverse functions $g_1$ and $g_2$.
The application of the ACE algorithm – using a local nearest-neighbor smoother (window length 31) for the conditional expectation – yields the estimated nonlinear functions $g_1$ and $g_2$ shown in Fig. 3. We see that the true inverses of the nonlinearities $f_1$ and $f_2$ are approximated well. Although the match is not perfect (it could be improved by better smoothers), it is now possible to separate the signals using the TDSEP algorithm, where 20 time-delayed correlation matrices are simultaneously diagonalized (time lags $\tau = 1, \dots, 20$). Figure 2 (b) shows that the waveforms of the recovered sources closely resemble the original ones, while the result of the linear unmixing of the PNL mixture clearly does not recover the sources. This is also confirmed by comparing the output distributions, which are shown in Fig. 4 as a scatter plot.
One favorable property of our method is its nice scaling behavior. To show this, we now test the algorithm on natural audio sources; the input data set consists of four sound signals. For this case we apply the multivariate version of the ACE algorithm, which computes the optimal functions by maximizing the correlations among the transformed signals; for details of the implementation we refer to [4, 11, 9]. As in the first experiment, these source signals were mixed by a linear model $u(t) = A\, s(t)$, with a random ($4 \times 4$) matrix $A$. After the linear mixing, componentwise invertible nonlinearities were applied.

Fig. 4: Scatter plot of the output distribution of a linear (‘+’) and the proposed nonlinear ACE-TD algorithm (‘.’).
Figure 5 shows the results of the separation using ACE-TD (smoothing window length 51) and TDSEP. We observe again a very good separation performance, which is quantified by calculating the correlation coefficients (shown in Table 1) between the source signals and the extracted components. This is also confirmed by listening to the separated audio signals, where we perceive almost no crosstalk, although the noise level is slightly increased (cf. the silent parts of signal 2 in Fig. 5).
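The quantity reported in Table 1 is simply the matrix of correlation coefficients between the sources and the extracted components; a minimal sketch, assuming sources and estimates are arrays of shape (4, T):

import numpy as np

def source_estimate_correlations(sources, estimates):
    """Correlation coefficients between each source and each estimate.
    np.corrcoef stacks the rows, so the cross block is the upper-right."""
    n = sources.shape[0]
    return np.corrcoef(sources, estimates)[:n, n:]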
The third experiment gives an example of the application of our method to convolutive mixtures with a PNL distortion. We deliberately distorted real-room recordings of speech and background music made by Lee [17] with nonlinear transfer functions as in our first example.
For the separation we apply a convolutive BSS algorithm of Parra et al., which requires only second-order statistics and exploits the non-stationarity of the signals [23]. While an unmixing of the distorted recordings obviously fails, we achieved a good separation after the unsupervised linearization by the ACE procedure (cf. Fig. 6).

Fig. 5: Four-channel audio data set: (a) waveforms of the original sources, (b) linearly unmixed signals with TDSEP and (c) recovered sources using ACE-TD.
4. DISCUSSION AND CONCLUSION
In this work we proposed a simple technique for the blind separation of linear mixtures with a post-nonlinear distortion. The main ingredients of our algorithm, which we call ACE-TD, are: first, a search for nonlinear transformations that maximize the linear correlations between the transformed variables and thereby approximate the inverses of the PNLs. This search can be done highly efficiently by the ACE technique [4] from non-parametric statistics, which performs an alternating estimation of conditional expectations by smoothing of scatter plots. Effectively, this nonlinear modeling procedure solves the PNL mixture problem by transforming it back into a linear one. Therefore, second, a temporal decorrelation BSS algorithm (e.g. [3, 29]) can be applied.
              s1       s2       s3       s4
linear TDSEP:
  y1        0.10     0.56     0.31    -0.13
  y2       -0.01     0.26     0.02     0.47
  y3        0.06     0.12     0.76    -0.05
  y4       -0.07     0.66    -0.21     0.11
ACE-TD:
  y1        0.97    -0.01    -0.005    0.03
  y2        0.03     0.94    -0.02    -0.005
  y3        0.01     0.07     0.95    -0.007
  y4        0.04     0.002    0.001    0.96

Table 1: Correlation coefficients between the source signals $s_i$ and the extracted components $y_j$ for the signals shown in Fig. 5 (top: linear TDSEP on the distorted mixtures; bottom: ACE-TD).
Clearly, ACE is not limited to the $2 \times 2$ case but scales naturally to the $n \times n$ case, for which an algorithmic description can be found in [4, 9]. Moreover, the algorithm can make good use of additional sensors in the overdetermined BSS case, as the joint distribution of the mixtures then becomes more and more Gaussian, which is beneficial for ACE. Furthermore, our method also works for convolutive mixtures, which is attractive for real-room BSS, where nonlinear transfer functions of the sensors (microphones) or amplifiers would impede a proper separation. Concluding, the proposed framework gives a simple, highly efficient algorithm with a solid theoretical background for signal separation in applications with a PNL distortion, which are of importance e.g. in real-world sensor technology.
Future research will be concerned with a better tuning of the smoothers, which are essential in the ACE algorithm, to the PNL blind source separation scenario.

Fig. 6: Two-channel audio data set: (a) waveforms of the recorded (undistorted) microphone signals, (b) observed PNL-distorted signals, (c) result of ACE, (d) recovered sources using ACE and a convolutive BSS algorithm and (e) for comparison, the convolutive BSS separation result for the undistorted signals from (a).

5. REFERENCES
[1] S.-I. Amari, A. Cichocki, and H. H. Yang. A new learning algorithm for blind source separation. In Advances in Neural Information Processing Systems 8, pages 757–763. MIT Press, 1996.
[2] A. J. Bell and T. J. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7:1129–1159, 1995.
[3] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and E. Moulines. A blind source separation technique based on second-order statistics. IEEE Trans. on Signal Processing, 45(2):434–444, 1997.
[4] L. Breiman and J. H. Friedman. Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association, 80(391):580–598, 1985.
[5] J.-F. Cardoso and A. Souloumiac. Blind beamforming for non-Gaussian signals. IEE Proceedings-F, 140(6):362–370, 1993.
[6] P. Comon. Independent component analysis—a new concept? Signal Processing, 36:287–314, 1994.
[7] G. Deco and D. Obradovic. Linear redundancy reduction learning. Neural Networks, 8(5):751–755, 1995.
[8] C. Fyfe and P. L. Lai. ICA using kernel canonical correlation analysis. In Proc. Int. Workshop on Independent Component Analysis and Blind Signal Separation (ICA2000), pages 279–284, Helsinki, Finland, 2000.
[9] W. Härdle. Applied Nonparametric Regression. Cambridge University Press, Cambridge, 1990.
[10] S. Harmeling, A. Ziehe, M. Kawanabe, B. Blankertz, and K.-R. Müller. Nonlinear blind source separation using kernel feature spaces. Submitted to ICA 2001.
[11] T. J. Hastie and R. J. Tibshirani. Generalized Additive Models, volume 43 of Monographs on Statistics and Applied Probability. Chapman & Hall, London, 1990.
[12] A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. Wiley, 2001.
[13] A. Hyvärinen and E. Oja. A fast fixed-point algorithm for independent component analysis. Neural Computation, 9(7):1483–1492, 1997.
[14] C. Jutten and J. Hérault. Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture. Signal Processing, 24:1–10, 1991.
[15] T.-W. Lee, B. U. Koehler, and R. Orglmeister. Blind source separation of nonlinear mixing models. In Neural Networks for Signal Processing VII, pages 406–415. IEEE Press, 1997.
[16] T.-W. Lee, B. U. Koehler, and R. Orglmeister. Blind source separation of nonlinear mixing models. IEEE International Workshop on Neural Networks for Signal Processing, pages 406–415, September 1997.
[17] T.-W. Lee, A. Ziehe, R. Orglmeister, and T. J. Sejnowski. Combining time-delayed decorrelation and ICA: Towards solving the cocktail party problem. In Proc. ICASSP'98, volume 2, pages 1249–1252, Seattle, 1998.
[18] J. K. Lin, D. G. Grier, and J. D. Cowan. Faithful representation of separable distributions. Neural Computation, 9(6):1305–1320, 1997.
[19] G. Marques and L. Almeida. Separation of nonlinear mixtures using pattern repulsion. In Proc. Int. Workshop on Independent Component Analysis and Signal Separation (ICA'99), pages 277–282, Aussois, France, 1999.
[20] P. Pajunen, A. Hyvärinen, and J. Karhunen. Nonlinear blind source separation by self-organizing maps. In Proc. Int. Conf. on Neural Information Processing, pages 1207–1210, Hong Kong, 1996.
[21] P. Pajunen and J. Karhunen. A maximum likelihood approach to nonlinear blind source separation. In Proceedings of the 1997 Int. Conf. on Artificial Neural Networks (ICANN'97), pages 541–546, Lausanne, 1997.
[22] P. Pajunen and J. Karhunen, editors. Proc. of the 2nd Int. Workshop on Independent Component Analysis and Blind Signal Separation, Helsinki, Finland, June 19–22, 2000. Otamedia, 2000.
[23] L. Parra and C. Spence. Convolutive blind source separation of non-stationary sources. IEEE Trans. on Speech and Audio Processing, 8(3):320–327, May 2000. US Patent US6167417.
[24] A. Taleb and C. Jutten. Batch algorithm for source separation in post-nonlinear mixtures. In Proc. First Int. Workshop on Independent Component Analysis and Signal Separation (ICA'99), pages 155–160, Aussois, France, 1999.
[25] A. Taleb and C. Jutten. Source separation in post-nonlinear mixtures. IEEE Trans. on Signal Processing, 47(10):2807–2820, 1999.
[26] H. Valpola, X. Giannakopoulos, A. Honkela, and J. Karhunen. Nonlinear independent component analysis using ensemble learning: Experiments and discussion. In Proc. Int. Workshop on Independent Component Analysis and Blind Signal Separation (ICA2000), pages 351–356, Helsinki, Finland, 2000.
[27] H. H. Yang, S. Amari, and A. Cichocki. Information-theoretic approach to blind separation of sources in non-linear mixture. Signal Processing, 64(3):291–300, 1998.
[28] A. Yeredor. Blind separation of Gaussian sources via second-order statistics with asymptotically optimal weighting. IEEE Signal Processing Letters, 7(7):197–200, 2000.
[29] A. Ziehe and K.-R. Müller. TDSEP – an efficient algorithm for blind separation using time structure. In Proc. Int. Conf. on Artificial Neural Networks (ICANN'98), pages 675–680, Skövde, Sweden, 1998.