IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 10, OCTOBER 2010
Independent Component Analysis by
Entropy Bound Minimization
Xi-Lin Li and Tülay Adalı, Fellow, IEEE
Abstract—A novel (differential) entropy estimator is introduced
where the maximum entropy bound is used to approximate the en-
tropy given the observations, and is computed using a numerical
procedure thus resulting in accurate estimates for the entropy. We
show that such an estimator exists for a wide class of measuring
functions, and provide a number of design examples to demon-
strate its flexible nature. We then derive a novel independent com-
ponent analysis (ICA) algorithm that uses the entropy estimate
thus obtained, ICA by entropy bound minimization (ICA-EBM).
The algorithm adopts a line search procedure, and initially uses
updates that constrain the demixing matrix to be orthogonal for
robust performance. We demonstrate the superior performance of
ICA-EBM and its ability to match sources that come from a wide
range of distributions using simulated and real-world data.
Index Terms—Blind source separation (BSS), differential entropy, independent component analysis (ICA), principle of maximum entropy.

Manuscript received June 22, 2009; accepted June 22, 2010. Date of publication July 01, 2010; date of current version September 15, 2010. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Konstantinos I. Diamantaras. This work was supported by the NSF Grants NSF-CCF 0635129 and NSF-IIS 0612076.
The authors are with the Department of CSEE, University of Maryland — Baltimore County, Baltimore, MD 21250 USA (e-mail: email@example.com).
Color versions of one or more of the figures in this paper are available online.
Digital Object Identifier 10.1109/TSP.2010.2055859
1Since discrete-valued variables are not considered in this paper, we refer to differential entropy simply as entropy throughout.

I. INTRODUCTION

INDEPENDENT component analysis (ICA) has been one of the most attractive solutions for the blind source separation (BSS) problem. BSS algorithms can exploit non-Gaussianity, nonstationarity, or sample correlation. The natural cost for exploiting non-Gaussianity that leads to ICA is the mutual information among the separated components, which can be shown to be equivalent to maximum likelihood estimation, and to negentropy maximization when we constrain the demixing matrix to be orthogonal. In these approaches, we either estimate a parametric density model along with the demixing matrix, maximize the information transferred through a network of nonlinear units, or estimate/approximate the entropy directly.

In this paper, we first introduce a novel (differential) entropy1 estimator that approximates the entropy of a random variable given the observations by using the maximum entropy bound that is compatible with finite measurements. In this way, the maximum entropy density matching can be "consistent to the largest extent with the available data and least committed with respect to unseen data". Thus we neither use an approximation of the entropy nor rely on the calculation of higher-order moments, which are known to be sensitive to outliers. Another key difference is that we calculate several maximum entropy bounds and use the tightest one as the final entropy estimate. We show that this entropy estimator is a very desirable tool for performing ICA and introduce an ICA algorithm, ICA by entropy bound minimization (ICA-EBM), that uses the tightest maximum entropy bound. Because the entropy bound estimator is quite flexible and can approximate the entropies of a wide range of distributions, it can be used to perform ICA for sources that come from distributions that are sub- or super-Gaussian, unimodal or multimodal, symmetric or skewed, by using only a small class of nonlinear functions.
Natural (relative) gradient descent updates, Givens rotations, (quasi-)Newton algorithms, and steepest descent on the Stiefel manifold are commonly
used approaches for optimizing the selected cost function for
ICA. In ICA-EBM, we use a line search procedure and initially
constrain the demixing matrix to be orthogonal for better con-
vergence behavior. We demonstrate the superior performance
of ICA-EBM with respect to a class of competing algorithms
using simulations and discuss its properties. We introduced the
entropy estimator using the tightest bound in preliminary work and demon-
strated its application to ICA. In this paper, we provide a complete treatment of the entropy estimator, including its implementation and a proof for the existence of a solution with a general
class of measuring functions as well as derivation of the ICA
algorithm and its fast implementation. We also present compre-
hensive simulation results to study its performance.
The remainder of this paper is organized as follows. In
Section II, we provide background for ICA and our approach.
The novel entropy estimator is introduced in Section III. A
numerical design method and examples of this entropy es-
timator are presented in Section IV. In Section V, the new
ICA algorithm, ICA-EBM, is presented. To demonstrate the
effectiveness of ICA-EBM, a number of simulation experi-
ments are presented in Section VI, and conclusions are given
in Section VII.
II. BACKGROUND

Let $N$ statistically independent, zero-mean sources $\mathbf{s}(t)=[s_1(t),\ldots,s_N(t)]^T$ be mixed through an $N\times N$ nonsingular mixing matrix $\mathbf{A}$ so that we obtain the mixtures as $\mathbf{x}(t)=\mathbf{A}\mathbf{s}(t)$, where superscript $T$ denotes the transpose, and $t$ is the discrete time index. The mixtures are separated by forming $\mathbf{y}(t)=\mathbf{W}\mathbf{x}(t)$, where $\mathbf{W}$ is the separation or demixing matrix. A natural cost for achieving the separation of these $N$ independent sources is the mutual information among the separated sources

$$J(\mathbf{W})=\sum_{n=1}^{N}H(y_n)-\log|\det(\mathbf{W})|-H(\mathbf{x})\qquad(1)$$

where $H(y_n)$ is the entropy of the $n$th separated source, and $H(\mathbf{x})$, the entropy of observations, is a constant with respect to $\mathbf{W}$. Thus this cost function assumes the same form as the maximum likelihood cost. In the subsequent discussions, the time index $t$ is suppressed for simplicity.
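To make the role of (1) concrete, the following is a minimal numerical sketch of the cost with the constant $H(\mathbf{x})$ dropped. The m-spacing (Vasicek-type) entropy estimator and all function names here are our own illustrative choices, not the estimator developed in this paper:

```python
import numpy as np

def spacing_entropy(y, m=None):
    """Simple m-spacing (Vasicek-type) estimate of differential entropy."""
    y = np.sort(np.asarray(y, dtype=float))
    T = y.size
    if m is None:
        m = max(1, int(np.sqrt(T)))
    gaps = np.maximum(y[m:] - y[:-m], 1e-12)   # order-statistic spacings
    return float(np.mean(np.log(T * gaps / (2.0 * m))))

def ica_cost(W, X):
    """J(W) = sum_n H(y_n) - log|det W|, with the constant H(x) dropped."""
    Y = W @ X
    H_sum = sum(spacing_entropy(y) for y in Y)
    return H_sum - np.log(abs(np.linalg.det(W)))
```

Because any nontrivial rotation of independent non-Gaussian sources increases the summed marginal entropies while leaving $\log|\det\mathbf{W}|$ unchanged, this cost is minimized at the separating solution.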
When the demixing matrix is constrained to be an orthogonal matrix, since $\log|\det(\mathbf{W})|=0$ for an orthogonal matrix, the orthogonal ICA algorithms minimize the cost function

$$J_o(\mathbf{W})=\sum_{n=1}^{N}H(y_n)+C\qquad(2)$$

where $C$ is a constant, under the orthogonality constraint $\mathbf{W}\mathbf{W}^T=\mathbf{I}$, where $\mathbf{I}$ is the identity matrix. Even though it is commonly used, the orthogonality constraint can limit the final separation performance.
As observed in (1) and (2), estimation of the entropy or its approximation plays a key role in the development of ICA algorithms. Commonly used entropy estimators for ICA include the Edgeworth expansion approximation and estimators based on the principle of maximum entropy. Nonparametric entropy estimation is recognized to be practically difficult and computationally demanding. The Edgeworth expansion and related estimators lead to the use of higher-order moments or cumulants, which have large estimation variances and are highly sensitive to outliers. Other estimators use an approximation of the expansion by assuming that the true density of the source is close to the Gaussian density with the same mean and variance; they may thus be inaccurate when the true density of the source is far from Gaussian. Another approach to the minimization of (1) and (2) is to use density matching through a parametric model estimated along with the demixing matrix. These ICA algorithms may have poor performance if the assumed distributions are far from the true ones, or may be overly complicated due to the use of complex density models.
For the ICA algorithm we introduce in this paper, ICA-EBM, entropy is estimated by bounding the entropy of the source estimates using numerical computation. By using a few simple measuring functions, a tight entropy bound can be determined for sources that come from a wide range of distributions: those that have sub- or super-Gaussian, unimodal or multimodal, symmetric or skewed probability density functions (pdfs), where we define sub- and super-Gaussianity with respect to the normalized kurtosis.
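As a quick, self-contained illustration of this convention (the function name is ours): the normalized (excess) kurtosis is positive for super-Gaussian densities such as the Laplacian, negative for sub-Gaussian ones such as the uniform, and zero for the Gaussian:

```python
import numpy as np

def normalized_kurtosis(s):
    """Excess kurtosis E{s^4}/E{s^2}^2 - 3 of a sample (zero for Gaussian)."""
    s = np.asarray(s, dtype=float)
    s = s - s.mean()
    return float(np.mean(s**4) / np.mean(s**2) ** 2 - 3.0)

rng = np.random.default_rng(0)
print(normalized_kurtosis(rng.laplace(size=100_000)))         # positive: super-Gaussian
print(normalized_kurtosis(rng.uniform(-1, 1, size=100_000)))  # negative: sub-Gaussian
```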
Natural (relative) gradient descent updates are commonly used to minimize the cost function given in (1). When $\mathbf{W}$ is constrained to be orthogonal as in (2), Givens rotations and steepest descent on the Stiefel manifold are commonly used to estimate $\mathbf{W}$. Since pre-whitening is a standard preprocessing procedure for many ICA algorithms and can simplify the discussion, we always assume that the mixtures have been pre-whitened, i.e., $E\{\mathbf{x}\mathbf{x}^T\}=\mathbf{I}$, where $\mathbf{I}$ is the identity matrix. We do not, however, constrain the demixing matrix to be orthogonal in ICA-EBM. Next, we first present the new entropy estimator.
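Pre-whitening can be sketched as follows (an illustrative eigendecomposition-based implementation; the function name is ours):

```python
import numpy as np

def prewhiten(X):
    """Center the mixtures and whiten them so that E{x x^T} = I.

    X has one row per mixture channel and one column per sample.
    Returns the whitened data and the whitening matrix V.
    """
    X = X - X.mean(axis=1, keepdims=True)
    C = np.cov(X)                                # sample covariance
    d, E = np.linalg.eigh(C)                     # C = E diag(d) E^T
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T      # symmetric whitening matrix
    return V @ X, V
```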
III. THE ENTROPY ESTIMATOR

Rather than directly trying to estimate the entropy $H(x)$ given $T$ independent samples $x(1),\ldots,x(T)$, we determine an upper bound for $H(x)$, which, as we show next, provides a more practical and effective approach for approximating $H(x)$. Here $G(x)$ is a measuring function, and $E\{G(x)\}$ denotes the expectation of $G(x)$ evaluated over the observed samples. An upper bound of $H(x)$ can be accurately determined by solving for the maximum entropy distribution that maximizes the entropy and, at the same time, is compatible with the constraint $E\{G(x)\}=\mu$; in practice, $\mu$ can be estimated as the sample average $\hat{\mu}=\frac{1}{T}\sum_{t=1}^{T}G(x(t))$ according to the mean ergodic theorem. In this way, we can obtain a set of entropy bounds using different measuring functions. It is clear that the tightest entropy bound is the closest one to the true entropy of the source, and can be used as the entropy estimate of the source. Although this entropy estimator can only provide an upper bound of the entropy in general, it is useful for ICA since the entropy or the source distributions do not need to be estimated with great precision in ICA for reliable performance. Furthermore, the entropy estimator we introduce is quite flexible. As we demonstrate, with a few measuring functions, entropy bounds for sources from a wide range of distributions can be obtained.
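As a toy illustration of the bounding idea (our own sketch, not the estimator developed below): when only the mean and variance are measured, the maximum entropy distribution is Gaussian, so $\frac{1}{2}\log(2\pi e\sigma^2)$ upper-bounds the entropy of any density with those moments; for a unit-variance Laplacian the bound exceeds the true entropy $1+\log\sqrt{2}\approx 1.347$ nats:

```python
import numpy as np

rng = np.random.default_rng(0)
# Zero-mean, unit-variance Laplacian sample
x = rng.laplace(scale=1 / np.sqrt(2), size=200_000)

# Measuring only E{x} and E{x^2}: the max-entropy density is Gaussian,
# so its entropy is an upper bound on H(x).
bound = 0.5 * np.log(2 * np.pi * np.e * x.var())
true_entropy = 1 + np.log(np.sqrt(2))  # exact entropy of this Laplacian (nats)
print(bound, true_entropy)             # the bound exceeds the true entropy
```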
A. The Maximum Entropy Distribution
Given the normalized variable $x$, we have only a finite number of samples. Hence, we can only estimate a finite number of expectations, and for simplicity of discussion, we always assume that $x$ has zero mean and unit variance in the rest of this paper.

Suppose that the expectation $E\{G(x)\}=\mu$ is evaluated over the observed samples, and we have the constraint $\int G(x)p(x)\,dx=\mu$ and the normalization condition $\int p(x)\,dx=1$. According to the principle of maximum entropy, we may assume that the samples are drawn from the distribution $p(x)$ which maximizes the entropy subject to these constraints. Thus we have the following entropy maximization problem:

$$\max_{p(\cdot)}\; -\int p(x)\log p(x)\,dx \quad \text{s.t.}\quad \int p(x)\,dx=1,\;\; \int x\,p(x)\,dx=0,\;\; \int x^2 p(x)\,dx=1,\;\; \int G(x)\,p(x)\,dx=\mu.$$
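This maximization can be carried out numerically. The following grid-based sketch (our own illustration, not the paper's implementation) uses the fact that the maximum entropy equals the minimum of the convex dual $\log Z(\boldsymbol{\lambda})-\boldsymbol{\lambda}^T\boldsymbol{\mu}$ over the multipliers of the exponential-family solution $p(x)\propto\exp\big(\sum_k \lambda_k f_k(x)\big)$:

```python
import numpy as np
from scipy.optimize import minimize

def maxent_entropy_bound(constraints, lo=-10.0, hi=10.0, n=4001):
    """Entropy of the max-entropy density matching E{f_k(x)} = mu_k.

    constraints: list of (f_k, mu_k) pairs; normalization is implicit.
    The solution has the form p(x) = exp(sum_k lam_k f_k(x)) / Z, and
    max entropy = min over lam of log Z(lam) - sum_k lam_k mu_k.
    """
    x = np.linspace(lo, hi, n)
    dx = x[1] - x[0]
    F = np.vstack([f(x) for f, _ in constraints])   # K x n feature matrix
    mu = np.array([m for _, m in constraints])

    def dual(lam):
        u = lam @ F
        u_max = u.max()                             # for numerical stability
        log_Z = u_max + np.log(np.exp(u - u_max).sum() * dx)
        return log_Z - lam @ mu

    res = minimize(dual, np.zeros(len(mu)), method="BFGS")
    return res.fun                                  # entropy in nats

# Zero mean and unit variance only: recovers the Gaussian entropy 0.5*log(2*pi*e)
H = maxent_entropy_bound([(lambda x: x, 0.0), (lambda x: x**2, 1.0)])
```

Adding a further measuring function, e.g. `(np.abs, 2**-0.5)`, tightens the bound toward the entropy of a unit-variance Laplacian, illustrating how different measuring functions yield different (and possibly tighter) bounds.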
Fig. 8. An example of the typical behavior of the two functions used to demonstrate the existence and uniqueness of the solution in (22) for bounded measuring functions.
for the inverse of the updated matrix by using the matrix inversion lemma. Thus all the required inverses can be recursively calculated, each based on the previous result. The inverse of the modified matrix can then be quickly calculated from the previous inverse. Although the modification itself is not a matrix of low rank, it becomes a sparse matrix of rank 2 for a proper exchanging matrix that permutes the rows. Thus we can obtain the inverse by performing a similar rank-2 modification on the previous result, as in (25).
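The rank-2 update referred to above can be illustrated with the Woodbury form of the matrix inversion lemma (a generic sketch with our own names, not the paper's exact recursion):

```python
import numpy as np

def rank2_inverse_update(A_inv, U, V):
    """Inverse of (A + U @ V.T) from A's inverse via the matrix
    inversion lemma (Woodbury). U and V are n x 2, so only a 2 x 2
    system is inverted instead of recomputing a full n x n inverse."""
    K = np.linalg.inv(np.eye(2) + V.T @ A_inv @ U)
    return A_inv - A_inv @ U @ K @ V.T @ A_inv
```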
 P. Comon, “Independent component analysis: A new concept?,” Signal
Process., vol. 36, no. 3, pp. 287–314, 1994.
 A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Anal-
ysis. New York: Wiley, 2001.
 A. Cichocki and S. Amari, Adaptive Blind Signal and Image Pro-
cessing: Learning Algorithms and Applications.
 A. Hyvärinen and E. Oja, “Independent component analysis: Algo-
rithms and applications,” Neural Netw., vol. 13, no. 4–5, pp. 411–430, 2000.
 J. F. Cardoso and A. Souloumiac, “Blind beamforming for
non-Gaussian signals,” IEE Proc. F, vol. 140, no. 6, pp. 362–370, 1993.
 J. Karvanen, J. Eriksson, and V. Koivunen, “Pearson system based
method for blind separation,” in Proc. 2nd Int. Workshop on Independ.
Compon. Anal. Blind Signal Separation, Helsinki, Finland, 2000, pp.
 Z. Koldovský, P. Tichavský, and E. Oja, “Efficient variant of algorithm
FastICA for independent component analysis attaining the Cramér–Rao
lower bound,” IEEE Trans. Neural Netw., vol. 17, no. 5, pp. 1265–1277, 2006.
 P. Tichavský, Z. Koldovský, and E. Oja, “Speed and accuracy enhancement
of linear ICA techniques using rational nonlinear functions,” in
Proc. ICA2007, 2007, pp. 285–292.
 D. T. Pham and P. Garat, “Blind separation of mixture of independent
sources through a quasi-maximum likelihood approach,” IEEE Trans.
Signal Process., vol. 45, no. 7, pp. 1712–1725, 1997.
 J. A. Palmer, S. Makeig, K. Kreutz-Delgado, and B. D. Rao, “Newton
method for the ICA mixture model,” in Proc. IEEE Int. Conf. Acoust.,
Speech, Signal Process. (ICASSP), Las Vegas, NV, Apr. 2008, pp.
 A. Bell and T. Sejnowski, “An information-maximization approach to
blind separation and blind deconvolution,” Neural Computat., vol. 7,
pp. 1129–1159, 1995.
 T.-W. Lee, M. Girolami, and T. J. Sejnowski, “Independent component
analysis using an extended infomax algorithm for mixed sub-Gaussian
and super-Gaussian sources,” Neural Computat., vol. 11, no. 2, pp.
417–441, 1999.
 A. Hyvärinen, “New approximations of differential entropy for in-
dependent component analysis and projection pursuit,” in Advances
in Neural Information Processing Systems. Cambridge, MA: MIT
Press, 1998, vol. 10, pp. 273–279.
 D. Erdogmus, K. E. Hild, II, Y. N. Rao, and J. C. Principe, “Min-
imax mutual information approach for independent component anal-
ysis,” Neural Computat., vol. 16, no. 6, pp. 1235–1252, 2004.
 R. Boscolo, H. Pan, and V. P. Roychowdhury, “Independent compo-
nent analysis based on nonparametric density estimation,”IEEE Trans.
Neural Netw., vol. 15, no. 1, pp. 55–65, 2004.
 E. G. Learned-Miller et al., “ICA using spacings estimates of entropy,”
J. Mach. Learn. Res., vol. 4, pp. 1271–1295, 2003.
 D.-T. Pham and J. F. Cardoso, “Blind separation of instantaneous mix-
tures of nonstationary sources,” IEEE Trans. Signal Process., vol. 49,
no. 9, pp. 1837–1848, 2001.
 A. Belouchrani, K. A. Meraim, J. F. Cardoso, and E. Moulines, “A
blind source separation technique based on second order statistics,”
IEEE Trans. Signal Process., vol. 45, no. 2, pp. 434–444, 1997.
 A. Yeredor, “Blind separation of Gaussian sources via second-order
statistics with asymptotically optimal weighting,” IEEE Signal
Process. Lett., vol. 7, pp. 197–200, 2000.
 B. W. Silverman, Density Estimation for Statistics and Data Anal-
ysis. London, U.K.: Chapman and Hall, 1986.
 J. Beirlant, E. Dudewicz, L. Gyorfi, and E. van der Meulen, “Nonpara-
metric entropy estimation: An overview,” Int. J. Math. Statist. Sci., vol.
6, pp. 17–39, 1997.
 S. Fiori, “A theory for learning by weight flow on Stiefel-Grassman
manifold,” Neural Comput., vol. 13, no. 7, pp. 1625–1647, Jul. 2001.
 E. T. Jaynes, “Information theory and statistical mechanics,” Phys.
Rev., vol. 106, pp. 620–630, 1957.
 Unsupervised Adaptive Filtering, Volume 1, Blind Source Separation,
S. Haykin, Ed. New York: Wiley, 2000.
 M. E. John, “On the existence of a class of maximum-entropy prob-
ability density functions,” IEEE Trans. Inf. Theory, vol. IT-23, pp.
772–775, Nov. 1977.
 P. Ishwar and P. Moulin, “On the existence and characterization of
the maxent distribution under general moment inequality constraints,”
IEEE Trans. Inf. Theory, vol. 51, pp. 3322–3333, Sep. 2005.
 J. F. Cardoso, “On the performance of orthogonal source separation
algorithms,” in Proc. Eur. Assoc. Signal Process. Signal Process. VII,
’94, Edinburgh, Scotland, 1994, pp. 776–779.
 J. F. Cardoso, “On the stability of source separation algorithms,” J.
VLSI Signal Process. Syst., vol. 26, no. 1–2, pp. 7–14, 2000.
 O. Shalvi and E. Weinstein, “Super-exponential method for blind
deconvolution,” IEEE Trans. Inf. Theory, vol. 39, pp. 504–519, Mar. 1993.
 X.-L. Li, “A new gradient search interpretation of super-exponential
algorithm,” IEEE Signal Process. Lett., vol. 13, no. 3, pp. 173–176, 2006.
 V. Zarzoso and P. Comon, “Comparative speed analysis of FastICA,” in
Proc. ICA’07, 2007, pp. 293–300.
 X.-L. Li and T. Adalı, “A novel entropy estimator and its application
to ICA,” in Proc. IEEE Workshop on Mach. Learn. Signal Process.,
Grenoble, France, Sep. 2009.
 T. Adalı, H. Li, M. Novey, and J. F. Cardoso, “Complex ICA using
nonlinear functions,” IEEE Trans. Signal Process., vol. 56, no. 9, pp.
4536–4544, 2008.
 S. Amari, A. Cichocki, and H. H. Yang, “A new learning algorithm for
blind signal separation,” in Advances in Neural Information Processing
Systems 1995. Boston, MA: MIT Press, 1996, pp. 752–763.
 M. Jones and R. Sibson, “What is projection pursuit?,” J. Royal Statist.
Soc. A, vol. 150, no. 1, pp. 1–36, 1987.
 H. Li and T. Adalı, “Stability analysis of complex maximum likeli-
hood ICA using Wirtinger calculus,” in Proc. IEEE Int. Conf. Acoust.,
Speech, Signal Process. (ICASSP), Las Vegas, NV, Apr. 2008.
 X.-L. Li and X.-D. Zhang, “Nonorthogonal joint diagonalization free
of degenerate solution,” IEEE Trans. Signal Process., vol. 55, no. 5, 2007.
 H. Lütkepohl, Handbook of Matrices. New York: Wiley, 1996.
 A. Cichocki et al., ICALAB Toolboxes [Online]. Available:
 H. Peng, F. Long, and C. Ding, “Feature selection based on mutual
information: Criteria of max-dependency, max-relevance, and min-redundancy,”
IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 8, pp. 1226–1238, 2005.
 J. W. Xu, D. Erdogmus, Y. N. Rao, and J. C. Principe, “Minimax mu-
tual information approach for ICA of complex-valued linear mixtures,”
in Proc. ICA’04, Granada, Spain, Sep. 2004, pp. 311–318.
Xi-Lin Li received the B.S. and M.S. degrees in
electrical engineering from the Dalian University of
Technology, Dalian, China, in 2001 and 2004, respec-
tively, and the Ph.D. degree in electrical en-
gineering from Tsinghua University in 2008.
From 2008 to 2009, he was a researcher with
ForteMedia, Inc. Since 2009, he has been a Research
Associate with the Machine Learning for Signal
Processing Lab, University of Maryland, Baltimore
County. His research interests include speech signal
processing, blind source separation, and complex
valued signal processing.
Tülay Adalı (S’89–M’93–SM’98–F’09) received
the Ph.D. degree in electrical engineering from
North Carolina State University, Raleigh, in 1992.
She joined the faculty of the University of Mary-
land, Baltimore County (UMBC), Baltimore, in 1992.
She is currently a Professor with the Department
of Computer Science and Electrical Engineering,
UMBC. Her research interests are in the areas of
statistical signal processing, machine learning for
signal processing, and biomedical data analysis.
Dr. Adalı was the General Co-Chair, NNSP
(2001–2003); Technical Chair, MLSP (2004–2008); Publicity Chair, ICASSP
(2000 and 2005); Publications Co-Chair, ICASSP 2008; and Program
Co-Chair, 2009 International Conference on Independent Component Analysis
and Source Separation, 2009 MLSP. She chaired the IEEE SPS Machine
Learning for Signal Processing Technical Committee (2003–2005); Member,
SPS Conference Board (1998–2006); Member, Bio Imaging and Signal
Processing Technical Committee (2004–2007); and was an Associate Editor
for the IEEE TRANSACTIONS ON SIGNAL PROCESSING (2003–2006), and the
Elsevier Signal Processing Journal (2007–2010). She is currently Chair of
Technical Committee 14: Signal Analysis for Machine Intelligence of the
International Association for Pattern Recognition; Member, Machine Learning
for Signal Processing and Signal Processing Theory and Methods technical
committees; an Associate Editor for the IEEE TRANSACTIONS ON BIOMEDICAL
ENGINEERING and JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL,
IMAGE, AND VIDEO TECHNOLOGY, and Senior Editorial Board member of the
IEEE JOURNAL OF SELECTED AREAS IN SIGNAL PROCESSING. She is a Fellow
of the AIMBE and the past recipient of an NSF CAREER Award.