IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 10, OCTOBER 2010

Independent Component Analysis by

Entropy Bound Minimization

Xi-Lin Li and Tülay Adalı, Fellow, IEEE

Abstract—A novel (differential) entropy estimator is introduced in which the maximum entropy bound is used to approximate the entropy given the observations, and is computed using a numerical procedure, thus resulting in accurate estimates for the entropy. We

show that such an estimator exists for a wide class of measuring

functions, and provide a number of design examples to demon-

strate its flexible nature. We then derive a novel independent com-

ponent analysis (ICA) algorithm that uses the entropy estimate

thus obtained, ICA by entropy bound minimization (ICA-EBM).

The algorithm adopts a line search procedure, and initially uses

updates that constrain the demixing matrix to be orthogonal for

robust performance. We demonstrate the superior performance of

ICA-EBM and its ability to match sources that come from a wide

range of distributions using simulated and real-world data.

Index Terms—Blind source separation (BSS), differential

entropy, independent component analysis (ICA), principle of

maximum entropy.

I. INTRODUCTION

INDEPENDENT component analysis (ICA) has been one of the most attractive solutions for the blind source separation (BSS) problem. BSS algorithms can exploit either non-Gaussianity, nonstationarity, or correlation—see, e.g., [1]–[18]. The natural cost for exploiting non-Gaussianity that leads to ICA is the mutual information among separated components, which can be shown to be equivalent to maximum likelihood estimation [9], and to negentropy maximization [1], [4] when we constrain the demixing matrix to be orthogonal. In these approaches, we either estimate a parametric density model [6]–[10] along with the demixing matrix, maximize the information transferred in a network of nonlinear units [11], [12], or estimate/approximate the entropy [1], [4], [13], [14], [16].

In this paper, we first introduce a novel (differential) entropy estimator that approximates the entropy of a random variable given the observations by using the maximum entropy bound that is compatible with finite measurements. In this way, the
Manuscript received June 22, 2009; accepted June 22, 2010. Date of publica-

tion July 01, 2010; date of current version September 15, 2010. The associate

editor coordinating the review of this manuscript and approving it for publica-

tion was Prof. Konstantinos I. Diamantaras. This work was supported by the

NSF Grants NSF-CCF 0635129 and NSF-IIS 0612076.

The authors are with the Department of CSEE, University of Maryland —

Baltimore County, Baltimore, MD 21250 USA (e-mail: lixilin@umbc.edu;

adali@umbc.edu).

Color versions of one or more of the figures in this paper are available online

at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSP.2010.2055859

1) Since discrete-valued random variables are not considered in this paper, we refer to differential entropy simply as entropy throughout the paper.

maximum entropy density matching can be "consistent to the largest extent with the available data and least committed with respect to unseen data" [23]. Thus we neither use an approximation as in [13], nor rely on the calculation of higher-order moments as in [14], which are known to be sensitive to outliers. Another key difference is that we calculate several maximum entropy bounds and use the tightest one as the final entropy estimate, rather than using a single entropy approximation or bound. Next,

we show that this entropy estimator is a very desirable tool for

performing ICA and introduce an ICA algorithm, ICA by en-

tropy bound minimization (ICA-EBM), that uses the tightest

maximum entropy bound. Because the entropy bound estimator

is quite flexible and can approximate the entropies of a wide

range of distributions, it can be used to perform ICA for sources

that come from distributions that are sub- or super-Gaussian,

unimodal or multimodal, symmetric or skewed by using only a

small class of nonlinear functions.

Natural (relative) gradient descent updates [34], Givens rota-

tions [5], [14], [16], (quasi-) Newton algorithm [4], [10], [15],

and steepest descent on the Stiefel manifold [22] are commonly

used approaches for optimizing the selected cost function for

ICA. In ICA-EBM, we use a line search procedure and initially

constrain the demixing matrix to be orthogonal for better con-

vergence behavior. We demonstrate the superior performance

of ICA-EBM with respect to a class of competing algorithms

using simulations and discuss its properties. We introduced the

entropy estimator using the tightest bound in [32] and demon-

strated its application to ICA. In this paper, we provide a complete treatment of the entropy estimator, including its implementation and a proof of the existence of a solution for a general class of measuring functions, as well as the derivation of the ICA algorithm and its fast implementation. We also present comprehensive simulation results to study its performance.

The remainder of this paper is organized as follows. In

Section II, we provide background for ICA and our approach.

The novel entropy estimator is introduced in Section III. A

numerical design method and examples of this entropy es-

timator are presented in Section IV. In Section V, the new

ICA algorithm, ICA-EBM, is presented. To demonstrate the

effectiveness of ICA-EBM, a number of simulation experi-

ments are presented in Section VI, and conclusions are given

in Section VII.

II. BACKGROUND

Let N statistically independent, zero mean sources s(t) = [s_1(t), ..., s_N(t)]^T be mixed through an N x N nonsingular mixing matrix A so that we obtain the mixtures x(t) = As(t), where the superscript T denotes the transpose, and t is the discrete time index. The mixtures are separated by forming y(t) = Wx(t), where y(t) = [y_1(t), ..., y_N(t)]^T, and W is the separation or demixing matrix. A natural cost for achieving the separation of these N independent sources is the mutual information I(y_1; ...; y_N) among the N random variables y_1, ..., y_N:

    C(W) = I(y_1; ...; y_N) = Σ_{n=1}^{N} H(y_n) - log|det(W)| - H(x)    (1)

where H(x) is the entropy of the observations x(t).

Thus this cost function assumes the same form as the maximum likelihood cost. In the subsequent discussion, the time index t is suppressed for simplicity.
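As an illustration only, the cost in (1) can be evaluated numerically, up to the constant H(x), using a simple histogram plug-in entropy estimate. This is a sketch under our own assumptions: the histogram estimator and all names below are ours, not the paper's.

```python
import numpy as np

def hist_entropy(y, bins=100):
    # Simple histogram plug-in estimate of differential entropy; a crude
    # stand-in used only to illustrate the cost, not the paper's estimator.
    p, edges = np.histogram(y, bins=bins, density=True)
    w = edges[1] - edges[0]
    p = p[p > 0]
    return -np.sum(p * np.log(p)) * w

def ica_cost(W, x):
    # Mutual-information cost of (1), up to the constant H(x):
    # sum of marginal entropies minus log|det(W)|.
    y = W @ x
    return sum(hist_entropy(row) for row in y) - np.log(abs(np.linalg.det(W)))
```

For example, for a rotation mixture of two independent uniform sources, the cost evaluated at W = A^{-1} is lower than at W = I, as the theory predicts.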

In orthogonal ICA approaches, the mixtures are pre-whitened and the demixing matrix is constrained to be an orthogonal matrix. Since log|det(W)| = 0 for an orthogonal matrix, the orthogonal ICA algorithms minimize the cost function

    C_o(W) = Σ_{n=1}^{N} H(y_n) + c    (2)

where H(y_n) is the entropy of the nth separated source, and c is a constant with respect to W, under the orthogonality constraint WW^T = I, where I is the identity matrix. Even though it is commonly used, the orthogonality constraint may lead to suboptimal performance [27], [33].

As observed in (1) and (2), estimation of the entropy or its

approximation plays a key role in the development of ICA al-

gorithms. Commonly used entropy estimators for ICA include

the nonparametric entropy estimator [15], [16], [20], [21], the Edgeworth expansion approximation [1], [35], and estimators based

on the principle of maximum entropy [13], [14]. Nonparametric

entropy estimation is recognized to be practically difficult and

computationally demanding. The Edgeworth expansion and the

estimator given in [14] lead to the use of higher-order moments

or cumulants, which have large estimation variances and are

highly sensitive to outliers. The estimator in [13] approximates the entropy through an expansion that assumes the true density of the source is close to the Gaussian density with the same mean and variance. Thus it may be inaccurate when the true density of the source is far from Gaussian. Another approach to the minimization of (1) and (2) is to use density matching through a parametric model and to estimate the parameters of the density along with the demixing matrix [6]–[10]. These ICA algorithms may have poor performance if the assumed distributions are far from the true ones [24], or may become overly complicated through the use of complex density models.

For the ICA algorithm we introduce in this paper, ICA-EBM,

entropy is estimated by bounding the entropy of estimates using

numerical computation. By using a few simple measuring func-

tions, a tight entropy bound can be determined for sources that

come from a wide range of distributions, those that have sub- or

super-Gaussian, unimodal, multimodal, symmetric or skewed

probability density functions (pdfs) where we define sub- and

super-Gaussianity with respect to normalized kurtosis as in [2].

Natural (relative) gradient descent updates are commonly used to minimize the cost function given in (1) [34]. When W is constrained to be orthogonal as in (2), Givens rotations and steepest descent on the Stiefel manifold are commonly used to estimate W [5], [14], [16], [22]. Since pre-whitening is a standard preprocessing procedure for many ICA algorithms and can simplify the discussion, we always assume that the mixtures have been pre-whitened, i.e., E{xx^T} = I. But we do not constrain the demixing matrix to be orthogonal in ICA-EBM. Next, we present the new entropy estimator.

III. THE ENTROPY ESTIMATOR

Rather than directly trying to estimate the entropy H(X) of a random variable X using T independent samples of X, we determine an upper bound for H(X), which, as we show next, provides a more practical and effective approach for approximating the entropy.

Assume that G(x) is a measuring function [13], and μ = E{G(X)} is the expected value of G(X) evaluated over the observed samples. An upper bound of H(X) can be accurately determined by solving for the maximum entropy distribution that maximizes the entropy, and, at the same time, is compatible with the constraint E{G(X)} = μ, where E{·} denotes the expectation. In practice, μ can be estimated as the sample average of G(x) according to the mean ergodic theorem. In this way, we can obtain several, say K, entropy bounds by using K different measuring functions. It is clear that the tightest entropy bound is the closest one to the true entropy of the source, and can be used as the entropy estimate of the source. Although this entropy estimator can only provide an upper bound of the entropy in general, it is useful for ICA since the entropy or the source distributions do not need to be estimated with great precision in ICA for reliable performance. Furthermore, the entropy estimator we introduce is quite flexible. As we demonstrate, with a few measuring functions, entropy bounds for sources from a wide range of distributions can be obtained.
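A minimal sanity check of the bound idea: with only the zero-mean and unit-variance constraints, the maximum entropy distribution is the standard Gaussian, so H_gauss = (1/2) log 2πe upper-bounds the entropy of any unit-variance source. The closed forms below are standard results, not taken from the paper:

```python
import numpy as np

# With only the constraints E{x} = 0 and E{x^2} = 1, the maximum entropy
# pdf is the standard Gaussian, so H_gauss = 0.5*log(2*pi*e) bounds the
# entropy of ANY unit-variance source. Check against the closed form for
# a unit-variance Laplacian, H = 1 + log(2b) with scale b = 1/sqrt(2).
H_gauss = 0.5 * np.log(2.0 * np.pi * np.e)
H_laplace = 1.0 + np.log(2.0 / np.sqrt(2.0))
assert H_laplace < H_gauss   # the bound holds, with slack ~0.072 nats
```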

A. The Maximum Entropy Distribution

Given a random variable X' with mean μ' and variance σ², we can only estimate the entropy of the normalized variable X = (X' − μ')/σ, since H(X') = H(X) + log σ. For simplicity of discussion, we always assume that X has zero mean and unit variance in the rest of this paper.

Suppose that the expectation E{G(X)} is evaluated over the observed samples, and we have E{G(X)} = μ. According to the principle of maximum entropy [23], we may assume that the samples are drawn from the distribution p(x) which maximizes the entropy H(X) = −∫ p(x) log p(x) dx under the constraints E{X} = 0, E{X²} = 1, E{G(X)} = μ, and the normalization condition ∫ p(x) dx = 1. Thus we have the following entropy maximization problem:

    maximize   −∫ p(x) log p(x) dx
    subject to ∫ p(x) dx = 1,  ∫ x p(x) dx = 0,
               ∫ x² p(x) dx = 1,  ∫ G(x) p(x) dx = μ.    (3)


The optimization problem in (3) can be rewritten as a Lagrangian function

    L(p) = −∫ p(x) log p(x) dx + λ_0 (∫ p(x) dx − 1) + λ_1 ∫ x p(x) dx + λ_2 (∫ x² p(x) dx − 1) + λ_3 (∫ G(x) p(x) dx − μ)

where λ_0, λ_1, λ_2, λ_3 are the Lagrangian multipliers. By letting δL/δp = 0, one finds that p(x) has the form

    p(x) = A exp(k_1 x + k_2 x² + k_3 G(x))    (4)

where the parameters A, k_1, k_2, and k_3 are to be determined to satisfy the constraints in (3). The maximum entropy is then given by

    H_max(μ) = −E{log p(X)} = −log A − k_2 − k_3 μ

using E{X} = 0 and E{X²} = 1. We rewrite H_max(μ) as

    H_max(μ) = H_gauss − J(μ)    (5)

where H_gauss = (1/2) log(2πe) is the entropy of a standard Gaussian random variable with zero mean and unit variance, and

    J(μ) = H_gauss + log A(μ) + k_2(μ) + k_3(μ) μ    (6)

where we have written both the parameters and J as functions of μ, since the quantities in (6) are determined by the constraint E{G(X)} = μ. From (5) we know that J(μ) is always nonnegative, since H_gauss is the maximum entropy under the zero mean and unit variance constraints, and it is achieved by a standard Gaussian variable. Thus, we call J(μ) negentropy as in [4].

Then, the maximum entropy problem reduces to the problem of solving for the function J(μ) given in (6). Even though an analytic solution for J(μ) cannot be obtained in general, we can solve for J(μ) numerically as we show in Section IV.

B. Existence of Maximum Entropy Distribution

The problem of the existence of the maximum entropy distribution naturally arises in the new entropy estimator. Existence of a maximum entropy distribution with given moment constraints is well studied in the literature [25], [26]. As shown in this section, using high-order moments as the measurements, we can only match a small class of pdfs, and the estimates of high-order moments are sensitive to outliers. For the approach we adopt in this paper, which uses a number of measuring functions and seeks a maximum entropy distribution with general measurement constraints, the literature on the existence question is quite limited and considers only specific forms of measuring functions. Here, we show that for the maximum entropy problem given in (3), the maximum entropy solution always exists if the measuring function G(x) is bounded.

Considering the constraints E{X²} = 1 and E{G(X)} = μ, with the normalization constant A and the zero-mean condition in (4) accounted for, we find that the maximum entropy problem given in (3) leads to the following two equations for the parameters k_2 and k_3:

    ∫ x² A exp(k_1 x + k_2 x² + k_3 G(x)) dx = 1    (7)
    ∫ G(x) A exp(k_1 x + k_2 x² + k_3 G(x)) dx = μ    (8)

Hence, for a given measuring function G(x) and a constant μ, we are interested in the existence of a solution for k_2 and k_3 in (7) and (8), and prove the following result.

Proposition 1: If the measuring function G(x) is bounded, then a solution for k_2 and k_3 given in (7) and (8) exists for any given μ.

Proof: See Appendix A.

However, for an unbounded measuring function, a solution for k_2 and k_3 may not exist for certain values of μ. For example, for the unbounded measuring function G(x) = x⁴, which is widely used in ICA, the coefficient k_3 must be nonpositive so that all the considered integrals exist. As a result, the maximum entropy pdf given in (4) can only match sub-Gaussian densities. Thus if the observed signals are super-Gaussian, no maximum entropy distribution that is compatible with the measurements exists. In general, because the estimation of the expectation of an unbounded measuring function may be inaccurate for heavy-tailed source pdfs, it is desirable to constrain the use of unbounded measuring functions for entropy estimation or density matching. Certain entropy estimators, e.g., the Edgeworth expansion approximation and the one proposed in [14], use higher-order statistics for both sub- and super-Gaussian sources. Thus the accuracy of these estimators cannot be guaranteed in general, due to the large estimation variances of higher-order statistics, particularly for super-Gaussian sources. In our entropy estimator, we usually use an unbounded measuring function and a bounded one together to ensure that at least one entropy bound exists.

C. Entropy Estimation Procedure

Given K measuring functions G_i(x), i = 1, ..., K, the expectation of each measuring function μ_i = E{G_i(X)} is evaluated over the T observed samples, and each expectation leads to an upper bound estimate of H(X) as

    H_i(X) = H_gauss − J_i(E{G_i(X)})

where the constant H_gauss is the entropy of a standard Gaussian random variable, J_i(·) is a function that can be determined numerically, and J_i(E{G_i(X)}) is the negentropy, which is defined to be zero if the maximum entropy distribution does not exist for an estimate of μ_i. In practice, μ_i is estimated using the sample average of G_i(x). In the rest of the implementation discussion and in the algorithm presentation we keep the expectation operator to simplify the notation, though note that these all refer to sample averages, which we use in the implementation. The tightest maximum entropy bound is used as the final estimate of H(X), i.e.,

    Ĥ(X) = min_{1≤i≤K} [H_gauss − J_i(E{G_i(X)})]    (9)

as it provides the best entropy approximation of the estimates.
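The selection rule (9) amounts to a few sample averages followed by a minimum. A sketch of this, where the bound functions are passed in as callables that stand in for the numerically tabulated H_gauss − J_i(μ) curves of Section IV (the stubs in the usage example are ours, not the paper's tables):

```python
import numpy as np

def tightest_bound(samples, measuring_funcs, bound_funcs):
    # Estimate mu_i as the sample average of G_i (mean ergodic theorem),
    # map each through its tabulated bound function, and keep the
    # tightest (smallest) bound, as in (9).
    mus = [np.mean(G(samples)) for G in measuring_funcs]
    return min(B(mu) for B, mu in zip(bound_funcs, mus))
```

Usage with stub bound tables: `tightest_bound(x, [G1, G2], [B1, B2])` returns the smaller of the two bounds computed from the data.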

Also, note the relation between the entropy estimate and the likelihood, given by

    Ĥ(X) = −(1/T) Σ_{t=1}^{T} log p̂(x(t))

where p̂(x) is the matched maximum entropy density based on the measurement information E{G_i(X)}. Hence, the estimate of the entropy with the tightest entropy bound has the highest likelihood. At the same time, the maximum entropy density model given in (4) associated with the tightest entropy bound provides the best match for the true pdf of the source. Hence, the estimation defined in (9) is in agreement with the maximum likelihood estimation principle.
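This relation can be verified directly from the exponential form of the maximum entropy pdf in (4) (in the notation used here): since log p̂ is linear in the measured statistics, the sample average of −log p̂ collapses to the entropy bound itself:

```latex
-\frac{1}{T}\sum_{t=1}^{T}\log\hat{p}\big(x(t)\big)
  = -\log A - k_1\,\widehat{E}\{x\} - k_2\,\widehat{E}\{x^2\} - k_3\,\widehat{E}\{G(x)\}
  = -\log A - k_2 - k_3\,\hat{\mu}
```

where the sample moments Ê{x} = 0 and Ê{x²} = 1 are enforced by the normalization of the data, so that the right-hand side is exactly the maximum entropy bound Ĥ_max.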

IV. NUMERICAL COMPUTATION OF THE ENTROPY ESTIMATOR

To use the entropy estimator given in (9), we need to select a set of measuring functions G_i(x), solve for the functions J_i(·) in advance, and store these values. In this section, we first propose a numerical approach for solving for the function J(μ), and then study a number of design examples for this entropy estimator.

A. Numerical Approach

With a given measuring function G(x) and a given μ, we can solve for the parameters of the maximum entropy pdf (4) from (7) and (8) by the following Newton iteration:

    θ ← θ − J_F^{-1}(θ) r(θ)    (10)

where θ collects the unknown parameters, r(θ) is the vector of constraint residuals, and J_F is the Jacobian matrix of r with respect to θ, starting from an initial guess that is close enough to the solution. It is clear that the standard Gaussian parameters (k_1, k_2, k_3) = (0, −1/2, 0), with A = 1/√(2π), solve (7) and (8) when μ equals the value μ_gauss = E{G(X)} evaluated under the standard Gaussian density. Thus, initially we can use this Gaussian solution as the initial guess, with μ kept close to μ_gauss, and then use the Newton updates given in (10). We can then keep generating sets of solutions for (7) and (8) by using the previous solutions as initial guesses for the Newton iterations and using a μ close to the previous value.

After finding the set of solutions for (7) and (8), the normalization constant A, the expectation μ, and the negentropy J(μ) can be readily calculated. In this way, we can obtain many points (μ, J(μ)) over a range of μ values. Then we can use an interpolation method to obtain the value of J(μ) for any μ in this range. As a result, the function J(μ) is determined over the whole range.

For certain special measuring functions, the above numerical design method can be simplified. For example, for an even measuring function G(x), from (7) we can show that k_1 = 0, and the Newton iteration given in (10) simplifies accordingly. For an odd measuring function, we can show that if (k_1, k_2, k_3) is a solution of (7) and (8) with μ, then (−k_1, k_2, −k_3) is a solution of (7) and (8) with −μ, and the function J(μ) is even. Thus we only need to determine J(μ) for positive μ.
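The procedure above can be sketched numerically. The following is an illustrative implementation under our own assumptions, not the paper's code: it treats log A as a fourth unknown, so that normalization, zero mean, unit variance, and E{G} = μ give four residuals in four unknowns, solved by a damped Newton iteration with grid-based integrals:

```python
import numpy as np

def solve_max_entropy(G, mu, x=np.linspace(-12.0, 12.0, 4001)):
    # Parameterize p(x) = exp(c + k1*x + k2*x^2 + k3*G(x)) and solve the
    # four moment constraints of (3) for theta = (c, k1, k2, k3) with a
    # damped Newton iteration, cf. (10). Integrals are Riemann sums.
    dx = x[1] - x[0]
    feats = np.stack([np.ones_like(x), x, x**2, G(x)])       # 4 x M
    targets = np.array([1.0, 0.0, 1.0, mu])
    theta = np.array([-0.5 * np.log(2.0 * np.pi), 0.0, -0.5, 0.0])  # Gaussian

    def residual(th):
        p = np.exp(th @ feats)
        return (feats * p).sum(axis=1) * dx - targets, p

    r, p = residual(theta)
    for _ in range(200):
        if np.max(np.abs(r)) < 1e-10:
            break
        # Jacobian of the moments: d m_i / d theta_j = integral f_i f_j p dx
        J = (feats[:, None, :] * feats[None, :, :] * p).sum(axis=2) * dx
        step = np.linalg.solve(J, r)
        t = 1.0
        while t > 1e-8:                    # backtrack until the residual shrinks
            r_new, p_new = residual(theta - t * step)
            if np.linalg.norm(r_new) < np.linalg.norm(r):
                theta, r, p = theta - t * step, r_new, p_new
                break
            t *= 0.5
        else:
            break
    H = -(p * (theta @ feats)).sum() * dx  # entropy of the max-entropy pdf
    return theta, H
```

At μ = μ_gauss the solver stays at the Gaussian solution and returns H ≈ (1/2) log 2πe; for other feasible μ it returns a strictly smaller entropy, i.e., a positive negentropy J(μ).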

B. Examples

It is clear that as the number of measuring functions increases, the entropy bound becomes tighter, and thus the proposed entropy estimator becomes more accurate. In practice, it is desirable to use a few simple measuring functions to reduce the computational load. Thus the measuring functions should be properly designed and selected. Given some prior information, we can also choose appropriate measuring functions. For example, we can select only even measuring functions to match symmetric densities, or odd ones to match skewed densities. Our experience suggests that using a few even and odd measuring functions provides satisfactory performance for a wide range of distributions when no prior information is available. The two even and two odd rational measuring functions used in this paper are listed in Table I. A number of typical densities that can be matched by these measuring functions are shown in Fig. 1, where we observe that by using these simple rational measuring functions, one can match sub-Gaussian, super-Gaussian, unimodal, bimodal, symmetric, as well as skewed pdfs. In this paper, the negentropy function J(μ) is obtained by cubic spline interpolation and saved as piecewise polynomials of order 3.
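As an illustration of the kind of bounded rational measuring functions used here, the following defines one even and one odd example of a similar flavor (these exact choices are assumptions, not necessarily the entries of Table I), with analytic first derivatives checked against finite differences:

```python
import numpy as np

# Illustrative bounded rational measuring functions, one even and one odd,
# with analytic first derivatives; assumed examples, not Table I itself.
def G_even(x):  return x**2 / (1.0 + x**2)
def dG_even(x): return 2.0 * x / (1.0 + x**2)**2

def G_odd(x):   return x / (1.0 + x**2)
def dG_odd(x):  return (1.0 - x**2) / (1.0 + x**2)**2

def max_fd_error(G, dG, x, h=1e-6):
    # Central finite difference vs. the analytic derivative.
    return np.max(np.abs((G(x + h) - G(x - h)) / (2.0 * h) - dG(x)))
```

Both functions are bounded (|G_odd| ≤ 1/2, 0 ≤ G_even < 1), so by Proposition 1 the corresponding maximum entropy bound exists for any measured μ.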

C. Performance of the New Entropy Estimator

Even though the entropy estimator we introduce in this paper uses the entropy bound, and is thus approximate, it provides reliable estimates of the entropy using properly selected measuring functions, as we demonstrate in this section.

To demonstrate the performance of the new entropy estimator, we study the estimation of the entropy of sources of unit variance, drawn from the generalized Gaussian distribution (GGD), which has a pdf of the form

    p(x) = (β / (2αΓ(1/β))) exp(−(|x|/α)^β)

where β > 0 is the shape parameter, Γ(·) is the Gamma function, and α is a constant depending on β, chosen so that the variance is one. This is a symmetric and unimodal pdf which assumes the Gaussian pdf for β = 2, sub-Gaussian pdfs for β > 2, and super-Gaussian pdfs for β < 2. Three entropy estimators, the Edgeworth expansion approximation [1], [35], the nonparametric entropy estimator used in [16], and the proposed one, are used to estimate the entropies of sources of GGD with varying β and sample sizes. Fig. 2 summarizes the results, where we


Fig. 1. Plots of typical pdfs that can be matched by using the measuring func-

tions given in Table I. (a) Symmetric. (b) Asymmetric.

TABLE I

THE TWO EVEN AND TWO ODD RATIONAL MEASURING FUNCTIONS AND

THEIR FIRST- AND SECOND-ORDER DERIVATIVES

observe that the Edgeworth expansion approximation is neither accurate nor robust to outliers, and its estimation variances are large for super-Gaussian sources. The nonparametric entropy estimator is inclined to underestimate the entropies, and its estimates are inaccurate for small sample sizes. The proposed entropy estimator gives more accurate entropy estimates than its competitors for generalized Gaussian sources over the considered range of shape parameters.

V. THE ICA-EBM ALGORITHM

Since the orthogonality constraint improves the stability and hence the convergence properties of ICA algorithms (see, e.g., [28] and [36]), we adopt a two-stage procedure in the implementation of ICA-EBM, where we first use updates that constrain the

demixing matrix to be orthogonal, and after the convergence

of orthogonal ICA-EBM, we directly minimize (1). In what

follows, we first derive the general line search algorithm mini-

mizing the ICA cost function given in (1), and then obtain the

orthogonal ICA-EBM algorithm as a special case of the general

nonorthogonal ICA-EBM algorithm.

A. ICA-EBM Algorithm

The basic idea of ICA-EBM is to divide the problem of minimizing C(W) with respect to W into a series of subproblems in which we minimize the cost with respect to each of the row vectors w_1, ..., w_N of W, which is an easier problem to solve. Hence, we first update w_1 while w_2, ..., w_N are kept constant. For this task, we first write the cost function in (1) as a function of only w_1. Since |det(W)| is the volume of the parallelepiped spanned by the row vectors w_1, ..., w_N, it can be calculated as

    |det(W)| = |h_1^T w_1| B_1    (11)

where B_1 is the area of the parallelepiped spanned by all the row vectors of W except w_1, and h_1 is a vector of unit Euclidean length perpendicular to all the row vectors of W except w_1. The same trick is used in [37] to write the determinant as a function of only w_1. Using (11), we can write (1) as a function of only w_1 as

    C(w_1) = H(y_1) − log|h_1^T w_1| + c_1    (12)

where c_1 is a quantity independent of w_1, and the term −log|h_1^T w_1| can be regarded as a penalty function that tries to keep w_1 orthogonal to all the other row vectors of the demixing matrix W. We always assume that w_1 has unit length, i.e., ||w_1|| = 1, and thus y_1 = w_1^T x has unit variance, since the mixtures are pre-whitened. Now, by using the entropy estimator given in (9), we can write (12) as

    C(w_1) = H_gauss − J_{i(1)}(E{G_{i(1)}(y_1)}) − log|h_1^T w_1| + c_1    (13)

where c_1 is again a quantity independent of w_1, and we write the index of the selected measuring function as a function of the source index, i.e., i(1), since for different sources, different measuring functions are selected according to (9). We have the gradient

    ∂C(w_1)/∂w_1 = −J'_{i(1)}(E{G_{i(1)}(y_1)}) E{G'_{i(1)}(y_1) x} − h_1 / (h_1^T w_1)    (14)

where G'_{i(1)} and J'_{i(1)} are the first-order derivatives of G_{i(1)} and J_{i(1)}, respectively. Notice that the term in (14) that is collinear with the previous w_1 has no contribution to the ICA learning process. Hence, as in [30], we can project the gradient in (14) onto the tangent hyperplane of the unit sphere at the point w_1 to obtain the steepest descent direction on the unit sphere as

    u_1 = (I − w_1 w_1^T) ∂C(w_1)/∂w_1,   û_1 = u_1 / ||u_1||    (15)


Fig. 2. Comparison of the accuracy of three different entropy estimators for the estimation of the entropy of generalized Gaussian sources with zero mean and unit variance, over a range of shape parameter values. The adopted Edgeworth expansion approximation is based on the fourth-order cumulant of the normalized random variable. Results for the Edgeworth expansion approximation with shape parameters 0.2 and 0.5 are not shown due to their extremely large estimation variances. (a) 100 samples. (b) 1000 samples. (c) 10000 samples.

It is clear that the normalized projected gradient û_1 is a vector of unit length, orthogonal to the previous w_1, and points in the steepest ascent direction of the cost on the unit sphere. Stepping against it, we obtain the following line search algorithm:

    w_1 ← (w_1 − η û_1) / ||w_1 − η û_1||    (16)

where η is the step length, and û_1 is computed using (15). The ICA-EBM algorithm repeats the line search given in (16) over the different row vectors of W until convergence.

The orthogonal ICA-EBM algorithm can be readily derived from the general line search algorithm given in (16). When we impose the orthogonality constraint on W, h_1 naturally becomes collinear with w_1. Hence the line search algorithm given in (15) and (16) reduces to (16) with the projected gradient

    u_1 = −J'_{i(1)}(E{G_{i(1)}(y_1)}) (I − w_1 w_1^T) E{G'_{i(1)}(y_1) x}    (17)

since the penalty term vanishes after projection. After each row vector in the demixing matrix W has been updated once, a symmetric decorrelation procedure is performed to keep the demixing matrix orthogonal, i.e., we use

    W ← (W W^T)^{−1/2} W    (18)

It can be shown that if we choose a particular step length in the orthogonal ICA-EBM algorithm, the line search algorithm given in (17) reduces to the following fixed-point algorithm:

    w_1 ← E{G'_{i(1)}(y_1) x} − E{G''_{i(1)}(y_1)} w_1,   w_1 ← w_1 / ||w_1||    (19)

where G''_{i(1)} is the second-order derivative of G_{i(1)}.

It is clear that the line search algorithm given in (19) is equivalent to the well-known FastICA algorithm [4]. In fact, several fast blind deconvolution and separation algorithms, e.g., FastICA and the super-exponential algorithm (SEA) [4], [29], are line search algorithms, and do not converge faster than an exact line search algorithm [30], [31]. Our experience with a large class of sources suggests that, unlike FastICA, which may occasionally exhibit oscillatory behavior, the line search algorithms given in (16) and (17) can provide more robust convergence behavior by using a simple step length control strategy, which we explain in the next subsection.
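The symmetric decorrelation step (18) has a direct implementation via an eigendecomposition of W W^T; a minimal sketch:

```python
import numpy as np

def symmetric_decorrelation(W):
    # Symmetric decorrelation of (18): W <- (W W^T)^(-1/2) W.
    # Computed via the eigendecomposition W W^T = E diag(d) E^T.
    d, E = np.linalg.eigh(W @ W.T)
    return (E / np.sqrt(d)) @ E.T @ W
```

After this step the rows of W form an orthonormal set, i.e., W W^T = I up to machine precision.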

B. Implementation and Computational Complexity of

ICA-EBM

1) Implementation: To achieve faster convergence, we first

use the orthogonal ICA-EBM with a few measuring functions

to provide a rough initial guess. In our implementation, we use

the FastICA learning rule given in (19) with one measuring function, a maximum number of iterations of 100, and a threshold

value of 0.001 in (21) to provide the initial guess. Typically, this

stage requires iterations on the order of 10–20. Then ICA-EBM

uses the line search algorithms given in (19), (17) and (16) se-

quentially with all the measuring functions listed in Table I to

estimate the demixing matrix.

We would like to point out that the orthogonal ICA-EBM may occasionally converge to a saddle point if the sample sizes are small. We mimic the method proposed in [7] to detect and remove saddle convergence. Assume that y_m and y_n are a pair of separated components. We can rotate the pair with an angle of π/4 to obtain

    y_m' = (y_m + y_n)/√2,   y_n' = (y_m − y_n)/√2    (20)

If H(y_m') + H(y_n') < H(y_m) + H(y_n), a saddle point is detected, and we can rotate the corresponding rows of W in the same way to remove this saddle point. When the cost function given in (2) cannot be further reduced by rotating any pair of separated components as in (20), the saddle point detection is finished. After finishing the saddle point detection, we use the orthogonal ICA-EBM algorithm again to refine the solution if any saddle point was detected.
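The saddle test of (20) can be sketched with any entropy estimate; below, a simple histogram plug-in estimator stands in for the paper's bound-based one (an assumption made purely for illustration):

```python
import numpy as np

def hist_entropy(y, bins=100):
    # Histogram plug-in entropy estimate; a stand-in for the paper's
    # bound-based estimator, used here only to illustrate the test.
    p, edges = np.histogram(y, bins=bins, density=True)
    w = edges[1] - edges[0]
    p = p[p > 0]
    return -np.sum(p * np.log(p)) * w

def saddle_check(y1, y2):
    # Rotate the pair by pi/4 as in (20); accept the rotation if the sum
    # of estimated entropies decreases, i.e., a saddle point is detected.
    r1 = (y1 + y2) / np.sqrt(2.0)
    r2 = (y1 - y2) / np.sqrt(2.0)
    if hist_entropy(r1) + hist_entropy(r2) < hist_entropy(y1) + hist_entropy(y2):
        return r1, r2, True
    return y1, y2, False
```

For example, a pi/4-mixed pair of independent uniform sources is a classic saddle configuration, and the check above detects and undoes it.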

In the nonorthogonal ICA-EBM, we need to determine the direction h_n. The algorithm for the calculation of h_n given in [37] can be computationally demanding when the dimension of the matrix W is high. A recursive algorithm for the fast calculation of h_n is proposed in Appendix B.

A simple step size control strategy is used in ICA-EBM.

When the algorithm detects that the cost function oscillates, the

step size is halved, and the algorithm begins a new line search

starting from the best solution that has been found.
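The line search with the step-halving control described above can be sketched generically; the cost and gradient callables below are placeholders standing in for (13) and (14), so this is an illustrative skeleton rather than the paper's implementation:

```python
import numpy as np

def sphere_line_search(f, grad, w, eta=1.0, max_iter=100, tol=1e-4):
    # Minimize f over unit-norm vectors by projected-gradient line search
    # with step-halving control: if the cost does not improve, halve the
    # step and restart from the best solution found so far.
    w = w / np.linalg.norm(w)
    best_w, best_f = w, f(w)
    for _ in range(max_iter):
        g = grad(w)
        u = g - (w @ g) * w                    # tangent-plane projection (15)
        nu = np.linalg.norm(u)
        if nu < 1e-12:
            break
        u = u / nu
        w_new = w - eta * u                    # step against the ascent direction
        w_new = w_new / np.linalg.norm(w_new)  # back onto the unit sphere (16)
        f_new = f(w_new)
        if f_new < best_f:                     # cost decreased: accept the move
            converged = 1.0 - abs(w_new @ w) < tol   # stopping rule, cf. (21)
            w, best_w, best_f = w_new, w_new, f_new
            if converged:
                break
        else:                                  # oscillation detected:
            eta *= 0.5                         # halve the step and restart
            w = best_w                         # from the best solution found
    return best_w
```

For example, minimizing f(w) = −(w^T e_1)² over the unit sphere drives w to ±e_1.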

The maximum number of iterations for the line search algorithms in (19), (17), and (16) is set to 100. The stopping criterion is

    1 − |(w_n^{new})^T w_n^{old}| < ε    (21)

with a typical value of 0.0001 for ε, where w_n^{new} and w_n^{old} denote the row vector after and before an update.

2) Computational Complexity: In the orthogonal ICA-EBM, the computational complexity for separating one component is O(NT) per iteration, where T is the sample size. The symmetric decorrelation procedure has a complexity of O(N³). Thus the total computational complexity of each full iteration is O(N²T), since in general T ≫ N. The nonorthogonal ICA-EBM has the same computational complexity as the orthogonal version when we adopt the fast algorithm for the calculation of h_n given in Appendix C. Thus the total computational complexity of ICA-EBM is O(N²T) per full iteration.

VI. EXPERIMENTAL RESULTS

The ICA-EBM algorithm is compared with six competitive

ICA algorithms: (Joint Approximate Diagonalization of Eigen-

matrices) JADE [5], FastICA [4], efficient variant of algorithm

FastICA (EFICA) [7], PearsonICA [6], AMICA [10], and ro-

bust, accurate, direct independent components analysis algo-

rithm (RADICAL) [16]. JADE is a cumulant-based batch algorithm for source separation, and we use the version available on the ICA Central website in the comparisons. FastICA is based on entropy approximation, and we use the symmetric decorrelation approach. Two nonlinearities, tanh and skew, intended for the separation of symmetric and skewed sources respectively, are considered. EFICA uses a

Fig. 3. Performance comparison of seven ICA algorithms in the separation of a mixture of N = 21 sources of generalized Gaussian distribution with varying shape parameters. Each simulation point is averaged over 100 independent runs.

generalized Gaussian distribution source pdf matching mecha-

nism for FastICA. In PearsonICA, the source pdfs are matched

using a Pearson density model. AMICA adopts a mixture of

generalized Gaussians model for density matching, and a quasi-

Newton optimization technique. RADICAL is a nonparametric

ICA algorithm using spacings estimates of entropy and exhaus-

tivesearch optimizationmethod. Toincrease thespeedofRAD-

ICAL, we use its fast version, where no “smoothing points” or

auxiliary points are used. The code of FastICA is downloadable

at http://www.cis.hut.fi/projects/ica/fastica/, the code for AMICA is available at http://sccn.ucsd.edu/~jason/, and the codes for

JADE, PearsonICA and RADICAL are the versions available

on the ICA Central website (http://www.tsi.enst.fr/icacentral/

algos.html). All the graphics and text output functions in these

ICA algorithms are disabled to increase the speed.

Three performance indices are used to evaluate the per-

formance of an ICA algorithm. We assume that all sources

have the same variance. The first performance index is the

percentage of failed trials. Let P = WA be the combined demixing-mixing matrix. We say that P is a successful combined demixing-mixing matrix if the locations of the largest squared elements in any two rows are different. Otherwise, P is a failed combined demixing-mixing matrix. The second performance index is the average interference to signal ratio (average ISR), which is defined for a successful combined demixing-mixing matrix. The ISR in each row of P is defined


Fig. 4. Performance comparison of seven ICA algorithms in the separation of mixtures of N = 23 sources that come from a skewed and unimodal pdf (Gamma distribution, with a different shape parameter for each source). Each simulation point is averaged over 100 independent runs.

as the ratio of the sum of all squared elements in the row except

for the largest to the largest squared element in the row. The

average ISR of P is the average of all the row ISRs. The third performance index is the average consumed CPU time. All the algorithms are programmed in Matlab (http://www.mathworks.com/).
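The success test and average ISR described above can be computed directly from the combined matrix P; a sketch:

```python
import numpy as np

def isr_metrics(P):
    # Success test and average ISR for a combined demixing-mixing matrix P:
    # success iff the largest squared elements of all rows fall in distinct
    # columns; row ISR = (sum of squared row elements except the largest)
    # divided by the largest squared element.
    P2 = P**2
    peaks = P2.argmax(axis=1)
    if np.unique(peaks).size != P.shape[0]:
        return False, None
    largest = P2.max(axis=1)
    row_isr = (P2.sum(axis=1) - largest) / largest
    return True, row_isr.mean()
```

A near-permutation P yields success with a small average ISR, while a P in which two rows peak in the same column is flagged as a failed trial.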

In the following experiments, we consider a number of cases: estimation of a combination of sub- and super-Gaussian sources drawn from a GGD, which are unimodal and symmetric; estimation of sources from skewed (Gamma) and bimodal (mixture of Gaussians) distributions; and separation of speech signals. We also demonstrate an example that shows performance with an increasing number of sources in the mixture.

1) Experiment 1: In this experiment, we study the separation of
sources that come from the GGD family. We generate 10 super-
and 10 sub-Gaussian sources, as well as one Gaussian source,
with appropriately chosen shape parameters. They are mixed with
a random 21 × 21 mixing matrix whose elements are drawn from a
zero-mean, unit-variance Gaussian distribution. Fig. 3 summarizes
the performance indices. We observe that FastICA with tanh,
EFICA, ICA-EBM and PearsonICA exhibit very good performance.
Except for occasional failures at small sample sizes, ICA-EBM
performs as well as EFICA, which does assume a generalized
Gaussian distribution and hence has a clear advantage in this
experiment. JADE often fails, since for low shape

Fig. 5. Performance comparison of seven ICA algorithms in the separation of
mixtures of 25 sources that come from a skewed and bimodal pdf (a Gaussian
mixture with a different parameter for each source). Each simulation point
is averaged over 100 independent runs.

parameter values, the sources have heavy-tailed distributions,

and the cumulants cannot be reliably estimated. AMICA and

RADICAL are the two most computationally demanding algo-

rithms, and they show limited performance when the sample

size is small. Furthermore, AMICA fails frequently even for

large sample sizes. FastICA with skew fails completely since

all the sources are symmetric.
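Unit-variance GGD sources of a given shape parameter can be drawn, for instance, via the standard Gamma-based construction below (an illustrative sketch, not the code used in the paper; the helper `ggd_sources` and its normalization are ours):

```python
import numpy as np
from math import gamma

def ggd_sources(betas, n_samples, rng=None):
    """Draw unit-variance generalized Gaussian (GGD) sources.

    beta < 2 gives super-Gaussian, beta > 2 sub-Gaussian, and beta == 2
    recovers the Gaussian. Uses the standard fact that if
    U ~ Gamma(1/beta, 1), then a random sign times U**(1/beta) follows
    a GGD with shape beta and scale one.
    """
    rng = np.random.default_rng(rng)
    rows = []
    for beta in betas:
        u = rng.gamma(1.0 / beta, 1.0, size=n_samples)
        x = rng.choice([-1.0, 1.0], size=n_samples) * u ** (1.0 / beta)
        # for scale one, Var = Gamma(3/beta) / Gamma(1/beta)
        x /= np.sqrt(gamma(3.0 / beta) / gamma(1.0 / beta))
        rows.append(x)
    return np.vstack(rows)
```

The resulting rows can then be mixed with a random square matrix, mirroring the experimental setup.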

2) Experiment 2: In this experiment, we consider the separation
of sources drawn from a Gamma distribution, whose pdf is
unimodal and skewed. By varying the shape parameter, we obtain
different unimodal skewed pdfs. We generate 23 independent
sources with different density parameters as the sources. The
mixing matrix is a random 23 × 23 matrix. Fig. 4 summarizes

the performance indices. From Fig. 4 we observe that FastICA with
the skew nonlinearity and ICA-EBM perform very well. Although
PearsonICA, which does include skewed density models, shows good
performance, its ISR does not decrease when we increase the
sample size from 5000 to 10 000. Again, RADICAL and AMICA are the
two slowest algorithms, and fail frequently for small sample
sizes; AMICA fails even for large sample sizes. JADE, EFICA and
FastICA with tanh show limited performance in this experiment.
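Skewed unimodal sources of this kind can be sketched as standardized Gamma draws (illustrative only; the shape values used in the paper are not reproduced here):

```python
import numpy as np

def gamma_sources(shapes, n_samples, rng=None):
    """Draw zero-mean, unit-variance sources from Gamma distributions.

    A Gamma(k, 1) variable has mean k and variance k; small k gives a
    strongly skewed, unimodal pdf. Standardizing keeps the skewness
    (equal to 2/sqrt(k)) while matching the equal-variance convention
    used by the performance indices.
    """
    rng = np.random.default_rng(rng)
    rows = [(rng.gamma(k, 1.0, size=n_samples) - k) / np.sqrt(k)
            for k in shapes]
    return np.vstack(rows)
```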

3) Experiment 3: In this experiment, the sources are drawn from a
Gaussian mixture distribution, whose pdf is skewed and bimodal.


Fig. 6. Performance comparison of seven ICA algorithms with increasing

number of sources. The number of sources varies from 10 to 50, and the sample

size is 2500. For each run, each source is drawn from a randomly selected

distribution considered in Experiments 1–3. Each simulation point is averaged

over 100 independent runs.

By varying the mixture parameters, we can obtain different skewed
and multimodal pdfs. Here, we consider the separation of mixtures
of 25 independent sources with different density parameters. The
mixing matrix is a random 25 × 25 matrix. Fig. 5 summarizes the
performance indices. We observe that ICA-EBM shows the best
performance. RADICAL also performs very well if the sample size
is large enough. Although AMICA performs very well if the sample
size is very large and it converges to a successful demixing
matrix, it fails frequently. All the other algorithms show
limited performance in this experiment.
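A skewed or bimodal Gaussian-mixture source of this kind can be sketched as follows (illustrative; the mixture parameters of the experiment are not reproduced):

```python
import numpy as np

def gmm_source(weights, means, stds, n_samples, rng=None):
    """Draw one source from a (possibly skewed, bimodal) Gaussian
    mixture, then standardize it to zero mean and unit variance."""
    rng = np.random.default_rng(rng)
    comp = rng.choice(len(weights), size=n_samples, p=weights)
    x = rng.normal(np.asarray(means)[comp], np.asarray(stds)[comp])
    return (x - x.mean()) / x.std()
```

Well-separated means with equal weights give a symmetric bimodal pdf; unequal weights or unequal spreads make it skewed.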

4) Experiment 4: In this experiment, we study the per-

formance with increasing number of sources. The number of

sources varies from 10 to 50, and the sample size is 2500.

For each run, each source is drawn from a randomly selected

distribution considered in Experiments 1–3. Thus the sources

are from different families of distributions, and can be sub- or

super-Gaussian, unimodal or bimodal, symmetric or skewed.

Fig. 6 summarizes the performance indices where we observe

that ICA-EBM performs the best. RADICAL performs well

if the number of sources is small, and fails frequently when

the number of sources increases. Although AMICA performs very
well if the number of sources is small and it converges to a
successful demixing matrix, it too fails often. All the other
algorithms show limited performance. In this

experiment, RADICAL, JADE and AMICA are the three most
computationally demanding algorithms for a large number of
sources, and the CPU time consumed by JADE increases rapidly
with the number of sources.
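The per-source random selection of a distribution family can be sketched as below; the families and parameters here are illustrative stand-ins, not the exact settings of Experiments 1-3:

```python
import numpy as np

def random_mixed_sources(n_sources, n_samples, rng=None):
    """Draw each source from a family picked at random, mirroring the
    setup of Experiment 4 (super-Gaussian, skewed, or bimodal)."""
    rng = np.random.default_rng(rng)
    out = np.empty((n_sources, n_samples))
    for i in range(n_sources):
        family = rng.integers(3)
        if family == 0:    # super-Gaussian stand-in: Laplacian
            x = rng.laplace(size=n_samples)
        elif family == 1:  # skewed stand-in: Gamma
            x = rng.gamma(1.0, 1.0, size=n_samples)
        else:              # bimodal stand-in: two-component Gaussian mixture
            c = rng.integers(2, size=n_samples)
            x = rng.normal(np.where(c == 0, -2.0, 2.0), 0.5)
        out[i] = (x - x.mean()) / x.std()  # common unit-variance convention
    return out
```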

5) Experiment 5: In this experiment, 20 natural audio signals
obtained from [39] are used as the sources. The kurtosis and
skewness of these audio signals vary over wide ranges; most of
the sources are super-Gaussian and slightly skewed. Unlike the
computer-generated sources in the previous experiments, which
were all generated as samples from independent distributions,
independent natural signals may exhibit slight dependence among
themselves due to the finite sample size and the strong
nonwhiteness of the sources, which further decreases the
effective sample size. This slight dependence among sources
changes the contour of an ICA cost, and may introduce many false
stationary points, which makes the optimization difficult. Using
the nonparametric mutual information estimation method given in
[40], we find that the mutual information between certain pairs
of audio signals is non-negligible, indicating that certain
sources are moderately statistically dependent.

In each run, we randomly choose a subset of the audio signals as
the sources, and mix them through a random matrix. The top

panel and bottom panel of Fig. 7 summarize the performance

indices when we leave the orders of samples of all sources un-

touched, and independently and randomly permute the samples

of all sources, respectively. The random permutation of sam-

ples does not change the marginal distribution of a source, but

tends to reduce the mutual information among sources and re-

moves false stationary points of an ICA cost. From Fig. 7 we

observe that RADICAL and ICA-EBM perform very well in

both of these cases, i.e., whether the sources are slightly
dependent or almost independent. This suggests that they are
robust to slight dependence among sources. Still, RADICAL has a
slight advantage over ICA-EBM before random permutation, because
RADICAL uses an exhaustive search over the whole parameter space,
while ICA-EBM uses a gradient search and may converge locally.
AMICA shows good performance when it converges successfully, but
even when the number of sources is very small, it fails
occasionally. FastICA with tanh, PearsonICA and EFICA exhibit
poor performance before the random permutation and much better
performance after it, which implies that these algorithms are
sensitive to slight dependence among sources. Note that in
practice, only the mixtures are observed, and it is impossible to
independently and randomly permute all the samples of all
sources. Thus RADICAL and ICA-EBM are more desirable than the
other algorithms in the sense that they exhibit more reliable
convergence behavior.
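The sample-permutation step used to obtain the "after permutation" condition can be sketched directly:

```python
import numpy as np

def permute_samples(sources, rng=None):
    """Independently and randomly permute the samples of each source.

    This leaves every marginal distribution unchanged but tends to
    destroy residual statistical dependence between sources, which is
    how the "after permutation" condition of Experiment 5 is obtained.
    """
    rng = np.random.default_rng(rng)
    return np.vstack([row[rng.permutation(row.size)] for row in sources])
```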

From the results of Experiments 1–5 we have the following

summary of observations. AMICA and RADICAL are the two

most computationally demanding algorithms, and both require

large sample sizes for satisfactory performance. In most cases,

they are about 1000 times slower than FastICA, EFICA, Pear-

sonICA and ICA-EBM. Furthermore, AMICA suffers from

poor local convergence, and fails frequently in our experiments.



Fig. 7. Performance comparison of seven ICA algorithms in the separation of artificial mixtures of real audio signals. In the top panel, the original speech signals
are used as the sources, while in the bottom panel, the samples of all speech signals are independently and randomly permuted to reduce the statistical dependence
among the sources. Each simulation point is averaged over 100 independent runs.

JADE, FastICA, EFICA and PearsonICA have reasonable

separation performance for sources of simple distributions.

Experiment 3 and our experience suggest that PearsonICA and

EFICA may be inconsistent, as they demonstrate either no or

very little performance gain with increasing sample size. In

fact, the density matching methods of PearsonICA and EFICA

are based on the matching of certain statistics, and a better

matching of certain statistics does not imply a higher likelihood

if the densities of sources are far from the assumed forms.

The quasi-maximum likelihood approach in [9] approximates the
score function by a set of basis functions whose linear mixing
coefficients are estimated using a mean square error (MSE)
criterion rather than a maximum likelihood criterion.

Thus these ICA algorithms may be inconsistent in general. We
would like to point out that for a large number of sources, JADE
can be computationally demanding, as the number of cumulant
matrices to be jointly diagonalized by Givens rotations grows
rapidly with the number of sources. FastICA, EFICA, PearsonICA
and ICA-EBM have similar computational complexity, and thus are
computationally more attractive than JADE when the number of
sources increases. Compared to the others, ICA-EBM is more
attractive due to its superior separation performance, reliable
convergence, moderate computational complexity and high
flexibility of density matching.

VII. DISCUSSIONS

We introduced a new entropy estimator based on the prin-

ciple of maximum entropy, studied the conditions for its ex-

istence, and proposed a numerical design method and design

examples. Based on this accurate entropy estimator, we developed
an ICA algorithm, ICA by entropy bound minimization (ICA-EBM),
which adopts a line search optimization procedure. Simulation
results confirm the attractiveness of the new ICA algorithm and
its effectiveness in the separation of sources that come from
different distributions.

It is important to note that the approach we presented is quite

different from the traditional parametric approach where a den-

sity model is chosen and the parameters of the density are esti-

mated during the adaptation of the demixing matrix. Our ap-

proach realizes a wide class of probability density functions,

but indirectly, as each measuring function represents a wide

class of densities through the use of different values for the

scalars in (4). The algorithm uses several simple measuring
functions (four in the version we presented here), then

chooses the best density among the ones represented by these

four. The small set of measuring functions we used can model

both skewed and symmetric as well as super- and sub-Gaussian

densities, as well as those that are unimodal or bimodal. As
demonstrated by the experiments with simulated and real-world data,

the performance of ICA-EBM is quite robust with the use of

just these four nonlinearities. Also, it is important to remember

that exact density match is not very critical for the performance

of ICA algorithms. However, at the expense of some increase in

computational complexity, we can easily add more measuring

functions to the implementation and improve the performance of
ICA-EBM even further. Also, if any prior information is available
on the source distributions, e.g., their multimodal nature or
other characteristics, we can easily design measuring functions
of higher efficiency specifically for that problem.
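The selection step, i.e., keeping the tightest of several maximum-entropy bounds, can be sketched as follows. The bound functions themselves (the numerically designed tables of the paper) are abstracted away and supplied by the caller, so only the selection logic is shown; the function name is ours:

```python
import numpy as np

def entropy_bound_estimate(y, bound_fns):
    """Estimate the entropy of a scalar random variable by the
    tightest of several maximum-entropy bounds.

    Each function in `bound_fns` maps the standardized samples to an
    upper bound on the differential entropy (conceptually, one bound
    per measuring function, evaluated from a sample moment). Since
    every bound is valid, the smallest one is kept.
    """
    y = np.asarray(y, dtype=float)
    y = (y - y.mean()) / y.std()  # the bounds assume zero mean, unit variance
    return min(fn(y) for fn in bound_fns)
```

With only the Gaussian bound supplied, the estimate reduces to the familiar fact that the Gaussian maximizes entropy among unit-variance densities.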

Another possibility is to use vector-valued measuring func-

tions in the entropy estimator. However, further research is

needed to determine if such an enhanced entropy estimator

provides significant improvement for the performance of ICA

algorithm. Another interesting study is entropy estimation of

complex random variables based on the principle of maximum

entropy, and its application to complex-valued ICA [41].


APPENDIX A

PROOF OF PROPOSITION 1

We assume that the measuring function is bounded in magnitude by
a constant. First, we show that, with the other parameters fixed,
a solution that satisfies (8) exists. Then, we show that a unique
solution that simultaneously satisfies (7) and (8) exists.

Existence of a Solution That Satisfies (8): As the measuring
function is bounded, we require that the remaining terms grow
slowly enough that all the considered integrals exist. From (8),
we find that the sought parameter is the solution of (22). It is
straightforward to show that both sides of (22) are monotonically
decreasing, convex functions of this parameter when the other
parameters are fixed. We consider the upper and lower bounds of
the two sides separately.

Comparing these bounds, we note that there exists a parameter
value below which the lower bound of one side is always greater
than or equal to the upper bound of the other side; solving the
corresponding inequality gives a lower limit for the solution. On
the other hand, there exists a parameter value above which the
upper bound of the first side is no greater than the lower bound
of the second side; solving this inequality gives an upper limit.
As both sides of (22) are continuous, monotonically decreasing,
convex functions of the parameter, there must exist a unique
solution in the range delimited by these two limits [see (23) at
the bottom of the page]. The behavior of the bounds of the two
sides and the existence of a unique solution are demonstrated in
Fig. 8.

Existence of a Solution That Satisfies (7): From (7), we have
(23)


where we write the solution of (22) as a function of the
remaining free parameter; it is uniquely determined by (22) for
each value of that parameter. In the following, we suppress this
dependence for simplicity.

We study the bounds of the two sides of (23). It is clear that
both sides are nonnegative, and achieve their lower bound of zero
in the respective limits of the free parameter, since the
measuring function is bounded. To obtain the upper bounds of the
two sides, we use the definite integral shown in the equation at
the bottom of the page, which involves the error function. At the
same time, from (23) we obtain the following limit, where we use
the boundedness of the measuring function in the first line, the
definite integral in the third line, and the fact that an
exponential grows much faster than a polynomial function in the
last line. Similarly, the limit of the other side can be
obtained. Since both sides of (23), with the other parameters
fixed, are continuous functions of the free parameter, we
conclude that there must exist at least one value for which the
two sides are equal. This completes the proof of Proposition 1.
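The existence-and-uniqueness argument suggests a simple numerical procedure: once the derived bounds bracket the crossing of the two sides of (22), bisection locates it. A generic sketch, not the authors' implementation:

```python
def solve_crossing(f, g, lo, hi, tol=1e-10):
    """Bisection on h(x) = f(x) - g(x), assuming h changes sign
    exactly once on [lo, hi]. In the argument above, the bounds on
    the two sides of (22) supply such a bracket, and monotonicity
    plus convexity guarantee the crossing is unique."""
    sign_lo = f(lo) - g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) - g(mid)) * sign_lo > 0:
            lo = mid   # same sign as at lo: root lies to the right
        else:
            hi = mid   # sign change: root lies to the left
    return 0.5 * (lo + hi)
```

For example, the crossing of a decreasing exponential and the identity on [0, 1] is found to full tolerance in a few dozen halvings.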

APPENDIX B

FAST CALCULATION OF

We introduce an auxiliary matrix. Then the required quantity can
be calculated as in [37], using an arbitrary vector that is not
orthogonal to a certain subspace. Direct calculation involves a
matrix inverse at every step, and leads to a high computational
complexity. We derive a recursive equation for the inverse to
reduce the computational complexity.

We find that the matrix difference between consecutive steps is
sparse: only one column and one row have nonzero elements. We
write down its expression as in (24), shown at the bottom of the
page. By introducing a suitable vector, we can rewrite the
difference compactly in terms of a unit vector whose only nonzero
element equals one. Now it is clear that the difference has rank
2, and we can obtain a recursive equation


Fig. 8. An example of the typical behavior of the two sides of (22),
demonstrating the existence and uniqueness of the solution of (22) for
bounded measuring functions.

for the inverse by using the matrix inversion lemma [38] twice,
as in (25). Thus all the required inverses can be recursively
calculated based on the previous result. The final inverse can be
quickly calculated from the previous one as well: although the
update matrix is not of low rank, it becomes a sparse matrix of
rank 2 after a proper exchange matrix permutes its rows. Thus we
can obtain the final inverse by performing a similar rank-2
modification on the result, as in (25).
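The rank-2 inverse update at the heart of this recursion can be sketched generically with the Sherman-Morrison form of the matrix inversion lemma; the specific sparse vectors of (24) and (25) are not reproduced, so this is an illustration of the update itself:

```python
import numpy as np

def rank2_inverse_update(A_inv, u1, v1, u2, v2):
    """Update the inverse of A after the rank-2 modification
    A + u1 v1^T + u2 v2^T by applying the Sherman-Morrison formula
    twice, as in the recursion of Appendix B."""
    def sm(M_inv, u, v):
        # (M + u v^T)^{-1} = M^{-1} - M^{-1} u v^T M^{-1} / (1 + v^T M^{-1} u)
        Mu = M_inv @ u
        vM = v @ M_inv
        return M_inv - np.outer(Mu, vM) / (1.0 + v @ Mu)
    return sm(sm(A_inv, u1, v1), u2, v2)
```

Each update costs only matrix-vector products, which is what turns the repeated full inversions into a cheap recursion.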

REFERENCES

[1] P. Comon, "Independent component analysis: A new concept?," Signal
Process., vol. 36, no. 3, pp. 287–314, 1994.

[2] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Anal-

ysis. New York: Wiley, 2001.

[3] A. Cichocki and S. Amari, Adaptive Blind Signal and Image Pro-
cessing: Learning Algorithms and Applications. Chichester, U.K.:
Wiley, 2002.

[4] A. Hyvärinen and E. Oja, “Independent component analysis: Algo-

rithms and applications,” Neural Netw., vol. 13, no. 4–5, pp. 411–430,

2000.

[5] J. F. Cardoso and A. Souloumiac, “Blind beamforming for

non-Gaussian signals,” IEE Proc. F, vol. 140, no. 6, pp. 362–370,

1993.

[6] J. Karvanen, J. Eriksson, and V. Koivunen, “Pearson system based

method for blind separation,” in Proc. 2nd Int. Workshop on Independ.

Compon. Anal. Blind Signal Separation, Helsinki, Finland, 2000, pp.

585–590.

[7] Z. Koldovský, P. Tichavský, and E. Oja, "Efficient variant of algorithm
FastICA for independent component analysis attaining the Cramér-Rao
lower bound," IEEE Trans. Neural Netw., vol. 17, no. 5, pp. 1265–1277,
2006.


[8] P. Tichavský, Z. Koldovský, and E. Oja, "Speed and accuracy enhance-
ment of linear ICA techniques using rational nonlinear functions," in
Proc. ICA 2007, 2007, pp. 285–292.

[9] D. T. Pham and P. Garat, “Blind separation of mixture of independent

sources through a quasi-maximum likelihood approach,” IEEE Trans.

Signal Process., vol. 45, no. 7, pp. 1712–1725, 1997.

[10] J. A. Palmer, S. Makeig, K. Kreutz-Delgado, and B. D. Rao, “Newton

method for the ICA mixture model,” in Proc. IEEE Int. Conf. Acoust.,

Speech, Signal Process. (ICASSP), Las Vegas, NV, Apr. 2008, pp.

1805–1808.

[11] A. Bell and T. Sejnowski, “An information-maximization approach to

blind separation and blind deconvolution,” Neural Computat., vol. 7,

pp. 1129–1159, 1995.

[12] T.-W. Lee, M. Girolami, and T. J. Sejnowski, "Independent component
analysis using an extended infomax algorithm for mixed sub-Gaussian
and super-Gaussian sources," Neural Computat., vol. 11, no. 2, pp.
417–441, 1999.

[13] A. Hyvärinen, "New approximations of differential entropy for in-
dependent component analysis and projection pursuit," in Advances
in Neural Information Processing Systems. Cambridge, MA: MIT
Press, 1998, vol. 10, pp. 273–279.

[14] D. Erdogmus, K. E. Hild, II, Y. N. Rao, and J. C. Principe, “Min-

imax mutual information approach for independent component anal-

ysis,” Neural Computat., vol. 16, no. 6, pp. 1235–1252, 2004.

[15] R. Boscolo, H. Pan, and V. P. Roychowdhury, “Independent compo-

nent analysis based on nonparametric density estimation,”IEEE Trans.

Neural Netw., vol. 15, no. 1, pp. 55–65, 2004.

[16] E. G. Learned-Miller and J. W. Fisher, III, "ICA using spacings estimates of entropy,"

J. Mach. Learn. Res., vol. 4, pp. 1271–1295, 2003.

[17] D.-T. Pham and J. F. Cardoso, “Blind separation of instantaneous mix-

tures of nonstationary sources,” IEEE Trans. Signal Process., vol. 49,

no. 9, pp. 1837–1848, 2001.

[18] A. Belouchrani, K. A. Meraim, J. F. Cardoso, and E. Moulines, “A

blind source separation technique based on second order statistics,”

IEEE Trans. Signal Process., vol. 45, no. 2, pp. 434–444, 1997.

[19] A. Yeredor, “Blind separation of Gaussian sources via second-order

statistics with asymptotically optimal weighting,” IEEE Signal

Process. Lett., vol. 7, pp. 197–200, 2000.

[20] B. W. Silverman, Density Estimation for Statistics and Data Anal-

ysis. London, U.K.: Chapman and Hall, 1986.

[21] J. Beirlant, E. Dudewicz, L. Gyorfi, and E. van der Meulen, “Nonpara-

metric entropy estimation: An overview,” Int. J. Math. Statist. Sci., vol.

6, pp. 17–39, 1997.

[22] S. Fiori, “A theory for learning by weight flow on Stiefel-Grassman

manifold,” Neural Comput., vol. 13, no. 7, pp. 1625–1647, Jul. 2001.

[23] E. T. Jaynes, “Information theory and statistical mechanics,” Phys.

Rev., vol. 106, pp. 620–630, 1957.

[24] S. Haykin, Ed., Unsupervised Adaptive Filtering, Volume 1: Blind
Source Separation. New York: Wiley, 2000.

[25] M. E. John, “On the existence of a class of maximum-entropy prob-

ability density functions,” IEEE Trans. Inf. Theory, vol. IT-23, pp.

772–775, Nov. 1977.

[26] P. Ishwar and P. Moulin, “On the existence and characterization of

the maxent distribution under general moment inequality constraints,”

IEEE Trans. Inf. Theory, vol. 51, pp. 3322–3333, Sep. 2005.

[27] J. F. Cardoso, “On the performance of orthogonal source separation

algorithms,” in Proc. Eur. Assoc. Signal Process. Signal Process. VII,

’94, Edinburgh, Scotland, 1994, pp. 776–779.

[28] J. F. Cardoso, “On the stability of source separation algorithms,” J.

VLSI Signal Process. Syst., vol. 26, no. 1–2, pp. 7–14, 2000.

[29] O. Shalvi and E. Weinstein, “Super-exponential method for blind

deconvolution,” IEEE Trans. Inf. Theory, vol. 39, pp. 504–519, Mar.

1993.

[30] X.-L. Li, “A new gradient search interpretation of super-exponential

algorithm,” IEEE Signal Process. Lett., vol. 13, no. 3, pp. 173–176,

2006.

[31] V. Zarzoso and P. Comon, "Comparative speed analysis of FastICA," in
Proc. ICA'07, 2007, pp. 293–300.

[32] X.-L. Li and T. Adalı, “A novel entropy estimator and its application

to ICA,” in Proc. IEEE Workshop on Mach. Learn. Signal Process.,

Grenoble, France, Sep. 2009.

[33] T. Adalı, H. Li, M. Novey, and J. F. Cardoso, “Complex ICA using

nonlinear functions,” IEEE Trans. Signal Process., vol. 56, no. 9, pp.

4536–4544, 2008.

[34] S. Amari, A. Cichocki, and H. H. Yang, "A new learning algorithm for
blind signal separation," in Advances in Neural Information Processing
Systems 1995. Boston, MA: MIT Press, 1996, pp. 752–763.



[35] M. Jones and R. Sibson, “What is projection pursuit?,” J. Royal Statist.

Soc. A, vol. 150, no. 1, pp. 1–36, 1987.

[36] H. Li and T. Adalı, “Stability analysis of complex maximum likeli-

hood ICA using Wirtinger calculus,” in Proc. IEEE Int. Conf. Acoust.,

Speech, Signal Process. (ICASSP), Las Vegas, NV, Apr. 2008.

[37] X.-L. Li and X.-D. Zhang, "Nonorthogonal joint diagonalization free
of degenerate solution," IEEE Trans. Signal Process., vol. 55, no. 5, pp.
1803–1814, 2007.

[38] H. Lütkepohl, Handbook of Matrices. New York: Wiley, 1996.

[39] A. Cichocki et al., ICALAB Toolboxes [Online]. Available:

http://www.bsp.brain.riken.jp/ICALAB

[40] H. Peng, F. Long, and C. Ding, "Feature selection based on mu-
tual information: Criteria of max-dependency, max-relevance, and min-
redundancy," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 8, pp.
1226–1238, 2005.

[41] J. W. Xu, D. Erdogmus, Y. N. Rao, and J. C. Principe, "Minimax mu-
tual information approach for ICA of complex-valued linear mixtures,"
in Proc. ICA'04, Granada, Spain, Sep. 2004, pp. 311–318.


Xi-Lin Li received the B.S. and M.S. degrees, both in
electrical engineering, from the Dalian University of
Technology, Dalian, China, in 2001 and 2004, respec-
tively, and the Ph.D. degree in control science and en-
gineering from Tsinghua University in 2008.

From 2008 to 2009, he was a researcher with

ForteMedia, Inc. Since 2009, he has been a Research

Associate with the Machine Learning for Signal

Processing Lab, University of Maryland, Baltimore

County. His research interests include speech signal

processing, blind source separation, and complex

valued signal processing.

Tülay Adalı (S’89–M’93–SM’98–F’09) received

the Ph.D. degree in electrical engineering from

North Carolina State University, Raleigh, in 1992.

She joined the faculty of the University of Mary-
land, Baltimore County (UMBC), Baltimore, in 1992.

She is currently a Professor with the Department

of Computer Science and Electrical Engineering,

UMBC. Her research interests are in the areas of

statistical signal processing, machine learning for

signal processing, and biomedical data analysis.

Dr. Adalı was the General Co-Chair, NNSP

(2001–2003); Technical Chair, MLSP (2004–2008); Publicity Chair, ICASSP

(2000 and 2005); Publications Co-Chair, ICASSP 2008; and Program

Co-Chair, 2009 International Conference on Independent Component Analysis

and Source Separation, 2009 MLSP. She chaired the IEEE SPS Machine

Learning for Signal Processing Technical Committee (2003–2005); Member,

SPS Conference Board (1998–2006); Member, Bio Imaging and Signal

Processing Technical Committee (2004–2007); and was an Associate Editor

for the IEEE TRANSACTIONS ON SIGNAL PROCESSING (2003–2006), and the

Elsevier Signal Processing Journal (2007–2010). She is currently Chair of

Technical Committee 14: Signal Analysis for Machine Intelligence of the

International Association for Pattern Recognition; Member, Machine Learning

for Signal Processing and Signal Processing Theory and Methods technical

committees; an Associate Editor for the IEEE TRANSACTIONS ON BIOMEDICAL

ENGINEERING and JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL,

IMAGE, AND VIDEO TECHNOLOGY, and Senior Editorial Board member of the

IEEE JOURNAL OF SELECTED AREAS IN SIGNAL PROCESSING. She is a Fellow

of the AIMBE and the past recipient of an NSF CAREER Award.