
Stationary Point Characterization for a Class of BCA Algorithms

Huseyin A. Inan, Student Member, IEEE, Alper T. Erdogan, and Sergio Cruces, Senior Member, IEEE

Abstract—Bounded Component Analysis (BCA) is a recently introduced approach that includes Independent Component Analysis (ICA) as a special case under the assumption of source boundedness. In this article, we provide a stationary point analysis for the recently proposed instantaneous BCA algorithms, which are capable of separating dependent (even correlated) as well as independent sources from their mixtures. The stationary points are identified and characterized as either perfect separators, which are the global maxima of the proposed optimization scheme, or saddle points. The important result emerging from the analysis is that there are no local optima that can prevent the proposed BCA algorithms from converging to perfect separators.

Index Terms—Bounded Component Analysis, Independent Component Analysis, Dependent Source Separation

I. INTRODUCTION

Blind Source Separation (BSS) is one of the most popular research areas in the fields of signal processing and machine learning, with a diverse set of applications [1]. The intensive use of BSS in applications from a wide range of disciplines owes to its blindness property. However, this blindness property, i.e., the lack of information about the mixing system, makes the BSS problem difficult to solve. The challenge due to the absence of training data and relational statistical information is commonly handled by exploiting some side information or assumptions about the system.

The most widely used technique for the BSS problem relies on the assumption of mutual statistical independence of the sources. The approach based on the independence assumption is known as Independent Component Analysis (ICA), and it is the most popular and successful BSS method [1]–[3]. Besides ICA, various BSS methods have been introduced by exploiting different data model assumptions, such as nonnegative matrix factorization (NMF) [4], sparsity (e.g., [5]), and the special constant modulus or finite alphabet structure of communication signals (e.g., [6]–[8]).

In practical BSS applications, the source signals are bounded in amplitude. Puntonet et al. utilized source boundedness in [9], which can be regarded as the pioneering work in this context. Along with the assumption of independence of the sources, boundedness has been exploited in some recent ICA approaches [10]–[18].

Huseyin A. Inan is with the Electrical Engineering Department, Stanford University, CA, 94305, USA (e-mail: hinan1@stanford.edu).
Alper T. Erdogan is with the Electrical-Electronics Engineering Department, Koc University, Sariyer, Istanbul, 34450, Turkey (e-mail: alperdogan@ku.edu.tr). This work is supported in part by TUBITAK 112E057 project.
Sergio Cruces is with the Department of Teoría de la Señal y Comunicaciones, Universidad de Sevilla, 41092-Sevilla, Spain (e-mail: sergio@us.es).

Recently, it has been shown in [19] that when the sources are known to be bounded, the independence assumption can be relaxed to a domain separability assumption, which states that (the convex hull of) the support of the joint density of the sources can be written as the Cartesian product of (the convex hulls of) the individual source supports. We note that this is a necessary condition for independence; however, instead of assuming the factorability of the joint pdf into the product of marginals, it only assumes that the extreme points are included in the support of the joint pdf. Therefore, domain separability is a much weaker condition than independence. By removing the independence assumption, this new approach, referred to as Bounded Component Analysis (BCA), enables the development of methods for the separation of independent and/or dependent sources.

In this new context, a blind source extraction algorithm has been proposed in [19], and a deflationary algorithm in [20]. Recently, [21] introduced a geometric BCA framework and proposed algorithms that are able to separate both independent and dependent (including correlated) bounded sources from their instantaneous mixtures. This approach is based on the maximization of the relative sizes of two geometric objects, namely the principal hyper-ellipsoid and the bounding hyper-rectangle of the separator outputs, and the corresponding optimization setting produces the perfect separators.

More recently, a convolutive BCA approach for wide sense stationary (dependent or independent) sources was introduced in [22]. It was then extended in [23] to an approach where a deterministic optimization setting is proposed for the convolutive BCA problem that allows the sources to be potentially nonstationary. Moreover, various geometric approaches have been introduced in the context of hyper-spectral imaging in [24], [25] and [26], which consider the minimum volume of a simplex circumscribing the data space.

In this article, we provide stationary point characterization results for the BCA algorithms introduced in [21], which are capable of separating independent and/or dependent, even correlated, real and complex sources. We note that we do not assume that the sources are independent or uncorrelated. Under the assumption of source boundedness, the algorithms work for both independent and dependent, even correlated, sources. Despite the difficulty of the convergence analysis due to the non-convex and non-smooth nature of the corresponding objectives, we prove that the stationary points of these BCA algorithms correspond either to perfect separators, which are the global maximizers of the introduced optimization setting, or to saddle points. This is a remarkable and important result towards the complete characterization of the global convergence of the parallel source separation algorithm capable of separating both dependent and independent sources.

The organization of the article is as follows: In Section II, we introduce the setup that is considered throughout the article. In Section III, we provide a brief summary of the BCA approach introduced in [21], and the corresponding iterative algorithms for real sources are provided in Section IV. We present the stationary point characterization results in Section V. The approach and the corresponding iterative algorithms for complex sources in [21] are summarized in Section VI, and the stationary point characterization results for these algorithms are presented in Section VII. Section VIII presents numerical examples illustrating the convergence of the algorithms to the global maximum of the objective functions regardless of the choice of initial seeds. In the same section, we also provide an algorithm modification addressing the case of deviation from ideal assumptions and its empirical performance study. Finally, Section IX concludes the article.

Notation: Let A ∈ R^{p×q} and C ∈ C^{p×q} be arbitrary real and complex matrices.

C_{m,:} (C_{:,m}): mth row (column) of C
P{A} (N{A}): converts the negative (positive) components of A to zero while preserving the others
sign{A}: replaces the negative entries of A with −1 and the positive entries of A with 1
R(C) (I(C)): extracts the real (imaginary) part of C
index m: used for (source, output) vector components
index k: sample index
index t: iteration index
e_m: vector with all zeros except a 1 at index m
p (q): number of sources (mixtures)
s: p-dimensional source vector
H: q×p-dimensional mixing matrix
y: q-dimensional mixture vector
W: p×q-dimensional separator matrix
z: p-dimensional separator output vector
G: p×p-dimensional overall mapping (from sources to separator outputs)
Q: p×p-dimensional overall mapping (from unity-range normalized sources to separator outputs)
û_m(Z): the maximum of the mth separator output
l̂_m(Z): the minimum of the mth separator output
R̂_m(Z): the range of the mth separator output
R̂(Z): the separator output sample covariance matrix
K_{m,+}(Z): the set of index locations where the maximum is achieved for the mth separator output
K_{m,−}(Z): the set of index locations where the minimum is achieved for the mth separator output
λ: convex combination coefficient
M(Z): the set of separator output indexes for which the maximum range is achieved

II. BOUNDED COMPONENT ANALYSIS SETUP

We assume a deterministic BCA setup consisting of p real sources, represented by the vector s(k) = [s_1(k) s_2(k) ... s_p(k)]^T, and we denote the corresponding set of unobservable source samples by

S = {s(1), s(2), ..., s(N)}.   (1)

We assume that the sources are bounded in magnitude, and

u = [max_k(s_1(k)) max_k(s_2(k)) ... max_k(s_p(k))]^T,   (2)
l = [min_k(s_1(k)) min_k(s_2(k)) ... min_k(s_p(k))]^T,   (3)

are the vectors containing the maximum (minimum) values of the components of the source samples in the set S. We also define the diagonal matrices

U = diag(u),   (4)
L = diag(l),   (5)

containing the maximum (minimum) source values on their diagonals.

Practical digital signals naturally satisfy such a magnitude boundedness assumption, and it is a perfect fit especially for digital communication symbols [18].

We point out that the sources are not assumed to be independent or uncorrelated. Instead, we assume that the sources satisfy BCA's domain separability assumption [19], which is a weaker assumption than independence.

The sources are mixed by a memoryless system with transfer matrix H ∈ R^{q×p}, where we consider an (over)determined system, i.e., q ≥ p. Hence, the sources and the mixtures are related by

y(k) = H s(k),   k = 1, 2, ..., N.   (6)

The goal is to obtain a separator matrix W ∈ R^{p×q} which produces the outputs

z(k) = W y(k),   k = 1, 2, ..., N,   (7)

as potentially permuted and scaled versions of the original sources. We denote the overall mapping from the sources to the separator outputs by G = W H ∈ R^{p×p}, so that the relation between the sources and the outputs can be written as

z(k) = G s(k),   k = 1, 2, ..., N.   (8)

For the output set Z = {z(1), ..., z(N)}, we define the following statistics:

• The maximum of output component z_m:

û_m(Z) = max_{k∈{1,...,N}} z_m(k),   (9)

• The minimum of output component z_m:

l̂_m(Z) = min_{k∈{1,...,N}} z_m(k),   (10)

• The range of output component z_m:

R̂_m(Z) = û_m(Z) − l̂_m(Z),   (11)

for m = 1, ..., p.

• The output range vector:

R̂(Z) = [R̂_1(Z) R̂_2(Z) ... R̂_p(Z)]^T,   (12)

• The output sample covariance:

R̂(Z) = (1/N) Σ_{k=1}^{N} z(k) z(k)^T − μ̂(Z) μ̂(Z)^T,   (13)

where μ̂(Z) = (1/N) Σ_{k=1}^{N} z(k).
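The statistics (9)–(13) can be computed directly from a matrix of output samples; below is a minimal NumPy sketch (the function and variable names are ours, not from the paper):

```python
import numpy as np

def output_statistics(Z):
    """Z: (p, N) array whose columns are the output samples z(k).
    Returns per-component maxima, minima, ranges, and the sample covariance."""
    u_hat = Z.max(axis=1)                # (9): component-wise maxima
    l_hat = Z.min(axis=1)                # (10): component-wise minima
    R_range = u_hat - l_hat              # (11)-(12): output range vector
    N = Z.shape[1]
    mu = Z.mean(axis=1, keepdims=True)
    R_cov = (Z @ Z.T) / N - mu @ mu.T    # (13): sample covariance
    return u_hat, l_hat, R_range, R_cov
```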

In the next section, we provide a brief summary of the instantaneous BCA approach introduced in [21].


III. GEOMETRIC APPROACH FOR BOUNDED COMPONENT ANALYSIS

The approach in [21] exploits two geometric objects, the principal hyper-ellipsoid and the bounding hyper-rectangle, defined on the set of output samples Z = {z(1), z(2), ..., z(N)}.

Fig. 1: Geometric Approach for Bounded Component Analysis.

Figure 1 provides three-dimensional illustrations of the objects used by the geometric framework:

• In the source domain, on the left-hand side of the figure, the (blue) dots represent the source samples in S.
• In the separator output domain (on the right):
  – The (blue) dots represent the separator output samples in Z.
  – The (purple) box is the bounding hyper-rectangle, which is defined as

    B_z = {y ∈ R^p : l̂_m(Z) ≤ y_m ≤ û_m(Z), m = 1, ..., p},   (14)

  – The (red) ellipsoid represents the principal hyper-ellipsoid corresponding to the separator output samples, which is defined as

    E_z = {q : (q − μ̂_z)^T R̂_z^{−1} (q − μ̂_z) ≤ 1}.   (15)

The separation problem is posed as the maximization of the relative sizes of these objects, where the size of the hyper-ellipsoid is chosen as its volume. When the size of the bounding hyper-rectangle is also chosen as its volume, the objective function becomes

J_1(W) = Vol(E_z) / Vol(B_z)   (16)
       = C_p √det(R̂(Z)) / Π_{m=1}^{p} R̂_m(Z),   (17)

where

• C_p √det(R̂(Z)) is the volume of the principal hyper-ellipsoid, with

C_p = π^{p/2} / Γ(p/2 + 1),   (18)

where Γ(·) is the Gamma function,
• Π_{m=1}^{p} R̂_m(Z) is the volume of the bounding hyper-rectangle.
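Under these definitions, the volume-ratio objective (16)–(18) can be evaluated numerically from the mixtures; a small sketch (the function name and arguments are ours):

```python
import numpy as np
from math import pi, gamma

def J1(W, Y):
    """Evaluate the objective (17) for separator W on mixture samples Y (q, N)."""
    Z = W @ Y                                   # separator outputs (7)
    p, N = Z.shape
    mu = Z.mean(axis=1, keepdims=True)
    R_cov = (Z @ Z.T) / N - mu @ mu.T           # sample covariance (13)
    C_p = pi ** (p / 2) / gamma(p / 2 + 1)      # ellipsoid constant (18)
    ellipsoid_vol = C_p * np.sqrt(np.linalg.det(R_cov))
    ranges = Z.max(axis=1) - Z.min(axis=1)      # output ranges (11)
    rectangle_vol = np.prod(ranges)
    return ellipsoid_vol / rectangle_vol
```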

The following assumption is introduced regarding the hyper-rectangle:

Assumption (A1): The set S contains the vertices of its (non-degenerate) bounding hyper-rectangle.

This assumption may be regarded as a sample-oriented version of a classical set of BCA assumptions on the random vector of sources [19]:

1. Compactness and nondegeneracy of the sources: all the sources are non-degenerate¹ random variables of compact support.
2. Cartesian decomposition of the convex support of the sources: the minimum convex cover of the support of the random vector of sources can be decomposed as the Cartesian product of the individual convex supports of the sources.

These properties replace the hypothesis of mutual independence of the sources in ICA, which is no longer necessary. They jointly imply that the convex hull of the sources should be a bounded hyper-rectangle with positive volume. Assumption A1 translates this requirement from the theoretical support set of the sources to the empirical support set of their samples. It is more suitable for samples drawn from continuous sub-Gaussian sources (with asymptotically large sample sizes) or discrete sources such as digital communication signals.

Under assumption A1, we can write û_m(Z) = P{G_{m,:}} u + N{G_{m,:}} l and l̂_m(Z) = N{G_{m,:}} u + P{G_{m,:}} l, which further implies R̂_m(Z) = ||G_{m,:}(U − L)||_1. Thus, we can write the objective function J_1 in terms of G as

J_1(G) = C_p |det(G)| √det(R̂(S)) / Π_{m=1}^{p} ||G_{m,:}(U − L)||_1.   (19)
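The identity R̂_m(Z) = ||G_{m,:}(U − L)||_1 is easy to check numerically when the sample set contains the hyper-rectangle vertices, as A1 requires; a small sketch with illustrative values (all numbers are ours):

```python
import numpy as np
from itertools import product

# Sources whose sample set consists of the rectangle vertices (so A1 holds).
u, l = np.array([1.0, 2.0]), np.array([-1.0, 0.0])
S = np.array(list(product(*zip(l, u)))).T      # (p, 4) vertex samples

G = np.array([[2.0, -1.0],
              [0.5,  3.0]])                    # an arbitrary overall mapping
Z = G @ S                                      # outputs z(k) = G s(k), (8)

ranges = Z.max(axis=1) - Z.min(axis=1)         # empirical ranges (11)
predicted = np.abs(G * (u - l)).sum(axis=1)    # ||G_{m,:}(U - L)||_1
assert np.allclose(ranges, predicted)
```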

In [21], it is proven that the global maxima of (17) correspond to perfect separators, i.e., G can be written as

G = D P,   (20)

where D is a diagonal matrix with non-zero diagonal entries, and P is a permutation matrix.

When the size of the bounding hyper-rectangle is chosen as a norm of its main diagonal, a family of alternative objective functions is obtained in the form

J_{2,r}(W) = C_p √det(R̂(Z)) / ||R̂(Z)||_r^p.   (21)

Here || · ||_r is the standard r-norm in R^p. It is shown in [21] that G is a global maximum of (21) if and only if it can be written in the form

G = d P (U − L)^{−1} diag(σ),   (22)

where d is a non-zero value and σ ∈ {−1, 1}^p. In this case, all the members of the global optima set share the same relative source scalings, unlike the set of global optima for J_1, which has arbitrary relative scalings.

¹A random variable is considered degenerate when the support of its p.d.f. consists of a single point.


IV. ITERATIVE BCA ALGORITHMS

The reference [21] also provides the iterative algorithms for maximizing the objectives introduced in the previous section, whose convergence behavior is the focus of this article. For this purpose, the logarithms of the objectives,

log(J_1) = (1/2) log(det(R̂(Z))) − Σ_{m=1}^{p} log(R̂_m(Z)) + log(C_p),   (23)
log(J_{2,r}) = (1/2) log(det(R̂(Z))) − p log(||R̂(Z)||_r) + log(C_p),   (24)

are used, where the rational expressions are converted into more convenient difference terms. We note that the second terms on the right-hand sides of both (23) and (24) contain the non-differentiable range operator. Therefore, the iterative update terms for W contain the subgradient terms corresponding to this operator.
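For numerical work, the difference form (23) is better evaluated with a log-determinant routine than by taking log(det(·)) directly; a sketch, dropping the constant log(C_p) term (names are ours):

```python
import numpy as np

def log_J1(W, Y):
    """Evaluate log J1 in the difference form (23), up to the constant log(C_p)."""
    Z = W @ Y
    N = Z.shape[1]
    mu = Z.mean(axis=1, keepdims=True)
    R_cov = (Z @ Z.T) / N - mu @ mu.T
    sign, logdet = np.linalg.slogdet(R_cov)   # numerically stable log det
    ranges = Z.max(axis=1) - Z.min(axis=1)    # range terms R_m(Z)
    return 0.5 * logdet - np.log(ranges).sum()
```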

Before we provide the subgradient-based iterative algorithms, we define the related notation as follows:

• W represents the separator matrix,
• z(k) represents the output vector with sample index k, which can be written as

z(k) = W y(k),   k = 1, ..., N,   (25)

• Z = {z(1), ..., z(N)} is the set of separator outputs,
• K_{m,+}(Z) = {k : z_m(k) = û_m(Z)} is the set of index points of the mth separator output component z_m(k) for which the maximum value û_m is achieved. Informally, one can represent the set K_{m,+}(Z) in MATLAB with the array variable K_{m,+} and write

[û_m, K_{m,+}] = max(z_m(:)),   (26)

• K_{m,−}(Z) = {k : z_m(k) = l̂_m(Z)} is the set of index points of the mth separator output component for which the minimum value l̂_m is achieved, i.e., in MATLAB notation,

[l̂_m, K_{m,−}] = min(z_m(:)),   (27)

• The subdifferential set for the non-differentiable maximum operator û_m(Z) is defined as

∂û_m(Z) = Co{y(k) : k ∈ K_{m,+}(Z)},   (28)

where Co is the convex hull operator,
• The subdifferential set for the non-differentiable minimum operator l̂_m(Z) is defined as

∂l̂_m(Z) = Co{y(k) : k ∈ K_{m,−}(Z)},   (29)

• The subdifferential set for the non-differentiable range operator R̂_m(Z) is defined as

∂R̂_m(Z) = Co{y_max − y_min : y_max ∈ ∂û_m(Z), y_min ∈ ∂l̂_m(Z)}.   (30)
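The index sets (26)–(27) and one element of the range subdifferential (30) can be computed as follows; the uniform averaging over the extreme index sets is one valid convex combination, and the names and tolerance are ours:

```python
import numpy as np

def range_subgradient_sets(Z, Y, m, tol=1e-12):
    """Index sets (26)-(27) for output component m, and one element of the
    range subdifferential (30) built from uniform convex weights."""
    zm = Z[m]
    K_plus = np.flatnonzero(zm >= zm.max() - tol)    # argmax set K_{m,+}
    K_minus = np.flatnonzero(zm <= zm.min() + tol)   # argmin set K_{m,-}
    b_m = Y[:, K_plus].mean(axis=1) - Y[:, K_minus].mean(axis=1)
    return K_plus, K_minus, b_m
```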

Based on these definitions, the corresponding subgradients of the criteria are as follows [21]:

∇log J_1(W) = (W R̂(Y) W^T)^{−1} W R̂(Y) − Σ_{m=1}^{p} (1/R̂_m(Z)) e_m b_m^T,   (31)

and, for r = 1, 2,

∇log J_{2,r}(W) = (W R̂(Y) W^T)^{−1} W R̂(Y) − Σ_{m=1}^{p} (p R̂_m(Z)^{r−1} / ||R̂(Z)||_r^r) e_m b_m^T,   (32)

while, for r = ∞,

∇log J_{2,∞}(W) = (W R̂(Y) W^T)^{−1} W R̂(Y) − Σ_{m∈M(Z)} (p β_m / ||R̂(Z)||_∞) e_m b_m^T.   (33)

These subgradients depend on the following definitions:

• e_m is the vector with all zeros except a 1 at index m, i.e., [e_m]_j = δ_{mj}.
• b_m is the result of a subtraction of convex combinations,

b_m = Σ_{k∈K_{m,+}(Z)} λ_{m,+}(k) y(k) − Σ_{k∈K_{m,−}(Z)} λ_{m,−}(k) y(k).   (34)

Here, {λ_{m,+}(k) : k ∈ K_{m,+}(Z)} is the set of convex combination coefficients used for combining the input vectors causing the maximum output, which satisfy, for m = 1, 2, ..., p,

Σ_{k∈K_{m,+}(Z)} λ_{m,+}(k) = 1 with λ_{m,+}(k) ≥ 0, ∀k ∈ K_{m,+}(Z).   (35)

Similarly, {λ_{m,−}(k) : k ∈ K_{m,−}(Z)} is the set of convex combination coefficients used for combining the input vectors causing the minimum output, which satisfy, for m = 1, 2, ..., p,

Σ_{k∈K_{m,−}(Z)} λ_{m,−}(k) = 1 with λ_{m,−}(k) ≥ 0, ∀k ∈ K_{m,−}(Z).   (36)

• M(Z) is the set of indexes for which the peak range value is achieved, i.e.,

M(Z) = {m : R̂_m(Z) = ||R̂(Z)||_∞},   (37)

• β_m = 1/|M(Z)|, where | · | is the operator corresponding to the cardinality of its argument.
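Combining (31) with (34), the subgradient of log J1 can be assembled directly from the samples. The sketch below uses uniform convex-combination coefficients over the argmax/argmin index sets, which is one valid choice satisfying (35)–(36) (function name and tolerance are ours):

```python
import numpy as np

def grad_log_J1(W, Y, tol=1e-12):
    """Subgradient (31) of log J1 at W, with b_m from (34) using uniform
    convex weights over the argmax/argmin index sets."""
    Z = W @ Y
    p = W.shape[0]
    N = Y.shape[1]
    mu_y = Y.mean(axis=1, keepdims=True)
    R_Y = (Y @ Y.T) / N - mu_y @ mu_y.T            # input sample covariance
    grad = np.linalg.solve(W @ R_Y @ W.T, W @ R_Y)  # first term of (31)
    for m in range(p):
        zm = Z[m]
        K_plus = np.flatnonzero(zm >= zm.max() - tol)
        K_minus = np.flatnonzero(zm <= zm.min() + tol)
        b_m = Y[:, K_plus].mean(axis=1) - Y[:, K_minus].mean(axis=1)
        grad[m] -= b_m / (zm.max() - zm.min())      # subtract e_m b_m^T / R_m
    return grad
```

Note that at a perfect separator the two terms of (31) cancel, so the subgradient vanishes, consistent with the stationary point analysis of Section V.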


A. The choice of the subgradient

For simplicity of exposition, in this subsection we focus the analysis on the criterion log J_1(W), although similar conclusions can easily be obtained for log J_{2,r}(W) and log J_{2,∞}(W).

As is evident from (19), the objective J_1 is almost everywhere differentiable, except on subspaces of zero volume measure where one or more of the elements of the overall transfer matrix G are exactly equal to zero. At the differentiable points, the subdifferential ∂log J_1(W) contains a single element, which necessarily coincides with the gradient of the criterion. In this situation, the subdifferential sets of the maximum, ∂û_m(Z), and the minimum, ∂l̂_m(Z), of the outputs are also of cardinality one. Therefore, the vector b_m in (34) is unique regardless of the choice of the convex combination coefficients λ_{m,+}(k) and λ_{m,−}(k) that respectively satisfy (35) and (36).

The probability of falling at random into the subset of non-differentiable points of the criterion is, in practice, negligible. However, for the sake of completeness, we also consider this case in our analysis.

At the non-differentiable points of the criterion, the cardinality of the subdifferential ∂log J_1(W) is strictly greater than one. Thus, we have the possibility to decide which subgradient to use for the ascent algorithm by specifying the coefficients of the convex combinations λ_{m,+}(k) and λ_{m,−}(k). The remaining part of this subsection describes the centered choice of the subgradient within the subdifferential set and its direct computation from the samples of the observations. Readers who are only interested in the computation of the subgradient may skip the explanations in the following subitems 1) to 3) and proceed directly to 4), where a brief guideline of the required steps for this computation is presented.

1) Pruning the index sets for subdifferentials: For a separation matrix W^(t) at iteration t, the output vectors z^(t)(k) are computed using (25). After that, one can use (26) to determine K_{m,+}(Z^(t)), the set of indices that maximize the mth component of the output. We then determine K^{uniq}_{m,+}(Z^(t)), a pruned version of K_{m,+}(Z^(t)) that retains only those indices that do not point to replicated observation vectors. This can be implemented in MATLAB for arrays of indices K_{m,+} and K^{uniq}_{m,+} (containing the elements of their respective sets) with the command

[aux, K^{uniq}_{m,+}] = unique(y(:, K_{m,+})', 'rows'),   (38)

where the second output argument is the desired array of indices with the elements of K^{uniq}_{m,+}(Z^(t)), and the first output argument is an auxiliary matrix that collects the observation vectors with indices in K^{uniq}_{m,+}(Z^(t)).

Similarly, we can use equation (27) to define the set of indices K_{m,−}(Z^(t)) that minimize the mth component of the outputs, and determine the subset K^{uniq}_{m,−}(Z^(t)) that retains only those indices that do not point to replicated observation vectors.
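In NumPy, the same row-wise pruning can be done with np.unique; a small sketch (the function name is ours):

```python
import numpy as np

def prune_duplicates(Y, K):
    """Keep one index per distinct observation vector among columns K of Y,
    mirroring the MATLAB unique(..., 'rows') pruning in (38)."""
    cols = Y[:, K].T                           # candidate observations as rows
    _, idx = np.unique(cols, axis=0, return_index=True)
    return K[np.sort(idx)]                     # pruned index array K_uniq
```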

2) The geometry of the subdifferential sets ∂û_m(Z) and ∂l̂_m(Z): Before we prescribe the subgradient choices, we investigate the direct relation between the subdifferentials ∂û_m(Z^(t)) and ∂l̂_m(Z^(t)) and the hyper-parallelepiped of the observations. These subdifferentials are the collections of subgradients for the respective maximum and minimum of the mth component of the outputs.

Let us recall that, given the set of source vectors S = {s(1), s(2), ..., s(N)}, assumption A1 guarantees that S contains the vertices of its (non-degenerate) bounding hyper-rectangle. At iteration t, we can write the mth component of the separator output z^(t)(k) in terms of the overall mapping matrix G^(t) as

z^(t)_m(k) = G^(t)_{m,:} s(k),   (39)

where G^(t)_{m,:} = [G^(t)_{m,1}, ..., G^(t)_{m,p}]. Then, the source indices can be grouped, according to the sign of the overall mapping coefficients G^(t)_{m,n}, into the following three sets:

I^(t)_{m,+} = {n : G^(t)_{m,n} > 0, n ∈ {1, ..., p}},   (40)
I^(t)_{m,−} = {n : G^(t)_{m,n} < 0, n ∈ {1, ..., p}},   (41)
I^(t)_{m,0} = {n : G^(t)_{m,n} = 0, n ∈ {1, ..., p}}.   (42)

By definition, the maximum value of z^(t)_m(k) for a given G^(t)_{m,:} is attained at the index locations k ∈ K^{uniq}_{m,+}(Z^(t)). For the maximum to be achieved, the source components (for n = 1, ..., p) must have the following configuration:

s_n(k) = u_n for n ∈ I^(t)_{m,+},
s_n(k) = l_n for n ∈ I^(t)_{m,−},
s_n(k) = an arbitrary value in [l_n, u_n] for n ∈ I^(t)_{m,0},   (43)

where u_n (respectively l_n) represents the maximum (minimum) value of the source component s_n, which coincides with the nth diagonal entry of the diagonal matrix U (respectively L). According to (43), the vectors s(k) have fixed extreme values at the components corresponding to the nonzero entries of the vector G^(t)_{m,:}, and an arbitrary value from the source domain in the other components, i.e., those corresponding to the zero entries of the vector G^(t)_{m,:}.

Let S^{uniq}_{m,+}(Z^(t)) = {s(k) : k ∈ K^{uniq}_{m,+}(Z^(t))} represent the collection of unique source vectors of the form (43) generating the maximum value of z^(t)_m(k). This set is a subset of the d-dimensional face of the bounding hyper-rectangle of the sources, F_{m,+}(Z^(t)), where d is the cardinality of I^(t)_{m,0}. Here F_{m,+}(Z^(t)) can be defined as the collection of all vectors satisfying (43), and it can be written as the convex hull of the set of its 2^d vertices, V^(s)_{m,+}(Z^(t)), which is obtained by selecting the extreme values in (43); these are also vertex points of the bounding hyper-rectangle of the sources. Note that, by assumption A1, the set of vertices V^(s)_{m,+}(Z^(t)) is also included in S^{uniq}_{m,+}(Z^(t)).
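The grouping (40)–(42) and the maximizing configuration (43) are straightforward to compute for a given row of G; a small sketch (names are ours, and the midpoint used for the free components is just one arbitrary valid choice):

```python
import numpy as np

def max_configuration(G_row, u, l):
    """Return the index groups (40)-(42) and a source configuration (43)
    attaining the maximum of z_m; zero-coefficient entries are free and are
    set here to the interval midpoint as one arbitrary valid choice."""
    I_plus = np.flatnonzero(G_row > 0)
    I_minus = np.flatnonzero(G_row < 0)
    I_zero = np.flatnonzero(G_row == 0)
    s = np.empty_like(u)
    s[I_plus] = u[I_plus]                      # positive coefficients: maxima
    s[I_minus] = l[I_minus]                    # negative coefficients: minima
    s[I_zero] = (u[I_zero] + l[I_zero]) / 2    # free components (arbitrary)
    return I_plus, I_minus, I_zero, s
```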

As an example, Figure 2 illustrates the faces F_{m,+} of the bounding hyper-rectangle of the sources, for a scenario with 3 sources and 3 mixtures and

G = [ 1 0 0
      1 1 2
      0 4 4 ].   (44)

Fig. 2: d-faces of the bounding hyper-rectangle of sources corresponding to û_m(Z) and l̂_m(Z) for a scenario with 3 sources and 3 mixtures.

In this figure,

• F_{1,+}(Z) and F_{1,−}(Z) are 2-faces of the source bounding hyper-rectangle (I_{1,0} has cardinality 2),
• F_{2,+}(Z) and F_{2,−}(Z) are singleton sets, each corresponding to a vertex of the source bounding hyper-rectangle (I_{2,0} has cardinality 0),
• F_{3,+}(Z) and F_{3,−}(Z) are 1-faces (edges) of the source bounding hyper-rectangle (I_{3,0} has cardinality 1).

Similarly, the minimum value of z^(t)_m(k) in (39) is achieved at the index locations k ∈ K^{uniq}_{m,−}(Z^(t)). In this case, the source components (for n = 1, ..., p) must have the configuration

s_n(k) = l_n for n ∈ I^(t)_{m,+},
s_n(k) = u_n for n ∈ I^(t)_{m,−},
s_n(k) = an arbitrary value in [l_n, u_n] for n ∈ I^(t)_{m,0},   (45)

which corresponds to points in the opposite face of the hyper-rectangle of the sources, which we denote by F_{m,−}(Z^(t)). We also use V^(s)_{m,−}(Z^(t)) to represent its set of vertices.

Let V^(y)_{m,+}(Z^(t)) (respectively V^(y)_{m,−}(Z^(t))) represent the image of V^(s)_{m,+}(Z^(t)) (respectively V^(s)_{m,−}(Z^(t))) under the mixing map H. The set V^(y)_{m,+}(Z^(t)) (respectively V^(y)_{m,−}(Z^(t))) is contained in ∂û_m(Z^(t)) (respectively ∂l̂_m(Z^(t))), and it forms the set of vertex points of this set. We also note that ∂û_m(Z^(t)) (respectively ∂l̂_m(Z^(t))) is the image of F_{m,+}(Z^(t)) (respectively F_{m,−}(Z^(t))), and it is a d-dimensional face of the hyper-parallelepiped which is the image of the bounding hyper-rectangle of the sources. We also note that

∂û_m(Z^(t)) = Co V^(y)_{m,+}(Z^(t)),   (46)

and

∂l̂_m(Z^(t)) = Co V^(y)_{m,−}(Z^(t)).   (47)

Due to the point symmetry of the hyper-parallelepiped with respect to its center, the sets ∂û_m(Z^(t)) and ∂l̂_m(Z^(t)) are mirror (inverted) images of each other, coinciding with opposite faces of the hyper-parallelepiped.

As an example, Figure 3 illustrates the subdifferential sets ∂û_m(Z) and ∂l̂_m(Z) for the example in Figure 2.

Fig. 3: Illustration of the subdifferential sets ∂û_m(Z) and ∂l̂_m(Z) for a scenario with 3 sources and 3 mixtures.

Based on this figure, we can see that

• ∂û_1(Z) and ∂l̂_1(Z) are 2-faces of the parallelepiped, and they are the images of F_{1,+}(Z) and F_{1,−}(Z) in Figure 2, respectively.
• ∂û_2 and ∂l̂_2 are singleton sets, each containing one vertex, and they are the images of F_{2,+}(Z) and F_{2,−}(Z) in Figure 2, respectively.
• ∂û_3 and ∂l̂_3 are 1-faces (edges) of the parallelepiped, and they are the images of F_{3,+}(Z) and F_{3,−}(Z) in Figure 2, respectively.

We also note that the pairs ∂û_i, ∂l̂_i are located symmetrically with respect to the center of the parallelepiped. The same figure also illustrates V^(y)_{1,+}(Z), which is the set of vertices of ∂û_1(Z).

3) The centers of the subdifferential sets as the subgradient choices: Based on the geometric picture introduced in the previous subsection, we introduce the subgradient selection approach based on the centers of the subdifferential sets ∂û_m(Z^(t)) and ∂l̂_m(Z^(t)). On the one hand, at the differentiable points of the criterion, each subdifferential ∂û_m(Z^(t)) or ∂l̂_m(Z^(t)) consists of a unique vertex (0-dimensional face) of the parallelepiped. Therefore, regardless of the considered convex combination coefficients, there is only one possible subgradient choice. On the other hand, at the non-differentiable points of the criterion, the subdifferentials are d-dimensional faces of the hyper-parallelepiped, where d represents the number of zero elements in G^(t)_{m,:} (the mth row of the global transfer matrix at iteration t). In this latter case, we can specify which subgradient to use and, for this purpose, one only needs to specify the coefficients of the convex combinations in (34), which represent the generalized barycentric coordinates of their respective subgradients.

Let us denote the subsets of K^{uniq}_{m,+}(Z^(t)) and K^{uniq}_{m,−}(Z^(t)) that correspond to the active (i.e., non-zero) convex combination coefficients respectively by

K^{act}_{m,+}(Z^(t)) = {k : λ^(t)_{m,+}(k) > 0, k ∈ K^{uniq}_{m,+}(Z^(t))},   (48)
K^{act}_{m,−}(Z^(t)) = {k : λ^(t)_{m,−}(k) > 0, k ∈ K^{uniq}_{m,−}(Z^(t))}.   (49)

Mainly for the purpose of easing the analysis of the resulting algorithm, we propose a centered choice for the subgradient


within each subdifferential set under consideration. For this purpose, the set K^{act}_{m,+}(Z^(t)) (respectively K^{act}_{m,−}(Z^(t))) is chosen so as to select those pairs of vertices of the subdifferential ∂û_m(Z^(t)) (respectively ∂l̂_m(Z^(t))) with maximum separation, i.e., those for which the separation equals the diameter of the subdifferential set.

Note that these active subsets can be determined directly from K^{uniq}_{m,+}(Z^(t)) and K^{uniq}_{m,−}(Z^(t)) without the need to evaluate ∂û_m(Z^(t)) or ∂l̂_m(Z^(t)). All one has to do is to form K^{act}_{m,+}(Z^(t)) by collecting all those indices k_1, k_2 ∈ K^{uniq}_{m,+}(Z^(t)) for which the separation between the corresponding observations, y(k_1) and y(k_2), is maximum. The same procedure is applied to obtain K^{act}_{m,−}(Z^(t)) from K^{uniq}_{m,−}(Z^(t)).

Since the center of a face of the hyper-parallelepiped of the observations coincides with the average of the pairs of vertices in that face with maximal separation, the center of the set ∂û_m(Z^(t)) is determined by

c^(t)_{m,+} = Σ_{k∈K^{act}_{m,+}(Z^(t))} y(k) / |K^{act}_{m,+}(Z^(t))|,   (50)

while the center of the set ∂l̂_m(Z^(t)) is given by

c^(t)_{m,−} = Σ_{k∈K^{act}_{m,−}(Z^(t))} y(k) / |K^{act}_{m,−}(Z^(t))|.   (51)

Therefore, we choose the active coefficients to be uniform in value, having, for m = 1, ..., p,

λ^(t)_{m,+}(k) = 1/|K^{act}_{m,+}(Z^(t))| for k ∈ K^{act}_{m,+}(Z^(t)), and 0 otherwise,   (52)

and

λ^(t)_{m,−}(k) = 1/|K^{act}_{m,−}(Z^(t))| for k ∈ K^{act}_{m,−}(Z^(t)), and 0 otherwise.   (53)

The difference of the two centers is our proposed subgradient choice within the subdifferential set of the range operator:

b^(t)_m = c^(t)_{m,+} − c^(t)_{m,−} ∈ ∂R̂_m(Z^(t)).   (54)
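The centered choice (50)–(54) can be sketched as follows. The pairwise-distance scan is our reading of the maximum-separation selection of the active sets: a vertex is active when it participates in a pair whose separation equals the diameter (names and tolerance are ours):

```python
import numpy as np

def centered_subgradient(Y, K_uniq_plus, K_uniq_minus, tol=1e-9):
    """Centered element (54) of the range subdifferential: average the
    maximally separated vertices in each active set, as in (50)-(51)."""
    def active_center(K):
        V = Y[:, K]                              # candidate vertices
        if len(K) == 1:
            return V[:, 0]
        D = np.linalg.norm(V[:, :, None] - V[:, None, :], axis=0)
        active = np.flatnonzero(D.max(axis=1) >= D.max() - tol)
        return V[:, active].mean(axis=1)         # center of the face
    return active_center(K_uniq_plus) - active_center(K_uniq_minus)
```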

4) Summary of the steps for the subgradient computation: Assume that the array z_m(:) collects the elements of the mth output at iteration t. For m = 1, ..., p, equations (26) and (27) respectively determine the arrays of indices K_{m,+} and K_{m,−}. These are pruned with the procedure described in (38) to obtain K^{uniq}_{m,+} and K^{uniq}_{m,−}. Almost always, the pruned arrays will contain a single element, in which case b^(t)_m = y(:, K^{uniq}_{m,+}(1)) − y(:, K^{uniq}_{m,−}(1)). Otherwise, the active set K^{act}_{m,+}(Z^(t)) collects all those indices k_1, k_2 for which the separation of y(:, K^{uniq}_{m,+}(k_1)) and y(:, K^{uniq}_{m,+}(k_2)) is maximum; K^{act}_{m,−} is defined similarly. The centers of the subdifferentials ∂û_m(Z^(t)) and ∂l̂_m(Z^(t)) are respectively obtained by (50) and (51). Their difference in (54) determines the vector b^(t)_m. Finally, b^(t)_m for m = 1, ..., p can be substituted into equations (31)–(33) to determine the proposed subgradient of the criterion, ∇log J(W^(t)), where J can be particularized to J_1, J_{2,r}, or J_{2,∞}.

B. The BCA algorithms

The subgradient ascent algorithm which optimizes the BCA criterion log J(W^(t)) is then

W^(t+1) = W^(t) + μ^(t) ∇log J(W^(t)).   (55)

This iteration can also be written in terms of the global transfer matrix G^(t) = W^(t) H for the posterior analysis of the algorithms. For this purpose, we multiply both sides of (55) by H from the right to obtain

G^(t+1) = G^(t) + μ^(t) ∇log J(W^(t)) H.   (56)
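The update (55) is an ordinary subgradient ascent loop. Below is a minimal end-to-end sketch for log J1, with b_m built from uniform weights over the argmax/argmin index sets; the fixed step size, iteration count, and initialization options are our illustrative choices, not the paper's:

```python
import numpy as np

def bca_ascent(Y, p, mu=0.05, iters=500, tol=1e-12, seed=0, W0=None):
    """Subgradient ascent (55) on log J1 from mixture samples Y (q, N)."""
    rng = np.random.default_rng(seed)
    q, N = Y.shape
    W = W0.copy() if W0 is not None else rng.standard_normal((p, q))
    mu_y = Y.mean(axis=1, keepdims=True)
    R_Y = (Y @ Y.T) / N - mu_y @ mu_y.T
    for _ in range(iters):
        Z = W @ Y
        grad = np.linalg.solve(W @ R_Y @ W.T, W @ R_Y)  # first term of (31)
        for m in range(p):
            zm = Z[m]
            K_plus = np.flatnonzero(zm >= zm.max() - tol)
            K_minus = np.flatnonzero(zm <= zm.min() + tol)
            b_m = Y[:, K_plus].mean(axis=1) - Y[:, K_minus].mean(axis=1)
            grad[m] -= b_m / (zm.max() - zm.min())
        W = W + mu * grad                               # ascent step (55)
    return W
```

Starting the iteration at a perfect separator leaves it unchanged, which illustrates that perfect separators are stationary points of the update.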

V. STATIONARY POINT CHARACTERIZATION FOR BCA ALGORITHMS

In this section, we provide the stationary point analysis results for the BCA algorithms presented in the previous section.

A. Objective Function J_1(W)

In order to identify the stationary points, we start from the particularization of the iterative algorithm (56) for the first objective function and substitute R̂(Y) = H R̂(S) H^T to obtain

G^(t+1) = G^(t) + μ^(t) [ (W^(t) H R̂(S) H^T W^(t)T)^{−1} W^(t) H R̂(S) H^T H
          − Σ_{m=1}^{p} (1/R̂_m(Z^(t))) e_m b^(t)T_m H ],   (57)

where

R̂(S) = (1/N) Σ_{k=1}^{N} s(k) s(k)^T − μ̂(S) μ̂(S)^T   (58)

is the sample covariance matrix of the source samples S, and μ̂(S) = (1/N) Σ_{k=1}^{N} s(k) is the sample mean of the same set. After grouping common factors, we obtain

G^(t+1) = G^(t) + μ^(t) [ (G^(t))^{−T} R̂(S)^{−1} (G^(t))^{−1} G^(t) R̂(S)
          − Σ_{m=1}^{p} (1/R̂_m(Z^(t))) e_m [H† b^(t)_m]^T ] H^T H,   (59)

where H† = (H^T H)^{−1} H^T.

It is shown in Appendix A that we can simplify

[H† b^(t)_m]^T = sign{G^(t)_{m,:}} (U − L).   (60)

Therefore, we can rewrite the input-dependent update expression in terms of G as

G^(t+1) = G^(t) + μ^(t) [ (G^(t))^{−T} − Σ_{m=1}^{p} (1/R̂_m(Z^(t))) e_m sign{G^(t)_{m,:}} (U − L) ] H^T H.   (61)


We identify $G_*$ as a stationary point if and only if it is mapped to itself after an iteration of the algorithm, which can be stated as
$$G_* = G_* + \mu^{(t)}\Big[(G_*)^{-T} - \sum_{m=1}^{p} \frac{1}{\hat{R}_m(Z)}\, e_m\, \mathrm{sign}\{(G_*)_{m,:}\}\,(U-L)\Big] H^T H. \qquad (62)$$

We note that since $H$ is assumed to be a full rank matrix, $H^T H$ is invertible. Therefore, the condition in (62) is equivalent to
$$(G_*)^{-T} = \sum_{m=1}^{p} \frac{1}{\hat{R}_m(Z)}\, e_m\, \mathrm{sign}\{(G_*)_{m,:}\}\,(U-L) \qquad (63)$$
$$= \mathrm{diag}(\hat{R}_1(Z), \ldots, \hat{R}_p(Z))^{-1}\,\mathrm{sign}\{G_*\}\,(U-L), \qquad (64)$$
which leads to
$$\mathrm{diag}(\hat{R}_1(Z), \ldots, \hat{R}_p(Z))^{-1}\,\mathrm{sign}\{G_*\}\,(U-L)\,G_*^T = I. \qquad (65)$$

In order to simplify this expression, we define two different normalized versions of $G_*$, as follows:
• $Q \triangleq G_*(U-L)$: This mapping absorbs the source ranges into the overall mapping. Therefore, it defines a mapping from the unity-range normalized sources to the separator outputs. Based on this definition, one obtains that $\mathrm{sign}\{Q\} = \mathrm{sign}\{G_*\}$, where we used the fact that $(U-L)$ is a diagonal positive definite matrix. We can write the range of the corresponding separator output components as
$$\hat{R}_m(Z_*) = \hat{u}_m(Z_*) - \hat{l}_m(Z_*) \qquad (66)$$
$$= (\mathcal{P}\{(G_*)_{m,:}\}U + \mathcal{N}\{(G_*)_{m,:}\}L)\mathbf{1} - (\mathcal{N}\{(G_*)_{m,:}\}U + \mathcal{P}\{(G_*)_{m,:}\}L)\mathbf{1} \qquad (67)$$
$$= (\mathcal{P}\{(G_*)_{m,:}\} - \mathcal{N}\{(G_*)_{m,:}\})(U-L)\mathbf{1} \qquad (68)$$
$$= \mathrm{abs}(Q_{m,:})\mathbf{1} \qquad (69)$$
$$= \|Q_{m,:}\|_1, \qquad (70)$$
for $m = 1, \ldots, p$, where $\mathbf{1}$ is the all-ones vector. As a result, the condition in (65) can be reduced to
$$\mathrm{sign}\{Q\}\,Q^T = \mathrm{diag}(\|Q_{1,:}\|_1, \ldots, \|Q_{p,:}\|_1). \qquad (71)$$

• $\tilde{Q} \triangleq \mathrm{diag}(\|Q_{1,:}\|_1, \ldots, \|Q_{p,:}\|_1)^{-1} Q$: This mapping generates unity-range separator outputs for the normalized unity-range inputs, since
$$\|\tilde{Q}_{1,:}\|_1 = \|\tilde{Q}_{2,:}\|_1 = \ldots = \|\tilde{Q}_{p,:}\|_1 = 1. \qquad (72)$$
Therefore, it defines a mapping from the unity-range normalized sources to the unity-range normalized outputs.
As a result, we obtain the simplified form of the stationary point condition as
$$\mathrm{sign}\{\tilde{Q}\}\,\tilde{Q}^T = I. \qquad (73)$$

1) Examples: Below we provide some examples for the set of stationary points satisfying (73):
• Perfect Separators: If $\tilde{Q} = P\,\mathrm{diag}(\sigma)$, where $\sigma \in \{-1, 1\}^p$ and $P$ is a permutation matrix, then $\mathrm{sign}\{\tilde{Q}\} = \tilde{Q}$, hence
$$\mathrm{sign}\{\tilde{Q}\}\,\tilde{Q}^T = \tilde{Q}\tilde{Q}^T = I. \qquad (74)$$
This yields $G_* = DP$, hence the corresponding $G_*$ matrices are perfect separator matrices.

• Orthogonal matrices where non-zero entries in any given row have the same magnitude: Suppose that $\tilde{Q}$ is an orthogonal matrix whose $i$'th row has $\alpha_i$ non-zero values, the magnitude of the corresponding non-zero entries being $1/\alpha_i$. Therefore, we can write
$$\tilde{Q} = \mathrm{diag}(1/\alpha_1, 1/\alpha_2, \ldots, 1/\alpha_p)\,\mathrm{sign}\{\tilde{Q}\}. \qquad (75)$$
This implies that
$$\mathrm{sign}\{\tilde{Q}\}\,\tilde{Q}^T = \mathrm{diag}(\alpha_1, \alpha_2, \ldots, \alpha_p)\,\mathrm{diag}(1/\alpha_1, 1/\alpha_2, \ldots, 1/\alpha_p) = I. \qquad (76)$$

• Matrices whose entries are powers of $0.5$: Defining
$$T = \begin{bmatrix}
(0.5)^{n-1} & (0.5)^{n-1} & (0.5)^{n-2} & \cdots & (0.5)^2 & 0.5 \\
(0.5)^{n-1} & (0.5)^{n-1} & (0.5)^{n-2} & \cdots & (0.5)^2 & -0.5 \\
(0.5)^{n-2} & (0.5)^{n-2} & (0.5)^{n-3} & \cdots & -0.5 & 0 \\
\vdots & \vdots & \vdots & \iddots & \vdots & \vdots \\
(0.5)^2 & (0.5)^2 & -0.5 & \iddots & 0 & 0 \\
0.5 & -0.5 & 0 & \cdots & 0 & 0
\end{bmatrix},$$
a set of stationary points can be defined of the form $\tilde{Q} = \mathrm{diag}(\sigma)\,P_1 T P_2$, where $P_1$ and $P_2$ are permutation matrices.
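These example families are easy to verify numerically. The sketch below checks condition (73) for a signed permutation, a rows-orthogonal example with $\alpha_i = 4$ (entries of magnitude $1/4$), and the $n = 3$ instance of the matrix $T$:

```python
import numpy as np

def is_stationary(Q, tol=1e-12):
    """Check the stationary point condition sign{Q} Q^T = I from (73)."""
    return np.allclose(np.sign(Q) @ Q.T, np.eye(Q.shape[0]), atol=tol)

# Perfect separator: a signed permutation matrix.
Q_perm = np.array([[0., 1., 0.],
                   [0., 0., -1.],
                   [1., 0., 0.]])

# Rows-orthogonal example: each row has alpha_i = 4 nonzeros of magnitude
# 1/4 (a scaled 4x4 Hadamard sign pattern, so the rows are orthogonal).
Q_orth = 0.25 * np.array([[1., 1., 1., 1.],
                          [1., -1., 1., -1.],
                          [1., 1., -1., -1.],
                          [1., -1., -1., 1.]])

# The powers-of-0.5 matrix T for n = 3.
T3 = np.array([[0.25, 0.25, 0.5],
               [0.25, 0.25, -0.5],
               [0.5, -0.5, 0.]])

results = [is_stationary(Q) for Q in (Q_perm, Q_orth, T3)]   # all True
```

Each candidate also has unit row $l_1$ norms, consistent with (72).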

2) Global Characterization of Stationary Points: We note

that the examples provided above do not cover the set of all

stationary points. However, in this subsection, we will show

that if a stationary point of the algorithm (55) (or of the

algorithm (59)) is not a perfect separator, then it is a saddle

point.

As the ﬁrst step, we provide the following lemma which

will be used as an intermediate tool in the proof of the

aforementioned global characterization:

Lemma 1: If a stationary point does not belong to the set of perfect separators, then its rows and columns can be permuted such that the upper-left 2-by-2 sub-matrix of its sign matrix becomes
$$\Lambda_1 \triangleq \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \quad \text{or} \quad \Lambda_{-1} \triangleq \begin{bmatrix} -1 & -1 \\ -1 & 1 \end{bmatrix}.$$

Proof: Let $\tilde{Q}$ be a stationary point, i.e., it satisfies (73), which is not a perfect separator. There exists a row of $\tilde{Q}$ which has more than one non-zero entry (w.l.o.g., we assume it is $\tilde{Q}_{1,:}$). From (73), we have
(p.1) $\{\mathrm{sign}\{\tilde{Q}_{j,:}\},\ j = 1, 2, \ldots, p\}$ are linearly independent,
(p.2) $\mathrm{sign}\{\tilde{Q}_{j,:}\}\,\tilde{Q}_{1,:}^T = 0$ for $j = 2, \ldots, p$.


Note that (p.1) implies that at least one $\tilde{Q}_{j,:}$, $j \geq 2$, has a non-zero entry overlapping with one of the non-zero entries of $\tilde{Q}_{1,:}$. Otherwise, the non-overlap condition would restrict the span of $\{\mathrm{sign}\{\tilde{Q}_{j,:}\},\ j = 2, \ldots, p\}$ to an at most $(p-2)$-dimensional space, which conflicts with linear independence. Furthermore, (p.2) implies that the number of overlapping entries should be greater than one, with alternating signs. Therefore, the rows and columns of a stationary point which is not a perfect separator can be permuted such that the upper-left 2-by-2 sub-matrix of its sign matrix becomes $\Lambda_1$ or $\Lambda_{-1}$.

The following is the main theorem for the global characterization of the stationary points, which shows that the stationary points other than the perfect separators are saddle points:

Theorem 1: If a stationary point of the algorithm (55) (or

of the algorithm (59)) does not belong to the set of perfect

separators, then it is a saddle point.

Proof: The proof is provided in Appendix B.

3) Remarks About Theorem 1: The implication of Theorem 1 is quite remarkable. Despite the non-convex and non-smooth objective, we can achieve a global characterization of the stationary points of the corresponding algorithms. In particular, we show that the algorithms will not stop at false local minima or maxima. Potential stopping points are either perfect separators, corresponding to the global optima of the objective, or saddle points, which are not stable in the sense that the algorithm iterations would leave such points even under small (e.g., numerical) perturbations. Here we note that the stationary points of the algorithm form a subset of the Clarke stationary points of the objective function. The Clarke stationary points are defined as the points for which the subdifferential of the objective contains zero. The algorithm's update contains only a restricted set of subgradients, and the analysis provided above covers this set.

We should also note that the complete characterization of the convergence behavior of the algorithms is still an open problem. The main difficulty arises from the following fact. Although there are standard results for convex (smooth and non-smooth) objectives on the equivalence of the limit points of the algorithm to the stationary points through appropriate selection of step-size rules [27], these are not generalizable to simultaneously non-convex and non-smooth objectives [28], [29]. Along this direction, in [30], the authors proposed a mesh-based approach and showed that the subsequences of their algorithm converge to a Clarke stationary point when applied to a modified form of the $J_1$ objective. In some sense, this can be considered a work complementary to our approach in this article, where we obtained the characterization of the subset of Clarke stationary points corresponding to the stationary points of our algorithm. Although the analysis of the limit point features of the algorithm is a subject of future research pursued by the authors, the numerical experiments support global convergence behavior, for example, when the step sizes are chosen according to the Zero-Limit-Divergent-Sum rule prescribed for convex functions [27], [28], as illustrated in Section VIII.

B. Objective Function $J_{2,1}(W)$
Following similar steps as in the previous section, the iterative update corresponding to $J_{2,1}(W)$ can be rewritten in terms of $G$ as
$$G^{(t+1)} = G^{(t)} + \mu^{(t)}\Big[(G^{(t)})^{-T} - \sum_{m=1}^{p} \frac{p}{\|\hat{R}(Z^{(t)})\|_1}\, e_m\, [H^{\dagger} b_m^{(t)}]^T\Big] H^T H, \qquad (77)$$

where $[H^{\dagger} b_m^{(t)}]^T$ was previously determined in (60). The stationary points in this case satisfy
$$\frac{p}{\|\hat{R}(Z)\|_1}\,\mathrm{sign}\{G_*\}\,(U-L)\,G_*^T = I. \qquad (78)$$
Using the definition
$$Q = G_*(U-L), \qquad (79)$$
while noting that $\mathrm{sign}\{G_*\} = \mathrm{sign}\{Q\}$ and $\|\hat{R}(Z)\|_1 = \sum_{m=1}^{p}\|Q_{m,:}\|_1$ hold, we obtain
$$\mathrm{sign}\{Q\}\,Q^T = \frac{\sum_{m=1}^{p}\|Q_{m,:}\|_1}{p}\, I. \qquad (80)$$
Since the diagonal entries of the left side of (80) are $\|Q_{1,:}\|_1, \|Q_{2,:}\|_1, \ldots, \|Q_{p,:}\|_1$, we have
$$\|Q_{1,:}\|_1 = \|Q_{2,:}\|_1 = \ldots = \|Q_{p,:}\|_1.$$
Due to this observation, we can simplify the normalized $Q$ definition in this case as $\tilde{Q} = \frac{1}{\|Q_{1,:}\|_1}Q$ and obtain
$$\mathrm{sign}\{\tilde{Q}\}\,\tilde{Q}^T = I. \qquad (81)$$

We note that we reach the same condition as in (73) for the objective function $J_{2,1}(W)$; therefore, the examples for the $\tilde{Q}$ matrices in the previous section apply to this case too. However, we note that the $Q$ matrices corresponding to the objective function $J_1(W)$ can be obtained by arbitrary scaling of the rows of $\tilde{Q}$, whereas in the case of $J_{2,1}(W)$, we are constrained to multiply all rows of $\tilde{Q}$ by the same parameter.
Similar to the objective function $J_1(W)$, we will show that if a stationary point of the algorithm (77) is not a perfect separator, then it is a saddle point.

Theorem 2: If a stationary point of the algorithm (77) does not belong to the set of perfect separators, then it is a saddle point.
Proof: In this case, from (21), the cost function in terms of $\tilde{Q}$ is equivalent to
$$J(\tilde{Q}) = \frac{|\det(\tilde{Q})|}{p^p}. \qquad (82)$$
Therefore, the proof of Theorem 1 also applies here.


C. Objective Function $J_{2,2}(W)$
The update in terms of $G$ for the objective function $J_{2,2}(W)$ can be obtained as
$$G^{(t+1)} = G^{(t)} + \mu^{(t)}\Big[(G^{(t)})^{-T} - \sum_{m=1}^{p} \frac{p\,\hat{R}_m(Z^{(t)})}{\|\hat{R}(Z^{(t)})\|_2^2}\, e_m\, [H^{\dagger} b_m^{(t)}]^T\Big] H^T H. \qquad (83)$$

The stationary points in this case satisfy
$$\frac{p}{\|\hat{R}(Z)\|_2^2}\,\mathrm{diag}(\hat{R}_1(Z), \ldots, \hat{R}_p(Z))\,\mathrm{sign}\{G_*\}\,(U-L)\,G_*^T = I. \qquad (84)$$
Using the definition $Q = G_*(U-L)$, we obtain
$$\mathrm{diag}(\|Q_{1,:}\|_1, \ldots, \|Q_{p,:}\|_1)\,\mathrm{sign}\{Q\}\,Q^T = \left(\frac{\sum_{m=1}^{p}\|Q_{m,:}\|_1^2}{p}\right) I. \qquad (85)$$
This implies that
$$\|Q_{1,:}\|_1 = \|Q_{2,:}\|_1 = \ldots = \|Q_{p,:}\|_1. \qquad (86)$$
Similarly, defining $\tilde{Q} = \frac{1}{\|Q_{1,:}\|_1}Q$ yields
$$\mathrm{sign}\{\tilde{Q}\}\,\tilde{Q}^T = I. \qquad (87)$$
We note that this condition is equivalent to the condition for the objective function $J_{2,1}(W)$.

D. Objective Function $J_{2,\infty}(W)$
The iterative $G$ update corresponding to $J_{2,\infty}(W)$ can be written as
$$G^{(t+1)} = G^{(t)} + \mu^{(t)}\Big[(G^{(t)})^{-T} - \sum_{m \in \mathcal{M}(Z^{(t)})} \frac{p\,\beta_m^{(t)}}{\|\hat{R}(Z)\|_\infty}\, e_m\, [H^{\dagger} b_m^{(t)}]^T\Big] H^T H. \qquad (88)$$
Therefore, the stationary points satisfy
$$\sum_{m \in \mathcal{M}(Z)} \frac{p\,\beta_m}{\|\hat{R}(Z)\|_\infty}\, e_m\, \mathrm{sign}\{(G_*)_{m,:}\}\,(U-L)\,G_*^T = I. \qquad (89)$$
Using the definition $Q = G_*(U-L)$ yields
$$\sum_{m \in \mathcal{M}(Z)} \frac{p\,\beta_m}{\|\hat{R}(Z)\|_\infty}\, e_m\, \mathrm{sign}\{Q_{m,:}\}\,Q^T = I. \qquad (90)$$
We note that in order to satisfy (90), we must have $\mathcal{M}(Z) = \{1, 2, \ldots, p\}$, which implies that the ranges of the outputs should be equal, i.e.,
$$\|Q_{1,:}\|_1 = \|Q_{2,:}\|_1 = \ldots = \|Q_{p,:}\|_1.$$

Hence, $\beta_m = 1/p$ for $m = 1, 2, \ldots, p$. We similarly define $\tilde{Q} = \frac{1}{\|Q_{1,:}\|_1}Q$ and obtain from (90) that
$$\mathrm{sign}\{\tilde{Q}\}\,\tilde{Q}^T = I. \qquad (91)$$
We note that this condition is equivalent to the condition for the objective function $J_{2,1}(W)$.

The saddle points are not stable in the sense that small

random perturbations would take the algorithm away from

the saddle points, eventually leading to convergence to perfect

separator points, in the light of Theorem 2.

VI. EXTENSION TO COMPLEX SIGNALS

extension of the BCA algorithms in [21]:

In the complex case, the source vectors and output vectors

belong to Cpand the mixture vectors belong to Cq. The

mixing and separator matrices are complex matrices, i.e.,

H=Cq×pand W=Cp×q. For a given complex vector

x∈Cp, the corresponding isomorphic real vector `x∈R2p

is deﬁned as `x=R(xT)I(xT)T. Furthermore, the

operator ψ:Cp×q→R2p×2qis deﬁned as

ψ(X) = R(X)−I(X)

I(X)R(X).(92)

We note that the isomorphism satisﬁes

z=W y ⇔`z=ψ(W)`y.(93)

Based on this isomorphism, we deﬁne `

Y=

{`y(1),`y(2),..., `y(N)}and `

Z={`z(1),`z(2),...,`z(N)}.

The complex objectives Jc1and J c2,r can be deﬁned from

their real counterparts J1and J2,r simply by replacing ˆ

R(Z)

and ˆ

R(Z)with ˆ

R(`

Z)and ˆ

R(`

Z)respectively. Similarly, the

complex counterparts of the real BCA update rules can be

obtained by replacing the terms ˆ

R(Z),ˆ

R(Y),Wand pin the

subgradient expressions (31-33) with ˆ

R(`

Z),ˆ

R(`

Y),ψ(W)

and 2prespectively.
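The isomorphism (92)-(93) can be checked directly; a minimal numpy sketch:

```python
import numpy as np

def real_vec(x):
    """x in C^p  ->  the isomorphic real vector [Re(x); Im(x)] in R^{2p}."""
    return np.concatenate([x.real, x.imag])

def psi(X):
    """The operator psi : C^{p x q} -> R^{2p x 2q} defined in (92)."""
    return np.block([[X.real, -X.imag],
                     [X.imag, X.real]])

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)

# Property (93): z = W y  <=>  real_vec(z) = psi(W) real_vec(y).
ok = np.allclose(real_vec(W @ y), psi(W) @ real_vec(y))
```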

VII. STATIONARY POINT CHARACTERIZATION FOR COMPLEX SIGNALS

In this section, we provide a stationary point analysis for

the BCA algorithms considered for complex sources in the

previous section.

A. Objective Function $J_{c1}(\psi(W))$
Through the centralized choice of subgradients as proposed before, the update rule in terms of $\psi(G)$ can be shown to be equal to
$$\psi(G^{(t+1)}) = \psi(G^{(t)}) + \mu^{(t)}\Big[\psi(G^{(t)})^{-T} - \sum_{m=1}^{2p} \frac{1}{\hat{R}_m(\grave{Z}^{(t)})}\, e_m\, \mathrm{sign}\{\psi(G^{(t)})_{m,:}\}\,(U_T - L_T)\Big]\psi(H)^T\psi(H), \qquad (94)$$


where
$$U_T = \begin{bmatrix} U_R & 0 \\ 0 & U_I \end{bmatrix}, \quad L_T = \begin{bmatrix} L_R & 0 \\ 0 & L_I \end{bmatrix},$$
and
$$U_R = \mathrm{diag}(\max(\Re(s_1)), \ldots, \max(\Re(s_p))), \qquad (95)$$
$$L_R = \mathrm{diag}(\min(\Re(s_1)), \ldots, \min(\Re(s_p))), \qquad (96)$$
$$U_I = \mathrm{diag}(\max(\Im(s_1)), \ldots, \max(\Im(s_p))), \qquad (97)$$
$$L_I = \mathrm{diag}(\min(\Im(s_1)), \ldots, \min(\Im(s_p))). \qquad (98)$$

Therefore, the stationary points satisfy
$$\hat{D}(\grave{Z})^{-1}\,\mathrm{sign}\{\psi(G)_*\}\,(U_T - L_T)\,\psi(G)_*^T = I, \qquad (99)$$
where $\hat{D}(\grave{Z}) = \mathrm{diag}(\hat{R}_1(\grave{Z}), \ldots, \hat{R}_{2p}(\grave{Z}))$. Using the definitions $Q = \psi(G)_*(U_T - L_T)$ and $\tilde{Q} = \mathrm{diag}(\|Q_{1,:}\|_1, \ldots, \|Q_{2p,:}\|_1)^{-1}Q$, we can rewrite the stationary point condition as
$$\mathrm{sign}\{\tilde{Q}\}\,\tilde{Q}^T = I. \qquad (100)$$

Similar to the real case, we first show that the perfect separators are the global maxima of the objective function. We then prove that the other stationary points of the algorithm (94) are saddle points.
• Perfect Separators: We note that in this case, due to the structure of $\psi(G)$, the positions of the non-zero values of $\tilde{Q}_{:,1:p}$ suffice to determine the positions of the non-zero values of $\tilde{Q}$. If $\tilde{Q}_{1:p,:} = P\,\mathrm{diag}(\sigma)$, then $\mathrm{sign}\{\tilde{Q}\} = \tilde{Q}$, therefore
$$\mathrm{sign}\{\tilde{Q}\}\,\tilde{Q}^T = \tilde{Q}\tilde{Q}^T = I. \qquad (101)$$
This yields $G_* = DP$, where $D_{ii} = \alpha_i e^{j\pi k_i/2}$, $\alpha_i \in \mathbb{R}$, $k_i \in \mathbb{Z}$, $i = 1, \ldots, p$.

Theorem 3: If a stationary point of the algorithm (94) does not belong to the set of perfect separators, then it is a saddle point.
Proof: Noting $J_{c1}(\tilde{Q}) = |\det(\tilde{Q})|$ and $Q = \psi(G)_*(U_T - L_T)$, we can directly adapt the proof of Theorem 1. Hence all $G$ matrices satisfying (100) are saddle points if they are not perfect separators.

B. Objective Function $J_{c2,1}(\psi(W))$
Following similar steps as in the previous subsection, we can show that the stationary points of the $J_{c2,1}(\psi(W))$ objective satisfy
$$\frac{2p}{\|\hat{R}(\grave{Z})\|_1}\,\mathrm{sign}\{\psi(G)_*\}\,(U_T - L_T)\,\psi(G)_*^T = I. \qquad (102)$$
Using the normalization $Q = \psi(G)_*(U_T - L_T)$ and the facts $\mathrm{sign}\{\psi(G)_*\} = \mathrm{sign}\{Q\}$ and $\|\hat{R}(\grave{Z})\|_1 = \sum_{m=1}^{2p}\|Q_{m,:}\|_1$, we obtain
$$\mathrm{sign}\{Q\}\,Q^T = \left(\frac{\sum_{m=1}^{2p}\|Q_{m,:}\|_1}{2p}\right) I. \qquad (103)$$

Similar to the real case, this implies $\|Q_{1,:}\|_1 = \ldots = \|Q_{2p,:}\|_1$. Therefore, the normalized version of $Q$ can be obtained as $\tilde{Q} = \frac{1}{\|Q_{1,:}\|_1}Q$, from which the stationary point condition can be rewritten as
$$\mathrm{sign}\{\tilde{Q}\}\,\tilde{Q}^T = I. \qquad (104)$$
Since we have reached the same condition as for the previous objective, all the results regarding $\tilde{Q}$ apply in this case too. However, due to the difference in the normalization, the corresponding $Q$ matrix can only be obtained by a scalar scaling of $\tilde{Q}$.

C. Objective Functions $J_{c2,2}(\psi(W))$ and $J_{c2,\infty}(\psi(W))$
Similar to the real case, we can show that the same conclusions are reached as for the $J_{c2,1}(\psi(W))$ objective.

VIII. NUMERICAL EXAMPLES

In this section, we present numerical examples illustrating

the convergence of the algorithms to a global maximum of the

objective functions regardless of the choice of initial seeds.

A. Close to Ideal Setting
We consider the following setup. We generate the sources through the Copula-t distribution with four degrees of freedom, a convenient tool for generating vectors with controlled correlation. The correlation matrix of the sources $R_s$ is Toeplitz with first row $[1\ \rho_s\ \ldots\ \rho_s^{p-1}]$, where the correlation parameter varies in the range 0 to 1. Here, we consider a scenario with 3 sources and 5 mixtures, where the sample size is 20000. We set $\rho_s = 0.5$. In each simulation, we start with a random initial separator matrix whose coefficients are generated from an i.i.d. Gaussian distribution.
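One way to realize the Copula-t sources (a sketch under the assumption of uniform marginals, which the text does not specify) is to draw correlated multivariate-t samples and pass them through the t CDF, which yields bounded dependent sources:

```python
import numpy as np
from scipy import stats

def copula_t_sources(p, N, rho, nu=4, seed=0):
    """Bounded dependent sources from a t-copula with nu degrees of freedom
    and a Toeplitz correlation matrix with first row [1, rho, ..., rho^(p-1)]."""
    rng = np.random.default_rng(seed)
    R = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    L = np.linalg.cholesky(R)
    Z = L @ rng.standard_normal((p, N))          # correlated Gaussian samples
    chi = rng.chisquare(nu, size=N)
    T = Z / np.sqrt(chi / nu)                    # multivariate-t samples
    return stats.t.cdf(T, df=nu)                 # bounded sources in [0, 1]

S = copula_t_sources(p=3, N=20000, rho=0.5)
```

The resulting sources are bounded in $[0, 1]$ and mutually correlated, as required by the experiment.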

Fig. 4 shows the output total Signal energy to total Interference energy (over all outputs) Ratio (SIR) obtained for the BCA algorithm (corresponding to $J_1$) versus the iterations, for 50 random initial separator matrices. In these experiments, we used $\mu^{(t)} = \frac{0.5}{t+1}$ as the step size rule, which satisfies the zero-limit and divergent-sum properties.

[Fig. 4: Convergence curves for the BCA algorithm for different random initial separator matrices (SIR in dB versus iterations).]

We observe from the figure that the BCA algorithm converges to a global maximum irrespective of the choice of initial seeds.


[Fig. 5: SINR performances for 30dB Receiver SNR (SINR in dB versus block length): BCA ($J_{2,1}$), BCA with Range Estimate, and Complex FastICA.]

B. Deviation from the Ideal
The following two factors impact the validity of the assumptions used in the analysis, and therefore the corresponding theoretical results:
• Finite Sample Effects: If the number of samples used by the algorithm is not sufficiently large (as in the previous example), the assumption (A1) may not hold.
• Noisy Observations: The BCA algorithms and their convergence analysis assume noise-free observations. The presence of noise would have an impact especially in determining the true ranges of the separator outputs.

In order to investigate the impact of these factors, we consider a practical digital communication scenario (similar to the example in [21]):
• There are 8 co-channel transmitters: four of them use the 4-QAM constellation and the other four use the 16-QAM constellation.
• Their signals are received at a base station with 16 antennas.
• The channel is flat fading, i.e., it has no memory.

For this scenario, we consider high (30dB) and moderate (15dB) receiver Signal to Noise Ratio (SNR) cases. We evaluate the Signal to Interference+Noise Ratio (SINR) performance of the BCA ($J_{2,1}$) algorithm in comparison with the complex FastICA approach. We consider different data lengths, starting from 100 samples.

Both the presence of noise and the availability of only short data records cause inaccuracies in determining the ranges of the separator outputs. For this reason, we consider a modification of the iterations of the $J_{2,1}$-based BCA algorithm, whose performance is to be empirically tested. In the modified BCA algorithm, the true maximum (minimum) points are determined by a certain neighborhood of the sample maximum (minimum). For this purpose, we modify the definitions of $K_{m,+}(Z)$ and $K_{m,-}(Z)$ as

• $\bar{K}_{m,+}(Z) = \{k : z_m(k) \geq \alpha\,\hat{u}_m(Z)\}$, which is the set of index points for which the $m$th separator output $z_m(k)$ is located in the $\alpha$-fractional neighborhood of the sample maximum $\hat{u}_m$,

[Fig. 6: SINR performances for 15dB Receiver SNR (SINR in dB versus block length): BCA ($J_{2,1}$), BCA with Range Estimate, and Complex FastICA.]

• $\bar{K}_{m,-}(Z) = \{k : z_m(k) \leq \alpha\,\hat{l}_m(Z)\}$, which is the set of index points for which the $m$th separator output $z_m(k)$ is located in the $\alpha$-fractional neighborhood of the sample minimum $\hat{l}_m$.

Here $\alpha$ is a positive quantity whose value is close to, but less than or equal to, 1. Its value is tuned based on the noise variance level. This modification is along the same lines as the approach proposed in [13] and used in [30], where the output range is estimated based on the $h$ largest and the $h$ smallest sorted output components.

Given these index set definitions, modified BCA algorithms can be obtained by replacing $b_m$ in (34) with
$$b_m = \sum_{k \in \bar{K}_{m,+}(Z)} \lambda_{m,+}(k)\,y(k) - \sum_{k \in \bar{K}_{m,-}(Z)} \lambda_{m,-}(k)\,y(k), \qquad (105)$$
where we choose $\lambda_{m,+}(k) = \frac{1}{|\bar{K}_{m,+}|}$ and $\lambda_{m,-}(k) = \frac{1}{|\bar{K}_{m,-}|}$ for the current numerical example. Furthermore, the range is replaced by the estimate
$$\hat{R}_m(Z) = \frac{1}{|\bar{K}_{m,+}|}\sum_{k \in \bar{K}_{m,+}(Z)} z_m(k) - \frac{1}{|\bar{K}_{m,-}|}\sum_{k \in \bar{K}_{m,-}(Z)} z_m(k). \qquad (106)$$
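For a single real-valued output, the modified index sets and the range estimate (106) can be sketched as follows (the assumption that $\hat{u}_m > 0 > \hat{l}_m$, as for zero-mean constellations, is implicit in the $\alpha$-scaling):

```python
import numpy as np

def range_estimate(zm, alpha=0.95):
    """Robust range estimate (106): difference of the averages over the
    alpha-fractional neighborhoods of the sample maximum and minimum."""
    u_hat, l_hat = zm.max(), zm.min()
    K_plus = zm >= alpha * u_hat     # indices near the sample maximum
    K_minus = zm <= alpha * l_hat    # indices near the sample minimum
    return zm[K_plus].mean() - zm[K_minus].mean()

zm = np.array([1.0, 0.97, -1.0, -0.99, 0.2, -0.5])
R_hat = range_estimate(zm, alpha=0.95)   # -> 1.98
```

Averaging over the neighborhoods, rather than taking the single extreme samples, reduces the sensitivity of the range estimate to noise.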

In Fig. 5 and Fig. 6, the Signal-to-Interference-plus-Noise-Ratio (SINR) performances of the original BCA algorithm based on the objective function $J_{2,1}$ and of its modification based on the range estimate are shown, for receiver SNR levels of 30dB and 15dB, respectively. We also include the performance of


[Fig. 7: Image Source Separation Example. (a) Sources with Gaussian Noise (SNR = 15dB); (b) Mixtures; (c) FastICA Algorithm; (d) Modified BCA ($J_{2,1}$) with $\alpha = 0.95$.]

the complex FastICA algorithm [31] for comparison. It can be observed from these figures that
• The performance of the modified BCA algorithm with the range estimate is better than that of the original BCA algorithm. The performance gain is more pronounced in the high noise (low SNR) case.
• In both cases, increasing the number of samples improves the performance, as the probability of satisfying assumption (A1) increases with increasing data lengths.

As the final example, we consider the problem of separating images, along with a perturbation of the sources that violates the special boundedness assumption. For this purpose, similar to [30], we use 18 natural images from the Berkeley segmentation dataset and benchmark [32]. All images are converted to gray level and cropped to 200 × 200 size. In each experiment, 6 images are randomly selected and corrupted with independent identically distributed (i.i.d.) Gaussian noise. The resulting noisy sources are more likely to violate assumption (A1) with increasing noise levels. These six sources are mixed with a 6 × 6 matrix composed of i.i.d. uniform variables with range [0, 1].

Fig. 7(a) illustrates the 6 noisy sources for an SNR level of 15dB. Fig. 7(b) shows the corresponding mixtures and Fig. 7(c) the FastICA outputs. Finally, Fig. 7(d) shows the outputs of the modified BCA ($J_{2,1}$) algorithm with $\alpha = 0.95$. It is clear from these figures that the FastICA outputs still look scrambled, which can be attributed to the correlations among the original source images. The BCA algorithm achieves visually satisfactory performance.

This experiment is repeated for different SNR levels, with 50 experiments for each SNR choice. In each experiment, we calculate the Signal to Interference Ratio (SIR) for all separator outputs. Fig. 8 shows the average SIR levels for the original BCA algorithm, the modified BCA algorithm (with $\alpha = 0.95$) and the FastICA algorithm, as a function of SNR. It is clear that for large noise levels the modified BCA algorithm outperforms the original BCA algorithm; however, this trend is reversed with increasing SNR. This is consistent with the expectation that, at high noise levels, assumption (A1) is more likely to be violated and the original BCA algorithm's performance degrades. The use of the modified BCA algorithm can reduce the performance degradation in such a scenario.

[Fig. 8: Image separation SIR performance as a function of SNR: BCA ($J_{2,1}$), Modified BCA ($J_{2,1}$, $\alpha = 0.95$), and FastICA.]

IX. CONCLUSION
This article offers a stationary point characterization for the instantaneous BCA algorithms introduced in [21], for both real and complex algorithm iterations. As an important result, it is shown that the corresponding stationary points are either perfect separators, corresponding to the global maxima of the objectives, or saddle points. Therefore, these stationary points are free of false local maxima or minima at which the algorithm could get stuck. The saddle points are not stable, and small perturbations can lead the algorithm search away from such points. This is a powerful global result, especially considering the fact that the corresponding BCA objectives are non-convex and non-smooth functions. The numerical examples provided in the previous section also support the behavior captured by the analytical results. Furthermore, we also provided a modification of the BCA algorithms in [21] by appending range estimation, similar to [13], to address deviations from the ideal assumptions in the form of noisy observations and short data records. The empirical results confirm the performance gain of this modification, especially against noise effects.

APPENDIX
A. Proof of the simplification of the source dependent term
In this section, we prove that the term
$$[H^{\dagger} b_m^{(t)}]^T = [H^{\dagger} c_{m,+}^{(t)}]^T - [H^{\dagger} c_{m,-}^{(t)}]^T \qquad (107)$$
simplifies to
$$[H^{\dagger} b_m^{(t)}]^T = \mathrm{sign}\{G^{(t)}_{m,:}\}\,(U-L), \qquad (108)$$
which can be written in scalar form as
$$[H^{\dagger} b_m^{(t)}]_n = \mathrm{sign}\{G^{(t)}_{m,n}\}\,(u_n - l_n), \qquad (109)$$
for $n = 1, \ldots, p$.

Given the definition of the centers $c_{y,+}^{(t)}$ and $c_{y,-}^{(t)}$ in (50) and (51), we can use them to express the result in terms of the sources as
$$H^{\dagger} c_{y,+}^{(t)} = \sum_{k \in K^{\mathrm{act}}_{m,+}(Z^{(t)})} \frac{H^{\dagger} y(k)}{|K^{\mathrm{act}}_{m,+}(Z^{(t)})|} = \sum_{k \in K^{\mathrm{act}}_{m,+}(Z^{(t)})} \frac{s(k)}{|K^{\mathrm{act}}_{m,+}(Z^{(t)})|}, \qquad (110)$$
and
$$H^{\dagger} c_{y,-}^{(t)} = \sum_{k \in K^{\mathrm{act}}_{m,-}(Z^{(t)})} \frac{H^{\dagger} y(k)}{|K^{\mathrm{act}}_{m,-}(Z^{(t)})|} = \sum_{k \in K^{\mathrm{act}}_{m,-}(Z^{(t)})} \frac{s(k)}{|K^{\mathrm{act}}_{m,-}(Z^{(t)})|}. \qquad (111)$$

The interpretation of these terms is the following. Since there are more sensors than sources, the pseudoinverse $H^{\dagger}$ of the mixing matrix can be used to linearly map, without loss of information, the signal component of the observations into the corresponding sources. In this way, the centers of the opposite (or inverted) faces of the hyper-parallelepiped of the observations, $c_{y,+}^{(t)}$ and $c_{y,-}^{(t)}$, are respectively mapped to $H^{\dagger} c_{y,+}^{(t)}$ and $H^{\dagger} c_{y,-}^{(t)}$, the centers of the corresponding opposite faces of the bounding hyper-rectangle of the sources.

From the definitions (43) and (45), it is straightforward to observe that the uniform combination in (110) simplifies to
$$[H^{\dagger} b_m^{(t)}]_n = u_n - l_n \quad \forall n \in \mathcal{I}^{(t)}_{m,+}, \qquad (112)$$
and also that (111) simplifies to
$$[H^{\dagger} b_m^{(t)}]_n = l_n - u_n \quad \forall n \in \mathcal{I}^{(t)}_{m,-}. \qquad (113)$$
In the third case, since $l_n \leq s_n(t) \leq u_n$, we can only obtain from (43) and (45) the interval
$$[H^{\dagger} b_m^{(t)}]_n \in (u_n - l_n)[-1, 1] \quad \forall n \in \mathcal{I}^{(t)}_{m,0}.$$

However, in order to determine the exact value of $[H^{\dagger} b_m^{(t)}]_n$ for $n \in \mathcal{I}^{(t)}_{m,0}$, we can resort to the following fact. Since $H^{\dagger} b_m^{(t)}$ in (107) is interpreted as the difference between the centers of opposite faces of the hyper-rectangle of the sources, geometrically, it should be orthogonal to the subspace where these faces have been defined (i.e., orthogonal to the unit vectors $e_n$, $\forall n \in \mathcal{I}^{(t)}_{m,0}$). This enforces the final simplification
$$[H^{\dagger} b_m^{(t)}]_n = \langle H^{\dagger} b_m^{(t)}, e_n\rangle = 0, \quad \forall n \in \mathcal{I}^{(t)}_{m,0}. \qquad (114)$$

Then, we can summarize the three previous results (112)-(114) into the equation
$$[H^{\dagger} b_m^{(t)}]_n = \mathrm{sign}\{G^{(t)}_{m,n}\}\,(u_n - l_n),$$
for $n = 1, \ldots, p$, which proves the desired equality in (109).
Figure 9 illustrates the orthogonality of $H^{\dagger} b_1$ to the corresponding faces $F_{1,+}(Z)$ and $F_{1,-}(Z)$, for the example in Figure 2. For this example, $H^{\dagger} b_1 = (u_1 - l_1)e_1$ and $\mathcal{I}_{1,0} = \{2, 3\}$, which confirms the orthogonality property put forward in (114).


Fig. 9: Illustration of the orthogonality of $H^{\dagger} b_m$ to the faces $F_{m,+}(Z)$ and $F_{m,-}(Z)$.

B. Proof of Theorem 1
We note that $G$ being a perfect separator matrix implies that $\tilde{Q}$ is a perfect separator matrix, and vice versa. Therefore, it is equivalent to show that all $\tilde{Q}$ matrices satisfying (73) are saddle points if they are not perfect separators.
From (17), the cost function in terms of $\tilde{Q}$ is equivalent to
$$J_1(\tilde{Q}) = |\det(\tilde{Q})|. \qquad (115)$$
From Lemma 1, $\tilde{Q}$ can be permuted such that the upper-left 2-by-2 sub-matrix of its sign matrix becomes $\Lambda_1$ or $\Lambda_{-1}$. We define the permuted matrix as $\breve{Q}$ and, w.l.o.g., we assume $\mathrm{sign}\{\breve{Q}_{1:2,1:2}\} = \Lambda_1$. We also observe that $J(\tilde{Q}) = J(\breve{Q})$.

We partition
$$\breve{Q} = \begin{bmatrix} \breve{Q}^{(a)} & \breve{Q}^{(b)} \\ \breve{Q}^{(c)} & \breve{Q}^{(d)} \end{bmatrix}, \quad \breve{Q}^{(a)} = \breve{Q}_{1:2,1:2}, \quad \breve{Q}^{(b)} = \breve{Q}_{1:2,3:p}, \quad \breve{Q}^{(c)} = \breve{Q}_{3:p,1:2}, \quad \breve{Q}^{(d)} = \breve{Q}_{3:p,3:p}.$$
Note that $\mathrm{sign}\{\breve{Q}^{(a)}\} = \Lambda_1$.

The proof is based on the use of the Schur complement of $\breve{Q}^{(d)}$, which is defined as
$$\Delta = \breve{Q}^{(a)} - \breve{Q}^{(b)}\big(\breve{Q}^{(d)}\big)^{-1}\breve{Q}^{(c)}. \qquad (116)$$
Therefore, we first need to show that $\breve{Q}^{(d)}$ is invertible: from (73), we have
$$\begin{bmatrix} \breve{Q}^{(a)} & \breve{Q}^{(b)} \\ \breve{Q}^{(c)} & \breve{Q}^{(d)} \end{bmatrix} \begin{bmatrix} \mathrm{sign}\{\breve{Q}^{(a)}\}^T & \mathrm{sign}\{\breve{Q}^{(c)}\}^T \\ \mathrm{sign}\{\breve{Q}^{(b)}\}^T & \mathrm{sign}\{\breve{Q}^{(d)}\}^T \end{bmatrix} = I,$$
which yields
$$\breve{Q}^{(c)}\,\mathrm{sign}\{\breve{Q}^{(a)}\}^T + \breve{Q}^{(d)}\,\mathrm{sign}\{\breve{Q}^{(b)}\}^T = 0. \qquad (117)$$

If $\breve{Q}^{(d)}$ is singular, then there exists a non-zero vector $x \in \mathbb{R}^{p-2}$ such that $x^T\breve{Q}^{(d)} = 0$. Therefore,
$$x^T\breve{Q}^{(c)}\,\mathrm{sign}\{\breve{Q}^{(a)}\}^T = 0, \qquad (118)$$
which yields $x^T\breve{Q}^{(c)} = 0$, since $\mathrm{sign}\{\breve{Q}^{(a)}\} = \Lambda_1$ is non-singular. Defining the non-zero vector $\hat{x} \in \mathbb{R}^p$ such that $\hat{x} = [0\ 0\ x^T]^T$, we have
$$\begin{bmatrix} \breve{Q}^{(a)T} & \breve{Q}^{(c)T} \\ \breve{Q}^{(b)T} & \breve{Q}^{(d)T} \end{bmatrix}\hat{x} = 0. \qquad (119)$$
This yields a contradiction, since $\breve{Q}^T$ is non-singular. Therefore, $\breve{Q}^{(d)}$ is non-singular.

We now prove that $\breve{Q}$ is a saddle point. Using the Schur complement, we have
$$J_1(\breve{Q}) = \big|\det\big(\breve{Q}^{(d)}\big)\det(\Delta)\big|. \qquad (120)$$
We note that
$$\Delta^{-1} = [\breve{Q}^{-1}]_{1:2,1:2} = [\mathrm{sign}\{\breve{Q}\}^T]_{1:2,1:2} = \Lambda_1. \qquad (121)$$
Hence we obtain $\Delta = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & -0.5 \end{bmatrix}$. In order to show that $\breve{Q}$ is a saddle point, we will perturb it in two different directions and show that the objective increases in one direction while it decreases in the other. We note that the perturbations are chosen in such a way that they preserve the property that the rows of the perturbed matrices have unity $l_1$ norms, which is due to (72).

• If we perturb the $\breve{Q}^{(a)}$ matrix with
$$E_1 = \begin{bmatrix} \epsilon & -\epsilon \\ -\epsilon & -\epsilon \end{bmatrix},$$
with a sufficiently small $\epsilon > 0$, and do not perturb the remaining entries of $\breve{Q}$, then
– the $l_1$ norm of every row of the perturbed $\breve{Q}$ remains equal to unity: this perturbation affects only the first two rows of $\breve{Q}$, and for these rows only the upper-left $2 \times 2$ block is affected, which becomes
$$\begin{bmatrix} \breve{Q}^{(a)}_{1,1} + \epsilon & \breve{Q}^{(a)}_{1,2} - \epsilon \\ \breve{Q}^{(a)}_{2,1} - \epsilon & \breve{Q}^{(a)}_{2,2} - \epsilon \end{bmatrix}. \qquad (122)$$
Due to the fact that $\mathrm{sign}\{\breve{Q}^{(a)}\} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$, the $l_1$ norms of the rows of both the perturbed $\breve{Q}^{(a)}$ matrix and the corresponding perturbed $\breve{Q}$ matrix remain unchanged;
– $\det(\breve{Q}^{(d)})$ does not change and $|\det(\Delta)|$ becomes $|\det(\Delta)| + 2\epsilon^2$. Hence, the objective value at the perturbed matrix is strictly greater than $J_1(\breve{Q})$.

• If we now perturb the $\breve{Q}^{(a)}$ matrix with
$$E_2 = \begin{bmatrix} \epsilon & -\epsilon \\ \epsilon & \epsilon \end{bmatrix},$$
with a sufficiently small $\epsilon > 0$, then
– the $l_1$ norm of every row of the perturbed $\breve{Q}$ remains equal to unity, which can be shown using the same line of arguments as in the previous case;
– $\det(\breve{Q}^{(d)})$ does not change and $|\det(\Delta)|$ becomes $|\det(\Delta)| - 2\epsilon^2$. Hence, the objective value at the perturbed matrix is strictly smaller than $J_1(\breve{Q})$.

As a result, we can conclude that if a stationary point of the algorithm (55) (or of the algorithm (59)) does not belong to the set of perfect separators, then it is a saddle point.
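The two perturbation directions can be verified numerically on the smallest instance $p = 2$, where $\breve{Q} = \Delta$; $|\det|$ increases along $E_1$ and decreases along $E_2$, while the unit row $l_1$ norms are preserved:

```python
import numpy as np

eps = 1e-3
Delta = np.array([[0.5, 0.5],
                  [0.5, -0.5]])                 # Delta at a non-separating stationary point
E1 = eps * np.array([[1., -1.], [-1., -1.]])    # ascent direction
E2 = eps * np.array([[1., -1.], [1., 1.]])      # descent direction

d0 = abs(np.linalg.det(Delta))
d1 = abs(np.linalg.det(Delta + E1))             # = d0 + 2*eps**2
d2 = abs(np.linalg.det(Delta + E2))             # = d0 - 2*eps**2

row_l1 = lambda M: np.abs(M).sum(axis=1)        # row l1 norms (all remain 1)
```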


REFERENCES

[1] Pierre Comon and Christian Jutten, Handbook of Blind Source Separation: Independent Component Analysis and Applications, Academic