Stationary Point Characterization for a Class of
BCA Algorithms
Huseyin A. Inan, Student Member, IEEE, Alper T. Erdogan, and Sergio Cruces, Senior Member, IEEE
Abstract—Bounded Component Analysis (BCA) is a recently introduced approach that includes Independent Component Analysis (ICA) as a special case under the assumption of source boundedness. In this article, we provide a stationary point analysis for the recently proposed instantaneous BCA algorithms, which are capable of separating not only independent but also dependent, even correlated, sources from their mixtures. The stationary points are identified and characterized as either perfect separators, which are the global maxima of the proposed optimization scheme, or saddle points. The important result emerging from the analysis is that there are no local optima that can prevent the proposed BCA algorithms from converging to perfect separators.
Index terms— Bounded Component Analysis, Independent
Component Analysis, Dependent Source Separation
I. INTRODUCTION
Blind Source Separation (BSS) is one of the most popular research areas in the fields of signal processing and machine learning, with a wide variety of applications [1]. The intensive use of BSS across applications from a diverse set of disciplines owes to its blindness property. However, this blindness property, i.e., the lack of information about the mixing system, makes the BSS problem difficult to solve. The challenge due to the absence of training data and relational statistical information is commonly handled by exploiting some side information/assumptions about the system.
The most widely used technique for the BSS problem
relates to the assumption of mutual statistical independence of
sources. The approach based on the independence assumption
is known as Independent Component Analysis (ICA) and it is
the most popular and successful BSS method [1]–[3]. Besides
Independent Component Analysis (ICA), there are various
BSS methods introduced by exploiting different data model
assumptions such as nonnegative matrix factorization (NMF)
[4], sparsity (e.g., [5]), and special constant modulus or finite
alphabet structure of communications signals (e.g. [6]–[8]).
In practical BSS applications, the source signals are bounded in amplitude. Puntonet et al. utilized the source boundedness in [9], which can be regarded as the pioneering work in this context. Along with the assumption of independence of the sources, boundedness has been exploited in some recent ICA approaches [10]–[18].
Huseyin A. Inan is with the Electrical Engineering Department, Stanford University, CA, 94305, USA (e-mail: hinan1@stanford.edu).
Alper T. Erdogan is with the Electrical-Electronics Engineering Department, Koc University, Sariyer, Istanbul, 34450, Turkey (e-mail: alperdogan@ku.edu.tr). This work is supported in part by TUBITAK 112E057 project.
Sergio Cruces is with the Department of Teoría de la Señal y Comunicaciones, Universidad de Sevilla, 41092-Sevilla, Spain (e-mail: sergio@us.es).
Recently, it has been shown in [19] that when the sources are known to be bounded, the independence assumption can be relaxed to a domain separability assumption, which is stated as follows: (the convex hull of) the support of the joint density of the sources can be written as the Cartesian product of (the convex hulls of) the individual source supports. We note that this is a necessary condition for independence; however, instead of assuming the factorability of the joint pdf into the product of its marginals, it only assumes that the extreme points are included in the support of the joint pdf. Therefore, domain separability is a much weaker condition than independence. By removing the independence assumption, this new approach, referred to as Bounded Component Analysis (BCA), enables the development of methods for the separation of independent and/or dependent sources.
In this new context, a blind source extraction algorithm has
been proposed in [19], and a deflationary algorithm in [20].
Recently, [21] introduced a geometric BCA framework and
proposed algorithms that are able to separate both independent
and dependent (including correlated) bounded sources from
their instantaneous mixtures. This approach is based on the maximization of the relative sizes of two geometric objects, namely the principal hyper-ellipsoid and the bounding hyper-rectangle of the separator outputs, and the corresponding optimization setting produces perfect separators.
More recently, a convolutive BCA approach for wide sense stationary (dependent or independent) sources was introduced in [22]. It was later extended in [23] to a deterministic optimization setting for the convolutive BCA problem that allows the sources to be potentially nonstationary. Moreover, various geometric approaches have been introduced in the context of hyper-spectral imaging in [24], [25] and [26], which consider the minimum volume of a simplex circumscribing the data space.
In this article, we provide stationary point characterization results for the BCA algorithms introduced in [21], which are capable of separating independent and/or dependent, even correlated, real and complex sources. We note that we do not assume that the sources are independent or uncorrelated. Under the assumption of source boundedness, the algorithms work for both independent and dependent, even correlated, sources. Despite the difficulty of the convergence analysis due to the non-convex and non-smooth nature of the corresponding objectives, we prove that the stationary points of these BCA algorithms correspond either to perfect separators, which are global maximizers of the introduced optimization setting, or to saddle points. This is a remarkable and important result towards the complete characterization of the global convergence of a parallel source separation algorithm capable of separating both dependent and independent sources.
The organization of the article is as follows. In Section II, we introduce the setup considered throughout the article. In Section III, we provide a brief summary of the BCA approach introduced in [21], and the corresponding iterative algorithms for real sources are provided in Section IV. We present the stationary point characterization results in Section V. The approach and the corresponding iterative algorithms for complex sources in [21] are summarized in Section VI, and the stationary point characterization results for these algorithms are presented in Section VII. Section VIII presents numerical examples illustrating the convergence of the algorithms to the global maximum of the objective functions regardless of the choice of initial seeds. In the same section, we also provide an algorithm modification addressing the case of deviation from the ideal assumptions, together with an empirical study of its performance. Finally, Section IX concludes the article.
Notation: Let A ∈ R^{p×q} and C ∈ C^{p×q} be arbitrary real and complex matrices. Then:

• C_{m,:} (C_{:,m}): the mth row (column) of C,
• P{A} (N{A}): converts the negative (positive) components of A to zero while preserving the others,
• sign{A}: replaces the negative entries of A with -1 and the positive entries of A with 1,
• R(C) (I(C)): extracts the real (imaginary) part of C,
• index m: used for (source, output) vector components,
• index k: sample index,
• index t: iteration index,
• e_m: the vector with all zeros except a 1 at index m,
• p (q): number of sources (mixtures),
• s: p-dimensional source vector,
• H: q×p-dimensional mixing matrix,
• y: q-dimensional mixture vector,
• W: p×q-dimensional separator matrix,
• z: p-dimensional separator output vector,
• G: p×p-dimensional overall mapping (from sources to separator outputs),
• Q: p×p-dimensional overall mapping (from unity-range normalized sources to separator outputs),
• û_m(Z): the maximum of the mth separator output,
• l̂_m(Z): the minimum of the mth separator output,
• R̂_m(Z): the range of the mth separator output,
• R̂(Z): the separator output range vector (12) or, in matrix contexts, the separator output sample covariance matrix (13),
• K_{m,+}(Z) (K_{m,-}(Z)): the set of index locations where the maximum (minimum) is achieved for the mth separator output,
• λ: convex combination coefficient,
• M(Z): the set of separator output indexes for which the maximum range is achieved.
II. BOUNDED COMPONENT ANALYSIS SETUP

We assume a deterministic BCA setup consisting of p real sources, represented by the vector s(k) = [s_1(k) s_2(k) ... s_p(k)]^T, and we denote the corresponding set of unobservable source samples by

    S = {s(1), s(2), ..., s(N)}.    (1)

We assume that the sources are bounded in magnitude, and

    u = [max_k(s_1(k)) max_k(s_2(k)) ... max_k(s_p(k))]^T,    (2)
    l = [min_k(s_1(k)) min_k(s_2(k)) ... min_k(s_p(k))]^T,    (3)

are the vectors containing the maximum (minimum) values of the components of the source samples in the set S. We also define the diagonal matrices

    U = diag(u),    (4)
    L = diag(l),    (5)

containing the maximum (minimum) source values on their diagonals.
Practical digital signals naturally satisfy such a magnitude boundedness assumption, and it is a perfect fit especially for digital communication symbols [18].

We point out that the sources are not assumed to be independent or uncorrelated. Instead, we assume that the sources satisfy BCA's domain separability assumption [19], which is a weaker assumption than independence.
The sources are mixed by a memoryless system with transfer matrix H ∈ R^{q×p}, where we consider the (over)determined case, i.e., q ≥ p. Hence, the sources and the mixtures are related by

    y(k) = H s(k),  k = 1, 2, ..., N.    (6)

The goal is to obtain a separator matrix W ∈ R^{p×q} which produces the outputs

    z(k) = W y(k),  k = 1, 2, ..., N,    (7)

as potentially permuted and scaled versions of the original sources. We denote the overall mapping from the sources to the separator outputs by G = W H ∈ R^{p×p}, so that the relation between the sources and the outputs can be written as

    z(k) = G s(k),  k = 1, 2, ..., N.    (8)
For the output set Z = {z(1), ..., z(N)}, we define the following statistics, for m = 1, ..., p:

• The maximum of output component z_m:

    û_m(Z) = max_{k ∈ {1,...,N}} z_m(k),    (9)

• The minimum of output component z_m:

    l̂_m(Z) = min_{k ∈ {1,...,N}} z_m(k),    (10)

• The range of output component z_m:

    R̂_m(Z) = û_m(Z) − l̂_m(Z),    (11)

• The output range vector:

    R̂(Z) = [R̂_1(Z) R̂_2(Z) ... R̂_p(Z)]^T,    (12)

• The output sample covariance:

    R̂(Z) = (1/N) Σ_{k=1}^N z(k) z(k)^T − μ̂(Z) μ̂(Z)^T,    (13)

where μ̂(Z) = (1/N) Σ_{k=1}^N z(k).
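The statistics above are straightforward to compute from a sample matrix. The sketch below evaluates (9)-(13) with NumPy on hypothetical toy data (the data and variable names are ours, not from the paper's experiments):

```python
import numpy as np

# Hypothetical toy output set Z: p = 3 components, N = 200 samples.
rng = np.random.default_rng(0)
Z = rng.uniform(-1.0, 1.0, size=(3, 200))   # row m holds z_m(1), ..., z_m(N)

u_hat = Z.max(axis=1)                       # maxima \hat{u}_m(Z), Eq. (9)
l_hat = Z.min(axis=1)                       # minima \hat{l}_m(Z), Eq. (10)
R_hat = u_hat - l_hat                       # ranges \hat{R}_m(Z), Eqs. (11)-(12)

mu_hat = Z.mean(axis=1, keepdims=True)      # sample mean \hat{mu}(Z)
cov_hat = (Z @ Z.T) / Z.shape[1] - mu_hat @ mu_hat.T   # covariance, Eq. (13)
```

Note that (13) is the biased (1/N-normalized) sample covariance, which coincides with centering the samples first and then averaging the outer products.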
In the next section, we provide a brief summary of the
instantaneous BCA approach introduced in [21].
III. GEOMETRIC APPROACH FOR BOUNDED COMPONENT ANALYSIS

The approach in [21] exploits two geometric objects, the principal hyper-ellipsoid and the bounding hyper-rectangle, defined on the set of output samples Z = {z(1), z(2), ..., z(N)}.
Fig. 1: Geometric Approach for Bounded Component Analysis.

Figure 1 provides three dimensional illustrations of the objects used by the geometric framework:

• In the source domain, on the left-hand side of the figure, the (blue) dots represent the source samples in S.
• In the separator output domain (on the right):
  – The (blue) dots represent the separator output samples in Z.
  – The (purple) box is the bounding hyper-rectangle, defined as

        B_z = {y ∈ R^p : l̂_m(Z) ≤ y_m ≤ û_m(Z), m = 1, ..., p},    (14)

  – The (red) ellipsoid is the principal hyper-ellipsoid corresponding to the separator output samples, defined as

        E_z = {q | (q − μ̂_z)^T R̂_z^{-1} (q − μ̂_z) ≤ 1}.    (15)
The separation problem is posed as the maximization of the relative sizes of these objects, where the size of the hyper-ellipsoid is chosen as its volume. When the size of the bounding hyper-rectangle is also chosen as its volume, the objective function becomes

    J_1(W) = Vol(E_z) / Vol(B_z)    (16)
           = C_p √(det(R̂(Z))) / ∏_{m=1}^p R̂_m(Z),    (17)

where

• C_p √(det(R̂(Z))) is the volume of the principal hyper-ellipsoid, with

    C_p = π^{p/2} / Γ(p/2 + 1),    (18)

  where Γ(·) is the Gamma function,
• ∏_{m=1}^p R̂_m(Z) is the volume of the bounding hyper-rectangle.
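As an illustration, the volume ratio (16)-(18) can be evaluated directly from the output samples. The following sketch (the function name and the toy data are our own) also exhibits the invariance of J_1 under diagonal rescaling of the outputs:

```python
import numpy as np
from math import gamma, pi

def bca_objective_J1(Z):
    """Evaluate J1 = Vol(principal hyper-ellipsoid) / Vol(bounding
    hyper-rectangle), Eqs. (16)-(18), from output samples Z (p x N)."""
    p, N = Z.shape
    ranges = Z.max(axis=1) - Z.min(axis=1)              # \hat{R}_m(Z)
    mu = Z.mean(axis=1, keepdims=True)
    cov = (Z @ Z.T) / N - mu @ mu.T                     # \hat{R}(Z), Eq. (13)
    C_p = pi ** (p / 2) / gamma(p / 2 + 1)              # Eq. (18)
    return C_p * np.sqrt(np.linalg.det(cov)) / np.prod(ranges)
```

Scaling component m by c multiplies both √(det(R̂(Z))) and R̂_m(Z) by |c|, so the ratio is unchanged; this is consistent with the arbitrary relative scalings of the global optima of J_1 discussed below (20).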
The following assumption is introduced regarding the hyper-rectangle:

Assumption (A1): The set S contains the vertices of its (non-degenerate) bounding hyper-rectangle.

This assumption may be regarded as a sample-oriented version of a classical set of BCA assumptions for the random vector of sources [19]:

1. Compactness and nondegeneracy of the sources: all the sources are non-degenerate^1 random variables of compact support.
2. Cartesian decomposition of the convex support of the sources: the minimum convex cover of the support of the random vector of sources can be decomposed as the Cartesian product of the individual convex supports of the sources.
These properties replace the hypothesis of mutual independence of the sources in ICA, which is no longer necessary. They jointly imply that the convex hull of the sources should be a bounded hyper-rectangle with positive volume.
Assumption A1 translates this requirement from the theoretical support set of the sources to the empirical support set of their samples. It is most suitable for samples drawn from continuous sub-Gaussian sources (with asymptotically large sample sizes) or for discrete sources such as digital communication signals.
Under assumption A1, we can write û_m(Z) = P{G_{m,:}} u + N{G_{m,:}} l and l̂_m(Z) = N{G_{m,:}} u + P{G_{m,:}} l, which further imply R̂_m(Z) = ||G_{m,:}(U − L)||_1. Thus, we can write the objective function J_1 in terms of G as

    J_1(G) = C_p |det(G)| √(det(R̂(S))) / ∏_{m=1}^p ||G_{m,:}(U − L)||_1.    (19)
In [21], it is proven that the global maxima of (17) correspond to perfect separators, i.e., G can be written as

    G = D P,    (20)

where D is a diagonal matrix with non-zero diagonal entries and P is a permutation matrix.
When the size of the bounding hyper-rectangle is instead chosen as a norm of its main diagonal, a family of alternative objective functions is obtained in the form

    J_{2,r}(W) = C_p √(det(R̂(Z))) / ||R̂(Z)||_r^p.    (21)

Here || · ||_r is the standard r-norm in R^p, applied to the range vector (12). It is shown in [21] that G is a global maximum of (21) if and only if it can be written in the form

    G = d P (U − L)^{-1} diag(σ),    (22)

where d is a non-zero scalar and σ ∈ {−1, 1}^p. In this case, all the members of the global optima set share the same relative source scalings, unlike the set of global optima for J_1, which has arbitrary relative scalings.
^1 A random variable is considered degenerate when the support of its p.d.f. consists of a single point.
IV. ITERATIVE BCA ALGORITHMS

The reference [21] also provides iterative algorithms for maximizing the objectives introduced in the previous section, whose convergence behavior is the focus of this article. For this purpose, the logarithms of the objectives

    log(J_1) = (1/2) log(det(R̂(Z))) − Σ_{m=1}^p log(R̂_m(Z)) + log(C_p),    (23)
    log(J_{2,r}) = (1/2) log(det(R̂(Z))) − p log(||R̂(Z)||_r) + log(C_p),    (24)

are used, where the rational expressions are converted into more convenient difference terms. We note that the second terms on the right-hand sides of both (23) and (24) contain the non-differentiable range operator. Therefore, the iterative update terms for W contain subgradient terms corresponding to this operator.
Before we provide the subgradient based iterative algorithms, we define the related notation as follows:

• W represents the separator matrix,
• z(k) represents the output vector with sample index k, which can be written as

    z(k) = W y(k),  k = 1, ..., N,    (25)

• Z = {z(1), ..., z(N)} is the set of separator outputs,
• K_{m,+}(Z) = {k : z_m(k) = û_m(Z)} is the set of index points of the mth separator output component (z_m(k)) for which the maximum value (û_m) is achieved. Informally, one can represent the set K_{m,+}(Z) in MATLAB with the array variable K_{m,+} and write

    [û_m, K_{m,+}] = max(z_m(:)),    (26)

• K_{m,−}(Z) = {k : z_m(k) = l̂_m(Z)} is the set of index points of the mth separator output component for which the minimum value (l̂_m) is achieved, i.e., in MATLAB notation,

    [l̂_m, K_{m,−}] = min(z_m(:)),    (27)

• The subdifferential set for the non-differentiable maximum operator û_m(Z) is defined as

    ∂û_m(Z) = Co{y(k) : k ∈ K_{m,+}(Z)},    (28)

  where Co is the convex hull operator,
• The subdifferential set for the non-differentiable minimum operator l̂_m(Z) is defined as

    ∂l̂_m(Z) = Co{y(k) : k ∈ K_{m,−}(Z)},    (29)

• The subdifferential set for the non-differentiable range operator R̂_m(Z) is defined as

    ∂R̂_m(Z) = Co{y_max − y_min : y_max ∈ ∂û_m(Z), y_min ∈ ∂l̂_m(Z)}.    (30)
Based on these definitions, the corresponding subgradients of the criteria are as follows [21]:

    ∇log J_1(W) = (W R̂(Y) W^T)^{-1} W R̂(Y) − Σ_{m=1}^p (1/R̂_m(Z)) e_m b_m^T,    (31)

and, for r = 1, 2,

    ∇log J_{2,r}(W) = (W R̂(Y) W^T)^{-1} W R̂(Y) − Σ_{m=1}^p (p R̂_m(Z)^{r−1} / ||R̂(Z)||_r^r) e_m b_m^T,    (32)

while, for r = ∞,

    ∇log J_{2,∞}(W) = (W R̂(Y) W^T)^{-1} W R̂(Y) − Σ_{m ∈ M(Z)} (p β_m / ||R̂(Z)||_∞) e_m b_m^T.    (33)
These subgradients depend on the following definitions:

• e_m is the vector with all zeros except a 1 at index m, i.e., [e_m]_j = δ_{mj},
• b_m is the difference of two convex combinations,

    b_m = Σ_{k ∈ K_{m,+}(Z)} λ_{m,+}(k) y(k) − Σ_{k ∈ K_{m,−}(Z)} λ_{m,−}(k) y(k).    (34)

  Here, {λ_{m,+}(k) : k ∈ K_{m,+}(Z)} is the set of convex combination coefficients used for combining the input vectors causing the maximum output, which satisfy, for m = 1, 2, ..., p,

    Σ_{k ∈ K_{m,+}(Z)} λ_{m,+}(k) = 1, with λ_{m,+}(k) ≥ 0, ∀k ∈ K_{m,+}(Z).    (35)

  Similarly, {λ_{m,−}(k) : k ∈ K_{m,−}(Z)} is the set of convex combination coefficients used for combining the input vectors causing the minimum output, which satisfy, for m = 1, 2, ..., p,

    Σ_{k ∈ K_{m,−}(Z)} λ_{m,−}(k) = 1, with λ_{m,−}(k) ≥ 0, ∀k ∈ K_{m,−}(Z).    (36)

• M(Z) is the set of indexes for which the peak range value is achieved, i.e.,

    M(Z) = {m : R̂_m(Z) = ||R̂(Z)||_∞},    (37)

• β_m = 1/|M(Z)|, where | · | denotes the cardinality of its argument.
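For the generic case in which each output maximum and minimum is attained at a single sample (so that K_{m,+}(Z) and K_{m,−}(Z) are singletons and b_m in (34) is unique), the subgradient (31) can be assembled as in the following sketch (function and variable names are ours):

```python
import numpy as np

def grad_log_J1(W, Y):
    """Subgradient (31) of log J1 at W, for the generic case in which each
    output maximum/minimum is attained at a unique sample index.
    W: (p, q) separator matrix, Y: (q, N) mixture samples."""
    N = Y.shape[1]
    Z = W @ Y
    mu = Y.mean(axis=1, keepdims=True)
    R_Y = (Y @ Y.T) / N - mu @ mu.T                     # \hat{R}(Y)
    grad = np.linalg.solve(W @ R_Y @ W.T, W @ R_Y)      # first term of (31)
    for m in range(W.shape[0]):
        k_max, k_min = Z[m].argmax(), Z[m].argmin()
        b_m = Y[:, k_max] - Y[:, k_min]                 # b_m of Eq. (34)
        grad[m] -= b_m / (Z[m, k_max] - Z[m, k_min])    # minus e_m b_m^T / R_m
    return grad
```

At differentiable points this expression can be checked against a finite-difference approximation of log J_1, since the argmax/argmin indices are locally constant there.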
A. The choice of the subgradient
For simplicity of exposition, in this subsection we focus the analysis on the criterion log J_1(W), although similar conclusions can be easily obtained for log J_{2,r}(W) and log J_{2,∞}(W).
As is evident from (19), the objective J_1 is almost everywhere differentiable, except on subspaces of zero volume measure where one or more elements of the overall transfer matrix G are exactly equal to zero. At the differentiable points, the subdifferential ∂log J_1(W) contains a single element, which necessarily coincides with the gradient of the criterion. In this situation, the subdifferential sets of the maximum ∂û_m(Z) and the minimum ∂l̂_m(Z) of the outputs are also of cardinality one. Therefore, the vector b_m in (34) is unique regardless of the choice of the convex combination coefficients λ_{m,+}(k) and λ_{m,−}(k) that respectively satisfy (35) and (36).
The probability of falling at random into the subset of non-differentiable points of the criterion is, in practice, negligible. However, for the sake of completeness, we also consider this case in our analysis.

At the non-differentiable points of the criterion, the cardinality of the subdifferential ∂log J_1(W) is strictly greater than one. Thus, we have the possibility to decide which subgradient to use for the ascent algorithm by specifying the coefficients of the convex combinations λ_{m,+}(k) and λ_{m,−}(k). The remainder of this subsection describes the centered choice of the subgradient within the subdifferential set and its direct computation from the samples of the observations. Readers interested only in the computation of the subgradient may skip the explanations in subitems 1) to 3) and proceed directly to 4), where a brief guideline of the required steps is presented.
1) Pruning the index sets for subdifferentials: For a separation matrix W^(t) at iteration t, the output vectors z^(t)(k) are computed using (25). After that, one can use (26) to determine K_{m,+}(Z^(t)), the set of indices that maximize the mth component of the output. We then determine K^uniq_{m,+}(Z^(t)), a pruned version of K_{m,+}(Z^(t)) that retains only those indices that do not point to replicated observation vectors. This can be implemented in MATLAB for arrays of indices K_{m,+} and K^uniq_{m,+} (containing the elements of their respective sets) with the command

    [aux, K^uniq_{m,+}] = unique(y(:, K_{m,+})', 'rows'),    (38)

where the second output argument is the desired array of indices with the elements of K^uniq_{m,+}(Z^(t)), and the first output argument is an auxiliary matrix that collects the observation vectors with indices in K^uniq_{m,+}(Z^(t)).

Similarly, we can use (27) to obtain the set of indices K_{m,−}(Z^(t)) that minimize the mth component of the outputs, and determine the subset K^uniq_{m,−}(Z^(t)) that retains only those indices that do not point to replicated observation vectors.
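A NumPy counterpart of the MATLAB pruning step (38) might look as follows (the function name is ours; as in (38), one representative index is kept per distinct observation vector):

```python
import numpy as np

def prune_duplicate_indices(Y, K):
    """Counterpart of the MATLAB pruning (38): given mixture samples Y
    (q x N) and an index array K (e.g. all k attaining the maximum of
    output m), keep one representative index per distinct observation
    vector y(k)."""
    cols = Y[:, K].T                                    # candidate vectors as rows
    _, first = np.unique(cols, axis=0, return_index=True)
    return K[np.sort(first)]
```

`np.unique(..., axis=0, return_index=True)` plays the role of MATLAB's `unique(..., 'rows')`, returning the positions of the first occurrence of each distinct row.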
2) The geometry of the subdifferential sets ∂û_m(Z) and ∂l̂_m(Z): Before we prescribe the subgradient choices, we investigate the direct relation between the subdifferentials ∂û_m(Z^(t)) and ∂l̂_m(Z^(t)) and the hyper-parallelepiped of the observations. These subdifferentials are the collections of subgradients for the respective maximum and minimum of the mth component of the outputs.

Let us recall that, given the set of source vectors S = {s(1), s(2), ..., s(N)}, assumption A1 guarantees that S contains the vertices of its (non-degenerate) bounding hyper-rectangle. At iteration t, we can write the mth component of the separator output z^(t)(k) in terms of the overall mapping matrix G^(t) as

    z_m^(t)(k) = G_{m,:}^(t) s(k),    (39)

where G_{m,:}^(t) = [G_{m,1}^(t), ..., G_{m,p}^(t)]. Then, the source indices can be grouped, according to the sign of the overall mapping coefficients G_{m,n}^(t), into the following three sets:

    I_{m,+}^(t) = {n | G_{m,n}^(t) > 0, n ∈ {1, ..., p}},    (40)
    I_{m,−}^(t) = {n | G_{m,n}^(t) < 0, n ∈ {1, ..., p}},    (41)
    I_{m,0}^(t) = {n | G_{m,n}^(t) = 0, n ∈ {1, ..., p}}.    (42)
By definition, the maximum value of z_m^(t)(k) for a given G_{m,:}^(t) is attained at the index locations k ∈ K^uniq_{m,+}(Z^(t)). For the maximum to be achieved, the source components (for n = 1, ..., p) must take the configuration

    s_n(k) = u_n,  for n ∈ I_{m,+}^(t),
           = l_n,  for n ∈ I_{m,−}^(t),
           = an arbitrary value in [l_n, u_n],  for n ∈ I_{m,0}^(t),    (43)

where u_n (respectively l_n) represents the maximum (minimum) value of the source component s_n, which coincides with the nth diagonal entry of the diagonal matrix U (respectively L). According to (43), the vectors s(k) have fixed extreme values at the components corresponding to the nonzero entries of the vector G_{m,:}^(t), and an arbitrary value from the source domain at the other components, i.e., those corresponding to the zero entries of the vector G_{m,:}^(t).
Let S^uniq_{m,+}(Z^(t)) = {s(k) : k ∈ K^uniq_{m,+}(Z^(t))} represent the collection of unique source vectors of the form (43) generating the maximum value of z_m^(t)(k). This set is a subset of the d-dimensional face of the bounding hyper-rectangle of the sources, F_{m,+}(Z^(t)), where d is the cardinality of I_{m,0}^(t). Here F_{m,+}(Z^(t)) can be defined as the collection of all vectors satisfying (43), and it can be written as the convex hull of the set of its 2^d vertices, V^(s)_{m,+}(Z^(t)), which is obtained by selecting the extreme values in (43); these are also vertex points of the bounding hyper-rectangle of the sources. Note that, by assumption A1, the set of vertices V^(s)_{m,+}(Z^(t)) is also included in S^uniq_{m,+}(Z^(t)).

As an example, Figure 2 illustrates the faces F_{m,+} of the bounding hyper-rectangle of the sources, for a scenario with 3 sources and 3 mixtures and

    G = [ 1 0 0
          1 1 2
          0 4 4 ].    (44)
In this figure,

• F_{1,+}(Z) and F_{1,−}(Z) are 2-faces of the source bounding hyper-rectangle (I_{1,0} has cardinality 2),
Fig. 2: d-faces of the bounding hyper-rectangle of sources corresponding to û_m(Z) and l̂_m(Z) for a scenario with 3 sources and 3 mixtures.
• F_{2,+}(Z) and F_{2,−}(Z) are singleton sets, each corresponding to a vertex of the source bounding hyper-rectangle (I_{2,0} has cardinality 0),
• F_{3,+}(Z) and F_{3,−}(Z) are 1-faces (edges) of the source bounding hyper-rectangle (I_{3,0} has cardinality 1).
Similarly, the minimum value of z_m^(t)(k) in (39) is achieved at the index locations k ∈ K^uniq_{m,−}(Z^(t)). In this case, the source components (for n = 1, ..., p) must take the configuration

    s_n(k) = l_n,  for n ∈ I_{m,+}^(t),
           = u_n,  for n ∈ I_{m,−}^(t),
           = an arbitrary value in [l_n, u_n],  for n ∈ I_{m,0}^(t),    (45)

which corresponds to points in the opposite face of the hyper-rectangle of the sources, which we denote by F_{m,−}(Z^(t)). We also use V^(s)_{m,−}(Z) to represent its set of vertices.
Let V^(y)_{m,+}(Z^(t)) (V^(y)_{m,−}(Z^(t))) represent the image of V^(s)_{m,+}(Z^(t)) (V^(s)_{m,−}(Z^(t))) under the mixing map H. The set V^(y)_{m,+}(Z^(t)) (V^(y)_{m,−}(Z^(t))) is contained in ∂û_m(Z^(t)) (∂l̂_m(Z^(t))), and it forms the set of vertex points of that set. We also note that ∂û_m(Z^(t)) (∂l̂_m(Z^(t))) is the image of F_{m,+}(Z^(t)) (F_{m,−}(Z^(t))), and it is a d-dimensional face of the hyper-parallelepiped that is the image of the bounding hyper-rectangle of the sources. We also note that

    ∂û_m(Z^(t)) = Co V^(y)_{m,+}(Z^(t)),    (46)

and

    ∂l̂_m(Z^(t)) = Co V^(y)_{m,−}(Z^(t)).    (47)

Due to the point symmetry of the hyper-parallelepiped with respect to its center, the sets ∂û_m(Z^(t)) and ∂l̂_m(Z^(t)) are mirror (inverted) images of each other, coinciding with opposite faces of the hyper-parallelepiped.
As an example, Figure 3 illustrates the subdifferential sets ∂û_m(Z) and ∂l̂_m(Z) for the example in Figure 2. Based on this figure, we can see that

Fig. 3: Illustration of the subdifferential sets ∂û_m(Z) and ∂l̂_m(Z) for a scenario with 3 sources and 3 mixtures.

• ∂û_1(Z) and ∂l̂_1(Z) are 2-faces of the parallelepiped, and they are the images of F_{1,+}(Z) and F_{1,−}(Z) in Figure 2, respectively.
• ∂û_2 and ∂l̂_2 are singleton sets, each containing one vertex, and they are the images of F_{2,+}(Z) and F_{2,−}(Z) in Figure 2, respectively.
• ∂û_3 and ∂l̂_3 are 1-faces (edges) of the parallelepiped, and they are the images of F_{3,+}(Z) and F_{3,−}(Z) in Figure 2, respectively.

We also note that the ∂û_i, ∂l̂_i pairs are located symmetrically with respect to the center of the parallelepiped. The same figure also illustrates V^(y)_{1,+}(Z), which is the set of vertices of ∂û_1(Z).
3) The centers of the subdifferential sets as the subgradient choices: Based on the geometrical picture introduced in the previous subsection, we choose the subgradient as the center of the subdifferential sets ∂û_m(Z^(t)) and ∂l̂_m(Z^(t)). On one hand, at the differentiable points of the criterion, each subdifferential ∂û_m(Z^(t)) or ∂l̂_m(Z^(t)) consists of a unique vertex (0-dimensional face) of the parallelepiped. Therefore, regardless of the considered convex combination coefficients, there is only one possible subgradient choice. On the other hand, at the non-differentiable points of the criterion, the subdifferentials are d-dimensional faces of the hyper-parallelepiped, where d represents the number of zero elements in G_{m,:}^(t) (the mth row of the global transfer matrix at iteration t). In this latter case, we can specify which subgradient to use and, for this purpose, one only needs to specify the coefficients of the convex combinations in (34), which represent the generalized barycentric coordinates of their respective subgradients.

Let us denote the subsets of K^uniq_{m,+}(Z^(t)) and K^uniq_{m,−}(Z^(t)) that correspond to the active (i.e., non-zero) convex combination coefficients respectively by

    K^act_{m,+}(Z^(t)) = {k : λ_{m,+}^(t)(k) > 0, k ∈ K^uniq_{m,+}(Z^(t))},    (48)
    K^act_{m,−}(Z^(t)) = {k : λ_{m,−}^(t)(k) > 0, k ∈ K^uniq_{m,−}(Z^(t))}.    (49)
Mainly for the purpose of easing the analysis of the resulting algorithm, we propose a centered choice for the subgradient within each subdifferential set under consideration. For this purpose, the set K^act_{m,+}(Z^(t)) (respectively K^act_{m,−}(Z^(t))) is chosen so as to select those pairs of vertices of the subdifferential ∂û_m(Z^(t)) (respectively ∂l̂_m(Z^(t))) with maximum separation, i.e., those for which the separation equals the diameter of the subdifferential set.

Note that these active subsets can be determined directly from K^uniq_{m,+}(Z^(t)) and K^uniq_{m,−}(Z^(t)), without the need to evaluate ∂û_m(Z^(t)) or ∂l̂_m(Z^(t)). All one has to do is form K^act_{m,+}(Z^(t)) by collecting all those indices k_1, k_2 ∈ K^uniq_{m,+}(Z^(t)) for which the separation between the corresponding observations, y(k_1) and y(k_2), is maximum. The same procedure is applied to obtain K^act_{m,−}(Z^(t)) from K^uniq_{m,−}(Z^(t)).
Since the center of a face of the hyper-parallelepiped of the observations coincides with the average of the pairs of vertices in that face with maximal separation, the center of the set ∂û_m(Z^(t)) is determined by

    c_{m,+}^(t) = Σ_{k ∈ K^act_{m,+}(Z^(t))} y(k) / |K^act_{m,+}(Z^(t))|,    (50)

while the center of the set ∂l̂_m(Z^(t)) is given by

    c_{m,−}^(t) = Σ_{k ∈ K^act_{m,−}(Z^(t))} y(k) / |K^act_{m,−}(Z^(t))|.    (51)

Therefore, we choose the active coefficients to be uniform in value, having, for m = 1, ..., p,

    λ_{m,+}^(t)(k) = 1/|K^act_{m,+}(Z^(t))| for k ∈ K^act_{m,+}(Z^(t)), and 0 otherwise,    (52)

and

    λ_{m,−}^(t)(k) = 1/|K^act_{m,−}(Z^(t))| for k ∈ K^act_{m,−}(Z^(t)), and 0 otherwise.    (53)

The difference of the two centers is our proposed subgradient choice within the subdifferential set of the range operator:

    b_m^(t) = c_{m,+}^(t) − c_{m,−}^(t) ∈ ∂R̂_m(Z^(t)).    (54)
4) Summary of the steps for the subgradient computation: Let us start by assuming that the array z_m(:) collects the elements of the mth output at iteration t. For m = 1, ..., p, equations (26) and (27) respectively determine the arrays of indices K_{m,+} and K_{m,−}. These are pruned with the procedure described in (38) to obtain K^uniq_{m,+} and K^uniq_{m,−}. Almost always, the pruned arrays will contain a single element, in which case b_m^(t) = y(:, K^uniq_{m,+}(1)) − y(:, K^uniq_{m,−}(1)). Otherwise, the active set K^act_{m,+}(Z^(t)) collects all those index pairs k_1, k_2 for which the separation between y(:, K^uniq_{m,+}(k_1)) and y(:, K^uniq_{m,+}(k_2)) is maximum; K^act_{m,−} is defined similarly. The centers of the subdifferentials ∂û_m(Z^(t)) and ∂l̂_m(Z^(t)) are then respectively obtained by (50) and (51). Their difference in (54) determines the vector b_m^(t). Finally, b_m^(t), for m = 1, ..., p, can be substituted into equations (31)-(33) to determine the proposed subgradient of the criterion, ∇log J(W^(t)), where J can be particularized to J_1, J_{2,r} or J_{2,∞}.
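The steps above can be sketched in NumPy as follows (a sketch under our own naming; it finds the maximally separated pairs by comparing pairwise distances up to a small tolerance, and reduces to the simple difference y(k_+) − y(k_−) in the generic single-index case):

```python
import numpy as np

def centered_range_subgradient(Y, z_m, tol=1e-12):
    """Centered subgradient b_m of the range of output component z_m:
    prune replicated observations among the max/min attaining indices,
    keep all vertices appearing in a maximally separated pair, and average
    them to get the face centers (50)-(51); return their difference (54).
    Y: (q, N) mixture samples, z_m: length-N output component."""
    def face_center(K):
        cols = np.unique(Y[:, K].T, axis=0)     # pruning, cf. Eq. (38)
        if cols.shape[0] == 1:
            return cols[0]                      # differentiable case: a vertex
        d = np.linalg.norm(cols[:, None, :] - cols[None, :, :], axis=2)
        active = np.unique(np.argwhere(d >= d.max() - tol)[:, 0])
        return cols[active].mean(axis=0)        # center of the face
    K_plus = np.flatnonzero(z_m >= z_m.max() - tol)    # cf. Eq. (26)
    K_minus = np.flatnonzero(z_m <= z_m.min() + tol)   # cf. Eq. (27)
    return face_center(K_plus) - face_center(K_minus)  # Eq. (54)
```

For a face that is a rectangle, all four vertices lie on a maximal-separation diagonal, so averaging the active vertices recovers the face center, as required by (50)-(51).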
B. The BCA algorithms

The subgradient ascent algorithm which optimizes the BCA criterion log J(W^(t)) is then

    W^(t+1) = W^(t) + μ^(t) ∇log J(W^(t)).    (55)

This iteration can also be written in terms of the global transfer matrix G^(t) = W^(t) H for the subsequent analysis of the algorithms. For this purpose, we multiply both sides of (55) by H from the right to obtain

    G^(t+1) = G^(t) + μ^(t) ∇log J(W^(t)) H.    (56)
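A minimal sketch of the ascent iteration (55) for log J_1, using the generic-case subgradient (31) and a fixed step size (the data, step size and iteration count below are illustrative assumptions of ours, not the paper's experimental settings):

```python
import numpy as np

def bca_ascent_J1(Y, W0, mu=0.02, iters=300):
    """Subgradient ascent (55) on log J1, using the generic-case
    subgradient (31) where each output maximum/minimum is attained at a
    unique sample. Y: (q, N) mixtures, W0: (p, q) initial separator."""
    N = Y.shape[1]
    m_y = Y.mean(axis=1, keepdims=True)
    R_Y = (Y @ Y.T) / N - m_y @ m_y.T            # \hat{R}(Y)
    W = W0.astype(float).copy()
    for _ in range(iters):
        Z = W @ Y
        grad = np.linalg.solve(W @ R_Y @ W.T, W @ R_Y)
        for m in range(W.shape[0]):
            kp, km = Z[m].argmax(), Z[m].argmin()
            grad[m] -= (Y[:, kp] - Y[:, km]) / (Z[m, kp] - Z[m, km])
        W = W + mu * grad                        # ascent step, Eq. (55)
    return W
```

With bounded sources and a moderate fixed step size, the iteration increases log J_1 from a mixed initialization; the stationary point analysis of the next section explains why such ascent is not trapped by local optima.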
V. STATIONARY POINT CHARACTERIZATION FOR BCA ALGORITHMS

In this section, we provide stationary point analysis results for the BCA algorithms presented in the previous section.

A. Objective Function J_1(W)

In order to identify the stationary points, we start from the particularization of the iterative algorithm (56) for the first objective function and substitute R̂(Y) = H R̂(S) H^T to obtain

    G^(t+1) = G^(t) + μ^(t) [ (W^(t) H R̂(S) H^T (W^(t))^T)^{-1} W^(t) H R̂(S) H^T H
              − Σ_{m=1}^p (1/R̂_m(Z^(t))) e_m (b_m^(t))^T H ],    (57)

where

    R̂(S) = (1/N) Σ_{k=1}^N s(k) s(k)^T − μ̂(S) μ̂(S)^T    (58)

is the sample covariance matrix of the source samples S, and μ̂(S) = (1/N) Σ_{k=1}^N s(k) is the sample mean of the same set.
After grouping common factors, we obtain

    G^(t+1) = G^(t) + μ^(t) [ (G^(t))^{-T} R̂(S)^{-1} (G^(t))^{-1} G^(t) R̂(S)
              − Σ_{m=1}^p (1/R̂_m(Z^(t))) e_m [H† b_m^(t)]^T ] H^T H,    (59)

where H† = (H^T H)^{-1} H^T.

It is shown in Appendix A that we can simplify

    [H† b_m^(t)]^T = sign{G_{m,:}^(t)} (U − L).    (60)

Therefore, we can rewrite the input dependent update expression in terms of G as

    G^(t+1) = G^(t) + μ^(t) [ (G^(t))^{-T} − Σ_{m=1}^p (1/R̂_m(Z^(t))) e_m sign{G_{m,:}^(t)} (U − L) ] H^T H.    (61)
We identify G_* as a stationary point if and only if it is mapped to itself after an iteration of the algorithm, which can be stated as

    G_* = G_* + μ^(t) [ (G_*)^{-T} − Σ_{m=1}^p (1/R̂_m(Z_*)) e_m sign{(G_*)_{m,:}} (U − L) ] H^T H.    (62)

We note that, since H is assumed to be a full-rank matrix, H^T H is invertible. Therefore, the condition in (62) is equivalent to

    (G_*)^{-T} = Σ_{m=1}^p (1/R̂_m(Z_*)) e_m sign{(G_*)_{m,:}} (U − L)    (63)
               = diag(R̂_1(Z_*), ..., R̂_p(Z_*))^{-1} sign{G_*} (U − L),    (64)

which leads to

    diag(R̂_1(Z_*), ..., R̂_p(Z_*))^{-1} sign{G_*} (U − L) G_*^T = I.    (65)
In order to simplify this expression, we define two different normalized versions of $G_*$, as follows:
•$Q \triangleq G_*(U - L)$: This mapping absorbs the source ranges into the overall mapping; therefore, it defines a mapping from the unity-range normalized sources to the separator outputs. Based on this definition, one obtains $\mathrm{sign}\{Q\} = \mathrm{sign}\{G_*\}$, where we used the fact that $(U - L)$ is a diagonal positive definite matrix. We can write the range of the corresponding separator output components as
$$\hat{R}_m(Z_*) = \hat{u}_m(Z_*) - \hat{l}_m(Z_*) \tag{66}$$
$$= \left(P\{(G_*)_{m,:}\}U + N\{(G_*)_{m,:}\}L\right)\mathbf{1} - \left(N\{(G_*)_{m,:}\}U + P\{(G_*)_{m,:}\}L\right)\mathbf{1} \tag{67}$$
$$= \left(P\{(G_*)_{m,:}\} - N\{(G_*)_{m,:}\}\right)(U - L)\mathbf{1} \tag{68}$$
$$= \mathrm{abs}\left(Q_{m,:}\right)\mathbf{1} \tag{69}$$
$$= \|Q_{m,:}\|_1, \tag{70}$$
for $m = 1,\ldots,p$, where $\mathbf{1}$ is the all-ones vector. As a result, the condition in (65) can be reduced to
$$\mathrm{sign}\{Q\}Q^T = \mathrm{diag}\left(\|Q_{1,:}\|_1,\ldots,\|Q_{p,:}\|_1\right). \tag{71}$$
•$\tilde{Q} \triangleq \mathrm{diag}\left(\|Q_{1,:}\|_1,\ldots,\|Q_{p,:}\|_1\right)^{-1}Q$: This mapping generates unity-range separator outputs for the normalized unity-range inputs, since
$$\|\tilde{Q}_{1,:}\|_1 = \|\tilde{Q}_{2,:}\|_1 = \ldots = \|\tilde{Q}_{p,:}\|_1 = 1. \tag{72}$$
Therefore, it defines a mapping from the unity-range normalized sources to the unity-range normalized outputs. As a result, we obtain the simplified form of the stationary point condition as
$$\mathrm{sign}\left\{\tilde{Q}\right\}\tilde{Q}^T = I. \tag{73}$$
1) Examples: Below we provide some examples of stationary points satisfying (73):
•Perfect Separators: If $\tilde{Q} = P\,\mathrm{diag}(\sigma)$, where $\sigma\in\{-1,1\}^p$ and $P$ is a permutation matrix, then $\mathrm{sign}\{\tilde{Q}\} = \tilde{Q}$, hence
$$\mathrm{sign}\left\{\tilde{Q}\right\}\tilde{Q}^T = \tilde{Q}\tilde{Q}^T = I. \tag{74}$$
This yields $G_* = DP$; hence the corresponding $G_*$ matrices are perfect separator matrices.
•Orthogonal matrices where the non-zero entries in any given row have the same magnitude: Suppose that $\tilde{Q}$ is an orthogonal matrix whose $i$'th row has $\alpha_i$ non-zero values, each of magnitude $1/\alpha_i$. Then we can write
$$\tilde{Q} = \mathrm{diag}(1/\alpha_1, 1/\alpha_2,\ldots,1/\alpha_p)\,\mathrm{sign}\left\{\tilde{Q}\right\}. \tag{75}$$
This implies that
$$\mathrm{sign}\left\{\tilde{Q}\right\}\tilde{Q}^T = \mathrm{diag}(\alpha_1,\alpha_2,\ldots,\alpha_p)\,\mathrm{diag}(1/\alpha_1,1/\alpha_2,\ldots,1/\alpha_p) = I. \tag{76}$$
•Matrices whose entries are powers of $0.5$: Defining
$$T = \begin{bmatrix}
(0.5)^{n-1} & (0.5)^{n-1} & (0.5)^{n-2} & \cdots & (0.5)^2 & 0.5 \\
(0.5)^{n-1} & (0.5)^{n-1} & (0.5)^{n-2} & \cdots & (0.5)^2 & -0.5 \\
(0.5)^{n-2} & (0.5)^{n-2} & (0.5)^{n-3} & \cdots & -0.5 & 0 \\
\vdots & \vdots & \vdots & \iddots & \iddots & \vdots \\
(0.5)^2 & (0.5)^2 & -0.5 & \iddots & \iddots & \vdots \\
0.5 & -0.5 & 0 & \cdots & 0 & 0
\end{bmatrix},$$
a set of stationary points can be defined of the form $\tilde{Q} = \mathrm{diag}(\sigma)P_1 T P_2$, where $P_1$ and $P_2$ are permutation matrices.
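The three example families above can be checked numerically against the condition (73). The sketch below is illustrative; in particular, the `build_T` helper is our own hypothetical encoding of the displayed pattern of $T$:

```python
import numpy as np

def is_stationary(Q):
    """Check the stationary-point condition sign(Q) Q^T = I from (73)."""
    return np.allclose(np.sign(Q) @ Q.T, np.eye(Q.shape[0]))

# 1) Perfect separator: a signed permutation matrix Q = P diag(sigma)
Q1 = np.eye(4)[:, [2, 0, 3, 1]] * np.array([1.0, -1.0, 1.0, -1.0])
print(is_stationary(Q1))            # True

# 2) Rows with equal-magnitude non-zero entries 1/alpha_i:
#    here a scaled 4x4 Hadamard matrix, alpha_i = 4 for every row
Q2 = np.array([[1,  1,  1,  1],
               [1, -1,  1, -1],
               [1,  1, -1, -1],
               [1, -1, -1,  1]]) / 4.0
print(is_stationary(Q2))            # True

# 3) The powers-of-0.5 matrix T (our encoding of the pattern shown above)
def build_T(n):
    a = lambda k: 0.5 ** k
    T = np.zeros((n, n))
    T[0] = [a(n - 1), a(n - 1)] + [a(n - 1 - j) for j in range(1, n - 1)]
    for k in range(2, n + 1):
        m = n - k + 1
        if m == 1:
            row = [0.5, -0.5] + [0.0] * (n - 2)
        else:
            row = ([a(m), a(m)] + [a(m - j) for j in range(1, m - 1)]
                   + [-0.5] + [0.0] * (k - 2))
        T[k - 1] = row
    return T

for n in (3, 4, 6):
    print(n, is_stationary(build_T(n)))   # True for each n
```

Note that each row of `build_T(n)` has unit $l_1$ norm, consistent with (72).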
2) Global Characterization of Stationary Points: We note
that the examples provided above do not cover the set of all
stationary points. However, in this subsection, we will show
that if a stationary point of the algorithm (55) (or of the
algorithm (59)) is not a perfect separator, then it is a saddle
point.
As the first step, we provide the following lemma which
will be used as an intermediate tool in the proof of the
aforementioned global characterization:
Lemma 1: If a stationary point does not belong to the set of perfect separators, then its rows and columns can be permuted such that the upper-left $2\times 2$ sub-matrix of its sign matrix becomes
$$\Lambda_1 \triangleq \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \quad\text{or}\quad \Lambda_{-1} \triangleq \begin{bmatrix} -1 & -1 \\ -1 & 1 \end{bmatrix}.$$
Proof: Let $\tilde{Q}$ be a stationary point, i.e., it satisfies (73), which is not a perfect separator. There exists a row of $\tilde{Q}$ with more than one non-zero entry (w.l.o.g., we assume it is $\tilde{Q}_{1,:}$). From (73), we have
(p.1) $\left\{\mathrm{sign}\{\tilde{Q}_{j,:}\},\ j = 1,2,\ldots,p\right\}$ are linearly independent;
(p.2) $\mathrm{sign}\{\tilde{Q}_{j,:}\}\tilde{Q}_{1,:}^T = 0$ for $j = 2,\ldots,p$.
Note that (p.1) implies that at least one $\tilde{Q}_{j,:}$, $j\geq 2$, has a non-zero entry overlapping with one of the non-zero entries of $\tilde{Q}_{1,:}$. Otherwise, the non-overlap condition would restrict the span of $\{\mathrm{sign}\{\tilde{Q}_{j,:}\},\ j = 2,\ldots,p\}$ to an at most $(p-2)$-dimensional space, which conflicts with linear independence. Furthermore, (p.2) implies that the number of overlapping entries should be greater than one, with alternating signs. Therefore, the rows and columns of a stationary point which is not a perfect separator can be permuted such that the upper-left $2\times 2$ sub-matrix of its sign matrix becomes $\Lambda_1$ or $\Lambda_{-1}$.
The following is the main theorem for the global characterization of the stationary points, which shows that the stationary points other than perfect separators are saddle points:
Theorem 1: If a stationary point of the algorithm (55) (or of the algorithm (59)) does not belong to the set of perfect separators, then it is a saddle point.
Proof: The proof is provided in Appendix B.
3) Remarks About Theorem 1: The implication of Theorem 1 is intriguing. Despite the non-convex and non-smooth objective, we can achieve a global characterization of the stationary points of the corresponding algorithms. In particular, we show that the algorithms do not stop at false minima or maxima. Potential stopping points are either perfect separators, corresponding to the global optima of the objective, or saddle points, which are not stable in the sense that the algorithm iterations leave such points even through small (such as numerical) perturbations. Here we note that the stationary points of the algorithm form a subset of the Clarke stationary points of the objective function. The Clarke stationary points are defined as the points for which the subdifferential of the objective contains zero. The algorithm's update contains only a restricted set of subgradients, and the analysis provided above covers this set.
We should also note that the complete characterization of the convergence behavior of the algorithms is still an open problem. The main difficulty arises from the following fact. Although there are some standard results for convex (smooth and non-smooth) objectives on the equivalence of the limit points of the algorithm to the stationary points through appropriate selection of step-size rules [27], these are not generalizable to simultaneously non-convex and non-smooth objectives [28], [29]. Along this direction, in [30], the authors proposed a mesh-based approach and showed that the subsequences of their algorithm converge to a Clarke stationary point when applied to (a modified form of) the $J_1$ objective. In some sense, this can be considered as a complementary work to our approach in this article, where we obtain the characterization of a subset of Clarke stationary points corresponding to the stationary points of our algorithm. Although the analysis of the limit-point features of the algorithm is a subject of future research pursued by the authors, the numerical experiments support global convergence behavior, for example, when the step sizes are chosen according to the zero-limit, divergent-sum rule prescribed for convex functions [27], [28], as illustrated in Section VIII.
B. Objective Function $J_{2,1}(W)$
Following similar steps as in the previous section, the iterative update corresponding to $J_{2,1}(W)$ can be rewritten in terms of $G$ as
$$G^{(t+1)} = G^{(t)} + \mu^{(t)}\left[\left(G^{(t)}\right)^{-T} - \sum_{m=1}^{p}\frac{p}{\|\hat{R}(Z^{(t)})\|_1}\, e_m [H^\dagger b_m^{(t)}]^T\right] H^T H, \tag{77}$$
where $[H^\dagger b_m^{(t)}]^T$ was previously determined in (60). The stationary points in this case satisfy
$$\frac{p}{\|\hat{R}(Z)\|_1}\,\mathrm{sign}\left\{G_*\right\}(U - L)\,G_*^T = I. \tag{78}$$
Using the definition
$$Q = G_*(U - L) \tag{79}$$
while noting that $\mathrm{sign}\{G_*\} = \mathrm{sign}\{Q\}$ and $\|\hat{R}(Z)\|_1 = \sum_{m=1}^{p}\|Q_{m,:}\|_1$ hold, we obtain
$$\mathrm{sign}\{Q\}Q^T = \frac{\sum_{m=1}^{p}\|Q_{m,:}\|_1}{p}\, I. \tag{80}$$
Since the diagonal entries of the left side of (80) are $\|Q_{1,:}\|_1, \|Q_{2,:}\|_1,\ldots,\|Q_{p,:}\|_1$, we have
$$\|Q_{1,:}\|_1 = \|Q_{2,:}\|_1 = \ldots = \|Q_{p,:}\|_1.$$
Due to this observation, we can simplify the normalized $Q$ definition in this case as $\tilde{Q} = \frac{1}{\|Q_{1,:}\|_1}Q$ and obtain
$$\mathrm{sign}\left\{\tilde{Q}\right\}\tilde{Q}^T = I. \tag{81}$$
We note that we reach the same condition as in (73) for the objective function $J_{2,1}(W)$; therefore, the examples for the $\tilde{Q}$ matrices in the previous section apply to this case too. However, we note that the $Q$ matrices corresponding to the objective function $J_1(W)$ can be obtained by arbitrary scaling of the rows of $\tilde{Q}$, whereas in the case of $J_{2,1}(W)$ we are constrained to multiply all rows of $\tilde{Q}$ by the same parameter.
Similar to the objective function $J_1(W)$, we will show that if a stationary point of the algorithm (77) is not a perfect separator, then it is a saddle point.
Theorem 2: If a stationary point of the algorithm (77) does not belong to the set of perfect separators, then it is a saddle point.
Proof: In this case, from (21), the cost function in terms of $\tilde{Q}$ is equivalent to
$$J(\tilde{Q}) = \frac{|\det(\tilde{Q})|}{p^p}. \tag{82}$$
Therefore, the proof of Theorem 1 also applies here.
C. Objective Function $J_{2,2}(W)$
The update in terms of $G$ for the objective function $J_{2,2}(W)$ can be obtained as
$$G^{(t+1)} = G^{(t)} + \mu^{(t)}\left[\left(G^{(t)}\right)^{-T} - \sum_{m=1}^{p}\frac{p\,\hat{R}_m(Z^{(t)})}{\|\hat{R}(Z^{(t)})\|_2^2}\, e_m [H^\dagger b_m^{(t)}]^T\right] H^T H. \tag{83}$$
The stationary points in this case satisfy
$$\frac{p}{\|\hat{R}(Z)\|_2^2}\,\mathrm{diag}(\hat{R}_1(Z),\ldots,\hat{R}_p(Z))\,\mathrm{sign}\left\{G_*\right\}(U - L)\,G_*^T = I. \tag{84}$$
Using the definition $Q = G_*(U - L)$, we obtain
$$\mathrm{diag}\left(\|Q_{1,:}\|_1,\ldots,\|Q_{p,:}\|_1\right)\mathrm{sign}\{Q\}Q^T = \left(\frac{\sum_{m=1}^{p}\|Q_{m,:}\|_1^2}{p}\right) I. \tag{85}$$
This implies that
$$\|Q_{1,:}\|_1 = \|Q_{2,:}\|_1 = \ldots = \|Q_{p,:}\|_1. \tag{86}$$
Similarly, defining $\tilde{Q} = \frac{1}{\|Q_{1,:}\|_1}Q$ yields
$$\mathrm{sign}\left\{\tilde{Q}\right\}\tilde{Q}^T = I. \tag{87}$$
We note that this condition is equivalent to the condition for the objective function $J_{2,1}(W)$.
D. Objective Function $J_{2,\infty}(W)$
The iterative $G$ update corresponding to $J_{2,\infty}(W)$ can be written as
$$G^{(t+1)} = G^{(t)} + \mu^{(t)}\left[\left(G^{(t)}\right)^{-T} - \sum_{m\in\mathcal{M}(Z^{(t)})}\frac{p\,\beta_m^{(t)}}{\|\hat{R}(Z^{(t)})\|_\infty}\, e_m [H^\dagger b_m^{(t)}]^T\right] H^T H. \tag{88}$$
Therefore, the stationary points satisfy
$$\sum_{m\in\mathcal{M}(Z)}\frac{p\,\beta_m}{\|\hat{R}(Z)\|_\infty}\, e_m\,\mathrm{sign}\left\{(G_*)_{m,:}\right\}(U - L)\,G_*^T = I. \tag{89}$$
Using the definition $Q = G_*(U - L)$ yields
$$\sum_{m\in\mathcal{M}(Z)}\frac{p\,\beta_m}{\|\hat{R}(Z)\|_\infty}\, e_m\,\mathrm{sign}\left\{Q_{m,:}\right\}Q^T = I. \tag{90}$$
We note that, in order to satisfy (90), we must have $\mathcal{M}(Z) = \{1,2,\ldots,p\}$, which implies that the ranges of the outputs should be equal, i.e.,
$$\|Q_{1,:}\|_1 = \|Q_{2,:}\|_1 = \ldots = \|Q_{p,:}\|_1.$$
Hence, $\beta_m = 1/p$ for $m = 1,2,\ldots,p$. We similarly define $\tilde{Q} = \frac{1}{\|Q_{1,:}\|_1}Q$ and obtain from (90) that
$$\mathrm{sign}\left\{\tilde{Q}\right\}\tilde{Q}^T = I. \tag{91}$$
We note that this condition is equivalent to the condition for the objective function $J_{2,1}(W)$.
The saddle points are not stable in the sense that small
random perturbations would take the algorithm away from
the saddle points, eventually leading to convergence to perfect
separator points, in the light of Theorem 2.
VI. EXTENSION TO COMPLEX SIGNALS
In this section, we provide a brief summary of the complex extension of the BCA algorithms in [21]:
In the complex case, the source and output vectors belong to $\mathbb{C}^p$ and the mixture vectors belong to $\mathbb{C}^q$. The mixing and separator matrices are complex matrices, i.e., $H\in\mathbb{C}^{q\times p}$ and $W\in\mathbb{C}^{p\times q}$. For a given complex vector $x\in\mathbb{C}^p$, the corresponding isomorphic real vector $\grave{x}\in\mathbb{R}^{2p}$ is defined as $\grave{x} = \left[\Re(x^T)\ \Im(x^T)\right]^T$. Furthermore, the operator $\psi:\mathbb{C}^{p\times q}\to\mathbb{R}^{2p\times 2q}$ is defined as
$$\psi(X) = \begin{bmatrix}\Re(X) & -\Im(X)\\ \Im(X) & \Re(X)\end{bmatrix}. \tag{92}$$
We note that the isomorphism satisfies
$$z = Wy \iff \grave{z} = \psi(W)\grave{y}. \tag{93}$$
Based on this isomorphism, we define $\grave{Y} = \{\grave{y}(1),\grave{y}(2),\ldots,\grave{y}(N)\}$ and $\grave{Z} = \{\grave{z}(1),\grave{z}(2),\ldots,\grave{z}(N)\}$.
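The isomorphism (93) is straightforward to verify numerically; the following sketch (with illustrative dimensions of our choosing) checks it for a random $W$ and $y$:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 3, 5

def grave(x):
    """Isomorphic real vector: stack real and imaginary parts."""
    return np.concatenate([x.real, x.imag])

def psi(X):
    """The operator in (92): C^{p x q} -> R^{2p x 2q}."""
    return np.block([[X.real, -X.imag],
                     [X.imag,  X.real]])

W = rng.normal(size=(p, q)) + 1j * rng.normal(size=(p, q))
y = rng.normal(size=q) + 1j * rng.normal(size=q)

# The isomorphism in (93): z = W y  <=>  grave(z) = psi(W) grave(y)
print(np.allclose(grave(W @ y), psi(W) @ grave(y)))   # True
```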
The complex objectives $J_{c1}$ and $J_{c2,r}$ can be defined from their real counterparts $J_1$ and $J_{2,r}$ simply by replacing $\hat{R}(Z)$ with $\hat{R}(\grave{Z})$. Similarly, the complex counterparts of the real BCA update rules can be obtained by replacing the terms $\hat{R}(Z)$, $\hat{R}(Y)$, $W$ and $p$ in the subgradient expressions (31)-(33) with $\hat{R}(\grave{Z})$, $\hat{R}(\grave{Y})$, $\psi(W)$ and $2p$, respectively.
VII. STATIONARY POINT CHARACTERIZATION FOR COMPLEX SIGNALS
In this section, we provide a stationary point analysis for the BCA algorithms considered for complex sources in the previous section.
A. Objective Function $J_{c1}(\psi(W))$
Through the centralized choice of subgradients as proposed before, the update rule in terms of $\psi(G)$ can be shown to be equal to
$$\psi(G^{(t+1)}) = \psi(G^{(t)}) + \mu^{(t)}\left[\psi(G^{(t)})^{-T} - \sum_{m=1}^{2p}\frac{1}{\hat{R}_m(\grave{Z}^{(t)})}\, e_m\,\mathrm{sign}\left\{\psi(G^{(t)})_{m,:}\right\}(U_T - L_T)\right]\psi(H)^T\psi(H), \tag{94}$$
where $U_T = \begin{bmatrix}U_R & 0\\ 0 & U_I\end{bmatrix}$, $L_T = \begin{bmatrix}L_R & 0\\ 0 & L_I\end{bmatrix}$, and
$$U_R = \mathrm{diag}\left(\max(\Re(s_1)),\ldots,\max(\Re(s_p))\right), \tag{95}$$
$$L_R = \mathrm{diag}\left(\min(\Re(s_1)),\ldots,\min(\Re(s_p))\right), \tag{96}$$
$$U_I = \mathrm{diag}\left(\max(\Im(s_1)),\ldots,\max(\Im(s_p))\right), \tag{97}$$
$$L_I = \mathrm{diag}\left(\min(\Im(s_1)),\ldots,\min(\Im(s_p))\right). \tag{98}$$
Therefore, the stationary points satisfy
$$\hat{D}(\grave{Z})^{-1}\,\mathrm{sign}\left\{\psi(G)_*\right\}(U_T - L_T)\,\psi(G)_*^T = I, \tag{99}$$
where $\hat{D}(\grave{Z}) = \mathrm{diag}(\hat{R}_1(\grave{Z}),\ldots,\hat{R}_{2p}(\grave{Z}))$. Using the definitions $Q = \psi(G)_*(U_T - L_T)$ and $\tilde{Q} = \mathrm{diag}\left(\|Q_{1,:}\|_1,\ldots,\|Q_{2p,:}\|_1\right)^{-1}Q$, we can rewrite the stationary point condition as
$$\mathrm{sign}\left\{\tilde{Q}\right\}\tilde{Q}^T = I. \tag{100}$$
Similar to the real case, we first identify the perfect separators, which are the global maxima of the objective function. We then prove that the other stationary points of the algorithm (94) are saddle points.
•Perfect Separators: We note that in this case, due to the structure of $\psi(G)$, the positions of the non-zero values of $\tilde{Q}_{:,1:p}$ suffice to determine the positions of the non-zero values of $\tilde{Q}$. If $\tilde{Q}_{1:p,:} = P\,\mathrm{diag}(\sigma)$, then $\mathrm{sign}\{\tilde{Q}\} = \tilde{Q}$, therefore
$$\mathrm{sign}\left\{\tilde{Q}\right\}\tilde{Q}^T = \tilde{Q}\tilde{Q}^T = I. \tag{101}$$
This yields $G_* = DP$, where $D_{ii} = \alpha_i e^{j\pi k_i/2}$, $\alpha_i\in\mathbb{R}$, $k_i\in\mathbb{Z}$, $i = 1,\ldots,p$.
Theorem 3: If a stationary point of the algorithm (94) does not belong to the set of perfect separators, then it is a saddle point.
Proof: Noting $J_{c1}(\tilde{Q}) = |\det(\tilde{Q})|$ and $Q = \psi(G)_*(U_T - L_T)$, we can directly adapt the proof of Theorem 1. Hence all $G$ matrices satisfying (100) are saddle points if they are not perfect separators.
B. Objective Function $J_{c2,1}(\psi(W))$
Following similar steps as in the previous subsection, we can show that the stationary points of the $J_{c2,1}(\psi(W))$ objective satisfy
$$\frac{2p}{\|\hat{R}(\grave{Z})\|_1}\,\mathrm{sign}\left\{\psi(G)_*\right\}(U_T - L_T)\,\psi(G)_*^T = I. \tag{102}$$
Using the normalization $Q = \psi(G)_*(U_T - L_T)$ and the facts $\mathrm{sign}\{\psi(G)_*\} = \mathrm{sign}\{Q\}$ and $\|\hat{R}(\grave{Z})\|_1 = \sum_{m=1}^{2p}\|Q_{m,:}\|_1$, we obtain
$$\mathrm{sign}\{Q\}Q^T = \left(\frac{\sum_{m=1}^{2p}\|Q_{m,:}\|_1}{2p}\right) I. \tag{103}$$
Similar to the real case, this implies $\|Q_{1,:}\|_1 = \ldots = \|Q_{2p,:}\|_1$. Therefore, the normalized version of $Q$ can be obtained as $\tilde{Q} = \frac{1}{\|Q_{1,:}\|_1}Q$, from which the stationary point condition can be rewritten as
$$\mathrm{sign}\left\{\tilde{Q}\right\}\tilde{Q}^T = I. \tag{104}$$
Since we reach the same condition as for the previous objective, all the results regarding $\tilde{Q}$ apply in this case too. However, due to the difference in the normalization, the corresponding $Q$ matrix can only be obtained by a scalar scaling of $\tilde{Q}$.
C. Objective Functions $J_{c2,2}(\psi(W))$ and $J_{c2,\infty}(\psi(W))$
Similar to the real case, we can show that the same conclusions are reached as for the $J_{c2,1}(\psi(W))$ objective.
VIII. NUMERICAL EXAMPLES
In this section, we present numerical examples illustrating
the convergence of the algorithms to a global maximum of the
objective functions regardless of the choice of initial seeds.
A. Close to Ideal Setting
We consider the following setup. We generate the sources through the Copula-t distribution with four degrees of freedom, a convenient tool for generating vectors with controlled correlation. The correlation matrix of the sources, $R_s$, is Toeplitz with first row $\begin{bmatrix}1 & \rho_s & \ldots & \rho_s^{p-1}\end{bmatrix}$, where the correlation parameter varies in the range 0 to 1. Here, we consider a scenario with 3 sources and 5 mixtures, where the sample size is 20000. We set $\rho_s = 0.5$. In each simulation, we start with a random initial separator matrix whose coefficients are generated from an i.i.d. Gaussian distribution.
Fig. 4 shows the output total Signal energy to total Interference energy (over all outputs) Ratio (SIR) obtained for the BCA algorithm (corresponding to $J_1$) versus the iterations, for 50 random initial separator matrices. In these experiments, we used $\mu^{(t)} = \frac{0.5}{t+1}$ as the step-size rule, which satisfies the zero-limit and divergent-sum properties.
Fig. 4: Convergence curves for the BCA algorithm for different random initial separator matrices.
We observe from the figure that the BCA algorithm converges to a global maximum irrespective of the choice of initial seeds.
Fig. 5: SINR performances for 30 dB receiver SNR (BCA ($J_{2,1}$), BCA with range estimate, and complex FastICA).
B. Deviation from the Ideal
The following two factors have an impact on the validity of the assumptions used in the analysis and, therefore, on the corresponding theoretical results:
•Finite Sample Effects: If the number of samples used by the algorithm is not sufficiently large (as in the previous example), the assumption (A1) may not hold.
•Noisy Observations: The BCA algorithms and their convergence analysis assume noise-free observations. The presence of noise has an impact especially on determining the true ranges of the separator outputs.
In order to investigate the impact of these factors, we consider a practical digital communication scenario (similar to the example in [21]):
•There are 8 co-channel transmitters: four of them use a 4-QAM constellation and the other four use a 16-QAM constellation.
•Their signals are received at a base station with 16 antennas.
•The channel is flat fading, i.e., it has no memory.
For this scenario, we consider high (30 dB) and moderate (15 dB) receiver Signal to Noise Ratio (SNR) cases. We evaluate the Signal to Interference+Noise Ratio (SINR) performance of the BCA ($J_{2,1}$) algorithm in comparison with the complex FastICA approach. We consider different data lengths, starting from 100 samples.
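A minimal sketch of this scenario's data generation follows. The dimensions (8 transmitters, 16 antennas) come from the setup above, while the unit-spacing square QAM constellations, the i.i.d. complex Gaussian flat-fading channel, and the circularly symmetric receiver noise are our illustrative assumptions; the paper does not prescribe these details:

```python
import numpy as np

rng = np.random.default_rng(2)
p, q, N = 8, 16, 1000            # 8 transmitters, 16 antennas, block length
snr_db = 30.0

def qam(M, n, rng):
    """Unit-spacing square M-QAM symbols (M = 4 or 16)."""
    m = int(np.sqrt(M))
    levels = 2 * np.arange(m) - (m - 1)        # [-1, 1] or [-3, -1, 1, 3]
    return rng.choice(levels, n) + 1j * rng.choice(levels, n)

# Four 4-QAM and four 16-QAM co-channel sources
S = np.vstack([qam(4, N, rng) for _ in range(4)]
              + [qam(16, N, rng) for _ in range(4)])

# Flat-fading (memoryless) channel plus additive Gaussian receiver noise
H = (rng.normal(size=(q, p)) + 1j * rng.normal(size=(q, p))) / np.sqrt(2)
X = H @ S
noise_var = X.var() / 10 ** (snr_db / 10)
V = np.sqrt(noise_var / 2) * (rng.normal(size=X.shape)
                              + 1j * rng.normal(size=X.shape))
Y = X + V    # noisy observations fed to the separation algorithms
```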
Both the presence of noise and the availability of only short data records cause inaccuracies in determining the ranges of the separator outputs. For this purpose, we consider a modification of the iterations of the $J_{2,1}$-based BCA algorithm, whose performance is to be empirically tested. In the modified BCA algorithm, the true maximum (minimum) points are determined by a certain neighborhood of the sample maximum (minimum). For this purpose, we modify the definitions of $K_{m,+}(Z)$ and $K_{m,-}(Z)$ as
•$\bar{K}_{m,+}(Z) = \{k : z_m(k) \geq \alpha\,\hat{u}_m(Z)\}$, which is the set of index points for which the $m$th separator output $z_m(k)$ is located in the $\alpha$-fractional neighborhood of the sample maximum $\hat{u}_m$,
•$\bar{K}_{m,-}(Z) = \{k : z_m(k) \leq \alpha\,\hat{l}_m(Z)\}$, which is the set of index points for which the $m$th separator output $z_m(k)$ is located in the $\alpha$-fractional neighborhood of the sample minimum $\hat{l}_m$.

Fig. 6: SINR performances for 15 dB receiver SNR (BCA ($J_{2,1}$), BCA with range estimate, and complex FastICA).
Here $\alpha$ is a positive quantity whose value is close to, but less than or equal to, 1. Its value is tuned based on the noise variance level. This modification is along the same lines as the approach proposed in [13] and used in [30], where the output range is estimated based on the largest and the smallest $h$ sorted output components.
Given these index set definitions, modified BCA algorithms can be obtained by replacing $b_m$ in (34) with
$$b_m = \sum_{k\in\bar{K}_{m,+}(Z)}\lambda_{m,+}(k)y(k) - \sum_{k\in\bar{K}_{m,-}(Z)}\lambda_{m,-}(k)y(k), \tag{105}$$
where we choose $\lambda_{m,+}(k) = \frac{1}{|\bar{K}_{m,+}|}$ and $\lambda_{m,-}(k) = \frac{1}{|\bar{K}_{m,-}|}$ for the current numerical example. Furthermore, the range is replaced by the estimate
$$\hat{R}_m(Z) = \frac{1}{|\bar{K}_{m,+}|}\sum_{k\in\bar{K}_{m,+}(Z)} z_m(k) - \frac{1}{|\bar{K}_{m,-}|}\sum_{k\in\bar{K}_{m,-}(Z)} z_m(k). \tag{106}$$
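A compact sketch of the range estimate (106) follows; the function name `fractional_range` is ours, and the estimator as written assumes $\hat{u}_m > 0 > \hat{l}_m$ (as for centered outputs), matching the index-set definitions above:

```python
import numpy as np

def fractional_range(Z, alpha=0.95):
    """Range estimate in the spirit of (106): average the samples falling in
    the alpha-fractional neighborhoods of the sample maximum and minimum of
    each output row (assumes u_m > 0 > l_m, as for centered outputs)."""
    R = np.empty(Z.shape[0])
    for m, z in enumerate(Z):
        top = z[z >= alpha * z.max()]    # samples indexed by \bar{K}_{m,+}(Z)
        bot = z[z <= alpha * z.min()]    # samples indexed by \bar{K}_{m,-}(Z)
        R[m] = top.mean() - bot.mean()
    return R

# A +/-1 source observed with additive Gaussian noise: the raw sample range
# overshoots the true range 2, while the neighborhood average pulls it back.
rng = np.random.default_rng(5)
z = rng.choice([-1.0, 1.0], 5000) + 0.05 * rng.normal(size=5000)
R_est = fractional_range(z[None, :])
print(R_est[0], z.max() - z.min())
```

Since the neighborhood averages are bounded by the extremes, this estimate never exceeds the raw $\max - \min$ range.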
In Fig. 5 and Fig. 6, the Signal-to-Interference-plus-Noise-Ratio (SINR) performances of the original BCA algorithm based on the objective function $J_{2,1}$ and of its modification based on the range estimate are shown, for receiver SNR levels of 30 dB and 15 dB, respectively. We also include the performance of
Fig. 7: Image source separation example: (a) sources with Gaussian noise (SNR = 15 dB), (b) mixtures, (c) FastICA algorithm, (d) modified BCA ($J_{2,1}$) with $\alpha = 0.95$.
the complex FastICA algorithm [31] for comparison. It can be observed from these figures that
•the performance of the modified BCA algorithm with the range estimate is better than that of the original BCA algorithm, and the performance gain is more pronounced in the high-noise (low-SNR) case;
•in both cases, an increasing number of samples improves the performance, as the probability of satisfying assumption (A1) increases with increasing data length.
As the final example, we consider the problem of separating images, along with a perturbation of the sources that violates the special boundedness assumption. For this purpose, similar to [30], we use 18 natural images from the Berkeley segmentation dataset and benchmark [32]. All images are converted to gray level and cropped to 200 × 200 size. In each experiment, 6 images are randomly selected and corrupted with independent identically distributed (i.i.d.) Gaussian noise. The resulting noisy sources are more likely to violate assumption (A1) with increasing noise levels. These six sources are mixed with a 6 × 6 matrix composed of i.i.d. uniform variables with range [0,1].
Fig. 7(a) illustrates the 6 noisy sources for an SNR level of 15 dB. Fig. 7(b) shows the corresponding mixtures and Fig. 7(c) the FastICA outputs. Finally, Fig. 7(d) shows the outputs of the modified BCA ($J_{2,1}$) algorithm with $\alpha = 0.95$. It is clear from these figures that the FastICA outputs still look scrambled, which can be attributed to the correlations among the original source images, whereas the BCA algorithm achieves visually satisfactory performance.
This experiment is repeated for different SNR levels, with 50 experiments for each SNR choice. In each experiment, we calculate the Signal to Interference Ratio for all separator outputs. Fig. 8 shows the average SIR levels for the original BCA algorithm, the modified BCA algorithm (with $\alpha = 0.95$) and the FastICA algorithm, as a function of SNR. It is clear that for large noise levels the modified BCA algorithm outperforms the original BCA algorithm; however, this trend is reversed with increasing SNR. We can therefore expect that in the case of high noise levels, assumption (A1) is more likely to be violated and the original BCA algorithm's performance degrades. The use of the modified BCA algorithm can reduce the performance degradation in such a scenario.
Fig. 8: Image separation SIR performance as a function of SNR (BCA ($J_{2,1}$), modified BCA ($J_{2,1}$) with $\alpha = 0.95$, and FastICA).
IX. CONCLUSION
This article offers a stationary point characterization for the instantaneous BCA algorithms introduced in [21], for both real and complex algorithm iterations. As an important result, it is shown that the corresponding stationary points are either perfect separators, corresponding to the global maxima of the objectives, or saddle points. Therefore, these stationary points are free of false local maxima or minima in which the algorithm could get stuck. The saddle points are not stable, and small perturbations lead the algorithm search away from such points. This is a powerful global result, especially considering the fact that the corresponding BCA objectives are non-convex and non-smooth functions. The numerical examples provided in the previous section also support the behavior captured by the analytical results. Furthermore, we provided a modification of the BCA algorithms in [21] by appending range estimation, similar to [13], to address deviations from the ideal assumptions in the form of noisy observations and short data records. The empirical results confirm the performance gain of this modification, especially against noise effects.
APPENDIX
A. Proof of the simplification of the source dependent term
In this section, we prove that the term
$$[H^\dagger b_m^{(t)}]^T = [H^\dagger c_{m,+}^{(t)}]^T - [H^\dagger c_{m,-}^{(t)}]^T \tag{107}$$
simplifies to
$$[H^\dagger b_m^{(t)}]^T = \mathrm{sign}\left\{G^{(t)}_{m,:}\right\}(U - L), \tag{108}$$
which can be written in scalar form as
$$[H^\dagger b_m^{(t)}]_n = \mathrm{sign}\left\{G^{(t)}_{m,n}\right\}(u_n - l_n) \tag{109}$$
for $n = 1,\ldots,p$.
Given the definitions of the centers $c_{y,+}^{(t)}$ and $c_{y,-}^{(t)}$ in (50) and (51), we can use them to express the result in terms of the sources as
$$H^\dagger c_{y,+}^{(t)} = \sum_{k\in K^{\mathrm{act}}_{m,+}(Z^{(t)})}\frac{H^\dagger y(k)}{|K^{\mathrm{act}}_{m,+}(Z^{(t)})|} = \sum_{k\in K^{\mathrm{act}}_{m,+}(Z^{(t)})}\frac{s(k)}{|K^{\mathrm{act}}_{m,+}(Z^{(t)})|}, \tag{110}$$
and
$$H^\dagger c_{y,-}^{(t)} = \sum_{k\in K^{\mathrm{act}}_{m,-}(Z^{(t)})}\frac{H^\dagger y(k)}{|K^{\mathrm{act}}_{m,-}(Z^{(t)})|} = \sum_{k\in K^{\mathrm{act}}_{m,-}(Z^{(t)})}\frac{s(k)}{|K^{\mathrm{act}}_{m,-}(Z^{(t)})|}. \tag{111}$$
The interpretation of these terms is the following. Since there are more sensors than sources, the pseudoinverse $H^\dagger$ of the mixing matrix can be used to linearly map, without loss of information, the signal component of the observations onto the corresponding sources. In this way, the centers of the opposite (or inverted) faces of the hyper-parallelepiped of the observations, $c_{y,+}^{(t)}$ and $c_{y,-}^{(t)}$, are respectively mapped to $H^\dagger c_{y,+}^{(t)}$ and $H^\dagger c_{y,-}^{(t)}$, the centers of the corresponding opposite faces of the bounding hyper-rectangle of the sources.
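This lossless mapping by the pseudoinverse is straightforward to confirm numerically (an illustrative check of ours, with noise-free observations as assumed in the analysis):

```python
import numpy as np

rng = np.random.default_rng(7)
p, q, N = 3, 5, 200                        # more sensors than sources (q > p)

S = rng.uniform(-1.0, 1.0, size=(p, N))    # bounded sources
H = rng.normal(size=(q, p))                # full column rank almost surely
Y = H @ S                                  # noise-free observations

H_pinv = np.linalg.pinv(H)                 # H^dagger = (H^T H)^{-1} H^T
print(np.allclose(H_pinv @ Y, S))          # True: observations map back to sources
```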
From the definitions (43) and (45), it is straightforward to observe that the uniform combination in (110) simplifies to
$$[H^\dagger b_m^{(t)}]_n = u_n - l_n \quad \forall n\in\mathcal{I}^{(t)}_{m,+}, \tag{112}$$
and also that (111) simplifies to
$$[H^\dagger b_m^{(t)}]_n = l_n - u_n \quad \forall n\in\mathcal{I}^{(t)}_{m,-}. \tag{113}$$
In the third case, since $l_n\leq s_n(k)\leq u_n$, we can only obtain from (43) and (45) the interval
$$[H^\dagger b_m^{(t)}]_n \in (u_n - l_n)[-1,1] \quad \forall n\in\mathcal{I}^{(t)}_{m,0}.$$
However, in order to determine the exact value of $[H^\dagger b_m^{(t)}]_n$ at $n\in\mathcal{I}^{(t)}_{m,0}$, we can resort to the following fact. Since $H^\dagger b_m^{(t)}$ in (107) is interpreted as the difference between the centers of opposite faces of the hyper-rectangle of the sources, geometrically it should be orthogonal to the subspace in which these faces are defined (i.e., orthogonal to the unit vectors $e_n$, $\forall n\in\mathcal{I}^{(t)}_{m,0}$). This enforces the final simplification
$$[H^\dagger b_m^{(t)}]_n = \langle H^\dagger b_m^{(t)}, e_n\rangle = 0 \quad \forall n\in\mathcal{I}^{(t)}_{m,0}. \tag{114}$$
Then, we can summarize the three previous results (112)-(114) into the equation
$$[H^\dagger b_m^{(t)}]_n = \mathrm{sign}\left\{G^{(t)}_{m,n}\right\}(u_n - l_n)$$
for $n = 1,\ldots,p$, which proves the desired equality in (109).
Figure 9 illustrates the orthogonality of $H^\dagger b_1$ to the corresponding faces $F_{1,+}(Z)$ and $F_{1,-}(Z)$ for the example in Figure 2. For this example, $H^\dagger b_1 = (u_1 - l_1)e_1$ and $\mathcal{I}_{1,0} = \{2,3\}$, which confirms the orthogonality property put forward in (114).
Fig. 9: Illustration of the orthogonality of $H^\dagger b_m$ to the faces $F_{m,+}(Z)$ and $F_{m,-}(Z)$.
B. Proof of Theorem 1
We note that $G$ is a perfect separator matrix if and only if $\tilde{Q}$ is a perfect separator matrix. Therefore, it is equivalent to show that all $\tilde{Q}$ matrices satisfying (73) are saddle points if they are not perfect separators.
From (17), the cost function in terms of $\tilde{Q}$ is equivalent to
$$J_1(\tilde{Q}) = |\det(\tilde{Q})|. \tag{115}$$
From Lemma 1, $\tilde{Q}$ can be permuted such that the upper-left $2\times 2$ sub-matrix of its sign matrix becomes $\Lambda_1$ or $\Lambda_{-1}$. We define the permuted matrix as $\breve{Q}$ and, w.l.o.g., we assume $\mathrm{sign}\{\breve{Q}_{1:2,1:2}\} = \Lambda_1$. We also observe that $J(\tilde{Q}) = J(\breve{Q})$. We partition
$$\breve{Q} = \begin{bmatrix}\breve{Q}^{(a)} & \breve{Q}^{(b)}\\ \breve{Q}^{(c)} & \breve{Q}^{(d)}\end{bmatrix}, \qquad \begin{array}{ll}\breve{Q}^{(a)} = \breve{Q}_{1:2,1:2}, & \breve{Q}^{(b)} = \breve{Q}_{1:2,3:p},\\ \breve{Q}^{(c)} = \breve{Q}_{3:p,1:2}, & \breve{Q}^{(d)} = \breve{Q}_{3:p,3:p}.\end{array}$$
Note that $\mathrm{sign}\{\breve{Q}^{(a)}\} = \Lambda_1$.
The proof is based on the use of the Schur complement of $\breve{Q}^{(d)}$, which is defined as
$$\Delta = \breve{Q}^{(a)} - \breve{Q}^{(b)}\left(\breve{Q}^{(d)}\right)^{-1}\breve{Q}^{(c)}. \tag{116}$$
Therefore, we first need to show that $\breve{Q}^{(d)}$ is invertible: from (73), we have
$$\begin{bmatrix}\breve{Q}^{(a)} & \breve{Q}^{(b)}\\ \breve{Q}^{(c)} & \breve{Q}^{(d)}\end{bmatrix}\begin{bmatrix}\mathrm{sign}\{\breve{Q}^{(a)}\}^T & \mathrm{sign}\{\breve{Q}^{(c)}\}^T\\ \mathrm{sign}\{\breve{Q}^{(b)}\}^T & \mathrm{sign}\{\breve{Q}^{(d)}\}^T\end{bmatrix} = I,$$
which yields
$$\breve{Q}^{(c)}\mathrm{sign}\{\breve{Q}^{(a)}\}^T + \breve{Q}^{(d)}\mathrm{sign}\{\breve{Q}^{(b)}\}^T = 0. \tag{117}$$
If $\breve{Q}^{(d)}$ were singular, there would exist a non-zero vector $x\in\mathbb{R}^{p-2}$ such that $x^T\breve{Q}^{(d)} = 0$. Therefore,
$$x^T\breve{Q}^{(c)}\mathrm{sign}\{\breve{Q}^{(a)}\}^T = 0, \tag{118}$$
which yields $x^T\breve{Q}^{(c)} = 0$, since $\mathrm{sign}\{\breve{Q}^{(a)}\} = \Lambda_1$ is non-singular. Defining the non-zero vector $\hat{x}\in\mathbb{R}^p$ as $\hat{x} = [0\ 0\ x^T]^T$, we have
$$\begin{bmatrix}\breve{Q}^{(a)T} & \breve{Q}^{(c)T}\\ \breve{Q}^{(b)T} & \breve{Q}^{(d)T}\end{bmatrix}\hat{x} = 0. \tag{119}$$
This yields a contradiction, since $\breve{Q}^T$ is non-singular. Therefore, $\breve{Q}^{(d)}$ is non-singular.
We now prove that $\breve{Q}$ is a saddle point. Using the Schur complement, we have
$$J_1(\breve{Q}) = \left|\det\left(\breve{Q}^{(d)}\right)\det(\Delta)\right|. \tag{120}$$
We note that
$$\Delta^{-1} = \left[\breve{Q}^{-1}\right]_{1:2,1:2} = \left[\mathrm{sign}\{\tilde{Q}\}^T\right]_{1:2,1:2} = \Lambda_1. \tag{121}$$
Hence we obtain $\Delta = \begin{bmatrix}0.5 & 0.5\\ 0.5 & -0.5\end{bmatrix}$. In order to show that $\breve{Q}$ is a saddle point, we will perturb it in two different directions and show that the objective increases in one direction while it decreases in the other. We note that the perturbations are chosen in such a way that they preserve the property that the rows of the perturbed matrices have unit $l_1$ norms, which is due to (72).
•If we perturb the $\breve{Q}^{(a)}$ matrix with $\mathcal{E}_1 = \begin{bmatrix}\epsilon & -\epsilon\\ -\epsilon & -\epsilon\end{bmatrix}$, for a sufficiently small $\epsilon > 0$, and do not perturb the remaining entries of $\breve{Q}$, then
–the $l_1$ norm of every row of the perturbed $\breve{Q}$ remains equal to unity: this perturbation affects only the first two rows of $\breve{Q}$, and for these rows only the upper-left $2\times 2$ block is affected, which becomes
$$\begin{bmatrix}\breve{Q}^{(a)}_{1,1}+\epsilon & \breve{Q}^{(a)}_{1,2}-\epsilon\\ \breve{Q}^{(a)}_{2,1}-\epsilon & \breve{Q}^{(a)}_{2,2}-\epsilon\end{bmatrix}. \tag{122}$$
Due to the fact that $\mathrm{sign}\{\breve{Q}^{(a)}\} = \begin{bmatrix}1 & 1\\ 1 & -1\end{bmatrix}$, the $l_1$ norms of the rows of both the perturbed $\breve{Q}^{(a)}$ matrix and the corresponding perturbed $\breve{Q}$ matrix remain unchanged;
–$\det(\breve{Q}^{(d)})$ does not change and $|\det(\Delta)|$ becomes $|\det(\Delta)| + 2\epsilon^2$. Hence, the value of $J_1$ at the perturbed matrix is strictly greater than $J_1(\breve{Q})$.
•If we now perturb the $\breve{Q}^{(a)}$ matrix with $\mathcal{E}_2 = \begin{bmatrix}\epsilon & -\epsilon\\ \epsilon & \epsilon\end{bmatrix}$, for a sufficiently small $\epsilon > 0$, then
–the $l_1$ norm of every row of the perturbed $\breve{Q}$ remains equal to unity, which can be shown using the same line of arguments as in the previous case;
–$\det(\breve{Q}^{(d)})$ does not change and $|\det(\Delta)|$ becomes $|\det(\Delta)| - 2\epsilon^2$. Hence, the value of $J_1$ at the perturbed matrix is strictly less than $J_1(\breve{Q})$.
As a result, we can conclude that if a stationary point of the algorithm (55) (or of the algorithm (59)) does not belong to the set of perfect separators, then it is a saddle point.
REFERENCES
[1] Pierre Comon and Christian Jutten, Handbook of Blind Source Separation: Independent Component Analysis and Applications, Academic