
An Algorithmic Framework for Sparse Bounded

Component Analysis

Eren Babatas and Alper T. Erdogan

Koc University, Istanbul, Turkey

Abstract—Bounded Component Analysis (BCA) is a recent

approach which enables the separation of both dependent and

independent signals from their mixtures. This article introduces

a novel deterministic instantaneous BCA framework for the

separation of sparse bounded sources. The framework is based on

a geometric maximization setting, where the objective function is

deﬁned as the volume ratio of two objects, namely the principal

hyperellipsoid and the bounding ℓ1-norm ball, defined over the

separator output samples. It is shown that all global maxima

of this objective are perfect separators. The article also provides

the corresponding iterative algorithms for both real and complex

sparse sources. The numerical experiments illustrate the potential

beneﬁts of the proposed approach, with applications on image

separation and neuron identiﬁcation.

Index Terms—Sparse Source Separation, Bounded Component

Analysis, Independent Component Analysis, Blind Source Sepa-

ration, Sparse Component Analysis

I. INTRODUCTION

Bounded Component Analysis (BCA) is a framework for

blind separation of both independent and dependent sources

from their mixtures [1]. It has been demonstrated to be useful especially for settings involving dependent sources, such as natural images, and/or short data records, such as digital communication packets, where the mutual independence assumption used

by Independent Component Analysis (ICA) approach is either

not directly applicable or ill-posed [2]–[5].

Having its roots in the modeling of neuro-physiological

processes, ICA has been the most popular framework for the

Blind Source Separation (BSS) problem [6]–[8]. The mutual

independence of sources is a sufﬁcient assumption for obtain-

ing the solution of the linear BSS problem, and this fact has

led to several tractable algorithms and analysis results for BSS

(see for example [9]–[11]). However, for some applications,

the source signals can be dependent/correlated. Even in the

case where the generating processes may be independent, the

available source samples may not be sufﬁciently long to reﬂect

this underlying stochastic property.

BCA addresses this issue through relaxing the mutual

independence assumption by replacing it with the weaker

domain separability assumption under the side information

that sources take their values from a compact set [1]. In this

sense, BCA can be viewed as an extension of ICA for bounded

magnitude signals allowing the separation of both dependent

and independent sources. In [2], a geometric framework for

developing BCA algorithms for simultaneous separation of

sources was proposed. This framework was based on the

assumption that the source samples appropriately ﬁll out the

rectangular region (ℓ∞-norm ball), which is the Cartesian

product of individual source domains. This is a good fit for many practical signals, such as digital communication constellations, natural images, and harmonic oscillations of subgaussian nature. However, it is not appropriate for heavy-tailed or sparse signals with supergaussian distributions. It is

the goal of this current article to extend the BCA framework

in [2] to the settings involving sparse bounded sources.

We should note at this point that the sparsity concept has

been at the focus of several different research ﬁelds including

signal processing, machine learning and neuroscience for

several decades. One of the major drivers of this area has been

the efﬁcient representation and denoising of signals through

complete or overcomplete representations especially based on

wavelet transforms [12], [13]. Connected with this research,

the emergence of mathematically tractable cost functions used

for measuring (non)sparsity, such as the ℓ1-norm, and related

convex optimization based formulations generated signiﬁcant

boost and attracted interest from a wider research community.

Another important driver has been the successful modeling

of the visual process in brain through sparse coding [14],

which led to further sparse coding based approaches in both

computational neuroscience and machine learning ﬁelds [15]–

[17]. The main projection of the sparsity driven research in

signal processing community has been the compressed sensing

ﬁeld where the main interest has been the exploitation of

the sparsity side information (in an appropriate basis) in the

sampling of signals [18]–[20].

Sparsity side information has also been exploited to solve

BSS problem especially in conjunction with the ICA approach

[7]. The ICA approaches with sparsity promoting contrast

functions can be listed as the main examples in this ﬁeld, e.g.,

[21]. Sparsity side information can be also used to replace the

independence assumption in separating sources, e.g., [22]. General BSS approaches exploiting sparsity are referred to as Sparse

Component Analysis (SCA) methods, e.g., [23], [24]. SCA

algorithms have two main branches: underdetermined SCA,

with more sources than mixtures, and (over)determined SCA.

We can list cluster based approaches, such as [22], [25], and

alternating projection based dictionary learning algorithms,

such as [26], [27], as examples of underdetermined SCA.

For (over)determined SCA methods, we can give Maximum

Likelihood / Maximum A Posteriori based approaches [28],

[29] as examples.

In this article, we expand the approach proposed in [2]

to sparsely natured bounded signals for the (over)determined

mixing scenario. We modify the geometric framework in [2]


Fig. 1: Geometric objects for the proposed sparse BCA framework. Diamond-shaped boxes on the left and right sides are the bounding ℓ1-norm balls for the source and separator output samples, respectively. Red balls on the left and right sides are the principal hyperellipsoids corresponding to the source and separator output samples, respectively. The green polytope on the right is the image of the input ℓ1-norm ball under the overall mapping G = WH.

to reﬂect the sparsity of the original source vectors. Similar to

[2], we pose the Sparse BCA (SBCA) problem as the volume

ratio maximization for the objects deﬁned at the separator

output domain. As the main deviation from the approach in

[2], the sources are assumed to be sparse in the sense that

they are located in an `1-norm ball, and they are locally

dominant only for a small subset of samples. The proposed

approach is completely deterministic and potentially applica-

ble to separation of both dependent and independent sources.

The contributions covered in this article were partly presented

in the conference article [30]. The present article provides a

comprehensive treatment of the proposed framework including

new sections on

• The algorithm extensions for complex source signals and noisy mixtures (in Section IV),
• Two proposals for algorithm acceleration (in Section V). In particular, we provide a new algorithm update rule, based on weighted updates, with significant computational simplification. Furthermore, this form enables comparison with the Infomax ICA algorithm,
• Algorithm application results on a new set of synthetic signals (in Section VI-A) as well as some neuroimaging signals. In fact, a tool based on the SBCA algorithm for neural activity identification using calcium (Ca2+) fluorescent imaging records is introduced in Section VI-C.

The organization of the article is as follows: The SBCA setup

assumed throughout the article is introduced in Section II.

In Section III, SBCA framework is proposed. In the same

section, it is shown that the global maximizers of the proposed

objective are perfect separators. An iterative algorithm for

the proposed optimization setting is also provided. The al-

gorithmic extensions of SBCA for complex sources and noisy

case are provided in Section IV. In Section V, the algorithm

acceleration through weighted update rule is introduced, which

turns out to have a notable resemblance to Infomax (ICA)

algorithm. In Section VI, numerical examples demonstrating

the potential utility of the proposed approach relative to some

existing SCA algorithms, and the BCA algorithms in [1], [2]

are provided. We also illustrate the application of the proposed

algorithm, for image separation, and neuron/neural activity

identiﬁcation in Ca2+ ﬂuorescent imaging. Finally, Section VII

is the conclusion.

We use the following notation, for a set A, matrix G ∈ R^{p×p}, vector x ∈ R^p, and integers a, b, k ≥ 1:

‖x‖_a : ℓ_a-norm, defined as ( Σ_{n=1}^{p} |x_n|^a )^{1/a}
‖G‖_{a,b} : induced matrix norm, defined as sup_{‖x‖_b ≤ 1} ‖Gx‖_a
G_{:,k} (G_{k,:}) : kth column (row) of G
Co A : convex hull of the set A
1_{condition} : indicator function; 1 if the condition is true, 0 otherwise
sign{A} : replaces positive (negative) entries of A with 1 (−1)
Re{A} : real part of A
Im{A} : imaginary part of A
sign_c{A} : complex sign operator, sign{Re{A}} + i sign{Im{A}}

TABLE I: Notation Table.

II. SPARSE BOUNDED COMPONENT ANALYSIS SETUP

Throughout this article, we assume the data model shown

in Fig. 1. In this setup,

• There are p sources represented by the vector s = [s1 s2 . . . sp]^T ∈ R^p. It is also assumed that there are L samples of these sources, represented by the set S = {s(n) ∈ R^p, n = 1, . . . , L}. Furthermore, the source vectors are bounded in magnitude and lie in an ℓ1-norm ball, i.e.,

s(n) ∈ B(S), n = 1, . . . , L,   (1)

where

B(S) = {q ∈ R^p : ‖q‖_1 ≤ 1}.   (2)

This implies that each source takes its values from the interval [−1, 1], which simplifies future expressions without any loss of generality. In the more general case, B(S) could be selected as a weighted ℓ1-norm ball. We also note that no stochastic assumptions, such as mutual independence of its components, are made about the source vector.

In order to clarify the ℓ1-norm ball choice for the sources, we provide the following explanation: the actual indicator of non-sparseness is the ℓ0-norm, which counts the number of non-zero entries in a vector. The fact that ℓ0 is non-convex, and not an actual norm, has led to the use of the ℓ1-norm as its algorithmic convex surrogate. It is well known that this replacement has led to remarkable success in constructing sparseness-centered algorithms. In fact, it is the goal of this article to use the ℓ1-norm as such a surrogate to promote sparseness. Clearly, sources lying in ℓ_r-norm balls for some 0 < r < 1 would also lie inside the ℓ1-norm ball, and therefore fall into the domain of interest where the framework proposed in this article works.

• The mixing system is a linear and memoryless mapping represented by the matrix H ∈ R^{q×p}. We assume that q ≥ p, i.e., H is a tall or square matrix. In addition, it is full rank.

• The memoryless mixtures of the sources are represented by the vector y = [y1 y2 . . . yq]^T. The relation between the mixtures and the sources is defined by

y(n) = Hs(n), n = 1, . . . , L.   (3)

• W ∈ R^{p×q} is the separator matrix, and its outputs are represented by the vector z = [z1 z2 . . . zp]^T ∈ R^p. The relation between separator outputs and mixtures is given by

z(n) = Wy(n), n = 1, . . . , L.   (4)

• We define G ∈ R^{p×p} as the cascade of the separator and the mixing systems, i.e., G = WH. This defines the overall mapping from sources to separator outputs in the form

z(n) = Gs(n), n = 1, . . . , L.   (5)

We can explicitly pose the BSS Problem as follows: Given

only the mixture samples Y={y(1),y(2),...,y(L)}, and

no information about the mixing system H, ﬁnd the original

source signals.

Due to its unsupervised nature, we can accomplish this goal only up to some uncertainty. In the ideal case, we expect only one entry in each row of G to be non-zero to achieve separation. This condition can be represented as

G = PD,   (6)

where P is a permutation matrix and D is a full-rank diagonal matrix. Therefore, we will refer to a matrix W as a perfect separator if and only if its corresponding G matrix satisfies the condition in (6).
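As a concrete illustration of condition (6), the following sketch (our own helper, not part of the paper's algorithms) checks whether a given overall matrix G has exactly one non-zero entry in each row and each column:

```python
import numpy as np

def is_perfect_separator(G, tol=1e-9):
    """Check the condition G = P D of Eq. (6): exactly one non-zero entry
    per row and per column (P a permutation, D a full-rank diagonal)."""
    mask = np.abs(np.asarray(G, dtype=float)) > tol
    return bool(np.all(mask.sum(axis=0) == 1) and np.all(mask.sum(axis=1) == 1))

# A permuted, scaled identity satisfies the condition; a mixing matrix does not.
G_good = np.array([[0.0, 2.0], [-3.0, 0.0]])
G_bad = np.array([[1.0, 0.5], [0.2, 1.0]])
```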

III. SPARSE BCA FRAMEWORK

In this section, we propose a new algorithmic framework

for the BSS problem introduced in the previous section.

The proposed approach is the adaptation of the geometric

framework introduced in [2], where we pose obtaining the

separator as maximizing the volume ratio of two objects

deﬁned for the separator outputs.

In Section III-A, we formulate the corresponding optimiza-

tion setting. In Section III-B, we show that all global optima

of this setting are perfect separators. In Section III-C, we

compare the proposed objective with the quasi ML based

SCA approaches. Finally, in Section III-D, we provide explicit

algorithm iterations to maximize the proposed objective.

A. Geometric Optimization for Sparse BCA

As stated above, in this section we propose a geometric

optimization setup for the sparse BCA problem. We start

by deﬁning the set of separator output samples for a given

separator matrix W, and the corresponding Gas

Z={Wy(1),Wy(2),...,Wy(L)},(7)

={Gs(1),Gs(2),...,Gs(L)}.(8)

The members of the set Z, i.e., the separator outputs, are represented as (blue) dots on the right side of Fig. 1.

Corresponding to this set, we deﬁne the following objects:

• Principal Hyper-Ellipsoid: This object reflects the "shape" of the separator output samples based on their sample covariance matrix. We first define the sample covariance matrix for Z as

R̂(Z) = (1/L) Σ_{n=1}^{L} z(n) z(n)^T − μ̂(Z) μ̂(Z)^T,   (9)

where μ̂(Z) is the corresponding sample mean, given by

μ̂(Z) = (1/L) Σ_{n=1}^{L} z(n).   (10)

Based on these definitions, the principal hyper-ellipsoid corresponding to Z is given by

E(Z) = {q : (q − μ̂(Z))^T R̂(Z)^{−1} (q − μ̂(Z)) ≤ 1}.   (11)

The principal hyper-ellipsoid for the separator output samples, E(Z), is illustrated as the 3-dimensional (red) ellipsoid on the right of Fig. 1. In the same figure, on the left, the principal hyper-ellipsoid E(S) for the corresponding source samples is also shown. Here, E(S) is defined similarly as

E(S) = {q : (q − μ̂(S))^T R̂(S)^{−1} (q − μ̂(S)) ≤ 1},   (12)

where μ̂(S) is the sample mean for the source samples in S, given by

μ̂(S) = (1/L) Σ_{n=1}^{L} s(n),   (13)

and R̂(S) is the sample covariance for the set S, which is given by

R̂(S) = (1/L) Σ_{n=1}^{L} s(n) s(n)^T − μ̂(S) μ̂(S)^T.   (14)

Note that from the relation z(n) = Gs(n), it follows that R̂(Z) = G R̂(S) G^T.

• Bounding ℓ1-Norm Ball: This is the smallest ℓ1-norm ball containing a given compact set. For the separator output samples, we define the corresponding bounding ℓ1-norm ball B(Z) as

B(Z) = {q : ‖q‖_1 ≤ max_{n∈{1,...,L}} ‖z(n)‖_1}.   (15)

In Fig. 1, B(Z) is illustrated as the 3-dimensional diamond-shaped box on the right. In the same figure, B(S) is shown on the left.

Based on these definitions, and following the geometric approach in [2], we define the sparse BCA objective as the volume ratio

J̄(W) = Volume(E(Z)) / Volume(B(Z))   (16)
      = C_e √det(R̂(Z)) / ( C_l ( max_{n∈{1,...,L}} ‖z(n)‖_1 )^p ),   (17)

where C_e = π^{p/2} / Γ(p/2 + 1) is the scaling constant for the p-dimensional hyper-ellipsoid and C_l = 2^p / p! is the volume of the unit ℓ1-norm ball. Dropping the dimension-dependent constants C_e, C_l, the new objective is obtained as

J(W) = √det(R̂(Z)) / ( max_{n∈{1,...,L}} ‖z(n)‖_1 )^p.   (18)

Therefore, the sparse BCA problem is posed as the problem of choosing the separator matrix W to maximize the objective J(W) in (18). The denominator of the objective contains the maximum ℓ1-norm of the separator output samples, so that maximizing the objective amounts to minimizing the ℓ1-norm of the separator outputs, a well-known surrogate measure of sparsity, whereas the determinant term in the numerator acts as a regularizer that rules out the all-zeros solution and guarantees a full-rank mapping between sources and separator outputs. In this sense, the objective is a suitable criterion for sparse source separation; a more detailed understanding of how its maximization enables perfect separation is provided by the proof in the next section. In the next subsection, we prove that, under a certain local dominance assumption, the global maxima of this optimization setting are perfect separators.
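The objective in (18) is straightforward to evaluate numerically. The following sketch (illustrative; the function name is ours) computes J(W) from a matrix Y whose columns are mixture samples. Note that J is invariant to scaling of W, since scaling by c multiplies both the numerator and the denominator by |c|^p:

```python
import numpy as np

def sbca_objective(W, Y):
    """Volume-ratio objective J(W) of Eq. (18): sqrt(det R(Z)) over
    (max_n ||z(n)||_1)^p, with Z = W @ Y (one sample per column)."""
    Z = W @ Y
    p, L = Z.shape
    mu = Z.mean(axis=1, keepdims=True)
    R = Z @ Z.T / L - mu @ mu.T                # sample covariance, Eq. (9)
    num = np.sqrt(np.linalg.det(R))
    den = np.max(np.abs(Z).sum(axis=0)) ** p   # max l1-norm over samples
    return num / den
```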

B. Global Optimality of Perfect Separators

In order to ensure the global optimality of the perfect

separators with respect to the objective in (18), we make use

of the following assumption:

Assumption I: The source sample set S contains the vertices of the bounding ℓ1-norm ball B(S).

This deterministic assumption implies that at some sample instants only one source is active, and that for each source such sample instants exist. Therefore, Assumption I amounts to a local dominance condition on all sources.

The following theorem shows that, under the above local dominance assumption, all global maxima of J(W) are perfect separators:

Theorem: Given the BCA setup in Section II, if Assumption I holds, then all global maxima of (18) are perfect separators, for which

G = αP,   (19)

where P is a permutation matrix and α ∈ R, α ≠ 0.

Proof: We start by writing the objective function in terms of the argument G = WH:

J(G) = √det(G R̂(S) G^T) / ( max_{n∈{1,...,L}} ‖Gs(n)‖_1 )^p
     = |det(G)| √det(R̂(S)) / ( max_{n∈{1,...,L}} ‖Gs(n)‖_1 )^p.   (20)

For the denominator of (20), we can write

( max_{n∈{1,...,L}} ‖Gs(n)‖_1 )^p ≤ ‖G‖_{1,1}^p ‖s(n)‖_1^p,   (21)

where ‖G‖_{1,1} is the induced matrix norm defined in the notation table at the end of Section I, which can be written explicitly as [31]

‖G‖_{1,1} = ‖[ ‖G_{:,1}‖_1  ‖G_{:,2}‖_1  · · ·  ‖G_{:,p}‖_1 ]‖_∞.   (22)

Since s(n) is in the ℓ1-norm ball B(S) given in (2), we can rewrite the inequality in (21) as

( max_{n∈{1,...,L}} ‖Gs(n)‖_1 )^p ≤ ‖G‖_{1,1}^p.   (23)

If Assumption I holds, then (23) is an equality [31]; therefore, we can rewrite (20) as

J(G) = |det(G)| √det(R̂(S)) / ‖[ ‖G_{:,1}‖_1  ‖G_{:,2}‖_1  · · ·  ‖G_{:,p}‖_1 ]‖_∞^p.   (24)

Based on this expression, we can write

J(G) ≤ |det(G)| √det(R̂(S)) / ( ‖[ ‖G_{:,1}‖_1 · · · ‖G_{:,p}‖_1 ]‖_1 / p )^p   (25)
     ≤ |det(G)| √det(R̂(S)) / ( ‖G_{:,1}‖_1 ‖G_{:,2}‖_1 · · · ‖G_{:,p}‖_1 )   (26)
     ≤ |det(G)| √det(R̂(S)) / ( ‖G_{:,1}‖_2 ‖G_{:,2}‖_2 · · · ‖G_{:,p}‖_2 )   (27)
     ≤ √det(R̂(S)),   (28)

where

• the inequality (25) is due to the norm inequality between the ℓ∞ and ℓ1 norms, with equality if and only if all the columns of G have the same ℓ1-norm,
• (26) is due to the arithmetic-geometric mean inequality, with equality if and only if all the columns of G have the same ℓ1-norm,
• (27) is due to the norm inequality between the ℓ1 and ℓ2 norms, with equality if and only if each column of G has only one non-zero entry,
• (28) is due to the Hadamard inequality, with equality if and only if all the columns are orthogonal to each other.

As a result, the upper bound for the objective J(G) on the right-hand side of (28) is achieved if and only if G = αP, where P is a permutation matrix and α ≠ 0. It can be shown that if the sources have varying ranges, so that their samples lie in a weighted ℓ1-norm ball, then, due to the scaling indeterminacy of sources, the global optimizers take the more general form G = PD.
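The chain of inequalities (25)-(28) can also be checked numerically: when the sample set contains the vertices of B(S), no G can push J(G) above √det(R̂(S)), a value attained by the identity (a perfect separator). The following is our own sanity check, not part of the paper:

```python
import numpy as np

# Sources in the l1-ball whose sample set includes the 2p vertices of B(S),
# so that Assumption I holds by construction.
rng = np.random.default_rng(1)
p, L = 3, 200
S = rng.uniform(-1, 1, (p, L))
S /= np.maximum(np.abs(S).sum(axis=0), 1.0)          # force ||s(n)||_1 <= 1
S[:, :2 * p] = np.hstack([np.eye(p), -np.eye(p)])    # insert the vertices

def J(G):
    """Objective of Eq. (18) evaluated on the overall mapping G."""
    Z = G @ S
    mu = Z.mean(axis=1, keepdims=True)
    R = Z @ Z.T / L - mu @ mu.T
    return np.sqrt(np.linalg.det(R)) / np.max(np.abs(Z).sum(axis=0)) ** p
```

Under Assumption I, J(G) evaluated at any random G should never exceed J(I), in agreement with the theorem.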

C. Comparison of Sparse BCA to Independent Component

Analysis with Sparsity Promoting Marginals

Taking the logarithm of the sparse BCA objective in (18) converts the ratio form into a difference form, which can be written as the modified objective

𝒥(W) = log(J(W)) = (1/2) log(det(R̂(Z))) − p log max_{n∈{1,...,L}} ‖z(n)‖_1.   (29)

For the sample covariance of the separator outputs, we can write

R̂(Z) = W R̂(Y) W^T,   (30)

where R̂(Y) is the sample covariance matrix of the mixtures, given by

R̂(Y) = (1/L) Σ_{n=1}^{L} y(n) y(n)^T − μ̂(Y) μ̂(Y)^T,   (31)

with μ̂(Y) = (1/L) Σ_{n=1}^{L} y(n). Therefore, in the square case p = q, we have det(R̂(Z)) = |det(W)|^2 det(R̂(Y)), and the objective simplifies to

𝒥(W) = log(|det(W)|) − p log max_{n∈{1,...,L}} ‖z(n)‖_1,   (32)

where we neglected the constant term (1/2) log(det(R̂(Y))).

The expression in (32) resembles the form of the maximum likelihood formulation

L(W) = log(|det(W)|) + Σ_{k=1}^{p} Σ_{n=1}^{L} log(f_s(z_k(n))),   (33)

used in ICA settings with independent sources and samples, where f_s(·) is the presumed source marginal [7]. The outer summation in (33) reflects the independence of sources, and the inner summation reflects the assumption about the independence of samples. When the marginal density is Laplacian, this likelihood expression simplifies to

L(W) = log(|det(W)|) − Σ_{n=1}^{L} ‖z(n)‖_1.   (34)

The ML expression in (33) can be further generalized to the quasi-ML expression (e.g., [28])

L(W) = log(|det(W)|) + Σ_{k=1}^{p} Σ_{n=1}^{L} v(z_k(n)),   (35)

where v is not necessarily derived from log(f_s(·)).

The likelihood-based expressions in (33) and (35) (and therefore in (34)) assume the independence of both sources and samples, which is reflected by the summations over source components and samples. However, as an important difference, the deterministic sparse BCA framework based on the objective in (32) (and more generally in (18) or (29)) does not make such independence assumptions. Therefore, the proposed sparse BCA scheme is applicable to dependent sources with non-separable joint densities.

D. Iterative Algorithm for Sparse BCA

The modified sparse BCA objective 𝒥(W) in (29) is also convenient for iterative algorithm derivation, due to its additive form. Although 𝒥(W) is non-convex and not differentiable everywhere, we can still utilize the Clarke subdifferential [32] for deriving iterative algorithms.

The objective function 𝒥(W) is composed of two terms:

𝒥(W) = 𝒥1(W) − 𝒥2(W)   (36)
     = (1/2) log(det(R̂(Z))) − p log( max_{n∈{1,...,L}} ‖z(n)‖_1 ).   (37)

𝒥1(W), the first term, is a convex differentiable function, and 𝒥2(W), the second term, is a convex non-smooth function. In deriving the iterative update rule for maximizing 𝒥(W), we use the gradient term for 𝒥1(W) and a subgradient term for 𝒥2(W):

• Gradient Term for 𝒥1(W): The gradient term, which acts to increase the volume of the principal hyper-ellipsoid E(Z), can be written as

∇𝒥1(W) = ( W R̂(Y) W^T )^{−1} W R̂(Y).   (38)

• Subdifferential Set for 𝒥2(W): If we write 𝒥2(W) in the form

𝒥2(W) = max_{n∈{1,...,L}} f_n(W),   (39)

where f_n(W) = ‖Wy(n)‖_1, then the subdifferential set for 𝒥2 can be written as [33]

∂𝒥2(W) = Co( ∪ {∂f_i(W) : f_i(W) = 𝒥2(W)} ).   (40)


Now, let us calculate the subdifferential of the map v → ‖v‖_1. For v = [v_1 · · · v_N]^T ∈ R^N, one has ‖v‖_1 = |v_1| + |v_2| + · · · + |v_N|. Therefore,

∂‖v‖_1 / ∂v_j = v_j / |v_j| = sign(v_j), 1 ≤ j ≤ N,   (41)

provided that v_j ≠ 0. If v_j = 0, then we have

∂‖v‖_1 / ∂v_j = α_j ∈ [−1, 1].   (42)

Hence, the subdifferential set for f_n(W) is given by

∂f_n(W) = ∂( ‖[ W_{1,:}y(n) · · · W_{p,:}y(n) ]^T‖_1 )
        = { q y(n)^T : q_i = sign{z_i(n)} + 1_{z_i(n)=0} α_i, α_i ∈ [−1, 1] }.   (43)–(45)

Note that sign{z(n)} y(n)^T is a subgradient, i.e., a member of ∂f_n(W). Selecting this particular subgradient, a subgradient choice for 𝒥2(W) can be written as

Σ_{n∈I_W} λ_n sign{z(n)} y(n)^T ∈ ∂𝒥2(W),   (46)

where I_W = {n : ‖Wy(n)‖_1 = 𝒥2(W)} is the set of sample indices at which the maximum ℓ1-norm separator output is achieved, and the λ_n are convex combination coefficients satisfying λ_n ≥ 0 and Σ_{n∈I_W} λ_n = 1.

Based on the gradient expression in (38) and the subgradient choice in (46), an iterative update rule for maximizing the SBCA objective 𝒥(W) can be written as

W(t+1) = W(t) + μ(t) ( R̂(Z(t))^{−1} W(t) R̂(Y) − ( p / max_{n∈{1,...,L}} ‖z(t)(n)‖_1 ) Σ_{l∈I_{W(t)}} λ_l(t) sign{z(t)(l)} y(l)^T ),   (47)

where t is the iteration index. As a special case, if we choose only one of the convex combination coefficients to be non-zero, which corresponds to choosing an arbitrary index location l(t) from I_{W(t)} at every iteration, then the update rule simplifies to

W(t+1) = W(t) + μ(t) ( R̂(Z(t))^{−1} W(t) R̂(Y) − ( p / max_{n∈{1,...,L}} ‖z(t)(n)‖_1 ) sign{z(t)(l(t))} y(l(t))^T ).   (48)
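Putting (29), (38) and (48) together, a minimal real-valued implementation might look as follows. This is our own sketch with simplifying choices (identity initialisation, a 1/√t step decay, the first maximising index as l(t), and the best-objective iterate returned), not the authors' code:

```python
import numpy as np

def sbca_objective_log(W, Y):
    """Log-form objective of Eq. (29), square case, Z = W @ Y."""
    p, L = Y.shape
    Z = W @ Y
    mu = Z.mean(axis=1, keepdims=True)
    Rz = Z @ Z.T / L - mu @ mu.T
    return 0.5 * np.log(np.linalg.det(Rz)) - p * np.log(np.abs(Z).sum(axis=0).max())

def sbca_separate(Y, step=0.05, iters=2000):
    """Subgradient ascent of Eq. (48) for the real, noiseless, square case."""
    p, L = Y.shape
    Ym = Y.mean(axis=1, keepdims=True)
    Ry = Y @ Y.T / L - Ym @ Ym.T                  # sample covariance R(Y)
    W = np.eye(p)
    W_best, J_best = W.copy(), sbca_objective_log(W, Y)
    for t in range(iters):
        Z = W @ Y
        Zm = Z.mean(axis=1, keepdims=True)
        Rz = Z @ Z.T / L - Zm @ Zm.T
        l1 = np.abs(Z).sum(axis=0)
        l = int(np.argmax(l1))                    # sample with peak l1-norm
        grad = np.linalg.solve(Rz, W @ Ry)        # gradient term, Eq. (38)
        subg = (p / l1[l]) * np.outer(np.sign(Z[:, l]), Y[:, l])
        W = W + (step / np.sqrt(t + 1)) * (grad - subg)
        J = sbca_objective_log(W, Y)
        if J > J_best:                            # keep the best iterate
            W_best, J_best = W.copy(), J
    return W_best
```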

IV. ALGORITHMIC EXTENSIONS FOR SBCA

This part of the article provides algorithmic extensions

for the Sparse BCA framework introduced in Section III. In

Section IV-A, we provide the extension of the SBCA iterative

algorithm for the case of complex signals. The algorithm

modiﬁcation for the noisy scenario is introduced in Section

IV-B.

A. Extension to Complex Signals

In order to extend the algorithms to complex signals, we follow the isomorphism-based approach used in [2]. For this purpose, we define the operator Υ : C^p → R^{2p},

Υ(x) = [ Re{x^T}  Im{x^T} ]^T,   (49)

as an isomorphism between p-dimensional complex vectors and 2p-dimensional real vectors. For a given complex vector x, we use the notation x̀ to refer to its real isomorphic vector, i.e., x̀ = Υ(x). Similarly, we also define Γ : C^{p×q} → R^{2p×2q},

Γ(A) = [ Re{A}  −Im{A} ; Im{A}  Re{A} ],   (50)

for mapping complex matrices to real matrices, preserving the complex matrix-vector multiplication operation in the real domain. Based on these definitions, we can write

ỳ(n) = Γ(H) s̀(n),   (51)
z̀(n) = Γ(W) ỳ(n).   (52)

We use the notation Ỳ = {ỳ : y ∈ Y} and Z̀ = {z̀ : z ∈ Z} for the real-isomorphic mixture and separator output vector sets, respectively.
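These definitions are easy to verify in code: Γ(A)Υ(x) = Υ(Ax) for every complex matrix A and vector x. A minimal sketch (function names are ours):

```python
import numpy as np

def upsilon(x):
    """Eq. (49): stack real and imaginary parts of a complex vector."""
    return np.concatenate([x.real, x.imag])

def gamma(A):
    """Eq. (50): real 2p x 2q block representation of a complex p x q matrix."""
    return np.block([[A.real, -A.imag], [A.imag, A.real]])
```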

We can obtain the algorithm corresponding to the complex SBCA setting by defining the volume ratio objective in the 2p-dimensional isomorphic real space:

J_c(W) = √det(R̂(Z̀)) / ( max_{n∈{1,...,L}} ‖z̀(n)‖_1 )^{2p}.   (53)

Using algebraic manipulations similar to [2], the corresponding iterative update equation for W can be written as

W(t+1) = W(t) + μ(t) ( W(t)_logdet − W(t)_subg ),   (54)

where, denoting M(t) = R̂(Z̀(t))^{−1} Γ(W(t)) R̂(Ỳ) for brevity,

W(t)_logdet = (1/2) ( [I 0] M(t) [I; 0] + [0 I] M(t) [0; I] + j ( [I 0] M(t) [0; −I] + [0 I] M(t) [I; 0] ) ),   (55)

with [I 0] selecting the top block rows and [I; 0] the left block columns, and

W(t)_subg = ( 2p / max_{n∈{1,...,L}} ‖z̀(t)(n)‖_1 ) sign_c{z(t)(l(t))} y(l(t))^H.   (56)

As an alternative, we can define the cost function

J_ca(W) = √det(R̂(Z)) / ( max_{n∈{1,...,L}} ‖z̀(n)‖_1 )^p,   (57)

where the only changes are in the covariance term in the numerator, based on the equivalence |det(Γ(G))| = |det(G)|^2 [2], and in the degree of the denominator. In this case, the update terms simplify to

W(t)_logdet = R̂(Z(t))^{−1} W(t) R̂(Y),   (58)
W(t)_subg = ( p / max_{n∈{1,...,L}} ‖z̀(t)(n)‖_1 ) sign_c{z(t)(l(t))} y(l(t))^H.   (59)

B. Iterative Algorithm for Noisy Case

When we replace the mixture model with its noisy version, i.e.,

y(k) = Hs(k) + n(k) = y_noiseless(k) + n(k),   (60)

where n(k) is a random noise sequence, the corresponding noisy separator output can be written as

z(k) = W y_noiseless(k) + W n(k) = z_noiseless(k) + v(k).   (61)

Here the main difficulty is to identify the set I_W = {n : ‖z_noiseless(n)‖_1 = max_{k∈{1,...,L}} ‖z_noiseless(k)‖_1}, i.e., the peak locations of the ℓ1-norm of the noiseless separator outputs z_noiseless(k). Due to the presence of noise, the determination of these index locations from the noisy separator outputs z(k) becomes a stochastic estimation problem. In such a scenario, we assign a probability to each index in {1, . . . , L}. More explicitly, letting p_{I_W}(k) denote the probability mass function (pmf) for index k to be in I_W, i.e.,

p_{I_W}(k) = Pr(k ∈ I_W),   (62)

we can replace the update rule in (47) with

W(t+1) = W(t) + μ(t) ( R̂(Z(t))^{−1} W(t) R̂(Y) − ( p / max_{n∈{1,...,L}} ‖z(t)(n)‖_1 ) Σ_{l=1}^{L} p_{I_{W(t)}}(l) sign{z(t)(l)} y(l)^T ),   (63)

where the convex combination coefficients λ(t) in (47) are replaced by the pmf p_{I_{W(t)}} in (63). In this new form, we essentially calculate the expected W(t)_subg term by a weighted averaging of the contributions of all index points.

Obtaining an expression for the exact form of p_{I_W}(k) is a cumbersome process, even for the Gaussian noise scenario. Instead, we use a simplified approximation for this pmf. We first define an estimate of the set I_W as

Î_W = {n : ‖z(n)‖_1 ≥ β × max_{k∈{1,...,L}} ‖z(k)‖_1},   (64)

where 0 < β ≤ 1 is an algorithm parameter. This choice of index set estimate corresponds to selecting index points whose noisy separator output ℓ1-norm lies in some neighborhood of the maximum ℓ1-norm over all noisy separator outputs. Based on this index set estimate, we define the pmf estimate as

p̂_{I_W}(k) = 1/|Î_W| if k ∈ Î_W, and 0 otherwise.   (65)

Note that this choice corresponds to a uniform density over the selected index set. We could instead select the probability p̂_{I_W}(k) in proportion to the corresponding ‖z(k)‖_1 value. Although more accurate estimates of p̂_{I_W}(k) are potentially more desirable, the estimate in (65) yields satisfactory results, as illustrated by the numerical experiments in Section VI.
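Equations (64)-(65) amount to a soft peak-picking step. A minimal sketch (our own naming), operating on a matrix Z of separator outputs with one sample per column:

```python
import numpy as np

def peak_index_pmf(Z, beta=0.9):
    """Uniform pmf over the estimated peak set of Eqs. (64)-(65): indices
    whose output l1-norm is within a factor beta of the maximum."""
    l1 = np.abs(Z).sum(axis=0)          # l1-norm of each output sample
    members = l1 >= beta * l1.max()     # estimated index set I_W
    return members / members.sum()      # uniform weights over the set
```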

C. Algorithm Extension with Sparsifying Transformations

In various applications, the signals of interest are not "naturally" sparse, but they may be converted to a sparse form through an appropriate linear transformation, i.e., a basis change. Let the data snapshot matrices for the sources and mixtures be defined as

S = [ s(1) s(2) . . . s(L) ],   (66)
Y = [ y(1) y(2) . . . y(L) ],   (67)

respectively. Clearly, we have Y = HS.

Given a sparsifying transformation matrix Φ ∈ R^{L×L}, we can write the sources in the transform domain as

S_T = SΦ.   (68)

It is desired that the applied transformation results in an S_T with sparse entries. Of course, in practice, the transformation can only be applied to the mixtures, in the form

Y_T = YΦ.   (69)

Substituting Y = HS,

Y_T = HSΦ   (70)
    = HS_T.   (71)

Therefore, the transformed mixtures are equivalent to the

mixtures of transformed sources with the same mixing matrix.

As a result, the proposed SBCA approach can be applied to

transformed mixtures YTinstead of the original mixtures. In

Section VI-B, we provide an example of this procedure, where

Morlet transformation is applied to image patches to convert

them into a sparse form.
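The commutation in (69)-(71), namely that transforming the mixtures equals mixing the transformed sources, can be verified directly. Here random matrices stand in for S, H and Φ; in practice Φ would be a sparsifying transform such as a wavelet synthesis matrix:

```python
import numpy as np

# Random stand-ins for the source snapshot matrix S, the mixing matrix H,
# and the transformation Phi (illustrative only).
rng = np.random.default_rng(3)
p, q, L = 2, 3, 8
S = rng.standard_normal((p, L))
H = rng.standard_normal((q, p))
Phi = rng.standard_normal((L, L))

Y_T = (H @ S) @ Phi   # transform the observable mixtures, Eq. (69)
S_T = S @ Phi         # sources in the transform domain, Eq. (68)
```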

V. ALGORITHM ACCELERATION FOR SBCA

In this section, some recipes for accelerating the SBCA algorithms are proposed. For this purpose, we offer two major improvements: the first is a natural-gradient-based weighted update, which in particular simplifies the update expression; the second is the use of accelerated gradient methods to improve the convergence speed.

A. Weighted Update

In BSS literature, natural gradient based learning rules

attracted particular attention due to its improved convergence

behavior [34]. Following this approach, the update components

in SBCA algorithm could be weighted by the positive matrix

WTW, which would yield

WlogdetWTW=Wˆ

R(Y)WT−1

Wˆ

R(Y)WTW

=W(72)

8

and

W_subg W^T W = ( p / max_{n∈{1,...,L}} ‖z(n)‖_1 ) sign{z(l)} y(l)^T W^T W
             = ( p / max_{n∈{1,...,L}} ‖z(n)‖_1 ) sign{z(l)} z(l)^T W,   (73)

since y(l)^T W^T = z(l)^T.

As a result, the SBCA update rule in (48) can be replaced with

W(t+1) = W(t) + μ(t) ( I − ( p / max_{n∈{1,...,L}} ‖z(t)(n)‖_1 ) sign{z(t)(l(t))} z(t)(l(t))^T ) W(t).   (74)

We note that this form interestingly resembles the update rule of the Infomax (ICA) algorithm for supergaussian sources:

W(t+1) = W(t) + μ(t) ( I − 2 tanh{z(t)(l(t))} z(t)(l(t))^T ) W(t),   (75)

where the main difference is that the update rule in (74) is applied only at indices corresponding to the peak (or, in the noisy case, near-peak) ℓ1-norm separator outputs for that iteration, whereas the update rule for Extended Infomax is applied sequentially to all sample points, i.e., l(t+1) = l(t) + 1.
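One step of the weighted rule (74) can be sketched as follows (illustrative; l(t) is taken as the maximising index). Note that, unlike (48), no covariance inverse is required, which is the computational simplification:

```python
import numpy as np

def weighted_update(W, Y, step=0.05):
    """One natural-gradient style step of Eq. (74) on separator W."""
    p = W.shape[0]
    Z = W @ Y
    l1 = np.abs(Z).sum(axis=0)
    l = int(np.argmax(l1))              # peak l1-norm sample index l(t)
    U = np.eye(p) - (p / l1[l]) * np.outer(np.sign(Z[:, l]), Z[:, l])
    return W + step * U @ W
```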

B. Nesterov Update Rule

In optimization, Nesterov's acceleration method is a common approach for improving the speed of convergence [35]. The approach was originally proposed for smooth and strongly convex functions; in our work, however, we apply it to a non-smooth and non-convex function. While the underlying theory justifying the original acceleration approach is still an area of active research [36], our motivation for applying this rule is the empirical observation that it significantly improves the speed of convergence when applied to the SBCA updates.

If we represent the original SBCA algorithm in the form
$$W^{(t+1)} = W^{(t)} + \mu^{(t)} U^{(t)}, \tag{76}$$
where the update term $U^{(t)}$ can be replaced with any of the update terms proposed in the previous sections, the Nesterov-acceleration-based update rule can be written as
$$X^{(t+1)} = W^{(t)} + \upsilon^{(t)} U^{(t)}, \tag{77}$$
$$W^{(t+1)} = X^{(t+1)} + \kappa^{(t)}\left(X^{(t+1)} - X^{(t)}\right), \tag{78}$$
where $X^{(t)}$ is an intermediate algorithm variable, $\kappa^{(t)} = \frac{t-1}{t+2}$, and $\upsilon^{(t)}$ is an algorithm parameter to be selected. In this formulation, the update rule retains a memory of previous updates through the momentum term in (78).
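The accelerated iteration (76)-(78) can be sketched as follows. The constant `upsilon` and the `update_fn` callback interface are assumptions made for illustration; the original algorithm allows $\upsilon^{(t)}$ to vary per iteration.

```python
import numpy as np

def nesterov_iterate(W0, update_fn, upsilon, num_iters):
    """Nesterov-accelerated iteration of (77)-(78).

    update_fn returns the update term U(t) for the current W; upsilon is
    the step parameter v(t), taken constant here (an assumption)."""
    W = W0.copy()
    X_prev = W0.copy()
    for t in range(num_iters):
        X = W + upsilon * update_fn(W)      # intermediate step (77)
        kappa = (t - 1) / (t + 2)           # momentum weight kappa(t)
        W = X + kappa * (X - X_prev)        # momentum step (78)
        X_prev = X
    return W
```

As a sanity check, plugging in a simple contraction such as `update_fn = lambda W: -W` drives the iterate toward zero, confirming that the momentum recursion is stable.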

Fig. 2: Illustration of the synthetically generated random sparse

sequences of Example 1.

Fig. 3: Scatter diagram for the random sparse sources of

Example 1.

VI. NUMERICAL EXPERIMENTS

In the experiments, we measure the Signal to (residual) Interference and Noise power Ratio (SINR) at the separator outputs after the convergence of the algorithm. The SINR is mathematically defined as
$$\mathrm{SINR} = \frac{P}{I + N}, \tag{79}$$

where $P = \left\|\left[G(m_1,1), \cdots, G(m_p,p)\right]\right\|_2^2$ represents the signal power, $I = \left\|\left[G(m_1,:)^T, \cdots, G(m_p,:)^T\right]^T\right\|_F^2 - \left\|\left[G(m_1,1), \cdots, G(m_p,p)\right]\right\|_2^2$ represents the interference power, $N = \sigma_{\mathrm{noise}}\|W\|_F^2$ represents the noise power, and the indices $\{m_1, \cdots, m_p\}$ represent the locations of the maxima in the columns of the overall system response matrix $G$.
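The SINR computation in (79) can be sketched as follows, assuming a square overall response $G = WH$ and locating the indices $m_k$ via the magnitude maximum of each column (an interpretation of "maximum locations" in the text):

```python
import numpy as np

def output_sinr(G, W, sigma_noise):
    """SINR of (79) for an overall system response G = W H (p x p)."""
    p = G.shape[1]
    m = np.argmax(np.abs(G), axis=0)        # max location in each column of G
    diag = G[m, np.arange(p)]               # the G(m_k, k) entries
    P = np.sum(diag ** 2)                   # signal power
    I = np.sum(G[m, :] ** 2) - P            # interference power
    N = sigma_noise * np.linalg.norm(W, 'fro') ** 2  # noise power
    return P / (I + N)
```

For a perfect separator ($G = I$) the interference term vanishes and the SINR is limited only by the amplified noise floor.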


We compare the proposed algorithm's SINR versus sample size (L) performance with those of Infomax [21], Relative Newton [28], Generalized Morphological Component Analysis (GMCA) [37], the Bounded Component Analysis (BCA) methods in [1] and [2], and Nonnegative Least-Correlated Component Analysis (nLCA) [38] (using the publicly available codes for these algorithms, as noted in the corresponding references).

A. Synthetic Sparse Sources

In the first numerical example, we synthetically generate sparse sources by transforming an i.i.d. uniform random vector $u \in [-1,1]^p$ through the mapping
$$s = \begin{cases} u, & u \in \mathcal{B}_r \\ 0, & \text{otherwise,} \end{cases} \tag{80}$$
where $\mathcal{B}_r = \{x : \|x\|_r \le 1\}$ with $0 < r \le 1$. Fig. 2 illustrates the discrete-time sequences for $p = 3$ sources generated through this mapping with $r = 0.8$, and Fig. 3 shows their corresponding scatter diagram. These sources are mixed through a random mixing matrix $H$ with i.i.d. Gaussian entries (with zero mean and unit variance). The mixtures are also perturbed by i.i.d. Gaussian noise. As a special case, for

$p = 4$ sources, $q = 8$ mixtures, and $r = 0.8$, the corresponding SINR vs. data length curves are shown in Fig. 4. Based on this figure, the Sparse BCA algorithm provides a performance improvement in the range of 3-5 dB.

Fig. 4: Output SINR vs. Data Length for Example 1, when the random variables used in the generation process are independent.
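The source model in (80) can be sketched as follows; the mapping is applied columnwise so that samples falling outside $\mathcal{B}_r$ are set to zero, and the function and argument names are illustrative:

```python
import numpy as np

def generate_sparse_sources(p, L, r, rng):
    """Draw u ~ Uniform[-1,1]^p and keep a sample only when ||u||_r <= 1,
    following the mapping in (80); samples outside B_r become zero."""
    U = rng.uniform(-1.0, 1.0, size=(p, L))
    inside = (np.abs(U) ** r).sum(axis=0) ** (1.0 / r) <= 1.0
    return U * inside                       # zero out columns outside B_r
```

Since the $\ell_r$-ball with $r < 1$ occupies a small fraction of the cube $[-1,1]^p$, most generated samples are zero, which produces the sparsity visible in Figs. 2 and 3.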

There are no performance results for nLCA-IVM, since nLCA-IVM assumes that the number of mixtures equals the number of sources and that the sources are non-negative, whereas we use 8 mixtures for 4 sources that can also take negative values. We note that the BCA approaches in [1] and [2] perform worse, as they assume the sources to be contained in an $\ell_\infty$-norm ball.

As a modiﬁed experiment, if we use dependent uniform

variables in the source generation process of (80) based on

a Copula distribution with four degrees of freedom and a

correlation constant ρ= 0.7, we obtain the curves in Fig. 5.

The dependency introduced in the source generation process

appears to cause leveling in the performances of Extended

Infomax and Relative Newton algorithms, which emphasizes

the performance gain of the Sparse BCA algorithm, especially for larger data sizes.

Fig. 5: Output SINR vs. Data Length for Example 1, when the random variables used in the generation process are correlated.

Fig. 6: Performance of complex SBCA, when the random variables used in the generation process are correlated.

In order to evaluate the performance of Sparse BCA's complex extension, we again use the Copula-distributed synthetic sources. We generate complex sources for complex Sparse BCA by multiplying the Copula-distributed signals with independent, unit-magnitude complex signals whose phase terms are uniformly distributed in $[-\pi, \pi]$. As seen in Fig. 6, the proposed SBCA framework successfully works for complex sparse sources as well.
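The complex source construction described above can be sketched as follows (function name is illustrative):

```python
import numpy as np

def make_complex_sources(S, rng):
    """Multiply real sparse sources by unit-magnitude complex signals with
    phases uniform in [-pi, pi], preserving each sample's magnitude."""
    phases = rng.uniform(-np.pi, np.pi, size=S.shape)
    return S * np.exp(1j * phases)
```

Because only the phase is randomized, the sparsity pattern of the real sources carries over unchanged to the complex sources.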

B. Natural Images

In the second example, three 204x204 pixel images are

mixed by a random Gaussian 3×3matrix and corrupted by

Gaussian noise. In Fig. 7, the input and mixed images for

the proposed algorithm are illustrated for a random mixing

matrix. As the ﬁrst stage of separation, we apply Morlet

wavelet transform [39] to the mixtures for the sparsiﬁcation,

as proposed in Section IV-C. On the other hand, we don’t

apply Morlet wavelet transform to the inputs of non-sparse

BCA methods in [1], [2], and nLCA-IVM method since they


Fig. 7: Image Separation Example - Original, noisy mixed and

output images for SBCA (SNR=20 dB).

were designed for non-sparse sources. In order to obtain sparse source images, we divide the mixtures into 8×8 patches and apply the transform to these patches. The scatter plot of the Morlet transform coefficients for the three images can be seen in Fig. 8, which appears to conform to the assumptions made in the SBCA setup about the source samples. After

this transformation, we apply the same algorithms as in the first example to the transformed mixtures. Fig. 9 shows the mean output SINR vs. input SNR performance for each algorithm (averaged over random Gaussian 3×3 mixture matrices), which illustrates that SBCA outperforms the other sparse methods at all SNRs. SBCA does not exceed the performance of the non-sparse BCA methods, because they use the natural images directly, while SBCA requires the images to be sparsified, which may cause information loss. We also provide a comparison of the maximum output SINRs, since Cruces' BCA method extracts only one source as its output while the other methods extract all sources, so that the comparison in Fig. 9 is not fair to it. As can be seen in Fig. 10, the maximum output SINRs of all methods except nLCA-IVM exceed that of Cruces' BCA.
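The patch-based preprocessing described above (before applying the sparsifying transform) can be sketched as follows; non-overlapping patches and column-major (vec-style) vectorization are assumptions of this sketch:

```python
import numpy as np

def extract_patches(image, k=8):
    """Split a 2-D image into non-overlapping k x k patches and return
    one vectorized patch per column (column-major, i.e. vec ordering)."""
    M, N = image.shape
    cols = [image[i:i + k, j:j + k].reshape(-1, order='F')
            for i in range(0, M - k + 1, k)
            for j in range(0, N - k + 1, k)]
    return np.stack(cols, axis=1)           # shape (k*k, num_patches)
```

Each column of the result is then transformed independently, and the transform coefficients serve as the sparse observations fed to the separation algorithms.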

To evaluate the acceleration performance of the Nesterov update method explained in Section V, we compare the SINR change

Fig. 8: Image Separation Example - Scatter plot (with the bounding $\ell_1$-norm ball) of the transform coefficients corresponding to the 3 input images.

Fig. 9: Image Separation Example - Output SINR vs. Input SNR.

rates of the update methods with respect to the iteration number. In this analysis, the images are mixed by a random Gaussian 6×3 matrix, i.e., there are 6 mixtures. The mixtures are corrupted by Gaussian noise corresponding to an input SNR of 30 dB. As can be seen in Fig. 11, the Nesterov update method makes Sparse BCA converge to the maximum output SINR much faster than the ordinary update method does.

C. Neuroimaging

As the last example, we consider the problem of iden-

tifying neurons and their activities from a calcium (Ca2+)

imaging based video recording. In calcium imaging, temporal

activations of neurons are reﬂected as ﬂuorescent emissions

and recorded via a digital camera. Since neurons can be

active simultaneously, their activity signals are mixed, which yields a typical mixing problem. Neural activities denote the time courses of the neurons' relative strengths, i.e., temporal weights. Multiplying the calcium imaging data by the estimated separator matrix gives a data matrix whose rows represent the neural activities. The columns of the inverse

Fig. 10: Image Separation Example - Maximum Output SINR vs. Input SNR.

Fig. 11: Image Separation Example - SBCA with Nesterov Update vs. SBCA with Ordinary Update.

separator matrix give the corresponding neurons generating these activities. Since only a small group of neurons is active at any one time, the problem conforms to the sparsity assumption.

In this example, we used the video recording provided as supplementary material for the article [40], where the CA1 hippocampal place cell activity of a freely moving mouse is studied (the movie file is available at https://www.nature.com/neuro/journal/v16/n3/full/nn.3329.html).

Fig. 12: A frame of the Ca2+ imaging movie of [40] (used by the permission of Nature Publishing Group). In the same figure, a rectangular subwindow is also marked.

Fig. 13: A component map obtained by applying the SBCA algorithm to the subwindow marked in Fig. 12, and the wavelet-denoised version of the component map.

As the main processing step, we divide each frame into 32×32 subwindows.

Fig. 12 illustrates a frame from the movie and a sample subwindow. Let us assume that the corresponding movie can be represented by a three-dimensional array $R[m, n, f]$, where $(m, n, f) \in \{1, \ldots, M\} \times \{1, \ldots, N\} \times \{1, \ldots, F\}$, $m, n$ are the pixel location indices (row and column, respectively), $f$ is the frame index, $M$ ($N$) is the number of rows (columns) of each frame, and $F$ is the number of frames. Then each $K \times K$ sub-movie can be represented as
$$R^{(i,j)}[m, n, f] = R[(i-1)S + m,\, (j-1)S + n,\, f],$$
where $(i, j) \in \{1, \ldots, \frac{M-K}{S} + 1\} \times \{1, \ldots, \frac{N-K}{S} + 1\}$ and $m, n \in \{1, \ldots, K\}$.

The vectorized version of each subwindow of a frame constitutes an observation vector for the SBCA algorithm (the vectorization is performed through the vec operator, which stacks the columns of the matrix corresponding to the subwindow image):
$$Y^{(i,j)} = \left[\mathrm{vec}(R^{(i,j)}[:,:,1]), \ldots, \mathrm{vec}(R^{(i,j)}[:,:,F])\right].$$
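The subwindow extraction and the construction of $Y^{(i,j)}$ above can be sketched as follows (the movie is taken as a NumPy array with the same index ordering as $R[m,n,f]$; function and argument names are illustrative):

```python
import numpy as np

def subwindow_observations(R, i, j, K, S):
    """Build Y^(i,j): cut the K x K subwindow at grid position (i, j)
    (1-indexed, stride S) out of the movie array R[m, n, f] and stack
    the vectorized frames as columns."""
    r0, c0 = (i - 1) * S, (j - 1) * S
    sub = R[r0:r0 + K, c0:c0 + K, :]            # K x K x F sub-movie
    F = sub.shape[2]
    # vec() stacks columns, hence column-major (Fortran) order
    return np.stack([sub[:, :, f].reshape(-1, order='F')
                     for f in range(F)], axis=1)  # shape (K*K, F)
```

Each column of the returned matrix corresponds to one frame, so the separator operates on $K^2$-dimensional observations with $F$ samples per subwindow.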

In this recording, the number of observations (frames) is equal to F = 1339. The SBCA algorithm is applied to each subwindow. Fig. 13 shows the component map obtained by converting a column of the pseudoinverse of W into a 32×32 image. We apply wavelet denoising based on the Symlet wavelet ('sym4') using the wdencmp function of MATLAB. The resulting denoised component map is also shown in Fig. 13.

In Fig. 14, sample denoised component maps and their

activations are shown. The activation structures indeed conﬁrm

the sparsity property exploited by the SBCA algorithm. The

presented component analysis approach can be further devel-

oped into a full neuron-sorting algorithm registering neuron

locations and their activations for the full movie [41].

Fig. 14: Some SBCA (denoised) component maps correspond-

ing to the subwindow in Fig. 12 (in the ﬁrst row (a)-(e)), and

their activations (in the second row (f)-(i))

VII. CONCLUSION

In this article, we proposed a deterministic Bounded Com-

ponent Analysis approach for sparse-bounded sources. The

resulting framework is an extension of the algorithmic BCA framework in [2] to the sparse case. As in [2], no

assumption is made about the independence of sources or sam-

ples. All the global maxima of the corresponding geometric

optimization setting are proven to be perfect separators, and

the update rule derived from this setting is shown to resemble

the form of the Infomax algorithm for supergaussian sources.

The numerical examples demonstrate the potential practical

merit of the proposed framework.


REFERENCES

[1] S. Cruces, “Bounded component analysis of linear mixtures: A criterion

for minimum convex perimeter,” IEEE Trans. on Signal Process., vol.

58, no. 4, pp. 2141–2154, April 2010.

[2] A.T. Erdogan, “A class of bounded component analysis algorithms

for the separation of both independent and dependent sources,” Signal

Processing, IEEE Transactions on, vol. 61, no. 22, pp. 5730–5743, 2013.

[3] P. Aguilera, S. Cruces, I. Duran, A. Sarmiento, and D.P. Mandic, “Blind

separation of dependent sources with a bounded component analysis

deﬂationary algorithm,” IEEE Signal Processing Letters, vol. 20, no. 7,

pp. 709–712, 2013.

[4] H.A. Inan and A.T. Erdogan, “Convolutive bounded component analysis

algorithms for independent and dependent source separation,” Neural

Networks and Learning Systems, IEEE Transactions on, vol. 26, no. 4,

pp. 697–708, April 2015.

[5] H.A. Inan and A.T. Erdogan, “A convolutive bounded component

analysis framework for potentially nonstationary independent and/or

dependent sources,” Signal Processing, IEEE Transactions on, vol. 63,

no. 1, pp. 18–30, Jan 2015.

[6] Christian Jutten and Jeanny Herault, “Blind separation of sources, part

i: An adaptive algorithm based on neuromimetic architecture,” Signal

Processing, vol. 24, no. 1, pp. 1 – 10, 1991.

[7] Pierre Comon and Christian Jutten, Handbook of Blind Source Sepa-

ration: Independent Component Analysis and Applications, Academic

Press, 2010.

[8] P. Comon, “Independent component analysis, A new concept?,” Signal

Processing, vol. 36, pp. 287–314, April 1994.

[9] Anthony J Bell and Terrence J Sejnowski, “An information-

maximization approach to blind separation and blind deconvolution,”

Neural computation, vol. 7, no. 6, pp. 1129–1159, 1995.

[10] Tzyy-Ping Jung, Scott Makeig, Colin Humphries, Te-Won Lee, Martin J

Mckeown, Vicente Iragui, and Terrence J Sejnowski, “Removing

electroencephalographic artifacts by blind source separation,” Psy-

chophysiology, vol. 37, no. 2, pp. 163–178, 2000.

[11] Aapo Hyvärinen, Juha Karhunen, and Erkki Oja, Independent Component Analysis, John Wiley and Sons Inc., 2001.

[12] Stephane Mallat, A wavelet tour of signal processing: the sparse way,

Academic press, 2008.

[13] David L Donoho and Iain M Johnstone, “Ideal spatial adaptation by wavelet shrinkage,” Biometrika, vol. 81, no. 3, pp. 425–455, 1994.

[14] Bruno A Olshausen and David J Field, “Emergence of simple-cell

receptive ﬁeld properties by learning a sparse code for natural images,”

Nature, vol. 381, no. 6583, pp. 607, 1996.

[15] Bruno A Olshausen and David J Field, “Sparse coding of sensory

inputs,” Current opinion in neurobiology, vol. 14, no. 4, pp. 481–487,

2004.

[16] Honglak Lee, Alexis Battle, Rajat Raina, and Andrew Y Ng, “Efﬁcient

sparse coding algorithms,” in Advances in neural information processing

systems, 2007, pp. 801–808.

[17] Quoc V Le, “Building high-level features using large scale unsupervised

learning,” in Acoustics, Speech and Signal Processing (ICASSP), 2013

IEEE International Conference on. IEEE, 2013, pp. 8595–8598.

[18] David L Donoho, “Compressed sensing,” IEEE Transactions on

information theory, vol. 52, no. 4, pp. 1289–1306, 2006.

[19] Emmanuel J Candes and Terence Tao, “Near-optimal signal recovery

from random projections: Universal encoding strategies?,” IEEE trans-

actions on information theory, vol. 52, no. 12, pp. 5406–5425, 2006.

[20] Richard Baraniuk, Emmanuel Candès, Robert Nowak, and Martin Vetterli, “Compressive sampling [from the guest editors],” IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 12–13, 2008.

[21] Te-Won Lee, Mark Girolami, and Terrence J Sejnowski, “Independent

component analysis using an extended infomax algorithm for mixed

subgaussian and supergaussian sources,” Neural computation, vol. 11,

no. 2, pp. 417–441, 1999.

[22] Frédéric Abrard and Yannick Deville, “A time–frequency blind signal separation method applicable to underdetermined mixtures of dependent sources,” Signal Processing, vol. 85, no. 7, pp. 1389–1403, 2005.

[23] Rémi Gribonval and Sylvain Lesage, “A survey of sparse component analysis for blind source separation: principles, perspectives, and new challenges,” in ESANN’06 Proceedings - 14th European Symposium on Artificial Neural Networks. d-side publi., 2006, pp. 323–330.

[24] P. Georgiev, F. Theis, and A. Cichocki, “Sparse component analysis

and blind source separation of underdetermined mixtures,” IEEE

Transactions on Neural Networks, vol. 16, no. 4, pp. 992–996, July

2005.

[25] Yuanqing Li, Shun-Ichi Amari, Andrzej Cichocki, Daniel WC Ho, and

Shengli Xie, “Underdetermined blind source separation based on sparse

representation,” IEEE Transactions on signal processing, vol. 54, no. 2,

pp. 423–437, 2006.

[26] Bruno A Olshausen and David J Field, “Sparse coding with an

overcomplete basis set: A strategy employed by v1?,” Vision research,

vol. 37, no. 23, pp. 3311–3325, 1997.

[27] Sanjeev Arora, Rong Ge, Tengyu Ma, and Ankur Moitra, “Simple,

efﬁcient, and neural algorithms for sparse coding,” in Conference on

Learning Theory, 2015, pp. 113–149.

[28] Michael Zibulevsky and Barak A Pearlmutter, “Blind source separation

by sparse decomposition in a signal dictionary,” Neural computation,

vol. 13, no. 4, pp. 863–882, 2001.

[29] Dinh-Tuan Pham and P. Garat, “Blind separation of mixture of

independent sources through a quasi-maximum likelihood approach,”

IEEE Trans. on Signal Processing, vol. 45, pp. 1712 – 1725, July 1997.

[30] Eren Babatas and Alper T Erdogan, “Sparse bounded component

analysis,” in Machine Learning for Signal Processing (MLSP), 2016

IEEE 26th International Workshop on. IEEE, 2016, pp. 1–6.

[31] Lloyd N Trefethen and David Bau III, Numerical linear algebra, vol. 50,

Siam, 1997.

[32] Adil Bagirov, Napsu Karmitsa, and Marko M Mäkelä, Introduction to Nonsmooth Optimization: Theory, Practice and Software, Springer, 2014.

[33] Stephen Boyd and Lieven Vandenberghe, Convex Optimization, Cambridge University Press, 2004.

[34] Shun-Ichi Amari, “Natural gradient works efﬁciently in learning,”

Neural computation, vol. 10, no. 2, pp. 251–276, 1998.

[35] Yurii Nesterov, Introductory lectures on convex optimization: A basic

course, vol. 87, Springer Science & Business Media, 2013.

[36] Weijie Su, Stephen Boyd, and Emmanuel J Candes, “A differential

equation for modeling Nesterov’s accelerated gradient method: theory

and insights,” Journal of Machine Learning Research, vol. 17, no. 153,

pp. 1–43, 2016.

[37] Jérôme Bobin, Jean-Luc Starck, Jalal M Fadili, and Yassir Moudden, “Sparsity and morphological diversity in blind source separation,” IEEE Transactions on Image Processing, vol. 16, no. 11, pp. 2662–2674, 2007.

[38] F.-Y. Wang, Chong-Yung Chi, T.-H. Chan, and Y. Wang, “Nonnegative

least-correlated component analysis for separation of dependent sources

by volume maximization,” IEEE Trans. Pattern Analysis and Machine

Intelligence, vol. 32, no. 5, pp. 875–888, May 2010.

[39] Alexander Grossmann and Jean Morlet, “Decomposition of hardy

functions into square integrable wavelets of constant shape,” SIAM

journal on mathematical analysis, vol. 15, no. 4, pp. 723–736, 1984.

[40] Yaniv Ziv, Laurie D Burns, Eric D Cocker, Elizabeth O Hamel, Kunal K Ghosh, Lacey J Kitch, Abbas El Gamal, and Mark J Schnitzer, “Long-term dynamics of CA1 hippocampal place codes,” Nature Neuroscience, vol. 16, no. 3, pp. 264–266, 2013.

[41] Eren Babatas and Alper T Erdogan, “Sparse bounded component analysis based neuron sorting for Ca2+ imaging,” in preparation.