LETTER Communicated by Stefan Haufe
Discriminative Learning of Propagation and Spatial Pattern
for Motor Imagery EEG Analysis
Xinyang Li
a0068297@nus.edu.sg
NUS Graduate School for Integrative Sciences and Engineering,
National University of Singapore 119613
Haihong Zhang
hhzhang@i2r.a-star.edu.sg
Cuntai Guan
ctguan@i2r.a-star.edu.sg
Institute for Infocomm Research, A*STAR, Singapore 138632
Sim Heng Ong
eleongsh@nus.edu.sg
Department of Electrical and Computer Engineering and Department
of Bioengineering, National University of Singapore 119613
Kai Keng Ang
kkang@i2r.a-star.edu.sg
Yaozhang Pan
yzpan@i2r.a-star.edu.sg
Institute for Infocomm Research, A*STAR, Singapore 138632
Effective learning and recovery of relevant source brain activity patterns
is a major challenge to brain-computer interface using scalp EEG. Various
spatial filtering solutions have been developed. Most current methods
estimate an instantaneous demixing with the assumption of uncorrelat-
edness of the source signals. However, recent evidence in neuroscience
suggests that multiple brain regions cooperate, especially during motor
imagery, a major modality of brain activity for brain-computer interface.
In this sense, methods that assume uncorrelatedness of the sources be-
come inaccurate. Therefore, we are promoting a new methodology that
considers both volume conduction effect and signal propagation between
multiple brain regions. Specifically, we propose a novel discriminative
algorithm for joint learning of propagation and spatial pattern with an
iterative optimization solution. To validate the new methodology, we
conduct experiments involving 16 healthy subjects and perform numer-
ical analysis of the proposed algorithm for EEG classification in mo-
tor imagery brain-computer interface. Results from extensive analysis
validate the effectiveness of the new methodology with high statistical
significance.
1 Introduction
Scalp EEG signals are stochastic, nonlinear, and nonstationary (Guler,
Kiymik, Akin, & Alkan, 2001) and have relatively low spatial resolution.
Therefore, it has been a considerable challenge to compute discriminative
and robust features for detecting the brain activity of interest, especially in
single-trial brain-computer interface (BCI) studies (Li & Guan, 2006; Llera,
Gomez, & Kappen, 2012). In this letter, we consider BCI using motor im-
agery, although the general methodology can be applied to other brain sig-
nals. Motor imagery is a dynamic brain state that can induce the same motor
representation internally as motor execution (Jeannerod, 1995). In particu-
lar, distinctive brain signals of event-related desynchronization (ERD) and
event-related synchronization (ERS) are detectable from EEG during mo-
tor imagery (Stavrinou, Moraru, Cimponeriu, Stefania, & Bezerianos, 2007;
Pfurtscheller, Brunner, Schlogl, & da Silva, 2006). Therefore, motor imagery
becomes an important modality in developing BCI systems (Lo et al., 2010;
Ang, Chin, Zhang, & Guan, 2008; Vidaurre, Sannelli, Muller, & Blankertz,
2011).
To improve the signal-to-noise ratio, spatial filtering has been widely used to counter volume conduction effects (Blankertz, Tomioka, Lemm, Kawanabe, & Muller, 2008). In motor imagery EEG classification, probably the most widely recognized technique is the common spatial pattern (CSP) method (Ramoser, Muller-Gerking, & Pfurtscheller, 2000). In CSP, the desired spatial filters are
designed to extract prominent ERD/ERS by maximizing the variance of the
projected signal under one condition while minimizing it under the other
(Koles, 1991; Muller-Gerking, Pfurtscheller, & Flyvbjerg, 1999). Various methods
have been proposed to improve the performance of CSP by addressing the
problem of selecting proper time segments or frequency bands of EEG. In
Lemm, Blankertz, Curio, and Muller (2005), common spatiospectral pattern
(CSSP) optimizes a simple filter by adding a one-time-delayed sample to
have more channels. In Dornhege et al. (2006), common sparse spectral
spatial pattern (CSSSP) extends CSSP by adding the optimization of a com-
plete global spatial-temporal filter into CSP. In Ang, Chin, Wang, Guan, and
Zhang (2012), Ang, Chin, Zhang, and Guan (2012), and Thomas, Guan, Lau,
Vinod, and Ang (2009), EEG signals are decomposed into several frequency
bands, CSP is applied to different bands independently, and feature fusion
or classifier fusion is introduced to produce final classification results. These
methods either implicitly or explicitly assume that raw scalp EEG wave-
forms are generated by uncorrelated source signals, and subsequently, they
may not account for more complicated brain signal dynamics such as causal
propagation between different brain regions.
Recently brain activities during motor imagery other than ERD/ERS
have been observed in multifunctional areas using functional magnetic
resonance imaging (fMRI) or EEG (Formaggio, Storti, Cerini, Fiaschi, &
Manganotti, 2010; Chen, Yang, Liao, Gong, & Shen, 2009). In particular, the
analysis of neural connectivity is gaining more attention in neuroscience
because it describes the general functioning of the brain and communi-
cation among its different regions (Astolfi et al., 2006; Ewald, Marzetti,
Zappasodi, Meinecke, & Nolte, 2012). For example, causal connectivity is
found in motor-related core regions such as the primary motor cortex (M1)
and supplementary motor area (SMA) during motor imagery (Chen et al.,
2009). The causal flow or time-lagged correlation is beyond volume con-
duction and is caused by possible neuronal propagation (Gomez-Herrero,
Atienza, Egiazarian, & Cantero, 2008). To investigate such propagation ef-
fects, directed transfer function (DTF) has been used to evaluate causal flow
between any given pair of channels in a multichannel EEG in the frequency domain; DTF was introduced in Baccala and Sameshima (2001), Kaminski and Blinowska (1991), and Kaminski, Ding, Truccolo, and Bressler (2001).
This estimation of DTF is based on a multivariate autoregressive model
(MVAR), and, more importantly it has been applied to EEG data of vol-
untary finger movement and motor imagery for event-related causal flow
investigation (Ginter, Blinowska, Kaminski, & Durka, 2001; Schlogl & Supp,
2006). Kus, Kaminski, and Blinowska (2004) found that there is a rapid in-
crease of information outflow from electrodes Fc3 and C3 caused by ERS,
and propagation of β-synchronization from Fc3 and Fc1 to C3, C1, Cz, Cp3
and Cp1 exists, which gives evidence of communication among sensori-
motor areas. However, looking at only the time profiles of ERD/ERS, it is
difficult to determine the primary source of activity; hence, existing instan-
taneous demixing models are not capable of modeling signal propagation
among underlying ERD/ERS sources.
In the presence of neuronal propagation and causal relationship dur-
ing motor imagery, conventional spatial filter design methodology is not
sufficient to capture the underlying brain activities (Dyrholm, Makeig,
& Hansen, 2007; Bahramisharif, van Gerven, Schoffelen, Ghahramani, &
Heskes, 2012). We would like to note that although some of the connectivity measurements mentioned above have already been explored (Wei, Wang, Gao, & Gao, 2007; Gysels & Celka, 2007), only scalp-level connectivity and interchannel synchronization measurements are used directly as features, whereas volume conduction effects are not rigorously addressed. One consequence is that bandpower variations can be misinterpreted as changes in connectivity (Grosse-Wentrup, 2009).
Therefore, rather than ignoring the connectivity or propagation between
sources in spatial filter design or using scalp connectivity directly as fea-
tures, we would like to promote a computational model that can more
accurately describe the underlying processes by considering both neuronal
propagation and volume conduction effects.
In this work, we devise a novel discriminative learning model for motor
imagery EEG based on a multivariate convolutive process with an analysis
of the spurious effects in classifying ERD/ERS based on an instant linear
mixture model. The effectiveness of introducing a time-lagged demixing
matrix to produce time-decorrelated data is analyzed theoretically from the
perspective of background noise elimination. Furthermore, the demixing
matrices accounting for propagation and volume conduction are estimated
jointly and iteratively in the proposed unified model. From the experi-
mental study, we evaluate the efficiency of the new methodology in terms
of classification accuracy in the two-class motor imagery EEG classifica-
tion problem. We also analyze the effectiveness of the proposed method
for background noise elimination using the Kullback-Leibler divergence measure.
This letter is organized as follows. In section 2, we discuss limitations
of conventional spatial filter design and the necessity of considering the
causal propagation. Then we give the details of the proposed discriminative
learning of propagation and spatial pattern. In section 3, the validity of the
proposed method is verified by experimental studies on two-class motor
imagery classification. Our concluding remarks are in section 4.
2 Discriminative Learning of Propagation and Spatial Pattern
2.1 Data Model and Problem Formulation. Let X(t) be the time series of a multichannel EEG signal, with each component in X(t) representing a particular EEG channel measured at time t. Considering the complex temporal dynamics, especially the latent causal relations in X(t), we describe the observed data X(t) as an m-dimensional linear convolutive mixture process of order l (Dyrholm et al., 2007; Mørup, Madsen, & Hansen, 2009),

X(t) = \sum_{\tau=0}^{l} \Phi(\tau) S(t - \tau),    (2.1)

where S(t) is the source signal of interest, \Phi(\tau) is the projection matrix of order \tau, and l is the maximum time-lagged order. When l = 0, the observed data X(t) is an instant mixing process. For simplicity of description, the additive EEG noise can be described by a component in S(t). Conventionally, it is assumed in motor imagery EEG classification that X(t) is an instant linear mixture of source signals. This leads to an instant demixing solution to the estimation of S(t),

\hat{S}(t) = W X(t),    (2.2)

where W is the projection or demixing matrix containing m rows, and each row of W is effectively a spatial filter w.

Interestingly, we note that the estimate \hat{S}(t) given by equation 2.2 is also a mixture of the time-lagged components,

\hat{S}(t) = \sum_{\tau} \Phi_w(\tau) S(t - \tau),    (2.3)

where \Phi_w(\tau) = W \Phi(\tau) is a mixing matrix.

A perfect solution would be that \Phi_w(\tau) takes an identity matrix form for \tau = 0 and a zero matrix form for any \tau \neq 0. This is generally impossible except in the exceptional case that \Phi(\tau) = 0 for \tau \neq 0, or, in other words, when the convolutive mixture model in equation 2.1 reduces to an instant mixing model.
Remark 1. In discriminative analysis, the spatial filter W is designed to extract the most discriminative signal \hat{S}(t). However, due to the time-lagged relationships, discriminative signals are still mixed with nondiscriminative ones in \hat{S}(t). Therefore, it is necessary to take the causal flow into consideration, together with spatial filter design, in a unified model to obtain a better estimate of S(t); this is the motivation of this letter.
Solving the reconstruction problem of S(t) from equation 2.1 may lead to a solution in the form of an infinite impulse response (IIR) filter. As we will elaborate shortly, and also for practical use, we simplify the problem into a finite impulse response (FIR) filter given by

S(t) = W \left( X(t) - \sum_{\tau=1}^{p} A(\tau) X(t - \tau) \right),    (2.4)

where A(\tau) is the demixing matrix of order \tau that accounts for the time-lagged propagation effect.
Remark 2. The manipulation of simplifying the IIR form into the FIR
form is for the convenience of practical implementation. Practically, this
mixing effect can be accounted for by a finite number of orders, while the
rest can be ignored. Although not rigorously proven, the feasibility of this
simplification in the discriminative problem will be discussed and validated
by the experimental results in section 3.
For the convenience of presentation and analysis, we divide the reconstruction problem of S(t) into two parts. First, we define

\tilde{X}(t) = X(t) - \sum_{\tau=1}^{p} A(\tau) X(t - \tau),    (2.5)

where \tilde{X}(t) is the signal processed by a finite multivariate FIR filter of order p. We refer to it as the time-decorrelated data in the following discussion. The source signal can be recovered from the time-decorrelated data \tilde{X}(t) by

S(t) = W \tilde{X}(t).    (2.6)
It is interesting that reconstructing S(t) based on equations 2.5 and 2.6 resembles the classical causal connectivity estimation based on MVAR analysis (Dyrholm et al., 2007; Gomez-Herrero et al., 2008; Haufe, Tomioka, Nolte, Muller, & Kawanabe, 2010), where the process S(t) is usually defined as a temporally and spatially uncorrelated time sequence. Different from connectivity identification, the objective in this letter lies in discriminative learning. Therefore, rather than modeling the signals, the demixing matrices A(\tau) are used to construct the ERD/ERS sources from the measurements. Moreover, S(t) corresponds not to the innovation process but to the ERD/ERS sources, which we explain in detail in the appendix. The objective in estimating A(\tau) is the variance difference between the two classes rather than the independence of the sources, so that the discriminative power of S(t) is maintained. Based on the convolutive model, possible propagation effects can be addressed in the discriminative model. Details of the joint estimation of A(\tau) and W in equations 2.5 and 2.6 for the objective of classification are introduced in the following section.
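To make equations 2.5 and 2.6 concrete, the following sketch applies a set of demixing matrices A(τ) and a spatial projection W to a single trial. It is only a NumPy illustration under assumed array shapes (27 channels, 500 samples, order p = 2, and random matrices), not the authors' implementation.

```python
import numpy as np

def time_decorrelate(X, A_list):
    """X_tilde(t) = X(t) - sum_{tau=1}^{p} A(tau) X(t - tau), cf. equation 2.5.

    X      : (channels, samples) array for one trial.
    A_list : list of (channels, channels) matrices; A_list[tau - 1] corresponds to A(tau).
    """
    n_ch, n_samp = X.shape
    X_tilde = X.copy()
    for tau, A in enumerate(A_list, start=1):
        # subtract the contribution propagated from tau samples in the past
        X_tilde[:, tau:] -= A @ X[:, :n_samp - tau]
    return X_tilde

def recover_sources(X, A_list, W):
    """S(t) = W X_tilde(t), cf. equation 2.6."""
    return W @ time_decorrelate(X, A_list)

# toy usage with random data (hypothetical sizes)
rng = np.random.default_rng(0)
X = rng.standard_normal((27, 500))
A_list = [0.1 * rng.standard_normal((27, 27)) for _ in range(2)]   # p = 2
W = rng.standard_normal((4, 27))                                    # four spatial filters
S_hat = recover_sources(X, A_list, W)                               # (4, 500)
```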
2.2 Joint Estimation of Propagation and Spatial Pattern. We introduce
the principle of CSP in the design of joint estimation of propagation and
spatial pattern. As CSP can be viewed as a spatial transformation, the prin-
ciple lies in maximizing the power of the transformed signal for one class
while minimizing it for the other. The normalized sample covariance matrix R_i of trial i is obtained as

R_i = \frac{X_i X_i^T}{\mathrm{tr}(X_i X_i^T)},    (2.7)
where tr(·) is the trace of a matrix. In this letter, we consider only the binary
classification problem, and the two classes are indexed by c = {0, 1}. Let Q_c denote the set of trials that belong to class c, such that Q_0 \cap Q_1 = \emptyset. The average covariance matrix for each class is then calculated as

R^{(c)} = \frac{1}{|Q_c|} \sum_{i \in Q_c} R_i,    (2.8)

where |Q_c| denotes the total number of samples belonging to set Q_c. Suppose the signal power is to be maximized for class 0; the objective function in CSP is given by

\max_{w} \; w R^{(0)} w^T \quad \text{s.t.} \quad w (R^{(0)} + R^{(1)}) w^T = 1.    (2.9)
Note that the dependence of EEG signals (in equation 2.8 and onwards) on
time is implied unless otherwise stated. The idea of discriminating the
EEG signals of two different motor imagery classes in terms of power
(the variance of the projected signal) in equation 2.9 is directly related
to the nature of ERD/ERS. Therefore, we deal with the estimation of S(t) in
the proposed model by adopting variance differentiation as the objective.
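For reference, the CSP step of equations 2.7 to 2.9 can be solved as a generalized eigenvalue problem on the class-average covariance matrices. The sketch below is a minimal NumPy/SciPy illustration with hypothetical trial arrays stored as (trials, channels, samples), not the authors' code.

```python
import numpy as np
from scipy.linalg import eigh

def normalized_cov(X):
    """Trace-normalized covariance of one trial (channels x samples), cf. equation 2.7."""
    C = X @ X.T
    return C / np.trace(C)

def class_mean_cov(trials):
    """Average covariance over the trials of one class, cf. equation 2.8."""
    return np.mean([normalized_cov(X) for X in trials], axis=0)

def csp_filters(trials_0, trials_1, r=2):
    """Spatial filters for the objective of equation 2.9, via a generalized eigenproblem."""
    R0, R1 = class_mean_cov(trials_0), class_mean_cov(trials_1)
    eigvals, eigvecs = eigh(R0, R0 + R1)       # w R0 w^T maximal subject to w (R0 + R1) w^T = 1
    order = np.argsort(eigvals)[::-1]
    W = eigvecs[:, order].T                    # rows are spatial filters w, sorted by eigenvalue
    return np.vstack([W[:r], W[-r:]])          # r largest and r smallest, as in standard CSP

# toy usage (hypothetical sizes: 20 trials per class, 27 channels, 500 samples)
rng = np.random.default_rng(1)
W = csp_filters(rng.standard_normal((20, 27, 500)),
                rng.standard_normal((20, 27, 500)), r=2)
```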
To embed the estimation of A(\tau) in equation 2.4 into the objective function, equation 2.9, we rewrite equation 2.5 to make the relationship between the raw EEG data X and the time-decorrelated data \tilde{X} more compact by defining

\hat{A}(\tau) = \begin{cases} I, & \tau = 0, \\ -A(\tau), & \tau > 0, \end{cases}    (2.10)

which we refer to as the time-lagged demixing matrix for simplicity. Therefore, \tilde{X}(t) in equation 2.5 becomes

\tilde{X}(t) = \sum_{\tau=0}^{p} \hat{A}(\tau) X(t - \tau).    (2.11)
Similarly, the covariance matrix of \tilde{X}(t) is

\tilde{R}_i = \frac{\tilde{X}_i \tilde{X}_i^T}{\mathrm{tr}(\tilde{X}_i \tilde{X}_i^T)},    (2.12)

and the average covariance based on \tilde{X}(t) for each class is

\tilde{R}^{(c)} = \frac{1}{|Q_c|} \sum_{i \in Q_c} \tilde{R}_i.    (2.13)
Replacing R^{(c)} in equation 2.9 with \tilde{R}^{(c)} and considering equations 2.11 and 2.12, the optimization problem becomes

\max_{w, \hat{A}(\tau)} \; w \left[ \sum_{\tau_1=0}^{p} \sum_{\tau_2=0}^{p} \hat{A}(\tau_1) R^{(0)}(\tau_1, \tau_2) \hat{A}(\tau_2)^T \right] w^T
\quad \text{s.t.} \quad w \left[ \sum_{\tau_1=0}^{p} \sum_{\tau_2=0}^{p} \hat{A}(\tau_1) \left( R^{(0)}(\tau_1, \tau_2) + R^{(1)}(\tau_1, \tau_2) \right) \hat{A}(\tau_2)^T \right] w^T = 1,    (2.14)

where R^{(c)}(\tau_1, \tau_2) = \frac{1}{|Q_c|} \sum_{i \in Q_c} X_i(t - \tau_1) (X_i(t - \tau_2))^T. In this way, the estimation of model 2.4 is achieved by solving the optimization problem in equation 2.14. Moreover, as shown in equation 2.14, only one \hat{A}(\tau), as a part of the feature extraction model, is obtained on the completion of the optimization, since the calculation is conducted with the covariance matrices R^{(c)}(\tau_1, \tau_2) averaged over all the trials. This is very different from the regression model in connectivity analysis, in which the estimated models are different for different trials.
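The quadratic forms in equation 2.14 are built from the lagged class-average covariances R^(c)(τ1, τ2). The sketch below shows one way to compute these terms and to evaluate such a quadratic form for a given w and Â(τ); it is a NumPy illustration with hypothetical data and array shapes, not the authors' optimization code.

```python
import numpy as np

def lagged_cov(trials, tau1, tau2):
    """Class-average lagged covariance R^(c)(tau1, tau2) = mean_i X_i(t - tau1) X_i(t - tau2)^T."""
    p = max(tau1, tau2)
    covs = []
    for X in trials:
        n = X.shape[1]
        X1 = X[:, p - tau1 : n - tau1]   # X(t - tau1) over the common valid range of t
        X2 = X[:, p - tau2 : n - tau2]   # X(t - tau2) over the same range
        covs.append(X1 @ X2.T)
    return np.mean(covs, axis=0)

def quadratic_form(w, A_hat, trials, p):
    """w [ sum_{tau1, tau2} A_hat(tau1) R^(c)(tau1, tau2) A_hat(tau2)^T ] w^T, cf. equation 2.14."""
    total = 0.0
    for t1 in range(p + 1):
        for t2 in range(p + 1):
            R = lagged_cov(trials, t1, t2)
            total += w @ A_hat[t1] @ R @ A_hat[t2].T @ w
    return total

# toy usage: A_hat[0] = I and A_hat[tau] = -A(tau) for tau > 0, cf. equation 2.10
rng = np.random.default_rng(2)
trials = rng.standard_normal((10, 27, 500))
p = 2
A_hat = [np.eye(27)] + [-0.05 * rng.standard_normal((27, 27)) for _ in range(p)]
w = rng.standard_normal(27)
value = quadratic_form(w, A_hat, trials, p)
```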
Because the above objective function can be highly nonlinear, we adopt an iterative procedure to estimate w and \hat{A}(\tau). Since the estimates of the spatial filter w and the time-lagged demixing matrix \hat{A}(\tau) depend on each other, the iterative method alternately updates one while fixing the other. To be specific, the spatial filter w can be obtained based on a fixed \hat{A}(\tau) by solving equation 2.9. For \hat{A}(\tau), we calculate the jth column of \hat{A}(\tau), [\hat{a}_{1j}, \hat{a}_{2j}, \ldots, \hat{a}_{Cj}]^T, separately, based on the fixed spatial filter and on [\hat{a}_{1k}, \hat{a}_{2k}, \ldots, \hat{a}_{Ck}]^T (k = 1, \ldots, C and k \neq j) from the last iteration. In this way, the information flow from different channels is optimized individually, and the update of \hat{A}(\tau) finishes on the completion of estimating [\hat{a}_{1j}, \hat{a}_{2j}, \ldots, \hat{a}_{Cj}]^T for j = 1, \ldots, C. The implementation of the proposed discriminative learning algorithm of propagation and spatial patterns is summarized in algorithm 1. The loop does not stop until the convergence criteria are met. Note that during the optimization, only one spatial filter w is used.
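A schematic Python skeleton of this alternating procedure (mirroring Algorithm 1) is given below. It is not the authors' implementation: the two inner solvers, solve_w and solve_column, are hypothetical placeholders standing for the sub-problems of equations 2.9 and 2.14, and only the outer loop structure and the norm-based convergence test are illustrated.

```python
import numpy as np

def fit_dpsp(trials_0, trials_1, p, solve_w, solve_column, n_iter=20, tol=1e-4):
    """Alternating estimation of the spatial filter w and the matrices A_hat(tau).

    solve_w(A_hat, trials_0, trials_1)            -> w, from equation 2.9 on decorrelated data.
    solve_column(j, w, A_hat, trials_0, trials_1) -> list of the j-th columns of A_hat(1..p),
                                                     from equation 2.14 with w fixed.
    Both callables are placeholders for the respective sub-problems.
    """
    n_ch = trials_0.shape[1]
    A_hat = [np.eye(n_ch)] + [np.zeros((n_ch, n_ch)) for _ in range(p)]   # A(tau) = 0 initially
    for _ in range(n_iter):
        A_prev = [A.copy() for A in A_hat]
        w = solve_w(A_hat, trials_0, trials_1)              # update the spatial filter
        for j in range(n_ch):                                # update A_hat column by column
            cols = solve_column(j, w, A_hat, trials_0, trials_1)
            for tau in range(1, p + 1):
                A_hat[tau][:, j] = cols[tau - 1]
        delta = sum(np.linalg.norm(A_hat[tau] - A_prev[tau]) for tau in range(1, p + 1))
        if delta < tol:                                      # convergence criterion of Algorithm 1
            break
    return w, A_hat
```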
On completion of the optimization, \tilde{X} can be obtained from equation 2.11, and subsequently \tilde{R}^{(c)} can be obtained based on equations 2.12 and 2.13. With R^{(c)} substituted by \tilde{R}^{(c)}, the optimization problem in equation 2.9 is equivalent to solving the eigenvalue decomposition problem

W \tilde{R}^{(0)} = \Lambda W \tilde{R}^{(1)},    (2.15)

where \Lambda is the diagonal matrix containing the eigenvalues of (\tilde{R}^{(1)})^{-1} \tilde{R}^{(0)}.
With the projection matrix W, we select r pairs of spatial filters corresponding to the r largest or smallest components in \Lambda, as in the usual CSP procedure. The feature vector F_i for trial i is obtained from \tilde{X}_i as

F_i = \log \left( \frac{w_j \tilde{X}_i \tilde{X}_i^T w_j^T}{\sum_j w_j \tilde{X}_i \tilde{X}_i^T w_j^T} \right), \quad j = 1, \ldots, r, N - r + 1, \ldots, N.    (2.16)
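The log-power features of equation 2.16 can be computed directly from the projected, time-decorrelated trial. The sketch below is a small NumPy illustration with assumed shapes, not the authors' feature extraction code.

```python
import numpy as np

def log_variance_features(X_tilde, W, r):
    """Features of equation 2.16 for one time-decorrelated trial.

    X_tilde : (channels, samples) trial after the FIR step of equation 2.11.
    W       : (channels, channels) projection matrix from equation 2.15, rows sorted by eigenvalue.
    """
    W_sel = np.vstack([W[:r], W[-r:]])      # the r largest and r smallest components
    P = W_sel @ X_tilde                     # projected signals w_j X_tilde
    powers = np.sum(P * P, axis=1)          # w_j X_tilde X_tilde^T w_j^T for each selected j
    return np.log(powers / powers.sum())

# toy usage (hypothetical sizes): 2r = 4 features per trial
rng = np.random.default_rng(3)
features = log_variance_features(rng.standard_normal((27, 500)),
                                 rng.standard_normal((27, 27)), r=2)
```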
2.3 Background Noise Separation. In this section, we investigate the effectiveness of introducing the time-lagged demixing matrix \hat{A}(\tau) into the estimation of the ERD/ERS source, combined with spatial filter design. To further analyze and evaluate the proposed model, the difference between the time-decorrelated EEG signal \tilde{X}(t) (see equation 2.5) and the original EEG data X(t) is investigated. Suppose X(t) is described by the following MVAR model,

X(t) = \sum_{\tau=1}^{q} B(\tau) X(t - \tau) + N(t),    (2.17)
Algorithm 1: Discriminative Learning of Propagation and Spatial Pattern

Input: Training EEG data that comprises N sample blocks of X, with each block having a specific class label.
Output: Spatial filter w and time-lagged correlation estimates \hat{A}(\tau).

begin
  Set the initial parameters of the spatiotemporal filters \hat{A}(\tau) as zero matrices;
  for k = 1 : n_k do
    Compute \tilde{X} based on \hat{A}(\tau) using equation 2.11;
    Compute w by solving the optimization problem in equation 2.9;   % update the spatial filter w
    for j = 1 : C do
      Compute [\hat{a}_{1j}, \hat{a}_{2j}, \ldots, \hat{a}_{mj}]^T based on the updated spatial filter w by solving the optimization problem in equation 2.14;   % update \hat{A}(\tau)
    Compute the change in the norm of \hat{A}(\tau): \delta = \| \hat{A}(\tau)_k - \hat{A}(\tau)_{k-1} \|;
    if \delta < \zeta (\zeta is a small preset constant) then
      Stop.
where N(t) is the prediction error. It is also regarded as the innovation process because it is spontaneous and cannot be totally predicted from past observations (Gomez-Herrero et al., 2008). Note that B(\tau) is the mixing matrix based on the regression model, which is different from the A(\tau) estimated in the proposed model for discriminative purposes, and q is the order of the MVAR model. Similarly, equation 2.17 is rearranged in the following form to make the input-output relationship more compact,

N(t) = \sum_{\tau=0}^{q} \hat{B}(\tau) X(t - \tau),    (2.18)

where

\hat{B}(\tau) = \begin{cases} I, & \tau = 0, \\ -B(\tau), & \tau > 0. \end{cases}    (2.19)

Transforming equation 2.18 into the frequency domain yields

N(f) = B(f) X(f),    (2.20)

B(f) = \sum_{\tau=0}^{q} \hat{B}(\tau) e^{-i 2\pi f \tau},    (2.21)

where f is the frequency. Therefore, the transfer function of the system, H(f), can be described by

H(f) = B^{-1}(f),    (2.22)

such that X(f) = H(f) N(f).
By substituting equation 2.17 into equation 2.5 and following the steps from equations 2.20 to 2.22, we obtain

\tilde{X}(f) = (I - A(f)) X(f)    (2.23)
            = \left( H(f) - A(f) B^{-1}(f) \right) N(f),    (2.24)

where

A(f) = \sum_{\tau=1}^{p} A(\tau) e^{-i 2\pi f \tau}.    (2.25)

Let \tilde{H}(f) = H(f) - A(f) B^{-1}(f), which is the transfer function from N(f) to \tilde{X}(f). Since the causal flow measurement DTF is defined based on the transfer function (Kaminski et al., 2001), we see that the proposed method changes the information flow by changing the transfer function from H(f) to \tilde{H}(f).
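To illustrate equations 2.20 to 2.25 numerically, the sketch below evaluates B(f), H(f) = B^{-1}(f), A(f), and the modified transfer function on a frequency grid. The coefficient matrices, channel count, and sampling rate are toy assumptions, not quantities estimated from data, and the snippet is not the authors' code.

```python
import numpy as np

def poly_transform(coeffs, freqs, fs):
    """sum_tau C(tau) exp(-i 2 pi f tau / fs) for lag matrices C(0), C(1), ... (f in Hz, tau in samples)."""
    n_ch = coeffs[0].shape[0]
    out = np.zeros((len(freqs), n_ch, n_ch), dtype=complex)
    for tau, C in enumerate(coeffs):
        out += C[None, :, :] * np.exp(-1j * 2 * np.pi * freqs * tau / fs)[:, None, None]
    return out

# toy coefficients (hypothetical: 3 channels, orders q = p = 2, fs = 250 Hz)
rng = np.random.default_rng(4)
n_ch, fs = 3, 250.0
B = [0.1 * rng.standard_normal((n_ch, n_ch)) for _ in range(2)]    # B(1), B(2) of equation 2.17
A = [0.1 * rng.standard_normal((n_ch, n_ch)) for _ in range(2)]    # A(1), A(2) of equation 2.5

freqs = np.linspace(8.0, 35.0, 28)                                  # band relevant to motor imagery
Bf = poly_transform([np.eye(n_ch)] + [-Bt for Bt in B], freqs, fs)  # equations 2.19 and 2.21
Hf = np.linalg.inv(Bf)                                              # equation 2.22, per frequency
Af = poly_transform([np.zeros((n_ch, n_ch))] + A, freqs, fs)        # equation 2.25 (no tau = 0 term)
H_tilde = (np.eye(n_ch)[None, :, :] - Af) @ Hf                      # equals H(f) - A(f) B^{-1}(f)
```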
Moreover, comparison of the transfer functions of \tilde{X} and X in equation 2.23 shows its similarity to the classical signal-plus-noise (SPN) model. In particular, in Xu et al. (2009), the observed EEG data containing an ERP, X_E(f), is usually formulated as

X_E(f) = S_E(f) + Z(f),    (2.26)

where S_E(f) is the ERP of interest and Z(f) is the background noise or the ongoing activity.
Remark 3. As Xu et al. (2009) discussed, the background noise is not noise despite its noise-like appearance but represents ongoing brain activity rich in oscillatory content. In light of the above discussion, we can interpret equation 2.23 from a similar perspective. As indicated in equation 2.23, the frequency component removed from X is an oscillatory signal with transfer function A(f) B^{-1}(f), and it can be regarded as an estimate of the ongoing activity. In other words, this ongoing activity constitutes the part of the MVAR process of X with transfer function A(f) B^{-1}(f). In this way, the ERD/ERS components are enhanced in the proposed model while the oscillatory background noise is attenuated.
The Kullback-Leibler (KL) divergence is a measure of probability divergence given two probability distributions, and it has been used to evaluate nonstationarity in the motor imagery EEG classification problem (Arvaneh, Guan, Ang, & Quek, 2013a, 2013b; Bamdadian, Guan, Ang, & Xu, 2012). Therefore, to verify that the component removed from X is the background noise, we adopt the KL divergence as the criterion.

As the gaussian model is usually used to model EEG data, we consider the KL divergence between two gaussian distributions. In particular, the KL divergence between two k-dimensional gaussian distributions \mathcal{N}_0 and \mathcal{N}_1, with means \mu_0 and \mu_1 and nonsingular covariance matrices \Sigma_0 and \Sigma_1, is

D_{KL}(\mathcal{N}_0 \| \mathcal{N}_1) = \frac{1}{2} \left[ \mathrm{tr}(\Sigma_1^{-1} \Sigma_0) + (\mu_1 - \mu_0)^T \Sigma_1^{-1} (\mu_1 - \mu_0) - \ln \frac{\det \Sigma_0}{\det \Sigma_1} - k \right].    (2.27)
It is reasonable to assume that improved separation of the background noise will result in more stationary data with fewer within-class dissimilarities. We therefore adopt the KL divergence to measure such within-class dissimilarities. The smaller the KL divergences within trials from the same class, the less the variation of the data, which generally relates to better classification results. Since EEG data are usually processed to be centered and the dimension k of the distribution is the number of channels m, for every trial i in class c we use D_{KL}(\mathcal{N}(0, R_i) \| \mathcal{N}(0, R^{(c)})) to measure the dissimilarity of the distribution of this trial from the mean distribution of class c as

D_{KL}(\mathcal{N}(0, R_i) \| \mathcal{N}(0, R^{(c)})) = \frac{1}{2} \left[ \mathrm{tr}\left( (R^{(c)})^{-1} R_i \right) - \ln \frac{\det R_i}{\det R^{(c)}} - m \right],    (2.28)
and subsequently we obtain an average probability divergence D for the EEG data X as

D = \sum_{c=0,1} \frac{1}{|Q_c|} \sum_{i \in Q_c} D_{KL}(\mathcal{N}(0, R_i) \| \mathcal{N}(0, R^{(c)})).    (2.29)

Similarly, we obtain \tilde{D} based on \tilde{X} as

\tilde{D} = \sum_{c=0,1} \frac{1}{|Q_c|} \sum_{i \in Q_c} D_{KL}(\mathcal{N}(0, \tilde{R}_i) \| \mathcal{N}(0, \tilde{R}^{(c)})).    (2.30)

In this way, by comparing D and \tilde{D}, we can evaluate the quality of X and \tilde{X} in terms of within-class dissimilarities.
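The divergences of equations 2.28 to 2.30 reduce to simple matrix operations on the trial and class-average covariance matrices. The following sketch is a NumPy illustration of that computation (using slogdet for numerical stability); the dictionary layout of the input and the random toy covariances are assumptions for the example, not the authors' data format.

```python
import numpy as np

def kl_zero_mean(Ri, Rc):
    """D_KL(N(0, R_i) || N(0, R^(c))) for zero-mean gaussians, cf. equation 2.28."""
    m = Ri.shape[0]
    _, logdet_i = np.linalg.slogdet(Ri)
    _, logdet_c = np.linalg.slogdet(Rc)
    return 0.5 * (np.trace(np.linalg.inv(Rc) @ Ri) - (logdet_i - logdet_c) - m)

def average_divergence(covs_by_class):
    """Average within-class divergence D of equations 2.29 and 2.30.

    covs_by_class : dict mapping class label -> list of per-trial covariance matrices.
    """
    total = 0.0
    for covs in covs_by_class.values():
        Rc = np.mean(covs, axis=0)                               # class-average covariance
        total += np.mean([kl_zero_mean(Ri, Rc) for Ri in covs])
    return total

# toy usage with random positive-definite matrices (hypothetical)
rng = np.random.default_rng(5)
def random_cov(n=27):
    M = rng.standard_normal((n, n))
    return M @ M.T / n + np.eye(n)
D = average_divergence({0: [random_cov() for _ in range(10)],
                        1: [random_cov() for _ in range(10)]})
```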
Remark 4. It is worth noting that the proposed method addresses more complicated dynamics of motor imagery EEG but does not depend on a particular explanation of the generation of ERD/ERS. On the one hand, it is possible that propagation effects that contribute to the generation of ERD/ERS exist. On the other hand, discriminative sources could correlate with noise in a convolutive way. Blind source separation or connectivity estimation methodology, as discussed before, may not be effective for classification problems because it is difficult to differentiate between the two kinds of propagation effects. The proposed model, which is formulated in the phenomenological form of equation 2.23, takes both cases into consideration.
3 Experimental Results and Discussion
3.1 Data Description and Processing. Sixteen subjects participated in
the study with informed consent. Ethics approval was obtained before-
hand from the Institutional Review Board of the National University of
Singapore. EEGs from the full 27 channels were obtained using Nuamps
EEG acquisition hardware with unipolar Ag/AgCl electrodes. The
sampling rate was 250 Hz with a resolution of 22 bits for the voltage range
of ±130 mV. A bandpass filter of 0.05 to 40 Hz was set in the acquisition
hardware.
Discriminative Learning of EEG Propagation 2721
In the experiment, the training and test sessions were recorded on dif-
ferent days with the subjects performing motor imagery. During the EEG
recording process, the subjects were asked to avoid physical movement
and eye blinking. In addition, they were instructed to perform kinesthetic
motor imagery of the chosen hand in two runs. During the rest state, they
did mental counting to make the resting EEG signal more consistent. Each
run lasted approximately 16 minutes and consisted of 40 trials of motor
imagery and 40 trials of rest state. Each training session consisted of two
runs, while the test session consisted of two or three runs.
We select the time segments from 0.5 s to 2.5 s after the cue (Arvaneh,
Guan, Ang, & Quek, 2011). The raw data are prefiltered by an 8 Hz to 35 Hz
bandpass filter that covers rhythms related to motor imagery. The filtered
training data are used to train the feature extraction model based on the
proposed method as described in section 2.2. The numbers of spatial filters
in W are chosen as 2 and 3 (r = 2, 3 in equation 2.16). Finally, the extracted
training features are used to train a support vector machine (SVM) classifier.
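The preprocessing and classification described above can be sketched as follows. The snippet is an assumption-laden illustration (SciPy/scikit-learn, a fourth-order Butterworth filter, a linear SVM kernel, and random placeholder features), not the authors' pipeline; only the band limits, the epoch window, and the sampling rate come from the text.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.svm import SVC

FS = 250.0                                        # sampling rate (Hz)

def preprocess(trial, cue_sample):
    """8-35 Hz bandpass and extraction of the 0.5-2.5 s post-cue segment of one trial."""
    b, a = butter(4, [8.0 / (FS / 2), 35.0 / (FS / 2)], btype='band')
    filtered = filtfilt(b, a, trial, axis=1)      # zero-phase bandpass over each channel
    return filtered[:, cue_sample + int(0.5 * FS) : cue_sample + int(2.5 * FS)]

# features such as those of equation 2.16 would be computed from the preprocessed trials;
# here random placeholders stand in for them (80 trials, 2r = 4 features)
rng = np.random.default_rng(6)
train_features = rng.standard_normal((80, 4))
train_labels = rng.integers(0, 2, 80)
clf = SVC(kernel='linear').fit(train_features, train_labels)
```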
3.2 Investigation on the Order of the Time-Lagged Demixing Matrix. To determine the order p of \hat{A}(\tau) in equation 2.11, we fit the MVAR model to the EEG data as in equation 2.17. Although the orders p and q have different meanings, the analysis of the order q of the mixing matrix B(\tau) in equation 2.17 indicates at which time-lagged levels the propagation effects are stronger. Based on equation 2.20 and the analysis given in section 2.3, since \hat{A}(\tau) corresponds to certain components of B(\tau) in the frequency domain, it is reasonable to choose the order p of \hat{A}(\tau) in accordance with q, the order of B(\tau). Therefore, the analysis of the mixing matrix B(\tau) can be used to initialize the order p of \hat{A}(\tau) in the proposed model. The Schwarz Bayesian criterion is used to automatically select the model order that best matches the data (Schneider & Neumaier, 2001). We found that for every subject, the order q = 5 is selected for most of the trials, and the order 4 or 6 is selected for the remaining trials. Therefore, we restrict the investigation to the orders 4, 5, and 6.
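A least-squares MVAR fit with a Schwarz-Bayesian score can be sketched as follows; this NumPy-only illustration, with a simplified criterion and random toy data, is not the ARfit routine cited above, but it conveys how q and the norms of B(τ) shown in Figure 1 can be obtained.

```python
import numpy as np

def fit_mvar(X, q):
    """Least-squares fit of X(t) = sum_{tau=1}^{q} B(tau) X(t - tau) + N(t), cf. equation 2.17."""
    n_ch, n = X.shape
    Y = X[:, q:]                                                        # targets X(t), t = q .. n-1
    Z = np.vstack([X[:, q - tau : n - tau] for tau in range(1, q + 1)]) # stacked lagged copies
    C, *_ = np.linalg.lstsq(Z.T, Y.T, rcond=None)                       # (q * n_ch, n_ch)
    B = [C[(tau - 1) * n_ch : tau * n_ch, :].T for tau in range(1, q + 1)]
    resid = Y - sum(B[tau - 1] @ X[:, q - tau : n - tau] for tau in range(1, q + 1))
    return B, resid

def sbc(resid, q, n_ch):
    """Simplified Schwarz Bayesian criterion: residual log-determinant plus a complexity penalty."""
    n_eff = resid.shape[1]
    _, logdet = np.linalg.slogdet(resid @ resid.T / n_eff)
    return logdet + q * n_ch * n_ch * np.log(n_eff) / n_eff

# toy per-trial usage: compare q in {4, 5, 6} and inspect the norms of B(tau) as in Figure 1
rng = np.random.default_rng(7)
X = rng.standard_normal((27, 500))
scores = {q: sbc(fit_mvar(X, q)[1], q, X.shape[0]) for q in (4, 5, 6)}
q_best = min(scores, key=scores.get)
norms = [np.linalg.norm(Bt) for Bt in fit_mvar(X, q_best)[0]]
```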
Figure 1 illustrates the result for one subject in the data set introduced in section 3.1. The y-axis indicates the norm of the mixing matrix B(\tau) in equation 2.17 at different orders, and the x-axis indicates the order \tau. The coefficient matrices are obtained under MVAR models with q equal to 4, 5, or 6 and averaged over the training set and test set, respectively, resulting in the six lines in Figure 1. We see that in all six cases, the norms of the coefficient matrices at orders 2 and 3 are the highest, which means that the data at time t are most influenced by the data at times t − 2 and t − 3. Therefore, the order p of \hat{A}(\tau) should include these two time lags so that the proposed discriminative learning model addresses the most influential propagation effects. Furthermore, we focus on investigating the feasibility of the proposed model with orders 4 and below.
Figure 1: Norms of coefficient matrices under the MVAR model. The x-axis represents the order \tau, and the y-axis represents the norm of B(\tau). Three MVAR models with order q from 4 to 6 are used to fit the EEG data of the training and test sets separately, yielding six lines. The peak points of all six lines correspond to either \tau = 2 or \tau = 3.
3.3 Classification Results. Tables 1 and 2 summarize the performance of the proposed feature extraction method, compared with CSP as the baseline. In these two tables, we refer to the proposed method as discriminative propagation and spatial pattern analysis (DPSP). Tables 1 and 2 correspond to r = 2 and r = 3, respectively, and in both tables, results of DPSP with p = 1, 2, \ldots, 4 are included.

According to the results, the proposed feature extraction method improves the performance of the classifier, and the improvements are significant when the order of \hat{A}(\tau) in DPSP is 2 or 3, regardless of the value of r, which is in agreement with the previous analysis based on the MVAR model. Specifically, when r = 2, the average classification accuracy is 68.30% for order 2 and 67.91% for order 3, both of which are higher than that of CSP (65.56%). The paired t-test confirms the significance of the improvement at the 5% level, with p-values of 0.008 and 0.040 for p = 2 and p = 3, respectively. Similar to the results based on two pairs of spatial filters, the average classification accuracy of DPSP when r = 3 is 68.98% for p = 2 and 68.75% for p = 3, higher than that of CSP (66.48%). Again, the significance of the improvement is confirmed by the paired t-test, with p-values of 0.027 and 0.022 for p = 2 and p = 3, respectively.
Table 1: Session-to-Session Transfer Test Results for r = 2 (%). The columns p = 1 to p = 4 correspond to DPSP with the respective order.

Subject CSP p = 1 p = 2 p = 3 p = 4
1 65.00 65.41 62.91 66.66 67.08
2 51.25 51.25 54.17 52.08 52.08
3 55.00 55.00 57.50 55.83 55.00
4 66.67 66.67 70.41 71.25 77.08
5 54.58 54.16 67.08 70.41 58.33
6 67.08 67.50 72.50 69.16 69.58
7 77.08 77.08 77.92 76.66 72.5
8 94.16 94.16 92.50 96.25 95.41
9 74.58 75.00 75.83 75.83 74.58
10 61.66 61.25 60.41 60.83 60.00
11 46.25 46.67 49.16 53.33 47.08
12 77.00 77.08 81.25 79.58 73.33
13 51.25 51.25 54.58 51.25 50.00
14 72.08 72.08 79.16 73.75 74.58
15 65.83 65.58 67.50 64.16 64.58
16 69.58 69.60 70.00 68.75 65.00
Mean 65.56 65.59 68.30 67.91 66.01
SD 12.26 12.28 11.57 11.79 12.35
p-value 0.64 0.008* 0.040* 0.63

*p < 0.05.
The accuracy for order 4 is 66.01% when r = 2 and 66.41% when r = 3, and these improvements are not significant. Interestingly, the accuracy for order 1 is almost the same as that of CSP in both tables, which also confirms our previous analysis: it is necessary and sufficient for \hat{A}(\tau) to cover the major components of \hat{B}(\tau). The propagation effect is strongest at orders 2 and 3, so the optimization based on \hat{A}(\tau) of order 1 has very limited effect and yields almost the same result as CSP. The optimization based on \hat{A}(\tau) of order 4 accounts for most of the propagation effect, but the larger number of parameters poses a risk of overfitting. In other words, ideally the higher the order of \hat{A}(\tau), the better the results should be, since more propagation effects are taken into consideration. However, for a higher order, the increased number of parameters can cause overfitting, which makes the classification results deteriorate. To balance accounting for the propagation effects against overfitting, it is effective to cover the major components of the propagation with as low an order as possible; in this experiment these components come from orders 2 and 3.
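The significance tests reported above are standard paired t-tests across the 16 subjects. As an illustration, the snippet below runs the comparison of the CSP and DPSP (p = 2, r = 2) columns of Table 1 with SciPy; the letter reports p = 0.008 for this column.

```python
import numpy as np
from scipy.stats import ttest_rel

# per-subject accuracies from Table 1: CSP and DPSP with p = 2 (r = 2)
acc_csp = np.array([65.00, 51.25, 55.00, 66.67, 54.58, 67.08, 77.08, 94.16,
                    74.58, 61.66, 46.25, 77.00, 51.25, 72.08, 65.83, 69.58])
acc_dpsp = np.array([62.91, 54.17, 57.50, 70.41, 67.08, 72.50, 77.92, 92.50,
                     75.83, 60.41, 49.16, 81.25, 54.58, 79.16, 67.50, 70.00])
t_stat, p_value = ttest_rel(acc_dpsp, acc_csp)   # paired t-test over subjects
```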
Figure 2 shows the comparison in a more intuitive way. Each plot in Figure 2 shows the test accuracy under DPSP with order p against that under CSP. The x-axis represents the accuracy under CSP, and the y-axis represents that under DPSP.
Table 2: Session-to-Session Transfer Test Results for r = 3 (%). The columns p = 1 to p = 4 correspond to DPSP with the respective order.

Subject CSP p = 1 p = 2 p = 3 p = 4
1 70.41 70.41 71.66 73.33 73.33
2 54.58 54.58 57.08 60.83 54.16
3 56.66 56.66 57.50 55.83 55.00
4 75.41 76.66 76.66 74.16 75.41
5 53.33 53.33 67.08 66.67 54.16
6 68.33 68.33 71.66 71.66 70.83
7 72.50 72.50 75.00 72.92 71.66
8 94.58 94.58 91.66 94.58 95.00
9 76.25 76.58 77.91 76.25 72.50
10 57.50 60.83 60.41 61.67 60.00
11 47.50 47.50 50.41 47.92 47.08
12 75.83 75.41 80.83 81.25 72.05
13 49.58 49.58 51.25 50.00 49.58
14 74.16 74.16 80.41 74.58 75.41
15 64.16 64.16 64.58 65.00 72.08
16 72.91 72.91 68.75 72.08 68.75
Mean 66.48 66.52 68.98 68.74 66.14
SD 12.51 12.04 11.51 11.70 12.34
p-value 0.53 0.027* 0.022* 0.55

*p < 0.05.
In each plot, a circle above the diagonal line marks a subject for which DPSP outperforms CSP.
Figure 3 shows A(\tau) for two subjects. For a better comparison of the differences between the proposed method and the MVAR model, the mixing matrices B(\tau) based on the MVAR model for the two subjects are also provided. As shown in Figure 3, the diagonal elements of B(\tau) are much higher than the off-diagonal elements, because the self-spectrum of a signal is usually stronger than the cross-spectrum between EEG signals. However, there are no large differences between the diagonal and off-diagonal elements of A(\tau), and the diagonal elements are not significantly higher, which means that the self-spectrum of the signal is not modulated radically by A(\tau). Moreover, since elements of higher values concentrate in certain columns, higher weights are given to tune the propagation from certain channels.
3.4 Analysis of Background Noise Separation. To further verify the validity of DPSP, we have evaluated the classwise KL divergence (see section 2.3). Results averaged over all subjects are shown in Table 3 and Figure 4. Note that for the computation of D_{KL} for both the training set and the test set, the average covariance matrix R^{(c)} (or \tilde{R}^{(c)}) is the mean of the training set, since under the single-trial analysis setting we cannot obtain the mean of the test set.
Figure 2: Session-to-session transfer test accuracy. The x-axis represents the accuracy under CSP, and the y-axis represents that under DPSP with different orders p and numbers of spatial filters r (p = 2, 3 and r = 2, 3; the corresponding p-values are 0.008, 0.027, 0.04, and 0.022). The y = x line is denoted by a dotted-dashed line. In each plot, a circle above the y = x line marks a subject for which DPSP outperforms CSP. The plots show that the improvements of DPSP for orders 2 and 3 are significant.
Therefore, the fact that the average divergence D of the test set is larger than that of the training set in all cases reflects the differences between the test set and the training set, as indicated by Table 3. This is mainly caused by session-to-session transfer effects. According to the results, the proposed DPSP algorithm decreases the KL divergence within the same class for both the training set and the test set, which means that, compared to the raw EEG data X, the data \tilde{X} processed by DPSP are more stationary. A more significant decrease is achieved for the test set, which means that the proposed method is more robust to session-to-session transfer effects. Moreover, the comparison between different orders indicates that better performance is achieved with order 2, which is in accord with the accuracy results.
Figure 5 illustrates the correlation between the decrease of the KL divergence and the increase of the classification accuracy at the subject level.
Figure 3: Comparison of the coefficient matrices obtained by the proposed method, A(\tau), and the mixing matrices in the MVAR model, B(\tau), for subject 7 (a) and subject 14 (b). For both subjects, the diagonal elements of B(\tau) are much higher than the off-diagonal elements. For A(\tau), elements of higher values are found in certain columns.
The linear correlation coefficient r_c equals 0.30 and 0.31 for p = 2 and p = 3, respectively. Due to the large variability across subjects, their KL divergences may lie in different feature spaces, and the decrease of the KL divergence and the increase of classification performance may not correlate linearly. As illustrated in Figure 5, almost all the points lie in the first quadrant, indicating that the decreased KL divergence is positively correlated with the increased classification accuracy.
Table 3: Decrease of the KL Divergence (%).

               D      \tilde{D} (p = 2)   1 − \tilde{D}/D   \tilde{D} (p = 3)   1 − \tilde{D}/D   \tilde{D} (p = 4)   1 − \tilde{D}/D
Training set   4.96   4.09                17.68%            4.25                14.39%            4.84                2.55%
Test set       64.3   25.2                60.84%            36.68               42.98%            57.09               11.24%
Figure 4: Decrease of the KL divergence. The decreases of the KL divergence of \tilde{X} at different orders, compared to X, are shown as percentages. A large decrease in the KL divergence indicates that \tilde{X} is more stationary than X. Therefore, the proposed DPSP algorithm is more robust toward varying background noise and session-to-session transfer effects.
Therefore, the decrease of the KL divergence contributes to the increase of the classification accuracy to a certain extent. Nevertheless, the reasons for the improvement in classification could be more complicated, so the KL divergence cannot completely account for it. We will investigate this issue in future work.
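The subject-level correlation coefficient r_c used in Figure 5 is an ordinary linear (Pearson) correlation between two per-subject quantities; a minimal sketch, with random placeholder values standing in for the real per-subject results, is given below.

```python
import numpy as np

# placeholders: per-subject decrease of KL divergence (%) and increase of accuracy (%)
rng = np.random.default_rng(8)
kl_decrease = rng.uniform(-20.0, 80.0, 16)
acc_increase = rng.uniform(-5.0, 15.0, 16)
r_c = np.corrcoef(kl_decrease, acc_increase)[0, 1]   # linear correlation, as in Figure 5
```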
4 Conclusion

The coexistence of brain connectivity and volume conduction may have complicated effects on EEG measurements and poses a technical challenge to detecting specific brain activities of interest. Conventional linear spatial filter design methods that assume unconnected sources are not sufficient to address such complicated dynamics.
Figure 5: Correlation between the decrease of the KL divergence and the increase of the classification accuracy. The x-axis represents the decrease of the KL divergence, and the y-axis represents the increase of the classification accuracy. Panels a and b correspond to p = 2 (r_c = 0.30) and p = 3 (r_c = 0.31), respectively.
Due to the causal relationships, ERD/ERS signals reconstructed by instantaneous demixing may not be optimal in terms of discrimination.
Moreover, the propagation effects are closely related to the background noise and nonstationarity of EEG. It is possible that an electrode containing no discriminative information could be given a high weight due to information flow from signals containing ERD/ERS, and such dependence could be very unstable compared with the original ERD/ERS source. This analysis motivates the proposed unified model for discriminative learning of propagation and spatial patterns.
Therefore, we have reported in this letter a novel computational model that accounts for both time-lagged correlations between signals and the volume conduction effect. Different from the sparsely connected sources analysis (SCSA) model in Haufe et al. (2010) and the MVAR-ICA model in Gomez-Herrero et al. (2008), the proposed computational model is designed for discriminative analysis while also taking propagation into account. In addition, an iterative algorithm is implemented for the estimation of the proposed discriminative model. Experimental results have shown a statistically significant improvement in classification accuracy under the proposed learning method. Moreover, the effectiveness of the background noise attenuation is also confirmed by a significant decrease of the KL divergence of EEG data of the same class, especially for the test data. This indicates that the proposed method is more robust than conventional methods against session-to-session nonstationarity in EEG.
Appendix: Relations Between the Convolutive Model and
the Instantaneous Model with Connected Sources
Based on the models in Haufe et al. (2010) and Gomez-Herrero et al. (2008), X(t) can be assumed to be generated as a linear instantaneous mixture of a source signal S(t) that follows a multivariate autoregressive (MVAR) model,

X(t) = M S(t),    (A.1)

S(t) = \sum_{\tau} B_s(\tau) S(t - \tau) + \epsilon(t),    (A.2)

where B_s(\tau) is the coefficient matrix of the MVAR model, representing the connectivity between sources (Ginter et al., 2001; Schlogl & Supp, 2006), and \epsilon(t) is the innovation process. From equations A.1 and A.2, the innovation process \epsilon(t) can be written as

\epsilon(t) = M^{-1} X(t) - \sum_{\tau} B_s(\tau) M^{-1} X(t - \tau) = \sum_{\tau} \hat{B}_s(\tau) X(t - \tau),    (A.3)

where

\hat{B}_s(\tau) = \begin{cases} M^{-1}, & \tau = 0, \\ -B_s(\tau) M^{-1}, & \tau > 0. \end{cases}    (A.4)
Equation A.3 shows the equivalence between this model and the convolutive model in Dyrholm et al. (2007) and Mørup et al. (2009) and the proposed approach, with the underlying convolutive sources replaced by innovations. Because the objective in Haufe et al. (2010) and Gomez-Herrero et al. (2008) is connectivity analysis, the estimation of B_s(\tau) and M is based on the nongaussianity assumption on \epsilon(t). In the proposed model, S(t) represents the discriminative sources related to ERD/ERS, and thus the estimation of the FIR matrices \hat{A}(\tau) in equation 2.11 and the spatial filter w is based on maximizing the variance difference between the two classes. With the discriminative objective, it is preferable to apply the convolutive model so as to impose the variance difference as prior information about the source. Moreover, since the two models are equivalent, it is also possible to build a discriminative model based on the instantaneous mixing model with connected sources in equations A.1 and A.2. In future work, we would like to explore possible discriminative learning approaches to study the connectivity that contains class information.
References
Ang, K. K., Chin, Z. Y., Wang, C., Guan, C., & Zhang, H. (2012). Filter bank common
spatial pattern algorithm on BCI competition IV datasets 2a and 2b. Frontiers in
Neuroscience, 6(39).
Ang, K. K., Chin, Z. Y., Zhang, H., & Guan, C. (2008). Filter bank common spatial pat-
tern (FBCSP) in brain-computer interface. In Proceedings of the IEEE International
Joint Conference on Neural Networks and Computational Intelligence (pp. 2390–2397).
Piscataway, NJ: IEEE.
Ang, K. K., Chin, Z. Y., Zhang, H., & Guan, C. (2012). Mutual information-based
selection of optimal spatial-temporal patterns for single-trial EEG-based BCIs.
Pattern Recognition, 45(6), 2137–2144.
Arvaneh, M., Guan, C., Ang, K. K., & Quek, C. (2011). Optimizing the channel selec-
tion and classification accuracy in EEG-based BCI. IEEE Transactions on Biomedical
Engineering, 58(6), 1865–1873.
Arvaneh, M., Guan, C., Ang, K. K., & Quek, C. (2013a). EEG data space adapta-
tion to reduce inter-session non-stationarity in brain-computer interface. Neural
Computation, 25, 2146–2171.
Arvaneh, M., Guan, C., Ang, K. K., & Quek, C. (2013b). Optimizing spatial filters by
minimizing within-class dissimilarities in EEG-based BCI. IEEE Transactions on
Neural Networks and Learning Systems, 24(4), 610–619.
Astolfi, L., Cincotti, F., Mattia, D., de Vico Fallani, F., Salinari, S., Ursino, M., &
Babiloni, F. (2006). Estimation of the cortical connectivity patterns during the
intention of limb movements. IEEE Engineering in Medicine and Biology Magazine,
25(4), 32–38.
Baccala, L. A., & Sameshima, K. (2001). Partial directed coherence: A new concept in
neural structure determination. Biological Cybernetics, 84, 463–474.
Bahramisharif, A., van Gerven, M. A. J., Schoffelen, J. M., Ghahramani, Z., & Heskes, T. (2012). The dynamic beamformer. In G. Langs, I. Rish, M. Grosse-Wentrup, & B. Murphy (Eds.), Machine learning and interpretation in neuroimaging (Lecture Notes in Computer Science, vol. 7263, pp. 148–155). Berlin: Springer.
Bamdadian, A., Guan, C., Ang, K. K., & Xu, J. (2012). Online semi-supervised learning
with kl distance weighting for motor imagery-based BCI. In Proceedings of the 2012
Annual International Conference of the IEEE Engineering in Medicine and Biology
Society (pp. 2732–2735). Piscataway, NJ: IEEE.
Blankertz, B., Tomioka, R., Lemm, S., Kawanabe, M., & Muller, K. R. (2008). Optimizing spatial filters for robust EEG single-trial analysis. IEEE Signal Processing Magazine, 25(1), 41–56.
Chen, H., Yang, Q., Liao, W., Gong, Q., & Shen, S. (2009). Evaluation of the effective
connectivity of supplementary motor areas during motor imagery using Granger
causality mapping. NeuroImage, 47(4), 1844–1853.
Dornhege, G., Blankertz, B., Krauledat, M., Losch, F., Curio, G., & Muller, K.-R.
(2006). Combined optimization of spatial and temporal filters for improving
brain-computer interfacing. IEEE Transactions on Biomedical Engineering, 53(11),
2274–2281.
Dyrholm, M., Makeig, S., & Hansen, L. K. (2007). Convolutive ICA for spatiotemporal
analysis of EEG. Neural Computation, 19, 934–955.
Ewald, A., Marzetti, L., Zappasodi, F., Meinecke, F. C., & Nolte, G. (2012). Estimat-
ing true brain connectivity from EEG/MEG data invariant to linear and static
transformations in sensor space. NeuroImage, 60(1), 476–488.
Formaggio, E., Storti, S. F., Cerini, R., Fiaschi, A., & Manganotti, P. (2010). Brain
oscillatory activity during motor imagery in EEG-fMRI coregistration. Magnetic
Resonance Imaging, 28(10), 1403–1412.
Muller-Gerking, J., Pfurtscheller, G., & Flyvbjerg, H. (1999). Designing optimal spatial filters for single-trial EEG classification in a movement task. Clinical Neurophysiology, 110, 787–798.
Ginter, J. Jr., Blinowska, K. J., Kaminski, M., & Durka, P. J. (2001). Phase and amplitude analysis in time-frequency space – application to voluntary finger movement. Journal of Neuroscience Methods, 110(1–2), 113–124.
Gomez-Herrero, G., Atienza, M., Egiazarian, K., & Cantero, J. L. (2008). Measuring
directional coupling between EEG sources. NeuroImage, 43(3), 497–508.
Grosse-Wentrup, M. (2009). Understanding brain connectivity patterns during motor
imagery for brain-computer interfacing. In D. Koller, D. Schuurmans, Y. Bengio, &
L. Bottou (Eds.), Advances in neural information processing systems, 21 (pp. 561–568).
Cambridge, MA: MIT Press.
Guler, I., Kiymik, M. K., Akin, M., & Alkan, A. (2001). AR spectral analysis of
EEG signals by using maximum likelihood estimation. Computers in Biology and
Medicine, 31(6), 441–450.
Gysels, E., & Celka, P. (2007). Phase synchronization for the recognition of mental tasks in a brain-computer interface. IEEE Transactions on Rehabilitation Engineering, 12(4), 406–415.
Haufe, S., Tomioka, R., Nolte, G., Muller, K.-R., & Kawanabe, M. (2010). Model-
ing sparse connectivity between underlying brain sources for EEG/MEG. IEEE
Transactions on Biomedical Engineering, 57(8), 1954–1963.
Jeannerod, M. (1995). Mental imagery in the motor context. Neuropsychologia, 33,
1419–1432.
Kaminski, M., & Blinowska, K. (1991). A new method of the description of the
information flow in the brain structures. Biological Cybernetics, 65, 203–210.
Kaminski, M., Ding, M., Truccolo, W. A., & Bressler, S. L. (2001). Evaluating causal relations in neural systems: Granger causality, directed transfer function and statistical assessment of significance. Biological Cybernetics, 85, 145–157.
Koles, Z. J. (1991). The quantitative extraction and topographic mapping of the abnormal components in the clinical EEG. Electroencephalography and Clinical Neurophysiology, 79, 440–447.
Kus, R., Kaminski, M., & Blinowska, K. J. (2004). Determination of EEG activity prop-
agation: Pair-wise versus multichannel estimate. IEEE Transactions on Biomedical
Engineering, 51(9), 1501–1510.
Lemm, S., Blankertz, B., Curio, G., & Muller, K.-R. (2005). Spatio-spectral filters for
improving the classification of single trial EEG. IEEE Transactions on Biomedical
Engineering, 52(9), 1541–1548.
Li, Y., & Guan, C. (2006). An extended EM algorithm for joint feature extraction
and classification in brain-computer interfaces. Neural Computation, 18, 2730–
2761.
Llera, A., Gomez, V., & Kappen, H. J. (2012). Adaptive classification on brain-
computer interfaces using reinforcement signals. Neural Computation, 24, 2900–
2923.
Lo, A. C., Guarino, P. D., Richards, L. G., Haselkorn, J. K., Wittenberg, G. F., Federman,
D. G., & Peduzzi, P. (2010). Robot-assisted therapy for long-term upper-limb
impairment after stroke. New England Journal of Medicine, 362, 1772–1783.
Mørup, M., Madsen, K. H., & Hansen, L. K. (2009). Latent causal modelling of
neuroimaging data. In NIPS Workshop on Connectivity Inference in Neuroimaging.
Pfurtscheller, G., Brunner, C., Schlogl, A., & da Silva, F. H. L. (2006). Mu rhythm
(de)synchronization and EEG single-trial classification of different motor imagery
tasks. NeuroImage, 31(1), 153–159.
Ramoser, H., Muller-Gerking, J., & Pfurtscheller, G. (2000). Optimal spatial filter-
ing of single trial EEG during imagined hand movement. IEEE Transactions on
Rehabilitation Engineering, 8(4), 441–446.
Schlogl, A., & Supp, G. (2006). Analyzing event-related EEG data with multivariate
autoregressive parameters. Progress in Brain Research, 159, 135–147.
Schneider, T., & Neumaier, A. (2001). Algorithm 808: ARfit—a Matlab package for the estimation of parameters and eigenmodes of multivariate autoregressive models. ACM Transactions on Mathematical Software, 27(1), 58–65.
Stavrinou, M., Moraru, L., Cimponeriu, L., Stefania, P. D., & Bezerianos, A. (2007).
Evaluation of cortical connectivity during real and imagined rhythmic finger
tapping. Brain Topography, 19(3), 137–145.
Thomas, K. P., Guan, C., Lau, C. T., Vinod, A. P., & Ang, K. K. (2009). A new dis-
criminative common spatial pattern method for motor imagery brain-computer
interfaces. IEEE Transactions on Biomedical Engineering, 56(11), 2730–2733.
Vidaurre, C., Sannelli, C., Muller, K.-R., & Blankertz, B. (2011). Machine-learning-
based coadaptive calibration for brain-computer interfaces. Neural Computation,
23, 791–816.
Wei, Q., Wang, Y., Gao, X., & Gao, S. (2007). Amplitude and phase coupling measures for feature extraction in an EEG-based brain-computer interface. Journal of Neural Engineering, 4, 120–129.
Xu, L., Stoica, P., Li, J., Bressler, S. L., Shao, X., & Ding, M. (2009). ASEO: A method for the simultaneous estimation of single-trial event-related potentials and ongoing brain activities. IEEE Transactions on Biomedical Engineering, 56(1), 111–121.
Received January 28, 2013; accepted May 5, 2013.
  • Article
    Full-text available
    Electrooculogram (EOG) artifact contamination is a common critical issue in general electroencephalogram (EEG) studies as well as in brain computer interface (BCI) research. It is especially challenging when dedicated EOG channels are unavailable or when there are very few EEG channels available for ICA-based ocular artifact removal. It is even more challenging to avoid loss of the signal of interest during the artifact correction process, where the signal of interest can be multiple magnitudes weaker than the artifact. To address these issues, we propose a novel discriminative ocular artifact correction approach for feature learning in EEG analysis.Without extra ocular movement measurements, the artifact is extracted from raw EEG data, which is totally automatic and requires no visual inspection of artifacts. Then, artifact correction is optimized jointly with feature extraction by maximizing oscillatory correlations between trials from the same class and minimizing them between trials from different classes. We evaluate this approach on a real world EEG data set comprising 68 subjects performing cognitive tasks. The results showed that the approach is capable of not only suppressing the artifact components but also improving the discriminative power of a classifier with statistical significance. We also demonstrate that the proposed method addresses the confounding issues induced by ocular movements in cognitive EEG study.
  • Article
    Full-text available
    To detect the mental task of interest, spatial filtering has been widely used to enhance the spatial resolution of electroencephalography (EEG). However, the effectiveness of spatial filtering is undermined due to the significant nonstationarity of EEG. Based on regularization, most of the conventional stationary spatial filter design methods address the nonstationarity at the cost of the interclass discrimination. Moreover, spatial filter optimization is inconsistent with feature extraction when EEG covariance matrices could not be jointly diagonalized due to the regularization. In this paper, we propose a novel framework for a spatial filter design. With Fisher's ratio in feature space directly used as the objective function, the spatial filter optimization is unified with feature extraction. Given its ratio form, the selection of the regularization parameter could be avoided. We evaluate the proposed method on a binary motor imagery data set of 16 subjects, who performed the calibration and test sessions on different days. The experimental results show that the proposed method yields improvement in classification performance for both single broadband and filter bank settings compared with conventional nonunified methods. We also provide a systematic attempt to compare different objective functions in modeling data nonstationarity with simulation studies.
  • Conference Paper
    Full-text available
    The non-stationarity inherent across sessions recorded on different days poses a major challenge for practical electroencephalography (EEG)-based Brain Computer Interface (BCI) systems. To address this issue, the computational model trained using the training data needs to adapt to the data from the test sessions. In this paper, we propose a novel approach to compute the variations between labelled training data and a batch of unlabelled test data based on the geodesic-distance of the discriminative subspaces of EEG data on the Grassmann manifold. Subsequently, spatial filters can be updated and features that are invariant against such variations can be obtained using a subset of training data that is closer to the test data. Experimental results show that the proposed adaptation method yielded improvements in classification performance.
  • Conference Paper
    Full-text available
    To address the nonstationarity issue in EEG-based brain-computer interfaces (BCI), the computational model trained on the training data needs to adapt to data from the test sessions. In this paper, we propose a novel adaptation approach based on the divergence framework. Cross-session changes can be taken into consideration by searching for the discriminative subspaces of the test data on the manifold of orthogonal matrices in a semi-supervised manner. Subsequently, the feature space becomes more consistent across sessions and classifier performance can be enhanced. Experimental results show that the proposed adaptation method yields improvements in classification performance.
  • Article
    Full-text available
    Objective: Session-to-session nonstationarity is inherent in brain-computer interfaces based on electroencephalography. The objective of this paper is to quantify the mismatch between the training model and test data caused by nonstationarity and to adapt the model towards minimizing the mismatch. Approach: We employ a tensor model to estimate the mismatch in a semi-supervised manner, and the estimate is regularized in the discriminative objective function. Main results: The performance of the proposed adaptation method was evaluated on a dataset recorded from 16 subjects performing motor imagery tasks on different days. The classification results validated the advantage of the proposed method in comparison with other regularization-based or spatial filter adaptation approaches. Experimental results also showed that there is a significant correlation between the quantified mismatch and the classification accuracy. Significance: The proposed method approached the nonstationarity issue from the perspective of data-model mismatch, which is more direct than data variation measurement. The results also demonstrated that the proposed method is effective in enhancing the performance of the feature extraction model.
  • Article
    Learning under non-stationarity can be achieved by decomposing the data into a subspace that is stationary and one that is not (stationary subspace analysis, SSA). While SSA has been used in various applications, its robustness and computational efficiency have limits due to the difficulty of optimizing its Kullback-Leibler divergence-based objective. In this paper we extend SSA in two ways: we propose SSA with (a) higher numerical efficiency, by defining analytical SSA variants, and (b) higher robustness, by utilizing the Wasserstein-2 distance (Wasserstein SSA). We show the usefulness of our novel algorithms on toy data, demonstrating their mathematical properties, and on real-world data, (1) allowing better segmentation of time series and (2) in brain-computer interfacing, where the Wasserstein-based measure of non-stationarity is used for spatial filter regularization and gives rise to higher decoding performance.
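    For Gaussian models, the Wasserstein-2 distance used by Wasserstein SSA has a closed form; the sketch below evaluates it for two covariance matrices (and optional means). The SSA optimization itself is not shown, and the function name is illustrative.

```python
import numpy as np
from scipy.linalg import sqrtm

def wasserstein2_gaussian(c1, c2, m1=None, m2=None):
    """Squared Wasserstein-2 distance between N(m1, c1) and N(m2, c2)."""
    d = c1.shape[0]
    m1 = np.zeros(d) if m1 is None else m1
    m2 = np.zeros(d) if m2 is None else m2
    s1 = np.real(sqrtm(c1))                      # symmetric square root of c1
    cross = np.real(sqrtm(s1 @ c2 @ s1))         # (c1^{1/2} c2 c1^{1/2})^{1/2}
    return float(np.sum((m1 - m2) ** 2) + np.trace(c1 + c2 - 2 * cross))
```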
  • Article
    Electroencephalogram (EEG) has been widely used to monitor subjects' and patients' mental states. Using the monitoring results as feedback, neuro-feedback enables patients to learn to regulate their physiological and psychological states, so that improvements in their physical and psychological condition can be achieved. By analyzing the EEG components generated by motor imagery, a mind-controlled game based on motor imagery is developed, covering both the design of the BCI and the design of the video game. In the game, neuro-feedback is realized in a visual manner, through which users can learn to improve their attention span. Based on motor imagery, the EEG signal is classified into two categories, left- and right-hand motor imagery, with a classification accuracy of up to 70%. The bandpower analysis results show that users' attention level improves during the experiment. In this neuro-feedback game system, the EEG signal is used not only for monitoring but also for game control. The game provides an attention-state measurement for users. With the neuro-feedback in the BCI, the user and the game form an interactive closed loop. The proposed BCI video game can be used not only for entertainment and relaxation but also for attention-span training.
  • Conference Paper
    Full-text available
    Subjects' performance in using a brain-computer interface (BCI) system controlled by motor imagery (MI) varies considerably. Poor performance, known as BCI deficiency, can be due to a subject's inability to modulate their sensorimotor rhythms (SMRs). In this work, we investigated the feasibility of improving BCI performance through neurofeedback (NF) training of the resting-state alpha rhythm (8-13 Hz). Thirteen healthy subjects were recruited and randomly assigned to an experimental or a control group. The experimental group participated in an MI-BCI session, followed by 12 NF sessions and a final MI-BCI session. The control group performed an MI-BCI session followed by a final MI-BCI session. The results showed that the performance of the experimental group after 12 sessions of NF improved significantly over the initial MI-BCI performance (p=0.02), whereas that of the control group did not (p=0.14). Moreover, the resting-state alpha of the experimental group improved significantly after 12 sessions of NF (p=0.04). In conclusion, the proposed approach is promising for addressing BCI deficiency.
  • Article
    Controlling a device with a brain-computer interface (BCI) requires extraction of relevant and robust features from high-dimensional electroencephalographic recordings. Spatial filtering is a crucial step in this feature extraction process. This work reviews algorithms for spatial filter computation and introduces a general framework for this task based on divergence maximization. We show that the popular common spatial patterns (CSP) algorithm can be formulated as a divergence maximization problem and computed within our framework. Our approach easily permits enforcing different invariances and utilizing information from other subjects, thus unifying many of the recently proposed CSP variants in a principled manner. Furthermore, it allows the design of novel spatial filtering algorithms by incorporating regularization schemes into the optimization process or applying other divergences. We evaluate the proposed approach using three regularization schemes, investigate the advantages of the beta divergence, and show that subject-independent feature spaces can be extracted by jointly optimizing the divergence problems of multiple users. We discuss the relations to several CSP variants and investigate the advantages and limitations of our approach with simulations. Finally, we provide experimental results on a data set containing recordings from 80 subjects and interpret the obtained patterns from a neurophysiological perspective.
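    A minimal sketch of one divergence objective of the type discussed above: the symmetric Kullback-Leibler divergence between two zero-mean class-conditional Gaussians after projection by a spatial filter matrix W. The regularization terms and the optimization over W are omitted, and the function name is illustrative.

```python
import numpy as np

def sym_kl_projected(W, C1, C2):
    """Symmetric KL divergence between N(0, W C1 W^T) and N(0, W C2 W^T).

    W: (k x channels) spatial filter matrix; C1, C2: class covariance matrices.
    Maximizing this quantity over W (under suitable constraints) is the kind of
    divergence objective the framework above considers.
    """
    S1 = W @ C1 @ W.T
    S2 = W @ C2 @ W.T
    k = W.shape[0]
    kl12 = 0.5 * (np.trace(np.linalg.solve(S2, S1)) - k
                  + np.log(np.linalg.det(S2) / np.linalg.det(S1)))
    kl21 = 0.5 * (np.trace(np.linalg.solve(S1, S2)) - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S2)))
    return kl12 + kl21
```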
  • Article
    Full-text available
    We present a new algorithm for maximum likelihood convolutive ICA (cICA) in which sources are unmixed using stable IIR filters determined implicitly by estimating an FIR filter model of the mixing process. By introducing an FIR model for the sources, we show how the order of the filters in the convolutive model can be correctly detected using Bayesian model selection. We demonstrate a framework for deconvolving an EEG ICA subspace. Initial results suggest that in some cases convolutive mixing may be a more realistic model for EEG signals than the instantaneous ICA model.
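    To make the contrast with instantaneous mixing concrete, the sketch below generates channel signals from a convolutive (FIR) mixing model, x_c[t] = sum over s, l of A[c, s, l] s_s[t - l]. It illustrates the forward model only, not the unmixing algorithm proposed in the paper.

```python
import numpy as np

def convolutive_mix(sources, mixing_filters):
    """Convolutive (FIR) mixing of sources into channels.

    sources: (n_sources x samples) array of source signals.
    mixing_filters: (n_channels x n_sources x L) array of FIR mixing filters.
    Contrasts with the instantaneous model x = A s assumed by standard ICA.
    """
    n_ch, n_src, _ = mixing_filters.shape
    n_samp = sources.shape[1]
    x = np.zeros((n_ch, n_samp))
    for c in range(n_ch):
        for s in range(n_src):
            # each source contributes through its own propagation filter
            x[c] += np.convolve(sources[s], mixing_filters[c, s],
                                mode="full")[:n_samp]
    return x
```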
  • Article
    Full-text available
    A major challenge in electroencephalogram (EEG)-based brain-computer interfaces (BCIs) is the inherent nonstationarity in the EEG data. Variations of the signal properties within and between sessions often lead to deteriorated BCI performance, as features extracted by methods such as common spatial patterns (CSP) are not invariant against such changes. To extract features that are robust and invariant, this paper proposes a novel spatial filtering algorithm called Kullback-Leibler (KL) CSP. The CSP algorithm considers only the discrimination between the class means and ignores within-class scatter information. In contrast, the proposed KLCSP algorithm simultaneously maximizes the discrimination between the class means and minimizes the within-class dissimilarities, measured by a loss function based on the KL divergence. The performance of the proposed KLCSP algorithm is compared against two existing algorithms, CSP and stationary CSP (sCSP), using the publicly available BCI Competition III dataset IVa and a large dataset from stroke patients performing neuro-rehabilitation. The results show that the proposed KLCSP algorithm significantly outperforms both the CSP and sCSP algorithms in terms of classification accuracy by reducing within-class variations, resulting in more compact and separable features.
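    For reference, the baseline CSP computation that KLCSP extends can be written as a generalized eigenvalue problem on the two class-average covariance matrices; the KL-based within-class loss itself is not reproduced in this sketch.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(C1, C2, n_pairs=3):
    """Standard CSP filters from two class-average covariance matrices.

    Solves C1 w = lambda (C1 + C2) w and keeps the filters with the largest
    and smallest eigenvalues, i.e. the most discriminative variance ratios.
    """
    eigvals, eigvecs = eigh(C1, C1 + C2)          # generalized eigendecomposition
    order = np.argsort(eigvals)                   # ascending eigenvalues
    picks = np.concatenate([order[:n_pairs], order[-n_pairs:]])
    return eigvecs[:, picks].T                    # rows are spatial filters
```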
  • Article
    Full-text available
    A major challenge in EEG-based brain-computer interfaces (BCIs) is the intersession nonstationarity in the EEG data, which often leads to deteriorated BCI performance. To address this issue, this letter proposes a novel data space adaptation technique, EEG data space adaptation (EEG-DSA), to linearly transform the EEG data from the target space (evaluation session) such that the distribution difference to the source space (training session) is minimized. Using the Kullback-Leibler (KL) divergence criterion, we propose two versions of the EEG-DSA algorithm: a supervised version, when labeled data are available in the evaluation session, and an unsupervised version, when they are not. The performance of the proposed EEG-DSA algorithm is evaluated on the publicly available BCI Competition IV data set IIa and a data set recorded from 16 subjects performing motor imagery tasks on different days. The results show that the proposed EEG-DSA algorithm, in both the supervised and unsupervised versions, significantly outperforms the results obtained without adaptation in terms of classification accuracy. The results also show that for subjects with poor BCI performance when no adaptation is applied, the proposed EEG-DSA algorithm in both versions significantly outperforms the unsupervised bias adaptation algorithm (PMean).
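    Under a zero-mean Gaussian assumption, driving the KL divergence between the transformed test distribution and the training distribution to zero amounts to matching covariances with a linear map. The sketch below shows one such map purely as an illustration of that idea; it is not the exact supervised or unsupervised estimator derived in the paper.

```python
import numpy as np
from scipy.linalg import sqrtm, inv

def covariance_matching_transform(cov_train, cov_test):
    """One linear map W satisfying W cov_test W^T = cov_train.

    Applying W to zero-mean test data makes its covariance equal to the
    training covariance, which drives the Gaussian KL divergence between the
    two distributions to zero. Illustrative only; see the EEG-DSA paper for
    the adaptation rules actually proposed.
    """
    s_train = np.real(sqrtm(cov_train))
    s_test = np.real(sqrtm(cov_test))
    return s_train @ inv(s_test)
```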
  • Conference Paper
    Full-text available
    Studies have shown that motor imagery-based brain-computer interface (MI-based BCI) systems can be used as a therapeutic tool, for example in stroke rehabilitation, but also that not all subjects can perform MI well. Studies have also shown that MI and passive movement (PM) activate the motor system in a similar way. Although the idea of calibrating an MI-based BCI system from PM data is promising, there is an inherent difference between features extracted from MI and from PM. Therefore, online learning is needed to alleviate this difference and improve performance. Hence, in this study we propose online batch-mode semi-supervised learning with KL distance weighting to update the model trained in the calibration session using unlabeled data from the online test session. The filter bank common spatial pattern (FBCSP) algorithm is used to compute the most discriminative features of the EEG data in the calibration session and is updated iteratively on each band after a batch of online data becomes available for semi-supervised learning. The performance of the proposed method was compared with offline FBCSP, and the results showed that the proposed method yielded slightly better results. The results also showed that using the model trained from PM for online session-to-session transfer yielded slightly better performance than using the calibration model trained from MI. These results suggest that using PM is feasible, owing to its better performance and ease of recording, and that performance can be improved by using the proposed method to perform online semi-supervised learning while subjects perform MI.
  • Article
    Full-text available
    We introduce a probabilistic model that combines a classifier with an extra reinforcement signal (RS) encoding the probability of an erroneous feedback being delivered by the classifier. This representation computes the class probabilities given the task-related features and the reinforcement signal. Using expectation maximization (EM) to estimate the parameter values under this model shows that some existing adaptive classifiers are particular cases of such an EM algorithm. Further, we present a new algorithm for adaptive classification, which we call the constrained-means adaptive classifier, and show, using EEG data and a simulated RS, that this classifier significantly outperforms state-of-the-art adaptive classifiers.
  • Article
    Full-text available
    Event-related desynchronization/synchronization patterns during right/left motor imagery (MI) are effective features for an electroencephalogram-based brain-computer interface (BCI). As MI tasks are subject-specific, selection of subject-specific discriminative frequency components plays a vital role in distinguishing these patterns. This paper proposes a new discriminative filter bank (FB) common spatial pattern algorithm to extract subject-specific FBs for MI classification. The proposed method enhances classification accuracy on BCI Competition III dataset IVa and Competition IV dataset IIb. Compared with the performance of the existing FB-based method, the proposed algorithm offers error rate reductions of 17.42% and 8.9% on the BCI Competition III and IV datasets, respectively.
  • Conference Paper
    Full-text available
    In motor imagery-based brain-computer interfaces (BCI), discriminative patterns can be extracted from the electroencephalogram (EEG) using the common spatial pattern (CSP) algorithm. However, the performance of this spatial filter depends on the operational frequency band of the EEG. Thus, setting a broad frequency range or manually selecting a subject-specific frequency range is commonly done with the CSP algorithm. To address this problem, this paper proposes a novel filter bank common spatial pattern (FBCSP) algorithm to perform autonomous selection of key temporal-spatial discriminative EEG characteristics. After the EEG measurements have been band-pass filtered into multiple frequency bands, CSP features are extracted from each of these bands. A feature selection algorithm is then used to automatically select discriminative pairs of frequency bands and corresponding CSP features. A classification algorithm is subsequently used to classify the CSP features. A study is conducted to assess the performance of a selection of feature selection and classification algorithms for use with FBCSP. Extensive experimental results are presented on a publicly available dataset as well as data collected from healthy subjects and unilaterally paralyzed stroke patients. The results show that FBCSP, using a particular combination of feature selection and classification algorithms, yields relatively higher cross-validation accuracies compared with prevailing approaches.
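    A condensed sketch of the FBCSP feature extraction pipeline described above: band-pass filter the trial into several bands, project each band with its CSP filters, and concatenate normalized log-variance features. The band edges, filter order, and sampling rate are illustrative assumptions, and the mutual-information-based feature selection and classification steps are left as placeholders.

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Illustrative filter bank covering 4-32 Hz in 4 Hz steps
BANDS = [(4, 8), (8, 12), (12, 16), (16, 20), (20, 24), (24, 28), (28, 32)]

def bandpass(trial, lo, hi, fs=250):
    """Zero-phase Butterworth band-pass filter on a (channels x samples) trial."""
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, trial, axis=-1)

def fbcsp_features(trial, csp_per_band, fs=250):
    """Log-variance CSP features concatenated over all bands.

    csp_per_band: list of (k x channels) CSP filter matrices, one per band,
    learned beforehand from band-pass-filtered training trials.
    """
    feats = []
    for (lo, hi), W in zip(BANDS, csp_per_band):
        z = W @ bandpass(trial, lo, hi, fs)       # spatially filtered band signal
        v = np.var(z, axis=-1)
        feats.append(np.log(v / v.sum()))         # normalized log-variance
    return np.concatenate(feats)                  # feature selection + classifier follow
```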
  • Article
    Full-text available
    Multichannel EEG is generally used in brain-computer interfaces (BCIs), whereby performing EEG channel selection 1) improves BCI performance by removing irrelevant or noisy channels and 2) enhances user convenience through the use of fewer channels. This paper proposes a novel sparse common spatial pattern (SCSP) algorithm for EEG channel selection. The proposed SCSP algorithm is formulated as an optimization problem that selects the smallest number of channels subject to a constraint on classification accuracy. As such, the proposed approach can be customized to yield the best classification accuracy by removing the noisy and irrelevant channels, or to retain the smallest number of channels without compromising the classification accuracy obtained by using all the channels. The proposed SCSP algorithm is evaluated using two motor imagery datasets, one with a moderate number of channels and another with a large number of channels. In both datasets, the proposed SCSP channel selection significantly reduced the number of channels and outperformed existing channel selection methods based on the Fisher criterion, mutual information, support vector machine, common spatial pattern, and regularized common spatial pattern in classification accuracy. The proposed SCSP algorithm also yielded an average improvement of 10% in classification accuracy compared with the use of three channels (C3, C4, and Cz).
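    The sparse optimization in SCSP is not reproduced here; as a rough illustration of the channel-reduction idea, the sketch below ranks channels by the aggregate weight that a set of CSP filters assigns to them. This is a simple heuristic, not the constrained optimization proposed in the paper.

```python
import numpy as np

def rank_channels_by_filter_weight(csp_filters, n_keep=8):
    """Rank EEG channels by the aggregate weight CSP assigns to them.

    csp_filters: (k x channels) matrix of spatial filters.
    Returns the indices of the n_keep channels with the largest per-channel
    filter weight; used here only to illustrate channel reduction.
    """
    weight = np.linalg.norm(csp_filters, axis=0)   # per-channel contribution
    keep = np.argsort(weight)[::-1][:n_keep]       # strongest channels first
    return np.sort(keep)
```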