LETTER Communicated by Stefan Haufe
Discriminative Learning of Propagation and Spatial Pattern
for Motor Imagery EEG Analysis
Xinyang Li
a0068297@nus.edu.sg
NUS Graduate School for Integrative Sciences and Engineering,
National University of Singapore 119613
Haihong Zhang
hhzhang@i2r.a-star.edu.sg
Cuntai Guan
ctguan@i2r.a-star.edu.sg
Institute for Infocomm Research, A*STAR, Singapore 138632
Sim Heng Ong
eleongsh@nus.edu.sg
Department of Electrical and Computer Engineering and Department
of Bioengineering, National University of Singapore 119613
Kai Keng Ang
kkang@i2r.a-star.edu.sg
Yaozhang Pan
yzpan@i2r.a-star.edu.sg
Institute for Infocomm Research, A*STAR, Singapore 138632
Effective learning and recovery of relevant source brain activity patterns is a major challenge to brain-computer interface using scalp EEG. Various spatial filtering solutions have been developed. Most current methods estimate an instantaneous demixing with the assumption of uncorrelatedness of the source signals. However, recent evidence in neuroscience suggests that multiple brain regions cooperate, especially during motor imagery, a major modality of brain activity for brain-computer interface. In this sense, methods that assume uncorrelatedness of the sources become inaccurate. Therefore, we are promoting a new methodology that considers both the volume conduction effect and signal propagation between multiple brain regions. Specifically, we propose a novel discriminative algorithm for joint learning of propagation and spatial pattern with an iterative optimization solution. To validate the new methodology, we conduct experiments involving 16 healthy subjects and perform numerical analysis of the proposed algorithm for EEG classification in motor imagery brain-computer interface. Results from extensive analysis validate the effectiveness of the new methodology with high statistical significance.

Neural Computation 25, 2709-2733 (2013). © 2013 Massachusetts Institute of Technology. doi:10.1162/NECO_a_00500
1 Introduction
Scalp EEG signals are stochastic, nonlinear, and nonstationary (Guler, Kiymik, Akin, & Alkan, 2001) and have relatively low spatial resolution. Therefore, it has been a considerable challenge to compute discriminative and robust features for detecting the brain activity of interest, especially in single-trial brain-computer interface (BCI) studies (Li & Guan, 2006; Llera, Gomez, & Kappen, 2012). In this letter, we consider BCI using motor imagery, although the general methodology can be applied to other brain signals. Motor imagery is a dynamic brain state that can induce the same motor representation internally as motor execution (Jeannerod, 1995). In particular, distinctive brain signals of event-related desynchronization (ERD) and event-related synchronization (ERS) are detectable from EEG during motor imagery (Stavrinou, Moraru, Cimponeriu, Stefania, & Bezerianos, 2007; Pfurtscheller, Brunner, Schlogl, & da Silva, 2006). Therefore, motor imagery has become an important modality in developing BCI systems (Lo et al., 2010; Ang, Chin, Zhang, & Guan, 2008; Vidaurre, Sannelli, Muller, & Blankertz, 2011).
To improve the signal-to-noise ratio, spatial filtering has been widely used to counter volume conduction effects (Blankertz, Tomioka, Lemm, Kawanabe, & Muller, 2008). In motor imagery EEG classification, probably the most recognized technique is the common spatial pattern (CSP) (Ramoser, Muller-Gerking, & Pfurtscheller, 2000). In CSP, the desired spatial filters are designed to extract prominent ERD/ERS by maximizing the variance of the projected signal under one condition while minimizing it under the other (Koles, 1991; Muller-Gerking, Pfurtscheller, & Flyvbjerg, 1999). Various methods have been proposed to improve the performance of CSP by addressing the problem of selecting proper time segments or frequency bands of EEG. In Lemm, Blankertz, Curio, and Muller (2005), the common spatio-spectral pattern (CSSP) optimizes a simple filter by adding a one-time-delayed sample to obtain more channels. In Dornhege et al. (2006), the common sparse spectral spatial pattern (CSSSP) extends CSSP by adding the optimization of a complete global spatial-temporal filter into CSP. In Ang, Chin, Wang, Guan, and Zhang (2012), Ang, Chin, Zhang, and Guan (2012), and Thomas, Guan, Lau, Vinod, and Ang (2009), EEG signals are decomposed into several frequency bands, CSP is applied to each band independently, and feature fusion or classifier fusion is introduced to produce the final classification results. These methods either implicitly or explicitly assume that raw scalp EEG waveforms are generated by uncorrelated source signals, and consequently they may not account for more complicated brain signal dynamics such as causal propagation between different brain regions.
Recently, brain activities during motor imagery other than ERD/ERS have been observed in multiple functional areas using functional magnetic resonance imaging (fMRI) or EEG (Formaggio, Storti, Cerini, Fiaschi, & Manganotti, 2010; Chen, Yang, Liao, Gong, & Shen, 2009). In particular, the analysis of neural connectivity is gaining attention in neuroscience because it describes the general functioning of the brain and the communication among its different regions (Astolfi et al., 2006; Ewald, Marzetti, Zappasodi, Meinecke, & Nolte, 2012). For example, causal connectivity is found in motor-related core regions such as the primary motor cortex (M1) and the supplementary motor area (SMA) during motor imagery (Chen et al., 2009). Such causal flow or time-lagged correlation is beyond volume conduction and is caused by possible neuronal propagation (Gomez-Herrero, Atienza, Egiazarian, & Cantero, 2008). To investigate such propagation effects, the directed transfer function (DTF) has been used to evaluate causal flow between any given pair of channels in a multichannel EEG in the frequency domain; it was introduced in Baccala and Sameshima (2001), Kaminski and Blinowska (1991), and Kaminski, Ding, Truccolo, and Bressler (2001). The estimation of DTF is based on a multivariate autoregressive model (MVAR), and, more importantly, it has been applied to EEG data of voluntary finger movement and motor imagery for event-related causal flow investigation (Ginter, Blinowska, Kaminski, & Durka, 2001; Schlogl & Supp, 2006). Kus, Kaminski, and Blinowska (2004) found a rapid increase of information outflow from electrodes Fc3 and C3 caused by ERS, as well as propagation of β-synchronization from Fc3 and Fc1 to C3, C1, Cz, Cp3, and Cp1, which gives evidence of communication among sensorimotor areas. However, looking only at the time profiles of ERD/ERS, it is difficult to determine the primary source of activity; hence, existing instantaneous demixing models are not capable of modeling signal propagation among underlying ERD/ERS sources.
In the presence of neuronal propagation and causal relationships during motor imagery, the conventional spatial filter design methodology is not sufficient to capture the underlying brain activities (Dyrholm, Makeig, & Hansen, 2007; Bahramisharif, van Gerven, Schoffelen, Ghahramani, & Heskes, 2012). We would like to note that although some of the connectivity measurements mentioned above have already been explored (Wei, Wang, Gao, & Gao, 2007; Gysels & Celka, 2007), only scalp connectivity and intrachannel synchronization measurements are used directly as features, whereas volume conduction effects are not rigorously addressed. One consequence would be that band-power variations are misinterpreted as changes in connectivity (Grosse-Wentrup, 2009).

Therefore, rather than ignoring the connectivity or propagation between sources in spatial filter design, or using scalp connectivity directly as features, we would like to promote a computational model that can more accurately describe the underlying processes by considering both neuronal propagation and volume conduction effects.
In this work, we devise a novel discriminative learning model for motor imagery EEG based on a multivariate convolutive process, with an analysis of the spurious effects in classifying ERD/ERS under an instant linear mixture model. The effectiveness of introducing a time-lagged demixing matrix to produce time-decorrelated data is analyzed theoretically from the perspective of background noise elimination. Furthermore, the demixing matrices accounting for propagation and volume conduction are estimated jointly and iteratively in the proposed unified model. In the experimental study, we evaluate the efficiency of the new methodology in terms of classification accuracy in the two-class motor imagery EEG classification problem. We also analyze the effectiveness of the proposed method for background noise elimination using the Kullback-Leibler divergence measure.
This letter is organized as follows. In section 2, we discuss the limitations of conventional spatial filter design and the necessity of considering causal propagation. Then we give the details of the proposed discriminative learning of propagation and spatial pattern. In section 3, the validity of the proposed method is verified by experimental studies on two-class motor imagery classification. Our concluding remarks are in section 4.
2 Discriminative Learning of Propagation and Spatial Pattern
2.1 Data Model and Problem Formulation. Let X(t) be the time series of a multichannel EEG signal, with each component in X(t) representing a particular EEG channel measured at time t. Considering the complex temporal dynamics, especially the latent causal relations in X(t), we describe the observed data X(t) as an m-dimensional linear convolutive mixture process of order l (Dyrholm et al., 2007; Mørup, Madsen, & Hansen, 2009),

$$X(t) = \sum_{\tau=0}^{l} \Phi(\tau)\, S(t-\tau), \tag{2.1}$$

where S(t) is the source signal of interest, Φ(τ) is the projection matrix of order τ, and l is the maximum time-lagged order. When l = 0, the observed data X(t) is an instant mixing process. For simplicity of description, the additive EEG noise can be described by a component in S(t). Conventionally, it is assumed in motor imagery EEG classification that X(t) is an instant linear mixture of source signals. This leads to an instant demixing solution to the estimation of S(t),

$$\hat{S}(t) = W X(t), \tag{2.2}$$

where W is the projection or demixing matrix containing m rows, and each row of W is effectively a spatial filter w.
Interestingly, we note that the estimate Ŝ(t) given by equation 2.2 is also a mixture of the time-lagged components,

$$\hat{S}(t) = \sum_{\tau} \Phi_w(\tau)\, S(t-\tau), \tag{2.3}$$

where Φ_w(τ) = WΦ(τ) is a mixing matrix.

A perfect solution would be that Φ_w(τ) takes an identity matrix form for τ = 0 and a zero matrix form for any τ ≠ 0. This is generally impossible except in the exceptional case that Φ(τ) = 0 for τ ≠ 0, or, in other words, when the convolutive mixture model in equation 2.1 reduces to an instant mixing model.

Remark 1. In discriminative analysis, the spatial filter W is designed to extract the most discriminative signal Ŝ(t). However, due to the time-lagged relationships, discriminative signals are still mixed with nondiscriminative ones in Ŝ(t). Therefore, it is necessary to take the causal flow into consideration, together with spatial filter design in a unified model, to obtain a better estimate of S(t); this is the motivation of this letter.
Solving the reconstruction problem of S(t) from equation 2.1 may lead to a solution in the form of an infinite impulse response (IIR) filter. As we will elaborate shortly, and also for practical use, we simplify the problem into a finite impulse response (FIR) filter given by

$$S(t) = W\left(X(t) - \sum_{\tau=1}^{p} A(\tau)\, X(t-\tau)\right), \tag{2.4}$$

where A(τ) is the demixing matrix of order τ that accounts for the time-lagged propagation effect.

Remark 2. The simplification of the IIR form into the FIR form is for the convenience of practical implementation. In practice, the mixing effect can be accounted for by a finite number of orders, while the rest can be ignored. Although not rigorously proven, the feasibility of this simplification in the discriminative problem will be discussed and validated by the experimental results in section 3.

For the convenience of presentation and analysis, we divide the reconstruction problem of S(t) into two parts. First, we define

$$\tilde{X}(t) = X(t) - \sum_{\tau=1}^{p} A(\tau)\, X(t-\tau), \tag{2.5}$$
where X̃(t) is the signal processed by a finite multivariate FIR filter of order p. We refer to it as the time-decorrelated data in the following discussion. The source signal can then be recovered from the time-decorrelated data X̃(t) by

$$S(t) = W\,\tilde{X}(t). \tag{2.6}$$

It is interesting that reconstructing S(t) based on equations 2.5 and 2.6 resembles the classical causal connectivity estimation based on MVAR analysis (Dyrholm et al., 2007; Gomez-Herrero et al., 2008; Haufe, Tomioka, Nolte, Muller, & Kawanabe, 2010), where the process S(t) is usually defined as a temporally and spatially uncorrelated time sequence. Different from connectivity identification, however, the objective in this letter lies in discriminative learning. Therefore, rather than modeling the signals, the demixing matrix A(τ) is used to construct the ERD/ERS sources from the measurements. Moreover, S(t) does not correspond to the innovation process but to the ERD/ERS sources, as we explain in detail in the appendix. The objective in estimating A(τ) is the variance difference between the two classes rather than the independence of the sources, so that the discriminative power of S(t) is maintained. Based on the convolutive model, possible propagation effects can be addressed in the discriminative model. Details of the joint estimation of A(τ) and W in equations 2.5 and 2.6 for the objective of classification are introduced in the following section.
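Equations 2.5 and 2.6 amount to a two-step reconstruction: a multivariate FIR filter that removes the time-lagged mixture, followed by a spatial projection. A minimal sketch of this pipeline follows; the function names, the channels-by-samples array layout, and the handling of the first p samples (which simply keep fewer subtracted terms) are our own assumptions, not part of the letter.

```python
import numpy as np

def time_decorrelate(X, A):
    """Equation 2.5: X_tilde(t) = X(t) - sum_{tau=1}^{p} A(tau) X(t - tau).

    X : (channels, samples) array for one trial.
    A : list [A(1), ..., A(p)] of (channels, channels) demixing matrices.
    """
    m, T = X.shape
    X_tilde = X.astype(float).copy()
    for tau, A_tau in enumerate(A, start=1):
        # subtract the propagated contribution of the signal tau samples ago
        X_tilde[:, tau:] -= A_tau @ X[:, :T - tau]
    return X_tilde

def recover_sources(X, A, W):
    """Equation 2.6: S(t) = W X_tilde(t)."""
    return W @ time_decorrelate(X, A)
```

With all A(τ) set to zero matrices, the pipeline degenerates to the instant demixing of equation 2.2, which is the conventional spatial filtering special case.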
2.2 Joint Estimation of Propagation and Spatial Pattern. We introduce the principle of CSP in the design of the joint estimation of propagation and spatial pattern. As CSP can be viewed as a spatial transformation, the principle lies in maximizing the power of the transformed signal for one class while minimizing it for the other. The normalized sample covariance matrix R_i of trial i is obtained as

$$R_i = \frac{X_i X_i^T}{\operatorname{tr}(X_i X_i^T)}, \tag{2.7}$$

where tr(·) is the trace of a matrix. In this letter, we consider only the binary classification problem, and the two classes are indexed by c = {0, 1}. Let Q_c denote the set of trials that belong to class c, such that Q_0 ∩ Q_1 = ∅. The average covariance matrix for each class is then calculated as

$$R^{(c)} = \frac{1}{|Q_c|} \sum_{i \in Q_c} R_i, \tag{2.8}$$

where |Q_c| denotes the total number of trials belonging to set Q_c. Suppose the signal power is to be maximized for class 0; the objective function in CSP is given by

$$\max_{w}\; w R^{(0)} w^T \quad \text{s.t.} \quad w \left(R^{(0)} + R^{(1)}\right) w^T = 1. \tag{2.9}$$
Note that the dependence of the EEG signals on time (in equation 2.8 and onward) is implied unless otherwise stated. The idea of discriminating the EEG signals of two different motor imagery classes in terms of power (the variance of the projected signal) in equation 2.9 is directly related to the nature of ERD/ERS. Therefore, we deal with the estimation of S(t) in the proposed model by adopting variance differentiation as the objective.
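Under its normalization constraint, equation 2.9 is solved by a generalized eigenvalue decomposition, the same route that equation 2.15 takes later. A minimal sketch, using SciPy's generalized symmetric eigensolver (the function name and return convention are our own):

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(R0, R1):
    """Solve max_w w R0 w^T  s.t.  w (R0 + R1) w^T = 1 (equation 2.9).

    R0, R1 : class-average covariance matrices (symmetric positive definite).
    Returns (eigenvalues, W) with spatial filters as rows of W, sorted so
    that the first rows maximize class-0 power relative to the total power.
    """
    # Generalized problem R0 v = lambda (R0 + R1) v; eigh sorts ascending
    vals, vecs = eigh(R0, R0 + R1)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order].T
```

The eigenvalue attached to each filter is the fraction of total power captured for class 0, so values near 1 or near 0 mark the most discriminative filters.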
To embed the estimation of A(τ) in equation 2.4 into the objective function, equation 2.9, we rewrite equation 2.5 to make the relationship between the raw EEG data X and the time-decorrelated data X̃ more compact by defining

$$\hat{A}(\tau) = \begin{cases} I, & \tau = 0, \\ -A(\tau), & \tau > 0, \end{cases} \tag{2.10}$$

which we refer to as the time-lagged demixing matrix for simplicity. Therefore, X̃(t) in equation 2.5 becomes

$$\tilde{X}(t) = \sum_{\tau=0}^{p} \hat{A}(\tau)\, X(t-\tau). \tag{2.11}$$
Similarly, the covariance matrix of X̃(t) is

$$\tilde{R}_i = \frac{\tilde{X}_i \tilde{X}_i^T}{\operatorname{tr}(\tilde{X}_i \tilde{X}_i^T)}, \tag{2.12}$$

and the average covariance based on X̃(t) for each class is

$$\tilde{R}^{(c)} = \frac{1}{|Q_c|} \sum_{i \in Q_c} \tilde{R}_i. \tag{2.13}$$
Replacing R^{(c)} in equation 2.9 with R̃^{(c)} and considering equations 2.11 and 2.12, the optimization problem becomes

$$\max_{w,\,\hat{A}(\tau)}\; w \left( \sum_{\tau_1=0}^{p} \sum_{\tau_2=0}^{p} \hat{A}(\tau_1)\, R^{(0)}(\tau_1,\tau_2)\, \hat{A}(\tau_2)^T \right) w^T,$$
$$\text{s.t.} \quad w \left( \sum_{\tau_1=0}^{p} \sum_{\tau_2=0}^{p} \hat{A}(\tau_1) \left( R^{(0)}(\tau_1,\tau_2) + R^{(1)}(\tau_1,\tau_2) \right) \hat{A}(\tau_2)^T \right) w^T = 1, \tag{2.14}$$

where $R^{(c)}(\tau_1,\tau_2) = \frac{1}{|Q_c|} \sum_{i \in Q_c} X_i(t-\tau_1)\,(X_i(t-\tau_2))^T$
. In this way, the estimation of model 2.4 is achieved by solving the optimization problem in equation 2.14. Moreover, as shown in equation 2.14, only one Â(τ), as a part of the feature extraction model, is obtained on completion of the optimization, since the calculation is conducted with the covariance matrices R^{(c)}(τ_1, τ_2) averaged over all trials. This is very different from the regression model in connectivity analysis, in which the estimated models differ from trial to trial.
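The coefficients R^{(c)}(τ_1, τ_2) entering equation 2.14 are trial-averaged, time-lagged cross-covariances. One way they could be assembled is sketched below; restricting all lags to a common sample range t = p, ..., T-1 and the array layout are our own implementation choices.

```python
import numpy as np

def lagged_class_covariances(trials, p):
    """Trial-averaged lagged cross-covariances used in equation 2.14.

    trials : list of (channels, samples) arrays, all from one class.
    Returns R with R[tau1, tau2] = mean_i X_i(t - tau1) X_i(t - tau2)^T,
    accumulated over the overlapping sample range t = p, ..., T - 1.
    """
    m = trials[0].shape[0]
    R = np.zeros((p + 1, p + 1, m, m))
    for X in trials:
        T = X.shape[1]
        for t1 in range(p + 1):
            for t2 in range(p + 1):
                # shift each signal by its lag while keeping a common support
                A = X[:, p - t1: T - t1]
                B = X[:, p - t2: T - t2]
                R[t1, t2] += A @ B.T
    return R / len(trials)
```

Note the symmetry R(τ_1, τ_2) = R(τ_2, τ_1)^T, which halves the work in a careful implementation.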
Because the above objective function can be highly nonlinear, we adopt an iterative procedure to estimate w and Â(τ). Since the estimates of the spatial filter w and the time-lagged demixing matrix Â(τ) depend on each other, the iterative method alternately updates one while fixing the other. To be specific, the spatial filter w can be obtained based on a fixed Â(τ) by solving equation 2.9. For Â(τ), we calculate the jth column of Â(τ), [â_{1j}, â_{2j}, ..., â_{Cj}]^T, separately, based on the fixed spatial filter and the columns [â_{1k}, â_{2k}, ..., â_{Ck}]^T (k = 1, ..., C and k ≠ j) from the last iteration. In this way, the information flow from different channels is optimized individually, and the update of Â(τ) finishes on completion of the estimation of [â_{1j}, â_{2j}, ..., â_{Cj}]^T for j = 1, ..., C. The implementation of the proposed discriminative learning algorithm of propagation and spatial patterns is summarized in algorithm 1. The loop does not stop until the convergence criteria are met. Note that during the optimization, only one spatial filter w is used.
On completion of the optimization, X̃ can be obtained from equation 2.11, and subsequently R̃^{(c)} can be obtained based on equations 2.12 and 2.13. With R^{(c)} substituted by R̃^{(c)}, the optimization problem in equation 2.9 is equivalent to solving the eigenvalue decomposition problem

$$W \tilde{R}^{(0)} = \Lambda\, W \tilde{R}^{(1)}, \tag{2.15}$$
where Λ is the diagonal matrix containing the eigenvalues of (R̃^{(1)})^{-1} R̃^{(0)}. With the projection matrix W, we select r pairs of spatial filters corresponding to the r largest and r smallest components in Λ, as in the usual CSP procedure. The feature F_i for trial i is then obtained from X̃_i as

$$F_i = \log \frac{w_j \tilde{X}_i \tilde{X}_i^T w_j^T}{\sum_{j} w_j \tilde{X}_i \tilde{X}_i^T w_j^T}, \quad j = 1, \ldots, r,\; N-r+1, \ldots, N. \tag{2.16}$$
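The log-variance feature computation of equation 2.16 can be sketched as follows; the function name and the channels-by-samples layout are our own assumptions, and the filter selection follows the r-largest/r-smallest convention of the text.

```python
import numpy as np

def csp_features(X_tilde, W, r):
    """Log-variance features of equation 2.16 for one time-decorrelated trial.

    X_tilde : (channels, samples) array (output of the FIR step, eq. 2.11).
    W       : (N, channels) matrix of spatial filters (rows), sorted as in
              equation 2.15.
    Selects the first r and last r filters, then takes the log of each
    filtered signal's share of the total projected variance.
    """
    N = W.shape[0]
    idx = list(range(r)) + list(range(N - r, N))
    Y = W[idx] @ X_tilde               # projected signals, one row per filter
    var = np.sum(Y * Y, axis=1)        # w_j X X^T w_j^T for each selected j
    return np.log(var / var.sum())
```

Because the variances are normalized before the logarithm, the exponentials of the 2r feature values always sum to one.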
2.3 Background Noise Separation. In this section, we investigate the effectiveness of introducing the time-lagged demixing matrix Â(τ) into the estimation of the ERD/ERS source, combined with spatial filter design. To further analyze and evaluate the proposed model, we examine the difference between the time-decorrelated EEG signal X̃(t) (see equation 2.5) and the original EEG data X(t). Suppose X(t) is described by the following MVAR model,

$$X(t) = \sum_{\tau=1}^{q} B(\tau)\, X(t-\tau) + N(t), \tag{2.17}$$
Algorithm 1: Discriminative Learning of Propagation and Spatial Pattern

Input: Training EEG data comprising N sample blocks of X, each block having a specific class label.
Output: Spatial filter w and time-lagged correlation estimates Â(τ).

begin
  Set the initial parameters of the spatiotemporal filters Â(τ) to zero matrices;
  for k = 1 : n_k do
    Compute X̃ based on Â(τ) using equation 2.11;
    Compute w by solving the optimization problem in equation 2.9;  % update the spatial filter w
    for j = 1 : C do
      Compute [â_{1j}, â_{2j}, ..., â_{Cj}]^T based on the updated spatial filter w by solving the optimization problem in equation 2.14;  % update Â(τ)
    Compute the change in the norm of Â(τ): δ = ‖Â(τ)_k − Â(τ)_{k−1}‖;
    if δ < ζ (ζ is a small preset constant) then
      Stop.
where N(t) is the prediction error. It is also regarded as the innovation process because it is spontaneous and cannot be totally predicted from past observations (Gomez-Herrero et al., 2008). Note that B(τ) is the mixing matrix based on the regression model, which is different from the A(τ) estimated in the proposed model for discriminative purposes, and q is the order of the MVAR model. Similarly, equation 2.17 is rearranged in the following form to make the input-output relationship more compact,

$$N(t) = \sum_{\tau=0}^{q} \hat{B}(\tau)\, X(t-\tau), \tag{2.18}$$

where

$$\hat{B}(\tau) = \begin{cases} -I, & \tau = 0, \\ B(\tau), & \tau > 0. \end{cases} \tag{2.19}$$
Transforming equation 2.18 into the frequency domain yields

$$N(f) = \bar{B}(f)\, X(f), \tag{2.20}$$

$$\bar{B}(f) = \sum_{\tau=0}^{q} \hat{B}(\tau)\, e^{-i 2\pi f \tau}, \tag{2.21}$$

where f is the frequency. Therefore, the transfer function of the system, H(f), can be described by

$$H(f) = \bar{B}^{-1}(f), \tag{2.22}$$

such that X(f) = H(f)N(f).
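For a fitted MVAR model, the transfer function of equation 2.22 can be evaluated on a frequency grid as sketched below. We use the common convention H(f) = (I - Σ_τ B(τ) e^{-i2πfτ})^{-1}, which agrees with equations 2.19 to 2.22 up to the overall sign of B̂(0); the function name and the normalized-frequency parameterization are our own.

```python
import numpy as np

def mvar_transfer(Bs, freqs):
    """Transfer function H(f) of an MVAR model (equations 2.20-2.22).

    Bs    : list of (m, m) coefficient matrices B(1), ..., B(q).
    freqs : normalized frequencies in cycles/sample (0 <= f < 0.5).
    """
    m = Bs[0].shape[0]
    H = []
    for f in freqs:
        # B-bar(f) with the tau = 0 term contributing the identity
        Bf = np.eye(m, dtype=complex)
        for tau, B in enumerate(Bs, start=1):
            Bf -= B * np.exp(-2j * np.pi * f * tau)
        H.append(np.linalg.inv(Bf))
    return np.array(H)
```

As a sanity check, a stable scalar AR(1) model x(t) = 0.5 x(t-1) + n(t) gives H(0) = 1/(1 - 0.5) = 2, so low frequencies are amplified, as expected.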
By substituting equation 2.17 into 2.5 and following the steps from equations 2.20 to 2.22, we obtain

$$\tilde{X}(f) = (I - \bar{A}(f))\, X(f) \tag{2.23}$$
$$= \left( H(f) - \frac{\bar{A}(f)}{\bar{B}(f)} \right) N(f), \tag{2.24}$$

where

$$\bar{A}(f) = \sum_{\tau=1}^{p} A(\tau)\, e^{-i 2\pi f \tau}. \tag{2.25}$$

Let H̃(f) = H(f) − Ā(f)/B̄(f), which is the transfer function from N(f) to X̃. Since the causal flow measurement DTF is defined based on the transfer function (Kaminski et al., 2001), we see that the proposed method changes the information flow by changing the transfer function from H(f) to H̃(f). Moreover, a comparison of the transfer functions of X̃ and X in equation 2.23 shows its similarity to the classical signal-plus-noise (SPN) model. In
particular, in Xu et al. (2009), the observed EEG data containing an ERP, X_E(f), is usually formulated as

$$X_E(f) = S_E(f) + Z(f), \tag{2.26}$$

where S_E(f) is the ERP of interest and Z(f) is the background noise or the ongoing activity.
Remark 3. As Xu et al. (2009) discussed, the background noise is not noise despite its noise-like appearance but represents ongoing brain activity rich in oscillatory content. In light of the above discussion, we can interpret equation 2.23 from a similar perspective. As indicated in equation 2.23, the frequency component removed from X is an oscillatory signal with transfer function Ā(f)/B̄(f), and it can be regarded as an estimate of the ongoing activity. In other words, this ongoing activity constitutes the part of the MVAR process of X with portion Ā(f)/B̄(f). In this way, the ERD/ERS components are enhanced in the proposed model while the oscillatory background noise is attenuated.
The Kullback-Leibler (KL) divergence is a measure of the divergence between two probability distributions, and it has been used to evaluate nonstationarity in motor imagery EEG classification problems (Arvaneh, Guan, Ang, & Quek, 2013a, 2013b; Bamdadian, Guan, Ang, & Xu, 2012). Therefore, to verify that the component removed from X is background noise, we adopt the KL divergence as the criterion.
As the gaussian model is usually used to model EEG data, we consider the KL divergence between two gaussian distributions. In particular, the KL divergence between two gaussian distributions N_0 and N_1, with means μ_0 and μ_1 and nonsingular covariance matrices Σ_0 and Σ_1, is

$$D_{KL}(\mathcal{N}_0 \,\|\, \mathcal{N}_1) = \frac{1}{2}\left( \operatorname{tr}\!\left(\Sigma_1^{-1}\Sigma_0\right) + (\mu_1-\mu_0)^T \Sigma_1^{-1} (\mu_1-\mu_0) - \ln\frac{\det \Sigma_0}{\det \Sigma_1} - k \right), \tag{2.27}$$

where k is the dimension of the distributions.
It is reasonable to assume that improved separation of the background noise will result in more stationary data with fewer within-class dissimilarities. We therefore adopt the KL divergence to measure such within-class dissimilarities. The smaller the KL divergences within trials from the same class, the less the variation of the data, which generally relates to better classification results. Since EEG data are usually processed to be centered, and the dimension k of the distribution is the number of channels m, for every trial i in class c we use D_{KL}(N(0, R_i) ‖ N(0, R^{(c)})) to measure the dissimilarity of
the distribution of this trial from the mean distribution of class c as

$$D_{KL}\!\left(\mathcal{N}(0, R_i) \,\|\, \mathcal{N}(0, R^{(c)})\right) = \frac{1}{2}\left( \operatorname{tr}\!\left((R^{(c)})^{-1} R_i\right) - \ln\frac{\det R_i}{\det R^{(c)}} - m \right), \tag{2.28}$$
and subsequently we obtain an average probability divergence D for the EEG data X as

$$D = \sum_{c=0,1} \frac{1}{|Q_c|} \sum_{i \in Q_c} D_{KL}\!\left(\mathcal{N}(0, R_i) \,\|\, \mathcal{N}(0, R^{(c)})\right). \tag{2.29}$$

Similarly, we obtain D̃ based on X̃ as

$$\tilde{D} = \sum_{c=0,1} \frac{1}{|Q_c|} \sum_{i \in Q_c} D_{KL}\!\left(\mathcal{N}(0, \tilde{R}_i) \,\|\, \mathcal{N}(0, \tilde{R}^{(c)})\right). \tag{2.30}$$

In this way, by comparing D and D̃, we can evaluate the quality of X and X̃ in terms of within-class dissimilarities.
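Equations 2.28 to 2.30 reduce to the zero-mean gaussian KL divergence plus within-class averaging. A minimal sketch (function names and data layout are our own; `slogdet` is used for numerical stability of the log-determinant):

```python
import numpy as np

def kl_zero_mean(Ri, Rc):
    """Equation 2.28: KL divergence between N(0, Ri) and N(0, Rc)."""
    m = Ri.shape[0]
    _, logdet_i = np.linalg.slogdet(Ri)
    _, logdet_c = np.linalg.slogdet(Rc)
    return 0.5 * (np.trace(np.linalg.inv(Rc) @ Ri)
                  - (logdet_i - logdet_c) - m)

def average_divergence(covs_by_class):
    """Equations 2.29/2.30: average within-class divergence D.

    covs_by_class : list with one entry per class, each a list of trial
                    covariance matrices R_i.
    """
    total = 0.0
    for covs in covs_by_class:
        Rc = np.mean(covs, axis=0)          # class-average covariance
        total += np.mean([kl_zero_mean(Ri, Rc) for Ri in covs])
    return total
```

Applying `average_divergence` to the raw covariances R_i gives D, and applying it to the time-decorrelated covariances R̃_i gives D̃; a drop from D to D̃ indicates more stationary within-class data.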
Remark 4. It is worth noting that the proposed method addresses the more complicated dynamics of motor imagery EEG but does not hinge on any particular explanation of how ERD/ERS is generated. On the one hand, propagation effects that contribute to the generation of ERD/ERS may exist. On the other hand, discriminative sources could correlate with noise in a convolutive way. Blind source separation or connectivity estimation methodology, as discussed before, may not be effective for classification problems because it is difficult to differentiate between these two kinds of propagation effects. The proposed model, which is formulated in the phenomenological form of equation 2.23, takes both cases into consideration.
3 Experimental Results and Discussion
3.1 Data Description and Processing. Sixteen subjects participated in the study with informed consent. Ethics approval was obtained beforehand from the Institutional Review Board of the National University of Singapore. EEG from the full 27 channels was obtained using Nuamps EEG acquisition hardware with unipolar Ag/AgCl electrodes. The sampling rate was 250 Hz, with a resolution of 22 bits for a voltage range of ±130 mV. A bandpass filter of 0.05 to 40 Hz was set in the acquisition hardware.
In the experiment, the training and test sessions were recorded on different days, with the subjects performing motor imagery. During the EEG recording process, the subjects were asked to avoid physical movement and eye blinking. In addition, they were instructed to perform kinesthetic motor imagery of the chosen hand in two runs. During the rest state, they did mental counting to make the resting EEG signal more consistent. Each run lasted approximately 16 minutes and consisted of 40 trials of motor imagery and 40 trials of the rest state. Each training session consisted of two runs, while the test session consisted of two or three runs.

We select the time segment from 0.5 s to 2.5 s after the cue (Arvaneh, Guan, Ang, & Quek, 2011). The raw data are prefiltered by an 8 Hz to 35 Hz bandpass filter that covers the rhythms related to motor imagery. The filtered training data are used to train the feature extraction model based on the proposed method, as described in section 2.2. The number of spatial filter pairs in W is chosen as 2 or 3 (r = 2, 3 in equation 2.16). Finally, the extracted training features are used to train a support vector machine (SVM) classifier.
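The preprocessing just described (8-35 Hz band-pass, 0.5-2.5 s post-cue window at 250 Hz) can be sketched as follows. The Butterworth design of order 4 and the zero-phase filtering via `filtfilt` are our own assumptions, since the letter does not specify the filter type.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess_trial(x, fs=250.0, band=(8.0, 35.0), window=(0.5, 2.5)):
    """Band-pass filter one trial and cut the post-cue segment.

    x : (channels, samples) array with t = 0 at the cue.
    The 8-35 Hz band and the 0.5-2.5 s window follow section 3.1; the
    filter order is an assumption.
    """
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    y = filtfilt(b, a, x, axis=1)              # zero-phase band-pass
    i0, i1 = int(window[0] * fs), int(window[1] * fs)
    return y[:, i0:i1]
```

At 250 Hz, the 0.5-2.5 s window yields 500 samples per channel, which is the trial length fed to the feature extraction model.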
3.2 Investigation of the Order of the Time-Lagged Demixing Matrix. To determine the order p of Â(τ) in equation 2.11, we fit the MVAR model to the EEG data as in equation 2.17. Although the orders p and q have different meanings, the analysis of the order q of the mixing matrix B(τ) in equation 2.17 indicates at which time-lagged level the propagation effects are stronger. Based on equation 2.20 and the analysis given in section 2.3, since Â(τ) corresponds to certain components of B(τ) in the frequency domain, it is reasonable to choose the order p of Â(τ) in accordance with q, the order of B(τ). Therefore, the analysis of the mixing matrix B(τ) can be used to initialize the order p of Â(τ) in the proposed model. The Schwarz Bayesian criterion is used to automatically select the model order that best matches the data (Schneider & Neumaier, 2001). We found that for every subject, order 5 for q is selected for most of the trials, and order 4 or 6 is selected for the remaining trials. Therefore, we restrict the investigation to orders 4, 5, and 6.
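The q-order MVAR fit behind this order analysis (equation 2.17) can be reproduced with ordinary least squares. The sketch below is a simplified stand-in for the Schneider-Neumaier estimator cited above; the function name and regressor layout are our own.

```python
import numpy as np

def fit_mvar(X, q):
    """Least-squares fit of X(t) = sum_tau B(tau) X(t - tau) + N(t) (eq. 2.17).

    X : (channels, samples) array.  Returns [B(1), ..., B(q)], each (m, m).
    """
    m, T = X.shape
    # Stacked lagged regressors: Z[:, t] = [X(t-1); ...; X(t-q)]
    Z = np.vstack([X[:, q - tau: T - tau] for tau in range(1, q + 1)])
    Y = X[:, q:]
    B_all, *_ = np.linalg.lstsq(Z.T, Y.T, rcond=None)
    B_all = B_all.T                            # shape (m, m * q)
    return [B_all[:, (tau - 1) * m: tau * m] for tau in range(1, q + 1)]
```

The per-lag norms plotted in Figure 1 are then simply `np.linalg.norm(B)` for each returned coefficient matrix.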
Figure 1 illustrates the result for one subject in the data set introduced in section 3.1. The y-axis indicates the norm of the mixing matrix B(τ) in equation 2.17 at different orders, and the x-axis indicates the order τ. The coefficient matrices are obtained under MVAR models with q equal to 4, 5, or 6 and averaged over the training set and test set, respectively, resulting in the six lines in Figure 1. We see that in all six cases, the norms of the coefficient matrices at orders 2 and 3 are the highest, which means that the data at time t are most influenced by the data at times t − 2 and t − 3. Therefore, the order p of Â(τ) should include these two time lags so that the proposed discriminative learning model addresses the most influential propagation effects. Accordingly, we focus on investigating the feasibility of the proposed model with orders 4 and below.
Figure 1: Norms of the coefficient matrices under the MVAR model. The x-axis represents the order τ, and the y-axis represents the norm of B(τ). Three MVAR models with order q from 4 to 6 are used to fit the EEG data of the training and test sets separately, yielding six lines. The peak points of the six lines correspond to either τ = 2 or τ = 3.
3.3 Classification Results. Tables 1 and 2 summarize the performance of the proposed feature extraction method, compared with CSP as the baseline. In these two tables, we refer to the proposed method as discriminative propagation and spatial pattern analysis (DPSP). Tables 1 and 2 correspond to r = 2 and r = 3, respectively, and both tables include results of DPSP with p = 1, 2, ..., 4.

According to the results, the proposed feature extraction method improves the performance of the classifier, and the improvements are significant when the order of Â(τ) in DPSP is 2 or 3, regardless of the value of r, which is in agreement with the previous analysis based on the MVAR model. Specifically, when r = 2, the average classification accuracy for order 2 is 68.30% and that for order 3 is 67.91%, both of which are higher than that of CSP (65.56%). The paired t-test confirms the significance of the improvement at the 5% level, with p-values of 0.008 and 0.040 for the cases of p = 2 and p = 3, respectively. Similar to the results based on two pairs of spatial filters, when r = 3 the average classification accuracy of DPSP is 68.98% for p = 2 and 68.75% for p = 3, higher than that of CSP (66.48%). Again, the significance of the improvement is confirmed by the t-test, with p-values of 0.027 and 0.022 corresponding to the cases of p = 2
Table 1: Session-to-Session Transfer Test Results for r = 2 (%).

                          DPSP
Subject    CSP      p = 1    p = 2    p = 3    p = 4
1          65.00    65.41    62.91    66.66    67.08
2          51.25    51.25    54.17    52.08    52.08
3          55.00    55.00    57.50    55.83    55.00
4          66.67    66.67    70.41    71.25    77.08
5          54.58    54.16    67.08    70.41    58.33
6          67.08    67.50    72.50    69.16    69.58
7          77.08    77.08    77.92    76.66    72.50
8          94.16    94.16    92.50    96.25    95.41
9          74.58    75.00    75.83    75.83    74.58
10         61.66    61.25    60.41    60.83    60.00
11         46.25    46.67    49.16    53.33    47.08
12         77.00    77.08    81.25    79.58    73.33
13         51.25    51.25    54.58    51.25    50.00
14         72.08    72.08    79.16    73.75    74.58
15         65.83    65.58    67.50    64.16    64.58
16         69.58    69.60    70.00    68.75    65.00
Mean       65.56    65.59    68.30    67.91    66.01
SD         12.26    12.28    11.57    11.79    12.35
p-value    –        0.64     0.008*   0.040*   0.63

*p ≤ 0.05.
and p = 3, respectively. The accuracy for order 4 is 66.01% when r = 2 and 66.41% when r = 3, neither of which is a significant improvement. Interestingly, the accuracy for order 1 is almost the same as that of CSP in both tables, which also confirms our previous analysis: it is necessary and sufficient for Â(τ) to cover the major components of B̂(τ). The propagation effect is strongest at orders 2 and 3, whereas the optimization based on Â(τ) of order 1 has very limited effect and yields almost the same result as CSP. The optimization based on Â(τ) of order 4 accounts for most of the propagation effect, but the additional parameters pose a risk of overfitting. In other words, ideally, the higher the order of Â(τ), the better the results should be, since more propagation effects are taken into consideration. However, for a higher order, the increased number of parameters would cause overfitting, which makes the classification results deteriorate. To balance accounting for the propagation effects against overfitting, it is effective to cover only the major components of propagation, which come from orders 2 and 3 in this experiment.
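The paired t-test comparison reported above can be reproduced directly from the per-subject accuracies; a minimal sketch using NumPy and SciPy, where the arrays transcribe the CSP and DPSP (p = 2) columns of Table 1 for r = 2:

```python
import numpy as np
from scipy.stats import ttest_rel

# Per-subject session-to-session transfer accuracies (%) from Table 1 (r = 2).
csp = np.array([65.00, 51.25, 55.00, 66.67, 54.58, 67.08, 77.08, 94.16,
                74.58, 61.66, 46.25, 77.00, 51.25, 72.08, 65.83, 69.58])
dpsp = np.array([62.91, 54.17, 57.50, 70.41, 67.08, 72.50, 77.92, 92.50,
                 75.83, 60.41, 49.16, 81.25, 54.58, 79.16, 67.50, 70.00])  # p = 2

# Two-sided paired t-test on the per-subject accuracy differences;
# Table 1 reports p = 0.008 for this comparison.
t_stat, p_value = ttest_rel(dpsp, csp)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

The pairing matters here: each subject serves as their own control, so the test is on the 16 accuracy differences rather than on the two group means.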
Figure 2 presents the comparison in a more intuitive way. Each plot in Figure 2 shows the test accuracy under DPSP with order p against that under CSP. The x-axis represents the accuracy results under
Table 2: Session-to-Session Transfer Test Results for r = 3 (%).

                           DPSP
Subject    CSP      p = 1    p = 2    p = 3    p = 4
1          70.41    70.41    71.66    73.33    73.33
2          54.58    54.58    57.08    60.83    54.16
3          56.66    56.66    57.50    55.83    55.00
4          75.41    76.66    76.66    74.16    75.41
5          53.33    53.33    67.08    66.67    54.16
6          68.33    68.33    71.66    71.66    70.83
7          72.50    72.50    75.00    72.92    71.66
8          94.58    94.58    91.66    94.58    95.00
9          76.25    76.58    77.91    76.25    72.50
10         57.50    60.83    60.41    61.67    60.00
11         47.50    47.50    50.41    47.92    47.08
12         75.83    75.41    80.83    81.25    72.05
13         49.58    49.58    51.25    50.00    49.58
14         74.16    74.16    80.41    74.58    75.41
15         64.16    64.16    64.58    65.00    72.08
16         72.91    72.91    68.75    72.08    68.75
Mean       66.48    66.52    68.98    68.74    66.14
SD         12.51    12.04    11.51    11.70    12.34
p-value    –        0.53     0.027*   0.022*   0.55

*p ≤ 0.05.
CSP, and the y-axis represents that under DPSP. In each plot, a circle above the diagonal line marks a subject for which DPSP outperforms CSP.
Figure 3 shows A(τ) for two subjects. For a better comparison of the differences between the proposed method and the MVAR model, the mixing matrices B(τ) based on the MVAR model of the two subjects are also provided. As shown in Figure 3, the diagonal elements of B(τ) are much higher than the off-diagonal elements, because the self-spectrum of the signal is usually stronger than the cross-spectrum between the EEG signals. However, there are no large differences between the diagonal and off-diagonal elements of A(τ), and the diagonal elements are not significantly higher, which means the self-spectrum of the signal is not modulated radically by A(τ). Moreover, since elements of higher values concentrate in certain columns, higher weights are given to tune propagation from certain channels.
3.4 Analysis of Background Noise Separation. To further verify the validity of DPSP, we have evaluated the class-wise KL divergence (see section 2.3). Results averaged over all subjects are shown in Table 3 and Figure 4. Note that for the computation of D_KL on both the training set and the test set, the average covariance matrix R/R̃ is the mean over the training set, since under the single-trial analysis setting, we cannot obtain the mean
[Figure 2 appears here: four scatter plots of accuracy under DPSP against accuracy under CSP, for (p, r) = (2, 2), (2, 3), (3, 2), and (3, 3), with p-values 0.008, 0.027, 0.04, and 0.022, respectively.]
Figure 2: Session-to-session transfer test accuracy. The x-axis represents the accuracy results under CSP, and the y-axis represents those under DPSP with different orders p and numbers of spatial filters r. The y = x line is denoted by a dot-dashed line. In each plot, a circle above the y = x line marks a subject for which DPSP outperforms CSP. It can be seen from the plots that the improvements of DPSP for orders 2 and 3 are significant.
of the test set. Therefore, the fact that the average divergence D of the test set is larger than that of the training set in all cases reflects the differences between the test set and the training set, as indicated by Table 3. This is mainly caused by the session-to-session transfer effects. According to the results, the proposed DPSP algorithm decreases the KL divergence within the same class for both the training set and the test set, which means that, compared to the raw EEG data X, the data processed by DPSP, X̃, are more stationary. A more significant decrease is achieved for the test set, which means that the proposed method is more robust to the session-to-session transfer effects. Moreover, the comparison between different orders indicates that the best performance is achieved with order 2, which is in accord with the accuracy results.
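The stationarity measure above compares single-trial covariance matrices against a class mean. Under the common assumption that band-pass-filtered EEG trials are zero-mean gaussian, the KL divergence depends only on the covariances; the sketch below uses the standard closed form for zero-mean gaussians, which may differ in detail from the exact expression of section 2.3 (outside this excerpt):

```python
import numpy as np

def gaussian_kl(R_trial, R_mean):
    """KL divergence between zero-mean Gaussians N(0, R_trial) and N(0, R_mean):
    0.5 * (tr(R_mean^{-1} R_trial) - d + ln det(R_mean) - ln det(R_trial))."""
    d = R_trial.shape[0]
    inv_mean = np.linalg.inv(R_mean)
    _, logdet_trial = np.linalg.slogdet(R_trial)
    _, logdet_mean = np.linalg.slogdet(R_mean)
    return 0.5 * (np.trace(inv_mean @ R_trial) - d + logdet_mean - logdet_trial)

def classwise_divergence(trial_covs, R_mean):
    """Average single-trial divergence from the (training-set) mean covariance,
    as evaluated for both training and test trials."""
    return np.mean([gaussian_kl(R, R_mean) for R in trial_covs])
```

A smaller class-wise divergence means the trial covariances cluster more tightly around the training mean, which is exactly the sense in which the DPSP-processed data X̃ are called more stationary.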
Figure 5 illustrates the correlation between the decrease of KL divergence
and the increase of the classiﬁcation accuracy at the subject level. The linear
(a) Comparison between A(τ) and B(τ) for subject 7
(b) Comparison between A(τ) and B(τ) for subject 14

Figure 3: Comparison of the coefficient matrices obtained by the proposed method, A(τ), and the mixing matrices in MVAR, B(τ). For both subjects, the diagonal elements of B(τ) are much higher than the off-diagonal elements. For A(τ), elements of higher values are found in certain columns.
correlation coefficient r_c equals 0.30 and 0.31 for p = 2 and p = 3, respectively. Due to the large variability across subjects, their KL divergences may lie in different feature spaces, so the decrease of the KL divergence and the increase of the classification performance may not correlate linearly. As illustrated in Figure 5, almost all the points lie in the first quadrant, indicating
Table 3: Decrease of KL Divergence (%).

                      p = 2               p = 3               p = 4
               D      D̃      1 − D̃/D     D̃      1 − D̃/D     D̃      1 − D̃/D
Training set   4.96   4.09   17.68%      4.25   14.39%      4.84   2.55%
Test set       64.3   25.2   60.84%      36.68  42.98%      57.09  11.24%

Note: D and D̃ denote the class-wise KL divergence of the raw data X and of the DPSP-processed data X̃, respectively.
Figure 4: Decrease of the KL divergence. The decreases of the KL divergence in X̃ of different orders compared to X are shown as percentages. A large decrease in the KL divergence indicates that X̃ is more stationary than X. Therefore, the proposed DPSP algorithm is more robust toward varying background noise and session-to-session transfer effects.
that the decreased KL divergence is positively correlated with the increased classification accuracy. Therefore, the decrease of the KL divergence contributes to the increase of the classification accuracy to a certain extent. Nevertheless, the reason for the increase in classification accuracy could be more complicated, so that the KL divergence cannot completely account for it. We will investigate this issue in future work.
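At the implementation level, r_c is simply the Pearson correlation between the two per-subject quantities; a minimal sketch, where the arrays are illustrative placeholders (the actual per-subject KL decreases appear only in Figure 5, not in the text):

```python
import numpy as np

# Hypothetical per-subject decrease of KL divergence (%) and increase of
# classification accuracy (%); values are illustrative only.
kl_decrease = np.array([12.0, -5.0, 30.0, 8.0, 55.0, 20.0, 3.0, -10.0])
acc_increase = np.array([2.5, 1.0, 4.0, 0.5, 12.5, 5.4, 0.8, -1.7])

# Linear (Pearson) correlation coefficient r_c between the two quantities.
r_c = np.corrcoef(kl_decrease, acc_increase)[0, 1]
print(f"r_c = {r_c:.2f}")
```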
4 Conclusion

The coexistence of brain connectivity and volume conduction may have complicated effects on EEG measurements and poses a technical challenge to detecting specific brain activities of interest. Conventional linear spatial filter design methods with the assumption of unconnectedness of sources
[Figure 5 appears here: two scatter plots of the increase of classification accuracy against the decrease of the KL divergence; (a) r_c = 0.30 (p = 2), (b) r_c = 0.31 (p = 3).]
Figure 5: Correlation between the decrease of the KL divergence and the increase of the classification accuracy. The x-axis represents the decrease of the KL divergence, and the y-axis represents the increase of the classification accuracy. Panels a and b correspond to p = 2 and p = 3, respectively.
are not sufficient to address such complicated dynamics. Due to the causal relationships between sources, ERD/ERS signals reconstructed by instantaneous demixing may not be optimal in terms of discrimination.
Moreover, the propagation effects are closely related to the background noise and nonstationarity of EEG. It is possible that an electrode that contains no discriminative information could be given a high weight due to information flow from signals containing ERD/ERS, and such dependence could be very unstable compared with the original ERD/ERS source. This analysis motivated the proposed unified model for discriminative learning of propagation and spatial patterns.
Therefore, we have reported in this letter a novel computational model that accounts for both time-lagged correlations between signals and the volume conduction effect. Different from the sparsely connected sources analysis (SCSA) model in Haufe et al. (2010) and the MVARICA model in Gomez-Herrero et al. (2008), the proposed computational model is designed from discriminative analysis but also takes propagation into account. Besides, an algorithm based on an iterative optimization procedure is implemented for the estimation of the proposed discriminative model. Experimental results have shown a statistically significant improvement in classification accuracy under the proposed learning method. Moreover, the effectiveness of the background noise attenuation is also confirmed by a significant decrease of the KL divergence of EEG data of the same class, especially for the test data. This indicates that the proposed method is more robust than conventional methods against session-to-session nonstationarity in EEG.
Appendix: Relations Between the Convolutive Model and the Instantaneous Model with Connected Sources

Based on the models in Haufe et al. (2010) and Gomez-Herrero et al. (2008), X(t) can be assumed to be generated as a linear instantaneous mixture of the source signal S(t), which follows a multivariate autoregressive (MVAR) model,
X(t) = M S(t),                                       (A.1)

S(t) = Σ_τ B_s(τ) S(t − τ) + ε(t),                   (A.2)

where B_s(τ) is the coefficient matrix of the MVAR model, and it represents the connectivity between sources (Ginter et al., 2001; Schlogl & Supp, 2006). From equation A.1, the innovation process ε(t) can be written as

ε(t) = M^(−1) X(t) − Σ_τ B_s(τ) M^(−1) X(t − τ)
     = Σ_τ B̂_s(τ) X(t − τ),                          (A.3)

where

B̂_s(τ) = { M^(−1),            τ = 0,
          { −B_s(τ) M^(−1),    τ > 0.                (A.4)
Equation A.3 shows the equivalence between this model and the convolutive models of Dyrholm et al. (2007), Mørup et al. (2009), and the proposed approach, with the underlying convolutive sources replaced by innovations. Because the objective in Haufe et al. (2010) and Gomez-Herrero et al. (2008) is connectivity analysis, the estimation of B_s(τ) and M is based on the nongaussianity assumption of ε(t). In the proposed model, S(t) represents the discriminative sources related to ERD/ERS, and thus the estimation of the FIR matrix Â(τ) in equation 2.11 and the spatial filter w is based on maximizing the variance difference between the two classes. With the discriminative objective, it is preferable to apply the convolutive model to impose the variance difference as prior information on the source. Moreover, since the two models are equivalent, it is also possible to build a discriminative model based on the instantaneous mixing model with connected sources in equations A.1 and A.2. In future work, we would like to explore possible discriminative learning approaches to study the connectivity that contains class information.
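The equivalence in equations A.3 and A.4 can be checked numerically. In the sketch below, the mixing matrix M, the MVAR coefficients B_s(τ), and the signal length are arbitrary illustrative choices; the innovations recovered through the convolutive coefficients B̂_s(τ) should match the ones used to simulate the sources:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, T = 2, 2, 500  # channels, MVAR order, samples (illustrative)

# Arbitrary small (stable) MVAR coefficients B_s(tau) and a mixing matrix M.
B_s = [0.1 * rng.standard_normal((n, n)) for _ in range(p)]
M = rng.standard_normal((n, n)) + 2 * np.eye(n)

# Simulate sources S(t) from the MVAR model (A.2), then mix: X(t) = M S(t) (A.1).
eps = rng.standard_normal((T, n))  # innovations epsilon(t)
S = np.zeros((T, n))
for t in range(T):
    S[t] = eps[t] + sum(B_s[tau] @ S[t - tau - 1] for tau in range(min(t, p)))
X = S @ M.T

# Convolutive coefficients from (A.4): Bhat_s(0) = M^{-1}, Bhat_s(tau) = -B_s(tau) M^{-1}.
M_inv = np.linalg.inv(M)
B_hat = [M_inv] + [-B @ M_inv for B in B_s]

# Recover the innovations via (A.3): eps(t) = sum_tau Bhat_s(tau) X(t - tau).
eps_rec = np.array([
    sum(B_hat[tau] @ X[t - tau] for tau in range(p + 1)) for t in range(p, T)
])

print(np.max(np.abs(eps_rec - eps[p:])))  # close to 0 up to numerical precision
```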
References

Ang, K. K., Chin, Z. Y., Wang, C., Guan, C., & Zhang, H. (2012). Filter bank common spatial pattern algorithm on BCI competition IV datasets 2a and 2b. Frontiers in Neuroscience, 6(39).

Ang, K. K., Chin, Z. Y., Zhang, H., & Guan, C. (2008). Filter bank common spatial pattern (FBCSP) in brain-computer interface. In Proceedings of the IEEE International Joint Conference on Neural Networks and Computational Intelligence (pp. 2390–2397). Piscataway, NJ: IEEE.

Ang, K. K., Chin, Z. Y., Zhang, H., & Guan, C. (2012). Mutual information-based selection of optimal spatial-temporal patterns for single-trial EEG-based BCIs. Pattern Recognition, 45(6), 2137–2144.

Arvaneh, M., Guan, C., Ang, K. K., & Quek, C. (2011). Optimizing the channel selection and classification accuracy in EEG-based BCI. IEEE Transactions on Biomedical Engineering, 58(6), 1865–1873.

Arvaneh, M., Guan, C., Ang, K. K., & Quek, C. (2013a). EEG data space adaptation to reduce intersession nonstationarity in brain-computer interface. Neural Computation, 25, 2146–2171.

Arvaneh, M., Guan, C., Ang, K. K., & Quek, C. (2013b). Optimizing spatial filters by minimizing within-class dissimilarities in EEG-based BCI. IEEE Transactions on Neural Networks and Learning Systems, 24(4), 610–619.
Astolfi, L., Cincotti, F., Mattia, D., de Vico Fallani, F., Salinari, S., Ursino, M., & Babiloni, F. (2006). Estimation of the cortical connectivity patterns during the intention of limb movements. IEEE Engineering in Medicine and Biology Magazine, 25(4), 32–38.

Baccala, L. A., & Sameshima, K. (2001). Partial directed coherence: A new concept in neural structure determination. Biological Cybernetics, 84, 463–474.

Bahramisharif, A., van Gerven, M. A. J., Schoffelen, J. M., Ghahramani, Z., & Heskes, T. (2012). The dynamic beamformer. In G. Langs, I. Rish, M. Grosse-Wentrup, & B. Murphy (Eds.), Machine learning and interpretation in neuroimaging (Lecture Notes in Computer Science, vol. 7263, pp. 148–155). Berlin: Springer.

Bamdadian, A., Guan, C., Ang, K. K., & Xu, J. (2012). Online semi-supervised learning with KL distance weighting for motor imagery-based BCI. In Proceedings of the 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 2732–2735). Piscataway, NJ: IEEE.

Blankertz, B., Tomioka, R., Lemm, S., Kawanabe, M., & Muller, K.-R. (2008). Optimizing spatial filters for robust EEG single-trial analysis. IEEE Signal Processing Magazine, 25(1), 41–56.

Chen, H., Yang, Q., Liao, W., Gong, Q., & Shen, S. (2009). Evaluation of the effective connectivity of supplementary motor areas during motor imagery using Granger causality mapping. NeuroImage, 47(4), 1844–1853.

Dornhege, G., Blankertz, B., Krauledat, M., Losch, F., Curio, G., & Muller, K.-R. (2006). Combined optimization of spatial and temporal filters for improving brain-computer interfacing. IEEE Transactions on Biomedical Engineering, 53(11), 2274–2281.

Dyrholm, M., Makeig, S., & Hansen, L. K. (2007). Convolutive ICA for spatiotemporal analysis of EEG. Neural Computation, 19, 934–955.

Ewald, A., Marzetti, L., Zappasodi, F., Meinecke, F. C., & Nolte, G. (2012). Estimating true brain connectivity from EEG/MEG data invariant to linear and static transformations in sensor space. NeuroImage, 60(1), 476–488.

Formaggio, E., Storti, S. F., Cerini, R., Fiaschi, A., & Manganotti, P. (2010). Brain oscillatory activity during motor imagery in EEG-fMRI coregistration. Magnetic Resonance Imaging, 28(10), 1403–1412.

Gerking, J. M., Pfurtscheller, G., & Flyvbjerg, H. (1999). Designing optimal spatial filters for single-trial EEG classification in a movement task. Clinical Neurophysiology, 110, 787–798.

Ginter, J., Jr., Blinowska, K. J., Kaminski, M., & Durka, P. J. (2001). Phase and amplitude analysis in time-frequency space: Application to voluntary finger movement. Journal of Neuroscience Methods, 110(1–2), 113–124.

Gomez-Herrero, G., Atienza, M., Egiazarian, K., & Cantero, J. L. (2008). Measuring directional coupling between EEG sources. NeuroImage, 43(3), 497–508.

Grosse-Wentrup, M. (2009). Understanding brain connectivity patterns during motor imagery for brain-computer interfacing. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in neural information processing systems, 21 (pp. 561–568). Cambridge, MA: MIT Press.

Guler, I., Kiymik, M. K., Akin, M., & Alkan, A. (2001). AR spectral analysis of EEG signals by using maximum likelihood estimation. Computers in Biology and Medicine, 31(6), 441–450.
Gysels, E., & Celka, P. (2007). Phase synchronization for the recognition of mental tasks in a brain-computer interface. IEEE Transactions on Rehabilitation Engineering, 12(4), 406–415.

Haufe, S., Tomioka, R., Nolte, G., Muller, K.-R., & Kawanabe, M. (2010). Modeling sparse connectivity between underlying brain sources for EEG/MEG. IEEE Transactions on Biomedical Engineering, 57(8), 1954–1963.

Jeannerod, M. (1995). Mental imagery in the motor context. Neuropsychologia, 33, 1419–1432.

Kaminski, M., & Blinowska, K. (1991). A new method of the description of the information flow in the brain structures. Biological Cybernetics, 65, 203–210.

Kaminski, M., Ding, M., Truccolo, W. A., & Bressler, S. (2001). Evaluating causal relations in neural systems: Granger causality, directed transfer function and statistical assessment of significance. Biological Cybernetics, 85, 145–157.

Koles, Z. J. (1991). The quantitative extraction and topographic mapping of the abnormal components in the clinical EEG. Electroencephalography and Clinical Neurophysiology, 79, 440–447.

Kus, R., Kaminski, M., & Blinowska, K. J. (2004). Determination of EEG activity propagation: Pairwise versus multichannel estimate. IEEE Transactions on Biomedical Engineering, 51(9), 1501–1510.

Lemm, S., Blankertz, B., Curio, G., & Muller, K.-R. (2005). Spatio-spectral filters for improving the classification of single trial EEG. IEEE Transactions on Biomedical Engineering, 52(9), 1541–1548.

Li, Y., & Guan, C. (2006). An extended EM algorithm for joint feature extraction and classification in brain-computer interfaces. Neural Computation, 18, 2730–2761.

Llera, A., Gomez, V., & Kappen, H. J. (2012). Adaptive classification on brain-computer interfaces using reinforcement signals. Neural Computation, 24, 2900–2923.

Lo, A. C., Guarino, P. D., Richards, L. G., Haselkorn, J. K., Wittenberg, G. F., Federman, D. G., & Peduzzi, P. (2010). Robot-assisted therapy for long-term upper-limb impairment after stroke. New England Journal of Medicine, 362, 1772–1783.

Mørup, M., Madsen, K. H., & Hansen, L. K. (2009). Latent causal modelling of neuroimaging data. In NIPS Workshop on Connectivity Inference in Neuroimaging.

Pfurtscheller, G., Brunner, C., Schlogl, A., & da Silva, F. H. L. (2006). Mu rhythm (de)synchronization and EEG single-trial classification of different motor imagery tasks. NeuroImage, 31(1), 153–159.

Ramoser, H., Muller-Gerking, J., & Pfurtscheller, G. (2000). Optimal spatial filtering of single trial EEG during imagined hand movement. IEEE Transactions on Rehabilitation Engineering, 8(4), 441–446.

Schlogl, A., & Supp, G. (2006). Analyzing event-related EEG data with multivariate autoregressive parameters. Progress in Brain Research, 159, 135–147.

Schneider, T., & Neumaier, A. (2001). Algorithm 808: ARfit: A Matlab package for the estimation of parameters and eigenmodes of multivariate autoregressive models. ACM Transactions on Mathematical Software, 27(1), 58–65.

Stavrinou, M., Moraru, L., Cimponeriu, L., Stefania, P. D., & Bezerianos, A. (2007). Evaluation of cortical connectivity during real and imagined rhythmic finger tapping. Brain Topography, 19(3), 137–145.
Thomas, K. P., Guan, C., Lau, C. T., Vinod, A. P., & Ang, K. K. (2009). A new discriminative common spatial pattern method for motor imagery brain-computer interfaces. IEEE Transactions on Biomedical Engineering, 56(11), 2730–2733.

Vidaurre, C., Sannelli, C., Muller, K.-R., & Blankertz, B. (2011). Machine-learning-based coadaptive calibration for brain-computer interfaces. Neural Computation, 23, 791–816.

Wei, Q., Wang, Y., Gao, X., & Gao, S. (2007). Amplitude and phase coupling measures for feature extraction in an EEG-based brain-computer interface. Journal of Neural Engineering, 4, 120–129.

Xu, L., Stoica, P., Li, J., Bressler, S. L., Shao, X., & Ding, M. (2009). ASEO: A method for the simultaneous estimation of single-trial event-related potentials and ongoing brain activities. IEEE Transactions on Biomedical Engineering, 56(1), 111–121.
Received January 28, 2013; accepted May 5, 2013.