Learning From Feedback Training Data in

Self-paced Brain-Computer Interface

Haihong Zhang, Sidath Ravindra Liyanage, Chuanchu Wang,

and Cuntai Guan

H. Zhang, C. Wang and C. Guan are with the Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #21-01 Connexis, Singapore 138632 (email: {hhzhang, ccwang, ctguan}@i2r.a-star.edu.sg).

S.R. Liyanage is affiliated with the National University of Singapore, Singapore (email: sidath@nus.edu.sg).

Abstract. Inherent changes that appear in brain signals when transferring from calibration to feedback sessions pose a challenging but critical issue in brain-computer interface (BCI) applications. While previous studies have mostly focused on adaptation of classifiers, in this paper we study the feasibility and the importance of adapting the feature extraction in a self-paced BCI paradigm. First, we conduct calibration and feedback training on able-bodied naïve subjects using a new self-paced motor imagery BCI that includes an idle state. The online results suggest that the feature space constructed from calibration data may become ineffective during feedback sessions. Hence, we propose a new supervised method that learns from a feedback session to construct a more appropriate feature space, on the basis of the maximum mutual information principle applied to the feedback signal, the target signal and the EEG. Specifically, we formulate the learning objective as maximizing a kernel-based mutual information estimate with respect to the spatial-spectral filtering parameters, and derive a gradient-based optimization algorithm for the learning task. An experimental study is conducted using offline simulation. The results show that the proposed method is able to construct effective feature spaces that capture the discriminative information in feedback training data; consequently, the prediction error can be significantly reduced using the new features.

1. Introduction

Inherent changes in brain signals, either between calibration sessions or from calibration to feedback application, pose a critical challenge to EEG-based brain-computer interface (BCI) research [1, 2, 3], and have recently attracted a surge of attention in the field [4, 5, 6, 7, 8, 9, 10, 11, 12]. In particular, there has been considerable interest in BCI using motor imagery (MI) [2, 13, 14], the imagination or mental rehearsal of a motor action without any real motor output.

The underlying non-stationarity of the EEG signal accounts for much of these changes: the distribution of electrical fields on the scalp is subject to large variations over time. The non-stationarity can be caused by shifts in background brain activities, varying mental states, or individual users changing their strategy for BCI control [4]. Especially in feedback applications, more brain functions can be activated, further complicating the changes in EEG and giving rise to complex EEG phenomena such as error potentials [15] or rhythmic power shifts over the scalp [5]. Consequently, the feature extraction and prediction models (e.g. a classifier) built on data from past BCI sessions may become ineffective. There is therefore a strong need for new mathematical models capable of accurately predicting a user's intentions from his/her brain signals in session-to-session transfer. Adaptive BCI that can learn from new data, in a supervised, semi-supervised or unsupervised manner, is a viable approach to this problem.

So far, most of the work on adaptive BCI has focused on adaptation of the classifiers. In [5], three supervised adaptation methods using labelled data were investigated: a simple bias adjustment technique, a linear discriminant analysis (LDA) retraining technique, and a technique that retrains both LDA and the common spatial pattern (CSP) [16]-based feature extraction. It was reported that, overall, the LDA-retraining approach yielded the lowest error rate. In [17], a covariance shift algorithm was introduced for unsupervised adaptation of the linear classifier; notably, it operates without either labelling data or predicting labels. In [18], the method for adaptation was further developed and combined with a bagging approach, which resulted in improved stability. More recently, in [8], different types of adaptation methods were extensively studied using multiple BCI data sets, and the results favoured a bias adjustment method over generic covariance shift adaptation.

Another interesting online BCI was presented in [7], where a quadratic discriminant analysis classifier was adapted in every cue-based feedback trial. It showed that the distribution of EEG features shifted significantly from one session to another. The BCI was further studied in [10]. Different from the systems using CSP features mentioned earlier, this BCI used adaptive autoregressive features, band powers, or a combination of the two. In [6], a classifier with band power features as input was updated continuously, although only non-feedback (i.e. calibration) sessions were used in that offline study.

However, little work has been devoted to the adaptation of feature extraction models, especially for exploring feedback training data that include an idle state. As indicated by the experimental results in [7] and [8], it appears that the non-stationarity may not be solved by adapting classifiers alone. Rather, significant brain signal changes from calibration to feedback training sessions may render the feature space derived from calibration data ineffective, so that little discriminative information can be recovered from it.

Therefore, the primary purpose of this work is to validate the feasibility and the importance of adapting feature extraction models, especially for self-paced MI BCI that allows continuous feedback control [19, 20, 21, 22, 23, 24]. Adapting feature extraction models can be a challenging issue, in view of the unsatisfactory performance of the retrained CSP models in [5].

First, we develop and test a new self-paced BCI, and study calibration and feedback training on three able-bodied, naïve subjects. The empirical results call into question the efficacy of applying the feature space derived from calibration data to feedback sessions.

Hence, we propose a new supervised method that learns from a feedback session to construct a more appropriate feature space. In particular, the method accounts for the underlying complex relationships between the feedback signal, the target signal and the EEG, using a mutual information formulation. The learning objective is formulated as maximizing a kernel-based mutual information estimate with respect to the spatial-spectral filters. We then derive a gradient-based optimization algorithm for the learning task.

An experimental study is conducted using offline simulation. The results show that the proposed method is capable of constructing effective feature spaces that capture more discriminative information in the feedback sessions. Consequently, the prediction errors can also be significantly reduced by using the new features.

The rest of the paper is organized as follows. Section 2 describes the data collection with a self-paced BCI, as well as the online training results. Section 3 elaborates on the new method for learning effective spatial and spectral features from feedback session data. Section 4 presents an extensive analysis, followed by discussions in Section 5. Section 6 concludes the paper.

2. Materials

2.1. Feedback training data collection

Three BCI-naïve adults participated as BCI subjects in the data collection. All gave informed consent, which was reviewed and approved by the Institutional Review Board of the National University of Singapore. The subjects were seated comfortably in an armchair, with their hands resting on the chair arms or on the table in front of them. A 20-inch widescreen LCD monitor was placed on the table at a distance of approximately 1 meter from the subject. Subjects were asked to remain comfortably still to avoid movement artifacts.

Figure 1. The graphic user interface for calibration (left panel) and for self-paced feedback training (right panel). The grey and blue color blocks scroll smoothly upwards in the background, and the red circle in the center serves as the eye-fixation point. During feedback training, the horizontal position of the red circle serves as the feedback signal, which updates every 40 milliseconds; its trajectory over the background blocks is depicted by a red curve.

EEG was recorded using a Neuroscan NuAmps 40-channel data acquisition system, with electrodes placed according to an extended international 10-20 system and a sampling frequency of 500 Hz. A total of 30 channels were used, including F7, F3, Fz, F4, F8, FT7, FC3, FC4, FT8, T7, C3, Cz, C4, T8, TP7, CP3, CPz, CP4, TP8, P7, P3, Pz, P4, P8, O1, Oz, O2, PO1, PO2. The reference electrode was attached to the right ear. A high-pass filter at 0.05 Hz was applied in the Neuroscan data acquisition settings.

The subjects faced a graphic user interface displayed on the LCD monitor as

illustrated in Fig. 1, which guided them through the following sessions.

• Calibration session. This session consisted of 40 MI tasks; each lasted 4 seconds and was followed by a 6-second idle state. The MI tasks were evenly and pseudo-randomly distributed into left and right hand MI tasks. A graphic user interface, illustrated in the left panel of Fig. 1, guided the subjects through the session, where a red circle in the middle served as the eye fixation point. In the background, a sequence of rectangular shapes scrolled upwards, representing left/right hand MI tasks by blue boxes on the left/right side, or idle state tasks by grey bars. Specifically, when the red circle was in a grey bar, the subject should relax while minimizing physical movements; otherwise, the subject should imagine left/right hand movement, according to whether a blue box was on the left/right side of the circle.
The filter-bank CSP (FBCSP) [25, 26, 27] method, which was the winner of BCI Competition IV Dataset I [28], was employed to build subject-specific MI detection models. The method learned two separate models from the calibration data: one for differentiating between left-hand MI and idle state (hereafter referred to as the L-model), and the other for right-hand MI versus idle state (hereafter the R-model). For the L-model (or the R-model), each 2.5-second-long shift window of EEG, with a step of 0.5 second, was mapped to the label of the data: 0 if the time window ends in an idle state period, 1 (or -1) if in a left-hand (or right-hand) MI period. The mapping parameters were obtained using the linear least-mean-square method.
Since a user's mental state could be uncertain and varying during the transition from one state to another, we defined a grey region as [-1, 1] second with respect to the boundary of each idle/MI task, and excluded from FBCSP learning any EEG segments whose centers fall in this grey region.

• Feedback training sessions. After calibration, each subject participated in 4 sessions of feedback training, i.e. 2 sessions of left-hand MI BCI training using the L-model and 2 sessions of right-hand MI training using the R-model. This arrangement allowed a subject to concentrate on a particular MI task in each session. A training session consisted of 20 MI tasks, each lasting 5 seconds and followed by a 6-second idle state. A graphical user interface, illustrated in the right panel of Fig. 1, guided the user through the session. The meaning of the graph was similar to that for calibration, except that the red circle moved horizontally as a feedback signal: its horizontal position was determined by the FBCSP output, updated every 40 milliseconds.
During the feedback training, the subjects tried to move the red circle as far as possible to the left/right side during left-hand/right-hand MI tasks. We would like to emphasize that the subjects were requested not to voluntarily control the feedback signal by any means during periods of idle state, because voluntary control of the feedback signal would spoil the idle state data.
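The sliding-window labelling and grey-region exclusion described above can be sketched as follows (a minimal Python illustration using the window parameters from the text; the function and variable names are our own):

```python
def windowed_labels(task_intervals, total_time, win=2.5, step=0.5, grey=1.0):
    """Enumerate sliding EEG windows and map each to a training label.
    task_intervals: list of (start, end, label) tuples with label 1 for
    left-hand MI or -1 for right-hand MI; time outside these intervals
    is idle state (label 0).  Windows whose centre lies within +/- grey
    seconds of a task boundary are excluded, since the user's mental
    state there is uncertain."""
    samples = []
    t = 0.0
    while t + win <= total_time:
        centre = t + win / 2.0
        end = t + win
        near_boundary = any(
            abs(centre - b) <= grey
            for s, e, _ in task_intervals for b in (s, e))
        if not near_boundary:
            label = 0
            for s, e, lab in task_intervals:
                if s <= end <= e:        # the window *ends* in an MI period
                    label = lab
            samples.append((t, label))
        t += step
    return samples
```

Each retained window would then be paired with the corresponding EEG segment for FBCSP learning.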

Short breaks were taken between sessions. The first feedback training session started within 5 minutes after the calibration session, and the interval between consecutive feedback sessions was from 1 to 5 minutes. Note that a special tryout session was in place after the calibration, where every subject tried online feedback for a short while, so as to get a feel for the feedback and to prepare for the actual training sessions. The tryout session was not included in the analysis.

We would like to briefly introduce the FBCSP method used in the online experiment, since it will also be compared with the proposed learning method later. FBCSP was introduced in [25] as a feature selection algorithm that combines a filter bank framework with the spatial filtering technique CSP. More specifically, it decomposes EEG data into an array of pass-bands, performs CSP in each band, and selects a reduced set of features from all the bands. Its efficacy was demonstrated in the latest BCI Competition [28], where it served as the basis of all the winning algorithms in the EEG categories. FBCSP was improved in [26] by employing a robust maximum mutual information criterion for feature selection.

Figure 2. Online performance of subjects in terms of mean square error (MSE) between the feedback signal and the target, for left MI training (left panel) and right MI training (right panel), in feedback training sessions 1 and 2. There is a strong bias shift (from calibration to feedback) in the right motor imagery (MI) sessions of Subject 3, which explains his particularly large error.

2.2. Data screening

The EEG data recorded during the feedback training sessions were inspected visually using MATLAB by the authors. Any EEG segments identified as contaminated by EOG or EMG [29] were rejected and excluded from the analysis. Again, we defined grey regions in a similar way to the calibration procedure described above; therefore, any EEG segments centered within [-1, 1] second of any task boundary were excluded from the analysis.

2.3. Online performance and initial data analysis

Online performance was assessed using the mean-square-error (MSE) measure between the feedback signal and the target signal. Fig. 2 plots the bar graph of MSE in each feedback training session. The error was comparable between the first training session and the second in most cases. This indicates that online feedback training in BCI can be a difficult task, since it was anticipated that the subjects would gain better control of the BCI over the training sessions. Again, this underscores the necessity of adapting models during session-to-session transfer.

To further understand the feedback training data, we plot in Fig. 3 the distribution of EEG feature vector samples produced by FBCSP. Note that, for clarity of presentation, we used evenly re-sampled feature vectors, because the original samples number in the thousands. As expected, the MI class samples and the idle class samples were easily separable in the calibration data, but the discriminative information disappeared in the same feature space in most feedback training sessions. As a consequence, either there was no effective separation between the two classes, or the separating hyper-plane was severely altered (similar to some cases in [7, 8]).

Therefore, it is advisable to first look into the issue of an ineffective feature space before trying to adapt a classifier/regressor. To address this issue, we propose a new method to learn an effective feature space from feedback data. We would also like to note that, compared with calibration data, online feedback training data pose more challenges to effective feature extraction, because the feedback may activate more brain functions and produce more complex EEG phenomena [5, 15].

Figure 3. Feature distributions during motor imagery (MI) calibration and feedback training sessions, for left MI in the upper three rows and right MI in the lower three rows. The horizontal and vertical axes are the first and second FBCSP features. The axis range is kept consistent within each column (i.e. each subject). Red circles represent motor imagery samples, while black crosses denote idle state samples. Note the significant change, especially in the distribution of the motor imagery samples.

3. The new learning method

3.1. Spatio-Spectral Features

The primary phenomenon of MI EEG is event-related desynchronization (ERD) or event-related synchronization (ERS) [2, 13]: the attenuation or increase of the rhythmic activity over the sensorimotor cortex, generally in the µ (8-14 Hz) and β (14-30 Hz) rhythms. ERD/ERS can be induced both by imagined movements in healthy people and by intended movements in paralyzed patients [21, 30, 31]. It is noteworthy that another neurological phenomenon, the Bereitschaftspotential, is also associated with MI EEG but is non-oscillatory [14]. In this work we consider ERD/ERS features only.

Feature extraction of ERD/ERS is, however, a challenging task due to its low signal-to-noise ratio. Therefore, spatial filtering in conjunction with frequency selection (via processing in either the temporal or the spectral domain) in multi-channel EEG has been highly successful in increasing the signal-to-noise ratio [16, 32, 27, 33, 34].

Let us consider spatial-spectral filtering in the spectral domain, where each $n_c$-channel EEG segment with a sampling rate of $F_s$ Hz can be described by an $n_c \times n_f$ matrix

    $X = \begin{bmatrix} x_{11} & \cdots & x_{1 n_f} \\ \vdots & \ddots & \vdots \\ x_{n_c 1} & \cdots & x_{n_c n_f} \end{bmatrix}$    (1)

where $x_{ij}$ denotes the discrete Fourier transform of the $i$-th channel at frequency $\omega_j = \frac{j-1}{2 n_f} F_s$.

A joint spatial-spectral filter on $X$ can essentially be represented by a spatial filtering vector $\mathbf{w} \in \mathbb{R}^{n_c \times 1}$ and a spectral filter vector $\mathbf{f} \in \mathbb{R}^{n_f \times 1}$. The feature $y_0$ is the energy of the EEG segment after filtering:

    $y_0 = \operatorname{diag}\left\{ \widetilde{(\mathbf{w}^T X)}^T (\mathbf{w}^T X) \right\}^T \mathbf{f}$    (2)

where the tilde denotes the conjugate of a complex value, and $\operatorname{diag}(\cdot)$ stands for the diagonal vector of a matrix.

In this work we consider a general case in which multiple spatial filters are used, each associated with one particular spectral filter. The feature extraction model is therefore determined by the spectral filter matrix $F$ and the matrix $W$, the latter being the collection of spatial filters in columns:

    $W = [\mathbf{w}_1 \ldots \mathbf{w}_{n_w}]$    (3)

Suppose the spectral filters in $F$ are given (see the last paragraph of Section 3.3 for details). We can then use the following shorthand for the auto-correlation matrix of the EEG processed by the $k$-th spectral filter

    $\hat{X}_k = \sum_{i=1}^{n_f} f_{ki}\, \tilde{\mathbf{x}}_i \mathbf{x}_i^T$    (4)

where $\mathbf{x}_i$ denotes the $i$-th column of $X$, and express the logarithmic feature vector as

    $\mathbf{y} = \left[ \log(\mathbf{w}_1^T \hat{X}_1 \mathbf{w}_1), \ldots, \log(\mathbf{w}_{n_w}^T \hat{X}_{n_w} \mathbf{w}_{n_w}) \right]^T$    (5)
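The feature extraction in Eqs. 1-5 can be sketched as follows (a minimal Python/NumPy illustration, not the authors' implementation; the function and variable names are ours, and we assume non-negative spectral weights so that the log arguments stay positive):

```python
import numpy as np

def spectral_autocorr(segment, f):
    """Auto-correlation matrix of an EEG segment after spectral
    filtering (Eq. 4): X_hat = sum_i f_i * conj(x_i) x_i^T, where
    x_i is the i-th frequency column of the segment's DFT."""
    X = np.fft.rfft(segment, axis=1)          # n_c x n_f complex spectrum
    nf = min(X.shape[1], f.shape[0])
    Xh = np.zeros((segment.shape[0],) * 2)
    for i in range(nf):
        xi = X[:, i]
        # conj(x_i) x_i^T is Hermitian; its real part is what survives
        # in the quadratic form w^T X_hat w for a real-valued w
        Xh += f[i] * np.real(np.outer(np.conj(xi), xi))
    return Xh

def log_energy_features(segment, W, F):
    """Log-energy feature vector y of Eq. 5: one spatial filter w_k
    paired with one spectral filter f_k (columns of W and F)."""
    return np.array([
        np.log(W[:, k] @ spectral_autocorr(segment, F[:, k]) @ W[:, k])
        for k in range(W.shape[1])
    ])
```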

3.2. Formulation of the objective function for learning

To capture the underlying complex structure of spatio-spectral data in ERD/ERS, we design a mutual information based objective function for learning $W$ and $F$. Mutual information [35], which stems from information theory, measures the reduction of uncertainty about class labels due to knowledge of the features. Readers interested in mutual information-based feature extraction/selection may find related works in [36, 37, 38, 39, 40, 41].

For feedback training data, we consider a mutual information measure $\hat{I}$ between the class labels and the EEG features together with the feedback signal. Specifically, the mutual information is between the class label (i.e. the variable to be predicted) and the observations comprising both the feedback signal and the EEG feature vector. Let the random variables of the label, the EEG feature vector, and the feedback signal be $C$, $Y$ and $Z$, respectively. There is

    $\hat{I}(\{Y,Z\},C) = \hat{H}(Y,Z) - \sum_{c} P(c)\, \hat{H}(Y,Z \mid c)$    (6)

where $\hat{H}$ denotes the entropy measure of a random variable.

Like [41, 39], we resort to a non-parametric approach for mutual information estimation, since it does not rely on assumptions about the underlying distributions.

Suppose the feedback training data comprise $l$ samples of EEG represented by the feature vectors $\mathbf{y}_i$ and the concurrent feedback signal $z_i$ ($i \in [1, \ldots, l]$). The non-parametric approach computes each entropy in Eq. 6 separately, e.g. $\hat{H}(Y,Z)$ by

    $\hat{H}(Y,Z) = -\frac{1}{l} \sum_{i=1}^{l} \log\left\{ \frac{1}{l} \sum_{j=1}^{l} \varphi_y(\mathbf{y}_i,\mathbf{y}_j)\, \varphi_z(z_i,z_j) \right\}$    (7)

Here $\varphi_y$ and $\varphi_z$ are kernel functions, usually of Gaussian form. For example,

    $\varphi_y(\mathbf{y},\mathbf{y}_i) = \alpha \exp\left\{ -\frac{1}{2} (\mathbf{y}-\mathbf{y}_i)^T \Psi^{-1} (\mathbf{y}-\mathbf{y}_i) \right\}$    (8)

The coefficient $\alpha$ is discarded hereafter because it cancels out when Eq. 8 is substituted into Eq. 7 and then into Eq. 6. It should be noted that the kernel size matrix $\Psi$ is diagonal, and each diagonal element is determined by

    $\psi_{k,k} = \zeta \frac{1}{l-1} \sum_{i=1}^{l} (y_{ik} - \bar{y}_k)^2$    (9)

where $\bar{y}_k$ is the empirical mean of $y_k$, and we set the coefficient $\zeta = \left(\frac{4}{3l}\right)^{0.1}$ according to the normal optimal smoothing strategy [42].

The conditional entropy $\hat{H}(Y,Z \mid c)$ in Eq. 6 can be estimated similarly to Eq. 7, but using samples from class $c$ only.
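The estimator in Eqs. 6-9 can be made concrete as follows (a minimal Python/NumPy sketch of our reading of the estimator, not the authors' MATLAB implementation; note that the kernel sizes Ψ are computed once from all samples, so that the coefficient α indeed cancels in the entropy difference):

```python
import numpy as np

def _kernel_matrix(Y, z, zeta):
    """Pairwise Gaussian kernel values phi_y * phi_z (Eq. 8), with the
    diagonal kernel sizes of Eq. 9 computed from all l samples."""
    psi = zeta * np.var(Y, axis=0, ddof=1)      # Eq. 9, per feature
    psi_z = zeta * np.var(z, ddof=1)
    dY = ((Y[:, None, :] - Y[None, :, :]) ** 2 / psi).sum(-1)
    dz = (z[:, None] - z[None, :]) ** 2 / psi_z
    return np.exp(-0.5 * (dY + dz))             # alpha discarded (it cancels)

def mutual_information(Y, z, labels):
    """Kernel estimate of I({Y,Z}, C) (Eqs. 6-7): H(Y,Z) minus the
    P(c)-weighted conditional entropies, all under the same kernel."""
    l = len(z)
    zeta = (4.0 / (3.0 * l)) ** 0.1             # normal optimal smoothing
    K = _kernel_matrix(Y, z, zeta)
    I = -np.mean(np.log(K.mean(axis=1)))        # H(Y,Z), Eq. 7
    for c in np.unique(labels):
        m = labels == c
        Kc = K[np.ix_(m, m)]                    # class-c samples only
        I -= m.mean() * (-np.mean(np.log(Kc.mean(axis=1))))
    return I
```

In the learning task below, this quantity is what the spatial-spectral filters are optimized to maximize.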

Using the maximum mutual information principle [36], we now define the learning task as searching for the optimum spatial and spectral filters $W$ and $F$ that satisfy

    $\{W, F\}_{\mathrm{opt}} = \arg\max_{\{W,F\}}\ \hat{I}(\{Y,Z\},C)$    (10)

The above formulation describes the inter-dependency between the target signal, the feedback signal and the EEG signal as a function of the feature extraction parameters in the spatial-spectral filters. It aims to maximize the information about the target signal to be predicted that is contained in the extracted features in conjunction with the feedback. Please refer to Section 5 for a further discussion of this formulation.

3.3. Gradient-based solution to the learning problem

Here we propose a numerical solution to Eq. 10 by devising a gradient-based optimization algorithm. Consider a spatial filter vector $\mathbf{w}_k$, and note that the gradient of the objective function $\hat{I}$ with respect to $\mathbf{w}_k$ is

    $\nabla_{\mathbf{w}_k} \hat{I}(\{Y,Z\},C) = \nabla_{\mathbf{w}_k} \hat{H}(Y,Z) - \sum_{c \in C} P(c)\, \nabla_{\mathbf{w}_k} \hat{H}(Y,Z \mid c)$    (11)

From Eq. 7, we have

    $\nabla_{\mathbf{w}_k} \hat{H}(Y,Z) = -\frac{1}{l} \sum_{i=1}^{l} \beta_i\, \frac{1}{l} \sum_{j=1}^{l} \varphi_z(z_i,z_j)\, \frac{\partial \varphi_y(\mathbf{y}_i,\mathbf{y}_j)}{\partial \mathbf{w}_k}$    (12)

where

    $\beta_i = \left( \frac{1}{l} \sum_{j=1}^{l} \varphi_z(z_i,z_j)\, \varphi_y(\mathbf{y}_i,\mathbf{y}_j) \right)^{-1}$    (13)

From Eq. 8, we have

    $\frac{\partial \varphi_y(\mathbf{y}_i,\mathbf{y}_j)}{\partial \mathbf{w}_k} = -\frac{1}{2}\, \varphi_y(\mathbf{y}_i,\mathbf{y}_j)\, \frac{\partial (\mathbf{y}_i-\mathbf{y}_j)^T \Psi^{-1} (\mathbf{y}_i-\mathbf{y}_j)}{\partial \mathbf{w}_k}$    (14)

Let us denote the quadratic form $(\mathbf{y}_i-\mathbf{y}_j)^T \Psi^{-1} (\mathbf{y}_i-\mathbf{y}_j)$ by $\vartheta_{ij}$, which can be decomposed as

    $\vartheta_{ij} = \sum_{k_1=1}^{d_o} \sum_{k_2=1}^{d_o} \psi^{-1}_{k_1 k_2}\, (y_{ik_1}-y_{jk_1})(y_{ik_2}-y_{jk_2})$    (15)

Hence, the gradient of $\vartheta_{ij}$ is

    $\frac{\partial \vartheta_{ij}}{\partial \mathbf{w}_k} = \sum_{k_1=1}^{d_o} \sum_{k_2=1}^{d_o} \left[ \frac{\partial \psi^{-1}_{k_1 k_2}}{\partial \mathbf{w}_k}\, (y_{ik_1}-y_{jk_1})(y_{ik_2}-y_{jk_2}) + \psi^{-1}_{k_1 k_2}\, \frac{\partial (y_{ik_1}-y_{jk_1})(y_{ik_2}-y_{jk_2})}{\partial \mathbf{w}_k} \right]$    (16)

Consider that $(y_{ik_1}-y_{jk_1})(y_{ik_2}-y_{jk_2})$ is a function of $\mathbf{w}_k$ if and only if $k_1 = k$ and/or $k_2 = k$, that $\psi^{-1}_{k_1 k_2}$ is a function of $\mathbf{w}_k$ if and only if $k_1 = k_2 = k$, and that $\psi^{-1}_{k_1 k_2} = 0$ whenever $k_1 \neq k_2$, since $\Psi$ is diagonal. The gradient above can therefore be written as

    $\frac{\partial \vartheta_{ij}}{\partial \mathbf{w}_k} = \frac{\partial \psi^{-1}_{kk}}{\partial \mathbf{w}_k}\, (y_{ik}-y_{jk})^2 + \psi^{-1}_{kk}\, \frac{\partial (y_{ik}-y_{jk})^2}{\partial \mathbf{w}_k}$    (17)

From Eq. 9, we have

    $\frac{\partial \psi^{-1}_{k,k}}{\partial \mathbf{w}_k} = -\frac{2\zeta}{\psi^2_{k,k}\,(l-1)} \sum_{i'=1}^{l} (y_{i'k}-\bar{y}_k)\, \frac{\partial (y_{i'k}-\bar{y}_k)}{\partial \mathbf{w}_k}$    (18)

where $\bar{y}_k$ denotes the mean value of the $y_{i'k}$, and its partial derivative w.r.t. $\mathbf{w}_k$ can be expressed as

    $\frac{\partial \bar{y}_k}{\partial \mathbf{w}_k} = \frac{1}{l} \sum_{i''=1}^{l} \frac{\partial y_{i''k}}{\partial \mathbf{w}_k}$    (19)

We further note that $\hat{X}_{ki}$ (the auto-correlation matrix for the $i$-th EEG sample processed by the $k$-th spectral filter, see Eq. 4) is conjugate symmetric, and

    $\frac{\partial y_{ik}}{\partial \mathbf{w}_k} = \frac{(\hat{X}_{ki} + \hat{X}_{ki}^T)\,\mathbf{w}_k}{y_{ik}} = \frac{2\,\mathrm{Re}(\hat{X}_{ki})\,\mathbf{w}_k}{y_{ik}}$    (20)

where $\mathrm{Re}(\cdot)$ denotes the real part of a complex matrix. The derivatives of $y_{i'k}$ and $y_{jk}$ can be computed in the same way.

We can summarize the above steps as follows:

    $\nabla_{\mathbf{w}_k} \hat{H}(Y,Z) = A\,\mathbf{w}_k$    (21)

where

    $A = \frac{2}{l^2} \sum_{i=1}^{l} \beta_i \sum_{j=1}^{l} \varphi_z(z_i,z_j)\, \varphi_y(\mathbf{y}_i,\mathbf{y}_j) \left[ -\frac{\zeta\,(y_{ik}-y_{jk})^2}{\psi^2_{k,k}\,(l-1)} \cdot \sum_{i'=1}^{l} (y_{i'k}-\bar{y}_k) \left( \frac{\mathrm{Re}(\hat{X}_{ki'})}{y_{i'k}} - \frac{1}{l} \sum_{i''=1}^{l} \frac{\mathrm{Re}(\hat{X}_{ki''})}{y_{i''k}} \right) + \psi^{-1}_{kk}\,(y_{ik}-y_{jk}) \left( \frac{\mathrm{Re}(\hat{X}_{ki})}{y_{ik}} - \frac{\mathrm{Re}(\hat{X}_{kj})}{y_{jk}} \right) \right]$    (22)

For each conditional entropy $\hat{H}(Y,Z \mid c)$ there will be an equation similar to Eq. 21, yielding a corresponding matrix $A_c$. The gradient of the objective function $\hat{I}$ with respect to the spatial filter $\mathbf{w}_k$ is then

    $\nabla_{\mathbf{w}_k} \hat{I}(\{Y,Z\},C) = \left( A - \sum_{c} P(c)\, A_c \right) \mathbf{w}_k$    (23)

We would like to note that the above equation does not imply that the gradient is a linear function of $\mathbf{w}_k$, since the multiplier term $(A - \sum_c P(c) A_c)$ is itself a rather complicated function of $\{\mathbf{y}_i\}$, which in turn is a function of $W$.

With the gradient information, our iterative optimization algorithm updates a spatial filter by

    $\mathbf{w}_k^{(\mathrm{iter}+1)} = \mathbf{w}_k^{(\mathrm{iter})} + \lambda\, \nabla_{\mathbf{w}_k} \hat{I}(\{Y^{(\mathrm{iter})},Z\},C)$    (24)

where $\lambda$ is the step size. In this work, we utilize a line search procedure to determine the step size in each iteration. Note that all spatial filter vectors in $W$ are updated together.

In our implementation, the line search procedure tests a number of (tentatively 16) $\lambda$ values in the range $[-0.05, 0.10] \times \xi$, and decreases $\xi$ on a logarithmic scale until a local maximum of $\hat{I}$ is found at some $\lambda \neq 0$. The $\lambda$ of the local maximum is then used to update all the spatial filters $\mathbf{w}_k$ in Eq. 24, and the optimization procedure proceeds to the next iteration.

The iterations terminate when a convergence criterion is met. In this work, we use a simple criterion: a mutual information gain of less than 1e-5. Since the iterative algorithm is a typical gradient-based greedy optimization method, the pseudo-code is omitted to save space.
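Although the pseudo-code is omitted, the overall optimization loop can be sketched as follows (a schematic Python illustration of the update rule in Eq. 24 with the line search described above; the objective and gradient here are stand-ins for Î and its gradient, and the grid sizes are the tentative values from the text):

```python
import numpy as np

def line_search_ascent(objective, gradient, w0, n_lambda=16,
                       xi0=1.0, tol=1e-5, max_iter=100):
    """Gradient ascent with a coarse line search: try n_lambda step
    sizes in [-0.05, 0.10] * xi, shrinking xi logarithmically until
    the best improving step is not lambda = 0."""
    w = w0.astype(float).copy()
    f_old = objective(w)
    for _ in range(max_iter):
        g = gradient(w)
        xi = xi0
        while True:
            lams = np.linspace(-0.05, 0.10, n_lambda) * xi
            vals = [objective(w + lam * g) for lam in lams]
            j = int(np.argmax(vals))
            if vals[j] > f_old and abs(lams[j]) > 0:
                break                    # improving non-zero step found
            xi /= 10.0
            if xi < 1e-8:                # no improving step at any scale
                return w
        w = w + lams[j] * g              # Eq. 24
        if vals[j] - f_old < tol:        # objective-gain criterion
            return w
        f_old = vals[j]
    return w
```

With a well-behaved objective, the loop climbs until the gain per iteration drops below the tolerance.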

The initial values for $\mathbf{w}_k$ can be learned by the CSP method [16], which maximizes the Rayleigh coefficient

    $\frac{\mathbf{w}_k^T \sum_{i=1}^{l_1} \hat{X}_{ki}\, \mathbf{w}_k}{\mathbf{w}_k^T \sum_{j=1}^{l_0} \hat{X}_{kj}\, \mathbf{w}_k}$    (25)

where $\hat{X}_{ki}$ denotes the $i$-th sample of motor imagery EEG and $\hat{X}_{kj}$ the $j$-th sample of idle state EEG.
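Eq. 25 is a generalized Rayleigh quotient, maximized by the leading generalized eigenvector of the two class-wise sums; the initialization can therefore be sketched as follows (our own NumPy illustration via whitening, not tied to any particular CSP toolbox):

```python
import numpy as np

def csp_init(cov_mi, cov_idle, n_filters=1):
    """Spatial filters maximizing the Rayleigh coefficient of Eq. 25,
    w^T S_mi w / w^T S_idle w, via the generalized eigenvalue problem
    S_mi w = lambda * S_idle w solved by whitening."""
    S_mi, S_idle = sum(cov_mi), sum(cov_idle)
    d, U = np.linalg.eigh(S_idle)
    P = U / np.sqrt(d)                   # whitening matrix: P^T S_idle P = I
    d2, V = np.linalg.eigh(P.T @ S_mi @ P)
    W = P @ V[:, ::-1]                   # descending generalized eigenvalues
    return W[:, :n_filters]
```

The top columns of W then serve as the starting point for the gradient-based optimization.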

Finally, we describe how the spectral filters for $F$ are selected. As in FBCSP, we create a set of candidate spectral filters consisting of band-pass filters that cover the motor imagery EEG spectrum. For instance, in the experimental study introduced in the next section, we borrowed the filter bank configuration from [26], which comprises 8 band-pass filters with central frequencies ranging from 4 to 32 Hz. After band-pass filtering in the spectral domain, we trained CSP according to Eq. 25 to extract discriminative energy features, and then selected the optimum $n_w$ features using the method in [26]. The spectral filters associated with the optimum features then comprise the matrix $F$.

4. Results

We conducted an offline simulation of the self-paced BCI using the online feedback training data. The simulation ran in MATLAB, and the proposed method was implemented in hybrid MATLAB and C code so as to improve computational and programming efficiency. The EEG features, together with the feedback signal $z$, served as the inputs to a regressor (please refer to the Discussions section below for a related discussion), in order to predict the target value of 0 (idle state), -1 (right-hand MI) or 1 (left-hand MI). We employed linear support vector regression using the LibSVM toolbox [43]. Note that we also attempted other regression methods, such as Gaussian-kernel non-linear support vector regression and linear mean-square-error regression, but no significant difference was found in the results, and we will only show the linear support vector regression results here.

Figure 4. Optimization on the mutual information surface: an example with a spatial filter vector for three-channel EEG, shown in the $(\phi, \theta)$ plane, with the trajectory running from the CSP initialization to the solution found by this method. See Section 4.1 for details.

Similar to the online feedback training described in Section 2, the offline simulation tested the left-hand MI BCI and the right-hand MI BCI separately. For example, for the left-hand MI BCI, the first left-hand MI training session was used to learn the optimum spatial-spectral filtering, and then the linear support vector regressor was trained. Next, the feature extraction and regression were tested on the second left-hand MI training session. The simulation used a 2-second-long shift window with a step of 0.4 second.

For a comparative analysis with the state-of-the-art, we also tested FBCSP using the same settings.
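The regression stage of the simulation can be sketched as follows; for a self-contained illustration we use the linear mean-square-error regressor, which the text reports performed comparably to linear support vector regression (the function and variable names are ours):

```python
import numpy as np

def fit_linear_regressor(features, z, targets):
    """Least-squares fit of target ~ [EEG features, feedback signal, bias],
    where the target is 0 (idle), -1 (right-hand MI) or 1 (left-hand MI)."""
    X = np.column_stack([features, z, np.ones(len(z))])
    coef, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return coef

def predict_target(coef, features, z):
    """Predicted target value for each analysis window."""
    X = np.column_stack([features, z, np.ones(len(z))])
    return X @ coef
```

The regressor is trained on the first feedback session and applied to the second, mirroring the evaluation protocol above.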

4.1. Convergence of the optimization algorithm

We studied the convergence of the optimization algorithm. First, we considered a simple scenario which included only three EEG channels (CP3, CPz, CP4) and one spatial filter. We would like to note that similar findings were obtained in extensive tests using different selections of channels around the sensorimotor cortex regions, e.g. C3, Cz, C4.

Since the mutual information measure is invariant to the (non-zero) norm of the spatial filter, we set the norm of the spatial filter to 1 without loss of generality. The spatial filter can then be represented by two variables in the spherical coordinate system: $\theta = \operatorname{acos}(w_3)$ and $\phi = \operatorname{atan}(w_2 / w_1)$. This should not be confused with the Euclidean space where the actual optimization takes place; the two-variable representation is meant for visualization only.

Fig. 4 shows a typical example from the left-hand MI learning in Subject 2. The spatial filter solution migrated in 4 steps from the initial point (generated by CSP) to approximately a local maximum, where the iteration converged (mutual information gain < 1e-5).

The algorithm was initialized using the method described in the previous section, and in most cases the optimization then converged within 7 iterations. We also tested random spatial filters for initialization; the iteration procedure generally became longer, but converged within 50 iterations in all 100 test runs.

Figure 5. Feature distributions produced by the proposed learning method for the left/right motor imagery (MI) feedback training session 2. The horizontal and vertical axes are respectively the first and second features learned by the method. The graphs in the upper row are generated from left MI training data, those in the lower row from right MI training data. Red circles represent motor imagery samples, while black crosses denote idle state samples. See Fig. 3 (especially the bottom row, for the same session) for a comparison.

4.2. Feature distributions

We used the first feedback training session to learn 2 spatial-spectral filters by the proposed method, and extracted EEG features from the second feedback session. Fig. 5 plots the distribution of the features (as the original samples amount to thousands, we used evenly re-sampled feature vectors for a clear presentation).

Compared with the features produced by the calibration models in Fig. 3 (especially in the bottom row for the same training session), the new features appear to be more separable between the MI classes and the idle states. To verify this, we assessed the separability in terms of classification accuracy by a linear support vector machine (using the same LibSVM toolbox from [43]). The results on the original features and on the new features are compared in Table 1.
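The separability assessment can be sketched as follows (here scikit-learn's linear SVM stands in for the LibSVM toolbox of [43]; cross-validated accuracy is one reasonable way to score a feature space):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def separability(features, labels):
    """Class separability as linear-SVM classification accuracy.

    features: (n_samples, n_features) array of EEG feature vectors;
    labels:   1 for motor imagery, 0 for idle state.
    Returns the mean cross-validated accuracy."""
    clf = LinearSVC(dual=False)
    return cross_val_score(clf, features, labels, cv=5).mean()
```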

The table clearly indicates that the proposed method, which adapted both the classifier and the feature extraction model, produced significantly better class separability than when only the classifier was adapted. This verifies our argument in the introduction that the non-stationarity in EEG may not be solved by adapting classifiers alone; rather, it is advisable to adapt both the feature extraction model and the classifier so as to accurately capture the variation of EEG over time.


           Features      Sub 1   Sub 2   Sub 3
Left MI    Original      73.7%   79.0%   66.9%
           This Method   85.0%   84.8%   81.0%
Right MI   Original      67.9%   59.7%   78.1%
           This Method   80.0%   69.6%   84.0%

Table 1. Class separability: new feature space (“This Method”) versus original feature space (“Original”). Class separability is measured as the classification accuracy of a linear support vector machine adapted to the data (feedback training session 2). Note that “Original” adapts the classifier only, while “This Method” adapts both the classifier and the feature extraction model. The higher accuracy of the two feature spaces is shown in bold. See Section 4.2 for the related description.

4.3. Accuracy of feedback control prediction

We investigate whether the new features can generate better predictions of the user state. We would also like to test adaptation of the regressor, since the classification hyper-plane may have shifted from the first feedback session to the second. Therefore, we tested a supervised adaptation scheme that used a portion of the second feedback session (called the adaptation data, taken from the beginning of the session) to re-train the regressor on both the adaptation data and the first feedback session data, and tested the models on the remainder of the second feedback session. We examined different sizes for the adaptation data in terms of percentage of the whole session, ranging from 0 (i.e. no adaptation) to 0.45.
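The evaluation protocol can be sketched as follows (a simplified sketch; the regressor is any object with scikit-learn-style `fit`/`predict` methods, standing in for the support vector regression machine):

```python
import numpy as np

def adaptation_mse(regressor, X1, y1, X2, y2, frac):
    """Supervised adaptation protocol of Section 4.3.

    The first `frac` of session 2 is used (together with all of
    session 1) to re-train the regressor; MSE is computed on the
    remainder of session 2. frac = 0 means no adaptation."""
    n_adapt = int(frac * len(y2))
    X_train = np.vstack([X1, X2[:n_adapt]]) if n_adapt else X1
    y_train = np.concatenate([y1, y2[:n_adapt]]) if n_adapt else y1
    regressor.fit(X_train, y_train)
    pred = regressor.predict(X2[n_adapt:])
    return float(np.mean((pred - y2[n_adapt:]) ** 2))
```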

FBCSP was also evaluated using the same protocol for comparison; the comparative results are illustrated in Fig. 6. Clearly, both FBCSP and the proposed method can learn a much more accurate predictor from the first feedback session than the original BCI that used only the calibration data. Furthermore, the prediction error was also effectively reduced by the supervised adaptation, although this improvement is not as large as that from the original BCI to the proposed method. The proposed method also consistently outperformed FBCSP, significantly in most cases.

We also examined the impact of the new method on the feedback signal curves. Fig. 7 compares the new feedback signal with the original feedback signal for Subject 2. Clearly, the new feedback signal curve followed the target curve much more accurately.

We also investigated whether the new method works with a reduced set of channels. In particular, we tested 15, 9 and 6 channels (see Table 2 for the channel names), ran the proposed method and FBCSP using the same protocol described above (see Figure 6), and performed paired t-tests to check whether our method produced lower MSE with statistical significance compared with FBCSP and the original feedback training result.

Figure 6. Comparison of prediction error in terms of mean-square-error (MSE) by different methods (Original, FBCSP, This Method), for left-hand and right-hand motor imagery training. The horizontal axis denotes the percentage of the second feedback session used for re-training the support vector regression machine that maps EEG features to the target signal. For the original online feedback there is no re-training, but the MSE is computed at each percentage point using the same test set. The test set is the second feedback session excluding the part used for regressor re-training. The curves plot the average MSE over the three subjects, while the length of the vertical line centered at each point represents the standard deviation. See Section 4.3 for the related description.

Figure 7. Comparison between the target, the original feedback signal and the new prediction by the proposed method (horizontal axis: time in seconds; vertical axis: feedback signal), shown for Subject 2's left motor imagery training session. The timing alternates between approximately 5-second motor imagery (target = 1) and 6-second idle state (target = 0), except for the first idle state period, which is slightly longer.

The results indicate that, compared with the original feedback training result, the new method improved the MSE with statistical significance for all the channel sets tested. Compared with FBCSP, it still yielded significantly lower MSE with as few as 9 channels. With 6 channels, the method and FBCSP produced comparable results, while both significantly outperformed the original model constructed from calibration data only.


Figure 8. Comparison of prediction error in terms of mean-square-error (MSE)

by diﬀerent methods using 9 EEG channels only. See Figure 6 and Section 4.3 for

descriptions.

#Ch   Data       This vs FBCSP   This vs Original   Channel Names
All   Left MI    <0.01           <0.01              All 30 channels (see Section 2)
      Right MI   <0.04           <0.01
15    Left MI    <0.01           <0.01              F3, F4, FC3, FCz, FC4, T3, Cz,
      Right MI   0.09            <0.01              C4, T4, CP3, CPz, CP4, P3, P4
9     Left MI    <0.01           <0.01              FC3, FCz, FC4, C3, Cz, C4,
      Right MI   0.86            <0.01              CP3, CPz, CP4
6     Left MI    0.48            <0.01              FC3, FC4, C3, C4, CP3, CP4
      Right MI   0.93            <0.01

Table 2. Statistical paired t-test (p-values shown) comparing the new method's MSE with that of FBCSP or the original feedback training result, using different numbers of channels. Significant results with p-value < 0.05 are shown in bold.
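The statistical comparison in Table 2 can be sketched as follows (using SciPy's paired t-test; the arrays of per-condition MSE values are placeholders):

```python
import numpy as np
from scipy import stats

def paired_ttest_mse(mse_a, mse_b):
    """Paired t-test on per-condition MSE values of two methods.

    mse_a, mse_b: MSE of the two methods measured on the same
    subjects / adaptation fractions (paired samples).
    Returns the two-sided p-value; p < 0.05 indicates a
    statistically significant difference."""
    t, p = stats.ttest_rel(mse_a, mse_b)
    return p
```

A paired test is appropriate here because both methods are evaluated on the same subjects and the same adaptation-data fractions.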

5. Discussion

Figure 6 gives clear evidence that the proposed method, using the new spatial-spectral learning algorithm, can significantly increase the prediction accuracy. The mean MSE for left (or right) MI feedback training was effectively reduced from approximately 0.3 (or 0.5) to approximately 0.2 (or 0.25). The improved accuracy can also be seen in the prediction curves of the example case in Fig. 7, which shows a reduction of MSE from 0.24 to 0.13.

The increased accuracy can be largely attributed to the improved feature space

shown in Fig. 5 in contrast to the original feature spaces in Fig. 3. The original

feature space that was used in feedback training was built using the calibration data.

The changes of feature distributions in the original feature space have highlighted the

eﬀect of session-to-session transfer, which is generally consistent with prior studies on

adaptive BCI. Thus, during feedback sessions, the motor imagery EEG and the idle-state EEG were predominantly non-separable in the original feature space; even where they were separable, their distributions were subject to shift. The new feature space, by contrast, was learned from feedback training data comprising three sources of information, namely EEG, target signal and feedback signal. It was therefore able to capture the information essential for user state prediction during online feedback training.

It is also worth mentioning again that the new model uses a non-parametric formulation for learning, which aims to account for arbitrary dependencies among EEG, target and feedback signals. Section 4.1 has shown that our optimization algorithm, derived from this formulation, has good convergence properties. Fig. 4 has shown that the objective function surface for the 3-channel EEG data is smooth, which is a favorable condition for the greedy algorithm. However, we expect that the mutual information surface can become far more complicated, especially for EEG data with a large number of channels. Future research may therefore investigate more advanced optimization techniques, though such techniques would usually incur much heavier computational costs.

While this work has focused on the development and validation of a new learning

method for adaptive BCI, it would be interesting to investigate its performance during

online training. Even though it is beyond the scope of this paper, it is within the scope

of our ongoing research. Generally, a large number of subjects would be required in

order to draw statistically signiﬁcant comparisons between adaptive and non-adaptive

BCI systems.

It is also interesting to look back at the formulation of the objective function in Section 3.2. As stated earlier, the goal is to maximize the information about the target

signal to be predicted, contained in the EEG features in conjunction with the feedback.

Therefore, it is advisable to include both the new EEG features and the prediction

outputs of the current model as inputs to the classiﬁer or regression machine in the

new model. Importantly, the feedback serves two purposes: not only does it serve as a

visual “stimulus” to the subject, but it also represents the current prediction model that

contains essential information extracted from earlier calibration/feedback sessions. The

first rationale is that feedback, and its position relative to the target signal, may affect brain activations and thereby complicate motor imagery EEG. The second function

gives rise to multiple implications as explained below. First, the formulation considers

only the output of the current BCI model but not the internal mechanism of the model.

Thus, it can work with any BCI model and adapt them during new feedback training

sessions. Secondly, if a user with a prediction model can control the feedback signal to

match the target signal satisfactorily during a feedback session, further re-adaptation of

the prediction model can be unnecessary, as co-adaptation of user and machine has already

been achieved. This can also be viewed as a special case of the objective function Eq. 10:

if the feedback variable Z in the objective function already carries essential information

about the target signal C, re-adaptation of BCI by including new EEG features would

produce no signiﬁcant gain in the objective function.

We would like to emphasize again that the proposed method works in a supervised

learning fashion. In other words, it requires the data labels for adaptive learning. Unlike unsupervised or semi-supervised online learning approaches, this enables the learning system to measure the compliance of a subject to the BCI tasks, so as to

ensure the stability of the adaptation process.


The proposed method with the current solution may be more suited for oﬄine

adaptation than for online adaptation. In online adaptation, both user training and machine adaptation take place at the same time, whereas in offline adaptation, machine adaptation is performed after the user finishes a training session. Although this method

is applicable to online adaptation, the expensive computation can be a serious concern

for practical online use. We estimate that the computational complexity of computing

the gradient by Eq. 23 and Eq. 22 is on the order of O(l^2 n_c^2), and that of evaluating the objective function by Eq. 7 and Eq. 6 is O(l^2 n_c). Here l denotes the number of samples and n_c the number of channels. In our experimental setup for the results presented in Section 4, we implemented the learning code using hybrid MATLAB and C coding without multi-threading. On our test computer with a Xeon CPU at 2.93GHz, the code took approximately 130 seconds to complete one iteration for n_c = 30-channel EEG data, or 18 seconds for n_c = 6-channel EEG data, both with l = 2230 time segment

samples. The primary cause of the high computational complexity is the non-parametric

(kernel-based) nature of the method that requires computation in each pair of samples.

Therefore, a possible solution to this problem will be to reduce the number of samples

for adaptation but without losing useful information.
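The quadratic dependence on the number of samples l comes from evaluating a kernel over every pair of samples, which the following generic Gaussian-kernel Gram matrix illustrates (a minimal sketch, not the paper's exact estimator):

```python
import numpy as np

def gram_matrix(features, bandwidth=1.0):
    """Gaussian-kernel Gram matrix over all sample pairs.

    Kernel-based MI estimation evaluates a kernel for every pair of
    the l samples, hence the O(l^2) factor in both the objective and
    its gradient; subsampling l directly cuts this cost."""
    sq = np.sum(features ** 2, axis=1)
    # pairwise squared Euclidean distances, computed without explicit loops
    d2 = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    return np.exp(-d2 / (2.0 * bandwidth ** 2))
```

Halving the number of samples quarters the Gram-matrix cost, which is why sample reduction is the most direct remedy.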

6. Conclusion

In this paper we have studied and addressed the critical issue of session-to-session

transfer in brain-computer interface (BCI). While previous studies have often focused

on adaptation of classifiers, we have shown the importance and feasibility of adapting feature extraction models within a self-paced BCI paradigm. First, we

conducted calibration and feedback training on able-bodied na¨ıve subjects using a new

self-paced motor imagery BCI including idle state. The online results suggested that

the feature extraction models built from calibration data may not generalize well to

feedback sessions. Hence, we have proposed a new supervised adaptation method that

learns from feedback data to construct a more appropriate model for feedback training.

Specifically, we have formulated the learning objective as the maximization of a kernel-based mutual information estimate with respect to the spatial-spectral filters. We have

also derived a gradient-based optimization algorithm for the learning task. We have

conducted an experimental study through oﬄine simulations and the results suggest

that the proposed method can signiﬁcantly increase prediction accuracies for feedback

training sessions.

[1] J.R. Wolpaw, N. Birbaumer, D.J. MacFarland, G. Pfurtscheller, and T.M. Vaughan. Brain-

computer interface for communication and control. Clinical Neurophysiology, 113:767–791, 2002.

[2] G. Pfurtscheller, C. Neuper, D. Flotzinger, and M. Pregenzer. EEG-based discrimination

between imagination of right and left hand movement. Electroencephalography and Clinical

Neurophysiology, 103:642–651, 1997.

[3] A. Nijholt and D. Tan. Brain-computer interfacing for intelligent systems. IEEE Intelligent

Systems, 23:72–79, 2008.

[4] Jose del R. Millan, Anna Buttﬁeld, C. Vidaurre, M. Krauledat, A. Schlogl, P. Shenoy, B. Blankertz,

R.P.N. Rao, R. Cabeza, Gert Pfurtscheller, and K. R. Mueller. Adaptation in Brain-Computer


Interfaces. In G. Dornhege, Jose del R. Millan, T. Hinterberger, D. McFarland, and K. R.

Mueller, editors, Towards Brain-Computer Interfacing. The MIT Press, 2007.

[5] P. Shenoy, M. Krauledat, B. Blankertz, R. P. N. Rao, and K.-R. Muller. Towards adaptive

classiﬁcation for BCI. Journal of Neural Engineering, 3(1):13–23, 2006.

[6] A. Buttﬁeld, P.W. Ferrez, and J. d. R. Millan. Towards a robust BCI: error recognition and online

learning. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 14:164–168,

2006.

[7] C. Vidaurre, A. Schlogl, R. Cabeza, R. Scherer, and G. Pfurtscheller. A fully online adaptive BCI.

IEEE Transactions on Biomedical Engineering, 53:1214–1219, 2006.

[8] C. Vidaurre, M. Kawanabe, P. von Bunau, B. Blankertz, and K.R. Muller. Toward unsupervised

adaptation of LDA for brain-computer interfaces. IEEE Transactions on Biomedical Engineering,

58:587–597, 2011.

[9] A. Lenhardt, M. Kaper, and H.J. Ritter. An adaptive P300-based online brain-computer interface.

IEEE Transactions on Neural Systems and Rehabilitation Engineering, 16:1–11, 2008.

[10] C. Vidaurre, A. Schlogl, R. Cabeza, R. Scherer, and G. Pfurtscheller. Study of on-line adaptive discriminant analysis for EEG-based brain computer interfaces. IEEE Transactions on Biomedical

Engineering, 54:550–556, 2007.

[11] Yuanqing Li and Cuntai Guan. An extended em algorithm for joint feature extraction and

classiﬁcation in brain-computer interfaces. Neural Computation, 18:2730–2761, 2006.

[12] B. Blankertz, M. Kawanabe, R. Tomioka, F. Hohlefeld, V. Nikulin, and K.-R. M¨uller. Invariant

common spatial patterns: Alleviating nonstationarities in brain-computer interfacing. In

Advances in Neural Information Processing Systems, pages 113–120. MIT Press, Cambridge,

MA, 2008.

[13] J. Muller-Gerking, G. Pfurtscheller, and H. Flyvbjerg. Designing optimal spatial filters for single-trial EEG classification in a movement task. Clinical Neurophysiology, 110:787–798, 1999.

[14] B. Blankertz, G. Dornhege, C. Schafer, R. Krepki, J. Kohlmorgen, K.-R. M¨uller, V. Kunzmann,

F. Losch, and G. Curio. Boosting bit rates and error detection for the classiﬁcation of fast-paced

motor commands based on single-trial EEG analysis. IEEE Transactions on Neural Systems

and Rehabilitation Engineering, 11:127–131, 2003.

[15] G. Schalk, J.R. Wolpaw, D.J. McFarland, and G. Pfurtscheller. EEG-based communication: presence of an error potential. Clinical Neurophysiology, 111:2138–2144, 2000.

[16] H. Ramoser, J. Muller-Gerking, and G. Pfurtscheller. Optimal spatial ﬁltering of single trial EEG

during imagined hand movement. IEEE Transactions on Rehabilitation Engineering, 8(4):441–

446, 2000.

[17] M. Sugiyama, M. Krauledat, and K.R. Mueller. Covariance shift adaptation by importance

weighted cross validation. Journal of Machine Learning Research, 8:985–1005, 2007.

[18] Y. Li, H. Kambara, Y. Koike, and M. Sugiyama. Application of covariate shift adaptation

techniques in brain-computer interfaces. IEEE Transactions on Biomedical Engineering,

57:1318–1324, 2010.

[19] Haihong Zhang and Cuntai Guan. A maximum mutual information approach for constructing a 1d

continuous control signal at a self-paced brain-computer interface. Journal of Neural Engineering,

7(5):056009, 2010.

[20] S. G. Mason and G. E. Birch. A brain-controlled switch for asynchronous control applications.

IEEE Transactions on Rehabilitation Engineering, 47:1297–1307, 2000.

[21] A. K¨ubler, F. Nijboer, J. Mellinger, T. M. Vaughan, H. Pawelzik, G. Schalk, D. J. McFarland,

N. Birbaumer, and J. R. Wolpaw. Patients with ALS can use sensorimotor rhythms to operate

a brain-computer interface. Neurology, 64:1775–1777, 2005.

[22] H. Zhang, C. Guan, and C. Wang. Asynchronous p300-based brain-computer interfaces: A

computational approach with statistical models. IEEE Transactions on Biomedical Engineering,

55(6):1754–1763, 2008.

[23] B. Blankertz, G. Dornhege, M. Krauledat, K.-R. M¨uller, and G. Curio. The non-invasive


Berlin brain-computer interface: Fast acquisition of eﬀective performance in untrained subjects.

NeuroImage, 37(2):539–550, 2007.

[24] F. Galan, M. Nuttin, E. Lew, P.W. Ferrez, G. Vanacker, J. Philips, and J.del R. Mill´an. A brain-

actuated wheelchair: Asynchronous and non-invasive brain-computer interfaces for continuous

control of robots. Clinical Neurophysiology, 119:2159–2169, 2008.

[25] K. K. Ang, Z. Y. Chin, H. Zhang, and C. Guan. Filter bank common spatial pattern (FBCSP) in

brain-computer interface. In International Joint Conference on Neural Networks (IJCNN2008),

pages 2391–2398, 2008.

[26] H. Zhang, C. Guan, and C. Wang. Spatio-spectral feature selection based on robust mutual

information estimate for brain-computer interfaces. In Annual International Conference of the

IEEE Engineering in Medicine and Biology Society, pages 2391–2398, 2009.

[27] H. Zhang, Z. Y. Chin, K. K. Ang, C. Guan, and C. Wang. Optimum spatio-spectral ﬁltering

network for brain-computer interface. IEEE Transactions on Neural Networks, 22:52–63, 2011.

[28] BCI Competition IV. http://www.bbci.de/competition/.

[29] M. Fatourechi, A. Fatourechi, R.K. Ward, and G.E. Birch. EMG and EOG artifacts in brain

computer interface systems: A survey. Clinical Neurophysiology, 118:480–494, 2007.

[30] G. Dornhege, B. Blankertz, G. Curio, and K.-R. M¨uller. Boosting bit rates in noninvasive EEG

single-trial classiﬁcations by feature combination and multiclass paradigms. IEEE Transactions

on Biomedical Engineering, 51(6):993–1002, 2004.

[31] M. Grosse-Wentrup and M. Buss. Multiclass common spatial patterns and information theoretic

feature extraction. IEEE Transactions on Biomedical Engineering, 55(8):1991–2000, 2008.

[32] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, and K.-R. M¨uller. Optimizing spatial ﬁlters

for robust EEG single-trial analysis. IEEE Signal Processing Magazine, 25:41–56, 2008.

[33] G. Dornhege, B. Blankertz, M. Krauledat, F. Losch, G. Curio, and K.-R. M¨uller. Combined

optimization of spatial and temporal ﬁlters for improving brain-computer interfacing. IEEE

Transactions on Biomedical Engineering, 53(11):2274–2281, 2006.

[34] S. Lemm, B. Blankertz, G. Curio, and K.-R M¨uller. Spatio-spectral ﬁlters for improving the

classiﬁcation of single trial EEG. IEEE Transactions on Biomedical Engineering, 52:1541–1548,

2005.

[35] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, New York, 2nd edition,

2006.

[36] S. Petridis and S.J. Perantonis. On the relation between discriminant analysis and mutual

information for supervised linear feature extraction. Pattern Recognition, 37:857–874, 2004.

[37] M. Ben-Bassat. Use of distance measures, information measures and error bounds in feature

evaluation. In P. Krishnaiah and L. Kanal, editors, Handbook of Statistics, pages 773–791.

North-Holland, Amsterdam, 1982.

[38] M. Last, A. Kander, and O. Maimon. Information-theoretic algorithm for feature selection.

Pattern Recognition Letters, 22:799–811, 2001.

[39] J.M. Sotoca and F. Pla. Supervised feature selection by clustering using conditional mutual

information-based distances. Pattern Recognition, 43:2068–2081, 2010.

[40] P.A. Estevez, M. Tesmer, C.A. Perez, and J.M. Zurada. Normalized mutual information feature

selection. IEEE Transactions on Neural Networks, 20:189–201, 2009.

[41] N. Kwak and C.-H. Choi. Input feature selection by mutual information based on Parzen window. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24:1667–1671, 2002.

[42] A. W. Bowman and A. Azzalini. Applied Smoothing Techniques for Data Analysis: The Kernel

Approach with S-Plus Illustrations. Oxford University Press, New York, 1997.

[43] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001.

http://www.csie.ntu.edu.tw/∼cjlin/libsvm.