Learning from feedback training data at a self-paced brain–computer interface

Journal of Neural Engineering 8(4):046035, August 2011. DOI: 10.1088/1741-2560/8/4/046035. Source: PubMed.
Learning From Feedback Training Data in
Self-paced Brain-Computer Interface
Haihong Zhang, Sidath Ravindra Liyanage, Chuanchu Wang,
and Cuntai Guan
H. Zhang, C. Wang and C. Guan are with Institute for Infocomm Research, Agency
for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #21-01
Connexis, Singapore 138632. (email: {hhzhang, ccwang, ctguan}@i2r.a-star.edu.sg).
S.R. Liyanage is affiliated with the National University of Singapore, Singapore. (email:
sidath@nus.edu.sg)
Abstract. Inherent changes that appear in brain signals when transferring from calibration to feedback sessions are a challenging but critical issue in brain-computer interface (BCI) applications. While previous studies have mostly focused on the adaptation of classifiers, in this paper we study the feasibility and the importance of the adaptation of feature extraction in a self-paced BCI paradigm. First, we conduct calibration and feedback training on able-bodied naïve subjects using a new self-paced motor imagery BCI including the idle state. The online results suggest that the feature space constructed from calibration data may become ineffective during feedback sessions. Hence, we propose a new supervised method that learns from a feedback session to construct a more appropriate feature space, on the basis of the maximum mutual information principle between the feedback signal, the target signal and EEG. Specifically, we formulate the learning objective as maximizing a kernel-based mutual information estimate with respect to the spatial-spectral filtering parameters. We then derive a gradient-based optimization algorithm for the learning task. An experimental study is conducted using offline simulation. The results show that the proposed method is able to construct effective feature spaces to capture the discriminative information in feedback training data and, consequently, the prediction error can be significantly reduced using the new features.
1. Introduction
Inherent changes in brain signals, either between calibration sessions or from calibration
to feedback application, pose a critical challenge to EEG-based brain-computer
interface (BCI) research [1, 2, 3], and have recently attracted a surge of attention in
the field [4, 5, 6, 7, 8, 9, 10, 11, 12]. In particular, there has been much interest in BCI
using motor imagery (MI) [2, 13, 14], the imagination or mental rehearsal of a
motor action without any real motor output.
The underlying non-stationarity of the EEG signal accounts for much of these changes:
the distribution of electrical fields on the scalp is subject to large variations
over time. The non-stationarity can be caused by shifts in background brain activities,
varying mental states, or individual users changing their strategy for BCI control [4].
Especially in feedback applications, more brain functions may be activated, further
complicating the changes in EEG and giving rise to complex phenomena such as error
potentials [15] or rhythmic power shifts over the scalp [5]. Consequently, the feature
extraction and prediction models (e.g. a classifier) built on data from past BCI sessions
may become ineffective. There is therefore a strong need for new mathematical
models capable of accurately predicting a user's intentions from his/her brain signals in
session-to-session transfer. An adaptive BCI that can learn from new data, in a supervised,
semi-supervised or unsupervised manner, is a viable approach to this problem.
So far, most work on adaptive BCI has focused on the adaptation of the
classifiers. In [5], three supervised adaptation methods using labelled data were
investigated: a simple bias adjustment technique, a linear discriminant
analysis (LDA) retraining technique, and a technique that retrains both LDA and
the common spatial pattern (CSP) [16]-based feature extraction. It was reported that,
overall, the LDA-retraining approach yielded the lowest error rate. In [17], a covariance
shift algorithm was introduced for unsupervised adaptation of the linear classifier;
notably, it requires neither labelled data nor predicted labels. In [18], this adaptation
method was further developed and combined with a bagging approach, which improved
stability. More recently, in [8], different types of adaptation methods were extensively
studied using multiple BCI data sets, and the results favoured a bias adjustment
method over generic covariance shift adaptation.
Another interesting online BCI was presented in [7], where a quadratic
discriminant analysis classifier was adapted in every cue-based feedback trial. It
showed that the distribution of EEG features shifted significantly from one session to
another. The BCI was further studied in [10]. Unlike the systems using CSP
features mentioned earlier, this BCI used adaptive autoregressive features,
band powers, or a combination of the two. In [6], a classifier with band power features
as input was updated continuously, although only non-feedback (i.e. calibration) sessions
were used for the offline study.
However, little work has been devoted to the adaptation of feature extraction
models, especially for exploring feedback training data including the idle state. As the
experimental results in [7] and [8] indicate, the non-stationarity may not be
solved by adapting classifiers alone. Rather, significant brain signal changes
from calibration to feedback training sessions may render the feature space derived
from calibration data ineffective, so that little discriminative information can be
recovered in it.
Therefore, the primary purpose of this work is to validate the feasibility and the
importance of adapting feature extraction models, especially for self-paced MI BCI that
allows continuous feedback control [19, 20, 21, 22, 23, 24]. It seems that adapting feature
extraction models can be challenging, in view of the unsatisfactory performance
of the retrained CSP models in [5].
First, we develop and test a new self-paced BCI, and study calibration and feedback
training on three able-bodied, naïve subjects. The empirical results raise questions about
the efficacy of applying the feature space derived from calibration data to feedback
sessions.
Hence, we propose a new supervised method that learns from a feedback session to
construct a more appropriate feature space. In particular, the method tries to account
for the underlying complex relationships between the feedback signal, the target signal
and EEG, using a mutual information formulation. The learning objective is formulated
as maximizing a kernel-based mutual information estimate with respect to the spatial-
spectral filters. We then derive a gradient-based optimization algorithm for the learning
task.
An experimental study is conducted using offline simulation. The results show that
the proposed method is capable of constructing effective feature spaces that capture
more discriminative information in the feedback sessions. Consequently, the prediction
errors can also be significantly reduced by using the new features.
The rest of the paper is organized as follows. Section 2 describes the data collection
with a self-paced BCI, as well as the online training results. Section 3 elaborates on the new
method for learning effective spatial and spectral features from feedback session data.
Section 4 presents an extensive analysis, followed by discussions in Section 5. Section 6
concludes the paper.
2. Materials
2.1. Feedback training data collection
Three BCI-naïve adults participated as BCI subjects in the data collection. All gave
informed consent, which had been reviewed and approved by the Institutional Review
Board of the National University of Singapore. The subjects were seated comfortably
in an armed chair, with their hands resting on the chair arms or on the table in front
of them. A 20-inch widescreen LCD monitor was placed on the table at a distance of
approximately 1 meter to the subject. Subjects were asked to remain still comfortably
Learning From Feedback Training Data in Self-paced Brain-Computer Interface 4
R
L
R
Calibration Feedback Training
R
Figure 1. The graphic user interface for calibration (left panel) and for self-paced
feedback training (right panel). The grey and blue color block scrolls smoothly upwards
in the background, and the red circle in the center serves as the eye-fixation point.
During feedback training, the horizontal position of the red circle serves as the feedback
signal that updates every 40 milliseconds, while its trajectory over the background
blocks is depicted by a red curve.
to avoid movement artifacts.
EEG was recorded using a Neuroscan NuAmps 40-channel data acquisition system,
with electrodes placed according to the extended international 10-20 system and a
sampling frequency of 500 Hz. A total of 30 channels were used: F7, F3,
Fz, F4, F8, FT7, FC3, FC4, FT8, T7, C3, Cz, C4, T8, TP7, CP3, CPz, CP4, TP8, P7,
P3, Pz, P4, P8, O1, Oz, O2, PO1 and PO2. The reference electrode was attached to the
right ear. A high-pass filter at 0.05 Hz was applied in the Neuroscan data acquisition
settings.
The subjects faced a graphic user interface displayed on the LCD monitor as
illustrated in Fig. 1, which guided them through the following sessions.
Calibration session. This session consisted of 40 MI tasks; each was 4 seconds
long and followed by a 6-second idle state. The MI tasks were evenly and pseudo-
randomly divided into left- and right-hand MI tasks. A graphic user interface,
illustrated in the left panel of Fig. 1, guided the subjects through the session, with
a red circle in the middle serving as the eye fixation point. In the background, a
sequence of rectangular shapes scrolled upwards, representing left/right hand
MI tasks by blue boxes on the left/right side, or idle state tasks by grey
bars. Specifically, when the red circle was in a grey bar, the subject should
relax while minimizing physical movements; otherwise, the subject should imagine
left/right hand movement, depending on whether a blue box was on the left or right
side of the circle.
The filter-bank CSP (FBCSP) [25, 26, 27] method, the winner of BCI
Competition IV Dataset I [28], was employed to build subject-specific MI detection
models. The method learned two separate models from the calibration data: one
for differentiating between left-hand MI and idle state (hereafter referred to as
the L-model), and the other for right-hand MI versus idle state (hereafter the R-
model). For the L-model (or the R-model), each 2.5-second-long shift window of
EEG, taken with a step of 0.5 seconds, was mapped to the label of the data: 0 if the time
window ends in an idle state period, 1 (or -1) if it ends in a left-hand (or right-hand)
MI period. The mapping parameters were obtained using the linear least-mean-
square method.
Since a user's mental state could be uncertain and varying during the transition
from one state to another, we defined a grey region of [-1, 1] seconds with
respect to the boundary of each idle/MI task, and excluded from FBCSP learning
any EEG segment whose center falls in a grey region.
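As an illustration, the windowing and labelling scheme above (2.5-second windows, 0.5-second steps, labels assigned by where a window ends, grey-region exclusion by window center) can be sketched as follows. This is our own sketch; the function and variable names are not from the paper:

```python
import numpy as np

FS = 500              # sampling rate (Hz)
WIN = int(2.5 * FS)   # 2.5-second shift window
STEP = int(0.5 * FS)  # 0.5-second step
GREY = 1.0            # grey region: [-1, 1] s around each task boundary

def window_labels(n_samples, tasks, fs=FS):
    """tasks: (start_s, end_s, label) tuples with label 0 (idle) or
    +1/-1 (left/right MI). Returns (start_index, label) pairs, skipping
    windows whose center falls inside a grey region."""
    boundaries = sorted({t for s, e, _ in tasks for t in (s, e)})
    out = []
    for start in range(0, n_samples - WIN + 1, STEP):
        center = (start + WIN / 2) / fs
        if any(abs(center - b) <= GREY for b in boundaries):
            continue                   # center inside a grey region
        end_t = (start + WIN) / fs
        for s, e, label in tasks:
            if s <= end_t <= e:        # label by where the window ends
                out.append((start, label))
                break
    return out

# One 4 s left-hand MI task followed by a 6 s idle period
pairs = window_labels(5000, [(0, 4, 1), (4, 10, 0)])
```

With this timing, windows whose centers fall within one second of the 4 s task boundary are dropped, leaving labelled windows only well inside each task period.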
Feedback training sessions. After calibration, each subject participated in 4
sessions of feedback training: 2 sessions of left-hand MI BCI training using
the L-model and 2 sessions of right-hand MI training using the R-model. This
arrangement allowed a subject to concentrate on a particular MI task in each
session. A training session consisted of 20 MI tasks, each lasting 5 seconds and
followed by a 6-second idle state. A graphical user interface, illustrated in
the right panel of Fig. 1, guided the user through the session. The display was
similar to that for calibration, except that the red circle moved
horizontally as a feedback signal: its horizontal position was determined by the
FBCSP output, updated every 40 milliseconds.
During feedback training, the subjects tried to move the red circle as far as possible
to the left/right during left-hand/right-hand MI tasks. We would like
to emphasize that the subjects were requested not to voluntarily control the
feedback signal by any means during idle state periods, because voluntary
control of the feedback signal would spoil the idle state data.
Short breaks were taken between sessions. The first feedback training session started
within 5 minutes after the calibration session, and the interval between consecutive
feedback sessions was 1 to 5 minutes. Note that a special tryout session took place
after calibration, in which every subject tried online feedback for a short while,
so as to get a feel for the feedback and to prepare for the actual training sessions.
The tryout session was not included in the analysis.
We would like to briefly introduce the FBCSP method used in the online
experiment, since it will also be compared with the proposed learning method later.
FBCSP was introduced in [25] as a feature selection algorithm that combines a filter bank
framework with the spatial filtering technique CSP. More specifically, it decomposes
EEG data into an array of pass-bands, performs CSP in each band, and selects a
reduced set of features from all the bands. Its efficacy was demonstrated in the latest
BCI Competition [28], where it served as the basis of all the winning algorithms in the
EEG categories. FBCSP was improved in [26] by employing a robust maximum mutual
information criterion for feature selection.
Figure 2. Online performance of subjects (Sub 1-3) in terms of mean square error
(MSE) between the feedback signal and the target, shown per session (sessions 1 and 2)
for left and right MI training. There is a strong bias shift (from calibration to feedback)
in the right motor imagery (MI) sessions of Subject 3, which explains his particularly large
error.
2.2. Data screening
The recorded EEG data during feedback training sessions were inspected visually
using MATLAB by the authors. Any EEG segments indentified of EOG and EMG
contamination [29] were rejected and excluded from the analysis. Again, we defined the
grey regions in a similar way to the calibration method described above. Therefore, any
EEG segments centered within [-1 1] second with respect to any task boundary were
excluded from the analysis.
2.3. Online performance and initial data analysis
Online performance was assessed using the mean-square-error (MSE) measure between
the feedback signal and the target signal. Fig. 2 plots the bar graph of MSE in each
feedback training session. The error was apparently comparable between the first
training session and the second in most cases. This actually indicates that online
feedback training in BCI can be a difficult task, since it was anticipated the subjects
should have gained better control of the BCI over training sessions. Again, this indicates
the necessity of adapting models during session to session transfers.
To further understand the feedback training data, we plot in Fig. 3 the distribution
of the EEG feature vector samples produced by FBCSP. Note that, for clarity of
presentation, we used evenly re-sampled feature vectors, because the original samples
number in the thousands. As expected, the MI class samples and the idle class samples
were easily separable in the calibration data, but in the same feature space the
discriminative information had disappeared in most feedback training sessions. As a
consequence, either there was no effective separation between the two classes, or the
separating hyper-plane was severely altered (similar to some cases in [7, 8]).
Therefore, it is advisable to first look into the issue of an ineffective feature space
before trying to adapt a classifier/regressor. To address this issue, we propose a new
Figure 3. Feature distributions during motor imagery (MI) calibration and feedback
training sessions (calibration, session 1 and session 2), for left MI in the upper three
rows and right MI in the lower three rows; columns correspond to Subjects 1-3. The
horizontal and vertical axes are the first and second FBCSP features. The axis range
is kept consistent within each column (i.e. each subject). Red circles represent motor
imagery samples, while black crosses denote idle state samples. Note the significant
change, especially in the distribution of the motor imagery samples.
method to learn an effective feature space from feedback data. We also note
that, compared with calibration data, online feedback training data pose more
challenges to effective feature extraction, because feedback may engage more brain
functions and produce more complex EEG phenomena [5, 15].
3. The new learning method
3.1. Spatio-Spectral Features
The primary phenomenon of MI EEG is event-related desynchronization (ERD) or event-
related synchronization (ERS) [2, 13]: the attenuation or increase of rhythmic activity
over the sensorimotor cortex, generally in the µ (8-14 Hz) and β (14-30 Hz) rhythms.
ERD/ERS can be induced both by imagined movements in healthy people and by intended
movements in paralyzed patients [21, 30, 31]. It is noteworthy that another neurological
phenomenon, the Bereitschaftspotential, is also associated with MI EEG but is non-
oscillatory [14]. In this work we consider ERD/ERS features only.
Feature extraction of ERD/ERS is, however, a challenging task due to its low
signal-to-noise ratio. Therefore, spatial filtering in conjunction with frequency selection
(via processing in either the temporal or the spectral domain) in multi-channel EEG
has been highly successful in increasing the signal-to-noise ratio [16, 32, 27, 33, 34].
Let us consider spatial-spectral filtering in the spectral domain, where each $n_c$-channel EEG segment with a sampling rate of $F_s$ Hz can be described by an $n_c \times n_f$ matrix
$$X = \begin{pmatrix} x_{11} & \cdots & x_{1 n_f} \\ \vdots & \ddots & \vdots \\ x_{n_c 1} & \cdots & x_{n_c n_f} \end{pmatrix} \qquad (1)$$
where $x_{ij}$ denotes the discrete Fourier transform of the $i$-th channel at frequency $\omega_j = \frac{j-1}{2 n_f} F_s$.
A joint spatial-spectral filter on $X$ can essentially be represented by a spatial filtering vector $\mathbf{w} \in \mathbb{R}^{n_c \times 1}$ and a spectral filter vector $\mathbf{f} \in \mathbb{R}^{n_f \times 1}$. The feature $y_0$ is the energy of the EEG segment after filtering:
$$y_0 = \mathrm{diag}\left\{ \widetilde{(\mathbf{w}^T X)}^T (\mathbf{w}^T X) \right\}^T \mathbf{f} \qquad (2)$$
where the tilde denotes the conjugate of a complex value, and $\mathrm{diag}(\cdot)$ stands for the diagonal vector of a matrix.
In this work we consider a general case in which multiple spatial filters are associated with one particular spectral filter. Therefore, the feature extraction model is determined by the matrix $F$ of spectral filters and a matrix $W$ collecting the spatial filters in columns:
$$W = [\mathbf{w}_1 \ldots \mathbf{w}_{n_w}] \qquad (3)$$
Supposing the spectral filters in $F$ are given (see the last paragraph of Section 3.3 for details), we can use the following shorthand for the auto-correlation matrix of EEG processed by the $k$-th spectral filter
$$\hat{X}_k = \sum_{i=1}^{n_f} f_{ki}\, \tilde{\mathbf{x}}_i \mathbf{x}_i^T \qquad (4)$$
where $\mathbf{x}_i$ is the $i$-th column of $X$, and express the logarithmic feature vector as
$$\mathbf{y} = \left[ \log(\mathbf{w}_1^T \hat{X}_1 \mathbf{w}_1), \ldots, \log(\mathbf{w}_{n_w}^T \hat{X}_{n_w} \mathbf{w}_{n_w}) \right]^T \qquad (5)$$
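To make Eqs. (1)-(5) concrete, the feature computation can be sketched numerically. This is our own illustration with random data standing in for an EEG segment; the spectral filter and the spatial filters here are placeholders, not learned ones:

```python
import numpy as np

rng = np.random.default_rng(0)
n_c, n_t = 5, 256                  # channels, time samples
eeg = rng.standard_normal((n_c, n_t))

# Eq. (1): spectral representation of one EEG segment
X = np.fft.rfft(eeg, axis=1)       # n_c x n_f complex matrix
n_f = X.shape[1]

# Placeholder filters: a crude band mask f and two random spatial filters
f = np.zeros(n_f)
f[8:15] = 1.0                      # pass a handful of frequency bins
W = rng.standard_normal((n_c, 2))  # n_w = 2 spatial filters

# Eq. (4): band-weighted auto-correlation matrix,
# X_hat = sum_i f_i * conj(x_i) x_i^T, with x_i the i-th column of X
X_hat = ((X * f) @ X.conj().T).real   # Hermitian, so the real part suffices

# Eq. (5): logarithmic feature vector, y_k = log(w_k^T X_hat w_k)
y = np.log(np.einsum('ck,cd,dk->k', W, X_hat, W))
```

Since the quadratic form of a real vector with a Hermitian matrix is real, only the real part of the auto-correlation matrix contributes to the features.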
3.2. Formulation of the objective function for learning
To capture the underlying complex structure of spatio-spectral data in ERD/ERS, we
would like to design a mutual information based objective function for learning W
and F. Mutual information [35], which stemmed from information theory, basically
measures the reduction of uncertainty about class labels due to the knowledge of the
features. Readers interested in mutual information-based feature extraction/selection
may find related works in [36, 37, 38, 39, 40, 41].
For feedback training data, we consider a mutual information measure $\hat{I}$ between the class labels and the EEG features together with the feedback signal. Specifically, the mutual information is between the class label (i.e. the variable to be predicted) and the observations, which include both the feedback signal and the EEG feature vector. Let the random variables of the label, the EEG feature vector and the feedback signal be $C$, $Y$ and $Z$, respectively. Then
$$\hat{I}(\{Y,Z\},C) = \hat{H}(Y,Z) - \sum_c P(c)\, \hat{H}(Y,Z \mid c) \qquad (6)$$
where $\hat{H}$ denotes the entropy measure of a random variable.
Like [41, 39], we resort to a non-parametric approach for mutual information estimation, since it does not rely on assumptions about the underlying distributions.
Suppose the feedback training data comprise $l$ samples of EEG, represented by the feature vectors $\mathbf{y}_i$ and the concurrent feedback signal $z_i$ ($i \in [1, \ldots, l]$). The non-parametric approach computes each entropy in Eq. 6 separately, e.g. $\hat{H}(Y,Z)$ by
$$\hat{H}(Y,Z) = -\frac{1}{l} \sum_{i=1}^{l} \log \left\{ \frac{1}{l} \sum_{j=1}^{l} \varphi_y(\mathbf{y}_i,\mathbf{y}_j)\, \varphi_z(z_i,z_j) \right\} \qquad (7)$$
Here $\varphi_y$ and $\varphi_z$ are kernel functions and usually take a Gaussian form, for example
$$\varphi(\mathbf{y},\mathbf{y}_i) = \alpha \exp\left( -\tfrac{1}{2} (\mathbf{y}-\mathbf{y}_i)^T \Psi^{-1} (\mathbf{y}-\mathbf{y}_i) \right) \qquad (8)$$
The coefficient $\alpha$ is discarded hereafter, because it cancels out when Eq. 8 is substituted into Eq. 7 and then into Eq. 6. It should be noted that the kernel size matrix $\Psi$ is diagonal, and each diagonal element is determined by
$$\psi_{k,k} = \zeta\, \frac{1}{l-1} \sum_{i=1}^{l} (y_{ik} - \bar{y}_k)^2 \qquad (9)$$
where $\bar{y}_k$ is the empirical mean of $y_k$, and we set the coefficient $\zeta = \left( \frac{4}{3l} \right)^{0.1}$ according to the normal optimal smoothing strategy [42].
The conditional entropy $\hat{H}(Y,Z \mid c)$ in Eq. 6 can be estimated similarly to Eq. 7, but using only the samples from class $c$.
Using the maximum mutual information principle [36], we now define the learning task as searching for the optimum spatial and spectral filters $W$ and $F$ that satisfy
$$\{W, F\}_{\mathrm{opt}} = \arg\max_{\{W,F\}} \hat{I}(\{Y,Z\},C) \qquad (10)$$
This formulation describes the inter-dependency between the target signal, the feedback signal and the EEG signal as a function of the feature extraction parameters in the spatial-spectral filters. It aims to maximize the information about the target signal to be predicted that is contained in the extracted features in conjunction with the feedback. Please refer to Section 5 for a further discussion of this formulation.
3.3. Gradient-based solution to the learning problem
Here we propose a numerical solution to Eq. 10 by devising a gradient-based optimization
algorithm. We consider a spatial filter vector wk, and note that the gradient of the
objective function ˆ
Iwith respect to wkis
wkˆ
I({Y,Z},C) = wkˆ
H(Y,Z)X
c∈C
P(c)wkˆ
H(Y,Z|c) (11)
From Eq. 7, we have
$$\nabla_{\mathbf{w}_k} \hat{H}(Y,Z) = -\frac{1}{l} \sum_{i=1}^{l} \beta_i\, \frac{1}{l} \sum_{j=1}^{l} \varphi_z(z_i,z_j)\, \frac{\partial \varphi_y(\mathbf{y}_i,\mathbf{y}_j)}{\partial \mathbf{w}_k} \qquad (12)$$
where
$$\beta_i = \left( \frac{1}{l} \sum_{j=1}^{l} \varphi_z(z_i,z_j)\, \varphi_y(\mathbf{y}_i,\mathbf{y}_j) \right)^{-1} \qquad (13)$$
From Eq. 8, we have
$$\frac{\partial \varphi_y(\mathbf{y}_i,\mathbf{y}_j)}{\partial \mathbf{w}_k} = -\frac{1}{2}\, \varphi_y(\mathbf{y}_i,\mathbf{y}_j)\, \frac{\partial (\mathbf{y}_i-\mathbf{y}_j)^T \Psi^{-1} (\mathbf{y}_i-\mathbf{y}_j)}{\partial \mathbf{w}_k} \qquad (14)$$
Let us denote the quadratic form $(\mathbf{y}_i-\mathbf{y}_j)^T \Psi^{-1} (\mathbf{y}_i-\mathbf{y}_j)$ by $\vartheta_{ij}$, which can be further decomposed as
$$\vartheta_{ij} = \sum_{k_1=1}^{d_o} \sum_{k_2=1}^{d_o} \psi^{-1}_{k_1 k_2} (y_{ik_1}-y_{jk_1})(y_{ik_2}-y_{jk_2}) \qquad (15)$$
Hence, the gradient of $\vartheta_{ij}$ is
$$\frac{\partial \vartheta_{ij}}{\partial \mathbf{w}_k} = \sum_{k_1=1}^{d_o} \sum_{k_2=1}^{d_o} \left[ \frac{\partial \psi^{-1}_{k_1 k_2}}{\partial \mathbf{w}_k} (y_{ik_1}-y_{jk_1})(y_{ik_2}-y_{jk_2}) + \psi^{-1}_{k_1 k_2}\, \frac{\partial (y_{ik_1}-y_{jk_1})(y_{ik_2}-y_{jk_2})}{\partial \mathbf{w}_k} \right] \qquad (16)$$
Consider that $(y_{ik_1}-y_{jk_1})(y_{ik_2}-y_{jk_2})$ is a function of $\mathbf{w}_k$ if and only if $k_1 = k$ and/or $k_2 = k$, and that $\psi^{-1}_{k_1 k_2}$ depends on $\mathbf{w}_k$ only if $k_1 = k_2 = k$. Furthermore, since $\Psi$ is diagonal, $\psi^{-1}_{k_1 k_2} = 0$ whenever $k_1 \neq k_2$. The gradient above can therefore be written as
$$\frac{\partial \vartheta_{ij}}{\partial \mathbf{w}_k} = \frac{\partial \psi^{-1}_{kk}}{\partial \mathbf{w}_k} (y_{ik}-y_{jk})^2 + \psi^{-1}_{kk}\, \frac{\partial (y_{ik}-y_{jk})^2}{\partial \mathbf{w}_k} \qquad (17)$$
From Eq. 9, we have
$$\frac{\partial \psi^{-1}_{k,k}}{\partial \mathbf{w}_k} = -\frac{2\zeta}{\psi^2_{k,k}(l-1)} \sum_{i'=1}^{l} (y_{i'k}-\bar{y}_k)\, \frac{\partial (y_{i'k}-\bar{y}_k)}{\partial \mathbf{w}_k} \qquad (18)$$
where $\bar{y}_k$ denotes the mean value of the $y_{i'k}$, and its partial derivative w.r.t. $\mathbf{w}_k$ can be expressed as
$$\frac{\partial \bar{y}_k}{\partial \mathbf{w}_k} = \frac{1}{l} \sum_{i''=1}^{l} \frac{\partial y_{i''k}}{\partial \mathbf{w}_k} \qquad (19)$$
We further note that $\hat{X}_{ki}$ (the auto-correlation matrix for the $i$-th EEG sample processed by the $k$-th spectral filter, see Eq. 4) is conjugate symmetric, and
$$\frac{\partial y_{ik}}{\partial \mathbf{w}_k} = \frac{(\hat{X}_{ki} + \hat{X}_{ki}^T)\, \mathbf{w}_k}{y_{ik}} = \frac{2\, \mathrm{Re}(\hat{X}_{ki})\, \mathbf{w}_k}{y_{ik}} \qquad (20)$$
where $\mathrm{Re}(\cdot)$ denotes the real part of a complex matrix. The derivatives of $y_{i'k}$ and $y_{jk}$ can be computed in the same way.
We can summarize the above steps as
$$\nabla_{\mathbf{w}_k} \hat{H}(Y,Z) = A\, \mathbf{w}_k \qquad (21)$$
where
$$A = \frac{2}{l^2} \sum_{i=1}^{l} \beta_i \sum_{j=1}^{l} \varphi_z(z_i,z_j)\, \varphi_y(\mathbf{y}_i,\mathbf{y}_j) \left[ -\frac{\zeta (y_{ik}-y_{jk})^2}{\psi^2_{k,k}(l-1)} \sum_{i'=1}^{l} (y_{i'k}-\bar{y}_k) \left( \frac{\mathrm{Re}(\hat{X}_{ki'})}{y_{i'k}} - \frac{1}{l} \sum_{i''=1}^{l} \frac{\mathrm{Re}(\hat{X}_{ki''})}{y_{i''k}} \right) + \psi^{-1}_{kk} (y_{ik}-y_{jk}) \left( \frac{\mathrm{Re}(\hat{X}_{ki})}{y_{ik}} - \frac{\mathrm{Re}(\hat{X}_{kj})}{y_{jk}} \right) \right] \qquad (22)$$
There will be, for each conditional entropy $\hat{H}(Y,Z \mid c)$, an equation similar to Eq. 21 defining a matrix $A_c$. The gradient of the objective function $\hat{I}$ with respect to the spatial filter $\mathbf{w}_k$ is then
$$\nabla_{\mathbf{w}_k} \hat{I}(\{Y,Z\},C) = \left( A - \sum_c P(c)\, A_c \right) \mathbf{w}_k \qquad (23)$$
We note that this equation does not make the gradient a linear function of $\mathbf{w}_k$, since the multiplier $\left( A - \sum_c P(c) A_c \right)$ is itself a rather complicated function of $\{\mathbf{y}_i\}$, which in turn is a function of $W$.
With the gradient information, our iterative optimization algorithm updates a spatial filter by
$$\mathbf{w}_k^{(\mathrm{iter}+1)} = \mathbf{w}_k^{(\mathrm{iter})} + \lambda\, \nabla_{\mathbf{w}_k} \hat{I}(\{Y^{(\mathrm{iter})},Z\},C) \qquad (24)$$
where $\lambda$ is the step size. In this work, we use a line search procedure to determine the step size in each iteration. Note that all spatial filter vectors in $W$ are updated together.
In our implementation, the line search procedure tests a number of (tentatively 16) $\lambda$ values in the range $[-0.05, 0.10] \times \xi$, and decreases $\xi$ on a logarithmic scale until a local maximum of $\hat{I}$ is found that is not at $\lambda = 0$. The $\lambda$ at the local maximum is then used to update all the spatial filters $\mathbf{w}_k$ in Eq. 24, and the optimization proceeds to the next iteration.
The iterations terminate when a convergence criterion is met. In this work, we use a simple criterion: a mutual information gain of less than $10^{-5}$. Since the iterative algorithm is a typical gradient-based greedy optimization method, the pseudo-code is omitted to save space.
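The iterative update with the line search can be sketched generically; here a simple concave toy objective stands in for $\hat{I}$, while the 16 trial values, the $[-0.05, 0.10]\times\xi$ range, and the $10^{-5}$ stopping criterion follow the text (function names are ours):

```python
import numpy as np

def line_search_ascent(obj, grad, w0, max_iter=100, tol=1e-5):
    """Gradient ascent with the paper-style line search: test 16 step
    sizes lambda in [-0.05, 0.10] * xi, shrinking xi on a log scale
    until a local maximum away from lambda = 0 is found."""
    w = w0.astype(float).copy()
    for _ in range(max_iter):
        g = grad(w)
        step = None
        xi = 1.0
        for _ in range(20):                   # shrink xi logarithmically
            lams = np.linspace(-0.05, 0.10, 16) * xi
            vals = [obj(w + lam * g) for lam in lams]
            best = int(np.argmax(vals))
            if vals[best] > obj(w):           # improvement off lambda = 0
                step = lams[best]
                break
            xi *= 0.1
        if step is None:                      # no ascent direction found
            break
        gain = obj(w + step * g) - obj(w)
        w = w + step * g
        if gain < tol:                        # convergence criterion
            break
    return w

# Toy concave objective standing in for the mutual information estimate
obj = lambda w: -np.sum((w - 2.0) ** 2)
grad = lambda w: -2.0 * (w - 2.0)
w_opt = line_search_ascent(obj, grad, np.zeros(3))
```

On this toy surface, the procedure climbs from the origin to the maximizer at (2, 2, 2) and stops once the per-iteration gain falls below the threshold.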
The initial values of $\mathbf{w}_k$ can be learned by the CSP method [16], which maximizes the Rayleigh quotient
$$\frac{\mathbf{w}_k^T \sum_{i=1}^{l_1} \hat{X}_{ki}\, \mathbf{w}_k}{\mathbf{w}_k^T \sum_{j=1}^{l_0} \hat{X}_{kj}\, \mathbf{w}_k} \qquad (25)$$
where $\hat{X}_{ki}$ denotes the $i$-th sample of motor imagery EEG and $\hat{X}_{kj}$ the $j$-th sample of idle state EEG.
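The CSP initialization of Eq. (25) amounts to a generalized eigenvalue problem. A sketch, using `scipy.linalg.eigh` and synthetic auto-correlation matrices (the demo data and names are ours):

```python
import numpy as np
from scipy.linalg import eigh

def csp_init(mi_mats, idle_mats):
    """Initial spatial filters per Eq. (25): maximize w^T S1 w / w^T S0 w
    by solving the generalized symmetric eigenproblem S1 v = lambda S0 v."""
    S1 = np.sum(mi_mats, axis=0)      # summed MI auto-correlations
    S0 = np.sum(idle_mats, axis=0)    # summed idle-state auto-correlations
    evals, evecs = eigh(S1, S0)       # eigenvalues in ascending order
    return evecs[:, ::-1]             # columns sorted by decreasing ratio

# Synthetic demo: channel 0 carries extra power during MI
rng = np.random.default_rng(2)

def sample_mats(scale0, n=20):
    mats = []
    for _ in range(n):
        e = rng.standard_normal((3, 100))
        e[0] *= scale0                # boost channel 0
        mats.append(e @ e.T)
    return mats

W0 = csp_init(sample_mats(3.0), sample_mats(1.0))
w_lead = W0[:, 0]                     # leading filter
```

The leading eigenvector concentrates on the channel whose power differs most between the two conditions, which is exactly the behaviour the Rayleigh quotient rewards.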
Finally, we describe how the spectral filters in $F$ are selected. As in FBCSP, we create a set of candidate spectral filters consisting of band-pass filters that cover the motor imagery EEG spectrum. For instance, in the experimental study introduced in the next section, we borrowed the filter bank configuration from [26], which has 8 band-pass filters with central frequencies ranging from 4 to 32 Hz. After band-pass filtering in the spectral domain, we trained CSP according to Eq. 25 to extract discriminative energy features. We then selected the optimum $n_w$ features from all of them, using the method in [26]. The spectral filters associated with the optimum features then comprise the matrix $F$.
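A filter bank along these lines can be sketched with standard tools. The 8 bands with centres from 4 to 32 Hz follow the text; the 4 Hz bandwidth and the Butterworth design are our illustrative choices, not the exact configuration of [26]:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 500                                            # sampling rate (Hz)
bands = [(c - 2, c + 2) for c in range(4, 36, 4)]   # 8 bands, centres 4-32 Hz

def filter_bank(eeg):
    """eeg: (n_c, n_t) array -> list of band-pass filtered copies."""
    out = []
    for lo, hi in bands:
        sos = butter(4, [lo, hi], btype='bandpass', fs=FS, output='sos')
        out.append(sosfiltfilt(sos, eeg, axis=1))   # zero-phase filtering
    return out

rng = np.random.default_rng(3)
eeg = rng.standard_normal((3, 1000))
banks = filter_bank(eeg)
```

Each band-passed copy then feeds a separate CSP, and the most discriminative band/filter combinations are retained.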
4. Results
We conducted an offline simulation of the self-paced BCI using the online feedback
training data. The simulation ran in MATLAB, and the proposed method was
implemented in a mix of MATLAB and C code to improve computation and
programming efficiency. The EEG features, together with the feedback signal $z$, served
as the inputs to a regressor (please refer to the Discussion section below for a related
discussion), in order to predict the target value of 0 (idle state), -1 (right-hand MI) or
1 (left-hand MI). We employed linear support vector regression using the LibSVM
toolbox [43]. Note that we also tried other regression methods, such as Gaussian-kernel
non-linear support vector regression and linear mean-square-error regression, but no
significant difference was found in the results, so we only show the linear support
vector regression results here.

Figure 4. Optimization on the mutual information surface: an example with a spatial
filter vector for three-channel EEG. The axes are the spherical coordinates $\phi$ and $\theta$
of the spatial filter; the markers trace the path from the CSP initialization ("CSP") to the
solution found by this method ("This Method"). See Section 4.1 for details.
As in the online feedback training described in Section 2, the offline simulation
tested the left-hand MI BCI and the right-hand MI BCI separately. For example, for the
left-hand MI BCI, the first left-hand MI training session was used to learn the optimum
spatial-spectral filtering, and the linear support vector regressor was then trained. Next,
the feature extraction and regression were tested on the second left-hand MI training
session. The simulation used a 2-second-long shift window with a step of 0.4 seconds.
For a comparative analysis with the state of the art, we also tested FBCSP
using the same settings.
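The offline train/test protocol can be sketched as follows. Here a linear least-squares regressor stands in for the LibSVM linear support vector regression (the paper reports no significant difference among the linear regression variants tried), and the session data are random placeholders for the extracted features and the feedback signal:

```python
import numpy as np

rng = np.random.default_rng(4)

def make_session(n=300):
    """Placeholder session: regressor inputs (features, feedback z, bias)
    and targets (0 = idle, 1 = left-hand MI)."""
    target = rng.choice([0.0, 1.0], size=n)
    feats = rng.standard_normal((n, 2)) + 2.0 * target[:, None]
    z = 0.5 * target + 0.2 * rng.standard_normal(n)   # noisy feedback
    X = np.column_stack([feats, z, np.ones(n)])
    return X, target

X_train, t_train = make_session()  # first feedback session: training
X_test, t_test = make_session()    # second feedback session: testing

# Fit on session 1, evaluate MSE on session 2
coef, *_ = np.linalg.lstsq(X_train, t_train, rcond=None)
mse = np.mean((X_test @ coef - t_test) ** 2)
```

With informative features and feedback, the held-out MSE falls well below the variance of the targets themselves.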
4.1. Convergence of the optimization algorithm
We studied the convergence of the optimization algorithm. First, we considered a simple
scenario involving only three EEG channels (CP3, CPz, CP4) and one spatial filter.
We note that similar findings were obtained in extensive tests using different
selections of channels around the sensorimotor cortex, e.g. C3, Cz, C4.
Since the mutual information measure is invariant to the (non-zero) norm of the
spatial filter, we set the norm of the spatial filter to 1 without loss of generality.
The spatial filter can therefore be represented by two variables in the spherical
coordinate system: $\theta = \mathrm{acos}(w_3)$ and $\phi = \mathrm{atan}(w_2 / w_1)$. This should not be confused
with the Euclidean space where the actual optimization takes place; the two-variable
representation is only meant for visualization.
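The two-variable representation can be sketched as follows (we use `arctan2` in place of a plain arctangent to keep the quadrant information; the mapping itself follows the text):

```python
import numpy as np

def to_spherical(w):
    """Map a 3-d spatial filter to (theta, phi) for visualization,
    with theta = acos(w3) and phi = atan(w2 / w1), after normalizing
    w to unit norm (the MI measure is invariant to the norm)."""
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)
    return np.arccos(w[2]), np.arctan2(w[1], w[0])

# A filter pointing along the first channel sits on the "equator"
theta, phi = to_spherical([1.0, 0.0, 0.0])
```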
Fig. 4 shows a typical example from the left-hand MI learning for Subject 2. The
spatial filter solution migrated in 4 steps from the initial point (generated by CSP) to
approximately a local maximum, where the iteration converged (mutual information gain
$< 10^{-5}$).
The algorithm was initialized using the method described in the previous section,
Figure 5. Feature distributions produced by the proposed learning method for the
left/right motor imagery (MI) feedback training session 2, for Subjects 1-3. The
horizontal and vertical axes are, respectively, the first and second features learned
by the method. The graphs in the upper row are generated from left MI training data,
and those in the lower row from right MI training data. Red circles represent motor
imagery samples, while black crosses denote idle state samples. See Fig. 3 (especially
the bottom row, for the same session) for comparison.
and then in most cases the optimization algorithm converged within 7 iterations. We
also tested random spatial filters for initialization, and the iteration procedure generally
became longer but converged within 50 iterations in all 100 test runs.
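The iteration logic can be illustrated with a generic gradient-ascent skeleton. The objective and its gradient here are toy stand-ins, not the paper's kernel-based mutual information estimate; only the stopping rule (gain below 1e-5, at most 50 iterations) mirrors the text:

```python
import numpy as np

def gradient_ascent(objective, grad, w0, lr=0.1, tol=1e-5, max_iter=50):
    """Iterate until the gain in the objective falls below tol."""
    w = np.asarray(w0, dtype=float)
    f = objective(w)
    for it in range(1, max_iter + 1):
        w_new = w + lr * grad(w)
        f_new = objective(w_new)
        if f_new - f < tol:        # converged: negligible gain
            return w_new, f_new, it
        w, f = w_new, f_new
    return w, f, max_iter

# Toy smooth objective with a unique maximum at w = (1, 2)
obj = lambda w: -np.sum((w - np.array([1.0, 2.0])) ** 2)
grd = lambda w: -2.0 * (w - np.array([1.0, 2.0]))
w, f, iters = gradient_ascent(obj, grd, [0.0, 0.0])
print(np.round(w, 2), iters)
```

On a smooth surface such as the one in Fig. 4, this greedy scheme reaches the nearby local maximum; the initialization only determines which basin is climbed.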
4.2. Feature distributions
We used the first feedback training session to learn 2 spatial-spectral filters by the
proposed method, and extracted EEG features from the second feedback session. Fig. 5
plots the distribution of the features (as the original samples number in the thousands,
we plot evenly re-sampled feature vectors for clarity).
Compared with the features produced by the calibration models in Fig. 3 (especially
in the bottom row, for the same training session), the new features appear to be more
separable between the MI classes and the idle states. To verify this, we assessed the
separability in terms of the classification accuracy of a linear support vector machine
(using the same LibSVM toolbox from [43]). The results on the original features and on
the new features are compared in Table 1.
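The separability assessment can be sketched as follows. The paper trains a linear SVM with the LibSVM toolbox [43]; since that toolbox is not assumed here, a least-squares linear classifier stands in for the SVM, and the synthetic two-dimensional features are purely illustrative:

```python
import numpy as np

def linear_separability(X, y):
    """Fit a least-squares linear classifier (a stand-in for the paper's
    linear SVM) and return its training accuracy as a separability score."""
    A = np.hstack([X, np.ones((len(X), 1))])            # add bias column
    w, *_ = np.linalg.lstsq(A, 2 * y - 1, rcond=None)   # targets in {-1, +1}
    pred = (A @ w > 0).astype(float)
    return (pred == y).mean()

rng = np.random.default_rng(0)
# Synthetic 2D features: motor imagery (class 1) vs idle state (class 0)
X = np.vstack([rng.normal([1.5, 1.5], 0.5, (200, 2)),
               rng.normal([0.0, 0.0], 0.5, (200, 2))])
y = np.hstack([np.ones(200), np.zeros(200)])
print(f"separability (accuracy): {linear_separability(X, y):.2f}")
```

Higher accuracy of a linear decision rule on the feature samples corresponds to better class separability in Table 1.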
The table clearly indicates that the proposed method, which adapted both the
classifier and the feature extraction model, produced significantly better class
separability than adapting the classifier alone. This verifies our argument in the
introduction that the non-stationarity in EEG may not be solved by adapting
classifiers alone; rather, it is advisable to adapt both the feature extraction
model and the classifier so as to accurately capture the variation of EEG over time.
Features                  Sub 1    Sub 2    Sub 3
Left MI    Original       73.7%    79.0%    66.9%
           This Method    85.0%    84.8%    81.0%
Right MI   Original       67.9%    59.7%    78.1%
           This Method    80.0%    69.6%    84.0%

Table 1. Class separability: new feature space ("This Method") versus original feature
space ("Original"). Class separability is measured as the classification accuracy of a
linear support vector machine adapted to the data (feedback training session 2).
Note that "Original" uses adaptation of the classifier only, while "This Method" adapts
both the classifier and the feature extraction model. The higher accuracy of the
two feature spaces is shown in bold. See Section 4.2 for the related description.
4.3. Accuracy of feedback control prediction
We investigated whether the new features can generate better predictions of the user
state. We also wished to test adaptation of the regressor, since the classification
hyperplane may have shifted from the first feedback session to the second. Therefore, we
tested a supervised adaptation that used a portion of the second feedback session (called
the adaptation data, taken from the beginning of the session) to re-train the regressor
together with the first feedback session data, and tested the models on the remainder of
the second feedback session. We examined different sizes for the adaptation data in
terms of the percentage of the whole session, ranging from 0 (i.e. no adaptation) to 0.45.
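The data split used for supervised adaptation can be sketched as below; the array contents are stand-ins for feature samples, and only the split logic reflects the protocol in the text:

```python
import numpy as np

def adaptation_split(session1, session2, frac):
    """Split session-2 data: the leading `frac` portion joins session 1
    as re-training data; the remainder is held out as the test set."""
    n_adapt = int(len(session2) * frac)
    train = np.concatenate([session1, session2[:n_adapt]])
    test = session2[n_adapt:]
    return train, test

s1 = np.arange(100)          # stand-in for session-1 feature samples
s2 = np.arange(100, 200)     # stand-in for session-2 feature samples
for frac in (0.0, 0.45):
    train, test = adaptation_split(s1, s2, frac)
    print(len(train), len(test))   # 100 100, then 145 55
```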
FBCSP was also evaluated using the same method for comparison. The comparative
results are illustrated in Fig. 6. Evidently, both FBCSP and the proposed method learn
a much more accurate predictor from the first feedback session than the original BCI
that used only the calibration data. Furthermore, the prediction error was also
effectively reduced by the supervised adaptation, although this improvement is not as
large as that observed from the original BCI to the proposed method. The proposed
method also consistently outperformed FBCSP, significantly so in most cases.
We examined the impact of the new method on the feedback signal curves. Fig. 7
compares the new feedback signal with the original feedback signal for Subject 2.
Clearly, the new feedback signal followed the target curve much more accurately.
We also investigated whether the new method works with a reduced set of channels.
In particular, we tested 15, 9 and 6 channels (see Table 2 for the channel names),
ran the proposed method and FBCSP using the same procedure described
above (see Figure 6), and performed t-tests to check whether our method produced lower
MSE with statistical significance compared with FBCSP and the original feedback training
result.

Figure 6. Comparison of prediction error in terms of mean square error (MSE) by
different methods. The horizontal axis denotes the percentage of the second feedback
session used for re-training the support vector regression machine that maps EEG
features to the target signal. For the original online feedback there is no re-training,
but the MSE is computed at each percentage point using the same test set. The test set
is the second feedback session excluding the part used for regressor re-training. The
curves plot the average MSE over the three subjects, while the length of the vertical
line centered at each point represents the standard deviation. See Section 4.3 for the
related description.

Figure 7. Comparison between the target, the original feedback signal and the new
prediction by the proposed method, in an example from Subject 2's left motor imagery
training session. The timing alternates between approximately 5-second motor
imagery (target = 1) and 6-second idle state (target = 0), except for the first idle-state
period, which is slightly longer.
The results indicate that the new method improved the performance in terms of
MSE with statistical significance for all the channel sets tested. Compared with FBCSP,
it still yielded significantly lower MSE with as few as 9 channels. With 6 channels, the
method and FBCSP produced comparable results, while both significantly outperformed
the original model constructed from calibration data only.
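The significance test can be sketched as a paired t-test on matched MSE values. The per-split MSE numbers below are hypothetical, not the paper's data, and only the t statistic is computed (a t-distribution table or scipy.stats.ttest_rel would supply the p-value):

```python
import math

def paired_t(a, b):
    """Paired t statistic for matched MSE samples a (method) and b (baseline)."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)   # sample variance
    return mean / math.sqrt(var / n)

# Illustrative per-split MSEs: proposed method vs FBCSP (hypothetical numbers)
mse_new = [0.18, 0.20, 0.17, 0.19, 0.21, 0.18]
mse_fbcsp = [0.24, 0.26, 0.22, 0.25, 0.27, 0.23]
t = paired_t(mse_new, mse_fbcsp)
print(f"t = {t:.2f} on {len(mse_new) - 1} degrees of freedom")
```

A large negative t (beyond the two-tailed critical value, about 2.57 for 5 degrees of freedom at p < 0.05) indicates the method's MSE is significantly lower than the baseline's.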
Figure 8. Comparison of prediction error in terms of mean square error (MSE)
by different methods using 9 EEG channels only. See Figure 6 and Section 4.3 for
descriptions.
#Ch   Data       This vs FBCSP   This vs Original   Channel Names
All   Left MI    <0.01           <0.01              All 30 channels (see Section 2)
      Right MI   <0.04           <0.01
15    Left MI    <0.01           <0.01              F3,F4,FC3,FCz,FC4,T3,Cz,C4,T4,CP3,CPz,CP4,P3,P4
      Right MI   0.09            <0.01
9     Left MI    <0.01           <0.01              FC3,FCz,FC4,C3,Cz,C4,CP3,CPz,CP4
      Right MI   0.86            <0.01
6     Left MI    0.48            <0.01              FC3,FC4,C3,C4,CP3,CP4
      Right MI   0.93            <0.01

Table 2. Statistical paired t-test p-values comparing the new method's MSE with that
of FBCSP or with the original feedback training result, for different numbers of
channels. Significant results (p-value < 0.05) are shown in bold.
5. Discussion
Figure 6 gives clear evidence that the proposed method, using the new spatial-spectral
learning algorithm, can significantly increase the prediction accuracy. The mean
MSE for left (or right) MI feedback training was effectively reduced from approximately
0.3 (or 0.5) to approximately 0.2 (or 0.25). The improved accuracy can also be
seen in the prediction curves for the example case in Fig. 7, which shows a
reduction of MSE from 0.24 to 0.13.
The increased accuracy can be largely attributed to the improved feature space
shown in Fig. 5, in contrast to the original feature spaces in Fig. 3. The original
feature space used in feedback training was built from the calibration data. The
changes of the feature distributions in the original feature space highlight the
effect of session-to-session transfer, which is generally consistent with prior studies on
adaptive BCI. During feedback sessions, the motor imagery EEG and the idle-state
EEG were predominantly non-separable in the original feature space, and even where
they were separable, the features were subject to distribution shift. The new feature
space, on the other hand, was learned from feedback training data comprising three
sources of information, namely the EEG, the target signal and the feedback signal.
It was therefore able to capture the essential information for user state prediction
during online feedback training.
It is also worth mentioning again that the new model uses a non-parametric
formulation for learning, which aims to account for arbitrary dependencies among the
EEG, target and feedback signals. Section 4.1 has shown that the optimization
algorithm derived from this formulation has good convergence properties. Fig. 4 has
shown that the objective function surface for the 3-channel EEG data is smooth, which
is a favorable condition for the greedy algorithm. However, we expect that the mutual
information surface can become far more complicated, especially for EEG data with a
large number of channels. Future research may therefore investigate more advanced
optimization techniques, although such techniques would usually incur much heavier
computational costs.
While this work has focused on the development and validation of a new learning
method for adaptive BCI, it would be interesting to investigate its performance during
online training. Though beyond the scope of this paper, this is within the scope
of our ongoing research. Generally, a large number of subjects would be required in
order to draw statistically significant comparisons between adaptive and non-adaptive
BCI systems.
It is also interesting to look back at the formulation of the objective function in
Section 3.2. As stated earlier, the goal is to maximize the information about the target
signal to be predicted that is contained in the EEG features in conjunction with the
feedback. It is therefore advisable to include both the new EEG features and the
prediction outputs of the current model as inputs to the classifier or regression machine
in the new model. Importantly, the feedback serves two purposes: not only does it act as
a visual "stimulus" to the subject, but it also represents the current prediction model,
which contains essential information extracted from earlier calibration/feedback sessions.
The first role matters because the feedback, and its position relative to the target signal,
may affect brain activation and thereby complicate the motor imagery EEG. The second
role gives rise to multiple implications. First, the formulation considers only the output
of the current BCI model, not its internal mechanism; it can therefore work with any
BCI model and adapt it during new feedback training sessions. Secondly, if a user with
a given prediction model can control the feedback signal to match the target signal
satisfactorily during a feedback session, further re-adaptation of the prediction model
may be unnecessary, as co-adaptation of user and machine has already been achieved.
This can also be viewed as a special case of the objective function in Eq. 10: if the
feedback variable Z in the objective function already carries the essential information
about the target signal C, re-adapting the BCI by including new EEG features would
produce no significant gain in the objective function.
We would like to emphasize again that the proposed method works in a supervised
learning fashion; in other words, it requires the data labels for adaptive learning.
Unlike unsupervised or semi-supervised online learning approaches, this enables the
learning system to measure the compliance of a subject with the BCI tasks, so as to
ensure the stability of the adaptation process.
The proposed method with the current solution may be better suited to offline
adaptation than to online adaptation. In online adaptation, user training and
machine adaptation take place at the same time, whereas in offline adaptation, machine
adaptation is performed after the user finishes a training session. Although the method
is applicable to online adaptation, its expensive computation can be a serious concern
for practical online use. We estimate that the computational complexity of computing
the gradient by Eq. 23 and Eq. 22 is on the order of O(l^2 n_c^2), and that of evaluating
the objective function by Eq. 7 and Eq. 6 is O(l^2 n_c), where l denotes the number of
samples and n_c the number of channels. In our experimental setup for the results
presented in Section 4, we implemented the learning code using hybrid MATLAB and C
coding without multi-threading. On our test computer with a Xeon CPU at 2.93 GHz,
the code took approximately 130 seconds to complete one iteration for n_c = 30-channel
EEG data, or 18 seconds for n_c = 6-channel EEG data, both with l = 2230 time-segment
samples. The primary cause of the high computational complexity is the non-parametric
(kernel-based) nature of the method, which requires computation over each pair of
samples. A possible solution is therefore to reduce the number of samples used for
adaptation without losing useful information.
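The quadratic dependence on the number of samples l arises because a kernel must be evaluated for every pair of samples. The sketch below illustrates this cost and the sample-reduction remedy; the Gaussian kernel and bandwidth are illustrative stand-ins, not the paper's exact estimator:

```python
import numpy as np

def pairwise_kernel(X, sigma=1.0):
    """Gaussian kernel over all sample pairs: O(l^2) entries for l samples."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared distances
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((2230, 6))        # l = 2230 samples, 6 channels
K_full = pairwise_kernel(X)
print(K_full.shape)                       # (2230, 2230) kernel entries

# Keeping every 4th segment cuts the pairwise cost roughly 16-fold
X_sub = X[::4]
K_sub = pairwise_kernel(X_sub)
print(K_sub.shape)                        # (558, 558)
```

Because the cost scales with l^2, even modest subsampling yields large savings, which is why reducing the adaptation set without losing informative samples is an attractive direction.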
6. Conclusion
In this paper we have studied and addressed the critical issue of session-to-session
transfer in brain-computer interface (BCI). While previous studies have often focused
on adaptation of classifiers, we have shown the importance of and the feasibility
of adapting feature extraction models within a self-paced BCI paradigm. First, we
conducted calibration and feedback training on able-bodied naïve subjects using a new
self-paced motor imagery BCI including idle state. The online results suggested that
the feature extraction models built from calibration data may not generalize well to
feedback sessions. Hence, we have proposed a new supervised adaptation method that
learns from feedback data to construct a more appropriate model for feedback training.
Specifically, we have formulated the learning objective as a maximization of kernel-
based mutual information estimation with respect to spatial-spectral filters. We have
also derived a gradient-based optimization algorithm for the learning task. We have
conducted an experimental study through offline simulations and the results suggest
that the proposed method can significantly increase prediction accuracies for feedback
training sessions.
[1] J.R. Wolpaw, N. Birbaumer, D.J. MacFarland, G. Pfurtscheller, and T.M. Vaughan. Brain-
computer interface for communication and control. Clinical Neurophysiology, 113:767–791, 2002.
[2] G. Pfurtscheller, C. Neuper, D. Flotzinger, and M. Pregenzer. EEG-based discrimination
between imagination of right and left hand movement. Electroencephalography and Clinical
Neurophysiology, 103:642–651, 1997.
[3] A. Nijholt and D. Tan. Brain-computer interfacing for intelligent systems. IEEE Intelligent
Systems, 23:72–79, 2008.
[4] Jose del R. Millan, Anna Buttfield, C. Vidaurre, M. Krauledat, A. Schlogl, P. Shenoy, B. Blankertz,
R.P.N. Rao, R. Cabeza, Gert Pfurtscheller, and K. R. Mueller. Adaptation in Brain-Computer
Interfaces. In G. Dornhege, Jose del R. Millan, T. Hinterberger, D. McFarland, and K. R.
Mueller, editors, Towards Brain-Computer Interfacing. The MIT Press, 2007.
[5] P. Shenoy, M. Krauledat, B. Blankertz, R. P. N. Rao, and K.-R. Muller. Towards adaptive
classification for BCI. Journal of Neural Engineering, 3(1):13–23, 2006.
[6] A. Buttfield, P.W. Ferrez, and J. d. R. Millan. Towards a robust BCI: error recognition and online
learning. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 14:164–168,
2006.
[7] C. Vidaurre, A. Schlogl, R. Cabeza, R. Scherer, and G. Pfurtscheller. A fully online adaptive BCI.
IEEE Transactions on Biomedical Engineering, 53:1214–1219, 2006.
[8] C. Vidaurre, M. Kawanabe, P. von Bunau, B. Blankertz, and K.R. Muller. Toward unsupervised
adaptation of LDA for brain-computer interfaces. IEEE Transactions on Biomedical Engineering,
58:587–597, 2011.
[9] A. Lenhardt, M. Kaper, and H.J. Ritter. An adaptive P300-based online brain-computer interface.
IEEE Transactions on Neural Systems and Rehabilitation Engineering, 16:1–11, 2008.
[10] C. Vidaurre, A. Schlogl, R. Cabeza, R. Scherer, and G. Pfurtscheller. Study of on-line adaptive
discriminant analysis for EEG-based brain computer interfaces. IEEE Transactions on Biomedical
Engineering, 54:550–556, 2007.
[11] Yuanqing Li and Cuntai Guan. An extended EM algorithm for joint feature extraction and
classification in brain-computer interfaces. Neural Computation, 18:2730–2761, 2006.
[12] B. Blankertz, M. Kawanabe, R. Tomioka, F. Hohlefeld, V. Nikulin, and K.-R. Müller. Invariant
common spatial patterns: Alleviating nonstationarities in brain-computer interfacing. In
Advances in Neural Information Processing Systems, pages 113–120. MIT Press, Cambridge,
MA, 2008.
[13] J. Muller-Gerking, G. Pfurtscheller, and H. Flyvbjerg. Designing optimal spatial filtering of single
trial EEG classification in a movement task. Clinical Neurophysiology, 110:787–798, 1999.
[14] B. Blankertz, G. Dornhege, C. Schafer, R. Krepki, J. Kohlmorgen, K.-R. Müller, V. Kunzmann,
F. Losch, and G. Curio. Boosting bit rates and error detection for the classification of fast-paced
motor commands based on single-trial EEG analysis. IEEE Transactions on Neural Systems
and Rehabilitation Engineering, 11:127–131, 2003.
[15] G. Schalk, J.R. Wolpaw, D.J. McFarland, and G. Pfurtscheller. EEG-based communication:
presence of an error potential. Clinical Neurophysiology, 111:2138–2144, 2000.
[16] H. Ramoser, J. Muller-Gerking, and G. Pfurtscheller. Optimal spatial filtering of single trial EEG
during imagined hand movement. IEEE Transactions on Rehabilitation Engineering, 8(4):441–
446, 2000.
[17] M. Sugiyama, M. Krauledat, and K.R. Mueller. Covariance shift adaptation by importance
weighted cross validation. Journal of Machine Learning Research, 8:985–1005, 2007.
[18] Y. Li, H. Kambara, Y. Koike, and M. Sugiyama. Application of covariate shift adaptation
techniques in brain–computer interfaces. IEEE Transactions on Biomedical Engineering,
57:1318–1324, 2010.
[19] Haihong Zhang and Cuntai Guan. A maximum mutual information approach for constructing a 1D
continuous control signal at a self-paced brain–computer interface. Journal of Neural Engineering,
7(5):056009, 2010.
[20] S. G. Mason and G. E. Birch. A brain-controlled switch for asynchronous control applications.
IEEE Transactions on Rehabilitation Engineering, 47:1297–1307, 2000.
[21] A. Kübler, F. Nijboer, J. Mellinger, T. M. Vaughan, H. Pawelzik, G. Schalk, D. J. McFarland,
N. Birbaumer, and J. R. Wolpaw. Patients with ALS can use sensorimotor rhythms to operate
a brain-computer interface. Neurology, 64:1775–1777, 2005.
[22] H. Zhang, C. Guan, and C. Wang. Asynchronous P300-based brain-computer interfaces: A
computational approach with statistical models. IEEE Transactions on Biomedical Engineering,
55(6):1754–1763, 2008.
[23] B. Blankertz, G. Dornhege, M. Krauledat, K.-R. Müller, and G. Curio. The non-invasive
Berlin brain-computer interface: Fast acquisition of effective performance in untrained subjects.
NeuroImage, 37(2):539–550, 2007.
[24] F. Galan, M. Nuttin, E. Lew, P.W. Ferrez, G. Vanacker, J. Philips, and J. del R. Millán. A brain-
actuated wheelchair: Asynchronous and non-invasive brain-computer interfaces for continuous
control of robots. Clinical Neurophysiology, 119:2159–2169, 2008.
[25] K. K. Ang, Z. Y. Chin, H. Zhang, and C. Guan. Filter bank common spatial pattern (FBCSP) in
brain-computer interface. In International Joint Conference on Neural Networks (IJCNN2008),
pages 2391–2398, 2008.
[26] H. Zhang, C. Guan, and C. Wang. Spatio-spectral feature selection based on robust mutual
information estimate for brain-computer interfaces. In Annual International Conference of the
IEEE Engineering in Medicine and Biology Society, pages 2391–2398, 2009.
[27] H. Zhang, Z. Y. Chin, K. K. Ang, C. Guan, and C. Wang. Optimum spatio-spectral filtering
network for brain–computer interface. IEEE Transactions on Neural Networks, 22:52–63, 2011.
[28] BCI Competition IV. http://www.bbci.de/competition/.
[29] M. Fatourechi, A. Fatourechi, R.K. Ward, and G.E. Birch. EMG and EOG artifacts in brain
computer interface systems: A survey. Clinical Neurophysiology, 118:480–494, 2007.
[30] G. Dornhege, B. Blankertz, G. Curio, and K.-R. Müller. Boosting bit rates in noninvasive EEG
single-trial classifications by feature combination and multiclass paradigms. IEEE Transactions
on Biomedical Engineering, 51(6):993–1002, 2004.
[31] M. Grosse-Wentrup and M. Buss. Multiclass common spatial patterns and information theoretic
feature extraction. IEEE Transactions on Biomedical Engineering, 55(8):1991–2000, 2008.
[32] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, and K.-R. Müller. Optimizing spatial filters
for robust EEG single-trial analysis. IEEE Signal Processing Magazine, 25:41–56, 2008.
[33] G. Dornhege, B. Blankertz, M. Krauledat, F. Losch, G. Curio, and K.-R. Müller. Combined
optimization of spatial and temporal filters for improving brain-computer interfacing. IEEE
Transactions on Biomedical Engineering, 53(11):2274–2281, 2006.
[34] S. Lemm, B. Blankertz, G. Curio, and K.-R. Müller. Spatio-spectral filters for improving the
classification of single trial EEG. IEEE Transactions on Biomedical Engineering, 52:1541–1548,
2005.
[35] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, New York, 2nd edition,
2006.
[36] S. Petridis and S.J. Perantonis. On the relation between discriminant analysis and mutual
information for supervised linear feature extraction. Pattern Recognition, 37:857–874, 2004.
[37] M. Ben-Bassat. Use of distance measures, information measures and error bounds in feature
evaluation. In P. Krishnaiah and L. Kanal, editors, Handbook of Statistics, pages 773–791.
North-Holland, Amsterdam, 1982.
[38] M. Last, A. Kander, and O. Maimon. Information-theoretic algorithm for feature selection.
Pattern Recognition Letters, 22:799–811, 2001.
[39] J.M. Sotoca and F. Pla. Supervised feature selection by clustering using conditional mutual
information-based distances. Pattern Recognition, 43:2068–2081, 2010.
[40] P.A. Estevez, M. Tesmer, C.A. Perez, and J.M. Zurada. Normalized mutual information feature
selection. IEEE Transactions on Neural Networks, 20:189–201, 2009.
[41] N. Kwak and C.-H. Choi. Input feature selection by mutual information based on Parzen window.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 24:1667 – 1671, 2002.
[42] A. W. Bowman and A. Azzalini. Applied Smoothing Techniques for Data Analysis: The Kernel
Approach with S-Plus Illustrations. Oxford University Press, New York, 1997.
[43] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001.
http://www.csie.ntu.edu.tw/cjlin/libsvm.
  • Article
    A new multiclass Braincomputer Interface (BCI) based on the modulation of sensorimotor oscillations by imagining movements is described. By the application of advanced signal processing tools, statistics and machine learning, this BCI system offers: (a) asynchronous mode of operation (b) automatic selection of user-dependent parameters based on an initial calibration (c) incremental update of the classifier parameters from feedback data. The signal classification uses spatially filtered signals and is based on spectral power estimation computed in individualized frequency bands, which are automatically identified by a specially tailored AR-based model. Relevant features are chosen by a criterion based on Mutual Information. Final recognition of motor imagery is effectuated by a multinomial logistic regression classifier. This BCI system was evaluated in two studies. In the first study, five participants trained the ability to imagine movements of the right hand, left hand and feet in response to visual cues. The accuracy of the classifier was evaluated across four training sessions with feedback. The second study assessed the information transfer rate (ITR) of the BCI in an asynchronous application. The subjects task was to navigate a cursor along a computer rendered 2D maze. A peak information transfer rate of 8.0 bit/min was achieved. Five subjects performed with a mean ITR of 4.5 bit/min and an accuracy of 74.84%. These results demonstrate that the use of automated interfaces to reduce complexity for the intended operator (outside the laboratory) is indeed possible. The signal processing and classifier source code embedded in BCI2000 is available from https://www.brain-project.org/downloads.html.
  • Article
    Goal: Motor imagery-related mu/beta rhythms, which can be voluntarily modulated by subjects, have been widely used in EEG-based brain computer interfaces (BCIs). Moreover, it has been suggested that motor imagery-specific EEG differences can be enhanced by feedback training. However, the differences observed in the EEGs of naive subjects are typically not sufficient to provide reliable EEG control and thus result in unintended feedback. Such feedback can frustrate subjects and impede training. In this study, a hybrid BCI paradigm combining motor imagery (MI) and steady-state visually evoked potentials (SSVEPs) has been proposed to provide effective continuous feedback for motor imagery training. Methods: During the initial training sessions, subjects must focus on flickering buttons to evoke SSVEPs as they perform motor imagery tasks. The output/feedback of the hybrid BCI is based on hybrid features consisting of motor imagery- and SSVEP-related brain signals. In this context, the SSVEP plays a more important role than motor imagery in generating feedback. As the training progresses, the subjects can gradually decrease their visual attention to the flickering buttons, provided that the feedback is still effective. In this case, the feedback is mainly based on motor imagery. Results: Our experimental results demonstrate that subjects generate distinguishable brain patterns of hand motor imagery after only five training sessions lasting approximately 1.5 h each. Conclusion: The proposed hybrid feedback paradigm can be used to enhance motor imagery training. Significance: This hybrid BCI system with feedback can effectively identify the intentions of the subjects.
  • Article
    This chapter introduces formally feature evaluation rules and illustrates two of them by a numerical example. The limitations of the probability of error rule are pointed out and the properties desired from a substitute rule are discussed in this chapter. It reviews the major categories of feature evaluation rules and provides tables that relate these rules to error bounds. The use of error bounds for assessing feature evaluation rules and for estimating the probability of error is also discussed. The chapter provides a summary of the theoretical and experimental findings so far and some practical recommendations. Most of the works that experimented with various feature evaluation rules conclude that the feature rankings induced by the various rules are very similar. Using Spearman's rank correlation coefficient as a measure of rankings similarity and experimenting with Munson's hand printed character data.
  • Book
    The book describes the use of smoothing techniques in statistics, including both density estimation and nonparametric regression. Considerable advances in research in this area have been made in recent years. The aim of this text is to describe a variety of ways in which these methods can beapplied to practical problems in statistics. The role of smoothing techniques in exploring data graphically is emphasised, but the use of nonparametric curves in drawing conclusions from data, as an extension of more standard parametric models, is also a major focus of the book. Examples are drawnfrom a wide range of applications. The book is intended for those who seek an introduction to the area, with an emphasis on applications rather than on detailed theory. It is therefore expected that the book will benefit those attending courses at an advanced undergraduate, or postgraduate, level,as well as researchers, both from statistics and from other disciplines, who wish to learn about and apply these techniques in practical data analysis. The text makes extensive reference to S-Plus, as a computing environment in which examples can be explored. S-Plus functions and example scriptsare provided to implement many of the techniques described. These parts are, however, clearly separate from the main body of text, and can therefore easily be skipped by readers not interested in S-Plus.
  • Chapter
    Information theory answers two fundamental questions in communication theory: what is the ultimate data compression (answer: the entropy H), and what is the ultimate transmission rate of communication (answer: the channel capacity C). For this reason some consider information theory to be a subset of communication theory. We will argue that it is much more. Indeed, it has fundamental contributions to make in statistical physics (thermodynamics), computer science (Kolmogorov complexity or algorithmic complexity), statistical inference (Occam's Razor: “The simplest explanation is best”) and to probability and statistics (error rates for optimal hypothesis testing and estimation). The relationship of information theory to other fields is discussed. Information theory intersects physics (statistical mechanics), mathematics (probability theory), electrical engineering (communication theory) and computer science (algorithmic complexity). We describe these areas of intersection in detail.
  • Conference Paper
    Full-text available
    In motor imagery-based brain computer interfaces (BCI), discriminative patterns can be extracted from the electroencephalogram (EEG) using the common spatial pattern (CSP) algorithm. However, the performance of this spatial filter depends on the operational frequency band of the EEG. Thus, setting a broad frequency range, or manually selecting a subject-specific frequency range, are commonly used with the CSP algorithm. To address this problem, this paper proposes a novel filter bank common spatial pattern (FBCSP) to perform autonomous selection of key temporal-spatial discriminative EEG characteristics. After the EEG measurements have been bandpass-filtered into multiple frequency bands, CSP features are extracted from each of these bands. A feature selection algorithm is then used to automatically select discriminative pairs of frequency bands and corresponding CSP features. A classification algorithm is subsequently used to classify the CSP features. A study is conducted to assess the performance of a selection of feature selection and classification algorithms for use with the FBCSP. Extensive experimental results are presented on a publicly available dataset as well as data collected from healthy subjects and unilaterally paralyzed stroke patients. The results show that FBCSP, using a particular combination feature selection and classification algorithm, yields relatively higher cross-validation accuracies compared to prevailing approaches.
  • Article
    There is a step of significant difficulty experienced by brain-computer interface (BCI) users when going from the calibration recording to the feedback application. This effect has been previously studied and a supervised adaptation solution has been proposed. In this paper, we suggest a simple unsupervised adaptation method of the linear discriminant analysis (LDA) classifier that effectively solves this problem by counteracting the harmful effect of nonclass-related nonstationarities in electroencephalography (EEG) during BCI sessions performed with motor imagery tasks. For this, we first introduce three types of adaptation procedures and investigate them in an offline study with 19 datasets. Then, we select one of the proposed methods and analyze it further. The chosen classifier is tested offline on data from 80 healthy users and four high spinal cord injury patients. Finally, for the first time in the BCI literature, we apply this unsupervised classifier in online experiments. Additionally, we show that its performance is significantly better than the state-of-the-art supervised approach.
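Bias-only adaptation is perhaps the simplest update of the kind this entry investigates: the LDA bias tracks a running, label-free estimate of the global feature mean, so class-unrelated shifts of the feature distribution are absorbed without retraining the weights. The sketch below is our own illustration; the class name and the rate `eta` are assumptions, not the paper's notation:

```python
import numpy as np

class AdaptiveLDA:
    """Linear classifier y = sign(w.x + b) whose bias tracks the running
    (unsupervised) mean of incoming features, counteracting
    nonclass-related shifts in the feature distribution."""

    def __init__(self, w, b, eta=0.05):
        self.w, self.b, self.eta = np.asarray(w, float), float(b), eta
        self.mu = None  # running estimate of the global feature mean

    def predict(self, x):
        x = np.asarray(x, float)
        # Unsupervised exponential update of the feature-mean estimate.
        self.mu = x.copy() if self.mu is None else \
            (1 - self.eta) * self.mu + self.eta * x
        # Re-center the bias so the boundary passes through the estimated mean.
        self.b = -float(self.w @ self.mu)
        return 1 if self.w @ x + self.b >= 0 else -1

# As the feature distribution drifts, the bias follows its running mean.
clf = AdaptiveLDA(w=[1.0, 0.0], b=0.0, eta=0.5)
for x in ([2.0, 0.0], [2.2, 0.0], [1.8, 0.0]):
    clf.predict(x)
```

Because only the mean is estimated, no labels are needed during feedback, which is what makes this family of adaptations practical online.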
  • Article
    This paper proposes a feature extraction method for motor imagery brain-computer interface (BCI) using electroencephalogram. We consider the primary neurophysiologic phenomenon of motor imagery, termed event-related desynchronization, and formulate the learning task for feature extraction as maximizing the mutual information between the spatio-spectral filtering parameters and the class labels. After introducing a nonparametric estimate of mutual information, a gradient-based learning algorithm is devised to efficiently optimize the spatial filters in conjunction with a band-pass filter. The proposed method is compared with two existing methods on real data: a BCI Competition IV dataset as well as our data collected from seven human subjects. The results indicate the superior performance of the method for motor imagery classification, as it produced higher classification accuracy with statistical significance (≥95% confidence level) in most cases.
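A leave-one-out Parzen (kernel) estimate is one common way to make mutual information computable from samples, in the spirit of the nonparametric estimate this entry introduces. The sketch below estimates I(X;Y) = H(X) − Σ_y p(y) H(X|Y=y) for a one-dimensional feature and discrete labels; the Gaussian kernel and the normal-reference bandwidth rule are our assumptions, not necessarily the cited paper's choices:

```python
import numpy as np

def parzen_entropy(x, h=None):
    """Leave-one-out Parzen (Gaussian kernel) estimate of the differential
    entropy (in nats) of 1-D samples x."""
    x = np.asarray(x, float)
    n = x.size
    if h is None:
        h = 1.06 * x.std() * n ** (-0.2)  # normal-reference bandwidth rule
    d = x[:, None] - x[None, :]
    k = np.exp(-0.5 * (d / h) ** 2) / (h * np.sqrt(2 * np.pi))
    np.fill_diagonal(k, 0.0)              # leave each sample out of its own fit
    p = k.sum(axis=1) / (n - 1)           # density estimate at each sample
    return -np.mean(np.log(p))

def mutual_information(x, y):
    """I(X;Y) = H(X) - sum_y p(y) H(X|Y=y) for a 1-D feature and labels y."""
    x, y = np.asarray(x, float), np.asarray(y)
    hx = parzen_entropy(x)
    hxy = sum((y == c).mean() * parzen_entropy(x[y == c]) for c in np.unique(y))
    return hx - hxy
```

In the cited line of work an estimate of this form is differentiated with respect to the spatio-spectral filter parameters, turning feature extraction into gradient-based MI maximization.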
  • Article
    In this paper, a supervised feature selection approach is presented, which is based on a metric applied to continuous and discrete data representations. This method builds a dissimilarity space using information-theoretic measures, in particular the conditional mutual information between features with respect to a relevant variable that represents the class labels. Applying hierarchical clustering, the algorithm searches for a compression of the information contained in the original set of features. The proposed technique is compared with other state-of-the-art methods also based on information measures. Finally, several experiments are presented to show the effectiveness of the selected features from the point of view of classification accuracy.
  • Article
    Feature selection is used to improve the efficiency of learning algorithms by finding an optimal subset of features. However, most feature selection techniques can handle only certain types of data. Additional limitations of existing methods include intensive computational requirements and inability to identify redundant variables. In this paper, we present a novel, information-theoretic algorithm for feature selection, which finds an optimal set of attributes by removing both irrelevant and redundant features. The algorithm has a polynomial computational complexity and is applicable to datasets of a mixed nature. The method performance is evaluated on several benchmark datasets by using a standard classifier (C4.5).
  • Article
    This paper provides a unifying view of three discriminant linear feature extraction methods: linear discriminant analysis, heteroscedastic discriminant analysis and maximization of mutual information. We propose a model-independent reformulation of the criteria related to these three methods that stresses their similarities and elucidates their differences. Based on assumptions for the probability distribution of the classification data, we obtain sufficient conditions under which two or more of the above criteria coincide. It is shown that these conditions also suffice for Bayes optimality of the criteria. Our approach results in an information-theoretic derivation of linear discriminant analysis and heteroscedastic discriminant analysis. Finally, regarding linear discriminant analysis, we discuss its relation to multidimensional independent component analysis and derive suboptimality bounds based on information theory.
  • Conference Paper
    Brain-Computer Interfaces can suffer from a large variance of the subject conditions within and across sessions. For example, vigilance fluctuations in the individual, variable task involvement, workload etc. alter the characteristics of EEG signals and thus challenge a stable BCI operation. In the present work we aim to define features based on a variant of the common spatial patterns (CSP) algorithm that are constructed invariant with respect to such nonstationarities. We enforce invariance properties by adding terms to the denominator of a Rayleigh coefficient representation of CSP such as disturbance covariance matrices from fluctuations in visual processing. In this manner physiological prior knowledge can be used to shape the classification engine for BCI. As a proof of concept we present a BCI classifier that is robust to changes in the level of parietal α-activity. In other words, the EEG decoding still works when there are lapses in vigilance.
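The invariance idea here amounts to augmenting the denominator of the CSP Rayleigh quotient with a disturbance covariance, so that directions dominated by the disturbance score poorly for both classes. A minimal sketch, assuming a penalty weight `xi` and helper name of our own choosing (not the paper's exact formulation):

```python
import numpy as np
from scipy.linalg import eigh

def invariant_csp(cov_a, cov_b, cov_dist, xi=1.0, n_per_class=1):
    """CSP filters per class maximizing w'C_k w / w'(C_a + C_b + xi*D) w,
    where D is a disturbance covariance (e.g. parietal alpha fluctuations).
    The penalty steers filters away from disturbance-dominated directions."""
    denom = cov_a + cov_b + xi * cov_dist
    filters = []
    for num in (cov_a, cov_b):
        vals, vecs = eigh(num, denom)
        order = np.argsort(vals)[::-1]          # top quotient per class
        filters.append(vecs[:, order[:n_per_class]].T)
    return np.vstack(filters)  # (2*n_per_class, channels)

# Toy example: class variance on channels 1 and 2, disturbance on channel 0.
cov_a = np.diag([1.0, 4.0, 1.0, 1.0])
cov_b = np.diag([1.0, 1.0, 4.0, 1.0])
cov_dist = np.diag([25.0, 0.0, 0.0, 0.0])
W = invariant_csp(cov_a, cov_b, cov_dist, xi=1.0)
```

In the toy example the penalized quotient on channel 0 is small for both classes, so the selected filters load on the genuinely discriminative channels 1 and 2.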