Recognition of Motor Imagery Electroencephalography Using Independent Component Analysis and Machine Classifiers.

Conference Paper · January 2004
Source: DBLP

Chih-I Hung (1,4), Po-Lei Lee (4), Yu-Te Wu (1,4,*), Hui-Yun Chen (1,4), Li-Fen Chen (3,4), Tzu-Chen Yeh (2,3,4), Jen-Chuen Hsieh (2,3,4)

(1) Institute of Radiological Sciences, (2) Institute of Neuroscience, (3) Center for Neuroscience, National Yang-Ming University, No.155, Sec. 2, Linong St., Beitou District, 112, Taipei, Taiwan
(4) Integrated Brain Research Laboratory, Dept. of Medical Research and Education, Taipei Veterans General Hospital, No.201, Sec. 2, Shihpai Rd., Beitou District, 112, Taipei, Taiwan

e-mail: runtothewater@pie.com.tw; pllee2@vghtpe.gov.tw; ytwu@ym.edu.tw; airrb@pchome.com.tw; lfchen3@vghtpe.gov.tw; tcyeh@vghtpe.gov.tw; jchsieh@vghtpe.gov.tw
ABSTRACT
Motor imagery electroencephalography (EEG), which embodies cortical potentials during mental simulation of
left or right finger lifting tasks, can be used as neural input signals to activate brain computer interface (BCI).
The effectiveness of such an EEG-based BCI system relies on two indispensable features: distinguishable
patterns of brain signals and accurate classifiers. This work aims to extract a reliable neural feature, termed the beta rebound map, from motor imagery EEG by means of independent component analysis (ICA), and to employ four classifiers to investigate the efficacy of the beta rebound map. Results demonstrated that, with the use of ICA, the
recognition rates of four classifiers, linear discriminant analysis (LDA), back-propagation neural network (BP-
NN), radial-basis function neural network (RBF-NN), and support vector machine (SVM) improved
significantly from 54%, 54%, 57.3% and 55% to 69.8%, 75.5%, 76.5% and 77.3%, respectively. In addition,
the areas under the ROC curve, which assess the quality of classification over a wide range of misclassification
costs, also improved greatly from .65, .60, .62, and .64 to .78, .73, .77 and .75, respectively.
Keywords
Electroencephalography (EEG), Independent component analysis (ICA), brain computer interface (BCI), beta
rebound, linear discriminant analysis (LDA), back-propagation neural network (BP-NN), radial-basis function
neural network (RBF-NN), support vector machine (SVM)
1. INTRODUCTION
In recent years, great progress in neuroscience has
inspired studies in developing brain computer
interface (BCI) [Mul99a] [Pfu98a] [Pfu00a] [Pol98a],
a novel technique in assisting people to communicate
with external environments or trigger surrounding
devices by means of their brain signals. These
systems are particularly useful for those who suffer from amyotrophic lateral sclerosis or locked-in syndrome and are unable to produce any motor activity. Their cognitive or sensory functions, however, may be intact, so they can be trained to perform mental tasks, for example, simulating right or left hand or foot movements without any
overt motor output. The success of BCI systems
relies on two integral parts: distinguishable neural
patterns and effective classifiers. This work aims to
extract a reliably distinguishable feature from the
motor imagery EEG recording by means of
independent component analysis and employ
machine classifiers to investigate the efficacy of the extracted pattern.
Permission to make digital or hard copies of all or part of
this work for personal or classroom use is granted without
fee provided that copies are not made or distributed for
profit or commercial advantage and that copies bear this
notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee.
WSCG’2004, February 2-6, 2004, Plzen, Czech Republic.
Copyright UNION Agency – Science Press
It has been pointed out that imagination of hand
movement elicits rhythmic EEG patterns in the
primary sensorimotor areas similar to that from a real
hand movement [Pfu96a]. When a specific
movement or imagined movement is performed, it comprises three phases: planning, execution and recovery. Planning and execution result in localized amplitude attenuation in the alpha and lower beta bands, or event-related desynchronization (ERD), which can be viewed as an EEG correlate of an activated cortical motor network, while the recovery phase produces focal mu and beta amplitude enhancement, or event-related synchronization (ERS), which may reflect deactivation/inhibition of the underlying cortical network.
Several BCI systems have been proposed based on
the induced ERD when subjects performed imagery
hand or foot movements [Pfu98a] [Pfu00a].
Pfurtscheller et al. used learning vector quantization to classify ERD signals on-line in a subject-specific band determined by distinction sensitive learning vector quantization. They also adopted an adaptive autoregressive model to analyze ERD signals off-line and applied linear discriminant analysis to improve the detection of imagined left and right hand movements. The reported error rates varied between 5.8% and 32.8%. Muller-Gerking et al. applied common spatial filters to detect real (not imagined) left hand, right hand or right foot movements in single trials and reported accuracies of 84%, 90% and 94% for three subjects, respectively [Mul99a].
Although the ERD elicited by imagined movement
has been extensively used as a feature pattern in BCI
systems, we have observed that not every subject can produce discernible ERD during imagery movement, whereas beta ERS appeared persistently in every subject. This motivated us to adopt the ERS, rather than the ERD, as the feature pattern. The peaked ERS of imagined left or right hand movement, referred to as the beta rebound, appears over bilateral sensorimotor areas but with distinct patterns: when imagination of right hand movement is executed, the beta rebound over the left hemisphere has a stronger amplitude than that over the right hemisphere, and vice versa.
The recorded EEG signals were inevitably
contaminated by system noise, artifacts, spontaneous
EEG, etc. Following our previous works for
MEG/EEG de-noise [Lee03a], we employed the
Independent Component Analysis (ICA) technique to
decompose each pre-processed epoch into a set of
temporally independent components along with
corresponding spatial maps, and selected the task-
related components by matching designed spatial
templates with the decomposed spatial maps. As a
result, the signal-to-noise ratio of each EEG single trial was improved, which led to better classifier performance.
This paper is organized as follows. Section 2 reports
our experimental paradigm for motor imagery task
and EEG recording configuration. Section 3 presents
the extracted features, with and without applying ICA, based on the peaked beta ERS and termed beta rebound maps. Section 4 reviews the four classifiers used in
this study. Section 5 summarizes the classification
results and Section 6 concludes this study.
2. EXPERIMENTAL PARADIGM FOR
MOTOR IMAGERY
Four right-handed healthy subjects (two males and
two females), aged between 20 and 28, participated
in this study. Each subject was naive to the
experiment and trained only twenty minutes prior to
the first session. During each session, the subject was
asked to perform 100 trials of imagery right index
finger lifting, followed by another 100 trials of
imagery left index finger lifting. The length of each
trial was ten seconds. Each trial began with a one-second presentation of random noise, during which subjects were allowed to blink their eyes (A in Fig. 1). The subject was then instructed to stare at the fixation cross in the center of the monitor from 2 s, and to start imagining right or left index finger lifting right after hearing an acoustic cue, a 1 kHz, 10 ms "beep", at 5 s (B in Fig. 1). The inter-stimulus interval was 10 seconds.
Figure 1. Timing of two consecutive trials of the
motor imagery task.
A 64-channel electroencephalography (EEG) 10-20 system (with an electro-cap) was used to record the cortical potentials. The configuration of the standard 10-20 system is shown in Fig. 2. The vertical and horizontal electro-oculograms (VEOG and HEOG) were used to reject bad epochs induced by eye blinking during the recording. The data were digitized at 250 Hz. Since we focused on beta activities, the signals were further bandpass-filtered at 6-50 Hz to remove DC drifts and 60 Hz noise.
Throughout the recordings, the surface electromyogram (EMG) was monitored from the m. extensor digitorum communis (digitized at 2 kHz) to detect motion status. Data from four sessions were collected for each subject. Signals from 3 s to 10 s (C in Fig. 1) in each trial (excluding bad epochs) were extracted for subsequent classifier training and testing. Figure 3 exhibits such a pre-processed epoch from the sensorimotor area (channel C3 in the 10-20 system).
Figure 2. The configuration of the standard 10-20 system with 64 channels.
Figure 3. A pre-processed epoch recorded at C3.
3. FEATURE EXTRACTION WITH
AND WITHOUT ICA
Extraction of reliable features from the measured data is vital in facilitating the subsequent classification
procedure. Since the measured signals were
inevitably contaminated by system noise, artifacts,
spontaneous EEG, etc., we employed the ICA
technique to decompose each pre-processed epoch
into a set of temporally independent components
along with corresponding spatial maps, and selected
the task-related components by matching designed
spatial templates with the decomposed spatial maps.
Two types of features, one using ICA to extract task-related components and the other without ICA, were created from the pre-processed data in order to compare their efficacies. The detailed steps for feature extraction with ICA are described below:
Step 1: Signal decomposition using ICA. We first arranged each pre-processed epoch across m channels (m = 62) and n sampled points (n = 1750) into an m x n matrix X. The i-th row contains the observed signal from the i-th EEG channel, and the j-th column contains the observed samples at the j-th time point across all channels. In the present study, all calculations were performed using the FastICA algorithm [Cov65a] [Cov98a]. The FastICA technique first removed the means of the row vectors of X, followed by a whitening procedure to transform the covariance matrix of the zero-mean data into an identity matrix. The whitening process was implemented using Principal Component Analysis; only the N most significant eigenvectors (N = 15 in our analysis) were preserved in the subsequent ICA calculation. FastICA then searched for a matrix that further separates the whitened data into a set of components that are as mutually independent as possible. Combined with the preceding whitening process, the matrix X can be transformed into a matrix S via an un-mixing matrix W, i.e.,

S = W X    (1)

in which the rows of S are mutually independent.
Each column of W^-1, i.e., the mixing matrix, represents a spatial map describing the relative projection weights of the corresponding temporal component at each of the EEG channels. They will be referred to as IC spatial maps henceforth. Figure 4 shows 12 IC spatial maps of 12 independent components (not shown) decomposed from a single-trial imagery right hand movement. The maps IC3, IC5, IC7 and IC9 were highly related to the motor imagery task and categorized as task-related components, while the IC4 and IC6 maps were associated with the occipital alpha rhythm, and the IC1 map was noise emanating from a bad channel.
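Step 1 can be sketched in NumPy. The sketch below is a simplified, NumPy-only variant of FastICA with the tanh nonlinearity; the channel and sample counts follow the paper, while the synthetic sources, convergence settings, and three-component demo are illustrative assumptions, not the authors' code.

```python
import numpy as np

def fastica_sources(X, n_components=15, max_iter=200, tol=1e-6, seed=0):
    """Decompose X (channels x samples) into sources S = W X with rows of S
    as mutually independent as possible (symmetric FastICA sketch)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)          # remove row means
    # PCA whitening: keep only the N most significant eigenvectors
    evals, evecs = np.linalg.eigh(Xc @ Xc.T / n)
    order = np.argsort(evals)[::-1][:n_components]
    K = np.diag(evals[order] ** -0.5) @ evecs[:, order].T   # whitening matrix
    Z = K @ Xc                                      # whitened data, cov(Z) = I
    # symmetric fixed-point iteration with g = tanh
    B = rng.standard_normal((n_components, n_components))
    B_old = np.zeros_like(B)
    for _ in range(max_iter):
        G = np.tanh(B @ Z)
        B = G @ Z.T / n - np.diag((1.0 - G**2).mean(axis=1)) @ B
        # symmetric decorrelation: B <- (B B')^(-1/2) B keeps B orthogonal
        e, V = np.linalg.eigh(B @ B.T)
        B = V @ np.diag(e ** -0.5) @ V.T @ B
        if np.max(np.abs(np.abs(np.diag(B @ B_old.T)) - 1.0)) < tol:
            break
        B_old = B.copy()
    W = B @ K                                       # un-mixing matrix: S = W Xc
    return W, W @ Xc

# Demo: 62 channels and 1750 samples as in the paper, 3 super-Gaussian sources
rng = np.random.default_rng(1)
S_true = np.sign(rng.standard_normal((3, 1750))) * rng.standard_normal((3, 1750))**2
X = rng.standard_normal((62, 3)) @ S_true + 0.01 * rng.standard_normal((62, 1750))
W, S = fastica_sources(X, n_components=3)
A = np.linalg.pinv(W)    # columns of W^-1 are the IC spatial maps
```

The columns of `A` play the role of the IC spatial maps that Step 2 correlates with the spatial templates.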
Step 2: Correlating the IC spatial maps with pre-
defined spatial templates to select task-related
components. Since the motor imagery task elicits
bilateral activation in the vicinity of the sensorimotor areas, four spatial patterns encompassing the C3, C4, Cz, and both C3 and C4 areas, respectively, were used as spatial templates (see Fig. 5) for selecting the task-related spatial maps. Note that four spatial templates, rather than a single template covering C3 and C4, were taken into account because the task-related activities can be separated by ICA and exhibited in multiple IC spatial maps. Each template was correlated with the 12 IC spatial maps of a single trial and the best two matches were selected. For example, the spatial maps IC3, IC5, IC7 and IC9 in Figure 4 were selected automatically due to their high similarity. The task-related IC spatial maps and the corresponding temporal components were used to reconstruct the signal X by means of equation (1).
Figure 4. The normalized IC spatial maps of a
single-trial imagery right hand movement.
Figure 5. Spatial templates used to select task-related
IC spatial maps.
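The template matching and reconstruction of Step 2 can be sketched as follows. The function and variable names, the 6-channel demo montage, and the choice of absolute Pearson correlation as the similarity measure are illustrative assumptions.

```python
import numpy as np

def select_task_related(maps, templates, n_best=2):
    """For each template, pick the n_best IC spatial maps (columns of `maps`)
    with the highest absolute correlation; return the union of indices."""
    selected = set()
    for t in templates:
        r = [abs(np.corrcoef(t, maps[:, j])[0, 1]) for j in range(maps.shape[1])]
        selected.update(np.argsort(r)[-n_best:])
    return sorted(selected)

def reconstruct(W, X, keep):
    """Re-synthesize X from the kept components only:
    X_hat = W^-1[:, keep] @ (W X)[keep, :]  (equation (1))."""
    A = np.linalg.pinv(W)            # mixing matrix; columns are IC spatial maps
    S = W @ X
    return A[:, keep] @ S[keep, :]

# Demo on a hypothetical 6-channel montage: template equal to one IC map
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 6))      # un-mixing matrix from Step 1
X = rng.standard_normal((6, 50))     # pre-processed epoch
maps = np.linalg.pinv(W)             # IC spatial maps (columns of W^-1)
keep = select_task_related(maps, [maps[:, 2]], n_best=1)
X_task = reconstruct(W, X, keep)
```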
Step 3: Computing the envelopes of beta reactivity from the reconstructed signals using the Amplitude Modulation method. The optimal beta frequency band encompassing the prominent and relevant brain activities may vary from subject to subject. To tackle this problem, we divided the beta band into five sub-bands, 8~12, 12~16, 16~20, 20~24 and 24~28 Hz, and used them together with the additional 8~30 Hz band to band-pass filter the reconstructed signals. The Amplitude Modulation (AM) method based on the Hilbert transform was applied to detect the envelope of the filtered EEG signals and quantify the event-related oscillatory activities [Clo96a]. Each envelope, referred to as the AM waveform, was computed by (see Figure 6(a))

m(t) = sqrt( M_BP(t)^2 + H(M_BP(t))^2 )    (2)

where M_BP(t) is the single-trial band-passed EEG signal and H(M_BP(t)) is its Hilbert transform. Contrary to the classical measurement of ERS reactivity and the original AM approach, in which a relative percentage indexed to an initial baseline was used [Clo96a], we computed the beta ERS reactivity (termed the beta rebound) as the amplitude difference between the maximum values of beta ERD and beta ERS of the AM envelope.
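The AM waveform in equation (2) is the magnitude of the analytic signal. A minimal NumPy sketch follows; the FFT construction reproduces what, e.g., `scipy.signal.hilbert` computes, and the amplitude-modulated test tone (a 20 Hz "beta" carrier at the paper's 250 Hz sampling rate) is illustrative.

```python
import numpy as np

def am_waveform(x):
    """Envelope m(t) = sqrt(x(t)^2 + H(x)(t)^2), via the analytic signal."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(np.fft.fft(x) * h)   # x(t) + i H(x)(t)
    return np.abs(analytic)

# Demo: 20 Hz carrier, amplitude-modulated at 2 Hz, sampled at 250 Hz for 2 s
fs = 250
t = np.arange(2 * fs) / fs
env_true = 1.0 + 0.5 * np.cos(2 * np.pi * 2 * t)
x = env_true * np.cos(2 * np.pi * 20 * t)
env = am_waveform(x)     # recovers env_true up to numerical error
```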
Step 4: Extracting the beta rebound maps. The imagery finger lifting task, like real finger movement, induced a larger beta rebound in the contralateral sensorimotor area than in the ipsilateral one. In addition, the contralateral beta rebound appeared earlier than the ipsilateral one. The co-existence of prominent beta rebounds at C3 and C4 and the constrained time lag between them suggested that the topographical maps with maximum rebounds at C3 and C4 were reliable features. Specifically, we looked for two time points at which the AM waveforms of C3 and C4 both have maximum peaks, with a time lag (T in Figure 6(a)) of less than 0.5 second. The topographical maps at these two time points, referred to as beta rebound maps (Figure 6(c)), were concatenated into a 124 x 1 column vector and used as the feature vector.

Using the same time points of the peaked beta rebound resulting from steps 1~4, we also processed the data using step 3 only, i.e., without ICA. Figure 7 depicts the resulting beta rebound maps, which appear contaminated by noise compared with those in Fig. 6.
4. TWO-CLASS SUPERVISED CLASSIFICATION
In this section, the four two-category classifiers used in our study are briefly reviewed: linear discriminant analysis (LDA), back-propagation neural network (BP-NN), radial basis function network (RBF-NN) and support vector machine (SVM). The beta rebound maps, denoted by x_i, of imagery right and left hand movement, each a 124 x 1 column vector, were divided into two data sets, one for training and the other for testing the classifiers. The numbers of beta rebound maps used in the training and testing phases for each subject at each session were 60 and 30, respectively. These beta rebound
maps were randomized before being used. For the
sake of simplicity, we use the notation R and L to
denote the category of imagery right and left hand
movement, respectively, in the following discussion.
Figure 6. Computation of the beta rebound maps. (a) The AM waveforms of C3 and C4; T is the time lag between the prominent beta rebounds at C3 and C4. (b) Reconstructed signals of the 62 channels (excluding HEOG and VEOG) used to calculate the AM waveforms in (a). (c) The beta rebound maps created from the reconstructed signals on 62 channels, indexed to the time points of the peaked beta rebounds at C3 and C4.
Figure 7. The beta rebound maps computed using step 3 only, without applying ICA.
4.1 Classifiers
4.1.1 LDA
The idea of LDA is to seek a vector w such that the projections of the R and L feature vectors x_i onto w form two well-separated clusters, each with small variance. This is done by maximizing the so-called Fisher's criterion

J(w) = (w' S_b w) / (w' S_w w)

with respect to w, where S_b is the between-class scatter matrix

S_b = (m_R - m_L)(m_R - m_L)'

and S_w is the within-class scatter matrix

S_w = sum_{x in R} (x - m_R)(x - m_R)' + sum_{x in L} (x - m_L)(x - m_L)'

in which the two summations run over all training samples of classes R and L, respectively, and m_R and m_L represent the group means of classes R and L. The optimal w is the eigenvector corresponding to the largest eigenvalue of S_w^-1 S_b. After w is obtained from the training data, we projected the test samples onto it and then classified the projected points by the k-nearest-neighbor decision rule.
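For two classes, the leading eigenvector of S_w^-1 S_b is proportional to S_w^-1 (m_R - m_L), which gives a compact sketch of this classifier. The synthetic data and the choice k = 3 for the k-NN step are assumptions; the paper does not state k.

```python
import numpy as np

def fisher_direction(XR, XL):
    """w proportional to S_w^-1 (m_R - m_L); XR, XL are (samples x features)."""
    mR, mL = XR.mean(axis=0), XL.mean(axis=0)
    Sw = (XR - mR).T @ (XR - mR) + (XL - mL).T @ (XL - mL)
    return np.linalg.solve(Sw, mR - mL)

def knn_1d(train_z, train_y, z, k=3):
    """Classify the scalar projection z by majority vote among the k nearest
    training projections (labels +1 for R, -1 for L)."""
    idx = np.argsort(np.abs(train_z - z))[:k]
    return 1 if train_y[idx].sum() > 0 else -1

# Demo: 60 training maps per class, as in the paper (features are synthetic)
rng = np.random.default_rng(0)
XR = rng.standard_normal((60, 10)) + 1.0
XL = rng.standard_normal((60, 10)) - 1.0
w = fisher_direction(XR, XL)
train_z = np.concatenate([XR @ w, XL @ w])
train_y = np.array([1] * 60 + [-1] * 60)
test = rng.standard_normal((30, 10)) + 1.0       # held-out R-like samples
preds = [knn_1d(train_z, train_y, x @ w, k=3) for x in test]
```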
4.1.2 BP-NN
The BP-NN was trained in a supervised manner
based on the error-correction learning rule. The
hierarchy of a BPNN in our implementation is
depicted in Figure 8, which consists of one input
layer, one hidden layer, and one output layer. The
training phase was accomplished by iterating two
passes: the forward and backward passes. In the
forward pass of the back-propagation learning, as
show in the Figure 8, the output of the BP-NN at
iteration n was computed by
))(()( nvny
ϕ
=
where
)(
ϕ
was the activation function and )(nv was
the induced local field of output neuron
=
=
m
i
ii
nonwnv
1
)()()(
in which m was the total number of the inputs
applied to output neuron,
i
w was the weight
connecting neuron
i to the output neuron, and
)(no
i
was the output signal of neuron i . The error
signal,
)(ne , between )(ny and the desired
output d(n), was computed at each iteration. If the error met the stopping criterion, the training procedure was terminated. Otherwise, it was minimized in the subsequent backward pass to update the synaptic weights w_i(n):

w_i(n+1) = w_i(n) + alpha [w_i(n) - w_i(n-1)] + eta delta(n) o_i(n)

where alpha was the momentum constant, eta the learning rate, and delta(n) the local gradient of the output layer, given by delta(n) = e(n) phi'(v(n)). In the testing phase, input feature vectors x can be linearly classified according to the value of y(n) in the output layer.
Figure 8. The hierarchy of BP neural network.
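The forward/backward passes with the momentum update above can be sketched for a one-hidden-layer network. The tanh activations, layer sizes, learning rate, momentum value, and toy data below are all assumptions for illustration.

```python
import numpy as np

def train_bpnn(X, d, n_hidden=8, eta=0.05, alpha=0.9, epochs=300, seed=0):
    """One-hidden-layer BP-NN with tanh activations, trained by back-propagation
    with momentum: w(n+1) = w(n) + alpha*Dw(n-1) + eta*delta(n)*o(n)."""
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((X.shape[1], n_hidden)) * 0.5
    W2 = rng.standard_normal((n_hidden, 1)) * 0.5
    dW1 = np.zeros_like(W1); dW2 = np.zeros_like(W2)
    for _ in range(epochs):
        # forward pass
        o1 = np.tanh(X @ W1)              # hidden outputs o_i(n)
        y = np.tanh(o1 @ W2)              # network output y(n)
        e = d[:, None] - y                # error signal e(n)
        # backward pass: local gradients delta = e * phi'(v)
        d2 = e * (1 - y**2)
        d1 = (d2 @ W2.T) * (1 - o1**2)
        # weight changes with momentum term
        dW2 = alpha * dW2 + eta * o1.T @ d2 / len(X)
        dW1 = alpha * dW1 + eta * X.T @ d1 / len(X)
        W2 += dW2; W1 += dW1
    return W1, W2

def predict(X, W1, W2):
    return np.sign(np.tanh(np.tanh(X @ W1) @ W2)).ravel()

# Demo: two synthetic clusters labeled +1 / -1
rng = np.random.default_rng(1)
X = np.vstack([rng.standard_normal((60, 2)) + 1.5,
               rng.standard_normal((60, 2)) - 1.5])
d = np.array([1.0] * 60 + [-1.0] * 60)
W1, W2 = train_bpnn(X, d)
```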
4.1.3 RBF-NN
The RBF neural network [Hay94a] uses a nonlinear function to map the input data into a high-dimensional space, where they are more likely to be linearly separable than in the low-dimensional space [Cov65a] [Cov91a] [Cov88a]. The hierarchy of the (regularization) RBF neural network is depicted in Figure 9; it consists of one input layer, one hidden layer, and one output layer.
Each RBF network is designed to have a nonlinear transformation from the input layer to the hidden layer, followed by a linear mapping from the hidden layer to the output layer. The mapping between the input and output space is expressed by:

F(x) = sum_{i=1}^{N} w_i phi(||x - x_i||)    (4.1)

where phi(||x - x_i||) = exp(-||x - x_i||^2), w_i represents the weighting from the i-th hidden neuron to the output neuron, and x_i represents the i-th known feature vector with dimension m, i = 1, 2, ..., N. The distance between the input vector x and the center x_i is thus mapped into the high-dimensional space by means of a Gaussian function in this study. In the supervised learning phase, the training feature vectors x_i, i = 1, 2, ..., N, and the desired outputs F(x_i) = d_i, each either 1 or -1 in our design, are given. For the sake of simplicity, the training feature vectors are used as the centers. With the N known input feature vectors and the corresponding desired outputs, the weights w_i can be computed from the input-output relationship in equation (4.1):

G w = d    (4.2)

where G is the N x N matrix with entries G_jk = phi(||x_j - x_k||), w = [w_1, w_2, ..., w_N]', and d = [d_1, d_2, ..., d_N]'. Solving the linear system (4.2) yields the weight vector

w = G^+ d    (4.3)

where G^+ = (G'G)^-1 G' is the pseudoinverse of G. Compared with neural networks that use a gradient-based optimization process to estimate the weights, for example the back-propagation neural network, the RBF neural network solves a set of linear equations, thereby avoiding entrapment in local minima and greatly reducing the training time. In the testing phase, input feature vectors x can be linearly classified based on the values of F(x).
Figure 9. The hierarchy of RBF neural network.
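Equations (4.1)-(4.3) translate almost directly into code. The Gaussian width of 1 follows (4.1) as written; the two-cluster demo data are an illustrative assumption.

```python
import numpy as np

def gaussian_gram(A, B):
    """G_jk = exp(-||a_j - b_k||^2) for the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2)

def rbf_train(X, d):
    """Centers = training vectors; solve G w = d via the pseudoinverse (4.3)."""
    G = gaussian_gram(X, X)
    return np.linalg.pinv(G) @ d

def rbf_predict(X_train, w, X):
    """Classify by the sign of F(x) = sum_i w_i phi(||x - x_i||)."""
    return np.sign(gaussian_gram(X, X_train) @ w)

# Demo: 30 training vectors per class labeled +1 / -1
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((30, 2)) + 2.0,
               rng.standard_normal((30, 2)) - 2.0])
d = np.array([1.0] * 30 + [-1.0] * 30)
w = rbf_train(X, d)
```

Because the centers coincide with the training vectors, G w = d is solved (near-)exactly, so the network interpolates the training labels.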
4.1.4 SVM
The basic idea of the support vector machine hinges on two mathematical operations: (1) with an appropriate nonlinear mapping phi(.) of an input vector into a high-dimensional feature space, data x_i
from two categories can be linearly separated by a hyperplane [Cov65a]; (2) construction of an optimal hyperplane for separating the features in (1). Let x denote a vector drawn from the input space, assumed to be of dimension m_0, and let {phi_j(x)}, j = 1, ..., m_1, denote a set of nonlinear transformations from the input space to the feature space, where m_1 is the dimension of the feature space. Given such a set of nonlinear transformations, we may define a hyperplane acting as the decision surface as follows:

sum_{j=0}^{m_1} w_j phi_j(x) = 0    (4.4)

where {w_0, w_1, ..., w_{m_1}} denotes a set of linear weights connecting the feature space to the output space, and it is assumed that phi_0(x) = 1 for all x, so that w_0 denotes the bias. Equation (4.4) defines the decision surface computed in the feature space in terms of the linear weights of the machine. Defining the vectors phi(x) = [phi_0(x), phi_1(x), ..., phi_{m_1}(x)]' and w = [w_0, w_1, ..., w_{m_1}]', we rewrite the decision surface in the compact form

w' phi(x) = 0    (4.5)

Given the training feature samples phi(x_i) corresponding to the input patterns x_i, and the corresponding desired responses d_i, i = 1, ..., N, each either 1 or -1 in our design, it has been shown [Hay94a] that the optimal weight vector w can be expressed as

w = sum_{i=1}^{N} alpha_i d_i phi(x_i)    (4.6)

where {alpha_i}, i = 1, ..., N, are the optimal Lagrange multipliers obtained by maximizing the objective function

Q(alpha) = sum_{i=1}^{N} alpha_i - (1/2) sum_{i=1}^{N} sum_{j=1}^{N} alpha_i alpha_j d_i d_j phi(x_i)' phi(x_j)    (4.7)

subject to the constraints (1) sum_{i=1}^{N} alpha_i d_i = 0 and (2) 0 <= alpha_i <= C, where C is a user-specified constant. Substituting equation (4.6) into (4.5), we obtain the optimal hyperplane

sum_{i=1}^{N} alpha_i d_i phi(x_i)' phi(x) = 0    (4.8)

which is used to linearly separate the testing data: for any testing sample x, if sum_{i=1}^{N} alpha_i d_i phi(x_i)' phi(x) >= 0 then x is classified into the subset having the training response d_i = 1; otherwise it is classified into the other subset with d_i = -1. In our implementation, we chose the radial basis function to define the inner-product kernel phi(x_i)' phi(x):

K(x_i, x) = phi(x_i)' phi(x) = exp(-0.0005 ||x - x_i||^2)

According to equation (4.8), once the number of nonzero Lagrange multipliers alpha_i is determined, the number of radial-basis functions and their centers are determined automatically. This differs from the design of conventional neural networks, for example the back-propagation neural network or the radial-basis function network [Hay94a], where the numbers of hidden layers or hidden neurons are usually determined heuristically.
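A pedagogical sketch of (4.6)-(4.8): the equality constraint in (4.7) stems from the bias term w_0, so a bias-free variant can maximize the dual by simple projected gradient ascent. This is a simplification of the QP the paper solves, not the authors' solver; the kernel width below (gamma = 0.5) suits the 2-D toy data, whereas the paper's 0.0005 is tuned to its 124-dimensional maps, and C and the learning rate are assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def svm_dual_train(X, d, C=10.0, gamma=0.5, lr=0.01, iters=2000):
    """Maximize Q(a) = sum a_i - 1/2 sum_ij a_i a_j d_i d_j K_ij
    subject to 0 <= a_i <= C, by projected gradient ascent
    (bias-free variant: the constraint sum a_i d_i = 0 is dropped)."""
    K = rbf_kernel(X, X, gamma)
    Q = (d[:, None] * d[None, :]) * K
    a = np.zeros(len(X))
    for _ in range(iters):
        a = np.clip(a + lr * (1.0 - Q @ a), 0.0, C)
    return a

def svm_predict(X_train, d, a, gamma, X):
    # optimal hyperplane (4.8): sign of sum_i a_i d_i K(x_i, x)
    return np.sign(rbf_kernel(X, X_train, gamma) @ (a * d))

# Demo: 40 samples per class labeled +1 / -1
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((40, 2)) + 2.0,
               rng.standard_normal((40, 2)) - 2.0])
d = np.array([1.0] * 40 + [-1.0] * 40)
a = svm_dual_train(X, d)
```

Samples with nonzero a_i are the support vectors; as noted above, they automatically fix the number and centers of the radial-basis functions.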
5. RESULTS
Table 1 summarizes the averaged recognition results
for detecting the right and left imagined finger lifting
in four subjects (denoted by s1 ~ s4). With the use of
ICA in the extraction of the beta rebound maps, each classifier showed superior performance regardless of subject, and the overall average recognition score improved significantly from 55.0% to 74.8%. In addition, the SVM outperformed the other classifiers.
Classifier ICA s1 s2 s3 s4 mean
LDA without 58 55 57 51 54
with 63 79 74 63 69.8
BP-NN without 63 52 50 51 54
with 72 84 79 67 75.5
RBF-NN without 66 59 54 50 57.3
with 75 86 79 66 76.5
SVM without 66 53 50 51 55
with 72 87 77 73 77.3
Table 1. Average recognition rates (in percent) over four sessions obtained with the different classifiers, with and without ICA for feature extraction.
The receiver operating characteristics (ROC) curve, a
plot of true-positive rate versus false-positive rate,
provides another way to evaluate the performance of
binary detection classifiers. The area under the ROC
curve, which can be interpreted as the probability of
a random sample being assigned to positive class
than that to negative class, assesses the quality of
classification over a range of misclassification costs.
Table 2 reports that the use of ICA improved the
performance of each classifier and the overall
averaged ROC area increased from 0.63 to 0.75.
Classifier ICA s1 s2 s3 s4 mean
LDA without .71 .64 .58 .67 .65
with .75 .86 .74 .68 .78
BP without .65 .56 .61 .58 .60
with .68 .78 .74 .71 .73
RBF without .73 .60 .54 .62 .62
with .65 .91 .77 .74 .77
SVM without .64 .61 .66 .63 .64
with .69 .87 .77 .65 .75
Table 2. Average ROC areas over four sessions obtained with the different classifiers, with and without ICA for feature extraction. The numbers of beta rebound maps used for training and testing for each subject in each session were 60 and 30, respectively.
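The ROC area under this probability interpretation admits a direct rank-based estimate (the normalized Mann-Whitney U statistic); a minimal sketch with illustrative scores:

```python
import numpy as np

def roc_auc(scores_pos, scores_neg):
    """AUC = P(random positive is scored higher than random negative),
    counting ties as 1/2: U / (n_pos * n_neg)."""
    sp = np.asarray(scores_pos, dtype=float)[:, None]
    sn = np.asarray(scores_neg, dtype=float)[None, :]
    return ((sp > sn).sum() + 0.5 * (sp == sn).sum()) / (sp.size * sn.size)

auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3])   # 5 of 6 pairs won, no ties
```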
6. CONCLUSIONS
We have presented a novel method that uses ICA to extract a reliable feature, the beta rebound map, from the peaked ERS of motor imagery EEG. With minimal training for each subject (only 20 minutes), satisfactory classification rates were achieved with all four classifiers. This demonstrates the suitability of the beta rebound map as a neural input signal for BCI systems.
7. ACKNOWLEDGMENTS
The study was funded by the Taipei Veterans General Hospital, Taiwan (91380), the Ministry of Education of Taiwan (89BFA221401), and the National Science Council, Taiwan (NSC-92-2218-E-010-016).
8. REFERENCES
[Clo96a] Clochon, P., Fontbonne, J. M., Etevenon, P. A new method for quantifying EEG event-related desynchronization: amplitude envelope analysis. Electroencephalography & Clinical Neurophysiology, 98:126-129, 1996.
[Cov65a] Cover, T. M. Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Transactions on Electronic Computers, EC-14:326-334, 1965.
[Cov91a] Cover, T. M., Thomas, J. A. Elements of Information Theory. New York: Wiley, 1991.
[Cov88a] Cover, T. M. Capacity problems for linear machines. Pattern Recognition, Washington, DC: Thompson Book: 293-289, 1988.
[Hay94a] Haykin, S. Neural Networks: A Comprehensive Foundation. New York: Macmillan College Publishing Company, 1994.
[Lee03a] Lee, P. L., Wu, Y. T., Chen, L. F., Chen, Y. S., Cheng, C. M., Yeh, T. C., Ho, L. T., Chang, M. S., Hsieh, J. C. ICA-based spatiotemporal approach for single-trial analysis of post-movement MEG beta synchronization. NeuroImage, in press, 2003.
[Mul99a] Muller-Gerking, J., Pfurtscheller, G., Flyvbjerg, H. Designing optimal spatial filters for single-trial EEG classification in a movement task. Clinical Neurophysiology, 110:787-798, 1999.
[Pfu96a] Pfurtscheller, G., Stancak Jr, A., Neuper, C. Post-movement beta synchronization. A correlate of an idling motor area? Electroencephalography & Clinical Neurophysiology, 98:281-293, 1996.
[Pfu98a] Pfurtscheller, G., Neuper, C., Schlogl, A., Lugger, K. Separability of EEG signals recorded during right and left motor imagery using adaptive autoregressive parameters. IEEE Transactions on Rehabilitation Engineering, 6(3):316-325, 1998.
[Pfu00a] Pfurtscheller, G., Guger, C., Muller, G., Krausz, G., Neuper, C. Brain oscillations control hand orthosis in a tetraplegic. Neuroscience Letters, 292:211-214, 2000.
    • "Other studies use ICA as a denoising technique or as a feature extractor for improving the performance of a separate classifier. For example, in [4] ICA is used to remove ocular artefacts, while [5] extracts task-related independent components prior the application of several classifiers. In contrast to these approaches, in [10] the authors introduce a combination of Hidden Markov Models and Independent Component Analysis as a generative model of the EEG data and give a demonstration of how this model can be applied directly to the detection of when switching occurs between the two mental conditions of baseline activity and imaginary movement. "
    [Show abstract] [Hide abstract] ABSTRACT: In this paper we investigate the use of a temporal extension of independent component analysis (ICA) for the discrimination of three mental tasks for asynchronous EEG-based brain computer interface systems. ICA is most commonly used with EEG for artifact identification with little work on the use of ICA for direct discrimination of different types of EEG signals. In a recent work we have shown that, by viewing ICA as a generative model, we can use Bayes' rule to form a classifier obtaining state-of-the-art results when compared to more traditional methods based on using temporal features as inputs to off-the-shelf classifiers. However, in that model no assumption on the temporal nature of the independent components was made. In this work we model the hidden components with an autoregressive process in order to investigate whether temporal information can bring any advantage in terms of discrimination of spontaneous mental tasks
    Conference Paper · Apr 2005 · Radioengineering
  • ABSTRACT: This thesis explores latent-variable probabilistic models for the analysis and classification of electroencephalographic (EEG) signals used in brain-computer interface (BCI) systems. The first part of the thesis focuses on the use of probabilistic methods for classification. We begin by comparing the performance of 'black-box' generative and discriminative approaches. To take potential advantage of the temporal nature of the EEG, we use two temporal models: the standard generative hidden Markov model and the discriminative input-output hidden Markov model. For the latter model, we introduce a novel 'apposite' training algorithm which is of particular benefit for the type of training sequences that we use. We also assess the advantage of using these temporal probabilistic models over their static alternatives. We then investigate the incorporation of more specific prior information about the physical nature of EEG signals into the model structure. In particular, a common and successful assumption in EEG research is that signals are generated by a linear mixing of independent sources in the brain and other external components. Such domain knowledge is conveniently introduced through a generative model, and leads to a generative form of independent component analysis (gICA). We analyze whether this approach is advantageous in terms of performance compared to a more standard discriminative approach, which uses domain knowledge by extracting relevant features that are subsequently fed into classifiers. The user of a BCI system may have more than one way to perform a particular mental task. Furthermore, physiological and psychological conditions may change from one recording session and/or day to another. As a consequence, the corresponding EEG signals may change significantly.
As a first attempt to deal with this effect, we use a mixture of gICA models in which the EEG signal is split into different regimes, each corresponding to a potentially different realization of the same mental task. An arguable limitation of the gICA model is that the temporal nature of the EEG signal is not taken into account. We therefore analyze an extension in which each hidden component is modeled with an autoregressive process. The second part of the thesis focuses on analyzing the EEG signal and, in particular, on extracting independent dynamical processes from multiple channels. In BCI research, such a decomposition technique can be applied, for example, to denoise EEG signals of artifacts and to analyze the source generators in the brain, thereby aiding the visualization and interpretation of the mental state. To this end, we introduce a specially constrained form of the linear Gaussian state-space model which satisfies several properties, such as flexibility in specifying the number of recovered independent processes and the possibility of obtaining processes in particular frequency ranges. We then discuss an extension of this model to the case in which the correct number of hidden processes that generated the observed time series is not known a priori and the prior knowledge about their frequency content is imprecise. This is achieved using an approximate variational Bayesian analysis. The resulting model can automatically determine the number and appropriate complexity of the underlying dynamics, with a preference for the simplest solution, and estimates processes with preferential spectral properties. An important contribution of our work is a novel 'sequential' algorithm for performing smoothed inference, which is numerically stable and simpler than others previously published.
    Article · Jan 2006 · Radioengineering
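Both abstracts above propose modeling each hidden (independent) component with an autoregressive process. As a minimal numpy-only illustration of what that means (the order, coefficients, and lengths below are made up, not taken from the thesis), an AR(2) component can be simulated and its coefficients recovered by least squares:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a hypothetical AR(2) hidden component:
#   s_t = a1 * s_{t-1} + a2 * s_{t-2} + e_t
a_true = np.array([0.6, -0.3])  # illustrative, stationary coefficients
T = 2000
s = np.zeros(T)
for t in range(2, T):
    s[t] = a_true[0] * s[t - 1] + a_true[1] * s[t - 2] + rng.normal(scale=0.5)

# Least-squares AR fit: regress s_t on its p previous values.
p = 2
X = np.column_stack([s[p - k : T - k] for k in range(1, p + 1)])
a_hat, *_ = np.linalg.lstsq(X, s[p:], rcond=None)
```

In the temporal-ICA extension these AR dynamics become part of the generative model, so the component's temporal structure contributes to the class-conditional likelihood rather than being discarded.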
  • ABSTRACT: In this paper a novel approach to independent component analysis (ICA) model order estimation for movement electroencephalogram (EEG) signals is described. The application is targeted at brain-computer interface (BCI) EEG preprocessing. Previous work has shown that it is possible to decompose EEG into movement-related and non-movement-related independent components (ICs). Selecting only the movement-related ICs may increase the BCI EEG classification score. The true number of independent sources in the brain is an important parameter of the preprocessing step. Previously, we used principal component analysis (PCA) to estimate the number of independent sources. However, PCA estimates only the number of uncorrelated, not independent, components, ignoring the higher-order signal statistics. In this work, we use another approach: selection of highly correlated ICs from several ICA runs. The ICA model order estimation is performed at significance level α = 0.05, and the resulting model order is more or less dependent on the ICA algorithm and its parameters.
    Full-text · Article · Dec 2007
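The run-to-run IC matching idea above can be sketched as follows. This is an illustrative helper, not the authors' procedure: it takes component time courses from two ICA runs and counts components that reappear with high absolute correlation, using a fixed threshold as a stand-in for the paper's significance test at α = 0.05.

```python
import numpy as np

def reliable_components(S1, S2, thresh=0.9):
    """Count components of run 1 reproduced in run 2: a row of S1 is
    'reliable' if some row of S2 matches it with |correlation| > thresh
    (ICA recovers sources only up to sign and permutation)."""
    n1 = len(S1)
    # Cross-correlation block between the two sets of component time courses.
    C = np.corrcoef(np.vstack([S1, S2]))[:n1, n1:]
    return int(np.sum(np.abs(C).max(axis=1) > thresh))

# Toy check: two components shared across runs (up to sign and
# permutation) plus one unmatched noise component per run.
rng = np.random.default_rng(2)
c1, c2 = np.sin(np.linspace(0, 20, 500)), rng.normal(size=500)
S1 = np.vstack([c1, c2, rng.normal(size=500)])
S2 = np.vstack([-c2, c1, rng.normal(size=500)])
n_stable = reliable_components(S1, S2)
```

Components that survive across runs are taken as genuine sources, so the count serves as an estimate of the ICA model order.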